Archive for SEO

Server 5xx error

I have been using the Google Sitemap plugin for a few days now. When I checked its status today, it turned out to be: Error. I was baffled.

Error Detail
5xx error Network unreachable

I did not know what a 5xx error was, so I googled it: URL unreachable / 5xx error

See RFC 2616 for a complete list of these status codes. Likely reasons for this error are an internal server error or a server busy error. If the server is busy, it may have returned an overloaded status to ask the Googlebot to crawl the site more slowly. In this case, we’ll return again later to crawl additional pages.

So the web server was down when the Google spider came by. No harm done; I will just wait for the next crawl.
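
For reference, a 5xx code is any HTTP status in the 500 to 599 range. The sketch below is only an illustration of that rule using Python's standard `http` module; the function name `describe_status` is made up for this example and has nothing to do with the sitemap plugin itself.

```python
from http import HTTPStatus


def describe_status(code: int) -> str:
    """Return a short description saying whether a status code is a 5xx error."""
    if 500 <= code <= 599:
        try:
            # Look up the standard reason phrase, e.g. "Service Unavailable".
            phrase = HTTPStatus(code).phrase
        except ValueError:
            # Codes in the 5xx range that the standard library does not know.
            phrase = "Unknown server error"
        return f"{code} {phrase} (server error)"
    return f"{code} (not a 5xx error)"


print(describe_status(503))
print(describe_status(404))
```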


More on search engines and duplicate content

There are a number of reasons why pages don’t show up in search engine results.

One common reason is that the content at more than one web address, or URL, appears to the search engines to be substantially similar at each location.

Some duplication of content may mean that pages are filtered at the time of serving of results by search engines, and there is no guarantee as to which version of a page will show in results and which versions won’t. Duplication of content may also mean that some sites and some pages aren’t indexed by search engines at all, or that a search engine crawling program will stop indexing all of the pages of a site because it finds too many copies of the same pages under different URLs.

There are a few different reasons why search engines dislike duplicated content. One is that they don’t want to show the same pages repeatedly in their search results. Another is that they don’t want to spend resources indexing pages that are substantially similar.

I’ve listed some areas where duplicate content exists on the web, or seems to exist from the stance of search engine crawling and indexing programs. I’ve also included a list of some patents and some papers that discuss duplicate content issues on the web.

Where search engines see duplicate content

1. Product descriptions from manufacturers, publishers, and producers reproduced by a number of different distributors in large ecommerce sites

When more than one site sells the same products, they often use text from the manufacturer or producer of the product as a product description on their pages. Add to that the name of the product and the name of the creator, manufacturer, writer, or recording artist, which may also be on the page, and there may be a considerable amount of the same content showing up on pages that aren’t related to each other but offer the same products.

2. Alternative print pages

Many sites offer the same content on separate pages that are formatted for printers. If the site owner doesn’t use a robots.txt disallow statement or a meta “noindex” tag on these pages to keep search engines from indexing them, they may appear in search engine indexes.
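
As a rough illustration of the robots.txt approach, the sketch below checks a sample rule set with Python's standard `urllib.robotparser`. The `/print/` directory, the example URLs, and the helper name `is_crawlable` are assumptions made up for this example; a real site would use whatever path its print pages actually live under.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps crawlers out of the print-friendly pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /print/
"""


def is_crawlable(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_crawlable(ROBOTS_TXT, "http://www.example.com/article.htm"))
print(is_crawlable(ROBOTS_TXT, "http://www.example.com/print/article.htm"))
```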

3. Pages that reproduce syndicated RSS feeds through a server side script

When RSS feeds from sites are shown on other pages in addition to the pages of the site where they originally appear, and that text is displayed using a server side include that presents the information as HTML on the pages, then it could appear as duplicate content on those other pages. When feeds are shown using client side includes, such as JavaScript, there is much less likelihood that a search engine will pick up that content and index it.

4. Canonicalization issues, where a search engine may see the same page as different pages with different URLs

Because search engines index URLs rather than pages, it’s possible for them to index the same page several times when it is presented in different ways. A “canonical URL” is one that is determined to be the “best” URL for a page, but search engines don’t always recognize that the same page is being presented multiple ways. For example, the following URLs may all point to the same page:

http://www.example.com
https://www.example.com
http://www.example.com/index.htm
https://www.example.com/index.htm
http://example.com
https://example.com
http://example.com/index.htm
https://example.com/index.htm
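
One way to picture canonicalization is as a function that maps every variant in the list above to a single preferred URL. The sketch below is a minimal illustration, not how any search engine actually does it. It assumes the site owner prefers `https` and the `www.example.com` host, and that `index.htm` and `index.html` are the default documents; all of those choices, and the name `canonicalize`, are assumptions for this example.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed default document names; a real server may use others.
INDEX_NAMES = {"index.htm", "index.html"}


def canonicalize(url: str, scheme: str = "https", host: str = "www.example.com") -> str:
    """Map scheme, host, and index-file variants of a URL to one preferred form."""
    parts = urlsplit(url)
    path = parts.path or "/"
    head, _, tail = path.rpartition("/")
    if tail in INDEX_NAMES:
        # Treat /dir/index.htm as the same page as /dir/
        path = head + "/"
    # Force the preferred scheme and host, keep the query, drop the fragment.
    return urlunsplit((scheme, host, path, parts.query, ""))


URLS = [
    "http://www.example.com",
    "https://www.example.com",
    "http://www.example.com/index.htm",
    "https://www.example.com/index.htm",
    "http://example.com",
    "https://example.com",
    "http://example.com/index.htm",
    "https://example.com/index.htm",
]

print({canonicalize(u) for u in URLS})
```

On a live site the same effect is usually achieved with redirects to the preferred URL rather than with a script like this.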

5. Pages that serve session IDs to search engines, so that they try to crawl and index the same page under different URLs

Some sites serve information in their URLs to track visitors as they go through the pages of a site. If this type of tracking information is provided to search engine crawling programs, then those programs may index the same page under different URLs, repeatedly. See, for instance, http://www.sears.com

As the Google Webmaster guidelines tell us:

Allow search bots to crawl your sites without session IDs or arguments that track their path through the site. These techniques are useful for tracking individual user behavior, but the access pattern of bots is entirely different. Using these techniques may result in incomplete indexing of your site, as bots may not be able to eliminate URLs that look different but actually point to the same page.
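
To make the problem concrete, the sketch below strips session parameters from a URL so that two visits to the same page produce the same address. The parameter names in `SESSION_PARAMS` and the function name `strip_session_params` are hypothetical; real sites use many different names, and the better fix is not to serve session IDs to crawlers in the first place.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical session parameter names, compared case-insensitively.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid", "jsessionid"}


def strip_session_params(url: str) -> str:
    """Remove query parameters that only carry session tracking data."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


print(strip_session_params("http://www.example.com/item?id=7&sid=abc123"))
print(strip_session_params("http://www.example.com/item?id=7&sid=xyz789"))
```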

6. Pages that serve multiple data variables through URLs, so that they crawl and index the same page under different URLs

Some sites pass many different data variables through their URLs. An example shows this well:

http://www3.jcpenney.com/jcp/Products.aspx?
DeptID=469
&CatID=29841
&CatTyp=DEP
&ItemTyp=G
&GrpTyp=SIZ
&ItemID=0e273be
&ProdSeq=2
&Cat=tees+%26+tanks
&Dep=
&PCat=
&PCatID=28237
&RefPage=ProductList
&Sale=
&ProdCount=32
&RecPtr=
&ShowMenu=
&TTYP=
&ShopBy=0
&RefPageName=CategoryAll.aspx
&RefCatID=28237
&RefDeptID=469
&Page=1
&CmCatId=469|28237|29841

It’s possible for a search engine to try to index the page above with all of those data variables in different orders.
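
A small sketch of why parameter order matters: two URLs with the same variables in a different order are different strings, but sorting the variables gives one consistent form. The function name `normalize_query_order` and the short example URLs are made up for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalize_query_order(url: str) -> str:
    """Rewrite a URL so its query parameters appear in sorted order."""
    parts = urlsplit(url)
    # keep_blank_values preserves empty variables such as "Dep=".
    pairs = sorted(parse_qsl(parts.query, keep_blank_values=True))
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(pairs), parts.fragment)
    )


first = "http://www.example.com/Products.aspx?DeptID=469&CatID=29841&Page=1"
second = "http://www.example.com/Products.aspx?Page=1&CatID=29841&DeptID=469"

print(first == second)
print(normalize_query_order(first) == normalize_query_order(second))
```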

7. Pages that share too many common elements, or where those are very similar from one page to another, including title, meta descriptions, headings, navigation, and text that is shared globally.

This is a frequent problem for large ecommerce sites that insist on having their brand name, and information about that brand in every title on every page of their site, and use content management systems that don’t allow them to have distinct meta description tags for each page of their site.
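
One simple way to get a feel for how similar two pages are is to compare their overlapping word sequences ("shingles") with a Jaccard score. This is only a rough sketch of a well-known technique; it is an assumption for illustration and not a description of what any particular search engine does. The shingle size of 3 and the sample titles are arbitrary.

```python
def shingles(text: str, size: int = 3) -> set:
    """Return the set of overlapping word sequences of the given size."""
    words = text.lower().split()
    count = max(len(words) - size + 1, 1)
    return {tuple(words[i:i + size]) for i in range(count)}


def jaccard_similarity(a: str, b: str) -> float:
    """Share of shingles two texts have in common, from 0.0 to 1.0."""
    set_a, set_b = shingles(a), shingles(b)
    return len(set_a & set_b) / len(set_a | set_b)


# Hypothetical page titles that repeat the same brand text.
title_one = "Acme Store the best place to shop online for red shoes"
title_two = "Acme Store the best place to shop online for blue hats"

print(jaccard_similarity(title_one, title_two))
```

A score close to 1.0 means the two texts share most of their wording, which is the situation the paragraph above warns about.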

8. Copyright infringement

When someone duplicates the content on your site, it may cause your pages to be filtered out of search results. A site like Copyscape may help you find some of these copies. Searching for unique strings of text from your pages, in quotation marks, may help uncover others.

9. Use of the same or very similar pages on different subdomains or different country top level domains (TLDs)

Using different subdomains and different top level domains for the pages of your organization may be a nice way to create different brands, or focus upon different kinds of content, services, or products. But duplicating content from one to another may create the risk that some of your pages don’t get indexed by search engines, or are filtered out of search results. Again, from the Google Webmaster Guidelines:

Don’t create multiple pages, subdomains, or domains with substantially duplicate content.

10. Article syndication

Many people write articles and offer them to others for republication, as long as a link and attribution to the original source are included. The risk here is that the search engines may filter out the original article and show one of the syndicated copies.

11. Mirrored sites

Mirrors of sites used to be very popular, so that when a site became too busy people could use an alternative source to get to the same information or content. Larger sites that might have used mirrored sites in the past often use multiple servers and load balancing these days, but mirrors do still exist (and Wikipedia has a nice article about mirrors explaining why). Search engines may be able to recognize duplicated URL structures of mirrored sites, and may ignore some mirrored sites that they find.


Bounces

Bounces is the number of times visitors exited from the entrance page without visiting any other pages on your site (immediate exits). Bounce Rate is Bounces divided by Entrances.
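
The definition works out to a one-line calculation. The sketch below is just a worked example; the function name `bounce_rate`, the sample numbers, and the choice to return 0.0 when there are no entrances are assumptions for illustration.

```python
def bounce_rate(bounces: int, entrances: int) -> float:
    """Bounce Rate = Bounces / Entrances, as a fraction between 0 and 1."""
    if entrances == 0:
        # No entrances means nothing to measure; 0.0 is an arbitrary choice here.
        return 0.0
    return bounces / entrances


# Example: 25 immediate exits out of 100 entrances.
print(f"{bounce_rate(25, 100):.0%}")
```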


SEO Tips

Keywords should be used in the title tag, meta tags, headlines at the top of the page, relevant incoming and outgoing links, alt attributes, etc.

Keywords should be used as the first word in the title tag.

Keywords should be bolded near the top of the page.

Keywords should be italicized near the top of the page.

Keywords should be sprinkled throughout paragraphs.

Make the main page attractive to sell the idea of moving forward into the site.

The main index page should be designed to give the overall theme of the site.

Link to relevant sites or pages from within your site.

Use appropriate anchor text for the link.

Link to sites that you can trust and that have no spammy text.

One-way links are better, if the link points to your site.

Linking to high-PR (PageRank) sites in your field may be beneficial.

Definitions of SEO on the Web


About SEO category

Search engine optimization (SEO) is the process of promoting your business online through search engines for better visibility and rankings.

The term describes the marketing technique of preparing a website to improve its chances of being ranked in the top results of a search engine when a relevant search is made. A number of factors are important when optimising a website, including the content and structure of the website’s copy and page layout, the HTML meta tags, and the submission process.
