Archive for June, 2006

Keyword Density Lookup

Keyword Density Checker

Find out which keywords are most frequent on your pages. The keywords are shown visually.
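As a rough illustration of what such a checker computes, here is a minimal Python sketch. It is not the linked tool's code; the function name `keyword_density`, the tokenizing rule, and the sample text are all assumptions made for this example.

```python
import re
from collections import Counter


def keyword_density(text, top_n=5):
    """Return the top_n words as (word, count, share of all words)."""
    # Assumption: a "word" is a run of letters, digits, or apostrophes.
    words = re.findall(r"[a-z0-9']+", text.lower())
    total = len(words)
    if total == 0:
        return []
    counts = Counter(words)
    return [(word, count, count / total)
            for word, count in counts.most_common(top_n)]


sample = "Green widgets are great. Buy green widgets today, because widgets last."
for word, count, share in keyword_density(sample, top_n=3):
    # A bar of '#' characters stands in for the visual display.
    print(f"{word:10s} {count:2d} {share:6.1%} {'#' * count}")
```

A real checker would probably also skip common stop words and count multi-word phrases; this sketch does neither.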

Comments

Nofollow Attribute

If there are links you don’t want search engine spiders to follow, you can mark them up like this: <a rel="nofollow" href="http://www.site.com/page.html">Visit My Page</a>. This is very useful against comment spam.
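For comment spam, the attribute has to be added to every link a visitor submits. The sketch below shows one way that could be done in Python. The function name `add_nofollow` is hypothetical, and the regular expressions are a simplification: a production site should use a real HTML sanitizer instead.

```python
import re

# Matches an opening <a ...> tag; \b keeps it from matching <abbr> and similar.
A_TAG = re.compile(r"<a\b([^>]*)>", re.IGNORECASE)
# Matches an existing rel attribute (double-quoted, single-quoted, or bare).
REL_ATTR = re.compile(r"""\s+rel\s*=\s*("[^"]*"|'[^']*'|[^\s>]+)""", re.IGNORECASE)


def add_nofollow(comment_html):
    """Force rel="nofollow" onto every link in a visitor's comment."""
    def fix(match):
        # Drop whatever rel value the commenter supplied, then add our own.
        attrs = REL_ATTR.sub("", match.group(1))
        return f'<a rel="nofollow"{attrs}>'
    return A_TAG.sub(fix, comment_html)


print(add_nofollow('<a href="http://www.site.com/page.html">Visit My Page</a>'))
```

Replacing any submitted `rel` value, rather than leaving it alone, is a deliberate choice here: otherwise a spammer could supply their own `rel` to avoid the rewrite.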


Comments

More on Search Engines and Duplicate Content

There are a number of reasons why pages don’t show up in search engine results.

One common reason is that the content at more than one web address, or URL, appears substantially similar at each location where the search engines see it.

Some duplication of content may mean that pages are filtered out when search engines serve results, and there is no guarantee as to which version of a page will show in results and which versions won’t. Duplication of content may also mean that some sites and some pages aren’t indexed by search engines at all, or that a search engine crawling program will stop indexing the pages of a site because it finds too many copies of the same pages under different URLs.

There are a few different reasons why search engines dislike duplicated content. One is that they don’t want to show the same pages repeatedly in their search results. Another is that they don’t want to spend resources indexing pages that are substantially similar.

I’ve listed some areas where duplicate content exists on the web, or seems to exist from the stance of search engine crawling and indexing programs. I’ve also included a list of some patents and some papers that discuss duplicate content issues on the web.

Where search engines see duplicate content

1. Product descriptions from manufacturers, publishers, and producers reproduced by a number of different distributors in large ecommerce sites

When more than one site sells the same products, they often use text from the manufacturer or producer as the product description on their pages. Add the name of the product and the name of its creator, manufacturer, writer, or recording artist, which may also be on the page, and a considerable amount of the same content can show up on pages that aren’t related to each other but offer the same products.

2. Alternative print pages

Many sites offer the same content on separate pages formatted for printing. If the site owner doesn’t use a robots.txt disallow statement or a meta “noindex” tag on these pages to keep search engines from indexing them, they may appear in search engine indexes.
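A quick way to check that a disallow rule really covers the print pages is Python's standard `urllib.robotparser` module. In this sketch the `/print/` directory, the sample URLs, and the helper name `is_crawlable` are assumptions made for illustration only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a site that keeps its printer-friendly
# copies under /print/ (that path is an assumption for this example).
ROBOTS_TXT = """\
User-agent: *
Disallow: /print/
"""


def is_crawlable(robots_txt, url, user_agent="*"):
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The print copy should be blocked, the normal article should not.
print(is_crawlable(ROBOTS_TXT, "http://www.example.com/print/article-1.html"))
print(is_crawlable(ROBOTS_TXT, "http://www.example.com/article-1.html"))
```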

3. Pages that reproduce syndicated RSS feeds through a server side script

When RSS feeds from a site are shown on other pages in addition to the pages where they originally appear, and that text is rendered by a server-side include that presents it as HTML, it can appear as duplicate content on those other pages. When feeds are shown using client-side includes, such as JavaScript, there is much less likelihood that a search engine will pick up that content and index it.

4. Canonicalization issues, where a search engine may see the same page as different pages with different URLs

Because search engines index URLs rather than pages, it’s possible for them to index the same page presented in different ways. A “canonical URL” is one that is determined to be the “best” URL for a page, but search engines don’t always recognize that the same page is being presented multiple ways. For example, the following URLs may all point to the same page:

http://www.example.com
https://www.example.com
http://www.example.com/index.htm
https://www.example.com/index.htm
http://example.com
https://example.com
http://example.com/index.htm
https://example.com/index.htm
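To make the idea concrete, here is a minimal Python sketch that maps all eight of those URLs onto one form. The choice of `https://www.example.com/` as the preferred form, the list of default document names, and the function name `canonical_url` are assumptions; a real site would pick its own preferred form and apply it consistently.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed default document names that mean "the directory itself".
DEFAULT_DOCUMENTS = {"index.htm", "index.html"}


def canonical_url(url, preferred_scheme="https", preferred_host="www.example.com"):
    """Map the www/non-www, http/https, and index-page variants to one URL."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    bare = preferred_host[4:] if preferred_host.startswith("www.") else preferred_host
    if host not in (bare, "www." + bare):
        return url  # Not one of our hosts: leave it untouched.
    path = parts.path or "/"
    head, _, tail = path.rpartition("/")
    if tail.lower() in DEFAULT_DOCUMENTS:
        path = head + "/"  # /index.htm becomes /
    return urlunsplit((preferred_scheme, preferred_host, path, parts.query, ""))


variants = [
    "http://www.example.com", "https://www.example.com",
    "http://www.example.com/index.htm", "https://www.example.com/index.htm",
    "http://example.com", "https://example.com",
    "http://example.com/index.htm", "https://example.com/index.htm",
]
print({canonical_url(u) for u in variants})
```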

5. Pages that serve session IDs to search engines, so that they try to crawl and index the same page under different URLs

Some sites serve information in their URLs to track visitors as they go through the pages of a site. If this type of tracking information is provided to search engine crawling programs, then those programs may index the same page under different URLs, repeatedly. See, for instance, http://www.sears.com

As the Google Webmaster guidelines tell us:

Allow search bots to crawl your sites without session IDs or arguments that track their path through the site. These techniques are useful for tracking individual user behavior, but the access pattern of bots is entirely different. Using these techniques may result in incomplete indexing of your site, as bots may not be able to eliminate URLs that look different but actually point to the same page.
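One way a site might follow that advice is to strip session parameters from the links it shows to crawlers. The Python sketch below assumes the session ID travels in the query string; the parameter names in `SESSION_PARAMS` and the function name `strip_session_id` are hypothetical and would need to match the site's own software.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list of query parameters that carry a session ID.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid"}


def strip_session_id(url):
    """Return url without any session-tracking query parameters."""
    parts = urlsplit(url)
    kept = [(key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key.lower() not in SESSION_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(kept), parts.fragment))


# Two visits to the same page now yield a single URL.
print(strip_session_id("http://www.example.com/product.html?id=42&PHPSESSID=abc123"))
print(strip_session_id("http://www.example.com/product.html?id=42&PHPSESSID=zzz999"))
```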

6. Pages that serve multiple data variables through URLs, so that they crawl and index the same page under different URLs

Some sites pass many data variables in their URLs. An example shows this well:

http://www3.jcpenney.com/jcp/Products.aspx?
DeptID=469
&CatID=29841
&CatTyp=DEP
&ItemTyp=G
&GrpTyp=SIZ
&ItemID=0e273be
&ProdSeq=2
&Cat=tees+%26+tanks
&Dep=
&PCat=
&PCatID=28237
&RefPage=ProductList
&Sale=
&ProdCount=32
&RecPtr=
&ShowMenu=
&TTYP=
&ShopBy=0
&RefPageName=CategoryAll.aspx
&RefCatID=28237
&RefDeptID=469
&Page=1
&CmCatId=469|28237|29841

It’s possible for a search engine to try to index the page above with all of those data variables in different orders.
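A site can reduce this risk by always emitting its parameters in one fixed order. Here is a minimal Python sketch of that idea; the function name `normalize_query_order` and the shortened example.com URLs are assumptions, and alphabetical order is just one convenient convention.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalize_query_order(url):
    """Return url with its query parameters sorted alphabetically."""
    parts = urlsplit(url)
    pairs = sorted(parse_qsl(parts.query, keep_blank_values=True))
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(pairs), parts.fragment))


first = "http://www.example.com/Products.aspx?DeptID=469&CatID=29841&Page=1"
second = "http://www.example.com/Products.aspx?Page=1&DeptID=469&CatID=29841"
# Both orderings collapse to the same URL.
print(normalize_query_order(first))
print(normalize_query_order(second))
```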

7. Pages that share too many common elements, or where those are very similar from one page to another, including title, meta descriptions, headings, navigation, and text that is shared globally.

This is a frequent problem for large ecommerce sites that insist on having their brand name, and information about that brand, in every title on every page, and that use content management systems that don’t allow distinct meta description tags for each page.

8. Copyright infringement

When someone duplicates the content on your site, it may cause your pages to be filtered out of search results. A site like Copyscape may help you find some of these copies. Searching for unique strings of text from your pages, in quotation marks, may help uncover others.

9. Use of the same or very similar pages on different subdomains or different country top level domains (TLDs)

Using different subdomains and different top level domains for the pages of your organization may be a nice way to create different brands, or focus upon different kinds of content, services, or products. But duplicating content from one to another may create the risk that some of your pages don’t get indexed by search engines, or are filtered out of search results. Again, from the Google Webmaster Guidelines:

Don’t create multiple pages, subdomains, or domains with substantially duplicate content.

10. Article syndication

Many people create articles and offer them to others as long as a link and attribution to the original source is made. The risk here is that the search engines may filter out the original article and show one of the syndicated copies.

11. Mirrored sites

Mirrors of sites used to be very popular: when a site became too busy, people could use an alternative source to get to the same information or content. Larger sites that might have used mirrors in the past often use multiple servers and load balancing these days, but mirrors do still exist (and Wikipedia has a nice article about mirrors explaining why). Search engines may be able to recognize the duplicated URL structures of mirrored sites, and may ignore some mirrored sites that they find.

Comments

inbound link

A link coming from an external source into a web site. A search engine listing that points to your site would be considered an inbound link. These links increase traffic and increase your site’s popularity as measured by search engines.

Comments (1)

Building a Website in 26 Steps

Translated from: The A to Z Guide to Getting Website Traffic

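A) Keyword research
Before you do anything else, use a keyword tool to research broadly which keywords and phrases suit your site. Which keywords are your direct competitors using? Are there keywords with market potential that you haven’t discovered yet? Could they open up entirely new territory for you? In short: everything starts with keywords.

B) Domain name
If you want to build a brand around your company name, choose a domain name that reflects it accurately. If your company is called Kawunga, register www.kawunga.com; if that is already taken, use www.kawunga-widgets.com. Don’t use underscores.

C) Avoid the sandbox
Once you have settled on your keywords and company name, buy your domain as early as you can. Get the site set up quickly and put up a simple page telling people who you are, what you do, and that the full site is coming soon. Make sure Google and Yahoo! crawl it. (You can submit it or link to it from another site.)

D) Create content
Produce 30 pages of real, original content for your site. This gives the spiders something to fetch, and it gives you a chance to learn more keywords from the search engine results.

E) Site design
Stick to the “keep it simple” principle. Use an external CSS file, move JavaScript code off the page and into an external file, and don’t use frames. Use Flash the way you would use an image; never build the whole site in Flash. Don’t make the site so flashy that visitors find it cluttered. Keep it clean and simple, so visitors can find what they need easily without hunting around.

F) Page size
The fewer kilobytes your pages take up, the better, especially the home page. Optimize your images so pages load faster. Most people and businesses in the West have fast connections, but that may not be true in other countries or on mobile phones. If your site loads slowly, you may lose visitors before they ever see it.

G) Usability
Make sure your site follows basic usability rules. Remember that people spend more of their time on other sites, so don’t fight established design conventions. Don’t use PDF files for online reading. Use a different color for visited links, and make good use of headings and subheadings. Read up on usability tips; they will serve you well.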

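H) On-site optimization
Use your chosen keywords in the title (most important), and include them in headings, subheadings, and body text where you can. Make sure your pages and content are actually about those keywords. If you sell widgets, write about widgets; don’t just repeat the word “widgets” in the text.

I) Site-wide links
Site-wide links are links that appear the same on every page. They keep new visitors from getting lost. Sometimes they sit on the left side of the page, sometimes they appear as small blocks along the top, and sometimes they are at the bottom. Make sure every page has good old-fashioned site-wide links. I usually put small blocks at the top and text links at the bottom. See what works best for you.

J) Headlines
Use bold headlines. People online tend to scan rather than read closely, so headlines are basically what everyone sees. If your headlines don’t catch the eye, people may not stay with the content for long. Use keywords in moderation where you can.

K) Sitemap
Create a sitemap containing links to every page on the site, and keep it up to date. It helps spiders reach every page. Put a text link to the sitemap on the home page.

L) Content
Add a new page of 200 to 500 words every two to three days. The content should be original; don’t copy others. The more original and useful the content, the more people will read it, link to it, and, most importantly, come back.

M) Optimize in moderation
Stay away from over-optimization. Over-optimization means doing whatever it takes to raise rankings with methods the search engines disapprove of, such as keyword stuffing, doorway pages, hidden text, and so on. For the long term, optimize in moderation. Sites that cheat are usually chasing short-term results, such as porn, gambling, and black-market sites. (Just look at the spam you receive.) These sites typically last just long enough for one campaign.

N) Competitive analysis
Who are you competing with for the market? Check on Yahoo with the “link:” operator, like this: link:http://www.yourdomain.com. Study your competitors and try to get links from the same sites. Better yet, overtake them!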

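O) Submit
Submit to five groups of directories:

1. Dmoz.org and Yahoo (the local versions, such as Yahoo.co.uk or Yahoo.ca, where applicable)
2. The directories for your industry. Find them and get listed. If you have to pay, check that the price is reasonable.
3. Local directories for your country or region.
4. Any other suitable directories.
5. If you target a local market, be sure to get into the Yellow Pages. (Search engines use these listings to power local search.)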

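P) Blog
Start a blog about your industry and write at least one new post a week. Let visitors leave comments, or even let them write posts themselves. This creates more content and gives people a reason to come back and see what’s new.

Q) External links
Simply submit your site to suitable sites and ask them to link to you, telling them it will benefit their visitors. But don’t spend too much time on this; if you have good original content, they will find you and link to you on their own. Remember, if “content is king”, then “links are queen”. Stay away from reciprocal links, link farms, ghost links, and any other unnatural links. They won’t necessarily hurt you, but Google tracks all of it: when you gained a link, how often you gain links, who links to the sites that link to you, where you live, what you had for breakfast, and so on. (That’s partly a joke, but it really is like that.)

R) Statistics
Make sure your server has a good statistics program, and make good use of it. If you can’t get a good one for free, pay for one if you can. Without information about your visitors, such as who is visiting, where they come from, and how often they return, you will be left guessing about how to improve the site.

S) Pay-per-click (PPC)
Sign up for Google AdWords and Yahoo Search Marketing. Spend some money on ads to bring people to your site; you can also use them to build your brand. PPC brings your site a steady flow of visitors and gives your potential customers more chances to learn about you. You don’t have to be No. 1, or even No. 5; just make sure your ad appears on the first page of results for your keywords, and keep ad spending reasonable.

T) Look ahead
Keep up with the latest developments in your market. If a new product is coming out next season, write about it on your site now. Get in first, and search engines and linkers will reward you.

U) Articles
Write one article a week and publish it in as many online publications as you can (with a link back to your site). Put these articles on your own site too. This not only creates many links to your site but also prompts people to click through to it and, most importantly, makes you an expert in your visitors’ eyes. They may even start searching for your name to find your site.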

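V) Study your traffic
After 30 to 90 days you will have enough site statistics to analyze. Start by running the data through these questions:

- Where do your visitors come from?
- Which search engines do they use?
- Which words and phrases do they search for?
- Which pages on your site are visited most?
- Which pages are your entry pages?
- And your exit pages?
- What paths do visitors take through your site?

Distill the results and streamline your site:

- Use your most popular pages to draw visitors in and create value for you.
- Adjust the navigation paths to the ones you think work best.
- Figure out why visitors leave from those exit pages.

Then fine-tune your keywords based on the terms visitors used to find you. If your target phrase is “green widgets” but visitors find you by searching for “green leather widgets”, you should start creating content about “leather widgets”.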
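If your statistics program does not report search phrases directly, they can often be recovered from the referrer URLs in your logs. The Python sketch below is one hedged way to do that. It assumes the search engine puts the phrase in a `q` or `p` query parameter; the function names and the sample referrers are made up for illustration.

```python
from collections import Counter
from urllib.parse import parse_qs, urlsplit

# Assumed query parameter names that hold the visitor's search phrase.
QUERY_PARAMS = ("q", "p")


def search_phrase(referrer):
    """Return the search phrase found in a referrer URL, or None."""
    query = parse_qs(urlsplit(referrer).query)
    for name in QUERY_PARAMS:
        if name in query:
            return query[name][0].strip().lower()
    return None


def top_phrases(referrers, n=10):
    """Count how often each search phrase brought in a visitor."""
    phrases = (search_phrase(referrer) for referrer in referrers)
    return Counter(phrase for phrase in phrases if phrase).most_common(n)


sample_referrers = [
    "http://www.google.com/search?q=green+leather+widgets",
    "http://search.yahoo.com/search?p=green+widgets",
    "http://www.google.com/search?q=Green+Leather+Widgets",
    "http://www.example.org/links.html",
]
print(top_phrases(sample_referrers))
```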

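W) Verify your submissions
After three to four months, check whether the directories you submitted to, such as Dmoz.org, have listed you. If not, submit again or, better yet, politely email the editor to ask why. Finally, when you find new directories worth submitting to, don’t pass them up.

X) RSS feeds
RSS (Really Simple Syndication or Rich Site Summary) is becoming a powerful tool for Internet marketers. You can add new content to your site simply and quickly. Article feeds update frequently, so you can give your visitors (and the search engines) what they want: fresh content! You can use RSS to promote any new content, such as new pages, articles, blog posts, press releases, and so on.

Y) Press releases
A press release is a written communication you submit to media outlets (newspapers, radio, television, magazines), usually to announce something newsworthy. You can write a press release for any new article, company news, or product information. If it is engaging or original enough, staff at a media outlet will pick it up and write a story about it. Sometimes your website address may appear in The New York Times without your even knowing it.

Z) Keep content fresh
Remember: write a new page every two to three days. I mention it only briefly, but this may be the most critical point in the whole article. Keep writing! Without fresh content, your site will slowly slip down the search engine rankings. To stay on top, your site must have the most frequently updated, freshest, most engaging original content in your field.

Follow these 26 simple steps and I am confident your site will succeed within a year. You will gain significant traffic in your industry and watch your business grow!
So start writing, and write your way to number one!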

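About the author:
Shawn Campbell is an e-commerce enthusiast and co-founder of Red Carpet Web Promotion, Inc. Since 1998 he has worked on researching and developing marketing strategies that achieve strong results in search engines. Shawn is one of the pioneers of search engine optimization (SEO).

Compiled from: http://www.flaviensun.com/weblog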

Comments

Duplicate Content Filter: What it is and how it works

Duplicate content has become a huge topic of discussion lately, thanks to the new filters that search engines have implemented. This article will help you understand why you might be caught in the filter, and ways to avoid it. We’ll also show you how you can determine if your pages have duplicate content, and what to do to fix it.

Search engine spam is any deceitful attempt to deliberately trick the search engine into returning inappropriate, redundant, or poor-quality search results. Many times this behavior is seen in pages that are exact replicas of other pages, created to receive better results in the search engine. Many people assume that creating multiple or similar copies of the same page will either increase their chances of getting listed in search engines or help them get multiple listings, due to the presence of more keywords.

In order to make a search more relevant to a user, search engines use a filter that removes the duplicate content pages from the search results, and the spam along with it. Unfortunately, good, hardworking webmasters have fallen prey to the filters imposed by the search engines that remove duplicate content. These webmasters spam the search engines unknowingly, even though there are things they can do to avoid being filtered out. In order for you to truly understand the concepts you can implement to avoid the duplicate content filter, you need to know how this filter works.

First, we must understand that the term “duplicate content penalty” is actually a misnomer. When we refer to penalties in search engine rankings, we are actually talking about points that are deducted from a page in order to come to an overall relevancy score. But in reality, duplicate content pages are not penalized. Rather they are simply filtered, the way you would use a sieve to remove unwanted particles. Sometimes, “good particles” are accidentally filtered out.

Knowing the difference between the filter and the penalty, you can now understand how a search engine determines what duplicate content is. There are basically four types of duplicate content that are filtered out:

  1. Websites with Identical Pages - Pages that are identical to one another are considered duplicates, and websites that are identical to another website on the Internet are also considered to be spam. Affiliate sites with the same look and feel which contain identical content, for example, are especially vulnerable to a duplicate content filter. Another example would be a website with doorway pages. Many times, these doorways are skewed versions of landing pages. However, these landing pages are identical to other landing pages. Generally, doorway pages are intended to be used to spam the search engines in order to manipulate search engine results.
  2. Scraped Content - Scraped content is content taken from a web site and repackaged to make it look different, but in essence it is nothing more than a duplicate page. With the popularity of blogs on the internet and the syndication of those blogs, scraping is becoming more of a problem for search engines.
  3. E-Commerce Product Descriptions - Many eCommerce sites use the manufacturer’s descriptions for their products, which hundreds or thousands of other eCommerce stores in the same competitive markets are using too. This duplicate content, while harder to spot, is still considered spam.
  4. Distribution of Articles - If you publish an article, and it gets copied and put all over the Internet, this is good, right? Not necessarily for all the sites that feature the same article. This type of duplicate content can be tricky, because even though Yahoo and MSN determine the source of the original article and deem it most relevant in search results, other search engines like Google may not, according to some experts.

So, how does a search engine’s duplicate content filter work? Essentially, when a search engine robot crawls a website, it reads the pages and stores the information in its database. Then, it compares its findings to other information it has in its database. Depending upon a few factors, such as the overall relevancy score of a website, it then determines which pages are duplicate content, and filters out the pages or the websites that qualify as spam. Unfortunately, if your pages are not spam, but have enough similar content, they may still be regarded as spam.
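Search engines do not publish how they compare pages, so the following Python sketch is only an illustration of the general idea, not any engine's actual filter. It swaps in a well-known textbook technique: break each text into overlapping word sequences ("shingles") and measure how many the two texts share (Jaccard similarity). The function names and the shingle size of three words are assumptions.

```python
import re


def shingles(text, size=3):
    """Return the set of overlapping word sequences of length size."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(text_a, text_b, size=3):
    """Jaccard similarity of two texts: 0.0 (nothing shared) to 1.0 (identical)."""
    a, b = shingles(text_a, size), shingles(text_b, size)
    if not a or not b:
        return 0.0  # Nothing to compare.
    return len(a & b) / len(a | b)


page_one = "the quick brown fox jumps over the lazy dog"
page_two = "the quick brown fox leaps over the lazy dog"
print(similarity(page_one, page_two))
```

A score close to 1.0 means the two pages are near copies. Where to draw the line between "similar" and "duplicate" is a judgment call that this sketch leaves open.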

There are several things you can do to avoid the duplicate content filter. First, you must be able to check your pages for duplicate content. Using our Similar Page Checker, you will be able to determine similarity between two pages and make them as unique as possible. By entering the URLs of two pages, this tool will compare those pages, and point out how they are similar so that you can make them unique.

Since you need to know which sites might have copied your site or pages, you will need some help. We recommend using a tool that searches for copies of your page on the Internet: www.copyscape.com. Here, you can put in your web page URL to find replicas of your page on the Internet. This can help you create unique content, or even address the issue of someone “borrowing” your content without your permission.

Let’s look at the issue regarding some search engines possibly not considering the source of the original content from distributed articles. Remember, some search engines, like Google, use link popularity to determine the most relevant results. Continue to build your link popularity, while using tools like www.copyscape.com to find how many other sites have the same article, and if allowed by the author, you may be able to alter the article so as to make the content unique.

If you use distributed articles for your content, consider how relevant the article is to your overall web page and then to the site as a whole. Sometimes, simply adding your own commentary to the articles can be enough to avoid the duplicate content filter; the Similar Page Checker could help you make your content unique. Further, the more relevant articles you can add to complement the first article, the better. Search engines look at the entire web page and its relationship to the whole site, so as long as you aren’t exactly copying someone’s pages, you should be fine.

If you have an eCommerce site, you should write original descriptions for your products. This can be hard to do if you have many products, but it really is necessary if you wish to avoid the duplicate content filter. Here’s another example of why using the Similar Page Checker is a great idea. It can tell you how you can change your descriptions so as to have unique and original content for your site. This works well for scraped content too. Many scraped content sites offer news. With the Similar Page Checker, you can easily determine where the news content is similar, and then change it to make it unique.

Do not rely on an affiliate site which is identical to other sites, and do not create identical doorway pages. Not only are these filtered out immediately as spam, but if another site or page is found to be a duplicate, there is generally no comparison of the page to the site as a whole, and that can get your entire site in trouble.

The duplicate content filter is sometimes hard on sites that don’t intend to spam the search engines. But it is ultimately up to you to help the search engines determine that your site is as unique as possible. By using the tools in this article to eliminate as much duplicate content as you can, you’ll help keep your site original and fresh.

Comments

Search Engine Spider Simulator

Comments

Google’s China Problem

Comments

Needling

Anti-GFW

Comments

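Google News Sources

At the SEO workshop, someone raised the question of how to become a news source for Google News. I vaguely recall being told that you can submit your site yourself…

I’ve now found the answer; it’s here: http://www.google.com/support/news/bin/request.py

Discussion: How to add my news site in google news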

Comments
