Blogger's Guide To Using Robots.txt And Robots Meta Tags To Optimise Indexing

Posted on July 22, 2009

This post describes how to use a combination of robots.txt and robots meta tags to make sure that your WordPress blog does not leak PageRank or authority.

Search engine robots want your WordPress blog content. They are programmed to crawl your site, look at everything and report back to the Master Indexer with their findings. The Master Indexer then makes sure that your content can be found. However, there are some things that robots, in their relentless content-crunching march, should not have access to. For example, the indexing of duplicate content on your blog can lead to the dilution of your blog's authority.

This post addresses this and other related problems by outlining how you can ensure that all your relevant content is crawled and indexed, while access to non-relevant areas of your blog is restricted. Robots are dumb; they just follow the Master Indexer's orders. When they reach your site they are programmed to crawl everything. However, with the right tools you can re-program them to crawl only the parts of your site that should be indexed.

Why Must The Robots Be Controlled?

We all want search engines like Google to index our fantastic and unique blog pages, and drive hordes of targeted traffic to them. However, there are some pages and directories that are not content focussed, and we don't want those crawled or indexed.

What Parts Of Our Blog Do We Not Want Crawled By The Search Engines?

The main parts of our blog that we don't want crawled are:

  • the WordPress installation directories, and
  • any potential duplicate content like archives

What Is The Benefit Of Stopping Robots Seeing The WordPress Directories?

There are three main benefits to stopping your directories being indexed. These are:

  • the contents of our WordPress installation, e.g. which plugins we use, are less likely to show up in search results
  • the “theme” of our site won't be diluted by the indexing of spurious non-relevant files
  • Googlebot has less work to do to index our site

How Do We Stop The Robots Crawling Our Directories?

The robots.txt file, which resides in the top-level directory of our site, contains rules that well-behaved robots obey. Think of these rules as Jedi Mind Tricks that such robots cannot resist and must follow. A detailed description of robots.txt is beyond the scope of this post, but you can find some excellent info on creating a robots.txt file at the following address: <a href="http://www.robotstxt.org/robotstxt.html">How To Use And Create A Robots.txt File</a>
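Because the file has to sit at the top level of the site, its address can be worked out from any page URL. Here is a minimal sketch in Python using only the standard library; the example.com addresses and the function name are placeholders made up for illustration.

```python
from urllib.parse import urljoin


def robots_txt_url(page_url: str) -> str:
    """Return the address of the robots.txt file that covers page_url.

    The leading slash in "/robots.txt" makes urljoin discard the page's
    own path, so the result always points at the top level of the host.
    """
    return urljoin(page_url, "/robots.txt")


# Hypothetical blog post address, used only as an example.
print(robots_txt_url("http://example.com/2009/07/some-post/"))
```

A robots.txt file placed in a sub-directory is not the one robots look for, which is why the file has to be uploaded to the top level.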

What Should We Include In A WordPress Robots.txt File?

The following is a simple example that can be used as the basis for your WordPress robots.txt file:

User-agent: *
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes
Allow: /wp-content/uploads

What this says is:

  • these rules apply to all robots, which may crawl my WordPress blog, but
  • don't allow the wp-admin directory to be crawled
  • don't allow the wp-includes directory to be crawled
  • don't allow the plugins directory to be crawled
  • don't allow the cache directory to be crawled
  • don't allow the themes directory to be crawled
  • do allow any content uploads such as images to be crawled

You can use the robots.txt tool provided by Google Webmaster Tools to create and check your robots.txt file.
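The rules can also be sanity-checked on your own machine. The following is a minimal sketch using Python's standard urllib.robotparser module; the site address, the sample paths and the helper names are assumptions made up for illustration, and Googlebot's own matching logic may differ in edge cases.

```python
from urllib.robotparser import RobotFileParser

# The example rules from this post, written as one unbroken record
# (no blank lines between the User-agent line and its rules).
RULES = """
User-agent: *
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes
Allow: /wp-content/uploads
"""


def build_parser(rules: str) -> RobotFileParser:
    """Load robots.txt rules from a string instead of fetching them."""
    parser = RobotFileParser()
    parser.parse(rules.strip().splitlines())
    return parser


def is_crawlable(parser, path, agent="Googlebot", site="http://example.com"):
    """Return True if the rules let the given robot fetch site + path."""
    return parser.can_fetch(agent, site + path)


PARSER = build_parser(RULES)

# Hypothetical paths on a hypothetical blog.
for path in (
    "/2009/07/some-post/",
    "/wp-admin/options.php",
    "/wp-content/plugins/some-plugin/readme.txt",
    "/wp-content/uploads/photo.jpg",
):
    verdict = "allowed" if is_crawlable(PARSER, path) else "blocked"
    print(f"{path}: {verdict}")
```

Python's parser applies the first rule that matches, while Google documents a most-specific-match rule. For the non-overlapping rules above both approaches give the same answer.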

Why Do We Want To Stop WordPress Archives Being Crawled?

First things first: there is no Google duplicate content penalty. However, Google's algorithms decide which version of a particular piece of content is going to rank best, so there may be the appearance of a duplicate content penalty, because no-one knows how the algorithms are coded and they change over time. Authority sites are the best place to put your unique content, as they will rank better than smaller and newer sites. This is because over time authority sites (when done well) build up <strong>trust</strong>.

Now that's out of the way, let's look at why you might want <strong>to not have your archives crawled and indexed</strong>. If you have a piece of good and unique content and it is in multiple places on your site (date, category, tag, author archives) then the robots will crawl each of these “duplicates” and report back to the Master Indexer. The problem is that this will give the Master Indexer a headache, as it will have to decide which of these duplicates is going to rank higher. This is compounded by the fact that each of these duplicates may actually get links from different blogs and sites. The net result is that the authority of your main content post is diluted, and you don't want that to happen. If your site is fresh and new this is not going to make much of a difference, but as your site grows it will. Besides, it's always good to start any new endeavour with the right approach and develop good habits.
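To make the "multiple places" point concrete, here is a rough sketch in Python that lists the addresses on which a single post can appear. It assumes a date-based permalink structure and the usual /category/, /tag/ and /author/ archive bases; the slug, category, tag and author names are hypothetical, and your own blog's URLs may look different.

```python
def urls_showing_post(year, month, slug, category, tag, author):
    """List the addresses where one post's content can turn up."""
    return [
        f"/{year}/{month:02d}/{slug}/",  # the post itself, the one we want to rank
        f"/{year}/{month:02d}/",         # date archive
        f"/category/{category}/",        # category archive
        f"/tag/{tag}/",                  # tag archive
        f"/author/{author}/",            # author archive
    ]


# Hypothetical post details, used only as an example.
for url in urls_showing_post(2009, 7, "some-post", "seo", "robots", "admin"):
    print(url)
```

Only the first address is the post proper; the other four are archive pages that repeat some or all of its content.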

How Do We Stop Robots Crawling Our WordPress Archives?

In order to control this duplicate content issue with archives, we can use meta robots tags to tell the robots which archives to index and which to ignore. Adding the necessary meta robots tags by hand is not easy, but there is a very good plugin that will help you do the job: <a href="http://yoast.com/wordpress/meta-robots-wordpress-plugin/">The Meta Robots WordPress Plugin</a> allows you to control which archives are crawled and indexed.

This plugin makes it easy to:

  • stop robots from indexing your WordPress login, register and admin pages
  • disable author-based archives
  • disable date-based archives
  • nofollow the category listings on single posts and pages
  • nofollow outbound links on your front page
  • nofollow tag links

It also has a number of other useful features.

This is one plugin that I always use.
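For reference, a robots meta tag is a single line in the head section of a page. The sketch below, written in Python purely for illustration, shows the kind of per-page decision involved. It is not the plugin's actual code, and the page type names and the policy are assumptions of mine.

```python
# Hypothetical policy: the page types we want kept out of the index.
NOINDEX_PAGE_TYPES = {"date_archive", "author_archive", "login", "register", "admin"}


def robots_meta_tag(page_type: str) -> str:
    """Build the robots meta tag for a page of the given type."""
    if page_type in NOINDEX_PAGE_TYPES:
        # Keep the page out of the index, but still let robots follow
        # its links so they can reach the posts it lists.
        content = "noindex,follow"
    else:
        content = "index,follow"
    return f'<meta name="robots" content="{content}">'


print(robots_meta_tag("single_post"))
print(robots_meta_tag("date_archive"))
```

Using noindex together with follow means an archive page drops out of the index while robots can still pass through it to reach your posts.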

Summary

So we can see from this post that a combination of robots.txt and robots meta tags can be used to ensure that your WordPress blog does not leak any authority or PageRank. Please do comment if you have any questions or would like to start a discussion.
