“robots.txt” for WordPress

Robot exclusion is something I don’t often think about, but I decided to focus on it today and clean up some of the search engine results my site will generate, now that I have this shiny new WordPress log up.

I was looking at my robots.txt and trying to figure out what to disallow, and I came up with the following:

User-agent: *
disallow: /wordpress
disallow: /archive/category
disallow: /feed
disallow: /comments
disallow: /styles

“/wordpress” is an obvious inclusion. I don’t want my log-in page cached.

I’ve taken a vow of permanence for certain links. Anything stored at /archive/ccyy/mm/dd/slug will remain until the end of time. Similarly, anything at the dd and mm levels should remain the same. I actually have nothing at the ccyy level, which I hope to fix at some point. I consider every other page to have ever-changing primary content. This includes any feed links, as well as links to categories, because posts may not remain in a particular category forever.
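The split between permanent and fluid URLs can be sketched as a simple pattern match. This is only an illustration: the regular expression and the `is_permanent` helper are hypothetical names, and the sample slug is made up rather than taken from the site.

```python
import re

# Hypothetical pattern for the permanent archive levels described above:
# /archive/ccyy/mm, /archive/ccyy/mm/dd and /archive/ccyy/mm/dd/slug.
# The bare /archive/ccyy level is left out because nothing lives there yet.
PERMANENT = re.compile(r"^/archive/\d{4}/\d{2}(/\d{2}(/[^/]+)?)?/?$")


def is_permanent(path):
    """Return True if `path` falls under the permanent archive scheme."""
    return PERMANENT.match(path) is not None


if __name__ == "__main__":
    # Sample paths are assumptions, chosen only to exercise the pattern.
    for path in [
        "/archive/2006/08/30/robots-txt",
        "/archive/2006/08",
        "/archive/category/weblog",
        "/feed",
    ]:
        print(path, "permanent" if is_permanent(path) else "fluid")
```

Anything the pattern rejects is a candidate for exclusion, which is the reasoning behind the category and feed rules.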

(Note: I moved this post from general to weblog during revision. This would definitely make it disappear from one page and appear on another.)

Re-categorising posts means that “/archive/category” is too fluid for search engines to index properly. While indexing it would be somewhat convenient, I’d hate to have someone come in for an article that has since been re-categorised and move on because he or she did not find it.

“/feed” is likewise verboten, unless someone knows something about search engines that I don’t. The special case “/comments” takes care of comment feeds as well.

I don’t want “/styles” mirrored, because I get enough junk during a Google image search without polluting the service further with my interface images.
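One way to sanity-check rules like these is Python's standard `urllib.robotparser` module. The sketch below is a rough check rather than a statement of how any particular search engine behaves: the `is_allowed` helper is a hypothetical name, and the sample paths (a log-in script under /wordpress, a post slug, an image under /styles) are assumptions.

```python
from urllib.robotparser import RobotFileParser

# The rules from this post, as written. This parser matches field
# names such as "disallow" case-insensitively.
RULES = """\
User-agent: *
disallow: /wordpress
disallow: /archive/category
disallow: /feed
disallow: /comments
disallow: /styles
"""


def is_allowed(path, agent="*"):
    """Return True if the rules above let `agent` fetch `path`."""
    parser = RobotFileParser()
    parser.parse(RULES.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    # Sample paths are hypothetical, chosen only to exercise each rule.
    for path in [
        "/wordpress/wp-login.php",
        "/archive/2006/08/30/some-post",
        "/archive/category/weblog",
        "/feed",
        "/comments/feed",
        "/styles/logo.png",
    ]:
        print(path, "allowed" if is_allowed(path) else "blocked")
```

This parser treats each rule as a path prefix, so “/feed” only covers paths that begin with /feed. If a feed were ever served from beneath a post's permalink, it would need a rule of its own.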

That concludes my explanation. I now open the floor to comments, suggestions, etc. about this configuration.

Comments (2)

  1. eglobe1 wrote:

    what is robot use for? if my website do not have robot. is there any problem?

    can u tell me what is the cons and pros for the robots.txt ?/

    Wednesday, August 30, 2006 at 12:17 am #
  2. lowmagnet wrote:

    The robots file simply tells search engines where not to go. The pro side is that well-behaved crawlers stay out of places they’re not supposed to go, and they don’t double-index your content. No cons, really; it’s an all-pro affair.

    Wednesday, August 30, 2006 at 7:12 am #