Posted as Blogging, General Tips, Google, SEO Tips, Webmaster resources
A robots.txt file can be used to keep search engines from crawling and indexing parts of your web site. Sometimes you don’t want certain pages to show up in search results, and a robots.txt file is useful for that. It is also a good way to keep duplicate pages out of the index and so avoid duplicate content penalties from search engines.
As the name suggests, robots.txt is a plain text file. The file name is all lowercase, and the path of the robots.txt file should be as follows:
http://www.example.com/robots.txt or http://blog.example.com/robots.txt
Always put the robots.txt file in the root directory and not in a subdirectory. When a robot visits your site it first looks for robots.txt in the root directory, and if the file is not present at that location it simply assumes that you want all your web pages indexed.
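To illustrate the root-directory rule, here is a minimal Python sketch that works out where a robot would look for robots.txt for any given page. The function name robots_txt_url and the page URLs are made up for this illustration; only the standard urllib.parse module is used.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the root-level robots.txt URL for the host serving page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host (including any subdomain), drop the rest of
    # the path, and point at /robots.txt in the root directory.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("http://www.example.com/blog/2007/08/post.html"))
print(robots_txt_url("http://blog.example.com/category/seo/?page=2"))
```

Note that each subdomain gets its own file: a page on blog.example.com is governed by http://blog.example.com/robots.txt, not by the file on www.example.com.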
Let’s look at the syntax of the robots.txt file:
If you want to allow all robots to index all your pages, use this robots.txt file:
User-agent: *
Disallow:
Here User-agent: * means the rule applies to every robot that visits your site to crawl its pages, and an empty Disallow: means that nothing is blocked.
If you want to ban all robots from indexing your site, use this robots.txt file:
User-agent: *
Disallow: /
To ban a specific robot from indexing your pages, use code like:
User-agent: Googlebot
Disallow: /
To ban part of your site, such as the /Category directory with all its sub pages (paths are case-sensitive, so /Category/ and /category/ are different):
User-agent: *
Disallow: /Category/
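If you want to try rules like these before uploading them, one option is Python’s standard urllib.robotparser module. The sketch below feeds it the four files shown so far and asks whether a robot may fetch a URL. The helper name is_allowed, the robot name SomeOtherBot and the example URLs are assumptions for illustration, and a real crawler may treat edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, user_agent, url):
    """Parse robots.txt text and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


ALLOW_ALL = "User-agent: *\nDisallow:"
BLOCK_ALL = "User-agent: *\nDisallow: /"
BLOCK_GOOGLEBOT = "User-agent: Googlebot\nDisallow: /"
BLOCK_CATEGORY = "User-agent: *\nDisallow: /Category/"

page = "http://www.example.com/page.html"

print(is_allowed(ALLOW_ALL, "SomeOtherBot", page))        # True
print(is_allowed(BLOCK_ALL, "SomeOtherBot", page))        # False
print(is_allowed(BLOCK_GOOGLEBOT, "Googlebot", page))     # False
print(is_allowed(BLOCK_GOOGLEBOT, "SomeOtherBot", page))  # True

# Rules are simple path prefixes, and the match is case-sensitive.
print(is_allowed(BLOCK_CATEGORY, "SomeOtherBot",
                 "http://www.example.com/Category/seo/post.html"))  # False
print(is_allowed(BLOCK_CATEGORY, "SomeOtherBot",
                 "http://www.example.com/category/seo/post.html"))  # True
```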
To allow only a specific robot to index your pages:
User-agent: Googlebot
Disallow:
User-agent: *
Disallow: /
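Here you might worry about making a mistake by putting the specific robot after the Disallow: / group, i.e.
User-agent: *
Disallow: /
User-agent: Googlebot
Disallow:
In fact the order of the groups does not matter to crawlers that follow the standard, Googlebot included: each robot obeys the most specific User-agent group that matches it, so this file still allows Googlebot and bans every other robot. A simpler parser may stop at the first group that matches, though, so listing specific robots before the User-agent: * group is still the safer habit.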
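How group order is handled depends on the parser, so it is worth testing rather than assuming. The following is a minimal sketch that compares both orderings using Python’s standard urllib.robotparser, which keeps the User-agent: * group as a fallback and checks named groups first. The helper name check, the robot name SomeOtherBot and the URL are assumptions for illustration, and other parsers may not behave the same way.

```python
from urllib.robotparser import RobotFileParser

SPECIFIC_FIRST = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""

CATCH_ALL_FIRST = """\
User-agent: *
Disallow: /

User-agent: Googlebot
Disallow:
"""


def check(robots_txt, user_agent, url="http://www.example.com/page.html"):
    """Return True if this parser lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


for name, rules in (("specific first", SPECIFIC_FIRST),
                    ("catch-all first", CATCH_ALL_FIRST)):
    # With this parser both orderings give the same answers:
    # Googlebot is allowed, every other robot is banned.
    print(name, check(rules, "Googlebot"), check(rules, "SomeOtherBot"))
```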
Many sites suffer from duplicate content penalties. If the same content is accessible from two or more different URLs, it is considered duplicate content. If you have /category or /archives directories, there is a chance of running into a duplicate content penalty. Either show only post excerpts on category and archive pages, or use robots.txt to ban indexing of these pages. Your robots.txt file will then look like this:
User-agent: *
Disallow: /category/
Disallow: /archives/
See the complete list of robots, a more detailed guide to robots.txt files, and tools to validate your robots.txt file.
Keep in mind that robots are not obliged to follow the robots.txt file. Don’t rely on robots.txt if you have very sensitive data that must not be indexed by any means. Use other measures such as password-protected files.
Also, banning most of your web site is not a good idea for SEO. Search engines cannot index pages they are not allowed to crawl, so a site with most of its content banned gives them little reason to visit.
One last important thing: if you use a sitemap.xml file to submit your sitemap to search engines, make sure that all the URLs listed in sitemap.xml are crawlable and that you have not accidentally banned any of them in your robots.txt file.
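This cross-check is easy to automate. Below is a minimal Python sketch, using only the standard library, that pulls the URLs out of a sitemap and lists the ones a robots.txt file bans. The function names, the sample sitemap and the sample URLs are all made up for illustration, and the verdicts are those of Python’s urllib.robotparser, which may differ from a particular search engine’s in edge cases.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

# XML namespace used by the sitemaps.org sitemap format.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

ROBOTS_TXT = """\
User-agent: *
Disallow: /category/
Disallow: /archives/
"""

# Hypothetical sitemap content, for illustration only.
SITEMAP_XML = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/</loc></url>
  <url><loc>http://www.example.com/2007/08/some-post/</loc></url>
  <url><loc>http://www.example.com/category/seo/</loc></url>
</urlset>
"""


def sitemap_urls(sitemap_xml):
    """Return every <loc> URL found in the sitemap XML text."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]


def blocked_urls(robots_txt, urls, user_agent="*"):
    """Return the URLs that robots_txt bans for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


print(blocked_urls(ROBOTS_TXT, sitemap_urls(SITEMAP_XML)))
# ['http://www.example.com/category/seo/']
```

Any URL this prints should either be removed from the sitemap or unblocked in robots.txt.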
It’s always a good idea to use Google Webmaster Tools to validate your robots.txt file. There you can check whether specific URLs are allowed or banned for robots. (Google has added features to its robots.txt tool to report syntax errors. You can also reference your sitemap.xml file from your robots.txt file.)
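A sitemap is referenced from robots.txt with a Sitemap: line containing the full URL. As a rough sketch, the Python snippet below parses such a file with the standard urllib.robotparser module and reads the sitemap URL back. The sitemap URL is an assumed example, and the site_maps() method needs Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /category/
Disallow: /archives/

Sitemap: http://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() returns the list of Sitemap: URLs, or None if there are none.
print(parser.site_maps())
# ['http://www.example.com/sitemap.xml']
```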
6 Responses
What Everybody Ought to Know About Blogging - 97 Blog Tips
August 23rd, 2007 at 12:49 pm
1. [...] How to use Robots.txt file effectively by Vijay Shinde [...]
What Everybody Ought to Know About Blogging - 97 Blog Tips : Opportunities for Life
August 23rd, 2007 at 4:38 pm
2. [...] How to use Robots.txt file effectively by Vijay Shinde [...]
31 Days to Building a Better Blog « Sztuka Spekulacji
August 5th, 2008 at 7:37 pm
3. [...] How to use Robots.txt file effectively by Vijay Shinde [...]
PPC Coach Review
February 5th, 2009 at 8:46 am
4. Great article. I’ll definitely be back. All the best, Leonel
Marilyn
June 10th, 2009 at 10:02 pm
5. Thank you for this post, it was very informative, even two years later. I’m interested to know about the effect of Disallow. Do you know if robots will stop coming back to check on your URL if they find a robots.txt like:
User-agent: *
Disallow: /
I am trying to come up with a beta release strategy. I’d like to prevent indexing during the private beta phase, then turn indexing on when the site goes public. I’d love to hear your thoughts on this matter.
Tech Maish
October 5th, 2009 at 4:49 pm
6. Is it OK if we just add this code to our robots.txt file? (For WordPress)
User-agent: *
Disallow: /
Or do we have to add some more tags?
Copyright © 2007 eTechBuzz.com | WordPress Powered Theme | Privacy policy