A guide to robots.txt files
What is Robots.txt?
Robots.txt is a simple text file, usually created by webmasters, that tells robots such as search engine crawlers which parts of a website they may crawl. Its primary purpose is to manage how search engine robots crawl a site.
For robots to find it, the robots.txt file must be placed in the top-level (root) directory of the host, for example www.example.co.uk/robots.txt.
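Because crawlers request the file from the root of each host, the robots.txt address that governs any page can be derived from the page's URL. The following is a minimal sketch using Python's standard-library urllib.parse. It assumes a full URL that includes the scheme, and robots_url is a hypothetical helper name. Each host needs its own file, so a page on the hypothetical shop.example.co.uk subdomain maps to that subdomain's robots.txt rather than to the www one.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL that governs page_url (hypothetical helper).

    Assumes page_url includes a scheme such as https://. Crawlers only
    request the file from the root of the host, so the page's path,
    query and fragment are discarded; the host (and any port) is kept.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("https://www.example.co.uk/blog/post?id=7"))
# https://www.example.co.uk/robots.txt
print(robots_url("https://shop.example.co.uk/basket"))
# https://shop.example.co.uk/robots.txt (a separate host needs its own file)
```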
Why is a Robots.txt File Important?
These are the main reasons a robots.txt file is important:
- Search engines try to avoid showing duplicate content, and one way to reduce the SEO problems it can cause is to use the robots.txt file to stop robots from crawling pages that duplicate content found elsewhere on the site.
- The robots.txt file can be used to stop compliant robots from crawling private or confidential content that shouldn’t be visible to the public. However, you should not rely on the robots.txt file alone: the file is publicly readable and robots are free to ignore it, so protect such content with other methods as well, such as password protection.
- By using the robots.txt file to block robots from crawling specific areas of your website you can save bandwidth and server resources.
- The robots.txt file can also stop robots from crawling less important content on your website, which lets them spend more of their crawl time on the content you want shown in search results.
The Basic Operations
The following are some common uses of the robots.txt file.
Allow all bots to access your entire website:
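A common way to express this is a wildcard User-agent line followed by an empty Disallow line. The minimal Python sketch below embeds those two lines and checks them with the standard-library urllib.robotparser module. The user-agent name AnyBot and the sample paths are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Wildcard group with an empty Disallow value: nothing is blocked.
ALLOW_ALL = """\
User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ALLOW_ALL.splitlines())  # parse() takes an iterable of lines

# AnyBot is a made-up robot name; any robot gets the same answer.
for path in ("/", "/products/", "/private/report.pdf"):
    print(path, parser.can_fetch("AnyBot", path))  # True for every path
```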
Block all bots from accessing your entire website:
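The usual form is a wildcard User-agent line with Disallow: /, since "/" is a prefix of every path on the site. The minimal sketch below checks this with Python's urllib.robotparser, again using the hypothetical robot name AnyBot and made-up paths.

```python
from urllib.robotparser import RobotFileParser

# Wildcard group disallowing "/", which matches every URL path.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(BLOCK_ALL.splitlines())

for path in ("/", "/products/", "/about.html"):
    print(path, parser.can_fetch("AnyBot", path))  # False for every path
```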
Allow a single robot but block all other robots from accessing your entire website:
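One way to do this is to give the chosen robot its own group with an empty Disallow line and then block everyone else with a wildcard group. In this sketch, Googlebot is only an example choice of robot, Bingbot stands in for "any other robot", and the rules are checked with Python's urllib.robotparser.

```python
from urllib.robotparser import RobotFileParser

# Googlebot (example choice) gets full access via an empty Disallow;
# every other robot falls through to the wildcard group and is blocked.
ONLY_GOOGLEBOT = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ONLY_GOOGLEBOT.splitlines())

print(parser.can_fetch("Googlebot", "/products/"))  # True
print(parser.can_fetch("Bingbot", "/products/"))    # False
```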
Block all robots from accessing a specific folder:
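This is usually written as a wildcard group whose Disallow value is the folder path. The sketch assumes a hypothetical folder named /private/ and checks the rule with Python's urllib.robotparser. Rules match by path prefix, so the trailing slash means the bare path /private is not covered by this rule.

```python
from urllib.robotparser import RobotFileParser

# Block every robot from the hypothetical /private/ folder only.
BLOCK_FOLDER = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(BLOCK_FOLDER.splitlines())

print(parser.can_fetch("AnyBot", "/private/report.html"))  # False: inside the folder
print(parser.can_fetch("AnyBot", "/blog/"))                # True: outside the folder
print(parser.can_fetch("AnyBot", "/private"))              # True: no trailing slash, so no prefix match
```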
The Robots.txt Tester tool recently added to Google’s Webmaster Tools makes it even easier to check that your file is correct and fully optimised. Below are some of the features the tool contains:
- Identify which line in the robots.txt file is blocking a specific page
- View older versions of the robots.txt file to identify historic issues
- Test specific changes to your robots.txt file before making them live
To access the tool in Webmaster Tools, open a specific account, select ‘Crawl’ from the left-hand navigation and then select ‘robots.txt Tester’.
Image source: Search Engine Watch
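The idea of testing changes before making them live can also be approximated locally. The sketch below uses Python's urllib.robotparser as a stand-in for Google's own parser. It compares a current and a proposed robots.txt against a hypothetical list of important paths and reports every path whose status would change. The helper names build_parser and changed_urls, and all rules and paths, are illustrative assumptions. Python's parser applies the first matching rule in file order, while Google uses the most specific (longest) matching rule, so the two can disagree on files that mix Allow and Disallow lines. Treat the result as a quick sanity check rather than a replacement for the tester.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content held in a string (hypothetical helper, no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def changed_urls(current: str, proposed: str, agent: str,
                 paths: list[str]) -> dict[str, tuple[bool, bool]]:
    """Return {path: (allowed_now, allowed_after)} for paths whose status would change."""
    before, after = build_parser(current), build_parser(proposed)
    changes = {}
    for path in paths:
        old, new = before.can_fetch(agent, path), after.can_fetch(agent, path)
        if old != new:
            changes[path] = (old, new)
    return changes


CURRENT = """\
User-agent: *
Disallow: /tmp/
"""

# Proposed edit (illustrative): also block the /shop/ section.
PROPOSED = """\
User-agent: *
Disallow: /tmp/
Disallow: /shop/
"""

# Hypothetical pages that should stay crawlable.
IMPORTANT = ["/", "/shop/", "/shop/basket", "/blog/robots-guide", "/tmp/cache"]

for path, (old, new) in changed_urls(CURRENT, PROPOSED, "Googlebot", IMPORTANT).items():
    print(f"{path}: allowed {old} -> {new}")
```

Here the report would flag /shop/ and /shop/basket as newly blocked, which is the kind of unintended side effect worth catching before the file goes live.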