What is a robots.txt file?
The robots.txt file is a plain text file placed in the top-level (root) directory of your website. Its main purpose is to indicate which files and folders of your website you do not want search engine crawlers to access. The file follows the Robots Exclusion Standard, a protocol with a small set of directives that let you easily target different sections of your site and different crawlers.
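For illustration, a minimal robots.txt file might look like the following (the /private/ path is a hypothetical example, not a requirement of the format):

```text
# Block all crawlers from a hypothetical /private/ directory,
# but leave the rest of the site crawlable.
User-agent: *
Disallow: /private/
```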
Why you need it for SEO
The robots.txt file controls how search engine spiders see and interact with your web pages. The first thing a search engine spider like Googlebot requests when visiting a site is the robots.txt file. If you have one, make sure it is not harming your ranking or blocking content you don't want blocked. You can use the robots.txt testing tool in Google Search Console, which warns you if you are blocking page resources that Google needs to understand your pages. The types of files you can control with robots.txt are listed below.
- Non-image files
For non-image files (that is, web pages) robots.txt should only be used to control crawling traffic, typically because you don’t want your server to be overwhelmed by Google’s crawler or to waste crawl budget crawling unimportant or similar pages on your site repeatedly.
- Image files
You can use robots.txt to prevent image files from appearing in Google search results. Note that this does not prevent other pages or users from linking directly to your images.
- Resource files
You can use robots.txt to block resource files such as unimportant image, script, or style files if you think that pages loaded without these resources will not be significantly affected by the loss.
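As a sketch of the resource-file case, the paths below are hypothetical; the pattern is simply a Disallow rule per unneeded resource or resource directory:

```text
# Block purely decorative assets that do not affect how
# pages render for the crawler (hypothetical paths).
User-agent: *
Disallow: /assets/decorative/
Disallow: /scripts/analytics.js
```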
How to create a robots.txt file
The robots.txt file is always located in the same place on any website, so it is easy to determine whether a site has one: just add "/robots.txt" to the end of the domain name, for example https://www.example.com/robots.txt.
The syntax for using the keywords is as follows:
User-agent: [the name of the robot the following rule applies to]
Disallow: [the URL path you want to block]
Allow: [the URL path of a subdirectory, within a blocked parent directory, that you want to unblock]
All robots.txt instructions result in one of the following three outcomes:
- Full allow: All content may be crawled.
- Full disallow: No content may be crawled.
- Conditional allow: The directives in the robots.txt determine the ability to crawl certain content.
If you want to disallow all web crawlers, your robots.txt file will look like:
User-agent: *
Disallow: /
If you want to give access to all crawlers, leave the Disallow directive empty (that is, "Disallow:" with no path after it).
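A full-allow file leaves the Disallow value empty:

```text
User-agent: *
Disallow:
```

An empty Disallow value blocks nothing; having no robots.txt file at all has the same effect.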
An example of conditional allow looks like:

User-agent: *
Disallow: /

User-agent: Mediapartners-Google
Allow: /
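You can check how such rules are interpreted using Python's standard urllib.robotparser module. This sketch feeds the conditional-allow rules above to the parser; the bot name "SomeBot" and the URL are illustrative:

```python
from urllib import robotparser

# Conditional allow: block every crawler except Mediapartners-Google.
rules = [
    "User-agent: *",
    "Disallow: /",
    "",
    "User-agent: Mediapartners-Google",
    "Allow: /",
]

rp = robotparser.RobotFileParser()
rp.parse(rules)

# A generic crawler is blocked everywhere...
print(rp.can_fetch("SomeBot", "https://www.example.com/page.html"))  # False
# ...while Mediapartners-Google may crawl the same page.
print(rp.can_fetch("Mediapartners-Google", "https://www.example.com/page.html"))  # True
```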
You can also specify the names of files or folders which you do not want crawled. For example, if you want to restrict your photos folder, the syntax would be:

User-agent: *
Disallow: /photos
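Again as a sketch with Python's urllib.robotparser (the URLs are illustrative), you can confirm that this rule blocks any path starting with /photos while leaving other paths crawlable:

```python
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.parse(["User-agent: *", "Disallow: /photos"])

# Anything whose path begins with /photos is blocked...
print(rp.can_fetch("*", "https://www.example.com/photos/cat.jpg"))  # False
# ...but other paths remain crawlable.
print(rp.can_fetch("*", "https://www.example.com/about.html"))      # True
```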
Test your robots.txt file
If you have access and permission, you can use the robots.txt tester in Google Search Console (the tool is not public and requires signing in) to test your robots.txt file. The tester lets you see the effect of changes as you adjust the file, before you publish them.
If you use a robots.txt file, make sure it is being used properly. An incorrect robots.txt file can block Googlebot from crawling your pages, which can keep them out of Google's index. Ensure you are not blocking pages that Google needs in order to rank your site.