What is the robots.txt file and why do you need it for SEO?

What is a robots.txt file

The robots.txt file is a plain text file placed in the top-level directory of your website. Its main purpose is to tell search engine crawlers which files and folders of your site they should not access. The file uses the Robots Exclusion Standard, a protocol with a small set of directives that let you target specific sections of your site and specific crawlers.

Why you need it for SEO

The robots.txt file controls how search engine spiders see and interact with your web pages. It is the first thing a search engine spider like Googlebot looks at when it visits a site. If you have one, make sure it is not harming your ranking or blocking content you actually want crawled. You can use Google's testing tool, which will warn you if you are blocking page resources that Google needs to understand your pages. The different types of files you can control with robots.txt are listed below; a short Python sketch after the list shows how to check each type against your own file.

  • Non-image files

For non-image files (that is, web pages) robots.txt should only be used to control crawling traffic, typically because you don’t want your server to be overwhelmed by Google’s crawler or to waste crawl budget crawling unimportant or similar pages on your site repeatedly.

  • Image files

You can use robots.txt to prevent image files from appearing in Google search results.

  • Resource files

You can use robots.txt to block resource files such as unimportant image, script, or style files if you think that pages loaded without these resources will not be significantly affected by the loss.
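
If you want to see how a particular crawler treats specific URLs under your live robots.txt, Python's standard-library urllib.robotparser can check this for you. Below is a minimal sketch; the domain and paths are placeholders.

from urllib.robotparser import RobotFileParser

# Load the live robots.txt (placeholder domain)
rp = RobotFileParser()
rp.set_url("https://www.yourwebsite.com/robots.txt")
rp.read()

# Ask whether Googlebot may crawl a page, an image file, and a resource file
for url in (
    "https://www.yourwebsite.com/blog/some-post",    # non-image file (web page)
    "https://www.yourwebsite.com/images/logo.png",   # image file
    "https://www.yourwebsite.com/assets/app.js",     # resource file
):
    print(url, "->", "allowed" if rp.can_fetch("Googlebot", url) else "blocked")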

How to create a robots.txt file

The robots.txt file is usually located in the same place on any website, so it is easy to determine if a site has one. Just add “/robots.txt” to the end of a domain name as shown below.

www.yourwebsite.com/robots.txt
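
If you would rather check this from a script than a browser, the sketch below simply requests that URL with Python's urllib; the domain is a placeholder, and a 404 response means the site serves no robots.txt.

from urllib import error, request

# Placeholder domain: append /robots.txt to the site root
url = "https://www.yourwebsite.com/robots.txt"

try:
    with request.urlopen(url, timeout=10) as resp:
        print("Found robots.txt, HTTP", resp.status)
        print(resp.read().decode("utf-8", errors="replace"))
except error.HTTPError as err:
    print("No robots.txt served, HTTP", err.code)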

The syntax for using the keywords is as follows:

User-agent: [the name of the robot the following rule applies to]
Disallow: [the URL path you want to block]
Allow: [the URL path of a subdirectory, within a blocked parent directory, that you want to unblock]

All robots.txt instructions result in one of the following three outcomes:

  • Full allow: All content may be crawled.
  • Full disallow: No content may be crawled.
  • Conditional allow: The directives in the robots.txt file determine which content may be crawled.

If you want to disallow all web crawlers, your robots.txt file will look like this:

User-agent: *
Disallow: /

If you want to give all crawlers full access, replace Disallow: / with an empty Disallow: directive.

An example of a conditional allow looks like this:

User-agent: *
Disallow: /
User-agent: Mediapartners-Google
Allow: /
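
To confirm that these rules block everything for ordinary crawlers while letting Mediapartners-Google through, you can feed them to Python's urllib.robotparser; the page path below is just an example.

from urllib.robotparser import RobotFileParser

rules = [
    "User-agent: *",
    "Disallow: /",
    "",
    "User-agent: Mediapartners-Google",
    "Allow: /",
]

rp = RobotFileParser()
rp.parse(rules)

print(rp.can_fetch("Googlebot", "/some-page"))             # False: caught by the * group
print(rp.can_fetch("Mediapartners-Google", "/some-page"))  # True: explicitly allowed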

You can also specify the files or folders that you do not want to be crawled. For example, if you want to restrict your photos folder, the syntax would be:

User-agent: *
Disallow: /photos
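
Note that Disallow rules match by path prefix, so this rule covers everything inside the photos folder. The quick check below, again with urllib.robotparser, illustrates this; the file paths are only examples.

from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /photos",
])

print(rp.can_fetch("Googlebot", "/photos/summer/beach.jpg"))  # False: under the blocked folder
print(rp.can_fetch("Googlebot", "/blog/index.html"))          # True: not matched by the rule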

Test your robots.txt file

If you have access and permission, you can use Google Search Console to test your robots.txt file. Instructions are provided in the Search Console documentation (the tester tool is not public and requires a login). The tester lets you see the effect of changes as you adjust your robots.txt file.
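
If you cannot use the Search Console tool, a rough local check is possible with Python's urllib.robotparser before you upload the file. The sketch below assumes your draft rules are saved locally as robots.txt and that the listed user agents and paths are the ones you care about.

from urllib.robotparser import RobotFileParser

# Parse a draft robots.txt saved locally (hypothetical filename)
with open("robots.txt", encoding="utf-8") as f:
    rp = RobotFileParser()
    rp.parse(f.read().splitlines())

# Example user-agent / path combinations to verify
checks = [
    ("Googlebot", "/"),
    ("Googlebot", "/photos/holiday.jpg"),
    ("Googlebot-Image", "/photos/holiday.jpg"),
]
for agent, path in checks:
    verdict = "allowed" if rp.can_fetch(agent, path) else "blocked"
    print(agent, path, "->", verdict)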

Final Thoughts

If you use a robots.txt file, make sure it is being used properly. An incorrect robots.txt file can block Googlebot from crawling your pages. Make sure you are not blocking pages or resources that Google needs in order to rank your site.
