robots.txt file

Discussion in 'Search Engine Optimization' started by elsacross, Mar 23, 2011.

  1. elsacross

    elsacross New Member Webmaster

    A robots.txt file has no direct SEO benefit, but using one incorrectly can prevent your website from being crawled properly. This means that one or more of your web pages (or your entire site) may not be indexed and ranked by the search engines.

    The most important thing to understand about a robots.txt file is that you may not need it at all! Many webmasters create one (incorrectly) and upload it to their website only to find their site missing from Google after the next update.

    My recommendation is not to use a robots.txt file unless you absolutely have to. After all, it is impossible to mess up a file that doesn't exist.

    When do you need a robots.txt file? Only when you have directories or files that you don't want the robots to crawl. A good example would be if you have a download page for a software package or ebook that you sell.

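    You obviously wouldn't want to have your download page listed in Google where people could find it and download your item for free.

    You can ask the search engine robots to stay away from that page with a robots.txt file. Keep in mind that this only tells well-behaved robots not to crawl the page. It does not password-protect anything, and the robots.txt file itself can be read by anyone who requests it.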

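    Note: Create your robots.txt file in Notepad (or any other plain-text editor) as a plain text file named robots.txt, then upload it to your server's root directory so that it is reachable at /robots.txt. Robots only look for it there.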
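
    To make the "root directory" point concrete, here is a minimal Python sketch that works out where a robot would look for the file, given any page on the site. The function name robots_txt_url and the www.example.com address are placeholders for illustration, not anything from a real site.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt address for the site that hosts page_url.

    Hypothetical helper: it keeps the scheme and host of the page and
    replaces everything else with the fixed path /robots.txt.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# The page sits in a subdirectory, but the file is still expected at the root.
print(robots_txt_url("https://www.example.com/secret/software-download.html"))
```

    A robots.txt file uploaded anywhere else, such as inside /secret/, would not be picked up.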

    There are several ways to use a robots.txt file, but the simplest and safest is to disallow a particular directory.

    For example, say you have a download page called software-download.html. You could create a special directory called /secret and place the download page in that directory, so that its address becomes /secret/software-download.html. You would then create a robots.txt file with these two lines:

    User-agent: *
    Disallow: /secret/

    The * means that all robots (including googlebot) should follow the lines below it. In this case, all robots that respect robots.txt directives (some don't) will skip the /secret/ folder and every file in it.
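
    If you want to check a rule like this before uploading it, Python's standard urllib.robotparser module can read the lines and tell you what a compliant robot would do. This is only a rough sketch: the helper name is_allowed, the robot name ExampleBot, and the www.example.com addresses are made up for the test, and real search engines may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The two-line robots.txt from the example above.
RULES = """\
User-agent: *
Disallow: /secret/
"""


def is_allowed(rules: str, agent: str, url: str) -> bool:
    """Hypothetical helper: parse robots.txt text and check one URL."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


secret_page = "https://www.example.com/secret/software-download.html"
home_page = "https://www.example.com/index.html"

print(is_allowed(RULES, "ExampleBot", secret_page))  # page inside /secret/
print(is_allowed(RULES, "ExampleBot", home_page))    # page outside /secret/
```

    The first check should come back False (blocked) and the second True (allowed). If both come back False, the file is blocking more than you intended.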

    Another way is to disallow the file itself, using its full path from the site root. The path must start with a slash:

    User-agent: *
    Disallow: /software-download.html

    If you want to keep googlebot (or any other single robot) out while allowing all others, you must name the robot you want to exclude in the User-agent line. For example, the following robots.txt file would prevent googlebot from crawling the /secret directory while allowing all others:

    User-agent: googlebot
    Disallow: /secret/
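
    The same kind of check works for a robot-specific rule. In this sketch (again using Python's standard urllib.robotparser, with ExampleBot and www.example.com as placeholder names), the rule applies to googlebot only, so any other robot is left free to crawl the directory.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse([
    "User-agent: googlebot",
    "Disallow: /secret/",
])

url = "https://www.example.com/secret/software-download.html"

# The named robot is blocked; a robot with no matching entry is not.
googlebot_allowed = parser.can_fetch("Googlebot", url)
other_allowed = parser.can_fetch("ExampleBot", url)

print("Googlebot allowed:", googlebot_allowed)
print("ExampleBot allowed:", other_allowed)
```

    Here Googlebot should be reported as not allowed and ExampleBot as allowed, because the file has no entry that covers it.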

    You can also disallow crawling of multiple directories and files by adding an entry for each one:

    User-agent: *
    Disallow: /secret/
    Disallow: /cgi-bin/
    Disallow: /images/
    Disallow: /software-download.html
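
    With several entries it is easy to lose track of what is blocked, so here is a small sketch that runs a handful of sample addresses through Python's standard urllib.robotparser. It assumes the file rule is written with a leading slash (/software-download.html). The sample paths, the robot name ExampleBot, and www.example.com are invented for the test.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: *
Disallow: /secret/
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /software-download.html
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# Made-up sample paths: one per rule, plus one that no rule covers.
SAMPLE_PATHS = [
    "/secret/ebook.pdf",
    "/cgi-bin/form.cgi",
    "/images/logo.png",
    "/software-download.html",
    "/index.html",
]

results = {
    path: parser.can_fetch("ExampleBot", "https://www.example.com" + path)
    for path in SAMPLE_PATHS
}

for path, allowed in results.items():
    print(path, "allowed" if allowed else "blocked")
```

    Only /index.html should show up as allowed. Everything else in the list matches one of the Disallow lines.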

    A robots.txt file can be a powerful tool when used correctly, but when used incorrectly it can leave your private data exposed to the public and/or ruin your search engine rankings. Rule of thumb: Use a robots.txt file only when necessary and make sure you use it correctly.
