What is it for?
This file is intended for "spiders". Spiders are programs that explore the Web and enable search engines to discover your site and analyze its contents.
By leaving instructions for these spiders, you can:
- forbid certain spiders (also called "agents" or "bots") from exploring your site,
- forbid all spiders from exploring certain pages of your site,
- forbid certain spiders from exploring certain pages.
Note that the "robots" meta tag can also be added to each of your pages to prohibit indexing.
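For example, the following meta tag, placed in a page's <head> section, asks robots not to index that page and not to follow its links (the exact directives you need may differ):
<meta name="robots" content="noindex, nofollow">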
Syntax
The syntax accepted by the robots allows a little flexibility:
- spaces are optional
- the use of upper or lower case doesn't matter (it is not case-sensitive)
Every line must start with one of the 3 following options:
# | Comment. This will be ignored by robots.
User-Agent: | This field can be followed by * or by the exact name of an existing spider.
Disallow: | This field can be followed by only ONE directory or file name.
The typical syntax is as follows:
User-Agent: AAAAAAA
Disallow: /BBBBBBB
Disallow: /CCCCCCC
User-Agent: AAAAAAA'
Disallow: /BBBBBBB'
Disallow: /CCCCCCC'
etc.
where AAAAAAA and AAAAAAA' are the names of the robots, and BBBBBBB, BBBBBBB', CCCCCCC and CCCCCCC' are the names of the files and/or directories that you wish to hide from these robots.
If you use * instead of the name of a robot, the following lines will be treated as indexing prohibitions for ALL robots.
If you use "/" in place of the file name, NO file on the site will be indexed.
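Combining these two rules, the following minimal robots.txt forbids ALL robots from indexing ANY file on the site:
User-Agent: *
Disallow: /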
Building the robots.txt file
LinkSpirit is a free utility, downloadable from this site, that enables you to easily create or edit "robots" meta tags and the robots.txt file.
This utility checks the syntax of your robots.txt file against the rules reproduced at http://www.robotstxt.org/wc/norobots.html and the list of robots given at http://www.robotstxt.org/wc/active/html/index.html.
If you wish to proceed manually, you just need a text editor (WordPad, for example) to create a text file (with the .txt extension).
Here is a typical example of the contents of a robots.txt file:
User-Agent: *
Disallow:/download/dwnld.php
Disallow:/sources/
Disallow:/admin/perso/
a) User-agent: * indicates to the spider of any search engine that access to the site is subject to the following limitations:
b) Disallow:/download/dwnld.php means that the page "dwnld.php" located in the "download" directory must not be indexed.
c) Disallow:/sources/ means that none of the files contained in the "sources" directory may be indexed.
d) Disallow:/admin/perso/ means that none of the files contained in the "admin/perso" directory may be indexed.
Note: When transferring this file to your FTP server, be sure to use ASCII transfer mode.
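With a classic command-line FTP client, for example, this typically means switching to ASCII mode before uploading; the server name below is only a placeholder:
ftp ftp.example.com
ascii
put robots.txt
quit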
General rules
a) Only one robots.txt file may exist for your whole site, and it must be located at its root.
b) If you wish to impose different rules on different search engines, you can (and must) create several User-agent sections, as shown in the example after this list.
c) The name of the file (robots.txt) must be written entirely in lowercase.
d) Put only one directory or file name after each Disallow directive. "Disallow: file1.htm, file2.htm" is not allowed, and neither is "Disallow: Dir1/, Dir2/".
e) Transfer your robots.txt file in ASCII mode. Many FTP clients alter the encoding of .txt files when they are not transferred in ASCII mode; this is the cause of the most frequently encountered problems with robots.txt files.
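As an illustration of rule b), a file with a separate section per robot might look like this (the robot names and paths are only examples):
User-agent: Googlebot
Disallow: /private/

User-agent: Slurp
Disallow: /images/
Disallow: /private/

User-agent: *
Disallow: /private/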
Standard rules
a) The asterisk (*) is only accepted in the User-agent field. "Disallow: *", "Disallow: *.*" and "Disallow: *.gif" are not allowed.
b) The "Allow" field does not exist.
Google rules
a) The asterisk (*) and the dollar sign ($) can be used in the Disallow field. They make it possible to hide all files of a particular type: "Disallow: /*.gif$" will hide all .gif files.
b) The Allow field exists and makes it possible to define exceptions to a general prohibition.
CAUTION: these "Google rules" can make your robots.txt file incomprehensible to other robots if they appear in a "User-agent: *" section. Always put these particular instructions after "User-agent: Googlebot".
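Put together, a Google-specific section using these extensions could look like this (the paths are only illustrative): all .gif files are hidden, except one explicitly allowed as an exception.
User-agent: Googlebot
Disallow: /*.gif$
Allow: /images/logo.gif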