Yahoo! Bot (Yahoo! Slurp) – Supporting wildcards in robots.txt

Yahoo! has announced that Slurp now supports two additional characters in robots.txt for more advanced pattern handling: ‘*’ and ‘$’. I will briefly explain the use and syntax of each below.

Use of ‘*’ in robots.txt:

You can now use the ‘*’ symbol in the URL path that follows a Disallow: (or Allow:) directive to denote a wildcard match. Normally ‘*’ appears in the User-agent field, where it means ‘any robot’. In the same way, ‘*’ in a robots directive for Yahoo! Slurp matches any sequence of characters in your URL. For example:

User-Agent: Yahoo! Slurp
Allow: /india*/
Disallow: /*_image*.html
Disallow: /*?sessionid

The robots directives above will:

  • allow all directories that begin with ‘india’, such as ‘/india_html/’ or ‘/india_images/’, to be crawled
  • disallow any files or directories whose URL contains ‘_image’ followed later by ‘.html’, such as ‘/card_image.html’ or ‘/store_image/product.html’, from being crawled
  • disallow any files with ‘?sessionid’ in their URL string, such as ‘/cart.php?sessionid=2035kits’, from being crawled
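
The wildcard behaviour described above can be tried out with a small Python sketch. This is only an illustration and not Yahoo!'s actual matching code: the helper names wildcard_to_regex and matches are made up for this example, and it assumes that a pattern is compared against the start of the URL path and that ‘*’ stands for any run of characters.

```python
import re

def wildcard_to_regex(pattern):
    """Translate a robots.txt path pattern containing '*' into a compiled regex.

    Hypothetical helper: every literal piece is escaped, and each '*'
    becomes '.*' (any sequence of characters, including none).
    """
    pieces = [re.escape(piece) for piece in pattern.split("*")]
    return re.compile(".*".join(pieces))

def matches(pattern, url_path):
    # re.match anchors only at the start, so the pattern acts as a prefix
    # and anything may follow the matched part of the URL.
    return wildcard_to_regex(pattern).match(url_path) is not None

# Patterns and sample URLs taken from the example above, plus one
# URL that should not match.
samples = [
    ("/india*/", "/india_html/"),
    ("/*_image*.html", "/store_image/product.html"),
    ("/*?sessionid", "/cart.php?sessionid=2035kits"),
    ("/*_image*.html", "/images/logo.gif"),
]
for pattern, path in samples:
    print(pattern, path, matches(pattern, path))
```

The sketch only answers whether a single pattern matches a single URL. How a crawler chooses between an Allow and a Disallow that both match is a separate question and is not modelled here.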

Use of ‘$’ in robots.txt:

You can also use ‘$’ in robots directives for Slurp to anchor the match to the end of the URL string. Without this symbol, Yahoo! Slurp treats each directive as a prefix and matches every URL that begins with it, rather than only URLs that end with the pattern. For example:

User-Agent: Yahoo! Slurp
Disallow: /*.gif$
Allow: /*?$

The robots directives above will:

  • disallow all files ending in ‘.gif’ across your entire site. Note that without the ‘$’, this would disallow all files containing ‘.gif’ in their file path
  • allow all files ending in ‘?’ to be crawled. This does not automatically allow files that merely contain ‘?’ somewhere in the URL string
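
To see the difference the ‘$’ makes, here is a rough Python sketch. It is not Yahoo!'s implementation: rule_matches is a made-up helper name, and the sketch assumes that ‘$’ is only meaningful as the last character of a pattern and that patterns without it behave as prefixes.

```python
import re

def rule_matches(pattern, url_path):
    """Return True if a robots.txt pattern matches url_path.

    Hypothetical helper: '*' matches any run of characters, and a
    trailing '$' pins the match to the end of the URL string.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]  # drop the '$' before building the regex
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchored:
        regex += r"\Z"  # require the URL to end right here
    # re.match anchors at the start; without '\Z' the rest of the URL is free.
    return re.match(regex, url_path) is not None

# With '$' only URLs that end in '.gif' match; without it, any URL
# that contains '.gif' after the leading '/' matches as well.
for pattern in ("/*.gif$", "/*.gif"):
    for path in ("/images/logo.gif", "/images/logo.gif?size=2"):
        print(pattern, path, rule_matches(pattern, path))

# '/*?$' matches a URL ending in '?' but not one that merely contains '?'.
for path in ("/search?", "/search?q=india"):
    print("/*?$", path, rule_matches("/*?$", path))
```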

A Word from the Author:

Did you think that Yahoo! never supported the ‘Allow’ directive? They do, and they have confirmed it on their official blog too. You can get updates about Yahoo! Slurp and related topics from here. To learn more about Yahoo! Slurp and the new wildcard symbols, read more here.

