Controlling msn bot


MSNBot, the MSN Search web crawler, lets website owners control which pages MSN Search indexes and how often MSNBot accesses their website.

You can prevent MSNBot (a pain in the butt for many) and other standards-compliant crawlers from crawling a server, or from collecting information and links from specific pages on your website, by using a robots.txt file, meta tags, or both.
This is a fairly popular thing to do, because not everybody is happy with the annoying bot.

Note
If other sites link to your site, your site's URL and any text you include in HTML anchor tags may still be added to our index.
However, your site content is not added to the index.

Use the robots.txt file to control access to your website or part of the server

To control how and when your website is crawled, create a robots.txt file in the top-level (root) directory of your website.
In the robots.txt file, you can specify which web crawlers to allow or block.
Note that while MSNBot complies with the standards for robots.txt, not all web crawlers comply.

To conform to the Robots Exclusion Standard, MSNBot searches for robots.txt.
When you create the file, make sure that the file is named robots.txt.
Crawling and indexing restrictions may not work correctly if you name the file robot.txt.

Each time MSNBot crawls your website, it looks in your web server's root directory for a robots.txt file.
If the file exists, MSNBot checks whether it is an allowed user agent and whether any crawling or indexing restrictions have been set.
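
To see where a crawler will look for the file, the sketch below derives the root-level robots.txt address from any page address. The function name robots_txt_url and the example.com addresses are made up for illustration; this only shows the "root directory" rule and is not MSNBot's actual code.

```python
from urllib.parse import urljoin


def robots_txt_url(page_url: str) -> str:
    """Return the root-level robots.txt URL for the site that serves page_url."""
    # An absolute path ("/robots.txt") replaces the page's own path and query,
    # so the result always points at the top-level directory of the host.
    return urljoin(page_url, "/robots.txt")


print(robots_txt_url("http://www.example.com/docs/page.html?id=7"))
# prints: http://www.example.com/robots.txt
```

A robots.txt file placed in a subdirectory (for example /docs/robots.txt) is never requested under this rule, which is why the file has to sit in the root.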

To set which web crawlers can access your website, use the syntax in the table below for your robots.txt file. MSN Search also includes image searching provided by Picsearch.
If you do not want your images indexed, you can block the Picsearch crawler, Psbot, as described in the following table.

Text strings in the robots.txt file are not case-sensitive.

To do this:
    Use this syntax:

Allow all robots full access and prevent "file not found: robots.txt" errors:
    Create an empty robots.txt file

Allow all robots complete access:
    User-agent: *
    Disallow:

Allow only MSNBot access:
    User-agent: msnbot
    Disallow:

    User-agent: *
    Disallow: /

Exclude all robots from the entire server:
    User-agent: *
    Disallow: /

Exclude only MSNBot:
    User-agent: msnbot
    Disallow: /

Exclude only Psbot (Picsearch):
    User-agent: psbot
    Disallow: /
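
If you want to check a rule set before uploading it, Python's standard urllib.robotparser module can evaluate it locally. The sketch below tests the "allow only MSNBot" rules; the helper name is_allowed and the example.com address are assumptions for illustration, and the parser reflects how a standards-compliant crawler reads the file, not necessarily every detail of MSNBot's own behavior.

```python
from urllib.robotparser import RobotFileParser

# The "Allow only MSNBot access" rules from the table above.
ALLOW_ONLY_MSNBOT = """\
User-agent: msnbot
Disallow:

User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


for agent in ("msnbot", "psbot"):
    print(agent, is_allowed(ALLOW_ONLY_MSNBOT, agent, "http://www.example.com/index.html"))
# prints:
# msnbot True
# psbot False
```

An empty rule set (the "empty robots.txt file" case) allows every crawler under the same check.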

Restrict indexing and link crawling within your website

You can block MSNBot from crawling specific file types on your website by naming msnbot as the user-agent and adding a Disallow line for each file type to exclude.

To do this:
    Use this syntax:

Restrict MSNBot from indexing specific file types:
    User-agent: msnbot
    Disallow: /*.[file extension]$
    (the "$" is required)

Examples:
    User-agent: msnbot
    Disallow: /*.PDF$
    Disallow: /*.jpeg$
    Disallow: /*.exe$
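
The wildcard syntax is easier to reason about with a small matcher. The sketch below assumes that "*" stands for any run of characters and that a trailing "$" anchors the pattern to the end of the path; it also matches case-insensitively by default, following the statement above that text strings are not case-sensitive. The function rule_matches is hypothetical and is not MSNBot's implementation. Note that Python's own urllib.robotparser does not interpret these wildcards, which is why a hand-written matcher is used here.

```python
import re


def rule_matches(pattern: str, path: str, ignore_case: bool = True) -> bool:
    """Return True if a Disallow pattern using * and $ applies to path."""
    if not pattern:
        # An empty Disallow value blocks nothing.
        return False
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape regex metacharacters in each literal piece, then join the
    # pieces with ".*" so every "*" matches any run of characters.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    flags = re.IGNORECASE if ignore_case else 0
    # With "$" the whole path must match; without it, a prefix match is enough.
    matcher = re.fullmatch if anchored else re.match
    return matcher(regex, path, flags) is not None


print(rule_matches("/*.PDF$", "/docs/report.pdf"))       # True
print(rule_matches("/*.PDF$", "/docs/report.pdf.html"))  # False
print(rule_matches("/*.exe", "/tools/setup.exe.bak"))    # True
```

The last call shows why the "$" matters: without it the pattern behaves as a prefix match and also catches paths that merely contain the extension.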

Use metadata tags to control page indexing and link crawling

You can allow MSNBot to crawl your website and still restrict access to specific web pages and documents by using the noindex and nofollow meta tags within the page code.
The noindex tag allows the web page to be retrieved by MSNBot, but blocks indexing of its content.
The nofollow tag blocks the web crawler from following links in the web page that go to other web pages or documents.
Note that not all web crawling robots obey these tags.

If you want to set access and indexing restrictions for your website, replace the user-agent name robots with msnbot or "*" in the tag syntax examples below. You can use each tag alone or combine both tags into a single meta tag.

To do this:
    Add this to the page header:

Restrict MSNBot from indexing a page:
    <META NAME="msnbot" CONTENT="noindex" />

Restrict all robots from indexing a page:
    <META NAME="*" CONTENT="noindex" />

Restrict MSNBot from following links on a page:
    <META NAME="msnbot" CONTENT="nofollow" />

Restrict all robots from following links on a page:
    <META NAME="robots" CONTENT="nofollow" />

Block MSNBot from both indexing and following links:
    <META NAME="msnbot" CONTENT="noindex,nofollow" />

Prevent MSNBot from caching a page:
    <META NAME="msnbot" CONTENT="nocache" />
    or
    <META NAME="msnbot" CONTENT="noarchive" />
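
To confirm that a page really carries the tags you intended, you can extract them with Python's standard html.parser module. The sketch below collects the directives that apply to a given crawler; the names RobotsMetaParser and page_directives are hypothetical, and treating "robots" and "*" as applying to every crawler is an assumption based on the tag examples above.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect robots meta directives that apply to one crawler."""

    def __init__(self, crawler: str = "msnbot"):
        super().__init__()
        # Assumption: "robots" and "*" address all crawlers, as in the table above.
        self.names = {crawler.lower(), "robots", "*"}
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <META NAME=...> works.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        if name in self.names:
            content = attributes.get("content") or ""
            for directive in content.split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())


def page_directives(html: str, crawler: str = "msnbot") -> set:
    """Return the set of meta directives in html that apply to crawler."""
    parser = RobotsMetaParser(crawler)
    parser.feed(html)
    parser.close()
    return parser.directives


page = '<html><head><META NAME="msnbot" CONTENT="noindex,nofollow" /></head></html>'
print(sorted(page_directives(page)))
# prints: ['nofollow', 'noindex']
```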

Limit crawl frequency

If you occasionally get high traffic from MSNBot, you can add a crawl delay parameter to the robots.txt file to set how often MSNBot can access your website, given as a number of seconds between requests.
To do this, add this syntax to your robots.txt file:

User-agent: msnbot
Crawl-delay: 120
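
To get a feel for what a given delay means in practice, the sketch below reads the value back with Python's standard urllib.robotparser and converts it to a rough daily ceiling. It assumes the crawler waits the full delay between consecutive requests; the helper names crawl_delay_seconds and max_fetches_per_day are made up for illustration.

```python
from typing import Optional
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: msnbot
Crawl-delay: 120
"""


def crawl_delay_seconds(robots_txt: str, user_agent: str) -> Optional[int]:
    """Return the Crawl-delay that applies to user_agent, or None if unset."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)


def max_fetches_per_day(delay_seconds: int) -> int:
    """Upper bound on daily requests if each one waits delay_seconds."""
    return (24 * 60 * 60) // delay_seconds


delay = crawl_delay_seconds(ROBOTS_TXT, "msnbot")
print(delay)                       # prints: 120
print(max_fetches_per_day(delay))  # prints: 720
```

Under that assumption, a delay of 120 seconds caps MSNBot at about 720 requests per day, so a large site may want a smaller value to avoid slowing down indexing too much.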

If you still find that MSNBot is placing too high a load on your web server, contact MSN Search Site Owner Support.

When you contact us about an issue, include the following information so that we can help you more quickly:

  • The address of the affected website and the contents of its robots.txt file
  • The date range when the issue occurred
  • The access logs
