Transfer Domain Names

  • Subscribe to our RSS feed.
  • Twitter
  • StumbleUpon
  • Reddit
  • Facebook
  • Digg

Wednesday, 2 December 2009

New User Agent for News

Posted on 08:02 by Unknown
Webmaster Level: Intermediate

Today we are announcing a new user agent for robots.txt called Googlebot-News that gives publishers even more control over their content. In case you haven't heard of robots.txt, it's a web-wide standard that has been in use since 1994 and which has support from all major search engines and well-behaved "robots" that process the web. When a search engine checks whether it has permission to crawl and index a web page, the "check if we're allowed to crawl this page" mechanism is robots.txt.

Publishers could easily contact us via a form if they didn't want to be included in Google News but did want to be in Google's web search index. Now, publishers can manage their content in Google News in an even more automated way. Site owners can just add Googlebot-News specific directives to their robots.txt file. Similar to the Googlebot and Googlebot-Image user agents, the new Googlebot-News user agent can be used to specify which pages of a website should be crawled and ultimately appear in Google News.

Here are a few examples for publishers:

Include pages in both Google web search and News:
User-agent: Googlebot
Disallow:

This is the easiest case. In fact, a robots.txt file is not even required for this case.

Include pages in Google web search, but not in News:
User-agent: Googlebot
Disallow:

User-agent: Googlebot-News
Disallow: /

This robots.txt file says that no files are disallowed from Google's general web crawler, called Googlebot, but the user agent "Googlebot-News" is blocked from all files on the website.

Include pages in Google News, but not Google web search:
User-agent: Googlebot
Disallow: /

User-agent: Googlebot-News
Disallow:

When parsing a robots.txt file, Google obeys the most specific directive. The first two lines tell us that Googlebot (the user agent for Google's web index) is blocked from crawling any pages from the site. The next directive, which applies to the more specific user agent for Google News, overrides the blocking of Googlebot and gives permission for Google News to crawl pages from the website.

Block different sets of pages from Google web search and Google News:
User-agent: Googlebot
Disallow: /latest_news

User-agent: Googlebot-News
Disallow: /archives

The pages blocked from Google web search and Google News can be controlled independently. This robots.txt file blocks recent news articles (URLs in the /latest_news folder) from Google web search, but allows them to appear on Google News. Conversely, it blocks premium content (URLs in the /archives folder) from Google News, but allows them to appear in Google web search.

Stop Google web search and Google News from crawling pages:
User-agent: Googlebot
Disallow: /

This robots.txt file tells Google that Googlebot, the user agent for our web search crawler, should not crawl any pages from the site. Because no specific directive for Googlebot-News is given, our News search will abide by the general guidance for Googlebot and will not crawl pages for Google News.

For some queries, we display results from Google News in a discrete box or section on the web search results page, along with our regular web search results. We sometimes do this for Images, Videos, Maps, and Products, too. This is known as Universal search results. Since Google News powers Universal "News" search results, if you block the Googlebot-News user agent then your site's news stories won't be included in Universal search results.

We are currently testing our support for the new user agent. If you see any problems please let us know. Note that it is possible for Google to return a link to a page in some situations even when we didn't crawl that page. If you'd like to read more about robots.txt, we provide additional documentation on our website. We hope webmasters will enjoy the flexibility and easier management that the Googlebot-News user agent provides.

Written by Jonathan Simon, Webmaster Trends Analyst
Email ThisBlogThis!Share to XShare to Facebook
Posted in crawling and indexing, intermediate | No comments
Newer Post Older Post Home

0 comments:

Post a Comment

Subscribe to: Post Comments (Atom)

Popular Posts

  • Switching to the new website verification API
    Webmaster level: advanced Just over a year ago we introduced a new API for website verification for Google services. In the spirit of keepi...
  • Structured Data dashboard: new markup error reports for easier debugging
    Since we launched the Structured Data dashboard last year, it has quickly become one of the most popular features in Webmaster Tools. We’ve...
  • "It's on Google! YAY!" - Getting webmaster help in our forum
    Webmaster level: all It's been a bit more than five years now that our Webmaster Help Forum has been up and running, helping webmasters...
  • Supporting rel="canonical" HTTP Headers
    Webmaster level: Advanced Based on your feedback, we’re happy to announce that Google web search now supports link rel="canonical"...
  • Getting started with structured data
    Webmaster level: All If Google understands your website’s content in a structured way, we can present that content more accurately and more ...
  • Responsive design – harnessing the power of media queries
    Webmaster Level: Intermediate / Advanced We love data, and spend a lot of time monitoring the analytics on our websites. Any web developer d...
  • Introducing the Structured Data Dashboard
    Webmaster level: All Structured data is becoming an increasingly important part of the web ecosystem. Google makes use of structured data in...
  • Tell us what you think!
    (Cross-posted on the Google Product Ideas Blog ) The Webmaster Central team does our best to support the webmaster community via Webmaster T...
  • Improving URL removals on third-party sites
    Webmaster level: all Content on the Internet changes or disappears, and occasionally it's helpful to have search results for it updated ...
  • Protect your site from spammers with reCAPTCHA
    Webmaster Level: All If you allow users to publish content on your website, from leaving comments to creating user profiles , you’ll likely...

Categories

  • advanced
  • beginner
  • crawling and indexing
  • events
  • feedback and communication
  • general tips
  • hacked sites
  • hreflang
  • images
  • intermediate
  • localization
  • malware
  • mobile
  • performance
  • products and services
  • search results
  • sitemaps
  • structured data
  • url removals
  • verification
  • video
  • webmaster guidelines
  • webmaster tools

Blog Archive

  • ►  2014 (2)
    • ►  January (2)
  • ►  2013 (35)
    • ►  December (6)
    • ►  November (1)
    • ►  October (2)
    • ►  September (2)
    • ►  August (4)
    • ►  July (2)
    • ►  June (4)
    • ►  May (3)
    • ►  April (2)
    • ►  March (6)
    • ►  February (2)
    • ►  January (1)
  • ►  2012 (55)
    • ►  December (3)
    • ►  November (1)
    • ►  October (5)
    • ►  September (2)
    • ►  August (5)
    • ►  July (5)
    • ►  June (6)
    • ►  May (7)
    • ►  April (7)
    • ►  March (6)
    • ►  February (2)
    • ►  January (6)
  • ►  2011 (75)
    • ►  December (7)
    • ►  November (2)
    • ►  October (5)
    • ►  September (8)
    • ►  August (10)
    • ►  July (5)
    • ►  June (10)
    • ►  May (8)
    • ►  April (6)
    • ►  March (6)
    • ►  February (5)
    • ►  January (3)
  • ►  2010 (81)
    • ►  December (9)
    • ►  November (9)
    • ►  October (4)
    • ►  September (8)
    • ►  August (6)
    • ►  July (2)
    • ►  June (6)
    • ►  May (6)
    • ►  April (12)
    • ►  March (11)
    • ►  February (1)
    • ►  January (7)
  • ▼  2009 (52)
    • ▼  December (7)
      • Helping webmasters from user to user
      • Handling legitimate cross-domain content duplication
      • Your site's performance in Webmaster Tools
      • How fast is your site?
      • New User Agent for News
      • Region Tags in Google Search Results
      • Changes in First Click Free
    • ►  November (9)
    • ►  October (13)
    • ►  September (8)
    • ►  August (6)
    • ►  July (5)
    • ►  June (4)
Powered by Blogger.

About Me

Unknown
View my complete profile