Transfer Domain Names

  • Subscribe to our RSS feed.
  • Twitter
  • StumbleUpon
  • Reddit
  • Facebook
  • Digg

Tuesday, 8 June 2010

Our new search index: Caffeine

Posted on 17:00 by Unknown
(Cross-posted on the Official Google Blog)

Today, we're announcing the completion of a new web indexing system called Caffeine. Caffeine provides 50 percent fresher results for web searches than our last index, and it's the largest collection of web content we've offered. Whether it's a news story, a blog or a forum post, you can now find links to relevant content much sooner after it is published than was possible ever before.

Some background for those of you who don't build search engines for a living like us: when you search Google, you're not searching the live web. Instead you're searching Google's index of the web which, like the list in the back of a book, helps you pinpoint exactly the information you need. (Here's a good explanation of how it all works.)

So why did we build a new search indexing system? Content on the web is blossoming. It's growing not just in size and numbers but with the advent of video, images, news and real-time updates, the average webpage is richer and more complex. In addition, people's expectations for search are higher than they used to be. Searchers want to find the latest relevant content and publishers expect to be found the instant they publish.

To keep up with the evolution of the web and to meet rising user expectations, we've built Caffeine. The image below illustrates how our old indexing system worked compared to Caffeine:
Our old index had several layers, some of which were refreshed at a faster rate than others; the main layer would update every couple of weeks. To refresh a layer of the old index, we would analyze the entire web, which meant there was a significant delay between when we found a page and made it available to you.

With Caffeine, we analyze the web in small portions and update our search index on a continuous basis, globally. As we find new pages, or new information on existing pages, we can add these straight to the index. That means you can find fresher information than ever before — no matter when or where it was published.

Caffeine lets us index web pages on an enormous scale. In fact, every second Caffeine processes hundreds of thousands of pages in parallel. If this were a pile of paper it would grow three miles taller every second. Caffeine takes up nearly 100 million gigabytes of storage in one database and adds new information at a rate of hundreds of thousands of gigabytes per day. You would need 625,000 of the largest iPods to store that much information; if these were stacked end-to-end they would go for more than 40 miles.

We've built Caffeine with the future in mind. Not only is it fresher, it's a robust foundation that makes it possible for us to build an even faster and comprehensive search engine that scales with the growth of information online, and delivers even more relevant search results to you. So stay tuned, and look for more improvements in the months to come.

Posted by Carrie Grimes, Software Engineer
Email ThisBlogThis!Share to XShare to Facebook
Posted in crawling and indexing | No comments
Newer Post Older Post Home

0 comments:

Post a Comment

Subscribe to: Post Comments (Atom)

Popular Posts

  • Switching to the new website verification API
    Webmaster level: advanced Just over a year ago we introduced a new API for website verification for Google services. In the spirit of keepi...
  • Structured Data dashboard: new markup error reports for easier debugging
    Since we launched the Structured Data dashboard last year, it has quickly become one of the most popular features in Webmaster Tools. We’ve...
  • "It's on Google! YAY!" - Getting webmaster help in our forum
    Webmaster level: all It's been a bit more than five years now that our Webmaster Help Forum has been up and running, helping webmasters...
  • Supporting rel="canonical" HTTP Headers
    Webmaster level: Advanced Based on your feedback, we’re happy to announce that Google web search now supports link rel="canonical"...
  • Getting started with structured data
    Webmaster level: All If Google understands your website’s content in a structured way, we can present that content more accurately and more ...
  • Responsive design – harnessing the power of media queries
    Webmaster Level: Intermediate / Advanced We love data, and spend a lot of time monitoring the analytics on our websites. Any web developer d...
  • Introducing the Structured Data Dashboard
    Webmaster level: All Structured data is becoming an increasingly important part of the web ecosystem. Google makes use of structured data in...
  • Tell us what you think!
    (Cross-posted on the Google Product Ideas Blog ) The Webmaster Central team does our best to support the webmaster community via Webmaster T...
  • Improving URL removals on third-party sites
    Webmaster level: all Content on the Internet changes or disappears, and occasionally it's helpful to have search results for it updated ...
  • Protect your site from spammers with reCAPTCHA
    Webmaster Level: All If you allow users to publish content on your website, from leaving comments to creating user profiles , you’ll likely...

Categories

  • advanced
  • beginner
  • crawling and indexing
  • events
  • feedback and communication
  • general tips
  • hacked sites
  • hreflang
  • images
  • intermediate
  • localization
  • malware
  • mobile
  • performance
  • products and services
  • search results
  • sitemaps
  • structured data
  • url removals
  • verification
  • video
  • webmaster guidelines
  • webmaster tools

Blog Archive

  • ►  2014 (2)
    • ►  January (2)
  • ►  2013 (35)
    • ►  December (6)
    • ►  November (1)
    • ►  October (2)
    • ►  September (2)
    • ►  August (4)
    • ►  July (2)
    • ►  June (4)
    • ►  May (3)
    • ►  April (2)
    • ►  March (6)
    • ►  February (2)
    • ►  January (1)
  • ►  2012 (55)
    • ►  December (3)
    • ►  November (1)
    • ►  October (5)
    • ►  September (2)
    • ►  August (5)
    • ►  July (5)
    • ►  June (6)
    • ►  May (7)
    • ►  April (7)
    • ►  March (6)
    • ►  February (2)
    • ►  January (6)
  • ►  2011 (75)
    • ►  December (7)
    • ►  November (2)
    • ►  October (5)
    • ►  September (8)
    • ►  August (10)
    • ►  July (5)
    • ►  June (10)
    • ►  May (8)
    • ►  April (6)
    • ►  March (6)
    • ►  February (5)
    • ►  January (3)
  • ▼  2010 (81)
    • ►  December (9)
    • ►  November (9)
    • ►  October (4)
    • ►  September (8)
    • ►  August (6)
    • ►  July (2)
    • ▼  June (6)
      • Sitemaps: One file, many content types
      • Quality links to your site
      • Google Videos best practices
      • Our new search index: Caffeine
      • Crawl Errors now reports soft 404s
      • Grab bag videos are back!
    • ►  May (6)
    • ►  April (12)
    • ►  March (11)
    • ►  February (1)
    • ►  January (7)
  • ►  2009 (52)
    • ►  December (7)
    • ►  November (9)
    • ►  October (13)
    • ►  September (8)
    • ►  August (6)
    • ►  July (5)
    • ►  June (4)
Powered by Blogger.

About Me

Unknown
View my complete profile