Organizing the world’s information: where does it all come from?
Vice President of Product & Design
Since Google was founded more than 22 years ago, we’ve continued to pursue an ambitious mission of organizing the world’s information and making it universally accessible and useful. While we started with organizing web pages, our mission has always been much more expansive. We set out to organize not just the web’s information, but all the world’s information.
Quickly, Google expanded beyond the web and began to look for new ways to understand the world and make information and knowledge accessible for more people. The internet–and the world–have changed a lot since those early days, and we’ve continued to improve Google Search to both anticipate and respond to the ever-evolving information needs that people have.
It’s no mystery that the search results you saw back in 1998 look different from what you might find today. So we wanted to share an overview of where the information on Google comes from and, in another post, how we approach organizing an ever-expanding universe of web pages, images, videos, real-world insights and all the other forms of information out there.
Information from the open web
You’re probably familiar with web listings on Google–the iconic “blue link” results that take you to pages from across the web. These listings, along with many other features on the search results page, link out to pages on the open web that we’ve crawled and indexed, following instructions provided by the site creators themselves.
Site owners can tell our web crawler (Googlebot) which pages we should crawl and index, and they have more granular controls to indicate which portions of a page should appear as a text snippet on Google Search. Using our developer tools, site creators can choose whether they want to be discovered via Google and optimize their sites to improve how they’re presented, with the aim of getting more free traffic from people looking for the information and services they’re offering.
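To make the crawl controls above a little more concrete, here is a minimal sketch of how a robots.txt file can be evaluated, using Python's standard `urllib.robotparser` module. The robots.txt rules, the `example.com` URLs, and the `can_crawl` helper are hypothetical and chosen only for illustration; this is not how Googlebot itself is implemented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a site owner might publish (illustration only).
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /
"""


def can_crawl(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("Googlebot", "https://example.com/blog/post"),
        ("Googlebot", "https://example.com/private/notes"),
        ("OtherBot", "https://example.com/blog/post"),
    ]:
        print(agent, url, can_crawl(agent, url))
```

In this sketch, the hypothetical site lets Googlebot crawl everything except `/private/` and blocks all other crawlers. Snippet-level controls, such as the `nosnippet` robots meta directive or the `data-nosnippet` HTML attribute, are set in the page itself rather than in robots.txt, so they are not covered here.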
Google Search is one of many ways people find information and websites. Every day, we send billions of visitors to sites across the web, and the traffic we send has grown every year since Google started. This traffic goes to a wide range of websites, helping people discover new companies, blogs, and products, not just the largest, well-known sites on the web. Every day, we send visitors to well over 100 million different websites.
Common knowledge and public data sources
Creators, publishers and businesses of all sizes work to create unique content, products and services. But there is also information that falls into the category of what you might describe as common knowledge–information that wasn’t uniquely created or doesn’t “belong” to any one person, but represents a set of facts that is broadly known. Think: the birthdate of a historical figure, the height of the tallest mountain in South America, or even what day it is today.
We help people easily find these types of facts through a variety of Google Search features like knowledge panels. The information comes from a wide range of openly licensed sources such as Wikipedia, The Encyclopedia of Life, Johns Hopkins University CSSE COVID-19 Data, and the Data Commons Project, an open knowledge database of statistical data we started in collaboration with the U.S. Census, Bureau of Labor Statistics, Eurostat, World Bank and many others.
Another type of common knowledge is the product of calculations, and this is information that Google often generates directly. So when you search for a conversion of time (“What time is it in London?”) or measurement (“How many pounds in a metric ton?”), or want to know the square root of 348, those are pieces of information that Google calculates. Fun fact: we also calculate the sunrise and sunset times for locations based on latitude and longitude!
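As a small illustration of what "information that is calculated" means, the sketch below reproduces two of the examples above with Python's standard library. The function name and constants are ours, introduced for illustration, and this is not a description of how Google Search computes its answers. Sunrise and sunset times need a longer astronomical formula and are left out.

```python
import math

# One international avoirdupois pound is defined as exactly 0.45359237 kg.
KG_PER_POUND = 0.45359237
# One metric ton (tonne) is 1,000 kg.
KG_PER_METRIC_TON = 1000.0


def metric_tons_to_pounds(tons: float) -> float:
    """Convert metric tons to pounds by going through kilograms."""
    return tons * KG_PER_METRIC_TON / KG_PER_POUND


if __name__ == "__main__":
    # "the square root of 348"
    print(f"sqrt(348) is about {math.sqrt(348):.4f}")
    # "How many pounds in a metric ton?"
    print(f"1 metric ton is about {metric_tons_to_pounds(1):.2f} lb")
```

Answers like these do not have to be found on a web page: they follow from a definition and some arithmetic, which is why they can be generated directly.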
Licenses and partnerships
When it comes to organizing information, unstructured data (words and phrases on web pages) is more challenging for our automated systems to understand than structured data. Structured databases, including public knowledge bases like Wikidata, make it a lot easier for our systems to understand, organize and present facts in helpful features and formats.
For some specialized types of data, like sports scores, information about TV shows and movies, and song lyrics, there are providers who work to organize information in a structured format and offer technical solutions (like APIs) to deliver fresh info. We license data from these companies to ensure that providers and creators (like music publishers and artists) are compensated for their work. When people come to Google looking for this information, they can access it right away.
We always work to deliver high quality information, and for topics like health or civic participation that affect people’s livelihoods, easy access to reliable, authoritative information is critically important. For these types of topics, we work with organizations like local health authorities, such as the CDC in the U.S., and nonpartisan, nonprofit organizations like Democracy Works to make authoritative information readily available on Google.
Information that people and businesses provide
There’s a wide range of information that exists in the world that isn’t currently available on the open web, so we look for ways to help people and businesses share it, including by providing information directly to Google. Local businesses can claim their Business Profile and share the latest with potential customers on Search, even if they don’t have a website. In fact, each month Google Search connects people with more than 120 million businesses that don’t have a website. On average, local results in Search drive more than 4 billion connections for businesses every month, including more than 2 billion visits to websites as well as connections like phone calls, directions, ordering food and making reservations.
We’re also investing deeply in new techniques to ensure that we reflect the latest accurate information. This can be especially challenging because local information is constantly changing and is often not accurately reflected on the web. For example, in the wake of COVID-19, we’ve used our Duplex conversational technology to call businesses, helping to update their listings by confirming details like modified store hours or whether they offer takeout and delivery. Since this work began, we’ve made over 3 million updates to businesses like pharmacies, restaurants and grocery stores that have been seen over 20 billion times in Maps and Search.
Other businesses like airlines, retailers and manufacturers also provide Google and other sites with data about their products and inventory through direct feeds. So when you search for a flight from Bogota to Lima, or want to learn more about the specs of the hottest new headphones, Google can provide high quality information straight from the source.
We also provide ways for people to share their knowledge about places across more than 220 countries and territories. Thanks to millions of contributions submitted by users every day–from reviews and ratings to photos, answers to questions, address updates and more–people all around the world can find the latest, accurate local information on Google Search and Maps.
Newly created information and insights from Google
Through advancements in AI and machine learning, we’ve developed innovative ways to derive new insights from the world around us, providing people with information that can not only help them in their everyday lives, but also keep them safe.
For years, people have turned to our Popular Times feature to help gauge the crowds at their favorite brunch spots or visit their local grocery store when it’s less busy. We’re continually improving the accuracy and coverage of this feature, currently available for 20 million places around the world on Maps and Search. Now, this technology is serving more critical needs during COVID. With an expansion of our live busyness feature, these Google insights are helping people take crowdedness into account as they patronize businesses through the pandemic.
Organizing information and making it accessible and useful
Simply compiling a wide range of information is not enough. Core to making information accessible is organizing it in a way that people can actually use.
How we organize information continues to evolve, especially as new information and content formats become available. To learn more about our approach to providing you with helpful, well-organized search results pages, check out the next post in our How Search Works series.