How Google May Annotate Images to Improve Search Results
How might Google improve upon the information it learns from sources such as knowledge bases to help answer search queries?
The information used to enrich a knowledge base may be learned or inferred from outside those knowledge bases, from:
- Analyzing images
- Other data sources
This patent defines knowledge bases, explains why they are important, and gives examples of how Google looks at entities when it annotates images:
A knowledge base is an important repository of structured and unstructured data. The data stored in a knowledge base may include information such as entities, facts about entities, and relationships between entities. This information can be used to assist with or satisfy user search queries processed by a search engine.
Examples of knowledge bases include Google Knowledge Graph and Knowledge Vault, Microsoft Satori Knowledge Base, DBpedia, Yahoo! Knowledge Base, and Wolfram Knowledgebase.
The focus of this patent is upon improving what might be found in knowledge bases:
The data stored in a knowledge base may be enriched or expanded by harvesting information from a wide variety of sources. For example, entities and facts may be obtained by crawling text included in Internet web pages. As another example, entities and facts may be collected using machine learning algorithms.
All gathered information may be stored in a knowledge base to enrich the information that is available for processing search queries.
Analyzing Images to Enrich Knowledge Base Information
This approach can involve a process to annotate images and select object entities contained in those images. I am reminded of a post I recently wrote about Google annotating images, How Google May Map Image Queries.
The effort to better understand images, and annotate them, and explore related entities lets Google focus upon “relationships between the object entities and attribute entities, and store the relationships in a knowledge base.”
It’s possible that Google can learn things from images of real-world objects (the term Google used for entities when it introduced the Google Knowledge Graph in 2012).
I also wrote a post about images and image search at Google becoming more semantic, and you can see that in the labels that they have added to categories in Google image search results. I wrote about those in Google Image Search Labels Becoming More Semantic?
When writing about mapping image queries, I couldn’t help thinking about how labels were being used in categories to organize information in a more helpful way. I have been suggesting to people that if they want to learn more about entities related to a topic they are researching, whether to create content or to do keyword research, they should do image searches and look at those semantic labels.
This new patent focuses upon assigning annotations to images to identify entities contained in the images. During this labeling, they may select an object entity among the entities based on the annotations, and then determine at least one attribute entity using the annotated images that also contain the object entity. They may also try to infer a relationship between the object entity and the attribute entity or entities, and include that relationship in a knowledge base.
In accordance with one exemplary embodiment, a computer-implemented method is provided for enriching a knowledge base for search queries. The method includes assigning annotations to images stored in a database. The annotations may identify entities contained in the images. An object entity among the entities may be selected based on the annotations. At least one attribute entity may be determined using the annotated images containing the object entity. A relationship between the object entity and the at least one attribute entity may be inferred and stored in a knowledge base.
For example, when I search for my hometown, Carlsbad, in Google image search, one of the category labels is for Legoland, which is an amusement park located in Carlsbad, California. Showing that as a label tells us that Legoland is located in Carlsbad (the captions for the pictures of Legoland tell us that it is located in Carlsbad).
This patent can be found at:
Computerized systems and methods for enriching a knowledge base for search queries
Inventors: Ran El Manor and Yaniv Leviathan
Assignee: Google LLC
US Patent: 10,534,810
Granted: January 14, 2020
Filed: February 29, 2016
Systems and methods are disclosed for enriching a knowledge base for search queries. According to certain embodiments, images are assigned annotations that identify entities contained in the images. An object entity is selected among the entities based on the annotations and at least one attribute entity is determined using annotated images containing the object entity. A relationship between the object entity and the at least one attribute entity is inferred and stored in the knowledge base. In some embodiments, confidence may be calculated for the entities. The confidence scores may be aggregated across a plurality of images to identify an object entity.
Confidence Scores When Labeling Entities in Images
One of the first phrases to jump out at me when I scanned this patent was “confidence scores.” It reminded me of the association scores I wrote about when discussing Google trying to extract information about entities, their relationships with other entities, and confidence scores about those relationships and the attributes involving those entities. I mentioned association scores in the post Entity Extractions for Knowledge Graphs at Google, because those scores were described in the patent Computerized systems and methods for extracting and storing information regarding entities.
I also referred to these confidence scores when I wrote about Answering Questions Using Knowledge Graphs. Association scores or confidence scores can lead to better answers to questions about entities in search results, which is an aim of this patent as it analyzes and labels images and works to understand the relationships between the entities shown in them.
The patent lays out the purpose it serves when it may analyze and annotate images like this:
Embodiments of the present disclosure provide improved systems and methods for enriching a knowledge base for search queries. The information used to enrich a knowledge base may be learned or inferred from analyzing images and other data sources.
In accordance with some embodiments, object recognition technology is used to annotate images stored in databases or harvested from Internet web pages. The annotations may identify who and/or what is contained in the images.
The disclosed embodiments can learn which annotations are good indicators for facts by aggregating annotations over object entities and facts that are already known to be true. Grouping annotated images by object entity helps identify the top annotations for the object entity.
Top annotations can be selected as attributes for the object entities and relationships can be inferred between the object entities and the attributes.
As used herein, the term “inferring” refers to operations where an entity relationship is inferred from or determined using indirect factors such as image context, known entity relationships, and data stored in a knowledge base to draw an entity relationship conclusion instead of learning the entity-relationship from an explicit statement of the relationship such as in text on an Internet web page.
The inferred relationships may be stored in a knowledge base and subsequently used to assist with or respond to user search queries processed by a search engine.
The patent then tells us how confidence scores are used: confidence scores are calculated for the annotations assigned to images. Those “confidence scores may reflect the likelihood that an entity identified by an annotation is actually contained in an image.”
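As a rough sketch of the aggregation idea mentioned in the patent’s abstract (the function, entity names, and scores below are my own illustration, not the patent’s method), entities could be ranked by summing their annotation confidence scores across a set of images, with the top-ranked entity becoming a candidate object entity:

```python
from collections import defaultdict

def aggregate_confidence(annotated_images):
    """Sum annotation confidence scores for each entity across images.

    annotated_images: one list of (entity, confidence) pairs per image.
    Returns entities ranked by aggregate score, highest first.
    """
    totals = defaultdict(float)
    for annotations in annotated_images:
        for entity, confidence in annotations:
            totals[entity] += confidence
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

# Toy annotations for three images of Carlsbad (scores are made up).
images = [
    [("Carlsbad", 0.9), ("Legoland", 0.8)],
    [("Carlsbad", 0.7), ("beach", 0.6)],
    [("Carlsbad", 0.8), ("Legoland", 0.5)],
]
ranking = aggregate_confidence(images)
# The top-ranked entity is a candidate object entity.
```

Here “Carlsbad” accumulates the highest aggregate score, so it would be selected over the attribute entities that merely co-occur with it.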
If you look back at the pictures of Legoland above, Legoland may be considered an attribute entity of the object entity Carlsbad, because Legoland is located in Carlsbad. The label annotations indicate what the images portray, and a relationship between the entities can be inferred.
Similarly, an image search for Milan, Italy shows a category label for the Duomo, a cathedral located in the city. The Duomo is an attribute entity of the object entity Milan because it is located in Milan, Italy.
In those examples, we infer that Legoland is an attribute entity of Carlsbad because it appears under pictures of Carlsbad, and that the Duomo is an attribute entity of Milan because it is included in the results of a search for Milan.
A search engine may learn from label annotations and confidence scores because the search engine (or its indexing engine) may index:
- Image annotations
- Object entities
- Attribute entities
- Relationships between object entities and attribute entities
- Facts learned about object entities
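As a rough illustration of those indexed items (the structure, field names, and bear facts here are my own, not from the patent), such an index might tie them together like this:

```python
# A hypothetical in-memory index of the items listed above; field
# names and contents are illustrative, not taken from the patent.
index = {
    "image_annotations": {
        "img_001.jpg": ["bear", "fish", "water"],
        "img_002.jpg": ["bear", "grass"],
    },
    "object_entities": ["bear"],
    "attribute_entities": {"bear": ["fish", "water", "grass"]},
    "relationships": [("bear", "eats", "fish"), ("bear", "hunts in", "water")],
    "facts": {"bear": ["bears eat fish", "bears hunt near water"]},
}

def attributes_for(index, object_entity):
    """Look up the attribute entities recorded for an object entity."""
    return index["attribute_entities"].get(object_entity, [])
```

A query about an object entity could then be answered by looking up its attributes and relationships rather than re-analyzing the images.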
The illustrations from the patent show us images of a bear eating a fish, to tell us that the bear is an object entity, the fish is an attribute entity, and that bears eat fish.
We are also shown that bears, as object entities, have other attribute entities associated with them, since they go into the water to hunt fish and roam around on the grass.
Annotations may be detailed and cover objects within photos or images, like the bear eating the fish above. The patent points out a range of entities that might appear in a single image by telling us about a photo from a baseball game:
An annotation may identify an entity contained in an image. An entity may be a person, place, thing, or concept. For example, an image taken at a baseball game may contain entities such as “baseball fan”, “grass”, “baseball player”, “baseball stadium”, etc.
An entity may also be a specific person, place, thing, or concept. For example, the image taken at the baseball game may contain entities such as “Nationals Park” and “Ryan Zimmerman”.
Defining an Object Entity in an Image
The patent provides more insights into what object entities are and how they might be selected:
An object entity may be an entity selected among the entities contained in a plurality of annotated images. Object entities may be used to group images to learn facts about those object entities. In some embodiments, a server may select a plurality of images and assign annotations to those images.
A server may select an object entity based on the entity contained in the greatest number of annotated images as identified by the annotations.
For example, a group of 50 images may be assigned annotations that identify George Washington in 30 of those images. Accordingly, a server may select George Washington as the object entity if 30 out of 50 annotated images is the greatest number for any identified entity.
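The counting step in that George Washington example can be sketched in a few lines, assuming each image’s annotations arrive as a set of entity names (the helper below is hypothetical, not from the patent):

```python
from collections import Counter

def select_object_entity(annotated_images):
    """Pick the entity appearing in the greatest number of images.

    annotated_images: one set of entity annotations per image, so an
    entity is counted at most once per image.
    """
    counts = Counter()
    for annotations in annotated_images:
        counts.update(annotations)
    entity, _ = counts.most_common(1)[0]
    return entity

# 30 of 50 images are annotated with "George Washington", as in the
# patent's example; the other annotations are made up for illustration.
images = ([{"George Washington", "flag"}] * 15
          + [{"George Washington"}] * 15
          + [{"Martha Washington"}] * 20)
```

With 30 of the 50 images annotated “George Washington,” that entity would be selected as the object entity.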
Confidence scores may also be determined for annotations. A confidence score indicates how likely it is that an entity identified by an annotation is actually contained in an image; it “quantifies a level of confidence in an annotation being accurate.” That confidence score could be calculated using a template matching algorithm, in which the annotated image is compared with a template image.
Defining an Attribute Entity in an Image
An attribute entity may be an entity that is among the entities contained in images that contain the object entity. They are entities other than the object entity.
Annotated images that contain the object entity may be grouped and an attribute entity may be selected based on what entity might be contained in the greatest number of grouped images as identified by the annotations.
So, a group of 30 annotated images containing the object entity “George Washington” may also include 20 images that contain “Martha Washington.” In that case, “Martha Washington” may be considered an attribute entity.
(Of course, “Martha Washington” could be an object entity, and “George Washington,” appearing in a number of the “Martha Washington”-labeled images, could be considered the attribute entity.)
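That grouping-and-counting step can be sketched the same way as object-entity selection, again assuming set-of-annotations input (the helper name and `top_n` parameter are my own, not the patent’s):

```python
from collections import Counter

def select_attribute_entities(annotated_images, object_entity, top_n=1):
    """Group the images containing the object entity, then return the
    most frequent co-occurring entities as candidate attributes."""
    grouped = [a for a in annotated_images if object_entity in a]
    counts = Counter()
    for annotations in grouped:
        counts.update(e for e in annotations if e != object_entity)
    return [entity for entity, _ in counts.most_common(top_n)]

# 20 of the 30 "George Washington" images also contain "Martha Washington".
images = ([{"George Washington", "Martha Washington"}] * 20
          + [{"George Washington"}] * 10)
```

Swapping the arguments would run the reverse analysis, with “Martha Washington” as the object entity, just as the aside above suggests.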
Inferring Relationships Between Entities by Analyzing Images
If more than a threshold number of images of “Michael Jordan” contain a basketball in his hand, a relationship between “Michael Jordan” and basketball might be made (that Michael Jordan is a basketball player).
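A hedged sketch of that threshold test, reusing the same set-of-annotations representation (the threshold value and the “associated with” relationship label are placeholders of my own):

```python
def infer_relationship(annotated_images, object_entity, attribute_entity,
                       threshold=0.5):
    """Infer a relationship when the attribute entity co-occurs with the
    object entity in more than a threshold fraction of its images."""
    grouped = [a for a in annotated_images if object_entity in a]
    if not grouped:
        return None
    share = sum(attribute_entity in a for a in grouped) / len(grouped)
    if share > threshold:
        return (object_entity, "associated with", attribute_entity)
    return None

# 8 of 10 "Michael Jordan" images also contain a basketball.
images = [{"Michael Jordan", "basketball"}] * 8 + [{"Michael Jordan"}] * 2
relationship = infer_relationship(images, "Michael Jordan", "basketball")
```

An entity that appears only occasionally alongside the object entity would fall below the threshold, and no relationship would be stored for it.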
From analyzing images of bears hunting for fish in water and roaming around on grassy fields, relationships between bears and fish, water, and grass can be made as well.
By analyzing images of Michael Jordan with a basketball in his hand wearing a Chicago Bulls jersey, a search query asking a question such as “What basketball team does Michael Jordan play for?” may be satisfied with the answer “Chicago Bulls”.
To answer a query such as “What team did Michael Jordan play basketball for?”, Google could perform an image search for “Michael Jordan playing basketball.” Having images that contain the object entity of interest allows the images to be analyzed and an answer provided. See the picture at the top of this post, showing Michael Jordan in a Bulls jersey.
This process to collect and annotate images can be done using any images found on the Web, and isn’t limited to images that might be found in places like Wikipedia.
Google can analyze images online in a way that scales across the Web, and that analysis may provide insights a knowledge graph might not. For example, to answer the question “Where do grizzly bears hunt?”, an analysis of photos reveals that they like to hunt near water so that they can eat fish.
The confidence scores in this patent aren’t like the association scores in the other entity patents I wrote about; here they gauge how likely it is that what is in a photo or image is indeed the entity it is labeled with.
The association scores I wrote about gauge how likely relationships between entities and attributes are to be true, based on things such as the reliability and popularity of the sources of that information.
So, Google is trying to learn about real-world objects (entities) by analyzing pictures of those entities (ones that it has confidence in), as an alternative way of learning about the world and the things within it.