Job Search Engine Using Occupation Vectors
I worked for the Courts of Delaware at Superior Court.
I started working there as the Assistant Criminal Deputy Prothonotary.
I changed positions after 7 years there, and I became a Mini/Micro Computer Network Administrator.
The Court used an old English title for that first position which meant that I supervised Court Clerks in the Criminal Department of the Court. In the second position, I never ever saw a mini/micro-computer but it was a much more technical position. I was reminded of those titles when writing this post.
What unusual job titles might you have held in the past?
A Job Search Engine Based on Occupation Vectors and a Job Identification Model
Over a two-week period, Google was granted a patent with the same name in each of those two weeks. This is the first of the two patents granted during that period under the name “Search Engine.”
It is about a specific type of search engine, one that focuses on a specific search vertical: job search.
The second patent granted under the name “Search Engine” focused on indexing data related to applications on mobile devices. I wrote about it in the post A Native Application Vertical Search Engine at Google.
The reason why I find it important to learn about and understand how these new “Search Engine” patents work is that they adopt some newer approaches to answering searches than some of the previous vertical search engines developed by Google. Understanding how they work may provide some ideas about how older searches at Google may have changed.
This Job Search Engine patent works with a job identification model to enhance job search by improving the quality of search results in response to a job search query.
We are told that the job identification model can identify relevant job postings that could otherwise go unnoticed by conventional algorithms due to inherent limitations of keyword-based searching. What implications does this have for organic search at Google that has focused upon keyword search?
This job search may use methods in addition to conventional keyword-based searching. It uses an identification model that can identify relevant job postings which include job titles that do not match the keywords of a received job search query.
So, the patent tells us that in a query using the words “Patent Guru,” the job identification model may identify postings related to a:
- “Patent Attorney”
- “Intellectual Property Attorney”
- the like
The method behind job searching may include the following steps (remember the word “vector”; it is one I am seeing from Google a lot lately):
- Defining a vector vocabulary
- Defining an occupation taxonomy that includes multiple different occupations
- Obtaining multiple labeled training data items, wherein each labeled training data item is associated with at least (i) a job title and (ii) an occupation
- Generating, for each labeled training data item, an occupation vector that includes a feature weight for each respective term in the vector vocabulary
- Associating each respective occupation vector with an occupation in the occupation taxonomy based on the occupation of the labeled training data item used to generate the occupation vector
- Receiving a search query that includes a string related to a characteristic of one or more potential job opportunities
- Generating a first vector (the query vector) based on the received query
- Determining, for each respective occupation of the multiple occupations in the occupation taxonomy, a confidence score that is indicative of whether the query vector is correctly classified in the respective occupation
- Selecting the particular occupation that is associated with the highest confidence score
- Obtaining one or more job postings using the selected occupation
- Providing the obtained job postings in a set of search results in response to the search query
These operations may also include the features described in the following sections.
Feature Weights for Terms in Vector Vocabularies
It sounds like Google is trying to understand job position titles and how they may be connected with each other, developing a vector vocabulary and building ontologies of related positions.
A feature weight may be based on:
- A term frequency determined based on the number of occurrences of each term in the job title of the training data item
- An inverse occupation frequency that is determined based on a number of occupations in the occupation taxonomy where each respective term in the job title of the respective training data item is present.
- An occupation derivative based on a density of each respective term in the job title of the respective training data item across each of the respective occupations in the occupation taxonomy
- Both (i) a second value representing the inverse occupation frequency that is determined based, at least in part, on a number of occupations in the occupation taxonomy where each respective term in the job title of the respective training data item is present and (ii) a third value representing an occupation derivative that is based, at least in part, on a density of each respective term in the job title of the respective training data item across each of the respective occupations in the occupation taxonomy
- A sum of (i) the second value representing the inverse occupation frequency, and (ii) one-third of the third value representing the occupation derivative
The predetermined vector vocabulary may include terms that are present in training data items stored in a text corpus and terms that are not present in at least one training data item stored in the text corpus.
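To make the feature weights more concrete, here is a minimal Python sketch. The description above does not spell out exact formulas, so several things here are my assumptions rather than the patent's method: the toy training data, the function names, the IDF-style logarithm for the inverse occupation frequency, the "highest density minus average density" stand-in for the occupation derivative, and the choice to multiply the term frequency by the sum of the inverse occupation frequency and one-third of the occupation derivative.

```python
import math
from collections import Counter

# Toy labeled training data: (job title, occupation). Invented for illustration.
TRAINING = [
    ("patent attorney", "lawyer"),
    ("intellectual property attorney", "lawyer"),
    ("senior programmer", "software developer"),
    ("software engineer", "software developer"),
    ("barber", "hairdresser"),
    ("hair stylist", "hairdresser"),
]


def titles_by_occupation(items):
    """Group tokenized job titles under their labeled occupation."""
    grouped = {}
    for title, occupation in items:
        grouped.setdefault(occupation, []).append(title.split())
    return grouped


def inverse_occupation_frequency(term, grouped):
    """Assumed IDF-style form: log(occupations / occupations containing term)."""
    containing = sum(
        1 for titles in grouped.values() if any(term in t for t in titles)
    )
    return math.log(len(grouped) / containing) if containing else 0.0


def occupation_derivative(term, grouped):
    """Hypothetical stand-in: highest per-occupation density minus the average.

    Density here means the share of an occupation's titles that contain the term.
    """
    densities = [
        sum(term in t for t in titles) / len(titles) for titles in grouped.values()
    ]
    return max(densities) - sum(densities) / len(densities)


def feature_weights(title, vocabulary, grouped):
    """One weight per vocabulary term: TF * (IOF + OD / 3). The product is assumed."""
    tf = Counter(title.split())
    return [
        tf[term]
        * (
            inverse_occupation_frequency(term, grouped)
            + occupation_derivative(term, grouped) / 3
        )
        for term in vocabulary
    ]


grouped = titles_by_occupation(TRAINING)
# The vocabulary may also hold terms absent from the training data, such as "guru".
vocabulary = sorted({t for title, _ in TRAINING for t in title.split()} | {"guru"})
for term, weight in zip(vocabulary, feature_weights("patent attorney", vocabulary, grouped)):
    print(f"{term}: {weight:.3f}")
```

In this toy data, “attorney” appears in every lawyer title and in no other occupation, so it ends up with a slightly higher weight than “patent,” while terms that are not in the job title get a weight of zero.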
This job search engine patent, titled “Search Engine,” is identified as follows:
Inventors: Ye Tian, Seyed Reza Mir Ghaderi, Xuejun Tao, Matthew Courtney, Pei-Chun Chen, and Christian Posse
Assignee: Google LLC
US Patent: 10,643,183
Granted: May 5, 2020
Filed: October 18, 2016
Methods, systems, and apparatus, including computer programs encoded on storage devices, for performing a job opportunity search. In one aspect, a system includes a data processing apparatus, and a computer-readable storage device having stored thereon instructions that, when executed by the data processing apparatus, cause the data processing apparatus to perform operations.
The operations include defining a vector vocabulary, defining an occupation taxonomy that includes multiple different occupations, obtaining multiple labeled training data items, wherein each labeled training data item is associated with at least (i) a job title, and (ii) an occupation, generating, for each of the respective labeled training data items, an occupation vector that includes a feature weight for each respective term in the vector vocabulary and associating each respective occupation vector with an occupation in the occupation taxonomy based on the occupation of the labeled training data item used to generate the occupation vector.
The Job Identification Model
Job postings from many different sources may be related to one or more occupations.
An occupation may include a particular category that encompasses one or more job titles that describe the same profession.
Two or more of the obtained job postings may be related to the same, or substantially similar, occupation while using different terminology to describe a job title for each of the two or more particular job postings.
Such differences in the terminology used to describe a particular job title of a job posting may arise for a variety of different reasons:
- Different people from different employers draft each respective job posting
- Unique job titles may be based on the culture of the employer’s company, the employer’s marketing strategy, or the like
How a Job Identification Model May Work
- A first hair salon, marketed as a rugged barbershop, advertises a job posting for a “barber”
- A second hair salon, marketed as a trendy beauty salon, advertises a job posting for a “stylist”
- In both cases, the job posting seeks a person for the occupation of “hairdresser,” someone who cuts and styles hair
- In a search system limited to keyword-based searching, a searcher seeking job opportunities for a “hairdresser” who searches using the term “barber” may not receive available job postings for a “stylist,” “hairdresser,” or the like if those job postings do not include the term “barber”
- The process in this patent uses a job identification model seeking to address this problem
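The barber and stylist example can be sketched in a few lines of Python. This is only an illustration of the keyword limitation: the postings, the `TITLE_TO_OCCUPATION` lookup table, and both function names are hypothetical, and the lookup table is a crude stand-in for the trained model the patent describes.

```python
# Two toy job postings, invented for illustration.
POSTINGS = [
    {"title": "barber", "employer": "rugged barbershop"},
    {"title": "stylist", "employer": "trendy beauty salon"},
]

# Hypothetical stand-in for an occupation taxonomy: job title -> occupation.
TITLE_TO_OCCUPATION = {
    "barber": "hairdresser",
    "stylist": "hairdresser",
    "hairdresser": "hairdresser",
}


def keyword_search(query, postings):
    """Return only postings whose title literally contains the query text."""
    return [p for p in postings if query.lower() in p["title"].lower()]


def occupation_search(query, postings):
    """Map the query to an occupation, then return postings in that occupation."""
    occupation = TITLE_TO_OCCUPATION.get(query.lower())
    if occupation is None:
        return []
    return [
        p for p in postings
        if TITLE_TO_OCCUPATION.get(p["title"].lower()) == occupation
    ]


print([p["title"] for p in keyword_search("barber", POSTINGS)])
print([p["title"] for p in occupation_search("barber", POSTINGS)])
```

The keyword search for “barber” finds only the barbershop posting, while the occupation-based search returns both postings because both titles map to “hairdresser.”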
The job identification model includes:
- A classification unit
- An occupation taxonomy
The occupation taxonomy associates known job titles from existing job posts with one or more particular occupations.
During training, the job identification model associates each occupation vector that was generated for an obtained job posting with an occupation in the occupation taxonomy.
The classification unit may receive the search query and generate a query vector.
The classification unit may access the occupation taxonomy and calculate, for each particular occupation in the occupation taxonomy, a confidence score that is indicative of the likelihood that the query vector is properly classified into each particular occupation of the multiple occupations in the occupation taxonomy.
Then, the classification unit may select the occupation associated with the highest confidence score as the occupation that is related to the query vector and provide the selected occupation to the job identification model.
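Here is a rough Python sketch of what the classification unit might do. The patent description above does not say how the confidence score is calculated, so I am assuming cosine similarity between the query vector and each occupation vector, and taking the best match per occupation. The occupation vectors, their weights of 1.0, and the function names are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical occupation vectors (term -> weight), one per labeled job title.
OCCUPATION_VECTORS = [
    ("lawyer", {"patent": 1.0, "attorney": 1.0}),
    ("lawyer", {"intellectual": 1.0, "property": 1.0, "attorney": 1.0}),
    ("software developer", {"software": 1.0, "engineer": 1.0}),
    ("software developer", {"senior": 1.0, "programmer": 1.0}),
    ("hairdresser", {"barber": 1.0}),
]


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(
        sum(w * w for w in b.values())
    )
    return dot / norm if norm else 0.0


def classify(query):
    """Return (best occupation, confidence score per occupation) for a query."""
    # Simplification: the query vector uses raw term counts as weights.
    query_vector = dict(Counter(query.lower().split()))
    scores = {}
    for occupation, vector in OCCUPATION_VECTORS:
        score = cosine(query_vector, vector)
        # Assumed rule: an occupation's confidence is its best-matching vector.
        scores[occupation] = max(scores.get(occupation, 0.0), score)
    best = max(scores, key=scores.get)
    return best, scores


occupation, scores = classify("Patent Guru")
print(occupation, scores)
```

With these toy vectors, the query “Patent Guru” shares only the term “patent” with a lawyer vector, yet that is enough for “lawyer” to receive the highest confidence score even though no job title contains the word “guru.”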
An Example of a Search Under this Job Opportunities Search Engine:
- A searcher enters the query “Software Guru” into a search box
- The search query may be received by the job identification model
- The job identification model provides an input to the classification unit including the query
- The classification unit generates a query vector
- The classification unit analyzes the query vector in view of the one or more occupation vectors that were generated and associated with each particular occupation in the occupation taxonomy
- The classification unit may then determine that the query vector is associated with a particular occupation based on a calculated confidence score, and select the particular occupation
- The job identification model may receive the particular occupation from the classification unit
- Alternatively, or in addition, the output from the classification unit may include a confidence score that indicates the likelihood that the query vector is related to the occupation output by the occupation taxonomy
- The occupation output from the occupation taxonomy can be used to retrieve relevant job postings
- The references to job postings that were identified using the job posting index are returned to the user device
- The obtained references to job postings may be displayed on the graphical user interface
- The obtained references to job postings may be presented as search results and include references to job postings for a “Senior Programmer,” a “Software Engineer,” a “Software Ninja,” or the like
- The job postings included in the search results were determined to be responsive to the search query “Software Guru” based at least in part on the vector analysis of the query vector and one or more occupation vectors used to train the occupation taxonomy and not merely based on keyword searching alone
Specifically, given the output of a particular occupation, the job identification model can retrieve one or more job postings using a job posting index that stores references to job postings based on occupation type.
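A job posting index keyed by occupation could look something like the following Python sketch. The posting references, the tuple layout, and the function names are made up for illustration; the patent only tells us that the index stores references to job postings based on occupation type.

```python
from collections import defaultdict

# Toy postings: (reference, job title, occupation). Invented for illustration.
POSTINGS = [
    ("posting-1", "Senior Programmer", "software developer"),
    ("posting-2", "Software Engineer", "software developer"),
    ("posting-3", "Software Ninja", "software developer"),
    ("posting-4", "Stylist", "hairdresser"),
]


def build_index(postings):
    """Build an index mapping each occupation to its posting references."""
    index = defaultdict(list)
    for reference, _title, occupation in postings:
        index[occupation].append(reference)
    return dict(index)


def retrieve(occupation, index):
    """Return the posting references stored under the selected occupation."""
    return index.get(occupation, [])


index = build_index(POSTINGS)
print(retrieve("software developer", index))
```

Once the classification unit has selected “software developer” for the query “Software Guru,” a lookup like this returns all three software postings without any of their titles needing to contain the word “guru.”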
Takeaways About this Job Search Engine
In addition to the details above, the patent tells us how an occupation taxonomy may be trained using training data. It also provides more details about the job identification model, and then tells us how a job search is performed using that model.
I mentioned above that this job search engine patent and the application search engine patent are using methods that we may see in other search verticals at Google. I have written about one approach that could be used in organic search in the post Google Using Website Representation Vectors to Classify with Expertise and Authority.
Another one of those may involve image searching at Google. I’ve written about that in the post Google Image Search Labels Becoming More Semantic?
I will be posting more soon about how Google Image search is using neural networks to categorize and cluster Images to return in search results.