Recruiters, Job Seekers and Spammers: Innovations in Job Search at LinkedIn
1. Recruiters, Job Seekers and Spammers:
Innovations in Job Search at LinkedIn
Daria Sorokina
Senior Data Scientist
LinkedIn
2. Part I: Recruiters
“Multiple Objective Optimization in Recommendation
Systems”, Mario Rodriguez, Christian Posse, Ethan
Zhang. RecSys'12
4. TalentMatch
Job Posting + Member Profiles → TalentMatch → Ranked Talent
5. TalentMatch Model
Job Posting: title, geo, company, industry, description, functional area, …
Candidate – General: expertise, specialties, education, headline, geo, experience, …
Candidate – Current Position: title, summary, tenure length, industry, functional area, …
Text similarity features between job posting attributes and candidate attributes
The model can be trained on user activity signals
like job ad clicks or job applications
7. Job Seeker Intent
Segments: ACTIVE, PASSIVE, NON-JOB-SEEKER
Model: time till the job change
o How long will this person stay in this job after this date?
o Trained on past job positions from our users' profiles
o Accelerated failure time (AFT) model:
  T_i = exp( Σ_k β_k x_ik + σ ε_i )
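As a rough sketch, if ε is standard normal the AFT model above becomes log-normal, and we can score the probability that a position ends within a given horizon. The features, coefficients, and σ below are made up for illustration:

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_position_ends_within(months, features, betas, sigma):
    """P(T <= months) under a log-normal AFT model:
    log T = sum_k beta_k * x_k + sigma * eps, with eps ~ N(0, 1)."""
    mu = sum(b * x for b, x in zip(betas, features))
    return normal_cdf((math.log(months) - mu) / sigma)

# Hypothetical member: [intercept, years_of_tenure, industry_attrition_score]
x = [1.0, 2.5, 0.8]
betas = [3.0, 0.1, -0.5]  # made-up coefficients
p_next_month = prob_position_ends_within(1, x, betas, sigma=0.9)
```

A member with a high enough probability would land in the ACTIVE or PASSIVE segment.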
9. TalentMatch Utility
fn(email rate, reply rate)
Job-Seeking Intent: 16x reply rate on career-related mail
[Plot: reply rate by job-seeking intent segment]
10. How: Controlled Re-ranking
TalentMatch ranking (Match Score):
1. Item X, 0.98, Non-Seeker
2. Item Y, 0.91, Non-Seeker
------------------------------------
3. Item Z, 0.89, Active
Re-ranking function f(): optimize for both the score distribution and the objective
Improved ranking (Match Score, Re-ranking Score):
1. Item X, 0.98, 0.98, Non-Seeker
2. Item Z, 0.89, 0.93, Active
------------------------------------
3. Item Y, 0.91, 0.91, Non-Seeker
Divergence score: distance between the two ranking score distributions
Objective score: #Active in top N
11. Part II: Job Seekers
Learning to Rank. Fast and personalized.
13. Learning To Rank
Regular approach
– A data point is a pair: {Query, Document}
– Data label: “Is this document relevant for this query?”
Can be done by crowdsourcing
Job Search reality
– A data point is a triple: {Query, Job position, User}
– Data label: “Is this job relevant for this user who asked
this query?”
Depends on the user's location, industry, seniority…
Too much to ask from a random person
Have to collect labels from user signals
14. We use a simplified version of FairPairs
(Radlinski, Joachims AAAI'06)
Each adjacent pair of results is flipped with a 50% chance before display
Choose pairs where only the lower document is clicked
Save 1 positive (the clicked lower result, label 1) and 1 negative (the skipped upper result, label 0) for the labeled data set
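A minimal sketch of this simplified FairPairs scheme; the result ids are hypothetical, and the grouping into non-overlapping adjacent pairs is an assumption about the exact variant used:

```python
import random

def display_order(ranking, rng):
    """Flip each adjacent non-overlapping pair with 50% probability
    before showing results to the user (FairPairs randomization)."""
    shown = list(ranking)
    for i in range(0, len(shown) - 1, 2):
        if rng.random() < 0.5:
            shown[i], shown[i + 1] = shown[i + 1], shown[i]
    return shown

def extract_labels(shown, clicked):
    """Keep pairs where only the lower displayed result was clicked:
    the clicked lower result gets label 1, the skipped upper gets 0."""
    labels = []
    for i in range(0, len(shown) - 1, 2):
        upper, lower = shown[i], shown[i + 1]
        if lower in clicked and upper not in clicked:
            labels += [(lower, 1), (upper, 0)]
    return labels

shown = display_order(["a", "b", "c", "d"], random.Random(42))
labels = extract_labels(shown, clicked={"b"})
```

Because the flip only swaps within a pair, a click on the lower position cannot be explained by position bias alone, which is what makes these pairs fair training signal.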
15. FairPairs data is not enough for training
The user clicks or skips only whatever is shown
Bad results are not shown
So there will be no “really bad” negatives in the training data
We need to add them!
For queries with many results, add all results from the last page as
“easy negatives” (label 0)
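Continuing the sketch, augmenting the labeled set with last-page results might look like this (hypothetical helper):

```python
def add_easy_negatives(fairpairs_labels, last_page_results):
    """Augment FairPairs labels with "easy negatives": for queries
    with many results, everything on the last result page is assumed
    irrelevant and labeled 0."""
    return fairpairs_labels + [(doc, 0) for doc in last_page_results]
```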
16. Learning To Rank – Training a Model
Best models for LTR are complex ensembles of trees
– See results of the Yahoo! Learning to Rank '10 competition
– LambdaMART, BagBoo, Additive Groves, MatrixNet …
Complex models come at a cost
– Calculating predictions takes a long time
– Requires a lot of optimization; often used with multi-level
ranking
Can we train a simple model that will resemble a
complex one?
– Train a complex model
– Get insights on what it looks like
– Modify a simple model accordingly
17. Training a Simple Model using a Complex Model
Base simple model – logistic or linear regression
log( p / (1 − p) ) = β₀ + β₁x₁ + β₂x₂ + … + βₙxₙ
– Does not handle well features with non-linear effects
– Does not handle interactions (e.g., if-then-else rules)
Target complex model – Additive Groves
– (Sorokina, Caruana, Riedewald ECML'07)
– An ensemble that averages N groves of trees: (1/N)·(tree + … + tree) + … + (1/N)·(tree + … + tree)
– Comes with interaction detection and effect visualization tools
18. Improving LR – Feature Transformations
Additive Groves can model and visualize non-linear effects
Approximate the effect curve (average prediction vs. feature values)
with a polynomial transform T(x)
– anything simple will do
Apply T(x) to the original feature values
Now the feature effect is linear – the regression model will love it!
β₀ + β₁T(x₁) + β₂x₂ + … + βₙxₙ
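A small numpy sketch of the transform step; the effect curve here is synthetic (log-shaped) rather than one actually extracted from Additive Groves:

```python
import numpy as np

# Synthetic stand-in for an effect curve read off the complex model:
# average prediction as a function of one feature's raw values.
feature_values = np.linspace(0.0, 5.0, 50)
effect_curve = np.log1p(feature_values)  # non-linear effect

# Approximate the curve with a low-degree polynomial transform T(x);
# anything simple will do.
T = np.poly1d(np.polyfit(feature_values, effect_curve, deg=3))

# Feed T(x) to the regression model instead of the raw feature:
# the transformed feature now tracks the target almost linearly.
x_transformed = T(feature_values)
corr = np.corrcoef(x_transformed, effect_curve)[0, 1]
```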
19. Improving LR – Interaction Splits
Additive Groves' interaction detection tool produces a list of strong
interactions and corresponding joint effect plots
[Joint effect plot: average prediction vs. values of feature X1, one curve per value of X2]
The effect of X1 is stronger when X2 = 0
Simple regression will not capture this
Often such an X2 interacts with other features as well
Solution: split on X2 = ? and build separate models for its different values:
β₀ + β₁x₁ + … + βₙxₙ    |    α₀ + α₁x₁ + … + αₙxₙ
20. Improving LR – Tree with LR leaves and transforms
Both operations (effect transforms and interaction splits) can be
applied multiple times in any order
Resulting model – a simple tree with regression model leaves:
X2 = ?
├─ β₀ + β₁T(x₁) + … + βₙxₙ
└─ X10 < 0.1234 ?
   ├─ α₀ + α₁P(x₁) + … + αₙQ(xₙ)
   └─ γ₀ + γ₁R(x₁) + … + γₙQ(xₙ)
Gives a significant boost to the performance of the basic LR model
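Putting the pieces together, evaluating such a model might look like the sketch below; the split points, the transforms T, P, Q, R, and all coefficients are invented for illustration:

```python
import math

# Hypothetical per-feature transforms found from effect curves.
T = lambda x: x * x
P = lambda x: math.sqrt(abs(x))
Q = lambda x: math.log1p(abs(x))
R = lambda x: x ** 3

def tree_predict(x1, x2, x10):
    """Interaction splits route an example to a leaf; each leaf is a
    small linear model over (transformed) features."""
    if x2 == 0:
        return 0.1 + 0.8 * T(x1)                   # beta leaf
    if x10 < 0.1234:
        return -0.2 + 0.5 * P(x1) + 0.9 * Q(x10)   # alpha leaf
    return 0.4 + 0.2 * R(x1) - 0.1 * Q(x10)        # gamma leaf
```

The routing is cheap and each leaf is a plain linear model, which is why the whole thing stays fast at prediction time.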
21. TreeExtra package
A set of machine learning tools
– Additive Groves ensemble
– Interaction detection
– Effect and interaction visualization
http://additivegroves.net
– Created by Daria Sorokina while at Cornell, CMU, Yandex, and LinkedIn
from 2006 to 2013
26. Training data for the search spam classifier
Find the queries targeted by spammers.
– 10,000 most common non-name queries.
– Spammers love optimizing for [marketing]
– But not so much for [david smith]
Look at top results for a generic user.
– i.e., show unpersonalized search results.
Label data by crowdsourcing.
– Definition of spam is non-personalized
Train a model
– Spam scores are recalculated offline once in a while
– So the model complexity is not an issue
– Additive Groves works well. (Could use any ensemble of trees)
…This meaningfully contributes to the growth of our three diverse revenue streams – Talent Solutions, Marketing Solutions, and Premium Subscriptions.

Talent Solutions: As the world's largest professional network, LinkedIn is the single best place to connect with passive and high-quality active job candidates. LinkedIn Talent Solutions improves the efficiency of recruiting the best at scale, giving recruiting teams a competitive edge in the war for talent. [LinkedIn uniquely possesses an unprecedented wealth of accurate and up-to-date professional information, the full extent of which can only be accessed through LinkedIn Talent Solutions. We provide the tools that enable recruiting teams to understand their target audience, position their company as the employer of choice, engage with relevant, high-quality candidates at scale, and accurately measure their results.]

Marketing Solutions: LinkedIn Marketing Solutions helps advertisers and marketers reach influential, affluent and highly educated audiences in a very relevant and engaging way. LinkedIn has the most valuable audience by composition anywhere on the internet. This global and unique asset gets products and services in front of the right professional at the right time. [It has three competitive advantages over any other online platform – scale, accuracy, and portfolio. Scale: over 200 million members who are all professionals; one of the most influential, affluent and highly educated audiences on the Web; more decision makers, higher average household incomes, and more college or post-college graduates than the U.S. visitors of many leading business websites. Accuracy: rich profile-based targeting allows advertisers to reach very specific audiences; targeting includes geography, job function, industry, company size, seniority, age, gender, company name, LinkedIn Group and even job title. Portfolio: a suite of high-impact and engaging products helps advertisers get their messages across, from text-based ads to massive display campaigns to socially driven branding opportunities like Company Pages and Groups.]

Premium Subscriptions: LinkedIn Premium Subscriptions are tailored for an array of member needs and segments, providing the right tools to make customers more productive and successful in their careers.

The aim of our three business lines, and of all the products and services we develop, is to connect talent and opportunity at massive scale, and to make the world's professionals more productive and successful. In doing so, we aspire to create economic opportunity for every professional in the world.
So, here is a high-level overview of TalentMatch. Someone comes to the site and posts a job. We then scour the entire member database looking for the members who best match that job, and we recommend a ranked list of those members to the job poster.
This is how we do the matching. We combine the job and the candidate into a single feature vector, where each feature denotes a similarity measure between attributes of the job and attributes of the candidate, and then we find the relative importance of these features using a supervised learning method like logistic regression trained on a click signal such as job applications. This gives us a model that knows how to differentiate good job-member pairs from bad job-member pairs.
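Sketching that pipeline, with invented similarity features and made-up logistic regression weights (the real model uses many more attribute similarities):

```python
import math

def jaccard(a, b):
    """Word-overlap similarity between two short texts."""
    a, b = set(a.lower().split()), set(b.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0

def pair_features(job, candidate):
    """Combine a job and a candidate into one similarity feature vector."""
    return [
        jaccard(job["title"], candidate["title"]),
        jaccard(job["description"], candidate["summary"]),
        1.0 if job["geo"] == candidate["geo"] else 0.0,
    ]

def match_score(weights, bias, features):
    """Logistic regression score, trained offline on a click signal
    such as job applications (weights here are hypothetical)."""
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

job = {"title": "senior data scientist",
       "description": "machine learning search ranking", "geo": "SF"}
member = {"title": "data scientist",
          "summary": "search ranking models", "geo": "SF"}
score = match_score([2.0, 1.5, 0.5], -1.0, pair_features(job, member))
```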
Let’s go over the facets of the utility function of the TalentMatch system. First, the snippet needs to be good enough to convince the job poster to purchase the recommendations. That’s the booking rate. Then, once purchased, the job poster gets to look at the full profile of the recommended candidate and decides whether or not they are indeed a good match for the job. If so, the job poster may then decide to email the candidate regarding the job opportunity. That’s the email rate. Finally, if the candidate is interested, the candidate will reply positively to the job poster, giving us the reply rate. Now that the link is established, they can take it from there. But from our perspective, these 3 steps are required for there to be relevant engagement within this system.

Out of the 3 facets of the utility function, the reply rate was identified as needing improvement. Job posters were complaining that they were emailing candidates, but the candidates were not replying enough. This was the problem we needed to solve. We figured the booking rate and the email rate were well accounted for by the existing TalentMatch model, but even if someone is a great match for the job, that does not mean they are going to reply. So, we thought that maybe people were not replying because they were probably not looking for a job. What if we could determine whether someone was a job seeker, and then include more of those people in the recommendations?
So, we had already developed a model that computes the job-seeking propensity for each member, and we affectionately refer to this model as Flightmeter. It turns out that many people who are open to new opportunities do not self-identify as job seekers, so this model helps us identify those people. You can think of the job-seeking propensity as the probability that the member will switch positions in the next month. We also output a segmentation of this probability into actives, passives, and non-job-seekers, and we consider actives and passives to have a high job-seeking intent.

This Flightmeter model is completely different from the TalentMatch model. It is a survival model where the entity whose survival we’re analyzing is a job, or more specifically, a position. Based on data derived from the lifetimes of millions of positions, we model the duration of a position as a function of various features in what is known as an accelerated failure time model, and this allows us to compute the probability that a given position will end within the next month.
There are many signals that we can use to compute the job-seeking intent. We may have the user’s job-seeking activity on the site: are they searching or applying for jobs? Those are obvious signals. But we have others. For example, we know that different industries have different attrition rates. This plot includes a few representative industries and their survival curves. The survival curve gives the probability that someone will still be at their position X months down the road if they start that position today. These are survival curves for a few of the most extreme industries: some of the most hazardous, including “political organization” and “animation”, and some of the least hazardous, including “alternative medicine” and “ranching”. In the “political organization” industry, which is the red line at the bottom, more than 50% of people don’t last 2 years in a given position.
So, intuitively, it makes sense to suggest users who are job seekers in TalentMatch. And we confirmed our intuition: we ran the numbers and saw that users with a high job-seeking intent (actives and passives) have a much higher reply rate on career-related emails than non-job-seekers (16 times the reply rate). This is exactly the facet of the TalentMatch utility function that we are interested in improving. So, what we want to do is incorporate the job seeker intent into the TalentMatch model, and we want to do so without negatively affecting the booking rate and the email rate.
So, what we want is a controlled perturbation of the ranking output by the TalentMatch model, and this is how we are going to do it: given the TalentMatch ranking, we run a perturbation function on it that generates another ranking, the perturbed ranking, which optimizes for a metric we’re interested in (in the case of TalentMatch, it’s the number of users with high job-seeking intent in the top-12 recommendations). Given the 2 rankings and their distributions of match scores, we can compute the distance between them using a variety of metrics, for example KL divergence or Euclidean distance. This divergence score is what will help us make sure we are not negatively affecting the quality of the recommendations. Notice how, in the perturbed ranking, item Z was bumped from its original third position, below the cutoff line, to the second position; whereas before we had 2 non-seekers above the cutoff, meaning they would be recommended, now we have a non-seeker and an active. Also notice that the perturbation is minimal. We should feel comfortable bumping item Z to the second position, but not to the first position.

There are then 3 functions that we need to define: the perturbation function, the divergence function, and the objective function. The parameters of the perturbation function are what we will be estimating, based on the performance established by the divergence and objective measures: we want high scores on the objective and low scores on the divergence.
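Using the example items from the slide, the three functions might be sketched as follows; the additive boost perturbation, its parameter value, and the Euclidean divergence are illustrative choices, not the production implementation:

```python
import math

def objective(ranking, n):
    """Number of high-intent (active/passive) members in the top n."""
    return sum(1 for item in ranking[:n] if item["seeker"])

def divergence(original, perturbed):
    """Euclidean distance between the two rankings' position-wise score
    distributions (KL divergence would be another reasonable choice)."""
    return math.sqrt(sum((a["score"] - b["score"]) ** 2
                         for a, b in zip(original, perturbed)))

def perturb(ranking, boost):
    """Re-rank by match score plus a boost for high-intent members;
    `boost` is the parameter tuned against objective and divergence."""
    rescored = [dict(item, score=item["score"] + (boost if item["seeker"] else 0.0))
                for item in ranking]
    return sorted(rescored, key=lambda item: item["score"], reverse=True)

ranking = [
    {"name": "X", "score": 0.98, "seeker": False},
    {"name": "Y", "score": 0.91, "seeker": False},
    {"name": "Z", "score": 0.89, "seeker": True},
]
perturbed = perturb(ranking, boost=0.04)
```

With this boost, item Z moves above item Y as in the slide’s example, the objective (actives above the cutoff) improves, and the divergence stays small.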