Recruiters, Job Seekers and Spammers:
Innovations in Job Search at LinkedIn


          Daria Sorokina
          Senior Data Scientist
          LinkedIn
Part I: Recruiters

“Multiple Objective Optimization in Recommendation
Systems”, Mario Rodriguez, Christian Posse, Ethan
Zhang. RecSys '12
TalentMatch
[Diagram: Job Posting and Member Profiles feed into TalentMatch, which outputs Ranked Talent]
TalentMatch Model

Job Posting fields: title, geo, company, industry, description, functional area, …
Candidate fields:
  General: expertise, specialties, education, headline, geo, experience
  Current Position: title, summary, tenure length, industry, functional area, …

Text similarity features are computed between the job posting and the candidate.
The model can be trained on user activity signals like job ad clicks or job applications.
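As a rough illustration of what a text similarity feature between a job posting field and a candidate field could look like, here is a minimal sketch using bag-of-words cosine similarity. The slides do not say which similarity measure TalentMatch uses, so the measure, the tokenizer, and the names `cosine_similarity`, `similarity_features`, and `field_pairs` are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercase word tokens; a deliberately simple tokenizer (assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the term-count vectors of two strings."""
    a, b = Counter(tokens(text_a)), Counter(tokens(text_b))
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_features(job, candidate, field_pairs):
    """One feature per (job field, candidate field) pair; missing fields count as empty."""
    return {
        f"{job_field}~{cand_field}": cosine_similarity(
            job.get(job_field, ""), candidate.get(cand_field, "")
        )
        for job_field, cand_field in field_pairs
    }


# Hypothetical records and field pairs, not taken from the slides.
job = {"title": "Senior Data Scientist", "description": "Build ranking models for search"}
candidate = {"headline": "Data Scientist", "summary": "I build search ranking models"}
features = similarity_features(
    job, candidate, [("title", "headline"), ("description", "summary")]
)
```

Each resulting value lies between 0 and 1 and could be fed to the ranking model alongside other features.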
TalentMatch Utility = fn(email rate, reply rate)


[Diagram: Recruiter and candidate ("Job seeker?") connected by Email Rate and Reply Rate; Reply Rate is marked "Problem!"]
Job Seeker Intent
[Diagram: member segments by job-seeking intent: NON-JOB-SEEKER, PASSIVE, ACTIVE]

Model: time till the job change
o   How long will this person stay in this job after this date?
o   Trained on past job positions from our users' profiles
o   Accelerated failure time (AFT) model
o   $T_i = \exp\left(\sum_k \beta_k x_{ik} + \sigma \varepsilon_i\right)$
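A minimal sketch of the AFT idea, under strong simplifying assumptions: taking logs of the model gives log T = Σ β x + σ ε, so with a single covariate, normally distributed ε, and no censoring, the coefficients can be estimated by ordinary least squares on log T. The slides do not state the error distribution or how positions that have not ended yet (censored observations) are handled; real tenure data would need a proper survival likelihood. All names and numbers below are made up for illustration.

```python
import math
import random


def simulate_aft(xs, beta0, beta1, sigma, rng):
    """Draw T_i = exp(beta0 + beta1 * x_i + sigma * eps_i) with eps_i ~ N(0, 1)."""
    return [math.exp(beta0 + beta1 * x + sigma * rng.gauss(0.0, 1.0)) for x in xs]


def fit_log_linear_aft(xs, times):
    """Least-squares fit of log(T) on x. Valid only for uncensored durations."""
    ys = [math.log(t) for t in times]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta1 = sxy / sxx
    beta0 = mean_y - beta1 * mean_x
    rss = sum((y - beta0 - beta1 * x) ** 2 for x, y in zip(xs, ys))
    sigma = math.sqrt(rss / (n - 2))
    return beta0, beta1, sigma


rng = random.Random(0)
xs = [rng.uniform(0.0, 10.0) for _ in range(5000)]  # hypothetical covariate
times = simulate_aft(xs, beta0=1.0, beta1=0.1, sigma=0.4, rng=rng)
beta0_hat, beta1_hat, sigma_hat = fit_log_linear_aft(xs, times)
```

A positive coefficient stretches the expected time until the job change; a negative one shortens it.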
Job-Seeker Feature Example: Attrition by Industry
[Plot: attrition by industry; axes: Probability vs. Time]
TalentMatch Utility = fn(email rate, reply rate)

Job-Seeking Intent: 16x reply rate on career-related mail
How: Controlled Re-ranking

Talent Match ranking (rank, item, match score, intent):
1, Item X, 0.98, Non-Seeker
2, Item Y, 0.91, Non-Seeker
---------------------------------------
3, Item Z, 0.89, Active

Re-ranking function f(): optimize for both
  – Divergence score (between the ranking score distributions)
  – Objective score: #Active in top N

Improved ranking (rank, item, match score, re-ranking score, intent):
1, Item X, 0.98, 0.98, Non-Seeker
2, Item Z, 0.89, 0.93, Active
--------------------------------------------
3, Item Y, 0.91, 0.91, Non-Seeker
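The example ranking above can be reproduced with a very simple re-ranking function. This is only a sketch: the slides do not define f(), so the additive boost for active job seekers, the value 0.04, and the mean absolute score shift used as a crude stand-in for the divergence score are all assumptions.

```python
def rerank(candidates, boost):
    """Add `boost` to the match score of active job seekers, then sort by the new score.

    candidates: list of (item, match_score, intent) tuples.
    Returns a list of (item, match_score, rerank_score, intent) tuples.
    """
    rescored = [
        (item, score, score + boost if intent == "Active" else score, intent)
        for item, score, intent in candidates
    ]
    return sorted(rescored, key=lambda row: row[2], reverse=True)


def active_in_top_n(ranking, n):
    """Objective score from the slide: number of active seekers in the top n."""
    return sum(1 for row in ranking[:n] if row[-1] == "Active")


def mean_score_shift(reranked):
    """Crude divergence proxy (assumption): mean absolute change in score."""
    return sum(abs(new - old) for _, old, new, _ in reranked) / len(reranked)


original = [
    ("Item X", 0.98, "Non-Seeker"),
    ("Item Y", 0.91, "Non-Seeker"),
    ("Item Z", 0.89, "Active"),
]
improved = rerank(original, boost=0.04)  # hypothetical boost value
```

A larger boost puts more active seekers near the top but moves the scores further from the original match scores; the trade-off between the two is what "optimize for both" refers to.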
Part II: Job Seekers
Learning to Rank. Fast and personalized.
Job Search.
Query “Data Scientist LinkedIn”
Learning To Rank
o Regular approach
  – A data point is a pair: {Query, Document}
  – Data label: “Is this document relevant for this query?”
      o Can be done by crowdsourcing

o Job Search reality
  – A data point is a triple: {Query, Job position, User}
  – Data label: “Is this job relevant for this user who asked this query?”
      o Depends on the user's location, industry, seniority…
      o Too much to ask from a random person
      o Have to collect labels from user signals
We use a simplified version of FairPairs
(Radlinski, Joachims, AAAI '06)
o Each pair is flipped with a 50% chance
o Choose pairs where only the lower document is clicked
o Save 1 positive (lower) and 1 negative (upper) result for the labeled data set
[Figure: result list split into adjacent pairs, each marked "flipped" or "not flipped"; in a pair where only the lower result is clicked, the upper result gets label 0 and the lower result gets label 1]
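The pairing and labeling steps can be sketched as follows. This is an illustration of the simplified scheme as described on the slide, not the production implementation; the function names and the way clicks are represented (a set of clicked documents) are assumptions.

```python
import random


def present_with_fair_pairs(ranked_docs, rng):
    """Group adjacent results into pairs and flip each pair with probability 0.5.

    Returns the list actually shown to the user and the (upper, lower) pairs
    in their shown order. An unpaired last result is shown but not paired.
    """
    shown, pairs = [], []
    for i in range(0, len(ranked_docs) - 1, 2):
        upper, lower = ranked_docs[i], ranked_docs[i + 1]
        if rng.random() < 0.5:
            upper, lower = lower, upper
        shown.extend([upper, lower])
        pairs.append((upper, lower))
    if len(ranked_docs) % 2 == 1:
        shown.append(ranked_docs[-1])
    return shown, pairs


def label_pairs(pairs, clicked):
    """Keep pairs where only the lower document was clicked.

    The clicked lower document becomes a positive (label 1) and the
    skipped upper document becomes a negative (label 0).
    """
    examples = []
    for upper, lower in pairs:
        if lower in clicked and upper not in clicked:
            examples.append((lower, 1))
            examples.append((upper, 0))
    return examples


rng = random.Random(0)
docs = ["job_a", "job_b", "job_c", "job_d", "job_e"]  # hypothetical result list
shown, pairs = present_with_fair_pairs(docs, rng)
```

Because each pair is flipped at random, a click on the lower document cannot be explained by position alone, which is what makes the preference usable as a label.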
Fair Pairs data is not enough for training

o The user clicks or skips only whatever is shown
o Bad results are not shown
o So there will be no “really bad” negatives in the training data
o We need to add them!

o For queries with many results, add all results from the last page as “easy negatives”
[Figure: every result on the last page is assigned label 0]
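A short sketch of the "easy negatives" step. The slides do not say how many results count as "many" or what the page size is, so `page_size` and `min_pages` are hypothetical parameters.

```python
import math


def easy_negatives(results, page_size, min_pages):
    """Label every result on the last page as a negative (label 0).

    Returns an empty list when the query has fewer than `min_pages` pages,
    since the last page of a short result list may still be relevant.
    """
    n_pages = math.ceil(len(results) / page_size)
    if n_pages < min_pages:
        return []
    last_page_start = (n_pages - 1) * page_size
    return [(doc, 0) for doc in results[last_page_start:]]


# Hypothetical query with 95 results, 10 per page.
negatives = easy_negatives(list(range(95)), page_size=10, min_pages=5)
```

These examples are appended to the FairPairs examples so the model also sees clearly irrelevant results.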
Learning To Rank – Training a Model

o Best models for LTR are complex ensembles of trees
   – See results of the Yahoo Learning to Rank '10 competition
   – LambdaMART, BagBoo, Additive Groves, MatrixNet …

o Complex models come at a cost
   – Predictions take a long time to compute
   – Requires a lot of optimization, often used with multi-level ranking

o Can we train a simple model that will resemble a complex one?
   – Train a complex model
   – Get insights on what it looks like
   – Modify a simple model accordingly
Training a Simple Model using a Complex Model

o Base simple model – logistic or linear regression
    $\log\frac{p}{1-p} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n$
    – Does not handle features with non-linear effects well
    – Does not handle interactions (e.g., if-then-else rules)

o Target complex model – Additive Groves
    – (Sorokina, Caruana, Riedewald, ECML '07)
    [Figure: (1/N)·(tree + … + tree) + (1/N)·(tree + … + tree) + … + (1/N)·(tree + … + tree), an average of N groves, each a sum of trees]
    – Comes with interaction detection and effect visualization tools
Improving LR – Feature Transformations

o Additive Groves can model and visualize non-linear effects
o Approximate the effect curve with a polynomial transform T(x)
   – anything simple will do
o Apply T(x) to the original feature values
[Plot: average prediction vs. feature values (non-linear effect curve)]

o Now the feature effect is linear
o Regression model will love it!
    $\beta_0 + \beta_1 T(x_1) + \beta_2 x_2 + \dots + \beta_n x_n$
[Plot: average prediction vs. T(x) values (linear effect)]
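A minimal sketch of the transformation step, assuming the effect curve has already been read off the complex model as (feature value, average prediction) points. The saturating curve below is invented for illustration, and the small least-squares `polyfit` helper is written out by hand only to keep the example dependency-free.

```python
import math


def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients [c0, c1, ...] via the normal equations."""
    m = degree + 1
    # Augmented matrix [A^T A | A^T y] for the monomial basis 1, x, x^2, ...
    rows = [
        [sum(x ** (i + j) for x in xs) for j in range(m)]
        + [sum(y * x ** i for x, y in zip(xs, ys))]
        for i in range(m)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(m):
            if r != col:
                factor = rows[r][col] / rows[col][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [rows[i][m] / rows[i][i] for i in range(m)]


def apply_poly(coeffs, x):
    """Evaluate T(x) = c0 + c1 * x + c2 * x^2 + ..."""
    return sum(c * x ** k for k, c in enumerate(coeffs))


def pearson(us, vs):
    """Pearson correlation, used here to measure how linear an effect is."""
    n = len(us)
    mu, mv = sum(us) / n, sum(vs) / n
    cov = sum((u - mu) * (v - mv) for u, v in zip(us, vs))
    var_u = sum((u - mu) ** 2 for u in us)
    var_v = sum((v - mv) ** 2 for v in vs)
    return cov / math.sqrt(var_u * var_v)


# Hypothetical effect curve: average prediction saturates as the feature grows.
xs = [i / 50 for i in range(51)]
effect = [1.0 - math.exp(-4.0 * x) for x in xs]

coeffs = polyfit(xs, effect, degree=2)
transformed = [apply_poly(coeffs, x) for x in xs]

corr_before = pearson(xs, effect)           # raw feature vs. effect
corr_after = pearson(transformed, effect)   # T(x) vs. effect
```

After the transform, the relationship between T(x) and the average prediction is much closer to a straight line, so a single regression coefficient on T(x) can capture it.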
Improving LR – Interaction Splits
o Additive Groves' interaction detection tool produces a list of strong interactions and corresponding joint effect plots
[Plot: average prediction vs. values of feature X1, with separate curves for different values of X2 (the X2=1 curve is labeled)]

o Effect of X1 is stronger when X2 = 0
o Simple regression will not capture this
o Often such X2 interacts with other features as well

Solution: Build separate models for different values of X2
[Diagram: split on X2=? into two leaf models, $\beta_0 + \beta_1 x_1 + \dots + \beta_n x_n$ and $\alpha_0 + \alpha_1 x_1 + \dots + \alpha_n x_n$]
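The split can be sketched with one feature and one binary interaction variable: fit a separate linear model for each value of X2. The data below is synthetic and chosen so that the effect of X1 is stronger when X2 = 0; function names are illustrative.

```python
def fit_line(xs, ys):
    """Simple least-squares fit; returns (intercept, slope)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_split_models(rows):
    """Fit one linear model per value of x2. rows: (x1, x2, y) tuples."""
    groups = {}
    for x1, x2, y in rows:
        groups.setdefault(x2, []).append((x1, y))
    return {
        x2: fit_line([p[0] for p in points], [p[1] for p in points])
        for x2, points in groups.items()
    }


def predict(models, x1, x2):
    """Route the example to the model for its x2 value."""
    intercept, slope = models[x2]
    return intercept + slope * x1


# Synthetic data: slope 3.0 when x2 == 0, slope 0.5 when x2 == 1.
rows = [(i / 10, 0, 1.0 + 3.0 * (i / 10)) for i in range(10)]
rows += [(i / 10, 1, 1.0 + 0.5 * (i / 10)) for i in range(10)]
models = fit_split_models(rows)
```

A single pooled regression would average the two slopes; the split lets each leaf keep its own.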
Improving LR – Tree with LR leaves and transforms

o Both operations (effect transforms and interaction splits) can be applied multiple times in any order
o Resulting model – a simple tree with regression model leaves

[Diagram: root split on X2=?
   one branch: leaf $\beta_0 + \beta_1 T(x_1) + \dots + \beta_n x_n$
   other branch: split on X10 < 0.1234 ?
      leaf $\alpha_0 + \alpha_1 P(x_1) + \dots + \alpha_n Q(x_n)$
      leaf $\gamma_0 + \gamma_1 R(x_1) + \dots + \gamma_n Q(x_n)$]

o Gives a significant boost to the performance of the basic LR model
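One possible way to represent such a model in code is a small tree whose leaves are linear scorers with optional per-feature transforms. The structure mirrors the diagram, but every coefficient, transform, and the branch assignment of the X2 split are made up for illustration; for a logistic leaf the returned score would be the log-odds.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Dict, Union


def identity(value):
    return value


@dataclass
class Leaf:
    """Linear model over (optionally transformed) features."""
    intercept: float
    weights: Dict[str, float]
    transforms: Dict[str, Callable[[float], float]] = field(default_factory=dict)

    def score(self, x):
        return self.intercept + sum(
            w * self.transforms.get(name, identity)(x[name])
            for name, w in self.weights.items()
        )


@dataclass
class Split:
    """Internal node: route the example by a boolean test."""
    test: Callable[[dict], bool]
    if_true: Union["Split", Leaf]
    if_false: Union["Split", Leaf]


def predict(node, x):
    """Walk down the tree to a leaf, then apply that leaf's linear model."""
    while isinstance(node, Split):
        node = node.if_true if node.test(x) else node.if_false
    return node.score(x)


# Hypothetical tree shaped like the diagram: X2 split, then X10 < 0.1234.
tree = Split(
    test=lambda x: x["x2"] == 0,
    if_true=Leaf(0.5, {"x1": 2.0, "x3": -1.0}, {"x1": math.sqrt}),
    if_false=Split(
        test=lambda x: x["x10"] < 0.1234,
        if_true=Leaf(1.0, {"x1": 0.5}, {"x1": lambda v: v * v}),
        if_false=Leaf(-1.0, {"x1": 1.0, "x3": 2.0}),
    ),
)
```

Prediction costs a couple of comparisons plus one dot product, which is the speed advantage over evaluating a large tree ensemble.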
TreeExtra package

o A set of machine learning tools
   – Additive Groves ensemble
   – Interaction detection
   – Effect and interaction visualization

o http://additivegroves.net
   – Created by Daria Sorokina while at Cornell, CMU, Yandex, and LinkedIn from 2006 to 2013
Part III: Spammers

Fighting black SEO
Search Spam
Training data for the search spam classifier
o Find the queries targeted by spammers.
    – 10,000 most common non-name queries.
    – Spammers love optimizing for [marketing]
    – But not so much for [david smith]

o Look at top results for a generic user.
    – i.e., show unpersonalized search results.

o Label data by crowdsourcing.
    – Definition of spam is non-personalized

o Train a model
    – Spam scores are recalculated offline once in a while
    – So the model complexity is not an issue
    – Additive Groves works well. (Could use any ensemble of trees)
ROC curve. Choosing thresholds.


[Plot: ROC curve on axes from 0 to 1, annotated with spam score thresholds a and b, where 0 < a < b < 1]
Integrating the Spam Score into Relevance

o Spam model yields a probability between 0 and 1.

o Convert spam score into a factor
    – [0.0 <= score <= a]
         o not a spammer,
         o factor = 1.0
    – [b <= score <= 1.0]
         o Spammer
         o factor = 0.0
    – [a <= score <= b]
         o Suspicious
         o linearly scale score from [a, b] to [1, 0]

o Multiply relevance score by factor
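The conversion described above can be written as a short piecewise-linear function. The threshold values in the example are hypothetical; the slides only require 0 < a < b < 1.

```python
def spam_factor(score, a, b):
    """Map a spam probability to a relevance multiplier.

    score <= a       -> 1.0 (not a spammer)
    score >= b       -> 0.0 (spammer)
    a < score < b    -> linear scale from 1.0 at a down to 0.0 at b
    """
    if not 0.0 < a < b < 1.0:
        raise ValueError("thresholds must satisfy 0 < a < b < 1")
    if score <= a:
        return 1.0
    if score >= b:
        return 0.0
    return (b - score) / (b - a)


def adjusted_relevance(relevance, score, a, b):
    """Multiply the relevance score by the spam factor."""
    return relevance * spam_factor(score, a, b)


# Hypothetical thresholds a = 0.3 and b = 0.7.
suspicious = adjusted_relevance(2.0, 0.5, a=0.3, b=0.7)  # halfway between a and b
```

Results that are clearly spam drop to a relevance of zero, while suspicious results are demoted in proportion to their spam score.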
We are hiring!

Finology Group – Insurtech Innovation Award 2024
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 

Recruiters, Job Seekers and Spammers: Innovations in Job Search at LinkedIn

  • 1. Recruiters, Job Seekers and Spammers: Innovations in Job Search at LinkedIn Daria Sorokina Senior Data Scientist LinkedIn
  • 2. Part I: Recruiters. “Multiple Objective Optimization in Recommendation Systems”, Mario Rodriguez, Christian Posse, Ethan Zhang. RecSys'12
  • 4. TalentMatch. [Diagram: Job Posting + Member Profiles → Talent Match → Ranked Talent]
  • 5. TalentMatch Model. Job Posting: title, geo, company, industry, description, functional area, … Candidate (General): expertise, specialties, education, headline, geo, experience. Candidate (Current Position): title, summary, tenure length, industry, functional area, … Job posting and candidate fields are compared through text similarity features. The model can be trained on user activity signals like job ad clicks or job applications.
  • 6. TalentMatch Utility = fn(email rate, reply rate). [Diagram: Recruiter → email rate → Job seeker? → reply rate → Recruiter. Problem: the reply rate.]
  • 7. Job Seeker Intent. Segments: NON-JOB-SEEKER, PASSIVE, ACTIVE. Model: time till the job change. How long will this person stay in this job after this date? Trained on past job positions from our users' profiles. Accelerated failure time (AFT) model: $T_i = \exp\left(\sum_k \beta_k x_{ik} + \sigma \varepsilon_i\right)$
  • 9. TalentMatch Utility = fn(email rate, reply rate). Job-Seeking Intent: 16x reply rate on career-related mail.
  • 10. How: Controlled Re-ranking
      Talent Match ranking (rank, item, match score, segment):
        1, Item X, 0.98, Non-Seeker
        2, Item Y, 0.91, Non-Seeker
        --------------------------------------- (cutoff)
        3, Item Z, 0.89, Active
      Re-ranking function f(): optimize for both the divergence score (computed from the ranking score distributions) and the objective score (#Active in top N).
      Improved ranking (rank, item, match score, reranking score, segment):
        1, Item X, 0.98, 0.98, Non-Seeker
        2, Item Z, 0.89, 0.93, Active
        --------------------------------------- (cutoff)
        3, Item Y, 0.91, 0.91, Non-Seeker
  • 11. Part II: Job Seekers Learning to Rank. Fast and personalized.
  • 12. Job Search. Query “Data Scientist LinkedIn”
  • 13. Learning To Rank. Regular approach: a data point is a pair {Query, Document}; the data label is “Is this document relevant for this query?”, which can be collected by crowdsourcing. Job Search reality: a data point is a triple {Query, Job position, User}; the data label is “Is this job relevant for this user who asked this query?” That depends on the user's location, industry, seniority… It is too much to ask from a random person, so we have to collect labels from user signals.
  • 14. We use a simplified version of FairPairs (Radlinski, Joachims, AAAI'06). Each pair is flipped with a 50% chance. Choose pairs where only the lower document is clicked. Save 1 positive (lower, label 1) and 1 negative (upper, label 0) result for the labeled data set. [Figure: example result lists with flipped and not-flipped pairs, clicked results marked.]
  • 15. Fair Pairs data is not enough for training. The user clicks or skips only whatever is shown. Bad results are not shown, so there will be no “really bad” negatives in the training data. We need to add them! For queries with many results, add all results from the last page as “easy negatives” (label 0).
  • 16. Learning To Rank – Training a Model. The best models for LTR are complex ensembles of trees: see the results of the Yahoo Learning to Rank '10 competition (LambdaMART, BagBoo, Additive Groves, MatrixNet, …). Complex models come at a cost: predictions take a long time to calculate, and the models require a lot of optimization and are often used with multi-level ranking. Can we train a simple model that will resemble a complex one? Train a complex model, get insights on what it looks like, and modify a simple model accordingly.
  • 17. Training a Simple Model using a Complex Model. Base simple model: logistic or linear regression, $\log\frac{p}{1-p} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n$. It does not handle features with non-linear effects well, and it does not handle interactions (e.g., if-then-else rules). Target complex model: Additive Groves (Sorokina, Caruana, Riedewald, ECML'07). [Diagram: (1/N)·(tree + … + tree) + … + (1/N)·(tree + … + tree)] It comes with interaction detection and effect visualization tools.
  • 18. Improving LR – Feature Transformations. Additive Groves can model and visualize non-linear effects. Approximate the effect curve (average prediction vs. feature values) with a polynomial transform T(x); anything simple will do. Apply T(x) to the original feature values. Now the feature effect (average prediction vs. T(x) values) is linear. The regression model will love it! $\beta_0 + \beta_1 T(x_1) + \beta_2 x_2 + \dots + \beta_n x_n$
  • 19. Improving LR – Interaction Splits. Additive Groves' interaction detection tool produces a list of strong interactions and corresponding joint effect plots. [Plot: average prediction vs. values of feature X1, one curve for X2 = 0 and one for X2 = 1.] The effect of X1 is stronger when X2 = 0. Simple regression will not capture this. Often such an X2 interacts with other features as well. Solution: split on X2 and build separate models for different values of X2: $\beta_0 + \beta_1 x_1 + \dots + \beta_n x_n$ and $\alpha_0 + \alpha_1 x_1 + \dots + \alpha_n x_n$
  • 20. Improving LR – Tree with LR leaves and transforms. Both operations (effect transforms and interaction splits) can be applied multiple times in any order. Resulting model: a simple tree with regression model leaves. [Diagram: the root splits on X2; one branch is the leaf $\beta_0 + \beta_1 T(x_1) + \dots + \beta_n x_n$; the other branch splits on X10 < 0.1234, with leaves $\alpha_0 + \alpha_1 P(x_1) + \dots + \alpha_n Q(x_n)$ and $\gamma_0 + \gamma_1 R(x_1) + \dots + \gamma_n Q(x_n)$.] This gives a significant boost to the performance of the basic LR model.
  • 21. TreeExtra package. A set of machine learning tools: Additive Groves ensemble, interaction detection, effect and interaction visualization. http://additivegroves.net. Created by Daria Sorokina while at Cornell, CMU, Yandex, and LinkedIn from 2006 to 2013.
  • 26. Training data for the search spam classifier. Find the queries targeted by spammers: the 10,000 most common non-name queries. Spammers love optimizing for [marketing], but not so much for [david smith]. Look at top results for a generic user, i.e., show unpersonalized search results. Label data by crowdsourcing; the definition of spam is non-personalized. Train a model: spam scores are recalculated offline once in a while, so the model complexity is not an issue. Additive Groves works well. (Could use any ensemble of trees.)
  • 27. ROC curve. Choosing thresholds. [Plot: ROC curve with two spam score thresholds a and b marked, 0 < a < b < 1.]
  • 28. Integrating the Spam Score into Relevance. The spam model yields a probability between 0 and 1. Convert the spam score into a factor: [0.0 <= score <= a] → not a spammer → factor = 1.0; [b <= score <= 1.0] → spammer → factor = 0.0; [a <= score <= b] → suspicious → linearly scale score from [a, b] to [1, 0]. Multiply the relevance score by the factor.
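
The conversion on slide 28 can be sketched in a few lines of Python. This is only an illustration of the piecewise rule as stated on the slide: the function names and the threshold values a = 0.3 and b = 0.7 are hypothetical, since the slides only require 0 < a < b < 1.

```python
def spam_factor(score: float, a: float, b: float) -> float:
    """Map a spam probability in [0, 1] to a multiplicative factor in [0, 1]."""
    if not 0.0 < a < b < 1.0:
        raise ValueError("thresholds must satisfy 0 < a < b < 1")
    if score <= a:      # not a spammer: leave relevance untouched
        return 1.0
    if score >= b:      # spammer: relevance is zeroed out
        return 0.0
    # suspicious: linearly scale the score from [a, b] to [1, 0]
    return (b - score) / (b - a)


def adjusted_relevance(relevance: float, score: float, a: float, b: float) -> float:
    """Multiply the relevance score by the spam factor."""
    return relevance * spam_factor(score, a, b)


# Hypothetical thresholds, chosen for illustration only.
A, B = 0.3, 0.7
clean = adjusted_relevance(2.0, 0.1, A, B)       # factor 1.0
suspicious = adjusted_relevance(2.0, 0.5, A, B)  # factor halfway between 1 and 0
spammer = adjusted_relevance(2.0, 0.9, A, B)     # factor 0.0
```

Because the factor is continuous at both thresholds, a small change in the spam score never causes a sudden jump in the final relevance.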

Editor's Notes

  1. …This meaningfully contributes to the growth of our three diverse revenue streams – Talent Solutions, Marketing Solutions, and Premium Subscriptions. Talent Solutions: As the world’s largest professional network, LinkedIn is the single best place to connect with passive and high-quality active job candidates. LinkedIn Talent Solutions improves the efficiency of recruiting the best at scale, giving recruiting teams a competitive edge in the war for talent. [LinkedIn uniquely possesses an unprecedented wealth of accurate and up-to-date professional information, the full extent of which can only be accessed through LinkedIn Talent Solutions. We provide the tools that enable recruiting teams to understand their target audience, position their company as the employer of choice, engage with relevant, high-quality candidates at scale and accurately measure their results.] Marketing Solutions: LinkedIn Marketing Solutions helps advertisers and marketers reach influential, affluent and highly-educated audiences in a very relevant and engaging way. LinkedIn has the most valuable audience by composition anywhere on the internet. This global and unique asset gets products and services in front of the right professional at the right time. [LinkedIn Marketing Solutions helps advertisers/marketers reach influential, affluent and highly-educated audiences in a very relevant and engaging way. It has three competitive advantages over any other online platform: Scale, Accuracy, and Portfolio. Scale: Over 200 million members that are all professionals; one of the most influential, affluent and highly educated audiences on the Web; more decision makers, higher average household incomes, and more college or post-college graduates than U.S. visitors of many leading business websites. Accuracy: Rich profile-based targeting allows advertisers to reach very specific audiences; targeting includes by geography, job function, industry, company size, seniority, age, gender, company name, LinkedIn Group and even job title. Portfolio: A suite of high-impact and engaging products helps advertisers get their messages across, from text-based ads to massive display campaigns to socially-driven branding opportunities like Company Pages and Groups.] Premium Subscriptions: LinkedIn Premium Subscriptions are tailored for an array of member needs and segments, to provide the right/additional tools to enable customers to be better at what they do every day / more productive and successful in their careers. The aim of our three business lines, and all of the products and services we develop, is to connect talent and opportunity at a massive scale, and to make the world’s professionals more productive and successful. In doing so, we aspire to create economic opportunity for every professional in the world.
  2. So, here is a high-level overview of Talent Match. Someone comes to the site and posts a job. We then scour the entire member database looking for the members who best match that job, and we recommend a ranked list of those members to the job poster.
  3. This is how we do this matching. We combine the job and the candidate into a single feature vector, where each feature denotes a similarity measure between attributes of the job and attributes of the candidate, and then we find the relative importance of these features using a supervised learning method like logistic regression trained on a user activity signal such as job applications. This gives us a model that knows how to differentiate good job-member pairs from bad job-member pairs.
  4. Let’s go over the facets of the utility function of the Talent Match system. First, the snippet needs to be good enough to convince the job poster to purchase the recommendations. That’s the booking rate. Then, once purchased, the job poster gets to look at the full profile of the recommended candidate and decides whether or not they are indeed a good match for the job. If the candidate is a good match, the job poster may then decide to email the candidate regarding the job opportunity. That’s the email rate. Finally, if the candidate is interested, then the candidate will reply positively to the job poster, giving us the reply rate. Now that the link is established, they can take it from there. But from our perspective, these 3 steps are required for there to be relevant engagement within this system. Out of the 3 facets of the utility function, the reply rate was identified as needing improvement. Job posters were complaining that they were emailing candidates, but the candidates were not replying enough. This was the problem we needed to solve. We figured the booking rate and the email rate were well accounted for by the existing TalentMatch model, but even if someone is a great match for the job, that does not mean they are going to reply. So, we thought that maybe people were not replying because they were probably not looking for a job. What if we could determine if someone was a job seeker, and then include more of those people in the recommendations?
  5. So, we had already developed a model that computes the job seeking propensity for each member, and we affectionately refer to this model as Flightmeter. It turns out that many people who are open to new opportunities do not self-identify as job seekers, so this model helps us identify those people. You can think of the job seeking propensity as the probability that the member will switch positions in the next month. We also output a segmentation of this probability into actives, passives, and non-job-seekers, and we consider actives and passives to have a high job seeking intent. This Flightmeter model is completely different from the TalentMatch model. It is a survival model where the entity whose survival we’re analyzing is a job, or more specifically, a position. Based on data derived from the lifetime of millions of positions, we model the duration of a position as a function of various features in what is known as an accelerated failure time model, and this allows us to compute the probability that a given position will end within the next month.
  6. There are many signals that we can use to compute the job seeking intent. We may have the user’s job seeking activity on the site: are they searching or applying for jobs? Those are obvious signals. But we have others. For example, we know that different industries have different attrition rates. This plot includes a few representative industries and their survival curves. The survival curve gives the probability that someone will still be at their position X months down the road if they start that position today. These are survival curves for a few of the most extreme industries, some of the most hazardous including “political organization” and “animation” and some of the least hazardous including “alternative medicine” and “ranching”. In the “political organization” industry, which is the red line at the bottom, more than 50% of people don’t last 2 years in a given position.
  7. So, intuitively, it makes sense to suggest users who are job seekers in TalentMatch. But we also confirmed our intuition: we ran the numbers, and saw that users with a high job seeking intent (actives and passives) have a much higher rate of reply to career-related emails when compared to non-job-seekers (16 times the reply rate). And this is exactly the facet of the utility function of TalentMatch that we are interested in improving. So, what we want to do is incorporate the job seeker intent into the TalentMatch model, and we want to do so without negatively affecting the booking rate and the email rate.
  8. So, what we want is a controlled perturbation of the ranking output by the TalentMatch model, and this is how we are gonna do it: given the TalentMatch ranking, we run a perturbation function on it that generates another ranking, the perturbed ranking, which optimizes for a metric we’re interested in (in the case of TalentMatch, it’s the number of users with high job seeking intent in the top-12 recommendations). Given the 2 rankings and their distributions of match scores, we can compute the distance between them using a variety of metrics, for example KL divergence or Euclidean distance. This divergence score is what will help us to make sure we are not negatively affecting the quality of the recommendations. Notice how, in the perturbed ranking, item Z was bumped from its original third position, below the cutoff line, to the second position, and so whereas before we had 2 non-seekers above the cutoff, meaning they would be recommended, now we have a non-seeker and an active. Also notice that the perturbation is minimal. We should feel comfortable bumping item Z to the second position, but not to the first position. There are then 3 functions that we need to define: the perturbation function, the divergence function, and the objective function. The parameters of the perturbation function are what we will be estimating based on the performance established by the divergence and objective measures: we want high scores on the objective and low scores on the divergence.
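
The three functions named in note 8 can be sketched in Python on the slide 10 example. This is a rough sketch under stated assumptions, not the method used in TalentMatch: the notes do not say what form the perturbation function takes, so the additive `boost` for high-intent members is hypothetical, as are all function names. The divergence here is the KL divergence between the normalized match scores above the cutoff in the two rankings.

```python
import math

# (item, match score, has high job seeking intent), taken from the slide 10 example
ITEMS = [("X", 0.98, False), ("Y", 0.91, False), ("Z", 0.89, True)]


def perturb(items, boost):
    """Hypothetical perturbation: add `boost` to the score of high-intent members."""
    scored = [(name, match, match + (boost if seeker else 0.0), seeker)
              for name, match, seeker in items]
    return sorted(scored, key=lambda row: row[2], reverse=True)


def objective(ranking, n):
    """Number of high-intent members above the cutoff (top n)."""
    return sum(1 for row in ranking[:n] if row[3])


def divergence(original, perturbed, n):
    """KL divergence between the normalized top-n match score distributions."""
    p = [row[1] for row in original[:n]]
    q = [row[1] for row in perturbed[:n]]
    p_total, q_total = sum(p), sum(q)
    return sum((pi / p_total) * math.log((pi / p_total) / (qi / q_total))
               for pi, qi in zip(p, q))


original = perturb(ITEMS, 0.0)    # unperturbed ranking: X, Y, Z
mild = perturb(ITEMS, 0.04)       # Z moves to second place, as on the slide
strong = perturb(ITEMS, 0.10)     # Z moves to first place

for name, ranking in [("mild", mild), ("strong", strong)]:
    print(name, [row[0] for row in ranking],
          objective(ranking, 2), divergence(original, ranking, 2))
```

In this toy example both boosts put one active member above a cutoff of 2, but the stronger boost yields a larger divergence, which is the kind of trade-off the divergence score is meant to expose when choosing the perturbation parameters.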