Find and be Found: Information Retrieval at LinkedIn
Shakti Sinha, Head, Search Relevance
Daniel Tunkelang, Head, Query Understanding
Why do 225M+ people use LinkedIn?
Profile: the professional identity of record.
Job recommendations.
Publishing platform for professional content.
Search helps members find and be found.
Search for people, jobs, groups, and more.
Every search is personalized.
Let’s talk a bit about how it all works.
•  Query Understanding
•  Ranking
More at http://data.linkedin.com/search.
Query Understanding
Daniel Tunkelang
Head, Query Understanding
Pre-retrieval: segment and tag queries.
lucene software engineer
lucene “software engineer”
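A minimal sketch of the segment-and-tag step, for illustration only. The entity dictionary, the tag names (SKILL, TITLE, COMPANY, UNKNOWN), and the greedy longest-match strategy are assumptions made for this example, not a description of LinkedIn's tagger.

```python
# Hypothetical entity dictionary: phrase -> tag. Illustrative values only.
ENTITY_TAGS = {
    "software engineer": "TITLE",
    "lucene": "SKILL",
    "oracle": "COMPANY",
    "dba": "TITLE",
}

def segment_and_tag(query, entity_tags=ENTITY_TAGS, max_len=3):
    """Greedy longest-match segmentation: returns a list of (segment, tag)."""
    tokens = query.lower().split()
    segments = []
    i = 0
    while i < len(tokens):
        # Try the longest phrase starting at position i first, then shrink.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in entity_tags or n == 1:
                segments.append((phrase, entity_tags.get(phrase, "UNKNOWN")))
                i += n
                break
    return segments

print(segment_and_tag("lucene software engineer"))
```

With this dictionary the query is split into the skill "lucene" and the title phrase "software engineer", which mirrors the quoted form on the slide.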
LinkedIn’s focus: entity-oriented search.
[Figure labels: Company, Employees, Jobs, Name Search]
Query tagging: key to query understanding.
•  Using human judgments to evaluate tag precision.
–  Extremely accurate (> 99%) for identifying person names.
–  Harder to distinguish company vs. title vs. skill (e.g., oracle dba).
•  Comparing CTR for tag matches vs. non-matches.
–  Difference can be large enough to suggest filtering vs. ranking.
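The CTR comparison can be sketched as follows. The impression format, a (tag_matched, clicked) pair per shown result, and the sample log are made up for illustration; no real numbers are implied.

```python
def ctr(impressions):
    """Click-through rate for a list of (tag_matched, clicked) impressions."""
    if not impressions:
        return 0.0
    return sum(1 for _, clicked in impressions if clicked) / len(impressions)

def ctr_by_tag_match(impressions):
    """Return (CTR of results matching the query tag, CTR of the rest)."""
    matched = [imp for imp in impressions if imp[0]]
    unmatched = [imp for imp in impressions if not imp[0]]
    return ctr(matched), ctr(unmatched)

# Made-up log: (result matched the tagged field?, result was clicked?)
sample_log = [
    (True, True), (True, False), (True, True),
    (False, False), (False, False), (False, True),
]
matched_ctr, unmatched_ctr = ctr_by_tag_match(sample_log)
print(matched_ctr, unmatched_ctr)
```

If the gap between the two rates is large for a given tag type, treating the tag as a hard filter rather than a ranking feature becomes a reasonable option to test.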
Detecting navigational vs. exploratory queries.
Pre-retrieval
•  Sequence of query tags.
Post-retrieval
•  Distribution of scores / features.
Click behavior
•  Title searches are >50x more likely to get 2+ clicks than name searches.
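One way to combine the pre-retrieval and post-retrieval signals is a simple heuristic like the one below. The "NAME" tag, the relative score-gap rule, and the 0.5 threshold are assumptions chosen for illustration; a production system would more likely learn this from click behavior.

```python
def is_navigational(tags, scores, gap_threshold=0.5):
    """Heuristic: name-only queries, or one dominant top result, look navigational."""
    # Pre-retrieval signal: the sequence of query tags.
    if tags and all(tag == "NAME" for tag in tags):
        return True
    # Post-retrieval signal: the distribution of result scores.
    ranked = sorted(scores, reverse=True)
    if len(ranked) == 1:
        return True
    if len(ranked) >= 2 and ranked[0] > 0:
        relative_gap = (ranked[0] - ranked[1]) / ranked[0]
        return relative_gap >= gap_threshold
    return False

print(is_navigational(["NAME", "NAME"], [0.9, 0.8]))   # name query
print(is_navigational(["TITLE"], [0.9, 0.85, 0.8]))    # flat score distribution
```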
Query expansion for exploratory queries.
software patent lawyer
Query expansions derived from reformulations, e.g., lawyer -> attorney.
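A rough sketch of how expansions such as lawyer -> attorney might be mined from reformulations. It assumes query logs grouped into sessions and counts only single-token substitutions between consecutive queries; the session data and the `min_count` cutoff are hypothetical.

```python
from collections import Counter

def mine_expansions(sessions, min_count=2):
    """Count single-token substitutions between consecutive queries in a session."""
    counts = Counter()
    for queries in sessions:
        for before, after in zip(queries, queries[1:]):
            a, b = before.split(), after.split()
            if len(a) != len(b):
                continue
            diffs = [(x, y) for x, y in zip(a, b) if x != y]
            # Keep only reformulations that change exactly one token.
            if len(diffs) == 1:
                counts[diffs[0]] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}

# Made-up sessions for illustration.
sessions = [
    ["patent lawyer", "patent attorney"],
    ["software patent lawyer", "software patent attorney"],
    ["tax lawyer", "tax accountant"],
]
print(mine_expansions(sessions))
```

The frequency cutoff drops one-off substitutions like lawyer -> accountant, which change the intent rather than rephrase it.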
Understanding misspelled queries.
daniel tankalong -> Did you mean daniel tunkelang?
marisa meyer -> Did you mean marissa mayer?
jonathan podemsky -> Did you mean johnathan podemsky?
infomation retrieval -> Did you mean information retrieval?
ingenero eletrico -> Did you mean ingeniero electrico?
desenista industrail -> Did you mean desenhista industrial?
Spelling out the details.
[Diagram: the following sources feed the spelling INDEX.]
•  entity data: people, companies
•  successful queries: tunkelang =>
•  reformulations: marisa => marissa
•  n-grams: dublin => du ub bl li in
•  metaphones: mark/marc => MRK
•  word pairs: johnathan podemsky
Query: marisa meyer yoohoo
Candidates per token: {marissa, marisa} {meyer, mayer} {yahoo, yoohoo}
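The n-gram part of the index can be sketched with character bigrams and Jaccard overlap. This covers only one of the signals listed on the slide; the vocabulary, the `min_overlap` threshold, and the scoring are assumptions for illustration.

```python
def bigrams(word):
    """Character bigrams, e.g. dublin -> du ub bl li in."""
    return [word[i:i + 2] for i in range(len(word) - 1)]

def build_index(vocabulary):
    """Inverted index from bigram to the set of words containing it."""
    index = {}
    for word in vocabulary:
        for gram in set(bigrams(word)):
            index.setdefault(gram, set()).add(word)
    return index

def suggest(token, index, min_overlap=0.5):
    """Return indexed words whose bigram sets overlap enough with the token."""
    grams = set(bigrams(token))
    candidates = set()
    for gram in grams:
        candidates |= index.get(gram, set())
    scored = []
    for cand in candidates:
        cand_grams = set(bigrams(cand))
        jaccard = len(grams & cand_grams) / len(grams | cand_grams)
        if jaccard >= min_overlap:
            scored.append((jaccard, cand))
    return [cand for _, cand in sorted(scored, key=lambda s: (-s[0], s[1]))]

index = build_index(["marissa", "mayer", "meyer", "yahoo"])
print(bigrams("dublin"))
print(suggest("marisa", index))
```

Bigram overlap alone is a weak signal for short words with heavy edits, which is one plausible reason to combine it with reformulations, phonetic keys, and word pairs as the slide lists.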
Ranking
Shakti Sinha
Head, Search Relevance
LinkedIn search is personalized.
kevin scott
But global factors matter.
Relevant results can be in or out of network.
•  Searcher’s network matters for relevance.
–  Within-network results have higher CTR.
•  But the network is not enough.
–  About two thirds of search clicks come from out-of-network results.
Personalized machine-learned ranking.
•  Data point is a triple (searcher, query, document).
–  Searcher features are important!
•  Labels: Is this document relevant to the query and the user?
–  Depends on the user’s network, location, etc.
–  Too much to ask a random person to judge.
•  Training data has to be collected from search logs.
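As a data-structure sketch, a training example can carry the three feature groups plus a label taken from the logs. The class name, field names, and the feature names in the usage lines are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class TrainingExample:
    searcher_features: dict
    query_features: dict
    document_features: dict
    label: int  # 1 = relevant to this searcher and query, 0 = not

    def feature_vector(self, feature_names):
        """Flatten the three groups into one list, in the order requested."""
        merged = {}
        for prefix, feats in (("searcher", self.searcher_features),
                              ("query", self.query_features),
                              ("document", self.document_features)):
            for name, value in feats.items():
                merged[f"{prefix}.{name}"] = value
        # Features missing from this example default to 0.0.
        return [merged.get(name, 0.0) for name in feature_names]

example = TrainingExample({"connections": 500}, {"num_tokens": 2}, {"degree": 2}, label=1)
print(example.feature_vector(["searcher.connections", "document.degree"]))
```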
Search log data has biases.
•  Presentation bias
–  Results shown higher tend to get clicked more often.
–  Use FairPairs [Radlinski and Joachims, AAAI’06].
[Figure: adjacent result pairs shown as not flipped, flipped, flipped; a click ("Clicked!") within a pair becomes training data.]
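A simplified sketch of the FairPairs idea: group adjacent results into pairs starting at a random offset, flip each pair with probability 0.5, and read a click on the lower result of a pair as a preference over the upper one. Requiring that the upper result was not clicked is a conservative simplification made here, and the function names are hypothetical.

```python
import random

def fair_pairs_present(ranking, rng):
    """Return (presented ranking, list of (top_idx, bottom_idx) pair positions)."""
    offset = rng.choice([0, 1])
    presented = list(ranking)
    pairs = []
    for i in range(offset, len(presented) - 1, 2):
        # Flip each adjacent pair with probability 0.5.
        if rng.random() < 0.5:
            presented[i], presented[i + 1] = presented[i + 1], presented[i]
        pairs.append((i, i + 1))
    return presented, pairs

def preferences_from_clicks(presented, pairs, clicked):
    """A click on the lower result of a pair yields (preferred, other)."""
    prefs = []
    for top, bottom in pairs:
        if presented[bottom] in clicked and presented[top] not in clicked:
            prefs.append((presented[bottom], presented[top]))
    return prefs

rng = random.Random(0)
presented, pairs = fair_pairs_present(["d1", "d2", "d3", "d4", "d5"], rng)
print(presented, pairs)
```

Because each document is equally likely to appear in the upper or lower slot of its pair, position within the pair cannot explain the resulting preferences.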
Search log data has biases.
•  Sample bias
–  Users click or skip only what is shown.
–  What about low-scoring results from the existing model?
–  Add low-scoring results as ‘easy negatives’ so the model learns about bad results that are never presented to the user.
[Figure: result pages 1, 2, 3, ..., n, with results marked label 0.]
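Adding easy negatives can be sketched as sampling from results ranked below what the user was shown and giving them label 0. The function name, the (document, label) tuple format, and uniform sampling are assumptions for illustration.

```python
import random

def add_easy_negatives(logged_examples, ranked_results, shown_count, num_negatives, rng):
    """Append label-0 examples sampled from results ranked below what was shown."""
    unseen = ranked_results[shown_count:]
    sampled = rng.sample(unseen, min(num_negatives, len(unseen)))
    return list(logged_examples) + [(doc, 0) for doc in sampled]

# Made-up data: two results were shown, four more were ranked below them.
logged = [("d1", 1), ("d2", 0)]
ranked = ["d1", "d2", "d3", "d4", "d5", "d6"]
training = add_easy_negatives(logged, ranked, shown_count=2, num_negatives=3,
                              rng=random.Random(0))
print(training)
```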
How to train your model.
•  Train simple models to resemble complex ones.
–  Build an Additive Groves model [Sorokina et al., ECML ’07], which is good at detecting interactions.
•  Build a tree with logistic regression leaves.
•  By restricting the tree to user and query features, only the leaf regression model is evaluated for each document.
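[Figure: decision tree with split nodes "X2 = ?" and "X10 < 0.1234 ?" and logistic regression leaves:]
$\beta_0 + \beta_1 T(x_1) + \dots + \beta_n x_n$
$\alpha_0 + \alpha_1 P(x_1) + \dots + \alpha_n Q(x_n)$
$\gamma_0 + \gamma_1 R(x_1) + \dots + \gamma_n Q(x_n)$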
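A scoring-time sketch of a tree with logistic regression leaves. It covers only how such a model could be evaluated, not how it is trained or distilled from Additive Groves. The tree, feature names, and weights are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical tree: internal nodes test user/query features only;
# leaves hold logistic regression weights over document features.
TREE = {
    "feature": "query_is_name", "threshold": 0.5,
    "left": {"bias": -1.0, "weights": {"text_match": 2.0}},
    "right": {"bias": -0.5, "weights": {"text_match": 1.0, "in_network": 1.5}},
}

def pick_leaf(node, context):
    """Walk the tree once per (searcher, query) using context features."""
    while "feature" in node:
        branch = "left" if context[node["feature"]] < node["threshold"] else "right"
        node = node[branch]
    return node

def score(leaf, doc):
    """Logistic regression score of one document under the chosen leaf."""
    z = leaf["bias"] + sum(w * doc.get(name, 0.0) for name, w in leaf["weights"].items())
    return sigmoid(z)

def rank(context, docs):
    leaf = pick_leaf(TREE, context)  # evaluated once, not per document
    return sorted(docs, key=lambda d: score(leaf, d), reverse=True)

docs = [
    {"id": "a", "text_match": 1.0, "in_network": 0.0},
    {"id": "b", "text_match": 0.5, "in_network": 1.0},
]
print([d["id"] for d in rank({"query_is_name": 1.0}, docs)])
```

Since the splits depend only on the searcher and the query, the tree is traversed once per search, and each candidate document costs a single linear model evaluation.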
Take-Aways
•  LinkedIn’s search problem is unique because of the deep role of personalization – users are an integral part of the corpus.
•  Query understanding allows us to optimize for entity-oriented search against semi-structured content.
•  Ranking requires us to contextually apply global and personalized user, query, and document features.
Thank you!
Want to learn more?
•  Check out http://data.linkedin.com/search.
•  Contact us:
–  Shakti: ssinha@linkedin.com, http://linkedin.com/in/sdsinha
–  Daniel: dtunkelang@linkedin.com, http://linkedin.com/in/dtunkelang
–  Asif: amakhani@linkedin.com, http://linkedin.com/in/asifmakhani
•  Did we mention that we’re hiring?
