[In]formation Retrieval: Search at LinkedIn
Shakti Sinha, Head, Search Relevance
Daniel Tunkelang, Head, Query Understanding

    Recruiting Solutions                               1
Why do 200M+ people use LinkedIn?




                                    2
People use LinkedIn because of other people.




                                          3
Search helps members find and be found.




                                          4
Rich collection of professional content.




                                           5
Every search is personalized.




                                6
Let’s talk a bit about how it all works.

§  Query Understanding

§  Search Spam

§  Unified Search

More at http://data.linkedin.com/search.



                                           7
Query Understanding




                      8
People are semi-structured objects.




  for i in [1..n]
    s ← w_1 w_2 … w_i
    if Pc(s) > 0
      a ← new Segment()
      a.segs ← {s}
      a.prob ← Pc(s)
      B[i] ← {a}
    for j in [1..i-1]
      for b in B[j]
        s ← w_{j+1} w_{j+2} … w_i
        if Pc(s) > 0
          a ← new Segment()
          a.segs ← b.segs ∪ {s}
          a.prob ← b.prob * Pc(s)
          B[i] ← B[i] ∪ {a}
    sort B[i] by prob
    truncate B[i] to size k
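
The segmentation pseudocode above can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not LinkedIn's implementation: `pc` stands in for the segment probability Pc, the toy probability table is made up, and an empty starting segmentation in `beams[0]` replaces the pseudocode's separate whole-prefix case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segmentation:
    segs: tuple   # segments chosen so far, in query order
    prob: float   # product of the probabilities of those segments


def segment(words, pc, k=3):
    """Return up to k highest-probability segmentations of `words`.

    `pc(s)` is a hypothetical function giving the probability that the
    string `s` is a coherent segment (0.0 if it is not one).
    """
    n = len(words)
    # beams[i] holds the best segmentations of the first i words.
    beams = [[] for _ in range(n + 1)]
    beams[0] = [Segmentation((), 1.0)]  # empty segmentation as the start state
    for i in range(1, n + 1):
        candidates = []
        for j in range(i):
            s = " ".join(words[j:i])  # candidate segment covering words j+1..i
            p = pc(s)
            if p <= 0:
                continue
            for b in beams[j]:
                candidates.append(Segmentation(b.segs + (s,), b.prob * p))
        # Keep only the k most probable partial segmentations (the beam).
        candidates.sort(key=lambda c: c.prob, reverse=True)
        beams[i] = candidates[:k]
    return beams[n]


# Made-up segment probabilities, for illustration only.
TOY_PC = {"software": 0.5, "engineer": 0.4, "software engineer": 0.3, "google": 0.6}

best = segment("software engineer google".split(), lambda s: TOY_PC.get(s, 0.0), k=2)
for seg in best:
    print(seg.segs, round(seg.prob, 3))
```

With this toy table, the segmentation that keeps "software engineer" together ranks above the one that splits it into two words.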



                                       9
Word sense is contextual.




                            10
Understand queries as early as possible.




                                           11
Query structure has many applications.

§    Boost results that match query interpretation.
§    Bucket search log analysis by query classes.
§    Rewrite queries in ways specific to each query class.
§    …



      Query understanding focuses on set-level metrics.

                  Not just about best answer,
                  but getting to best question.


                                                          12
Search Spam




              13
Let’s look at a search spammer.




                                  14
Summary is verbose but legitimate.




                                     15
But then comes the keyword stuffing.




                                       16
How we train our search spam classifier.

§  Find the queries targeted by spammers.
   –  10,000 most common non-name queries.


§  Look at top results for a generic user.
   –  i.e., show unpersonalized search results.


§  Remove private profiles.
   –  Members first! Can’t sacrifice privacy to fight spammers.


§  Label data by crowdsourcing.
   –  Relevance is subjective, but spam is relatively objective.


                                                                   17
ROC curve for spam thresholding.

[Figure: ROC curve, both axes running from 0 to 1, with two spam score thresholds a and b marked on the curve, where 0 < a < b < 1.]
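
As a rough illustration of how the points on such a curve arise, the sketch below computes the false positive rate and true positive rate at two candidate thresholds. The scores and labels are made up, the thresholds 0.25 and 0.75 merely stand in for a and b, and flagging a profile when its score is at or above the threshold is an assumption.

```python
def roc_points(scores, labels, thresholds):
    """Return (threshold, false_positive_rate, true_positive_rate) triples.

    A profile is flagged as spam when its score is >= the threshold.
    Assumes the labels contain at least one spam and one non-spam example.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and not y)
        points.append((t, fp / negatives, tp / positives))
    return points


# Made-up spam scores and labels (1 = spam), for illustration only.
scores = [0.95, 0.80, 0.70, 0.40, 0.30, 0.10]
labels = [1, 1, 0, 1, 0, 0]

for t, fpr, tpr in roc_points(scores, labels, thresholds=[0.25, 0.75]):
    print(f"threshold={t:.2f}  FPR={fpr:.2f}  TPR={tpr:.2f}")
```

Raising the threshold trades recall for fewer false positives, which is the trade-off the curve makes visible.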




                                                                                      18
Integrate spamminess into relevance score.

§  Spam model yields a probability between 0 and 1.

§  Use spam score as piecewise linear factor:
      if score < spammin:
          # not a spammer: leave relevance unchanged
          relevance *= 1.0
      elif score > spammax:
          # spammer: zero out relevance
          relevance *= 0.0
      else:
          # linear function of spamminess: 1.0 at spammin, falling to 0.0 at spammax
          relevance *= (spammax - score) / (spammax - spammin)
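
The same piecewise linear factor, written as a self-contained function. The names `spam_factor` and `adjusted_relevance` and the default thresholds 0.4 and 0.8 are illustrative assumptions; the slide does not give actual threshold values.

```python
def spam_factor(score, spam_min, spam_max):
    """Multiplier in [0, 1] to apply to a relevance score."""
    if not 0.0 <= spam_min < spam_max <= 1.0:
        raise ValueError("expected 0 <= spam_min < spam_max <= 1")
    if score < spam_min:
        return 1.0   # not a spammer: relevance unchanged
    if score > spam_max:
        return 0.0   # spammer: relevance zeroed out
    # In between: falls linearly from 1.0 at spam_min to 0.0 at spam_max.
    return (spam_max - score) / (spam_max - spam_min)


def adjusted_relevance(relevance, score, spam_min=0.4, spam_max=0.8):
    # Thresholds are hypothetical defaults chosen only for this example.
    return relevance * spam_factor(score, spam_min, spam_max)


for s in (0.2, 0.6, 0.9):
    print(s, round(adjusted_relevance(10.0, s), 3))
```

Because the factor is continuous at both thresholds, a small change in spam score never causes a sudden jump in ranking.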


                                                                  19
Spam is an arms race.

§  We can’t reveal precisely which features we use for spam
    detection, or spammers will work around them.

§  Spammers will try to reverse-engineer us anyway.

§  Personalization benefits us and our legitimate users – it’s
    hard to spam your way to high personalized ranking.

§  Fighting spam is all about making the investment less
    profitable for the spammer.



                                                              20
Unified Search




                 21
Un-Unified Search




                    22
Introducing LinkedIn Unified Search!

Goal: make all of our content more discoverable.

Three new features:
§  Query Auto-Complete
§  Content Type Suggestions
§  Unified Search Result Page




                                                   23
Query Auto-Complete




                      24
Best completion not always the most popular.

§  In a heavy-tailed distribution, even the most popular
    queries account for only a small fraction of the distribution.

§  We don’t want to suggest generic queries that would
    produce useless results.
   –  e.g., c -> company, j -> jobs


§  Goal is not only to infer the user’s intent but also to suggest a
    search that yields relevant results across content types.
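
One way to picture the point above is to rank completions by popularity weighted by how useful their results tend to be, rather than by popularity alone. This scoring rule, the function name `suggest`, and all numbers below are assumptions made for illustration; the slides do not describe the actual ranking function.

```python
def suggest(prefix, candidates, limit=3):
    """Rank completions of `prefix`.

    `candidates` is an iterable of (query, popularity, usefulness) tuples,
    where usefulness is a hypothetical estimate of how often the query
    leads to relevant results.
    """
    matches = [
        (query, popularity * usefulness)
        for query, popularity, usefulness in candidates
        if query.startswith(prefix)
    ]
    matches.sort(key=lambda m: m[1], reverse=True)
    return [query for query, _ in matches[:limit]]


# Made-up values: "company" is the most popular completion but too generic to help.
CANDIDATES = [
    ("company", 0.30, 0.05),
    ("cisco", 0.10, 0.90),
    ("cloud computing", 0.05, 0.80),
]

print(suggest("c", CANDIDATES, limit=2))
```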




                                                                25
Content Type Suggestions




                           26
How we compute content type suggestions.

§  Rank content types by likelihood of a successful search.
   –  Consider click-through behavior as well as downstream actions.


§  Bootstrap using what we know from pre-unified search
    behavior.
   –  Tricky part is compensating for findability bias.


§  Continuously evaluate and collect feedback through user
    behavior.
   –  E.g., members using the left rail to select a particular vertical.
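
A minimal sketch of the first step, ranking content types by an estimated success rate computed from logged searches. The log format, the definition of success, and the additive smoothing are assumptions for illustration, and the sketch does not attempt the findability-bias correction mentioned above.

```python
from collections import defaultdict


def rank_content_types(log, prior_successes=1.0, prior_trials=2.0):
    """Rank content types by smoothed success rate.

    `log` is an iterable of (content_type, succeeded) pairs, where
    `succeeded` is a hypothetical flag set from clicks and downstream actions.
    """
    successes = defaultdict(float)
    trials = defaultdict(float)
    for content_type, succeeded in log:
        trials[content_type] += 1
        if succeeded:
            successes[content_type] += 1
    # Additive smoothing keeps rarely seen types from ranking on a single click.
    rate = {
        t: (successes[t] + prior_successes) / (trials[t] + prior_trials)
        for t in trials
    }
    return sorted(rate, key=rate.get, reverse=True)


# Made-up log entries, for illustration only.
LOG = [
    ("people", True), ("people", True), ("people", False),
    ("jobs", True), ("jobs", False), ("jobs", False),
    ("groups", False),
]

print(rank_content_types(LOG))
```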




                                                                           27
Unified Search Result Page




                             28
Intent Detection and Page Construction

§  Relevance is now a two-part computation:

              P(Content Type | User, Query)
                             x
          P(Document | User, Query, Content Type)

§  Intent detection comes first: inefficient to send all queries
    to all verticals.

§  Secondary components introduce diversity.
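
The two-part computation can be sketched as below. The probabilities are made up, and the cutoff `min_type_prob` is a hypothetical way of modeling "intent detection comes first"; in a real system the documents for a vertical would only be fetched after that vertical passes the cutoff.

```python
def build_page(type_probs, docs_by_type, min_type_prob=0.1):
    """Score documents as P(type | user, query) * P(doc | user, query, type).

    type_probs:   {content_type: P(content_type | user, query)}
    docs_by_type: {content_type: [(doc_id, P(doc | user, query, type)), ...]}
    """
    ranked = []
    for content_type, p_type in type_probs.items():
        if p_type < min_type_prob:
            continue  # unlikely intent: do not send the query to this vertical
        for doc_id, p_doc in docs_by_type.get(content_type, []):
            ranked.append((p_type * p_doc, content_type, doc_id))
    ranked.sort(reverse=True)
    return ranked


# Made-up probabilities, for illustration only.
TYPE_PROBS = {"people": 0.70, "jobs": 0.25, "groups": 0.05}
DOCS = {
    "people": [("p1", 0.9), ("p2", 0.4)],
    "jobs": [("j1", 0.8)],
    "groups": [("g1", 0.99)],
}

for score, content_type, doc_id in build_page(TYPE_PROBS, DOCS):
    print(f"{doc_id} ({content_type}): {score:.2f}")
```

Here the groups vertical is never consulted, even though its best document has a high within-type probability, because the intent estimate for groups falls below the cutoff.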


                                                                    29
Summary

§    Personalize every search and leverage structure.
§    Understand queries as early as possible.
§    Fight the spammers that be.
§    Unify and simplify the search experience.


             Goal: help LinkedIn’s 200M+
             members find and be found.




                                                         30
Thank you!




             31
Want to learn more?

§  Check out http://data.linkedin.com/search.

§  Contact us:
     –  Shakti: ssinha@linkedin.com
                http://linkedin.com/in/sdsinha

   –  Daniel: dtunkelang@linkedin.com
              http://linkedin.com/in/dtunkelang

§  Did we mention that we’re hiring?


                                                  32
