[In]formation Retrieval: Search at LinkedIn
Shakti Sinha               Daniel Tunkelang
Head, Search Relevance     Head, Query Understanding

    Recruiting Solutions                               1
Why do 200M+ people use LinkedIn?




                                    2
People use LinkedIn because of other people.




                                          3
Search helps members find and be found.




                                          4
Rich collection of professional content.




                                           5
Every search is personalized.




                                6
Let’s talk a bit about how it all works.

§  Query Understanding

§  Search Spam

§  Unified Search

More at http://data.linkedin.com/search.



                                           7
Query Understanding




                      8
People are semi-structured objects.




  for i in [1..n]
    B[i] ← {}
    s ← w_1 w_2 … w_i
    if Pc(s) > 0
      a ← new Segment()
      a.segs ← {s}
      a.prob ← Pc(s)
      B[i] ← {a}
    for j in [1..i-1]
      for b in B[j]
        s ← w_(j+1) … w_i
        if Pc(s) > 0
          a ← new Segment()
          a.segs ← b.segs U {s}
          a.prob ← b.prob * Pc(s)
          B[i] ← B[i] U {a}
    sort B[i] by prob
    truncate B[i] to size k
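
The pseudocode above reads as a beam search over query segmentations: B[i] keeps the k most probable ways to split the first i words, and Pc(s) is the probability that the string s is a valid segment. The following is a minimal Python sketch of that idea, not LinkedIn's implementation. The names `Segmentation`, `segment_query`, and `seg_prob`, and the toy probabilities in the usage note, are assumptions made for illustration. The sketch adds an empty segmentation at position 0 so that the "whole prefix is one segment" case and the "extend a shorter segmentation" case share one loop.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Segmentation:
    """One candidate split of a query prefix (hypothetical helper type)."""
    segs: Tuple[str, ...]
    prob: float


def segment_query(words: List[str],
                  seg_prob: Dict[str, float],
                  k: int = 3) -> List[Segmentation]:
    """Return up to k most probable segmentations of `words`.

    seg_prob plays the role of Pc(s) in the pseudocode: it maps a
    candidate segment to its probability; missing segments count as 0.
    """
    n = len(words)
    # beams[i] holds the best segmentations of the first i words.
    beams: List[List[Segmentation]] = [[] for _ in range(n + 1)]
    beams[0] = [Segmentation((), 1.0)]  # empty prefix, probability 1

    for i in range(1, n + 1):
        candidates = []
        for j in range(i):
            # Last segment covers words j+1..i (1-based), i.e. words[j:i].
            s = " ".join(words[j:i])
            p = seg_prob.get(s, 0.0)
            if p > 0:
                for b in beams[j]:
                    candidates.append(Segmentation(b.segs + (s,), b.prob * p))
        # Keep only the k most probable candidates (the beam).
        candidates.sort(key=lambda c: c.prob, reverse=True)
        beams[i] = candidates[:k]

    return beams[n]
```

For example, with the toy table `{"software": 0.2, "engineer": 0.2, "software engineer": 0.5, "google": 0.4}`, calling `segment_query(["software", "engineer", "google"], table)` ranks the split `("software engineer", "google")` ahead of the word-by-word split.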



                                       9
Word sense is contextual.




                            10
Understand queries as early as possible.




                                           11
Query structure has many applications.

§    Boost results that match query interpretation.
§    Bucket search log analysis by query classes.
§    Query rewriting specific to query classes.
§    …



      Query understanding focuses on set-level metrics.

                  Not just about best answer,
                  but getting to best question.


                                                          12
Search Spam




              13
Let’s look at a search spammer.




                                  14
Summary is verbose but legitimate.




                                     15
But then comes the keyword stuffing.




                                       16
How we train our search spam classifier.

§  Find the queries targeted by spammers.
   –  10,000 most common non-name queries.


§  Look at top results for a generic user.
   –  i.e., show unpersonalized search results.


§  Remove private profiles.
   –  Members first! Can’t sacrifice privacy to fight spammers.


§  Label data by crowdsourcing.
   –  Relevance is subjective, but spam is relatively objective.


                                                                   17
ROC curve for spam thresholding.

[Figure: ROC curve, both axes from 0 to 1, annotated with two spam score thresholds a and b, where 0 < a < b < 1.]
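
As a reminder of what such a curve plots, here is a minimal sketch that computes one ROC point (false positive rate, true positive rate) per candidate spam score threshold. The function name `roc_points` and the toy labels and scores in the usage note are assumptions for illustration; the deck does not show how the curve was produced.

```python
from typing import List, Sequence, Tuple


def roc_points(labels: Sequence[int],
               scores: Sequence[float],
               thresholds: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (false_positive_rate, true_positive_rate) for each threshold.

    labels: 1 for spam, 0 for legitimate.
    scores: spam score per item; an item is flagged when score >= threshold.
    """
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one spam and one legitimate example")

    points = []
    for t in thresholds:
        true_pos = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        false_pos = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= t)
        points.append((false_pos / negatives, true_pos / positives))
    return points
```

With labels `[1, 1, 0, 0]` and scores `[0.9, 0.6, 0.7, 0.1]`, a threshold of 0.5 catches both spam items but also flags one legitimate item, while a threshold of 0.8 flags no legitimate items and catches only one spam item. Raising the threshold trades recall for fewer false positives.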




                                                                                      18
Integrate spamminess into relevance score.

§  Spam model yields a probability between 0 and 1.

§  Use spam score as piecewise linear factor:
      if score < spammin:
           # not a spammer
           relevance *= 1.0
      elif score > spammax:
           # spammer
           relevance *= 0.0
      else:
           # linear function of spamminess
           relevance *= (spammax - score) / (spammax - spammin)
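
The fragment above can be wrapped into a self-contained function. This is a sketch under stated assumptions: the function name, the argument names `spam_min` and `spam_max` (standing in for spammin and spammax), and the validation of the thresholds are additions, and the threshold values in the usage note are made up.

```python
def spam_adjusted_relevance(relevance: float,
                            spam_score: float,
                            spam_min: float,
                            spam_max: float) -> float:
    """Scale a relevance score by a piecewise linear spam factor.

    Below spam_min the factor is 1, above spam_max it is 0, and in
    between it falls linearly from 1 to 0.
    """
    if not 0.0 <= spam_min < spam_max <= 1.0:
        raise ValueError("expected 0 <= spam_min < spam_max <= 1")

    if spam_score < spam_min:
        return relevance          # not a spammer
    if spam_score > spam_max:
        return 0.0                # spammer
    # linear function of spamminess
    return relevance * (spam_max - spam_score) / (spam_max - spam_min)
```

With hypothetical thresholds `spam_min=0.2` and `spam_max=0.8`, a profile with spam score 0.5 keeps about half of its relevance, one at 0.1 keeps all of it, and one at 0.9 drops to zero.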


                                                                  19
Spam is an arms race.

§  We can’t reveal precisely which features we use for spam
    detection, or spammers will work around them.

§  Spammers will try to reverse-engineer us anyway.

§  Personalization benefits us and our legitimate users – it’s
    hard to spam your way to high personalized ranking.

§  Fighting spam is all about making the investment less
    profitable for the spammer.



                                                              20
Unified Search




                 21
Un-Unified Search




                    22
Introducing LinkedIn Unified Search!

Goal: make all of our content more discoverable.

Three new features:
§  Query Auto-Complete
§  Content Type Suggestions
§  Unified Search Result Page




                                                   23
Query Auto-Complete




                      24
Best completion not always the most popular.

§  In a heavy-tailed distribution, even the most popular
    queries account for a small fraction of the distribution.

§  We don’t want to suggest generic queries that would
    produce useless results.
   –  e.g., c -> company, j -> jobs


§  Goal is not only to infer the user’s intent but also to suggest a
    search that yields relevant results across content types.
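
One way to picture "best is not always most popular" is to weight each candidate completion's popularity by how useful its results tend to be. The sketch below is purely illustrative: `rank_completions`, the `usefulness` weight, and the toy numbers in the usage note are assumptions, and the deck does not say how LinkedIn actually scores completions.

```python
from typing import Dict, List, Tuple


def rank_completions(prefix: str,
                     query_stats: Dict[str, Tuple[float, float]],
                     limit: int = 5) -> List[str]:
    """Rank completions of `prefix` by frequency times usefulness.

    query_stats maps a query to (frequency, usefulness), where usefulness
    is a hypothetical 0..1 estimate of how often the query leads to
    relevant results.
    """
    scored = []
    for query, (frequency, usefulness) in query_stats.items():
        if query.startswith(prefix):
            scored.append((frequency * usefulness, query))
    # Highest score first; break ties alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [query for _, query in scored[:limit]]
```

With toy stats such as `{"company": (1000, 0.01), "cloud computing": (200, 0.6), "c++ developer": (150, 0.7)}`, the prefix `"c"` ranks the generic but popular "company" last.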




                                                                25
Content Type Suggestions




                           26
How we compute content type suggestions.

§  Rank content types by likelihood of a successful search.
   –  Consider click-through behavior as well as downstream actions.


§  Bootstrap using what we know from pre-unified search
    behavior.
   –  Tricky part is compensating for findability bias.


§  Continuously evaluate and collect feedback through user
    behavior.
   –  E.g., members using the left rail to select a particular vertical.
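
A minimal sketch of the first bullet, ranking content types by an estimated likelihood of a successful search. It assumes a log of (content type, success) pairs and uses simple additive smoothing; the function name, the log format, and the prior values are all hypothetical, and the sketch does not attempt the findability-bias correction mentioned above.

```python
from typing import Dict, Iterable, List, Tuple


def rank_content_types(log: Iterable[Tuple[str, bool]],
                       prior_successes: float = 1.0,
                       prior_trials: float = 2.0) -> List[Tuple[str, float]]:
    """Rank content types by smoothed success rate, highest first.

    log: (content_type, was_successful) pairs, one per search.
    The priors keep rarely seen content types from getting extreme rates.
    """
    totals: Dict[str, Tuple[int, int]] = {}
    for content_type, success in log:
        wins, trials = totals.get(content_type, (0, 0))
        totals[content_type] = (wins + (1 if success else 0), trials + 1)

    rates = {
        content_type: (wins + prior_successes) / (trials + prior_trials)
        for content_type, (wins, trials) in totals.items()
    }
    return sorted(rates.items(), key=lambda item: (-item[1], item[0]))
```

What counts as a success (a click, or a downstream action such as sending a message) is left to the caller, which is where the slide's point about downstream actions would enter.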




                                                                           27
Unified Search Result Page




                             28
Intent Detection and Page Construction

§  Relevance is now a two-part computation:

              P(Content Type | User, Query)
                             ×
          P(Document | User, Query, Content Type)

§  Intent detection comes first: inefficient to send all queries
    to all verticals.

§  Secondary components introduce diversity.
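
The two-part computation can be sketched as follows: score each document by the product of its content type's probability and its own probability within that type, and skip content types whose probability is too low to be worth querying. The function name `build_page`, the `min_type_prob` cutoff, and the input shapes are assumptions for illustration, not LinkedIn's implementation.

```python
from typing import Dict, List, Tuple


def build_page(type_probs: Dict[str, float],
               docs_by_type: Dict[str, List[Tuple[str, float]]],
               min_type_prob: float = 0.05,
               limit: int = 10) -> List[Tuple[float, str, str]]:
    """Rank documents across content types for one (user, query) pair.

    type_probs:   P(content type | user, query) per content type.
    docs_by_type: per content type, (doc_id, P(doc | user, query, type)).
    Returns (score, content_type, doc_id) tuples, best first.
    """
    scored = []
    for content_type, p_type in type_probs.items():
        # Intent detection first: do not fan out to unlikely verticals.
        if p_type < min_type_prob:
            continue
        for doc_id, p_doc in docs_by_type.get(content_type, []):
            scored.append((p_type * p_doc, content_type, doc_id))
    scored.sort(key=lambda item: (-item[0], item[1], item[2]))
    return scored[:limit]
```

In a real system the low-probability verticals would never be queried at all; here they are simply skipped to keep the sketch self-contained.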


                                                                    29
Summary

§    Personalize every search and leverage structure.
§    Understand queries as early as possible.
§    Fight the spammers that be.
§    Unify and simplify the search experience.


             Goal: help LinkedIn’s 200M+
             members find and be found.




                                                         30
Thank you!




             31
Want to learn more?

§  Check out http://data.linkedin.com/search.

§  Contact us:
     –  Shakti: ssinha@linkedin.com
                http://linkedin.com/in/sdsinha

   –  Daniel: dtunkelang@linkedin.com
              http://linkedin.com/in/dtunkelang

§  Did we mention that we’re hiring?


                                                  32
