MMM, Search!
Daniel Tunkelang
dtunkelang@gmail.com

Presented to Wikimedia Foundation on April 27, 2020
What is search?
Search is a process.
• Searchers
  • start with information-seeking goals.
  • express and elaborate those goals as queries.
• Search Engines
  • translate queries into representations of intent.
  • retrieve results relevant to that intent and rank them.

Communication isn’t perfect, so the process is iterative.
Search is many things.
• Known-item search vs. exploratory search.
  • Seeking a specific item vs. knowing it when you see it.
• Search is a means to an end, not the end itself.
  • Getting information, shopping, communication, etc.
• It takes a lot of hard work to make search feel effortless.
  • Indexing, query understanding, matching, ranking.
Metrics, Models, Methods
The most important decisions for a search engine are:

• Metrics: what we measure and optimize for.

• Models: how we model the search experience.

• Methods: how we help searchers achieve success.
Metrics
Metrics:
What do we need to know?
• Binary Relevance
  • Are searchers finding relevant results?
• Session Success
  • How often are search sessions successful?
• Search Efficiency
  • How much effort are searchers making?
Binary Relevance
“Relevance is a measure of information conveyed by a document relative to a query. Relationship between document and query, though necessary, is not sufficient to determine relevance.”
William Goffman, 1964
Relevance has shades of gray, but
non-relevance is black and white.
Example: Email
• Can Google decide which of my emails are important?
  • ¯\_(ツ)_/¯
• Can Google decide which of my emails are spam?
  • Definitely!
Measure Binary Relevance!
• Build a (query, document) binary relevance model.
  • (we’ll get back to that in a moment)
• Embrace positional bias: measure at top ranks.
  • Can use top k results or a weighted sample.
• Stratify for meaningful query and user segments.
  • Leverage query classification and user data.
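A minimal sketch of measuring binary relevance at top ranks, assuming each sampled query already has binary relevance labels (from human judges or a model) for its results in rank order, plus a segment label from query classification or user data. The function names and the `(segment, labels)` input format are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def precision_at_k(labels: List[bool], k: int) -> float:
    """Fraction of the top-k results judged relevant.

    If fewer than k results were returned, divides by the number returned.
    """
    top = labels[:k]
    if not top:
        return 0.0
    return sum(top) / len(top)


def precision_by_segment(judged: List[Tuple[str, List[bool]]], k: int) -> Dict[str, float]:
    """Mean precision@k per query segment.

    `judged` holds one (segment, labels) pair per sampled query, where
    `labels` are binary relevance judgments in rank order.
    """
    per_segment: Dict[str, List[float]] = defaultdict(list)
    for segment, labels in judged:
        per_segment[segment].append(precision_at_k(labels, k))
    return {seg: sum(vals) / len(vals) for seg, vals in per_segment.items()}


judged = [
    ("head", [True, True, False]),
    ("head", [True, True, True]),
    ("tail", [False, True, False]),
]
print(precision_by_segment(judged, k=3))
```

A weighted sample, as the slide allows, would replace the plain top-k average with weights that decay by rank; the choice of decay is left open here.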
Search is a journey.
Search isn’t always one-shot.
Search can’t always be one-shot.
Measure Session Success!
• Measure session conversion, not just query conversion.
  • Much better proxy for user’s success!
• Compute metrics based on first query of session.
  • Distribution of journeys for common intent.
• Segment sessions into tasks? Maybe, but optional.
  • Multi-task sessions uncommon; treat as noise.
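One way to make the gap between query conversion and session conversion concrete, sketched under the assumption of a hypothetical log format in which each session is an ordered list of `(query, converted)` pairs. Session success is attributed to the session's first query, as the slide suggests.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical log format: each session is an ordered list of
# (query, converted) pairs; `converted` marks whether that query led
# directly to a success event (click-through, purchase, etc.).
Session = List[Tuple[str, bool]]


def query_conversion(sessions: List[Session]) -> float:
    """Fraction of individual queries that converted."""
    events = [converted for s in sessions for _, converted in s]
    return sum(events) / len(events) if events else 0.0


def session_conversion(sessions: List[Session]) -> float:
    """Fraction of sessions with at least one conversion."""
    if not sessions:
        return 0.0
    return sum(any(c for _, c in s) for s in sessions) / len(sessions)


def success_by_first_query(sessions: List[Session]) -> Dict[str, float]:
    """Session success rate keyed by the session's first query."""
    totals: Dict[str, List[int]] = defaultdict(lambda: [0, 0])  # [wins, sessions]
    for s in sessions:
        if not s:
            continue
        first_query = s[0][0]
        totals[first_query][0] += any(c for _, c in s)
        totals[first_query][1] += 1
    return {q: wins / n for q, (wins, n) in totals.items()}


sessions = [
    [("red shoes", False), ("red sneakers", True)],
    [("red shoes", True)],
    [("blue hat", False), ("blue cap", False)],
]
print(query_conversion(sessions), session_conversion(sessions))
print(success_by_first_query(sessions))
```

On this toy data, query conversion is 0.4 while two of three sessions succeed, so per-query numbers understate how often searchers actually get what they came for.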
Search Efficiency
Searching is not fun.
Having found is fun.
• If search is too hard or takes too long, searchers give up.
  • Compare successful and unsuccessful sessions.
• Measure how much time searchers spend in sessions.
  • Especially time on search rather than results.
• Measure searcher effort.
  • Pagination, reformulation, refinement, etc.
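A sketch of comparing effort between successful and unsuccessful sessions, assuming a hypothetical `SessionLog` summary already built from logs. Counting reformulations and page turns with equal weight is an assumption, not a recommendation from the slides.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class SessionLog:
    """Hypothetical per-session summary assembled from search logs."""
    num_queries: int          # first query plus reformulations / refinements
    num_page_turns: int       # pagination requests
    seconds_on_search: float  # time on result pages, excluding time on results
    succeeded: bool


def effort(s: SessionLog) -> int:
    """Effort count: queries after the first, plus page turns (equal weights)."""
    return (s.num_queries - 1) + s.num_page_turns


def compare_effort(sessions: List[SessionLog]) -> Dict[str, Dict[str, float]]:
    """Mean effort and mean search time, split by session outcome."""
    report: Dict[str, Dict[str, float]] = {}
    for label, outcome in (("successful", True), ("unsuccessful", False)):
        group = [s for s in sessions if s.succeeded == outcome]
        if group:  # skip an outcome with no sessions
            report[label] = {
                "mean_effort": mean(effort(s) for s in group),
                "mean_seconds_on_search": mean(s.seconds_on_search for s in group),
            }
    return report


sessions = [
    SessionLog(num_queries=1, num_page_turns=0, seconds_on_search=20.0, succeeded=True),
    SessionLog(num_queries=2, num_page_turns=1, seconds_on_search=45.0, succeeded=True),
    SessionLog(num_queries=4, num_page_turns=3, seconds_on_search=120.0, succeeded=False),
]
print(compare_effort(sessions))
```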
Metrics: Summary
• Binary Relevance
  • Are searchers finding relevant results?
• Session Success
  • How often are search sessions successful?
• Search Efficiency
  • How much effort are searchers making?
Models
Models:
What do we model and how?
• Query Categorization
  • What is the primary domain for a query?
• Query Similarity
  • Do two queries express similar / identical intent?
• Binary Relevance
  • How to estimate relevance of results to queries?
Query Categorization
Search starts with query understanding.
Query understanding starts with categorization.
• Map query to a primary content taxonomy.
  • Subject, product type, domain, etc.
• Identify high-level intent, independent of content interest.
  • Title, category, brand, site help, etc.
• Categories should be coherent, distinctive, and useful.
  • Good categorization requires good categories.
How to Train Your Query Categorization Model
• Label your most frequent head queries manually.
  • Top 1000 queries are probably worth it.
• For torso queries, infer categories from engagement.
  • Looking for overwhelmingly dominant category.
• Now train a model using labeled head and torso queries.
  • This training data is biased, but manageably so.
  • No need to use fancy deep learning / AI. Try fastText.
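The labeling recipe above can be sketched as follows, assuming manual labels for head queries and per-query click counts by result category for torso queries. The 0.8 dominance threshold and 20-click minimum are illustrative assumptions, and category names are assumed to contain no spaces so the lines fit fastText's `__label__` training format.

```python
from collections import Counter
from typing import Dict, List, Optional


def dominant_category(
    clicks: Counter, min_share: float = 0.8, min_clicks: int = 20
) -> Optional[str]:
    """Return the category with an overwhelming share of clicks, else None."""
    total = sum(clicks.values())
    if total == 0 or total < min_clicks:
        return None  # too little engagement to trust
    category, count = clicks.most_common(1)[0]
    return category if count / total >= min_share else None


def build_training_data(
    head_labels: Dict[str, str], torso_clicks: Dict[str, Counter]
) -> List[str]:
    """Merge manual head labels with inferred torso labels, fastText-style."""
    labeled = dict(head_labels)  # manual labels take precedence
    for query, clicks in torso_clicks.items():
        if query in labeled:
            continue
        category = dominant_category(clicks)
        if category is not None:  # skip queries without a dominant category
            labeled[query] = category
    return [f"__label__{cat} {query}" for query, cat in sorted(labeled.items())]


head_labels = {"laptop": "computers"}
torso_clicks = {
    "gaming laptop 17 inch": Counter({"computers": 45, "accessories": 3}),
    "usb c cable": Counter({"accessories": 12, "computers": 10}),
}
for line in build_training_data(head_labels, torso_clicks):
    print(line)
```

Written to a file, these lines could serve as input to fastText's supervised mode (for example `fasttext.train_supervised(input=path)` in its Python bindings).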
Query Similarity
Query ambiguity is rare.
Query similarity is common.
• Some queries do not express a clear intent, but most do.
  • Most “ambiguous” queries turn out to be broad.
• Bigger opportunity: multiple queries express the same intent.
  • Or at least the same distribution of intents.
• Recognizing similar / identical queries is a huge opportunity.
  • Query rewriting, aggregating signals, etc.
How to Model Query Similarity
• Start with the simple stuff: shallow query canonicalization.
  • Character normalization, stemming, word order.
• Look at edit distance, especially for spelling errors.
  • Tail queries at edit distance 1 from head queries.
• Compare embeddings of queries and results.
  • Especially to keep the other methods honest.
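A sketch of shallow canonicalization plus edit distance. The crude plural-stripping rule stands in for a real stemmer, and the ASCII-only tokenizer is a simplifying assumption that would drop non-Latin scripts; both are placeholders, not the author's implementation.

```python
import re
import unicodedata


def canonicalize(query: str) -> str:
    """Shallow canonical form: normalize characters, crude stemming, sort words."""
    # Character normalization: decompose, strip accents, lowercase.
    text = unicodedata.normalize("NFKD", query)
    text = "".join(ch for ch in text if not unicodedata.combining(ch)).lower()
    tokens = re.findall(r"[a-z0-9]+", text)  # ASCII-only (assumption)
    # Crude plural stripping stands in for a real stemmer.
    stemmed = [
        t[:-1] if len(t) > 3 and t.endswith("s") and not t.endswith("ss") else t
        for t in tokens
    ]
    # Word order: sort tokens so "shoes red" and "red shoes" match.
    return " ".join(sorted(stemmed))


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        prev = curr
    return prev[-1]


print(canonicalize("Red Shoes"), canonicalize("shoe red"))
print(edit_distance("iphone case", "iphone cse"))
```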
Binary Relevance
Focus on the simplest question.
• Worry about whether a result is relevant or non-relevant.
  • Relevant vs. more relevant is often subjective.
• Assume that query understanding has done its job first.
  • Result relevance depends on query understanding.
• Assume that relevance is objective and universal.
  • Personalization: a nice-to-have, not a must-have.
How to Train Your Binary Relevance Model
• Collect human binary relevance judgments. Lots of them.
  • Quantity is more important than quality.
• Pay attention to query distribution and stratify sample.
  • Collect judgments that teach you something.
• Come to terms with presentation and position bias.
  • Users mostly interact with top-ranked results.
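A sketch of stratified sampling of `(query, document)` pairs for human judgment, with query-frequency bands as strata. The band cutoffs are hypothetical. `most_uncertain` shows one possible reading of “judgments that teach you something”: uncertainty sampling, where pairs the current model scores near 0.5 go to judges first.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (query, document id)


def frequency_band(count: int) -> str:
    """Hypothetical head/torso/tail cutoffs; tune these to your own traffic."""
    if count >= 1000:
        return "head"
    if count >= 50:
        return "torso"
    return "tail"


def stratified_sample(
    pairs: List[Pair], query_counts: Dict[str, int], per_stratum: int, seed: int = 0
) -> Dict[str, List[Pair]]:
    """Draw up to `per_stratum` (query, doc) pairs from each frequency band."""
    strata: Dict[str, List[Pair]] = defaultdict(list)
    for query, doc in pairs:
        strata[frequency_band(query_counts.get(query, 0))].append((query, doc))
    rng = random.Random(seed)  # fixed seed for a reproducible sample
    return {
        band: rng.sample(members, min(per_stratum, len(members)))
        for band, members in strata.items()
    }


def most_uncertain(scored: List[Tuple[Pair, float]], n: int) -> List[Pair]:
    """Pairs whose current model score is closest to 0.5 (uncertainty sampling)."""
    ranked = sorted(scored, key=lambda item: abs(item[1] - 0.5))
    return [pair for pair, _ in ranked[:n]]


pairs = [("laptop", "d1"), ("laptop", "d2"), ("laptop", "d3"),
         ("usb hub", "d4"), ("rare widget", "d5")]
counts = {"laptop": 5000, "usb hub": 120}
print(stratified_sample(pairs, counts, per_stratum=2))
```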
Models: Summary
• Query Categorization
  • Simple model to map query to primary intent.
• Query Similarity
  • Recognize queries with same or similar intent.
• Binary Relevance
  • Use human judgments to train relevance model.
Methods
Methods:
What are some useful tricks?
• Optimize for Query Performance
  • Help searchers make better queries.
• Map Tail Queries to Head Intents
  • Searchers aren’t as unique as you think!
• Learn from Successful Sessions
  • Help others discover successful paths.
Optimize for Query Performance
• Expected searcher success for query.
  • Function of query, not of any particular result.
• Can use any measure of searcher success.
  • But consider focusing on session success.
• Can incorporate sorting, refinement, or other factors.
  • But keep it simple. Query is probably enough.
What is query performance?
Best way to predict query performance?
Historical query performance.
Stuck in the tail? No data?
These methods can help.
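Historical performance can be estimated per query, and for sparse tail queries an additive-smoothing prior keeps a single lucky session from dominating. The smoothing scheme, `prior_weight`, and the `(successes, sessions)` history format are assumptions, not something the slides prescribe.

```python
from typing import Dict, Tuple


def predicted_performance(
    history: Dict[str, Tuple[int, int]],
    query: str,
    prior_rate: float,
    prior_weight: float = 10.0,
) -> float:
    """Smoothed historical session success rate for a query.

    `history` maps query -> (successful sessions, total sessions). Sparse or
    unseen queries are pulled toward `prior_rate`; `prior_weight` acts like
    that many pseudo-sessions at the prior rate.
    """
    successes, total = history.get(query, (0, 0))
    return (successes + prior_weight * prior_rate) / (total + prior_weight)


history = {"laptop": (800, 1000), "laptop stand 17in": (1, 1)}
for q in ("laptop", "laptop stand 17in", "never seen"):
    print(q, round(predicted_performance(history, q, prior_rate=0.5), 3))
```

The prior could be the site-wide success rate or, better, the rate for the query's predicted category or for a similar head query.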
Predict query performance.
Then optimize for it.
• Consider every surface where you suggest queries.
  • Autocomplete, guides, related searches, etc.
• Offer suggestions with high predicted performance.
  • Or at least nudge users wherever possible.
• Use query rewriting to improve query performance.
  • Rewrite to similar, high-performing queries.
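A sketch of using predicted performance on two surfaces: ranking autocomplete candidates and deciding whether to rewrite a query. It assumes a precomputed `performance` map (for example, from smoothed historical success) and a list of similar queries from the query-similarity step; the `margin` guard against marginal rewrites is an illustrative choice.

```python
from typing import Dict, List


def rank_suggestions(prefix: str, performance: Dict[str, float], limit: int = 5) -> List[str]:
    """Autocomplete candidates extending `prefix`, best predicted performance first."""
    candidates = [q for q in performance if q.startswith(prefix)]
    candidates.sort(key=lambda q: (-performance[q], q))  # ties broken alphabetically
    return candidates[:limit]


def choose_rewrite(
    query: str, similar: List[str], performance: Dict[str, float], margin: float = 0.05
) -> str:
    """Rewrite to the best similar query only if it beats the original by `margin`."""
    original = performance.get(query, 0.0)
    best = max(similar, key=lambda q: performance.get(q, 0.0), default=query)
    return best if performance.get(best, 0.0) >= original + margin else query


performance = {"red shoes": 0.6, "red shoes women": 0.7, "red sneakers": 0.4, "blue shoes": 0.9}
print(rank_suggestions("red", performance))
print(choose_rewrite("red shoe", ["red shoes", "red sneakers"], performance))
```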
Pull Your Tail From Your Head
Many tail queries express head intents.
• Misspelled queries are often misspellings of head queries.
  • Common misspellings are uncommon.
• Many queries have a dominant singular or plural form.
  • Often, though not always, the same intent.
• Also word order or other grammatical transformations.
  • Such as removal of low-information / noise words.
Rewrite tail queries!
• Prioritize correcting misspellings of head queries.
  • Be more aggressive, skip tokenization, etc.
• Look for head queries equivalent to tail queries.
  • Stemming, reordering terms, dropping noise words.
• But check to make sure intent is actually preserved!
  • Remember earlier discussion of query similarity.
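A sketch of mapping tail queries to head queries: first through a shallow key that ignores term order and a hypothetical noise-word list, then by generating every string one edit away from the whole query (no tokenization) and keeping those that are head queries. Head queries are assumed to be stored lowercase with single spaces; stemming is omitted for brevity.

```python
import string
from typing import Dict, Optional, Set

ALPHABET = string.ascii_lowercase + string.digits + " "
NOISE_WORDS = {"the", "a", "an", "for", "of"}  # hypothetical noise-word list


def edits1(text: str) -> Set[str]:
    """All strings one edit away (delete, transpose, replace, insert)."""
    splits = [(text[:i], text[i:]) for i in range(len(text) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    transposes = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    replaces = {a + c + b[1:] for a, b in splits if b for c in ALPHABET}
    inserts = {a + c + b for a, b in splits for c in ALPHABET}
    return deletes | transposes | replaces | inserts


def shallow_key(query: str) -> str:
    """Lowercase, drop noise words, sort the remaining terms."""
    return " ".join(sorted(t for t in query.lower().split() if t not in NOISE_WORDS))


def map_to_head(query: str, head_counts: Dict[str, int]) -> Optional[str]:
    """Map a tail query to an equivalent head query, or None if no match."""
    normalized = " ".join(query.lower().split())
    if normalized in head_counts:
        return normalized
    # 1) Equivalent after reordering terms / dropping noise words.
    by_key: Dict[str, str] = {}
    for head in sorted(head_counts, key=head_counts.get, reverse=True):
        by_key.setdefault(shallow_key(head), head)  # most frequent head wins
    match = by_key.get(shallow_key(normalized))
    if match is not None:
        return match
    # 2) Misspelling of a head query: one edit away, most frequent head wins.
    candidates = [c for c in edits1(normalized) if c in head_counts]
    if candidates:
        return max(candidates, key=lambda c: (head_counts[c], c))
    return None


head_counts = {"iphone case": 5000, "running shoes": 3000}
print(map_to_head("iphnoe case", head_counts))
print(map_to_head("Shoes for Running", head_counts))
```

Any match should still be validated before it is used as a rewrite, for example by checking that the two queries retrieve substantially overlapping results.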
Learn From Success
Successful searchers can help everyone else.
• Some queries lead to great performance for everyone.
  • e.g., known-item searches by name or title.
• But for some queries, performance is user-dependent.
  • Some users are more sophisticated or persistent.
• Successful users discover successful paths.
  • Use trails of successful users to build shortcuts!
Optimize complex journeys.
• Detect the searches for which searchers need help.
  • Queries for which successful sessions are long.
• Find the actions that successful searchers take.
  • Category / facet refinements, reformulations.
• Promote those actions in the search experience.
  • Create shortcuts in the navigational landscape.
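A sketch of finding queries that need help and the actions successful searchers take for them, assuming a hypothetical `Session` record with the first query, a list of action strings, and a success flag. Treating the median number of actions as the measure of “long” and using a cutoff of 3 are assumptions.

```python
from collections import Counter, defaultdict
from statistics import median
from typing import Dict, List, NamedTuple, Tuple


class Session(NamedTuple):
    """Hypothetical session record keyed by its first query."""
    first_query: str
    actions: List[str]  # e.g. "facet:brand=acme", "sort:price", "page:2"
    succeeded: bool


def queries_needing_help(sessions: List[Session], min_median_actions: int = 3) -> List[str]:
    """First queries whose successful sessions take many actions (by median)."""
    lengths: Dict[str, List[int]] = defaultdict(list)
    for s in sessions:
        if s.succeeded:
            lengths[s.first_query].append(len(s.actions))
    return sorted(q for q, ls in lengths.items() if median(ls) >= min_median_actions)


def common_successful_actions(
    sessions: List[Session], query: str, top_n: int = 3
) -> List[Tuple[str, int]]:
    """Actions appearing in the most successful sessions for `query`."""
    counts: Counter = Counter()
    for s in sessions:
        if s.succeeded and s.first_query == query:
            counts.update(set(s.actions))  # count each action once per session
    return counts.most_common(top_n)


sessions = [
    Session("laptop", ["facet:category=laptops", "facet:brand=acme", "sort:price"], True),
    Session("laptop", ["facet:category=laptops", "facet:ram=16gb", "page:2"], True),
    Session("laptop", ["page:2"], False),
    Session("usb c cable", ["click"], True),
]
print(queries_needing_help(sessions))
print(common_successful_actions(sessions, "laptop", top_n=1))
```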
Methods: Summary
• Optimize for Query Performance
  • Suggest better queries and rewrite others.
• Map Tail Queries to Head Intents
  • Rewrite tail queries as similar head queries.
• Learn from Successful Sessions
  • Create shortcuts based on successful paths.
Putting It All Together
• Metrics, models, and methods — they all matter.

• Query understanding first, then result relevance.

• Binary result relevance first, then result ranking.

• Session performance, not just query performance.

• Get as much leverage as possible from head queries.
Thank You!
• More Resources
  • Query Understanding
    https://queryunderstanding.com/
  • My Medium (not just about search)
    https://medium.com/@dtunkelang
• Contact me directly!
  dtunkelang@gmail.com

 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with Azure
 
Cyber security and its impact on E commerce
Cyber security and its impact on E commerceCyber security and its impact on E commerce
Cyber security and its impact on E commerce
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based project
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need It
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtime
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024
 
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
 
Advantages of Odoo ERP 17 for Your Business
Advantages of Odoo ERP 17 for Your BusinessAdvantages of Odoo ERP 17 for Your Business
Advantages of Odoo ERP 17 for Your Business
 

MMM, Search!

  • 1. MMM, Search! Daniel Tunkelang dtunkelang@gmail.com Presented to Wikimedia Foundation on April 27, 2020
  • 3. Search is a process. • Searchers • start with information-seeking goals. • express and elaborate those goals as queries. • Search Engines • translate queries into representations of intent. • retrieve results relevant to that intent and rank them. Communication isn’t perfect, so the process is iterative.
  • 4. Search is many things. • Known-item search vs. exploratory search. • Seeking specific item vs. knowing when you see it. • Search is a means to an end, not the end itself. • Getting information, shopping, communication, etc. • It takes a lot of hard work to make search feel effortless. • Indexing, query understanding, matching, ranking.
  • 5. Metrics, Models, Methods The most important decisions for a search engine are: • Metrics: what we measure and optimize for. • Models: how we model the search experience. • Methods: how we help searchers achieve success.
  • 7. Metrics: What do we need to know? • Binary Relevance • Are searchers finding relevant results? • Session Success • How often are search sessions successful? • Search Efficiency • How much effort are searchers making?
  • 8. Binary Relevance Relevance is a measure of information conveyed by a document relative to a query. Relationship between document and query, though necessary, is not sufficient to determine relevance. (William Goffman, 1964)
  • 9. Relevance has shades of gray, but non-relevance is black and white.
  • 10. Example: Email • Can Google decide which of my emails are important? • ¯\_(ツ)_/¯ • Can Google decide which of my emails are spam? • Definitely!
  • 11. Measure Binary Relevance! • Build a (query, document) binary relevance model. • (we’ll get back to that in a moment) • Embrace positional bias: measure at top ranks. • Can use top k results or weighted sample. • Stratify for meaningful query and user segments. • Leverage query classification and user data.
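A minimal sketch of measuring binary relevance at the top ranks and stratifying by segment. It assumes each sample is a (segment, judgments) pair where judgments are 0/1 labels in rank order; the function names, the default k=3, and the 1/rank discount in the weighted variant are illustrative choices, not the author's.

```python
from collections import defaultdict

def precision_at_k(judgments, k=3):
    """Fraction of the top-k results judged relevant.

    judgments: binary labels (1 = relevant, 0 = non-relevant) in rank order.
    """
    top = judgments[:k]
    return sum(top) / len(top) if top else 0.0

def position_weighted_precision(judgments, k=3):
    """Like precision_at_k, but rank i (1-based) gets weight 1/i (an assumed discount)."""
    weights = [1 / i for i in range(1, min(k, len(judgments)) + 1)]
    if not weights:
        return 0.0
    return sum(w * j for w, j in zip(weights, judgments)) / sum(weights)

def stratified_precision(samples, k=3):
    """Mean precision@k per segment.

    samples: (segment, judgments) pairs; a segment might be a query category
    or a user cohort (hypothetical schema).
    """
    totals, counts = defaultdict(float), defaultdict(int)
    for segment, judgments in samples:
        totals[segment] += precision_at_k(judgments, k)
        counts[segment] += 1
    return {seg: totals[seg] / counts[seg] for seg in totals}
```

Reporting one number per segment keeps a strong head from hiding a weak tail; the weighted variant is one way to lean further into positional bias.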
  • 12. Search is a journey. [Figure: Searcher]
  • 13. Search isn’t always one-shot. Search can’t always be one-shot.
  • 14. Measure Session Success! • Measure session conversion, not just query conversion. • Much better proxy for user’s success! • Compute metrics based on first query of session. • Distribution of journeys for common intent. • Segment sessions into tasks? Maybe, but optional. • Multi-task sessions uncommon; treat as noise.
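A rough sketch of session-level measurement keyed by the first query of each session, contrasted with query-level conversion. It assumes a hypothetical log format of (queries, converted) pairs; a real pipeline would first have to sessionize raw logs.

```python
from collections import defaultdict

def session_success_by_first_query(sessions):
    """Session conversion rate keyed by the first query of each session.

    sessions: (queries, converted) pairs; queries is the ordered list of queries
    in the session, converted is True if the session ended in success.
    """
    successes, totals = defaultdict(int), defaultdict(int)
    for queries, converted in sessions:
        if not queries:
            continue
        first = queries[0]
        totals[first] += 1
        successes[first] += int(converted)
    return {q: successes[q] / totals[q] for q in totals}

def overall_rates(sessions):
    """Compare query-level and session-level conversion over the same sessions.

    Query conversion divides conversions by every query issued, so
    reformulations drag it down even when the session succeeded.
    """
    nonempty = [(q, c) for q, c in sessions if q]
    if not nonempty:
        return {"query_conversion": 0.0, "session_conversion": 0.0}
    conversions = sum(1 for _, c in nonempty if c)
    num_queries = sum(len(q) for q, _ in nonempty)
    return {
        "query_conversion": conversions / num_queries,
        "session_conversion": conversions / len(nonempty),
    }
```

Crediting the first query means "laptop" gets credit for a session that succeeded only after a reformulation, which is closer to what the searcher experienced.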
  • 16. Searching is not fun. Having found is fun. • If search is too hard or takes too long, searchers give up. • Compare successful and unsuccessful sessions. • Measure how much time searchers spend in sessions. • Especially time on search rather than results. • Measure searcher effort. • Pagination, reformulation, refinement, etc.
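One way to quantify searcher effort, sketched under assumptions: each session is a list of (event_type, seconds) pairs plus a success flag, "paginate", "reformulate", and "refine" count as effort, and any event other than "result" counts as time on the search page. The event names and schema are hypothetical.

```python
from statistics import mean

# Hypothetical event types that count as searcher effort.
EFFORT_EVENTS = {"paginate", "reformulate", "refine"}

def session_effort(events):
    """Return (effort count, share of time spent on search pages) for one session.

    events: (event_type, seconds) pairs. Every event except "result" is treated
    as time on the search page (an assumption about the log schema).
    """
    effort = sum(1 for kind, _ in events if kind in EFFORT_EVENTS)
    total = sum(sec for _, sec in events)
    on_search = sum(sec for kind, sec in events if kind != "result")
    return effort, (on_search / total if total else 0.0)

def compare_sessions(sessions):
    """Mean effort and mean search-time share, split by session success."""
    groups = {True: [], False: []}
    for events, success in sessions:
        groups[bool(success)].append(session_effort(events))
    return {
        ok: (mean(e for e, _ in rows), mean(s for _, s in rows)) if rows else None
        for ok, rows in groups.items()
    }
```

If unsuccessful sessions show much higher effort and a larger share of time on the results page, searchers are likely giving up rather than finding.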
  • 17. Metrics: Summary • Binary Relevance • Are searchers finding relevant results? • Session Success • How often are search sessions successful? • Search Efficiency • How much effort are searchers making?
  • 19. Models: What do we model and how? • Query Categorization • What is the primary domain for a query? • Query Similarity • Do two queries express similar / identical intent? • Binary Relevance • How to estimate relevance of results to queries?
  • 21. Search starts with query understanding. Query understanding starts with categorization. • Map query to a primary content taxonomy. • Subject, product type, domain, etc. • Identify high-level intent, independent of content interest. • Title, category, brand, site help, etc. • Categories should be coherent, distinctive, and useful. • Good categorization requires good categories.
  • 22. How to Train Your Query Categorization Model • Label your most frequent head queries manually. • Top 1000 queries are probably worth it. • For torso queries, infer categories from engagement. • Looking for overwhelmingly dominant category. • Now train a model using labeled head and torso queries. • This training data is biased, but manageably so. • No need to use fancy deep learning / AI. Try fastText.
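A sketch of assembling the training data described above: manual labels for head queries, plus torso queries labeled only when engagement shows an overwhelmingly dominant category. The thresholds (80% share, 20 clicks) and the input shapes are illustrative assumptions. The output lines use fastText's supervised format, where each line starts with a `__label__` prefix followed by the text.

```python
from collections import Counter

def infer_torso_label(category_clicks, min_share=0.8, min_clicks=20):
    """Return the dominant category for a torso query, or None if none dominates.

    category_clicks: mapping from category to engagement count (e.g. clicks).
    min_share and min_clicks are illustrative thresholds.
    """
    total = sum(category_clicks.values())
    if total < min_clicks:
        return None
    category, count = Counter(category_clicks).most_common(1)[0]
    return category if count / total >= min_share else None

def to_fasttext_line(query, category):
    """One training example in fastText supervised format: __label__<cat> <text>."""
    return f"__label__{category.replace(' ', '_')} {query.lower()}"

def build_training_data(head_labels, torso_engagement):
    """Combine manual head labels with engagement-inferred torso labels."""
    lines = [to_fasttext_line(q, c) for q, c in head_labels.items()]
    for query, clicks in torso_engagement.items():
        label = infer_torso_label(clicks)
        if label is not None:  # skip torso queries without a dominant category
            lines.append(to_fasttext_line(query, label))
    return lines
```

Written to a file, these lines could then be passed to the `fasttext` Python package's `fasttext.train_supervised(input=...)`; tuning of its hyperparameters is left out here.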
  • 24. Query ambiguity is rare. Query similarity is common. • Some queries do not express a clear intent, but most do. • Most “ambiguous” queries turn out to be broad. • Bigger opportunity: multiple queries express same intent. • Or at least the same distribution of intents. • Recognizing similar / identical queries is a huge opportunity. • Query rewriting, aggregating signals, etc.
  • 25. How to Model Query Similarity • Start with the simple stuff: shallow query canonicalization. • Character normalization, stemming, word order. • Look at edit distance, especially for spelling errors. • Tail queries at edit distance 1 from head queries. • Compare embeddings of queries and results. • Especially to keep the other methods honest.
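A minimal sketch of the "simple stuff": shallow canonicalization (accent stripping, lowercasing, whitespace collapsing, token sorting) followed by an edit-distance-1 check against head queries. The trailing-"s" rule is a crude stand-in for a real stemmer, and accepting only a unique distance-1 match is an assumption meant to avoid ambiguous corrections.

```python
import unicodedata

def canonicalize(query):
    """Shallow canonicalization: strip accents, lowercase, collapse whitespace, sort tokens.

    The plural-stripping rule is a crude stand-in for a real stemmer (assumption).
    """
    text = unicodedata.normalize("NFKD", query)
    text = "".join(ch for ch in text if not unicodedata.combining(ch)).lower()
    tokens = [t[:-1] if t.endswith("s") and len(t) > 3 else t for t in text.split()]
    return " ".join(sorted(tokens))

def edit_distance(a, b):
    """Levenshtein distance (insertions, deletions, substitutions) via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def match_head_query(tail, head_queries):
    """Return a head query equivalent or one edit away from the tail query, else None."""
    key = canonicalize(tail)
    canon_heads = {canonicalize(h): h for h in head_queries}
    if key in canon_heads:
        return canon_heads[key]
    candidates = [h for c, h in canon_heads.items() if edit_distance(key, c) == 1]
    return candidates[0] if len(candidates) == 1 else None
```

Embedding comparisons would sit on top of this as a check, flagging pairs that canonicalize together but retrieve very different results.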
  • 27. Focus on the simplest question. • Worry whether a result is relevant or non-relevant. • Relevant vs. more relevant is often subjective. • Assume that query understanding has done its job first. • Result relevance depends on query understanding. • Assume that relevance is objective and universal. • Personalization: a nice-to-have, not a must-have.
  • 28. How to Train Your Binary Relevance Model • Collect human binary relevance judgments. Lots of them. • Quantity is more important than quality. • Pay attention to query distribution and stratify sample. • Collect judgments that teach you something. • Come to terms with presentation and position bias. • Users mostly interact with top-ranked results.
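A sketch of stratifying the query sample sent for human judgment, so tail queries are not crowded out by head traffic. The band cutoffs (top 1% head, next 19% torso, remainder tail) and uniform sampling within each band are illustrative assumptions.

```python
import random

def stratified_sample(query_counts, per_stratum, seed=0):
    """Sample queries for human judgment, stratified by traffic band.

    query_counts: mapping from query to frequency. Bands are cut by frequency
    rank: top 1% head, next 19% torso, remainder tail (illustrative thresholds).
    """
    ranked = sorted(query_counts, key=query_counts.get, reverse=True)
    n = len(ranked)
    head_end = max(1, n // 100)
    torso_end = max(head_end, n // 5)
    strata = {
        "head": ranked[:head_end],
        "torso": ranked[head_end:torso_end],
        "tail": ranked[torso_end:],
    }
    rng = random.Random(seed)  # fixed seed so the judgment set is reproducible
    return {name: rng.sample(qs, min(per_stratum, len(qs))) for name, qs in strata.items()}
```

Each sampled query would then be paired with its top-ranked results for binary judgments, which also accounts for the fact that users mostly see those positions.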
  • 29. Models: Summary • Query Categorization • Simple model to map query to primary intent. • Query Similarity • Recognize queries with same or similar intent. • Binary Relevance • Use human judgments to train relevance model.
  • 31. Methods: What are some useful tricks? • Optimize for Query Performance • Help searchers make better queries. • Map Tail Queries to Head Intents • Searchers aren’t as unique as you think! • Learn from Successful Sessions • Help others discover successful paths.
  • 33. What is query performance? • Expected searcher success for query. • Function of query, not of any particular result. • Can use any measure of searcher success. • But consider focusing on session success. • Can incorporate sorting, refinement, or other factors. • But keep it simple. Query is probably enough.
  • 34. Best way to predict query performance? Historical query performance.
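A sketch of turning historical session success into a per-query performance estimate. The shrinkage toward the global success rate is an added assumption, not something the slide prescribes; it keeps queries with only a few sessions from getting extreme scores. `prior_weight` is an illustrative pseudo-count.

```python
def performance_table(stats, prior_weight=10):
    """Predicted performance per query from historical session success.

    stats: mapping query -> (successful_sessions, total_sessions).
    Each query's success rate is shrunk toward the global rate, so queries with
    little history stay near it (an assumption beyond "historical performance").
    """
    total_success = sum(s for s, _ in stats.values())
    total_sessions = sum(n for _, n in stats.values())
    prior = total_success / total_sessions if total_sessions else 0.0
    return {
        q: (s + prior * prior_weight) / (n + prior_weight)
        for q, (s, n) in stats.items()
    }
```

With ample history the estimate converges to the raw success rate, so for head queries this is simply historical query performance.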
  • 35. Stuck in the tail? No data? These methods can help.
  • 36. Predict query performance. Then optimize for it. • Consider every surface where you suggest queries. • Autocomplete, guides, related searches, etc. • Offer suggestions with high predicted performance. • Or at least nudge users wherever possible. • Use query rewriting to improve query performance. • Rewrite to similar, high-performing queries.
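A sketch of using predicted performance on two surfaces: ordering autocomplete candidates, and rewriting a query to a similar one only when the similar query is predicted to perform clearly better. The `similar` mapping and the `margin` value are hypothetical inputs; producing them is the job of a query similarity model.

```python
def suggest(prefix, performance, limit=3, min_performance=0.0):
    """Autocomplete candidates starting with prefix, best predicted performance first."""
    prefix = prefix.lower()
    candidates = [q for q, p in performance.items()
                  if q.startswith(prefix) and p >= min_performance]
    # Ties broken alphabetically so the ordering is deterministic.
    return sorted(candidates, key=lambda q: (-performance[q], q))[:limit]

def maybe_rewrite(query, similar, performance, margin=0.1):
    """Rewrite query to its best-performing similar query if that beats it by margin."""
    best = max(similar.get(query, []), key=lambda q: performance.get(q, 0.0), default=None)
    if best is not None and performance.get(best, 0.0) >= performance.get(query, 0.0) + margin:
        return best
    return query
```

The margin keeps the system from rewriting on noise; a nudge (suggesting the better query) is the softer alternative when confidence is lower.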
  • 37. Pull Your Tail From Your Head
  • 38. Many tail queries express head intents. • Misspelled queries are often misspellings of head queries. • Common misspellings are uncommon. • Many queries have a dominant singular or plural form. • Often, though not always, the same intent. • Also word order or other grammatical transformations. • Such as removal of low-information / noise words.
  • 39. Rewrite tail queries! • Prioritize correcting misspellings of head queries. • Be more aggressive, skip tokenization, etc. • Look for head queries equivalent to tail queries. • Stemming, reordering terms, dropping noise words. • But check to make sure intent is actually preserved! • Remember earlier discussion of query similarity.
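A sketch of rewriting a tail query to an equivalent head query via a canonical key (noise words dropped, crude singularization, terms sorted), with an intent check before rewriting. The noise-word list, the singularization rule, and the `categorize` callback (standing in for a query categorization model) are all hypothetical.

```python
NOISE_WORDS = {"a", "an", "the", "for", "of"}  # hypothetical noise-word list

def key(query):
    """Canonical key: drop noise words, crudely singularize, ignore word order."""
    tokens = [t for t in query.lower().split() if t not in NOISE_WORDS]
    tokens = [t[:-1] if t.endswith("s") and len(t) > 3 else t for t in tokens]
    return tuple(sorted(tokens))

def build_head_index(head_queries):
    """Map canonical key -> head query; assumes head_queries are ordered most frequent first."""
    index = {}
    for q in head_queries:
        index.setdefault(key(q), q)
    return index

def rewrite_tail(query, head_index, categorize):
    """Rewrite to the equivalent head query only if the predicted intent matches."""
    head = head_index.get(key(query))
    if head is None or head == query:
        return query
    # Intent check, approximated here by agreement of predicted categories.
    return head if categorize(head) == categorize(query) else query
```

The category check is what stops "house dog" from being rewritten to "dog house", even though both share a canonical key.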
  • 41. Successful searchers can help everyone else. • Some queries lead to great performance for everyone. • e.g., known-item searches by name or title. • But for some queries, performance is user-dependent. • Some users are more sophisticated or persistent. • Successful users discover successful paths. • Use trails of successful users to build shortcuts!
  • 42. Optimize complex journeys. • Detect the searches for which searchers need help. • Queries for which successful sessions are long. • Find the actions that successful searchers take. • Category / facet refinements, reformulations. • Promote those actions in the search experience. • Create shortcuts in the navigational landscape.
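A sketch of the two detection steps: find queries whose successful sessions tend to be long, then find the actions most successful searchers took after those queries. Sessions are assumed to be (first_query, actions, success) triples with actions as strings such as "refine:brand=lenovo"; the schema and the median-length threshold are illustrative.

```python
from collections import Counter, defaultdict
from statistics import median

def journeys_needing_help(sessions, min_median_steps=4):
    """Queries whose successful sessions have a median length >= min_median_steps."""
    lengths = defaultdict(list)
    for first_query, actions, success in sessions:
        if success:
            lengths[first_query].append(len(actions))
    return {q for q, ls in lengths.items() if median(ls) >= min_median_steps}

def common_successful_actions(sessions, query, top_n=2):
    """Most frequent actions in successful sessions that started with query."""
    counts = Counter()
    for first_query, actions, success in sessions:
        if success and first_query == query:
            counts.update(set(actions))  # count each action at most once per session
    return [a for a, _ in counts.most_common(top_n)]
```

Actions that most successful searchers share, such as a particular facet refinement, are candidates for promotion as shortcuts on the results page for that query.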
  • 43. Methods: Summary • Optimize for Query Performance • Suggest better queries and rewrite others. • Map Tail Queries to Head Intents • Rewrite tail queries as similar head queries. • Learn from Successful Sessions • Create shortcuts based on successful paths.
  • 44. Putting It All Together • Metrics, models, and methods — they all matter. • Query understanding first, then result relevance. • Binary result relevance first, then result ranking. • Session performance, not just query performance. • Get as much leverage as possible from head queries.
  • 45. Thank You! • More Resources • Query Understanding: https://queryunderstanding.com/ • My Medium (not just about search): https://medium.com/@dtunkelang • Contact me directly! dtunkelang@gmail.com