© 2019 The MITRE Corporation. All rights reserved.
Quaerite – Search Relevance Toolkit
Tim Allison
tallison@apache.org, @_tallison
April 24, 2019
Haystack Conference
Approved for Public Release; Distribution Unlimited. Case Number 18-3138-5
Debt of Gratitude
▪ Thank you to Doug Turnbull, John Berryman, and OpenSource Connections for the inspiration, examples, and training with tmdb, and for sharing your ground truth set!
Yet Another Toolkit? Why!?
▪ How many parameters do we have?
▪ How many permutations of those parameters are available?
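A rough back-of-the-envelope sketch of the second question: if every parameter is varied independently, an exhaustive grid needs as many experiments as the product of the number of candidate values per parameter. The parameter names and counts below are illustrative assumptions, not figures from the talk.

```python
from math import prod

# Hypothetical search space: parameter name -> number of candidate values.
# These counts are made up for illustration only.
search_space = {
    "tokenizer": 3,
    "stemmer": 4,
    "query_operator": 2,
    "minimum_should_match": 5,
    "title_boost": 6,
    "body_boost": 6,
    "bm25_k1": 5,
    "bm25_b": 5,
}

def grid_size(space):
    """Number of experiments in an exhaustive grid over the space."""
    return prod(space.values())

total = grid_size(search_space)
print(f"{len(search_space)} parameters -> {total:,} experiments")
```

Even this small, made-up space runs into six figures, which is the motivation for sampling or searching the space rather than enumerating it.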
Available Parameters
▪ 14 tokenizers: https://lucene.apache.org/solr/guide/7_1/tokenizers.html
▪ ~45 token filters (not including language-specific token filters – see next slide): https://lucene.apache.org/solr/guide/7_1/filter-descriptions.html
▪ Query parsers
▪ Query operators, minimum should match, should, must, not
▪ Token/field based scoring – best_fields, most_fields, cross_fields
▪ Field boosting
▪ Phrasal boosting/shingling
▪ Synonym lists, taxonomies
▪ Similarity scoring parameters (with BM25)
▪ Elevate
▪ External signal enrichment
– manual or automatic (NLP – entity extraction, categorization, etc.)
▪ Reranking via machine learning (Learning to Rank)
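To make one of these knobs concrete, here is a minimal sketch of the textbook BM25 term score, showing where its two similarity parameters, k1 (term-frequency saturation) and b (length normalization), enter. This is the textbook formula and an assumption for illustration; the exact computation in a given Lucene/Solr version may differ in details. The statistics passed in are made up.

```python
import math

def bm25_term_score(tf, doc_len, avg_doc_len, n_docs, doc_freq, k1=1.2, b=0.75):
    """Textbook BM25 score of one query term in one document."""
    idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    length_norm = 1 - b + b * (doc_len / avg_doc_len)
    return idf * (tf * (k1 + 1)) / (tf + k1 * length_norm)

# Same (made-up) term statistics, different (k1, b) settings.
for k1, b in [(1.2, 0.75), (0.5, 0.75), (1.2, 0.0)]:
    score = bm25_term_score(tf=3, doc_len=200, avg_doc_len=100,
                            n_docs=10_000, doc_freq=50, k1=k1, b=b)
    print(f"k1={k1}, b={b}: {score:.3f}")
```

With b=0 the document length drops out of the score entirely, so even this single bullet contributes two continuous parameters to the search space.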
Each Token Filter Can Have Many Parameters
<filter class="solr.WordDelimiterFilterFactory"
        protected="protwords.txt"
        generateWordParts="1"
        generateNumberParts="1"
        catenateWords="1"
        catenateNumbers="1"
        catenateAll="0"
        splitOnCaseChange="0"
        preserveOriginal="1"/>
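As a sketch of how quickly one filter multiplies the search space, the snippet below enumerates every on/off combination of the seven 0/1 attributes in the filter definition above and renders each as a filter element. It only builds strings; treating each variant as a separate experiment is an assumption for illustration, not a description of how Quaerite handles analysis chains.

```python
from itertools import product
from xml.sax.saxutils import quoteattr

# The seven 0/1 attributes from the filter definition above.
FLAGS = [
    "generateWordParts", "generateNumberParts", "catenateWords",
    "catenateNumbers", "catenateAll", "splitOnCaseChange", "preserveOriginal",
]

def filter_variants(flags=FLAGS):
    """Yield one <filter .../> element string per on/off combination."""
    for values in product("01", repeat=len(flags)):
        attrs = " ".join(f"{name}={quoteattr(v)}" for name, v in zip(flags, values))
        yield ('<filter class="solr.WordDelimiterFilterFactory" '
               f'protected="protwords.txt" {attrs}/>')

variants = list(filter_variants())
print(len(variants))  # 2**7 combinations
print(variants[0])
```

Each variant would also require reindexing when the filter sits in the index-time analyzer, which makes exhaustive testing of analysis settings considerably more expensive than query-time settings.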
Overview – Offline testing toolkit
Prerequisites:
1. Reliable, generalizable ground truth
2. Reliable, useful underlying data
3. Offline metric has to have some connection to KPIs
4. Expertise – you still have to know what you’re doing!!!
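Prerequisites 1 and 3 can be made concrete with a small sketch of one common offline metric, nDCG@k, computed from graded ground-truth judgments. The choice of nDCG with linear gains, the document ids, and the grades are all assumptions for illustration; the slides do not say which metric is used.

```python
import math

def dcg_at_k(gains, k):
    """Discounted cumulative gain over the first k graded results."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains[:k]))

def ndcg_at_k(ranked_ids, judgments, k=10):
    """nDCG@k for one query; judgments maps doc id -> graded relevance."""
    gains = [judgments.get(doc_id, 0) for doc_id in ranked_ids]
    ideal = sorted(judgments.values(), reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0

# Hypothetical ground truth and two rankings for the same query.
judgments = {"d1": 3, "d2": 2, "d3": 1}
good = ndcg_at_k(["d1", "d2", "d3", "d9"], judgments, k=4)
bad = ndcg_at_k(["d9", "d3", "d2", "d1"], judgments, k=4)
print(good, bad)
```

Averaging such a per-query score over the whole ground truth set gives one number per experiment, which is only as trustworthy as the judgments behind it.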
Main Tools
1. Run Experiments
2. Generate Experiments
▪ All permutations (grid search)
▪ Random experiments (random search)
3. Genetic Algorithm
▪ Cross-fold validation!!!
▪ Complementary to LTR: the main differences are the algorithm and that it runs offline to tune general settings rather than reranking the top n results
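The tools above can be sketched in a few lines: a random experiment generator plus a small genetic algorithm with crossover, mutation, and elitism. This is a generic illustration, not Quaerite's implementation; the parameter space, the population settings, and the stand-in fitness function are all hypothetical. In a cross-validated run, the fitness would be the offline metric on the training queries, and the winning experiment would then be re-scored on held-out queries.

```python
import random

# Hypothetical search space: parameter name -> candidate values.
SPACE = {
    "title_boost": [1, 2, 5, 10],
    "body_boost": [1, 2, 5, 10],
    "mm": ["1", "50%", "100%"],
}

def random_experiment(rng):
    """One random point in the space (the random-search building block)."""
    return {name: rng.choice(values) for name, values in SPACE.items()}

def crossover(a, b, rng):
    """Child takes each parameter from one parent or the other."""
    return {name: rng.choice([a[name], b[name]]) for name in SPACE}

def mutate(exp, rng, rate=0.2):
    """Re-draw each parameter with probability `rate`."""
    return {name: (rng.choice(SPACE[name]) if rng.random() < rate else value)
            for name, value in exp.items()}

def evolve(fitness, generations=10, pop_size=12, keep=4, seed=0):
    """Return the best experiment found; `fitness` scores one experiment."""
    rng = random.Random(seed)
    population = [random_experiment(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[:keep]  # elitism: the best survive unchanged
        children = []
        while len(children) < pop_size - keep:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = parents + children
    return max(population, key=fitness)

# Stand-in fitness. A real run would execute the training queries against
# the search engine and return the offline metric for this experiment.
def toy_fitness(exp):
    return exp["title_boost"] - abs(exp["body_boost"] - 2) + (exp["mm"] == "50%")

best = evolve(toy_fitness)
print(best, toy_fitness(best))
```

Because the best experiments are carried over unchanged, the top score never decreases from one generation to the next; the seed makes the run repeatable.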
Odds and Ends
▪ Analyzer Comparison over (mostly) the index
▪ Significant Terms (yawn… for archaic versions of Solr), with plans to add these as parameters in “generate experiments”
Adding Porter Stemming: create account
creat
    created: 709
    create: 551
    creating: 269
    creates: 153
    creat: 1
account
    account: 3244
    accounts: 1924
    accounting: 1548
    accountants: 340
    accountant: 176
    accounted: 134
    accountability: 74
    accountable: 74
    accountancy: 65
    account's: 7
    accountant's: 7
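A minimal sketch of how a report like this could be produced: take the surface terms and their counts, run each through the candidate analyzer, and group by the resulting term. The `toy_stem` function is a crude stand-in written for this example and is NOT the Porter stemmer; a real comparison would call the actual analysis chain. The counts are copied from the "creat" group on the slide.

```python
from collections import Counter, defaultdict

def toy_stem(token):
    """Crude suffix stripper standing in for a real stemmer (NOT Porter)."""
    for suffix in ("ing", "ed", "es", "s", "e"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 4:
            return token[: -len(suffix)]
    return token

def compare_analyzers(term_counts, analyze):
    """Group surface forms by the term the new analyzer maps them to."""
    groups = defaultdict(Counter)
    for surface, count in term_counts.items():
        groups[analyze(surface)][surface] += count
    return groups

# Counts copied from the "creat" group on the slide.
index_terms = {"created": 709, "create": 551, "creating": 269,
               "creates": 153, "creat": 1}

groups = compare_analyzers(index_terms, toy_stem)
for stem, surfaces in groups.items():
    print(stem)
    for surface, count in surfaces.most_common():
        print(f"  {surface}: {count}")
```

Reading the groups shows which distinct words a filter would conflate before committing to a reindex.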
Status
▪ Alpha release 3/22/2019 (Solr only)
▪ Beta1 release this week (?)
– This will include support for Elasticsearch
▪ Dream
– Incorporate experiment generation/GA into Rated Ranking Evaluator (RRE)
– Apache Incubator -> Top Level Project (TLP)
Links
▪ Main site: https://github.com/mitre/quaerite
▪ Examples: https://github.com/mitre/quaerite/blob/master/quaerite-examples/README.md
▪ Contact
– tallison@apache.org
– @_tallison

Weitere ähnliche Inhalte

Was ist angesagt?

Scalable, Fast Analytics with Graph - Why and How
Scalable, Fast Analytics with Graph - Why and HowScalable, Fast Analytics with Graph - Why and How
Scalable, Fast Analytics with Graph - Why and HowCambridge Semantics
 
Datahive 360 - Felipe Wesbonk
Datahive 360 - Felipe WesbonkDatahive 360 - Felipe Wesbonk
Datahive 360 - Felipe WesbonkImmelda Oord
 
Building A Feature Factory
Building A Feature FactoryBuilding A Feature Factory
Building A Feature FactoryDatabricks
 
The DataSift platform
The DataSift platform The DataSift platform
The DataSift platform ChrisParsons7
 
Better Together: How Graph database enables easy data integration with Spark ...
Better Together: How Graph database enables easy data integration with Spark ...Better Together: How Graph database enables easy data integration with Spark ...
Better Together: How Graph database enables easy data integration with Spark ...TigerGraph
 
Turning Machine Learning Prototypes into Products
Turning Machine Learning Prototypes into ProductsTurning Machine Learning Prototypes into Products
Turning Machine Learning Prototypes into ProductsAll Things Open
 
MLSD18. Automating Machine Learning Workflows
MLSD18. Automating Machine Learning WorkflowsMLSD18. Automating Machine Learning Workflows
MLSD18. Automating Machine Learning WorkflowsBigML, Inc
 
SharePoint Search Results Branding
SharePoint Search Results BrandingSharePoint Search Results Branding
SharePoint Search Results BrandingCory Peters
 
Powering Next Best Action
Powering Next Best ActionPowering Next Best Action
Powering Next Best ActionAll Things Open
 
How a global manufacturing company built a data science capability from scratch
How a global manufacturing company built a data science capability from scratchHow a global manufacturing company built a data science capability from scratch
How a global manufacturing company built a data science capability from scratchCarlo Torniai
 
Schema on read with runtime fields
Schema on read with runtime fieldsSchema on read with runtime fields
Schema on read with runtime fieldsElasticsearch
 
An introduction to Elasticsearch's advanced relevance ranking toolbox
An introduction to Elasticsearch's advanced relevance ranking toolboxAn introduction to Elasticsearch's advanced relevance ranking toolbox
An introduction to Elasticsearch's advanced relevance ranking toolboxElasticsearch
 
Arquitectura de Datos en Azure
Arquitectura de Datos en AzureArquitectura de Datos en Azure
Arquitectura de Datos en AzureElena Lopez
 
Building a Scalable Data Science Solution to Outperform Sales Execution in Tr...
Building a Scalable Data Science Solution to Outperform Sales Execution in Tr...Building a Scalable Data Science Solution to Outperform Sales Execution in Tr...
Building a Scalable Data Science Solution to Outperform Sales Execution in Tr...Databricks
 
APIdays Paris 2019 - Data APIs as a service: Focusing on your core business w...
APIdays Paris 2019 - Data APIs as a service: Focusing on your core business w...APIdays Paris 2019 - Data APIs as a service: Focusing on your core business w...
APIdays Paris 2019 - Data APIs as a service: Focusing on your core business w...apidays
 
Zipline: Airbnb’s Machine Learning Data Management Platform with Nikhil Simha...
Zipline: Airbnb’s Machine Learning Data Management Platform with Nikhil Simha...Zipline: Airbnb’s Machine Learning Data Management Platform with Nikhil Simha...
Zipline: Airbnb’s Machine Learning Data Management Platform with Nikhil Simha...Databricks
 
API Management: La Puerta de enlace (por Francisco Nieto)
API Management: La Puerta de enlace (por Francisco Nieto)API Management: La Puerta de enlace (por Francisco Nieto)
API Management: La Puerta de enlace (por Francisco Nieto)Jorge Millán Cabrera
 

Was ist angesagt? (20)

Scalable, Fast Analytics with Graph - Why and How
Scalable, Fast Analytics with Graph - Why and HowScalable, Fast Analytics with Graph - Why and How
Scalable, Fast Analytics with Graph - Why and How
 
Datahive 360 - Felipe Wesbonk
Datahive 360 - Felipe WesbonkDatahive 360 - Felipe Wesbonk
Datahive 360 - Felipe Wesbonk
 
Building A Feature Factory
Building A Feature FactoryBuilding A Feature Factory
Building A Feature Factory
 
The DataSift platform
The DataSift platform The DataSift platform
The DataSift platform
 
Esri in AWS Cloud
Esri in AWS CloudEsri in AWS Cloud
Esri in AWS Cloud
 
Better Together: How Graph database enables easy data integration with Spark ...
Better Together: How Graph database enables easy data integration with Spark ...Better Together: How Graph database enables easy data integration with Spark ...
Better Together: How Graph database enables easy data integration with Spark ...
 
Turning Machine Learning Prototypes into Products
Turning Machine Learning Prototypes into ProductsTurning Machine Learning Prototypes into Products
Turning Machine Learning Prototypes into Products
 
MLSD18. Automating Machine Learning Workflows
MLSD18. Automating Machine Learning WorkflowsMLSD18. Automating Machine Learning Workflows
MLSD18. Automating Machine Learning Workflows
 
SharePoint Search Results Branding
SharePoint Search Results BrandingSharePoint Search Results Branding
SharePoint Search Results Branding
 
Powering Next Best Action
Powering Next Best ActionPowering Next Best Action
Powering Next Best Action
 
How a global manufacturing company built a data science capability from scratch
How a global manufacturing company built a data science capability from scratchHow a global manufacturing company built a data science capability from scratch
How a global manufacturing company built a data science capability from scratch
 
Schema on read with runtime fields
Schema on read with runtime fieldsSchema on read with runtime fields
Schema on read with runtime fields
 
An introduction to Elasticsearch's advanced relevance ranking toolbox
An introduction to Elasticsearch's advanced relevance ranking toolboxAn introduction to Elasticsearch's advanced relevance ranking toolbox
An introduction to Elasticsearch's advanced relevance ranking toolbox
 
Arquitectura de Datos en Azure
Arquitectura de Datos en AzureArquitectura de Datos en Azure
Arquitectura de Datos en Azure
 
Building a Scalable Data Science Solution to Outperform Sales Execution in Tr...
Building a Scalable Data Science Solution to Outperform Sales Execution in Tr...Building a Scalable Data Science Solution to Outperform Sales Execution in Tr...
Building a Scalable Data Science Solution to Outperform Sales Execution in Tr...
 
APIdays Paris 2019 - Data APIs as a service: Focusing on your core business w...
APIdays Paris 2019 - Data APIs as a service: Focusing on your core business w...APIdays Paris 2019 - Data APIs as a service: Focusing on your core business w...
APIdays Paris 2019 - Data APIs as a service: Focusing on your core business w...
 
Zipline: Airbnb’s Machine Learning Data Management Platform with Nikhil Simha...
Zipline: Airbnb’s Machine Learning Data Management Platform with Nikhil Simha...Zipline: Airbnb’s Machine Learning Data Management Platform with Nikhil Simha...
Zipline: Airbnb’s Machine Learning Data Management Platform with Nikhil Simha...
 
Esri ArcGIS Federal
Esri ArcGIS FederalEsri ArcGIS Federal
Esri ArcGIS Federal
 
Esri WebGIS Platform
Esri WebGIS PlatformEsri WebGIS Platform
Esri WebGIS Platform
 
API Management: La Puerta de enlace (por Francisco Nieto)
API Management: La Puerta de enlace (por Francisco Nieto)API Management: La Puerta de enlace (por Francisco Nieto)
API Management: La Puerta de enlace (por Francisco Nieto)
 

Ähnlich wie Haystack 2019 Lightning Talk - Quaerite a Search relevance evaluation toolkit - Tim Allison

Implementing Machine Learning Incrementally
Implementing Machine Learning IncrementallyImplementing Machine Learning Incrementally
Implementing Machine Learning IncrementallyRavindra Guntur
 
Hyperledger weatherreport20190219 公開版
Hyperledger weatherreport20190219 公開版Hyperledger weatherreport20190219 公開版
Hyperledger weatherreport20190219 公開版Hyperleger Tokyo Meetup
 
Atlassian Executive Business Forum - LinkedIn HQ
Atlassian Executive Business Forum - LinkedIn HQAtlassian Executive Business Forum - LinkedIn HQ
Atlassian Executive Business Forum - LinkedIn HQServiceRocket
 
Keeping SharePoint Always On
Keeping SharePoint Always OnKeeping SharePoint Always On
Keeping SharePoint Always OnAntonioMaio2
 
Using Machine Learning to Debug complex Oracle RAC Issues
Using Machine Learning  to Debug complex Oracle RAC IssuesUsing Machine Learning  to Debug complex Oracle RAC Issues
Using Machine Learning to Debug complex Oracle RAC IssuesAnil Nair
 
Extreme Automation: The Emergence of RPA and AI for Treasury
Extreme Automation: The Emergence of RPA and AI for TreasuryExtreme Automation: The Emergence of RPA and AI for Treasury
Extreme Automation: The Emergence of RPA and AI for TreasuryKyriba Corporation
 
FLITE_Presentation JG v
FLITE_Presentation JG vFLITE_Presentation JG v
FLITE_Presentation JG vWesley Samples
 
Leveraging Generative AI: Exploring New Technology for Data Integration
Leveraging Generative AI: Exploring New Technology for Data IntegrationLeveraging Generative AI: Exploring New Technology for Data Integration
Leveraging Generative AI: Exploring New Technology for Data IntegrationSafe Software
 
Datarobot, 자동화된 분석 적용 시 분석 절차의 변화 및 효용 - 홍운표 데이터 사이언티스트, DataRobot :: AWS Sum...
Datarobot, 자동화된 분석 적용 시 분석 절차의 변화 및 효용 - 홍운표 데이터 사이언티스트, DataRobot :: AWS Sum...Datarobot, 자동화된 분석 적용 시 분석 절차의 변화 및 효용 - 홍운표 데이터 사이언티스트, DataRobot :: AWS Sum...
Datarobot, 자동화된 분석 적용 시 분석 절차의 변화 및 효용 - 홍운표 데이터 사이언티스트, DataRobot :: AWS Sum...Amazon Web Services Korea
 
MITRE-Module 2 Slides.pdf
MITRE-Module 2 Slides.pdfMITRE-Module 2 Slides.pdf
MITRE-Module 2 Slides.pdfReZa AdineH
 
Proofpoint Emerging Threats Suricata 5.0 Webinar
Proofpoint Emerging Threats Suricata 5.0 WebinarProofpoint Emerging Threats Suricata 5.0 Webinar
Proofpoint Emerging Threats Suricata 5.0 WebinarJason Williams
 
Washington DC DataOps Meetup -- Nov 2019
Washington DC DataOps Meetup   -- Nov 2019Washington DC DataOps Meetup   -- Nov 2019
Washington DC DataOps Meetup -- Nov 2019DataKitchen
 
Haystack 2019 Lightning Talk - State of Apache Tika - Tim Allison
Haystack 2019 Lightning Talk - State of Apache Tika - Tim AllisonHaystack 2019 Lightning Talk - State of Apache Tika - Tim Allison
Haystack 2019 Lightning Talk - State of Apache Tika - Tim AllisonOpenSource Connections
 
Driving TAS Enterprise Fitness
Driving TAS Enterprise FitnessDriving TAS Enterprise Fitness
Driving TAS Enterprise FitnessVMware Tanzu
 
Crafting enhanced customer experience through chatbots, beacons and oracle jet
Crafting enhanced customer experience through chatbots, beacons and oracle jetCrafting enhanced customer experience through chatbots, beacons and oracle jet
Crafting enhanced customer experience through chatbots, beacons and oracle jetRohit Dhamija
 
RPA Portfolio Assessment
RPA Portfolio Assessment RPA Portfolio Assessment
RPA Portfolio Assessment Eric Rodman
 

Ähnlich wie Haystack 2019 Lightning Talk - Quaerite a Search relevance evaluation toolkit - Tim Allison (20)

Implementing Machine Learning Incrementally
Implementing Machine Learning IncrementallyImplementing Machine Learning Incrementally
Implementing Machine Learning Incrementally
 
Hyperledger weatherreport20190219 公開版
Hyperledger weatherreport20190219 公開版Hyperledger weatherreport20190219 公開版
Hyperledger weatherreport20190219 公開版
 
Robotic Process Auditing
Robotic Process Auditing Robotic Process Auditing
Robotic Process Auditing
 
Atlassian Executive Business Forum - LinkedIn HQ
Atlassian Executive Business Forum - LinkedIn HQAtlassian Executive Business Forum - LinkedIn HQ
Atlassian Executive Business Forum - LinkedIn HQ
 
Keeping SharePoint Always On
Keeping SharePoint Always OnKeeping SharePoint Always On
Keeping SharePoint Always On
 
Using Machine Learning to Debug complex Oracle RAC Issues
Using Machine Learning  to Debug complex Oracle RAC IssuesUsing Machine Learning  to Debug complex Oracle RAC Issues
Using Machine Learning to Debug complex Oracle RAC Issues
 
Extreme Automation: The Emergence of RPA and AI for Treasury
Extreme Automation: The Emergence of RPA and AI for TreasuryExtreme Automation: The Emergence of RPA and AI for Treasury
Extreme Automation: The Emergence of RPA and AI for Treasury
 
FLITE_Presentation JG v
FLITE_Presentation JG vFLITE_Presentation JG v
FLITE_Presentation JG v
 
Leveraging Generative AI: Exploring New Technology for Data Integration
Leveraging Generative AI: Exploring New Technology for Data IntegrationLeveraging Generative AI: Exploring New Technology for Data Integration
Leveraging Generative AI: Exploring New Technology for Data Integration
 
Datarobot, 자동화된 분석 적용 시 분석 절차의 변화 및 효용 - 홍운표 데이터 사이언티스트, DataRobot :: AWS Sum...
Datarobot, 자동화된 분석 적용 시 분석 절차의 변화 및 효용 - 홍운표 데이터 사이언티스트, DataRobot :: AWS Sum...Datarobot, 자동화된 분석 적용 시 분석 절차의 변화 및 효용 - 홍운표 데이터 사이언티스트, DataRobot :: AWS Sum...
Datarobot, 자동화된 분석 적용 시 분석 절차의 변화 및 효용 - 홍운표 데이터 사이언티스트, DataRobot :: AWS Sum...
 
MITRE-Module 2 Slides.pdf
MITRE-Module 2 Slides.pdfMITRE-Module 2 Slides.pdf
MITRE-Module 2 Slides.pdf
 
Proofpoint Emerging Threats Suricata 5.0 Webinar
Proofpoint Emerging Threats Suricata 5.0 WebinarProofpoint Emerging Threats Suricata 5.0 Webinar
Proofpoint Emerging Threats Suricata 5.0 Webinar
 
Washington DC DataOps Meetup -- Nov 2019
Washington DC DataOps Meetup   -- Nov 2019Washington DC DataOps Meetup   -- Nov 2019
Washington DC DataOps Meetup -- Nov 2019
 
Haystack 2019 Lightning Talk - State of Apache Tika - Tim Allison
Haystack 2019 Lightning Talk - State of Apache Tika - Tim AllisonHaystack 2019 Lightning Talk - State of Apache Tika - Tim Allison
Haystack 2019 Lightning Talk - State of Apache Tika - Tim Allison
 
Driving TAS Enterprise Fitness
Driving TAS Enterprise FitnessDriving TAS Enterprise Fitness
Driving TAS Enterprise Fitness
 
BRE Deep Dive
BRE Deep DiveBRE Deep Dive
BRE Deep Dive
 
Crafting enhanced customer experience through chatbots, beacons and oracle jet
Crafting enhanced customer experience through chatbots, beacons and oracle jetCrafting enhanced customer experience through chatbots, beacons and oracle jet
Crafting enhanced customer experience through chatbots, beacons and oracle jet
 
Enabling Agility Through DevOps
Enabling Agility Through DevOpsEnabling Agility Through DevOps
Enabling Agility Through DevOps
 
Motadata product itsm overview
Motadata product itsm overviewMotadata product itsm overview
Motadata product itsm overview
 
RPA Portfolio Assessment
RPA Portfolio Assessment RPA Portfolio Assessment
RPA Portfolio Assessment
 

Mehr von OpenSource Connections

How To Structure Your Search Team for Success
How To Structure Your Search Team for SuccessHow To Structure Your Search Team for Success
How To Structure Your Search Team for SuccessOpenSource Connections
 
The right path to making search relevant - Taxonomy Bootcamp London 2019
The right path to making search relevant  - Taxonomy Bootcamp London 2019The right path to making search relevant  - Taxonomy Bootcamp London 2019
The right path to making search relevant - Taxonomy Bootcamp London 2019OpenSource Connections
 
Haystack 2019 Lightning Talk - The Future of Quepid - Charlie Hull
Haystack 2019 Lightning Talk - The Future of Quepid - Charlie HullHaystack 2019 Lightning Talk - The Future of Quepid - Charlie Hull
Haystack 2019 Lightning Talk - The Future of Quepid - Charlie HullOpenSource Connections
 
Haystack 2019 Lightning Talk - Relevance on 17 million full text documents - ...
Haystack 2019 Lightning Talk - Relevance on 17 million full text documents - ...Haystack 2019 Lightning Talk - Relevance on 17 million full text documents - ...
Haystack 2019 Lightning Talk - Relevance on 17 million full text documents - ...OpenSource Connections
 
Haystack 2019 Lightning Talk - Solr Cloud on Kubernetes - Manoj Bharadwaj
Haystack 2019 Lightning Talk - Solr Cloud on Kubernetes - Manoj BharadwajHaystack 2019 Lightning Talk - Solr Cloud on Kubernetes - Manoj Bharadwaj
Haystack 2019 Lightning Talk - Solr Cloud on Kubernetes - Manoj BharadwajOpenSource Connections
 
Haystack 2019 - Search-based recommendations at Politico - Ryan Kohl
Haystack 2019 - Search-based recommendations at Politico - Ryan KohlHaystack 2019 - Search-based recommendations at Politico - Ryan Kohl
Haystack 2019 - Search-based recommendations at Politico - Ryan KohlOpenSource Connections
 
Haystack 2019 - Search with Vectors - Simon Hughes
Haystack 2019 - Search with Vectors - Simon HughesHaystack 2019 - Search with Vectors - Simon Hughes
Haystack 2019 - Search with Vectors - Simon HughesOpenSource Connections
 
Haystack 2019 - Natural Language Search with Knowledge Graphs - Trey Grainger
Haystack 2019 - Natural Language Search with Knowledge Graphs - Trey GraingerHaystack 2019 - Natural Language Search with Knowledge Graphs - Trey Grainger
Haystack 2019 - Natural Language Search with Knowledge Graphs - Trey GraingerOpenSource Connections
 
Haystack 2019 - Search Logs + Machine Learning = Auto-Tagging Inventory - Joh...
Haystack 2019 - Search Logs + Machine Learning = Auto-Tagging Inventory - Joh...Haystack 2019 - Search Logs + Machine Learning = Auto-Tagging Inventory - Joh...
Haystack 2019 - Search Logs + Machine Learning = Auto-Tagging Inventory - Joh...OpenSource Connections
 
Haystack 2019 - Improving Search Relevance with Numeric Features in Elasticse...
Haystack 2019 - Improving Search Relevance with Numeric Features in Elasticse...Haystack 2019 - Improving Search Relevance with Numeric Features in Elasticse...
Haystack 2019 - Improving Search Relevance with Numeric Features in Elasticse...OpenSource Connections
 
Haystack 2019 - Custom Solr Query Parser Design Option, and Pros & Cons - Ber...
Haystack 2019 - Custom Solr Query Parser Design Option, and Pros & Cons - Ber...Haystack 2019 - Custom Solr Query Parser Design Option, and Pros & Cons - Ber...
Haystack 2019 - Custom Solr Query Parser Design Option, and Pros & Cons - Ber...OpenSource Connections
 
Haystack 2019 - Establishing a relevance focused culture in a large organizat...
Haystack 2019 - Establishing a relevance focused culture in a large organizat...Haystack 2019 - Establishing a relevance focused culture in a large organizat...
Haystack 2019 - Establishing a relevance focused culture in a large organizat...OpenSource Connections
 
Haystack 2019 - Solving for Satisfaction: Introduction to Click Models - Eliz...
Haystack 2019 - Solving for Satisfaction: Introduction to Click Models - Eliz...Haystack 2019 - Solving for Satisfaction: Introduction to Click Models - Eliz...
Haystack 2019 - Solving for Satisfaction: Introduction to Click Models - Eliz...OpenSource Connections
 
2019 Haystack - How The New York Times Tackles Relevance - Jeremiah Via
2019 Haystack - How The New York Times Tackles Relevance - Jeremiah Via2019 Haystack - How The New York Times Tackles Relevance - Jeremiah Via
2019 Haystack - How The New York Times Tackles Relevance - Jeremiah ViaOpenSource Connections
 
Haystack 2019 - Addressing variance in AB tests: Interleaved evaluation of ra...
Haystack 2019 - Addressing variance in AB tests: Interleaved evaluation of ra...Haystack 2019 - Addressing variance in AB tests: Interleaved evaluation of ra...
Haystack 2019 - Addressing variance in AB tests: Interleaved evaluation of ra...OpenSource Connections
 
Haystack 2019 - Beyond The Search Engine: Improving Relevancy through Query E...
Haystack 2019 - Beyond The Search Engine: Improving Relevancy through Query E...Haystack 2019 - Beyond The Search Engine: Improving Relevancy through Query E...
Haystack 2019 - Beyond The Search Engine: Improving Relevancy through Query E...OpenSource Connections
 
Haystack 2019 - Evolution of Yelp search to a generalized ranking platform - ...
Haystack 2019 - Evolution of Yelp search to a generalized ranking platform - ...Haystack 2019 - Evolution of Yelp search to a generalized ranking platform - ...
Haystack 2019 - Evolution of Yelp search to a generalized ranking platform - ...OpenSource Connections
 

Mehr von OpenSource Connections (20)

Encores
EncoresEncores
Encores
 
Test driven relevancy
Test driven relevancyTest driven relevancy
Test driven relevancy
 
How To Structure Your Search Team for Success
How To Structure Your Search Team for SuccessHow To Structure Your Search Team for Success
How To Structure Your Search Team for Success
 
The right path to making search relevant - Taxonomy Bootcamp London 2019
The right path to making search relevant  - Taxonomy Bootcamp London 2019The right path to making search relevant  - Taxonomy Bootcamp London 2019
The right path to making search relevant - Taxonomy Bootcamp London 2019
 
Payloads and OCR with Solr
Payloads and OCR with SolrPayloads and OCR with Solr
Payloads and OCR with Solr
 
Haystack 2019 Lightning Talk - The Future of Quepid - Charlie Hull
Haystack 2019 Lightning Talk - The Future of Quepid - Charlie HullHaystack 2019 Lightning Talk - The Future of Quepid - Charlie Hull
Haystack 2019 Lightning Talk - The Future of Quepid - Charlie Hull
 
Haystack 2019 Lightning Talk - Relevance on 17 million full text documents - ...
Haystack 2019 Lightning Talk - Relevance on 17 million full text documents - ...Haystack 2019 Lightning Talk - Relevance on 17 million full text documents - ...
Haystack 2019 Lightning Talk - Relevance on 17 million full text documents - ...
 
Haystack 2019 Lightning Talk - Solr Cloud on Kubernetes - Manoj Bharadwaj
Haystack 2019 Lightning Talk - Solr Cloud on Kubernetes - Manoj BharadwajHaystack 2019 Lightning Talk - Solr Cloud on Kubernetes - Manoj Bharadwaj
Haystack 2019 Lightning Talk - Solr Cloud on Kubernetes - Manoj Bharadwaj
 
Haystack 2019 - Search-based recommendations at Politico - Ryan Kohl
Haystack 2019 - Search-based recommendations at Politico - Ryan KohlHaystack 2019 - Search-based recommendations at Politico - Ryan Kohl
Haystack 2019 - Search-based recommendations at Politico - Ryan Kohl
 
Haystack 2019 - Search with Vectors - Simon Hughes
Haystack 2019 - Search with Vectors - Simon HughesHaystack 2019 - Search with Vectors - Simon Hughes
Haystack 2019 - Search with Vectors - Simon Hughes
 
Haystack 2019 - Natural Language Search with Knowledge Graphs - Trey Grainger
Haystack 2019 - Natural Language Search with Knowledge Graphs - Trey GraingerHaystack 2019 - Natural Language Search with Knowledge Graphs - Trey Grainger
Haystack 2019 - Natural Language Search with Knowledge Graphs - Trey Grainger
 
Haystack 2019 - Search Logs + Machine Learning = Auto-Tagging Inventory - Joh...
Haystack 2019 - Search Logs + Machine Learning = Auto-Tagging Inventory - Joh...Haystack 2019 - Search Logs + Machine Learning = Auto-Tagging Inventory - Joh...
Haystack 2019 - Search Logs + Machine Learning = Auto-Tagging Inventory - Joh...
 
Haystack 2019 - Improving Search Relevance with Numeric Features in Elasticse...
Haystack 2019 - Improving Search Relevance with Numeric Features in Elasticse...Haystack 2019 - Improving Search Relevance with Numeric Features in Elasticse...
Haystack 2019 - Improving Search Relevance with Numeric Features in Elasticse...
 
Haystack 2019 - Custom Solr Query Parser Design Option, and Pros & Cons - Ber...
Haystack 2019 - Custom Solr Query Parser Design Option, and Pros & Cons - Ber...Haystack 2019 - Custom Solr Query Parser Design Option, and Pros & Cons - Ber...
Haystack 2019 - Custom Solr Query Parser Design Option, and Pros & Cons - Ber...
 
Haystack 2019 - Establishing a relevance focused culture in a large organizat...
Haystack 2019 - Establishing a relevance focused culture in a large organizat...Haystack 2019 - Establishing a relevance focused culture in a large organizat...
Haystack 2019 - Establishing a relevance focused culture in a large organizat...
 
Haystack 2019 - Solving for Satisfaction: Introduction to Click Models - Eliz...
Haystack 2019 - Solving for Satisfaction: Introduction to Click Models - Eliz...Haystack 2019 - Solving for Satisfaction: Introduction to Click Models - Eliz...
Haystack 2019 - Solving for Satisfaction: Introduction to Click Models - Eliz...
 
2019 Haystack - How The New York Times Tackles Relevance - Jeremiah Via
2019 Haystack - How The New York Times Tackles Relevance - Jeremiah Via2019 Haystack - How The New York Times Tackles Relevance - Jeremiah Via
2019 Haystack - How The New York Times Tackles Relevance - Jeremiah Via
 
Haystack 2019 - Addressing variance in AB tests: Interleaved evaluation of ra...
Haystack 2019 - Addressing variance in AB tests: Interleaved evaluation of ra...Haystack 2019 - Addressing variance in AB tests: Interleaved evaluation of ra...
Haystack 2019 - Addressing variance in AB tests: Interleaved evaluation of ra...
 
Haystack 2019 - Beyond The Search Engine: Improving Relevancy through Query E...
Haystack 2019 - Beyond The Search Engine: Improving Relevancy through Query E...Haystack 2019 - Beyond The Search Engine: Improving Relevancy through Query E...
Haystack 2019 - Beyond The Search Engine: Improving Relevancy through Query E...
 
Haystack 2019 - Evolution of Yelp search to a generalized ranking platform - ...
Haystack 2019 - Evolution of Yelp search to a generalized ranking platform - ...Haystack 2019 - Evolution of Yelp search to a generalized ranking platform - ...
Haystack 2019 - Evolution of Yelp search to a generalized ranking platform - ...
 

Kürzlich hochgeladen

Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...

Haystack 2019 Lightning Talk - Quaerite a Search relevance evaluation toolkit - Tim Allison

  • 1. © 2019 The MITRE Corporation. All rights reserved. Quaerite – Search Relevance Toolkit Tim Allison tallison@apache.org, @_tallison April 24, 2019 Haystack Conference Approved for Public Release; Distribution Unlimited. Case Number 18-3138-5
  • 2. Debt of Gratitude
      ▪ Thank you Doug Turnbull, John Berryman and Open Source Connections for the inspiration/examples/training with tmdb and for sharing your ground truth set!
  • 3. Yet Another Toolkit? Why!?
      ▪ How many parameters do we have?
      ▪ How many permutations of those parameters are available?
  • 4. Available Parameters
      ▪ 14 tokenizers https://lucene.apache.org/solr/guide/7_1/tokenizers.html
      ▪ ~45 token filters (not including language-specific token filters – see next slide) https://lucene.apache.org/solr/guide/7_1/filter-descriptions.html
      ▪ Query parsers
      ▪ Query operators, minimum should match, should, must, not
      ▪ Token/field based scoring – best_fields, most_fields, cross_fields
      ▪ Field boosting
      ▪ Phrasal boosting/shingling
      ▪ Synonym lists, taxonomies
      ▪ Similarity scoring parameters (with BM25)
      ▪ Elevate
      ▪ External signal enrichment
        – manual or automatic (NLP – entity extraction, categorization, etc.)
      ▪ Reranking via machine learning (Learning to Rank)
  • 5. Each Token Filter Can Have Many Parameters
      <filter class="solr.WordDelimiterFilterFactory"
              protected="protwords.txt"
              generateWordParts="1"
              generateNumberParts="1"
              catenateWords="1"
              catenateNumbers="1"
              catenateAll="0"
              splitOnCaseChange="0"
              preserveOriginal="1"/>
  • 6. Overview – Offline testing toolkit
      Prerequisites:
      1. Reliable, generalizable ground truth
      2. Reliable, useful underlying data
      3. Offline metric has to have some connection to KPIs
      4. Expertise – you still have to know what you’re doing!!!
  • 7. Main Tools
      1. Run Experiments
      2. Generate Experiments
        ▪ All permutations (grid search)
        ▪ Random experiments (random search)
      3. Genetic Algorithm
        ▪ Cross-fold validation!!!
        ▪ Complementary to LTR: the main differences are the algorithm and that it runs offline to tune general settings rather than reranking the top n
  • 8. Odds and Ends
      ▪ Analyzer Comparison over (mostly) the index
      ▪ Significant Terms (yawn…for archaic versions of Solr)…and planning to add these as parameters in “generate experiments”
  • 9. Adding Porter Stemming: create account
      creat
        created: 709, create: 551, creating: 269, creates: 153, creat: 1
      account
        account: 3244, accounts: 1924, accounting: 1548, accountants: 340, accountant: 176, accounted: 134, accountability: 74, accountable: 74, accountancy: 65, account's: 7, accountant's: 7
  • 10. Status
      ▪ Alpha release 3/22/2019 (Solr only)
      ▪ Beta1 release this week (?)
        – This will include support for ElasticSearch
      ▪ Dream
        – Incorporate experiment generation/GA into Rated Ranking Evaluator (RRE)
        – Apache Incubator -> Top Level Project (TLP)
  • 11. Links
      ▪ Main site: https://github.com/mitre/quaerite
      ▪ Examples: https://github.com/mitre/quaerite/blob/master/quaerite-examples/README.md
      ▪ Contact
        – tallison@apache.org
        – @_tallison
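
To make the "how many permutations?" question concrete, here is a minimal Python sketch of the two experiment-generation modes named under Main Tools (grid search and random search), applied to the seven 0/1 attributes of the WordDelimiterFilterFactory example. It is not Quaerite's API or configuration format: the names `FLAGS`, `grid_search` and `random_search` are hypothetical, the `protected` attribute is held fixed, and each flag is assumed to vary independently.

```python
import itertools
import random

# The seven 0/1 attributes from the WordDelimiterFilterFactory slide.
# protected="protwords.txt" is held fixed for this illustration.
FLAGS = [
    "generateWordParts",
    "generateNumberParts",
    "catenateWords",
    "catenateNumbers",
    "catenateAll",
    "splitOnCaseChange",
    "preserveOriginal",
]


def grid_search(flags):
    """Yield every combination of 0/1 settings (all permutations)."""
    for values in itertools.product((0, 1), repeat=len(flags)):
        yield dict(zip(flags, values))


def random_search(flags, n, seed=0):
    """Return n randomly drawn settings; duplicates are possible."""
    rng = random.Random(seed)
    return [{flag: rng.randint(0, 1) for flag in flags} for _ in range(n)]


all_experiments = list(grid_search(FLAGS))
sampled = random_search(FLAGS, 10)

print(len(all_experiments))  # 2**7 = 128 settings for this one filter
print(len(sampled))          # 10
```

Even a single filter with seven binary flags yields 128 candidate settings, and the count multiplies with every further tokenizer, filter or query parameter, which is why sampling or a guided search becomes attractive once exhaustive enumeration is too large to run.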