Taxonomical Semantical Magical Search
OpenSource Connections
Doug Turnbull
Relevance Lead
dturnbull@o19s.com
@softwaredoug
© OpenSource Connections, 2017
Who are we?
Solr/ES consulting: team 100% focused on relevance
Learn to rank – semantic search – relevance – personalization – findability
Reflect:
What problem are you trying to solve
when you jump to 'semantic search'?
"We studied spontaneous word choice for objects in five application-related domains, and found the variability to be surprisingly large. In every case two people favored the same term with probability <0.20."
"Simulations show how this fundamental property of language limits the success of various design methodologies for vocabulary-driven interaction."
Solve with keyword stuffing?
- Content creators guarantee every "shoe" has a "shoe" keyword somewhere!
- And every wing tip mentions dress shoes…
- … ad infinitum
Solve with tagging?
- Java is a type of JVM language. Should this be tagged JVM too? What is a "query string"? Which of these tags is useful for search?
- Who tags everything? Is it consistent? What are the rules?
(tags taken from Stack Overflow)
Solve with synonyms?
Yes! Synonyms are a tool that can help us, but they're easy to mess up:
shoes => dress shoes
wing tips,shoes
tennis shoes,shoes
When I search for tennis shoes, why do I get wing tips? Why do I get dresses?!?
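To see how these three rules produce the surprising matches, here is a toy model in Python. It is only a sketch and does not reproduce Solr's synonym filter: it assumes the comma rules are two-way equivalences, the `=>` rule adds its right-hand side, and multi-word terms end up indexed as single words. The helper names and the stemmed "dress" document are made up for illustration.

```python
# Toy model of the three rules above (hypothetical helpers, not Solr's filter).
EQUIVALENT = [{"wing tips", "shoes"}, {"tennis shoes", "shoes"}]  # comma rules
REPLACE = {"shoes": "dress shoes"}                                 # "=>" rule


def expand(phrase):
    """Return the phrase plus every term the rules attach to it."""
    terms = {phrase}
    for group in EQUIVALENT:
        if phrase in group:
            terms |= group
    terms |= {REPLACE[t] for t in terms if t in REPLACE}
    return terms


def words(terms):
    """Split multi-word terms into the single words a naive index would store."""
    return {w for t in terms for w in t.split()}


query = words(expand("tennis shoes"))
wing_tip_doc = words(expand("wing tips"))
dress_doc = {"dress"}  # assumed: a document about dresses, after stemming

print(sorted(query & wing_tip_doc))  # shared words that cause the wing tip match
print(sorted(query & dress_doc))     # shared words that cause the dress match
```

Both overlaps are non-empty: "shoes" acts as a hub that links siblings, and the stray word "dress" pulls in dresses.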
Talking teaches/reminds vocab
(Searching)
shoes -> dress shoes -> brown wing tips
- Searcher uncertain: uses broad queries to experiment
- Searcher learning: results give clues to help the shopper refine further
- Searcher trusting: more confident about which terms to use
Searchers get more specific...
Hierarchy of ideas, from broad to more specific:
shoes
    NP (item): "shoe"
wing tips
    NP (item): "wing tips"
    type_of: "dress shoes"
    type_of: "shoe"
… and try types of modifiers
wing tips
    NP (item): "wing tips"
    type_of: "dress shoes"
    type_of: "shoe"
sapphire wing tips
    NP (item): "wing tips"
    type_of: "dress shoes"
    type_of: "shoe"
    ADJ (color): "sapphire"
    type_of: "blue"
Semantic search: enable semantic exploration
- Low term specificity: the search term specifies a wide category to explore (searching for "shoes")
- High term specificity: the search term is too specific, so try semantically broader/similar items ("Show 'dress shoes' for 'oxfords'")
Make Solr grok type-of relationships
"wing tip" is a type of "dress shoe", which is a type of "shoe"
- Search at "wing tip": only show wing tips
- Search at "shoe": show all things that are a type-of shoe
This goes beyond the actual terms used in docs
Per-entity terms: a taxonomy
Shoes
    Athletic Shoes
        Running Shoes
        Tennis Shoes
    Dress Shoes
        High Heels
        Oxfords
        Wing Tips
Blue
    Sapphire
    Sky blue
A search taxonomy (not the taxonomy for your site nav)
Index-time taxonomy expansion
Entities: Item, Color, Size
Substrings -> Entities -> Expand to broad/narrow
tennis shoes => footwear shoes athletic tennis_shoes
sapphire => blue sapphire
In Solr...
Entities: Item, Color, Size
Substrings -> Entities -> Expand to broad/narrow
- Possible to build from simple keepwords
- Query- or index-time synonyms use the TF*IDF of the concept
tennis shoes => tennis_shoes,athletic_shoes,shoes,...
sapphire => sapphire,blue
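Rules like the two above can be generated from the taxonomy instead of being written by hand. The sketch below assumes a child-to-parent mapping modeled on the shoe example. The `PARENT` table and both function names are hypothetical, and the output only follows the `a => b,c` line format shown on the slide.

```python
# Hypothetical child -> parent taxonomy, following the shoe example.
PARENT = {
    "tennis_shoes": "athletic_shoes",
    "running_shoes": "athletic_shoes",
    "athletic_shoes": "shoes",
    "wing_tips": "dress_shoes",
    "dress_shoes": "shoes",
    "sapphire": "blue",
}


def ancestors(concept):
    """Walk from a concept up to its root, narrowest concept first."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def synonym_rule(concept):
    """Format one mapping line: surface form => concept plus its ancestors."""
    surface = concept.replace("_", " ")
    return f"{surface} => {','.join(ancestors(concept))}"


for concept in ("tennis_shoes", "sapphire"):
    print(synonym_rule(concept))
```

Generating the file this way keeps the broad/narrow expansion consistent whenever the taxonomy changes.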
In Solr, index time...
"Item" copy field:
(Input text) You will love these maroon dress shoes
(Tokenization & maybe stemming) [you] [will] [love] [these] [maroon] [dress] [shoes]
(Compound/decompound, syn filter) [you] [will] [love] [these] [maroon] [dress_shoes]
(Keepwords for entity) [dress_shoes]
(Semantic expansion, syn filter) [dress_shoes] [shoes]
"Color" copy field:
(Input text) You will love these maroon dress shoes
(Tokenization & maybe stemming) [you] [will] [love] [these] [maroon] [dress] [shoes]
(Compound/decompound, syn filter) [you] [will] [love] [these] [maroon] [dress_shoes]
(Keepwords for entity) [maroon]
(Semantic expansion, syn filter) [maroon] [brown]
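The per-field chain can be simulated outside Solr to check what each copy field would emit. This is a plain-Python sketch, not Solr's analysis API: the `COMPOUNDS`, `KEEPWORDS`, and `EXPANSIONS` tables are assumed stand-ins for the synonym and keepword files, and stemming is left out.

```python
import re

# Assumed stand-ins for the compound, keepword, and expansion files.
COMPOUNDS = {("dress", "shoes"): "dress_shoes", ("wing", "tips"): "wing_tips"}
KEEPWORDS = {
    "item": {"dress_shoes", "wing_tips", "shoes"},
    "color": {"maroon", "brown"},
}
EXPANSIONS = {
    "dress_shoes": ["dress_shoes", "shoes"],
    "wing_tips": ["wing_tips", "dress_shoes", "shoes"],
    "maroon": ["maroon", "brown"],
}


def tokenize(text):
    """Lowercase and split on non-alphanumerics (no stemming in this sketch)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def compound(tokens):
    """Merge known two-word phrases into a single underscore token."""
    out, i = [], 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if pair in COMPOUNDS:
            out.append(COMPOUNDS[pair])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


def analyze(text, field):
    """Run tokenize -> compound -> keepwords -> semantic expansion for one field."""
    kept = [t for t in compound(tokenize(text)) if t in KEEPWORDS[field]]
    expanded = []
    for token in kept:
        for term in EXPANSIONS.get(token, [token]):
            if term not in expanded:
                expanded.append(term)
    return expanded


text = "You will love these maroon dress shoes"
print(analyze(text, "item"))
print(analyze(text, "color"))
```

Each field keeps only its own entity type, so the same sentence yields item tokens in one field and color tokens in the other.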
Index time solution
(Input text) brown wing tips
(Item analyzer output) [wing_tips] [dress_shoes] [shoes]
- IDF is highest for wing_tips, lowest for shoes (eliminate TF? norms?)
(Input text) brown wing tips
(Color analyzer output) [brown]
- Matches maroon, because at index time: maroon => brown, maroon
q=brown wing tips
&defType=edismax
&sow=false
&qf=item^100 color^10
(you'll want to search more than these semantic fields)
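The IDF ordering noted for this slide follows from the expansion itself: every wing tip document also carries dress_shoes and shoes, so broader concepts always have a document frequency at least as high. A small sketch with a made-up six-item catalog shows the effect. The `log(1 + N/df)` formula is a textbook-style choice for illustration, not the exact formula Lucene uses.

```python
import math
from collections import Counter

# Assumed expansions: each leaf concept is indexed with all of its ancestors.
ANCESTORS = {
    "wing_tips": ["wing_tips", "dress_shoes", "shoes"],
    "oxfords": ["oxfords", "dress_shoes", "shoes"],
    "tennis_shoes": ["tennis_shoes", "athletic_shoes", "shoes"],
}
# Made-up catalog: one leaf concept per document.
catalog = ["wing_tips", "oxfords", "oxfords",
           "tennis_shoes", "tennis_shoes", "tennis_shoes"]

indexed = [set(ANCESTORS[item]) for item in catalog]
df = Counter(token for doc in indexed for token in doc)
n_docs = len(indexed)

# Illustrative IDF: rarer (narrower) concepts get larger weights.
idf = {token: math.log(1 + n_docs / count) for token, count in df.items()}

for token in ("wing_tips", "dress_shoes", "shoes"):
    print(token, df[token], round(idf[token], 3))
```

A match on the narrow concept therefore outweighs a match on the broad one without any hand-tuned boosts.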
Query-time taxonomy expansion
How do users think of your items?
Entities: Item, Color, Size (trained/built from query logs)
Substrings -> Entities -> Expand to broad/narrow
Query: sapphire tennis shoes
tennis shoes => item:"tennis shoes" OR item:"athletic shoes" OR item:"shoes" ...
sapphire => color:blue OR color:sapphire
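A query-time rewrite along these lines could look like the following sketch. The `ENTITIES` table and `expand_query` are hypothetical. Every term is quoted for simplicity, and joining the per-entity clauses with AND is an assumption, since the slide does not say how the clauses are combined.

```python
import re

# Hypothetical entity table: phrase -> (field, broad/narrow expansions).
ENTITIES = {
    "tennis shoes": ("item", ["tennis shoes", "athletic shoes", "shoes"]),
    "sapphire": ("color", ["blue", "sapphire"]),
}


def expand_query(query):
    """Rewrite recognized entity phrases into fielded OR clauses."""
    clauses = []
    for phrase, (field, terms) in ENTITIES.items():
        # Whole-word match so "shoes" inside another word is not picked up.
        if re.search(r"\b" + re.escape(phrase) + r"\b", query.lower()):
            ors = " OR ".join(f'{field}:"{term}"' for term in terms)
            clauses.append(f"({ors})")
    # Assumption: require every recognized entity to match.
    return " AND ".join(clauses)


print(expand_query("sapphire tennis shoes"))
```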
Query Phrase In Solr...
Item semantic analyzer:
(Input text) Brown wing tips
(Semantic expansion, syn filter) [wing tips] [dress shoes] [shoes]
Color semantic analyzer:
(Input text) Brown wing tips
(Semantic expansion, syn filter) [brown] [maroon]
Transform to description:("dress shoes" OR "wing tips" OR shoes OR maroon OR brown)
Problems:
- Two query analyzers for the same field are not possible in Solr
- Can't re-tokenize [dress shoes] -> "dress shoes" phrase query
Match Query Parser
https://github.com/o19s/match-query-parser
q=brown wing tips
&defType=edismax
&qf=description title
&bq={!match analyze_as=item_tax search_with=phrase qf=description v=$q}^100
&bq={!match analyze_as=color_tax search_with=phrase qf=description v=$q}
- analyze_as: how to analyze the query string
- search_with=phrase: retokenize multi-word tokens and do a phrase search
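When sending this request from code, the two `bq` parameters must be repeated rather than merged, and the local-params braces need URL encoding. A minimal sketch using only the Python standard library follows. The `match_boost` helper is hypothetical, while the parameter values are copied from the slide.

```python
from urllib.parse import urlencode


def match_boost(analyze_as, boost=None):
    """Build one {!match ...} boost query string (hypothetical helper)."""
    local = f"{{!match analyze_as={analyze_as} search_with=phrase qf=description v=$q}}"
    return f"{local}^{boost}" if boost else local


# A list of pairs (not a dict) so that "bq" can appear twice.
params = [
    ("q", "brown wing tips"),
    ("defType", "edismax"),
    ("qf", "description title"),
    ("bq", match_boost("item_tax", 100)),
    ("bq", match_boost("color_tax")),
]

query_string = urlencode(params)
print(query_string)
```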
Other building blocks
Auto Phrase Token Filter / Query Auto Filtering:
- https://github.com/lucidworks/auto-phrase-tokenfilter
- https://lucidworks.com/2015/02/17/introducing-query-autofiltering/
Health-on-net Lucene Synonyms
- https://github.com/healthonnet/hon-lucene-synonyms
Sematext Query Segmenter:
- https://github.com/sematext/query-segmenter
Shopping 24 Bmax Query Parser
- https://github.com/shopping24/solr-bmax-queryparser
Deriving Querqy rules from taxonomies
https://github.com/renekrie/querqy
Query Time vs Index Time
Query Time:
PROS
- No need to reindex when updating managed vocab
CONS
- Relevance scoring of terms (boosts help)
- Complex / slow queries
Index Time:
PROS
- TF*IDF gives more accurate scoring (broad concepts score low, narrow score high)
- Faster queries
CONS
- Reindexing for synonym changes
Structure your docs for query understanding
Relevance engineer's challenge:
- Where can we begin with a taxonomy?
    - Reuse filters & facets
    - Reuse your page's navigational taxonomy?
    - Track which searches land on pages (old-school click tracking)?
    - Zero-results tracking?
- How do we incentivize content creators to move away from keyword stuffing and toward organizing content around a search keyword taxonomy?
- Finally: we don't care about the source data model, only what helps users find things
SHReC Algorithm
Use simple in-content document frequency to look for super-concepts / sub-concepts
Term/phrase x subsumes y (x is a parent concept of y?) when:
df(x) > df(y)
df(x ∧ y) / df(y) >= α (α = 1 means complete subsumption)
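The two conditions translate directly into code. The sketch below treats each document as a set of terms and uses a made-up seven-document corpus. The function names are illustrative, and 0.8 is used as the default α only to match the example threshold used elsewhere in the deck.

```python
def df(docs, *terms):
    """Number of documents that contain all of the given terms."""
    return sum(1 for doc in docs if all(term in doc for term in terms))


def subsumes(docs, x, y, alpha=0.8):
    """True when x looks like a parent concept of y under the two df conditions."""
    df_y = df(docs, y)
    if df_y == 0:
        return False
    return df(docs, x) > df_y and df(docs, x, y) / df_y >= alpha


# Made-up corpus: each document is the set of terms/phrases it mentions.
docs = [
    {"shoes", "wing tips"},
    {"shoes", "wing tips"},
    {"shoes", "wing tips"},
    {"shoes", "wing tips"},
    {"wing tips"},              # a wing tip doc that never says "shoes"
    {"shoes", "tennis shoes"},
    {"shoes"},
]

print(subsumes(docs, "shoes", "wing tips"))             # checks the parent direction
print(subsumes(docs, "wing tips", "shoes"))             # checks the reverse direction
print(subsumes(docs, "shoes", "wing tips", alpha=1.0))  # demands complete subsumption
```

Here df("shoes") = 6, df("wing tips") = 5, and 4 of the 5 wing tip documents also mention shoes, so the relationship holds at α = 0.8 but not at α = 1.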
SHReC Algorithm Example
Shoes
Wing Tips
df("shoes") > df("wing tips")
df("shoes" ∧ "wing tips") / df("wing tips") >= 0.8
SHReC Algorithm with Solr
Shoes
Wing Tips
df("shoes") > df("wing tips")
df("shoes" ∧ "wing tips") / df("wing tips") >= 0.8
- df(x), df(y): cache doc freqs from a facet (q=*:*&facet=true&facet.field=item)
- df(x ∧ y): number of results for q=item:"wing tips" AND item:shoes
Unfortunately reality is messy
[Figure: "Shoes" and "Wing Tips", captioned "Your data probably looks like"]
Idea: mine another corpus?
[Figure: "Shoes" and "Wing Tips"]
● But still, what phrases do you test?
Statistically significant collocations
Wing Tips
WingTips
Student's t-test against the null hypothesis that "wing" and "tips" are unrelated
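A common textbook form of this test compares the observed bigram rate with the rate expected if the two words were independent. The sketch below uses that form with made-up counts. The function name and all numbers are illustrative, and 2.576 is the usual critical value for a confidence level of α = 0.005.

```python
import math


def collocation_t_score(count_w1, count_w2, count_bigram, n_tokens):
    """t statistic for the null hypothesis that w1 and w2 occur independently."""
    if count_bigram == 0:
        return 0.0  # never seen together: no evidence of a collocation
    observed = count_bigram / n_tokens                       # sample mean
    expected = (count_w1 / n_tokens) * (count_w2 / n_tokens)  # mean under H0
    variance = observed * (1 - observed)                      # Bernoulli variance
    return (observed - expected) / math.sqrt(variance / n_tokens)


# Made-up counts from a hypothetical 1,000,000-token corpus.
t = collocation_t_score(count_w1=400, count_w2=900, count_bigram=150,
                        n_tokens=1_000_000)
print(round(t, 2), t > 2.576)
```

A score above the critical value rejects independence, which suggests treating "wing tips" as one phrase when testing candidates for the taxonomy.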
Refinements
shoe
    dress shoe (12%)
    wing tip (23%)
    tennis shoe (11%)
    running shoe (34%)
    brown dress shoe (20%)
    blue dress shoe (1%)
    sapphire brooks brothers dress shoe (0.001%)
- Colors are scattered throughout
- Sub-concepts are likely child phrases
- Siblings refine each other (tennis shoe, running shoe): should these be in the supercategory "athletic shoes"?
Refinement mining in Solr
docs = [{
    "query": "shoe",
    "refinement": "dress shoe"
},
{
    "query": "shoe",
    "refinement": "brown shoe"
},
{
    "query": "tie",
    "refinement": "brown tie"
}]
q=query:shoe&
facet=true&
facet.field=refinement
Refinements:
- dress shoe (4)
- tennis shoe (2)
- ...
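The facet request above amounts to a filtered count over query/refinement pairs. For small logs, or for checking the Solr output, the same tally can be sketched in plain Python. The log entries and the function name here are made up.

```python
from collections import Counter

# Made-up refinement log: one entry per (query, next query) pair.
log = [
    {"query": "shoe", "refinement": "dress shoe"},
    {"query": "shoe", "refinement": "brown shoe"},
    {"query": "shoe", "refinement": "dress shoe"},
    {"query": "tie", "refinement": "brown tie"},
]


def refinement_facet(docs, query):
    """Count refinements for one starting query, most frequent first."""
    counts = Counter(d["refinement"] for d in docs if d["query"] == query)
    return counts.most_common()


for refinement, count in refinement_facet(log, "shoe"):
    print(f"- {refinement} ({count})")
```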
SHReC w/ Refinements
q=query:shoe&
facet=true&
facet.field=refinement
Num results for q=shoe (slow, but you do this rarely)
Seed the corpus exploration -> SHReC
SHReC w/ sig terms
scoreNodes(
select(
facet(collectionName,
q="query:shoes",
buckets="refinements",
bucketSorts="count(*) desc",
bucketSizeLimit="100",
count(*)),
refine_graph as node,
"count(*)",
replace(collection, null, withValue=collectionName),
replace(field, null, withValue=refine_graph))
)
What's actually happening in SHReC is significance scoring, which is baked into Solr: the relationship of local vs. global
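The "local vs global" idea can be illustrated with a simple ratio: how common a term is within the result set for one query, compared with how common it is overall. This sketch is a generic foreground/background ratio with made-up counts. It is not the formula that `scoreNodes` or any Solr component applies.

```python
def significance(fg_count, fg_total, bg_count, bg_total):
    """Share of a term in the local set divided by its share in the whole corpus."""
    if fg_total == 0 or bg_count == 0 or bg_total == 0:
        return 0.0
    return (fg_count / fg_total) / (bg_count / bg_total)


# Made-up counts: 100 refinements of "shoe" (local) out of 10,000 logged overall (global).
dress_shoe = significance(fg_count=40, fg_total=100, bg_count=50, bg_total=10_000)
brown = significance(fg_count=20, fg_total=100, bg_count=2_000, bg_total=10_000)

print(round(dress_shoe, 1))  # far above 1: strongly tied to "shoe"
print(round(brown, 1))       # about 1: as common here as everywhere else
```

Terms that score near 1, such as colors scattered across every category, are poor candidates for child concepts of the query.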
Other ways of measuring term statistical significance
● Trey G., Solr knowledge graph (hope you saw his talk!)
https://lucidworks.com/video/leveraging-lucenesolr-as-a-knowledge-graph-and-intent-engine/
● Mark Harwood, Elastic Graph / Sig Terms
https://www.elastic.co/elasticon/conf/2016/sf/graph-capabilities-in-the-elastic-stack
But word2vec, LDA, etc.
- Focused on content, not users: they discover topics/synonyms in content, but we often need mappings from search-query vernacular to content vernacular
- Traditional topic modeling is flat
- Hierarchies extracted from content don't reflect users' hierarchies and how they map to content
- Don't confuse co-occurrences with synonyms without extensive data modeling/munging to get your content here
Questions?
Further Reading:
- Relevant Search!
- Blog articles:
- Building Entity-focused search w/ Keyphrases:
- http://opensourceconnections.com/blog/2016/12/02/solr-elasticsearch-synonyms-better-patterns-keyphrases/
- Synonym best practices:
- http://opensourceconnections.com/blog/2016/12/23/elasticsearch-synonyms-patterns-taxonomies/
- Match Query Parser:
- http://opensourceconnections.com/blog/2017/01/23/our-solution-to-solr-multiterm-synonyms/
Discount code: relsearch
http://manning.com
Shameless plug
- <shoutout BLOOOMBERG!!>
- We built a learning to rank plugin for that other search engine...

 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App Framework
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architectures
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 

Taxonomical Semantical Magical Search - Doug Turnbull, OpenSource Connections

  • 1. Taxonomical Semantical Magical Search OpenSource Connections Doug Turnbull Relevance Lead dturnbull@o19s.com @softwaredoug © OpenSource Connections, 2017
  • 2. Solr/ES consulting: team 100% focused on relevance Learn to rank – semantic search – relevance – personalization – findability Who are we?
  • 3. Reflect: What problem are you trying to solve when you jump to 'semantic search'?
  • 4. "We studied spontaneous word choice for objects in five application-related domains, and found the variability to be surprisingly large. In every case two people favored the same term with probability <0.20." "Simulations show how this fundamental property of language limits the success of various design methodologies for vocabulary-driven interaction."
  • 5. Solve with keyword stuffing? - Content creators guarantee every "shoe" has a "shoe" keyword somewhere! - And every wing-tip mentions dress shoes… - ...Ad infinitum…
  • 6. Solve with tagging? - Java is a type of JVM language. Should this be tagged JVM too? What is a "query string"? Which of these tags is useful for search? - Who tags everything? Is it consistent? What are the rules? (taken from Stack Overflow)
  • 7. Solve with synonyms? Yes! Synonyms are a tool that can help us. But it's easy to mess up: shoes => dress shoes; wing tips,shoes; tennis shoes,shoes. When I search for tennis shoes, why do I get wing tips; why do I get dresses?!?
  • 8. Talking teaches/reminds vocab (Searching): shoes -> dress shoes -> brown wing tips. Searcher uncertain: uses broad queries to experiment. Searcher learning: results give clues to help shopper refine further. Searcher trusting: more confident about which terms to use.
  • 9. Searchers get more specific... wing tips Hierarchy of Ideas: NP (item): "wing tips" type_of:"dress shoes" type_of:"shoe" shoes NP(item): "shoe" More specific
  • 10. … and try types of modifiers wing tips NP (item): "wing tips" type_of:"dress shoes" type_of:"shoe" sapphire wing tips NP (item): "wing tips" type_of:"dress shoes" type_of:"shoe" ADJ (color) "sapphire" type_of:"blue"
  • 11. Semantic search: enable semantic exploration Low term specificity: search term specifies a wide category to explore Searching for "shoes" High term specificity: search term too specific, try semantically broader/similar items "Show 'dress shoes' for 'oxfords'"
  • 12. Make Solr grok type-of relationships "wing tip" is a type of "dress shoe" is a type of "shoe" Search here, only show wing tips Search here, show all things that are a type-of shoe Beyond the actual terms used in docs
  • 13. Per-entity terms a taxonomy Shoes Athletic Shoes Dress Shoes High Heels Oxfords Wing Tips Running Shoes Tennis Shoes Blue Sapphire Sky blue A search taxonomy (not the taxonomy for your site nav)
  • 14. Index-time tax. expansion Item Color Size Substrings -> Entities Expand to broad/narrow tennis shoes => footwear, shoes, athletic, tennis_shoes sapphire => blue, sapphire
  • 15. In Solr... Item Color Size Possible to build from simple keepwords Query or Index time synonyms uses TF*IDF of concept Substrings -> Entities Expand to broad/narrow tennis shoes => tennis_shoes,athletic_shoes,shoes,... sapphire => sapphire,blue
  • 16. In Solr, index time... (Input Text) You will love these maroon dress shoes (tokenization & maybe stemming) [you] [will] [love] [these] [maroon] [dress] [shoes] compound/decompound (syn filter) [you] [will] [love] [these] [maroon] [dress_shoes] Keepwords for entity [dress_shoes] Semantic expansion (syn filter) [dress_shoes] [shoes] (Input Text) You will love these maroon dress shoes (tokenization & maybe stemming) [you] [will] [love] [these] [maroon] [dress] [shoes] compound/decompound (syn filter) [you] [will] [love] [these] [maroon] [dress_shoes] Keepwords for entity [maroon] Semantic expansion (syn filter) [maroon] [brown] "Item" copy field "Color" copy field
  • 17. Index time solution (Input Text) brown wing tips (Item analyzer output) [wing_tips] [dress_shoes] [shoes] (Input Text) brown wing tips (Color analyzer output) [brown] Matches maroon, because at index time: maroon => brown, maroon IDF Highest for wing_tips Lowest for shoes (eliminate TF? norms?) q=brown wing tips &defType=edismax &sow=false &qf=item^100 color^10 (you'll want to search more than these semantic fields)
  • 18. Query-time tax. expansion How do users think of your items? Item Color Size Trained/built From Query logs Substrings -> Entities Expand to broad/narrow tennis shoes => item:"tennis shoes" OR item:"athletic shoes" OR item:"shoes" ... sapphire => color:blue OR color:sapphire sapphire tennis shoes
  • 19. Query Phrase In Solr... (Input Text) Brown wing tips Semantic expansion (syn filter) [wing tips] [dress shoes] [shoes] (Input Text) Brown wing tips Semantic expansion (syn filter) [brown] [maroon] Item Semantic Analyzer Color Semantic Analyzer Transform to description("dress shoes" OR "wing tips" OR shoes OR maroon OR brown) Problems: - two query analyzers for same field not possible in Solr - Can't re-tokenize [dress shoes] -> "dress shoes" phrase q
  • 20. Match Query Parser https://github.com/o19s/match-query-parser q=brown wing tips &defType=edismax &qf=description title &bq={!match analyze_as=item_tax search_with=phrase qf=description v=$q}^100 &bq={!match analyze_as=color_tax search_with=phrase qf=description v=$q} How to analyze query string Phrase: retokenize multi word tokens and do phrase search
  • 21. Other building blocks Auto Phrase Token Filter / Query Auto Filtering: - https://github.com/lucidworks/auto-phrase-tokenfilter - https://lucidworks.com/2015/02/17/introducing-query-autofiltering/ Health-on-net Lucene Synonyms - https://github.com/healthonnet/hon-lucene-synonyms Sematext Query Segmenter: - https://github.com/sematext/query-segmenter Shopping 24 Bmax Query Parser - https://github.com/shopping24/solr-bmax-queryparser
  • 22. Deriving Querqy rules from taxonomies https://github.com/renekrie/querqy
  • 23. Query Time vs Index Time Query Time: PROS - No need to reindex when updating managed vocab CONS - Relevance scoring of terms (boosts help) - Complex / slow queries Index Time: PROS - TF*IDF more accurate scoring (broad concepts score low, narrow score high) - Faster queries CONS - Reindexing for synonym changes
  • 24. Structure your docs for query understanding Relevance engineer's challenge: - Where can we begin with a taxonomy? - Reuse filters & facets - Reuse your page's navigational taxonomy? - Track which searches land on pages (old school click tracking)? - Zero results tracking? - How do we incentivize content creators to move away from keyword stuffing to organizing to search keyword taxonomy? - Finally: we don't care about the source data model, only what helps users find things
  • 25. SHReC Algorithm
  • 26. SHReC Algorithm Simple doc frequency in-content to look for super-concepts / sub-concepts term/phrase x subsumes y (x parent concept?) when: df(x) > df(y) and df(x ∧ y) / df(y) >= α (α = 1 complete subsumption)
  • 27. SHReC Algorithm Example Shoes Wing Tips df("shoes") > df("wing tips") and df("shoes" ∧ "wing tips") / df("wing tips") >= 0.8
  • 28. SHReC Algorithm with Solr Shoes Wing Tips df("shoes") > df("wing tips") and df("shoes" ∧ "wing tips") / df("wing tips") >= 0.8 Cache doc freq (q=*:*&facet.field=item&facet=true) q=item:"wing tips" AND item:shoes, num results
  • 29. Unfortunately reality is messy Shoes Wing Tips Your data probably looks like
  • 30. Idea: mine other corpus? Shoes Wing Tips ● but still, what phrases do you test?
  • 31. Statistically sig. collocations Wing Tips WingTips Student t-test against null hypothesis that wing / tips unrelated
  • 32. Refinements shoe dress shoe (12%) wing tip (23%) tennis shoe (11%) blue dress shoe (1%) sapphire brooks brothers dress shoe (0.001%) brown dress shoe (20%) Colors scattered throughout Sub concepts, likely child phrases tennis shoe (11%) Siblings refine each other running shoe (34%) Should these be in supercategory "athletic shoes"?
  • 33. Refinement mining in Solr docs = [{ "query": "shoe", "refinement": "dress shoe" }, { "query": "shoe", "refinement": "brown shoe" }, { "query": "tie", "refinement": "brown tie" }] q=query:shoe& facet=true& facet.field=refinement Refinements: - dress shoe (4) - tennis shoe (2) - ...
  • 34. SHReC w/ Refinements docs = [{ "query": "shoe", "refinement": "dress shoe" }, { "query": "shoe", "refinement": "brown shoe" }, { "query": "tie", "refinement": "brown tie" }] q=query:shoe& facet=true& facet.field=refinement
  • 35. SHReC w/ Refinements q=query:shoe& facet=true& facet.field=refinement Num results for q=shoe (Slow, but you do this rarely) Seed the corpus exploration SHReC
  • 36. SHReC w/ sig terms scoreNodes( select( facet(collectionName, q="query:shoes", buckets="refinements", bucketSorts="count(*) desc", bucketSizeLimit="100", count(*)), refine_graph as node, "count(*)", replace(collection, null, withValue=collectionName), replace(field, null, withValue=refine_graph)) ) What's actually happening in SHReC is significance scoring, which is baked into Solr: Relationship of local vs global
  • 37. Other ways of measuring term stat. significance ● Trey G. Solr knowledge graph (hope you saw his talk)! https://lucidworks.com/video/leveraging-lucenesolr-as-a-knowledge-graph-and-intent-engine/ ● Mark Harwood Elastic Graph / Sig Terms https://www.elastic.co/elasticon/conf/2016/sf/graph-capabilities-in-the-elastic-stack
  • 38. But word2vec, LDA, etc - Focused on content, not users: Focused on discovering topics/synonyms in content: we often need search query to content vernacular mappings - Traditional topic modeling flat - Hierarchies extracted from content don't reflect user's hierarchies & how they map to content - Don't confuse co-occurrences with synonyms without extensive data modeling/munging to get your content here
  • 39. Questions? Further Reading: - Relevant Search! - Blog articles: - Building Entity-focused search w/ Keyphrases: - http://opensourceconnections.com/blog/2016/12/02/solr-elasticsearch-synonyms-better-patterns-keyphrases/ - Synonym best practices: - http://opensourceconnections.com/blog/2016/12/23/elasticsearch-synonyms-patterns-taxonomies/ - Match Query Parser: - http://opensourceconnections.com/blog/2017/01/23/our-solution-to-solr-multiterm-synonyms/ Discount code: relsearch http://manning.com
  • 40. - <shoutout BLOOOMBERG!!> - We built a learning to rank plugin for that other search engine... Shameless plug
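
The broad/narrow expansion on slides 14 and 15 can be sketched in a few lines of Python. This is only an illustration: the `PARENT` map is a hand-written stand-in for a search taxonomy (entries taken from the shoe and color examples on slides 13 and 16), and `expand` and `synonym_rules` are hypothetical helper names, not part of Solr. The output lines follow the explicit-mapping shape shown on slide 15.

```python
# Hypothetical child -> parent taxonomy, based on the examples in the slides.
PARENT = {
    "wing_tips": "dress_shoes",
    "dress_shoes": "shoes",
    "tennis_shoes": "athletic_shoes",
    "athletic_shoes": "shoes",
    "sapphire": "blue",
    "maroon": "brown",
}


def expand(entity):
    """Return the entity followed by its ancestors, narrowest first."""
    chain = [entity]
    seen = {entity}
    while chain[-1] in PARENT:
        parent = PARENT[chain[-1]]
        if parent in seen:  # guard against accidental cycles in the map
            break
        chain.append(parent)
        seen.add(parent)
    return chain


def synonym_rules(entities):
    """Render one 'surface form => expanded tokens' line per entity."""
    return [
        f"{entity.replace('_', ' ')} => {','.join(expand(entity))}"
        for entity in entities
    ]


if __name__ == "__main__":
    for rule in synonym_rules(["tennis_shoes", "sapphire"]):
        print(rule)
    # tennis shoes => tennis_shoes,athletic_shoes,shoes
    # sapphire => sapphire,blue
```

Generating the rules from one parent map keeps each entity's expansion explicit and one-directional, which avoids the "tennis shoes returns wing tips" failure described on slide 7.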
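
The subsumption test from slides 26 to 28 (df(x) > df(y) and df(x ∧ y) / df(y) >= α) can be tried on a toy corpus without Solr. The sketch below is a minimal in-memory version under stated assumptions: each document is modeled as a set of already-extracted phrases, and `subsumes` and `find_parents` are illustrative names. In Solr the same counts would come from facet counts and result counts, as slide 28 suggests.

```python
from itertools import permutations


def subsumes(df_x, df_y, df_xy, alpha=0.8):
    """True when x looks like a parent concept of y.

    df_x, df_y: document frequencies of x and y.
    df_xy: number of documents containing both x and y.
    """
    if df_y == 0:
        return False
    return df_x > df_y and df_xy / df_y >= alpha


def find_parents(docs, phrases, alpha=0.8):
    """Return (parent, child) pairs among phrases that pass the test."""
    df = {p: sum(1 for d in docs if p in d) for p in phrases}
    pairs = []
    for x, y in permutations(phrases, 2):
        both = sum(1 for d in docs if x in d and y in d)
        if subsumes(df[x], df[y], both, alpha):
            pairs.append((x, y))
    return pairs


if __name__ == "__main__":
    # Toy corpus (made up for illustration): each doc is a set of phrases.
    docs = [
        {"shoes", "wing tips"},
        {"shoes", "wing tips"},
        {"shoes", "tennis shoes"},
        {"shoes"},
        {"ties"},
    ]
    print(find_parents(docs, ["shoes", "wing tips", "tennis shoes"]))
    # [('shoes', 'wing tips'), ('shoes', 'tennis shoes')]
```

Lowering `alpha` tolerates messier data, at the cost of more false parent/child pairs.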
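
Slide 31 proposes a Student t-test against the null hypothesis that "wing" and "tips" are unrelated. One common textbook formulation, which is an assumption here and not necessarily what the talk used, treats each bigram position as a Bernoulli trial and compares the observed bigram rate with the rate expected if the two words occurred independently. The function name and the sample sentence are made up for illustration.

```python
import math
from collections import Counter


def collocation_t_score(tokens, w1, w2):
    """t-score for the bigram (w1, w2) against independence of w1 and w2.

    Uses the usual approximations: the sample size is the token count,
    and the variance of a Bernoulli trial with rate p is p * (1 - p).
    """
    n = len(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    observed = bigrams[(w1, w2)] / n
    if observed == 0:
        return 0.0
    expected = (unigrams[w1] / n) * (unigrams[w2] / n)
    variance = observed * (1 - observed)
    return (observed - expected) / math.sqrt(variance / n)


if __name__ == "__main__":
    text = ("brown wing tips and black wing tips look sharp "
            "but chicken wing recipes differ")
    tokens = text.split()
    print(collocation_t_score(tokens, "wing", "tips"))
    print(collocation_t_score(tokens, "wing", "recipes"))
```

A higher score means stronger evidence that the pair behaves as a single phrase; on a real corpus the score would be compared against a critical value chosen for the desired significance level.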
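
The refinement mining on slides 32 and 33 amounts to counting, for each initial query, which follow-up queries searchers issued. Below is a small offline sketch of the same count that the Solr facet on `refinement` returns. The log records are invented sample data in the shape shown on slide 33, and `refinement_shares` is a hypothetical helper name.

```python
from collections import Counter


def refinement_shares(log, query):
    """Return (refinement, share) pairs for a query, most frequent first."""
    counts = Counter(r["refinement"] for r in log if r["query"] == query)
    total = sum(counts.values())
    return [(ref, count / total) for ref, count in counts.most_common()]


if __name__ == "__main__":
    # Invented query-log records: one per (query, refinement) event.
    log = [
        {"query": "shoe", "refinement": "dress shoe"},
        {"query": "shoe", "refinement": "dress shoe"},
        {"query": "shoe", "refinement": "dress shoe"},
        {"query": "shoe", "refinement": "tennis shoe"},
        {"query": "tie", "refinement": "brown tie"},
    ]
    for refinement, share in refinement_shares(log, "shoe"):
        print(f"{refinement}: {share:.0%}")
    # dress shoe: 75%
    # tennis shoe: 25%
```

The resulting refinements are candidate child phrases that can then be fed into the SHReC subsumption test, as slides 34 and 35 describe.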