Payloads in Solr
Erik Hatcher
Senior Solutions Architect / co-founder, Lucidworks
Solr now smoothly integrates with Lucene-level payloads.
Payloads provide optional per-term metadata, numeric or
otherwise. Payloads help solve challenging use cases such as
per-store product pricing and per-term confidence/weighting.
This session will present the payload feature from the Lucene
layer up to the Solr integration, including per-store pricing,
per-term weighting, and more.
tl;dr
• Solr 6.6+ via SOLR-1485
• per-term position metadata
• Use cases:
• per-store pricing
• weighting terms: e.g. confidence of term, or importance/relevance of term
• weighting term types (synonyms factor lower, verbs factor higher)
Lucene’s Payloads
• Token: PayloadAttribute
• byte[] per term position, optional
• Several components set payloads
• Similarity.SimScorer#computePayloadFactor
• No built-in components (outside Lucene’s test cases), before SOLR-1485, implemented this
• PostingsEnum#getPayload
Postings Format
http://lucene.apache.org/core/6_6_0/core/org/apache/lucene/codecs/lucene50/Lucene50PostingsFormat.html
Lucene’s Token
• Field
• Attributes:
• CharTerm: term text
• … Keyword, Type, Offset,…
• and Payload!
setPayload(bytes)
• DelimitedPayloadTokenFilter
• NumericPayloadTokenFilter
• TokenOffsetPayloadTokenFilter
• TypeAsPayloadTokenFilter
• pre-analyzed field (Solr)
DelimitedPayloadTokenFilter
• term1|payload1 term2|payload2
• encodes payloads as:
• float,
• int,
• or string / raw bytes
field weighted_terms_dps
  term one
    doc 0
      freq 1
      pos 0
      payload 1.0
  term three
    doc 0
      freq 1
      pos 2
      payload 3.0
  term two
    doc 0
      freq 1
      pos 1
      payload 2.0
  term weighted
    doc 1
      freq 2
      pos 0
      payload 50.0
      pos 1
      payload 100.0
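As a rough illustration of the term|payload splitting and the three encodings listed above, here is a small Python sketch. It is a stand-alone model, not Lucene code: the function names are hypothetical, and the 4-byte big-endian layout for float and integer payloads is an assumption of this sketch. The two sample inputs are chosen to mirror doc 0 and doc 1 in the postings dump above.

```python
import struct


def encode_float(value: str) -> bytes:
    # Assumed layout for this sketch: 4-byte big-endian IEEE 754 float.
    return struct.pack(">f", float(value))


def encode_int(value: str) -> bytes:
    # Assumed layout for this sketch: 4-byte big-endian signed integer.
    return struct.pack(">i", int(value))


def encode_identity(value: str) -> bytes:
    # String / raw bytes: keep the text as UTF-8 bytes.
    return value.encode("utf-8")


ENCODERS = {"float": encode_float, "integer": encode_int, "identity": encode_identity}


def delimited_payloads(text, encoder="float", delimiter="|"):
    """Split whitespace-separated tokens into (term, position, payload_bytes)."""
    encode = ENCODERS[encoder]
    tokens = []
    for position, token in enumerate(text.split()):
        term, sep, raw = token.partition(delimiter)
        payload = encode(raw) if sep else None  # the payload is optional
        tokens.append((term, position, payload))
    return tokens


doc0 = delimited_payloads("one|1.0 two|2.0 three|3.0")
doc1 = delimited_payloads("weighted|50.0 weighted|100.0")
```

Each tuple carries the term, its position, and the payload bytes stored at that position; tokens without a delimiter simply get no payload.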
Use Cases
• products with per-store pricing
• boosting by weighted terms
• down-boosting synonyms
Traditional per-store pricing strategies
• Explode docs:
• num_docs = products * stores (1M products * 5000 stores could be up to 5B docs!)
• query-time collapsing (by product id)
• Explode fields:
• default_price
• store_price_0001
• store_price_0002
• … store_price_NNNN
• query-time field choice
• e.g. up to 5000 fields per document
Payload-based per-store pricing
• default_price
• store_prices:
• terms: STORE_0001… STORE_NNNN
• per-term payload of price
• One additional field
• with up to num_stores terms/payloads
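The single-field layout above can be sketched in a few lines of Python. This is a toy model under stated assumptions: the document dictionary, the helper names, and the sample prices are all hypothetical, and the lookup only mimics the idea of reading a per-store payload and falling back to default_price when the store has no entry.

```python
def store_prices_value(prices):
    """Build the delimited field value: one STORE term per store, price as payload."""
    return " ".join(f"{store}|{price}" for store, price in sorted(prices.items()))


def price_for(doc, store):
    """Return the store-specific price, or default_price if the store has no term."""
    for token in doc["store_prices"].split():
        term, _, payload = token.partition("|")
        if term == store:
            return float(payload)
    return doc["default_price"]


# Hypothetical product document with two store-specific prices.
doc = {
    "id": "product-1",
    "default_price": 9.99,
    "store_prices": store_prices_value({"STORE_0001": 8.49, "STORE_0002": 10.25}),
}
```

Compared with exploding documents or fields, the product stays one document with one extra field, and the number of terms in that field grows with the number of stores that override the default.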
Down-boosting synonyms
id,synonyms_with_payloads
99,tv
synonyms.txt
Television, Televisions, TV, TVs
/select?wt=csv&fl=id,score&q={!payload_score f=synonyms_with_payloads v=$payload_term func=max}
&payload_term=television
id,score
99,0.1
&payload_term=tv
id,score
99,1.0
{
  "add-field-type": {
    "name": "synonyms_with_payloads",
    "stored": "true",
    "class": "solr.TextField",
    "positionIncrementGap": "100",
    "indexAnalyzer": {
      "tokenizer": {
        "class": "solr.StandardTokenizerFactory"
      },
      "filters": [
        {
          "class": "solr.SynonymGraphFilterFactory",
          "expand": "true",
          "ignoreCase": "true",
          "synonyms": "synonyms.txt"
        },
        {
          "class": "solr.LowerCaseFilterFactory"
        },
        {
          "class": "solr.NumericPayloadTokenFilterFactory",
          "payload": "0.1",
          "typeMatch": "SYNONYM"
        }
      ]
    },
    "queryAnalyzer": {
      "tokenizer": {
        "class": "solr.StandardTokenizerFactory"
      },
      "filters": [
        {
          "class": "solr.LowerCaseFilterFactory"
        }
      ]
    }
  }
}
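To make the effect of this field type easier to follow, here is a toy Python model of the index-time chain and the resulting scores. It is only a sketch: the function names are hypothetical, tokens injected by synonym expansion are assumed to be the ones typed SYNONYM, and a token without a payload is assumed to score 1.0, which matches the results shown above for document 99.

```python
# One synonym group, lowercased, mirroring the synonyms.txt line above.
SYNONYMS = {"television", "televisions", "tv", "tvs"}


def index_tokens(text, synonym_payload=0.1):
    """Toy index-time chain: lowercase, expand synonyms, tag injected tokens."""
    tokens = {}
    for word in text.lower().split():
        tokens[word] = None  # original token: no payload attached
        if word in SYNONYMS:
            for synonym in SYNONYMS - {word}:
                # Injected synonym: gets the configured numeric payload.
                tokens.setdefault(synonym, synonym_payload)
    return tokens


def payload_score(tokens, term):
    """Score a single-term match; a missing payload counts as 1.0 in this model."""
    if term not in tokens:
        return None  # the document does not match
    payload = tokens[term]
    return 1.0 if payload is None else payload


doc_99 = index_tokens("tv")
```

With one occurrence per term, func=max over a single payload is just that payload, so the original term scores 1.0 and every injected synonym scores 0.1.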
Solr Integration
• Schema-aware
• DelimitedPayloadTokenFilter:
• float, integer, identity
• NumericPayloadTokenFilter: float
• Function / Value Source
• payload()
• Query parsers
• {!payload_score}
• {!payload_check}
• Default (data_driven) schema has built-in payload-enabled dynamic field mappings:
• *_dpf, *_dpi, and *_dps
Solr features with payloads
• searching (scoring by payload): q={!payload_score…}
• searching (filtering by payload): fq={!frange cost=999 l=0 u=100}payload(…)
• sorting: sort=payload(…) desc
• faceting: facet.query={!frange l=0 u=100 v=$payload_func}&payload_func=payload(…)
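The filtering, sorting, and faceting parameters above can be assembled into one request. The Python sketch below only builds and URL-encodes the query string; the field vals_dpf and the term weighted are illustrative placeholders, and q=*:* is an assumption added so the request is complete.

```python
from urllib.parse import parse_qs, urlencode


def payload_request(field, term, low=0, high=100):
    """Build a /select URL that filters, sorts, and facets on one payload() call."""
    payload_func = f"payload({field},{term})"
    params = [
        ("q", "*:*"),  # assumption: match everything, then filter by payload
        ("fq", f"{{!frange cost=999 l={low} u={high}}}{payload_func}"),
        ("sort", f"{payload_func} desc"),
        ("facet", "true"),
        ("facet.query", f"{{!frange l={low} u={high} v=$payload_func}}"),
        ("payload_func", payload_func),  # dereferenced by v=$payload_func
    ]
    return "/select?" + urlencode(params)


url = payload_request("vals_dpf", "weighted")
decoded = parse_qs(url.split("?", 1)[1])
```

Defining the function once as its own request parameter and referencing it with $payload_func keeps the facet query and the function definition in sync.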
payload()
• payload(field, term [,default_value [,min|max|average|first]])
• Operates on float or integer encoded payloads
• Value source, returning a single float per-document
• Multiple term matches are possible, returning the min, max, or average. first is a special short-circuit
• If no term match for document, returns default value, or zero
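The per-document behavior described above can be modeled in plain Python. This is a sketch of the semantics only, not Solr code: a document is represented as a hypothetical dict from term to the list of payloads at its positions, and the choice of average when no function is given is an assumption of this sketch.

```python
def payload(doc_payloads, term, default_value=0.0, func="average"):
    """Return one float per document for the given term's payloads.

    doc_payloads maps each term to the list of decoded payloads at its positions.
    The default func of "average" is an assumption made for this sketch.
    """
    values = doc_payloads.get(term, [])
    if not values:
        return float(default_value)  # no term match: default value, or zero
    if func == "first":
        return values[0]  # short-circuit: ignore later positions
    if func == "min":
        return min(values)
    if func == "max":
        return max(values)
    if func == "average":
        return sum(values) / len(values)
    raise ValueError(f"unsupported function: {func}")


# Mirrors a document where the term "weighted" occurs twice.
doc1 = {"weighted": [50.0, 100.0]}
```

For doc1, the four functions give 50.0 (min), 100.0 (max), 75.0 (average), and 50.0 (first); a term that is absent yields the default value.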
payload() uses
• &payload_function=payload(….)
• Returning: fl=payload_result:${payload_function}
• Sorting: sort=${payload_function} desc
• Range faceting: facet.query={!frange key=up_to_one_hundred l=0 u=100 v=$payload_function}
• Matching:
• without payload considered: term query, e.g. {!term}
• with payloads factored: {!payload_check}
{!payload_score}
• SpanQuery wrapping, payload-based scoring
• SpanQuery support: currently SpanNearQuery of SpanTermQuery clauses
• scoring:
• payload function: min, max, or average
• includeSpanScore=true: multiplies payload function result by base query scoring
• with a simple term query, payload() function is equivalent (with includeSpanScore=false)
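The scoring rule above reduces to a small formula, sketched here in Python. The function and argument names are hypothetical, and the span score is passed in as a plain number rather than computed from index statistics.

```python
FUNCS = {
    "min": min,
    "max": max,
    "average": lambda values: sum(values) / len(values),
}


def payload_score(payloads, func, span_score=1.0, include_span_score=False):
    """Combine the matched payloads, optionally multiplied by the base span score."""
    factor = FUNCS[func](payloads)
    return factor * span_score if include_span_score else factor


plain = payload_score([50.0, 100.0], "average")
scaled = payload_score([50.0, 100.0], "max", span_score=0.5, include_span_score=True)
```

Without the span score, the result depends only on the payloads, which is why it lines up with payload() for a simple term query.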
{!payload_score} examples
{!payload_score f=payloaded_field_name v=term_value func=min|max|average [includeSpanScore=false]}
{!payload_score f=vals_dpf func=average v=weighted includeSpanScore=true}
{!xmlparser}
• {!xmlparser}<BoostingTermQuery fieldName="weighted_terms_dpf">weighted</BoostingTermQuery>
• == {!payload_score f=weighted_terms_dpf func=average includeSpanScore=true}
{!payload_check}
• SpanQuery wrapping, phrase relevancy scoring
• SpanQuery support: currently SpanNearQuery of SpanTermQuery clauses
• matching:
• matches when all terms match all corresponding payloads, in order
• scoring:
• uses SpanNearQuery’s score
{!payload_check}
id,words_dps
99,taking|VERB the|ARTICLE train|NOUN
q={!payload_check f=words_dps v=train payloads=NOUN}
q={!payload_check f=words_dps v='the train' payloads='ARTICLE NOUN'}
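The matching rule behind these two queries can be sketched in Python. This toy model uses hypothetical helper names and assumes the terms must appear adjacent and in order, each carrying the payload listed at the same index.

```python
def parse_tokens(value):
    """Turn 'taking|VERB the|ARTICLE' into [('taking', 'VERB'), ('the', 'ARTICLE')]."""
    return [tuple(token.partition("|")[::2]) for token in value.split()]


def payload_check(tokens, terms, payloads):
    """True if the terms occur consecutively, in order, with the matching payloads."""
    wanted_terms = terms.split()
    wanted_payloads = payloads.split()
    if len(wanted_terms) != len(wanted_payloads):
        raise ValueError("each term needs exactly one payload")
    wanted = list(zip(wanted_terms, wanted_payloads))
    window = len(wanted)
    return any(
        tokens[start:start + window] == wanted
        for start in range(len(tokens) - window + 1)
    )


words_dps = parse_tokens("taking|VERB the|ARTICLE train|NOUN")
```

Against document 99, both queries above match, while asking for train with payload VERB, or for the terms in the reverse order, does not.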
Payload Cons
• payload(): if used as a {!func} q or facet.query, it computes a value for ALL documents in the index. To limit payload function computation to just the matching documents, use {!frange} with payload() as a PostFilter fq
• Updating values
• Atomic field update
• (could multivalue and delete/add a single term|value)?
• could mean updating all inventory for all stores for a single change
• no current range faceting support (of functions in general)
What’s next
• SOLR-10541 - “Range facet by function”
• solves range faceting by payload
• LUCENE-7854: term frequency “payload”
• coming soon, see SOLR-11358
• OpenNLP types => payloads
• Pluggable encoders/decoders?
Further reading
https://lucidworks.com/2017/09/14/solr-payloads/
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 

Solr Payloads

  • 1. Payloads in Solr Erik Hatcher Senior Solutions Architect / co-founder, Lucidworks
  • 2. Solr now smoothly integrates with Lucene-level payloads. Payloads provide optional per-term metadata, numeric or otherwise. Payloads help solve challenging use cases such as per-store product pricing and per-term confidence/weighting. This session will present the payload feature from the Lucene layer up to the Solr integration, including per-store pricing, per-term weighting, and more.
  • 3. Payloads in Solr 01 tl;dr • Solr 6.6+ via SOLR-1485 • per-term position metadata • Use cases: • per-store pricing • weighting terms: e.g. confidence of term, or importance/relevance of term • weighting term types (synonyms factor lower, verbs factor higher)
  • 4. Lucene’s Payloads • Token: PayloadAttribute • byte[] per term position, optional • Several components set payloads • Similarity.SimScorer#computePayloadFactor • Before SOLR-1485, no built-in components (outside Lucene’s test cases) implemented this • PostingsEnum#getPayload
  • 6. Lucene’s Token • Field • Attributes: • CharTerm: term text • … Keyword, Type, Offset,… • and Payload!
  • 7. setPayload(bytes) • DelimitedPayloadTokenFilter • NumericPayloadTokenFilter • TokenOffsetPayloadTokenFilter • TypeAsPayloadTokenFilter • pre-analyzed field (Solr)
  • 9. DelimitedPayloadTokenFilter • term1|payload1 term2|payload2 • encodes payloads as: • float, • int, • or string / raw bytes
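The `term|payload` convention on this slide can be illustrated outside Lucene. The following is a minimal Python sketch, not the filter's actual implementation: the function name `parse_delimited` is hypothetical, and it assumes that float and integer payloads are stored as 4-byte big-endian values and that the split happens at the first delimiter.

```python
import struct

def parse_delimited(text, delimiter="|", encoder="float"):
    """Split whitespace-separated 'term|payload' tokens into (term, payload_bytes).

    Assumption: float/integer payloads are 4-byte big-endian values;
    'identity' keeps the payload text as raw bytes.
    """
    tokens = []
    for raw in text.split():
        term, sep, payload = raw.partition(delimiter)  # split at first delimiter
        if not sep:
            tokens.append((term, None))  # payloads are optional
        elif encoder == "float":
            tokens.append((term, struct.pack(">f", float(payload))))
        elif encoder == "integer":
            tokens.append((term, struct.pack(">i", int(payload))))
        else:  # "identity"
            tokens.append((term, payload.encode("utf-8")))
    return tokens

tokens = parse_delimited("one|1.0 two|2.0 three|3.0")
for term, payload in tokens:
    print(term, payload.hex())
```

Decoding reverses the step: `struct.unpack(">f", payload)[0]` recovers the float that a scoring function would consume.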
  • 10. field weighted_terms_dps
      term one: doc 0, freq 1, pos 0, payload 1.0
      term three: doc 0, freq 1, pos 2, payload 3.0
      term two: doc 0, freq 1, pos 1, payload 2.0
      term weighted: doc 1, freq 2, pos 0, payload 50.0; pos 1, payload 100.0
  • 11. Use Cases • products with per-store pricing • boosting by weighted terms • down-boosting synonyms
  • 12. Traditional per-store pricing strategies • Explode docs: • num_docs = products * stores (1M products * 5000 stores could be up to 5B docs!) • query-time collapsing (by product id) • Explode fields: • default_price • store_price_0001 • store_price_0002 • … store_price_NNNN • query-time field choice • e.g. up to 5000 fields per document
  • 13. Payload-based per-store pricing • default_price • store_prices: • terms: STORE_0001… STORE_NNNN • per-term payload of price • One additional field • with up to num_stores terms/payloads
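To make the trade-off between the two strategies concrete, here is a small Python sketch. It reproduces the document-count arithmetic from the "explode docs" approach and builds the single delimited field that the payload approach would index. The helper names, the `sku-1` document, and the `store_prices_dpf` field name are illustrative assumptions, not part of the talk.

```python
def exploded_doc_count(products, stores):
    """Worst-case document count when every product is copied per store."""
    return products * stores

def store_prices_field(prices):
    """Build 'STORE_0001|9.99 STORE_0002|10.49' from {store_number: price}."""
    return " ".join(
        f"STORE_{number:04d}|{price:.2f}" for number, price in sorted(prices.items())
    )

# One document per product, with one extra field carrying all store prices.
doc = {
    "id": "sku-1",                       # hypothetical product id
    "default_price": 9.99,
    "store_prices_dpf": store_prices_field({2: 10.49, 1: 9.99}),
}

print(exploded_doc_count(1_000_000, 5_000))
print(doc["store_prices_dpf"])
```

With payloads the index keeps one document per product, and the per-store price travels as the payload of the matching `STORE_NNNN` term.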
  • 15. Down-boosting synonyms
      Indexed document (CSV):
        id,synonyms_with_payloads
        99,tv
      synonyms.txt:
        Television, Televisions, TV, TVs
      Query:
        /select?wt=csv&fl=id,score&q={!payload_score f=synonyms_with_payloads v=$payload_term func=max}
      Result with &payload_term=television:
        id,score
        99,0.1
      Result with &payload_term=tv:
        id,score
        99,1.0
  • 16.
      {
        "add-field-type": {
          "name": "synonyms_with_payloads",
          "stored": "true",
          "class": "solr.TextField",
          "positionIncrementGap": "100",
          "indexAnalyzer": {
            "tokenizer": { "class": "solr.StandardTokenizerFactory" },
            "filters": [
              { "class": "solr.SynonymGraphFilterFactory", "expand": "true", "ignoreCase": "true", "synonyms": "synonyms.txt" },
              { "class": "solr.LowerCaseFilterFactory" },
              { "class": "solr.NumericPayloadTokenFilterFactory", "payload": "0.1", "typeMatch": "SYNONYM" }
            ]
          },
          "queryAnalyzer": {
            "tokenizer": { "class": "solr.StandardTokenizerFactory" },
            "filters": [ { "class": "solr.LowerCaseFilterFactory" } ]
          }
        }
      }
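The effect of this analysis chain can be approximated in a few lines of Python. This is a simplified sketch, not Solr's analyzer: it lowercases before expanding synonyms, ignores positions, and assumes (as the slide's `99,1.0` result suggests) that a term indexed without a payload scores 1.0. The names `analyze` and `score` are hypothetical.

```python
SYNONYMS = [{"television", "televisions", "tv", "tvs"}]

def analyze(text, groups=SYNONYMS, synonym_payload=0.1):
    """Return {term: payload}; original tokens get None, injected synonyms get 0.1."""
    indexed = {}
    for token in text.lower().split():
        indexed[token] = None  # original token: no payload attached
        for group in groups:
            if token in group:
                for synonym in group - {token}:
                    # Only tokens added by synonym expansion receive the payload.
                    indexed.setdefault(synonym, synonym_payload)
    return indexed

def score(indexed, term):
    """Payload-based score for one term; None when the term does not match."""
    if term not in indexed:
        return None
    payload = indexed[term]
    return 1.0 if payload is None else payload  # assumption: no payload scores 1.0

index = analyze("tv")
print(score(index, "television"), score(index, "tv"))
```

A query on the literal term the document contains outranks a query that only matches through an expanded synonym, which is the down-boosting behavior the slide demonstrates.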
  • 17. Solr Integration • Schema-aware • DelimitedPayloadTokenFilter: • float, integer, identity • NumericPayloadTokenFilter: float • Function / Value Source • payload() • Query parsers • {!payload_score} • {!payload_check} • Default (data_driven) schema has built-in payload-enabled dynamic field mappings: • *_dpf, *_dpi, and *_dps
  • 18. Solr features with payloads
      • searching (scoring by payload): q={!payload_score…}
      • searching (filtering by payload): fq={!frange cost=999 l=0 u=100}payload(…)
      • sorting: sort=payload(…) desc
      • faceting: facet.query={!frange l=0 u=100 v=$payload_func}&payload_func=payload(…)
  • 19. payload()
      • payload(field, term [,default_value [,min|max|average|first]])
      • Operates on float- or integer-encoded payloads
      • Value source, returning a single float per document
      • Multiple term matches are possible, returning the min, max, or average; first is a special short-circuit
      • If no term matches for a document, returns the default value, or zero
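The semantics described on this slide can be mimicked in plain Python. The sketch below is an illustration only: `payload_value` is a hypothetical name, the postings are modeled as a dictionary of big-endian float payloads per term, and treating `average` as the default function is an assumption rather than something the slide states.

```python
import struct

def payload_value(postings, term, default=0.0, func="average"):
    """Return one float per document for `term`, like a value source would.

    postings: {term: [payload_bytes, ...]} for one document and field,
    one entry per matching position (assumed 4-byte big-endian floats).
    """
    raw = postings.get(term)
    if not raw:
        return default  # no term match: default value, or zero
    if func == "first":
        return struct.unpack(">f", raw[0])[0]  # short-circuit: decode only one
    values = [struct.unpack(">f", b)[0] for b in raw]
    if func == "min":
        return min(values)
    if func == "max":
        return max(values)
    if func == "average":
        return sum(values) / len(values)
    raise ValueError(f"unknown function: {func}")

# Doc 1 from the index dump: 'weighted' at two positions, payloads 50.0 and 100.0.
doc1 = {"weighted": [struct.pack(">f", 50.0), struct.pack(">f", 100.0)]}
print(payload_value(doc1, "weighted"))
```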
  • 20. payload() uses
      • &payload_function=payload(…)
      • Returning: fl=payload_result:${payload_function}
      • Sorting: sort=${payload_function} desc
      • Range faceting: facet.query={!frange key=up_to_one_hundred l=0 u=100 v=$payload_function}
      • Matching:
        • without payload considered: term query, e.g. {!term}
        • with payloads factored: {!payload_check}
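Because the same function is reused for returning, sorting, and faceting, it helps to define it once as a request parameter and reference it elsewhere. The Python sketch below only assembles and URL-encodes such a request without sending it; the field `weighted_terms_dpf`, the term `weighted`, and the `*:*` main query are illustrative assumptions.

```python
from urllib.parse import parse_qs, urlencode

# Define the function once; other parameters refer to it by name.
payload_function = "payload(weighted_terms_dpf,weighted)"

params = {
    "q": "*:*",
    "payload_function": payload_function,
    "fl": "id,payload_result:${payload_function}",
    "sort": "${payload_function} desc",
    "facet": "true",
    "facet.query": "{!frange key=up_to_one_hundred l=0 u=100 v=$payload_function}",
}

# urlencode escapes the braces, dollar signs, and spaces for use in a URL.
query_string = urlencode(params)
print("/select?" + query_string)
```

Encoding matters here: characters such as `{`, `}`, `$`, and spaces must be escaped before the request reaches Solr.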
  • 21. {!payload_score} • SpanQuery wrapping, payload-based scoring • SpanQuery support: currently SpanNearQuery of SpanTermQuery clauses • scoring: • payload function: min, max, or average • includeSpanScore=true: multiplies the payload function result by the base query score • with a simple term query, the payload() function is equivalent (with includeSpanScore=false)
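The scoring rule above reduces to a short calculation. This Python sketch is illustrative only: the function name is hypothetical and the span score of 0.5 is an invented input, not a value Solr would necessarily produce.

```python
def payload_score(payloads, span_score, func="average", include_span_score=False):
    """Combine the payloads of the matching positions into one score."""
    combine = {
        "min": min,
        "max": max,
        "average": lambda values: sum(values) / len(values),
    }[func]
    factor = combine(payloads)
    # includeSpanScore=true multiplies the payload factor by the base query score.
    return factor * span_score if include_span_score else factor

print(payload_score([50.0, 100.0], span_score=0.5, include_span_score=True))
print(payload_score([50.0, 100.0], span_score=0.5, func="max"))
```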
  • 22. {!payload_score} examples
      {!payload_score f=payloaded_field_name v=term_value func=min|max|average [includeSpanScore=false]}
      {!payload_score f=vals_dpf func=average v=weighted includeSpanScore=true}
  • 23. {!xmlparser}
      • {!xmlparser}<BoostingTermQuery fieldName="weighted_terms_dpf">weighted</BoostingTermQuery>
      • == {!payload_score f=weighted_terms_dpf func=average includeSpanScore=true}
  • 24. {!payload_check} • SpanQuery wrapping, phrase relevancy scoring • SpanQuery support: currently SpanNearQuery of SpanTermQuery clauses • matching: • matches when all terms match all corresponding payloads, in order • scoring: • uses SpanNearQuery’s score
  • 25. {!payload_check}
      Indexed document (CSV):
        id,words_dps
        99,taking|VERB the|ARTICLE train|NOUN
      q={!payload_check f=words_dps v=train payloads=NOUN}
      q={!payload_check f=words_dps v='the train' payloads='ARTICLE NOUN'}
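The matching rule (all terms must carry their corresponding payloads, in order) can be sketched in Python as follows. This is a simplified model with a hypothetical function name: it assumes the terms must be adjacent, and it ignores analysis and scoring.

```python
def payload_check(doc_tokens, terms, payloads):
    """True when `terms` occur consecutively, each with its matching payload.

    doc_tokens: list of (term, payload) pairs in position order.
    """
    n = len(terms)
    if n == 0 or n != len(payloads):
        return False
    wanted = list(zip(terms, payloads))
    # Slide a window over the positions and compare term and payload together.
    return any(
        doc_tokens[i:i + n] == wanted for i in range(len(doc_tokens) - n + 1)
    )

doc = [("taking", "VERB"), ("the", "ARTICLE"), ("train", "NOUN")]
print(payload_check(doc, ["train"], ["NOUN"]))
print(payload_check(doc, ["the", "train"], ["ARTICLE", "NOUN"]))
print(payload_check(doc, ["train"], ["VERB"]))
```

The last call fails because the term matches but its payload does not, which is the case a plain term query could not distinguish.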
  • 26. Payload Cons • payload(): when used as a {!func} q or facet.query, it computes a value for ALL documents in the index. To compute the payload function for matching documents only, use an fq with {!frange} over payload(), which can run as a PostFilter • Updating values • Atomic field update • (could multivalue and delete/add a single term|value)? • could mean updating all inventory for all stores for a single change • no current range faceting support (of functions in general)
  • 27. What’s next • SOLR-10541 - “Range facet by function” • solves range faceting by payload • LUCENE-7854: term frequency “payload” • coming soon, see SOLR-11358 • OpenNLP types => payloads • Pluggable encoders/decoders?