SlideShare ist ein Scribd-Unternehmen logo
1 von 24
Downloaden Sie, um offline zu lesen
Advanced Relevancy Ranking
Paul Nelson
Chief Architect / Search Technologies
2
Search Technologies Overview
• Formed June 2005
• Over 100 employees and growing
• Over 400 customers worldwide
• Presence in US, Latin America, UK & Germany
• Deep enterprise search expertise
• Consistent revenue growth and profitability
• Search Engine Independent
3
Lucene Relevancy: Simple Operators
• term(A)  TF(A) * IDF(A)
• Implemented with DefaultSimilarity / TermQuery
• TF(A) = sqrt(termInDocCount)
• IDF(A) = log(totalDocsInCollection/(docsWithTermCount+1)) + 1.0
• and(A,B)  A * B
• Implemented with BooleanQuery()
• or(A, B)  A + B
• Implemented with BooleanQuery()
• max(A, B)  max(A, B)
• Implemented with DisjunctionMaxQuery()
3
4
Simple Operators - Example
and
or max
george martha washington custis
0.10 0.20 0.60 0.90
0.1 + 0.2 = 0.30 max(0, 0.9) = 0.90
0.3 * 0.9 = 0.27
5
Less Used Operators
• boost(f, A)  (A * f)
• Implemented with Query.setBoost(f)
• constant(f, A)  if(A) then f else 0.0
• Implemented with ConstantScoreQuery()
• boostPlus(A, B)  if(A) then (A + B) else 0.0
• Implemented with BooleanQuery()
• boostMul(f, A, B)  if(B) then (A * f) else A
• Implemented with BoostingQuery()
5
6
Problem: Need for More Flexibility
• Difficult / impossible to use all operators
• Many not available in standard query parsers
• Complex expressions = string manipulation
• This is messy
• Query construction is in the application layer
• Your UI programmer is creating query expressions?
• Seriously?
• Hard to create and use new operators
• Requires modifying query parsers - yuck
6
7
Solr
Query Processing Language
7
User
Interface
QPL
Engine
Search
QPL
Script
8
Introducing: QPL
• Query Processing Language
• Domain Specific Language for Constructing Queries
• Built on Groovy
• https://wiki.searchtechnologies.com/index.php/QPL_Home_Page
• Solr Plug-Ins
• Query Parser
• Search Component
• “The 4GL for Text Search Query Expressions”
• Server-side Solr Access
• Cores, Analyzers, Embedded Search, Results XML
8
9
Solr Plug-Ins
10
QPL Configuration – solrconfig.xml
<queryParser name="qpl"
class="com.searchtechnologies.qpl.solr.QPLSolrQParserPlugin">
<str name="scriptFile">parser.qpl</str>
<str name="defaultField">text</str>
</queryParser>
<searchComponent name="qplSearchFirst"
class="com.searchtechnologies.qpl.solr.QPLSearchComponent">
<str name="scriptFile">search.qpl</str>
<str name="defaultField">text</str>
<str name="isProcessScript">false</str>
</searchComponent>
Query Parser Configuration:
Search Component Configuration:
11
QPL Example #1
myTerms = solr.tokenize(query);
phraseQ = phrase(myTerms);
andQ = and(myTerms);
return phraseQ^3.0 | andQ^2.0 | orQ;
Tokenize:
Phrase Query:
And Query:
Put It All Together:
orQ = (myTerms.size() <= 2) ? null :
orMin( (myTerms.size()+1)/2, myTerms);
Or Query:
12
Thesaurus Example #2
myTerms = solr.tokenize(query);
thes = Thesaurus.load("thesaurus.xml")
thesQ = thes.expand(0.8f,
solr.tokenizer("text"), myTerms);
return and(thesQ);
Tokenize:
Load Thesaurus: (cached)
Thesaurus Expansion:
Put It All Together:
Original Query: bathroom humor
[or(bathroom, loo^0.8, wc^0.8), or(humor, jokes^0.8)]
13
More Operators
Boolean Query Parser:
pQ = parseQuery("(george or martha) near/5 washington")
Relevancy Ranking Operators:
q1 = boostPlus(query, optionalQ)
q2 = boostMul(0.5, query, optionalQ)
q3 = constant(0.5, query)
Composite Queries:
compQ = and(compositeMax(
["title":1.5, "body":0.8],
"george", "washington"))
14
News Feed Use Case
14
Order Documents Date
1 markets+terms Today
2 markets Today
3 terms Today
4 companies Today
5 markets+terms Yesterday
6 markets Yesterday
7 terms Yesterday
8 companies Yesterday
9 markets, companies older
15
News Feed Use Case – Step 1
markets = split(solr.markets, "s*;s*")
marketsQ = field("markets", or(markets));
terms = solr.tokenize(query);
termsQ = field("body",
or(thesaurus.expand(0.9f, terms)))
compIds = split(solr.compIds, "s*;s*")
compIdsQ = field("companyIds", or(compIds))
Segments:
Terms:
Companies:
16
News Feed Use Case – Step 2
todayDate = sdf.format(c.getTime())
todayQ = field("date_s",todayDate)
c.add(Calendar.DAY_OF_MONTH, -1)
yesterdayDate = sdf.format(c.getTime())
yesterdayQ = field("date_s",yesterdayDate)
Today:
Yesterday:
sdf = new SimpleDateFormat("yyyy-MM-dd")
cal = Calendar.getInstance()
17
News Feed Use Case
17
Order Documents Date
1 markets+terms Today
2 markets Today
3 terms Today
4 companies Today
5 markets+terms Yesterday
6 markets Yesterday
7 terms Yesterday
8 companies Yesterday
9 markets, companies older
18
News Feed Use Case – Step 3
sq1 = constant(4.0, and(marketsQ, termsQ))
sq2 = constant(3.0, marketsQ)
sq3 = constant(2.0, termsQ)
sq4 = constant(1.0, compIdsQ)
subjectQ = max(sq1, sq2, sq3, sq4)
tq1 = constant(10.0, todayQ)
tq2 = constant(1.0, yesterdayQ)
timeQ = max(tq1, tq2)
recentQ = and(subjectQ, timeQ)
Weighted Subject Queries:
Weighted Time Queries:
Put it All Together:
return max(recentQ, or(marketsQ,compIdsQ)^0.01))
19
Embedded Search Example #1
results = solr.search('subjectsCore', or(qTerms), 50)
subjectsQ = or(results*.subjectId)
return field("title", and(qTerms)) | subjectsQ^0.9;
Execute an Embedded Search:
Create a query from the results:
Put it all together:
qTerms = solr.tokenize(qTerms);
20
Embedded Search Example #2
results = solr.search('categories', and(qTerms), 10)
myList = solr.newList();
myList.add("relatedCategories", results*.title);
solr.addResponse(myList)
Execute an Embedded Search:
Create a Solr named list:
Add it to the XML response:
qTerms = solr.tokenize(qTerms);
21
Other Features
• Embedded Grouping Queries
• Oh yes they did!
• Proximity operators
• ADJ, NEAR/#, BEFORE/#
• Reverse Lemmatizer
• Prefers exact matches over variants
• Transformer
• Applies transformations recursively to query trees
21
22
Solr
Query Processing Language
22
User
Interface
QPL
Engine
Search
Data as entered
by user Boolean
Query Expression
QPL
Script
Application
Dev Team
Search Team
23
Solr
QPL: Using External Sources to Build Queries
23
User
Interface
QPL
Engine
Search
QPL
Script
RDBMS
Other
Indexes
Thesaurus
CONTACT
Paul Nelson
pnelson@searchtechnologies.com

Weitere ähnliche Inhalte

Was ist angesagt?

Native Code, Off-Heap Data & JSON Facet API for Solr (Heliosearch)
Native Code, Off-Heap Data & JSON Facet API for Solr (Heliosearch)Native Code, Off-Heap Data & JSON Facet API for Solr (Heliosearch)
Native Code, Off-Heap Data & JSON Facet API for Solr (Heliosearch)Yonik Seeley
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with SolrErik Hatcher
 
Foreign Data Wrapper Enhancements
Foreign Data Wrapper EnhancementsForeign Data Wrapper Enhancements
Foreign Data Wrapper EnhancementsShigeru Hanada
 
Retrieving Information From Solr
Retrieving Information From SolrRetrieving Information From Solr
Retrieving Information From SolrRamzi Alqrainy
 
Enterprise Search Solution: Apache SOLR. What's available and why it's so cool
Enterprise Search Solution: Apache SOLR. What's available and why it's so coolEnterprise Search Solution: Apache SOLR. What's available and why it's so cool
Enterprise Search Solution: Apache SOLR. What's available and why it's so coolEcommerce Solution Provider SysIQ
 
An Introduction to NLP4L (Scala by the Bay / Big Data Scala 2015)
An Introduction to NLP4L (Scala by the Bay / Big Data Scala 2015)An Introduction to NLP4L (Scala by the Bay / Big Data Scala 2015)
An Introduction to NLP4L (Scala by the Bay / Big Data Scala 2015)Koji Sekiguchi
 
Mastering solr
Mastering solrMastering solr
Mastering solrjurcello
 
Apache Drill Workshop
Apache Drill WorkshopApache Drill Workshop
Apache Drill WorkshopCharles Givre
 
Ingesting and Manipulating Data with JavaScript
Ingesting and Manipulating Data with JavaScriptIngesting and Manipulating Data with JavaScript
Ingesting and Manipulating Data with JavaScriptLucidworks
 
MySQL Optimizer Overview
MySQL Optimizer OverviewMySQL Optimizer Overview
MySQL Optimizer OverviewOlav Sandstå
 
Solr Indexing and Analysis Tricks
Solr Indexing and Analysis TricksSolr Indexing and Analysis Tricks
Solr Indexing and Analysis TricksErik Hatcher
 
MySQL Optimizer Overview
MySQL Optimizer OverviewMySQL Optimizer Overview
MySQL Optimizer OverviewOlav Sandstå
 
Sasi, cassandra on the full text search ride At Voxxed Day Belgrade 2016
Sasi, cassandra on the full text search ride At  Voxxed Day Belgrade 2016Sasi, cassandra on the full text search ride At  Voxxed Day Belgrade 2016
Sasi, cassandra on the full text search ride At Voxxed Day Belgrade 2016Duyhai Doan
 
Hive Functions Cheat Sheet
Hive Functions Cheat SheetHive Functions Cheat Sheet
Hive Functions Cheat SheetHortonworks
 
Recent Additions to Lucene Arsenal
Recent Additions to Lucene ArsenalRecent Additions to Lucene Arsenal
Recent Additions to Lucene Arsenallucenerevolution
 
Apache SOLR in AEM 6
Apache SOLR in AEM 6Apache SOLR in AEM 6
Apache SOLR in AEM 6Yash Mody
 
Pig_Presentation
Pig_PresentationPig_Presentation
Pig_PresentationArjun Shah
 
Introduction to Lucene & Solr and Usecases
Introduction to Lucene & Solr and UsecasesIntroduction to Lucene & Solr and Usecases
Introduction to Lucene & Solr and UsecasesRahul Jain
 

Was ist angesagt? (20)

Native Code, Off-Heap Data & JSON Facet API for Solr (Heliosearch)
Native Code, Off-Heap Data & JSON Facet API for Solr (Heliosearch)Native Code, Off-Heap Data & JSON Facet API for Solr (Heliosearch)
Native Code, Off-Heap Data & JSON Facet API for Solr (Heliosearch)
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with Solr
 
Data Analysis in Python
Data Analysis in PythonData Analysis in Python
Data Analysis in Python
 
Foreign Data Wrapper Enhancements
Foreign Data Wrapper EnhancementsForeign Data Wrapper Enhancements
Foreign Data Wrapper Enhancements
 
Full Text Search In PostgreSQL
Full Text Search In PostgreSQLFull Text Search In PostgreSQL
Full Text Search In PostgreSQL
 
Retrieving Information From Solr
Retrieving Information From SolrRetrieving Information From Solr
Retrieving Information From Solr
 
Enterprise Search Solution: Apache SOLR. What's available and why it's so cool
Enterprise Search Solution: Apache SOLR. What's available and why it's so coolEnterprise Search Solution: Apache SOLR. What's available and why it's so cool
Enterprise Search Solution: Apache SOLR. What's available and why it's so cool
 
An Introduction to NLP4L (Scala by the Bay / Big Data Scala 2015)
An Introduction to NLP4L (Scala by the Bay / Big Data Scala 2015)An Introduction to NLP4L (Scala by the Bay / Big Data Scala 2015)
An Introduction to NLP4L (Scala by the Bay / Big Data Scala 2015)
 
Mastering solr
Mastering solrMastering solr
Mastering solr
 
Apache Drill Workshop
Apache Drill WorkshopApache Drill Workshop
Apache Drill Workshop
 
Ingesting and Manipulating Data with JavaScript
Ingesting and Manipulating Data with JavaScriptIngesting and Manipulating Data with JavaScript
Ingesting and Manipulating Data with JavaScript
 
MySQL Optimizer Overview
MySQL Optimizer OverviewMySQL Optimizer Overview
MySQL Optimizer Overview
 
Solr Indexing and Analysis Tricks
Solr Indexing and Analysis TricksSolr Indexing and Analysis Tricks
Solr Indexing and Analysis Tricks
 
MySQL Optimizer Overview
MySQL Optimizer OverviewMySQL Optimizer Overview
MySQL Optimizer Overview
 
Sasi, cassandra on the full text search ride At Voxxed Day Belgrade 2016
Sasi, cassandra on the full text search ride At  Voxxed Day Belgrade 2016Sasi, cassandra on the full text search ride At  Voxxed Day Belgrade 2016
Sasi, cassandra on the full text search ride At Voxxed Day Belgrade 2016
 
Hive Functions Cheat Sheet
Hive Functions Cheat SheetHive Functions Cheat Sheet
Hive Functions Cheat Sheet
 
Recent Additions to Lucene Arsenal
Recent Additions to Lucene ArsenalRecent Additions to Lucene Arsenal
Recent Additions to Lucene Arsenal
 
Apache SOLR in AEM 6
Apache SOLR in AEM 6Apache SOLR in AEM 6
Apache SOLR in AEM 6
 
Pig_Presentation
Pig_PresentationPig_Presentation
Pig_Presentation
 
Introduction to Lucene & Solr and Usecases
Introduction to Lucene & Solr and UsecasesIntroduction to Lucene & Solr and Usecases
Introduction to Lucene & Solr and Usecases
 

Ähnlich wie Advanced query parsing techniques

Building a real time big data analytics platform with solr
Building a real time big data analytics platform with solrBuilding a real time big data analytics platform with solr
Building a real time big data analytics platform with solrTrey Grainger
 
Building a real time, big data analytics platform with solr
Building a real time, big data analytics platform with solrBuilding a real time, big data analytics platform with solr
Building a real time, big data analytics platform with solrlucenerevolution
 
Building a Complex, Real-Time Data Management Application
Building a Complex, Real-Time Data Management ApplicationBuilding a Complex, Real-Time Data Management Application
Building a Complex, Real-Time Data Management ApplicationJonathan Katz
 
Webinar: Index Tuning and Evaluation
Webinar: Index Tuning and EvaluationWebinar: Index Tuning and Evaluation
Webinar: Index Tuning and EvaluationMongoDB
 
Webinar: What's New in Solr 6
Webinar: What's New in Solr 6Webinar: What's New in Solr 6
Webinar: What's New in Solr 6Lucidworks
 
#SalesforceSaturday : Salesforce BIG Objects Explained
#SalesforceSaturday : Salesforce BIG Objects Explained#SalesforceSaturday : Salesforce BIG Objects Explained
#SalesforceSaturday : Salesforce BIG Objects ExplainedAtul Gupta(8X)
 
Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...
Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...
Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...Chester Chen
 
將 Open Data 放上 Open Source Platforms: 開源資料入口平台 CKAN 開發經驗分享
將 Open Data 放上 Open Source Platforms: 開源資料入口平台 CKAN 開發經驗分享將 Open Data 放上 Open Source Platforms: 開源資料入口平台 CKAN 開發經驗分享
將 Open Data 放上 Open Source Platforms: 開源資料入口平台 CKAN 開發經驗分享Chengjen Lee
 
SQL for Web APIs - Simplifying Data Access for API Consumers
SQL for Web APIs - Simplifying Data Access for API ConsumersSQL for Web APIs - Simplifying Data Access for API Consumers
SQL for Web APIs - Simplifying Data Access for API ConsumersJerod Johnson
 
Alternate for scheduled apex using flow builder
Alternate for scheduled apex using flow builderAlternate for scheduled apex using flow builder
Alternate for scheduled apex using flow builderKadharBashaJ
 
Oracle Application Express as add-on for Google Apps
Oracle Application Express as add-on for Google AppsOracle Application Express as add-on for Google Apps
Oracle Application Express as add-on for Google AppsSergei Martens
 
Salesforce Summer 14 Release
Salesforce Summer 14 ReleaseSalesforce Summer 14 Release
Salesforce Summer 14 ReleaseJyothylakshmy P.U
 
Search Queries Explained – A Deep Dive into Query Rules, Query Variables and ...
Search Queries Explained – A Deep Dive into Query Rules, Query Variables and ...Search Queries Explained – A Deep Dive into Query Rules, Query Variables and ...
Search Queries Explained – A Deep Dive into Query Rules, Query Variables and ...Mikael Svenson
 
Nu Skin: Integrating the Day CMS with Translation.com
Nu Skin: Integrating the Day CMS with Translation.comNu Skin: Integrating the Day CMS with Translation.com
Nu Skin: Integrating the Day CMS with Translation.comDay Software
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with SolrErik Hatcher
 
Just the Job: Employing Solr for Recruitment Search -Charlie Hull
Just the Job: Employing Solr for Recruitment Search -Charlie Hull Just the Job: Employing Solr for Recruitment Search -Charlie Hull
Just the Job: Employing Solr for Recruitment Search -Charlie Hull lucenerevolution
 

Ähnlich wie Advanced query parsing techniques (20)

Building a real time big data analytics platform with solr
Building a real time big data analytics platform with solrBuilding a real time big data analytics platform with solr
Building a real time big data analytics platform with solr
 
Building a real time, big data analytics platform with solr
Building a real time, big data analytics platform with solrBuilding a real time, big data analytics platform with solr
Building a real time, big data analytics platform with solr
 
Building a Complex, Real-Time Data Management Application
Building a Complex, Real-Time Data Management ApplicationBuilding a Complex, Real-Time Data Management Application
Building a Complex, Real-Time Data Management Application
 
Webinar: Index Tuning and Evaluation
Webinar: Index Tuning and EvaluationWebinar: Index Tuning and Evaluation
Webinar: Index Tuning and Evaluation
 
Webinar: What's New in Solr 6
Webinar: What's New in Solr 6Webinar: What's New in Solr 6
Webinar: What's New in Solr 6
 
#SalesforceSaturday : Salesforce BIG Objects Explained
#SalesforceSaturday : Salesforce BIG Objects Explained#SalesforceSaturday : Salesforce BIG Objects Explained
#SalesforceSaturday : Salesforce BIG Objects Explained
 
Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...
Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...
Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...
 
React inter3
React inter3React inter3
React inter3
 
將 Open Data 放上 Open Source Platforms: 開源資料入口平台 CKAN 開發經驗分享
將 Open Data 放上 Open Source Platforms: 開源資料入口平台 CKAN 開發經驗分享將 Open Data 放上 Open Source Platforms: 開源資料入口平台 CKAN 開發經驗分享
將 Open Data 放上 Open Source Platforms: 開源資料入口平台 CKAN 開發經驗分享
 
SQL for Web APIs - Simplifying Data Access for API Consumers
SQL for Web APIs - Simplifying Data Access for API ConsumersSQL for Web APIs - Simplifying Data Access for API Consumers
SQL for Web APIs - Simplifying Data Access for API Consumers
 
Alternate for scheduled apex using flow builder
Alternate for scheduled apex using flow builderAlternate for scheduled apex using flow builder
Alternate for scheduled apex using flow builder
 
Oracle Application Express as add-on for Google Apps
Oracle Application Express as add-on for Google AppsOracle Application Express as add-on for Google Apps
Oracle Application Express as add-on for Google Apps
 
70433 Dumps DB
70433 Dumps DB70433 Dumps DB
70433 Dumps DB
 
Javascript
JavascriptJavascript
Javascript
 
Salesforce Summer 14 Release
Salesforce Summer 14 ReleaseSalesforce Summer 14 Release
Salesforce Summer 14 Release
 
Search Queries Explained – A Deep Dive into Query Rules, Query Variables and ...
Search Queries Explained – A Deep Dive into Query Rules, Query Variables and ...Search Queries Explained – A Deep Dive into Query Rules, Query Variables and ...
Search Queries Explained – A Deep Dive into Query Rules, Query Variables and ...
 
Nu Skin: Integrating the Day CMS with Translation.com
Nu Skin: Integrating the Day CMS with Translation.comNu Skin: Integrating the Day CMS with Translation.com
Nu Skin: Integrating the Day CMS with Translation.com
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with Solr
 
Just the Job: Employing Solr for Recruitment Search -Charlie Hull
Just the Job: Employing Solr for Recruitment Search -Charlie Hull Just the Job: Employing Solr for Recruitment Search -Charlie Hull
Just the Job: Employing Solr for Recruitment Search -Charlie Hull
 
Polyglot
PolyglotPolyglot
Polyglot
 

Mehr von lucenerevolution

Text Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and LuceneText Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and Lucenelucenerevolution
 
State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! lucenerevolution
 
Building Client-side Search Applications with Solr
Building Client-side Search Applications with SolrBuilding Client-side Search Applications with Solr
Building Client-side Search Applications with Solrlucenerevolution
 
Integrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationsIntegrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationslucenerevolution
 
Scaling Solr with SolrCloud
Scaling Solr with SolrCloudScaling Solr with SolrCloud
Scaling Solr with SolrCloudlucenerevolution
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud Clusterslucenerevolution
 
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and ParboiledImplementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiledlucenerevolution
 
Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs lucenerevolution
 
Enhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchEnhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchlucenerevolution
 
Real-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and StormReal-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and Stormlucenerevolution
 
Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?lucenerevolution
 
Schemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST APISchemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST APIlucenerevolution
 
High Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with LuceneHigh Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with Lucenelucenerevolution
 
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMText Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMlucenerevolution
 
Faceted Search with Lucene
Faceted Search with LuceneFaceted Search with Lucene
Faceted Search with Lucenelucenerevolution
 
Turning search upside down
Turning search upside downTurning search upside down
Turning search upside downlucenerevolution
 
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...lucenerevolution
 
Shrinking the haystack wes caldwell - final
Shrinking the haystack   wes caldwell - finalShrinking the haystack   wes caldwell - final
Shrinking the haystack wes caldwell - finallucenerevolution
 
The First Class Integration of Solr with Hadoop
The First Class Integration of Solr with HadoopThe First Class Integration of Solr with Hadoop
The First Class Integration of Solr with Hadooplucenerevolution
 

Mehr von lucenerevolution (20)

Text Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and LuceneText Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and Lucene
 
State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here!
 
Search at Twitter
Search at TwitterSearch at Twitter
Search at Twitter
 
Building Client-side Search Applications with Solr
Building Client-side Search Applications with SolrBuilding Client-side Search Applications with Solr
Building Client-side Search Applications with Solr
 
Integrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationsIntegrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applications
 
Scaling Solr with SolrCloud
Scaling Solr with SolrCloudScaling Solr with SolrCloud
Scaling Solr with SolrCloud
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud Clusters
 
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and ParboiledImplementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
 
Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs
 
Enhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchEnhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic search
 
Real-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and StormReal-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and Storm
 
Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?
 
Schemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST APISchemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST API
 
High Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with LuceneHigh Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with Lucene
 
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMText Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
 
Faceted Search with Lucene
Faceted Search with LuceneFaceted Search with Lucene
Faceted Search with Lucene
 
Turning search upside down
Turning search upside downTurning search upside down
Turning search upside down
 
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
 
Shrinking the haystack wes caldwell - final
Shrinking the haystack   wes caldwell - finalShrinking the haystack   wes caldwell - final
Shrinking the haystack wes caldwell - final
 
The First Class Integration of Solr with Hadoop
The First Class Integration of Solr with HadoopThe First Class Integration of Solr with Hadoop
The First Class Integration of Solr with Hadoop
 

Kürzlich hochgeladen

Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 

Kürzlich hochgeladen (20)

Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 

Advanced query parsing techniques

  • 1. Advanced Relevancy Ranking Paul Nelson Chief Architect / Search Technologies
  • 2. 2 Search Technologies Overview • Formed June 2005 • Over 100 employees and growing • Over 400 customers worldwide • Presence in US, Latin America, UK & Germany • Deep enterprise search expertise • Consistent revenue growth and profitability • Search Engine Independent
  • 3. 3 Lucene Relevancy: Simple Operators • term(A)  TF(A) * IDF(A) • Implemented with DefaultSimilarity / TermQuery • TF(A) = sqrt(termInDocCount) • IDF(A) = log(totalDocsInCollection/(docsWithTermCount+1)) + 1.0 • and(A,B)  A * B • Implemented with BooleanQuery() • or(A, B)  A + B • Implemented with BooleanQuery() • max(A, B)  max(A, B) • Implemented with DisjunctionMaxQuery() 3
  • 4. 4 Simple Operators - Example and or max george martha washington custis 0.10 0.20 0.60 0.90 0.1 + 0.2 = 0.30 max(0, 0.9) = 0.90 0.3 * 0.9 = 0.27
  • 5. 5 Less Used Operators • boost(f, A)  (A * f) • Implemented with Query.setBoost(f) • constant(f, A)  if(A) then f else 0.0 • Implemented with ConstantScoreQuery() • boostPlus(A, B)  if(A) then (A + B) else 0.0 • Implemented with BooleanQuery() • boostMul(f, A, B)  if(B) then (A * f) else A • Implemented with BoostingQuery() 5
  • 6. 6 Problem: Need for More Flexibility • Difficult / impossible to use all operators • Many not available in standard query parsers • Complex expressions = string manipulation • This is messy • Query construction is in the application layer • Your UI programmer is creating query expressions? • Seriously? • Hard to create and use new operators • Requires modifying query parsers - yuck 6
  • 8. 8 Introducing: QPL • Query Processing Language • Domain Specific Language for Constructing Queries • Built on Groovy • https://wiki.searchtechnologies.com/index.php/QPL_Home_Page • Solr Plug-Ins • Query Parser • Search Component • “The 4GL for Text Search Query Expressions” • Server-side Solr Access • Cores, Analyzers, Embedded Search, Results XML 8
  • 10. 10 QPL Configuration – solrconfig.xml <queryParser name="qpl" class="com.searchtechnologies.qpl.solr.QPLSolrQParserPlugin"> <str name="scriptFile">parser.qpl</str> <str name="defaultField">text</str> </queryParser> <searchComponent name="qplSearchFirst" class="com.searchtechnologies.qpl.solr.QPLSearchComponent"> <str name="scriptFile">search.qpl</str> <str name="defaultField">text</str> <str name="isProcessScript">false</str> </searchComponent> Query Parser Configuration: Search Component Configuration:
  • 11. 11 QPL Example #1 myTerms = solr.tokenize(query); phraseQ = phrase(myTerms); andQ = and(myTerms); return phraseQ^3.0 | andQ^2.0 | orQ; Tokenize: Phrase Query: And Query: Put It All Together: orQ = (myTerms.size() <= 2) ? null : orMin( (myTerms.size()+1)/2, myTerms); Or Query:
  • 12. 12 Thesaurus Example #2 myTerms = solr.tokenize(query); thes = Thesaurus.load("thesaurus.xml") thesQ = thes.expand(0.8f, solr.tokenizer("text"), myTerms); return and(thesQ); Tokenize: Load Thesaurus: (cached) Thesaurus Expansion: Put It All Together: Original Query: bathroom humor [or(bathroom, loo^0.8, wc^0.8), or(humor, jokes^0.8)]
  • 13. 13 More Operators Boolean Query Parser: pQ = parseQuery("(george or martha) near/5 washington") Relevancy Ranking Operators: q1 = boostPlus(query, optionalQ) q2 = boostMul(0.5, query, optionalQ) q3 = constant(0.5, query) Composite Queries: compQ = and(compositeMax( ["title":1.5, "body":0.8], "george", "washington"))
  • 14. 14 News Feed Use Case 14 Order Documents Date 1 markets+terms Today 2 markets Today 3 terms Today 4 companies Today 5 markets+terms Yesterday 6 markets Yesterday 7 terms Yesterday 8 companies Yesterday 9 markets, companies older
  • 15. 15 News Feed Use Case – Step 1 markets = split(solr.markets, "s*;s*") marketsQ = field("markets", or(markets)); terms = solr.tokenize(query); termsQ = field("body", or(thesaurus.expand(0.9f, terms))) compIds = split(solr.compIds, "s*;s*") compIdsQ = field("companyIds", or(compIds)) Segments: Terms: Companies:
  • 16. 16 News Feed Use Case – Step 2 todayDate = sdf.format(c.getTime()) todayQ = field("date_s",todayDate) c.add(Calendar.DAY_OF_MONTH, -1) yesterdayDate = sdf.format(c.getTime()) yesterdayQ = field("date_s",yesterdayDate) Today: Yesterday: sdf = new SimpleDateFormat("yyyy-MM-dd") cal = Calendar.getInstance()
  • 17. 17 News Feed Use Case 17 Order Documents Date 1 markets+terms Today 2 markets Today 3 terms Today 4 companies Today 5 markets+terms Yesterday 6 markets Yesterday 7 terms Yesterday 8 companies Yesterday 9 markets, companies older
  • 18. 18 News Feed Use Case – Step 3 sq1 = constant(4.0, and(marketsQ, termsQ)) sq2 = constant(3.0, marketsQ) sq3 = constant(2.0, termsQ) sq4 = constant(1.0, compIdsQ) subjectQ = max(sq1, sq2, sq3, sq4) tq1 = constant(10.0, todayQ) tq2 = constant(1.0, yesterdayQ) timeQ = max(tq1, tq2) recentQ = and(subjectQ, timeQ) Weighted Subject Queries: Weighted Time Queries: Put it All Together: return max(recentQ, or(marketsQ,compIdsQ)^0.01))
  • 19. 19 Embedded Search Example #1 results = solr.search('subjectsCore', or(qTerms), 50) subjectsQ = or(results*.subjectId) return field("title", and(qTerms)) | subjectsQ^0.9; Execute an Embedded Search: Create a query from the results: Put it all together: qTerms = solr.tokenize(qTerms);
  • 20. 20 Embedded Search Example #2 results = solr.search('categories', and(qTerms), 10) myList = solr.newList(); myList.add("relatedCategories", results*.title); solr.addResponse(myList) Execute an Embedded Search: Create a Solr named list: Add it to the XML response: qTerms = solr.tokenize(qTerms);
  • 21. 21 Other Features • Embedded Grouping Queries • Oh yes they did! • Proximity operators • ADJ, NEAR/#, BEFORE/# • Reverse Lemmatizer • Prefers exact matches over variants • Transformer • Applies transformations recursively to query trees 21
  • 22. 22 Solr Query Processing Language 22 User Interface QPL Engine Search Data as entered by user Boolean Query Expression QPL Script Application Dev Team Search Team
  • 23. 23 Solr QPL: Using External Sources to Build Queries 23 User Interface QPL Engine Search QPL Script RDBMS Other Indexes Thesaurus