Advanced Query Parsing Techniques
Aruna Kumar Pamulapati (Arun)
Technical Consultant
Search Technologies Overview
Formed June 2005
Over 100 employees and growing
Over 500 customers worldwide
Presence in US, Latin America, UK & Germany
Deep enterprise search expertise
Consistent revenue growth and profitability
Search Engine Independent


The expert in the search space
Lucene Relevancy: Simple Operators
term(A) → TF(A) * IDF(A)
Implemented with DefaultSimilarity / TermQuery
TF(A) = sqrt(termInDocCount)
IDF(A) = log(totalDocsInCollection / (docsWithTermCount + 1)) + 1.0

and(A, B) → A * B
Implemented with BooleanQuery()

or(A, B) → A + B
Implemented with BooleanQuery()

max(A, B) → max(A, B)
Implemented with DisjunctionMaxQuery()
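
As a rough illustration of the formulas on this slide, the sketch below restates them in Python. It assumes `log` is the natural logarithm (as in Java's `Math.log`) and that a score of 0.0 stands for "no match"; the function names are made up for this example and are not Lucene or QPL calls.

```python
import math

def tf(term_in_doc_count):
    # TF(A) = sqrt(termInDocCount)
    return math.sqrt(term_in_doc_count)

def idf(total_docs_in_collection, docs_with_term_count):
    # IDF(A) = log(totalDocsInCollection / (docsWithTermCount + 1)) + 1.0
    # Assumption: natural logarithm.
    return math.log(total_docs_in_collection / (docs_with_term_count + 1)) + 1.0

def term(term_in_doc_count, total_docs_in_collection, docs_with_term_count):
    # term(A) -> TF(A) * IDF(A)
    return tf(term_in_doc_count) * idf(total_docs_in_collection, docs_with_term_count)

def and_(a, b):
    # and(A, B) -> A * B, following the slide's simplified model
    return a * b

def or_(a, b):
    # or(A, B) -> A + B
    return a + b

def max_(a, b):
    # max(A, B) -> max(A, B)
    return max(a, b)
```

A term that occurs 4 times in a document and appears in 9 of 100 documents would get `term(4, 100, 9)`, that is `2.0 * (ln(10) + 1.0)`.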

Simple Operators - Example
and: 0.3 * 0.9 = 0.27
  or: 0.1 + 0.2 = 0.30
  max: max(0, 0.9) = 0.90

Terms: george, martha, washington, custis
Term scores shown: 0.10, 0.20, 0.60, 0.90
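
The arithmetic of this example can be replayed with a small expression-tree evaluator. This is only a sketch: the nested-tuple representation and the `evaluate` helper are invented for illustration, and the leaf values are the inputs printed in the three sub-expressions on the slide.

```python
import math

# Operator table following the deck's simplified model:
# and multiplies, or adds, max takes the largest child score.
OPS = {
    "and": math.prod,
    "or": sum,
    "max": max,
}

def evaluate(node):
    """Evaluate a nested (operator, child, child, ...) tuple; numbers are leaf scores."""
    if isinstance(node, (int, float)):
        return float(node)
    op, *children = node
    return OPS[op]([evaluate(child) for child in children])

# and( or(0.10, 0.20), max(0, 0.90) ), using the inputs shown on the slide
tree = ("and", ("or", 0.10, 0.20), ("max", 0.0, 0.90))
score = evaluate(tree)
```

Up to floating-point rounding, `score` matches the 0.27 shown for the top-level `and`.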

Less Used Operators
boost(f, A) → (A * f)
Implemented with Query.setBoost(f)

constant(f, A) → if(A) then f else 0.0
Implemented with ConstantScoreQuery()

boostPlus(A, B) → if(A) then (A + B) else 0.0
Implemented with BooleanQuery()

boostMul(f, A, B) → if(B) then (A * f) else A
Implemented with BoostingQuery()
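
The four definitions above can be written down directly as score functions. The sketch below assumes that `if(A)` means "A matched", modelled here as a score greater than 0.0; the Python names are illustrative only and do not correspond to Lucene classes.

```python
def boost(f, a):
    # boost(f, A) -> A * f
    return a * f

def constant(f, a):
    # constant(f, A) -> f if A matched, else 0.0
    return f if a > 0.0 else 0.0

def boost_plus(a, b):
    # boostPlus(A, B) -> A + B if A matched, else 0.0
    # B can only raise the score; it never makes a document match on its own.
    return a + b if a > 0.0 else 0.0

def boost_mul(f, a, b):
    # boostMul(f, A, B) -> A * f if B matched, else A
    return a * f if b > 0.0 else a
```

Note the asymmetry: in `boost_plus` the first argument decides whether the document matches at all, while in `boost_mul` the last argument only decides whether the factor `f` is applied.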

Problem: Need for More Flexibility
Difficult / impossible to use all operators
Many not available in standard query parsers

Complex expressions = string manipulation
This is messy

Query construction is in the application layer
Your UI programmer is creating query expressions?
Seriously?

Hard to create and use new operators
Requires modifying query parsers - yuck

Query Processing Language
[Diagram: User Interface → Solr (QPL Engine, driven by a QPL Script) → Search]

Introducing: QPL
Query Processing Language
Domain Specific Language for Constructing Queries
Built on Groovy
https://wiki.searchtechnologies.com/index.php/QPL_Home_Page

Solr Plug-Ins
Query Parser
Search Component

“The 4GL for Text Search Query Expressions”
Server-side Solr Access
Cores, Analyzers, Embedded Search, Results XML

Solr Plug-Ins

QPL Configuration – solrconfig.xml
Query Parser Configuration:
<queryParser name="qpl"
    class="com.searchtechnologies.qpl.solr.QPLSolrQParserPlugin">
  <str name="scriptFile">parser.qpl</str>
  <str name="defaultField">text</str>
</queryParser>

Search Component Configuration:
<searchComponent name="qplSearchFirst"
    class="com.searchtechnologies.qpl.solr.QPLSearchComponent">
  <str name="scriptFile">search.qpl</str>
  <str name="defaultField">text</str>
  <str name="isProcessScript">false</str>
</searchComponent>

QPL Example #1
Tokenize:
myTerms = solr.tokenize(query);
Phrase Query:
phraseQ = phrase(myTerms);
And Query:
andQ = and(myTerms);
Or Query:
orQ = (myTerms.size() <= 2) ? null :
orMin( (myTerms.size()+1)/2, myTerms);

Put It All Together:
return phraseQ^3.0 | andQ^2.0 | orQ;
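
To make the three tiers concrete, here is a toy Python model of which tiers a document would satisfy. It is a sketch under assumptions: `orMin(n, terms)` is read as "at least n of the terms must match", the threshold `(size + 1) / 2` is taken with integer division, and all helper names are hypothetical rather than QPL functions.

```python
def or_min_threshold(term_count):
    # Mirrors the slide: no OR tier for one or two terms,
    # otherwise at least half of the terms (rounded up) must match.
    if term_count <= 2:
        return None
    return (term_count + 1) // 2

def contains_phrase(terms, tokens):
    # True when the terms occur contiguously and in order.
    n = len(terms)
    return any(tokens[i:i + n] == terms for i in range(len(tokens) - n + 1))

def matched_tiers(terms, tokens):
    # Returns the tiers (strongest first) that a tokenized document satisfies.
    tiers = []
    if contains_phrase(terms, tokens):
        tiers.append("phrase")   # boosted ^3.0 on the slide
    if all(t in tokens for t in terms):
        tiers.append("and")      # boosted ^2.0 on the slide
    threshold = or_min_threshold(len(terms))
    if threshold is not None and sum(t in tokens for t in terms) >= threshold:
        tiers.append("orMin")    # unboosted fallback
    return tiers
```

For example, `matched_tiers(["george", "washington", "carver"], "washington met george".split())` only reaches the `orMin` tier, because two of the three terms are present but not all of them.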

Thesaurus Example #2
Tokenize:
myTerms = solr.tokenize(query);
Load Thesaurus: (cached)
thes = Thesaurus.load("thesaurus.xml")

Thesaurus Expansion:
thesQ = thes.expand(0.8f,
    solr.tokenizer("text"), myTerms);

Original Query: bathroom humor
Expanded (thesQ): [or(bathroom, loo^0.8, wc^0.8), or(humor, jokes^0.8)]

Put It All Together:
return and(thesQ);
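
The expansion step can be pictured as a dictionary lookup per term. The sketch below is not the QPL `Thesaurus` class: the in-memory `THESAURUS` dictionary stands in for `thesaurus.xml` (whose format is not shown in the deck), and `expand` and `render` are hypothetical helpers that reproduce the slide's output for the sample query.

```python
# Hypothetical stand-in for the contents of thesaurus.xml.
THESAURUS = {
    "bathroom": ["loo", "wc"],
    "humor": ["jokes"],
}

def expand(weight, terms, thesaurus):
    # One clause per original term: the term itself at full weight,
    # followed by its synonyms at the reduced weight.
    return [[(t, 1.0)] + [(s, weight) for s in thesaurus.get(t, [])] for t in terms]

def render(clauses):
    # Print each clause as or(term, synonym^weight, ...).
    def item(term, w):
        return term if w == 1.0 else f"{term}^{w}"
    return "[" + ", ".join(
        "or(" + ", ".join(item(t, w) for t, w in clause) + ")" for clause in clauses
    ) + "]"

expanded = expand(0.8, ["bathroom", "humor"], THESAURUS)
```

Here `render(expanded)` gives `[or(bathroom, loo^0.8, wc^0.8), or(humor, jokes^0.8)]`, the list that the slide then wraps in `and(...)`.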

More Operators
Boolean Query Parser:
pQ = parseQuery("(george or martha) near/5 washington")

Relevancy Ranking Operators:
q1 = boostPlus(query, optionalQ)
q2 = boostMul(0.5, query, optionalQ)
q3 = constant(0.5, query)
Composite Queries:
compQ = and(compositeMax(
["title":1.5, "body":0.8],
"george", "washington"))
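
The deck does not spell out what `compositeMax` produces. One plausible reading, offered purely as an assumption, is that each term becomes a `max` over the weighted fields, and `and` then combines the per-term clauses. The Python sketch below only builds the resulting expression as text; `composite_max` and `and_` are illustrative names, not the QPL implementation.

```python
def composite_max(field_weights, *terms):
    # Assumed behaviour: for every term, take the best-scoring weighted field.
    return [
        "max(" + ", ".join(f"{field}:{term}^{weight}"
                           for field, weight in field_weights.items()) + ")"
        for term in terms
    ]

def and_(clauses):
    # Combine the per-term clauses so that every term must match somewhere.
    return "and(" + ", ".join(clauses) + ")"

expression = and_(composite_max({"title": 1.5, "body": 0.8}, "george", "washington"))
```

Under this reading, a document matches only if both terms occur in at least one of the fields, and a title hit counts more than a body hit.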

News Feed Use Case
Order  Documents           Date
1      markets+terms       Today
2      markets             Today
3      terms               Today
4      companies           Today
5      markets+terms       Yesterday
6      markets             Yesterday
7      terms               Yesterday
8      companies           Yesterday
9      markets, companies  older

News Feed Use Case – Step 1
Segments:
markets = split(solr.markets, "\\s*;\\s*")
marketsQ = field("markets", or(markets));
Terms:
terms = solr.tokenize(query);
termsQ = field("body",
    or(thesaurus.expand(0.9f, terms)))
Companies:
compIds = split(solr.compIds, "\\s*;\\s*")
compIdsQ = field("companyIds", or(compIds))
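
The `split` calls turn a semicolon-separated request parameter into a list of values. A minimal Python sketch of that step follows, assuming the separator pattern is meant to be optional whitespace around a semicolon; `split_segments` and the sample values are hypothetical.

```python
import re

def split_segments(raw):
    # Split on ';' and swallow any whitespace around it; drop empty entries.
    return [part for part in re.split(r"\s*;\s*", raw.strip()) if part]

markets = split_segments("equities ; bonds;fx ")
```

The resulting list (`equities`, `bonds`, `fx`) is what would then be wrapped in `or(...)` and restricted to a field.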

News Feed Use Case – Step 2
import java.text.SimpleDateFormat

sdf = new SimpleDateFormat("yyyy-MM-dd")
cal = Calendar.getInstance()

Today:
todayDate = sdf.format(cal.getTime())
todayQ = field("date_s", todayDate)
Yesterday:
cal.add(Calendar.DAY_OF_MONTH, -1)
yesterdayDate = sdf.format(cal.getTime())
yesterdayQ = field("date_s", yesterdayDate)
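
The date handling amounts to formatting today and the day before as `yyyy-MM-dd` strings that can be matched against the `date_s` field. The same idea in Python, as a sketch with a hypothetical `day_keys` helper:

```python
from datetime import date, timedelta

def day_keys(today):
    # Return (today, yesterday) formatted as yyyy-MM-dd strings.
    fmt = "%Y-%m-%d"
    yesterday = today - timedelta(days=1)
    return today.strftime(fmt), yesterday.strftime(fmt)

today_key, yesterday_key = day_keys(date(2024, 3, 1))
```

Passing the date in explicitly, rather than reading the clock inside the function, keeps month and leap-year boundaries easy to check.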

News Feed Use Case – Step 3
Weighted Subject Queries:
sq1 = constant(4.0, and(marketsQ, termsQ))
sq2 = constant(3.0, marketsQ)
sq3 = constant(2.0, termsQ)
sq4 = constant(1.0, compIdsQ)
subjectQ = max(sq1, sq2, sq3, sq4)
Weighted Time Queries:
tq1 = constant(10.0, todayQ)
tq2 = constant(1.0, yesterdayQ)
timeQ = max(tq1, tq2)
Put it All Together:
recentQ = and(subjectQ, timeQ)
return max(recentQ, or(marketsQ, compIdsQ)^0.01)
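
To check that these weights produce the ordering in the use-case table, the sketch below scores the nine document types with the deck's simplified operator model (`and` multiplies, `or` adds, `max` takes the largest). It assumes each matching sub-query contributes a raw score of 1.0 before weighting; the document records and helper names are invented for the example.

```python
def constant(f, matched):
    # constant(f, A): f when A matches, otherwise 0.0
    return f if matched else 0.0

def score(doc):
    m, t, c = doc["markets"], doc["terms"], doc["companies"]
    subject = max(constant(4.0, m and t), constant(3.0, m),
                  constant(2.0, t), constant(1.0, c))
    time = max(constant(10.0, doc["date"] == "today"),
               constant(1.0, doc["date"] == "yesterday"))
    recent = subject * time                       # and(subjectQ, timeQ)
    # or(marketsQ, compIdsQ)^0.01, assuming a raw score of 1.0 per matching clause
    fallback = 0.01 * ((1.0 if m else 0.0) + (1.0 if c else 0.0))
    return max(recent, fallback)

def doc(markets, terms, companies, date):
    return {"markets": markets, "terms": terms, "companies": companies, "date": date}

# Rows 1-9 of the use-case table, in order.
DOCS = [
    doc(True, True, False, "today"),
    doc(True, False, False, "today"),
    doc(False, True, False, "today"),
    doc(False, False, True, "today"),
    doc(True, True, False, "yesterday"),
    doc(True, False, False, "yesterday"),
    doc(False, True, False, "yesterday"),
    doc(False, False, True, "yesterday"),
    doc(True, False, True, "older"),
]
scores = [score(d) for d in DOCS]
```

Under these assumptions the scores fall strictly from row 1 to row 9: the time weight of 10.0 keeps every "today" document above every "yesterday" document, and the 0.01 fallback keeps older market or company matches in the result list at the bottom.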

BT RLP Tokenizer Use Case – Step 1
Define field type:
<tokenizer
    class="com.basistech.rlp.solr.RLPTokenizerFactory"
    rlpContext="<PATH>rlp-context-bl1.xml"
    postAltLemmas="false"
    lang="eng"
    postPartOfSpeech="false"/>

QPL Expansion:
finalExpandedQuery = transform(queryTerms,
  [ TERM: { ctx ->
      def btCustomTokens = solr.tokenize("subject_bt", ctx.op.term)
      if (btCustomTokens.size() > 1)
        return or(term(btCustomTokens[0])^1.5, or(btCustomTokens[1..-1]));
      else
        return ctx.op;
  } ]
);

BT RLP Tokenizer Use Case – Step 2
Original User Query:

following is "presentation on QPL"

QPL Parsed:
and(and(term(following),term(is)),
phrase(term(presentation),term(on),term(QPL)))

BT Expansion + QPL Transformation :
and(and(or(term(following)^1.5,term(follow)),
        or(term(is)^1.5,term(be))),
    phrase(term(presentation),term(on),term(QPL)))
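
The transformation can be mimicked with a short recursive tree walk. This is a sketch, not the QPL transformer: query nodes are plain tuples, and the `LEMMAS` table is a hypothetical stand-in for what the RLP-based `subject_bt` analyzer returns (the original token first, followed by its lemma).

```python
# Hypothetical analyzer output: original token first, then its lemma.
LEMMAS = {
    "following": ["following", "follow"],
    "is": ["is", "be"],
}

def tokenize(term):
    return LEMMAS.get(term, [term])

def transform(node):
    kind = node[0]
    if kind == "term":
        tokens = tokenize(node[1])
        if len(tokens) > 1:
            # Prefer the exact form (^1.5) over the lemma variants.
            return ("or", [("boost", ("term", tokens[0]), 1.5)]
                          + [("term", t) for t in tokens[1:]])
        return node
    if kind == "boost":
        return ("boost", transform(node[1]), node[2])
    # and / or / phrase: rewrite the children recursively.
    return (kind, [transform(child) for child in node[1]])

def render(node):
    kind = node[0]
    if kind == "term":
        return f"term({node[1]})"
    if kind == "boost":
        return f"{render(node[1])}^{node[2]}"
    return f"{kind}(" + ",".join(render(child) for child in node[1]) + ")"

parsed = ("and", [
    ("and", [("term", "following"), ("term", "is")]),
    ("phrase", [("term", "presentation"), ("term", "on"), ("term", "QPL")]),
])
expanded = render(transform(parsed))
```

With this lemma table, `expanded` equals the "BT Expansion + QPL Transformation" expression above; terms without a second token, including those inside the phrase, pass through unchanged.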

BT RLP Tokenizer Use Case – Step 3
and
  and
    or
      Following ^1.5
      follow
    or
      is ^1.5
      be
  phrase
    Presentation on QPL

Embedded Search Example #1
qTerms = solr.tokenize(qTerms);

Execute an Embedded Search:
results = solr.search('subjectsCore', or(qTerms), 50)

Create a query from the results:
subjectsQ = or(results*.subjectId)

Put it all together:
return field("title", and(qTerms)) | subjectsQ^0.9;
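
The pattern here is a two-stage search: query a side index first, then fold its results into the main query. The Python sketch below imitates it with an in-memory list. `SUBJECTS_CORE`, `search`, and `build_query` are all hypothetical, and the final query is only rendered as text in the slide's notation.

```python
# Hypothetical contents of the side index ('subjectsCore' on the slide).
SUBJECTS_CORE = [
    {"subjectId": "S1", "text": "american presidents"},
    {"subjectId": "S2", "text": "american revolution"},
    {"subjectId": "S3", "text": "french cooking"},
]

def search(core, terms, limit):
    # Toy OR search: keep documents containing any of the terms, up to the limit.
    hits = [d for d in core if any(t in d["text"].split() for t in terms)]
    return hits[:limit]

def build_query(terms):
    results = search(SUBJECTS_CORE, terms, 50)
    # Equivalent of Groovy's spread operator: results*.subjectId
    subject_ids = [r["subjectId"] for r in results]
    title_q = "field(title, and(" + ", ".join(terms) + "))"
    subjects_q = "or(" + ", ".join(subject_ids) + ")^0.9"
    return f"{title_q} | {subjects_q}"

query_text = build_query(["american", "presidents"])
```

The user's terms must all match in the title, or the document must carry one of the subject ids found by the embedded search, with the latter weighted slightly lower.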

Embedded Search Example #2
qTerms = solr.tokenize(qTerms);

Execute an Embedded Search:
results = solr.search('categories', and(qTerms), 10)

Create a Solr named list:
myList = solr.newList();
myList.add("relatedCategories", results*.title);

Add it to the XML response:
solr.addResponse(myList)

Other Features
Embedded Grouping Queries
Oh yes they did!

Proximity operators
ADJ, NEAR/#, BEFORE/#

Reverse Lemmatizer
Prefers exact matches over variants

Transformer
Applies transformations recursively to query trees

Query Processing Language
[Diagram: the Application Dev Team owns the User Interface, which passes the data as entered by the user to Solr. The Search Team owns the QPL Engine and QPL Script inside Solr, which send a Boolean Query Expression to Search.]

Query Processing Language
[Diagram: User Interface → Solr (QPL Engine, driven by a QPL Script) → Search. The diagram also shows an RDBMS, Other Indexes, and a Thesaurus.]

More on QPL…

http://www.searchtechnologies.com/query-parsing-language.html

THANK YOU
Contact: apamulapati@searchtechnologies.com
www.searchtechnologies.com

Valere | Digital Solutions & AI Transformation Portfolio | 2024Valere | Digital Solutions & AI Transformation Portfolio | 2024
Valere | Digital Solutions & AI Transformation Portfolio | 2024
 
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDEADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
 
UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7
 
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationUsing IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
 
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
 
Introduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxIntroduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptx
 
Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™
 

Advanced Query Parsing Techniques

  • 1. Advanced Query Parsing Techniques
       Aruna Kumar Pamulapati (Arun), Technical Consultant
  • 2. Search Technologies Overview
       Formed June 2005
       Over 100 employees and growing
       Over 500 customers worldwide
       Presence in US, Latin America, UK & Germany
       Deep enterprise search expertise
       Consistent revenue growth and profitability
       Search Engine Independent
       The expert in the search space
  • 3. Lucene Relevancy: Simple Operators
       term(A) → TF(A) * IDF(A)
         Implemented with DefaultSimilarity / TermQuery
         TF(A) = sqrt(termInDocCount)
         IDF(A) = log(totalDocsInCollection/(docsWithTermCount+1)) + 1.0
       and(A, B) → A * B
         Implemented with BooleanQuery()
       or(A, B) → A + B
         Implemented with BooleanQuery()
       max(A, B) → max(A, B)
         Implemented with DisjunctionMaxQuery()
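The term formula on this slide can be checked numerically. The following is a minimal Python sketch, not Lucene code: the function names are illustrative, the logarithm is assumed to be the natural logarithm, and the sample counts are invented for the example.

```python
import math


def tf(term_in_doc_count):
    # TF(A) = sqrt(termInDocCount), as written on the slide.
    return math.sqrt(term_in_doc_count)


def idf(total_docs_in_collection, docs_with_term_count):
    # IDF(A) = log(totalDocsInCollection / (docsWithTermCount + 1)) + 1.0
    # Assumption: natural logarithm.
    return math.log(total_docs_in_collection / (docs_with_term_count + 1)) + 1.0


def term_score(term_in_doc_count, total_docs_in_collection, docs_with_term_count):
    # term(A) -> TF(A) * IDF(A)
    return tf(term_in_doc_count) * idf(total_docs_in_collection, docs_with_term_count)


# Invented sample: the term occurs 4 times in the document and in 9 of 1000 documents.
sample_score = term_score(4, 1000, 9)
```

A rarer term (smaller docsWithTermCount) yields a larger IDF and therefore a larger score for the same in-document count.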
  • 4. Simple Operators - Example
       and: 0.3 * 0.9 = 0.27
         or: 0.1 + 0.2 = 0.30
           george 0.10
           martha 0.20
         max: max(0, 0.9) = 0.90
           washington 0.60
           custis 0.90
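The arithmetic in this example can be reproduced with a few lines of Python. This sketch uses the simplified operator semantics from the slides (and multiplies, or adds, max takes the maximum) and the four leaf scores shown. The slide writes max(0, 0.9); using the listed leaf score of 0.60 for washington gives the same maximum.

```python
def and_(*scores):
    # and(A, B) -> A * B in the slides' simplified model.
    product = 1.0
    for score in scores:
        product *= score
    return product


def or_(*scores):
    # or(A, B) -> A + B in the slides' simplified model.
    return sum(scores)


leaf = {"george": 0.10, "martha": 0.20, "washington": 0.60, "custis": 0.90}

or_score = or_(leaf["george"], leaf["martha"])
max_score = max(leaf["washington"], leaf["custis"])
root_score = and_(or_score, max_score)

print(round(or_score, 2), round(max_score, 2), round(root_score, 2))
# prints: 0.3 0.9 0.27
```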
  • 5. Less Used Operators
       boost(f, A) → (A * f)
         Implemented with Query.setBoost(f)
       constant(f, A) → if(A) then f else 0.0
         Implemented with ConstantScoreQuery()
       boostPlus(A, B) → if(A) then (A + B) else 0.0
         Implemented with BooleanQuery()
       boostMul(f, A, B) → if(B) then (A * f) else A
         Implemented with BoostingQuery()
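The four definitions above translate directly into score arithmetic. In this Python sketch, which is an illustration rather than Lucene code, a score of 0.0 stands for "no match", so `if(A)` is read as "A scored above zero"; that reading is an assumption.

```python
def boost(f, a):
    # boost(f, A) -> A * f
    return a * f


def constant(f, a):
    # constant(f, A) -> f if A matched, else 0.0
    return f if a > 0.0 else 0.0


def boost_plus(a, b):
    # boostPlus(A, B) -> A + B if A matched, else 0.0 (B alone never matches)
    return a + b if a > 0.0 else 0.0


def boost_mul(f, a, b):
    # boostMul(f, A, B) -> A * f if B matched, else A unchanged
    return a * f if b > 0.0 else a
```

For example, `boost_plus(0.0, 0.7)` is 0.0 because the optional clause B cannot produce a hit on its own, while `boost_mul(0.5, 0.8, 0.3)` scales the main score when B matches.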
  • 6. Problem: Need for More Flexibility
       Difficult / impossible to use all operators
         Many not available in standard query parsers
       Complex expressions = string manipulation
         This is messy
       Query construction is in the application layer
         Your UI programmer is creating query expressions? Seriously?
       Hard to create and use new operators
         Requires modifying query parsers - yuck
  • 8. Introducing: QPL
       Query Processing Language
       Domain Specific Language for Constructing Queries
       Built on Groovy
       https://wiki.searchtechnologies.com/index.php/QPL_Home_Page
       Solr Plug-Ins
         Query Parser
         Search Component
       “The 4GL for Text Search Query Expressions”
       Server-side Solr Access
         Cores, Analyzers, Embedded Search, Results XML
  • 9. Solr Plug-Ins
  • 10. QPL Configuration – solrconfig.xml
       Query Parser Configuration:
         <queryParser name="qpl" class="com.searchtechnologies.qpl.solr.QPLSolrQParserPlugin">
           <str name="scriptFile">parser.qpl</str>
           <str name="defaultField">text</str>
         </queryParser>
       Search Component Configuration:
         <searchComponent name="qplSearchFirst" class="com.searchtechnologies.qpl.solr.QPLSearchComponent">
           <str name="scriptFile">search.qpl</str>
           <str name="defaultField">text</str>
           <str name="isProcessScript">false</str>
         </searchComponent>
  • 11. QPL Example #1
       Tokenize:
         myTerms = solr.tokenize(query);
       Phrase Query:
         phraseQ = phrase(myTerms);
       And Query:
         andQ = and(myTerms);
       Or Query:
         orQ = (myTerms.size() <= 2) ? null : orMin( (myTerms.size()+1)/2, myTerms);
       Put It All Together:
         return phraseQ^3.0 | andQ^2.0 | orQ;
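The orMin clause above can be read as "at least half of the terms must match". The Python sketch below illustrates that reading only. It assumes that orMin(n, terms) matches when at least n of the terms are present and that the threshold (size + 1) / 2 is truncated to an integer; the slide states neither detail, and the function names are hypothetical.

```python
def min_should_match(term_count):
    # Mirrors the slide: no relaxed "or" clause for queries of one or two terms.
    if term_count <= 2:
        return None
    # Assumption: integer (floor) division for the threshold.
    return (term_count + 1) // 2


def or_min_matches(minimum, terms, doc_tokens):
    # Assumed orMin semantics: true when at least `minimum` terms occur in the document.
    present = sum(1 for term in terms if term in doc_tokens)
    return present >= minimum
```

Under these assumptions a five-term query needs three matching terms, so the relaxed clause still filters out documents that share only one or two words with the query.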
  • 12. Thesaurus Example #2
       Tokenize:
         myTerms = solr.tokenize(query);
       Load Thesaurus (cached):
         thes = Thesaurus.load("thesaurus.xml")
       Thesaurus Expansion:
         thesQ = thes.expand(0.8f, solr.tokenizer("text"), myTerms);
       Put It All Together:
         return and(thesQ);
       Original Query: bathroom humor
         [or(bathroom, loo^0.8, wc^0.8), or(humor, jokes^0.8)]
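The expansion shown for "bathroom humor" can be imitated with a small Python sketch. It is not the QPL Thesaurus class: the dictionary is a hypothetical in-memory stand-in for thesaurus.xml, queries are built as plain strings, and leaving terms without synonyms unchanged is an assumption.

```python
# Hypothetical stand-in for the contents of thesaurus.xml.
THESAURUS = {
    "bathroom": ["loo", "wc"],
    "humor": ["jokes"],
}


def expand(weight, terms, thesaurus):
    # For each term, build or(term, synonym^weight, ...); terms without
    # synonyms are passed through unchanged (assumption).
    expanded = []
    for term in terms:
        synonyms = thesaurus.get(term, [])
        if synonyms:
            alternatives = [term] + [f"{synonym}^{weight}" for synonym in synonyms]
            expanded.append("or(" + ", ".join(alternatives) + ")")
        else:
            expanded.append(term)
    return expanded


thes_q = expand(0.8, ["bathroom", "humor"], THESAURUS)
print(thes_q)
# prints: ['or(bathroom, loo^0.8, wc^0.8)', 'or(humor, jokes^0.8)']
```

The original term keeps full weight while each synonym is down-weighted, so exact wording still ranks above thesaurus matches.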
  • 13. More Operators
       Boolean Query Parser:
         pQ = parseQuery("(george or martha) near/5 washington")
       Relevancy Ranking Operators:
         q1 = boostPlus(query, optionalQ)
         q2 = boostMul(0.5, query, optionalQ)
         q3 = constant(0.5, query)
       Composite Queries:
         compQ = and(compositeMax(
           ["title":1.5, "body":0.8],
           "george", "washington"))
  • 14. News Feed Use Case
       Order  Documents           Date
       1      markets+terms       Today
       2      markets             Today
       3      terms               Today
       4      companies           Today
       5      markets+terms       Yesterday
       6      markets             Yesterday
       7      terms               Yesterday
       8      companies           Yesterday
       9      markets, companies  older
  • 15. News Feed Use Case – Step 1
       Segments:
         markets = split(solr.markets, "\\s*;\\s*")
         marketsQ = field("markets", or(markets));
       Terms:
         terms = solr.tokenize(query);
         termsQ = field("body", or(thesaurus.expand(0.9f, terms)))
       Companies:
         compIds = split(solr.compIds, "\\s*;\\s*")
         compIdsQ = field("companyIds", or(compIds))
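The split calls in this step break a semicolon-separated request parameter into values while tolerating whitespace around each semicolon. A minimal Python illustration of that splitting, with an invented parameter value:

```python
import re


def split_param(value):
    # Split on a semicolon with optional whitespace on either side.
    return re.split(r"\s*;\s*", value)


markets = split_param("energy ; metals;  tech")
print(markets)
# prints: ['energy', 'metals', 'tech']
```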
  • 16. News Feed Use Case – Step 2
       sdf = new SimpleDateFormat("yyyy-MM-dd")
       c = Calendar.getInstance()
       Today:
         todayDate = sdf.format(c.getTime())
         todayQ = field("date_s", todayDate)
       Yesterday:
         c.add(Calendar.DAY_OF_MONTH, -1)
         yesterdayDate = sdf.format(c.getTime())
         yesterdayQ = field("date_s", yesterdayDate)
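The date handling in this step produces two yyyy-MM-dd strings, one for today and one for the day before. The same computation in Python, as a sketch with a fixed sample date so the month boundary is visible:

```python
from datetime import date, timedelta


def day_strings(today):
    # Format today and yesterday as yyyy-MM-dd, matching the date_s field values.
    yesterday = today - timedelta(days=1)
    return today.strftime("%Y-%m-%d"), yesterday.strftime("%Y-%m-%d")


today_date, yesterday_date = day_strings(date(2024, 3, 1))
print(today_date, yesterday_date)
# prints: 2024-03-01 2024-02-29
```

In live use the argument would be `date.today()`.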
  • 17. News Feed Use Case – Step 3
       Weighted Subject Queries:
         sq1 = constant(4.0, and(marketsQ, termsQ))
         sq2 = constant(3.0, marketsQ)
         sq3 = constant(2.0, termsQ)
         sq4 = constant(1.0, compIdsQ)
         subjectQ = max(sq1, sq2, sq3, sq4)
       Weighted Time Queries:
         tq1 = constant(10.0, todayQ)
         tq2 = constant(1.0, yesterdayQ)
         timeQ = max(tq1, tq2)
       Put it All Together:
         recentQ = and(subjectQ, timeQ)
         return max(recentQ, or(marketsQ, compIdsQ)^0.01)
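The weights in this step can be checked against the ordering required by the News Feed use case table. The Python sketch below applies the slides' simplified semantics (and multiplies, or adds, constant replaces a match with a fixed score). Two things are assumptions: documents are reduced to boolean match flags, and the raw score of a marketsQ or compIdsQ match in the fallback clause is taken as 1.0.

```python
def constant(f, matched):
    # constant(f, A) -> f if A matched, else 0.0
    return f if matched else 0.0


def score(doc):
    markets, terms, companies = doc["markets"], doc["terms"], doc["companies"]
    subject = max(
        constant(4.0, markets and terms),
        constant(3.0, markets),
        constant(2.0, terms),
        constant(1.0, companies),
    )
    time = max(
        constant(10.0, doc["day"] == "today"),
        constant(1.0, doc["day"] == "yesterday"),
    )
    recent = subject * time  # and(subjectQ, timeQ) -> product
    # Assumption: each raw match contributes 1.0 before the 0.01 boost.
    fallback = 0.01 * ((1.0 if markets else 0.0) + (1.0 if companies else 0.0))
    return max(recent, fallback)


def doc(day, markets=False, terms=False, companies=False):
    return {"day": day, "markets": markets, "terms": terms, "companies": companies}


# Documents listed in the order the use case expects them to be returned.
expected_order = [
    doc("today", markets=True, terms=True),
    doc("today", markets=True),
    doc("today", terms=True),
    doc("today", companies=True),
    doc("yesterday", markets=True, terms=True),
    doc("yesterday", markets=True),
    doc("yesterday", terms=True),
    doc("yesterday", companies=True),
    doc("older", markets=True, companies=True),
]

scores = [score(d) for d in expected_order]
```

Under these assumptions the scores fall strictly from 40.0 for today's markets+terms documents down to 0.02 for older documents, so the time weight of 10.0 keeps every document from today above every document from yesterday.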
  • 18. BT RLP Tokenizer Use Case – Step 1
       Define field type:
         <tokenizer class="com.basistech.rlp.solr.RLPTokenizerFactory"
                    rlpContext="<PATH>rlp-context-bl1.xml"
                    postAltLemmas="false" lang="eng" postPartOfSpeech="false"/>
       QPL Expansion:
         finalExpandedQuery = transform(queryTerms, [
           TERM:{ ctx ->
             def btCustomTokens = solr.tokenize("subject_bt", ctx.op.term)
             if(btCustomTokens.size() > 1)
               return or(term(btCustomTokens[0])^1.5, or(btCustomTokens[1..-1]));
             else
               return ctx.op;
           }
         ]);
  • 19. BT RLP Tokenizer Use Case – Step 2
       Original User Query:
         following is "presentation on QPL"
       QPL Parsed:
         and(and(term(following),term(is)), phrase(term(presentation),term(on),term(QPL)))
       BT Expansion + QPL Transformation:
         and(and(or(term(following)^1.5,term(follow)),or(term(is)^1.5,term(be))),phrase(term(presentation),term(on),term(QPL)))
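The transformation above walks the parsed query tree and rewrites each term node whose tokenizer output contains more than one token. The Python sketch below imitates that recursion on nested tuples. It is not the QPL transform API: the LEMMAS table is a hypothetical stand-in for the RLP tokenizer output, and the lemma alternatives are placed directly inside one or(...) node to match the expanded query shown on the slide.

```python
# Hypothetical stand-in for solr.tokenize("subject_bt", term).
LEMMAS = {"following": ["following", "follow"], "is": ["is", "be"]}


def tokenize_bt(term):
    return LEMMAS.get(term, [term])


def transform(node):
    kind = node[0]
    if kind == "term":
        tokens = tokenize_bt(node[1])
        if len(tokens) > 1:
            # Prefer the exact form (boost 1.5) over the lemma variants.
            exact = ("boost", ("term", tokens[0]), 1.5)
            variants = [("term", token) for token in tokens[1:]]
            return ("or", exact, *variants)
        return node
    if kind == "boost":
        return ("boost", transform(node[1]), node[2])
    # Any other operator: rebuild it with transformed children.
    return (kind, *[transform(child) for child in node[1:]])


def render(node):
    kind = node[0]
    if kind == "term":
        return f"term({node[1]})"
    if kind == "boost":
        return f"{render(node[1])}^{node[2]}"
    return f"{kind}(" + ",".join(render(child) for child in node[1:]) + ")"


parsed = (
    "and",
    ("and", ("term", "following"), ("term", "is")),
    ("phrase", ("term", "presentation"), ("term", "on"), ("term", "QPL")),
)
expanded = render(transform(parsed))
```

Terms inside the phrase are visited too, but they come back unchanged because the stand-in tokenizer returns a single token for them.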
  • 20. BT RLP Tokenizer Use Case – Step 3
       and
         and
           or
             Following ^1.5
             follow
           or
             is ^1.5
             be
         phrase
           Presentation
           on
           QPL
  • 21. Embedded Search Example #1
       qTerms = solr.tokenize(qTerms);
       Execute an Embedded Search:
         results = solr.search('subjectsCore', or(qTerms), 50)
       Create a query from the results:
         subjectsQ = or(results*.subjectId)
       Put it all together:
         return field("title", and(qTerms)) | subjectsQ^0.9;
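The pattern in this example is a two-stage search: a first query against a secondary core produces identifiers, which are then folded into the final query. The Python sketch below shows only that control flow. The in-memory SUBJECTS_CORE list, its sample records, and the helper names are hypothetical, and the final query is built as a plain string rather than a QPL query object.

```python
# Hypothetical stand-in for the 'subjectsCore' Solr core.
SUBJECTS_CORE = [
    {"subjectId": "S1", "text": "george washington biography"},
    {"subjectId": "S2", "text": "martha custis letters"},
    {"subjectId": "S3", "text": "solr query parsing"},
]


def search_any(core, terms, limit):
    # Stage 1: return up to `limit` records containing any of the terms.
    hits = [record for record in core
            if any(term in record["text"].split() for term in terms)]
    return hits[:limit]


def build_query(q_terms):
    results = search_any(SUBJECTS_CORE, q_terms, 50)
    subject_ids = [record["subjectId"] for record in results]
    # Stage 2: combine the title query with the subjects found in stage 1.
    title_part = 'field("title", and(' + ", ".join(q_terms) + "))"
    if not subject_ids:
        return title_part
    return f"{title_part} | or({', '.join(subject_ids)})^0.9"


final_query = build_query(["george", "custis"])
```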
  • 22. Embedded Search Example #2
       qTerms = solr.tokenize(qTerms);
       Execute an Embedded Search:
         results = solr.search('categories', and(qTerms), 10)
       Create a Solr named list:
         myList = solr.newList();
         myList.add("relatedCategories", results*.title);
       Add it to the XML response:
         solr.addResponse(myList)
  • 23. Other Features
       Embedded Grouping Queries
         Oh yes they did!
       Proximity operators
         ADJ, NEAR/#, BEFORE/#
       Reverse Lemmatizer
         Prefers exact matches over variants
       Transformer
         Applies transformations recursively to query trees
  • 24. Query Processing Language
       [Diagram: Application Dev Team (User Interface) → data as entered by user → Search Team (Solr: QPL Engine, QPL Script) → Boolean query expression → Search]