This document introduces the Query Processing Language (QPL), a domain-specific language for constructing search queries. QPL scripts let complex queries be built programmatically and executed against a Solr search engine. Key benefits include more flexibility than the standard query parsers, easy creation and use of custom operators, and the ability to fold additional data sources into queries through embedded searches. Examples show QPL used for thesaurus expansion, composite queries, and integration of external data. QPL aims to be a "4GL for text search query expressions" that simplifies query construction for application developers.
2. Search Technologies Overview
Formed June 2005
Over 100 employees and growing
Over 500 customers worldwide
Presence in US, Latin America, UK & Germany
Deep enterprise search expertise
Consistent revenue growth and profitability
Search Engine Independent
The expert in the search space
3. Lucene Relevancy: Simple Operators
term(A): score = TF(A) * IDF(A)
    Implemented with DefaultSimilarity / TermQuery
    TF(A) = sqrt(termInDocCount)
    IDF(A) = log(totalDocsInCollection/(docsWithTermCount+1)) + 1.0
and(A, B): score = A * B
    Implemented with BooleanQuery()
or(A, B): score = A + B
    Implemented with BooleanQuery()
max(A, B): score = max(A, B)
    Implemented with DisjunctionMaxQuery()
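The formulas on this slide can be sketched in Python as follows. This only illustrates the simplified model shown here; the function names are ours, and Lucene's actual scoring applies further factors (such as length normalization) that the slide leaves out.

```python
import math


def tf(term_in_doc_count):
    """Term frequency factor: square root of the term's count in the document."""
    return math.sqrt(term_in_doc_count)


def idf(total_docs_in_collection, docs_with_term_count):
    """Inverse document frequency factor, as written on the slide."""
    return math.log(total_docs_in_collection / (docs_with_term_count + 1)) + 1.0


def term_score(term_in_doc_count, total_docs_in_collection, docs_with_term_count):
    """Score of term(A) = TF(A) * IDF(A)."""
    return tf(term_in_doc_count) * idf(total_docs_in_collection, docs_with_term_count)


# Combination rules from the slide, applied to already-computed clause scores.
def and_score(a, b):
    return a * b


def or_score(a, b):
    return a + b


def max_score(a, b):
    return max(a, b)
```

A term that occurs 4 times in a document and appears in 9 of 100 documents scores `term_score(4, 100, 9)`, that is `2.0 * (log(10) + 1.0)`.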
4. Simple Operators - Example
and: 0.30 * 0.90 = 0.27
    or: 0.10 + 0.20 = 0.30
        george: 0.10
        martha: 0.20
    max: max(0.60, 0.90) = 0.90
        washington: 0.60
        custis: 0.90
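The arithmetic of this example can be checked with a few lines of Python. The sketch assumes the four leaf scores belong to the four terms in the order listed, and that the tree is and(or(george, martha), max(washington, custis)).

```python
# Leaf scores, assumed to pair with the terms in the order they are listed.
scores = {"george": 0.10, "martha": 0.20, "washington": 0.60, "custis": 0.90}

or_part = scores["george"] + scores["martha"]           # or  -> A + B
max_part = max(scores["washington"], scores["custis"])  # max -> max(A, B)
and_part = or_part * max_part                           # and -> A * B

print(round(or_part, 2), round(max_part, 2), round(and_part, 2))
```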
5. Less Used Operators
boost(f, A): score = A * f
    Implemented with Query.setBoost(f)
constant(f, A): score = if(A) then f else 0.0
    Implemented with ConstantScoreQuery()
boostPlus(A, B): score = if(A) then (A + B) else 0.0
    Implemented with BooleanQuery()
boostMul(f, A, B): score = if(B) then (A * f) else A
    Implemented with BoostingQuery()
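A minimal Python sketch of these four score rules follows. It models "the clause did not match" as `None`, and it assumes that boostMul scores 0.0 when A itself does not match; neither convention is stated on the slide.

```python
from typing import Optional

# None means "the clause did not match the document" (our convention).
Score = Optional[float]


def boost(f: float, a: Score) -> float:
    """boost(f, A): A * f."""
    return 0.0 if a is None else a * f


def constant(f: float, a: Score) -> float:
    """constant(f, A): f when A matches, otherwise 0.0."""
    return f if a is not None else 0.0


def boost_plus(a: Score, b: Score) -> float:
    """boostPlus(A, B): A is required; B only adds to the score when it matches."""
    if a is None:
        return 0.0
    return a + (b if b is not None else 0.0)


def boost_mul(f: float, a: Score, b: Score) -> float:
    """boostMul(f, A, B): A * f when B matches, otherwise A unchanged."""
    if a is None:
        return 0.0  # assumption: A must match for the document to score at all
    return a * f if b is not None else a
```

For example, `boost_mul(0.5, 0.8, 0.1)` halves the score of A because B matched, while `boost_mul(0.5, 0.8, None)` leaves it at 0.8.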
6. Problem: Need for More Flexibility
Difficult / impossible to use all operators
Many not available in standard query parsers
Complex expressions = string manipulation
This is messy
Query construction is in the application layer
Your UI programmer is creating query expressions?
Seriously?
Hard to create and use new operators
Requires modifying query parsers - yuck
8. Introducing: QPL
Query Processing Language
Domain Specific Language for Constructing Queries
Built on Groovy
https://wiki.searchtechnologies.com/index.php/QPL_Home_Page
Solr Plug-Ins
Query Parser
Search Component
“The 4GL for Text Search Query Expressions”
Server-side Solr Access
Cores, Analyzers, Embedded Search, Results XML
10. QPL Configuration – solrconfig.xml
Query Parser Configuration:
<queryParser name="qpl"
class="com.searchtechnologies.qpl.solr.QPLSolrQParserPlugin">
<str name="scriptFile">parser.qpl</str>
<str name="defaultField">text</str>
</queryParser>
Search Component Configuration:
<searchComponent name="qplSearchFirst"
class="com.searchtechnologies.qpl.solr.QPLSearchComponent">
<str name="scriptFile">search.qpl</str>
<str name="defaultField">text</str>
<str name="isProcessScript">false</str>
</searchComponent>
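Once a query parser is registered under the name `qpl`, a request can select it with Solr's standard `defType` parameter. The sketch below only builds such a request URL; the host, port, and core name are hypothetical placeholders, not taken from the slides.

```python
from urllib.parse import urlencode

# Hypothetical host and core name; only the parser name "qpl" comes from the configuration.
base_url = "http://localhost:8983/solr/mycore/select"
params = {"q": "george washington", "defType": "qpl"}

request_url = base_url + "?" + urlencode(params)
print(request_url)
```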
11. QPL Example #1
Tokenize:
myTerms = solr.tokenize(query);
Phrase Query:
phraseQ = phrase(myTerms);
And Query:
andQ = and(myTerms);
Or Query:
orQ = (myTerms.size() <= 2) ? null :
orMin( (myTerms.size()+1)/2, myTerms);
Put It All Together:
return phraseQ^3.0 | andQ^2.0 | orQ;
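The threshold passed to orMin can be illustrated in Python. This sketch assumes integer division for (n + 1) / 2 and assumes orMin means "at least this many of the terms must match"; the slides do not spell out either point, and the function names are ours.

```python
def or_min_threshold(term_count):
    """Minimum number of terms that must match, or None when no OR query is built."""
    if term_count <= 2:
        return None
    return (term_count + 1) // 2  # assumption: integer division


def matches_or_min(min_count, terms, doc_tokens):
    """Assumed orMin semantics: at least min_count distinct terms occur in the document."""
    hits = sum(1 for term in set(terms) if term in doc_tokens)
    return hits >= min_count
```

With three query terms the threshold is 2, with five it is 3, so roughly half of the terms must be present before the OR clause contributes.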
12. Thesaurus Example #2
Tokenize:
myTerms = solr.tokenize(query);
Load Thesaurus: (cached)
thes = Thesaurus.load("thesaurus.xml")
Thesaurus Expansion:
thesQ = thes.expand(0.8f,
solr.tokenizer("text"), myTerms);
Put It All Together:
return and(thesQ);
Original Query: bathroom humor
thesQ: [or(bathroom, loo^0.8, wc^0.8), or(humor, jokes^0.8)]
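The expansion for "bathroom humor" can be mimicked with a small Python sketch. The dictionary stands in for thesaurus.xml (whose format is not shown), and the `expand` function here is a hypothetical stand-in, not the QPL Thesaurus API.

```python
def expand(weight, thesaurus, terms):
    """Return one query string per term: the term OR'd with its weighted synonyms."""
    expanded = []
    for term in terms:
        synonyms = thesaurus.get(term, [])
        if synonyms:
            parts = [term] + [f"{synonym}^{weight}" for synonym in synonyms]
            expanded.append("or(" + ", ".join(parts) + ")")
        else:
            expanded.append(term)  # no synonyms: keep the term as-is
    return expanded


# Stand-in for the contents of thesaurus.xml.
thesaurus = {"bathroom": ["loo", "wc"], "humor": ["jokes"]}

thes_q = expand(0.8, thesaurus, ["bathroom", "humor"])
final_query = "and(" + ", ".join(thes_q) + ")"
print(final_query)
```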
13. More Operators
Boolean Query Parser:
pQ = parseQuery("(george or martha) near/5 washington")
Relevancy Ranking Operators:
q1 = boostPlus(query, optionalQ)
q2 = boostMul(0.5, query, optionalQ)
q3 = constant(0.5, query)
Composite Queries:
compQ = and(compositeMax(
["title":1.5, "body":0.8],
"george", "washington"))
14. News Feed Use Case
Order   Documents            Date
1       markets+terms        Today
2       markets              Today
3       terms                Today
4       companies            Today
5       markets+terms        Yesterday
6       markets              Yesterday
7       terms                Yesterday
8       companies            Yesterday
9       markets, companies   older
15. News Feed Use Case – Step 1
Segments:
markets = split(solr.markets, "\\s*;\\s*")
marketsQ = field("markets", or(markets));
Terms:
terms = solr.tokenize(query);
termsQ = field("body",
or(thesaurus.expand(0.9f, terms)))
Companies:
compIds = split(solr.compIds, "\\s*;\\s*")
compIdsQ = field("companyIds", or(compIds))
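Assuming the delimiter is a semicolon with optional surrounding whitespace, the splitting step behaves like the Python sketch below. The sample input values are invented for illustration.

```python
import re


def split_list(value, pattern=r"\s*;\s*"):
    """Split a delimited request parameter and drop empty entries."""
    return [part for part in re.split(pattern, value.strip()) if part]


markets = split_list("energy ; metals;  technology")
print(markets)
```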
16. News Feed Use Case – Step 2
sdf = new SimpleDateFormat("yyyy-MM-dd")
c = Calendar.getInstance()
Today:
todayDate = sdf.format(c.getTime())
todayQ = field("date_s",todayDate)
Yesterday:
c.add(Calendar.DAY_OF_MONTH, -1)
yesterdayDate = sdf.format(c.getTime())
yesterdayQ = field("date_s",yesterdayDate)
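The same two date strings can be produced in Python, which makes the month rollover easy to check. This is a sketch of the date arithmetic only; the function name is ours.

```python
from datetime import date, timedelta


def day_strings(today=None):
    """Return (today, yesterday) formatted as yyyy-MM-dd."""
    today = today or date.today()
    yesterday = today - timedelta(days=1)
    return today.isoformat(), yesterday.isoformat()


today_date, yesterday_date = day_strings(date(2024, 3, 1))
print(today_date, yesterday_date)
```

Subtracting one day crosses month boundaries correctly, as Calendar.add does in the Groovy version.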
17. News Feed Use Case – Step 3
Weighted Subject Queries:
sq1 = constant(4.0, and(marketsQ, termsQ))
sq2 = constant(3.0, marketsQ)
sq3 = constant(2.0, termsQ)
sq4 = constant(1.0, compIdsQ)
subjectQ = max(sq1, sq2, sq3, sq4)
Weighted Time Queries:
tq1 = constant(10.0, todayQ)
tq2 = constant(1.0, yesterdayQ)
timeQ = max(tq1, tq2)
Put it All Together:
recentQ = and(subjectQ, timeQ)
return max(recentQ, or(marketsQ,compIdsQ)^0.01)
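To see why these weights yield the desired ordering from the use case, the scoring can be simulated in Python. The sketch uses the simplified operator model from the earlier slides (and = product, max = maximum) and assumes each matching clause of the fallback or() scores 1.0 before the 0.01 boost; real Lucene scores would differ in value.

```python
def constant(value, matched):
    """constant(f, A): f when A matches, otherwise 0.0."""
    return value if matched else 0.0


def score(doc):
    """Score one document given which subject clauses it matches and its date bucket."""
    matches = doc["matches"]
    markets = "markets" in matches
    terms = "terms" in matches
    companies = "companies" in matches

    subject = max(constant(4.0, markets and terms), constant(3.0, markets),
                  constant(2.0, terms), constant(1.0, companies))
    recency = max(constant(10.0, doc["date"] == "today"),
                  constant(1.0, doc["date"] == "yesterday"))
    recent = subject * recency  # and(A, B) modelled as A * B; 0.0 if either side fails

    # Assumption: each matching fallback clause contributes 1.0 before the boost.
    fallback = 0.01 * (float(markets) + float(companies))
    return max(recent, fallback)


subjects = [{"markets", "terms"}, {"markets"}, {"terms"}, {"companies"}]
docs = []
for offset, day in ((0, "today"), (4, "yesterday")):
    for position, matches in enumerate(subjects, start=1):
        docs.append({"order": offset + position, "matches": matches, "date": day})
docs.append({"order": 9, "matches": {"markets", "companies"}, "date": "older"})

# Start from the reverse order to show that sorting by score restores orders 1..9.
ranked = [d["order"] for d in sorted(reversed(docs), key=score, reverse=True)]
print(ranked)
```

The large gap between the time weights (10.0 versus 1.0) is what keeps every "today" document above every "yesterday" document, regardless of subject weight.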
18. BT RLP Tokenizer Use Case – Step 1
Define field type:
<tokenizer
class="com.basistech.rlp.solr.RLPTokenizerFactory"
rlpContext="<PATH>rlp-context-bl1.xml"
postAltLemmas="false"
lang="eng"
postPartOfSpeech="false"/>
QPL Expansion:
finalExpandedQuery = transform(queryTerms,
    [ TERM: { ctx ->
        def btCustomTokens = solr.tokenize("subject_bt", ctx.op.term)
        if (btCustomTokens.size() > 1)
            return or(term(btCustomTokens[0])^1.5, or(btCustomTokens[1..-1]));
        else
            return ctx.op;
    } ]
);
19. BT RLP Tokenizer Use Case – Step 2
Original User Query:
following is "presentation on QPL"
QPL Parsed:
and(and(term(following),term(is)),
phrase(term(presentation),term(on),term(QPL)))
BT Expansion + QPL Transformation :
and(and(or(term(following)^1.5,term(follow)),or(term(is)^1.5,term(be))),
phrase(term(presentation),term(on),term(QPL)))
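The recursive transformation can be sketched in Python with query trees as nested tuples. The lemma table is a stand-in for the RLP tokenizer, the helper names are ours, and the sketch flattens the inner or() for a single variant, as the expanded query here does.

```python
def term(text, boost=None):
    """A leaf node: ("term", text, optional boost)."""
    return ("term", text, boost)


def transform(node, term_rule):
    """Rebuild the tree, replacing every term node with term_rule(node)."""
    if node[0] == "term":
        return term_rule(node)
    op, *children = node
    return (op, *[transform(child, term_rule) for child in children])


def make_rule(tokenize):
    """Build the TERM rule: boost the original token, OR it with its variants."""
    def rule(node):
        tokens = tokenize(node[1])
        if len(tokens) > 1:
            return ("or", term(tokens[0], 1.5), *[term(t) for t in tokens[1:]])
        return node
    return rule


def render(node):
    """Print a tree in the same notation as the slides."""
    if node[0] == "term":
        suffix = f"^{node[2]}" if node[2] is not None else ""
        return f"term({node[1]}){suffix}"
    return f"{node[0]}(" + ",".join(render(child) for child in node[1:]) + ")"


# Stand-in for solr.tokenize("subject_bt", ...): original token first, then lemmas.
LEMMAS = {"following": ["following", "follow"], "is": ["is", "be"]}


def fake_tokenize(text):
    return LEMMAS.get(text, [text])


parsed = ("and",
          ("and", term("following"), term("is")),
          ("phrase", term("presentation"), term("on"), term("QPL")))
expanded = render(transform(parsed, make_rule(fake_tokenize)))
print(expanded)
```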
20. BT RLP Tokenizer Use Case – Step 3
and
    and
        or
            following ^1.5
            follow
        or
            is ^1.5
            be
    phrase
        presentation on QPL
21. Embedded Search Example #1
qTerms = solr.tokenize(query);
Execute an Embedded Search:
results = solr.search('subjectsCore', or(qTerms), 50)
Create a query from the results:
subjectsQ = or(results*.subjectId)
Put it all together:
return field("title", and(qTerms)) | subjectsQ^0.9;
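The two-stage pattern (search one index, then feed the hits into the main query) can be sketched in Python. The in-memory list is a hypothetical stand-in for 'subjectsCore', and the list comprehension plays the role of Groovy's spread operator in `results*.subjectId`.

```python
# Hypothetical in-memory stand-in for the 'subjectsCore' index.
SUBJECTS_CORE = [
    {"subjectId": "S1", "text": "george washington biography"},
    {"subjectId": "S2", "text": "martha custis"},
    {"subjectId": "S3", "text": "solr query parsing"},
]


def embedded_search(core, terms, max_rows):
    """Return up to max_rows records containing any of the terms (an or() search)."""
    hits = [doc for doc in core if any(t in doc["text"].split() for t in terms)]
    return hits[:max_rows]


q_terms = ["george", "custis"]
results = embedded_search(SUBJECTS_CORE, q_terms, 50)

# Equivalent of results*.subjectId: collect one property from every hit.
subject_ids = [record["subjectId"] for record in results]

final_query = (f'field("title", and({", ".join(q_terms)}))'
               f' | or({", ".join(subject_ids)})^0.9')
print(final_query)
```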
22. Embedded Search Example #2
qTerms = solr.tokenize(query);
Execute an Embedded Search:
results = solr.search('categories', and(qTerms), 10)
Create a Solr named list:
myList = solr.newList();
myList.add("relatedCategories", results*.title);
Add it to the XML response:
solr.addResponse(myList)
23. Other Features
Embedded Grouping Queries
Oh yes they did!
Proximity operators
ADJ, NEAR/#, BEFORE/#
Reverse Lemmatizer
Prefers exact matches over variants
Transformer
Applies transformations recursively to query trees
24. Query Processing Language
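[Diagram: the Application Dev Team builds the User Interface, which passes the data as entered by the user to Solr. Inside Solr, the QPL Engine runs a QPL Script from the Search Team and produces the Boolean query expression used for the search.]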