Search Intelligence &
MarkLogic Search API
MarkLogic World 2012
Will Thompson
wthompson@jonesmcclure.com
Search API Resources
• 5-minute Guide to the Search API
• MarkLogic Search Developer's Guide
• developer.marklogic.com
• MarkMail.org
• MarkLogic Developer Listserv
Code
Github:
https://github.com/wthoolihan/MLUC-2012-Examples
Search Intelligence
• Get the most out of our XML in search
– Approach 1: GUI
– Approach 2: Syntax
– Approach 3: Facets, constraints, filters
– Infer (Search Intelligence)
Enrich Your Query!
• Infer
– Use knowledge about the user
– Look for meaning in search terms
• Enrich
– Translate into more complex query
– Gain speed, accuracy
Enrich Your Query!
• Strategies
– Custom term handling
• Works well for single term transformations
• See: http://developer.marklogic.com/try/ninja/page13
– Roll your own parser
• A lot of work (see Michael Blakeley’s xqysp)
– Work between parse and search steps
Search API Overview
• The Search API is an XQuery library module designed to
simplify creating search applications:
o Parser
o Constraints
o Faceting
o Snippets
• High performance, scalability
• Extensible
Search API Extensibility
• Search API provides several points to hook in
• Hooks are defined in Search API options XML node
o Custom constraints
o Custom grammar
o Custom snippets
o Custom term handling
o Search operators
Search API Basics
• Search API module:
import module namespace search = "http://marklogic.com/appservices/search"
at "/MarkLogic/appservices/search/search.xqy";
• Main entry point: search:search()
• parses $qtext with given $options
• executes search
• returns <search:response>
o set of <search:result>s
o facets
o snippets
o metrics and other info
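A minimal call might look like the sketch below. The query text is borrowed from the examples later in this deck, and the single-argument form relies on the Search API's default options.

```xquery
xquery version "1.0-ml";

import module namespace search = "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

(: Parse the query text, run the search, and return a <search:response>.
   With no $options argument, the Search API falls back to its defaults. :)
search:search("1996 United States Olympics")
```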
Search API Basics
• Search API options:
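The options slide was an image, so the following is only an illustrative options node, not the one shown in the talk. It assumes articles carry a `<title>` element in no namespace; the constraint name `title` and the query text are made up for the example.

```xquery
xquery version "1.0-ml";

import module namespace search = "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

let $options :=
  <options xmlns="http://marklogic.com/appservices/search">
    <!-- Assumed content model: a <title> element in no namespace. -->
    <constraint name="title">
      <word>
        <element ns="" name="title"/>
      </word>
    </constraint>
    <!-- Include timing information in the response. -->
    <return-metrics>true</return-metrics>
  </options>
(: "title:olympics" is routed to the word constraint; "1996" is a plain term. :)
return search:search("title:olympics 1996", $options)
```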
Search API Extensibility
• Snippet:
• Constraint:
Search API Extensibility
• Term handler:
• Parser:
let $custom-parser-output :=
my:parse($qtext)
return
search:resolve(
$custom-parser-output,
$options
)
Search API Basics
• Search API parser: search:parse()
• 1st half of search:search()
• returns annotated cts:query XML
• Execute search: search:resolve()
• 2nd half of search:search()
• accepts cts:query XML as input
search:parse() Strategy
1. Call search:parse()
2. Analyze and enrich the query XML
3. Call search:resolve()
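The three steps can be sketched as follows. `local:enrich` is a hypothetical placeholder for whatever analysis you plug in; here it returns the query unchanged, and the empty options node is an assumption.

```xquery
xquery version "1.0-ml";

import module namespace search = "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

(: Hypothetical enrichment step; this placeholder returns the query unchanged. :)
declare function local:enrich($query as element()) as element()
{
  $query
};

let $qtext := "1996 United States Olympics"
let $options := <options xmlns="http://marklogic.com/appservices/search"/>
(: 1. Parse the query text into annotated cts:query XML. :)
let $parsed := search:parse($qtext, $options)
(: 2. Analyze and enrich the query XML. :)
let $enriched := local:enrich($parsed)
(: 3. Execute the enriched query and return a <search:response>. :)
return search:resolve($enriched, $options)
```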
Our Use Case
• O’Connor’s Online
– Search portal built on MarkLogic
– Legal rules and commentaries content
– Problem
• Users will enter citation numbers, abbreviations, etc. expecting
complete results
• Text editorial content follows different conventions
– Solution
• Detect special cases pre-search and enrich query
Example: detect year
• Content:
– MarkLogic database of news/op-ed articles
• Organized into year directories:
/content/1990
/content/1991
/content/1992
...
/content/2012
• Year is in directory structure, not article text
– But users will still include year in search terms
How to transform query?
• Recursive typeswitch
(function mapping on):
do-stuff-here($q)
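The code on this slide did not survive extraction, so here is a generic sketch of the recursive typeswitch pattern rather than the author's version. It loops over children explicitly instead of relying on function mapping, and `local:do-stuff-here` is a placeholder that returns its input unchanged.

```xquery
xquery version "1.0-ml";

(: Placeholder for the per-term logic; returns the word-query unchanged. :)
declare function local:do-stuff-here($q as element()) as node()*
{
  $q
};

(: Walk the query XML: hand word-queries to the handler, rebuild every
   other element around its transformed children, copy anything else. :)
declare function local:transform($q as node()) as node()*
{
  typeswitch ($q)
    case element(cts:word-query) return local:do-stuff-here($q)
    case element() return
      element { node-name($q) } {
        $q/@*,
        for $child in $q/node() return local:transform($child)
      }
    default return $q
};

local:transform(
  <cts:and-query xmlns:cts="http://marklogic.com/cts">
    <cts:word-query><cts:text>1996</cts:text></cts:word-query>
    <cts:word-query><cts:text>Olympics</cts:text></cts:word-query>
  </cts:and-query>)
```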
Example: detect year
let $terms := "1996 United States Olympics"
return local:detect-year(search:parse($terms))
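The body of `local:detect-year` was shown as an image. One way it might work is sketched below, under these assumptions: a year is any standalone four-digit term starting with 19 or 20, year directories end in a trailing slash, and the year term is replaced outright because the year does not appear in the article text.

```xquery
xquery version "1.0-ml";

import module namespace search = "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

declare function local:detect-year($q as node()) as node()*
{
  typeswitch ($q)
    case element(cts:word-query) return
      let $text := $q/cts:text
      return
        if (count($text) eq 1 and matches(string($text), "^(19|20)[0-9]{2}$"))
        then
          (: Swap the year term for a directory query. Wrapping a cts:query
             in an element constructor serializes it to cts:query XML. :)
          <x>{
            cts:directory-query(
              concat("/content/", string($text), "/"), "infinity")
          }</x>/*
        else $q
    case element() return
      element { node-name($q) } {
        $q/@*,
        for $child in $q/node() return local:detect-year($child)
      }
    default return $q
};

local:detect-year(search:parse("1996 United States Olympics"))
```

The result can be passed to search:resolve() in place of the original parse output.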
Example: detect year
• Strategy depends on your content model
• Other possibilities
– date detection
– date ranges
– locations
– etc.
search:parse() Strategy
• Weakness
– Limited to single word token
• Similar to custom term handling
• What about multiple tokens?
– Analyze querystring text directly using regex
• Dangerous
– Transform cts:query XML into intermediate form
• Preserve Boolean logic & grouping
• Preserve phrases
• Preserve constraints
Building Intermediate Query
• The hack
– Basically, undoing some of the parser's work
– Text "run" concept
• Similar to WordprocessingML
Building Intermediate Query
• Intermediate query strategy
1. Flatten query
2. Join sibling words in <run>
3. Transform <run>s
4. Convert <run>s back to word queries
Example: multi-word thesaurus
• Content:
– Same MarkLogic database of news/op-ed articles from
detect-year() example
• Query:
– Same as before: "1996 United States Olympics"
– Start with the search:parse() output
Example: multi-word thesaurus
1. Flatten query
– Remove implicit and-queries from search:parse() output
– XML should look more like the cts:query string representation:
cts:and-query(
(cts:word-query("1996", "lang=en", 1),
cts:word-query("United", "lang=en", 1),
cts:word-query("States", "lang=en", 1),
cts:word-query("Olympics", "lang=en", 1)),
())
1. Flatten query
• Typeswitch on cts:and-query:
1. Check and-queries for a parent and-query
2. Remove the nested ones, copy through anything else
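A possible shape for the flattening function is sketched below. The name `local:unnest-ands` comes from later slides, but the body is a reconstruction from the two steps above, not the author's code; `local:copy` is a hypothetical helper.

```xquery
xquery version "1.0-ml";

(: Rebuild an element around its flattened children. :)
declare function local:copy($q as element()) as element()
{
  element { node-name($q) } {
    $q/@*,
    for $child in $q/node() return local:unnest-ands($child)
  }
};

declare function local:unnest-ands($q as node()) as node()*
{
  typeswitch ($q)
    case element(cts:and-query) return
      if ($q/parent::cts:and-query)
      (: Nested and-query: drop the wrapper, keep its flattened children. :)
      then for $child in $q/node() return local:unnest-ands($child)
      else local:copy($q)
    case element() return local:copy($q)
    default return $q
};

local:unnest-ands(
  <cts:and-query xmlns:cts="http://marklogic.com/cts">
    <cts:word-query><cts:text>1996</cts:text></cts:word-query>
    <cts:and-query>
      <cts:word-query><cts:text>United</cts:text></cts:word-query>
      <cts:word-query><cts:text>States</cts:text></cts:word-query>
    </cts:and-query>
  </cts:and-query>)
```

For this hand-built input, the inner and-query should disappear and its two word-queries should become direct children of the outer and-query.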
Example: multi-word thesaurus
1. Flatten query
– Typeswitch function output:
Example: multi-word thesaurus
2. Join sibling words in <run>:
• Typeswitch on cts:word-query:
1. Ignore phrases
2. Delete the word-query if it is not the first in its sequence
3. Take the first word-query in a sequence and join it with its following siblings into a <run>
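The three rules above might be implemented along these lines. This is a reconstruction, not the code from the talk. It assumes that a quoted phrase arrives as a single word-query whose text contains a space, and that only words sitting directly under an and-query should be joined; `local:is-run-word` and `local:run-words` are hypothetical helpers.

```xquery
xquery version "1.0-ml";

(: True for a single-word word-query that sits directly under an and-query. :)
declare function local:is-run-word($n as node()?) as xs:boolean
{
  $n instance of element(cts:word-query)
    and exists($n/parent::cts:and-query)
    and count($n/cts:text) eq 1
    and (every $t in $n/cts:text satisfies not(contains($t, " ")))
};

(: The word of $q, followed by the words of its adjacent run-word siblings. :)
declare function local:run-words($q as element()) as xs:string*
{
  string($q/cts:text),
  let $next := $q/following-sibling::*[1]
  return if (local:is-run-word($next)) then local:run-words($next) else ()
};

declare function local:create-runs($q as node()) as node()*
{
  typeswitch ($q)
    case element(cts:word-query) return
      (: 1. Phrases (and words outside an and-query) are left alone. :)
      if (not(local:is-run-word($q))) then $q
      (: 2. Not the first word of its run: already absorbed, so drop it. :)
      else if (local:is-run-word($q/preceding-sibling::*[1])) then ()
      (: 3. First word of a run: join it with its following siblings. :)
      else <run>{ string-join(local:run-words($q), " ") }</run>
    case element() return
      element { node-name($q) } {
        $q/@*,
        for $child in $q/node() return local:create-runs($child)
      }
    default return $q
};

local:create-runs(
  <cts:and-query xmlns:cts="http://marklogic.com/cts">
    <cts:word-query><cts:text>1996</cts:text></cts:word-query>
    <cts:or-query>
      <cts:word-query><cts:text>sprint</cts:text></cts:word-query>
      <cts:word-query><cts:text>marathon</cts:text></cts:word-query>
    </cts:or-query>
    <cts:word-query><cts:text>United</cts:text></cts:word-query>
    <cts:word-query><cts:text>States</cts:text></cts:word-query>
    <cts:word-query><cts:text>Olympics</cts:text></cts:word-query>
  </cts:and-query>)
```

With this input the or-query should pass through untouched, leaving one run for "1996" before it and one run for "United States Olympics" after it, so Boolean grouping is preserved.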
2. Join sibling words in <run>:
• Input:
– search:parse("1996 United States Olympics")/local:unnest-ands(.)/local:create-runs(.)
• Output:
2. Join sibling words in <run>:
• Input:
– search:parse("1996 (sprint OR marathon) United States Olympics")/local:unnest-ands(.)/local:create-runs(.)
• Output:
Example: multi-word thesaurus
3. Transform <run>s:
1. Store terms in thesaurus
2. Build cts:or-query of thesaurus terms
3. Using cts:or-query of terms, cts:highlight() <run>s, and replace matches with thesaurus synonyms
– cts:highlight(): powerful cts:query-based find/replace
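The slides for this step were images, so the sketch below only illustrates the idea of using cts:highlight() as a find/replace over a run. The inline thesaurus entries are invented, the `<synonyms>` and `<term>` wrapper elements are hypothetical, and the real `local:thsr-expand` in the talk's repository may differ.

```xquery
xquery version "1.0-ml";

declare namespace thsr = "http://marklogic.com/xdmp/thesaurus";

(: Inline stand-in for doc("thesaurus.xml"); the entries are made up. :)
declare variable $thesaurus :=
  <thsr:thesaurus>
    <thsr:entry>
      <thsr:term>United States</thsr:term>
      <thsr:synonym><thsr:term>USA</thsr:term></thsr:synonym>
      <thsr:synonym><thsr:term>America</thsr:term></thsr:synonym>
    </thsr:entry>
  </thsr:thesaurus>;

(: Replace each thesaurus match inside the run with a <synonyms> group
   holding the matched text plus the synonyms of its entry. :)
declare function local:thsr-expand($run as element(run), $q-thsr as cts:query)
  as node()*
{
  cts:highlight($run, $q-thsr,
    <synonyms>{
      <term>{ $cts:text }</term>,
      for $syn in $thesaurus/thsr:entry
                    [lower-case(string(thsr:term)) eq lower-case($cts:text)]
                    /thsr:synonym/thsr:term
      return <term>{ string($syn) }</term>
    }</synonyms>)
};

let $q-thsr :=
  cts:or-query(
    for $term in $thesaurus/thsr:entry/thsr:term
    return cts:word-query(string($term)))
return local:thsr-expand(<run>1996 United States Olympics</run>, $q-thsr)
```

Because the run keeps the words as one piece of text, the multi-word entry "United States" can match as a phrase, which a per-term handler could not do.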
3. Transform <run>s:
Input:
let $q-thsr :=
cts:or-query(
doc("thesaurus.xml")
//thsr:entry/thsr:term/cts:word-query(string(.))
)
let $runs :=
search:parse("1996 United States Olympics")
/local:unnest-ands(.)/local:create-runs(.)
return local:thsr-expand($runs, $q-thsr)
Output:
Example: multi-word thesaurus
4. Convert <run>s back to word queries
– Typeswitch:
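The typeswitch on this slide was an image. A sketch of how such a function could look is given below, assuming that an expanded run holds plain text plus hypothetical `<synonyms>` groups of `<term>` children; the author's intermediate markup is not shown in the source and may be different.

```xquery
xquery version "1.0-ml";

(: Build cts:word-query XML for one word or phrase. :)
declare function local:word-query($text as xs:string) as element()
{
  <cts:word-query xmlns:cts="http://marklogic.com/cts">
    <cts:text>{ $text }</cts:text>
  </cts:word-query>
};

declare function local:resolve-runs($q as node()) as node()*
{
  typeswitch ($q)
    case element(run) return
      for $n in $q/node()
      return
        typeswitch ($n)
          (: A synonym group becomes an or-query over all of its terms. :)
          case element(synonyms) return
            <cts:or-query xmlns:cts="http://marklogic.com/cts">{
              for $t in $n/term return local:word-query(string($t))
            }</cts:or-query>
          (: Plain run text is split back into one word-query per word. :)
          case text() return
            for $w in tokenize(normalize-space($n), " ")[. ne ""]
            return local:word-query($w)
          default return ()
    case element() return
      element { node-name($q) } {
        $q/@*,
        for $child in $q/node() return local:resolve-runs($child)
      }
    default return $q
};

local:resolve-runs(
  <cts:and-query xmlns:cts="http://marklogic.com/cts">
    <run>1996 <synonyms><term>United States</term><term>USA</term></synonyms> Olympics</run>
  </cts:and-query>)
```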
4. Convert <run>s back to word queries
Input:
let $q-thsr :=
cts:or-query(
doc("thesaurus.xml")
//thsr:entry/thsr:term/cts:word-query(string(.))
)
let $runs := search:parse("1996 United States Olympics")
/local:unnest-ands(.)/local:create-runs(.)
let $expanded := local:thsr-expand($runs, $q-thsr)
return local:resolve-runs($expanded)
Output:
Combining Examples
local:thsr-expand($runs, $q-thsr)
/local:resolve-runs(.)/local:detect-year(.)
Enrich Your Query!
• Takeaway
1. No added GUI
2. Didn't ask the user for additional input
3. Able to build more robust query before
executing search
Search API Hacking
• Many potential applications:
– Ad-hoc weighting:
local:q-add-weights(
search:parse("bananas"),
(<element ns="{$ns}" name="p" weight="1"/>,
<element ns="{$ns}" name="b" weight="2"/>,
<element ns="{$ns}" name="title" weight="3.5"/>)
)
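The body of `local:q-add-weights` is not shown in the deck. One plausible reading is sketched here: each plain word-query is replaced by an or-query of element-word-queries, one per element spec, carrying that spec's weight. The original word-query's own options are dropped for brevity, and an empty spec list would yield an or-query that matches nothing.

```xquery
xquery version "1.0-ml";

declare function local:q-add-weights($q as node(), $specs as element()*)
  as node()*
{
  typeswitch ($q)
    case element(cts:word-query) return
      (: Wrapping the cts:query in an element serializes it to query XML. :)
      <x>{
        cts:or-query(
          for $spec in $specs
          return
            cts:element-word-query(
              QName(string($spec/@ns), string($spec/@name)),
              for $t in $q/cts:text return string($t),
              (),
              xs:double($spec/@weight)))
      }</x>/*
    case element() return
      element { node-name($q) } {
        $q/@*,
        for $child in $q/node() return local:q-add-weights($child, $specs)
      }
    default return $q
};

(: Assumption: the content elements are in no namespace. :)
let $ns := ""
return
  local:q-add-weights(
    <cts:word-query xmlns:cts="http://marklogic.com/cts">
      <cts:text>bananas</cts:text>
    </cts:word-query>,
    (<element ns="{$ns}" name="p" weight="1"/>,
     <element ns="{$ns}" name="title" weight="3.5"/>))
```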
– Automatic spell correction
– Detect entities
• Transform text into element-based query
• Fewer false positives and exclusions
• Leverage indexes:
"New York Times"
Search API Hacking
• Other ideas
– Regex unparsed query string
• apply constraints, operators, etc. as configured in the Search API, based on keywords/patterns
– Custom term handler
• single-term transformations
– Combine with data enrichment on ingestion
• MarkLogic Entity Framework
• Linguistic processing
Hazards
• Chaos
– Daisy chained transformations can have unintended
consequences
– Performance
• Pre-search transformations need to be fast
• make sure to leverage indexes as much as possible
• Larger queries do take longer
Questions
