SOLR BASED SEARCH
SHUBHANGI PARDESHI
ENHANCE SEARCH WITH SYNONYMS, PROXIMITY, PHRASES, RELEVANCY RANKED SORTING AND MANY MORE !!
CONTENTS
▸ Introduction to Lucene
▸ Introduction to Solr
▸ Terminologies
▸ Steps
▸ Document / Query Analysis
▸ Solr Search Features
▸ Solr Search - Query types
▸ Search Interfaces
▸ Search Challenges and Solutions
WHAT IS LUCENE ?
▸ Open source full-text search (information retrieval) library / API
▸ Written in Java by Doug Cutting
▸ Major Components
  ▸ Indexing (inverted index : keyword -> page) : IndexWriter (index is typically 20-30% of the data size)
  ▸ Search Algorithm : IndexSearcher
▸ No notion of schema
▸ Example Usage : Atlassian Jira / Confluence , Salesforce, Oracle Text Search
▸ Lucene is very powerful but difficult to use directly
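To make the "keyword -> page" idea concrete, here is a minimal sketch of an inverted index in plain Python. It is a toy illustration only, not how Lucene's IndexWriter or IndexSearcher are implemented; the sample documents and the function names are made up for this example.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased, whitespace-separated term to the ids of the docs containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            postings[term].add(doc_id)
    # Sorted lists make the result deterministic and easy to read
    return {term: sorted(ids) for term, ids in postings.items()}


def search(index, term):
    """Look a single term up in the index; unknown terms return no hits."""
    return index.get(term.lower(), [])


# Hypothetical sample data: document id -> genre text
docs = {1: "Action Thriller", 2: "Psychological Thriller", 3: "Action Adventure"}
index = build_inverted_index(docs)
print(search(index, "thriller"))
```

A lookup touches only the postings of the query term instead of scanning every document, which is why the index is built once up front.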
WHAT IS SOLR ?
▸ A full text Enterprise Search Server
▸ Caching
▸ Replication
▸ Easy administration
▸ Web Service layer on top of Lucene
▸ Non-relational data storage and processing
▸ Loose schema to define types and fields
▸ Better recall and precision through various configuration options
▸ Easy to use
TERMINOLOGIES
▸ Document : Unit of Index and Search
  ▸ Format : XML , JSON , CSV
  ▸ Fields : Name - Value pairs , a type is associated with each field
▸ Search :
  ▸ Query : QueryParser creates the query -> IndexSearcher runs it -> hits are returned
STEPS
▸ Create Indexes
  ▸ Build Document
  ▸ Analyse Document
  ▸ Index Document
▸ Search
  ▸ Input Query
  ▸ Analyse Query
  ▸ Render Result
[Diagram: two pipelines. CREATE INDEXES : Get Contents , Build Solr Doc , Analyse Doc , Index Doc. SEARCH DOCUMENT : Search UI , Build Query , Search Query String , Analyse Query , Render Result.]
SEARCH STRING / DOCUMENT ANALYSIS
▸ Analysis = Analyzer + Tokenizer + Filter (an analyzer chain is a tokenizer followed by filters)
▸ The analyzers used for indexing and for searching may or may not be the same
▸ E.g. one analyzer class for both indexing and searching :
<fieldType name="nametext" class="solr.TextField">
  <analyzer class="org.apache.lucene.analysis.core.WhitespaceAnalyzer" />
</fieldType>
▸ E.g. separate analyzer chains for indexing and searching :
<fieldType name="nametext" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.KeepWordFilterFactory" words="keepwords.txt" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonymsfile.txt" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldType>
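The effect of using different analyzer chains at index time and at query time can be sketched in plain Python. This is a rough simulation under stated assumptions, not Solr's implementation: the regular-expression tokenizer only approximates a standard tokenizer, and the keep-word and synonym sets stand in for hypothetical keepwords.txt and synonymsfile.txt contents.

```python
import re

# Hypothetical stand-ins for keepwords.txt and synonymsfile.txt
KEEP_WORDS = {"action", "thriller", "suspense"}
SYNONYMS = {"thriller": ["suspense"]}


def tokenize(text):
    """Rough approximation of a standard tokenizer: split on non-alphanumeric characters."""
    return [t for t in re.split(r"[^0-9A-Za-z]+", text) if t]


def lowercase(tokens):
    return [t.lower() for t in tokens]


def keep_words(tokens, allowed):
    """Drop every token that is not in the allowed set."""
    return [t for t in tokens if t in allowed]


def expand_synonyms(tokens, synonyms):
    """Emit each token followed by its configured synonyms."""
    out = []
    for t in tokens:
        out.append(t)
        out.extend(synonyms.get(t, []))
    return out


def analyze_for_index(text):
    # Index chain: tokenizer -> lowercase -> keep words -> synonyms
    tokens = keep_words(lowercase(tokenize(text)), KEEP_WORDS)
    return expand_synonyms(tokens, SYNONYMS)


def analyze_for_query(text):
    # Query chain: tokenizer -> lowercase only
    return lowercase(tokenize(text))


indexed = analyze_for_index("Action-Thriller, 2005")
query = analyze_for_query("Suspense")
print(indexed, query, bool(set(indexed) & set(query)))
```

Because the synonym is added at index time, a query for "Suspense" matches a document that only ever said "Thriller", even though the query chain does no synonym expansion.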
SOLR SEARCH FEATURES
▸ Ranked Search : High-scoring documents at the top , score is one of the fields in the hits
▸ Field Searching
▸ Custom Sort by Field
▸ Boosting Result
▸ Multiword synonyms (Solr 6.5 onwards)
▸ Stemming
▸ Hit highlighting
▸ Autocomplete
SOLR SEARCH FEATURES
▸ Faceting
▸ Term Frequency
▸ Document age consideration
▸ Spellchecks
▸ Typo tolerance
▸ Phonetic match
▸ OpenNLP / UIMA integration
▸ Pagination
▸ Functions for computation
▸ And so on …
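Pagination from the list above is driven by Solr's `start` and `rows` request parameters. A minimal sketch of turning a 1-based page number into those parameters follows; the helper name and its default page size are assumptions made for this example.

```python
def page_params(page, rows=10):
    """Translate a 1-based page number into Solr's start/rows parameters."""
    if page < 1 or rows < 1:
        raise ValueError("page and rows must be positive")
    # start is the zero-based offset of the first hit on the requested page
    return {"start": (page - 1) * rows, "rows": rows}


print(page_params(3, rows=20))
```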
SOLR SEARCH - VARIOUS TYPES OF QUERIES
▸ Simple text Search :
  ▸ Find films where genre contains the word “Action” (q=genre:Action)
  ▸ Find films where genre contains the word “Thriller” (q=genre:Thriller)
  ▸ Find films where genre contains the words Action and Thriller (q=*:*&fq=genre:Action&fq=genre:Thriller)
  ▸ Find films directed by Gary Bose (q=directed_by:(Gary Bose)) ; Solr expects a single q parameter, so both terms go in one q
▸ Strict term presence search :
  ▸ Find films where genre contains the word “Action” as well as “Thriller” (q=*:*&fq=genre:(+action +thriller))
  ▸ Find films directed by a person whose name contains the words “Gary” as well as “Bose” (q=*:*&fq=+directed_by:Bose&fq=+directed_by:Gary)
▸ Proximity Search
  ▸ Find films where genre contains the words Action and Thriller within 5 words of each other (q=*:*&fq=genre:"action thriller"~5)
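Query strings like the ones above contain spaces, quotes, `+` and `:` characters, so they need URL encoding before being sent. A minimal sketch using only the Python standard library is shown here; the helper name `build_params` is an assumption, and the filter expressions are taken from the slide's examples.

```python
from urllib.parse import parse_qs, urlencode


def build_params(q="*:*", filters=()):
    """Encode one q parameter and any number of repeated fq parameters."""
    params = [("q", q)]
    # A list of tuples lets the same key (fq) appear more than once
    params.extend(("fq", f) for f in filters)
    return urlencode(params)


query_string = build_params(
    filters=["genre:(+action +thriller)", 'genre:"action thriller"~5']
)
print(query_string)
# Decoding the string again recovers the original expressions
print(parse_qs(query_string))
```

Each `fq` is sent as its own parameter, so filters can be added or removed without rewriting the main query.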
SOLR SEARCH - VARIOUS TYPES OF QUERIES
▸ Phrase Search
  ▸ Find films with genre “Action Thriller” (q=genre:"Action Thriller") or, as a filter, (q=*:*&fq=genre:"action thriller")
▸ Faceted Search
  ▸ Movies released between 2005-10-27 and 2006-11-30 , with a count for each director (q=*:*&fl=initial_release_date&fq=initial_release_date:[2005-10-27T00:00:00Z TO 2006-11-30T00:00:00Z]&facet=true&facet.field=directed_by)
▸ Fuzzy Search (~)
  ▸ Genre contains a word close to the misspelt “sychologikal” (q=*:*&fq=genre:sychologikal~)
▸ Negative Search
  ▸ Genre contains Action but not Thriller (q=*:*&fq=genre:action&fq=-genre:thriller)
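Fuzzy matching with `~` is based on edit distance between the query term and indexed terms. The sketch below computes plain Levenshtein distance to show why the misspelt "sychologikal" can still reach "psychological". It is an illustration only: Lucene's own matching is implemented differently and, as far as I know, also counts a transposition as a single edit and allows up to 2 edits by default.

```python
def edit_distance(a, b):
    """Plain Levenshtein distance: minimum inserts, deletes and substitutions to turn a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


# One missing letter (p) plus one wrong letter (k for c)
print(edit_distance("sychologikal", "psychological"))
```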
▸ Wildcard Search
  ▸ Genre contains a word like *ction (q=*:*&fq=genre:*ction)
▸ Conditional Logic in search
  ▸ Genre contains Psychological AND Thriller (q=*:*&fq=genre:(psychological AND Thriller))
  ▸ Genre contains Psychological OR Thriller (q=*:*&fq=genre:(psychological OR Thriller))
  ▸ Genre contains Psychological but not Thriller (q=*:*&fq=genre:(psychological NOT Thriller))
▸ Range Search
  ▸ Movies released between 2005-10-27 and 2006-11-30 (q=*:*&fl=initial_release_date&fq=initial_release_date:[2005-10-27T00:00:00Z TO 2006-11-30T00:00:00Z])
▸ And so on …
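The AND / OR / NOT examples above correspond to set operations on the lists of documents that contain each term. A toy sketch with made-up document ids follows; it shows the logic only, not how Solr evaluates boolean queries internally.

```python
# Hypothetical postings: term -> ids of documents whose genre contains it
POSTINGS = {
    "psychological": {1, 2},
    "thriller": {2, 3},
}


def docs_with(term):
    return POSTINGS.get(term.lower(), set())


def match_and(a, b):
    return docs_with(a) & docs_with(b)   # both terms present


def match_or(a, b):
    return docs_with(a) | docs_with(b)   # either term present


def match_not(a, b):
    return docs_with(a) - docs_with(b)   # a present, b absent


print(sorted(match_and("psychological", "Thriller")))
print(sorted(match_or("psychological", "Thriller")))
print(sorted(match_not("psychological", "Thriller")))
```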
EXAMPLES OF SEARCH INTERFACE
▸ REST API
▸ http://<host>:<port>/solr/<collection>/query?
▸ APIs such as SolrJ
▸ Solr Admin UI
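A minimal sketch of using the REST endpoint from Python, assuming a Solr instance on localhost at the default port 8983 and a hypothetical collection named `films`. The code only builds the URL and parses a hand-written sample of the JSON response shape, so it runs without a server; the sample document is invented.

```python
import json
from urllib.parse import urlencode


def build_query_url(host, port, collection, params):
    """Fill in the http://<host>:<port>/solr/<collection>/query? pattern."""
    return f"http://{host}:{port}/solr/{collection}/query?{urlencode(params)}"


def extract_docs(response_text):
    """Pull the hit count and the documents out of a Solr JSON response."""
    response = json.loads(response_text)["response"]
    return response["numFound"], response["docs"]


url = build_query_url("localhost", 8983, "films", [("q", "genre:Action"), ("rows", 5)])
print(url)

# Hand-written sample of the response shape, not real output
sample = json.dumps({
    "responseHeader": {"status": 0},
    "response": {"numFound": 1, "start": 0,
                 "docs": [{"name": "Example Film", "genre": ["Action"]}]},
})
print(extract_docs(sample))
```

In a real client the URL would be fetched with an HTTP library and the response body passed to `extract_docs`.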
PRECISION AND RECALL WITH SOLR
▸ What : Results , Relevant Results , More Hits , More Relevant Results at High Rank
▸ How : Solr Synonyms , Fuzzy Search , Proximity Search , Phrase Search , Negative Search , Strict Term Presence , Doc Boosting , Index Binary Docs , Multiline Search , Index Many Fields , Search String Limit
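For reference, precision is the share of returned results that are relevant, and recall is the share of relevant documents that were returned. A small sketch with made-up document ids:

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved ids against the relevant ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 results returned, 3 documents are actually relevant
precision, recall = precision_recall(retrieved=[1, 2, 3, 4], relevant=[2, 4, 5])
print(precision, recall)
```

Techniques that add hits (synonyms, fuzzy search) tend to raise recall, while techniques that restrict hits (phrase search, strict term presence, negative search) tend to raise precision.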
CHALLENGES
▸ Domain-specific knowledge has to be transformed into config files such as synonyms , protwords etc
  ▸ Proper Solr collection configuration
▸ A slight change in the query string words changes search results considerably
  ▸ Use stemming
▸ High recall
  ▸ Limit search by score
▸ Spaces
  ▸ Custom tokenizer
▸ Spelling mistakes in query string
  ▸ Fuzzy search and/or spell checkers
▸ Document field names get indexed
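One possible reading of "limit search by score" is to drop hits whose score falls too far below the best hit. The sketch below does this on the client side; the function name, the 0.5 ratio and the sample hits are all assumptions for illustration, and a suitable cutoff would depend on the collection.

```python
def limit_by_score(hits, min_ratio=0.5):
    """Keep only hits scoring at least min_ratio times the top score."""
    if not hits:
        return []
    top = max(h["score"] for h in hits)
    return [h for h in hits if h["score"] >= top * min_ratio]


# Hypothetical hits, each carrying the score field returned with results
hits = [{"id": 1, "score": 4.0}, {"id": 2, "score": 2.5}, {"id": 3, "score": 1.0}]
print([h["id"] for h in limit_by_score(hits)])
```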
THANK YOU !!
