SOLR BASED SEARCH
SHUBHANGI PARDESHI
ENHANCE SEARCH WITH SYNONYMS, PROXIMITY, PHRASES, RELEVANCY-RANKED SORTING AND MANY MORE!
CONTENTS
▸ Introduction to Lucene
▸ Introduction to Solr
▸ Terminologies
▸ Steps
▸ Document / Query Analysis
▸ Solr Search Features
▸ Solr Search - Query types
▸ Search Interfaces
▸ Search Challenges and Solutions
WHAT IS LUCENE ?
▸ Open-source full-text search (information retrieval) library / API
▸ Written in Java by Doug Cutting
▸ Major Components
▸ Indexing (inverted index : keyword -> page) : IndexWriter (the index is roughly 20-30% of the data size)
▸ Search algorithm : IndexSearcher
▸ No notion of schema
▸ Example usage : Atlassian Jira / Confluence, Salesforce, Oracle Text Search
▸ Lucene is very powerful but difficult to use
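The "keyword -> page" mapping behind an inverted index can be illustrated with a few lines of plain Python. This is only a toy sketch of the idea, not Lucene's actual data structure; the function names and sample documents are hypothetical.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each lowercased term to the sorted list of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def search(index, term):
    """Return the ids of documents containing the term (empty list if none)."""
    return index.get(term.lower(), [])

# Hypothetical sample data, not taken from the slides.
docs = {
    1: "Action thriller film",
    2: "Romantic comedy film",
    3: "Psychological thriller",
}
index = build_inverted_index(docs)
print(search(index, "thriller"))  # [1, 3]
print(search(index, "film"))      # [1, 2]
```

Looking up a term is a dictionary access, which is why searching an inverted index stays fast as the document set grows.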
WHAT IS SOLR ?
▸ A full text Enterprise Search Server
▸ Caching
▸ Replication
▸ Easy administration
▸ Web Service layer on top of Lucene
▸ Non-relational data storage and processing
▸ Loose schema to define types and fields
▸ Better recall and precision through various configuration options
▸ Easy to use
TERMINOLOGIES
▸ Document : Unit of index and search
▸ Format : XML, JSON, CSV
▸ Fields : Name-value pairs; a type is associated with each field
▸ Search :
▸ Query : QueryParser creates the query -> IndexSearcher runs it -> returns hits
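A document, in these terms, is just a set of named fields. The sketch below builds one as JSON, one of the formats listed above. The field names `genre`, `directed_by` and `initial_release_date` follow the query examples later in this deck; the `id` and `name` fields and all values are assumptions made up for illustration.

```python
import json

# Hypothetical film document; values are invented for illustration.
film = {
    "id": "film-001",
    "name": "Example Film",
    "genre": ["Action", "Thriller"],          # multi-valued field
    "directed_by": ["Gary Bose"],
    "initial_release_date": "2005-11-15T00:00:00Z",
}

# A JSON array of documents is one common shape for sending documents to Solr.
payload = json.dumps([film], indent=2)
print(payload)
```

The type of each field (text, date, and so on) is not part of the document itself; it comes from the schema that the field name maps to.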
STEPS
▸ Create Indexes
  ▸ Build Document
  ▸ Analyse Document
  ▸ Index Document
▸ Search
  ▸ Input Query
  ▸ Analyse Query
  ▸ Render Result
[Diagram with two flows. Create indexes: Get contents, Build Solr doc, Analyse doc, Index doc. Search document: Search UI, Query string, Build query, Analyse query, Search, Render result.]
SEARCH STRING / DOCUMENT ANALYSIS
▸ Analysis = Analyzer, built from a Tokenizer plus Filters
▸ The analyzers for indexing and searching may or may not be the same
▸ E.g. a single analyzer class used for both indexing and searching:
<fieldType name="nametext" class="solr.TextField">
  <analyzer class="org.apache.lucene.analysis.core.WhitespaceAnalyzer" />
</fieldType>
▸ E.g. separate index-time and query-time chains:
<fieldType name="nametext" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.KeepWordFilterFactory" words="keepwords.txt" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonymsfile.txt" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldType>
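To make the tokenizer-then-filters idea concrete, here is a rough Python imitation of an index-time chain (tokenize, lowercase, keep-words, synonyms) and a shorter query-time chain. It is only a sketch: the regex split is a crude stand-in for a real standard tokenizer, and the word lists stand in for the contents of keepwords.txt and synonymsfile.txt, which the slides do not show.

```python
import re

def standard_tokenize(text):
    """Crude stand-in for a standard tokenizer: split on non-word characters."""
    return [t for t in re.split(r"\W+", text) if t]

def lowercase(tokens):
    return [t.lower() for t in tokens]

def keep_words(tokens, allowed):
    """Keep only tokens in the allowed set (the role of a keep-words file)."""
    return [t for t in tokens if t in allowed]

def expand_synonyms(tokens, synonyms):
    """Emit each token followed by its synonyms (the role of a synonyms file)."""
    out = []
    for t in tokens:
        out.append(t)
        out.extend(synonyms.get(t, []))
    return out

def index_analyzer(text, allowed, synonyms):
    tokens = lowercase(standard_tokenize(text))
    return expand_synonyms(keep_words(tokens, allowed), synonyms)

def query_analyzer(text):
    return lowercase(standard_tokenize(text))

# Hypothetical file contents.
allowed = {"action", "thriller", "film"}
synonyms = {"film": ["movie"]}

print(index_analyzer("An Action-Thriller FILM!", allowed, synonyms))
# ['action', 'thriller', 'film', 'movie']
print(query_analyzer("Movie"))  # ['movie']
```

Because the synonym "movie" was added at index time, the query token "movie" matches the document even though the query chain does no synonym expansion of its own.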
SOLR SEARCH FEATURES
▸ Ranked search : highest-scoring documents at the top; the score is one of the fields in the hits
▸ Field Searching
▸ Custom Sort by Field
▸ Boosting Result
▸ Multiword synonyms (Solr 6.5 onwards)
▸ Stemming
▸ Hit highlight
▸ Autocomplete
SOLR SEARCH FEATURES
▸ Faceting
▸ Term Frequency
▸ Document age consideration
▸ Spellchecks
▸ Typo tolerant
▸ Phonetic match
▸ OpenNLP / UIMA integration
▸ Pagination
▸ Functions for computation (function queries)
▸ And so on …
SOLR SEARCH - VARIOUS TYPES OF QUERIES
▸ Simple text search :
▸ Find films where genre contains the word "Action" (q=genre:Action)
▸ Find films where genre contains the word "Thriller" (q=genre:Thriller)
▸ Find films where genre contains both words Action and Thriller (q=*:*&fq=genre:Action&fq=genre:Thriller)
▸ Find films directed by Gary Bose (q=directed_by:(Gary Bose))
▸ Strict term presence search :
▸ Find films where genre contains the word "Action" as well as "Thriller" (q=*:*&fq=genre:(+action +thriller))
▸ Find films directed by a person whose name contains the words "Gary" as well as "Bose" (q=*:*&fq=+directed_by:Bose&fq=+directed_by:Gary)
▸ Proximity search :
▸ Find films where genre contains the words Action and Adventure within 20 positions of each other (q=*:*&fq=genre:"action adventure"~20)
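The query strings above follow a small number of patterns, which can be captured in helper functions. The helpers below are hypothetical (they are not part of Solr or SolrJ) and do no escaping of special characters; they only show how the term, strict-presence and proximity forms are assembled.

```python
def term(field, word):
    """Single term in a field: field:word"""
    return f"{field}:{word}"

def all_required(field, words):
    """Every word must be present: field:(+w1 +w2)"""
    return f"{field}:(" + " ".join(f"+{w}" for w in words) + ")"

def proximity(field, words, slop):
    """Words near each other, with the given slop: field:"w1 w2"~slop"""
    return f'{field}:"' + " ".join(words) + f'"~{slop}'

print(term("genre", "Action"))                          # genre:Action
print(all_required("genre", ["action", "thriller"]))    # genre:(+action +thriller)
print(proximity("genre", ["action", "adventure"], 20))  # genre:"action adventure"~20
```

Each result is a value for the `q` or `fq` parameter; it still needs URL encoding before being sent over HTTP.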
SOLR SEARCH - VARIOUS TYPES OF QUERIES
▸ Phrase Search
▸ Find films with genre "Action Thriller" (q=genre:"Action Thriller"), or as a filter (q=*:*&fq=genre:"action thriller")
▸ Faceted Search
▸ Movies released between 2005-10-27 and 2006-11-30, with a count for each director (q=*:*&fl=initial_release_date&fq=initial_release_date:[2005-10-27T00:00:00Z TO 2006-11-30T00:00:00Z]&facet=true&facet.field=directed_by)
▸ Fuzzy Search (~)
▸ Genre contains a word close to the misspelling "sychologikal" (q=*:*&fq=genre:sychologikal~)
▸ Negative Search
▸ Genre contains Action but not Thriller (q=*:*&fq=genre:action&fq=-genre:thriller)
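Fuzzy matching rests on edit distance: the number of single-character changes needed to turn one term into another. The sketch below uses plain Levenshtein distance as a simplification (Lucene's own fuzzy matching is more elaborate and caps the number of edits it allows) to show why the misspelling "sychologikal" can still reach "psychological".

```python
def edit_distance(a, b):
    """Classic Levenshtein distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]

# One insertion ("p") and one substitution ("k" -> "c").
print(edit_distance("sychologikal", "psychological"))  # 2
```

A term two edits away is close enough for a small edit budget, while an unrelated word such as "thriller" is far outside it.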
▸ Wildcard Search
▸ Genre contains a word ending in "ction" (q=*:*&fq=genre:*ction)
▸ Conditional Logic in search
▸ Genre contains Psychological AND Thriller (q=*:*&fq=genre:(psychological AND Thriller))
▸ Genre contains Psychological OR Thriller (q=*:*&fq=genre:(psychological OR Thriller))
▸ Genre contains Psychological but not Thriller (q=*:*&fq=genre:(psychological NOT Thriller))
▸ Range Search
▸ Movies released between 2005-10-27 and 2006-11-30 (q=*:*&fl=initial_release_date&fq=initial_release_date:[2005-10-27T00:00:00Z TO 2006-11-30T00:00:00Z])
▸ And so on…
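What a date range filter combined with a facet on `directed_by` computes can be reproduced locally on a handful of records. This is a plain-Python illustration of the semantics only, not how Solr evaluates it; the sample films and director names are invented.

```python
from collections import Counter
from datetime import datetime, timezone

def parse_utc(stamp):
    """Parse timestamps of the form 2005-10-27T00:00:00Z."""
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)

def range_filter(docs, field, start, end):
    """Keep documents whose date field lies in the inclusive range [start TO end]."""
    lo, hi = parse_utc(start), parse_utc(end)
    return [d for d in docs if lo <= parse_utc(d[field]) <= hi]

def facet_counts(docs, field):
    """Count documents per value of a field, most frequent first."""
    return Counter(d[field] for d in docs).most_common()

# Hypothetical sample data.
films = [
    {"directed_by": "Director A", "initial_release_date": "2005-11-15T00:00:00Z"},
    {"directed_by": "Director A", "initial_release_date": "2006-03-01T00:00:00Z"},
    {"directed_by": "Director B", "initial_release_date": "2006-06-20T00:00:00Z"},
    {"directed_by": "Director B", "initial_release_date": "2007-01-05T00:00:00Z"},
]

hits = range_filter(films, "initial_release_date",
                    "2005-10-27T00:00:00Z", "2006-11-30T00:00:00Z")
print(facet_counts(hits, "directed_by"))  # [('Director A', 2), ('Director B', 1)]
```

The facet counts are computed over the filtered hits, so the 2007 film does not contribute to Director B's count.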
EXAMPLES OF SEARCH INTERFACE
▸ REST API
▸ http://<host>:<port>/solr/<collection>/query?
▸ APIs such as SolrJ
▸ Solr Admin UI
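Calling the HTTP interface comes down to URL-encoding the parameters and reading the JSON that comes back. The sketch below only builds the URL and parses a canned response, so it runs without a server. The host, port 8983, the collection name `films` and the response values are assumptions for illustration; the response is trimmed to the parts used here.

```python
import json
from urllib.parse import urlencode

def build_query_url(host, port, collection, params):
    """Assemble a query URL; `params` is a list of pairs so that `fq` may repeat."""
    return f"http://{host}:{port}/solr/{collection}/query?{urlencode(params)}"

url = build_query_url(
    "localhost", 8983, "films",
    [("q", "*:*"), ("fq", "genre:action"), ("fq", "-genre:thriller")],
)
print(url)

# Canned response with made-up values, trimmed to the fields read below.
sample = ('{"response": {"numFound": 1, "start": 0, '
          '"docs": [{"id": "film-001", "genre": ["Action"]}]}}')
body = json.loads(sample)
print(body["response"]["numFound"], [d["id"] for d in body["response"]["docs"]])
```

Passing the parameters as a list of pairs rather than a dict is what allows several `fq` filters in one request.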
PRECISION AND RECALL WITH SOLR
WHAT
▸ Results
▸ Relevant Results
▸ More Hits
▸ More Relevant Results at High Rank
HOW
▸ Solr Synonyms
▸ Fuzzy Search
▸ Proximity Search
▸ Phrase Search
▸ Negative Search
▸ Strict Term Presence
▸ Doc Boosting
▸ Index Binary Docs
▸ Multiline Search
▸ Index Many Fields
▸ Search String Limit
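Precision and recall themselves are simple ratios: precision is the share of returned results that are relevant, and recall is the share of relevant documents that were returned. A minimal sketch, with made-up document ids:

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved ids and a set of relevant ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical example: 4 documents returned, 3 documents actually relevant, 2 in common.
p, r = precision_recall([1, 2, 3, 4], [2, 4, 5])
print(p, r)  # precision = 2/4, recall = 2/3
```

Techniques that widen matching, such as synonyms and fuzzy search, tend to raise recall, while stricter ones, such as phrase search and required terms, tend to raise precision.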
CHALLENGES
▸ Transforming domain-specific knowledge into config files such as synonyms, protwords, etc.
  ▸ Proper Solr collection configuration
▸ A slight change in the query string words changes search results considerably
  ▸ Use stemming
▸ High recall
  ▸ Limit search by score
▸ Spaces
  ▸ Custom tokenizer
▸ Spelling mistakes in the query string
  ▸ Fuzzy search and/or spelling checkers
▸ Document field names get indexed
THANK YOU !!