SlideShare ist ein Scribd-Unternehmen logo
1 von 22
Downloaden Sie, um offline zu lesen
How SOLR Search Works
Rajat Jain - 20th Dec, 2016
Agenda
• What do you mean by Search?
• Search Requirements
• Comparison of SOLR with SQL/NoSQL
• SOLR Architecture
• SOLR Usage in Trellis
• How Google Search Works
• Other Search Technologies
What do you mean by Search?
What do you mean by Search?
What do you mean by Search?
Search Requirements
• Text Search – eg. “Architects”
• Filters – eg. “In New Delhi”, “iOS”
• Sorting – eg. “Best Match”, “Highest Rating”, etc.
• And More..
• Facets
• Stemming
• Fuzzy Matching
• Image Search, etc.
Search Requirements
• Full Text Search
• Fast reads (writes can be slower)
• Various Combinations of Filters
• Various Combinations of Sorting
• Non Features:
• Real-time – usually staleness is not a problem
• Data Integrity – usually not a source of storage – can be ‘lossy’
Search Requirements – Faceted Search
• A Type of Filtering with
suggestions
• In most cases – sorted by
number
• Basically helps the user to
narrow down the search without
having to ‘guess’ how to narrow
it
Conventional Storage for Search
• SQL (MySQL)
• Relational Tables
• Normalized Data
• Assuming using Keys / Indexes for reads & writes
• Optimized for reads and writes & transactional data (acid transactions)
• Lots of security, etc.
• Table Data stored in File System
• Indexing - Individual columns – set of columns
• Full Text search – recent addition (full text index)
Conventional Storage for Search
• No SQL (think MongoDB)
• Key Value Pairs
• De-normalized Data
• Unstructured Data
• Optimized for Reads – writes can be slightly slower (in case of transactional)
• Data stored in File System
• Indexing – individual fields
• Full Text Search – has in-built support
Advantages of SOLR over MySQL/NoSQL
• Reversed Index
• Mind-blowing Text-analysis / stemming / scoring / fuzziness
• Weighting fields / boosting – custom scoring functions
• Single document concept – no relations (in general)
• Faceting support out-of-the box
• Optimized for search and search alone (at scale without performance
drop)
SOLR Architecture – Indexing
• Take a ‘document’ / field, etc.
• For each field apply set of filters
/ tokenizers
• Convert to individual tokens
• Update the ‘inverted’ index
based on the tokens
• In general in the Index keep
track of stats, etc. for the various
terms
• Different indexes per field
SOLR Architecture - Indexing
13
XML Update
Handler
CSV Update
Handler
/update /update/csv
XML Update
with custom
processor chain
/update/xml
Extracting
RequestHandler
(PDF, Word, …)
/update/extract
Lucene Index
Data Import
Handler
Database pull
RSS pull
Simple
transforms
SQL DB
RSS
feed
<doc>
<title>
Remove Duplicates
processor
Logging
processor
Index
processor
Custom Transform
processor
PDF
HTTP POST
HTTP POST
pull
pull
Update Processor Chain (per handler)
Lucene
Text Index
Analyzers
SOLR Architecture – Searching
• User enters query
• Parse the query, i.e. apply the
required filters and tokenizers
• Converted to tokens
• Parallel search across multiple
indexes (per field)
• Score all the documents
• Sort in async fashion
SOLR Architecture - Full
SOLR Architecture – Updating Index
• Types of Index Updates
• Instant Index
• Incremental Indexing
• Full Indexing
• Index Update Strategies
• Instant / Incremental Index cannot happen continuously
• Too much causes performance degradation
• Full Index periodically to optimize the index
SOLR Architecture – Scalability
• Sharding
• Splitting collections across servers
– search in parallel
• Replication
• More than one copy of the data
for failover
• SolrCloud
• Using Zookeeper for managing
clusters
SOLR Architecture – Other Features
• Stemming
• Identify root word and variations of the word, eg. "stems", "stemmer",
"stemming", "stemmed" as based on "stem"
• Fuzzy Matching
• Similar Words / Misspellings
• Edit Distance
• NLP
• Identify Entities / Nouns in Search Query
• OpenNLP Plugin for SOLR
• And much more…
SOLR Usage in Trellis
• Architecture
• Data-in from MySQL
• Index Update Strategy
• AutoComplete
• Basic Search
• Advanced Search
• Filters / Sorting / Facets & More
• Demo (Incl. Config Files)
How Google Search Works
• Crawling
• Robots.txt
• Indexing
• Multiple Indexes – Instant / Daily / Weekly / Long Tail
• Searching
• NLP, Stemming, Auto-correct, etc.
• Ranking – PageRank
• Video - https://www.youtube.com/watch?v=BNHR6IQJGZs
Other Search Technologies
• ElasticSearch
• Much newer than Solr
• Built-in scalability
• Uses same Lucene as the base
• JSON instead of XML
• Good for Analytical querying
• Others
• Splunk
• Sphinx
That’s All Folks
References
• SOLR Home Page -
http://lucene.apache.org/solr/
• Tutorials
• http://www.solrtutorial.com/index.h
tml
• https://lucene.apache.org/solr/4_10
_0/tutorial.html
• Just Google the rest!!

Weitere ähnliche Inhalte

Was ist angesagt?

Understanding and tuning WiredTiger, the new high performance database engine...
Understanding and tuning WiredTiger, the new high performance database engine...Understanding and tuning WiredTiger, the new high performance database engine...
Understanding and tuning WiredTiger, the new high performance database engine...Ontico
 
Introduction to Spark with Python
Introduction to Spark with PythonIntroduction to Spark with Python
Introduction to Spark with PythonGokhan Atil
 
An introduction to Elasticsearch's advanced relevance ranking toolbox
An introduction to Elasticsearch's advanced relevance ranking toolboxAn introduction to Elasticsearch's advanced relevance ranking toolbox
An introduction to Elasticsearch's advanced relevance ranking toolboxElasticsearch
 
End-to-End Spark/TensorFlow/PyTorch Pipelines with Databricks Delta
End-to-End Spark/TensorFlow/PyTorch Pipelines with Databricks DeltaEnd-to-End Spark/TensorFlow/PyTorch Pipelines with Databricks Delta
End-to-End Spark/TensorFlow/PyTorch Pipelines with Databricks DeltaDatabricks
 
An Introduction to Elastic Search.
An Introduction to Elastic Search.An Introduction to Elastic Search.
An Introduction to Elastic Search.Jurriaan Persyn
 
Building Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics PrimerBuilding Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics PrimerDatabricks
 
MLflow Model Serving
MLflow Model ServingMLflow Model Serving
MLflow Model ServingDatabricks
 
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Databricks
 
Importance of ML Reproducibility & Applications with MLfLow
Importance of ML Reproducibility & Applications with MLfLowImportance of ML Reproducibility & Applications with MLfLow
Importance of ML Reproducibility & Applications with MLfLowDatabricks
 
Learning to Rank Presentation (v2) at LexisNexis Search Guild
Learning to Rank Presentation (v2) at LexisNexis Search GuildLearning to Rank Presentation (v2) at LexisNexis Search Guild
Learning to Rank Presentation (v2) at LexisNexis Search GuildSujit Pal
 
Performance Acceleration: Summaries, Recommendation, MPP and more
Performance Acceleration: Summaries, Recommendation, MPP and morePerformance Acceleration: Summaries, Recommendation, MPP and more
Performance Acceleration: Summaries, Recommendation, MPP and moreDenodo
 
Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guideRyan Blue
 
Workshop: Learning Elasticsearch
Workshop: Learning ElasticsearchWorkshop: Learning Elasticsearch
Workshop: Learning ElasticsearchAnurag Patel
 
Let's Build an Inverted Index: Introduction to Apache Lucene/Solr
Let's Build an Inverted Index: Introduction to Apache Lucene/SolrLet's Build an Inverted Index: Introduction to Apache Lucene/Solr
Let's Build an Inverted Index: Introduction to Apache Lucene/SolrSease
 
Tutorial on developing a Solr search component plugin
Tutorial on developing a Solr search component pluginTutorial on developing a Solr search component plugin
Tutorial on developing a Solr search component pluginsearchbox-com
 
Hudi architecture, fundamentals and capabilities
Hudi architecture, fundamentals and capabilitiesHudi architecture, fundamentals and capabilities
Hudi architecture, fundamentals and capabilitiesNishith Agarwal
 
Physical Plans in Spark SQL
Physical Plans in Spark SQLPhysical Plans in Spark SQL
Physical Plans in Spark SQLDatabricks
 
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...Databricks
 
Machine learning pipeline with spark ml
Machine learning pipeline with spark mlMachine learning pipeline with spark ml
Machine learning pipeline with spark mldatamantra
 

Was ist angesagt? (20)

Understanding and tuning WiredTiger, the new high performance database engine...
Understanding and tuning WiredTiger, the new high performance database engine...Understanding and tuning WiredTiger, the new high performance database engine...
Understanding and tuning WiredTiger, the new high performance database engine...
 
Introduction to Spark with Python
Introduction to Spark with PythonIntroduction to Spark with Python
Introduction to Spark with Python
 
Apache Flink Deep Dive
Apache Flink Deep DiveApache Flink Deep Dive
Apache Flink Deep Dive
 
An introduction to Elasticsearch's advanced relevance ranking toolbox
An introduction to Elasticsearch's advanced relevance ranking toolboxAn introduction to Elasticsearch's advanced relevance ranking toolbox
An introduction to Elasticsearch's advanced relevance ranking toolbox
 
End-to-End Spark/TensorFlow/PyTorch Pipelines with Databricks Delta
End-to-End Spark/TensorFlow/PyTorch Pipelines with Databricks DeltaEnd-to-End Spark/TensorFlow/PyTorch Pipelines with Databricks Delta
End-to-End Spark/TensorFlow/PyTorch Pipelines with Databricks Delta
 
An Introduction to Elastic Search.
An Introduction to Elastic Search.An Introduction to Elastic Search.
An Introduction to Elastic Search.
 
Building Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics PrimerBuilding Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics Primer
 
MLflow Model Serving
MLflow Model ServingMLflow Model Serving
MLflow Model Serving
 
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
 
Importance of ML Reproducibility & Applications with MLfLow
Importance of ML Reproducibility & Applications with MLfLowImportance of ML Reproducibility & Applications with MLfLow
Importance of ML Reproducibility & Applications with MLfLow
 
Learning to Rank Presentation (v2) at LexisNexis Search Guild
Learning to Rank Presentation (v2) at LexisNexis Search GuildLearning to Rank Presentation (v2) at LexisNexis Search Guild
Learning to Rank Presentation (v2) at LexisNexis Search Guild
 
Performance Acceleration: Summaries, Recommendation, MPP and more
Performance Acceleration: Summaries, Recommendation, MPP and morePerformance Acceleration: Summaries, Recommendation, MPP and more
Performance Acceleration: Summaries, Recommendation, MPP and more
 
Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guide
 
Workshop: Learning Elasticsearch
Workshop: Learning ElasticsearchWorkshop: Learning Elasticsearch
Workshop: Learning Elasticsearch
 
Let's Build an Inverted Index: Introduction to Apache Lucene/Solr
Let's Build an Inverted Index: Introduction to Apache Lucene/SolrLet's Build an Inverted Index: Introduction to Apache Lucene/Solr
Let's Build an Inverted Index: Introduction to Apache Lucene/Solr
 
Tutorial on developing a Solr search component plugin
Tutorial on developing a Solr search component pluginTutorial on developing a Solr search component plugin
Tutorial on developing a Solr search component plugin
 
Hudi architecture, fundamentals and capabilities
Hudi architecture, fundamentals and capabilitiesHudi architecture, fundamentals and capabilities
Hudi architecture, fundamentals and capabilities
 
Physical Plans in Spark SQL
Physical Plans in Spark SQLPhysical Plans in Spark SQL
Physical Plans in Spark SQL
 
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
 
Machine learning pipeline with spark ml
Machine learning pipeline with spark mlMachine learning pipeline with spark ml
Machine learning pipeline with spark ml
 

Ähnlich wie How Solr Search Works

Elasticsearch { "Meetup" : "talk" }
Elasticsearch { "Meetup" : "talk" }Elasticsearch { "Meetup" : "talk" }
Elasticsearch { "Meetup" : "talk" }Lutf Ur Rehman
 
Test driving Azure Search and DocumentDB
Test driving Azure Search and DocumentDBTest driving Azure Search and DocumentDB
Test driving Azure Search and DocumentDBAndrew Siemer
 
Lucene BootCamp
Lucene BootCampLucene BootCamp
Lucene BootCampGokulD
 
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...Joaquin Delgado PhD.
 
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
 RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning... RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...S. Diana Hu
 
Elasticsearch - Scalability and Multitenancy
Elasticsearch - Scalability and MultitenancyElasticsearch - Scalability and Multitenancy
Elasticsearch - Scalability and MultitenancyBozhidar Bozhanov
 
Apache Solr - Enterprise search platform
Apache Solr - Enterprise search platformApache Solr - Enterprise search platform
Apache Solr - Enterprise search platformTommaso Teofili
 
Lucene Bootcamp - 2
Lucene Bootcamp - 2Lucene Bootcamp - 2
Lucene Bootcamp - 2GokulD
 
Roaring with elastic search sangam2018
Roaring with elastic search sangam2018Roaring with elastic search sangam2018
Roaring with elastic search sangam2018Vinay Kumar
 
Elasticsearch tuning
Elasticsearch tuningElasticsearch tuning
Elasticsearch tuningNIKHIL DUBEY
 
Session #2, tech session: Build realtime search by Sylvain Utard from Algolia
Session #2, tech session: Build realtime search by Sylvain Utard from AlgoliaSession #2, tech session: Build realtime search by Sylvain Utard from Algolia
Session #2, tech session: Build realtime search by Sylvain Utard from AlgoliaSaaS Is Beautiful
 
Intro to Apache Solr for Drupal
Intro to Apache Solr for DrupalIntro to Apache Solr for Drupal
Intro to Apache Solr for DrupalChris Caple
 
Tagging search solution design
Tagging search solution designTagging search solution design
Tagging search solution designAlexander Tokarev
 
Tagging search solution design Advanced edition
Tagging search solution design Advanced editionTagging search solution design Advanced edition
Tagging search solution design Advanced editionAlexander Tokarev
 
Oracle Week 2016 - Modern Data Architecture
Oracle Week 2016 - Modern Data ArchitectureOracle Week 2016 - Modern Data Architecture
Oracle Week 2016 - Modern Data ArchitectureArthur Gimpel
 
Solr/Elasticsearch for CF Developers (and others)
Solr/Elasticsearch for CF Developers (and others)Solr/Elasticsearch for CF Developers (and others)
Solr/Elasticsearch for CF Developers (and others)Mary Jo Sminkey
 
Solr Under the Hood at S&P Global- Sumit Vadhera, S&P Global
Solr Under the Hood at S&P Global- Sumit Vadhera, S&P Global Solr Under the Hood at S&P Global- Sumit Vadhera, S&P Global
Solr Under the Hood at S&P Global- Sumit Vadhera, S&P Global Lucidworks
 
How do Solr and Azure Search compare?
How do Solr and Azure Search compare?How do Solr and Azure Search compare?
How do Solr and Azure Search compare?SearchStax
 
Practical Machine Learning for Smarter Search with Spark+Solr
Practical Machine Learning for Smarter Search with Spark+SolrPractical Machine Learning for Smarter Search with Spark+Solr
Practical Machine Learning for Smarter Search with Spark+SolrJake Mannix
 

Ähnlich wie How Solr Search Works (20)

Elasticsearch { "Meetup" : "talk" }
Elasticsearch { "Meetup" : "talk" }Elasticsearch { "Meetup" : "talk" }
Elasticsearch { "Meetup" : "talk" }
 
Test driving Azure Search and DocumentDB
Test driving Azure Search and DocumentDBTest driving Azure Search and DocumentDB
Test driving Azure Search and DocumentDB
 
Lucene BootCamp
Lucene BootCampLucene BootCamp
Lucene BootCamp
 
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
 
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
 RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning... RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
 
Elasticsearch - Scalability and Multitenancy
Elasticsearch - Scalability and MultitenancyElasticsearch - Scalability and Multitenancy
Elasticsearch - Scalability and Multitenancy
 
Apache Solr - Enterprise search platform
Apache Solr - Enterprise search platformApache Solr - Enterprise search platform
Apache Solr - Enterprise search platform
 
Lucene Bootcamp - 2
Lucene Bootcamp - 2Lucene Bootcamp - 2
Lucene Bootcamp - 2
 
Roaring with elastic search sangam2018
Roaring with elastic search sangam2018Roaring with elastic search sangam2018
Roaring with elastic search sangam2018
 
Elasticsearch tuning
Elasticsearch tuningElasticsearch tuning
Elasticsearch tuning
 
Session #2, tech session: Build realtime search by Sylvain Utard from Algolia
Session #2, tech session: Build realtime search by Sylvain Utard from AlgoliaSession #2, tech session: Build realtime search by Sylvain Utard from Algolia
Session #2, tech session: Build realtime search by Sylvain Utard from Algolia
 
Intro to Apache Solr for Drupal
Intro to Apache Solr for DrupalIntro to Apache Solr for Drupal
Intro to Apache Solr for Drupal
 
Solr
SolrSolr
Solr
 
Tagging search solution design
Tagging search solution designTagging search solution design
Tagging search solution design
 
Tagging search solution design Advanced edition
Tagging search solution design Advanced editionTagging search solution design Advanced edition
Tagging search solution design Advanced edition
 
Oracle Week 2016 - Modern Data Architecture
Oracle Week 2016 - Modern Data ArchitectureOracle Week 2016 - Modern Data Architecture
Oracle Week 2016 - Modern Data Architecture
 
Solr/Elasticsearch for CF Developers (and others)
Solr/Elasticsearch for CF Developers (and others)Solr/Elasticsearch for CF Developers (and others)
Solr/Elasticsearch for CF Developers (and others)
 
Solr Under the Hood at S&P Global- Sumit Vadhera, S&P Global
Solr Under the Hood at S&P Global- Sumit Vadhera, S&P Global Solr Under the Hood at S&P Global- Sumit Vadhera, S&P Global
Solr Under the Hood at S&P Global- Sumit Vadhera, S&P Global
 
How do Solr and Azure Search compare?
How do Solr and Azure Search compare?How do Solr and Azure Search compare?
How do Solr and Azure Search compare?
 
Practical Machine Learning for Smarter Search with Spark+Solr
Practical Machine Learning for Smarter Search with Spark+SolrPractical Machine Learning for Smarter Search with Spark+Solr
Practical Machine Learning for Smarter Search with Spark+Solr
 

Mehr von Atlogys Technical Consulting

BDD and Test Automation Tech Talk - Atlogys Academy Series
BDD and Test Automation Tech Talk - Atlogys Academy SeriesBDD and Test Automation Tech Talk - Atlogys Academy Series
BDD and Test Automation Tech Talk - Atlogys Academy SeriesAtlogys Technical Consulting
 
QA Best Practices at Atlogys - Tech Talk (Atlogys Academy)
QA Best Practices at Atlogys - Tech Talk (Atlogys Academy)QA Best Practices at Atlogys - Tech Talk (Atlogys Academy)
QA Best Practices at Atlogys - Tech Talk (Atlogys Academy)Atlogys Technical Consulting
 
Infinite Scaling using Lambda and Aws - Atlogys Tech Talk
Infinite Scaling using Lambda and Aws - Atlogys Tech TalkInfinite Scaling using Lambda and Aws - Atlogys Tech Talk
Infinite Scaling using Lambda and Aws - Atlogys Tech TalkAtlogys Technical Consulting
 
Atlogys - Don’t Just Sell Technology, Sell The Experience!
Atlogys - Don’t Just Sell Technology, Sell The Experience!Atlogys - Don’t Just Sell Technology, Sell The Experience!
Atlogys - Don’t Just Sell Technology, Sell The Experience!Atlogys Technical Consulting
 

Mehr von Atlogys Technical Consulting (20)

Latest UI guidelines for Web Apps
Latest UI guidelines for Web AppsLatest UI guidelines for Web Apps
Latest UI guidelines for Web Apps
 
Discipline at Atlogys
Discipline at AtlogysDiscipline at Atlogys
Discipline at Atlogys
 
Reprogram your mind for Positive Thinking
Reprogram your mind for Positive ThinkingReprogram your mind for Positive Thinking
Reprogram your mind for Positive Thinking
 
Docker @ Atlogys
Docker @ AtlogysDocker @ Atlogys
Docker @ Atlogys
 
Tests for Scalable, Fast, Secure Apps
Tests for Scalable, Fast, Secure AppsTests for Scalable, Fast, Secure Apps
Tests for Scalable, Fast, Secure Apps
 
Atomic Design with PatternLabs
Atomic Design with PatternLabsAtomic Design with PatternLabs
Atomic Design with PatternLabs
 
Git and Version Control at Atlogys
Git and Version Control at AtlogysGit and Version Control at Atlogys
Git and Version Control at Atlogys
 
Guidelines HTML5 & CSS3 - Atlogys (2018)
Guidelines HTML5 & CSS3 - Atlogys (2018)Guidelines HTML5 & CSS3 - Atlogys (2018)
Guidelines HTML5 & CSS3 - Atlogys (2018)
 
Rabbit MQ - Tech Talk at Atlogys
Rabbit MQ - Tech Talk at Atlogys Rabbit MQ - Tech Talk at Atlogys
Rabbit MQ - Tech Talk at Atlogys
 
BDD and Test Automation Tech Talk - Atlogys Academy Series
BDD and Test Automation Tech Talk - Atlogys Academy SeriesBDD and Test Automation Tech Talk - Atlogys Academy Series
BDD and Test Automation Tech Talk - Atlogys Academy Series
 
QA Best Practices at Atlogys - Tech Talk (Atlogys Academy)
QA Best Practices at Atlogys - Tech Talk (Atlogys Academy)QA Best Practices at Atlogys - Tech Talk (Atlogys Academy)
QA Best Practices at Atlogys - Tech Talk (Atlogys Academy)
 
Infinite Scaling using Lambda and Aws - Atlogys Tech Talk
Infinite Scaling using Lambda and Aws - Atlogys Tech TalkInfinite Scaling using Lambda and Aws - Atlogys Tech Talk
Infinite Scaling using Lambda and Aws - Atlogys Tech Talk
 
Wordpress Tech Talk
Wordpress Tech Talk Wordpress Tech Talk
Wordpress Tech Talk
 
Tech Talk on ReactJS
Tech Talk on ReactJSTech Talk on ReactJS
Tech Talk on ReactJS
 
Atlogys Academy - Tech Talk on Mongo DB
Atlogys Academy - Tech Talk on Mongo DBAtlogys Academy - Tech Talk on Mongo DB
Atlogys Academy - Tech Talk on Mongo DB
 
Atlogys Tech Talk - Web 2.0 Design Guidelines
Atlogys Tech Talk - Web 2.0 Design GuidelinesAtlogys Tech Talk - Web 2.0 Design Guidelines
Atlogys Tech Talk - Web 2.0 Design Guidelines
 
Firebase Tech Talk By Atlogys
Firebase Tech Talk By AtlogysFirebase Tech Talk By Atlogys
Firebase Tech Talk By Atlogys
 
Atlogys - Don’t Just Sell Technology, Sell The Experience!
Atlogys - Don’t Just Sell Technology, Sell The Experience!Atlogys - Don’t Just Sell Technology, Sell The Experience!
Atlogys - Don’t Just Sell Technology, Sell The Experience!
 
Smart CTO Service
Smart CTO ServiceSmart CTO Service
Smart CTO Service
 
Atlogys Technical Consulting
Atlogys Technical ConsultingAtlogys Technical Consulting
Atlogys Technical Consulting
 

Kürzlich hochgeladen

Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 

Kürzlich hochgeladen (20)

Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 

How Solr Search Works

  • 1. How SOLR Search Works Rajat Jain - 20th Dec, 2016
  • 2. Agenda • What do you mean by Search? • Search Requirements • Comparison of SOLR with SQL/NoSQL • SOLR Architecture • SOLR Usage in Trellis • How Google Search Works • Other Search Technologies
  • 3. What do you mean by Search?
  • 4. What do you mean by Search?
  • 5. What do you mean by Search?
  • 6. Search Requirements • Text Search – eg. “Architects” • Filters – eg. “In New Delhi”, “iOS” • Sorting – eg. “Best Match”, “Highest Rating”, etc. • And More.. • Facets • Stemming • Fuzzy Matching • Image Search, etc.
  • 7. Search Requirements • Full Text Search • Fast reads (writes can be slower) • Various Combinations of Filters • Various Combinations of Sorting • Non Features: • Real-time – usually staleness is not a problem • Data Integrity – usually not a source of storage – can be ‘lossy’
  • 8. Search Requirements – Faceted Search • A Type of Filtering with suggestions • In most cases – sorted by number • Basically helps the user to narrow down the search without having to ‘guess’ how to narrow it
  • 9. Conventional Storage for Search • SQL (MySQL) • Relational Tables • Normalized Data • Assuming using Keys / Indexes for reads & writes • Optimized for reads and writes & transactional data (acid transactions) • Lots of security, etc. • Table Data stored in File System • Indexing - Individual columns – set of columns • Full Text search – recent addition (full text index)
  • 10. Conventional Storage for Search • No SQL (think MongoDB) • Key Value Pairs • De-normalized Data • Unstructured Data • Optimized for Reads – writes can be slightly slower (in case of transactional) • Data stored in File System • Indexing – individual fields • Full Text Search – has in-built support
  • 11. Advantages of SOLR over MySQL/NoSQL • Reversed Index • Mind-blowing Text-analysis / stemming / scoring / fuzziness • Weighting fields / boosting – custom scoring functions • Single document concept – no relations (in general) • Faceting support out-of-the box • Optimized for search and search alone (at scale without performance drop)
  • 12. SOLR Architecture – Indexing • Take a ‘document’ / field, etc. • For each field apply set of filters / tokenizers • Convert to individual tokens • Update the ‘inverted’ index based on the tokens • In general in the Index keep track of stats, etc. for the various terms • Different indexes per field
  • 13. SOLR Architecture - Indexing 13 XML Update Handler CSV Update Handler /update /update/csv XML Update with custom processor chain /update/xml Extracting RequestHandler (PDF, Word, …) /update/extract Lucene Index Data Import Handler Database pull RSS pull Simple transforms SQL DB RSS feed <doc> <title> Remove Duplicates processor Logging processor Index processor Custom Transform processor PDF HTTP POST HTTP POST pull pull Update Processor Chain (per handler) Lucene Text Index Analyzers
  • 14. SOLR Architecture – Searching • User enters query • Parse the query, i.e. apply the required filters and tokenizers • Converted to tokens • Parallel search across multiple indexes (per field) • Score all the documents • Sort in async fashion
  • 16. SOLR Architecture – Updating Index • Types of Index Updates • Instant Index • Incremental Indexing • Full Indexing • Index Update Strategies • Instant / Incremental Index cannot happen continuously • Too much causes performance degradation • Full Index periodically to optimize the index
  • 17. SOLR Architecture – Scalability • Sharding • Splitting collections across servers – search in parallel • Replication • More than one copy of the data for failover • SolrCloud • Using Zookeeper for managing clusters
  • 18. SOLR Architecture – Other Features • Stemming • Identify root word and variations of the word, eg. "stems", "stemmer", "stemming", "stemmed" as based on "stem" • Fuzzy Matching • Similar Words / Misspellings • Edit Distance • NLP • Identify Entities / Nouns in Search Query • OpenNLP Plugin for SOLR • And much more…
  • 19. SOLR Usage in Trellis • Architecture • Data-in from MySQL • Index Update Strategy • AutoComplete • Basic Search • Advanced Search • Filters / Sorting / Facets & More • Demo (Incl. Config Files)
  • 20. How Google Search Works • Crawling • Robots.txt • Indexing • Multiple Indexes – Instant / Daily / Weekly / Long Tail • Searching • NLP, Stemming, Auto-correct, etc. • Ranking – PageRank • Video - https://www.youtube.com/watch?v=BNHR6IQJGZs
  • 21. Other Search Technologies • ElasticSearch • Much newer than Solr • Built-in scalability • Uses same Lucene as the base • JSON instead of XML • Good for Analytical querying • Others • Splunk • Sphinx
  • 22. That’s All Folks References • SOLR Home Page - http://lucene.apache.org/solr/ • Tutorials • http://www.solrtutorial.com/index.h tml • https://lucene.apache.org/solr/4_10 _0/tutorial.html • Just Google the rest!!