www.edureka.co/apache-solr
Leverage Apache Solr and Lucene to Boost Your Search
View Apache Solr course details at www.edureka.co/apache-solr
For Queries :
Post on Twitter @edurekaIN: #askEdureka
Post on Facebook /edurekaIN
For more details please contact us:
US : 1800 275 9730 (toll free)
INDIA : +91 88808 62004
Email Us : webinars@edureka.co
Slide 2 www.edureka.co/apache-solr
Objectives
At the end of this module, you will be able to understand:
The need for a search engine in enterprise-grade applications
The objectives and challenges of a search engine
What indexing and searching are, and why you need them
How indexing and searching are handled in Lucene
What Solr is and what features it offers
What the Solr schema is and how it is structured
How to address Big Data / NoSQL needs using SolrCloud
How to leverage Solr capabilities with Hadoop
Job opportunities for Solr developers
Slide 3
Why Do I Need Search Engines?
Slide 4
Search Engines: Why Do I Need Them?
1. Text-based search
2. Filter
3. Documents
Slide 5
Search Engine – What Should It Be?
If you need a storage engine to search records or documents using text-based keywords, it should support the following features:
1. Be optimized for fast text searches
2. Have a flexible schema
3. Support sorting of documents
4. Be web scale, that is, optimized for reads
5. Be document oriented
Slide 6
Cleartrip Spatial Search
Slide 7
What is Lucene?
• Lucene is a powerful Java search library that lets you easily add search, or Information Retrieval (IR), to applications
• Used by LinkedIn, Twitter, … and many more (see http://wiki.apache.org/lucene-java/PoweredBy )
• Scalable, high-performance indexing
• Powerful, accurate and efficient search algorithms
• Cross-platform solution
» Open source and 100% pure Java
» Index-compatible implementations are available in other programming languages
Created by Doug Cutting
Slide 8
Indexing – How Does It Work?
Document 1 (“D1”): I like edureka courses
Document 2 (“D2”): Edureka teaches big data courses
Document 3 (“D3”): Edureka helps learn new technologies easily
Inverted index (term = documents containing it):
“edureka” = {D1, D2, D3}
“courses” = {D1, D2}
“teaches” = {D2}
“big” = {D2}
“data” = {D2}
“helps” = {D3}
Example search term: “edureka”
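The term-to-documents mapping on this slide can be reproduced with a few lines of Python. This is only a minimal sketch of the idea, not how Lucene stores its index: the `tokenize` and `build_inverted_index` helpers are hypothetical names, and the tokenizer simply lowercases and splits on whitespace.

```python
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split on whitespace (a deliberately naive analyzer)."""
    return text.lower().split()

def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

# The three documents from the slide.
docs = {
    "D1": "I like edureka courses",
    "D2": "Edureka teaches big data courses",
    "D3": "Edureka helps learn new technologies easily",
}
index = build_inverted_index(docs)
print(sorted(index["edureka"]))  # ['D1', 'D2', 'D3']
print(sorted(index["courses"]))  # ['D1', 'D2']
```

Looking up a term is then a single dictionary access, which is why searching the index is much faster than scanning every document.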
Slide 9
Lucene – Writing to Index
Classes used when indexing documents with Lucene:
Document (made up of Fields) → Analyzer → IndexWriter → Directory
Slide 10
Lucene – Searching in Index
Expression → QueryParser (passes text fragments to the Analyzer) → Query object → IndexSearcher
• QueryParser translates a textual expression from the end user into an arbitrarily complex query for searching
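To make the "textual expression to query object" step concrete, here is a toy parser in Python. It is a sketch under loud assumptions: `TermQuery`, `BooleanQuery` and `parse_query` are stand-in names defined here, not Lucene's classes, and the grammar only supports `field:term` clauses joined by a single AND or OR, with OR assumed when no operator is given.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TermQuery:
    field: str
    term: str

@dataclass(frozen=True)
class BooleanQuery:
    operator: str   # "AND" or "OR"
    clauses: tuple  # tuple of TermQuery objects

def parse_query(expression, default_field="text"):
    """Parse 'field:term' clauses joined by a single operator (AND or OR)."""
    tokens = expression.split()
    operators = {t for t in tokens if t in ("AND", "OR")}
    if len(operators) > 1:
        raise ValueError("this toy parser does not mix AND and OR")
    clauses = []
    for token in tokens:
        if token in ("AND", "OR"):
            continue
        field, sep, term = token.partition(":")
        if not sep:  # no explicit field: fall back to the default field
            field, term = default_field, token
        clauses.append(TermQuery(field, term.lower()))
    if len(clauses) == 1:
        return clauses[0]
    operator = operators.pop() if operators else "OR"
    return BooleanQuery(operator, tuple(clauses))

query = parse_query("title:Solr AND lucene")
print(query)
```

The returned object is a small tree that a searcher could walk, which is the point of the slide: the user types text, the search code works with structured queries.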
Slide 11
Scoring – Score Boosting
A document's weight (score) can be changed from its default; this is called boosting
• Lucene allows influencing search results by "boosting" at different times:
» Index time: call Field.setBoost() before a document is added to the index
» Query time: set a boost on a query clause by calling Query.setBoost()
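The effect of the two kinds of boost can be illustrated with a toy scoring function. This is a sketch only: the formula (term frequency times field boost times query boost) is an assumption chosen for readability and is not Lucene's scoring formula, and the `score` function and sample document are hypothetical.

```python
def score(doc_fields, query_terms, field_boosts=None):
    """Sum term frequency * index-time field boost * query-time term boost."""
    field_boosts = field_boosts or {}
    total = 0.0
    for field, text in doc_fields.items():
        tokens = text.lower().split()
        for term, query_boost in query_terms.items():
            tf = tokens.count(term)
            total += tf * field_boosts.get(field, 1.0) * query_boost
    return total

doc = {"title": "Solr search", "body": "Solr builds on Lucene search"}

# No boosts: every match counts once.
plain = score(doc, {"solr": 1.0, "lucene": 1.0})

# Index-time style boost on the title field, query-time style boost on "lucene".
boosted = score(doc, {"solr": 1.0, "lucene": 3.0}, field_boosts={"title": 2.0})

print(plain, boosted)  # 3.0 6.0
```

A field boost raises every match in that field, while a query boost raises matches for one clause only, so the two can be combined to tune ranking.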
Slide 12
A Search System
The first step in any search engine is indexing
Indexing is the processing of original data into a highly efficient cross-reference lookup in order to facilitate rapid searching
Analyze: a search engine does not index text directly. The text is broken into a series of individual atomic elements called tokens
Searching is the process of consulting the search index and retrieving the documents matching the query, sorted in the requested sort order
Indexing flow: Acquire content → Build document → Analyze document → Index document → Index
Search flow: Search UI → Build query → Run query (against the index) → Render results
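The "Analyze" step can be sketched as a small function that turns raw text into tokens. This is a minimal illustration, assuming a simple regular-expression tokenizer and a tiny hand-picked stop word list; real analyzers are configurable and usually do more (stemming, synonyms and so on).

```python
import re

STOP_WORDS = {"a", "an", "and", "the", "of", "to"}  # illustrative list only

def analyze(text):
    """Break text into lowercase alphanumeric tokens and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

print(analyze("The text is broken into a series of Tokens!"))
# ['text', 'is', 'broken', 'into', 'series', 'tokens']
```

The same analysis is normally applied to queries as well, so that the tokens a user searches for match the tokens stored in the index.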
Slide 13
What is Solr?
Solr is an open source enterprise search server / web application
Solr uses the Lucene search library and extends it
Solr exposes Lucene's Java APIs as RESTful services
You put documents in it (called "indexing") via XML, JSON, CSV or binary over HTTP
You query it via HTTP GET and receive XML, JSON, CSV or binary results
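Indexing over HTTP can be sketched with the Python standard library. The example below only builds the request and does not send it, so it runs without a Solr server. It assumes a local Solr at http://localhost:8983/solr, a hypothetical collection named `demo`, and the JSON form of the `/update` endpoint with `commit=true`; adjust these to your own setup.

```python
import json
from urllib.request import Request

SOLR_BASE = "http://localhost:8983/solr"  # assumed local Solr instance
COLLECTION = "demo"                       # hypothetical collection name

def build_index_request(docs):
    """Build (but do not send) an HTTP POST that would index the given documents."""
    body = json.dumps(docs).encode("utf-8")
    return Request(
        url=f"{SOLR_BASE}/{COLLECTION}/update?commit=true",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

request = build_index_request([{"id": "D1", "text": "I like edureka courses"}])
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would submit it to a running server; that step is left out here on purpose.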
Slide 14
Solr: Key Features
Advanced full-text search capabilities
Optimized for high-volume web traffic
Standards-based open interfaces: XML, JSON and HTTP
Comprehensive HTML administration interfaces
Server statistics exposed over JMX for monitoring
Near real-time indexing
Adaptable with XML configuration
Linearly scalable, with automatic index replication
Extensible plugin architecture
Slide 15
Solr – Who Is Using It?
For more information, go to: http://lucidworks.com/blog/who-uses-lucenesolr/
Slide 16
Solr: Architecture
Slide 17
Solr: Search Process
Components shown: Request Handler, Query Parser, Response Writer, Index
Query parameters:
qt: selects a RequestHandler for a query using /select (by default, the DisMaxRequestHandler is used)
defType: selects a query parser for the query (by default, uses whatever has been configured for the RequestHandler)
qf: selects which fields to query in the index (by default, all fields are required)
wt: selects a response writer for formatting the query response
fq: filters the query by applying an additional query to the initial query's results, and caches the results
rows: specifies the number of rows to be displayed at one time
start: specifies an offset (0 by default) into the query results where the returned response should begin
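The parameters above end up as an ordinary HTTP query string. The sketch below assembles such a URL with the standard library, turning a page number into `start` and `rows`. The collection name `demo` and the field names `text` and `category` are hypothetical, and the URL is only built, not requested.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

SOLR_BASE = "http://localhost:8983/solr"  # assumed local Solr instance
COLLECTION = "demo"                       # hypothetical collection name

def build_select_url(q, fq=None, page=0, page_size=10, wt="json"):
    """Build a /select URL; page N starts at offset N * page_size."""
    params = [("q", q)]
    if fq:
        params.append(("fq", fq))  # filter applied on top of the main query
    params += [("start", page * page_size), ("rows", page_size), ("wt", wt)]
    return f"{SOLR_BASE}/{COLLECTION}/select?{urlencode(params)}"

url = build_select_url("text:edureka", fq="category:courses", page=2, page_size=5)
print(url)
print(parse_qs(urlsplit(url).query))
```

Keeping the filter in `fq` rather than folding it into `q` lets the server cache the filter separately from the main query, as the slide notes.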
Slide 18
Velocity Search UI / Solritas
• Solr includes a sample search UI based on the VelocityResponseWriter (also known as Solritas) that demonstrates several useful features, such as:
» Searching
» Faceting
» Highlighting
» Autocomplete
» Geospatial searching
You can access the Velocity sample Search UI here:
http://localhost:8983/solr/browse
Slide 19
Faceting
• Faceting is the arrangement of search results into categories based on indexed terms
• Searchers are presented with the indexed terms, along with numerical counts of how many matching documents were found for each term
• Faceting makes it easy for users to explore search results, narrowing in on exactly the results they are looking for
Slide 20
Faceting
• A category is an aspect of indexed documents which can be used to classify the documents
» For example, in a collection of books at an online bookstore, categories of a book can be its price, author, publication date, binding type, and so on
Slide 21
Faceting
• In faceted search, in addition to the standard set of search results, we also get facet results, which are lists of subcategories for certain categories
» For example, for the price facet, we get a list of relevant price ranges; for the author facet, we get a list of relevant authors; and so on. In most UIs, when users click one of these subcategories, the search is narrowed, or drilled down, and a new search limited to this subcategory (e.g., to a specific price range or author) is performed
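Facet counts and drill-down can be mimicked in plain Python over an in-memory result set. This is only a sketch of the concept: the book records are made-up sample data, and `facet_counts` and `drill_down` are hypothetical helpers, not Solr calls.

```python
from collections import Counter

# Made-up search results for a bookstore query.
books = [
    {"title": "Book A", "author": "Smith", "binding": "paperback"},
    {"title": "Book B", "author": "Smith", "binding": "hardcover"},
    {"title": "Book C", "author": "Jones", "binding": "paperback"},
    {"title": "Book D", "author": "Smith", "binding": "paperback"},
]

def facet_counts(docs, field):
    """Count how many matching documents fall under each value of a field."""
    return Counter(doc[field] for doc in docs)

def drill_down(docs, field, value):
    """Narrow the result set to documents with the chosen facet value."""
    return [doc for doc in docs if doc[field] == value]

print(facet_counts(books, "author"))      # Counter({'Smith': 3, 'Jones': 1})
narrowed = drill_down(books, "author", "Smith")
print(facet_counts(narrowed, "binding"))  # Counter({'paperback': 2, 'hardcover': 1})
```

Clicking a facet value in a UI corresponds to the `drill_down` step, after which the counts are recomputed for the narrowed results.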
Slide 22
Demo
Slide 23
SolrCloud
• Apache Solr includes the ability to set up a cluster of Solr servers that combines fault tolerance and high availability; this capability is called SolrCloud
• SolrCloud provides flexible distributed search and indexing, without a master node to allocate nodes, shards and replicas
• Solr uses ZooKeeper to manage these locations, depending on configuration files and schemas
• Documents can be sent to any server and ZooKeeper will figure it out
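One way to picture how a document finds its shard is hash-based routing: a stable hash of the document id decides where it lives, so any node can compute the destination. The sketch below is an illustration under assumptions, not SolrCloud's actual routing code; it uses CRC32 and a modulo over a hypothetical three-shard cluster.

```python
import zlib

def route_to_shard(doc_id, num_shards):
    """Pick a shard from a stable hash of the document id."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards

NUM_SHARDS = 3  # hypothetical cluster size
shards = {n: [] for n in range(NUM_SHARDS)}
for doc_id in ["D1", "D2", "D3", "D4", "D5", "D6"]:
    shards[route_to_shard(doc_id, NUM_SHARDS)].append(doc_id)

print(shards)
```

Because the hash is deterministic, the same id always maps to the same shard, which is what lets a client send a document to any server in the cluster.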
Slide 24
Architecture
Slide 25
Leveraging Solr Capabilities with Hadoop
• Solr provides fast, efficient, powerful full-text search and near real-time indexing, while SolrCloud adds flexible distributed search and indexing, along with features such as automatic failover
• Hence it is very suitable as a NoSQL replacement for traditional databases in many situations, especially when the size of the data exceeds what is reasonable with a typical RDBMS
• We can do scalable indexing using a Hadoop MapReduce or Pig job and then load the indexed data into Solr
• All the major Hadoop distributions, such as Cloudera, Hortonworks and MapR, let you integrate Solr easily
Slide 26
Scalable Indexing
[Diagram: raw files (PDF, Word, HTML, ...) are the input data stored in HDFS (Hadoop Distributed File System); a MapReduce indexing job indexes the raw files with Lucene; several Solr instances serve the indexed files and answer queries from a search web app.]
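The MapReduce indexing job can be approximated in a single Python process to show the shape of the computation. This is a sketch, not Hadoop code: `map_phase` and `reduce_phase` are hypothetical functions, the raw files are the three sample sentences from the indexing slide, and the sort stands in for the shuffle between the two phases.

```python
from collections import defaultdict
from itertools import chain

def map_phase(doc_id, text):
    """Emit one (term, doc_id) pair per distinct term in a raw document."""
    return [(term, doc_id) for term in set(text.lower().split())]

def reduce_phase(pairs):
    """Group document ids by term to form sorted postings lists."""
    postings = defaultdict(list)
    for term, doc_id in sorted(pairs):  # sorting plays the role of the shuffle
        postings[term].append(doc_id)
    return dict(postings)

raw_files = {
    "D1": "I like edureka courses",
    "D2": "Edureka teaches big data courses",
    "D3": "Edureka helps learn new technologies easily",
}
pairs = chain.from_iterable(map_phase(d, t) for d, t in raw_files.items())
index = reduce_phase(pairs)
print(index["courses"])  # ['D1', 'D2']
```

Each map call touches only one document, so the map phase can run in parallel across many machines, and that is where the scalability on the slide comes from.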
Slide 27
Job trends for Apache Solr