2. Presentation Agenda
Team Introduction
Basics and History
Use Cases & Current Usage
Highlights
From the web
Migration
Appendix
DISCLAIMER: This is a knowledge-sharing session and not a recommendation for any specific technology / product.
Distributed Database Architecture 2
4. Basics
• Used for indexing and searching
• Built on top of the Lucene API
• Lucene: a search engine packaged as a set of JAR files
• Solr and ES take the Lucene API and build features on top; the API is accessed through a web server
• Like a smaller version of Google, which has indexed and ranked web pages
• Search platform for web sites; search platform for organizations
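To make "indexing and searching" concrete, here is a minimal sketch of an inverted index, the basic structure behind this kind of search. It is a toy in plain Python, not Lucene code; the function names and sample documents are made up for illustration.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; real analyzers do far more."""
    return text.lower().split()


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every query term (AND search)."""
    postings = [index.get(term, set()) for term in tokenize(query)]
    if not postings:
        return set()
    return set.intersection(*postings)


# Hypothetical sample documents.
docs = {
    1: "Solr is built on Lucene",
    2: "ElasticSearch is built on Lucene",
    3: "Kibana draws dashboards",
}
index = build_index(docs)
print(sorted(search(index, "built lucene")))
```

Indexing pays the cost up front so that a query only has to intersect a few small sets instead of scanning every document.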
5. History
• Solr was released in 2008.
• ES was released in 2010, with additional features.
• The two differ in design and architecture.
6. Key Players: Solr and ElasticSearch
Solr
• Latest version: Solr 4.6.1, released on Jan 28, 2014
• Main logical structure: Collection
• Architecture: distributed; fault tolerant with automatic replicas
• Coordination: Apache Solr + ZooKeeper ensemble, so a quorum is needed
• Leader per shard, with automatic leader election

ElasticSearch (ES)
• Latest version: ElasticSearch 1.0.0, released on Feb 12, 2014
• Main logical structure: Index
• Architecture: distributed; fault tolerant with automatic replicas
• Coordination: ElasticSearch nodes only, with zen discovery; split brain is a risk
• Single leader, with automatic leader election
7. Resume recommendations – Use Case 1
Challenge
• Company ABC helps other firms hire skilled developers and project managers, and empowers customers to find the right job candidate in a database of 8 million profiles.
• Need fast and predictable performance.
• Include geo-spatial search.
Success
• Customers hire through Company ABC.
• ABC stores the searches made by customers.
• Identify candidates, skills and compensation structures to enhance the customer search experience with better matches.
• Make recommendations to customers on salaries, future market needs, etc.
• Eliminate duplicate profiles with realtime indexing and percolation.
• Provide an enhanced customer experience and faster responses.
Opportunity
• Use ES as the search engine with realtime indexing
and nested querying.
8. Integration - Use Case 2
The full circle:
• Logstash: takes logs, then scrubs, parses and enriches the data
• ElasticSearch: searches and analyzes in real time
• Kibana: visualization engine for dynamic dashboards created in real time or on the fly
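The scrub, parse and enrich stages can be pictured with a short sketch. This is plain Python standing in for the idea, not Logstash configuration; the log format, the field names and the host name are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical log format: "<ISO timestamp> <LEVEL> <message>"
LINE = re.compile(r"^(?P<timestamp>\S+)\s+(?P<level>[A-Z]+)\s+(?P<message>.*)$")


def parse(line):
    """Parse one raw line into a dict, or return None if it does not match."""
    match = LINE.match(line.strip())
    return match.groupdict() if match else None


def enrich(event, host):
    """Add fields that were not in the raw line (here: host and an error flag)."""
    enriched = dict(event)
    enriched["host"] = host
    enriched["is_error"] = event["level"] in {"ERROR", "FATAL"}
    return enriched


def count_by_level(events):
    """The kind of aggregation a dashboard panel would chart."""
    return Counter(event["level"] for event in events)


raw = [
    "2014-03-13T10:00:00Z INFO service started",
    "garbage line",
    "2014-03-13T10:00:05Z ERROR connection refused",
]
# Scrub: lines that fail to parse are dropped before enrichment.
events = [enrich(e, host="web-01") for e in map(parse, raw) if e is not None]
print(count_by_level(events))
```

In the real stack the enriched events would be indexed into ElasticSearch, and the counting would be done by a query that Kibana issues.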
9. Chatagent for 460 million documents – Use Case 3
Challenge
6,000 customers around the world, from one-person businesses to international organizations such as LG, Apple and Adobe, use LiveChat daily to communicate with their own customers.
LiveChat customers run 3.6 million queries and 220 million “get” operations per day on 460 million documents. LiveChat keeps these documents updated with 70 million indexing operations every day.
Solution
• Reduce query time from 2 seconds to 100 ms
• Streamline updating from hours to seconds
• Guarantee maximum uptime
• Scale to meet the needs of 6,000 customers
• Store and search on 460 million documents
• Process 3.6 million queries per day
Advantage
• Scalability and indexing; full-text search allows users to search through chat archives
• Faceting makes it possible to pull various statistics for LiveChat clients
• ES acts as the single datastore, so data updates are available immediately. Each document in LiveChat is now updated on average 20 to 30 times every 20 to 60 seconds.
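To put the daily figures above in perspective, the short calculation below converts them into average per-second rates. The daily totals come from the slide; the assumption of an even spread over 24 hours is ours, and real traffic will peak well above these averages.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Daily totals as quoted in the use case.
daily_operations = {
    "queries": 3_600_000,
    "get operations": 220_000_000,
    "indexing operations": 70_000_000,
}


def per_second(per_day):
    """Average rate over a full day; peak load will be higher."""
    return per_day / SECONDS_PER_DAY


for name, per_day in daily_operations.items():
    print(f"{name}: about {per_second(per_day):,.0f} per second on average")
```

That works out to roughly 42 queries, 2,500 gets and 800 indexing operations per second on average.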
10. Current Uses
• Use Case 1
• Use Case 2
• Use Case 3
• Use Case 4
• Use Case X
11. Highlights
• Schema and config: solrconfig.xml (Solr) and elasticsearch.yml (ES); changing the number of shards and replicas live
• Scaling: nodes are auto-balanced; shard splitting in Solr (SOLR-3755); adding a document
• Nesting (address, users & rights, boolean, parent-child)
• Index: different types of documents and analyzers
• Node discovery and fault discovery; ZooKeeper
• Multiple documents per schema, and parent-child relationships
• Percolator
• Aggregations and facets in ES; facets in Solr
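Of the highlights above, the percolator is the least self-explanatory: it reverses the usual flow by storing queries first and then asking which of them match an incoming document. The sketch below shows only that idea in plain Python; `ToyPercolator`, `make_term_query` and the sample profile are hypothetical and are not the ElasticSearch API.

```python
def make_term_query(field, value):
    """A stored query: matches documents whose field contains the value."""
    def matches(doc):
        return value.lower() in str(doc.get(field, "")).lower()
    return matches


class ToyPercolator:
    """Stores queries up front, then reports which ones match a new document."""

    def __init__(self):
        self.queries = {}

    def register(self, query_id, query):
        self.queries[query_id] = query

    def percolate(self, doc):
        return sorted(qid for qid, query in self.queries.items() if query(doc))


percolator = ToyPercolator()
percolator.register("java-devs", make_term_query("skills", "java"))
percolator.register("pm-roles", make_term_query("title", "project manager"))

# Hypothetical incoming document.
profile = {"title": "Senior Developer", "skills": "Java, Solr, Lucene"}
print(percolator.percolate(profile))
```

This is the pattern behind alerting and behind the duplicate-profile check mentioned in Use Case 1: a new document is tested against saved searches as it arrives.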
12. Highlights (contd. 2)
• Automatic load balancing and auto-sharding
• Marvel metrics on 03/13/2014
• Split-brain problem in ES
• Structured Query DSL and query control
• Real-time indexing / near real-time indexing
• Query routing; SOLR-5816 to be introduced
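Query routing in the list above can be sketched as hashing a routing key to pick a shard, so that documents indexed with a key can later be searched on a single shard. The snippet is a generic illustration: `shard_for`, the shard count and the use of `crc32` are assumptions for the example, not the hash function of either product.

```python
import zlib


def shard_for(routing_key, num_shards):
    """Pick a shard from a stable hash of the routing key.

    crc32 is an assumption chosen for the sketch, not the hash either product uses.
    """
    return zlib.crc32(routing_key.encode("utf-8")) % num_shards


NUM_SHARDS = 5  # hypothetical shard count

# Index time: all documents for one customer share a routing key.
shard_at_index_time = shard_for("customer-42", NUM_SHARDS)

# Query time: the same key leads to the same shard, so only one shard is searched.
shard_at_query_time = shard_for("customer-42", NUM_SHARDS)

print(shard_at_index_time == shard_at_query_time)
```

With this kind of modulo placement, changing the shard count changes where keys land, which is one reason shard splitting needs explicit support.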
13. ElasticSearch / Solr funnel
Solr
• UIMA
• Text analysis debugger, spell check
• Decision tree faceting / drilldown
• Cloudera, MapR and DataStax support Solr
ElasticSearch
• Filters for queries across nested documents
• Query handling: analyzer and language, term suggester, autocomplete
• Realtime GET with query routing
• Hortonworks and Couchbase support ElasticSearch
14. FROM THE WEB
Web CPA
FYI only: we found accounts of customers moving from Solr to ElasticSearch, but could not find any article describing clients moving from ES to Solr.
Caveat: no prejudice is intended, and it would be good to hear what customers say.
Let us also check this site, which chose Solr: http://www.ymc.ch/en/why-we-chose-solr-4-0-instead-of-elasticsearch
http://www.mgt-commerce.com/magento-elasticsearch.html
Foursquare = http://engineering.foursquare.com/2012/08/09/foursquare-now-uses-elastic-search-and-on-a-related-note-slashem-also-works-with-elastic-search/
Jetwick = http://karussell.wordpress.com/2011/02/07/why-jetwick-moved-from-solr-to-elasticsearch/
Netricos = http://www.netricos.com/blog/posts/how-we-are-using-elastic-search
Stumbleupon = http://www.elasticsearch.org/case-study/stumbleupon/
UK govt. site = https://gds.blog.gov.uk/2012/08/03/from-solr-to-elasticsearch/
Wikimedia = http://thenextweb.com/insider/2014/01/06/wikimedia-will-replace-search-elasticsearch-beta-users-february-users-march-april/#!xDKnd
15. 2 Parts of a whole – The Math
Solr
• Solr performs very well on small indexes that don’t change very often.
ElasticSearch
• Scalability, auto-sharding, GUI admin, schemaless operation, real-time search, nested queries, routing, and the way indexing and queries are handled (which provides faster query execution and better indexing) give ES a distinct advantage.
16. Migration
Step 1
Use the river plugin to migrate from the existing Solr cluster to ES.
Step 2
The river pulls the content from the existing Solr cluster and indexes it in ES.
Step 3
When you decide to switch to Elasticsearch permanently, switch your indexing to index content directly from your sources into Elasticsearch. Keeping Solr in the middle is not a recommended setup.
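For a feel of what "pull from Solr and index in ES" involves, the sketch below converts one page of a Solr JSON select response into the newline-delimited body that the ElasticSearch bulk API accepts. It is a hand-rolled illustration, not the river plugin; `solr_docs_to_bulk`, the index and type names and the sample page are hypothetical, and no network calls are made.

```python
import json


def solr_docs_to_bulk(solr_response, index, doc_type, id_field="id"):
    """Turn one page of a Solr JSON select response into an ES bulk request body."""
    lines = []
    for doc in solr_response["response"]["docs"]:
        # Drop Solr's internal version field; it has no meaning in ES.
        source = {k: v for k, v in doc.items() if k != "_version_"}
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc[id_field]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    # The bulk format is newline-delimited JSON and must end with a newline.
    return "\n".join(lines) + "\n" if lines else ""


# A hand-written sample shaped like Solr's wt=json output (hypothetical data).
sample_page = {
    "response": {
        "numFound": 2,
        "start": 0,
        "docs": [
            {"id": "1", "title": "first profile", "_version_": 1},
            {"id": "2", "title": "second profile", "_version_": 2},
        ],
    }
}

body = solr_docs_to_bulk(sample_page, index="profiles", doc_type="profile")
print(body)
```

A full migration would page through the Solr results and post each body to the bulk endpoint; that loop is left out so the example stays runnable offline.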
17. Conclusion
• If we have a small site and need search features without the distributed bells and whistles, both Solr and ElasticSearch are efficient.
• If we are planning a large installation that requires distributed search with nesting, scalability, sharding and real-time search, ElasticSearch can do a better job.
• Both products are trying to catch up with each other’s capabilities.
18. Where do we go from here?
The best way to define this is:
Some possible next steps...
Question to ask