Elasticsearch is a distributed, open-source search and analytics engine that supports full-text search over structured and unstructured data. It is built on top of Apache Lucene and stores data as JSON documents. Elasticsearch can index, search, and analyze large volumes of data in near real time. It is horizontally scalable, fault tolerant, and easy to deploy and administer.
2. Agenda
• Overview
– History, Product overview
– ES Vocabulary
– Feature set
• Demo
– Setup/Configuration
– Ecosystem
– APIs for indexing, searching, and monitoring
3. What is Elasticsearch?
– Document (JSON) oriented search engine
– Distributed
– Horizontally scalable and Highly Available
– Multi-tenancy enabled
– API centric & RESTful
– Built on Lucene search engine library
Used for:
– Full-text search, structured search, analytics, or all three in combination
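A minimal sketch of what "document oriented" and "RESTful" mean in practice: indexing a document is an HTTP request carrying JSON. The host and port (localhost:9200), the index name "blog", the type "post", and the helper build_index_request are assumptions for illustration. The /index/type/id URL shape matches releases that still have mapping types. The request is only built here, not sent.

```python
import json
from urllib.request import Request

# Assumption: a local node listening on the default HTTP port 9200.
BASE_URL = "http://localhost:9200"


def build_index_request(index, doc_type, doc_id, document):
    """Build (but do not send) a PUT request that would index one JSON document."""
    url = f"{BASE_URL}/{index}/{doc_type}/{doc_id}"
    body = json.dumps(document).encode("utf-8")
    return Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical index, type, id, and document.
request = build_index_request("blog", "post", "1", {"title": "Hello", "tags": ["intro"]})
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the request to urllib.request.urlopen would send it to a running node.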
4. • Elasticsearch has become a de facto search solution
• A few popular examples:
• GitHub uses Elasticsearch to query 130 billion lines of
code.
• Wikipedia uses Elasticsearch to provide full-text search with highlighted search snippets, plus search-as-you-type and did-you-mean suggestions.
• Stack Overflow combines full-text search with
geolocation queries and uses more-like-this to find
related questions and answers.
5. History
Doug Cutting (@cutting)
• Started Lucene in 1999; it became a top-level Apache project in 2005.
• Now part of Cloudera, which supports the rival solution Solr and commercial offerings.
Shay Banon (@kimchy)
• Released Elasticsearch in February 2010.
• Had worked on it for 6 years (started with Compass).
• Now part of the commercial offerings at http://elastic.co
6. Building Blocks
Term: Description (~ analogy with a relational database)
• Cluster (~ database cluster): a group of nodes.
• Node (~ instance of a database): a JVM process, usually one per machine.
• Index (~ database schema): hosts mapping types and their definitions; contains many shards.
• Mapping type (~ database table): field descriptions and indexing requirements.
• Document (~ database row): a JSON document.
• Shard: a Lucene index; the scalable unit and heart of the search engine (primary and replica).
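The building blocks above can be sketched as a small data model. This is an illustration only: the class names mirror the vocabulary on this slide, while make_index and place are hypothetical helpers. The round-robin placement is a toy rule, not Elasticsearch's real shard allocator.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Shard:
    number: int
    primary: bool  # False means this copy is a replica


@dataclass
class Index:
    name: str
    mapping_types: Dict[str, Dict[str, str]]  # type name -> {field name: field type}
    shards: List[Shard] = field(default_factory=list)


@dataclass
class Node:
    name: str
    shards: List[Shard] = field(default_factory=list)


@dataclass
class Cluster:
    name: str
    nodes: List[Node] = field(default_factory=list)


def make_index(name, mapping_types, primaries, replicas_per_primary):
    """Create an index with one primary and N replica copies per shard number."""
    shards = []
    for number in range(primaries):
        shards.append(Shard(number, primary=True))
        shards.extend(Shard(number, primary=False) for _ in range(replicas_per_primary))
    return Index(name, mapping_types, shards)


def place(cluster, index):
    """Toy round-robin placement so copies of one shard land on different nodes."""
    copies_seen = {}
    for shard in index.shards:
        copy_no = copies_seen.get(shard.number, 0)
        copies_seen[shard.number] = copy_no + 1
        node = cluster.nodes[(shard.number + copy_no) % len(cluster.nodes)]
        node.shards.append(shard)


cluster = Cluster("demo", [Node("node-1"), Node("node-2")])
index = make_index("blog", {"post": {"title": "string"}}, primaries=2, replicas_per_primary=1)
place(cluster, index)
for node in cluster.nodes:
    print(node.name, [(s.number, "primary" if s.primary else "replica") for s in node.shards])
```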
10. Value add over Lucene
• Distributed
– Combines results using fork-join across multiple indexes, with the new building blocks
• Transaction log
– Guarantees durability: operations are automatically replayed when a shard is reopened
– Simplifies shard relocation and recovery: when a shard moves from one node to another, changes can be replayed while committed segments are transferred
• Flush/Refresh/Monitor APIs
– For managing cluster, node, and index status
• Query DSL
– Provides a rich grammar for expressing searches
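The fork-join idea in the "Distributed" point can be sketched as follows: fork one query to every shard, let each shard return its local top hits, then join by merging them into a global top list. The in-memory SHARDS data and the term-count scoring are assumptions for illustration; real scoring and transport are far more involved.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "shards": each maps a document id to its text.
SHARDS = [
    {"a1": "quick brown fox", "a2": "lazy dog"},
    {"b1": "quick quick fox", "b2": "brown bear"},
    {"c1": "fox fox fox", "c2": "quick start guide"},
]


def search_shard(shard, term, size):
    """Score one shard locally: score = number of times the term occurs."""
    hits = []
    for doc_id, text in shard.items():
        score = text.split().count(term)
        if score > 0:
            hits.append((score, doc_id))
    return heapq.nlargest(size, hits)


def search(term, size=2):
    """Fork the query to every shard, then join by merging the per-shard top hits."""
    with ThreadPoolExecutor(max_workers=len(SHARDS)) as pool:
        partials = list(pool.map(lambda shard: search_shard(shard, term, size), SHARDS))
    merged = [hit for partial in partials for hit in partial]
    return heapq.nlargest(size, merged)


print(search("fox"))
```

Each shard only ships its own top hits, so the merge step stays small even when the shards hold many documents.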
12. Document Metadata Fields
• _id: the ID of the document
• _type: the document type
• _source (enabled by default): stores the original document that was indexed
• _all (enabled by default): indexes all values of all document fields
• _timestamp (disabled by default): timestamp associated with the document
• _ttl (disabled by default): optionally defines an expiration time
• _size (disabled by default): indexes the size of the uncompressed
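A sketch of how these metadata fields could be switched on or off in a mapping body. It assumes the older mapping format in which each metadata field takes an "enabled" flag under a mapping type; several of these fields were removed in later releases. The helper build_mapping and the type name "post" are hypothetical.

```python
import json


def build_mapping(doc_type, all_enabled=True, source_enabled=True,
                  timestamp_enabled=False, ttl_enabled=False, size_enabled=False):
    """Build a mapping body; the keyword defaults mirror the defaults listed on the slide."""
    return {
        "mappings": {
            doc_type: {
                "_all": {"enabled": all_enabled},
                "_source": {"enabled": source_enabled},
                "_timestamp": {"enabled": timestamp_enabled},
                "_ttl": {"enabled": ttl_enabled},
                "_size": {"enabled": size_enabled},
            }
        }
    }


# Example: turn off _all and turn on _timestamp for a hypothetical "post" type.
mapping = build_mapping("post", all_enabled=False, timestamp_enabled=True)
print(json.dumps(mapping, indent=2))
```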
16. Search Types
• COUNT
– Returns no hits, only the total count matching the query, and thus executes in a single round trip to the shards
• SCAN
– Iterates over large amounts of data using a cursor to paginate, which makes it memory efficient; helpful for re-indexing and for decorating data outside ES
• SEARCH
– General search
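A sketch of how the three search types could be requested. It assumes the older REST API in which search_type=count and search_type=scan (with a scroll timeout) are query-string parameters on _search; newer releases dropped these values. The helper build_search, the index "blog", and the match query are hypothetical. Only the path and body are built; nothing is sent.

```python
import json
from urllib.parse import urlencode


def build_search(index, query, search_type=None, scroll=None, size=None):
    """Return (path, body) for a search request; nothing is sent over the network."""
    params = {}
    if search_type is not None:
        params["search_type"] = search_type
    if scroll is not None:
        params["scroll"] = scroll
    if size is not None:
        params["size"] = size
    path = f"/{index}/_search"
    if params:
        path += "?" + urlencode(params)
    return path, json.dumps({"query": query})


# Hypothetical Query DSL body: match documents whose title mentions "elasticsearch".
query = {"match": {"title": "elasticsearch"}}

count_path, _ = build_search("blog", query, search_type="count")
scan_path, _ = build_search("blog", query, search_type="scan", scroll="1m", size=100)
search_path, body = build_search("blog", query)

print(count_path)
print(scan_path)
print(search_path, body)
```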
21. A few interesting features
• Bulk Indexing
– Send multiple docs to ES in a single request
• Multi Get APIs
– Get multiple documents in a single API call
• Percolator
– The idea is to have ES notify your application when new content matches your filters,
instead of constantly polling the search engine to check for new updates. Great
for building alerts
• Pagination
• Highlighting
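A sketch of the request bodies behind three of the features above: bulk indexing, multi get, and pagination. The bulk body is newline-delimited JSON (one action line followed by one document line, ending with a newline), multi get takes a "docs" list, and pagination uses "from" and "size". The helper names, the index "blog", and the type "post" are hypothetical; the "_type" key assumes a release that still has mapping types.

```python
import json


def bulk_body(index, doc_type, docs):
    """Build a newline-delimited bulk body from a dict of id -> document."""
    lines = []
    for doc_id, doc in docs.items():
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


def mget_body(index, doc_type, ids):
    """Build a multi-get body that fetches several documents in one call."""
    return {"docs": [{"_index": index, "_type": doc_type, "_id": i} for i in ids]}


def page(page_number, page_size):
    """Translate a 1-based page number into from/size search parameters."""
    return {"from": (page_number - 1) * page_size, "size": page_size}


bulk = bulk_body("blog", "post", {"1": {"title": "a"}, "2": {"title": "b"}})
print(bulk, end="")
print(mget_body("blog", "post", ["1", "2"]))
print(page(3, 10))
```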
25. Configuration
• Enabling store compression uses 55% less storage (LZF/Snappy)
• Disabling the '_all' field saves 13% in storage
• Removing _source saves ~26% of storage on disk
• Set ES_HEAP_SIZE to ½ of the machine's memory, leaving the rest for the OS file cache
• Setting bootstrap.mlockall to true avoids swapping
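The heap-size rule above is simple arithmetic; a rough sketch follows. ES_HEAP_SIZE and bootstrap.mlockall come from the slide; the helper names and the 16 GB machine are assumptions, and the megabyte suffix assumes the value is handed to the JVM as a memory size.

```python
def suggest_heap_mb(total_memory_mb):
    """Half of the machine memory for the heap; the rest is left to the OS file cache."""
    return total_memory_mb // 2


def config_settings(total_memory_mb):
    """Return the two settings from the slide as (where it is set, value) pairs."""
    heap_mb = suggest_heap_mb(total_memory_mb)
    return [
        ("environment", f"ES_HEAP_SIZE={heap_mb}m"),
        ("elasticsearch.yml", "bootstrap.mlockall: true"),
    ]


# Hypothetical machine with 16 GB of RAM.
for where, setting in config_settings(16 * 1024):
    print(f"{where}: {setting}")
```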
26. References
• https://www.youtube.com/watch?v=5444z-L2V2A&spfreload=1 - “Lucene now and then” from Lucene creator Doug Cutting @ Twitter; gives the history of Lucene and how it evolved.
• https://www.youtube.com/watch?v=lpZ6ZajygDY - From Elasticsearch creator Shay Banon (it is 3 years old, but it has very good content on data design patterns).
• https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html - Official Elasticsearch documentation.
• https://www.manning.com/books/elasticsearch-in-action - Source of the diagrams used in this presentation.