Elasticsearch is a distributed, open source search and analytics engine built on Apache Lucene. It allows storing and searching of documents of any schema in JSON format. Documents are organized into indexes which can have multiple shards and replicas for scalability and high availability. Elasticsearch provides a RESTful API and can be easily extended with plugins. It is widely used for full-text search, structured search, analytics and more in applications requiring real-time search and analytics of large volumes of data.
2. Topics to cover
• Elasticsearch and its introduction
– Cluster
– Node
– Index
– Shards
   • Primary
   • Secondary
• Installation
• Setup and configuration
– Data Node
– Master Node
– Serving Node
• Queries
– Various queries
3. What is Elasticsearch?
• Elasticsearch is a search server based on
Lucene. It provides a distributed, multitenant-capable full-text search engine with a RESTful
web interface and schema-free JSON
documents. Elasticsearch is developed in Java
and is released as open source under the
terms of the Apache License.
4. What is Apache Lucene
• Apache Lucene™ is a high-performance, full-featured text search engine library written
entirely in Java. It is a technology suitable for
nearly any application that requires full-text
search, especially cross-platform.
5. Features
• Real time analytics
• Distributed
• High availability
– Automatic discovery of peers in a cluster
• Multi-tenant architecture
• Full-text search
• Document oriented
• Schema free
• RESTful API
• Per-operation persistence
• Easy to extend with a plugin system for new functionality
7. Document
$ curl -XGET http://localhost:9200/gems/document/pry-0.5.9
In Elasticsearch, everything is stored as a document.
Documents can be addressed and retrieved by querying their attributes.
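As a rough sketch of the addressing scheme used by the curl request on this slide, a document URL is the server address followed by the index, the type and the id. The helper name `document_url` is hypothetical, and the snippet only builds the URL; it does not contact a server.

```python
from urllib.parse import quote


def document_url(base, index, doc_type, doc_id):
    """Build the URL of a single document: <base>/<index>/<type>/<id>."""
    # Percent-encode each path segment so an id containing "/" or a space
    # still occupies exactly one segment of the path.
    parts = [quote(str(p), safe="") for p in (index, doc_type, doc_id)]
    return base.rstrip("/") + "/" + "/".join(parts)


# Same document as in the curl example (index "gems", type "document").
print(document_url("http://localhost:9200", "gems", "document", "pry-0.5.9"))
```

Issuing an HTTP GET against the resulting URL corresponds to the `curl -XGET` call shown on the slide.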
8. Document Types
Document types let us specify document properties, so that we can differentiate between kinds of objects.
Shard
Each Shard is a separate native Lucene Index.
9. Replica
An exact copy of a primary shard. Replicas help provide
high availability and increase query throughput.
10. Index
• Elasticsearch stores its data in logical indices.
Think of an index as roughly comparable to a table, a collection or a database.
• An index has at least 1 primary shard, and 0 or more replicas.
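To make the shard arithmetic concrete: each replica is a full copy of a primary shard, so an index with P primary shards and R replicas per primary holds P × (1 + R) shard copies in total. The sketch below computes that number and builds an index-creation body using the `number_of_shards` and `number_of_replicas` settings. The helper names and the example values (5 primaries, 1 replica) are illustrative assumptions, not taken from the slides.

```python
import json


def total_shard_copies(primaries, replicas_per_primary):
    """Number of shard copies an index occupies across the cluster."""
    if primaries < 1:
        raise ValueError("an index needs at least 1 primary shard")
    if replicas_per_primary < 0:
        raise ValueError("the replica count cannot be negative")
    # Every primary exists once, plus one extra copy per replica.
    return primaries * (1 + replicas_per_primary)


def index_settings(primaries, replicas_per_primary):
    """Request body that could be sent when creating an index."""
    return {
        "settings": {
            "number_of_shards": primaries,
            "number_of_replicas": replicas_per_primary,
        }
    }


print(total_shard_copies(5, 1))
print(json.dumps(index_settings(5, 1), indent=2))
```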
11. Cluster
A collection of cooperating Elasticsearch nodes.
A cluster gives better availability and performance through
index sharding and replicas.
12. Installation
• Download and unzip the latest Elasticsearch
distribution
– http://www.elasticsearch.org/download/
• Run bin/elasticsearch -f on Unix,
or bin/elasticsearch.bat on Windows
• Run curl -X GET http://localhost:9200/
Note: Elasticsearch is built using Java, and requires at least Java 6 in order to run.
15. How to add Index
• To index a document, we decide on an index name ("movies"), a type name
("movie") and an id ("1"), and make a PUT request to /index/type/id
with the JSON object in the body.
curl -XPUT "http://localhost:9200/movies/movie/1" -d'
{
"title": "The Godfather",
"director": "Francis Ford Coppola",
"year": 1972
}'
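The same indexing request can be sketched in Python with only the standard library. The helper name `index_request` is hypothetical, and the explicit `Content-Type` header is an addition that does not appear in the curl command above. The request object is only constructed here; nothing is sent.

```python
import json
from urllib.request import Request


def index_request(base, index, doc_type, doc_id, document):
    """Prepare (but do not send) a PUT request that indexes one document."""
    url = f"{base.rstrip('/')}/{index}/{doc_type}/{doc_id}"
    body = json.dumps(document).encode("utf-8")
    # Assumption: the server accepts a JSON body labelled as application/json.
    return Request(url, data=body, method="PUT",
                   headers={"Content-Type": "application/json"})


movie = {"title": "The Godfather", "director": "Francis Ford Coppola", "year": 1972}
req = index_request("http://localhost:9200", "movies", "movie", "1", movie)
print(req.get_method(), req.full_url)
# To actually send it against a running node: urllib.request.urlopen(req)
```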
16. The _search endpoint
• http://serverName:9200/_search
– Search across all indexes and all types.
• http://serverName:9200/indexname/_search
– Search across all types in the indexname index.
• http://serverName:9200/indexname/post/_search
– Search explicitly for documents of type post within the indexname index.
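The three URL forms differ only in how much of the path is filled in before `_search`. A minimal sketch, using a hypothetical helper named `search_url`, shows how the scope narrows as an index and then a type are supplied:

```python
def search_url(base, index=None, doc_type=None):
    """Build a _search URL scoped to everything, one index, or one type."""
    if doc_type is not None and index is None:
        raise ValueError("a type can only be given together with an index")
    # Keep only the path segments that were actually supplied.
    scope = [part for part in (index, doc_type) if part is not None]
    return "/".join([base.rstrip("/")] + scope + ["_search"])


print(search_url("http://serverName:9200"))                       # all indexes, all types
print(search_url("http://serverName:9200", "indexname"))          # one index, all types
print(search_url("http://serverName:9200", "indexname", "post"))  # one index, one type
```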
17. Basic Queries Using Only the Query String
{endpoint}/_search?q=fashion&size=5
e.g. http://fullservername.com/_search?q=fashion&size=5
curl -XGET {endpoint}/_search -d 'Query-as-JSON'
For example:
curl -XGET {endpoint}/_search -d '{
"query" : {
"term" : { "user": "kimchy" }
}
}'
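The two request styles on this slide, a query passed in the URL and a query passed as a JSON body, can be sketched as follows. The helper names `query_string_search_url` and `term_query` are hypothetical; the snippet only builds the URL and the body and does not send anything.

```python
import json
from urllib.parse import urlencode


def query_string_search_url(endpoint, text, size=None):
    """Build {endpoint}/_search?q=...[&size=...] with proper URL encoding."""
    params = {"q": text}
    if size is not None:
        params["size"] = size
    return endpoint.rstrip("/") + "/_search?" + urlencode(params)


def term_query(field, value):
    """Request body matching documents whose field contains the exact term."""
    return {"query": {"term": {field: value}}}


print(query_string_search_url("http://fullservername.com", "fashion", size=5))
print(json.dumps(term_query("user", "kimchy")))
```

Using `urlencode` rather than string concatenation keeps search text that contains spaces or `&` from corrupting the URL.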
18. Match all / Find Everything
{
"query": {
"match_all": {}
}
}
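A match-all body is often combined with a result limit so that "everything" does not mean an unbounded response. The sketch below builds such a body; the helper name `match_all_query` is hypothetical, and placing `size` in the request body is shown here as one option alongside the `size` URL parameter used earlier.

```python
import json


def match_all_query(size=None):
    """Request body that matches every document, optionally capped at `size` hits."""
    body = {"query": {"match_all": {}}}
    if size is not None:
        body["size"] = size
    return body


print(json.dumps(match_all_query(size=5), indent=2))
```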