Learning
ElasticSearch
—
Fifth Elephant 2013, Bangalore.
Anurag Patel Red Hat
Also available at http://xinh.org/5el
ElasticWho?
ElasticSearch is a flexible and powerful open source, distributed
real-time search and analytics engine.
Features
Real time analytics
Distributed
High availability
Multi tenant architecture
Full text
Document oriented
Schema free
RESTful API
Per-operation persistence
Distributed
Start small and scale horizontally out of the box. For more capacity,
just add more nodes and let the cluster reorganize itself.
High Availability
ElasticSearch clusters detect and remove failed nodes, and
reorganize themselves.
Multi Tenancy
A cluster can host multiple indices which can be queried
independently, or as a group.
$ curl -XPUT http://localhost:9200/people
$ curl -XPUT http://localhost:9200/gems
$ curl -XPUT http://localhost:9200/gems/document/pry-0.5.9
$ curl -XGET http://localhost:9200/gems/document/pry-0.5.9
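Querying indices "as a group" can be illustrated with a small sketch. Elasticsearch accepts a comma-separated list of index names in the search path; the code below only builds such URLs and does not contact a server. The helper name search_url is made up for this example.

```python
def search_url(base, indices):
    """Build a _search URL for one index or for several queried as a group."""
    # Several index names are joined with commas in the URL path.
    return "{}/{}/_search".format(base.rstrip("/"), ",".join(indices))


print(search_url("http://localhost:9200", ["gems"]))
print(search_url("http://localhost:9200", ["people", "gems"]))
```

The resulting URL can be passed to curl in the same way as the single-index commands above.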
Document Oriented
Store complex real world entities in Elasticsearch as structured JSON
documents.
{
"_id": "pry-0.5.9",
"_index": "gems",
"_source": {
"authors": [
"John Mair (banisterfiend)"
],
"autorequire": null,
"bindir": "bin",
"cert_chain": [],
"date": "Sun Feb 20 11:00:00 UTC 2011",
"default_executable": null,
"description": "attach an irb-like session to any object at runtime",
"email": "jrmair@gmail.com"
}
}
RESTful API
Almost any operation can be performed through a simple RESTful
interface, using JSON over HTTP.
curl -X GET
curl -X PUT
curl -X POST
curl -X DELETE
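The same four verbs can be used from any HTTP client. A minimal sketch with Python's standard library follows; it only prepares the request object and sends nothing. The helper es_request and the default base URL are assumptions made for this illustration.

```python
import json
import urllib.request


def es_request(method, path, body=None, base="http://localhost:9200"):
    """Prepare (but do not send) an HTTP request with an optional JSON body."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(base + path, data=data, headers=headers, method=method)


req = es_request("PUT", "/gems")
print(req.get_method(), req.full_url)
# Sending it would be urllib.request.urlopen(req), which needs a running node.
```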
Apache Lucene
ElasticSearch is built on top of Apache Lucene. Lucene is a high
performance, full-featured Information Retrieval library, written in
Java.
ElasticSearch Terminology
Document
$ curl -XGET http://localhost:9200/gems/document/pry-0.5.9
In ElasticSearch, everything is stored as a Document. Documents can
be addressed and retrieved by querying their attributes.
{
"_id": "pry-0.5.9",
"_index": "gems",
"_source": {
"authors": [
"John Mair (banisterfiend)"
],
"autorequire": null,
"bindir": "bin",
"cert_chain": [],
"date": "Sun Feb 20 11:00:00 UTC 2011",
"default_executable": null,
"description": "attach an irb-like session to any object at runtime",
"email": "jrmair@gmail.com",
"executables": [
"pry"
],
"extensions": [],
"extra_rdoc_files": [],
"files": [
"lib/pry/commands.rb",
"lib/pry/command_base.rb",
"lib/pry/completion.rb",
"lib/pry/core_extensions.rb",
"lib/pry/hooks.rb",
"lib/pry/print.rb",
"lib/pry/prompts.rb",
"lib/pry/pry_class.rb",
"lib/pry/pry_instance.rb",
"lib/pry/version.rb",
"lib/pry.rb",
"examples/example_basic.rb",
Document Types
A Document Type lets us specify document properties, so we can tell
different kinds of objects apart.
Shard
Each Shard is a separate native Lucene Index. Sharding lets an Index
grow beyond the RAM and hard disk capacity of a single node.
Replica
An exact copy of a primary Shard. Replicas help in setting up HA and
increase query throughput.
Index
ElasticSearch stores its data in logical Indices. Think of a table,
collection or a database.
An Index has at least 1 primary Shard, and 0 or more Replicas.
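The arithmetic behind primaries and replicas can be sketched as follows, assuming the replica count applies to every primary shard. The function name shard_copies is made up for this example.

```python
def shard_copies(primaries, replicas):
    """Total shard copies for one index: every primary plus its replicas."""
    if primaries < 1 or replicas < 0:
        raise ValueError("need at least 1 primary shard and 0 or more replicas")
    return primaries * (1 + replicas)


print(shard_copies(6, 0))  # 6 primaries, no replicas
print(shard_copies(5, 1))  # 5 primaries, one replica of each
```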
Cluster
A collection of cooperating ElasticSearch nodes. Gives better
availability and performance via Index Sharding and Replicas.
ElasticSearch Workshop
Download and start
Download ElasticSearch from
http://www.elasticsearch.org/download
# service elasticsearch start
# /etc/init.d/elasticsearch start
# ./bin/elasticsearch -f
ElasticSearch Plugins
elasticsearch-head is a site plugin for viewing the contents of an
ElasticSearch cluster. After installing it, restart ElasticSearch;
plugins are detected and loaded on service startup.
# cd /usr/share/elasticsearch
# ./bin/plugin -install mobz/elasticsearch-head
# cd /opt/elasticsearch-0.90.2
# ./bin/plugin -install mobz/elasticsearch-head
elasticsearch-head
RESTful interface
$ curl -XGET 'http://localhost:9200/'
{
"ok" : true,
"status" : 200,
"name" : "Drake, Frank",
"version" : {
"number" : "0.90.2",
"snapshot_build" : false,
"lucene_version" : "4.3.1"
},
"tagline" : "You Know, for Search"
}
Create Index
$ curl -XPUT 'http://localhost:9200/gems'
{
"ok":true,
"acknowledged":true
}
Cluster status
$ curl -XGET 'localhost:9200/_status'
{"ok":true,"_shards":{"total":20,"successful":10,"failed":0},
"indices":{"gems":{"index":{"primary_size":"495b","primary_size_in_bytes":495,
"size":"495b","size_in_bytes":495},"translog":{"operations":0},
"docs":{"num_docs":0,"max_doc":0,"deleted_docs":0},"merges":
{"current":0,"current_docs":0,"current_size":"0b","current_size_in_bytes":0,
"total":0,"total_time":"0s","total_time_in_millis":0,"total_docs":0,
"total_size":"0b","total_size_in_bytes":0},
...
...
...
Pretty Output
$ curl -XGET 'localhost:9200/_status?pretty'
$ curl -XGET 'localhost:9200/_status' | python -mjson.tool
$ curl -XGET 'localhost:9200/_status' | json_reformat
{
"ok": true,
"_shards": {
"total": 20,
"successful": 10,
"failed": 0
},
"indices": {
"gems": {
"index": {
"primary_size": "495b",
"primary_size_in_bytes": 495,
"size": "495b",
"size_in_bytes": 495
},
...
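What the json.tool pipe does can be approximated in a few lines of Python. The sample string is a shortened piece of the status output, and sorting the keys is a choice made for this sketch.

```python
import json

raw = '{"ok":true,"_shards":{"total":20,"successful":10,"failed":0}}'

# Parse the compact response and re-serialize it with indentation.
pretty = json.dumps(json.loads(raw), indent=4, sort_keys=True)
print(pretty)
```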
Delete Index
$ curl -XDELETE 'http://localhost:9200/gems'
{
"ok":true,
"acknowledged":true
}
Create custom Index
{
"settings" : {
"index" : {
"number_of_shards" : 6,
"number_of_replicas" : 0
}
}
}
$ curl -XPUT 'http://localhost:9200/gems' -d @body.json
{
"ok":true,
"acknowledged":true
}
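One way to produce body.json without hand-editing is to serialize a dictionary. This is only a sketch; the file name matches the curl command above, and writing the file is left commented out.

```python
import json

settings = {
    "settings": {
        "index": {
            "number_of_shards": 6,
            "number_of_replicas": 0,
        }
    }
}

body = json.dumps(settings, indent=2)
print(body)
# To use it with curl as above, save it first:
# with open("body.json", "w") as f:
#     f.write(body)
```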
Index a document
{
"name": "pry",
"platform": "ruby",
"rubygems_version": "1.5.2",
"description": "attach an irb-like session to any object at runtime",
"email": "anurag@example.com",
"has_rdoc": true,
"homepage": "http://banisterfiend.wordpress.com"
}
$ curl -XPOST 'http://localhost:9200/gems/test/' -d @body.json
{
"ok":true,
"_index":"gems",
"_type":"test",
"_id":"lsJgxiwET6eg",
"_version":1
}
Get document
$ curl -XGET 'http://localhost:9200/gems/test/lsJgxiwET6eg' | python -mjson.tool
{
"_id": "lsJgxiwET6eg",
"_index": "gems",
"_source": {
"description": "attach an irb-like session to any object at runtime",
"email": "anurag@example.com",
"has_rdoc": true,
"homepage": "http://banisterfiend.wordpress.com",
"name": "pry",
"platform": "ruby",
"rubygems_version": "1.5.2"
},
"_type": "test",
"_version": 1,
"exists": true
}
Index another document
{
"name": "grit",
"platform": "jruby",
"rubygems_version": "2.5.0",
"description": "Ruby library for extracting information from a git repository.",
"email": "mojombo@github.com",
"has_rdoc": false,
"homepage": "http://github.com/mojombo/grit"
}
$ curl -XPOST 'http://localhost:9200/gems/test/' -d @body.json
{
"ok":true,
"_index":"gems",
"_type":"test",
"_id":"ijUOHi2cQc2",
"_version":1
}
Custom Document IDs
A document's unique identifier within an Index is composed of its Document Type and ID.
{
"name": "grit",
"platform": "jruby",
"rubygems_version": "2.5.1",
"description": "Ruby library for extracting information from a git repository.",
"email": "mojombo@github.com",
"has_rdoc": false,
"homepage": "http://github.com/mojombo/grit"
}
$ curl -XPUT 'http://localhost:9200/gems/test/grit-2.5.1' -d @body.json
{
"ok":true,
"_index":"gems",
"_type":"test",
"_id":"grit-2.5.1",
"_version":1
}
Document Versions
$ curl -XPUT 'http://localhost:9200/gems/test/grit-2.5.1' -d @body.json
{
"ok":true,
"_index":"gems",
"_type":"test",
"_id":"grit-2.5.1",
"_version":2
}
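The version bump on re-indexing the same Document Type and ID can be modelled with a toy in-memory store. TinyIndex is a made-up class for illustration and says nothing about how Elasticsearch stores data internally.

```python
class TinyIndex:
    """In-memory stand-in for an index; NOT how Elasticsearch stores data."""

    def __init__(self):
        self._docs = {}  # (doc_type, doc_id) -> (version, source)

    def put(self, doc_type, doc_id, source):
        key = (doc_type, doc_id)
        # A new key starts at version 1; an existing key is overwritten and bumped.
        version = self._docs[key][0] + 1 if key in self._docs else 1
        self._docs[key] = (version, dict(source))
        return {"_type": doc_type, "_id": doc_id, "_version": version}


index = TinyIndex()
first = index.put("test", "grit-2.5.1", {"name": "grit"})
second = index.put("test", "grit-2.5.1", {"name": "grit"})
print(first["_version"], second["_version"])
```

The same ID under a different Document Type would start again at version 1 in this model, because the key is the pair of type and ID.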
Searching Documents
{
"query": {
"term": {"name": "pry"}
}
}
$ curl -XPOST http://localhost:9200/gems/_search -d @body.json | python -mjson.tool
{
"_shards": {
"failed": 0,
"successful": 6,
"total": 6
},
"hits": {
"hits": [
{
"_id": "MWkKgzsMRgK",
"_index": "gems",
"_score": 1.4054651,
"_source": {
"description": "attach an irb-like session to any object at runtime",
"email": "anurag@example.com",
"has_rdoc": true,
"homepage": "http://banisterfiend.wordpress.com",
"name": "pry",
"platform": "ruby",
"rubygems_version": "1.5.2"
},
"_type": "test"
}
],
"max_score": 1.4054651,
"total": 1
Counting Documents
{
"term": {"name": "pry"}
}
$ curl -XGET http://localhost:9200/gems/test/_count -d @body.json
{
"_shards": {
"failed": 0,
"successful": 6,
"total": 6
},
"count": 1
}
Update a Document
The partial document is merged into the existing document using a simple recursive merge.
{
"doc": {
"platform": "macruby"
}
}
$ curl -XPOST http://localhost:9200/gems/test/grit-2.5.1/_update -d @body.json
{
"ok":true,
"_index":"gems",
"_type":"test",
"_id":"grit-2.5.1",
"_version":4
}
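A recursive merge of a partial document can be sketched like this. It illustrates the idea only and is not Elasticsearch's own code; the nested "meta" field is invented to show the recursion.

```python
def recursive_merge(target, partial):
    """Return a new dict with `partial` merged into `target`, nested dicts included."""
    merged = dict(target)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are objects: merge them field by field.
            merged[key] = recursive_merge(merged[key], value)
        else:
            # Anything else simply replaces the old value.
            merged[key] = value
    return merged


stored = {"name": "grit", "platform": "jruby", "meta": {"has_rdoc": False, "lang": "ruby"}}
update = {"platform": "macruby", "meta": {"has_rdoc": True}}
print(recursive_merge(stored, update))
```

Fields missing from the partial document, such as "name" and "meta.lang" here, are kept as they were.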
Update via Script
{
"script" : "ctx._source.platform = vm_name",
"params" : {
"vm_name" : "rubinius"
}
}
$ curl -XPOST http://localhost:9200/gems/test/grit-2.5.1/_update -d @body.json
{
"ok":true,
"_index":"gems",
"_type":"test",
"_id":"grit-2.5.1",
"_version":5
}
Delete Document
$ curl -XDELETE 'http://localhost:9200/gems/test/grit-2.5.1'
{
"ok":true,
"found":true,
"_index":"gems",
"_type":"test",
"_id":"grit-2.5.1",
"_version":6
}
Put Mapping
{
"gem" : {
"properties" : {
"name" : {"type" : "string", "index": "not_analyzed"},
"platform" : {"type" : "string", "index": "not_analyzed"},
"rubygems_version" : {"type" : "string", "index": "not_analyzed"},
"description" : {"type" : "string", "store" : "yes"},
"has_rdoc" : {"type" : "boolean"}
}
}
}
$ curl -XPUT 'http://localhost:9200/gems/gem/_mapping' -d @body.json
$ curl -XGET 'http://localhost:9200/gems/_mapping' | python -mjson.tool
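The difference between an analyzed field and a not_analyzed field can be sketched roughly as follows. The analyze function is a crude stand-in (lowercase, split on non-alphanumeric characters) and does not reproduce any real Elasticsearch analyzer.

```python
import re


def analyze(text):
    """Very rough stand-in for an analyzer: lowercase, split on non-alphanumerics."""
    return [token for token in re.split(r"[^0-9a-z]+", text.lower()) if token]


def index_terms(value, analyzed=True):
    # A not_analyzed field is indexed as one exact term.
    return analyze(value) if analyzed else [value]


description = "Ruby library for extracting information from a git repository."
print(index_terms(description, analyzed=True))
print(index_terms("2.5.1", analyzed=False))
```

With the mapping above, a term query on "rubygems_version" has to supply the exact stored value, while "description" can be matched word by word.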
Index Document with Mapping
{
"name": "grit",
"platform": "ruby",
"rubygems_version": "2.5.1",
"description": "Ruby library for extracting information from a git repository.",
"email": "mojombo@github.com",
"has_rdoc": false,
"homepage": "http://github.com/mojombo/grit"
}
$ curl -XPUT 'http://localhost:9200/gems/gem/grit-2.5.1' -d @body.json
{
"ok":true,
"_index":"gems",
"_type":"gem",
"_id":"grit-2.5.1",
"_version":1
}
Matching documents
{
"query": {
"match" : {
"description" : "git repository"
}
}
}
$ curl -XPOST http://localhost:9200/gems/gem/_search -d @body.json
Highlighting
{
"query": {
"match" : {
"description" : "git repository"
}
},
"highlight" : {
"fields" : {
"description" : {}
}
}
}
$ curl -XPOST http://localhost:9200/gems/gem/_search -d @body.json
"highlight": {
"description": [
"Ruby library for extracting information from a <em>git</em> <em>repository</em>."
]
}
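The effect of highlighting can be imitated with a regular expression that wraps each matched query word in tags. This is a simplified sketch; real highlighting works on analyzed terms, and the function name highlight is made up here.

```python
import re


def highlight(text, query, pre="<em>", post="</em>"):
    """Wrap every whole-word occurrence of a query term in highlight tags."""
    terms = [re.escape(term) for term in query.lower().split()]
    if not terms:
        return text
    pattern = re.compile(r"\b(" + "|".join(terms) + r")\b", re.IGNORECASE)
    return pattern.sub(lambda match: pre + match.group(0) + post, text)


description = "Ruby library for extracting information from a git repository."
print(highlight(description, "git repository"))
```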
Search Facets
{
"query": { "match_all" : {} },
"facets" : {
"gem_names" : {
"terms" : { "field": "name" }
}
}
}
$ curl -XPOST http://localhost:9200/gems/_search -d @body.json
...
"facets": {
"gem_names": {
"_type": "terms",
"missing": 0,
"other": 0,
"terms": [
{
"count": 2,
"term": "pry"
},
{
"count": 2,
"term": "grit"
},
{
"count": 1,
"term": "abc"
}
],
"total": 5
}
},
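What a terms facet computes can be sketched with a counter over a field. The five sample documents are made up to mirror the counts in the response above.

```python
from collections import Counter


def terms_facet(docs, field):
    """Count how many documents carry each value of `field`."""
    values = [doc[field] for doc in docs if field in doc]
    counts = Counter(values)
    return {
        "missing": len(docs) - len(values),
        "total": len(values),
        "terms": [{"term": t, "count": c} for t, c in counts.most_common()],
    }


docs = [{"name": "pry"}, {"name": "grit"}, {"name": "pry"}, {"name": "grit"}, {"name": "abc"}]
print(terms_facet(docs, "name"))
```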
(Lab)
Analyzing Aadhaar's Datasets
Download Public Dataset
Download from Aadhaar Public Data Portal at
https://data.uidai.gov.in
Download Tools
$ git clone https://github.com/gnurag/aadhaar
Prepare Data & Configure
# gem install yajl-ruby tire activesupport
$ git clone https://github.com/gnurag/aadhaar
$ cd aadhaar/data
$ unzip UIDAI-ENR-DETAIL-20121001.zip
$ cd ../bin
$ vi aadhaar.rb
Configuration
AADHAAR_DATA_DIR = "/path/to/aadhaar/data"
ES_URL = "http://localhost:9200"
ES_INDEX = 'aadhaar'
ES_TYPE = "UID"
BATCH_SIZE = 1000
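BATCH_SIZE suggests that records are sent in groups rather than one at a time. The script itself is not shown here, so the following is only a guess at the general idea: split the records into batches and build one newline-delimited body per batch, in the shape the _bulk endpoint accepts. The sample records and both function names are made up.

```python
import json


def batches(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def bulk_body(docs, index, doc_type):
    """Build a newline-delimited body: one action line, then one source line per doc."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"


records = [{"n": i} for i in range(2500)]  # made-up records
sizes = [len(batch) for batch in batches(records, 1000)]
print(sizes)
print(bulk_body(records[:1], "aadhaar", "UID"))
```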
Index
$ ruby aadhaar.rb
Running Examples
$ curl -XPOST http://localhost:9200/aadhaar/UID/_search -d @template.json | python -mjson.tool
Additional Notes
Index Aliases
Group multiple Indexes, and query them together.
curl -XPOST 'http://localhost:9200/_aliases' -d '
{
"actions" : [
{ "add" : { "index" : "index1", "alias" : "master-alias" } },
{ "add" : { "index" : "index2", "alias" : "master-alias" } }
]
}'
curl -XPOST 'http://localhost:9200/_aliases' -d '
{
"actions" : [
{ "remove" : { "index" : "index2", "alias" : "master-alias" } }
]
}'
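Generating the alias body with a JSON library avoids hand-written syntax slips such as a missing comma between actions. A minimal sketch follows; the helper alias_actions is an assumption for illustration, and the output is what would be passed to curl with -d.

```python
import json


def alias_actions(alias, add=(), remove=()):
    """Build the request body for the _aliases endpoint."""
    actions = [{"add": {"index": name, "alias": alias}} for name in add]
    actions += [{"remove": {"index": name, "alias": alias}} for name in remove]
    return json.dumps({"actions": actions}, indent=2)


print(alias_actions("master-alias", add=["index1", "index2"]))
print(alias_actions("master-alias", remove=["index2"]))
```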
Document Routing
Control which Shard a document is placed on and queried from.
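Routing boils down to hashing a routing value and taking it modulo the number of primary shards. The sketch below uses zlib.crc32 purely as a stand-in hash, so the shard numbers it prints will not match a real cluster. By default the routing value is the document ID; a custom value keeps related documents on the same shard.

```python
import zlib


def pick_shard(routing_value, number_of_shards):
    """Map a routing value to a shard number with hash-modulo placement."""
    # zlib.crc32 is a stand-in; Elasticsearch uses its own hash function.
    return zlib.crc32(routing_value.encode("utf-8")) % number_of_shards


for doc_id in ["pry-0.5.9", "grit-2.5.1"]:
    print(doc_id, "->", pick_shard(doc_id, 6))
```

Because the result depends on the shard count, the number of primary shards of an index cannot be changed freely after documents have been placed.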
Parents & Children
$ curl -XPUT 'http://localhost:9200/gems/gem/roxml?parent=rexml' -d '{
"tag" : "something"
}'
Custom Analyzers
Boosting Search Results
ElasticSearch Ecosystem
A wide range of site plugins, analyzers, and river plugins is available
from the community.
THE END/@gnurag github

Workshop: Learning Elasticsearch

  • 1. Learning ElasticSearch — Fifth Elephant 2013, Bangalore. Anurag Patel Red Hat
  • 3. ElasticWho? ElasticSearch is a flexible and powerful open source, distributed real-time search and analytics engine.
  • 4. Features Real time analytics Distributed High availability Multi tenant architecture Full text Document oriented Schema free RESTful API Per-operation persistence
  • 5. Distributed Start small and scale horizontally out of the box. For more capacity, just add more nodes and let the cluster reorganize itself.
  • 6. High Availability ElasticSearch clusters detect and remove failed nodes, and reorganize themselves.
  • 7. Multi Tenancy A cluster can host multiple indices which can be queried independently, or as a group. $ curl -XPUT http://localhost:9200/people $ curl -XPUT http://localhost:9200/gems $ curl -XPUT http://localhost:9200/gems/document/pry-0.5.9 $ curl -XGET http://localhost:9200/gems/document/pry-0.5.9
  • 8. Document Oriented Store complex real world entities in Elasticsearch as structured JSON documents. { "_id": "pry-0.5.9", "_index": "gems", "_source": { "authors": [ "John Mair (banisterfiend)" ], "autorequire": null, "bindir": "bin", "cert_chain": [], "date": "Sun Feb 20 11:00:00 UTC 2011", "default_executable": null, "description": "attach an irb-like session to any object at runtime", "email": "jrmair@gmail.com" } }
  • 9. RESTful API Almost any operation can be performed using a simple RESTful interface using JSON over HTTP. curl -X GET curl -X PUT curl -X POST curl -X DELETE
  • 10. Apache Lucene ElasticSearch is built on top of Apache Lucene. Lucene is a high performance, full-featured Information Retrieval library, written in Java.
  • 12. Document $ curl -XGET http://localhost:9200/gems/document/pry-0.5.9 In ElasticSearch, everything is stored as a Document. Document can be addressed and retrieved by querying their attributes. { "_id": "pry-0.5.9", "_index": "gems", "_source": { "authors": [ "John Mair (banisterfiend)" ], "autorequire": null, "bindir": "bin", "cert_chain": [], "date": "Sun Feb 20 11:00:00 UTC 2011", "default_executable": null, "description": "attach an irb-like session to any object at runtime", "email": "jrmair@gmail.com", "executables": [ "pry" ], "extensions": [], "extra_rdoc_files": [], "files": [ "lib/pry/commands.rb", "lib/pry/command_base.rb", "lib/pry/completion.rb", "lib/pry/core_extensions.rb", "lib/pry/hooks.rb", "lib/pry/print.rb", "lib/pry/prompts.rb", "lib/pry/pry_class.rb", "lib/pry/pry_instance.rb", "lib/pry/version.rb", "lib/pry.rb", "examples/example_basic.rb",
  • 13. Document Types Lets us specify document properties, so we can differentiate the objects.
  • 14. Shard Each Shard is a separate native Lucene Index. Lets us overcome RAM limitations, hard disk capacity.
  • 15. Replica An exact copy of primary Shard. Helps in setting up HA, increases query throughput.
  • 16. Index ElasticSearch stores its data in logical Indices. Think of a table, collection or a database. An Index has atleast 1 primary Shard, and 0 or more Replicas.
  • 17. Cluster A collection of cooperating ElasticSearch nodes. Gives better availability and performance via Index Sharding and Replicas.
  • 19. Download and start Download ElasticSearch from http://www.elasticsearch.org/download # service elasticsearch start # /etc/init.d/elasticsearch start # ./bin/elasticsearch -f
  • 20. ElasticSearch Plugins
    elasticsearch-head is a site plugin to view the contents of an ElasticSearch cluster. Install it from the ElasticSearch directory:
    # cd /usr/share/elasticsearch
    # ./bin/plugin -install mobz/elasticsearch-head
    or
    # cd /opt/elasticsearch-0.90.2
    # ./bin/plugin -install mobz/elasticsearch-head
    Then restart ElasticSearch. Plugins are detected and loaded on service startup.
  • 22. RESTful interface
    $ curl -XGET 'http://localhost:9200/'
    {
      "ok" : true,
      "status" : 200,
      "name" : "Drake, Frank",
      "version" : {
        "number" : "0.90.2",
        "snapshot_build" : false,
        "lucene_version" : "4.3.1"
      },
      "tagline" : "You Know, for Search"
    }
  • 23. Create Index
    $ curl -XPUT 'http://localhost:9200/gems'
    { "ok":true, "acknowledged":true }
  • 24. Cluster status
    $ curl -XGET 'localhost:9200/_status'
    {"ok":true,"_shards":{"total":20,"successful":10,"failed":0},
    "indices":{"gems":{"index":{"primary_size":"495b","primary_size_in_bytes":495,
    "size":"495b","size_in_bytes":495},"translog":{"operations":0},
    "docs":{"num_docs":0,"max_doc":0,"deleted_docs":0},"merges":
    {"current":0,"current_docs":0,"current_size":"0b","current_size_in_bytes":0,
    "total":0,"total_time":"0s","total_time_in_millis":0,"total_docs":0,
    "total_size":"0b","total_size_in_bytes":0},
    ... ... ...
  • 25. Pretty Output
    $ curl -XGET 'localhost:9200/_status?pretty'
    $ curl -XGET 'localhost:9200/_status' | python -mjson.tool
    $ curl -XGET 'localhost:9200/_status' | json_reformat
    {
      "ok": true,
      "_shards": {
        "total": 20,
        "successful": 10,
        "failed": 0
      },
      "indices": {
        "gems": {
          "index": {
            "primary_size": "495b",
            "primary_size_in_bytes": 495,
            "size": "495b",
            "size_in_bytes": 495
          },
          ...
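The same reformatting can be done inside a script. A minimal sketch using only Python's standard `json` module; the compact string is a shortened copy of the `_status` response above, and `prettify` is a hypothetical helper name.

```python
import json


def prettify(raw):
    """Re-indent a compact JSON string, sorting keys for stable output."""
    return json.dumps(json.loads(raw), indent=4, sort_keys=True)


compact = '{"ok":true,"_shards":{"total":20,"successful":10,"failed":0}}'
print(prettify(compact))
```

Sorting keys is a design choice here: it mirrors the alphabetical key order seen in the `python -mjson.tool` outputs elsewhere in the deck.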
  • 26. Delete Index
    $ curl -XDELETE 'http://localhost:9200/gems'
    { "ok":true, "acknowledged":true }
  • 27. Create custom Index
    {
      "settings" : {
        "index" : {
          "number_of_shards" : 6,
          "number_of_replicas" : 0
        }
      }
    }
    $ curl -XPUT 'http://localhost:9200/gems' -d @body.json
    { "ok":true, "acknowledged":true }
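A short sketch of how such a settings body could be generated, together with the shard arithmetic behind it: each primary Shard has `number_of_replicas` copies, so an Index holds primaries × (1 + replicas) Shard copies in total. The helper names are assumptions for illustration.

```python
import json


def index_settings(shards, replicas):
    """Return the settings body used when creating an index."""
    return {"settings": {"index": {"number_of_shards": shards,
                                   "number_of_replicas": replicas}}}


def total_shard_copies(shards, replicas):
    """Primary shards plus all of their replica copies."""
    return shards * (1 + replicas)


body = index_settings(6, 0)
print(json.dumps(body, indent=2))
print(total_shard_copies(6, 0))  # 6
print(total_shard_copies(5, 1))  # 10
```

Six primaries with no replicas is consistent with the `"total": 6` Shards reported by the search and count examples later in the deck.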
  • 28. Index a document
    {
      "name": "pry",
      "platform": "ruby",
      "rubygems_version": "1.5.2",
      "description": "attach an irb-like session to any object at runtime",
      "email": "anurag@example.com",
      "has_rdoc": true,
      "homepage": "http://banisterfiend.wordpress.com"
    }
    $ curl -XPOST 'http://localhost:9200/gems/test/' -d @body.json
    { "ok":true, "_index":"gems", "_type":"test", "_id":"lsJgxiwET6eg", "_version":1 }
  • 29. Get document
    $ curl -XGET 'http://localhost:9200/gems/test/lsJgxiwET6eg' | python -mjson.tool
    {
      "_id": "lsJgxiwET6eg",
      "_index": "gems",
      "_source": {
        "description": "attach an irb-like session to any object at runtime",
        "email": "anurag@example.com",
        "has_rdoc": true,
        "homepage": "http://banisterfiend.wordpress.com",
        "name": "pry",
        "platform": "ruby",
        "rubygems_version": "1.5.2"
      },
      "_type": "test",
      "_version": 1,
      "exists": true
    }
  • 30. Index another document
    {
      "name": "grit",
      "platform": "jruby",
      "rubygems_version": "2.5.0",
      "description": "Ruby library for extracting information from a git repository.",
      "email": "mojombo@github.com",
      "has_rdoc": false,
      "homepage": "http://github.com/mojombo/grit"
    }
    $ curl -XPOST 'http://localhost:9200/gems/test/' -d @body.json
    { "ok":true, "_index":"gems", "_type":"test", "_id":"ijUOHi2cQc2", "_version":1 }
  • 31. Custom Document IDs
    A document's unique identifier within an Index is composed of its DocumentType and ID.
    {
      "name": "grit",
      "platform": "jruby",
      "rubygems_version": "2.5.1",
      "description": "Ruby library for extracting information from a git repository.",
      "email": "mojombo@github.com",
      "has_rdoc": false,
      "homepage": "http://github.com/mojombo/grit"
    }
    $ curl -XPUT 'http://localhost:9200/gems/test/grit-2.5.1' -d @body.json
    { "ok":true, "_index":"gems", "_type":"test", "_id":"grit-2.5.1", "_version":1 }
  • 32. Document Versions
    $ curl -XPUT 'http://localhost:9200/gems/test/grit-2.5.1' -d @body.json
    { "ok":true, "_index":"gems", "_type":"test", "_id":"grit-2.5.1", "_version":2 }
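The version bump can be pictured with a toy in-memory model in which every write to an existing DocumentType and ID increments `_version`. This is a simplified sketch, not ElasticSearch's implementation; `ToyIndex` is a hypothetical class.

```python
class ToyIndex:
    """In-memory stand-in that tracks a version number per (type, id) pair."""

    def __init__(self):
        self._docs = {}

    def put(self, doc_type, doc_id, source):
        """Store a document and return its new version number."""
        key = (doc_type, doc_id)
        _, version = self._docs.get(key, (None, 0))
        self._docs[key] = (source, version + 1)
        return version + 1


index = ToyIndex()
first = index.put("test", "grit-2.5.1", {"name": "grit"})
second = index.put("test", "grit-2.5.1", {"name": "grit"})
print(first, second)  # 1 2
```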
  • 33. Searching Documents
    { "query": { "term": {"name": "pry"} } }
    $ curl -XPOST http://localhost:9200/gems/_search -d @body.json | python -mjson.tool
    {
      "_shards": {
        "failed": 0,
        "successful": 6,
        "total": 6
      },
      "hits": {
        "hits": [
          {
            "_id": "MWkKgzsMRgK",
            "_index": "gems",
            "_score": 1.4054651,
            "_source": {
              "description": "attach an irb-like session to any object at runtime",
              "email": "anurag@example.com",
              "has_rdoc": true,
              "homepage": "http://banisterfiend.wordpress.com",
              "name": "pry",
              "platform": "ruby",
              "rubygems_version": "1.5.2"
            },
            "_type": "test"
          }
        ],
        "max_score": 1.4054651,
        "total": 1
  • 34. Counting Documents
    { "term": {"name": "pry"} }
    $ curl -XGET http://localhost:9200/gems/test/_count -d @body.json
    {
      "_shards": {
        "failed": 0,
        "successful": 6,
        "total": 6
      },
      "count": 1
    }
  • 35. Update a Document
    The partial document is merged into the stored document using a simple recursive merge.
    { "doc": { "platform": "macruby" } }
    $ curl -XPOST http://localhost:9200/gems/test/grit-2.5.1/_update -d @body.json
    { "ok":true, "_index":"gems", "_type":"test", "_id":"grit-2.5.1", "_version":4 }
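What a recursive merge means can be sketched in a few lines of Python: keys in the partial document overwrite stored values, except that nested objects are merged key by key instead of being replaced wholesale. This is an illustration of the idea, not ElasticSearch's code; the nested `meta` field is a made-up example.

```python
def recursive_merge(target, partial):
    """Merge `partial` into a copy of `target`; nested dicts merge key by key."""
    merged = dict(target)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = recursive_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# `meta` is a hypothetical nested field used to show the recursive step.
stored = {"name": "grit", "platform": "jruby",
          "meta": {"has_rdoc": False, "license": "MIT"}}
partial = {"platform": "macruby", "meta": {"has_rdoc": True}}
print(recursive_merge(stored, partial))
```

Note that `meta.license` survives the update because only `meta.has_rdoc` appears in the partial document.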
  • 36. Update via Script
    {
      "script" : "ctx._source.platform = vm_name",
      "params" : { "vm_name" : "rubinius" }
    }
    $ curl -XPOST http://localhost:9200/gems/test/grit-2.5.1/_update -d @body.json
    { "ok":true, "_index":"gems", "_type":"test", "_id":"grit-2.5.1", "_version":5 }
  • 37. Delete Document
    $ curl -XDELETE 'http://localhost:9200/gems/test/grit-2.5.1'
    { "ok":true, "found":true, "_index":"gems", "_type":"test", "_id":"grit-2.5.1", "_version":6 }
  • 38. Put Mapping
    {
      "gem" : {
        "properties" : {
          "name" : {"type" : "string", "index": "not_analyzed"},
          "platform" : {"type" : "string", "index": "not_analyzed"},
          "rubygems_version" : {"type" : "string", "index": "not_analyzed"},
          "description" : {"type" : "string", "store" : "yes"},
          "has_rdoc" : {"type" : "boolean"}
        }
      }
    }
    $ curl -XPUT 'http://localhost:9200/gems/gem/_mapping' -d @body.json
    $ curl -XGET 'http://localhost:9200/gems/_mapping' | python -mjson.tool
  • 39. Index Document with Mapping
    {
      "name": "grit",
      "platform": "ruby",
      "rubygems_version": "2.5.1",
      "description": "Ruby library for extracting information from a git repository.",
      "email": "mojombo@github.com",
      "has_rdoc": false,
      "homepage": "http://github.com/mojombo/grit"
    }
    $ curl -XPUT 'http://localhost:9200/gems/gem/grit-2.5.1' -d @body.json
    { "ok":true, "_index":"gems", "_type":"gem", "_id":"grit-2.5.1", "_version":1 }
  • 40. Matching documents
    { "query": { "match" : { "description" : "git repository" } } }
    $ curl -XPOST http://localhost:9200/gems/gem/_search -d @body.json
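Roughly speaking, a `match` query analyzes the query text into terms and finds documents whose analyzed field contains any of them (the default operator is OR). The sketch below imitates that with a naive lowercase word tokenizer, which is only a stand-in for the real analyzer and ignores scoring; all function names are hypothetical.

```python
import re


def analyze(text):
    """Naive analyzer: lowercase, then split on non-alphanumeric characters."""
    return [tok for tok in re.split(r"[^a-z0-9]+", text.lower()) if tok]


def match(docs, field, query_text):
    """Return docs whose analyzed field shares at least one term with the query."""
    wanted = set(analyze(query_text))
    return [d for d in docs if wanted & set(analyze(d.get(field, "")))]


docs = [
    {"name": "grit",
     "description": "Ruby library for extracting information from a git repository."},
    {"name": "pry",
     "description": "attach an irb-like session to any object at runtime"},
]
print([d["name"] for d in match(docs, "description", "git repository")])  # ['grit']
```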
  • 41. Highlighting
    {
      "query": { "match" : { "description" : "git repository" } },
      "highlight" : { "fields" : { "description" : {} } }
    }
    $ curl -XPOST http://localhost:9200/gems/gem/_search -d @body.json
    "highlight": {
      "description": [
        "Ruby library for extracting information from a <em>git</em> <em>repository</em>."
      ]
    }
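The shape of the highlighted fragment can be reproduced with a small regular-expression sketch that wraps each whole-word query term in `<em>` tags. It is a toy imitation of the output format only, under the assumption of plain whitespace-separated query terms, and does not reflect how the real highlighter works internally.

```python
import re


def highlight(text, query_text, tag="em"):
    """Wrap each whole-word occurrence of a query term in <tag>...</tag>."""
    terms = [re.escape(t) for t in query_text.lower().split()]
    pattern = re.compile(r"\b(" + "|".join(terms) + r")\b", re.IGNORECASE)
    return pattern.sub(lambda m: "<{0}>{1}</{0}>".format(tag, m.group(1)), text)


description = "Ruby library for extracting information from a git repository."
print(highlight(description, "git repository"))
```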
  • 42. Search Facets
    {
      "query": { "match_all" : {} },
      "facets" : {
        "gem_names" : { "terms" : { "field": "name" } }
      }
    }
    $ curl -XPOST http://localhost:9200/gems/_search -d @body.json
    ...
    "facets": {
      "gem_names": {
        "_type": "terms",
        "missing": 0,
        "other": 0,
        "terms": [
          { "count": 2, "term": "pry" },
          { "count": 2, "term": "grit" },
          { "count": 1, "term": "abc" }
        ],
        "total": 5
      }
    },
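Conceptually, a terms facet is a per-value document count over one field. A minimal sketch with `collections.Counter`, using made-up documents chosen to reproduce the counts in the response above; `terms_facet` is a hypothetical helper and ignores details such as `missing` and `other`.

```python
from collections import Counter


def terms_facet(docs, field):
    """Count documents per value of `field`, most frequent first."""
    counts = Counter(d[field] for d in docs if field in d)
    return [{"term": term, "count": count} for term, count in counts.most_common()]


docs = [{"name": "pry"}, {"name": "pry"}, {"name": "grit"},
        {"name": "grit"}, {"name": "abc"}]
print(terms_facet(docs, "name"))
```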
  • 44. Download Public Dataset Download from Aadhaar Public Data Portal at https://data.uidai.gov.in
  • 45. Download Tools $ git clone https://github.com/gnurag/aadhaar
  • 46. Prepare Data & Configure
    # gem install yajl-ruby tire activesupport
    $ git clone https://github.com/gnurag/aadhaar
    $ cd aadhaar/data
    $ unzip UIDAI-ENR-DETAIL-20121001.zip
    $ cd ../bin
    $ vi aadhaar.rb
  • 47. Configuration
    AADHAAR_DATA_DIR = "/path/to/aadhaar/data"
    ES_URL = "http://localhost:9200"
    ES_INDEX = 'aadhaar'
    ES_TYPE = "UID"
    BATCH_SIZE = 1000
  • 49. Running Examples $ curl -XPOST http://localhost:9200/aadhaar/UID/_search -d @template.json | python -mjson.tool
  • 51. Index Aliases
    Group multiple Indices under one alias, and query them together.
    curl -XPOST 'http://localhost:9200/_aliases' -d '
    {
      "actions" : [
        { "add" : { "index" : "index1", "alias" : "master-alias" } },
        { "add" : { "index" : "index2", "alias" : "master-alias" } }
      ]
    }'
    curl -XPOST 'http://localhost:9200/_aliases' -d '
    {
      "actions" : [
        { "remove" : { "index" : "index2", "alias" : "master-alias" } }
      ]
    }'
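Hand-written action lists are easy to get subtly wrong, so one option is to build the `_aliases` body from Python data structures and serialize it with `json.dumps`, which keeps the JSON well formed by construction. A minimal sketch; `alias_actions` is a hypothetical helper name.

```python
import json


def alias_actions(action, indices, alias):
    """Build an `_aliases` request body applying one action to several indices."""
    return {"actions": [{action: {"index": name, "alias": alias}}
                        for name in indices]}


add_body = json.dumps(alias_actions("add", ["index1", "index2"], "master-alias"))
remove_body = json.dumps(alias_actions("remove", ["index2"], "master-alias"))
print(add_body)
print(remove_body)
```

Either string could then be passed to curl with `-d`, as in the commands above.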
  • 52. Document Routing Control which Shard a document is stored on and queried from.
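The idea behind routing is a hash-modulo mapping: a routing value (by default the document ID) is hashed and reduced modulo the number of primary Shards, so the same value always lands on the same Shard. The sketch below models this with CRC32 purely as a stand-in; ElasticSearch uses its own hash function, and `pick_shard` is a hypothetical name.

```python
import zlib


def pick_shard(routing_value, number_of_shards):
    """Map a routing value to a shard number (simplified hash-modulo model)."""
    return zlib.crc32(routing_value.encode("utf-8")) % number_of_shards


for key in ["grit-2.5.1", "pry-0.5.9"]:
    print(key, "->", pick_shard(key, 6))
```

Because the mapping depends on the Shard count, the model also hints at why the number of primary Shards is fixed when an Index is created.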
  • 53. Parents & Children
    Index a child document by naming its parent in the parent parameter.
    $ curl -XPUT 'http://localhost:9200/gems/gem/roxml?parent=rexml' -d '{ "tag" : "something" }'
  • 56. ElasticSearch Ecosystem A wide range of site plugins, analyzers and river plugins is available from the community.