Elasticsearch Introduction at BigData meetup

Introduction to Elasticsearch
27th May 2014 - BigData Meetup
Eric Rodriguez

@wavyx

About Me
Eric Rodriguez
Founder of data.be
• Web entrepreneur
• Data addict
• Multi-Language: PHP, Java/Groovy/Grails, .Net, …
be.linkedin.com/in/erodriguez
github.com/wavyx
@wavyx

Elasticsearch - Company
• Founded in 2012 => http://www.elasticsearch.com

• Professional services

• Training

• Consultancy / Development support

• Production support subscription (3 levels of SLAs)

Enterprises using Elasticsearch

(M)ELK Stack
• Elasticsearch - Search server based on Lucene

• Logstash - Tool for managing events and logs

• Kibana - Visualize logs and time-stamped data

• Marvel - Monitor your cluster’s heartbeat
You Know, for Search…

Logstash
• Collect, parse, index, and search logs

Kibana
• A versatile dashboard to see and interact with your data

Marvel
• Monitor the health of your cluster: cluster-wide metrics, an overview of all nodes and indices, and events (master election, new nodes)

Elasticsearch: a real-time, distributed search and analytics engine
• Open source, built on Lucene
• Schema-free JSON document store
• RESTful API, with documentation
• Scalability and high availability
• Multi-tenancy
• Per-operation persistence

Use Cases
• Full-Text Search

• Data Store

• Analytics

• Alerts

• Ads

• …

Elasticsearch core
• Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java

• Elasticsearch added value: “Simple is best”
• Simple API (with documentation)

• JSON & RESTful

• Sharding & Replication

• Extensibility: plugins and scripts

• Interoperability: clients and integrations

Terms for DBAs
Elasticsearch → RDBMS
• Index → Database
• Type → Table
• Document → Row
• Field → Column
• Mapping → Schema

Plug & Play
• Zero configuration

• 4 LoC to get started ;)

Alive!
=> http://localhost:9200/?pretty

REST
• Check your cluster, node, and index health, status, and statistics

• Administer your cluster, node, and index data and metadata

• Perform CRUD (Create, Read, Update, and Delete) and
search operations against your indexes

• Execute advanced search operations such as paging, sorting, filtering, scripting, faceting, aggregations, and many others
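As a sketch of these REST entry points, the Python snippet below only builds `urllib.request.Request` objects for a few health, status and statistics endpoints of the ES 1.x API; it does not contact a server. The base URL `http://localhost:9200` and the index name `twitter` are assumptions, and `es_request` is a hypothetical helper, not part of any client library.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: a local node on the default HTTP port


def es_request(method, path, body=None):
    """Hypothetical helper: build (but do not send) a request against the REST API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(ES_URL + path, data=data, method=method)
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


# Health, status and statistics (the index name "twitter" is hypothetical)
checks = [
    es_request("GET", "/?pretty"),                 # node banner with version and tagline
    es_request("GET", "/_cluster/health?pretty"),  # cluster status: green / yellow / red
    es_request("GET", "/_nodes/stats?pretty"),     # per-node statistics
    es_request("GET", "/twitter/_stats?pretty"),   # per-index statistics
    es_request("GET", "/_cat/indices?v"),          # human-readable table (cat API, 1.0+)
]

for req in checks:
    print(req.get_method(), req.full_url)
```

Sending one is a matter of `urllib.request.urlopen(req)` against a running node, then `json.load` on the response (the `_cat` endpoints return plain text instead of JSON).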

Basic Operations
• Add a document
• Create an index
• Modify/replace a document
• Delete a document
• Delete an index
• Update a document
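The basic operations map onto plain HTTP verbs. The sketch below builds the corresponding requests in the ES 1.x style (`/{index}/{type}/{id}`) without sending them. The index `customer`, the type `external` and the document bodies are hypothetical, and `es_request` is a hypothetical helper defined here so the snippet stands alone.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local node, default port


def es_request(method, path, body=None):
    """Hypothetical helper: build (but do not send) a REST request."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(ES_URL + path, data=data, method=method)
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


# Hypothetical index "customer", type "external", document id 1
basic_ops = {
    "create index":     es_request("PUT", "/customer"),
    "add document":     es_request("PUT", "/customer/external/1", {"name": "John Doe"}),
    # indexing again under the same id replaces the whole document
    "replace document": es_request("PUT", "/customer/external/1", {"name": "Jane Doe"}),
    # _update merges the partial "doc" into the stored document
    "update document":  es_request("POST", "/customer/external/1/_update",
                                   {"doc": {"name": "Jane Doe", "age": 20}}),
    "delete document":  es_request("DELETE", "/customer/external/1"),
    "delete index":     es_request("DELETE", "/customer"),
}

for name, req in basic_ops.items():
    print(f"{name:17} {req.get_method():6} {req.full_url}")
```

Indexing a document with an existing id replaces it entirely, whereas `_update` with a `doc` body merges the given fields into the stored document.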

Mapping 1/2
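• Defines how a document and its fields are mapped (similar to a schema): which fields are searchable, how they are tokenized, how they are stored, …

• Explicit mappings are defined at the index/type level

• If no explicit mapping exists, a default mapping is created automatically from the indexed documents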

Mapping 2/2
• Core types: string, integer/long, float/double, boolean, and null

• Other types: Array, Object, Nested, IP, GeoPoint, GeoShape, Attachment

• Example
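The example on the original slide did not survive extraction. As a stand-in, here is a hedged sketch of an explicit mapping in ES 1.x syntax, built as a Python dict and printed as JSON. The index `twitter`, the type `tweet` and all field names are hypothetical; the body would be sent with `PUT /twitter/_mapping/tweet`.

```python
import json

# Explicit mapping for a hypothetical "tweet" type in a hypothetical "twitter" index,
# in ES 1.x syntax (body of PUT /twitter/_mapping/tweet).
tweet_mapping = {
    "tweet": {
        "properties": {
            "message":   {"type": "string", "analyzer": "english"},    # full text, analyzed
            "user":      {"type": "string", "index": "not_analyzed"},  # exact value, one term
            "post_date": {"type": "date"},
            "retweets":  {"type": "integer"},
            "client_ip": {"type": "ip"},
            "location":  {"type": "geo_point"},
            # arrays need no dedicated type: a "tags" field may hold one or many strings
            "tags":      {"type": "string", "index": "not_analyzed"},
        }
    }
}

print(json.dumps(tweet_mapping, indent=2))
```

`not_analyzed` keeps a string as a single exact term (useful for filters and facets), while an analyzed string is tokenized for full-text search.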

Search API 1/2
• Multi-index, multi-type
• URI search: Google-like query strings with operators (AND/OR), fields, sort, paging, wildcards, …
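A URI search puts the whole query in the query string. This sketch assembles such a URL across two indices and two types; the index, type and field names and the local host are assumptions, and `uri_search` is a hypothetical helper.

```python
from urllib.parse import quote, urlencode

ES_URL = "http://localhost:9200"  # assumption: a local node on the default port


def uri_search(indices, types, q, sort=None, from_=0, size=10):
    """Hypothetical helper: build /{indices}/{types}/_search?q=... for a URI search."""
    path = "/" + ",".join(indices)           # multi-index: comma-separated names
    if types:
        path += "/" + ",".join(types)        # multi-type: comma-separated names
    params = {"q": q, "from": from_, "size": size}  # q uses Lucene query string syntax
    if sort:
        params["sort"] = sort
    # quote (not quote_plus) so spaces become %20 rather than "+"
    return ES_URL + path + "/_search?" + urlencode(params, quote_via=quote)


url = uri_search(["twitter", "blog"], ["tweet", "post"],
                 "user:kimchy AND message:elastic*", sort="post_date:desc")
print(url)
```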

Search API 2/2
• Paging & Sort

• Fields: selection, scripts

• Post filter

• Highlighting

• Rescoring

• Explain

• …
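A hedged sketch of how several of these options look in an ES 1.x search request body. The index `twitter` and its fields are hypothetical. The two bodies are kept separate because rescoring re-ranks hits by score, so combining it with an explicit field sort is best avoided.

```python
import json

# ES 1.x request bodies for POST /twitter/_search. The index "twitter" and the fields
# "message", "user", "post_date" and "retweets" are hypothetical.

# Paging, sorting, field selection, script fields, post filter and highlighting.
page_body = {
    "from": 20, "size": 10,                             # third page of 10 hits
    "sort": [{"post_date": {"order": "desc"}}, "_score"],
    "query": {"match": {"message": "elasticsearch"}},
    "post_filter": {"term": {"user": "kimchy"}},        # filters hits after aggregations ran
    "fields": ["user", "post_date"],                    # return only these fields
    "script_fields": {                                  # computed per hit; inline scripts
        "double_retweets": {"script": "doc['retweets'].value * 2"}  # may need enabling
    },
    "highlight": {"fields": {"message": {}}},           # snippets with <em> around matches
}

# Rescoring and explain: re-rank the top hits with a costlier query, show the scoring.
relevance_body = {
    "query": {"match": {"message": "elasticsearch rocks"}},
    "rescore": {
        "window_size": 50,                              # top 50 hits per shard are re-ranked
        "query": {"rescore_query": {"match_phrase": {"message": "elasticsearch rocks"}}},
    },
    "explain": True,                                    # attach a score explanation per hit
}

for body in (page_body, relevance_body):
    print(json.dumps(body, indent=2))
```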

Query DSL
• “SQL” for Elasticsearch

• Queries should be used

• for full text search

• where the result depends on a relevance score

• Filters should be used

• for binary yes/no searches

• for queries on exact values
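In ES 1.x, the usual way to combine both is the `filtered` query: a scored query for the full-text part and a filter for the exact yes/no conditions. A minimal sketch, with hypothetical field names (`title`, `status`, `publish_date`):

```python
import json

# ES 1.x "filtered" query. The match clause is full text and contributes to the
# relevance score; the filter clauses are exact yes/no conditions that are not scored
# and whose results can be cached. All field names are hypothetical.
body = {
    "query": {
        "filtered": {
            "query": {"match": {"title": "distributed search"}},    # relevance-scored
            "filter": {
                "bool": {
                    "must": [
                        {"term": {"status": "published"}},           # exact value
                        {"range": {"publish_date": {"gte": "2014-01-01"}}},
                    ]
                }
            },
        }
    }
}

print(json.dumps(body, indent=2))
```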

Analysis 1/2
• Analysis is the process of extracting “terms” from text
• It processes natural language to make it searchable by a computer

• A configurable registry of analyzers is used

• to break indexed (analyzed) fields into terms when a document is indexed

• to process query strings

Analysis 2/2
• Analyzers are composed of

• a single Tokenizer (optionally preceded by one or more CharFilters)

• zero or more TokenFilters

• Built-in analyzers: standard, pattern, whitespace, language, snowball
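To make the composition concrete, here is a toy analyzer pipeline in plain Python: character filters, then exactly one tokenizer, then token filters. The functions are simplified stand-ins written for illustration, not Lucene's implementations. The `es_custom_analyzer` dict shows how a comparable chain would be declared as a custom analyzer in ES 1.x index settings; the analyzer name is hypothetical.

```python
import html
import re

# Toy stand-ins for the analysis chain (NOT Lucene's real implementations).

def strip_html(text):                       # char filter, similar in spirit to html_strip
    return html.unescape(re.sub(r"<[^>]+>", " ", text))

def word_tokenizer(text):                   # tokenizer, a rough approximation of "standard"
    return re.findall(r"\w+", text)

def lowercase(tokens):                      # token filter
    return [t.lower() for t in tokens]

def stop(tokens, stopwords=frozenset({"a", "an", "and", "the", "of"})):  # token filter
    return [t for t in tokens if t not in stopwords]

def analyze(text, char_filters=(strip_html,), tokenizer=word_tokenizer,
            token_filters=(lowercase, stop)):
    for cf in char_filters:                 # zero or more char filters
        text = cf(text)
    tokens = tokenizer(text)                # exactly one tokenizer
    for tf in token_filters:                # zero or more token filters, in order
        tokens = tf(tokens)
    return tokens

# Roughly comparable ES 1.x index settings ("my_html_analyzer" is hypothetical).
es_custom_analyzer = {
    "settings": {"analysis": {"analyzer": {"my_html_analyzer": {
        "type": "custom",
        "char_filter": ["html_strip"],
        "tokenizer": "standard",
        "filter": ["lowercase", "stop"],
    }}}}
}

print(analyze("<p>The Quick Brown Fox &amp; the lazy dog</p>"))
# → ['quick', 'brown', 'fox', 'lazy', 'dog']
```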

Analytics
• Aggregation of information: similar to “group by”

• Facets

• Aggregated data based on a search query

• One-dimensional results

• Ex: “term facets” return facet counts for the various values of a specific field (think color, tag, category, …)

• Aggregations (ES 1.0+)

• Like facets, but they can be nested

• Basic stats: mean, min, max, std dev, term counts

• Significant terms, percentiles, cardinality estimation

Facets
• not yet deprecated, but use aggregations!
• Various Facets 
terms, range, histogram, date,
statistical, geo distance, …
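For comparison, here is the same “top 10 tags” question written as a terms facet and as a terms aggregation in ES 1.x request syntax; the `tag` field is hypothetical. The inner definition is identical and only the top-level key changes, which makes the migration fairly mechanical for simple cases.

```python
import json

# The same question as a (legacy) terms facet and as a terms aggregation, ES 1.x syntax.
# The field "tag" is hypothetical.
facet_request = {
    "query": {"match_all": {}},
    "facets": {"popular_tags": {"terms": {"field": "tag", "size": 10}}},
}

aggregation_request = {
    "query": {"match_all": {}},
    "aggs": {"popular_tags": {"terms": {"field": "tag", "size": 10}}},
}

for name, body in (("facet", facet_request), ("aggregation", aggregation_request)):
    print(name, json.dumps(body))
```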

Aggregations
• A powerful, generic framework with 2 main families:

• Bucketing: each bucket is associated with a key and a document criterion. The aggregation process produces a list of buckets, each one with the set of documents that "belong" to it.

• Metric: aggregations that keep track of and compute metrics over a set of documents.

• Aggregations can be nested!
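The two families and their nesting can be illustrated without a cluster. This toy Python version groups in-memory documents into terms-style buckets and computes an average metric inside each bucket. The data is made up, and `equivalent_request` shows what the same question might look like as an ES 1.x aggregation request, with hypothetical field names.

```python
from collections import defaultdict
from statistics import mean

# Made-up documents standing in for an index.
docs = [
    {"color": "red", "price": 10000},
    {"color": "red", "price": 20000},
    {"color": "green", "price": 30000},
    {"color": "blue", "price": 15000},
    {"color": "green", "price": 12000},
    {"color": "red", "price": 25000},
]

def terms_bucket(documents, field):
    """Bucketing: one bucket per distinct value (the key); each doc joins its key's bucket."""
    buckets = defaultdict(list)
    for doc in documents:
        buckets[doc[field]].append(doc)
    return buckets

def avg_metric(documents, field):
    """Metric: a single number computed over the documents of one bucket."""
    return mean(doc[field] for doc in documents)

# Nesting: run the metric inside every bucket; order by doc count, then key.
result = [
    {"key": key, "doc_count": len(bucket), "avg_price": avg_metric(bucket, "price")}
    for key, bucket in sorted(terms_bucket(docs, "color").items(),
                              key=lambda kv: (-len(kv[1]), kv[0]))
]

# Roughly the same question as an ES 1.x request (field names are hypothetical).
equivalent_request = {
    "aggs": {"colors": {"terms": {"field": "color"},
                        "aggs": {"avg_price": {"avg": {"field": "price"}}}}}
}

for row in result:
    print(row)
```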

Bucket Aggregators
• global

• ﬁlter

• missing

• terms

• range

• date range

• ip range
• histogram

• date histogram

• geo distance

• geohash grid

• nested

• reverse nested

• top hits (version 1.3)

Metrics Aggregators
• value count

• stats

• extended stats

• cardinality

• percentiles
• min

• max

• sum

• avg
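A sketch of a report combining both lists in ES 1.x syntax: a `date_histogram` bucket with `stats`, `cardinality` and `percentiles` metrics nested inside, plus a top-level `range` bucket. The `sales` index and its fields are hypothetical.

```python
import json

# ES 1.x aggregation request for a hypothetical "sales" index with
# "date", "price" and "customer_id" fields.
sales_report = {
    "size": 0,                                            # aggregations only, no hits
    "aggs": {
        "sales_per_month": {                              # bucket aggregator
            "date_histogram": {"field": "date", "interval": "month"},
            "aggs": {                                     # metrics nested in each bucket
                "price_stats": {"stats": {"field": "price"}},  # count, min, max, avg, sum
                "unique_customers": {"cardinality": {"field": "customer_id"}},
                "price_percentiles": {
                    "percentiles": {"field": "price", "percents": [50, 95, 99]}
                },
            },
        },
        "price_ranges": {                                 # another bucket aggregator
            "range": {
                "field": "price",
                "ranges": [{"to": 100}, {"from": 100, "to": 500}, {"from": 500}],
            }
        },
    },
}

print(json.dumps(sales_report, indent=2))
```

Both `cardinality` and `percentiles` return approximations, trading exactness for bounded memory.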

Search for end users
• Suggesters (“Did you mean?”): term, phrase, completion, context

• “More like this”: find documents that are “like” a provided text by running it against one or more fields
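Both features are expressed in the request body. A hedged ES 1.x sketch with a hypothetical `articles` index and `title`/`body` fields; in 1.x the more-like-this text goes in `like_text`.

```python
import json

# ES 1.x request bodies for a hypothetical "articles" index with "title" and "body" fields.

# "Did you mean?": a term suggester returns spelling corrections for each input term.
suggest_request = {
    "query": {"match": {"body": "elasticsaerch"}},
    "suggest": {
        "did_you_mean": {                 # arbitrary name for this suggestion
            "text": "elasticsaerch",
            "term": {"field": "body"},
        }
    },
}

# "More like this": documents whose title/body resemble the given text.
mlt_request = {
    "query": {
        "more_like_this": {
            "fields": ["title", "body"],
            "like_text": "distributed search engine built on Lucene",
            "min_term_freq": 1,           # consider terms that occur at least once
            "max_query_terms": 12,        # cap on terms selected for the generated query
        }
    }
}

for body in (suggest_request, mlt_request):
    print(json.dumps(body, indent=2))
```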

Percolator
• Classic ES

1. Add & Index documents

2. Search with queries
3. Retrieve matching documents

• Percolator
1. Add & Index queries

2. Percolate documents
3. Retrieve matching queries
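In ES 1.x, queries are registered as documents of the reserved `.percolator` type (for example `PUT /my-index/.percolator/1` with a `query` body), and a document is checked with `GET /my-index/message/_percolate` and a `doc` body. The inversion itself can be shown with a toy in-memory version in which each stored "query" is simply a set of required terms; this is a simplification for illustration, not how Lucene evaluates real queries, and the query ids are hypothetical.

```python
import re

# Toy percolator: queries are stored, documents are matched against them.
# A stored "query" here is just a set of required lowercase terms, a simplified
# stand-in for a real match query.

def terms(text):
    return set(re.findall(r"\w+", text.lower()))

registered_queries = {}

def register(query_id, query_text):          # 1. add & index queries
    registered_queries[query_id] = terms(query_text)

def percolate(document_text):                # 2. percolate a document
    doc_terms = terms(document_text)
    return sorted(qid for qid, required in registered_queries.items()
                  if required <= doc_terms)  # 3. ids of the matching queries

register("bonsai-alert", "bonsai tree")
register("elastic-news", "elasticsearch release")
register("tree-news", "tree")

print(percolate("A new bonsai tree in the office"))
# → ['bonsai-alert', 'tree-news']
```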

Why Percolate ?!
• Alerts: social media mentions, weather forecast, news alerts

• Automatic Monitoring: price monitoring, stock alerts, logs

• Ads: display targeted ads based on user’s search queries

• Enrich: percolate new documents, then add query matches
as document tags

High Availability 1/2
• Sharding - Write Scalability
• Split logical data over multiple machines & control data flows

• Each index has a fixed number of primary shards, set when the index is created

• Improve indexing performance

• Replication - Read Scalability
• Each shard can have zero or more replicas (the number can be changed at any time)

• Removing SPOF (Single Point Of Failure)

• Improve search performance
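The reason the number of primary shards is fixed shows up in the routing rule described in the Elasticsearch guides: a document goes to shard `hash(_routing) % number_of_primary_shards`, with `_routing` defaulting to the document id. The sketch below uses `zlib.crc32` as a stand-in hash (Elasticsearch uses its own function), so the shard numbers are illustrative only; the settings dicts show ES 1.x index settings for a hypothetical index.

```python
import zlib

# Routing sketch: shard = hash(_routing) % number_of_primary_shards.
# zlib.crc32 is a stand-in hash; real Elasticsearch shard numbers will differ.

def shard_for(doc_id, number_of_primary_shards):
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_primary_shards

# ES 1.x index settings for a hypothetical index: shards are fixed at creation,
# replicas can be changed later (body of PUT /my_index/_settings).
create_index_settings = {"settings": {"number_of_shards": 5, "number_of_replicas": 1}}
update_replicas = {"index": {"number_of_replicas": 2}}

primaries = create_index_settings["settings"]["number_of_shards"]
for doc_id in ["1", "2", "42", "user-7"]:
    print(doc_id, "-> shard", shard_for(doc_id, primaries))

# Changing the modulus would send most existing documents to a different shard.
moved = sum(shard_for(str(i), 5) != shard_for(str(i), 6) for i in range(1000))
print(f"{moved} of 1000 documents would land on a different shard with 6 primaries")
```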

High Availability 2/2
• Zen Discovery
• Automatic discovery of nodes within a cluster
and electing a master node

• Useful for failover and replication

• Discovery plugins for Amazon EC2, Microsoft Azure, Google Compute Engine

• Snapshot & Restore module
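Two practical details, sketched in Python: the quorum usually recommended for `discovery.zen.minimum_master_nodes` (a majority of master-eligible nodes, to avoid split brain), and the three REST calls of the Snapshot & Restore module in ES 1.x. The repository name, snapshot name, filesystem path and local host are hypothetical, `es_request` is a hypothetical helper, and the requests are built but not sent.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local node


def minimum_master_nodes(master_eligible_nodes):
    """Majority quorum (N // 2) + 1 for discovery.zen.minimum_master_nodes."""
    return master_eligible_nodes // 2 + 1


def es_request(method, path, body=None):
    """Hypothetical helper: build (but do not send) a REST request."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(ES_URL + path, data=data, method=method)
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


# Snapshot & Restore (1.0+); repository "my_backup" and the path are hypothetical.
register_repo = es_request("PUT", "/_snapshot/my_backup",
                           {"type": "fs", "settings": {"location": "/mount/backups/my_backup"}})
take_snapshot = es_request("PUT", "/_snapshot/my_backup/snapshot_1?wait_for_completion=true")
restore = es_request("POST", "/_snapshot/my_backup/snapshot_1/_restore")

for n in (1, 2, 3, 5):
    print(f"{n} master-eligible nodes -> minimum_master_nodes = {minimum_master_nodes(n)}")
for req in (register_repo, take_snapshot, restore):
    print(req.get_method(), req.full_url)
```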

Cluster Management
• Marvel - http://www.elasticsearch.org/overview/marvel/

• BigDesk - http://bigdesk.org/

• Paramedic - https://github.com/karmi/elasticsearch-paramedic

• KOPF - https://github.com/lmenezes/elasticsearch-kopf/

• Elastic HQ - http://www.elastichq.org/

Clients & Integration
• Ecosystem: Kibana, Logstash, Marvel, Hadoop integration

• API Clients: Java, Javascript, Groovy, PHP, Perl, Python, .Net,
Ruby, Scala, Clojure, Go, Erlang, …

• Integrations: Grails, Django, Play!, Symfony2, Carrot2, Spring, Drupal, WordPress, …

• Rivers: CouchDB, JDBC, MongoDB, Neo4j, Redis, RabbitMQ, ActiveMQ, Amazon SQS, File System, Twitter, Wikipedia, RSS, …

Fast & Furious Evolution
Version 1.0 (Feb 12, 2014)
• Aggregations
• Snapshot & Restore
• Distributed Percolator
• Cat API
• Federated search
• Doc values
• Circuit breaker

Version 1.1 (March 25, 2014)
• Cardinality Agg
• Percentiles Agg
• Significant Terms Agg
• Search Templates
• Cross fields search
• Alias for indices & templates

Version 1.2 (May 22, 2014)
• Java 7
• Indexing & Merging performance
• Aggregations performance
• Context suggester
• Deep scrolling
• Field value factor

Benchmark API coming in 1.3

Resources
• http://www.elasticsearch.org/guide/

• http://www.elasticsearch.org/videos/

• http://www.elasticsearchtutorial.com/

• http://exploringelasticsearch.com/

• http://joelabrahamsson.com/elasticsearch-101/

• http://belczyk.com/2014/01/elasticsearch-recomended-learning-materials/

• http://www.elasticsearch.org/guide/en/elasticsearch/reference/1.x/modules-plugins.html

Books
• Elasticsearch Server: http://www.packtpub.com/elasticsearch-server-2e/book
• Elasticsearch in Action: http://www.manning.com/hinman/
• Elasticsearch Cookbook: elasticsearch-cookbook/book
• Mastering Elasticsearch: mastering-elasticsearch-querying-and-data-handling/book
• Elasticsearch - The Definitive Guide: http://www.elasticsearch.org/blog/elasticsearch-definitive-guide/

Thank you!
eric@data.be - @wavyx
be.linkedin.com/in/erodriguez - github.com/wavyx
http://www.meetup.com/ElasticSearch-User-Group-Belux-Belgium-Luxembourg/
