Global introduction to Elasticsearch presented at a BigData meetup.
Use cases, getting started, REST CRUD API, mapping, Search API, Query DSL with queries and filters, analyzers, analytics with facets and aggregations, percolator, high availability, clients & integrations, ...
2. About Me
Eric Rodriguez
Founder of data.be
• Web entrepreneur
• Data addict
• Multi-language: PHP, Java/Groovy/Grails, .Net, …
be.linkedin.com/in/erodriguez
github.com/wavyx
@wavyx
3. Elasticsearch - Company
• Founded in 2012 => http://www.elasticsearch.com
• Professional services
  • Training
  • Consultancy / development support
  • Production support subscription (3 levels of SLAs)
5. (M)ELK Stack
• Elasticsearch - Search server based on Lucene
• Logstash - Tool for managing events and logs
• Kibana - Visualize logs and time-stamped data
• Marvel - Monitor your cluster's heartbeat
You Know, for Search…
8. Marvel
• Monitor the health of your cluster: cluster-wide metrics, an overview of all nodes and indices, and events (master election, new nodes)
9. Real-time search and analytics engine
Open-source, Lucene, JSON, schema-free document store, RESTful API, documentation, scalability, high availability, distributed, multi-tenancy, per-operation persistence
22. REST
• Check your cluster, node, and index health, status, and statistics
• Administer your cluster, node, and index data and metadata
• Perform CRUD (Create, Read, Update, and Delete) and search operations against your indexes
• Execute advanced search operations such as paging, sorting, filtering, scripting, faceting, aggregations, and many others
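As a sketch of the verb + path style of this REST API, the snippet below builds (without sending) a few requests with Python's standard library. It assumes a node listening on the default `http://localhost:9200` and uses a hypothetical `blog` index with a `post` type (the ES 1.x `/{index}/{type}/{id}` layout); all field names are illustrative.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:9200"  # assumption: a local node on the default HTTP port


def es_request(method, path, body=None):
    """Build (but do not send) an HTTP request for the Elasticsearch REST API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = Request(BASE_URL + path, data=data, method=method)
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


# Health, status and statistics
health = es_request("GET", "/_cluster/health")
stats = es_request("GET", "/blog/_stats")

# CRUD on a hypothetical "blog" index / "post" type
create = es_request("PUT", "/blog/post/1",
                    {"title": "Hello Elasticsearch", "published": "2014-06-01"})
read = es_request("GET", "/blog/post/1")
update = es_request("POST", "/blog/post/1/_update", {"doc": {"title": "Hello again"}})
delete = es_request("DELETE", "/blog/post/1")

# Search with paging and sorting
search = es_request("POST", "/blog/_search", {
    "query": {"match": {"title": "hello"}},
    "from": 0, "size": 10,                 # paging
    "sort": [{"published": "desc"}],       # sorting
})

# Sending is one call, e.g.: urllib.request.urlopen(health).read()
```

Every operation on the slide maps to an HTTP verb plus a path, with an optional JSON body.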
26. Mapping 1/2
• Defines how a document should be mapped (similar to a schema): searchable fields, tokenization, storage, …
• Explicit mapping is defined at the index/type level
• If no explicit mapping exists, a default mapping is created automatically from the fields of the indexed documents
27. Mapping 2/2
• Core types: string, integer/long, float/double, boolean, date, and null
• Other types: Array, Object, Nested, IP, GeoPoint, GeoShape, Attachment (via the mapper-attachments plugin)
• Example
28. Search API 1/2
• Multi-index, multi-type
• URI search - Google-like query strings: operators (AND/OR), fields, sort, paging, wildcards, …
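A URI search puts the whole query in the query string. The sketch below builds such a path for a multi-index search; the index names and fields are hypothetical, while `q`, `sort`, `from` and `size` are standard URI-search parameters.

```python
from urllib.parse import urlencode


def uri_search(indices, q, types=None, sort=None, from_=0, size=10):
    """Build a URI-search path: /{index1,index2}[/{type1,type2}]/_search?q=..."""
    path = "/" + ",".join(indices)          # multi-index: comma-separated names
    if types:
        path += "/" + ",".join(types)       # multi-type, same convention
    params = {"q": q, "from": from_, "size": size}   # paging via from/size
    if sort:
        params["sort"] = sort               # e.g. "field:desc"
    return path + "/_search?" + urlencode(params)


url = uri_search(["blog", "news"], "title:elastic* AND author:alice",
                 sort="published:desc", from_=10)
```

The `q` value uses Lucene query-string syntax (fields, AND/OR, wildcards), which is what gives URI search its Google-like feel.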
30. Query DSL
• The "SQL" of Elasticsearch
• Queries should be used
  • for full-text search
  • where the result depends on a relevance score
• Filters should be used
  • for binary yes/no searches
  • for queries on exact values
33. Analysis 1/2
• Analysis is the process of extracting "terms" from a given text
• Processing natural language to make it searchable by a computer
• Configurable registry of analyzers that can be used
  • to break indexed (analyzed) fields into terms when a document is indexed
  • to process query strings
34. Analysis 2/2
• Analyzers are composed of
  • a single Tokenizer (which may be preceded by one or more CharFilters)
  • zero or more TokenFilters
• Built-in analyzers: standard, pattern, whitespace, language, snowball
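To make the CharFilter → Tokenizer → TokenFilter chain concrete, here is a toy analyzer in plain Python. It only mimics the shape of the pipeline (it is not Lucene's implementation) and uses a tiny hypothetical stop-word list; the `analysis_settings` dict shows how a comparable chain could be declared as a custom analyzer from the built-in `html_strip`, `standard`, `lowercase` and `stop` components.

```python
import re

STOP_WORDS = {"a", "an", "and", "is", "of", "the", "to"}  # tiny illustrative stop list


def html_strip_char_filter(text):
    """CharFilter: rewrites the raw characters before tokenization (drops HTML tags)."""
    return re.sub(r"<[^>]+>", " ", text)


def word_tokenizer(text):
    """Tokenizer: splits the character stream into tokens (runs of word characters)."""
    return re.findall(r"\w+", text)


def lowercase_filter(tokens):
    """TokenFilter: normalizes case."""
    return [t.lower() for t in tokens]


def stop_filter(tokens):
    """TokenFilter: removes stop words (expects lowercased tokens)."""
    return [t for t in tokens if t not in STOP_WORDS]


def analyze(text, char_filters=(html_strip_char_filter,), tokenizer=word_tokenizer,
            token_filters=(lowercase_filter, stop_filter)):
    for char_filter in char_filters:      # zero or more CharFilters
        text = char_filter(text)
    tokens = tokenizer(text)              # exactly one Tokenizer
    for token_filter in token_filters:    # zero or more TokenFilters, in order
        tokens = token_filter(tokens)
    return tokens


terms = analyze("<p>The QUICK brown foxes</p>")

# A comparable chain declared as a custom analyzer in index settings (name is hypothetical)
analysis_settings = {
    "settings": {"analysis": {"analyzer": {"my_html_analyzer": {
        "type": "custom",
        "char_filter": ["html_strip"],
        "tokenizer": "standard",
        "filter": ["lowercase", "stop"],
    }}}}
}
```

Order matters: lowercasing runs before stop-word removal, so "The" is dropped as "the".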
35. Copyright 2014 Elasticsearch Inc / Elasticsearch BV. All rights reserved. Content used with permission from Elasticsearch.
36. Analytics
• Aggregation of information: similar to "group by"
• Facets
  • Aggregated data based on a search query
  • One-dimensional results
  • Ex: "terms facets" return facet counts for the various values of a specific field. Think color, tag, category, …
• Aggregations (ES 1.0+)
  • Nested facets
  • Basic stats: mean, min, max, std dev, term counts
  • Significant terms, percentiles, cardinality estimations
37. Facets
• Not yet deprecated, but use aggregations!
• Various facets: terms, range, histogram, date, statistical, geo distance, …
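A sketch of an ES 1.x search that requests a `terms` facet and a `statistical` facet alongside the hits, followed by a toy function reproducing what a terms facet counts. Field names are hypothetical, and the toy returns entries shaped like the `term`/`count` pairs of a terms facet response.

```python
from collections import Counter

facet_search = {
    "query": {"match": {"title": "shirt"}},     # facets are computed over the matching docs
    "facets": {
        "colors": {"terms": {"field": "color", "size": 10}},
        "price_stats": {"statistical": {"field": "price"}},
    },
}


def terms_facet(matching_docs, field, size=10):
    """Toy terms facet: count matching documents per distinct value of `field`."""
    counts = Counter()
    for doc in matching_docs:
        values = doc.get(field, [])
        if not isinstance(values, list):
            values = [values]
        counts.update(set(values))              # a doc counts once per distinct value
    return [{"term": t, "count": c} for t, c in counts.most_common(size)]


hits = [{"color": "red"}, {"color": "blue"}, {"color": ["red", "green"]}, {"size": "L"}]
colors = terms_facet(hits, "color")
```

The result is one-dimensional: a flat list of values and counts, with no way to nest a second breakdown inside each entry.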
38. Aggregations
• A generic, powerful framework that can be divided into 2 main families:
  • Bucketing: each bucket is associated with a key and a document criterion. The aggregation process provides a list of buckets, each one with a set of documents that "belong" to it.
  • Metric: aggregations that keep track of and compute metrics over a set of documents.
• Aggregations can be nested!
39. Bucket Aggregators
• global
• filter
• missing
• terms
• range
• date range
• ip range
• histogram
• date histogram
• geo distance
• geohash grid
• nested
• reverse nested
• top hits (version 1.3; strictly a metric aggregator, typically nested under a bucket aggregator)
41. Search for end users
• Suggesters - "Did you mean?": term, phrase, completion, context
• "More like this": find documents that are "like" a provided text by running it against one or more fields
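Both features are plain search-request bodies. A sketch in ES 1.x syntax (the `title`/`body` fields and the suggestion name `spelling` are hypothetical; in 1.x `more_like_this` took its input text as `like_text`), followed by a toy "did you mean" that ranks known terms by Levenshtein edit distance. The toy illustrates the edit-distance idea behind the term suggester, not its exact algorithm.

```python
suggest_request = {
    "query": {"match": {"title": "elasticsaerch"}},
    "suggest": {
        "spelling": {"text": "elasticsaerch", "term": {"field": "title"}},
    },
}

mlt_request = {
    "query": {
        "more_like_this": {
            "fields": ["title", "body"],
            "like_text": "distributed search engine built on Lucene",
            "min_term_freq": 1,
            "max_query_terms": 12,
        }
    }
}


def edit_distance(a, b):
    """Levenshtein distance (insertions, deletions, substitutions), row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution (free if equal)
        prev = cur
    return prev[-1]


def did_you_mean(word, vocabulary, max_edits=2):
    """Toy term suggester: known terms within max_edits, closest first."""
    scored = sorted((edit_distance(word, t), t) for t in vocabulary if t != word)
    return [t for d, t in scored if d <= max_edits]
```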
42. Percolator
• Classic ES
  1. Add & index documents
  2. Search with queries
  3. Retrieve matching documents
• Percolator
  1. Add & index queries
  2. Percolate documents
  3. Retrieve matching queries
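In ES 1.x, queries are registered as documents of the special `.percolator` type, and a document is percolated through the `_percolate` endpoint. The sketch lists those two requests for a hypothetical `alerts` index, then mimics the inversion with a toy in-memory percolator that matches single terms only (real percolation runs full Query DSL queries).

```python
# ES 1.x requests (index, type, ids and fields are hypothetical)
register_query = ("PUT", "/alerts/.percolator/laptop-deals",
                  {"query": {"match": {"title": "laptop"}}})
percolate_doc = ("GET", "/alerts/product/_percolate",
                 {"doc": {"title": "New laptop deal", "price": 499}})


class ToyPercolator:
    """Stores queries instead of documents, then matches documents against them."""

    def __init__(self):
        self.queries = {}                      # query id -> (field, set of terms)

    def register(self, query_id, field, terms):
        """Step 1: add & index a query (here: any of `terms` appearing in `field`)."""
        self.queries[query_id] = (field, {t.lower() for t in terms})

    def percolate(self, doc):
        """Steps 2-3: run every stored query against one document, return matching ids."""
        matches = []
        for query_id, (field, terms) in self.queries.items():
            doc_terms = set(str(doc.get(field, "")).lower().split())
            if terms & doc_terms:
                matches.append(query_id)
        return sorted(matches)


p = ToyPercolator()
p.register("laptop-deals", "title", ["laptop"])
p.register("phone-deals", "title", ["phone", "smartphone"])
matched = p.percolate(percolate_doc[2]["doc"])
```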
43. Why Percolate?!
• Alerts: social media mentions, weather forecasts, news alerts
• Automatic monitoring: price monitoring, stock alerts, logs
• Ads: display targeted ads based on a user's search queries
• Enrichment: percolate new documents, then add the matching queries as document tags
44. High Availability 1/2
• Sharding - write scalability
  • Splits logical data over multiple machines & controls data flows
  • Each index has a fixed number of primary shards, set at index creation
  • Improves indexing performance
• Replication - read scalability
  • Each shard can have zero or more replicas (can be changed at any time)
  • Removes the SPOF (Single Point Of Failure)
  • Improves search performance
45. High Availability 2/2
• Zen Discovery
  • Automatic discovery of nodes within a cluster and election of a master node
  • Useful for failover and replication
  • Cloud-specific discovery modules: Amazon EC2, Microsoft Azure, Google Compute Engine
• Snapshot & Restore module