1. Elasticsearch as a time series database
Felix Barnsteiner | iSYS Software
2. Felix Barnsteiner
● Author of the blog “Elasticsearch as a Time Series Data Store”
● Project Lead stagemonitor
○ Open Source performance monitoring
● iSYS Software GmbH
○ custom software
○ jstage.de
■ E-Commerce Platform, Consulting and Services
○ We’re hiring!
3. Agenda
● Time Series What?
● Why Elasticsearch?
● Data Model
● Elasticsearch Mapping Tuning
● Index Management
11. In search of a new TSDB
● New database for stagemonitor
● Replacing Graphite
○ Installation can be tricky
○ Scaling issues
○ Memory hungry
○ No support for special chars
■ GET /index.html
■ GET_|index:html
● Good things about Graphite
○ Grafana
○ Functions
○ Community/Tools/Collectors
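The special-character problem above comes from Graphite treating dots as hierarchy separators in metric names, so a request name has to be escaped before it can be stored. A minimal sketch of such escaping follows; the replacement table is an assumption inferred from the single example on the slide, and `sanitize_for_graphite` is a hypothetical helper, not a stagemonitor or Graphite API.

```python
# Hypothetical escaping table, inferred only from the slide's example
# "GET /index.html" -> "GET_|index:html".
REPLACEMENTS = {" ": "_", "/": "|", ".": ":"}


def sanitize_for_graphite(name: str) -> str:
    """Replace characters that would break a Graphite metric path."""
    return "".join(REPLACEMENTS.get(ch, ch) for ch in name)


print(sanitize_for_graphite("GET /index.html"))  # GET_|index:html
```

The escaped name is harder to read and cannot be reliably mapped back to the original, which is one reason to prefer a store that accepts arbitrary strings as tag values.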
12. Stagemonitor
● Open Source Java Performance Monitoring
○ Mainly Web Applications
● Monitoring of clustered environments
● For Dev, QA and Ops
18. Don’t Match Requirements
● OpenTSDB
○ Not easy to install
○ Hadoop and HBase
● KairosDB
○ Not easy to install
○ Cassandra + KairosDB + Config
● Prometheus
○ No Clustering
● Druid
○ No visualization tool
19. Easy to install
● InfluxDB
○ rpm, deb, homebrew
● Elasticsearch
○ rpm, deb, download & run
○ Runs anywhere
○ Stagemonitor already requires Elasticsearch
24. Elasticsearch Benefits
● Large community
● Proven to scale
● Redundancy - replicas
● Backups - snapshot and restore
● Easy to install and manage
● Already needed for stagemonitor
○ Only one database to install instead of two
● Marvel uses ES as a TSDB
● CERN says
27. How do Timers work?
● Stagemonitor uses Dropwizard Metrics
● Not one report per request
○ Not scalable
○ Sampling
● Response times are aggregated in memory
● Stores 1000 representative response times
● Computes metrics like percentiles, min/max, stdev, throughput/rate
● Reports every 60 seconds (configurable)
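The in-memory aggregation described above can be sketched as follows. Dropwizard Metrics ships its own reservoir implementations; this sketch swaps in plain uniform reservoir sampling (Algorithm R) as a simple stand-in, and the class and method names are hypothetical.

```python
import random


class SampledTimer:
    """Keeps a bounded, uniformly sampled set of response times."""

    def __init__(self, reservoir_size=1000, seed=None):
        self.reservoir_size = reservoir_size
        self.samples = []
        self.count = 0
        self._rng = random.Random(seed)

    def update(self, millis):
        """Record one response time without growing memory unboundedly."""
        self.count += 1
        if len(self.samples) < self.reservoir_size:
            self.samples.append(millis)
            return
        # Algorithm R: the new value replaces a random slot with
        # probability reservoir_size / count.
        slot = self._rng.randrange(self.count)
        if slot < self.reservoir_size:
            self.samples[slot] = millis

    def snapshot(self):
        """Compute the summary that would be reported once per interval."""
        values = sorted(self.samples)
        if not values:
            return {"count": 0}

        def percentile(q):
            return values[min(len(values) - 1, int(q * len(values)))]

        return {
            "count": self.count,
            "min": values[0],
            "max": values[-1],
            "p50": percentile(0.50),
            "p95": percentile(0.95),
        }


timer = SampledTimer(reservoir_size=1000, seed=42)
for response_time in range(10_000):
    timer.update(response_time)
print(timer.snapshot())
```

Only the snapshot leaves the process, so the number of documents written per reporting interval depends on the number of timers, not on the request rate.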
37. Mapping
● _all: contains all values of all other fields
● _source: contains original JSON
● Disabling both saves disk space
● BUT…
● With _source disabled, the original JSON is not visible
● Data can only be retrieved via aggregations
38. Mapping
● Don’t analyze tags (like application, host, ...)
○ No full text search/stemming needed
○ Only filter by exact values
● doc_values: true
○ Reduces heap usage (and OOMEs) at the cost of disk space
○ Needed for aggregations (un-inverted index)
○ Default in 2.0
39. Mapping
● index: no for metric values
○ Not searchable by value
○ Saves disk space
● Using integers and floats instead of longs and doubles
○ Saves disk space
○ OK for our use case
○ May not be OK for yours
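Taken together, the mapping tips from the three slides above could look roughly like the index template below. This is a sketch using the pre-5.0 mapping syntax (`string`, `not_analyzed`, `index: no`) that matches the Elasticsearch versions discussed in the talk; the dynamic template names are made up for illustration, and stagemonitor's actual mapping may differ.

```python
import json

# Sketch of an index template body (Elasticsearch 1.x/2.x syntax).
# The dynamic template names ("tags", "float_values", "integer_values")
# are hypothetical.
metrics_template = {
    "template": "stagemonitor-metrics-*",
    "mappings": {
        "_default_": {
            # Saves disk space, but documents can no longer be retrieved
            # as JSON, only via aggregations.
            "_all": {"enabled": False},
            "_source": {"enabled": False},
            "dynamic_templates": [
                {   # Tags: exact-value filtering only, no analysis.
                    "tags": {
                        "match_mapping_type": "string",
                        "mapping": {
                            "type": "string",
                            "index": "not_analyzed",
                            "doc_values": True,
                        },
                    }
                },
                {   # Metric values: not searchable, stored as float.
                    "float_values": {
                        "match_mapping_type": "double",
                        "mapping": {
                            "type": "float",
                            "index": "no",
                            "doc_values": True,
                        },
                    }
                },
                {   # Counters: not searchable, stored as integer.
                    "integer_values": {
                        "match_mapping_type": "long",
                        "mapping": {
                            "type": "integer",
                            "index": "no",
                            "doc_values": True,
                        },
                    }
                },
            ],
        }
    },
}

print(json.dumps(metrics_template, indent=2))
```

Whether float and integer precision is sufficient depends on the metrics being stored, as the slide notes.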
40. Benchmark
● ~23 M datapoints
● A week’s worth of stagemonitor metric data when reporting every minute
● Elasticsearch: ~500 MB
● InfluxDB: ~360 MB (0.9.4.1)
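For a rough sense of scale, the benchmark figures can be converted to bytes per datapoint. The sketch below assumes 1 MB = 1,000,000 bytes and takes the rounded numbers from the slide at face value, so the results are approximations only.

```python
DATAPOINTS = 23_000_000  # "~23 M datapoints" from the slide


def bytes_per_datapoint(total_megabytes, datapoints=DATAPOINTS):
    """Average on-disk size of one datapoint, assuming 1 MB = 10**6 bytes."""
    return total_megabytes * 1_000_000 / datapoints


print(round(bytes_per_datapoint(500), 1))  # ~500 MB figure: about 21.7 bytes
print(round(bytes_per_datapoint(360), 1))  # InfluxDB figure: about 15.7 bytes
```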
42. One Index Per Day
● Logstash-like index name format
● stagemonitor-metrics-2016.01.18
● Only relevant indices have to be queried
● Easier/more efficient to delete
● Mapping can be changed every day
○ Number of shards
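The daily naming scheme is what makes it possible to query only the relevant indices. A minimal sketch of deriving the index names for a time range, assuming the `stagemonitor-metrics-` prefix from the slide; the helper names are hypothetical.

```python
from datetime import date, timedelta

INDEX_PREFIX = "stagemonitor-metrics-"


def index_name(day: date) -> str:
    """Name of the daily index that holds the metrics for the given day."""
    return INDEX_PREFIX + day.strftime("%Y.%m.%d")


def indices_for_range(start: date, end: date) -> list:
    """All daily indices that a query from start to end (inclusive) must hit."""
    days = (end - start).days
    return [index_name(start + timedelta(days=i)) for i in range(days + 1)]


print(index_name(date(2016, 1, 18)))
print(",".join(indices_for_range(date(2016, 1, 16), date(2016, 1, 18))))
```

The comma-separated result can be used as the index part of a search URL, so older indices are never touched by a query for recent data.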
43. Hot/Cold Architecture
● Recent data is queried more often
● Beefy nodes hold recent data (hot)
○ More CPU, RAM
○ SSDs
● Cheaper nodes hold older data (cold)
○ Less CPU, RAM
○ HDDs
44. Hot/Cold Architecture
● Beefy Node(s), elasticsearch.yml: node.box_type: hot
● Cheap Node(s), elasticsearch.yml: node.box_type: cold
● Indices: today, yesterday, 2 days ago, ...
45. Shard allocation filtering
● New Indices are allocated to hot nodes
{
  "template": "stagemonitor-metrics-*",
  "settings": {
    "index": {
      "routing.allocation.require.box_type": "hot"
    }
  }
  …
}
46. Shard allocation filtering
● As indices grow old, they are moved to cold nodes
PUT /stagemonitor-requests-*,-stagemonitor-requests-2016.01.18,-stagemonitor-requests-2016.01.17,-stagemonitor-requests-2016.01.16/_settings
{
  "index.routing.allocation.require.box_type": "cold"
}
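The index expression in the request above selects every daily index except the most recent ones. A minimal sketch of generating that path for any date follows; `cold_settings_path` and its parameters are hypothetical names, and sending the request is left to whichever HTTP client is in use.

```python
from datetime import date, timedelta

# Settings body that moves the selected indices to cold nodes.
COLD_BODY = {"index.routing.allocation.require.box_type": "cold"}


def cold_settings_path(prefix: str, today: date, days_on_hot: int) -> str:
    """Build the _settings path matching all indices except the newest ones.

    The wildcard selects every daily index; each "-name" entry excludes
    one of the days that should stay on the hot nodes.
    """
    excluded = [
        "-" + prefix + (today - timedelta(days=i)).strftime("%Y.%m.%d")
        for i in range(days_on_hot)
    ]
    return "/" + ",".join([prefix + "*"] + excluded) + "/_settings"


print(cold_settings_path("stagemonitor-requests-", date(2016, 1, 18), 3))
```

With `days_on_hot=3` and 2016-01-18 as the current day, this reproduces the path shown on the slide.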
47. Hot/Cold Architecture
● Beefy Node(s), elasticsearch.yml: node.box_type: hot
○ New index
● Cheap Node(s), elasticsearch.yml: node.box_type: cold
○ Old indices
48. Force Merge (optimize)
● Improves resource use (fewer segments per shard)
● Speeds up cluster recovery and node restarts
● Requires a lot of CPU and disk resources while running
● Should be performed on cold nodes
○ They usually aren’t doing much
○ Hot nodes usually are busy due to indexing/querying
51. Tools
● Curator
○ Command line tool by Elastic
○ Run 3 jobs via cron
● Stagemonitor
○ Automatically handled
○ How many days on hot nodes
○ Configurable delete delay
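The daily housekeeping can be sketched as a planning step that sorts indices by age. It is an assumption here, based on the preceding slides, that the jobs in question are moving indices to cold nodes, force merging them, and deleting old ones; the function and parameter names are hypothetical and do not reflect Curator's or stagemonitor's actual configuration options.

```python
from datetime import date, datetime


def plan_maintenance(index_names, today, days_on_hot, delete_after_days,
                     prefix="stagemonitor-metrics-"):
    """Decide what to do with each daily index based on its age in days."""
    plan = {"keep_hot": [], "move_to_cold_and_merge": [], "delete": []}
    for name in sorted(index_names):
        if not name.startswith(prefix):
            continue  # Not one of our daily indices.
        day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        age = (today - day).days
        if age >= delete_after_days:
            plan["delete"].append(name)
        elif age >= days_on_hot:
            plan["move_to_cold_and_merge"].append(name)
        else:
            plan["keep_hot"].append(name)
    return plan


existing = [
    "stagemonitor-metrics-2016.01.18",
    "stagemonitor-metrics-2016.01.15",
    "stagemonitor-metrics-2015.12.01",
]
print(plan_maintenance(existing, date(2016, 1, 18),
                       days_on_hot=3, delete_after_days=30))
```

Running the planner once a day keeps the decision logic separate from the Elasticsearch calls that carry it out.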