2. Lithium makes software that helps brands better connect with their customers
Our social software helps companies respond on social networks and build
trusted content on a community they own.
3. Empower brands to distill terabytes of daily data into understanding participation
▪ fast
▪ flexible
▪ scalable
What products/services are generating the most conversations?
Who is authoring content that generates the most kudos/likes?
Are customer posts getting timely replies?
What types of content does this audience segment look for?
7. Bulk initial load / rebuild of data
[Diagram: Hadoop and MySQL stream → Transform/route → … → JSON → Elasticsearch]
8. Bulk loading
▪ Make sure ingest logic is robust
• Idempotent for bulk replay - ‘_id’
• Include revision based on processor/time
• Check cluster/index status to make sure ready to ingest
▪ Know the cache and thread pool sizes
• Bulk thread pool - fixed type, size = # of processors, queue size 50
• Handle back off and retry
▪ How many docs?
• Like capacity planning - test with your data
• Number of shards
• index.refresh_interval: 30s
• indices.memory.index_buffer_size: 5%
• indices.memory.*
• index.translog.*
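A minimal sketch of the ingest points above, assuming a loader written in Python. It builds a newline-delimited bulk body with an explicit `_id` so a replayed batch overwrites rather than duplicates, stamps each document with a revision, checks cluster status before sending, and backs off when a request is rejected. The function names, the `revision` field, the document `id` key, and the choice to treat yellow status as ready are all assumptions for illustration. `send` is a hypothetical callable standing in for whatever HTTP client posts the body.

```python
import json
import time


def ready_to_ingest(health):
    """Assumption: green or yellow cluster status is good enough to ingest."""
    return health.get("status") in ("green", "yellow")


def bulk_body(index, docs, revision):
    """Build a newline-delimited bulk body with one action line per document."""
    lines = []
    for doc in docs:
        # Explicit _id makes a replayed batch overwrite instead of duplicate.
        action = {"index": {"_index": index, "_id": str(doc["id"])}}
        # Hypothetical 'revision' field, e.g. "<processor>:<epoch seconds>".
        source = dict(doc, revision=revision)
        lines.append(json.dumps(action, sort_keys=True))
        lines.append(json.dumps(source, sort_keys=True))
    return "\n".join(lines) + "\n"


def send_with_backoff(send, body, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call send(body) until it returns True, doubling the wait between tries.

    Returns the number of attempts used; raises if every attempt is rejected.
    """
    for attempt in range(1, max_attempts + 1):
        if send(body):
            return attempt
        if attempt < max_attempts:
            sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError("bulk request rejected after %d attempts" % max_attempts)
```

Because the body is a pure function of its inputs, replaying the same batch with the same revision produces byte-identical requests, which keeps retries safe. Older Elasticsearch versions also expect a `_type` in the action line or the request URL.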
10. Faceting
▪ Don't forget about memory!
• Strings - not_analyzed
• Numbers - long vs int, double vs float, etc.
• Do you need seconds/minutes when faceting?
• fielddata format - doc_values (1.0)
• Admin APIs allow checking field data size + evictions
• indices.cache.filter.size: 15%
• indices.fielddata.cache.size: 45%
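To illustrate the seconds/minutes question above: the number of distinct values held in memory for a faceted date field drops sharply when timestamps are rounded before indexing. This is a small sketch with made-up sample data. `truncate_to_hour` is a hypothetical helper, and rounding to the hour is just one possible granularity.

```python
from datetime import datetime


def truncate_to_hour(ts):
    """Drop minutes, seconds and microseconds from a timestamp."""
    return ts.replace(minute=0, second=0, microsecond=0)


# Made-up sample: one event every 10 seconds for a single hour.
timestamps = [
    datetime(2014, 3, 1, 9, minute, second)
    for minute in range(60)
    for second in range(0, 60, 10)
]

distinct_raw = len(set(timestamps))
distinct_hourly = len({truncate_to_hour(t) for t in timestamps})
print(distinct_raw, distinct_hourly)  # prints "360 1"
```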
11. Faceting II
▪ Accuracy
• shard_size
• Number of shards
• Cardinality
• Routing
▪ Great custom plugin framework
• Uniques
• Array faceting
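The accuracy bullets above can be illustrated with a toy model of how per-shard top-N lists are merged. This is a simplified simulation, not Elasticsearch's implementation: each shard reports only its `shard_size` most frequent terms, and the coordinator sums what it receives. `sharded_top_terms` and the sample data are hypothetical.

```python
from collections import Counter


def sharded_top_terms(shards, size, shard_size):
    """Merge each shard's top `shard_size` terms and return the top `size`."""
    merged = Counter()
    for shard in shards:
        # A shard only reports its locally most frequent terms.
        for term, count in Counter(shard).most_common(shard_size):
            merged[term] += count
    return merged.most_common(size)


# "b" is the true winner overall (8), but it is only second on each shard.
shards = [
    ["a"] * 6 + ["b"] * 4,
    ["c"] * 5 + ["b"] * 4,
]

print(sharded_top_terms(shards, size=1, shard_size=1))  # prints [('a', 6)]
print(sharded_top_terms(shards, size=1, shard_size=2))  # prints [('b', 8)]
```

With `shard_size` equal to `size`, the term spread evenly across shards is missed entirely. Asking each shard for more candidates recovers it, at the cost of more data per shard response. Fewer shards, lower cardinality, or routing related documents to the same shard reduce the problem in the same way.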
12. Impact
▪ Order of magnitude improvement
▪ Developers able to focus on improving insights
▪ Community + Elasticsearch + Hadoop + Hortonworks = exciting