Use bulk requests for indexing, creating, updating and deleting
Measure bulk size in bytes, not in number of documents
If in doubt, use smaller batch sizes
Parallelize multiple bulk requests
Use asynchronous calls (see the sketch below)
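A minimal sketch of size-capped bulk indexing, assuming the official elasticsearch-py client (a reasonably recent pre-8.x version) against a local node; index, type and field names are invented:

    from elasticsearch import Elasticsearch, helpers

    es = Elasticsearch()  # assumes a node on localhost:9200

    def actions():
        # A generator keeps memory flat; each dict becomes one bulk item.
        for i in range(100000):
            yield {"_index": "logs", "_type": "log", "_id": i,
                   "_source": {"message": "event %d" % i}}

    # Cap batches by byte size, not only by document count.
    success, errors = helpers.bulk(es, actions(),
                                   chunk_size=1000,
                                   max_chunk_bytes=5 * 1024 * 1024)

    # helpers.parallel_bulk (available in newer client versions) runs
    # several bulk requests concurrently; it returns a generator that
    # must be consumed.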
Turn off refresh while indexing (refresh_interval: -1)
Delay flushes
Throttle merging
Maybe increase indices.memory.index_buffer_size
Set replicas to zero during the initial bulk load only; restore them afterwards
Disable index warmers while bulk indexing
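For example, the relevant settings can be flipped around a bulk load; a sketch using the elasticsearch-py client with 1.x-era setting names:

    # Before the bulk load: no refresh, no replicas.
    es.indices.put_settings(index="logs", body={
        "index": {"refresh_interval": "-1",
                  "number_of_replicas": 0}})

    # ... run the bulk indexing ...

    # Afterwards: restore refresh and replicas.
    es.indices.put_settings(index="logs", body={
        "index": {"refresh_interval": "1s",
                  "number_of_replicas": 1}})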
Do you really need your _all field?
Reconsider whether you need the _source field and individually stored fields
Reduce analysis
Disable field norms where you don't need them
Drop term frequencies and positions where you don't need them
not_analyzed is your friend
Dynamic mapping is for playtime, not production
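A sketch of such a trimmed-down mapping, in 1.x-era syntax (type, field and index names are invented; exact option names vary by version):

    es.indices.put_mapping(index="logs", doc_type="log", body={
        "log": {
            "_all": {"enabled": False},             # skip the catch-all field
            "dynamic": "strict",                    # no surprise fields in production
            "properties": {
                "status": {"type": "string",
                           "index": "not_analyzed",     # exact value, no analysis
                           "norms": {"enabled": False}, # no length normalization
                           "index_options": "docs"}     # no term freqs/positions
            }
        }
    })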
Filters do no scoring
Filter results can be cached
Most simple filters are cached by default, but not all (geo filters are not)
Compound filters are not cached
Explicitly control caching with _cache (example below)
bool filters consult the cache for their sub-filters, but and/or/not filters don't
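For instance, caching can be forced on a normally uncached filter; a sketch in the 1.x query DSL (index and field names invented):

    es.search(index="venues", body={
        "query": {"filtered": {
            "query": {"match_all": {}},
            "filter": {"geo_distance": {
                "distance": "10km",
                "location": {"lat": 52.52, "lon": 13.40},
                "_cache": True   # geo filters are not cached unless asked
            }}
        }}
    })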
Moving Target
Consider the scope of the filter: you probably want a filtered query
A top-level filter is applied after the query, but not in a "filtered query"
Regular queries query first and filter afterwards
A filtered query filters first
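The difference, sketched in the 1.x DSL (a top-level post_filter versus a filtered query; names invented):

    # Scores everything, filters only the final hits:
    es.search(index="products", body={
        "query": {"match": {"title": "shoes"}},
        "post_filter": {"term": {"in_stock": True}}})

    # Filters first, then queries only the remaining documents:
    es.search(index="products", body={
        "query": {"filtered": {
            "query": {"match": {"title": "shoes"}},
            "filter": {"term": {"in_stock": True}}}}})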
Elements of a bool filter are executed sequentially
Place the most restrictive filter first
Accelerator filter: an additional, cheap filter on general terms
Better for caching
Reduces the work for heavyweight filters (see the sketch below)
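A sketch of that ordering: a cheap, cacheable term filter runs first and shrinks the candidate set before an expensive geo filter sees it (names invented):

    es.search(index="venues", body={
        "query": {"filtered": {
            "query": {"match_all": {}},
            "filter": {"bool": {"must": [
                # Cheap accelerator filter first: cacheable and selective.
                {"term": {"city": "berlin"}},
                # Heavyweight filter second: sees far fewer candidates.
                {"geo_distance": {"distance": "2km",
                                  "location": {"lat": 52.52, "lon": 13.40}}}
            ]}}
        }}
    })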
Pagination
Don't load too many results at once
Avoid deep pagination
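Deep from/size pagination forces every shard to collect from+size hits. One common alternative (not from the original notes) is to scroll through results, e.g. with the client's scan helper:

    from elasticsearch import helpers

    # Streams all matching documents without deep from/size pages.
    for hit in helpers.scan(es, index="logs",
                            query={"query": {"match": {"message": "error"}}}):
        print(hit["_id"])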
Index-time vs. query-time optimizations: try to do the work at index time
E.g. a prefix query vs. edge n-grams (see the sketch below)
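A sketch of moving prefix matching to index time with an edge n-gram filter (1.x-era settings; analyzer, index and field names invented, exact type names vary by version):

    es.indices.create(index="products", body={
        "settings": {"analysis": {
            "filter": {"front_edges": {"type": "edge_ngram",
                                       "min_gram": 1, "max_gram": 10}},
            "analyzer": {"autocomplete": {
                "type": "custom",
                "tokenizer": "standard",
                "filter": ["lowercase", "front_edges"]}}}},
        "mappings": {"product": {"properties": {
            "name": {"type": "string",
                     "index_analyzer": "autocomplete",   # n-grams at index time
                     "search_analyzer": "standard"}}}}   # plain terms at query time
    })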
Warm up "common queries" with index warmers
Turn on the slow log
Use multi-search if applicable
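Multi-search batches several searches into one request; a sketch with the Python client (indices and queries invented):

    responses = es.msearch(body=[
        {"index": "products"},                     # header for query 1
        {"query": {"match": {"name": "shoe"}}},    # body for query 1
        {"index": "logs"},                         # header for query 2
        {"query": {"term": {"level": "error"}}},   # body for query 2
    ])
    for resp in responses["responses"]:
        print(resp["hits"]["total"])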
Load lazily as much as possible
Hide fields that are needed less often
Load data only once during pagination
Sorting, for example:
Field data is stored in RAM
Expensive for the JVM: garbage-collection issues
The OS file system cache can take care of that instead (doc values on disk)
Slightly slower
Test them!
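A sketch of enabling disk-based doc values for a sort field (1.x-era mapping; names invented):

    es.indices.put_mapping(index="logs", doc_type="log", body={
        "log": {"properties": {
            "timestamp": {"type": "date",
                          "doc_values": True}}}})  # served via the FS cache, not heap

    # Sorting on the field now reads doc values from disk:
    es.search(index="logs", body={
        "query": {"match_all": {}},
        "sort": [{"timestamp": "desc"}]})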
An update is a delete + add
Partial updates still read and reindex the whole document
Even "small" updates can be expensive
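A partial update, sketched with the Python client (old-style doc_type API; names invented). Convenient, but internally it still rewrites the whole document:

    es.update(index="users", doc_type="user", id="42",
              body={"doc": {"last_login": "2015-01-01"}})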
Sequential IDs allow optimized (binary) storage
Java's UUIDs are truly random
Internally, Elasticsearch uses Flake IDs
Multiple shards allow parallel writes
Multiple replicas allow parallel reads
Replicas make indexing more expensive
But they add safety
Sharding makes reads slower:
A round trip for accurate scoring (gathering distributed term statistics)
A second round trip for the search itself
A reduce step on the coordinating node
A third round trip to retrieve the final set of documents
Two rules of distributed search:
1. Distributed search is expensive!
2. Searching multiple indices is the same as searching multiple shards
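The scoring round trip is opt-in; a sketch using the search_type parameter (available in pre-7 versions):

    # Extra round trip to fetch global term statistics for accurate scores:
    es.search(index="logs", search_type="dfs_query_then_fetch",
              body={"query": {"match": {"message": "error"}}})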
Routing only works for isolated "chunks" of data in the same index
"Users", for example
The routing key overrides the shard key
Popular example: the user ID
Multiple users will share a shard
Shards will differ in size
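A sketch of routing by user ID with the Python client (names invented): documents and searches for one user then hit a single shard, but a filter is still needed because other users share it:

    # Index all of a user's documents with their ID as the routing value.
    es.index(index="messages", doc_type="message", id="m1",
             routing="user42", body={"user": "user42", "text": "hello"})

    # A search routed the same way only touches that user's shard.
    es.search(index="messages", routing="user42",
              body={"query": {"filtered": {
                  "query": {"match": {"text": "hello"}},
                  "filter": {"term": {"user": "user42"}}}}})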
Alternative: aliases
Move large users out to their own index
Have an alias point to all the indices
Drawback: the cluster state becomes big, with high network impact
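A sketch of the alias approach (index and routing names invented): aliases can carry a filter and a routing value, so each user looks like their own index:

    es.indices.update_aliases(body={"actions": [
        # Small users stay in the shared index, pinned to one shard.
        {"add": {"index": "users_shared", "alias": "user_42",
                 "routing": "42",
                 "filter": {"term": {"user_id": "42"}}}},
        # A large user gets a dedicated index behind the same kind of alias.
        {"add": {"index": "user_1001_idx", "alias": "user_1001"}},
    ]})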
Use existing client libraries
If Java, prefer the NodeClient
Alternative: the TransportClient
HTTP:
Use long-lived connections
Check HTTP chunking
Raise the maximum number of file descriptors
Avoid swapping
ES_HEAP_SIZE (set Xms = Xmx)
Leave enough memory to the OS
Give half the machine's memory to ES
Not more than 32 GB (the limit for compressed object pointers)
If using doc values, a few GB should be enough
Use a concurrent GC
The default is CMS; maybe try G1
Check your Java version
Avoid virtualisation
Beware of noisy neighbours
Storage:
Use local disks
Use SSDs
RAID 0 (replicas already provide redundancy)