RocksDB detail

Published in: Software

  1. RocksDB: Embedded Key-Value Store for Flash and RAM (안미진)
  2. Contents
     1. RocksDB Introduction
     2. RocksDB Architecture
     3. LSM DB
     4. RocksDB Compaction Overview
  3. RocksDB Introduction: a persistent key-value store for fast storage environments
     • Open source, based on LevelDB 1.5, written in C++
     • Persistent key-value store
     • Embedded library
     • Pluggable database
     • Optimized for fast storage (flash or RAM)
     • Optimized for server workloads
     • Basic operations: Get(), Put(), Delete()
  4. Three Basic Constructs of RocksDB
     • Memtable
       – In-memory data structure
       – A buffer that temporarily holds incoming writes
     • Logfile
       – Sequentially written file
       – Kept on storage
  5. Three Basic Constructs of RocksDB
     • SSTable (= SSTfile)
       – Sorted Static Table on storage
       – A file containing a set of arbitrary key-value pairs, sorted by key
       – Organized in levels
       – Immutable for its lifetime
       – Sorted data makes key lookups easy
       – Holds the entire database
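To make the three constructs concrete, here is a minimal Python sketch of how they could fit together: every write is appended to a logfile, then applied to an in-memory memtable, and a flush freezes the memtable into a sorted, immutable table. The class name `TinyStore`, the JSON-lines log format, and the in-memory stand-in for SSTables are all assumptions made for illustration; none of this is RocksDB's actual file format or API.

```python
import json
import os


class TinyStore:
    """Toy model: logfile + memtable + sorted immutable tables (not RocksDB's API)."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.memtable = {}   # in-memory buffer for incoming writes
        self.sstables = []   # each entry: immutable tuple of sorted (key, value) pairs
        self._replay_log()

    def _replay_log(self):
        # After a restart, rebuild the memtable from the sequential log.
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, "r", encoding="utf-8") as f:
            for line in f:
                key, value = json.loads(line)
                self.memtable[key] = value

    def put(self, key, value):
        # Sequential append to the logfile first, then the in-memory update.
        with open(self.log_path, "a", encoding="utf-8") as f:
            f.write(json.dumps([key, value]) + "\n")
        self.memtable[key] = value

    def flush(self):
        # Freeze the memtable into a sorted, immutable table and truncate the log.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        open(self.log_path, "w").close()
```

Reopening a `TinyStore` on the same log path before a flush recovers the unflushed writes, which is the role the logfile plays next to the memtable.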
  6. SSTable: BlockBasedTable, the default SSTable format in RocksDB
     (file layout diagram, in order: Data, Data, Data, …, Meta (filter), Meta (stats), …, Meta index, Data index, Footer)
  7. SSTable & Memtable of RocksDB
     • On-disk SSTable indexes are always loaded into memory
     • All writes go directly to the Memtable index
     • Reads check the Memtable first, then the SSTable indexes
     • Periodically, the Memtable is flushed to disk as an SSTable
     • Periodically, on-disk SSTables are merged, so update and delete records overwrite or remove the older data
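The read and write rules on this slide can be sketched as a toy LSM store in Python. The names `ToyLSM`, `TOMBSTONE`, and `merge_all`, the use of plain dicts as stand-ins for on-disk SSTables, and the size-based flush trigger are illustrative assumptions, not RocksDB internals.

```python
TOMBSTONE = object()  # assumption: a sentinel value marks a deleted key


class ToyLSM:
    def __init__(self, memtable_limit=2):
        self.memtable_limit = memtable_limit
        self.memtable = {}
        self.sstables = []  # oldest first; each dict stands in for an on-disk SSTable

    def put(self, key, value):
        # All writes go to the memtable; a full memtable is flushed.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def delete(self, key):
        # A delete is just another write: it records a tombstone.
        self.put(key, TOMBSTONE)

    def flush(self):
        if self.memtable:
            self.sstables.append(dict(sorted(self.memtable.items())))
            self.memtable = {}

    def get(self, key):
        # Check the memtable first, then the SSTables from newest to oldest.
        for table in [self.memtable] + self.sstables[::-1]:
            if key in table:
                value = table[key]
                return None if value is TOMBSTONE else value
        return None

    def merge_all(self):
        # Merge every SSTable: newer records overwrite older ones,
        # and deleted keys are dropped entirely.
        merged = {}
        for table in self.sstables:
            merged.update(table)
        live = {k: v for k, v in sorted(merged.items()) if v is not TOMBSTONE}
        self.sstables = [live] if live else []
```

Because `get` walks from newest to oldest, an overwritten or deleted key is shadowed immediately, even before a merge physically removes the old record.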
  8. Simplified RocksDB (diagram)
     • Memory: the Memtable, plus one index per SSTable mapping Key → Offset
     • Storage: SSTable 1, SSTable 2
     • Simplified SSTable file: a sequence of Key / Value records
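The key-to-offset index in this simplified picture can be illustrated with a short Python sketch. The record encoding (two little-endian 32-bit lengths followed by the key and value bytes) and the function names are assumptions chosen for the example; the real BlockBasedTable format is block-oriented and considerably more involved.

```python
import struct


def build_sstable(pairs):
    """Serialize (key, value) string pairs in key order; return (data, index)."""
    data = bytearray()
    index = {}  # key -> byte offset of its record in `data`
    for key, value in sorted(pairs):
        k, v = key.encode(), value.encode()
        index[key] = len(data)
        data += struct.pack("<II", len(k), len(v)) + k + v
    return bytes(data), index


def read_at(data, offset):
    """Decode the single record that starts at `offset`."""
    klen, vlen = struct.unpack_from("<II", data, offset)
    start = offset + 8
    key = data[start:start + klen].decode()
    value = data[start + klen:start + klen + vlen].decode()
    return key, value


def lookup(data, index, key):
    """Use the in-memory index to jump straight to the record, if present."""
    offset = index.get(key)
    return None if offset is None else read_at(data, offset)[1]
```

With the index held in memory, a lookup touches the file data only once, at the recorded offset.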
  9-13. RocksDB Architecture (one diagram, repeated across five slides)
     • Memory: Active Memtable, Read-Only Memtable
     • Persistent Storage: Log files, LSM files (SST)
     • Arrows: Write Request, Read Request, Switch, Flush, Compaction
  14. RocksDB Architecture (diagram)
     • Memory: Active Memtable (4MB), Immutable Memtable
     • Disk: Log, Level 0 (4 SSTfiles), Level 1 (10MB), Level 2 (100MB), …; each SSTfile is 2MB
     • Other files on disk: Info Log, MANIFEST, CURRENT
     • Arrows: Write, Compaction
  15. Log-Structured Merge Tree
     • LSM-tree
       – N-level merge trees
       – Splits a logical tree into several physical pieces
       – The most recently updated portion of the data lives in an in-memory tree
       – Transforms random writes into sequential writes using the logfile and the in-memory store (Memtable)
  16. Log-Structured Merge DB to minimize “random writes” (diagram)
     • Labels: Write Request, Read Request, read-write data in RAM, read-only data in RAM and on disk, Periodic Compaction, Transaction Log
  17. Log-Structured Merge DB to minimize “random writes”
     ① Data Write (Insert, Update)
       • New puts are written sequentially to memory (Memtable) and to the logfile
       • When the Memtable fills up, it is flushed to an SSTable on disk
       • The update itself happens in memory and the only disk write is a sequential log append, with no random disk access → faster writes than a B+ tree
     ② Data Read
       • Lookup order: Memtable → SSTable
       • All SSTable indexes are kept in memory
  18. RocksDB Compaction: multi-threaded compactions
     • Background threads periodically run “compaction”; compactions on different parts of the database can run in parallel
     • Merges SSTfiles into a bigger SSTfile
     • Removes multiple copies of the same key
       – Duplicate or overwritten keys
     • Processes deletions of keys
     • Supports two different styles of compaction
       – Compaction is tunable to make trade-offs
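What a single compaction does to the data (merge sorted files, keep only the newest copy of each key, process deletions) can be sketched in Python with `heapq.merge`. Representing a deletion as a `None` value and the function name `compact` are assumptions for this example. Dropping deletion markers outright is also a simplification: it is only safe when the compaction covers every older file that might still hold the key.

```python
import heapq

DELETED = None  # assumption: a None value marks a deleted key


def compact(tables):
    """Merge sorted [(key, value)] tables, given oldest first, into one sorted table."""
    # Tag each entry with -age so that, for equal keys, the newest version sorts first.
    tagged = [
        [(key, -age, value) for key, value in table]
        for age, table in enumerate(tables)
    ]
    out = []
    last_key = object()  # sentinel that equals no real key
    for key, _, value in heapq.merge(*tagged):
        if key == last_key:
            continue  # older copy of a key already handled
        last_key = key
        if value is not DELETED:
            out.append((key, value))
    return out
```

For example, merging an older table `[("a", 1), ("b", 2), ("c", 3)]` with a newer table `[("b", 20), ("c", None), ("d", 4)]` keeps `a`, takes the newer `b`, drops `c`, and adds `d`.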
  19. RocksDB Compaction (diagram): Storage holding SSTable 1, SSTable 2, SSTable 3, SSTable 4, SSTable 5
  20. Compaction style 1: Level Style Compaction (inherited from LevelDB)
     • RocksDB's default compaction style
     • Stores data in multiple levels in the database
     • More recent data is in L0; the oldest data is in Lmax
     • Files in L0: overlapping keys, sorted by flush time
     • Files in L1 and higher: non-overlapping keys, sorted by key
     • Each level is 10 times larger than the previous one
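The "10 times larger" rule is simple arithmetic, sketched below in Python. The 10 MB base for Level 1 is taken from the architecture diagram on slide 14; the function names and the idea of flagging a level as a compaction candidate once it exceeds its target are illustrative assumptions.

```python
def level_targets(base_mb=10, multiplier=10, levels=4):
    """Target size in MB for L1..L<levels>, each `multiplier` times the previous one."""
    return {n: base_mb * multiplier ** (n - 1) for n in range(1, levels + 1)}


def over_target(actual_mb, targets):
    """Levels whose actual size exceeds their target, i.e. compaction candidates."""
    return [n for n, size in sorted(actual_mb.items()) if size > targets[n]]
```

With the defaults, the targets are 10 MB, 100 MB, 1,000 MB and 10,000 MB for L1 through L4.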
  21. Level Style Compaction: compaction process
     (diagram labels: cache, log, level0, level1, level2, level3)
     ① Pick one file from level N
     ② Compact it with all its overlapping files from level N+1
     ③ Replace them with new files in level N+1
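The three steps can be sketched in Python as follows. The data model (a level is a list of files, a file is a sorted list of `(key, value)` pairs), the choice of always picking the first file, and the `max_file_entries` split size are assumptions made for the example; RocksDB's real file-picking policy is more elaborate.

```python
def key_range(f):
    """Smallest and largest key of a sorted file."""
    return f[0][0], f[-1][0]


def overlaps(a, b):
    a_lo, a_hi = key_range(a)
    b_lo, b_hi = key_range(b)
    return a_lo <= b_hi and b_lo <= a_hi


def compact_one(levels, n, max_file_entries=2):
    """Run one compaction from level n into level n + 1, modifying `levels` in place."""
    picked = levels[n].pop(0)                                  # 1. pick one file from level N
    hit = [f for f in levels[n + 1] if overlaps(picked, f)]    # 2. its overlapping files in N+1
    merged = {}
    for f in hit:
        merged.update(f)       # older data first
    merged.update(picked)      # data from level N is newer and wins
    entries = sorted(merged.items())
    new_files = [entries[i:i + max_file_entries]
                 for i in range(0, len(entries), max_file_entries)]
    hit_ids = {id(f) for f in hit}
    keep = [f for f in levels[n + 1] if id(f) not in hit_ids]
    levels[n + 1] = sorted(keep + new_files, key=key_range)    # 3. replace with new files
```

Files in level N+1 that do not overlap the picked file are left untouched, which is what keeps each compaction's work bounded.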
  22. Level Style Compaction: compaction example (diagram)
     • File sizes of 5, 6, 10, 10, 11 and 10 bytes shown across Level-0, Level-1 and Level-2 in Stage 1, Stage 2 and Stage 3
     • Caption: two compactions by Level Style Compaction
  23. Level 0 → Level 1 Compaction: the tricky compaction
     • Level 0 files have overlapping keys
     • So the compaction includes all files from L1
     • All files from L1 are compacted with L0
     • L1 → L2 compaction starts once L0 → L1 compaction completes
     • Single-threaded compaction → poor throughput
     • Solution: make the size of L0 similar to the size of L1
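Why an L0 → L1 compaction tends to pull in every L1 file can be seen from key ranges alone. The Python sketch below is a simplification under stated assumptions: files are reduced to `(smallest_key, largest_key)` tuples, all L0 files are compacted together, and the function name `l1_inputs` is hypothetical.

```python
def l1_inputs(l0_ranges, l1_ranges):
    """Return the L1 key ranges that overlap the combined key span of the L0 files."""
    lo = min(r[0] for r in l0_ranges)
    hi = max(r[1] for r in l0_ranges)
    return [r for r in l1_ranges if r[0] <= hi and lo <= r[1]]
```

Because each L0 file is a flushed memtable, it usually spans much of the key space, for example `("a", "z")`, so the combined span touches every L1 file; a narrow L0 file such as `("a", "c")` would touch only the first L1 file.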
  24. Compaction style 2: Universal Style Compaction
     • For write-heavy workloads, Level Style Compaction may be bottlenecked on disk throughput
     • Stores all files in L0
     • All files are arranged in time order
     • Intended to decrease write amplification
     • But increases space amplification: size amplification can temporarily grow by a factor of two
  25. Universal Style Compaction: compaction process
     ① Pick a few files that are chronologically adjacent to one another
     ② Merge them
     ③ Replace them with a new file in level 0
  26. Universal Style Compaction: compaction options
     • size_ratio
       – Percentage flexibility when comparing file sizes
       – Default: 1
     • min_merge_width
       – The minimum number of files in a single compaction
       – Default: 2
     • max_merge_width
       – The maximum number of files in a single compaction
       – Default: UINT_MAX
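One plausible way these three options interact when picking chronologically adjacent files is sketched below in Python. This is a simplified reading, not RocksDB's actual picker: it assumes the scan always starts at the newest file, keeps adding the next older file while that file is no larger than the accumulated size plus `size_ratio` percent, and ignores every other compaction trigger. The function name `pick_files` is hypothetical, and `max_merge_width=None` stands in for UINT_MAX.

```python
def pick_files(sizes, size_ratio=1, min_merge_width=2, max_merge_width=None):
    """sizes: file sizes ordered newest first.

    Returns how many leading (newest) files to merge, or 0 for no compaction.
    """
    if not sizes:
        return 0
    limit = len(sizes) if max_merge_width is None else min(max_merge_width, len(sizes))
    picked = 1
    total = sizes[0]
    while picked < limit:
        nxt = sizes[picked]
        # Stop once the next (older) file is too large relative to what is picked so far.
        if total * (100 + size_ratio) / 100 < nxt:
            break
        total += nxt
        picked += 1
    # Too few files picked: not worth a compaction.
    return picked if picked >= min_merge_width else 0
```

With sizes `[1, 1, 2, 4, 100]` the first four files are picked and the 100-unit file is left alone; with `[1, 5, 10]` nothing qualifies, because the second file is already far larger than the first.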
  27. Universal Style Compaction: compaction example (diagram)
     • File sizes of 5, 6, 10 and 10 bytes shown in Level-0 across Stage 1 and Stage 2
     • Caption: single compaction by Universal Style Compaction
