Compaction. A necessary reality in databases with immutable table designs. To date, Scylla and Cassandra compaction strategies for SSTables have had tradeoffs. For example, the size-tiered compaction strategy requires leaving 50% of your total drive space unused in order to compact large tables.
What if there were a new, better, more efficient way to handle compactions in Scylla, one that lets you use your storage much more efficiently? Enter Scylla’s unique Incremental Compaction Strategy (ICS).
Join us for a comparison of common compaction strategies and a technical deep dive into ICS. You’ll learn why ICS will become the new standard for compaction, including an overview of how much disk space you can save with ICS.
TechTalk: Reduce Your Storage Footprint with a Revolutionary New Compaction Strategy
1. Reduce Your Storage Footprint with a Revolutionary New Compaction Strategy
Benny Halevy, Software Development Team Lead
Raphael S. Carvalho, Software Developer
2. Speakers
Benny Halevy leads the storage software development team at ScyllaDB. Benny has been working on operating systems and distributed file systems for over 20 years. Most recently, Benny led software development for GSI Technology, providing a hardware/software solution for deep learning and similarity search using in-memory computing technology.
Raphael S. Carvalho is a computer programmer who loves open source, and especially kernel programming. For the open source Syslinux project, he worked on support for new file systems and on allowing multiple file systems to co-exist (so-called MultiFS support). At ScyllaDB, he has mostly worked on SSTable handling and compaction.
3. About ScyllaDB
+ The Real-Time Big Data Database
+ Drop-in replacement for Apache Cassandra
+ 10X the performance & low tail latency
+ Open source and Enterprise editions
+ New: Scylla Cloud, DBaaS
+ Founded by the creators of the KVM hypervisor
+ HQs: Palo Alto, CA; Herzliya, Israel
4. Agenda
+ Log-structured Storage overview
+ Compaction overview
+ Existing Compaction Strategies
+ Incremental Compaction Strategy (ICS): Problem Definition
+ ICS in a Nutshell
+ Case Study: ICS vs. STCS
+ Future Directions
+ Q&A
6. Introduction: Log Structured Storage
+ Scylla, like Cassandra, stores data using a log-structured method.
+ Changes to the data (a.k.a. mutation fragments) are first recorded in memory, in memtables, and are also stored on disk in the commit log.
+ Then, memtables are flushed into SSTables.
+ The name SSTable stands for “Sorted Strings Table”: it stores a set of mutation fragments sorted by the partition key and clustering keys.
+ There is no table storing a static view of the database.
+ Reading a particular data item requires reading all SSTables that may store parts of it,
+ and merging all live mutations from the read results.
+ SSTables are immutable, i.e., they are never updated in place.
+ Therefore, updates accumulate over time,
+ and we have to merge them together to avoid running out of space.
+ This is where compaction comes to the rescue.
[Figure: Updates are written to the MemTable, which is flushed into a series of SSTables.]
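The write and read paths above can be sketched with a toy, in-memory model. This is a minimal Python sketch, not Scylla's implementation: the `LogStructuredStore` class, its method names, and the use of `None` as a tombstone are hypothetical, and "disk" is simulated with Python lists.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical model: a deletion is recorded as a tombstone, represented here by None.
TOMBSTONE: Optional[str] = None
Entry = Tuple[str, int, Optional[str]]  # (key, write timestamp, value or tombstone)


@dataclass
class LogStructuredStore:
    commit_log: List[Entry] = field(default_factory=list)
    memtable: Dict[str, Tuple[int, Optional[str]]] = field(default_factory=dict)
    sstables: List[List[Entry]] = field(default_factory=list)
    clock: int = 0

    def write(self, key: str, value: Optional[str]) -> None:
        self.clock += 1
        self.commit_log.append((key, self.clock, value))  # durable record on "disk"
        self.memtable[key] = (self.clock, value)          # in-memory record

    def delete(self, key: str) -> None:
        self.write(key, TOMBSTONE)

    def flush(self) -> None:
        # The memtable becomes a new immutable SSTable, sorted by key.
        self.sstables.append(sorted((k, ts, v) for k, (ts, v) in self.memtable.items()))
        self.memtable.clear()
        self.commit_log.clear()  # simplification: flushed writes need no replay

    def read(self, key: str) -> Optional[str]:
        # Collect every version of the key from all SSTables and the memtable...
        versions = [(ts, v) for sst in self.sstables for k, ts, v in sst if k == key]
        if key in self.memtable:
            versions.append(self.memtable[key])
        if not versions:
            return None
        # ...and merge them: the newest version wins (a tombstone reads as None).
        return max(versions, key=lambda tv: tv[0])[1]
```

Note how, after an update, the stale version of a key remains in the older SSTable: nothing is rewritten in place, and that leftover data is what compaction later removes.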
7. Introduction: Compaction Fundamentals
1. Compaction first selects a set of SSTables to process,
+ based on the compaction strategy.
2. It then reads the SSTables, and
+ writes the compacted output,
+ while eliminating overwritten, deleted, and expired data.
3. Eventually, when the output SSTables are sealed and safely stored on stable storage, the input SSTables can finally be deleted.
Note that compaction requires temporary space,
+ since the input SSTables must not be deleted until their compaction completes.
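The three steps above, and the temporary space they imply, can be illustrated with a simplified sketch. It assumes a hypothetical model in which an SSTable is a dictionary from key to `(timestamp, value)` and space is counted in entries; the selection step is passed in by the caller rather than chosen by a real compaction strategy, and `compact` is an illustrative name.

```python
from typing import Dict, List, Tuple

# Hypothetical model: an SSTable maps key -> (write timestamp, value).
SSTable = Dict[str, Tuple[int, str]]


def compact(disk: List[SSTable], selected: List[int]) -> Tuple[List[SSTable], int]:
    """Compact disk[i] for i in selected; return (disk after compaction, peak entries on disk)."""
    # Step 1 (selection) is done by the caller, standing in for the compaction strategy.
    # Step 2: read the inputs and keep only the newest version of each key.
    merged: SSTable = {}
    for i in selected:
        for key, (ts, value) in disk[i].items():
            if key not in merged or ts > merged[key][0]:
                merged[key] = (ts, value)
    output = dict(sorted(merged.items()))
    # The output is written while all inputs are still on disk: that is the temporary space.
    peak = sum(len(sst) for sst in disk) + len(output)
    # Step 3: only after the output is sealed are the inputs deleted.
    remaining = [sst for i, sst in enumerate(disk) if i not in selected]
    return remaining + [output], peak
```

For example, compacting two SSTables holding 3 entries in total into a 2-entry output briefly needs room for 5 entries, even though only 2 survive.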
8. Introduction: Compaction Fundamentals
+ What kind of mutation fragments can be eliminated?
+ Overwritten.
+ TTL-expired.
+ Explicitly deleted (via a tombstone, or column deletion).
+ gc-expired tombstones.
[Figure: compacting input fragments a, a’, b, c, !c, !d, and !z yields a’ b !c !d. [a] is overwritten by [a’]; [b] is newly written; [c] is deleted by [!c]; [!d] is a live tombstone; [!z] is a gc-expired tombstone and disappears (“poof!”).]
Note that tombstones are kept for gc_grace_seconds before they can be garbage-collected, to prevent data resurrection.
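The elimination rules above can be expressed as a small merge function. This is a sketch under simplifying assumptions: `Cell`, `compact_cells`, and the convention that a tombstone's write timestamp is also its deletion time are hypothetical, and TTL-expired cells are dropped outright here, whereas a real database may need to handle them more conservatively. The test data mirrors the figure's example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Cell:
    key: str
    timestamp: int                    # write time in seconds (simplified)
    value: Optional[str]              # None marks a tombstone
    expires_at: Optional[int] = None  # TTL expiry time in seconds; None means no TTL


def compact_cells(cells: List[Cell], now: int, gc_grace_seconds: int) -> List[Cell]:
    # Overwritten and explicitly deleted data: only the newest version of a key survives.
    newest: Dict[str, Cell] = {}
    for cell in cells:
        if cell.key not in newest or cell.timestamp > newest[cell.key].timestamp:
            newest[cell.key] = cell
    output = []
    for key in sorted(newest):
        cell = newest[key]
        if cell.value is None:
            if now - cell.timestamp >= gc_grace_seconds:
                continue  # gc-expired tombstone: purged ("poof!")
            # Live tombstone: kept so it keeps shadowing older copies elsewhere.
        elif cell.expires_at is not None and cell.expires_at <= now:
            continue  # TTL-expired data
        output.append(cell)
    return output
```

With `now=10` and `gc_grace_seconds=5`, the tombstone for `z` written at time 0 is purged, while the tombstones for `c` and `d` are still within their grace period and are kept.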
10. Introduction - Size Tiered Compaction Strategy
1. STCS, for short, is Scylla’s default compaction strategy.
2. STCS organizes SSTables into tiers, where the SSTables in tier [n] are:
+ roughly the same size, and
+ [k] times bigger than the SSTables in tier [n-1],
+ where [k] corresponds to the “min_threshold” config option.
3. When compacting [k] SSTables in tier [n]:
+ A single SSTable is created.
+ It may be as large as all [k] inputs combined, in which case it moves to the next tier.
+ Or it may become much smaller (due to deletes/expiration) and move to a lower tier.
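Grouping SSTables into tiers can be sketched as follows. This is a simplified illustration, not Scylla's code: an SSTable joins a bucket when its size falls within `bucket_low`..`bucket_high` times that bucket's average size (the 0.5 and 1.5 values used here are assumptions), and a tier is compacted once it holds `min_threshold` SSTables. `stcs_buckets` and `pick_compaction` are hypothetical names.

```python
from typing import List, Optional


def stcs_buckets(sizes: List[float], bucket_low: float = 0.5,
                 bucket_high: float = 1.5) -> List[List[float]]:
    """Group SSTable sizes into tiers of roughly equal size."""
    buckets: List[List[float]] = []
    for size in sorted(sizes):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            if avg * bucket_low <= size <= avg * bucket_high:
                bucket.append(size)
                break
        else:  # no tier of similar size: start a new one
            buckets.append([size])
    return buckets


def pick_compaction(sizes: List[float], min_threshold: int = 4) -> Optional[List[float]]:
    """Return min_threshold SSTables from the smallest tier that has at least that many."""
    for bucket in stcs_buckets(sizes):  # sizes are sorted, so tiers come smallest first
        if len(bucket) >= min_threshold:
            return bucket[:min_threshold]
    return None
```

Compacting four size-1 SSTables yields a size-4 SSTable (assuming no overwrites), which joins the next tier; once that tier also reaches `min_threshold` members, it becomes eligible for compaction in turn.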
11. Introduction - STCS Space Amplification
+ STCS results in a low number of SSTables (logarithmic in the size of the data),
+ and data is copied during compaction a fairly low number of times.
+ However, STCS requires that the disk be substantially larger than a perfectly compacted representation of the data (i.e., all the data in a single SSTable).
+ This is called space amplification.
+ The main factors are:
a. Accumulation of updates and deletes across different tiers.
b. Temporary space, in particular for compacting the highest tier.
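The logarithmic behavior and the 2X temporary-space peak can be checked with an idealized simulation. The sketch assumes that every flush writes new, non-overlapping data of equal size and that each compaction merges exactly `k` SSTables of one tier; `simulate_stcs` is a hypothetical helper, and real workloads with overwrites and uneven sizes behave differently.

```python
from typing import Dict, List


def simulate_stcs(num_flushes: int, k: int = 4, flush_size: int = 1) -> Dict[str, float]:
    """Idealized STCS: every flush writes new, non-overlapping data of flush_size."""
    tiers: List[List[int]] = [[]]
    on_disk = written = peak = 0
    for _ in range(num_flushes):
        tiers[0].append(flush_size)
        on_disk += flush_size
        written += flush_size
        peak = max(peak, on_disk)
        n = 0
        while n < len(tiers) and len(tiers[n]) >= k:
            out = sum(tiers[n][:k])  # no overwrites: output is as large as its inputs
            del tiers[n][:k]
            # Inputs stay on disk until the output is sealed: temporary space.
            peak = max(peak, on_disk + out)
            written += out
            if n + 1 == len(tiers):
                tiers.append([])
            # Inputs deleted, output moves up a tier; on_disk is unchanged overall.
            tiers[n + 1].append(out)
            n += 1
    data = num_flushes * flush_size
    return {
        "sstables": sum(len(t) for t in tiers),
        "write_amplification": written / data,
        "space_amplification_peak": peak / data,
    }
```

With `k = 4` and 64 flushes, all data ends up in a single SSTable, each byte is written 4 times (the flush plus one compaction per tier), and the final compaction of the highest tier briefly needs twice the data size on disk.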
12. Introduction - Leveled Compaction Strategy
+ The first thing Leveled Compaction does is replace large SSTables, the staple of STCS, with “runs” of small, fixed-size SSTables.
+ Level 0 (L0) stores new SSTables, recently flushed from memtables.
+ As their number grows (and reads slow down), our goal is to move SSTables out of this level to the next levels.
+ Each of the other levels (L1, L2, and so on) is a single run of exponentially increasing size.
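Exponentially growing levels mean that even large data sets need only a few levels. The sketch below assumes a fanout of 10 (level i holds up to 10^i SSTables), ignores L0, and uses 160 MiB SSTables purely as an example size; `lcs_levels_needed` is a hypothetical helper.

```python
def lcs_levels_needed(data_bytes: int, sstable_bytes: int, fanout: int = 10) -> int:
    """Number of levels (L1, L2, ...) needed to hold data_bytes in fixed-size SSTables."""
    sstables = -(-data_bytes // sstable_bytes)  # ceiling division
    levels = capacity = 0
    while capacity < sstables:
        levels += 1
        capacity += fanout ** levels  # level i holds up to fanout**i SSTables
    return levels


print(lcs_levels_needed(500 * 2**30, 160 * 2**20))
```

For 500 GiB split into 160 MiB SSTables (3,200 of them), levels L1 to L3 hold at most 1,110 SSTables, so a fourth level is needed.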
13. Introduction - LCS Write Amplification
+ Compaction is triggered whenever some level [i] contains more than 10^i SSTables.
+ LCS picks one SSTable from level [i], with size X, to compact.
+ It then finds the roughly 10 SSTables in the next level, [i + 1], that overlap with this SSTable and compacts them against the one input SSTable.
+ It writes the resulting run, whose size is bounded by (1+10)*X, to the next level.
While LCS limits space amplification, it results in higher write amplification, since data updates are frequently compacted along with unchanged data, which is merely copied over and over again.
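One LCS compaction step, and the write amplification it causes, can be sketched by modeling SSTables as token ranges. `SSTable`, `overlaps`, and `lcs_compaction_bytes_written` are hypothetical names, and the sketch assumes no overwrites, so the output is as large as its inputs.

```python
from typing import List, NamedTuple


class SSTable(NamedTuple):
    first_token: int
    last_token: int
    size: int


def overlaps(a: SSTable, b: SSTable) -> bool:
    return a.first_token <= b.last_token and b.first_token <= a.last_token


def lcs_compaction_bytes_written(picked: SSTable, next_level: List[SSTable]) -> int:
    """Bytes written when one SSTable from level i is merged into level i+1."""
    overlapping = [s for s in next_level if overlaps(picked, s)]
    # Without overwrites, the output run is as large as all inputs: about (1 + 10) * X.
    return picked.size + sum(s.size for s in overlapping)
```

Promoting one SSTable of size X into a level where it overlaps 10 SSTables rewrites about 11X bytes, and the same data is rewritten again at each further level. That repeated copying of unchanged data is the source of LCS's higher write amplification.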
15. ICS: Problem Definition
+ We observed problems with legacy compaction strategies:
+ STCS has high space amplification, and relatively low write amplification.
+ LCS has high write amplification, and relatively low space amplification.
+ We wanted a compaction strategy that combines the benefits of both approaches.
16. ICS - In a Nutshell
+ ICS (originally called “Hybrid Compaction Strategy”)
+ is based on the Size-Tiered Compaction Strategy,
+ sharing its low write amplification while reducing its space amplification.
+ By borrowing SSTable Runs from LCS,
+ ICS has reduced temporary space requirements,
+ since small individual fragments are compacted and freed as soon as they are exhausted.
+ ICS still avoids LCS’s worse write amplification,
since it follows STCS’s size-tiered flow,
+ merely replacing the increasingly larger SSTables
with increasingly longer SSTable Runs.
17. SSTable Runs
+ An SSTable Run is a sorted set of SSTables with non-overlapping token ranges.
+ The SSTables in a run are called “Fragments”.
+ A run is essentially equivalent to a large SSTable split into several smaller SSTables.
+ Since the fragments are disjoint and sorted with respect to each other:
+ We can scan the runs and compact them incrementally, fragment by fragment,
+ while deleting exhausted fragments as we go.
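[Figure: input SSTable runs made of fragments a, b, ..., z and A, B, ..., Z are compacted fragment by fragment; for example, fragment A merged with fragment a yields A+a.]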
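Incremental compaction of SSTable runs can be sketched in a toy model where each key occupies one unit of disk space. The assumptions are hypothetical and this is not ICS's actual code: runs are lists of sorted, disjoint fragments; duplicate keys are overwrites that collapse to one copy; and an exhausted input fragment is deleted only once the output fragment being written is sealed, so that every key read so far is already safely stored in the output. `compact_runs_incrementally` is an illustrative name.

```python
import heapq
from typing import Iterator, List, Tuple

# Hypothetical model: a run is a list of fragments; each fragment is a sorted list of keys
# whose range is disjoint from the other fragments in the run. Each key occupies 1 unit.
Run = List[List[int]]


def _scan(run_id: int, run: Run) -> Iterator[Tuple[int, int, int]]:
    # Fragments are sorted and disjoint, so reading them in order yields one sorted stream.
    for frag_id, fragment in enumerate(run):
        for key in fragment:
            yield key, run_id, frag_id


def compact_runs_incrementally(runs: List[Run], fragment_size: int) -> Tuple[Run, int]:
    """Merge runs fragment by fragment; return (output run, peak units on disk)."""
    on_disk = peak = sum(len(f) for run in runs for f in run)
    left = {(r, f): len(frag) for r, run in enumerate(runs) for f, frag in enumerate(run)}
    exhausted: List[Tuple[int, int]] = []  # fully read inputs, not yet deleted
    output: Run = []
    current: List[int] = []
    last_key = None

    def release_exhausted() -> None:
        nonlocal on_disk
        for r, f in exhausted:
            on_disk -= len(runs[r][f])
        exhausted.clear()

    for key, r, f in heapq.merge(*(_scan(i, run) for i, run in enumerate(runs))):
        left[(r, f)] -= 1
        if left[(r, f)] == 0:
            exhausted.append((r, f))
        if key == last_key:
            continue  # overwritten copy of the same key: dropped
        last_key = key
        current.append(key)
        on_disk += 1
        peak = max(peak, on_disk)
        if len(current) == fragment_size:
            output.append(current)  # seal the output fragment...
            current = []
            release_exhausted()  # ...so input fragments already read can be deleted
    if current:
        output.append(current)
    release_exhausted()
    return output, peak
```

Merging two fully overlapping runs of 16 units each (four 4-key fragments per run) peaks at 36 units: the 32 input units plus one 4-unit output fragment. In the same model, compacting the two runs as monolithic SSTables would keep all 32 input units until the 16-unit output was complete, peaking at 48.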
18. Summary: Comparing ICS with STCS/LCS
                      STCS   ICS     LCS
Space amplification   High   Low     Lowest
Write amplification   Low    Low     High
Number of SSTables    Low    Medium  High
20. Case Study
1) Write 500GB
2) Overwrite repeatedly
3) Compact
+ Clearly shows ICS’s improved space amplification.
+ Most notably, STCS major compaction shows a 2X peak in disk usage, vs. no noticeable peak for ICS.
22. Future Directions
1. Adaptive ICS
+ Improve space amplification
+ by adapting to the overwrite workload,
+ and tune the algorithm between Size-Tiered ⇔ Leveled.
2. Disk-space controller
+ Prioritize compaction over writes when reaching capacity.
+ Compaction strategy-based high-water mark (e.g., 50% for STCS, ~15% for ICS)
3. Enhanced metadata tracking
+ Improve overlap heuristics to optimize compaction of non-overlapping SSTables.
+ Both spatially, in token-range terms
+ And temporally, in (expiration) time terms
23. Available on-demand:
+ How to Shrink Your Datacenter Footprint by 50%
+ Data Modeling Best Practices
+ How to Size Your Scylla Cluster
+ How to Bullet-Proof Your Scylla Deployment