Benny Halevy, Software Development Team Lead
Raphael S. Carvalho, Software Developer
Reduce Your Storage Footprint
with a Revolutionary New
Compaction Strategy
Speakers
Benny Halevy leads the storage software development team at ScyllaDB.
Benny has been working on operating systems and distributed file
systems for over 20 years. Most recently, Benny led software development
for GSI Technology, providing a hardware/software solution for deep
learning and similarity search using in-memory computing technology.
Raphael S. Carvalho is a computer programmer who loves open source,
and especially kernel programming. He has worked on bringing new file
system support as well as allowing multiple file systems to co-exist,
so-called MultiFS support, for the open source project Syslinux. He has
mostly been working on SSTable handling and compaction for ScyllaDB.
About ScyllaDB
+ The Real-Time Big Data Database
+ Drop-in replacement for Apache Cassandra
+ 10X the performance & low tail latency
+ Open source and Enterprise editions
+ New: Scylla Cloud, DBaaS
+ Founded by the creators of the KVM hypervisor
+ HQs: Palo Alto, CA; Herzelia, Israel
Agenda
+ Log-structured Storage overview
+ Compaction overview
+ Existing Compaction Strategies
+ Incremental Compaction Strategy (ICS)
  + Problem Definition
  + ICS in a Nutshell
+ Case Study: ICS vs. STCS
+ Future Directions
+ Q&A
Introduction:
Log-structured Storage and
Compaction Fundamentals
Introduction: Log-structured Storage
+ Scylla, like Cassandra, stores data using a log-structured method.
+ Changes to the data (a.k.a. mutation fragments) are first recorded in memory, in a memtable, and are
also stored on disk in the commit log.
+ Then, memtables are flushed into SSTables.
+ The name, SSTable, stands for “Sorted Strings Table”.
+ It stores a set of mutation fragments
+ that are sorted by the partition key and clustering keys.
+ There are no tables storing a static view of the database.
+ Reading a particular data item requires reading all SSTables that may store parts of it
+ and merging all live mutations from the read results.
+ SSTables are immutable, i.e. they are never updated in place.
+ Therefore, updates accumulate over time,
+ and we have to merge them together to avoid running out of space.
+ This is where compaction comes to the rescue.
[Diagram: updates are written to the MemTable, which is flushed to SSTables]
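The write and read path described above can be sketched as a toy model. This is only an illustration under simplifying assumptions: `ToyStore`, `memtable_limit` and the `(timestamp, value)` mutation shape are hypothetical names, the commit log is left out, and keys are plain strings rather than partition and clustering keys.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical mutation shape: (timestamp, value); a value of None marks a deletion.
Mutation = Tuple[int, Optional[str]]


class ToyStore:
    """Toy log-structured store: writes go to a memtable, flushes create immutable SSTables."""

    def __init__(self, memtable_limit: int = 2) -> None:
        self.memtable: Dict[str, Mutation] = {}
        self.sstables: List[Tuple[Tuple[str, Mutation], ...]] = []
        self.memtable_limit = memtable_limit

    def write(self, key: str, value: Optional[str], ts: int) -> None:
        current = self.memtable.get(key)
        if current is None or ts >= current[0]:
            self.memtable[key] = (ts, value)
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self) -> None:
        # An SSTable is written once, sorted by key, and never updated in place.
        if self.memtable:
            self.sstables.append(tuple(sorted(self.memtable.items())))
            self.memtable = {}

    def read(self, key: str) -> Optional[str]:
        # A read consults the memtable and every SSTable that may hold the key,
        # then keeps the mutation with the newest timestamp.
        candidates: List[Mutation] = []
        if key in self.memtable:
            candidates.append(self.memtable[key])
        for sstable in self.sstables:
            for k, mutation in sstable:
                if k == key:
                    candidates.append(mutation)
        if not candidates:
            return None
        return max(candidates, key=lambda m: m[0])[1]
```

Because old versions stay in earlier SSTables, the number of files a read must consult keeps growing until something merges them.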
Introduction: Compaction Fundamentals
1. Compaction first selects a set of SSTables to process,
+ based on the Compaction Strategy.
2. It then reads the SSTables, and
+ writes the compacted output
+ while eliminating overwrites, deleted data, and expired data.
3. Eventually, when the output SSTables are sealed and safely stored
on stable storage, the input SSTables can finally be deleted.
Note that compaction requires temporary space,
+ since the input SSTables must not be deleted until their compaction completes.
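The three steps above, and the reason temporary space is needed, can be sketched as follows. This is a minimal model, not Scylla's implementation: the "disk" is a dictionary, sizes are counted in entries, and `merge_newest` and `run_compaction` are hypothetical helper names.

```python
from typing import Dict, List, Tuple

# Hypothetical SSTable shape: key -> (timestamp, value).
SSTable = Dict[str, Tuple[int, str]]


def merge_newest(inputs: List[SSTable]) -> SSTable:
    """Merge SSTables, keeping only the newest version of each key."""
    output: SSTable = {}
    for sstable in inputs:
        for key, (ts, value) in sstable.items():
            if key not in output or ts > output[key][0]:
                output[key] = (ts, value)
    return dict(sorted(output.items()))


def run_compaction(disk: Dict[str, SSTable], selected: List[str], output_name: str) -> int:
    """Compact the selected SSTables on a toy disk; return the peak number of entries stored."""
    inputs = [disk[name] for name in selected]      # step 1: the chosen inputs
    disk[output_name] = merge_newest(inputs)        # step 2: write and seal the output
    peak = sum(len(s) for s in disk.values())       # inputs and output coexist here
    for name in selected:                           # step 3: only now delete the inputs
        del disk[name]
    return peak
```

In this model the peak is always the size of the inputs plus the size of the output, which is the temporary space the slide refers to.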
Introduction: Compaction Fundamentals
+ What kind of mutation fragments can be eliminated?
+ Overwritten.
+ TTL-expired.
+ Explicitly deleted (via a tombstone, or column deletion).
+ gc-expired tombstones.
[Diagram: compacting SSTables that contain a, a’, b, c, !c, !d and !z yields a’, b, !c and !d]
+ [a] is overwritten by [a’]
+ [b] is newly written
+ [c] is deleted by [!c]
+ [!d] is a live tombstone
+ [!z] is a gc-expired tombstone, so it is dropped
Note that tombstones are kept around for gc_grace_seconds until they are
garbage-collected, to prevent data resurrection.
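The elimination rules above can be expressed as a small filter. This is a hedged sketch: the `Fragment` type and `compact_fragments` function are hypothetical, time is a plain integer, and the model assumes every relevant SSTable takes part in the compaction (a real implementation must also check that a purged tombstone does not shadow data in SSTables outside the compaction).

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Fragment:
    key: str
    write_time: int
    value: Optional[str] = None   # None marks a tombstone
    ttl: Optional[int] = None     # seconds to live, if any

    @property
    def is_tombstone(self) -> bool:
        return self.value is None


def compact_fragments(fragments: Iterable[Fragment], now: int,
                      gc_grace_seconds: int) -> List[Fragment]:
    # Overwritten and explicitly deleted data: keep only the newest fragment per key.
    newest: Dict[str, Fragment] = {}
    for fragment in fragments:
        current = newest.get(fragment.key)
        if current is None or fragment.write_time > current.write_time:
            newest[fragment.key] = fragment

    survivors: List[Fragment] = []
    for key in sorted(newest):
        fragment = newest[key]
        if fragment.is_tombstone:
            # A tombstone is kept until gc_grace_seconds have passed.
            if now - fragment.write_time >= gc_grace_seconds:
                continue
        elif fragment.ttl is not None and now >= fragment.write_time + fragment.ttl:
            continue  # TTL-expired data
        survivors.append(fragment)
    return survivors
```

Feeding it the fragments from the diagram (a, a’, b, c, !c, !d, !z) returns a’, b, !c and !d, provided only !z is older than the grace period.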
Introduction:
Legacy Compaction Strategies
Introduction - Size Tiered Compaction Strategy
1. STCS, for short, is Scylla’s default compaction strategy.
2. STCS organizes SSTables into tiers, where in tier [n], sstables are:
+ roughly the same size
+ [k] times bigger than sstables in tier [n-1]
+ [k] corresponds to the “min_threshold” config option
3. When compacting [k] SSTables in tier [n]:
+ A single SSTable is created.
+ It may be as large as the union of all [k] inputs, in which case
it moves to the next tier.
+ Or it may become much smaller (due to deletes or expiration)
and move to a lower tier.
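The tiering rule can be sketched as follows. This is a simplification for illustration: `tier_of` and `pick_compaction` are hypothetical names, tiers are assigned by powers of [k] relative to an assumed `base_size`, and real implementations group SSTables by how close their sizes are rather than by exact powers.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def tier_of(size: int, base_size: int, k: int) -> int:
    """Tier n holds SSTables of size in [base_size * k**n, base_size * k**(n + 1))."""
    tier = 0
    threshold = base_size * k
    while size >= threshold:
        tier += 1
        threshold *= k
    return tier


def pick_compaction(sizes: List[int], base_size: int = 1,
                    k: int = 4) -> Optional[Tuple[int, List[int]]]:
    """Return (tier, sizes to compact) for the lowest tier holding at least k SSTables."""
    tiers: Dict[int, List[int]] = defaultdict(list)
    for size in sizes:
        tiers[tier_of(size, base_size, k)].append(size)
    for tier in sorted(tiers):
        if len(tiers[tier]) >= k:
            return tier, tiers[tier][:k]
    return None
```

With k = 4, four SSTables of size 1 are compacted into one of size up to 4, which then waits in tier 1 until three more of similar size accumulate.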
Introduction - STCS Space Amplification
+ STCS results in a low number of sstables, logarithmic in the size of the data, and
data is copied during compaction a fairly low number of times.
+ However, STCS requires that the disk be substantially larger than a perfectly-compacted
representation of the data (i.e., all the data in one single sstable).
+ This is called Space amplification.
+ The main factors are:
a. Accumulation of updates and deletes
across different tiers.
b. Temporary space, in particular for
compacting the highest tier.
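Both factors can be put into numbers with a small calculation. The function name and the sizes below are hypothetical; the sketch only assumes that space amplification is disk usage divided by the size of the perfectly-compacted data.

```python
from typing import List, Tuple


def amplification_during_compaction(input_sizes: List[int], output_size: int,
                                    compacted_size: int) -> Tuple[float, float, float]:
    """Return space amplification (before, at peak, after) for one compaction.

    compacted_size is the size of the perfectly-compacted data set.
    """
    before = sum(input_sizes) / compacted_size
    peak = (sum(input_sizes) + output_size) / compacted_size  # inputs and output coexist
    after = output_size / compacted_size
    return before, peak, after


# Factor (b): four 16 GB SSTables with no overlap; the output is as large as the inputs.
print(amplification_during_compaction([16] * 4, output_size=64, compacted_size=64))

# Factor (a) plus (b): the same 16 GB of data overwritten four times.
print(amplification_during_compaction([16] * 4, output_size=16, compacted_size=16))
```

In the first case the disk briefly holds twice the data; in the second, accumulated overwrites already cost 4x before compaction even starts.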
Introduction - Leveled Compaction Strategy
+ The first thing that Leveled Compaction does is to replace large sstables, the staple of
STCS, with “runs” of small, fixed-size sstables.
+ Level 0 (L0) stores the new sstables, recently flushed from memtables.
+ As their number grows (and reads slow down), the goal is to move sstables out of this
level to the next levels.
+ Each of the other levels, L1, L2 and so on,
is a single run of exponentially
increasing size.
Introduction - LCS Write Amplification
+ Compaction is triggered whenever some level [i] contains more than 10^i
SSTables.
+ LCS picks one sstable from level [i], with size X, to compact.
+ It then finds the roughly 10 sstables in the next higher level [i + 1] which overlap with this sstable
and compacts them against the one input sstable.
+ It writes the resulting run, of size bounded by (1+10)*X, to the next level.
While LCS limits space amplification, it
results in higher write amplification,
as data updates need to be frequently
compacted along with unchanged data
that is merely copied over and over again.
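The overlap selection and the (1+10)*X bound can be sketched as follows. The token ranges, the helper names and the fan-out parameter are illustrative assumptions, not Scylla's code.

```python
from typing import List, Tuple

# Hypothetical inclusive token range covered by one sstable: (first_token, last_token).
Range = Tuple[int, int]


def overlaps(a: Range, b: Range) -> bool:
    return a[0] <= b[1] and b[0] <= a[1]


def pick_overlapping(candidate: Range, next_level: List[Range]) -> List[Range]:
    """Sstables of level i+1 that must be rewritten together with the candidate."""
    return [r for r in next_level if overlaps(candidate, r)]


def bytes_written_bound(x: int, fan_out: int = 10) -> int:
    """Upper bound on output size when one sstable of size x meets fan_out neighbours."""
    return (1 + fan_out) * x


# Level i+1 is a run of 100 sstables covering 10 tokens each;
# one sstable from level i covers tokens 200..299.
next_level = [(i * 10, i * 10 + 9) for i in range(100)]
hit = pick_overlapping((200, 299), next_level)
print(len(hit), bytes_written_bound(160))
```

Promoting X bytes of new data can therefore rewrite up to 11X bytes, most of which did not change.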
ICS:
Incremental Compaction Strategy
ICS: Problem Definition
+ We observed problems with legacy compaction strategies:
+ STCS has high space amplification, and relatively low write amplification.
+ LCS has high write amplification, and relatively low space amplification.
+ We wanted a compaction strategy that could benefit from both approaches.
ICS - In a Nutshell
+ ICS (originally called “Hybrid Compaction Strategy”)
+ is based on the Size-Tiered Compaction Strategy,
+ sharing its low write amplification while reducing its space amplification.
+ By borrowing SSTable Runs from LCS,
+ ICS has reduced temporary space requirements,
+ since individual, small fragments are compacted and freed as soon as they are
exhausted.
+ ICS still avoids LCS’s higher write amplification,
since it follows STCS’s size-tiered flow,
+ merely replacing the increasingly larger SSTables
with increasingly longer SSTable Runs.
SSTable Runs
+ An SSTable Run is a sorted set of SSTables with non-overlapping token ranges.
+ Those SSTables are called “Fragments”.
+ It is essentially equivalent to a large SSTable that is split into several smaller SSTables.
+ Since the fragments are disjoint and sorted with respect to each other:
+ We can scan the runs and compact them incrementally, fragment-by-fragment,
+ While deleting exhausted SSTables as we go.
[Diagram: SSTable runs with fragments a..z and A..Z being compacted fragment by fragment, starting with output fragment A+a]
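Incremental, fragment-by-fragment compaction of runs can be sketched as follows. This is a toy model under stated assumptions: entries are hypothetical `(key, timestamp, value)` tuples, sizes are counted in entries, and an input fragment is released as soon as its last entry has been consumed.

```python
import heapq
from typing import Iterator, List, Tuple

Entry = Tuple[str, int, str]   # (key, timestamp, value)
Fragment = List[Entry]         # sorted by key
Run = List[Fragment]           # fragments with disjoint, ascending key ranges


def _stream(run: Run) -> Iterator[Tuple[Entry, int, bool]]:
    """Yield (entry, size of its fragment, whether it is the fragment's last entry)."""
    for fragment in run:
        for pos, entry in enumerate(fragment):
            yield entry, len(fragment), pos == len(fragment) - 1


def incremental_compact(runs: List[Run], fragment_size: int) -> Tuple[Run, int]:
    """Merge runs into one output run; return it with the peak number of entries on disk."""
    on_disk = sum(len(f) for run in runs for f in run)
    peak = on_disk
    output: Run = []
    last_key = None
    # Merge by key; for equal keys the newest timestamp comes first.
    merged = heapq.merge(*(_stream(r) for r in runs),
                         key=lambda item: (item[0][0], -item[0][1]))
    for (key, ts, value), frag_len, is_last in merged:
        if key != last_key:                     # older versions of the key are dropped
            if not output or len(output[-1]) == fragment_size:
                output.append([])               # start a new output fragment
            output[-1].append((key, ts, value))
            last_key = key
            on_disk += 1
            peak = max(peak, on_disk)
        if is_last:
            on_disk -= frag_len                 # exhausted input fragment is deleted early
    return output, peak
```

For two runs of two 2-entry fragments each that fully overwrite one another, the peak is 10 entries, compared with 8 + 4 = 12 if all inputs were kept until the end; the gap widens as runs get longer relative to the fragment size.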
Summary: Comparing ICS with STCS/LCS
                      STCS    ICS      LCS
Space amplification   High    Low      Lowest
Write amplification   Low     Low      High
Number of SSTables    Low     Medium   High
Case Study:
Test results comparing ICS to STCS
under write workload.
Case Study
1) Write 500GB
2) Overwrite repeatedly
3) Compact
+ The results clearly show ICS’s improved
space amplification.
+ Most notably, there is a 2X peak with
STCS major compaction vs. no
noticeable peak for ICS.
Future Directions
1. Adaptive ICS
+ Improve space amplification
+ by adapting to overwrite workload
+ And tune the algorithm between Size-Tiered ⇔ Leveled.
2. Disk-space controller
+ Prioritize compaction over writes when reaching capacity.
+ Compaction strategy-based high-water mark (e.g., 50% for STCS, ~15% for ICS)
3. Enhanced metadata tracking
+ Improve overlap heuristics to optimize compaction of non-overlapping SSTables.
+ Both spatially, in token-range terms
+ And temporally, in (expiration) time terms
Available on-demand:
How to Shrink Your Datacenter
Footprint by 50%
Data Modeling Best Practices
How to Size Your Scylla Cluster
How to Bullet-Proof Your Scylla Deployment
Q&A
Stay in touch
bhalevy@scylladb.com
raphaelsc@scylladb.com
United States
1900 Embarcadero Road
Palo Alto, CA 94303
Israel
11 Galgalei Haplada
Herzelia, Israel
www.scylladb.com
@scylladb
Thank you

TechTalk: Reduce Your Storage Footprint with a Revolutionary New Compaction Strategy

  • 1. Benny Halevy, Software Development Team Lead Raphael S. Carvalho, Software Developer Reduce Your Storage Footprint with a Revolutionary New Compaction Strategy
  • 2. 2 Speakers Benny Halevy leads the storage software development team at ScyllaDB. Benny has been working on operating systems and distributed file systems for over 20 years. Most recently, Benny led software development for GSI Technology, providing a hardware/software solution for deep learning and similarity search using in-memory computing technology. Raphael S. Carvalho is a computer programmer who loves open source, and especially kernel programming. He has worked on bringing new file system support as well as allowing multiple file systems to co-exist, so-called MultiFS support, for the open source project Syslinux. He has mostly been working on SSTable handling and compaction for ScyllaDB.
  • 3. 3 + The Real-Time Big Data Database + Drop-in replacement for Apache Cassandra + 10X the performance & low tail latency + Open source and Enterprise editions + New: Scylla Cloud, DBaaS + Founded by the creators of KVM hypervisor + HQs: Palo Alto, CA; Herzelia, Israel About ScyllaDB
  • 4. Agenda 4 + Log-structured Storage overview + Compaction overview + Existing Compaction Strategies + Incremental Compaction Strategy (ICS) Problem Definition + ICS in a Nutshell + Case Study: ICS vs. STCS + Future Directions + Q&A
  • 5. Introduction: Log -structured Storage and Compaction Fundamentals
  • 6. + Scylla, like Cassandra, stores data using a log-structured method. + Changes to the data (a.k.a. mutation fragments) are first recorded in memory and are also stored on disk in the commit log. + Then, memtables are flushed into SSTables. + The name, SSTable, stands for “Sorted Strings Table” + it stores a set of mutation fragments + that are sorted by the partition key and clustering keys. + There are no tables storing a static view of the database. + Reading a particular data item requires reading all sstables that may store parts of it + And merge all live mutations from the read results. + SSTables are immutable, i.e. they are never updated in place. + Therefore, updates accumulate over time. + And we have to merge them together to avoid running out of space. + There comes compaction to the rescue. Introduction: Log Structured Storage ... Updates MemTable ... SSTable
  • 7. Introduction: Compaction Fundamentals 1. Compaction first selects a set of sstables to process. + based on the Compaction Strategy. 2. It then reads the SSTables, and + writes the compacted output + while eliminating overwrites, deleted data, and expired data. 3. Eventually, when the output SSTables are sealed and safely stored on stable storage, the input SSTables can be finally deleted. Note that compaction requires temporary space + Since SSTables must not be deleted until their compaction completes.
  • 8. Introduction: Compaction Fundamentals + What kind of mutation fragments can be eliminated? + Overwritten. + TTL-expired. + Explicitly deleted (via a tombstone, or column deletion). + gc-expired tombstones. a’ a b c !c !d a’ b !c !z !d [a] is overwritten by [a’] [b] is newly written [c] is deleted by [!c] [!d] is a live tombstone [!z] is a gc-expired tombstone poof! Note that tombstones are kept around for gc_grace_secondsuntil they are garbage-collected, to prevent data resurrection.
  • 10. Introduction - Size Tiered Compaction Strategy 1. STCS, for short, is Scylla’s default compaction strategy. 2. STCS organizes SSTables into tiers, where in tier [n], SSTables are: + roughly the same size + [k] times bigger than SSTables in tier [n-1] + [k] corresponds to the “min_threshold” config option 3. When compacting [k] SSTables in tier [n] + A single SSTable is created. + It may be as large as the union of all [k] inputs, in which case it moves to the next tier. + Or it may become much smaller (due to deletes/expiration) and move to a lower tier.
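A rough Python sketch of the tiering flow described above. It rests on simplifying assumptions: SSTables are represented only by their sizes, tier [n] is taken to hold sizes in the range base_size * k**n up to base_size * k**(n+1), and nothing is eliminated during compaction, so the output is as large as the sum of its inputs. The function names are hypothetical, and real STCS buckets SSTables by similar size rather than by fixed boundaries.

```python
def tier_of(size, base_size, k):
    """Tier n holds SSTables of roughly base_size * k**n (assumed model)."""
    tier, bound = 0, base_size * k
    while size >= bound:
        bound *= k
        tier += 1
    return tier


def stcs_flush(sstables, flushed_size, base_size, k):
    """Add a freshly flushed SSTable, then compact any tier holding k SSTables."""
    sstables = sstables + [flushed_size]
    while True:
        tiers = {}
        for size in sstables:
            tiers.setdefault(tier_of(size, base_size, k), []).append(size)
        full = [t for t, members in tiers.items() if len(members) >= k]
        if not full:
            return sorted(sstables, reverse=True)
        picked = tiers[min(full)][:k]
        for size in picked:
            sstables.remove(size)
        # Worst case: no overwrites or deletes, the output is the union of inputs.
        sstables.append(sum(picked))
```

With k = 4 and unit-sized flushes, 21 flushes leave three SSTables of sizes 16, 4 and 1, one per tier, which is the logarithmic SSTable count STCS is known for.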
  • 11. Introduction - STCS Space Amplification + STCS results in a low number of SSTables, logarithmic in the size of the data, and data is copied during compaction a fairly low number of times. + However, STCS requires that the disk be substantially larger than a perfectly-compacted representation of the data (i.e., all the data in one single SSTable). + This is called space amplification. + The main factors are: a. Accumulation of updates and deletes across different tiers. b. Temporary space, in particular for compacting the highest tier.
  • 12. Introduction - Leveled Compaction Strategy + The first thing that Leveled Compaction does is to replace large sstables, the staple of STCS, by “runs” of small, fixed-sized sstables. + Level 0 (L0) stores the new sstables, recently flushed from memtables. + As their number grows (and reads slow down), our goal is to move sstables out of this level to the next levels. + Each of the other levels, L1, L2 and on, is a single run of an exponentially increasing size.
  • 13. Introduction - LCS Write Amplification + Compaction is triggered whenever some level [i] contains more than 10^i SSTables. + LCS picks one SSTable from level [i], with size X, to compact. + It then finds the roughly 10 SSTables in the next higher level [i + 1] which overlap with this SSTable and compacts them against the one input SSTable. + It writes the resulting run, of size bounded by (1+10)*X, to the next level. While LCS limits space amplification, it results in higher write amplification, as data updates need to be frequently compacted along with unchanged data that is merely copied over and over again.
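A back-of-the-envelope Python sketch of where the write amplification comes from. It only uses the two figures stated on the slide (level [i] holds up to 10^i SSTables, and promoting X bytes rewrites up to (1+10)*X bytes); the function names are hypothetical, L0 and the initial flush are ignored, and the result is a worst-case bound rather than a measurement.

```python
def lcs_level_count(total_sstables, fanout=10):
    """Number of levels L1..Ln needed if level i holds up to fanout**i SSTables."""
    levels, capacity = 0, 0
    while capacity < total_sstables:
        levels += 1
        capacity += fanout ** levels
    return levels


def lcs_write_amp_bound(levels, fanout=10):
    """Each promotion rewrites up to (1 + fanout) bytes per byte promoted."""
    return levels * (1 + fanout)
```

For example, 1000 fixed-size SSTables need three levels in this model, so a byte that travels all the way to the last level may be rewritten up to 33 times.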
  • 15. ICS: Problem Definition + We observed problems with legacy compaction strategies: + STCS has high space amplification, and relatively low write amplification. + LCS has high write amplification, and relatively low space amplification. + We wanted a compaction strategy that could benefit from both approaches.
  • 16. ICS - In a Nutshell + ICS (originally called “Hybrid Compaction Strategy”) + is based on the Size-Tiered Compaction Strategy + sharing its low write amplification, and reducing its space amplification. + By borrowing SSTable Runs from LCS + ICS has reduced temporary space requirements. + Since individual, small fragments are compacted, and freed early as soon as they are exhausted. + ICS still avoids LCS’s worse write amplification since it follows STCS’s size-tiered flow. + Merely replacing the increasingly larger SSTables with increasingly longer SSTable Runs.
  • 17. SSTable Runs + An SSTable Run is a sorted set of SSTables with non-overlapping token ranges. + The SSTables in a run are called “Fragments”. + A run is essentially equivalent to a large SSTable that is split into several smaller SSTables. + Since the fragments are disjoint and sorted with respect to each other: + We can scan the runs and compact them incrementally, fragment-by-fragment, + While deleting exhausted SSTables as we go. [Diagram: runs of fragments a ... z and A ... Z compacted incrementally, fragment by fragment]
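The space saving from fragment-by-fragment compaction can be seen in a small Python model. The assumptions are deliberately crude: every run has the same number of fragments, fragments at the same position cover the same token range, and nothing is eliminated, so the output is as large as the input. `peak_disk_usage` is a hypothetical helper, not part of Scylla.

```python
def peak_disk_usage(runs, incremental):
    """runs: list of runs, each a list of fragment sizes aligned by token range."""
    live_input = sum(sum(run) for run in runs)
    output = 0
    peak = live_input
    for step in zip(*runs):  # one token range at a time
        output += sum(step)  # worst case: nothing is eliminated
        peak = max(peak, live_input + output)
        if incremental:
            live_input -= sum(step)  # exhausted fragments are freed right away
    return peak
```

With four runs of ten unit-sized fragments each (40 units of data), keeping all inputs until the end peaks at 80 units, while freeing exhausted fragments as we go peaks at 44: the data itself plus one fragment per run.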
  • 18. Summary: Comparing ICS with STCS/LCS + Space amplification: STCS High, ICS Low, LCS Lowest + Write amplification: STCS Low, ICS Low, LCS High + Number of SSTables: STCS Low, ICS Medium, LCS High
  • 19. Case Study: Test results comparing ICS to STCS under write workload.
  • 20. Case Study 1) Write 500GB 2) Overwrite repeatedly 3) Compact + The results clearly show ICS’s improved space amplification. + Most notably, disk usage peaks at 2X during STCS major compaction, with no noticeable peak for ICS.
  • 22. Future Directions 1. Adaptive ICS + Improve space amplification + by adapting to overwrite workload + And tune the algorithm between Size-Tiered ⇔ Leveled. 2. Disk-space controller + Prioritize compaction over writes when reaching capacity. + Compaction strategy-based high-water mark (E.g. 50% for STCS, ~15% for ICS) 3. Enhanced metadata tracking + Improve overlap heuristics to optimize compaction of non-overlapping SSTables. + Both spatially, in token-range terms + And temporally, in (expiration) time terms
  • 23. Available on-demand: How to Shrink Your Datacenter Footprint by 50% Data Modeling Best Practices How to Size Your Scylla Cluster How to Bullet-Proof Your Scylla Deployment
  • 25. United States 1900 Embarcadero Road Palo Alto, CA 94303 Israel 11 Galgalei Haplada Herzelia, Israel www.scylladb.com @scylladb Thank you