Deletes Without Tombstones or TTLs
Eric Stevens, Principal Architect
ProtectWise, Inc.
©2016 ProtectWise, Inc. All rights reserved.
About ProtectWise
An enterprise security company that records, analyzes, and visualizes your network on demand to detect
complex threats that others can’t see
Data Ingestion and Availability
● Well north of a billion new records per day
● Processed, analyzed, and stored in soft real time
● Fully indexed and searchable with p95 query response times under 1 second
○ Shortening the OODA loop
● Hundreds of Cassandra servers
● Hundreds of billions of records
● Multiple petabytes of data
Use Case – Super Bowl 50
The Broncos weren’t the only team from Denver in Levi’s Stadium
With one sensor, ProtectWise captured the following data at Super Bowl 50:
● 8.806 terabytes of data seen, primarily HTTP, SSL, and traffic to Amazon AWS, Facebook, Twitter, and Instagram
● 1.550 terabytes of data captured (an 82% reduction)
● 17 million URLs hit
● 8,085,949 DNS requests
With a single sensor deployed on the Levi's Stadium public Wi-Fi network, ProtectWise captured 8.806 terabytes of data and reduced it by 82% to just 1.550 terabytes, a true testament to the scale and power of our platform.
Overview
● How deletes (tombstones) work in Cassandra today
● The limitations of tombstones
● Misconceptions about tombstones
● How TTLs (time to live) work in Cassandra today
● The limitations of TTLs
● Why neither strategy works for ProtectWise
● Our unconventional solution
● Advantages of our solution
● Disadvantages of our solution
The Trouble with Tombstones
Terrible
● Increases both write and read I/O pressure
● Not an effective means of reclaiming disk capacity
● May be difficult to locate the correct records for deletion
● Makes reads more expensive
● The tombstones themselves can often greatly outlive their deleted data (much longer than gc_grace)
Terrific
● Surgically target data for removal
● Easy to reason about from a read consistency perspective
When do tombstones (and expired TTL’d records) go away?
● Never before they are gc_grace old (this is a good thing, and you get to control it)
● During compaction, for a tombstone past gc_grace, its partition key is checked against the bloom filters of all other SSTables for the given CQL table (those not taking part in the compaction)
● If there is a bloom filter match, the tombstone will remain, even if the match was a false positive
● If there is ANY data for that partition in any other SSTable, even other tombstones, the tombstone will not get cleaned up
● If the bloom filters indicate there is no chance of overlap on that partition key, the tombstone will get cleaned up
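The purge rule above can be modelled in a few lines. This is a simplified sketch written for illustration. `BloomFilter`, `SSTable`, and `canPurge` are hypothetical names, not Cassandra classes. It follows the rule as stated on this slide; Cassandra's own implementation differs in detail. It shows why a false-positive bloom filter match is enough to keep a tombstone alive.

```java
import java.util.BitSet;
import java.util.List;

// Hypothetical model of the tombstone purge rule above; names are illustrative, not Cassandra's API.
class TombstonePurgeSketch {

    // Minimal Bloom filter: k probes into m bits. It can report false positives, never false negatives.
    static final class BloomFilter {
        private final BitSet bits;
        private final int m;
        private final int k;

        BloomFilter(int m, int k) {
            this.bits = new BitSet(m);
            this.m = m;
            this.k = k;
        }

        // Double hashing: probe i lands at h1 + i * h2 (mod m).
        private int probe(String key, int i) {
            int h1 = key.hashCode();
            int h2 = Integer.reverse(h1) | 1;
            return Math.floorMod(h1 + i * h2, m);
        }

        void add(String key) {
            for (int i = 0; i < k; i++) bits.set(probe(key, i));
        }

        boolean mightContain(String key) {
            for (int i = 0; i < k; i++) {
                if (!bits.get(probe(key, i))) return false;
            }
            return true;
        }
    }

    // An SSTable reduced to the one thing this check consults: its partition-key Bloom filter.
    static final class SSTable {
        final String name;
        final BloomFilter filter;

        SSTable(String name, BloomFilter filter) {
            this.name = name;
            this.filter = filter;
        }
    }

    // A tombstone may be dropped only once it is at least gc_grace old AND no SSTable outside
    // this compaction might contain its partition key (a false positive blocks the purge too).
    static boolean canPurge(String partitionKey, long tombstoneAgeSeconds, long gcGraceSeconds,
                            List<SSTable> sstablesOutsideCompaction) {
        if (tombstoneAgeSeconds < gcGraceSeconds) return false;
        for (SSTable other : sstablesOutsideCompaction) {
            if (other.filter.mightContain(partitionKey)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        BloomFilter f = new BloomFilter(1024, 3);
        f.add("0xcafebabe");
        List<SSTable> outside = List.of(new SSTable("ka-2", f));
        System.out.println(canPurge("0xcafebabe", 3600, 0, outside));   // key may exist elsewhere
        System.out.println(canPurge("0xcafebabe", 3600, 0, List.of())); // nothing overlaps
    }
}
```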
Misconception about Tombstone Performance
● The performance degradation from tombstones isn’t caused by the tombstone itself.
● If you do
○ for (n <- 0 to 100000) {
    INSERT INTO table (partitionKey, clusterKey) VALUES (1, n)
  }
● You can later create a range tombstone that is tiny in bytes:
○ DELETE FROM table WHERE partitionKey = 1 AND clusterKey < 99999
● But if you then
○ SELECT * FROM table WHERE partitionKey = 1 LIMIT 1
● Cassandra will have to read and then discard the rows with clusterKey values 0 to 99998 before the LIMIT 1 can be satisfied by the row with clusterKey 99999
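The cost described above can be counted with a toy model. This sketch is hypothetical and in-memory. `RangeTombstoneScan` and `selectWithLimit` are illustrative names, and real reads merge SSTable iterators rather than looping over integers. It walks one partition in clustering order and counts how many shadowed rows are read and discarded before the first live row satisfies `LIMIT 1`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, in-memory model of reading one partition; not Cassandra code.
class RangeTombstoneScan {

    // Rows returned to the client, plus how many shadowed rows were read and thrown away first.
    static final class ReadResult {
        final List<Integer> rows;
        final int discarded;

        ReadResult(List<Integer> rows, int discarded) {
            this.rows = rows;
            this.discarded = discarded;
        }
    }

    // Walks clustering keys 0..maxKey in order, as a merged read over SSTables would, while a single
    // range tombstone shadows every clusterKey < deleteBelow. Stops once `limit` live rows are found.
    static ReadResult selectWithLimit(int maxKey, int deleteBelow, int limit) {
        List<Integer> live = new ArrayList<>();
        int discarded = 0;
        for (int ck = 0; ck <= maxKey && live.size() < limit; ck++) {
            if (ck < deleteBelow) {
                discarded++;          // the row was still read, then dropped because it is shadowed
            } else {
                live.add(ck);
            }
        }
        return new ReadResult(live, discarded);
    }

    public static void main(String[] args) {
        // Inserts for n = 0..100000, DELETE ... WHERE clusterKey < 99999, then LIMIT 1.
        ReadResult r = selectWithLimit(100_000, 99_999, 1);
        System.out.println("returned " + r.rows + " after discarding " + r.discarded + " rows");
    }
}
```

The tombstone itself is a single small record; the 99,999 discarded rows are where the read time goes.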
[Figure: SSTable 1 holds partition PK1 with clustering keys CK1 through CKn, each with its cells; SSTable 2 holds a range tombstone for PK1 deleting CK1 through CKn-1; the query SELECT * FROM table WHERE pk1 LIMIT 1 has to read through both.]
Compaction Review
[Figure: SSTables ordered from older data (left) to newer data (right), with new writes arriving at the newer end and groups of SSTables compacting together.]
Tombstones in Compaction
[Figure sequence: a delete arrives as a tombstone in a new SSTable, while the SSTable containing the record to delete sits among older data. As other writes arrive and compactions proceed, the tombstone and the record compact separately. The record is finally deleted only once both are involved in the same compaction.]
Tombstone Demo
Showing why tombstones are not the same thing as a delete.
Setup
cqlsh> CREATE TABLE testing (
  p blob,
  c blob,
  v blob,
  PRIMARY KEY (p, c)
) WITH gc_grace_seconds = 0;
Setup
cqlsh> INSERT INTO testing (p, c, v) VALUES (0xcafebabe, 0xdeadbeef, 0xfacefeed);
$ nodetool flush && ls *-Data.db
testing-testing-ka-1-Data.db
cqlsh> INSERT INTO testing (p, c, v) VALUES (0xcafebabe, 0xdeadbeef, 0xdeadc0de);
$ nodetool flush && ls *-Data.db
testing-testing-ka-1-Data.db
testing-testing-ka-2-Data.db
SSTable 1: 0xcafebabe:0xdeadbeef:0xfacefeed
SSTable 2: 0xcafebabe:0xdeadbeef:0xdeadc0de
Setup
cqlsh> select * from testing;
 p          | c          | v
------------+------------+------------
 0xcafebabe | 0xdeadbeef | 0xdeadc0de
cqlsh> DELETE FROM testing WHERE p=0xcafebabe AND c=0xdeadbeef;
$ nodetool flush && ls *-Data.db
testing-testing-ka-1-Data.db
testing-testing-ka-2-Data.db
testing-testing-ka-3-Data.db
SSTable 1: 0xcafebabe:0xdeadbeef:0xfacefeed
SSTable 2: 0xcafebabe:0xdeadbeef:0xdeadc0de
SSTable 3: 0xcafebabe:0xdeadbeef:DELETE
Let’s look at the data
$ hexdump testing-testing-ka-1-Data.db
0000000 4b 00 00 00 c3 00 04 ca fe ba be 7f ff ff ff 80
0000010 00 01 00 72 0a 00 04 de ad be ef 0e 00 71 05 34
0000020 3b d8 4e df f1 0d 00 14 0b 19 00 29 01 76 1a 00
0000030 70 04 fa ce fe ed 00 00 6f 9b 15 17
SSTable 1: 0xcafebabe:0xdeadbeef:0xfacefeed
Let’s look at the data
$ hexdump testing-testing-ka-2-Data.db
0000000 4b 00 00 00 c3 00 04 ca fe ba be 7f ff ff ff 80
0000010 00 01 00 72 0a 00 04 de ad be ef 0e 00 71 05 34
0000020 3b e3 86 df 23 0d 00 14 0b 19 00 29 01 76 1a 00
0000030 70 04 de ad c0 de 00 00 62 de 14 02
SSTable 2: 0xcafebabe:0xdeadbeef:0xdeadc0de
Let’s look at the data
$ hexdump testing-testing-ka-3-Data.db
0000000 33 00 00 00 c3 00 04 ca fe ba be 7f ff ff ff 80
0000010 00 01 00 94 07 00 04 de ad be ef ff 10 0a 00 f0
0000020 00 01 57 4f 2d 69 00 05 34 3b e6 ab 47 c8 00 00
0000030 db 77 12 69
SSTable 3: 0xcafebabe:0xdeadbeef:DELETE
Time to Compact
Simulate compaction happening on data that has been deleted, but where the tombstone is not involved in the compaction:
% jmx_invoke -m org.apache.cassandra.db:type=CompactionManager forceUserDefinedCompaction testing-testing-ka-1-Data.db,testing-testing-ka-2-Data.db
$ ls *-Data.db
testing-testing-ka-3-Data.db
testing-testing-ka-4-Data.db
SSTable 1 (0xcafebabe:0xdeadbeef:0xfacefeed) + SSTable 2 (0xcafebabe:0xdeadbeef:0xdeadc0de) → SSTable 4: 0xcafebabe:0xdeadbeef:??????????
Let’s look again:
$ hexdump testing-testing-ka-4-Data.db
0000000 4b 00 00 00 c3 00 04 ca fe ba be 7f ff ff ff 80
0000010 00 01 00 72 0a 00 04 de ad be ef 0e 00 71 05 34
0000020 3b e3 86 df 23 0d 00 14 0b 19 00 29 01 76 1a 00
0000030 70 04 de ad c0 de 00 00 62 de 14 02
SSTable 4: 0xcafebabe:0xdeadbeef:0xdeadc0de
What happened?
● The tombstone for primary key (0xcafebabe, 0xdeadbeef) was written in SSTable 3
● SSTable 3 wasn’t involved in the compaction
● Therefore, the data at rest (0xdeadc0de) didn’t get cleaned up; it was simply rewritten into SSTable 4
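The demo's outcome follows from a last-write-wins merge that only sees the SSTables selected for the compaction. This is a hypothetical sketch of that behaviour. `PartialCompactionDemo`, `Cell`, and `compact` are illustrative names, and timestamps 1, 2, and 3 stand in for the real write timestamps. The sketch compacts SSTables 1 and 2 without SSTable 3, as the user-defined compaction above did.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the demo above: one cell per primary key, last write (highest timestamp) wins.
class PartialCompactionDemo {

    // A cell version; value == null marks a tombstone.
    static final class Cell {
        final String key;
        final String value;
        final long timestamp;

        Cell(String key, String value, long timestamp) {
            this.key = key;
            this.value = value;
            this.timestamp = timestamp;
        }

        boolean isTombstone() { return value == null; }
    }

    // Merge only the SSTables selected for this compaction, keeping the newest version of each key.
    // A tombstone can only shadow data in SSTables that take part in the same compaction.
    static List<Cell> compact(List<List<Cell>> selected) {
        Map<String, Cell> newest = new HashMap<>();
        for (List<Cell> sstable : selected) {
            for (Cell c : sstable) {
                Cell prev = newest.get(c.key);
                if (prev == null || c.timestamp > prev.timestamp) newest.put(c.key, c);
            }
        }
        return new ArrayList<>(newest.values());
    }

    public static void main(String[] args) {
        String pk = "0xcafebabe:0xdeadbeef";
        List<Cell> ss1 = List.of(new Cell(pk, "0xfacefeed", 1));
        List<Cell> ss2 = List.of(new Cell(pk, "0xdeadc0de", 2));
        List<Cell> ss3 = List.of(new Cell(pk, null, 3));       // the DELETE

        // User-defined compaction of SSTables 1 and 2 only: the tombstone in SSTable 3 is never consulted.
        List<Cell> ss4 = compact(List.of(ss1, ss2));
        System.out.println(ss4.get(0).value);

        // Only when SSTables 3 and 4 compact together is the shadowed value gone
        // (dropping the tombstone itself is the separate gc_grace/overlap check).
        System.out.println(compact(List.of(ss3, ss4)).get(0).isTombstone());
    }
}
```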
Why is this a problem?
● In all mainline compaction strategies:
○ Data written close together chronologically tends to compact together relatively quickly
○ Data written far apart chronologically tends to take a long time to compact together
■ This is why it’s an anti-pattern to append to or overwrite the same partition over long periods of time: your reads of that partition will end up needing to read from a large number of SSTables
○ Because disk capacity is not recovered until the tombstone and its underlying data are involved in the same compaction, it can take a long time to recover disk capacity
● Some compaction strategies (DateTiered, TimeWindowed) have controls that allow data to permanently stop compacting
○ Under these conditions, it can become impossible to ever recover disk capacity
Note: see CASSANDRA-7019 for an upcoming alternative.
Also see “Improving Tombstone Compactions” today at 4:10 in 210C.
The Trouble with TTLs
Trouble
● Once a TTL has been written, there is no way to change your mind except to write the record again with a new TTL
● Rows written more than once may have inconsistent TTLs (TTLs are tracked per cell), leading to dirty or incomplete reads
● TTL’d records may remain at rest much longer than you realize in some circumstances
Terrific
● Fire and forget: your data will “go away” fairly predictably
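The "inconsistent TTLs" problem comes from TTLs being tracked per cell rather than per row. This is a hypothetical model. `PerCellTtlDemo`, `Cell`, and `read` are illustrative names, and times are plain seconds. A row is written with two columns, and a later write refreshes only one of them. Once the first TTL lapses, readers see a partial row.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of per-cell TTLs: each column carries its own write time and TTL.
class PerCellTtlDemo {

    static final class Cell {
        final String value;
        final long writtenAt;    // seconds
        final long ttlSeconds;

        Cell(String value, long writtenAt, long ttlSeconds) {
            this.value = value;
            this.writtenAt = writtenAt;
            this.ttlSeconds = ttlSeconds;
        }

        // A cell is live until writtenAt + ttlSeconds.
        boolean liveAt(long now) { return now < writtenAt + ttlSeconds; }
    }

    // The row as a reader sees it at time `now`: only the cells whose TTL has not yet expired.
    static Map<String, String> read(Map<String, Cell> row, long now) {
        Map<String, String> visible = new TreeMap<>();
        row.forEach((column, cell) -> {
            if (cell.liveAt(now)) visible.put(column, cell.value);
        });
        return visible;
    }

    public static void main(String[] args) {
        Map<String, Cell> row = new TreeMap<>();
        // First write at t=0 sets both columns with a 100 s TTL.
        row.put("a", new Cell("1", 0, 100));
        row.put("b", new Cell("2", 0, 100));
        // A later write at t=50 only touches column b, so only b's TTL is refreshed.
        row.put("b", new Cell("3", 50, 100));

        System.out.println(read(row, 60));   // {a=1, b=3}
        System.out.println(read(row, 120));  // {b=3}: column a has expired, the row is now incomplete
    }
}
```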
Why Neither Strategy Works for Us
● Customers get to change their minds about how long they want us to retain their data
● Changing TTLs is expensive, both in terms of I/O pressure and in temporarily doubling the size of your data at rest
● Disks are cheap; lots of disks are not
● Cassandra data at rest has an ongoing cost; if a customer stops paying for it, we need to stop paying for it as well
● Timeliness of deletes is important
● Sensitive data spillage means we need to remove some data quickly
Our Unconventional Solution
Basic Strategy
Successfully used to delete significant amounts of data with little to no performance impact
Step 1: Set RF=1
● There are some weird anti-entropy corner cases that are solved if you disable replication
Step 2: Disconnect Drive
● If you have hot-swappable drives, this is a lot easier; if not, you might have some temporary downtime due to the RF change
Step 3: Deleting Compaction Strategy
Basic Strategy
For real this time.
Delete While Compacting
● During compaction, use deterministic logic to determine which records should be removed
● Prevent those records from surviving the compaction process
● Clean up indexes at the time the record is removed
Evicting Compaction Strategy
● Records are removed in the next compaction as soon as they should be evicted
● If we need to recover capacity quickly, we can use user-defined compaction to selectively target our oldest files
Features
Does it support feature X of my preferred compaction strategy?
Backing up your deletes
● If you choose to, you can automatically create a backup of the deleted records
● Save yourself from deletion remorse:
○ Incorrect deletion logic
○ Change of heart by you(r customer)
● Move those records to cheaper storage
Wrapping Compaction Strategy
● Acts as a parent strategy with your preferred child compaction strategy
● The child strategy is responsible for SSTable selection
● You get the characteristics of your strategy, with the deletes of our strategy
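The parent/child split can be pictured as a decorator. This is a hypothetical sketch. `ChildStrategy`, `OldestTwo`, and `WrappingStrategy` are illustrative names only; Cassandra's real compaction-strategy API is different and much larger, and an "SSTable" here is just a list of row keys. The child alone decides which SSTables compact. The wrapper only decides which rows make it into the output.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;
import java.util.function.Predicate;

// Hypothetical interfaces for illustration only; Cassandra's real compaction-strategy API differs.
// An "SSTable" here is just a list of row keys.
interface ChildStrategy {
    // Decide which SSTables to compact next; an empty list means "nothing to do".
    List<List<String>> select(List<List<String>> live);
}

// Stand-in child strategy for the example: always picks the two oldest SSTables.
final class OldestTwo implements ChildStrategy {
    @Override
    public List<List<String>> select(List<List<String>> live) {
        if (live.size() < 2) return List.of();
        return new ArrayList<>(live.subList(0, 2));
    }
}

// Parent strategy: the child decides WHAT to compact, the wrapper decides WHICH rows survive.
final class WrappingStrategy {
    private final ChildStrategy child;
    private final Predicate<String> convicted;

    WrappingStrategy(ChildStrategy child, Predicate<String> convicted) {
        this.child = child;
        this.convicted = convicted;
    }

    // Runs one compaction and returns the resulting live SSTables (untouched ones first, new one last).
    List<List<String>> compactOnce(List<List<String>> live) {
        List<List<String>> selected = child.select(live);
        if (selected.isEmpty()) return live;
        TreeSet<String> merged = new TreeSet<>();
        for (List<String> sstable : selected) {
            for (String key : sstable) {
                if (!convicted.test(key)) merged.add(key);   // convicted rows are not written out
            }
        }
        List<List<String>> result = new ArrayList<>();
        for (List<String> sstable : live) {
            boolean wasSelected = selected.stream().anyMatch(s -> s == sstable); // identity, not equals
            if (!wasSelected) result.add(sstable);
        }
        result.add(new ArrayList<>(merged));
        return result;
    }

    public static void main(String[] args) {
        WrappingStrategy strategy = new WrappingStrategy(new OldestTwo(), key -> key.equals("B"));
        List<List<String>> live = List.of(List.of("A", "B", "C"), List.of("A", "B", "D"), List.of("C", "D", "E"));
        System.out.println(strategy.compactOnce(live));   // [[C, D, E], [A, C, D]]
    }
}
```

Swapping `OldestTwo` for another selector changes which SSTables compact together, but not which rows are evicted.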
Features
● Configurable and extensible
○ Several provided implementations can be controlled fairly surgically by reading deletion rules from a table you specify
○ Extend one of several base classes to provide more sophisticated custom logic
● Restoring backups
○ To restore accidentally deleted records, copy the backup files into the table's data directory and run nodetool refresh
○ Or, if your topology has changed, you can restore them with sstableloader
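A rule-driven convictor might look roughly like this. It is a hypothetical sketch, not the project's API. `Convictor`, `EventKey`, and `RetentionRuleConvictor` are illustrative names, the `(tenant, event_time_ms)` primary key is assumed, and the rules arrive as a `Map` instead of being read from a table. The cutoff is an absolute timestamp rather than "now minus retention". That way every replica reaches the same verdict no matter when it compacts, and a customer changing their mind is just a rule update.

```java
import java.util.Map;

// Hypothetical base class in the spirit of "extend one of several base classes"; not the project's API.
abstract class Convictor<K> {
    // Must be a pure function of the key and the loaded rules, so every replica reaches the same verdict.
    abstract boolean shouldEvict(K key);
}

// Primary key of a hypothetical table: PRIMARY KEY ((tenant), event_time_ms).
final class EventKey {
    final String tenant;
    final long eventTimeMillis;

    EventKey(String tenant, long eventTimeMillis) {
        this.tenant = tenant;
        this.eventTimeMillis = eventTimeMillis;
    }
}

// Rules of the form tenant -> "evict everything with an event time before this absolute timestamp".
// A real implementation would load these from a rules table; here they are handed in as a Map.
final class RetentionRuleConvictor extends Convictor<EventKey> {
    private final Map<String, Long> evictBeforeMillis;

    RetentionRuleConvictor(Map<String, Long> evictBeforeMillis) {
        this.evictBeforeMillis = Map.copyOf(evictBeforeMillis);
    }

    @Override
    boolean shouldEvict(EventKey key) {
        Long cutoff = evictBeforeMillis.get(key.tenant);
        return cutoff != null && key.eventTimeMillis < cutoff;   // no rule for a tenant: keep everything
    }

    public static void main(String[] args) {
        RetentionRuleConvictor convictor = new RetentionRuleConvictor(Map.of("acme", 1_000L));
        System.out.println(convictor.shouldEvict(new EventKey("acme", 999)));    // true
        System.out.println(convictor.shouldEvict(new EventKey("acme", 1_000)));  // false
        System.out.println(convictor.shouldEvict(new EventKey("globex", 0)));    // false
    }
}
```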
A Wrapping Compaction Strategy
Doesn’t change the fundamental characteristics of your preferred compaction strategy
ALTER TABLE bar WITH compaction = {
  'class': 'DeletingCompactionStrategy',
  'dcs_underlying_compactor': 'LeveledCompactionStrategy',
  'sstable_size_in_mb': 160
};
ALTER TABLE foo WITH compaction = {
  'class': 'DeletingCompactionStrategy',
  'dcs_underlying_compactor': 'SizeTieredCompactionStrategy',
  'min_threshold': '2',
  'max_threshold': '8'
};
Compaction’s Inner Workings
[Figure: the Cassandra write path and compaction. Credit: DataStax, https://docs.datastax.com/en/cassandra/2.1/cassandra/dml/dml_write_path_c.html]
● The compaction strategy selects SSTables and returns SSTableIterators
● FilteringSSTableIterators exclude data which should be deleted, and also notify the IndexManager, if appropriate, to clean up associated indexes
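The filtering step can be sketched as a plain iterator decorator. This is hypothetical. `FilteringIterator` is an illustrative name, not the project's `FilteringSSTableIterator`, and the `onEvict` callback stands in for the IndexManager notification. It wraps the rows a compaction would read and hides convicted rows from the writer. Each convicted row is reported exactly once, so side effects such as index cleanup or backup can hang off that callback.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Hypothetical stand-in for a filtering SSTable iterator: wraps the rows a compaction would read,
// hides convicted rows from the writer, and reports each one (e.g. so indexes can be cleaned up).
final class FilteringIterator<T> implements Iterator<T> {
    private final Iterator<T> source;
    private final Predicate<T> convicted;
    private final Consumer<T> onEvict;
    private T next;             // next surviving row, once found
    private boolean hasNext;

    FilteringIterator(Iterator<T> source, Predicate<T> convicted, Consumer<T> onEvict) {
        this.source = source;
        this.convicted = convicted;
        this.onEvict = onEvict;
        advance();
    }

    // Move to the next surviving row, notifying onEvict for every convicted row skipped on the way.
    // By the time hasNext() returns false, every convicted row has been reported.
    private void advance() {
        hasNext = false;
        while (source.hasNext()) {
            T candidate = source.next();
            if (convicted.test(candidate)) {
                onEvict.accept(candidate);
            } else {
                next = candidate;
                hasNext = true;
                return;
            }
        }
    }

    @Override
    public boolean hasNext() { return hasNext; }

    @Override
    public T next() {
        if (!hasNext) throw new NoSuchElementException();
        T result = next;
        advance();
        return result;
    }

    public static void main(String[] args) {
        List<String> evicted = new ArrayList<>();
        FilteringIterator<String> it = new FilteringIterator<>(
                List.of("A", "B", "C", "D", "E").iterator(),
                row -> row.equals("B") || row.equals("D"),
                evicted::add);
        while (it.hasNext()) System.out.print(it.next() + " ");   // A C E
        System.out.println("evicted=" + evicted);                  // evicted=[B, D]
    }
}
```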
An Evicting Compaction Strategy
Records involved in compaction which are convicted do not survive into the newly compacted SSTable
Rules: A => ✓, B => ✗, C => ✓, D => ✗, E => ✓
SSTable 1: A B C
SSTable 2: A B D
SSTable 3: C D E
New SSTable: A C E
Backup SSTable*: B D
* if configured to back up convicted records
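The figure's outcome can be reproduced with a small model. This is a hypothetical sketch. `EvictingCompaction`, `Result`, and `compact` are illustrative names, rows are identified by key alone, and copies of a key in several SSTables merge into one. Convicted rows go to an optional backup instead of the new SSTable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of the figure above: rows are identified by key only, and a row present in
// several SSTables is merged into one. Convicted rows go to an optional backup instead of the output.
final class EvictingCompaction {

    static final class Result {
        final List<String> newSSTable;
        final List<String> backupSSTable;

        Result(List<String> newSSTable, List<String> backupSSTable) {
            this.newSSTable = newSSTable;
            this.backupSSTable = backupSSTable;
        }
    }

    static Result compact(List<List<String>> inputs, Set<String> convicted, boolean backup) {
        TreeSet<String> merged = new TreeSet<>();            // merge and de-duplicate, in key order
        inputs.forEach(merged::addAll);
        List<String> survivors = new ArrayList<>();
        List<String> evicted = new ArrayList<>();
        for (String key : merged) {
            if (!convicted.contains(key)) survivors.add(key);
            else if (backup) evicted.add(key);                // without backup, convicted rows are dropped
        }
        return new Result(survivors, evicted);
    }

    public static void main(String[] args) {
        List<List<String>> sstables = List.of(
                List.of("A", "B", "C"),   // SSTable 1
                List.of("A", "B", "D"),   // SSTable 2
                List.of("C", "D", "E"));  // SSTable 3
        Result r = compact(sstables, Set.of("B", "D"), true);
        System.out.println("new=" + r.newSSTable + " backup=" + r.backupSSTable);
    }
}
```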
Often Faster than Existing Compaction
● Compaction performance is often bounded by available write capacity
● Fewer records surviving into the new SSTable reduces write pressure during compaction
● Testing records for conviction is lightweight (depending on the complexity of your business logic) and mostly CPU-bound
Limitations
Boundary Consistency
● Records past the deletion boundary may still be visible to your application
○ You may get inconsistent reads for such records
● Evicted records may resurrect temporarily due to repair
○ They’ll end up in a new SSTable and will be evicted again during the next automatic compaction
Eventual Deletes
● Like all other built-in deletion options, disk capacity is reclaimed only eventually
○ Old SSTables still tend not to compact very frequently
● However, by triggering user-defined compaction, you can reclaim space immediately without resorting to a major compaction
Limitations
Repair = Resurrection
● Read repair, and repair in general, may cause a record to fully resurrect temporarily
○ The resurrected record will appear in the youngest SSTables
○ It will disappear again when those new SSTables next compact (generally relatively quickly for an active cluster)
Requires deletion determinism
● Logic for deletes needs to be deterministic, or you’ll end up with consistency issues
● It's probably not a good idea to base any deletion logic on anything outside of the primary key, except in narrow use cases
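Why the primary key is the safe input can be shown with two replicas that disagree about a regular column, for example because one missed an update that repair has not yet delivered. This is a hypothetical sketch. `DeterminismDemo`, `Row`, `BY_KEY`, `BY_VALUE`, and the `expired-tenant:` key prefix are all illustrative. Both replicas hold the same primary key, so a key-based rule convicts identically on each. A rule that reads the diverging column does not.

```java
import java.util.function.Predicate;

// Hypothetical illustration: two replicas hold different versions of a non-key column for the
// same primary key (e.g. one replica missed an update). Conviction that reads only the primary key
// gives the same answer on both replicas; conviction that reads the non-key column does not.
final class DeterminismDemo {

    static final class Row {
        final String partitionKey;   // part of the primary key
        final String status;         // regular (non-key) column

        Row(String partitionKey, String status) {
            this.partitionKey = partitionKey;
            this.status = status;
        }
    }

    // Depends only on the primary key: identical input on every replica, identical verdict.
    static final Predicate<Row> BY_KEY = row -> row.partitionKey.startsWith("expired-tenant:");

    // Depends on a regular column whose value can differ between replicas until repair runs.
    static final Predicate<Row> BY_VALUE = row -> "archived".equals(row.status);

    public static void main(String[] args) {
        Row onReplicaA = new Row("expired-tenant:42", "archived"); // saw the latest update
        Row onReplicaB = new Row("expired-tenant:42", "active");   // missed it

        System.out.println(BY_KEY.test(onReplicaA) == BY_KEY.test(onReplicaB));     // true: replicas agree
        System.out.println(BY_VALUE.test(onReplicaA) == BY_VALUE.test(onReplicaB)); // false: replicas diverge
    }
}
```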
Current Project Status
● Supports, and is tested against, the Cassandra 2.x series
● In 3.x the package and class names changed, so it needs to be ported
● Tests are written in Scala; they cover a lot of surface area but would need to be rewritten prior to contribution
● Needs additional general-purpose convictors
● Principally tested against STCS; deserves better coverage for other child strategies
Availability & Compatibility
GitHub
https://github.com/protectwise/cassandra-util
Also includes:
● Our DataStax Driver Wrapper for Scala
● Our CCM wrapper lib for automating unit tests in Scala
We’re Hiring!
Scala, Akka, Spark, Node, DevOps
www.protectwise.com/careers.html
Especially if you’re in Denver!
See Our Other Talks
Cold Storage that Isn’t Glacial
Tomorrow 10:45 Room LL20D
Using Approximate Data for Small, Insightful Analytics
Tomorrow 2:00 Room LL20A

 
Why does my choice of storage matter with cassandra?
Why does my choice of storage matter with cassandra?Why does my choice of storage matter with cassandra?
Why does my choice of storage matter with cassandra?
 
Everyday I'm Scaling... Cassandra (Ben Bromhead, Instaclustr) | C* Summit 2016
Everyday I'm Scaling... Cassandra (Ben Bromhead, Instaclustr) | C* Summit 2016Everyday I'm Scaling... Cassandra (Ben Bromhead, Instaclustr) | C* Summit 2016
Everyday I'm Scaling... Cassandra (Ben Bromhead, Instaclustr) | C* Summit 2016
 
Everyday I’m scaling... Cassandra
Everyday I’m scaling... CassandraEveryday I’m scaling... Cassandra
Everyday I’m scaling... Cassandra
 
Galaxy Big Data with MariaDB
Galaxy Big Data with MariaDBGalaxy Big Data with MariaDB
Galaxy Big Data with MariaDB
 
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast DataDatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
 
Using Approximate Data for Small, Insightful Analytics (Ben Kornmeier, Protec...
Using Approximate Data for Small, Insightful Analytics (Ben Kornmeier, Protec...Using Approximate Data for Small, Insightful Analytics (Ben Kornmeier, Protec...
Using Approximate Data for Small, Insightful Analytics (Ben Kornmeier, Protec...
 
Next generation storage: eliminating the guesswork and avoiding forklift upgrade
Next generation storage: eliminating the guesswork and avoiding forklift upgradeNext generation storage: eliminating the guesswork and avoiding forklift upgrade
Next generation storage: eliminating the guesswork and avoiding forklift upgrade
 
Cassandra Exports as a Trivially Parallelizable Problem (Emilio Del Tessandor...
Cassandra Exports as a Trivially Parallelizable Problem (Emilio Del Tessandor...Cassandra Exports as a Trivially Parallelizable Problem (Emilio Del Tessandor...
Cassandra Exports as a Trivially Parallelizable Problem (Emilio Del Tessandor...
 
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
 
System Design.pdf
System Design.pdfSystem Design.pdf
System Design.pdf
 
CrowdStrike: Real World DTCS For Operators
CrowdStrike: Real World DTCS For OperatorsCrowdStrike: Real World DTCS For Operators
CrowdStrike: Real World DTCS For Operators
 
Zero ETL analytics with LLAP in Azure HDInsight
Zero ETL analytics with LLAP in Azure HDInsightZero ETL analytics with LLAP in Azure HDInsight
Zero ETL analytics with LLAP in Azure HDInsight
 
C* Summit EU 2013: Practice Makes Perfect: Extreme Cassandra Optimization
C* Summit EU 2013: Practice Makes Perfect: Extreme Cassandra OptimizationC* Summit EU 2013: Practice Makes Perfect: Extreme Cassandra Optimization
C* Summit EU 2013: Practice Makes Perfect: Extreme Cassandra Optimization
 
Masterclass - Redshift
Masterclass - RedshiftMasterclass - Redshift
Masterclass - Redshift
 
Data Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFixData Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFix
 
Avoiding big data antipatterns
Avoiding big data antipatternsAvoiding big data antipatterns
Avoiding big data antipatterns
 
Safer restarts, faster streaming, and better repair, just a glimpse of cassan...
Safer restarts, faster streaming, and better repair, just a glimpse of cassan...Safer restarts, faster streaming, and better repair, just a glimpse of cassan...
Safer restarts, faster streaming, and better repair, just a glimpse of cassan...
 
Data Day Seattle 2017: Scaling Data Science at Stitch Fix
Data Day Seattle 2017: Scaling Data Science at Stitch FixData Day Seattle 2017: Scaling Data Science at Stitch Fix
Data Day Seattle 2017: Scaling Data Science at Stitch Fix
 

Mehr von DataStax

Is Your Enterprise Ready to Shine This Holiday Season?
Is Your Enterprise Ready to Shine This Holiday Season?Is Your Enterprise Ready to Shine This Holiday Season?
Is Your Enterprise Ready to Shine This Holiday Season?DataStax
 
Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas...
Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas...Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas...
Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas...DataStax
 
Running DataStax Enterprise in VMware Cloud and Hybrid Environments
Running DataStax Enterprise in VMware Cloud and Hybrid EnvironmentsRunning DataStax Enterprise in VMware Cloud and Hybrid Environments
Running DataStax Enterprise in VMware Cloud and Hybrid EnvironmentsDataStax
 
Best Practices for Getting to Production with DataStax Enterprise Graph
Best Practices for Getting to Production with DataStax Enterprise GraphBest Practices for Getting to Production with DataStax Enterprise Graph
Best Practices for Getting to Production with DataStax Enterprise GraphDataStax
 
Webinar | Data Management for Hybrid and Multi-Cloud: A Four-Step Journey
Webinar | Data Management for Hybrid and Multi-Cloud: A Four-Step JourneyWebinar | Data Management for Hybrid and Multi-Cloud: A Four-Step Journey
Webinar | Data Management for Hybrid and Multi-Cloud: A Four-Step JourneyDataStax
 
Webinar | How to Understand Apache Cassandraℱ Performance Through Read/Writ...
Webinar  |  How to Understand Apache Cassandraℱ Performance Through Read/Writ...Webinar  |  How to Understand Apache Cassandraℱ Performance Through Read/Writ...
Webinar | How to Understand Apache Cassandraℱ Performance Through Read/Writ...DataStax
 
Webinar | Better Together: Apache Cassandra and Apache Kafka
Webinar  |  Better Together: Apache Cassandra and Apache KafkaWebinar  |  Better Together: Apache Cassandra and Apache Kafka
Webinar | Better Together: Apache Cassandra and Apache KafkaDataStax
 
Top 10 Best Practices for Apache Cassandra and DataStax Enterprise
Top 10 Best Practices for Apache Cassandra and DataStax EnterpriseTop 10 Best Practices for Apache Cassandra and DataStax Enterprise
Top 10 Best Practices for Apache Cassandra and DataStax EnterpriseDataStax
 
Introduction to Apache Cassandraℱ + What’s New in 4.0
Introduction to Apache Cassandraℱ + What’s New in 4.0Introduction to Apache Cassandraℱ + What’s New in 4.0
Introduction to Apache Cassandraℱ + What’s New in 4.0DataStax
 
Webinar: How Active Everywhere Database Architecture Accelerates Hybrid Cloud...
Webinar: How Active Everywhere Database Architecture Accelerates Hybrid Cloud...Webinar: How Active Everywhere Database Architecture Accelerates Hybrid Cloud...
Webinar: How Active Everywhere Database Architecture Accelerates Hybrid Cloud...DataStax
 
Webinar | Aligning GDPR Requirements with Today's Hybrid Cloud Realities
Webinar  |  Aligning GDPR Requirements with Today's Hybrid Cloud RealitiesWebinar  |  Aligning GDPR Requirements with Today's Hybrid Cloud Realities
Webinar | Aligning GDPR Requirements with Today's Hybrid Cloud RealitiesDataStax
 
Designing a Distributed Cloud Database for Dummies
Designing a Distributed Cloud Database for DummiesDesigning a Distributed Cloud Database for Dummies
Designing a Distributed Cloud Database for DummiesDataStax
 
How to Power Innovation with Geo-Distributed Data Management in Hybrid Cloud
How to Power Innovation with Geo-Distributed Data Management in Hybrid CloudHow to Power Innovation with Geo-Distributed Data Management in Hybrid Cloud
How to Power Innovation with Geo-Distributed Data Management in Hybrid CloudDataStax
 
How to Evaluate Cloud Databases for eCommerce
How to Evaluate Cloud Databases for eCommerceHow to Evaluate Cloud Databases for eCommerce
How to Evaluate Cloud Databases for eCommerceDataStax
 
Webinar: DataStax Enterprise 6: 10 Ways to Multiply the Power of Apache Cassa...
Webinar: DataStax Enterprise 6: 10 Ways to Multiply the Power of Apache Cassa...Webinar: DataStax Enterprise 6: 10 Ways to Multiply the Power of Apache Cassa...
Webinar: DataStax Enterprise 6: 10 Ways to Multiply the Power of Apache Cassa...DataStax
 
Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi...
Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi...Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi...
Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi...DataStax
 
Webinar - Real-Time Customer Experience for the Right-Now Enterprise featurin...
Webinar - Real-Time Customer Experience for the Right-Now Enterprise featurin...Webinar - Real-Time Customer Experience for the Right-Now Enterprise featurin...
Webinar - Real-Time Customer Experience for the Right-Now Enterprise featurin...DataStax
 
Datastax - The Architect's guide to customer experience (CX)
Datastax - The Architect's guide to customer experience (CX)Datastax - The Architect's guide to customer experience (CX)
Datastax - The Architect's guide to customer experience (CX)DataStax
 
An Operational Data Layer is Critical for Transformative Banking Applications
An Operational Data Layer is Critical for Transformative Banking ApplicationsAn Operational Data Layer is Critical for Transformative Banking Applications
An Operational Data Layer is Critical for Transformative Banking ApplicationsDataStax
 
Becoming a Customer-Centric Enterprise Via Real-Time Data and Design Thinking
Becoming a Customer-Centric Enterprise Via Real-Time Data and Design ThinkingBecoming a Customer-Centric Enterprise Via Real-Time Data and Design Thinking
Becoming a Customer-Centric Enterprise Via Real-Time Data and Design ThinkingDataStax
 

Mehr von DataStax (20)

Is Your Enterprise Ready to Shine This Holiday Season?
Is Your Enterprise Ready to Shine This Holiday Season?Is Your Enterprise Ready to Shine This Holiday Season?
Is Your Enterprise Ready to Shine This Holiday Season?
 
Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas...
Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas...Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas...
Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas...
 
Running DataStax Enterprise in VMware Cloud and Hybrid Environments
Running DataStax Enterprise in VMware Cloud and Hybrid EnvironmentsRunning DataStax Enterprise in VMware Cloud and Hybrid Environments
Running DataStax Enterprise in VMware Cloud and Hybrid Environments
 
Best Practices for Getting to Production with DataStax Enterprise Graph
Best Practices for Getting to Production with DataStax Enterprise GraphBest Practices for Getting to Production with DataStax Enterprise Graph
Best Practices for Getting to Production with DataStax Enterprise Graph
 
Webinar | Data Management for Hybrid and Multi-Cloud: A Four-Step Journey
Webinar | Data Management for Hybrid and Multi-Cloud: A Four-Step JourneyWebinar | Data Management for Hybrid and Multi-Cloud: A Four-Step Journey
Webinar | Data Management for Hybrid and Multi-Cloud: A Four-Step Journey
 
Webinar | How to Understand Apache Cassandraℱ Performance Through Read/Writ...
Webinar  |  How to Understand Apache Cassandraℱ Performance Through Read/Writ...Webinar  |  How to Understand Apache Cassandraℱ Performance Through Read/Writ...
Webinar | How to Understand Apache Cassandraℱ Performance Through Read/Writ...
 
Webinar | Better Together: Apache Cassandra and Apache Kafka
Webinar  |  Better Together: Apache Cassandra and Apache KafkaWebinar  |  Better Together: Apache Cassandra and Apache Kafka
Webinar | Better Together: Apache Cassandra and Apache Kafka
 
Top 10 Best Practices for Apache Cassandra and DataStax Enterprise
Top 10 Best Practices for Apache Cassandra and DataStax EnterpriseTop 10 Best Practices for Apache Cassandra and DataStax Enterprise
Top 10 Best Practices for Apache Cassandra and DataStax Enterprise
 
Introduction to Apache Cassandraℱ + What’s New in 4.0
Introduction to Apache Cassandraℱ + What’s New in 4.0Introduction to Apache Cassandraℱ + What’s New in 4.0
Introduction to Apache Cassandraℱ + What’s New in 4.0
 
Webinar: How Active Everywhere Database Architecture Accelerates Hybrid Cloud...
Webinar: How Active Everywhere Database Architecture Accelerates Hybrid Cloud...Webinar: How Active Everywhere Database Architecture Accelerates Hybrid Cloud...
Webinar: How Active Everywhere Database Architecture Accelerates Hybrid Cloud...
 
Webinar | Aligning GDPR Requirements with Today's Hybrid Cloud Realities
Webinar  |  Aligning GDPR Requirements with Today's Hybrid Cloud RealitiesWebinar  |  Aligning GDPR Requirements with Today's Hybrid Cloud Realities
Webinar | Aligning GDPR Requirements with Today's Hybrid Cloud Realities
 
Designing a Distributed Cloud Database for Dummies
Designing a Distributed Cloud Database for DummiesDesigning a Distributed Cloud Database for Dummies
Designing a Distributed Cloud Database for Dummies
 
How to Power Innovation with Geo-Distributed Data Management in Hybrid Cloud
How to Power Innovation with Geo-Distributed Data Management in Hybrid CloudHow to Power Innovation with Geo-Distributed Data Management in Hybrid Cloud
How to Power Innovation with Geo-Distributed Data Management in Hybrid Cloud
 
How to Evaluate Cloud Databases for eCommerce
How to Evaluate Cloud Databases for eCommerceHow to Evaluate Cloud Databases for eCommerce
How to Evaluate Cloud Databases for eCommerce
 
Webinar: DataStax Enterprise 6: 10 Ways to Multiply the Power of Apache Cassa...
Webinar: DataStax Enterprise 6: 10 Ways to Multiply the Power of Apache Cassa...Webinar: DataStax Enterprise 6: 10 Ways to Multiply the Power of Apache Cassa...
Webinar: DataStax Enterprise 6: 10 Ways to Multiply the Power of Apache Cassa...
 
Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi...
Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi...Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi...
Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi...
 
Webinar - Real-Time Customer Experience for the Right-Now Enterprise featurin...
Webinar - Real-Time Customer Experience for the Right-Now Enterprise featurin...Webinar - Real-Time Customer Experience for the Right-Now Enterprise featurin...
Webinar - Real-Time Customer Experience for the Right-Now Enterprise featurin...
 
Datastax - The Architect's guide to customer experience (CX)
Datastax - The Architect's guide to customer experience (CX)Datastax - The Architect's guide to customer experience (CX)
Datastax - The Architect's guide to customer experience (CX)
 
An Operational Data Layer is Critical for Transformative Banking Applications
An Operational Data Layer is Critical for Transformative Banking ApplicationsAn Operational Data Layer is Critical for Transformative Banking Applications
An Operational Data Layer is Critical for Transformative Banking Applications
 
Becoming a Customer-Centric Enterprise Via Real-Time Data and Design Thinking
Becoming a Customer-Centric Enterprise Via Real-Time Data and Design ThinkingBecoming a Customer-Centric Enterprise Via Real-Time Data and Design Thinking
Becoming a Customer-Centric Enterprise Via Real-Time Data and Design Thinking
 

KĂŒrzlich hochgeladen

The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is insideshinachiaurasa2
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsJhone kinadey
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...masabamasaba
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension AidPhilip Schwarz
 
%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Harare%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Hararemasabamasaba
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsArshad QA
 
WSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2
 
call girls in Vaishali (Ghaziabad) 🔝 >àŒ’8448380779 🔝 genuine Escort Service đŸ”âœ”ïžâœ”ïž
call girls in Vaishali (Ghaziabad) 🔝 >àŒ’8448380779 🔝 genuine Escort Service đŸ”âœ”ïžâœ”ïžcall girls in Vaishali (Ghaziabad) 🔝 >àŒ’8448380779 🔝 genuine Escort Service đŸ”âœ”ïžâœ”ïž
call girls in Vaishali (Ghaziabad) 🔝 >àŒ’8448380779 🔝 genuine Escort Service đŸ”âœ”ïžâœ”ïžDelhi Call girls
 
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park masabamasaba
 
%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrandmasabamasaba
 
Harnessing ChatGPT - Elevating Productivity in Today's Agile Environment
Harnessing ChatGPT  - Elevating Productivity in Today's Agile EnvironmentHarnessing ChatGPT  - Elevating Productivity in Today's Agile Environment
Harnessing ChatGPT - Elevating Productivity in Today's Agile EnvironmentVictorSzoltysek
 
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...WSO2
 
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesVictorSzoltysek
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...Jittipong Loespradit
 
WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2
 
%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in sowetomasabamasaba
 
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfonteinmasabamasaba
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...Health
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...masabamasaba
 

KĂŒrzlich hochgeladen (20)

The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is inside
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
 
%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Harare%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Harare
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
WSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go Platformless
 
call girls in Vaishali (Ghaziabad) 🔝 >àŒ’8448380779 🔝 genuine Escort Service đŸ”âœ”ïžâœ”ïž
call girls in Vaishali (Ghaziabad) 🔝 >àŒ’8448380779 🔝 genuine Escort Service đŸ”âœ”ïžâœ”ïžcall girls in Vaishali (Ghaziabad) 🔝 >àŒ’8448380779 🔝 genuine Escort Service đŸ”âœ”ïžâœ”ïž
call girls in Vaishali (Ghaziabad) 🔝 >àŒ’8448380779 🔝 genuine Escort Service đŸ”âœ”ïžâœ”ïž
 
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
 
%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand
 
Harnessing ChatGPT - Elevating Productivity in Today's Agile Environment
Harnessing ChatGPT  - Elevating Productivity in Today's Agile EnvironmentHarnessing ChatGPT  - Elevating Productivity in Today's Agile Environment
Harnessing ChatGPT - Elevating Productivity in Today's Agile Environment
 
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
 
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
 
WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?
 
%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto
 
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
 

Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Summit 2016

  • 1. Deletes Without Tombstones or TTLs. Eric Stevens, Principal Architect, ProtectWise, Inc.
  • 2. About ProtectWise
    An enterprise security company that records, analyzes, and visualizes your network on demand to detect complex threats that others can’t see.
    Data Ingestion and Availability
    ● Well north of a billion new records per day
    ● Processed, analyzed, and stored in soft real time
    ● Fully indexed and searchable with p95 query response times under 1 second
      ○ Shortening the OODA loop
    Big Data
    ● Hundreds of Cassandra servers
    ● Hundreds of billions of records
    ● Multiple petabytes of data
  • 3. Use Case – Super Bowl 50: The Broncos weren’t the only team from Denver in Levi’s Stadium
    With one sensor, ProtectWise captured the following data at Super Bowl 50:
    ● 8.806 terabytes of data seen, primarily HTTP, SSL, and traffic to Amazon AWS, Facebook, Twitter, and Instagram
    ● 1.550 terabytes of data captured (82% optimization)
    ● 17 million URLs hit
    ● 8,085,949 DNS requests
    With a single sensor deployed on the Levi's public Wi-Fi network, ProtectWise captured 8.806 terabytes of data and optimized it by 82% to just 1.550 terabytes, a testament to the scale and power of our platform.
  • 4. Overview
    ● How Deletes (Tombstones) in Cassandra Work Today
    ● The Limitations of Tombstones
    ● Misconceptions about Tombstones
    ● How TTL (Time to Live) in Cassandra Works Today
    ● The Limitations of TTLs
    ● Why Neither Strategy Works for ProtectWise
    ● Our Unconventional Solution
    ● Advantages of Our Solution
    ● Disadvantages of Our Solution
  • 5. The Trouble with Tombstones
    Terrible
    ● Increases both write and read I/O pressure
    ● Not an effective means of reclaiming disk capacity
    ● May be difficult to locate the correct records for deletion
    ● Makes reads more expensive
    ● Actual tombstones can often greatly outlive their deleted data (much longer than gc_grace)
    Terrific
    ● Surgically target data for removal
    ● Easy to reason about from a read consistency perspective
  • 6. When do tombstones (and expired TTL’d records) go away?
    ● Never before they are gc_grace old (this is a good thing, and you get to control it).
    ● During compaction, the partition key of a tombstone older than gc_grace is checked against the bloom filters of all other SSTables for the given CQL table.
    ● If there is a bloom filter collision, the tombstone remains, even if the collision was a false positive.
    ● If there is ANY data for that partition in any other SSTable, even other tombstones, the tombstone will not get cleaned up.
    ● If the bloom filters indicate there is no chance of overlap on that partition key, the tombstone will get cleaned up.
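The purge rules on this slide can be modelled as a small decision function. This is a minimal sketch that mirrors the slide's rules, not Cassandra's internal compaction code. The `SSTable` class and `can_purge_tombstone` function are hypothetical. A plain set stands in for each bloom filter, and an extra `false_positives` set simulates collisions.

```python
from dataclasses import dataclass, field


@dataclass
class SSTable:
    """Hypothetical, simplified SSTable: only the partition keys it holds.

    `false_positives` simulates bloom filter collisions: keys the filter
    reports as present even though the SSTable does not hold them.
    """
    name: str
    partition_keys: set = field(default_factory=set)
    false_positives: set = field(default_factory=set)

    def bloom_might_contain(self, key) -> bool:
        # A bloom filter never gives false negatives, only false positives.
        return key in self.partition_keys or key in self.false_positives


def can_purge_tombstone(key, deleted_at, gc_grace_seconds, other_sstables, now):
    """Return True if a tombstone for partition `key` may be dropped.

    `other_sstables` are SSTables of the same table that are not part of
    the compaction currently being run.
    """
    # Never purge a tombstone that is younger than gc_grace_seconds.
    if now - deleted_at < gc_grace_seconds:
        return False
    # Any bloom filter hit, true or false positive, means data (or another
    # tombstone) for this partition may exist elsewhere, so keep it.
    return not any(t.bloom_might_contain(key) for t in other_sstables)


now = 1_000_000
grace = 864_000          # Cassandra's default gc_grace_seconds (10 days)
old = now - 900_000      # tombstone written about 10.4 days ago

print(can_purge_tombstone("pk1", old, grace, [SSTable("a", {"pk2"})], now))  # True
print(can_purge_tombstone("pk1", old, grace,
                          [SSTable("b", {"pk2"}, {"pk1"})], now))           # False: false positive
print(can_purge_tombstone("pk1", old, grace, [SSTable("c", {"pk1"})], now))  # False: real data
print(can_purge_tombstone("pk1", now - 10, grace, [], now))                 # False: too young
```

The second call shows the slide's point about false positives. A collision alone keeps the tombstone alive, even though no SSTable actually holds the partition.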
  • 7. Misconception about Tombstone Performance
    ● The performance degradation from tombstones isn’t from the tombstone itself.
    ● If you do
      ○ for (n <- 0 to 100000) { INSERT INTO table (partitionKey, clusterKey) VALUES (1, n) }
    ● you can later create a range tombstone that is tiny in bytes:
      ○ DELETE FROM table WHERE partitionKey = 1 AND clusterKey < 99999
    ● But if you then run
      ○ SELECT * FROM table WHERE partitionKey = 1 LIMIT 1
    ● Cassandra will have to read and then discard the rows with clusterKey values 0 to 99998 before the LIMIT 1 can be reached.
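The slide's claim can be illustrated with a toy model of the read. This is a minimal sketch of the cost argument, not Cassandra's read path. The hypothetical `read_with_limit` function visits rows in clustering order, discards every row shadowed by the range tombstone, and counts how many rows it had to touch.

```python
def read_with_limit(clustering_keys, tombstone_upper_bound, limit):
    """Toy read of one partition under a range tombstone.

    Rows are visited in clustering order; any row whose key is below
    `tombstone_upper_bound` is shadowed. Returns (live_rows, rows_scanned).
    """
    live, scanned = [], 0
    for ck in clustering_keys:
        scanned += 1                    # the row is still read from disk...
        if ck < tombstone_upper_bound:
            continue                    # ...and then discarded as deleted
        live.append(ck)
        if len(live) == limit:
            break
    return live, scanned


# Mirrors the slide: keys 0..100000 inserted (Scala's `0 to 100000` is inclusive),
# then DELETE ... WHERE clusterKey < 99999, then SELECT ... LIMIT 1.
live, scanned = read_with_limit(range(0, 100001), 99999, limit=1)
print(live, scanned)  # [99999] 100000
```

The tombstone itself is a single small marker, but 99,999 of the 100,000 rows visited are thrown away before the one live row is returned.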
  • 8. [Diagram: SSTable 1 holds partition PK1 with clustering keys CK1 … CKn; SSTable 2 holds a DELETE on PK1 covering CK1 to CKn-1; then SELECT * FROM table WHERE pk1 LIMIT 1]
  • 9. Compaction Review [Diagram: incoming writes (↑) arrive as new SSTables; older data ← → newer data]
  • 10. Tombstones in Compaction [Diagram: the delete (↑) arrives as a new SSTable; an arrow (↑) marks the older SSTable containing the record to delete]
  • 11. Tombstones in Compaction [Diagram: other writes (↑) keep arriving as new SSTables; the SSTable containing the record to delete has not yet compacted with the tombstone]
  • 14. Tombstones in Compaction [Diagram: the tombstone and the SSTable containing the record finally take part in the same compaction: "Finally Deleted"]
  • 15. Tombstone Demo: showing why a tombstone is not the same thing as a delete
  • 16. Setup
    cqlsh> CREATE TABLE testing (
      p blob,
      c blob,
      v blob,
      PRIMARY KEY (p, c)
    ) WITH gc_grace_seconds = 0;
  • 17. Setup
    cqlsh> INSERT INTO testing (p, c, v) VALUES (0xcafebabe, 0xdeadbeef, 0xfacefeed);
    $ nodetool flush && ls *-Data.db
    testing-testing-ka-1-Data.db
    cqlsh> INSERT INTO testing (p, c, v) VALUES (0xcafebabe, 0xdeadbeef, 0xdeadc0de);
    $ nodetool flush && ls *-Data.db
    testing-testing-ka-1-Data.db testing-testing-ka-2-Data.db
    [Diagram: SSTable 1 = 0xcafebabe:0xdeadbeef:0xfacefeed; SSTable 2 = 0xcafebabe:0xdeadbeef:0xdeadc0de]
  • 18. Setup
    cqlsh> select * from testing;
     p          | c          | v
    ------------+------------+------------
     0xcafebabe | 0xdeadbeef | 0xdeadc0de
    cqlsh> DELETE FROM testing WHERE p = 0xcafebabe AND c = 0xdeadbeef;
    $ nodetool flush && ls *-Data.db
    testing-testing-ka-1-Data.db testing-testing-ka-2-Data.db testing-testing-ka-3-Data.db
    [Diagram: SSTable 1 = 0xcafebabe:0xdeadbeef:0xfacefeed; SSTable 2 = 0xcafebabe:0xdeadbeef:0xdeadc0de; SSTable 3 = 0xcafebabe:0xdeadbeef:DELETE]
  • 19. Let’s look at the data
    $ hexdump testing-testing-ka-1-Data.db
    0000000 4b 00 00 00 c3 00 04 ca fe ba be 7f ff ff ff 80
    0000010 00 01 00 72 0a 00 04 de ad be ef 0e 00 71 05 34
    0000020 3b d8 4e df f1 0d 00 14 0b 19 00 29 01 76 1a 00
    0000030 70 04 fa ce fe ed 00 00 6f 9b 15 17
    [SSTable 1 = 0xcafebabe:0xdeadbeef:0xfacefeed]
  • 20. Let’s look at the data
    $ hexdump testing-testing-ka-2-Data.db
    0000000 4b 00 00 00 c3 00 04 ca fe ba be 7f ff ff ff 80
    0000010 00 01 00 72 0a 00 04 de ad be ef 0e 00 71 05 34
    0000020 3b e3 86 df 23 0d 00 14 0b 19 00 29 01 76 1a 00
    0000030 70 04 de ad c0 de 00 00 62 de 14 02
    [SSTable 2 = 0xcafebabe:0xdeadbeef:0xdeadc0de]
  • 21. Let’s look at the data
    $ hexdump testing-testing-ka-3-Data.db
    0000000 33 00 00 00 c3 00 04 ca fe ba be 7f ff ff ff 80
    0000010 00 01 00 94 07 00 04 de ad be ef ff 10 0a 00 f0
    0000020 00 01 57 4f 2d 69 00 05 34 3b e6 ab 47 c8 00 00
    0000030 db 77 12 69
    [SSTable 3 = 0xcafebabe:0xdeadbeef:DELETE]
  • 22. Time to Compact
    Simulate compaction happening on data that has been deleted, but where the tombstone is not involved in the compaction.
    % jmx_invoke -m org.apache.cassandra.db:type=CompactionManager forceUserDefinedCompaction testing-testing-ka-1-Data.db,testing-testing-ka-2-Data.db
    $ ls *-Data.db
    testing-testing-ka-3-Data.db testing-testing-ka-4-Data.db
    [Diagram: SSTable 1 = 0xcafebabe:0xdeadbeef:0xfacefeed and SSTable 2 = 0xcafebabe:0xdeadbeef:0xdeadc0de compact into SSTable 4 = 0xcafebabe:0xdeadbeef:??????????]
  • 23. Let’s look again:
    $ hexdump testing-testing-ka-4-Data.db
    0000000 4b 00 00 00 c3 00 04 ca fe ba be 7f ff ff ff 80
    0000010 00 01 00 72 0a 00 04 de ad be ef 0e 00 71 05 34
    0000020 3b e3 86 df 23 0d 00 14 0b 19 00 29 01 76 1a 00
    0000030 70 04 de ad c0 de 00 00 62 de 14 02
    [SSTable 4 = 0xcafebabe:0xdeadbeef:0xdeadc0de]
  • 24. What happened?
    ● The tombstone for primary key (0xcafebabe, 0xdeadbeef) was written in SSTable 3
    ● SSTable 3 wasn't involved in the compaction
    ● Therefore, the data at rest didn't get cleaned up
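The demo can be replayed with a toy model of compaction. In this model the newest timestamp wins for each key, and a tombstone can only remove data that is in the same compaction. The `Cell` class and `compact` function are hypothetical simplifications: there is no real SSTable format and there are no bloom filters, and the timestamps are reduced to 1, 2 and 3.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Cell:
    key: Tuple[str, str]   # (p, c)
    value: Optional[str]   # None marks a tombstone
    timestamp: int


def compact(inputs, others=()):
    """Merge SSTables (lists of Cells): the newest timestamp wins per key.

    A winning tombstone is dropped only if no SSTable outside this compaction
    (`others`) holds the same partition key. gc_grace_seconds=0 is assumed,
    as in the demo table.
    """
    newest = {}
    for sstable in inputs:
        for cell in sstable:
            current = newest.get(cell.key)
            if current is None or cell.timestamp > current.timestamp:
                newest[cell.key] = cell
    outside = {cell.key[0] for sstable in others for cell in sstable}
    return [cell for cell in newest.values()
            if cell.value is not None or cell.key[0] in outside]


KEY = ("0xcafebabe", "0xdeadbeef")
sstable1 = [Cell(KEY, "0xfacefeed", 1)]
sstable2 = [Cell(KEY, "0xdeadc0de", 2)]
sstable3 = [Cell(KEY, None, 3)]          # the DELETE

# Slide 22: compact SSTables 1 and 2 while the tombstone sits outside the compaction.
sstable4 = compact([sstable1, sstable2], others=[sstable3])
print([cell.value for cell in sstable4])  # ['0xdeadc0de']

# Only when the tombstone and the data share a compaction does the data go away.
sstable5 = compact([sstable3, sstable4])
print(sstable5)                           # []
```

The older `0xfacefeed` value is still reclaimed in the first compaction because it is shadowed by a newer cell in the same compaction. The deleted value survives because its tombstone is not in that compaction.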
  • 25. Why is this a problem?
    ● In all mainline compaction strategies:
      ○ Data written close together chronologically tends to compact together relatively quickly
      ○ Data written chronologically far apart tends to take a long time to compact together
        ■ This is why it's an anti-pattern to append to or overwrite the same partition over long periods of time: reads of that partition will end up needing to read from a large number of SSTables
      ○ Because disk capacity is not recovered until the tombstone and its underlying data are involved in the same compaction, it can take a long time to recover disk capacity
    ● Some compaction strategies (DateTiered, TimeWindowed) have controls that allow data to permanently stop compacting
      ○ Under these conditions, there are times when it's impossible to ever recover disk capacity
    Note: see CASSANDRA-7019 for an upcoming alternative. Also see "Improving Tombstone Compactions" today at 4:10 in 210C.
  • 26. The Trouble with TTLs
    Trouble
    ● Once a TTL has been written, there is no way to change your mind except to write the record again with a new TTL
    ● Rows written to more than once may have inconsistent TTLs, leading to dirty or incomplete reads
    ● TTL'd records may remain at rest much longer than you realize in some circumstances
    Terrific
    ● Fire and forget: your data will "go away" fairly predictably
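The second bullet (inconsistent TTLs on rows written more than once) can be illustrated with a toy model. Each column carries its own write time and TTL, as cells do in Cassandra. `Cell` and `read_row` are hypothetical names, and the model ignores row markers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    value: str
    written_at: int  # seconds
    ttl: int         # seconds

    def live_at(self, now: int) -> bool:
        return now < self.written_at + self.ttl


def read_row(row: dict, now: int) -> dict:
    """Return only the columns whose TTL has not yet expired."""
    return {col: cell.value for col, cell in row.items() if cell.live_at(now)}


# The row is first written with a 100-second TTL on every column.
row = {
    "a": Cell("x", written_at=0, ttl=100),
    "b": Cell("y", written_at=0, ttl=100),
}
# A later partial update rewrites only column b, with a longer TTL.
row["b"] = Cell("y2", written_at=50, ttl=1000)

print(read_row(row, now=60))    # {'a': 'x', 'b': 'y2'}
print(read_row(row, now=120))   # {'b': 'y2'}  (column a already expired)
print(read_row(row, now=1100))  # {}
```

Between t=100 and t=1050 a reader sees a half-expired row. No single TTL describes when the record as a whole "goes away".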
  • 27. Why Neither Strategy Works for Us
    ● Customers get to change their mind about how long they want us to retain their data
    ● Changing TTLs is expensive, both in I/O pressure and in temporarily doubling the size of your data at rest
    ● Disks are cheap; lots of disks are not
    ● Cassandra data at rest has an ongoing cost: if a customer stops paying for it, we need to stop paying for it too
    ● Timeliness of deletes is important
    ● Sensitive data spillage means we need to remove some data quickly
  • 29. Basic Strategy
    Successfully used to delete significant amounts of data with little to no performance impact
    Step 1: Set RF=1
    ● There are some weird anti-entropy corner cases that are solved if you disable replication
    Step 2: Disconnect Drive
    ● If you have hot-swappable drives, this is a lot easier; if not, you might have some temporary downtime due to the RF change
  • 30. Step 3
  • 32. Basic Strategy (For real this time.)
    Delete While Compacting
    ● During compaction, use deterministic logic to determine which records should be removed
    ● Prevent those records from surviving the compaction process
    ● Clean up indexes at the time the record is removed
    Evicting Compaction Strategy
    ● Records are dropped by the next compaction they take part in, as soon as they should be evicted
    ● If we need to recover capacity quickly, we can use user-defined compaction to selectively target our oldest files
  • 33. Features
    Does it support feature X of my preferred compaction strategy?
    Wrapping Compaction Strategy
    ● Acts as a parent strategy with your preferred child compaction strategy
    ● The child strategy is responsible for SSTable selection
    ● You get the characteristics of your strategy, with the deletes of our strategy
    Backing up your deletes
    ● If you choose to, you can automatically create a backup of the deleted records
    ● Save yourself from deletion remorse:
      ○ Incorrect deletion logic
      ○ Change of heart by you(r customer)
    ● Move those records to cheaper storage
  • 34. Features
    ● Configurable and extensible
      ○ Several provided implementations can be controlled fairly surgically by reading deletion rules out of a table you specify
      ○ Extend one of several base classes to provide more sophisticated custom logic
    ● Restoring backups
      ○ To restore accidentally deleted records, copy these files to the right path and run nodetool refresh
      ○ Or, if your topology has changed, restore them with sstableloader
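A rules-driven convictor might look roughly like the sketch below. Everything in it is hypothetical and chosen only for illustration: the rule shape (`customer_id` → `delete_before_day`), the primary-key layout, and the names `load_rules` and `convict`. The one property it deliberately mirrors from the talk is that the verdict depends only on the primary key and the rules.

```python
from typing import Mapping, NamedTuple


class PrimaryKey(NamedTuple):
    customer_id: str
    day: int          # hypothetical day bucket, e.g. days since epoch
    record_id: str


def load_rules(rows) -> dict:
    """Build {customer_id: delete_before_day} from rows of a hypothetical
    deletion-rules table, e.g. SELECT customer_id, delete_before_day FROM rules."""
    return {customer_id: cutoff for customer_id, cutoff in rows}


def convict(key: PrimaryKey, rules: Mapping[str, int]) -> bool:
    """True if the record should not survive compaction.

    Uses only the primary key and the rules, so every replica compacting the
    record with the same rules reaches the same verdict.
    """
    cutoff = rules.get(key.customer_id)
    return cutoff is not None and key.day < cutoff


rules = load_rules([("acme", 17000), ("globex", 16900)])
keys = [
    PrimaryKey("acme", 16990, "r1"),    # older than acme's cutoff: evicted
    PrimaryKey("acme", 17005, "r2"),    # kept
    PrimaryKey("globex", 16950, "r3"),  # kept (newer than globex's cutoff)
    PrimaryKey("initech", 1, "r4"),     # no rule: kept
]
evicted = [k.record_id for k in keys if convict(k, rules)]
print(evicted)  # ['r1']
```

Keeping rules in a table means a customer's change of heart is a single row update. Records are then evicted as their SSTables next compact.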
  • 35. A Wrapping Compaction Strategy
    Doesn't change the fundamental characteristics of your preferred compaction strategy
    ALTER TABLE bar WITH compaction = {
      'class': 'DeletingCompactionStrategy',
      'dcs_underlying_compactor': 'LeveledCompactionStrategy',
      'sstable_size_in_mb': 160
    };
    ALTER TABLE foo WITH compaction = {
      'class': 'DeletingCompactionStrategy',
      'dcs_underlying_compactor': 'SizeTieredCompactionStrategy',
      'min_threshold': '2',
      'max_threshold': '8'
    };
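The sketch below shows one plausible way a wrapping strategy could consume these options. The function is hypothetical and is not the project's code. It assumes the wrapper keeps `class` and the `dcs_`-prefixed keys for itself and hands every other option to the child strategy named by `dcs_underlying_compactor`.

```python
def split_compaction_options(options: dict) -> tuple:
    """Hypothetical: separate wrapper-level settings from child-strategy settings."""
    child_class = options["dcs_underlying_compactor"]
    child_options = {k: v for k, v in options.items()
                     if k != "class" and not k.startswith("dcs_")}
    return child_class, child_options


bar = {
    "class": "DeletingCompactionStrategy",
    "dcs_underlying_compactor": "LeveledCompactionStrategy",
    "sstable_size_in_mb": 160,
}
foo = {
    "class": "DeletingCompactionStrategy",
    "dcs_underlying_compactor": "SizeTieredCompactionStrategy",
    "min_threshold": "2",
    "max_threshold": "8",
}
print(split_compaction_options(bar))
# ('LeveledCompactionStrategy', {'sstable_size_in_mb': 160})
print(split_compaction_options(foo))
# ('SizeTieredCompactionStrategy', {'min_threshold': '2', 'max_threshold': '8'})
```

Passing the remaining options through untouched is what lets the child strategy keep its usual tuning knobs.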
  • 36. Compaction’s Inner Workings
    [Diagram: Cassandra write path] Credit: DataStax https://docs.datastax.com/en/cassandra/2.1/cassandra/dml/dml_write_path_c.html
  • 37. Compaction’s Inner Workings
    The compaction strategy selects SSTables and returns SSTableIterators.
    Credit: DataStax https://docs.datastax.com/en/cassandra/2.1/cassandra/dml/dml_write_path_c.html
  • 38. Compaction’s Inner Workings
    FilteringSSTableIterators exclude data that should be deleted, and also notify the IndexManager, if appropriate, to clean up associated indexes.
    Credit: DataStax https://docs.datastax.com/en/cassandra/2.1/cassandra/dml/dml_write_path_c.html
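The filtering step can be modelled as a generator wrapped around the records coming out of the selected SSTables. This is an illustrative sketch, not the Java `FilteringSSTableIterator` or Cassandra's index API. The `(key, value)` record shape, the `on_evict` callback standing in for the IndexManager notification, and the toy value-to-keys index are all assumptions.

```python
from typing import Callable, Dict, Iterable, Iterator, Set, Tuple

Record = Tuple[str, str]  # (primary key, indexed value); a deliberate simplification


def filtering_iterator(records: Iterable[Record],
                       convict: Callable[[str], bool],
                       on_evict: Callable[[str, str], None]) -> Iterator[Record]:
    """Yield the records that survive compaction; report each evicted one so
    that associated index entries can be cleaned up."""
    for key, value in records:
        if convict(key):
            on_evict(key, value)
            continue
        yield key, value


# Toy secondary index: indexed value -> primary keys carrying it.
index: Dict[str, Set[str]] = {"red": {"k1", "k3"}, "blue": {"k2"}}


def drop_from_index(key: str, value: str) -> None:
    index[value].discard(key)


survivors = list(filtering_iterator(
    [("k1", "red"), ("k2", "blue"), ("k3", "red")],
    convict=lambda key: key == "k3",
    on_evict=drop_from_index,
))
print(survivors)  # [('k1', 'red'), ('k2', 'blue')]
print(index)      # {'red': {'k1'}, 'blue': {'k2'}}
```

Because the filter is lazy, convicted records are never written to the new SSTable at all. This is also the root of the write-pressure savings on slide 40.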
  • 39. An Evicting Compaction Strategy
    Records involved in compaction which are convicted do not survive into the newly compacted SSTable
    Rules: A => ✓  B => ✗  C => ✓  D => ✗  E => ✓
    [Diagram: SSTable 1 (A B C), SSTable 2 (A B D) and SSTable 3 (C D E) compact into a New SSTable (A C E) and a Backup SSTable* (B D)]
    * if configured to back up convicted records
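The slide's example can be reproduced with plain Python sets. Records are modelled as bare keys, and timestamps and cell merging are ignored, so this is only a sketch of the conviction and backup split. `evicting_compaction` is a hypothetical name.

```python
def evicting_compaction(sstables, convicted, backup=True):
    """Merge SSTables; convicted records go to an optional backup SSTable
    instead of the new one. Records are plain keys (a simplification)."""
    merged = sorted(set().union(*sstables))  # compaction merges duplicate keys
    new_sstable = [r for r in merged if r not in convicted]
    backup_sstable = [r for r in merged if r in convicted] if backup else []
    return new_sstable, backup_sstable


sstables = [{"A", "B", "C"}, {"A", "B", "D"}, {"C", "D", "E"}]
rules = {"A": True, "B": False, "C": True, "D": False, "E": True}  # True = survives (✓)
convicted = {key for key, survives in rules.items() if not survives}

new, backup = evicting_compaction(sstables, convicted)
print(new)     # ['A', 'C', 'E']
print(backup)  # ['B', 'D']
```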
  • 40. Often Faster than Existing Compaction
    ● Compaction performance is often bounded by available write capacity
    ● Fewer records surviving into the target SSTable reduces write pressure during compaction
    ● Testing records for conviction is lightweight (depending on the complexity of your business logic) and mostly CPU-bound
  • 41. Limitations
    Eventual Deletes
    ● As with all other built-in deletion options, disk capacity is reclaimed only eventually
    ● Old SSTables still tend not to compact very frequently
    ● However, by triggering user-defined compaction, you can reclaim space immediately without resorting to major compaction
    Boundary Consistency
    ● Records past the deletion boundary may still be visible to your application
    ● You may get inconsistent reads for such records
    ● Evicted records may resurrect temporarily due to repair
      ○ They'll end up in a new SSTable and will be evicted again during the next automatic compaction
  • 42. Limitations
    Requires deletion determinism
    ● Logic for deletes needs to be deterministic, or you'll end up with consistency issues
    ● It is probably not a good idea to base any deletion logic on anything outside of the primary key, except in narrow use cases
    Repair = Resurrection
    ● Read repair, and in general any repair, may cause a record to fully resurrect temporarily
    ● The resurrected record will appear in the youngest SSTables
    ● It will disappear again when those new SSTables next compact (generally relatively quickly for an active cluster)
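Two toy replicas show why the deletion logic should stay within the primary key. They disagree about a value because one replica missed a write. `compact`, `repair`, and both rules are hypothetical simplifications. Real repair reconciles by timestamp; in this model a record present on either side simply ends up on both.

```python
def compact(replica, convict):
    """Keep only the records the convictor does not reject."""
    return {key: value for key, value in replica.items() if not convict(key, value)}


def repair(a, b):
    """Toy anti-entropy: a record present on either replica ends up on both.
    There are no tombstones, so an evicted record just looks missing."""
    return {**b, **a}


def by_value(key, value):
    return value == "status=expired"   # looks beyond the primary key


def by_key(key, value):
    return key[1] < 17000              # primary key only: (customer, day)


KEY = ("acme", 16990)
replica1 = {KEY: "status=expired"}     # saw the latest write
replica2 = {KEY: "status=active"}      # missed it (stale value)

# Value-based rule: replicas disagree, and repair brings back a stale record
# that this rule will never evict again.
healed = repair(compact(replica1, by_value), compact(replica2, by_value))
print(compact(healed, by_value))       # {('acme', 16990): 'status=active'}

# Key-based rule: even if repair resurrects the record before replica 2
# compacts, the next compaction evicts it again.
resurrected = repair(compact(replica1, by_key), replica2)
print(resurrected)                     # {('acme', 16990): 'status=active'}
print(compact(resurrected, by_key))    # {}
```

With the key-based rule the resurrection is temporary, as the slide describes. With the value-based rule it becomes permanent.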
  • 43. Current Project Status
    ● Supports, and is tested against, the Cassandra 2.x series
    ● In 3.x the package and class names changed, so it needs to be ported
    ● Tests are written in Scala; they cover a lot of surface area but would need to be rewritten prior to contribution
    ● Needs additional general-purpose convictors
    ● Principally tested against STCS; deserves better coverage for other child strategies
  • 44. GitHub Availability & Compatibility
    https://github.com/protectwise/cassandra-util
    Also includes:
    ● Our DataStax Driver Wrapper for Scala
    ● Our CCM wrapper lib for automating unit tests in Scala
  • 45. We’re Hiring! Especially if you’re in Denver!
    Scala, Akka, Spark, Node, DevOps
    www.protectwise.com/careers.html
  • 47. See Our Other Talks
    ● Cold Storage that Isn’t Glacial: tomorrow 10:45, Room LL20D
    ● Using Approximate Data for Small, Insightful Analytics: tomorrow 2:00, Room LL20A

Speaker Notes

  1. Essentially, a tombstone will never go away as long as its partition has data in more than one SSTable, and sometimes not even then (bloom filter collisions).
  2. When you write to Cassandra, the writes initially go to memtables. When a memtable gets full, it is flushed to disk as an immutable SSTable. When you perform a read, Cassandra needs to consider all the SSTables on disk, so as you accumulate lots of small SSTables, read performance degrades.
  3. What do you think will be in SSTable 4? Ideally it should be an empty table, since the only record in it has been deleted. However

  4. What happened?