SlideShare a Scribd company logo
1 of 37
Download to read offline
Cassandra Days | Sponsored by
5 Developer Pitfalls
With Apache Cassandra
Cassandra Day |
Discover the real power of NoSQL
5 Pitfalls of
Cassandra Developers
BERLIN
September 20th
clunven
Cassandra Days | Sponsored by
Cédrick Lunven, Director of Developer Relations
➢ Trainer
➢ Public Speaker
➢ Developers Support
➢ Developer Applications
➢ Developer Tooling
➢ Creator of ff4j (ff4j.org)
➢ Maintainer for 8 years+
➢ Happy developer for 14 years
➢ Spring Petclinic Reactive & Starters
➢ Implementing APIs for 8 years
clunven
clun
clunven
Cassandra Days | Sponsored by
5 Developer Pitfalls
With Apache Cassandra
Cassandra Days | Sponsored by
Exploring Apis
Document oriented
Shape your requests up !
It is not “just CQL”
02
01
05
Data Modeling
The good, the bad, the ugly
03
Developers Horror Museum
Session, Object Mapping,
Frameworks
04
Cassandra Graveyard
Tombstones and Zombies
Administration and Operation
Scale like a boss
Agenda
Cassandra Days | Sponsored by
5 Developer Pitfalls
With Apache Cassandra
Cassandra Days | Sponsored by
Exploring Apis
Document oriented
02
01
05
Data Modeling
The good, the bad, the ugly
03
04
Cassandra Graveyard
Tombstones and Zombies
Administration and Operation
Scale like a boss
Agenda
Developers Horror Museum
Session, Object Mapping,
Frameworks
Shape your requests up !
It is not “just CQL”
Cassandra Days | Sponsored by
4 Objectives,
4 Models
2 Transitions
Data Modeling Methodology
Cassandra Days | Sponsored by
Data Modeling in action
Cassandra Days | Sponsored by
Data Modeling Methodology
Mapping rule 1: “Entities and relationships”
– Entity and relationship types map to tables
Mapping rule 2: “Equality search attributes”
– Equality search attributes map to the beginning columns of a primary key
Mapping rule 3: “Inequality search attributes”
– Inequality search attributes map to clustering columns
Mapping rule 4: “Ordering attributes”
– Ordering attributes map to clustering columns
Mapping rule 5: “Key attributes”
– Key attributes map to primary key columns
Based on
a conceptual
data model
Based on
a query
Based on
a conceptual
data model
Cassandra Days | Sponsored by
Data Modeling in action
Cassandra Days | Sponsored by
Some data modeling (High level) Pitfalls
❌ DO NOT
● Use relational DB methodology (3NF…) but follow Cassandra
methodology
● Reuse same tables for different where clauses
✅ DOs
● Know you requests and schema data model before code
● 1 table = 1 query (most of the time)
● Duplicate the data
● Secondary indexes as last resort
● Materialized views are experimental
More to come on types, collections…
Cassandra Days | Sponsored by
network sensor temperature
forest f001 92
forest f002 88
volcano v001 210
sea s001 45
sea s002 50
home h001 72
road r001 105
road r002 110
ice i001 35
car c001 69
dog d001 40
car c002 70
Primary Key
sensors_by_network
Partition Key
Partition Sizing : Balanced small partitions
Cassandra Days | Sponsored by
Partition Sizing : Balanced small partitions
forest f001 92
forest f002 88
cano v001 210
ice i001 35
home h001 72
dog d001 40
sea s001 45
sea s002 50
road r002 110
road r001 105
car c002 70
car c001 69
sensors_by_network
network sensor temperature
Cassandra Days | Sponsored by
Partition Sizing : Balanced small partitions
CREATE TABLE temperatures_by_XXXX (
sensor TEXT,
date DATE,
timestamp TIMESTAMP,
value FLOAT,
PRIMARY KEY ????
)...
PRIMARY KEY ((sensor), value, timestamp);
PRIMARY KEY ((sensor));
PRIMARY KEY ((sensor), timestamp);
Not Unique
Not sorted
PRIMARY KEY ((sensor, date), timestamp);
Big partition
PRIMARY KEY ((date), sensor, timestamp); Hot partition
PRIMARY KEY ((sensor, date, hour), timestamp);
BUCKETING
Cassandra Days | Sponsored by
✅ Calculators Tools:
● Cql Calculator:
https://github.com/johnnywidth/cql-calculator
● Cassandra Data Modeler
http://www.sestevez.com/sestevez/CassandraDataModeler/
📘 Sizing limits:
● Soft: 100k Rows, 100MB and 10MB/cell (slighty bigger with C4
)
● Hard: 2 billions cells
Partition Sizing : Balanced small partitions
Ck
: pk columns Cr
: regular columns
Cs
: static columns Cc
: clustering columns
Nr
: Number of rows
Nv
: number of values = Nr * (Nc-Mpk-Ns)+ Ns
Tavg
: avg cell metadata (~8 bytes)
k1
k2
✅ Bench
Cassandra Days | Sponsored by
Ordered list of values, can contains duplicates, can access
data with index.
❌ Limitations
● Server side race condition when simultaneously adding and
removing elements.
● Setting and removing an element by position incur an internal
read-before-write.
● Prepend or append operations are non-idempotent.
● There is an additional overhead to store the index
✅ Solutions
● If order is not important use Set
● if list will be “large” (100+) use dedicated tables
● if very small “(<10)” could be clustering columns
List collection type
CREATE TABLE IF NOT EXISTS table_with_list (
uid uuid,
items list<text>,
PRIMARY KEY (uid)
);
INSERT INTO table_with_list(uid,items)
VALUES (c7133017-6409-4d7a-9479-07a5c1e79306,
['a', 'b', 'c']);
UPDATE table_with_list SET items = items + ['f']
WHERE uid = c7133017-6409-4d7a-9479-07a5c1e79306;
Cassandra Days | Sponsored by
📘 What
● Same structures as java, unordered, ensure unicity
● Conflict-free replicated types = same value on each server
● Frozen: content serialized as a single value
● Non-frozen (default): save as a separate values
❌ Limitations (non-Frozen)
● Overread to store metadata
● Reading one value still returned the full collection
● Tombstones are created (more soon)
● Performance degradation over time (updates in different SSTABLES)
✅ Solutions
● Use frozen when data is immutable
Set and Map collection types
CREATE TABLE IF NOT EXISTS table_with_set (
uid uuid,
animals set<text>,
PRIMARY KEY (uid)
);
CREATE TABLE IF NOT EXISTS table_with_map (
uid text,
dictionary map<text, text>,
PRIMARY KEY (uid)
);
Cassandra Days | Sponsored by
📘 What
● Structures created by user to store organized data
● Hold a schema
● Frozen: content serialized as a single value
● Non-frozen (default): save as a separate values
❌ Limitations (non-Frozen)
● Schema evolutions: you cannot delete UDT columns
● Mutation size will increased (too many elements or to much nested
UDT). Hitting max_mutation_size = failed operation.
✅ Solutions
● Use frozen UDT when it is possible
● When using non-UDT limit the number of columns
● Falling back to text based column contains JSON release pressure
User Defined Types (UDT)
CREATE TYPE IF NOT EXISTS udt_address (
street text,
city text,
state text,
);
CREATE TABLE IF NOT EXISTS table_with_udt (
uid text,
address udt_address,
PRIMARY KEY (uid)
);
Cassandra Days | Sponsored by
📘 What:
● 64-bit signed integer, imprecise values such as likes, views
● Two operations only: increment, decrement
● First op assumes value is zero
● NOT LIKE ORACLE SEQUENCES, NOT AUTO-INCREMENT
❌ Limitations
● Cannot be part of primary key
● Cannot be mixed with other types in table
● Cannot be inserted or updated with a value
● Updates are not idempotent
● Writes are slower (extra local read ar replica level)
✅ Solutions
● Tables containing multiple counters should be distributed (counter
name part of PK)
Counters
CREATE TABLE IF NOT EXISTS table_with_counters (
handle text,
following counter,
followers counter,
notifications counter,
PRIMARY KEY (handle)
);
UPDATE table_with_counters
SET followers = followers + 1
WHERE handle = 'clunven';
Cassandra Days | Sponsored by
5 Developer Pitfalls
With Apache Cassandra
Cassandra Days | Sponsored by
Exploring Apis
Document oriented
Shape your requests up !
It is not “just CQL”
02
01
05
Data Modeling
The good, the bad, the ugly
03
Developers Horror Museum
Session, Object Mapping,
Frameworks
04
Cassandra Graveyard
Tombstones and Zombies
Administration and Operation
Scale like a boss
Agenda
Cassandra Days | Sponsored by
Cassandra Query Language
❌ Don’t
● AVOID SELECT *, provides colum names, adding columns get you more data than expected.
● AVOID SELECT COUNT(*), will likely timeout, will spread across all nodes, use DsBulk
● AVOID (LARGE) IN(...) statements cross partitions requests, unefficient and will hit different nodes, N+1 select pattern
● AVOID ALLOW DUMBERING FILTERING, full scan cluster 😨
✅ Dos
● Prepare your statements (Parse once, run many times)
○ Saves network trips for result set metadata.
○ Client-side type validation.
○ Statements binding on partition keys compute their own cluster routing.
● Know your Cql Types
○ Reduce space use on disk
○ https://cassandra.apache.org/doc/latest/cassandra/cql/types.html
● Use Metadata : TTL, TIMESTAMP
PreparedStatement ps = session
.prepare("
SELECT id FROM sensors_by_network
WHERE network = ?");
BoundStatement psb = ps.bind(network);
ResultSet rs = session.execute(psb);
Cassandra Days | Sponsored by
Enforce Immediate Consistency
Client
Writing…
Client
RF = 3
CL.READ = QUORUM = RF/2 + 1 = 2
CL.WRITE = QUORUM = RF/2 + 1 = 2
CL.READ + CL.WRITE > RF --> 4 > 3
CL.WRITE = QUORUM
CL.READ = QUORUM
Cassandra Days | Sponsored by
Synchronous Requests
Cassandra Days | Sponsored by
Asynchronous Requests
Cassandra Days | Sponsored by
Reactive Requests
Cassandra Days | Sponsored by
Understanding Batches
📘 What
● Execute multiple modification statements (insert, update, delete)
simultaneiously
● Coordinator node for batch, then each statements executed at the same
time
● Retries until it works or timeout
📘 Use Cases
● ✅ Logged, single-partition batches
○ Efficient and atomic
○ Use whenever possible
● ✅ Logged, multi-partition batches
○ Somewhat efficient and pseudo-atomic
○ Use selectively
● ❌ Unlogged and counter batches
○ Do not use
BEGIN BATCH
INSERT INTO ...;
INSERT INTO ...;
...
APPLY BATCH;
BEGIN BATCH
UPDATE ...;
UPDATE ...;
...
APPLY BATCH;
Cassandra Days | Sponsored by
Lightweight Transaction (LWT)
📘 What
Linearizable consistency ensures that concurrent transactions
produce the same result as if they execute in a sequence, one after
another.
● Guarantee linearizable consistency
● Require 4 coordinator-replica round trips
❌ Don’t
● May become a bottleneck with high data contention
✅ Dos
● Use with race conditions and low data contention
INSERT INTO ... VALUES ...
IF NOT EXISTS;
UPDATE ... SET ... WHERE ...
IF EXISTS | IF predicate [ AND ... ];
DELETE ... FROM ... WHERE ...
IF EXISTS | IF predicate [ AND ... ];
Cassandra Days | Sponsored by
5 Developer Pitfalls
With Apache Cassandra
Cassandra Days | Sponsored by
Exploring Apis
Document oriented
Shape your requests up !
It is not “just CQL”
02
01
05
Data Modeling
The good, the bad, the ugly
03
Developers Horror Museum
Session, Object Mapping,
Frameworks
04
Cassandra Graveyard
Tombstones and Zombies
Administration and Operation
Scale like a boss
Agenda
Cassandra Days | Sponsored by
Understanding compaction strategy
📘 About Compaction Strategy
● Process to merge SSTABLES in bigger ones
● Defined per table
✅ Dos
● SizeTiered Compaction (STCS) - (Default)
○ Triggers when multiple SSTables of a similar size are present.
○ Insert-heavy and general use cases
● Leveled Compaction (LCS) -
○ groups SSTables into levels, each of which has a fixed size limit
which is 10 times larger than the previous level.
○ Read-Heavy workload (no overlap at same level), I/O intensive
● TimeWindow Compaction (TWCS) -
○ creates time windowed buckets of SSTables that are compacted
with each other using the Size Tiered Compaction Strategy.
○ Immutable Data
Cassandra Days | Sponsored by
Tombstones
📘 What
● Written markers for deleted data in SSTables
● Data with TTL
● UPDATEs and INSERTs with NULL values
● “Overwritten” collections
● Multiples Types
○ Cell, Rows, Range, Partition
○ TTL Tombstones (cells and rows)
● Clean during “compaction”
❌ Don’t
● INSERT NULL when you can
✅ Dos
● Always delete as much data as possible in one mutation
[ { "partition" : {
"key" : [ "CA102" ], "position" : 0,
"deletion_info" : {
"marked_deleted" : "2020-07-03T23:11:58.785298Z",
"local_delete_time" : "2020-07-03T23:11:58Z"
} },"rows" : [ ] },
{ "partition" : {
"key" : [ "CA101" ],
"position" : 20 },
"rows" : [
{ "type" : "row",
"position" : 95,
"clustering" : [ "item101", 1.5 ],
"liveness_info" : { "tstamp" :"2020-07-03T23:10:40.326673Z" },
"cells" : [
{ "name" : "product_code", "value" : "p101" },
{ "name" : "replacements", "value" : ["item101-r", "item101-r2"]
}]}]},
{ "partition" : {
"key" : [ "CA103" ],
"position" : 0 },
"rows" : [ {
"type" : "row",
"position" : 74,
"clustering" : [ "item103", 3.0 ],
"liveness_info" : {
"tstamp" : "2020-07-03T23:23:39.440426Z", "ttl" : 30,
"expires_at" : "2020-07-03T23:24:09Z", "expired" : true
},
"cells" : [
{ "name" : "product_code", "value" : "p103" },
{ "name" : "replacements", "value" : ["item101", "item101-r"] }
] } ]} ]
Cassandra Days | Sponsored by
Tombstones - Issue #1 = Large Partitions
📘 Issue Definition
● Tombstone = additional data to store and read
● Query performance degrades, heap memory pressure increases
● tombstone_warn_threshold
○ Warning when more than 1,000 tombstones are scanned by a query
● tombstone_failure_threshold
○ Aborted query when more than 100,000 tombstones are scanned
❌ Don’t
● “Queueing Pattern” = keep deleting messages in Cassandra
✅ Dos
● Decrease the value of gc_grace_seconds (default is 864000 or 10 days)
○ Deleted data and tombstones can be purged during compaction after gc_grace_seconds
● Run compaction more frequently
○ nodetool compact keyspace tablename
deletion_info
deletion_info
deletion_info
deletion_info
deletion_info
deletion_info
Cassandra Days | Sponsored by
Tombstones Issue #2, Zombie Data
📘 Issue Definition
● A replica node is unresponsive and receives no tombstone
● Other replicas nodes receive tombstones
● The tombstones get purged after gc_grace_seconds and compaction
● The unresponsive replica comes back online and resurrects data that was
previously marked as deleted
✅ Dos
● Run repairs within the gc_grace_seconds and on a regular basis
● nodetool repair
● Do not let a node to rejoin the cluster after the gc_grace_seconds
DATA
Cassandra Days | Sponsored by
5 Developer Pitfalls
With Apache Cassandra
Cassandra Days | Sponsored by
Exploring Apis
Document oriented
Shape your requests up !
It is not “just CQL”
02
01
05
Data Modeling
The good, the bad, the ugly
03
Developers Horror Museum
Session, Object Mapping,
Frameworks
04
Cassandra Graveyard
Tombstones and Zombies
Administration and Operation
Scale like a boss
Agenda
Cassandra Days | Sponsored by
Setup connection with Session (Cluster)
✅ Dos
● Contact points
○ One is enough, consider multiples for first call
○ static-ip or fixed names
○ prioritized seed nodes
● Setup local data center
● Create CqlSession explicitely
❌ Don’t
● Multiple keyspaces is a code smell and a can of worms
● Do not stick to default available properties in frameworks
DRIVER
?
Cassandra Days | Sponsored by
Proper usage of the session
📘 About Session
● Stateful object handling communications with each node
○ Pooling, retries, reconnect, health checks
✅ Dos
● Should be unique in the Application (Singleton)
● Should be closed at application shutdown (shutdown hook)
● Should be used for fined grained queries (execute)
Java: cqlSession.close();
Python: session.shutdown();
Node: client.shutdown();
CSharp: IDisposable
Cassandra Days | Sponsored by
❌ DO NOT….
● DO NOT let it create your session (maximum flexibility)
● DO NOT let it generate your schema (metadata are important !)
● DO NOT use OSIV nor Active record but rather Repository
● DO NOT use findAll() = Table Full scan
● DO NOT create large IN queries.
● DO NOT reuse objects from multiple where clause
● DO NOT implement N+1 select, create a new table
● DO NOT stick to simplicity of Repositories
✅ DOs….
● DO Prepare your statements
● (Spring Data) : use SimpleCassandraRepository
● (Spring Data) : use CassandraOperations (cqlSession)
● Use Batches, LWT and everything show before (hidden by ORM)
Object Mapping is your closest ennemy
34
Cassandra Days | Sponsored by
5 Developer Pitfalls
With Apache Cassandra
Cassandra Days | Sponsored by
Exploring Apis
Document oriented
Shape your requests up !
It is not “just CQL”
02
01
05
Data Modeling
The good, the bad, the ugly
03
Developers Horror Museum
Session, Object Mapping,
Frameworks
04
Cassandra Graveyard
Tombstones and Zombies
Administration and Operation
Scale like a boss
Agenda
Cassandra Days | Sponsored by
Administration and Operations
✅ DOs….
● Mesure Everything (Disk, CPU, RAM) MCAC, JMX, Virtual tables
● GC pause is the top of the iceberg, dig into infrastructure
● Run Reparis frequently (incremental) using Repear
● SNAPSHOTS, Back and restore
● Do not undersestiate security
○ RBAC
○ Encryption at rest
○ Internode communications
❌ DO NOT….
● TUNE BEFORE UNDERSTAND
Cassandra Days | Sponsored by
5 Developer Pitfalls
With Apache Cassandra
Cassandra Day | Berlin
Discover the real power of NoSQL
September 20th, 2022
Thank You
clunven
clunven
Thank you !

More Related Content

Similar to Avoiding Pitfalls for Cassandra.pdf

Extending Spark for Qbeast's SQL Data Source​ with Paola Pardo and Cesare Cug...
Extending Spark for Qbeast's SQL Data Source​ with Paola Pardo and Cesare Cug...Extending Spark for Qbeast's SQL Data Source​ with Paola Pardo and Cesare Cug...
Extending Spark for Qbeast's SQL Data Source​ with Paola Pardo and Cesare Cug...
Qbeast
 
NoSQL Solutions - a comparative study
NoSQL Solutions - a comparative studyNoSQL Solutions - a comparative study
NoSQL Solutions - a comparative study
Guillaume Lefranc
 

Similar to Avoiding Pitfalls for Cassandra.pdf (20)

AWS Big Data Demystified #2 | Athena, Spectrum, Emr, Hive
AWS Big Data Demystified #2 |  Athena, Spectrum, Emr, Hive AWS Big Data Demystified #2 |  Athena, Spectrum, Emr, Hive
AWS Big Data Demystified #2 | Athena, Spectrum, Emr, Hive
 
Trivadis TechEvent 2016 Big Data Cassandra, wieso brauche ich das? by Jan Ott
Trivadis TechEvent 2016 Big Data Cassandra, wieso brauche ich das? by Jan OttTrivadis TechEvent 2016 Big Data Cassandra, wieso brauche ich das? by Jan Ott
Trivadis TechEvent 2016 Big Data Cassandra, wieso brauche ich das? by Jan Ott
 
On Rails with Apache Cassandra
On Rails with Apache CassandraOn Rails with Apache Cassandra
On Rails with Apache Cassandra
 
Running Cassandra in AWS
Running Cassandra in AWSRunning Cassandra in AWS
Running Cassandra in AWS
 
Cassandra training
Cassandra trainingCassandra training
Cassandra training
 
Extending Spark for Qbeast's SQL Data Source​ with Paola Pardo and Cesare Cug...
Extending Spark for Qbeast's SQL Data Source​ with Paola Pardo and Cesare Cug...Extending Spark for Qbeast's SQL Data Source​ with Paola Pardo and Cesare Cug...
Extending Spark for Qbeast's SQL Data Source​ with Paola Pardo and Cesare Cug...
 
Real-time analytics with Druid at Appsflyer
Real-time analytics with Druid at AppsflyerReal-time analytics with Druid at Appsflyer
Real-time analytics with Druid at Appsflyer
 
Re-Engineering PostgreSQL as a Time-Series Database
Re-Engineering PostgreSQL as a Time-Series DatabaseRe-Engineering PostgreSQL as a Time-Series Database
Re-Engineering PostgreSQL as a Time-Series Database
 
Use Your MySQL Knowledge to Become an Instant Cassandra Guru
Use Your MySQL Knowledge to Become an Instant Cassandra GuruUse Your MySQL Knowledge to Become an Instant Cassandra Guru
Use Your MySQL Knowledge to Become an Instant Cassandra Guru
 
S3, Cassandra or Outer Space? Dumping Time Series Data using Spark - Demi Ben...
S3, Cassandra or Outer Space? Dumping Time Series Data using Spark - Demi Ben...S3, Cassandra or Outer Space? Dumping Time Series Data using Spark - Demi Ben...
S3, Cassandra or Outer Space? Dumping Time Series Data using Spark - Demi Ben...
 
Cassandra
CassandraCassandra
Cassandra
 
How Opera Syncs Tens of Millions of Browsers and Sleeps Well at Night
How Opera Syncs Tens of Millions of Browsers and Sleeps Well at NightHow Opera Syncs Tens of Millions of Browsers and Sleeps Well at Night
How Opera Syncs Tens of Millions of Browsers and Sleeps Well at Night
 
OpenTSDB 2.0
OpenTSDB 2.0OpenTSDB 2.0
OpenTSDB 2.0
 
NoSQL Solutions - a comparative study
NoSQL Solutions - a comparative studyNoSQL Solutions - a comparative study
NoSQL Solutions - a comparative study
 
S3, Cassandra or Outer Space? Dumping Time Series Data using Spark - Demi Be...
S3, Cassandra or Outer Space? Dumping Time Series Data using Spark  - Demi Be...S3, Cassandra or Outer Space? Dumping Time Series Data using Spark  - Demi Be...
S3, Cassandra or Outer Space? Dumping Time Series Data using Spark - Demi Be...
 
Presentation
PresentationPresentation
Presentation
 
Big Data processing with Apache Spark
Big Data processing with Apache SparkBig Data processing with Apache Spark
Big Data processing with Apache Spark
 
Druid
DruidDruid
Druid
 
MySQL performance tuning
MySQL performance tuningMySQL performance tuning
MySQL performance tuning
 
SAS Institute on Changing All Four Tires While Driving an AdTech Engine at Fu...
SAS Institute on Changing All Four Tires While Driving an AdTech Engine at Fu...SAS Institute on Changing All Four Tires While Driving an AdTech Engine at Fu...
SAS Institute on Changing All Four Tires While Driving an AdTech Engine at Fu...
 

More from Cédrick Lunven

More from Cédrick Lunven (19)

Top 10 present and future innovations in the NoSQL Cassandra ecosystem (2022)
Top 10 present and future innovations in the NoSQL Cassandra ecosystem (2022)Top 10 present and future innovations in the NoSQL Cassandra ecosystem (2022)
Top 10 present and future innovations in the NoSQL Cassandra ecosystem (2022)
 
BigData Paris 2022 - Innovations récentes et futures autour du NoSQL Apache ...
BigData Paris 2022 - Innovations récentes et futures autour du NoSQL Apache ...BigData Paris 2022 - Innovations récentes et futures autour du NoSQL Apache ...
BigData Paris 2022 - Innovations récentes et futures autour du NoSQL Apache ...
 
Unlock cassandra data for application developers using graphQL
Unlock cassandra data for application developers using graphQLUnlock cassandra data for application developers using graphQL
Unlock cassandra data for application developers using graphQL
 
An oss api layer for your cassandra
An oss api layer for your cassandraAn oss api layer for your cassandra
An oss api layer for your cassandra
 
CN Asturias - Stateful application for kubernetes
CN Asturias -  Stateful application for kubernetes CN Asturias -  Stateful application for kubernetes
CN Asturias - Stateful application for kubernetes
 
Xebicon2019 m icroservices
Xebicon2019   m icroservicesXebicon2019   m icroservices
Xebicon2019 m icroservices
 
DevFestBdm2019
DevFestBdm2019DevFestBdm2019
DevFestBdm2019
 
Reactive Programming with Cassandra
Reactive Programming with CassandraReactive Programming with Cassandra
Reactive Programming with Cassandra
 
Shift Dev Conf API
Shift Dev Conf APIShift Dev Conf API
Shift Dev Conf API
 
VoxxedDays Luxembourg FF4J
VoxxedDays Luxembourg FF4JVoxxedDays Luxembourg FF4J
VoxxedDays Luxembourg FF4J
 
VoxxedDays Luxembourg 2019
VoxxedDays Luxembourg 2019VoxxedDays Luxembourg 2019
VoxxedDays Luxembourg 2019
 
Design API - SnowCampIO
Design API - SnowCampIODesign API - SnowCampIO
Design API - SnowCampIO
 
Create API for your Databases
Create API for your DatabasesCreate API for your Databases
Create API for your Databases
 
Leveraging Feature Toggles for your Microservices (VoxxeddaysMicroservices Pa...
Leveraging Feature Toggles for your Microservices (VoxxeddaysMicroservices Pa...Leveraging Feature Toggles for your Microservices (VoxxeddaysMicroservices Pa...
Leveraging Feature Toggles for your Microservices (VoxxeddaysMicroservices Pa...
 
Streaming, Analytics and Reactive Applications with Apache Cassandra
Streaming, Analytics and Reactive Applications with Apache CassandraStreaming, Analytics and Reactive Applications with Apache Cassandra
Streaming, Analytics and Reactive Applications with Apache Cassandra
 
Riviera jug apicassandra
Riviera jug apicassandraRiviera jug apicassandra
Riviera jug apicassandra
 
Riviera JUG ff4j
Riviera JUG ff4jRiviera JUG ff4j
Riviera JUG ff4j
 
Paris Meetup Jhispter #9 - Generator FF4j for Jhipster
Paris Meetup Jhispter #9 - Generator FF4j for JhipsterParis Meetup Jhispter #9 - Generator FF4j for Jhipster
Paris Meetup Jhispter #9 - Generator FF4j for Jhipster
 
Introduction to Feature Toggle and FF4J
Introduction to Feature Toggle and FF4JIntroduction to Feature Toggle and FF4J
Introduction to Feature Toggle and FF4J
 

Recently uploaded

Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Victor Rentea
 

Recently uploaded (20)

Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
JohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptx
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Introduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMIntroduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDM
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
AI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by AnitarajAI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by Anitaraj
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 

Avoiding Pitfalls for Cassandra.pdf

  • 1. Cassandra Days | Sponsored by 5 Developer Pitfalls With Apache Cassandra Cassandra Day | Discover the real power of NoSQL 5 Pitfalls of Cassandra Developers BERLIN September 20th clunven
  • 2. Cassandra Days | Sponsored by Cédrick Lunven, Director of Developer Relations ➢ Trainer ➢ Public Speaker ➢ Developers Support ➢ Developer Applications ➢ Developer Tooling ➢ Creator of ff4j (ff4j.org) ➢ Maintainer for 8 years+ ➢ Happy developer for 14 years ➢ Spring Petclinic Reactive & Starters ➢ Implementing APIs for 8 years clunven clun clunven
  • 3. Cassandra Days | Sponsored by 5 Developer Pitfalls With Apache Cassandra Cassandra Days | Sponsored by Exploring Apis Document oriented Shape your requests up ! It is not “just CQL” 02 01 05 Data Modeling The good, the bad, the ugly 03 Developers Horror Museum Session, Object Mapping, Frameworks 04 Cassandra Graveyard Tombstones and Zombies Administration and Operation Scale like a boss Agenda
  • 4. Cassandra Days | Sponsored by 5 Developer Pitfalls With Apache Cassandra Cassandra Days | Sponsored by Exploring Apis Document oriented 02 01 05 Data Modeling The good, the bad, the ugly 03 04 Cassandra Graveyard Tombstones and Zombies Administration and Operation Scale like a boss Agenda Developers Horror Museum Session, Object Mapping, Frameworks Shape your requests up ! It is not “just CQL”
  • 5. Cassandra Days | Sponsored by 4 Objectives, 4 Models 2 Transitions Data Modeling Methodology
  • 6. Cassandra Days | Sponsored by Data Modeling in action
  • 7. Cassandra Days | Sponsored by Data Modeling Methodology Mapping rule 1: “Entities and relationships” – Entity and relationship types map to tables Mapping rule 2: “Equality search attributes” – Equality search attributes map to the beginning columns of a primary key Mapping rule 3: “Inequality search attributes” – Inequality search attributes map to clustering columns Mapping rule 4: “Ordering attributes” – Ordering attributes map to clustering columns Mapping rule 5: “Key attributes” – Key attributes map to primary key columns Based on a conceptual data model Based on a query Based on a conceptual data model
  • 8. Cassandra Days | Sponsored by Data Modeling in action
  • 9. Cassandra Days | Sponsored by Some data modeling (High level) Pitfalls ❌ DO NOT ● Use relational DB methodology (3NF…) but follow Cassandra methodology ● Reuse same tables for different where clauses ✅ DOs ● Know you requests and schema data model before code ● 1 table = 1 query (most of the time) ● Duplicate the data ● Secondary indexes as last resort ● Materialized views are experimental More to come on types, collections…
  • 10. Cassandra Days | Sponsored by network sensor temperature forest f001 92 forest f002 88 volcano v001 210 sea s001 45 sea s002 50 home h001 72 road r001 105 road r002 110 ice i001 35 car c001 69 dog d001 40 car c002 70 Primary Key sensors_by_network Partition Key Partition Sizing : Balanced small partitions
  • 11. Cassandra Days | Sponsored by Partition Sizing : Balanced small partitions forest f001 92 forest f002 88 cano v001 210 ice i001 35 home h001 72 dog d001 40 sea s001 45 sea s002 50 road r002 110 road r001 105 car c002 70 car c001 69 sensors_by_network network sensor temperature
  • 12. Cassandra Days | Sponsored by Partition Sizing : Balanced small partitions CREATE TABLE temperatures_by_XXXX ( sensor TEXT, date DATE, timestamp TIMESTAMP, value FLOAT, PRIMARY KEY ???? )... PRIMARY KEY ((sensor), value, timestamp); PRIMARY KEY ((sensor)); PRIMARY KEY ((sensor), timestamp); Not Unique Not sorted PRIMARY KEY ((sensor, date), timestamp); Big partition PRIMARY KEY ((date), sensor, timestamp); Hot partition PRIMARY KEY ((sensor, date, hour), timestamp); BUCKETING
  • 13. Cassandra Days | Sponsored by ✅ Calculators Tools: ● Cql Calculator: https://github.com/johnnywidth/cql-calculator ● Cassandra Data Modeler http://www.sestevez.com/sestevez/CassandraDataModeler/ 📘 Sizing limits: ● Soft: 100k Rows, 100MB and 10MB/cell (slighty bigger with C4 ) ● Hard: 2 billions cells Partition Sizing : Balanced small partitions Ck : pk columns Cr : regular columns Cs : static columns Cc : clustering columns Nr : Number of rows Nv : number of values = Nr * (Nc-Mpk-Ns)+ Ns Tavg : avg cell metadata (~8 bytes) k1 k2 ✅ Bench
  • 14. Cassandra Days | Sponsored by Ordered list of values, can contains duplicates, can access data with index. ❌ Limitations ● Server side race condition when simultaneously adding and removing elements. ● Setting and removing an element by position incur an internal read-before-write. ● Prepend or append operations are non-idempotent. ● There is an additional overhead to store the index ✅ Solutions ● If order is not important use Set ● if list will be “large” (100+) use dedicated tables ● if very small “(<10)” could be clustering columns List collection type CREATE TABLE IF NOT EXISTS table_with_list ( uid uuid, items list<text>, PRIMARY KEY (uid) ); INSERT INTO table_with_list(uid,items) VALUES (c7133017-6409-4d7a-9479-07a5c1e79306, ['a', 'b', 'c']); UPDATE table_with_list SET items = items + ['f'] WHERE uid = c7133017-6409-4d7a-9479-07a5c1e79306;
  • 15. Cassandra Days | Sponsored by 📘 What ● Same structures as java, unordered, ensure unicity ● Conflict-free replicated types = same value on each server ● Frozen: content serialized as a single value ● Non-frozen (default): save as a separate values ❌ Limitations (non-Frozen) ● Overread to store metadata ● Reading one value still returned the full collection ● Tombstones are created (more soon) ● Performance degradation over time (updates in different SSTABLES) ✅ Solutions ● Use frozen when data is immutable Set and Map collection types CREATE TABLE IF NOT EXISTS table_with_set ( uid uuid, animals set<text>, PRIMARY KEY (uid) ); CREATE TABLE IF NOT EXISTS table_with_map ( uid text, dictionary map<text, text>, PRIMARY KEY (uid) );
  • 16. Cassandra Days | Sponsored by 📘 What ● Structures created by user to store organized data ● Hold a schema ● Frozen: content serialized as a single value ● Non-frozen (default): save as a separate values ❌ Limitations (non-Frozen) ● Schema evolutions: you cannot delete UDT columns ● Mutation size will increased (too many elements or to much nested UDT). Hitting max_mutation_size = failed operation. ✅ Solutions ● Use frozen UDT when it is possible ● When using non-UDT limit the number of columns ● Falling back to text based column contains JSON release pressure User Defined Types (UDT) CREATE TYPE IF NOT EXISTS udt_address ( street text, city text, state text, ); CREATE TABLE IF NOT EXISTS table_with_udt ( uid text, address udt_address, PRIMARY KEY (uid) );
  • 17. Cassandra Days | Sponsored by 📘 What: ● 64-bit signed integer, imprecise values such as likes, views ● Two operations only: increment, decrement ● First op assumes value is zero ● NOT LIKE ORACLE SEQUENCES, NOT AUTO-INCREMENT ❌ Limitations ● Cannot be part of primary key ● Cannot be mixed with other types in table ● Cannot be inserted or updated with a value ● Updates are not idempotent ● Writes are slower (extra local read ar replica level) ✅ Solutions ● Tables containing multiple counters should be distributed (counter name part of PK) Counters CREATE TABLE IF NOT EXISTS table_with_counters ( handle text, following counter, followers counter, notifications counter, PRIMARY KEY (handle) ); UPDATE table_with_counters SET followers = followers + 1 WHERE handle = 'clunven';
  • 18. Cassandra Days | Sponsored by 5 Developer Pitfalls With Apache Cassandra Cassandra Days | Sponsored by Exploring Apis Document oriented Shape your requests up ! It is not “just CQL” 02 01 05 Data Modeling The good, the bad, the ugly 03 Developers Horror Museum Session, Object Mapping, Frameworks 04 Cassandra Graveyard Tombstones and Zombies Administration and Operation Scale like a boss Agenda
  • 19. Cassandra Days | Sponsored by Cassandra Query Language ❌ Don’t ● AVOID SELECT *, provides colum names, adding columns get you more data than expected. ● AVOID SELECT COUNT(*), will likely timeout, will spread across all nodes, use DsBulk ● AVOID (LARGE) IN(...) statements cross partitions requests, unefficient and will hit different nodes, N+1 select pattern ● AVOID ALLOW DUMBERING FILTERING, full scan cluster 😨 ✅ Dos ● Prepare your statements (Parse once, run many times) ○ Saves network trips for result set metadata. ○ Client-side type validation. ○ Statements binding on partition keys compute their own cluster routing. ● Know your Cql Types ○ Reduce space use on disk ○ https://cassandra.apache.org/doc/latest/cassandra/cql/types.html ● Use Metadata : TTL, TIMESTAMP PreparedStatement ps = session .prepare(" SELECT id FROM sensors_by_network WHERE network = ?"); BoundStatement psb = ps.bind(network); ResultSet rs = session.execute(psb);
  • 20. Cassandra Days | Sponsored by Enforce Immediate Consistency Client Writing… Client RF = 3 CL.READ = QUORUM = RF/2 + 1 = 2 CL.WRITE = QUORUM = RF/2 + 1 = 2 CL.READ + CL.WRITE > RF --> 4 > 3 CL.WRITE = QUORUM CL.READ = QUORUM
  • 21. Cassandra Days | Sponsored by Synchronous Requests
  • 22. Cassandra Days | Sponsored by Asynchronous Requests
  • 23. Cassandra Days | Sponsored by Reactive Requests
  • 24. Cassandra Days | Sponsored by Understanding Batches 📘 What ● Execute multiple modification statements (insert, update, delete) simultaneiously ● Coordinator node for batch, then each statements executed at the same time ● Retries until it works or timeout 📘 Use Cases ● ✅ Logged, single-partition batches ○ Efficient and atomic ○ Use whenever possible ● ✅ Logged, multi-partition batches ○ Somewhat efficient and pseudo-atomic ○ Use selectively ● ❌ Unlogged and counter batches ○ Do not use BEGIN BATCH INSERT INTO ...; INSERT INTO ...; ... APPLY BATCH; BEGIN BATCH UPDATE ...; UPDATE ...; ... APPLY BATCH;
  • 25. Cassandra Days | Sponsored by Lightweight Transaction (LWT) 📘 What Linearizable consistency ensures that concurrent transactions produce the same result as if they execute in a sequence, one after another. ● Guarantee linearizable consistency ● Require 4 coordinator-replica round trips ❌ Don’t ● May become a bottleneck with high data contention ✅ Dos ● Use with race conditions and low data contention INSERT INTO ... VALUES ... IF NOT EXISTS; UPDATE ... SET ... WHERE ... IF EXISTS | IF predicate [ AND ... ]; DELETE ... FROM ... WHERE ... IF EXISTS | IF predicate [ AND ... ];
  • 26. Cassandra Days | Sponsored by 5 Developer Pitfalls With Apache Cassandra Cassandra Days | Sponsored by Exploring Apis Document oriented Shape your requests up ! It is not “just CQL” 02 01 05 Data Modeling The good, the bad, the ugly 03 Developers Horror Museum Session, Object Mapping, Frameworks 04 Cassandra Graveyard Tombstones and Zombies Administration and Operation Scale like a boss Agenda
  • 27. Cassandra Days | Sponsored by Understanding compaction strategy 📘 About Compaction Strategy ● Process to merge SSTABLES in bigger ones ● Defined per table ✅ Dos ● SizeTiered Compaction (STCS) - (Default) ○ Triggers when multiple SSTables of a similar size are present. ○ Insert-heavy and general use cases ● Leveled Compaction (LCS) - ○ groups SSTables into levels, each of which has a fixed size limit which is 10 times larger than the previous level. ○ Read-Heavy workload (no overlap at same level), I/O intensive ● TimeWindow Compaction (TWCS) - ○ creates time windowed buckets of SSTables that are compacted with each other using the Size Tiered Compaction Strategy. ○ Immutable Data
  • 28. Cassandra Days | Sponsored by Tombstones 📘 What ● Written markers for deleted data in SSTables ● Data with TTL ● UPDATEs and INSERTs with NULL values ● “Overwritten” collections ● Multiples Types ○ Cell, Rows, Range, Partition ○ TTL Tombstones (cells and rows) ● Clean during “compaction” ❌ Don’t ● INSERT NULL when you can ✅ Dos ● Always delete as much data as possible in one mutation [ { "partition" : { "key" : [ "CA102" ], "position" : 0, "deletion_info" : { "marked_deleted" : "2020-07-03T23:11:58.785298Z", "local_delete_time" : "2020-07-03T23:11:58Z" } },"rows" : [ ] }, { "partition" : { "key" : [ "CA101" ], "position" : 20 }, "rows" : [ { "type" : "row", "position" : 95, "clustering" : [ "item101", 1.5 ], "liveness_info" : { "tstamp" :"2020-07-03T23:10:40.326673Z" }, "cells" : [ { "name" : "product_code", "value" : "p101" }, { "name" : "replacements", "value" : ["item101-r", "item101-r2"] }]}]}, { "partition" : { "key" : [ "CA103" ], "position" : 0 }, "rows" : [ { "type" : "row", "position" : 74, "clustering" : [ "item103", 3.0 ], "liveness_info" : { "tstamp" : "2020-07-03T23:23:39.440426Z", "ttl" : 30, "expires_at" : "2020-07-03T23:24:09Z", "expired" : true }, "cells" : [ { "name" : "product_code", "value" : "p103" }, { "name" : "replacements", "value" : ["item101", "item101-r"] } ] } ]} ]
  • 29. Cassandra Days | Sponsored by Tombstones - Issue #1 = Large Partitions 📘 Issue Definition ● Tombstone = additional data to store and read ● Query performance degrades, heap memory pressure increases ● tombstone_warn_threshold ○ Warning when more than 1,000 tombstones are scanned by a query ● tombstone_failure_threshold ○ Aborted query when more than 100,000 tombstones are scanned ❌ Don’t ● “Queueing Pattern” = keep deleting messages in Cassandra ✅ Dos ● Decrease the value of gc_grace_seconds (default is 864000 or 10 days) ○ Deleted data and tombstones can be purged during compaction after gc_grace_seconds ● Run compaction more frequently ○ nodetool compact keyspace tablename deletion_info deletion_info deletion_info deletion_info deletion_info deletion_info
  • 30. Cassandra Days | Sponsored by Tombstones Issue #2, Zombie Data 📘 Issue Definition ● A replica node is unresponsive and receives no tombstone ● Other replicas nodes receive tombstones ● The tombstones get purged after gc_grace_seconds and compaction ● The unresponsive replica comes back online and resurrects data that was previously marked as deleted ✅ Dos ● Run repairs within the gc_grace_seconds and on a regular basis ● nodetool repair ● Do not let a node to rejoin the cluster after the gc_grace_seconds DATA
  • 31. Cassandra Days | Sponsored by 5 Developer Pitfalls With Apache Cassandra Cassandra Days | Sponsored by Exploring Apis Document oriented Shape your requests up ! It is not “just CQL” 02 01 05 Data Modeling The good, the bad, the ugly 03 Developers Horror Museum Session, Object Mapping, Frameworks 04 Cassandra Graveyard Tombstones and Zombies Administration and Operation Scale like a boss Agenda
  • 32. Cassandra Days | Sponsored by Setup connection with Session (Cluster) ✅ Dos ● Contact points ○ One is enough, consider multiples for first call ○ static-ip or fixed names ○ prioritized seed nodes ● Setup local data center ● Create CqlSession explicitely ❌ Don’t ● Multiple keyspaces is a code smell and a can of worms ● Do not stick to default available properties in frameworks DRIVER ?
  • 33. Cassandra Days | Sponsored by Proper usage of the session 📘 About Session ● Stateful object handling communications with each node ○ Pooling, retries, reconnect, health checks ✅ Dos ● Should be unique in the Application (Singleton) ● Should be closed at application shutdown (shutdown hook) ● Should be used for fined grained queries (execute) Java: cqlSession.close(); Python: session.shutdown(); Node: client.shutdown(); CSharp: IDisposable
  • 34. Cassandra Days | Sponsored by ❌ DO NOT…. ● DO NOT let it create your session (maximum flexibility) ● DO NOT let it generate your schema (metadata are important !) ● DO NOT use OSIV nor Active record but rather Repository ● DO NOT use findAll() = Table Full scan ● DO NOT create large IN queries. ● DO NOT reuse objects from multiple where clause ● DO NOT implement N+1 select, create a new table ● DO NOT stick to simplicity of Repositories ✅ DOs…. ● DO Prepare your statements ● (Spring Data) : use SimpleCassandraRepository ● (Spring Data) : use CassandraOperations (cqlSession) ● Use Batches, LWT and everything show before (hidden by ORM) Object Mapping is your closest ennemy 34
  • 35. Cassandra Days | Sponsored by 5 Developer Pitfalls With Apache Cassandra Cassandra Days | Sponsored by Exploring Apis Document oriented Shape your requests up ! It is not “just CQL” 02 01 05 Data Modeling The good, the bad, the ugly 03 Developers Horror Museum Session, Object Mapping, Frameworks 04 Cassandra Graveyard Tombstones and Zombies Administration and Operation Scale like a boss Agenda
  • 36. Cassandra Days | Sponsored by Administration and Operations ✅ DOs…. ● Mesure Everything (Disk, CPU, RAM) MCAC, JMX, Virtual tables ● GC pause is the top of the iceberg, dig into infrastructure ● Run Reparis frequently (incremental) using Repear ● SNAPSHOTS, Back and restore ● Do not undersestiate security ○ RBAC ○ Encryption at rest ○ Internode communications ❌ DO NOT…. ● TUNE BEFORE UNDERSTAND
  • 37. Cassandra Days | Sponsored by 5 Developer Pitfalls With Apache Cassandra Cassandra Day | Berlin Discover the real power of NoSQL September 20th, 2022 Thank You clunven clunven Thank you !