Apache Cassandra 
Fundamentals 
or: 
How I stopped worrying and learned to love the CAP theorem 
Russell Spitzer 
@RussSpitzer 
Software Engineer in Test at DataStax
Who am I? 
• Former Bioinformatics Student 
at UCSF 
• Work on the integration of 
Cassandra (C*) with Hadoop, 
Solr, and Redacted! 
• I spend a lot of time spinning up 
clusters on EC2, GCE, Azure, … 
http://www.datastax.com/dev/blog/testing-cassandra-1000-nodes-at-a-time 
• Developing new ways to make 
sure that C* Scales
Apache Cassandra is a Linearly Scaling 
and Fault Tolerant NoSQL Database 
Linearly Scaling: 
The throughput of the database 
increases linearly with the 
number of machines 
2x machines = 2x throughput 
http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html 
Fault Tolerant: 
Nodes down != Database Down 
Datacenter down != Database Down
CAP Theorem Limits What 
Distributed Systems can do 
Consistency 
When I ask the same question to any part of the system I should get the same answer 
How many planes do we have?
Consistent 
1 1 1 1 1 1 1
Not Consistent 
1 4 1 2 1 8 1
Availability 
When I ask a question I will get an answer 
How many planes do we have? 
Available 
1 zzzzz *snort* zzz 
Not Available 
I have to wait for major snooze to wake up 
zzzzz *snort* zzz
Partition Tolerance 
I can ask questions even when the system is having intra-system communication problems 
How many planes do we have? 
Tolerant 
Team Edward Team Jacob 
1 
Not Tolerant 
Team Edward Team Jacob 
I’m not sure without asking those vampire lovers and we aren’t speaking
Cassandra is an AP System 
which is Eventually Consistent 
Eventually consistent: 
New information will make it to everyone eventually 
How many planes do we have? 
I don’t know without asking those vampire lovers and we aren’t speaking 
1 1 1 1 1 1 
I just heard we actually have 2! 
How many planes do we have? 
2 2 2 2 2 2 2
Two knobs control fault tolerance in 
C*: Replication and Consistency Level 
Server Side - Replication: 
How many copies of each piece of data should exist in the cluster? 
SimpleStrategy: Replicas 
NetworkTopologyStrategy: Replicas per Datacenter 
[Diagram: a client talks to the coordinator for this operation in a ring of four nodes labeled ABD, ABC, ACD, BCD; RF=3]
Client Side - Consistency Level: 
How many replicas should we check before acknowledgment? 
CL = One: wait for one replica 
CL = Quorum: wait for a majority of the replicas 
[Diagram: a client talks to the coordinator for this operation in a ring of four nodes labeled ABD, ABC, ACD, BCD]
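How the two knobs interact comes down to a little arithmetic. The sketch below is only an illustration, not driver code: the class and method names (ConsistencyMath, quorum, overlaps) are hypothetical. It assumes the usual definitions, quorum = floor(RF / 2) + 1, and that a read is guaranteed to see the latest acknowledged write when read replicas + write replicas > RF.

```java
// Hypothetical helper for reasoning about replication factor (RF)
// and consistency level (CL). Not part of any Cassandra driver.
final class ConsistencyMath {
    // Replicas that must respond at CL = Quorum: a strict majority of RF.
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    // True when every read set must intersect every write set,
    // i.e. readReplicas + writeReplicas > RF.
    static boolean overlaps(int readReplicas, int writeReplicas, int replicationFactor) {
        return readReplicas + writeReplicas > replicationFactor;
    }

    public static void main(String[] args) {
        int rf = 3; // RF=3 as on the replication slide
        int q = quorum(rf);
        System.out.println("quorum(" + rf + ") = " + q);
        System.out.println("Quorum reads + Quorum writes overlap: " + overlaps(q, q, rf));
        System.out.println("One reads + One writes overlap: " + overlaps(1, 1, rf));
    }
}
```

With RF=3, Quorum reads and writes still succeed with one replica down, while CL = One trades the overlap guarantee for lower latency and higher availability.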
Nodes own data whose partition key 
hashes to their token ranges 
Every piece of data belongs on 
the node that owns the 
Murmur3(2.0) Hash of its 
partition key + (RF-1) other 
nodes 
Partition Key: ID: ICBM_432 
Clustering Key: Time: 30 
Rest of Data: Loc: SF, Status: Idle 
ID: ICBM_432 -> Murmur3Hash -> Murmur3: A 
[Diagram: ring of four nodes labeled ABD, ABC, ACD, BCD]
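Replica placement on the ring can be sketched in a few lines. This is a toy model under loud assumptions: CRC32 stands in for the Murmur3 partitioner, each node has a single made-up token, and placement follows the simple "owner plus the next RF-1 nodes clockwise" rule. ToyRing and its methods are hypothetical names, not Cassandra classes.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

// Toy token ring. CRC32 is only a stand-in for Murmur3, and the evenly
// spaced tokens used in main() are invented for illustration.
final class ToyRing {
    private final TreeMap<Long, String> tokenToNode = new TreeMap<>();

    void addNode(String name, long token) {
        tokenToNode.put(token, name);
    }

    // Stand-in hash of the partition key (NOT the real Murmur3 partitioner).
    static long token(String partitionKey) {
        CRC32 crc = new CRC32();
        crc.update(partitionKey.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    // Owner = first node whose token is >= the key's token (wrapping around),
    // followed by the next RF-1 nodes clockwise.
    List<String> replicasForToken(long token, int rf) {
        List<String> replicas = new ArrayList<>();
        int wanted = Math.min(rf, tokenToNode.size());
        Map.Entry<Long, String> e = tokenToNode.ceilingEntry(token);
        if (e == null) e = tokenToNode.firstEntry();
        while (replicas.size() < wanted) {
            replicas.add(e.getValue());
            e = tokenToNode.higherEntry(e.getKey());
            if (e == null) e = tokenToNode.firstEntry();
        }
        return replicas;
    }

    public static void main(String[] args) {
        ToyRing ring = new ToyRing();
        ring.addNode("A", 1L << 30);
        ring.addNode("B", 2L << 30);
        ring.addNode("C", 3L << 30);
        ring.addNode("D", 4L << 30);
        long t = token("ICBM_432");
        System.out.println("ICBM_432 -> token " + t + " -> replicas " + ring.replicasForToken(t, 3));
    }
}
```

With four nodes and RF=3, the possible replica sets are ABC, BCD, CDA and DAB, which are the same four combinations that label the nodes in the ring diagram.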
Cassandra writes are FAST 
due to log-append storage 
[Diagram: each write (Partition Key, Clustering Key, Rest of Data) is appended to the Commit Log and added to a Memtable in memory; Memtables are flushed to disk as SSTables]
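The log-append write path can be sketched as a toy store. This is an illustration only: ToyWritePath, its fields and the flush threshold are hypothetical, the "commit log" and "SSTables" are in-memory lists rather than files, and real Cassandra flushes on memory pressure rather than on a row count.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Toy log-structured store: commit log append + sorted memtable + immutable "SSTables".
final class ToyWritePath {
    final List<String> commitLog = new ArrayList<>();                   // stand-in for the on-disk commit log
    final List<SortedMap<String, String>> sstables = new ArrayList<>(); // stand-in for flushed files
    private TreeMap<String, String> memtable = new TreeMap<>();         // sorted, in memory
    private final int flushThreshold;                                   // hypothetical row-count trigger

    ToyWritePath(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    void write(String key, String value) {
        commitLog.add(key + "=" + value); // 1. sequential append for durability
        memtable.put(key, value);         // 2. update the in-memory table; no read-before-write
        if (memtable.size() >= flushThreshold) flush();
    }

    void flush() {
        if (memtable.isEmpty()) return;
        // The flushed table is never modified again.
        sstables.add(Collections.unmodifiableSortedMap(memtable));
        memtable = new TreeMap<>();
        commitLog.clear(); // flushed data no longer needs the log
    }

    int memtableSize() {
        return memtable.size();
    }

    public static void main(String[] args) {
        ToyWritePath store = new ToyWritePath(2);
        store.write("ICBM_432:30", "Loc: SF, Status: Idle");
        store.write("ICBM_432:45", "Loc: SF, Status: Idle");
        System.out.println("sstables=" + store.sstables.size() + " memtable=" + store.memtableSize());
    }
}
```

Writes never seek or update data in place, which is why they stay fast: the only disk work on the write path is an append.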
Deletes in a distributed 
system are challenging 
We need to keep records of 
deletions in case of network 
partitions 
[Diagram: timeline for Node1 and Node2 with a power outage on Node2; tombstones mark the deletion on each node]
Compactions merge and 
unify data in our SSTables 
SSTable 1 + SSTable 2 = SSTable 3 
Since SSTables are immutable 
this is our chance to 
consolidate rows and remove 
tombstones (After GC Grace)
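A compaction can be sketched as a merge that keeps the newest cell per key and drops old tombstones. This is a simplified model, not Cassandra's implementation: ToyCompaction and Cell are hypothetical names, timestamps are plain numbers, and ties between equal timestamps are ignored.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy compaction: merge several immutable tables into one.
final class ToyCompaction {
    static final class Cell {
        final String value;   // null marks a tombstone (a recorded delete)
        final long writeTime;

        Cell(String value, long writeTime) {
            this.value = value;
            this.writeTime = writeTime;
        }

        boolean isTombstone() {
            return value == null;
        }
    }

    // Keep the newest cell per key, then drop tombstones older than gcGrace.
    static Map<String, Cell> compact(List<Map<String, Cell>> sstables, long now, long gcGrace) {
        Map<String, Cell> merged = new TreeMap<>();
        for (Map<String, Cell> sstable : sstables) {
            for (Map.Entry<String, Cell> e : sstable.entrySet()) {
                Cell current = merged.get(e.getKey());
                if (current == null || e.getValue().writeTime > current.writeTime) {
                    merged.put(e.getKey(), e.getValue());
                }
            }
        }
        // Younger tombstones must survive so replicas that missed the delete can still learn of it.
        merged.values().removeIf(c -> c.isTombstone() && now - c.writeTime > gcGrace);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Cell> first = new HashMap<>();
        first.put("ICBM_432", new Cell("Idle", 10));
        Map<String, Cell> second = new HashMap<>();
        second.put("ICBM_432", new Cell(null, 20)); // deleted later
        Map<String, Cell> out = compact(java.util.Arrays.asList(first, second), 100, 10);
        System.out.println("keys after compaction: " + out.keySet());
    }
}
```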
Layout of Data Allows for Rapid 
Queries Along Clustering Columns 
ID: ICBM_432 
  Time: 30 | Loc: SF | Status: Idle 
  Time: 45 | Loc: SF | Status: Idle 
  Time: 60 | Loc: SF | Status: Idle 
ID: ICBM_900 
  Time: 30 | Loc: Boston | Status: Idle 
  Time: 45 | Loc: Boston | Status: Idle 
  Time: 60 | Loc: Boston | Status: Idle 
ID: ICBM_9210 
  Time: 30 | Loc: Tulsa | Status: Idle 
  Time: 45 | Loc: Tulsa | Status: Idle 
  Time: 60 | Loc: Tulsa | Status: Idle 
Disclaimer: Not exactly like this (Use sstable2json to see real layout)
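Why this layout makes range queries cheap can be shown with sorted maps. The sketch below is an analogy, not the storage engine: ToyPartitionStore is a hypothetical name, a HashMap stands in for locating a partition by its key, and a TreeMap stands in for rows kept sorted by clustering key.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model: partition key -> rows sorted by clustering key (time).
final class ToyPartitionStore {
    private final Map<String, TreeMap<Integer, String>> partitions = new HashMap<>();

    void insert(String id, int time, String rest) {
        partitions.computeIfAbsent(id, k -> new TreeMap<>()).put(time, rest);
    }

    // A range over the clustering key is one contiguous slice of a single partition.
    NavigableMap<Integer, String> slice(String id, int fromTime, int toTime) {
        TreeMap<Integer, String> rows = partitions.get(id);
        if (rows == null) return new TreeMap<>();
        return rows.subMap(fromTime, true, toTime, true);
    }

    public static void main(String[] args) {
        ToyPartitionStore store = new ToyPartitionStore();
        store.insert("ICBM_432", 30, "Loc: SF, Status: Idle");
        store.insert("ICBM_432", 45, "Loc: SF, Status: Idle");
        store.insert("ICBM_432", 60, "Loc: SF, Status: Idle");
        System.out.println(store.slice("ICBM_432", 30, 45));
    }
}
```

A query that fixes the partition key and bounds the clustering key reads neighbouring rows; a query without the partition key would have to visit every partition.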
CQL allows easy definition 
of Table Structures 
ID: ICBM_432 
  Time: 30 | Loc: SF | Status: Idle 
  Time: 45 | Loc: SF | Status: Idle 
  Time: 60 | Loc: SF | Status: Idle 
CREATE TABLE icbmlog ( 
name text,       -- partition key (the ID above) 
time timestamp,  -- clustering key 
location text, 
status text, 
PRIMARY KEY (name, time) 
);
Reading data is FAST but 
limited by disk IO 
[Diagram: the replica merges matching rows (Partition Key, Clustering Key, Rest of Data) from the Memtables in memory and the SSTables on disk using last-write-wins (LWW) and returns the result to the client] 
Read Repair: replicas found to be out of date during a read are updated with the newest data
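Last-write-wins plus read repair can be sketched as follows. This is a simplified model with hypothetical names (ToyReadRepair, Versioned): each replica is a plain map, the read contacts every replica, and repair happens inline, whereas real Cassandra contacts only as many replicas as the consistency level requires.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy coordinator read: pick the newest version, then repair stale replicas.
final class ToyReadRepair {
    static final class Versioned {
        final String value;
        final long writeTime;

        Versioned(String value, long writeTime) {
            this.value = value;
            this.writeTime = writeTime;
        }
    }

    static Versioned read(String key, List<Map<String, Versioned>> replicas) {
        // Last write wins: the version with the highest write time is the answer.
        Versioned newest = null;
        for (Map<String, Versioned> replica : replicas) {
            Versioned v = replica.get(key);
            if (v != null && (newest == null || v.writeTime > newest.writeTime)) newest = v;
        }
        // Read repair: push the winning version to replicas that are missing it or behind.
        if (newest != null) {
            for (Map<String, Versioned> replica : replicas) {
                Versioned v = replica.get(key);
                if (v == null || v.writeTime < newest.writeTime) replica.put(key, newest);
            }
        }
        return newest;
    }

    public static void main(String[] args) {
        Map<String, Versioned> stale = new HashMap<>();
        stale.put("planes", new Versioned("1", 10));
        Map<String, Versioned> fresh = new HashMap<>();
        fresh.put("planes", new Versioned("2", 20));
        List<Map<String, Versioned>> replicas = new ArrayList<>();
        replicas.add(stale);
        replicas.add(fresh);
        System.out.println("planes = " + read("planes", replicas).value);
        System.out.println("stale replica now says " + stale.get("planes").value);
    }
}
```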
New Clients provide a 
holistic view of the C* cluster 
[Diagram: the client makes initial contact with one node in a ring of four nodes labeled ABD, ABC, ACD, BCD] 
Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
Session Objects Are Used 
for Executing Requests 
Session session = cluster.connect(); 
session.execute("DROP KEYSPACE IF EXISTS icbmkey"); 
session.execute("CREATE KEYSPACE icbmkey WITH replication = " 
    + "{'class':'SimpleStrategy','replication_factor':'1'}"); 
For highest throughput use asynchronous methods 
ResultSetFuture executeAsync(Query query) 
Then add a callback or queue the ResultSetFutures
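The "queue the futures" pattern can be sketched without a cluster. The code below is a stand-in only: it uses java.util.concurrent.CompletableFuture in place of the driver's ResultSetFuture, and fakeExecuteAsync is a hypothetical placeholder for session.executeAsync, so that the example runs on its own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Toy version of "issue many requests, then collect the futures".
final class ToyAsyncWrites {
    // Hypothetical placeholder for session.executeAsync(...): completes on a pool thread.
    static CompletableFuture<String> fakeExecuteAsync(String query, ExecutorService pool) {
        return CompletableFuture.supplyAsync(() -> "applied: " + query, pool);
    }

    static List<String> runAll(List<String> queries) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            // Start every request before waiting on any of them.
            List<CompletableFuture<String>> inFlight = new ArrayList<>();
            for (String q : queries) inFlight.add(fakeExecuteAsync(q, pool));
            // Then drain the queue; results come back in submission order.
            List<String> results = new ArrayList<>();
            for (CompletableFuture<String> f : inFlight) results.add(f.join());
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runAll(Arrays.asList("INSERT 1", "INSERT 2", "INSERT 3")));
    }
}
```

The point is that requests overlap in flight instead of each one waiting for the previous response; in a real application the number of outstanding futures should be bounded.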
Token Aware Policies reduce 
the number of intra-network requests 
made 
[Diagram: the client sends a request for token range A directly to a node that holds A, in a ring of four nodes labeled ABD, ABC, ACD, BCD]
Prepared statements allow for 
sending less data over the wire 
Query is prepared on all nodes by driver 
Prepared batch statements 
can further improve throughput 
PreparedStatement ps = session.prepare("INSERT INTO messages (user_id, msg_id, title, body) VALUES (?, ?, ?, ?)"); 
BatchStatement batch = new BatchStatement(); 
batch.add(ps.bind(uid, mid1, title1, body1)); 
batch.add(ps.bind(uid, mid2, title2, body2)); 
batch.add(ps.bind(uid, mid3, title3, body3)); 
session.execute(batch);
Avoid 
• Preparing statements more than once 
• Creating batches which are too large 
• Running statements in serial 
• Using consistency levels higher than you need 
• Secondary Indexes in your main queries 
• or really at all unless you are doing analytics
Have fun with C* 
Questions?

Cassandra Fundamentals - C* 2.0

  • 1. Apache Cassandra Fundamentals or: How I stopped worrying and learned to love the CAP theorem Russell Spitzer @RussSpitzer Software Engineer in Test at DataStax
  • 2. Who am I? • Former Bioinformatics Student at UCSF • Work on the integration of Cassandra (C*) with Hadoop, Solr, and Redacted! • I spend a lot of time spinning up clusters on EC2, GCE, Azure, … http://www.datastax.com/dev/blog/testing-cassandra-1000-nodes-at-a-time • Developing new ways to make sure that C* Scales
  • 3. Apache Cassandra is a Linearly Scaling and Fault Tolerant NoSQL Database Linearly Scaling: The power of the database increases linearly with the number of machines 2x machines = 2x throughput http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html Fault Tolerant: Nodes down != Database Down Datacenter down != Database Down
  • 4. CAP Theorem Limits What Distributed Systems can do Consistency When I ask the same question to any part of the system I should get the same answer How many planes do we have?
  • 5. CAP Theorem Limits What Distributed Systems can do Consistency When I ask the same question to any part of the system I should get the same answer How many planes do we have? Consistent 1 1 1 1 1 1 1
  • 6. CAP Theorem Limits What Distributed Systems can do Consistency When I ask the same question to any part of the system I should get the same answer How many planes do we have? Not Consistent 1 4 1 2 1 8 1
  • 7. CAP Theorem Limits What Distributed Systems can do Availability When I ask a question I will get an answer How many planes do we have? Available 1 zzzzz *snort* zzz
  • 8. CAP Theorem Limits What Distributed Systems can do Availability When I ask a question I will get an answer How many planes do we have? I have to wait for major snooze to wake up zzzzz *snort* zzz Not Available
  • 9. CAP Theorem Limits What Distributed Systems can do Partition Tolerance I can ask questions even when the system is having intra-system communication problems How many planes do we have? Team Edward Team Jacob 1 Tolerant
  • 10. CAP Theorem Limits What Distributed Systems can do Partition Tolerance I can ask questions even when the system is having intra-system communication problems How many planes do we have? Not Tolerant Team Edward Team Jacob I’m not sure without asking those vampire lovers and we aren’t speaking
  • 11. Cassandra is an AP System which is Eventually Consistent Eventually consistent: New information will make it to everyone eventually How many planes do we have? How many planes do we have? I don’t know without asking those vampire lovers and we aren’t speaking 1 1 1 1 1 1 I just heard! We actually have 2 2 2 2 2 2 2 2
  • 12. Two knobs control fault tolerance in C*: Replication and Consistency Level Server Side - Replication: How many copies of the data should exist in the cluster? Coordinator for this operation ABD ABC ACD BCD RF=3 Client SimpleStrategy: Replicas NetworkTopologyStrategy: Replicas per Datacenter
  • 13. Two knobs control fault tolerance in C*: Replication and Consistency Level Client Side - Consistency Level: How many replicas should we check before acknowledgment? ABD ABC ACD BCD Client Coordinator for this operation CL = One
  • 14. Two knobs control fault tolerance in C*: Replication and Consistency Level Client Side - Consistency Level: How many replicas should we check before acknowledgment? ABD ABC ACD BCD CL = Quorum Client Coordinator for this operation
  • 15. Nodes own data whose partition key hashes to their token ranges ABD ABC ACD BCD Every piece of data belongs on the node that owns the Murmur3 (2.0) hash of its partition key + (RF-1) other nodes Partition Key Clustering Key Rest of Data ID: ICBM_432 Time: 30 Loc: SF, Status: Idle ID: ICBM_432 Murmur3Hash Murmur3: A
  • 16. Cassandra writes are FAST due to log-append storage Par Clu Re Memory Memtable Memtable Memtable Commit Log Par Clu Re Par Clu Re Par Clu Re Disk Flushed SSTable SSTable
  • 17. Deletes in a distributed System are Challenging We need to keep records of deletions in case of network partitions Node1 Node2 Power Outage Time Tombstone Tombstone Tombstone
  • 18. Compactions merge and unify data in our SSTables SSTable 1 + SSTable 2 SSTable 3 Since SSTables are immutable, this is our chance to consolidate rows and remove tombstones (after GC grace)
  • 19. Layout of Data Allows for Rapid Queries Along Clustering Columns ID: ICBM_432 ID: ICBM_900 ID: ICBM_9210 Time: 30 Loc: SF Status: Idle Time: 45 Loc: SF Status: Idle Time: 60 Loc: SF Status: Idle Time: 30 Loc: Boston Status: Idle Time: 45 Loc: Boston Status: Idle Time: 60 Loc: Boston Status: Idle Time: 30 Loc: Tulsa Status: Idle Time: 45 Loc: Tulsa Status: Idle Time: 60 Loc: Tulsa Status: Idle Disclaimer: Not exactly like this (Use sstable2json to see real layout)
  • 20. CQL allows easy definition of Table Structures ID: ICBM_432 Time: 30 Loc: SF Status: Idle Time: 45 Loc: SF Status: Idle Time: 60 Loc: SF Status: Idle CREATE TABLE icbmlog ( name text, time timestamp, location text, status text, PRIMARY KEY (name,time) );
  • 21. Reading data is FAST but limited by disk IO Memory Memtable Memtable Memtable Commit Log Par Clu Re Par Clu Re Par Clu Re Disk SSTable SSTable Client Par Clu Re LWW Replica Par Clu Re
  • 22. Reading data is FAST but limited by disk IO Memory Memtable Memtable Memtable Commit Log Par Clu Re Par Clu Re Par Clu Re Disk SSTable SSTable Client Par Clu Re LWW Replica Par Clu Re Read Repair
  • 23. New Clients provide a holistic view of the C* cluster Client ABD ABC ACD BCD Initial Contact Cluster.builder().addContactPoint("127.0.0.1").build()
  • 24. Session Objects Are Used for Executing Requests session = cluster.connect() session.execute("DROP KEYSPACE IF EXISTS icbmkey") session.execute("CREATE KEYSPACE icbmkey with replication = {'class':'SimpleStrategy','replication_factor':'1'}") For highest throughput use asynchronous methods ResultSetFuture executeAsync(Query query) Then add a callback or queue the ResultSetFutures ResultSetFuture ResultSetFuture ResultSetFuture
  • 25. Token Aware Policies reduce the number of intra-network requests made Client ABD ABC ACD BCD A
  • 26. Prepared statements allow for sending less data over the wire Query is prepared on all nodes by driver Prepared batch statements can further improve throughput PreparedStatement ps = session.prepare("INSERT INTO messages (user_id, msg_id, title, body) VALUES (?, ?, ?, ?)"); BatchStatement batch = new BatchStatement(); batch.add(ps.bind(uid, mid1, title1, body1)); batch.add(ps.bind(uid, mid2, title2, body2)); batch.add(ps.bind(uid, mid3, title3, body3)); session.execute(batch);
  • 27. Avoid • Preparing statements more than once • Creating batches which are too large • Running statements in serial • Using consistency levels higher than you need • Secondary indexes in your main queries • or really at all unless you are doing analytics
  • 28. Have fun with C* Questions?
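
Slides 12 to 14 describe replication factor (RF) and consistency level (CL) as the two knobs for fault tolerance. The arithmetic behind them can be sketched in a few lines of Java. This is an illustrative sketch only: the class and method names are hypothetical and are not part of Cassandra or the DataStax driver. It assumes the usual definitions, namely that QUORUM is a majority of replicas and that a read is certain to overlap the latest acknowledged write when read replicas plus write replicas exceed RF.

```java
// Illustrative sketch; QuorumSketch and its methods are hypothetical names,
// not part of Cassandra or any driver API.
final class QuorumSketch {

    // QUORUM is a majority of the replicas: floor(RF / 2) + 1.
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    // The replicas contacted by a read and by a write must share at least
    // one node when R + W > RF, so the read sees the latest acknowledged write.
    static boolean readsSeeLatestWrite(int readReplicas, int writeReplicas, int replicationFactor) {
        return readReplicas + writeReplicas > replicationFactor;
    }

    // How many replicas may be down while the consistency level can still be met.
    static int toleratedFailures(int requiredReplicas, int replicationFactor) {
        return replicationFactor - requiredReplicas;
    }

    public static void main(String[] args) {
        int rf = 3; // RF=3, as on slide 12
        int q = quorum(rf);
        System.out.println("Quorum for RF=3: " + q);
        System.out.println("Quorum reads + quorum writes overlap: " + readsSeeLatestWrite(q, q, rf));
        System.out.println("CL=ONE reads + CL=ONE writes overlap: " + readsSeeLatestWrite(1, 1, rf));
        System.out.println("Replicas that may be down at QUORUM: " + toleratedFailures(q, rf));
    }
}
```

With RF=3, reading and writing at QUORUM keeps the two sets overlapping while still tolerating one replica being down; CL=ONE tolerates more failures but gives up the overlap guarantee, which is the trade-off slide 27 hints at with "consistency levels higher than you need".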
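
Slide 15 says a row lives on the node that owns the hash of its partition key, plus (RF-1) other nodes. A minimal sketch of that placement, in the spirit of SimpleStrategy, walks the ring clockwise from the owning node. Everything here is an assumption made for illustration: the class name, the token values 0/100/200/300, and the use of `String.hashCode()` as a stand-in for Murmur3 (which is not in the JDK). Real clusters use the full 64-bit token space and may use virtual nodes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Illustrative sketch; TokenRingSketch is a hypothetical name, not a Cassandra class.
final class TokenRingSketch {

    // Maps each node's token to its name. A node owns the range that ends at its token.
    private final TreeMap<Long, String> ring = new TreeMap<>();

    void addNode(String name, long token) {
        ring.put(token, name);
    }

    // Stand-in hash for the demo only; Cassandra 2.0's default partitioner uses Murmur3.
    static long tokenFor(String partitionKey) {
        return partitionKey.hashCode();
    }

    // The first replica is the node with the smallest token >= the key's token
    // (wrapping to the start of the ring); the rest are the next nodes clockwise.
    List<String> replicasForToken(long token, int replicationFactor) {
        List<String> replicas = new ArrayList<>();
        if (ring.isEmpty()) {
            return replicas;
        }
        Long current = ring.ceilingKey(token);
        if (current == null) {
            current = ring.firstKey(); // wrapped past the highest token
        }
        int wanted = Math.min(replicationFactor, ring.size());
        while (replicas.size() < wanted) {
            replicas.add(ring.get(current));
            current = ring.higherKey(current);
            if (current == null) {
                current = ring.firstKey();
            }
        }
        return replicas;
    }

    public static void main(String[] args) {
        TokenRingSketch ring = new TokenRingSketch();
        ring.addNode("A", 0L);
        ring.addNode("B", 100L);
        ring.addNode("C", 200L);
        ring.addNode("D", 300L);

        // Explicit tokens keep the demo easy to follow by hand.
        System.out.println("token 50  -> " + ring.replicasForToken(50L, 3));
        System.out.println("token 350 -> " + ring.replicasForToken(350L, 3));
        // A hashed key works the same way, just with a less readable token.
        System.out.println("ICBM_432  -> " + ring.replicasForToken(tokenFor("ICBM_432"), 3));
    }
}
```

A token-aware client (slide 25) does essentially this lookup on its side, so it can send a request straight to a replica instead of to an arbitrary coordinator.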