SlideShare ist ein Scribd-Unternehmen logo
1 von 14
Downloaden Sie, um offline zu lesen
©2013 DataStax Confidential. Do not distribute without consent.
@rustyrazorblade
Jon Haddad

Technical Evangelist, Datastax
Introduction to Cassandra
1
High Level Architecture
• Ring based replication
• Only 1 type of server (cassandra)
• All nodes hold data and can answer
queries
• No SPOF
• Build for HA & Scalability
• Multi-DC
• Data is found by key (CQL)
• Runs on JVM
Hash Ring
• Key is hashed to a position on the
ring
• Data is replicated to RF=N servers
• Using a snitch (build in feature) you
can ensure replicas are located on
different racks / AZ
CAP Tradeoffs
!
• Cassandra chooses Availability &
Partition Tolerance over Consistency
• Replication factor (RF)
• Queries have tunable consistency level
• ALL, QUORUM, ONE
Data Structures
• Like an RDBMS, Cassandra uses a Table to
store data
• But there’s where the similarities end
• Partitions within tables
• Rows within partitions (or a single row)
• CQL to create tables & query data
• Partition keys determine where a partition
is found
• Clustering keys determine ordering of rows
within a partition
Table
Partition
Row
Keyspace
Example: Single Row Partition
• Simple User system
• Identified by name (pk)
• 1 Row per partition
• This is familiar territory
name age job
jon 33 evangelist
luke 33 evangelist
old pete 108 retired
s. seagal 62 actor
JCVD 53 actor
cqlsh:demo> select * from user WHERE name = 'JCVD'
cqlsh:demo> create table user 	
(name text primary key, 	
age int, 	
job text);
Example: Multiple Rows
• Comments on photos
• Comments are always selected by
the photo_id
• There are only 4 rows in 2 partitions
• In the real world, use UUIDs instead
of int for PK
photo_id comment_id user comment
5 1 jon hi
5 2 luke oh hey
5 3 JCVD AHHHHH!!!
6 4 jon great pic
select * from comment where photo_id=5
create table comment 	
( photo_id int, 	
comment_id int, 	
user text, 	
comment text, 	
primary key (photo_id, comment_id));
Partition with Clustering
photo_id comment_id name comment comment_id user comment comment_id user comment
5 1 jon hi 2 luke oh hey 3 JCVD AHHHHH!!!
6 4 jon great pic
• Multiple rows are transposed into a single partition
• Partitions vary in size
• Old terminology - "wide row"
The Write Path
• Writes are written to any node in the cluster
(coordinator)
• Writes are written to commit log, then to
meltable
• Memtable flushed to disk periodically
(sstable)
• New memtable is created in memory
• Deletes are actually a special write case,
called a “tombstone”
What is an SSTable?
• Immutable data file
• Deletes are written as tombstones
• Every write includes a timestamp of when it
was written
• Partition is spread across multiple SSTables
• Same column can be in multiple SSTables
• Merged through compaction, only latest
timestamp is kept
sstable sstable sstable
sstable
The Read Path
• Any server may be queried, it acts as the
coordinator
• Contacts nodes with the requested key
• On each node, data is pulled from
SSTables and merged
• Consistency< ALL performs read repair
in background (read_repair_chance)
CQL Data Types
Basic Types Collections
text uuid counter map
int timeuuid list
decimal set
blob
Read the CQL documentation for the full list of types
Pro Data Modeling
• How do I query my data if I can only
query by key?
• Denormalize!
• Create multiple views into your data
(multiple tables)
• Cassandra is built for fast writes
• Use fast writes to do as few reads as
possible
©2013 DataStax Confidential. Do not distribute without consent. 14

Weitere ähnliche Inhalte

Was ist angesagt?

Sdc challenges-2012
Sdc challenges-2012Sdc challenges-2012
Sdc challenges-2012Gluster.org
 
Sharding with MongoDB (Eliot Horowitz)
Sharding with MongoDB (Eliot Horowitz)Sharding with MongoDB (Eliot Horowitz)
Sharding with MongoDB (Eliot Horowitz)MongoDB
 
MongoDB Administration 20110922
MongoDB Administration 20110922MongoDB Administration 20110922
MongoDB Administration 20110922radiocats
 
State of Cassandra, 2011
State of Cassandra, 2011State of Cassandra, 2011
State of Cassandra, 2011jbellis
 
Openvz booth
Openvz boothOpenvz booth
Openvz boothOpenVZ
 
MongoDB Administration ~ Kevin Hanson
MongoDB Administration ~ Kevin HansonMongoDB Administration ~ Kevin Hanson
MongoDB Administration ~ Kevin Hansonhungarianhc
 
Networking in iOS NSURLSession & NSStream
Networking in iOS NSURLSession & NSStreamNetworking in iOS NSURLSession & NSStream
Networking in iOS NSURLSession & NSStreamManjula Jonnalagadda
 
Migrating from MySQL to MongoDB
Migrating from MySQL to MongoDBMigrating from MySQL to MongoDB
Migrating from MySQL to MongoDBJames Carr
 
ZFS on the server and the desktop: N+1 ways to better store your data
ZFS on the server and the desktop: N+1 ways to better store your dataZFS on the server and the desktop: N+1 ways to better store your data
ZFS on the server and the desktop: N+1 ways to better store your dataMatthias van der Heide
 
Data Focused Docker Clustering. Docker Hamburg
Data Focused Docker Clustering. Docker HamburgData Focused Docker Clustering. Docker Hamburg
Data Focused Docker Clustering. Docker Hamburgmsh100
 
Staying friendly with the gc
Staying friendly with the gcStaying friendly with the gc
Staying friendly with the gcOren Eini
 
Storing and distributing data
Storing and distributing dataStoring and distributing data
Storing and distributing dataPhil Cryer
 
Cassandra integrations
Cassandra integrationsCassandra integrations
Cassandra integrationsT Jake Luciani
 

Was ist angesagt? (20)

Raid
RaidRaid
Raid
 
Sdc challenges-2012
Sdc challenges-2012Sdc challenges-2012
Sdc challenges-2012
 
PostgreSQL News
PostgreSQL NewsPostgreSQL News
PostgreSQL News
 
Sharding with MongoDB (Eliot Horowitz)
Sharding with MongoDB (Eliot Horowitz)Sharding with MongoDB (Eliot Horowitz)
Sharding with MongoDB (Eliot Horowitz)
 
mogpres
mogpresmogpres
mogpres
 
MongoDB Administration 20110922
MongoDB Administration 20110922MongoDB Administration 20110922
MongoDB Administration 20110922
 
State of Cassandra, 2011
State of Cassandra, 2011State of Cassandra, 2011
State of Cassandra, 2011
 
Lost with data consistency
Lost with data consistencyLost with data consistency
Lost with data consistency
 
Openvz booth
Openvz boothOpenvz booth
Openvz booth
 
MongoDB Administration ~ Kevin Hanson
MongoDB Administration ~ Kevin HansonMongoDB Administration ~ Kevin Hanson
MongoDB Administration ~ Kevin Hanson
 
Ocfs2 storage
Ocfs2 storageOcfs2 storage
Ocfs2 storage
 
XS Boston 2008 Quantitative
XS Boston 2008 QuantitativeXS Boston 2008 Quantitative
XS Boston 2008 Quantitative
 
Networking in iOS NSURLSession & NSStream
Networking in iOS NSURLSession & NSStreamNetworking in iOS NSURLSession & NSStream
Networking in iOS NSURLSession & NSStream
 
Migrating from MySQL to MongoDB
Migrating from MySQL to MongoDBMigrating from MySQL to MongoDB
Migrating from MySQL to MongoDB
 
ZFS on the server and the desktop: N+1 ways to better store your data
ZFS on the server and the desktop: N+1 ways to better store your dataZFS on the server and the desktop: N+1 ways to better store your data
ZFS on the server and the desktop: N+1 ways to better store your data
 
Data Focused Docker Clustering. Docker Hamburg
Data Focused Docker Clustering. Docker HamburgData Focused Docker Clustering. Docker Hamburg
Data Focused Docker Clustering. Docker Hamburg
 
Cluster
ClusterCluster
Cluster
 
Staying friendly with the gc
Staying friendly with the gcStaying friendly with the gc
Staying friendly with the gc
 
Storing and distributing data
Storing and distributing dataStoring and distributing data
Storing and distributing data
 
Cassandra integrations
Cassandra integrationsCassandra integrations
Cassandra integrations
 

Ähnlich wie Introduction to Cassandra

Cassandra Day Denver 2014: Introduction to Apache Cassandra
Cassandra Day Denver 2014: Introduction to Apache CassandraCassandra Day Denver 2014: Introduction to Apache Cassandra
Cassandra Day Denver 2014: Introduction to Apache CassandraDataStax Academy
 
Cassandra Day London 2015: Introduction to Apache Cassandra and DataStax Ente...
Cassandra Day London 2015: Introduction to Apache Cassandra and DataStax Ente...Cassandra Day London 2015: Introduction to Apache Cassandra and DataStax Ente...
Cassandra Day London 2015: Introduction to Apache Cassandra and DataStax Ente...DataStax Academy
 
Cassandra Day Chicago 2015: Introduction to Apache Cassandra & DataStax Enter...
Cassandra Day Chicago 2015: Introduction to Apache Cassandra & DataStax Enter...Cassandra Day Chicago 2015: Introduction to Apache Cassandra & DataStax Enter...
Cassandra Day Chicago 2015: Introduction to Apache Cassandra & DataStax Enter...DataStax Academy
 
Intro to Cassandra
Intro to CassandraIntro to Cassandra
Intro to CassandraJon Haddad
 
An Introduction to Cassandra - Oracle User Group
An Introduction to Cassandra - Oracle User GroupAn Introduction to Cassandra - Oracle User Group
An Introduction to Cassandra - Oracle User GroupCarlos Juzarte Rolo
 
What Every Developer Should Know About Database Scalability
What Every Developer Should Know About Database ScalabilityWhat Every Developer Should Know About Database Scalability
What Every Developer Should Know About Database Scalabilityjbellis
 
Why you should care about data layout in the file system with Cheng Lian and ...
Why you should care about data layout in the file system with Cheng Lian and ...Why you should care about data layout in the file system with Cheng Lian and ...
Why you should care about data layout in the file system with Cheng Lian and ...Databricks
 
Design Patterns For Distributed NO-reational databases
Design Patterns For Distributed NO-reational databasesDesign Patterns For Distributed NO-reational databases
Design Patterns For Distributed NO-reational databaseslovingprince58
 
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...Amazon Web Services
 
Hbase schema design and sizing apache-con europe - nov 2012
Hbase schema design and sizing   apache-con europe - nov 2012Hbase schema design and sizing   apache-con europe - nov 2012
Hbase schema design and sizing apache-con europe - nov 2012Chris Huang
 
Design Patterns for Distributed Non-Relational Databases
Design Patterns for Distributed Non-Relational DatabasesDesign Patterns for Distributed Non-Relational Databases
Design Patterns for Distributed Non-Relational Databasesguestdfd1ec
 
Data Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftData Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftAmazon Web Services
 
Andy Parsons Pivotal June 2011
Andy Parsons Pivotal June 2011Andy Parsons Pivotal June 2011
Andy Parsons Pivotal June 2011Andy Parsons
 

Ähnlich wie Introduction to Cassandra (20)

Cassandra Day Denver 2014: Introduction to Apache Cassandra
Cassandra Day Denver 2014: Introduction to Apache CassandraCassandra Day Denver 2014: Introduction to Apache Cassandra
Cassandra Day Denver 2014: Introduction to Apache Cassandra
 
Cassandra Day London 2015: Introduction to Apache Cassandra and DataStax Ente...
Cassandra Day London 2015: Introduction to Apache Cassandra and DataStax Ente...Cassandra Day London 2015: Introduction to Apache Cassandra and DataStax Ente...
Cassandra Day London 2015: Introduction to Apache Cassandra and DataStax Ente...
 
Cassandra Day Chicago 2015: Introduction to Apache Cassandra & DataStax Enter...
Cassandra Day Chicago 2015: Introduction to Apache Cassandra & DataStax Enter...Cassandra Day Chicago 2015: Introduction to Apache Cassandra & DataStax Enter...
Cassandra Day Chicago 2015: Introduction to Apache Cassandra & DataStax Enter...
 
Intro to Cassandra
Intro to CassandraIntro to Cassandra
Intro to Cassandra
 
L6.sp17.pptx
L6.sp17.pptxL6.sp17.pptx
L6.sp17.pptx
 
Cassandra training
Cassandra trainingCassandra training
Cassandra training
 
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
 
An Introduction to Cassandra - Oracle User Group
An Introduction to Cassandra - Oracle User GroupAn Introduction to Cassandra - Oracle User Group
An Introduction to Cassandra - Oracle User Group
 
What Every Developer Should Know About Database Scalability
What Every Developer Should Know About Database ScalabilityWhat Every Developer Should Know About Database Scalability
What Every Developer Should Know About Database Scalability
 
Why you should care about data layout in the file system with Cheng Lian and ...
Why you should care about data layout in the file system with Cheng Lian and ...Why you should care about data layout in the file system with Cheng Lian and ...
Why you should care about data layout in the file system with Cheng Lian and ...
 
Design Patterns For Distributed NO-reational databases
Design Patterns For Distributed NO-reational databasesDesign Patterns For Distributed NO-reational databases
Design Patterns For Distributed NO-reational databases
 
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
 
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
 
Cassandra 101
Cassandra 101Cassandra 101
Cassandra 101
 
Hbase schema design and sizing apache-con europe - nov 2012
Hbase schema design and sizing   apache-con europe - nov 2012Hbase schema design and sizing   apache-con europe - nov 2012
Hbase schema design and sizing apache-con europe - nov 2012
 
NoSQL databases
NoSQL databasesNoSQL databases
NoSQL databases
 
Design Patterns for Distributed Non-Relational Databases
Design Patterns for Distributed Non-Relational DatabasesDesign Patterns for Distributed Non-Relational Databases
Design Patterns for Distributed Non-Relational Databases
 
Data Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftData Warehousing with Amazon Redshift
Data Warehousing with Amazon Redshift
 
Andy Parsons Pivotal June 2011
Andy Parsons Pivotal June 2011Andy Parsons Pivotal June 2011
Andy Parsons Pivotal June 2011
 
Drop acid
Drop acidDrop acid
Drop acid
 

Mehr von DataStax Academy

Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craftForrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craftDataStax Academy
 
Introduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph DatabaseIntroduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph DatabaseDataStax Academy
 
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache CassandraIntroduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache CassandraDataStax Academy
 
Cassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsCassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsDataStax Academy
 
Cassandra 3.0 Data Modeling
Cassandra 3.0 Data ModelingCassandra 3.0 Data Modeling
Cassandra 3.0 Data ModelingDataStax Academy
 
Cassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackCassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackDataStax Academy
 
Data Modeling for Apache Cassandra
Data Modeling for Apache CassandraData Modeling for Apache Cassandra
Data Modeling for Apache CassandraDataStax Academy
 
Production Ready Cassandra
Production Ready CassandraProduction Ready Cassandra
Production Ready CassandraDataStax Academy
 
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonCassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonDataStax Academy
 
Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1DataStax Academy
 
Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2DataStax Academy
 
Standing Up Your First Cluster
Standing Up Your First ClusterStanding Up Your First Cluster
Standing Up Your First ClusterDataStax Academy
 
Real Time Analytics with Dse
Real Time Analytics with DseReal Time Analytics with Dse
Real Time Analytics with DseDataStax Academy
 
Introduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraIntroduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraDataStax Academy
 
Enabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseEnabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseDataStax Academy
 
Advanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraAdvanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraDataStax Academy
 

Mehr von DataStax Academy (20)

Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craftForrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
 
Introduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph DatabaseIntroduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph Database
 
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache CassandraIntroduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
 
Cassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsCassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart Labs
 
Cassandra 3.0 Data Modeling
Cassandra 3.0 Data ModelingCassandra 3.0 Data Modeling
Cassandra 3.0 Data Modeling
 
Cassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackCassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stack
 
Data Modeling for Apache Cassandra
Data Modeling for Apache CassandraData Modeling for Apache Cassandra
Data Modeling for Apache Cassandra
 
Coursera Cassandra Driver
Coursera Cassandra DriverCoursera Cassandra Driver
Coursera Cassandra Driver
 
Production Ready Cassandra
Production Ready CassandraProduction Ready Cassandra
Production Ready Cassandra
 
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonCassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
 
Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1
 
Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2
 
Standing Up Your First Cluster
Standing Up Your First ClusterStanding Up Your First Cluster
Standing Up Your First Cluster
 
Real Time Analytics with Dse
Real Time Analytics with DseReal Time Analytics with Dse
Real Time Analytics with Dse
 
Introduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraIntroduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache Cassandra
 
Cassandra Core Concepts
Cassandra Core ConceptsCassandra Core Concepts
Cassandra Core Concepts
 
Enabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseEnabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax Enterprise
 
Bad Habits Die Hard
Bad Habits Die Hard Bad Habits Die Hard
Bad Habits Die Hard
 
Advanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraAdvanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache Cassandra
 
Advanced Cassandra
Advanced CassandraAdvanced Cassandra
Advanced Cassandra
 

Introduction to Cassandra

  • 1. ©2013 DataStax Confidential. Do not distribute without consent. @rustyrazorblade Jon Haddad
 Technical Evangelist, Datastax Introduction to Cassandra 1
  • 2. High Level Architecture • Ring based replication • Only 1 type of server (cassandra) • All nodes hold data and can answer queries • No SPOF • Build for HA & Scalability • Multi-DC • Data is found by key (CQL) • Runs on JVM
  • 3. Hash Ring • Key is hashed to a position on the ring • Data is replicated to RF=N servers • Using a snitch (build in feature) you can ensure replicas are located on different racks / AZ
  • 4. CAP Tradeoffs ! • Cassandra chooses Availability & Partition Tolerance over Consistency • Replication factor (RF) • Queries have tunable consistency level • ALL, QUORUM, ONE
  • 5. Data Structures • Like an RDBMS, Cassandra uses a Table to store data • But there’s where the similarities end • Partitions within tables • Rows within partitions (or a single row) • CQL to create tables & query data • Partition keys determine where a partition is found • Clustering keys determine ordering of rows within a partition Table Partition Row Keyspace
  • 6. Example: Single Row Partition • Simple User system • Identified by name (pk) • 1 Row per partition • This is familiar territory name age job jon 33 evangelist luke 33 evangelist old pete 108 retired s. seagal 62 actor JCVD 53 actor cqlsh:demo> select * from user WHERE name = 'JCVD' cqlsh:demo> create table user (name text primary key, age int, job text);
  • 7. Example: Multiple Rows • Comments on photos • Comments are always selected by the photo_id • There are only 4 rows in 2 partitions • In the real world, use UUIDs instead of int for PK photo_id comment_id user comment 5 1 jon hi 5 2 luke oh hey 5 3 JCVD AHHHHH!!! 6 4 jon great pic select * from comment where photo_id=5 create table comment ( photo_id int, comment_id int, user text, comment text, primary key (photo_id, comment_id));
  • 8. Partition with Clustering photo_id comment_id name comment comment_id user comment comment_id user comment 5 1 jon hi 2 luke oh hey 3 JCVD AHHHHH!!! 6 4 jon great pic • Multiple rows are transposed into a single partition • Partitions vary in size • Old terminology - "wide row"
  • 9. The Write Path • Writes are written to any node in the cluster (coordinator) • Writes are written to commit log, then to meltable • Memtable flushed to disk periodically (sstable) • New memtable is created in memory • Deletes are actually a special write case, called a “tombstone”
  • 10. What is an SSTable? • Immutable data file • Deletes are written as tombstones • Every write includes a timestamp of when it was written • Partition is spread across multiple SSTables • Same column can be in multiple SSTables • Merged through compaction, only latest timestamp is kept sstable sstable sstable sstable
  • 11. The Read Path • Any server may be queried, it acts as the coordinator • Contacts nodes with the requested key • On each node, data is pulled from SSTables and merged • Consistency< ALL performs read repair in background (read_repair_chance)
  • 12. CQL Data Types Basic Types Collections text uuid counter map int timeuuid list decimal set blob Read the CQL documentation for the full list of types
  • 13. Pro Data Modeling • How do I query my data if I can only query by key? • Denormalize! • Create multiple views into your data (multiple tables) • Cassandra is built for fast writes • Use fast writes to do as few reads as possible
  • 14. ©2013 DataStax Confidential. Do not distribute without consent. 14