Introduction to Cassandra

•

1 gefällt mir•1,671 views

DataStax Academy

©2013 DataStax Conﬁdential. Do not distribute without consent.
@rustyrazorblade
Jon Haddad 
Technical Evangelist, Datastax
Introduction to Cassandra
1

High Level Architecture
• Ring based replication
• Only 1 type of server (cassandra)
• All nodes hold data and can answer
queries
• No SPOF
• Build for HA & Scalability
• Multi-DC
• Data is found by key (CQL)
• Runs on JVM

Hash Ring
• Key is hashed to a position on the
ring
• Data is replicated to RF=N servers
• Using a snitch (build in feature) you
can ensure replicas are located on
different racks / AZ

CAP Tradeoffs
!
• Cassandra chooses Availability &
Partition Tolerance over Consistency
• Replication factor (RF)
• Queries have tunable consistency level
• ALL, QUORUM, ONE

Data Structures
• Like an RDBMS, Cassandra uses a Table to
store data
• But there’s where the similarities end
• Partitions within tables
• Rows within partitions (or a single row)
• CQL to create tables & query data
• Partition keys determine where a partition
is found
• Clustering keys determine ordering of rows
within a partition
Table
Partition
Row
Keyspace

Example: Single Row Partition
• Simple User system
• Identified by name (pk)
• 1 Row per partition
• This is familiar territory
name age job
jon 33 evangelist
luke 33 evangelist
old pete 108 retired
s. seagal 62 actor
JCVD 53 actor
cqlsh:demo> select * from user WHERE name = 'JCVD'
cqlsh:demo> create table user
(name text primary key,
age int,
job text);

Example: Multiple Rows
• Comments on photos
• Comments are always selected by
the photo_id
• There are only 4 rows in 2 partitions
• In the real world, use UUIDs instead
of int for PK
photo_id comment_id user comment
5 1 jon hi
5 2 luke oh hey
5 3 JCVD AHHHHH!!!
6 4 jon great pic
select * from comment where photo_id=5
create table comment
( photo_id int,
comment_id int,
user text,
comment text,
primary key (photo_id, comment_id));

Partition with Clustering
photo_id comment_id name comment comment_id user comment comment_id user comment
5 1 jon hi 2 luke oh hey 3 JCVD AHHHHH!!!
6 4 jon great pic
• Multiple rows are transposed into a single partition
• Partitions vary in size
• Old terminology - "wide row"

The Write Path
• Writes are written to any node in the cluster
(coordinator)
• Writes are written to commit log, then to
meltable
• Memtable flushed to disk periodically
(sstable)
• New memtable is created in memory
• Deletes are actually a special write case,
called a “tombstone”

What is an SSTable?
• Immutable data file
• Deletes are written as tombstones
• Every write includes a timestamp of when it
was written
• Partition is spread across multiple SSTables
• Same column can be in multiple SSTables
• Merged through compaction, only latest
timestamp is kept
sstable sstable sstable
sstable

The Read Path
• Any server may be queried, it acts as the
coordinator
• Contacts nodes with the requested key
• On each node, data is pulled from
SSTables and merged
• Consistency< ALL performs read repair
in background (read_repair_chance)

CQL Data Types
Basic Types Collections
text uuid counter map
int timeuuid list
decimal set
blob
Read the CQL documentation for the full list of types

Pro Data Modeling
• How do I query my data if I can only
query by key?
• Denormalize!
• Create multiple views into your data
(multiple tables)
• Cassandra is built for fast writes
• Use fast writes to do as few reads as
possible

©2013 DataStax Conﬁdential. Do not distribute without consent. 14

Weitere ähnliche Inhalte

Was ist angesagt?

RaidPrateek Gupta

Sdc challenges-2012Gluster.org

PostgreSQL NewsPeter Eisentraut

Sharding with MongoDB (Eliot Horowitz)MongoDB

mogpresHiroshi Ono

MongoDB Administration 20110922radiocats

State of Cassandra, 2011jbellis

Lost with data consistencyMichał Gryglicki

Openvz boothOpenVZ

MongoDB Administration ~ Kevin Hansonhungarianhc

Ocfs2 storageTimothy Krupinski

XS Boston 2008 QuantitativeThe Linux Foundation

Networking in iOS NSURLSession & NSStreamManjula Jonnalagadda

Migrating from MySQL to MongoDBJames Carr

ZFS on the server and the desktop: N+1 ways to better store your dataMatthias van der Heide

Data Focused Docker Clustering. Docker Hamburgmsh100

ClusterYasir Wani

Staying friendly with the gcOren Eini

Storing and distributing dataPhil Cryer

Cassandra integrationsT Jake Luciani

Was ist angesagt? (20)

Raid

Sdc challenges-2012

PostgreSQL News

Sharding with MongoDB (Eliot Horowitz)

mogpres

MongoDB Administration 20110922

State of Cassandra, 2011

Lost with data consistency

Openvz booth

MongoDB Administration ~ Kevin Hanson

Ocfs2 storage

XS Boston 2008 Quantitative

Networking in iOS NSURLSession & NSStream

Migrating from MySQL to MongoDB

ZFS on the server and the desktop: N+1 ways to better store your data

Data Focused Docker Clustering. Docker Hamburg

Cluster

Staying friendly with the gc

Storing and distributing data

Cassandra integrations

Ähnlich wie Introduction to Cassandra

Cassandra Day Denver 2014: Introduction to Apache CassandraDataStax Academy

Cassandra Day London 2015: Introduction to Apache Cassandra and DataStax Ente...DataStax Academy

Cassandra Day Chicago 2015: Introduction to Apache Cassandra & DataStax Enter...DataStax Academy

Intro to CassandraJon Haddad

L6.sp17.pptxSudheerKumar499932

Cassandra trainingAndrás Fehér

Deep Dive on Amazon RedshiftAmazon Web Services

An Introduction to Cassandra - Oracle User GroupCarlos Juzarte Rolo

What Every Developer Should Know About Database Scalabilityjbellis

Why you should care about data layout in the file system with Cheng Lian and ...Databricks

Design Patterns For Distributed NO-reational databaseslovingprince58

Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...Amazon Web Services

Deep Dive on Amazon RedshiftAmazon Web Services

Cassandra 101radicalbit

Hbase schema design and sizing apache-con europe - nov 2012Chris Huang

NoSQL databasesMarin Dimitrov

Design Patterns for Distributed Non-Relational Databasesguestdfd1ec

Data Warehousing with Amazon RedshiftAmazon Web Services

Andy Parsons Pivotal June 2011Andy Parsons

Drop acidMike Feltman

Ähnlich wie Introduction to Cassandra (20)

Cassandra Day Denver 2014: Introduction to Apache Cassandra

Cassandra Day London 2015: Introduction to Apache Cassandra and DataStax Ente...

Cassandra Day Chicago 2015: Introduction to Apache Cassandra & DataStax Enter...

Intro to Cassandra

L6.sp17.pptx

Cassandra training

Deep Dive on Amazon Redshift

An Introduction to Cassandra - Oracle User Group

What Every Developer Should Know About Database Scalability

Why you should care about data layout in the file system with Cheng Lian and ...

Design Patterns For Distributed NO-reational databases

Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...

Deep Dive on Amazon Redshift

Cassandra 101

Hbase schema design and sizing apache-con europe - nov 2012

NoSQL databases

Design Patterns for Distributed Non-Relational Databases

Data Warehousing with Amazon Redshift

Andy Parsons Pivotal June 2011

Drop acid

Mehr von DataStax Academy

Forrester CXNYC 2017 - Delivering great real-time cx is a true craftDataStax Academy

Introduction to DataStax Enterprise Graph DatabaseDataStax Academy

Introduction to DataStax Enterprise Advanced Replication with Apache CassandraDataStax Academy

Cassandra on Docker @ Walmart LabsDataStax Academy

Cassandra 3.0 Data ModelingDataStax Academy

Cassandra Adoption on Cisco UCS & Open stackDataStax Academy

Data Modeling for Apache CassandraDataStax Academy

Coursera Cassandra DriverDataStax Academy

Production Ready CassandraDataStax Academy

Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonDataStax Academy

Cassandra @ Sony: The good, the bad, and the ugly part 1DataStax Academy

Cassandra @ Sony: The good, the bad, and the ugly part 2DataStax Academy

Standing Up Your First ClusterDataStax Academy

Real Time Analytics with DseDataStax Academy

Introduction to Data Modeling with Apache CassandraDataStax Academy

Cassandra Core ConceptsDataStax Academy

Enabling Search in your Cassandra Application with DataStax EnterpriseDataStax Academy

Bad Habits Die Hard DataStax Academy

Advanced Data Modeling with Apache CassandraDataStax Academy

Advanced CassandraDataStax Academy

Mehr von DataStax Academy (20)

Forrester CXNYC 2017 - Delivering great real-time cx is a true craft

Introduction to DataStax Enterprise Graph Database

Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra

Cassandra on Docker @ Walmart Labs

Cassandra 3.0 Data Modeling

Cassandra Adoption on Cisco UCS & Open stack

Data Modeling for Apache Cassandra

Coursera Cassandra Driver

Production Ready Cassandra

Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python

Cassandra @ Sony: The good, the bad, and the ugly part 1

Cassandra @ Sony: The good, the bad, and the ugly part 2

Standing Up Your First Cluster

Real Time Analytics with Dse

Introduction to Data Modeling with Apache Cassandra

Cassandra Core Concepts

Enabling Search in your Cassandra Application with DataStax Enterprise

Bad Habits Die Hard

Advanced Data Modeling with Apache Cassandra

Advanced Cassandra

Introduction to Cassandra

2. High Level Architecture • Ring based replication • Only 1 type of server (cassandra) • All nodes hold data and can answer queries • No SPOF • Build for HA & Scalability • Multi-DC • Data is found by key (CQL) • Runs on JVM

3. Hash Ring • Key is hashed to a position on the ring • Data is replicated to RF=N servers • Using a snitch (build in feature) you can ensure replicas are located on different racks / AZ

4. CAP Tradeoffs ! • Cassandra chooses Availability & Partition Tolerance over Consistency • Replication factor (RF) • Queries have tunable consistency level • ALL, QUORUM, ONE

5. Data Structures • Like an RDBMS, Cassandra uses a Table to store data • But there’s where the similarities end • Partitions within tables • Rows within partitions (or a single row) • CQL to create tables & query data • Partition keys determine where a partition is found • Clustering keys determine ordering of rows within a partition Table Partition Row Keyspace

6. Example: Single Row Partition • Simple User system • Identified by name (pk) • 1 Row per partition • This is familiar territory name age job jon 33 evangelist luke 33 evangelist old pete 108 retired s. seagal 62 actor JCVD 53 actor cqlsh:demo> select * from user WHERE name = 'JCVD' cqlsh:demo> create table user (name text primary key, age int, job text);

7. Example: Multiple Rows • Comments on photos • Comments are always selected by the photo_id • There are only 4 rows in 2 partitions • In the real world, use UUIDs instead of int for PK photo_id comment_id user comment 5 1 jon hi 5 2 luke oh hey 5 3 JCVD AHHHHH!!! 6 4 jon great pic select * from comment where photo_id=5 create table comment ( photo_id int, comment_id int, user text, comment text, primary key (photo_id, comment_id));

8. Partition with Clustering photo_id comment_id name comment comment_id user comment comment_id user comment 5 1 jon hi 2 luke oh hey 3 JCVD AHHHHH!!! 6 4 jon great pic • Multiple rows are transposed into a single partition • Partitions vary in size • Old terminology - "wide row"

9. The Write Path • Writes are written to any node in the cluster (coordinator) • Writes are written to commit log, then to meltable • Memtable flushed to disk periodically (sstable) • New memtable is created in memory • Deletes are actually a special write case, called a “tombstone”

10. What is an SSTable? • Immutable data file • Deletes are written as tombstones • Every write includes a timestamp of when it was written • Partition is spread across multiple SSTables • Same column can be in multiple SSTables • Merged through compaction, only latest timestamp is kept sstable sstable sstable sstable

11. The Read Path • Any server may be queried, it acts as the coordinator • Contacts nodes with the requested key • On each node, data is pulled from SSTables and merged • Consistency< ALL performs read repair in background (read_repair_chance)

12. CQL Data Types Basic Types Collections text uuid counter map int timeuuid list decimal set blob Read the CQL documentation for the full list of types

13. Pro Data Modeling • How do I query my data if I can only query by key? • Denormalize! • Create multiple views into your data (multiple tables) • Cassandra is built for fast writes • Use fast writes to do as few reads as possible

Introduction to Cassandra

Empfohlen

Empfohlen

Weitere ähnliche Inhalte

Was ist angesagt?

Was ist angesagt? (20)

Ähnlich wie Introduction to Cassandra

Ähnlich wie Introduction to Cassandra (20)

Mehr von DataStax Academy

Mehr von DataStax Academy (20)

Introduction to Cassandra