SlideShare ist ein Scribd-Unternehmen logo
1 von 55
Downloaden Sie, um offline zu lesen
Cassandra for impatients
Carlos Alonso (@calonso)MADRID · NOV 27-28 · 2015
MADRID · NOV 27-28 · 2015@calonso
• Introductions
• Cassandra Core concepts
• CQL
• Data modelling and examples
MADRID · NOV 27-28 · 2015@calonso
CREATETABLE users (
id INT PRIMARY KEY,
nameVARCHAR,
surnameVARCHAR,
birthdateTIMESTAMP
);
INSERT INTO users (id, name, surname, birthdate)
VALUES (1,‘Carlos’,‘Alonso’, ’1985-03-19’);
SELECT * FROM users WHERE id = 1;
ALTERTABLE users ADD cityVARCHAR;
UPDATE users SET city = ‘London’WHERE id = 1;
MADRID · NOV 27-28 · 2015@calonso
IMPATIENT LEVEL…
Aha!! That’s SQL, I already know all this stuff.
I’ll start using Cassandra as soon as I get home.
MADRID · NOV 27-28 · 2015@calonso
IMPATIENT LEVEL…
Why is this taking so long?
Have they gone killing the cow for my BigMac?
MADRID · NOV 27-28 · 2015@calonso
IMPATIENT LEVEL…
As soon as I see the loading spinner…
MADRID · NOV 27-28 · 2015@calonso
CARLOS ALONSO
•Spanish Londoner
•MSc Salamanca University, Spain
•Software Engineer @MyDrive Solutions
•Cassandra certified developer
•Cassandra MVP 2015
•@calonso / http://mrcalonso.com
MADRID · NOV 27-28 · 2015@calonso
MYDRIVE SOLUTIONS
• World leading driver profiling company
• Using technology and data to understand
how to improve driving behaviour
• Recently acquired by the Generali Group
• @_MyDrive / http://mydrivesolutions.com
• We are hiring!!
MADRID · NOV 27-28 · 2015@calonso
CASSANDRA
• Open Source distributed database
management system
• Initially developed at Facebook
• Inspired by Amazon’s Dynamo and Google
BigTable papers
• Became Apache top-level project in Feb, 2010
• Nowadays developed by DataStax
Cassandra Core Concepts
Technical introduction to Apache CassandraMADRID · NOV 27-28 · 2015
MADRID · NOV 27-28 · 2015@calonso
THE CAPTHEOREM
MADRID · NOV 27-28 · 2015@calonso
CASSANDRA
• Fast Distributed NoSQL Database
• High Availability
• Linear Scalability => Predictability
• No SPOF
• Multi-DC
• Horizontally scalable => $$$
• Not a drop in replacement for RDBMS
MADRID · NOV 27-28 · 2015@calonso
CLUSTER COMPONENTS
MADRID · NOV 27-28 · 2015@calonso
CLUSTER COMPONENTS
MADRID · NOV 27-28 · 2015@calonso
CONSISTENT HASHING
Hash function
“Carlos” 185664
1773456738847666528349
-894763734895827651234
MADRID · NOV 27-28 · 2015@calonso
REQUEST COORDINATION
• Coordinator:The node designated to coordinate a particular query.
• ANY node can coordinate ANY request.
• No SPOF: One of the main Cassandra’s principles.
• The driver chooses which node will coordinate
MADRID · NOV 27-28 · 2015@calonso
REPLICATION FACTOR
How many copies (replicas) for
your data
MADRID · NOV 27-28 · 2015@calonso
WHY REPLICATION?
• Disaster recovery
• Bring data closer to users (to reduce latencies)
• Workload segregation (analytical vs transactional)
MADRID · NOV 27-28 · 2015@calonso
REPLICATION
Defined at keyspace level
CREATE KEYSPACE <my_keyspace> WITH REPLICATION =
{ “class”:“SimpleStrategy”,“replication_factor”: 2 };
CREATE KEYSPACE <my_keyspace> WITH REPLICATION =
{ “class”:“NetworkTopologyStrategy”,
“dc-east”: 2,“dc-west”: 3 };
MADRID · NOV 27-28 · 2015@calonso
CONSISTENCY LEVEL
How many replicas of your data
must respond ok?
MADRID · NOV 27-28 · 2015@calonso
CONSISTENCY LEVEL
Availability /

Partition tolerance
Consistency
MADRID · NOV 27-28 · 2015@calonso
DriverClient
Partitioner
f81d4fae-…
834
• RF = 3
• CL = QUORUM
• SELECT * … WHERE id = f81d4fae-…
MADRID · NOV 27-28 · 2015@calonso
COMPACTIONS
• Merges most recent partition keys and columns
• Evicts tombstoned columns
• Creates new SSTable
• Rebuilds partition indexes and summaries
• Deletes old SSTables
MADRID · NOV 27-28 · 2015@calonso
COMPACTIONS
• SizeTieredCompactionStrategy
• LeveledCompactionStrategy
• DateTieredCompactionStrategy
CREATETABLE user (
id UUID PRIMARY KEY,
emailTEXT,
nameTEXT,
preferences SET<TEXT>,
) WITH COMPACTION =
{ “class”: “<strategy>”, <params> };
CQL
Cassandra Query LanguageMADRID · NOV 27-28 · 2015
MADRID · NOV 27-28 · 2015@calonso
CQL
CREATETABLE users (
id UUID,
nameVARCHAR,
surnameVARCHAR,
birthdateTIMESTAMP,
PRIMARY KEY(id)
);
Familiar row-column SQL-like approach.
INSERT INTO users (id, name, surname, birthdate)
VALUES (uuid(),‘Carlos’,‘Alonso’, ’1985-03-19’);
SELECT * FROM users WHERE
id = ‘f81d4fae-7dec-11d0-a765-00a0c91e6bf6’;
ALTERTABLE users ADD addressVARCHAR;
MADRID · NOV 27-28 · 2015@calonso
WHERE
• Equality matches:
• SELECT … WHERE album_title = ‘Revolver’;
• IN:
• Only applicable in the last WHERE clause
• SELECT … WHERE album_title = ‘Revolver’ AND number IN (2, 3, 4);
• Range search:
• Only on clustering columns.
• SELECT … WHERE album_title = ‘Revolver’ AND number >= 6 AND number < 2;
MADRID · NOV 27-28 · 2015@calonso
COLLECTIONS
• Set: Uniqueness
• List: Order
• Map: Key-Value pairs
• Tuple: Fixed length
email_addresses SET<VARCHAR>
email_addresses LIST<VARCHAR>
email_addresses MAP<VARCHAR,VARCHAR>
email_addresses TUPLE<VARCHAR,VARCHAR>
MADRID · NOV 27-28 · 2015@calonso
MORE FEATURES
• COUNTERS
• LWT
• TTL
• UDT
• BATCHES
• BATCH + LWT
• STATIC COLUMNS
Data Modelling
Basic concepts for a Cassandra modelMADRID · NOV 27-28 · 2015
MADRID · NOV 27-28 · 2015@calonso
PHYSICAL DATA STRUCTURE
MADRID · NOV 27-28 · 2015@calonso
DATA MODELLING
• Understand your data
• Decide how you’ll query the data
• Define column families to satisfy those queries
• Implement and optimize
MADRID · NOV 27-28 · 2015@calonso
DATA MODELLING
Conceptual
Model
Logical
Model
Physical
Model
Query-Driven
Methodology
Analysis &
Validation
MADRID · NOV 27-28 · 2015@calonso
PRIMARY KEY
PARTITION KEY +
CLUSTERING
COLUMN(S)
MADRID · NOV 27-28 · 2015@calonso
QUERY DRIVEN METHODOLOGY
• Spread data evenly around the cluster
• Minimise the number of partitions read
• Follow the mapping rules:
• Entities and relationships: map to tables
• Equality search attributes: must be at the beginning of the primary key
• Inequality search attributes: become clustering columns
• Ordering attributes: become clustering columns
• Key attributes: map to primary key columns
MADRID · NOV 27-28 · 2015@calonso
ANALYSIS &VALIDATION
• 1 Partition per read?
• Are write conflicts (overwrites) possible?
• How large are partitions?
• Ncells = Nrow X ( Ncols – Npk – Nstatic ) + Nstatic < 1M
• How much data duplication? (batches)
• Client side joins or new table?
MADRID · NOV 27-28 · 2015@calonso
An E-Library project.
MADRID · NOV 27-28 · 2015@calonso
EXAMPLE 1
Books can be uniquely identified and accessed by ISBN,
we also need a title, genre, author and publisher.
QDM
Book
ISBN K
Title
Author
Genre
Publisher
Q1
Q1: Find books by ISBN
MADRID · NOV 27-28 · 2015@calonso
ANALYSIS &VALIDATION
• 1 Partition per read?
• Are write conflicts (overwrites) possible?
• How large are partitions?
• Ncells = Nrow X ( Ncols – Npk – Nstatic ) + Nstatic < 1M
• 1 X (5 - 1 - 0) + 0 < 1M
• How much data duplication? (batches)
• Client side joins or new table?
Book
ISBN K
Title
Author
Genre
Publisher
Q1
Q1: Find books by ISBN
N/A
N/A
MADRID · NOV 27-28 · 2015@calonso
Book
ISBN K
Title
Author
Genre
Publisher
Q1
Q1: Find books by ISBNCREATETABLE books (
ISBNVARCHAR PRIMARY KEY,
titleVARCHAR,
authorVARCHAR,
genreVARCHAR,
publisherVARCHAR
);
SELECT * FROM books WHERE ISBN = ‘…’;
MADRID · NOV 27-28 · 2015@calonso
Users register into the system uniquely identified by an email
and a password. We also want their full name. They will be
accessed by email and password or internal unique ID.
QDM K
Users_by_ID
ID
full_name
Q1
Q1: Find users by ID
Users_by_login_info
email K
password
Q2
Q2: Find users by login info
K
full_name
ID
MADRID · NOV 27-28 · 2015@calonso
ANALYSIS &VALIDATION
• 1 Partition per read?
• Are write conflicts (overwrites) possible?
• How large are partitions?
• Ncells = Nrow X ( Ncols – Npk – Nstatic ) + Nstatic < 1M
• 1 X (2 - 1 - 0) + 0 < 1M
• How much data duplication? (batches)
• Client side joins or new table?
N/A
N/A
K
Users_by_ID
ID
full_name
Q1
Q1: Find users by ID
MADRID · NOV 27-28 · 2015@calonso
CREATETABLE users_by_id (
IDTIMEUUID PRIMARY KEY,
full_nameVARCHAR
);
SELECT * FROM users_by_id WHERE ID = …;
K
Users_by_ID
ID
full_name
Q1
Q1: Find users by ID
MADRID · NOV 27-28 · 2015@calonso
ANALYSIS &VALIDATION
• 1 Partition per read?
• Are write conflicts (overwrites) possible?
• How large are partitions?
• Ncells = Nrow X ( Ncols – Npk – Nstatic ) + Nstatic < 1M
• 1 X (4 - 2 - 0) + 0 < 1M
• How much data duplication? (batches)
• Client side joins or new table? N/A
Users_by_login_info
email K
password
Q2
Q2: Find users by login info
K
full_name
ID
MADRID · NOV 27-28 · 2015@calonso
CREATETABLE users_by_login_info (
emailVARCHAR,
passwordVARCHAR,
full_nameVARCHAR,
IDTIMEUUID,
PRIMARY_KEY ((email, password))
);
SELECT * FROM users_by_login_info
WHERE email = ‘…’AND password = ‘…’;
Users_by_login_info
email K
password
Q2
Q2: Find users by login info
K
full_name
ID
MADRID · NOV 27-28 · 2015@calonso
BEGIN BATCH
INSERT INTO users_by_id (ID, full_name)VALUES (…) IF NOT EXISTS;
INSERT INTO users_by_login_info (email, password, full_name, ID)VALUES (…);
APPLY BATCH;
MADRID · NOV 27-28 · 2015@calonso
Users read books.
We want to know which books has a user read.
QDM
K
Books_read_by_user
user_ID
title
Q1
full_name
ISBN
genre
publisher
author
C
C
Q1: Find all books a logged
user has read
MADRID · NOV 27-28 · 2015@calonso
ANALYSIS &VALIDATION
• 1 Partition per read?
• Are write conflicts (overwrites) possible?
• How large are partitions?
• Ncells = Nrow X ( Ncols – Npk – Nstatic ) + Nstatic < 1M
• Books X (7 - 1 - 0) + 0 < 1M => 166.666 books per user
• How much data duplication? (batches)
• Client side joins or new table?
Q1: Find all books a logged
user has read
K
Books_read_by_user
user_ID
title
Q1
full_name
ISBN
genre
publisher
author
C
C
New table
N/A
MADRID · NOV 27-28 · 2015@calonso
CREATETABLE books_read_by_user (
user_IDTIMEUUID,
titleVARCHAR,
authorVARCHAR,
full_nameVARCHAR,
ISBNVARCHAR,
genreVARCHAR,
publisherVARCHAR
PRIMARY_KEY (user_id, title, author)
);
SELECT * FROM books_read_by_user
WHERE user_ID = …;
Q1: Find all books a logged
user has read
K
Books_read_by_user
user_ID
title
Q1
full_name
ISBN
genre
publisher
author
C
C
MADRID · NOV 27-28 · 2015@calonso
In order to improve our site’s usability we need to
understand how our users use it by tracking every interaction
they have with our site.
QDM
user_ID
2_weeks
time
element
type
K
Actions_by_user
Q1
Q1: Find all actions a user
does in a time range
K
C
MADRID · NOV 27-28 · 2015@calonso
ANALYSIS &VALIDATION
• 1 Partition per read?
• Are write conflicts (overwrites) possible?
• How large are partitions?
• Ncells = Nrow X ( Ncols – Npk – Nstatic ) + Nstatic < 1M
• Actions X (5 - 2 - 0) + 0 < 1M => 333.333 per user every 2 weeks
• How much data duplication? (batches)
• Client side joins or new table?
N/A
K
Actions_by_user
user_ID
2_weeks
Q1
Q1: Find all actions a user
does in a time range
K
time
element
type
C
N/A
MADRID · NOV 27-28 · 2015@calonso
CREATETABLE actions_by_user (
user_IDTIMEUUID,
2_weeks INT,
timeTIMESTAMP,
elementVARCHAR,
typeVARCHAR
PRIMARY_KEY ((user_ID, 2_weeks), time)
);
SELECT * FROM actions_by_user
WHERE user_ID = … AND 2_weeks = … AND
time < … AND time > …;
K
Actions_by_user
user_ID
2_weeks
Q1
Q1: Find all actions a user
does in a time range
K
time
element
type
C
MADRID · NOV 27-28 · 2015@calonso
THANKS!

Weitere ähnliche Inhalte

Was ist angesagt?

Cassandra Explained
Cassandra ExplainedCassandra Explained
Cassandra Explained
Eric Evans
 
MongoDB for Time Series Data Part 1: Setting the Stage for Sensor Management
MongoDB for Time Series Data Part 1: Setting the Stage for Sensor ManagementMongoDB for Time Series Data Part 1: Setting the Stage for Sensor Management
MongoDB for Time Series Data Part 1: Setting the Stage for Sensor Management
MongoDB
 

Was ist angesagt? (12)

Cassandra Data Model
Cassandra Data ModelCassandra Data Model
Cassandra Data Model
 
Cassandra Explained
Cassandra ExplainedCassandra Explained
Cassandra Explained
 
Ted Dunning – Very High Bandwidth Time Series Database Implementation - NoSQL...
Ted Dunning – Very High Bandwidth Time Series Database Implementation - NoSQL...Ted Dunning – Very High Bandwidth Time Series Database Implementation - NoSQL...
Ted Dunning – Very High Bandwidth Time Series Database Implementation - NoSQL...
 
ManetoDB: Key/Value storage, BigData in Open Stack_Сергей Ковалев, Илья Свиридов
ManetoDB: Key/Value storage, BigData in Open Stack_Сергей Ковалев, Илья СвиридовManetoDB: Key/Value storage, BigData in Open Stack_Сергей Ковалев, Илья Свиридов
ManetoDB: Key/Value storage, BigData in Open Stack_Сергей Ковалев, Илья Свиридов
 
Cassandra Overview
Cassandra OverviewCassandra Overview
Cassandra Overview
 
RedisConf18 - Redis and Elasticsearch
RedisConf18 - Redis and ElasticsearchRedisConf18 - Redis and Elasticsearch
RedisConf18 - Redis and Elasticsearch
 
Mongo db improve the performance of your application codemotion2016
Mongo db improve the performance of your application codemotion2016Mongo db improve the performance of your application codemotion2016
Mongo db improve the performance of your application codemotion2016
 
Spark: Taming Big Data
Spark: Taming Big DataSpark: Taming Big Data
Spark: Taming Big Data
 
Horizontally Scalable Relational Databases with Spark: Spark Summit East talk...
Horizontally Scalable Relational Databases with Spark: Spark Summit East talk...Horizontally Scalable Relational Databases with Spark: Spark Summit East talk...
Horizontally Scalable Relational Databases with Spark: Spark Summit East talk...
 
MongoDB for Time Series Data Part 1: Setting the Stage for Sensor Management
MongoDB for Time Series Data Part 1: Setting the Stage for Sensor ManagementMongoDB for Time Series Data Part 1: Setting the Stage for Sensor Management
MongoDB for Time Series Data Part 1: Setting the Stage for Sensor Management
 
Time Series With OrientDB - Fosdem 2015
Time Series With OrientDB - Fosdem 2015Time Series With OrientDB - Fosdem 2015
Time Series With OrientDB - Fosdem 2015
 
Codemotion akka persistence, cqrs%2 fes y otras siglas del montón
Codemotion   akka persistence, cqrs%2 fes y otras siglas del montónCodemotion   akka persistence, cqrs%2 fes y otras siglas del montón
Codemotion akka persistence, cqrs%2 fes y otras siglas del montón
 

Andere mochten auch

Andere mochten auch (17)

Cassandra Workshop - Cassandra from scratch in one day
Cassandra Workshop - Cassandra from scratch in one dayCassandra Workshop - Cassandra from scratch in one day
Cassandra Workshop - Cassandra from scratch in one day
 
Case Study: Troubleshooting Cassandra performance issues as a developer
Case Study: Troubleshooting Cassandra performance issues as a developerCase Study: Troubleshooting Cassandra performance issues as a developer
Case Study: Troubleshooting Cassandra performance issues as a developer
 
Ruby closures, how are they possible?
Ruby closures, how are they possible?Ruby closures, how are they possible?
Ruby closures, how are they possible?
 
Construyendo y publicando nuestra primera app multiplataforma
Construyendo y publicando nuestra primera app multiplataformaConstruyendo y publicando nuestra primera app multiplataforma
Construyendo y publicando nuestra primera app multiplataforma
 
Swift and the BigData
Swift and the BigDataSwift and the BigData
Swift and the BigData
 
iOS Notifications
iOS NotificationsiOS Notifications
iOS Notifications
 
Html5
Html5Html5
Html5
 
Javascript - 2014
Javascript - 2014Javascript - 2014
Javascript - 2014
 
Enumerados Server
Enumerados ServerEnumerados Server
Enumerados Server
 
Aplicaciones móviles - HTML5
Aplicaciones móviles - HTML5Aplicaciones móviles - HTML5
Aplicaciones móviles - HTML5
 
Construyendo y publicando nuestra primera app multi plataforma (II)
Construyendo y publicando nuestra primera app multi plataforma (II)Construyendo y publicando nuestra primera app multi plataforma (II)
Construyendo y publicando nuestra primera app multi plataforma (II)
 
Javascript
JavascriptJavascript
Javascript
 
iCloud
iCloudiCloud
iCloud
 
Programacion web
Programacion webProgramacion web
Programacion web
 
Sensors (Accelerometer, Magnetometer, Gyroscope, Proximity and Luminosity)
Sensors (Accelerometer, Magnetometer, Gyroscope, Proximity and Luminosity)Sensors (Accelerometer, Magnetometer, Gyroscope, Proximity and Luminosity)
Sensors (Accelerometer, Magnetometer, Gyroscope, Proximity and Luminosity)
 
Lambda at Weather Scale - Cassandra Summit 2015
Lambda at Weather Scale - Cassandra Summit 2015Lambda at Weather Scale - Cassandra Summit 2015
Lambda at Weather Scale - Cassandra Summit 2015
 
Always On: Building Highly Available Applications on Cassandra
Always On: Building Highly Available Applications on CassandraAlways On: Building Highly Available Applications on Cassandra
Always On: Building Highly Available Applications on Cassandra
 

Ähnlich wie Cassandra for impatients

Ähnlich wie Cassandra for impatients (20)

Back to the future : SQL 92 for Elasticsearch @nosqlmatters Paris
Back to the future : SQL 92 for Elasticsearch @nosqlmatters ParisBack to the future : SQL 92 for Elasticsearch @nosqlmatters Paris
Back to the future : SQL 92 for Elasticsearch @nosqlmatters Paris
 
Lucian Precup - Back to the Future: SQL 92 for Elasticsearch? - NoSQL matters...
Lucian Precup - Back to the Future: SQL 92 for Elasticsearch? - NoSQL matters...Lucian Precup - Back to the Future: SQL 92 for Elasticsearch? - NoSQL matters...
Lucian Precup - Back to the Future: SQL 92 for Elasticsearch? - NoSQL matters...
 
Back to the future : SQL 92 for Elasticsearch ? @nosqlmatters Dublin 2014
Back to the future : SQL 92 for Elasticsearch ? @nosqlmatters Dublin 2014Back to the future : SQL 92 for Elasticsearch ? @nosqlmatters Dublin 2014
Back to the future : SQL 92 for Elasticsearch ? @nosqlmatters Dublin 2014
 
Advanced search and Top-K queries in Cassandra
Advanced search and Top-K queries in CassandraAdvanced search and Top-K queries in Cassandra
Advanced search and Top-K queries in Cassandra
 
RxJava in practice
RxJava in practice RxJava in practice
RxJava in practice
 
Stratio: Geospatial and bitemporal search in Cassandra with pluggable Lucene ...
Stratio: Geospatial and bitemporal search in Cassandra with pluggable Lucene ...Stratio: Geospatial and bitemporal search in Cassandra with pluggable Lucene ...
Stratio: Geospatial and bitemporal search in Cassandra with pluggable Lucene ...
 
Geospatial and bitemporal search in cassandra with pluggable lucene index
Geospatial and bitemporal search in cassandra with pluggable lucene indexGeospatial and bitemporal search in cassandra with pluggable lucene index
Geospatial and bitemporal search in cassandra with pluggable lucene index
 
Adding Realtime to your Projects
Adding Realtime to your ProjectsAdding Realtime to your Projects
Adding Realtime to your Projects
 
Advanced search and Top-k queries in Cassandra - Cassandra Summit Europe 2014
Advanced search and Top-k queries in Cassandra - Cassandra Summit Europe 2014Advanced search and Top-k queries in Cassandra - Cassandra Summit Europe 2014
Advanced search and Top-k queries in Cassandra - Cassandra Summit Europe 2014
 
Advanced search and Top-k queries in Cassandra - Cassandra Summit Europe 2014
Advanced search and Top-k queries in Cassandra - Cassandra Summit Europe 2014Advanced search and Top-k queries in Cassandra - Cassandra Summit Europe 2014
Advanced search and Top-k queries in Cassandra - Cassandra Summit Europe 2014
 
Steam Learn: Introduction to NoSQL with MongoDB
Steam Learn: Introduction to NoSQL with MongoDBSteam Learn: Introduction to NoSQL with MongoDB
Steam Learn: Introduction to NoSQL with MongoDB
 
An Exploration of 3 Very Different ML Solutions Running on Accumulo
An Exploration of 3 Very Different ML Solutions Running on AccumuloAn Exploration of 3 Very Different ML Solutions Running on Accumulo
An Exploration of 3 Very Different ML Solutions Running on Accumulo
 
Streams, Tables, and Time in KSQL
Streams, Tables, and Time in KSQLStreams, Tables, and Time in KSQL
Streams, Tables, and Time in KSQL
 
Data Vault 2.0: Using MD5 Hashes for Change Data Capture
Data Vault 2.0: Using MD5 Hashes for Change Data CaptureData Vault 2.0: Using MD5 Hashes for Change Data Capture
Data Vault 2.0: Using MD5 Hashes for Change Data Capture
 
Use Your MySQL Knowledge to Become an Instant Cassandra Guru
Use Your MySQL Knowledge to Become an Instant Cassandra GuruUse Your MySQL Knowledge to Become an Instant Cassandra Guru
Use Your MySQL Knowledge to Become an Instant Cassandra Guru
 
Couchbase N1QL: Language & Architecture Overview.
Couchbase N1QL: Language & Architecture Overview.Couchbase N1QL: Language & Architecture Overview.
Couchbase N1QL: Language & Architecture Overview.
 
Cassandra Community Webinar | Become a Super Modeler
Cassandra Community Webinar | Become a Super ModelerCassandra Community Webinar | Become a Super Modeler
Cassandra Community Webinar | Become a Super Modeler
 
Introduction to SQL Server Graph DB
Introduction to SQL Server Graph DBIntroduction to SQL Server Graph DB
Introduction to SQL Server Graph DB
 
Dynamic Columns of Phoenix for SQL on Sparse(NoSql) Data
Dynamic Columns of Phoenix for SQL on Sparse(NoSql) DataDynamic Columns of Phoenix for SQL on Sparse(NoSql) Data
Dynamic Columns of Phoenix for SQL on Sparse(NoSql) Data
 
Managing Cassandra at Scale by Al Tobey
Managing Cassandra at Scale by Al TobeyManaging Cassandra at Scale by Al Tobey
Managing Cassandra at Scale by Al Tobey
 

Kürzlich hochgeladen

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
ssuser89054b
 
notes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.pptnotes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.ppt
MsecMca
 
DeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakesDeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakes
MayuraD1
 
Standard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayStandard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power Play
Epec Engineered Technologies
 

Kürzlich hochgeladen (20)

kiln thermal load.pptx kiln tgermal load
kiln thermal load.pptx kiln tgermal loadkiln thermal load.pptx kiln tgermal load
kiln thermal load.pptx kiln tgermal load
 
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptxA CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
 
Introduction to Serverless with AWS Lambda
Introduction to Serverless with AWS LambdaIntroduction to Serverless with AWS Lambda
Introduction to Serverless with AWS Lambda
 
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKARHAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
 
Computer Lecture 01.pptxIntroduction to Computers
Computer Lecture 01.pptxIntroduction to ComputersComputer Lecture 01.pptxIntroduction to Computers
Computer Lecture 01.pptxIntroduction to Computers
 
Rums floating Omkareshwar FSPV IM_16112021.pdf
Rums floating Omkareshwar FSPV IM_16112021.pdfRums floating Omkareshwar FSPV IM_16112021.pdf
Rums floating Omkareshwar FSPV IM_16112021.pdf
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
 
Thermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VThermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - V
 
notes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.pptnotes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.ppt
 
Design For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startDesign For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the start
 
Online food ordering system project report.pdf
Online food ordering system project report.pdfOnline food ordering system project report.pdf
Online food ordering system project report.pdf
 
DeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakesDeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakes
 
Engineering Drawing focus on projection of planes
Engineering Drawing focus on projection of planesEngineering Drawing focus on projection of planes
Engineering Drawing focus on projection of planes
 
Unleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapUnleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leap
 
Standard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayStandard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power Play
 
AIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech studentsAIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech students
 
Online electricity billing project report..pdf
Online electricity billing project report..pdfOnline electricity billing project report..pdf
Online electricity billing project report..pdf
 
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced LoadsFEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
 
Block diagram reduction techniques in control systems.ppt
Block diagram reduction techniques in control systems.pptBlock diagram reduction techniques in control systems.ppt
Block diagram reduction techniques in control systems.ppt
 
Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...
Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...
Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...
 

Cassandra for impatients

  • 1. Cassandra for impatients Carlos Alonso (@calonso)MADRID · NOV 27-28 · 2015
  • 2. MADRID · NOV 27-28 · 2015@calonso • Introductions • Cassandra Core concepts • CQL • Data modelling and examples
  • 3. MADRID · NOV 27-28 · 2015@calonso CREATETABLE users ( id INT PRIMARY KEY, nameVARCHAR, surnameVARCHAR, birthdateTIMESTAMP ); INSERT INTO users (id, name, surname, birthdate) VALUES (1,‘Carlos’,‘Alonso’, ’1985-03-19’); SELECT * FROM users WHERE id = 1; ALTERTABLE users ADD cityVARCHAR; UPDATE users SET city = ‘London’WHERE id = 1;
  • 4. MADRID · NOV 27-28 · 2015@calonso IMPATIENT LEVEL… Aha!! That’s SQL, I already know all this stuff. I’ll start using Cassandra as soon as I get home.
  • 5. MADRID · NOV 27-28 · 2015@calonso IMPATIENT LEVEL… Why is this taking so long? Have they gone killing the cow for my BigMac?
  • 6. MADRID · NOV 27-28 · 2015@calonso IMPATIENT LEVEL… As soon as I see the loading spinner…
  • 7. MADRID · NOV 27-28 · 2015@calonso CARLOS ALONSO •Spanish Londoner •MSc Salamanca University, Spain •Software Engineer @MyDrive Solutions •Cassandra certified developer •Cassandra MVP 2015 •@calonso / http://mrcalonso.com
  • 8. MADRID · NOV 27-28 · 2015@calonso MYDRIVE SOLUTIONS • World leading driver profiling company • Using technology and data to understand how to improve driving behaviour • Recently acquired by the Generali Group • @_MyDrive / http://mydrivesolutions.com • We are hiring!!
  • 9. MADRID · NOV 27-28 · 2015@calonso CASSANDRA • Open Source distributed database management system • Initially developed at Facebook • Inspired by Amazon’s Dynamo and Google BigTable papers • Became Apache top-level project in Feb, 2010 • Nowadays developed by DataStax
  • 10. Cassandra Core Concepts Technical introduction to Apache CassandraMADRID · NOV 27-28 · 2015
  • 11. MADRID · NOV 27-28 · 2015@calonso THE CAPTHEOREM
  • 12. MADRID · NOV 27-28 · 2015@calonso CASSANDRA • Fast Distributed NoSQL Database • High Availability • Linear Scalability => Predictability • No SPOF • Multi-DC • Horizontally scalable => $$$ • Not a drop in replacement for RDBMS
  • 13. MADRID · NOV 27-28 · 2015@calonso CLUSTER COMPONENTS
  • 14. MADRID · NOV 27-28 · 2015@calonso CLUSTER COMPONENTS
  • 15. MADRID · NOV 27-28 · 2015@calonso CONSISTENT HASHING Hash function “Carlos” 185664 1773456738847666528349 -894763734895827651234
  • 16. MADRID · NOV 27-28 · 2015@calonso REQUEST COORDINATION • Coordinator:The node designated to coordinate a particular query. • ANY node can coordinate ANY request. • No SPOF: One of the main Cassandra’s principles. • The driver chooses which node will coordinate
  • 17. MADRID · NOV 27-28 · 2015@calonso REPLICATION FACTOR How many copies (replicas) for your data
  • 18. MADRID · NOV 27-28 · 2015@calonso WHY REPLICATION? • Disaster recovery • Bring data closer to users (to reduce latencies) • Workload segregation (analytical vs transactional)
  • 19. MADRID · NOV 27-28 · 2015@calonso REPLICATION Defined at keyspace level CREATE KEYSPACE <my_keyspace> WITH REPLICATION = { “class”:“SimpleStrategy”,“replication_factor”: 2 }; CREATE KEYSPACE <my_keyspace> WITH REPLICATION = { “class”:“NetworkTopologyStrategy”, “dc-east”: 2,“dc-west”: 3 };
  • 20. MADRID · NOV 27-28 · 2015@calonso CONSISTENCY LEVEL How many replicas of your data must respond ok?
  • 21. MADRID · NOV 27-28 · 2015@calonso CONSISTENCY LEVEL Availability /
 Partition tolerance Consistency
  • 22. MADRID · NOV 27-28 · 2015@calonso DriverClient Partitioner f81d4fae-… 834 • RF = 3 • CL = QUORUM • SELECT * … WHERE id = f81d4fae-…
  • 23.
  • 24. MADRID · NOV 27-28 · 2015@calonso COMPACTIONS • Merges most recent partition keys and columns • Evicts tombstoned columns • Creates new SSTable • Rebuilds partition indexes and summaries • Deletes old SSTables
  • 25. MADRID · NOV 27-28 · 2015@calonso COMPACTIONS • SizeTieredCompactionStrategy • LeveledCompactionStrategy • DateTieredCompactionStrategy CREATETABLE user ( id UUID PRIMARY KEY, emailTEXT, nameTEXT, preferences SET<TEXT>, ) WITH COMPACTION = { “class”: “<strategy>”, <params> };
  • 26.
  • 27. CQL Cassandra Query LanguageMADRID · NOV 27-28 · 2015
  • 28. MADRID · NOV 27-28 · 2015@calonso CQL CREATETABLE users ( id UUID, nameVARCHAR, surnameVARCHAR, birthdateTIMESTAMP, PRIMARY KEY(id) ); Familiar row-column SQL-like approach. INSERT INTO users (id, name, surname, birthdate) VALUES (uuid(),‘Carlos’,‘Alonso’, ’1985-03-19’); SELECT * FROM users WHERE id = ‘f81d4fae-7dec-11d0-a765-00a0c91e6bf6’; ALTERTABLE users ADD addressVARCHAR;
  • 29. MADRID · NOV 27-28 · 2015@calonso WHERE • Equality matches: • SELECT … WHERE album_title = ‘Revolver’; • IN: • Only applicable in the last WHERE clause • SELECT … WHERE album_title = ‘Revolver’ AND number IN (2, 3, 4); • Range search: • Only on clustering columns. • SELECT … WHERE album_title = ‘Revolver’ AND number >= 6 AND number < 2;
  • 30. MADRID · NOV 27-28 · 2015@calonso COLLECTIONS • Set: Uniqueness • List: Order • Map: Key-Value pairs • Tuple: Fixed length email_addresses SET<VARCHAR> email_addresses LIST<VARCHAR> email_addresses MAP<VARCHAR,VARCHAR> email_addresses TUPLE<VARCHAR,VARCHAR>
  • 31. MADRID · NOV 27-28 · 2015@calonso MORE FEATURES • COUNTERS • LWT • TTL • UDT • BATCHES • BATCH + LWT • STATIC COLUMNS
  • 32. Data Modelling Basic concepts for a Cassandra modelMADRID · NOV 27-28 · 2015
  • 33. MADRID · NOV 27-28 · 2015@calonso PHYSICAL DATA STRUCTURE
  • 34. MADRID · NOV 27-28 · 2015@calonso DATA MODELLING • Understand your data • Decide how you’ll query the data • Define column families to satisfy those queries • Implement and optimize
  • 35. MADRID · NOV 27-28 · 2015@calonso DATA MODELLING Conceptual Model Logical Model Physical Model Query-Driven Methodology Analysis & Validation
  • 36. MADRID · NOV 27-28 · 2015@calonso PRIMARY KEY PARTITION KEY + CLUSTERING COLUMN(S)
  • 37. MADRID · NOV 27-28 · 2015@calonso QUERY DRIVEN METHODOLOGY • Spread data evenly around the cluster • Minimise the number of partitions read • Follow the mapping rules: • Entities and relationships: map to tables • Equality search attributes: must be at the beginning of the primary key • Inequality search attributes: become clustering columns • Ordering attributes: become clustering columns • Key attributes: map to primary key columns
  • 38. MADRID · NOV 27-28 · 2015@calonso ANALYSIS &VALIDATION • 1 Partition per read? • Are write conflicts (overwrites) possible? • How large are partitions? • Ncells = Nrow X ( Ncols – Npk – Nstatic ) + Nstatic < 1M • How much data duplication? (batches) • Client side joins or new table?
  • 39. MADRID · NOV 27-28 · 2015@calonso An E-Library project.
  • 40. MADRID · NOV 27-28 · 2015@calonso EXAMPLE 1 Books can be uniquely identified and accessed by ISBN, we also need a title, genre, author and publisher. QDM Book ISBN K Title Author Genre Publisher Q1 Q1: Find books by ISBN
  • 41. MADRID · NOV 27-28 · 2015@calonso ANALYSIS &VALIDATION • 1 Partition per read? • Are write conflicts (overwrites) possible? • How large are partitions? • Ncells = Nrow X ( Ncols – Npk – Nstatic ) + Nstatic < 1M • 1 X (5 - 1 - 0) + 0 < 1M • How much data duplication? (batches) • Client side joins or new table? Book ISBN K Title Author Genre Publisher Q1 Q1: Find books by ISBN N/A N/A
  • 42. MADRID · NOV 27-28 · 2015@calonso Book ISBN K Title Author Genre Publisher Q1 Q1: Find books by ISBNCREATETABLE books ( ISBNVARCHAR PRIMARY KEY, titleVARCHAR, authorVARCHAR, genreVARCHAR, publisherVARCHAR ); SELECT * FROM books WHERE ISBN = ‘…’;
  • 43. MADRID · NOV 27-28 · 2015@calonso Users register into the system uniquely identified by an email and a password. We also want their full name. They will be accessed by email and password or internal unique ID. QDM K Users_by_ID ID full_name Q1 Q1: Find users by ID Users_by_login_info email K password Q2 Q2: Find users by login info K full_name ID
  • 44. MADRID · NOV 27-28 · 2015@calonso ANALYSIS &VALIDATION • 1 Partition per read? • Are write conflicts (overwrites) possible? • How large are partitions? • Ncells = Nrow X ( Ncols – Npk – Nstatic ) + Nstatic < 1M • 1 X (2 - 1 - 0) + 0 < 1M • How much data duplication? (batches) • Client side joins or new table? N/A N/A K Users_by_ID ID full_name Q1 Q1: Find users by ID
  • 45. MADRID · NOV 27-28 · 2015@calonso CREATETABLE users_by_id ( IDTIMEUUID PRIMARY KEY, full_nameVARCHAR ); SELECT * FROM users_by_id WHERE ID = …; K Users_by_ID ID full_name Q1 Q1: Find users by ID
  • 46. MADRID · NOV 27-28 · 2015@calonso ANALYSIS &VALIDATION • 1 Partition per read? • Are write conflicts (overwrites) possible? • How large are partitions? • Ncells = Nrow X ( Ncols – Npk – Nstatic ) + Nstatic < 1M • 1 X (4 - 2 - 0) + 0 < 1M • How much data duplication? (batches) • Client side joins or new table? N/A Users_by_login_info email K password Q2 Q2: Find users by login info K full_name ID
  • 47. MADRID · NOV 27-28 · 2015@calonso CREATETABLE users_by_login_info ( emailVARCHAR, passwordVARCHAR, full_nameVARCHAR, IDTIMEUUID, PRIMARY_KEY ((email, password)) ); SELECT * FROM users_by_login_info WHERE email = ‘…’AND password = ‘…’; Users_by_login_info email K password Q2 Q2: Find users by login info K full_name ID
  • 48. MADRID · NOV 27-28 · 2015@calonso BEGIN BATCH INSERT INTO users_by_id (ID, full_name)VALUES (…) IF NOT EXISTS; INSERT INTO users_by_login_info (email, password, full_name, ID)VALUES (…); APPLY BATCH;
  • 49. MADRID · NOV 27-28 · 2015@calonso Users read books. We want to know which books has a user read. QDM K Books_read_by_user user_ID title Q1 full_name ISBN genre publisher author C C Q1: Find all books a logged user has read
  • 50. MADRID · NOV 27-28 · 2015@calonso ANALYSIS &VALIDATION • 1 Partition per read? • Are write conflicts (overwrites) possible? • How large are partitions? • Ncells = Nrow X ( Ncols – Npk – Nstatic ) + Nstatic < 1M • Books X (7 - 1 - 0) + 0 < 1M => 166.666 books per user • How much data duplication? (batches) • Client side joins or new table? Q1: Find all books a logged user has read K Books_read_by_user user_ID title Q1 full_name ISBN genre publisher author C C New table N/A
  • 51. MADRID · NOV 27-28 · 2015@calonso CREATETABLE books_read_by_user ( user_IDTIMEUUID, titleVARCHAR, authorVARCHAR, full_nameVARCHAR, ISBNVARCHAR, genreVARCHAR, publisherVARCHAR PRIMARY_KEY (user_id, title, author) ); SELECT * FROM books_read_by_user WHERE user_ID = …; Q1: Find all books a logged user has read K Books_read_by_user user_ID title Q1 full_name ISBN genre publisher author C C
  • 52. MADRID · NOV 27-28 · 2015@calonso In order to improve our site’s usability we need to understand how our users use it by tracking every interaction they have with our site. QDM user_ID 2_weeks time element type K Actions_by_user Q1 Q1: Find all actions a user does in a time range K C
  • 53. MADRID · NOV 27-28 · 2015@calonso ANALYSIS &VALIDATION • 1 Partition per read? • Are write conflicts (overwrites) possible? • How large are partitions? • Ncells = Nrow X ( Ncols – Npk – Nstatic ) + Nstatic < 1M • Actions X (5 - 2 - 0) + 0 < 1M => 333.333 per user every 2 weeks • How much data duplication? (batches) • Client side joins or new table? N/A K Actions_by_user user_ID 2_weeks Q1 Q1: Find all actions a user does in a time range K time element type C N/A
  • 54. MADRID · NOV 27-28 · 2015@calonso CREATETABLE actions_by_user ( user_IDTIMEUUID, 2_weeks INT, timeTIMESTAMP, elementVARCHAR, typeVARCHAR PRIMARY_KEY ((user_ID, 2_weeks), time) ); SELECT * FROM actions_by_user WHERE user_ID = … AND 2_weeks = … AND time < … AND time > …; K Actions_by_user user_ID 2_weeks Q1 Q1: Find all actions a user does in a time range K time element type C
  • 55. MADRID · NOV 27-28 · 2015@calonso THANKS!