Cassandra
Nick Bailey
@nickmbailey
nick@datastax.com
Thursday, May 30, 13
©2012 DataStax
Introduction
Why does Cassandra Exist?
Analytics + Real Time
Big Data
Architecture
Dynamo + BigTable
Why do people like Cassandra?
Availability
Scalability
Performance
Multi Datacenter Support
Hadoop Support
• InputFormat
• Run tasktrackers/datanodes locally
• Run namenode/jobtracker anywhere
Data Locality
Workload Partitioning
Data Modeling
Keyspace, Column Families
(rough relational equivalents: Database, Tables)
Column Family = Row Key + Columns (name, value) ...
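As a rough mental model (a sketch only, not Cassandra's actual storage engine), a column family can be pictured as a map from row key to a list of (name, value) columns kept sorted by name. The `ColumnFamily` class and its methods are hypothetical names used for illustration; the sample data comes from the Users example in this deck.

```python
from bisect import insort


class ColumnFamily:
    """Toy model: row key -> columns kept sorted by column name."""

    def __init__(self):
        self.rows = {}  # row key -> list of (name, value), sorted by name

    def insert(self, row_key, name, value):
        columns = self.rows.setdefault(row_key, [])
        # Drop any existing column with the same name, then insert in order.
        columns[:] = [(n, v) for n, v in columns if n != name]
        insort(columns, (name, value))

    def get(self, row_key):
        return list(self.rows.get(row_key, []))


users = ColumnFamily()
users.insert("g_m_bluth", "password", "banana stand")
users.insert("g_m_bluth", "name", "George Michael")
users.insert("tobias_f", "phone", "512-7777")
print(users.get("g_m_bluth"))
```

Rows do not have to share the same set of columns, which is what makes the static and dynamic layouts that follow possible.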
Static Column Families
Dynamic Column Families
Static - Users Column Family
Row Key   | Columns
----------+----------------------------------------------------------
g_m_bluth | password: banana stand | name: George Michael
tobias_f  | password: c_weathers   | name: Tobias | phone: 512-7777
Dynamic - Friend Column Family
Row Key   | Columns
----------+----------------------------------------------------------
g_m_bluth | <date>:ann_v   | <date>:maeby
tobias_f  | <date>:barry_z | <date>:carl_w | <date>:lindsay | ...
Time Series Data
• Event logs
• Metrics
• Sensor Data
• Etc
Time Series - Login CF
Row Key   | Columns
----------+----------------------------------------------------------
g_m_bluth | 1369633061: United States | 1369625839: Mexico | ...
tobias_f  | 1369932413: Canada | 1369681738: United States | ...
What Else?
Counter Columns
• Inc/Dec operations
• Not idempotent
• Possibility of overcounting
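To see why a non-idempotent increment can overcount, consider a client that retries after a timeout even though the first attempt was applied. This is a minimal sketch under that assumption; `Counter` and `write_with_retry` are hypothetical names, not part of any Cassandra client API.

```python
class Counter:
    """Toy counter: every applied increment changes the stored value."""

    def __init__(self):
        self.value = 0

    def increment(self, delta=1):
        self.value += delta


def write_with_retry(counter, delta, first_ack_lost):
    counter.increment(delta)      # the write is applied on the server
    if first_ack_lost:            # the client never sees the acknowledgement
        counter.increment(delta)  # a blind retry applies the delta again


likes = Counter()
write_with_retry(likes, 1, first_ack_lost=False)
write_with_retry(likes, 1, first_ack_lost=True)
print(likes.value)  # 3, although only two likes were intended
```

Writing a regular column twice with the same value leaves the same result, which is why retries are safe there and risky here.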
Expiring Columns
• TTL - Time to live
• Set per column
• Possibly an anti-pattern (we’ll get to that later)
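A per-column TTL can be pictured as an expiry time stored next to each value. The sketch below is a toy model with hypothetical names (`ExpiringColumns`, `insert`, `get`); it passes the clock in explicitly so the behavior is deterministic, and the `session` column is invented for illustration.

```python
class ExpiringColumns:
    """Toy row whose columns may each carry their own time to live."""

    def __init__(self):
        self.columns = {}  # name -> (value, expires_at or None)

    def insert(self, name, value, now, ttl=None):
        expires_at = None if ttl is None else now + ttl
        self.columns[name] = (value, expires_at)

    def get(self, name, now):
        value, expires_at = self.columns.get(name, (None, None))
        if expires_at is not None and now >= expires_at:
            return None  # the column has expired and is no longer visible
        return value


row = ExpiringColumns()
row.insert("session", "abc", now=0, ttl=60)   # expires 60 seconds after the write
row.insert("name", "Tobias", now=0)           # no TTL: never expires
print(row.get("session", now=30), row.get("session", now=60), row.get("name", now=10**6))
```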
Secondary Indexes
• SELECT * FROM Users WHERE name = 'Nick';
• Only supports '=' clauses (for the first condition)
• Often misused
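Conceptually, a secondary index behaves like an inverted map from a column's value to the row keys that hold it, which is one way to see why exact-match lookups are the natural fit. The sketch below is a toy model, not Cassandra's implementation; `build_index` is a hypothetical helper and the `nickmbailey` row is invented for the example.

```python
from collections import defaultdict

# Toy Users column family: row key -> columns.
users = {
    "g_m_bluth": {"name": "George Michael"},
    "tobias_f": {"name": "Tobias", "phone": "512-7777"},
    "nickmbailey": {"name": "Nick"},  # hypothetical row for the example
}


def build_index(rows, column):
    """Map each value of `column` to the set of row keys holding it."""
    index = defaultdict(set)
    for row_key, columns in rows.items():
        if column in columns:
            index[columns[column]].add(row_key)
    return index


name_index = build_index(users, "name")
matches = sorted(name_index.get("Nick", set()))
print(matches)  # ['nickmbailey']
```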
CQL
Cassandra Query Language
CREATE COLUMNFAMILY songs (
id uuid PRIMARY KEY,
title text,
album text,
artist text,
data blob);
INSERT INTO songs (id, title, artist, album)
VALUES ('a3e64f8f...', 'La Grange', 'ZZ Top', 'Tres Hombres');
SELECT * FROM songs;
id          | album        | artist         | title
-------------+--------------+----------------+----------------
2b09185b... |    Roll Away | Back Door Slam | Outside Woman...
8a172618... | We Must Obey |      Fu Manchu | Moving in Ste...
a3e64f8f... | Tres Hombres |         ZZ Top | La Grange
How do I start?
Define your questions
SELECT time, location FROM logins
WHERE user = 'nickmbailey'
ORDER BY time DESC LIMIT 10;
WHERE user = 'nickmbailey'
-> Row Key
ORDER BY time DESC LIMIT 10;
-> Store columns in chronological order
CREATE COLUMNFAMILY logins (
user text,       -- column types are assumed; the slide omits them
time timestamp,
location text,
PRIMARY KEY (user, time));
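The effect of this layout can be sketched in a few lines: one row per user, with columns kept sorted by login time, so the most recent logins are a slice from the end of the row. This is a toy model under that assumption; `LoginRow`, `record`, and `latest` are hypothetical names, and the timestamps come from the Login CF example.

```python
from bisect import insort


class LoginRow:
    """Toy wide row: one row per user, columns sorted by login time."""

    def __init__(self):
        self.columns = []  # (time, location) pairs in ascending time order

    def record(self, time, location):
        insort(self.columns, (time, location))

    def latest(self, limit):
        # Newest first, like ORDER BY time DESC LIMIT <limit>.
        return self.columns[::-1][:limit]


logins = {"g_m_bluth": LoginRow()}
for t, loc in [(1369625839, "Mexico"), (1369633061, "United States"), (1369622839, "Canada")]:
    logins["g_m_bluth"].record(t, loc)

print(logins["g_m_bluth"].latest(2))
```

Because the row key picks the row and the columns are already ordered, the query needs no sorting or filtering at read time.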
What about?
SELECT time FROM logins
WHERE user = 'nickmbailey'
AND location = 'United States';
Row Key   | Columns
----------+----------------------------------------------------------
g_m_bluth | 1369633061: United States | 1369625839: Mexico | ....
          | 1369622839: Canada | 1369422839: Canada | 1368422839: Canada | ....
          | 1368421839: Canada | 1367421839: United States | 1367411839: Mexico | ....
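With columns ordered by time, a filter on location has nothing to seek on, so every column in the row has to be examined. A minimal sketch of that cost, using the timestamps from this row; `times_at` is a hypothetical helper.

```python
# Columns of the g_m_bluth row as (time, location), newest first.
row = [
    (1369633061, "United States"), (1369625839, "Mexico"),
    (1369622839, "Canada"), (1369422839, "Canada"),
    (1368422839, "Canada"), (1368421839, "Canada"),
    (1367421839, "United States"), (1367411839, "Mexico"),
]


def times_at(columns, location):
    """Return matching login times and how many columns were examined."""
    examined = 0
    matches = []
    for time, loc in columns:
        examined += 1
        if loc == location:
            matches.append(time)
    return matches, examined


matches, examined = times_at(row, "United States")
print(matches, examined)  # two matches, but all 8 columns were read
```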
CREATE COLUMNFAMILY logins (
user text,       -- column types are assumed; the slide omits them
time timestamp,
location text,
PRIMARY KEY (user, location));
Row Key   | Columns
----------+----------------------------------------------------------
g_m_bluth | United States: 1369633061 | Canada: 1369622839 | ....
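One consequence of keying columns by location is worth noting: a user has at most one column per location, so a later login from the same place replaces the earlier one. The sketch below is a toy model of that behavior; `record_login` is a hypothetical helper, and it assumes writes arrive in time order (Cassandra itself resolves conflicts by write timestamp).

```python
logins_by_location = {}  # user -> {location: time}


def record_login(user, location, time):
    # Same (user, location) key: the new time overwrites the old one.
    logins_by_location.setdefault(user, {})[location] = time


record_login("g_m_bluth", "Canada", 1369422839)
record_login("g_m_bluth", "Canada", 1369622839)
record_login("g_m_bluth", "United States", 1369633061)

print(logins_by_location["g_m_bluth"])
```

The lookup by location is now a direct read, at the price of keeping only the most recent login per location.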
To Normalize or Not
SELECT time, location FROM.....
+
SELECT city, state, zip.... FROM locations.....
Row Key   | Columns
----------+----------------------------------------------------------
g_m_bluth | 1369633061: <United States, Austin, Texas, 78701>
          | 1369625839: <Mexico, Tijuana, 88191>
          | 1358633061: <United States, Austin, Texas, 78701>
Anti Patterns
Batched Writes
• Failure case is suboptimal
• Increased chance of failure
• Tune to your workload
BOP/OPP (ByteOrderedPartitioner / OrderPreservingPartitioner)
• You don’t really need it
• Your Ops Team will hate you
• Really, you don’t need it.
Super Columns
• Performance penalty
  • Speed
  • Memory
• Replaced by CQL3
Read Before Write
• Race conditions
• Hurts performance
  • Cache
  • IO
Queues
• More generally, many deletes within a row
• A delete in Cassandra is actually a tombstone
• Read 1000 tombstones in order to find 10 columns
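The cost of the queue pattern can be illustrated with a toy row where a delete only marks a column as a tombstone and reads still have to step over it. This is a sketch under that simplification; `QueueRow`, `push`, and `pop` are hypothetical names, and the job values are invented.

```python
TOMBSTONE = object()  # marker written in place of a deleted value


class QueueRow:
    """Toy queue stored as one row; deletes leave tombstones behind."""

    def __init__(self):
        self.columns = []  # [name, value] pairs in column order

    def push(self, name, value):
        self.columns.append([name, value])

    def pop(self):
        """Return (oldest live value, columns scanned) and tombstone it."""
        scanned = 0
        for column in self.columns:
            scanned += 1
            if column[1] is not TOMBSTONE:
                value = column[1]
                column[1] = TOMBSTONE  # a delete is a write, not a removal
                return value, scanned
        return None, scanned


queue = QueueRow()
for i in range(1010):
    queue.push(i, f"job-{i}")
for _ in range(1000):
    queue.pop()

value, scanned = queue.pop()
print(value, scanned)  # job-1000 after scanning 1001 columns
```

Each read from the head of the queue gets slower as consumed items pile up as tombstones in front of the live ones.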
Use Cases
eBay
http://www.youtube.com/watch?v=F-fYqPu2ciQ
• dozens of nodes
• 200 TB+ of storage
eBay
• Social Signals
• Hunch Taste Graph
• Various Time Series
Social Signals
• Like, Own, Want
• Need:
  • scalable counters
  • high performance writes
  • want to find most popular items in a given category
Social Signals
ItemCount
Row Key   | Columns
----------+----------------------------------
item_id_1 | like: 300 | own: 104 | want: 105
item_id_2 | ...       | ...      | ...
UserCount
Row Key   | Columns
----------+----------------------------------
user_id_1 | like: 50  | own: 10  | want: 75
user_id_2 | ...       | ...      | ...
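Social Signals
ItemLike
Row Key   | Columns
----------+------------------------------------------------
item_id_1 | user_id_1:<time>  | user_id_2:<time>  | ...
item_id_2 | ...               | ...               | ...
UserLike
Row Key   | Columns
----------+------------------------------------------------
user_id_1 | <time>: <item_id> | <time>: <item_id> | ...
user_id_2 | ...               | ...               | ...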
Social Signals - Possibilities
• Store aggregated counts per category
• Column names are counts
• Get top N items in a category
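One way to read these bullets: if each category is a row and the column names are counts, the columns are already sorted by popularity and the top N is a slice from the end. The sketch below is a toy model of that idea; `CategoryCounts`, `put`, and `top` are hypothetical names, and the counts other than 300 are made up for the example.

```python
from bisect import insort


class CategoryCounts:
    """Toy row for one category: columns named (count, item_id), kept sorted."""

    def __init__(self):
        self.columns = []  # (count, item_id) in ascending order

    def put(self, count, item_id):
        insort(self.columns, (count, item_id))

    def top(self, n):
        # Highest counts first; no sorting needed at read time.
        return [(item_id, count) for count, item_id in self.columns[::-1][:n]]


category = CategoryCounts()
category.put(300, "item_id_1")
category.put(120, "item_id_2")  # made-up count
category.put(450, "item_id_3")  # made-up count

print(category.top(2))
```

Keeping such a row current means rewriting a column whenever an item's aggregated count changes, which this sketch does not attempt.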
Questions?
Come to the Summit!
Ask me for a discount code
June 11-12, 2013
San Francisco, CA
http://www.datastax.com/company/news-and-events/events/cassandrasummit2013

Introduction to Cassandra and Data Modeling