HR5 alum Stephen Portanova will be presenting on the highly scalable database Cassandra, which is used by Reddit, Netflix, CERN, and The Weather Channel. 'nuff said.
4. Why Should You Care
● Horizontal Scaling (basically auto sharding)
● Multiple Nodes - Highly Available
● Really Fast Writes
● Not too shabby at reads either - SLICES!!
● Bright Future
8. Data Model
● Wide rows
● Slice Queries
● Denormalization
● Index tables
9. Data Model - Simple Key
CREATE TABLE email_app.emails (
    user_id text,
    subject text,
    to_add text,
    cc text,
    body text,
    PRIMARY KEY (user_id)  -- user_id is the ROW KEY
);
10. Data Model - Simple Inserts
INSERT INTO email_app.emails (user_id, subject, to_add, cc, body)
VALUES ('111', 'party', 'cat@b.com', 'hippo@b.com', 'at my place');
INSERT INTO email_app.emails (user_id, subject, to_add, cc, body)
VALUES ('999', 'wat', 'horse@b.com', 'giraffe@b.com', 'is going on?');
11. Data Model Simple Inserts Result
SELECT * FROM email_app.emails;
Row key   subject   to_add   cc         body
111       party     cat@     hippo@     at my place
999       wat       horse@   giraffe@   is going on?
12. Mental Model - Nested Hash
111 (row key)
    subject, to_add, cc, body (columns) -> column values
999 (row key)
    subject, to_add, cc, body (columns) -> column values
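The nested-hash picture can be sketched in a few lines of Python. This is only an illustration of the mental model, not how Cassandra stores data; the `emails` dict and the `get_column` helper are hypothetical names made up for this sketch.

```python
# Mental model only: row key -> column name -> column value.
emails = {
    "111": {
        "subject": "party",
        "to_add": "cat@b.com",
        "cc": "hippo@b.com",
        "body": "at my place",
    },
    "999": {
        "subject": "wat",
        "to_add": "horse@b.com",
        "cc": "giraffe@b.com",
        "body": "is going on?",
    },
}


def get_column(table, row_key, column):
    """Look up one column value: first hash on the row key, then on the column."""
    return table[row_key][column]


print(get_column(emails, "111", "subject"))
print(sorted(emails))
```

The outer lookup corresponds to finding the row by its key; the inner lookup picks a column inside that row.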
13. Data Model - Simple Insert - Again
INSERT INTO email_app.emails (user_id, subject, to_add, cc, body)
VALUES ('111', 'party', 'cat@b.com', 'hippo@b.com', 'at my place');
Row key   subject   to_add   cc         body
111       party     cat@     hippo@     at my place
999       wat       horse@   giraffe@   is going on?
IDEMPOTENT: repeating the same insert leaves the table unchanged.
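A rough way to see why the repeated insert changes nothing is to treat an insert as an upsert on the nested hash. The `insert` function below is a hypothetical stand-in for the CQL statement, assuming it simply overwrites the named columns under the row key.

```python
def insert(table, row_key, **columns):
    """Upsert: create the row if it is missing, then overwrite the given columns."""
    table.setdefault(row_key, {}).update(columns)


emails = {}
party = dict(subject="party", to_add="cat@b.com", cc="hippo@b.com", body="at my place")

insert(emails, "111", **party)
snapshot = {key: dict(cols) for key, cols in emails.items()}  # copy after first insert

insert(emails, "111", **party)  # same insert again
print(emails == snapshot)
print(len(emails))
```

Running the same insert twice writes the same values under the same key, so the second call has no visible effect in this model.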
14. Data Model - Composite Key 1
CREATE TABLE email_app.emails (
    user_id text,
    subject text,
    to_add text,
    cc text,
    body text,
    PRIMARY KEY (user_id, subject)  -- user_id: ROW KEY, subject: CLUSTERING KEY
);
15. Data Model - Composite Insert 1
INSERT INTO email_app.emails (user_id, subject, to_add, cc, body)
VALUES ('111', 'party', 'cat@b.com', 'hippo@b.com', 'at my place');
Same as Before.
Right???
16. Data Model Composite Insert Result
SELECT * FROM emails WHERE user_id = '111';
Column names are prefixed with the subject:
111   party|to_add   party|cc   party|body
      cat@           hippo@     at my place
17. Mental Model - Nested Hash
111 (row key: user_id)
    party (clustering column: subject)
        to_add, cc, body (columns) -> column values
18. Data Model - Composite Insert 2
INSERT INTO email_app.emails (user_id, subject, to_add, cc, body)
VALUES ('111', 'swim', 'cat@b.com', 'hippo@b.com', 'in the pool');
19. Composite Insert 2 Result
SELECT * FROM emails WHERE user_id = '111';
Column names are prefixed with the subject:
111   party|to_add   party|cc   party|body
      cat@           hippo@     at my place
      swim|to_add    swim|cc    swim|body
      cat@           hippo@     in the pool
Sorted by clustering column - "subject"
20. Mental Model - Nested Sorted Hash
111 (row key: user_id)
    party (clustering column: subject)
        to_add, cc, body (columns) -> column values
    swim (clustering column: subject)
        to_add, cc, body (columns) -> column values
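The nested sorted hash can be sketched in Python as well. This is a simplification: the sketch sorts on read, whereas the slides describe the clustering values as already stored in sorted order. The names `partitions`, `insert` and `read_partition` are hypothetical.

```python
from collections import defaultdict

# Mental model only: row key -> clustering value (subject) -> column -> value.
partitions = defaultdict(dict)


def insert(user_id, subject, **columns):
    """Upsert the columns under (row key, clustering value)."""
    partitions[user_id].setdefault(subject, {}).update(columns)


def read_partition(user_id):
    """Return the rows of one partition ordered by the clustering value."""
    rows = partitions[user_id]
    return [(subject, rows[subject]) for subject in sorted(rows)]


# Insert out of order on purpose; reads still come back sorted by subject.
insert("111", "swim", to_add="cat@b.com", cc="hippo@b.com", body="in the pool")
insert("111", "party", to_add="cat@b.com", cc="hippo@b.com", body="at my place")

subjects = [subject for subject, _ in read_partition("111")]
print(subjects)
```

Inserting "swim" before "party" makes the point that the order comes from the clustering column, not from insertion order.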
21. Why sorted?
SLICE QUERIES!!
SELECT * FROM emails WHERE user_id = '111'
AND (subject) >= ('s') AND (subject) < ('t');
111   party|to_add   party|cc   party|body
      cat@           hippo@     at my place
      swim|to_add    swim|cc    swim|body
      cat@           hippo@     in the pool
Only the swim columns fall inside the slice.
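Why sorting makes slices cheap can be illustrated with a binary search over sorted clustering values. This is a sketch of the idea under that assumption, not Cassandra's read path; `slice_rows` is a hypothetical helper.

```python
from bisect import bisect_left


def slice_rows(sorted_keys, low, high):
    """Return the keys k with low <= k < high from an already sorted list."""
    start = bisect_left(sorted_keys, low)   # first key >= low
    end = bisect_left(sorted_keys, high)    # first key >= high
    return sorted_keys[start:end]


subjects = ["party", "swim"]  # clustering values of row 111, in sorted order
result = slice_rows(subjects, "s", "t")
print(result)
```

Because the keys are sorted, both ends of the range are found by binary search and everything in between is one contiguous run.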
24. Composite Insert 2 Result
SELECT * FROM emails WHERE user_id = '111';
SELECT * FROM emails WHERE user_id = '111'
AND subject = 'party';
Column names are prefixed with the to_add value:
111:party   cat@|cc   cat@|body
            hippo@    at my place
25. Data Model - Composite Insert 1
INSERT INTO email_app.emails (user_id, subject, to_add, cc, body)
VALUES ('111', 'party', 'dog@b.com', 'hippo@b.com', 'all the time');
SELECT * FROM emails WHERE user_id = '111'
AND subject = 'party';
Column names are prefixed with the to_add value:
111:party   cat@|cc   cat@|body
            hippo@    at my place
            dog@|cc   dog@|body
            hippo@    all the time
Sorting / slice on - "to_add"
27. Composite / Clustered Inserts
INSERT INTO email_app.emails (user_id, subject, to_add, cc, body)
VALUES ('111', 'party', 'dog@b.com', 'hippo@b.com', 'all the time');
INSERT INTO email_app.emails (user_id, subject, to_add, cc, body)
VALUES ('111', 'party', 'cat@b.com', 'hippo@b.com', 'At my place');
INSERT INTO email_app.emails (user_id, subject, to_add, cc, body)
VALUES ('111', 'party', 'cat@b.com', 'mouse@b.com', 'At my place');
28. DM - Composite / Clustered Inserts
SELECT * FROM emails WHERE user_id = '111'
AND subject = 'party';
111|party   cat@|hippo@|body   cat@|mouse@|body   dog@|hippo@|body
            at my place        at my place        all the time
Slice on (to_add) OR (to_add, cc)
29. Mental Model - Nested Sorted Hash
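111|party (row key: user_id + subject)
    cat (clustering column: to_add)
        hippo (clustering column: cc)
            body (column) -> column value
        mouse (clustering column: cc)
            body (column) -> column value
    dog (clustering column: to_add)
        hippo (clustering column: cc)
            body (column) -> column value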
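The two-level clustering model, and slicing on (to_add) or (to_add, cc), can be sketched with sorted tuples. This is a mental-model sketch only; `Partition`, `insert` and `slice_prefix` are hypothetical names, and the row key is assumed to be the pair (user_id, subject) as in the diagram above.

```python
from bisect import bisect_left, insort


class Partition:
    """One row: sorted (to_add, cc) clustering keys plus their body values."""

    def __init__(self):
        self.keys = []   # sorted list of (to_add, cc) tuples
        self.cells = {}  # (to_add, cc) -> body


partitions = {}  # (user_id, subject) -> Partition


def insert(user_id, subject, to_add, cc, body):
    part = partitions.setdefault((user_id, subject), Partition())
    key = (to_add, cc)
    if key not in part.cells:
        insort(part.keys, key)  # keep clustering keys sorted
    part.cells[key] = body      # upsert the value


def slice_prefix(user_id, subject, *prefix):
    """Return entries whose clustering key starts with prefix: (to_add,) or (to_add, cc)."""
    part = partitions[(user_id, subject)]
    start = bisect_left(part.keys, prefix)  # a shorter tuple sorts before its extensions
    out = []
    for key in part.keys[start:]:
        if key[:len(prefix)] != prefix:
            break
        out.append((key, part.cells[key]))
    return out


insert("111", "party", "dog@b.com", "hippo@b.com", "all the time")
insert("111", "party", "cat@b.com", "hippo@b.com", "At my place")
insert("111", "party", "cat@b.com", "mouse@b.com", "At my place")

by_to = slice_prefix("111", "party", "cat@b.com")
by_to_cc = slice_prefix("111", "party", "cat@b.com", "mouse@b.com")
print(len(by_to), len(by_to_cc))
```

A slice on cc alone is not offered by `slice_prefix`, since entries sharing a cc value are not contiguous in the (to_add, cc) ordering.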
30. Part 2 / 8 of this 7 hour talk
● Denormalization
● Index Column Families
● Cassandra Internals (memtables, SSTables,
compaction, repair)
31. Part 8 / 8: The Future
● Continually improving
● More and more adoption
● Awesome projects
● http://www.datastax.com/documentation/cassandra/2.0/pdf/cassandra20.pdf
● http://planetcassandra.org/