DOAN DuyHai – Cassandra: real world best use-cases and worst anti-patterns - NoSQL matters Barcelona 2014

In this session, you'll see how to leverage the best features of Cassandra to solve real-world issues (rate limiting/anti-fraud system, account validation, security token, …). We'll also highlight some common anti-patterns (queue, partition key miss, CQL3 null) and see how to solve them the Cassandra way.

  1. Cassandra nice use-cases and worst anti-patterns. DuyHai DOAN, Technical Advocate, @doanduyhai
  2. Shameless self-promotion: Duy Hai DOAN, Cassandra technical advocate • talks, meetups, confs • open-source devs (Achilles, …) • technical point of contact ☞ duy_hai.doan@datastax.com • production troubleshooting
  3. Agenda: Anti-patterns • Queue-like designs • CQL null values • Intensive updates on same column • Design around dynamic schema
  4. Agenda: Nice use-cases • Rate-limiting • Anti Fraud • Account validation • Sensor data timeseries
  5. Data Model Crash Course
  6. Last Write Win (LWW): INSERT INTO users(login, name, age) VALUES('jdoe', 'John DOE', 33); Resulting partition (#partition) jdoe: age = 33, name = John DOE
  7. Last Write Win (LWW): INSERT INTO users(login, name, age) VALUES('jdoe', 'John DOE', 33); Every cell gets an auto-generated timestamp (μs): jdoe: age (t1) = 33, name (t1) = John DOE
  8. Last Write Win (LWW): UPDATE users SET age = 34 WHERE login = 'jdoe'; SSTable1: jdoe: age (t1) = 33, name (t1) = John DOE. SSTable2: jdoe: age (t2) = 34
  9. Last Write Win (LWW): DELETE age FROM users WHERE login = 'jdoe'; The delete is written as a tombstone. SSTable1: jdoe: age (t1) = 33, name (t1) = John DOE. SSTable2: jdoe: age (t2) = 34. SSTable3: jdoe: age (t3) = tombstone
  10. Last Write Win (LWW): SELECT age FROM users WHERE login = 'jdoe'; Which SSTable holds the answer? SSTable1: jdoe: age (t1) = 33, name (t1) = John DOE. SSTable2: jdoe: age (t2) = 34. SSTable3: jdoe: age (t3) = tombstone
  11. Last Write Win (LWW): SELECT age FROM users WHERE login = 'jdoe'; SSTable1 ✕, SSTable2 ✕, SSTable3 ✓: the cell with the most recent timestamp (t3, the tombstone) wins
  12. Compaction: SSTable1, SSTable2 and SSTable3 are merged into a new SSTable: jdoe: age (t3) = tombstone, name (t1) = John DOE
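
The read path and the compaction step of slides 8 to 12 can be sketched in a few lines of Python. This is only an in-memory illustration under simplifying assumptions: `Cell`, `TOMBSTONE`, `read_cell` and `compact` are hypothetical names, not Cassandra APIs, and each SSTable is modelled as a plain dictionary.

```python
from typing import NamedTuple

TOMBSTONE = object()  # hypothetical marker standing in for a deleted value


class Cell(NamedTuple):
    value: object
    timestamp: int  # t1, t2, t3 from the slides, modelled as 1, 2, 3


# One dict per SSTable: {(partition_key, column): Cell}
sstable1 = {("jdoe", "age"): Cell(33, 1), ("jdoe", "name"): Cell("John DOE", 1)}
sstable2 = {("jdoe", "age"): Cell(34, 2)}
sstable3 = {("jdoe", "age"): Cell(TOMBSTONE, 3)}


def read_cell(sstables, partition_key, column):
    """Look at every SSTable; the cell with the highest timestamp wins."""
    key = (partition_key, column)
    candidates = [table[key] for table in sstables if key in table]
    if not candidates:
        return None  # never written
    winner = max(candidates, key=lambda cell: cell.timestamp)
    return None if winner.value is TOMBSTONE else winner.value


def compact(sstables):
    """Merge SSTables, keeping only the newest cell per (key, column)."""
    merged = {}
    for table in sstables:
        for key, cell in table.items():
            if key not in merged or cell.timestamp > merged[key].timestamp:
                merged[key] = cell
    return merged


tables = [sstable1, sstable2, sstable3]
age = read_cell(tables, "jdoe", "age")    # tombstone at t3 wins, so None
name = read_cell(tables, "jdoe", "name")  # only written once, at t1
new_sstable = compact(tables)             # age (t3) tombstone + name (t1)
```

In this sketch the compacted table still carries the tombstone, as on slide 12; when a tombstone is finally purged is not modelled here.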
  13. Simple Table: CREATE TABLE users ( login text, name text, age int, … PRIMARY KEY(login)); login is the partition key (#partition)
  14. Clustered table (1 – N): CREATE TABLE sensor_data ( sensor_id text, date timestamp, raw_data blob, PRIMARY KEY((sensor_id), date)); sensor_id is the partition key, date is the clustering column (sorted); the primary key guarantees uniqueness
  15. Sorted on disk: in SSTable1, partition sensor_id stores its columns in clustering order: date1(t1), date2(t2), date3(t3), date4(t4), date5(t5), …
  16. Worst anti-patterns: Queue-like designs • CQL null • Intensive updates on same column • Design around dynamic schema
  17. Failure level scale: ☠, ☠☠, ☠☠☠, ☠☠☠☠
  18. Queue-like designs: Adding new message ☞ 1 physical insert
  19. Queue-like designs: Adding new message ☞ 1 physical insert. Consuming message = deleting it ☞ 1 physical insert (tombstone)
  20. Queue-like designs: Adding new message ☞ 1 physical insert. Consuming message = deleting it ☞ 1 physical insert (tombstone). Transactional queue = re-inserting messages ☞ physical insert * <many>
  21. Queue-like designs, FIFO queue: physical writes: A; logical queue: { A }
  22. Queue-like designs, FIFO queue: physical writes: A B; logical queue: { A, B }
  23. Queue-like designs, FIFO queue: physical writes: A B C; logical queue: { A, B, C }
  24. Queue-like designs, FIFO queue: physical writes: A B C A; logical queue: { B, C }
  25. Queue-like designs, FIFO queue: physical writes: A B C A D; logical queue: { B, C, D }
  26. Queue-like designs, FIFO queue: physical writes: A B C A D B; logical queue: { C, D }
  27. Queue-like designs, FIFO queue: physical writes: A B C A D B C; logical queue: { D }
  28. Queue-like designs, FIFO queue, worst case: physical writes: A A A A A A A A A A; logical queue: { }
  29. Failure level: ☠☠☠
  30. Queue-like designs: Solution: event-sourcing • write ahead, never delete • read = move a cursor forward (or backward in time for history). Log: A B C D A E, with a read cursor and a write cursor; the next read will give {A, E}
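
To make the contrast between slides 21 to 28 and slide 30 concrete, here is a small Python sketch. It is a toy model, not Cassandra code: `read_delete_based_queue` and `EventLog` are hypothetical names, and "cells scanned" simply counts every insert and tombstone a read has to walk through.

```python
def read_delete_based_queue(writes):
    """writes: list of ("put", msg) or ("del", msg) in write order.

    Returns (live messages, physical cells scanned). Every insert and every
    tombstone is a physical cell the read has to look at.
    """
    live = []
    scanned = 0
    for op, msg in writes:
        scanned += 1
        if op == "put":
            live.append(msg)
        else:
            live.remove(msg)  # the tombstone shadows the earlier insert
    return live, scanned


class EventLog:
    """Append-only log: nothing is deleted, reading only moves a cursor."""

    def __init__(self):
        self.events = []
        self.read_cursor = 0

    def append(self, msg):
        self.events.append(msg)  # the write cursor is the end of the list

    def read_next(self, count):
        batch = self.events[self.read_cursor:self.read_cursor + count]
        self.read_cursor += len(batch)
        return batch


# Worst case of slide 28: ten physical cells, empty logical queue.
worst_live, worst_scanned = read_delete_based_queue([("put", "A"), ("del", "A")] * 5)

# Slide 30: log A B C D A E, read cursor after D, next read gives A and E.
log = EventLog()
for message in "ABCDAE":
    log.append(message)
first_batch = log.read_next(4)
second_batch = log.read_next(2)
```

The design point is that the event log never produces tombstones, so the read cost depends on the batch size and not on how many messages were consumed earlier.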
  31. CQL null semantics: Reading a null value means • value does not exist (has never been created) • value deleted (tombstone). SELECT age FROM users WHERE login = 'jdoe'; → NULL
  32. CQL null semantics: Writing null means • delete value (creating tombstone) • even though it does not exist. UPDATE users SET age = NULL WHERE login = 'jdoe';
  33. CQL null semantics: Seen in production: prepared statement UPDATE users SET age = ?, … geo_location = ?, mood = ?, … WHERE login = ?;
  34. CQL null semantics: Seen in production: bound statement preparedStatement.bind(33, …, null, null, null, …); null ☞ tombstone creation on each update. Row jdoe: age = 33, name = John DOE, geo_loc = tombstone, mood = tombstone, status = tombstone
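
One way to avoid binding nulls is to build the statement from the columns that actually carry a value. The Python sketch below only assembles the CQL text and the parameter list; `build_update` is a hypothetical helper, not a driver API, and it assumes table and column names come from trusted application code since they are interpolated into the statement.

```python
def build_update(table, key_column, key_value, fields):
    """Return (cql, params) for an UPDATE that sets only the non-None fields.

    Returns None when there is nothing to update, so no statement (and no
    tombstone) is produced at all.
    """
    present = {column: value for column, value in fields.items() if value is not None}
    if not present:
        return None
    assignments = ", ".join(f"{column} = ?" for column in present)
    cql = f"UPDATE {table} SET {assignments} WHERE {key_column} = ?;"
    params = list(present.values()) + [key_value]
    return cql, params


# Same situation as slide 34: only age is known, the other columns are null.
statement = build_update(
    "users", "login", "jdoe",
    {"age": 33, "geo_location": None, "mood": None},
)
```

The trade-off is that each distinct set of columns yields a different statement, so an application would prepare a handful of variants instead of one catch-all statement.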
  35. Failure level: ☠
  36. Intensive update: Context • small start-up • cloud-based video recording & alarm • internet of things (sensor) • 10 updates/sec for some sensors
  37. Intensive update on same column: Data model: CREATE TABLE sensor_data ( sensor_id bigint, value double, PRIMARY KEY(sensor_id)); Example partition: sensor_id: value = 45.0034
  38. Intensive update on same column: Updates: UPDATE sensor_data SET value = 45.0034 WHERE sensor_id = …; UPDATE sensor_data SET value = 47.4182 WHERE sensor_id = …; UPDATE sensor_data SET value = 48.0300 WHERE sensor_id = …; On disk: sensor_id: value (t1) = 45.0034; sensor_id: value (t13) = 47.4182; sensor_id: value (t36) = 48.0300
  39. Intensive update on same column: Read: SELECT value FROM sensor_data WHERE sensor_id = …; reads N physical columns, only 1 useful (until compaction). On disk: sensor_id: value (t1) = 45.0034; sensor_id: value (t13) = 47.4182; sensor_id: value (t36) = 48.0300
  40. Failure level: ☠☠
  41. Intensive update on same column: Solution 1: leveled compaction (if your I/O can keep up). The three versions value (t1) = 45.0034, value (t13) = 47.4182 and value (t36) = 48.0300 are compacted into the single value (t36) = 48.0300
  42. Intensive update on same column: Solution 2: reversed timeseries & DateTiered compaction strategy. CREATE TABLE sensor_data ( sensor_id bigint, date timestamp, value double, PRIMARY KEY((sensor_id), date)) WITH CLUSTERING ORDER BY (date DESC);
  43. Intensive update on same column: SELECT value FROM sensor_data WHERE sensor_id = … LIMIT 1; Partition sensor_id, newest first: date3(t3) = 48.0300, date2(t2) = 47.4182, date1(t1) = 45.0034, … Data cleanup by configuring the compaction strategy (base_time_seconds)
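
The idea behind the reversed timeseries can be mimicked in memory: every measurement becomes its own entry, entries are kept newest first, and reading the current value is a `LIMIT 1` on the head of the partition. The Python sketch below is an illustration only; `ReversedTimeseries` is a hypothetical class and timestamps are plain integers.

```python
import bisect


class ReversedTimeseries:
    """Per-sensor rows kept in descending timestamp order."""

    def __init__(self):
        # sensor_id -> list of (-timestamp, value); ascending on the negated
        # timestamp means descending on the real one (CLUSTERING ORDER ... DESC).
        self._partitions = {}

    def insert(self, sensor_id, timestamp, value):
        rows = self._partitions.setdefault(sensor_id, [])
        bisect.insort(rows, (-timestamp, value))

    def latest(self, sensor_id, limit=1):
        """Equivalent of SELECT ... WHERE sensor_id = ? LIMIT <limit>."""
        rows = self._partitions.get(sensor_id, [])
        return [(-negated, value) for negated, value in rows[:limit]]


series = ReversedTimeseries()
# Inserted out of order on purpose; the partition stays sorted newest first.
series.insert(42, 13, 47.4182)
series.insert(42, 36, 48.0300)
series.insert(42, 1, 45.0034)
current = series.latest(42)
```

Unlike the single-column model of slide 37, no write overwrites another here, so the read touches one entry instead of N shadowed versions.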
  44. Design around dynamic schema: Customer emergency call • 3-node cluster almost full • impossible to scale out • 4th node in JOINING state for 1 week • disk space is filling up, production at risk!
  45. Design around dynamic schema: After investigation • 4th node in JOINING state because streaming is stalled • NPE in logs
  46. Design around dynamic schema: After investigation • 4th node in JOINING state because streaming is stalled • NPE in logs. Cassandra source code to the rescue
  47. Design around dynamic schema: public class CompressedStreamReader extends StreamReader { … @Override public SSTableWriter read(ReadableByteChannel channel) throws IOException { … Pair<String, String> kscf = Schema.instance.getCF(cfId); ColumnFamilyStore cfs = Keyspace.open(kscf.left).getColumnFamilyStore(kscf.right); // NPE here
  48. Design around dynamic schema: The truth is • the devs dynamically drop & recreate a table every day • dynamic schema is at the core of their design. Example: DROP TABLE catalog_127_20140613; CREATE TABLE catalog_127_20140614( … );
  49. Design around dynamic schema: Failure sequence [diagram: ring of nodes n1, n2, n3 and n4, each holding table catalog_x_y; numbered labels 1 to 6]
  50. Design around dynamic schema: Failure sequence [diagram: same ring; table catalog_x_z now also exists on n1, n2, n3 and n4]
  51. Design around dynamic schema: Failure sequence [diagram: same ring; only catalog_x_z remains on each node; catalog_x_y ????]
  52. Design around dynamic schema: Nutshell • dynamic schema change as a normal prod operation is not recommended • schema AND topology change at the same time is an anti-pattern
  53. Failure level: ☠☠☠☠
  54. Q & A
  55. Nice Examples: Rate limiting • Anti Fraud • Account Validation
  56. Rate limiting: Start-up company, reset password feature: 1) /password/reset 2) SMS sent to the phone number with a random token (A0F83E63DB935465CE73DFE…) 3) /password/new/<token>/<password>
  57. Rate limiting: Problem 1 • account created with premium phone number
  58. Rate limiting: Problem 1 • account created with premium phone number • /password/reset x 100
  59. Rate limiting: « money, money, money, give money, in the richman's world » $$$
  60. Rate limiting: Problem 2 • massive hack
  61. Rate limiting: Problem 2 • massive hack • 10^6 /password/reset calls from a few accounts
  62. Rate limiting: Problem 2 • massive hack • 10^6 /password/reset calls from a few accounts • SMS messages are cheap
  63. Rate limiting: Problem 2 • ☞ but not at the scale of 10^6 per user per day
  64. Rate limiting: Solution • premium phone number ☞ Google libphonenumber
  65. Rate limiting: Solution • premium phone number ☞ Google libphonenumber • massive hack ☞ rate limiting with Cassandra
  66. Cassandra Time To Live: Time to live • built-in feature • insert data with a TTL in seconds • expires server-side automatically • ☞ use as sliding window
  67. Rate limiting in action: Implementation • threshold = max 3 password resets per sliding 24h window per user
  68. Rate limiting in action: Implementation • when /password/reset is called • check threshold • reached ☞ error message/ignore • not reached ☞ log the attempt with TTL = 86400
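
The implementation steps of slides 67 and 68 can be sketched as follows. This is an in-memory stand-in for the Cassandra table, under stated assumptions: `TtlRateLimiter` is a hypothetical class, expiry is emulated with an injectable clock, and each logged attempt simply disappears once its TTL has elapsed, which is what gives the sliding window.

```python
import time


class TtlRateLimiter:
    """Allow at most max_attempts per login within a sliding TTL window."""

    def __init__(self, max_attempts=3, ttl_seconds=86400, clock=time.time):
        self.max_attempts = max_attempts
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._attempts = {}  # login -> list of expiry times of logged attempts

    def _live_attempts(self, login):
        """Drop expired attempts, as a server-side TTL would."""
        now = self.clock()
        live = [expiry for expiry in self._attempts.get(login, []) if expiry > now]
        self._attempts[login] = live
        return live

    def allow_reset(self, login):
        live = self._live_attempts(login)
        if len(live) >= self.max_attempts:
            return False  # threshold reached: error message or ignore
        live.append(self.clock() + self.ttl_seconds)  # log the attempt with a TTL
        return True


# Demo with a fake clock so the 24h window can be crossed instantly.
fake_now = [0]
limiter = TtlRateLimiter(clock=lambda: fake_now[0])
results = [limiter.allow_reset("jdoe") for _ in range(4)]
fake_now[0] = 86400  # 24h later, the first three attempts have expired
allowed_after_window = limiter.allow_reset("jdoe")
```

Note that this sketch checks and logs in two steps without any locking; how concurrent requests are handled is outside its scope.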
  69. Rate Limiting Demo
  70. Anti Fraud: Real story • many special offers available • 30 mins international calls (50 countries) • unlimited land-line calls to 5 countries • …
  71. Anti Fraud: Real story • each offer has a duration (week/month/year) • only one offer active at a time
  72. Anti Fraud: Cassandra TTL • when granting new offer INSERT INTO user_special_offer(login, offer_code, …) VALUES('jdoe', '30_mins_international', …) IF NOT EXISTS USING TTL <offer_duration>;
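
The combination of `IF NOT EXISTS` and a TTL can be emulated in memory to show why it enforces "only one offer active at a time". The sketch assumes that `login` alone identifies the row (the slide elides the remaining columns); `OfferStore` and `grant_if_not_exists` are hypothetical names and the clock is injectable for the demo.

```python
import time


class OfferStore:
    """One offer per login; a row vanishes when its TTL has elapsed."""

    def __init__(self, clock=time.time):
        self.clock = clock
        self._offers = {}  # login -> (offer_code, expires_at)

    def grant_if_not_exists(self, login, offer_code, duration_seconds):
        """Mimic INSERT ... IF NOT EXISTS USING TTL <offer_duration>."""
        current = self._offers.get(login)
        if current is not None and current[1] > self.clock():
            return False  # an offer is still active: the insert is not applied
        self._offers[login] = (offer_code, self.clock() + duration_seconds)
        return True


ONE_WEEK = 7 * 24 * 3600

fake_now = [0]
store = OfferStore(clock=lambda: fake_now[0])
first_grant = store.grant_if_not_exists("jdoe", "30_mins_international", ONE_WEEK)
second_grant = store.grant_if_not_exists("jdoe", "unlimited_land_line", ONE_WEEK)
fake_now[0] = ONE_WEEK  # the first offer has expired
grant_after_expiry = store.grant_if_not_exists("jdoe", "unlimited_land_line", ONE_WEEK)
```

The offer code `unlimited_land_line` is made up for the demo; only `30_mins_international` appears on the slide.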
  73. Anti Fraud Demo
  74. Account Validation: Requirement • user creates new account • send SMS/email link with token to validate account • 10 days to validate
  75. Account Validation: How to? • create account with 10 days TTL INSERT INTO users(login, name, age) VALUES('jdoe', 'John DOE', 33) USING TTL 864000;
  76. Account Validation: How to? • create random token for validation with 10 days TTL INSERT INTO account_validation(token, login, name, age) VALUES('A0F83E63DB935465CE73DFE…', 'jdoe', 'John DOE', 33) USING TTL 864000;
  77. Account Validation: On token validation • check that the token exists & retrieve user details SELECT login, name, age FROM account_validation WHERE token = 'A0F83E63DB935465CE73DFE…'; • durably re-insert user details without TTL INSERT INTO users(login, name, age) VALUES('jdoe', 'John DOE', 33);
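
The whole validation flow of slides 74 to 77 fits in a short in-memory sketch. `TtlTable`, `create_account` and `validate_account` are hypothetical names standing in for the `users` and `account_validation` tables; the token is generated with Python's `secrets` module, which is an assumption and not what the original system used.

```python
import secrets
import time

TEN_DAYS = 864000  # seconds, same value as in the slides


class TtlTable:
    """Key/value table where rows can carry an optional TTL."""

    def __init__(self, clock=time.time):
        self.clock = clock
        self._rows = {}  # key -> (row, expires_at or None)

    def insert(self, key, row, ttl=None):
        expires_at = None if ttl is None else self.clock() + ttl
        self._rows[key] = (row, expires_at)

    def get(self, key):
        entry = self._rows.get(key)
        if entry is None:
            return None
        row, expires_at = entry
        if expires_at is not None and expires_at <= self.clock():
            del self._rows[key]  # expired, as if removed server-side
            return None
        return row


def create_account(users, tokens, login, name, age):
    """Insert the account and its validation token, both with a 10 days TTL."""
    row = {"login": login, "name": name, "age": age}
    token = secrets.token_hex(16).upper()
    users.insert(login, row, ttl=TEN_DAYS)
    tokens.insert(token, row, ttl=TEN_DAYS)
    return token


def validate_account(users, tokens, token):
    """If the token still exists, re-insert the user durably (no TTL)."""
    row = tokens.get(token)
    if row is None:
        return False
    users.insert(row["login"], row)
    return True
```

An account that is never validated needs no cleanup job in this model: both rows simply expire after ten days.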
  78. Q & A
  79. Thank You. @doanduyhai, duy_hai.doan@datastax.com, https://academy.datastax.com/
