DOAN DuyHai – Cassandra: real world best use-cases and worst anti-patterns - NoSQl matters Barcelona 2014

In this session, you'll see how to leverage the best features of Cassandra to solve real-world issues (rate limiting/anti-fraud system, account validation, security token …). We'll also highlight some common anti-patterns (queue, partition key miss, CQL3 null) and see how to solve them the Cassandra way.

Published in: Data & Analytics

  1. Cassandra nice use-cases and worst anti-patterns. DuyHai DOAN, Technical Advocate, @doanduyhai
  2. Shameless self-promotion! Duy Hai DOAN, Cassandra technical advocate • talks, meetups, confs • open-source devs (Achilles, …) • technical point of contact ☞ duy_hai.doan@datastax.com • production troubleshooting
  3. Agenda! Anti-patterns • Queue-like designs • CQL null values • Intensive updates on same column • Design around dynamic schema
  4. Agenda! Nice use-cases • Rate-limiting • Anti Fraud • Account validation • Sensor data timeseries
  5. Data Model Crash Course!
  6. Last Write Win (LWW)! INSERT INTO users(login, name, age) VALUES('jdoe', 'John DOE', 33); Resulting partition (#partition) jdoe: age = 33, name = John DOE
  7. Last Write Win (LWW)! INSERT INTO users(login, name, age) VALUES('jdoe', 'John DOE', 33); Every cell carries an auto-generated write timestamp (μs): jdoe: age (t1) = 33, name (t1) = John DOE
  8. Last Write Win (LWW)! UPDATE users SET age = 34 WHERE login = 'jdoe'; SSTable1: jdoe: age (t1) = 33, name (t1) = John DOE. SSTable2: jdoe: age (t2) = 34
  9. Last Write Win (LWW)! DELETE age FROM users WHERE login = 'jdoe'; SSTable1: jdoe: age (t1) = 33, name (t1) = John DOE. SSTable2: jdoe: age (t2) = 34. SSTable3: jdoe: age (t3) = ☒ (tombstone)
  10. Last Write Win (LWW)! SELECT age FROM users WHERE login = 'jdoe'; Which SSTable holds the answer? SSTable1 (age (t1) = 33)? SSTable2 (age (t2) = 34)? SSTable3 (age (t3) = ☒)?
  11. Last Write Win (LWW)! SELECT age FROM users WHERE login = 'jdoe'; SSTable1 ✕, SSTable2 ✕, SSTable3 ✓: the cell with the most recent timestamp wins, here the tombstone age (t3) = ☒
  12. Compaction! SSTable1 (age (t1) = 33, name (t1) = John DOE), SSTable2 (age (t2) = 34) and SSTable3 (age (t3) = ☒) are merged into a new SSTable: jdoe: age (t3) = ☒, name (t1) = John DOE
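The last-write-wins read and the compaction merge on slides 8 to 12 can be sketched in a few lines of Python. This is a toy model, not Cassandra's storage engine: SSTables are plain dictionaries, a tombstone is modelled as the value `None`, and the names `merge_cells`, `read` and `compact` are made up for illustration. Real compaction also eventually purges tombstones and breaks timestamp ties, which this sketch ignores.

```python
from typing import Dict, List, Optional, Tuple

# A cell is (write_timestamp, value); value None models a tombstone.
Cell = Tuple[int, Optional[object]]
# An SSTable maps partition key -> column name -> cell.
SSTable = Dict[str, Dict[str, Cell]]


def merge_cells(sstables: List[SSTable], key: str) -> Dict[str, Cell]:
    """For each column of a partition, keep the cell with the highest timestamp."""
    merged: Dict[str, Cell] = {}
    for sstable in sstables:
        for column, cell in sstable.get(key, {}).items():
            if column not in merged or cell[0] > merged[column][0]:
                merged[column] = cell
    return merged


def read(sstables: List[SSTable], key: str, column: str) -> Optional[object]:
    """Return the winning value, or None if it is missing or deleted."""
    cell = merge_cells(sstables, key).get(column)
    return None if cell is None else cell[1]


def compact(sstables: List[SSTable], key: str) -> SSTable:
    """Merge all SSTables for one partition into a single new SSTable."""
    return {key: merge_cells(sstables, key)}


# The three SSTables from the slides (t1 < t2 < t3).
sstable1 = {"jdoe": {"age": (1, 33), "name": (1, "John DOE")}}
sstable2 = {"jdoe": {"age": (2, 34)}}
sstable3 = {"jdoe": {"age": (3, None)}}  # DELETE age ... writes a tombstone
tables = [sstable1, sstable2, sstable3]
```

Reading `age` from `tables` has to consult all three SSTables before the tombstone at t3 wins; after `compact`, a single structure holds the answer.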
  13. Simple Table! CREATE TABLE users ( login text, name text, age int, … PRIMARY KEY(login)); login is the partition key (#partition)
  14. Clustered table (1 – N)! CREATE TABLE sensor_data ( sensor_id text, date timestamp, raw_data blob, PRIMARY KEY((sensor_id), date)); sensor_id is the partition key, date is the clustering column (sorted); together they guarantee uniqueness
  15. Sorted on disk! Within partition sensor_id in SSTable1, cells are stored sorted by clustering column: date1(t1), date2(t2), date3(t3), date4(t4), date5(t5), …
  16. Worst anti-patterns! Queue-like designs! CQL null! Intensive updates on same column! Design around dynamic schema!
  17. Failure level! Scale: ☠, ☠☠, ☠☠☠, ☠☠☠☠
  18. Queue-like designs! Adding new message ☞ 1 physical insert
  19. Queue-like designs! Adding new message ☞ 1 physical insert. Consuming message = deleting it ☞ 1 physical insert (tombstone)
  20. Queue-like designs! Adding new message ☞ 1 physical insert. Consuming message = deleting it ☞ 1 physical insert (tombstone). Transactional queue = re-inserting messages ☞ physical insert * <many>
  21. Queue-like designs! FIFO queue. On disk: A. Live queue: { A }
  22. Queue-like designs! FIFO queue. On disk: A B. Live queue: { A, B }
  23. Queue-like designs! FIFO queue. On disk: A B C. Live queue: { A, B, C }
  24. Queue-like designs! FIFO queue. On disk: A B C A(tombstone). Live queue: { B, C }
  25. Queue-like designs! FIFO queue. On disk: A B C A(tombstone) D. Live queue: { B, C, D }
  26. Queue-like designs! FIFO queue. On disk: A B C A(tombstone) D B(tombstone). Live queue: { C, D }
  27. Queue-like designs! FIFO queue. On disk: A B C A(tombstone) D B(tombstone) C(tombstone). Live queue: { D }
  28. Queue-like designs! FIFO queue, worst case. On disk: A A A A A A A A A A. Live queue: { }
  29. Failure level! ☠☠☠
  30. Queue-like designs! Solution: event-sourcing • write ahead, never delete • read = move a cursor forward (or backward in time for history). Log: A B C D A E. The read cursor sits after D (next read will give {A, E}); the write cursor sits at the end of the log
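The contrast between the delete-on-consume queue and the cursor-based log can be illustrated with a small in-memory Python sketch. Both classes are hypothetical stand-ins, not driver or Cassandra APIs: `TombstoneQueue` counts how many physical cells a read has to walk past, and `CursorLog` follows the "write ahead, never delete" idea from slide 30.

```python
class TombstoneQueue:
    """Anti-pattern: consuming a message deletes it, leaving a tombstone."""

    def __init__(self):
        self.messages = []       # physical inserts, in arrival order
        self.tombstones = set()  # indexes of consumed (deleted) messages

    def push(self, message):
        self.messages.append(message)

    def pop(self):
        """Return (message, cells_scanned); message is None when nothing is live."""
        scanned = 0
        for index, message in enumerate(self.messages):
            scanned += 1
            if index not in self.tombstones:
                self.tombstones.add(index)  # consuming = one more physical write
                return message, scanned
        return None, scanned


class CursorLog:
    """Event-sourcing style: append only, reading just moves a cursor."""

    def __init__(self):
        self.events = []
        self.read_cursor = 0

    def append(self, event):
        self.events.append(event)

    def read_next(self):
        """Return the next unread event, or None when the cursor has caught up."""
        if self.read_cursor >= len(self.events):
            return None
        event = self.events[self.read_cursor]
        self.read_cursor += 1
        return event
```

In the worst case of slide 28 (insert A, consume A, repeated ten times), each `pop` on `TombstoneQueue` scans one more dead cell than the previous one, whereas `CursorLog.read_next` always touches exactly one event.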
  31. CQL null semantics! Reading a null value means • value does not exist (has never been created) • value deleted (tombstone). SELECT age FROM users WHERE login = 'jdoe'; → NULL
  32. CQL null semantics! Writing null means • delete value (creating tombstone) • even though it does not exist. UPDATE users SET age = NULL WHERE login = 'jdoe';
  33. CQL null semantics! Seen in production: prepared statement UPDATE users SET age = ?, … geo_location = ?, mood = ?, … WHERE login = ?;
  34. CQL null semantics! Seen in production: bound statement preparedStatement.bind(33, …, null, null, null, …); null ☞ tombstone creation on each update … Resulting partition jdoe: age = 33, name = John DOE, geo_loc = ☒, mood = ☒, status = ☒
  35. Failure level! ☠
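One way to avoid binding nulls is to build the UPDATE from only the columns that actually have a value. The Python sketch below shows the idea at the string level; `build_update` is a hypothetical helper, not part of any driver, and it treats `None` as "not provided" rather than "delete". Table and column names are formatted into the query text, so they must come from trusted code, never from user input.

```python
def build_update(table, key_column, key_value, values):
    """Build a parameterised UPDATE that skips columns whose value is None.

    Returns (query, params), or None when there is nothing to update.
    """
    # Drop None values so that no tombstone is written for them.
    assignments = {column: value for column, value in values.items()
                   if value is not None}
    if not assignments:
        return None
    set_clause = ", ".join(f"{column} = ?" for column in assignments)
    query = f"UPDATE {table} SET {set_clause} WHERE {key_column} = ?"
    params = list(assignments.values()) + [key_value]
    return query, params


statement = build_update(
    "users", "login", "jdoe",
    {"age": 33, "geo_location": None, "mood": None},
)
```

With the values from slide 34, only `age` ends up in the statement, so the update writes one cell instead of one cell plus a tombstone per null.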
  36. Intensive update! Context • small start-up • cloud-based video recording & alarm • internet of things (sensor) • 10 updates/sec for some sensors
  37. Intensive update on same column! Data model: one row per sensor_id holding a single value (e.g. 45.0034). CREATE TABLE sensor_data ( sensor_id bigint, value double, PRIMARY KEY(sensor_id));
  38. Intensive update on same column! UPDATE sensor_data SET value = 45.0034 WHERE sensor_id = …; UPDATE sensor_data SET value = 47.4182 WHERE sensor_id = …; UPDATE sensor_data SET value = 48.0300 WHERE sensor_id = …; Each update writes a new physical cell for the same sensor_id: value (t1) = 45.0034, value (t13) = 47.4182, value (t36) = 48.0300
  39. Intensive update on same column! Read: SELECT value FROM sensor_data WHERE sensor_id = …; reads N physical columns, only 1 useful … (until compaction): value (t1) = 45.0034, value (t13) = 47.4182, value (t36) = 48.0300
  40. Failure level! ☠☠
  41. Intensive update on same column! Solution 1: leveled compaction! (if your I/O can keep up). The cells value (t1) = 45.0034, value (t13) = 47.4182 and value (t36) = 48.0300 are compacted down to the single cell value (t36) = 48.0300
  42. Intensive update on same column! Solution 2: reversed timeseries & DateTiered compaction strategy. CREATE TABLE sensor_data ( sensor_id bigint, date timestamp, value double, PRIMARY KEY((sensor_id), date)) WITH CLUSTERING ORDER BY (date DESC);
  43. Intensive update on same column! SELECT value FROM sensor_data WHERE sensor_id = … LIMIT 1; Partition sensor_id, newest first: date3(t3) = 48.0300, date2(t2) = 47.4182, date1(t1) = 45.0034, … Data cleaning by configuring the strategy (base_time_seconds)
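Why the reversed timeseries makes the `LIMIT 1` read cheap can be shown with a toy Python model of one partition. `ReversedTimeseries` is a hypothetical in-memory stand-in: it keeps entries sorted newest-first, in the spirit of `CLUSTERING ORDER BY (date DESC)`, so the latest value is always the first entry. Dates are plain integers here for brevity.

```python
import bisect


class ReversedTimeseries:
    """One partition whose entries stay sorted newest-first."""

    def __init__(self):
        self._neg_dates = []  # negated dates, ascending => newest date first
        self._values = []     # values, kept in the same order

    def insert(self, date, value):
        """Insert a measurement at its sorted position."""
        position = bisect.bisect_left(self._neg_dates, -date)
        self._neg_dates.insert(position, -date)
        self._values.insert(position, value)

    def latest(self, limit=1):
        """Return the `limit` most recent (date, value) pairs, like LIMIT n."""
        return [(-neg_date, value)
                for neg_date, value in zip(self._neg_dates[:limit],
                                           self._values[:limit])]


series = ReversedTimeseries()
series.insert(1, 45.0034)
series.insert(36, 48.0300)
series.insert(13, 47.4182)
```

Unlike the single-column model, every measurement is a distinct cell, so a read of the latest value stops after the first entry instead of reconciling N versions of the same column.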
  44. Design around dynamic schema! Customer emergency call • 3-node cluster almost full • impossible to scale out • 4th node in JOINING state for 1 week • disk space is filling up, production at risk!
  45. Design around dynamic schema! After investigation • 4th node in JOINING state because streaming is stalled • NPE in logs
  46. Design around dynamic schema! After investigation • 4th node in JOINING state because streaming is stalled • NPE in logs. Cassandra source code to the rescue
  47. Design around dynamic schema! public class CompressedStreamReader extends StreamReader { … @Override public SSTableWriter read(ReadableByteChannel channel) throws IOException { … Pair<String, String> kscf = Schema.instance.getCF(cfId); ColumnFamilyStore cfs = Keyspace.open(kscf.left).getColumnFamilyStore(kscf.right); // NPE here
  48. Design around dynamic schema! The truth is • the devs dynamically drop & recreate a table every day • dynamic schema is at the core of their design. Example: DROP TABLE catalog_127_20140613; CREATE TABLE catalog_127_20140614( … );
  49. Design around dynamic schema! Failure sequence (diagram): nodes n1, n2, n3 and n4, with labels 1 to 6; every node holds table catalog_x_y
  50. Design around dynamic schema! Failure sequence (diagram): nodes n1, n2, n3 and n4, with labels 1 to 6; every node holds table catalog_x_y, and table catalog_x_z now appears on every node as well
  51. Design around dynamic schema! Failure sequence (diagram): nodes n1, n2, n3 and n4, with labels 1 to 6; every node holds only catalog_x_z. catalog_x_y ????
  52. Design around dynamic schema! Nutshell • dynamic schema change as a normal prod operation is not recommended • schema AND topology change at the same time is an anti-pattern
  53. Failure level! ☠☠☠☠
  54. Q & A
  55. Nice Examples! Rate limiting! Anti Fraud! Account Validation!
  56. Rate limiting! Start-up company, reset password feature 1) /password/reset 2) SMS with a random token (A0F83E63DB935465CE73DFE…) sent to the phone number 3) /password/new/<token>/<password>
  57. Rate limiting! Problem 1 • account created with premium phone number
  58. Rate limiting! Problem 1 • account created with premium phone number • /password/reset x 100
  59. Rate limiting! « money, money, money, give money, in the richman’s world » $$$
  60. Rate limiting! Problem 2 • massive hack
  61. Rate limiting! Problem 2 • massive hack • 10⁶ /password/reset calls from a few accounts
  62. Rate limiting! Problem 2 • massive hack • 10⁶ /password/reset calls from a few accounts • SMS messages are cheap
  63. Rate limiting! Problem 2 • ☞ but not at the scale of 10⁶ per user per day
  64. Rate limiting! Solution • premium phone number ☞ Google libphonenumber
  65. Rate limiting! Solution • premium phone number ☞ Google libphonenumber • massive hack ☞ rate limiting with Cassandra
  66. Cassandra Time To Live! Time to live • built-in feature • insert data with a TTL in seconds • expires server-side automatically • ☞ use as sliding-window
  67. Rate limiting in action! Implementation • threshold = max 3 password resets per user per sliding 24 h window
  68. Rate limiting in action! Implementation • when /password/reset is called • check threshold • reached ☞ error message/ignore • not reached ☞ log the attempt with TTL = 86400
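The check-then-log logic of slides 67 and 68 can be simulated without a cluster. In the Python sketch below, `TtlStore` is a hypothetical in-memory stand-in for a Cassandra table whose rows expire via TTL, and time is passed in explicitly as `now` (in seconds) so the sliding window is easy to follow. Note that check-then-insert is not atomic, so concurrent requests could slightly overshoot the threshold.

```python
MAX_RESETS = 3          # threshold from slide 67
WINDOW_SECONDS = 86400  # TTL from slide 68 (24 h)


class TtlStore:
    """Toy stand-in for a table whose rows disappear when their TTL elapses."""

    def __init__(self):
        self.rows = []  # list of (login, expires_at)

    def insert(self, login, now, ttl):
        self.rows.append((login, now + ttl))

    def count(self, login, now):
        """Count the rows for this login that have not expired yet."""
        return sum(1 for row_login, expires_at in self.rows
                   if row_login == login and expires_at > now)


def try_reset_password(store, login, now):
    """Return True if the reset is allowed and logged, False if rate-limited."""
    if store.count(login, now) >= MAX_RESETS:
        return False  # threshold reached: error message / ignore
    store.insert(login, now, WINDOW_SECONDS)  # log the attempt with a TTL
    return True
```

Because every logged attempt expires on its own exactly 24 hours after it was written, the window slides without any cleanup job: the fourth attempt is refused until the oldest of the three entries has expired.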
  69. Rate Limiting Demo
  70. Anti Fraud! Real story • many special offers available • 30 mins international calls (50 countries) • unlimited land-line calls to 5 countries • …
  71. Anti Fraud! Real story • each offer has a duration (week/month/year) • only one offer active at a time
  72. Anti Fraud! Cassandra TTL • when granting new offer INSERT INTO user_special_offer(login, offer_code, …) VALUES('jdoe', '30_mins_international', …) IF NOT EXISTS USING TTL <offer_duration>;
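The effect of combining `IF NOT EXISTS` with a TTL can be sketched as follows. `OfferTable` is a hypothetical in-memory model, and it assumes that `login` alone is the primary key of `user_special_offer` (the slide elides the full column list, so this is an assumption). Time is passed in explicitly in seconds.

```python
ONE_WEEK = 7 * 24 * 3600  # example offer duration, in seconds


class OfferTable:
    """Toy model of user_special_offer with at most one live row per login."""

    def __init__(self):
        self.rows = {}  # login -> (offer_code, expires_at)

    def insert_if_not_exists(self, login, offer_code, now, ttl):
        """Mimic INSERT ... IF NOT EXISTS USING TTL: True if the row was applied."""
        current = self.rows.get(login)
        if current is not None and current[1] > now:
            return False  # an offer is still active: the insert is not applied
        self.rows[login] = (offer_code, now + ttl)
        return True

    def active_offer(self, login, now):
        """Return the offer code that is currently active, or None."""
        current = self.rows.get(login)
        if current is None or current[1] <= now:
            return None
        return current[0]
```

A second grant for the same login is rejected while the first offer is alive, and becomes possible again once the TTL has elapsed, which is the "only one offer active at a time" rule from slide 71.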
  73. Anti Fraud Demo
  74. Account Validation! Requirement • user creates a new account • an SMS/email link with a token is sent to validate the account • 10 days to validate
  75. Account Validation! How to? • create the account with a 10-day TTL INSERT INTO users(login, name, age) VALUES('jdoe', 'John DOE', 33) USING TTL 864000;
  76. Account Validation! How to? • create a random token for validation with a 10-day TTL INSERT INTO account_validation(token, login, name, age) VALUES('A0F83E63DB935465CE73DFE…', 'jdoe', 'John DOE', 33) USING TTL 864000;
  77. Account Validation! On token validation • check that the token exists & retrieve user details SELECT login, name, age FROM account_validation WHERE token = 'A0F83E63DB935465CE73DFE…'; • re-insert the user details durably, without TTL INSERT INTO users(login, name, age) VALUES('jdoe', 'John DOE', 33);
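The whole validation flow of slides 74 to 77 can be walked through with a toy TTL table in Python. `TtlTable`, `create_account` and `validate_account` are hypothetical names used only for this sketch; time is passed in explicitly as `now`, in seconds.

```python
TEN_DAYS = 864000  # TTL used on the slides, in seconds


class TtlTable:
    """Toy key/value table where a row may carry an expiry time."""

    def __init__(self):
        self.rows = {}  # key -> (row, expires_at or None)

    def insert(self, key, row, now, ttl=None):
        expires_at = None if ttl is None else now + ttl
        self.rows[key] = (row, expires_at)

    def select(self, key, now):
        """Return the row, or None if it is absent or has expired."""
        entry = self.rows.get(key)
        if entry is None:
            return None
        row, expires_at = entry
        if expires_at is not None and expires_at <= now:
            return None
        return row


def create_account(users, validations, token, login, details, now):
    """Write the account and its validation token, both with a 10-day TTL."""
    users.insert(login, details, now, ttl=TEN_DAYS)
    validations.insert(token, {"login": login, **details}, now, ttl=TEN_DAYS)


def validate_account(users, validations, token, now):
    """On a valid token, re-insert the user details durably (no TTL)."""
    row = validations.select(token, now)
    if row is None:
        return False  # unknown or expired token
    details = {k: v for k, v in row.items() if k != "login"}
    users.insert(row["login"], details, now)  # overwrite without TTL
    return True
```

An account that is never validated simply vanishes after ten days with no cleanup job, while a validated one is rewritten without a TTL and therefore stays.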
  78. Q & A
  79. Thank You. @doanduyhai, duy_hai.doan@datastax.com, https://academy.datastax.com/
