2. @doanduyhai
Datastax
• Founded in April 2010
• We contribute a lot to Apache Cassandra™
• 400+ customers (25 of the Fortune 100), 450+ employees
• Headquarters in the San Francisco Bay Area
• EU headquarters in London, offices in France and Germany
• Datastax Enterprise = OSS Cassandra + extra features
7. @doanduyhai
3) Multi Data-centers
• out-of-the-box (config only)
• AWS config for multi-regions DCs
• GCE support
• Microsoft Azure support
• CloudStack support
8. @doanduyhai
Multi DC usages
Data locality, disaster recovery
[Diagram: two Cassandra (C*) rings, New York (DC1) and London (DC2), linked by async replication]
9. @doanduyhai
Multi DC usages
Virtual DC for workload segregation
[Diagram: two Cassandra (C*) rings in the same room, Production (LIVE) and Analytics (Spark), linked by async replication]
10. @doanduyhai
Multi DC usages
Prod data copy for back-up/benchmark
[Diagram: a Cassandra (C*) ring replicating asynchronously to "My tiny test DC" (READ-ONLY!!!). Note on the diagram: use LOCAL_XXX consistency levels.]
11. @doanduyhai
4) Operational simplicity
• 1 node = 1 process + 2 config files (cassandra.yaml + cassandra-rackdc.properties)
• No special roles among nodes, perfect symmetry
• deployment automation
• OpsCenter* for
• monitoring
• provisioning
• services (repair, performance, …)
* only with Datastax Enterprise from Cassandra 3.x
42. @doanduyhai
Consistency level common patterns
ONE Read + ONE Write
☞ available for read/write even if (RF - 1) replicas are down
QUORUM Read + QUORUM Write
☞ available for read/write even if (RF - quorum) replicas are down, where quorum = ⌊RF / 2⌋ + 1 (e.g. 1 replica down with RF = 3)
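The availability claims above come down to simple arithmetic on the replication factor. The following is a minimal sketch; the function names `quorum` and `tolerated_failures` are made up for illustration and are not part of any driver API.

```python
def quorum(rf: int) -> int:
    """Replicas that must answer at QUORUM: floor(RF / 2) + 1."""
    return rf // 2 + 1


def tolerated_failures(rf: int, required: int) -> int:
    """Replicas that may be down while `required` replicas can still answer."""
    return rf - required


for rf in (3, 5):
    q = quorum(rf)
    print(
        f"RF={rf}: ONE tolerates {tolerated_failures(rf, 1)} replicas down, "
        f"QUORUM (={q}) tolerates {tolerated_failures(rf, q)} replicas down"
    )
```

With RF = 3, ONE keeps working with 2 replicas down, while QUORUM needs 2 live replicas and so tolerates only 1 failure.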
45. @doanduyhai
Last Write Wins (LWW)
jdoe
age name
33 John DOE
INSERT INTO users(login, name, age) VALUES('jdoe', 'John DOE', 33);
#partition
46. @doanduyhai
Last Write Wins (LWW)
INSERT INTO users(login, name, age) VALUES('jdoe', 'John DOE', 33);
jdoe
age (t1) name (t1)
33 John DOE
auto-generated timestamp (μs)
47. @doanduyhai
Last Write Wins (LWW)
UPDATE users SET age = 34 WHERE login = 'jdoe';
jdoe
age (t1) name (t1)
33 John DOE
jdoe
age (t2)
34
SSTable1 SSTable2
48. @doanduyhai
Last Write Wins (LWW)
DELETE age FROM users WHERE login = 'jdoe';
jdoe
age (t1) name (t1)
33 John DOE
jdoe
age (t2)
34
SSTable1 SSTable2
tombstone
SSTable3
jdoe
age (t3)
✕
49. @doanduyhai
Last Write Wins (LWW)
SELECT age FROM users WHERE login = 'jdoe';
jdoe
age (t1) name (t1)
33 John DOE
jdoe
age (t2)
34
SSTable1 SSTable2 SSTable3
jdoe
age (t3)
✕ (tombstone)
???
50. @doanduyhai
Last Write Wins (LWW)
SELECT age FROM users WHERE login = 'jdoe';
jdoe
age (t1) name (t1)
33 John DOE
jdoe
age (t2)
34
SSTable1 SSTable2 SSTable3
jdoe
age (t3)
✕ (tombstone)
✓✕✕
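The read path illustrated on these slides can be sketched as a merge across SSTables in which the cell with the highest timestamp wins. This is a toy model, not Cassandra's storage engine: the `Cell` type, the `read_column` helper, and the integer timestamps t1 < t2 < t3 are assumptions for illustration, and tie-breaking between equal timestamps is ignored.

```python
from typing import Any, NamedTuple, Optional


class Cell(NamedTuple):
    value: Any       # None marks a tombstone (deleted column)
    timestamp: int   # write timestamp; larger means more recent


def read_column(sstables, partition, column) -> Optional[Any]:
    """Collect every version of the column and keep the most recent one."""
    cells = [
        sstable[partition][column]
        for sstable in sstables
        if column in sstable.get(partition, {})
    ]
    if not cells:
        return None
    # Last write wins: the cell with the highest timestamp shadows the others.
    return max(cells, key=lambda cell: cell.timestamp).value


T1, T2, T3 = 1, 2, 3
sstable1 = {"jdoe": {"age": Cell(33, T1), "name": Cell("John DOE", T1)}}  # INSERT
sstable2 = {"jdoe": {"age": Cell(34, T2)}}                                # UPDATE
sstable3 = {"jdoe": {"age": Cell(None, T3)}}                              # DELETE

# The tombstone written at t3 shadows both older values of age.
print(read_column([sstable1, sstable2, sstable3], "jdoe", "age"))   # None
print(read_column([sstable1, sstable2, sstable3], "jdoe", "name"))  # John DOE
```

Reading before the DELETE (only the first two SSTables) would return 34, because t2 is more recent than t1.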
54. @doanduyhai
DML statements
INSERT INTO users(login, name, age) VALUES('jdoe', 'John DOE', 33);
UPDATE users SET age = 34 WHERE login = 'jdoe';
DELETE age FROM users WHERE login = 'jdoe';
SELECT age FROM users WHERE login = 'jdoe';
55. @doanduyhai
What about joins?
How can I join data between tables?
How can I model 1 – N relationships?
How do I model a mailbox?
User (1) → Emails (n)
58. @doanduyhai
Queries
Get message by user and message_id (date)
Get message by user and date interval
SELECT * FROM mailbox WHERE login = 'jdoe'
and message_id = '2014-11-21 16:00:00';
SELECT * FROM mailbox WHERE login = 'jdoe'
and message_id <= '2014-11-25 23:59:59'
and message_id >= '2014-11-20 00:00:00';
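Why these two queries are cheap can be sketched with a toy model. It assumes, as the queries suggest, that login is the partition key and message_id is a clustering column kept in sorted order inside each partition. The sample rows and the `messages_between` helper are invented for illustration.

```python
from bisect import bisect_left, bisect_right

# One partition per login; rows inside a partition are sorted by message_id.
mailbox = {
    "jdoe": [
        ("2014-11-19 09:30:00", "Welcome"),
        ("2014-11-21 16:00:00", "Meeting notes"),
        ("2014-11-24 08:15:00", "Invoice"),
        ("2014-11-27 12:00:00", "Reminder"),
    ],
}


def messages_between(login, start, end):
    """Jump to the partition, then slice a contiguous range of sorted rows."""
    rows = mailbox.get(login, [])
    keys = [message_id for message_id, _ in rows]
    lo = bisect_left(keys, start)    # first row with message_id >= start
    hi = bisect_right(keys, end)     # one past the last row with message_id <= end
    return rows[lo:hi]


print(messages_between("jdoe", "2014-11-20 00:00:00", "2014-11-25 23:59:59"))
```

The range lookup touches a single partition and reads neighbouring rows only, which is the access pattern the WHERE clause rules are designed to preserve.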
59. @doanduyhai
Queries
Get message by message_id only
Get message by date interval
SELECT * FROM mailbox WHERE message_id = '2014-11-21 16:00:00'; ???
SELECT * FROM mailbox WHERE message_id <= '2014-11-25 23:59:59'
and message_id >= '2014-11-20 00:00:00'; ???
60. @doanduyhai
Queries
Get message by message_id only (#partition not provided)
Get message by date interval (#partition not provided)
SELECT * FROM mailbox WHERE message_id = '2014-11-21 16:00:00';
SELECT * FROM mailbox WHERE message_id <= '2014-11-25 23:59:59'
and message_id >= '2014-11-20 00:00:00';
62. @doanduyhai
Queries
Get message by user range (range query on #partition)
Get message by user pattern (non exact match on #partition)
SELECT * FROM mailbox WHERE login >= 'hsue' and login <= 'jdoe';
SELECT * FROM mailbox WHERE login like '%doe%';
63. @doanduyhai
WHERE clause restrictions
All DML queries must provide #partition
Only exact match (=) on #partition; range queries (<, ≤, >, ≥) are not allowed
• ☞ they would require a full cluster scan
On clustering columns, only range queries (<, ≤, >, ≥) and exact match (=)
WHERE clause only possible
• on columns defined in PRIMARY KEY
• on indexed columns
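The reason behind the exact-match rule on #partition can be sketched with a toy token ring. The partition key is hashed to a token and the token decides which node owns the data, so an exact key leads to one node, while keys that are adjacent alphabetically land on unrelated tokens. Everything here is an assumption for illustration: the four node names, the evenly split 64-bit ring, and MD5 as the hash (chosen only because it ships with Python; it is not claimed to be the cluster's partitioner).

```python
import hashlib
from bisect import bisect_left

RING_SIZE = 2**64
NODES = ["node1", "node2", "node3", "node4"]
# Each node owns the tokens up to and including its boundary.
BOUNDARIES = [RING_SIZE // len(NODES) * (i + 1) - 1 for i in range(len(NODES))]


def token(partition_key: str) -> int:
    """Hash the partition key to a position on the ring."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def owner(partition_key: str) -> str:
    """Exact match on the key resolves to exactly one owning node."""
    return NODES[bisect_left(BOUNDARIES, token(partition_key))]


for login in ("hsue", "ivan", "jdoe"):
    print(login, token(login), owner(login))
```

Because hashing does not preserve the ordering of logins, a condition such as login >= 'hsue' and login <= 'jdoe' cannot be mapped to a slice of the ring and every node would have to be asked.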
65. @doanduyhai
WHERE clause restrictions
What if I want to run an "arbitrary" WHERE clause?
• search form scenario, dynamic search fields
NEW SASI secondary index (contribution from Apple engineers)
• ☞ https://github.com/apache/cassandra/blob/trunk/doc/SASI.md
• ☞ integrated Query Planner (for multi-criteria searches)
• ☞ Cassandra version ≥ 3.4
• ☞ still buggy (CASSANDRA-11383, CASSANDRA-11399, …)
SELECT * FROM users WHERE firstname LIKE '%John%';
SELECT * FROM users WHERE age >= 20 AND age <=30;
66. @doanduyhai
WHERE clause restrictions
What if I want to run an "arbitrary" WHERE clause?
• search form scenario, dynamic search fields
Or Datastax Enterprise Search (battle-proven)
• ☞ Apache Solr (Lucene) integration (Datastax Enterprise Search)
• ☞ Same JVM, 1-cluster-2-products (Solr & Cassandra)
SELECT * FROM users WHERE solr_query = 'age:[33 TO *] AND gender:male';
SELECT * FROM users WHERE solr_query = 'lastname:*schwei?er';
73. @doanduyhai
JSON syntax for INSERT/UPDATE/DELETE
CREATE TABLE users (
id text PRIMARY KEY,
age int,
state text );
INSERT INTO users JSON '{"id": "user123", "age": 42, "state": "TX"}';
INSERT INTO users(id, age, state) VALUES('me', fromJson('20'), 'CA');
UPDATE users SET age = fromJson('25') WHERE id = fromJson('"me"');
DELETE FROM users WHERE id = fromJson('"me"');
74. @doanduyhai
JSON syntax for SELECT
> SELECT JSON * FROM users WHERE id = 'me';
[json]
----------------------------------------
{"id": "me", "age": 25, "state": "CA"}
> SELECT JSON age,state FROM users WHERE id = 'me';
[json]
----------------------------------------
{"age": 25, "state": "CA"}
> SELECT age, toJson(state) FROM users WHERE id = 'me';
age | system.tojson(state)
-----+----------------------
25 | "CA"
75. @doanduyhai
Why Materialized Views?
Relieve the pain of manual denormalization
CREATE TABLE user(
id int PRIMARY KEY,
country text,
…);
CREATE TABLE user_by_country(
country text,
id int,
…,
PRIMARY KEY(country, id));
76. @doanduyhai
Materialized View in Action
CREATE MATERIALIZED VIEW user_by_country
AS SELECT country, id, firstname, lastname
FROM user
WHERE country IS NOT NULL AND id IS NOT NULL
PRIMARY KEY(country, id);
CREATE TABLE user_by_country (
country text,
id int,
firstname text,
lastname text,
PRIMARY KEY(country, id));
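The bookkeeping that manual denormalization would otherwise require can be sketched as follows. This is a toy in-memory model of keeping user and user_by_country in step, not a description of how Cassandra maintains views internally; the `upsert_user` and `users_in` helpers and the sample data are invented for illustration.

```python
user = {}             # base table: id -> row
user_by_country = {}  # view: (country, id) -> selected columns


def upsert_user(user_id, country, firstname, lastname):
    """Write to the base table and keep the view consistent with it."""
    old = user.get(user_id)
    if old is not None and old["country"] is not None:
        # The view key contains country, so a change of country
        # means the stale view row has to be removed first.
        user_by_country.pop((old["country"], user_id), None)
    user[user_id] = {"country": country, "firstname": firstname, "lastname": lastname}
    # Mirrors WHERE country IS NOT NULL: rows without a country are not in the view.
    if country is not None:
        user_by_country[(country, user_id)] = {"firstname": firstname, "lastname": lastname}


def users_in(country):
    """Query by country through the view instead of scanning the base table."""
    return sorted(uid for (c, uid) in user_by_country if c == country)


upsert_user(1, "FR", "John", "DOE")
upsert_user(2, "US", "Helen", "SUE")
upsert_user(1, "US", "John", "DOE")   # user 1 moves from FR to US
print(users_in("FR"), users_in("US"))
```

Forgetting the cleanup step in application code leaves a stale row under the old country, which is the kind of error the materialized view removes from the developer's hands.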
77. @doanduyhai
User Defined Functions (UDF)
CREATE [OR REPLACE] FUNCTION [IF NOT EXISTS]
maxOf (col1 int, col2 int)
CALLED ON NULL INPUT | RETURNS NULL ON NULL INPUT
RETURNS int
LANGUAGE java
AS $$
return Math.max(col1, col2);
$$;
SELECT maxOf(col1, col2) FROM table WHERE id = xxx;
78. @doanduyhai
User Defined Aggregates (UDA)
CREATE [OR REPLACE] AGGREGATE [IF NOT EXISTS]
sum(bigint)
SFUNC accumulatorFunction
STYPE bigint
[FINALFUNC finalFunction]
INITCOND 0;
CREATE FUNCTION accumulatorFunction(accu bigint, column bigint)
RETURNS NULL ON NULL INPUT RETURNS bigint LANGUAGE java
AS $$ return accu + column; $$;
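How the pieces of an aggregate fit together can be sketched as a fold: the state starts at INITCOND, SFUNC is applied once per row, and the optional FINALFUNC transforms the final state. The Python names below mirror the CQL keywords and are otherwise hypothetical; null handling is deliberately left out of this sketch.

```python
from functools import reduce


def accumulator_function(accu: int, column: int) -> int:
    """Plays the role of SFUNC: combine the running state with one row's value."""
    return accu + column


def aggregate(values, sfunc, initcond, finalfunc=None):
    """Fold the values with sfunc starting from initcond, then apply finalfunc if given."""
    state = reduce(sfunc, values, initcond)
    return finalfunc(state) if finalfunc is not None else state


rows = [10, 20, 12]
print(aggregate(rows, accumulator_function, 0))                                  # 42
print(aggregate(rows, accumulator_function, 0, lambda total: total / len(rows)))  # 14.0
```

The second call shows where a FINALFUNC would fit, here turning the sum into an average.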