As we move into the world of Big Data and the Internet of Things, the systems architectures and data models we've relied on for decades are becoming a hindrance. At the core of the problem is the read-modify-write cycle. In this session, Al will talk about how to build systems that don't rely on RMW, with a focus on Cassandra. Finally, for those times when RMW is unavoidable, he will cover how and when to use Cassandra's lightweight transactions and collections.
10. 3-tier + caching
[Diagram: cache, master, slave, slave]
more complex
cache coherency is a hard problem
cascading failures are common
Next: Out: the funnel. In: The ring.
11. Webscale
outer ring: clients (cell phones, etc.)
middle ring: application servers
inside ring: Cassandra servers
Serving millions of clients with mere hundreds or thousands of nodes requires a different approach to applications!
14. Theory & Practice
"In theory there is no difference between theory and practice. In practice there is."
-Yogi Berra
I could talk about the CAP theorem, but there’s plenty of that going around. And then there’s this quote.
16. Safety
But you don’t have to throw it out entirely.
- roll cages
- kill switches
- rev limiters
- protective clothing
17. Safety
max speed is 320 km/h (200 mph)
Tested to 443 km/h (275 mph) & 581 km/h (361 mph) (world record)
Safety is ultimately a property of the system.
But it can be expensive.
- lots of maintenance
- inspections
- reputation
18. Read-Modify-Write
UPDATE Employees
SET Rank=4, Promoted='2014-01-24'
WHERE EmployeeID=1337;

Before:
EmployeeID  1337
Name        アルトビー
StartDate   2013-10-01
Rank        3
Promoted    null

This might be what it looks like from SQL / CQL, but …

After:
EmployeeID  1337
Name        アルトビー
StartDate   2013-10-01
Rank        4
Promoted    2014-01-24
19. Read-Modify-Write
UPDATE Employees
SET Rank=4, Promoted='2014-01-24'
WHERE EmployeeID=1337;

After:
EmployeeID  1337
Name        アルトビー
StartDate   2013-10-01
Rank        4
Promoted    2014-01-24

Before:
EmployeeID  1337
Name        アルトビー
StartDate   2013-10-01
Rank        3
Promoted    null

[Diagram: RDBMS]
TNSTAAFL
"There is no such thing as a free lunch" (無償の昼食なんてものはありません)
TNSTAAFL …
If you’re lucky, the cell is in cache.
Otherwise, it’s a disk access to read, another to write.
20. Eventual Consistency
UPDATE Employees
SET Rank=4, Promoted='2014-01-24'
WHERE EmployeeID=1337;

Before:
EmployeeID  1337
Name        アルトビー
StartDate   2013-10-01
Rank        3
Promoted    null

After:
EmployeeID  1337
Name        アルトビー
StartDate   2013-10-01
Rank        4
Promoted    2014-01-24

[Diagram: Coordinator]
Explain distributed RMW
More complicated.
Will talk about how it’s abstracted in CQL later.
21. Eventual Consistency
UPDATE Employees
SET Rank=4, Promoted='2014-01-24'
WHERE EmployeeID=1337;

Before:
EmployeeID  1337
Name        アルトビー
StartDate   2013-10-01
Rank        3
Promoted    null

After:
EmployeeID  1337
Name        アルトビー
StartDate   2013-10-01
Rank        4
Promoted    2014-01-24

[Diagram: Coordinator, read, write]
Memory replication on write, depending on the replication factor (RF), usually RF=3.
Reads AND writes remain available through network partitions.
Hinted handoff.
22. Overwriting
CREATE TABLE host_lookup (
    name varchar,
    id   uuid,
    PRIMARY KEY(name)
);

INSERT INTO host_lookup (name,id) VALUES
    ('www.tobert.org', 463b03ec-fcc1-4428-bac8-80ccee1c2f77);

INSERT INTO host_lookup (name,id) VALUES
    ('tobert.org', 463b03ec-fcc1-4428-bac8-80ccee1c2f77);

-- Same key as the first INSERT: overwrites that row, no read required
INSERT INTO host_lookup (name,id) VALUES
    ('www.tobert.org', 463b03ec-fcc1-4428-bac8-80ccee1c2f77);

SELECT id FROM host_lookup WHERE name='tobert.org';
Beware of expensive compaction
Best for: small indexes, lookup tables
Compaction handles RMW at storage level in the background.
Under heavy writes, clock synchronization is very important to avoid timestamp collisions. In practice, this isn’t a problem very often, and even when it goes wrong, not much harm is done.
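A rough sketch of why overwriting needs no read, and why timestamps matter: every write carries a timestamp, and whichever cell has the highest timestamp survives when cells are reconciled. The Cell struct and the reconcile and compact helpers are hypothetical names for illustration, not Cassandra's code, and the tie-break on equal timestamps (greater value wins) is an assumption of this sketch.

```ruby
# Hypothetical model of one stored cell: a value plus the timestamp
# supplied by the writer's clock (microseconds since the epoch).
Cell = Struct.new(:value, :timestamp)

# Pick the surviving cell when two writes target the same column.
# The higher timestamp wins. On an exact tie the greater value wins, so
# every replica makes the same choice (an assumption for this sketch).
def reconcile(a, b)
  return a if a.timestamp > b.timestamp
  return b if b.timestamp > a.timestamp
  a.value >= b.value ? a : b
end

# Fold a list of writes, in any arrival order, down to the survivor.
def compact(cells)
  cells.reduce { |winner, cell| reconcile(winner, cell) }
end

writes = [
  Cell.new('first',  1_000),
  Cell.new('third',  3_000),
  Cell.new('second', 2_000) # arrives late, but is older than 'third'
]

survivor = compact(writes)
reordered = compact(writes.reverse)

puts survivor.value
puts reordered.value
```

Arrival order does not change the result, which is what lets the merge happen later in the background. A collision only matters when two different values share one timestamp, and then one of them is silently dropped.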
23. Key/Value
CREATE TABLE keyval (
    key   varchar,
    value blob,
    PRIMARY KEY(key)
);

INSERT INTO keyval (key,value) VALUES (?, ?);

SELECT value FROM keyval WHERE key=?;
e.g. memcached
Don’t do this.
But it works when you really need it.
24. Journaling / Logging / Time-series
CREATE TABLE tsdb (
    time_bucket timestamp,
    time        timestamp,
    value       blob,
    PRIMARY KEY(time_bucket, time)
);

INSERT INTO tsdb (time_bucket, time, value) VALUES (
    '2014-10-24',                  -- 1-day bucket (UTC)
    '2014-10-24T12:12:12Z',        -- ALWAYS USE UTC
    textAsBlob('{"foo": "bar"}')   -- blob columns need an explicit conversion from text
);
Oversimplified; prefer normalized columns over blobs whenever possible.
ALWAYS USE UTC :)
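A minimal sketch of how a client might derive the 1-day time_bucket for the INSERT above. The day_bucket helper is a hypothetical name. The point is that the bucket is computed from the UTC form of the timestamp, so an event recorded with a local offset still lands in the right partition.

```ruby
require 'time'

# Truncate a timestamp to the start of its UTC day. The result is the
# time_bucket (partition key) for a 1-day bucket.
def day_bucket(time)
  utc = time.getutc
  Time.utc(utc.year, utc.month, utc.day)
end

# Recorded at 20:12:12 with a -07:00 offset, which is 03:12:12 UTC
# on the following day.
event = Time.iso8601('2014-10-24T20:12:12-07:00')
bucket = day_bucket(event)

puts bucket.iso8601        # value for time_bucket
puts event.getutc.iso8601  # value for time
```

Bucketing on local time instead would split one UTC day across two partitions for any writer not in UTC.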
25. Journaling / Logging / Time-series
2014-01-24 | 2014-01-24T12:12:12Z | 2014-01-24T21:21:21Z
           | {"key": "value"}     | {"key": "value"}
2014-01-25 | 2014-01-25T13:13:13Z
           | {"key": "value"}

{ "2014-01-24" => {
    "2014-01-24T12:12:12Z" => '{"foo": "bar"}'
  }
}
26. Content Addressable Storage
CREATE TABLE objects (
    cid     varchar,
    content blob,
    PRIMARY KEY(cid)
);

INSERT INTO objects (cid,content) VALUES (?, ?);

SELECT content FROM objects WHERE cid=?;
The address of the data can be created by using the data itself.
e.g. SHA1 (160 bits), MD5, Whirlpool, etc.
27. Content Addressable Storage
require 'cql'
require 'digest/sha1'

# Connect to a local node and select the 'cas' keyspace
dbh = Cql::Client.connect(hosts: ['127.0.0.1'])
dbh.use('cas')

data = { :timestamp => 1390436043, :value => 1234 }

# Content ID: hex-encoded SHA1 of the serialized data
cid = Digest::SHA1.hexdigest(data.to_s)

sth = dbh.prepare('SELECT content FROM objects WHERE cid=?')

# Fetch the stored content by its content ID
sth.execute(cid).first['content']
Oversimplified! e.g. data.to_s is a BAD idea
ALWAYS USE UTC :)
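One likely reason data.to_s is a bad idea: its output depends on key insertion order and on how the Ruby version prints a hash, so equal content can hash to different addresses. A minimal sketch of a more stable alternative for flat hashes, assuming JSON with sorted keys is an acceptable canonical form. The canonical and content_id helpers are hypothetical names.

```ruby
require 'digest/sha1'
require 'json'

# Serialize a flat hash with its keys sorted, so equal content always
# produces the same bytes regardless of insertion order.
def canonical(data)
  sorted = data.map { |key, value| [key.to_s, value] }.sort_by(&:first)
  JSON.generate(sorted.to_h)
end

# The content address is the hex SHA1 of the canonical bytes.
def content_id(data)
  Digest::SHA1.hexdigest(canonical(data))
end

a = { timestamp: 1390436043, value: 1234 }
b = { value: 1234, timestamp: 1390436043 } # same content, different order

puts canonical(a)
puts content_id(a) == content_id(b)
```

Nested structures would need the same treatment applied recursively, which this sketch does not attempt.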
28. In Practice
• In practice, RMW is sometimes unavoidable
• Recent versions of Cassandra support RMW through collections and lightweight transactions
• Use them only when necessary
• Or when the performance hit is mitigated elsewhere or irrelevant
29. Cassandra Collections
CREATE TABLE posts (
    id      uuid,
    body    varchar,
    created timestamp,
    authors set<varchar>,
    tags    set<varchar>,
    PRIMARY KEY(id)
);

INSERT INTO posts (id,body,created,authors,tags) VALUES (
    ea4aba7d-9344-4d08-8ca5-873aa1214068,
    'アルトビーの犬はばかね',         -- "Al Tobey's dog is silly"
    toTimestamp(now()),             -- current time (Cassandra 2.2+; dateOf(now()) on 2.0/2.1)
    {'アルトビー', 'ィオートビー'},     -- set literals use braces, not brackets
    {'dog', 'silly', '犬', 'ばか'}
);
quick story about 犬ばかね ("silly dog")
sets & maps are CRDTs, safe to modify
30. Cassandra Collections
CREATE TABLE metrics (
    bucket timestamp,
    time   timestamp,
    value  blob,
    labels map<varchar,varchar>,
    PRIMARY KEY(bucket)
);
sets & maps are CRDTs, safe to modify
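A rough model of why set updates are safe to merge without a read. It assumes each element is tracked on its own with a timestamp and a live/removed flag, and that replicas combine state element by element. The add, remove, merge and live_elements helpers are hypothetical, the 'pets' tag is made up for the example, and letting a removal win a timestamp tie is an assumption of this sketch rather than a statement about Cassandra's internals.

```ruby
require 'set'

# Replica state for one set column: element => [timestamp, live?].
# live? == false marks a removal.

# Merge two states element by element, keeping the newer entry.
# On a timestamp tie the removal wins (an assumption for this sketch).
def merge(left, right)
  left.merge(right) do |_element, a, b|
    if a[0] != b[0]
      a[0] > b[0] ? a : b
    else
      a[1] ? b : a
    end
  end
end

def add(state, element, timestamp)
  merge(state, { element => [timestamp, true] })
end

def remove(state, element, timestamp)
  merge(state, { element => [timestamp, false] })
end

# The elements a reader would see.
def live_elements(state)
  Set.new(state.select { |_element, (_timestamp, live)| live }.keys)
end

# Two replicas receive different updates to the same tags column.
replica_a = add(add({}, 'dog', 1), 'silly', 2)
replica_b = add(remove(add({}, 'dog', 1), 'dog', 3), 'pets', 4)

merged_ab = merge(replica_a, replica_b)
merged_ba = merge(replica_b, replica_a)

puts live_elements(merged_ab).to_a.sort.inspect
puts merged_ab == merged_ba
```

Because the merge is commutative and idempotent, replicas converge no matter which order the updates arrive in.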
31. Lightweight Transactions
• Cassandra 2.0 and later support lightweight transactions (LWT) based on Paxos
• Paxos is a distributed consensus protocol
• Given a constraint, Cassandra ensures correct ordering
32. Lightweight Transactions
UPDATE users
SET username='tobert'
WHERE id=68021e8a-9eb0-436c-8cdd-aac629788383
IF username='renice';

INSERT INTO users (id, username)
VALUES (68021e8a-9eb0-436c-8cdd-aac629788383, 'renice')
IF NOT EXISTS;
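On conflict the write is not applied: the result row comes back with [applied] = false, and the client has to treat that as an error.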
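A small in-memory sketch of what the IF clauses buy you, from the client's point of view. UserTable is a hypothetical stand-in that models only the outcome of a conditional write (applied or not, plus the current value). It does not model Paxos or talk to Cassandra, and a local mutex stands in for the ordering the cluster would provide.

```ruby
# Hypothetical in-memory stand-in for a table with conditional writes.
class UserTable
  def initialize
    @rows = {}
    @lock = Mutex.new
  end

  # Models: INSERT ... IF NOT EXISTS
  def insert_if_not_exists(id, username)
    @lock.synchronize do
      return { applied: false, username: @rows[id] } if @rows.key?(id)
      @rows[id] = username
      { applied: true }
    end
  end

  # Models: UPDATE ... SET username = new_name ... IF username = expected
  def update_if(id, expected, new_name)
    @lock.synchronize do
      return { applied: false, username: @rows[id] } unless @rows[id] == expected
      @rows[id] = new_name
      { applied: true }
    end
  end
end

users = UserTable.new
id = '68021e8a-9eb0-436c-8cdd-aac629788383'

first   = users.insert_if_not_exists(id, 'renice')
second  = users.insert_if_not_exists(id, 'tobert') # loses: row already exists
renamed = users.update_if(id, 'renice', 'tobert')
stale   = users.update_if(id, 'renice', 'other')   # loses: username has changed

puts [first, second, renamed, stale].map { |r| r[:applied] }.inspect
```

The losing caller gets the current value back and decides whether to retry, which is the read-modify-write loop made explicit.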
33. Conclusion
• Businesses are scaling further and faster than ever
• Assume you have to provide utility-grade service
• Data models and application architectures need to change to keep up
• Avoiding Read/Modify/Write makes high performance easier
• Cassandra provides tools for safe RMW when you need it
• Questions?