18. [diagram: ROW KEY, COLUMN KEYs and VALUEs; labels: SORTED, + TIMESTAMP]
the Bigtable model, “column oriented”,
“sparse tables” found in Cassandra and HBase
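A minimal sketch of this model, assuming a plain in-memory structure: row keys are kept sorted, each row stores only the columns it actually has (sparse), and every cell keeps timestamped versions. The names `SparseTable`, `put`, `get` and `scan` are hypothetical and are not the Cassandra or HBase API; the counter is a stand-in for a real timestamp.

```python
import bisect
import itertools

class SparseTable:
    """Toy table: sorted row keys -> column keys -> timestamped values."""

    def __init__(self):
        self._row_keys = []               # kept sorted so range scans are cheap
        self._rows = {}                   # row key -> {column key -> [(timestamp, value)]}
        self._clock = itertools.count(1)  # stand-in for a real timestamp

    def put(self, row_key, column_key, value, timestamp=None):
        if timestamp is None:
            timestamp = next(self._clock)
        if row_key not in self._rows:
            bisect.insort(self._row_keys, row_key)
            self._rows[row_key] = {}
        versions = self._rows[row_key].setdefault(column_key, [])
        versions.append((timestamp, value))
        versions.sort(key=lambda version: version[0])  # order by timestamp only

    def get(self, row_key, column_key):
        """Return the newest value of a cell, or None if the cell is absent."""
        versions = self._rows.get(row_key, {}).get(column_key)
        return versions[-1][1] if versions else None

    def scan(self, start, stop):
        """Yield (row key, newest columns) for start <= row key < stop."""
        lo = bisect.bisect_left(self._row_keys, start)
        hi = bisect.bisect_left(self._row_keys, stop)
        for key in self._row_keys[lo:hi]:
            yield key, {c: v[-1][1] for c, v in self._rows[key].items()}

table = SparseTable()
table.put("user:alice", "email", "a@example.com")
table.put("user:alice", "email", "alice@example.com")  # newer version wins
table.put("user:bob", "city", "Oslo")                  # different columns per row
table.put("video:1", "title", "cats")
users = list(table.scan("user:", "user;"))             # all keys starting with "user:"
```

Because row keys are sorted, the scan over the `user:` prefix touches only the two user rows.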
19. [diagram: KEY → LIST OR SET of VALUEs; KEY → SORTED SET OR HASH of KEY/VALUE pairs; KEY → VALUE with INCREMENT, APPEND, SLICE, CAS]
“datastructure server”, e.g. Redis
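A rough sketch of what the operations on the slide mean, assuming a single-threaded in-memory dictionary. `TinyStructureServer` and its method names are hypothetical and are not Redis commands; a real server would make each operation atomic across clients.

```python
class TinyStructureServer:
    """Toy key/value store whose values are counters, lists or plain values."""

    def __init__(self):
        self._data = {}

    def increment(self, key, amount=1):
        self._data[key] = self._data.get(key, 0) + amount
        return self._data[key]

    def append(self, key, value):
        self._data.setdefault(key, []).append(value)
        return len(self._data[key])

    def slice(self, key, start, stop):
        return self._data.get(key, [])[start:stop]

    def compare_and_set(self, key, expected, new):
        """CAS: write `new` only if the current value equals `expected`."""
        if self._data.get(key) != expected:
            return False
        self._data[key] = new
        return True

server = TinyStructureServer()
server.increment("hits")
hits = server.increment("hits")
for entry in ["a", "b", "c"]:
    server.append("log", entry)
first_two = server.slice("log", 0, 2)
won = server.compare_and_set("owner", None, "worker-1")   # key is unset, so this succeeds
lost = server.compare_and_set("owner", None, "worker-2")  # value changed, so this fails
```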
37. DIVIDED BY DATA SIZE
[diagram: KEYSPACE from A to Z divided into three SHARDs, each with three REPLICAs]
divide the keyspace into shards, or regions
(and store each one redundantly)
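A small sketch of range-based shard lookup, assuming three shards with made-up boundaries and a made-up replica placement (`SHARD_LOWER_BOUNDS`, `SHARD_REPLICAS` and the node names are all hypothetical).

```python
import bisect

# Each shard owns the keys from its lower bound up to the next shard's bound.
SHARD_LOWER_BOUNDS = ["a", "i", "r"]
SHARD_REPLICAS = [
    ["node1", "node2", "node3"],
    ["node2", "node3", "node4"],
    ["node3", "node4", "node1"],
]

def shard_for(key):
    """Index of the shard whose range contains `key`."""
    index = bisect.bisect_right(SHARD_LOWER_BOUNDS, key) - 1
    return max(index, 0)  # keys below the first bound go to the first shard

def replicas_for(key):
    """Every node that stores a copy of `key`."""
    return SHARD_REPLICAS[shard_for(key)]
```

With these bounds, `shard_for("kiwi")` is 1 and `replicas_for("zebra")` is the replica list of the last shard.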
38. [diagram: KEYSPACE from A to Z divided into four SHARDs, each with three REPLICAs; one shard is marked SPLIT]
split a shard when it grows too big,
move one of the new shards onto a new node
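One way the split could look, as a sketch: shards are modelled as sorted lists of keys, the threshold `MAX_KEYS_PER_SHARD` is made up, and the split point is simply the median key. The function name and the `placement` list are hypothetical.

```python
MAX_KEYS_PER_SHARD = 4  # hypothetical threshold

def split_shard(shards, placement, index, new_node):
    """Split shards[index] at its median key if it is too big.

    shards:    list of sorted key lists covering consecutive key ranges
    placement: placement[i] is the node holding shards[i]
    The upper half becomes a new shard placed on `new_node`.
    Returns True if a split happened.
    """
    keys = shards[index]
    if len(keys) <= MAX_KEYS_PER_SHARD:
        return False
    middle = len(keys) // 2
    shards[index:index + 1] = [keys[:middle], keys[middle:]]
    placement.insert(index + 1, new_node)
    return True

shards = [["a", "b", "c", "d", "e", "f"], ["m", "n"]]
placement = ["node1", "node2"]
did_split = split_shard(shards, placement, 0, "node3")
```

After the call the first shard has become two shards of three keys each, and the upper one lives on `node3`.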
39. [diagram: KEYSPACE from A to Z divided into four SHARDs, each with three REPLICAs]
in reality there are chunks, tablets or “virtual shards”
that are distributed over physical shards
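A sketch of why small chunks help, assuming twelve chunks spread over three nodes: adding a fourth node only requires moving a few whole chunks. The `add_node` function and its "take from the busiest node" policy are illustrative assumptions, not how any particular database rebalances.

```python
from collections import Counter

def add_node(assignment, new_node):
    """Move chunks from the busiest nodes to `new_node` until it has a fair share.

    assignment: dict mapping chunk id -> node name (modified in place)
    Returns the list of chunk ids that were moved.
    """
    moved = []
    node_count = len(set(assignment.values()) | {new_node})
    fair_share = len(assignment) // node_count
    load = Counter(assignment.values())
    while load[new_node] < fair_share:
        busiest = max(load, key=load.get)
        chunk = next(c for c, n in sorted(assignment.items()) if n == busiest)
        assignment[chunk] = new_node
        load[busiest] -= 1
        load[new_node] += 1
        moved.append(chunk)
    return moved

nodes = ["node1", "node2", "node3"]
assignment = {chunk: nodes[chunk % 3] for chunk in range(12)}
moved = add_node(assignment, "node4")
loads = dict(Counter(assignment.values()))
```

Only three of the twelve chunks move, and every node ends up with three chunks.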
40. HBASE, MONGODB
sharding is easy in theory, hard in practice,
lots of data needs to be moved when adding nodes
41. CONSISTENT HASHING
scaling writes in an available system
42. [diagram: KEYSPACE drawn as a ring from 0 to 2^n with four NODEs; arrows labeled hash(key) and replication]
each node is responsible for a range of the keyspace,
keys are hashed and mapped to the first following node,
(optionally) replicated to subsequent nodes
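A minimal sketch of the lookup described above, assuming a ring of 2**32 positions and SHA-256 as the hash. The `HashRing` class, `ring_position` and the node names are hypothetical choices made for illustration.

```python
import bisect
import hashlib

RING_BITS = 32  # assumption: positions run from 0 to 2**32 - 1

def ring_position(name):
    """Hash a key or node name onto the ring."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** RING_BITS)

class HashRing:
    def __init__(self, nodes):
        self._ring = sorted((ring_position(node), node) for node in nodes)
        self._positions = [position for position, _ in self._ring]

    def nodes_for(self, key, replicas=1):
        """First node at or after hash(key), then the nodes that follow it."""
        start = bisect.bisect_left(self._positions, ring_position(key))
        count = min(replicas, len(self._ring))
        return [
            self._ring[(start + step) % len(self._ring)][1]  # wrap around the ring
            for step in range(count)
        ]

ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
owners = ring.nodes_for("user:42", replicas=3)  # primary plus two replicas
```

The modulo in `nodes_for` is what makes the keyspace a ring: a key that hashes past the last node wraps around to the first one.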
43. [diagram: KEYSPACE ring from 0 to 2^n with five NODEs and one NEW NODE]
when a new node is added, only part of
the keyspace needs to be moved
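This can be checked with a small experiment, sketched here under assumptions: a 2**32 ring, SHA-256 as the hash, and made-up node and key names. Every key that changes owner after the new node joins must now belong to the new node; all other keys stay where they were.

```python
import bisect
import hashlib

def position(name):
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** 32)

def build_ring(nodes):
    return sorted((position(node), node) for node in nodes)

def owner(ring, key):
    """The first node at or after the key's position, wrapping around."""
    index = bisect.bisect_left(ring, (position(key), ""))
    return ring[index % len(ring)][1]

old_nodes = ["node-a", "node-b", "node-c", "node-d"]
old_ring = build_ring(old_nodes)
new_ring = build_ring(old_nodes + ["node-e"])

keys = [f"key-{i}" for i in range(2000)]
moved = [k for k in keys if owner(old_ring, k) != owner(new_ring, k)]
moved_fraction = len(moved) / len(keys)
```

How large `moved_fraction` is depends on where the new node happens to land on the ring, which is one reason the next slide introduces virtual nodes.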
44. [diagram: KEYSPACE ring from 0 to 2^n with five NODEs]
in practice, “virtual nodes” are evenly distributed over
the keyspace, and then mapped onto physical nodes
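A sketch of that idea, assuming the ring is cut into 64 equally sized virtual nodes that are handed out to physical nodes round-robin. The constants, the SHA-256 hash and the function names are illustrative assumptions.

```python
import hashlib
from collections import Counter

RING_SIZE = 2 ** 32
VIRTUAL_NODES = 64  # assumption: far more virtual nodes than physical nodes

def position(key):
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE

def build_table(physical_nodes):
    """table[v] is the physical node that hosts virtual node v."""
    return [physical_nodes[v % len(physical_nodes)] for v in range(VIRTUAL_NODES)]

def virtual_node_for(key):
    """Each virtual node covers an equal slice of the ring."""
    return position(key) * VIRTUAL_NODES // RING_SIZE

def node_for(table, key):
    return table[virtual_node_for(key)]

physical = ["node-a", "node-b", "node-c", "node-d"]
table = build_table(physical)
slices_per_node = dict(Counter(table))
```

Each physical node hosts 16 equally sized slices of the ring, so the share of the keyspace per node is even by construction rather than by luck.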
45. CASSANDRA, RIAK
perfect balance, in theory,
but rings may still need rebalancing
46. GOSSIP, HINTED HANDOFF, LOG STRUCTURED STORAGE, COMPACTION, VECTOR CLOCKS, READ REPAIR, JOURNALING, QUORUMS, EVENTUAL CONSISTENCY, DYNAMO, MAP/REDUCE, 2PC
a few of the things I haven’t mentioned, look them up
50. THINK ABOUT YOUR QUERIES FIRST
don’t optimize for insertion, denormalize heavily,
disk is cheap, this ain’t 1970
51. GIVE A LOT OF THOUGHT TO YOUR PRIMARY KEYS
range queries over cleverly designed
primary keys can be very powerful,
good keys are required for efficient sharding
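A sketch of one such key design, assuming a store that keeps rows sorted by primary key: putting the user id first and a zero-padded timestamp second turns "all events for one user in a time window" into a single range query. The key layout, `make_key` and `range_query` are hypothetical, and the scheme assumes user ids never contain `:`.

```python
import bisect

def make_key(user_id, timestamp):
    # Zero-padding makes string order match numeric order.
    return f"{user_id}:{timestamp:010d}"

# Rows sorted by primary key, as a sorted key/value store would keep them.
rows = sorted([
    (make_key("alice", 1000), "login"),
    (make_key("alice", 2000), "upload"),
    (make_key("alice", 3000), "logout"),
    (make_key("bob", 1500), "login"),
])

def range_query(rows, start_key, stop_key):
    """All rows with start_key <= key < stop_key."""
    keys = [key for key, _ in rows]
    lo = bisect.bisect_left(keys, start_key)
    hi = bisect.bisect_left(keys, stop_key)
    return rows[lo:hi]

recent = range_query(rows, make_key("alice", 1500), make_key("alice", 3001))
```

Here `recent` holds alice's `upload` and `logout` events and nothing from bob, without scanning the whole table.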