5. Cassandra in EC2 at Talkbits
NetworkTopologyStrategy + Ec2MultiRegionSnitch
1 DC, 3 racks (availability zones in one EC2 region), N nodes per rack.
3N nodes total.
Data is stored in 3 local replicas, 1 per zone.
Writes use LOCAL_QUORUM; reads use ONE or TWO.
m1.large nodes (2 cores, 4 ECU, 7.5 GB RAM).
Commit log and data files are both on one RAID0 array of ephemeral
drives (2 drives in the array). This shared layout works for SSD or
EC2 disks only!
Other typical setup options for EC2:
m1.xlarge (15 GB) / m2.4xlarge (68.4 GB) / hi1.4xlarge (SSD) nodes
EBS-backed data volumes (not recommended; use for
development only).
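As a rough illustration of why this layout survives a zone outage, the sketch below counts surviving replicas, assuming exactly one replica per availability zone. The function names are hypothetical; this is only the arithmetic behind the setup, not Cassandra code.

```python
def replicas_alive(replication_factor, zones_down):
    """Replicas still reachable, assuming exactly one replica per zone."""
    return max(replication_factor - zones_down, 0)


def can_satisfy(required_replicas, replication_factor, zones_down):
    """True if enough replicas survive to meet the consistency level."""
    return replicas_alive(replication_factor, zones_down) >= required_replicas


RF = 3            # 3 local copies, 1 per zone
LOCAL_QUORUM = 2  # quorum of 3 replicas

print(can_satisfy(LOCAL_QUORUM, RF, zones_down=1))  # one zone lost
print(can_satisfy(LOCAL_QUORUM, RF, zones_down=2))  # two zones lost
print(can_satisfy(1, RF, zones_down=2))             # read from 1 replica
```

Under these assumptions, LOCAL_QUORUM writes keep working with one zone down, fail with two zones down, and single-replica reads still succeed as long as one zone is up.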
6. Cassandra consistency options
Definitions
N, R, W settings from Amazon Dynamo.
N – replication factor. Set per keyspace on keyspace creation.
R, W – number of replicas that must respond to a read / acknowledge a write.
Quorum: floor(N / 2) + 1
Read/write consistency levels:
ANY (writes only), ONE, TWO, THREE, QUORUM, LOCAL_QUORUM &
EACH_QUORUM (multi-DC), ALL.
Set per query.
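The quorum formula is easy to check by hand. A minimal sketch follows; the helper name `quorum` is an illustration, not a Cassandra API.

```python
def quorum(replication_factor):
    """Quorum size: half the replicas (rounded down) plus one."""
    return replication_factor // 2 + 1


for n in (1, 2, 3, 5, 6):
    print(f"N={n} -> quorum={quorum(n)}")
```

For the replication factor of 3 used in the setup above, a quorum is 2 replicas.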
7. Cassandra consistency semantics
W + R > N
Ensures strong consistency: a read always reflects the most recent
write.
R = W = [LOCAL_]QUORUM
Strong consistency. See quorum definition and formula above.
W + R <= N
Eventual consistency.
W = 1
Good for fire-and-forget writes: logs, traces, metrics, page views, etc.
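The rules on this slide reduce to one comparison. The sketch below applies it to a few R/W combinations; the function names are hypothetical and the values assume N = 3.

```python
def quorum(n):
    """Quorum size: half the replicas (rounded down) plus one."""
    return n // 2 + 1


def is_strongly_consistent(n, r, w):
    """Read and write replica sets must overlap: W + R > N."""
    return w + r > n


N = 3
Q = quorum(N)  # 2 replicas

print(is_strongly_consistent(N, r=Q, w=Q))  # R = W = QUORUM
print(is_strongly_consistent(N, r=1, w=Q))  # QUORUM write, ONE read
print(is_strongly_consistent(N, r=1, w=1))  # fire-and-forget write
print(is_strongly_consistent(N, r=1, w=N))  # ALL write, ONE read
```

With N = 3, a quorum write paired with a read from a single replica gives W + R = 3, which is not greater than N, so that combination is only eventually consistent.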
8. Cassandra backups to S3
Full backups
•Periodic snapshots (daily, weekly)
•Remove snapshots from local disk after upload to S3 to prevent
disk overflow
Incremental backups
•SSTables are compressed and copied to S3
•Triggered by inotify IN_MOVED_TO and IN_CLOSE_WRITE events
•Don’t turn on with leveled compaction (huge network traffic
to S3)
Continuous backups
•Compress and copy the commit log to S3 at short
intervals (for example, every 5, 30, or 60 minutes)
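The compress-then-copy step shared by these backup modes can be sketched with the standard library alone. This is a minimal sketch, not the code of any tool named here: `backup_file` and `compress_for_upload` are hypothetical names, and `upload` is an assumed callable (for example, a wrapper around an S3 client) supplied by the caller.

```python
import gzip
import os
import shutil


def compress_for_upload(src_path, staging_dir):
    """Gzip src_path into staging_dir and return the archive path."""
    os.makedirs(staging_dir, exist_ok=True)
    dst_path = os.path.join(staging_dir, os.path.basename(src_path) + ".gz")
    with open(src_path, "rb") as src, gzip.open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return dst_path


def backup_file(src_path, staging_dir, upload):
    """Compress a file, pass the archive to `upload`, then delete the archive."""
    archive = compress_for_upload(src_path, staging_dir)
    try:
        upload(archive)     # hypothetical: caller-supplied copy to S3
    finally:
        os.remove(archive)  # keep the local disk from filling up
    return os.path.basename(archive)
```

A file watcher or a timer would call `backup_file` for each new SSTable or commit log segment; the original file is left in place, only the temporary archive is removed.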
9. Cassandra backups to S3 - tools
TableSnap from SimpleGeo
https://github.com/Instagram/tablesnap (most up-to-date fork)
The whole tool is 3 simple Python scripts (tablesnap, tableslurp,
tablechop). They upload SSTables in real time, restore them, and remove
old backup uploads from S3.
Priam from Netflix
https://github.com/Netflix/Priam
Full-blown web application. Requires a servlet container to run and
depends on the Amazon SimpleDB service for distributed token
management.