2. Instance Sizes
● m1.xlarge is by far the most common size
● m1.large is ok for many use cases
● m2.4xlarge in some cases
● keep the entire dataset in memory
● c1.xlarge / cc1.4xlarge
● Smallish but very hot set of data
– regardless of how much data is on disk
● Extremely high request rate
● Encrypted node-node communications and high traffic
● Usually better off with many m1.xlarge instances because of
the extra memory, but not always
@mdennis
3. Configuration
● Stripe All Ephemeral Drives
● data directory and commit log on same volume
● Only applies to EC2 and SSDs, not physical HW
● Why?
● 6-8 GB heap on m1.xlarge
● 3-4 GB heap on m1.large
● Phi Convict Threshold? Maybe ...
4. EBS versus Ephemeral
● Ephemeral drives are:
● Generally faster for C*
● More stable (no pauses/freezes; outages?)
● Cheaper
● Easier to initially configure
● Striped EBS?
● yeah, about that …
● TL;DL don't use EBS for C* on EC2
5. Multi-Zone
● Alternate zones in your token topology
● No really, this is important, alternate zones
– We should probably fix this ...
● “complicated, but possible” to add new zones
after initial deployment
● Never move a *token* to a different region or
zone
● If you think that is what you want to do, really you
want to bootstrap a new node at token-1 in the new
region/zone and then decommission the old one
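The token advice on this slide can be sketched in a few lines of Python. This is a minimal illustration, assuming RandomPartitioner (token space 0 to 2**127 - 1) and one token per node; the zone names and the helper names `balanced_tokens`, `alternate_zones`, and `replacement_token` are hypothetical, not part of Cassandra.

```python
# Sketch only: evenly spaced tokens with zones alternated around the ring.
# Assumes RandomPartitioner; all function names here are hypothetical helpers.
RING_SIZE = 2 ** 127  # RandomPartitioner tokens range over 0 .. 2**127 - 1


def balanced_tokens(node_count):
    """Evenly spaced initial tokens for node_count nodes."""
    return [i * RING_SIZE // node_count for i in range(node_count)]


def alternate_zones(tokens, zones):
    """Assign zones round-robin in token order, so ring neighbours differ."""
    ordered = sorted(tokens)
    return [(token, zones[i % len(zones)]) for i, token in enumerate(ordered)]


def replacement_token(old_token):
    """Token for a node bootstrapped just before the one it replaces."""
    return (old_token - 1) % RING_SIZE  # wraps around if old_token is 0


if __name__ == "__main__":
    zones = ["us-east-1a", "us-east-1b", "us-east-1c"]  # example zone names
    for token, zone in alternate_zones(balanced_tokens(6), zones):
        print(zone, token)
```

Round-robin assignment only guarantees that every pair of ring neighbours sits in different zones when the node count is a multiple of the zone count; otherwise the last and first nodes on the ring can land in the same zone.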
6. Multi-Region C* on EC2
● Connectivity is the complicated part
● Ec2MultiRegionSnitch is not the entire answer
– https://issues.apache.org/jira/browse/CASSANDRA-2452
● Don't try to make a “fail over” DC, just go with active-active
● If you insist, then do the fail over in your application and configure C* the
same as you would active-active
● Generally requires a lot more storage
● Doesn't matter though because you're using ephemeral drives (right?)
and don't want a TB of data on each node anyway
7. Multi-Region Connectivity Options
● VPN
● Encrypted node-node communication
● CPU utilization is often a downside
● VPNCubed / VPCPlus
● I've never deployed it, heard good things about it though
● Amazon VPC
● anyone know if a single VPC can span regions yet?
● SSH Tunnels
● EC2 security groups
● IPTables
● Encrypted node-node + public IP binding + AWS security groups +
IPTables (EIPs may simplify this, never actually tried it)
8. Recovery From Failures
● Don't “fix” EC2 nodes, replace them
● bootstrap at token-1, remove old token
– bootstrap can be slow, but will get better
● Other than that it's the same in EC2 as not ...
9. Node Maintenance
● “Maintenance” On EC2?
● Usually not required (just replace the node)
● If it is, just stop C*; consistency level + hinted handoff, repair, and read repair will fix it
● Same as physical HW
● https://issues.apache.org/jira/browse/CASSANDRA-2034
● Stop Trying To Decom Nodes Just To Replace a Disk !!!
10. Backups
● C* snapshots and push to S3
● Directory Watcher that pushes new files to S3
● SimpleGeo: https://github.com/simplegeo/tablesnap
● Netflix: http://slidesha.re/NFOnCassBkup
● Keep a log of all incoming writes
● Not specific to S3
● Can be coupled with snapshots / S3
● Useful for other reasons as well
● Compression in transit to S3 (or wherever) can be done on
a separate EC2 instance to avoid burning CPU on the C* nodes
● Usually not worth the extra complexity / cost
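The "directory watcher" idea can be sketched as a simple polling loop. This is a rough illustration, not how tablesnap or the Netflix tooling is implemented: it polls with the standard library instead of using filesystem notifications, the `upload` callable is a hypothetical stand-in for whatever pushes a file to S3, and skipping file names that contain `-tmp-` is an assumption about how in-progress SSTables are named.

```python
import os


def find_new_files(data_dir, already_pushed):
    """Return paths under data_dir that have not been pushed yet."""
    new_files = []
    for root, _dirs, files in os.walk(data_dir):
        for name in sorted(files):
            path = os.path.join(root, name)
            # Assumption: files still being written carry "-tmp-" in the name.
            if "-tmp-" in name or path in already_pushed:
                continue
            new_files.append(path)
    return new_files


def push_new_files(data_dir, already_pushed, upload):
    """Call upload(path) for each new file; returns the paths pushed this pass."""
    pushed = []
    for path in find_new_files(data_dir, already_pushed):
        upload(path)              # hypothetical: e.g. a wrapper around an S3 client
        already_pushed.add(path)  # remember only after the upload succeeded
        pushed.append(path)
    return pushed
```

A watcher would call `push_new_files` in a loop with a short sleep between passes, keeping `already_pushed` in memory or on disk. SSTables are immutable once written, which is what makes "push each new file once" a workable backup strategy.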
11. Changing Node Sizes
● Start a new instance
● rsync data from original node to new node
● Shutdown C* on original node
● rsync data from original node to new node again (picks up what changed since the first pass)
● Start C* on new node
● Shutdown original instance
● NB: Assumes same token, region, zone, etc
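The steps above can be written down as an ordered plan. The sketch below only builds the command strings and executes nothing; the host names, the data directory, the `service cassandra` commands, and the extra `nodetool drain` before shutdown are all assumptions about a particular install, not something the slide prescribes.

```python
def resize_plan(old_host, new_host, data_dir="/var/lib/cassandra/"):
    """Return ordered (run_on_host, command) steps for moving to a new instance.

    Nothing is executed here. Paths and service commands are assumptions.
    """
    # Run on the new node, pulling from the old one. -a preserves file
    # attributes; --delete drops files the old node has since removed.
    sync = "rsync -a --delete {old}:{d} {d}".format(old=old_host, d=data_dir)
    return [
        (new_host, sync),                            # bulk copy while old node serves traffic
        (old_host, "nodetool drain"),                # assumption: flush before stopping
        (old_host, "sudo service cassandra stop"),   # shut down C* on the original node
        (new_host, sync),                            # second pass copies only the delta
        (new_host, "sudo service cassandra start"),  # start C* on the new node
    ]
    # Shutting down the original instance is left to your EC2 tooling.


if __name__ == "__main__":
    for host, command in resize_plan("old-node", "new-node"):
        print("[{}] {}".format(host, command))
```

The first rsync does the slow bulk transfer while the old node is still up, so the downtime is limited to the second, much smaller pass.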
12. Elastic Load Balancers
● They're awesome, use them
● Could be more awesome (e.g. better integration with Route 53)
● What I really want is TCP anycast for ELB across regions (AWS could
make it work)
● Balance across regions with GeoIP / GeoDNS
● Zerigo, TZOHA, Neustar, “homegrown”, etc
● Route 53? You wish (though Route 53 itself is run over anycast)
– “in the future we plan for Route 53 to also give you greater control over
… the route your users take to reach an endpoint” --Werner Vogels
● Put them in front of your app servers, not your C* instances
● Keep your app servers stateless or at least “weakly” stateless (e.g. no sticky
sessions required)
13. AMIs versus Scripted Setup
● DataStax publishes C* AMIs
● Chef Recipes as well
● Or roll your own …
● Whatever you do, just make sure it's automated
and repeatable
● *personally* I prefer scripting the setup
remotely, but this is … “less than ideal”
● PSSH is, in general, awesome
14. WTF?!
● Your zone X is not the same as my zone X
● Consistent within an EC2 account
● Problematic across accounts
● Does not apply to regions (i.e. your region X is my region X)
● EIPs resolve to private IPs from within AWS
● EBS volumes sometimes just “freeze”
● AWS: “yeah, that happens sometimes under load”
● steal% sometimes 20% or more (1%-3% is “normal”)
● This is AWS literally stealing your money
● Thankfully not all that common, but watch out for it
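One way to watch for high steal% on Linux is to compare two samples of the aggregate `cpu` line in `/proc/stat`. A minimal sketch, assuming a kernel that reports the steal column (the eighth counter); the function names are hypothetical.

```python
def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into integer counters."""
    fields = line.split()
    if fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(value) for value in fields[1:]]


def steal_percent(before, after):
    """Percentage of CPU time stolen between two samples of the 'cpu' line."""
    first, second = parse_cpu_line(before), parse_cpu_line(after)
    deltas = [b - a for a, b in zip(first, second)]
    # Counters: user nice system idle iowait irq softirq steal.
    # Later guest columns are already counted inside user/nice, so skip them.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return 100.0 * deltas[7] / total
```

In practice you would read the first line of `/proc/stat`, sleep a few seconds, read it again, and alert when the result stays well above the 1%-3% range the slide calls normal.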
15. Missing AWS Features
● ELB over anycast
● Probably doable by AWS, but not others ...
● GeoDNS from Route53
● No really, WTF Doesn't Route53 Do GeoDNS ?!?!
● Multi-Region VPC
● Local SSDs
16. We're Hiring !
● Developers
● QA
● Community Manager
● Sales / SE
● Interns
– Dev
– Support
– QA
● Smart People Interested In Cassandra
17. Cassandra On EC2
Q?
(yes, I'll post the slides on slideshare)