PRACTICE MAKES PERFECT:
EXTREME CASSANDRA OPTIMIZATION
@AlTobey
Tech Lead, Compute and Data Services
#CASSANDRA13
1Saturday, June 15, 13
I didn’t name this talk. The conference people did, but I like it a lot.
2
⁍ About me / Ooyala
⁍ How not to manage your Cassandra clusters
⁍ Make it suck less
⁍ How to be a heuristician
⁍ Tools of the trade
⁍ More Settings
⁍ Show & Tell
Outline
3
⁍ Tech Lead, Compute and Data Services at Ooyala, Inc.
⁍ C&D team is #devops: 3 ops, 3 eng, me
⁍ C&D team is #bdaas: Big Data as a Service
⁍ ~100 Cassandra nodes, expanding quickly
⁍ Obligatory: we’re hiring
@AlTobey
⁍ I won’t go into devops today, but I’m happy to talk about it later.
⁍ 2 years at Ooyala, SRE -> TL Tools Team -> C&D
⁍ C&D builds BDaaS for Ooyala: fully managed Cassandra / Spark / Hadoop / Zookeeper / Kafka
⁍ 11 clusters, 5-36 nodes, working on something big
⁍ BEFORE: Engineers deployed systems: expensive, error-prone. AFTER: Engineers use APIs & consult
4
⁍ Founded in 2007
⁍ 230+ employees globally
⁍ 200M unique users, 110+ countries
⁍ Over 1 billion videos played per month
⁍ Over 2 billion analytic events per day
Ooyala
5
Ooyala has been using Cassandra since v0.4
Use cases:
⁍ Analytics data (real-time and batch)
⁍ Highly available K/V store
⁍ Time series data
⁍ Play head tracking (cross-device resume)
⁍ Machine Learning Data
Ooyala & Cassandra
Ooyala: Legacy Platform
6
[Diagram labels: players, loggers, API, ABE Service, cassandra (x5), hadoop (x5), S3; callouts: "START HERE", "read-modify-write"]
⁍ Ruby MR -- CDH3u4 -- 80 Dell Blades
⁍ Cassandra 0.4 --> 1.1 / DSE 3.x
⁍ 18x Dell r509 48GiB RAM 6x 600G 15k SAS / MD RAID 5 -- more on RAID later
⁍ We’ve scaled our data volume by 2x yearly for the last 4 years.
Avoiding read-modify-write
7
cassandra13_drinks column family
memTable
Albert Tuesday 6 Wednesday 0
Evan Tuesday 0 Wednesday 0
Frank Tuesday 3 Wednesday 3
Kelvin Tuesday 0 Wednesday 0
Krzysztof Tuesday 0 Wednesday 0
Phillip Tuesday 12 Wednesday 0
⁍ CF to track how much I expect my team at Ooyala to drink
⁍ Row keys are names
⁍ Column keys are days
⁍ Values are a count of drinks
Avoiding read-modify-write
8
cassandra13_drinks column family
memTable
Albert Tuesday 2 Wednesday 0
Phillip Tuesday 0 Wednesday 1
ssTable
Albert Tuesday 6 Wednesday 0
Evan Tuesday 0 Wednesday 0
Frank Tuesday 3 Wednesday 3
Kelvin Tuesday 0 Wednesday 0
Krzysztof Tuesday 0 Wednesday 0
Phillip Tuesday 12 Wednesday 0
⁍ Next day, after a flush
⁍ I’m speaking so I decided to drink less
⁍ Phillip informs me that he has quit drinking
Avoiding read-modify-write
9
cassandra13_drinks column family
memTable
Albert Tuesday 22 Wednesday 0
ssTable
Albert Tuesday 2 Wednesday 0
Phillip Tuesday 0 Wednesday 1
ssTable
Albert Tuesday 6 Wednesday 0
Evan Tuesday 0 Wednesday 0
Frank Tuesday 3 Wednesday 3
Kelvin Tuesday 0 Wednesday 0
Krzysztof Tuesday 0 Wednesday 0
Phillip Tuesday 12 Wednesday 0
⁍ I’m drinking with all you people so I decide to add 20
⁍ read 2, add 20, write 22
Avoiding read-modify-write
10
cassandra13_drinks column family
ssTable
Albert Tuesday 22 Wednesday 0
Evan Tuesday 0 Wednesday 0
Frank Tuesday 3 Wednesday 3
Kelvin Tuesday 0 Wednesday 0
Krzysztof Tuesday 0 Wednesday 0
Phillip Tuesday 0 Wednesday 1
⁍ After compaction & conflict resolution
⁍ Overwriting the same value is just fine! Works really well for some patterns such as time-series data
⁍ Separate read/write streams handy for debugging, but not a big deal
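The merge shown on the last few slides can be sketched as a last-write-wins pass over every cell. This is only an illustration, not Cassandra's implementation: the `merge_cells` name and the integer write timestamps in the fourth field are assumptions (the slides do not show timestamps), and it relies on awk and sort being installed.

```shell
#!/bin/sh
# Hypothetical sketch: merge cells from several flushes, newest timestamp wins.
# Input lines: <row key> <column> <value> <write timestamp>
merge_cells() {
  awk '{
    key = $1 " " $2
    # Keep a cell only if it is the first or the newest seen for this row/column.
    if (!(key in ts) || ($4 + 0) > ts[key]) { ts[key] = $4 + 0; val[key] = $3 }
  }
  END { for (k in val) print k, val[k] }' | sort
}

# Three successive writes to Albert's Tuesday cell and two to Phillip's;
# the merge keeps only the newest value of each.
printf '%s\n' \
  'Albert Tuesday 6 1' \
  'Phillip Tuesday 12 1' \
  'Albert Tuesday 2 2' \
  'Phillip Tuesday 0 2' \
  'Albert Tuesday 22 3' | merge_cells
```

The older values never have to be read back by the writer; they are simply discarded when the tables are merged.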
2011: 0.6 ➜ 0.8
11
⁍ Migration is still a largely unsolved problem
⁍ Wrote a tool in Scala to scrub data and write via Thrift
⁍ Rebuilt indexes - faster than copying
[Diagram labels: hadoop, cassandra, GlusterFS P2P, Scala Map/Reduce, Thrift, cassandra]
⁍ Because of some legacy choices, we knew we had a bunch of expired tombstones
⁍ GlusterFS: userspace, ionice(1), fast & easy
⁍ Scala MR: sstabledump, etc. were TOO SLOW; Scala MR only took a week (with production running too!)
Changes: 0.6 ➜ 0.8
12
⁍ Cassandra 0.8
⁍ 24GiB heap
⁍ Sun Java 1.6 update
⁍ Linux 2.6.36
⁍ XFS on MD RAID5
⁍ Disabled swap or at least vm.swappiness=1
⁍ More on XFS settings & bugs later
⁍ Got significant improvements from RAID & readahead tuning (more later)
⁍ Al’s first rule of tuning databases: disable swap or GTFO
⁍ fixed lots of applications by simply disabling swap
13
⁍ 18 nodes ➜ 36 nodes
⁍ DSE 3.0
⁍ Stale tombstones again!
⁍ No downtime!
[Diagram labels: cassandra, GlusterFS P2P, Scala Map/Reduce, Thrift, DSE 3.0]
2012: Capacity Increase
⁍ I switched teams, working on Hastur, didn’t document enough, repairs were forgotten again
⁍ 60 day GC Grace Period expired ... 3 months ago
⁍ rsync is not enough for hardware moves: do rebuilds!
⁍ Use DSE Map/Reduce to isolate most of the load from production
System Changes: Apache 1.0 ➜ DSE 3.0
14
⁍ DSE 3.0 installed via apt packages
⁍ Unchanged: heap, distro
⁍ Ran much faster this time!
⁍ Mistake: Moved to MD RAID 0
Fix: RAID10 or RAID5, MD, ZFS, or btrfs
⁍ Mistake: Running on Ubuntu Lucid
Fix: Ubuntu Precise
⁍ Previously deployed with Capistrano
⁍ DSE 3’s Hadoop is compiled on Debian 6 so native components will not load on 10.04’s libc
⁍ still gradually rebuilding nodes from RAID0 ➜ RAID5 and Lucid -> Precise
Config Changes: Apache 1.0 ➜ DSE 3.0
15
⁍ Schema: compaction_strategy = LCS
⁍ Schema: bloom_filter_fp_chance = 0.1
⁍ Schema: sstable_size_in_mb = 256
⁍ Schema: compression_options = Snappy
⁍ YAML: compaction_throughput_mb_per_sec: 0
⁍ LCS is a huge improvement in operations life (no more major compactions)
⁍ Bloom filters were tipping over a 24GiB heap
⁍ With lots of data per node, sstable sizes in LCS must be MUCH bigger
⁍ > 100,000 open files slows everything down, especially startup
⁍ 256 MB vs. 5 MB is a ~50x reduction in file count
⁍ Compaction can’t keep up: even huge throughput limits don’t work, so throttling must be disabled (compaction_throughput_mb_per_sec: 0)
⁍ try to adjust heap, etc. so you’re flushing at nearly full memtables to reduce compaction needs
⁍ backreference RMW?
⁍ might be fixed in >= 1.2
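The file-count point above is easy to check with integer arithmetic. A minimal sketch, assuming 1 TiB of data per node (a made-up figure, not from the talk) and a hypothetical `sstable_count` helper; each SSTable also has several component files on disk, so the real open-file count is a multiple of this.

```shell
#!/bin/sh
# Hypothetical helper: rough SSTable count for one node.
# usage: sstable_count <data per node in GiB> <sstable_size_in_mb>
sstable_count() {
  data_mb=$(( $1 * 1024 ))
  # Round up: a partially filled SSTable is still a file.
  echo $(( (data_mb + $2 - 1) / $2 ))
}

sstable_count 1024 5     # assumed 1 TiB per node at 5 MB per SSTable
sstable_count 1024 256   # the same data at 256 MB per SSTable
```

With these assumed inputs the count drops from roughly 210,000 SSTables to about 4,000, which is the ~50x reduction the notes mention.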
16
⁍ 36 nodes ➜ lots more nodes
⁍ As usual, no downtime!
[Diagram labels: DSE 3.1, replication, DSE 3.1]
2013: Datacenter Move
⁍ Size omitted in published slides. I was asked not to publish yet, I will tweet, etc. in a couple weeks.
⁍ Wasn’t the original plan, but we save a lot of $$ by leaving old cage
⁍ Prep for next-generation architecture!
17
Upcoming use cases:
⁍ Store every event from our players at full resolution
⁍ Cache code for our Spark job server
⁍ AMPLab Tachyon backend?
Coming Soon for Cassandra at Ooyala
⁍ This is the intro for the next slide / diagram.
⁍ Considering Astyanax or CQL3 backend for Tachyon so we can contribute it back
18
Next Generation Architecture: Ooyala Event Store
[Diagram labels: players, loggers, API, kafka, ingest, spark, job server, DSE 3.1, Tachyon?]
⁍ Look mom! No Hadoop! Remember what I said about latency?
⁍ But we’re not just running DSE on these machines. They’re running: DSE, Spark, KVM, and CDH3u4 (legacy)
⁍ Secret is cgroups!
⁍ Also, ZFS (later)
19
⁍ Security
⁍ Cost of Goods Sold
⁍ Operations / support
⁍ Developer happiness
⁍ Physical capacity (cpu/memory/network/disk)
⁍ Reliability / Resilience
⁍ Compromise
There’s more to tuning than performance:
Shifting themes: philosophy of tuning
⁍ Security is always #1: The decision to disable security features is an important decision!
⁍ Example: EC2 instances sizes vary wildly in consistency and raw performance
⁍ Leveled v.s. Size Tiered compaction, ZFS/LVM/MDRAID, bare metal v.s. EC2
⁍ how much of this stuff do my devs need to know? How much work is it to get a new KS/CF?
⁍ speed of node rebuilds, risk incurred by extended rebuilds, speed of repair
a.) e.g. it takes a full 24 hours to repair each node in our 36-node cluster, so > 1 month to repair the cluster
⁍ repeatable configurations, do future admins have to remember to do stuff or is it automated?
⁍ Look up “John Allspaw Resilience”
⁍ you only have access to EC2 or old hardware, your company has an OS/filesystem/settings policy (e.g. my $PREVIOUS_JOB CentOS 5.3 Linux 2.6.18.x hardened distro)
There are others of course.
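The repair figure in the notes above is plain arithmetic. A small sketch, assuming nodes are repaired strictly one after another; the `repair_days` helper is hypothetical.

```shell
#!/bin/sh
# usage: repair_days <nodes> <hours per node>
# Assumes sequential repairs; the result is rounded up to whole days.
repair_days() {
  echo $(( ($1 * $2 + 23) / 24 ))
}

repair_days 36 24   # the 36-node cluster at 24 hours per node
```

At 36 days for one full pass, any gc_grace period shorter than that cannot be honored without repairing nodes in parallel.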
20
⁍ I’d love to be more scientific, but production comes first
⁍ Sometimes you have to make educated guesses
⁍ It’s not as difficult as it’s made out to be
⁍ Your brain is great at heuristics. Trust it.
⁍ Concentrate on bottlenecks
⁍ Make incremental changes
⁍ Read Malcolm Gladwell’s “Blink”
I am not a scientist ... heuristician?
⁍ A truly scientific approach would take a lot of time and resources.
⁍ When under time pressure and things are slow, you have to move fast and measure “by the seat of your pants”
⁍ Be educated, do research, and make sensible decisions without months of testing, be prepared to do better next time
⁍ It’s actually pretty fast and easy this way!
⁍ More on what tools I use later on.
21
Observe, Orient, Decide, Act:
⁍ Observe the system in production under load
⁍ Make small, safe changes
⁍ Observe
⁍ Commit or Revert
The OODA Loop
⁍ Understand YOUR production workload first!
⁍ Look at Opscenter latency numbers
⁍ cl-netstat.pl (later)
⁍ Examples:
⁍ Changing /proc/sys/vm/dirty_background_ratio is fairly safe and shows results quickly.
⁍ Some network settings can take your node offline temporarily or require manual intervention.
⁍ Changing the compaction scheme requires a lot of time and has other implications.
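One pass of the observe / change / observe / commit-or-revert loop for a cheap setting such as dirty_background_ratio might look like the sketch below. The `try_setting` helper and the `check_latency` command are hypothetical; the check is whatever you already trust for your own workload (OpsCenter numbers, a latency probe, and so on).

```shell
#!/bin/sh
# Hypothetical helper: apply one small change, observe, then commit or revert.
# usage: try_setting <file> <new value> <check command...>
try_setting() {
  file=$1
  new=$2
  shift 2
  old=$(cat "$file") || return 1      # Observe: remember the current value
  echo "$new" > "$file" || return 1   # Act: make one small change
  if "$@"; then                       # Observe again with your own check
    echo "commit $file: $old -> $new"
  else
    echo "$old" > "$file"             # Revert to the previous value
    echo "revert $file: kept $old"
  fi
}

# Example (needs root; check_latency is a placeholder for your own health check):
# try_setting /proc/sys/vm/dirty_background_ratio 2 check_latency
```

Keeping the old value in hand before writing is what makes the change "small and safe": the revert path never depends on remembering what the setting used to be.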
Testing Shiny Things
22
⁍ Like kernels
⁍ And Linux distributions
⁍ And ZFS
⁍ And btrfs
⁍ And JVMs & parameters
⁍ Test them in production!
⁍ Testing stuff in a lab is fine, if you have one and you have the time.
⁍ Take (responsible) advantage of Cassandra’s resilience:
⁍ test things you think should work well in production on ONE node or a couple nodes well spaced out.
Testing Shiny Things: In Production
23
[Diagram labels: ext4 (x5), ZFS, btrfs, kernel upgrade]
⁍ Use your staging / non-prod environments first if you have them (some people don’t and that’s unfortunate but it happens)
⁍ test things you think should work well in production on ONE node or a couple nodes well spaced out.
24
Brendan Gregg’s Tool Chart
http://joyent.com/blog/linux-performance-analysis-and-tools-brendan-gregg-s-talk-at-scale-11x
⁍ Brendan Gregg’s chart is so good, I just copied it for now.
⁍ Original: http://joyent.com/blog/linux-performance-analysis-and-tools-brendan-gregg-s-talk-at-scale-11x
⁍ I’ll briefly talk about a few
25
dstat -lrvn 10
⁍ Just like vmstat but prettier and does way more
⁍ 35 lines of output = about 6 minutes of 10s snapshots
⁍ What’s interesting?
⁍ IO wait starting at line 5, but all numbers are going up, so this is probably during a map/reduce job
⁍ IO wait is high, but disk throughput isn’t impressive at all
⁍ ~2 blocked “procs” (which includes threads)
Not bothering to tune this right now because production latency is fine.
26
cl-netstat.pl
https://github.com/tobert/perl-ssh-tools
⁍ Home grown.
⁍ Requires no software on the target machines except for SSH.
⁍ Recent Net::SSH2 supports ssh-agent
27
iostat -x 1
⁍ Mostly I just look at the *wait numbers here.
⁍ Great for finding a bad disk with high latency.
28
htop
⁍ Per-CPU utilization bars are nice
⁍ Displays threads by default (hit “H” in plain top)
⁍ Very configurable!
⁍ For example: 1 thread at 100% CPU is usually the GC
29
jconsole
⁍ Looks like I can reduce the heap size on this cluster, but should probably increase -Xmn to 100mb * (physical cores) (not counting hypercores)
30
opscenter
⁍ It looks better on a high-resolution display ;)
31
nodetool ring
10.10.10.10 Analytics rack1 Up Normal 47.73 MB 1.72% 1012046694721756637024691720378965
10.10.10.10 Analytics rack1 Up Normal 63.94 MB 0.86% 1026714038123521225967078556906197
10.10.10.10 Analytics rack1 Up Normal 85.73 MB 0.86% 1041381381525285814909465393433428
10.10.10.10 Analytics rack1 Up Normal 47.87 MB 0.86% 1056048724927050403851852229960659
10.10.10.10 Analytics rack1 Up Normal 39.73 MB 0.86% 1070716068328814992794239066487891
10.10.10.10 Analytics rack1 Up Normal 40.74 MB 1.75% 1100423945662575060114582859200003
10.10.10.10 Analytics rack1 Up Normal 40.08 MB 2.20% 1137814208669076757916163680305794
10.10.10.10 Analytics rack1 Up Normal 56.19 MB 3.45% 1196501513956187970179620530735245
10.10.10.10 Analytics rack1 Up Normal 214.88 MB 11.62% 1394248867770897155613247921498720
10.10.10.10 Analytics rack1 Up Normal 214.29 MB 2.45% 1435882108713996181107000284314407
10.10.10.10 Analytics rack1 Up Normal 158.49 MB 1.76% 1465773686249280216901752503449044
10.10.10.10 Analytics rack1 Up Normal 40.3 MB 0.92% 1481401683578223483181070489250370
⁍ hotspots
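Hotspots like the 11.62% entry above can be picked out mechanically. A rough sketch, assuming the column layout shown on this slide (ownership percentage in the eighth field); other nodetool versions print different columns, and both the `ring_hotspots` name and the "twice the mean" threshold are arbitrary choices for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print ring entries owning more than <factor> times the mean.
# Reads `nodetool ring` rows on stdin; factor defaults to 2.
ring_hotspots() {
  awk -v factor="${1:-2}" '
    $8 ~ /%$/ {                      # only rows whose 8th field is a percentage
      owns[NR] = $8 + 0              # "11.62%" converts to 11.62
      line[NR] = $0
      sum += owns[NR]; n++
    }
    END {
      if (n == 0) exit
      mean = sum / n
      for (i = 1; i <= NR; i++)
        if ((i in owns) && owns[i] > factor * mean) print line[i]
    }'
}

# Example: nodetool ring | ring_hotspots 2
```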
32
nodetool cfstats
Keyspace: gostress
Read Count: 0
Read Latency: NaN ms.
Write Count: 0
Write Latency: NaN ms.
Pending Tasks: 0
Column Family: stressful
SSTable count: 1
Space used (live): 32981239
Space used (total): 32981239
Number of Keys (estimate): 128
Memtable Columns Count: 0
Memtable Data Size: 0
Memtable Switch Count: 0
Read Count: 0
Read Latency: NaN ms.
Write Count: 0
Write Latency: NaN ms.
Pending Tasks: 0
Bloom Filter False Positives: 0
Bloom Filter False Ratio: 0.00000
Bloom Filter Space Used: 336
Compacted row minimum size: 7007507
Compacted row maximum size: 8409007
Compacted row mean size: 8409007
Could be using a lot of heap
Controllable by sstable_size_in_mb
⁍ bloom filters
⁍ sstable_size_in_mb
33
nodetool proxyhistograms
Offset Read Latency Write Latency Range Latency
35 0 20 0
42 0 61 0
50 0 82 0
60 0 440 0
72 0 3416 0
86 0 17910 0
103 0 48675 0
124 1 97423 0
149 0 153109 0
179 2 186205 0
215 5 139022 0
258 134 44058 0
310 2656 60660 0
372 34698 742684 0
446 469515 7359351 0
535 3920391 31030588 0
642 9852708 33070248 0
770 4487796 9719615 0
924 651959 984889 0
⁍ units are microseconds
⁍ can give you a good idea of how much latency coordinator hops are costing you
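The histogram above is easier to reason about as percentiles. A minimal sketch that walks the buckets until the running count reaches the requested share of requests; the `hist_percentile` name is made up, and it assumes the four-column layout shown here (offset in microseconds, then read, write, and range counts).

```shell
#!/bin/sh
# usage: hist_percentile <percent> <column>   (column 2 = read, 3 = write, 4 = range)
# Reads `nodetool proxyhistograms` rows on stdin and prints a bucket offset.
hist_percentile() {
  awk -v p="$1" -v col="$2" '
    $1 ~ /^[0-9]+$/ {               # skip the header row
      n++; off[n] = $1; cnt[n] = $col; total += $col
    }
    END {
      if (total == 0) exit
      for (i = 1; i <= n; i++) {
        seen += cnt[i]
        # First bucket at which the running count reaches p percent of all requests.
        if (seen * 100 >= p * total) { print off[i]; exit }
      }
    }'
}

# Example: nodetool proxyhistograms | hist_percentile 99 2
```

Comparing the result with the per-node numbers from cfhistograms is one way to estimate what the coordinator hop costs.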
34
nodetool compactionstats
al@node ~ $ nodetool compactionstats
pending tasks: 3
compaction type keyspace column family bytes compacted bytes total progress
Compaction hastur gauge_archive 9819749801 16922291634 58.03%
Compaction hastur counter_archive 12141850720 16147440484 75.19%
Compaction hastur mark_archive 647389841 1475432590 43.88%
Active compaction remaining time : n/a
al@node ~ $ nodetool compactionstats
pending tasks: 3
compaction type keyspace column family bytes compacted bytes total progress
Compaction hastur gauge_archive 10239806890 16922291634 60.51%
Compaction hastur counter_archive 12544404397 16147440484 77.69%
Compaction hastur mark_archive 1107897093 1475432590 75.09%
Active compaction remaining time : n/a
35
⁍ cassandra-stress
⁍ YCSB
⁍ Production
⁍ Terasort (DSE)
⁍ Homegrown
Stress Testing Tools
⁍ we mostly focus on cassandra-stress for burn-in of new clusters
⁍ can quickly figure out the right setting for -Xmn
⁍ Terasort is interesting for comparing DSE to Cloudera/Hortonworks/etc. (it’s fast!)
⁍ Consider writing custom benchmarks for your application patterns
⁍ sometimes it’s faster to write one than figure out how to make a generic tool do what you want
36
kernel.pid_max = 999999
fs.file-max = 1048576
vm.max_map_count = 1048576
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 65536 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
vm.dirty_ratio = 10
vm.dirty_background_ratio = 2
vm.swappiness = 1
/etc/sysctl.conf
⁍ pid_max doesn’t fix anything, I just like it and have never had a problem with it
⁍ These are my starting point settings for nearly every system/application.
⁍ Generally safe for production.
⁍ vm.dirty*ratio can go big for fake fast writes, generally safe for Cassandra, but beware you’re more likely to see FS/file corruption on power loss
⁍ but you will get latency spikes if you hit dirty_ratio (percentage of RAM), so don’t tune too low
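Before relying on settings like these, it helps to confirm that a running node actually matches the file. A sketch under a few assumptions: `sysctl_path` and `sysctl_drift` are hypothetical helpers, the file uses simple `key = value` lines, and no key contains a dot inside a path component (some per-interface network keys do).

```shell
#!/bin/sh
# Map a sysctl key to its /proc/sys path, e.g. vm.swappiness -> /proc/sys/vm/swappiness.
# usage: sysctl_path <key> [proc root, default /proc/sys]
sysctl_path() {
  echo "${2:-/proc/sys}/$(echo "$1" | tr . /)"
}

# Print every "key = value" line from <conf> whose live value differs.
# usage: sysctl_drift <conf file> [proc root, default /proc/sys]
sysctl_drift() {
  while IFS='=' read -r key want; do
    key=$(echo "$key" | tr -d ' \t')
    case "$key" in ''|'#'*) continue ;; esac   # skip blank lines and comments
    want=$(echo $want)                         # collapse runs of whitespace
    have=$(echo $(cat "$(sysctl_path "$key" "$2")" 2>/dev/null))
    [ "$have" = "$want" ] || echo "$key: want '$want', have '$have'"
  done < "$1"
}

# Example: sysctl_drift /etc/sysctl.conf
```

No output means the node matches the file; anything printed is a setting that was never applied or was changed by hand.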
37
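ra=16384 # 16 KiB of readahead, in bytes (2**14, spelled out because plain sh lacks **)
ss=$(blockdev --getss /dev/sda) # logical sector size, in bytes
blockdev --setra $((ra / ss)) /dev/sda # --setra counts 512-byte sectors, so this assumes ss is 512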
echo 256 > /sys/block/sda/queue/nr_requests
echo cfq > /sys/block/sda/queue/scheduler
echo 16384 > /sys/block/md7/md/stripe_cache_size
/etc/rc.local
⁍ Lower readahead is better for latency on seeky workloads
⁍ More readahead will artificially increase your IOPS by reading a bunch of stuff you might not need!
⁍ nr_requests = number of IO structs the kernel will keep in flight, don’t go crazy
⁍ Deadline is best for raw throughput
⁍ CFQ supports cgroup priorities and is occasionally better for latency on SATA drives
⁍ Default stripe cache is 128. The increase seems to help MD RAID5 a lot.
⁍ Don’t forget to set readahead separately for MD RAID devices
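Since readahead has to be set on the member disks and on the MD device separately, a small loop keeps the two in sync. A sketch only: `ra_sectors` and `set_readahead` are hypothetical helpers, the device names are placeholders, and it relies on `blockdev --setra` taking its value in 512-byte sectors.

```shell
#!/bin/sh
# blockdev --setra counts 512-byte sectors, so convert bytes first.
ra_sectors() {
  echo $(( $1 / 512 ))
}

# usage: set_readahead <bytes> <device...>   (needs root)
set_readahead() {
  bytes=$1
  shift
  for dev in "$@"; do
    blockdev --setra "$(ra_sectors "$bytes")" "$dev"
  done
}

# Example: 16 KiB on a member disk and on the MD array built from it.
# set_readahead 16384 /dev/sda /dev/md7
```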
38
-Xmx8G leave it alone
-Xms8G leave it alone
-Xmn1200M 100MiB * nCPU
-Xss180k should be fine
-XX:+UseNUMA
numactl --interleave
JVM Args
⁍ In general, most people should leave the defaults alone. Especially the heap, which can cause no end of trouble if you do it wrong and cause GC pauses.
⁍ Don’t count hypercores.
⁍ Our biggest bang for the buck so far has been tuning newsize.
⁍ Have you ever seen “out of memory” when there’s plenty of memory available? You probably have a full NUMA node.
⁍ NUMA is how modern machines are built. Older Apache Cassandra distros had numactl --interleave, but this doesn’t seem to be in the DSE scripts. I’ve been running +UseNUMA for about a year and a half now and it seems to work fine.
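The "100MiB * nCPU, don't count hypercores" rule can be turned into a flag mechanically. A sketch, assuming `lscpu` from util-linux is installed; `physical_cores` and `xmn_flag` are hypothetical helper names, and the result is a starting point to test under load, not a recommendation.

```shell
#!/bin/sh
# Count physical cores: unique (core, socket) pairs, so hyperthreads collapse.
physical_cores() {
  lscpu -p=CORE,SOCKET | grep -v '^#' | sort -u | wc -l
}

# usage: xmn_flag <physical cores>   (100 MiB of new generation per core)
xmn_flag() {
  echo "-Xmn$(( $1 * 100 ))M"
}

# On a live node: xmn_flag "$(physical_cores)"
xmn_flag 12   # the 12-core case from the slide
```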
cgroups
39
Provides fine-grained control over Linux resources
⁍ Makes the Linux scheduler better
⁍ Lets you manage systems under extreme load
⁍ Useful on all Linux machines
⁍ Can choose between determinism and flexibility
⁍ static resource assignment has better determinism / consistency
⁍ weighted resources provide most of the advantage with a lot more flexibility
cgroups
40
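cat >> /etc/default/cassandra <<'EOF'
# The quoted 'EOF' keeps $cpucg and $$ literal, so they expand when this file is sourced at startup.
cpucg=/sys/fs/cgroup/cpu/cassandra
mkdir -p $cpucg
# Inherit the parent's NUMA nodes and CPUs (applies when cpuset is mounted together with cpu).
cat $cpucg/../cpuset.mems >$cpucg/cpuset.mems
cat $cpucg/../cpuset.cpus >$cpucg/cpuset.cpus
echo 100 > $cpucg/cpu.shares
# Move the starting shell, and the JVM it launches, into the cgroup.
echo $$ > $cpucg/tasks
EOF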
⁍ automatically adds cassandra to a CG called “cassandra”
⁍ cpuset.mems can be used to limit NUMA nodes if you have huge machines
⁍ cpuset.cpus can restrict tasks to specific cores (like taskset, stricter)
⁍ shares is just a number, set your own scale, 1-1000 works for me
⁍ adding a task to a CG is as simple as adding its PID
⁍ children are not necessarily added, you must add threads too if joining after startup (ps -efL)
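For a process that is already running, each thread has to be moved individually, as the last note says. A sketch of that loop, assuming a cgroup v1 `tasks` file; the `cg_add_threads` helper is hypothetical, the third argument exists only so the proc root can be overridden, and the `pgrep` pattern in the example is a placeholder for however you find the Cassandra PID.

```shell
#!/bin/sh
# usage: cg_add_threads <pid> <cgroup tasks file> [proc root, default /proc]
cg_add_threads() {
  proc=${3:-/proc}
  for path in "$proc/$1"/task/*; do
    tid=${path##*/}
    # One thread ID per write; the kernel moves that thread into the cgroup.
    echo "$tid" > "$2" || echo "could not move thread $tid" >&2
  done
}

# Example (needs root):
# cg_add_threads "$(pgrep -f CassandraDaemon)" /sys/fs/cgroup/cpu/cassandra/tasks
```

Threads that exit while the loop runs will simply fail to move, which is why a failed write is reported and skipped rather than treated as fatal.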
Successful Experiment: btrfs
41
mkfs.btrfs -m raid10 -d raid0 /dev/sd[c-h]1
mount -o compress=lzo /dev/sdc1 /data
⁍ Like ZFS, btrfs can manage multiple disks without mdraid or LVM.
⁍ We have one production system in EC2 running btrfs flawlessly.
⁍ I’m told there are problems when the disk fills up so don’t do that.
⁍ noatime isn’t necessary on modern Linux, relatime is the default for xfs / ext4 and is good enough
Successful Experiment: ZFS on Linux
42
zpool create data raidz /dev/sd[c-h]
zfs create data/cassandra
zfs set compression=lzjb data/cassandra
zfs set atime=off data/cassandra
zfs set logbias=throughput data/cassandra
⁍ ZFS really is the ultimate filesystem.
⁍ RAIDZ is like RAID5 but totally different:
⁍ variable-width stripes
⁍ no write hole
⁍ VERY fast, plays well with C*
⁍ Stable! (so far)
Conclusions
43
⁍ Tuning is multi-dimensional
⁍ Production load is your most important benchmark
⁍ Lean on Cassandra, experiment!
⁍ No one metric tells the whole story
Questions?
44
⁍ Twitter: @AlTobey
⁍ Github: https://github.com/tobert
⁍ Email: al@ooyala.com / tobert@gmail.com

 

Ähnlich wie C* Summit 2013: Practice Makes Perfect: Extreme Cassandra Optimization by Albert Tobey (20)

Cassandra Community Webinar | Practice Makes Perfect: Extreme Cassandra Optim...
Cassandra Community Webinar | Practice Makes Perfect: Extreme Cassandra Optim...Cassandra Community Webinar | Practice Makes Perfect: Extreme Cassandra Optim...
Cassandra Community Webinar | Practice Makes Perfect: Extreme Cassandra Optim...
 
Cassandra at scale
Cassandra at scaleCassandra at scale
Cassandra at scale
 
C* Summit EU 2013: Practice Makes Perfect: Extreme Cassandra Optimization
C* Summit EU 2013: Practice Makes Perfect: Extreme Cassandra OptimizationC* Summit EU 2013: Practice Makes Perfect: Extreme Cassandra Optimization
C* Summit EU 2013: Practice Makes Perfect: Extreme Cassandra Optimization
 
C* Summit 2013: No moving parts. Taking advantage of Pure Speed by Matt Kennedy
C* Summit 2013: No moving parts. Taking advantage of Pure Speed by Matt KennedyC* Summit 2013: No moving parts. Taking advantage of Pure Speed by Matt Kennedy
C* Summit 2013: No moving parts. Taking advantage of Pure Speed by Matt Kennedy
 
Running 400-node Cassandra + Spark Clusters in Azure (Anubhav Kale, Microsoft...
Running 400-node Cassandra + Spark Clusters in Azure (Anubhav Kale, Microsoft...Running 400-node Cassandra + Spark Clusters in Azure (Anubhav Kale, Microsoft...
Running 400-node Cassandra + Spark Clusters in Azure (Anubhav Kale, Microsoft...
 
Making WordPress Fly
Making WordPress FlyMaking WordPress Fly
Making WordPress Fly
 
Why Spark Is the Next Top (Compute) Model
Why Spark Is the Next Top (Compute) ModelWhy Spark Is the Next Top (Compute) Model
Why Spark Is the Next Top (Compute) Model
 
OSS Presentation DRMC by Keith Brennan
OSS Presentation DRMC by Keith BrennanOSS Presentation DRMC by Keith Brennan
OSS Presentation DRMC by Keith Brennan
 
Cassandra at Pollfish
Cassandra at PollfishCassandra at Pollfish
Cassandra at Pollfish
 
Cassandra at Pollfish
Cassandra at PollfishCassandra at Pollfish
Cassandra at Pollfish
 
The Do’s and Don’ts of Benchmarking Databases
The Do’s and Don’ts of Benchmarking DatabasesThe Do’s and Don’ts of Benchmarking Databases
The Do’s and Don’ts of Benchmarking Databases
 
Complex Ephemeral Caching With Redis: Jeff Pollard
Complex Ephemeral Caching With Redis: Jeff PollardComplex Ephemeral Caching With Redis: Jeff Pollard
Complex Ephemeral Caching With Redis: Jeff Pollard
 
Real Developer Tools for WordPress by Stefan Didak
Real Developer Tools for WordPress by Stefan DidakReal Developer Tools for WordPress by Stefan Didak
Real Developer Tools for WordPress by Stefan Didak
 
High Availabiltity & Replica Sets with mongoDB
High Availabiltity & Replica Sets with mongoDBHigh Availabiltity & Replica Sets with mongoDB
High Availabiltity & Replica Sets with mongoDB
 
Pl2017 High Availability in GCE
Pl2017 High Availability in GCEPl2017 High Availability in GCE
Pl2017 High Availability in GCE
 
High Availability in GCE
High Availability in GCEHigh Availability in GCE
High Availability in GCE
 
Databases benoitg 2009-03-10
Databases benoitg 2009-03-10Databases benoitg 2009-03-10
Databases benoitg 2009-03-10
 
IT Made Me Virtualize Essbase and Performance Sucks
IT Made Me Virtualize Essbase and Performance SucksIT Made Me Virtualize Essbase and Performance Sucks
IT Made Me Virtualize Essbase and Performance Sucks
 
C* Summit EU 2013: Hardware Agnostic: Cassandra on Raspberry Pi
C* Summit EU 2013: Hardware Agnostic: Cassandra on Raspberry Pi C* Summit EU 2013: Hardware Agnostic: Cassandra on Raspberry Pi
C* Summit EU 2013: Hardware Agnostic: Cassandra on Raspberry Pi
 
In-Memory Logical Data Warehouse for accelerating Machine Learning Pipelines ...
In-Memory Logical Data Warehouse for accelerating Machine Learning Pipelines ...In-Memory Logical Data Warehouse for accelerating Machine Learning Pipelines ...
In-Memory Logical Data Warehouse for accelerating Machine Learning Pipelines ...
 

Mehr von DataStax Academy

Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craftForrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craftDataStax Academy
 
Introduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph DatabaseIntroduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph DatabaseDataStax Academy
 
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache CassandraIntroduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache CassandraDataStax Academy
 
Cassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsCassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsDataStax Academy
 
Cassandra 3.0 Data Modeling
Cassandra 3.0 Data ModelingCassandra 3.0 Data Modeling
Cassandra 3.0 Data ModelingDataStax Academy
 
Cassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackCassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackDataStax Academy
 
Data Modeling for Apache Cassandra
Data Modeling for Apache CassandraData Modeling for Apache Cassandra
Data Modeling for Apache CassandraDataStax Academy
 
Coursera Cassandra Driver
Coursera Cassandra DriverCoursera Cassandra Driver
Coursera Cassandra DriverDataStax Academy
 
Production Ready Cassandra
Production Ready CassandraProduction Ready Cassandra
Production Ready CassandraDataStax Academy
 
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonCassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonDataStax Academy
 
Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1DataStax Academy
 
Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2DataStax Academy
 
Standing Up Your First Cluster
Standing Up Your First ClusterStanding Up Your First Cluster
Standing Up Your First ClusterDataStax Academy
 
Real Time Analytics with Dse
Real Time Analytics with DseReal Time Analytics with Dse
Real Time Analytics with DseDataStax Academy
 
Introduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraIntroduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraDataStax Academy
 
Cassandra Core Concepts
Cassandra Core ConceptsCassandra Core Concepts
Cassandra Core ConceptsDataStax Academy
 
Enabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseEnabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseDataStax Academy
 
Bad Habits Die Hard
Bad Habits Die Hard Bad Habits Die Hard
Bad Habits Die Hard DataStax Academy
 
Advanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraAdvanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraDataStax Academy
 

Mehr von DataStax Academy (20)

Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craftForrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
 
Introduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph DatabaseIntroduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph Database
 
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache CassandraIntroduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
 
Cassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsCassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart Labs
 
Cassandra 3.0 Data Modeling
Cassandra 3.0 Data ModelingCassandra 3.0 Data Modeling
Cassandra 3.0 Data Modeling
 
Cassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackCassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stack
 
Data Modeling for Apache Cassandra
Data Modeling for Apache CassandraData Modeling for Apache Cassandra
Data Modeling for Apache Cassandra
 
Coursera Cassandra Driver
Coursera Cassandra DriverCoursera Cassandra Driver
Coursera Cassandra Driver
 
Production Ready Cassandra
Production Ready CassandraProduction Ready Cassandra
Production Ready Cassandra
 
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonCassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
 
Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1
 
Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2
 
Standing Up Your First Cluster
Standing Up Your First ClusterStanding Up Your First Cluster
Standing Up Your First Cluster
 
Real Time Analytics with Dse
Real Time Analytics with DseReal Time Analytics with Dse
Real Time Analytics with Dse
 
Introduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraIntroduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache Cassandra
 
Cassandra Core Concepts
Cassandra Core ConceptsCassandra Core Concepts
Cassandra Core Concepts
 
Enabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseEnabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax Enterprise
 
Bad Habits Die Hard
Bad Habits Die Hard Bad Habits Die Hard
Bad Habits Die Hard
 
Advanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraAdvanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache Cassandra
 
Advanced Cassandra
Advanced CassandraAdvanced Cassandra
Advanced Cassandra
 

KĂŒrzlich hochgeladen

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontologyjohnbeverley2021
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxRemote DBA Services
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...apidays
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Zilliz
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 

KĂŒrzlich hochgeladen (20)

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 

C* Summit 2013: Practice Makes Perfect: Extreme Cassandra Optimization by Albert Tobey

  • 1. PRACTICE MAKES PERFECT: EXTREME CASSANDRA OPTIMIZATION @AlTobey Tech Lead, Compute and Data Services #CASSANDRA13 1Saturday, June 15, 13 I didn’t name this talk. The conference people did, but I like it a lot.
  • 2. 2 ⁍ About me / Ooyala ⁍ How not to manage your Cassandra clusters ⁍ Make it suck less ⁍ How to be a heuristician ⁍ Tools of the trade ⁍ More Settings ⁍ Show & Tell #CASSANDRA13 Outline 2Saturday, June 15, 13
  • 3. 3 ⁍ Tech Lead, Compute and Data Services at Ooyala, Inc. ⁍ C&D team is #devops: 3 ops, 3 eng, me ⁍ C&D team is #bdaas: Big Data as a Service ⁍ ~100 Cassandra nodes, expanding quickly ⁍ Obligatory: we’re hiring #CASSANDRA13 @AlTobey 3Saturday, June 15, 13 ⁍ I won’t go into devops today, but I’m happy to talk about it later. ⁍ 2 years at Ooyala, SRE -> TL Tools Team -> C&D ⁍ C&D builds BDaaS for Ooyala: fully managed Cassandra / Spark / Hadoop / Zookeeper / Kafka ⁍ 11 clusters, 5-36 nodes, working on something big ⁍ BEFORE: Engineers deployed systems: expensive, error-prone, AFTER: Engineers use API’s & consult
  • 4. 4 ⁍ Founded in 2007 ⁍ 230+ employees globally ⁍ 200M unique users,110+ countries ⁍ Over 1 billion videos played per month ⁍ Over 2 billion analytic events per day #CASSANDRA13 Ooyala 4Saturday, June 15, 13
  • 5. 5 Ooyala has been using Cassandra since v0.4 Use cases: ⁍ Analytics data (real-time and batch) ⁍ Highly available K/V store ⁍ Time series data ⁍ Play head tracking (cross-device resume) ⁍ Machine Learning Data #CASSANDRA13 Ooyala & Cassandra 5Saturday, June 15, 13
  • 6. Ooyala: Legacy Platform cassandracassandracassandracassandra 6 S3 hadoophadoophadoophadoophadoop cassandra ABE Service APIloggersplayers START HERE #CASSANDRA13 read-modify-write 6Saturday, June 15, 13 ⁍ Ruby MR -- CDH3u4 -- 80 Dell Blades ⁍ Cassandra 0.4 --> 1.1 / DSE 3.x ⁍ 18x Dell r509 48GiB RAM 6x 600G 15k SAS / MD RAID 5 -- more on RAID later ⁍ We’ve scaled our data volume by 2x yearly for the last 4 years.
  • 7. memTable Avoiding read-modify-write 7#CASSANDRA13 Albert 6 Wednesday 0 Evan Tuesday 0 Wednesday 0 Frank Tuesday 3 Wednesday 3 Kelvin Tuesday 0 Wednesday 0 cassandra13_drinks column family Krzysztof Tuesday 0 Wednesday 0 Phillip Tuesday 12 Wednesday 0 Tuesday 7Saturday, June 15, 13 ⁍ CF to track how much I expect my team at Ooyala to drink ⁍ Row keys are names ⁍ Column keys are days ⁍ Values are a count of drinks
  • 8. memTable Avoiding read-modify-write 8#CASSANDRA13 Al Tuesday 2 Wednesday 0 Phillip Tuesday 0 Wednesday 1 cassandra13_drinks column family ssTable Albert 6 Wednesday 0 Evan Tuesday 0 Wednesday 0 Frank Tuesday 3 Wednesday 3 Kelvin Tuesday 0 Wednesday 0 Krzysztof Tuesday 0 Wednesday 0 Phillip Tuesday 12 Wednesday 0 Tuesday 8Saturday, June 15, 13 ⁍ Next day, after a flush ⁍ I’m speaking so I decided to drink less ⁍ Phillip informs me that he has quit drinking
  • 9. memTable Avoiding read-modify-write 9#CASSANDRA13 Albert Tuesday 22 Wednesday 0 cassandra13_drinks column family ssTable Albert Tuesday 2 Wednesday 0 Phillip Tuesday 0 Wednesday 1 ssTable Albert 6 Wednesday 0 Evan Tuesday 0 Wednesday 0 Frank Tuesday 3 Wednesday 3 Kelvin Tuesday 0 Wednesday 0 Krzysztof Tuesday 0 Wednesday 0 Phillip Tuesday 12 Wednesday 0 Tuesday 9Saturday, June 15, 13 ⁍ I’m drinking with all you people so I decide to add 20 ⁍ read 2, add 20, write 22
  • 10. Avoiding read-modify-write 10#CASSANDRA13 cassandra13_drinks column family ssTable Albert Tuesday 22 Wednesday 0 Evan Tuesday 0 Wednesday 0 Frank Tuesday 3 Wednesday 3 Kelvin Tuesday 0 Wednesday 0 Krzysztof Tuesday 0 Wednesday 0 Phillip Tuesday 0 Wednesday 1 10Saturday, June 15, 13 ⁍ After compaction & conflict resolution ⁍ Overwriting the same value is just fine! Works really well for some patterns such as time-series data ⁍ Separate read/write streams handy for debugging, but not a big deal
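
The conflict resolution in slides 7 to 10 can be sketched in a few lines. This is a simplified model, not Cassandra's code: it assumes every cell carries a write timestamp and that the newest timestamp wins when the memTable and ssTables are merged. The `Table` type, `merge_tables`, and the timestamps 1 to 3 are made up for illustration; the drink counts come from the slides.

```python
from typing import Dict, Iterable, Tuple

# (row key, column name) -> (write timestamp, value).
# A hypothetical, simplified cell model; real ssTables hold much more.
Table = Dict[Tuple[str, str], Tuple[int, int]]


def merge_tables(tables: Iterable[Table]) -> Table:
    """Merge tables cell by cell, keeping the cell with the newest timestamp."""
    merged: Table = {}
    for table in tables:
        for cell, (ts, value) in table.items():
            if cell not in merged or ts > merged[cell][0]:
                merged[cell] = (ts, value)
    return merged


# Three generations of writes, as on slide 9 (timestamps are invented).
oldest_sstable = {("Albert", "Tuesday"): (1, 6), ("Phillip", "Tuesday"): (1, 12)}
newer_sstable = {("Albert", "Tuesday"): (2, 2), ("Phillip", "Tuesday"): (2, 0)}
memtable = {("Albert", "Tuesday"): (3, 22)}

compacted = merge_tables([oldest_sstable, newer_sstable, memtable])
print(compacted)
```

Because the merge only compares timestamps, the order in which the tables are read does not matter, which is why blind overwrites are cheap. The read before the write of 22 happens in the application, not in the storage engine.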
  • 11. 2011: 0.6 ➜ 0.8 11 ⁍ Migration is still a largely unsolved problem ⁍ Wrote a tool in Scala to scrub data and write via Thrift ⁍ Rebuilt indexes - faster than copying hadoop cassandra GlusterFS P2P cassandra Thrift #CASSANDRA13 Scala Map/Reduce 11Saturday, June 15, 13 ⁍ Because of some legacy choices, we know we had a bunch of expired tombstones ⁍ GlusterFS: userspace, ionice(1), fast & easy ⁍ Scala MR: sstabledump, etc. TOO SLOW, Scala MR only took a week (with production running too!)
  • 12. Changes: 0.6 ➜ 0.8 12 ⁍ Cassandra 0.8 ⁍ 24GiB heap ⁍ Sun Java 1.6 update ⁍ Linux 2.6.36 ⁍ XFS on MD RAID5 ⁍ Disabled swap or at least vm.swappiness=1 #CASSANDRA13 12Saturday, June 15, 13 ⁍ More on XFS settings & bugs later ⁍ Got significant improvements from RAID & readahead tuning (more later) ⁍ Al’s first rule of tuning databases: disable swap or GTFO ⁍ fixed lots of applications by simply disabling swap
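
A quick way to audit the swap rule on a node is to read the sysctl back from procfs. The sketch below assumes a Linux host where `/proc/sys/vm/swappiness` exists; the helper names and the limit of 1 (taken from the slide) are illustrative. It only checks the swappiness value and does not detect whether swap devices are actually disabled.

```python
from pathlib import Path

# Standard Linux procfs location for the vm.swappiness sysctl.
SWAPPINESS_PATH = Path("/proc/sys/vm/swappiness")


def parse_swappiness(text: str) -> int:
    """Parse the contents of the swappiness file, e.g. '60\\n' -> 60."""
    return int(text.strip())


def swappiness_ok(value: int, limit: int = 1) -> bool:
    """True when swappiness is at or below the limit suggested on the slide."""
    return value <= limit


def check_host(path: Path = SWAPPINESS_PATH) -> str:
    """Describe the current host; degrades gracefully off Linux."""
    if not path.exists():
        return "vm.swappiness not available on this host"
    value = parse_swappiness(path.read_text())
    verdict = "ok" if swappiness_ok(value) else "too high for a database node"
    return f"vm.swappiness={value}: {verdict}"


print(check_host())
```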
  • 13. 13 ⁍ 18 nodes ➜ 36 nodes ⁍ DSE 3.0 ⁍ Stale tombstones again! ⁍ No downtime! cassandra GlusterFS P2P DSE 3.0 Thrift #CASSANDRA13 Scala Map/Reduce 2012: Capacity Increase 13Saturday, June 15, 13 ⁍ I switched teams, working on Hastur, didn’t document enough, repairs were forgotten again ⁍ 60 day GC Grace Period expired ... 3 months ago ⁍ rsync is not enough for hardware moves: do rebuilds! ⁍ Use DSE Map/Reduce to isolate most of the load from production
  • 14. System Changes: Apache 1.0 ➜ DSE 3.0 14 ⁍ DSE 3.0 installed via apt packages ⁍ Unchanged: heap, distro ⁍ Ran much faster this time! ⁍ Mistake: Moved to MD RAID 0 Fix: RAID10 or RAID5, MD, ZFS, or btrfs ⁍ Mistake: Running on Ubuntu Lucid Fix: Ubuntu Precise #CASSANDRA13 14Saturday, June 15, 13 ⁍ Previously deployed with Capistrano ⁍ DSE 3’s Hadoop is compiled on Debian 6 so native components will not load on 10.04’s libc ⁍ still gradually rebuilding nodes from RAID0 ➜ RAID5 and Lucid -> Precise
  • 15. Config Changes: Apache 1.0 ➜ DSE 3.0 15 ⁍ Schema: compaction_strategy = LCS ⁍ Schema: bloom_filter_fp_chance = 0.1 ⁍ Schema: sstable_size_in_mb = 256 ⁍ Schema: compression_options = Snappy ⁍ YAML: compaction_throughput_mb_per_sec: 0 #CASSANDRA13 15Saturday, June 15, 13 ⁍ LCS is a huge improvement in operations life (no more major compactions) ⁍ Bloom filters were tipping over a 24GiB heap ⁍ With lots of data per node, sstable sizes in LCS must be MUCH bigger ⁍ > 100,000 open files slows everything down, especially startup ⁍ 256mb v.s. 5mb is 50x reduction in file count ⁍ Compaction can’t keep up: even huge rates don’t work, must be disabled ⁍ try to adjust heap, etc. so you’re flushing at nearly full memtables to reduce compaction needs ⁍ backreference RMW? ⁍ might be fixed in >= 1.2
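
The file-count argument is simple arithmetic. The sketch below assumes a hypothetical 500 GiB of data per node (the talk does not give a per-node figure) and counts one unit per sstable, ignoring that each sstable is stored as several component files on disk.

```python
import math

MIB = 1024 ** 2
GIB = 1024 ** 3


def sstable_count(data_bytes: int, sstable_size_mb: int) -> int:
    """Number of sstables needed to hold data_bytes at a fixed sstable size."""
    return math.ceil(data_bytes / (sstable_size_mb * MIB))


data_per_node = 500 * GIB  # hypothetical figure, for illustration only

small = sstable_count(data_per_node, 5)    # a 5 MB LCS sstable size
large = sstable_count(data_per_node, 256)  # the 256 MB setting from the slide

print(small, large, small / large)
```

With these assumed numbers the small setting lands above the 100,000 mark the notes warn about, and the ratio works out to roughly the 50x the slide quotes.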
  • 16. 16 ⁍ 36 nodes ➜ lots more nodes ⁍ As usual, no downtime! #CASSANDRA13 DSE 3.1DSE 3.1 replication 2013: Datacenter Move 16Saturday, June 15, 13 ⁍ Size omitted in published slides. I was asked not to publish yet, I will tweet, etc. in a couple weeks. ⁍ Wasn’t the original plan, but we save a lot of $$ by leaving old cage ⁍ Prep for next-generation architecture!
  • 17. 17 Upcoming use cases: ⁍ Store every event from our players at full resolution ⁍ Cache code for our Spark job server ⁍ AMPLab Tachyon backend? #CASSANDRA13 Coming Soon for Cassandra at Ooyala 17Saturday, June 15, 13 ⁍ This is the intro for the next slide / diagram. ⁍ Considering Astyanax or CQL3 backend for Tachyon so we can contribute it back
  • 18. 18 spark APIloggersplayers kafka ingest job server #CASSANDRA13 DSE 3.1 Next Generation Architecture: Ooyala Event Store Tachyon? 18Saturday, June 15, 13 ⁍ Look mom! No Hadoop! Remember what I said about latency? ⁍ But we’re not just running DSE on these machines. They’re running: DSE, Spark, KVM, and CDH3u4 (legacy) ⁍ Secret is cgroups! ⁍ Also, ZFS (later)
  • 19. 19 ⁍ Security ⁍ Cost of Goods Sold ⁍ Operations / support ⁍ Developer happiness ⁍ Physical capacity (cpu/memory/network/disk) ⁍ Reliability / Resilience ⁍ Compromise #CASSANDRA13 There’s more to tuning than performance: 19Saturday, June 15, 13 Shifting themes: philosophy of tuning ⁍ Security is always #1: The decision to disable security features is an important decision! ⁍ Example: EC2 instances sizes vary wildly in consistency and raw performance ⁍ Leveled v.s. Size Tiered compaction, ZFS/LVM/MDRAID, bare metal v.s. EC2 ⁍ how much of this stuff do my devs need to know? How much work is it to get a new KS/CF? ⁍ speed of node rebuilds, risk incurred by extended rebuilds, speed of repair a.) e.g. it takes a full 24 hours to repair each node in our 36-node cluster, so > 1 month to repair the cluster ⁍ repeatable configurations, do future admins have to remember to do stuff or is it automated? ⁍ Look up “John Allspaw Resilience” ⁍ you only have access to EC2 or old hardware, your company has an OS/filesystem/settings policy (e.g. my $PREVIOUS_JOB CentOS 5.3 Linux 2.18.x hardened distro) There are others of course.
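
The repair estimate in the notes follows directly from the two numbers given. A minimal sketch, assuming nodes are repaired strictly one after another; the `concurrent` parameter is a hypothetical knob added only to show how the estimate would scale.

```python
import math


def full_repair_days(nodes: int, hours_per_node: float, concurrent: int = 1) -> float:
    """Days to repair every node once, repairing `concurrent` nodes at a time."""
    rounds = math.ceil(nodes / concurrent)
    return rounds * hours_per_node / 24.0


# Figures from the notes: 36 nodes, a full 24 hours per node.
one_at_a_time = full_repair_days(nodes=36, hours_per_node=24)
# Hypothetical: two nodes repaired in parallel.
two_at_a_time = full_repair_days(nodes=36, hours_per_node=24, concurrent=2)

print(one_at_a_time, two_at_a_time)
```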
  • 20. I am not a scientist ... heuristician?
    ⁍ I’d love to be more scientific, but production comes first
    ⁍ Sometimes you have to make educated guesses
    ⁍ It’s not as difficult as it’s made out to be
    ⁍ Your brain is great at heuristics. Trust it.
    ⁍ Concentrate on bottlenecks
    ⁍ Make incremental changes
    ⁍ Read Malcolm Gladwell’s “Blink”
    Notes:
    ⁍ A truly scientific approach would take a lot of time and resources.
    ⁍ When you are under time pressure and things are slow, you have to move fast and measure “by the seat of your pants”.
    ⁍ Be educated, do research, and make sensible decisions without months of testing; be prepared to do better next time.
    ⁍ It’s actually pretty fast and easy this way!
    ⁍ More on what tools I use later on.
  • 21. The OODA Loop
    Observe, Orient, Decide, Act:
    ⁍ Observe the system in production under load
    ⁍ Make small, safe changes
    ⁍ Observe
    ⁍ Commit or Revert
    Notes:
    ⁍ Understand YOUR production workload first!
    ⁍ Look at Opscenter latency numbers
    ⁍ cl-netstat.pl (later)
    ⁍ Examples:
      ⁍ Changing /proc/sys/vm/dirty_background_ratio is fairly safe and shows results quickly.
      ⁍ Some network settings can take your node offline temporarily, or require manual intervention.
      ⁍ Changing the compaction scheme requires a lot of time and has other implications.
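The observe / change / observe / commit-or-revert cycle from this slide can be sketched as a small shell helper. This is a minimal sketch, not the author's tooling: `try_setting`, `get_setting`, `set_setting`, and the `PROC_ROOT` override are hypothetical names introduced here, and the "observe" step is whatever command you pass in. Its exit status decides between commit and revert.

```shell
# PROC_ROOT is a hypothetical override so the helpers can be tried against a
# scratch directory instead of the live /proc/sys tree.
PROC_ROOT="${PROC_ROOT:-/proc/sys}"

# Map a sysctl name such as vm.dirty_background_ratio to its file path.
sysctl_path() { echo "$PROC_ROOT/$(echo "$1" | tr . /)"; }

get_setting() { cat "$(sysctl_path "$1")"; }
set_setting() { echo "$2" > "$(sysctl_path "$1")"; }

# try_setting NAME NEW_VALUE OBSERVE_CMD [ARGS...]
# Remembers the old value, applies the new one, runs the observe command,
# and puts the old value back if that command exits non-zero.
try_setting() {
  local name=$1 new=$2 old
  shift 2
  old=$(get_setting "$name")
  set_setting "$name" "$new"
  if "$@"; then
    echo "keeping $name=$new (was $old)"
  else
    set_setting "$name" "$old"
    echo "reverted $name to $old"
  fi
}
```

On a live node (as root) a call might look like `try_setting vm.dirty_background_ratio 2 ./latency_ok.sh`, where `latency_ok.sh` is a placeholder for whatever check you trust, for example a script that watches latency for a few minutes.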
  • 22. Testing Shiny Things
    ⁍ Like kernels
    ⁍ And Linux distributions
    ⁍ And ZFS
    ⁍ And btrfs
    ⁍ And JVMs & parameters
    ⁍ Test them in production!
    Notes:
    ⁍ Testing stuff in a lab is fine, if you have one and you have the time.
    ⁍ Take (responsible) advantage of Cassandra’s resilience:
    ⁍ Test things you think should work well in production on ONE node, or on a couple of nodes well spaced out.
  • 23. Testing Shiny Things: In Production
    Diagram labels: ext4, ext4, ext4, ZFS, ext4, kernel upgrade, ext4, btrfs
    Notes:
    ⁍ Use your staging / non-prod environments first if you have them (some people don’t, and that’s unfortunate, but it happens).
    ⁍ Test things you think should work well in production on ONE node, or on a couple of nodes well spaced out.
  • 24. Brendan Gregg’s Tool Chart
    Original: http://joyent.com/blog/linux-performance-analysis-and-tools-brendan-gregg-s-talk-at-scale-11x
    Notes:
    ⁍ Brendan Gregg’s chart is so good, I just copied it for now.
    ⁍ I’ll briefly talk about a few of the tools.
  • 25. dstat -lrvn 10
    Notes:
    ⁍ Just like vmstat, but prettier, and it does way more.
    ⁍ 35 lines of output = about 5 minutes of 10s snapshots.
    ⁍ What’s interesting?
      ⁍ IO wait starts at line 5, but all numbers are going up, so this is probably during a map/reduce job.
      ⁍ IO wait is high, but disk throughput isn’t impressive at all.
      ⁍ ~2 blocked “procs” (which includes threads).
    ⁍ Not bothering to tune this right now because production latency is fine.
  • 26. cl-netstat.pl
    https://github.com/tobert/perl-ssh-tools
    Notes:
    ⁍ Home grown.
    ⁍ Requires no software on the target machines except for SSH.
    ⁍ Recent Net::SSH2 supports ssh-agent.
  • 27. iostat -x 1
    Notes:
    ⁍ Mostly I just look at the *wait numbers here.
    ⁍ Great for finding a bad disk with high latency.
  • 28. htop
    Notes:
    ⁍ Per-CPU utilization bars are nice.
    ⁍ Displays threads by default (hit “H” in plain top).
    ⁍ Very configurable!
    ⁍ For example: 1 thread at 100% CPU is usually the GC.
  • 29. jconsole
    Notes:
    ⁍ Looks like I can reduce the heap size on this cluster, but I should probably increase -Xmn to 100 MB * (physical cores), not counting hypercores.
  • 30. opscenter
    Notes:
    ⁍ It looks better on a high-resolution display ;)
  • 31. nodetool ring
    10.10.10.10  Analytics  rack1  Up  Normal  47.73 MB   1.72%   1012046694721756637024691720378965
    10.10.10.10  Analytics  rack1  Up  Normal  63.94 MB   0.86%   1026714038123521225967078556906197
    10.10.10.10  Analytics  rack1  Up  Normal  85.73 MB   0.86%   1041381381525285814909465393433428
    10.10.10.10  Analytics  rack1  Up  Normal  47.87 MB   0.86%   1056048724927050403851852229960659
    10.10.10.10  Analytics  rack1  Up  Normal  39.73 MB   0.86%   1070716068328814992794239066487891
    10.10.10.10  Analytics  rack1  Up  Normal  40.74 MB   1.75%   1100423945662575060114582859200003
    10.10.10.10  Analytics  rack1  Up  Normal  40.08 MB   2.20%   1137814208669076757916163680305794
    10.10.10.10  Analytics  rack1  Up  Normal  56.19 MB   3.45%   1196501513956187970179620530735245
    10.10.10.10  Analytics  rack1  Up  Normal  214.88 MB  11.62%  1394248867770897155613247921498720
    10.10.10.10  Analytics  rack1  Up  Normal  214.29 MB  2.45%   1435882108713996181107000284314407
    10.10.10.10  Analytics  rack1  Up  Normal  158.49 MB  1.76%   1465773686249280216901752503449044
    10.10.10.10  Analytics  rack1  Up  Normal  40.3 MB    0.92%   1481401683578223483181070489250370
    Notes:
    ⁍ Hotspots
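One way to pick hotspots out of longer `nodetool ring` output is to flag ranges whose ownership is well above the mean. The sketch below assumes the column layout shown on the slide (ownership in column 8, token in column 9); other Cassandra versions print different columns, so check yours first. `hot_ranges` is a hypothetical helper name, and the threshold factor is a judgment call rather than a recommendation from the talk.

```shell
# hot_ranges FACTOR
# Reads `nodetool ring`-style rows on stdin and prints the rows whose
# ownership is more than FACTOR times the mean ownership of all rows.
hot_ranges() {
  awk -v factor="$1" '
    $8 ~ /%$/ {                      # only rows that carry an ownership column
      pct = $8; sub(/%/, "", pct)
      owns[n] = pct + 0
      token[n] = $9                  # kept as a string, tokens overflow doubles
      size[n] = $6 " " $7
      total += pct
      n++
    }
    END {
      if (n == 0) exit
      mean = total / n
      for (i = 0; i < n; i++)
        if (owns[i] > factor * mean)
          printf "%s owns=%.2f%% load=%s\n", token[i], owns[i], size[i]
    }'
}
```

Typical use would be `nodetool ring | hot_ranges 2`. Fed the twelve rows from the slide, a factor of 2 flags only the range that owns 11.62%.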
  • 32. nodetool cfstats
    Keyspace: gostress
      Read Count: 0
      Read Latency: NaN ms.
      Write Count: 0
      Write Latency: NaN ms.
      Pending Tasks: 0
      Column Family: stressful
        SSTable count: 1
        Space used (live): 32981239
        Space used (total): 32981239
        Number of Keys (estimate): 128
        Memtable Columns Count: 0
        Memtable Data Size: 0
        Memtable Switch Count: 0
        Read Count: 0
        Read Latency: NaN ms.
        Write Count: 0
        Write Latency: NaN ms.
        Pending Tasks: 0
        Bloom Filter False Positives: 0
        Bloom Filter False Ratio: 0.00000
        Bloom Filter Space Used: 336
        Compacted row minimum size: 7007507
        Compacted row maximum size: 8409007
        Compacted row mean size: 8409007
    Slide callouts:
    ⁍ Could be using a lot of heap
    ⁍ Controllable by sstable_size_in_mb
    Notes:
    ⁍ bloom filters
    ⁍ sstable_size_in_mb
  • 33. nodetool proxyhistograms
    Offset  Read Latency  Write Latency  Range Latency
    35      0             20             0
    42      0             61             0
    50      0             82             0
    60      0             440            0
    72      0             3416           0
    86      0             17910          0
    103     0             48675          0
    124     1             97423          0
    149     0             153109         0
    179     2             186205         0
    215     5             139022         0
    258     134           44058          0
    310     2656          60660          0
    372     34698         742684         0
    446     469515        7359351        0
    535     3920391       31030588       0
    642     9852708       33070248       0
    770     4487796       9719615        0
    924     651959        984889         0
    Notes:
    ⁍ Units are microseconds.
    ⁍ Can give you a good idea of how much latency coordinator hops are costing you.
  • 34. nodetool compactionstats
    al@node ~ $ nodetool compactionstats
    pending tasks: 3
    compaction type  keyspace  column family    bytes compacted  bytes total  progress
    Compaction       hastur    gauge_archive    9819749801       16922291634  58.03%
    Compaction       hastur    counter_archive  12141850720      16147440484  75.19%
    Compaction       hastur    mark_archive     647389841        1475432590   43.88%
    Active compaction remaining time : n/a
    al@node ~ $ nodetool compactionstats
    pending tasks: 3
    compaction type  keyspace  column family    bytes compacted  bytes total  progress
    Compaction       hastur    gauge_archive    10239806890      16922291634  60.51%
    Compaction       hastur    counter_archive  12544404397      16147440484  77.69%
    Compaction       hastur    mark_archive     1107897093       1475432590   75.09%
    Active compaction remaining time : n/a
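Since "remaining time" reads n/a here, two samples taken a known interval apart are enough for a rough estimate. A minimal sketch follows; `compaction_eta` is a hypothetical helper, and the 60-second interval in the example is an assumption, because the slide does not say how far apart the two samples were taken.

```shell
# compaction_eta BYTES_BEFORE BYTES_AFTER BYTES_TOTAL INTERVAL_SECONDS
# Prints the estimated seconds remaining, assuming the compaction keeps the
# rate observed between the two samples (integer arithmetic, rounds down).
compaction_eta() {
  local before=$1 after=$2 total=$3 interval=$4 rate
  rate=$(( (after - before) / interval ))        # bytes per second
  if [ "$rate" -le 0 ]; then
    echo "stalled"
    return 0
  fi
  echo $(( (total - after) / rate ))
}

# gauge_archive from the slide, with an assumed 60 s between samples:
compaction_eta 9819749801 10239806890 16922291634 60   # prints 954
```

Compaction throughput is rarely constant, so treat the number as an order-of-magnitude hint.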
  • 35. Stress Testing Tools
    ⁍ cassandra-stress
    ⁍ YCSB
    ⁍ Production
    ⁍ Terasort (DSE)
    ⁍ Homegrown
    Notes:
    ⁍ We mostly focus on cassandra-stress for burn-in of new clusters.
    ⁍ You can quickly figure out the right setting for -Xmn with it.
    ⁍ Terasort is interesting for comparing DSE to Cloudera/Hortonworks/etc. (it’s fast!)
    ⁍ Consider writing custom benchmarks for your application patterns.
    ⁍ Sometimes it’s faster to write one than to figure out how to make a generic tool do what you want.
  • 36. /etc/sysctl.conf
    kernel.pid_max = 999999
    fs.file-max = 1048576
    vm.max_map_count = 1048576
    net.core.rmem_max = 16777216
    net.core.wmem_max = 16777216
    net.ipv4.tcp_rmem = 4096 65536 16777216
    net.ipv4.tcp_wmem = 4096 65536 16777216
    vm.dirty_ratio = 10
    vm.dirty_background_ratio = 2
    vm.swappiness = 1
    Notes:
    ⁍ pid_max doesn’t fix anything; I just like it and have never had a problem with it.
    ⁍ These are my starting-point settings for nearly every system/application.
    ⁍ Generally safe for production.
    ⁍ vm.dirty*ratio can go big for fake fast writes. That is generally safe for Cassandra, but beware: you’re more likely to see FS/file corruption on power loss.
    ⁍ But you will get latency spikes if you hit dirty_ratio (a percentage of RAM), so don’t tune it too low.
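Before rolling a baseline like this out, it can help to see which values on a node already differ from it. The sketch below is read-only. `check_sysctls` and the `PROC_ROOT` override are hypothetical names introduced here, and the parser assumes simple `key = value` lines with `#` comments, as in the file above.

```shell
# PROC_ROOT is a hypothetical override for dry runs against a scratch tree.
PROC_ROOT="${PROC_ROOT:-/proc/sys}"

# check_sysctls FILE
# Compares every "key = value" line in FILE with the live value and prints
# one "ok" or "differs" line per key. Nothing is changed.
check_sysctls() {
  while IFS='=' read -r key want; do
    key=$(echo "$key" | tr -d ' \t')
    case "$key" in
      ''|\#*) continue ;;                      # blank lines and comments
    esac
    want=$(echo $want)                         # collapse runs of whitespace
    path="$PROC_ROOT/$(echo "$key" | tr . /)"
    have=$(echo $(cat "$path" 2>/dev/null))    # /proc separates values with tabs
    if [ "$have" = "$want" ]; then
      echo "ok $key = $have"
    else
      echo "differs $key: live='$have' wanted='$want'"
    fi
  done < "$1"
}
```

Running `check_sysctls /etc/sysctl.conf` on each node before and after `sysctl -p` gives a quick record of what actually changed.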
  • 37. /etc/rc.local
    ra=$((2**14)) # 16k
    ss=$(blockdev --getss /dev/sda)
    blockdev --setra $(($ra / $ss)) /dev/sda
    echo 256 > /sys/block/sda/queue/nr_requests
    echo cfq > /sys/block/sda/queue/scheduler
    echo 16384 > /sys/block/md7/md/stripe_cache_size
    Notes:
    ⁍ Lower readahead is better for latency on seeky workloads.
    ⁍ More readahead will artificially increase your IOPS by reading a bunch of stuff you might not need!
    ⁍ nr_requests = number of IO structs the kernel will keep in flight; don’t go crazy.
    ⁍ Deadline is best for raw throughput.
    ⁍ CFQ supports cgroup priorities and is occasionally better for latency on SATA drives.
    ⁍ Default stripe cache is 128. The increase seems to help MD RAID5 a lot.
    ⁍ Don’t forget to set readahead separately for MD RAID devices.
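One detail worth spelling out for the readahead line: `blockdev --setra` counts in 512-byte sectors regardless of the device's own sector size, so dividing by `--getss` only gives 16 KiB on disks that report 512-byte sectors. The dry-run sketch below only prints commands. `readahead_sectors` and `print_readahead_cmds` are hypothetical helpers, and using the same 16 KiB for the MD device is a placeholder, since the notes say to choose that value separately.

```shell
# readahead_sectors BYTES
# Converts a readahead size in bytes to the 512-byte units --setra expects.
readahead_sectors() { echo $(( $1 / 512 )); }

# print_readahead_cmds BYTES DEVICE [DEVICE...]
# Dry run: prints one blockdev command per device instead of executing it.
print_readahead_cmds() {
  local bytes=$1 dev
  shift
  for dev in "$@"; do
    echo "blockdev --setra $(readahead_sectors "$bytes") /dev/$dev"
  done
}

print_readahead_cmds 16384 sda md7
# blockdev --setra 32 /dev/sda
# blockdev --setra 32 /dev/md7
```

`blockdev --getra /dev/sda` reports the current value in the same units, which makes it easy to confirm the change took effect.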
  • 38. JVM Args
    -Xmx8G          leave it alone
    -Xms8G          leave it alone
    -Xmn1200M       100MiB * nCPU
    -Xss180k        should be fine
    -XX:+UseNUMA    numactl --interleave
    Notes:
    ⁍ In general, most people should leave the defaults alone. Especially the heap, which can cause no end of trouble if you do it wrong and cause GC pauses.
    ⁍ Don’t count hypercores.
    ⁍ Our biggest bang for the buck so far has been tuning newsize.
    ⁍ Have you ever seen “out of memory” when there’s plenty of memory available? You probably have a full NUMA node.
    ⁍ NUMA is how modern machines are built. Older Apache Cassandra distros had numactl --interleave, but this doesn’t seem to be in the DSE scripts. I’ve been running +UseNUMA for about a year and a half now and it seems to work fine.
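The "100MiB * nCPU, don't count hypercores" rule needs a count of physical cores, which differs from what `nproc` reports on hyperthreaded machines. A sketch under stated assumptions: it reads the x86-style `physical id` and `core id` fields from /proc/cpuinfo and falls back to the plain processor count when those fields are missing (some VMs and non-x86 systems). `physical_cores`, `suggest_xmn`, and the `CPUINFO` override are hypothetical names.

```shell
# CPUINFO is a hypothetical override so the functions can be pointed at a
# saved copy of /proc/cpuinfo from another machine.
CPUINFO="${CPUINFO:-/proc/cpuinfo}"

# physical_cores
# Counts unique (physical id, core id) pairs, so hyperthread siblings that
# share a core are counted once.
physical_cores() {
  awk -F: '
    /^processor/   { procs++ }
    /^physical id/ { pkg = $2 }
    /^core id/     { seen[pkg "," $2] = 1 }
    END {
      n = 0
      for (k in seen) n++
      if (n == 0) n = procs + 0      # fallback when topology fields are absent
      print n
    }' "$CPUINFO"
}

# suggest_xmn
# Applies the slide rule of thumb: 100 MiB of new generation per physical core.
suggest_xmn() { echo "-Xmn$(( $(physical_cores) * 100 ))M"; }
```

On a 12-core machine this prints `-Xmn1200M`, matching the value on the slide. It is a starting point for testing with your own load, not a final answer.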
  • 39. cgroups
    Provides fine-grained control over Linux resources
    ⁍ Makes the Linux scheduler better
    ⁍ Lets you manage systems under extreme load
    ⁍ Useful on all Linux machines
    ⁍ Can choose between determinism and flexibility
    Notes:
    ⁍ Static resource assignment has better determinism / consistency.
    ⁍ Weighted resources provide most of the advantage with a lot more flexibility.
  • 40. cgroups
    cat >> /etc/default/cassandra <<'EOF'
    cpucg=/sys/fs/cgroup/cpu/cassandra
    mkdir -p $cpucg
    cat $cpucg/../cpuset.mems >$cpucg/cpuset.mems
    cat $cpucg/../cpuset.cpus >$cpucg/cpuset.cpus
    echo 100 > $cpucg/cpu.shares
    echo $$ > $cpucg/tasks
    EOF
    Notes:
    ⁍ Automatically adds cassandra to a CG called “cassandra”.
    ⁍ cpuset.mems can be used to limit NUMA nodes if you have huge machines.
    ⁍ cpuset.cpus can restrict tasks to specific cores (like taskset, but stricter).
    ⁍ shares is just a number; set your own scale. 1-1000 works for me.
    ⁍ Adding a task to a CG is as simple as adding its PID.
    ⁍ Children are not necessarily added; you must add threads too if joining after startup (ps -efL).
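For a process that is already running, the last note means every thread ID has to be written to the tasks file, not only the PID. A minimal sketch for cgroup v1, assuming the thread IDs are read from /proc/PID/task: `join_cgroup` and the `PROC_DIR` override are hypothetical names, and threads started while the loop runs can be missed, so running it twice is a reasonable precaution.

```shell
# PROC_DIR is a hypothetical override for trying the function on a fake tree.
PROC_DIR="${PROC_DIR:-/proc}"

# join_cgroup PID TASKS_FILE
# Writes the ID of every thread of PID into a cgroup v1 tasks file,
# one write per thread.
join_cgroup() {
  local pid=$1 tasks=$2 t
  for t in "$PROC_DIR/$pid"/task/*; do
    [ -d "$t" ] || continue          # no such process, or no threads listed
    echo "${t##*/}" >> "$tasks"
  done
}
```

A call might look like `join_cgroup "$CASSANDRA_PID" /sys/fs/cgroup/cpu/cassandra/tasks` run as root, where `CASSANDRA_PID` is a placeholder for however you look up the JVM's PID.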
  • 41. Successful Experiment: btrfs
    mkfs.btrfs -m raid10 -d raid0 /dev/sd[c-h]1
    mount -o compress=lzo /dev/sdc1 /data
    Notes:
    ⁍ Like ZFS, btrfs can manage multiple disks without mdraid or LVM.
    ⁍ We have one production system in EC2 running btrfs flawlessly.
    ⁍ I’m told there are problems when the disk fills up, so don’t do that.
    ⁍ noatime isn’t necessary on modern Linux; relatime is the default for xfs / ext4 and is good enough.
  • 42. Successful Experiment: ZFS on Linux
    zpool create data raidz /dev/sd[c-h]
    zfs create data/cassandra
    zfs set compression=lzjb data/cassandra
    zfs set atime=off data/cassandra
    zfs set logbias=throughput data/cassandra
    Notes:
    ⁍ ZFS really is the ultimate filesystem.
    ⁍ RAIDZ is like RAID5 but totally different:
      ⁍ variable-width stripes
      ⁍ no write hole
    ⁍ VERY fast, plays well with C*
    ⁍ Stable! (so far)
  • 43. Conclusions
    ⁍ Tuning is multi-dimensional
    ⁍ Production load is your most important benchmark
    ⁍ Lean on Cassandra, experiment!
    ⁍ No one metric tells the whole story
  • 44. Questions?
    ⁍ Twitter: @AlTobey
    ⁍ Github: https://github.com/tobert
    ⁍ Email: al@ooyala.com / tobert@gmail.com