SlideShare ist ein Scribd-Unternehmen logo
1 von 37
Downloaden Sie, um offline zu lesen
Cassandra: From tarball to production
Why talk about this?
You are about to deploy Cassandra
You are looking for “best practices”
You don’t want:
... to scour through the documentation
... to do something known not to work well
... to forget to cover some important step
What we won’t cover
● Cassandra: how does
it work?
● How do I design my
schema?
● What’s new in
Cassandra X.Y?
So many things to do
Monitoring Snitch DC/Rack Settings Time Sync
Seeds/Autoscaling Full/Incremental
Backups
AWS Instance
Selection
Disk - SSD?
Disk Space - 2x? AWS AMI (Image)
Selection
Periodic Repairs Replication Strategy
Compaction
Strategy
SSL/VPC/VPN Authorization +
Authentication
OS Conf - Users
OS Conf - Limits OS Conf - Perms OS Conf - FSType OS Conf - Logs
C* Start/Stop OS Conf - Path Use case evaluation
Chef to the rescue?
Chef community cookbook available
https://github.com/michaelklishin/cassandra-chef-cookbook
Installs java Creates a “cassandra” user/group
Download/extract the tarball Fixes up ownership
Builds the C* configuration files
Sets the ulimits for filehandles, processes,
memory locking
Sets up an init script Sets up data directories
Chef Cookbook Coverage
Monitoring Snitch DC/Rack Settings Time Sync
Seeds/Autoscaling Full/Incremental
Backups
Disk - SSD? Disk - How much?
AWS Instance Type AWS AMI (Image)
Selection
Periodic Repairs Replication Strategy
Compaction
Strategy
SSL/VPC/VPN Authorization +
Authentication
OS Conf - Users
OS Conf - Limits OS Conf - Perms OS Conf - FSType OS Conf - Logs
C* Start/Stop OS Conf - Path Use case evaluation
Monitoring
Is every node answering queries?
Are nodes talking to each other?
Are any nodes running slowly?
Push UDP! (statsd)
http://hackers.lookout.com/2015/01/cassandra-monitoring/
https://github.com/lookout/cassandra-statsd-agent
Monitoring - Synthetic
Health checks, bad and good
● ‘nodetool status’ exit code
○ Might return 0 if the node is not accepting requests
○ Slow, cross node reads
● cqlsh -u sysmon -p password < /dev/null
● Verifies this node can read auth table
● https://github.com/lookout/cassandra-health-check
What about OpsCenter?
We chose not to use it
Want consistent interface for all monitoring
GUI vs Command Line argument
Didn’t see good auditing capabilities
Didn’t interface well with our chef solution
Snitch
Use the right snitch!
● AWS EC2MultiRegionSnitch
● Google? GoogleCloudSnitch
● GossipingPropertyFileSnitch
NOT
● SimpleSnitch (default)
Community cookbook: set it!
What is RF?
Replication Factor is how many copies of data
Value is hashed to determine primary host
Additional copies always next node
Hash here
What is CL?
Consistency Level -- It’s not RF!
Describes how many nodes must respond
before operation is considered COMPLETE
CL_ONE - only one node responds
CL_QUORUM - (RF/2)+1 nodes (round down)
CL_ALL - RF nodes respond
DC/Rack Settings
You might need to set these
Maybe you’re not in Amazon
Rack == Availability Zone?
Hard: Renaming DC or adding racks
Renaming DCs
Clients “remember” which DC they talk to
Renaming single DC causes all clients to fail
Better to spin up a new one than rename old
Adding a rack
Start with 6 node cluster, rack R1
Replication factor 3
Add 1 node in R2, and rebalance
ALL data in R2 node?
Good idea to keep racks balanced
I don’t have time for this
Clusters must have synchronized time
You will get lots of drift with: [0-3].amazon.pool.
ntp.org
Community cookbook doesn’t cover anything
here
Better make time for this
C* serializes write operations by time stamps
Clocks on virtual machines drift!
It’s the relative difference among clocks that matters
C* nodes should synchronize with each other
Solution: use a pair of peered NTP servers (level 2 or 3)
and a small set of known upstream providers
From a small seed…
Seeds are used for new nodes to find cluster
Every new node should use the same seeds
Seed nodes get topology changes faster
Each seed node must be in the config file
Multiple seeds per datacenter recommended
Tricky to configure on AWS
Backups - Full+Incremental
Nothing in the cookbooks for this
C* makes it “easy”: snapshot, then copy
Snapshots might require a lot more space
Remove the snapshot after copying it
Disk selection
SSD Rotational
Ephemeral
EBS
Low latency Any size instance Any size instance
Recommended Not cheap Less expensive
Great random r/w perf Good write performance No node rebuilds
No network use for disk No network use for disk
AWS Instance Selection
We moved to EC2
c3.2xlarge (15GiB mem, Disk 160GB)?
i2.xlarge (30GiB mem, 800GB disk)
Max recommended storage per node is 1TB
Use instance types that support HVM
Some previous generation instance types, such as T1, C1, M1, and M2 do not support Linux HVM AMIs. Some current generation instance
types, such as T2, I2, R3, G2, and C4 do not support PV AMIs.
How much can I use??
Snapshots take space (kind of)
Best practice: keep disks half full!
800GB disk becomes 400GB
Snapshots during repairs?
Lots of uses for snapshots!
Periodic Repairs
Buried in the docs:
“As a best practice, you should
schedule repairs weekly”
http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_repair_nodes_c.html
● “-pr” (yes)
● “-par” (maybe)
● “--in-local-dc” (no)
Repair Tips
Raise gc_grace_seconds (tombstones?)
Run on one node at a time
Schedule for low usage hours
Use “par” if you have dead time (faster)
Tune with: nodetool setcompactionthroughput
I thought I deleted that
Compaction removes “old” tombstones
10 day default grace period (gc_grace_period)
After that, deletes will not be propagated!
Run ‘nodetool repair’ at least every 10 days
Once a week is perfect (3 day grace)
Node down >7 days? ‘nodetool remove’ it!
Changing RF within DC?
Easy to decrease RF
Impossible to increase RF without (usually)
Reads with CL_ONE might fail!
Hash here
Replication Strategy
How many replicas should we have?
What happens if some data is lost?
Are you write-heavy or read-heavy?
Quorum considerations: odd is better!
RF=1? RF=3? RF=5?
Magic JMX setting: reduce traffic to a node
Great when node is “behind” the 4 hour window
Used by gossiper to divert traffic during repairs
Writes: ok, read repair: ok, nodetool repair: ok
$ java -jar jmxterm.jar -l localhost:7199
$> set -b org.apache.cassandra.db:type=DynamicEndpointSnitch Severity
10000
Don’t be too severe!
Compaction Strategy
Solved by using a good C* design
SizeTiered or Leveled?
Leveled has better guarantees for read times
SizeTiered may require 10 (or more) reads!
Leveled uses less disk space
Leveled tombstone collection is slower
Auth*
Cookbooks default to OFF
Turn authenticator and authorizer on
‘cassandra’ user is super special
Requires QUORUM (cross-DC) for signon
LOCAL_ONE for all other users!
Users
OS users vs Cassandra users: 1 to 1?
Shared credentials for apps?
Nothing logs the user taking the action!
‘cassandra’ user is created by cookbook
All processes run as ‘cassandra’
Limits
Chef helps here! Startup:
ulimit -l unlimited # mem lock
ulimit -n 48000 # fds
/etc/security/limits.d
cassandra - nofile 48000
cassandra - nproc unlimited
cassandra - memlock unlimited
Filesystem Type
Officially supported: ext4 or XFS
XFS is slightly faster
Interesting options:
● ext4 without journal
● ext2
● zfs
Logs
To consolidate or not to consolidate?
Push or pull? Usually push!
FOSS: syslogd, syslog-ng, logstash/kibana,
heka, banana
Others: Splunk, SumoLogic, Loggly, Stackify
Shutdown
Nice init script with cookbook, steps are:
● nodetool disablethrift (no more clients)
● nodetool disablegossip (stop talking to
cluster)
● nodetool drain (flush all memtables)
● kill the jvm
Quick performance wins
● Disable assertions - cookbook property
● No swap space (or vm.swappiness=1)
● max_concurrent_reads
● max_concurrent_writes
Thank
You!@rkuris
ron.kuris@gmail.com

Weitere ähnliche Inhalte

Was ist angesagt?

TechTalk v2.0 - Performance tuning Cassandra + AWS
TechTalk v2.0 - Performance tuning Cassandra + AWSTechTalk v2.0 - Performance tuning Cassandra + AWS
TechTalk v2.0 - Performance tuning Cassandra + AWSPythian
 
Ceph Day Beijing: Experience Sharing and OpenStack and Ceph Integration
Ceph Day Beijing: Experience Sharing and OpenStack and Ceph Integration Ceph Day Beijing: Experience Sharing and OpenStack and Ceph Integration
Ceph Day Beijing: Experience Sharing and OpenStack and Ceph Integration Ceph Community
 
Scaling Apache Pulsar to 10 Petabytes/Day
Scaling Apache Pulsar to 10 Petabytes/DayScaling Apache Pulsar to 10 Petabytes/Day
Scaling Apache Pulsar to 10 Petabytes/DayScyllaDB
 
Glauber Costa on OSv as NoSQL platform
Glauber Costa on OSv as NoSQL platformGlauber Costa on OSv as NoSQL platform
Glauber Costa on OSv as NoSQL platformDon Marti
 
Speeding up R with Parallel Programming in the Cloud
Speeding up R with Parallel Programming in the CloudSpeeding up R with Parallel Programming in the Cloud
Speeding up R with Parallel Programming in the CloudRevolution Analytics
 
Deep Dive: Maximizing EC2 and EBS Performance
Deep Dive: Maximizing EC2 and EBS PerformanceDeep Dive: Maximizing EC2 and EBS Performance
Deep Dive: Maximizing EC2 and EBS PerformanceAmazon Web Services
 
(DAT405) Amazon Aurora Deep Dive
(DAT405) Amazon Aurora Deep Dive(DAT405) Amazon Aurora Deep Dive
(DAT405) Amazon Aurora Deep DiveAmazon Web Services
 
OOPs, OOMs, oh my! Containerizing JVM apps
OOPs, OOMs, oh my! Containerizing JVM appsOOPs, OOMs, oh my! Containerizing JVM apps
OOPs, OOMs, oh my! Containerizing JVM appsSematext Group, Inc.
 
Ceph Day London 2014 - Best Practices for Ceph-powered Implementations of Sto...
Ceph Day London 2014 - Best Practices for Ceph-powered Implementations of Sto...Ceph Day London 2014 - Best Practices for Ceph-powered Implementations of Sto...
Ceph Day London 2014 - Best Practices for Ceph-powered Implementations of Sto...Ceph Community
 
Tarantool: как сэкономить миллион долларов на базе данных на высоконагруженно...
Tarantool: как сэкономить миллион долларов на базе данных на высоконагруженно...Tarantool: как сэкономить миллион долларов на базе данных на высоконагруженно...
Tarantool: как сэкономить миллион долларов на базе данных на высоконагруженно...Ontico
 
Ceph Day KL - Ceph Tiering with High Performance Archiecture
Ceph Day KL - Ceph Tiering with High Performance ArchiectureCeph Day KL - Ceph Tiering with High Performance Archiecture
Ceph Day KL - Ceph Tiering with High Performance ArchiectureCeph Community
 
CASSANDRA MEETUP - Choosing the right cloud instances for success
CASSANDRA MEETUP - Choosing the right cloud instances for successCASSANDRA MEETUP - Choosing the right cloud instances for success
CASSANDRA MEETUP - Choosing the right cloud instances for successErick Ramirez
 
Troubleshooting redis
Troubleshooting redisTroubleshooting redis
Troubleshooting redisDaeMyung Kang
 
OSv at Cassandra Summit
OSv at Cassandra SummitOSv at Cassandra Summit
OSv at Cassandra SummitDon Marti
 
Data Scotland 2019: You can run SQL Server on AWS
Data Scotland 2019: You can run SQL Server on AWSData Scotland 2019: You can run SQL Server on AWS
Data Scotland 2019: You can run SQL Server on AWSJohn McCormack
 
[AWSKRUG&JAWS-UG Meetup #1] 70% Cost Reduction with On-demand resizing
[AWSKRUG&JAWS-UG Meetup #1] 70% Cost Reduction with On-demand resizing[AWSKRUG&JAWS-UG Meetup #1] 70% Cost Reduction with On-demand resizing
[AWSKRUG&JAWS-UG Meetup #1] 70% Cost Reduction with On-demand resizingAWSKRUG - AWS한국사용자모임
 
AWS - an introduction to bursting (GP2 - T2)
AWS - an introduction to bursting (GP2 - T2)AWS - an introduction to bursting (GP2 - T2)
AWS - an introduction to bursting (GP2 - T2)Rasmus Ekman
 
Solr on Docker - the Good, the Bad and the Ugly
Solr on Docker - the Good, the Bad and the UglySolr on Docker - the Good, the Bad and the Ugly
Solr on Docker - the Good, the Bad and the UglySematext Group, Inc.
 

Was ist angesagt? (20)

TechTalk v2.0 - Performance tuning Cassandra + AWS
TechTalk v2.0 - Performance tuning Cassandra + AWSTechTalk v2.0 - Performance tuning Cassandra + AWS
TechTalk v2.0 - Performance tuning Cassandra + AWS
 
Ceph Day Beijing: Experience Sharing and OpenStack and Ceph Integration
Ceph Day Beijing: Experience Sharing and OpenStack and Ceph Integration Ceph Day Beijing: Experience Sharing and OpenStack and Ceph Integration
Ceph Day Beijing: Experience Sharing and OpenStack and Ceph Integration
 
Scaling Apache Pulsar to 10 Petabytes/Day
Scaling Apache Pulsar to 10 Petabytes/DayScaling Apache Pulsar to 10 Petabytes/Day
Scaling Apache Pulsar to 10 Petabytes/Day
 
Glauber Costa on OSv as NoSQL platform
Glauber Costa on OSv as NoSQL platformGlauber Costa on OSv as NoSQL platform
Glauber Costa on OSv as NoSQL platform
 
Speeding up R with Parallel Programming in the Cloud
Speeding up R with Parallel Programming in the CloudSpeeding up R with Parallel Programming in the Cloud
Speeding up R with Parallel Programming in the Cloud
 
MySQL Head-to-Head
MySQL Head-to-HeadMySQL Head-to-Head
MySQL Head-to-Head
 
Deep Dive: Maximizing EC2 and EBS Performance
Deep Dive: Maximizing EC2 and EBS PerformanceDeep Dive: Maximizing EC2 and EBS Performance
Deep Dive: Maximizing EC2 and EBS Performance
 
(DAT405) Amazon Aurora Deep Dive
(DAT405) Amazon Aurora Deep Dive(DAT405) Amazon Aurora Deep Dive
(DAT405) Amazon Aurora Deep Dive
 
OOPs, OOMs, oh my! Containerizing JVM apps
OOPs, OOMs, oh my! Containerizing JVM appsOOPs, OOMs, oh my! Containerizing JVM apps
OOPs, OOMs, oh my! Containerizing JVM apps
 
Ceph Day London 2014 - Best Practices for Ceph-powered Implementations of Sto...
Ceph Day London 2014 - Best Practices for Ceph-powered Implementations of Sto...Ceph Day London 2014 - Best Practices for Ceph-powered Implementations of Sto...
Ceph Day London 2014 - Best Practices for Ceph-powered Implementations of Sto...
 
Tarantool: как сэкономить миллион долларов на базе данных на высоконагруженно...
Tarantool: как сэкономить миллион долларов на базе данных на высоконагруженно...Tarantool: как сэкономить миллион долларов на базе данных на высоконагруженно...
Tarantool: как сэкономить миллион долларов на базе данных на высоконагруженно...
 
Ceph Day KL - Ceph Tiering with High Performance Archiecture
Ceph Day KL - Ceph Tiering with High Performance ArchiectureCeph Day KL - Ceph Tiering with High Performance Archiecture
Ceph Day KL - Ceph Tiering with High Performance Archiecture
 
CASSANDRA MEETUP - Choosing the right cloud instances for success
CASSANDRA MEETUP - Choosing the right cloud instances for successCASSANDRA MEETUP - Choosing the right cloud instances for success
CASSANDRA MEETUP - Choosing the right cloud instances for success
 
Troubleshooting redis
Troubleshooting redisTroubleshooting redis
Troubleshooting redis
 
OSv at Cassandra Summit
OSv at Cassandra SummitOSv at Cassandra Summit
OSv at Cassandra Summit
 
Data Scotland 2019: You can run SQL Server on AWS
Data Scotland 2019: You can run SQL Server on AWSData Scotland 2019: You can run SQL Server on AWS
Data Scotland 2019: You can run SQL Server on AWS
 
92 grand prix_2013
92 grand prix_201392 grand prix_2013
92 grand prix_2013
 
[AWSKRUG&JAWS-UG Meetup #1] 70% Cost Reduction with On-demand resizing
[AWSKRUG&JAWS-UG Meetup #1] 70% Cost Reduction with On-demand resizing[AWSKRUG&JAWS-UG Meetup #1] 70% Cost Reduction with On-demand resizing
[AWSKRUG&JAWS-UG Meetup #1] 70% Cost Reduction with On-demand resizing
 
AWS - an introduction to bursting (GP2 - T2)
AWS - an introduction to bursting (GP2 - T2)AWS - an introduction to bursting (GP2 - T2)
AWS - an introduction to bursting (GP2 - T2)
 
Solr on Docker - the Good, the Bad and the Ugly
Solr on Docker - the Good, the Bad and the UglySolr on Docker - the Good, the Bad and the Ugly
Solr on Docker - the Good, the Bad and the Ugly
 

Andere mochten auch

LJC: Fault tolerance with Apache Cassandra
LJC: Fault tolerance with Apache CassandraLJC: Fault tolerance with Apache Cassandra
LJC: Fault tolerance with Apache CassandraChristopher Batey
 
Counters At Scale - A Cautionary Tale
Counters At Scale - A Cautionary TaleCounters At Scale - A Cautionary Tale
Counters At Scale - A Cautionary TaleEric Lubow
 
Manage your compactions before they manage you!
Manage your compactions before they manage you!Manage your compactions before they manage you!
Manage your compactions before they manage you!Carlos Juzarte Rolo
 
Cassandra SF 2015 - Repeatable, Scalable, Reliable, Observable Cassandra
Cassandra SF 2015 - Repeatable, Scalable, Reliable, Observable CassandraCassandra SF 2015 - Repeatable, Scalable, Reliable, Observable Cassandra
Cassandra SF 2015 - Repeatable, Scalable, Reliable, Observable Cassandraaaronmorton
 
A deep look at the cql where clause
A deep look at the cql where clauseA deep look at the cql where clause
A deep look at the cql where clauseBenjamin Lerer
 
Hardening cassandra for compliance or paranoia
Hardening cassandra for compliance or paranoiaHardening cassandra for compliance or paranoia
Hardening cassandra for compliance or paranoiazznate
 
Devoxx France: Fault tolerant microservices on the JVM with Cassandra
Devoxx France: Fault tolerant microservices on the JVM with CassandraDevoxx France: Fault tolerant microservices on the JVM with Cassandra
Devoxx France: Fault tolerant microservices on the JVM with CassandraChristopher Batey
 
Case Study: Troubleshooting Cassandra performance issues as a developer
Case Study: Troubleshooting Cassandra performance issues as a developerCase Study: Troubleshooting Cassandra performance issues as a developer
Case Study: Troubleshooting Cassandra performance issues as a developerCarlos Alonso Pérez
 
Cassandra Summit 2015: Real World DTCS For Operators
Cassandra Summit 2015: Real World DTCS For OperatorsCassandra Summit 2015: Real World DTCS For Operators
Cassandra Summit 2015: Real World DTCS For OperatorsJeff Jirsa
 
Indexing in Cassandra
Indexing in CassandraIndexing in Cassandra
Indexing in CassandraEd Anuff
 
Understanding Data Partitioning and Replication in Apache Cassandra
Understanding Data Partitioning and Replication in Apache CassandraUnderstanding Data Partitioning and Replication in Apache Cassandra
Understanding Data Partitioning and Replication in Apache CassandraDataStax
 

Andere mochten auch (12)

LJC: Fault tolerance with Apache Cassandra
LJC: Fault tolerance with Apache CassandraLJC: Fault tolerance with Apache Cassandra
LJC: Fault tolerance with Apache Cassandra
 
Counters At Scale - A Cautionary Tale
Counters At Scale - A Cautionary TaleCounters At Scale - A Cautionary Tale
Counters At Scale - A Cautionary Tale
 
Manage your compactions before they manage you!
Manage your compactions before they manage you!Manage your compactions before they manage you!
Manage your compactions before they manage you!
 
Cassandra SF 2015 - Repeatable, Scalable, Reliable, Observable Cassandra
Cassandra SF 2015 - Repeatable, Scalable, Reliable, Observable CassandraCassandra SF 2015 - Repeatable, Scalable, Reliable, Observable Cassandra
Cassandra SF 2015 - Repeatable, Scalable, Reliable, Observable Cassandra
 
A deep look at the cql where clause
A deep look at the cql where clauseA deep look at the cql where clause
A deep look at the cql where clause
 
Hardening cassandra for compliance or paranoia
Hardening cassandra for compliance or paranoiaHardening cassandra for compliance or paranoia
Hardening cassandra for compliance or paranoia
 
Devoxx France: Fault tolerant microservices on the JVM with Cassandra
Devoxx France: Fault tolerant microservices on the JVM with CassandraDevoxx France: Fault tolerant microservices on the JVM with Cassandra
Devoxx France: Fault tolerant microservices on the JVM with Cassandra
 
Case Study: Troubleshooting Cassandra performance issues as a developer
Case Study: Troubleshooting Cassandra performance issues as a developerCase Study: Troubleshooting Cassandra performance issues as a developer
Case Study: Troubleshooting Cassandra performance issues as a developer
 
Tombstones and Compaction
Tombstones and CompactionTombstones and Compaction
Tombstones and Compaction
 
Cassandra Summit 2015: Real World DTCS For Operators
Cassandra Summit 2015: Real World DTCS For OperatorsCassandra Summit 2015: Real World DTCS For Operators
Cassandra Summit 2015: Real World DTCS For Operators
 
Indexing in Cassandra
Indexing in CassandraIndexing in Cassandra
Indexing in Cassandra
 
Understanding Data Partitioning and Replication in Apache Cassandra
Understanding Data Partitioning and Replication in Apache CassandraUnderstanding Data Partitioning and Replication in Apache Cassandra
Understanding Data Partitioning and Replication in Apache Cassandra
 

Ähnlich wie Cassandra from tarball to production

RDS for MySQL, No BS Operations and Patterns
RDS for MySQL, No BS Operations and PatternsRDS for MySQL, No BS Operations and Patterns
RDS for MySQL, No BS Operations and PatternsLaine Campbell
 
Cassandra in Operation
Cassandra in OperationCassandra in Operation
Cassandra in Operationniallmilton
 
Cassandra Anti-Patterns
Cassandra Anti-PatternsCassandra Anti-Patterns
Cassandra Anti-PatternsMatthew Dennis
 
Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...
Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...
Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...Виталий Стародубцев
 
Percona Live 2014 - Scaling MySQL in AWS
Percona Live 2014 - Scaling MySQL in AWSPercona Live 2014 - Scaling MySQL in AWS
Percona Live 2014 - Scaling MySQL in AWSPythian
 
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...Amazon Web Services
 
DrupalCampLA 2011: Drupal backend-performance
DrupalCampLA 2011: Drupal backend-performanceDrupalCampLA 2011: Drupal backend-performance
DrupalCampLA 2011: Drupal backend-performanceAshok Modi
 
SRV407 Deep Dive on Amazon Aurora
SRV407 Deep Dive on Amazon AuroraSRV407 Deep Dive on Amazon Aurora
SRV407 Deep Dive on Amazon AuroraAmazon Web Services
 
Choosing the Right EC2 Instance and Applicable Use Cases - AWS June 2016 Webi...
Choosing the Right EC2 Instance and Applicable Use Cases - AWS June 2016 Webi...Choosing the Right EC2 Instance and Applicable Use Cases - AWS June 2016 Webi...
Choosing the Right EC2 Instance and Applicable Use Cases - AWS June 2016 Webi...Amazon Web Services
 
Deep Dive on Delivering Amazon EC2 Instance Performance
Deep Dive on Delivering Amazon EC2 Instance PerformanceDeep Dive on Delivering Amazon EC2 Instance Performance
Deep Dive on Delivering Amazon EC2 Instance PerformanceAmazon Web Services
 
Experience In Building Scalable Web Sites Through Infrastructure's View
Experience In Building Scalable Web Sites Through Infrastructure's ViewExperience In Building Scalable Web Sites Through Infrastructure's View
Experience In Building Scalable Web Sites Through Infrastructure's ViewPhuwadon D
 
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...Amazon Web Services
 
Deep Dive on Amazon EC2 Instances - AWS Summit Cape Town 2017
Deep Dive on Amazon EC2 Instances - AWS Summit Cape Town 2017Deep Dive on Amazon EC2 Instances - AWS Summit Cape Town 2017
Deep Dive on Amazon EC2 Instances - AWS Summit Cape Town 2017Amazon Web Services
 
Crimson: Ceph for the Age of NVMe and Persistent Memory
Crimson: Ceph for the Age of NVMe and Persistent MemoryCrimson: Ceph for the Age of NVMe and Persistent Memory
Crimson: Ceph for the Age of NVMe and Persistent MemoryScyllaDB
 
Day 2 General Session Presentations RedisConf
Day 2 General Session Presentations RedisConfDay 2 General Session Presentations RedisConf
Day 2 General Session Presentations RedisConfRedis Labs
 
Deep Dive on Delivering Amazon EC2 Instance Performance
Deep Dive on Delivering Amazon EC2 Instance PerformanceDeep Dive on Delivering Amazon EC2 Instance Performance
Deep Dive on Delivering Amazon EC2 Instance PerformanceAmazon Web Services
 

Ähnlich wie Cassandra from tarball to production (20)

RDS for MySQL, No BS Operations and Patterns
RDS for MySQL, No BS Operations and PatternsRDS for MySQL, No BS Operations and Patterns
RDS for MySQL, No BS Operations and Patterns
 
Cassandra in Operation
Cassandra in OperationCassandra in Operation
Cassandra in Operation
 
Cassandra Anti-Patterns
Cassandra Anti-PatternsCassandra Anti-Patterns
Cassandra Anti-Patterns
 
Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...
Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...
Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...
 
Percona Live 2014 - Scaling MySQL in AWS
Percona Live 2014 - Scaling MySQL in AWSPercona Live 2014 - Scaling MySQL in AWS
Percona Live 2014 - Scaling MySQL in AWS
 
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
 
Shootout at the AWS Corral
Shootout at the AWS CorralShootout at the AWS Corral
Shootout at the AWS Corral
 
DrupalCampLA 2011: Drupal backend-performance
DrupalCampLA 2011: Drupal backend-performanceDrupalCampLA 2011: Drupal backend-performance
DrupalCampLA 2011: Drupal backend-performance
 
SRV407 Deep Dive on Amazon Aurora
SRV407 Deep Dive on Amazon AuroraSRV407 Deep Dive on Amazon Aurora
SRV407 Deep Dive on Amazon Aurora
 
Choosing the Right EC2 Instance and Applicable Use Cases - AWS June 2016 Webi...
Choosing the Right EC2 Instance and Applicable Use Cases - AWS June 2016 Webi...Choosing the Right EC2 Instance and Applicable Use Cases - AWS June 2016 Webi...
Choosing the Right EC2 Instance and Applicable Use Cases - AWS June 2016 Webi...
 
Deep Dive on Delivering Amazon EC2 Instance Performance
Deep Dive on Delivering Amazon EC2 Instance PerformanceDeep Dive on Delivering Amazon EC2 Instance Performance
Deep Dive on Delivering Amazon EC2 Instance Performance
 
Experience In Building Scalable Web Sites Through Infrastructure's View
Experience In Building Scalable Web Sites Through Infrastructure's ViewExperience In Building Scalable Web Sites Through Infrastructure's View
Experience In Building Scalable Web Sites Through Infrastructure's View
 
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
 
Deep Dive on Amazon EC2 Instances - AWS Summit Cape Town 2017
Deep Dive on Amazon EC2 Instances - AWS Summit Cape Town 2017Deep Dive on Amazon EC2 Instances - AWS Summit Cape Town 2017
Deep Dive on Amazon EC2 Instances - AWS Summit Cape Town 2017
 
What’s New in Amazon Aurora
What’s New in Amazon AuroraWhat’s New in Amazon Aurora
What’s New in Amazon Aurora
 
Shootout at the PAAS Corral
Shootout at the PAAS CorralShootout at the PAAS Corral
Shootout at the PAAS Corral
 
Crimson: Ceph for the Age of NVMe and Persistent Memory
Crimson: Ceph for the Age of NVMe and Persistent MemoryCrimson: Ceph for the Age of NVMe and Persistent Memory
Crimson: Ceph for the Age of NVMe and Persistent Memory
 
HPC in the Cloud
HPC in the CloudHPC in the Cloud
HPC in the Cloud
 
Day 2 General Session Presentations RedisConf
Day 2 General Session Presentations RedisConfDay 2 General Session Presentations RedisConf
Day 2 General Session Presentations RedisConf
 
Deep Dive on Delivering Amazon EC2 Instance Performance
Deep Dive on Delivering Amazon EC2 Instance PerformanceDeep Dive on Delivering Amazon EC2 Instance Performance
Deep Dive on Delivering Amazon EC2 Instance Performance
 

Kürzlich hochgeladen

Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 

Kürzlich hochgeladen (20)

Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 

Cassandra from tarball to production

  • 1. Cassandra: From tarball to production
  • 2. Why talk about this? You are about to deploy Cassandra You are looking for “best practices” You don’t want: ... to scour through the documentation ... to do something known not to work well ... to forget to cover some important step
  • 3. What we won’t cover ● Cassandra: how does it work? ● How do I design my schema? ● What’s new in Cassandra X.Y?
  • 4. So many things to do Monitoring Snitch DC/Rack Settings Time Sync Seeds/Autoscaling Full/Incremental Backups AWS Instance Selection Disk - SSD? Disk Space - 2x? AWS AMI (Image) Selection Periodic Repairs Replication Strategy Compaction Strategy SSL/VPC/VPN Authorization + Authentication OS Conf - Users OS Conf - Limits OS Conf - Perms OS Conf - FSType OS Conf - Logs C* Start/Stop OS Conf - Path Use case evaluation
  • 5. Chef to the rescue? Chef community cookbook available https://github.com/michaelklishin/cassandra-chef-cookbook Installs java Creates a “cassandra” user/group Download/extract the tarball Fixes up ownership Builds the C* configuration files Sets the ulimits for filehandles, processes, memory locking Sets up an init script Sets up data directories
  • 6. Chef Cookbook Coverage Monitoring Snitch DC/Rack Settings Time Sync Seeds/Autoscaling Full/Incremental Backups Disk - SSD? Disk - How much? AWS Instance Type AWS AMI (Image) Selection Periodic Repairs Replication Strategy Compaction Strategy SSL/VPC/VPN Authorization + Authentication OS Conf - Users OS Conf - Limits OS Conf - Perms OS Conf - FSType OS Conf - Logs C* Start/Stop OS Conf - Path Use case evaluation
  • 7. Monitoring Is every node answering queries? Are nodes talking to each other? Are any nodes running slowly? Push UDP! (statsd) http://hackers.lookout.com/2015/01/cassandra-monitoring/ https://github.com/lookout/cassandra-statsd-agent
  • 8. Monitoring - Synthetic Health checks, bad and good ● ‘nodetool status’ exit code ○ Might return 0 if the node is not accepting requests ○ Slow, cross node reads ● cqlsh -u sysmon -p password < /dev/null ● Verifies this node can read auth table ● https://github.com/lookout/cassandra-health-check
  • 9. What about OpsCenter? We chose not to use it Want consistent interface for all monitoring GUI vs Command Line argument Didn’t see good auditing capabilities Didn’t interface well with our chef solution
  • 10. Snitch Use the right snitch! ● AWS EC2MultiRegionSnitch ● Google? GoogleCloudSnitch ● GossipingPropertyFileSnitch NOT ● SimpleSnitch (default) Community cookbook: set it!
  • 11. What is RF? Replication Factor is how many copies of data Value is hashed to determine primary host Additional copies always next node Hash here
  • 12. What is CL? Consistency Level -- It’s not RF! Describes how many nodes must respond before operation is considered COMPLETE CL_ONE - only one node responds CL_QUORUM - (RF/2)+1 nodes (round down) CL_ALL - RF nodes respond
  • 13. DC/Rack Settings You might need to set these Maybe you’re not in Amazon Rack == Availability Zone? Hard: Renaming DC or adding racks
  • 14. Renaming DCs Clients “remember” which DC they talk to Renaming single DC causes all clients to fail Better to spin up a new one than rename old
  • 15. Adding a rack Start with 6 node cluster, rack R1 Replication factor 3 Add 1 node in R2, and rebalance ALL data in R2 node? Good idea to keep racks balanced
  • 16. I don’t have time for this Clusters must have synchronized time You will get lots of drift with: [0-3].amazon.pool. ntp.org Community cookbook doesn’t cover anything here
  • 17. Better make time for this C* serializes write operations by time stamps Clocks on virtual machines drift! It’s the relative difference among clocks that matters C* nodes should synchronize with each other Solution: use a pair of peered NTP servers (level 2 or 3) and a small set of known upstream providers
  • 18. From a small seed… Seeds are used for new nodes to find cluster Every new node should use the same seeds Seed nodes get topology changes faster Each seed node must be in the config file Multiple seeds per datacenter recommended Tricky to configure on AWS
  • 19. Backups - Full+Incremental Nothing in the cookbooks for this C* makes it “easy”: snapshot, then copy Snapshots might require a lot more space Remove the snapshot after copying it
  • 20. Disk selection SSD Rotational Ephemeral EBS Low latency Any size instance Any size instance Recommended Not cheap Less expensive Great random r/w perf Good write performance No node rebuilds No network use for disk No network use for disk
  • 21. AWS Instance Selection We moved to EC2 c3.2xlarge (15GiB mem, Disk 160GB)? i2.xlarge (30GiB mem, 800GB disk) Max recommended storage per node is 1TB Use instance types that support HVM Some previous generation instance types, such as T1, C1, M1, and M2 do not support Linux HVM AMIs. Some current generation instance types, such as T2, I2, R3, G2, and C4 do not support PV AMIs.
  • 22. How much can I use?? Snapshots take space (kind of) Best practice: keep disks half full! 800GB disk becomes 400GB Snapshots during repairs? Lots of uses for snapshots!
  • 23. Periodic Repairs Buried in the docs: “As a best practice, you should schedule repairs weekly” http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_repair_nodes_c.html ● “-pr” (yes) ● “-par” (maybe) ● “--in-local-dc” (no)
  • 24. Repair Tips Raise gc_grace_seconds (tombstones?) Run on one node at a time Schedule for low usage hours Use “par” if you have dead time (faster) Tune with: nodetool setcompactionthroughput
  • 25. I thought I deleted that Compaction removes “old” tombstones 10 day default grace period (gc_grace_period) After that, deletes will not be propagated! Run ‘nodetool repair’ at least every 10 days Once a week is perfect (3 day grace) Node down >7 days? ‘nodetool remove’ it!
  • 26. Changing RF within DC? Easy to decrease RF Impossible to increase RF without (usually) Reads with CL_ONE might fail! Hash here
  • 27. Replication Strategy How many replicas should we have? What happens if some data is lost? Are you write-heavy or read-heavy? Quorum considerations: odd is better! RF=1? RF=3? RF=5?
  • 28. Magic JMX setting: reduce traffic to a node Great when node is “behind” the 4 hour window Used by gossiper to divert traffic during repairs Writes: ok, read repair: ok, nodetool repair: ok $ java -jar jmxterm.jar -l localhost:7199 $> set -b org.apache.cassandra.db:type=DynamicEndpointSnitch Severity 10000 Don’t be too severe!
  • 29. Compaction Strategy Solved by using a good C* design SizeTiered or Leveled? Leveled has better guarantees for read times SizeTiered may require 10 (or more) reads! Leveled uses less disk space Leveled tombstone collection is slower
  • 30. Auth* Cookbooks default to OFF Turn authenticator and authorizer on ‘cassandra’ user is super special Requires QUORUM (cross-DC) for signon LOCAL_ONE for all other users!
  • 31. Users OS users vs Cassandra users: 1 to 1? Shared credentials for apps? Nothing logs the user taking the action! ‘cassandra’ user is created by cookbook All processes run as ‘cassandra’
  • 32. Limits Chef helps here! Startup: ulimit -l unlimited # mem lock ulimit -n 48000 # fds /etc/security/limits.d cassandra - nofile 48000 cassandra - nproc unlimited cassandra - memlock unlimited
  • 33. Filesystem Type Officially supported: ext4 or XFS XFS is slightly faster Interesting options: ● ext4 without journal ● ext2 ● zfs
  • 34. Logs To consolidate or not to consolidate? Push or pull? Usually push! FOSS: syslogd, syslog-ng, logstash/kibana, heka, banana Others: Splunk, SumoLogic, Loggly, Stackify
  • 35. Shutdown Nice init script with cookbook, steps are: ● nodetool disablethrift (no more clients) ● nodetool disablegossip (stop talking to cluster) ● nodetool drain (flush all memtables) ● kill the jvm
  • 36. Quick performance wins ● Disable assertions - cookbook property ● No swap space (or vm.swappiness=1) ● max_concurrent_reads ● max_concurrent_writes