Java tuning for Knewton’s C* clusters
Lessons learned
Carlos Monroy
Knewton
Knewton
Leader in adaptive learning
- Partners with publishers and institutions in Europe, US, and Asia
- Provides unique recommendations to students based on
previous behavior.
- Advanced content ingestion, curation, and calibration
- Runs in AWS with many different storage backends
- Check us out: www.knewton.com/about/careers/
© DataStax, All Rights Reserved. 2
1 JVM tuning at Knewton
2 Updating memtable_allocation_type
3 Changing garbage collection strategy
Context
Like many startups, our company had to make tradeoffs in order to deliver the product rapidly:
- Technical debt.
- Silos and isolated efforts.
- Decisions based on gut and intuition.
One year ago:
- Different versions of Cassandra
- Multiple clients (e.g., Pycassa, Hector, Astyanax, DataStax)
- Huge challenge with backups and restores
Now:
- 99.98% database uptime
- The database is not a black box anymore
Successful initiatives
- In-house command line tools: https://github.com/Knewton/cassandra-toolbox/
  - cassandra-toolbox python package
  - distributed nodetool
- Separation of objects from heap memory (memtable_allocation_type)
- Customization of heap size allocation: https://tech.knewton.com/
- Update to Garbage First Garbage Collection (G1GC).
- Monitoring/alerting based on JMX metrics
Some less successful initiatives
- Monitoring and alerts based on Graphite graphs
- Too many resources to get an aggregate
- High incidence of false positives and false negatives
- GoCD
- Cloudwatch
2 Updating memtable_allocation_type
memtable_allocation_type
Cassandra can keep memtable and key cache objects in native (off-heap) memory instead of the JVM heap.
- Used for data structures that continue growing with time
- Options:
  - heap_buffers
    - default value before Cassandra 3.0
    - all the objects are kept in the JVM heap memory
  - offheap_buffers
    - cell name and values are moved to DirectBuffer objects
  - offheap_objects
    - moves the entire cell off heap, leaving only a pointer
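As an illustration of switching between the three options above, here is a minimal Python sketch that rewrites the memtable_allocation_type line in the text of a cassandra.yaml file. The helper name set_memtable_allocation_type and the sample text are hypothetical. It is not part of cassandra-toolbox, and it assumes the setting sits on a single uncommented line.

```python
import re

# The three values listed on the slide.
VALID_TYPES = ("heap_buffers", "offheap_buffers", "offheap_objects")

# Matches an active (uncommented) setting line in cassandra.yaml text.
_SETTING = re.compile(r"^memtable_allocation_type[ \t]*:.*$", re.MULTILINE)


def set_memtable_allocation_type(yaml_text: str, value: str) -> str:
    """Return yaml_text with memtable_allocation_type set to value.

    Hypothetical helper for illustration only.
    """
    if value not in VALID_TYPES:
        raise ValueError(f"unknown memtable_allocation_type: {value!r}")
    line = f"memtable_allocation_type: {value}"
    if _SETTING.search(yaml_text):
        # Rewrite the first active occurrence of the setting.
        return _SETTING.sub(line, yaml_text, count=1)
    # Setting absent: append it at the end of the text.
    return yaml_text.rstrip("\n") + "\n" + line + "\n"


if __name__ == "__main__":
    sample = "cluster_name: 'test'\nmemtable_allocation_type: heap_buffers\n"
    print(set_memtable_allocation_type(sample, "offheap_buffers"))
```

The node has to be restarted for a change to this setting to take effect, so a rollout would normally proceed one node at a time.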
Update memtable_allocation_type
The cassandra-stress tool is a great starting point for validating changes to the database configuration.
But we needed to go the extra mile with an end-to-end test:
- involving the rest of the dev team
- demonstrating the positive impact of the change on the rest of the system
Test memtable_allocation_type update
Update setting → Load test (locust) → Compile logs from C* and application → Analysis with R
Diagram labels: Response times; Functional load tests
Update memtable - Criteria
End-to-end
• Response time
  – Timeouts
• Errors
• Throughput
• CPU consumption
• Memory used
Cassandra specific
• Cassandra
  – Time spent for Garbage Collection
• Collection
  – Read and Write latencies
  – Errors/Exceptions
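The end-to-end criteria above can be computed from per-request records. The sketch below is in Python rather than the R used in the talk. The Request record layout and the summarize helper are assumptions made for illustration, not the format produced by locust or by Knewton's tooling.

```python
import math
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class Request:
    """One request observed during a load test (hypothetical layout)."""
    elapsed_ms: float        # response time in milliseconds
    ok: bool                 # False for an error response
    timed_out: bool = False  # True when the client gave up waiting


def percentile(sorted_values: List[float], pct: float) -> float:
    """Nearest-rank percentile of a sorted, non-empty list."""
    rank = max(1, math.ceil(len(sorted_values) * pct / 100))
    return sorted_values[rank - 1]


def summarize(requests: List[Request], duration_s: float) -> Dict[str, float]:
    """Compute response time, timeout, error and throughput figures."""
    times = sorted(r.elapsed_ms for r in requests)
    return {
        "mean_ms": mean(times),
        "p95_ms": percentile(times, 95),
        "timeouts": sum(r.timed_out for r in requests),
        "errors": sum(not r.ok for r in requests),
        "throughput_rps": len(requests) / duration_s,
    }


if __name__ == "__main__":
    run = [Request(10, True), Request(20, True),
           Request(30, False), Request(40, False, timed_out=True)]
    print(summarize(run, duration_s=4.0))
```

Running the same summary once per memtable_allocation_type value gives directly comparable numbers for each configuration.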
Update memtable_allocation_type
Time used for Garbage Collection
Panels: offheap_buffers, offheap_objects, heap_buffers
Comparing garbage collection times with different values for memtable_allocation_type
Update memtable_allocation_type
Memory sizes
Panels: offheap_buffers, offheap_objects, heap_buffers
Comparing the sizes of each generation space, before and after garbage collection.
Update memtable_allocation_type
GC phases
Panels: offheap_buffers, offheap_objects, heap_buffers
Comparing the behaviour of the garbage collection phases.
memtable_allocation_type results
We are using offheap_buffers because it showed:
- the lowest average response time for requests
- the lowest CPU usage
- the lowest number of threads created
- the lowest write latency
*Results may vary
3 Changing garbage collection strategy
Garbage First Garbage Collection (G1GC)
The G1 collector divides the heap into regions and uses multiple background threads to scan them.
It is named "Garbage First" (G1) because it gives preference to the regions that contain the most garbage objects.
This collector is turned on using the -XX:+UseG1GC flag.
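The "most garbage first" idea can be illustrated with a toy model in Python. This is not the real G1 algorithm, which relies on pause-time predictions and remembered sets. The Region class, the choose_collection_set function and the copying budget based on live bytes are all assumptions made for the sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Region:
    """A heap region in the toy model (hypothetical structure)."""
    region_id: int
    live_bytes: int     # data that must be copied if the region is collected
    garbage_bytes: int  # space reclaimed if the region is collected


def choose_collection_set(regions: List[Region],
                          max_live_bytes_to_copy: int) -> List[Region]:
    """Pick the regions holding the most garbage first, within a budget.

    The budget stands in for a pause-time goal: copying live data is
    assumed to be the dominant cost of collecting a region.
    """
    chosen = []
    budget = max_live_bytes_to_copy
    by_garbage = sorted(regions, key=lambda r: r.garbage_bytes, reverse=True)
    for region in by_garbage:
        if region.garbage_bytes == 0:
            break  # nothing left worth collecting
        if region.live_bytes <= budget:
            chosen.append(region)
            budget -= region.live_bytes
    return chosen


if __name__ == "__main__":
    heap = [Region(0, 100, 900), Region(1, 800, 200),
            Region(2, 300, 700), Region(3, 1000, 0)]
    picked = choose_collection_set(heap, max_live_bytes_to_copy=500)
    print([r.region_id for r in picked])
```

Regions that are mostly garbage are cheap to evacuate and free the most space, which is why they are preferred in this model.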
G1GC analysis
G1 has been fully supported since April 2012 (JDK 7 update 4 and later).
The tools available for analyzing garbage collection logs either lacked G1 support or could not interpret all the information from our servers.
- Netflix gcviz does not support the Garbage First (G1) strategy
- Oracle's developer blog (Jeff Taylor) proposes an initial approach for JDK 7
Test garbage collection
Enable GC data collection → Get a baseline → Compile GC logs → Analysis with R
G1GC Java arguments
Java Arguments as defined in cassandra-env.sh
-XX:+UseG1GC
-XX:G1RSetUpdatingPauseTimePercent=5
-XX:ParallelGCThreads=2
-XX:ConcGCThreads=2
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:+PrintHeapAtGC
-XX:+PrintTenuringDistribution
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintPromotionFailure
-XX:PrintFLSStatistics=1
-Xloggc:/<valid path>/gc.log
-XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=10
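With flags like these, the GC log contains pause lines and application-stopped lines that can be extracted for analysis. The Python sketch below is a stand-in for the R analysis used in the talk. The regular expressions assume JDK 7/8 style log lines, the sample lines are illustrative rather than taken from Knewton's servers, and the exact format varies between JDK versions.

```python
import re
from typing import Dict, Iterable, List

# Assumed JDK 7/8 style formats; JDK 9+ unified logging looks different.
_PAUSE = re.compile(r"\[GC pause .*?, (\d+\.\d+) secs\]")
_STOPPED = re.compile(
    r"Total time for which application threads were stopped: "
    r"(\d+\.\d+) seconds"
)


def parse_gc_log(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Collect G1 pause times and stopped times, in seconds."""
    result: Dict[str, List[float]] = {"pause_secs": [], "stopped_secs": []}
    for line in lines:
        pause = _PAUSE.search(line)
        if pause:
            result["pause_secs"].append(float(pause.group(1)))
            continue
        stopped = _STOPPED.search(line)
        if stopped:
            result["stopped_secs"].append(float(stopped.group(1)))
    return result


if __name__ == "__main__":
    # Illustrative lines only.
    sample = [
        "2016-09-07T10:15:30.123+0000: 12.345: "
        "[GC pause (G1 Evacuation Pause) (young), 0.0123456 secs]",
        "Total time for which application threads were stopped: "
        "0.0131000 seconds",
    ]
    print(parse_gc_log(sample))
```

The two lists can then be plotted over time or summarized per node to get a baseline before changing any setting.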
G1GC analysis - Heap size
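Heap size before and after each collection can be read from the summary line that -XX:+PrintGCDetails writes for G1. The Python sketch below assumes the JDK 7/8 style "Heap: used(capacity)->used(capacity)" fragment. The HeapChange tuple, the parse_heap_line helper and the sample line are hypothetical.

```python
import re
from typing import NamedTuple, Optional

_UNITS = {"B": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
_SIZE = r"(\d+(?:\.\d+)?)([BKMG])"

# Assumed fragment: Heap: 24.0M(256.0M)->8.0M(256.0M)
_HEAP = re.compile(
    r"Heap: " + _SIZE + r"\(" + _SIZE + r"\)->" + _SIZE + r"\(" + _SIZE + r"\)"
)


class HeapChange(NamedTuple):
    """Heap occupancy and capacity in bytes, before and after a collection."""
    used_before: int
    capacity_before: int
    used_after: int
    capacity_after: int


def _to_bytes(value: str, unit: str) -> int:
    return int(round(float(value) * _UNITS[unit]))


def parse_heap_line(line: str) -> Optional[HeapChange]:
    """Return the heap change described by a log line, or None."""
    match = _HEAP.search(line)
    if match is None:
        return None
    g = match.groups()
    return HeapChange(
        _to_bytes(g[0], g[1]), _to_bytes(g[2], g[3]),
        _to_bytes(g[4], g[5]), _to_bytes(g[6], g[7]),
    )


if __name__ == "__main__":
    # Illustrative line only.
    line = ("   [Eden: 24.0M(24.0M)->0.0B(13.0M) Survivors: 0.0B->3072.0K "
            "Heap: 24.0M(256.0M)->8.0M(256.0M)]")
    print(parse_heap_line(line))
```

Plotting used_before and used_after against time shows how much each collection reclaims and whether the heap is sized appropriately.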
G1GC analysis - phases
G1GC Analysis demo
Code
Garbage collection analysis: https://gist.github.com/roymontecutli/4cf5c97f03720e60825f414667c141da
Cassandra toolbox: https://github.com/Knewton/cassandra-toolbox
Conclusions
- Moving objects off the JVM heap can improve the performance of the application when dealing with large data sets. Yet you need to find out which strategy (moving buffers or objects off heap) best suits your use case.
- Garbage collection can adversely affect the performance of a Cassandra cluster. Having tools to analyze its behaviour helps identify areas of impact and measure improvements.
- Configuration changes should always consider the system as a whole and involve all the teams.
Thanks
carlos@knewton.com

The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students
 

Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Knewton) | C* Summit 2016

  • 1. Java tuning for Knewton’s C* clusters: Lessons learned. Carlos Monroy, Knewton
  • 2. Knewton: Leader in adaptive learning
    - Partners with publishers and institutions in Europe, the US, and Asia
    - Provides unique recommendations to students based on previous behavior
    - Advanced content ingestion, curation, and calibration
    - Runs in AWS with many different storage backends
    - Check us out: www.knewton.com/about/careers/
    © DataStax, All Rights Reserved.
  • 3. 1) JVM tuning at Knewton 2) Updating memtable_allocation_type 3) Changing garbage collection strategy
  • 4. Context
    Like many startups, our company needed to make tradeoffs in order to deliver the product rapidly:
    - Technical debt
    - Silos and isolated efforts
    - Decisions based on gut and intuition
    One year ago:
    - Different versions of Cassandra
    - Multiple clients (e.g., Pycassa, Hector, Astyanax, DataStax)
    - Huge challenge with backups and restores
    Now:
    - 99.98% database uptime
    - The database is not a black box anymore
  • 5. Successful initiatives
    - In-house command line tools
    - cassandra-toolbox Python package
    - Distributed nodetool
    - Separation of objects from heap memory (memtable_allocation_type)
    - Customization of heap size allocation
    - Update to Garbage First Garbage Collection (G1GC)
    - Monitoring/alerting based on JMX metrics
  • 6. Successful initiatives
    - In-house command line tools: https://github.com/Knewton/cassandra-toolbox/
    - cassandra-toolbox Python package
    - Distributed nodetool
    - Separation of objects from heap memory (memtable_allocation_type)
    - Customization of heap size allocation
    - Update to Garbage First Garbage Collection (G1GC)
    - Monitoring/alerting based on JMX metrics
  • 8. Successful initiatives
    - In-house command line tools
    - cassandra-toolbox Python package
    - Distributed nodetool
    - Separation of objects from heap memory (memtable_allocation_type)
    - Customization of heap size allocation: https://tech.knewton.com/
    - Update to Garbage First Garbage Collection (G1GC)
    - Monitoring/alerting based on JMX metrics
  • 9. Some less successful initiatives
    - Monitoring and alerts based on Graphite graphs
      - Too many resources needed to get an aggregate
      - High incidence of false positives and false negatives
    - GoCD
    - CloudWatch
  • 10. 1) JVM tuning at Knewton 2) Updating memtable_allocation_type 3) Changing garbage collection strategy
  • 11. memtable_allocation_type
    Cassandra can keep memtable and key cache objects in native memory instead of on the JVM heap.
    - Used for data structures that continue growing with time
    - Options:
      - heap_buffers: the default value before Cassandra 3.0; all objects are kept in JVM heap memory
      - offheap_buffers: cell names and values are moved to DirectBuffer objects
      - offheap_objects: moves the entire cell off heap, leaving only a pointer
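Before a test run it can help to confirm which value a node is actually configured with. The following is a minimal sketch, assuming the setting appears as a plain top-level `memtable_allocation_type:` line in the text of `cassandra.yaml`; the function name and the sample text are hypothetical and only the three option names come from the slide.

```python
import re

# The three option names listed on the slide.
VALID_TYPES = {"heap_buffers", "offheap_buffers", "offheap_objects"}


def read_memtable_allocation_type(yaml_text):
    """Return the configured memtable_allocation_type, or None if it is not set.

    Hypothetical helper: scans the text line by line instead of using a YAML
    parser, so commented-out settings (lines starting with '#') are ignored.
    """
    for line in yaml_text.splitlines():
        match = re.match(r"^memtable_allocation_type:\s*(\S+)", line)
        if match:
            value = match.group(1)
            if value not in VALID_TYPES:
                raise ValueError(f"unknown memtable_allocation_type: {value}")
            return value
    return None


# Illustrative configuration text, not taken from a real cluster.
sample = """\
# memtable_allocation_type: heap_buffers
memtable_allocation_type: offheap_buffers
"""

print(read_memtable_allocation_type(sample))
```

Running a check like this on every node before and after the change is one way to make sure the load test compares the configurations it is meant to compare.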
  • 12. Update memtable_allocation_type
    The cassandra-stress tool is a great starting point when validating changes to the database configuration. But we needed to go the extra mile with an end-to-end test:
    - involving the rest of the dev team
    - demonstrating the positive impact of the change on the rest of the system
  • 14. Test memtable_allocation_type update
    - Update setting
    - Load test (locust)
    - Compile logs from C* and application
    - Analysis with R
    - Response times
    - Functional load tests
  • 15. Update memtable - Criteria
    End-to-end:
    - Response time (timeouts)
    - Errors
    - Throughput
    - CPU consumption
    - Memory used
    Cassandra specific:
    - Cassandra: time spent on garbage collection
    - Collection: read and write latencies, errors/exceptions
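The end-to-end criteria above can be reduced to a few numbers per test run. The talk did this analysis in R; the sketch below uses Python instead, and the function name, the timeout threshold, and the sample values are all made up for illustration rather than taken from the talk's measurements.

```python
import math
import statistics


def summarize(response_times_ms, timeout_ms):
    """Summarize one load-test run: mean, 95th percentile, and timeout count.

    Uses the nearest-rank definition of the percentile, which always returns
    a value that was actually observed.
    """
    ordered = sorted(response_times_ms)
    rank = max(1, math.ceil(0.95 * len(ordered)))
    return {
        "mean_ms": statistics.mean(ordered),
        "p95_ms": ordered[rank - 1],
        "timeouts": sum(1 for t in ordered if t >= timeout_ms),
    }


# Synthetic response times in milliseconds, invented for this example.
run = [12, 15, 11, 250, 14, 13, 16, 12, 18, 2100]
summary = summarize(run, timeout_ms=2000)
print(summary)
```

Computing the same summary for each memtable_allocation_type value gives a like-for-like comparison, which is more useful than a single average because timeouts and tail latency are reported separately.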
  • 16. Update memtable_allocation_type - Time used for Garbage Collection
    [Figure panels: offheap_buffers, offheap_objects, heap_buffers] Comparing garbage collection times with different values for memtable_allocation_type.
  • 18. Update memtable_allocation_type - Memory sizes
    [Figure panels: offheap_buffers, offheap_objects, heap_buffers] Comparing the sizes of each generation space before and after garbage collection.
  • 19. Update memtable_allocation_type - GC phases
    [Figure panels: offheap_buffers, offheap_objects, heap_buffers] Comparing the behaviour of the garbage collection phases.
  • 20. memtable_allocation_type results
    We are using offheap_buffers, as it showed:
    - the lowest average response time for requests
    - the lowest CPU usage
    - the lowest number of threads created
    - the lowest write latency
    *Results may vary
  • 21. 1) JVM tuning at Knewton 2) Updating memtable_allocation_type 3) Changing garbage collection strategy
  • 22. Garbage First Garbage Collection (G1GC)
    The G1 collector divides the heap into regions and uses multiple background threads to scan through them. It is named "Garbage First" (G1) because it gives preference to scanning the regions that contain the most garbage. This collector is turned on with the -XX:+UseG1GC flag.
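The "garbage first" idea can be illustrated with a toy model. This is a deliberately simplified sketch and not HotSpot's actual algorithm, which also weighs pause-time goals and per-region collection cost; the function name, region identifiers, and garbage fractions are hypothetical.

```python
def pick_regions(garbage_by_region, max_regions):
    """Return the region ids to collect first, most garbage first.

    garbage_by_region maps a region id to the fraction of that region
    occupied by garbage (0.0 to 1.0). Toy model only.
    """
    ranked = sorted(garbage_by_region, key=garbage_by_region.get, reverse=True)
    return ranked[:max_regions]


# Invented region occupancy for illustration.
regions = {"r0": 0.10, "r1": 0.90, "r2": 0.50, "r3": 0.75}
print(pick_regions(regions, max_regions=2))
```

The point of the ordering is that collecting the emptiest-of-live-data regions first reclaims the most memory for the least copying work.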
  • 23. G1GC analysis
    G1 has been available since April 2012 (JDK 7 update 4 and later). The tools available for analyzing garbage collection logs either did not support G1 or could not interpret all the information from our servers.
    - Netflix gcviz does not support the Garbage First (G1) strategy
    - An Oracle developer blog post by Jeff Taylor proposes an initial approach for JDK 7
  • 24. Test garbage collection
    - Enable GC data collection
    - Get a baseline
    - Compile GC logs
    - Analysis with R
  • 25. G1GC Java arguments, as defined in cassandra-env.sh:
    -XX:+UseG1GC
    -XX:G1RSetUpdatingPauseTimePercent=5
    -XX:ParallelGCThreads=2
    -XX:ConcGCThreads=2
    -XX:+PrintGCDetails
    -XX:+PrintGCDateStamps
    -XX:+PrintHeapAtGC
    -XX:+PrintTenuringDistribution
    -XX:+PrintGCApplicationStoppedTime
    -XX:+PrintPromotionFailure
    -XX:PrintFLSStatistics=1
    -Xloggc:/<valid path>/gc.log
    -XX:+UseGCLogFileRotation
    -XX:NumberOfGCLogFiles=10
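With -XX:+PrintGCApplicationStoppedTime enabled, the GC log records how long application threads were paused, which is a convenient starting point for analysis. The sketch below is an assumption-laden illustration in Python rather than the R analysis used in the talk: the sample log lines are hand-written to resemble JDK 8 output and do not come from Knewton's servers, and the helper name is hypothetical.

```python
import re

# Matches the line emitted by -XX:+PrintGCApplicationStoppedTime.
STOPPED = re.compile(
    r"Total time for which application threads were stopped: ([\d.]+) seconds"
)


def stopped_times(log_lines):
    """Return the stop-the-world durations, in seconds, found in the log lines."""
    times = []
    for line in log_lines:
        match = STOPPED.search(line)
        if match:
            times.append(float(match.group(1)))
    return times


# Hand-written sample lines for illustration only.
sample_log = [
    "2016-09-07T10:15:02.123+0000: 12.345: Total time for which application "
    "threads were stopped: 0.0150000 seconds, Stopping threads took: 0.0000500 seconds",
    "2016-09-07T10:15:03.456+0000: 13.678: [GC pause (G1 Evacuation Pause) (young), 0.0300000 secs]",
    "2016-09-07T10:15:03.490+0000: 13.712: Total time for which application "
    "threads were stopped: 0.0350000 seconds, Stopping threads took: 0.0000700 seconds",
]

pauses = stopped_times(sample_log)
print(f"{len(pauses)} pauses, longest {max(pauses):.3f} s")
```

In practice the lines would be read from the file named by -Xloggc, and the resulting list can be compared against a baseline run to see whether a configuration change reduced pause time.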
  • 26. G1GC analysis - Heap size
  • 31. G1GC analysis - phases
  • 33. Code
    - Garbage collection analysis: https://gist.github.com/roymontecutli/4cf5c97f03720e60825f414667c141da
    - Cassandra toolbox: https://github.com/Knewton/cassandra-toolbox
  • 34. Conclusions
    - Moving objects off the JVM heap can improve application performance when dealing with large data sets. You still need to find out which strategy (moving buffers or moving objects off heap) best suits your use case.
    - Garbage collection can adversely affect the performance of a Cassandra cluster. Having tools to analyze its behaviour helps to identify areas of impact and measure improvements.
    - Configuration changes should always consider the system as a whole and involve all the teams.