SlideShare ist ein Scribd-Unternehmen logo
1 von 66
Downloaden Sie, um offline zu lesen
C* Upgrades at Scale
Ajay Upadhyay & Vinay Chella
Cloud Data Architects
Who are We??
● Ajay Upadhyay
○ Cloud Data Architect @ Netflix
● Vinay Chella
○ Cloud Data Architect @ Netflix
○ https://www.linkedin.com/in/vinaykumarchella
● Cloud Database Engineering [CDE]
● Providing Cassandra, Dynomite, Elastic
Search and other data stores as a service
● About Netflix
● Cassandra @ Netflix
● What is our C* scale ??
● What do we upgrade, Why?
● How does it impact applications and services
● What makes C* upgrades smoother??
● Online Upgrade process
● Infrastructure for upgrade
● Pappy framework
● Migration Hiccups/ Roadblocks
Agenda
4
• Started as DVD rental
• Internet delivery/ streaming of TV shows and movies directly on TVs,
computers, and mobile devices in the United States and internationally
About Netflix
Cassandra @ Netflix
• 95% of streaming data is stored in Cassandra
• Data ranges from customer details to Viewing history to streaming
bookmarks
Multi-region resiliency and Active-Active
281
What is our scale (C* Footprint)
What is our scale (C* Footprint) Continue...
9000+
Duration of the upgrade???
What do we upgrade, Why?
• Software
• Cassandra
• SSL Certificates
• Priam
• OS
• Other sidecars
• Hardware
• AWS instance types
What do we upgrade, Why?(Continue…)
Why new release? What’s wrong with
current release?
• Typically nothing wrong with
current release
• Pretty stable and clusters may
still run with current release
without any issues
• End of life cycle??
What do we upgrade, Why?(Continue…)
• Good to be on latest and stable
release
• Lots of great features
• Bug fixes and enhancements /
new features typically are being
added to the new release
What do we upgrade, Why?(Continue…)
Application and service level impact/change
● Before Migration
● During migration
● After migration
None*
*so long as application is using latest Netflix
OSS components (Astyanax/ karyon)
Application and service level impact/change
During the upgrade???
What makes C* upgrades smoother
and transparent
Choosing the right
version of the binary
Online upgrade process
How do we certify a binary??
Functionality….
Is any existing functionality broken?
Simulate PROD
• Pappy framework
• Mirror production traffic
• Identical AWS instance types
How do we certify a binary for the upgrade
Typical usage patterns:
• Read Heavy
• Write Heavy
• STCS
• LCS
• Aggressive TTLs
• Variable Payloads
How do we certify a binary for the upgrade
Usage Pattern
C* 1.2
C* 1.2
=
Metrics Upgrade C* 2.0 Metrics
MetricsMetrics
=
How do we certify a binary for the upgrade
During Upgrade:
Summary
• Functional validation
• Performance testing
• Load test with routine
maintenance tasks
• Repair and Compaction
• Replacement and Restarts
• Key Metrics Used
• 95th and 99th Latencies
• Dropped Metrics on C*
• Exceptions on C*
• Heap Usage on C*
• Threads Pending on C*
How do we certify a binary for the upgrade
How do we upgrade??
Pre-Upgrade Upgrade Post-Upgrade
Online upgrade process (Pre-Upgrade)
Healthy Cluster
No Maintenance
No Monkeying around
Download bits
Online upgrade process
One node at a time - parallelly in all regions
One zone at a time - parallely in all regions
One zone at a time??
Online upgrade process (steps)
Online upgrade process (steps)
Upgrade:
● Running binary upgrade of Priam and C* on a node
○ Continue to make sure that the cluster is still healthy
○ Disable node in Eureka
○ Disable thrift, gossip and run nodetool drain
○ Stop Cassandra and Priam
○ Install new C* package (tarball)
○ Make sure the cluster is still healthy
■ Disable/Enable gossip if new node does not see all others vice-versa
● Make sure the cluster is still healthy before next step
Online upgrade process (steps)
Post-Upgrade:
1. Set/change parameters for KeySpaces and CFs (if any)
2. Run SSTABLE Upgrade on all nodes (STCS - parallel, LCS - Serial) (if applicable)
3. Enable Chaos Monkey
4. Make sure the cluster is still healthy before considering the upgrade done
Binary upgrades??
Why not replacements??
Dual Writes - Forklift (Pappy Framework)
• For short lived data(TTL) and big clusters
• Astyanax - Dual Writes
• C* Forklifter - Pappy Framework
Choosing the right
version of the binary
Online upgrade process
Click of a button (Just 1 button….)
Infrastructure for upgrade...
• Health Monitoring Jobs
• Regular Maintenance Jobs
• Repairs
• Compactions
• Upgrades
• Horizontal/ Vertical scaling
• Instance Types
• OS
• Sidecar
• C*
Infrastructure for upgrade...
What about performance??
(Yet to be Open Sourced)
Pappy Framework…...
What is Pappy??
• Pappy is a fictional character featured in Popeye cartoons and comics.
• CDE’s new performance Test harness framework
• Netflix homegrown: Well integrated with netflix OSS infrastructure
(Archaius, Servo, Eureka and Karyan)
• To be Open Sourced @ Netflix- OSS https://netflix.github.io/
High level features
• Perf tests for
• Various Instance types comparison
• Varying data models in terms of payload, shard and comparators
• Different driver(s) versions
• Forklifter
• Wide Row Finder
• Pluggable architecture
Architecture
Demo….
Pappy
Pappy
Pappy
Pappy
Pappy
Pappy-Extensions
Pappy-Extensions
Pappy-Extensions
Performance testing...
• More than 10 rounds of performance testing
• Yaml tuning
• Read heavy - Write heavy
• Write heavy
• Read heavy
Test Setup #1
Row Size: 1KB (size-100*cols-10)
Data Size: 30GB (Keys: 20,000,000)
Read: 4K (3.6 K)
Writes: 1 K (700)
Instance: i2.xl (4 CPU, 30 GB RAM)
Nodes: 6
Region: Us-East
Read CL CL_ONE
Write CL CL_ONE
Heap 12 GB / Newgen: 2GB
Test Setup #1 - Results
Test Setup #1 C* 1.2 C* 2.0
ROps 3.1 K - 4.0 K / sec 2.6 - 3.5 K/ sec
WOps 710 /sec 730 /sec
Read 95th 2.25k micro 2.89k micro
Read 99Th 3.6k micro 4.7k micro
Write 95th 720 micro 770 micro
Write 99Th 890 micro 1.01 k micro
Data / Node 30 - 36 GB 30 - 36 GB
Read Metrics
Write Metrics
Summary….
Metrics Test Setup #1(12G Heap/ 30G) Test Setup #2(8GB Heap/~30G) Test Setup #3 (8GB Heap/ 60GB data)
C* 1.2 C* 2.0 C* 1.2 C* 2.0 C* 1.2 C* 2.0
ROps 3.1K - 4.0K/ sec 2.6 - 3.5 K/ sec 3.5K - 4.0K/ sec 3.0 - 3.5 K/ sec 2.5 K - 3.5 K / sec 2.0 K - 3.0 K/ sec
WOps 710 /sec 730 /sec 720 /sec 710 /sec 700 /sec 620 /sec
Read 95th 2.25k micro 2.89k micro 1.7k micro 2.1k micro 2.4k micro 3.5k micro
Read 99Th 3.6k micro 4.7k micro 2.6k micro 3.7k micro
9.0k micro (spikes
20k)
9.5k micro (spikes
19K)
Write 95th 720 micro 770 micro 730 micro 910 micro 730 micro 820 micro
Write 99Th 890 micro 1.01 k micro 910 micro 1.2 k micro 940 micro 1.1 k micro
Data / Node 30 - 36 GB 30 - 36 GB 30 - 36 GB 30 - 36 GB 60 GB 60 GB
Test Setup #5 - Results (Duration: 2 Days)
Test Setup #5 C* 1.2 C* 2.0
ROps 3.8 K / sec 3.8 K/ sec
WOps 198 /sec 198 /sec
Read 95th 7.01k micro 5.89k micro
Read 99Th 27.3k micro (spikes upto 80k) 15.2k micro (spikes upto 60 k)
Read Avg 2.15 2.04
Write 95th 755 micro 895 micro
Write 99Th 961 micro 1.1 k micro
Write Avg 396 444
Data / Node 60 GB 60 GB
Test Case #5: Write 95th and 99th
Test Case #5: Read 95th and 99th
Few Migration Hiccups/ Roadblocks
• InterNodeEncryption = ‘ALL’ => why do we need this?
• https://issues.apache.org/jira/browse/CASSANDRA-8751
• Hints causing Heap disaster
• https://issues.apache.org/jira/browse/CASSANDRA-8485.
is Hiring
jobs.netflix.com

Weitere ähnliche Inhalte

Was ist angesagt?

OpenStack HA
OpenStack HAOpenStack HA
OpenStack HAtcp cloud
 
Rally--OpenStack Benchmarking at Scale
Rally--OpenStack Benchmarking at ScaleRally--OpenStack Benchmarking at Scale
Rally--OpenStack Benchmarking at ScaleMirantis
 
High availability and fault tolerance of openstack
High availability and fault tolerance of openstackHigh availability and fault tolerance of openstack
High availability and fault tolerance of openstackDeepak Mane
 
Distributed Systems explained (with NodeJS) - Bruno Bossola, JUG Torino
Distributed Systems explained (with NodeJS) - Bruno Bossola, JUG TorinoDistributed Systems explained (with NodeJS) - Bruno Bossola, JUG Torino
Distributed Systems explained (with NodeJS) - Bruno Bossola, JUG TorinoCodemotion Tel Aviv
 
Accelerated dataplanes integration and deployment
Accelerated dataplanes integration and deploymentAccelerated dataplanes integration and deployment
Accelerated dataplanes integration and deploymentOPNFV
 
Building Stream Processing as a Service
Building Stream Processing as a ServiceBuilding Stream Processing as a Service
Building Stream Processing as a ServiceSteven Wu
 
CI Provisioning with OpenStack - Gidi Samuels - OpenStack Day Israel 2016
CI Provisioning with OpenStack - Gidi Samuels - OpenStack Day Israel 2016CI Provisioning with OpenStack - Gidi Samuels - OpenStack Day Israel 2016
CI Provisioning with OpenStack - Gidi Samuels - OpenStack Day Israel 2016Cloud Native Day Tel Aviv
 
Suning OpenStack Cloud and Heat
Suning OpenStack Cloud and HeatSuning OpenStack Cloud and Heat
Suning OpenStack Cloud and HeatQiming Teng
 
Kafka on Kubernetes: Keeping It Simple (Nikki Thean, Etsy) Kafka Summit SF 2019
Kafka on Kubernetes: Keeping It Simple (Nikki Thean, Etsy) Kafka Summit SF 2019Kafka on Kubernetes: Keeping It Simple (Nikki Thean, Etsy) Kafka Summit SF 2019
Kafka on Kubernetes: Keeping It Simple (Nikki Thean, Etsy) Kafka Summit SF 2019confluent
 
High Availability for OpenStack
High Availability for OpenStackHigh Availability for OpenStack
High Availability for OpenStackKamesh Pemmaraju
 
A Tale of Two Data Centers: Kafka Streams Resiliency (Anna McDonald, Confluen...
A Tale of Two Data Centers: Kafka Streams Resiliency (Anna McDonald, Confluen...A Tale of Two Data Centers: Kafka Streams Resiliency (Anna McDonald, Confluen...
A Tale of Two Data Centers: Kafka Streams Resiliency (Anna McDonald, Confluen...confluent
 
Walmart & IBM Revisit the Linear Road Benchmark- Roger Rea, IBM
Walmart & IBM Revisit the Linear Road Benchmark- Roger Rea, IBMWalmart & IBM Revisit the Linear Road Benchmark- Roger Rea, IBM
Walmart & IBM Revisit the Linear Road Benchmark- Roger Rea, IBMRedis Labs
 
Looking towards an official cassandra sidecar netflix
Looking towards an official cassandra sidecar   netflixLooking towards an official cassandra sidecar   netflix
Looking towards an official cassandra sidecar netflixVinay Kumar Chella
 
Netflix keystone streaming data pipeline @scale in the cloud-dbtb-2016
Netflix keystone   streaming data pipeline @scale in the cloud-dbtb-2016Netflix keystone   streaming data pipeline @scale in the cloud-dbtb-2016
Netflix keystone streaming data pipeline @scale in the cloud-dbtb-2016Monal Daxini
 
Introduction to Kafka Cruise Control
Introduction to Kafka Cruise ControlIntroduction to Kafka Cruise Control
Introduction to Kafka Cruise ControlJiangjie Qin
 
Honest performance testing with NDBench
Honest performance testing with NDBenchHonest performance testing with NDBench
Honest performance testing with NDBenchVinay Kumar Chella
 
WSO2Con USA 2015: Deployment Patterns and Capacity Planning
WSO2Con USA 2015: Deployment Patterns and Capacity PlanningWSO2Con USA 2015: Deployment Patterns and Capacity Planning
WSO2Con USA 2015: Deployment Patterns and Capacity PlanningWSO2
 
Better Kafka Performance Without Changing Any Code | Simon Ritter, Azul
Better Kafka Performance Without Changing Any Code | Simon Ritter, AzulBetter Kafka Performance Without Changing Any Code | Simon Ritter, Azul
Better Kafka Performance Without Changing Any Code | Simon Ritter, AzulHostedbyConfluent
 
Netflix Data Pipeline With Kafka
Netflix Data Pipeline With KafkaNetflix Data Pipeline With Kafka
Netflix Data Pipeline With KafkaSteven Wu
 

Was ist angesagt? (20)

OpenStack HA
OpenStack HAOpenStack HA
OpenStack HA
 
Java Performance Tuning
Java Performance TuningJava Performance Tuning
Java Performance Tuning
 
Rally--OpenStack Benchmarking at Scale
Rally--OpenStack Benchmarking at ScaleRally--OpenStack Benchmarking at Scale
Rally--OpenStack Benchmarking at Scale
 
High availability and fault tolerance of openstack
High availability and fault tolerance of openstackHigh availability and fault tolerance of openstack
High availability and fault tolerance of openstack
 
Distributed Systems explained (with NodeJS) - Bruno Bossola, JUG Torino
Distributed Systems explained (with NodeJS) - Bruno Bossola, JUG TorinoDistributed Systems explained (with NodeJS) - Bruno Bossola, JUG Torino
Distributed Systems explained (with NodeJS) - Bruno Bossola, JUG Torino
 
Accelerated dataplanes integration and deployment
Accelerated dataplanes integration and deploymentAccelerated dataplanes integration and deployment
Accelerated dataplanes integration and deployment
 
Building Stream Processing as a Service
Building Stream Processing as a ServiceBuilding Stream Processing as a Service
Building Stream Processing as a Service
 
CI Provisioning with OpenStack - Gidi Samuels - OpenStack Day Israel 2016
CI Provisioning with OpenStack - Gidi Samuels - OpenStack Day Israel 2016CI Provisioning with OpenStack - Gidi Samuels - OpenStack Day Israel 2016
CI Provisioning with OpenStack - Gidi Samuels - OpenStack Day Israel 2016
 
Suning OpenStack Cloud and Heat
Suning OpenStack Cloud and HeatSuning OpenStack Cloud and Heat
Suning OpenStack Cloud and Heat
 
Kafka on Kubernetes: Keeping It Simple (Nikki Thean, Etsy) Kafka Summit SF 2019
Kafka on Kubernetes: Keeping It Simple (Nikki Thean, Etsy) Kafka Summit SF 2019Kafka on Kubernetes: Keeping It Simple (Nikki Thean, Etsy) Kafka Summit SF 2019
Kafka on Kubernetes: Keeping It Simple (Nikki Thean, Etsy) Kafka Summit SF 2019
 
High Availability for OpenStack
High Availability for OpenStackHigh Availability for OpenStack
High Availability for OpenStack
 
A Tale of Two Data Centers: Kafka Streams Resiliency (Anna McDonald, Confluen...
A Tale of Two Data Centers: Kafka Streams Resiliency (Anna McDonald, Confluen...A Tale of Two Data Centers: Kafka Streams Resiliency (Anna McDonald, Confluen...
A Tale of Two Data Centers: Kafka Streams Resiliency (Anna McDonald, Confluen...
 
Walmart & IBM Revisit the Linear Road Benchmark- Roger Rea, IBM
Walmart & IBM Revisit the Linear Road Benchmark- Roger Rea, IBMWalmart & IBM Revisit the Linear Road Benchmark- Roger Rea, IBM
Walmart & IBM Revisit the Linear Road Benchmark- Roger Rea, IBM
 
Looking towards an official cassandra sidecar netflix
Looking towards an official cassandra sidecar   netflixLooking towards an official cassandra sidecar   netflix
Looking towards an official cassandra sidecar netflix
 
Netflix keystone streaming data pipeline @scale in the cloud-dbtb-2016
Netflix keystone   streaming data pipeline @scale in the cloud-dbtb-2016Netflix keystone   streaming data pipeline @scale in the cloud-dbtb-2016
Netflix keystone streaming data pipeline @scale in the cloud-dbtb-2016
 
Introduction to Kafka Cruise Control
Introduction to Kafka Cruise ControlIntroduction to Kafka Cruise Control
Introduction to Kafka Cruise Control
 
Honest performance testing with NDBench
Honest performance testing with NDBenchHonest performance testing with NDBench
Honest performance testing with NDBench
 
WSO2Con USA 2015: Deployment Patterns and Capacity Planning
WSO2Con USA 2015: Deployment Patterns and Capacity PlanningWSO2Con USA 2015: Deployment Patterns and Capacity Planning
WSO2Con USA 2015: Deployment Patterns and Capacity Planning
 
Better Kafka Performance Without Changing Any Code | Simon Ritter, Azul
Better Kafka Performance Without Changing Any Code | Simon Ritter, AzulBetter Kafka Performance Without Changing Any Code | Simon Ritter, Azul
Better Kafka Performance Without Changing Any Code | Simon Ritter, Azul
 
Netflix Data Pipeline With Kafka
Netflix Data Pipeline With KafkaNetflix Data Pipeline With Kafka
Netflix Data Pipeline With Kafka
 

Andere mochten auch

Performance architecture for cloud connect
Performance architecture for cloud connectPerformance architecture for cloud connect
Performance architecture for cloud connectAdrian Cockcroft
 
Cassandra Operations at Netflix
Cassandra Operations at NetflixCassandra Operations at Netflix
Cassandra Operations at Netflixgreggulrich
 
Cassandra Performance and Scalability on AWS
Cassandra Performance and Scalability on AWSCassandra Performance and Scalability on AWS
Cassandra Performance and Scalability on AWSAdrian Cockcroft
 
Cassandra EU 2012 - Netflix's Cassandra Architecture and Open Source Efforts
Cassandra EU 2012 - Netflix's Cassandra Architecture and Open Source EffortsCassandra EU 2012 - Netflix's Cassandra Architecture and Open Source Efforts
Cassandra EU 2012 - Netflix's Cassandra Architecture and Open Source EffortsAcunu
 
Yow Conference Dec 2013 Netflix Workshop Slides with Notes
Yow Conference Dec 2013 Netflix Workshop Slides with NotesYow Conference Dec 2013 Netflix Workshop Slides with Notes
Yow Conference Dec 2013 Netflix Workshop Slides with NotesAdrian Cockcroft
 

Andere mochten auch (7)

NoSQL for Netflix
NoSQL for NetflixNoSQL for Netflix
NoSQL for Netflix
 
Real world repairs
Real world repairsReal world repairs
Real world repairs
 
Performance architecture for cloud connect
Performance architecture for cloud connectPerformance architecture for cloud connect
Performance architecture for cloud connect
 
Cassandra Operations at Netflix
Cassandra Operations at NetflixCassandra Operations at Netflix
Cassandra Operations at Netflix
 
Cassandra Performance and Scalability on AWS
Cassandra Performance and Scalability on AWSCassandra Performance and Scalability on AWS
Cassandra Performance and Scalability on AWS
 
Cassandra EU 2012 - Netflix's Cassandra Architecture and Open Source Efforts
Cassandra EU 2012 - Netflix's Cassandra Architecture and Open Source EffortsCassandra EU 2012 - Netflix's Cassandra Architecture and Open Source Efforts
Cassandra EU 2012 - Netflix's Cassandra Architecture and Open Source Efforts
 
Yow Conference Dec 2013 Netflix Workshop Slides with Notes
Yow Conference Dec 2013 Netflix Workshop Slides with NotesYow Conference Dec 2013 Netflix Workshop Slides with Notes
Yow Conference Dec 2013 Netflix Workshop Slides with Notes
 

Ähnlich wie CassandraSummit2015_Cassandra upgrades at scale @ NETFLIX

Data Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFixData Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFixC4Media
 
Getting to Walk with DevOps
Getting to Walk with DevOpsGetting to Walk with DevOps
Getting to Walk with DevOpsEklove Mohan
 
Openstack Summit Tokyo 2015 - Building a private cloud to efficiently handle ...
Openstack Summit Tokyo 2015 - Building a private cloud to efficiently handle ...Openstack Summit Tokyo 2015 - Building a private cloud to efficiently handle ...
Openstack Summit Tokyo 2015 - Building a private cloud to efficiently handle ...Pierre GRANDIN
 
20171122 aws usergrp_coretech-spn-cicd-aws-v01
20171122 aws usergrp_coretech-spn-cicd-aws-v0120171122 aws usergrp_coretech-spn-cicd-aws-v01
20171122 aws usergrp_coretech-spn-cicd-aws-v01Scott Miao
 
Kubernetes at NU.nl (Kubernetes meetup 2019-09-05)
Kubernetes at NU.nl   (Kubernetes meetup 2019-09-05)Kubernetes at NU.nl   (Kubernetes meetup 2019-09-05)
Kubernetes at NU.nl (Kubernetes meetup 2019-09-05)Tibo Beijen
 
Continuous Delivery: releasing Better and Faster at Dashlane
Continuous Delivery: releasing Better and Faster at DashlaneContinuous Delivery: releasing Better and Faster at Dashlane
Continuous Delivery: releasing Better and Faster at DashlaneDashlane
 
Sista: Improving Cog’s JIT performance
Sista: Improving Cog’s JIT performanceSista: Improving Cog’s JIT performance
Sista: Improving Cog’s JIT performanceESUG
 
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...ScyllaDB
 
Guider: An Integrated Runtime Performance Analyzer on AGL
Guider: An Integrated Runtime Performance Analyzer on AGLGuider: An Integrated Runtime Performance Analyzer on AGL
Guider: An Integrated Runtime Performance Analyzer on AGLPeace Lee
 
Polyglot persistence @ netflix (CDE Meetup)
Polyglot persistence @ netflix (CDE Meetup) Polyglot persistence @ netflix (CDE Meetup)
Polyglot persistence @ netflix (CDE Meetup) Roopa Tangirala
 
Patterns and Pains of Migrating Legacy Applications to Kubernetes
Patterns and Pains of Migrating Legacy Applications to KubernetesPatterns and Pains of Migrating Legacy Applications to Kubernetes
Patterns and Pains of Migrating Legacy Applications to KubernetesJosef Adersberger
 
Patterns and Pains of Migrating Legacy Applications to Kubernetes
Patterns and Pains of Migrating Legacy Applications to KubernetesPatterns and Pains of Migrating Legacy Applications to Kubernetes
Patterns and Pains of Migrating Legacy Applications to KubernetesQAware GmbH
 
(CMP402) Amazon EC2 Instances Deep Dive
(CMP402) Amazon EC2 Instances Deep Dive(CMP402) Amazon EC2 Instances Deep Dive
(CMP402) Amazon EC2 Instances Deep DiveAmazon Web Services
 
Cognos Performance Tuning Tips & Tricks
Cognos Performance Tuning Tips & TricksCognos Performance Tuning Tips & Tricks
Cognos Performance Tuning Tips & TricksSenturus
 
Introdução ao data warehouse Amazon Redshift
Introdução ao data warehouse Amazon RedshiftIntrodução ao data warehouse Amazon Redshift
Introdução ao data warehouse Amazon RedshiftAmazon Web Services LATAM
 
Unified Framework for Real Time, Near Real Time and Offline Analysis of Video...
Unified Framework for Real Time, Near Real Time and Offline Analysis of Video...Unified Framework for Real Time, Near Real Time and Offline Analysis of Video...
Unified Framework for Real Time, Near Real Time and Offline Analysis of Video...Spark Summit
 
Sergey Dzyuban "To Build My Own Cloud with Blackjack…"
Sergey Dzyuban "To Build My Own Cloud with Blackjack…"Sergey Dzyuban "To Build My Own Cloud with Blackjack…"
Sergey Dzyuban "To Build My Own Cloud with Blackjack…"Fwdays
 
Scalability strategies for cloud based system architecture
Scalability strategies for cloud based system architectureScalability strategies for cloud based system architecture
Scalability strategies for cloud based system architectureSangJin Kang
 
Meetup Openshift Geneva 03/10
Meetup Openshift Geneva 03/10Meetup Openshift Geneva 03/10
Meetup Openshift Geneva 03/10MagaliDavidCruz
 

Ähnlich wie CassandraSummit2015_Cassandra upgrades at scale @ NETFLIX (20)

Data Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFixData Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFix
 
Getting to Walk with DevOps
Getting to Walk with DevOpsGetting to Walk with DevOps
Getting to Walk with DevOps
 
Openstack Summit Tokyo 2015 - Building a private cloud to efficiently handle ...
Openstack Summit Tokyo 2015 - Building a private cloud to efficiently handle ...Openstack Summit Tokyo 2015 - Building a private cloud to efficiently handle ...
Openstack Summit Tokyo 2015 - Building a private cloud to efficiently handle ...
 
20171122 aws usergrp_coretech-spn-cicd-aws-v01
20171122 aws usergrp_coretech-spn-cicd-aws-v0120171122 aws usergrp_coretech-spn-cicd-aws-v01
20171122 aws usergrp_coretech-spn-cicd-aws-v01
 
Kubernetes at NU.nl (Kubernetes meetup 2019-09-05)
Kubernetes at NU.nl   (Kubernetes meetup 2019-09-05)Kubernetes at NU.nl   (Kubernetes meetup 2019-09-05)
Kubernetes at NU.nl (Kubernetes meetup 2019-09-05)
 
HPC on OpenStack
HPC on OpenStackHPC on OpenStack
HPC on OpenStack
 
Continuous Delivery: releasing Better and Faster at Dashlane
Continuous Delivery: releasing Better and Faster at DashlaneContinuous Delivery: releasing Better and Faster at Dashlane
Continuous Delivery: releasing Better and Faster at Dashlane
 
Sista: Improving Cog’s JIT performance
Sista: Improving Cog’s JIT performanceSista: Improving Cog’s JIT performance
Sista: Improving Cog’s JIT performance
 
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...
 
Guider: An Integrated Runtime Performance Analyzer on AGL
Guider: An Integrated Runtime Performance Analyzer on AGLGuider: An Integrated Runtime Performance Analyzer on AGL
Guider: An Integrated Runtime Performance Analyzer on AGL
 
Polyglot persistence @ netflix (CDE Meetup)
Polyglot persistence @ netflix (CDE Meetup) Polyglot persistence @ netflix (CDE Meetup)
Polyglot persistence @ netflix (CDE Meetup)
 
Patterns and Pains of Migrating Legacy Applications to Kubernetes
Patterns and Pains of Migrating Legacy Applications to KubernetesPatterns and Pains of Migrating Legacy Applications to Kubernetes
Patterns and Pains of Migrating Legacy Applications to Kubernetes
 
Patterns and Pains of Migrating Legacy Applications to Kubernetes
Patterns and Pains of Migrating Legacy Applications to KubernetesPatterns and Pains of Migrating Legacy Applications to Kubernetes
Patterns and Pains of Migrating Legacy Applications to Kubernetes
 
(CMP402) Amazon EC2 Instances Deep Dive
(CMP402) Amazon EC2 Instances Deep Dive(CMP402) Amazon EC2 Instances Deep Dive
(CMP402) Amazon EC2 Instances Deep Dive
 
Cognos Performance Tuning Tips & Tricks
Cognos Performance Tuning Tips & TricksCognos Performance Tuning Tips & Tricks
Cognos Performance Tuning Tips & Tricks
 
Introdução ao data warehouse Amazon Redshift
Introdução ao data warehouse Amazon RedshiftIntrodução ao data warehouse Amazon Redshift
Introdução ao data warehouse Amazon Redshift
 
Unified Framework for Real Time, Near Real Time and Offline Analysis of Video...
Unified Framework for Real Time, Near Real Time and Offline Analysis of Video...Unified Framework for Real Time, Near Real Time and Offline Analysis of Video...
Unified Framework for Real Time, Near Real Time and Offline Analysis of Video...
 
Sergey Dzyuban "To Build My Own Cloud with Blackjack…"
Sergey Dzyuban "To Build My Own Cloud with Blackjack…"Sergey Dzyuban "To Build My Own Cloud with Blackjack…"
Sergey Dzyuban "To Build My Own Cloud with Blackjack…"
 
Scalability strategies for cloud based system architecture
Scalability strategies for cloud based system architectureScalability strategies for cloud based system architecture
Scalability strategies for cloud based system architecture
 
Meetup Openshift Geneva 03/10
Meetup Openshift Geneva 03/10Meetup Openshift Geneva 03/10
Meetup Openshift Geneva 03/10
 

Mehr von Vinay Kumar Chella

Building and running cloud native cassandra
Building and running cloud native cassandraBuilding and running cloud native cassandra
Building and running cloud native cassandraVinay Kumar Chella
 
How netflix manages petabyte scale apache cassandra in the cloud
How netflix manages petabyte scale apache cassandra in the cloudHow netflix manages petabyte scale apache cassandra in the cloud
How netflix manages petabyte scale apache cassandra in the cloudVinay Kumar Chella
 
Safer restarts, faster streaming, and better repair, just a glimpse of cassan...
Safer restarts, faster streaming, and better repair, just a glimpse of cassan...Safer restarts, faster streaming, and better repair, just a glimpse of cassan...
Safer restarts, faster streaming, and better repair, just a glimpse of cassan...Vinay Kumar Chella
 
Live traffic capture and replay in cassandra 4.0
Live traffic capture and replay in cassandra 4.0Live traffic capture and replay in cassandra 4.0
Live traffic capture and replay in cassandra 4.0Vinay Kumar Chella
 
Cassandra serving netflix @ scale
Cassandra serving netflix @ scaleCassandra serving netflix @ scale
Cassandra serving netflix @ scaleVinay Kumar Chella
 
Query and audit logging in cassandra
Query and audit logging in cassandraQuery and audit logging in cassandra
Query and audit logging in cassandraVinay Kumar Chella
 
A glimpse of cassandra 4.0 features netflix
A glimpse of cassandra 4.0 features   netflixA glimpse of cassandra 4.0 features   netflix
A glimpse of cassandra 4.0 features netflixVinay Kumar Chella
 

Mehr von Vinay Kumar Chella (8)

Building and running cloud native cassandra
Building and running cloud native cassandraBuilding and running cloud native cassandra
Building and running cloud native cassandra
 
How netflix manages petabyte scale apache cassandra in the cloud
How netflix manages petabyte scale apache cassandra in the cloudHow netflix manages petabyte scale apache cassandra in the cloud
How netflix manages petabyte scale apache cassandra in the cloud
 
Safer restarts, faster streaming, and better repair, just a glimpse of cassan...
Safer restarts, faster streaming, and better repair, just a glimpse of cassan...Safer restarts, faster streaming, and better repair, just a glimpse of cassan...
Safer restarts, faster streaming, and better repair, just a glimpse of cassan...
 
Live traffic capture and replay in cassandra 4.0
Live traffic capture and replay in cassandra 4.0Live traffic capture and replay in cassandra 4.0
Live traffic capture and replay in cassandra 4.0
 
Cassandra serving netflix @ scale
Cassandra serving netflix @ scaleCassandra serving netflix @ scale
Cassandra serving netflix @ scale
 
Query and audit logging in cassandra
Query and audit logging in cassandraQuery and audit logging in cassandra
Query and audit logging in cassandra
 
A glimpse of cassandra 4.0 features netflix
A glimpse of cassandra 4.0 features   netflixA glimpse of cassandra 4.0 features   netflix
A glimpse of cassandra 4.0 features netflix
 
Data Stores @ Netflix
Data Stores @ NetflixData Stores @ Netflix
Data Stores @ Netflix
 

Kürzlich hochgeladen

WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2
 
WSO2CON 2024 Slides - Open Source to SaaS
WSO2CON 2024 Slides - Open Source to SaaSWSO2CON 2024 Slides - Open Source to SaaS
WSO2CON 2024 Slides - Open Source to SaaSWSO2
 
WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2
 
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...WSO2
 
WSO2CON 2024 - How to Run a Security Program
WSO2CON 2024 - How to Run a Security ProgramWSO2CON 2024 - How to Run a Security Program
WSO2CON 2024 - How to Run a Security ProgramWSO2
 
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburgmasabamasaba
 
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park masabamasaba
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisamasabamasaba
 
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...chiefasafspells
 
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfonteinmasabamasaba
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...masabamasaba
 
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...Shane Coughlan
 
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Bert Jan Schrijver
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...SelfMade bd
 
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...masabamasaba
 
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...Jittipong Loespradit
 
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...WSO2
 
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...WSO2
 

Kürzlich hochgeladen (20)

WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
 
WSO2CON 2024 Slides - Open Source to SaaS
WSO2CON 2024 Slides - Open Source to SaaSWSO2CON 2024 Slides - Open Source to SaaS
WSO2CON 2024 Slides - Open Source to SaaS
 
WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?
 
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
 
WSO2CON 2024 - How to Run a Security Program
WSO2CON 2024 - How to Run a Security ProgramWSO2CON 2024 - How to Run a Security Program
WSO2CON 2024 - How to Run a Security Program
 
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
 
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
 
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
 
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
 
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
 
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
 
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
 
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
 
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
 
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
 
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
 

CassandraSummit2015_Cassandra upgrades at scale @ NETFLIX

  • 1. C* Upgrades at Scale Ajay Upadhyay & Vinay Chella Cloud Data Architects
  • 2. Who are We?? ● Ajay Upadhyay ○ Cloud Data Architect @ Netflix ● Vinay Chella ○ Cloud Data Architect @ Netflix ○ https://www.linkedin.com/in/vinaykumarchella ● Cloud Database Engineering [CDE] ● Providing Cassandra, Dynomite, Elastic Search and other data stores as a service
  • 3. ● About Netflix ● Cassandra @ Netflix ● What is our C* scale ?? ● What do we upgrade, Why? ● How does it impact applications and services ● What makes C* upgrades smoother?? ● Online Upgrade process ● Infrastructure for upgrade ● Pappy framework ● Migration Hiccups/ Roadblocks Agenda
  • 4. 4 • Started as DVD rental • Internet delivery/ streaming of TV shows and movies directly on TVs, computers, and mobile devices in the United States and internationally About Netflix
  • 5. Cassandra @ Netflix • 95% of streaming data is stored in Cassandra • Data ranges from customer details to Viewing history to streaming bookmarks
  • 7. 281 What is our scale (C* Footprint)
  • 8. What is our scale (C* Footprint) Continue... 9000+
  • 9. Duration of the upgrade???
  • 10. What do we upgrade, Why? • Software • Cassandra • SSL Certificates • Priam • OS • Other sidecars • Hardware • AWS instance types
  • 11. What do we upgrade, Why?(Continue…) Why new release? What’s wrong with current release?
  • 12. • Typically nothing wrong with current release • Pretty stable and clusters may still run with current release without any issues • End of life cycle?? What do we upgrade, Why?(Continue…)
  • 13. • Good to be on latest and stable release • Lots of great features • Bug fixes and enhancements / new features typically are being added to the new release What do we upgrade, Why?(Continue…)
  • 14. Application and service level impact/change ● Before Migration ● During migration ● After migration
  • 15. None* *so long as application is using latest Netflix OSS components (Astyanax/ karyon) Application and service level impact/change
  • 17. What makes C* upgrades smoother and transparent
  • 18. Choosing the right version of the binary Online upgrade process
  • 19. How do we certify a binary??
  • 21. Is any existing functionality broken?
  • 22. Simulate PROD • Pappy framework • Mirror production traffic • Identical AWS instance types How do we certify a binary for the upgrade
  • 23. Typical usage patterns: • Read Heavy • Write Heavy • STCS • LCS • Aggressive TTLs • Variable Payloads How do we certify a binary for the upgrade
  • 24. Usage Pattern C* 1.2 C* 1.2 = Metrics Upgrade C* 2.0 Metrics MetricsMetrics = How do we certify a binary for the upgrade During Upgrade:
  • 25. Summary • Functional validation • Performance testing • Load test with routine maintenance tasks • Repair and Compaction • Replacement and Restarts • Key Metrics Used • 95th and 99th Latencies • Dropped Metrics on C* • Exceptions on C* • Heap Usage on C* • Threads Pending on C* How do we certify a binary for the upgrade
  • 26. How do we upgrade??
  • 28. Online upgrade process (Pre-Upgrade) Healthy Cluster No Maintenance No Monkeying around Download bits
  • 30. One node at a time - parallelly in all regions
  • 31. One zone at a time - parallely in all regions
  • 32. One zone at a time??
  • 34. Online upgrade process (steps) Upgrade: ● Running binary upgrade of Priam and C* on a node ○ Continue to make sure that the cluster is still healthy ○ Disable node in Eureka ○ Disable thrift, gossip and run nodetool drain ○ Stop Cassandra and Priam ○ Install new C* package (tarball) ○ Make sure the cluster is still healthy ■ Disable/Enable gossip if new node does not see all others vice-versa ● Make sure the cluster is still healthy before next step
  • 35. Online upgrade process (steps) Post-Upgrade: 1. Set/change parameters for KeySpaces and CFs (if any) 2. Run SSTABLE Upgrade on all nodes (STCS - parallel, LCS - Serial) (if applicable) 3. Enable Chaos Monkey 4. Make sure the cluster is still healthy before considering the upgrade done
  • 36. Binary upgrades?? Why not replacements??
  • 37. Dual Writes - Forklift (Pappy Framework) • For short lived data(TTL) and big clusters • Astyanax - Dual Writes • C* Forklifter - Pappy Framework
  • 38. Choosing the right version of the binary Online upgrade process
  • 39. Click of a button (Just 1 button….)
  • 41. • Health Monitoring Jobs • Regular Maintenance Jobs • Repairs • Compactions • Upgrades • Horizontal/ Vertical scaling • Instance Types • OS • Sidecar • C* Infrastructure for upgrade...
  • 43. (Yet to be Open Sourced) Pappy Framework…...
  • 44. What is Pappy?? • Pappy is a fictional character featured in Popeye cartoons and comics. • CDE’s new performance Test harness framework • Netflix homegrown: Well integrated with netflix OSS infrastructure (Archaius, Servo, Eureka and Karyan) • To be Open Sourced @ Netflix- OSS https://netflix.github.io/
  • 45. High level features • Perf tests for • Various Instance types comparison • Varying data models in terms of payload, shard and comparators • Different driver(s) versions • Forklifter • Wide Row Finder • Pluggable architecture
  • 48. Pappy
  • 49. Pappy
  • 50. Pappy
  • 51. Pappy
  • 52. Pappy
  • 56. Performance testing... • More than 10 rounds of performance testing • Yaml tuning • Read heavy - Write heavy • Write heavy • Read heavy
  • 57. Test Setup #1 Row Size: 1KB (size-100*cols-10) Data Size: 30GB (Keys: 20,000,000) Read: 4K (3.6 K) Writes: 1 K (700) Instance: i2.xl (4 CPU, 30 GB RAM) Nodes: 6 Region: Us-East Read CL CL_ONE Write CL CL_ONE Heap 12 GB / Newgen: 2GB
  • 58. Test Setup #1 - Results Test Setup #1 C* 1.2 C* 2.0 ROps 3.1 K - 4.0 K / sec 2.6 - 3.5 K/ sec WOps 710 /sec 730 /sec Read 95th 2.25k micro 2.89k micro Read 99Th 3.6k micro 4.7k micro Write 95th 720 micro 770 micro Write 99Th 890 micro 1.01 k micro Data / Node 30 - 36 GB 30 - 36 GB
  • 61. Summary…. Metrics Test Setup #1(12G Heap/ 30G) Test Setup #2(8GB Heap/~30G) Test Setup #3 (8GB Heap/ 60GB data) C* 1.2 C* 2.0 C* 1.2 C* 2.0 C* 1.2 C* 2.0 ROps 3.1K - 4.0K/ sec 2.6 - 3.5 K/ sec 3.5K - 4.0K/ sec 3.0 - 3.5 K/ sec 2.5 K - 3.5 K / sec 2.0 K - 3.0 K/ sec WOps 710 /sec 730 /sec 720 /sec 710 /sec 700 /sec 620 /sec Read 95th 2.25k micro 2.89k micro 1.7k micro 2.1k micro 2.4k micro 3.5k micro Read 99Th 3.6k micro 4.7k micro 2.6k micro 3.7k micro 9.0k micro (spikes 20k) 9.5k micro (spikes 19K) Write 95th 720 micro 770 micro 730 micro 910 micro 730 micro 820 micro Write 99Th 890 micro 1.01 k micro 910 micro 1.2 k micro 940 micro 1.1 k micro Data / Node 30 - 36 GB 30 - 36 GB 30 - 36 GB 30 - 36 GB 60 GB 60 GB
  • 62. Test Setup #5 - Results (Duration: 2 Days) Test Setup #5 C* 1.2 C* 2.0 ROps 3.8 K / sec 3.8 K/ sec WOps 198 /sec 198 /sec Read 95th 7.01k micro 5.89k micro Read 99Th 27.3k micro (spikes upto 80k) 15.2k micro (spikes upto 60 k) Read Avg 2.15 2.04 Write 95th 755 micro 895 micro Write 99Th 961 micro 1.1 k micro Write Avg 396 444 Data / Node 60 GB 60 GB
  • 63. Test Case #5: Write 95th and 99th
  • 64. Test Case #5: Read 95th and 99th
  • 65. Few Migration Hiccups/ Roadblocks • InterNodeEncryption = ‘ALL’ => why do we need this? • https://issues.apache.org/jira/browse/CASSANDRA-8751 • Hints causing Heap disaster • https://issues.apache.org/jira/browse/CASSANDRA-8485.