SlideShare ist ein Scribd-Unternehmen logo
1 von 25
Building a Multi-Region Cluster at Target
Presenters: Andrew From and Aaron Ploetz
1 Introduction
2 Target’s DCD Cluster
3 Problems
4 Solutions
5 Current State
6 Lessons Learned
2© DataStax, All Rights Reserved.
© DataStax, All Rights Reserved. 3
TTS - CPE Cassandra
Data model consulting
Deployment
Operations
$ whoami_
• B.S.-MCS University of Wisconsin-Whitewater.
• M.S.-SED Regis University.
• 10+ years experience with distributed storage tech.
• Supporting Cassandra in production since v0.8.
• Contributor to the Apache Cassandra project (cqlsh).
• Contributor to the Cassandra tag on Stack Overflow.
• 3x DataStax MVP for Apache Cassandra (2014-17).
© DataStax, All Rights Reserved. 4
Aaron Ploetz
$ whoami_
• B.S. Computer Engineering University of Minnesota
• Using Cassandra in production since v2.0
• Contributor to the Ratpack Framework Project.
• Contributor to the Cassandra tag on Stack Overflow.
• Maintainer for statsd-kafka-backend plugin on the Target public Github org
• I am very passionate and interested in metrics, monitoring, and alerting (not just
on Cassandra)
© DataStax, All Rights Reserved. 5
Andrew From
Introduction
Cassandra at Target
Cassandra clusters at Target
© DataStax, All Rights Reserved. 7
Cartwheel
Personalization
GAM
DCD Subscriptions Enterprise
Services
GDMItem LocationsCheckoutAdaptive-SEO
Versions used at Target
• Depends on experience of the application team.
• Most clusters run DSE 4.0.3.
• Some clusters have been built with Apache Cassandra 2.1.12-15.
• Most new clusters built on 2.2.7.
© DataStax, All Rights Reserved. 8
Target’s DCD Cluster
• Data footprint = 350Gb
• Multi-tenant cluster, supporting several teams:
• Rating and Reviews
• Pricing
• Item
• Promotions
• Content
• A/B Testing
• Back To School, Shopping Lists
• …and others
© DataStax, All Rights Reserved. 9
2015 Peak (Black Friday -> Cyber Monday)
• DataStax Enterprise 4.0.3 (Cassandra 2.0.7.31)
• Java 1.7
• CMS GC
• 18 nodes (256GB RAM, 6-core, 24 HT CPUs, 1TB disk):
• 6 nodes at two Target data centers (3 on-the-metal nodes each).
• 12 nodes at two tel-co data centers (6 on-the-metal nodes each).
• 500 Mbit connection with Tel-co datacenters.
• Sustained between 5000 and 5500 TPS during Peak.
• Zero downtime!
© DataStax, All Rights Reserved. 10
2016 Q1: Expand DCD to the cloud
Plan to expand DCD Cassandra cluster to cloud East and West regions:
• Add six cloud instances in each region.
• VPN connection to Target network.
• Data locality: Support teams spinning-up scalable application servers in the cloud.
• Went “live” early March 2016:
• Cloud-West 6 nodes
• Cloud-East 6 nodes
• TTC – 3 nodes
• TTCE – 3 nodes
• Tel-co 1 – 6 nodes
• Tel-co 2 – 6 nodes
© DataStax, All Rights Reserved. 11
Problems
Cassandra at Target
Problems
• VPN not stable between Cloud and Target.
• GC Pauses causing application latency.
• Orphaned repair streams building up over time.
• Data inconsistency between Tel-co and Cloud.
• Issues with large batches.
• Issues with poor data models.
© DataStax, All Rights Reserved. 13
Solutions
Cassandra at Target
• VPN not stable between Cloud and Target
• Gossip between Tel-co and Cloud was sketchy.
• Nodes reporting as “DN” that were actually ”UN.”
• Worked with Cloud Provider admins, reviewed architecture.
• Set-up increased monitoring to determine DCD Cassandra downtimes,
cross-referenced that with VPN connection logs with both our (Target) Cloud
Platform Engineering (CPE) Network team and Cloud Provider admins.
• Our CPE Network team worked with Cloud Provider to:
• Ensure proper network configuration.
• “Bi-directional” dead peer detection (DPD).
• “DPD Responder” handles dead peer requests from Cloud endpoint.
• Upgrade our VPN connections to 1Gbit.
© DataStax, All Rights Reserved. 15
• GC Pauses causing application latency
• STW GC pauses of 10-20 seconds (or more) rendering nodes unresponsive,
during nightly batch jobs (9pm – 6am). Most often around 2am.
• Was a small issue just prior to 2015 Peak Season; became more of a
problem once we expanded to the cloud.
• Worked with our teams to refine their data models and write patterns.
• Upgraded to Java 1.8.
• Enabled G1GC…biggest “win.”
© DataStax, All Rights Reserved. 16
• Orphaned repair streams building up over time
• Due to VPN inconsistency, repair streams between Cloud and Tel-co could
not complete consistently.
• Orphaned Streams (pending repairs) built-up over time, system load
average in Tel-co and Target nodes rose, nodes eventually became
unresponsive.
• Examined our use cases, scheduled ”focused” repair jobs, and only for
certain applications.
nodetool –h <node_ip> repair –pr -hosts <source_ip1,source_ip2,etc…>
• Only run repairs between Cloud and Target or Tel-co and Target.
© DataStax, All Rights Reserved. 17
• Data inconsistency between Tel-co and Cloud
• Data inconsistency issues causing problems for certain teams.
• Upgraded our Tel-co connection to 1Gbit, expandable to 10Gbit on request.
• But this was ultimately the wall we could not get around.
• Problems with repairs, but also issues with boot-strapping and
decommissioning nodes
• Met with ALL application teams:
• Discussed a future plan to split-off the Cloud nodes (with six new Target nodes)
into a new “DCD Cloud” cluster.
• Talked through application requirements, determined who would need to
move/split to “DCD Cloud.”
• Also challenged the application teams on who really needed to be in both Cloud
and Tel-co. Turns out that only the Pricing team needed both, and the rest could
successfully serve their requirements from one or the other.
© DataStax, All Rights Reserved. 18
Issues with large batches
• Teams unknowingly using batches:
© DataStax, All Rights Reserved. 19
Poor Data Models
• Using Cassandra in a batch way:
• Reading large tables entirely with SELECT * FROM table;
• Re-inserting data to a table on a schedule, even when data has not changed
• Lack of de-normalized tables to support specific queries
• Queries using ALLOW FILTERING
• “XREF” tables
• Queue-like usage of tables
• Extremely large row sizes
• Abuse of Collection data types
• Read before writing (or writing then immediately reading)
© DataStax, All Rights Reserved. 20
Current State
Cassandra at Target
August 2016
• DataStax Enterprise 4.0.3 (Cassandra 2.0.7.31)… Upgrade to 2.1 planned.
• Java 1.8
• G1GC 32GB Heap, MaxGCPauseMillis=500
• DCD Classic - 18 nodes (256GB RAM, 24 HT CPUs, 1TB disk):
• 6 nodes at two Target data centers (3 on-the-metal nodes each).
• 12 nodes at two Tel-co data centers (6 on-the-metal nodes each).
• DCD Cloud - 18 nodes (256GB RAM, 24 HT CPUs):
• 6 nodes at two Target data centers (3 on-the-metal nodes each).
• 12 nodes at two Cloud data centers (6 i2.2xlarge nodes each).
• Upgraded connection (up to 10 Gbit) with Tel-co.
© DataStax, All Rights Reserved. 22
Lessons Learned
• Spend time on your data model!
• Most overlooked aspect of Cassandra architecting.
• Build relationships with your application teams.
• Build tables to suit the query patterns.
• Talk about consistency requirements with your app teams.
• Ask the questions:
• Do you really need to replicate everywhere?
• Do you really need to read/write at LOCAL_QUORUM?
• What is your anticipated read/write ratio?
• Watch for tombstones! TTL-ing data helps, but TTLs are not free!
© DataStax, All Rights Reserved. 23
Lessons Learned (part 2)
• Involve your network team.
• Use a metrics solution (Graphite/Grafana, OpsCenter, etc)
• Give them exact data to work with.
• When building a new cluster with data centers in the cloud, thoroughly test-
out the operational aspects of your cluster:
• Bootstrapping/decommissioning a node.
• Running repairs.
• Trigger a GC (cassandra-stress can help with that).
• G1GC helps for larger heap sizes, but it’s not a silver bullet.
© DataStax, All Rights Reserved. 24
Questions?
Cassandra at Target

Weitere ähnliche Inhalte

Was ist angesagt?

Understanding Cassandra internals to solve real-world problems
Understanding Cassandra internals to solve real-world problemsUnderstanding Cassandra internals to solve real-world problems
Understanding Cassandra internals to solve real-world problems
Acunu
 
Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...
Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...
Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...
DataStax
 
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...
DataStax
 
Productizing a Cassandra-Based Solution (Brij Bhushan Ravat, Ericsson) | C* S...
Productizing a Cassandra-Based Solution (Brij Bhushan Ravat, Ericsson) | C* S...Productizing a Cassandra-Based Solution (Brij Bhushan Ravat, Ericsson) | C* S...
Productizing a Cassandra-Based Solution (Brij Bhushan Ravat, Ericsson) | C* S...
DataStax
 

Was ist angesagt? (18)

Load testing Cassandra applications
Load testing Cassandra applicationsLoad testing Cassandra applications
Load testing Cassandra applications
 
Clock Skew and Other Annoying Realities in Distributed Systems (Donny Nadolny...
Clock Skew and Other Annoying Realities in Distributed Systems (Donny Nadolny...Clock Skew and Other Annoying Realities in Distributed Systems (Donny Nadolny...
Clock Skew and Other Annoying Realities in Distributed Systems (Donny Nadolny...
 
Understanding Cassandra internals to solve real-world problems
Understanding Cassandra internals to solve real-world problemsUnderstanding Cassandra internals to solve real-world problems
Understanding Cassandra internals to solve real-world problems
 
Large partition in Cassandra
Large partition in CassandraLarge partition in Cassandra
Large partition in Cassandra
 
Everyday I’m scaling... Cassandra
Everyday I’m scaling... CassandraEveryday I’m scaling... Cassandra
Everyday I’m scaling... Cassandra
 
Apache Cassandra at the Geek2Geek Berlin
Apache Cassandra at the Geek2Geek BerlinApache Cassandra at the Geek2Geek Berlin
Apache Cassandra at the Geek2Geek Berlin
 
DataStax | Building a Spark Streaming App with DSE File System (Rocco Varela)...
DataStax | Building a Spark Streaming App with DSE File System (Rocco Varela)...DataStax | Building a Spark Streaming App with DSE File System (Rocco Varela)...
DataStax | Building a Spark Streaming App with DSE File System (Rocco Varela)...
 
Pythian: My First 100 days with a Cassandra Cluster
Pythian: My First 100 days with a Cassandra ClusterPythian: My First 100 days with a Cassandra Cluster
Pythian: My First 100 days with a Cassandra Cluster
 
Cassandra Summit 2014: Apache Cassandra Best Practices at Ebay
Cassandra Summit 2014: Apache Cassandra Best Practices at EbayCassandra Summit 2014: Apache Cassandra Best Practices at Ebay
Cassandra Summit 2014: Apache Cassandra Best Practices at Ebay
 
Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...
Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...
Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...
 
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...
 
Performance tuning - A key to successful cassandra migration
Performance tuning - A key to successful cassandra migrationPerformance tuning - A key to successful cassandra migration
Performance tuning - A key to successful cassandra migration
 
Intro to cassandra
Intro to cassandraIntro to cassandra
Intro to cassandra
 
Productizing a Cassandra-Based Solution (Brij Bhushan Ravat, Ericsson) | C* S...
Productizing a Cassandra-Based Solution (Brij Bhushan Ravat, Ericsson) | C* S...Productizing a Cassandra-Based Solution (Brij Bhushan Ravat, Ericsson) | C* S...
Productizing a Cassandra-Based Solution (Brij Bhushan Ravat, Ericsson) | C* S...
 
Advanced Operations
Advanced OperationsAdvanced Operations
Advanced Operations
 
Mesosphere and Contentteam: A New Way to Run Cassandra
Mesosphere and Contentteam: A New Way to Run CassandraMesosphere and Contentteam: A New Way to Run Cassandra
Mesosphere and Contentteam: A New Way to Run Cassandra
 
Cassandra Exports as a Trivially Parallelizable Problem (Emilio Del Tessandor...
Cassandra Exports as a Trivially Parallelizable Problem (Emilio Del Tessandor...Cassandra Exports as a Trivially Parallelizable Problem (Emilio Del Tessandor...
Cassandra Exports as a Trivially Parallelizable Problem (Emilio Del Tessandor...
 
Processing 50,000 events per second with Cassandra and Spark
Processing 50,000 events per second with Cassandra and SparkProcessing 50,000 events per second with Cassandra and Spark
Processing 50,000 events per second with Cassandra and Spark
 

Andere mochten auch

Terror & Hysteria: Cost Effective Scaling of Time Series Data with Cassandra ...
Terror & Hysteria: Cost Effective Scaling of Time Series Data with Cassandra ...Terror & Hysteria: Cost Effective Scaling of Time Series Data with Cassandra ...
Terror & Hysteria: Cost Effective Scaling of Time Series Data with Cassandra ...
DataStax
 
Building a Distributed Reservation System with Cassandra (Andrew Baker & Jeff...
Building a Distributed Reservation System with Cassandra (Andrew Baker & Jeff...Building a Distributed Reservation System with Cassandra (Andrew Baker & Jeff...
Building a Distributed Reservation System with Cassandra (Andrew Baker & Jeff...
DataStax
 
The Promise and Perils of Encrypting Cassandra Data (Ameesh Divatia, Baffle, ...
The Promise and Perils of Encrypting Cassandra Data (Ameesh Divatia, Baffle, ...The Promise and Perils of Encrypting Cassandra Data (Ameesh Divatia, Baffle, ...
The Promise and Perils of Encrypting Cassandra Data (Ameesh Divatia, Baffle, ...
DataStax
 
Tales From the Field: The Wrong Way of Using Cassandra (Carlos Rolo, Pythian)...
Tales From the Field: The Wrong Way of Using Cassandra (Carlos Rolo, Pythian)...Tales From the Field: The Wrong Way of Using Cassandra (Carlos Rolo, Pythian)...
Tales From the Field: The Wrong Way of Using Cassandra (Carlos Rolo, Pythian)...
DataStax
 
Maintaining Consistency Across Data Centers (Randy Fradin, BlackRock) | Cassa...
Maintaining Consistency Across Data Centers (Randy Fradin, BlackRock) | Cassa...Maintaining Consistency Across Data Centers (Randy Fradin, BlackRock) | Cassa...
Maintaining Consistency Across Data Centers (Randy Fradin, BlackRock) | Cassa...
DataStax
 
Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...
Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...
Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...
DataStax
 

Andere mochten auch (20)

Terror & Hysteria: Cost Effective Scaling of Time Series Data with Cassandra ...
Terror & Hysteria: Cost Effective Scaling of Time Series Data with Cassandra ...Terror & Hysteria: Cost Effective Scaling of Time Series Data with Cassandra ...
Terror & Hysteria: Cost Effective Scaling of Time Series Data with Cassandra ...
 
PagerDuty: One Year of Cassandra Failures
PagerDuty: One Year of Cassandra FailuresPagerDuty: One Year of Cassandra Failures
PagerDuty: One Year of Cassandra Failures
 
Building a Distributed Reservation System with Cassandra (Andrew Baker & Jeff...
Building a Distributed Reservation System with Cassandra (Andrew Baker & Jeff...Building a Distributed Reservation System with Cassandra (Andrew Baker & Jeff...
Building a Distributed Reservation System with Cassandra (Andrew Baker & Jeff...
 
The Promise and Perils of Encrypting Cassandra Data (Ameesh Divatia, Baffle, ...
The Promise and Perils of Encrypting Cassandra Data (Ameesh Divatia, Baffle, ...The Promise and Perils of Encrypting Cassandra Data (Ameesh Divatia, Baffle, ...
The Promise and Perils of Encrypting Cassandra Data (Ameesh Divatia, Baffle, ...
 
Tales From the Field: The Wrong Way of Using Cassandra (Carlos Rolo, Pythian)...
Tales From the Field: The Wrong Way of Using Cassandra (Carlos Rolo, Pythian)...Tales From the Field: The Wrong Way of Using Cassandra (Carlos Rolo, Pythian)...
Tales From the Field: The Wrong Way of Using Cassandra (Carlos Rolo, Pythian)...
 
Maintaining Consistency Across Data Centers (Randy Fradin, BlackRock) | Cassa...
Maintaining Consistency Across Data Centers (Randy Fradin, BlackRock) | Cassa...Maintaining Consistency Across Data Centers (Randy Fradin, BlackRock) | Cassa...
Maintaining Consistency Across Data Centers (Randy Fradin, BlackRock) | Cassa...
 
Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...
Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...
Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...
 
Strategic Design by Architecture and Organisation @ FINN.no - JavaZone 2016
Strategic Design by Architecture and Organisation @ FINN.no - JavaZone 2016Strategic Design by Architecture and Organisation @ FINN.no - JavaZone 2016
Strategic Design by Architecture and Organisation @ FINN.no - JavaZone 2016
 
Webinar | Target Modernizes Retail with Engaging Digital Experiences
Webinar | Target Modernizes Retail with Engaging Digital ExperiencesWebinar | Target Modernizes Retail with Engaging Digital Experiences
Webinar | Target Modernizes Retail with Engaging Digital Experiences
 
Best buy strategic analysis (bb team) final
Best buy strategic analysis (bb team) finalBest buy strategic analysis (bb team) final
Best buy strategic analysis (bb team) final
 
Target Holding - Big Dikes and Big Data
Target Holding - Big Dikes and Big DataTarget Holding - Big Dikes and Big Data
Target Holding - Big Dikes and Big Data
 
Hadoop for the Masses
Hadoop for the MassesHadoop for the Masses
Hadoop for the Masses
 
Ceph Deployment at Target: Customer Spotlight
Ceph Deployment at Target: Customer SpotlightCeph Deployment at Target: Customer Spotlight
Ceph Deployment at Target: Customer Spotlight
 
Target: Performance Tuning Cassandra at Target
Target: Performance Tuning Cassandra at TargetTarget: Performance Tuning Cassandra at Target
Target: Performance Tuning Cassandra at Target
 
Using APIs to Create an Omni-Channel Retail Experience
Using APIs to Create an Omni-Channel Retail ExperienceUsing APIs to Create an Omni-Channel Retail Experience
Using APIs to Create an Omni-Channel Retail Experience
 
Demystify Big Data Breakfast Briefing - Juergen Urbanski, T-Systems
Demystify Big Data Breakfast Briefing - Juergen Urbanski, T-SystemsDemystify Big Data Breakfast Briefing - Juergen Urbanski, T-Systems
Demystify Big Data Breakfast Briefing - Juergen Urbanski, T-Systems
 
Apache Cassandra at Target - Cassandra Summit 2014
Apache Cassandra at Target - Cassandra Summit 2014Apache Cassandra at Target - Cassandra Summit 2014
Apache Cassandra at Target - Cassandra Summit 2014
 
Electronics Industry (Marketing Management)
Electronics Industry (Marketing Management)Electronics Industry (Marketing Management)
Electronics Industry (Marketing Management)
 
Operating Model
Operating ModelOperating Model
Operating Model
 
Best buy
Best buyBest buy
Best buy
 

Ähnlich wie Building a Multi-Region Cluster at Target (Aaron Ploetz, Target) | Cassandra Summit 2016

40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
inside-BigData.com
 
AquaQ Analytics Kx Event - Data Direct Networks Presentation
AquaQ Analytics Kx Event - Data Direct Networks PresentationAquaQ Analytics Kx Event - Data Direct Networks Presentation
AquaQ Analytics Kx Event - Data Direct Networks Presentation
AquaQ Analytics
 
Big data and hadoop
Big data and hadoopBig data and hadoop
Big data and hadoop
Mohit Tare
 
M6d cassandrapresentation
M6d cassandrapresentationM6d cassandrapresentation
M6d cassandrapresentation
Edward Capriolo
 

Ähnlich wie Building a Multi-Region Cluster at Target (Aaron Ploetz, Target) | Cassandra Summit 2016 (20)

Apache Cassandra and The Multi-Cloud by Amanda Moran
Apache Cassandra and The Multi-Cloud by Amanda MoranApache Cassandra and The Multi-Cloud by Amanda Moran
Apache Cassandra and The Multi-Cloud by Amanda Moran
 
Cassandra 2.0 (Introduction)
Cassandra 2.0 (Introduction)Cassandra 2.0 (Introduction)
Cassandra 2.0 (Introduction)
 
Five Lessons in Distributed Databases
Five Lessons  in Distributed DatabasesFive Lessons  in Distributed Databases
Five Lessons in Distributed Databases
 
Data Lake and the rise of the microservices
Data Lake and the rise of the microservicesData Lake and the rise of the microservices
Data Lake and the rise of the microservices
 
Deploying Cassandra Multi-cloud
Deploying Cassandra Multi-cloudDeploying Cassandra Multi-cloud
Deploying Cassandra Multi-cloud
 
Apache cassandra
Apache cassandraApache cassandra
Apache cassandra
 
C* Summit 2013: Netflix Open Source Tools and Benchmarks for Cassandra by Adr...
C* Summit 2013: Netflix Open Source Tools and Benchmarks for Cassandra by Adr...C* Summit 2013: Netflix Open Source Tools and Benchmarks for Cassandra by Adr...
C* Summit 2013: Netflix Open Source Tools and Benchmarks for Cassandra by Adr...
 
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
 
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
 
DataStax TechDay - Munich 2014
DataStax TechDay - Munich 2014DataStax TechDay - Munich 2014
DataStax TechDay - Munich 2014
 
AquaQ Analytics Kx Event - Data Direct Networks Presentation
AquaQ Analytics Kx Event - Data Direct Networks PresentationAquaQ Analytics Kx Event - Data Direct Networks Presentation
AquaQ Analytics Kx Event - Data Direct Networks Presentation
 
MayaData Datastax webinar - Operating Cassandra on Kubernetes with the help ...
MayaData  Datastax webinar - Operating Cassandra on Kubernetes with the help ...MayaData  Datastax webinar - Operating Cassandra on Kubernetes with the help ...
MayaData Datastax webinar - Operating Cassandra on Kubernetes with the help ...
 
Highly available, scalable and secure data with Cassandra and DataStax Enterp...
Highly available, scalable and secure data with Cassandra and DataStax Enterp...Highly available, scalable and secure data with Cassandra and DataStax Enterp...
Highly available, scalable and secure data with Cassandra and DataStax Enterp...
 
Designing Resilient Application Platforms with Apache Cassandra - Hayato Shim...
Designing Resilient Application Platforms with Apache Cassandra - Hayato Shim...Designing Resilient Application Platforms with Apache Cassandra - Hayato Shim...
Designing Resilient Application Platforms with Apache Cassandra - Hayato Shim...
 
Big data and hadoop
Big data and hadoopBig data and hadoop
Big data and hadoop
 
M6d cassandrapresentation
M6d cassandrapresentationM6d cassandrapresentation
M6d cassandrapresentation
 
KIISE:SIGDB Workshop presentation.
KIISE:SIGDB Workshop presentation.KIISE:SIGDB Workshop presentation.
KIISE:SIGDB Workshop presentation.
 
Tsinghua University: Two Exemplary Applications in China
Tsinghua University: Two Exemplary Applications in ChinaTsinghua University: Two Exemplary Applications in China
Tsinghua University: Two Exemplary Applications in China
 
Exascale Capabl
Exascale CapablExascale Capabl
Exascale Capabl
 
A Dataflow Processing Chip for Training Deep Neural Networks
A Dataflow Processing Chip for Training Deep Neural NetworksA Dataflow Processing Chip for Training Deep Neural Networks
A Dataflow Processing Chip for Training Deep Neural Networks
 

Mehr von DataStax

Mehr von DataStax (20)

Is Your Enterprise Ready to Shine This Holiday Season?
Is Your Enterprise Ready to Shine This Holiday Season?Is Your Enterprise Ready to Shine This Holiday Season?
Is Your Enterprise Ready to Shine This Holiday Season?
 
Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas...
Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas...Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas...
Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas...
 
Running DataStax Enterprise in VMware Cloud and Hybrid Environments
Running DataStax Enterprise in VMware Cloud and Hybrid EnvironmentsRunning DataStax Enterprise in VMware Cloud and Hybrid Environments
Running DataStax Enterprise in VMware Cloud and Hybrid Environments
 
Best Practices for Getting to Production with DataStax Enterprise Graph
Best Practices for Getting to Production with DataStax Enterprise GraphBest Practices for Getting to Production with DataStax Enterprise Graph
Best Practices for Getting to Production with DataStax Enterprise Graph
 
Webinar | Data Management for Hybrid and Multi-Cloud: A Four-Step Journey
Webinar | Data Management for Hybrid and Multi-Cloud: A Four-Step JourneyWebinar | Data Management for Hybrid and Multi-Cloud: A Four-Step Journey
Webinar | Data Management for Hybrid and Multi-Cloud: A Four-Step Journey
 
Webinar | How to Understand Apache Cassandra™ Performance Through Read/Writ...
Webinar  |  How to Understand Apache Cassandra™ Performance Through Read/Writ...Webinar  |  How to Understand Apache Cassandra™ Performance Through Read/Writ...
Webinar | How to Understand Apache Cassandra™ Performance Through Read/Writ...
 
Webinar | Better Together: Apache Cassandra and Apache Kafka
Webinar  |  Better Together: Apache Cassandra and Apache KafkaWebinar  |  Better Together: Apache Cassandra and Apache Kafka
Webinar | Better Together: Apache Cassandra and Apache Kafka
 
Top 10 Best Practices for Apache Cassandra and DataStax Enterprise
Top 10 Best Practices for Apache Cassandra and DataStax EnterpriseTop 10 Best Practices for Apache Cassandra and DataStax Enterprise
Top 10 Best Practices for Apache Cassandra and DataStax Enterprise
 
Introduction to Apache Cassandra™ + What’s New in 4.0
Introduction to Apache Cassandra™ + What’s New in 4.0Introduction to Apache Cassandra™ + What’s New in 4.0
Introduction to Apache Cassandra™ + What’s New in 4.0
 
Webinar: How Active Everywhere Database Architecture Accelerates Hybrid Cloud...
Webinar: How Active Everywhere Database Architecture Accelerates Hybrid Cloud...Webinar: How Active Everywhere Database Architecture Accelerates Hybrid Cloud...
Webinar: How Active Everywhere Database Architecture Accelerates Hybrid Cloud...
 
Webinar | Aligning GDPR Requirements with Today's Hybrid Cloud Realities
Webinar  |  Aligning GDPR Requirements with Today's Hybrid Cloud RealitiesWebinar  |  Aligning GDPR Requirements with Today's Hybrid Cloud Realities
Webinar | Aligning GDPR Requirements with Today's Hybrid Cloud Realities
 
Designing a Distributed Cloud Database for Dummies
Designing a Distributed Cloud Database for DummiesDesigning a Distributed Cloud Database for Dummies
Designing a Distributed Cloud Database for Dummies
 
How to Power Innovation with Geo-Distributed Data Management in Hybrid Cloud
How to Power Innovation with Geo-Distributed Data Management in Hybrid CloudHow to Power Innovation with Geo-Distributed Data Management in Hybrid Cloud
How to Power Innovation with Geo-Distributed Data Management in Hybrid Cloud
 
How to Evaluate Cloud Databases for eCommerce
How to Evaluate Cloud Databases for eCommerceHow to Evaluate Cloud Databases for eCommerce
How to Evaluate Cloud Databases for eCommerce
 
Webinar: DataStax Enterprise 6: 10 Ways to Multiply the Power of Apache Cassa...
Webinar: DataStax Enterprise 6: 10 Ways to Multiply the Power of Apache Cassa...Webinar: DataStax Enterprise 6: 10 Ways to Multiply the Power of Apache Cassa...
Webinar: DataStax Enterprise 6: 10 Ways to Multiply the Power of Apache Cassa...
 
Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi...
Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi...Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi...
Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi...
 
Webinar - Real-Time Customer Experience for the Right-Now Enterprise featurin...
Webinar - Real-Time Customer Experience for the Right-Now Enterprise featurin...Webinar - Real-Time Customer Experience for the Right-Now Enterprise featurin...
Webinar - Real-Time Customer Experience for the Right-Now Enterprise featurin...
 
Datastax - The Architect's guide to customer experience (CX)
Datastax - The Architect's guide to customer experience (CX)Datastax - The Architect's guide to customer experience (CX)
Datastax - The Architect's guide to customer experience (CX)
 
An Operational Data Layer is Critical for Transformative Banking Applications
An Operational Data Layer is Critical for Transformative Banking ApplicationsAn Operational Data Layer is Critical for Transformative Banking Applications
An Operational Data Layer is Critical for Transformative Banking Applications
 
Becoming a Customer-Centric Enterprise Via Real-Time Data and Design Thinking
Becoming a Customer-Centric Enterprise Via Real-Time Data and Design ThinkingBecoming a Customer-Centric Enterprise Via Real-Time Data and Design Thinking
Becoming a Customer-Centric Enterprise Via Real-Time Data and Design Thinking
 

Kürzlich hochgeladen

%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
masabamasaba
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
masabamasaba
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
masabamasaba
 

Kürzlich hochgeladen (20)

VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learn
 
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
 
10 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 202410 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 2024
 
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
 
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdfPayment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
 
The Top App Development Trends Shaping the Industry in 2024-25 .pdf
The Top App Development Trends Shaping the Industry in 2024-25 .pdfThe Top App Development Trends Shaping the Industry in 2024-25 .pdf
The Top App Development Trends Shaping the Industry in 2024-25 .pdf
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
%in Lydenburg+277-882-255-28 abortion pills for sale in Lydenburg
%in Lydenburg+277-882-255-28 abortion pills for sale in Lydenburg%in Lydenburg+277-882-255-28 abortion pills for sale in Lydenburg
%in Lydenburg+277-882-255-28 abortion pills for sale in Lydenburg
 
8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
 
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park %in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 

Building a Multi-Region Cluster at Target (Aaron Ploetz, Target) | Cassandra Summit 2016

  • 1. Building a Multi-Region Cluster at Target Presenters: Andrew From and Aaron Ploetz
  • 2. 1 Introduction 2 Target’s DCD Cluster 3 Problems 4 Solutions 5 Current State 6 Lessons Learned 2© DataStax, All Rights Reserved.
  • 3. © DataStax, All Rights Reserved. 3 TTS - CPE Cassandra Data model consulting Deployment Operations
  • 4. $ whoami_ • B.S.-MCS University of Wisconsin-Whitewater. • M.S.-SED Regis University. • 10+ years experience with distributed storage tech. • Supporting Cassandra in production since v0.8. • Contributor to the Apache Cassandra project (cqlsh). • Contributor to the Cassandra tag on Stack Overflow. • 3x DataStax MVP for Apache Cassandra (2014-17). © DataStax, All Rights Reserved. 4 Aaron Ploetz
  • 5. $ whoami_ • B.S. Computer Engineering University of Minnesota • Using Cassandra in production since v2.0 • Contributor to the Ratpack Framework Project. • Contributor to the Cassandra tag on Stack Overflow. • Maintainer for statsd-kafka-backend plugin on the Target public Github org • I am very passionate and interested in metrics, monitoring, and alerting (not just on Cassandra) © DataStax, All Rights Reserved. 5 Andrew From
  • 7. Cassandra clusters at Target © DataStax, All Rights Reserved. 7 Cartwheel Personalization GAM DCD Subscriptions Enterprise Services GDMItem LocationsCheckoutAdaptive-SEO
  • 8. Versions used at Target • Depends on experience of the application team. • Most clusters run DSE 4.0.3. • Some clusters have been built with Apache Cassandra 2.1.12-15. • Most new clusters built on 2.2.7. © DataStax, All Rights Reserved. 8
  • 9. Target’s DCD Cluster • Data footprint = 350Gb • Multi-tenant cluster, supporting several teams: • Rating and Reviews • Pricing • Item • Promotions • Content • A/B Testing • Back To School, Shopping Lists • …and others © DataStax, All Rights Reserved. 9
  • 10. 2015 Peak (Black Friday -> Cyber Monday) • DataStax Enterprise 4.0.3 (Cassandra 2.0.7.31) • Java 1.7 • CMS GC • 18 nodes (256GB RAM, 6-core, 24 HT CPUs, 1TB disk): • 6 nodes at two Target data centers (3 on-the-metal nodes each). • 12 nodes at two tel-co data centers (6 on-the-metal nodes each). • 500 Mbit connection with Tel-co datacenters. • Sustained between 5000 and 5500 TPS during Peak. • Zero downtime! © DataStax, All Rights Reserved. 10
  • 11. 2016 Q1: Expand DCD to the cloud Plan to expand DCD Cassandra cluster to cloud East and West regions: • Add six cloud instances in each region. • VPN connection to Target network. • Data locality: Support teams spinning-up scalable application servers in the cloud. • Went “live” early March 2016: • Cloud-West 6 nodes • Cloud-East 6 nodes • TTC – 3 nodes • TTCE – 3 nodes • Tel-co 1 – 6 nodes • Tel-co 2 – 6 nodes © DataStax, All Rights Reserved. 11
  • 13. Problems • VPN not stable between Cloud and Target. • GC Pauses causing application latency. • Orphaned repair streams building up over time. • Data inconsistency between Tel-co and Cloud. • Issues with large batches. • Issues with poor data models. © DataStax, All Rights Reserved. 13
  • 15. • VPN not stable between Cloud and Target • Gossip between Tel-co and Cloud was sketchy. • Nodes reporting as “DN” that were actually ”UN.” • Worked with Cloud Provider admins, reviewed architecture. • Set-up increased monitoring to determine DCD Cassandra downtimes, cross-referenced that with VPN connection logs with both our (Target) Cloud Platform Engineering (CPE) Network team and Cloud Provider admins. • Our CPE Network team worked with Cloud Provider to: • Ensure proper network configuration. • “Bi-directional” dead peer detection (DPD). • “DPD Responder” handles dead peer requests from Cloud endpoint. • Upgrade our VPN connections to 1Gbit. © DataStax, All Rights Reserved. 15
  • 16. • GC Pauses causing application latency • STW GC pauses of 10-20 seconds (or more) rendering nodes unresponsive, during nightly batch jobs (9pm – 6am). Most often around 2am. • Was a small issue just prior to 2015 Peak Season; became more of a problem once we expanded to the cloud. • Worked with our teams to refine their data models and write patterns. • Upgraded to Java 1.8. • Enabled G1GC…biggest “win.” © DataStax, All Rights Reserved. 16
  • 17. • Orphaned repair streams building up over time • Due to VPN inconsistency, repair streams between Cloud and Tel-co could not complete consistently. • Orphaned Streams (pending repairs) built-up over time, system load average in Tel-co and Target nodes rose, nodes eventually became unresponsive. • Examined our use cases, scheduled ”focused” repair jobs, and only for certain applications. nodetool –h <node_ip> repair –pr -hosts <source_ip1,source_ip2,etc…> • Only run repairs between Cloud and Target or Tel-co and Target. © DataStax, All Rights Reserved. 17
  • 18. • Data inconsistency between Tel-co and Cloud • Data inconsistency issues causing problems for certain teams. • Upgraded our Tel-co connection to 1Gbit, expandable to 10Gbit on request. • But this was ultimately the wall we could not get around. • Problems with repairs, but also issues with boot-strapping and decommissioning nodes • Met with ALL application teams: • Discussed a future plan to split-off the Cloud nodes (with six new Target nodes) into a new “DCD Cloud” cluster. • Talked through application requirements, determined who would need to move/split to “DCD Cloud.” • Also challenged the application teams on who really needed to be in both Cloud and Tel-co. Turns out that only the Pricing team needed both, and the rest could successfully serve their requirements from one or the other. © DataStax, All Rights Reserved. 18
  • 19. Issues with large batches • Teams unknowingly using batches: © DataStax, All Rights Reserved. 19
  • 20. Poor Data Models • Using Cassandra in a batch way: • Reading large tables entirely with SELECT * FROM table; • Re-inserting data to a table on a schedule, even when data has not changed • Lack of de-normalized tables to support specific queries • Queries using ALLOW FILTERING • “XREF” tables • Queue-like usage of tables • Extremely large row sizes • Abuse of Collection data types • Read before writing (or writing then immediately reading) © DataStax, All Rights Reserved. 20
  • 22. August 2016 • DataStax Enterprise 4.0.3 (Cassandra 2.0.7.31)… Upgrade to 2.1 planned. • Java 1.8 • G1GC 32GB Heap, MaxGCPauseMillis=500 • DCD Classic - 18 nodes (256GB RAM, 24 HT CPUs, 1TB disk): • 6 nodes at two Target data centers (3 on-the-metal nodes each). • 12 nodes at two Tel-co data centers (6 on-the-metal nodes each). • DCD Cloud - 18 nodes (256GB RAM, 24 HT CPUs): • 6 nodes at two Target data centers (3 on-the-metal nodes each). • 12 nodes at two Cloud data centers (6 i2.2xlarge nodes each). • Upgraded connection (up to 10 Gbit) with Tel-co. © DataStax, All Rights Reserved. 22
  • 23. Lessons Learned • Spend time on your data model! • Most overlooked aspect of Cassandra architecting. • Build relationships with your application teams. • Build tables to suit the query patterns. • Talk about consistency requirements with your app teams. • Ask the questions: • Do you really need to replicate everywhere? • Do you really need to read/write at LOCAL_QUORUM? • What is your anticipated read/write ratio? • Watch for tombstones! TTL-ing data helps, but TTLs are not free! © DataStax, All Rights Reserved. 23
  • 24. Lessons Learned (part 2) • Involve your network team. • Use a metrics solution (Graphite/Grafana, OpsCenter, etc) • Give them exact data to work with. • When building a new cluster with data centers in the cloud, thoroughly test- out the operational aspects of your cluster: • Bootstrapping/decommissioning a node. • Running repairs. • Trigger a GC (cassandra-stress can help with that). • G1GC helps for larger heap sizes, but it’s not a silver bullet. © DataStax, All Rights Reserved. 24