Running Cassandra in AWS
Patrick Eaton, PhD
patrick@stackdriver.com
@PatrickREaton

Joey Imbasciano
joey@stackdriver.com
@_joeyi
Stackdriver at a Glance

Stackdriver's hosted intelligent monitoring service helps
SaaS companies innovate more by reducing the burden of
day-to-day operations
● Cloud-native and cloud-aware
● Designed for complex distributed applications
● Founded by cloud/infrastructure industry veterans
(Microsoft, VMware, EMC, Endeca, Red Hat) with deep
systems and DevOps expertise
● Team of ~25, based in Downtown Boston
Intelligent Monitoring
Discover customer’s cloud-hosted applications
● Infrastructure inventory
● Logical units, like groups/clusters
● Services, hosted and self-managed
● Elastic resources

Monitor
● Various data sources
○ Provider metrics
○ Host metrics
○ Custom metrics
○ Endpoints
○ Events
○ Health
● Rich visualizations

Analyze
● Integrate data sources
● Aggregate metrics
● Report utilization, cost, etc.
● Detect policy violations
● Recommend actions
Lambda Architecture
● Typical of modern architectures for on-line applications
● Formalized by Nathan Marz
● Composed of "batch", "speed", and "serving" layers
● Batch layer
○ Store of record
○ Compute arbitrary views
● Speed layer
○ Low latency updates
○ Streaming algorithms
● Serving layer
○ Combine data from batch and speed layers to answer queries

[Diagram: Data, Batch, Speed, and Serving layers]
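
As a rough illustration of how the three layers fit together, the Python sketch below keeps a batch view and a speed view of per-metric event counts and lets the serving layer combine them at query time. The function names and the counting workload are assumptions made for this example; they do not come from the talk.

```python
from collections import Counter

def build_batch_view(master_dataset):
    """Batch layer: recompute a view from the full store of record."""
    return Counter(event["metric"] for event in master_dataset)

def update_speed_view(speed_view, event):
    """Speed layer: low-latency incremental update for one new event."""
    speed_view[event["metric"]] += 1

def serve_query(batch_view, speed_view, metric):
    """Serving layer: combine batch and speed results to answer a query."""
    return batch_view[metric] + speed_view[metric]

# Events already covered by the last batch run.
master_dataset = [{"metric": "cpu"}, {"metric": "cpu"}, {"metric": "disk"}]
batch_view = build_batch_view(master_dataset)

# An event that arrived after the batch run is visible only via the speed view.
speed_view = Counter()
update_speed_view(speed_view, {"metric": "cpu"})

print(serve_query(batch_view, speed_view, "cpu"))  # 3
```

In a real deployment the speed view would be discarded once a later batch run has absorbed the same events; that bookkeeping is left out here.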
Stackdriver Architecture
● Shares characteristics of lambda architecture
● Indexing (speed) path
○ Make "live" data available "pre-analysis"
● Analysis (batch) path
○ Compute aggregations
○ Create recommendations
● Query (serving) layer
○ Combine "live" and analyzed data to answer queries
○ May require on-the-fly analysis
● Alerting (speed) path (not discussed here)
○ Stream processing to detect policy-based anomalies

[Diagram: Data, Indexing (Speed), Analysis (Batch), Alerting (Speed), Database, Query (Serving), Notification (Serving)]
Database Options
● We chose Cassandra!
○ True P2P architecture
○ Good support for write-heavy workloads
○ Compatible data model for time series data
■ Column per metric type, timestamps as columns
● Why not MySQL?
○ Experience with operating large, sharded deployments
○ Relational data model not a good match
● Why not HBase?
○ Operational complexity - zk, hadoop, hdfs, ...
○ Special "Master" role
● Why not Dynamo?
○ Avoid vendor lock-in and high cost
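
One possible reading of "timestamps as columns" is a wide row per (resource, metric type) pair whose column names are timestamps, kept in sorted order so that a time range is a contiguous slice. The Python sketch below models that layout in memory. The class, the row key, and the resource ID are hypothetical; the slides do not show the actual schema.

```python
import bisect

class WideRow:
    """One row of time-series data: sorted timestamp columns mapping to values."""

    def __init__(self):
        self._timestamps = []   # column names, kept sorted
        self._values = {}       # timestamp -> value

    def append(self, timestamp, value):
        # Writes only add columns; an existing column is never modified.
        if timestamp not in self._values:
            bisect.insort(self._timestamps, timestamp)
            self._values[timestamp] = value

    def slice(self, start, end):
        """Return (timestamp, value) pairs with start <= timestamp < end."""
        lo = bisect.bisect_left(self._timestamps, start)
        hi = bisect.bisect_left(self._timestamps, end)
        return [(t, self._values[t]) for t in self._timestamps[lo:hi]]

rows = {}  # (resource_id, metric_type) -> WideRow

def write_point(resource_id, metric_type, timestamp, value):
    rows.setdefault((resource_id, metric_type), WideRow()).append(timestamp, value)

# Points may arrive out of order; the row keeps them sorted by timestamp.
write_point("i-hypothetical", "cpu", 120, 0.7)
write_point("i-hypothetical", "cpu", 60, 0.4)
write_point("i-hypothetical", "cpu", 180, 0.9)

print(rows[("i-hypothetical", "cpu")].slice(60, 180))  # [(60, 0.4), (120, 0.7)]
```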
Stackdriver Architecture ++
● Archival pipeline stores all data
○ Very small surface area, battle-tested
○ Critical for disaster recovery
○ S3 considered durable enough
○ Replicated for availability
● Archive means Cassandra is "soft state"
● C* consolidates analysis and indexing results
● Properties of data in C*
○ Immutable data
○ Append-only
○ Read-1, write-1 consistency
● Scales out easily
○ Indexers, archivers, analyzers, query servers

[Diagram: Data, Index, Archive, S3, Analyze, Cassandra (Inventory, Data Series, Roll-ups, Analysis, Recs), Query]
Cassandra at Stackdriver: Cluster Configuration
● Version: DataStax Community Edition 1.2.10
● Replication Factor: 3
● Vnodes
● Murmur3Partitioner
● Ec2Snitch
○ Aids in request efficiency
○ Enables Cassandra to ensure replicas are in different Availability Zones
● phi_convict_threshold: 8 -> 12
○ Used to determine when nodes are down
○ AWS network can be spotty
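
To give a feel for what raising phi_convict_threshold from 8 to 12 buys, the Python sketch below uses a simplified phi accrual model in which heartbeat gaps are exponentially distributed. This is a textbook approximation and not Cassandra's actual failure detector code; the one-second mean heartbeat interval is an assumption.

```python
import math

def phi(seconds_since_last_heartbeat, mean_heartbeat_interval):
    """Suspicion level under an exponential inter-arrival model.

    phi = -log10(P(gap > t)) = t / (mean * ln 10)
    """
    return seconds_since_last_heartbeat / (mean_heartbeat_interval * math.log(10))

def silence_before_conviction(threshold, mean_heartbeat_interval):
    """Seconds of silence needed before phi reaches the threshold."""
    return threshold * mean_heartbeat_interval * math.log(10)

mean_interval = 1.0  # assumed: roughly one heartbeat per second
for threshold in (8, 12):
    # Prints the threshold and the silence (in seconds) it tolerates.
    print(threshold, round(silence_before_conviction(threshold, mean_interval), 1))
```

Under these assumptions the higher threshold tolerates about 50% more silence before a node is marked down, which is the intended trade-off on a network with occasional hiccups: fewer false convictions in exchange for slower detection of real failures.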
Cassandra Topology in AWS
[Diagrams: two cluster layouts, "Where we started..." and "Where we are...", with nodes in us-east-1a, us-east-1b, and us-east-1c]

Keep it balanced!
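
A minimal Python sketch of the "keep it balanced" rule follows: it checks whether every Availability Zone holds the same number of nodes and picks the zone for the next node when growing the cluster. The helper names are hypothetical, and this is not the tooling described in the talk.

```python
from collections import Counter

ZONES = ("us-east-1a", "us-east-1b", "us-east-1c")

def is_balanced(node_zones, zones=ZONES):
    """True when every zone holds the same number of nodes."""
    counts = Counter(node_zones)
    return len({counts[z] for z in zones}) == 1

def next_zone(node_zones, zones=ZONES):
    """Pick the least-populated zone for the next node (ties go to the first)."""
    counts = Counter(node_zones)
    return min(zones, key=lambda z: counts[z])

cluster = ["us-east-1a", "us-east-1b", "us-east-1c", "us-east-1a"]
print(is_balanced(cluster))  # False
print(next_zone(cluster))    # us-east-1b
```

With a replication factor of 3 and one replica per zone, a zone with fewer nodes would carry more data per node, so growing the cluster in multiples of three keeps the load even.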
Cassandra EC2 Node Configuration
● m1.xlarge
○ 4 cores
○ 15 GB RAM
○ 4 ephemeral disks available
● 4 disks RAID-0 for Data Volume and CommitLog
○ ext4 - defaults,noatime
○ mdadm RAID-0
○ Compactions
○ Heavy Read/Write IO
Cassandra Automation and Operations
● Combination of Boto, Fabric, & Puppet
○ Boto for AWS API
○ Fabric + Puppet for Bootstrapping
○ Fabric for Operations
● One command to:
○ Launch a new cluster
○ Upsize a cluster
○ Replace a dead node
○ Remove existing nodes
○ List nodes in a cluster
Our (Internal) Slogan
Cassandra Backups using S3
● No Cassandra Powered Backups
● Restore from S3
● Useful for major version upgrades

[Diagram: Data, S3, Bulk Loader, MapReduce, Cassandra]

1. Data is archived when it is received
2. Bulk loader reads from S3
3. M/R re-analyzes data
4. Cassandra is repopulated
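
The four restore steps can be sketched in Python as follows. This is an in-memory stand-in: a list plays the role of S3, a dictionary plays the role of the repopulated cluster, and the per-metric mean is an assumed example of an aggregation. None of the function names come from the talk.

```python
from collections import defaultdict

archive = []  # stands in for S3: every data point is appended on receipt

def receive(point):
    """Step 1: archive each point as it arrives."""
    archive.append(point)

def bulk_load(archived_points):
    """Step 2: read raw points back out of the archive."""
    return list(archived_points)

def reanalyze(points):
    """Step 3: recompute aggregations (here, a per-metric mean)."""
    grouped = defaultdict(list)
    for metric, _timestamp, value in points:
        grouped[metric].append(value)
    return {metric: sum(vals) / len(vals) for metric, vals in grouped.items()}

def repopulate(points, rollups):
    """Step 4: build a fresh store from raw data plus recomputed roll-ups."""
    return {"data_series": sorted(points), "rollups": rollups}

for point in [("cpu", 60, 0.5), ("cpu", 120, 1.0), ("disk", 60, 0.25)]:
    receive(point)

points = bulk_load(archive)
new_cluster = repopulate(points, reanalyze(points))
print(new_cluster["rollups"])  # {'cpu': 0.75, 'disk': 0.25}
```

Because the archived points are immutable, replaying them any number of times yields the same result, which is what allows the database to be treated as rebuildable state.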
Disaster Recovery in the Wild
● October 23, Stackdriver suffered a total loss of our C* cluster
○ Exhausted memory due to number of open file descriptors (see graph)
● We did not notice the problem until it was too late
○ Nodes began crashing, resulting in an inconsistent view of the ring
● Attempted to restart the cluster unsuccessfully for ~2 hours
● Provisioned a new 36-node cluster in ~2 hours
● Directed “live” data to new cluster
● Started bulk restore operation from archive
○ Full-fidelity data and aggregations
● No data loss due to archival pipeline
● See http://www.stackdriver.com/post-mortem-october-23-stackdriver-outage/
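
The slides do not say how file descriptor usage was monitored afterwards. One simple check, sketched below in Python, compares a process's open descriptor count against its soft limit and flags it once usage crosses a threshold. The 80% threshold and the function names are assumptions; the `/proc` lookup is Linux-only and here inspects the current process rather than a Cassandra node.

```python
import os
import resource

def fd_usage_ratio(open_fds, soft_limit):
    """Fraction of the file descriptor limit currently in use."""
    return open_fds / soft_limit

def should_alert(open_fds, soft_limit, threshold=0.8):
    """True once usage reaches the (assumed) alerting threshold."""
    return fd_usage_ratio(open_fds, soft_limit) >= threshold

def current_process_fds():
    """Count this process's open descriptors (Linux-only: reads /proc)."""
    return len(os.listdir("/proc/self/fd"))

soft_limit, _hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)
# Skip the live check where /proc is unavailable or the limit is unbounded.
if os.path.isdir("/proc/self/fd") and soft_limit != resource.RLIM_INFINITY:
    print(should_alert(current_process_fds(), soft_limit))
```

To watch another process, the same count can be taken from `/proc/<pid>/fd`, given sufficient permissions.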
Cluster Restoration Process
[Diagram labels: S3, MapReduce, Bulk Loader, Historical Data, New Cluster, UI, API, Gateway, New Data, Old Cluster]
Thank you!
Yes, we are hiring!
Patrick Eaton - patrick@stackdriver.com - @PatrickREaton
Joey Imbasciano - joey@stackdriver.com - @_joeyi
