@WrathOfChris blog.wrathofchris.com github.com/WrathOfChris
Time Series Metrics
with Cassandra
About Me
• Chris Maxwell
• @WrathOfChris
• Sr Systems Engineer @
Ubiquiti Networks
• Cloud Guy
• DevOps
Mission
• Metrics service for internal services
• Deliver 30 days (scoped down from 90, then 60) of system and app metrics
• Gain experience with Cassandra
History
Ancient Designs
Aging Tools
Pitfalls
https://flic.kr/p/6pqVnP
Graphite (v1)
• Single instance
• carbon-relay + 2-4 carbon-cache processes (one per CPU)
Graphite (v1)
Problems:
• Single point of SUCCESS!
• Can grow to 16-32 cores, but I/O saturates
• Carbon write-amplifies 10x
(flushes every 10s)
Graphite (v2)
• Frontend: carbon-relay
• Backend: carbon-relay +
4x carbon-cache
• m3.2xlarge ephemeral SSD
• Manual consistent-hash by IP
• Replication 3
Graphite (v2)
Problems:
• Kind of like a Dynamo, but not
• Replacing a node requires a full partition key shuffle
• Adding 5 nodes took 6 days at 1Gbps to re-replicate the ring
• Less than 50% disk free means
pain during reshuffle
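Why a membership change can force a full shuffle: with placement by hash modulo node count, almost every key changes owner, while a Dynamo-style ring moves only the new node's share. A minimal Python sketch, assuming md5 hashing, 64 virtual nodes per node, and made-up metric paths and IPs (a generic illustration, not the v2 hashing code):

```python
import bisect
import hashlib


def stable_hash(key: str) -> int:
    """64-bit hash of a string; md5 is used only for even spread, not security."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


def mod_n_owner(key: str, nodes: list) -> str:
    """Naive placement: hash modulo node count. Changing len(nodes) remaps most keys."""
    return nodes[stable_hash(key) % len(nodes)]


class Ring:
    """Consistent-hash ring with `vnodes` points per physical node."""

    def __init__(self, nodes: list, vnodes: int = 64):
        self.points = sorted(
            (stable_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self.hashes = [h for h, _ in self.points]

    def owner(self, key: str) -> str:
        # First ring point at or after the key's hash, wrapping around to the start.
        i = bisect.bisect_left(self.hashes, stable_hash(key)) % len(self.points)
        return self.points[i][1]


def moved_fraction(keys, before, after) -> float:
    """Fraction of keys whose owner differs between two placement functions."""
    return sum(before(k) != after(k) for k in keys) / len(keys)


if __name__ == "__main__":
    keys = [f"servers.host{i}.cpu.user" for i in range(20_000)]  # hypothetical metric paths
    old_nodes = [f"10.0.0.{i}" for i in range(1, 6)]               # hypothetical node IPs
    new_nodes = old_nodes + ["10.0.0.6"]                           # add one node

    naive = moved_fraction(keys, lambda k: mod_n_owner(k, old_nodes),
                           lambda k: mod_n_owner(k, new_nodes))
    old_ring, new_ring = Ring(old_nodes), Ring(new_nodes)
    ring = moved_fraction(keys, old_ring.owner, new_ring.owner)
    print(f"modulo placement moved {naive:.0%} of keys; ring moved {ring:.0%}")
```

Expect the modulo version to move most keys and the ring to move roughly the new node's share, all of it onto the new node.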
Limitations
• Cloud Native
• Avoid Manual Intervention
• Ephemeral SSD > EBS
https://flic.kr/p/2hZy6P
Design
What we set out to build
https://flic.kr/p/2spiXb
Graphite (v3)
…it got complicated…
Graphite (v3)
Ingest:
• carbon-c-relay
https://github.com/grobian/carbon-c-relay
• cyanite
https://github.com/pyr/cyanite
• cassandra
Graphite (v3)
Retrieval:
• graphite-api
https://github.com/brutasse/graphite-api
• grafana
https://github.com/grafana/grafana
• cyanite
https://github.com/pyr/cyanite
• elasticsearch
(metric path cache)
Journey
Lessons learned along the way
https://flic.kr/p/hjY15L
Size Tiered Compaction
• Sorted String Table (SSTable)
is an immutable data file
• New data written to small
SSTables
• Periodically merged into larger
SSTables
Size Tiered Compaction
• Merge 4 similarly sized
SSTables into 1 new SSTable
• Data migrates into larger SSTables that are compacted less frequently
• Disk space required:
Sum of 4 largest SSTables
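The bucketing rule can be sketched in Python. This is a simplified model built on the STCS options bucket_low (0.5), bucket_high (1.5), min_sstable_size (50 MB) and min_threshold (4); Cassandra's own implementation differs in details such as bucket ordering:

```python
def size_tiered_buckets(sizes_mb, bucket_low=0.5, bucket_high=1.5, min_sstable_size_mb=50):
    """Group SSTable sizes into buckets of 'similar' size, size-tiered style.

    A size joins the first bucket whose average it falls within
    (bucket_low * avg, bucket_high * avg); SSTables smaller than
    min_sstable_size_mb are lumped together regardless of ratio.
    """
    buckets = []  # each bucket is a list of sizes
    for size in sorted(sizes_mb):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            similar = bucket_low * avg < size < bucket_high * avg
            both_small = size < min_sstable_size_mb and avg < min_sstable_size_mb
            if similar or both_small:
                bucket.append(size)
                break
        else:
            buckets.append([size])
    return buckets


def compaction_candidates(sizes_mb, min_threshold=4, max_threshold=32):
    """Buckets holding at least min_threshold SSTables are eligible to merge into one."""
    return [b[:max_threshold] for b in size_tiered_buckets(sizes_mb) if len(b) >= min_threshold]


def worst_case_free_space_mb(sizes_mb, min_threshold=4):
    """Headroom needed if the largest min_threshold SSTables end up merged together."""
    return sum(sorted(sizes_mb)[-min_threshold:])


if __name__ == "__main__":
    sizes = [9, 10, 11, 12, 150, 160, 170, 700]  # hypothetical SSTable sizes in MB
    print(size_tiered_buckets(sizes))
    print(compaction_candidates(sizes))
    print(worst_case_free_space_mb(sizes))
```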
Size Tiered Compaction
• Updating a partition frequently
may cause it to be spread
between SSTables
• Metrics workload writes to
all partitions,
every period
Size Tiered Compaction
• Metrics workload writes to
all partitions,
every period
• Range queries that spanned
50+ SSTables !!!
Size Tiered Compaction
• Getting to the older data…
• Ingest 25% more data
• Major Compaction:
• Requires 50% free space
• Compacts all SSTables into
1 large SSTable
Aside: DELETE
• DELETE is the INSERT of a
TOMBSTONE to the end of a
partition
• INSERTs with TTL become
tombstones in the future
• Tombstones live for at least
gc_grace_seconds
• Data is only deleted during
compaction
https://flic.kr/p/35RACf
gc_grace_seconds
Grace is getting something you don’t deserve
(the time you have to nodetool repair a node that was down)
gc_grace_seconds
deleted data reappears!
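How a tombstone purged before repair lets deleted data come back, as a toy two-replica model in Python. Assumptions: last-write-wins by timestamp, timestamps in seconds, and the tombstone's write time standing in for its deletion time (Cassandra tracks these separately):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cell:
    value: Optional[str]  # None marks a tombstone
    written_at: int       # write timestamp in seconds (Cassandra really uses microseconds)


def reconcile(a: Optional[Cell], b: Optional[Cell]) -> Optional[Cell]:
    """Last-write-wins merge of two replicas' copies; on a timestamp tie the tombstone wins."""
    if a is None or b is None:
        return a if b is None else b
    if a.written_at != b.written_at:
        return a if a.written_at > b.written_at else b
    return a if a.value is None else b


def compact(cell: Optional[Cell], now: int, gc_grace_seconds: int) -> Optional[Cell]:
    """Compaction may purge a tombstone only once it is older than gc_grace_seconds."""
    if cell is not None and cell.value is None and now - cell.written_at > gc_grace_seconds:
        return None
    return cell


GC_GRACE = 864_000  # 10 days, Cassandra's default gc_grace_seconds

# Both replicas held the value; the DELETE at t=100 reached replica A only (B was down).
replica_a = Cell(None, 100)
replica_b = Cell("42", 0)

# Repair within gc_grace: the tombstone still exists and shadows the old value.
assert reconcile(compact(replica_a, 100 + GC_GRACE, GC_GRACE), replica_b).value is None

# Repair after gc_grace: A has already purged the tombstone, so B's stale value wins.
resurrected = reconcile(compact(replica_a, 100 + GC_GRACE + 1, GC_GRACE), replica_b)
print(resurrected)
```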
Time To Live
• INSERT with TTL becomes
tombstone after expiry
• 10s for 6 hours
• 60s for 3 days
• 300s for 30 days
https://flic.kr/p/6Fxv7M
TTL
• gc_grace_seconds is 10 days
(by default)
• 10s for 6 hours: 10.25 days on disk
• 60s for 3 days: 13 days on disk
• 300s for 30 days: 40 days on disk
https://flic.kr/p/gBLHYf
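The arithmetic behind the "on disk" figures, in Python, assuming every cell in a table carries the same TTL and the default gc_grace_seconds of 864000 (10 days). The results are lower bounds, since expired data is only removed by a compaction that rewrites its SSTable:

```python
SECONDS_PER_DAY = 86_400
GC_GRACE_SECONDS = 10 * SECONDS_PER_DAY  # default gc_grace_seconds (864000)

# Resolution -> TTL, taken from the Time To Live slide.
RETENTION = {
    "10s": 6 * 3600,
    "60s": 3 * SECONDS_PER_DAY,
    "300s": 30 * SECONDS_PER_DAY,
}


def min_days_on_disk(ttl_seconds: int, gc_grace_seconds: int = GC_GRACE_SECONDS) -> float:
    """Lower bound: the cell expires after its TTL, then the tombstone must outlive gc_grace."""
    return (ttl_seconds + gc_grace_seconds) / SECONDS_PER_DAY


for resolution, ttl in RETENTION.items():
    days = min_days_on_disk(ttl)
    print(f"{resolution}: keep {ttl / SECONDS_PER_DAY:g} days, on disk >= {days:g} days "
          f"({days * SECONDS_PER_DAY / ttl:.1f}x the intended retention)")
```

The 10s table is hit hardest: 6 hours of useful data can occupy disk for more than 40 times that long.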
https://flic.kr/p/4LNiXg
https://flic.kr/p/35RACf
1.4TB
Disks
Levelled Compaction
based on Google’s LevelDB implementation
Levelled Compaction
• Data is ingested at Level 0
• Immediately compacted and
merged with L1
• Partitions are merged up to Ln
• 90% of partition data
guaranteed to be in same level
Levelled Compaction
• Metrics workload writes to
all partitions,
every period
• Immediately rolled up to L1
• Immediately rolled up to L2
• Immediately rolled up to L3
• Immediately rolled up to L4
• Immediately rolled up to L5
Levelled Compaction
• Metrics workload writes to
all partitions,
every period
• 1 batch of writes —> 5 writes
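A rough Python model of why each level costs another write, assuming 160 MB SSTables (a hypothetical sstable_size_in_mb setting) and leveled compaction's 10x growth per level. It counts level promotions only; each promotion also rewrites the overlapping SSTables in the next level, so real write amplification is higher:

```python
def lcs_levels(data_mb: float, sstable_mb: float = 160, fanout: int = 10) -> int:
    """Levels (L1..Ln) needed to hold data_mb, where level n holds ~sstable_mb * fanout**n."""
    levels, capacity = 0, 0.0
    while capacity < data_mb:
        levels += 1
        capacity += sstable_mb * fanout ** levels
    return levels


def writes_per_batch(data_mb: float, **kwargs) -> int:
    """Lower-bound count of times one batch is written: the L0 flush plus one per level."""
    return 1 + lcs_levels(data_mb, **kwargs)


if __name__ == "__main__":
    data_mb = 500_000  # hypothetical per-node data size
    print(lcs_levels(data_mb), writes_per_batch(data_mb))
```

Because the metrics workload touches every partition every period, every batch pays this cost at every level, which is why the write rate climbs while ingest stays flat.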
Increasing Write rate
Constant Ingest rate
https://flic.kr/p/4LNiXg
compaction_throughput_mb_per_sec: 128
…then 0 (unlimited)
Speeding Compactions
… Don’t Do This …
multithreaded_compaction: true
in_memory_compaction_limit_in_mb: 256
Date Tiered Compaction
Date Tiered Compaction
• Written by
Björn Hegerfors at Spotify
• Experimental!
• Released in 2.0.11 / 2.1.1
• Group data by time
• Compact by time
• Drop expired data by time
Compact SSTables by date window
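A simplified Python sketch of the two ideas on these slides: grouping SSTables by time window, and dropping whole files once everything in them has expired. Assumptions: fixed-size windows (DTCS's real windows grow larger as data ages), timestamps in seconds, one TTL per table; Cassandra's actual expiry check is more careful about overlapping SSTables:

```python
from collections import defaultdict


def group_by_window(sstables, window_seconds):
    """Bucket SSTables by the fixed time window containing their newest write.

    sstables: iterable of (name, min_timestamp, max_timestamp), timestamps in seconds.
    """
    windows = defaultdict(list)
    for name, _min_ts, max_ts in sstables:
        windows[max_ts // window_seconds].append(name)
    return dict(windows)


def fully_expired(sstables, now, ttl_seconds, gc_grace_seconds):
    """SSTables whose newest cell is past TTL + gc_grace can be dropped as whole files."""
    return [name for name, _min_ts, max_ts in sstables
            if max_ts + ttl_seconds + gc_grace_seconds < now]


if __name__ == "__main__":
    sstables = [("a", 0, 3599), ("b", 3600, 7199), ("c", 3000, 3500)]  # hypothetical
    print(group_by_window(sstables, window_seconds=3600))
    print(fully_expired(sstables, now=10_000, ttl_seconds=1_000, gc_grace_seconds=5_000))
```

Dropping a whole expired file needs no merge and no extra free space, which is what makes time-ordered compaction a good fit for TTL'd metrics.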
– but the docs say 8GB maximum heap!
MAX_HEAP_SIZE=16G
HEAP_NEWSIZE=2048M
-XX:+CMSScavengeBeforeRemark
-XX:CMSMaxAbortablePrecleanTime=60000
-XX:CMSWaitDuration=30000
– Rick Branson, Instagram
http://www.slideshare.net/planetcassandra/cassandra-summit-2014-cassandra-at-instagram-2014
All systems normal
Inadvertently tested 30,000 writes/sec during launch
Cloud Native
http://wattsupwiththat.com/2015/03/17/spaceship-lenticular-cloud-maybe-the-coolest-cloud-picture-evah/
Cloud Native
Ec2MultiRegionSnitch
Cloud Native
Ephemeral RAID0
-Djava.io.tmpdir=/mnt/cassandra/tmp
Disable AutoScaling Terminate Process:
aws autoscaling suspend-processes --auto-scaling-group-name <asg-name> --scaling-processes Terminate
Cloud Native
This design works to 50 instances per region
Security Groups
IAM instance-profile role
Security Group + (per region) Security Group
Management (OpsCenter)
IAM instance-profile role
Security Group + (per region) Security Group
Internode Encryption
server_encryption_options:
    internode_encryption: all
• keytool -genkeypair -alias test-cass -keyalg RSA -validity 3650 -keystore test-cass.keystore
• keytool -export -alias test-cass -keystore test-cass.keystore -rfc -file test-cass.crt
• keytool -import -alias test-cass -file test-cass.crt -keystore test-cass.truststore
Seeds
Cheated…
Seeds
• Selects the first 3 nodes in each region, in Auto Scaling group order
• Ignores itself as a seed, so the first 3 nodes in each region can still bootstrap
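The seed rule as a Python sketch. The input shape (region to private IPs, already in Auto Scaling group order) and the function name are hypothetical, and fetching group membership from AWS is not shown. Leaving a node out of its own seed list matters because Cassandra does not bootstrap a node that considers itself a seed:

```python
def pick_seeds(asg_members, self_ip, per_region=3):
    """Seed list: first `per_region` members of each region's Auto Scaling group, minus self.

    asg_members: dict of region -> private IPs, already in Auto Scaling group order
    (hypothetical input shape).
    """
    seeds = []
    for region in sorted(asg_members):  # deterministic region order
        for ip in asg_members[region][:per_region]:
            if ip != self_ip:  # never list ourselves, so this node still bootstraps
                seeds.append(ip)
    return seeds


if __name__ == "__main__":
    members = {  # hypothetical regions and IPs
        "us-east-1": ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"],
        "us-west-2": ["10.1.0.1", "10.1.0.2"],
    }
    # cassandra.yaml expects seeds as one comma-separated string.
    print(",".join(pick_seeds(members, self_ip="10.0.0.2")))
```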
General
• >= 4 cores per node, always
• >= 8 cores as soon as feasible
• EC2 sweet spots:
  • m3.2xlarge (8c/160GB) for small workloads
  • i2.2xlarge (8c/1.6TB) for production
  • Avoid c3.2xlarge: CPU:Mem ratio is too high
Breaking News!
Dense-storage Instances for EC2
Questions?
d2 instances
Joining a node - system/network
d2 instances
Joining a node - disk performance
General
Metrics
General
Cassandra Metrics
Metrics
CPU - DateTiered
Metrics
JVM - DateTiered
Metrics
Compaction/CommitLog - DateTiered

Cassandra meetup 20150331