SlideShare ist ein Scribd-Unternehmen logo
1 von 32
Downloaden Sie, um offline zu lesen
Evolution of
Cassandra at Signal
Matthew Kemp
Matching Service
Unified Customer View
Measurement and Activation
Metrics
Using KairosDB and Cyanite
Audit Log
Event persistence
Our Cassandra Use Cases
Our First Use Cases
Cassandra Timeline: 2011
Evaluated
Cassandra 0.8,
Couch, Mongo
and Riak
Summer 2011 Fall 2011 Winter 2011
Decided on
Cassandra
(1.0 just released)
Initial AWS
deployment in
two regions
(8 total nodes)
Spring 2011
create column family signal_data
with comparator = 'UTF8Type'
and default_validation_class = 'UTF8Type'
and key_validation_class = 'UTF8Type'
and read_repair_chance = 0.0
and gc_grace = 864000;
Our Very First Schema
Example Data Rows
Row Key Columns ...
a111 wwww xxxx yyyy
w111 x111 y111
b222 xxxx zzzz
x222 z222
c333 wwww yyyy zzzz
w333 y333 z333
Attempted to use
SuperColumns
and started using
Priam
Attempted to use
Composite
Columns and
switched to
m2.xlarges
Upgraded to
Cassandra 1.1
and start of
production usage
Added index
column family
Cassandra Timeline: 2012
Summer 2012 Fall 2012 Winter 2012Spring 2012
Priam
Because nobody wants to deal with this
Pre-CQL Data Types
Row Key Columns ...
a111 wwww:id wwww:attr_1 wwww:attr_2 xxxx:id
w111 value1 value2 x111
b222 xxxx:id
x222
c333 wwww:id wwww:attr_2
w333 something
Example Index Rows
Row Key Columns ... ... continued ...
wwww:w111 sid xxxx:x222 sid
a111 b222
xxxx:x111 sid zzzz:z222 sid
a111 b222
yyyy:y111 sid wwww:w333 sid
a111 c333
Upgraded to
Cassandra 1.2.9
and schema
redesign
Performance
testing, evaluating
1.2.x, switched to
Agathon
Upgraded to
Virtual Nodes and
started using
OpsCenter
Cassandra Timeline: 2013
Summer 2013 Fall 2013 Winter 2013Spring 2013
CREATE KEYSPACE signal WITH replication = {
'class': 'NetworkTopologyStrategy',
'eu-west': '2',
'ap-northeast': '2',
'us-west': '2',
'us-east': '2'
};
Keyspace
CREATE TABLE signal_data (
sid uuid,
identifier varchar,
ids map<varchar, varchar>,
data map<varchar, varchar>,
internal map<varchar, varchar>
PRIMARY KEY (sid, identifier)
);
Data Table
Example Data Rows
sid identifier ids data internal
a111 wwww {'id':'w111'} {'attr_1':'value1'} {}
a111 xxxx {'xid':'x111'} {'attr_2':'value-a'} {}
a111 yyyy {'id':'y111'} {} {}
b222 xxxx {'xid':'x222'} {'attr_2':'value-b'} {}
b222 zzzz {'zid':'z222'} {} {}
CREATE TABLE signal_index (
identifier varchar,
partition int,
id_name varchar,
id_value varchar,
sid uuid
PRIMARY KEY (
(identifier, partition), id_name, id_value)
);
Index Table
Example Index Rows
identifier partition id_name id_value sid
wwww 128 id w111 a111
xxxx 84 xid x111 a111
yyyy 71 id y111 a111
xxxx 193 xid x222 b222
zzzz 3 zid z222 b222
partition = abs(hash(id_name + id_value)) % partitions
Migration To Virtual Nodes
Old Ring
(read only)
Data
Access
Service
"Migration"
Script
New Ring
(read & write)
Upgraded to i2.
xlarges and
Cassandra 1.2.16
Scaling clusterSchema
improvements
and KariosDB
backed by
Cassandra
Cassandra Timeline: 2014
Summer 2014 Fall 2014 Winter 2014Spring 2014
AWS Instance Types
Instance Type ECUs Virtual CPUs Memory Disks
m2.xlarge 6.5 2 17.1 1x420GB
m2.2xlarge 13 4 34.2 4x420GB
i2.xlarge 14 4 30.5 1x800GB SSD
SSD Performance: Writes
Upgrade to SSDs Upgrade to 1.2.16
SSD Performance: Reads
Upgrade to SSDs Upgrade to 1.2.16
Added a new
AWS region and
rolled out new
audit log use
case
(separate ring)
???Planned schema
redesign and
focus on
performance
(latency)
Cassandra Timeline: 2015
Summer 2015 Fall 2015 Winter 2015Spring 2015
Metrics and
reporting
(Spark?)
Adding a New Region
West
EastEurope
Tokyo
Sydney
Reduce Latency: Writes
Reduce Latency: Reads
Currently run as a set of ad hoc python scripts
Considering adding reporting tables
Centered around set membership and overlap
Considering using Spark as job runner
Metrics and Reporting
140+ Nodes (i2.xlarge)
15B+ rows in data table
45B+ rows in index table
30+ TB of data
Some Ring Stats
We've survived …
Amazon swallowing nodes
Amazon rebooting the internet
Disk Failures
Corrupt SSTables
Intra and inter region network issues
Dealing With Failure
Use SSDs, don't skimp on hardware
Stay current, but not too current
Keep up on repairs
Test various configurations for your use case
Simulating production load is important
Advice
"The best way to approach data modeling for
Cassandra is to start with your queries and work
backwards from there. Think about the actions
your application needs to perform, how you want
to access the data, and then design column
families to support those access patterns."
The Best Advice
Questions?
Contact Info
mkemp@signal.co
@mattkemp
/in/matthewkemp

Weitere ähnliche Inhalte

Was ist angesagt?

Was ist angesagt? (20)

Azure Functions - Get rid of your servers, use functions!
Azure Functions - Get rid of your servers, use functions!Azure Functions - Get rid of your servers, use functions!
Azure Functions - Get rid of your servers, use functions!
 
New Analytics Toolbox DevNexus 2015
New Analytics Toolbox DevNexus 2015New Analytics Toolbox DevNexus 2015
New Analytics Toolbox DevNexus 2015
 
Tsunami alerting with Cassandra (From 0 to Cassandra on AWS in 30 days)
Tsunami alerting with Cassandra (From 0 to Cassandra on AWS in 30 days)Tsunami alerting with Cassandra (From 0 to Cassandra on AWS in 30 days)
Tsunami alerting with Cassandra (From 0 to Cassandra on AWS in 30 days)
 
Time Series Processing with Apache Spark
Time Series Processing with Apache SparkTime Series Processing with Apache Spark
Time Series Processing with Apache Spark
 
Escape from Hadoop: Ultra Fast Data Analysis with Spark & Cassandra
Escape from Hadoop: Ultra Fast Data Analysis with Spark & CassandraEscape from Hadoop: Ultra Fast Data Analysis with Spark & Cassandra
Escape from Hadoop: Ultra Fast Data Analysis with Spark & Cassandra
 
TechTalk: Reduce Your Storage Footprint with a Revolutionary New Compaction S...
TechTalk: Reduce Your Storage Footprint with a Revolutionary New Compaction S...TechTalk: Reduce Your Storage Footprint with a Revolutionary New Compaction S...
TechTalk: Reduce Your Storage Footprint with a Revolutionary New Compaction S...
 
Spark with Cassandra by Christopher Batey
Spark with Cassandra by Christopher BateySpark with Cassandra by Christopher Batey
Spark with Cassandra by Christopher Batey
 
Building a fully-automated Fast Data Platform
Building a fully-automated Fast Data PlatformBuilding a fully-automated Fast Data Platform
Building a fully-automated Fast Data Platform
 
Time series with Apache Cassandra - Long version
Time series with Apache Cassandra - Long versionTime series with Apache Cassandra - Long version
Time series with Apache Cassandra - Long version
 
Scylla Summit 2018: From SAP to Scylla - Tracking the Fleet at GPS Insight
Scylla Summit 2018: From SAP to Scylla - Tracking the Fleet at GPS InsightScylla Summit 2018: From SAP to Scylla - Tracking the Fleet at GPS Insight
Scylla Summit 2018: From SAP to Scylla - Tracking the Fleet at GPS Insight
 
게임을 위한 DynamoDB 사례 및 팁 - 김일호 솔루션즈 아키텍트:: AWS Cloud Track 3 Gaming
게임을 위한 DynamoDB 사례 및 팁 - 김일호 솔루션즈 아키텍트:: AWS Cloud Track 3 Gaming게임을 위한 DynamoDB 사례 및 팁 - 김일호 솔루션즈 아키텍트:: AWS Cloud Track 3 Gaming
게임을 위한 DynamoDB 사례 및 팁 - 김일호 솔루션즈 아키텍트:: AWS Cloud Track 3 Gaming
 
Reactive Streams 1.0.0 and Why You Should Care (webinar)
Reactive Streams 1.0.0 and Why You Should Care (webinar)Reactive Streams 1.0.0 and Why You Should Care (webinar)
Reactive Streams 1.0.0 and Why You Should Care (webinar)
 
DataStax and Esri: Geotemporal IoT Search and Analytics
DataStax and Esri: Geotemporal IoT Search and AnalyticsDataStax and Esri: Geotemporal IoT Search and Analytics
DataStax and Esri: Geotemporal IoT Search and Analytics
 
Cassandra + Spark + Elk
Cassandra + Spark + ElkCassandra + Spark + Elk
Cassandra + Spark + Elk
 
Spark streaming: Best Practices
Spark streaming: Best PracticesSpark streaming: Best Practices
Spark streaming: Best Practices
 
So you think you can stream.pptx
So you think you can stream.pptxSo you think you can stream.pptx
So you think you can stream.pptx
 
Real time data pipeline with spark streaming and cassandra with mesos
Real time data pipeline with spark streaming and cassandra with mesosReal time data pipeline with spark streaming and cassandra with mesos
Real time data pipeline with spark streaming and cassandra with mesos
 
Building a Fast, Resilient Time Series Store with Cassandra (Alex Petrov, Dat...
Building a Fast, Resilient Time Series Store with Cassandra (Alex Petrov, Dat...Building a Fast, Resilient Time Series Store with Cassandra (Alex Petrov, Dat...
Building a Fast, Resilient Time Series Store with Cassandra (Alex Petrov, Dat...
 
How to Reduce Your Database Total Cost of Ownership with TimescaleDB
How to Reduce Your Database Total Cost of Ownership with TimescaleDBHow to Reduce Your Database Total Cost of Ownership with TimescaleDB
How to Reduce Your Database Total Cost of Ownership with TimescaleDB
 
C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm
C*ollege Credit: CEP Distribtued Processing on Cassandra with StormC*ollege Credit: CEP Distribtued Processing on Cassandra with Storm
C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm
 

Ähnlich wie Cassandra Day Chicago 2015: The Evolution of Apache Cassandra at Signal

Ähnlich wie Cassandra Day Chicago 2015: The Evolution of Apache Cassandra at Signal (20)

Cassandra at BrightTag
Cassandra at BrightTagCassandra at BrightTag
Cassandra at BrightTag
 
AWS re:Invent 2016 Day 2 Keynote re:Cap
AWS re:Invent 2016 Day 2 Keynote re:CapAWS re:Invent 2016 Day 2 Keynote re:Cap
AWS re:Invent 2016 Day 2 Keynote re:Cap
 
AWS re:Invent 2016 Day 2 Keynote re:Cap
AWS re:Invent 2016 Day 2 Keynote re:CapAWS re:Invent 2016 Day 2 Keynote re:Cap
AWS re:Invent 2016 Day 2 Keynote re:Cap
 
AWS re:Invent 2016 : announcement, technical demos and feedbacks
AWS re:Invent 2016 : announcement, technical demos and feedbacksAWS re:Invent 2016 : announcement, technical demos and feedbacks
AWS re:Invent 2016 : announcement, technical demos and feedbacks
 
AWS re:Invent 2016: Amazon EC2 Foundations (CMP203)
AWS re:Invent 2016: Amazon EC2 Foundations (CMP203)AWS re:Invent 2016: Amazon EC2 Foundations (CMP203)
AWS re:Invent 2016: Amazon EC2 Foundations (CMP203)
 
게임을 위한 Cloud Native on AWS (김일호 솔루션즈 아키텍트, AWS) :: Gaming on AWS 2018
게임을 위한 Cloud Native on AWS (김일호 솔루션즈 아키텍트, AWS) :: Gaming on AWS 2018게임을 위한 Cloud Native on AWS (김일호 솔루션즈 아키텍트, AWS) :: Gaming on AWS 2018
게임을 위한 Cloud Native on AWS (김일호 솔루션즈 아키텍트, AWS) :: Gaming on AWS 2018
 
Return on Ignite 2019: Azure, .NET, A.I. & Data
Return on Ignite 2019: Azure, .NET, A.I. & DataReturn on Ignite 2019: Azure, .NET, A.I. & Data
Return on Ignite 2019: Azure, .NET, A.I. & Data
 
데이터 마이그레이션 AWS와 같이하기 - 김일호 솔루션즈 아키텍트:: AWS Cloud Track 3 Gaming
데이터 마이그레이션 AWS와 같이하기 - 김일호 솔루션즈 아키텍트:: AWS Cloud Track 3 Gaming데이터 마이그레이션 AWS와 같이하기 - 김일호 솔루션즈 아키텍트:: AWS Cloud Track 3 Gaming
데이터 마이그레이션 AWS와 같이하기 - 김일호 솔루션즈 아키텍트:: AWS Cloud Track 3 Gaming
 
Ford's AWS Service Update - March 2020 (Richmond AWS User Group)
Ford's AWS Service Update - March 2020 (Richmond AWS User Group)Ford's AWS Service Update - March 2020 (Richmond AWS User Group)
Ford's AWS Service Update - March 2020 (Richmond AWS User Group)
 
CloudFork
CloudForkCloudFork
CloudFork
 
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...
 
Enabling Microservices Frameworks to Solve Business Problems
Enabling Microservices Frameworks to Solve  Business ProblemsEnabling Microservices Frameworks to Solve  Business Problems
Enabling Microservices Frameworks to Solve Business Problems
 
Migrating Monolithic Applications with the Strangler Pattern
Migrating Monolithic Applications with the Strangler Pattern Migrating Monolithic Applications with the Strangler Pattern
Migrating Monolithic Applications with the Strangler Pattern
 
Welcome to icehouse
Welcome to icehouseWelcome to icehouse
Welcome to icehouse
 
CN Asturias - Stateful application for kubernetes
CN Asturias -  Stateful application for kubernetes CN Asturias -  Stateful application for kubernetes
CN Asturias - Stateful application for kubernetes
 
AWS Summit 2013 | India - Scaling Seamlessly and Going Global with the Cloud,...
AWS Summit 2013 | India - Scaling Seamlessly and Going Global with the Cloud,...AWS Summit 2013 | India - Scaling Seamlessly and Going Global with the Cloud,...
AWS Summit 2013 | India - Scaling Seamlessly and Going Global with the Cloud,...
 
Serverless Realtime Backup
Serverless Realtime BackupServerless Realtime Backup
Serverless Realtime Backup
 
AWS re:Invent 2016: Introduction to Container Management on AWS (CON303)
AWS re:Invent 2016: Introduction to Container Management on AWS (CON303)AWS re:Invent 2016: Introduction to Container Management on AWS (CON303)
AWS re:Invent 2016: Introduction to Container Management on AWS (CON303)
 
Architectures for High Availability - QConSF
Architectures for High Availability - QConSFArchitectures for High Availability - QConSF
Architectures for High Availability - QConSF
 
AWS reinvent 2019 recap - Riyadh - Containers and Serverless - Paul Maddox
AWS reinvent 2019 recap - Riyadh - Containers and Serverless - Paul MaddoxAWS reinvent 2019 recap - Riyadh - Containers and Serverless - Paul Maddox
AWS reinvent 2019 recap - Riyadh - Containers and Serverless - Paul Maddox
 

Mehr von DataStax Academy

Cassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsCassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart Labs
DataStax Academy
 
Cassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackCassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stack
DataStax Academy
 
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonCassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
DataStax Academy
 
Standing Up Your First Cluster
Standing Up Your First ClusterStanding Up Your First Cluster
Standing Up Your First Cluster
DataStax Academy
 
Real Time Analytics with Dse
Real Time Analytics with DseReal Time Analytics with Dse
Real Time Analytics with Dse
DataStax Academy
 
Introduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraIntroduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache Cassandra
DataStax Academy
 
Enabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseEnabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax Enterprise
DataStax Academy
 
Advanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraAdvanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache Cassandra
DataStax Academy
 

Mehr von DataStax Academy (20)

Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craftForrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
 
Introduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph DatabaseIntroduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph Database
 
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache CassandraIntroduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
 
Cassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsCassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart Labs
 
Cassandra 3.0 Data Modeling
Cassandra 3.0 Data ModelingCassandra 3.0 Data Modeling
Cassandra 3.0 Data Modeling
 
Cassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackCassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stack
 
Data Modeling for Apache Cassandra
Data Modeling for Apache CassandraData Modeling for Apache Cassandra
Data Modeling for Apache Cassandra
 
Coursera Cassandra Driver
Coursera Cassandra DriverCoursera Cassandra Driver
Coursera Cassandra Driver
 
Production Ready Cassandra
Production Ready CassandraProduction Ready Cassandra
Production Ready Cassandra
 
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonCassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
 
Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1
 
Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2
 
Standing Up Your First Cluster
Standing Up Your First ClusterStanding Up Your First Cluster
Standing Up Your First Cluster
 
Real Time Analytics with Dse
Real Time Analytics with DseReal Time Analytics with Dse
Real Time Analytics with Dse
 
Introduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraIntroduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache Cassandra
 
Cassandra Core Concepts
Cassandra Core ConceptsCassandra Core Concepts
Cassandra Core Concepts
 
Enabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseEnabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax Enterprise
 
Bad Habits Die Hard
Bad Habits Die Hard Bad Habits Die Hard
Bad Habits Die Hard
 
Advanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraAdvanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache Cassandra
 
Advanced Cassandra
Advanced CassandraAdvanced Cassandra
Advanced Cassandra
 

Kürzlich hochgeladen

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 

Kürzlich hochgeladen (20)

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 

Cassandra Day Chicago 2015: The Evolution of Apache Cassandra at Signal

  • 1. Evolution of Cassandra at Signal Matthew Kemp
  • 2. Matching Service Unified Customer View Measurement and Activation Metrics Using KairosDB and Cyanite Audit Log Event persistence Our Cassandra Use Cases
  • 4. Cassandra Timeline: 2011 Evaluated Cassandra 0.8, Couch, Mongo and Riak Summer 2011 Fall 2011 Winter 2011 Decided on Cassandra (1.0 just released) Initial AWS deployment in two regions (8 total nodes) Spring 2011
  • 5. create column family signal_data with comparator = 'UTF8Type' and default_validation_class = 'UTF8Type' and key_validation_class = 'UTF8Type' and read_repair_chance = 0.0 and gc_grace = 864000; Our Very First Schema
  • 6. Example Data Rows Row Key Columns ... a111 wwww xxxx yyyy w111 x111 y111 b222 xxxx zzzz x222 z222 c333 wwww yyyy zzzz w333 y333 z333
  • 7. Attempted to use SuperColumns and started using Priam Attempted to use Composite Columns and switched to m2.xlarges Upgraded to Cassandra 1.1 and start of production usage Added index column family Cassandra Timeline: 2012 Summer 2012 Fall 2012 Winter 2012Spring 2012
  • 8. Priam Because nobody wants to deal with this
  • 9. Pre-CQL Data Types Row Key Columns ... a111 wwww:id wwww:attr_1 wwww:attr_2 xxxx:id w111 value1 value2 x111 b222 xxxx:id x222 c333 wwww:id wwww:attr_2 w333 something
  • 10. Example Index Rows Row Key Columns ... ... continued ... wwww:w111 sid xxxx:x222 sid a111 b222 xxxx:x111 sid zzzz:z222 sid a111 b222 yyyy:y111 sid wwww:w333 sid a111 c333
  • 11. Upgraded to Cassandra 1.2.9 and schema redesign Performance testing, evaluating 1.2.x, switched to Agathon Upgraded to Virtual Nodes and started using OpsCenter Cassandra Timeline: 2013 Summer 2013 Fall 2013 Winter 2013Spring 2013
  • 12. CREATE KEYSPACE signal WITH replication = { 'class': 'NetworkTopologyStrategy', 'eu-west': '2', 'ap-northeast': '2', 'us-west': '2', 'us-east': '2' }; Keyspace
  • 13. CREATE TABLE signal_data ( sid uuid, identifier varchar, ids map<varchar, varchar>, data map<varchar, varchar>, internal map<varchar, varchar> PRIMARY KEY (sid, identifier) ); Data Table
  • 14. Example Data Rows sid identifier ids data internal a111 wwww {'id':'w111'} {'attr_1':'value1'} {} a111 xxxx {'xid':'x111'} {'attr_2':'value-a'} {} a111 yyyy {'id':'y111'} {} {} b222 xxxx {'xid':'x222'} {'attr_2':'value-b'} {} b222 zzzz {'zid':'z222'} {} {}
  • 15. CREATE TABLE signal_index ( identifier varchar, partition int, id_name varchar, id_value varchar, sid uuid PRIMARY KEY ( (identifier, partition), id_name, id_value) ); Index Table
  • 16. Example Index Rows identifier partition id_name id_value sid wwww 128 id w111 a111 xxxx 84 xid x111 a111 yyyy 71 id y111 a111 xxxx 193 xid x222 b222 zzzz 3 zid z222 b222 partition = abs(hash(id_name + id_value)) % partitions
  • 17. Migration To Virtual Nodes Old Ring (read only) Data Access Service "Migration" Script New Ring (read & write)
  • 18. Upgraded to i2. xlarges and Cassandra 1.2.16 Scaling clusterSchema improvements and KariosDB backed by Cassandra Cassandra Timeline: 2014 Summer 2014 Fall 2014 Winter 2014Spring 2014
  • 19. AWS Instance Types Instance Type ECUs Virtual CPUs Memory Disks m2.xlarge 6.5 2 17.1 1x420GB m2.2xlarge 13 4 34.2 4x420GB i2.xlarge 14 4 30.5 1x800GB SSD
  • 20. SSD Performance: Writes Upgrade to SSDs Upgrade to 1.2.16
  • 21. SSD Performance: Reads Upgrade to SSDs Upgrade to 1.2.16
  • 22. Added a new AWS region and rolled out new audit log use case (separate ring) ???Planned schema redesign and focus on performance (latency) Cassandra Timeline: 2015 Summer 2015 Fall 2015 Winter 2015Spring 2015 Metrics and reporting (Spark?)
  • 23. Adding a New Region West EastEurope Tokyo Sydney
  • 26. Currently run as a set of ad hoc python scripts Considering adding reporting tables Centered around set membership and overlap Considering using Spark as job runner Metrics and Reporting
  • 27. 140+ Nodes (i2.xlarge) 15B+ rows in data table 45B+ rows in index table 30+ TB of data Some Ring Stats
  • 28. We've survived … Amazon swallowing nodes Amazon rebooting the internet Disk Failures Corrupt SSTables Intra and inter region network issues Dealing With Failure
  • 29. Use SSDs, don't skimp on hardware Stay current, but not too current Keep up on repairs Test various configurations for your use case Simulating production load is important Advice
  • 30. "The best way to approach data modeling for Cassandra is to start with your queries and work backwards from there. Think about the actions your application needs to perform, how you want to access the data, and then design column families to support those access patterns." The Best Advice