SlideShare ist ein Scribd-Unternehmen logo
1 von 27
Downloaden Sie, um offline zu lesen
Skinny on Wide Rows
Aki Colovic
Lead Software Engineer
About Signal
- collect customer engagement data
over across different channels(mobile,
desktop, pos
)
- connect customer data and behaviors
into identities
- send outbound real-time marketing
signals(activation)
Signal’s FUSE Platform
Presentation
- Cassandra @ Signal
- Wide Rows
- How do we use wide rows?
- Signal’s Identity service
- Wide rows and compaction
- Wide rows and caches
- Don’t mix reads and writes
- Index table rebuild
- GC tuning
- Ring Migration
- Questions
- Identity Service
- Activation Metrics
Cassandra @ Signal
5 rings
190+ Nodes (AWS i2.xlarge)‹
~2 billion queries per day
30+ TB of data
15B+ rows in profiles table
45B+ rows in index table
Cassandra Ring Stats
Tokyo
Global Cassandra Deployment
US West US East Europe
Sydney
We've survived 

Amazon swallowing nodes
Amazon rebooting the internet
Disk Failures
Corrupt SSTables
Intra and inter region network issues
Dealing With Failure
How do we use wide rows?
- store profile info
- build customer cross-channel identity
- identity graph
- client reporting
CREATE TABLE profile_data (‹
profile_id uuid,‹
client_id varchar,‹
user_ids map<varchar, varchar>,‹
attributes map<varchar, varchar>‹
PRIMARY KEY (profile_id, client)‹
);
Profile Data Table
profile_id 1
client id 1 client id 2
user_ids attributes user_ids attributes 

client id N
user_ids attributes
Profile Data Table
profile_id N
client id 1 client id 2
user_ids attributes user_ids attributes 

client id N
user_ids attributes
:
:
CREATE TABLE profile_data_index (
client_id varchar,
partition int,
user_id varchar,
user_id_value varchar,
profile_id uuid,
PRIMARY KEY
((client_id, partition), user_id, user_id_value)‹
);
Profile Data Index Table
client_id
+
partition
user_id 1,
user_id_value 1
user_id 1,
user_id_value 2
user_id 2,
user_id_value 2
profile_id A profile_id B profile_id C


Profile Data Index Table
user_id N,
user_id_value N
profile_id N
partition = hash(user_id_name + user_id_value) MOD partition_max
- multiple customer
profiles are connected
into single identity
- connect identities across
different clients
Identity Graph
CREATE TABLE identity_graph (‹
identity_id uuid,‹
client varchar,‹
profile_id map<uuid, timestamp>‹
PRIMARY KEY (identity_id, client)‹
);
Identity Graph Table
identity_id
client id 1 client id 2 client id 3
{profile id 1,
profile id 2}
{profile id 4,
profile id 5,
profile id 6}
{profile id 7,
profile id 8} 

Identity Graph Table
client id N
{set of profile ids}
"The best way to approach data modeling for
Cassandra is to start with your queries and work
backwards from there. Think about the actions
your application needs to perform, how you want
to access the data, and then design column
families to support those access patterns."
Final thoughts on modeling
- size-tiered compaction
- slower compactions due to wide rows
- wide row limits (100MB or 100,000 elements)
- monitor wide row size
- monitor your sstable count
- in_memory_compaction_limit_in_mb,
- your widest row should be able to fit in memory
- avoids slow 2 pass disk-based compaction
- set_compaction_throughput
Wide Rows and Compactions
- caches are good
- unless your wide rows are too big
- be aware of your cache hits
- key_cache_size_in_mb
- row_cache_size_in_mb
Wide Rows and Caches
- slower reads can affect your
FAST writes
- handle your read and writes
paths separately
- if chaining reads and writes
do it async
Mixing Reads and Writes
- it is awesome to grow
- our wide rows became too wide
- 256 partitions was not enough
- solution: rebuild the index with more partitions
- index re-builder
Index Rebuild
- we tuned our GC settings for low latency and to prevent full GC as
much as possible
- know your GC settings
- monitor your GC latency - it has an impact on your cassandra
performance
- monitor your traffic patterns during troubling GC latencies
- using CMSIncrementalMode to break up the concurrent GC phases
into short bursts of activity
- every environment different, you will have to
GC and Cassandra
- upgrading to i2.2xlarge
- virtual nodes, increase replication factor
Ring Migration/Upgrade
Migration
Old Ring
Identity
Service
"Migration"
Script
New Ring
READ & WRITE
WRITE ONLY
Real Time Traffic
READ
WRITE
Questions?
Contact Info
acolovic@signal.co
@akicolovic

Weitere Àhnliche Inhalte

Was ist angesagt?

Designing & Optimizing Micro Batching Systems Using 100+ Nodes (Ananth Ram, R...
Designing & Optimizing Micro Batching Systems Using 100+ Nodes (Ananth Ram, R...Designing & Optimizing Micro Batching Systems Using 100+ Nodes (Ananth Ram, R...
Designing & Optimizing Micro Batching Systems Using 100+ Nodes (Ananth Ram, R...
DataStax
 
An Effective Approach to Migrate Cassandra Thrift to CQL (Yabin Meng, Pythian...
An Effective Approach to Migrate Cassandra Thrift to CQL (Yabin Meng, Pythian...An Effective Approach to Migrate Cassandra Thrift to CQL (Yabin Meng, Pythian...
An Effective Approach to Migrate Cassandra Thrift to CQL (Yabin Meng, Pythian...
DataStax
 
Tales From the Field: The Wrong Way of Using Cassandra (Carlos Rolo, Pythian)...
Tales From the Field: The Wrong Way of Using Cassandra (Carlos Rolo, Pythian)...Tales From the Field: The Wrong Way of Using Cassandra (Carlos Rolo, Pythian)...
Tales From the Field: The Wrong Way of Using Cassandra (Carlos Rolo, Pythian)...
DataStax
 
ScyllaDB: What could you do with Cassandra compatibility at 1.8 million reque...
ScyllaDB: What could you do with Cassandra compatibility at 1.8 million reque...ScyllaDB: What could you do with Cassandra compatibility at 1.8 million reque...
ScyllaDB: What could you do with Cassandra compatibility at 1.8 million reque...
Data Con LA
 

Was ist angesagt? (20)

Cassandra
CassandraCassandra
Cassandra
 
Apache Cassandra overview
Apache Cassandra overviewApache Cassandra overview
Apache Cassandra overview
 
DataStax and Esri: Geotemporal IoT Search and Analytics
DataStax and Esri: Geotemporal IoT Search and AnalyticsDataStax and Esri: Geotemporal IoT Search and Analytics
DataStax and Esri: Geotemporal IoT Search and Analytics
 
An Overview of Apache Cassandra
An Overview of Apache CassandraAn Overview of Apache Cassandra
An Overview of Apache Cassandra
 
Performance Testing: Scylla vs. Cassandra vs. Datastax
Performance Testing: Scylla vs. Cassandra vs. DatastaxPerformance Testing: Scylla vs. Cassandra vs. Datastax
Performance Testing: Scylla vs. Cassandra vs. Datastax
 
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
 
Apache Cassandra training. Overview and Basics
Apache Cassandra training. Overview and BasicsApache Cassandra training. Overview and Basics
Apache Cassandra training. Overview and Basics
 
Designing & Optimizing Micro Batching Systems Using 100+ Nodes (Ananth Ram, R...
Designing & Optimizing Micro Batching Systems Using 100+ Nodes (Ananth Ram, R...Designing & Optimizing Micro Batching Systems Using 100+ Nodes (Ananth Ram, R...
Designing & Optimizing Micro Batching Systems Using 100+ Nodes (Ananth Ram, R...
 
Cassandra Database
Cassandra DatabaseCassandra Database
Cassandra Database
 
Cassandra NoSQL Tutorial
Cassandra NoSQL TutorialCassandra NoSQL Tutorial
Cassandra NoSQL Tutorial
 
Scylla Summit 2018: Scylla Feature Talks - SSTables 3.0 File Format
Scylla Summit 2018: Scylla Feature Talks - SSTables 3.0 File FormatScylla Summit 2018: Scylla Feature Talks - SSTables 3.0 File Format
Scylla Summit 2018: Scylla Feature Talks - SSTables 3.0 File Format
 
AWS Big Data Demystified #4 data governance demystified [security, networ...
AWS Big Data Demystified #4   data governance demystified   [security, networ...AWS Big Data Demystified #4   data governance demystified   [security, networ...
AWS Big Data Demystified #4 data governance demystified [security, networ...
 
Apache Cassandra Interview Questions and Answers | Cassandra Tutorial | Cassa...
Apache Cassandra Interview Questions and Answers | Cassandra Tutorial | Cassa...Apache Cassandra Interview Questions and Answers | Cassandra Tutorial | Cassa...
Apache Cassandra Interview Questions and Answers | Cassandra Tutorial | Cassa...
 
Apache Cassandra in the Real World
Apache Cassandra in the Real WorldApache Cassandra in the Real World
Apache Cassandra in the Real World
 
An Effective Approach to Migrate Cassandra Thrift to CQL (Yabin Meng, Pythian...
An Effective Approach to Migrate Cassandra Thrift to CQL (Yabin Meng, Pythian...An Effective Approach to Migrate Cassandra Thrift to CQL (Yabin Meng, Pythian...
An Effective Approach to Migrate Cassandra Thrift to CQL (Yabin Meng, Pythian...
 
Tales From the Field: The Wrong Way of Using Cassandra (Carlos Rolo, Pythian)...
Tales From the Field: The Wrong Way of Using Cassandra (Carlos Rolo, Pythian)...Tales From the Field: The Wrong Way of Using Cassandra (Carlos Rolo, Pythian)...
Tales From the Field: The Wrong Way of Using Cassandra (Carlos Rolo, Pythian)...
 
Webinar: DataStax Training - Everything you need to become a Cassandra Rockstar
Webinar: DataStax Training - Everything you need to become a Cassandra RockstarWebinar: DataStax Training - Everything you need to become a Cassandra Rockstar
Webinar: DataStax Training - Everything you need to become a Cassandra Rockstar
 
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
 
ScyllaDB: What could you do with Cassandra compatibility at 1.8 million reque...
ScyllaDB: What could you do with Cassandra compatibility at 1.8 million reque...ScyllaDB: What could you do with Cassandra compatibility at 1.8 million reque...
ScyllaDB: What could you do with Cassandra compatibility at 1.8 million reque...
 
Argus Production Monitoring at Salesforce
Argus Production Monitoring at SalesforceArgus Production Monitoring at Salesforce
Argus Production Monitoring at Salesforce
 

Andere mochten auch

Understanding How CQL3 Maps to Cassandra's Internal Data Structure
Understanding How CQL3 Maps to Cassandra's Internal Data StructureUnderstanding How CQL3 Maps to Cassandra's Internal Data Structure
Understanding How CQL3 Maps to Cassandra's Internal Data Structure
DataStax
 
Cassandra Summit 2014: A Train of Thoughts About Growing and Scalability — Bu...
Cassandra Summit 2014: A Train of Thoughts About Growing and Scalability — Bu...Cassandra Summit 2014: A Train of Thoughts About Growing and Scalability — Bu...
Cassandra Summit 2014: A Train of Thoughts About Growing and Scalability — Bu...
DataStax Academy
 
Apache Cassandra at Narmal 2014
Apache Cassandra at Narmal 2014Apache Cassandra at Narmal 2014
Apache Cassandra at Narmal 2014
DataStax Academy
 
Cassandra Summit 2014: Cassandra in Large Scale Enterprise Grade xPatterns De...
Cassandra Summit 2014: Cassandra in Large Scale Enterprise Grade xPatterns De...Cassandra Summit 2014: Cassandra in Large Scale Enterprise Grade xPatterns De...
Cassandra Summit 2014: Cassandra in Large Scale Enterprise Grade xPatterns De...
DataStax Academy
 

Andere mochten auch (20)

Understanding How CQL3 Maps to Cassandra's Internal Data Structure
Understanding How CQL3 Maps to Cassandra's Internal Data StructureUnderstanding How CQL3 Maps to Cassandra's Internal Data Structure
Understanding How CQL3 Maps to Cassandra's Internal Data Structure
 
Cassandra EU 2012 - Storage Internals by Nicolas Favre-Felix
Cassandra EU 2012 - Storage Internals by Nicolas Favre-FelixCassandra EU 2012 - Storage Internals by Nicolas Favre-Felix
Cassandra EU 2012 - Storage Internals by Nicolas Favre-Felix
 
C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm
C*ollege Credit: CEP Distribtued Processing on Cassandra with StormC*ollege Credit: CEP Distribtued Processing on Cassandra with Storm
C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm
 
Apache Cassandra multi-datacenter essentials
Apache Cassandra multi-datacenter essentialsApache Cassandra multi-datacenter essentials
Apache Cassandra multi-datacenter essentials
 
Building a Multi-Region Cluster at Target (Aaron Ploetz, Target) | Cassandra ...
Building a Multi-Region Cluster at Target (Aaron Ploetz, Target) | Cassandra ...Building a Multi-Region Cluster at Target (Aaron Ploetz, Target) | Cassandra ...
Building a Multi-Region Cluster at Target (Aaron Ploetz, Target) | Cassandra ...
 
Apache Cassandra, part 1 – principles, data model
Apache Cassandra, part 1 – principles, data modelApache Cassandra, part 1 – principles, data model
Apache Cassandra, part 1 – principles, data model
 
Cassandra Summit 2014: A Train of Thoughts About Growing and Scalability — Bu...
Cassandra Summit 2014: A Train of Thoughts About Growing and Scalability — Bu...Cassandra Summit 2014: A Train of Thoughts About Growing and Scalability — Bu...
Cassandra Summit 2014: A Train of Thoughts About Growing and Scalability — Bu...
 
Cassandra Summit 2014: META — An Efficient Distributed Data Hub with Batch an...
Cassandra Summit 2014: META — An Efficient Distributed Data Hub with Batch an...Cassandra Summit 2014: META — An Efficient Distributed Data Hub with Batch an...
Cassandra Summit 2014: META — An Efficient Distributed Data Hub with Batch an...
 
Apache Cassandra at Narmal 2014
Apache Cassandra at Narmal 2014Apache Cassandra at Narmal 2014
Apache Cassandra at Narmal 2014
 
Cassandra Summit 2014: Cassandra in Large Scale Enterprise Grade xPatterns De...
Cassandra Summit 2014: Cassandra in Large Scale Enterprise Grade xPatterns De...Cassandra Summit 2014: Cassandra in Large Scale Enterprise Grade xPatterns De...
Cassandra Summit 2014: Cassandra in Large Scale Enterprise Grade xPatterns De...
 
Cassandra Summit 2014: Social Media Security Company Nexgate Relies on Cassan...
Cassandra Summit 2014: Social Media Security Company Nexgate Relies on Cassan...Cassandra Summit 2014: Social Media Security Company Nexgate Relies on Cassan...
Cassandra Summit 2014: Social Media Security Company Nexgate Relies on Cassan...
 
Introduction to Dating Modeling for Cassandra
Introduction to Dating Modeling for CassandraIntroduction to Dating Modeling for Cassandra
Introduction to Dating Modeling for Cassandra
 
Cassandra Summit 2014: Apache Cassandra at Telefonica CBS
Cassandra Summit 2014: Apache Cassandra at Telefonica CBSCassandra Summit 2014: Apache Cassandra at Telefonica CBS
Cassandra Summit 2014: Apache Cassandra at Telefonica CBS
 
Cassandra Summit 2014: Monitor Everything!
Cassandra Summit 2014: Monitor Everything!Cassandra Summit 2014: Monitor Everything!
Cassandra Summit 2014: Monitor Everything!
 
Production Ready Cassandra (Beginner)
Production Ready Cassandra (Beginner)Production Ready Cassandra (Beginner)
Production Ready Cassandra (Beginner)
 
Coursera's Adoption of Cassandra
Coursera's Adoption of CassandraCoursera's Adoption of Cassandra
Coursera's Adoption of Cassandra
 
The Last Pickle: Distributed Tracing from Application to Database
The Last Pickle: Distributed Tracing from Application to DatabaseThe Last Pickle: Distributed Tracing from Application to Database
The Last Pickle: Distributed Tracing from Application to Database
 
New features in 3.0
New features in 3.0New features in 3.0
New features in 3.0
 
Cassandra Summit 2014: The Cassandra Experience at Orange — Season 2
Cassandra Summit 2014: The Cassandra Experience at Orange — Season 2Cassandra Summit 2014: The Cassandra Experience at Orange — Season 2
Cassandra Summit 2014: The Cassandra Experience at Orange — Season 2
 
Introduction to .Net Driver
Introduction to .Net DriverIntroduction to .Net Driver
Introduction to .Net Driver
 

Ähnlich wie Signal Digital: The Skinny on Wide Rows

In-Memory Data Grids - Ampool (1)
In-Memory Data Grids - Ampool (1)In-Memory Data Grids - Ampool (1)
In-Memory Data Grids - Ampool (1)
Chinmay Kulkarni
 
Real-time Analytics for Data-Driven Applications
Real-time Analytics for Data-Driven ApplicationsReal-time Analytics for Data-Driven Applications
Real-time Analytics for Data-Driven Applications
VMware Tanzu
 

Ähnlich wie Signal Digital: The Skinny on Wide Rows (20)

Mastering MapReduce: MapReduce for Big Data Management and Analysis
Mastering MapReduce: MapReduce for Big Data Management and AnalysisMastering MapReduce: MapReduce for Big Data Management and Analysis
Mastering MapReduce: MapReduce for Big Data Management and Analysis
 
Scaling MongoDB
Scaling MongoDBScaling MongoDB
Scaling MongoDB
 
Artur Fejklowicz - “Data Lake architecture” AI&BigDataDay 2017
Artur Fejklowicz - “Data Lake architecture” AI&BigDataDay 2017Artur Fejklowicz - “Data Lake architecture” AI&BigDataDay 2017
Artur Fejklowicz - “Data Lake architecture” AI&BigDataDay 2017
 
Application design for the cloud using AWS
Application design for the cloud using AWSApplication design for the cloud using AWS
Application design for the cloud using AWS
 
AWS Summit Seoul 2015 - AWS 씜신 서ëč„슀 ì‚ŽíŽŽëłŽêž° - Aurora, Lambda, EFS, Machine Learn...
AWS Summit Seoul 2015 -  AWS 씜신 서ëč„슀 ì‚ŽíŽŽëłŽêž° - Aurora, Lambda, EFS, Machine Learn...AWS Summit Seoul 2015 -  AWS 씜신 서ëč„슀 ì‚ŽíŽŽëłŽêž° - Aurora, Lambda, EFS, Machine Learn...
AWS Summit Seoul 2015 - AWS 씜신 서ëč„슀 ì‚ŽíŽŽëłŽêž° - Aurora, Lambda, EFS, Machine Learn...
 
(SDD405) Amazon Kinesis Deep Dive | AWS re:Invent 2014
(SDD405) Amazon Kinesis Deep Dive | AWS re:Invent 2014(SDD405) Amazon Kinesis Deep Dive | AWS re:Invent 2014
(SDD405) Amazon Kinesis Deep Dive | AWS re:Invent 2014
 
GCP Data Engineer cheatsheet
GCP Data Engineer cheatsheetGCP Data Engineer cheatsheet
GCP Data Engineer cheatsheet
 
Machine intelligence in HR technology: resume analysis at scale - Adrian Mihai
Machine intelligence in HR technology: resume analysis at scale - Adrian MihaiMachine intelligence in HR technology: resume analysis at scale - Adrian Mihai
Machine intelligence in HR technology: resume analysis at scale - Adrian Mihai
 
In-Memory Data Grids - Ampool (1)
In-Memory Data Grids - Ampool (1)In-Memory Data Grids - Ampool (1)
In-Memory Data Grids - Ampool (1)
 
Building a Big Data & Analytics Platform using AWS
Building a Big Data & Analytics Platform using AWS Building a Big Data & Analytics Platform using AWS
Building a Big Data & Analytics Platform using AWS
 
Simplifying the Creation of Machine Learning Workflow Pipelines for IoT Appli...
Simplifying the Creation of Machine Learning Workflow Pipelines for IoT Appli...Simplifying the Creation of Machine Learning Workflow Pipelines for IoT Appli...
Simplifying the Creation of Machine Learning Workflow Pipelines for IoT Appli...
 
Dbs302 driving a realtime personalization engine with cloud bigtable
Dbs302  driving a realtime personalization engine with cloud bigtableDbs302  driving a realtime personalization engine with cloud bigtable
Dbs302 driving a realtime personalization engine with cloud bigtable
 
Real-time Analytics for Data-Driven Applications
Real-time Analytics for Data-Driven ApplicationsReal-time Analytics for Data-Driven Applications
Real-time Analytics for Data-Driven Applications
 
MongoDB at Scale
MongoDB at ScaleMongoDB at Scale
MongoDB at Scale
 
Gcp data engineer
Gcp data engineerGcp data engineer
Gcp data engineer
 
Agility and Scalability with MongoDB
Agility and Scalability with MongoDBAgility and Scalability with MongoDB
Agility and Scalability with MongoDB
 
Movile Internet Movel SA: A Change of Seasons: A big move to Apache Cassandra
Movile Internet Movel SA: A Change of Seasons: A big move to Apache CassandraMovile Internet Movel SA: A Change of Seasons: A big move to Apache Cassandra
Movile Internet Movel SA: A Change of Seasons: A big move to Apache Cassandra
 
Cassandra Summit 2015 - A Change of Seasons
Cassandra Summit 2015 - A Change of SeasonsCassandra Summit 2015 - A Change of Seasons
Cassandra Summit 2015 - A Change of Seasons
 
Presentation
PresentationPresentation
Presentation
 
C* Summit 2013: Large Scale Data Ingestion, Processing and Analysis: Then, No...
C* Summit 2013: Large Scale Data Ingestion, Processing and Analysis: Then, No...C* Summit 2013: Large Scale Data Ingestion, Processing and Analysis: Then, No...
C* Summit 2013: Large Scale Data Ingestion, Processing and Analysis: Then, No...
 

Mehr von DataStax Academy

Cassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsCassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart Labs
DataStax Academy
 
Cassandra 3.0 Data Modeling
Cassandra 3.0 Data ModelingCassandra 3.0 Data Modeling
Cassandra 3.0 Data Modeling
DataStax Academy
 
Cassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackCassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stack
DataStax Academy
 
Production Ready Cassandra
Production Ready CassandraProduction Ready Cassandra
Production Ready Cassandra
DataStax Academy
 
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonCassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
DataStax Academy
 
Standing Up Your First Cluster
Standing Up Your First ClusterStanding Up Your First Cluster
Standing Up Your First Cluster
DataStax Academy
 
Real Time Analytics with Dse
Real Time Analytics with DseReal Time Analytics with Dse
Real Time Analytics with Dse
DataStax Academy
 
Introduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraIntroduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache Cassandra
DataStax Academy
 
Enabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseEnabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax Enterprise
DataStax Academy
 
Advanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraAdvanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache Cassandra
DataStax Academy
 

Mehr von DataStax Academy (20)

Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craftForrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
 
Introduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph DatabaseIntroduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph Database
 
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache CassandraIntroduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
 
Cassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsCassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart Labs
 
Cassandra 3.0 Data Modeling
Cassandra 3.0 Data ModelingCassandra 3.0 Data Modeling
Cassandra 3.0 Data Modeling
 
Cassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackCassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stack
 
Data Modeling for Apache Cassandra
Data Modeling for Apache CassandraData Modeling for Apache Cassandra
Data Modeling for Apache Cassandra
 
Coursera Cassandra Driver
Coursera Cassandra DriverCoursera Cassandra Driver
Coursera Cassandra Driver
 
Production Ready Cassandra
Production Ready CassandraProduction Ready Cassandra
Production Ready Cassandra
 
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonCassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
 
Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1
 
Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2
 
Standing Up Your First Cluster
Standing Up Your First ClusterStanding Up Your First Cluster
Standing Up Your First Cluster
 
Real Time Analytics with Dse
Real Time Analytics with DseReal Time Analytics with Dse
Real Time Analytics with Dse
 
Introduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraIntroduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache Cassandra
 
Cassandra Core Concepts
Cassandra Core ConceptsCassandra Core Concepts
Cassandra Core Concepts
 
Enabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseEnabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax Enterprise
 
Bad Habits Die Hard
Bad Habits Die Hard Bad Habits Die Hard
Bad Habits Die Hard
 
Advanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraAdvanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache Cassandra
 
Advanced Cassandra
Advanced CassandraAdvanced Cassandra
Advanced Cassandra
 

KĂŒrzlich hochgeladen

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 

KĂŒrzlich hochgeladen (20)

Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 

Signal Digital: The Skinny on Wide Rows

  • 1. Skinny on Wide Rows Aki Colovic Lead Software Engineer
  • 2. About Signal - collect customer engagement data over across different channels(mobile, desktop, pos
) - connect customer data and behaviors into identities - send outbound real-time marketing signals(activation)
  • 4. Presentation - Cassandra @ Signal - Wide Rows - How do we use wide rows? - Signal’s Identity service - Wide rows and compaction - Wide rows and caches - Don’t mix reads and writes - Index table rebuild - GC tuning - Ring Migration - Questions
  • 5. - Identity Service - Activation Metrics Cassandra @ Signal
  • 6. 5 rings 190+ Nodes (AWS i2.xlarge)‹ ~2 billion queries per day 30+ TB of data 15B+ rows in profiles table 45B+ rows in index table Cassandra Ring Stats
  • 7. Tokyo Global Cassandra Deployment US West US East Europe Sydney
  • 8. We've survived 
 Amazon swallowing nodes Amazon rebooting the internet Disk Failures Corrupt SSTables Intra and inter region network issues Dealing With Failure
  • 9. How do we use wide rows? - store profile info - build customer cross-channel identity - identity graph - client reporting
  • 10. CREATE TABLE profile_data (‹ profile_id uuid,‹ client_id varchar,‹ user_ids map<varchar, varchar>,‹ attributes map<varchar, varchar>‹ PRIMARY KEY (profile_id, client)‹ ); Profile Data Table
  • 11. profile_id 1 client id 1 client id 2 user_ids attributes user_ids attributes 
 client id N user_ids attributes Profile Data Table profile_id N client id 1 client id 2 user_ids attributes user_ids attributes 
 client id N user_ids attributes : :
  • 12. CREATE TABLE profile_data_index ( client_id varchar, partition int, user_id varchar, user_id_value varchar, profile_id uuid, PRIMARY KEY ((client_id, partition), user_id, user_id_value)‹ ); Profile Data Index Table
  • 13. client_id + partition user_id 1, user_id_value 1 user_id 1, user_id_value 2 user_id 2, user_id_value 2 profile_id A profile_id B profile_id C 
 Profile Data Index Table user_id N, user_id_value N profile_id N partition = hash(user_id_name + user_id_value) MOD partition_max
  • 14. - multiple customer profiles are connected into single identity - connect identities across different clients Identity Graph
  • 15. CREATE TABLE identity_graph (‹ identity_id uuid,‹ client varchar,‹ profile_id map<uuid, timestamp>‹ PRIMARY KEY (identity_id, client)‹ ); Identity Graph Table
  • 16. identity_id client id 1 client id 2 client id 3 {profile id 1, profile id 2} {profile id 4, profile id 5, profile id 6} {profile id 7, profile id 8} 
 Identity Graph Table client id N {set of profile ids}
  • 17. "The best way to approach data modeling for Cassandra is to start with your queries and work backwards from there. Think about the actions your application needs to perform, how you want to access the data, and then design column families to support those access patterns." Final thoughts on modeling
  • 18. - size-tiered compaction - slower compactions due to wide rows - wide row limits (100MB or 100,000 elements) - monitor wide row size - monitor your sstable count - in_memory_compaction_limit_in_mb, - your widest row should be able to fit in memory - avoids slow 2 pass disk-based compaction - set_compaction_throughput Wide Rows and Compactions
  • 19. - caches are good - unless your wide rows are too big - be aware of your cache hits - key_cache_size_in_mb - row_cache_size_in_mb Wide Rows and Caches
  • 20. - slower reads can affect your FAST writes - handle your read and writes paths separately - if chaining reads and writes do it async Mixing Reads and Writes
  • 21. - it is awesome to grow - our wide rows became too wide - 256 partitions was not enough - solution: rebuild the index with more partitions - index re-builder Index Rebuild
  • 22. - we tuned our GC settings for low latency and to prevent full GC as much as possible - know your GC settings - monitor your GC latency - it has an impact on your cassandra performance - monitor your traffic patterns during troubling GC latencies - using CMSIncrementalMode to break up the concurrent GC phases into short bursts of activity - every environment different, you will have to GC and Cassandra
  • 23.
  • 24. - upgrading to i2.2xlarge - virtual nodes, increase replication factor Ring Migration/Upgrade
  • 25. Migration Old Ring Identity Service "Migration" Script New Ring READ & WRITE WRITE ONLY Real Time Traffic READ WRITE