SlideShare ist ein Scribd-Unternehmen logo
1 von 32
Downloaden Sie, um offline zu lesen
A Cassandra Data Model for Serving up 
Cat Videos 
Luke Tillman (@LukeTillman) 
Language Evangelist at DataStax
Who are you?! 
• Evangelist with a focus on the .NET Community 
• Long-time Developer 
• Recently presented at Cassandra Summit 2014 with Microsoft 
• Very Recent Denver Transplant 
2
1 What is this KillrVideo thing you speak of? 
2 Know Thy Data and Thy Queries 
3 Denormalize All the Things 
4 Building Healthy Relationships 
5 Knowing Your Limits 
3
What is this KillrVideo thing you speak of? 
4
KillrVideo, a Video Sharing Site 
• Think a YouTube competitor 
– Users add videos, rate them, comment on them, etc. 
– Can search for videos by tag
See the Live Demo, Get the Code 
• Live demo available at http://www.killrvideo.com 
– Written in C# 
– Live Demo running in Azure 
– Open source: https://github.com/luketillman/killrvideo-csharp 
• Interesting use case because of different data modeling 
challenges and the scale of something like YouTube 
– More than 1 billion unique users visit YouTube each month 
– 100 hours of video are uploaded to YouTube every minute 
6
Just How Popular are Cats on the Internet? 
7 
http://mashable.com/2013/07/08/cats-bacon-rule-internet/
Just How Popular are Cats on the Internet? 
8 
http://mashable.com/2013/07/08/cats-bacon-rule-internet/
Know Thy Data and Thy Queries
Getting to Know Your Data 
• What things do I have in the system? 
• What are the relationships between them? 
• This is your conceptual data model 
• You already do this in the RDBMS world
Some of the Entities and Relationships in KillrVideo 
11 
User 
id 
firstname 
lastname 
email 
password Video 
id 
name 
description 
location 
preview_image 
tags features 
Comment 
id 
comment 
adds 
timestamp 
posts 
timestamp 
1 
n 
n 
1 
1 
n 
n 
m 
rates 
rating
Getting to Know Your Queries 
• What are your application’s workflows? 
• How will I access the data? 
• Knowing your queries in advance is NOT optional 
• Different from RDBMS because I can’t just JOIN or create a new 
indexes to support new queries 
12
Some Application Workflows in KillrVideo 
13 
User Logs 
into site 
Show basic 
information 
about user 
Show videos 
added by a 
user 
Show 
comments 
posted by a 
user 
Search for a 
video by tag 
Show latest 
videos added 
to the site 
Show 
comments 
for a video 
Show ratings 
for a video 
Show video 
and its 
details
Some Queries in KillrVideo to Support Workflows 
14 
Users 
User Logs 
into site 
Find user by email 
address 
Show basic 
information 
about user 
Find user by id 
Comments 
Show 
comments 
for a video 
Find comments by 
video (latest first) 
Show 
comments 
posted by a 
user 
Find comments by 
user (latest first) 
Ratings 
Show ratings 
for a video Find ratings by video
Some Queries in KillrVideo to Support Workflows 
15 
Videos 
Search for a 
video by tag Find video by tag 
Show latest 
videos added 
to the site 
Find videos by date 
(latest first) 
Show video 
and its 
details 
Find video by id 
Show videos 
added by a 
user 
Find videos by user 
(latest first)
Denormalize All the Things
Breaking the Relational Mindset 
• Disk is cheap now 
• Writes in Cassandra are FAST 
• Many times we end up with a “table per query” 
• Take advantage of atomic batches to write to multiple tables 
• Similar to materialized views from the RDBMS world 
17
Users – The Relational Way 
User Logs 
into site 
Find user by email 
address 
Show basic 
information 
about user 
Find user by id 
• Single Users table with all user data and an Id Primary Key 
• Add an index on email address to allow queries by email
Users – The Cassandra Way 
User Logs 
into site 
Find user by email 
address 
Show basic 
information 
about user 
Find user by id 
CREATE 
TABLE 
user_credentials 
( 
email 
text, 
password 
text, 
userid 
uuid, 
PRIMARY 
KEY 
(email) 
); 
CREATE 
TABLE 
users 
( 
userid 
uuid, 
firstname 
text, 
lastname 
text, 
email 
text, 
created_date 
timestamp, 
PRIMARY 
KEY 
(userid) 
);
Videos Everywhere! 
20 
Show video 
and its 
details 
Find video by id 
Show videos 
added by a 
user 
Find videos by user 
(latest first) 
CREATE 
TABLE 
videos 
( 
videoid 
uuid, 
userid 
uuid, 
name 
text, 
description 
text, 
location 
text, 
location_type 
int, 
preview_image_location 
text, 
tags 
set<text>, 
added_date 
timestamp, 
PRIMARY 
KEY 
(videoid) 
); 
CREATE 
TABLE 
user_videos 
( 
userid 
uuid, 
added_date 
timestamp, 
videoid 
uuid, 
name 
text, 
preview_image_location 
text, 
PRIMARY 
KEY 
(userid, 
added_date, 
videoid) 
) 
WITH 
CLUSTERING 
ORDER 
BY 
( 
added_date 
DESC, 
videoid 
ASC);
Videos Everywhere! 
Considerations When Duplicating Data 
• Can the data change? 
• How likely is it to change or how frequently will it change? 
• Do I have all the information I need to update duplicates and 
maintain consistency? 
21 
Search for a 
video by tag Find video by tag 
Show latest 
videos added 
to the site 
Find videos by date 
(latest first)
Building Healthy Relationships
Modeling Relationships – Collection Types 
• Cassandra doesn’t support JOINs, but your data will still have 
relationships (and you can still model that in Cassandra) 
• One tool available is CQL collection types 
CREATE 
TABLE 
videos 
( 
videoid 
uuid, 
userid 
uuid, 
name 
text, 
description 
text, 
location 
text, 
location_type 
int, 
preview_image_location 
text, 
tags 
set<text>, 
added_date 
timestamp, 
PRIMARY 
KEY 
(videoid) 
);
Modeling Relationships – Client Side Joins 
24 
CREATE 
TABLE 
videos 
( 
videoid 
uuid, 
userid 
uuid, 
name 
text, 
description 
text, 
location 
text, 
location_type 
int, 
preview_image_location 
text, 
tags 
set<text>, 
added_date 
timestamp, 
PRIMARY 
KEY 
(videoid) 
); 
Currently requires query for video, 
followed by query for user by id based 
on results of first query 
CREATE 
TABLE 
users 
( 
userid 
uuid, 
firstname 
text, 
lastname 
text, 
email 
text, 
created_date 
timestamp, 
PRIMARY 
KEY 
(userid) 
);
Modeling Relationships – Client Side Joins 
• What is the cost? Might be OK in small situations 
• Do NOT scale 
• Avoid when possible 
25
Modeling Relationships – Client Side Joins 
26 
CREATE 
TABLE 
videos 
( 
videoid 
uuid, 
userid 
uuid, 
name 
text, 
description 
text, 
... 
user_firstname 
text, 
user_lastname 
text, 
user_email 
text, 
PRIMARY 
KEY 
(videoid) 
); 
CREATE 
TABLE 
users_by_video 
( 
videoid 
uuid, 
userid 
uuid, 
firstname 
text, 
lastname 
text, 
email 
text, 
PRIMARY 
KEY 
(videoid) 
); 
or
Modeling Relationships – Client Side Joins 
• Remember the considerations when you duplicate data 
• What happens if a user changes their name or email address? 
• Can I update the duplicated data? 
27
Knowing Your Limits
Cassandra Rules Can Impact Your Design 
• Video Ratings – use counters to track sum of all ratings and 
count of ratings 
CREATE 
TABLE 
videos 
( 
videoid 
uuid, 
userid 
uuid, 
name 
text, 
description 
text, 
... 
rating_counter 
counter, 
rating_total 
counter, 
PRIMARY 
KEY 
(videoid) 
); 
CREATE 
TABLE 
video_ratings 
( 
videoid 
uuid, 
rating_counter 
counter, 
rating_total 
counter, 
PRIMARY 
KEY 
(videoid) 
); 
• Counters are a good example of something with special rules
Single Nodes Have Limits Too 
• Latest videos are bucketed by 
day 
• Means all reads/writes to latest 
videos are going to same 
partition (and thus the same 
nodes) 
• Could create a hotspot 
30 
Show latest 
videos added 
to the site 
Find videos by date 
(latest first) 
CREATE 
TABLE 
latest_videos 
( 
yyyymmdd 
text, 
added_date 
timestamp, 
videoid 
uuid, 
name 
text, 
preview_image_location 
text, 
PRIMARY 
KEY 
(yyyymmdd, 
added_date, 
videoid) 
) 
WITH 
CLUSTERING 
ORDER 
BY 
( 
added_date 
DESC, 
videoid 
ASC 
);
Single Nodes Have Limits Too 
• Mitigate by adding data to the 
Partition Key to spread load 
• Data that’s already naturally a 
part of the domain 
– Latest videos by category? 
• Arbitrary data, like a bucket 
number 
– Round robin at the app level 
31 
Show latest 
videos added 
to the site 
Find videos by date 
(latest first) 
CREATE 
TABLE 
latest_videos 
( 
yyyymmdd 
text, 
bucket_number 
int, 
added_date 
timestamp, 
videoid 
uuid, 
name 
text, 
preview_image_location 
text, 
PRIMARY 
KEY 
( 
(yyyymmdd, 
bucket_number) 
added_date, 
videoid) 
) 
...
Questions? 
32 
Follow me on Twitter for updates or to ask questions later: @LukeTillman

Weitere ähnliche Inhalte

Ähnlich wie Cassandra Data Model for Cat Video Site

YouCat : Weakly Supervised Youtube Video Categorization System from Meta Data...
YouCat : Weakly Supervised Youtube Video Categorization System from Meta Data...YouCat : Weakly Supervised Youtube Video Categorization System from Meta Data...
YouCat : Weakly Supervised Youtube Video Categorization System from Meta Data...Subhabrata Mukherjee
 
Unleash the Power of Video Communication - Office 365 Video vs. Azure Media S...
Unleash the Power of Video Communication - Office 365 Video vs. Azure Media S...Unleash the Power of Video Communication - Office 365 Video vs. Azure Media S...
Unleash the Power of Video Communication - Office 365 Video vs. Azure Media S...Gina Montgomery, V-TSP
 
Microsoft Stream - The Video Evolution
Microsoft Stream - The Video EvolutionMicrosoft Stream - The Video Evolution
Microsoft Stream - The Video EvolutionColin Phillips
 
Cassandra 3.0 advanced preview
Cassandra 3.0 advanced previewCassandra 3.0 advanced preview
Cassandra 3.0 advanced previewPatrick McFadin
 
CBDW2014 - Behavior Driven Development with TestBox
CBDW2014 - Behavior Driven Development with TestBoxCBDW2014 - Behavior Driven Development with TestBox
CBDW2014 - Behavior Driven Development with TestBoxOrtus Solutions, Corp
 
ITB2017 - Intro to Behavior Driven Development
ITB2017 - Intro to Behavior Driven DevelopmentITB2017 - Intro to Behavior Driven Development
ITB2017 - Intro to Behavior Driven DevelopmentOrtus Solutions, Corp
 
Big Data Expo 2015 - Hortonworks Effective use of Apache Spark
Big Data Expo 2015 - Hortonworks Effective use of Apache SparkBig Data Expo 2015 - Hortonworks Effective use of Apache Spark
Big Data Expo 2015 - Hortonworks Effective use of Apache SparkBigDataExpo
 
01 introduction to entity framework
01   introduction to entity framework01   introduction to entity framework
01 introduction to entity frameworkMaxim Shaptala
 
Itproadd 01 60 minute version
Itproadd 01 60 minute versionItproadd 01 60 minute version
Itproadd 01 60 minute versionTarique_1
 
BUILD 2014 - Building end-to-end video experience with Azure Media Services
BUILD 2014 - Building end-to-end video experience with Azure Media ServicesBUILD 2014 - Building end-to-end video experience with Azure Media Services
BUILD 2014 - Building end-to-end video experience with Azure Media ServicesMingfei Yan
 
Best Practice with Microsoft Stream and Company Video
Best Practice with Microsoft Stream and Company VideoBest Practice with Microsoft Stream and Company Video
Best Practice with Microsoft Stream and Company VideoMichael Greth
 
ECS19 - Michael Greth - Best Practice with Company Video on Microsoft Stream
ECS19 - Michael Greth - Best Practice with Company Video on Microsoft StreamECS19 - Michael Greth - Best Practice with Company Video on Microsoft Stream
ECS19 - Michael Greth - Best Practice with Company Video on Microsoft StreamEuropean Collaboration Summit
 
Maximizing Audience Engagement in Media Delivery (MED303) | AWS re:Invent 2013
Maximizing Audience Engagement in Media Delivery (MED303) | AWS re:Invent 2013Maximizing Audience Engagement in Media Delivery (MED303) | AWS re:Invent 2013
Maximizing Audience Engagement in Media Delivery (MED303) | AWS re:Invent 2013Amazon Web Services
 
Untangling - fall2017 - week 9
Untangling - fall2017 - week 9Untangling - fall2017 - week 9
Untangling - fall2017 - week 9Derek Jacoby
 
October 2014 - USG Rock Eagle - Drupal 101
October 2014 - USG Rock Eagle - Drupal 101October 2014 - USG Rock Eagle - Drupal 101
October 2014 - USG Rock Eagle - Drupal 101Eric Sembrat
 
Crossroads of Asynchrony and Graceful Degradation
Crossroads of Asynchrony and Graceful DegradationCrossroads of Asynchrony and Graceful Degradation
Crossroads of Asynchrony and Graceful DegradationC4Media
 
[System design] Design a tweeter-like system
[System design] Design a tweeter-like system[System design] Design a tweeter-like system
[System design] Design a tweeter-like systemAree Oh
 
AWS re:Invent 2016: Content and Data Platforms at Vevo: Rebuilding and Scalin...
AWS re:Invent 2016: Content and Data Platforms at Vevo: Rebuilding and Scalin...AWS re:Invent 2016: Content and Data Platforms at Vevo: Rebuilding and Scalin...
AWS re:Invent 2016: Content and Data Platforms at Vevo: Rebuilding and Scalin...AwsReinventSlides
 

Ähnlich wie Cassandra Data Model for Cat Video Site (20)

YouCat : Weakly Supervised Youtube Video Categorization System from Meta Data...
YouCat : Weakly Supervised Youtube Video Categorization System from Meta Data...YouCat : Weakly Supervised Youtube Video Categorization System from Meta Data...
YouCat : Weakly Supervised Youtube Video Categorization System from Meta Data...
 
Unleash the Power of Video Communication - Office 365 Video vs. Azure Media S...
Unleash the Power of Video Communication - Office 365 Video vs. Azure Media S...Unleash the Power of Video Communication - Office 365 Video vs. Azure Media S...
Unleash the Power of Video Communication - Office 365 Video vs. Azure Media S...
 
Database History From Codd to Brewer
Database History From Codd to BrewerDatabase History From Codd to Brewer
Database History From Codd to Brewer
 
Microsoft Stream - The Video Evolution
Microsoft Stream - The Video EvolutionMicrosoft Stream - The Video Evolution
Microsoft Stream - The Video Evolution
 
Cassandra 3.0 advanced preview
Cassandra 3.0 advanced previewCassandra 3.0 advanced preview
Cassandra 3.0 advanced preview
 
CBDW2014 - Behavior Driven Development with TestBox
CBDW2014 - Behavior Driven Development with TestBoxCBDW2014 - Behavior Driven Development with TestBox
CBDW2014 - Behavior Driven Development with TestBox
 
ITB2017 - Intro to Behavior Driven Development
ITB2017 - Intro to Behavior Driven DevelopmentITB2017 - Intro to Behavior Driven Development
ITB2017 - Intro to Behavior Driven Development
 
Big Data Expo 2015 - Hortonworks Effective use of Apache Spark
Big Data Expo 2015 - Hortonworks Effective use of Apache SparkBig Data Expo 2015 - Hortonworks Effective use of Apache Spark
Big Data Expo 2015 - Hortonworks Effective use of Apache Spark
 
01 introduction to entity framework
01   introduction to entity framework01   introduction to entity framework
01 introduction to entity framework
 
01 introduction to entity framework
01   introduction to entity framework01   introduction to entity framework
01 introduction to entity framework
 
Itproadd 01 60 minute version
Itproadd 01 60 minute versionItproadd 01 60 minute version
Itproadd 01 60 minute version
 
BUILD 2014 - Building end-to-end video experience with Azure Media Services
BUILD 2014 - Building end-to-end video experience with Azure Media ServicesBUILD 2014 - Building end-to-end video experience with Azure Media Services
BUILD 2014 - Building end-to-end video experience with Azure Media Services
 
Best Practice with Microsoft Stream and Company Video
Best Practice with Microsoft Stream and Company VideoBest Practice with Microsoft Stream and Company Video
Best Practice with Microsoft Stream and Company Video
 
ECS19 - Michael Greth - Best Practice with Company Video on Microsoft Stream
ECS19 - Michael Greth - Best Practice with Company Video on Microsoft StreamECS19 - Michael Greth - Best Practice with Company Video on Microsoft Stream
ECS19 - Michael Greth - Best Practice with Company Video on Microsoft Stream
 
Maximizing Audience Engagement in Media Delivery (MED303) | AWS re:Invent 2013
Maximizing Audience Engagement in Media Delivery (MED303) | AWS re:Invent 2013Maximizing Audience Engagement in Media Delivery (MED303) | AWS re:Invent 2013
Maximizing Audience Engagement in Media Delivery (MED303) | AWS re:Invent 2013
 
Untangling - fall2017 - week 9
Untangling - fall2017 - week 9Untangling - fall2017 - week 9
Untangling - fall2017 - week 9
 
October 2014 - USG Rock Eagle - Drupal 101
October 2014 - USG Rock Eagle - Drupal 101October 2014 - USG Rock Eagle - Drupal 101
October 2014 - USG Rock Eagle - Drupal 101
 
Crossroads of Asynchrony and Graceful Degradation
Crossroads of Asynchrony and Graceful DegradationCrossroads of Asynchrony and Graceful Degradation
Crossroads of Asynchrony and Graceful Degradation
 
[System design] Design a tweeter-like system
[System design] Design a tweeter-like system[System design] Design a tweeter-like system
[System design] Design a tweeter-like system
 
AWS re:Invent 2016: Content and Data Platforms at Vevo: Rebuilding and Scalin...
AWS re:Invent 2016: Content and Data Platforms at Vevo: Rebuilding and Scalin...AWS re:Invent 2016: Content and Data Platforms at Vevo: Rebuilding and Scalin...
AWS re:Invent 2016: Content and Data Platforms at Vevo: Rebuilding and Scalin...
 

Mehr von DataStax Academy

Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craftForrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craftDataStax Academy
 
Introduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph DatabaseIntroduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph DatabaseDataStax Academy
 
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache CassandraIntroduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache CassandraDataStax Academy
 
Cassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsCassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsDataStax Academy
 
Cassandra 3.0 Data Modeling
Cassandra 3.0 Data ModelingCassandra 3.0 Data Modeling
Cassandra 3.0 Data ModelingDataStax Academy
 
Cassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackCassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackDataStax Academy
 
Data Modeling for Apache Cassandra
Data Modeling for Apache CassandraData Modeling for Apache Cassandra
Data Modeling for Apache CassandraDataStax Academy
 
Production Ready Cassandra
Production Ready CassandraProduction Ready Cassandra
Production Ready CassandraDataStax Academy
 
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonCassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonDataStax Academy
 
Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1DataStax Academy
 
Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2DataStax Academy
 
Standing Up Your First Cluster
Standing Up Your First ClusterStanding Up Your First Cluster
Standing Up Your First ClusterDataStax Academy
 
Real Time Analytics with Dse
Real Time Analytics with DseReal Time Analytics with Dse
Real Time Analytics with DseDataStax Academy
 
Introduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraIntroduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraDataStax Academy
 
Enabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseEnabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseDataStax Academy
 
Apache Cassandra and Drivers
Apache Cassandra and DriversApache Cassandra and Drivers
Apache Cassandra and DriversDataStax Academy
 

Mehr von DataStax Academy (20)

Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craftForrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
 
Introduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph DatabaseIntroduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph Database
 
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache CassandraIntroduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
 
Cassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsCassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart Labs
 
Cassandra 3.0 Data Modeling
Cassandra 3.0 Data ModelingCassandra 3.0 Data Modeling
Cassandra 3.0 Data Modeling
 
Cassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackCassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stack
 
Data Modeling for Apache Cassandra
Data Modeling for Apache CassandraData Modeling for Apache Cassandra
Data Modeling for Apache Cassandra
 
Coursera Cassandra Driver
Coursera Cassandra DriverCoursera Cassandra Driver
Coursera Cassandra Driver
 
Production Ready Cassandra
Production Ready CassandraProduction Ready Cassandra
Production Ready Cassandra
 
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonCassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
 
Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1
 
Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2
 
Standing Up Your First Cluster
Standing Up Your First ClusterStanding Up Your First Cluster
Standing Up Your First Cluster
 
Real Time Analytics with Dse
Real Time Analytics with DseReal Time Analytics with Dse
Real Time Analytics with Dse
 
Introduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraIntroduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache Cassandra
 
Cassandra Core Concepts
Cassandra Core ConceptsCassandra Core Concepts
Cassandra Core Concepts
 
Enabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseEnabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax Enterprise
 
Bad Habits Die Hard
Bad Habits Die Hard Bad Habits Die Hard
Bad Habits Die Hard
 
Advanced Cassandra
Advanced CassandraAdvanced Cassandra
Advanced Cassandra
 
Apache Cassandra and Drivers
Apache Cassandra and DriversApache Cassandra and Drivers
Apache Cassandra and Drivers
 

Kürzlich hochgeladen

Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGSujit Pal
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 

Kürzlich hochgeladen (20)

Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 

Cassandra Data Model for Cat Video Site

  • 1. A Cassandra Data Model for Serving up Cat Videos Luke Tillman (@LukeTillman) Language Evangelist at DataStax
  • 2. Who are you?! • Evangelist with a focus on the .NET Community • Long-time Developer • Recently presented at Cassandra Summit 2014 with Microsoft • Very Recent Denver Transplant 2
  • 3. 1 What is this KillrVideo thing you speak of? 2 Know Thy Data and Thy Queries 3 Denormalize All the Things 4 Building Healthy Relationships 5 Knowing Your Limits 3
  • 4. What is this KillrVideo thing you speak of? 4
  • 5. KillrVideo, a Video Sharing Site • Think a YouTube competitor – Users add videos, rate them, comment on them, etc. – Can search for videos by tag
  • 6. See the Live Demo, Get the Code • Live demo available at http://www.killrvideo.com – Written in C# – Live Demo running in Azure – Open source: https://github.com/luketillman/killrvideo-csharp • Interesting use case because of different data modeling challenges and the scale of something like YouTube – More than 1 billion unique users visit YouTube each month – 100 hours of video are uploaded to YouTube every minute 6
  • 7. Just How Popular are Cats on the Internet? 7 http://mashable.com/2013/07/08/cats-bacon-rule-internet/
  • 8. Just How Popular are Cats on the Internet? 8 http://mashable.com/2013/07/08/cats-bacon-rule-internet/
  • 9. Know Thy Data and Thy Queries
  • 10. Getting to Know Your Data • What things do I have in the system? • What are the relationships between them? • This is your conceptual data model • You already do this in the RDBMS world
  • 11. Some of the Entities and Relationships in KillrVideo 11 User id firstname lastname email password Video id name description location preview_image tags features Comment id comment adds timestamp posts timestamp 1 n n 1 1 n n m rates rating
  • 12. Getting to Know Your Queries • What are your application’s workflows? • How will I access the data? • Knowing your queries in advance is NOT optional • Different from RDBMS because I can’t just JOIN or create a new indexes to support new queries 12
  • 13. Some Application Workflows in KillrVideo 13 User Logs into site Show basic information about user Show videos added by a user Show comments posted by a user Search for a video by tag Show latest videos added to the site Show comments for a video Show ratings for a video Show video and its details
  • 14. Some Queries in KillrVideo to Support Workflows 14 Users User Logs into site Find user by email address Show basic information about user Find user by id Comments Show comments for a video Find comments by video (latest first) Show comments posted by a user Find comments by user (latest first) Ratings Show ratings for a video Find ratings by video
  • 15. Some Queries in KillrVideo to Support Workflows 15 Videos Search for a video by tag Find video by tag Show latest videos added to the site Find videos by date (latest first) Show video and its details Find video by id Show videos added by a user Find videos by user (latest first)
  • 17. Breaking the Relational Mindset • Disk is cheap now • Writes in Cassandra are FAST • Many times we end up with a “table per query” • Take advantage of atomic batches to write to multiple tables • Similar to materialized views from the RDBMS world 17
  • 18. Users – The Relational Way User Logs into site Find user by email address Show basic information about user Find user by id • Single Users table with all user data and an Id Primary Key • Add an index on email address to allow queries by email
  • 19. Users – The Cassandra Way User Logs into site Find user by email address Show basic information about user Find user by id CREATE TABLE user_credentials ( email text, password text, userid uuid, PRIMARY KEY (email) ); CREATE TABLE users ( userid uuid, firstname text, lastname text, email text, created_date timestamp, PRIMARY KEY (userid) );
  • 20. Videos Everywhere! 20 Show video and its details Find video by id Show videos added by a user Find videos by user (latest first) CREATE TABLE videos ( videoid uuid, userid uuid, name text, description text, location text, location_type int, preview_image_location text, tags set<text>, added_date timestamp, PRIMARY KEY (videoid) ); CREATE TABLE user_videos ( userid uuid, added_date timestamp, videoid uuid, name text, preview_image_location text, PRIMARY KEY (userid, added_date, videoid) ) WITH CLUSTERING ORDER BY ( added_date DESC, videoid ASC);
  • 21. Videos Everywhere! Considerations When Duplicating Data • Can the data change? • How likely is it to change or how frequently will it change? • Do I have all the information I need to update duplicates and maintain consistency? 21 Search for a video by tag Find video by tag Show latest videos added to the site Find videos by date (latest first)
  • 23. Modeling Relationships – Collection Types • Cassandra doesn’t support JOINs, but your data will still have relationships (and you can still model that in Cassandra) • One tool available is CQL collection types CREATE TABLE videos ( videoid uuid, userid uuid, name text, description text, location text, location_type int, preview_image_location text, tags set<text>, added_date timestamp, PRIMARY KEY (videoid) );
  • 24. Modeling Relationships – Client Side Joins 24 CREATE TABLE videos ( videoid uuid, userid uuid, name text, description text, location text, location_type int, preview_image_location text, tags set<text>, added_date timestamp, PRIMARY KEY (videoid) ); Currently requires query for video, followed by query for user by id based on results of first query CREATE TABLE users ( userid uuid, firstname text, lastname text, email text, created_date timestamp, PRIMARY KEY (userid) );
  • 25. Modeling Relationships – Client Side Joins • What is the cost? Might be OK in small situations • Do NOT scale • Avoid when possible 25
  • 26. Modeling Relationships – Client Side Joins 26 CREATE TABLE videos ( videoid uuid, userid uuid, name text, description text, ... user_firstname text, user_lastname text, user_email text, PRIMARY KEY (videoid) ); CREATE TABLE users_by_video ( videoid uuid, userid uuid, firstname text, lastname text, email text, PRIMARY KEY (videoid) ); or
  • 27. Modeling Relationships – Client Side Joins • Remember the considerations when you duplicate data • What happens if a user changes their name or email address? • Can I update the duplicated data? 27
  • 29. Cassandra Rules Can Impact Your Design • Video Ratings – use counters to track sum of all ratings and count of ratings CREATE TABLE videos ( videoid uuid, userid uuid, name text, description text, ... rating_counter counter, rating_total counter, PRIMARY KEY (videoid) ); CREATE TABLE video_ratings ( videoid uuid, rating_counter counter, rating_total counter, PRIMARY KEY (videoid) ); • Counters are a good example of something with special rules
  • 30. Single Nodes Have Limits Too • Latest videos are bucketed by day • Means all reads/writes to latest videos are going to same partition (and thus the same nodes) • Could create a hotspot 30 Show latest videos added to the site Find videos by date (latest first) CREATE TABLE latest_videos ( yyyymmdd text, added_date timestamp, videoid uuid, name text, preview_image_location text, PRIMARY KEY (yyyymmdd, added_date, videoid) ) WITH CLUSTERING ORDER BY ( added_date DESC, videoid ASC );
  • 31. Single Nodes Have Limits Too • Mitigate by adding data to the Partition Key to spread load • Data that’s already naturally a part of the domain – Latest videos by category? • Arbitrary data, like a bucket number – Round robin at the app level 31 Show latest videos added to the site Find videos by date (latest first) CREATE TABLE latest_videos ( yyyymmdd text, bucket_number int, added_date timestamp, videoid uuid, name text, preview_image_location text, PRIMARY KEY ( (yyyymmdd, bucket_number) added_date, videoid) ) ...
  • 32. Questions? 32 Follow me on Twitter for updates or to ask questions later: @LukeTillman