Building an Activity Feed with Cassandra
Mark Dunphy, Software Engineer
Behance/Adobe
@dunphtastic
Disclaimer
Not an operations person.
Will pretend to be one for the purpose of this talk.
Quick Overview
What is the Behance Activity Feed?
• Actions
  • Comments, Appreciations, etc.
• Entities
  • Projects, Works in Progress
• Actors
  • Users
Project Entity
Actions taken
by actors
Activity Fan Out
User A publishes
a new project
Write to Follower A’s feed
Write to Follower B’s feed
Write to Follower C’s feed
Write to Follower D’s feed
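The fan-out above can be sketched in a few lines of Python. This is only an illustration of fan-out on write: the `followers` and `feeds` dictionaries and the `fan_out` function are hypothetical in-memory stand-ins, not Behance's actual code.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for the follower graph and the per-user feeds.
followers = {"user_a": ["follower_a", "follower_b", "follower_c", "follower_d"]}
feeds = defaultdict(list)


def fan_out(actor, verb, entity):
    """Write one copy of the activity into every follower's feed."""
    activity = {"actor": actor, "verb": verb, "entity": entity}
    recipients = followers.get(actor, [])
    for follower in recipients:
        feeds[follower].append(activity)
    return len(recipients)


# User A publishes a new project: one write per follower.
written = fan_out("user_a", "published", "project_1")
```

The cost of a single action grows with the actor's follower count, which is one reason write volume in a feed system tends to be bursty.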
Now that that’s over…
MongoDB
2011
• Smaller user base (~340,000).
• Built very quickly. Worked well at the time.
• Not well researched.
Fast forward to 2014
• Frequent node failures
• Heavy disk fragmentation caused by deletes
• Slow reads from disk. Started storing in RAM.
• Primary -> Secondary failover caused downtime for some users.
• Scaled both vertically and horizontally.
Why Cassandra?
• Riak
  • Very close. Community seemed lacking.
• Redis
  • No native cluster. Too much maintenance.
• Memcached/MySQL
  • Too much complex app logic.
Cassandra Wins.
• Fantastic community. #cassandra on Twitter
• Easy-to-read documentation
• Linearly scalable. Easy to grow cluster.
• Low maintenance overhead for ops team.
• Handles time series data very well.
Learning
• Cassandra Summit 2014
• Other team in Adobe
• Long nights reading documentation
Our Data
• Ephemeral
• “Source of truth” lives in a MySQL database
• Okay with *some* data loss
Our Rules
• A user’s feed is composed of entities, each with one set of actions.
• A user’s feed contains any given entity only once.
• An entity’s set of actions contains up to seven of the most recent actions taken by that user’s network.
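A minimal sketch of these rules, assuming a feed is modeled as a dictionary keyed by entity id. The names `push_action` and `MAX_ACTIONS` are illustrative and do not come from the talk.

```python
MAX_ACTIONS = 7  # "up to seven of the most recent actions"


def push_action(feed, entity_id, action):
    """Add an action to the entity's list, newest first, capped at seven."""
    actions = feed.get(entity_id, [])
    feed[entity_id] = ([action] + actions)[:MAX_ACTIONS]
    return feed


# Ten actions on the same entity still leave one feed entry with seven actions.
feed = {}
for n in range(10):
    push_action(feed, entity_id=42, action={"actor_id": n, "verb_id": 1})
```

Keying by entity id enforces the "only one of any given entity" rule, and the slice keeps only the most recent seven actions.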
Planning
Language Support
• Most services on Behance are PHP
• No official DataStax PHP driver
“Looks like I’m learning Python.”
–Mark Dunphy, 2014
Go to Production
No, nothing is working yet. I didn’t skip a slide.
• App/cluster in production before anything works
• Test real life load
• Fail spectacularly without anybody noticing
• Deploy risky changes without fear
• Run alongside MongoDB
January 19th, 2015
Query Patterns
• “Create your data models based on the queries you want to run” - Basically Everybody
• Wanted to…
  • Read a user’s feed entities by type and time of most recent action…separately.
  • Write/update a user’s feed entities with new actions while knowing only the user id and entity id.
Data Models
“An UPDATE in Cassandra works like an UPSERT! Let’s store the user’s entire feed in a single row in a table! It’s so simple!”
–Mark Dunphy, January 2015
First Data Model
CREATE TYPE activity.action (
    created_on timestamp,
    secondary_entity_id int,
    actor_id int,
    verb_id int
);
CREATE TYPE activity.entity (
    entity_type_id int,
    entity_id int
);
CREATE TABLE activity.project_actions (
    modified_on timestamp,
    entity_id int,
    user_id int,
    actions list<frozen<action>>,
    PRIMARY KEY(user_id, entity_id)
);
CREATE TABLE activity.feeds (
    modified_entities list<frozen<entity>>,
    modified_on timestamp,
    project_ids list<int>,
    user_id int,
    wip_revision_ids list<int>,
    PRIMARY KEY(user_id)
);
First Data Model
Moments Before Everything Exploded
“Okay let’s keep nearly the same model, but use INSERT and DELETE instead of always UPDATE. Just use batch statements.”
–Mark Dunphy, January 2015
Second Data Model
This was also a very very bad idea.
• Batching everything loses the benefit of Cassandra being distributed.
• All queries in a batch go through the same coordinator, which puts a lot of stress and responsibility on one node.
• Use concurrency and prepared statements instead. DataStax drivers make this easy.
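The shape of the "concurrency instead of batches" advice can be sketched with the standard library alone. Here `write_concurrently` and `fake_execute` are hypothetical; in a real application the `execute` callable would presumably be a driver session's execute method called with a prepared statement, and the DataStax Python driver also ships helpers such as `cassandra.concurrent.execute_concurrent_with_args` for this purpose.

```python
from concurrent.futures import ThreadPoolExecutor


def write_concurrently(execute, statement, rows, concurrency=8):
    """Run one independent write per row instead of one large batch."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(execute, statement, row) for row in rows]
        return [future.result() for future in futures]


# Stand-in for a driver call; it only records what it was asked to write.
written = []


def fake_execute(statement, params):
    written.append(params)
    return True


INSERT = "INSERT INTO activity.projects (user_id, created_on, entity_id) VALUES (?, ?, ?)"
rows = [(user_id, 1421625600000, 42) for user_id in range(100)]
results = write_concurrently(fake_execute, INSERT, rows)
```

Because each write is a separate request, a real driver can route each one to a replica that owns that user's partition rather than funnelling the whole batch through a single coordinator.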
Oops
Okay…
Now we’ve got it.
Winning Data Model
CREATE TYPE activity.action (
    created_on timestamp,
    secondary_entity_id int,
    actor_id int,
    verb_id int
);
CREATE TABLE activity.projects (
    created_on timestamp,
    user_id int,
    entity_id int,
    actions list<frozen<action>>,
    PRIMARY KEY(user_id, created_on, entity_id)
);
CREATE TABLE activity.project_actions (
    modified_on timestamp,
    entity_id int,
    user_id int,
    actions list<frozen<action>>,
    PRIMARY KEY(user_id, entity_id)
);
Much Nicer
Write Strategy
• “User A comments on Project A. User B follows User A.”
• A request goes out to add the comment action to User B’s feed.
• Read the existing actions for that entity (Project A) in B’s feed. Push the new action on top.
• Write the new actions list into a new “row” in the projects table.
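The write path can be sketched against in-memory stand-ins for the two tables, keyed the same way as their primary keys. Two parts are assumptions rather than something the slides state: that `project_actions.modified_on` records the `created_on` of the entity's current feed row, and that the old feed row is deleted so the entity appears only once.

```python
MAX_ACTIONS = 7

# In-memory stand-ins for the two tables, keyed like their primary keys.
project_actions = {}  # (user_id, entity_id) -> {"modified_on": ..., "actions": [...]}
projects = {}         # (user_id, created_on, entity_id) -> actions


def add_action(user_id, entity_id, action, now):
    """Read the entity's current actions, push the new one, write a new feed row."""
    current = project_actions.get((user_id, entity_id))
    if current is not None:
        # Assumption: delete the old feed row so the entity appears only once.
        projects.pop((user_id, current["modified_on"], entity_id), None)
        actions = current["actions"]
    else:
        actions = []
    actions = ([action] + actions)[:MAX_ACTIONS]
    projects[(user_id, now, entity_id)] = actions
    project_actions[(user_id, entity_id)] = {"modified_on": now, "actions": actions}
    return actions


# Two actions on entity 100 in user 2's feed leave a single, newer feed row.
add_action(2, 100, {"actor_id": 1, "verb_id": 3, "created_on": 1000}, now=1000)
add_action(2, 100, {"actor_id": 5, "verb_id": 3, "created_on": 2000}, now=2000)
```

The lookup table only needs the user id and entity id, which matches the stated goal of writing while knowing only those two values.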
Read Strategy
• SELECT * FROM projects WHERE user_id = 123 AND created_on > 123214373
• Optimized for quick, easy reads. It is more important that a user’s feed loads quickly than that it updates quickly.
• Use the timestamp to “page” through data.
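A small sketch of timestamp paging, using a list of dictionaries in place of the projects table. `read_page` is a hypothetical helper that mirrors the `created_on > ?` condition in the query above; the page size and sample rows are made up.

```python
def read_page(rows, user_id, after, limit=2):
    """Return up to `limit` rows newer than `after`, plus the next cursor."""
    matching = sorted(
        (r for r in rows if r["user_id"] == user_id and r["created_on"] > after),
        key=lambda r: r["created_on"],
    )
    page = matching[:limit]
    # The last timestamp on this page becomes the cursor for the next request.
    next_cursor = page[-1]["created_on"] if page else after
    return page, next_cursor


rows = [{"user_id": 123, "created_on": t, "entity_id": t // 10} for t in (10, 20, 30, 40, 50)]
first, cursor = read_page(rows, 123, after=0)
second, cursor = read_page(rows, 123, after=cursor)
```

One caveat of a strict `>` cursor is that rows sharing the exact timestamp of the last row on a page can be skipped.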
Lessons Learned
• Duplicate your data to achieve desired queries.
Storage is cheap. Writes are cheap.
• Think outside the box. Cassandra is not
relational.
• Never ever ever ignore inserts/deletes in favor of an update-only workflow. Never. It is literally insane.
Final Specs
• 16-node cluster on AWS EC2 c3.8xlarge
• Mix of SizeTieredCompactionStrategy and DateTieredCompactionStrategy
• NetworkTopologyStrategy
• Replication factor 3
• ConsistencyLevel = ONE for most requests
• Bursty write volume. Consistent read volume.
• 5k to 80k writes per second
• 2k to 4k reads per second
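As a rough back-of-the-envelope check on these numbers: with a replication factor of 3, each client write is stored on three replicas. Assuming writes are spread evenly across the 16 nodes (an idealization, not a measurement from the talk), the average per-node load works out as follows.

```python
NODES = 16
REPLICATION_FACTOR = 3


def replica_writes_per_node(client_writes_per_second):
    """Average replica writes per node, assuming an even spread across the ring."""
    return client_writes_per_second * REPLICATION_FACTOR / NODES


low = replica_writes_per_node(5_000)    # 937.5 replica writes per second per node
peak = replica_writes_per_node(80_000)  # 15000.0 replica writes per second per node
```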
Questions?
I might have answers.
Thank you!
Mark Dunphy, Software Engineer
Behance/Adobe
@dunphtastic
