MONGODB TO CASSANDRA
ARCHITECTURAL LESSONS

Jon Hadad & Blake Eggleston
Overview
• Differences in DB Architectures
• SHIFT Platform
• SHIFT Media Manager
• Intro to cqlengine
MongoDB Architecture
Important Concepts
• replica set (master / slave)
• shard (replica set within a cluster)
• config server (topology)
• mongos (router)
• Shard key is an indexed field that determines the shard a particular document belongs to

sources: http://docs.mongodb.org/manual/core/sharded-cluster-architectures-production/, http://docs.mongodb.org/manual/core/sharding-shard-key/
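To make the role of the shard key concrete, here is a minimal sketch of how a router might pick shards. It is a toy model, not mongos itself: the chunk boundaries, shard names, and the `user_id` shard key are all made up for illustration.

```python
import bisect

# Hypothetical chunk map: each shard owns a contiguous range of shard-key values.
CHUNK_UPPER_BOUNDS = [1000, 2000, 3000]  # exclusive upper bound of the first three chunks
CHUNK_SHARDS = ["shard-a", "shard-b", "shard-c", "shard-d"]  # the last shard takes the rest
SHARD_KEY = "user_id"  # assumed shard key field


def route(query):
    """Return the shards a router would have to contact for this query."""
    if SHARD_KEY in query:
        # Targeted query: the shard key value identifies exactly one chunk.
        index = bisect.bisect_right(CHUNK_UPPER_BOUNDS, query[SHARD_KEY])
        return [CHUNK_SHARDS[index]]
    # No shard key in the query: every shard has to be asked.
    return list(CHUNK_SHARDS)


targeted = route({"user_id": 1500})
scattered = route({"email": "someone@example.com"})
```

A query that includes the shard key touches one shard; any other query fans out to all of them.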
Cassandra Architecture
• Only 1 type of server (Cassandra)
• Ring Based Replication (no master
or slave)
• No single point of failure
• Key hashes to a location in the ring
• Replication Factor (RF=3)
• Limited query flexibility (always
select by key)
• Each query has a consistency level
source: http://developer.rackspace.com/images/2013-03-27-rackspace-service-registry-status-update/vnodes.png
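The "key hashes to a location in the ring" and RF=3 bullets can be sketched in a few lines. This is a simplified model under stated assumptions: one token per node, MD5 as the hash, and made-up node names. Real clusters use a configurable partitioner and virtual nodes.

```python
import bisect
import hashlib


def token(value):
    """Hash a string to an integer position on the ring."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    def __init__(self, nodes, replication_factor=3):
        self.rf = replication_factor
        # One token per node, sorted so the ring can be walked clockwise.
        self.tokens = sorted((token(node), node) for node in nodes)

    def replicas(self, key):
        """First node at or after the key's token, plus the next rf - 1 nodes."""
        positions = [position for position, _ in self.tokens]
        start = bisect.bisect_left(positions, token(key)) % len(self.tokens)
        count = min(self.rf, len(self.tokens))
        return [self.tokens[(start + i) % len(self.tokens)][1] for i in range(count)]


ring = Ring(["node1", "node2", "node3", "node4", "node5"])
owners = ring.replicas("campaign:42")  # three distinct nodes, always the same for this key
```

Every node can compute the same answer from the key alone, which is why no master or routing tier is needed.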
Cassandra Storage
• SSTables are immutable
• Each column includes a timestamp of when it was written
• The same column can exist for a given key in multiple
SSTables
• Deletes are written as tombstones
• SSTables are periodically merged (compaction)
• Compaction keeps the column with the latest timestamp
on conflicts

source: http://developer.rackspace.com/images/2013-03-27-rackspace-service-registry-status-update/vnodes.png
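The storage rules above (timestamps per column, tombstones, latest timestamp wins) can be illustrated with plain dictionaries. This is a sketch, not Cassandra's on-disk format: an "SSTable" here is just a dict mapping (key, column) to (timestamp, value), and `TOMBSTONE` is a hypothetical marker object.

```python
TOMBSTONE = object()  # marker written in place of a deleted value


def compact(*sstables):
    """Merge SSTables, keeping the cell with the newest timestamp per (key, column)."""
    merged = {}
    for sstable in sstables:
        for cell_id, (timestamp, value) in sstable.items():
            if cell_id not in merged or timestamp > merged[cell_id][0]:
                merged[cell_id] = (timestamp, value)
    return merged


def read(sstable, key, column):
    """Return the live value for a column, or None if it is missing or deleted."""
    cell = sstable.get((key, column))
    if cell is None or cell[1] is TOMBSTONE:
        return None
    return cell[1]


old = {("ad1", "clicks"): (100, 5), ("ad1", "name"): (100, "spring sale")}
new = {("ad1", "clicks"): (200, 9), ("ad1", "name"): (150, TOMBSTONE)}
merged = compact(old, new)
```

Note that the sketch keeps the tombstone after merging; it only hides the value on read.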
Cassandra Writes
• A write can be sent to any node in the cluster; that node acts as the coordinator and figures out which nodes should receive it
• Each write is saved in memory to a “memtable” and written to a commit log
• Memtables are flushed to disk periodically as SSTables
source: http://www.datastax.com/docs/_images/write_access.png
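A minimal sketch of the write path on a single node, assuming a toy `Node` class and a flush triggered by a size threshold (the slide only says flushes happen periodically). Commit log cleanup after a flush is not modeled.

```python
class Node:
    def __init__(self, memtable_limit=2):
        self.commit_log = []  # append-only record of every write, for durability
        self.memtable = {}    # in-memory view of recent writes
        self.sstables = []    # immutable snapshots flushed from the memtable
        self.memtable_limit = memtable_limit  # hypothetical flush threshold

    def write(self, key, column, value, timestamp):
        # Every write goes to the commit log and to the memtable.
        self.commit_log.append((key, column, value, timestamp))
        current = self.memtable.get((key, column))
        if current is None or timestamp >= current[0]:
            self.memtable[(key, column)] = (timestamp, value)
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        """Freeze the memtable into a new SSTable and start a fresh memtable."""
        if self.memtable:
            self.sstables.append(dict(self.memtable))
            self.memtable = {}


node = Node(memtable_limit=2)
node.write("ad1", "clicks", 5, timestamp=100)
node.write("ad1", "spend", 10, timestamp=101)  # reaches the limit and triggers a flush
```

No existing data is read or rewritten on the write path, which is one reason writes are cheap.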
Cassandra Reads
• Any server may be queried; it acts as the coordinator
• Data is pulled from SSTables and merged
• The coordinator contacts the nodes that hold the requested key
• Read repair is performed if necessary
• Reads are more time-consuming than writes

source: http://www.datastax.com/docs/_images/write_access.png
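Read repair can be sketched as follows. This is a simplification: each replica is a plain dict of key to (timestamp, value), and the coordinator asks every replica, whereas a real coordinator contacts only as many replicas as the query's consistency level requires.

```python
def coordinator_read(replicas, key):
    """Return the newest value for key and bring stale replicas up to date."""
    responses = [(replica.get(key), replica) for replica in replicas]
    cells = [cell for cell, _ in responses if cell is not None]
    if not cells:
        return None
    newest = max(cells, key=lambda cell: cell[0])  # highest timestamp wins
    for cell, replica in responses:
        if cell != newest:
            replica[key] = newest  # read repair: overwrite the stale or missing copy
    return newest[1]


replicas = [{"k": (2, "new")}, {"k": (1, "old")}, {}]
value = coordinator_read(replicas, "k")
```

After the call all three replicas hold the newest cell, so the next read finds them in agreement.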
MongoDB Advantages
• Very Flexible Documents

• Very Flexible Queries

• Full text search (2.4)

• Aggregation Framework

• Geospatial Indexes / Queries

• Really good documentation
MongoDB Pitfalls
• Many queries will route to the entire cluster
• Overwriting documents / changing doc sizes causes memory fragmentation problems (db repair)
• Query language is awkward for humans
• Queries that go to disk pay an enormous penalty
• Max size of 256GB per collection

source: https://blog.serverdensity.com/map-reduce-and-mongodb/
Cassandra Advantages
• Multi data center aware & reliable
• Fewer moving parts
• No DB / table locking
• Unbelievable with time series data (stats)
• Performance scales linearly as you add servers
• Optimized compaction options for traditional spinning
disks and SSDs
• Lots of control over how your data is stored on disk.
Cassandra Pitfalls
• Secondary Indexes have hidden costs
• Individual reads (single rows) are not as fast as other DBs
• JVM can be intimidating (GC)
• Data modeling requires more planning
• Generally need to construct a table per query you intend to run
• Ad hoc queries or queries with lots of permutations can be very difficult to model
• We complement Cassandra with Elasticsearch for these types of queries (Solr & DS Enterprise are also good choices)
Media Manager
Social Analytics
What is Media Manager?
• Ad buying and management tool for Facebook, Twitter

• We sync ~2 billion ad stats a month

• We roll up stats at multiple levels in real time

• 10 node C* cluster, AWS high I/O

• Peaked at 150K queries / second

• Approx 150GB of data, growing 10% / week
Real time Rollups
• A single row per parent object type & date
• For any object (teams, folders, campaign) we can perform a rollup for a given date by accessing only a single row. This limits our I/O and is extremely efficient.
• New ad stats are propagated up immediately in rollups with very few reads.

row key        | columns
campaign+date  | ad1: stats | ad2: stats | ad3: stats
   (rollup)
folder+date    | campaign1: stats | campaign2: stats | campaign3: stats
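The rollup layout can be sketched with nested dicts. Everything named here is an assumption for illustration: the `PARENT` hierarchy, the stat names, and the in-memory `rows` store standing in for Cassandra rows keyed by parent type, parent id, and date.

```python
# Hypothetical hierarchy: child id -> (parent type, parent id)
PARENT = {
    "ad1": ("campaign", "campaign1"),
    "ad2": ("campaign", "campaign1"),
    "ad3": ("campaign", "campaign2"),
    "campaign1": ("folder", "folder1"),
    "campaign2": ("folder", "folder1"),
}

rows = {}  # (parent type, parent id, date) -> {child id: {stat name: total}}


def record_ad_stats(ad_id, date, **stats):
    """Add new stats to the ad's column in every ancestor row, without reading."""
    child = ad_id
    while child in PARENT:
        parent_type, parent_id = PARENT[child]
        row = rows.setdefault((parent_type, parent_id, date), {})
        column = row.setdefault(child, {})
        for name, value in stats.items():
            column[name] = column.get(name, 0) + value
        child = parent_id


def rollup(parent_type, parent_id, date):
    """Sum the columns of the single row for this parent and date."""
    totals = {}
    for child_stats in rows.get((parent_type, parent_id, date), {}).values():
        for name, value in child_stats.items():
            totals[name] = totals.get(name, 0) + value
    return totals


record_ad_stats("ad1", "2013-11-01", clicks=3, spend=10)
record_ad_stats("ad2", "2013-11-01", clicks=2)
record_ad_stats("ad3", "2013-11-01", clicks=5)
folder_totals = rollup("folder", "folder1", "2013-11-01")
```

Each incoming stat costs one increment per level of the hierarchy, and each rollup reads exactly one row.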
Why Cassandra?
• Almost our entire DB is in our working set.

• We have rows on disk that are inconsistently
sized, so heuristics on doc size for
preallocation are not useful.


• We could not tolerate unpredictable query
behavior due to disk access.
SHIFT.com

Collaboration Platform
Real time Collaboration
• Built for Marketers

• Allows communication across departments and organizations

• 3rd Party Applications
Messaging
• Messages are fanned out to an entire team

• Teams may have hundreds of members

• Each member has a perspectival view of their messages and their own metadata on those messages (tags & unread)
Message Inbox
• When a message is sent or replied to, we insert a record with a timeuuid into a person's stream which points to the message.
• Timeuuids are stored on disk in reverse order of the embedded timestamp
• We can easily query the row for the first N items in the user's inbox
• We store multiple views as tags for each user to quickly surface messages in different contexts.

user  | timeuuid1 | timeuuid2 | timeuuid3
jon   | msg1      | msg2      | msg3
blake | msg3      | msg1      | msg2
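The inbox pattern can be sketched in memory. The helper names (`timeuuid`, `deliver`, `first_n`) and the integer timestamps are hypothetical. In Cassandra the newest-first ordering would come from how the row is stored on disk, not from sorting at read time as done here.

```python
import uuid


def timeuuid(ticks, node=0):
    """Build a version-1 UUID embedding the given timestamp (100 ns ticks)."""
    time_low = ticks & 0xFFFFFFFF
    time_mid = (ticks >> 32) & 0xFFFF
    time_hi_version = ((ticks >> 48) & 0x0FFF) | 0x1000  # version 1
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version, 0x80, 0x00, node))


inboxes = {}  # user -> {timeuuid: message id}


def deliver(team, message_id, ticks):
    """Fan a message out: one pointer per team member, keyed by timeuuid."""
    key = timeuuid(ticks)
    for user in team:
        inboxes.setdefault(user, {})[key] = message_id


def first_n(user, n):
    """Return the n most recent message ids in the user's inbox."""
    entries = inboxes.get(user, {})
    newest_first = sorted(entries, key=lambda u: u.time, reverse=True)
    return [entries[key] for key in newest_first[:n]]


deliver(["jon", "blake"], "msg1", ticks=100)
deliver(["jon", "blake"], "msg2", ticks=200)
deliver(["jon"], "msg3", ticks=300)
```

Because each user has their own stream of pointers, per-user metadata such as tags and unread state can live alongside it without touching the message itself.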
CQLENGINE
python CQL3 mapper
cqlengine features
• CQL3 Object Mapper for Python
• Supports Cassandra 1.2
• Builds queries supporting the following:
  • TTLs
  • Per Query Consistency
  • Blind Table Updates
  • Batch Queries
  • Counters
  • Maps, sets, lists
• Schema management
• Per table compaction settings
• Table Polymorphism
Table Polymorphism
• In a single table we can have heterogeneous objects
• We use this on Media Manager for Ad types

campaign | ad | type
1        | 1  | page_post
1        | 2  | mobile_ad
1        | 3  | application_ad
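The idea behind table polymorphism is that a discriminator column (here `type`) decides which class a row is loaded as. The sketch below shows that idea in plain Python; it is not cqlengine's API, and the class names and the `type_key` attribute are made up for illustration.

```python
class Ad:
    """Base class; subclasses register themselves under their type value."""

    _registry = {}
    type_key = None  # value of the "type" column that maps to this class

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        if cls.type_key is not None:
            Ad._registry[cls.type_key] = cls

    def __init__(self, campaign, ad):
        self.campaign = campaign
        self.ad = ad

    @classmethod
    def from_row(cls, row):
        # Pick the subclass from the discriminator column, falling back to Ad.
        subclass = cls._registry.get(row["type"], cls)
        return subclass(row["campaign"], row["ad"])


class PagePost(Ad):
    type_key = "page_post"


class MobileAd(Ad):
    type_key = "mobile_ad"


class ApplicationAd(Ad):
    type_key = "application_ad"


rows = [
    {"campaign": 1, "ad": 1, "type": "page_post"},
    {"campaign": 1, "ad": 2, "type": "mobile_ad"},
    {"campaign": 1, "ad": 3, "type": "application_ad"},
]
ads = [Ad.from_row(row) for row in rows]
```

All three rows live in one table and one partition, yet each comes back as its own Python type.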
Upcoming Features
• Work seamlessly with multiple clusters

• Native driver integration

• Key cache / row cache configuration

• Cassandra 2.0 features

• Third party plugins
  • session
  • flask
  • identity map
THANK YOU
Jon: jon@shift.com, @rustyrazorblade
Blake: blake@shift.com, @beggleston

www.shift.com
SANTA MONICA 310.310.8315
PALO ALTO 650.804.8319
NEW YORK 646.649.2972
CHICAGO 312.465.2152

Cassandra Community Webinar: From Mongo to Cassandra, Architectural Lessons
