MongoDB to Cassandra
The Atlas Odyssey

Fred van den Driessche, Engineer, @fredvdd
Tom McAdam, CTO, @tfm
Adam Horwich, Systems Engineer, @Mmmkayness

http://flickr.com/photos/dhammza/88644497/

Our platform - late 2012


MetaBroadcast platform

Video and audio metadata from 20+ sources
Profiles and activity from video and audio products, social networks
Analytic requests and groupings

Main clients

Main Partners

Data Partners

What is Atlas?
[Diagram: sources (BBC, PA, C4, etc...) feed ATLAS and its DB; outputs are /content, /schedules, /topics, sitemaps, radioplayer and interlinking]

Atlas Data Model

[Diagram: entities brand, series, item, version, broadcast, location]

MongoDB

• flexible

• features

• really simple

• shell

Where MongoDB falls short

• too simple

• lack of control

• sharding

• embedding

Where to?

• add a cache?

Atlas API
• content

• http://atlas.metabroadcast.com/3.0/content.json?uri=http://www.bbc.co.uk/programmes/b0074g7p&annotations=description,brand_summary,locations&apiKey=6ed2a984627daff816198acde82

• http://atlas.metabroadcast.com/3.0/content.json?apiKey=aaaa&uri=http://www.bbc.co.uk/programmes/b0074g7p&annotations=description,brand_summary,locations

• schedules

• http://atlas.metabroadcast.com/3.0/schedule.json?from=now&to=now.plus.3h&channel=bbcone&publisher=bbc.co.uk

• http://atlas.metabroadcast.com/3.0/schedule.json?from=1948-12-24&to=1948-12-25&channel=radio4&publisher=bbc.co.uk

• api explorer http://atlas.metabroadcast.com/#apiExplorer
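
As a rough illustration of how the query parameters in the schedule URLs above fit together, a request could be assembled like this. The class and method names are hypothetical; only the base URL, the endpoint name and the parameter names come from the slide.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of Atlas: builds 3.0 API query URLs
// from the parameters shown on the slide.
final class AtlasUrls {
    static final String BASE = "http://atlas.metabroadcast.com/3.0/";

    // Joins parameters as key=value pairs in insertion order.
    // No percent-encoding is applied, so values are assumed to be URL-safe.
    static URI build(String endpoint, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        params.forEach((key, value) -> query.add(key + "=" + value));
        return URI.create(BASE + endpoint + ".json?" + query);
    }

    // Schedule query for one channel and publisher over a time window.
    static URI schedule(String from, String to, String channel, String publisher) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("from", from);
        params.put("to", to);
        params.put("channel", channel);
        params.put("publisher", publisher);
        return build("schedule", params);
    }

    public static void main(String[] args) {
        System.out.println(schedule("now", "now.plus.3h", "bbcone", "bbc.co.uk"));
    }
}
```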

Why Cassandra?

• scalability/performance

• row caches

• consistency control

• column-based model matches our use case

And?

• ElasticSearch

• messaging

• tooling: bootstraps

What is Atlas?
[Diagram: sources (BBC, PA, C4, etc...), data ingest server, DB, update bus, ES, HTTP server]

Data model
• columns to model annotations

• secondary indexes
• index.direct(keyspace, SEGMENT_URI_INDEX_CF, ConsistencyLevel.CL_QUORUM)
      .from(segment.getCanonicalUri())
      .to(segment.getIdentifier())
      .index().execute(requestTimeout, TimeUnit.MILLISECONDS);
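
The call above appears to record a mapping from a segment's canonical URI to its identifier in a separate column family. A minimal in-memory sketch of that idea follows, with plain maps standing in for column families. The class is hypothetical and is not the Atlas code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// In-memory stand-in for two column families: one keyed by ID holding the data,
// one acting as a hand-maintained index from canonical URI to ID.
final class UriIndexSketch {
    private final Map<Long, String> segmentsById = new HashMap<>();
    private final Map<String, Long> idByUri = new HashMap<>();

    // Write the row, then the index entry that points back at it.
    void store(long id, String canonicalUri, String payload) {
        segmentsById.put(id, payload);
        idByUri.put(canonicalUri, id);
    }

    // Resolve a URI with two lookups: the index first, then the row itself.
    Optional<String> findByUri(String canonicalUri) {
        return Optional.ofNullable(idByUri.get(canonicalUri)).map(segmentsById::get);
    }

    public static void main(String[] args) {
        UriIndexSketch store = new UriIndexSketch();
        store.store(42L, "http://example.org/segments/1", "segment payload");
        System.out.println(store.findByUri("http://example.org/segments/1"));
    }
}
```

In a real cluster the two writes are not atomic, so the index entry and the row can briefly disagree; the sketch ignores that.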

ID generation
• give external data our own ID on ingest

• needs to be user-friendly:
http://www.radiotimes.com/programme/cf2/eastenders

• mongo: findAndModify()

• solution: uses Astyanax client with its distributed locking

• more details: http://metabroadcast.com/blog/let-cassandra-identify-your-data
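
One way to get short, user-friendly IDs such as the `cf2` in the URL above is to encode a numeric ID in a compact alphabet. The sketch below is an assumption for illustration only: the alphabet and class are invented here, and a local `AtomicLong` stands in for a counter that would be protected by a distributed lock. The blog post linked above describes the actual scheme.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical ID codec, not the Atlas implementation.
final class FriendlyIds {
    // Illustrative alphabet: digits plus lowercase consonants,
    // which makes accidental words in IDs less likely.
    private static final String ALPHABET = "0123456789bcdfghjklmnpqrstvwxyz";
    private static final int BASE = ALPHABET.length();

    // Stand-in for a cluster-wide counter guarded by a distributed lock.
    private final AtomicLong counter = new AtomicLong();

    // Allocates the next numeric ID and returns its encoded form.
    String next() {
        return encode(counter.incrementAndGet());
    }

    // Encodes a non-negative number as a string over ALPHABET.
    static String encode(long id) {
        if (id < 0) {
            throw new IllegalArgumentException("id must be non-negative");
        }
        StringBuilder sb = new StringBuilder();
        do {
            sb.append(ALPHABET.charAt((int) (id % BASE)));
            id /= BASE;
        } while (id > 0);
        return sb.reverse().toString();
    }

    // Inverse of encode; rejects characters outside ALPHABET.
    static long decode(String s) {
        long id = 0;
        for (char c : s.toCharArray()) {
            int digit = ALPHABET.indexOf(c);
            if (digit < 0) {
                throw new IllegalArgumentException("unexpected character: " + c);
            }
            id = id * BASE + digit;
        }
        return id;
    }

    public static void main(String[] args) {
        String encoded = encode(123456789L);
        System.out.println(encoded + " decodes to " + decode(encoded));
    }
}
```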

Where we’re at

• already live with some data

• alpha release of schedule endpoint coming soon

• later: roll out across other endpoints

Ops in Cassandra

• we love Puppet
• it’s great for automation and deployment

• MongoDB: 1 file

• Cassandra: 2 files!

• oh... tokens

Cassandra Tokens

• define where data is written to in a cluster

• therefore balanced tokens = balanced cluster

• tokens should be rack aware
• tools available to provide appropriate tokens for you
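
To make "balanced tokens" concrete, here is a small sketch that spaces tokens evenly around the ring. It assumes the RandomPartitioner, whose token space runs from 0 up to 2^127, and a single datacentre; it does not attempt the rack-aware placement mentioned above. The class name is hypothetical.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Evenly spaced initial tokens, assuming RandomPartitioner's 2^127 token space.
final class BalancedTokens {
    private static final BigInteger RING_SIZE = BigInteger.valueOf(2).pow(127);

    // Token for node i of n is i * 2^127 / n, so each node owns an equal slice.
    static List<BigInteger> forNodes(int nodeCount) {
        if (nodeCount < 1) {
            throw new IllegalArgumentException("need at least one node");
        }
        List<BigInteger> tokens = new ArrayList<>();
        for (int i = 0; i < nodeCount; i++) {
            tokens.add(RING_SIZE
                    .multiply(BigInteger.valueOf(i))
                    .divide(BigInteger.valueOf(nodeCount)));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // One token per line, e.g. for pasting into each node's configuration.
        forNodes(4).forEach(System.out::println);
    }
}
```

Adding or removing a node changes every evenly spaced token except the first, which is why rebalancing shows up later as a maintenance task.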

Cassandra plays nicely with AWS

• datacentre / rack aware
• AWS Region = Datacentre

• AWS Availability Zone = Rack

• only recently introduced in MongoDB but simple to implement in Cassandra

• horizontally (and vertically) scalable

Monitoring

• Nagios is a little threadbare for Cassandra
• basic TCP service check

• stats from API not very helpful

• nodetool and CLI tools useful
• manual effort to integrate them

• if only there was some useful service...

OpsCenter

• wonderful for an overview
• not so much for alerting ;)

• ohai API
• can integrate metrics into Nagios

Disaster Recovery

• we currently operate a 4-node cluster
• replication factor of 3 with quorum read/writes

• DR complicated by tokens

• cluster should be balanced

• snapshot + S3 Backups
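
The replication settings above imply some simple arithmetic, sketched here. The quorum formula (replication factor divided by two, rounded down, plus one) is Cassandra's standard definition; the class and method names are made up for the example.

```java
// Quorum arithmetic for a given replication factor.
final class QuorumMath {
    // Number of replicas that must respond for a QUORUM read or write.
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    // Replicas that can be down while QUORUM operations still succeed.
    static int tolerableReplicaFailures(int replicationFactor) {
        return replicationFactor - quorum(replicationFactor);
    }

    // Reads overlap with the latest write when R + W > RF.
    static boolean readsSeeLatestWrite(int replicationFactor, int readReplicas, int writeReplicas) {
        return readReplicas + writeReplicas > replicationFactor;
    }

    public static void main(String[] args) {
        int rf = 3;
        int q = quorum(rf);
        System.out.println("quorum=" + q
                + " tolerated failures=" + tolerableReplicaFailures(rf)
                + " consistent=" + readsSeeLatestWrite(rf, q, q));
    }
}
```

With a replication factor of 3, quorum is 2, so each piece of data stays readable and writable with one of its replicas down.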

Cluster Happiness and Headaches

• little maintenance overhead

• cluster rebalancing
• uncommon maintenance procedure

• schema changes are cumbersome
• little scope for rollback, can put cluster in unrecoverable state

Summary

• Mongo is good, Atlas has outgrown it

• Cassandra isn’t a drop-in replacement

• Ops more complex but so far so good
