4. Chartbeat: real-time analytics service
● 18-person startup in New York
● part of Betaworks
● peaking at just under 5M concurrents daily
○ up from 1M in July 2010
5. What Chartbeat Provides
● real-time view of site performance
○ top pages
○ new/returning visitors
○ traffic flow
■ where are people coming from
■ where are people going to
● historical replay for the last 30 days
7. Architecture, Browser
Part 1:
<head>
<script type="text/javascript">var _sf_startpt=(new Date()).getTime()</script>
...
Part 2:
...
<script type="text/javascript">
function loadChartbeat() {
  // create a script element pointing at chartbeat.js and append it,
  // so the tracker loads asynchronously after the page has rendered
}
window.onload = loadChartbeat;
</script>
</body>
(highly simplified)
The ping itself is standard beacon logic, i.e. loading a 1x1 image with the data encoded in its query string.
8. Architecture, Backend
● custom libevent-based C backend
○ real-time collection and aggregation
● real-time system in-memory only
● background queue jobs snapshot every x minutes
○ Gearman (worker sketch below)
● historical data
○ mostly in MongoDB
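A minimal sketch of what one of these snapshot jobs could look like, assuming the python-gearman worker API plus pymongo; the host names, task name, and fetch_live_aggregates() helper are made up for illustration:

import gearman
from pymongo import MongoClient

mongo = MongoClient('mongo-historical')  # hypothetical MongoDB host

def fetch_live_aggregates():
    # placeholder: would really pull the current aggregates out of
    # the in-memory C backend
    return [{'key': 'example.com', 'concurrents': 1234}]

def snapshot(worker, job):
    # called every time a snapshot job is queued
    docs = fetch_live_aggregates()
    if docs:
        mongo.chartbeat.snapshots.insert_many(docs)
    return 'ok'

worker = gearman.GearmanWorker(['localhost:4730'])
worker.register_task('snapshot_aggregates', snapshot)
worker.work()  # block, handling snapshot jobs as they arrive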
9. Why Chartbeat uses MongoDB
● Pure JSON end to end
○ Live API
○ Historical data
○ No mapping back and forth
● Fast Inserts (fire and forget; sketch below)
● Flexible Schema
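"Fire and forget" maps to unacknowledged writes. The pymongo of the era defaulted to this (safe=False); in current pymongo the same behavior is opt-in via a w=0 write concern, roughly:

from pymongo import MongoClient
from pymongo.write_concern import WriteConcern

client = MongoClient('mongodb://localhost')
pings = client.chartbeat.get_collection(
    'pings', write_concern=WriteConcern(w=0))

# returns as soon as the message is buffered; no ack round-trip
pings.insert_one({'host': 'example.com', 'path': '/', 'ts': 1302000000})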
10. Why Chartbeat uses EC2
● Elastic Capacity
● No trips to the datacenter
● EBS snapshots
11. Chartbeat & MongoDB & EC2 (1)
● 3 Clusters
○ 1 for each product
○ 1 as a caching layer
○ 2-4 instances per cluster
● m2.2xlarge
○ 34.2 GB memory
○ Ubuntu 10.04
○ RAID0 across 4 x 1 TB EBS volumes
● Dedicated Snapshot Server
○ Shared among clusters
○ Serves as an arbiter as well
13. MongoDB & EC2 Challenges
● Instances disappear
○ MongoDB can have long recovery operations
○ MongoDB is (was) not ACID compliant; an unclean shutdown could corrupt your data
● Poor IO performance on EBS
○ MongoDB has a global read/write lock
● Variable IO performance on EBS
○ Could cause replication issues
17. Instances Disappearing - Replica Sets
● No downtime :) yay!
● Automatic failover on writes
● Eventual failover on reads
● No code change
18. Instances Disappearing - Replica Sets (caveats)
● stock pymongo driver reads from and writes to the primary only
○ pymongo 2.1 will fix this
● Chartbeat's custom pymongo driver
○ based on MasterSlaveConnection
○ writes to the primary
○ distributes reads among secondaries
○ automatic failover
○ eventual read re-distribution
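The replica-set-aware connections and read preferences that later landed in pymongo cover the same ground; a rough modern equivalent of the custom driver's behavior (host names assumed):

from pymongo import MongoClient, ReadPreference

# the driver discovers the set, sends writes to the primary, and
# fails over automatically when the primary changes
client = MongoClient('mongodb://node1,node2,node3/?replicaSet=rs0')

# spread reads across secondaries, falling back to the primary
events = client.chartbeat.get_collection(
    'events', read_preference=ReadPreference.SECONDARY_PREFERRED)
events.find_one({'key': 'example.com'})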
19. Instances Disappearing - Fact of Life
● Accept this fact of life
● Always snapshot
○ Dedicated snapshot server
○ Hidden, i.e. no reads
● Automate everything
○ puppet
■ New instance from scratch within a minute
○ python-boto
■ Script all EC2 interaction
■ new_instance.py
■ mount_volumes_from_snap.py -o iid -n iid
■ snapshot_mongo.py
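A sketch of what a script like snapshot_mongo.py might do with the old boto API: lock the hidden snapshot member, snapshot each EBS volume behind the RAID0 set, then unlock. Volume IDs, region, and host name are placeholders:

import boto.ec2
from pymongo import MongoClient

VOLUMES = ['vol-aaaa', 'vol-bbbb', 'vol-cccc', 'vol-dddd']  # RAID0 members

mongo = MongoClient('snapshot-server')    # the hidden, read-free member
mongo.admin.command('fsync', lock=True)   # flush to disk and block writes
try:
    ec2 = boto.ec2.connect_to_region('us-east-1')
    for vol_id in VOLUMES:
        ec2.create_snapshot(vol_id, description='mongo raid0 snapshot')
finally:
    # fsyncUnlock is the MongoDB 3.2+ spelling; pymongo 2.x had unlock()
    mongo.admin.command('fsyncUnlock')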
20. Instances Disappearing - Caveats
● New volumes - slow!!!
○ EBS loads blocks lazily
● Warm up EBS & File Cache before use
○ Options
■ Slowly direct the reads (app by app)
■ Run cache warm-up scripts
○ Not automated currently
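One shape a warm-up script could take: sequentially read the entire device so EBS is forced to fetch every lazily-loaded block, the Python equivalent of dd if=/dev/xvdf of=/dev/null. The device path is an example:

import sys

CHUNK = 4 * 1024 * 1024  # 4 MB sequential reads

def warm(device):
    # touch every block so EBS pages it in before production reads
    with open(device, 'rb') as f:
        while f.read(CHUNK):
            pass

if __name__ == '__main__':
    warm(sys.argv[1])  # e.g. python warm_ebs.py /dev/xvdf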
22. Poor IO Performance on EBS
● XFS & RAIDing helps, but:
● Disk IO varies over time
● MongoDB holds global lock on writes
● Query of death
○ Can grind everything to a halt if not careful
23. Case Study: Historical Data
● For historical data, we store time series:
{
  key: <key>,
  ts: <timestamp>,
  values: {metric1: int1, metric2: int2},
  meta: {}
}
● High Insert Rate vs Fast Historical Read
○ Optimize reads or writes?
● Fast inserts: ~1 MB/sec (append-only)
○ No disk seeks
● Historical reads: painfully slow
24. Faster Reads Through Cache DB
● Avoid reading from disk
● Favor reads over writes
● Aim for disk & memory locality
{
  day_tskey: <key>,
  values: {metric1: list(int), metric2: list(int)}
}
● Data for historical reads resides together
● Appending to a list ($push) could cause disk fragmentation
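For contrast, the naive append is a $push per sample; every push grows the document, and the mmap-era storage engine relocates documents that outgrow their slot, fragmenting the data files (names illustrative):

from pymongo import MongoClient

cache = MongoClient().chartbeat.cache

# one $push per sample: simple, but the document grows on every write
cache.update_one(
    {'_id': '20110405_example.com'},
    {'$push': {'values.metric1': 42}},
    upsert=True)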
25. Avoid Fragmentation w/ Preallocation
● Fragmentation causes:
○ Inefficient disk usage
○ Slower writes (due to block allocation)
● Preallocate daily arrays instead
○ Pros:
■ No fragmentation
■ Writes cause no change in document size
○ Cons:
■ Wasteful (we don't know keys ahead of time)
■ Requires heavy disk IO, ~7 MB/sec (~60 Mbit/sec on EBS)
● Conclusion: spread preallocation over 1 hour
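A sketch of the preallocation pattern under these assumptions (one document per key per day, one array slot per minute; all names illustrative). Writes become in-place $set operations, so documents never grow or move:

from pymongo import MongoClient

SLOTS = 24 * 60  # one slot per minute of the day
cache = MongoClient().chartbeat.cache

def preallocate(day, key, metrics=('metric1', 'metric2')):
    # insert the full-size document up front; spreading these inserts
    # over the hour smooths out the heavy preallocation IO
    cache.insert_one({
        '_id': '%s_%s' % (day, key),
        'values': {m: [0] * SLOTS for m in metrics},
    })

def record(day, key, minute, metric, value):
    # positional $set: same-size overwrite, no growth, no document move
    cache.update_one(
        {'_id': '%s_%s' % (day, key)},
        {'$set': {'values.%s.%d' % (metric, minute): value}})

preallocate('20110405', 'example.com')
record('20110405', 'example.com', 600, 'metric1', 42)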
27. EC2 Unpredictability - Challenges
● Resource contention in virtualized environment
● EBS and Network IO performance varies drastically
● RAID0 over 4 disks = 4 x risk
28. Heavy Monitoring (1)
● Track individual disk performance over time
● Create a new instance if a disk isn't getting better
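Per-disk behavior can be sampled straight from /proc/diskstats; a minimal sketch that estimates how busy each RAID member was over an interval:

import time

def io_ms():
    # field 13 of /proc/diskstats: ms the device has spent doing IO
    out = {}
    with open('/proc/diskstats') as f:
        for line in f:
            fields = line.split()
            out[fields[2]] = int(fields[12])
    return out

INTERVAL = 10
before = io_ms()
time.sleep(INTERVAL)
after = io_ms()
for dev, ms in sorted(after.items()):
    busy = 100.0 * (ms - before.get(dev, 0)) / (INTERVAL * 1000)
    print('%-10s %5.1f%% busy' % (dev, busy))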
29. Heavy Monitoring (2)
● Monitor replication lag
● Remove from read mix if lag gets too high
○ Incorrect data
○ Strain on primary
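Lag can be computed from replSetGetStatus by comparing each secondary's optime to the primary's; roughly (hosts assumed):

from pymongo import MongoClient

client = MongoClient('mongodb://node1,node2,node3/?replicaSet=rs0')
status = client.admin.command('replSetGetStatus')

primary = next(m for m in status['members'] if m['stateStr'] == 'PRIMARY')
for m in status['members']:
    if m['stateStr'] == 'SECONDARY':
        lag = (primary['optimeDate'] - m['optimeDate']).total_seconds()
        # past some threshold, pull the node out of the read mix
        print('%s lagging %.0fs' % (m['name'], lag))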
30. Heavy Monitoring (3)
● Track slow queries, opcounts, page faults, and IO volume
○ Tweak indexes accordingly
○ Limit requested data size if you can
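Opcounts and page faults are both exposed by the serverStatus command; a small sampling sketch:

from pymongo import MongoClient

client = MongoClient()
s = client.admin.command('serverStatus')
print(s['opcounters'])                 # insert/query/update/delete/getmore
print(s['extra_info']['page_faults'])  # page faults since startup

# with profiling enabled, slow queries accumulate in system.profile
print(client.chartbeat.system.profile.count_documents({}))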
31. Open Issues
● More granular page-fault / memory usage information
○ Difficult due to mmap
● Multi-datacenter usage
● Burn-in scripts
● Sharding
○ Tipping point will be insert volume
○ Or inefficient read memory usage
● Better understand replication failures