MongoSF 2011 - Using MongoDB for IGN's Social Platform
1. Using MongoDB for IGN’s Social Platform MongoSF Tuesday May 24th, 2011
2. Agenda
   - About
   - Architecture
   - MongoDB Usage
   - ActivityStreams
   - Configuration, Monitoring, Maintenance
   - Backup
   - Tools
   - Lessons Learned, Next Steps
3. About
   About IGN:
   - We have the largest audience of gamers in the world: over 70M monthly uniques
   About IGN's Social Platform:
   - An API to connect the gamer community with editors, games, and other gamers, and to help lay the foundation for premium content discovery as well as UGC
   - Launched Sept 2010
   - ~7M activities
   - 30M API calls per day (24h), ~9ms response times
4. Architecture
   - REST-based API, built in Java
   - Entities are People, MediaItems, Activities, Comments, Notifications, Status
   - Interfaces across IGN.com as well as other social networks
   - Caching tier based on memcached
   - MySQL and MongoDB as persistence
   - PHP/Zend front end
5. MongoDB Usage
   - Activity streams: the ActivityStrea.ms standard
   - Activity caching (more on this later!)
   - Activity commenting
   - Points and leaderboards: also extended to badges
   - Block lists, ban lists
   - Notifications for conversations
   - Analytics: activity snapshot for a user
6. Challenges with ActivityStreams
   - Lots of data! A large amount of data comes back out as a result.
   - Reverse sorting: the data has to be sorted in reverse natural order ({$natural: -1}), and we do not use capped collections.
   - Aggregation of similar activities: impacts pagination.
   - Fetching self activities (profile) and the newsfeed (self + friends).
   - Filtering based on activity type: people want to see game updates or blog updates from their friends.
   - Hydration of activities for dynamic data: the thumbnail and level of the actor or commenter may change.
   - Activity comments: when an activity is rendered, the initial comments and their count have to be pulled ($slice). Not having a $sizeOf-type operator hurts.
   - No embedding or references: we build data on the fly as part of the hydration process.
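A minimal Python sketch of the query shapes these challenges imply. Field names such as actorId, type, and comments are hypothetical, and the queries are shown as plain documents (dicts) rather than live driver calls:

```python
# Hypothetical newsfeed query: self + friends, optionally filtered by
# activity type (e.g. "GameUpdate" or "BlogUpdate").
def newsfeed_query(user_id, friend_ids, activity_type=None):
    q = {"actorId": {"$in": [user_id] + friend_ids}}
    if activity_type:
        q["type"] = activity_type
    return q

# Reverse natural order, since capped collections are not used:
SORT = [("$natural", -1)]

# Pull only the first few comments of each activity with $slice; there is
# no $sizeOf-style operator, so a comment count is maintained separately.
PROJECTION = {"comments": {"$slice": 3}}

print(newsfeed_query("u1", ["u2", "u3"], activity_type="GameUpdate"))
```

A driver would pass these documents to find() along with the sort and projection; the point is that every filter variant must still hit an index once the $in list of friends gets large.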
7. Caching using MongoDB
   Caching the entire streams: a bad idea (or a bad implementation?)
   - The expired objects sat in the db, bloating the database.
   - The removal did not free up space, so we ran out.
   - Batch removals clogged the slaves.
   Use Mongo as a cache-key index:
   - Cache the streams in memcached.
   - For invalidation, keep the index of the memcached keys in MongoDB. Works!
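An in-memory sketch of the cache-key-index pattern, with a dict standing in for memcached and a list of records standing in for the MongoDB index collection (all names are illustrative):

```python
memcached = {}   # stand-in: cache key -> rendered stream
key_index = []   # stand-in for the Mongo collection of {userId, cacheKey}

def cache_stream(user_id, key, stream):
    """Store the rendered stream in the cache and record its key in Mongo."""
    memcached[key] = stream
    key_index.append({"userId": user_id, "cacheKey": key})

def invalidate_user(user_id):
    """On new activity, look up the user's keys in the index and delete
    only those cache entries, instead of scanning memcached itself."""
    for rec in [r for r in key_index if r["userId"] == user_id]:
        memcached.pop(rec["cacheKey"], None)
        key_index.remove(rec)

cache_stream("u1", "feed:u1:page1", ["act1", "act2"])
cache_stream("u2", "feed:u2:page1", ["act3"])
invalidate_user("u1")
print(sorted(memcached))  # only u2's key remains
```

The expiring data lives in memcached, which reclaims memory on its own, while MongoDB holds only the small, easily trimmed index; that sidesteps the space-release problem described above.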
8. Configuration
   Server:
   - 1 master, 2 slaves (load balanced through NetScaler)
   - 2 extra slaves which are not queried (replicate!!)
   - Version 1.6.1; 1.8.1 with journaling is being tested in stage
   Clients:
   - Java driver (2.1), Ruby driver (1.2)
   - Mappers: Morphia for Java, MongoMapper for Ruby
   - Connections per host: 200; # hosts = 4
   - Oplog size: 1GB, gives us ~272 hours
   - Syncdelay: 60s (default)
   Hardware: 2-core, 6 GB virtualized machines
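The oplog numbers above imply a fairly modest oplog write rate. A back-of-the-envelope check (the rate is derived from the slide's own figures, not a measured value):

```python
def retention_hours(oplog_bytes, bytes_per_second):
    """How long the oplog window lasts at a given oplog write rate."""
    return oplog_bytes / bytes_per_second / 3600

oplog = 1 * 1024**3           # 1 GB oplog
rate = oplog / (272 * 3600)   # rate implied by the quoted ~272 hours
print(round(rate))            # ~1097 bytes/second of oplog traffic
print(round(retention_hours(oplog, rate)))  # 272 hours
```

The practical use is the inverse direction: measure the oplog rate (e.g. via db.printReplicationInfo()) and size the oplog so a slave can survive the longest expected maintenance window without a full resync.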
9. Monitoring
   - Slow query logs after every new build
   - Nagios: TCP port monitoring, disk space monitoring, CPU monitoring
   - Munin: Mongo connections, memory usage, ops/second, write lock %, collection sizes (in # of documents)
   - MMS: started using it 2 weeks ago as a beta customer
10. Maintenance
   Data defragmentation:
   - Slaves: by running them on a different port
   - Master: by taking downtime
   Collection trimming:
   - The scripts block during remove
   - Bulk removes kill the slaves, spiking CPU to 100%
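One common mitigation for the trimming problem is to remove in small, throttled batches so replication to the slaves can keep up. A sketch (batch size and pause are illustrative, not values from the talk; the two callbacks stand in for real collection operations):

```python
import time

def trim_collection(find_expired_ids, remove_batch, batch_size=500, pause=0.1):
    """Remove expired documents in small batches, sleeping between batches
    so the slaves' replication threads are not starved by one huge remove."""
    removed = 0
    while True:
        ids = find_expired_ids(batch_size)
        if not ids:
            break
        remove_batch(ids)       # e.g. remove({_id: {$in: ids}}) on the master
        removed += len(ids)
        time.sleep(pause)       # let the slaves catch up
    return removed

# Usage with in-memory stand-ins for the collection:
expired = list(range(1234))
def find_expired_ids(n): return expired[:n]
def remove_batch(ids): del expired[:len(ids)]
print(trim_collection(find_expired_ids, remove_batch, pause=0))  # 1234
```

Note that on pre-2.x MongoDB this still does not return disk space to the OS; it only keeps the slaves from falling behind during the purge.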
11. Backup, or prepping for "Oh S***!"
   - NetApp Filer-based snapshots: make sure to run {fsync: 1, lock: 1} on one slave first
   - Hourly dumps via a cron job, using mongodump
   - Incremental backup via the oplog: replay the oplog instead of relying on a snapshot
   - Delayed slaves: not recommended, as it almost guarantees data loss proportional to the delay, which is inversely proportional to the time-to-react
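The snapshot step can be sketched as a flush-lock-snapshot-unlock sequence. In this Python sketch the command documents are shown as plain dicts and run_command/take_snapshot are stand-ins for a real driver call and the filer snapshot; the {"unlock": 1} document is a placeholder, since the actual unlock mechanism differs by driver and server version:

```python
def snapshot_slave(run_command, take_snapshot):
    """Flush dirty pages and block writes on one slave, snapshot the
    filesystem, then always unlock, even if the snapshot fails."""
    run_command({"fsync": 1, "lock": 1})   # flush to disk and block writes
    try:
        take_snapshot()                    # fast NetApp Filer snapshot
    finally:
        run_command({"unlock": 1})         # placeholder for the real unlock

log = []
snapshot_slave(log.append, lambda: log.append("SNAPSHOT"))
print(log)
```

Doing this on a non-queried slave means the brief write lock never touches production traffic.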
12. Tools to be familiar with
   - mongostat: look at queue lengths, memory, connections, and operation mix
   - db.serverStatus(): server status with sync, page faults, locks, index misses
   - atop, iostat/vm_stat
   - db.stats(): overall info at the database level
   - db.<coll_name>.stats(): overall info at the collection level
   - db.printReplicationInfo(): info about the oplog size and length in time
   - db.printSlaveReplicationInfo(): info about the master, the last sync timestamp, and how far behind the slave is from the master. If the numbers look wonky, the delay could simply be no writes on the master.
13. What we've learned
   - Keep an eye on: page faults, index misses, queue lengths, write lock %, and database sizes on disk (space reuse vs. release)
   - Use .explain(): watch for nscanned and indexBounds
   - Use limit() when using find()
   - While updating, try to load the object into memory first so that it's in the working set (findAndModify)
   - Try to keep the fields being selected to a minimum
   - Do not use write concerns
   - Elegant schema design might bite you: design for performance and ease of programming
   - Write to multiple collections instead of doing MapReduce operations
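The explain() advice above boils down to flagging queries that scan far more documents than they return. A sketch of that check (field names follow the old 1.x explain output; the ratio threshold is illustrative):

```python
def looks_unindexed(explain, ratio=10):
    """Flag a query whose explain() shows a table scan (BasicCursor) or
    an nscanned far larger than the number of documents returned (n)."""
    scanned = explain.get("nscanned", 0)
    returned = max(explain.get("n", 0), 1)
    table_scan = explain.get("cursor", "").startswith("BasicCursor")
    return table_scan or scanned / returned > ratio

good = {"cursor": "BtreeCursor actorId_1", "nscanned": 20, "n": 20}
bad = {"cursor": "BasicCursor", "nscanned": 70000, "n": 20}
print(looks_unindexed(good), looks_unindexed(bad))  # False True
```

Running a check like this against the slow query log after each build is a cheap way to catch a query that silently lost its index.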
14. Next Steps
   - Move to replica sets on 1.8.1
   - Move relationship graphs to MongoDB; shard the relationships based on the userId
   - Run multiple mongod processes, splitting collections out among multiple databases
   - Fan-out architecture instead of queries, using HornetQ and Scala (Akka)
16. About Me
   Manish Pandit
   Engineering Manager, API Platform, IGN Entertainment
   @lobster1234
17. We are hiring Software Engineers to help us with exciting initiatives at IGN
   Technologies we use:
   - RoR, Java (no J2EE!), Scala, Spring, Play! Framework
   - PHP/Zend, jQuery, HTML5, CSS3, Sencha Touch, PhoneGap
   - MongoDB, memcached, Redis, Solr, ElasticSearch
   - New Relic for monitoring, 3scale for open APIs
   http://corp.ign.com/careers
   @ignjobs