Is MongoDB Right For Your Project (or Organization)
1. Is MongoDB Right for Your Project?
(...or your organization)?
Tony Bibbs
2. I’m Nobody Special. Seriously.
• Routine Geek Stuff
• Crappy code + 3NF databases
• These days, less crappy code
• Still using 3NF databases
3. About The GForge Group
• Not-so-crappy code
• Rigid, 3NF PostgreSQL DB
• ORM helps some
• If it won’t fit on paper....
• ...it won’t fit in a slide
4. RDBMS are like light beer: you can party with it, but ......
8. Who can guess the exact name of the #1 process hogging CPU on PHP/PostgreSQL apps? PHP/MySQL apps?
9. My Path to NoSQL
• I knew RDBMS was a problem
• Likes of FB, Twitter, Guardian
• Far better geeks than I using it
• I wanted to iterate faster
• “The Dream” of Web Scale
10. Basic Types of NoSQL Solutions
• Key-Value (Memcache)
• Column/Tabular (Cassandra, HBase)
• Graph Databases (Neo4j, VertexDB)
• Document Databases (CouchDB, MongoDB)
• (Never lose sight of RDBMS which are adding NoSQL features)
12. What attracted me to MongoDB
• Low ramp
• Document oriented applications (most things web facing)
• Trivial single server setup
• Native PHP driver (Java/Scala, .NET and about 9 others)
• Easy to make PHP use MongoDB as session store
• Fluent Query API
• Atomic operations despite not being ACID
13. What hooked me on MongoDB
• Large file support with GridFS
• MapReduce support when basic native query interface won’t do
• Cloud Friendly
• Native Geospatial Support (think location-based applications)
• Single Master to sharded cluster with zero downtime
• Fluid schemas and it’s all JSON (BSON, really)
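A minimal sketch of the fluid-schema bullet, in plain Python with no MongoDB server or driver involved. The sample documents, the `REQUIRED` set and `missing_fields` are hypothetical names made up for illustration. The point is only that two records in one collection can carry different fields, that they serialize to JSON with no extra mapping work, and that any schema rule has to live in code.

```python
import json

# Two documents in the same (hypothetical) "feedback" collection.
# They share some fields and differ in others; nothing in the data
# store itself would object to that.
feedback = [
    {"_id": 1, "email": "a@example.com", "rating": 5},
    {"_id": 2, "email": "b@example.com", "comment": "Love it",
     "location": [42.65, -73.75]},
]

# Assumption for this sketch: the application decides these fields are mandatory.
REQUIRED = {"_id", "email"}

def missing_fields(doc):
    """Return the required fields a document lacks, sorted by name."""
    return sorted(REQUIRED - doc.keys())

# Documents are already dict-shaped, so producing JSON for an API is one call.
as_json = [json.dumps(doc, sort_keys=True) for doc in feedback]

for doc, text in zip(feedback, as_json):
    print(text, "missing:", missing_fields(doc))
```

A check like `missing_fields` is the sort of thing an ORM or model layer would own in a real application.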
14. MongoDB Ain’t All Roses
• Record limit of 4MB in 1.6.x and 16MB in 1.8.x
• Use 64-bit systems (or don’t even bother)
• Lack of full text search
• Not ACID
• Writes can be problematic (which can be overcome)
• Backup and Disaster Recovery
15. What I Considered Before MongoDB
• Started with Cassandra (influenced by Twitter)
• Gave HBase/Hadoop a try (influenced by T8 Webware)
• Realized a) I have different needs and b) HBase people are smarter than I am
16. Walk Over, Limp Back
• Take Physical DBA Approach
• Know Internals of MongoDB
• Know hardware (RAM)
• Know schema design
• OOID <> GUID
17. MongoDB Plans at The GForge Group
• New feedback application
• Possible heterogeneous use of PostgreSQL and MongoDB
• GeoSpatial Mobile Applications
• Internal CRM Application
19. Tony Bibbs
tony@gforgegroup.com
http://gforgegroup.com
@tonybibbs and @gforgegroup
20. Credits
• Steve Francia at 10gen
• Light Beer - http://www.flickr.com/photos/iandavid
• Bar - http://www.flickr.com/photos/mediafury
• Ving Rhames - http://www.flickr.com/photos/16180154@N07
Editor’s Notes
This is about whether MongoDB is the best fit for your project, but we’ll try to point you to alternatives if needed.
- Aerospace
- Ag
- Marketing
- Public Service
- GForge
LAMP, but continuously dabbling in other stuff.
The diagram represents the rigidity of RDBMS in most organizations.
Related things are broken out across multiple pages, so you either need to create multiple logical models, each focusing on one part of the business, or be patient connecting all the dots and lines.
And this is just the design aspect...we haven’t even touched the SQL to access this crap.
It works, but you have to have a lot of it, and even then you are left feeling bloated.
It should be noted: “If it ain’t broke, don’t fix it”. F- dogma. Most of the best geeks I know find a way to dabble with new technologies in ways that allow for true evaluation. Strive for this. Startups may not be able to do this...innovation is, at times, about risk.
Get a lot more people drunk faster.
Four of these things are DB or DB related. WTF...
It’s expensive to find developers that can master all these.
We haven’t even added the complication of the Cloud (e.g. AWS).
- Index verification
- Page size optimizations (who does this anymore?)
- Query caching
- Denormalizing data (blows the mind of classically trained DBAs)
- Clustering (and insane sys admin’ing)
- Sharding (use Digg example). Requires an expensive DBMS or a custom API for fluent access to shards.
This is where I get mad at PHP haters. Performance problems in all my apps have started at the database. PHP has seldom been the problem. Find a new reason to hate PHP...performance ain’t it.
So despite APC, Memcache, denormalizing, query caching, etc., the DB is still the biggest problem.
Started with a “hunch” that things could be better.
RSS feeds started filling up with NoSQL chatter.
I admit I’m not working for FB, Twitter, etc., but that’s the Web Scale “dream”.
Faster iteration than conventional RDBMS web development...particularly for proofs of concept and prototyping.
Are there others?
How many people are using NoSQL in production? Which flavor?
Any quick plugs for those solutions?
Thrift - anybody here like Thrift?
Thrift interface for Cassandra == fail. Might as well write assembler.
Wanted native PHP support if possible, not a PHP/Thrift wrapper.
In PHP, Mongo is a quick PECL extension. It installs in one command on most sane Linux installs (and, happily, on OS X).
The Mongo install itself is almost as easy.
The API is dead simple for an idiot like myself.
Almost everything I do is document based. That is in stark contrast to companies that do a lot of batch processing or high-volume transaction processing. That’s a BigTable problem, so you’ll want to explore other NoSQL solutions.
I only mention the session thing because if you are using something that, at its very core, is distributed, then the first thing on the todo list is session handling.
Admittedly, I haven’t used GridFS, but it’s in production use on large-scale systems for storing things like audio and video.
MapReduce is BigTable-type stuff. It’s the only way to do real analytics (read: complicated queries) in NoSQL systems. I’ve avoided it to date, but it looks simple enough in MongoDB. HBase/Hadoop...not so much (but same basic concept).
Geospatial support allows distance-based queries that are intuitive. Initial benchmarks show it performs much better than doing similar queries in MySQL.
Fluid schemas mean each record in a document collection can have entirely different data. This can be good or bad: good because the schema is managed in code, bad because nothing but code can enforce schema changes. ORM still has a role here, IMHO.
JSON is synonymous with APIs, and having the DB support it natively means no special work to get data into JSON.
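To make the MapReduce note concrete, here is a rough sketch of the map/group/reduce idea in plain Python. It does not call MongoDB, HBase or Hadoop; the `posts` data and the names `map_post`, `reduce_counts` and `map_reduce` are invented for this illustration. A real MongoDB job would supply its map and reduce functions to the server instead.

```python
from collections import defaultdict

# Hypothetical blog-post documents; note the fluid schema (one has no tags).
posts = [
    {"title": "A", "tags": ["mongodb", "nosql"]},
    {"title": "B", "tags": ["php", "mongodb"]},
    {"title": "C"},
]

def map_post(doc):
    """Map step: emit a (key, value) pair for every tag on a post."""
    for tag in doc.get("tags", []):
        yield tag, 1

def reduce_counts(key, values):
    """Reduce step: collapse all values emitted for one key into a total."""
    return sum(values)

def map_reduce(docs, mapper, reducer):
    """Run the mapper over every document, group by key, then reduce each group."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in mapper(doc):
            grouped[key].append(value)
    return {key: reducer(key, values) for key, values in grouped.items()}

tag_counts = map_reduce(posts, map_post, reduce_counts)
print(tag_counts)
```

The same two-function shape is what makes the work easy to spread across shards: each shard can map and partially reduce its own documents before the totals are combined.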
You can do full text search in Mongo using other tools like Lucene if required.
While not ACID, MongoDB has atomic operators like push that allow you to, say, push multiple comments to a blog post without fear of collisions. In other words, no need to worry about programming around concurrency issues.
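Here is a toy illustration of why an atomic push matters, using an in-memory dictionary as a stand-in for the server. Nothing here talks to MongoDB: `store`, `read`, `replace` and `push` are hypothetical helpers, and `push` only imitates what a server-side `{"$push": {"comments": ...}}` update would do. The first half shows two clients overwriting each other with read-modify-write; the second shows both comments surviving when each client sends only the push.

```python
import copy

# Toy "server": a single blog post document keyed by its id.
store = {"post-1": {"_id": "post-1", "title": "Hello", "comments": []}}

def read(post_id):
    """Client fetches its own private copy of the document."""
    return copy.deepcopy(store[post_id])

def replace(post_id, doc):
    """Client writes the whole document back, clobbering what was there."""
    store[post_id] = copy.deepcopy(doc)

def push(post_id, field, value):
    """Stand-in for an atomic push: the server appends in place, one op at a time."""
    store[post_id][field].append(value)

# Read-modify-write: both clients read before either one writes.
client_a = read("post-1")
client_b = read("post-1")
client_a["comments"].append("first")
client_b["comments"].append("second")
replace("post-1", client_a)
replace("post-1", client_b)  # silently discards client A's comment
lost_update_result = list(store["post-1"]["comments"])

# Reset, then let each client send only the push operation.
store["post-1"]["comments"] = []
push("post-1", "comments", "first")
push("post-1", "comments", "second")
atomic_result = list(store["post-1"]["comments"])

print(lost_update_result, atomic_result)
```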
Cassandra was fine, but the Thrift interface sucked. Hard. Getting data in was fine, but the query interface was much lower level. Example: query on email to get an ID, then query on that ID to get blog posts. This can be done with one call in MongoDB.
HBase/Hadoop is mind blowing. Store data in HBase, use Hadoop for MapReduce, add ZooKeeper...more moving parts, more complicated, but meant to give a high-performance, non-relational solution for high-volume processing (talking trillions of records). In short, I may not be smart enough for HBase.
The key is to know what your organization needs and how the NoSQL databases differ, and to find up-to-date input on big organizations using them. NoSQL teams are iterating quickly, so limitations get resolved quickly. Be sure to stay current.
This is really true of all NoSQL implementations. Don’t do it and you’ll be blaming the tool, limping back to your boss.
A great example: there is a right and a wrong way to shut MongoDB down. The wrong way can lead to data loss...it only takes once, and it’s easy to blame the tool.
MongoDB wants to work in RAM. You want it to work in RAM, too. Many times you’ll want to query an item before updating it to make sure the update is as fast as possible.
You should also know how it uses the network. I admit I’m weak here, but I’m aware of it, so I know when to bounce things off network geeks.
Schema design, even in “schema-less” environments, is still key. It takes some getting used to.
OOID (MongoDB’s ObjectId) = timestamp + machine ID + process ID + counter
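The formula in this note can be sketched in a few lines of Python. This is not the driver's implementation: `make_object_id` and `timestamp_of` are hypothetical helpers, the 4/3/2/3 byte widths follow the classic 12-byte ObjectId layout as I understand it, and hashing the hostname for the machine part is an assumption made for this sketch. It does show why such an id is not a random GUID: the creation time is stored in it and ids sort roughly by creation order.

```python
import hashlib
import itertools
import os
import socket
import struct
import time

# Counter starts at a random 3-byte value and increments per id (assumption).
_counter = itertools.count(int.from_bytes(os.urandom(3), "big"))

def make_object_id(now=None):
    """Build a 12-byte id: timestamp + machine ID + process ID + counter."""
    timestamp = int(time.time() if now is None else now)
    # Machine part: first 3 bytes of a hash of the hostname (sketch assumption).
    machine = hashlib.sha256(socket.gethostname().encode()).digest()[:3]
    pid = os.getpid() & 0xFFFF           # keep the low 2 bytes
    count = next(_counter) & 0xFFFFFF    # keep the low 3 bytes
    return (
        struct.pack(">I", timestamp)     # 4 bytes, big-endian so ids sort by time
        + machine                        # 3 bytes
        + struct.pack(">H", pid)         # 2 bytes
        + count.to_bytes(3, "big")       # 3 bytes
    )

def timestamp_of(object_id):
    """Recover the creation time (seconds since the epoch) from an id."""
    return struct.unpack(">I", object_id[:4])[0]

oid = make_object_id()
print(oid.hex(), timestamp_of(oid))
```

Because the timestamp leads the id, a schema can sometimes skip a separate "created" field, which is one of the design consequences worth knowing before you start.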