1. MongoDB talk
A brief introduction to MongoDB
Table of contents
What's MongoDB?
  Who's behind MongoDB
  Licence
  Main key features:
  More cool features
How should or should it not be used?
  Use Cases
    Well Suited
    Less Well Suited
  Schema design basics
  Replication + Replica Sets + Sharding
    Replication
    Master-slave
    Replica sets
    Sharding
2. What's MongoDB?
“MongoDB (from "humongous") is an open source document-oriented database system
developed and supported by 10gen. It is part of the NoSQL family of database systems.
Instead of storing data in tables as is done in a "classical" relational database, MongoDB
stores structured data as JSON-like documents with dynamic schemas (MongoDB calls the
format BSON), making the integration of data in certain types of applications easier and
faster.”
(Wikipedia)
Who's behind MongoDB
o 10gen, New York City
Licence
- Database:
o Free Software Foundation's GNU AGPL v3.0.
o Commercial licenses are also available from 10gen, including free evaluation licenses.
- Drivers:
o mongodb.org supported drivers: Apache License v2.0.
o Third parties have created drivers too; licenses will vary there.
- Documentation:
o Creative Commons.
3. Main key features:
- Speed:
o No expensive Joins thanks to documents
o Fast small inserts (writing large documents is slow; this can be mitigated when reading
from slaves is allowed)
o Fast In-Place Updates
▪ Atomic modifiers for contention-free performance
- Scaling out:
o Easy setup
o Sharding distributes load to different machines
▪ Auto-Sharding
▪ Scale horizontally without compromising functionality
- Data safety
o Replica Sets:
▪ Your data isn't lost when a master drops out
o Replication & High Availability
More cool features
- Document-oriented storage
o JSON-style documents with dynamic schemas offer simplicity and power.
- Full Index Support
o Index on any attribute, just like you're used to.
- Querying
o Rich, document-based queries.
o Map/Reduce
▪ Flexible aggregation and data processing.
- GridFS
o Store files of any size without complicating your stack.
- Commercial Support
o Enterprise class support, training, and consulting available
- API
o C
o C++
o C#
o Haskell
o Java
o JavaScript
o Lisp
o Perl
o PHP
o Python
o Ruby
o Scala
o Some further (not official) language APIs
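To make the Map/Reduce feature listed above more concrete, here is a minimal sketch in plain Python. It is not MongoDB's API: the `comments` data and the `map_reduce`, `map_fn` and `reduce_fn` helpers are hypothetical and only imitate the idea of emitting key/value pairs per document and then reducing the values per key.

```python
from collections import defaultdict

# Hypothetical sample documents; in MongoDB these would live in a collection.
comments = [
    {"author": "ann", "votes": 3},
    {"author": "bob", "votes": 1},
    {"author": "ann", "votes": 4},
]

def map_fn(doc):
    # Map step: emit one (key, value) pair per document.
    yield doc["author"], doc["votes"]

def reduce_fn(key, values):
    # Reduce step: combine all values emitted for the same key.
    return sum(values)

def map_reduce(docs, map_fn, reduce_fn):
    # Group the emitted values by key, then reduce each group.
    groups = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            groups[key].append(value)
    return {key: reduce_fn(key, values) for key, values in groups.items()}

votes_per_author = map_reduce(comments, map_fn, reduce_fn)
print(votes_per_author)
```

The same split into a map function and a reduce function is what replaces joins and GROUP BY style aggregation in this model.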
4. How should or should it not be used?
Use Cases
Well Suited
- Archiving and event logging
o Real-time stats/analytics
- Document and Content Management Systems
o As a document-oriented (JSON) database, MongoDB's flexible schemas are a good fit
for this.
- Ecommerce
o Several sites are using MongoDB as the core of their ecommerce infrastructure
(often in combination with an RDBMS for the final order processing and accounting).
- Gaming.
o High-performance small reads and writes are a good fit for MongoDB. For certain
games, geospatial indexes can also be helpful.
- High volume problems.
o Problems where a traditional DBMS might be too expensive for the data in question.
In many cases developers would traditionally write custom code to a file system
instead using flat files or other methodologies.
- Mobile.
o Specifically the server-side infrastructure of mobile systems. Geospatial indexing is key here.
- Operational data store of a web site. MongoDB is very good at real-time inserts, updates, and
queries. Scalability and replication, both necessary for large web sites' real-time data stores,
are provided. Specific web use case examples:
o content management
o comment storage, management, voting
o user registration, profile, session data
- Projects using iterative/agile development methodologies.
o Mongo's BSON data format makes it very easy to store and retrieve data in a
document-style / "schema less" format. Addition of new properties to existing
objects is easy and does not generally require blocking "ALTER TABLE" style
operations.
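The "schema less" point in the last item can be illustrated with a small sketch. It uses a plain Python list of dicts as a stand-in for a collection; the `users` data, the `newsletter` property and the `wants_newsletter` helper are made up for illustration.

```python
# Toy in-memory "collection": a list of dicts stands in for stored documents.
users = [
    {"_id": 1, "name": "Ada"},
    {"_id": 2, "name": "Linus"},
]

# A later iteration of the application starts storing a new property.
# Existing documents stay untouched; nothing like ALTER TABLE is needed.
users.append({"_id": 3, "name": "Grace", "newsletter": True})

def wants_newsletter(doc):
    # Readers must tolerate documents written before the property existed.
    return doc.get("newsletter", False)

subscribers = [doc["name"] for doc in users if wants_newsletter(doc)]
print(subscribers)
```

The price for this flexibility is that the application code, not the database, has to deal with old documents that lack the new property.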
Less Well Suited
- Systems with a heavy emphasis on complex transactions such as banking systems and
accounting. These systems typically require multi-object transactions, which MongoDB
doesn't support. It's worth noting that, unlike many "NoSQL" solutions, MongoDB does
support atomic operations on single documents. As documents can be rich entities, this is
sufficient for many use cases.
- Traditional Non-Realtime Data Warehousing. Traditional relational data warehouses and
variants (columnar relational) are well suited for certain business intelligence problems,
especially if you need SQL to use client tools (e.g. MicroStrategy) with the database. For
cases where the analytics are realtime, the data is very complicated to model relationally, or
where the data volume is huge, MongoDB may be a fit.
- Problems requiring SQL.
5. Schema design basics
Traditional RDBMS store their data normalized.
- Pro:
o No data redundancy
o Joins and data aggregation from different sources
- Contra:
o Hard to read
o Data have to be gathered from different sources
Document stored data
In a document-oriented database, all data is stored in so-called documents.
Example: all data for a recipe is stored in one document. Using normalized storage, we would
have to split the data into different tables:
- Recipe
- Ingredients
- ...
- Pro:
o Human readable
o Data is easy and fast to access
- Contra:
o Redundant data
o No joins, so you have to get used to map/reduce or similar frameworks
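The recipe example can be sketched side by side in plain Python. The field names (`title`, `ingredients`, `amount`) and the sample values are assumptions made for illustration; lists of dicts stand in for tables and for a stored document.

```python
# Normalized storage: recipe data is split across two "tables" (lists of rows).
recipes = [{"recipe_id": 1, "title": "Pancakes"}]
ingredients = [
    {"recipe_id": 1, "name": "flour", "amount": "200 g"},
    {"recipe_id": 1, "name": "milk", "amount": "300 ml"},
    {"recipe_id": 1, "name": "eggs", "amount": "2"},
]

def load_recipe_normalized(recipe_id):
    # Reading one recipe means gathering rows from both tables (a join).
    recipe = next(r for r in recipes if r["recipe_id"] == recipe_id)
    parts = [i for i in ingredients if i["recipe_id"] == recipe_id]
    return {
        "title": recipe["title"],
        "ingredients": [{"name": p["name"], "amount": p["amount"]} for p in parts],
    }

# Document storage: everything for one recipe lives in a single document.
recipe_document = {
    "_id": 1,
    "title": "Pancakes",
    "ingredients": [
        {"name": "flour", "amount": "200 g"},
        {"name": "milk", "amount": "300 ml"},
        {"name": "eggs", "amount": "2"},
    ],
}

joined = load_recipe_normalized(1)
print(joined["ingredients"] == recipe_document["ingredients"])
```

The document version is read in one step. The redundancy listed under "Contra" shows up as soon as a second recipe repeats the same ingredient entries.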
Replication + Replica Sets + Sharding
Replication
6. Master-slave
- MongoDB supports master-slave replication. A master can perform reads and writes. A slave
copies data from the master and can only be used for reads or backup (not writes).
- MongoDB allows developers to guarantee that an operation has been replicated to at least N
servers on a per-operation basis.
- As operations are performed on the master, the slave will replicate any changes to the data.
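The per-operation guarantee "replicated to at least N servers" can be pictured with a toy model. This is not MongoDB's implementation or driver API: the `Node` and `Master` classes and the `min_copies` parameter are hypothetical, and the sketch copies changes to the slaves immediately, which is a simplification.

```python
class Node:
    def __init__(self, name, online=True):
        self.name = name
        self.online = online
        self.data = {}

class Master(Node):
    def __init__(self, name, slaves):
        super().__init__(name)
        self.slaves = slaves

    def write(self, key, value, min_copies=1):
        """Apply a write locally, copy it to reachable slaves, and report
        whether at least `min_copies` servers (master included) hold it."""
        self.data[key] = value
        copies = 1  # the master itself
        for slave in self.slaves:
            if slave.online:
                slave.data[key] = value
                copies += 1
        return copies >= min_copies

slaves = [Node("slave-1"), Node("slave-2", online=False)]
master = Master("master", slaves)

# With one slave unreachable, two copies can be confirmed but three cannot.
ok_two = master.write("greeting", "hello", min_copies=2)
ok_three = master.write("greeting", "hello again", min_copies=3)
print(ok_two, ok_three)
```

Note that in this toy the write is still applied on the master when the requested number of copies is not reached; the return value only tells the caller whether the guarantee was met.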
7. Replica sets
Replica sets are similar to master-slave, but they incorporate the ability for the slaves to elect a new
master if the current one goes down.
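A heavily simplified sketch of such an election follows. It assumes only two rules: a new master is chosen only while a majority of the set is reachable, and the reachable member with the newest data wins. The real election protocol takes more factors into account; the `elect_master` function and the `last_op` counter are hypothetical names.

```python
def elect_master(members):
    """members: list of dicts with 'name', 'online' and 'last_op'
    (a counter for the newest operation the member has applied).
    Returns the name of the new master, or None without a majority."""
    online = [m for m in members if m["online"]]
    if len(online) * 2 <= len(members):
        return None  # no majority reachable, so nobody may become master
    return max(online, key=lambda m: m["last_op"])["name"]

members = [
    {"name": "a", "online": False, "last_op": 42},  # the old master, now down
    {"name": "b", "online": True, "last_op": 41},
    {"name": "c", "online": True, "last_op": 40},
]

new_master = elect_master(members)
print(new_master)
```

The majority rule is also why replica sets are usually built with an odd number of members: with two of three members left, an election is still possible; with one of three, it is not.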
8. Sharding
MongoDB scales horizontally using a system called sharding, which is very similar to the BigTable and
PNUTS scaling model. The developer chooses a shard key, which determines how the data in a
collection will be distributed. The data is split into ranges (based on the shard key) and distributed
across multiple shards.
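How a shard key maps a document to a range, and the range to a shard, can be sketched in a few lines. The shard key (`username`), the split points and the shard names are assumptions for illustration; in a real cluster the ranges are managed by MongoDB itself.

```python
import bisect

# Hypothetical split points on the shard key "username".
# They define three ranges: below "h", from "h" up to "q", and "q" and above.
split_points = ["h", "q"]
shards = ["shard0", "shard1", "shard2"]

def shard_for(shard_key_value):
    # bisect_right counts how many split points are <= the value,
    # which is exactly the index of the range the value falls into.
    return shards[bisect.bisect_right(split_points, shard_key_value)]

for name in ["alice", "mallory", "zed"]:
    print(name, "->", shard_for(name))
```

A good shard key spreads documents evenly over the ranges; a key on which most values fall into a single range would leave one shard doing most of the work.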
The developer's application must know that it is talking to a sharded cluster when performing some
operations. For example, a "findAndModify" query must contain the shard key if the queried
collection is sharded. The application talks to a special routing process called `mongos` that looks
identical to a single MongoDB server. This `mongos` process knows what data is on each shard and
routes the client's requests appropriately. All requests flow through this process: it not only forwards
requests and responses but also performs any necessary final data merges or sorts. Any number of
`mongos` processes can be run: usually one per application server is recommended.
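The two jobs of the routing process described above, forwarding a request to the right shard and merging results from several shards, can be imitated with a toy router. This is only a sketch of the idea, not how `mongos` is implemented: the data, the single split point "h" and all function names are hypothetical.

```python
import heapq

# Toy cluster: each shard holds the documents for one range of the shard key "user".
shard_data = {
    "shard0": [{"user": "alice", "score": 7}, {"user": "bob", "score": 3}],
    "shard1": [{"user": "mallory", "score": 9}, {"user": "nina", "score": 1}],
}

def shard_for(user):
    # The router knows which key range lives on which shard (split at "h").
    return "shard0" if user < "h" else "shard1"

def find_by_user(user):
    # The query contains the shard key, so only one shard is contacted.
    target = shard_for(user)
    return [d for d in shard_data[target] if d["user"] == user]

def find_all_sorted_by_score():
    # No shard key in the query: every shard returns its own sorted result,
    # and the router performs the final merge.
    partials = [sorted(docs, key=lambda d: d["score"]) for docs in shard_data.values()]
    return list(heapq.merge(*partials, key=lambda d: d["score"]))

print(find_by_user("mallory"))
print([d["user"] for d in find_all_sorted_by_score()])
```

Queries that include the shard key touch a single shard, while queries without it fan out to all shards, which is one reason why the choice of shard key matters for performance.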