Talking about
MongoDB Intro & Fundamentals
Why MongoDB & Hadoop
Getting Started
Using MongoDB & Hadoop
Future of Big Data
Steve @sp
A
15+ years building
the internet
Father, husband,
skateboarder
Chief Solutions Architect @
responsible for drivers,
integrations, web & docs
Company behind MongoDB
Offices in NYC, Palo Alto, London & Dublin
100+ employees
Support, consulting, training
Management: Google/DoubleClick, Oracle, Apple, NetApp, MarkLogic
Well Funded: Sequoia, Union Square, Flybridge
MongoDB
Document Oriented
High Performance
Fully Consistent
Horizontally Scalable
Application:
    { author : "steve",
      date : new Date(),
      text : "About MongoDB...",
      tags : ["tech", "database"] }
MongoDB philosophy
Keep functionality when we can (key/value stores are great, but we need more)
Non-relational (no joins) makes scaling horizontally practical
Document data models are good
Database technology should run anywhere: virtualized, cloud, metal, etc.
Under the hood
Written in C++
Runs nearly everywhere
Data serialized to BSON
Extensive use of memory-mapped files, i.e. read-through write-through memory caching
“MongoDB has the best features of key/value stores, document databases and relational databases in one.”
John Nunemaker
Relational made normalized
data look like this
Category
• Name
• Url
Article
• Name
• Slug
• Publish date
• Text
User
• Name
• Email Address
Tag
• Name
• Url
Comment
• Comment
• Date
• Author
Document databases make
normalized data look like this
User
• Name
• Email Address
Article
• Name
• Slug
• Publish date
• Text
• Author
Comment[]
• Comment
• Date
• Author
Tag[]
• Value
Category[]
• Value
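As a rough illustration of the document model above, the article, its comments, tags, and categories can live in one nested object, while the user stays a separate document referenced from the author field. The field names and sample values below are assumptions made up for this sketch; only the overall shape comes from the slide.

```javascript
// Hypothetical sample data: names and values are illustrative only.
const user = { name: "steve", email: "steve@example.com" };

const article = {
  name: "About MongoDB",
  slug: "about-mongodb",
  publishDate: new Date("2012-01-01T00:00:00Z"),
  text: "About MongoDB...",
  author: user.name,            // reference to the separate User document
  comments: [                   // Comment[] embedded in the article
    { comment: "Nice overview", date: new Date("2012-01-02T00:00:00Z"), author: "anna" },
  ],
  tags: ["tech", "database"],   // Tag[] values
  categories: ["databases"],    // Category[] values
};

// Everything needed to render the article is read from one document,
// with no joins across comment, tag, or category tables.
function summarize(doc) {
  return `${doc.name} by ${doc.author}: ${doc.comments.length} comment(s), tags=${doc.tags.join(",")}`;
}

console.log(summarize(article));
```

Embedding suits data that is read together with its parent; the user is kept separate here because it is shared across many articles.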
CMS / Blog
Needs:
• Business needed a modern data store for rapid development and scale
Solution:
• Use PHP & MongoDB
Results:
• Real time statistics
• All data, images, etc. stored together: easy access, easy deployment, easy high availability
• No need for complex migrations
• Enabled very rapid development and growth
Photo Meta-Data
Problem:
• Business needed more flexibility than Oracle could deliver
Solution:
• Use MongoDB instead of Oracle
Results:
• Developed application in one sprint cycle
• 500% cost reduction compared to Oracle
• 900% performance improvement compared to Oracle
Customer Analytics
Problem:
• Deal with massive data volume across all customer sites
Solution:
• Use MongoDB to replace Google Analytics / Omniture options
Results:
• Less than one week to build prototype and prove business case
• Rapid deployment of new features
Archiving
Why MongoDB:
• Existing application built on MySQL
• Lots of friction with RDBMS based archive storage
• Needed more scalable archive storage backend
Solution:
• Keep MySQL for active data (100 million)
• MongoDB for archive (2+ billion)
Results:
• No more alter table statements taking over 2 months to run
• Sharding enabled horizontal scale
• Very happily looking at other places to use MongoDB
Online Dictionary
Problem:
• MySQL could not scale to handle their 5B+ documents
Solution:
• Switched from MySQL to MongoDB
Results:
• Massive simplification of code base
• Eliminated need for external caching system
• 20x performance improvement over MySQL
E-commerce
Problem:
• Multi-vertical e-commerce impossible to model (efficiently) in RDBMS
Solution:
• Switched from MySQL to MongoDB
Results:
• Massive simplification of code base
• Rapid builds, halving time to market (and cost)
• Eliminated need for external caching system
• 50x+ performance improvement over MySQL
Tons more
MongoDB casts a wide net:
people keep coming up with new and brilliant ways to use it
Applications have
complex needs
Use the best tool for the job
Often more than one tool is needed
MongoDB is an ideal operational database
MongoDB is ideal for BIG data
It is not a data processing engine
For heavy processing needs, use a tool designed for that job: Hadoop
MongoDB Map Reduce
MongoDB map reduce is quite capable... but with limits
- JavaScript is not the best language for processing map reduce
- JavaScript is limited in external data processing libraries
- Adds load to the data store
- Sharded environments do parallel processing
MongoDB
Aggregation
Most uses of MongoDB Map Reduce were for aggregation
Aggregation Framework is optimized for aggregate queries
Fixes some of the limits of MongoDB MR
- Can do realtime aggregation similar to SQL GROUP BY
- Parallel processing on sharded clusters
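To make the GROUP BY comparison concrete, here is a minimal sketch of an aggregation pipeline that counts documents per tag, roughly what SELECT tag, COUNT(*) ... GROUP BY tag would do in SQL. The $unwind, $group, $sum and $sort operators are real pipeline operators; the sample documents, the tags field, and the countByTag helper are assumptions for illustration, and the helper only mimics the pipeline in memory so the example runs without a server.

```javascript
// Pipeline shape as it would be passed to an aggregate() call;
// the collection contents and field names are hypothetical.
const pipeline = [
  { $unwind: "$tags" },
  { $group: { _id: "$tags", count: { $sum: 1 } } },
  { $sort: { count: -1 } },
];

// Plain-JavaScript equivalent of that pipeline, so the sketch runs without a server.
function countByTag(docs) {
  const counts = new Map();
  for (const doc of docs) {
    // $unwind + $group: one count per (document, tag) pair, grouped by tag
    for (const tag of doc.tags) counts.set(tag, (counts.get(tag) || 0) + 1);
  }
  // $sort: highest count first
  return [...counts.entries()]
    .map(([tag, count]) => ({ _id: tag, count }))
    .sort((a, b) => b.count - a.count);
}

const posts = [
  { author: "steve", tags: ["tech", "database"] },
  { author: "anna", tags: ["database"] },
];
console.log(countByTag(posts));
```

The pipeline is declarative data rather than JavaScript functions, which is part of why it avoids the JavaScript-related limits listed for map reduce.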
MongoDB Map Reduce
MongoDB Data
→ Map()
   map iterates on documents, 1 at a time per shard
   Document is this
   emit(k,v)
→ Group(k)
→ Sort(k)
→ Reduce(k,values)
   returns k,v
   Input matches output
   Can run multiple times
→ Finalize(k,v)
   returns k,v
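The map, group, sort, reduce and finalize stages can be sketched in memory as follows. This is not MongoDB's implementation: runMapReduce is a hypothetical helper, and emit is passed in as an argument here, whereas in MongoDB it is available inside the map function directly. What the sketch does keep is that the document is bound to this in map, and that reduce returns the same shape it receives, since it may be run more than once per key.

```javascript
// In-memory sketch of the flow: map/emit -> group -> sort -> reduce -> finalize.
// runMapReduce is a hypothetical helper for illustration, not a MongoDB API.
function runMapReduce(docs, map, reduce, finalize) {
  const groups = new Map();
  const emit = (k, v) => {
    if (!groups.has(k)) groups.set(k, []);
    groups.get(k).push(v);
  };
  // map is called once per document, with the document bound to `this`
  for (const doc of docs) map.call(doc, emit);

  const result = {};
  for (const k of [...groups.keys()].sort()) {
    const values = groups.get(k);
    // A key with a single emitted value skips reduce, so reduce output
    // must have the same shape as the emitted values.
    const reduced = values.length === 1 ? values[0] : reduce(k, values);
    result[k] = finalize ? finalize(k, reduced) : reduced;
  }
  return result;
}

// Hypothetical sample data and functions: count documents per tag.
const docs = [
  { author: "steve", tags: ["tech", "database"] },
  { author: "anna", tags: ["database"] },
];
function mapTags(emit) {
  for (const tag of this.tags) emit(tag, { count: 1 });
}
function reduceCounts(key, values) {
  return { count: values.reduce((sum, v) => sum + v.count, 0) };
}
function finalizeCount(key, value) {
  return value.count;
}

const tagCounts = runMapReduce(docs, mapTags, reduceCounts, finalizeCount);
console.log(tagCounts); // { database: 2, tech: 1 }
```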
Hadoop Map Reduce
InputFormat
→ Map(k1, v1, ctx)
   Many map operations, 1 at a time per input split
   ctx.write(k2, v2): same as Mongo's emit
→ Combiner(k2, values2)
   Runs on same thread as map
   Similar to Mongo's reducer
   Output: k2, v3
→ Partitioner(k2)
   Similar to Mongo's group
→ Sort(keys2)
→ Reduce(k3, values4)
   Reducer threads, runs once per key
   Similar to Mongo's Finalize
   Output: kf, vf
→ Output Format
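A rough in-memory sketch of the Hadoop stages named on this slide (input splits, map, combiner, partitioner, sort, reduce) might look like the following. Real Hadoop jobs implement these stages as Java classes running across many machines; runJob, groupByKey, partition and the sample splits are hypothetical names invented for this illustration.

```javascript
// Groups [key, value] pairs into a Map of key -> array of values.
function groupByKey(pairs) {
  const groups = new Map();
  for (const [k, v] of pairs) {
    if (!groups.has(k)) groups.set(k, []);
    groups.get(k).push(v);
  }
  return groups;
}

// Partitioner: a simple string hash, so a key always lands on the same reducer.
function partition(key, numReducers) {
  let hash = 0;
  for (const ch of String(key)) hash = (hash * 31 + ch.codePointAt(0)) % 1000003;
  return hash % numReducers;
}

// Hypothetical driver: not a Hadoop API, just the stages in sequence.
function runJob(splits, map, combine, reduce, numReducers) {
  const partitions = Array.from({ length: numReducers }, () => []);
  for (const split of splits) {
    // Map: one record at a time per input split; write() stands in for ctx.write(k2, v2)
    const mapped = [];
    for (const record of split) map(record, (k, v) => mapped.push([k, v]));
    // Combiner: pre-aggregates this split's map output before it is shuffled
    for (const [k, values] of groupByKey(mapped)) {
      partitions[partition(k, numReducers)].push([k, combine(k, values)]);
    }
  }
  const output = {};
  for (const part of partitions) {
    const groups = groupByKey(part);
    // Sort keys within the partition, then reduce once per key
    for (const k of [...groups.keys()].sort()) output[k] = reduce(k, groups.get(k));
  }
  return output;
}

const sum = (key, values) => values.reduce((a, b) => a + b, 0);
const splits = [
  [{ tags: ["tech", "database"] }, { tags: ["database"] }],
  [{ tags: ["database", "hadoop"] }],
];
const counts = runJob(splits, (doc, write) => doc.tags.forEach((t) => write(t, 1)), sum, sum, 2);
console.log(counts); // database: 3, tech: 1, hadoop: 1 (key order depends on partitioning)
```

Using the same function as combiner and reducer works here because summing is associative; a reducer that is not associative would need a separate combiner or none at all.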
MongoDB & Hadoop
MongoDB (single server or sharded cluster)
→ InputFormat
   Creates a list of Input Splits
   Same as Mongo's shard chunks (64 MB)
→ RecordReader (each split)
→ Map(k1, v1, ctx)
   Many map operations, 1 at a time per input split
   ctx.write(k2, v2)
→ Combiner(k2, values2)
   Runs on same thread as map
   Output: k2, v3
→ Partitioner(k2)
→ Sort(keys2)
→ Reduce(k2, values3)
   Reducer threads, runs once per key
   Output: kf, vf
→ Output Format
→ MongoDB
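One way to picture how shard chunks become input splits is shown below, under heavy assumptions: each chunk's key range is turned into a range query, and a record reader runs that query to feed one map task. The chunk metadata, toInputSplits and readSplit are hypothetical and are not the connector's actual API; the reader filters an in-memory array instead of querying MongoDB.

```javascript
// Hypothetical chunk metadata, shaped loosely like a sharded cluster's chunk ranges.
// Ranges include min and exclude max.
const chunks = [
  { shard: "shard0", min: { _id: 0 }, max: { _id: 1000 } },
  { shard: "shard1", min: { _id: 1000 }, max: { _id: 2000 } },
];

// One input split per chunk: each split carries a range query on the shard key.
function toInputSplits(chunkList, shardKey) {
  return chunkList.map((chunk) => ({
    shard: chunk.shard,
    query: { [shardKey]: { $gte: chunk.min[shardKey], $lt: chunk.max[shardKey] } },
  }));
}

// Stand-in for a record reader: applies the split's range to an in-memory array.
function readSplit(split, docs, shardKey) {
  const { $gte, $lt } = split.query[shardKey];
  return docs.filter((d) => d[shardKey] >= $gte && d[shardKey] < $lt);
}

const inputSplits = toInputSplits(chunks, "_id");
const sample = [{ _id: 5 }, { _id: 1500 }, { _id: 999 }];
console.log(inputSplits.map((s) => readSplit(s, sample, "_id").length)); // [ 2, 1 ]
```

Because the splits follow chunk boundaries, each map task reads a disjoint key range and the splits can be processed in parallel.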
Google 2000
Google Inc, today announced it
has released the largest search
engine on the Internet.
Google’s new index, comprising
more than 1 billion URLs
Google 2008
Our indexing system for processing
links indicates that
we now count 1 trillion unique URLs
(and the number of individual web
pages out there is growing by
several billion pages per day).
BIG 2012 & Beyond
MongoDB enables us to scale
with the redefinition of BIG.
New processing tools like
Hadoop & Storm are enabling
us to process the new BIG.
MongoDB is committed to working with the best data tools,
including Storm, Spark, & more
http://spf13.com
http://github.com/s
@spf13
Question
download at mongodb.org
We’re hiring!! Contact us at jobs@10gen.com
Editor's notes
By reducing the transactional semantics the database provides, one can still solve an interesting set of problems where performance is very important, and horizontal scaling then becomes easier.