8. Global MongoDB Community
41,000+
Monthly Unique Downloads
24,000+
Online Education Registrants
12,000+
MongoDB User Group Members
10,000+
Annual MongoDB Days Attendees
15. Organizations are becoming frustrated using an RDBMS.
Productivity decreases
• Needed to add new software layers of ORM, Caching, Sharding, Message Queue
• Polymorphic, semi-structured and unstructured data not well supported
Cost of database increases
• Vertical, not horizontal, scaling
• High cost of SAN
20. MongoDB is a scalable, high-performance NoSQL database.
• Open source, written in C++
• Document-oriented Storage
– Based on JSON Documents
– Schema-less
• Full featured indexes, query language
• Replication & High Availability
• Auto-sharding
21. Relational Database Challenges
Data Types
•Unstructured data
•Semi-structured data
•Polymorphic data
Agile Development
•Iterative
•Short development cycles
•New workloads
Volume of Data
•Petabytes of data
•Trillions of records
•Tens of millions of queries per second
New Architectures
•Horizontal scaling
•Commodity servers
•Cloud computing
22. Volume of Data
•Petabytes of data
•Trillions of records
•Millions of queries per second
23. Data Types
•Unstructured data
•Semi-structured data
•Polymorphic data
{
  _id : ObjectId("4c4ba5e5e8aabf3"),
  employee_name: "Dunham, Justin",
  department : "Marketing",
  title : "Product Manager, Web",
  report_up: "Neray, Graham",
  pay_band: "C",
  benefits : [
    { type : "Health",
      plan : "PPO Plus" },
    { type : "Dental",
      plan : "Standard" }
  ]
}
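As a rough illustration of what "polymorphic data" means for the employee document on this slide, the sketch below uses plain Python dicts in place of a real collection. The second record, its `contract_end` field, and the helper `with_benefit` are hypothetical and added only for illustration; no MongoDB driver is involved.

```python
# Plain dicts stand in for documents stored in one collection.
employees = [
    {
        "employee_name": "Dunham, Justin",
        "department": "Marketing",
        "title": "Product Manager, Web",
        "report_up": "Neray, Graham",
        "pay_band": "C",
        "benefits": [
            {"type": "Health", "plan": "PPO Plus"},
            {"type": "Dental", "plan": "Standard"},
        ],
    },
    # Hypothetical record with a different shape: no benefits array,
    # plus a field the first record does not have.
    {
        "employee_name": "Example, Contractor",
        "department": "Marketing",
        "contract_end": "2013-06-30",
    },
]


def with_benefit(docs, benefit_type):
    """Return names of documents whose benefits array contains benefit_type."""
    return [
        doc["employee_name"]
        for doc in docs
        # Documents without a benefits array are simply skipped.
        if any(b.get("type") == benefit_type for b in doc.get("benefits", []))
    ]


print(with_benefit(employees, "Dental"))  # ['Dunham, Justin']
```

Both records sit side by side even though their fields differ, and the lookup tolerates the missing array without any schema change.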
24. Agile Development
•Iterative
•Short development cycles
•New workloads
25. MongoDB Use Cases
• Content Management
• Operational Intelligence
• E-Commerce
• User Data Management
• High Volume Data Feeds
26. Problem, Why MongoDB, Impact
Problem
• A need to extract value from existing semi-structured data sources (social networks etc.)
• A fast-growing customer base required any solution to be easily scalable
Why MongoDB
• Built around scalability, with auto-sharding features
• MongoDB deployment architecture prevents any single point of failure
• Geospatial indexing out-of-the-box enables location-based service delivery
Impact
• Priority Moments project is a strong success
• Subsequent adoption of MongoDB by O2 & Telefonica across a large number of projects
“Selecting MongoDB as our database platform was a no brainer as the technology offered us the flexibility and scalability that we knew we’d need for Priority Moments.”
Andrew Pattinson, Head of Online Delivery
27. Problem, Why MongoDB, Impact
Problem
• RDBMS architecture constrained their ability to absorb upstream contributions from users
• New features, competitions needed to log data into user records, requiring schema changes
Why MongoDB
• Flexible data model allows for heterogeneous structure
• Rich query language preserves functionality
• System updates with zero downtime
• Ease of use, allowing a large development team to adopt the technology quickly
Impact
• The Guardian has competitive advantage, through enabling social conversations through the site
• Interactive features can be delivered more quickly, which translates to increased revenues
“Relational databases have a sound approach, but that doesn’t necessarily match the way we see our data. MongoDB gave us the flexibility to store data in the way that we understand it as opposed to somebody’s theoretical view.”
Philip Wills, Software Architect
28. New Architectures
•Horizontal scaling
•Commodity servers
•Cloud computing
31. Best Total Cost of Ownership (TCO)
Developer and Ops Savings
•Less code
•More productive development
•Easier to maintain
Hardware Savings
•Commodity servers
•Internal storage (no SAN)
•Scale out, not up
Software and Support Savings
•No upfront license – pay for value over time
•Cost visibility for usage growth
32. Relational Database Challenges
Data Types
•Unstructured data
•Semi-structured data
•Polymorphic data
Agile Development
•Iterative
•Short development cycles
•New workloads
Volume of Data
•Petabytes of data
•Trillions of records
•Tens of millions of queries per second
New Architectures
•Horizontal scaling
•Commodity servers
•Cloud computing
33. For Developers / Architects
What Values, For Which Audience?
• Agility / Flexibility
– Schema-Free
– Easy to get started
• Performance
– Often a significant improvement over RDBMS
• Features
– Rich Query Language, Aggregation Framework, Map-Reduce
34. For Operations
What Values, For Which Audience?
• Automation & Scaling
– Sharding
– High-Availability
• Resilience, DR
– Write-Concerns give granular control, across data-centers
35. For Executives
What Values, For Which Audience?
• Competitive Advantage
– Faster time-to-market
– Accessible real-time analytics
– Flexible (low-risk) deployments
• Commodity Infrastructure
– Lower TCO than proprietary RDBMS
38. 2.2 Overview
• Concurrency: yielding + db level locking
• New aggregation framework
• TTL Collections
• Improved free list implementation
• Tag aware sharding
• Read Preferences
• http://docs.mongodb.org/manual/release-notes/2.2/
39. 2.4 Roadmap
• Security
– SASL, Kerberos, Additions to privileges and auditing
• Hash-based Sharding
• Geospatial Indexing: query intersecting polygons
• Aggregation framework: faster and more features
• V8, background secondary indexing, replica set flapping
• Distribute non-sharded collections throughout cluster
• MMS running in your own data center (separate)
OK, so here are the presenter's notes. Your first job is to add your name and other useful details so that your students can contact you afterwards. This is a good time to introduce yourself and create a seating chart: get each student to say their name, their company and what they want to learn, and write it on your seating chart.
Note: Growth refers to year-to-date revenue based on our fiscal years for 2011 and 2012, i.e., it compares Feb-Oct 2011 (calendar year) to Feb-Oct 2012 (calendar). These figures are unaudited and subject to change.
A highlight of some key features in 2.4. We'll add more details and more items each month as we work towards a winter release.
Security: SASL is a framework for authentication that helps decouple specific authentication mechanisms from client/server implementation. This framework will permit working with a variety of authentication mechanisms. Kerberos is quite common, so we'll build that one in first. We may add others over time, but the SASL implementation will make it much easier for you to add your own without having to implement a new client. With additional authentication, we want to take a few steps to separate out the activities authorized to various users: read, read/write, security administration, database-specific operations (compact, validate, etc.), and server/cluster administration (fsync, log rotate, shutdown, create database, etc.). This is just an initial step in our authorization work.
Hash-based sharding: Apply a hash function to a selected key and use the result as the shard key. This evenly spreads documents, and the work associated with queries, across a sharded cluster. It will minimize migrations (which should only happen when growing a cluster). Note: this is something you can do now, but it is not automatic.
Geospatial index resolution: Talk about the challenge of specifying a polygon and finding its overlap with another polygon in a document. This becomes interesting for location-aware applications and the intelligence community.
Replica set flapping: Avoid electing a new primary due to falsely detecting that the current primary went down. We are adding mechanisms to reduce false detections. This is good for heavy load and network issues/blips in a data center.
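The hash-based sharding idea in the notes above (hash the selected key, then place the document by the hash) can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the choice of MD5, the modulo placement, and the names `shard_for` and `distribution` are illustrative and are not taken from MongoDB's implementation, which assigns ranges of hashed values to chunks.

```python
import hashlib
from collections import Counter


def shard_for(key, num_shards):
    """Map a shard-key value to a shard index via a hash (MD5 is an assumption)."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer, then pick a shard.
    return int.from_bytes(digest[:8], "big") % num_shards


def distribution(keys, num_shards):
    """Count how many keys land on each shard."""
    return Counter(shard_for(k, num_shards) for k in keys)


# Monotonically increasing keys (counters, timestamps) are the classic case
# where range-based placement concentrates inserts on one shard.
counts = distribution(range(10_000), 4)
print(sorted(counts.items()))
```

Hashing trades away efficient range queries on the original key in exchange for an even spread of documents and write load, which is why it suits keys that would otherwise grow in one direction.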