tranSMART Community Meeting 5-7 Nov 13 - Session 2: MongoDB: What, Why And When
Massimo Brignoli, MongoDB Inc
The presentation will illustrate what MongoDB is, the advantages of the document-based approach, and some of the use cases where MongoDB is a perfect fit.
4. MongoDB Inc. Overview
300+ employees
Offices in New York, Palo Alto, Washington DC, London, Dublin, Barcelona and Sydney
600+ customers
Over $231 million in funding
6. MongoDB Inc. Products and Services
Subscriptions
MongoDB Enterprise, On-Prem Monitoring, Professional Support and Commercial License
Consulting
Expert Resources for All Phases of MongoDB Implementations
Training
Online and In-Person for Developers and Administrators
MongoDB Monitoring Service
Cloud-Based Service for Monitoring, Alerts, Backup and Restore
10. Document Model Benefits
• Agility and flexibility
– Data models can evolve easily
– Companies can adapt to changes quickly
• Intuitive, natural data representation
– Developers are more productive
– Many types of applications are a good fit
• Reduces the need for joins, disk seeks
– Programming is simpler
– Performance can be delivered at scale
14. MongoDB is full featured
Rich Queries
• Find Paul’s cars
• Find everybody in London with a car built between 1970 and 1980
Geospatial
• Find all of the car owners within 5km of Trafalgar Sq.
Text Search
• Find all the cars described as having leather seats
Aggregation
• Calculate the average value of Paul’s car collection
Map Reduce
• What is the ownership pattern of colors by geography over time? (is purple trending up in China?)
MongoDB
{
  first_name: 'Paul',
  surname: 'Miller',
  city: 'London',
  location: [45.123,47.232],
  cars: [
    { model: 'Bentley',
      year: 1973,
      value: 100000, … },
    { model: 'Rolls Royce',
      year: 1965,
      value: 330000, … }
  ]
}
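The first few questions on this slide can be sketched in plain JavaScript against an in-memory copy of the sample document. This only illustrates what each query asks for and is not how MongoDB evaluates it. The second owner ("Anna"), the helper names (`carsOf`, `ownersWithCarBuiltBetween`, `averageCarValue`) and the `owners` collection name mentioned in the comments are assumptions, and the fields elided with … on the slide are simply left out.

```javascript
// In-memory stand-in for a collection of owner documents.
// "Paul" comes from the slide; "Anna" is a made-up second owner for contrast.
const owners = [
  {
    first_name: "Paul", surname: "Miller", city: "London",
    location: [45.123, 47.232],
    cars: [
      { model: "Bentley", year: 1973, value: 100000 },
      { model: "Rolls Royce", year: 1965, value: 330000 },
    ],
  },
  {
    first_name: "Anna", surname: "Rossi", city: "Milan",
    cars: [{ model: "Fiat", year: 1975, value: 8000 }],
  },
];

// "Find Paul's cars": match on a top-level field, return the embedded array.
// Roughly what db.owners.find({ first_name: "Paul" }, { cars: 1 }) asks for
// (the collection name "owners" is hypothetical).
function carsOf(firstName) {
  return owners
    .filter((o) => o.first_name === firstName)
    .flatMap((o) => o.cars);
}

// "Find everybody in <city> with a car built between two years":
// a single embedded car must satisfy both bounds (the idea behind $elemMatch).
function ownersWithCarBuiltBetween(city, fromYear, toYear) {
  return owners.filter(
    (o) =>
      o.city === city &&
      o.cars.some((c) => c.year >= fromYear && c.year <= toYear)
  );
}

// "Calculate the average value of Paul's car collection".
function averageCarValue(firstName) {
  const cars = carsOf(firstName);
  const total = cars.reduce((sum, c) => sum + c.value, 0);
  return cars.length === 0 ? null : total / cars.length;
}

console.log(carsOf("Paul").map((c) => c.model));
console.log(ownersWithCarBuiltBetween("London", 1970, 1980).map((o) => o.first_name));
console.log(averageCarValue("Paul"));
```

Because the cars are embedded in the owner document, all three questions are answered from a single document with no join.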
15. Shell and Drivers
Drivers
Drivers for most popular programming languages and frameworks
Java
Ruby
JavaScript
Python
Perl
Haskell
Shell
Command-line shell for interacting directly with database
> db.collection.insert({company: "10gen", product: "MongoDB"})
>
> db.collection.findOne()
{
  "_id" : ObjectId("5106c1c2fc629bfe52792e86"),
  "company" : "10gen",
  "product" : "MongoDB"
}
20. Availability Considerations
High Availability – Ensure application availability during many types of failures
Disaster Recovery – Address the RTO and RPO goals for business continuity
Maintenance – Perform upgrades and other maintenance operations with no application downtime
21. Replica Sets
• Replica Set – two or more copies
• “Self-healing” shard
• Addresses many concerns:
- High Availability
- Disaster Recovery
- Maintenance
22. Replica Set Benefits
Business Needs | Replica Set Benefits
High Availability | Automated failover
Disaster Recovery | Hot backups offsite
Maintenance | Rolling upgrades
Low Latency | Locate data near users
Workload Isolation | Read from non-primary replicas
Data Privacy | Restrict data to physical location
Data Consistency | Tunable Consistency
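Two of these benefits, reading from non-primary replicas and tunable consistency, are typically selected per client through connection string options. The sketch below only assembles such a string. The host names and the replica set name `rs0` are hypothetical, and `replicaSetUri` is an illustrative helper rather than part of any driver; `replicaSet`, `readPreference` and `w` are standard MongoDB connection string options.

```javascript
// Hypothetical hosts; replace with the members of a real replica set.
const hosts = [
  "db1.example.com:27017",
  "db2.example.com:27017",
  "db3.example.com:27017",
];

// Illustrative helper: joins the seed list and encodes options as a query string.
function replicaSetUri(hostList, options) {
  const query = new URLSearchParams(options).toString();
  return `mongodb://${hostList.join(",")}/?${query}`;
}

const uri = replicaSetUri(hosts, {
  replicaSet: "rs0",                    // assumed replica set name
  readPreference: "secondaryPreferred", // workload isolation: read from secondaries when available
  w: "majority",                        // writes acknowledged by a majority of members
});

console.log(uri);
```

Reads served by a secondary can lag behind the primary, so this trade-off suits reporting or analytics workloads better than reads that must see the latest write.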
24. Single Data Center
[Diagram: replica sets A, B and C, each with one primary and two secondaries]
• Automated failover
• Tolerates server failures
• Tolerates rack failures
• Number of replicas defines failure tolerance
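The link between the number of replicas and failure tolerance follows from the rule that a replica set needs a strict majority of its voting members to elect a primary. A minimal sketch of that arithmetic, with function names chosen here for illustration:

```javascript
// Votes needed to elect a primary: a strict majority of voting members.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// Members that can be lost while a primary can still be elected.
function faultTolerance(votingMembers) {
  return votingMembers - majority(votingMembers);
}

for (const n of [3, 4, 5, 7]) {
  console.log(
    `${n} members: majority ${majority(n)}, tolerates ${faultTolerance(n)} failure(s)`
  );
}
```

Going from three to four members raises the majority without raising the tolerance, which is why odd member counts are the usual choice.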
25. Active/Standby Data Center
[Diagram: replica sets A, B and C, each with one primary and two secondaries, spread across Data Center - West and Data Center - East]
• Tolerates server and rack failure
• Standby data center
26. Active/Active Data Center
[Diagram: replica sets A, B and C, each with one primary, three secondaries and one arbiter, spread across Data Center - West, Data Center - Central and Data Center - East]
• Tolerates server, rack, data center failures, network partitions
36. Schema Design Challenge
• Flexibility
– Easily adapt to new requirements
• Agility
– Rapid application development
• Scalability
– Support large data and query volumes
49. How are they different? Why?
Contact
• name
• company
• title
• phone
Address
• street
• city
• state
• zip_code
Contact
• name
• company
• address
– street
– city
– state
– zip_code
• title
• phone
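The difference between the two shapes can be sketched in plain JavaScript: the relational version keeps contact and address in separate rows linked by a key, while the document version embeds the address in the contact. The sample values, the `id` and `contact_id` key columns and the helper name `toContactDocument` are all assumptions made for illustration.

```javascript
// Relational shape: two tables linked by a key (sample rows are made up).
const contactRows = [
  { id: 1, name: "Paul Miller", company: "Example Ltd", title: "Engineer", phone: "555-0100" },
];
const addressRows = [
  { contact_id: 1, street: "10 Example Street", city: "Springfield", state: "IL", zip_code: "62701" },
];

// Document shape: the address travels inside the contact,
// so reading a contact needs no join.
function toContactDocument(contact, addresses) {
  const match = addresses.find((a) => a.contact_id === contact.id);
  const { id, ...fields } = contact; // drop the relational key
  if (!match) return { ...fields };
  const { contact_id, ...address } = match; // drop the foreign key
  return { ...fields, address };
}

const doc = toContactDocument(contactRows[0], addressRows);
console.log(JSON.stringify(doc, null, 2));
```

The join happens once, when the document is built, instead of on every read.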
52. Many to Many
Traditional Relational Association
Join table
Groups
• name
GroupContacts
• group_id
• contact_id
Contacts
• name
• company
• title
• phone
Use arrays instead
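A rough sketch of "use arrays instead": the rows of the join table are folded into an array field on each contact. The slide does not say which side holds the array, so placing a `groups` array on contacts is an assumption, as are the sample ids and the helper names `embedGroups` and `contactsInGroup`.

```javascript
// Relational: membership lives in a separate join table (made-up rows).
const groupContacts = [
  { group_id: "g1", contact_id: "c1" },
  { group_id: "g2", contact_id: "c1" },
  { group_id: "g1", contact_id: "c2" },
];

const contacts = [
  { _id: "c1", name: "Contact One" },
  { _id: "c2", name: "Contact Two" },
];

// Document: each contact carries an array of the groups it belongs to.
function embedGroups(contactList, joinRows) {
  return contactList.map((c) => ({
    ...c,
    groups: joinRows
      .filter((r) => r.contact_id === c._id)
      .map((r) => r.group_id),
  }));
}

// Membership lookup is a test on the array. In MongoDB a filter such as
// { groups: "g1" } matches documents whose array contains that value.
function contactsInGroup(docs, groupId) {
  return docs.filter((d) => d.groups.includes(groupId));
}

const contactDocs = embedGroups(contacts, groupContacts);
console.log(contactDocs);
console.log(contactsInGroup(contactDocs, "g1").map((d) => d.name));
```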
54. Groups
• name
Contacts
• name
• company
• title
Groups N : N Contacts
Contacts 1 : 1 Portraits
Portraits
• mime_type
• data
twitter 1
• name
• location
• web
• bio
thumbnail 1
• mime_type
• data
addresses N
• type
• street
• city
• state
• zip_code
phones N
• type
• number
emails N
• type
• address
Document model - holistic and efficient representation
57. 360-Degree Patient View
• Healthcare provider networks have massive amounts of patient data
– Both structured and unstructured
– Basic patient information
– Lab results
– MRI images
• Centralization of data needed
– Aggregation of all the data in one repository
• Analytics
58. Population Management for At-Risk Demographics
• Certain populations are known to be prone to certain diseases.
• By analyzing data, insurers can help people take preventative measures
– e.g. reminding them to get regularly scheduled colonoscopies
• This helps insurers reduce costs and expand margins
59. Lab Data Management and Analytics
• Strain on traditional technological systems:
– Rise in the number of tests conducted
– Rise in the variety of data collected
– Lack of flexibility
• With MongoDB’s flexible data model, providers of lab testing, genomics and clinical pathology can:
– Ingest, store and analyze a variety of data types coming from numerous sources, all in a single data store
• This enables these companies to generate new insights and revenue streams
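A small sketch of what "a variety of data types in a single data store" can look like: results from different sources share a few fields and otherwise differ in shape, yet can be analyzed together. Every field name, source name and value below is a made-up placeholder, and the array stands in for a single collection.

```javascript
// Heterogeneous results held side by side (all values are placeholders).
const labResults = [
  { patient_id: "p1", source: "blood_panel", test: "glucose", value: 90, unit: "mg/dL" },
  { patient_id: "p1", source: "genomics", test: "variant_call", gene: "GENE_X", variants: ["variant_1"] },
  { patient_id: "p2", source: "blood_panel", test: "glucose", value: 110, unit: "mg/dL" },
  { patient_id: "p2", source: "pathology", test: "biopsy", report: "free-text report goes here" },
];

// Count results per source, regardless of each document's extra fields.
function countBySource(results) {
  const counts = {};
  for (const r of results) {
    counts[r.source] = (counts[r.source] || 0) + 1;
  }
  return counts;
}

// Average a numeric test across only the documents that carry a numeric value.
function averageOf(results, testName) {
  const values = results
    .filter((r) => r.test === testName && typeof r.value === "number")
    .map((r) => r.value);
  if (values.length === 0) return null;
  return values.reduce((a, b) => a + b, 0) / values.length;
}

console.log(countBySource(labResults));
console.log(averageOf(labResults, "glucose"));
```

No schema change is needed to add a new source: documents with new fields can be stored alongside the existing ones.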
60. Other use cases for MongoDB in healthcare include:
• Fraud Detection
• Remote Monitoring and Body Area Networks
• Mobile Apps for Doctors and Nurses
• Pandemic Detection with Real-Time Geospatial Analytics
• Electronic Healthcare Records (EHR)
• Advanced Auditing Systems for Compliance
• Hospital Equipment Management and Optimization
Dotted line is the natural boundary of what is possible today. E.g., ORCL lives far out on the right and does things NoSQL vendors will never do. These things come at the expense of some degree of scale and performance.
NoSQL was born out of wanting greater scalability and performance, but we think they overreacted by giving up some things. E.g., caching layers give up many things; key-value stores are super fast, but give up a rich data model and a rich query model.
MongoDB tries to give up some features of a relational database (joins, complex transactions) to enable greater scalability and performance. You get most of the functionality (80%) with much better scalability and performance. Start with an RDBMS and ask what we could do to scale: take out complex transactions and joins. How? Change the data model. >> Segue to data model section.
May need to revise the graphic: either remove the line or all points should be on the line.
To enable horizontal scalability, reduce coordination between nodes (joins and transactions). Traditionally in an RDBMS you would denormalize the data or tell the system more about how data relates to one another. Another, more intuitive way is to use a document data model. It is more intuitive because it is closer to the way we develop applications today with object-oriented languages like Java, .NET, Ruby, Node.js, etc. The document data model is a good segue to the next section >> Data Model
Here we have greatly reduced the relational data model for this application to two tables. In reality no database has two tables; it is much more common to have hundreds or thousands. As a developer, where do you begin when you have a complex data model? If you’re building an app you’re really thinking about just a handful of common things, like products, and these can be represented in a document much more easily than in a complex relational model, where the data is broken up in a way that doesn’t reflect the way you think about the data or write an application.
Segue – Rich queries, text search, geospatial, aggregation, mapreduce are types of things you can build based on the richness of the query model. More on that in just a moment.