Matt Kalan, Senior Solutions Architect, MongoDB
Matt will explain how modern application demands have changed what we require of the database. Handling agile development, big data, cloud, APIs, continuous availability, and unlimited scale while lowering costs calls for new capabilities. Do you need to tolerate the impedance mismatch between an object model and the relational model, or is there another way? We will walk through the application development process, down to the code level, comparing an RDBMS with MongoDB.
16. What If?
Instead of...                                   You had...
Pre-defined schema                              Dynamic schema determined by your object
Flat data model                                 Object data model
One schema                                      Multiple schemas possible
Each object spread across flat tables           Each object stored together
Scaling up for better performance               Easy to partition & scale horizontally
SAN required and app handling failover          DB & driver handle auto-failover
Manual DB operations                            Built-in automated DB operations
Large up-front license and add-ons              Freemium model
  (replication, partitioning, caching)
17. Dynamic, Object Data Model Stored Together
MongoDB:
{ customer_id : 1,
  first_name : "Mark",
  last_name : "Smith",
  city : "San Francisco",
  phones : [
    { number : "1-212-777-1212",
      dnc : true,
      type : "home" },
    { number : "1-212-777-1213",
      type : "cell" } ]
}
Relational:
Customer ID First Name Last Name City
0 John Doe New York
1 Mark Smith San Francisco
2 Jay Black Newark
3 Meagan White London
4 Edward Daniels Boston
Phone Number Type DNC Customer ID
1-212-555-1212 home T 0
1-212-555-1213 home T 0
1-212-555-1214 cell F 0
1-212-777-1212 home T 1
1-212-777-1213 cell (null) 1
1-212-888-1212 home F 2
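The contrast above can be sketched in plain Java, with no driver types: the nested Map mirrors the MongoDB document stored together, while the flat rows mirror the phones table joined back to the contact by customer_id. The class and method names here are illustrative, not from the talk.

```java
import java.util.*;

// Plain-Java sketch contrasting the two models above: one nested Map
// (the MongoDB document) vs. flat rows joined by a foreign key.
class ShapeComparison {
    // The customer as a single object, stored together.
    public static Map<String, Object> asDocument() {
        Map<String, Object> home = new HashMap<>();
        home.put("number", "1-212-777-1212");
        home.put("dnc", true);
        home.put("type", "home");
        Map<String, Object> cell = new HashMap<>();
        cell.put("number", "1-212-777-1213");
        cell.put("type", "cell");
        Map<String, Object> customer = new HashMap<>();
        customer.put("customer_id", 1);
        customer.put("first_name", "Mark");
        customer.put("last_name", "Smith");
        customer.put("city", "San Francisco");
        customer.put("phones", Arrays.asList(home, cell));
        return customer;
    }

    // The same phones as they would appear in the flat phones table.
    @SuppressWarnings("unchecked")
    public static List<Map<String, Object>> asPhoneRows() {
        List<Map<String, Object>> rows = new ArrayList<>();
        for (Object p : (List<?>) asDocument().get("phones")) {
            Map<String, Object> row = new HashMap<>((Map<String, Object>) p);
            row.put("customer_id", 1); // foreign key back to the contact row
            rows.add(row);
        }
        return rows;
    }
}
```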
18. HA & Scaling Out Built-in
[Diagram: Application → Driver → mongos query router → three shards, each a replica set of one
primary and two secondaries, holding customer ranges 1-1000, 1001-1700, 1701-2500, ...]
High availability: replica sets
Horizontal scalability: sharding
mongos is a query router, so data can be auto-balanced in the background.
19. MongoDB Enterprise Server, Ops Manager, Compass & Connector for BI
MongoDB Enterprise Server
- Commercial license (no AGPL copyleft restrictions)
- Platform certifications
- LDAP & Kerberos, auditing, FIPS 140-2, encryption at rest
MongoDB Ops Manager (automation & productionizing)
- Monitoring & alerting
- Query optimization
- Backup & recovery
- Automation & configuration
- REST API
MongoDB Compass
- Schema visualization
- Data exploration
- Ad-hoc queries
MongoDB Connector for BI
- Visualization
- Analysis
- Reporting
Support & services
- 24x7 support (1-hour SLA)
- Emergency patches
- Customer success program
- On-demand online training
- Warranty, limitation of liability, indemnification
20. Aligns with Microservices Design Patterns
[Diagram: an API layer (microservices, SQL reads, Spark) serves the user app, BI users, and data
scientists. Customer Info services 1..M, MongoDB BI Connectors 1..N, and Spark Connectors 1..Y all
go through mongos query routers to CustInfo shards 1..X, replicated across DC1, DC2, and DC3.]
MongoDB Ops Manager
- Monitors
- Backups/restores
- Automates management
- REST API for container orchestration integration
23. Implementation Phase Example (with Real Code!)
Let's compare and contrast RDBMS/SQL to MongoDB development using Java over the course of a few
weeks.
Some ground rules:
1. Observe the rules of Software Engineering 101: assume separation of the application from a data
   access layer (DAL)
2. The DAL must be able to
   a. Expose simple, functional, data-only interfaces to the application
      - No ORM, frameworks, compile-time bindings, or special tools
   b. Exploit high-performance features of the persistor
3. Focus on core data-handling code and avoid distractions that require the same amount of work in
   both technologies
   a. No exception or error handling
   b. Leave out DB connection and other setup resources
4. Day counts are a proxy for progress, not the actual time to complete the indicated task
24. The Task: Saving and Fetching Contact Data
Start with this simple, flat shape in the Data Access Layer:
  Map m = new HashMap();
  m.put("name", "matt");
  m.put("id", "K1");
And assume we save it in this way:
  save(Map m)
And assume we fetch one by primary key in this way:
  Map m = fetch(String id)
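Before either persistor exists, the save/fetch contract above can be exercised with a minimal in-memory stand-in. This class is not from the talk; it is a sketch of the DAL interface backed by a HashMap, with "id" playing the role of the primary key.

```java
import java.util.*;

// Minimal in-memory stand-in for the DAL contract: save(Map) / fetch(String).
// Useful for unit-testing callers before SQL or MongoDB is wired in.
class InMemoryDAL {
    private final Map<String, Map<String, Object>> store = new HashMap<>();

    public void save(Map<String, Object> m) {
        // Copy so later mutations by the caller don't leak into the "DB".
        store.put((String) m.get("id"), new HashMap<>(m));
    }

    public Map<String, Object> fetch(String id) {
        Map<String, Object> m = store.get(id);
        return m == null ? null : new HashMap<>(m);
    }
}
```

Swapping this out for the SQL or MongoDB implementations below should require no changes in the application, which is exactly the point of rule 1.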
25. Day 1: Initial Efforts for Both Technologies
SQL
DDL: create table contact ( ... )
init()
{
  contactInsertStmt = connection.prepareStatement
    ("insert into contact ( id, name ) values ( ?,? )");
  fetchStmt = connection.prepareStatement
    ("select id, name from contact where id = ?");
}
save(Map m)
{
  contactInsertStmt.setString(1, m.get("id"));
  contactInsertStmt.setString(2, m.get("name"));
  contactInsertStmt.execute();
}
Map fetch(String id)
{
  Map m = null;
  fetchStmt.setString(1, id);
  rs = fetchStmt.executeQuery();
  if(rs.next()) {
    m = new HashMap();
    m.put("id", rs.getString(1));
    m.put("name", rs.getString(2));
  }
  return m;
}
MongoDB
DDL: none
save(Map m)
{
  collection.insertOne(new Document(m));
}
Map fetch(String id)
{
  Map m = null;
  Document doc = new Document();
  doc.put("id", id);
  c = collection.find(doc).iterator();
  if(c.hasNext()) {
    m = (Map) c.next();
  }
  return m;
}
26. Day 2: Add simple fields
m.put("name", "matt");
m.put("id", "K1");
m.put("title", "Mr.");
m.put("hireDate", new Date(2011, 11, 1));
• Capturing title and hireDate is part of adding a new business feature
• It was pretty easy to add two fields to the structure
• …but now we have to change our persistence code
27. SQL Day 2 (Changes in Bold)
DDL: alter table contact add title varchar(8);
     alter table contact add hireDate date;
init()
{
  contactInsertStmt = connection.prepareStatement
    ("insert into contact ( id, name, title, hiredate ) values ( ?,?,?,? )");
  fetchStmt = connection.prepareStatement
    ("select id, name, title, hiredate from contact where id = ?");
}
save(Map m)
{
  contactInsertStmt.setString(1, m.get("id"));
  contactInsertStmt.setString(2, m.get("name"));
  contactInsertStmt.setString(3, m.get("title"));
  contactInsertStmt.setDate(4, m.get("hireDate"));
  contactInsertStmt.execute();
}
Map fetch(String id)
{
  Map m = null;
  fetchStmt.setString(1, id);
  rs = fetchStmt.executeQuery();
  if(rs.next()) {
    m = new HashMap();
    m.put("id", rs.getString(1));
    m.put("name", rs.getString(2));
    m.put("title", rs.getString(3));
    m.put("hireDate", rs.getDate(4));
  }
  return m;
}
Consequences:
1. Code release schedule linked to database upgrade (new code cannot run on old schema)
2. Issues with case sensitivity starting to creep in (many RDBMSs are case insensitive for column
   names, but code is case sensitive)
3. Changes require careful mods in 4 places
4. Beginning of technical debt
28. MongoDB Day 2
save(Map m)
{
  collection.insertOne(new Document(m));
}
Map fetch(String id)
{
  Map m = null;
  Document doc = new Document();
  doc.put("id", id);
  c = collection.find(doc).iterator();
  if(c.hasNext()) {
    m = (Map) c.next();
  }
  return m;
}
Advantages:
1. Zero time and money spent on overhead code
2. Code and database not physically linked
3. New material with more fields can be added into existing collections; backfill is optional
4. Names of fields in the database precisely match key names in the code layer, matched directly
   by name rather than indirectly via positional offset
5. No technical debt is created
✔ NO CHANGE
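Advantage 3 can be sketched in plain Java, with a List standing in for a MongoDB collection: documents saved before and after Day 2 coexist, and readers treat the new field as optional rather than requiring a backfill. The names and default value here are illustrative.

```java
import java.util.*;

// Sketch of mixed document shapes in one collection: an old document
// (pre-Day 2, no title) next to a new one, with a reader that tolerates both.
class MixedShapes {
    public static List<Map<String, Object>> collection() {
        Map<String, Object> oldDoc = new HashMap<>();   // saved on Day 1
        oldDoc.put("id", "K1");
        oldDoc.put("name", "matt");
        Map<String, Object> newDoc = new HashMap<>();   // saved on Day 2
        newDoc.put("id", "K2");
        newDoc.put("name", "mark");
        newDoc.put("title", "Mr.");
        return Arrays.asList(oldDoc, newDoc);
    }

    // Reader copes with either shape: a missing title falls back to a default.
    public static String titleOf(Map<String, Object> doc) {
        return (String) doc.getOrDefault("title", "(none)");
    }
}
```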
30. Day 3: With RDBMS
DDL: create table phones ( ... )
init()
{
  contactInsertStmt = connection.prepareStatement
    ("insert into contact ( id, name, title, hiredate ) values ( ?,?,?,? )");
  c2stmt = connection.prepareStatement
    ("insert into phones (id, type, number) values (?, ?, ?)");
  fetchStmt = connection.prepareStatement
    ("select id, name, title, hiredate, type, number from contact, phones
      where phones.id = contact.id and contact.id = ?");
}
save(Map m)
{
  startTrans();
  contactInsertStmt.setString(1, m.get("id"));
  contactInsertStmt.setString(2, m.get("name"));
  contactInsertStmt.setString(3, m.get("title"));
  contactInsertStmt.setDate(4, m.get("hireDate"));
  for(Map onePhone : (List<Map>) m.get("phones")) {
    c2stmt.setString(1, m.get("id"));
    c2stmt.setString(2, onePhone.get("type"));
    c2stmt.setString(3, onePhone.get("number"));
    c2stmt.execute();
  }
  contactInsertStmt.execute();
  endTrans();
}
Map fetch(String id)
{
  Map m = null;
  fetchStmt.setString(1, id);
  rs = fetchStmt.executeQuery();
  int i = 0;
  List list = new ArrayList();
  while (rs.next()) {
    if(i == 0) {
      m = new HashMap();
      m.put("id", rs.getString(1));
      m.put("name", rs.getString(2));
      m.put("title", rs.getString(3));
      m.put("hireDate", rs.getDate(4));
      m.put("phones", list);
    }
    Map onePhone = new HashMap();
    onePhone.put("type", rs.getString(5));
    onePhone.put("number", rs.getString(6));
    list.add(onePhone);
    i++;
  }
  return m;
}
This takes time and money
31. Day 3: With MongoDB
save(Map m)
{
  collection.insertOne(new Document(m));
}
Map fetch(String id)
{
  Map m = null;
  Document doc = new Document();
  doc.put("id", id);
  c = collection.find(doc).iterator();
  if(c.hasNext()) {
    m = (Map) c.next();
  }
  return m;
}
Advantages:
1. Almost zero time and money spent on overhead code
2. No need to fear fields that are "naturally occurring" lists containing data specific to the
   parent structure, which do not benefit from normalization and referential integrity
✔ NO CHANGE
32. By Day 14, Our Structure Looks Like This:
m.put("name", "name");
m.put("id", "K1");
//...
n4.put("startupApps", new String[] { "app1", "app2", "app3" });
n4.put("geo", "US-EAST");
list2.add(n4);
n5.put("startupApps", new String[] { "app6" });
n5.put("geo", "EMEA");
n5.put("useLocalNumberFormats", false);
list2.add(n5);
m.put("preferences", list2);
n6.put("optOut", true);
n6.put("assertDate", someDate);
seclist.add(n6);
m.put("attestations", seclist);
m.put("security", anotherMapOfData);
• It was still pretty easy to add this data to the structure
• Want to guess what the SQL persistence code looks like?
• How about the MongoDB persistence code?
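The Day-14 shape from the slide can be built end to end in plain Java, using a fresh map for each list entry (reusing one map for both preference entries would alias them). The date placeholder and helper class name are illustrative; the values follow the slide, and the point is that the persistor still sees just one Map.

```java
import java.util.*;

// The Day-14 structure built explicitly: nested lists of maps, arrays of
// strings, all hanging off one top-level Map the DAL can save unchanged.
class Day14Shape {
    public static Map<String, Object> build() {
        Map<String, Object> m = new HashMap<>();
        m.put("name", "name");
        m.put("id", "K1");

        List<Map<String, Object>> prefs = new ArrayList<>();
        Map<String, Object> us = new HashMap<>();
        us.put("startupApps", new String[] { "app1", "app2", "app3" });
        us.put("geo", "US-EAST");
        prefs.add(us);
        Map<String, Object> emea = new HashMap<>();        // fresh map, not a reuse
        emea.put("startupApps", new String[] { "app6" });
        emea.put("geo", "EMEA");
        emea.put("useLocalNumberFormats", false);
        prefs.add(emea);
        m.put("preferences", prefs);

        List<Map<String, Object>> attestations = new ArrayList<>();
        Map<String, Object> optOut = new HashMap<>();
        optOut.put("optOut", true);
        optOut.put("assertDate", new Date(0));             // placeholder for someDate
        attestations.add(optOut);
        m.put("attestations", attestations);
        return m;
    }
}
```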
34. MongoDB Day 14 – and every other day
save(Map m)
{
  collection.insertOne(new Document(m));
}
Map fetch(String id)
{
  Map m = null;
  Document doc = new Document();
  doc.put("id", id);
  c = collection.find(doc).iterator();
  if(c.hasNext()) {
    m = (Map) c.next();
  }
  return m;
}
Advantages:
1. Zero time and money spent on overhead code
2. Persistence is so easy, flexible, and backward compatible that the persistor does not
   upward-influence the shapes we want to persist, i.e. the tail does not wag the dog
✔ NO CHANGE
35. Also Powerful Functionality
Expressive Queries
• Find anyone with phone # "1-212..."
• Check if the person with number "555..." is on the "do not call" list
Geospatial
• Find the best offer for the customer at the geo coordinates of 42nd St. and 6th Ave.
Text Search
• Find all tweets that mention the firm within the last 2 days
Aggregation
• Count and sort number of customers grouped by city
Native Binary JSON Support
• Add an additional phone number to Mark Smith's document without rewriting it
• Select just the mobile phone number in the list
• Sort on the modified date
Left Outer Join ($lookup)
• Query for all San Francisco residences, look up their transactions, and sum the amount by person
Graph Queries ($graphLookup)
• Query for all people within 3 degrees of separation from Mark

{ customer_id : 1,
  first_name : "Mark",
  last_name : "Smith",
  city : "San Francisco",
  phones : [
    { number : "1-212-777-1212",
      dnc : true,
      type : "home" },
    { number : "1-212-777-1213",
      type : "cell" } ]
}
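Two of the operations above can be sketched as the filter and update documents the driver would send. In real code these would be org.bson.Document instances passed to find() and updateOne(); here they are plain nested Maps so the shapes stand alone, and the class and method names are illustrative.

```java
import java.util.*;

// Query and update documents mirroring the BSON for two examples above:
// a regex match inside the phones array, and a $push that appends a phone
// without rewriting the whole document.
class QuerySketches {
    // "Find anyone with phone # 1-212...": { "phones.number": { $regex: "^1-212" } }
    public static Map<String, Object> phonePrefixFilter() {
        Map<String, Object> regex = new HashMap<>();
        regex.put("$regex", "^1-212");
        Map<String, Object> filter = new HashMap<>();
        filter.put("phones.number", regex);
        return filter;
    }

    // "Add an additional phone number": { $push: { phones: { number, type } } }
    public static Map<String, Object> addPhoneUpdate(String number, String type) {
        Map<String, Object> phone = new HashMap<>();
        phone.put("number", number);
        phone.put("type", type);
        Map<String, Object> push = new HashMap<>();
        push.put("phones", phone);
        Map<String, Object> update = new HashMap<>();
        update.put("$push", push);
        return update;
    }
}
```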
37. Top 15 Global Bank: MongoDB is the DB Standard
Global bank with 48M customers in 50 countries terminates Oracle ULA & makes MongoDB database
of choice.
Problem
- Slow development cycles due to the RDBMS' rigid data model hindering ability to meet business
  demands
- High TCO for hardware, licenses, development, and support (>$50M Oracle ULA)
- Poor overall performance of customer-facing and internal applications
Solution
- Building dozens of apps on MongoDB, both net new and migrations from Oracle (e.g., a significant
  portion of retail banking, including customer-facing and back-office apps, fraud detection, card
  activation, and equity research content management)
- Flexible data model to develop apps quickly and accommodate diverse data
- Ability to scale infrastructure and costs elastically
Results
- Able to cancel Oracle ULA; evaluating which apps can be migrated to MongoDB; for new apps,
  MongoDB is the default choice
- Apps built in weeks instead of months or years, e.g., ebanking app prototyped in 2 weeks and in
  production in 4 weeks
- 70% TCO reduction
38. IoT App Running on MongoDB Atlas
Biotechnology giant uses MongoDB Atlas to allow their customers to track experiments from any
mobile device.
Problem
- Thermo Fisher is developing Thermo Fisher Cloud, one of the largest cloud platforms for the
  scientific community on AWS
- For scientific IoT applications, internal developers need a database that can easily handle a
  wide variety of fast-changing data
- Each experiment produces millions of "rows" of data, which led to suboptimal performance with
  the incumbent database
- Thermo Fisher customers need to be able to slice and dice their data in many different ways
Solution
- MS Instrument Connect allows Thermo Fisher customers to see live experiment results from any
  mobile device or browser
- MongoDB's expressive query language and rich secondary indexes provide the flexibility to
  support both ad-hoc and predefined queries for customers' scientific experiments
- Deployed MongoDB using MongoDB Atlas, a hosted DB service running on Amazon EC2
Results
- Thermo Fisher customers can now obtain real-time insights from mass spectrometry experiments
  from any mobile device or browser; not possible before
- Improved developer productivity, with 40x less code in testing with MongoDB compared to
  incumbent databases
- Improved performance by 6x
- Easy migration process & zero downtime; testing to production in under 2 months
39. ThermoFisher: Inserting data, MongoDB vs. MySQL
• Inserting 1,615 chemical compound records into two parent-child tables.
• To optimize the MySQL query, we turned off foreign keys during insert and
used a string builder to create a bulk insert SQL statement. This improved
insert performance by a factor of 360.
• Compare to MongoDB.
Database               Milliseconds             Lines of code
MySQL not optimized    147,600 (2.5 minutes)    21
MySQL optimized        410                      40
MongoDB                68                       1
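The MySQL optimization described above can be sketched in plain Java: a StringBuilder batches all rows into a single multi-row INSERT instead of issuing one statement per record. The table and column names are illustrative, real code would also escape values or use batched prepared statements, and the MongoDB equivalent is a single collection.insertMany(docs) call.

```java
import java.util.*;

// Sketch of the string-builder bulk INSERT: all rows in one SQL statement.
class BulkInsertSql {
    // Each entry in compounds is a {id, name} pair.
    public static String build(List<String[]> compounds) {
        StringBuilder sql = new StringBuilder(
            "insert into compound (id, name) values ");
        for (int i = 0; i < compounds.size(); i++) {
            String[] row = compounds.get(i);
            if (i > 0) sql.append(", ");
            sql.append("('").append(row[0]).append("','")
               .append(row[1]).append("')");
        }
        return sql.toString();
    }
}
```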
40. For More Information
Resource                        Location
Atlas (MongoDB as a Service)    mongodb.com/cloud/atlas
Case Studies                    mongodb.com/customers
Presentations                   mongodb.com/presentations
Thermo Fisher's Talk            "How Thermo Fisher Is Reducing Mass Spectrometry Experiment Times
                                from Days to Minutes with MongoDB Atlas on AWS"
Free Online Training            education.mongodb.com
Webinars and Events             mongodb.com/events
Documentation                   docs.mongodb.com
MongoDB Downloads               mongodb.com/download
Speaker notes:
The first section is the plan for successful ideas – ideas that failed obviously don't continue.
Ask: do they want any restrictions during prototyping? What is the benefit of data modeling – of a
restrictive schema – before things are at least fairly stable?
What if could eliminate these, shrink them, or break dependencies? Duration and effort would come down
DB activities:
- Prototyping: defining schema (even when prototyping), object-to-relational mapping, changing
  schema, testing 100K inserts (can just use VMs horizontally scaled)
- Biz case: often SAN, as scale-up infrastructure for the DB is a higher portion of costs
  (exponential chart going to $1mm Exadata), % of DBA
- Design: data modeling, SPROCs, OR modeling, schema migration, scaling (show pie chart with half
  being for DB)
- Implementation: persistence code; unit tests fail because they are out of sync with the schema
  (someone else doing that)
- Testing: delays because the shared DB has to be in sync
When you remove these dependencies, there is more flexibility in the critical path and you can add
resources to shorten duration.
Often these are 2 separate people too
Now 3 things have to be in sync and you lose control over performance with the ORM
Now, let’s imagine there’s a new feature, or even just a small change. Let’s say now I need to track the age of the people in my application.
I have to go to my schema, add some tables maybe, add some rows. And some of these operations may require my application to go offline for a while.
Need to use SPROCs because the DB is so slow, and it still doesn't help enough.
This is also because of spreading data across tables – to minimize distributed joins, scaling up
is often preferred with RDBMSs.
Point out there are other NoSQLs that give you some of it but not meant for all use cases (e.g. no secondary indexes)
Built for agility in every direction: data variety, volume, velocity with low TCO and easy management
Can get started with community or Atlas free or paid tier and immediately be ready to productionize and make enterprise-grade
Company: Thermo Fisher
Industry: Science, Biotechnology
Use Case: Real Time Analytics
Products & Services: MongoDB Atlas