For developers new to MongoDB and Node.js, however, some of the common design patterns are very different from those of an RDBMS and traditional synchronous languages. Developers learning these technologies together may find them a bit bewildering. In reality, however, these tools fit perfectly together and enable a high degree of developer productivity and application performance.
This webinar will walk developers through common MongoDB development patterns in Node.js, such as efficiently loading data into MongoDB using MongoDB's bulk API, iterating through query results, and managing simultaneous asynchronous MongoDB queries to provide the best possible application performance. Working Node.js and MongoDB examples will be used throughout the presentation.
6. 6
Goal Today
• Help you get started with MongoDB and Node.JS
• Assumption:
– New to both technologies
– Programming Experience
– Database Experience
• Learn from my newbie confusion
7. 7
Agenda
1. Why Node.JS and MongoDB?
2. Find and Insert
3. Node.JS async, event queue, flow control
4. Controlling multiple threads
– Bulk insert
10. 10
JavaScript and “JSON” throughout
Browser
JavaScript
Node.JS
JavaScript
JSON
BSON (JSON)
BSON
JSON
11. 11
Documents are Rich Data Structures
{
first_name: 'Paul',
surname: 'Miller',
cell: '+447557505611',
city: 'London',
location: [45.123,47.232],
Profession: ['banking', 'finance', 'trader'],
cars: [
{ model: 'Bentley',
year: 1973,
value: 100000, … },
{ model: 'Rolls Royce',
year: 1965,
value: 330000, … }
]
}
Fields can contain an array of sub-documents
Fields
Typed field values
Fields can contain arrays
12. Do More With Your Data
MongoDB
{
first_name: 'Paul',
surname: 'Miller',
city: 'London',
location: [45.123,47.232],
cars: [
{ model: 'Bentley',
year: 1973,
value: 100000, … },
{ model: 'Rolls Royce',
year: 1965,
value: 330000, … }
]
}
Rich Queries
Find Paul’s cars
Find everybody in London with a car
built between 1970 and 1980
Geospatial
Find all of the car owners within 5km
of Trafalgar Sq.
Text Search
Find all the cars described as having
leather seats
Aggregation
Calculate the average value of Paul’s
car collection
Map Reduce
What is the ownership pattern of
colors by geography over time?
(is purple trending up in China?)
15. 15
Example: Find Query
• Find a flight status entry for United Airlines Flight 7549
– db.data.findOne({"callsign" : "UA7549"})
• Find an aircraft flying at less than 5000 feet
– db.data.findOne({"events.a" : {$lt : 5000}})
• Set value of note field
– db.data.update({"callsign" : "OY1949"},
{$set : {"note" : "spoke with captain"}})
17. 17
The synchronous way
var MongoClient = require('mongodb').MongoClient;
var db = MongoClient.connect('mongodb://localhost:27017/adsb');
var col = db.collection('data');
var doc = col.findOne({"callsign" : "UA7549"});
console.log("Here is my doc: %j", doc);
db.close();
19. 19
It works this way in the mongo shell???
var col = db.getCollection("data");
var doc = col.findOne({"callsign" : "HR9368"});
printjson(doc);
23. 23
Callbacks
col.findOne({"callsign" : "UA7549"}, function (err, doc) {
assert.equal(null, err);
console.log("Here is my doc: %j", doc);
});
console.log("All done!");
• Execute findOne
• When it is done, call the callback function
• Callback function takes two arguments
– err – contains the error message or null
– doc – the result of the findOne call
• “All done!” will be printed before “Here is my doc…”
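The ordering described in the bullets can be reproduced without a database. This is a minimal sketch, assuming a hypothetical fakeFindOne function that stands in for col.findOne and finishes on a later turn of the event loop; it is not part of the MongoDB driver.

```javascript
// fakeFindOne is a hypothetical stand-in for col.findOne.
// It calls back on a later turn of the event loop, as a real query would.
function fakeFindOne(query, callback) {
  setImmediate(function () {
    callback(null, { callsign: query.callsign });
  });
}

var order = []; // records the sequence in which the two messages are produced

fakeFindOne({ callsign: "UA7549" }, function (err, doc) {
  order.push("Here is my doc");
  console.log("Here is my doc: %j", doc);
});

// This line runs first: the callback above has only been registered so far.
order.push("All done!");
console.log("All done!");
```

The order array is only there to make the sequence easy to inspect; the two console.log calls are the same as on the slide.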
36. 36
MongoDB Asynchronous Queries
var MongoClient = require('mongodb').MongoClient,
assert = require('assert');
MongoClient.connect('mongodb://localhost:27017/adsb', function (err, db) {
assert.equal(null, err);
var col = db.collection('data');
col.findOne({"callsign" : "UA7549"}, function (err, doc) {
assert.equal(null, err);
console.log("Here is my doc: %j", doc);
db.close();
});
});
39. 39
This gets ugly fast
var MongoClient = require('mongodb').MongoClient,
assert = require('assert');
MongoClient.connect('mongodb://localhost:27017/adsb', function (err, db) {
assert.equal(null, err);
var col = db.collection('data');
col.findOne({"callsign" : "UA7549"}, function (err, doc) {
assert.equal(null, err);
console.log("Here is my doc: %j", doc);
col.updateOne({"callsign" : "UA7549"}, {$set : {"note" : "Spoke with the pilot"}}, {}, function(err, result) {
assert.equal(null, err);
console.log("Note updated");
db.close();
});
});
});
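One common way to keep this nesting from growing is to wrap callback-style functions in Promises and use async/await. The sketch below assumes hypothetical in-memory stand-ins (findOneCb, updateOneCb, store) for the collection methods so it runs without a database. Many versions of the official driver also return a Promise when no callback is passed, but check the documentation for the version you use.

```javascript
var util = require('util');

// Hypothetical in-memory stand-ins for col.findOne and col.updateOne,
// using the same (err, result) callback convention as the slide.
var store = { UA7549: { callsign: "UA7549" } };

function findOneCb(query, callback) {
  setImmediate(function () {
    callback(null, store[query.callsign] || null);
  });
}

function updateOneCb(query, update, callback) {
  setImmediate(function () {
    var doc = store[query.callsign];
    if (!doc) { return callback(null, { matchedCount: 0 }); }
    Object.assign(doc, update.$set);
    callback(null, { matchedCount: 1 });
  });
}

// util.promisify turns an (err, result) callback API into one that returns a Promise.
var findOne = util.promisify(findOneCb);
var updateOne = util.promisify(updateOneCb);

// Same find-then-update sequence as the slide, with no nesting.
async function findThenUpdate(callsign) {
  var doc = await findOne({ callsign: callsign });
  console.log("Here is my doc: %j", doc);
  var result = await updateOne({ callsign: callsign },
                               { $set: { note: "Spoke with the pilot" } });
  console.log("Note updated");
  return result;
}

var updateDone = findThenUpdate("UA7549");
updateDone.catch(function (err) { console.error("Failed:", err); });
```

Errors from either step reject the returned Promise, so one catch handler replaces the repeated assert calls.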
47. 47
Find Many - Cursor
• This works in the mongo shell
var col = db.getCollection("data");
var cursor = col.find({"events.a" : {$gt : 5000}});
while (cursor.hasNext()) {
printjson(cursor.next());
}
• It does not work in Node.JS
• The MongoDB driver retrieves documents in batches from MongoDB
– Retrieving a new batch is asynchronous
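Because fetching a batch is asynchronous, a hasNext()/next() loop in Node.js has to wait for each call to finish. The sketch below illustrates the idea with a hypothetical FakeCursor whose hasNext() and next() return Promises; it only imitates batch fetching and is not the driver's cursor implementation.

```javascript
// FakeCursor is a hypothetical cursor that serves documents from batches
// which are "fetched" asynchronously, imitating a network round trip.
function FakeCursor(docs, batchSize) {
  this.remaining = docs.slice(); // documents still on the "server"
  this.batch = [];               // documents already fetched
  this.batchSize = batchSize;
}

FakeCursor.prototype.fetchBatch = function () {
  var self = this;
  return new Promise(function (resolve) {
    setImmediate(function () {
      self.batch = self.remaining.splice(0, self.batchSize);
      resolve();
    });
  });
};

// Resolves to true if a document is available, fetching a batch if needed.
FakeCursor.prototype.hasNext = async function () {
  if (this.batch.length === 0 && this.remaining.length > 0) {
    await this.fetchBatch();
  }
  return this.batch.length > 0;
};

// Resolves to the next document, or null when the cursor is exhausted.
FakeCursor.prototype.next = async function () {
  return (await this.hasNext()) ? this.batch.shift() : null;
};

// The shell-style loop, with await added at each asynchronous step.
async function printAll(cursor) {
  var seen = [];
  while (await cursor.hasNext()) {
    var doc = await cursor.next();
    console.log("Doc: %j", doc);
    seen.push(doc);
  }
  return seen;
}

var allSeen = printAll(new FakeCursor([{ a: 6000 }, { a: 7000 }, { a: 8000 }], 2));
```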
49. 49
Find Many - Streams
MongoClient.connect("mongodb://localhost:27017/adsb", function (err, db) {
var col = db.collection('data');
var stream = col.find({"events.a" : {$gt : 5000}}).stream();
stream.on('data', function(doc) {
console.log("Doc: %j", doc);
});
stream.on('error', function (err) {
console.log("Query failed: %s", err);
});
stream.on('end', function() {
console.log("All data retrieved.");
db.close();
});
});
• The 'data' callback is invoked for each document
52. 52
What if I have to insert 100M documents?
MongoClient.connect('mongodb://localhost:27017/adsb', function (err, db) {
assert.equal(null, err);
var col = db.collection('data');
for (var i = 1; i <= 100000000; i++) {
col.insert({x: i, y: 2, z: 3},
{},
function (err, result) {
assert.equal(null, err);
console.log("Insert Complete");
});
}
});
Let’s insert all 100,000,000
in parallel!!!!
53. 53
What if I have to insert 100M documents?
MongoClient.connect('mongodb://localhost:27017/adsb', function (err, db) {
assert.equal(null, err);
var col = db.collection('data');
for (var i = 1; i <= 100000000; i++) {
col.insert({x: i, y: 2, z: 3},
{},
function (err, result) {
assert.equal(null, err);
console.log("Insert Complete");
db.close();
});
}
});
FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory
54. 54
Event Queue/Call Stack
for (i = 1; i <= 100000000; i++) {
col.insert({x: i, y: 2, z: 3},
{},
function (err, result) {
assert.equal(null, err);
console.log("Insert C…");
});
}
Diagram: Call Stack, Driver API, Callback Queue, Event Loop
55. 55
Event Queue/Call Stack
for (var i = 1; i <= 100000000; i++) {
col.insert({x: i, y: 2, z: 3},
{},
function (err, result) {
assert.equal(null, err);
console.log("Insert C…");
db.close();
});
}
Diagram: Call Stack, Driver API, Callback Queue, Event Loop (the diagram shows repeated col.insert() entries)
FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory
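The reason the heap fills up is that the for loop runs to completion before any callback gets a chance to execute, so every pending insert has to be held in memory at once. A small sketch, assuming a hypothetical fakeInsert in place of col.insert and a much smaller count, makes this visible.

```javascript
var N = 100000;      // far fewer than 100M, enough to show the effect
var pending = 0;     // operations started but not yet called back
var maxPending = 0;  // highest value pending ever reached
var completed = 0;   // callbacks that have run

// fakeInsert is a hypothetical stand-in for col.insert.
function fakeInsert(doc, callback) {
  pending++;
  if (pending > maxPending) { maxPending = pending; }
  setImmediate(function () {
    pending--;
    callback(null, { ok: 1 });
  });
}

for (var i = 1; i <= N; i++) {
  fakeInsert({ x: i, y: 2, z: 3 }, function (err, result) {
    completed++;
  });
}

// The loop has finished, yet no callback has run: all N operations are pending.
console.log("Started:", maxPending, "Completed so far:", completed);
```

With a real driver each pending operation also carries its document and callback, which is why 100M of them exhaust the heap.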
57. 57
Let’s try 5 at a time
var DOCS_TO_INSERT = 1000;
var numInsertsRunning = 0; // number of insert threads
var insertCount = 0; // number of documents inserted so far
MongoClient.connect('mongodb://localhost:27017/adsb', function (err, db) {
assert.equal(null, err);
var col = db.collection('data');
for (var i = 0; i < 5; i++) {
++numInsertsRunning;
insertDocument(db, col, ++insertCount, i, function (err, result) {
console.log("All ", DOCS_TO_INSERT, " documents inserted.");
db.close();
});
}
});
59. 59
InsertDocument Callback Logic
col.insert({x: ++insertCount, y: 2, z: 3}, function (err, result) {
// callback logic, as described below
});
• Have all the documents been inserted?
– No: call insertDocument again
– Yes: check whether other inserts are still running
• Are other inserts still running?
– Yes: do nothing and decrement the running thread count
– No: all inserts are done, so call the original callback
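The callback logic can be sketched as follows. This is one possible reading of the flowchart, not the presenter's actual insertDocument. It assumes a hypothetical fakeCol in place of a real collection, a simplified insertDocument(col, done) signature, and DOCS_TO_INSERT of at least MAX_RUNNING. The "threads" here are in-flight asynchronous inserts, not operating system threads.

```javascript
var DOCS_TO_INSERT = 1000;
var MAX_RUNNING = 5;        // how many inserts are kept in flight
var numInsertsRunning = 0;  // number of insert chains still active
var insertCount = 0;        // number of inserts started so far

// fakeCol is a hypothetical stand-in for a MongoDB collection.
var fakeCol = {
  docs: [],
  insert: function (doc, callback) {
    var self = this;
    setImmediate(function () {
      self.docs.push(doc);
      callback(null, { ok: 1 });
    });
  }
};

function insertDocument(col, done) {
  col.insert({ x: ++insertCount, y: 2, z: 3 }, function (err, result) {
    // A fuller version would guard against calling done more than once on errors.
    if (err) { return done(err); }
    if (insertCount < DOCS_TO_INSERT) {
      // Not all documents have been started yet: keep this chain going.
      insertDocument(col, done);
    } else if (--numInsertsRunning === 0) {
      // This chain is finished and no others are running: everything is inserted.
      done(null);
    }
    // Otherwise other chains are still running, so there is nothing more to do.
  });
}

var finished = new Promise(function (resolve, reject) {
  for (var i = 0; i < MAX_RUNNING; i++) {
    ++numInsertsRunning;
    insertDocument(fakeCol, function (err) {
      if (err) { return reject(err); }
      console.log("All", DOCS_TO_INSERT, "documents inserted.");
      resolve(fakeCol.docs.length);
    });
  }
});
```

The Promise wrapper is only a convenience for observing completion; the original slide closes the database in the final callback instead.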
60. 60
Bulk Inserts
• The previous example was for illustrative purposes
• MongoDB provides a bulk write API that offers better performance for bulk writes
• The bulk write API batches up writes
– Batch writes with a single acknowledgement
• Use collection.bulkWrite
• Improve performance using multiple bulkWrite threads
– Previous example will be identical
– Replace collection.insert with collection.bulkWrite
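As a rough illustration of batching, the sketch below groups documents into arrays of insertOne operations, which is the operation shape collection.bulkWrite accepts. The helper name toBatches and the batch size of 1000 are assumptions made for this example, and the bulkWrite call itself is shown only in a comment so the code runs without a database.

```javascript
// toBatches is a hypothetical helper: it splits docs into groups of at most
// batchSize and wraps each document as an insertOne bulk operation.
function toBatches(docs, batchSize) {
  var batches = [];
  for (var i = 0; i < docs.length; i += batchSize) {
    var ops = docs.slice(i, i + batchSize).map(function (doc) {
      return { insertOne: { document: doc } };
    });
    batches.push(ops);
  }
  return batches;
}

// Build some sample documents in the same shape as the earlier insert example.
var docs = [];
for (var x = 1; x <= 2500; x++) {
  docs.push({ x: x, y: 2, z: 3 });
}

var batches = toBatches(docs, 1000);
console.log("Batches to send:", batches.length);

// With a real collection, each batch would be sent in one call, for example:
//   col.bulkWrite(batch, { ordered: false }, function (err, result) { ... });
// in the callback style used in this deck; some driver versions return a
// Promise instead, so check the documentation for the version in use.
```

Each batch can then be fed through the same limited-concurrency pattern as the single inserts, one bulkWrite per batch.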
61. 61
Another word of caution
• All my examples established a MongoDB connection and then closed it
– This was for illustrative purposes
• Don’t continuously open and close MongoDB connections.
• Open a connection once
– Use that connection through the life of the program
– Close it at the end
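One way to follow this advice is to keep a single shared connection Promise and hand it to every caller. The sketch below assumes a hypothetical fakeConnect in place of MongoClient.connect so the connect count can be observed without a server; getDb is likewise an illustrative name.

```javascript
var connectCalls = 0; // how many times a connection was actually opened

// fakeConnect is a hypothetical stand-in for MongoClient.connect.
function fakeConnect(url) {
  connectCalls++;
  return Promise.resolve({
    url: url,
    close: function () { return Promise.resolve(); }
  });
}

var dbPromise = null; // shared by every caller for the life of the program

// Connects on the first call only; later calls reuse the same Promise.
function getDb() {
  if (dbPromise === null) {
    dbPromise = fakeConnect('mongodb://localhost:27017/adsb');
  }
  return dbPromise;
}

// Three parts of the program ask for the database; only one connection is opened.
var shutdown = Promise.all([getDb(), getDb(), getDb()]).then(function (dbs) {
  console.log("Connections opened:", connectCalls);
  return dbs[0].close(); // close once, at the end of the program
});
```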
62. 62
Summary
• Asynchronous vs synchronous programming
• Call stack, event loop, driver API
• Flow control
• Find, insert, update examples
• Managing multiple parallel threads
– bulk insert example
• Learn from my mistakes and misconceptions