3. History
mongoDB = “Humongous DB”
Open-source
Document-based
“High performance, high
availability”
Automatic scaling
-blog.mongodb.org/post/475279604/on-distributed-consistency-part-1
-mongodb.org/manual
http://conteudo.icmc.usp.br/pessoas/junio/Site/index.htm
4. Motivations
Problems with SQL
Rigid schema
Not easily scalable (designed for 90’s
technology or worse)
Requires unintuitive joins (despite its claims, Mongo
does not do any better due to physical constraints)
Perks of mongoDB
Easy interface with common languages (Java,
Javascript, PHP, etc.)
Keeps essential features of RDBMS’s while
learning from key-value noSQL systems
http://www.slideshare.net/spf13/mongodb-9794741?v=qf1&b=&from_search=13
5. Company Using mongoDB
“MongoDB powers Under Armour’s online store, and
was chosen for its dynamic schema, ability to scale
horizontally and perform multi-data center
replication.”
http://www.mongodb.org/about/production-deployments/
7. Data Model
Document-Based (max 16 MB each entry)
Documents are in BSON format, consisting of
field-value pairs
Each document stored in a collection
Collections
Like tables of relational db’s.
Documents do not have to have uniform structure
-docs.mongodb.org/manual/
8. JSON
“JavaScript Object Notation”
Easy for humans to write/read, easy for
computers to parse/generate
Objects can be nested
Built on
name/value pairs
ordered list of values
http://json.org/
11. BSON Example
{
  "_id" : "37010",
  "city" : "ADAMS",
  "pop" : 2660,
  "state" : "TN",
  "congressmen" : ["John", "Willian", "Adolf"],
  "mayor" : {
    "name" : "John Smith",
    "address" : "13 Scenic Way"
  }
}
Embedding and arrays: closer to what we have
in general-purpose programming languages
12. BSON Types
Type                       Number
Double                     1
String                     2
Object                     3
Array                      4
Binary data                5
Object id                  7
Boolean                    8
Date                       9
Null                       10
Regular Expression         11
JavaScript                 13
Symbol                     14
JavaScript (with scope)    15
32-bit integer             16
Timestamp                  17
64-bit integer             18
Min key                    -1 (listed as 255 in older documentation)
Max key                    127
http://docs.mongodb.org/manual/reference/bson-types/
https://docs.mongodb.com/manual/reference/operator/query/type/
13. The _id Field
• By default, each document contains an _id
field. This field has a number of special
characteristics:
– Primary key for collection.
– Value is unique, immutable, and may be any non-
array type.
– Default data type is ObjectId, which is “small,
likely unique, fast to generate, and ordered.”
Sorting on an ObjectId value is roughly equivalent
to sorting on creation time.
http://docs.mongodb.org/manual/reference/bson-types/
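Sorting on an ObjectId approximates sorting on creation time because the first 4 bytes of an ObjectId hold the creation time in seconds since the Unix epoch. A minimal sketch in plain JavaScript (no driver or server needed); the helper name objectIdToDate and the two sample ids are hypothetical, made up for illustration:

```javascript
// Hypothetical helper: read the creation time embedded in an ObjectId.
// Assumes the 24-character hex form; the first 8 hex digits (4 bytes)
// are seconds since the Unix epoch.
function objectIdToDate(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected a 24-character hex string");
  }
  const seconds = parseInt(hex.substring(0, 8), 16);
  return new Date(seconds * 1000);
}

// Made-up sample ids: only the leading timestamp bytes differ.
const older = "5f000000aaaaaaaaaaaaaaaa";
const newer = "5f000001aaaaaaaaaaaaaaaa";

// Sorting the hex strings puts the earlier-created id first.
const sortedIds = [newer, older].sort();
console.log(sortedIds.map((id) => objectIdToDate(id).toISOString()));
```

In the mongo shell, the getTimestamp() method of an ObjectId value exposes the same information.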
14. The _id Field
• Using the default _id:
db.collection.insert({city: "New York", state: "NY", pop: "5M"})
• Using your own _id:
db.collection.insert({_id: 10, city: "New York", state: "NY", pop: "5M"})
• Using your own composite _id (the _id itself is a document):
db.collection.insert({_id: {city: "New York", state: "NY"}, pop: "5M"})
http://docs.mongodb.org/manual/reference/bson-types/
15. mongoDB vs. SQL
mongoDB                    SQL
Document                   Tuple
Collection                 Table/View
PK: _id Field              PK: Any Attribute(s)
Uniformity not Required    Uniform Relation Schema
Index                      Index
Embedded Structure         Joins
Shard                      Partition
CRUD                       DML
17. Getting Started with mongoDB
To install mongoDB, go to this link and click on the
appropriate OS and architecture:
http://www.mongodb.org/downloads
Then extract the files (preferably to the C drive).
Finally, create a data directory on C: for
mongoDB to use,
e.g. “md \data” followed by “md \data\db”
http://docs.mongodb.org/manual/tutorial/install-mongodb-on-windows/
18. Install
Unzip
Find executable “mongod”
Default connection at localhost:27017
Run
mongod --dbpath .
Or just
mongod
For the default dir (/var/lib/mongodb/ or c:\data\db)
Or run from bin dir, and have data anywhere else
mongod --dbpath <any dir path>
Visual interface
https://www.mongodb.com/products/compass
19. CRUD: Using the Shell
To establish a connection to the server, open another command
prompt window and go to the same directory, entering
“mongo.exe”
To check which db you’re using     db
Show all databases                 show dbs
Switch db’s/make a new one         use <name>
See what collections exist         show collections
Create collection                  db.createCollection("<name>")
Note: db’s are not actually created until you insert data!
21. CRUD: Using the Shell (cont.)
To insert documents into a collection/make a
new collection:
db.<collection>.insert(<document>)
<=>
INSERT INTO <table>
VALUES(<attributevalues>);
22. CRUD: Inserting Data
Insert one document
db.<collection>.insert({<field>:<value>})
Inserting a document with a field name new to the collection
is inherently supported by the BSON model.
To insert multiple documents, use an array.
23. CRUD: Querying
Get all docs: db.<collection>.find()
SELECT * FROM <table>;
Returns a cursor; the shell iterates over it automatically
and displays the first 20 results.
Add .limit(<number>) to limit results
db.<collection>.find().limit(2)
Get one doc: db.<collection>.findOne() returns the
first document in natural (on-disk) order, usually the
first one inserted
24. CRUD: Querying
To match a specific value:
db.<collection>.find({<field>:<value>})
“AND”:
db.<collection>.find({<field1>:<value1>,
<field2>:<value2>
})
SELECT *
FROM <table>
WHERE <field1> = <value1> AND <field2> = <value2>;
25. CRUD: Querying
“OR”:
db.<collection>.find({ $or: [
{<field>:<value1>},
{<field>:<value2>} ]
})
SELECT *
FROM <table>
WHERE <field> = <value1> OR <field> = <value2>;
Checking for multiple values of a set:
db.<collection>.find({<field>: {$in: [<value>, <value>]}})
SELECT *
FROM <table>
WHERE <field> IN (<value>,<value>);
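To make the meaning of these filter documents concrete, here is a toy matcher in plain JavaScript. It is only a sketch under simplifying assumptions (equality, $or and $in on top-level fields), not MongoDB's query engine; the function name matches and the sample documents are made up:

```javascript
// Toy matcher: supports only equality, $or and $in on top-level fields.
function matches(doc, filter) {
  return Object.entries(filter).every(([key, cond]) => {
    if (key === "$or") {
      // $or takes an array of sub-filters; at least one must match.
      return cond.some((sub) => matches(doc, sub));
    }
    if (cond !== null && typeof cond === "object" && "$in" in cond) {
      // $in matches when the field equals any of the listed values.
      return cond.$in.includes(doc[key]);
    }
    // Plain value: equality. Several keys in one filter act as AND.
    return doc[key] === cond;
  });
}

// Made-up sample documents.
const zips = [
  { city: "ADAMS", state: "TN", pop: 2660 },
  { city: "AGAWAM", state: "MA", pop: 15338 },
  { city: "ALBANY", state: "NY", pop: 10000 },
];

const andResult = zips.filter((d) => matches(d, { state: "TN", city: "ADAMS" }));
const orResult = zips.filter((d) => matches(d, { $or: [{ state: "TN" }, { state: "NY" }] }));
const inResult = zips.filter((d) => matches(d, { state: { $in: ["TN", "NY"] } }));
console.log(andResult.length, orResult.length, inResult.length);
```

In this sketch the $or and $in filters select the same documents, mirroring the OR and IN forms of the SQL WHERE clause.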
26. CRUD: Querying
Including/excluding document fields
db.<collection>.find({ }, {<field1>: 1})
SELECT field1
FROM <table>;
db.<collection>.find({<field1>:<value>}, {<field1>: 1})
SELECT field1
FROM <table>
WHERE <field1> = <value>;
In the second parameter, 0 (false) excludes a field and
any non-zero value (true) includes it. The _id field is
returned unless it is excluded with _id: 0.
27. CRUD: Querying
Including/excluding document fields
db.<collection>.find({<field1>:<value>}, {<field2>: 0})
SELECT <all fields but not field2>
FROM <table>
WHERE <field1> = <value>;
- notice that find() takes two parameters
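The include/exclude behaviour of the second find() parameter can be sketched in plain JavaScript. This is an illustration under simplifying assumptions (top-level fields only, no mixing of include and exclude apart from _id), not the server's implementation; the function name project and the sample document are hypothetical:

```javascript
// Toy projection: include mode if any non-_id field is truthy, else exclude mode.
function project(doc, spec) {
  const fields = Object.keys(spec).filter((f) => f !== "_id");
  const includeMode = fields.some((f) => spec[f]);
  const keepId = spec._id !== 0 && spec._id !== false;
  const out = {};
  if (includeMode) {
    // Keep _id (unless switched off) plus the fields marked with 1/true.
    if (keepId && "_id" in doc) out._id = doc._id;
    for (const f of fields) {
      if (spec[f] && f in doc) out[f] = doc[f];
    }
  } else {
    // Keep everything except the fields marked with 0/false.
    for (const [k, v] of Object.entries(doc)) {
      if (k === "_id" ? keepId : !(k in spec)) out[k] = v;
    }
  }
  return out;
}

// Made-up sample document.
const sampleDoc = { _id: 1, city: "ADAMS", pop: 2660 };

const onlyCity = project(sampleDoc, { city: 1 });
const noPop = project(sampleDoc, { pop: 0 });
const cityNoId = project(sampleDoc, { city: 1, _id: 0 });
console.log(onlyCity, noPop, cityNoId);
```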
28. CRUD: Updating
db.<collection>.update(
{<field1>:<value1>}, //all docs in which field = value
{$set: {<field2>:<value2>}}, //set field to value
{multi:true} ) //update multiple docs
UPDATE <table>
SET <field2> = <value2>
WHERE <field1> = <value1>;
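The effect of the multi option can be illustrated with a toy in-memory version of update(). It is a sketch that assumes an equality-only filter and the $set operator only; the function name update, its return value and the sample data are made up for illustration:

```javascript
// Toy update: equality filter and $set only; mutates the documents in place.
// Returns the number of documents modified.
function update(docs, filter, change, options = {}) {
  let modified = 0;
  for (const doc of docs) {
    const hit = Object.entries(filter).every(([k, v]) => doc[k] === v);
    if (!hit) continue;
    Object.assign(doc, change.$set); // add or overwrite the listed fields
    modified += 1;
    if (!options.multi) break; // without multi, only the first match changes
  }
  return modified;
}

// Made-up sample documents.
const people = [
  { name: "Ana", status: "A" },
  { name: "Bob", status: "A" },
  { name: "Cid", status: "B" },
];

const first = update(people, { status: "A" }, { $set: { checked: true } });
const all = update(people, { status: "A" }, { $set: { level: 1 } }, { multi: true });
console.log(first, all, people);
```

Without multi: true only the first matching document is changed, which is why the SQL UPDATE on this slide corresponds to the multi form.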
29. CRUD: Updating
To remove a field
db.<collection>.update({<field>:<value>},
{ $unset: { <field>: 1}})
Roughly “ALTER TABLE <table> DROP COLUMN <field>”, but applied
only to documents “WHERE field = value” (SQL has no direct equivalent)
30. CRUD: Removal
Remove all records where field = value
db.<collection>.remove({<field>:<value>})
DELETE FROM <table>
WHERE <field> = <value>;
As above, but only remove first document
db.<collection>.remove({<field>:<value>}, true)
31. CRUD: Isolation
• By default, all writes are atomic only on the level of a single
document.
• This means that writes over multiple documents of the same
collection can be interleaved with other operations.
• You can isolate writes on an entire collection by adding
“$isolated: 1” to the query (search criterion) document:
db.foo.update(
{ status : "A" , $isolated : 1 },
{ $inc : { count : 1 } },
{ multi: true }
) // increments by 1 the field count of every document
// with status A in the collection foo
In this example, the $isolated: 1 clause makes other clients wait to read and
write the collection until the command is completed.
Note: $isolated was removed in MongoDB 4.0; newer versions use
multi-document transactions instead.
32. Access control included
Authentication mode must be set during start up:
mongod --auth
Then, users must be created:
use admin /*as administrator*/
db.createUser(
{ user: "reportsUser", pwd: "12345678",
roles: [
{ role: "read", db: "reporting" },
{ role: "read", db: "products" },
{ role: "readWrite", db: "accounts" }
]
}
)
35. Intuition – why do databases
exist in the first place?
Why can’t we just write programs that operate on
objects?
Memory limit
We cannot rely only on the OS swapping data to and from disk
through its page-based memory management
Why can’t we have the database operating on the
same data structures (like classes) as the ones
used in programs?
That is where mongoDB comes in
36. Mongo is basically schema-free
The purpose of a schema in SQL is to define the
requirements of tables and to guide the SQL
implementation
Every “row” in a database “table” is a data
structure, much like a “struct” in C, or a “class” in
Java. A table is then an array (or list) of such data
structures
So designing data in mongoDB is basically the same
as designing a compound data type, written in JSON
37. There are some patterns
Embedding (pre-joining)
Linking
41. Many to many relationship
Can put relation in either one of the
documents (embedding in one of the
documents)
Unavoidable redundancy
Possible (probable) inconsistency
It is also possible via linking
But, in this case, random access is
necessary – and joining is necessary in
case one needs all the relationships
44. Many to many relationship
Example:
Inserting a reference
1) Find the document to be referenced
var doc = db.courses.findOne({name: "Data Bases"});
2) Insert the reference (the course’s _id field)
db.students.insert({name: "Chris", courses: [doc._id]});
45. Joins
MongoDB enthusiasts say that it avoids joins
by pre-joining (embedding documents). Is it
true?
Well, not quite
Embedding is also supported in RDBMSs, under a
more technical name: denormalization
In many situations, embedding is not
applicable:
M:N relationships
1:N relationships in which the left side (the 1) is
related to other entities
46. Joins
When embedding is not applicable, it is
possible to do linking... and pay the same
price as RDBMSs pay
In other cases, when the relationship was
not modeled into the data (common, since
MongoDB does not have a schema), an
explicit join is necessary:
$lookup aggregation
47. Joins
$lookup example
db.disciplina.aggregate([
{$lookup:
{from: "professor",
localField: "Prof",
foreignField: "ProfCPF",
as: "Ministrantes"}
}
])
SELECT *
FROM disciplina LEFT OUTER JOIN professor
ON disciplina.Prof = professor.ProfCPF
($lookup is a left outer join: the matching professor documents
are returned in the array field Ministrantes)
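The shape of a $lookup result can be sketched in plain JavaScript over in-memory arrays. This is an illustration of the left-outer-join semantics only, not how the server executes the stage; the function name lookup and the sample documents are made up:

```javascript
// Toy $lookup: for each local document, attach the array of foreign
// documents whose foreignField equals the local document's localField.
// Local documents without a match are kept, with an empty array.
function lookup(localDocs, foreignDocs, { localField, foreignField, as }) {
  return localDocs.map((doc) => ({
    ...doc,
    [as]: foreignDocs.filter((f) => f[foreignField] === doc[localField]),
  }));
}

// Made-up sample data.
const disciplina = [
  { nome: "Databases", Prof: "111" },
  { nome: "Compilers", Prof: "999" }, // no matching professor
];
const professor = [{ ProfCPF: "111", nome: "Maria" }];

const joined = lookup(disciplina, professor, {
  localField: "Prof",
  foreignField: "ProfCPF",
  as: "Ministrantes",
});
console.log(JSON.stringify(joined, null, 2));
```

Unlike a SQL join, the result stays one document per local document, with the matches nested in an array.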
48. Checks
MongoDB also supports checks
They come in the form of “Document Validation”
The overall syntax is as follows:
db.runCommand( {
collMod: "<collection_name>",
validator: <boolean expression>,
validationAction: "error" | "warn" // reject the write, or only log a warning
} )
Example: ensure that either phone or email is provided
(similar in spirit to a “NOT NULL” constraint)
db.runCommand( {
collMod: "contacts",
validator: { $or: [ { phone: { $exists: true } }, { email: { $exists: true } } ] },
validationAction: "warn"
} )
50. Before Index
What does the database normally do when we query?
Without an index, MongoDB must scan every document
Inefficient for large volumes of data
db.users.find( { score: { "$lt" : 30} } )
51. Definition of Index
Definition
Indexes are special data structures
(B-trees by default in MongoDB, just
as in RDBMSs) that store a small
portion of the collection’s data set in
an easy-to-traverse form.
[Figure: diagram of a query that uses an index to select]
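Why an ordered structure avoids scanning every document can be sketched in plain JavaScript. The toy index below is a sorted array searched by binary search, standing in for a B-tree; it is an assumption-laden illustration, and the names buildIndex, lowerBound and findLessThan as well as the sample data are made up:

```javascript
// Toy index on one field: sorted array of [key, position] pairs.
function buildIndex(docs, field) {
  return docs
    .map((doc, pos) => [doc[field], pos])
    .sort((a, b) => a[0] - b[0]);
}

// Binary search: position of the first entry whose key is >= limit.
function lowerBound(index, limit) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (index[mid][0] < limit) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Equivalent of find({ score: { $lt: limit } }) answered through the index:
// only the matching prefix of the index is visited.
function findLessThan(docs, index, limit) {
  return index.slice(0, lowerBound(index, limit)).map(([, pos]) => docs[pos]);
}

// Made-up sample documents.
const users = [
  { name: "a", score: 45 },
  { name: "b", score: 12 },
  { name: "c", score: 30 },
  { name: "d", score: 7 },
];
const scoreIndex = buildIndex(users, "score");
const lowScores = findLessThan(users, scoreIndex, 30);
console.log(lowScores.map((u) => u.name));
```

The index holds only the indexed key and a pointer to each document, which is the "small portion of the data set" the definition refers to.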
52. Index in MongoDB
Create index
db.users.createIndex( { score: 1 } )
(score is an attribute of the collection; 1 means ascending.
ensureIndex() is the older, deprecated name of createIndex())
Show existing indexes
db.users.getIndexes()
Drop index
db.users.dropIndex( {score: 1} )
Explain
db.users.find().explain()
Returns a document that describes the query
process and the indexes used
53. Index in MongoDB
Types
• Single Field Indexes
– db.users.createIndex( { score: 1 } )
54. Index in MongoDB
Types
• Compound Field Indexes
– db.users.createIndex( { userid: 1, score: -1 } )
(userid ascending, score descending)
56. Pipelines
Modeled on the concept of data processing
pipelines.
Provides:
filters that operate like queries
document transformations that modify the
form of the output document
Provides tools for:
grouping and sorting by field
aggregating the contents of arrays, including
arrays of documents
Can use operators for tasks such as calculating the average or
concatenating a string.
58. Pipelines
$limit
$sort
db.zips.aggregate([
{$group: {_id: {state: "$state"}, pop: {$sum: "$pop"}}}, // group by state
{$sort: {pop: -1}}, // sort descending
{$limit: 3} // only the first 3 states
])
SELECT * FROM
(SELECT state, SUM(POP) AS pop
FROM ZIPS
GROUP BY state
ORDER BY pop DESC)
WHERE ROWNUM <= 3
60. Pipelines
$limit
$sort
db.zips.aggregate([
{$group: {_id: {state: "$state", city: "$city"}, pop: {$sum: "$pop"}}}, // group by on multiple fields
{$sort: {pop: -1}}, // sort descending
{$limit: 3} // only the first 3 state-city pairs
])
SELECT * FROM
(SELECT state, city, SUM(POP) AS pop
FROM ZIPS
GROUP BY state, city
ORDER BY pop DESC)
WHERE ROWNUM <= 3
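What the $group, $sort and $limit stages compute can be sketched in plain JavaScript over an in-memory array. This is a toy illustration, not the aggregation framework itself; the function name groupSum and the sample documents are made up, and the output field is fixed to pop to mirror the slides:

```javascript
// Toy $group with $sum: one output document per distinct combination
// of keyFields, carrying the sum of sumField in the field "pop".
function groupSum(docs, keyFields, sumField) {
  const groups = new Map();
  for (const doc of docs) {
    const id = {};
    for (const f of keyFields) id[f] = doc[f];
    const key = JSON.stringify(id); // composite key, like a document-valued _id
    const entry = groups.get(key) || { _id: id, pop: 0 };
    entry.pop += doc[sumField];
    groups.set(key, entry);
  }
  return [...groups.values()];
}

// Made-up sample documents.
const zipDocs = [
  { city: "ADAMS", state: "TN", pop: 2660 },
  { city: "ADAMS", state: "TN", pop: 340 },
  { city: "MEMPHIS", state: "TN", pop: 5000 },
  { city: "ALBANY", state: "NY", pop: 9000 },
];

// $group by state, then $sort descending, then $limit 3.
const byState = groupSum(zipDocs, ["state"], "pop")
  .sort((a, b) => b.pop - a.pop)
  .slice(0, 3);

// Same pipeline, grouping on state and city.
const byStateCity = groupSum(zipDocs, ["state", "city"], "pop")
  .sort((a, b) => b.pop - a.pop)
  .slice(0, 3);

console.log(byState, byStateCity);
```

Each stage consumes the output of the previous one, which is the sense in which the framework is modeled on data processing pipelines.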
61. Single Purpose Aggregation Operations
Special purpose database commands:
returning a count of matching documents
returning the distinct values for a field
grouping data based on the values of a field
Aggregate documents from a single collection.
https://docs.mongodb.com/manual/reference/method/js-collection/
64. Install
Add to project libraries:
mongodb-driver
https://oss.sonatype.org/content/repositories/releases/org/mongodb/mongodb-driver/
mongodb-driver-core
https://oss.sonatype.org/content/repositories/releases/org/mongodb/mongodb-driver-core
bson
https://oss.sonatype.org/content/repositories/releases/org/mongodb/bson
Make sure you use the same release for all of them
65. Hello World
import com.mongodb.*;
import java.util.Date;

public class App {
    public static void main(String[] args) {
        try {
            /* Connect to the local server on the default port */
            MongoClient mongo = new MongoClient("localhost", 27017);
            DB db = mongo.getDB("testdb");
            DBCollection table = db.getCollection("user");
            /* Insert one document */
            BasicDBObject document = new BasicDBObject();
            document.put("name", "joao");
            document.put("age", 30);
            document.put("createdDate", new Date());
            table.insert(document);
            /* Find every document whose name is "joao" */
            BasicDBObject searchQuery = new BasicDBObject();
            searchQuery.put("name", "joao");
            DBCursor cursor = table.find(searchQuery);
            while (cursor.hasNext()) {
                System.out.println(cursor.next());
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
67. Cons
No schema → no design phase (tempting to start right
away)
No schema → more expensive application-level
management later on
Joins might be necessary after all; RDBMSs do them
better
Consistency is at risk when denormalization is
accepted by default
Very limited transaction support → do not put
your bank account on MongoDB
69. MongoDB: The Definitive Guide,
By Kristina Chodorow and Mike Dirolf
Published: 9/24/2010
Pages: 216
Language: English
Publisher: O’Reilly Media, CA