Thiago Veiga
MongoDB is an open-source document database that provides high performance, high
availability, and automatic scaling.
What is MongoDB?
Why should I use MongoDB?
When should I use MongoDB?
• Account and user profiles: can store arrays of addresses
• CMS: the flexible schema of MongoDB is great for heterogeneous collections of content
types
• Form data: MongoDB makes it easy to evolve the structure of form data over time
• Logs / user-generated content: can keep data with complex relationships together in one
object
• Messaging: vary message meta-data easily per message or message type without needing
to maintain separate collections or schemas
• System configuration: just a nice object graph of configuration values, which is very
natural in MongoDB
• Log data of any kind: structured log data is the future
• Graphs: just objects and pointers – a perfect fit
• Location based data: MongoDB understands geo-spatial coordinates and natively supports
geo-spatial indexing
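As a sketch of that schema flexibility, two hypothetical documents of different content types can live side by side in one collection (the names and fields below are invented for illustration):

```javascript
// Two hypothetical documents that could live in the same "content"
// collection: MongoDB does not force them to share a schema.
const article = { type: "article", title: "Hello", body: "...", tags: ["intro"] };
const video   = { type: "video",   title: "Demo",  url: "http://example.com/v1", durationSec: 90 };

// Application code can branch on a discriminator field instead of
// maintaining one table per content type.
function describe(doc) {
  return doc.type === "video"
    ? `${doc.title} (${doc.durationSec}s)`
    : `${doc.title} [${doc.tags.join(", ")}]`;
}
```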
• Queries: MongoDB supports field, range, and regular-expression queries
• Indexing: Any field in a MongoDB collection can be indexed
• Replication: MongoDB provides high availability with replica sets
• Load Balancing: MongoDB scales horizontally using sharding; a shard key determines
how the data in a collection will be distributed
• File Storage: MongoDB provides GridFS for storing files
• Aggregation: The MongoDB aggregation framework can be used for map-reduce and batch
processing
• Fixed size collections: MongoDB supports fixed-size collections called capped collections
Document Database
A record in MongoDB is a document, which is a data structure composed of field and
value pairs. MongoDB documents are similar to JSON objects. The values of fields may
include other documents, arrays, and arrays of documents.
The advantages of using documents are:
•Documents (i.e. objects) correspond to native data types in many
programming languages.
•Embedded documents and arrays reduce need for expensive joins.
•Dynamic schema supports fluent polymorphism.
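For example, a hypothetical order document with embedded line items (invented data) shows how a read avoids a join:

```javascript
// A hypothetical order document: line items are embedded as an array
// of sub-documents, so reading the whole order needs no join.
const order = {
  _id: "ord-1",
  customer: { name: "Ana", city: "Recife" },  // embedded document
  line_items: [                               // array of documents
    { sku: "555b", name: "Coltrane: Impressions", qty: 1 },
    { sku: "123a", name: "Davis: Kind of Blue",   qty: 2 }
  ]
};

// Total quantity is computed from the single object in memory.
const totalQty = order.line_items.reduce((sum, li) => sum + li.qty, 0);
```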
MongoDB Installation:
• Download a build from https://www.mongodb.com/download-center
• Decompress and run
Start Mongod
• Create the default data directory: /data/db (or C:\data\db on Windows)
• Start mongod or mongod.exe
• To verify that you can connect to the server, start the shell: mongo or mongo.exe.
• Then just type exit and press Enter.
Shutting Down Mongod
• 1. When mongod is running attached to a controlling terminal, press Control-C.
• 2. Execute the following command from the operating system prompt:
• mongo --eval 'db.adminCommand( { "shutdown" : 1 } )'
• 3. On Linux/Unix systems, sending a TERM or INT signal, e.g., kill -TERM <pid-of-mongod>.
Data File Allocation
Each database will have at least two data files: one ending in .ns and the rest named with
integers starting at 0.
-rw------- 1 tveiga group 67108864 Aug 29 12:57 pessoa.0
-rw------- 1 tveiga group 134217728 Aug 29 12:57 pessoa.1
-rw------- 1 tveiga group 16777216 Aug 29 12:57 pessoa.ns
The .ns file stores metadata about namespaces (collections and indexes). The number of namespaces is proportional to
the size of the .ns file. Each database can have up to 24,000 namespaces by default, although the size of these files,
and thus the number of namespaces, can be increased with the --nssize option (up to 2GB).
By default, datafiles start at 64MB and double in size with each additional datafile, up to 2GB. Additionally, on some
platforms, mongod allocates one more numbered data file than it needs, to improve throughput.
Thus, it’s possible for allocated size to be much larger than data size. If this presents a problem, you can use some
combination of server options:
--smallfiles // quarters the sizes of data files
--noprealloc // inhibits preallocation of extra files
The Lock File
In order to protect against the possibility that multiple mongod processes might try to use a set of database files in
conflicting ways, there is a lock file called mongod.lock.
-rw------- 1 tveiga group 5 Aug 29 12:57 mongod.lock
The Journal Subdirectory
The mongod process is able to employ a write-ahead journal to speed up data file recovery in the event of a server
crash. The journal’s files are stored in a subdirectory of the dbpath called journal.
Log Files
MongoDB servers log informational messages as part of normal operation. By default, a server process’s log is written to standard
output. You can have the server write the log to a file with the options
--logpath /var/mongodb/mongodb.log --logappend
To rotate the log file, run:
db.adminCommand( { "logRotate" : 1 } ) ;
Config Files
All of these options can be specified in a config file. Any option that takes an argument is specified as option = argument.
Options that don’t take arguments are specified as option = true. An example config file would look something like this:
fork = true
# vvv = true
logpath = /var/mongodb/mongodb.log
You can then invoke mongod with the config file like so:
mongod --config mongod.conf
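As an illustration of the `option = argument` format, a minimal parser sketch (not mongod's actual parser) behaves like this:

```javascript
// A minimal sketch of the "option = argument" config-file format
// described above; real mongod parsing is more involved.
function parseConfig(text) {
  const opts = {};
  for (const line of text.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed || trimmed.startsWith("#")) continue; // skip blanks/comments
    const [key, value] = trimmed.split("=").map(s => s.trim());
    opts[key] = value === "true" ? true : value;       // bare flags become true
  }
  return opts;
}

const conf = parseConfig("fork = true\n# vvv = true\nlogpath = /var/mongodb/mongodb.log");
```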
MongoDB’s concurrency model
∙ Read operations block write operations.
∙ A write operation blocks everything.
∙ A pending write operation prevents new read operations.
∙ All operations yield occasionally, but only between documents.
Indexing
An index is a data structure that is used by Mongo’s query optimizer to quickly sort through and order the
documents in a collection. Formally speaking, these indexes are implemented as B-Tree-style indexes.
Try this query with the twitter data set:
use twitter
db.tweets.find( { "user.followers_count" : 1000 } ) ;
db.tweets.find( { "user.followers_count" : 1000 } ).explain() ;
Look at the output from explain.
Explain()
A great way to get more information on the performance of your database queries is to use the explain
method on the cursor. The result will be a document that contains the explain output. Note that explain
runs the actual query to determine the result.
Some of the important fields in the explain output are explained below:
cursor : This is either a BasicCursor which indicates a table scan operation or a BtreeCursor which means
an index was used.
nscanned : Number of items (documents or index entries) examined.
n : Number of documents matched (on all criteria specified).
The ratio n / nscanned is a rough measure of how effective the index is for that query. For an effective index, this
ratio should be close to 1.
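A small sketch with hypothetical explain documents makes the ratio concrete:

```javascript
// Sketch: judging index effectiveness from hypothetical explain output.
function selectivity(explainDoc) {
  return explainDoc.n / explainDoc.nscanned;
}

// With an index, ideally every scanned entry is a match (ratio near 1).
const indexed = { cursor: "BtreeCursor user.followers_count_1", n: 50, nscanned: 50 };
// A table scan examines every document to find the same 50 matches.
const scanned = { cursor: "BasicCursor", n: 50, nscanned: 10000 };
```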
Create Index
MongoDB by default creates a unique index on the _id field for all collections.
db.tweets.ensureIndex( { "user.followers_count" : 1 } ) ;
db.tweets.ensureIndex( { "user.screen_name" : 1, "created_at" : -1 } ) ;
{
"name" : "Raleigh",
"tags" : [ "north" , "carolina" , "unc" ]
}
{
"line_items" :
[
{
"sku" : "555b",
"name" : "Coltrane: Impressions"
},
{
"sku" : "123a",
"name" : "Davis: Kind of Blue"
}
]
}
db.cities.ensureIndex( { "tags" : 1 } ) ;
db.cities.find( { "tags" : "south" } ) ;
db.orders.ensureIndex( { "line_items.sku" : 1 } ) ;
db.orders.find( { "line_items.sku" : "123a" } ) ;
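Conceptually, a multi-key index on an array field creates one entry per array element, all pointing back at the same document. A simplified sketch (not MongoDB's actual B-tree code):

```javascript
// Sketch of how a multi-key index conceptually expands an array field:
// one index entry per array element, each pointing at the same document.
function multikeyEntries(doc, field) {
  const values = doc[field];
  return (Array.isArray(values) ? values : [values])
    .map(v => ({ key: v, docId: doc._id }));
}

const city = { _id: "raleigh", name: "Raleigh", tags: ["north", "carolina", "unc"] };
const entries = multikeyEntries(city, "tags");
```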
Schema Design
In MongoDB, the basic rubric for schema design is store your data the way your application wants to see it.
Some things to keep in mind
1. Whether to embed data in subdocuments or to refer to separate documents by key fields. Usually,
one embeds data that is seldom changed (either truly immutable or only rarely mutated), and
data that is not interesting enough to be represented as a document on its own (e.g., tags or
labels tend to be represented as strings rather than normalized into their own documents).
2. Whether to store embedded data positionally (with arrays) or by named fields (with nested
documents). This is often a matter of taste, but sometimes relates to what can be queried/indexed
efficiently (i.e., whether you need to be able to use a multi-key index).
3. Whether to put possibly-related data together into fewer, larger documents or to split them into
more numerous but smaller documents (possibly across separate collections). In general, it’s best
to design your documents to fit what the application needs; data you store but never look at
just costs you working space.
4. When you have immutable (or seldom mutated) fields, whether to denormalize values over documents.
If business requirements permit some data to be immutable, then you can freely duplicate
data around in any document to reduce round-trips to your servers. (For instance, in a product
review system, there might be a Users collection with canonical username information. If username
is permitted to be immutable, then you can embed it in review documents without concern
about update inconsistencies.)
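The review example can be sketched with invented documents:

```javascript
// Hypothetical documents for the product-review example: the immutable
// username is copied into each review, so listing reviews needs no
// second round-trip to the users collection.
const user = { _id: "u1", username: "tveiga", email: "t@example.com" };

const review = {
  _id: "r1",
  product: "p42",
  username: user.username, // denormalized; safe because it never changes
  stars: 5
};
```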
Storage Engines
The storage engine is the component of the database that is responsible for managing how data is stored, both in memory and
on disk. MongoDB supports multiple storage engines, as different engines perform better for specific workloads. Choosing the
appropriate storage engine for your use case can significantly impact the performance of your applications.
WiredTiger is the default storage engine starting in MongoDB 3.2. It is well-suited for most workloads and is recommended for
new deployments. WiredTiger provides a document-level concurrency model, checkpointing, and compression, among other
features. In MongoDB Enterprise, WiredTiger also supports Encryption at Rest.
MMAPv1 is the original MongoDB storage engine and is the default storage engine for MongoDB versions before 3.2. It performs
well on workloads with high volumes of reads and writes, as well as in-place updates.
The In-Memory Storage Engine is available in MongoDB Enterprise. Rather than storing documents on-disk, it retains them in-
memory for more predictable data latencies.
Journaling
To provide durability in the event of a failure, MongoDB uses write ahead logging to on-disk journal files.
Journal Files
For the journal files, MongoDB creates a subdirectory named journal under the dbPath directory. WiredTiger journal files have
names with the following format WiredTigerLog.<sequence> where <sequence> is a zero-padded number starting from
000000001.
Journal files contain one record per write operation. Each record has a unique identifier.
MongoDB configures WiredTiger to use snappy compression for the journaling data.
Minimum log record size for WiredTiger is 128 bytes. If a log record is 128 bytes or smaller, WiredTiger does not compress that
record.
WiredTiger journal files for MongoDB have a maximum size limit of approximately 100 MB. Once the file exceeds that limit,
WiredTiger creates a new journal file.
WiredTiger automatically removes old journal files to maintain only the files needed to recover from last checkpoint.
WiredTiger will pre-allocate journal files.
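The naming scheme can be sketched like this (the padding width is inferred from the example sequence shown above):

```javascript
// Sketch of the journal file naming scheme described above:
// WiredTigerLog.<sequence>, a zero-padded number starting from 000000001.
function journalFileName(sequence) {
  return "WiredTigerLog." + String(sequence).padStart(9, "0");
}
```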
Durability, Availability, and Replica Sets
Like any other data storage system, unless you’re making sure to put copies of your data into places that fail
separately from one another, your data isn’t really durable or available in the presence of failures (power outages,
network partitions, hardware failures, etc.) For this reason, MongoDB has a built-in replication model based on
coordination among a number of mongod processes, called a Replica Set.
Replica Set Basics
A replica set is a group of mongod processes that allow you to have your data duplicated over several hosts, ideally
distributed among several data centers. Replica set members all know about each other, and each member
communicates with every other member occasionally, so it’s important to ensure network connectivity between all
the hosts where your replica set members will run.
In a replica set, at any moment there is at most one writable set member, called the primary node, or just the
primary. By default, all other members of a replica set request descriptions of the data changes that happen on the
primary, and apply those changes to their own copies of the primary’s data and indexes; these members that store
data but aren’t writable at a particular point in time are called secondary nodes, or just secondaries. Secondaries
constantly request new data changes, but it’s important to know that replication in MongoDB is asynchronous and in
no way a distributed transaction.
Automatic Failover and Primary Elections
Whenever a replica set’s primary becomes unavailable (e.g., goes offline), the remainder of the set may try to elect a
new primary node. In order for a subset of a replica set to perform an election, the subset must consist of a strict
majority of the set’s normal composition. For example, if the replica set normally has 4 members, the set will be able
to elect a primary whenever 3 or 4 members are online and able to communicate with each other; if only 2 members
can communicate, neither of those members will be a primary.
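The strict-majority rule is just an integer comparison; a sketch:

```javascript
// Sketch of the strict-majority rule for elections described above:
// a primary can only be elected by more than half of the set's
// normal membership.
function canElectPrimary(reachableMembers, totalMembers) {
  return reachableMembers > totalMembers / 2;
}
```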
How Clients Work with Replica Sets
All 10gen-supported MongoDB drivers implement special logic for connecting to replica sets, often as a distinct class in
the language. When a client connects to a replica set, the driver automatically discovers what nodes exist in the set
and which node is primary. At all times, the driver always routes write operations to the primary; by default, read
operations go to the primary, too.
Reading Documents From Secondaries
As mentioned, MongoDB’s drivers send all read operations to the replica set’s primary by default. To send a read
operation to a secondary, the application must authorize the driver to send reads to non-primary nodes.
All supported 10gen driver APIs have a tunable ReadPreference setting for controlling read operation routing.
In the mongo shell you can use the rs.slaveOk() function to permit secondary reads on a per-connection basis. In 2.2,
the shell also includes a readPref() method for cursor objects.
Ensuring Replication of Write Operations
Because MongoDB’s replication is asynchronous and non-transactional, it can happen that a primary node performs a
write operation and subsequently fails before that operation gets replicated to any secondary. In this case, even if the
client asked the primary to acknowledge the write’s success with getLastError or a WriteConcern object, the result of
the write operation won’t be reflected in the state of the replica set after the primary fails.
Consequently, the getLastError command has an option (which the WriteConcern object encapsulates), called the w
parameter, that tells the primary not to confirm a write operation’s success until the write has been replicated.
Here’s how it works: if the w parameter is a number, the primary won’t confirm success until that number of nodes in
the set, including the primary, have performed the write operation. If w is a string, then it names a so-called
getLastErrorMode.
Finally, when an application calls getLastError with a w parameter, the primary will block until an appropriate number
of secondaries have replicated. Because it’s often undesirable to block indefinitely for replication, getLastError also
supports a wtimeout option, to tell the primary to return a timeout error after some number of milliseconds, rather
than blocking.
// Ensure that a write has been propagated to 2 secondaries
// before returning.
db.runCommand( { "getLastError": 1, "w" : 2 } ) ;
// Like the previous, but time out after 2 seconds.
db.runCommand( { "getLastError" : 1, "w" : 2, "wtimeout" : 2000}) ;
How Replication Works Internally
Each node contains a database called local. The local database contains a collection called oplog.rs.
All writes to the primary are written to the oplog, in an idempotent form. Secondary nodes also hold a local
database, where they keep track of how far into the primary’s oplog they have caught up.
Getting stats on the oplog:
db.printReplicationInfo()
configured oplog size: 192MB
log length start to end: 7878secs (2.19hrs)
oplog first event time: Mon Sep 13 2010 15:15:53 GMT-0400 (EDT)
oplog last event time: Mon Sep 13 2010 17:27:11 GMT-0400 (EDT)
now: Mon Sep 13 2010 17:27:17 GMT-0400 (EDT)
The oplog is a special kind of collection, called a capped collection. MongoDB allocates a capped
collection’s space once, and it never grows; instead, when the collection runs out of room for more
documents, new documents replace the oldest documents. You may think of a capped collection as a circular file.
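The ring-buffer behavior can be sketched in a few lines (a simplification: real capped collections are bounded by bytes, not by document count):

```javascript
// Sketch of capped-collection behavior as a fixed-size ring: once the
// collection is full, each insert evicts the oldest document.
class CappedCollection {
  constructor(maxDocs) {
    this.maxDocs = maxDocs;
    this.docs = [];
  }
  insert(doc) {
    this.docs.push(doc);
    if (this.docs.length > this.maxDocs) this.docs.shift(); // drop oldest
  }
}

const oplog = new CappedCollection(3);
[1, 2, 3, 4].forEach(op => oplog.insert({ op }));
```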
On most platforms, the default oplog size is 5% of free disk space at the time the oplog is allocated (remember,
oplogs never change in size). However, this default size is arbitrary, and unlikely to be what you need. You can set
the oplog size when initially starting mongod like so:
mongod --oplogSize 200 // in MB
Sharding
MongoDB supports an automated partitioning architecture called sharding, enabling horizontal scaling across multiple
nodes. Sharding operates by breaking up selected collections into smallish chunks of documents based on ranges of a
user-specified field, called the shard key, and then distributing those chunks across a number of cooperating replica
sets, called shards.
Sharding is made to support applications that outgrow the capacity of a single replica set. A replica set can be
converted to a sharded cluster fairly straightforwardly, and relatively few changes are necessary to convert an
application to work with a sharded cluster.
Operationally, a sharded cluster consists of many processes, grouped into 3 categories:
• Shards. Each shard should be a replica set. Shards store non-overlapping subsets of your applications’ databases.
mongod processes are started with all their normal options plus the --shardsvr option to operate in sharded mode.
You may have any number of shards. Although it’s possible to have a cluster with just one shard, such a cluster has
no performance or scaling benefits over a single replica set.
• Config servers. The config servers store routing information and some bookkeeping metadata. You need three of
these. Config servers are mongod processes started with the --configsvr option, along with other standard mongod
options.
• Routing nodes. Applications communicate with the sharded cluster via one or more mongos processes, never by
contacting the config servers or shard members directly. The mongos is a non-data-storing process that usually
lives on each application server, but can be deployed anywhere that has good connectivity to the shards and the
config servers. mongos processes must be started with a --configdb argument that specifies the addresses of all
3 config servers.
How Sharding Works
After all the processes in a cluster are set up and configured, it’s up to you, the application’s designers and
operators, to decide which collections would benefit from automatic data balancing. Typically, only the
largest or most volatile collections gain much by being sharded; and a cluster may contain both sharded and
non-sharded collections. You run a command to shard a collection on a shard key; any field or compound
field can be given as the shard key, but the shard key on a collection cannot be changed, so it’s important to
pick a good one.
Once a collection is sharded, the cluster automatically breaks up the collection into ranges of shard key
values, called chunks. Each document in the collection falls into exactly one chunk, based on the value of the
shard key in that document. Chunks are automatically split into smaller, non-overlapping chunks in order
to keep the data volume within each chunk about the same size. Finally, whenever any shard has too many
chunks, the cluster will have that shard migrate some of its chunks in order to even out the distribution of
chunks over the group of shards.
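Range-based routing can be sketched with hypothetical chunks (in a real cluster the chunk metadata lives on the config servers):

```javascript
// Sketch of range-based chunk routing: each chunk owns a half-open
// range [min, max) of shard key values, so every document falls into
// exactly one chunk.
const chunks = [
  { min: -Infinity, max: 100,      shard: "shardA" },
  { min: 100,       max: 1000,     shard: "shardB" },
  { min: 1000,      max: Infinity, shard: "shardC" }
];

function routeToShard(shardKeyValue) {
  return chunks.find(c => shardKeyValue >= c.min && shardKeyValue < c.max).shard;
}
```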
All of the splitting and migrating activity is invisible to ordinary applications, because the mongos hides it
all. The mongos routes read and write requests to whichever shard or shards is the current holder of the
chunk that needs to be accessed. But at any particular moment, a document might exist in multiple copies
over a couple of shards, because a migration may be in progress. (Consequently, if you ever bypass the
mongos and connect directly to a shard, you may find confusing data. So don’t do that.)
The config servers are the canonical repository of which collections are sharded, what chunks exist in those
collections, and on which shards those chunks reside. The mongos processes keep a cache of the config servers’
state, and both the shards and the mongos processes occasionally update the config servers. However, the
protocol for updating config servers is carefully designed to prevent the system from losing track of any
chunks; in particular, whenever any config server is unavailable, no chunks may be split or migrated, no
collections may be newly sharded, and no shards may be added or removed.
Primary Shard
Every database has a primary shard that holds all the un-sharded collections for a database. The primary shard
has no relation to the primary in a replica set.
Shard Status
Use the sh.status() method in the mongo shell to see an overview of the cluster. This
report includes which shard is primary for the database and the chunk distribution
across the shards. See the sh.status() method for more details.
Introduction to MongoDB and its best practicesIntroduction to MongoDB and its best practices
Introduction to MongoDB and its best practices
 
What are the major components of MongoDB and the major tools used in it.docx
What are the major components of MongoDB and the major tools used in it.docxWhat are the major components of MongoDB and the major tools used in it.docx
What are the major components of MongoDB and the major tools used in it.docx
 
Top MongoDB interview Questions and Answers
Top MongoDB interview Questions and AnswersTop MongoDB interview Questions and Answers
Top MongoDB interview Questions and Answers
 
Mongo db transcript
Mongo db transcriptMongo db transcript
Mongo db transcript
 
Mongodb
MongodbMongodb
Mongodb
 
What is the significance of MongoDB and what are its usages.docx
What is the significance of MongoDB and what are its usages.docxWhat is the significance of MongoDB and what are its usages.docx
What is the significance of MongoDB and what are its usages.docx
 
MongoDB - An Introduction
MongoDB - An IntroductionMongoDB - An Introduction
MongoDB - An Introduction
 
SQL vs NoSQL, an experiment with MongoDB
SQL vs NoSQL, an experiment with MongoDBSQL vs NoSQL, an experiment with MongoDB
SQL vs NoSQL, an experiment with MongoDB
 
Mongo db pefrormance optimization strategies
Mongo db pefrormance optimization strategiesMongo db pefrormance optimization strategies
Mongo db pefrormance optimization strategies
 
mongodb tutorial
mongodb tutorialmongodb tutorial
mongodb tutorial
 
MongoDB 3.2 - a giant leap. What’s new?
MongoDB 3.2 - a giant leap. What’s new?MongoDB 3.2 - a giant leap. What’s new?
MongoDB 3.2 - a giant leap. What’s new?
 
MongoDB
MongoDBMongoDB
MongoDB
 
Hands on Big Data Analysis with MongoDB - Cloud Expo Bootcamp NYC
Hands on Big Data Analysis with MongoDB - Cloud Expo Bootcamp NYCHands on Big Data Analysis with MongoDB - Cloud Expo Bootcamp NYC
Hands on Big Data Analysis with MongoDB - Cloud Expo Bootcamp NYC
 
UNIT-1 MongoDB.pptx
UNIT-1 MongoDB.pptxUNIT-1 MongoDB.pptx
UNIT-1 MongoDB.pptx
 
how_can_businesses_address_storage_issues_using_mongodb.pptx
how_can_businesses_address_storage_issues_using_mongodb.pptxhow_can_businesses_address_storage_issues_using_mongodb.pptx
how_can_businesses_address_storage_issues_using_mongodb.pptx
 
how_can_businesses_address_storage_issues_using_mongodb.pdf
how_can_businesses_address_storage_issues_using_mongodb.pdfhow_can_businesses_address_storage_issues_using_mongodb.pdf
how_can_businesses_address_storage_issues_using_mongodb.pdf
 
NOSQL and MongoDB Database
NOSQL and MongoDB DatabaseNOSQL and MongoDB Database
NOSQL and MongoDB Database
 
Node Js, AngularJs and Express Js Tutorial
Node Js, AngularJs and Express Js TutorialNode Js, AngularJs and Express Js Tutorial
Node Js, AngularJs and Express Js Tutorial
 
MongoDB Knowledge Shareing
MongoDB Knowledge ShareingMongoDB Knowledge Shareing
MongoDB Knowledge Shareing
 

Kürzlich hochgeladen

Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 

Kürzlich hochgeladen (20)

Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 

Mongodb

  • 2. What is MongoDB?
MongoDB is an open-source document database that provides high performance, high availability, and automatic scaling.
Why should I use MongoDB?
• Queries: MongoDB supports field queries, range queries, and regular expressions
• Indexing: any field in a MongoDB collection can be indexed
• Replication: MongoDB provides high availability with replica sets
• Load balancing: MongoDB scales horizontally using sharding; the shard key determines how the data in a collection is distributed
• File storage: MongoDB includes a grid file system (GridFS)
• Aggregation: the MongoDB aggregation framework can be used for map-reduce or batch processing
• Fixed-size collections: MongoDB supports fixed-size collections, called capped collections
When should I use MongoDB?
• Account and user profiles: can store arrays of addresses
• CMS: the flexible schema of MongoDB is great for heterogeneous collections of content types
• Form data: MongoDB makes it easy to evolve the structure of form data over time
• Logs / user-generated content: can keep data with complex relationships together in one object
• Messaging: vary message metadata easily per message or message type without needing to maintain separate collections or schemas
• System configuration: just a nice object graph of configuration values, which is very natural in MongoDB
• Log data of any kind: structured log data is the future
• Graphs: just objects and pointers – a perfect fit
• Location-based data: MongoDB understands geospatial coordinates and natively supports geospatial indexing
  • 3. Document Database
A record in MongoDB is a document, which is a data structure composed of field and value pairs. MongoDB documents are similar to JSON objects. The values of fields may include other documents, arrays, and arrays of documents.
The advantages of using documents are:
• Documents (i.e. objects) correspond to native data types in many programming languages.
• Embedded documents and arrays reduce the need for expensive joins.
• Dynamic schema supports fluent polymorphism.
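The JSON correspondence above can be illustrated with a plain JavaScript object; the field names and values here are made up for illustration:

```javascript
// A MongoDB document maps directly onto a native object literal.
// Field values may themselves be documents, arrays, or arrays of documents.
const user = {
  _id: 1,
  name: "Ada",
  addresses: [                        // array of embedded documents
    { city: "Recife", country: "BR" },
    { city: "London", country: "UK" }
  ],
  profile: { followers_count: 1000 }  // embedded document
};

// No join is needed to read related data: it travels with the document.
console.log(user.addresses[0].city);        // "Recife"
console.log(user.profile.followers_count);  // 1000
```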
  • 4. MongoDB Installation
• Download a build from https://www.mongodb.com/download-center
• Decompress and run
Start Mongod
• Create the default data directory: /data/db (or C:\data\db on Windows)
• Start mongod (or mongod.exe)
• To verify that you can connect to the server, start the shell: mongo (or mongo.exe)
• Then just type exit and press Enter.
Shutting Down Mongod
1. When mongod is running attached to a controlling terminal, enter Control-C.
2. Execute the following command from the operating system prompt:
mongo --eval 'db.adminCommand( { "shutdown" : 1 } )'
3. On Linux/Unix systems, send a TERM or INT signal, e.g., kill -TERM <pid-of-mongod>
  • 5. Data File Allocation
Each database has at least two data files: one ending in .ns, and the rest named with integers starting at 0.
-rw------- 1 tveiga group 67108864 Aug 29 12:57 pessoa.0
-rw------- 1 tveiga group 134217728 Aug 29 12:57 pessoa.1
-rw------- 1 tveiga group 16777216 Aug 29 12:57 pessoa.ns
The .ns file stores metadata about namespaces (collections and indexes). The number of namespaces is proportional to the size of the .ns file. Each database can have up to 24,000 namespaces by default, although the size of these files, and thus the number of namespaces, can be increased with the --nssize option (up to 2 GB).
By default, data files start at 64 MB and double in size with each additional data file, up to 2 GB. Additionally, on some platforms, mongod allocates one more numbered data file than it needs, to improve throughput. Thus, it's possible for the allocated size to be much larger than the data size. If this presents a problem, you can use some combination of server options:
--smallfiles // quarters the sizes of data files
--noprealloc // inhibits preallocation of extra files
The Lock File
To protect against the possibility that multiple mongod processes might try to use a set of database files in conflicting ways, there is a lock file called mongod.lock.
-rw------- 1 tveiga group 5 Aug 29 12:57 mongod.lock
The Journal Subdirectory
The mongod process can employ a write-ahead journal to speed up data file recovery in the event of a server crash. The journal's files are stored in a subdirectory of the dbpath called journal.
  • 6. Log Files
MongoDB servers log informational messages as part of normal operation. By default, a server process's log is written to standard output. You can have the server write the log to a file with the options:
--logpath /var/mongodb/mongodb.log --logappend
To rotate the log file at runtime:
db.adminCommand( { "logRotate" : 1 } ) ;
Config Files
All of these options can be specified in a config file. Any option that takes an argument is specified as option = argument. Options that don't take arguments are specified as option = true. An example config file looks something like this:
fork = true
# vvv = true
logpath = /var/mongodb/mongodb.log
You can then invoke mongod with the config file like so:
mongod --config mongod.conf
  • 7. MongoDB's Concurrency Model
∙ Read operations block write operations.
∙ A write operation blocks everything.
∙ A pending write operation prevents new read operations.
∙ All operations yield occasionally, but only between documents.
Indexing
An index is a data structure used by Mongo's query optimizer to quickly sort through and order the documents in a collection. Formally speaking, these indexes are implemented as B-tree-style indexes.
Try this query with the twitter data set:
use twitter
db.tweets.find( { "user.followers_count" : 1000 } ) ;
db.tweets.find( { "user.followers_count" : 1000 } ).explain() ;
Look at the output from explain.
explain()
A great way to get more information on the performance of your database queries is to use the explain method on the cursor. The result is a document that contains the explain output. Note that explain runs the actual query to determine the result. Some of the important fields in the explain output:
cursor: either a BasicCursor, which indicates a table scan, or a BtreeCursor, which means an index was used.
nscanned: number of items (documents or index entries) examined.
n: number of documents matched (on all criteria specified).
The ratio n / nscanned is a rough measure of how effective the index is for that query. For an effective index, this ratio should be close to 1.
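The n / nscanned rule of thumb can be sketched as a small helper; the field names mirror the legacy explain() format described above, and the sample numbers are made up for illustration:

```javascript
// Sketch: judging index effectiveness from legacy explain() output.
function indexEffectiveness(explainDoc) {
  // n / nscanned close to 1 means the query examined few items
  // beyond the documents it actually matched.
  return explainDoc.nscanned === 0 ? 1 : explainDoc.n / explainDoc.nscanned;
}

// A table scan examines every document to match a handful:
const tableScan = { cursor: "BasicCursor", nscanned: 10000, n: 12 };
// An indexed query examines only the matching index entries:
const indexed = { cursor: "BtreeCursor user.followers_count_1", nscanned: 12, n: 12 };

console.log(indexEffectiveness(tableScan)); // 0.0012 -- poor
console.log(indexEffectiveness(indexed));   // 1 -- effective
```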
  • 8. Create Index
MongoDB by default creates a unique index on the _id field for all collections.
db.tweets.ensureIndex( { "user.followers_count" : 1 } ) ;
db.tweets.ensureIndex( { "user.screen_name" : 1, "created_at" : -1 } ) ;
Indexes also work on array fields and on fields of embedded documents:
{ "name" : "Raleigh", "tags" : [ "north" , "carolina" , "unc" ] }
{ "line_items" : [ { "sku" : "555b", "name" : "Coltrane: Impressions" }, { "sku" : "123a", "name" : "Davis: Kind of Blue" } ] }
db.cities.ensureIndex( { "tags" : 1 } ) ;
db.cities.find( { "tags" : "south" } ) ;
db.orders.ensureIndex( { "line_items.sku" : 1 } ) ;
db.orders.find( { "line_items.sku" : "123a" } ) ;
  • 9. Schema Design
In MongoDB, the basic rubric for schema design is: store your data the way your application wants to see it. Some things to keep in mind:
1. Whether to embed data in subdocuments or to refer to separate documents by key fields. Usually, one embeds data that is seldom changed (either truly immutable or only rarely mutated), and data that is not interesting enough to be represented as a document on its own (e.g., tags or labels tend to be represented as strings rather than normalized into their own documents).
2. Whether to store embedded data positionally (with arrays) or by named fields (with nested documents). This is often a matter of taste, but sometimes relates to what can be queried/indexed efficiently (i.e., whether you need to be able to use a multi-key index).
3. Whether to put possibly related data together into fewer, larger documents or to split it into more numerous but smaller documents (possibly across separate collections). In general, it's best to design your documents to fit what the application needs; data you store but never look at just costs you working space.
4. When you have immutable (or seldom mutated) fields, whether to denormalize values over documents. If business requirements permit some data to be immutable, then you can freely duplicate it across documents to reduce round-trips to your servers. (For instance, in a product review system, there might be a Users collection with canonical username information. If username is permitted to be immutable, then you can embed it in review documents without concern about update inconsistencies.)
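Points 1 and 4 above can be sketched with the product-review example; the document shapes and values here are hypothetical:

```javascript
// Embedded (denormalized): the username is copied into each review.
// This is safe if business rules make usernames immutable, and it
// means reading a review needs no second lookup.
const reviewEmbedded = {
  product: "Kind of Blue",
  rating: 5,
  user: { username: "tveiga", country: "BR" } // copied from Users
};

// Referenced (normalized): the review stores only a key, and the
// application performs a second query against the Users collection.
const reviewReferenced = {
  product: "Kind of Blue",
  rating: 5,
  user_id: 42 // foreign-key-style reference
};

// The embedded form answers "who wrote this?" with zero extra round-trips:
console.log(reviewEmbedded.user.username); // "tveiga"
```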
  • 10. Storage Engines
The storage engine is the component of the database responsible for managing how data is stored, both in memory and on disk. MongoDB supports multiple storage engines, as different engines perform better for specific workloads. Choosing the appropriate storage engine for your use case can significantly impact the performance of your applications.
WiredTiger is the default storage engine starting in MongoDB 3.2. It is well suited for most workloads and is recommended for new deployments. WiredTiger provides a document-level concurrency model, checkpointing, and compression, among other features. In MongoDB Enterprise, WiredTiger also supports Encryption at Rest.
MMAPv1 is the original MongoDB storage engine and the default for MongoDB versions before 3.2. It performs well on workloads with high volumes of reads and writes, as well as in-place updates.
The In-Memory Storage Engine is available in MongoDB Enterprise. Rather than storing documents on disk, it retains them in memory for more predictable data latencies.
Journaling
To provide durability in the event of a failure, MongoDB uses write-ahead logging to on-disk journal files.
Journal Files
For the journal files, MongoDB creates a subdirectory named journal under the dbPath directory. WiredTiger journal files have names with the format WiredTigerLog.<sequence>, where <sequence> is a zero-padded number starting from 000000001.
Journal files contain one record per write operation; each record has a unique identifier. MongoDB configures WiredTiger to use snappy compression for the journaling data. The minimum log record size for WiredTiger is 128 bytes; if a log record is 128 bytes or smaller, WiredTiger does not compress it.
WiredTiger journal files have a maximum size limit of approximately 100 MB; once a file exceeds that limit, WiredTiger creates a new journal file. WiredTiger automatically removes old journal files, maintaining only the files needed to recover from the last checkpoint, and pre-allocates journal files.
  • 11. Durability, Availability, and Replica Sets
Like any other data storage system, unless you're making sure to put copies of your data into places that fail separately from one another, your data isn't really durable or available in the presence of failures (power outages, network partitions, hardware failures, etc.). For this reason, MongoDB has a built-in replication model based on coordination among a number of mongod processes, called a replica set.
Replica Set Basics
A replica set is a group of mongod processes that allow you to have your data duplicated over several hosts, ideally distributed among several data centers. Replica set members all know about each other, and each member communicates with every other member occasionally, so it's important to ensure network connectivity between all the hosts where your replica set members will run.
In a replica set, at any moment there is at most one writable set member, called the primary node, or just the primary. By default, all other members of a replica set request descriptions of the data changes that happen on the primary, and apply those changes to their own copies of the primary's data and indexes; these members that store data but aren't writable at a particular point in time are called secondary nodes, or just secondaries. Secondaries constantly request new data changes, but it's important to know that replication in MongoDB is asynchronous and in no way a distributed transaction.
  • 12. Automatic Failover and Primary Elections
Whenever a replica set's primary becomes unavailable (e.g., goes offline), the remainder of the set may try to elect a new primary node. In order for a subset of a replica set to perform an election, the subset must consist of a strict majority of the set's normal composition. For example, if the replica set normally has 4 members, the set will be able to elect a primary whenever 3 or 4 members are online and able to communicate with each other; if only 2 members can communicate, neither of those members can become primary.
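The strict-majority rule above can be sketched as a one-line predicate (a simplified illustration that ignores member votes and priorities):

```javascript
// A partition of a replica set can elect a primary only if it holds
// strictly more than half of the set's normal membership.
function canElectPrimary(onlineMembers, totalMembers) {
  return onlineMembers > totalMembers / 2;
}

// The 4-member example from the slide:
console.log(canElectPrimary(3, 4)); // true  -- 3 of 4 is a strict majority
console.log(canElectPrimary(2, 4)); // false -- 2 of 4 is only half
// This is why odd-sized sets tolerate failures more gracefully:
console.log(canElectPrimary(2, 3)); // true  -- 2 of 3 is a strict majority
```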
  • 13. How Clients Work with Replica Sets
All 10gen-supported MongoDB drivers implement special logic for connecting to replica sets, often as a distinct class in the language. When a client connects to a replica set, the driver automatically discovers which nodes exist in the set and which node is primary. The driver always routes write operations to the primary; by default, read operations go to the primary, too.
  • 14. Reading Documents from Secondaries
As mentioned, MongoDB's drivers send all read operations to the replica set's primary by default. To send a read operation to a secondary, the application must authorize the driver to send reads to non-primary nodes. All supported 10gen driver APIs have a tunable ReadPreference setting for controlling read operation routing. In the mongo shell you can use the rs.slaveOk() function to permit secondary reads on a per-connection basis. In 2.2, the shell also includes a readPref() method for cursor objects.
Ensuring Replication of Write Operations
Because MongoDB's replication is asynchronous and non-transactional, it can happen that a primary node performs a write operation and then fails before that operation is replicated to any secondary. In this case, even if the client asked the primary to acknowledge the write's success with getLastError or a WriteConcern object, the result of the write operation won't be reflected in the state of the replica set after the primary fails.
Consequently, the getLastError command has an option (which the WriteConcern object encapsulates), called the w parameter, that tells the primary not to confirm a write operation's success until the write has been replicated. Here's how it works: if the w parameter is a number, the primary won't confirm success until that number of nodes in the set, including the primary, have performed the write operation. If w is a string, then it names a so-called getLastErrorMode. When an application calls getLastError with a w parameter, the primary blocks until the appropriate number of secondaries have replicated the write. Because it's often undesirable to block indefinitely for replication, getLastError also supports a wtimeout option, which tells the primary to return a timeout error after some number of milliseconds rather than blocking.
// Ensure that a write has been acknowledged by 2 members
// (the primary plus one secondary) before returning.
db.runCommand( { "getLastError" : 1, "w" : 2 } ) ;
// Like the previous, but time out after 2 seconds.
db.runCommand( { "getLastError" : 1, "w" : 2, "wtimeout" : 2000 } ) ;
  • 15. How Replication Works Internally
Each node contains a database called local. The local database contains a collection called oplog.rs. All writes to the primary are written to the oplog in an idempotent form. Secondary nodes also hold a local database, where they keep track of how far into the primary's oplog they have caught up.
Getting stats on the oplog:
db.printReplicationInfo()
configured oplog size: 192MB
log length start to end: 7878secs (2.19hrs)
oplog first event time: Mon Sep 13 2010 15:15:53 GMT-0400 (EDT)
oplog last event time: Mon Sep 13 2010 17:27:11 GMT-0400 (EDT)
now: Mon Sep 13 2010 17:27:17 GMT-0400 (EDT)
The oplog is a special kind of collection, called a capped collection. MongoDB allocates the space for a capped collection once, and it never grows; instead, when the capped collection runs out of room for more documents, new documents replace the oldest documents. You may think of a capped collection as a circular file.
On most platforms, the default oplog size is 5% of free disk space at the time the oplog is allocated (remember, oplogs never change in size). However, this default size is arbitrary, and unlikely to be what you need. You can set the oplog size when initially starting mongod like so:
mongod --oplogSize 200 // in MB
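The "idempotent form" mentioned above means an oplog entry can be applied more than once without changing the outcome; for example, an increment is recorded as the resulting $set rather than as the $inc itself. A simplified, hypothetical applier that handles only $set illustrates the property:

```javascript
// Sketch: replaying an idempotent oplog entry is harmless.
// This toy applier handles only the $set operator; real oplog
// application is far more involved.
function applyOplogEntry(doc, entry) {
  return Object.assign({}, doc, entry.o.$set);
}

// Hypothetical oplog entry recording "n became 5" (not "n += 1"):
const entry = { op: "u", ns: "test.counters", o: { $set: { n: 5 } } };

let doc = { _id: 1, n: 4 };
doc = applyOplogEntry(doc, entry); // n is now 5
doc = applyOplogEntry(doc, entry); // replaying it changes nothing
console.log(doc.n); // 5
```

This is what lets a secondary safely re-apply oplog entries from its last known position after a restart.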
  • 16. Sharding
MongoDB supports an automated partitioning architecture called sharding, enabling horizontal scaling across multiple nodes. Sharding operates by breaking up selected collections into smallish chunks of documents based on ranges of a user-specified field, called the shard key, and then distributing those chunks across a number of cooperating replica sets, called shards. Sharding is made to support applications that outgrow the capacity of a single replica set. A replica set can be converted to a sharded cluster fairly straightforwardly, and relatively few changes are necessary to convert an application to work with a sharded cluster.
Operationally, a sharded cluster consists of many processes, grouped into three categories:
• Shards. Each shard should be a replica set. Shards store non-overlapping subsets of your applications' databases. mongod processes are started with all their normal options plus the --shardsvr option to operate in sharded mode. You may have any number of shards, although a cluster with just one shard has no performance or scaling benefits over a single replica set.
• Config servers. The config servers store routing information and some bookkeeping metadata. You need three of these. Config servers are mongod processes started with the --configsvr option, along with other standard mongod options.
• Routing nodes. Applications communicate with the sharded cluster via one or more mongos processes, never by contacting the config servers or shard members directly. The mongos is a non-data-storing process that usually lives on each application server, but can be deployed anywhere that has good connectivity to the shards and the config servers. mongos processes must be started with a --configdb argument that specifies the addresses of all three config servers.
  • 17. How Sharding Works
After all the processes in a cluster are set up and configured, it’s up to you, the application’s designers and operators, to decide which collections would benefit from automatic data balancing. Typically, only the largest or most volatile collections gain much by being sharded, and a cluster may contain both sharded and non-sharded collections. You run a command to shard a collection on a shard key; any field or compound field can be given as the shard key, but the shard key on a collection cannot be changed, so it’s important to pick a good one.

Once a collection is sharded, the cluster automatically breaks up the collection into ranges of shard key values, called chunks. Each document in the collection falls into exactly one chunk, based on the value of the shard key in that document. Chunks are automatically split into smaller, non-overlapping chunks in order to keep the data volume within each chunk about the same size. Finally, whenever any shard has too many chunks, the cluster will have that shard migrate some of its chunks in order to even out the distribution of chunks over the group of shards.

All of the splitting and migrating activity is invisible to ordinary applications, because the mongos hides it all. The mongos routes read and write requests to whichever shard or shards currently hold the chunk that needs to be accessed. But at any particular moment, a document might exist in multiple copies on a couple of shards, because a migration may be in progress. (Consequently, if you ever bypass the mongos and connect directly to a shard, you may find confusing data. So don’t do that.)

The config servers are the canonical repository of which collections are sharded, what chunks exist in those collections, and on which shards those chunks reside. The mongos processes keep a cache of the config servers’ state, and both the shards and the mongos processes occasionally update the config servers.
However, the protocol for updating config servers is carefully designed to prevent the system from losing track of any chunks; in particular, whenever any config server is unavailable, no chunks may be split or migrated, no collections may be newly sharded, and no shards may be added or removed.
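The chunk mechanics described above can be sketched as range-based routing: each chunk owns a half-open range of shard-key values, and a lookup walks the sorted chunk map much as a mongos consults its cached config data. A toy model with hypothetical shard names and chunk boundaries:

```python
import bisect

# Chunk map: (lower_bound, shard) entries sorted by lower bound.
# Each chunk covers [lower_bound, next_lower_bound) of shard-key values.
chunk_map = [
    (float("-inf"), "shard-A"),
    (100, "shard-B"),
    (500, "shard-A"),
    (1000, "shard-C"),
]

def route(shard_key_value):
    """Return the shard holding the chunk whose range contains the key."""
    bounds = [lower for lower, _ in chunk_map]
    # rightmost chunk whose lower bound is <= the key
    i = bisect.bisect_right(bounds, shard_key_value) - 1
    return chunk_map[i][1]

print(route(42))     # falls in (-inf, 100)  -> shard-A
print(route(750))    # falls in [500, 1000)  -> shard-A
print(route(99999))  # falls in [1000, inf)  -> shard-C
```

Splitting a chunk inserts a new boundary into this map, and migrating one rewrites a chunk's shard entry; either way the routing function is unchanged, which is why applications never notice the rebalancing.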
  • 19. Primary Shard
Every database has a primary shard that holds all the un-sharded collections for that database. The primary shard has no relation to the primary in a replica set.

Shard Status
Use the sh.status() method in the mongo shell to see an overview of the cluster. This report includes which shard is primary for the database and the chunk distribution across the shards. See the sh.status() method for more details.
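The primary-shard rule can be sketched as a small bookkeeping model: every database is pinned to one shard, and any collection that has not been sharded lives entirely on that shard. All names below are illustrative:

```python
class ClusterCatalog:
    """Toy model: un-sharded collections live on the database's primary shard."""

    def __init__(self):
        self.primary = {}     # database name -> its primary shard
        self.sharded = set()  # (db, collection) pairs that have been sharded

    def create_database(self, db, primary_shard):
        self.primary[db] = primary_shard

    def shard_collection(self, db, coll):
        self.sharded.add((db, coll))

    def location_of(self, db, coll):
        if (db, coll) in self.sharded:
            return "distributed across the cluster's shards"
        # an un-sharded collection is held entirely by the primary shard
        return self.primary[db]

catalog = ClusterCatalog()
catalog.create_database("app", primary_shard="rs0")
catalog.shard_collection("app", "events")

print(catalog.location_of("app", "users"))   # un-sharded -> "rs0"
print(catalog.location_of("app", "events"))  # sharded -> spread over shards
```

This is the distinction sh.status() reports: one primary shard per database, plus the chunk distribution for each sharded collection.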