2. • Written in: C++
• Data model: BSON (binary JSON), which is
lightweight, traversable, and efficient
• Based on the ‘document model’
• A NoSQL ("not only SQL") database; SQL is
the query language used by RDBMSs
• Retains some friendly properties of SQL
• License: AGPL (Drivers: Apache)
15. MongoDB takes an entirely different approach
Here, data is stored in records (called
documents)
A separate document for each patient (example)
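As a rough illustration of the one-document-per-patient idea, the sketch below models two patient records as plain Python dictionaries and prints them as JSON. The field names (`name`, `email`, `visits`, `allergies`) and values are hypothetical, and no MongoDB driver is involved; it only shows the shape of the data.

```python
import json

# Hypothetical patient records: each patient is one self-contained document.
patients = [
    {
        "name": "Patient A",
        "email": "a@example.com",
        "visits": [{"date": "2020-01-15", "reason": "checkup"}],
    },
    {
        "name": "Patient B",
        # No email on file: the field is simply absent, not NULL.
        "visits": [],
        "allergies": ["penicillin"],
    },
]

for doc in patients:
    # Everything known about one patient lives in a single record.
    print(json.dumps(doc, indent=2))

# The two documents do not share the same set of fields.
fields_per_patient = [sorted(doc) for doc in patients]
```

Nested lists such as `visits` take the place of the separate tables an RDBMS design would need.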
22. MongoDB Distributed Systems Architecture
Replica Sets
Single-master!
Maintain backup copies of the database
instance
Secondaries can elect a new primary within
seconds if the primary goes down
23. Architecture
Replica Set Quirks
Replicas only address durability, not the ability to
scale
A majority of the servers in your set must agree
on the primary
An even number of servers (e.g. 2) does not work:
if one of two members fails, the survivor alone
is not a majority
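The majority rule above is simple arithmetic, sketched here for a few set sizes. The function names are hypothetical helpers, not part of any MongoDB API.

```python
def majority(members: int) -> int:
    """Smallest number of votes that is more than half of the set."""
    return members // 2 + 1


def fault_tolerance(members: int) -> int:
    """How many members can fail while a majority can still be formed."""
    return members - majority(members)


for n in (2, 3, 4, 5):
    print(f"{n} members: majority={majority(n)}, "
          f"can lose {fault_tolerance(n)}")
```

A two-member set cannot lose any member, and a four-member set tolerates no more failures than a three-member one, which is why odd sizes are the usual choice.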
24. Architecture
Write process
All write operations go through the primary, which
applies the write operation.
The primary then records the operation in its
operation log (“oplog”).
Secondaries continuously replicate the oplog
and apply the operations to themselves in an
asynchronous process.
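The write path described above can be mimicked with a toy in-memory model. This is only a sketch of the idea: the `Primary` and `Secondary` classes are hypothetical, and the real oplog format and replication protocol are considerably more involved.

```python
class Primary:
    def __init__(self):
        self.data = {}
        self.oplog = []  # ordered list of (key, value) operations

    def write(self, key, value):
        self.data[key] = value           # 1. apply the write
        self.oplog.append((key, value))  # 2. record it in the oplog


class Secondary:
    def __init__(self):
        self.data = {}
        self.applied = 0  # number of oplog entries applied so far

    def replicate(self, primary, batch=None):
        """Pull oplog entries not yet seen and apply them locally."""
        entries = primary.oplog[self.applied:]
        if batch is not None:
            entries = entries[:batch]
        for key, value in entries:
            self.data[key] = value
        self.applied += len(entries)


primary = Primary()
secondary = Secondary()
primary.write("patient:1", {"name": "A"})
primary.write("patient:2", {"name": "B"})

secondary.replicate(primary, batch=1)  # asynchronous: secondary lags behind
lagging = dict(secondary.data)

secondary.replicate(primary)           # later, it catches up
```

Because replication is asynchronous, a read from the secondary between the two `replicate` calls would not see the second write.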
28. Architecture
Sharding Quirks
If any one config server goes down, your entire
database goes down.
Auto-sharding sometimes doesn’t work.
29. Architecture
Data Locality
MongoDB zoned sharding allows precise control
over where data is physically stored in a cluster.
Enables developers to place data by
geographic region
Developers can assign each shard to a zone
representing its physical location.
Any number of shards can be associated with each
zone, and each zone can be scaled independently
of the other
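The zone idea can be sketched as a lookup from a document's region to the shards assigned to that zone. The shard names, zone labels, and the `region` field below are hypothetical, and this is a plain-Python model rather than MongoDB's own zone configuration.

```python
# Hypothetical assignment of shards to zones (one zone per region).
zone_of_shard = {
    "shard0": "NA",
    "shard1": "EU",
    "shard2": "CN",
    "shard3": "CN",  # a zone can hold several shards and grow independently
}


def shards_for(doc: dict) -> list:
    """Return the shards allowed to store this document, by its region."""
    return sorted(s for s, z in zone_of_shard.items() if z == doc["region"])


print(shards_for({"user": "u1", "region": "EU"}))
print(shards_for({"user": "u2", "region": "CN"}))
```

Adding a shard to one zone only changes that zone's entry, which mirrors the point that each zone scales independently of the others.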
30. Architecture
Data Security
Authentication
Offers integration with external security
mechanisms including Windows Active Directory,
Kerberos, etc.
Authorization
Lets you configure granular permissions for users
Auditing
Provides a native audit log to track DB operations
Encryption
On networks, on disk and in backups
31. Architecture
Freedom to Run Anywhere
Many companies are moving to the public cloud
MongoDB allows organizations to adopt cloud at
their own pace by moving select workloads as
needed.
For example, they may run the same workload in
a hybrid environment to manage sudden peaks in
demand, or use the cloud to launch services in
regions where they lack a physical data center
presence.
32. Advantages over RDBMS
Schema-less
Deep query ability
Ease of scale-out
Structure of a single object is clear
Uses internal memory for storing the working set,
enabling faster access to data
Mapping of application objects to database
objects is not needed
33. Why use MongoDB?
Document Oriented Storage
Index on any attribute
Replication and high availability
Auto-sharding
Rich queries
Built-in aggregation capabilities, MapReduce,
GridFS
Professional support by MongoDB
34. Where to use MongoDB?
Big Data
Content Management and Delivery
Mobile and Social infrastructure
User Data Management
Data Hub
MongoDB is a free and open-source cross-platform document-oriented database program. Classified as a NoSQL database program, MongoDB uses JSON-like documents with optional schemas. MongoDB is developed by MongoDB Inc., and is published under a combination of the GNU Affero General Public License and the Apache License.
SQL: Structured Query Language. RDBMS: data is stored in tables (relations) as tuples (rows)
-Let's take an example of an RDBMS (Relational Database Management System)
-Consider the database of a hospital
-You can see how it gets out of hand quickly
-This is exactly how developers work with data in relational databases
-So the record of a single patient is spread out across dozens of such tables. This adds a massive amount of complexity to the application
Due to this complexity
-it is hard for the people maintaining the application to understand it
-it makes adding features harder, because there is so much more to account for
-pulling data from so many places is inefficient
-Doctors have to pull out every drawer to get the complete information of a single patient
-Here each cabinet represents a ‘table’ in an RDBMS
-You can see how complicated, error-prone and slow that will be
Document of One patient
Document of another patient.
-Different amounts of data in two consecutive patient documents
E.g. one has an email address, the other does not
Documents of three patients, each with a different number of fields
All three documents are stored in one cabinet
-Developers no longer have to make their application accommodate the needs of the database
MongoDB accommodates them, so their application can store data in a natural way
-It also means they can adapt and add new things without worrying that a simple change will break everything
-Collection: can contain pretty much anything
Restriction: you can’t move data across collections in different databases
-Beyond using replication for redundancy and availability, replica sets also provide a foundation for combining different classes of workload on the same MongoDB cluster, each operating against its own copy of the data.
-With workload isolation, business analysts can run exploratory queries and generate reports, and data scientists can build machine learning models, without impacting operational applications.
-With the operational and analytic workloads isolated from one another on different replica set nodes, they never contend for resources.
-MongoDB provides horizontal scale-out for databases on low-cost, commodity hardware or cloud infrastructure using a technique called sharding.
-Each shard is backed by a replica set to provide always-on availability and workload isolation.
-Sharding allows developers to seamlessly scale the database as their apps grow beyond the hardware limits of a single server, and it does this without adding complexity to the application.
-Sharding is transparent to applications; whether there is one shard or a thousand, the application code for querying MongoDB remains the same.
Config server: knows how the data is partitioned; mongos uses that information to figure out which replica set to talk to for the data it needs
Config servers run on top of the single-master design of replica sets
Horizontal scaling: you scale by adding more machines to your pool of resources
Vertical scaling: you scale by adding more power (CPU, RAM) to your existing machine
Sharding types: Ranged Sharding, Hashed Sharding, Zoned Sharding
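To make the difference between ranged and hashed sharding concrete, here is a simplified routing sketch. The shard names and range boundaries are hypothetical, and the hashed variant uses MD5 with a modulo purely as a stand-in for illustration; it is not MongoDB's actual hashing or chunk-assignment scheme.

```python
import bisect
import hashlib

SHARDS = ["shard0", "shard1", "shard2"]

# Ranged: shard0 holds keys < 1000, shard1 holds 1000..1999, shard2 the rest.
RANGE_BOUNDS = [1000, 2000]


def ranged_shard(key: int) -> str:
    """Route by where the key falls among contiguous key ranges."""
    return SHARDS[bisect.bisect_right(RANGE_BOUNDS, key)]


def hashed_shard(key: int) -> str:
    """Route by a hash of the key (simplified: hash modulo shard count)."""
    digest = hashlib.md5(str(key).encode()).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]


for key in (1001, 1002, 1003):
    print(key, ranged_shard(key), hashed_shard(key))
```

With ranged routing, neighbouring keys land on the same shard, which suits range queries. Hashed routing gives up that locality in exchange for a more even spread of writes.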
2- Data placement by geographic region, for latency and governance requirements
3- Each shard is assigned to a zone for the physical location (North America, Europe or China) of that shard’s servers; all documents are then mapped to the correct zone based on their region field.
4- For instance, accommodating faster user growth in China than in North America
Given the freedom to put data anywhere, developers must also be confident that their data is secure, wherever it is stored.
Rather than build security controls back into the application, they should be able to rely on the database to implement the mechanisms needed to protect sensitive data and meet the needs of apps in regulated industries.
MongoDB features extensive capabilities to defend, detect, and control access to data.
1. to not only reduce the operational overhead of managing infrastructure, but also provide their teams with on-demand services that make it easier to build and run an application backend
-MongoDB is a document database in which one collection holds different documents. The number of fields, content and size can differ from one document to another.
-MongoDB supports dynamic queries on documents using a document-based query language that’s nearly as powerful as SQL
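A document-based query is itself a document that describes the fields to match. The toy matcher below illustrates the idea in plain Python. It handles only equality and the `$gt`, `$lt` and `$eq` operators, the `matches` helper and sample data are hypothetical, and it greatly simplifies what the real query engine does.

```python
import operator

# Comparison operators supported by this sketch.
OPS = {"$gt": operator.gt, "$lt": operator.lt, "$eq": operator.eq}


def matches(doc: dict, query: dict) -> bool:
    """Return True if every condition in the query holds for the document."""
    for field, cond in query.items():
        if field not in doc:
            return False  # simplification: a missing field never matches
        value = doc[field]
        if isinstance(cond, dict):
            if not all(OPS[op](value, arg) for op, arg in cond.items()):
                return False
        elif value != cond:
            return False
    return True


docs = [
    {"name": "A", "age": 52, "email": "a@example.com"},
    {"name": "B", "age": 34},
    {"name": "C"},  # documents in one collection may lack fields entirely
]

over_40 = [d["name"] for d in docs if matches(d, {"age": {"$gt": 40}})]
named_b = [d["name"] for d in docs if matches(d, {"name": "B"})]
print(over_40, named_b)
```

The same filter works across documents with different fields, which is what lets queries stay dynamic without a fixed schema.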
-For some applications you might not need Hadoop at all
-But MongoDB still integrates with Hadoop, Spark and most languages
-GridFS => kind of like HDFS