2. Unit III
Data Management – (DM -1)
(30 Theory + 23 Practical)
Prepared by
Praveen M Jigajinni
DCSc & Engg, PGDCA, ADCA, MCA, MSc(IT), MTech(IT), MPhil (Comp. Sci)
Department of Computer Science, Sainik School Amaravathinagar
Cell No: 9431453730
Courtesy CBSE
4. OBJECTIVES
To understand databases
To understand NoSQL
To explain the need for NoSQL
To differentiate between SQL and NoSQL
To understand the terminology used in NoSQL
Basics of NoSQL databases
Why MongoDB
MongoDB
5. What are NoSQL Databases?
INTRODUCTION
A NoSQL database provides a mechanism
for storage and retrieval of data that
employs less constrained consistency
models than traditional relational databases.
NoSQL systems are also referred to as
"Not only SQL" to emphasize that they may
in fact allow SQL-like query languages to be
used.
6. What are NoSQL Databases?
History
The term NoSQL was used by Carlo Strozzi in
1998 to name his lightweight Strozzi NoSQL
open-source relational database that did
not expose the standard Structured Query
Language (SQL) interface, but was still
relational. His NoSQL RDBMS is distinct from
the circa-2009 general concept of NoSQL
databases.
7. What are NoSQL Databases?
History
Strozzi suggests that, because the current
NoSQL movement "departs from the
relational model altogether, it should
therefore have been called more
appropriately 'NoREL'", referring to 'No
Relational'.
9. What are NoSQL Databases?
Advantages
The advantages of NoSQL include being
able to handle:
• Large volumes of structured, semi-structured,
and unstructured data
• Agile sprints, quick iteration, and frequent
code pushes
10. What are NoSQL Databases?
Advantages
The advantages of NoSQL include being
able to handle:
• Object-oriented programming that is easy to
use and flexible
• Efficient, scale-out architecture instead of
expensive, monolithic architecture
11. Insight - NoSQL Database
NoSQL databases, also called
"Not only SQL" databases, let
developers store and manage
unstructured data and perform complex
analytical operations on it as well.
12. Insight - NoSQL Database
Nowadays a wide range of NoSQL
databases is available, and developers
can choose among them according to their
requirements. Companies and
developers therefore no longer need to stay
confined to a single kind of database
platform.
13. Insight - NoSQL Database
NoSQL databases were first adopted
by companies such as Amazon
(DynamoDB), Google and others as
solutions to real problems. These
companies realized that SQL databases did
not meet their requirements and decided that
they needed a different solution.
14. Insight - NoSQL Database
First they tried the traditional
approach: they upgraded to faster
hardware. When even that did not
work, they tried to scale existing
relational solutions by de-normalizing
the schema. NoSQL stores the data in
denormalized form, and follows a
different model to store the data
depending upon requirements.
15. Key Characteristics of NoSQL Database
Application developers faced many
problems because of the mismatch between
the in-memory data structures of
applications and the relational data
structure of the database. With NoSQL
databases, developers do not need to
convert in-memory structures to a
relational structure. They also use the
database as an integration point for the
application.
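The mismatch described above can be illustrated with a small sketch. The student object and the helper names toRelational and toDocument are hypothetical, chosen only for illustration: a relational design has to split the nested array into rows of a second table, while a document store can keep the object in its in-memory shape.

```javascript
// In-memory structure: one object with a nested array (hypothetical data).
const student = {
  rollNo: 1234,
  name: "Pranav",
  subjects: ["Physics", "Chemistry"],
};

// Relational shape: the nested array must be split into rows of a second
// table, linked back to the student by rollNo.
function toRelational(s) {
  return {
    students: [{ rollNo: s.rollNo, name: s.name }],
    studentSubjects: s.subjects.map((subject) => ({
      rollNo: s.rollNo,
      subject,
    })),
  };
}

// Document shape: the object is stored as it is (here, a JSON round trip
// stands in for writing to and reading from a document store).
function toDocument(s) {
  return JSON.parse(JSON.stringify(s));
}

const relational = toRelational(student);
const documentCopy = toDocument(student);
```

The relational form needs two tables and a join to rebuild the object, whereas the document form needs no conversion at all.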
17. Key Characteristics of NoSQL Database
Relational databases were not designed
to run well on clusters.
The storage needs of an ERP application
are very different from the data storage needs
of Facebook and other such applications.
Organizations are shifting to
NoSQL databases to achieve higher
scalability, higher speed, and continuous
availability.
18. Features of NoSQL Database
Need for Speed - When a very fast
response time is required, the data should
be kept in memory. In this case we
have to choose a database that stores its
data in memory.
19. Features of NoSQL Database
Need for Scale - With increasing
numbers of users and data volumes,
organizations require databases
that are easily scalable.
20. Features of NoSQL Database
Need for Continuous Availability - Slow
performance can drive a customer away, and
nothing is worse than downtime. There is a
difference between the high-availability
approach that an RDBMS offers with a
master-slave architecture and the continuous
availability that NoSQL databases like
Cassandra offer: no downtime, because
redundant copies of data are spread
throughout a cluster across multiple locations.
21. Features of NoSQL Database
Need for Location Independence - The
ability to serve data quickly to multiple
locations is critical. Because of its
fundamental master-slave design, an RDBMS
struggles to provide fast read access to
many locations.
22. Features of NoSQL Database
NoSQL databases can easily be spread
across multiple data centers and cloud
availability zones.
For example, Adobe runs on DataStax
Enterprise, using an Apache Cassandra
database cluster between two data
centers to ensure its customers can read
and write data fast, no matter where they
are located.
23. Features of NoSQL Database
NoSQL databases like Cassandra
offer a much more flexible data model
that can easily store structured,
semi-structured and unstructured data.
24. Moving From Relational Database
to NoSQL Database
New Applications - Many teams that
previously built on SQL begin with NoSQL by
choosing a new application and starting
from the ground up, which avoids the
issue of rewriting an existing application.
25. Moving From Relational Database
to NoSQL Database
Augmentation (a process of making greater
or larger in size)
Some choose to augment an existing system by
adding a NoSQL component to it. This often
happens with applications that have outgrown
the RDBMS due to scaling issues, the need for
better availability or other issues. Part of the
application continues to use the existing RDBMS,
but the other components of the application are
modified to utilize the NoSQL database.
26. Moving From Relational Database
to NoSQL Database
Full Rip-Replace
For systems that are simply proving
too costly to keep on an RDBMS, or that
cannot cope with increased user
concurrency, a full replacement is done
with a NoSQL database.
27. Types of NoSQL Databases
NoSQL databases use several types of
data models. Based on these
data model types we can categorize NoSQL
databases as:
i) Key Value Databases.
ii) Document Databases.
iii) Column Family Store Databases.
iv) Graph Databases.
28. i) Key Value Databases
The key-value part refers to the fact
that the database stores data as a
collection of key/value pairs. This is a
simple method of storing data, and it is
known to scale well. The key-value pair is
a well-established concept in many
programming languages.
29. WHAT IS A KEY-VALUE DATABASE?
The value is stored as a blob.
Storing the value as a blob removes
the need to index the data to improve
performance. However, because the value is
opaque to the database, we cannot filter or
control what is returned from a request
based on the value.
30. WHAT IS A KEY-VALUE DATABASE?
Key-value stores do not have a query
language. They only allow data to be
stored, retrieved and updated using simple
get, put and delete commands, and the data
is retrieved by making a direct
request to the object in memory or on
disk.
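The get, put and delete commands described above can be sketched with a toy in-memory store. The class name KeyValueStore and the key format are assumptions made for illustration and do not correspond to the API of any particular product.

```javascript
// Toy key-value store: values are opaque blobs addressed only by key.
class KeyValueStore {
  constructor() {
    this.data = new Map();
  }

  // put: store (or overwrite) the value under the given key
  put(key, value) {
    this.data.set(key, value);
  }

  // get: return the value for a key, or undefined if the key is absent
  get(key) {
    return this.data.get(key);
  }

  // delete: remove the key; returns true if it existed
  delete(key) {
    return this.data.delete(key);
  }
}

const store = new KeyValueStore();
store.put("user:1001", JSON.stringify({ name: "Pranav", section: "B" }));

const blob = store.get("user:1001"); // the store hands back the whole blob
const user = JSON.parse(blob);       // only the application interprets it

const removed = store.delete("user:1001");
```

Note that the store never looks inside the value, which is why a lookup by anything other than the key is not possible in this model.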
31. EXAMPLES OF KEY-VALUE STORE DATABASE SOFTWARE
Aerospike
Apache Cassandra
Berkeley DB
Couchbase Server
Redis
Riak
Memcached
34. ii) Document Databases.
(Document Oriented Databases)
A document database is a type of
non-relational database that is
designed to store semistructured data
as documents. Document
databases are intuitive for developers
to use because the data in the
application tier is typically represented
as a JSON document.
35. INSIGHT - DOCUMENT DATABASE
Document-based databases are
similar to key-value databases in that they
store data as key/value pairs. The
difference is that the values are stored as
documents, in formats such as XML, JSON
(JavaScript Object Notation) or BSON
(a binary encoding of JSON objects).
36. INSIGHT - A DOCUMENT DATABASE
The database understands the format
of data so that the operations can be
performed easily.
It allows the storage of complex data.
If we want to store trees, collections, and
dictionaries, then it is a good choice.
37. INSIGHT - A DOCUMENT DATABASE
It does not support relations. Each
document is standalone. It can refer to
other documents by storing their keys.
Document-based databases do not
support joins, which largely removes
the problem of distributing the data across
multiple nodes.
38. INSIGHT - A DOCUMENT DATABASE
Queries - In key-value stores there is no
way to query the data except by key.
Document databases can also query on the
fields inside a document, and range queries
on the basis of a key are possible.
Transactions - Most document-based
databases support transactions for a single
document.
39. INSIGHT - A DOCUMENT DATABASE
Schemaless - Schemaless means it does not
require any schema to store the data. Each
document can differ in the number of
fields. The database understands the
document format (for example JSON).
40. INSIGHT - A DOCUMENT DATABASE
Scaling up - In this database, each
document is independent and joins are
not supported. So it is easy to
distribute the data across multiple
nodes that are independent of each other.
42. BENEFITS OF A DOCUMENT DATABASE
Document stores offer important
advantages when specific characteristics
are required, including:
Flexible data modeling: As web, mobile,
social, and IoT-based applications change
the nature of application data models,
document databases eliminate the need
to force-fit relational data models to
support new types of application data
models.
45. TERMS: RDBMS VS. MONGODB
RDBMS ➜ MongoDB
Database ➜ Database
Table ➜ Collection
Row ➜ Document
Index ➜ Index
Join ➜ Embedded Document
Foreign Key ➜ Reference
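The last two rows of the mapping can be illustrated with plain JavaScript objects. The field names address and address_id, the sample values, and the helper resolveAddress are hypothetical and used only for illustration: an embedded document plays the role of a join, while a reference plays the role of a foreign key that the application resolves with a second lookup.

```javascript
// Embedded document: the address lives inside the student document,
// so one read returns everything (this replaces a join).
const studentEmbedded = {
  _id: 1,
  name: "Pranav",
  address: { city: "Chennai", pin: "600001" },
};

// Reference: the student stores only the _id of a document that lives
// in another collection (this replaces a foreign key).
const addresses = [
  { _id: 501, city: "Chennai", pin: "600001" },
  { _id: 502, city: "Madurai", pin: "625001" },
];
const studentReferenced = { _id: 2, name: "Aakash", address_id: 502 };

// Resolving a reference takes a second lookup done by the application.
function resolveAddress(student, addressCollection) {
  return addressCollection.find((a) => a._id === student.address_id);
}

const embeddedCity = studentEmbedded.address.city;
const referencedCity = resolveAddress(studentReferenced, addresses).city;
```

Embedding suits data that is always read together with its parent, while references suit data shared by many documents.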
46. COLUMN-FAMILY STORE DATABASE
Column-family databases store data
in column families as rows, where each row
has many columns associated with a
row key. Column families basically
contain groups of correlated data
which we can access together.
Each column family can be compared
to a container of rows in an RDBMS table
where the key identifies the row and the
row consists of multiple columns.
47. COLUMN-FAMILY STORE DATABASE
Rows do not need to have the same
columns, and columns can be added to
any row at any time without having to
add it to other rows.
When a column consists of a map
of columns, we have a super column. A
super column consists of a name and a
value which is a map of columns. Think
of a super column as a container of
columns.
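A column family in which rows need not share the same columns can be sketched as a map from row key to a map of columns. The helper names putColumn and columnsOf and the sample rows are assumptions for illustration, not the API of any column-family database.

```javascript
// A column family modelled as: row key -> Map of column name -> value.
const users = new Map();

// Add a column to one row without touching any other row.
function putColumn(family, rowKey, column, value) {
  if (!family.has(rowKey)) family.set(rowKey, new Map());
  family.get(rowKey).set(column, value);
}

// List the column names present in one row.
function columnsOf(family, rowKey) {
  return [...family.get(rowKey).keys()];
}

putColumn(users, "row1", "name", "Pranav");
putColumn(users, "row1", "section", "B");
putColumn(users, "row2", "name", "Aakash");
putColumn(users, "row2", "email", "aakash@example.com"); // only row2 has this column

const row1Columns = columnsOf(users, "row1");
const row2Columns = columnsOf(users, "row2");
```

Each row carries only the columns it actually uses, so adding the email column to row2 leaves row1 unchanged.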
50. MAJOR BENEFITS OF COLUMN FAMILY DATABASE
Compression
Aggregation Queries
Scalability
Fast to load and query
51. UNDERSTANDING GRAPH DATABASE
Graph databases allow you to store
data in the form of nodes and edges, in
which entities are represented as nodes
and relationships are represented as
edges. A node is an instance of an object
in an application. Relationships, which are
known as edges, can also have their own
properties. Edges have directional
significance to represent the relationship
between the nodes.
52. UNDERSTANDING GRAPH DATABASE
The graph database allows you to store the
data only once, and a number of different
types of relationships can be stored
between these nodes.
Relationships, which are represented in the
form of edges, can be unidirectional or
bidirectional, which corresponds to the
one-to-one, one-to-many, many-to-one and
many-to-many relationship types of a
Relational Database Management System.
53. UNDERSTANDING GRAPH DATABASE
In an RDBMS, adding another relation after
the schema has been created results in a lot
of schema changes and data movement.
Graph databases overcome this problem.
Data is stored only once in the form of
nodes, after which any number of different
types of relationships, in the form of
edges, can be specified on the already
stored data.
54. UNDERSTANDING GRAPH DATABASE
In graph databases, relationships between
the nodes are not calculated at query time
because they are persisted as relationships.
Traversing persisted relationships is faster
than calculating the relationships
at query time.
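Traversal over persisted relationships can be sketched as follows. The node names, edge types and the function reachable are hypothetical, and real graph databases use their own storage and query languages; the point is only that following stored edges is a lookup rather than a computation.

```javascript
// Edges are stored (persisted) with a direction, a type, and optional
// properties such as `since`. All names here are illustrative.
const edges = [
  { from: "alice", to: "bob", type: "FRIEND", since: 2019 },
  { from: "bob", to: "carol", type: "FRIEND", since: 2021 },
  { from: "alice", to: "acme", type: "WORKS_AT" },
];

// Index outgoing edges by node so that traversal is a direct lookup.
const outgoing = new Map();
for (const edge of edges) {
  if (!outgoing.has(edge.from)) outgoing.set(edge.from, []);
  outgoing.get(edge.from).push(edge);
}

// Breadth-first traversal following edges of one type for up to
// maxDepth hops; returns the nodes reached, excluding the start node.
function reachable(start, type, maxDepth) {
  const seen = new Set([start]);
  let frontier = [start];
  for (let depth = 0; depth < maxDepth; depth++) {
    const next = [];
    for (const id of frontier) {
      for (const edge of outgoing.get(id) || []) {
        if (edge.type === type && !seen.has(edge.to)) {
          seen.add(edge.to);
          next.push(edge.to);
        }
      }
    }
    frontier = next;
  }
  seen.delete(start);
  return [...seen];
}

const directFriends = reachable("alice", "FRIEND", 1);
const friendsWithinTwoHops = reachable("alice", "FRIEND", 2);
```

The maxDepth parameter mirrors the idea of single-depth versus multi-level traversal: with a depth of 1 only direct relationships are followed.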
55. UNDERSTANDING GRAPH DATABASE
Relationships are an important part of the
graph database. By adding properties to
the edges (relationships), we can add some
level of intelligence to the graph database.
Adding new relationships to a graph
database is easy. But changing existing
relationships in a graph database is a
difficult task because
56. UNDERSTANDING GRAPH DATABASE
changes have to be made on each node
and for each relationship in the existing
data. So changing existing nodes and their
relationships is similar to data migration.
Since most queries to a graph
database are answered on the basis of
relationships and their properties, it is
essential to choose the relationships
properly.
57. UNDERSTANDING GRAPH DATABASE
There are different types of graph
databases. Some graph databases support
only single-depth relationships. With such
graph databases, we cannot traverse more
than one level of relationship.
60. What is the CAP Theorem?
The CAP theorem concerns three
properties of distributed systems:
consistency (C), availability (A) and
partition tolerance (P). The theorem
states that a distributed system cannot
guarantee C, A, and P simultaneously.
61. Consistency in CAP Theorem
When data is stored on multiple
nodes, all the nodes should see the same
data. This means that when the data is
updated at one node, the same
update should be made at the other
nodes storing the same data.
For example, if we perform a read
operation, it will return the value of the
most recent write operation, so that all
nodes return the same data.
62. Consistency in CAP Theorem
A system is said to be in a consistent
state, if the transaction starts with the
system in a consistent state, and ends
with a system in a consistent state. In this
model, a system can shift into an
inconsistent state during a transaction
but, in this case, the entire transaction
gets rolled back if there is an error at any
stage in the process.
63. Availability in CAP Theorem
To achieve a high order of
availability, the system
should remain operational 100% of the
time, so that we can get a response at any
time. Whenever a
user makes a request, the user should be
able to get a response regardless of the
state of the system.
64. Partition Tolerance in CAP Theorem
According to this, a system should
continue to work despite message loss or
partition failure. A system that is
partition tolerant can sustain any amount
of network failure that does not result in
a failure of the entire network.
65. Data storage models which come under the NoSQL database
NoSQL data storage models fall into
the following categories, because it is
not possible to provide all three
properties at once -
CA (Consistency and Availability)
AP (Availability with Partition Tolerance)
CP (Consistency with Partition Tolerance)
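The difference between the CP and AP choices can be sketched with a toy cluster of two replicas. This is a deliberately simplified model, and the names makeCluster and write are assumptions for illustration: during a network partition, a CP system refuses the write to stay consistent, while an AP system accepts it and lets the replicas diverge for a while.

```javascript
// Two replicas of one value; `partitioned` means the replicas cannot
// exchange messages with each other.
function makeCluster(mode) {
  return { mode, partitioned: false, a: { value: 0 }, b: { value: 0 } };
}

// A write request arriving at replica A.
function write(cluster, value) {
  if (cluster.partitioned) {
    if (cluster.mode === "CP") {
      // Consistency first: refuse the write rather than let replicas diverge.
      return { ok: false, reason: "unavailable during partition" };
    }
    // AP: stay available; replica B keeps its stale value for now.
    cluster.a.value = value;
    return { ok: true };
  }
  // No partition: the write is replicated to both nodes.
  cluster.a.value = value;
  cluster.b.value = value;
  return { ok: true };
}

const cp = makeCluster("CP");
cp.partitioned = true;
const cpResult = write(cp, 42); // rejected, replicas still agree

const ap = makeCluster("AP");
ap.partitioned = true;
const apResult = write(ap, 42); // accepted, replicas now disagree
```

In both cases the system tolerates the partition; what differs is whether it gives up availability (CP) or consistency (AP) while the partition lasts.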
66. MongoDB Vs SQL Databases
SQL Database | NoSQL Database (MongoDB)
Relational database | Non-relational database
Supports SQL query language | Supports JSON-style query language
Table based | Collection based and key-value pair
Row based | Document based
67. MongoDB Vs SQL Databases
SQL Database | NoSQL Database (MongoDB)
Column based | Field based
Support for foreign keys | No support for foreign keys
Support for triggers | No support for triggers
Contains schema which is predefined | Contains dynamic schema
68. MongoDB Vs SQL Databases
SQL Database | NoSQL Database (MongoDB)
Not fit for hierarchical data storage | Best fit for hierarchical data storage
Vertically scalable - increasing RAM | Horizontally scalable - add more servers
69. MongoDB Vs SQL Databases
SQL Database | NoSQL Database (MongoDB)
Emphasizes ACID properties (Atomicity, Consistency, Isolation and Durability) | Emphasizes the CAP theorem (Consistency, Availability and Partition tolerance)
72. CRUD OPERATIONS
Create operation – Create or insert
operations are used to add new
documents to a collection. If the
collection does not exist, it is created.
The following commands insert documents
into a collection –
db.collection.insert()
db.collection.insertOne()
db.collection.insertMany()
76. CRUD OPERATIONS – SAVE
db.collection.save()
db.products.save( { item: "book", qty: 40 } )
Updates an existing document or
inserts a new document, depending on
its document parameter.
In the above example, save() method
performs an insert since the document
passed to the method does not contain
the _id field:
77. CRUD OPERATIONS – SAVE
db.collection.save()
During the insert, the shell will
create the _id field with a
unique ObjectId value, as verified by the
inserted document:
{ "_id" : ObjectId("50691737d386d8fadbd6b01d"),
"item" : "book", "qty" : 40 }
78. CRUD OPERATIONS – SAVE
db.collection.save()
In the following example, save() performs
an update with upsert:true since the
document contains an _id field:
db.products.save( { _id: 100,
item: "water", qty: 30 } )
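The insert-or-update decision that save() makes can be sketched with a toy in-memory collection. This is a simplified model and not MongoDB's implementation: the generated identifier stands in for an ObjectId, and the returned objects are a reduced, hypothetical version of a WriteResult.

```javascript
let nextId = 1; // stand-in for ObjectId generation (assumption)

// Toy save(): insert when the document has no _id, otherwise
// replace the document with that _id, inserting it if none exists.
function save(collection, doc) {
  if (doc._id === undefined) {
    // No _id: insert with a generated identifier.
    collection.push({ _id: "generated-" + nextId++, ...doc });
    return { nInserted: 1 };
  }
  const index = collection.findIndex((d) => d._id === doc._id);
  if (index === -1) {
    // _id given but not found: the upsert inserts the document.
    collection.push({ ...doc });
    return { nUpserted: 1 };
  }
  // _id found: replace the existing document.
  collection[index] = { ...doc };
  return { nModified: 1 };
}

const products = [];
const first = save(products, { item: "book", qty: 40 });            // insert
const second = save(products, { _id: 100, item: "water", qty: 30 }); // upsert
const third = save(products, { _id: 100, item: "water", qty: 25 });  // update
```

After these three calls the collection holds two documents, and the document with _id 100 has qty 25.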
79. USING ARRAYS AS VALUES IN DOCUMENT
An array stores a group of values.
In MongoDB, arrays are enclosed in square
brackets [ ]
For Example:
[1,2,3,4,5,6]
['Jan', 'Feb', 'Mar', 'Apr', 'May', 'June']
80. USING ARRAYS AS VALUES IN DOCUMENT
db.students.save({
name: 'Pranav',
Class: 'XI',
section: 'B',
RollNo: 1234,
Subjects: ['Computer Sci',
'Physics',
'Chemistry',
'English',
'Maths']})
81. WriteResult()
The save() method returns
a WriteResult object that contains the
status of the insert or update operation.
See WriteResult for insert and WriteResult
for update for details.
82. WriteResult( ) -Example
newstud = { name: 'Aakash', Class: 'XI',
section: 'B', RollNo: 2351,
Subjects: ['Computer Sci',
'Physics', 'Chemistry',
'English', 'Maths'] }
> db.students.save(newstud);
WriteResult({ "nInserted" : 1 })
nInserted – total number of documents
inserted. In the above example, 1
document is inserted.
84. CRUD OPERATIONS
Read operation – This operation reads
documents from a collection. This
is done by executing a
query.
The command to read documents is –
db.collection.find()
85. CRUD OPERATIONS
Other commands to read documents
are –
db.collection.find()
db.collection.findOne()
db.collection.find( { key: value } )
87. CRUD OPERATIONS - db.collection.find()
db.collection.find()
With no arguments, this lists all the
documents in a collection. Passing a query
document returns only the matching documents.
Example:
db.bios.find( { _id: 5 } )
db.bios.find( { "name.last": "Hopper" } )
89. CRUD OPERATIONS - db.collection.findOne()
db.collection.findOne( { key: value } )
This returns one document from the
given collection that matches the query.
db.bios.findOne( { contribs: 'OOP' }, { _id: 0,
'name.first': 0, birth: 0 } )
This operation returns a document in the bios
collection where the contribs field contains the
element OOP, and returns all fields except the _id field,
the first field in the name embedded document, and
the birth field.
90. SOME BASIC OPERATIONS
Logical Query Operators
Name Description
$and Joins query clauses with a logical AND; returns all documents that match the conditions of both clauses.
$not Inverts the effect of a query expression and returns documents that do not match the query expression.
$nor Joins query clauses with a logical NOR; returns all documents that fail to match both clauses.
$or Joins query clauses with a logical OR; returns all documents that match the conditions of either clause.
91. SOME BASIC OPERATIONS
Logical Query Operator: $and
db.inventory.find( { $and:
[ { price: { $ne: 1.99 } },
{ price: { $exists: true } } ] } )
This query will select all documents in the inventory collection
where:
• the price field value is not equal to 1.99 and
• the price field exists.
92. SOME BASIC OPERATIONS
Logical Query Operator: $not
db.inventory.find( { price: { $not: { $gt: 1.99 } } } )
This query will select all documents in the inventory collection
where:
the price field value is less than or equal to 1.99 or
the price field does not exist
93. SOME BASIC OPERATIONS
Logical Query Operator: $nor
db.inventory.find( { $nor: [ { price: 1.99 }, { qty: { $lt:
20 } }, { sale: true } ] } )
This query will select all documents in the inventory collection
where:
the price field value does not equal 1.99 and
the qty field value is not less than 20 and
the sale field value is not equal to true
94. SOME BASIC OPERATIONS
Logical Query Operator: $or
db.inventory.find( { $or: [ { quantity: { $lt: 20 } }, {
price: 10 } ] } )
This query will select all documents in the inventory collection
where either the quantity field value is less
than 20 or the price field value equals 10
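The behaviour of the four logical operators can be checked against a toy matcher written in plain JavaScript. This sketch handles only the operators used on these slides and is not MongoDB's query engine; the function names matches and matchField and the sample inventory documents are assumptions for illustration.

```javascript
// Comparison operators applied to a single field value.
const comparisons = {
  $eq: (v, arg) => v === arg,
  $ne: (v, arg) => v !== arg,
  $gt: (v, arg) => v > arg,
  $gte: (v, arg) => v >= arg,
  $lt: (v, arg) => v < arg,
  $lte: (v, arg) => v <= arg,
  $exists: (v, arg) => (v !== undefined) === arg,
};

// Match one field against a condition such as 10 or { $gt: 1.99 }.
function matchField(value, condition) {
  // A plain value means equality, e.g. { price: 10 }.
  if (condition === null || typeof condition !== "object") {
    return value === condition;
  }
  return Object.entries(condition).every(([op, arg]) =>
    op === "$not" ? !matchField(value, arg) : comparisons[op](value, arg)
  );
}

// Match a whole document against a query document.
function matches(doc, query) {
  return Object.entries(query).every(([key, cond]) => {
    if (key === "$and") return cond.every((q) => matches(doc, q));
    if (key === "$or") return cond.some((q) => matches(doc, q));
    if (key === "$nor") return !cond.some((q) => matches(doc, q));
    return matchField(doc[key], cond);
  });
}

// Hypothetical inventory; item "c" has no price field.
const inventory = [
  { item: "a", price: 1.99, qty: 10 },
  { item: "b", price: 5, qty: 50 },
  { item: "c", qty: 30 },
  { item: "d", price: 10, qty: 5, sale: true },
];
const find = (query) =>
  inventory.filter((doc) => matches(doc, query)).map((doc) => doc.item);

const andResult = find({ $and: [{ price: { $ne: 1.99 } }, { price: { $exists: true } }] });
const notResult = find({ price: { $not: { $gt: 1.99 } } });
const norResult = find({ $nor: [{ price: 1.99 }, { qty: { $lt: 20 } }, { sale: true }] });
const orResult = find({ $or: [{ qty: { $lt: 20 } }, { price: 10 }] });
```

Note how $not also selects item "c", which has no price field at all, matching the description of $not given on the slide.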
95. SOME BASIC OPERATIONS
COMPARISON OPERATOR
Name Description
$eq Matches values that are equal to a specified value.
$gt Matches values that are greater than a specified value.
$gte Matches values that are greater than or equal to a specified value.
$in Matches any of the values specified in an array.
$lt Matches values that are less than a specified value.
$lte Matches values that are less than or equal to a specified value.
$ne Matches all values that are not equal to a specified value.
$nin Matches none of the values specified in an array.
97. SOME BASIC OPERATIONS
MATCHING OPERATOR
Name Description
$in In operator
$nin Not in operator
EXAMPLES
{grade: { $in: ['A','B','C'] }} Grade should be A or B or C
{grade: { $nin: ['D','E'] }} Grade should not be D or E
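The meaning of the two matching operators can be sketched in plain JavaScript. The helper names matchIn and matchNin and the student records are hypothetical; the sketch only mirrors the membership test that $in and $nin perform on a simple field.

```javascript
// Hypothetical student records used only for illustration.
const students = [
  { name: "Pranav", grade: "A" },
  { name: "Aakash", grade: "D" },
  { name: "Meera", grade: "C" },
];

// $in: the field value must be one of the listed values.
const matchIn = (value, list) => list.includes(value);

// $nin: the field value must be none of the listed values.
const matchNin = (value, list) => !list.includes(value);

// Equivalent of { grade: { $in: ['A','B','C'] } }
const gradeAtoC = students
  .filter((s) => matchIn(s.grade, ["A", "B", "C"]))
  .map((s) => s.name);

// Equivalent of { grade: { $nin: ['D','E'] } }
const gradeNotDorE = students
  .filter((s) => matchNin(s.grade, ["D", "E"]))
  .map((s) => s.name);
```

For this sample data both filters select the same students, since every grade outside D and E happens to fall in A to C.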
99. CRUD OPERATIONS
Update operation – Update operation is
used to modify an existing document.
The command that updates a
document is –
db.collection.update()
db.collection.updateMany()
100. CRUD OPERATIONS - UPDATE
db.people.update( { name: "Andy" },
{ name: "Andy", rating: 1, score: 1 },
{ upsert: true } )
The update() method either modifies
specific fields in existing documents or
replaces an existing document entirely.
If upsert is true and no document matches the
query criteria, update() inserts
a single document. The update creates the new
document with either:
101. CRUD OPERATIONS – UPDATE MANY
db.inspectors.updateMany(
{ "Sector" : { $gt : 4 },
"inspector" : "R. Coltrane" },
{ $set: { "Patrolling" : false } },
{ upsert: true } );
Updates multiple documents within
the collection based on the filter.
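How a $set update with upsert behaves can be sketched on a toy in-memory collection. This simplified updateMany supports only equality filters and the $set operator, and its return value is a reduced, hypothetical version of the result MongoDB reports; the inspector records are invented for illustration.

```javascript
// Toy updateMany(): equality filter, $set only, optional upsert.
function updateMany(collection, filter, update, options = {}) {
  const isMatch = (doc) =>
    Object.entries(filter).every(([field, value]) => doc[field] === value);

  let matched = 0;
  for (const doc of collection) {
    if (isMatch(doc)) {
      Object.assign(doc, update.$set); // $set changes only the listed fields
      matched++;
    }
  }

  if (matched === 0 && options.upsert) {
    // Nothing matched: build a new document from the filter plus $set.
    collection.push({ ...filter, ...update.$set });
    return { matchedCount: 0, upsertedCount: 1 };
  }
  return { matchedCount: matched, upsertedCount: 0 };
}

const inspectors = [
  { inspector: "R. Coltrane", Sector: 5, Patrolling: true },
  { inspector: "R. Coltrane", Sector: 6, Patrolling: true },
  { inspector: "J. Doe", Sector: 7, Patrolling: true },
];

// Every matching document is changed; other fields are left alone.
const updated = updateMany(
  inspectors,
  { inspector: "R. Coltrane" },
  { $set: { Patrolling: false } }
);

// No document matches, so upsert inserts a new one.
const upserted = updateMany(
  inspectors,
  { inspector: "A. New" },
  { $set: { Patrolling: true } },
  { upsert: true }
);
```

The first call modifies two documents in place, and the second adds a fourth document built from the filter and the $set fields.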
103. CRUD OPERATIONS - db.orders.deleteOne()
Delete operation – Delete operation
erases the document from the collection.
db.orders.deleteOne( { "_id" :
ObjectId("563237a41a4d68582c2509da")
} );
104. CRUD OPERATIONS
db.orders.deleteMany()
Removes all documents that match
the filter from a collection.
The following operation deletes all
documents where
stock : "Brent Crude Futures" and
limit is greater than 48.88:
db.orders.deleteMany( { "stock" : "Brent Crude Futures",
"limit" : { $gt : 48.88 } } );
105. CRUD OPERATIONS – Remove Method
Removes documents from a
collection
The following operation removes the
first document from the
collection products where qty is greater
than 20:
db.products.remove( { qty: { $gt: 20 } },
true )
106. CRUD OPERATIONS – Remove all
documents
To remove all documents in a
collection, call the remove method with
an empty query document {}. The
following operation deletes all documents
from the bios collection:
db.bios.remove( { } )
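The difference between deleting one document, deleting all matching documents, and removing everything can be sketched on a toy in-memory collection. The functions below take a predicate instead of a query document and return a reduced, hypothetical result object; they are illustrative only and not MongoDB's implementation.

```javascript
// Toy deleteOne(): remove only the first document that matches.
function deleteOne(collection, predicate) {
  const index = collection.findIndex(predicate);
  if (index === -1) return { deletedCount: 0 };
  collection.splice(index, 1);
  return { deletedCount: 1 };
}

// Toy deleteMany(): remove every document that matches.
function deleteMany(collection, predicate) {
  const before = collection.length;
  const kept = collection.filter((doc) => !predicate(doc));
  collection.length = 0;      // empty the array in place
  collection.push(...kept);   // put back the documents that survive
  return { deletedCount: before - kept.length };
}

// Hypothetical product documents.
const products = [
  { item: "book", qty: 40 },
  { item: "pen", qty: 25 },
  { item: "water", qty: 10 },
];

// Like remove( { qty: { $gt: 20 } }, true ): only the first match goes.
const one = deleteOne(products, (p) => p.qty > 20);

// Like deleteMany( { qty: { $gt: 20 } } ): all remaining matches go.
const many = deleteMany(products, (p) => p.qty > 20);

// Like remove( { } ): an always-true filter empties the collection.
const all = deleteMany(products, () => true);
```

An empty query document matches every document, which is why it behaves like the always-true predicate in the last call.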