NOSQL DATABASES
AND BIG DATA STORAGE SYSTEMS
Ateeq Ateeq
CONTENT
1- Introduction to NOSQL Systems
2- The CAP Theorem
3- Document-Based NOSQL Systems and MongoDB
4- NOSQL Key-Value Stores
5- Column-Based or Wide Column NOSQL Systems
6- NOSQL Graph Databases and Neo4j
INTRODUCTION TO NOSQL SYSTEMS
1.1 Emergence of NOSQL Systems
1.2 Characteristics of NOSQL Systems
1.3 Categories of NOSQL Systems
1.1 EMERGENCE OF NOSQL SYSTEMS
• SQL systems may not be appropriate for some applications, such as email:
  ◦ SQL systems offer too many services (a powerful query language, concurrency control, etc.), which such applications may not need;
  ◦ a structured data model such as the traditional relational model may be too restrictive.
• SQL systems require schemas, which many NOSQL systems do not.
• Examples of NOSQL systems:
  ◦ Google – Bigtable
  ◦ Amazon – DynamoDB
  ◦ Facebook – Cassandra
  ◦ MongoDB
  ◦ CouchDB
  ◦ Graph databases like Neo4j and GraphBase
1.2 CHARACTERISTICS OF NOSQL SYSTEMS
• NOSQL characteristics related to distributed databases and distributed systems.
• NOSQL characteristics related to data models and query languages.
CHARACTERISTICS RELATED TO DISTRIBUTED
DATABASES AND DISTRIBUTED SYSTEMS
1- Scalability:
• Horizontal scalability: adding more nodes for data storage and processing as the volume of data grows.
• Vertical scalability: expanding the storage and computing power of existing nodes.
• NOSQL systems generally use horizontal scalability, and it is applied while the system is operational, so techniques for distributing the existing data among new nodes without interrupting system operation are necessary.
2- Availability, Replication and Eventual Consistency:
• Data is replicated over two or more nodes in a transparent manner.
• Updates must be applied to every copy of a replicated data item.
• Eventual consistency: a consistency model used in distributed computing to achieve high availability. It informally guarantees that, if no new updates are made to a given data item, eventually all accesses to that item will return the last updated value.
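A minimal sketch of eventual consistency, assuming a hypothetical ReplicatedStore in which a write lands on one replica first and an explicit propagate() call stands in for background replication. A read from a lagging replica can miss the write, but once updates stop and propagation completes, every replica returns the last written value.

```javascript
// Toy model of eventual consistency (illustrative only; all names are hypothetical).
class Replica {
  constructor(name) {
    this.name = name;
    this.data = new Map(); // key -> value
  }
  read(key) {
    return this.data.get(key);
  }
}

class ReplicatedStore {
  constructor(replicaNames) {
    this.replicas = replicaNames.map((n) => new Replica(n));
    this.pending = []; // updates not yet applied to every replica
  }
  // A write is applied to the first replica immediately and queued for the others.
  write(key, value) {
    const [first, ...rest] = this.replicas;
    first.data.set(key, value);
    for (const r of rest) this.pending.push({ replica: r, key, value });
  }
  // Simulates background propagation: apply every queued update in order.
  propagate() {
    for (const { replica, key, value } of this.pending) replica.data.set(key, value);
    this.pending = [];
  }
  // True when all replicas hold the same value for the key.
  converged(key) {
    const values = this.replicas.map((r) => r.read(key));
    return values.every((v) => v === values[0]);
  }
}

const store = new ReplicatedStore(["A", "B", "C"]);
store.write("x", 1);
console.log(store.replicas[2].read("x")); // undefined: replica C has not seen the write yet
store.write("x", 2);
store.propagate(); // no new updates arrive, so propagation finishes
console.log(store.replicas.map((r) => r.read("x"))); // [ 2, 2, 2 ]
```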
3- Replication Models:
• Master-slave replication: requires one copy to be the master copy.
  ◦ All write operations must be applied to the master copy and then propagated to the slave copies, usually using eventual consistency.
  ◦ For reads, either all reads are served from the master copy, or reads may be served at the slave copies, in which case there is no guarantee that the values returned are the latest writes.
• Master-master replication: allows reads and writes at any of the replicas.
  ◦ The values of an item may be temporarily inconsistent.
  ◦ A reconciliation method to resolve conflicting write operations on the same data item at different nodes must be implemented as part of the master-master replication scheme.
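The slides do not name a specific reconciliation method. One common choice is last-writer-wins (LWW): each write carries a timestamp, and when replicas exchange state the write with the higher timestamp wins. The sketch below assumes LWW with a node-id tie-breaker; the class and data are hypothetical, and real systems may instead use vector clocks or application-level merging.

```javascript
// Each master replica stores, per key, the value plus the (timestamp, nodeId) of the write.
class MasterReplica {
  constructor(nodeId) {
    this.nodeId = nodeId;
    this.items = new Map(); // key -> { value, ts, nodeId }
  }
  // Local write accepted at this master without coordinating with the others.
  write(key, value, ts) {
    this.items.set(key, { value, ts, nodeId: this.nodeId });
  }
  read(key) {
    const item = this.items.get(key);
    return item === undefined ? undefined : item.value;
  }
  // Last-writer-wins: the later timestamp wins; ties are broken by node id.
  static newer(a, b) {
    if (a.ts !== b.ts) return a.ts > b.ts ? a : b;
    return a.nodeId > b.nodeId ? a : b;
  }
  // Merge another replica's state into this one.
  mergeFrom(other) {
    for (const [key, theirs] of other.items) {
      const mine = this.items.get(key);
      this.items.set(key, mine === undefined ? theirs : MasterReplica.newer(mine, theirs));
    }
  }
}

const m1 = new MasterReplica("n1");
const m2 = new MasterReplica("n2");
m1.write("Plocation", "Jenin", 100); // conflicting writes to the same item
m2.write("Plocation", "AAUJ", 105);
console.log(m1.read("Plocation"), m2.read("Plocation")); // Jenin AAUJ (temporarily inconsistent)
m1.mergeFrom(m2);
m2.mergeFrom(m1);
console.log(m1.read("Plocation"), m2.read("Plocation")); // AAUJ AAUJ
```

LWW is simple but silently discards the losing write, which is acceptable only when that loss is tolerable for the application.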
4- Sharding of Files:
• Files can have many millions of records accessed concurrently by thousands of users.
• Sharding (also known as horizontal partitioning) distributes the load of accessing the file records across multiple nodes.
• Sharding the file records and replicating the shards work in tandem to improve load balancing as well as data availability.
5- High-Performance Data Access:
• Hashing: a hash function h(K) is applied to the key K, and the location of the object with key K is given by the value of h(K).
• Range partitioning: the location is determined via a range of key values. For example, location i holds the objects whose key values K are in the range $K_i^{\min} \le K \le K_i^{\max}$.
In applications that require range queries, where multiple objects within a range of key values are retrieved, range partitioning is preferred.
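A small sketch contrasting the two schemes, assuming 4 hypothetical locations and integer keys 0 to 999. Fibonacci (multiplicative) hashing stands in for h(K); it is an illustrative choice, not one named in the slides. The function counts which locations a range query must contact.

```javascript
const NUM_LOCATIONS = 4;

// Hash partitioning: location = h(K). Here h takes the top bits of K * 2654435761 mod 2^32.
function hashLocation(key) {
  const h = Math.imul(key, 2654435761) >>> 0; // 32-bit unsigned product
  return Math.floor((h / 2 ** 32) * NUM_LOCATIONS);
}

// Range partitioning: location i holds keys with K_i^min <= K <= K_i^max.
const RANGES = [
  { min: 0, max: 249 },
  { min: 250, max: 499 },
  { min: 500, max: 749 },
  { min: 750, max: 999 },
];
function rangeLocation(key) {
  return RANGES.findIndex((r) => key >= r.min && key <= r.max);
}

// Which locations must be contacted to answer "lo <= K <= hi"?
function locationsForRangeQuery(lo, hi, locate) {
  const hit = new Set();
  for (let k = lo; k <= hi; k++) hit.add(locate(k));
  return [...hit].sort((a, b) => a - b);
}

console.log(locationsForRangeQuery(300, 400, rangeLocation)); // [ 1 ]
console.log(locationsForRangeQuery(300, 400, hashLocation));  // [ 0, 1, 2, 3 ]
```

The range query touches one location under range partitioning but every location under hashing, which is why range partitioning is preferred for range-query workloads.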
CHARACTERISTICS RELATED TO DATA MODELS
AND QUERY LANGUAGES.
1- Not Requiring a Schema:
• Allows semi-structured, self-describing data.
• In some systems, users can specify a partial schema to improve storage efficiency, but most NOSQL systems do not require a schema.
• Constraints on the data have to be programmed in the application programs that access the data items.
• Languages for describing semi-structured data: JSON (JavaScript Object Notation) and XML (Extensible Markup Language).
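A sketch of what "constraints programmed in the application" can look like, assuming two hypothetical worker documents that share a collection but not a structure. The validateWorker helper and its rules are invented for illustration; nothing here is enforced by a database.

```javascript
// Two self-describing documents in the same collection with different fields.
const workers = [
  { _id: "W1", Ename: "John Smith", Hours: 32.5 },
  { _id: "W2", Ename: "Joyce English", Hours: 20.0, Phones: ["555-0101", "555-0102"] },
];

// With no schema, a rule such as "Hours must be a non-negative number"
// has to be checked by the application before it writes a document.
function validateWorker(doc) {
  const errors = [];
  if (typeof doc._id !== "string") errors.push("_id must be a string");
  if (typeof doc.Ename !== "string") errors.push("Ename is required");
  if (typeof doc.Hours !== "number" || doc.Hours < 0) errors.push("Hours must be a non-negative number");
  return errors;
}

console.log(workers.map(validateWorker));              // [ [], [] ]
console.log(validateWorker({ _id: "W3", Hours: -5 })); // [ 'Ename is required', 'Hours must be a non-negative number' ]
console.log(JSON.stringify(workers[1]));               // the document serialized as JSON
```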
2- Less Powerful Query Languages:
• Many applications that use NOSQL systems may not require a powerful query language such as SQL, because search (read) queries in these systems often locate single objects in a single file based on their object keys.
• Reading and writing the data objects is accomplished by the programmer calling the appropriate operations (API).
• SCRUD: Search, Create, Read, Update and Delete.
• Some systems provide a high-level query language, but it may not have the full power of SQL; for example, joins need to be implemented in the application programs.
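As a sketch of a join implemented in the application, assume two hypothetical collections in which each worker stores the id of its project. The program builds a lookup table on one side and probes it from the other, which is a hash join done by hand.

```javascript
// Hypothetical collections: workers reference their project by id.
const projects = [
  { _id: "P1", Pname: "ProjectX", Plocation: "Jenin" },
  { _id: "P2", Pname: "ProjectY", Plocation: "AAUJ" },
];
const workers = [
  { _id: "W1", Ename: "John Smith", ProjectId: "P1" },
  { _id: "W2", Ename: "Joyce English", ProjectId: "P1" },
  { _id: "W3", Ename: "Franklin Wong", ProjectId: "P2" },
];

// Application-side hash join: index the projects by _id, then probe with each worker.
function joinWorkersWithProjects(workerDocs, projectDocs) {
  const byId = new Map(projectDocs.map((p) => [p._id, p]));       // build phase
  return workerDocs
    .filter((w) => byId.has(w.ProjectId))                           // skip dangling references
    .map((w) => ({ Ename: w.Ename, Pname: byId.get(w.ProjectId).Pname })); // probe phase
}

console.log(joinWorkersWithProjects(workers, projects));
```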
3- Versioning:
• Some NOSQL systems provide storage of multiple versions of the data items, with the timestamp of when each version was created.
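A minimal sketch of versioned storage, assuming a hypothetical store that appends (timestamp, value) pairs instead of overwriting and keeps only the newest N versions. The retention limit is an illustrative assumption; systems differ in how many versions they keep.

```javascript
// Multi-version store: every write appends (timestamp, value) instead of overwriting.
class VersionedStore {
  constructor(maxVersions = 3) {
    this.maxVersions = maxVersions; // hypothetical retention limit
    this.versions = new Map();      // key -> [{ ts, value }] sorted newest first
  }
  put(key, value, ts) {
    const list = this.versions.get(key) ?? [];
    list.push({ ts, value });
    list.sort((a, b) => b.ts - a.ts);                        // newest first
    this.versions.set(key, list.slice(0, this.maxVersions)); // drop the oldest extras
  }
  // Latest version, or the latest version written at or before `asOf`.
  get(key, asOf = Infinity) {
    const list = this.versions.get(key) ?? [];
    const hit = list.find((v) => v.ts <= asOf);
    return hit === undefined ? undefined : hit.value;
  }
  history(key) {
    return (this.versions.get(key) ?? []).map((v) => [v.ts, v.value]);
  }
}

const s = new VersionedStore(2);
s.put("P1:Plocation", "Jenin", 10);
s.put("P1:Plocation", "AAUJ", 20);
s.put("P1:Plocation", "Nablus", 30);
console.log(s.get("P1:Plocation"));     // Nablus
console.log(s.get("P1:Plocation", 25)); // AAUJ
console.log(s.history("P1:Plocation")); // [ [ 30, 'Nablus' ], [ 20, 'AAUJ' ] ]
```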
1.3 CATEGORIES OF NOSQL SYSTEMS
The most common categories:
1. Document-based NOSQL systems:
• Store data in the form of documents using well-known formats such as JSON.
• Documents are accessible via their document id, but can also be accessed rapidly using other indexes.
2. NOSQL key-value stores:
• Fast access by the key to the value associated with the key.
• The value can be a record, an object, a document, or even a more complex data structure.
3. Column-based or wide column NOSQL systems:
• Partition a table by column into column families.
• A form of vertical partitioning.
4. Graph-based NOSQL systems:
• Data is represented as graphs.
• Related nodes can be found by traversing the edges using path expressions.
Additional categories:
5. Hybrid NOSQL systems:
• These systems have characteristics from two or more of the common categories.
6. Object databases.
7. XML databases.
THE CAP THEOREM
• The CAP theorem: it is impossible to guarantee consistency, availability and partition tolerance at the same time in a distributed system with data replication.
• A system can guarantee only two of the three properties.
• NOSQL systems often use weaker consistency levels instead of guaranteeing serializability.
• Eventual consistency is commonly used.
• The CAP theorem is used to explain some of the competing requirements in a distributed system with replication.
• The three letters in CAP refer to:
  ◦ Consistency (among replicated copies): the nodes will have the same copies of a replicated data item visible for various transactions.
  ◦ Availability (of the system for read and write operations): each read or write will either be processed successfully or will receive a message that the operation cannot be completed.
  ◦ Partition tolerance (in the face of the nodes in the system being partitioned by a network fault): the system can continue operating if the network connecting the nodes has a fault that results in two or more partitions, where the nodes in each partition can only communicate with each other.
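A toy sketch of the trade-off during a partition. It rests on a simplifying assumption: the consistent choice ("CP") requires a write to reach every copy, while the available choice ("AP") applies the write to whatever copies the writer can reach. Real systems usually use majority quorums rather than all copies; the function and data are hypothetical.

```javascript
// 4 copies of one data item, split by a network fault into two partitions.
function simulatePartition(mode) {
  const replicas = [0, 0, 0, 0];       // every copy starts with the value 0
  const partitions = [[0, 1], [2, 3]]; // nodes 0-1 | nodes 2-3

  function write(partition, value) {
    // "CP": a write must reach every copy, so during a partition it is refused.
    if (mode === "CP" && partition.length < replicas.length) return "rejected";
    // "AP": the write is applied only to the copies this partition can reach.
    for (const node of partition) replicas[node] = value;
    return "ok";
  }

  const results = [write(partitions[0], 1), write(partitions[1], 2)];
  const consistent = replicas.every((v) => v === replicas[0]);
  return { results, replicas, consistent };
}

console.log(simulatePartition("CP")); // both writes rejected; copies stay identical
console.log(simulatePartition("AP")); // both writes accepted; copies diverge as [1, 1, 2, 2]
```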
DOCUMENT-BASED NOSQL SYSTEMS AND MONGODB
1. Introduction
2. MongoDB Data Model
3. MongoDB CRUD Operations
4. MongoDB Distributed Systems Characteristics
3.1 INTRODUCTION
• Document-based NOSQL systems store data as collections of similar documents.
• Documents resemble complex objects or XML documents.
• Documents in a collection should be similar, but they can have different attributes.
• Document-based NOSQL systems: MongoDB and CouchDB.
3.2 MONGODB DATA MODEL
• MongoDB is a free and open-source, cross-platform, document-oriented database.
• It is classified as a NOSQL database.
• MongoDB documents are stored in BSON (Binary JSON) format.
• BSON is a variation of JSON with some additional data types and is more efficient for storage than JSON.
• Individual documents are stored in a collection.
• The operation createCollection is used to create each collection.
• Example: create a collection called project to hold PROJECT objects from the COMPANY database:
db.createCollection("project", { capped: true, size: 1310720, max: 500 })
• "project" is the name of the collection (mandatory).
• capped: a capped collection has upper limits on its storage space (size, in bytes) and on its number of documents (max).
• Capping helps the system choose the storage options for each collection.
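A capped collection keeps documents in insertion order and, once a limit is reached, removes the oldest documents to make room. The sketch below models that behavior in memory; the class is hypothetical, and it approximates a document's BSON size by its JSON length, which is an assumption made only for illustration.

```javascript
// In-memory model of a capped collection: evict the oldest documents when
// either the byte limit (size) or the document-count limit (max) is exceeded.
class CappedCollection {
  constructor({ size, max }) {
    this.size = size; // byte limit, approximated here by JSON length
    this.max = max;   // document-count limit
    this.docs = [];   // kept in insertion order
    this.bytes = 0;
  }
  insert(doc) {
    const docBytes = JSON.stringify(doc).length;
    if (docBytes > this.size) throw new Error("document larger than the collection");
    this.docs.push({ doc, docBytes });
    this.bytes += docBytes;
    while (this.docs.length > this.max || this.bytes > this.size) {
      const oldest = this.docs.shift(); // first in, first out
      this.bytes -= oldest.docBytes;
    }
  }
  find() {
    return this.docs.map((d) => d.doc);
  }
}

const project = new CappedCollection({ size: 1310720, max: 3 }); // max lowered to 3 for the demo
for (let i = 1; i <= 5; i++) project.insert({ _id: `P${i}` });
console.log(project.find().map((d) => d._id)); // [ 'P3', 'P4', 'P5' ]
```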
• Example: create a document collection called worker:
db.createCollection("worker", { capped: true, size: 5242880, max: 2000 })
• Each document has a unique _id field, whose value is an ObjectId by default.
• By default, the _id field is:
  ◦ automatically indexed in the collection;
  ◦ system-generated.
• A system-generated ObjectId has a specific format: it combines the timestamp when the object is created, the node id, the process id and a counter. (Newer MongoDB versions replace the node id and process id with a per-process random value.)
• A user-generated _id can have any value specified by the user, as long as it uniquely identifies the document.
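Because the first 4 bytes of an ObjectId are a big-endian count of seconds since the Unix epoch, the creation time can be read back from the 24-hex-digit string. The sketch below decodes it and splits the rest following the layout of current MongoDB versions (5-byte random value, 3-byte counter). The example id is made up.

```javascript
// Decode the creation time embedded in an ObjectId's 24-character hex string.
function objectIdTimestamp(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) throw new Error("not a 24-hex-digit ObjectId");
  const seconds = parseInt(hex.slice(0, 8), 16); // first 4 bytes: seconds since epoch
  return new Date(seconds * 1000);
}

// Split the remaining bytes: 5-byte per-process random value, then a 3-byte counter.
function objectIdParts(hex) {
  return {
    timestamp: objectIdTimestamp(hex),
    random: hex.slice(8, 18),
    counter: parseInt(hex.slice(18, 24), 16),
  };
}

// Hypothetical id: 0x5f5e1000 = 1600000000 seconds, counter 0x00002a = 42.
const id = "5f5e1000" + "a1b2c3d4e5" + "00002a";
console.log(objectIdTimestamp(id).toISOString()); // 2020-09-13T12:26:40.000Z
console.log(objectIdParts(id).counter);           // 42
```

A user-generated _id such as "P1" is not in this format, so it cannot be decoded this way.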
• A collection does not have a schema.
• The structure of the data fields in documents is chosen based on how documents will be accessed and used; the user can choose a normalized design (similar to normalized relational tuples) or a denormalized design (similar to XML documents or complex objects).
• Interdocument references can be specified by storing in one document the ObjectId or ObjectIds of other related documents.
[Figure: COMPANY database example]
[Figure: denormalized PROJECT document with embedded worker information]
[Figure: PROJECT document with an embedded array of workers]
[Figure: normalized design in which worker documents store the project ID as an attribute]
3.3 MONGODB CRUD OPERATIONS
• Insert:
  db.<collection_name>.insert(<document(s)>)
  Example:
  db.project.insert({ _id: "P1", Pname: "ProjectX", Plocation: "Jenin" })
• Delete: remove
  db.<collection_name>.remove(<condition>)
  Example:
  db.project.remove({ _id: "P1" })
  (The _id was inserted as the string "P1", so it is matched as a string rather than wrapped in ObjectId().)
• Read: find
  db.<collection_name>.find(<condition>)
  Example:
  db.project.find({ _id: "P1" })
• Update:
  db.<collection_name>.update(SELECTION_CRITERIA, UPDATED_DATA)
  Example:
  db.project.update({ _id: "P1" }, { $set: { Plocation: "AAUJ" } })
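To make the semantics of these shell commands concrete, here is a hypothetical in-memory stand-in that supports only equality conditions and $set. It is not the MongoDB driver or shell API. It mirrors two defaults of the legacy commands: update() modifies only the first matching document, while remove() deletes every matching document.

```javascript
// In-memory stand-in for a collection (illustrative only).
class TinyCollection {
  constructor() {
    this.docs = [];
  }
  // A document matches when every field in the condition has an equal value.
  static matches(doc, condition) {
    return Object.entries(condition).every(([field, value]) => doc[field] === value);
  }
  insert(doc) {
    if (this.docs.some((d) => d._id === doc._id)) throw new Error(`duplicate _id ${doc._id}`);
    this.docs.push({ ...doc });
  }
  find(condition = {}) {
    return this.docs.filter((d) => TinyCollection.matches(d, condition)).map((d) => ({ ...d }));
  }
  // Modifies the first matching document only; returns the number modified.
  update(condition, { $set }) {
    const doc = this.docs.find((d) => TinyCollection.matches(d, condition));
    if (doc) Object.assign(doc, $set);
    return doc ? 1 : 0;
  }
  // Removes every matching document; returns the number removed.
  remove(condition) {
    const before = this.docs.length;
    this.docs = this.docs.filter((d) => !TinyCollection.matches(d, condition));
    return before - this.docs.length;
  }
}

const project = new TinyCollection();
project.insert({ _id: "P1", Pname: "ProjectX", Plocation: "Jenin" });
project.update({ _id: "P1" }, { $set: { Plocation: "AAUJ" } });
console.log(project.find({ _id: "P1" })); // [ { _id: 'P1', Pname: 'ProjectX', Plocation: 'AAUJ' } ]
console.log(project.remove({ _id: "P1" }), project.find()); // 1 []
```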
3.4 MONGODB DISTRIBUTED SYSTEMS
CHARACTERISTICS
• Replication in MongoDB
• Sharding in MongoDB
REPLICATION IN MONGODB
• MongoDB uses a master-slave approach for replication (a primary copy and secondary copies).
• By default, all reads and writes are done on the primary copy.
• Secondary copies are used to recover if the primary fails.
SHARDING IN MONGODB
• Sharding of the documents in the collection, also known as horizontal partitioning, divides the documents into disjoint partitions known as shards.
• Two ways:
  ◦ Range partitioning
  ◦ Hash partitioning
• Both range and hash partitioning require the user to specify a particular document field to be used as the basis for partitioning the documents into shards.
• The partitioning field, known as the shard key, must exist in every document in the collection, and it must have an index.
• The values of the shard key are divided into chunks, and the documents are partitioned based on the chunks of shard key values.
• In range partitioning, chunks are created by specifying a range of key values, and each chunk contains the key values in one range.
• If range queries are commonly applied to a collection (for example, retrieving all documents whose shard key value is between 200 and 400), then range partitioning is preferred,
  ◦ because each range query will typically be submitted to a single node that contains all the required documents in one shard.
• If most searches retrieve one document at a time, hash partitioning may be preferable because it randomizes the distribution of shard key values into chunks.
• MongoDB queries are submitted to a module called the query router, which keeps track of which nodes contain which shards based on the particular partitioning method used on the shard keys.
• The query will be routed to the nodes that contain the shards that hold the documents that the query is requesting.
• If the system cannot determine which shards hold the required documents, the query will be submitted to all the nodes that hold shards of the collection.
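A sketch of the routing decision, assuming a hypothetical chunk table for a collection that is range-sharded on a numeric shard key. In MongoDB this role is played by the mongos router; the chunk boundaries and shard names here are invented. Queries that constrain the shard key go only to overlapping shards. All other queries are broadcast to every shard.

```javascript
// Each chunk covers shard key values in [min, max) and lives on one shard.
const chunks = [
  { min: -Infinity, max: 200, shard: "shardA" },
  { min: 200, max: 400, shard: "shardB" },
  { min: 400, max: Infinity, shard: "shardC" },
];
const allShards = [...new Set(chunks.map((c) => c.shard))];

// Route a query: target overlapping shards if the shard key is constrained, else broadcast.
function route(query) {
  if (query.shardKeyRange === undefined) return allShards; // cannot target: send to all shards
  const { lo, hi } = query.shardKeyRange;                   // inclusive bounds lo <= key <= hi
  const targets = chunks
    .filter((c) => lo < c.max && hi >= c.min)               // chunk [min, max) overlaps [lo, hi]
    .map((c) => c.shard);
  return [...new Set(targets)];
}

console.log(route({ shardKeyRange: { lo: 250, hi: 350 } })); // [ 'shardB' ]
console.log(route({ shardKeyRange: { lo: 150, hi: 250 } })); // [ 'shardA', 'shardB' ]
console.log(route({ filter: { Pname: "ProjectX" } }));       // [ 'shardA', 'shardB', 'shardC' ]
```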
• Sharding and replication are used together:
  ◦ Sharding focuses on improving performance via load balancing and horizontal scalability.
  ◦ Replication focuses on ensuring system availability when certain nodes fail in the distributed system.
WHY NOSQL?
• Document or table?
• Alter the table and add Description, Rate and Reviews
• NOSQL is flexible
No schema restrictions
• SQL is restricted!
Fill the data
WHY NOSQL? - USE CASES WHERE NOSQL
WILL OUTPERFORM SQL
• Agile - Flexibility for Faster Development
• Agile - Simplicity for Easier Development
  ◦ Reading this profile would require the application to read six rows from three tables.
• Availability for Always-on
NOSQL CATEGORIES EXAMPLES -
DOCUMENT-BASED NOSQL SYSTEMS
XML is stored in a native XML type.
• The query retrieves the <Features> child element of the <ProductDescription> element.
[Figure: query result]
NOSQL CATEGORIES EXAMPLES - NOSQL
KEY-VALUE STORES
• Riak as an example
• The response to a query is an object containing a list of documents that match the given query.
• The documents returned are search documents (sets of Solr field/value pairs).
NOSQL CATEGORIES EXAMPLES - COLUMN
NOSQL SYSTEMS
• Cassandra as an example
• A query returns a result set of rows, where each row consists of a key and a collection of columns corresponding to the query.
• LOCAL_QUORUM is a consistency level:
  ◦ used in clusters that span multiple data centers;
  ◦ used to maintain consistency locally (within a single data center): a quorum of the replicas in the local data center must acknowledge the request.
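The quorum size is the integer part of half the replica count, plus one. QUORUM counts replicas across all data centers, while LOCAL_QUORUM counts only those in the local data center. The sketch below works through the arithmetic for a hypothetical cluster with a replication factor of 3 in each of two data centers.

```javascript
// Replicas that must acknowledge a request: floor(RF / 2) + 1.
function quorum(replicationFactor) {
  return Math.floor(replicationFactor / 2) + 1;
}

// Hypothetical cluster: replication factor per data center.
const replicationByDc = { dc1: 3, dc2: 3 };

// QUORUM counts replicas across all data centers.
const totalRf = Object.values(replicationByDc).reduce((a, b) => a + b, 0);
console.log(quorum(totalRf)); // 4 of 6 replicas, so it may wait on the remote data center

// LOCAL_QUORUM counts only replicas in the local data center.
console.log(quorum(replicationByDc.dc1)); // 2 of the 3 local replicas

// A read set and a write set are guaranteed to share a replica when R + W > RF.
const overlaps = (r, w, rf) => r + w > rf;
console.log(overlaps(quorum(3), quorum(3), 3)); // true: local reads overlap local writes
```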
NOSQL CATEGORIES EXAMPLES - GRAPH-BASED NOSQL SYSTEMS
• Neo4j as an example
NOSQL CATEGORIES EXAMPLES - OBJECT
DATABASES
• LINQ as an example
NOSQL KEY-VALUE STORES
1. Introduction
2. DynamoDB Overview
3. Voldemort Key-Value Distributed Data Store
4. Examples of Other Key-Value Stores
4.1 INTRODUCTION
• Key-value stores have no query language.
• Instead, they provide a set of operations that application programmers can use.
• Characteristics:
  ◦ Every value is associated with a unique key.
  ◦ Retrieving the value by supplying the key is very fast.
4.2 DYNAMODB OVERVIEW
• An Amazon product, part of Amazon Web Services (AWS).
• The data model uses the concepts of tables, items and attributes.
• Apart from its primary key, a table does not have a schema.
  ◦ It holds a collection of self-describing items.
  ◦ Each item consists of a number of (attribute, value) pairs.
  ◦ Attribute values can be single-valued or multivalued.
[Figure: example that uploads an item to the ProductCatalog table]
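In DynamoDB's low-level JSON format, every attribute value carries a type tag: S for string, N for number (sent as a string), BOOL for boolean, and SS or NS for string or number sets. Sets are one way to hold multivalued attributes. The sketch converts plain objects into that shape. The helpers toAttributeValue and toDynamoItem are hypothetical and are not part of the AWS SDK. The sample items are illustrative.

```javascript
// Tag a plain value with its DynamoDB attribute type.
function toAttributeValue(value) {
  if (typeof value === "string") return { S: value };
  if (typeof value === "number") return { N: String(value) }; // numbers travel as strings
  if (typeof value === "boolean") return { BOOL: value };
  if (Array.isArray(value) && value.length > 0) {
    if (value.every((v) => typeof v === "string")) return { SS: [...new Set(value)] };
    if (value.every((v) => typeof v === "number")) return { NS: [...new Set(value)].map(String) };
  }
  throw new Error(`unsupported value: ${JSON.stringify(value)}`);
}

// Convert every (attribute, value) pair of an item.
function toDynamoItem(item) {
  return Object.fromEntries(Object.entries(item).map(([k, v]) => [k, toAttributeValue(v)]));
}

// Two items in the same table can carry different attributes (self-describing items).
const book = { Id: 101, Title: "Book 101 Title", Authors: ["Author 1", "Author 2"] };
const bike = { Id: 201, Title: "18-Bike-201", Color: ["Red", "Black"], InStock: true };
console.log(JSON.stringify(toDynamoItem(book)));
// {"Id":{"N":"101"},"Title":{"S":"Book 101 Title"},"Authors":{"SS":["Author 1","Author 2"]}}
console.log(JSON.stringify(toDynamoItem(bike)));
```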
4.3 VOLDEMORT KEY-VALUE DISTRIBUTED
DATA STORE
• Based on techniques from Amazon's Dynamo key-value store.
• Used by LinkedIn.
• A simple, basic set of operations: put, delete and get.
• Pluggable storage engines, such as MySQL.
• Nodes are independent.
• Automatic replication and partitioning.
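Voldemort partitions and replicates data with consistent hashing, a technique it takes from Dynamo. In this scheme, nodes and keys are hashed onto a ring, and a key is stored on the next few distinct nodes clockwise from its position. The sketch assumes FNV-1a as the hash, one ring position per node, and 2 replicas. These are illustrative choices, and the node names are hypothetical. It exposes only the put, get and delete operations listed above.

```javascript
// 32-bit FNV-1a hash of a string.
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

class Ring {
  constructor(nodes, replicas = 2) {
    this.replicas = replicas;
    this.points = nodes.map((n) => ({ pos: fnv1a(n), node: n })).sort((a, b) => a.pos - b.pos);
    this.stores = new Map(nodes.map((n) => [n, new Map()])); // each node's local key-value map
  }
  // Walk clockwise from the key's position and take the next `replicas` nodes.
  preferenceList(key) {
    const pos = fnv1a(key);
    let i = this.points.findIndex((p) => p.pos >= pos);
    if (i === -1) i = 0; // past the last point: wrap around the ring
    const list = [];
    for (let k = 0; k < this.points.length && list.length < this.replicas; k++) {
      list.push(this.points[(i + k) % this.points.length].node);
    }
    return list;
  }
  put(key, value) { for (const n of this.preferenceList(key)) this.stores.get(n).set(key, value); }
  get(key) { return this.stores.get(this.preferenceList(key)[0]).get(key); }
  delete(key) { for (const n of this.preferenceList(key)) this.stores.get(n).delete(key); }
}

const ring = new Ring(["node1", "node2", "node3", "node4"], 2);
ring.put("member:42", { name: "Example User" });
console.log(ring.preferenceList("member:42")); // the two nodes holding the replicas
console.log(ring.get("member:42"));            // { name: 'Example User' }
ring.delete("member:42");
console.log(ring.get("member:42"));            // undefined
```

Adding a node moves only the keys that fall between it and its predecessor on the ring, which is what lets data be redistributed without reshuffling everything.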
4.4 EXAMPLES OF OTHER KEY-VALUE
STORES
1. Oracle key-value store.
2. Redis key-value cache and store.
3. Apache Cassandra
COLUMN-BASED OR WIDE COLUMN
NOSQL SYSTEMS
• Stores data tables as columns rather than as rows.
HBASE DATA MODEL AND VERSIONING
• Apache HBase is an open-source, distributed, versioned, non-relational database.
• A column is identified by a combination of (column family:column qualifier).
• HBase stores multiple versions of a data item, with a timestamp associated with each version.
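• A table is divided into a number of regions, each holding a range of row keys.
• Regions are formed by range partitioning.
• Apache ZooKeeper (coordination and distribution of the data across the HBase servers) and Apache HDFS (Hadoop Distributed File System, for distributed file storage) are used for management.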
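A sketch of how a row finds its region, and how a cell is addressed by row key, column family:qualifier and timestamp. The region boundaries, server names, column names and data are all hypothetical.

```javascript
// Regions range-partition the row keys: each covers [startKey, endKey).
const regions = [
  { startKey: "", endKey: "g", server: "regionserver1" },   // rows < "g"
  { startKey: "g", endKey: "p", server: "regionserver2" },  // "g" <= rows < "p"
  { startKey: "p", endKey: null, server: "regionserver3" }, // rows >= "p" (null = no upper bound)
];

// Find the region whose key range contains the row key.
function regionFor(rowKey) {
  return regions.find((r) => rowKey >= r.startKey && (r.endKey === null || rowKey < r.endKey));
}

// Cells keyed by "row/family:qualifier"; each holds versions sorted newest first.
const cells = new Map();
function putCell(row, column, value, ts) {
  const key = `${row}/${column}`;
  const versions = cells.get(key) ?? [];
  versions.push({ ts, value });
  versions.sort((a, b) => b.ts - a.ts);
  cells.set(key, versions);
}

putCell("jsmith", "Name:Fname", "John", 1);
putCell("jsmith", "Details:Job", "Engineer", 1);
putCell("jsmith", "Details:Job", "Manager", 2);
console.log(regionFor("jsmith").server);               // regionserver2
console.log(cells.get("jsmith/Details:Job")[0].value); // Manager (latest version)
```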
NOSQL GRAPH DATABASES AND NEO4J
• The data is represented as a graph: a collection of vertices (nodes) and edges.
• Nodes and edges can be labeled to indicate the types of entities and relationships they represent.
• It is generally possible to store data associated with both individual nodes and individual edges.
• Neo4j is an open-source NOSQL graph database implemented in Java.
NEO4J
• The data model in Neo4j organizes data using the concepts of nodes and relationships.
• Nodes and relationships have properties, which store the data items.
• Nodes can have labels.
  ◦ Nodes that have the same label are grouped into a collection that identifies a subset of the nodes in the database graph for querying purposes.
  ◦ A node can have zero, one, or several labels.
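A plain-JavaScript sketch of the property-graph model described above. Nodes carry zero or more labels plus properties, and relationships carry a type, a direction and their own properties. The data is hypothetical. The final query answers, by hand, what a Cypher pattern such as MATCH (e:EMPLOYEE)-[w:WorksOn]->(p:PROJECT) RETURN e.Lname, p.Pname, w.Hours would express.

```javascript
// Nodes: zero or more labels plus a map of properties.
const nodes = [
  { id: 1, labels: ["EMPLOYEE"], props: { Empid: "1", Lname: "Smith" } },
  { id: 2, labels: ["EMPLOYEE", "MANAGER"], props: { Empid: "2", Lname: "Wong" } },
  { id: 3, labels: ["PROJECT"], props: { Pno: "1", Pname: "ProductX" } },
  { id: 4, labels: [], props: { note: "a node with no label" } },
];
// Relationships: direction (from -> to), a type, and their own properties.
const relationships = [
  { from: 1, to: 3, type: "WorksOn", props: { Hours: 32.5 } },
  { from: 2, to: 3, type: "WorksOn", props: { Hours: 10.0 } },
];

const nodeById = (id) => nodes.find((n) => n.id === id);

// Nodes sharing a label form a subset that can be queried on its own.
const byLabel = (label) => nodes.filter((n) => n.labels.includes(label));

// One-step traversal: follow outgoing relationships of a given type.
const neighbors = (nodeId, type) =>
  relationships.filter((r) => r.from === nodeId && r.type === type).map((r) => nodeById(r.to));

// Hand-written equivalent of the WorksOn pattern query.
const rows = relationships
  .filter((r) => r.type === "WorksOn")
  .map((r) => ({ Lname: nodeById(r.from).props.Lname, Pname: nodeById(r.to).props.Pname, Hours: r.props.Hours }));

console.log(byLabel("EMPLOYEE").map((n) => n.props.Lname));     // [ 'Smith', 'Wong' ]
console.log(neighbors(1, "WorksOn").map((n) => n.props.Pname)); // [ 'ProductX' ]
console.log(rows);
```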
Nosql databases

More Related Content

What's hot

What's hot (20)

NOSQL Databases types and Uses
NOSQL Databases types and UsesNOSQL Databases types and Uses
NOSQL Databases types and Uses
 
NOSQL vs SQL
NOSQL vs SQLNOSQL vs SQL
NOSQL vs SQL
 
NoSQL databases
NoSQL databasesNoSQL databases
NoSQL databases
 
Non relational databases-no sql
Non relational databases-no sqlNon relational databases-no sql
Non relational databases-no sql
 
Data models in NoSQL
Data models in NoSQLData models in NoSQL
Data models in NoSQL
 
NoSQL and MapReduce
NoSQL and MapReduceNoSQL and MapReduce
NoSQL and MapReduce
 
Introduction to NoSQL
Introduction to NoSQLIntroduction to NoSQL
Introduction to NoSQL
 
SQL & NoSQL
SQL & NoSQLSQL & NoSQL
SQL & NoSQL
 
introduction to NOSQL Database
introduction to NOSQL Databaseintroduction to NOSQL Database
introduction to NOSQL Database
 
MongoDB presentation
MongoDB presentationMongoDB presentation
MongoDB presentation
 
A Seminar on NoSQL Databases.
A Seminar on NoSQL Databases.A Seminar on NoSQL Databases.
A Seminar on NoSQL Databases.
 
NoSql
NoSqlNoSql
NoSql
 
Introduction to NoSQL
Introduction to NoSQLIntroduction to NoSQL
Introduction to NoSQL
 
Polyglot Persistence
Polyglot Persistence Polyglot Persistence
Polyglot Persistence
 
The CAP Theorem
The CAP Theorem The CAP Theorem
The CAP Theorem
 
Cloud Resource Management
Cloud Resource ManagementCloud Resource Management
Cloud Resource Management
 
Parallel Database
Parallel DatabaseParallel Database
Parallel Database
 
Relational databases vs Non-relational databases
Relational databases vs Non-relational databasesRelational databases vs Non-relational databases
Relational databases vs Non-relational databases
 
An Introduction To NoSQL & MongoDB
An Introduction To NoSQL & MongoDBAn Introduction To NoSQL & MongoDB
An Introduction To NoSQL & MongoDB
 
NoSQL databases
NoSQL databasesNoSQL databases
NoSQL databases
 

Viewers also liked

NoSQL Databases: Why, what and when
NoSQL Databases: Why, what and whenNoSQL Databases: Why, what and when
NoSQL Databases: Why, what and whenLorenzo Alberton
 
Deterministic simulation testing
Deterministic simulation testingDeterministic simulation testing
Deterministic simulation testingFoundationDB
 
Nosql databases for the .net developer
Nosql databases for the .net developerNosql databases for the .net developer
Nosql databases for the .net developerJesus Rodriguez
 
NOSQL Database: Apache Cassandra
NOSQL Database: Apache CassandraNOSQL Database: Apache Cassandra
NOSQL Database: Apache CassandraFolio3 Software
 
A practical introduction to Oracle NoSQL Database - OOW2014
A practical introduction to Oracle NoSQL Database - OOW2014A practical introduction to Oracle NoSQL Database - OOW2014
A practical introduction to Oracle NoSQL Database - OOW2014Anuj Sahni
 
Big Data and NoSQL for Database and BI Pros
Big Data and NoSQL for Database and BI ProsBig Data and NoSQL for Database and BI Pros
Big Data and NoSQL for Database and BI ProsAndrew Brust
 
An Intro to NoSQL Databases
An Intro to NoSQL DatabasesAn Intro to NoSQL Databases
An Intro to NoSQL DatabasesRajith Pemabandu
 
Using Spring with NoSQL databases (SpringOne China 2012)
Using Spring with NoSQL databases (SpringOne China 2012)Using Spring with NoSQL databases (SpringOne China 2012)
Using Spring with NoSQL databases (SpringOne China 2012)Chris Richardson
 
NoSQL-Database-Concepts
NoSQL-Database-ConceptsNoSQL-Database-Concepts
NoSQL-Database-ConceptsBhaskar Gunda
 
NoSQL databases and managing big data
NoSQL databases and managing big dataNoSQL databases and managing big data
NoSQL databases and managing big dataSteven Francia
 
NoSQL Slideshare Presentation
NoSQL Slideshare Presentation NoSQL Slideshare Presentation
NoSQL Slideshare Presentation Ericsson Labs
 
Data Modeling for NoSQL
Data Modeling for NoSQLData Modeling for NoSQL
Data Modeling for NoSQLTony Tam
 
NoSQL Databases, Not just a Buzzword
NoSQL Databases, Not just a Buzzword NoSQL Databases, Not just a Buzzword
NoSQL Databases, Not just a Buzzword Haitham El-Ghareeb
 
Cassandra By Example: Data Modelling with CQL3
Cassandra By Example: Data Modelling with CQL3Cassandra By Example: Data Modelling with CQL3
Cassandra By Example: Data Modelling with CQL3Eric Evans
 
Test Automation for NoSQL Databases
Test Automation for NoSQL DatabasesTest Automation for NoSQL Databases
Test Automation for NoSQL DatabasesTobias Trelle
 

Viewers also liked (20)

NoSQL Databases: Why, what and when
NoSQL Databases: Why, what and whenNoSQL Databases: Why, what and when
NoSQL Databases: Why, what and when
 
FoundationDB - NoSQL and ACID
FoundationDB - NoSQL and ACIDFoundationDB - NoSQL and ACID
FoundationDB - NoSQL and ACID
 
Deterministic simulation testing
Deterministic simulation testingDeterministic simulation testing
Deterministic simulation testing
 
NoSql Databases
NoSql DatabasesNoSql Databases
NoSql Databases
 
Nosql databases for the .net developer
Nosql databases for the .net developerNosql databases for the .net developer
Nosql databases for the .net developer
 
NOSQL Database: Apache Cassandra
NOSQL Database: Apache CassandraNOSQL Database: Apache Cassandra
NOSQL Database: Apache Cassandra
 
A practical introduction to Oracle NoSQL Database - OOW2014
A practical introduction to Oracle NoSQL Database - OOW2014A practical introduction to Oracle NoSQL Database - OOW2014
A practical introduction to Oracle NoSQL Database - OOW2014
 
Big Data and NoSQL for Database and BI Pros
Big Data and NoSQL for Database and BI ProsBig Data and NoSQL for Database and BI Pros
Big Data and NoSQL for Database and BI Pros
 
NoSQL and ACID
NoSQL and ACIDNoSQL and ACID
NoSQL and ACID
 
An Intro to NoSQL Databases
An Intro to NoSQL DatabasesAn Intro to NoSQL Databases
An Intro to NoSQL Databases
 
Using Spring with NoSQL databases (SpringOne China 2012)
Using Spring with NoSQL databases (SpringOne China 2012)Using Spring with NoSQL databases (SpringOne China 2012)
Using Spring with NoSQL databases (SpringOne China 2012)
 
NoSQL-Database-Concepts
NoSQL-Database-ConceptsNoSQL-Database-Concepts
NoSQL-Database-Concepts
 
NOSQL Overview
NOSQL OverviewNOSQL Overview
NOSQL Overview
 
NoSQL databases and managing big data
NoSQL databases and managing big dataNoSQL databases and managing big data
NoSQL databases and managing big data
 
Distributed computing
Distributed computingDistributed computing
Distributed computing
 
NoSQL Slideshare Presentation
NoSQL Slideshare Presentation NoSQL Slideshare Presentation
NoSQL Slideshare Presentation
 
Data Modeling for NoSQL
Data Modeling for NoSQLData Modeling for NoSQL
Data Modeling for NoSQL
 
NoSQL Databases, Not just a Buzzword
NoSQL Databases, Not just a Buzzword NoSQL Databases, Not just a Buzzword
NoSQL Databases, Not just a Buzzword
 
Cassandra By Example: Data Modelling with CQL3
Cassandra By Example: Data Modelling with CQL3Cassandra By Example: Data Modelling with CQL3
Cassandra By Example: Data Modelling with CQL3
 
Test Automation for NoSQL Databases
Test Automation for NoSQL DatabasesTest Automation for NoSQL Databases
Test Automation for NoSQL Databases
 

Similar to Nosql databases

Softwae and database in data communication network
Softwae and database in data communication networkSoftwae and database in data communication network
Softwae and database in data communication networkAyoubSohiabMohammad
 
Presentation on NoSQL Database related RDBMS
Presentation on NoSQL Database related RDBMSPresentation on NoSQL Database related RDBMS
Presentation on NoSQL Database related RDBMSabdurrobsoyon
 
Comparative study of no sql document, column store databases and evaluation o...
Comparative study of no sql document, column store databases and evaluation o...Comparative study of no sql document, column store databases and evaluation o...
Comparative study of no sql document, column store databases and evaluation o...ijdms
 
no sql presentation
no sql presentationno sql presentation
no sql presentationchandanm2
 
Oracle DBA Tutorial for Beginners -Oracle training institute in bangalore
Oracle DBA Tutorial for Beginners -Oracle training institute in bangaloreOracle DBA Tutorial for Beginners -Oracle training institute in bangalore
Oracle DBA Tutorial for Beginners -Oracle training institute in bangaloreTIB Academy
 
NOSQL and MongoDB Database
NOSQL and MongoDB DatabaseNOSQL and MongoDB Database
NOSQL and MongoDB DatabaseTariqul islam
 
Data management in cloud study of existing systems and future opportunities
Data management in cloud study of existing systems and future opportunitiesData management in cloud study of existing systems and future opportunities
Data management in cloud study of existing systems and future opportunitiesEditor Jacotech
 
Introduction to Oracle Database
Introduction to Oracle DatabaseIntroduction to Oracle Database
Introduction to Oracle Databasepuja_dhar
 
Overview of oracle database
Overview of oracle databaseOverview of oracle database
Overview of oracle databaseSamar Prasad
 
Overview of oracle database
Overview of oracle databaseOverview of oracle database
Overview of oracle databaseSamar Prasad
 
Modern databases and its challenges (SQL ,NoSQL, NewSQL)
Modern databases and its challenges (SQL ,NoSQL, NewSQL)Modern databases and its challenges (SQL ,NoSQL, NewSQL)
Modern databases and its challenges (SQL ,NoSQL, NewSQL)Mohamed Galal
 
Nosql Presentation.pdf for DBMS understanding
Nosql Presentation.pdf for DBMS understandingNosql Presentation.pdf for DBMS understanding
Nosql Presentation.pdf for DBMS understandingHUSNAINAHMAD39
 

Similar to Nosql databases (20)

No sq lv2
No sq lv2No sq lv2
No sq lv2
 
Softwae and database in data communication network
Softwae and database in data communication networkSoftwae and database in data communication network
Softwae and database in data communication network
 
Datastores
DatastoresDatastores
Datastores
 
Presentation on NoSQL Database related RDBMS
Presentation on NoSQL Database related RDBMSPresentation on NoSQL Database related RDBMS
Presentation on NoSQL Database related RDBMS
 
Comparative study of no sql document, column store databases and evaluation o...
Comparative study of no sql document, column store databases and evaluation o...Comparative study of no sql document, column store databases and evaluation o...
Comparative study of no sql document, column store databases and evaluation o...
 
Nosql databases
Nosql databasesNosql databases
Nosql databases
 
no sql presentation
no sql presentationno sql presentation
no sql presentation
 
Oracle DBA Tutorial for Beginners -Oracle training institute in bangalore
Oracle DBA Tutorial for Beginners -Oracle training institute in bangaloreOracle DBA Tutorial for Beginners -Oracle training institute in bangalore
Oracle DBA Tutorial for Beginners -Oracle training institute in bangalore
 
Oracle archi ppt
Oracle archi pptOracle archi ppt
Oracle archi ppt
 
NOSQL and MongoDB Database
NOSQL and MongoDB DatabaseNOSQL and MongoDB Database
NOSQL and MongoDB Database
 
Data management in cloud study of existing systems and future opportunities
Data management in cloud study of existing systems and future opportunitiesData management in cloud study of existing systems and future opportunities
Data management in cloud study of existing systems and future opportunities
 
ORDBMS.pptx
ORDBMS.pptxORDBMS.pptx
ORDBMS.pptx
 
Introduction to Oracle Database
Introduction to Oracle DatabaseIntroduction to Oracle Database
Introduction to Oracle Database
 
Datastores
DatastoresDatastores
Datastores
 
NoSQL Basics and MongDB
NoSQL Basics and  MongDBNoSQL Basics and  MongDB
NoSQL Basics and MongDB
 
Master.pptx
Master.pptxMaster.pptx
Master.pptx
 
Overview of oracle database
Overview of oracle databaseOverview of oracle database
Overview of oracle database
 
Overview of oracle database
Overview of oracle databaseOverview of oracle database
Overview of oracle database
 
Modern databases and its challenges (SQL ,NoSQL, NewSQL)
Modern databases and its challenges (SQL ,NoSQL, NewSQL)Modern databases and its challenges (SQL ,NoSQL, NewSQL)
Modern databases and its challenges (SQL ,NoSQL, NewSQL)
 
Nosql Presentation.pdf for DBMS understanding
Nosql Presentation.pdf for DBMS understandingNosql Presentation.pdf for DBMS understanding
Nosql Presentation.pdf for DBMS understanding
 


Nosql databases

  • 1. NOSQL DATABASES AND BIG DATA STORAGE SYSTEMS Ateeq Ateeq
  • 2. CONTENT  1- Introduction to NOSQL Systems  2- The CAP Theorem  3- Document-Based NOSQL Systems and MongoDB  4- NOSQL Key-Value Stores  5- Column-Based or Wide Column NOSQL Systems  6- NOSQL Graph Databases and Neo4j
  • 3. INTRODUCTION TO NOSQL SYSTEMS  1.1 Emergence of NOSQL Systems  1.2 Characteristics of NOSQL Systems  1.3 Categories of NOSQL Systems
  • 4. 1.1 EMERGENCE OF NOSQL SYSTEMS  SQL systems may not be appropriate for some applications, such as email.  SQL systems offer too many services (powerful query language, concurrency control, etc.), which such applications may not need.  A structured data model such as the traditional relational model may be too restrictive.  SQL systems require schemas, which many NOSQL systems do not require.
  • 5. 1.1 EMERGENCE OF NOSQL SYSTEMS  Examples of NOSQL systems:  Google – BigTable  Amazon – DynamoDB  Facebook – Cassandra  MongoDB  CouchDB  Graph databases like Neo4j and GraphBase
  • 6. 1.2 CHARACTERISTICS OF NOSQL SYSTEMS  NOSQL characteristics related to distributed databases and distributed systems.  NOSQL characteristics related to data models and query languages.
  • 7. CHARACTERISTICS RELATED TO DISTRIBUTED DATABASES AND DISTRIBUTED SYSTEMS 1- Scalability:  Horizontal scalability: adding more nodes for data storage and processing as the volume of data grows.  Vertical scalability: expanding the storage and computing power of existing nodes.  NOSQL systems scale horizontally while the system is operational, so they need techniques that distribute the existing data among new nodes without interrupting system operation.
  • 8. CHARACTERISTICS RELATED TO DISTRIBUTED DATABASES AND DISTRIBUTED SYSTEMS 2- Availability, Replication and Eventual Consistency:  Data is replicated over two or more nodes in a transparent manner.  Updates must be applied to every copy of the replicated data items.  Eventual consistency is a consistency model used in distributed computing to achieve high availability.  It informally guarantees that, if no new updates are made to a given data item, eventually all accesses to that item will return the last updated value.
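The following is a minimal JavaScript sketch of eventual consistency. It is not taken from any particular NOSQL system. A write is applied at one replica immediately and queued for the others. Until the queue is drained, reads at different replicas can disagree. Once no new updates arrive and propagation completes, every replica returns the last written value. The `Replica` class and the `write` and `propagate` functions are hypothetical names chosen for illustration.

```javascript
// Hypothetical replica: just a key-value map.
class Replica {
  constructor(name) {
    this.name = name;
    this.data = new Map();
  }
  read(key) {
    return this.data.get(key);
  }
}

// Updates waiting to be shipped to the other replicas.
const pending = [];

// Apply the write locally, then queue it for asynchronous propagation.
function write(origin, replicas, key, value) {
  origin.data.set(key, value);
  for (const r of replicas) {
    if (r !== origin) pending.push({ target: r, key, value });
  }
}

// Deliver queued updates in order (simulates background propagation).
function propagate() {
  while (pending.length > 0) {
    const { target, key, value } = pending.shift();
    target.data.set(key, value);
  }
}

const replicas = [new Replica("A"), new Replica("B"), new Replica("C")];
write(replicas[0], replicas, "x", 1);
write(replicas[0], replicas, "x", 2);

// Before propagation, B and C have not seen any value yet.
const before = replicas.map((r) => r.read("x")); // [2, undefined, undefined]
propagate();
// With no new updates, all replicas converge on the last written value.
const after = replicas.map((r) => r.read("x")); // [2, 2, 2]
console.log(before, after);
```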
  • 9. CHARACTERISTICS RELATED TO DISTRIBUTED DATABASES AND DISTRIBUTED SYSTEMS 3- Replication Models:  Master-slave replication: requires one copy to be the master copy.  Write operations must be applied to the master copy and then propagated to the slave copies, usually using eventual consistency.  Reads can be configured in two ways.  All reads can go to the master copy.  Alternatively, reads can be allowed at the slave copies, but then they are not guaranteed to return the latest writes.  Master-master replication: allows reads and writes at any of the replicas.  The values of an item may be temporarily inconsistent.  A reconciliation method to resolve conflicting write operations of the same data item at different nodes must be implemented as part of the master-master replication scheme.
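The JavaScript below sketches one possible reconciliation method for master-master replication: last-writer-wins. Each write carries a timestamp, here a hypothetical logical counter plus the node id as a tie-breaker. When two masters exchange versions, both keep the version with the larger timestamp. Real systems may instead use vector clocks or application-specific merge logic. All names and values here are assumptions for illustration.

```javascript
// Each master stores, per key, a value plus the (time, node) stamp of its last write.
function makeMaster(id) {
  return { id, clock: 0, store: new Map() };
}

// Local write: advance this master's logical clock and stamp the version.
function put(master, key, value) {
  master.clock += 1;
  master.store.set(key, { value, time: master.clock, node: master.id });
}

// True if version a should win over version b (later time wins; node id breaks ties).
function newer(a, b) {
  if (a.time !== b.time) return a.time > b.time;
  return a.node > b.node;
}

// Exchange versions in both directions so both masters converge.
function reconcile(m1, m2) {
  const keys = new Set([...m1.store.keys(), ...m2.store.keys()]);
  for (const key of keys) {
    const v1 = m1.store.get(key);
    const v2 = m2.store.get(key);
    const winner = !v1 ? v2 : !v2 ? v1 : newer(v1, v2) ? v1 : v2;
    m1.store.set(key, winner);
    m2.store.set(key, winner);
  }
  // Keep clocks monotonic after seeing each other's writes.
  const maxClock = Math.max(m1.clock, m2.clock);
  m1.clock = maxClock;
  m2.clock = maxClock;
}

const east = makeMaster("east");
const west = makeMaster("west");
put(east, "Plocation", "Jenin");    // east, time 1
put(west, "Plocation", "Ramallah"); // west, time 1: conflicting write
put(west, "Plocation", "AAUJ");     // west, time 2

// Temporarily inconsistent: beforeSync is ["Jenin", "AAUJ"].
const beforeSync = [east.store.get("Plocation").value, west.store.get("Plocation").value];
reconcile(east, west);
// After reconciliation both masters hold "AAUJ".
const afterSync = [east.store.get("Plocation").value, west.store.get("Plocation").value];
console.log(beforeSync, afterSync);
```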
  • 10. CHARACTERISTICS RELATED TO DISTRIBUTED DATABASES AND DISTRIBUTED SYSTEMS
  • 11. CHARACTERISTICS RELATED TO DISTRIBUTED DATABASES AND DISTRIBUTED SYSTEMS
  • 12. CHARACTERISTICS RELATED TO DISTRIBUTED DATABASES AND DISTRIBUTED SYSTEMS  4- Sharding of Files:  Files can have many millions of records accessed concurrently by thousands of users.  Sharding (also known as horizontal partitioning) distributes the load of accessing the file records to multiple nodes.  Sharding and replication work in tandem to improve load balancing as well as data availability.
  • 13. CHARACTERISTICS RELATED TO DISTRIBUTED DATABASES AND DISTRIBUTED SYSTEMS
  • 14. CHARACTERISTICS RELATED TO DISTRIBUTED DATABASES AND DISTRIBUTED SYSTEMS  5- High-Performance Data Access:  Hashing: a hash function h is applied to the key K, and the location of the object is given by h(K).  Range partitioning: the location is determined via a range of key values. Example: location i would hold the objects whose key values K are in the range $K_i^{\min} \le K \le K_i^{\max}$. Range partitioning is preferred in applications that require range queries, which retrieve multiple objects within a range of key values.
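The JavaScript sketch below makes the two placement schemes concrete. The hash function (a simple string hash taken modulo the number of nodes) and the range boundaries are assumptions for illustration, not what any specific system uses. Consecutive keys scatter across nodes under hashing but stay together under range partitioning, which is why range partitioning suits range queries.

```javascript
// Simple deterministic string hash (31-multiplier rolling hash), kept to 32 bits.
function h(key) {
  let hash = 0;
  for (const ch of String(key)) {
    hash = (hash * 31 + ch.charCodeAt(0)) >>> 0;
  }
  return hash;
}

// Hashing: location = h(K) mod number of nodes.
function hashLocation(key, numNodes) {
  return h(key) % numNodes;
}

// Range partitioning: location i holds keys with min_i <= K <= max_i.
const ranges = [
  { node: 0, min: 0, max: 199 },
  { node: 1, min: 200, max: 399 },
  { node: 2, min: 400, max: 599 },
];
function rangeLocation(key) {
  const r = ranges.find((range) => range.min <= key && key <= range.max);
  if (!r) throw new Error(`no range holds key ${key}`);
  return r.node;
}

// Nodes touched by a range query lo..hi under a given placement function.
function nodesForRangeQuery(lo, hi, locate) {
  const nodes = new Set();
  for (let k = lo; k <= hi; k++) nodes.add(locate(k));
  return [...nodes].sort();
}

const byRange = nodesForRangeQuery(200, 260, rangeLocation);               // one node
const byHash = nodesForRangeQuery(200, 260, (k) => hashLocation(k, 3));    // spread out
console.log(byRange, byHash);
```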
  • 15. CHARACTERISTICS RELATED TO DISTRIBUTED DATABASES AND DISTRIBUTED SYSTEMS
  • 16. CHARACTERISTICS RELATED TO DISTRIBUTED DATABASES AND DISTRIBUTED SYSTEMS
  • 17. CHARACTERISTICS RELATED TO DATA MODELS AND QUERY LANGUAGES.  1- Not Requiring a Schema:  NOSQL systems allow semi-structured, self-describing data.  Some systems let users specify a partial schema to improve storage efficiency, but most NOSQL systems do not require a schema.  Constraints on the data would have to be programmed in the application programs that access the data items.  Languages for describing semi-structured data: JSON (JavaScript Object Notation) and XML (Extensible Markup Language).
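A schemaless collection accepts any well-formed JSON document, so constraints end up in application code. The JavaScript sketch below shows the kind of check an application might run before writing a document. The `validateProject` function is hypothetical, and its field names are borrowed from the COMPANY example used later in these slides.

```javascript
// Self-describing JSON documents in the same (schemaless) collection:
// they need not share the same attributes, and nothing stops a bad value.
const docs = [
  JSON.parse('{"_id": "P1", "Pname": "ProjectX", "Plocation": "Jenin"}'),
  JSON.parse('{"_id": "P2", "Pname": "ProjectY", "Plocation": "AAUJ", "Budget": 50000}'),
  JSON.parse('{"_id": "P3", "Plocation": 42}'),
];

// Hypothetical application-side constraint: every project needs a string Pname,
// and Plocation, if present, must be a string. The store itself does not enforce this.
function validateProject(doc) {
  const errors = [];
  if (typeof doc.Pname !== "string") errors.push("Pname must be a string");
  if ("Plocation" in doc && typeof doc.Plocation !== "string") {
    errors.push("Plocation must be a string");
  }
  return errors;
}

const results = docs.map((d) => ({ id: d._id, errors: validateProject(d) }));
console.log(results);
```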
  • 18. CHARACTERISTICS RELATED TO DATA MODELS AND QUERY LANGUAGES.  2- Less Powerful Query Languages:  Many applications that use NOSQL systems may not require a powerful query language such as SQL, because search (read) queries in these systems often locate single objects in a single file based on their object keys.  The programmer reads and writes data objects by calling the appropriate API operations.  SCRUD: Search, Create, Read, Update and Delete.  Some systems provide a high-level query language, but it may not have the full power of SQL; for example, joins need to be implemented in the application programs.
  • 19. CHARACTERISTICS RELATED TO DATA MODELS AND QUERY LANGUAGES.  3- Versioning:  Some systems store multiple versions of the data items, each with the timestamp of when it was created.
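When joins must be implemented in the application, a program typically reads from two collections and matches records itself. The JavaScript sketch below performs an in-memory hash join. The `workers` and `projects` arrays, their fields, and the `joinWorkersWithProjects` function are hypothetical, modelled loosely on the COMPANY example.

```javascript
const projects = [
  { _id: "P1", Pname: "ProjectX" },
  { _id: "P2", Pname: "ProjectY" },
];
const workers = [
  { Ename: "John Smith", ProjectId: "P1", Hours: 32.5 },
  { Ename: "Joyce English", ProjectId: "P1", Hours: 20.0 },
  { Ename: "Franklin Wong", ProjectId: "P2", Hours: 10.0 },
];

// Application-side hash join: index one side by key, then probe it with the other.
function joinWorkersWithProjects(workers, projects) {
  const byId = new Map(projects.map((p) => [p._id, p]));
  const rows = [];
  for (const w of workers) {
    const p = byId.get(w.ProjectId);
    if (p) rows.push({ Ename: w.Ename, Pname: p.Pname, Hours: w.Hours });
  }
  return rows;
}

const joined = joinWorkersWithProjects(workers, projects);
console.log(joined);
```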
  • 20. 1.3 CATEGORIES OF NOSQL SYSTEMS The most common categories: 1. Document-based NOSQL systems:  Store data in the form of documents using well-known formats such as JSON.  Documents are accessible via their document id, but can also be accessed rapidly using other indexes. 2. NOSQL key-value stores:  Fast access by the key to the value associated with the key  Value can be a record or an object or a document or even have a more complex data structure. 3. Column-based or wide column NOSQL systems:  Partition a table by column into column families  Form of vertical partitioning. 4. Graph-based NOSQL systems:  Data is represented as graphs  Related nodes can be found by traversing the edges using path expressions.
  • 21. 1.3 CATEGORIES OF NOSQL SYSTEMS Additional categories: 5. Hybrid NOSQL systems:  These systems have characteristics from two or more of the common categories. 6. Object databases. 7. XML databases.
  • 22. THE CAP THEOREM  The CAP theorem states that it is impossible to guarantee consistency, availability and partition tolerance at the same time in a distributed system with data replication.  Only two of the three properties can be guaranteed.  NOSQL systems often use weaker consistency levels instead of guaranteeing serializability.  Eventual consistency is commonly used.
  • 23. THE CAP THEOREM  The CAP theorem is used to explain some of the competing requirements in a distributed system with replication.  The three letters in CAP refer to:  Consistency (among replicated copies):  The nodes will have the same copies of a replicated data item visible for various transactions.  Availability (of the system for read and write operations):  Each read or write will either be processed successfully or will receive a message that the operation cannot be completed.  Partition tolerance (when a network fault partitions the nodes in the system):  The system can continue operating if the network connecting the nodes has a fault that results in two or more partitions.  Nodes in each partition can only communicate among each other.
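A toy JavaScript model can show the trade-off the theorem describes once a partition occurs. Two replicas of one item lose contact. A policy that favours consistency refuses any write it cannot replicate, so the copies stay identical but the write is not processed. A policy that favours availability accepts the write locally and lets the copies diverge until the partition heals. The "CP"/"AP" policy labels and the `makeCluster` and `write` functions are illustrative assumptions, not an API of any real system.

```javascript
// Two replicas of one data item; `connected` models the network link between them.
function makeCluster(policy) {
  return { policy, connected: true, a: { value: 0 }, b: { value: 0 } };
}

// Write at replica `from`; `to` is the other replica.
// Returns "ok" or a message saying the operation cannot be completed.
function write(cluster, from, to, value) {
  if (cluster.connected) {
    from.value = value;
    to.value = value; // synchronous replication while the link is up
    return "ok";
  }
  if (cluster.policy === "CP") {
    return "error: cannot reach replica, write rejected"; // copies stay identical
  }
  from.value = value; // AP: accept locally; the copies now diverge
  return "ok";
}

const cp = makeCluster("CP");
const ap = makeCluster("AP");
cp.connected = false; // network partition
ap.connected = false;

const cpResult = write(cp, cp.a, cp.b, 7);
const apResult = write(ap, ap.a, ap.b, 7);
console.log(cpResult, [cp.a.value, cp.b.value]); // rejected; both copies still 0
console.log(apResult, [ap.a.value, ap.b.value]); // accepted; copies disagree (7 vs 0)
```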
  • 25. DOCUMENT-BASED NOSQL SYSTEMS AND MONGODB 1. Introduction 2. MongoDB Data Model 3. MongoDB CRUD Operations 4. MongoDB Distributed Systems Characteristics
  • 26. 3.1 INTRODUCTION  Document-based NOSQL systems store data as collections of similar documents.  Documents resemble complex objects or XML documents.  Documents in a collection should be similar, but they can have different attributes.  Examples of document-based NOSQL systems: MongoDB and CouchDB.
  • 27. 3.2 MONGODB DATA MODEL  MongoDB is a free and open-source cross-platform document-oriented database.  It is classified as a NoSQL database.
  • 28. 3.2 MONGODB DATA MODEL  MongoDB documents are stored in BSON (Binary JSON) format.  BSON is a variation of JSON with some additional data types and is more efficient for storage than JSON.  Individual documents are stored in a collection.  The operation createCollection is used to create each collection.
  • 29. 3.2 MONGODB DATA MODEL  Example: create a collection called project to hold PROJECT objects from the COMPANY database: db.createCollection("project", { capped : true, size : 1310720, max : 500 } )  "project" is the name of the collection (mandatory).  capped: a capped collection has upper limits on its storage space (size, in bytes) and on its number of documents (max).  Capping helps the system to choose the storage options for each collection.
  • 30. 3.2 MONGODB DATA MODEL  Example: create a document collection called worker: db.createCollection("worker", { capped : true, size : 5242880, max : 2000 } )  Each document has a unique _id field, whose value is an ObjectId by default.  By default, the _id is:  Automatically indexed in the collection.  System-generated.  A system-generated ObjectId combines the timestamp when the object is created, the node id, the process id and a counter.  Current MongoDB versions replace the node id and process id with a per-process random value.  A user-generated _id can have any value specified by the user, as long as it uniquely identifies the document.
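A capped collection behaves like a fixed-size circular buffer. According to MongoDB's documentation, once the limit is reached the oldest documents are removed to make room for new ones. The JavaScript below simulates only the `max` (document count) limit in memory and ignores the byte-size limit. It illustrates the behaviour and is not MongoDB code; the `CappedCollection` class is hypothetical.

```javascript
// In-memory stand-in for a capped collection limited by document count (`max`).
class CappedCollection {
  constructor(max) {
    this.max = max;
    this.docs = []; // kept in insertion order
  }
  insert(doc) {
    this.docs.push(doc);
    // When the limit is exceeded, drop the oldest documents first.
    while (this.docs.length > this.max) this.docs.shift();
  }
  find() {
    return [...this.docs]; // natural (insertion) order
  }
}

const project = new CappedCollection(3); // the slide's example uses max: 500
for (let i = 1; i <= 5; i++) project.insert({ _id: `P${i}` });
console.log(project.find().map((d) => d._id)); // P3, P4, P5 remain
```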
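A system-generated ObjectId begins with a creation timestamp: its first 4 bytes hold seconds since the Unix epoch. The creation time can therefore be read back from the 24-character hex string. The sketch below does this by hand in JavaScript. The function name `objectIdTimestamp` is hypothetical; the MongoDB shell's own `ObjectId` offers a `getTimestamp()` method for the same purpose. The check also shows why a short string such as "P1" is not a valid ObjectId.

```javascript
// Extract the creation time stored in the first 4 bytes (8 hex digits) of an ObjectId.
function objectIdTimestamp(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("an ObjectId is 12 bytes, i.e. 24 hex characters");
  }
  const seconds = parseInt(hex.slice(0, 8), 16); // big-endian seconds since the epoch
  return new Date(seconds * 1000);
}

const created = objectIdTimestamp("507f1f77bcf86cd799439011");
console.log(created.toISOString());
```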
  • 31. 3.2 MONGODB DATA MODEL  A collection does not have a schema.  The structure of the data fields in documents is chosen based on how documents will be accessed and used, and the user can choose a normalized design (similar to normalized relational tuples) or a denormalized design (similar to XML documents or complex objects).  Interdocument references can be specified by storing in one document the ObjectId or ObjectIds of other related documents.
  • 32. 3.2 MONGODB DATA MODEL Company database example
  • 33. 3.2 MONGODB DATA MODEL Project info Embedded workers info
  • 34. 3.2 MONGODB DATA MODEL Project info Embedded workers array Workers
  • 35. 3.2 MONGODB DATA MODEL Project ID as an attribute
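The three designs pictured on these slides can be written out as JavaScript object literals. The field names and values below are assumptions modelled on the COMPANY database; they are not the exact contents of the figures. The last lines answer the same question in each design (which workers are on project P1) to show how the access path changes.

```javascript
// (a) Denormalized: worker info embedded inside the project document.
const projectEmbedded = {
  _id: "P1",
  Pname: "ProjectX",
  Plocation: "Jenin",
  Workers: [
    { Ename: "John Smith", Hours: 32.5 },
    { Ename: "Joyce English", Hours: 20.0 },
  ],
};

// (b) The project holds an array of worker ids; workers live in their own collection.
const projectWithIds = { _id: "P1", Pname: "ProjectX", Plocation: "Jenin", WorkerIds: ["W1", "W2"] };
const workerDocs = [
  { _id: "W1", Ename: "John Smith", Hours: 32.5 },
  { _id: "W2", Ename: "Joyce English", Hours: 20.0 },
];

// (c) Normalized: each worker document stores the project id as an attribute.
const workersNormalized = [
  { _id: "W1", Ename: "John Smith", ProjectId: "P1", Hours: 32.5 },
  { _id: "W2", Ename: "Joyce English", ProjectId: "P1", Hours: 20.0 },
];

// Names of workers on project P1 under each design.
const fromA = projectEmbedded.Workers.map((w) => w.Ename);
const fromB = projectWithIds.WorkerIds.map((id) => workerDocs.find((w) => w._id === id).Ename);
const fromC = workersNormalized.filter((w) => w.ProjectId === "P1").map((w) => w.Ename);
console.log(fromA, fromB, fromC); // same answer, different access paths
```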
  • 37. 3.3 MONGODB CRUD OPERATIONS  Insert:  db.<collection_name>.insert(<document(s)>)  Example:  db.project.insert({_id: "P1", Pname: "ProjectX", Plocation: "Jenin"})  Delete: remove  db.<collection_name>.remove(<condition>)  Example:  db.project.remove({_id: "P1"})  The _id was stored as the string "P1", so the condition matches that string.  ObjectId("P1") would be invalid, because an ObjectId is a 24-hex-digit value.
  • 38. 3.3 MONGODB CRUD OPERATIONS  Read: find  db.<collection_name>.find(<condition>)  Example:  db.project.find({_id: "P1"})  Update:  db.<collection_name>.update(SELECTION_CRITERIA, UPDATED_DATA)  Example:  db.project.update({_id: "P1"}, {$set: {Plocation: "AAUJ"}})  In the current MongoDB shell (mongosh), insert(), remove() and update() are deprecated.  Use insertOne()/insertMany(), deleteOne()/deleteMany() and updateOne()/updateMany() instead.
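The tiny in-memory JavaScript model below shows what these calls do to a collection. It is not the MongoDB driver. `MiniCollection` is a hypothetical class that supports only equality conditions and the `$set` update operator, which is just enough to replay the examples above.

```javascript
// Toy collection supporting equality filters and $set updates.
class MiniCollection {
  constructor() {
    this.docs = [];
  }
  // A document matches if every field in the condition has an equal value.
  static matches(doc, cond) {
    return Object.entries(cond).every(([k, v]) => doc[k] === v);
  }
  insert(doc) {
    if (this.docs.some((d) => d._id === doc._id)) throw new Error("duplicate _id");
    this.docs.push({ ...doc });
  }
  find(cond = {}) {
    return this.docs.filter((d) => MiniCollection.matches(d, cond)).map((d) => ({ ...d }));
  }
  update(cond, change) {
    // Like the shell's update() default, modify only the first matching document.
    const doc = this.docs.find((d) => MiniCollection.matches(d, cond));
    if (doc) Object.assign(doc, change.$set);
    return doc ? 1 : 0;
  }
  remove(cond) {
    const before = this.docs.length;
    this.docs = this.docs.filter((d) => !MiniCollection.matches(d, cond));
    return before - this.docs.length; // number of removed documents
  }
}

const project = new MiniCollection();
project.insert({ _id: "P1", Pname: "ProjectX", Plocation: "Jenin" });
project.update({ _id: "P1" }, { $set: { Plocation: "AAUJ" } });
const afterUpdate = project.find({ _id: "P1" });
const removed = project.remove({ _id: "P1" });
console.log(afterUpdate, removed, project.find());
```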
  • 39. 3.4 MONGODB DISTRIBUTED SYSTEMS CHARACTERISTICS  Replication in MongoDB  Sharding in MongoDB
  • 40. REPLICATION IN MONGODB  MongoDB uses a master-slave approach for replication, with one primary copy and one or more secondary copies.  By default, all reads and writes are done on the primary copy.  Secondary copies are used for recovery if the primary fails.
  • 41. SHARDING IN MONGODB  Sharding of the documents in the collection—also known as horizontal partitioning— divides the documents into disjoint partitions known as shards.  Two ways:  Range partitioning  Hash partitioning
  • 42. SHARDING IN MONGODB  Both range and hash partitioning require the user to specify a particular document field to be used as the basis for partitioning the documents into shards.  This partitioning field, known as the shard key, must exist in every document in the collection, and it must have an index.  The values of the shard key are divided into chunks, and the documents are partitioned based on the chunks of shard key values.
  • 43. SHARDING IN MONGODB  In range partitioning, chunks are created by specifying a range of key values, and each chunk contains the key values in one range.  Range partitioning is preferred if range queries are commonly applied to a collection (for example, retrieving all documents whose shard key value is between 200 and 400).  This is because each range query will typically be submitted to a single node that contains all the required documents in one shard.  If most searches retrieve one document at a time, hash partitioning may be preferable because it randomizes the distribution of shard key values into chunks.
  • 44. SHARDING IN MONGODB  MongoDB queries are submitted to a module called the query router, which keeps track of which nodes contain which shards based on the particular partitioning method used on the shard keys.  The query will be routed to the nodes that contain the shards that hold the documents that the query is requesting.  If the system cannot determine which shards hold the required documents, the query will be submitted to all the nodes that hold shards of the collection.
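The routing rule can be sketched in JavaScript with a chunk table that maps shard-key ranges to shards. The shard key name, chunk boundaries and shard names are made up. A real MongoDB router (`mongos`) consults cluster metadata rather than a hard-coded array. The sketch only contrasts targeted routing with sending a query to every shard.

```javascript
// Hypothetical chunk table for range partitioning on a numeric shard key:
// each chunk covers [min, max) and lives on one shard.
const chunks = [
  { min: -Infinity, max: 200, shard: "shardA" },
  { min: 200, max: 400, shard: "shardB" },
  { min: 400, max: Infinity, shard: "shardC" },
];
const allShards = [...new Set(chunks.map((c) => c.shard))];

// Decide which shards a query must visit.
// query: { eq: value } or { gte: lo, lt: hi } on the shard key, or {} if it does not use the key.
function route(query) {
  if ("eq" in query) {
    return [chunks.find((c) => c.min <= query.eq && query.eq < c.max).shard];
  }
  if ("gte" in query && "lt" in query) {
    // Shards whose chunk range overlaps [gte, lt).
    const hit = chunks.filter((c) => c.min < query.lt && query.gte < c.max);
    return [...new Set(hit.map((c) => c.shard))];
  }
  return allShards; // cannot tell where the documents are: send to every shard
}

console.log(route({ eq: 250 }));           // one shard
console.log(route({ gte: 200, lt: 400 })); // range inside one chunk: one shard
console.log(route({}));                    // no shard key in the query: all shards
```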
  • 45. SHARDING IN MONGODB  Sharding and replication are used together:  Sharding focuses on improving performance via load balancing and horizontal scalability.  Replication focuses on ensuring system availability when certain nodes fail in the distributed system.
  • 47. WHY NOSQL?  In SQL, adding Description, Rate and Reviews requires altering the table.  NOSQL is flexible: no schema restrictions.
  • 48. WHY NOSQL?  SQL is restricted!  Fill the data
  • 49. WHY NOSQL? - USE CASES WHERE NOSQL WILL OUTPERFORM SQL  Agile - Flexibility for Faster Development
  • 50. WHY NOSQL? - USE CASES WHERE NOSQL WILL OUTPERFORM SQL  Agile - Flexibility for Faster Development
  • 51. WHY NOSQL? - USE CASES WHERE NOSQL WILL OUTPERFORM SQL  Agile - Simplicity for Easier Development
  • 52. WHY NOSQL? - USE CASES WHERE NOSQL WILL OUTPERFORM SQL  Agile - Simplicity for Easier Development  Reading this profile would require the application to read six rows from three tables.
  • 53. WHY NOSQL? - USE CASES WHERE NOSQL WILL OUTPERFORM SQL  Agile - Simplicity for Easier Development
  • 54. WHY NOSQL? - USE CASES WHERE NOSQL WILL OUTPERFORM SQL  Availability for Always-on
  • 55. WHY NOSQL? - USE CASES WHERE NOSQL WILL OUTPERFORM SQL  Availability for Always-on
  • 56. NOSQL CATEGORIES EXAMPLES - DOCUMENT-BASED NOSQL SYSTEMS  XML is stored in a native XML type.
  • 57. NOSQL CATEGORIES EXAMPLES - DOCUMENT-BASED NOSQL SYSTEMS  The query retrieves the <Features> child element of the <ProductDescription> element  Result:
  • 58. NOSQL CATEGORIES EXAMPLES - NOSQL KEY-VALUE STORES  Riak as an example
  • 59. NOSQL CATEGORIES EXAMPLES - NOSQL KEY-VALUE STORES  The response to a query is an object containing a list of documents that match the given query.  The documents returned are Search documents (sets of Solr field/value pairs).
  • 60. NOSQL CATEGORIES EXAMPLES - COLUMN NOSQL SYSTEMS  Cassandra as an example.  A query returns a result set of rows, where each row consists of a key and a collection of columns corresponding to the query.
  • 61. NOSQL CATEGORIES EXAMPLES - COLUMN NOSQL SYSTEMS  LOCAL_QUORUM is a consistency level.  It is used in clusters that span multiple data centers.  It maintains consistency locally: a quorum of the replicas within the local data center must respond.
  • 62. NOSQL CATEGORIES EXAMPLES - GRAPH-BASED NOSQL SYSTEMS  Neo4j as an example
  • 63. NOSQL CATEGORIES EXAMPLES - GRAPH-BASED NOSQL SYSTEMS
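The arithmetic behind a quorum is simple. With replication factor RF in a data center, a quorum is floor(RF / 2) + 1 replicas, a strict majority. The JavaScript sketch below computes whether a LOCAL_QUORUM request can succeed given how many local replicas are up. Replicas in remote data centers are deliberately ignored, which is what "local" means here. The function names and the two-data-center example are assumptions for illustration.

```javascript
// Quorum size for a given replication factor: a strict majority.
function quorum(replicationFactor) {
  return Math.floor(replicationFactor / 2) + 1;
}

// LOCAL_QUORUM only counts replicas in the coordinator's own data center.
function localQuorumSucceeds(localReplicationFactor, liveLocalReplicas) {
  return liveLocalReplicas >= quorum(localReplicationFactor);
}

// Example cluster: two data centers with replication factor 3 each.
const dcReplication = { dc1: 3, dc2: 3 };
console.log(quorum(dcReplication.dc1));                 // 2 of 3 local replicas
console.log(localQuorumSucceeds(dcReplication.dc1, 2)); // true: one local replica down is tolerated
console.log(localQuorumSucceeds(dcReplication.dc1, 1)); // false, even if all of dc2 is up
```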
  • 64. NOSQL CATEGORIES EXAMPLES - OBJECT DATABASES  LINQ as an example
  • 65. NOSQL KEY-VALUE STORES 1. Introduction 2. DynamoDB Overview 3. Voldemort Key-Value Distributed Data Store 4. Examples of Other Key-Value Stores
  • 66. 4.1 INTRODUCTION  No query language; instead, a set of operations that application programmers can use.  Characteristics:  Every value is associated with a unique key.  Retrieving the value by supplying the key is very fast.
  • 68. 4.2 DYNAMODB OVERVIEW  An Amazon product, part of AWS.  The data model uses the concepts of tables, items, and attributes.  A table does not have a schema.  It holds a collection of self-describing items.  Each item consists of a number of (attribute, value) pairs.  Attribute values can be single-valued or multivalued.
  • 69. 4.2 DYNAMODB OVERVIEW  Uploads an item to the ProductCatalog table
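An item is a set of (attribute, value) pairs. DynamoDB's low-level JSON format tags each value with its type: `S` for a string, `N` for a number (sent as a string), `BOOL` for a boolean, and `SS`/`NS` for string and number sets. The sets are how multivalued attributes are expressed. The hand-written `toAttributeValues` helper below is a hypothetical sketch of that conversion, not an AWS SDK function. The ProductCatalog item's fields are also assumed.

```javascript
// Convert one JS value into a DynamoDB-style typed attribute value.
// Supports only strings, numbers, booleans and non-empty sets.
function toAttributeValue(v) {
  if (typeof v === "string") return { S: v };
  if (typeof v === "number") return { N: String(v) };
  if (typeof v === "boolean") return { BOOL: v };
  if (v instanceof Set && v.size > 0) {
    const items = [...v];
    if (items.every((x) => typeof x === "string")) return { SS: items };
    if (items.every((x) => typeof x === "number")) return { NS: items.map(String) };
  }
  throw new Error(`unsupported value: ${v}`);
}

// Convert a whole item: every attribute name maps to a typed value.
function toAttributeValues(item) {
  return Object.fromEntries(Object.entries(item).map(([k, v]) => [k, toAttributeValue(v)]));
}

// Hypothetical ProductCatalog item: Authors is multivalued, the rest are single-valued.
const item = toAttributeValues({
  Id: 101,
  Title: "Book 101 Title",
  Authors: new Set(["Author1", "Author2"]),
  InPublication: true,
});
console.log(JSON.stringify(item));
```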
  • 70. 4.3 VOLDEMORT KEY-VALUE DISTRIBUTED DATA STORE  Based on the design of Amazon's Dynamo (the key-value store that preceded DynamoDB).  Used by LinkedIn.  Simple, basic set of operations (put, delete and get).  Pluggable storage engines, such as MySQL.  Nodes are independent.  Automatic replication and partitioning.
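Voldemort partitions and replicates data using consistent hashing. The JavaScript below is a simplified sketch of that idea. It assumes a toy hash function (32-bit FNV-1a) and one ring position per node; real systems use stronger hashing and many positions per node. Keys and nodes are hashed onto a ring. A key is stored on the first node clockwise from its position, and replicas go to the next nodes around the ring. The sketch also shows that adding a node moves only the keys the new node takes over. All names are hypothetical.

```javascript
// Toy 32-bit FNV-1a hash used to place both nodes and keys on the ring.
function hash(s) {
  let h = 0x811c9dc5;
  for (const ch of s) {
    h ^= ch.charCodeAt(0);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// Build a ring: node positions sorted by hash value.
function makeRing(nodes) {
  return nodes.map((n) => ({ node: n, pos: hash(n) })).sort((a, b) => a.pos - b.pos);
}

// First node at or after the key's position (wrapping around), followed by the
// next nodes clockwise as replicas, for `replicas` copies in total.
function preferenceList(ring, key, replicas) {
  const p = hash(key);
  let start = ring.findIndex((e) => e.pos >= p);
  if (start === -1) start = 0; // wrap around the ring
  const out = [];
  for (let i = 0; out.length < Math.min(replicas, ring.length); i++) {
    out.push(ring[(start + i) % ring.length].node);
  }
  return out;
}

// Minimal store: each node keeps its own Map; put/get/delete follow the preference list.
const nodes = ["A", "B", "C", "D"];
const storage = Object.fromEntries(nodes.map((n) => [n, new Map()]));
const ring = makeRing(nodes);
const put = (k, v) => preferenceList(ring, k, 2).forEach((n) => storage[n].set(k, v));
const get = (k) => storage[preferenceList(ring, k, 2)[0]].get(k);
const del = (k) => preferenceList(ring, k, 2).forEach((n) => storage[n].delete(k));

// Adding node "E": a key changes owner only if E now comes first on its arc.
const keys = Array.from({ length: 1000 }, (_, i) => `key${i}`);
const ringWithE = makeRing([...nodes, "E"]);
const moved = keys.filter((k) => preferenceList(ring, k, 1)[0] !== preferenceList(ringWithE, k, 1)[0]).length;
console.log(`${moved} of ${keys.length} keys changed owner`);
```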
  • 71. 4.3 VOLDEMORT KEY-VALUE DISTRIBUTED DATA STORE
  • 72. 4.4 EXAMPLES OF OTHER KEY-VALUE STORES 1. Oracle key-value store. 2. Redis key-value cache and store. 3. Apache Cassandra
  • 73. COLUMN-BASED OR WIDE COLUMN NOSQL SYSTEMS  Store data tables by column, grouped into column families, rather than as complete rows.
  • 74. HBASE DATA MODEL AND VERSIONING  Apache HBase is an open-source, distributed, versioned, non-relational database.  A column is identified by a combination of (column family:column qualifier).  HBase stores multiple versions of a data item, with a timestamp associated with each version.
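HBase's addressing can be pictured as nested maps: row key, then `family:qualifier`, then timestamp, then value. The JavaScript sketch below is an in-memory model of that idea. The class, its method names and the default of three kept versions are hypothetical; this is not the HBase client API. A read returns the newest version by default, while older versions stay available up to the version limit.

```javascript
// row key -> "family:qualifier" -> array of { ts, value }, newest first.
class VersionedTable {
  constructor(families, maxVersions = 3) {
    this.families = new Set(families); // column families are fixed when the table is created
    this.maxVersions = maxVersions;
    this.rows = new Map();
  }
  put(row, column, value, ts) {
    const [family] = column.split(":");
    if (!this.families.has(family)) throw new Error(`unknown column family ${family}`);
    if (!this.rows.has(row)) this.rows.set(row, new Map());
    const cells = this.rows.get(row);
    const versions = cells.get(column) ?? [];
    versions.push({ ts, value });
    versions.sort((a, b) => b.ts - a.ts); // newest first
    cells.set(column, versions.slice(0, this.maxVersions)); // keep only the newest versions
  }
  // Latest version, or the newest one at or before `asOf` if given.
  get(row, column, asOf = Infinity) {
    const versions = this.rows.get(row)?.get(column) ?? [];
    const v = versions.find((x) => x.ts <= asOf);
    return v ? v.value : undefined;
  }
}

const emp = new VersionedTable(["Name", "Details"]);
emp.put("row1", "Name:Fname", "John", 1);
emp.put("row1", "Details:Job", "Engineer", 1);
emp.put("row1", "Details:Job", "Manager", 5);
console.log(emp.get("row1", "Details:Job"));    // newest version
console.log(emp.get("row1", "Details:Job", 3)); // version as of timestamp 3
```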
  • 75. HBASE DATA MODEL AND VERSIONING
  • 76. HBASE DATA MODEL AND VERSIONING  Each table is divided into a number of regions, each holding a range of row keys (range partitioning).  Apache ZooKeeper is used for coordination and management, and Apache HDFS (Hadoop Distributed File System) provides distributed file storage.
  • 77. NOSQL GRAPH DATABASES AND NEO4J  The data is represented as a graph, which is a collection of vertices (nodes) and edges.  Nodes and edges can be labeled to indicate the types of entities and relationships they represent.  It is generally possible to store data associated with both individual nodes and individual edges.  Neo4j is an open-source NOSQL graph database implemented in Java.
  • 78. NEO4J  The data model in Neo4j organizes data using the concepts of nodes and relationships.  Nodes and relationships have properties which store the data items.  Nodes can have labels.  Nodes that have the same label are grouped into a collection that identifies a subset of the nodes in the database graph for querying purposes.  A node can have zero, one, or several labels.
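A labeled property graph like Neo4j's can be modelled in a few lines of JavaScript. Nodes carry label sets and properties, and typed relationships carry their own properties. The data (employees, a project, WORKS_ON and MANAGES relationships) and the helper names are hypothetical. In Neo4j itself these would be created and queried with Cypher rather than with code like this.

```javascript
// Nodes: id -> { labels, props }. A node may have zero, one, or several labels.
const nodes = new Map([
  [1, { labels: new Set(["EMPLOYEE"]), props: { Ename: "John Smith" } }],
  [2, { labels: new Set(["EMPLOYEE", "MANAGER"]), props: { Ename: "Franklin Wong" } }],
  [3, { labels: new Set(["PROJECT"]), props: { Pname: "ProjectX" } }],
]);

// Relationships are typed and directed, and can carry properties too.
const rels = [
  { from: 1, to: 3, type: "WORKS_ON", props: { Hours: 32.5 } },
  { from: 2, to: 3, type: "WORKS_ON", props: { Hours: 10.0 } },
  { from: 2, to: 1, type: "MANAGES", props: {} },
];

// All nodes carrying a label: the label names a subset of the graph for querying.
const withLabel = (label) => [...nodes].filter(([, n]) => n.labels.has(label)).map(([id]) => id);

// Follow outgoing relationships of one type from a start node.
const neighbours = (id, type) => rels.filter((r) => r.from === id && r.type === type).map((r) => r.to);

// Two-hop traversal: projects of `id`, then the other employees working on them.
function coWorkers(id) {
  const projects = neighbours(id, "WORKS_ON");
  return rels
    .filter((r) => r.type === "WORKS_ON" && projects.includes(r.to) && r.from !== id)
    .map((r) => nodes.get(r.from).props.Ename);
}

console.log(withLabel("EMPLOYEE"), coWorkers(1));
```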
  • 79. NEO4J
  • 80. NEO4J