NoSQL
Presented By: Nusrat Sharmin
What is NoSQL?
• Stands for "Not Only SQL"
  - implies that, when designing a software solution or product, more than one storage mechanism can be used, depending on the needs
• A class of non-relational data storage systems
• Usually does not require a fixed table schema (that is, it is schema-less) and does not use the concept of joins
• Runs well on clusters
• Mostly open-source, distributed, and built for 21st-century web estates
• Designed to cope with the scale and agility challenges that modern applications face
• Built to take advantage of the cheap storage and processing power available today
Why NoSQL Databases?
• Allow developers to work without having to convert in-memory structures to relational structures
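The conversion mentioned on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: the `order` structure, the two table layouts, and the helper names `to_relational_rows` and `to_aggregate` are hypothetical and do not come from the slides.

```python
import json

# Hypothetical in-memory structure: an order with nested line items.
order = {
    "id": 1001,
    "customer": "Ann",
    "items": [
        {"sku": "A-1", "qty": 2},
        {"sku": "B-7", "qty": 1},
    ],
}


def to_relational_rows(order):
    """Flatten the nested order into rows for two assumed tables: orders and order_items."""
    order_row = (order["id"], order["customer"])
    item_rows = [(order["id"], item["sku"], item["qty"]) for item in order["items"]]
    return order_row, item_rows


def to_aggregate(order):
    """Serialize the same structure as a single JSON value; no mapping step is needed."""
    return json.dumps(order)


order_row, item_rows = to_relational_rows(order)
stored = to_aggregate(order)
restored = json.loads(stored)  # the nested shape comes back unchanged
```

The relational path needs a mapping in both directions, while the aggregate path stores and restores the structure as it is.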
Why NoSQL Databases?
• A movement away from using databases as integration points, in favor of encapsulating databases within applications and integrating through services
• The rise of the web as a platform also brought a vital change in data storage
  - the need to support large volumes of data by running on clusters
• Relational databases were not designed to run on clusters
  - for example, the data storage needs of an ERP application are very different from those of a Facebook or an Etsy
Data Models of NoSQL
• A data model is a set of constructs for representing information
  - Relational model: tables, columns and rows
• Storage model: how the DBMS stores and manipulates the data internally
• A data model is usually independent of the storage model
• Data models for NoSQL systems
  - Aggregate data models
    - key-value
    - document
    - column-family
  - Distribution models
Aggregate Data Models
• Data as units that have a complex structure
  - more structure than just a set of tuples
  - example: a complex record with simple fields, arrays, and records nested inside
• Aggregate in Domain-Driven Design
  - a collection of related objects that we treat as a unit
  - a unit for data manipulation and management of consistency
• Advantages of aggregates:
  - easier for application programmers to work with
  - easier for database systems to handle when operating on a cluster
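A minimal sketch of an aggregate treated as one unit, assuming a hypothetical `customer` record and an in-memory dictionary standing in for an aggregate-oriented database. The names `store`, `save` and `load` are illustrative only.

```python
import copy

# Hypothetical aggregate: simple fields, an array, and nested records.
customer = {
    "id": "c42",
    "name": "Ann",
    "tags": ["premium", "newsletter"],
    "billing_address": {"city": "Dhaka", "zip": "1207"},
    "orders": [{"order_id": 1, "total": 30.0}],
}

store = {}  # stand-in for an aggregate-oriented database


def save(aggregate):
    """Write the whole aggregate under its id as one unit."""
    store[aggregate["id"]] = copy.deepcopy(aggregate)


def load(aggregate_id):
    """Read the whole aggregate back as one unit."""
    return copy.deepcopy(store[aggregate_id])


save(customer)

# Manipulate the aggregate in memory, then write it back in one step.
working = load("c42")
working["orders"].append({"order_id": 2, "total": 12.5})
working["billing_address"]["city"] = "Chittagong"
before_save = load("c42")  # the stored copy is unchanged until save() is called
save(working)
after_save = load("c42")
```

Because reads and writes always move the whole record, the aggregate is also the natural boundary for consistency: a reader sees either the old version or the new one, never a mix.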
Distribution Models
• Aggregate-oriented databases make distribution of data easier
  - the distribution mechanism only has to move the aggregate, which contains all the related data
• There are two styles of distributing data
  - Sharding
    - distributes different data across multiple servers
    - each server acts as the single source for a subset of the data
  - Replication
    - copies data across multiple servers, so each bit of data can be found in multiple places
    - comes in two forms
      - Master-slave replication makes one node the authoritative copy that handles writes, while slaves synchronize with the master and may handle reads
        - reduces the chance of update conflicts
      - Peer-to-peer replication allows writes to any node; the nodes coordinate to synchronize their copies of the data
        - avoids loading all writes onto a single server, which would create a single point of failure
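The two distribution styles can be sketched with in-memory dictionaries. This is a simplified illustration, not how any particular product works: hashing the key is just one possible sharding rule, and the class names `ShardedStore` and `MasterSlaveStore` are hypothetical.

```python
import hashlib


def shard_for(key, num_shards):
    """Pick a shard from a stable hash of the key (one possible sharding rule)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Sharding: each shard is the single source for its subset of keys."""

    def __init__(self, num_shards):
        self.shards = [{} for _ in range(num_shards)]

    def put(self, key, value):
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key):
        return self.shards[shard_for(key, len(self.shards))].get(key)


class MasterSlaveStore:
    """Master-slave replication: writes go to the master, reads may use a slave."""

    def __init__(self, num_slaves):
        self.master = {}
        self.slaves = [{} for _ in range(num_slaves)]

    def write(self, key, value):
        self.master[key] = value  # only the master accepts writes

    def synchronize(self):
        for slave in self.slaves:  # each slave copies the master's state
            slave.clear()
            slave.update(self.master)

    def read(self, key, slave_index=0):
        return self.slaves[slave_index].get(key)


sharded = ShardedStore(num_shards=3)
for k in ["user:1", "user:2", "user:3", "user:4"]:
    sharded.put(k, k.upper())

replicated = MasterSlaveStore(num_slaves=2)
replicated.write("user:1", "Ann")
stale = replicated.read("user:1")  # None: the slave has not synchronized yet
replicated.synchronize()
fresh = replicated.read("user:1", slave_index=1)
```

The read that happens before `synchronize()` shows why reading from slaves can return out-of-date data, which ties into the consistency discussion on the CAP slides.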
CAP Theorem
• Proposed by Eric Brewer (keynote at the Symposium on Principles of Distributed Computing, July 2000)
• Three properties of a system: consistency, availability and partition tolerance
• A shared-data system can have at most two of these three properties
• To scale out, you have to partition. That leaves either consistency or availability to choose from
• In almost all cases, you would choose availability over consistency
[Figure: CAP triangle with corners Consistency, Availability and Partition tolerance]
CAP Theorem: Consistency
• Once a writer has written, all readers will see that write
• Two kinds of consistency:
  - strong consistency: ACID (Atomicity, Consistency, Isolation, Durability)
  - weak consistency: BASE (Basically Available, Soft state, Eventual consistency)
CAP Theorem: Availability
• The system is available during software and hardware upgrades and node failures
• Traditionally thought of as the server/process being available "five 9's" of the time (99.999%)
• However, for a system with a large number of nodes, at almost any point in time there is a good chance that a node is down or that there is a network disruption among the nodes
• Want a system that is resilient in the face of network disruption
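The availability figures on this slide can be made concrete with a little arithmetic. The sketch below assumes that nodes fail independently and uses an illustrative per-node availability of 99.9% and a cluster of 1000 nodes; neither number comes from the slides.

```python
def downtime_minutes_per_year(availability):
    """Allowed downtime per 365-day year for a given availability fraction."""
    minutes_per_year = 365 * 24 * 60
    return (1.0 - availability) * minutes_per_year


def prob_all_nodes_up(per_node_availability, num_nodes):
    """Probability that every node is up at once, assuming independent failures."""
    return per_node_availability ** num_nodes


five_nines = downtime_minutes_per_year(0.99999)  # about 5.3 minutes per year
all_up_1000 = prob_all_nodes_up(0.999, 1000)     # about 0.37 (assumed figures)
```

Under these assumptions, a cluster of 1000 nodes has all of its nodes up only about 37% of the time, which is why a large system has to tolerate failures instead of counting on their absence.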
CAP Theorem: Partition tolerance
• A system can continue to operate in the presence of network partitions
CAP Theorem
• Theorem: a shared-data system can have at most two of these three properties
Types of NoSQL Databases
• Key-value, or "the big hash table"
• Schema-less
  - Column-based
  - Document-based
  - Graph-based
Key-Value databases
• The simplest NoSQL data stores to use from an API perspective
• The client can
  - get the value for a key
  - put a value for a key
  - delete a key from the data store
• The data store just stores the value as a blob, without caring what is inside
• Can store whatever you like in the aggregate
• Can only access an aggregate by lookup based on its key
• Examples: Riak, Redis, Memcached, Berkeley DB, HamsterDB, Amazon DynamoDB (not open-source), Project Voldemort and Couchbase
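The three operations can be sketched with a small in-memory class. This is a toy stand-in, not the API of any of the products listed above; the class name `KeyValueStore` and the sample key are hypothetical.

```python
import json


class KeyValueStore:
    """Toy key-value store: the value is an opaque blob of bytes."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value  # stored as-is; the store never looks inside

    def get(self, key):
        return self._data.get(key)  # None if the key is absent

    def delete(self, key):
        self._data.pop(key, None)


kv = KeyValueStore()

# The application decides how to encode the aggregate; here it is JSON bytes.
blob = json.dumps({"name": "Ann", "cart": ["A-1", "B-7"]}).encode("utf-8")
kv.put("session:42", blob)

fetched = json.loads(kv.get("session:42").decode("utf-8"))
kv.delete("session:42")
missing = kv.get("session:42")
```

Since the store cannot see inside the blob, the only way to reach the cart is to fetch the whole value by its key and decode it in the application.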
Document databases
• The main concept is the "document"
• The database stores and retrieves documents, which can be XML, JSON, BSON and so on
• Documents are
  - self-describing
  - hierarchical tree data structures that can consist of maps, collections and scalar values
• Stored documents are similar to each other but do not have to be exactly the same
• Documents are stored in the "value" part of a key-value store, where the value is examinable
• Examples: MongoDB, CouchDB, Terrastore, OrientDB, RavenDB
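What "examinable" values mean in practice can be sketched as follows. The `DocumentStore` class and its `find` method are hypothetical and only mimic the idea; real document databases have their own query languages.

```python
class DocumentStore:
    """Toy document store: values are documents whose fields can be queried."""

    def __init__(self):
        self._docs = {}

    def insert(self, doc_id, document):
        self._docs[doc_id] = document

    def find(self, **criteria):
        """Return ids of documents whose top-level fields match all criteria."""
        return [
            doc_id
            for doc_id, doc in self._docs.items()
            if all(doc.get(field) == value for field, value in criteria.items())
        ]


db = DocumentStore()

# Similar documents that do not share exactly the same fields.
db.insert("u1", {"name": "Ann", "city": "Dhaka", "likes": ["tea", "chess"]})
db.insert("u2", {"name": "Bob", "city": "Dhaka"})
db.insert("u3", {"name": "Cy", "address": {"city": "Pune"}, "age": 31})

in_dhaka = db.find(city="Dhaka")
thirty_one = db.find(age=31)
```

Unlike a plain key-value store, the query looks inside each value, and documents that lack a field are simply skipped instead of causing a schema error.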
Column family stores
• Store data in column families, as rows that have many columns associated with a row key
• Column families are groups of related data that is often accessed together
• Different rows do not have to have the same columns
• Columns can be added to any row at any time without having to add them to other rows
• Examples: Cassandra, HBase, Hypertable, Amazon DynamoDB
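The row-key and column-family layout can be pictured as nested maps. The sketch below is a simplification with hypothetical names (`ColumnFamilyStore`, the `profile` and `orders` families); it ignores timestamps, sorting and on-disk layout that real column family stores add.

```python
class ColumnFamilyStore:
    """Toy layout: row key -> column family -> column name -> value."""

    def __init__(self):
        self._rows = {}

    def put(self, row_key, family, column, value):
        families = self._rows.setdefault(row_key, {})
        families.setdefault(family, {})[column] = value

    def get_family(self, row_key, family):
        """Return all columns of one family for one row (data accessed together)."""
        return dict(self._rows.get(row_key, {}).get(family, {}))


cf = ColumnFamilyStore()

# Two rows in the same families, with different sets of columns.
cf.put("ann", "profile", "name", "Ann")
cf.put("ann", "profile", "city", "Dhaka")
cf.put("ann", "orders", "order-1", "30.00")
cf.put("bob", "profile", "name", "Bob")

# A column can be added to one row at any time without touching other rows.
cf.put("bob", "profile", "twitter", "@bob")

ann_profile = cf.get_family("ann", "profile")
bob_profile = cf.get_family("bob", "profile")
```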
Graph stores
• Allow you to store entities and the relationships between these entities
• Entities are also known as nodes
  - a node can be an instance of an object in the application
• Relations are known as edges
• Nodes are organized by relationships
  - allows you to find interesting patterns between the nodes
• In an RDBMS, complex relationships require complex joins
• When storing a graph-like structure in a relational database, the graph has to be modeled beforehand, based on the traversal needed
• Changing the traversal means changing the data (data movement)
Graph stores
• Traversing the joins or relationships in a graph database is very fast
• Nodes can have different types of relationships
• The value of a graph database is derived from its relationships
• Relationships have not only a type but also
  - a start node and
  - an end node
• Adding new relationship types is easy
• Changing existing nodes and relationships is similar to data migration
• Examples: Neo4j, InfiniteGraph, OrientDB, FlockDB
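Nodes, typed edges with a start and an end node, and a traversal can be sketched in plain Python. The `Graph` class and the sample relationship types are hypothetical; graph databases offer their own query languages for this.

```python
from collections import deque


class Graph:
    """Toy graph store: nodes with properties and typed, directed edges."""

    def __init__(self):
        self.nodes = {}  # node id -> properties
        self.edges = []  # (start node, relationship type, end node)

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = properties

    def relate(self, start, rel_type, end):
        self.edges.append((start, rel_type, end))

    def neighbors(self, node_id, rel_type):
        return [end for start, rel, end in self.edges
                if start == node_id and rel == rel_type]

    def reachable(self, start, rel_type):
        """Breadth-first traversal along one relationship type."""
        seen, queue, order = {start}, deque([start]), []
        while queue:
            for nxt in self.neighbors(queue.popleft(), rel_type):
                if nxt not in seen:
                    seen.add(nxt)
                    order.append(nxt)
                    queue.append(nxt)
        return order


g = Graph()
for person in ["ann", "bob", "cy", "dee"]:
    g.add_node(person, kind="person")
g.relate("ann", "FRIEND", "bob")
g.relate("bob", "FRIEND", "cy")
g.relate("ann", "MANAGES", "dee")  # a new relationship type needs no schema change

friends_of_friends = g.reachable("ann", "FRIEND")
reports = g.neighbors("ann", "MANAGES")
```

Adding the `MANAGES` relationship is just another edge, and a different traversal is just a different argument; nothing about the stored nodes has to change.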
Key/Value Vs. Schema-less
Key/Value
• Pros:
  - very fast
  - very scalable
  - simple model
  - able to distribute horizontally
• Cons:
  - many data structures (objects) can't be easily modeled as key-value pairs
Schema-less
• Pros:
  - the schema-less data model is richer than key/value pairs
  - eventual consistency
  - many are distributed
  - still provide excellent performance and scalability
• Cons:
  - typically no ACID transactions or joins
SQL Vs. NoSQL
Topics | SQL | NoSQL
Types | One type: SQL database (with minor variations) | Many different types: key-value stores, document databases, column stores, graph databases
Development history | Developed in the 1970s | Developed in the 2000s
Developed to deal with | The first wave of data storage applications | Limitations of SQL databases, particularly concerning scale, replication and unstructured data storage
Examples | MySQL, Postgres, Oracle | MongoDB, Cassandra, HBase, Neo4j
Data storage model | Individual records are stored as rows in tables, with columns, much like a spreadsheet. Separate data is stored in separate tables and combined with join operations when querying | Varies by database type. For example, key-value stores function similarly to SQL databases but have only two columns, "key" and "value", with more complex information sometimes stored in the "value". Document databases do away with the table-and-row model, storing all relevant data together in a single document in JSON, XML or a similar format
SQL Vs. NoSQL (continued)
Topics | SQL | NoSQL
Schemas | Predefined, i.e. structure and data types are fixed | Dynamic. Unlike SQL, can store dissimilar data if necessary
Scaling | Vertically, i.e. a single server must be made increasingly powerful. Spreading a SQL database over many servers requires additional engineering | Horizontally, i.e. to add capacity, a database administrator can simply add more commodity servers or cloud instances
Sharding | Manual sharding | Auto-sharding
Development model | Mix of open-source (e.g. Postgres, MySQL) and closed-source (e.g. Oracle) | Mostly open-source
Supports transactions | Yes; updates can be configured to complete entirely or not at all | In certain circumstances and at certain levels (e.g. document level vs. database level)
Data manipulation | A specific language using SELECT, INSERT and UPDATE statements, e.g. SELECT fields FROM table WHERE ... | Through object-oriented APIs
Consistency | Strong consistency | Depends on the product. Some provide strong consistency (e.g. MongoDB), whereas others offer eventual consistency (e.g. Cassandra)
Handling Relational Data
• NoSQL databases lack the ability to do joins in queries
• Three main techniques for handling relational data
  - Multiple queries
    - instead of retrieving all the data with one query, it is acceptable to do several queries
  - Caching/replication/non-normalized data
    - instead of storing only foreign keys, it is common to store the actual foreign values along with the model's data
  - Nesting data
    - put more data in a smaller number of collections, so that a single document can contain all the data needed for a specific task
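The three techniques can be compared side by side using plain dictionaries as stand-ins for collections. All collection names, fields and sample data below are hypothetical.

```python
# Stand-in collections (plain dictionaries keyed by id).
users = {"u1": {"name": "Ann"}, "u2": {"name": "Bob"}}
posts = {"p1": {"title": "Hello", "author_id": "u1"}}


# 1. Multiple queries: fetch the post, then fetch its author separately.
def post_with_author(post_id):
    post = posts[post_id]              # first query
    author = users[post["author_id"]]  # second query
    return {"title": post["title"], "author": author["name"]}


# 2. Non-normalized data: keep a copy of the foreign value next to the key.
posts_denormalized = {
    "p1": {"title": "Hello", "author_id": "u1", "author_name": "Ann"},
}

# 3. Nesting: one document holds everything needed to render the post.
posts_nested = {
    "p1": {
        "title": "Hello",
        "author": {"name": "Ann"},
        "comments": [{"by": "Bob", "text": "Nice"}],
    },
}

joined = post_with_author("p1")
copied = posts_denormalized["p1"]["author_name"]
nested = posts_nested["p1"]["comments"][0]["by"]
```

The second and third variants trade extra storage and update work (a renamed author must be changed in every copy) for reads that need only one lookup.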
Benefits of NoSQL
• Cheap and easy to implement (open source)
• Data is replicated to multiple nodes (therefore identical and fault-tolerant) and can be partitioned
  - down nodes are easily replaced
  - no single point of failure
• Easy to distribute
• Does not require a schema
• Can scale up and down
• Relaxes the data consistency requirement (CAP)
Conclusion
• NoSQL databases do not mean the demise of RDBMS databases
• NoSQL can improve programmer productivity
• NoSQL can improve data access performance via some combination of
  - handling larger data volumes
  - reducing latency
  - improving throughput
• We are entering an era of "polyglot persistence"
  - a technique that uses different data storage technologies to handle varying data storage needs
  - can apply across an enterprise or within a single application
Q & A

  • 2. What is NoSQL?  Stands for Not Only SQL implying that when designing a software solution or product there are more than one storage mechanism that could be used based on the needs  Class of non-relational data storage systems  Usually do not require fixed table schema that is schema-less nor do they use concept of joins  Running well on clusters  Mostly open-source, distributed, & built for 21st web estates  Designed to cope up with the scale & agility challenges that face modern applications  Built to take advantage of the cheap storage & processing power available today
  • 3. Why NoSQL Databases?  Allows developers to develop without having to convert in-memory structures to relational structures
  • 4. Why NoSQL Databases?  Using databases as  integration points in favor of encapsulating databases with applications & integrating using services The rise of the web as a platform also created a vital factor change in data storage  need to support large volumes of data by running on clusters Relational databases were not designed to run on clusters  for example the data storage for ERP application are lot more different than data storage needs of a Facebook or an Etsy
  • 5. Data Models of NoSQL  A data model is a set of constructs for representing the information Relational model: tables, columns & rows Storage model: how the DBMS stores & manipulates the data internally  A data model is usually independent of the storage model  Data models for NoSQL systems  Aggregate Data Models  key-value  document  column-family  Distribution Models
  • 6. Aggregate Data Models  Data as units that have a complex structure  more structure than just a set of tuples  example:  complex record with: simple fields, arrays, records nested inside  Aggregate in Domain-Driven Design  a collection of related objects that we treat as unit  a unit for data manipulation and management of consistency  Advantages of aggregates:  easier for application programmers to work with  easier for database systems to handle operating on cluster
  • 7. Distribution Models  Aggregate oriented databases make distribution of data easier  the distribution mechanism has to move the aggregate that contained all the related data in the aggregate  There are two styles of distributing data  Sharding  distributes different data across multiple servers  each server acts as the single source for a subset of data  Replication  copies data in multiple servers, so each bit of data can be found in multiple places  comes in two forms  Master-slave replication makes one node the authoritative copy that handles writes while slaves synchronize with the master and may handle reads  reduces the chance of update conflicts  Peer-to-peer replication allows writes to any node that nodes coordinate to synchronize their copies of the data  avoids loading all writes onto a single server creating a single point of failure
  • 8. CAP Theorem  Proposed by Eric Brewer (talk on Principles of Distributed Computing July 2000)  Three properties of a system: consistency, availability and partitions  Can have at most two of these three properties for any shared- data system  To scale out, partition will need. That leaves either consistency or availability to choose from  In almost all cases, choose availability over consistency Consistency Partition tolerance Availability
  • 9. CAP Theorem  Once a writer has written, all readers will see that write  Two kinds of consistency:  strong consistency – ACID(Atomicity Consistency Isolation Durability)  weak consistency – BASE(Basically Available Soft-state Eventual consistency ) Consistency Partition tolerance Availability
  • 10. CAP Theorem  System is available during software & hardware upgrades & node failures  Traditionally, thought of as the server/process available five 9’s (99.999 %)  However, for large node system, at almost any point in time there’s a good chance that a node is either down or there is a network disruption among the nodes  Want a system that is resilient in the face of network disruption Consistency Partition tolerance Availability
  • 11. CAP Theorem  A system can continue to operate in the presence of a network partitions Consistency Partition tolerance Availability
  • 12. CAP Theorem  Theorem: Can have at most two of these properties for any shared-data system Consistency Partition tolerance Availability
  • 13. Types of NoSQL Databases NoSQL Key-Value or ‘the big hash table’ Schema-less Column-based Document-based Graph-based
  • 14. Key-Value databases  Simplest NoSQL data stores to use from an API perspective  The client can  either get the value for the key  put a value for a key  or delete a key from the data store  The data stores just store the value is blob without caring what is inside  Can store whatever like in the aggregate  Can only access an aggregate by lookup based on its key  Examples: Riak, Redis, Memcached, Berkely DB, HamsterDB, Amazon DynamoDB (not open-source), Project Voldemort & Couchbase
  • 15. Document databases  Main concept are – ‘Documents’  Database stores & retrieves documents which can be  XML, JSON, BSON and so on  Documents are  Self-describing  Hierarchical tree data structures that can consist of maps, collections & scalar values  Documents are stored similar to each other but do not have to be exactly the same  Store documents in the ‘value’  i.e. part of the key-value store where the values are examinable  Example: MongoDB, CouchDB, Terrastore, OrientDB, RavenDB
  • 16. Column family stores  Store data in column families as rows that have many columns associated with a row key  Column families are group of related data that is often accessed together  Various rows do not have the same columns  Columns can be added to any rows at any time without having to add it to other rows  Example: Cassandra, Hbase, Hypertable, Amazon DynamoDB
  • 17. Graph stores Allows to store entities & relationships between these entities Entities are also known as nodes  can be an instance of an object in the application Relations are known as edges Nodes are organized by relationships  allows you to find interesting patterns between the nodes  complex relationship requires complex join  Like storing a graph like structure in RDBMS in relational databases model the graph beforehand the traversal need.  Traversal will change the data movement
  • 18. Graph stores  In database traversing  the joins or relationships are very fast  Nodes can have  different types of relationships  Value of the graph databases  derived from the relationships  Relationships don’t only have a type but also  a start node &  an end node  Adding new relationship types is easy  Changing existing nodes & relationships are similar to  data migration  Example : Neo4J, Infinite Graph, OrientDB or FlockDB
  • 19. Key/Value Vs. Schema-less Key/Value  Pros: very fast very scalable simple model able to distribute horizontally  Cons: many data structures (objects) can’t be easily modeled as key value pairs Schema-less  Pros: Schema-less data model is richer than key/value pairs eventual consistency many are distributed still provide excellent performance and scalability  Cons: typically no ACID transactions or joins
  • 20. SQL Vs. NoSQL
      Types. SQL: One type (SQL database) with minor variations. NoSQL: Many different types: key-value, document, column-family and graph databases.
      Development history. SQL: Developed in the 1970s to deal with the first wave of data storage applications. NoSQL: Developed in the 2000s to deal with limitations of SQL databases, particularly concerning scale, replication & unstructured data storage.
      Examples. SQL: MySQL, Postgres, Oracle. NoSQL: MongoDB, Cassandra, HBase, Neo4J.
      Data storage model. SQL: Individual records are stored as rows in tables with columns, much like a spreadsheet. Separate data is stored in separate tables & join operations are used for querying the data. NoSQL: Varies by database type. For example, key-value stores function similarly to SQL databases but have only two columns, ‘key’ & ‘value’, with more information sometimes stored in the ‘value’. Document databases do away with the table & row model, storing all relevant data together in a single document in JSON, XML etc.
  • 21. SQL Vs. NoSQL
      Schemas. SQL: Predefined, i.e. structure & data types are fixed. NoSQL: Dynamic. Unlike SQL, can store dissimilar data if necessary.
      Scaling. SQL: Vertically, i.e. a single server must be made increasingly powerful. Spreading a SQL database over many servers requires additional engineering. NoSQL: Horizontally, i.e. to add capacity, a database administrator can simply add more commodity servers or cloud instances.
      Sharding. SQL: Manual sharding. NoSQL: Auto-sharding.
      Development model. SQL: Mix of open source (e.g. Postgres, MySQL) and closed source (e.g. Oracle). NoSQL: Open source.
      Supports transactions. SQL: Yes, updates can be configured to complete entirely or not at all. NoSQL: In certain circumstances and at certain levels (e.g. document level vs. database level).
      Data manipulation. SQL: Specific language using SELECT, INSERT & UPDATE statements, e.g. SELECT fields FROM table WHERE ... NoSQL: Object-oriented APIs.
      Consistency. SQL: Strong consistency. NoSQL: Depends on the product. Some provide strong consistency (e.g. MongoDB) whereas others offer eventual consistency (e.g. Cassandra).
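The schema and data-manipulation rows of the comparison can be illustrated with a small sketch. It uses Python's built-in `sqlite3` module as a stand-in for "a SQL database" and a plain list of dictionaries as a stand-in for a schema-less document collection; the `users` table, its columns and the sample data are hypothetical.

```python
import sqlite3

# SQL side: the structure is declared up front.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (id, name) VALUES (?, ?)", (1, "Ann"))

# Inserting a field the schema does not know about is rejected.
try:
    conn.execute(
        "INSERT INTO users (id, name, twitter) VALUES (?, ?, ?)",
        (2, "Raj", "@raj"),
    )
    sql_accepted_new_field = True
except sqlite3.OperationalError:
    sql_accepted_new_field = False

# Data manipulation goes through the query language.
rows = conn.execute("SELECT name FROM users WHERE id = ?", (1,)).fetchall()

# Schema-less side: dissimilar records sit next to each other,
# and access goes through ordinary object/collection operations.
users = [
    {"id": 1, "name": "Ann"},
    {"id": 2, "name": "Raj", "twitter": "@raj"},
]
names_with_twitter = [u["name"] for u in users if "twitter" in u]
```

In the SQL case the new column would first have to be added with a schema change; in the schema-less case the second record simply carries an extra field.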
  • 22. Handling Relational Data  Most NoSQL databases lack the ability to do joins in queries  Three main techniques for handling relational data  Multiple queries  instead of retrieving all data with one query, it’s acceptable to do several queries  Caching/replication/non-normalized data  instead of storing only foreign keys, it’s common to store the actual foreign values along with the model’s data  Nesting data  put more data in a smaller number of collections so that a single document contains all the data needed for a specific task
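The three techniques can be sketched side by side, assuming a hypothetical blogging application with users, posts and comments. Plain Python dictionaries and lists stand in for collections; every name below is made up for illustration.

```python
users = {"u1": {"username": "ann"}, "u2": {"username": "raj"}}

# 1. Multiple queries: comments hold only a user id (a "foreign key").
comments_by_ref = [
    {"post": "p1", "user_id": "u1", "text": "Nice"},
    {"post": "p1", "user_id": "u2", "text": "Agreed"},
]


def usernames_multi_query(post_id):
    found = [c for c in comments_by_ref if c["post"] == post_id]  # first query
    return [users[c["user_id"]]["username"] for c in found]       # one lookup per comment


# 2. Non-normalized data: the username is copied into each comment.
comments_denorm = [
    {"post": "p1", "user_id": "u1", "username": "ann", "text": "Nice"},
    {"post": "p1", "user_id": "u2", "username": "raj", "text": "Agreed"},
]


def rename_user(user_id, new_name):
    """Renaming must touch every copy; returns how many comments were updated."""
    users[user_id]["username"] = new_name
    touched = 0
    for comment in comments_denorm:
        if comment["user_id"] == user_id:
            comment["username"] = new_name
            touched += 1
    return touched


# 3. Nesting: comments live inside the post document.
posts = {
    "p1": {
        "title": "Hello NoSQL",
        "comments": [
            {"username": "ann", "text": "Nice"},
            {"username": "raj", "text": "Agreed"},
        ],
    }
}


def post_with_comments(post_id):
    return posts[post_id]  # a single retrieval returns everything needed
```

The trade-off shows up in `rename_user`: reads of the copied username need no second lookup, but a change has to be applied in every place the value was copied.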
  • 23. Benefits of NoSQL  Cheap & easy to implement (open source)  Data is replicated to multiple nodes (therefore identical & fault tolerant) and can be partitioned  Down nodes are easily replaced  No single point of failure  Easy to distribute  Doesn’t require a schema  Can scale up and down  Relaxes the data consistency requirement (CAP)
  • 24. Conclusion  NoSQL databases don’t mean  the demise of RDBMS databases  NoSQL databases aim to  improve programmer productivity  improve data access performance via some combination of  handling larger data volumes  reducing latency  improving throughput  Entering an era of ‘Polyglot Persistence’  a technique that uses different data storage technologies to handle varying data storage needs  can apply across an enterprise or within a single application
  • 26. Q & A

Editor's Notes

  1. - NoSQL encompasses a wide variety of database technologies that were developed in response to a rise in the volume of data stored about users, objects & products, the frequency with which this data is accessed, and performance and processing needs - NoSQL was a hashtag (#nosql) chosen for a meetup to discuss these new databases - Unlike RDBMS, NoSQL databases are designed to cope with the scale & agility challenges that face modern applications & are built to take advantage of the cheap storage & processing power available today
  2. Application developers have been frustrated by the impedance mismatch between relational data structures and the in-memory data structures of the application. Using NoSQL databases allows developers to develop without having to convert in-memory structures to relational structures.
  3. - Currently there is a movement away from using databases as integration points and towards encapsulating databases with applications and integrating using services - Web-based data is growing day by day, a major change in data storage that requires managing large volumes of data on clusters - Relational databases were not designed to run on clusters
  4. - RDBMS modeling is vastly different from the types of data structures that application developers use. - Using the data structures as modeled by the developers to solve different problem domains has given rise to a movement away from relational modeling and towards aggregate models; most of this is driven by Domain-Driven Design
  5. -> Proposed by Eric Brewer (talk at Principles of Distributed Computing, July 2000). -> Partitionability: nodes are divided into small groups that can see some other groups but cannot see everyone. -> Consistency: if you write a value and then read it, you get the same value back. In a partitioned system there are windows where that's not true. -> Availability: you may not always be able to write or read. The system will say you can't write because it wants to keep the system consistent. -> To scale you have to partition, so you are left with choosing either high consistency or high availability for a particular system. You must find the right overlap of availability and consistency. -> Choose a specific approach based on the needs of the service. -> For the checkout process you always want to honor requests to add items to a shopping cart because it's revenue producing. In this case you choose high availability. Errors are hidden from the customer and sorted out later. -> When a customer submits an order you favor consistency because several services (credit card processing, shipping and handling, reporting) are simultaneously accessing the data.
  6. Consistency: if you write a value and then read it, you get the same value back. In a partitioned system there are windows where that's not true. A consistency model determines the rules for visibility and apparent order of updates. For example: row X is replicated on nodes M and N. Client A writes row X to node N. Some period of time t elapses. Client B reads row X from node M. Does client B see the write from client A? Consistency is a continuum with tradeoffs. For NoSQL, the answer would be: maybe. The CAP theorem states that strict consistency can't be achieved at the same time as availability and partition tolerance.
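The "row X on nodes M and N" scenario can be simulated with a toy model in which a write lands on one replica immediately and reaches the others only when queued updates are delivered. The `Cluster` and `Replica` classes and the explicit `replicate()` step are assumptions made for illustration; real systems propagate updates in the background and offer tunable guarantees.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.data = {}


class Cluster:
    """Toy asynchronous replication: writes are queued for the other replicas."""

    def __init__(self, *names):
        self.replicas = {name: Replica(name) for name in names}
        self.pending = []  # (target replica, key, value) not yet delivered

    def write(self, node, key, value):
        self.replicas[node].data[key] = value  # applied locally at once
        for name in self.replicas:
            if name != node:
                self.pending.append((name, key, value))

    def read(self, node, key):
        return self.replicas[node].data.get(key)

    def replicate(self):
        """Deliver all queued updates (stands in for 'time t elapses')."""
        for name, key, value in self.pending:
            self.replicas[name].data[key] = value
        self.pending.clear()


cluster = Cluster("M", "N")
cluster.write("N", "X", "v1")      # client A writes row X to node N
before = cluster.read("M", "X")    # client B reads from node M before replication
cluster.replicate()
after = cluster.read("M", "X")     # client B reads again after replication
```

Whether client B sees the write depends on whether replication has happened yet, which is the "maybe" in the note above.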
  7. - The CAP theorem states that in any distributed system we can choose only two of consistency, availability and partition tolerance. - Many NoSQL databases provide options that let the developer tune the database to their needs
  8. Usually when we store a graph-like structure in an RDBMS, it’s for a single type of relationship, & adding another relationship to the mix usually means a lot of schema changes and data movement, which is not the case when we are using graph databases. Similarly, in relational databases we model the graph beforehand based on the traversal we want; if the traversal changes, the data will have to change
  9. In a graph database, traversing the joins or relationships is very fast, because the relationship between nodes is not calculated at query time but is actually persisted as a relationship, & traversing persisted relationships is faster than calculating them for every query. Relationships have not only a type, a start node & an end node but can also have properties of their own. Using these properties we can add intelligence to the relationships, & the properties can be used to query the graph. Adding new relationship types is easy; changing existing nodes & their relationships is similar to data migration because these changes will have to be done on each node and each relationship in the existing data
  10. - SQL: Data storage model: Individual records are stored as rows in tables, with each column storing a specific piece of data about the record, much like a spreadsheet. Separate data is stored in separate tables & join operations are used for querying the data
  11. Schemas. SQL: Predefined, i.e. structure & data types are fixed. The entire database must be altered to store new kinds of data & to do this the database must be taken offline. NoSQL: Dynamic. Unlike SQL, can store dissimilar data if necessary, but for some databases, like wide-column stores, it is challenging to add new fields dynamically. Scaling. SQL: Vertically, i.e. a single server must be made increasingly powerful. It is possible to spread a SQL database over many servers, but additional engineering is required. NoSQL: Horizontally, i.e. to add capacity, a database administrator can simply add more commodity servers or cloud instances. Sharding. SQL: Manual sharding. In relational databases, application code is developed to distribute the data, distribute the queries & aggregate the results across all of the database instances. Additional code must be developed to handle resource failures, to perform joins across the different databases, and for data rebalancing, replication & other requirements. Furthermore, many benefits of the relational database, such as transactional integrity, are compromised or eliminated when employing manual sharding. NoSQL: Auto-sharding, meaning that they natively and automatically spread data across an arbitrary number of servers
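One simple way a store could spread data across servers automatically is to hash each key and pick a shard from the hash. The sketch below assumes that scheme; `shard_for` and `ShardedStore` are hypothetical names, and in-memory dictionaries stand in for servers. It does not describe how any particular NoSQL product shards its data.

```python
import hashlib


def shard_for(key, shard_count):
    """Map a key to a shard index using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


class ShardedStore:
    def __init__(self, shard_count):
        # Each dict stands in for one server.
        self.shards = [dict() for _ in range(shard_count)]

    def put(self, key, value):
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key):
        return self.shards[shard_for(key, len(self.shards))].get(key)


store = ShardedStore(4)
for i in range(100):
    store.put(f"user:{i}", {"n": i})
```

The caller never names a shard, which is the essence of auto-sharding. A known limitation of plain modulo hashing is that changing the shard count remaps most keys, so this sketch says nothing about rebalancing.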
  12. Because most NoSQL databases lack the ability to do joins in queries, the database schema generally needs to be designed differently. There are three main techniques for handling relational data in a NoSQL database. Multiple queries: instead of retrieving all data with one query, it’s common to do several queries to get the desired data. NoSQL queries are often faster than traditional SQL queries, so the cost of doing additional queries may be acceptable. Caching/replication/non-normalized data: each blog comment might include the username in addition to a user id, thus providing easy access to the username without requiring another lookup. When a username changes, however, it will need to be changed in many places in the database. Nesting data: for example, in a blogging application one might choose to store comments within the blog post document so that a single retrieval gets all the comments. Thus in this approach a single document contains all the data you need for a specific task
  13. - To improve programmer productivity by using a database that better matches an application’s needs