NoSQL
Leo’s notes
These slides are Leopold Gault's notes, taken while reading:
• https://www.thoughtworks.com/insights/blog/nosql-databases-overview
• https://www.slideshare.net/arangodb/query-mechanisms-for-nosql-databases
• https://www.slideshare.net/arangodb/introduction-to-column-oriented-databases
• https://neo4j.com/developer/guide-data-modeling/
I am not a NoSQL expert; these notes are just my understanding of the sources above.
Aggregates
Relational data models (OLTP and OLAP) vs NoSQL data models
[Figure: relational data models (including the transactional, OLTP, one) side by side with NoSQL data models]
Notes on the figure:
• They represent a document as a hierarchical tree of data (which makes sense).
• I think they meant to represent a star schema.
Who do I think is meant to be normalized?
[Figure: the relational and NoSQL data models, annotated with these labels (in extraction order): Transactional (OLTP); normalized; normalized; deliberately de-normalized; normalized?; not normalized; not normalized]
Who do I think natively supports ACID transactions?
[Figure: the relational and NoSQL data models, annotated with these labels (in extraction order): Transactional (OLTP); always; most of the time (e.g. Neo4j); maybe sometimes; always; maybe sometimes; maybe sometimes]
Why aggregates
Let’s say that my application always uses a set of data like this one.
In an RDBMS, such a set of data would have to be fetched from many tables (requiring plenty of JOINs).
We can see that there is a big mismatch between the way the application uses the data (as one aggregated unit) and the way the data is scattered across the tables of the RDBMS.
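A minimal sketch of this mismatch, assuming three hypothetical tables (`customers`, `orders`, `orderLines`) that are not taken from the slides. The first function rebuilds the set of data from the tables, the way JOINs would; the `Map` holds the same data stored once, as a single aggregate.

```javascript
// Hypothetical normalized tables, as an RDBMS would store them.
const customers = [{ id: 1, name: 'Ann' }];
const orders = [{ id: 10, customerId: 1 }];
const orderLines = [
  { orderId: 10, product: 'book', qty: 2 },
  { orderId: 10, product: 'pen', qty: 5 },
];

// Rebuilding what the app needs takes one lookup per table (the JOINs).
function loadOrderWithJoins(orderId) {
  const order = orders.find((o) => o.id === orderId);
  const customer = customers.find((c) => c.id === order.customerId);
  const lines = orderLines
    .filter((l) => l.orderId === orderId)
    .map(({ product, qty }) => ({ product, qty }));
  return { id: order.id, customer: customer.name, lines };
}

// The same data stored as one aggregate: a single lookup by key.
const orderAggregates = new Map([
  [10, {
    id: 10,
    customer: 'Ann',
    lines: [
      { product: 'book', qty: 2 },
      { product: 'pen', qty: 5 },
    ],
  }],
]);

const joined = loadOrderWithJoins(10);
const aggregate = orderAggregates.get(10);
console.log(joined, aggregate);
```

Both calls return the same shape; the difference is only in how many places had to be read to build it.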
Aggregate-oriented DBMS
NoSQL DBMSs (except graph DBMSs) are aggregate-oriented.
An aggregate is a set of data that forms the boundary for ACID operations.
Hence, the “acidity scope” is not the transaction, but the aggregate. Note, however, that some aggregate-oriented DBMSs also support ACID transactions.
An aggregate’s data has been grouped together only because it makes sense to do so from the application’s point of view.
This grouping is decided by a human, either:
• the developer: when coding an app, the developer will try to identify which sets of data will be accessed together by the app, and will then decide to write/read each such set of data as an aggregate;
• or the creator of materialized views, i.e. new aggregates emitted from disparate data.
Why aggregate-oriented DBMS
Working with aggregates is more performant. When writing, an aggregate is stored together, instead of being scattered among many tables. The same applies when reading: it is quicker to retrieve a set of data that has been stored together than one that has been scattered throughout many tables.
In a cluster of an aggregate-oriented DBMS, an aggregate can live on a single node (or be replicated on the same few nodes). Thus our cluster can scale out without degrading the response time, as sets of data frequently accessed together (i.e. aggregates) are not cut into pieces that are scattered across many nodes. The same logic applies to sharding (an aggregate belongs to a single shard, instead of many) and to replication.
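The sharding argument can be sketched as follows. This is only an illustration: the node names, the `hashKey` function and the modulo placement are my assumptions, not how any particular DBMS distributes data. The point is that the whole aggregate is routed by its key, so a read contacts one node only.

```javascript
// Hypothetical cluster of three nodes; names are illustrative only.
const nodes = ['node-a', 'node-b', 'node-c'];

// Simple deterministic string hash (not a production-grade hash).
function hashKey(key) {
  let h = 0;
  for (const ch of key) {
    h = (h * 31 + ch.codePointAt(0)) >>> 0; // keep it an unsigned 32-bit int
  }
  return h;
}

// The whole aggregate is routed by its key, so it is never split across nodes.
function nodeFor(aggregateKey) {
  return nodes[hashKey(aggregateKey) % nodes.length];
}

// One in-memory Map per node stands in for that node's storage.
const storage = new Map(nodes.map((n) => [n, new Map()]));

function put(key, aggregate) {
  storage.get(nodeFor(key)).set(key, aggregate);
}

function get(key) {
  return storage.get(nodeFor(key)).get(key); // one node contacted per read
}

put('order:10', { customer: 'Ann', lines: [{ product: 'book', qty: 2 }] });
console.log(nodeFor('order:10'), get('order:10'));
```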
About aggregates
Here are two formal definitions:
• An aggregate is a collection of data that we interact with as a unit. These units
of data (aggregates) form the boundaries for ACID operations (at the
aggregate level) with the database. [source1]
• Aggregate defines a collection of related objects that we treat as a unit. This
unit is taken as a whole for the context of {data manipulation and
management of consistency}. We update aggregates via atomic operations
and communicate our data storage in terms of aggregates. NoSQL databases,
apart from graph databases, have aggregate data models.
However, relational databases have no concept of aggregates within their data
model. These are considered aggregate-ignorant.
An aggregate-ignorant model allows you to look at data in different ways, so
it’s good when you don’t have a primary structure for manipulating data.
Aggregate-ignorant databases, like relational and graph databases, in general
support ACID transactions.
[source2]
Who do I think is aggregate-oriented?
Aggregate-oriented data models:
• Key-value: yes (1 aggregate = 1 value, i.e. a BLOB that bundles together a bunch of data; this bunch is meaningful only for the app).
• Document: yes (1 aggregate can be a whole document, identified by its key, or a materialized view generated using map-reduce).
• Columnar: yes (1 aggregate = 1 column / segment of a column; maybe also a column family? But I don’t think so).
Aggregate-ignorant data models:
• Relational (OLTP and OLAP): no.
• Graph: no.
I think the reason why graph DBs are not “aggregate-oriented” is that, despite storing data as interconnected nodes, a node is probably not considered an aggregate, probably because the boundaries of an ACID operation extend beyond one node.
Key-Value DBMS
Performance, but ignorance of what the values mean.
[Figure: a table of key → value pairs, where each value is a BLOB]
Values are just BLOBs; they have no meaning for the DBMS. The K-V DBMS doesn’t care what’s inside a BLOB value; it’s up to the app to figure that out.
How to query: with a very simple API:
• get the value for a key,
• put a value for a key,
• delete a key-value pair.
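The three operations of this API can be sketched with an in-memory store. The `KeyValueStore` class and the `session:42` key are hypothetical; the sketch only shows that the store handles opaque bytes, while the app alone knows how to read them.

```javascript
// A minimal in-memory key-value store; the class name is hypothetical.
class KeyValueStore {
  constructor() {
    this.data = new Map();
  }
  put(key, value) {
    this.data.set(key, value); // the value is stored as an opaque BLOB
  }
  get(key) {
    return this.data.get(key); // undefined when the key is absent
  }
  delete(key) {
    return this.data.delete(key); // true if a pair was removed
  }
}

const store = new KeyValueStore();

// The app serializes its data; the store never looks inside the bytes.
const blob = new TextEncoder().encode(JSON.stringify({ name: 'Ann', cart: ['book'] }));
store.put('session:42', blob);

// Only the app knows how to interpret the BLOB it gets back.
const session = JSON.parse(new TextDecoder().decode(store.get('session:42')));
console.log(session.cart);

store.delete('session:42');
console.log(store.get('session:42')); // the pair is gone
```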
Document DBMS
Store hierarchical trees of data
[Figure: a table of key → document pairs, where key = document ID and value = document]
“Key-value stores where the value is examinable”; indeed, this value is a document.
Depending on the DBMS, the document may be in JSON, XML, BSON, etc.
Example with a JSON document
[Figure: example of a JSON document stored under a key]
How to query: with the document key, or (for some DBMSs, like MongoDB) with attributes within documents.
[Figure: key → document table, with the callouts “key”, “API” and “MongoDB”]
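The two ways of querying can be sketched in memory as below. The documents, the field names and the helper functions are hypothetical. As far as I understand, the second query is roughly what a MongoDB filter such as `{ topics: 'music' }` expresses, but this sketch does not use the MongoDB API.

```javascript
// Hypothetical documents, keyed by document ID.
const docs = new Map([
  ['doc1', { name: 'Ann', topics: ['music', 'skating'] }],
  ['doc2', { name: 'Bob', topics: ['sleeping'] }],
  ['doc3', { name: 'Cy', topics: ['music'] }],
]);

// Query 1: by document key, exactly like a key-value store.
function findById(id) {
  return docs.get(id);
}

// Query 2: by attribute; possible because the DBMS can look inside the value.
// An array attribute matches when it contains the wanted value.
function findByAttribute(field, wanted) {
  const hits = [];
  for (const [id, doc] of docs) {
    const actual = doc[field];
    const match = Array.isArray(actual) ? actual.includes(wanted) : actual === wanted;
    if (match) hits.push(id);
  }
  return hits;
}

console.log(findById('doc2').name);
console.log(findByAttribute('topics', 'music'));
```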
Actually, MongoDB doesn’t store documents as JSON, but as BSON. So a document would look like this:
\x31\x00\x00\x00
\x04BSON\x00
\x26\x00\x00\x00
\x02\x30\x00\x08\x00\x00\x00awesome\x00
\x01\x31\x00\x33\x33\x33\x33\x33\x33\x14\x40
\x10\x32\x00\xc2\x07\x00\x00
\x00
\x00
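These bytes appear to be the example document from the BSON specification, {"BSON": ["awesome", 5.05, 1986]}. Below is a minimal sketch of an encoder that reproduces them. It is my own illustration, not MongoDB's implementation; it only handles strings, arrays, doubles and integers that fit in 32 bits, and all function names are hypothetical.

```javascript
const utf8 = new TextEncoder();

// Concatenate several byte arrays into one.
function concat(chunks) {
  const out = new Uint8Array(chunks.reduce((n, c) => n + c.length, 0));
  let offset = 0;
  for (const c of chunks) {
    out.set(c, offset);
    offset += c.length;
  }
  return out;
}

// Building blocks: little-endian int32, 64-bit double, NUL-terminated string.
function int32(n) {
  const b = new Uint8Array(4);
  new DataView(b.buffer).setInt32(0, n, true);
  return b;
}
function float64(x) {
  const b = new Uint8Array(8);
  new DataView(b.buffer).setFloat64(0, x, true);
  return b;
}
function cstring(s) {
  return concat([utf8.encode(s), Uint8Array.of(0)]);
}

// Returns [type byte, payload] for the few types this sketch supports.
function encodeValue(v) {
  if (typeof v === 'string') {
    const body = cstring(v);
    return [0x02, concat([int32(body.length), body])];
  }
  if (Array.isArray(v)) {
    // An array is encoded as a document whose keys are "0", "1", "2", ...
    return [0x04, encodeDocument(v.map((item, i) => [String(i), item]))];
  }
  if (Number.isInteger(v)) return [0x10, int32(v)]; // assumes it fits in 32 bits
  if (typeof v === 'number') return [0x01, float64(v)];
  throw new TypeError('type not supported by this sketch');
}

// entries: array of [key, value] pairs, in document order.
function encodeDocument(entries) {
  const elements = entries.map(([key, value]) => {
    const [type, payload] = encodeValue(value);
    return concat([Uint8Array.of(type), cstring(key), payload]);
  });
  const body = concat([...elements, Uint8Array.of(0)]);
  return concat([int32(body.length + 4), body]); // the length counts itself
}

const bytes = encodeDocument(Object.entries({ BSON: ['awesome', 5.05, 1986] }));
const hex = Array.from(bytes, (b) => b.toString(16).padStart(2, '0')).join('');
console.log(hex);
```

The first four bytes are the total document length, which is why the dump starts with \x31 (49 bytes); the nested array starts with its own length, \x26 (38 bytes).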
How to query: for some other DBMSs (e.g. CouchDB), querying docs by anything other than their ID traditionally requires creating a materialized view, populated with JavaScript map-reduce code (for instance).
[Figure: a map function, with the callouts “key”, “API”, “CouchDB” and “Document ID”]
This function will parse all the documents in the store, and emit the docID of the docs where there is a match (where one of the topics is “music”).
The load of running a map function can be distributed between nodes.
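A minimal sketch of what such a map function could look like, assuming each document has a `topics` array (a hypothetical field name). In CouchDB, `emit` is provided by the server; here it is simulated with a plain array so that the sketch can run on its own.

```javascript
// Hypothetical documents; the `topics` field name is an assumption.
const docs = [
  { _id: 'doc1', topics: ['music', 'skating'] },
  { _id: 'doc2', topics: ['sleeping'] },
  { _id: 'doc3', topics: ['music'] },
];

// Rows of the view being built. In CouchDB, emit() is provided by the server;
// it is simulated here so that the sketch can run on its own.
const viewRows = [];
function emit(key, value) {
  viewRows.push({ key, value });
}

// The map function: called once per document, emits on a match.
function map(doc) {
  if (Array.isArray(doc.topics) && doc.topics.includes('music')) {
    emit(doc._id, null);
  }
}

docs.forEach(map);
console.log(viewRows.map((row) => row.key));
```

Because the map function only looks at one document at a time, different nodes can run it on different documents independently.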
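I think that this map function can be followed by a reduce function that simply returns what it has been fed as parameters (in a CouchDB view the reduce function is optional, so the map function alone is also enough), e.g.:
const nonReduce = function (keys, values, rereduce) {
  if (rereduce) {
    // not expected to run in this example
  } else {
    // returns the emitted data
    return values;
  }
};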
Example with map and reduce (CouchDB)
[Figure: a map function and a reduce function, with the callouts below]
• “That’s a key” and “That’s a value” (callouts on the code).
• About values: I think it's an array (with keys) of arrays (with '1's):
values = { 'skating': [1, 1], 'music': [1], 'sleeping': [1, 1, 1, 1] };
• length() of each nested array?
• A Boolean to say whether or not a re-reduce is needed.
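A minimal sketch of a map and reduce pair that counts documents per topic. The documents are hypothetical, and `emit` and the grouping step are simulated; in CouchDB they are done by the server. I assume the view is queried grouped by key, in which case the reduce function sees the values emitted for one key at a time, so summing the 1s gives the same number as taking the length of each nested array.

```javascript
// Hypothetical documents; the `topics` field name is an assumption.
const docs = [
  { _id: 'doc1', topics: ['skating', 'sleeping'] },
  { _id: 'doc2', topics: ['music', 'sleeping', 'skating'] },
  { _id: 'doc3', topics: ['sleeping'] },
  { _id: 'doc4', topics: ['sleeping'] },
];

// Map step: emit one (topic, 1) pair per topic of each document.
const emitted = [];
function emit(key, value) {
  emitted.push({ key, value });
}
function map(doc) {
  for (const topic of doc.topics) emit(topic, 1);
}

// Reduce step, with a CouchDB-style signature. Summing works for both passes:
// first on the emitted 1s, then (when rereduce is true) on partial sums.
// The keys parameter is not used by this reducer.
function reduce(keys, values, rereduce) {
  return values.reduce((total, v) => total + v, 0);
}

docs.forEach(map);

// Simulated grouping by key, as a grouped view query would do.
const grouped = new Map();
for (const { key, value } of emitted) {
  if (!grouped.has(key)) grouped.set(key, []);
  grouped.get(key).push(value);
}

const counts = {};
for (const [key, values] of grouped) {
  counts[key] = reduce(null, values, false);
}
console.log(counts);
```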
Columnar DBMS
To have the DBMS work on columns, instead of rows
Columnar DBMS vs RDBMS
How you use them
Columnar DBMS
• Data is stored in columns.
• You specify column families (kinds of entities), which are composed of rows featuring some of the columns (among all the columns mentioned in the column family).
RDBMS
• Data is stored in tables; each row contains data for all columns (although a value can be NULL).
[Figure: “Column family A” and “Table A”, each drawn as a grid with columns Col 1, Col 2, Col 3 and rows row1 to row4]
Why columnar DBMS?
The benefits of column-oriented DBMSs reside only in the way they store data on disk: they store data by column instead of by row.
This makes such DBMSs more performant when a query touches only a few columns, but reads/writes many values in those few columns.
It also makes it possible to store the columns in a compressed state; only the columns being queried will be decompressed (on the fly).
Such DBMSs are meant for analytics or batch-processing use cases (and perform poorly for OLTP).
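A minimal sketch of the two layouts, with a hypothetical three-column table. It counts how many stored values each layout has to read in order to sum a single column; real engines read pages or blocks rather than individual values, so the counts only illustrate the proportion.

```javascript
// The same hypothetical table, stored two ways.
// Row-oriented: one record per row, holding every column.
const rowStore = [
  { id: 1, country: 'FR', amount: 10 },
  { id: 2, country: 'DE', amount: 25 },
  { id: 3, country: 'FR', amount: 7 },
];

// Column-oriented: one array (think: one datafile) per column.
const columnStore = {
  id: [1, 2, 3],
  country: ['FR', 'DE', 'FR'],
  amount: [10, 25, 7],
};

// Count how many stored values each layout has to read for SUM(amount).
function sumAmountFromRows(rows) {
  let total = 0;
  let valuesRead = 0;
  for (const row of rows) {
    valuesRead += Object.keys(row).length; // the whole row is read
    total += row.amount;
  }
  return { total, valuesRead };
}

function sumAmountFromColumns(columns) {
  const column = columns.amount; // only this column is read
  const total = column.reduce((sum, v) => sum + v, 0);
  return { total, valuesRead: column.length };
}

console.log(sumAmountFromRows(rowStore));
console.log(sumAmountFromColumns(columnStore));
```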
Column-oriented storage vs row-oriented storage
Column-oriented storage (columnar DBMS’ strategy) [source]
• Each column is stored in its own datafile.
[Figure: each column in its own datafile (datafile0, datafile1)]
a. Adding/deleting a column is relatively cheap in I/O: it only requires working on a single small datafile.
b. Columns are stored compressed on disk. Only the columns you query will be decompressed (on the fly).
Row-oriented storage (RDBMS’ strategy)
a. Adding/deleting a column might require rewriting the whole table.
b. I think rows are much harder to compress usefully, because a whole row has to be decompressed in order to be understandable (just as, in column-oriented storage, a whole column has to be decompressed, or at least a whole subset of a column, i.e. a “segment”?). This means the whole table would have to be decompressed in order to be queried. I don’t think you could decompress only a subset of the table, because it is hard to think of a meaningful way in which the table could have been chunked. Maybe you could compress all the values except the ID, and chunk the table based on the ID; but that would only be useful for JOINs based on a foreign key. A decompressed table is often too big to fit in memory, so you’d have to swap part of it to disk (which is slow) just to be able to query it.
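A minimal sketch of per-column compression. The sources do not say which compression scheme is used; run-length encoding is my choice for the illustration, because columns often contain long runs of identical values. The column names and values are hypothetical.

```javascript
// Run-length encoding: [value, count] pairs for consecutive repeats.
function rleEncode(values) {
  const runs = [];
  for (const v of values) {
    const last = runs[runs.length - 1];
    if (last && last[0] === v) last[1] += 1;
    else runs.push([v, 1]);
  }
  return runs;
}

function rleDecode(runs) {
  return runs.flatMap(([v, count]) => Array(count).fill(v));
}

// Each column is compressed independently (think: one compressed datafile each).
const compressed = {
  country: rleEncode(['DE', 'DE', 'DE', 'FR', 'FR', 'US']),
  status: rleEncode(['paid', 'paid', 'paid', 'paid', 'open', 'open']),
};

// A query on `country` decompresses that column only; `status` stays compressed.
const countries = rleDecode(compressed.country);
console.log(compressed.country, countries.filter((c) => c === 'FR').length);
```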
Column-oriented storage vs row-oriented storage: when not to use
Column-oriented storage (columnar DBMS’ strategy) [source]
• If you only want to work on a few rows (as is often the case in OLTP), it won’t be performant at all: you’ll have to read and decompress all the columns (or at least their relevant subsets), and then recompress and rewrite them.
Row-oriented storage (RDBMS’ strategy)
• If you only need to work on a few columns, but the table has many columns, and you want to read/write many things from those few columns, you’ll have to read whole rows just to get the few column values that interest you.
[Figure: a table with columns Col 1, Col 2, Col 3, captioned “You just want to modify a row”]
FYI: a memory page is the smallest unit of data for virtual-memory management: the OS moves this unit between the disk and the RAM over I/O channels. As it is the smallest unit, a page is read from disk as a whole, including any unused space.
Graph DBMS
To store and query relationships
How to deal with many relationships
RDBMS
• you would use JOINs to compute relationships at query time. On top of being less intuitive, JOINs get slower as the tables being joined grow, and as more of them are chained in one query.
Graph DBMS
• the relationships are natively stored, so no relationship has to be computed at query time.
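A minimal sketch of the difference, with a hypothetical friendship relation. The join-table version has to scan rows to find a relationship; the graph version follows references that are stored with each node. This is only an in-memory illustration, not how an RDBMS or a graph DBMS is implemented.

```javascript
// Relational style: a hypothetical join table of (personId, friendId) rows.
const friendships = [
  { personId: 'ann', friendId: 'bob' },
  { personId: 'bob', friendId: 'cy' },
  { personId: 'ann', friendId: 'dee' },
];

// Each lookup scans the join table: the relationship is computed at query time.
function friendsViaJoin(personId) {
  return friendships.filter((f) => f.personId === personId).map((f) => f.friendId);
}

// Graph style: every node keeps direct references to its neighbours.
const graph = new Map([
  ['ann', ['bob', 'dee']],
  ['bob', ['cy']],
  ['cy', []],
  ['dee', []],
]);

// Following a relationship is a direct lookup on the node itself.
function friendsViaGraph(personId) {
  return graph.get(personId);
}

// Friends of friends: one hop further along stored relationships.
function friendsOfFriends(personId) {
  return friendsViaGraph(personId).flatMap((friend) => friendsViaGraph(friend));
}

console.log(friendsViaJoin('ann'), friendsViaGraph('ann'), friendsOfFriends('ann'));
```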
Labelled Property Graph Model (e.g. implemented by Neo4j)
A graph in such a model is composed of:
• Nodes
• Relationships (each between 2 nodes)
About nodes
A node can contain:
• Properties: multiple key-value pairs
• Labels: tags representing the roles of the node in the data domain. They are used to group nodes into sets. Labels may also serve to attach metadata (index or constraint information) to certain nodes.
[Figure: nodes + labels = labelled nodes, with the labels “Person” and “Book”; the names shown on the nodes are properties]
About relationships
A relationship always has:
• a direction: a start node and an end node
• a type (i.e. a name)
A relationship can also have:
• properties: multiple key-value pairs
[Figure: two “Properties” callouts, and a relationship whose type is “HAS_READ”]
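The model can be sketched with plain objects. Only the labels “Person” and “Book” and the type “HAS_READ” come from the slides; the property names and values, and the two helper functions, are hypothetical.

```javascript
// A node: labels (roles) plus properties (key-value pairs).
const ann = { labels: ['Person'], properties: { name: 'Ann' } };
const dune = { labels: ['Book'], properties: { title: 'Dune' } };

// A relationship: a type, a direction (start -> end) and optional properties.
const hasRead = {
  type: 'HAS_READ',
  start: ann,
  end: dune,
  properties: { rating: 5 },
};

const nodes = [ann, dune];
const relationships = [hasRead];

// Labels group nodes into sets.
function nodesWithLabel(label) {
  return nodes.filter((n) => n.labels.includes(label));
}

// Follow the outgoing relationships of a given type from a start node.
function outgoing(node, type) {
  return relationships
    .filter((r) => r.start === node && r.type === type)
    .map((r) => r.end);
}

console.log(nodesWithLabel('Person').map((n) => n.properties.name));
console.log(outgoing(ann, 'HAS_READ').map((n) => n.properties.title));
```

The direction matters: following “HAS_READ” from the book returns nothing, because the book is the end node of the relationship, not its start node.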

NoSQL Database Types Explained

  • 1. NoSQL Leo’s notes Those slides are Leopold Gault's notes, when reading : • https://www.thoughtworks.com/insights/blog/nosql-databases-overview • https://www.slideshare.net/arangodb/query-mechanisms-for-nosql-databases • https://www.slideshare.net/arangodb/introduction-to-column-oriented-databases • https://neo4j.com/developer/guide-data-modeling/ I am not a NoSQL expert; those notes are just my understanding of the aforementioned sources
  • 2.
  • 4. Relational data models (OLTP and OLAP) vs NoSQL data models NoSQL data modelsRelational data models Transactional (OLTP) Note that they represent a document as a hierarchical tree of data (it makes sense) I think they meant to represent a star schema
  • 5. Who do I think is meant to be normalized ? Transactional (OLTP) normalized normalizedDeliberately de-normalized Normalized ? Not normalized Not normalized NoSQL data modelsRelational data models
  • 6. Who do I think natively supports ACID transactions Transactional (OLTP) Always Most of the times (e.g. Node4J) Maybe sometimes NoSQL data modelsRelational data models Always Maybe sometimes Maybe sometimes
  • 7. Why aggregates Let’s say that my application always uses a set of data like this one
  • 8. Why aggregates In a RDBMS, such set of data would have to be fetched from many tables (requiring plenty of JOINs) Let’s say that my application always uses a set of data like this one
  • 9. Why aggregates In a RDBMS, such set of data would have to be fetched from many tables (requiring plenty of JOINs) Let’s say that my application always uses a set of data like this one We can see that there is a big mismatch between the way the data is aggregated by this application (i.e. the data is aggregated), and how the data was scattered in tables of the RDBMS.
  • 10. Aggregate-oriented DBMS NoSQL DBMS (bar Graph DBMS) are aggregate-oriented. An aggregate is a set of data, that will form the boundaries for ACID operations. Hence, the “acidity scope” is not at the transaction level, but at the aggregate level. Note however that some aggregate-oriented DBMS also support ACID transactions. An aggregate’s data have been grouped together only because it makes sense to do so, from the application’s point of view. This grouping is masterminded by a human. By: • the developer: when coding an app, the developer will try to identify which sets of data will be accessed together by the app. He will hence decide to write/read each set of data as an aggregate. • or the creator of materialized views, i.e. new aggregates emitted from disparate data.
  • 11. Why aggregate-oriented DBMS Working with aggregates is more performant. Indeed, an aggregate is stored together, instead of being scattered among many tables. The same applies when reading: it is quicker to retrieve a set of data that has been stored together, than if it had been scattered throughout many tables. In a cluster of an aggregate-oriented DBMS, an aggregate can live on the same node (or be replicated on the same few nodes). Thus our cluster can scale out without reducing the response time, as sets of data frequently accessed together (i.e. aggregates) are not cut into pieces that are scattered through many nodes. The same logic applies for sharding (an aggregate would belong to a single shard, instead of many) and replication.
  • 12. About aggregates Here are 2 formal definitions : • An aggregate is a collection of data that we interact with as a unit. These units of data (aggregates) form the boundaries for ACID operations (at the aggregate level) with the database. [source1] • Aggregate defines a collection of related objects that we treat as a unit. This unit is taken as a whole for the context of {data manipulation and management of consistency}. We update aggregates via atomic operations and communicate our data storage in terms of aggregates. NoSQL databases, apart from graph databases , have aggregate data models. However, relational databases have no concept of aggregates within their data model. These are considered aggregate-ignorant. An aggregate-ignorant model allows you to look at data in different ways, so it’s good when you don’t have a primary structure for manipulating data. Aggregate ignorant databases, like relational and graph databases, in general support ACID transactions. [source2]
  • 13. Who do I think is aggregate-oriented?
    Aggregate-oriented data models (NoSQL):
    • Columnar: yes (1 aggregate = 1 column / segment of a column; maybe also a column family, but I don’t think so)
    • Document: yes (1 aggregate can be a whole document, identified by its key, or a materialized view generated using map-reduce)
    • Key-value: yes (1 aggregate = 1 value, i.e. a BLOB that bundles together a bunch of data; this bunch is meaningful only for the app)
    Aggregate-ignorant data models:
    • Relational, transactional (OLTP): no
    • Relational, analytical (OLAP): no
    • Graph (NoSQL): no
    I think the reason why graph DBs are not “aggregate-oriented” is that, despite storing data as interconnected nodes, a node is probably not considered an aggregate, probably because the boundaries of an ACID operation extend beyond one node.
  • 14. Key-Value DBMS Performance, but ignorance of what the values mean.
  • 15. Key-value DBMS Each key is associated with a value, and that value is just a BLOB. The K-V DBMS doesn’t care what’s inside this BLOB value; it’s up to the app to figure that out. Values have no meaning for the DBMS.
  • 16. Key-value DBMS
    How to query: with a very simple API:
    • get the value for a key,
    • put a value for a key,
    • delete a key-value pair.
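As a minimal sketch of the three operations listed on this slide, here is an in-memory stand-in for a key-value store. The class name `KeyValueStore` and the session data are hypothetical; a real K-V DBMS exposes the same three verbs over the network rather than over a `Map`.

```javascript
// In-memory stand-in for a key-value DBMS: it stores values without ever
// looking inside them.
class KeyValueStore {
  constructor() {
    this.data = new Map();
  }
  put(key, value) {
    this.data.set(key, value);
  }
  get(key) {
    return this.data.get(key);
  }
  delete(key) {
    return this.data.delete(key);
  }
}

const store = new KeyValueStore();

// The app serializes its data into an opaque blob (a JSON string here).
const blob = JSON.stringify({ user: "Ann", cart: ["book", "pen"] });
store.put("session:42", blob);

// Only the app knows how to interpret the blob it gets back.
const session = JSON.parse(store.get("session:42"));
```

Note that the store cannot answer a question such as "which sessions contain a pen?": that would require understanding the values.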
  • 18. Document DBMS Key = document ID, value = document. Document DBMS are “key-value stores where the value is examinable”; indeed, this value is a document. Depending on the DBMS, the document may be in JSON, XML, BSON, etc.
  • 19. Document DBMS Example with a JSON document (figure).
  • 20. Document DBMS
    How to query: with the document key or, for some DBMS (like MongoDB), with attributes within the documents.
    Actually, with MongoDB it wouldn’t be a JSON doc but a BSON one, so it would look like this (these bytes encode {"BSON": ["awesome", 5.05, 1986]}): \x31\x00\x00\x00 \x04BSON\x00 \x26\x00\x00\x00 \x02\x30\x00\x08\x00\x00\x00awesome\x00 \x01\x31\x00\x33\x33\x33\x33\x33\x33\x14\x40 \x10\x32\x00\xc2\x07\x00\x00 \x00 \x00
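The two query styles on this slide (by key, or by an attribute inside the document) can be sketched with an in-memory collection. Everything here is hypothetical illustration: the documents, the IDs and the helper names `findById` and `findByAttribute` are made up, and a real document DBMS would normally use an index instead of scanning every document.

```javascript
// In-memory stand-in for a document collection: document ID -> document.
const documents = new Map([
  ["doc1", { name: "Ann", topics: ["music", "skating"] }],
  ["doc2", { name: "Bob", topics: ["sleeping"] }],
  ["doc3", { name: "Cid", topics: ["music"] }],
]);

// Query by key: the same lookup a plain key-value store offers.
function findById(id) {
  return documents.get(id);
}

// Query by attribute: possible only because the DBMS can examine the value.
// An array attribute matches when it contains the expected element.
function findByAttribute(attribute, expected) {
  const matches = [];
  for (const [id, doc] of documents) {
    const actual = doc[attribute];
    const hit = Array.isArray(actual)
      ? actual.includes(expected)
      : actual === expected;
    if (hit) matches.push(id);
  }
  return matches;
}

const bob = findById("doc2");
const musicFans = findByAttribute("topics", "music");
```

In the MongoDB shell, the attribute lookup would be written along the lines of `db.people.find({ topics: "music" })`, where `people` is an assumed collection name.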
  • 21. Document DBMS
    How to query: for some other DBMS (e.g. CouchDB), querying docs by anything other than their ID requires creating a materialized view, populated with JavaScript map-reduce code (for instance).
    The map function shown on the slide is run against every document in the store, and emits the docID of each doc where there is a match (where one of the topics is “music”). The load of running a map function can be distributed between nodes.
    I think that this map function should be followed by a reduce function that simply returns what it has been fed as parameters (its third parameter, false here, says whether a re-reduce is needed), e.g.:
        nonReduce = function (keys, values, rereduce) {
          if (rereduce) {
            // never run
          } else {
            // returns the emitted data
            return values;
          }
        };
    (A CouchDB view can also be defined with a map function only, without any reduce function.)
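The map function described on this slide is not in the transcript, so the following is a reconstruction under assumptions, not the author's code. The map function has the shape CouchDB expects (one `doc` argument, results reported through `emit`); the `emit` defined here and the loop that feeds documents are a hypothetical harness so that the sketch runs outside CouchDB, and the three documents are made up.

```javascript
// Hypothetical harness: collects what the map function emits.
const rows = [];
function emit(key, value) {
  rows.push({ key, value });
}

// Map function in CouchDB's shape: called once per document.
// It emits the document ID when one of the topics is "music".
const map = function (doc) {
  if (Array.isArray(doc.topics) && doc.topics.indexOf("music") !== -1) {
    emit(doc._id, null);
  }
};

// Made-up documents standing in for the content of the store.
const docs = [
  { _id: "doc1", topics: ["music", "skating"] },
  { _id: "doc2", topics: ["sleeping"] },
  { _id: "doc3", topics: ["music"] },
];

// The view is populated by running the map function over every document.
docs.forEach(function (doc) {
  map(doc);
});

const matchingIds = rows.map(function (row) {
  return row.key;
});
```

Since each call to the map function looks at a single document in isolation, the calls can be spread over several nodes, which is the distribution property the slide mentions.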
  • 22. Document DBMS
    Example with map and reduce (CouchDB). My annotations on the example:
    • the emitted pairs: the first item is a key, the second is a value;
    • the values parameter of the reduce function: I think it's an array (with keys) of arrays (of '1's): values = [ 'skating': [1,1], 'music': [1], 'sleeping': [1,1,1,1] ];
    • the result of the reduce function: the length() of each nested array?
    • the third parameter of the reduce function: a Boolean that says whether or not a re-reduce is needed.
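To make the map/reduce interplay on this slide concrete, here is a small simulation that counts documents per topic. It is a sketch under assumptions: `map` and `reduce` use CouchDB's signatures (`function (doc)` and `function (keys, values, rereduce)`), but `emit`, the grouping step and the documents are a hypothetical harness written for this example, and the grouping only imitates what a grouped view query does.

```javascript
// Hypothetical harness: records every emitted (key, value) pair with its doc ID.
const emitted = [];
let currentDocId = null;
function emit(key, value) {
  emitted.push({ id: currentDocId, key, value });
}

// Map: emit one (topic, 1) pair per topic found in the document.
const map = function (doc) {
  (doc.topics || []).forEach(function (topic) {
    emit(topic, 1);
  });
};

// Reduce: sum the values. Summing is also correct when rereduce is true,
// because the values are then partial sums from earlier reduce calls.
const reduce = function (keys, values, rereduce) {
  return values.reduce(function (total, n) {
    return total + n;
  }, 0);
};

const docs = [
  { _id: "doc1", topics: ["music", "skating"] },
  { _id: "doc2", topics: ["sleeping"] },
  { _id: "doc3", topics: ["music"] },
];
for (const doc of docs) {
  currentDocId = doc._id;
  map(doc);
}

// Group the emitted rows by key, as a grouped view query would.
const groups = new Map();
for (const row of emitted) {
  if (!groups.has(row.key)) groups.set(row.key, { keys: [], values: [] });
  groups.get(row.key).keys.push([row.key, row.id]);
  groups.get(row.key).values.push(row.value);
}

// First pass: one reduce call per key, with rereduce = false.
const counts = {};
for (const [key, group] of groups) {
  counts[key] = reduce(group.keys, group.values, false);
}

// Re-reduce: combine the partial results into a grand total.
const total = reduce(null, Object.values(counts), true);
```

In this simulation, `values` for one key is a flat array of 1s (for example `[1, 1]` for "music"), and the reduce result for that key equals the length of that array.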
  • 23. Columnar DBMS To have the DBMS work on columns, instead of rows
  • 24. Columnar DBMS vs RDBMS: how you use them
    Columnar DBMS
    • Data is stored in columns.
    • You specify column families (kinds of entities), which are composed of rows featuring some of the columns (among all the columns mentioned in the column family).
    RDBMS
    • Data is stored in tables; each row contains data for all columns (although a value can be NULL).
    Figure: “Column family A” and “Table A”, each with columns Col 1 to Col 3 and rows row1 to row4.
  • 25. Why columnar DBMS? The benefits of column-oriented DBMS reside only in the way they store data on disk: they store data by column instead of by row. This makes such DBMS more performant when a query touches only a few columns but reads/writes many values in those columns. It also makes it possible to store the columns in a compressed state, with only the columns being queried decompressed (on the fly). Such DBMS are meant for analytics or batch-processing use cases (and are not performant at all for OLTP).
  • 26. Column-oriented storage vs row-oriented storage
    Column-oriented storage (columnar DBMS’ strategy): each column is stored in its own datafile (e.g. datafile0, datafile1). [source]
    a. Adding/deleting a column is relatively cheap in I/O: it only requires working on a single small datafile.
    b. Columns are stored compressed on disk. Only the columns you query will be decompressed (on the fly).
    Row-oriented storage (RDBMS’ strategy):
    a. Adding/deleting a column might require rewriting the whole table.
    b. You can’t usefully compress rows, because a whole row has to be decompressed to be understandable (just as, in column-oriented storage, the whole column has to be decompressed, or at least a whole subset of the column, i.e. a “segment”?). This means the whole table would have to be decompressed in order to be queried. I don’t think you could decompress only a subset of the table, because it is hard to think of a meaningful way the table could have been chunked. (Maybe you could compress all the values except the ID, and chunk the table based on the ID; but that would only be useful for JOINs based on a foreign key.) A decompressed table is often too big to fit in memory, so you’d have to swap part of it to disk (which is slow) just to be able to query it.
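Point b. for column-oriented storage (compressed columns, decompressed only when queried) can be illustrated with a toy example. Run-length encoding is an assumption chosen for brevity, since the slide does not say which compression scheme columnar DBMS use; the `datafiles` object, the column names and the values are hypothetical.

```javascript
// Run-length encoding: consecutive equal values collapse into [value, count].
// It works well on a column because neighbouring values often repeat.
function rleEncode(column) {
  const runs = [];
  for (const value of column) {
    const last = runs[runs.length - 1];
    if (last && last[0] === value) {
      last[1] += 1;
    } else {
      runs.push([value, 1]);
    }
  }
  return runs;
}

function rleDecode(runs) {
  const column = [];
  for (const [value, count] of runs) {
    for (let i = 0; i < count; i++) column.push(value);
  }
  return column;
}

// One compressed "datafile" per column.
const datafiles = {
  country: rleEncode(["FR", "FR", "FR", "DE", "DE"]),
  status: rleEncode(["paid", "paid", "open", "open", "open"]),
};

// A query on "country" decompresses that column only; "status" stays compressed.
const countries = rleDecode(datafiles.country);
```

Here the five country values are stored as two runs, and reading them never touches the `status` datafile.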
  • 27. Column-oriented storage vs row-oriented storage: when not to use
    Column-oriented storage (columnar DBMS’ strategy) [source]
    • If you only want to work on a few rows (as is often the case in OLTP), it won’t be performant at all: to modify a single row, you’ll have to read and decompress all the columns (or at least their relevant subsets), and then recompress and rewrite them.
    Row-oriented storage (RDBMS’ strategy)
    • If you only need to work on a few columns, but the table has many columns, and you want to read/write many values in those few columns, you’ll have to read each whole row just to get the few column values that interest you.
    FYI: a memory page is the smallest unit of data for virtual-memory management: the OS moves this unit from the disk to the RAM using I/O channels, and vice versa. As it is the smallest unit, a page is read from disk as a whole, including unused space.
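The trade-off on this slide can be made tangible by counting how many stored values each layout has to touch. This is a rough sketch, not a model of real disk I/O: the table, its column names and the "touched" counters are hypothetical, and pages, caching and compression are ignored.

```javascript
// Row layout: each record keeps all of its columns together.
const rowStore = [
  { id: 1, country: "FR", amount: 10 },
  { id: 2, country: "FR", amount: 20 },
  { id: 3, country: "DE", amount: 5 },
  { id: 4, country: "DE", amount: 15 },
];

// Column layout: one array per column, rows aligned by position.
const columnStore = {
  id: [1, 2, 3, 4],
  country: ["FR", "FR", "DE", "DE"],
  amount: [10, 20, 5, 15],
};

// Analytic query (sum of one column) on the row layout:
// every row is read whole, so every column value is touched.
function sumAmountFromRows(rows) {
  let total = 0;
  let touched = 0;
  for (const row of rows) {
    touched += Object.keys(row).length;
    total += row.amount;
  }
  return { total, touched };
}

// Same query on the column layout: only the "amount" column is touched.
function sumAmountFromColumns(columns) {
  let total = 0;
  for (const amount of columns.amount) total += amount;
  return { total, touched: columns.amount.length };
}

// OLTP-style access (fetch one whole record) on the column layout:
// one value has to be picked from every column.
function fetchRecordFromColumns(columns, position) {
  const record = {};
  for (const name of Object.keys(columns)) {
    record[name] = columns[name][position];
  }
  return record;
}

const fromRows = sumAmountFromRows(rowStore);
const fromColumns = sumAmountFromColumns(columnStore);
const thirdRecord = fetchRecordFromColumns(columnStore, 2);
```

Both layouts return the same sum, but the row layout touches three times as many values in this toy table, while rebuilding one record from the column layout has to visit every column.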
  • 28. Graph DBMS To store and query relationships
  • 29. How to deal with many relationships
    RDBMS
    • You would use JOINs to compute relationships at query time. On top of being less intuitive, JOINs get more expensive as the joined tables grow and as more JOINs are chained in one query.
    Graph DBMS
    • The relationships are natively stored, so no relationship has to be computed at run time.
  • 30. Labelled Property Graph Model (e.g. implemented by Neo4j)
    A graph in such a model is composed of:
    • Nodes
    • Relationships (each between 2 nodes)
  • 31. Labelled Property Graph Model (e.g. implemented by Neo4j): about nodes
    A node can contain:
    • Properties: multiple key-value pairs.
    • Labels: tags representing the roles of the node in the data domain. They are used to group nodes into sets. Labels may also serve to attach metadata (index or constraint information) to certain nodes.
    In the figure, the nodes are labelled Person and Book; the names shown on the nodes are properties.
  • 32. Labelled Property Graph Model (e.g. implemented by Neo4j): about relationships
    A relationship always has:
    • a direction: a start node and an end node;
    • a type (i.e. a name).
    A relationship can also have:
    • properties: multiple key-value pairs.
    In the figure, the type of the relationship is “HAS_READ”.
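The labelled property graph described on the last three slides can be sketched with plain objects. This is an illustration under assumptions, not how Neo4j stores data: the node IDs, names, titles, the `rating` property and the helper `titlesReadBy` are all hypothetical. The `outgoing` index stands in for the idea that relationships are stored with the nodes instead of being computed by JOINs.

```javascript
// Nodes: each has labels and properties (key-value pairs).
const nodes = new Map([
  ["n1", { labels: ["Person"], properties: { name: "Ann" } }],
  ["n2", { labels: ["Book"], properties: { title: "Book A" } }],
  ["n3", { labels: ["Book"], properties: { title: "Book B" } }],
]);

// Relationships: a type, a direction (start -> end) and optional properties.
const relationships = [
  { type: "HAS_READ", start: "n1", end: "n2", properties: { rating: 5 } },
  { type: "HAS_READ", start: "n1", end: "n3", properties: {} },
];

// Store each relationship next to its start node, so following it is a
// direct lookup rather than a search through a table.
const outgoing = new Map();
for (const rel of relationships) {
  if (!outgoing.has(rel.start)) outgoing.set(rel.start, []);
  outgoing.get(rel.start).push(rel);
}

// Traverse: which book titles has this person read?
function titlesReadBy(personId) {
  return (outgoing.get(personId) || [])
    .filter((rel) => rel.type === "HAS_READ")
    .map((rel) => nodes.get(rel.end))
    .filter((node) => node.labels.includes("Book"))
    .map((node) => node.properties.title);
}

const annTitles = titlesReadBy("n1");
```

In Neo4j's Cypher query language, a comparable traversal would look roughly like `MATCH (p:Person {name: "Ann"})-[:HAS_READ]->(b:Book) RETURN b.title`.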

Editor's notes

  1. I think an aggregate is stored as: a BLOB value (associated with a key) in a key-value DBMS; a document in a document DBMS; a column in a columnar DBMS.