2.Introduction to NOSQL (Core concepts).pptx
1. Introduction to NoSQL
Chapter 2
2. What is a NoSQL Database?
NoSQL is a type of database management system
(DBMS) that is designed to handle and store large
volumes of unstructured and semi-structured data.
Unlike traditional relational databases that use
tables with pre-defined schemas to store data,
NoSQL databases use flexible data models that can
adapt to changes in data structures and are capable
of scaling horizontally to handle growing amounts
of data.
3. A NoSQL Database, also known as a non-SQL or
non-relational Database, is a non-tabular Database
that stores data differently from the tabular
relations used in relational databases.
Companies widely use NoSQL Databases for big
data and real-time web applications.
4. NoSQL Databases offer a simple design, horizontal
scaling across clusters of machines, and a reduced
object-relational impedance mismatch.
They use different data structures from those used
by relational Databases, making some operations
faster.
NoSQL Databases are designed to be flexible,
scalable, and capable of rapidly responding to the
data management demands of modern businesses.
5. Why NoSQL
Dynamic schema: NoSQL databases do not have a fixed schema
and can accommodate changing data structures without the need
for migrations or schema alterations.
Horizontal scalability: NoSQL databases are designed to scale
out by adding more nodes to a database cluster, making them well-
suited for handling large amounts of data and high levels of traffic.
Document-based: Some NoSQL databases, such as MongoDB,
use a document-based data model, where data is stored in
a semi-structured format, such as JSON or BSON.
Key-value-based: Other NoSQL databases, such as Redis, use a
key-value data model, where data is stored as a collection of key-
value pairs.
Column-based: Some NoSQL databases, such as Cassandra, use a
column-based data model, where data is organized into columns
instead of rows.
6. Distributed and high availability: NoSQL databases
are often designed to be highly available and to
automatically handle node failures and data replication
across multiple nodes in a database cluster.
Flexibility: NoSQL databases allow developers to store
and retrieve data in a flexible and dynamic manner, with
support for multiple data types and changing data
structures.
Performance: NoSQL databases are optimized for high
performance and can handle a high volume of reads and
writes, making them suitable for big data and real-time
applications.
7. Aggregate Data Models
Aggregate means a collection of objects that is treated as a unit. In
NoSQL Databases, an aggregate is a collection of data that we interact
with as a unit, and these aggregates form the boundaries for ACID
operations.
Aggregate Data Models in NoSQL make it easier for the Database
to manage data storage over a cluster, as each aggregate can reside
on any one of the machines. Whenever an aggregate is retrieved
from the Database, all of its data comes along with it.
Aggregate-oriented NoSQL Databases generally do not support ACID
transactions that span multiple aggregates; atomicity is guaranteed
only within a single aggregate. Aggregates also make it convenient
to run analytical (OLAP-style) operations over whole units of data.
8. Types of Aggregate Data Models
Key-Value Model
Document Model
Column Family Model
Graph-Based Model
9. Key-Value Model
The Key-Value Data Model uses a key, or ID, to access
or fetch the aggregate data corresponding to that key.
In this Aggregate Data Model, the value is opaque to
the database: it can only be looked up by its key.
10. They are a simpler type of database where each item
contains keys and values.
A value can typically only be retrieved by referencing
its key, so learning how to query for a specific key-
value pair is typically simple.
Key-value databases are great for use cases where
you need to store large amounts of data but you don’t
need to perform complex queries to retrieve it.
Common use cases include storing user preferences
or caching. Redis and DynamoDB are popular
key-value databases.
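The key-is-the-only-index idea can be sketched in a few lines of Python (an illustrative in-memory store, not any particular product's API; the class and key names are invented):

```python
# Minimal in-memory key-value store sketch. Real stores such as Redis
# add persistence, expiry, and networking on top of this access pattern.
class KeyValueStore:
    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        # Values are retrieved only by exact key; there is no query language.
        return self._data.get(key, default)

store = KeyValueStore()
store.put("user:42:prefs", {"theme": "dark", "lang": "en"})
theme = store.get("user:42:prefs")["theme"]
```

Note that without the key there is no way to search inside the value, which is why this model fits lookups by ID rather than complex queries.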
11. Examples of key-value databases:
Couchbase: It permits SQL-style querying and
searching for text.
Amazon DynamoDB: The key-value database which is
mostly used is Amazon DynamoDB as it is a trusted
database used by a large number of users. It can easily
handle a large number of requests every day and it also
provides various security options.
Riak: A distributed key-value database designed for high availability.
Aerospike: An open-source, real-time database that handles
billions of transactions.
Berkeley DB: It is a high-performance and open-source
database providing scalability.
12. Document Model
Document databases store data in documents similar to JSON
(JavaScript Object Notation) objects.
Each document contains pairs of fields and values.
The values can typically be a variety of types including things like
strings, numbers, booleans, arrays, or objects, and their structures
typically align with objects developers are working with in code.
Because of their variety of field value types and powerful query
languages, document databases are great for a wide variety of use cases
and can be used as a general purpose database.
They can horizontally scale-out to accommodate large data volumes.
MongoDB is consistently ranked as the world’s most popular NoSQL
database according to DB-engines and is an example of a document
database.
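The document model can be illustrated with plain Python dicts standing in for JSON documents (the collection, field names, and `find` helper below are invented for illustration, not MongoDB's API):

```python
# Sketch of the document model: each record is a self-describing,
# JSON-like dict, and the set of fields can vary between documents.
orders = [
    {"_id": 1, "customer": "Ana", "total": 99.5, "items": ["book", "pen"]},
    {"_id": 2, "customer": "Raj", "total": 15.0},   # no "items" field at all
]

# A toy "query": find documents matching a predicate over their fields.
def find(collection, predicate):
    return [doc for doc in collection if predicate(doc)]

big_orders = find(orders, lambda d: d["total"] > 50)
```

Because documents carry their own structure, adding a new field to one document requires no schema migration for the others.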
13. [figure]
14. Examples of Document Data Models :
Amazon DocumentDB
MongoDB
Cosmos DB
ArangoDB
Couchbase Server
CouchDB
15. Column Family Model
Column family databases store data in tables, rows, and dynamic columns.
Wide-column stores provide a lot of flexibility over
relational databases because each row is not required to
have the same columns.
Many consider wide-column stores to be two-
dimensional key-value databases.
Wide-column stores are great for when you need to store
large amounts of data and you can predict what your
query patterns will be.
Wide-column stores are commonly used for storing
Internet of Things data and user profile data.
Cassandra and HBase are two of the most popular wide-
column stores.
16. The first level of the Column Family model contains the keys
that act as row identifiers, used to select the aggregate data,
whereas the second-level values are referred to as columns.
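This two-level structure can be sketched as a dict of dicts (illustrative row keys and column names, not any particular store's API):

```python
# Two-level key-value view of a wide-column store:
# row key -> column name -> value. Each row can have different columns.
column_family = {
    "user:1": {"name": "Ana", "city": "Lisbon"},
    "user:2": {"name": "Raj", "city": "Pune", "phone": "555-0101"},
}

# First level selects the row (the aggregate); second level selects a column.
row = column_family["user:2"]
phone = row.get("phone")                      # present for user:2 only
missing = column_family["user:1"].get("phone")  # user:1 has no such column
```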
17. [figure]
18. [figure]
19. [figure]
20. Graph-Based Model
Graph or network data models consider the
relationship between two pieces of information to be
as meaningful as the information itself.
As such, this data model is really made for any
information you would typically represent in a graph.
It uses relationships and nodes, where the data is the
information itself, and the connection is created
between the nodes.
21. Graph databases store data in nodes and edges.
Nodes typically store information about people,
places, and things while edges store information
about the relationships between the nodes.
Graph databases excel in use cases where you need
to traverse relationships to look for patterns such as
social networks, fraud detection, and
recommendation engines.
Neo4j and JanusGraph are examples of graph
databases.
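A toy adjacency-list sketch shows the kind of relationship traversal graph databases are built for (the names and edges are invented for illustration):

```python
# Nodes hold data, edges hold relationships. A breadth-first walk
# finds everyone reachable within two hops ("friends of friends").
from collections import deque

edges = {                      # "follows" relationships
    "ana": ["raj", "mei"],
    "raj": ["li"],
    "mei": ["li", "sam"],
    "li": [],
    "sam": [],
}

def within_hops(start, max_hops):
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue           # don't expand beyond the hop limit
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return seen - {start}

reachable = within_hops("ana", 2)
```

In a relational database this traversal would require repeated self-joins; in a graph store it is the native access pattern.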
22. [figure]
23. [figure]
24. Graph-based Data Models are used in social
networking sites to store interconnections.
It is used in fraud detection systems.
This Data Model is also widely used in Networks and
IT operations.
26. Distribution Models
The primary driver of interest in NoSQL has been its
ability to run databases on a large cluster.
As data volumes increase, it becomes more difficult
and expensive to scale up—buy a bigger server to run
the database on.
A more appealing option is to scale out—run the
database on a cluster of servers.
Aggregate orientation fits well with scaling out
because the aggregate is a natural unit to use for
distribution.
28. Single Server
No distribution at all.
Run the database on a single machine that handles
all the reads and writes to the data store.
It eliminates all the complexities of distribution.
It's easy for operations people to manage and easy
for application developers to reason about.
30. Replication
Replication takes the same data and copies it over
multiple nodes.
Two Types:
master-slave
peer-to-peer
31. Master-Slave Replication
Replicate data across multiple nodes.
One node is designated as the master, or primary.
This master is the authoritative source for the data
and is usually responsible for processing any updates
to that data.
The other nodes are slaves, or secondaries.
A replication process synchronizes the slaves with
the master
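The flow can be simulated in a few lines (a deliberately simplified model: real systems replicate asynchronously and incrementally, not by copying the whole state):

```python
# Toy simulation of master-slave replication: all writes go to the
# master, a sync step copies the master's state to each slave, and
# reads are served by slaves.
class Node:
    def __init__(self):
        self.data = {}

master = Node()
slaves = [Node(), Node()]

def write(key, value):
    master.data[key] = value          # only the master accepts writes

def replicate():
    for s in slaves:                  # the replication process
        s.data = dict(master.data)

def read(key, slave_index=0):
    return slaves[slave_index].data.get(key)

write("room:101", "booked")
stale = read("room:101")              # None: update not yet propagated
replicate()
fresh = read("room:101")              # visible after synchronization
```

The gap between `stale` and `fresh` is exactly the replication lag that causes the read inconsistencies discussed below.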
32. [figure]
33. Advantages
Read scaling: for a read-intensive dataset, you can scale
horizontally to handle more read requests by adding more
slave nodes and routing all read requests to the slaves.
Read resilience: should the master fail, the slaves can still
handle read requests. Again, this is useful if most of
your data access is reads. The failure of the
master does eliminate the ability to handle
writes until either the master is restored or a
new master is appointed.
34. The ability to appoint a slave to replace a failed master means
that master-slave replication is useful even if you don't need to
scale out. All read and write traffic can go to the master while
the slave acts as a hot backup.
Masters can be appointed manually or automatically.
Manual appointment typically means that when you configure
your cluster, you configure one node as the master.
With automatic appointment, you create a cluster of nodes
and they elect one of themselves to be the master.
35. Disadvantages:
Inconsistency
You have the danger that different clients, reading different
slaves, will see different values because the changes haven’t all
propagated to the slaves.
if the master fails, any updates not passed on to the
backup are lost.
36. Peer-to-Peer Replication
Peer-to-peer replication attacks these problems by
not having a master.
All the replicas have equal weight, they can all accept
writes, and the loss of any of them doesn’t prevent
access to the data store.
With a peer-to-peer replication cluster, you can ride
over node failures without losing access to data.
Furthermore, you can easily add nodes to improve
your performance.
37. [figure]
38. Problem: Inconsistency:
When you can write to two different places, you run
the risk that two people will attempt to update the
same record at the same time—a write-write conflict.
Inconsistencies on read lead to problems but at least
they are relatively transient.
Inconsistent writes are forever.
39. Solutions:
1. We can ensure that whenever we write data, the
replicas coordinate to avoid a conflict.
We don't need all the replicas to agree on the write, just a
majority, so we can still survive losing a minority of
the replica nodes.
2. We can decide to cope with an inconsistent write.
There are contexts where we can come up with a policy
to merge inconsistent writes. In this case we can get
the full performance benefit of writing to any replica.
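The majority-quorum idea in option 1 can be sketched like this (a toy model in which a failed replica is represented as None):

```python
# Sketch of a majority-quorum write for peer-to-peer replication:
# the write is applied to every reachable replica and succeeds only
# if a strict majority acknowledged it, so the loss of a minority of
# nodes can be tolerated.
def quorum_write(replicas, key, value):
    acks = 0
    for replica in replicas:
        if replica is not None:           # None models a failed node
            replica[key] = value
            acks += 1
    return acks > len(replicas) // 2      # strict majority required

cluster = [{}, {}, None]                  # 3 replicas, one is down
ok = quorum_write(cluster, "a", 6)        # 2 of 3 acks -> accepted
```

With three replicas, any two quorums must overlap in at least one node, which is what prevents two conflicting writes from both being accepted.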
40. Sharding
A busy data store is often busy because different people are
accessing different parts of the dataset.
In these circumstances we can support horizontal
scalability by putting different parts of the data onto
different servers, a technique that's called sharding.
41. [figure]
42. In the ideal case, we have different users all talking
to different server nodes. Each user only has to talk
to one server, so gets rapid responses from that
server.
The load is balanced out nicely between servers
for example, if we have ten servers, each one only has to
handle 10% of the load.
43. In order to get close to this ideal, we have to ensure that data
that's accessed together is clumped together on the
same node, and that these clumps are arranged on
the nodes to provide the best data access.
We design aggregates to combine data that's commonly
accessed together, so aggregates leap out as an
obvious unit of distribution.
44. When most accesses of certain aggregates are based on a
physical location, you can place the data close to where
it's being accessed.
Another factor is trying to keep the load even. This
means that you should try to arrange aggregates so
they are evenly distributed across the nodes, which then
all get equal amounts of the load.
45. Many NoSQL databases offer auto-sharding,
where the database takes on the responsibility
of allocating data to shards and ensuring that data
access goes to the right shard.
Sharding can improve both read and write performance;
in particular, it provides a way to horizontally scale writes.
On its own, sharding does little for resilience:
a node failure makes that shard's data unavailable,
and it's not good to have a database with part of its data
missing.
46. Combining Sharding and Replication
If we use both master-slave replication and sharding
this means that we have multiple masters, but each
data item only has a single master.
Depending on your configuration, you may choose a
node to be a master for some data and slaves for
others, or you may dedicate nodes for master or slave
duties.
48. Using peer-to-peer replication and sharding is a
common strategy for column-family databases.
In a scenario like this you might have tens or
hundreds of nodes in a cluster with data sharded
over them.
A good starting point for peer-to-peer replication is
to have a replication factor of 3, so each shard is
present on three nodes.
Should a node fail, then the shards on that node will
be built on the other nodes
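A naive placement scheme for a replication factor of 3 might look like this (purely illustrative; real systems such as Cassandra use more sophisticated placement strategies):

```python
# Sketch of shard placement with replication factor 3: shard i lives
# on nodes i, i+1, i+2 (mod cluster size), so losing any single node
# leaves two surviving copies from which the shard can be rebuilt.
def nodes_for_shard(shard_id, num_nodes, rf=3):
    return [(shard_id + k) % num_nodes for k in range(rf)]

placement = {s: nodes_for_shard(s, num_nodes=5) for s in range(5)}

# If node 0 fails, every shard still has at least two surviving copies.
survivors = {s: [n for n in nodes if n != 0]
             for s, nodes in placement.items()}
```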
50. Consistency
Biggest change from a centralized relational database
to a cluster oriented NoSQL
Relational databases: strong consistency
NoSQL systems: mostly eventual consistency
51. Update Consistency
Two users may update the same data item at the same time,
each in a slightly different way.
This issue is called a write-write conflict: two people
updating the same data item at the same time.
When the writes reach the server, the server will serialize
them: decide to apply one, then the other.
If the server simply serializes them, one update is applied and
immediately overwritten by the other (a lost update).
52. Read-read (or simply read) conflict:
Different people see different data at the same time
Stale data: out of date
Replication is a source of inconsistency
54. Solutions
Pessimistic approach
Prevent conflicts from occurring.
Usually implemented with write locks managed by the system
Optimistic approach
Lets conflicts occur, but detects them and takes action to sort them
out
Approaches (for write-write conflicts):
conditional updates: test the value just before updating
save both updates: record that they are in conflict and then
merge them
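The conditional-update approach above can be sketched as a compare-and-set (illustrative; real stores implement this server-side, for example as check-and-set or conditional-write operations):

```python
# Optimistic concurrency via conditional update: the write succeeds
# only if the value is still what the client last read; otherwise the
# conflict is detected and the caller must retry or merge.
def conditional_update(store, key, expected, new_value):
    if store.get(key) != expected:
        return False              # someone else wrote first: conflict
    store[key] = new_value
    return True

store = {"phone": "555-0101"}
ok1 = conditional_update(store, "phone", "555-0101", "555-0202")  # wins
ok2 = conditional_update(store, "phone", "555-0101", "555-0303")  # stale
```

The second update fails because its "expected" value is stale, which is precisely the lost update the pessimistic lock would have prevented, detected after the fact instead.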
55. Pessimistic vs. optimistic approach
Concurrency involves a fundamental tradeoff between:
consistency (avoiding errors such as update conflicts) and
availability (responding quickly to clients).
Pessimistic approaches often:
severely degrade the responsiveness of a system
lead to deadlocks, which are hard to prevent and debug.
56. Forms of consistency
Strong (or immediate) consistency :
ACID transaction
Logical consistency :
No read-write conflicts (atomic transactions)
Sequential consistency :
Updates are serialized
Session (or read-your-writes) consistency
Within a user’s session
Eventual consistency
You may have replication inconsistencies but eventually all nodes
will be updated to the same value
57. Relaxing Consistency
Consistency is a Good Thing—but, sadly, sometimes
we have to sacrifice it.
It is always possible to design a system to avoid
inconsistencies, but often impossible to do so
without making unbearable sacrifices in other
characteristics of the system.
As a result, we often have to tradeoff consistency for
something else.
Trading off consistency is a familiar concept even in
single-server relational database systems
58. The tool relational systems use to enforce consistency is the
transaction, and transactions can provide strong consistency
guarantees.
Transaction systems usually come with the ability to
relax isolation levels, allowing queries to read data that
hasn't been committed yet, and in practice most
applications relax consistency down from the highest
isolation level (serializable) in order to get acceptable
performance.
We most commonly see people using the read-committed
isolation level, which eliminates some read-write
conflicts but allows others.
59. The CAP Theorem
Proposed by Eric Brewer in 2000 and given a formal
proof by Seth Gilbert and Nancy Lynch [Gilbert and
Lynch] a couple of years later.
“Given the three properties of Consistency,
Availability, and Partition tolerance, you can only get
two.”
60. CAP states that in case of failures you can have at
most two of these three properties for any shared-
data system
To scale out, you have to distribute resources.
P is not really an option but rather a need
The real choice is between Consistency and Availability.
In many cases, designers choose availability over
consistency.
61. Consistency:
all people see the same data at the same time
Availability:
guarantee that every request receives a response about
whether it was successful or failed.
However, it does not guarantee that a read request returns the
most recent write.
The more users a system can serve, the better its
availability.
62. Partition tolerance:
The system continues to operate despite communication
breakages that separate the cluster into partitions unable to
communicate with each other
Out of these three guarantees, no system can provide
more than two.
Since in distributed systems partitioning of the
network is unavoidable, the tradeoff is always
between consistency and availability.
63. With two breaks in the communication lines, the network
partitions into two groups.
64. Martin and Pramod are both trying to book the last hotel
room on a system that uses peer-to-peer distribution
with two nodes (London for Martin and Mumbai for
Pramod).
If we want to ensure consistency, then when Martin tries
to book his room on the London node, that node must
communicate with the Mumbai node before confirming
the booking.
Essentially, both nodes must agree on the serialization of
their requests.
This gives us consistency—but should the network link
break, then neither system can book any hotel room,
sacrificing availability.
65. One way to improve availability is to designate one
node as the master for a particular hotel and ensure
all bookings are processed by that master. Should
that master be Mumbai, then Mumbai can still
process hotel bookings for that hotel and Pramod
will get the last room, whereas Martin can see the
inconsistent room information but cannot make a
booking (which would in this case cause an update
inconsistency). This is a lack of availability for
Martin.
66. To gain more availability, we might allow both
systems to keep accepting hotel reservations even if a
link in the network breaks down.
But this may cause both Martin and Pramod to book the
same room => inconsistency.
But in this domain it might be tolerated somehow:
the travel company may tolerate some overbooking;
some hotels might always keep a few rooms clear
even when they are fully booked; some hotels might
even cancel a booking with an apology once they
detect the conflict.
67. [figure]
68. CA (Consistency and Availability)-
The system guarantees consistency and availability, but only
in the absence of network partitions; this is the territory of
traditional single-server relational databases.
AP (Availability and Partition Tolerance)-
The system prioritizes availability over consistency and can
respond with possibly stale data.
The system can be distributed across multiple nodes and is
designed to operate reliably even in the face of network
partitions.
Example databases: Cassandra, CouchDB, Riak, Voldemort,
Amazon DynamoDB.
69. CP (Consistency and Partition Tolerance)-
The system prioritizes consistency over availability and
responds with the latest updated data.
The system can be distributed across multiple nodes and is
designed to operate reliably even in the face of network
partitions.
Example databases: Apache HBase, MongoDB, Redis,
Google Cloud Spanner.
70. Version Stamps
Provide a means of detecting concurrency conflicts
Each data item has a version stamp which gets incremented
each time the item is updated
Before updating a data item, a process can check its version
stamp to see if it has been updated since it was last read
Implementation methods
Counter – requires a single master to “own” the counter
GUID (Globally Unique Identifier) – can be computed by any
node, but GUIDs are large and cannot be compared directly
Hash the contents of a resource
Timestamp of last update – node clocks must be synchronized
71. Counter – requires a single master to “own” the
counter, incrementing it whenever the resource is updated.
Counters are useful since they make it easy to tell if one version is
more recent than another.
On the other hand, they require the server to generate the counter
value, and also need a single master to ensure the counters aren’t
duplicated.
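The counter scheme can be sketched like this (a toy model of a master-owned version counter; the class and variable names are invented):

```python
# Counter version stamps: the master increments a counter on every
# update, and a replica's copy is stale when its counter is lower
# than the master's current counter.
class Master:
    def __init__(self):
        self.value, self.version = None, 0

    def update(self, new_value):
        self.version += 1            # only the master owns the counter
        self.value = new_value
        return self.version

master = Master()
v1 = master.update("a=1")            # version 1
v2 = master.update("a=6")            # version 2
replica_version = v1                 # this replica saw only the first update
is_stale = replica_version < master.version
```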
72. For example: R1 to R6 are replicas and R7 is the master (M1).
C is the counter variable, with value 3, and a = 1 is the database
item. All replicas hold the same database item value, a = 1, and
the same counter value, C = 3, as shown in the following figure.
73. Now replica R3 wants to update the database item to a = 6,
so its counter value must be incremented first.
The server on the master side increments the counter by 1, to
C = 4, and the new values of the database item and counter are
reflected at replica R3 and the master.
C = 4 is now the highest counter value, so this update is the
most recent one.
The master then propagates this update to all replicas.
If any other replica wants to update the database item, its
counter value must likewise be updated first and communicated
to the master, which then communicates it to the other replicas.
74. [figure]
75. GUID (Globally Unique Identifier) – can be computed
by any node, but GUIDs are large and cannot be compared
directly.
A GUID is a large random number that can be assumed to be
unique.
It is typically created from a combination of the date,
hardware information, or other sources of entropy.
For practical purposes, two GUIDs will never be the same.
The disadvantage is that GUIDs are very large, and they
cannot be compared to decide which version is more recent.
76. Hashing – hash the contents of a resource.
With a big enough hash size, a content hash can be globally
unique like a GUID and can also be generated by anyone.
The advantage is that hashes are deterministic: any node will
generate the same content hash for the same resource data.
However, like GUIDs, they can't be directly compared for
recentness, and they can be lengthy.
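A content-hash version stamp is easy to sketch with Python's hashlib (the choice of SHA-256 is an assumption; any sufficiently large hash works):

```python
import hashlib

# Content-hash version stamp: any node can derive the same stamp
# from the resource's bytes, but stamps cannot be ordered by recency.
def version_stamp(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()

s1 = version_stamp(b"a=1")
s2 = version_stamp(b"a=43")
same = version_stamp(b"a=1")
# Equal content yields an equal stamp; changed content yields a
# different stamp, so a change is detectable by comparing stamps.
```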
77. For example, consider a database item a = 1 with replicas
R1 to R6 and master R7. Replica R1 wants to update the database
item to the value 43. A simple hash function of mod 10 (10 being
the hash key) maps the new value to a bucket: 43 % 10 = 3.
Because this hash differs from the hash of the old value, it is
easy to see that replica R1 holds an updated version.
78. [figure]
79. Timestamp of last update – node clocks must be
synchronized.
Whenever an update is made, its timestamp is recorded.
Timestamps work much like counters: they can be compared
for recentness.
In this scheme many replicas (machines) can generate
timestamps, but all of the machines' clocks must be
synchronized with each other.
If any replica has a bad clock (one that is not properly
synchronized), data corruption problems will arise.
Database administrators must also check the granularity of
timestamps, otherwise timestamps may be duplicated;
millisecond precision is usually adequate.
80. [figure]
81. Now replica R2 wants to update the database item to a = 4.
R2's timestamp is recorded with millisecond precision, say
TS = 03-25-22 02:31:29.571. Compared with the other replicas'
timestamps, 571 > 570 (at millisecond precision), so R2
contains the most recent value.
82. [figure]
83. Map-Reduce
When you have a cluster, you have lots of machines
to spread the computation over.
However, you also still need to try to reduce the
amount of data that needs to be transferred across
the network.
The map-reduce pattern is a way to organize
processing so as to take advantage of multiple
machines on a cluster while keeping as much of the
processing and the data it needs together on the
same machine.
84. This programming model gained prominence with
Google’s MapReduce framework [Dean and
Ghemawat, OSDI-04].
A widely used open-source implementation is part of
the Apache Hadoop project.
The name “map-reduce” reveals its inspiration from
the map and reduce operations on collections in
functional programming languages
85. Map Reduce – benefits
Complex details are abstracted away from the
developer
– No file I/O
– No networking code
– No synchronization
It’s scalable because you process one record at a time
A record consists of a key and corresponding value
86. Example:
Let us consider the usual scenario of customers and
orders.
We have chosen order as our aggregate, with each order
having line items.
Each line item has a product ID, quantity, and the price
charged.
Sales analysis people want to see a product and its total
revenue for the last seven days.
87. In order to get the product revenue report, you’ll
have to visit every machine in the cluster and
examine many records on each machine.
This is exactly the kind of situation that calls for
map-reduce. Again, the first stage in a map-reduce
job is the map.
A map is a function whose input is a single aggregate
and whose output is a bunch of key-value pairs.
In this case, the input would be an order, and the
output would be key-value pairs corresponding to
the line items
88. For this example, we are just selecting a value out of
the record, but there’s no reason why we can’t carry
out some arbitrarily complex function as part of the
map—providing it only depends on one aggregate’s
worth of data.
89. Each such pair would have the product ID as the key and an embedded
map with the quantity and price as the value
90. The reduce function takes multiple map outputs with the same key and
combines their values
91. [figure]
92. A map function might yield 1000 line items from
orders for “Database Refactoring”; the reduce
function would reduce down to one, with the totals
for the quantity and revenue.
While the map function is limited to working only on
data from a single aggregate, the reduce function can
use all values emitted for a single key
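The map and reduce functions for the product-revenue report can be sketched in Python (the order structure and field names are invented for illustration; real frameworks such as Hadoop distribute these stages across nodes):

```python
from collections import defaultdict

# Map stage: take one order aggregate, emit (product, {qty, revenue})
# pairs, one per line item. The map sees only a single aggregate.
orders = [
    {"id": 1, "lines": [
        {"product": "Database Refactoring", "qty": 2, "price": 30.0}]},
    {"id": 2, "lines": [
        {"product": "Database Refactoring", "qty": 1, "price": 30.0},
        {"product": "NoSQL Distilled", "qty": 1, "price": 25.0}]},
]

def map_order(order):
    for line in order["lines"]:
        yield line["product"], {"qty": line["qty"],
                                "revenue": line["qty"] * line["price"]}

# Reduce stage: combine all values emitted for one key into one value.
def reduce_product(values):
    return {"qty": sum(v["qty"] for v in values),
            "revenue": sum(v["revenue"] for v in values)}

grouped = defaultdict(list)
for order in orders:
    for key, value in map_order(order):     # map
        grouped[key].append(value)

report = {key: reduce_product(vals)         # reduce
          for key, vals in grouped.items()}
```

Note that `reduce_product`'s output has the same shape as its input values, so it is a combinable reducer in the sense discussed below and could also run as a combiner before data leaves a node.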
93. Partitioning and Combining
To increase parallelism, we can also partition the output of the
mappers and send each partition to a different reducer
(“shuffling”)
To take advantage of this, the results of the mapper are
divided up based on the key on each processing node.
Typically, multiple keys are grouped together into partitions.
The framework then takes the data from all the nodes for one
partition, combines it into a single group for that partition,
and sends it off to a reducer.
Multiple reducers can then operate on the partitions in
parallel, with the final results merged together.
(This step is also called “shuffling,” and the partitions are
sometimes referred to as “buckets” or “regions.”)
94. The next problem we can deal with is the amount of data
being moved from node to node between the map and
reduce stages.
Much of this data is repetitive, consisting of multiple key-
value pairs for the same key.
A combiner function cuts this data down by combining
all the data for the same key into a single value.
A combiner function is, in essence, a reducer function;
indeed, in many cases the same function can be used for
combining as for the final reduction.
The reduce function needs a special shape for this to
work: its output must match its input. We call such a
function a combinable reducer.
95. [figure]
96. When you have combining reducers, the map-reduce
framework can safely run not only in parallel (to reduce
different partitions), but also in series to reduce the same
partition at different times and places.
In addition to allowing combining to occur on a node before
data transmission, you can also start combining before
mappers have finished.
This provides a good bit of extra flexibility to the map-reduce
processing. Some map-reduce frameworks require all
reducers to be combining reducers, which maximizes this
flexibility.
If you need to do a noncombining reducer with one of these
frameworks, you’ll need to separate the processing into
pipelined map-reduce steps.