NoSQL – Data Stores for Big Data
2nd International ScaDS Summer School on Big Data
Anika Groß, Database Group, Universität Leipzig
Leipzig, 12.07.2016
“NoSQL for Big Data”
• Massive data growth
• Big data, cloud, real-time applications, …
• Requirements
• High read and write scalability
• Management of unstructured and semi-structured data
• Continuous availability
• Decentralized applications
• …
• Modern NoSQL data stores pioneered by
leading internet companies as in-house solutions
Figure: https://www.dezyre.com/article/nosql-vssql-4-reasons-why-nosql-is-better-for-big-data-applications/86
“Not only SQL”
• No standardized definition!
• Non-relational approaches
• Different applications require different types of databases
• Database system with one or more of these criteria:
• No relational data model
• Schema free, only weak restrictions
• No joins, no normalization
• Distributed, horizontally scalable system
• Use of commodity hardware
• “No SQL”
• Simple API instead of SQL
• “No transactions”
• BASE consistency model instead of ACID
NoSQL Data Stores
• Key-Value Stores
• Collection of key-value pairs
• Data access via key: get(key), put(key, value)
• Wide Column Stores
• Tables with records with (many) dynamic columns
• Access via key, SQL-like query language, …
• Document Stores
• Semi-structured data in documents (e.g. JSON)
• Access via key or simple API/query language
• Graph Databases
• Data as nodes and edges with properties
• Database queries incl. graph algorithms
[Figure: example systems per category; * marks multi-model systems]
DB Engines Ranking
http://db-engines.com/en/ranking
Agenda
• CAP, ACID, BASE, Consistency Models
• Key-value store: Dynamo
• Consistent hashing
• Object versioning
• Quorum-like consistency model
• …
• Document store: MongoDB
• Query language
• Indexing
• Replication
• Sharding
Distributed Data Management
• Distributed system needs to deal with
• Network failures
• Network latency, limited throughput
• Change of network topology
• …
• Communication between nodes
• Among others: synchronization and replication
• Robust against node failure, loss of messages, …
• Trade-off: performance vs. data consistency
• Wait for synchronization between nodes
• Avoid conflicts / inconsistencies
CAP Theorem
• Consistency
• All nodes see the same data at the same time
• Availability
• Every read or write request receives a response (succeeded or failed)
• Partition tolerance
• System continues to operate despite arbitrary partitioning due to network failures (loss of messages)
• Theorem: A distributed computer system can provide at most two of these three properties.
Brewer: Towards robust distributed systems. Proceedings of the Annual ACM Symposium on Principles of Distributed Computing, 2000
[Figure: triangle with corners Partition Tolerance, Availability, Consistency]
CAP Theorem (2)
• CP: Consistent but not available under network partitions
• Lock transactions, avoid conflicts, …
• AP: Available but not consistent under network partitions
• Writes always possible even if no communication/synchronization is possible
• Inconsistent data, conflict resolution necessary
• Controversy!
• “2 of 3” was misleading, no CA: CAP Twelve Years Later: How the "Rules" Have Changed
• Classification of systems difficult: Please stop calling databases CP or AP
[Figure: CAP triangle with edges CP, AP and (CA); systems shown: MongoDB, BigTable, HBase, Dynamo/S3, Cassandra]
Figure source: Misconceptions about the CAP Theorem
ACID
• RDBMS ensure ACID properties for transactions:
• Atomicity
• “All or nothing” property
• If part of the transaction fails, the entire transaction fails, and the database state is left unchanged
• Consistency
• A successful transaction preserves the database consistency
• Guarantee defined integrity constraints
• Isolation
• Concurrent execution of transactions results in a system state as if
transactions were executed serially
• Transactions cannot rely on the intermediate or unfinished state of other transactions
• Durability
• Successfully committed transactions will remain, even in the event of system
failure, power loss, other breakdowns (persistency)
BASE
• BA - Basically Available
• Partial network failure → response to any request (response could be ‘failure’)
• Replication factor = 3, 1 node fails: query response still possible
• S - Soft State
• The system could change over time
• Even during times without input → changes due to “eventual consistency”
→ state of the system is always “soft”
• E - Eventually Consistent
• Consistency is not checked for every transaction before it moves on to the next one → replicas can be inconsistent
• The system will eventually become consistent (once it stops receiving input)
• “Sooner or later” the data will be propagated everywhere it should be
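Consistency Models
• Strong Consistency
• Eventual Consistency
• Eventually (after the inconsistency window closes) all accesses will return the last updated value.
[Figure: client timelines for both models around update(x,v2). Strong consistency: reads before the update return r(x)=v1, all reads after it return r(x)=v2. Eventual consistency: reads inside the inconsistency window may still return r(x)=v1; once it closes, all reads return r(x)=v2.]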
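The inconsistency window can be illustrated with a toy simulation. This is only a sketch under stated assumptions: the `Replica` class, the `pending` queue and the `update`/`propagate` helpers are hypothetical names, and the write is applied to one replica first and delivered to the others later.

```python
class Replica:
    """One copy of item x (illustrative only)."""
    def __init__(self, value):
        self.value = value

def update(replicas, pending, value):
    """Apply the write to the first replica and queue it for the others."""
    replicas[0].value = value
    pending.extend((replica, value) for replica in replicas[1:])

def propagate(pending):
    """Deliver one queued update; returns False once nothing is left."""
    if not pending:
        return False
    replica, value = pending.pop(0)
    replica.value = value
    return True

replicas = [Replica("v1") for _ in range(3)]
pending = []
update(replicas, pending, "v2")
during_window = [r.value for r in replicas]   # reads inside the inconsistency window
while propagate(pending):
    pass
after_window = [r.value for r in replicas]    # reads after the window has closed
```

Inside the window a read may return either value depending on the replica it reaches; after propagation has finished, every replica returns the last updated value.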
Consistency Models
• Read-your-writes Consistency
• Client updates item → will always access the updated value and never see an older value
• Monotonic Read Consistency
• Client reads updated value → will never read any previous value
[Figure: client timelines illustrating both models with update(x,v2) and reads r(x)=v1 / r(x)=v2]
Key-Value Stores
• Data structure: collection of key-value pairs = associative array / dictionary / map
• Key
• Unique within a namespace
• Namespace = collection of keys, ‘bucket’
• Values
• Uninterpreted string of bytes of arbitrary length (BLOB)
• No integrity constraints (check on application side)
• Different types of key-value stores
• Different consistency models, ordered/unordered keys, RAM vs. disk/SSD
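A minimal in-memory sketch of this data structure follows. The `KeyValueStore` class and the extra `namespace` argument on `put`/`get` are assumptions made for illustration; real stores differ in interface, persistence and consistency model.

```python
class KeyValueStore:
    """Toy key-value store: keys are unique per namespace, values are opaque bytes."""

    def __init__(self):
        self._buckets = {}  # namespace -> {key: value}

    def put(self, namespace, key, value):
        # The store does not interpret the value; it only insists on raw bytes.
        if not isinstance(value, bytes):
            raise TypeError("values are uninterpreted byte strings")
        self._buckets.setdefault(namespace, {})[key] = value

    def get(self, namespace, key):
        # Returns None for an unknown namespace or key.
        return self._buckets.get(namespace, {}).get(key)

store = KeyValueStore()
store.put("Sales", "key 1", b"42 units")
store.put("Inventory", "key 1", b"7 in stock")
```

The same key can exist in both namespaces without conflict, and any integrity check on the bytes is left to the application.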
[Figure: three namespaces (Sales, Inventory, Product descriptions), each holding its own pairs key 1 … key n → value]
Amazon Dynamo
• Scalable distributed data store built for Amazon’s platform
• Dynamo principles (or part of them) implemented in several NoSQL solutions
• “Not only” Dynamo: e.g. Cassandra = Dynamo + BigTable
• Motivation
• Scale to extreme peak loads efficiently without any downtime
• e.g. busy holiday shopping season
[Figure: Project Voldemort logo]
DeCandia et al.: Dynamo: Amazon’s Highly Available Key-value Store. ACM SIGOPS Operating Systems Review, 41(6), 2007.
Amazon Dynamo
• Aims: high availability and performance
• Address tradeoffs between availability, consistency,
cost-effectiveness and performance
• Eventually-consistent storage system
• “always writeable” data store
• Favor availability over consistency (if necessary)
• Performance SLA (Service Level Agreement)
• “response within 300ms for 99.9% of requests for peak client load
of 500 requests per second”
• Decentralized system: P2P-like distribution
• No master nodes
• All nodes have the same functionality
Techniques
• Consistent hashing
• Object versioning / vector clocks
• Quorum-like consistency model
• Decentralized replica synchronization protocol
• Gossip-based membership protocol and failure detection
Partitioning and Replication of Keys
• Logical ring of nodes
• Output range of a hash function → fixed circular space
• Node position is a random value in the range of the hash function
• Assignment of data to nodes
• Determine hash value of key → position on ring
• Assign to N successor nodes (clockwise)
• Example: hash value between A and B, N=3 → B, C, D
• Consistent hashing
• Minimizes the number of re-assignments when nodes are added or removed
• Needs a sophisticated hash function for good load balancing and data locality
• Preference list: list of nodes that are responsible for storing a particular key (every node knows the preference list)
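The ring assignment can be sketched in a few lines. The assumptions here are mine: node positions are derived from an MD5 hash of the node name instead of a random value, the ring has 32-bit positions, and `Ring`, `ring_position` and `preference_list` are hypothetical names.

```python
import bisect
import hashlib

def ring_position(name: str) -> int:
    """Map a string onto the ring [0, 2**32) using MD5 (an arbitrary choice)."""
    return int.from_bytes(hashlib.md5(name.encode()).digest()[:4], "big")

class Ring:
    def __init__(self, nodes, n_replicas=3):
        self.n = n_replicas
        self.points = sorted((ring_position(node), node) for node in nodes)

    def preference_list(self, key: str):
        """Return the N successor nodes clockwise from the key's ring position."""
        positions = [position for position, _ in self.points]
        start = bisect.bisect_right(positions, ring_position(key))
        count = min(self.n, len(self.points))
        return [self.points[(start + i) % len(self.points)][1] for i in range(count)]

nodes = ["A", "B", "C", "D", "E", "F", "G"]
ring = Ring(nodes)
keys = [f"key-{i}" for i in range(1000)]
before = {k: ring.preference_list(k)[0] for k in keys}
after = Ring(nodes + ["H"]).preference_list   # lookup on a ring with one extra node
moved = sum(1 for k in keys if after(k)[0] != before[k])
```

Adding node H only reassigns the keys that fall on the arc directly before H, so `moved` counts just those keys; every other key keeps its first replica.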
[Figure: hash ring with nodes A to G and the ring position Hash(key)]
Data Access
• Key-Value Store interface
• Access via Primary-Key; no complex queries
• Every node in the ring can route each query
• Routing to a node (usually the first) in the preference list of the specific key
• Put (Key, Context, Object)
• Coordinator creates vector clock (versioning) based on context
• Local write of object incl. vector clock
• Asynchronous replication
• Write request to N-1 remaining nodes in preference list
• Successful write if (at least) W-1 nodes respond
• Asynchronous update of the remaining replicas if W < N → consistency problems
• Get (Key)
• Read request to N nodes in preference list
• Response from R nodes → possibly different versions of the same object:
List of (Object, Context) pairs
Replication
• Read/Write quorum
• R/W = minimum number of the N replica nodes that must participate in a successful read/write operation
• Flexible adaptation of (N,R,W) according to application
requirements w.r.t. performance, availability, durability
• Ensure read of current version: R + W > N
• No loss of information
• Conflict resolution
• Data store side: e.g. “last write wins”
• Application side: e.g. merge conflicting shopping cart versions
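The two conflict-resolution styles can be contrasted on a pair of diverged shopping carts. This is an illustrative sketch only: the `timestamp`/`items` record layout and both function names are assumptions, not Dynamo's actual data format.

```python
def last_write_wins(versions):
    """Data-store-side resolution: keep the version with the newest timestamp."""
    return max(versions, key=lambda v: v["timestamp"])["items"]

def merge_carts(versions):
    """Application-side resolution: union of all items in the conflicting carts."""
    merged = set()
    for version in versions:
        merged |= version["items"]
    return merged

conflicting = [
    {"timestamp": 10, "items": {"book", "pen"}},
    {"timestamp": 12, "items": {"book", "lamp"}},
]
lww = last_write_wins(conflicting)   # drops "pen", which only the older version had
merged = merge_carts(conflicting)    # keeps every item from both versions
```

“Last write wins” silently loses the item that only the older version contained, while the union merge never loses an added item but can bring a deliberately deleted item back.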
Quorum variants
• Optimizing reads: R=1, W=N (e.g. R=1, W=N=3)
• Consistency due to “write to all” = wait for all write acks
• Optimizing writes: R=N, W=1 (e.g. R=N=3, W=1)
• Consistency due to “read from all” = last version will be included
• R+W>N (e.g. R=3, W=3, N=5 or R=4, W=2, N=5)
• Eventual consistency: R+W≤N (e.g. R=2, W=2, N=4)
• Read might not cover current write
[Figure: read (R) and write (W) replica sets for each configuration]
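Why R+W>N guarantees that a read sees the latest write can be checked by brute force for the configurations named on this slide. The function name is hypothetical, and the sketch assumes strict quorums over a fixed set of N replicas.

```python
from itertools import combinations

def quorums_always_overlap(n: int, r: int, w: int) -> bool:
    """Enumerate all read and write sets and check that each pair shares a replica."""
    replicas = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(replicas, r)
        for write_set in combinations(replicas, w)
    )

# (N, R, W) configurations taken from the slide
configs = [(3, 1, 3), (3, 3, 1), (5, 3, 3), (5, 4, 2), (4, 2, 2)]
results = {(n, r, w): quorums_always_overlap(n, r, w) for n, r, w in configs}
```

Only the last configuration (R=2, W=2, N=4) has a read set that can miss the write set entirely, which is the case the slide labels as eventual consistency.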
Versioning
• Aim: Capture causality between different versions of an object
• Which object versions are known?
• Parallel branches or causal ordering?
• Vector clock: List of (node, counter) pairs
• Version counter per replica node
e.g. D([Sx, 1]) for object D, node Sx, version 1
• Example: evolving object versions
[Figure: object versions evolving across replica nodes Sx, Sy, Sz; m1, m2 are messages]
Versioning (2)
• Does one object version descend from another one?
• All counters of the 1st vector clock ≤ the corresponding counters of the 2nd vector clock → 1st version is an ancestor of the 2nd (the 1st version can be forgotten)
• Otherwise: conflicting versions
• Client identifies conflict during read
• Gets all known versions
• Subsequent update consolidates versions
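The ancestor test can be written down directly. In this sketch a vector clock is a plain dict from node name to counter (a representation chosen here for illustration), and the clock values follow the example in the cited Dynamo paper.

```python
def descends(newer: dict, older: dict) -> bool:
    """True if every counter in `older` is <= the matching counter in `newer`."""
    return all(newer.get(node, 0) >= count for node, count in older.items())

def compare(a: dict, b: dict) -> str:
    """Classify two versions by their vector clocks."""
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(b, a):
        return "a is ancestor of b"
    if descends(a, b):
        return "b is ancestor of a"
    return "conflict"

d2 = {"Sx": 2}
d3 = {"Sx": 2, "Sy": 1}
d4 = {"Sx": 2, "Sz": 1}
d5 = {"Sx": 3, "Sy": 1, "Sz": 1}   # written after the client reconciled d3 and d4
```

Here `compare(d3, d4)` reports a conflict because each clock has a counter the other lacks, while `d5` descends from both and therefore supersedes them.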
Figure: DeCandia et al.: Dynamo: Amazon’s Highly Available Key-value Store.
ACM SIGOPS Operating Systems Review, 41(6), 2007.
Handling Temporary Failures
• “Sloppy” quorum (N, R, W)
• Perform all operations on the first N healthy nodes from
the preference list (not necessarily the first N nodes in the ring)
• Still “writable” in case of node failure
• Hinted Handoff
• Unreachable node → write request sent to another node (“hinted replica”)
• Availability!
• Node recovers → sync of hinted replica and original node
• Example
• B is not available
• Replica to E (handoff) with hint
to intended recipient B
• B recovers
• Hinted replica E → B
• E can delete hinted replica
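The example above can be replayed in a short sketch. The helper names, the `(node, hint)` pair format and the sample write `cart:42` are all hypothetical; the ring order is the clockwise successor order for a key that hashes between A and B.

```python
def choose_replicas(ring_order, healthy, n):
    """Pick n healthy nodes for a key, walking its ring order clockwise.

    Returns (node, hint) pairs; hint is the unreachable node a stand-in
    covers for, or None for a regular replica.
    """
    preferred = ring_order[:n]
    unreachable = [node for node in preferred if node not in healthy]
    chosen = [(node, None) for node in preferred if node in healthy]
    stand_ins = [node for node in ring_order[n:] if node in healthy]
    chosen += [(stand_in, intended) for intended, stand_in in zip(unreachable, stand_ins)]
    return chosen

def hand_back(hinted_writes, recovered):
    """On recovery, return the writes kept for `recovered` and drop them from the stand-in."""
    return hinted_writes.pop(recovered, [])

ring_order = ["B", "C", "D", "E", "F", "G", "A"]
replicas = choose_replicas(ring_order, healthy={"A", "C", "D", "E", "F", "G"}, n=3)
hinted_on_e = {"B": [("cart:42", b"v2")]}      # what E holds on behalf of B
returned_to_b = hand_back(hinted_on_e, "B")    # B recovers, E hands over and deletes
```

With B down, the write still reaches three nodes (C, D and the stand-in E), which is what keeps the store writable during the failure.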
Replica Synchronization
• Hash tree (Merkle tree) for key range
• Leaves are hashes of the values for individual keys
• Parent nodes are hash values of child node hash values
• Advantages
• Efficient check: equal root hashes → replicas are in sync
• Efficient identification of “out of sync” keys: subtree traversal to find differences
• Disadvantages
• Recalculation of hash trees in case of repartitioning (added or removed node)
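A compact sketch of the comparison follows, under assumptions of my own: SHA-256 as the hash, a key range whose size is a power of two, and the hypothetical helpers `build_tree` and `differing_leaves`.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_tree(values):
    """Return the tree as a list of levels, leaves first, root last."""
    level = [h(value) for value in values]
    levels = [level]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def differing_leaves(a, b):
    """Descend from the root, following only subtrees whose hashes differ."""
    suspects = [0] if a[-1] != b[-1] else []
    for depth in range(len(a) - 2, -1, -1):
        suspects = [child for i in suspects for child in (2 * i, 2 * i + 1)
                    if a[depth][child] != b[depth][child]]
    return suspects

replica_1 = build_tree([b"V1", b"V2", b"V3", b"V4"])
replica_2 = build_tree([b"V1", b"V2", b"V3-stale", b"V4"])
out_of_sync = differing_leaves(replica_1, replica_2)   # leaf indexes that differ
```

Two replicas with equal root hashes need no further comparison; otherwise only the differing subtree is visited, so the unchanged half of the key range is never touched.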
[Figure: Merkle tree over keys K1 to K4 with values V1 to V4; leaves H(k1) … H(k4), inner nodes H(H(k1), H(k2)) and H(H(k3), H(k4)), root H(H(H(k1), H(k2)), H(H(k3), H(k4)))]
Overview: Amazon Dynamo Techniques
Problem | Technique | Advantage
Partitioning | Consistent hashing | Incremental scalability
High availability for writes | Vector clocks with reconciliation during reads | Version size is decoupled from update rates
Handling temporary failures | Sloppy quorum and hinted handoff | Provides high availability and durability guarantee when some of the replicas are not available
Recovering from permanent failures | Anti-entropy using Merkle trees | Efficient synchronization of divergent replicas in the background
Membership and failure detection | Gossip-based membership protocol and failure detection | Preserves symmetry and avoids having a centralized registry for storing membership and node liveness information
Source: DeCandia et al.: Dynamo: Amazon’s Highly Available Key-value Store. ACM SIGOPS Operating Systems Review, 41(6), 2007.
Document Stores
• Collection of documents
• Semi-structured data (e.g. JSON format)
• Flexible, extensible schema
• Embedded (denormalized) data model
• Data access via key, (simple) query language, map/reduce queries
• Use cases: web applications, mobile applications, e-commerce solutions …
• Examples
[Figure: logos of example document stores]
[Figure: a database contains collections; each collection contains documents {doc1}, {doc2}, …]
Example – Collection “images”
{
  _id: 1,                                            // field: value
  name: "fish.jpg",
  time: "17:46",
  user: "bob",
  camera: "nikon",
  info: { width: 100, height: 200, size: 12345 },   // embedded document
  tags: ["tuna", "shark"]                            // array of strings
}
{
  _id: 2,
  name: "trees.jpg",
  time: "17:57",
  user: "john",
  camera: "canon",
  info: { width: 30, height: 250, size: 32091 },
  tags: ["oak"]
}
…
Tabular view of the collection (info is an embedded document with width, height, size):
id | name | time | user | camera | info.width | info.height | info.size | tags
1 | fish.jpg | 17:46 | bob | nikon | 100 | 200 | 12345 | [tuna, shark]
2 | trees.jpg | 17:57 | john | canon | 30 | 250 | 32091 | [oak]
3 | hawaii.png | 17:59 | john | nikon | 128 | 64 | 92834 | [maui, tuna]
4 | island.gif | 17:43 | zztop | nikon | 640 | 480 | 50398 | [maui]
MongoDB
• Open source document database
• Current release 3.2 (as of July 2016)
• Embedded data model
• JSON-like documents (BSON = Binary JSON)
• Features
• Query language
• Indexing
• Replication
• Sharding
Query Language
• Selection, projection:
db.images.find({camera:"nikon"}, {name:1, camera:1, _id:0})
• Querying multi-valued attributes:
• Pictures with tag "shark": db.images.find({tags:"shark"})
• Pictures with tags "a", "b" and "c":
db.images.find({tags:{$all:["a","b","c"]}})
• Querying nested objects
• Pictures with width < 100px: db.images.find({"info.width":{$lt:100}})
Example document:
{
  _id: 1,
  name: "fish.jpg",
  …
  camera: "nikon",
  info: { width: 100, height: 200, size: 12345 },
  tags: ["tuna", "shark"]
}
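What these queries select can be reproduced without a server. The sketch below emulates the matching rules over plain Python dicts; it illustrates the semantics only and is not how MongoDB evaluates queries. The sample documents are the first three rows of the “images” collection, with the `time` and `user` fields left out.

```python
images = [
    {"_id": 1, "name": "fish.jpg", "camera": "nikon",
     "info": {"width": 100, "height": 200, "size": 12345}, "tags": ["tuna", "shark"]},
    {"_id": 2, "name": "trees.jpg", "camera": "canon",
     "info": {"width": 30, "height": 250, "size": 32091}, "tags": ["oak"]},
    {"_id": 3, "name": "hawaii.png", "camera": "nikon",
     "info": {"width": 128, "height": 64, "size": 92834}, "tags": ["maui", "tuna"]},
]

# find({camera:"nikon"}, {name:1, camera:1, _id:0}): selection plus projection
nikon = [{"name": d["name"], "camera": d["camera"]} for d in images if d["camera"] == "nikon"]

# find({tags:"shark"}): matches if the array contains the value
shark = [d["_id"] for d in images if "shark" in d["tags"]]

# find({tags:{$all:["maui","tuna"]}}): the array must contain every listed value
both = [d["_id"] for d in images if {"maui", "tuna"} <= set(d["tags"])]

# find({"info.width":{$lt:100}}): the dotted path reaches into the embedded document
narrow = [d["_id"] for d in images if d["info"]["width"] < 100]
```

Note that the document with width exactly 100 does not match the `$lt` condition.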
Aggregation Framework
• Pipeline of operators
• $match: filter documents
• $project: inclusion, suppression, new field, value reset for attributes
• $group: grouping and aggregation
• $unwind: unnest arrays (one document per array element)
• $sort, $limit, $skip, …
http://docs.mongodb.org/manual/reference/sql-aggregation-comparison/
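A pipeline can be read as a chain of functions, each consuming the output of the previous stage. The sketch below emulates a few operators in plain Python over the “images” sample data; the helper names `match`, `unwind` and `group_count` are hypothetical and only mimic the behaviour of the corresponding stages.

```python
from collections import Counter

images = [
    {"user": "bob", "camera": "nikon", "tags": ["tuna", "shark"]},
    {"user": "john", "camera": "canon", "tags": ["oak"]},
    {"user": "john", "camera": "nikon", "tags": ["maui", "tuna"]},
    {"user": "zztop", "camera": "nikon", "tags": ["maui"]},
]

def match(docs, **criteria):        # $match: keep documents that satisfy the filter
    return [d for d in docs if all(d[k] == v for k, v in criteria.items())]

def unwind(docs, field):            # $unwind: one document per array element
    return [{**d, field: item} for d in docs for item in d[field]]

def group_count(docs, field):       # $group with a counting accumulator
    return Counter(d[field] for d in docs)

# Roughly: db.images.aggregate([{$match:{camera:"nikon"}}, {$unwind:"$tags"},
#   {$group:{_id:"$tags", n:{$sum:1}}}, {$sort:{n:-1}}, {$limit:2}])
unwound = unwind(match(images, camera="nikon"), "tags")
tag_counts = group_count(unwound, "tags")
top = sorted(tag_counts.items(), key=lambda kv: (-kv[1], kv[0]))[:2]   # $sort, $limit
```

The Python sort breaks ties by tag name to keep the result deterministic; the shell pipeline in the comment does not specify a tie order.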
Indexing
• Aim: Efficient query execution
• Avoid collection scan
• Similar to other database systems (B-tree data structure)
• Index types
• Default _id index (unique)
• Single field index
• Compound index: e.g. { userid: 1, score: -1 } → 1 asc, -1 desc
• Multikey index: index content of arrays (separate index entries for every array element)
• Geospatial index: index coordinate pairs
https://docs.mongodb.com/manual/indexes/
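Two of the index types can be pictured with ordinary Python containers. This is a conceptual sketch only: it uses a dict and a sorted list where MongoDB uses B-trees, and `build_multikey_index` is a hypothetical helper.

```python
from collections import defaultdict

def build_multikey_index(docs, field):
    """One index entry per array element, pointing back to the document _id."""
    index = defaultdict(list)
    for doc in docs:
        for element in doc[field]:
            index[element].append(doc["_id"])
    return index

docs = [
    {"_id": 1, "tags": ["tuna", "shark"]},
    {"_id": 3, "tags": ["maui", "tuna"]},
]
tag_index = build_multikey_index(docs, "tags")

# Compound index { userid: 1, score: -1 }: userid ascending, then score descending.
scores = [
    {"userid": "b", "score": 5},
    {"userid": "a", "score": 1},
    {"userid": "a", "score": 9},
]
compound_order = sorted(scores, key=lambda d: (d["userid"], -d["score"]))
```

A lookup such as `tag_index["tuna"]` finds both documents without scanning the collection, and the compound ordering shows why the field order and sort direction in the index definition matter.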
Replication
• Aim: redundancy, data availability
• Asynchronous master-slave replication
• Replica Sets
• Group of servers (mongod instances) with
multiple copies of the same data set
• Writes
• Primary receives all write operations
• Writes recorded to operation log (oplog), ‘write acknowledgement’
• Replication of oplog to “secondaries”
• Secondaries apply operations asynchronously
• ‘majority’ writeConcern: write is acknowledged by a majority of members (not only the primary)
• Reads
• Default “primary”: all reads directed to primary
• Read preference modes
• “primary preferred”, “secondary preferred”, “nearest” … → eventual consistency
https://docs.mongodb.com/manual/replication
Atomicity, Isolation, Durability
• Atomic single document writes
• A write can update multiple fields in a document → readers cannot see partially updated documents
• Non-atomic (!) multiple document writes
• Alternative: $isolated operator
• Isolate multi-document update operation (no interleaving operations)
• BUT: not “all-or-nothing” atomicity (no rollback after error during write)
• $isolated does not work for sharded clusters
• Durability
• In replica set: update written to a majority of voting nodes’ journal files
• readConcern
• ‘local’: concurrent readers may see the updated document
before changes are durable (read uncommitted)
• ‘majority’: client can read only durable writes
Automatic Failover
https://docs.mongodb.com/manual/core/replica-set-elections/
• Primary election
• Primary inaccessible for 10 seconds
• During election → no primary → read-only
• New primary: first secondary that
• holds an election
• receives a majority of the members’ votes
• has the most current optime (timestamp of last write)
• Network partition
• Minority partition: primary downgraded to secondary
• Rollback: revert writes on former primary when
it rejoins its replica set after failover
• Majority partition: if necessary, election of new primary
Sharding
• Aim: scalability
• Horizontal partitioning into shards
• Shard
• Contains subset of the data
• Every shard can be a replica set
• Shard key
• Immutable field or fields that exist in every document of the collection
• Collection must be indexed on the shard key
https://docs.mongodb.com/manual/sharding/
Sharding (2)
• mongos
• Query router
• Interface between client applications and sharded cluster
• Config servers
• Store metadata and configuration settings for the cluster (data location)
• Can be deployed as replica set
• mongod
• Primary daemon process for MongoDB
• Request handling, manages data access, …
[Figure: two web servers, each running an app and a mongos router; three config servers (mongod configsrv); three shards Shard01 to Shard03, each a replica set (rs01 to rs03) of three mongod instances]
Source: Tilmann Beittner, Jeremias Brödel: Erste Gehversuche mit MongoDB - Schritt für Schritt. iXDeveloper, Big Data, 02/2015.
Sharding - Chunks
• Sharded data is partitioned into chunks
• Lower and upper range based on shard key
• Chunk split:
• Chunk size > max chunk size (default 64MB)
• #documents > max # documents per chunk
• Migration of chunks across shards (even balance)
https://docs.mongodb.com/manual/core/sharding-data-partitioning
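Chunk splitting can be pictured with a toy model. The assumptions here are deliberate simplifications: a chunk is a sorted list of shard key values, the limit counts documents instead of bytes, an oversized chunk is split at its median, and `MAX_DOCS_PER_CHUNK` and `insert_key` are hypothetical names. Migration between shards is not modelled.

```python
import bisect

MAX_DOCS_PER_CHUNK = 4   # hypothetical threshold standing in for the real limits

def insert_key(chunks, key):
    """Insert key into the chunk covering its range; split the chunk if it grows too large."""
    lower_bounds = [chunk[0] for chunk in chunks[1:]]
    idx = bisect.bisect_right(lower_bounds, key)
    chunk = chunks[idx]
    bisect.insort(chunk, key)
    if len(chunk) > MAX_DOCS_PER_CHUNK:
        mid = len(chunk) // 2
        chunks[idx:idx + 1] = [chunk[:mid], chunk[mid:]]

chunks = [[]]   # start with one empty chunk covering the whole key range
for key in [10, 20, 30, 40, 50, 25, 5]:
    insert_key(chunks, key)
```

After the fifth insert the single chunk exceeds the limit and is split in two; later keys are routed to whichever chunk covers their range.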
Hashed and Ranged Sharding
• Hash-based
• Hash of shard key field’s value
• Range of hashed shard key values assigned to each chunk
+ Even data distribution
- “Close range” of shard key values unlikely in same chunk
• Range-based
• Specific key range → same chunk
+ Efficient range queries
• Routing only to shards that contain required data
- Possibly uneven data distribution
• Careful selection of shard key!
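The trade-off can be made concrete with ten neighbouring shard key values. This is a simplified sketch: the range boundaries are made up, MD5 is an arbitrary hash choice, and the hashed variant maps the hash to a shard by modulo instead of assigning ranges of hash values to chunks.

```python
import hashlib

def ranged_shard(key: int, boundaries) -> int:
    """Shard index = number of range boundaries that are <= key."""
    return sum(1 for boundary in boundaries if key >= boundary)

def hashed_shard(key: int, n_shards: int) -> int:
    """Hash the key first, then map the hash value to a shard."""
    digest = hashlib.md5(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % n_shards

keys = range(1000, 1010)                              # a "close range" of shard key values
by_range = {ranged_shard(k, [500, 2000]) for k in keys}   # shards hit with range-based placement
by_hash = {hashed_shard(k, 3) for k in keys}              # shards hit with hash-based placement
```

With range-based placement all ten keys land on one shard, so a range query touches a single shard; with hashing the same keys are typically scattered, which evens out load but forces range queries to contact several shards.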
CP system? - take care …
• “Jepsen: MongoDB stale reads”
• “… we’ll see that Mongo’s consistency model is broken by design: not only
can “strictly consistent” reads see stale versions of documents, but they
can also return garbage data from writes that never should have
occurred. …”
• Source: https://aphyr.com/posts/322-jepsen-mongodb-stale-reads
Summary
• CAP, ACID, BASE, Consistency Models
• Key-value store: Dynamo
• Consistent hashing, vector clocks, sloppy quorum, …
• Document data store: MongoDB
• Query language, indexing, replication, sharding, …
• The “best” database? Application dependent!
• Relational vs. non-relational
(document, graph… ) data model?
• ACID transactions?
• Large data sets? High query load?
Need for distributed data storage?
• Availability, consistency requirements?
• …
References
• Eric A. Brewer: Towards robust distributed systems. Proceedings of the Annual ACM Symposium on Principles of Distributed Computing (PODC), 2000
• Eric A. Brewer: CAP Twelve Years Later: How the "Rules" Have Changed, Computer,
45(2), 2012. https://www.infoq.com/articles/cap-twelve-years-later-how-the-rules-have-changed
• Martin Kleppmann: Please stop calling databases CP or AP, 2015.
https://martin.kleppmann.com/2015/05/11/please-stop-calling-databases-cp-or-ap.html
• Giuseppe DeCandia et al.: Dynamo: Amazon’s Highly Available Key-value Store. ACM
SIGOPS Operating Systems Review, 41(6), 2007.
• MongoDB documentation: https://docs.mongodb.com/
• Tilmann Beittner, Jeremias Brödel: Erste Gehversuche mit MongoDB - Schritt für
Schritt. iXDeveloper, Big Data, 02/2015.
• Kyle Kingsbury: Jepsen: MongoDB stale reads, 2015.
https://aphyr.com/posts/322-jepsen-mongodb-stale-reads
• Lecture “NoSQL-Datenbanken”, Database Group, Universität Leipzig
• Contributors: Anika Groß, Martin Junghanns, Lars Kolb, Andreas Thor
Understanding Post Production changes (PPC) in Clinical Data Management (CDM)...Understanding Post Production changes (PPC) in Clinical Data Management (CDM)...
Understanding Post Production changes (PPC) in Clinical Data Management (CDM)...
 
Testing and Development Challenges for Complex Cyber-Physical Systems: Insigh...
Testing and Development Challenges for Complex Cyber-Physical Systems: Insigh...Testing and Development Challenges for Complex Cyber-Physical Systems: Insigh...
Testing and Development Challenges for Complex Cyber-Physical Systems: Insigh...
 
General Elections Final Press Noteas per M
General Elections Final Press Noteas per MGeneral Elections Final Press Noteas per M
General Elections Final Press Noteas per M
 
cse-csp batch4 review-1.1.pptx cyber security
cse-csp batch4 review-1.1.pptx cyber securitycse-csp batch4 review-1.1.pptx cyber security
cse-csp batch4 review-1.1.pptx cyber security
 
Application of GIS in Landslide Disaster Response.pptx
Application of GIS in Landslide Disaster Response.pptxApplication of GIS in Landslide Disaster Response.pptx
Application of GIS in Landslide Disaster Response.pptx
 
Don't Miss Out: Strategies for Making the Most of the Ethena DigitalOpportunity
Don't Miss Out: Strategies for Making the Most of the Ethena DigitalOpportunityDon't Miss Out: Strategies for Making the Most of the Ethena DigitalOpportunity
Don't Miss Out: Strategies for Making the Most of the Ethena DigitalOpportunity
 
GESCO SE Press and Analyst Conference on Financial Results 2024
GESCO SE Press and Analyst Conference on Financial Results 2024GESCO SE Press and Analyst Conference on Financial Results 2024
GESCO SE Press and Analyst Conference on Financial Results 2024
 
05.02 MMC - Assignment 4 - Image Attribution Lovepreet.pptx
05.02 MMC - Assignment 4 - Image Attribution Lovepreet.pptx05.02 MMC - Assignment 4 - Image Attribution Lovepreet.pptx
05.02 MMC - Assignment 4 - Image Attribution Lovepreet.pptx
 

NoSQL - Data Stores for Big Data

  • 1. NoSQL – Data Stores for Big Data
      • 2nd International ScaDS Summer School on Big Data
      • Anika Groß, Database Group, Universität Leipzig
      • Leipzig, 12.07.2016
  • 2. “NoSQL for Big Data”
      • Massive data growth
          • Big data, cloud, real-time applications, …
      • Requirements
          • High read and write scalability
          • Management of unstructured and semi-structured data
          • Continuous availability
          • Decentralized applications
          • …
      • Modern NoSQL data stores were pioneered by leading internet companies as in-house solutions
      • Figure: https://www.dezyre.com/article/nosql-vssql-4-reasons-why-nosql-is-better-for-big-data-applications/86
  • 3. “Not only SQL”
      • No standardized definition!
      • Non-relational approaches
      • Different applications require different types of databases
      • Database system with one or more of these criteria:
          • No relational data model
          • Schema free, only weak restrictions
          • No joins, no normalization
          • Distributed, horizontally scalable system
          • Use of commodity hardware
          • “No SQL”: simple API instead of SQL
          • “No transactions”: BASE consistency model instead of ACID
  • 4. NoSQL Data Stores
      • Key-Value Stores
          • Collection of key-value pairs
          • Data access via key: get(key), put(key, value)
      • Document Stores
          • Semi-structured data in documents (e.g. JSON)
          • Access via key or simple API/query language
      • Wide Column Stores
          • Tables with records with (many) dynamic columns
          • Access via key, SQL-like query language, …
      • Graph Databases
          • Data as nodes and edges with properties
          • Database queries incl. graph algorithms
      • Example systems marked with * in the original figure are multi-model
  • 5. DB Engines Ranking
      • Source: http://db-engines.com/en/ranking
  • 6. Agenda
      • CAP, ACID, BASE, Consistency Models
      • Key-value store: Dynamo
          • Consistent hashing
          • Object versioning
          • Quorum-like consistency model
          • …
      • Document store: MongoDB
          • Query language
          • Indexing
          • Replication
          • Sharding
  • 7. Distributed Data Management
      • A distributed system needs to deal with
          • Network failures
          • Network latency, limited throughput
          • Change of network topology
          • …
      • Communication between nodes
          • Among other things: synchronization and replication
          • Must be robust against node failure, loss of messages, …
      • Trade-off: performance vs. data consistency
          • Wait for synchronization between nodes
          • Avoid conflicts / inconsistencies
  • 8. CAP Theorem
      • Consistency
          • All nodes see the same data at the same time
      • Availability
          • Every read or write request receives a response (succeeded or failed)
      • Partition tolerance
          • System continues to operate despite arbitrary partitioning due to network failures (loss of messages)
      • Theorem: A distributed computer system can provide at most two of these three properties.
      • Source: Brewer: Towards robust distributed systems. Proceedings of the Annual ACM Symposium on Principles of Distributed Computing, 2000
  • 9. CAP Theorem (2)
      • CP: consistent but not available under network partitions (e.g. MongoDB, BigTable, HBase)
          • Lock transactions, avoid conflicts, …
      • AP: available but not consistent under network partitions (e.g. Dynamo/S3, Cassandra)
          • Writes always possible even if no communication/synchronization is possible
          • Inconsistent data, conflict resolution necessary
      • Controversy!
          • “2 of 3” was misleading, no CA: CAP Twelve Years Later: How the "Rules" Have Changed
          • Classification of systems is difficult: Please stop calling databases CP or AP
      • Source: Misconceptions about the CAP Theorem
  • 10. ACID
      • RDBMS ensure ACID properties for transactions:
      • Atomicity
          • “All or nothing” property
          • If part of the transaction fails, the entire transaction fails, and the database state is left unchanged
      • Consistency
          • A successful transaction preserves the database consistency
          • Guarantees the defined integrity constraints
      • Isolation
          • Concurrent execution of transactions results in a system state as if transactions were executed serially
          • Transactions cannot rely on intermediate or unfinished state
      • Durability
          • Successfully committed transactions will remain, even in the event of system failure, power loss, or other breakdowns (persistency)
  • 11. BASE
      • BA - Basically Available
          • Partial network failure → response to any request (response could be ‘failure’)
          • Replication factor = 3, 1 node fails: query response still possible
      • S - Soft State
          • The system could change over time
          • Even during times without input → changes due to “eventual consistency” → state of the system is always “soft”
      • E - Eventually Consistent
          • Consistency is not checked for every transaction before it moves on to the next one → replicas can be inconsistent
          • The system will eventually become consistent (once it stops receiving input)
          • “Sooner or later” the data will be propagated to everywhere it should be
  • 12. Consistency Models
      • Strong consistency: once update(x, v2) has completed, every read r(x) returns v2
      • Eventual consistency: eventually (after the inconsistency window closes) all accesses will return the last updated value
      • [Figure: timelines of reads r(x) before and after update(x, v2) under strong and eventual consistency, with the inconsistency window marked]
  • 13. Consistency Models
      • Read-your-writes consistency: client updates item → will always access the updated value and never see an older value
      • Monotonic read consistency: client reads updated value → will never read any previous value
      • [Figure: timelines of reads r(x) before and after update(x, v2) for both models]
  • 15. Key-Value Stores
      • Data structure: collection of key-value pairs = associative array / dictionary / map
      • Key
          • Unique within a namespace
          • Namespace = collection of keys, ‘bucket’ (e.g. Sales, Inventory, Product descriptions)
      • Values
          • Uninterpreted string of bytes of arbitrary length (BLOB)
          • No integrity constraints (check on application side)
      • Different types of key-value stores
          • Different consistency models, ordered/unordered keys, RAM vs. disk/SSD
  • 16. Amazon Dynamo
      • Scalable distributed data store built for Amazon’s platform
      • Dynamo principles (or part of them) implemented in several NoSQL solutions (e.g. Project Voldemort)
          • “Not only” Dynamo: e.g. Cassandra = Dynamo + BigTable
      • Motivation
          • Scale to extreme peak loads efficiently without any downtime
          • e.g. busy holiday shopping season
      • Source: DeCandia et al.: Dynamo: Amazon’s Highly Available Key-value Store. ACM SIGOPS Operating Systems Review, 41(6), 2007.
  • 17. Amazon Dynamo
      • Aims: high availability and performance
          • Address tradeoffs between availability, consistency, cost-effectiveness and performance
      • Eventually consistent storage system
          • “Always writeable” data store
          • Favor availability over consistency (if necessary)
      • Performance SLA (Service Level Agreement)
          • “response within 300ms for 99.9% of requests for peak client load of 500 requests per second”
      • Decentralized system: P2P-like distribution
          • No master nodes
          • All nodes have the same functionality
  • 18. Techniques
      • Consistent hashing
      • Object versioning / vector clocks
      • Quorum-like consistency model
      • Decentralized replica synchronization protocol
      • Gossip-based membership protocol and failure detection
  • 19. Partitioning and Replication of Keys
      • Logical ring of nodes
          • Output range of a hash function → fixed circular space
          • Node position is a random value in the range of the hash function
      • Assignment of data to nodes
          • Determine hash value of keys → position on ring
          • Assign to N successor nodes (clockwise)
          • Example: hash value between A and B, N=3 → stored on B, C, D
      • Consistent hashing
          • Minimizes the number of re-assignments when nodes are added or removed
          • Needs a sophisticated hash function for good load balancing and data locality
      • Preference list: list of nodes that are responsible for storing a particular key (every node knows the preference list)
      • [Figure: ring of nodes A to G with Hash(key) falling between A and B]
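The ring assignment described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: MD5 stands in for the hash function, each node gets a single ring position, and the names `Ring`, `ring_position` and `preference_list` are hypothetical, not Dynamo's API.

```python
import bisect
import hashlib


def ring_position(name: str) -> int:
    """Map a node name or key to a ring position (MD5 is an assumed example hash)."""
    return int(hashlib.md5(name.encode("utf-8")).hexdigest(), 16)


class Ring:
    def __init__(self, nodes):
        # Sorted (position, node) pairs form the logical ring.
        self._ring = sorted((ring_position(n), n) for n in nodes)

    def preference_list(self, key: str, n: int):
        """Return the n distinct successor nodes (clockwise) of hash(key)."""
        positions = [pos for pos, _ in self._ring]
        start = bisect.bisect_right(positions, ring_position(key))
        count = min(n, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]


nodes = ["A", "B", "C", "D", "E", "F", "G"]
ring = Ring(nodes)
before = {f"key{i}": ring.preference_list(f"key{i}", 1)[0] for i in range(1000)}

# Adding node H only re-assigns keys that now fall into H's range.
bigger = Ring(nodes + ["H"])
after = {k: bigger.preference_list(k, 1)[0] for k in before}
moved = [k for k in before if before[k] != after[k]]
```

Every key in `moved` ends up on the new node `H`; all other keys keep their node, which is the "minimal re-assignment" property named above.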
  • 20. Data Access
      • Key-value store interface
          • Access via primary key; no complex queries
      • Every node in the ring can route each query
          • Routing to (usually the first) node in the preference list of the specific key
      • Put (Key, Context, Object)
          • Coordinator creates vector clock (versioning) based on context
          • Local write of object incl. vector clock
          • Asynchronous replication
              • Write request to the N-1 remaining nodes in the preference list
              • Write is successful if (at least) W-1 nodes respond
              • Remaining replicas are updated asynchronously; W<N → consistency problems
      • Get (Key)
          • Read request to the N nodes in the preference list
          • Response from R nodes → possibly different versions of the same object: list of (Object, Context) pairs
  • 21. Replication
      • Read/write quorum
          • R/W = minimal number of the N replica nodes that must participate in a successful read/write operation
          • Flexible adaptation of (N, R, W) according to application requirements w.r.t. performance, availability, durability
      • Ensure read of current version: R + W > N
      • No loss of information
      • Conflict resolution
          • Data store side: e.g. “last write wins”
          • Application side: e.g. merge conflicting shopping cart versions
  • 22. Quorum variants
      • Optimizing reads: R=1, W=N
          • Consistency due to “write to all” = wait for all write acknowledgements
      • Optimizing writes: R=N, W=1
          • Consistency due to “read from all” = last version will be included
      • Consistent configurations (R+W>N), examples from the figure:
          • R=N=3, W=1
          • R=1, W=N=3
          • R=3, W=3, N=5
          • R=4, W=2, N=5
      • Eventual consistency: R+W≤N
          • Read might not cover current write
          • Example: R=2, W=2, N=4
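The rule R + W > N can be checked by brute force for the configurations listed on this slide. The sketch below is an illustration only; the function name `quorums_always_overlap` is hypothetical. It enumerates every possible read set and write set and tests whether they always share at least one replica.

```python
from itertools import combinations


def quorums_always_overlap(n: int, r: int, w: int) -> bool:
    """Does every read set of size r share a replica with every
    write set of size w among n replicas?"""
    replicas = range(n)
    return all(
        bool(set(read_set) & set(write_set))
        for read_set in combinations(replicas, r)
        for write_set in combinations(replicas, w)
    )


# (N, R, W) configurations taken from the slide.
configs = [(3, 3, 1), (3, 1, 3), (5, 3, 3), (5, 4, 2), (4, 2, 2)]
results = {cfg: quorums_always_overlap(*cfg) for cfg in configs}
```

Only the last configuration (N=4, R=2, W=2) allows a read set that misses the latest write, which matches the eventual-consistency case R+W≤N.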
  • 23. Versioning
      • Aim: capture causality between different versions of an object
          • Which object versions are known?
          • Parallel branches or causal ordering?
      • Vector clock: list of (node, counter) pairs
          • Version counter per replica node, e.g. D([Sx, 1]) for object D, node Sx, version 1
      • Example: evolving object versions
      • [Figure: object versions written via nodes Sx, Sy, Sz; m1, m2 = messages]
  • 24. Versioning (2)
      • Does one object version descend from another one?
          • All counters of the 1st vector clock ≤ the corresponding counters of the 2nd vector clock → 1st version is an ancestor of the 2nd (forget 1st version)
          • Otherwise: conflicting versions
      • Client identifies conflict during read
          • Gets all known versions
          • Subsequent update consolidates versions
      • Figure: DeCandia et al.: Dynamo: Amazon’s Highly Available Key-value Store. ACM SIGOPS Operating Systems Review, 41(6), 2007.
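The ancestor test on this slide can be written as a short comparison of two vector clocks. The sketch below represents a clock as a Python dict from node name to counter, which is an assumption made for illustration; the version values D1 to D5 follow the example in the Dynamo paper.

```python
def descends(a: dict, b: dict) -> bool:
    """True if clock a equals or descends from clock b, i.e. every
    counter in b is <= the matching counter in a (missing counts as 0)."""
    return all(a.get(node, 0) >= counter for node, counter in b.items())


def compare(a: dict, b: dict) -> str:
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "a supersedes b"
    if descends(b, a):
        return "b supersedes a"
    return "conflict"


d1 = {"Sx": 1}
d2 = {"Sx": 2}                       # written again via Sx
d3 = {"Sx": 2, "Sy": 1}              # updated via Sy
d4 = {"Sx": 2, "Sz": 1}              # updated in parallel via Sz
d5 = {"Sx": 3, "Sy": 1, "Sz": 1}     # client reconciles D3 and D4 via Sx
```

Here `compare(d3, d4)` reports a conflict (parallel branches), while `d5` supersedes both, so the older versions can be forgotten.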
  • 25. Handling Temporary Failures
      • “Sloppy” quorum (N, R, W)
          • Perform all operations on the first N healthy nodes from the preference list (not necessarily the first N nodes in the ring)
          • Still “writable” in case of node failure
      • Hinted handoff
          • Unreachable node → write request is sent to another node (“hinted replica”)
          • Availability!
          • Node recovers → sync of hinted replica and original node
      • Example
          • B is not available
          • Replica is written to E (handoff) with a hint to the intended recipient B
          • B recovers
          • Hinted replica is transferred E → B
          • E can delete the hinted replica
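The B/E example can be traced with a small sketch of replica selection under a sloppy quorum. Everything here is a simplification for illustration: the function `choose_replicas`, the list-based ring and the (node, hint) tuples are hypothetical and not Dynamo's actual data structures.

```python
def choose_replicas(ring, preference, healthy, n):
    """Pick the first n healthy nodes, walking the ring clockwise from the
    first preferred node. Returns (node, hint) pairs; hint names the
    unreachable node a stand-in replica is held for (None otherwise)."""
    start = ring.index(preference[0])
    walk = ring[start:] + ring[:start]
    missing = [node for node in preference[:n] if node not in healthy]
    chosen = []
    for node in walk:
        if len(chosen) == n:
            break
        if node not in healthy:
            continue
        if node in preference[:n]:
            chosen.append((node, None))
        elif missing:
            # Stand-in node keeps a hint naming the intended recipient.
            chosen.append((node, missing.pop(0)))
    return chosen


ring_nodes = ["A", "B", "C", "D", "E", "F", "G"]
all_up = set(ring_nodes)
b_down = all_up - {"B"}

normal = choose_replicas(ring_nodes, ["B", "C", "D"], all_up, 3)
degraded = choose_replicas(ring_nodes, ["B", "C", "D"], b_down, 3)
```

With B unavailable, the write still reaches three nodes: C and D as usual, and E holding a replica hinted for B until B recovers.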
  • 26. Replica Synchronization
      • Hash tree (Merkle tree) for a key range
          • Leaves are hashes of the values for individual keys
          • Parent nodes are hash values of child node hash values
      • Advantages
          • Efficient check: equal root hashes → replicas are in sync
          • Efficient identification of “out of sync” keys: subtree traversal to find differences
      • Disadvantages
          • Recalculation of hash trees in case of repartitioning (added or removed node)
      • [Figure: Merkle tree over keys K1 to K4 with values V1 to V4; leaves H(k1) … H(k4), inner nodes H(H(k1), H(k2)) and H(H(k3), H(k4)), root H(H(H(k1), H(k2)), H(H(k3), H(k4)))]
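A minimal Merkle tree comparison might look as follows. This is a sketch under assumptions: SHA-256 is chosen as the hash, the number of leaves is a power of two, and both replicas hold the same keys in the same order; `build_levels` and `out_of_sync` are hypothetical helper names.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(values):
    """Return tree levels from leaves to root; assumes len(values) is a power of two."""
    levels = [[h(v.encode("utf-8")) for v in values]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        # Each parent is the hash of its two children's hashes.
        levels.append([h(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels


def out_of_sync(a, b, level=None, index=0):
    """Descend only into subtrees whose hashes differ; return differing leaf indexes."""
    if level is None:
        level = len(a) - 1  # start at the root
    if a[level][index] == b[level][index]:
        return []
    if level == 0:
        return [index]
    return (out_of_sync(a, b, level - 1, 2 * index)
            + out_of_sync(a, b, level - 1, 2 * index + 1))


replica_1 = build_levels(["v1", "v2", "v3", "v4"])
replica_2 = build_levels(["v1", "v2", "stale", "v4"])
differing = out_of_sync(replica_1, replica_2)
```

Equal root hashes end the comparison immediately; otherwise only the differing subtree is traversed, here down to the third leaf.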
  • 27. Overview: Amazon Dynamo Techniques (problem → technique → advantage)
      • Partitioning → Consistent hashing → Incremental scalability
      • High availability for writes → Vector clocks with reconciliation during reads → Version size is decoupled from update rates
      • Handling temporary failures → Sloppy quorum and hinted handoff → Provides high availability and durability guarantee when some of the replicas are not available
      • Recovering from permanent failures → Anti-entropy using Merkle trees → Efficient synchronization of divergent replicas in the background
      • Membership and failure detection → Gossip-based membership protocol and failure detection → Preserves symmetry and avoids having a centralized registry for storing membership and node liveness information
      • Source: DeCandia et al.: Dynamo: Amazon’s Highly Available Key-value Store. ACM SIGOPS Operating Systems Review, 41(6), 2007.
  • 29. Document Stores
      • Collection of documents
          • Semi-structured data (e.g. JSON format)
          • Flexible, extensible schema
          • Embedded (denormalized) data model
      • Data access via key, (simple) query language, map/reduce queries
      • Use cases: web applications, mobile applications, e-commerce solutions, …
      • Structure: a database contains collections; each collection contains documents ({doc1}, {doc2}, …)
  • 30. Example – Collection “images”
      • Document 1: { _id: 1, name: "fish.jpg", time: "17:46", user: "bob", camera: "nikon", info: { width: 100, height: 200, size: 12345 }, tags: ["tuna", "shark"] }
      • Document 2: { _id: 2, name: "trees.jpg", time: "17:57", user: "john", camera: "canon", info: { width: 30, height: 250, size: 32091 }, tags: ["oak"] }
      • …
      • Notation: “field: value” pairs; info is an embedded document; tags is an array of strings
      • The same data as a table:
          id | name       | time  | user  | camera | info.width | info.height | info.size | tags
          1  | fish.jpg   | 17:46 | bob   | nikon  | 100        | 200         | 12345     | [tuna, shark]
          2  | trees.jpg  | 17:57 | john  | canon  | 30         | 250         | 32091     | [oak]
          3  | hawaii.png | 17:59 | john  | nikon  | 128        | 64          | 92834     | [maui, tuna]
          4  | island.gif | 17:43 | zztop | nikon  | 640        | 480         | 50398     | [maui]
  • 31. MongoDB
      • Open source document database
          • Current release 3.2 (at the time of the talk, July 2016)
      • Embedded data model
          • JSON-like documents (BSON = Binary JSON)
      • Features
          • Query language
          • Indexing
          • Replication
          • Sharding
  • 32. Query Language
      • Selection, projection: db.images.find({camera:"nikon"}, {name:1, camera:1, _id:0})
      • Querying multi-valued attributes
          • Pictures with tag "shark": db.images.find({tags:"shark"})
          • Pictures with tags "a", "b" and "c": db.images.find({tags:{$all:["a","b","c"]}})
      • Querying nested objects (dot notation; the field path must be quoted)
          • Pictures with width < 100px: db.images.find({"info.width":{$lt:100}})
      • Example document: { _id: 1, name: "fish.jpg", …, camera: "nikon", info: { width: 100, height: 200, size: 12345 }, tags: ["tuna", "shark"] }
  • 33. Aggregation Framework
      • Pipeline of operators
          • $match: filter documents
          • $project: inclusion, suppression, new field, value reset for attributes
          • $group: grouping and aggregation
          • $unwind: unnest arrays (one document per array element)
          • $sort, $limit, $skip, …
      • Source: http://docs.mongodb.org/manual/reference/sql-aggregation-comparison/
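What the pipeline stages do to the data can be emulated in plain Python on the “images” example. The sketch below only mimics the semantics of $match, $unwind, $group and $sort with list operations; it does not use MongoDB or any driver, and the chosen pipeline (count tags of Nikon pictures) is an assumed example.

```python
from collections import Counter

images = [
    {"_id": 1, "camera": "nikon", "tags": ["tuna", "shark"]},
    {"_id": 2, "camera": "canon", "tags": ["oak"]},
    {"_id": 3, "camera": "nikon", "tags": ["maui", "tuna"]},
    {"_id": 4, "camera": "nikon", "tags": ["maui"]},
]

# $match: keep only documents taken with a Nikon camera.
matched = [doc for doc in images if doc["camera"] == "nikon"]

# $unwind: one output document per element of the "tags" array.
unwound = [{**doc, "tags": tag} for doc in matched for tag in doc["tags"]]

# $group: count documents per tag.
counts = Counter(doc["tags"] for doc in unwound)

# $sort: highest count first, ties broken alphabetically by tag.
result = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```

Each stage consumes the output of the previous one, which is the pipeline idea: three matching documents become five unwound documents and finally three groups.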
  • 34. Indexing
      • Aim: efficient query execution
          • Avoid collection scan
          • Similar to other database systems (B-tree data structure)
      • Index types
          • Default _id index (unique)
          • Single field index
          • Compound index: e.g. { userid: 1, score: -1 } → 1 = ascending, -1 = descending
          • Multikey index: indexes the content of arrays (separate index entries for every array element)
          • Geospatial index: indexes coordinate pairs
      • Source: https://docs.mongodb.com/manual/indexes/
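The idea behind a multikey index, one entry per array element, can be illustrated with a dictionary. This is a conceptual sketch only: a Python dict replaces the B-tree that the slide mentions, and `build_multikey_index` is a hypothetical helper.

```python
from collections import defaultdict

images = [
    {"_id": 1, "name": "fish.jpg", "tags": ["tuna", "shark"]},
    {"_id": 2, "name": "trees.jpg", "tags": ["oak"]},
    {"_id": 3, "name": "hawaii.png", "tags": ["maui", "tuna"]},
    {"_id": 4, "name": "island.gif", "tags": ["maui"]},
]


def build_multikey_index(docs, field):
    """One index entry per array element, each pointing back to the document _id."""
    index = defaultdict(list)
    for doc in docs:
        for element in doc.get(field, []):
            index[element].append(doc["_id"])
    return index


tags_index = build_multikey_index(images, "tags")
tuna_ids = tags_index["tuna"]  # lookup without scanning the whole collection
```

A query for a single tag then becomes an index lookup instead of a scan over every document's array.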
  • 35. Replication
      • Aim: redundancy, data availability
          • Asynchronous master-slave replication
      • Replica sets
          • Group of servers (mongod instances) with multiple copies of the same data set
      • Writes
          • Primary receives all write operations
          • Writes are recorded in the operation log (oplog), followed by a ‘write acknowledgement’
          • Replication of the oplog to the “secondaries”
          • Secondaries apply operations asynchronously
          • ‘majority’ writeConcern: write is acknowledged by a majority of members (not only the primary)
      • Reads
          • Default “primary”: all reads are directed to the primary
          • Read preference modes: “primary preferred”, “secondary preferred”, “nearest”, … → eventual consistency
      • Source: https://docs.mongodb.com/manual/replication
  • 36. Atomicity, Isolation, Durability
      • Atomic single-document writes
          • A write can update multiple fields in a document → readers cannot see partially updated documents
      • Non-atomic (!) multiple-document writes
          • Alternative: $isolated operator
          • Isolates a multi-document update operation (no interleaving operations)
          • BUT: not “all-or-nothing” atomicity (no rollback after error during write)
          • $isolated does not work for sharded clusters
      • Durability
          • In a replica set: update written to a majority of voting nodes’ journal files
      • readConcern
          • ‘local’: concurrent readers may see the updated document before changes are durable (read uncommitted)
          • ‘majority’: client can read only durable writes
  • 37. Automatic Failover
      • Primary election
          • Triggered when the primary is inaccessible for 10 seconds
          • During election → no primary → read-only
          • New primary: the first secondary that
              • holds an election,
              • receives a majority of the members’ votes, and
              • has the most current optime (timestamp of last write)
      • Network partition
          • Minority partition: primary is downgraded to secondary
          • Rollback: revert writes on the former primary when it rejoins its replica set after failover
          • Majority partition: if necessary, election of a new primary
      • Source: https://docs.mongodb.com/manual/core/replica-set-elections/
  • 38. Sharding
      • Aim: scalability
          • Horizontal partitioning into shards
      • Shard
          • Contains a subset of the data
          • Every shard can be a replica set
      • Shard key
          • Immutable field or fields that exist in every document of the collection
          • Collection must be indexed on the shard key
      • Source: https://docs.mongodb.com/manual/sharding/
  • 39. Sharding (2)
      • mongos
          • Query router
          • Interface between client applications and the sharded cluster
      • Config servers
          • Store metadata and configuration settings for the cluster (data location)
          • Can be deployed as a replica set
      • mongod
          • Primary daemon process for MongoDB
          • Request handling, manages data access, …
      • [Figure: two web/app servers, each with a mongos router; three config servers (mongod configsrv); three shards Shard01, Shard02, Shard03, each a replica set (rs01, rs02, rs03) of three mongod instances]
      • Source: Tilmann Beittner, Jeremias Brödel: Erste Gehversuche mit MongoDB - Schritt für Schritt. iXDeveloper, Big Data, 02/2015.
  • 40. Sharding - Chunks
      • Sharded data is partitioned into chunks
          • Lower and upper range based on shard key
      • A chunk is split when
          • chunk size > max chunk size (default 64MB), or
          • #documents > max # documents per chunk
      • Migration of chunks across shards (even balance)
      • Source: https://docs.mongodb.com/manual/core/sharding-data-partitioning
  • 41. Hashed and Ranged Sharding
      • Hash-based
          • Hash of the shard key field’s value
          • Range of hashed shard key values assigned to each chunk
          • + Even data distribution
          • - A “close range” of shard key values is unlikely to be in the same chunk
      • Range-based
          • Specific key range → same chunk
          • + Efficient range queries: routing only to shards that contain the required data
          • - Possibly uneven data distribution
          • Careful selection of shard key!
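The difference between the two strategies can be made concrete with a toy chunk lookup. All numbers and names below are assumptions for illustration: four chunks, integer user ids as shard key, and the first 4 bytes of an MD5 digest as a stand-in for the real hash function.

```python
import bisect
import hashlib

# Hypothetical chunk boundaries: chunk i covers values below BOUNDS[i];
# the last chunk covers everything from BOUNDS[-1] upward.
RANGE_BOUNDS = [250, 500, 750]             # 4 chunks over user ids 0..999
HASH_BOUNDS = [2**30, 2**31, 3 * 2**30]    # 4 chunks over a 32-bit hash space


def ranged_chunk(user_id: int) -> int:
    return bisect.bisect_right(RANGE_BOUNDS, user_id)


def hashed_chunk(user_id: int) -> int:
    digest = hashlib.md5(str(user_id).encode("utf-8")).digest()
    hashed = int.from_bytes(digest[:4], "big")  # assumed stand-in hash
    return bisect.bisect_right(HASH_BOUNDS, hashed)


query = range(100, 120)  # a "close range" of shard key values
ranged_targets = {ranged_chunk(u) for u in query}
hashed_targets = {hashed_chunk(u) for u in query}
```

With range-based sharding the whole query range lands in a single chunk, so only one shard has to be contacted; with hashing the same keys are typically scattered over several chunks.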
  • 42. CP system? - take care …
      • “Jepsen: MongoDB stale reads”
      • “… we’ll see that Mongo’s consistency model is broken by design: not only can “strictly consistent” reads see stale versions of documents, but they can also return garbage data from writes that never should have occurred. …”
      • Source: https://aphyr.com/posts/322-jepsen-mongodb-stale-reads
  • 43. Summary
      • CAP, ACID, BASE, Consistency Models
      • Key-value store: Dynamo
          • Consistent hashing, vector clocks, sloppy quorum, …
      • Document data store: MongoDB
          • Query language, indexing, replication, sharding, …
      • The “best” database? Application dependent!
          • Relational vs. non-relational (document, graph, …) data model?
          • ACID transactions?
          • Large data sets? High query load? Need for distributed data storage?
          • Availability, consistency requirements?
          • …
  • 44. References
      • Eric A. Brewer: Towards robust distributed systems. Proceedings of the Annual ACM Symposium on Principles of Distributed Computing (PODC), 2000.
      • Eric A. Brewer: CAP Twelve Years Later: How the "Rules" Have Changed. Computer, 45(2), 2012. https://www.infoq.com/articles/cap-twelve-years-later-how-the-rules-have-changed
      • Martin Kleppmann: Please stop calling databases CP or AP, 2015. https://martin.kleppmann.com/2015/05/11/please-stop-calling-databases-cp-or-ap.html
      • Giuseppe DeCandia et al.: Dynamo: Amazon’s Highly Available Key-value Store. ACM SIGOPS Operating Systems Review, 41(6), 2007.
      • MongoDB documentation: https://docs.mongodb.com/
      • Tilmann Beittner, Jeremias Brödel: Erste Gehversuche mit MongoDB - Schritt für Schritt. iXDeveloper, Big Data, 02/2015.
      • Kyle Kingsbury: Jepsen: MongoDB stale reads, 2015. https://aphyr.com/posts/322-jepsen-mongodb-stale-reads
      • Lecture “NoSQL-Datenbanken”, Database Group, Universität Leipzig
      • Contributors: Anika Groß, Martin Junghanns, Lars Kolb, Andreas Thor