NoSQL - Data Stores for Big Data

Talk at the "2nd International ScaDS Summer School on Big Data", Universität Leipzig, 07/2016:

Modern web-scale and big data applications require efficient solutions to process huge amounts of possibly unstructured data. Many NoSQL stores offer schema-free data storage, easy replication mechanisms and horizontal scalability. According to the CAP theorem, distributed systems can guarantee either consistency (C) or availability (A) while preserving partition tolerance (P) in case of network failures. The talk will give an introduction to different data models and technical concepts of NoSQL data stores, focusing on two prominent systems: the AP system and key-value store Dynamo, and the CP system and document store MongoDB.

  1. NoSQL – Data Stores for Big Data
     2nd International ScaDS Summer School on Big Data
     Anika Groß, Database Group, Universität Leipzig
     Leipzig, 12.07.2016
  2. “NoSQL for Big Data”
     • Massive data growth: big data, cloud, real-time applications, …
     • Requirements:
       • High read and write scalability
       • Management of unstructured and semi-structured data
       • Continuous availability
       • Decentralized applications
       • …
     • Modern NoSQL data stores pioneered by leading internet companies as in-house solutions
     Figure: https://www.dezyre.com/article/nosql-vssql-4-reasons-why-nosql-is-better-for-big-data-applications/86
  3. “Not only SQL”
     • No standardized definition!
     • Non-relational approaches: different applications require different types of databases
     • Database system with one or more of these criteria:
       • No relational data model
       • Schema free, only weak restrictions
       • No joins, no normalization
       • Distributed, horizontally scalable system
       • Use of commodity hardware
       • “No SQL”: simple API instead of SQL
       • “No transactions”: BASE consistency model instead of ACID
  4. NoSQL Data Stores
     • Key-value stores: collection of key-value pairs; data access via key: get(key), put(key, value)
     • Wide column stores: tables with records with (many) dynamic columns; access via key, SQL-like query language, …
     • Document stores: semi-structured data in documents (e.g. JSON); access via key or simple API/query language
     • Graph databases: data as nodes and edges with properties; database queries incl. graph algorithms
     (Figure: example systems per category; * marks multi-model systems)
  5. DB-Engines Ranking
     http://db-engines.com/en/ranking
  6. Agenda
     • CAP, ACID, BASE, consistency models
     • Key-value store: Dynamo
       • Consistent hashing
       • Object versioning
       • Quorum-like consistency model
       • …
     • Document store: MongoDB
       • Query language
       • Indexing
       • Replication
       • Sharding
  7. Distributed Data Management
     • A distributed system needs to deal with:
       • Network failures
       • Network latency, limited throughput
       • Changes of the network topology
       • …
     • Communication between nodes
       • Among others: synchronization and replication
       • Robust against node failures, loss of messages, …
     • Trade-off: performance vs. data consistency
       • Wait for synchronization between nodes
       • Avoid conflicts/inconsistencies
  8. CAP Theorem
     • Consistency: all nodes see the same data at the same time
     • Availability: every read or write request receives a response (succeeded or failed)
     • Partition tolerance: the system continues to operate despite arbitrary partitioning due to network failures (loss of messages)
     • Theorem: a distributed computer system can provide at most two of these three properties.
     Brewer: Towards Robust Distributed Systems. Proceedings of the Annual ACM Symposium on Principles of Distributed Computing, 2000
  9. CAP Theorem (2)
     • CP systems (e.g. MongoDB, BigTable/HBase): consistent but not available under network partitions
       • Lock transactions, avoid conflicts, …
     • AP systems (e.g. Dynamo/S3, Cassandra): available but not consistent under network partitions
       • Writes are always possible, even if no communication/synchronization is possible
       • Inconsistent data, conflict resolution necessary
     • Controversy!
       • “2 of 3” was misleading, there are no CA systems: CAP Twelve Years Later: How the "Rules" Have Changed
       • Classification of systems is difficult: Please stop calling databases CP or AP
     (Figure source: Misconceptions about the CAP Theorem)
  10. ACID
      • RDBMS ensure the ACID properties for transactions:
      • Atomicity
        • “All or nothing”: if part of the transaction fails, the entire transaction fails and the database state is left unchanged
      • Consistency
        • A successful transaction preserves database consistency and guarantees the defined integrity constraints
      • Isolation
        • Concurrent execution of transactions results in a system state as if the transactions were executed serially
        • Transactions cannot rely on intermediate or unfinished state
      • Durability
        • Successfully committed transactions persist, even in the event of system failure, power loss or other breakdowns (persistency)
  11. BASE
      • BA – Basically Available
        • Partial network failure → response to any request (the response could be ‘failure’)
        • Example: replication factor = 3, 1 node fails → a query response is still possible
      • S – Soft State
        • The system state can change over time, even during times without input, due to “eventual consistency” → the state of the system is always “soft”
      • E – Eventually Consistent
        • Consistency is not checked for every transaction before moving on to the next one → replicas can be inconsistent
        • The system will eventually become consistent (once it stops receiving input)
        • “Sooner or later” the data will be propagated to everywhere it should be
  12. Consistency Models
      • Strong consistency: after an update completes, every subsequent read returns the updated value
      • Eventual consistency: during the inconsistency window reads may still return the old value; eventually (after the inconsistency window closes) all accesses will return the last updated value
      (Figure: timelines contrasting strong and eventual consistency for update(x,v2) followed by reads r(x))
  13. Consistency Models (2)
      • Read-your-writes consistency: a client that updates an item will always access the updated value afterwards and never see an older value
      • Monotonic read consistency: once a client has read the updated value, it will never read any previous value
      (Figure: timelines illustrating both models for update(x,v2) followed by reads r(x))
  14. Agenda
      • CAP, ACID, BASE, consistency models
      • Key-value store: Dynamo
        • Consistent hashing
        • Object versioning
        • Quorum-like consistency model
        • …
      • Document store: MongoDB
        • Query language
        • Indexing
        • Replication
        • Sharding
  15. Key-Value Stores
      • Data structure: collection of key-value pairs = associative array / dictionary / map
      • Key
        • Unique within a namespace (namespace = collection of keys, ‘bucket’)
      • Values
        • Uninterpreted string of bytes of arbitrary length (BLOB)
        • No integrity constraints (checks happen on the application side)
      • Different types of key-value stores
        • Different consistency models, ordered/unordered keys, RAM vs. disk/SSD
      (Figure: separate namespaces such as Sales, Inventory and Product descriptions, each holding its own key-value pairs)
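
     The following minimal Python sketch only illustrates this interface (it is not any particular product's API); the bucket name, key and value are made up:

     from collections import defaultdict
     from typing import Optional

     class KeyValueStore:
         """Minimal in-memory sketch of the key-value interface:
         namespaces ('buckets') of opaque values, addressed only by key."""

         def __init__(self):
             self._buckets = defaultdict(dict)

         def put(self, bucket: str, key: str, value: bytes) -> None:
             # The value is an uninterpreted blob; no integrity checks happen here.
             self._buckets[bucket][key] = value

         def get(self, bucket: str, key: str) -> Optional[bytes]:
             return self._buckets[bucket].get(key)

     store = KeyValueStore()
     store.put("Inventory", "sku-4711", b'{"stock": 12}')
     print(store.get("Inventory", "sku-4711"))
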
  16. Amazon Dynamo
      • Scalable distributed data store built for Amazon’s platform
      • Dynamo principles (or parts of them) are implemented in several NoSQL solutions
        • “Not only” Dynamo: e.g. Project Voldemort, or Cassandra = Dynamo + BigTable
      • Motivation
        • Scale to extreme peak loads efficiently without any downtime, e.g. during the busy holiday shopping season
      DeCandia et al.: Dynamo: Amazon’s Highly Available Key-value Store. ACM SIGOPS Operating Systems Review, 41(6), 2007.
  17. Amazon Dynamo
      • Aims: high availability and performance
        • Address trade-offs between availability, consistency, cost-effectiveness and performance
      • Eventually consistent storage system
        • “Always writeable” data store: favor availability over consistency (if necessary)
      • Performance SLA (Service Level Agreement)
        • e.g. “response within 300 ms for 99.9% of requests at a peak client load of 500 requests per second”
      • Decentralized system: P2P-like distribution
        • No master nodes; all nodes have the same functionality
  18. Techniques
      • Consistent hashing
      • Object versioning / vector clocks
      • Quorum-like consistency model
      • Decentralized replica synchronization protocol
      • Gossip-based membership protocol and failure detection
  19. Partitioning and Replication of Keys
      • Logical ring of nodes
        • Output range of a hash function → fixed circular space
        • Node position is a random value in the range of the hash function
      • Assignment of data to nodes
        • Determine the hash value of a key → position on the ring
        • Assign the key to its N successor nodes (clockwise)
        • Example: hash value between A and B, N=3 → stored on B, C, D
      • Consistent hashing (see the sketch below)
        • Minimizes the number of re-assignments when nodes are added or removed
        • Needs a sophisticated hash function for good load balancing and data locality
      • Preference list: list of nodes responsible for storing a particular key (every node knows the preference list)
      (Figure: ring of nodes A–G; Hash(key) determines the position on the ring)
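
     A minimal Python sketch of consistent hashing with a preference list, for illustration only: it places one position per node on the ring, whereas Dynamo additionally uses virtual nodes and other refinements; the node and key names are made up:

     import bisect
     import hashlib

     def ring_hash(value: str) -> int:
         # Position on the ring: a 128-bit MD5 hash interpreted as an integer.
         return int(hashlib.md5(value.encode()).hexdigest(), 16)

     class ConsistentHashRing:
         def __init__(self, nodes, n_replicas=3):
             self.n_replicas = n_replicas                     # N replica nodes per key
             self.ring = sorted((ring_hash(node), node) for node in nodes)
             self.positions = [pos for pos, _ in self.ring]

         def preference_list(self, key: str):
             """The N distinct nodes responsible for `key`: its clockwise successors."""
             start = bisect.bisect_right(self.positions, ring_hash(key))
             nodes, i = [], start
             while len(nodes) < min(self.n_replicas, len(self.ring)):
                 node = self.ring[i % len(self.ring)][1]      # wrap around the ring
                 if node not in nodes:
                     nodes.append(node)
                 i += 1
             return nodes

     ring = ConsistentHashRing(["A", "B", "C", "D", "E", "F", "G"])
     print(ring.preference_list("user-42"))   # three successor nodes on the ring
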
  20. Data Access
      • Key-value store interface
        • Access via primary key; no complex queries
        • Every node in the ring can route each query
        • Routing to the (usually first) node in the preference list of the specific key (the coordinator)
      • put(key, context, object)
        • The coordinator creates a vector clock (versioning) based on the context
        • Local write of the object incl. vector clock
        • Asynchronous replication: write request to the N−1 remaining nodes in the preference list
        • The write is successful if (at least) W−1 further nodes respond
        • Asynchronous update of replicas with W<N → consistency problems
      • get(key)
        • Read request to the N nodes in the preference list
        • Response from R nodes → possibly different versions of the same object: list of (object, context) pairs
  21. Replication
      • Read/write quorum
        • R/W = minimal number of the N replica nodes that must participate in a successful read/write operation
        • Flexible adaptation of (N, R, W) according to application requirements w.r.t. performance, availability and durability
      • Ensure reads see the current version: R + W > N
        • No loss of information
      • Conflict resolution
        • Data store side: e.g. “last write wins”
        • Application side: e.g. merge conflicting shopping cart versions
  22. Quorum Variants
      • Optimizing reads: R=1, W=N
        • Consistency due to “write to all” = wait for all write acks
      • Optimizing writes: R=N, W=1
        • Consistency due to “read from all” = the last version will be included
      • R+W>N, e.g. (N=3, R=3, W=1), (N=3, R=1, W=3), (N=5, R=3, W=3), (N=5, R=4, W=2)
      • Eventual consistency: R+W≤N, e.g. (N=4, R=2, W=2)
        • A read might not cover the current write
      (Figure: read/write quorum diagrams for these configurations; see the sketch below)
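
     A small Python helper that classifies the (N, R, W) configurations listed above; the quorum-overlap argument in the comment is the reasoning behind R + W > N:

     def quorum_behaviour(n: int, r: int, w: int) -> str:
         """Classify an (N, R, W) configuration as discussed on the slide."""
         assert 1 <= r <= n and 1 <= w <= n
         if r + w > n:
             # Every read quorum overlaps every write quorum in at least one replica,
             # so a successful read always covers the most recent successful write.
             return "R+W>N: read and write quorums overlap -> reads see the current version"
         return "R+W<=N: a read might not cover the current write -> eventual consistency"

     for n, r, w in [(3, 3, 1), (3, 1, 3), (5, 3, 3), (5, 4, 2), (4, 2, 2)]:
         print((n, r, w), quorum_behaviour(n, r, w))
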
  23. Versioning
      • Aim: capture causality between different versions of an object
        • Which object versions are known?
        • Parallel branches or causal ordering?
      • Vector clock: list of (node, counter) pairs
        • Version counter per replica node, e.g. D([Sx, 1]) for object D, node Sx, version 1
      (Figure: evolving object versions across nodes Sx, Sy, Sz with messages m1, m2)
  24. Versioning (2)
      • Does one object version descend from another one?
        • If all counters of the 1st vector clock are ≤ the corresponding counters of the 2nd vector clock → the 1st version is an ancestor of the 2nd (and can be forgotten)
        • Otherwise: conflicting versions (see the sketch below)
      • The client identifies the conflict during a read
        • It gets all known versions
        • A subsequent update consolidates the versions
      Figure: DeCandia et al.: Dynamo: Amazon’s Highly Available Key-value Store. ACM SIGOPS Operating Systems Review, 41(6), 2007.
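
     A minimal Python sketch of the vector clock comparison described above; the example clocks follow the D([Sx,1]) notation of the previous slide, the concrete values are made up:

     def descends_from(vc_a: dict, vc_b: dict) -> bool:
         """True if version A is an ancestor of (or equal to) version B:
         every counter in A is <= the corresponding counter in B."""
         return all(count <= vc_b.get(node, 0) for node, count in vc_a.items())

     def conflicting(vc_a: dict, vc_b: dict) -> bool:
         # Neither version descends from the other -> parallel branches.
         return not descends_from(vc_a, vc_b) and not descends_from(vc_b, vc_a)

     d1 = {"Sx": 1}                # D1([Sx,1])
     d2 = {"Sx": 2}                # D2([Sx,2]), written after D1 via node Sx
     d3 = {"Sx": 2, "Sy": 1}       # branch written via node Sy
     d4 = {"Sx": 2, "Sz": 1}       # parallel branch written via node Sz
     print(descends_from(d1, d2))  # True  -> D1 can be forgotten
     print(conflicting(d3, d4))    # True  -> client must reconcile on the next write
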
  25. Handling Temporary Failures
      • “Sloppy” quorum (N, R, W)
        • Perform all operations on the first N healthy nodes from the preference list (not necessarily the first N nodes in the ring)
        • Still “writeable” in case of node failures
      • Hinted handoff
        • Unreachable node → the write request is sent to another node (“hinted replica”) → availability!
        • When the node recovers → sync of the hinted replica and the original node
      • Example
        • B is not available → replica goes to E (handoff) with a hint to the intended recipient B
        • B recovers → hinted replica is transferred from E to B → E can delete the hinted replica
  26. Replica Synchronization
      • Hash tree (Merkle tree) per key range
        • Leaves are hashes of the values of individual keys
        • Parent nodes are hash values of their child nodes’ hash values
      • Advantages
        • Efficient check: equal root hashes → replicas are in sync
        • Efficient identification of “out of sync” keys: subtree traversal to find the differences
      • Disadvantages
        • Recalculation of hash trees in case of repartitioning (added or removed nodes)
      (Figure: example tree over keys k1–k4 with leaf hashes H(k1)…H(k4), inner nodes H(H(k1),H(k2)) and H(H(k3),H(k4)), and root H(H(H(k1),H(k2)), H(H(k3),H(k4))))
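
     A minimal Python sketch of a Merkle root over one key range, assuming SHA-1 leaf hashes and the toy keys k1–k4 from the figure; real systems hash serialized values and cache subtrees instead of rebuilding them:

     import hashlib

     def h(data: bytes) -> bytes:
         return hashlib.sha1(data).digest()

     def merkle_root(items: dict) -> bytes:
         """Root hash over the sorted key-value pairs of one replica's key range."""
         level = [h(f"{k}={v}".encode()) for k, v in sorted(items.items())]
         if not level:
             return h(b"")
         while len(level) > 1:
             # Pair up child hashes; an odd leftover node is promoted unchanged.
             level = [h(level[i] + level[i + 1]) if i + 1 < len(level) else level[i]
                      for i in range(0, len(level), 2)]
         return level[0]

     replica_a = {"k1": "v1", "k2": "v2", "k3": "v3", "k4": "v4"}
     replica_b = dict(replica_a, k3="v3-stale")          # one divergent key
     print(merkle_root(replica_a) == merkle_root(replica_b))  # False -> traverse subtrees to find k3
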
  27. Overview: Amazon Dynamo Techniques (Problem: Technique → Advantage)
      • Partitioning: consistent hashing → incremental scalability
      • High availability for writes: vector clocks with reconciliation during reads → version size is decoupled from update rates
      • Handling temporary failures: sloppy quorum and hinted handoff → high availability and durability guarantees when some of the replicas are not available
      • Recovering from permanent failures: anti-entropy using Merkle trees → efficient synchronization of divergent replicas in the background
      • Membership and failure detection: gossip-based membership protocol and failure detection → preserves symmetry and avoids a centralized registry for membership and node liveness information
      Source: DeCandia et al.: Dynamo: Amazon’s Highly Available Key-value Store. ACM SIGOPS Operating Systems Review, 41(6), 2007.
  28. Agenda
      • CAP, ACID, BASE, consistency models
      • Key-value store: Dynamo
        • Consistent hashing
        • Object versioning
        • Quorum-like consistency model
        • …
      • Document store: MongoDB
        • Query language
        • Indexing
        • Replication
        • Sharding
  29. Document Stores
      • Collection of documents
        • Semi-structured data (e.g. JSON format)
        • Flexible, extensible schema
        • Embedded (denormalized) data model
      • Data access via key, (simple) query language, map/reduce queries
      • Use cases: web applications, mobile applications, e-commerce solutions, …
      • Examples: e.g. MongoDB
      (Figure: a database contains collections, each holding documents {doc1}, {doc2}, …)
  30. Example – Collection “images”
      { _id: 1, name: "fish.jpg", time: "17:46", user: "bob", camera: "nikon",
        info: { width: 100, height: 200, size: 12345 },
        tags: ["tuna", "shark"] }
      { _id: 2, name: "trees.jpg", time: "17:57", user: "john", camera: "canon",
        info: { width: 30, height: 250, size: 32091 },
        tags: ["oak"] }
      …
      (Each entry is a field: value pair; info is an embedded document, tags is an array of strings.)
      Corresponding table view:
      id | name       | time  | user  | camera | info.width | info.height | info.size | tags
      1  | fish.jpg   | 17:46 | bob   | nikon  | 100        | 200         | 12345     | [tuna, shark]
      2  | trees.jpg  | 17:57 | john  | canon  | 30         | 250         | 32091     | [oak]
      3  | hawaii.png | 17:59 | john  | nikon  | 128        | 64          | 92834     | [maui, tuna]
      4  | island.gif | 17:43 | zztop | nikon  | 640        | 480         | 50398     | [maui]
  31. MongoDB
      • Open source document database
      • Current release 3.2
      • Embedded data model
        • JSON-like documents (BSON = Binary JSON)
      • Features
        • Query language
        • Indexing
        • Replication
        • Sharding
  32. Query Language
      • Selection, projection:
        db.images.find({camera: "nikon"}, {name: 1, camera: 1, _id: 0})
      • Querying multi-valued attributes:
        • Pictures with tag "shark": db.images.find({tags: "shark"})
        • Pictures with tags "a", "b" and "c": db.images.find({tags: {$all: ["a", "b", "c"]}})
      • Querying nested objects (note the quoted dotted path):
        • Pictures with width < 100px: db.images.find({"info.width": {$lt: 100}})
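
     The same queries expressed with PyMongo as a sketch; it assumes a running local mongod and an "images" collection like the one on slide 30, and the host and database name ("summerschool") are made up:

     from pymongo import MongoClient

     images = MongoClient("mongodb://localhost:27017")["summerschool"]["images"]

     # Selection + projection
     for doc in images.find({"camera": "nikon"}, {"name": 1, "camera": 1, "_id": 0}):
         print(doc)

     images.find({"tags": "shark"})                      # pictures with tag "shark"
     images.find({"tags": {"$all": ["a", "b", "c"]}})    # pictures with tags "a", "b" and "c"
     images.find({"info.width": {"$lt": 100}})           # nested field: quoted dotted path
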
  33. Aggregation Framework
      • Pipeline of operators
        • $match: filter documents
        • $project: inclusion, suppression, new fields, value reset for attributes
        • $group: grouping and aggregation
        • $unwind: unnest arrays (one document per array element)
        • $sort, $limit, $skip, …
      http://docs.mongodb.org/manual/reference/sql-aggregation-comparison/
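
     A small PyMongo sketch of such a pipeline against the hypothetical "images" collection (host and database name are assumptions): it counts pictures per tag for one camera model:

     from pymongo import MongoClient

     db = MongoClient("mongodb://localhost:27017")["summerschool"]

     pipeline = [
         {"$match": {"camera": "nikon"}},                   # filter documents
         {"$unwind": "$tags"},                              # one document per array element
         {"$group": {"_id": "$tags", "n": {"$sum": 1}}},    # count pictures per tag
         {"$sort": {"n": -1}},
         {"$limit": 5},
     ]
     for row in db.images.aggregate(pipeline):
         print(row["_id"], row["n"])
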
  34. Indexing
      • Aim: efficient query execution, avoid collection scans
      • Similar to other database systems (B-tree data structure)
      • Index types
        • Default _id index (unique)
        • Single field index
        • Compound index, e.g. { userid: 1, score: -1 } → 1 = ascending, -1 = descending
        • Multikey index: indexes the content of arrays (separate index entries for every array element)
        • Geospatial index: indexes coordinate pairs
      https://docs.mongodb.com/manual/indexes/
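
     Creating some of these index types with PyMongo, as a sketch; the collection and field names follow the earlier "images" example and are assumptions:

     from pymongo import ASCENDING, DESCENDING, MongoClient

     images = MongoClient("mongodb://localhost:27017")["summerschool"]["images"]

     images.create_index([("user", ASCENDING)])                        # single field index
     images.create_index([("user", ASCENDING), ("time", DESCENDING)])  # compound index (1 asc, -1 desc)
     images.create_index([("tags", ASCENDING)])                        # multikey: one entry per array element
     print(images.index_information())                                 # the default _id index is always present
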
  35. Replication
      • Aim: redundancy, data availability
      • Asynchronous master-slave replication
      • Replica sets
        • Group of servers (mongod instances) with multiple copies of the same data set
      • Writes
        • The primary receives all write operations
        • Writes are recorded in the operation log (oplog) → ‘write acknowledgement’
        • Replication of the oplog to the secondaries, which apply the operations asynchronously
        • ‘majority’ writeConcern: acknowledge a write only once a majority of members (not only the primary) has it
      • Reads
        • Default “primary”: all reads are directed to the primary
        • Read preference modes “primaryPreferred”, “secondaryPreferred”, “nearest”, … → eventual consistency
      https://docs.mongodb.com/manual/replication
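
     A PyMongo sketch of these settings; the replica set name, hosts and namespace are assumptions. Write concern "majority" and read concern "majority" trade latency for stronger guarantees, while a secondary read preference accepts eventual consistency:

     from pymongo import MongoClient, ReadPreference, WriteConcern
     from pymongo.read_concern import ReadConcern

     client = MongoClient("mongodb://node1:27017,node2:27017,node3:27017/?replicaSet=rs01")

     images = client["summerschool"].get_collection(
         "images",
         write_concern=WriteConcern(w="majority"),             # ack from a majority, not only the primary
         read_concern=ReadConcern("majority"),                 # read only majority-committed (durable) data
         read_preference=ReadPreference.SECONDARY_PREFERRED,   # reads from secondaries may lag behind
     )
     images.insert_one({"name": "reef.jpg", "user": "bob", "tags": ["tuna"]})
     print(images.find_one({"user": "bob"}))
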
  36. Atomicity, Isolation, Durability
      • Atomic single-document writes
        • A write can update multiple fields in a document → a reader cannot see partially updated documents
      • Multi-document writes are not atomic (!)
        • Alternative: the $isolated operator isolates a multi-document update operation (no interleaving operations)
        • BUT: no “all-or-nothing” atomicity (no rollback after an error during the write), and $isolated does not work for sharded clusters
      • Durability
        • In a replica set: an update is durable once written to a majority of the voting nodes’ journal files
      • readConcern
        • ‘local’: concurrent readers may see the updated document before the changes are durable (read uncommitted)
        • ‘majority’: clients read only durable writes
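
     A short PyMongo sketch contrasting the two cases; the connection details and document ids are assumptions:

     from pymongo import MongoClient

     images = MongoClient("mongodb://localhost:27017")["summerschool"]["images"]

     # Atomic single-document write: both fields change together, so a concurrent
     # reader never observes a partially updated document.
     images.update_one({"_id": 1}, {"$set": {"camera": "nikon", "info.size": 54321}})

     # Multi-document writes are NOT atomic (MongoDB 3.2): the updates below can
     # interleave with other clients' operations, and there is no rollback if the
     # second update fails after the first one succeeded.
     images.update_one({"_id": 1}, {"$inc": {"info.size": 1}})
     images.update_one({"_id": 2}, {"$inc": {"info.size": 1}})
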
  37. Automatic Failover
      • Primary election
        • Triggered when the primary is inaccessible for 10 seconds
        • During the election → no primary → the replica set is read-only
        • New primary: the first secondary that holds an election, receives a majority of the members’ votes and has the most current optime (timestamp of the last write)
      • Network partition
        • Minority partition: the primary is downgraded to a secondary
          • Rollback: writes on the former primary are reverted when it rejoins its replica set after the failover
        • Majority partition: if necessary, election of a new primary
      https://docs.mongodb.com/manual/core/replica-set-elections/
  38. Sharding
      • Aim: scalability
      • Horizontal partitioning of the data into shards
      • Shard
        • Contains a subset of the data
        • Every shard can be a replica set
      • Shard key
        • Immutable field or fields that exist in every document of the collection
        • The collection must be indexed on the shard key
      https://docs.mongodb.com/manual/sharding/
  39. Sharding (2)
      • mongos
        • Query router; interface between client applications and the sharded cluster
      • Config servers
        • Store metadata and configuration settings for the cluster (data location)
        • Can be deployed as a replica set
      • mongod
        • Primary daemon process of MongoDB: request handling, manages data access, …
      (Figure: app/web servers talk to mongos routers, which consult the config servers and route requests to Shard01, Shard02 and Shard03, each a replica set rs01, rs02, rs03 of three mongod instances)
      Source: Tilmann Beittner, Jeremias Brödel: Erste Gehversuche mit MongoDB - Schritt für Schritt. iXDeveloper, Big Data, 02/2015.
  40. Sharding - Chunks
      • Sharded data is partitioned into chunks
        • Each chunk covers a lower and upper range of shard key values
      • Chunk split when:
        • chunk size > max chunk size (default 64 MB)
        • number of documents > max number of documents per chunk
      • Migration of chunks across shards (to keep an even balance)
      https://docs.mongodb.com/manual/core/sharding-data-partitioning
  41. Hashed and Ranged Sharding
      • Hash-based
        • A hash of the shard key field’s value determines the chunk; a range of hashed shard key values is assigned to each chunk
        • + Even data distribution
        • - A “close range” of shard key values is unlikely to end up in the same chunk
      • Range-based
        • A specific key range maps to the same chunk
        • + Efficient range queries: routing only to the shards that contain the required data
        • - Possibly uneven data distribution
      • Careful selection of the shard key! (See the sketch below.)
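
     A sketch of setting up both variants via the admin commands enableSharding and shardCollection; the host, database and collection names are made up, and the code must run against a mongos router of a sharded cluster:

     from pymongo import ASCENDING, HASHED, MongoClient

     client = MongoClient("mongodb://mongos-host:27017")
     client.admin.command("enableSharding", "summerschool")

     # Hashed sharding: even data distribution, but "close" key values land in different chunks.
     client["summerschool"]["images"].create_index([("_id", HASHED)])
     client.admin.command("shardCollection", "summerschool.images", key={"_id": "hashed"})

     # Ranged sharding (alternative): close key values share a chunk, so range queries
     # are routed to few shards, at the risk of uneven data distribution.
     client["summerschool"]["events"].create_index([("timestamp", ASCENDING)])
     client.admin.command("shardCollection", "summerschool.events", key={"timestamp": 1})
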
  42. CP system? Take care …
      • “Jepsen: MongoDB stale reads”
        • “… we’ll see that Mongo’s consistency model is broken by design: not only can ‘strictly consistent’ reads see stale versions of documents, but they can also return garbage data from writes that never should have occurred. …”
      • Source: https://aphyr.com/posts/322-jepsen-mongodb-stale-reads
  43. Summary
      • CAP, ACID, BASE, consistency models
      • Key-value store Dynamo: consistent hashing, vector clocks, sloppy quorum, …
      • Document store MongoDB: query language, indexing, replication, sharding, …
      • The “best” database? Application dependent!
        • Relational vs. non-relational (document, graph, …) data model?
        • ACID transactions?
        • Large data sets? High query load? Need for distributed data storage?
        • Availability and consistency requirements?
        • …
  44. References
      • Eric A. Brewer: Towards Robust Distributed Systems. Proceedings of the Annual ACM Symposium on Principles of Distributed Computing (PODC), 2000.
      • Eric A. Brewer: CAP Twelve Years Later: How the "Rules" Have Changed. Computer, 45(2), 2012. https://www.infoq.com/articles/cap-twelve-years-later-how-the-rules-have-changed
      • Martin Kleppmann: Please stop calling databases CP or AP, 2015. https://martin.kleppmann.com/2015/05/11/please-stop-calling-databases-cp-or-ap.html
      • Giuseppe DeCandia et al.: Dynamo: Amazon’s Highly Available Key-value Store. ACM SIGOPS Operating Systems Review, 41(6), 2007.
      • MongoDB documentation: https://docs.mongodb.com/
      • Tilmann Beittner, Jeremias Brödel: Erste Gehversuche mit MongoDB - Schritt für Schritt. iXDeveloper, Big Data, 02/2015.
      • Kyle Kingsbury: Jepsen: MongoDB stale reads, 2015. https://aphyr.com/posts/322-jepsen-mongodb-stale-reads
      • Lecture “NoSQL-Datenbanken”, Database Group, Universität Leipzig. Contributors: Anika Groß, Martin Junghanns, Lars Kolb, Andreas Thor
