Diese Präsentation wurde erfolgreich gemeldet.
Wir verwenden Ihre LinkedIn Profilangaben und Informationen zu Ihren Aktivitäten, um Anzeigen zu personalisieren und Ihnen relevantere Inhalte anzuzeigen. Sie können Ihre Anzeigeneinstellungen jederzeit ändern.
{ }
Knowledge
Engineering
K∃
Georg-August-Universit¨at G¨ottingen
Institut f¨ur Informatik
Replication and Synchronization...
{ }
Knowledge
Engineering
K∃
Georg-August-Universit¨at G¨ottingen
Institut f¨ur Informatik
Short CV Dr. Lena Wiese
Univers...
{ }
Knowledge
Engineering
K∃
Georg-August-Universit¨at G¨ottingen
Institut f¨ur Informatik
Book / Workshop
Master’s level ...
{ }
Knowledge
Engineering
K∃
Georg-August-Universit¨at G¨ottingen
Distributed Database Systems Institut f¨ur Informatik
Ou...
{ }
Knowledge
Engineering
K∃
Georg-August-Universit¨at G¨ottingen
Distributed Database Systems Institut f¨ur Informatik
Di...
{ }
Knowledge
Engineering
K∃
Georg-August-Universit¨at G¨ottingen
Distributed Database Systems Institut f¨ur Informatik
Di...
{ }
Knowledge
Engineering
K∃
Georg-August-Universit¨at G¨ottingen
Epidemic Protocols Institut f¨ur Informatik
Outline
1 Di...
{ }
Knowledge
Engineering
K∃
Georg-August-Universit¨at G¨ottingen
Epidemic Protocols Institut f¨ur Informatik
Epidemic Pro...
{ }
Knowledge
Engineering
K∃
Georg-August-Universit¨at G¨ottingen
Epidemic Protocols Institut f¨ur Informatik
Variants of ...
{ }
Knowledge
Engineering
K∃
Georg-August-Universit¨at G¨ottingen
Epidemic Protocols Institut f¨ur Informatik
Hash Trees
H...
{ }
Knowledge
Engineering
K∃
Georg-August-Universit¨at G¨ottingen
Data Allocation and Replication Institut f¨ur Informatik...
{ }
Knowledge
Engineering
K∃
Georg-August-Universit¨at G¨ottingen
Data Allocation and Replication Institut f¨ur Informatik...
{ }
Knowledge
Engineering
K∃
Georg-August-Universit¨at G¨ottingen
Data Allocation and Replication Institut f¨ur Informatik...
{ }
Knowledge
Engineering
K∃
Georg-August-Universit¨at G¨ottingen
Data Allocation and Replication Institut f¨ur Informatik...
{ }
Knowledge
Engineering
K∃
Georg-August-Universit¨at G¨ottingen
Data Allocation and Replication Institut f¨ur Informatik...
{ }
Knowledge
Engineering
K∃
Georg-August-Universit¨at G¨ottingen
Data Allocation and Replication Institut f¨ur Informatik...
{ }
Knowledge
Engineering
K∃
Georg-August-Universit¨at G¨ottingen
Data Allocation and Replication Institut f¨ur Informatik...
{ }
Knowledge
Engineering
K∃
Georg-August-Universit¨at G¨ottingen
Data Allocation and Replication Institut f¨ur Informatik...
{ }
Knowledge
Engineering
K∃
Georg-August-Universit¨at G¨ottingen
Multiversion Concurrency Control Institut f¨ur Informati...
{ }
Knowledge
Engineering
K∃
Georg-August-Universit¨at G¨ottingen
Multiversion Concurrency Control Institut f¨ur Informati...
{ }
Knowledge
Engineering
K∃
Georg-August-Universit¨at G¨ottingen
Multiversion Concurrency Control Institut f¨ur Informati...
{ }
Knowledge
Engineering
K∃
Georg-August-Universit¨at G¨ottingen
Version Vectors Institut f¨ur Informatik
Outline
1 Distr...
{ }
Knowledge
Engineering
K∃
Georg-August-Universit¨at G¨ottingen
Version Vectors Institut f¨ur Informatik
Version Vectors...
{ }
Knowledge
Engineering
K∃
Georg-August-Universit¨at G¨ottingen
Version Vectors Institut f¨ur Informatik
Version Vectors...
{ }
Knowledge
Engineering
K∃
Georg-August-Universit¨at G¨ottingen
Version Vectors Institut f¨ur Informatik
Version Vectors...
{ }
Knowledge
Engineering
K∃
Georg-August-Universit¨at G¨ottingen
Quorums and Consensus Institut f¨ur Informatik
Outline
1...
{ }
Knowledge
Engineering
K∃
Georg-August-Universit¨at G¨ottingen
Quorums and Consensus Institut f¨ur Informatik
Write and...
{ }
Knowledge
Engineering
K∃
Georg-August-Universit¨at G¨ottingen
Quorums and Consensus Institut f¨ur Informatik
Quorums
E...
{ }
Knowledge
Engineering
K∃
Georg-August-Universit¨at G¨ottingen
Quorums and Consensus Institut f¨ur Informatik
Hinted Ha...
{ }
Knowledge
Engineering
K∃
Georg-August-Universit¨at G¨ottingen
Quorums and Consensus Institut f¨ur Informatik
2-Phase C...
{ }
Knowledge
Engineering
K∃
Georg-August-Universit¨at G¨ottingen
Quorums and Consensus Institut f¨ur Informatik
2-Phase C...
{ }
Knowledge
Engineering
K∃
Georg-August-Universit¨at G¨ottingen
Quorums and Consensus Institut f¨ur Informatik
2-Phase C...
{ }
Knowledge
Engineering
K∃
Georg-August-Universit¨at G¨ottingen
Quorums and Consensus Institut f¨ur Informatik
Paxos
Pax...
{ }
Knowledge
Engineering
K∃
Georg-August-Universit¨at G¨ottingen
Quorums and Consensus Institut f¨ur Informatik
Paxos
cli...
{ }
Knowledge
Engineering
K∃
Georg-August-Universit¨at G¨ottingen
Quorums and Consensus Institut f¨ur Informatik
Paxos wit...
{ }
Knowledge
Engineering
K∃
Georg-August-Universit¨at G¨ottingen
Quorums and Consensus Institut f¨ur Informatik
Paxos wit...
{ }
Knowledge
Engineering
K∃
Georg-August-Universit¨at G¨ottingen
Quorums and Consensus Institut f¨ur Informatik
Paxos wit...
{ }
Knowledge
Engineering
K∃
Georg-August-Universit¨at G¨ottingen
Implementations Institut f¨ur Informatik
Outline
1 Distr...
{ }
Knowledge
Engineering
K∃
Georg-August-Universit¨at G¨ottingen
Implementations Institut f¨ur Informatik
Amazon Dynamo
G...
{ }
Knowledge
Engineering
K∃
Georg-August-Universit¨at G¨ottingen
Implementations Institut f¨ur Informatik
Riak
Uses SHA1 ...
{ }
Knowledge
Engineering
K∃
Georg-August-Universit¨at G¨ottingen
Implementations Institut f¨ur Informatik
MongoDB
Uses ma...
{ }
Knowledge
Engineering
K∃
Georg-August-Universit¨at G¨ottingen
Implementations Institut f¨ur Informatik
Cassandra
“A tr...
{ }
Knowledge
Engineering
K∃
Georg-August-Universit¨at G¨ottingen
Conclusion Institut f¨ur Informatik
Conclusion
Modern di...
Nächste SlideShare
Wird geladen in …5
×

Replication and Synchronization Algorithms for Distributed Databases - Lena Wiese

2.074 Aufrufe

Veröffentlicht am

This talk will provide in-depth background on strategies for replication and synchronization as implemented in modern distributed databases. The following topics will be covered: master-slave vs multi-master replication; epidemic protocols; two-phase commit vs Paxos; multiversion concurrency control; read and write quorums. A concise overview of implementations in current NoSQL databases will be presented.

Veröffentlicht in: Daten & Analysen
  • Als Erste(r) kommentieren

Replication and Synchronization Algorithms for Distributed Databases - Lena Wiese

  1. 1. { } Knowledge Engineering K∃ Georg-August-Universit¨at G¨ottingen Institut f¨ur Informatik Replication and Synchronization Algorithms for Distributed Databases Lena Wiese Research Group Knowledge Engineering Institut f¨ur Informatik Uni G¨ottingen Sep 19th, 2015 Lena Wiese Replication and Synchronization Algorithms for Distributed Databases 1 / 43
  2. 2. { } Knowledge Engineering K∃ Georg-August-Universit¨at G¨ottingen Institut f¨ur Informatik Short CV Dr. Lena Wiese University of G¨ottingen (Research Group Leader Knowledge Engineering) University of Hildesheim (Visiting Professor for Databases) National Institute of Informatics, Tokyo, Japan Robert Bosch India Ltd., Bangalore, India Master/PhD: TU Dortmund Teaching and Research NoSQL databases (lecture, seminars, projects) Database security (encryption for Cassandra and HBase) Web: http://wiese.free.fr/ Lena Wiese Replication and Synchronization Algorithms for Distributed Databases 2 / 43
  3. 3. { } Knowledge Engineering K∃ Georg-August-Universit¨at G¨ottingen Institut f¨ur Informatik Book / Workshop Master’s level text book (in English): Lena Wiese: Advanced Data Management for SQL, NoSQL, Cloud and Distributed Databases c 2015 DeGruyter/Oldenbourg (several pictures in this talk taken from the book) Workshop NoSQL-Net organized jointly with Irena Holubova (Charles Unversity, Prague) and Valentina Janev (Instituto Mihailo Pupin, Belgrade) 3rd edition to be organized at DEXA conference 2016 We welcome industrial application papers Lena Wiese Replication and Synchronization Algorithms for Distributed Databases 3 / 43
  4. 4. { } Knowledge Engineering K∃ Georg-August-Universit¨at G¨ottingen Distributed Database Systems Institut f¨ur Informatik Outline 1 Distributed Database Systems 2 Epidemic Protocols 3 Data Allocation and Replication 4 Multiversion Concurrency Control 5 Version Vectors 6 Quorums and Consensus 7 Implementations 8 Conclusion Lena Wiese Replication and Synchronization Algorithms for Distributed Databases 4 / 43
  5. 5. { } Knowledge Engineering K∃ Georg-August-Universit¨at G¨ottingen Distributed Database Systems Institut f¨ur Informatik Distributed Database Systems Scaling up (vertical scaling) vs Scaling out (horizontal scaling) Scaling out on commodity hardware in a shared-nothing architecture Distributed Database Systems A distributed database is a collection of data records that are physically distributed on several servers in a network while logically they belong together. Lena Wiese Replication and Synchronization Algorithms for Distributed Databases 5 / 43
  6. 6. { } Knowledge Engineering K∃ Georg-August-Universit¨at G¨ottingen Distributed Database Systems Institut f¨ur Informatik Distribution Transparency For a user it must be transparent how the DBMS internally handles data storage and query processing in a distributed manner. Access transparency: access independent of the structure of the network or the storage organization Location transparency: data distribution hidden from the user Replication transparency: user may access any replica Fragmentation transparency: fragmentation/sharding handled internally; user can query the database as if it contained the unfragmented data set Migration transparency: data reorganization does not affect data access Concurrency transparency: operations of multiple users must not interfere Failure transparency: continue processing user requests even in the presence of failures Lena Wiese Replication and Synchronization Algorithms for Distributed Databases 6 / 43
  7. 7. { } Knowledge Engineering K∃ Georg-August-Universit¨at G¨ottingen Epidemic Protocols Institut f¨ur Informatik Outline 1 Distributed Database Systems 2 Epidemic Protocols 3 Data Allocation and Replication 4 Multiversion Concurrency Control 5 Version Vectors 6 Quorums and Consensus 7 Implementations 8 Conclusion Lena Wiese Replication and Synchronization Algorithms for Distributed Databases 7 / 43
  8. 8. { } Knowledge Engineering K∃ Georg-August-Universit¨at G¨ottingen Epidemic Protocols Institut f¨ur Informatik Epidemic Protocols Propagation of information (like membership lists or data updates) in the network of database servers difficult: message loss high churn: change due to insertions or removals of servers Epidemic protocols spread new information like an infection or like gossip infected nodes have received a new message that they want to pass on to other susceptible nodes so far have not received the new message removed nodes have received the message but are no longer willing to pass it on Lena Wiese Replication and Synchronization Algorithms for Distributed Databases 8 / 43
  9. 9. { } Knowledge Engineering K∃ Georg-August-Universit¨at G¨ottingen Epidemic Protocols Institut f¨ur Informatik Variants of Epidemic Protocols Anti-entropy: periodic task (for example, run once every minute) Anti-entropy is a simple epidemic: any server is either susceptible or infective (there are no removed servers) the infection process does not degrade over time or due to some probabilistic decision Rumor spreading: the infection can be triggered by the arrival of a new message (in which case the server becomes infective); or it can be run periodically Rumor spreading is a complex epidemic: the infection proceeds in several rounds The amount of infections decreases with every round as the number of removed servers grows. probabilistic: After each exchange with another server, the server stops being infective with a certain probability counter-based: After a certain number k of exchanges, the server stops being infective Alan J. Demers, Daniel H. Greene, Carl Hauser, Wes Irish, John Larson, Scott Shenker, Howard E. Sturgis, Daniel C. Swinehart, and Douglas B. Terry. Epidemic algorithms for replicated database maintenance. Operating Systems Review, 22(1):8-32, 1988. Lena Wiese Replication and Synchronization Algorithms for Distributed Databases 9 / 43
  10. 10. { } Knowledge Engineering K∃ Georg-August-Universit¨at G¨ottingen Epidemic Protocols Institut f¨ur Informatik Hash Trees Hash a list of messages level-wise Quick detection of identical message lists Detection of different subtrees Hash1,2,3,4 Hash1,2 Hash1 Hash2 Hash3,4 Hash3 Hash4 Message 1 Message 2 Message 3 Message 4 top hash message list BUT: all servers need the same sorting order in their message lists Ralph C. Merkle. A digital signature based on a conventional encryption function. In International Cryptology Conference (CRYPTO), pages 369-378. Springer, 1987. Lena Wiese Replication and Synchronization Algorithms for Distributed Databases 10 / 43
  11. 11. { } Knowledge Engineering K∃ Georg-August-Universit¨at G¨ottingen Data Allocation and Replication Institut f¨ur Informatik Outline 1 Distributed Database Systems 2 Epidemic Protocols 3 Data Allocation and Replication 4 Multiversion Concurrency Control 5 Version Vectors 6 Quorums and Consensus 7 Implementations 8 Conclusion Lena Wiese Replication and Synchronization Algorithms for Distributed Databases 11 / 43
  12. 12. { } Knowledge Engineering K∃ Georg-August-Universit¨at G¨ottingen Data Allocation and Replication Institut f¨ur Informatik Consistent Hashing Allocation of servers and data items on a hash ring Based on hash of server name and file name or key/ID Allocate file to next (clock-wise) server on hash ring Can handle churn David R. Karger, Eric Lehman, Frank Thomson Leighton, Rina Panigrahy, Matthew S. Levine, and Daniel Lewin. Consistent hashing and random trees: Distributed caching protocols for relieving hot spots on the world wide web. In Frank Thom- son Leighton and Peter W. Shor, editors, Twenty-Ninth Annual ACM Symposium on the Theory of Computing, pages 654-663, 1997. Lena Wiese Replication and Synchronization Algorithms for Distributed Databases 12 / 43
  13. 13. { } Knowledge Engineering K∃ Georg-August-Universit¨at G¨ottingen Data Allocation and Replication Institut f¨ur Informatik Consistent Hashing server1 hash:2 server2 hash:7 server3 hash:11 file1 hash:0 file5 hash:1 file2 hash:10 file4 hash:8 file6 hash:6 file3 hash:3 file7 hash:5 Lena Wiese Replication and Synchronization Algorithms for Distributed Databases 13 / 43
  14. 14. { } Knowledge Engineering K∃ Georg-August-Universit¨at G¨ottingen Data Allocation and Replication Institut f¨ur Informatik Consistent Hashing server1 hash:2 server2 hash:7 server3 hash:11 file1 hash:0 file5 hash:1 file2 hash:10 file4 hash:8 file6 hash:6 file3 hash:3 file7 hash:5 Lena Wiese Replication and Synchronization Algorithms for Distributed Databases 14 / 43
  15. 15. { } Knowledge Engineering K∃ Georg-August-Universit¨at G¨ottingen Data Allocation and Replication Institut f¨ur Informatik Consistent Hashing server1 hash:2 server2 hash:7 server3 hash:11 server4 hash:4 file1 hash:0 file5 hash:1 file2 hash:10 file4 hash:8 file6 hash:6 file3 hash:3 file7 hash:5 Lena Wiese Replication and Synchronization Algorithms for Distributed Databases 15 / 43
  16. 16. { } Knowledge Engineering K∃ Georg-August-Universit¨at G¨ottingen Data Allocation and Replication Institut f¨ur Informatik Consistent Hashing with Virtual Servers server1 hash:2 server2 hash:7 server3 hash:11 server4a hash:4 server4b hash:9 file1 hash:0 file5 hash:1 file2 hash:10 file4 hash:8 file6 hash:6 file3 hash:3 file7 hash:5 Lena Wiese Replication and Synchronization Algorithms for Distributed Databases 16 / 43
  17. 17. { } Knowledge Engineering K∃ Georg-August-Universit¨at G¨ottingen Data Allocation and Replication Institut f¨ur Informatik Master-Slave Replication Master Slave Slave write read read read update update Lena Wiese Replication and Synchronization Algorithms for Distributed Databases 17 / 43
  18. 18. { } Knowledge Engineering K∃ Georg-August-Universit¨at G¨ottingen Data Allocation and Replication Institut f¨ur Informatik Multi-Master Replication Master Master Master write write write read read read synchronize synchronize synchronize Lena Wiese Replication and Synchronization Algorithms for Distributed Databases 18 / 43
  19. 19. { } Knowledge Engineering K∃ Georg-August-Universit¨at G¨ottingen Multiversion Concurrency Control Institut f¨ur Informatik Outline 1 Distributed Database Systems 2 Epidemic Protocols 3 Data Allocation and Replication 4 Multiversion Concurrency Control 5 Version Vectors 6 Quorums and Consensus 7 Implementations 8 Conclusion Lena Wiese Replication and Synchronization Algorithms for Distributed Databases 19 / 43
  20. 20. { } Knowledge Engineering K∃ Georg-August-Universit¨at G¨ottingen Multiversion Concurrency Control Institut f¨ur Informatik Multiversion Concurrency Control Non-blocking reads User 1 v-current=v0 v-current=v1 v-current=v2 User 2 t0 t1 t2 t3 abc uvw xyz v=0 v=1 v=2 time read v0 read v0 read v0 read v1 write v2 read v0 write v1 Lena Wiese Replication and Synchronization Algorithms for Distributed Databases 20 / 43
  21. 21. { } Knowledge Engineering K∃ Georg-August-Universit¨at G¨ottingen Multiversion Concurrency Control Institut f¨ur Informatik Multiversion Concurrency Control Abort transaction on concurrent write User 1 v-current=v0 v-current=v0 v-current=v1-1 User 2 v-read=v0 v-read=v0 t0 t1 t2 t3 abc xyz ijk v=0 v=1-1 v=1-2 time read v0 write v1-1 read v0 write v1-2 conflict!!!v-read < v-current Lena Wiese Replication and Synchronization Algorithms for Distributed Databases 21 / 43
  22. 22. { } Knowledge Engineering K∃ Georg-August-Universit¨at G¨ottingen Version Vectors Institut f¨ur Informatik Outline 1 Distributed Database Systems 2 Epidemic Protocols 3 Data Allocation and Replication 4 Multiversion Concurrency Control 5 Version Vectors 6 Quorums and Consensus 7 Implementations 8 Conclusion Lena Wiese Replication and Synchronization Algorithms for Distributed Databases 22 / 43
  23. 23. { } Knowledge Engineering K∃ Georg-August-Universit¨at G¨ottingen Version Vectors Institut f¨ur Informatik Version Vectors Logical counter for each client Option 1: Automatic synch with union semantics (e.g. shopping cart) Replica 1 Replica 2 a:0 b:0 c:0 a:1 b:0 c:0 a:1 b:1 c:0 a:1 b:1 c:1 a:1 b:1 c:1 a:0 b:0 c:0 a:0 b:1 c:0 a:1 b:1 c:0 a:1 b:1 c:1 Client a Client b Client c read write read write union synch read write synch Lena Wiese Replication and Synchronization Algorithms for Distributed Databases 23 / 43
  24. 24. { } Knowledge Engineering K∃ Georg-August-Universit¨at G¨ottingen Version Vectors Institut f¨ur Informatik Version Vectors Logical counter for each client Option 2: Sibling versions and reconciliation by client context Replica 1 Replica 2 a:0 b:0 c:0 a:1 b:0 c:0 a:1 b:0 c:0 a:0 b:1 c:0 a:1 b:1 c:1 a:1 b:1 c:1 a:0 b:0 c:0 a:0 b:1 c:0 a:1 b:0 c:0 a:0 b:1 c:0 a:1 b:1 c:1 Client a Client b Client c read write read write siblings synch read reconciling write synch Lena Wiese Replication and Synchronization Algorithms for Distributed Databases 24 / 43
  25. 25. { } Knowledge Engineering K∃ Georg-August-Universit¨at G¨ottingen Version Vectors Institut f¨ur Informatik Version Vectors Caution: Logical counter for each replica leads to lost updates! Replica 1 Replica 2 rep1:0 rep2:0 rep1:1 rep2:0 rep1:2 rep2:0 rep1:0 rep2:0 rep1:? rep2:0 Client a Client b read write read write read write Lena Wiese Replication and Synchronization Algorithms for Distributed Databases 25 / 43
  26. 26. { } Knowledge Engineering K∃ Georg-August-Universit¨at G¨ottingen Quorums and Consensus Institut f¨ur Informatik Outline 1 Distributed Database Systems 2 Epidemic Protocols 3 Data Allocation and Replication 4 Multiversion Concurrency Control 5 Version Vectors 6 Quorums and Consensus 7 Implementations 8 Conclusion Lena Wiese Replication and Synchronization Algorithms for Distributed Databases 26 / 43
  27. 27. { } Knowledge Engineering K∃ Georg-August-Universit¨at G¨ottingen Quorums and Consensus Institut f¨ur Informatik Write and Read quorums Use quorums to ensure consistency A read quorum is defined as a subset of replicas that have to be contacted when reading data; for a successful read, all replicas in a read quorum have to return the same answer value. A write quorum is defined as a subset of replicas that have to be contacted when writing data; for a successful write, all replicas in the write quorum have to acknowledge the write request. Quorum rules for N replicas, read quorum size R, write quorum size W R + W > N (consistent reads) 2W > N (strongly consistent writes) Partial quorums only ensure weak consistency (for example, during failures) Lena Wiese Replication and Synchronization Algorithms for Distributed Databases 27 / 43
  28. 28. { } Knowledge Engineering K∃ Georg-August-Universit¨at G¨ottingen Quorums and Consensus Institut f¨ur Informatik Quorums Example 1: ROWA read one write all (left) Example 2: Majority quorums (right) S1 S2 S3 S4 S5 read quorum write quorum S1 S2 S3 S4 S5 read quorum write quorum Lena Wiese Replication and Synchronization Algorithms for Distributed Databases 28 / 43
  29. 29. { } Knowledge Engineering K∃ Georg-August-Universit¨at G¨ottingen Quorums and Consensus Institut f¨ur Informatik Hinted Handoff and Read Repair Hinted handoff: handle temporary failures if a replica is unavailable, write requests are delegated to another available server hint that these requests should be relayed to the replica server as soon as possible Read repair: a coordinator node sends out the read request to a couple of replicas coordinator contacts a set of replicas larger than the needed quorum returns the majority response to the client sends repair (update) instructions to those replicas that are not yet synchronized Lena Wiese Replication and Synchronization Algorithms for Distributed Databases 29 / 43
  30. 30. { } Knowledge Engineering K∃ Georg-August-Universit¨at G¨ottingen Quorums and Consensus Institut f¨ur Informatik 2-Phase Commit Execution of a distributed transaction where all agents have to acknowledge a successful finalization of the transaction 2PC has a voting phase and a decision phase Failure of a single agent will lead to a global abort of the transaction The state between the two phases – before the coordinator sends his decision to all agents – is called the in-doubt state In case the coordinator irrecoverably fails before sending his decision, the agents cannot proceed to either commit or abort the transaction. Jim Gray. Notes on Data Base Operating Systems. In: M. J. Flynn, J. N. Gray, A. K. Jones, K. Lagally H. Opderbeck, G. J. Popek, B. Randell J. H. Saltzer, H. R. Wehle: Operating Systems: An Advanced Course. Lecture Notes in Computer Science, volume 60, pp. 393-481. Springer, 1978. Lena Wiese Replication and Synchronization Algorithms for Distributed Databases 30 / 43
  31. 31. { } Knowledge Engineering K∃ Georg-August-Universit¨at G¨ottingen Quorums and Consensus Institut f¨ur Informatik 2-Phase Commit coordinator agent 1 agent 2 agent 3 Phase 1 Phase 2 prepare ready commit acknowledge commit Lena Wiese Replication and Synchronization Algorithms for Distributed Databases 31 / 43
  32. 32. { } Knowledge Engineering K∃ Georg-August-Universit¨at G¨ottingen Quorums and Consensus Institut f¨ur Informatik 2-Phase Commit coordinator agent 1 agent 2 agent 3 Phase 1 Phase 2 prepare failed ready ready abort acknowledge abort Lena Wiese Replication and Synchronization Algorithms for Distributed Databases 32 / 43
  33. 33. { } Knowledge Engineering K∃ Georg-August-Universit¨at G¨ottingen Quorums and Consensus Institut f¨ur Informatik Paxos Paxos algorithm can be applied to keep a distributed DBMS in a consistent state under certain failures A client can issue a read request Database servers have to come to a consensus on what the current state (and hence the most recent value) of the record is Database servers as agents Proposer Leader Acceptor Learner Leslie Lamport. The part-time parliament. ACM Transactions on Computer Systems (TOCS), 16(2):133–169, 1998 Leslie Lamport. Generalized consensus and Paxos. Technical report, Technical Report MSR-TR-2005-33, Microsoft Research, 2005 Lena Wiese Replication and Synchronization Algorithms for Distributed Databases 33 / 43
  34. 34. { } Knowledge Engineering K∃ Georg-August-Universit¨at G¨ottingen Quorums and Consensus Institut f¨ur Informatik Paxos client leader acceptor 1 acceptor 2 acceptor 3 learner Phase 1 Phase 2 request prepare(propNum) promise(propNum,maxAcceptPropNum,value) accept(propNum,chosenValue) accepted(propNum,value) return(value) Lena Wiese Replication and Synchronization Algorithms for Distributed Databases 34 / 43
  35. 35. { } Knowledge Engineering K∃ Georg-August-Universit¨at G¨ottingen Quorums and Consensus Institut f¨ur Informatik Paxos with minority of failing acceptors client leader acceptor 1 acceptor 2 acceptor 3 learner Phase 1 Phase 2 request prepare(propNum) promise(propNum,maxAcceptPropNum,value) accept(propNum,chosenValue) accepted(propNum,value) return(value) Lena Wiese Replication and Synchronization Algorithms for Distributed Databases 35 / 43
  36. 36. { } Knowledge Engineering K∃ Georg-August-Universit¨at G¨ottingen Quorums and Consensus Institut f¨ur Informatik Paxos with majority of failing acceptors client leader acceptor 1 acceptor 2 acceptor 3 learner acceptor 5acceptor 4 Phase 1 Phase 2 request prepare(p1) promise(p1,maxAcceptPropNum,value) accept(p1,chosenValue) accepted(p1,value) Phase 1 p1<p2 Phase 2 prepare(p2) promise(p2,maxAcceptPropNum,value) accept(p2,chosenValue) accepted(p2,value) return(value) Lena Wiese Replication and Synchronization Algorithms for Distributed Databases 36 / 43
  37. 37. { } Knowledge Engineering K∃ Georg-August-Universit¨at G¨ottingen Quorums and Consensus Institut f¨ur Informatik Paxos with dueling proposers client leader1 leader2 acceptor 1 acceptor 2 acceptor 3 learner Phase 1 p1<p2 p2<p3 p3<p4 request prepare(p1) promise(p1,maxAcceptPropNum,value) prepare(p2) promise(p2,maxAcceptPropNum,value) accept(p1,chosenValue) Notaccepted(p1) prepare(p3) promise(p3,maxAcceptPropNum,value) accept(p2,chosenValue) Notaccepted(p2) prepare(p4) promise(p4,maxAcceptPropNum,value) Lena Wiese Replication and Synchronization Algorithms for Distributed Databases 37 / 43
  38. 38. { } Knowledge Engineering K∃ Georg-August-Universit¨at G¨ottingen Implementations Institut f¨ur Informatik Outline 1 Distributed Database Systems 2 Epidemic Protocols 3 Data Allocation and Replication 4 Multiversion Concurrency Control 5 Version Vectors 6 Quorums and Consensus 7 Implementations 8 Conclusion Lena Wiese Replication and Synchronization Algorithms for Distributed Databases 38 / 43
  39. 39. { } Knowledge Engineering K∃ Georg-August-Universit¨at G¨ottingen Implementations Institut f¨ur Informatik Amazon Dynamo Giuseppe DeCandia, Deniz Hastorun, Madan Jam- pani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall, and Werner Vogels. Dynamo: Amazon’s highly available key-value store. In Sympo- sium on Operating Systems Principles (SOSP), pages 205–220. ACM, 2007. Lena Wiese Replication and Synchronization Algorithms for Distributed Databases 39 / 43
  40. 40. { } Knowledge Engineering K∃ Georg-August-Universit¨at G¨ottingen Implementations Institut f¨ur Informatik Riak Uses SHA1 for consistent hashing with virtual servers to distribute the load Uses “dotted” version vectors (no locks) with sibling semantics Exposes them to the users as a context for client-side conflict resolution Uses hinted handoff (temporary and permanent “ownership transfer”) and read repair Uses Anti-Antropy with hash trees as a background task to detect inconsistent data http://basho.com/posts/technical/why-riak-just-works/ Lena Wiese Replication and Synchronization Algorithms for Distributed Databases 40 / 43
  41. 41. { } Knowledge Engineering K∃ Georg-August-Universit¨at G¨ottingen Implementations Institut f¨ur Informatik MongoDB Uses master-slave replication Replica set with one primary and several secondaries If primary fail a new one is selected from the replica set Heartbeat: by default 10 seconds Write concern is used to configure the write quorum (0,1,majority,...) Lena Wiese Replication and Synchronization Algorithms for Distributed Databases 41 / 43
  42. 42. { } Knowledge Engineering K∃ Georg-August-Universit¨at G¨ottingen Implementations Institut f¨ur Informatik Cassandra “A true masterless architecture” Supports consistent hashing with RandomPartitioner Uses read repair and hinted handoff Write quorums can be configured (“tunable consistency”: ONE, QUORUM, or ALL) Lena Wiese Replication and Synchronization Algorithms for Distributed Databases 42 / 43
  43. 43. { } Knowledge Engineering K∃ Georg-August-Universit¨at G¨ottingen Conclusion Institut f¨ur Informatik Conclusion Modern distributed databases provide a revival / an implementation of established ideas on novel hardware. Thank you! Lena Wiese Replication and Synchronization Algorithms for Distributed Databases 43 / 43

×