A new challenge is emerging with the advent of new classes of applications and technologies such as smart environments, sensor networks, mobile systems, peer-to-peer systems, cloud computing, etc. In these settings, the underlying distributed systems cannot be fully managed; they need some degree of self-management that depends on the specific application domain. However, it is possible to delineate some common consequences of such self-management: first, no entity can always ensure the validity of the system assumptions during the entire computation; second, no one knows accurately who joins and who leaves the system at any time, which introduces a kind of unpredictability in the system composition (this phenomenon of arrival and departure of processes in a system is also known as churn).
As a consequence, distributed computing abstractions have to deal not only with asynchrony and failures, but also with this dynamic dimension, where a process that does not crash can leave the system at any time, so that the membership can change completely several times during the same computation. Hence, the abstractions for reliable distributed computing have to be reconsidered to take this new “adversary” setting into account. This self-defined and continuously evolving distributed system, which in the following we call a dynamic distributed system, makes abstractions more difficult to understand and master than in distributed systems where the set of processes is fixed and known by all participants. The notion of churn thus becomes a system parameter whose aim is to make tractable those systems whose composition evolves over time.
The presentation analyzes the issues in building a regular register in an environment subject to crash and Byzantine failures.
This presentation was delivered at the Retirement Seminar for Professor Santosh Shrivastava, held in Newcastle (UK) in September 2011.
Reliable Distributed Computing: The Price of Mastering Churn in Distributed Systems
1. Roberto Baldoni, Università di Roma “La Sapienza”. Retirement Seminar for Professor Santosh Shrivastava, 8th of September 2011, Newcastle, UK. The Price of Mastering Churn in Distributed Systems. Roberto Baldoni, “The price of mastering churn in a distributed system”
9. Spectrum of Possible System Models: from an orderly world of static, managed distributed systems to a chaotic world of dynamic, unmanaged distributed systems. [Figure: air traffic control, mobile ad-hoc systems, cloud computing and peer-to-peer systems placed along this spectrum.]
11. System Model with Churn
14. [Figure labels: Churn, Distributed System, Distributed Computation, Connectivity Protocol, Communication Protocols, Abstraction.] For simplicity, we assume that N processes are in the distributed computation at any given time.
15. Object Abstraction: The Regular Register. A register is a shared variable accessed by processes through read and write operations.
17. Regular Register: write(). The writer process p_w wants to write the value v, so p_w broadcasts a message (WRITE, v, sn). In the meantime, processes join and leave the computation. Observation: only the processes that belong to the computation when p_w starts the write, and that remain in the computation for the whole duration of the write, will hold the updated copy of the register. The active processes keep the state of the computation. [Figure: a subset of the processes of the distributed system participates in the register computation with p_w.]
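The observation above can be made concrete with a small sketch. Assuming each process is described by a join time and a leave time (the `Membership` record and the `holders_of_write` function are hypothetical names of ours, not from the talk), the processes left holding the updated copy are those already in the computation when the write starts and still there when it ends:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Membership:
    """Hypothetical record of a process's stay in the computation: [join_time, leave_time)."""
    pid: str
    join_time: float
    leave_time: float = float("inf")  # inf means the process never leaves


def holders_of_write(members, start, end):
    """Processes that keep the updated copy of a write broadcast during [start, end].

    Following the slide's observation: a process must already belong to the
    computation when the write starts and must stay until the write ends.
    """
    return {m.pid for m in members if m.join_time <= start and m.leave_time > end}


if __name__ == "__main__":
    members = [
        Membership("p_w", 0.0),                  # the writer
        Membership("p_a", 0.0),
        Membership("p_b", 0.0, leave_time=1.5),  # leaves during the write
        Membership("p_c", 1.2),                  # joins during the write
        Membership("p_d", 0.0),
    ]
    print(sorted(holders_of_write(members, start=1.0, end=2.0)))  # ['p_a', 'p_d', 'p_w']
```

In the example, p_b and p_c miss the value although neither of them crashed; this is the situation that the join() procedure of the later slides has to cope with.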
18. Processes in the distributed computation vs. active processes. [Figure: number of processes over time t, showing the N processes in the distributed computation (joining processes = leaving processes), the active processes A(t) under churn, and a correctness bound.]
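To make the figure's curves concrete, the sketch below simulates churn with a constant system size of N processes (every departure matched by one join, as on the slide) and counts how many of the initial processes are still present after each time step. The adversary that always evicts the oldest processes, the per-step churn k, and the majority-style correctness bound in the example are assumptions chosen for illustration; the talk does not define them this way.

```python
from itertools import count


def simulate_churn(n, k, steps):
    """Constant-size churn: at each step k processes leave and k new ones join.

    Hypothetical worst-case adversary: it always evicts the k oldest processes.
    Returns, for t = 0..steps, how many of the n initial processes are still present.
    """
    if not 0 <= k <= n:
        raise ValueError("k must be between 0 and n")
    fresh_ids = count()
    system = [next(fresh_ids) for _ in range(n)]  # ordered from oldest to newest
    initial = set(system)
    history = [n]
    for _ in range(steps):
        # k processes leave (the oldest ones), k new processes join.
        system = system[k:] + [next(fresh_ids) for _ in range(k)]
        history.append(len(initial.intersection(system)))
    return history


def first_violation(history, bound):
    """First time step at which the count drops below the correctness bound, or None."""
    for t, alive in enumerate(history):
        if alive < bound:
            return t
    return None


if __name__ == "__main__":
    history = simulate_churn(n=100, k=10, steps=12)
    print(history)                             # [100, 90, 80, 70, 60, 50, 40, 30, 20, 10, 0, 0, 0]
    print(first_violation(history, bound=51))  # 5
```

The system size never changes, yet after 10 steps none of the original processes remain, which illustrates how membership can change completely during a single computation.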
19. Processes in the distributed computation vs. active processes (continued). [Figure as in slide 18.] The movement of the correctness bound depends on the system model: the weaker the system model, the more «static» the system becomes. This leads to several impossibility results in the presence of churn.
20. Processes in the distributed computation vs. active processes (continued). [Figure as in slide 19, additionally marking liveness and safety issues.]
23. Synchronous System Safety: case register_i = ⊥. [Figure: p_i invokes join(); the register values shown are 0; p_i broadcasts INQUIRY(i) to p_j, p_h and p_k, and p_h answers REPLY(h, 0, 0). Labels: Join, Write, Reply.] If no write is concurrent with the join operation and c < 1/3 (c being the churn rate), then there always exists an active process that replies with the last written value.
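A minimal sketch of the two ingredients of this slide, under stated assumptions. First, the joining process adopts the reply carrying the highest sequence number (our assumption about how replies are combined; `pick_latest` is a hypothetical name). Second, if time is measured in units of δ and an adversary replaces a fraction c of the N processes per time unit, at least N(1 - 3c) processes stay in the system throughout a 3δ window; this is positive exactly when c < 1/3, which is one possible reading of the slide's condition, not a derivation taken from the talk.

```python
def pick_latest(replies):
    """Combine REPLY(j, val, sn) messages received by the joining process p_i.

    Assumption of this sketch: p_i keeps the value with the highest sequence number.
    Returns (val, sn), or None if no active process replied.
    """
    if not replies:
        return None
    _, val, sn = max(replies, key=lambda reply: reply[2])
    return val, sn


def min_stayers(n, c, window=3.0):
    """Worst-case number of processes present during a whole window (in units of delta).

    Assumes a fraction c of the n processes is replaced per time unit and that the
    adversary always evicts processes that were present when the window started.
    """
    return max(0.0, n * (1.0 - c * window))


if __name__ == "__main__":
    replies = [("j", 0, 0), ("h", 0, 0), ("k", 0, 0)]
    print(pick_latest(replies))       # (0, 0)
    print(min_stayers(100, 0.2) > 0)  # True: with c < 1/3 some process stays for the whole window
    print(min_stayers(100, 0.4))      # 0.0: with c > 1/3 no such process is guaranteed
```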
24. Synchronous System Safety: case register_i = ⊥, with a concurrent write. [Figure: p_i invokes join() while write(1) is in progress; the register values of the other processes move from 0 to 1; p_i broadcasts INQUIRY(i), p_h answers REPLY(h, 0, 0), and WRITE(1, 1) messages are delivered.] p_i can receive both WRITE(<val, sn>) messages and REPLY(<j, val, sn>) messages. Depending on the values received by time τ + 2δ, p_i sets register_i either to the value written by a concurrent write or to the value written before the concurrent writes. If p_i receives the WRITE before the REPLY, it does not overwrite the newer value, so any subsequent read returns the last value written.
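The order-independence argued on this slide can be sketched with a joining replica that adopts a (val, sn) pair only when its sequence number is strictly newer than the one it already holds. The class and method names are hypothetical; `register` and `sn` mirror the slide's register_i and sn, and `None` stands in for an undefined register value.

```python
class JoiningReplica:
    """Hypothetical local state of a joining process p_i."""

    def __init__(self):
        self.register = None  # no value known yet
        self.sn = -1          # sequence number of the value held

    def _adopt(self, val, sn):
        # Never overwrite a newer value: a late REPLY carrying an old sn is ignored.
        if sn > self.sn:
            self.register, self.sn = val, sn

    def on_write(self, val, sn):
        """WRITE(val, sn) broadcast by the writer."""
        self._adopt(val, sn)

    def on_reply(self, j, val, sn):
        """REPLY(j, val, sn) sent by active process p_j in answer to INQUIRY(i)."""
        self._adopt(val, sn)


if __name__ == "__main__":
    write_first = JoiningReplica()
    write_first.on_write(1, 1)
    write_first.on_reply("h", 0, 0)  # stale reply, ignored

    reply_first = JoiningReplica()
    reply_first.on_reply("h", 0, 0)
    reply_first.on_write(1, 1)       # newer write replaces the old value

    print(write_first.register, reply_first.register)  # 1 1
```

Both delivery orders end with register_i = 1; the sender id j is carried along only to match the REPLY format on the slide.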
26. Horizontal Quorums for Register Persistence. [Figure: joining, active and non-active processes over successive intervals of length 3δ.]
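One way to read the horizontal-quorum figure is that the register's value survives as long as each new set of active processes (one per 3δ interval) overlaps with the previous one, so that joining processes can copy the value from someone who still holds it. The sketch below checks that condition on a sequence of sets; both the function and the overlap rule are our assumptions, not the talk's definition.

```python
def value_persists(windows):
    """Return True if a value held by the first set of active processes survives.

    windows: non-empty list of sets of process ids, one per 3-delta interval.
    Assumption: the value reaches the next interval only if at least one process
    of the previous interval is still active in it; joiners then copy it.
    """
    if not windows:
        raise ValueError("windows must contain at least one set")
    holders = set(windows[0])
    for active in windows[1:]:
        if not holders & set(active):
            return False       # no overlap: nobody is left to transfer the value
        holders = set(active)  # every active process now holds the value
    return True


if __name__ == "__main__":
    print(value_persists([{1, 2, 3}, {2, 3, 4}, {4, 5, 6}]))  # True
    print(value_persists([{1, 2}, {3, 4}]))                   # False
```

In the first example the first and last sets share no process, yet the value persists because every pair of consecutive sets overlaps.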
31. Regular Register with Byzantine Failures
35. Requirements:
Write Persistency: servers maintain the last value written by a write operation despite server departures.
Byzantine Resiliency: there are always at least f+1 servers maintaining the same value.
Read Validity: any read() operation returns the last value written by a completed write() or a value written concurrently.
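The Byzantine Resiliency requirement is what lets a reader filter out fabricated values: if at most f servers are Byzantine, a (value, sn) pair reported by at least f+1 distinct servers was reported by at least one correct server. A minimal sketch of that filtering step follows; the function name, the reply format, and the choice of the highest sequence number among the pairs that pass the filter are assumptions, not the protocol presented in the talk.

```python
from collections import Counter


def byzantine_read(replies, f):
    """Return a (value, sn) pair vouched for by at least f+1 servers, or None.

    replies: dict mapping server id -> (value, sn); one reply per server.
    Assumes at most f Byzantine servers, so f+1 identical replies include
    at least one correct server.
    """
    votes = Counter(replies.values())
    vouched = [pair for pair, n in votes.items() if n >= f + 1]
    if not vouched:
        return None  # not enough agreement yet: in this sketch the reader would wait or retry
    return max(vouched, key=lambda pair: pair[1])


if __name__ == "__main__":
    replies = {
        "s1": ("v", 2),
        "s2": ("v", 2),
        "s3": ("x", 7),  # a Byzantine server inventing a newer-looking value
        "s4": ("v", 2),
    }
    print(byzantine_read(replies, f=1))  # ('v', 2)
```

The fabricated pair ("x", 7) carries the highest sequence number but is reported by a single server, so it never reaches the f+1 threshold.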
36. Issues in read() operations. [Figure: values v, x and y held by the servers at successive times t_1, t_2, t_i, t_k.]
39. Pictorial related work and summary of results for the regular register. [Figure: results classified by system model (asynchronous, eventually synchronous, synchronous), churn model (static, quiescent, continuous) and failure model (crash, Byzantine), citing Aguilera et al. PODC 2010, Baldoni et al. ICDCS 2009 and Baldoni et al. PODC 2011.]
40. Pictorial related work and summary of results for the regular register. Columns: no churn, quiescent churn, continuous churn. Entries by row, in the order they appear on the slide:
Synchronous (BFT papers): crash, Baldoni et al. ICDCS 2009; Byzantine, Baldoni et al. PODC 2011 (ba).
Eventually synchronous: crash, Baldoni et al. ICDCS 2009; Byzantine, open problem.
Asynchronous: crash, Aguilera et al. 2009, impossible; Byzantine, open problem.
45. One slide to remember
46. One slide to remember. [Figure as in slide 20: N processes in the distributed computation (joining processes = leaving processes), active processes A(t) under churn, correctness bound, liveness and safety issues.] The movement of the correctness bound depends on the system model: the weaker the system model, the more «static» the system becomes, which leads to several impossibility results in the presence of churn.
Speaker notes
What is the weakest system model in which we are still able to provide meaningful specifications of a distributed computing abstraction and solutions?