1. Chubby Lock Service
Amir H. Payberah
amir@sics.se
Amirkabir University of Technology
(Tehran Polytechnic)
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 1 / 38
2. What is the Problem?
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 2 / 38
3. What is the Problem?
Large distributed system
How to ensure that only one process across a fleet of processes acts
on a resource?
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 3 / 38
4. What is the Problem?
Large distributed system
How to ensure that only one process across a fleet of processes acts
on a resource?
• Ensure only one server can write to a database.
• Ensure that only one server can perform a particular action.
• Ensure that there is a single master that processes all writes.
• ...
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 3 / 38
5. What is the Problem?
All these scenarios need a simple way to coordinate execution of
process to ensure that only one entity is performing an action.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 4 / 38
6. What is the Problem?
All these scenarios need a simple way to coordinate execution of
process to ensure that only one entity is performing an action.
We need agreement (consensus)
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 4 / 38
7. What is the Problem?
All these scenarios need a simple way to coordinate execution of
process to ensure that only one entity is performing an action.
We need agreement (consensus)
Paxos ...
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 4 / 38
8. How Can We Use Paxos
to Solve the Problem
of Coordination?
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 5 / 38
9. Possible Solutions
Building a consensus library
Building a centralized lock service
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 6 / 38
10. Consensus Library vs. Centralized Service
Consensus library
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 7 / 38
11. Consensus Library vs. Centralized Service
Consensus library
• Library code has to be included in every program.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 7 / 38
12. Consensus Library vs. Centralized Service
Consensus library
• Library code has to be included in every program.
• Application developers to be experts in distributed system.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 7 / 38
13. Consensus Library vs. Centralized Service
Consensus library
• Library code has to be included in every program.
• Application developers to be experts in distributed system.
• Need to wait on majority (quorums).
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 7 / 38
14. Consensus Library vs. Centralized Service
Consensus library
• Library code has to be included in every program.
• Application developers to be experts in distributed system.
• Need to wait on majority (quorums).
Centralized lock service
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 7 / 38
15. Consensus Library vs. Centralized Service
Consensus library
• Library code has to be included in every program.
• Application developers to be experts in distributed system.
• Need to wait on majority (quorums).
Centralized lock service
• Clean interface.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 7 / 38
16. Consensus Library vs. Centralized Service
Consensus library
• Library code has to be included in every program.
• Application developers to be experts in distributed system.
• Need to wait on majority (quorums).
Centralized lock service
• Clean interface.
• Service can be modified independently.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 7 / 38
17. Consensus Library vs. Centralized Service
Consensus library
• Library code has to be included in every program.
• Application developers to be experts in distributed system.
• Need to wait on majority (quorums).
Centralized lock service
• Clean interface.
• Service can be modified independently.
• A single client can obtain a lock and make progress (non-quorum
based decisions, the lock service takes care of it)
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 7 / 38
18. Consensus Library vs. Centralized Service
Consensus library
• Library code has to be included in every program.
• Application developers to be experts in distributed system.
• Need to wait on majority (quorums).
Centralized lock service
• Clean interface.
• Service can be modified independently.
• A single client can obtain a lock and make progress (non-quorum
based decisions, the lock service takes care of it)
• Application developers do not have to worry about dealing with
operations, various failure modes debugging etc.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 7 / 38
20. Chubby is a Lock Service
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 9 / 38
21. Chubby
Primary goals: reliability and availability (not high performance)
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 10 / 38
22. Chubby
Primary goals: reliability and availability (not high performance)
Used in Google: GFS, Bigtable, etc.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 10 / 38
23. Chubby Structure
Two main components:
• Server (chubby cell)
• Client library
Communicate via RPC
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 11 / 38
24. Chubby Cell
A small set of servers (replicas)
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 12 / 38
25. Chubby Cell
A small set of servers (replicas)
One is the master (elected from the replicas via Paxos)
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 12 / 38
26. Chubby Cell
A small set of servers (replicas)
One is the master (elected from the replicas via Paxos)
Maintain copies of simple database (replicated state machines)
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 12 / 38
27. Chubby Cell
A small set of servers (replicas)
One is the master (elected from the replicas via Paxos)
Maintain copies of simple database (replicated state machines)
• Only the master initiates reads and writes of this database.
• All other replicas simply copy updates from the master (using the
Paxos protocol).
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 12 / 38
28. Chubby Client
Find the master: sending master location requests to replicas.
All requests are sent directly to the master.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 13 / 38
31. Chubby Operations
Write requests
• The master receives the request.
• Write requests are propagated via the consensus protocol to all
replicas.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 14 / 38
32. Chubby Operations
Write requests
• The master receives the request.
• Write requests are propagated via the consensus protocol to all
replicas.
• Acknowledges when write request reaches the majority of the
replicas
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 14 / 38
33. Chubby Operations
Write requests
• The master receives the request.
• Write requests are propagated via the consensus protocol to all
replicas.
• Acknowledges when write request reaches the majority of the
replicas
Read requests
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 14 / 38
34. Chubby Operations
Write requests
• The master receives the request.
• Write requests are propagated via the consensus protocol to all
replicas.
• Acknowledges when write request reaches the majority of the
replicas
Read requests
• Satisfied by the master alone.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 14 / 38
35. Let’s Go into More Details
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 15 / 38
36. Chubby Interface (1/2)
Chubby exports a unix-like filesystem interface.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 16 / 38
37. Chubby Interface (1/2)
Chubby exports a unix-like filesystem interface.
Tree of files with names separated by slashes (namesapace):
/ls/cell1/aut/cc14:
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 16 / 38
38. Chubby Interface (1/2)
Chubby exports a unix-like filesystem interface.
Tree of files with names separated by slashes (namesapace):
/ls/cell1/aut/cc14:
• 1st component (ls): lock service (common to all names)
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 16 / 38
39. Chubby Interface (1/2)
Chubby exports a unix-like filesystem interface.
Tree of files with names separated by slashes (namesapace):
/ls/cell1/aut/cc14:
• 1st component (ls): lock service (common to all names)
• 2nd component (cell1): the chubby cell name
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 16 / 38
40. Chubby Interface (1/2)
Chubby exports a unix-like filesystem interface.
Tree of files with names separated by slashes (namesapace):
/ls/cell1/aut/cc14:
• 1st component (ls): lock service (common to all names)
• 2nd component (cell1): the chubby cell name
• The rest: the name of the directory and the file (name inside the
cell)
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 16 / 38
41. Chubby Interface (2/2)
Chubby maintains a namespace that contains files and directories,
called nodes.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 17 / 38
42. Chubby Interface (2/2)
Chubby maintains a namespace that contains files and directories,
called nodes.
Clients can create nodes and add contents to nodes.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 17 / 38
43. Chubby Interface (2/2)
Chubby maintains a namespace that contains files and directories,
called nodes.
Clients can create nodes and add contents to nodes.
Support most normal operations:
• create, delete, open, write, ...
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 17 / 38
44. Chubby Interface (2/2)
Chubby maintains a namespace that contains files and directories,
called nodes.
Clients can create nodes and add contents to nodes.
Support most normal operations:
• create, delete, open, write, ...
Clients open nodes to obtain handles that are analogous to unix file
descriptors.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 17 / 38
45. Locks (1/2)
Each Chubby file and directory (node) can act as a reader/writer
lock.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 18 / 38
46. Locks (1/2)
Each Chubby file and directory (node) can act as a reader/writer
lock.
A client handle may hold the lock in two modes: writer (exclusive)
mode or reader (shared) mode.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 18 / 38
47. Locks (1/2)
Each Chubby file and directory (node) can act as a reader/writer
lock.
A client handle may hold the lock in two modes: writer (exclusive)
mode or reader (shared) mode.
• Only one client can hold lock in writer mode.
• Many clients can hold lock in reader mode.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 18 / 38
48. Locks (2/2)
Locks are advisory in Chubby.
• Holding a lock is not necessary to access file.
• Holding a lock does not prevent other clients accessing file.
• Conflicts only with other attempts to acquire the same lock.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 19 / 38
49. Chubby Example (1/2)
Electing a primary (leader) process.
Steps in primary election:
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 20 / 38
50. Chubby Example (1/2)
Electing a primary (leader) process.
Steps in primary election:
• Clients open a lock file and attempt to acquire the lock in write
mode.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 20 / 38
51. Chubby Example (1/2)
Electing a primary (leader) process.
Steps in primary election:
• Clients open a lock file and attempt to acquire the lock in write
mode.
• One client succeeds (becomes the primary) and writes its
name/identity into the file.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 20 / 38
52. Chubby Example (1/2)
Electing a primary (leader) process.
Steps in primary election:
• Clients open a lock file and attempt to acquire the lock in write
mode.
• One client succeeds (becomes the primary) and writes its
name/identity into the file.
• Other clients fail (become replicas) and discover the name of the
primary by reading the file.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 20 / 38
53. Chubby Example (1/2)
Electing a primary (leader) process.
Steps in primary election:
• Clients open a lock file and attempt to acquire the lock in write
mode.
• One client succeeds (becomes the primary) and writes its
name/identity into the file.
• Other clients fail (become replicas) and discover the name of the
primary by reading the file.
• Primary obtains sequencer and passes it to servers (servers can
insure that the elected primary is still valid).
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 20 / 38
54. Chubby Example (2/2)
Leader election
Open("/ls/foo/OurServicePrimary", "write mode")
if (successful) { // primary
SetContents(primary_identity)
} else { // replica
Open("/ls/foo/OurServicePrimary", "read mode",
"file-modification event")
when notified of file modification:
primary = GetContentsAndStat()
}
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 21 / 38
56. Events
Chubby clients may subscribe to many events when they create a
handle.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 23 / 38
57. Events
Chubby clients may subscribe to many events when they create a
handle.
Events include:
• File contents modified
• Child node added, removed, or modified
• Chubby master failed over
• Handle has become invalid
• Lock acquired
• Conflicting lock requested from another client
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 23 / 38
58. Locking - Problem
After acquiring a lock and issuing a request (R1) a client may fail.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 24 / 38
59. Locking - Problem
After acquiring a lock and issuing a request (R1) a client may fail.
Another client may acquire the lock and issue its own request (R2).
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 24 / 38
60. Locking - Problem
After acquiring a lock and issuing a request (R1) a client may fail.
Another client may acquire the lock and issue its own request (R2).
R1 arrive later at the server and be acted upon (possible inconsis-
tency).
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 24 / 38
61. Locking - Solutions (1/3)
Solution 1: virtual time
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 25 / 38
62. Locking - Solutions (1/3)
Solution 1: virtual time
• But, it is costly to introduce sequence numbers into all the
interactions.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 25 / 38
63. Locking - Solutions (2/3)
Solution 2: sequencers
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 26 / 38
64. Locking - Solutions (2/3)
Solution 2: sequencers
• Sequence numbers are introduced into only those interactions that
make use of locks.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 26 / 38
65. Locking - Solutions (2/3)
Solution 2: sequencers
• Sequence numbers are introduced into only those interactions that
make use of locks.
• A lock holder can obtain a sequencer from Chubby.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 26 / 38
66. Locking - Solutions (2/3)
Solution 2: sequencers
• Sequence numbers are introduced into only those interactions that
make use of locks.
• A lock holder can obtain a sequencer from Chubby.
• It attaches the sequencer to any requests that it sends to other
servers (e.g., Bigtable)
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 26 / 38
67. Locking - Solutions (2/3)
Solution 2: sequencers
• Sequence numbers are introduced into only those interactions that
make use of locks.
• A lock holder can obtain a sequencer from Chubby.
• It attaches the sequencer to any requests that it sends to other
servers (e.g., Bigtable)
• The other servers can verify the sequencer information
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 26 / 38
68. Locking - Solutions (3/3)
Solution 3: lock-delay
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 27 / 38
69. Locking - Solutions (3/3)
Solution 3: lock-delay
• Correctly released locks are immediately available.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 27 / 38
70. Locking - Solutions (3/3)
Solution 3: lock-delay
• Correctly released locks are immediately available.
• If a lock becomes free because holder failed or becomes inaccessible,
lock cannot be claimed by another client until lock-delay expires.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 27 / 38
71. Caching
To reduce read traffic, Chubby clients cache file data and node
meta-data.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 28 / 38
72. Caching
To reduce read traffic, Chubby clients cache file data and node
meta-data.
Write-through: write is done synchronously both to the cache and
to the primary copy.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 28 / 38
73. Caching
To reduce read traffic, Chubby clients cache file data and node
meta-data.
Write-through: write is done synchronously both to the cache and
to the primary copy.
Master will invalidate cached copies upon a write request.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 28 / 38
74. Caching
To reduce read traffic, Chubby clients cache file data and node
meta-data.
Write-through: write is done synchronously both to the cache and
to the primary copy.
Master will invalidate cached copies upon a write request.
The client also can allow its cache lease to expire.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 28 / 38
75. Session Lease (1/3)
Lease: a promise by one party to abide by an agreement for a given
interval of time unless the promise is explicitly revoked.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 29 / 38
76. Session Lease (1/3)
Lease: a promise by one party to abide by an agreement for a given
interval of time unless the promise is explicitly revoked.
Session: a relationship between a Chubby cell and a Chubby client.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 29 / 38
77. Session Lease (1/3)
Lease: a promise by one party to abide by an agreement for a given
interval of time unless the promise is explicitly revoked.
Session: a relationship between a Chubby cell and a Chubby client.
Session lease: interval during which client’s cached items (handles,
locks, files) are valid.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 29 / 38
78. Session Lease (1/3)
Lease: a promise by one party to abide by an agreement for a given
interval of time unless the promise is explicitly revoked.
Session: a relationship between a Chubby cell and a Chubby client.
Session lease: interval during which client’s cached items (handles,
locks, files) are valid.
Session maintained through KeepAlives messages.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 29 / 38
79. Session Lease (2/3)
If client’s local lease expires (happens when a master fails over)
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 30 / 38
80. Session Lease (2/3)
If client’s local lease expires (happens when a master fails over)
• Client disables cache.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 30 / 38
81. Session Lease (2/3)
If client’s local lease expires (happens when a master fails over)
• Client disables cache.
• Session in jeopardy, client waits in grace period (45s).
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 30 / 38
82. Session Lease (2/3)
If client’s local lease expires (happens when a master fails over)
• Client disables cache.
• Session in jeopardy, client waits in grace period (45s).
• Sends jeopardy event to application.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 30 / 38
83. Session Lease (2/3)
If client’s local lease expires (happens when a master fails over)
• Client disables cache.
• Session in jeopardy, client waits in grace period (45s).
• Sends jeopardy event to application.
• Application can suspend activity.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 30 / 38
84. Session Lease (3/3)
Client attempts to exchange KeepAlive messages with master during
grace period.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 31 / 38
85. Session Lease (3/3)
Client attempts to exchange KeepAlive messages with master during
grace period.
• Succeed: re-enables cache; send safe event to application
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 31 / 38
86. Session Lease (3/3)
Client attempts to exchange KeepAlive messages with master during
grace period.
• Succeed: re-enables cache; send safe event to application
• Failed: discards cache; sends expired event to application
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 31 / 38
87. Scaling
Reducing communication with the master.
Two techniques:
• Proxies
• Partitioning
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 32 / 38
88. Scaling - Proxies
Proxy is a trusted process that pass requests from other clients to
a Chubby cell.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 33 / 38
89. Scaling - Proxies
Proxy is a trusted process that pass requests from other clients to
a Chubby cell.
Can handle KeepAlives and reads → reduce the master load.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 33 / 38
90. Scaling - Proxies
Proxy is a trusted process that pass requests from other clients to
a Chubby cell.
Can handle KeepAlives and reads → reduce the master load.
Cannot reduce write loads, but they are << 1% of workload.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 33 / 38
91. Scaling - Proxies
Proxy is a trusted process that pass requests from other clients to
a Chubby cell.
Can handle KeepAlives and reads → reduce the master load.
Cannot reduce write loads, but they are << 1% of workload.
Introduces another point of failure.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 33 / 38
93. Scaling - Partitioning
Namespace partitioned between servers.
N partitions, each with master and replicas
Node D/C stored on P(D/C) = hash(D) mod N
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 34 / 38
95. Summary
Library vs. Service
Chubby: coarse-grained lock service
Chubby cell and clients
Unix-like interface
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 36 / 38
96. References:
M. Burrows, The Chubby lock service for loosely-coupled distributed
systems, OSDI, 2006.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 37 / 38