There are several patterns that are very useful when designing and building distributed applications, like leader election or “atomic broadcast”. In practice, implementing these concepts is a very difficult undertaking and it is often offloaded to external services. These services in turn need to be maintained and operated and might even expose very complex APIs or semantics.
“Exclusive Producer” is a new Pulsar feature that guarantees an application exclusive access when writing to a topic and discards any data from producers that have lost that access.
During this session we will show how easy it is to use Exclusive Producer to build robust communication mechanisms across different services or across instances of a single service.
These mechanisms, such as leader election, failover, and “total order and atomic broadcast”, are the main building blocks of consistent, correct, and reliable distributed systems.
We will also present how the exclusive producer is implemented internally, and how it can guarantee its properties in the presence of all different sorts of failures.
Exclusive Producer: Using Pulsar to Build Distributed Applications - Pulsar Summit NA 2021
1. Pulsar Virtual Summit North America 2021
Exclusive Producer: Using Pulsar to Build Distributed Applications
2. Pulsar Virtual Summit North America 2021
Matteo Merli
CTO @ StreamNative
Co-Creator and PMC Chair for Apache Pulsar
PMC Member Apache BookKeeper
Prev: Splunk, Streamlio, Yahoo
3. Pulsar Virtual Summit North America 2021
Agenda
I. Common patterns in distributed applications
II. It’s a tricky business
III. Fencing the resources
IV. How to use Pulsar to solve the problem
V. How does it work internally?
5. Pulsar Virtual Summit North America 2021
Distributed Locks
● Acquire exclusive access to a shared resource
● Release the lock automatically in case of failures
Example use cases:
● Protect against conflicting modifications
● Ownership/Assignment
● Service Discovery
6. Pulsar Virtual Summit North America 2021
Leader Election
● Reduce a distributed system problem into a local one
● Elect a leader among a set of peers
● The leader can perform tasks and make decisions on its own
● It uses some mechanism to communicate those decisions to the followers
Example use cases:
● Traffic load manager
7. Pulsar Virtual Summit North America 2021
Available options
Mature coordination services - Different semantics - Similar capabilities
8. Pulsar Virtual Summit North America 2021
Implementing Distributed Locks
● ZooKeeper
a. Client-1 tries to create an “ephemeral” znode
b. If the znode already exists, the lock is already taken
c. If the creation succeeds, Client-1 is the “owner”
d. If Client-1 loses its ZooKeeper session, the ephemeral znode is automatically deleted
● Etcd
○ Creating keys with a TTL
○ The key will automatically expire if the client does not refresh it
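The TTL-based approach can be sketched with a small in-memory model. This is not the etcd or ZooKeeper API: the class name `LeaseLockTable`, its methods, and the use of explicit logical timestamps instead of a real clock are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** In-memory model of a lock table with TTL leases (illustrative only). */
final class LeaseLockTable {
    private static final class Lease {
        final String owner;
        final long expiresAt;
        Lease(String owner, long expiresAt) {
            this.owner = owner;
            this.expiresAt = expiresAt;
        }
    }

    private final Map<String, Lease> leases = new HashMap<>();

    /** Takes the lock at logical time `now` if it is free, expired, or already ours. */
    synchronized boolean tryAcquire(String key, String client, long now, long ttl) {
        Lease current = leases.get(key);
        if (current != null && current.expiresAt > now && !current.owner.equals(client)) {
            return false; // another client holds a live lease
        }
        leases.put(key, new Lease(client, now + ttl));
        return true;
    }

    /** Extends the lease; fails if it has expired or belongs to another client. */
    synchronized boolean refresh(String key, String client, long now, long ttl) {
        Lease current = leases.get(key);
        if (current == null || current.expiresAt <= now || !current.owner.equals(client)) {
            return false;
        }
        leases.put(key, new Lease(client, now + ttl));
        return true;
    }

    /** Returns the owner of a live lease at logical time `now`, if any. */
    synchronized Optional<String> ownerAt(String key, long now) {
        Lease current = leases.get(key);
        return current != null && current.expiresAt > now
                ? Optional.of(current.owner)
                : Optional.empty();
    }
}
```

In this model a client that stops refreshing simply loses the lock once the TTL passes, which plays the same role as the automatic cleanup of an ephemeral znode.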
9. Pulsar Virtual Summit North America 2021
Implementing Leader Election
● ZooKeeper
a. Each client creates a “sequential” “ephemeral” znode
b. ZooKeeper automatically appends a unique, monotonically increasing sequence number to the znode name
c. Each client lists the znodes under a given path and sorts them; the one with the lowest sequence number is chosen as the leader
d. Set a watch to get notified when the leader is gone, then repeat step (c)
● Etcd
a. Similar mechanism as for locks, using keys with a TTL
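Step (c) of the ZooKeeper recipe can be sketched as a pure function over the list of child names. The sketch assumes the clients chose a name prefix ending in "-" (for example "candidate-"), so that the sequence number is everything after the last dash; the class and method names are hypothetical, and no ZooKeeper client is involved.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Picks a leader from sequential child names (illustrative only). */
final class SequentialElection {
    /** Parses the numeric suffix after the last '-', e.g. "candidate-0000000007" gives 7. */
    static long sequenceOf(String childName) {
        return Long.parseLong(childName.substring(childName.lastIndexOf('-') + 1));
    }

    /** The child with the lowest sequence number is the leader; empty if there are no children. */
    static Optional<String> leaderOf(List<String> children) {
        return children.stream()
                .min(Comparator.comparingLong(SequentialElection::sequenceOf));
    }
}
```

When the watch from step (d) fires, a client would fetch the children again and call `leaderOf` on the new list.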
11. Pulsar Virtual Summit North America 2021
Things are not as easy as they seem...
● The systems mentioned above work very well, as advertised
● There is fine print that is often overlooked
Difficult questions:
1. How can we be 100% sure that two clients never both believe they are the leader at the same time?
2. What about interactions with other systems, such as a database or a disk?
12. Pulsar Virtual Summit North America 2021
Concurrent owner problem
● It’s not possible to guarantee that Client 1 will release the resource before its lock expires on the Lock Service
13. Pulsar Virtual Summit North America 2021
Interacting with external systems
● There’s no way to cancel the write from Client-1 when it loses the
15. Pulsar Virtual Summit North America 2021
The need for fencing
● We need to make sure that ownership of the shared resource is also validated in the external systems
● Example:
a. C1 is writing to the DB
b. C2 starts writing to the DB
c. The DB will reject any pending writes from C1
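One common way to get this behavior is for every write to carry a monotonically increasing ownership token that the external system checks. The sketch below assumes such a token exists (for example, a number handed out each time ownership changes); `FencedStore` and its methods are hypothetical names, not the API of any particular database.

```java
/** In-memory stand-in for an external system that validates ownership (illustrative only). */
final class FencedStore {
    private long highestToken = -1; // newest ownership token seen so far
    private String value;

    /** Accepts the write only if its token is not older than the newest token seen. */
    synchronized boolean write(long token, String newValue) {
        if (token < highestToken) {
            return false; // stale owner: the write is rejected
        }
        highestToken = token;
        value = newValue;
        return true;
    }

    synchronized String read() {
        return value;
    }
}
```

With C1 holding token 1 and C2 holding token 2, any write from C1 that arrives after C2's first write is rejected, which is step (c) in the example.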
16. Pulsar Virtual Summit North America 2021
Example of fencing: BookKeeper
● BookKeeper ensures the consistency of the data
● Fencing is done before attempting to read the data
● No more writes are accepted after that
17. Pulsar Virtual Summit North America 2021
Fencing
● Fencing is a very powerful property
● It guarantees that there is a single active “writer”
Concerns:
1. BookKeeper doesn’t do leader election or distributed locks
2. The BookKeeper API is fairly “low-level”
19. Pulsar Virtual Summit North America 2021
Pulsar Exclusive Producer
● Goals
a. Ensure a linear, non-interleaved history of messages
b. Expose building blocks for creating leader election and distributed locks directly to Pulsar users
c. Expose “Fencing” as a property in Pulsar
20. Pulsar Virtual Summit North America 2021
Pulsar Exclusive Producer
To require exclusive access or fail immediately
// create() fails right away if the topic already has a producer
Producer<String> producer = client.newProducer(Schema.STRING)
        .topic("my-topic")
        .accessMode(ProducerAccessMode.Exclusive)
        .create();
21. Pulsar Virtual Summit North America 2021
Pulsar Exclusive Producer
To require exclusive access or wait if there’s already a producer
// create() waits until this producer is granted exclusive access
Producer<String> producer = client.newProducer(Schema.STRING)
        .topic("my-topic")
        .accessMode(ProducerAccessMode.WaitForExclusive)
        .create();
22. Pulsar Virtual Summit North America 2021
Linear Topic History
● Once producers are fenced, no more data from them will be accepted
● The topic will have a linear history
● One segment per producer
23. Pulsar Virtual Summit North America 2021
Implementation
● Added a new concept of “Topic Epoch”
a. Counter
b. Stored in the topic metadata
c. Only used if there are exclusive producers
● The epoch is incremented each time a new exclusive producer becomes active
● A producer trying to use “epoch < currentEpoch” will receive a ProducerFenced exception
● Once a producer is fenced, the producer instance cannot be used again
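The epoch rules above can be modeled in a few lines. This is a simplified sketch and not the broker code: `TopicEpochModel`, its methods, and its nested `ProducerFenced` exception are hypothetical names, and the model tracks only one active producer and an in-memory message list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Simplified in-memory model of the topic epoch (illustrative only). */
final class TopicEpochModel {
    /** The model's own exception type, named after the one mentioned on the slide. */
    static final class ProducerFenced extends RuntimeException {
        ProducerFenced(String message) {
            super(message);
        }
    }

    private long currentEpoch = -1;      // the real system keeps this in the topic metadata
    private boolean hasActiveProducer;   // the model tracks at most one active producer
    private final List<String> messages = new ArrayList<>();

    /**
     * Connects an exclusive producer. A first-time producer passes Optional.empty()
     * and receives a new epoch; a reconnecting producer passes the epoch it already has.
     */
    synchronized long connect(Optional<Long> knownEpoch) {
        if (knownEpoch.isPresent()) {
            if (knownEpoch.get() < currentEpoch) {
                throw new ProducerFenced("epoch " + knownEpoch.get() + " < " + currentEpoch);
            }
            hasActiveProducer = true;
            return currentEpoch;         // same producer reconnecting: epoch unchanged
        }
        if (hasActiveProducer) {
            throw new ProducerFenced("topic already has an exclusive producer");
        }
        currentEpoch++;                  // a new exclusive producer becomes active
        hasActiveProducer = true;
        return currentEpoch;
    }

    synchronized void disconnect() {
        hasActiveProducer = false;
    }

    /** Appends a message, rejecting producers that use an old epoch. */
    synchronized void publish(long epoch, String message) {
        if (epoch < currentEpoch) {
            throw new ProducerFenced("epoch " + epoch + " < " + currentEpoch);
        }
        messages.add(message);
    }

    synchronized List<String> history() {
        return new ArrayList<>(messages);
    }
}
```

In this model a producer that reconnects before anyone else takes over keeps its epoch, while one that reconnects after a takeover is fenced on both `connect` and `publish`.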
24. Pulsar Virtual Summit North America 2021
Topic Epoch
● The epoch only changes if a new producer takes over
● The first time it connects, a producer doesn’t know its epoch
● After that, it will always use the same epoch when reconnecting
25. Pulsar Virtual Summit North America 2021
Topic Epoch
● While P-1 is disconnected, a new producer can take over
● P-1 will be fenced
26. Pulsar Virtual Summit North America 2021
Putting everything together: Fencing & Exclusivity
27. Pulsar Virtual Summit North America 2021
Leader election using Pulsar
● Each peer tries to become the “exclusive producer”
● Whoever succeeds is considered the “leader”
● The leader makes the decisions
● It communicates them by publishing on the topic
● If a message is written, the decision is taken
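The pattern can be sketched without a broker by replacing the topic with an in-memory stand-in. Everything below is hypothetical (`LeaderElectionSketch`, `DecisionTopic`, `Peer`); with a real Pulsar client, peers would presumably create a producer in `Exclusive` or `WaitForExclusive` mode and followers would read the same topic.

```java
import java.util.ArrayList;
import java.util.List;

/** Leader election through an exclusively written log (illustrative only). */
final class LeaderElectionSketch {
    /** Stand-in for a topic with an exclusive producer: only the holder may append. */
    static final class DecisionTopic {
        private final List<String> log = new ArrayList<>();
        private String holder; // peer acting as exclusive producer, or null

        synchronized boolean tryBecomeLeader(String peer) {
            if (holder != null) {
                return false;
            }
            holder = peer;
            return true;
        }

        /** Models the holder disconnecting, so that another peer can take over. */
        synchronized void release(String peer) {
            if (peer.equals(holder)) {
                holder = null;
            }
        }

        /** Returns true only if the decision was written; other peers are rejected. */
        synchronized boolean publish(String peer, String decision) {
            if (!peer.equals(holder)) {
                return false;
            }
            log.add(decision);
            return true;
        }

        synchronized List<String> readFrom(int offset) {
            return new ArrayList<>(log.subList(offset, log.size()));
        }
    }

    /** A peer applies decisions in log order and may also try to lead. */
    static final class Peer {
        final String name;
        final List<String> applied = new ArrayList<>();

        Peer(String name) {
            this.name = name;
        }

        /** A decision counts as taken only if the write to the topic succeeded. */
        boolean decide(DecisionTopic topic, String decision) {
            return topic.publish(name, decision);
        }

        /** Applies every decision this peer has not seen yet. */
        void catchUp(DecisionTopic topic) {
            applied.addAll(topic.readFrom(applied.size()));
        }
    }
}
```

Because every peer derives its state only from the log, a former leader whose write is rejected has made no decision, and all peers converge on the same sequence after catching up.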
28. Pulsar Virtual Summit North America 2021
Distributed locks using Pulsar
● Very similar to the “leader election” model
● Become the “exclusive producer” to hold the lock
● Make all mutations to the resource through messages published on the topic