14. ◦ An “island”:
◦ A “barrier”:
◦ Partition identifier (PID) = (pd, i, pnodes)
[Figure: two broker-chain diagrams (brokers A to F, publisher P, subscriber S) with source and destination marked]
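15. A subscription is accepted when it has been added to the routing tables.
Acceptance requires acknowledgments from the whole outgoing set.
[Figure: subscription s propagates across brokers A to E between subscriber S and publisher P; confirmations (conf) travel back and each broker marks s as accepted (☑)]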
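The acceptance rule above can be sketched as a small bookkeeping class. This is a minimal illustration rather than the system's implementation: `PendingSubscription`, `waiting`, and `on_ack` are hypothetical names, and the outgoing set is assumed to be the set of brokers the subscription was forwarded to.

```python
class PendingSubscription:
    """Tracks acknowledgments for one subscription at one broker (hypothetical model)."""

    def __init__(self, sub_id, outgoing):
        self.sub_id = sub_id
        # Brokers in the outgoing set that have not acknowledged yet.
        self.waiting = set(outgoing)
        # Assumption: a broker with an empty outgoing set accepts immediately.
        self.accepted = not self.waiting

    def on_ack(self, broker):
        """Record an acknowledgment; return True once the whole set has answered."""
        self.waiting.discard(broker)
        if not self.waiting:
            self.accepted = True
        return self.accepted


pending = PendingSubscription("s", outgoing={"C", "D"})
first = pending.on_ack("C")   # still waiting for D
second = pending.on_ack("D")  # whole outgoing set has acknowledged
```

Only after the second acknowledgment does the sketch mark `s` as accepted, matching the requirement that the whole outgoing set must answer.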
21. Broker B’s failure detector (FD) detects the partition, and B connects to the first live broker along the path.
B removes the identified nodes from its Outs list and sends a confirmation, with the partition’s PID included, to the upper brokers.
The subscription is accepted when ACK messages have been received from all brokers in the Outs list.
[Figure: brokers A to E between S and P, with C, D, and B highlighted; arrows show subscriptions and confirmations]
25. Confirmations marked conf* in the figure are tagged with the PID.
The PID tag is also stored along with s.
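The partition handling described above can be sketched as follows. This is a hedged illustration under stated assumptions: `SubscriptionState`, its fields, and the dictionary used as a confirmation message are hypothetical, and the PID is treated as an opaque value rather than the (pd, i, pnodes) tuple.

```python
from dataclasses import dataclass, field


@dataclass
class SubscriptionState:
    sub_id: str
    outs: set                                      # brokers whose ACK is still required
    pid_tags: list = field(default_factory=list)   # PIDs stored along with the subscription

    def on_partition(self, pid, partitioned_nodes):
        """Stop waiting for unreachable brokers and remember the partition's PID."""
        self.outs = self.outs - set(partitioned_nodes)
        self.pid_tags.append(pid)

    def on_ack(self, broker):
        self.outs.discard(broker)

    def is_accepted(self):
        # Accepted once every broker left in the Outs list has acknowledged.
        return not self.outs

    def confirmation(self):
        """Confirmation for upper brokers, tagged with any stored PIDs."""
        if not self.is_accepted():
            return None
        return {"type": "conf", "sub": self.sub_id, "pids": list(self.pid_tags)}


state = SubscriptionState("s", {"C", "D", "E"})
state.on_partition("pid-1", ["C", "D"])   # C and D are identified as partitioned
before_ack = state.confirmation()         # E has not acknowledged yet
state.on_ack("E")
after_ack = state.confirmation()
```

In this sketch the confirmation is withheld until the remaining broker in the Outs list acknowledges, and it then carries the PID so that upper brokers can see which partition was skipped.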
26. Forwarding comprises five steps:
◦ Queuing
◦ Barrier checking
◦ Matching
◦ Routing
◦ Cleanup
27. Forwarding only uses brokers that have accepted the subscription.
Steps in forwarding a publication p:
◦ Identify the brokers of accepted subscriptions that match p
◦ Determine the active connections towards those brokers
◦ Send p on those active connections and wait for confirmations
◦ If there are local matching subscribers, deliver p to them
◦ If no downstream matching subscriber exists, issue a confirmation towards P
◦ Once the confirmations arrive, discard p and send a conf towards P
[Figure: publication p is forwarded from P across brokers that have accepted the subscription (☑), delivered to local subscribers, and confirmed hop by hop (conf) back towards P]
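The forwarding steps above can be sketched for a single broker as one planning function. This is a minimal sketch under assumptions, not the system's code: `forward`, the predicate-based subscriptions, and the `active_links` mapping (destination broker to the neighbor reached over an active connection) are all hypothetical.

```python
def forward(publication, accepted_subs, active_links, local_subscribers):
    """Plan one forwarding step for `publication` at a single broker.

    accepted_subs: list of (destination_broker, predicate) for accepted subscriptions.
    active_links: dict mapping a destination broker to the neighbor reached over
                  an active connection towards it (hypothetical representation).
    local_subscribers: dict mapping a local subscriber name to its predicate.
    """
    # Identify brokers of accepted subscriptions that match p.
    matching = {broker for broker, matches in accepted_subs if matches(publication)}
    # Determine the active connections towards those brokers.
    send_on = {active_links[b] for b in matching if b in active_links}
    # Local matching subscribers receive p directly.
    deliver_to = sorted(n for n, matches in local_subscribers.items() if matches(publication))
    # With no downstream match the broker can confirm towards P right away;
    # otherwise it waits for confirmations on every link in send_on.
    confirm_now = not send_on
    return send_on, deliver_to, confirm_now


p = {"topic": "stock", "price": 10}
accepted = [("E", lambda pub: pub["price"] > 5), ("A", lambda pub: pub["price"] > 50)]
links = {"E": "C", "A": "B"}
local = {"alice": lambda pub: pub["topic"] == "stock"}
send_on, deliver_to, confirm_now = forward(p, accepted, links, local)
```

Here only the subscription hosted at E matches, so p is sent on the link towards C, delivered locally to `alice`, and the confirmation towards P is deferred until the downstream confirmation arrives.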
28. Key forwarding invariant to ensure reliability:
no stream of publications is delivered to a subscriber after being forwarded by brokers that have not accepted its subscription.
[Figure: publications and subscriptions flowing across brokers A to E between P and S]
33. Depending on when the link over which C receives p was established, either recovery or subscription propagation ensures that C accepts s before receiving p.
34. Recovery is initiated upon activation of a new session.
It has five steps:
◦ Notify about the active session
◦ Reply by sending a summary of subscriptions
◦ The summary is compared to the local list, and missing subscriptions are transferred
◦ Subscriptions are accepted by R and sent to its downstream network
◦ Partition information is updated within distance 2δ
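The summary comparison in the steps above amounts to a set difference. The sketch below assumes, purely for illustration, that a summary is the set of subscription IDs and that subscriptions are stored in a dictionary keyed by ID; `missing_subscriptions` and `recover` are hypothetical names.

```python
def missing_subscriptions(peer_summary, local_subs):
    """IDs present in the peer's summary but absent locally, in sorted order."""
    return sorted(set(peer_summary) - set(local_subs))


def recover(peer_subs, local_subs):
    """Transfer every subscription named in the peer's summary that is missing locally."""
    summary = peer_subs.keys()  # assumption: the summary is just the set of IDs
    for sub_id in missing_subscriptions(summary, local_subs):
        local_subs[sub_id] = peer_subs[sub_id]
    return local_subs


local = {"s1": "price > 5"}
peer = {"s1": "price > 5", "s2": "topic == 'stock'"}
to_transfer = missing_subscriptions(peer.keys(), local)
recovered = recover(peer, local)
```

Sending only a summary first keeps the exchange small when the two brokers already agree on most subscriptions; full subscriptions are transferred only for the IDs that differ.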
41. Required for a crashed broker that has been restarted.
The restarted node should be able to obtain its neighborhood by:
◦ Restoring its (δ+1)-neighborhood from stable storage, or
◦ Querying a network management service that is aware of neighborhood information
Further steps:
◦ Activating links with neighbors
◦ Initiating partial recovery
42. Size of brokers’ neighborhoods as a function of δ
• Network size of 1000
• Broker fanout of 3
[Plot: one curve each for δ = 1, 2, 3, 4]
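How neighborhood size grows with δ can be explored with a short breadth-first search. The sketch below is not the setup behind the plot: it assumes a complete 3-ary tree of 1000 brokers and takes a broker's neighborhood to be all other brokers within δ+1 hops; `build_tree` and `neighborhood_size` are hypothetical helpers.

```python
from collections import deque


def build_tree(n, fanout):
    """Adjacency lists of a complete `fanout`-ary tree with nodes 0..n-1."""
    adj = {i: [] for i in range(n)}
    for child in range(1, n):
        parent = (child - 1) // fanout
        adj[parent].append(child)
        adj[child].append(parent)
    return adj


def neighborhood_size(adj, start, delta):
    """Number of other brokers within delta + 1 hops of `start`."""
    limit = delta + 1
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == limit:
            continue  # do not expand beyond the neighborhood radius
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return len(dist) - 1  # exclude the broker itself


tree = build_tree(1000, 3)
root_sizes = {d: neighborhood_size(tree, 0, d) for d in (1, 2, 3, 4)}
```

For the root of this assumed tree the counts are 12, 39, 120, and 363 for δ = 1 to 4, which illustrates the roughly geometric growth; interior brokers see larger neighborhoods because they also reach upwards.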
43. Impact of failures on end-to-end broker reachability
– Overlay setup:
• Network size: 1000 brokers with fanout = 3
– Failure injection:
• Failures: up to 100 brokers
• We randomly marked a given number of nodes as failed
– Measurements:
• We counted the end-to-end brokers whose intermediate primary tree path contains a chain of δ consecutive failed brokers.
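The measurement above hinges on one test per path: does it contain δ consecutive failed brokers? A minimal sketch of that check follows; `has_failed_chain` is a hypothetical helper, and the path is assumed to list only the intermediate brokers in order.

```python
def has_failed_chain(path, failed, delta):
    """True if `path` contains at least `delta` consecutive failed brokers."""
    run = 0
    for broker in path:
        # Extend the current run of failures, or reset it at a live broker.
        run = run + 1 if broker in failed else 0
        if run >= delta:
            return True
    return False


path = ["B", "C", "D", "E"]
failed = {"C", "D"}
blocked_for_2 = has_failed_chain(path, failed, 2)  # C and D form a chain of two
blocked_for_3 = has_failed_chain(path, failed, 3)  # no chain of three exists
```

Counting the paths for which this check returns true, over randomly chosen failed sets, would give a reachability figure of the kind described in the measurements.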
45. Impact of failures on publication delivery
500 brokers deployed on 8-core machines in a cluster:
• Network setup: overlay fanout = 3.
• We measured the aggregate publication delivery count over an interval of 120 s.
• The expected bar is the number of publications that must be delivered despite failures (this excludes traffic to/from failed brokers).
47. Snoeren – publications are forwarded redundantly on multiple disjoint paths between subscribers and publishers
XNET – provides a crash/failover scheme similar to this work when δ=1
Gryphon – based on a replication scheme in which routing information is replicated across multiple physical machines
48. Developed a reliable P/S system that tolerates concurrent broker and link failures:
◦ The configuration parameter δ determines the level of resiliency against failures (in the worst case).
◦ Dissemination trees are augmented with neighborhood knowledge.
◦ Neighborhood knowledge allows brokers to maintain network connectivity and make forwarding decisions despite failures.