• Focus on:
◦ fault tolerance
◦ reliability
• Based on:
◦ tree overlay
◦ neighborhood knowledge
◦ δ, a configuration parameter
[Figure: nested 1-, 2-, and 3-neighborhoods of a broker]
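As a rough illustration of neighborhood knowledge, the sketch below computes the set of brokers within δ hops of a given broker in a tree overlay. The adjacency-map representation, the function name `neighborhood`, and the example chain are assumptions made for illustration; the slides do not prescribe any of them.

```python
from collections import deque

def neighborhood(adj, broker, delta):
    """Return the brokers within `delta` hops of `broker` (excluding itself).

    `adj` maps each broker to the list of its direct overlay neighbors.
    """
    hops = {broker: 0}
    queue = deque([broker])
    while queue:
        node = queue.popleft()
        if hops[node] == delta:
            continue  # do not expand beyond delta hops
        for nxt in adj[node]:
            if nxt not in hops:
                hops[nxt] = hops[node] + 1
                queue.append(nxt)
    return {n for n in hops if n != broker}

# Hypothetical chain overlay A - B - C - D - E
chain = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "E"], "E": ["D"]}
print(sorted(neighborhood(chain, "C", 1)))  # ['B', 'D']
print(sorted(neighborhood(chain, "C", 2)))  # ['A', 'B', 'D', 'E']
```

A larger δ gives each broker more alternatives for bypassing failed neighbors, at the cost of more state per broker.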
◦ An “island”:
◦ A “barrier”:
◦ Partition identifier (PID) = (pd, i, pnodes)
[Figure: brokers A to F with S and P; source and destination are marked for each example]
• A subscription is accepted when it has been added to the routing tables.
• Acceptance requires acknowledgments from the whole outgoing set.
[Figure: subscription s propagating along the broker chain A to E between P and S, with confirmations (conf) returning and each broker marking s as accepted (☑)]
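The acceptance rule can be sketched as a small bookkeeping class: a broker records the outgoing set for a subscription and treats the subscription as accepted once every member has confirmed. The class and method names (`PendingSubscription`, `on_conf`) are hypothetical, and treating an empty outgoing set as immediately accepted is an assumption.

```python
class PendingSubscription:
    """Tracks confirmations for one subscription at one broker (illustrative only)."""

    def __init__(self, sub_id, outgoing):
        self.sub_id = sub_id
        self.waiting = set(outgoing)      # brokers that still owe a confirmation
        # Assumption: with an empty outgoing set there is nothing to wait for.
        self.accepted = not self.waiting

    def on_conf(self, broker):
        """Record a confirmation; return True once the subscription is accepted."""
        self.waiting.discard(broker)
        if not self.waiting:
            self.accepted = True
        return self.accepted

pending = PendingSubscription("s", outgoing={"B", "C"})
print(pending.on_conf("B"))  # False: C has not confirmed yet
print(pending.on_conf("C"))  # True: the whole outgoing set has confirmed
```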
• Broker B's failure detector (FD) detects the partition, and B connects to the first live broker along the path.
• B removes the identified nodes from its Outs list and sends a confirmation, tagged with the PID of the partition, to the upstream brokers.
• A subscription is accepted when ACK messages have been received from all brokers in the Outs list.
[Figure: broker chain A to E between P and S with brokers B, C, and D highlighted; s propagates around the partition, confirmations are tagged with the PID (conf*), and the PID tag is also stored along with s (☑*)]
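A minimal sketch of this partition-aware bookkeeping follows. It assumes a per-subscription state object whose names (`SubscriptionState`, `on_partition`, `on_ack`, `confirmation`) are invented for illustration. The PID is treated as an opaque tuple, because the slides give only its shape (pd, i, pnodes) and not the meaning of its fields.

```python
class SubscriptionState:
    """Per-subscription state at a broker; all names are illustrative assumptions."""

    def __init__(self, sub_id, outs):
        self.sub_id = sub_id
        self.outs = set(outs)   # brokers whose ACK is still required
        self.pids = set()       # partition identifiers stored along with s

    def on_partition(self, pid, unreachable):
        """The failure detector reported a partition: stop waiting on those nodes."""
        self.outs -= set(unreachable)
        self.pids.add(pid)

    def on_ack(self, broker, pids=()):
        """Record an ACK and keep any PID tags the confirmation carried."""
        self.outs.discard(broker)
        self.pids.update(pids)

    def confirmation(self):
        """Return (sub_id, PID tags) once accepted, otherwise None."""
        if self.outs:
            return None
        return (self.sub_id, frozenset(self.pids))

pid = ("pd", 1, ("C", "D"))          # opaque placeholder values
state = SubscriptionState("s", outs={"C", "E"})
state.on_partition(pid, unreachable={"C"})
print(state.confirmation())          # None: still waiting for E
state.on_ack("E")
print(state.confirmation())          # accepted, with the PID tag attached
```

Keeping the PID next to the subscription lets a broker tell later which acceptances were made without hearing from the partitioned nodes.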
• Forwarding comprises five steps:
◦ Queuing
◦ Barrier checking
◦ Matching
◦ Routing
◦ Cleanup
• Forwarding only uses subscriptions that brokers have accepted.
• Steps in forwarding a publication p:
◦ Identify the brokers of accepted subscriptions that match p
◦ Determine the active connections towards the matching subscriptions’ brokers
◦ Send p on those active connections and wait for confirmations
◦ If there are local matching subscribers, deliver to them
◦ If no downstream matching subscriber exists, issue a confirmation towards P
◦ Once confirmations arrive, discard p and send a conf towards P
[Figure: publication p forwarded along the broker chain A to E between P and S, where every broker has accepted the subscription (☑); p is delivered to local subscribers and conf messages flow back towards P]
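The forwarding steps can be sketched as a single planning function at one broker. The function name `forward`, the predicate-based matching, and the `next_hop` map are assumptions for illustration; the slides do not specify how matching or routing tables are represented.

```python
def forward(publication, accepted_subs, next_hop, local_subscribers):
    """Plan the forwarding of one publication at a broker (illustrative sketch).

    accepted_subs:     list of (predicate, broker) pairs for accepted subscriptions
    next_hop:          maps a subscription's broker to the active connection toward it
    local_subscribers: list of (predicate, deliver) pairs for local clients
    Returns the set of connections on which a confirmation is now awaited.
    """
    # Brokers of accepted subscriptions that match the publication
    matching = {broker for pred, broker in accepted_subs if pred(publication)}
    # Active connections toward those brokers
    links = {next_hop[b] for b in matching if b in next_hop}
    # Deliver to local matching subscribers
    for pred, deliver in local_subscribers:
        if pred(publication):
            deliver(publication)
    return links

delivered = []
subs = [(lambda m: m["topic"] == "stocks", "E"),
        (lambda m: m["topic"] == "news", "C")]
awaiting = forward({"topic": "stocks"}, subs,
                   {"E": "link-to-D", "C": "link-to-B"},
                   [(lambda m: True, delivered.append)])
print(awaiting)  # {'link-to-D'}
```

If the returned set is empty, no downstream subscriber matches and the broker can confirm towards P immediately; otherwise it keeps the publication until every awaited confirmation has arrived.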
• Key forwarding invariant to ensure reliability: no stream of publications is delivered to a subscriber after being forwarded by brokers that have not accepted its subscription.
[Figure: publication p forwarded on the broker chain A to E between P and S, with brokers B, C, and D highlighted; one acceptance mark is PID-tagged (☑*) and conf messages return]
Figure annotation: depending on when this link has been established, either recovery or subscription propagation ensures that C accepts s prior to receiving p.
• Initiated upon activation of a new session.
• Has five steps:
◦ Notify about the active session
◦ Reply by sending a summary of subscriptions
◦ The summary is compared to the local list; missing subscriptions are transferred too
◦ Subscriptions are accepted by R and sent to its downstream network
◦ Partition information is updated within distance 2δ
[Figure: new session between X and R on the broker chain A to E; subscriptions si are sent, csi messages (labeled “Ack messages”) return, and an acceptance mark ☑* appears]
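The summary comparison in the third step amounts to a set difference. The sketch below takes one plausible reading, in which a broker receives a peer's summary of subscription identifiers and works out which of its own subscriptions the peer lacks. The function name `missing_subscriptions` and the dictionary layout are assumptions.

```python
def missing_subscriptions(summary_ids, local_subs):
    """Return the local subscriptions whose ids are absent from the peer's summary."""
    known = set(summary_ids)
    return {sid: sub for sid, sub in local_subs.items() if sid not in known}

# Hypothetical local table and a summary received over the new session
local = {"s1": "topic = stocks", "s2": "topic = news", "s3": "topic = sports"}
summary_from_peer = ["s1", "s3"]
print(missing_subscriptions(summary_from_peer, local))  # {'s2': 'topic = news'}
```

Sending identifiers first and full subscriptions only for the difference keeps the exchange small when the two sides already agree on most entries.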
• Required for a crashed broker that has been restarted
• The restarted node should be able to:
◦ Restore its (δ+1)-neighborhood from stable storage
◦ Query a network management service that is aware of neighborhood information
• Further steps:
◦ Activating links with neighbors
◦ Partial recovery initiation
Size of brokers’ neighborhoods as a function of ∆
[Figure: one curve each for ∆=1, ∆=2, ∆=3, and ∆=4]
• Network size of 1000
• Broker fanout of 3
Impact of failures on end-to-end broker reachability
[Figure: one curve each for ∆=1, ∆=2, ∆=3, and ∆=4]
– Overlay setup:
• Network size: 1000 brokers with fanout=3
– Failure injection:
• Failures: up to 100 brokers
• We randomly marked a given number of nodes as failed
– Measurements:
• We counted the end-to-end brokers whose intermediate primary tree path contains ∆ consecutive failed brokers in a chain.
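A measurement of this kind could be reproduced roughly as follows. This is a sketch under several assumptions that the slides do not state: the overlay is a complete tree in which each broker has up to `fanout` children, "end-to-end brokers" is read as pairs of live brokers, and all function names are invented for illustration.

```python
import random
from itertools import combinations

def tree_path(u, v, fanout):
    """Path between u and v in a complete `fanout`-ary tree rooted at 0."""
    left, right = [u], [v]
    while u != v:
        # With level-order numbering, the larger index is never the common ancestor.
        if u > v:
            u = (u - 1) // fanout
            left.append(u)
        else:
            v = (v - 1) // fanout
            right.append(v)
    return left + right[-2::-1]  # drop the duplicated meeting point

def blocked(path, failed, delta):
    """True if the intermediate nodes contain `delta` consecutive failed brokers."""
    run = 0
    for node in path[1:-1]:
        run = run + 1 if node in failed else 0
        if run >= delta:
            return True
    return False

def count_blocked_pairs(n, fanout, failures, delta, seed=0):
    """Count pairs of live brokers whose tree path is blocked for this delta."""
    rng = random.Random(seed)
    failed = set(rng.sample(range(n), failures))
    alive = [b for b in range(n) if b not in failed]
    return sum(blocked(tree_path(u, v, fanout), failed, delta)
               for u, v in combinations(alive, 2))

# Smaller than the 1000-broker setup so that the example runs quickly
for delta in (1, 2, 3, 4):
    print(delta, count_blocked_pairs(n=200, fanout=3, failures=20, delta=delta))
```

With a fixed seed the same brokers fail for every ∆, so the counts can only stay equal or fall as ∆ grows.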
Impact of failures on publication delivery
500 brokers deployed on 8-core machines in a cluster:
• Network setup: overlay fanout = 3.
• We measured the aggregate publication delivery count in an interval of 120 s.
• The “Expected” bar is the number of publications that must be delivered despite failures (this excludes traffic to/from failed brokers).
• Snoeren – publications are forwarded redundantly on multiple disjoint paths between subscribers and publishers
• XNET – provides a crash/failover scheme similar to this work's when δ=1
• Gryphon – based on a replication scheme in which routing information is replicated across multiple physical machines
• Developed a reliable P/S system that tolerates concurrent broker and link failures:
◦ Configuration parameter δ determines the level of resiliency against failures (in the worst case).
◦ Dissemination trees are augmented with neighborhood knowledge.
◦ Neighborhood knowledge allows brokers to maintain network connectivity and make forwarding decisions despite failures.
Partition-Tolerant Distributed Publish/Subscribe System
