Citi Tech Talk Disaster Recovery Solutions Deep Dive

Disaster Recovery Solutions Deep Dive
Customer Success Engineering
August 2022
Table of Contents
2
1. Brokers, Zookeeper, Producers & Consumers
A quick primer
2. Disaster Recovery Options - Cluster Linking & Schema Linking
An asynchronous, multi-region solution
3. Stretch Clusters & Multi-Region Cluster
A synchronous and optionally asynchronous solution
4. Summary
Which solution is right for me?
01. Brokers, Zookeepers,
Producers & Consumers 101
Brokers & Zookeeper
Apache Kafka: Scale Out Vs. Failover
5
Broker 1
Topic1
partition1
Broker 2 Broker 3 Broker 4
Topic1
partition1
Topic1
partition1
Topic1
partition2
Topic1
partition2
Topic1
partition2
Topic1
partition3
Topic1
partition4
Topic1
partition3
Topic1
partition3
Topic1
partition4
Topic1
partition4
Apache Zookeeper - Cluster coordination
6
Broker 1
partition
Broker 2
(controller) Broker 3 Broker 4
Zookeeper 2
partition
partition
Zookeeper 1
Zookeeper 3
(leader)
partition
partition
partition
partition
Stores metadata: heartbeats, watches, controller elections, cluster/topic configs, permissions. Writes go to the leader.
Clients
Smart clients, dumb pipes
Producer
8
P
partition 1
partition 2
partition 3
partition 4
A Kafka producer sends data to multiple partitions based on a partitioning strategy (the default is hash(key) % number of partitions).
Data is sent in batches per partition, then batched into requests per broker.
You can configure batch size, linger time, and parallel connections per broker.
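For illustration only (not part of the original deck), a minimal Java producer configuration showing the batching knobs mentioned above; the broker addresses and topic name are hypothetical:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class BatchingProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092,broker2:9092"); // hypothetical
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // Batching controls: up to 64 KB per partition batch, wait up to 10 ms to fill a batch
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, 65536);
        props.put(ProducerConfig.LINGER_MS_CONFIG, 10);
        // Parallel in-flight requests per broker connection
        props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, 5);

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Records with the same key hash to the same partition
            producer.send(new ProducerRecord<>("clicks", "user-42", "page-view"));
        }
    }
}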
Producer
9
P
partition 1
partition 2
partition 3
partition 4
A producer can choose to get
acknowledgement (acks) from 0,
1, or ALL (in-sync) replicas of
the partition
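As a hedged sketch, the acks setting extends the same Properties object used in the producer example above (values are illustrative):

// Fragment: extends the Properties from the producer sketch above
static void configureDurability(java.util.Properties props) {
    // acks=all: the partition leader waits for the full in-sync replica set before acknowledging
    props.put(org.apache.kafka.clients.producer.ProducerConfig.ACKS_CONFIG, "all");
    // Idempotence guards against duplicates introduced by retries (requires acks=all)
    props.put(org.apache.kafka.clients.producer.ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);
}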
Consumer
10
C
A consumer polls data from
partitions it has been assigned
based on a subscription
Consumer
11
C
As the consumer reads the data and
processes the data, it can commit
offsets (where it has read up to) in
different ways (per time interval,
individual records, or “end of current
batch”)
commit
offset
heartbeat
poll records
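For illustration, a minimal Java consumer loop matching the poll / process / commit cycle described above; the group id, topic, and brokers are hypothetical:

import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class CommittingConsumerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092"); // hypothetical
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "clicks-processor");      // consumer group
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");       // commit manually

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("clicks"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
                consumer.commitSync(); // commit at the end of the current batch
            }
        }
    }
}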
Consumers - Consumer Groups
12
C
C
C1
C
C
C2
Different applications can independently read from the same topic partitions at their own pace
Consumers - Consumer group members
13
C C
C C
Within the same application (consumer
group), different partitions can be
assigned to different consumers to
increase parallel consumption as well as
support failover
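In client code this is controlled purely by group.id; a short fragment (group names are hypothetical) extending the consumer sketch above:

// Instances sharing a group.id split the topic's partitions between them (parallelism + failover)
props.put(org.apache.kafka.clients.consumer.ConsumerConfig.GROUP_ID_CONFIG, "clicks-processor");
// A second application would use its own group.id, e.g. "clicks-auditor",
// and independently read the same partitions at its own pace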
Make Kafka
Widely Accessible
to Developers
14
Enable all developers to leverage Kafka throughout the
organization with a wide variety of Confluent clients
Confluent Clients
Battle-tested and high performing
producer and consumer APIs (plus
admin client)
2. Disaster Recovery
Options
Why Disaster Recovery?
Recent Regional Cloud Outages
17
AWS
● Dec 2021: An unexplained AWS outage created business disruptions all day (CNBC)
● Nov 2020: A Kinesis outage brought down over a dozen AWS services for 17 hours in us-east-1 (CRN, AWS)
Azure
● Apr 1 2021: Some critical Azure services were unavailable for an hour (Coralogix)
● Sept 2018: The South Central US region was unavailable for over a day (The Register)
GCP
● Nov 2021: An outage that affected Home Depot, Snap, Spotify, and Etsy (Bloomberg)
Outages hurt business performance
18
Data Center has an outage: A data center or a region may be down for multiple hours, up to a day, based on history.
Mission-critical applications fail: The applications in that data center that run your business go offline.
Customer Impact: Customers are unable to place orders, discover products, receive service, etc.
Financial/Reputational Impact: Revenue is lost directly from the inability to do business during downtime, and indirectly by damaging brand image and customer trust.
Failure Types
19
Transient Failures
Transient failures in data-centers or clusters are common and worth protecting against for business continuity purposes. Regional outages are rare but still worth protecting against for mission critical systems.
Permanent Failures (Data Loss)
Outages are typically transient but occasionally permanent: users accidentally delete topics, human error occurs. If your data is unrecoverable and mission critical, you need an additional complementary solution.
Failure Scenarios
Data-Center / Regional Outages
Data-centers have single points of failure associated with hardware, resulting in associated outages. Regional outages arise from failures in the underlying cloud provider.
Platform Failures
Load is applied unevenly or in short bursts by batch processing systems. Performance limitations arise unexpectedly. Bugs occur in Kafka, Zookeeper and associated systems.
Human Error
People delete topics, clusters and worse. Unexpected behaviour arises from standard operations and within the CI/CD pipeline.
Cluster Linking & Schema
Linking
22
Cluster Linking
Cluster Linking, built into Confluent Platform and Confluent Cloud, allows you to directly connect clusters together, mirroring topics from one cluster to another.
Cluster Linking makes it easier to build multi-cluster, multi-cloud, and hybrid cloud deployments.
Active cluster
Consumers
Producers
clicks
clicks
Topics
DR cluster
clicks
clicks
Mirror Topics
Cluster Link
Primary Region DR Region
23
Schema Linking
Schema Linking, built into Schema Registry, allows you to directly connect Schema Registry clusters together, mirroring subjects or entire contexts.
Contexts, introduced alongside Schema Linking, allow you to create namespaces within Schema Registry, which ensures mirrored subjects don’t run into schema naming clashes.
Active cluster
Consumers
Producers
clicks
clicks
Schemas
DR cluster
clicks
clicks
Mirror Schemas
Schema Link
Primary Region DR Region
Consumers
Producers
24
Prefixing
Prefixing allows you to add a prefix to a topic and, if desired, to the associated consumer group, to avoid topic and consumer group naming clashes between the primary and Disaster Recovery clusters.
This is important in an active-active setup and is required for a two-way Cluster Link strategy, which is the recommended approach.
Active cluster
Consumer-Group
clicks
clicks
Topic
DR cluster
clicks
clicks
DR-topic
Cluster Link
Primary Region DR Region
DR-Consumer-Group
Active-Passive
25
HA/DR Active-Passive
1. Steady state
Setup
● The cluster link can
automatically create mirror
topics for any new topics on
the active cluster
● Historical data is replicated &
incoming data is synced in
real-time
Active cluster
Consumers
Producers
clicks
clicks
topics
DR cluster
clicks
clicks
mirror topics
Cluster Link
Primary Region DR Region
HA/DR Active-Passive
2. Failover
1. Detect a regional outage via
metrics going to zero in that
region; decide to failover
2. Call failover API on mirror
topics to make them writable
3. Update DNS to point at DR
cluster
4. Start clients in DR region
Active cluster
Consumers
Producers
clicks
clicks
topics
DR cluster
clicks
clicks
mirror topics
failover
REST API or CLI
Consumers
Producers
Primary Region DR Region
HA/DR Active-Passive
3. Fail forward
The standard strategy is to “fail forward”, promoting the DR region to be the new Primary Region:
● Cloud regions offer identical service
● All applications & data systems have already been moved to the DR region
● Failing back would introduce risk with little benefit
To fail forward, simply:
1. Delete topics on original
cluster (or spin up new cluster)
2. Establish cluster link in reverse
direction
Active DR cluster
clicks
clicks
mirror topics
DR Active cluster
clicks
clicks
mirror topics
Cluster Link
Consumers
Producers
Primary DR Region DR Primary Region
HA/DR Active-Passive
3. Failback (alternative)
If you can’t fail forward and need
to failback to the original region:
1. Delete topics on Primary
cluster (or spin up a new
cluster)
2. Establish a cluster link in the
reverse direction
3. When Primary has caught up,
migrate producers &
consumers back:
a. Stop clients
b. promote mirror topic(s)
c. Restart clients pointed at
Primary cluster
DR cluster
clicks
clicks
mirror topics
Consumers
Producers
Cluster Link
Primary Region DR Region
Primary cluster
clicks
clicks
mirror topics
Synced
asynchronously
HA/DR - Consumers must tolerate some duplicates
Consumers must tolerate
duplicate messages
because Cluster Linking is
asynchronous.
Primary cluster
Consumer X
A B C D
Topic
Consumer X offset
at time
of outage
DR cluster
A B C D
Mirror Topic
Consumer X offset
at time of failover
... ...
A B C C D ...
Consumes message
C twice
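Because failover can replay records that were consumed but whose offsets had not yet been mirrored, consumer logic should be idempotent. A minimal sketch, assuming producers embed a unique event id in the record key (an assumption, not from the deck) and using an in-memory set purely for illustration:

import java.util.HashSet;
import java.util.Set;
import org.apache.kafka.clients.consumer.ConsumerRecord;

public class DeduplicatingHandler {
    // In production this would be a durable store (e.g. a database keyed by event id)
    private final Set<String> processedEventIds = new HashSet<>();

    public void handle(ConsumerRecord<String, String> record) {
        String eventId = record.key(); // assumed unique per event
        if (!processedEventIds.add(eventId)) {
            return; // already processed, e.g. message "C" re-delivered after failover
        }
        process(record.value());
    }

    private void process(String value) {
        // business logic goes here
    }
}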
Active-Passive
Bi-Directional Cluster Linking
31
DR cluster
“East”
HA/DR Bi-Directional Cluster Linking: Automatic Data Recovery & Failback
1. Steady state
Setup
For a topic named clicks
● We create duplicate topics on
both the Primary and DR
cluster
● Create prefixed cluster links in
both directions
● Produce records to clicks on
the Primary cluster
● Consumers consume from a
Regex pattern
Primary cluster
“West”
clicks
Consumers
.*clicks
Producers
clicks Add prefix
west
clicks
clicks clicks
west.clicks
east.clicks
Add prefix
east
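A hedged sketch of the regex subscription used above, so the same consumer code sees both clicks and the prefixed mirror topics (west.clicks / east.clicks) on whichever cluster it points at:

import java.util.regex.Pattern;
import org.apache.kafka.clients.consumer.KafkaConsumer;

// Fragment: assumes a KafkaConsumer<String, String> built as in the earlier consumer sketch
static void subscribeToAllClicks(KafkaConsumer<String, String> consumer) {
    // Matches "clicks", "west.clicks" and "east.clicks"
    consumer.subscribe(Pattern.compile(".*clicks"));
}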
DR cluster
“East”
HA/DR Bi-Directional Cluster Linking: Automatic Data Recovery & Failback
2. Outage strikes!
An outage in the primary region:
● Stops producers & consumers in the primary region
● Temporarily pauses cluster link mirroring
● A small set of data may not have been replicated yet to the DR cluster – this is your “RPO”
Primary cluster
“West”
clicks
Consumers
.*clicks
Producers
clicks
clicks
clicks clicks
west.clicks
east.clicks
DR cluster
“East”
HA/DR Bi-Directional Cluster Linking: Automatic Data Recovery & Failback
3. Failover
To failover:
● Move consumers and
producers to the DR cluster -
keep the same topic names /
regex
● Consumers consume both
○ Pre-failover data in
west.clicks
○ Post-failover data in clicks
● Don’t delete the cluster link
● Disable clicks -> west.clicks
offset replication
Primary cluster
“West”
clicks
Consumers
.*clicks
Producers
clicks
clicks
clicks clicks
west.clicks
east.clicks
DR cluster
“East”
HA/DR Bi-Directional Cluster Linking: Automatic Data Recovery & Failback
4. Recovery
If/when the outage is over:
● The primary-to-DR cluster link
automatically recovers the
lagged data (RPO) from the
primary cluster
Note: this data will be “late arriving”
to the consumers
● New records generated to the
DR cluster will automatically
begin replicating to the primary
Primary cluster
“West”
clicks
Consumers
.*clicks
Producers
clicks
Recovers
data
Fails back
data clicks
clicks clicks
west.clicks
east.clicks
DR cluster
“East”
HA/DR Bi-Directional Cluster Linking: Automatic Data Recovery & Failback
5. Failback
To failback to the primary region
Consumers need to pick up at the
end of the writable topics, so:
● Ensure that all consumer
groups have 0 consumer lag
for their DR topics e.g.
west.clicks
● Reset all consumer offsets to the last offset (LEO); this can be done by the platform operator
Finally, move consumers & producers
back to Primary
● Each producer / consumer
group can be moved
independently
Primary cluster
“West”
clicks
Consumers
.*clicks
Producers
clicks
Recovers
data
Fails back
data clicks
clicks clicks
west.clicks
east.clicks
Reset consumers to
resume here
move
move
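One way a platform operator could reset a (stopped) consumer group to the log-end offset is with the standard Kafka Admin API; this is a sketch under assumed names (group id, topic, partition count, brokers), not the deck's prescribed tooling:

import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.ListOffsetsResult;
import org.apache.kafka.clients.admin.OffsetSpec;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class ResetGroupToLogEndOffset {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "primary-broker1:9092"); // hypothetical

        try (Admin admin = Admin.create(props)) {
            // Partitions of the writable topic the group should resume from (illustrative: 2 partitions)
            Map<TopicPartition, OffsetSpec> request = new HashMap<>();
            request.put(new TopicPartition("clicks", 0), OffsetSpec.latest());
            request.put(new TopicPartition("clicks", 1), OffsetSpec.latest());

            // Look up the current log-end offsets
            ListOffsetsResult listed = admin.listOffsets(request);
            Map<TopicPartition, OffsetAndMetadata> newOffsets = new HashMap<>();
            for (TopicPartition tp : request.keySet()) {
                long logEndOffset = listed.partitionResult(tp).get().offset();
                newOffsets.put(tp, new OffsetAndMetadata(logEndOffset));
            }

            // Overwrite committed offsets for the consumer group (its consumers must be stopped)
            admin.alterConsumerGroupOffsets("clicks-processor", newOffsets).all().get();
        }
    }
}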
DR cluster
“East”
Primary cluster
“West”
clicks
Recovers
data
Fails back
data clicks
clicks clicks
west.clicks
east.clicks
HA/DR Bi-Directional Cluster Linking: Automatic Data Recovery & Failback
6. And beyond
Re-enable clicks -> west.clicks
consumer offset replication
Once consumer lag is 0 on
east.clicks, then reset all
consumer groups to Log End
Offset (last offset of the partition)
on “clicks” on DR cluster
Consumers
.*clicks
Producers
clicks
Reset consumers to
resume here
Active-Active
Bi-Directional Cluster Linking
38
West cluster
clicks
east.clicks
East cluster
west.clicks
clicks
Consumers
.*clicks
Producers
Add prefix
west
Add prefix
east
Consumers
.*clicks
Producers
Applications /
Web Traffic
Load Balancer
(example)
Applications Applications
39
HA/DR Bi-Directional Cluster Linking: Active-Active
1. Steady state
West cluster
clicks
east.clicks
East cluster
west.clicks
clicks
Consumers
.*clicks
Producers
Add prefix
west
Add prefix
east
Consumers
.*clicks
Producers
Applications /
Web Traffic
Load Balancer
(example)
Applications Applications
40
HA/DR Bi-Directional Cluster Linking: Active-Active
2. Outage strikes!
West cluster
clicks
east.clicks
East cluster
west.clicks
clicks
Consumers
.*clicks
Producers
Add prefix
west
Add prefix
east
Consumers
.*clicks
Producers
Applications /
Web Traffic
Load Balancer
(example)
Applications Applications
re-route
41
HA/DR Bi-Directional Cluster Linking: Active-Active
3. Failover
West cluster
clicks
east.clicks
East cluster
west.clicks
clicks
Consumers
.*clicks
Producers
Add prefix
west
Add prefix
east
Consumers
.*clicks
Producers
Applications /
Web Traffic
Load Balancer
(example)
Applications Applications
Any remaining pre-failure data is
automatically recovered by the
consumers
re-route
42
HA/DR Bi-Directional Cluster Linking: Active-Active
4. Return to Steady State
43
Stretch Cluster
A Stretch Cluster is ONE Kafka cluster that is
“stretched” across multiple availability zones or data
centers.
Uses Kafka internal replication features to achieve
RPO = 0 & low RTO.
3. Stretch Clusters & Multi-Region Cluster
44
Stretch Cluster - Why?
45
Stretch Cluster: Non-Stretch Cluster Behaviour
1. Steady State
Setup
● Any number of brokers, represented here by brokers 1-4, spread across 2 DCs
● A standard three node
Zookeeper cluster spread
across 2 DCs
DC “West”
Broker 1
Broker 2
Zookeeper 1
Zookeeper 2
DC “East”
Broker 3
Broker 4
Zookeeper 3
Stretch Cluster: Non-Stretch Cluster Behaviour
1. Steady State… continued
Setup
● Any number of brokers, represented here by brokers 1-4, spread across 2 DCs
● A standard three node
Zookeeper cluster spread
across 2 DCs
● We’ll also assume a
replication-factor of 3,
min.insync.replicas of 2 and
acks=all
DC “West”
Replica 1
Replica 2
Zookeeper 1
Zookeeper 2
DC “East”
Replica 3
Unused
Broker
Zookeeper 3
Stretch Cluster: Non-Stretch Cluster Behaviour
2. DC Outage
An outage in DC “West”
● … let’s start by just focusing
on Kafka.
DC “West”
Replica 1
Replica 2
Zookeeper 1
Zookeeper 2
DC “East”
Replica 3
Unused
Broker
Zookeeper 3
Stretch Cluster: Non-Stretch Cluster Behaviour
2. DC Outage
An outage in DC “West”
● Min.insync.replicas can no
longer be met and we lose
availability
DC “West”
Replica 1
Replica 2
Zookeeper 1
Zookeeper 2
DC “East”
Replica 3
Unused
Broker
Zookeeper 3
Stretch Cluster: Non-Stretch Cluster Behaviour
3. Fixing Broker Availability
Increase to rf=4
● Looks like we’ve solved our
issue…
DC “West”
Replica 1
Replica 2
Zookeeper 1
Zookeeper 2
DC “East”
Replica 3
Replica 4
Zookeeper 3
Stretch Cluster: Non-Stretch Cluster Behaviour
3. Fixing Broker Availability… But
Increase to rf=4
● Looks like we’ve solved our
issue… but, if our 2 replicas
are down or out of sync then
we lose availability unless we
trigger an unclean leader
election and accept data loss.
DC “West”
Replica 1
Replica 2
Zookeeper 1
Zookeeper 2
DC “East”
Replica 3
Out of Sync
Replica 4
Out of Sync
Zookeeper 3
Stretch Cluster: Non-Stretch Cluster Behaviour
4. Fixing Data Loss
Increase min.insync.replicas to 3
● Consumers continue to
operate
● Producers continue to operate
once we revert to
min.insync.replicas=2
DC “West”
Replica 1
Replica 2
Zookeeper 1
Zookeeper 2
DC “East”
Replica 3
Replica 4
Out of Sync
Zookeeper 3
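Where the walkthrough says "revert to min.insync.replicas=2", one way to apply that with the Kafka Admin API is sketched below (topic name and brokers are hypothetical; kafka-configs on the command line achieves the same thing):

import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class RelaxMinIsr {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "east-broker1:9092"); // hypothetical

        try (Admin admin = Admin.create(props)) {
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "clicks");
            AlterConfigOp lowerMinIsr = new AlterConfigOp(
                    new ConfigEntry("min.insync.replicas", "2"), AlterConfigOp.OpType.SET);
            // Lower the durability requirement so producers using acks=all can continue
            admin.incrementalAlterConfigs(Map.of(topic, List.of(lowerMinIsr))).all().get();
        }
    }
}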
Stretch Cluster: Non-Stretch Cluster Behaviour
4. Fixing Data Loss… But What About Zookeeper?
DC “West”
Replica 1
Replica 2
Zookeeper 1
Zookeeper 2
DC “East”
Replica 3
Replica 4
Out of Sync
Zookeeper 3
Stretch Cluster: Non-Stretch Cluster Behaviour
4. Fixing Data Loss… But What About Zookeeper?
DC “West”
Zookeeper 1
Zookeeper 2
DC “East”
Zookeeper 3
Broker 1
Broker 2
Broker 3
Broker 4
Stretch Cluster: Non-Stretch Cluster Behaviour
4. Fixing Data Loss… But What About Zookeeper?
DC “West”
Broker 1
Broker 2
Zookeeper 1
Zookeeper 2
DC “East”
Broker 3
Broker 4
Zookeeper 3
Stretch Cluster - 2 DC
56
Stretch Cluster: 2 DC + Observer
1. Steady State
Setup
● A minimum of 4 brokers
● 6 Zookeeper nodes, one of
which is an observer
● Replication factor of 4,
min.insync.replicas of 3 and
acks=all
DC “West”
Broker 1
Broker 2
Zookeeper 1
Zookeeper 2
DC “East”
Broker 3
Broker 4
Zookeeper 4
Zookeeper 5
Zookeeper 3
Zookeeper 6
(Observer)
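For reference, a sketch of creating a topic with the replication settings assumed in this walkthrough (replication factor 4, min.insync.replicas=3); the topic name, partition count, and brokers are illustrative:

import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateStretchTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "west-broker1:9092"); // hypothetical

        try (Admin admin = Admin.create(props)) {
            NewTopic clicks = new NewTopic("clicks", 6, (short) 4) // 6 partitions, replication factor 4
                    .configs(Map.of("min.insync.replicas", "3"));
            admin.createTopics(List.of(clicks)).all().get();
        }
    }
}

Producers would then use acks=all so that every acknowledged write is held by at least three replicas, at least one of which lives in the other DC.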
Stretch Cluster: 2 DC + Observer
2. DC Outage - On observer DC
An outage in DC “East”
● Consumers continue to
operate
● Producers continue to
operate once we revert to
min.insync.replicas=2
DC “West”
Broker 1
Broker 2
Zookeeper 1
Zookeeper 2
DC “East”
Broker 3
Broker 4
Zookeeper 4
Zookeeper 5
Zookeeper 3
Zookeeper 6
(Observer)
Stretch Cluster: 2 DC + Observer
3. DC Outage - On non-observer DC
An outage in DC “West”
● We can’t reach Zookeeper
quorum!
DC “West”
Broker 1
Broker 2
Zookeeper 1
Zookeeper 2
DC “East”
Broker 3
Broker 4
Zookeeper 4
Zookeeper 5
Zookeeper 3
Zookeeper 6
(Observer)
Stretch Cluster: 2 DC + Observer
3. DC Outage - On non-observer DC… but
An outage in DC “West”
● We promote the Zookeeper
observer to a full follower
● Remove Zookeeper 1, 2 &
3 from quorum list
● Perform rolling restart of
Zookeeper nodes
DC “West”
Broker 1
Broker 2
Zookeeper 1
Zookeeper 2
DC “East”
Broker 3
Broker 4
Zookeeper 4
Zookeeper 5
Zookeeper 3 Zookeeper 6
Stretch Cluster: 2 DC + Observer
3. DC Outage - On non-observer DC
An outage in DC “West”
● Consumers continue to
operate
● Producers continue to
operate once we revert to
min.insync.replicas=2
DC “West”
Broker 1
Broker 2
Zookeeper 1
Zookeeper 2
DC “East”
Broker 3
Broker 4
Zookeeper 4
Zookeeper 5
Zookeeper 3 Zookeeper 6
Stretch Cluster: 2 DC + Observer
4. Network Partition
A network partition occurs
between DCs
● Consumers continue to
operate as usual up until
they’ve consumed all fully
replicated data
● Producers will fail as we can no longer meet min.insync.replicas=3
DC “West”
Broker 1
Broker 2
Zookeeper 1
Zookeeper 2
DC “East”
Broker 3
Broker 4
Zookeeper 4
Zookeeper 5
Zookeeper 3
Zookeeper 6
(Observer)
Stretch Cluster: 2 DC + Observer
5. Fixing Network Partition
A network partition occurs
between DCs
● We manually shutdown DC
“East” then update
min.insync.replicas=2
● Clients resume operating
as normal
● Consumers failing over
from DC “East” will
consume some duplicate
records
DC “West”
Broker 1
Broker 2
Zookeeper 1
Zookeeper 2
DC “East”
Broker 3
Broker 4
Zookeeper 4
Zookeeper 5
Zookeeper 3
Zookeeper 6
(Observer)
64
Observer Risk!
Zookeeper observers solve our availability and split-brain issues but risk data loss!
DC “West”
Zookeeper
Leader
Zookeeper
Follower
Zookeeper
Follower
DC “East”
Zookeeper
Follower (Out
of Sync)
Zookeeper
Follower (Out
of Sync)
Zookeeper
Observer (Out
of Sync)
Quorum
65
Hierarchical Quorum
Hierarchical Quorum involves getting consensus between multiple Zookeeper “groups”, each of which forms its own quorum. In the case of a two-DC hierarchy, consensus must be reached between BOTH DCs.
DC “West”
Zookeeper 1
(Leader)
Zookeeper 2
Zookeeper 3
DC “East”
Zookeeper 4
Zookeeper 5
Zookeeper 6
Quorum
Stretch Cluster: 2 DC + Hierarchical Quorum
1. Steady State
Setup
● A minimum of 4 brokers
● 6 Zookeeper nodes, arranged
into two groups
● Replication factor of 4,
min.insync.replicas of 3 and
acks=all
DC “West”
Broker 1
Broker 2
Zookeeper 1
Zookeeper 2
DC “East”
Broker 3
Broker 4
Zookeeper 4
Zookeeper 5
Zookeeper 3 Zookeeper 6
Stretch Cluster: 2 DC + Hierarchical Quorum
2. DC Outage
An outage in DC “East”
● Consumers continue to
operate for leaders on DC
“West”
● Leaders can’t be elected
and configuration updates
can’t be made until we
have hierarchical quorum
DC “West”
Broker 1
Broker 2
Zookeeper 1
Zookeeper 2
DC “East”
Broker 3
Broker 4
Zookeeper 4
Zookeeper 5
Zookeeper 3 Zookeeper 6
Stretch Cluster: 2 DC + Hierarchical Quorum
3. DC Outage
An outage in DC “East”
● Remove DC “East”
Zookeeper group from
hierarchy
● Revert to
min.insync.replicas=2
DC “West”
Broker 1
Broker 2
Zookeeper 1
Zookeeper 2
DC “East”
Broker 3
Broker 4
Zookeeper 4
Zookeeper 5
Zookeeper 3 Zookeeper 6
Stretch Cluster: 2 DC + Hierarchical Quorum
4. Network Partition
A network partition occurs
between DCs
● Consumers continue to
operate as usual up until
they’ve consumed all fully
replicated data
● Producers will fail as we can no longer meet min.insync.replicas=3
DC “West”
Broker 1
Broker 2
Zookeeper 1
Zookeeper 2
DC “East”
Broker 3
Broker 4
Zookeeper 4
Zookeeper 5
Zookeeper 3 Zookeeper 6
Stretch Cluster: 2 DC + Hierarchical Quorum
5. Fixing Network Partition
A network partition occurs
between DCs
● We manually shutdown DC
“East”, remove from the
hierarchy & update
min.insync.replicas=2
● Clients resume operating
as normal
● Consumers failing over
from DC “East” will
consume some duplicate
records
DC “West”
Broker 1
Broker 2
Zookeeper 1
Zookeeper 2
DC “East”
Broker 3
Broker 4
Zookeeper 4
Zookeeper 5
Zookeeper 3 Zookeeper 6
Stretch Cluster - 2.5 DC
71
Stretch Cluster: 2.5 DC
1. Steady State
Setup
● A minimum of 4 brokers
● 3 Zookeeper nodes
● Replication factor of 4,
min.insync.replicas of 3 and
acks=all
● Note: It’s actually better for the DCs with brokers to be the closest to each other
DC “West”
Broker 1
Broker 2
Zookeeper 1
DC “East”
Broker 3
Broker 4
Zookeeper 3
DC “Central”
Zookeeper 2
Stretch Cluster: 2.5 DC
2. DC Outage
An outage in DC “West”
● Consumers continue to
operate
● Producers continue to operate
once we revert to
min.insync.replicas=2
DC “West”
Broker 1
Broker 2
Zookeeper 1
DC “East”
Broker 3
Broker 4
Zookeeper 3
DC “Central”
Zookeeper 2
Stretch Cluster: 2.5 DC
3. DC Network Partition
A network partition in DC
“West”
● Consumers connected to DC
“East” continue to operate
DC “West”
Broker 1
Broker 2
Zookeeper 1
DC “East”
Broker 3
Broker 4
Zookeeper 3
DC “Central”
Zookeeper 2
Stretch Cluster: 2.5 DC
3. DC Network Partition
A network partition in DC
“West”
● Consumers connected to DC
“West” continue to operate
until they’ve processed all fully
replicated records
DC “West”
Broker 1
Broker 2
Zookeeper 1
DC “East”
Broker 3
Broker 4
Zookeeper 3
DC “Central”
Zookeeper 2
Stretch Cluster: 2.5 DC
3. DC Network Partition
A network partition in DC
“West”
● Producers connected to DC
“East” continue to operate
once we revert to
min.insync.replicas=2
DC “West”
Broker 1
Broker 2
Zookeeper 1
DC “East”
Broker 3
Broker 4
Zookeeper 3
DC “Central”
Zookeeper 2
Stretch Cluster: 2.5 DC
3. DC Network Partition
A network partition in DC
“West”
● Producers connected to DC
“West” continue to operate
once we shutdown DC “West”,
failover and revert to
min.insync.replicas=2
DC “West”
Broker 1
Broker 2
Zookeeper 1
DC “East”
Broker 3
Broker 4
Zookeeper 3
DC “Central”
Zookeeper 2
Multi-Region Cluster
78
79
Multi-Region Clusters:
Followers vs Observers
Followers are normal replicas; observers act the same except that they are not counted towards acks=all produce requests.
DC “West”
Producers
Follower
Synchronous
Leader
DC “East”
Observer
Asynchronous
80
Multi-Region Clusters:
Automatic Observer
Promotion
As of Confluent Platform v6.1, observers can be configured to be promoted according to an ObserverPromotionPolicy, including:
● Under-min-isr: Promoted if the in-sync replica count drops below min.insync.replicas
● Under-replicated: Promoted to cover any replica which is no longer in sync
● Leader-is-observer: Promoted if the current leader is an observer
DC “West”
Producers
Follower
Synchronous
Leader
DC “East”
Follower
Asynchronous
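A hedged sketch of how these policies are typically expressed in Confluent Server's replica placement JSON, applied as the confluent.placement.constraints topic config (for example with the incrementalAlterConfigs call sketched earlier); the rack names are hypothetical and the exact schema should be checked against the Confluent Platform documentation for your version:

public class ObserverPromotionPlacementSketch {
    // Rack names must match the brokers' broker.rack settings; "west"/"east" are assumptions
    static final String PLACEMENT_JSON = """
        {
          "version": 2,
          "replicas": [
            {"count": 2, "constraints": {"rack": "west"}},
            {"count": 2, "constraints": {"rack": "east"}}
          ],
          "observers": [
            {"count": 1, "constraints": {"rack": "east"}}
          ],
          "observerPromotionPolicy": "under-min-isr"
        }
        """;
}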
Multi-Region Clusters: 2.5 DC
1. Steady State
Setup
● A minimum of 6 brokers
● 3 Zookeeper nodes
● Replication factor of 4, 2
additional observers,
min.insync.replicas of 3 and
acks=all
DC “West”
Broker 1
Broker 2
Zookeeper 1
DC “East”
Broker 4
Broker 5
Zookeeper 3
DC “Central”
Zookeeper 2
Broker 3
(Observer1)
Broker 6
(Observer2)
Multi-Region Clusters: 2.5 DC
2. DC Outage
An outage in DC “West”
● The Observer in DC “East” is
promoted
● Consumers and Producers
continue to operate as usual
● RPO = 0
● RTO ~ 0
DC “West”
Broker 1
Broker 2
Zookeeper 1
DC “East”
Broker 4
(Replica 3)
Broker 5
(Replica 4)
Zookeeper 3
DC “Central”
Zookeeper 2
Broker 3
(Observer1)
Broker 6
(Replica 5)
Multi-Region Clusters: 2.5 DC
3. DC Network Partition
A network partition in DC
“West”
● The Observer in DC “East” is
promoted
● Consumers and Producers
connected to DC “East”
continue to operate as usual
DC “West”
Broker 1
Broker 2
Zookeeper 1
DC “East”
Broker 4
(Replica 3)
Broker 5
(Replica 4)
Zookeeper 3
DC “Central”
Zookeeper 2
Broker 3
(Observer1)
Broker 6
(Replica 5)
Multi-Region Clusters: 2.5 DC
3. DC Network Partition
A network partition in DC
“West”
● The Observer in DC “West”
cannot be promoted as it has
no Zookeeper Quorum
DC “West”
Broker 1
Broker 2
Zookeeper 1
DC “East”
Broker 4
(Replica 3)
Broker 5
(Replica 4)
Zookeeper 3
DC “Central”
Zookeeper 2
Broker 3
(Observer 1)
Broker 6
(Replica 5)
Multi-Region Clusters: 2.5 DC
3. DC Network Partition
A network partition in DC
“West”
● Consumers connected to DC
“West” continue to operate
until they’ve processed all fully
replicated records. Once we
shutdown DC “West” the
consumers will failover and
consume from the same point.
This will result in duplicate
consumption
DC “West”
Broker 1
Broker 2
Zookeeper 1
DC “East”
Broker 4
(Replica 3)
Broker 5
(Replica 4)
Zookeeper 3
DC “Central”
Zookeeper 2
Broker 3
(Observer 1)
Broker 6
(Replica 5)
Multi-Region Clusters: 2.5 DC
3. DC Network Partition
A network partition in DC
“West”
● Producers connected to DC
“West” fail as we can no longer
meet min.insync.replicas=3
DC “West”
Broker 1
Broker 2
Zookeeper 1
DC “East”
Broker 4
(Replica 3)
Broker 5
(Replica 4)
Zookeeper 3
DC “Central”
Zookeeper 2
Broker 3
(Observer 1)
Broker 6
(Replica 5)
Multi-Region Clusters: 2.5 DC
3. DC Network Partition
A network partition in DC
“West”
● To continue operating as normal we must manually shut down DC “West”
DC “West”
Broker 1
Broker 2
Zookeeper 1
DC “East”
Broker 4
(Replica 3)
Broker 5
(Replica 4)
Zookeeper 3
DC “Central”
Zookeeper 2
Broker 3
(Observer 1)
Broker 6
(Replica 5)
Stretch Cluster - 3 DC (ICG EAP KaaS)
88
Multi-Region Clusters: 3 DC
1. Steady State
Setup
● 9 brokers, of which the 3 brokers in MWDC host only observers
● 5 Zookeeper nodes
● Replication factor of 4, 1 additional observer, min.insync.replicas of 3 and acks=all
DC “MW”
Broker 1
(Observer)
Broker 2
(Observer)
Zookeeper 1
DC “NJ”
Broker 7
Broker 8
Zookeeper 4
DC “NY”
Zookeeper 2
Broker 3
(Observer)
Broker 9
Broker 4
Broker 5
Broker 6
Zookeeper 3 Zookeeper 5
Multi-Region Clusters: 3 DC
2. DC Outage
An outage in DC “NY”
● The Observers in DC “MW” are promoted
● Consumers and Producers
continue to operate as usual
● RPO = 0
● RTO ~ 0
DC “MWDC”
Broker 1
(Replica)
Broker 2
(Replica)
Zookeeper 1
DC “NJ”
Broker 7
Broker 8
Zookeeper 4
DC “NY”
Zookeeper 2
Broker 3
(Replica)
Broker 9
Broker 4
Broker 5
Broker 6
Zookeeper 3 Zookeeper 5
Multi-Region Clusters: 3 DC
3. DC Network Partition
A network partition in DC “NJ”
● The Observer in DC “MW” is
promoted
● Consumers and Producers
connected to DC “NY” continue
to operate as usual
DC “MW”
Broker 1
(Replica)
Broker 2
(Replica)
Zookeeper 1
DC “NJ”
Broker 7
Broker 8
Zookeeper 4
DC “NY”
Zookeeper 2
Broker 3
(Replica)
Broker 9
Broker 4
Broker 5
Broker 6
Zookeeper 3 Zookeeper 5
Multi-Region Clusters: 3 DC
3. DC Network Partition
A network partition in DC “NJ”
● Consumers connected to DC
“NJ” continue to operate until
they’ve processed all fully
replicated records. Once we
shutdown DC “NJ” or
application is restarted, the
consumers will failover and
consume from the same point.
This will result in duplicate
consumption
DC “MW”
Broker 1
(Replica)
Broker 2
(Replica)
Zookeeper 1
DC “NJ”
Broker 7
Broker 8
Zookeeper 4
DC “NY”
Zookeeper 2
Broker 3
(Replica)
Broker 9
Broker 4
Broker 5
Broker 6
Zookeeper 3 Zookeeper 5
Multi-Region Clusters: 3 DC
3. DC Network Partition
A network partition in DC “NJ”
● Producers connected to DC
“NJ” fail as we can no longer
meet min.insync.replicas=3
● Once application is restarted,
the producers will failover and
produce the data connecting to
DC “NY” / DC “MW”
DC “MW”
Broker 1
(Replica)
Broker 2
(Replica)
Zookeeper 1
DC “NJ”
Broker 7
Broker 8
Zookeeper 4
DC “NY”
Zookeeper 2
Broker 3
(Replica)
Broker 9
Broker 4
Broker 5
Broker 6
Zookeeper 3 Zookeeper 5
Multi-Region Clusters: 3 DC
3. DC Network Partition
A network partition in DC “NJ”
● To continue operating as
normal we must manually
shutdown DC “NJ”
DC “MW”
Broker 1
(Replica)
Broker 2
(Replica)
Zookeeper 1
DC “NJ”
Broker 7
Broker 8
Zookeeper 4
DC “NY”
Zookeeper 2
Broker 3
(Replica)
Broker 9
Broker 4
Broker 5
Broker 6
Zookeeper 3 Zookeeper 5
4. Summary
Comparison
96
Supported                          | Cluster Linking | Stretch Cluster / Multi-Region Cluster | Replicator / MirrorMaker 2
RPO=0                              |                 | ✓                                      |
RTO=~0                             | ✓               | ✓                                      | ✓
Active-Active                      | ✓               | ✓                                      | ✓
Failover With All Clients          | ✓               | ✓                                      |
Failover With Transactions         | ✓               | ✓                                      |
Failover Maintains Record Ordering | ✓               | ✓                                      |
Smooth Failback                    | ✓               | ✓                                      |
Handles Full Cluster Failure       | ✓               |                                        | ✓
Hybrid Cloud / Multi-Cloud         | ✓               |                                        | ✓
Open Source                        |                 | ✓*                                     | ✓*
Preserves Metadata                 | ✓               | ✓                                      | ✓*
Citi Tech Talk  Disaster Recovery Solutions Deep Dive
1 von 97

Recomendados

A Look into the Mirror: Patterns and Best Practices for MirrorMaker2 | Cliff ... von
A Look into the Mirror: Patterns and Best Practices for MirrorMaker2 | Cliff ...A Look into the Mirror: Patterns and Best Practices for MirrorMaker2 | Cliff ...
A Look into the Mirror: Patterns and Best Practices for MirrorMaker2 | Cliff ...HostedbyConfluent
847 views27 Folien
Zero Down Time Move From Apache Kafka to Confluent With Justin Dempsey | Curr... von
Zero Down Time Move From Apache Kafka to Confluent With Justin Dempsey | Curr...Zero Down Time Move From Apache Kafka to Confluent With Justin Dempsey | Curr...
Zero Down Time Move From Apache Kafka to Confluent With Justin Dempsey | Curr...HostedbyConfluent
575 views30 Folien
Mmckeown hadr that_conf von
Mmckeown hadr that_confMmckeown hadr that_conf
Mmckeown hadr that_confMike McKeown
567 views49 Folien
Beyond the Brokers | Emma Humber and Andrew Borley, IBM von
Beyond the Brokers | Emma Humber and Andrew Borley, IBMBeyond the Brokers | Emma Humber and Andrew Borley, IBM
Beyond the Brokers | Emma Humber and Andrew Borley, IBMHostedbyConfluent
336 views28 Folien
Set your Data in Motion with Confluent & Apache Kafka Tech Talk Series LME von
Set your Data in Motion with Confluent & Apache Kafka Tech Talk Series LMESet your Data in Motion with Confluent & Apache Kafka Tech Talk Series LME
Set your Data in Motion with Confluent & Apache Kafka Tech Talk Series LMEconfluent
352 views65 Folien
Accelerating Application Development with Amazon Aurora (DAT312-R2) - AWS re:... von
Accelerating Application Development with Amazon Aurora (DAT312-R2) - AWS re:...Accelerating Application Development with Amazon Aurora (DAT312-R2) - AWS re:...
Accelerating Application Development with Amazon Aurora (DAT312-R2) - AWS re:...Amazon Web Services
383 views37 Folien

Más contenido relacionado

Similar a Citi Tech Talk Disaster Recovery Solutions Deep Dive

Federated Kubernetes: As a Platform for Distributed Scientific Computing von
Federated Kubernetes: As a Platform for Distributed Scientific ComputingFederated Kubernetes: As a Platform for Distributed Scientific Computing
Federated Kubernetes: As a Platform for Distributed Scientific ComputingBob Killen
1.5K views36 Folien
Hybrid Cloud Solutions (with Datapipe) von
Hybrid Cloud Solutions (with Datapipe)Hybrid Cloud Solutions (with Datapipe)
Hybrid Cloud Solutions (with Datapipe)RightScale
1.2K views21 Folien
MySQL Database Architectures - 2022-08 von
MySQL Database Architectures - 2022-08MySQL Database Architectures - 2022-08
MySQL Database Architectures - 2022-08Kenny Gryp
145 views56 Folien
Achieve big data analytic platform with lambda architecture on cloud von
Achieve big data analytic platform with lambda architecture on cloudAchieve big data analytic platform with lambda architecture on cloud
Achieve big data analytic platform with lambda architecture on cloudScott Miao
1.1K views56 Folien
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor... von
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...ScyllaDB
571 views28 Folien
Getting Started with Kafka on k8s von
Getting Started with Kafka on k8sGetting Started with Kafka on k8s
Getting Started with Kafka on k8sVMware Tanzu
504 views25 Folien

Similar a Citi Tech Talk Disaster Recovery Solutions Deep Dive(20)

Federated Kubernetes: As a Platform for Distributed Scientific Computing von Bob Killen
Federated Kubernetes: As a Platform for Distributed Scientific ComputingFederated Kubernetes: As a Platform for Distributed Scientific Computing
Federated Kubernetes: As a Platform for Distributed Scientific Computing
Bob Killen1.5K views
Hybrid Cloud Solutions (with Datapipe) von RightScale
Hybrid Cloud Solutions (with Datapipe)Hybrid Cloud Solutions (with Datapipe)
Hybrid Cloud Solutions (with Datapipe)
RightScale1.2K views
MySQL Database Architectures - 2022-08 von Kenny Gryp
MySQL Database Architectures - 2022-08MySQL Database Architectures - 2022-08
MySQL Database Architectures - 2022-08
Kenny Gryp145 views
Achieve big data analytic platform with lambda architecture on cloud von Scott Miao
Achieve big data analytic platform with lambda architecture on cloudAchieve big data analytic platform with lambda architecture on cloud
Achieve big data analytic platform with lambda architecture on cloud
Scott Miao1.1K views
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor... von ScyllaDB
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...
ScyllaDB571 views
Getting Started with Kafka on k8s von VMware Tanzu
Getting Started with Kafka on k8sGetting Started with Kafka on k8s
Getting Started with Kafka on k8s
VMware Tanzu504 views
Kafka Cluster Federation at Uber (Yupeng Fui & Xiaoman Dong, Uber) Kafka Summ... von confluent
Kafka Cluster Federation at Uber (Yupeng Fui & Xiaoman Dong, Uber) Kafka Summ...Kafka Cluster Federation at Uber (Yupeng Fui & Xiaoman Dong, Uber) Kafka Summ...
Kafka Cluster Federation at Uber (Yupeng Fui & Xiaoman Dong, Uber) Kafka Summ...
confluent5.7K views
Cloud Composer workshop at Airflow Summit 2023.pdf von Leah Cole
Cloud Composer workshop at Airflow Summit 2023.pdfCloud Composer workshop at Airflow Summit 2023.pdf
Cloud Composer workshop at Airflow Summit 2023.pdf
Leah Cole165 views
Availability of Kafka - Beyond the Brokers | Andrew Borley and Emma Humber, IBM von HostedbyConfluent
Availability of Kafka - Beyond the Brokers | Andrew Borley and Emma Humber, IBMAvailability of Kafka - Beyond the Brokers | Andrew Borley and Emma Humber, IBM
Availability of Kafka - Beyond the Brokers | Andrew Borley and Emma Humber, IBM
HostedbyConfluent409 views
Scenarios for building Hybrid Cloud von Pracheta Budhwar
Scenarios for building Hybrid CloudScenarios for building Hybrid Cloud
Scenarios for building Hybrid Cloud
Pracheta Budhwar1.3K views
Disaster Recovery and High Availability with Kafka, SRM and MM2 von Abdelkrim Hadjidj
Disaster Recovery and High Availability with Kafka, SRM and MM2Disaster Recovery and High Availability with Kafka, SRM and MM2
Disaster Recovery and High Availability with Kafka, SRM and MM2
Abdelkrim Hadjidj1.7K views
APACHE KAFKA / Kafka Connect / Kafka Streams von Ketan Gote
APACHE KAFKA / Kafka Connect / Kafka StreamsAPACHE KAFKA / Kafka Connect / Kafka Streams
APACHE KAFKA / Kafka Connect / Kafka Streams
Ketan Gote493 views
Disaster Recovery with MySQL InnoDB ClusterSet - What is it and how do I use it? von Miguel Araújo
Disaster Recovery with MySQL InnoDB ClusterSet - What is it and how do I use it?Disaster Recovery with MySQL InnoDB ClusterSet - What is it and how do I use it?
Disaster Recovery with MySQL InnoDB ClusterSet - What is it and how do I use it?
Miguel Araújo293 views
Evolving Your Distributed Cache In A Continuous Delivery World: Tyler Vangorder von Redis Labs
Evolving Your Distributed Cache In A Continuous Delivery World: Tyler VangorderEvolving Your Distributed Cache In A Continuous Delivery World: Tyler Vangorder
Evolving Your Distributed Cache In A Continuous Delivery World: Tyler Vangorder
Redis Labs200 views
Scale Your Mission-Critical Applications With Neo4j Fabric and Clustering Arc... von Neo4j
Scale Your Mission-Critical Applications With Neo4j Fabric and Clustering Arc...Scale Your Mission-Critical Applications With Neo4j Fabric and Clustering Arc...
Scale Your Mission-Critical Applications With Neo4j Fabric and Clustering Arc...
Neo4j82 views
HBaseCon 2015: HBase at Scale in an Online and High-Demand Environment von HBaseCon
HBaseCon 2015: HBase at Scale in an Online and  High-Demand EnvironmentHBaseCon 2015: HBase at Scale in an Online and  High-Demand Environment
HBaseCon 2015: HBase at Scale in an Online and High-Demand Environment
HBaseCon4K views
Joe Graziano – Challenge 2 Design Solution (Part 1) von tovmug
Joe Graziano – Challenge 2 Design Solution (Part 1)Joe Graziano – Challenge 2 Design Solution (Part 1)
Joe Graziano – Challenge 2 Design Solution (Part 1)
tovmug1.1K views
#VirtualDesignMaster 3 Challenge 2 - Harshvardhan Gupta von vdmchallenge
#VirtualDesignMaster 3 Challenge 2 - Harshvardhan Gupta#VirtualDesignMaster 3 Challenge 2 - Harshvardhan Gupta
#VirtualDesignMaster 3 Challenge 2 - Harshvardhan Gupta
vdmchallenge509 views

Más de confluent

Citi TechTalk Session 2: Kafka Deep Dive von
Citi TechTalk Session 2: Kafka Deep DiveCiti TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep Diveconfluent
19 views60 Folien
Build real-time streaming data pipelines to AWS with Confluent von
Build real-time streaming data pipelines to AWS with ConfluentBuild real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with Confluentconfluent
73 views53 Folien
Q&A with Confluent Professional Services: Confluent Service Mesh von
Q&A with Confluent Professional Services: Confluent Service MeshQ&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service Meshconfluent
67 views69 Folien
Citi Tech Talk: Event Driven Kafka Microservices von
Citi Tech Talk: Event Driven Kafka MicroservicesCiti Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka Microservicesconfluent
25 views29 Folien
Confluent & GSI Webinars series - Session 3 von
Confluent & GSI Webinars series - Session 3Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3confluent
17 views59 Folien
Citi Tech Talk: Messaging Modernization von
Citi Tech Talk: Messaging ModernizationCiti Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging Modernizationconfluent
17 views39 Folien

Más de confluent(20)

Citi TechTalk Session 2: Kafka Deep Dive von confluent
Citi TechTalk Session 2: Kafka Deep DiveCiti TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep Dive
confluent19 views
Build real-time streaming data pipelines to AWS with Confluent von confluent
Build real-time streaming data pipelines to AWS with ConfluentBuild real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with Confluent
confluent73 views
Q&A with Confluent Professional Services: Confluent Service Mesh von confluent
Q&A with Confluent Professional Services: Confluent Service MeshQ&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service Mesh
confluent67 views
Citi Tech Talk: Event Driven Kafka Microservices von confluent
Citi Tech Talk: Event Driven Kafka MicroservicesCiti Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka Microservices
confluent25 views
Confluent & GSI Webinars series - Session 3 von confluent
Confluent & GSI Webinars series - Session 3Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3
confluent17 views
Citi Tech Talk: Messaging Modernization von confluent
Citi Tech Talk: Messaging ModernizationCiti Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging Modernization
confluent17 views
Citi Tech Talk: Data Governance for streaming and real time data von confluent
Citi Tech Talk: Data Governance for streaming and real time dataCiti Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time data
confluent21 views
Confluent & GSI Webinars series: Session 2 von confluent
Confluent & GSI Webinars series: Session 2Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2
confluent16 views
Data In Motion Paris 2023 von confluent
Data In Motion Paris 2023Data In Motion Paris 2023
Data In Motion Paris 2023
confluent233 views
The Future of Application Development - API Days - Melbourne 2023 von confluent
The Future of Application Development - API Days - Melbourne 2023The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023
confluent68 views
The Playful Bond Between REST And Data Streams von confluent
The Playful Bond Between REST And Data StreamsThe Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data Streams
confluent49 views
The Journey to Data Mesh with Confluent von confluent
The Journey to Data Mesh with ConfluentThe Journey to Data Mesh with Confluent
The Journey to Data Mesh with Confluent
confluent73 views
Citi Tech Talk: Monitoring and Performance von confluent
Citi Tech Talk: Monitoring and PerformanceCiti Tech Talk: Monitoring and Performance
Citi Tech Talk: Monitoring and Performance
confluent47 views
Citi Tech Talk: Hybrid Cloud von confluent
Citi Tech Talk: Hybrid CloudCiti Tech Talk: Hybrid Cloud
Citi Tech Talk: Hybrid Cloud
confluent43 views
Confluent Partner Tech Talk with QLIK von confluent
Confluent Partner Tech Talk with QLIKConfluent Partner Tech Talk with QLIK
Confluent Partner Tech Talk with QLIK
confluent90 views
Real-time Streaming for Government and the Public Sector von confluent
Real-time Streaming for Government and the Public SectorReal-time Streaming for Government and the Public Sector
Real-time Streaming for Government and the Public Sector
confluent41 views
Confluent Partner Tech Talk with SVA von confluent
Confluent Partner Tech Talk with SVAConfluent Partner Tech Talk with SVA
Confluent Partner Tech Talk with SVA
confluent95 views
How to Build Real-Time Analytics Applications like Netflix, Confluent, and Re... von confluent
How to Build Real-Time Analytics Applications like Netflix, Confluent, and Re...How to Build Real-Time Analytics Applications like Netflix, Confluent, and Re...
How to Build Real-Time Analytics Applications like Netflix, Confluent, and Re...
confluent28 views
Single View of Data von confluent
Single View of DataSingle View of Data
Single View of Data
confluent71 views
Leveraging streaming data in real-time to build a Single View of Customer (SVOC) von confluent
Leveraging streaming data in real-time to build a Single View of Customer (SVOC)Leveraging streaming data in real-time to build a Single View of Customer (SVOC)
Leveraging streaming data in real-time to build a Single View of Customer (SVOC)
confluent21 views

Último

How Workforce Management Software Empowers SMEs | TraQSuite von
How Workforce Management Software Empowers SMEs | TraQSuiteHow Workforce Management Software Empowers SMEs | TraQSuite
How Workforce Management Software Empowers SMEs | TraQSuiteTraQSuite
6 views3 Folien
Quality Assurance von
Quality Assurance Quality Assurance
Quality Assurance interworksoftware2
5 views6 Folien
Bootstrapping vs Venture Capital.pptx von
Bootstrapping vs Venture Capital.pptxBootstrapping vs Venture Capital.pptx
Bootstrapping vs Venture Capital.pptxZeljko Svedic
15 views17 Folien
ADDO_2022_CICID_Tom_Halpin.pdf von
ADDO_2022_CICID_Tom_Halpin.pdfADDO_2022_CICID_Tom_Halpin.pdf
ADDO_2022_CICID_Tom_Halpin.pdfTomHalpin9
5 views33 Folien
Top-5-production-devconMunich-2023.pptx von
Top-5-production-devconMunich-2023.pptxTop-5-production-devconMunich-2023.pptx
Top-5-production-devconMunich-2023.pptxTier1 app
9 views40 Folien
Quality Engineer: A Day in the Life von
Quality Engineer: A Day in the LifeQuality Engineer: A Day in the Life
Quality Engineer: A Day in the LifeJohn Valentino
7 views18 Folien

Último(20)

How Workforce Management Software Empowers SMEs | TraQSuite von TraQSuite
How Workforce Management Software Empowers SMEs | TraQSuiteHow Workforce Management Software Empowers SMEs | TraQSuite
How Workforce Management Software Empowers SMEs | TraQSuite
TraQSuite6 views
Bootstrapping vs Venture Capital.pptx von Zeljko Svedic
Bootstrapping vs Venture Capital.pptxBootstrapping vs Venture Capital.pptx
Bootstrapping vs Venture Capital.pptx
Zeljko Svedic15 views
ADDO_2022_CICID_Tom_Halpin.pdf von TomHalpin9
ADDO_2022_CICID_Tom_Halpin.pdfADDO_2022_CICID_Tom_Halpin.pdf
ADDO_2022_CICID_Tom_Halpin.pdf
TomHalpin95 views
Top-5-production-devconMunich-2023.pptx von Tier1 app
Top-5-production-devconMunich-2023.pptxTop-5-production-devconMunich-2023.pptx
Top-5-production-devconMunich-2023.pptx
Tier1 app9 views
Dapr Unleashed: Accelerating Microservice Development von Miroslav Janeski
Dapr Unleashed: Accelerating Microservice DevelopmentDapr Unleashed: Accelerating Microservice Development
Dapr Unleashed: Accelerating Microservice Development
Miroslav Janeski13 views
Transport Management System - Shipment & Container Tracking von Freightoscope
Transport Management System - Shipment & Container TrackingTransport Management System - Shipment & Container Tracking
Transport Management System - Shipment & Container Tracking
Freightoscope 5 views
Team Transformation Tactics for Holistic Testing and Quality (Japan Symposium... von Lisi Hocke
Team Transformation Tactics for Holistic Testing and Quality (Japan Symposium...Team Transformation Tactics for Holistic Testing and Quality (Japan Symposium...
Team Transformation Tactics for Holistic Testing and Quality (Japan Symposium...
Lisi Hocke35 views
How to build dyanmic dashboards and ensure they always work von Wiiisdom
How to build dyanmic dashboards and ensure they always workHow to build dyanmic dashboards and ensure they always work
How to build dyanmic dashboards and ensure they always work
Wiiisdom14 views
360 graden fabriek von info33492
360 graden fabriek360 graden fabriek
360 graden fabriek
info33492162 views
Ports-and-Adapters Architecture for Embedded HMI von Burkhard Stubert
Ports-and-Adapters Architecture for Embedded HMIPorts-and-Adapters Architecture for Embedded HMI
Ports-and-Adapters Architecture for Embedded HMI
Burkhard Stubert29 views
Unlocking the Power of AI in Product Management - A Comprehensive Guide for P... von NimaTorabi2
Unlocking the Power of AI in Product Management - A Comprehensive Guide for P...Unlocking the Power of AI in Product Management - A Comprehensive Guide for P...
Unlocking the Power of AI in Product Management - A Comprehensive Guide for P...
NimaTorabi216 views
Dev-HRE-Ops - Addressing the _Last Mile DevOps Challenge_ in Highly Regulated... von TomHalpin9
Dev-HRE-Ops - Addressing the _Last Mile DevOps Challenge_ in Highly Regulated...Dev-HRE-Ops - Addressing the _Last Mile DevOps Challenge_ in Highly Regulated...
Dev-HRE-Ops - Addressing the _Last Mile DevOps Challenge_ in Highly Regulated...
TomHalpin96 views

Citi Tech Talk Disaster Recovery Solutions Deep Dive

  • 1. Disaster Recovery Solutions Deep Dive Customer Success Engineering August 2022
  • 2. Table of Contents 2 1. Brokers, Zookeeper, Producers & Consumers A quick Primer 3. Stretch Clusters & Multi-Region Cluster An asynchronous, multi-region solution 2. Disaster Recovery Options - Cluster Linking & Schema Linking A synchronous and optionally asynchronous solution 4. Summary Which solution is right for me?
  • 5. Apache Kafka: Scale Out Vs. Failover 5 Broker 1 Topic1 partition1 Broker 2 Broker 3 Broker 4 Topic1 partition1 Topic1 partition1 Topic1 partition2 Topic1 partition2 Topic1 partition2 Topic1 partition3 Topic1 partition4 Topic1 partition3 Topic1 partition3 Topic1 partition4 Topic1 partition4
  • 6. Apache Zookeeper - Cluster coordination 6 6 Broker 1 partition Broker 2 (controller) Broker 3 Broker 4 Zookeeper 2 partition partition Zookeeper 1 Zookeeper 3 (leader) partition partition partition partition Stores metadata: heartbeats, watches, controller elections, cluster/topic configs, permissions writes go to leader
  • 8. Producer 8 P partition 1 partition 2 partition 3 partition 4 A Kafka producer sends data to multiple partitions based on partitioning strategy (default is hash(key) % no of partitions). Data is sent in batch per partition then batched in request per broker. Can configure batch size, linger, parallel connections per broker
  • 9. Producer 9 P partition 1 partition 2 partition 3 partition 4 A producer can choose to get acknowledgement (acks) from 0, 1, or ALL (in-sync) replicas of the partition
  • 10. Consumer 10 C A consumer polls data from partitions it has been assigned based on a subscription
  • 11. Consumer 11 C As the consumer reads the data and processes the data, it can commit offsets (where it has read up to) in different ways (per time interval, individual records, or “end of current batch”) commit offset heartbeat poll records
  • 12. Consumers - Consumer Groups 12 C C C1 C C C2 Different applications can independently read from same topic partitions at their own pace
  • 13. Consumers - Consumer group members 13 C C C C Within the same application (consumer group), different partitions can be assigned to different consumers to increase parallel consumption as well as support failover
  • 14. Make Kafka Widely Accessible to Developers 14 Enable all developers to leverage Kafka throughout the organization with a wide variety of Confluent clients Confluent Clients Battle-tested and high performing producer and consumer APIs (plus admin client)
  • 17. Recent Regional Cloud Outages 1 7 AWS Azure GCP Dec 2021: An unexplained AWS outage created business disruptions all day (CNBC) Nov 2020: A Kinesis outage brought down over a dozen AWS services for 17 hours in us-east-1 (CRN, AWS) Apr 1 2021: Some critical Azure services were unavailable for an hour (Coralogix) Sept 2018: South Central US region was unavailable for over a day (The Register) Nov 2021: An outage that affected Home Depot, Snap, Spotify, and Etsy (Bloomberg)
  • 18. Outages hurt business performance 1 8 A data center or a region may be down for multiple hours–up to a day–based on hist Data Center has an outage The applications in that data center that run your business go of Mission-critical applications fail Customers are unable to place orders, discover products, receive service, etc. Customer Impact Revenue is lost directly from the inability to do business during downtime, and indirectly by damaging brand image and customer trust Financial/Reput a
  • 19. Failure Types 1 9 Transient Failures Permanent Failures (Data Loss) Transient failures in data-centers or clusters are common and worth protecting against for business continuity purposes. Regional outages are rare but still worth protecting against for mission critical systems. Outages are typically transient but occasionally permanent. Users accidentally delete topics, human error occurs. If your data is unrecoverable and mission critical, you need an additional complementary solution.
  • 20. Failure Scenarios Data-Center / Regional Outages Platform Failures Human Error Data-Centers have single points of failure associated with hardware resulting in associated outages. Regional Outages arise from failures in the underlying cloud provider. People delete topics, clusters and worse. Unexpected behaviour arise from standard operations and within the CI/CD pipeline. Load is applied unevenly or in short bursts by batch processing systems. Performance limitations arise unexpectedly. Bugs occur in Kafka, Zookeeper and associated systems.
  • 21. Cluster Linking & Schema Linking
  • 22. 22 Cluster Linking Cluster Linking, built into Confluent Platform and Confluent Cloud allows you to directly connect clusters together mirroring topics from one cluster to another. Cluster Linking makes it easier to build multi-cluster, multi-cloud, and hybrid cloud deployments. Active cluster Consumers Producers clicks clicks Topics DR cluster clicks clicks Mirror Topics Cluster Link Primary Region DR Region
  • 23. 23 Schema Linking Schema Linking, built into Schema Registry allows you to directly connect Schema Registry clusters together mirroring subjects or entire contexts. Contexts, introduced alongside Schema Linking allows you to create namespaces within Schema Registry which ensures mirrored subjects don’t run into schema naming clashes. Active cluster Consumers Producers clicks clicks Schemas DR cluster clicks clicks Mirror Schemas Schema Link Primary Region DR Region Consumers Producers
  • 24. 24 Prefixing Prefixing allows you to add a prefix to a topic and if desired the associated consumer group to avoid topic and consumer group naming clashes between the primary and Disaster Recovery cluster. This is important when used in an active-active setup and required to use a two way Cluster Link strategy which is the recommended approach. Active cluster Consumer-Group clicks clicks Topic DR cluster clicks clicks DR-topic Cluster Link Primary Region DR Region DR-Consumer-Group
  • 26. HA/DR Active-Passive 1. Steady state Setup ● The cluster link can automatically create mirror topics for any new topics on the active cluster ● Historical data is replicated & incoming data is synced in real-time Active cluster Consumers Producers clicks clicks topics DR cluster clicks clicks mirror topics Cluster Link Primary Region DR Region
  • 27. HA/DR Active-Passive 2. Failover 1. Detect a regional outage via metrics going to zero in that region; decide to failover 2. Call failover API on mirror topics to make them writable 3. Update DNS to point at DR cluster 4. Start clients in DR region Active cluster Consumers Producers clicks clicks topics DR cluster clicks clicks mirror topics failover REST API or CLI Consumers Producers Primary Region DR Region
  • 28. HA/DR Active-Passive 3. Fail forward The standard strategy is to “fail forward” promoting the DR region to be their new Primary Region: ● Cloud regions offer identical service ● They moved all of their applications & data systems to the DR region ● Failing back would introduce risk with little benefit To fail forward, simply: 1. Delete topics on original cluster (or spin up new cluster) 2. Establish cluster link in reverse direction Active DR cluster clicks clicks mirror topics DR Active cluster clicks clicks mirror topics Cluster Link Consumers Producers Primary DR Region DR Primary Region
  • 29. HA/DR Active-Passive 3. Failback (alternative) If you can’t fail forward and need to failback to the original region: 1. Delete topics on Primary cluster (or spin up a new cluster) 2. Establish a cluster link in the reverse direction 3. When Primary has caught up, migrate producers & consumers back: a. Stop clients b. promote mirror topic(s) c. Restart clients pointed at Primary cluster DR cluster clicks clicks mirror topics Consumers Producers Cluster Link Primary Region DR Region Primary cluster clicks clicks mirror topics
  • 30. Synced asynchronously HA/DR - Consumers must tolerate some duplicates Consumers must tolerate duplicate messages because Cluster Linking is asynchronous. Primary cluster Consumer X A B C D Topic Consumer X offset at time of outage DR cluster A B C D Mirror Topic Consumer X offset at time of failover ... ... A B C C D ... Consumes message C twice
  • 32. DR cluster “East” HA/DR Bi-Directional Cluster Linking: Automatic Data Recovery & Failback 1. Steady state Setup For a topic named clicks ● We create duplicate topics on both the Primary and DR cluster ● Create prefixed cluster links in both directions ● Produce records to clicks on the Primary cluster ● Consumers consume from a Regex pattern Primary cluster “West” clicks Consumers .*clicks Producers clicks Add prefix west clicks clicks clicks west.clicks east.clicks Add prefix east
  • 33. DR cluster “East” HA/DR Bi-Directional Cluster Linking: Automatic Data Recovery & Failback 2. Outage strikes! An outage in the primary region stops: ● Stops producers & consumers in primary region ● Temporarily pauses cluster link mirroring ● A small set of data may not have been replicated yet to the DR cluster – this is your “RPO” Primary cluster “West” clicks Consumers .*clicks Producers clicks clicks clicks clicks west.clicks east.clicks
  • 34. DR cluster “East” HA/DR Bi-Directional Cluster Linking: Automatic Data Recovery & Failback 3. Failover To failover: ● Move consumers and producers to the DR cluster - keep the same topic names / regex ● Consumers consume both ○ Pre-failover data in west.clicks ○ Post-failover data in clicks ● Don’t delete the cluster link ● Disable clicks -> west.clicks offset replication Primary cluster “West” clicks Consumers .*clicks Producers clicks clicks clicks clicks west.clicks east.clicks
  • 35. DR cluster “East” HA/DR Bi-Directional Cluster Linking: Automatic Data Recovery & Failback 4. Recovery If/when the outage is over: ● The primary-to-DR cluster link automatically recovers the lagged data (RPO) from the primary cluster Note: this data will be “late arriving” to the consumers ● New records generated to the DR cluster will automatically begin replicating to the primary Primary cluster “West” clicks Consumers .*clicks Producers clicks Recovers data Fails back data clicks clicks clicks west.clicks east.clicks
  • 36. DR cluster “East” HA/DR Bi-Directional Cluster Linking: Automatic Data Recovery & Failback 5. Failback To fail back to the primary region, consumers need to pick up at the end of the writable topics, so: ● Ensure that all consumer groups have 0 consumer lag for their DR topics e.g. west.clicks ● Reset all consumer offsets to the last offsets (LEO); this can be done by the platform operator Finally, move consumers & producers back to Primary ● Each producer / consumer group can be moved independently Primary cluster “West” clicks Consumers .*clicks Producers clicks Recovers data Fails back data clicks clicks clicks west.clicks east.clicks Reset consumers to resume here move move
  • 37. DR cluster “East” Primary cluster “West” clicks Recovers data Fails back data clicks clicks clicks west.clicks east.clicks HA/DR Bi-Directional Cluster Linking: Automatic Data Recovery & Failback 6. And beyond Re-enable clicks -> west.clicks consumer offset replication Once consumer lag is 0 on east.clicks, then reset all consumer groups to Log End Offset (last offset of the partition) on “clicks” on DR cluster Consumers .*clicks Producers clicks Reset consumers to resume here
  • 39. West cluster clicks east.clicks East cluster west.clicks clicks Consumers .*clicks Producers Add prefix west Add prefix east Consumers .*clicks Producers Applications / Web Traffic Load Balancer (example) Applications Applications 39 HA/DR Bi-Directional Cluster Linking: Active-Active 1. Steady state
  • 40. West cluster clicks east.clicks East cluster west.clicks clicks Consumers .*clicks Producers Add prefix west Add prefix east Consumers .*clicks Producers Applications / Web Traffic Load Balancer (example) Applications Applications 40 HA/DR Bi-Directional Cluster Linking: Active-Active 2. Outage strikes!
  • 41. West cluster clicks east.clicks East cluster west.clicks clicks Consumers .*clicks Producers Add prefix west Add prefix east Consumers .*clicks Producers Applications / Web Traffic Load Balancer (example) Applications Applications re-route 41 HA/DR Bi-Directional Cluster Linking: Active-Active 3. Failover
  • 42. West cluster clicks east.clicks East cluster west.clicks clicks Consumers .*clicks Producers Add prefix west Add prefix east Consumers .*clicks Producers Applications / Web Traffic Load Balancer (example) Applications Applications Any remaining pre-failure data is automatically recovered by the consumers re-route 42 HA/DR Bi-Directional Cluster Linking: Active-Active 4. Return to Steady State
  • 43. 43 Stretch Cluster A Stretch Cluster is ONE Kafka cluster that is “stretched” across multiple availability zones or data centers. Uses Kafka internal replication features to achieve RPO = 0 & low RTO.
  • 44. 3. Stretch Clusters & Multi-Region Cluster 44
  • 45. Stretch Cluster - Why? 45
  • 46. Stretch Cluster: Non-Stretch Cluster Cluster Behaviour 1. Steady State Setup ● An arbitrary number of brokers, represented here by brokers 1-4, spread across 2 DCs ● A standard three node Zookeeper cluster spread across 2 DCs DC “West” Broker 1 Broker 2 Zookeeper 1 Zookeeper 2 DC “East” Broker 3 Broker 4 Zookeeper 3
  • 47. Stretch Cluster: Non-Stretch Cluster Cluster Behaviour 1. Steady State… continued Setup ● An arbitrary number of brokers, represented here by brokers 1-4, spread across 2 DCs ● A standard three node Zookeeper cluster spread across 2 DCs ● We’ll also assume a replication-factor of 3, min.insync.replicas of 2 and acks=all DC “West” Replica 1 Replica 2 Zookeeper 1 Zookeeper 2 DC “East” Replica 3 Unused Broker Zookeeper 3
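For reference, a small sketch of how that steady state is expressed in configuration: the topic is created with replication factor 3 and min.insync.replicas=2, and producers use acks=all. The topic name, partition count and hostname are illustrative.

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateDurableTopic {

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker-1.example.internal:9092"); // placeholder

        try (Admin admin = Admin.create(props)) {
            // 6 partitions, replication factor 3, and writes only acknowledged once at
            // least 2 replicas are in sync (together with acks=all on the producers).
            NewTopic topic = new NewTopic("clicks", 6, (short) 3)
                    .configs(Map.of("min.insync.replicas", "2"));
            admin.createTopics(List.of(topic)).all().get();
        }
    }
}
```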
  • 48. Stretch Cluster: Non-Stretch Cluster Cluster Behaviour 2. DC Outage An outage in DC “West” ● … let’s start by just focusing on Kafka. DC “West” Replica 1 Replica 2 Zookeeper 1 Zookeeper 2 DC “East” Replica 3 Unused Broker Zookeeper 3
  • 49. Stretch Cluster: Non-Stretch Cluster Cluster Behaviour 2. DC Outage An outage in DC “West” ● Min.insync.replicas can no longer be met and we lose availability DC “West” Replica 1 Replica 2 Zookeeper 1 Zookeeper 2 DC “East” Replica 3 Unused Broker Zookeeper 3
  • 50. Stretch Cluster: Non-Stretch Cluster Cluster Behaviour 3. Fixing Broker Availability Increase to rf=4 ● Looks like we’ve solved our issue… DC “West” Replica 1 Replica 2 Zookeeper 1 Zookeeper 2 DC “East” Replica 3 Replica 4 Zookeeper 3
  • 51. Stretch Cluster: Non-Stretch Cluster Cluster Behaviour 3. Fixing Broker Availability… But Increase to rf=4 ● Looks like we’ve solved our issue… but, if our 2 replicas are down or out of sync then we lose availability unless we trigger an unclean leader election and accept data loss. DC “West” Replica 1 Replica 2 Zookeeper 1 Zookeeper 2 DC “East” Replica 3 Out of Sync Replica 4 Out of Sync Zookeeper 3
  • 52. Stretch Cluster: Non-Stretch Cluster Cluster Behaviour 4. Fixing Data Loss Increase min.insync.replicas to 3 ● Consumers continue to operate ● Producers continue to operate once we revert to min.insync.replicas=2 DC “West” Replica 1 Replica 2 Zookeeper 1 Zookeeper 2 DC “East” Replica 3 Replica 4 Out of Sync Zookeeper 3
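The min.insync.replicas changes described here (raise to 3 in steady state, drop back to 2 while a DC is down) can be applied as an incremental config update. A sketch, assuming the operator passes "3" or "2" as an argument; the topic name and hostname are placeholders.

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class SetMinInsyncReplicas {

    public static void main(String[] args) throws Exception {
        String value = args.length > 0 ? args[0] : "3"; // "3" in steady state, "2" while a DC is down

        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker-1.example.internal:9092"); // placeholder

        try (Admin admin = Admin.create(props)) {
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "clicks");
            AlterConfigOp op = new AlterConfigOp(
                    new ConfigEntry("min.insync.replicas", value), AlterConfigOp.OpType.SET);
            admin.incrementalAlterConfigs(Map.of(topic, List.of(op))).all().get();
        }
    }
}
```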
  • 53. Stretch Cluster: Non-Stretch Cluster Cluster Behaviour 4. Fixing Data Loss… But What About Zookeeper? DC “West” Replica 1 Replica 2 Zookeeper 1 Zookeeper 2 DC “East” Replica 3 Replica 4 Out of Sync Zookeeper 3
  • 54. Stretch Cluster: Non-Stretch Cluster Cluster Behaviour 4. Fixing Data Loss… But What About Zookeeper? DC “West” Zookeeper 1 Zookeeper 2 DC “East” Zookeeper 3 Broker 1 Broker 2 Broker 3 Broker 4
  • 55. Stretch Cluster: Non-Stretch Cluster Cluster Behaviour 4. Fixing Data Loss… But What About Zookeeper? DC “West” Broker 1 Broker 2 Zookeeper 1 Zookeeper 2 DC “East” Broker 3 Broker 4 Zookeeper 3
  • 56. Stretch Cluster - 2 DC 56
  • 57. Stretch Cluster: 2 DC + Observer 1. Steady State Setup ● A minimum of 4 brokers ● 6 Zookeeper nodes, one of which is an observer ● Replication factor of 4, min.insync.replicas of 3 and acks=all DC “West” Broker 1 Broker 2 Zookeeper 1 Zookeeper 2 DC “East” Broker 3 Broker 4 Zookeeper 4 Zookeeper 5 Zookeeper 3 Zookeeper 6 (Observer)
  • 58. Stretch Cluster: 2 DC + Observer 2. DC Outage - On observer DC An outage in DC “East” ● Consumers continue to operate ● Producers continue to operate once we revert to min.insync.replicas=2 DC “West” Broker 1 Broker 2 Zookeeper 1 Zookeeper 2 DC “East” Broker 3 Broker 4 Zookeeper 4 Zookeeper 5 Zookeeper 3 Zookeeper 6 (Observer)
  • 59. Stretch Cluster: 2 DC + Observer 3. DC Outage - On non-observer DC An outage in DC “West” ● We can’t reach Zookeeper quorum! DC “West” Broker 1 Broker 2 Zookeeper 1 Zookeeper 2 DC “East” Broker 3 Broker 4 Zookeeper 4 Zookeeper 5 Zookeeper 3 Zookeeper 6 (Observer)
  • 60. Stretch Cluster: 2 DC + Observer 3. DC Outage - On non-observer DC… but An outage in DC “West” ● We promote the Zookeeper observer to a full follower ● Remove Zookeeper 1, 2 & 3 from quorum list ● Perform rolling restart of Zookeeper nodes DC “West” Broker 1 Broker 2 Zookeeper 1 Zookeeper 2 DC “East” Broker 3 Broker 4 Zookeeper 4 Zookeeper 5 Zookeeper 3 Zookeeper 6
  • 61. Stretch Cluster: 2 DC + Observer 3. DC Outage - On non-observer DC An outage in DC “West” ● Consumers continue to operate ● Producers continue to operate once we revert to min.insync.replicas=2 DC “West” Broker 1 Broker 2 Zookeeper 1 Zookeeper 2 DC “East” Broker 3 Broker 4 Zookeeper 4 Zookeeper 5 Zookeeper 3 Zookeeper 6
  • 62. Stretch Cluster: 2 DC + Observer 4. Network Partition A network partition occurs between DCs ● Consumers continue to operate as usual up until they’ve consumed all fully replicated data ● Producer will fail as we can no longer meet min.insync.replicas=3 DC “West” Broker 1 Broker 2 Zookeeper 1 Zookeeper 2 DC “East” Broker 3 Broker 4 Zookeeper 4 Zookeeper 5 Zookeeper 3 Zookeeper 6 (Observer)
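What "producers will fail" looks like from the client: once the in-sync replica count drops below min.insync.replicas, acks=all sends are rejected with NotEnoughReplicasException (after retries this may also surface as a TimeoutException). Below is a sketch of detecting that condition so an operator can be alerted and decide to shut down the partitioned DC and lower min.insync.replicas; names are illustrative.

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.errors.NotEnoughReplicasException;
import org.apache.kafka.common.errors.TimeoutException;
import org.apache.kafka.common.serialization.StringSerializer;

public class IsrAwareProducer {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "west-cluster.example.internal:9092"); // placeholder
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("clicks", "user-1", "click"), (metadata, exception) -> {
                if (exception instanceof NotEnoughReplicasException
                        || exception instanceof TimeoutException) {
                    // min.insync.replicas can no longer be met: raise an alert so the
                    // operator can decide to shut down the partitioned DC and lower
                    // min.insync.replicas, as described on the slide.
                    System.err.println("ISR below min.insync.replicas: " + exception.getMessage());
                } else if (exception != null) {
                    exception.printStackTrace();
                }
            });
            producer.flush();
        }
    }
}
```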
  • 63. Stretch Cluster: 2 DC + Observer 5. Fixing Network Partition A network partition occurs between DCs ● We manually shutdown DC “East” then update min.insync.replicas=2 ● Clients resume operating as normal ● Consumers failing over from DC “East” will consume some duplicate records DC “West” Broker 1 Broker 2 Zookeeper 1 Zookeeper 2 DC “East” Broker 3 Broker 4 Zookeeper 4 Zookeeper 5 Zookeeper 3 Zookeeper 6 (Observer)
  • 64. 64 Observer Risk! Zookeeper observers solve our availability and split-brain issues but risk data loss! DC “West” Zookeeper Leader Zookeeper Follower Zookeeper Follower DC “East” Zookeeper Follower (Out of Sync) Zookeeper Follower (Out of Sync) Zookeeper Observer (Out of Sync) Quorum
  • 65. 65 Hierarchical Quorum Hierarchical Quorum involves getting consensus between multiple Zookeeper “groups” which each form their own quorum. In the case of two DC hierarchy, consensus must be reached between BOTH DCs. DC “West” Zookeeper 1 (Leader) Zookeeper 2 Zookeeper 3 DC “East” Zookeeper 4 Zookeeper 5 Zookeeper 6 Quorum
  • 66. Stretch Cluster: 2 DC + Hierarchical Quorum 1. Steady State Setup ● A minimum of 4 brokers ● 6 Zookeeper nodes, arranged into two groups ● Replication factor of 4, min.insync.replicas of 3 and acks=all DC “West” Broker 1 Broker 2 Zookeeper 1 Zookeeper 2 DC “East” Broker 3 Broker 4 Zookeeper 4 Zookeeper 5 Zookeeper 3 Zookeeper 6
  • 67. Stretch Cluster: 2 DC + Hierarchical Quorum 2. DC Outage An outage in DC “East” ● Consumers continue to operate for leaders on DC “West” ● Leaders can’t be elected and configuration updates can’t be made until we have hierarchical quorum DC “West” Broker 1 Broker 2 Zookeeper 1 Zookeeper 2 DC “East” Broker 3 Broker 4 Zookeeper 4 Zookeeper 5 Zookeeper 3 Zookeeper 6
  • 68. Stretch Cluster: 2 DC + Hierarchical Quorum 3. DC Outage An outage in DC “East” ● Remove DC “East” Zookeeper group from hierarchy ● Revert to min.insync.replicas=2 DC “West” Broker 1 Broker 2 Zookeeper 1 Zookeeper 2 DC “East” Broker 3 Broker 4 Zookeeper 4 Zookeeper 5 Zookeeper 3 Zookeeper 6
  • 69. Stretch Cluster: 2 DC + Hierarchical Quorum 4. Network Partition A network partition occurs between DCs ● Consumers continue to operate as usual up until they’ve consumed all fully replicated data ● Producer will fail as we can no longer meet min.insync.replicas=3 DC “West” Broker 1 Broker 2 Zookeeper 1 Zookeeper 2 DC “East” Broker 3 Broker 4 Zookeeper 4 Zookeeper 5 Zookeeper 3 Zookeeper 6
  • 70. Stretch Cluster: 2 DC + Hierarchical Quorum 5. Fixing Network Partition A network partition occurs between DCs ● We manually shutdown DC “East”, remove from the hierarchy & update min.insync.replicas=2 ● Clients resume operating as normal ● Consumers failing over from DC “East” will consume some duplicate records DC “West” Broker 1 Broker 2 Zookeeper 1 Zookeeper 2 DC “East” Broker 3 Broker 4 Zookeeper 4 Zookeeper 5 Zookeeper 3 Zookeeper 6
  • 71. Stretch Cluster - 2.5 DC 71
  • 72. Stretch Cluster: 2.5 DC 1. Steady State Setup ● A minimum of 4 brokers ● 3 Zookeeper nodes ● Replication factor of 4, min.insync.replicas of 3 and acks=all ● Note: It’s actually better for the DC’s with brokers to be closest DC “West” Broker 1 Broker 2 Zookeeper 1 DC “East” Broker 3 Broker 4 Zookeeper 3 DC “Central” Zookeeper 2
  • 73. Stretch Cluster: 2.5 DC 2. DC Outage An outage in DC “West” ● Consumers continue to operate ● Producers continue to operate once we revert to min.insync.replicas=2 DC “West” Broker 1 Broker 2 Zookeeper 1 DC “East” Broker 3 Broker 4 Zookeeper 3 DC “Central” Zookeeper 2
  • 74. Stretch Cluster: 2.5 DC 3. DC Network Partition A network partition in DC “West” ● Consumers connected to DC “East” continue to operate DC “West” Broker 1 Broker 2 Zookeeper 1 DC “East” Broker 3 Broker 4 Zookeeper 3 DC “Central” Zookeeper 2
  • 75. Stretch Cluster: 2.5 DC 3. DC Network Partition A network partition in DC “West” ● Consumers connected to DC “West” continue to operate until they’ve processed all fully replicated records DC “West” Broker 1 Broker 2 Zookeeper 1 DC “East” Broker 3 Broker 4 Zookeeper 3 DC “Central” Zookeeper 2
  • 76. Stretch Cluster: 2.5 DC 3. DC Network Partition A network partition in DC “West” ● Producers connected to DC “East” continue to operate once we revert to min.insync.replicas=2 DC “West” Broker 1 Broker 2 Zookeeper 1 DC “East” Broker 3 Broker 4 Zookeeper 3 DC “Central” Zookeeper 2
  • 77. Stretch Cluster: 2.5 DC 3. DC Network Partition A network partition in DC “West” ● Producers connected to DC “West” continue to operate once we shutdown DC “West”, failover and revert to min.insync.replicas=2 DC “West” Broker 1 Broker 2 Zookeeper 1 DC “East” Broker 3 Broker 4 Zookeeper 3 DC “Central” Zookeeper 2
  • 79. 79 Multi-Region Clusters: Followers vs Observers Followers are normal replicas; observers act the same except that they are not considered for acks=all produce requests. DC “West” Producers Follower Synchronous Leader DC “East” Observer Asynchronous
  • 80. 80 Multi-Region Clusters: Automatic Observer Promotion As of Confluent Platform v6.1 observers can be configured to be promoted according to the ObserverPromotionPolicy, including: ● Under-min-isr: Promoted if in-sync replica size drops below min.insync.replicas ● Under-replicated: Promoted to cover any replica which is no longer in sync ● Leader-is-observer: Promoted if the current leader is an observer DC “West” Producers Follower Synchronous Leader DC “East” Follower Asynchronous
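A sketch of how observers and the promotion policy are declared, using Confluent Server's confluent.placement.constraints topic config and mirroring the diagram on this slide (two synchronous replicas in "west", one observer in "east"). The JSON field names (version 2, replicas/observers with rack constraints, observerPromotionPolicy) follow the Confluent Platform documentation as best recalled here and should be verified against your CP version; the topic name, rack ids and hostname are illustrative.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Properties;

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateMrcTopic {

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker-1.example.internal:9092"); // placeholder

        // Placement: 2 synchronous replicas in "west", 1 observer in "east" that may be
        // promoted when the ISR drops below min.insync.replicas. Field names are assumed
        // from the confluent.placement.constraints format; verify against your CP version.
        String placement = "{"
                + "\"version\":2,"
                + "\"replicas\":[{\"count\":2,\"constraints\":{\"rack\":\"west\"}}],"
                + "\"observers\":[{\"count\":1,\"constraints\":{\"rack\":\"east\"}}],"
                + "\"observerPromotionPolicy\":\"under-min-isr\""
                + "}";

        try (Admin admin = Admin.create(props)) {
            // Replication factor is left unset because the placement constraints define it.
            NewTopic topic = new NewTopic("clicks", Optional.of(6), Optional.<Short>empty())
                    .configs(Map.of("confluent.placement.constraints", placement,
                                    "min.insync.replicas", "2"));
            admin.createTopics(List.of(topic)).all().get();
        }
    }
}
```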
  • 81. Multi-Region Clusters: 2.5 DC 1. Steady State Setup ● A minimum of 6 brokers ● 3 Zookeeper nodes ● Replication factor of 4, 2 additional observers, min.insync.replicas of 3 and acks=all DC “West” Broker 1 Broker 2 Zookeeper 1 DC “East” Broker 4 Broker 5 Zookeeper 3 DC “Central” Zookeeper 2 Broker 3 (Observer1) Broker 6 (Observer2)
  • 82. Multi-Region Clusters: 2.5 DC 2. DC Outage An outage in DC “West” ● The Observer in DC “East” is promoted ● Consumers and Producers continue to operate as usual ● RPO = 0 ● RTO ~ 0 DC “West” Broker 1 Broker 2 Zookeeper 1 DC “East” Broker 4 (Replica 3) Broker 5 (Replica 4) Zookeeper 3 DC “Central” Zookeeper 2 Broker 3 (Observer1) Broker 6 (Replica 5)
  • 83. Multi-Region Clusters: 2.5 DC 3. DC Network Partition A network partition in DC “West” ● The Observer in DC “East” is promoted ● Consumers and Producers connected to DC “East” continue to operate as usual DC “West” Broker 1 Broker 2 Zookeeper 1 DC “East” Broker 4 (Replica 3) Broker 5 (Replica 4) Zookeeper 3 DC “Central” Zookeeper 2 Broker 3 (Observer1) Broker 6 (Replica 5)
  • 84. Multi-Region Clusters: 2.5 DC 3. DC Network Partition A network partition in DC “West” ● The Observer in DC “West” cannot be promoted as it has no Zookeeper Quorum DC “West” Broker 1 Broker 2 Zookeeper 1 DC “East” Broker 4 (Replica 3) Broker 5 (Replica 4) Zookeeper 3 DC “Central” Zookeeper 2 Broker 3 (Observer 1) Broker 6 (Replica 5)
  • 85. Multi-Region Clusters: 2.5 DC 3. DC Network Partition A network partition in DC “West” ● Consumers connected to DC “West” continue to operate until they’ve processed all fully replicated records. Once we shutdown DC “West” the consumers will failover and consume from the same point. This will result in duplicate consumption DC “West” Broker 1 Broker 2 Zookeeper 1 DC “East” Broker 4 (Replica 3) Broker 5 (Replica 4) Zookeeper 3 DC “Central” Zookeeper 2 Broker 3 (Observer 1) Broker 6 (Replica 5)
  • 86. Multi-Region Clusters: 2.5 DC 3. DC Network Partition A network partition in DC “West” ● Producers connected to DC “West” fail as we can no longer meet min.insync.replicas=3 DC “West” Broker 1 Broker 2 Zookeeper 1 DC “East” Broker 4 (Replica 3) Broker 5 (Replica 4) Zookeeper 3 DC “Central” Zookeeper 2 Broker 3 (Observer 1) Broker 6 (Replica 5)
  • 87. Multi-Region Clusters: 2.5 DC 3. DC Network Partition A network partition in DC “West” ● To continue operating as normal we must manually shut down DC “West” DC “West” Broker 1 Broker 2 Zookeeper 1 DC “East” Broker 4 (Replica 3) Broker 5 (Replica 4) Zookeeper 3 DC “Central” Zookeeper 2 Broker 3 (Observer 1) Broker 6 (Replica 5)
  • 88. Stretch Cluster - 3 DC (ICG EAP KaaS) 88
  • 89. Multi-Region Clusters: 3 DC 1. Steady State Setup ● 9 brokers, of which the 3 brokers in DC “MW” host only observers ● 5 Zookeeper nodes ● Replication factor of 4, 1 additional observer, min.insync.replicas of 3 and acks=all DC “MW” Broker 1 (Observer) Broker 2 (Observer) Zookeeper 1 DC “NJ” Broker 7 Broker 8 Zookeeper 4 DC “NY” Zookeeper 2 Broker 3 (Observer) Broker 9 Broker 4 Broker 5 Broker 6 Zookeeper 3 Zookeeper 5
  • 90. Multi-Region Clusters: 3 DC 2. DC Outage An outage in DC “NY” ● The Observers in DC “MW” are promoted ● Consumers and Producers continue to operate as usual ● RPO = 0 ● RTO ~ 0 DC “MW” Broker 1 (Replica) Broker 2 (Replica) Zookeeper 1 DC “NJ” Broker 7 Broker 8 Zookeeper 4 DC “NY” Zookeeper 2 Broker 3 (Replica) Broker 9 Broker 4 Broker 5 Broker 6 Zookeeper 3 Zookeeper 5
  • 91. Multi-Region Clusters: 3 DC 3. DC Network Partition A network partition in DC “NJ” ● The Observers in DC “MW” are promoted ● Consumers and Producers connected to DC “NY” continue to operate as usual DC “MW” Broker 1 (Replica) Broker 2 (Replica) Zookeeper 1 DC “NJ” Broker 7 Broker 8 Zookeeper 4 DC “NY” Zookeeper 2 Broker 3 (Replica) Broker 9 Broker 4 Broker 5 Broker 6 Zookeeper 3 Zookeeper 5
  • 92. Multi-Region Clusters: 3 DC 3. DC Network Partition A network partition in DC “NJ” ● Consumers connected to DC “NJ” continue to operate until they’ve processed all fully replicated records. Once we shut down DC “NJ” or the application is restarted, the consumers will failover and consume from the same point. This will result in duplicate consumption DC “MW” Broker 1 (Replica) Broker 2 (Replica) Zookeeper 1 DC “NJ” Broker 7 Broker 8 Zookeeper 4 DC “NY” Zookeeper 2 Broker 3 (Replica) Broker 9 Broker 4 Broker 5 Broker 6 Zookeeper 3 Zookeeper 5
  • 93. Multi-Region Clusters: 3 DC 3. DC Network Partition A network partition in DC “NJ” ● Producers connected to DC “NJ” fail as we can no longer meet min.insync.replicas=3 ● Once the application is restarted, the producers will fail over and produce data by connecting to DC “NY” / DC “MW” DC “MW” Broker 1 (Replica) Broker 2 (Replica) Zookeeper 1 DC “NJ” Broker 7 Broker 8 Zookeeper 4 DC “NY” Zookeeper 2 Broker 3 (Replica) Broker 9 Broker 4 Broker 5 Broker 6 Zookeeper 3 Zookeeper 5
  • 94. Multi-Region Clusters: 3 DC 3. DC Network Partition A network partition in DC “NJ” ● To continue operating as normal we must manually shutdown DC “NJ” DC “MW” Broker 1 (Replica) Broker 2 (Replica) Zookeeper 1 DC “NJ” Broker 7 Broker 8 Zookeeper 4 DC “NY” Zookeeper 2 Broker 3 (Replica) Broker 9 Broker 4 Broker 5 Broker 6 Zookeeper 3 Zookeeper 5
  • 96. Comparison 96 Supported Cluster Linking Stretch Cluster / Multi-Region Cluster Replicator / MirrorMaker 2 RPO=0 ✓ RTO=~0 ✓ ✓ ✓ Active-Active ✓ ✓ ✓ Failover With All Clients ✓ ✓ Failover With Transactions ✓ ✓ Failover Maintains Record Ordering ✓ ✓ Smooth Failback ✓ ✓ Handles Full Cluster Failure ✓ ✓ Hybrid Cloud / Multi-Cloud ✓ ✓ Open Source ✓* ✓* Preserves Metadata ✓ ✓ ✓*

Speaker notes

  1. Today we are going to go through a comparison of disaster recovery solutions.
  2. So, a quick summary of our agenda: We’ll discuss what disaster recovery solutions are and why we should use them; We’ll discuss cluster linking & schema linking as a complementary component; We’ll cover Stretch Clusters and multi-region clusters as an extension; We’ll go into S3 backup and restore as a complementary solution; We’ll cover legacy solutions including Replicator and Mirrormaker2; and Finally, we’ll wrap up with a summary.
  3. So, let’s start with what disaster recovery solutions are and why you should use them.
  4. So, let’s start with what disaster recovery solutions are and why you should use them.
  5. Let’s start with why you should care. I’ve included here five examples of regional outages across the three largest cloud providers. It should go without saying, but if these incidents occur on providers like AWS, Azure and GCP, then we should assume they can happen to you too.
  6. So, why does this matter? Simply put, outages hurt business performance! As an example of this: Say you experience a cloud region outage - This service may be down for multiple hours, up to a day based on historical experience; Then mission-critical applications will fail - The applications in that region that run your business go offline; As such customers are impacted - Customers are unable to place orders, discover products, receive service, etc; and Ultimately, we are impacted financially - Revenue is lost directly from the inability to do business during downtime, and indirectly by damaging brand image and customer trust.
  7. Now that we know why we want to avoid them, let’s review the types of failures we need to consider: First, we have transient failures - Transient failures refer to disaster scenarios which involve a temporary failure where part or the entirety of the platform go offline, but, no records or information is necessarily lost forever; Second, we have permanent failures - Permanent failures are characterized by their associated data loss due to hardware failures or human error.
  8. There are an endless number of failure scenarios but they broadly fall under: Data-center & regional outages - These occur due to a range of hardware, infrastructure and cloud provider issues resulting in both transient and permanent failures. However, they only impact services running within the zone of impact; Platform failures - These arise from a range of issues such as batch processing systems overwhelming the Kafka cluster and bugs in Kafka or third party software. The key issue here is that transient or permanent, the entire Kafka cluster may be impacted and occasionally this failure will propagate between Kafka clusters; and Human error - People make mistakes and design choices in our software can have broad and unexpected consequences. At best these cause transient outages. At worst they cause permanent data loss across our Kafka clusters.
  9. Generally Available (GA) since Confluent Platform v7, Cluster Linking is Confluent’s preferred and long-term supported solution for Disaster Recovery. Schema Linking is a complementary feature, Generally Available since Confluent Platform v7.1.
  10. Cluster Linking, built into Confluent Platform and Confluent Cloud, allows you to directly connect clusters together, mirroring topics from one cluster to another. Cluster Linking works by replicating topics byte for byte including records with their associated offsets, consumer groups with their associated offsets, topics with their associated configuration and ACLs. Some key features Cluster Linking supports are: Low Recovery Time Objectives, effectively RTO=0; It supports active-active setups, so you can make use of both clusters and fail over in whichever direction is required; Failover support for all clients including librdkafka-based clients; Smooth failback after disaster recovery scenarios; The ability to support multi-cloud and hybrid cloud setups; As described it preserves metadata including offsets, consumer groups, topics and ACLs; As it copies data byte for byte it avoids decompression and recompression, saving significant resources; and As it’s built into CP Server you don’t need any additional components like Kafka Connect. Some of the challenges associated with using Cluster Linking are: The nature of asynchronous replication means that it cannot support RPO=0; Presently the DR cluster is unaware of transactions, meaning it cannot cancel a transaction midway through processing during a disaster recovery scenario; and To support high availability you must accept a small to moderate breach of record ordering guarantees.
  11. Schema Linking, built into Schema Registry allows you to directly connect Schema Registry clusters together mirroring subjects or entire contexts. Contexts, introduced alongside Schema Linking allow you to create namespaces within Schema Registry which ensures mirrored subjects don’t run into schema naming clashes. Schema Linking complements Cluster Linking by allowing you to copy all associated schemas alongside the data without needing a centralised Schema Registry which would otherwise be the case.
  12. Prefixing allows you to add a prefix to a topic and if desired the associated consumer group to avoid topic and consumer group naming clashes between the primary and DR cluster. This is important when used in an active-active setup and required to use a two way Cluster Link strategy which is the recommended approach and we’ll go into this later.
  13. So, how does this work in practice? Let’s start by discussing a standard active-passive setup
  14. First, starting with a standard cluster and an empty DR cluster we create our Cluster Linking rules, which can be used to copy any current or new topics matching our criteria. And, will replicate historic data as well as sync all future data in real time.
  15. Now, we experience our hypothetical failure of our primary cluster. During this scenario our monitoring alerts us of the failure and we manually (ideally scripted) trigger a failover using the REST API or CLI. During this, we then update our DNS entry to redirect all clients to the Disaster Recovery cluster and either restart our clients or wait for them to reconnect.
  16. A standard strategy is to “fail forward” promoting the DR region to be their new Primary Region, this is because: Cloud regions offer identical service; They already moved all of their applications & data systems to the DR region; and Failing back would introduce risk with little benefit. To fail forward, simply: Delete topics on original cluster (or spin up new cluster) Establish cluster link in reverse direction You may optionally implement a solution to retrieve the subset of data which had not yet been replicated at the time of the original failover.
  17. If it is required that you fail back to the original cluster, the solution is to wipe the original cluster, cluster link back until you’re synchronised, and then fail over back to the original cluster.
  18. As cluster linking is asynchronous it means that the consumer offsets which describe which records the consumer has processed may not yet have been replicated across at the time of failover. This may result in the consumer reprocessing these records when it fails over to the DR cluster.
  19. Bi-directional Cluster Linking is an alternative which vastly simplifies your DR strategy.
  20. Let’s start with our steady state, we: Create duplicate topics on both the Primary and DR cluster; Create prefixed cluster links in both directions; Produce records to clicks on the Primary cluster; and Consume records from all variants of the clicks topic on the Primary cluster using a regex pattern. Note, at this point data is being generated to the clicks topic in the primary cluster and replicated to west.clicks in the DR cluster. However, no data is being produced to the clicks topic in the DR cluster.
  21. Now, an outage occurs impacting the primary region. This: Brings down the producers and consumers in the primary region; and Temporarily pauses the cluster links. It’s important to note that a small amount of data may not yet have been replicated to your DR cluster at this point.
  22. To recover, we update our DNS entry to redirect all clients to the Disaster Recovery cluster and either restart our clients or wait for them to reconnect. It’s important to specify that we don’t change topic name or regex during this failover. The consumers will continue to consume pre-failover data in west.clicks and post-failover data in clicks, both from the DR cluster. It’s important to note that unlike the prior strategy we don’t delete the cluster links. You’ll also need to temporarily disable offset replication from clicks to west.clicks. This will stop the stale consumer offsets overwriting the new consumer offsets when the Primary Cluster is brought back online.
  23. When the outage is over we automatically recover the records which had yet to be replicated. This means that although the records will arrive out of order, assuming the primary cluster eventually recovers, your RPO=0. New records generated to the DR cluster will also automatically begin replicating to the primary.
  24. To fail back to the primary region, consumers need to pick up at the end of the writable topics, so: Ensure that all consumer groups have 0 consumer lag for their DR topics, e.g. west.clicks; Reset all consumer offsets to the last offsets (LEO) – this can be done by the platform operator; Finally, move consumers & producers back to the Primary cluster: Each producer / consumer group can be moved independently
  25. Now that we’ve moved our consumers back to the Primary Cluster we can re-enable consumer offset replication between clicks and west.clicks. Once consumer lag is 0 on east.clicks, reset all consumer groups to the Log End Offset (last offset of the partition) of “clicks” on the DR cluster.
  26. Bi-directional Cluster Linking is an alternative which vastly simplifies your DR strategy.
  27. Bi-Directional Cluster Linking easily translates to an active-active strategy as seen here where we use a load balancer to spread load across the clusters.
  28. Again, we see an outage which brings down our West cluster (formerly considered our primary cluster).
  29. Now we utilise our load balancer to re-route traffic to our East cluster.
  30. Finally once our West cluster recovers we re-route traffic back to it.
  31. Looking at Stretch Clusters A Stretch Cluster works by splitting a cluster over more than one data center. Some key features Stretch Clusters support are: RPO=0 and low RTO; As it’s a single cluster it doesn’t require failover / failback or syncing of metadata; and For the same reasons it supports all clients as well as transactions and maintains record ordering. Some of the challenges associated with using a Stretch Cluster are: It requires low latency connections between data centers. We recommend 50ms, but it can support up to 100ms; It increases end-to-end latency; and It doesn’t protect against certain types of failures, such as: Full cluster failures; or Deleting topics. There are a few different approaches to implementing Stretch Clusters and we’ll review them now.
  32. Let’s start with discussing what problem Stretch Clusters are solving.
  33. Let’s start with discussing what problem Stretch Clusters are solving.
  34. Let’s start with our steady state which is: An unknown number of brokers represented here by brokers one through four spread across two DCs; and A standard three node Zookeeper cluster spread across two DCs.
  35. We’ll also assume a replication-factor of three, min.insync.replicas of two and acks=all.
  36. Now, let’s look at what happens when DC “West” fails. But, let’s start by just focusing on Kafka.
  37. First, although we have no data loss, we are no longer able to meet min.insync.replicas of two, so we lose availability. We could drop min.insync.replicas to one but then if another broker failed during this period we would lose data.
  38. To resolve this, the immediate solution is to increase replication factor to four. Now when we lose the first two replicas we still have two replicas available to meet the min.insync.replicas requirement.
  39. But, if our 2 replicas are down or out of sync then we lose availability unless we trigger an unclean leader election and accept data loss.
  40. To resolve this, we increase min.insync.replicas to three and in the case of a failure scenario we roll back to min.insync.replicas of two.
  41. But… what about Zookeeper?
  42. We were ignoring it as a factor, but now that we’ve solved the Kafka component of the issue we need to consider Zookeeper’s behaviour as well.
  43. For a Zookeeper cluster to be considered available it needs a majority of its nodes, i.e. at least (n/2 + 1) with integer division, available. This majority is the quorum required to elect a leader or commit writes. For a three-node ensemble that means two nodes; here Zookeeper has two of its three nodes unavailable, so it cannot form quorum and is now offline.
  44. This is where our first Stretch Cluster design comes into play.
  45. Here, our steady state is: An unknown number of brokers, represented here by brokers one through four, spread across two DCs; Six Zookeeper nodes, three in each DC, one of which is an observer; and Replication factor of 4, min.insync.replicas of 3 and acks=all.
  46. Now, let’s assume DC “East”, which hosts our Zookeeper observer, has an outage. We still have three copies of the Zookeeper data allowing us to reach a “degraded” quorum. Consumers continue to operate as normal and producers continue to operate once we revert to min.insync.replicas=2.
  47. Now, let’s assume DC West has an outage. We can no longer reach Zookeeper quorum so new leaders can’t be elected.
  48. But… we can modify Zookeeper six’s configuration to change it to a standard follower. We must also update Zookeeper four, five and six’s configuration to remove Zookeeper one, two and three from the list of quorum participants and perform a rolling restart so they receive the new configuration.
  49. Now consumers continue to operate. Producers continue to operate once we revert to min.insync.replicas=2.
  50. Now let’s assume a network partition arises. Consumers continue to operate as usual up until they’ve consumed all fully replicated data. Producer will fail as we can no longer meet min.insync.replicas=3.
  51. We manually shutdown DC “East” then update min.insync.replicas=2. Clients resume operating as normal. Consumers failing over from DC “East” will consume some duplicate records.
  52. It’s important to raise the risk we run by using Zookeeper observers. While they solve our availability and split-brain issues, they risk data loss in the unlikely scenario that the data center with the observer is out of sync at the time the data center without one fails.
  53. Our second option is to use hierarchical quorum. Hierarchical quorum involves getting consensus between multiple Zookeeper “groups” which each form their own quorum.
  54. Here, our steady state is: An unknown number of brokers, represented here by brokers one through four, spread across two DCs; Six Zookeeper nodes, three in each DC, with each DC representing a Zookeeper group in the hierarchy; and Replication factor of 4, min.insync.replicas of 3 and acks=all.
  55. Now, let’s assume a DC outage on DC “East”. We still have three copies of the Zookeeper data; however, we only have consensus from one group, meaning we no longer meet the requirement for hierarchical quorum. Consumers continue to operate for leaders on DC “West”, but new leaders can’t be elected and configuration updates can’t be made until we have hierarchical quorum.
  56. To resolve this we must remove the DC “East” Zookeeper group from hierarchy then update min.insync.replicas to two.
  57. Now let’s assume a network partition arises. Consumers continue to operate as usual up until they’ve consumed all fully replicated data. Producer will fail as we can no longer meet min.insync.replicas=3 on either DC.
  58. We manually shutdown DC “East”, remove from the hierarchy & update min.insync.replicas=2 Clients now resume operating as normal, but, consumers failing over from DC “East” will consume some duplicate records.
  59. Next, let’s review the “gold standard” for Stretch Clusters which is 2.5 DCs.
  60. Just like our previous solution we use a minimum of four brokers split across two DCs, a replication factor of four, min.insync.replicas of three and acks=all. Where this solution differs is that instead of using Zookeeper observers, we spread the Zookeeper nodes across three DCs; ideally the DCs containing brokers should be located closest together. This ensures that under either a single network partition or a DC failure we can still always elect exactly one leader.
  61. We still have two copies of the Zookeeper data, allowing us to reach a “degraded” quorum. Consumers continue to operate. Producers continue to operate once we revert to min.insync.replicas=2.
  62. Now, let’s assume DC “West” becomes network partitioned from the rest of the cluster. Consumers connected to DC “East” continue to operate.
  63. Consumers connected to DC “West” continue to operate until they’ve processed all fully replicated records.
  64. Producers connected to DC “East” continue to operate once we revert to min.insync.replicas=2.
  65. Producers connected to DC “West” continue to operate once we shut down DC “West”, fail over and revert to min.insync.replicas=2.
  66. Let’s look at how Multi-Region Clusters can enhance a standard Stretch Cluster.
  67. Followers are normal replicas, however, observers act the same except that they are not considered for acks=all produce requests.
  68. As of CP v6.1 observers can be configured to be promoted according to the ObserverPromotionPolicy, including: Under-min-isr: Promoted if in-sync replica size drops below min.insync.replicas; Under-replicated: Promoted to cover any replica which is no longer in sync; Leader-is-observer: Promoted if the current leader is an observer
  69. In this example we extend our 2.5 DC Stretch Cluster with an additional Observer in each of the two primary DCs. Just like our previous solution we use a minimum of four brokers split across two DCs, a replication factor of four, min.insync.replicas of three and acks=all. Where this solution differs is that instead of using a hierarchical Zookeeper cluster, we spread the Zookeeper nodes across three DCs; ideally the DCs containing brokers should be located closest together. This ensures that under either a single network partition or a DC failure we can still always elect exactly one leader.
  70. In the case of an outage on DC “West”, our Observer in DC “East” gets promoted to a full-fledged replica, and as such we continue to meet min.insync.replicas of 3 and operate as usual.
  71. In the case of a network partition which separates DC “West” from DC “Central” and DC “East” the Observer in DC “East” is promoted to a full replica and all clients connected to it continue to operate as normal.
  72. The Observer in DC “West” cannot be promoted as it has no Zookeeper Quorum.
  73. Consumers connected to DC “West” continue to operate until they’ve processed all fully replicated records. Once we shutdown DC “West” the consumers will failover and consume from the same point. This will result in duplicate consumption.
  74. Producers connected to DC “West” fail as we can no longer meet min.insync.replicas=3.
  75. To continue operating as normal we must manually shut down DC “West”.
  76. Next, let’s review the 3 DC Stretch Cluster deployment.
  83. Finally, we’ll wrap up with a quick summary and recommendations.
  84. So, I’ll leave this here for you to review in your own time; however, the key details are: Cluster Linking should be considered the default solution; Stretch Clusters should be utilised when RPO=0 is required and cross-DC latency is acceptable; and MirrorMaker can be used if an open source solution is mandatory, but this is strongly advised against.
  85. Thanks for listening.