Two years ago, we helped contribute a framework for exactly-once semantics (EOS) to Apache Kafka. This much-needed feature brought transactional guarantees to stream processing engines such as Kafka Streams. In this talk, we will recount the journey since then and the lessons we have learned as adoption has gradually picked up steam. What did we get right, and what did we get wrong? Most importantly, we will discuss how the work continues to evolve to provide more reliability and better performance. This talk assumes basic familiarity with Kafka and the log abstraction. You will come away with a deeper understanding of the architecture underlying EOS in Kafka, its limitations, and how you can use it to solve problems.
43-48. [Figure: Committing a transaction. The processor consumes input records A to E, writes A’ and B’ to the output partition (O), and advances its input position (P) to 2. Steps: BeginTxn, AddPartition, Write, CommitTxn. Transaction log entries (status, partitions): ongoing (O), ongoing (O, P), prepare commit (O, P), done commit ().]
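The commit walkthrough in slides 43-48 can be modeled as a small state machine. The sketch below is a simplified, in-memory model under stated assumptions: the class `TxnLog` and its methods are hypothetical, not Kafka's coordinator code, and the writing of commit markers to data partitions is not modeled. It appends one entry per step, in the same "status (partitions)" form as the figure. Logging "prepare commit" before any markers are written is what lets a coordinator finish the commit after a crash, since the decision is already durable.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, in-memory model of the transaction log entries in slides 43-48.
// Each entry is rendered as "<status> (<partitions>)", e.g. "ongoing (O, P)".
final class TxnLog {
    private final List<String> entries = new ArrayList<>();
    private final Set<String> partitions = new LinkedHashSet<>(); // keeps insertion order
    private boolean inTxn = false;

    // BeginTxn: in this model nothing is logged until the first partition is added.
    void beginTxn() {
        if (inTxn) throw new IllegalStateException("transaction already in progress");
        inTxn = true;
        partitions.clear();
    }

    // AddPartition: the first write to a partition registers it with the transaction.
    void addPartition(String partition) {
        if (!inTxn) throw new IllegalStateException("no transaction in progress");
        if (partitions.add(partition)) {
            entries.add("ongoing " + format(partitions));
        }
    }

    // CommitTxn: log the decision first ("prepare commit"), then mark the transaction
    // done once commit markers reach every partition (marker writes are not modeled).
    void commitTxn() {
        if (!inTxn) throw new IllegalStateException("no transaction in progress");
        entries.add("prepare commit " + format(partitions));
        entries.add("done commit ()");
        partitions.clear();
        inTxn = false;
    }

    List<String> entries() {
        return List.copyOf(entries);
    }

    private static String format(Set<String> ps) {
        return "(" + String.join(", ", ps) + ")";
    }
}
```

Calling `beginTxn()`, `addPartition("O")`, `addPartition("P")`, and `commitTxn()` produces the four entries from the figure: `ongoing (O)`, `ongoing (O, P)`, `prepare commit (O, P)`, and `done commit ()`.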
51-56. [Figure: Aborting a transaction on timeout. After the first transaction ends with done commit (), the processor writes C’, D’, and E’ and its position advances to 5, with log entries ongoing (O) and ongoing (O, P). A Timeout triggers AbortTxn, and the log adds prepare abort (O, P) and then finish abort ().]
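Slides 51-56 show the other path: a transaction that stays open too long is aborted by the coordinator. The sketch below is a hypothetical model (`TimeoutTxnLog` and its methods are invented for illustration); its timeout plays the role of the producer's `transaction.timeout.ms` setting, and the current time is passed in explicitly so the behavior is deterministic.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of a coordinator aborting a transaction that outlives its timeout.
// Time is passed in explicitly so the behavior is deterministic.
final class TimeoutTxnLog {
    private final long timeoutMs;
    private final List<String> entries = new ArrayList<>();
    private final Set<String> partitions = new LinkedHashSet<>();
    private long startMs = -1; // -1 means no transaction is open

    TimeoutTxnLog(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    void beginTxn(long nowMs) {
        if (startMs >= 0) throw new IllegalStateException("transaction already open");
        startMs = nowMs;
        partitions.clear();
    }

    void addPartition(String partition) {
        if (startMs < 0) throw new IllegalStateException("no transaction open");
        if (partitions.add(partition)) {
            entries.add("ongoing " + format(partitions));
        }
    }

    // Checked periodically by the coordinator; returns true if it aborted the transaction.
    boolean abortIfExpired(long nowMs) {
        if (startMs < 0 || nowMs - startMs < timeoutMs) return false;
        entries.add("prepare abort " + format(partitions));
        // Abort markers would be written to each partition here (not modeled).
        entries.add("finish abort ()");
        partitions.clear();
        startMs = -1;
        return true;
    }

    List<String> entries() {
        return List.copyOf(entries);
    }

    private static String format(Set<String> ps) {
        return "(" + String.join(", ", ps) + ")";
    }
}
```

With a 60-second timeout, a transaction begun at time 0 survives a check at 30 seconds and is aborted by a check at 60 seconds, leaving `prepare abort (O, P)` followed by `finish abort ()` in the log.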
62.
[Figure: Processors 1, 2, and 3, each reading from the input and writing to a shared output.]
“Single writer” does not mean that an output partition has only one writer.
63.
The guarantee we need is that there is a single writer tied to each input partition.
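One way to tie a single writer to each input partition is to derive the writer's identity from the partition itself, so that whichever processor currently owns the partition uses the same identity. Kafka Streams' original exactly-once mode follows a similar idea, with one transactional producer per task. The naming scheme below (`appId`, the `-` separator, the `WriterIds` class) is a hypothetical illustration, not a Kafka convention.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical naming scheme: the writer identity depends only on the input
// partition, never on which processor happens to own it.
final class WriterIds {
    private WriterIds() {}

    static String transactionalIdFor(String appId, String topic, int partition) {
        return appId + "-" + topic + "-" + partition;
    }

    // The writer IDs a processor must use for its current assignment.
    static Map<Integer, String> idsForAssignment(String appId, String topic, List<Integer> assigned) {
        Map<Integer, String> ids = new LinkedHashMap<>();
        for (int partition : assigned) {
            ids.put(partition, transactionalIdFor(appId, topic, partition));
        }
        return ids;
    }
}
```

Because the ID depends only on the partition, a processor that takes over partition 0 after a rebalance uses `app-input-0`, exactly as the previous owner did. The cost of this design is one transactional producer per input partition.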
68-74. [Figure: Processors 1 and 2 each run a Read, Process, Write loop (ReadPosition, WritePosition) over input records A to F.]
● Processor 1 writes A’ (position 1), then B’ and C’ (position 3)
● Transaction committing but not complete
● Input partition reassigned to processor 2
● Processor 2 reads latest committed position of 1 and writes B’ (position 2)
● Transaction from processor 1 completes, leaving A’ B’ C’ B’ in the output
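The failure in slides 68-74 can be replayed in a few lines. This is a toy model with hypothetical names (`ZombieReplay`, `processFromCommitted`): there are no brokers or real transactions, only an output log and a committed input position. Because processor 1's commit of position 3 is not yet visible when processor 2 takes over, both processors start from position 1, and B’ is produced twice.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical replay of slides 68-74 with no fencing: just an output log and
// a committed input position, with no brokers or real transactions.
final class ZombieReplay {
    static final List<String> INPUT = List.of("A", "B", "C", "D", "E", "F");

    // Output records in log order; A' was committed before the scenario starts.
    final List<String> output = new ArrayList<>(List.of("A'"));
    // Last committed input position (next offset to read). Processor 1's pending
    // commit of position 3 is not visible until its transaction completes.
    final int committedPosition = 1;

    // A processor reads `count` records from the committed position and writes their outputs.
    void processFromCommitted(int count) {
        for (int i = committedPosition; i < committedPosition + count; i++) {
            output.add(INPUT.get(i) + "'");
        }
    }

    static List<String> run() {
        ZombieReplay r = new ZombieReplay();
        r.processFromCommitted(2); // processor 1: B', C' (commit still in flight)
        r.processFromCommitted(1); // processor 2 after reassignment: B' again
        return r.output;           // processor 1's transaction later completes, keeping both
    }
}
```

`ZombieReplay.run()` returns `[A', B', C', B']`, matching the output in slide 74. Nothing in this model stops processor 1 from completing its commit after processor 2 has started, which is the gap that fencing closes.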
76-83. [Figure: Processor 1 and Processor 1’ each run a Read, Process, Write loop (ReadPosition, WritePosition) over input records A to F.]
● Processor 1 has an ongoing transaction (output A’ B’ C’, position 1)
● Processor 1 is partitioned from the cluster
● Partition reassigned to processor 2
● Transaction is aborted
● Processor 1’ writes B’ and position 2
● Processor 1 is able to communicate again; the output now shows A’ B’ C’ B’ D’
84. Transactional ID
● Configured by the producer `transactional.id` property
● Defines a single-writer scope
● Enforced by a monotonically increasing epoch
● Initialization protocol that awaits completion of any pending transaction
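The epoch rule for a transactional ID can be sketched in a few lines. This is a hypothetical model (`EpochFencer` is invented here), not broker code; numbering epochs from 1 simply matches the `epoch=1` and `epoch=2` labels in slides 90-92. In Kafka, a producer that has been fenced this way fails with a `ProducerFencedException`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical coordinator-side view of fencing: the latest epoch per transactional ID.
final class EpochFencer {
    private final Map<String, Integer> epochs = new HashMap<>();

    // Initialization bumps the epoch (starting at 1), which makes every earlier
    // producer instance with the same transactional ID stale.
    int initProducer(String transactionalId) {
        return epochs.merge(transactionalId, 1, (current, ignored) -> current + 1);
    }

    // Only the producer holding the current epoch may write.
    boolean tryWrite(String transactionalId, int epoch) {
        Integer current = epochs.get(transactionalId);
        return current != null && current == epoch;
    }
}
```

Two `initProducer("A")` calls return 1 and then 2; afterwards `tryWrite("A", 1)` is false (the old instance is fenced) and `tryWrite("A", 2)` is true.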
85. [Flowchart: Transactional Id Initialization. Steps: Bump Epoch; Is Transaction In Progress? (Yes: Begin Abort); Is Transaction Completing? (Yes: Await Completion); Return New Epoch.]
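One reading of the Transactional Id Initialization flowchart is sketched below: bump the epoch, abort a transaction that is still in progress, wait for any transaction that is already completing, then return the new epoch. The class, its `State` names (loosely modeled on coordinator states), and the synchronous `awaitCompletion` are assumptions for illustration; a real coordinator completes transactions asynchronously.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the "Transactional Id Initialization" flowchart.
final class TxnIdInitializer {
    enum State { EMPTY, ONGOING, PREPARE_COMMIT, PREPARE_ABORT, COMPLETE_COMMIT, COMPLETE_ABORT }

    static final class TxnMetadata {
        int epoch = 0;
        State state = State.EMPTY;
    }

    private final Map<String, TxnMetadata> txns = new HashMap<>();

    TxnMetadata metadata(String transactionalId) {
        return txns.computeIfAbsent(transactionalId, id -> new TxnMetadata());
    }

    int initialize(String transactionalId) {
        TxnMetadata txn = metadata(transactionalId);
        // Bump Epoch: any producer still using the old epoch is fenced from here on.
        txn.epoch++;
        // Is Transaction In Progress? -> Begin Abort
        if (txn.state == State.ONGOING) {
            txn.state = State.PREPARE_ABORT;
        }
        // Is Transaction Completing? -> Await Completion
        if (txn.state == State.PREPARE_COMMIT || txn.state == State.PREPARE_ABORT) {
            awaitCompletion(txn);
        }
        // Return New Epoch
        return txn.epoch;
    }

    // Simplification: the pending transaction completes immediately instead of
    // waiting for its markers to be written.
    private static void awaitCompletion(TxnMetadata txn) {
        txn.state = (txn.state == State.PREPARE_COMMIT) ? State.COMPLETE_COMMIT : State.COMPLETE_ABORT;
    }
}
```

Note the asymmetry: an in-progress transaction is aborted, but one already in prepare commit is allowed to finish as a commit, because the decision to commit has already been recorded.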
90-92. [Figure: A consumer group of Processors 1, 2, and 3 reads input A B C, D E F, and H I J; the processors' producers use txnl.id=A, txnl.id=B, and txnl.id=C, each at epoch=1. In the final build, txnl.id=A is at epoch=2 while txnl.id=B and txnl.id=C remain at epoch=1.]
121. What is the problem?
● The transactional producer assumes a static assignment of input partitions
● Consumer group partition assignments are dynamic
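122. What is the problem?
consumer.assign(partitions);
producer.initTransactions();
while (true) {
    // process() and nextOffsets() are hypothetical application helpers.
    ConsumerRecords<String, String> input = consumer.poll(Duration.ofMillis(100));
    List<ProducerRecord<String, String>> output = process(input);
    producer.beginTransaction();
    output.forEach(producer::send);
    producer.sendOffsetsToTransaction(nextOffsets(input), groupId);
    producer.commitTransaction();
}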
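124. What is the problem?
consumer.subscribe(topics);
producer.initTransactions();
while (true) {
    // process() and nextOffsets() are hypothetical application helpers.
    ConsumerRecords<String, String> input = consumer.poll(Duration.ofMillis(100));
    List<ProducerRecord<String, String>> output = process(input);
    producer.beginTransaction();
    output.forEach(producer::send);
    producer.sendOffsetsToTransaction(nextOffsets(input), groupId);
    producer.commitTransaction();
}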
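The mismatch between `consumer.subscribe(topics)` and a static `transactional.id` can be shown with the same kind of epoch bookkeeping. In this hypothetical model, each processor has a fixed transactional ID (the names `processor-1` and `processor-2` and the `StaticIdProblem` class are invented), and fencing happens only per ID. When a rebalance moves an input partition from processor 1 to processor 2, nothing bumps processor 1's epoch, so a zombie processor 1 is not fenced.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model: fencing is tracked per transactional ID, but with
// consumer.subscribe() the partitions behind an ID change at every rebalance.
final class StaticIdProblem {
    private final Map<String, Integer> epochs = new HashMap<>();

    int initProducer(String transactionalId) {
        return epochs.merge(transactionalId, 1, (current, ignored) -> current + 1);
    }

    boolean canWrite(String transactionalId, int epoch) {
        return epochs.getOrDefault(transactionalId, 0) == epoch;
    }

    public static void main(String[] args) {
        StaticIdProblem coordinator = new StaticIdProblem();
        // Each processor has its own static transactional.id.
        int processor1Epoch = coordinator.initProducer("processor-1");
        coordinator.initProducer("processor-2");
        // A rebalance moves input partition 0 from processor 1 to processor 2.
        // Processor 2 keeps its own ID, so nothing bumps processor 1's epoch.
        boolean zombieCanWrite = coordinator.canWrite("processor-1", processor1Epoch);
        System.out.println("Processor 1 can still commit for partition 0: " + zombieCanWrite);
    }
}
```

Running `main` prints `Processor 1 can still commit for partition 0: true`: the fencing scope is the producer, while the thing that moved is the partition.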
132. Options
1. Allow the producer to multiplex many transactional IDs
2. Producer pooling for better resource sharing
3. Address the assignment dependency problem
144. KIP-447 Recipe
1. Use the shared group id to find the transaction coordinator
2. Make the transaction coordinator aware of group partition assignments
3. Add logic to initialize and fence with consideration of assignment
146. [Flowchart: Transaction Assignment Initialization. For all assigned partitions: Bump Epoch; Is Transaction In Progress? (Yes: Begin Abort); Is Transaction Completing? (Yes: Await Completion); Return New Epoch.]
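The Transaction Assignment Initialization flowchart runs the same steps for every assigned partition, so fencing follows the partition rather than the producer. The sketch below keeps only the Bump Epoch step and tracks one epoch per input partition; `AssignmentFencer` and its methods are hypothetical, and this is a toy model of the recipe on the slides, not Kafka's implementation.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of assignment-aware initialization: the flowchart runs
// "for all assigned partitions", so fencing follows the partition, not the producer ID.
final class AssignmentFencer {
    private final Map<Integer, Integer> partitionEpochs = new HashMap<>();

    // Returns the epoch the caller must present for each of its assigned partitions.
    Map<Integer, Integer> initialize(List<Integer> assignedPartitions) {
        Map<Integer, Integer> granted = new HashMap<>();
        for (int p : assignedPartitions) {
            // Bump Epoch for this partition (aborting or awaiting a pending
            // transaction is omitted from this sketch).
            granted.put(p, partitionEpochs.merge(p, 1, (old, unused) -> old + 1));
        }
        return granted;
    }

    // A write for a partition is accepted only with that partition's current epoch.
    boolean canWrite(int partition, int epoch) {
        return partitionEpochs.getOrDefault(partition, 0) == epoch;
    }
}
```

If processor 1 initializes with partitions 0 and 1 and a rebalance then gives partition 0 to processor 2, processor 2's initialization bumps partition 0's epoch: processor 1 is fenced on partition 0 but can still write for partition 1, which is the behavior the static-ID model could not provide.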
147. KIP-447 Recipe
1. Use the shared group id to find the transaction coordinator
2. Make the transaction coordinator aware of group partition assignments
3. Add logic to initialize and fence with consideration of assignment
4. Expose all this in a nice API
152.
kafka-producer-network-thread | producer-1] ERROR o.a.k.clients.producer.internals.Sender - [Producer clientId=producer-1] The broker returned org.apache.kafka.common.errors.UnknownProducerIdException: This exception is raised by the broker if it could not locate the producer metadata associated with the producerId in question. This could happen if, for instance, the producer's records were deleted because their retention time had elapsed. Once the last records of the producerId are removed, the producer's metadata is removed from the broker, and future appends by the producer will return this exception. for topic-partition foo-0 at offset -1. This indicates data loss on the broker, and should be investigated.
164. Repartition Topics
- Used to change the partition key in Kafka Streams
- Once repartitioned data has been consumed, it is no longer needed
- Proactively delete unneeded data!
196.
Kafka has an elegant transaction model that has held up well.
To reach full maturity:
● Address the producer/consumer semantic mismatch
● Improve producer resilience to errors