This document discusses improvements made to Kafka-on-Pulsar (KoP) for Pulsar 2.8.0 to take it to production at internet scale. Key updates include continuous offset support, a Kafka entry formatter, heap memory optimizations, exposed request metrics, and OAuth 2.0 authentication. KoP is now generally available and is used at Tencent, where it supports 60 trillion+ daily messages, through optimizations such as faster reboots, producer throttling, and continuous offsets.
Take Kafka-on-Pulsar to Production at Internet Scale: Improvements Made for Pulsar 2.8.0 - Pulsar Summit NA 2021
1. Take Kafka-on-Pulsar to production
at internet scale:
Improvements made for Pulsar 2.8.0
Yunze Xu
Software Engineer@StreamNative
Pulsar Summit North America 2021
2. 1. Why KoP (Kafka on Pulsar)?
2. Brief introduction to KoP
3. What's new for Pulsar 2.8.0
4. Practice of KoP in Tencent Big Data
Agenda
3. About Me
• StreamNative Software Engineer
• Apache Pulsar Committer
• KoP (Kafka on Pulsar) Core Maintainer
4. Kafka & Pulsar for pub-sub
(Diagram: producers publish to a topic hosted on a broker; the topic is split into partitions that consumers subscribe to.)
6. Migrate from Kafka to Pulsar
Infra Teams: Could you change your existing code to use Pulsar?
The responses may be:
• Sorry, I have other important things to do.
• Sorry, I use Kafka Connect to sync with an external system, but there's no equivalent connector for Pulsar.
• Pulsar adapters? Fine for me, but I write PHP.
• …
• …
8. How to use KoP
1. Put the KoP NAR file under the broker's protocols directory
2. Configure broker.conf or standalone.conf
3. Start the broker
For a quick start, you can just configure KoP on Pulsar standalone.
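Concretely, the three steps might look like this for a standalone quick start. The property names below (messagingProtocols, protocolHandlerDirectory, kafkaListeners) come from the KoP documentation; the NAR filename and listener address are placeholders:

```shell
# 1. Put the KoP NAR under the protocol handler directory
mkdir -p ./protocols
cp pulsar-protocol-handler-kafka-*.nar ./protocols/

# 2. Configure conf/standalone.conf (the same keys work in broker.conf)
cat >> conf/standalone.conf <<'EOF'
messagingProtocols=kafka
protocolHandlerDirectory=./protocols
kafkaListeners=PLAINTEXT://127.0.0.1:9092
EOF

# 3. Start the broker in standalone mode
bin/pulsar standalone
```

A Kafka client can then connect to 127.0.0.1:9092 as if it were a Kafka broker.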
9. Supported clients
The core tests are based on Kafka client 2.0.
The following clients are verified by basic end-to-end tests:
• Java client 1.0 ~ 2.6
• rdkafka-based clients
• golang-sarama
NOTE, currently:
• Kafka < 1.0 is not supported
• Records with multiple batches (<= v2 or >= v8) are not supported
10. How it works?
BrokerService manages all of the broker's resources:
• Producers, subscriptions (with consumers)
• Topics and the associated managed ledgers
• Built-in admin and clients (configured with the broker's authentication)
• …
Protocol handler lifecycle: Load → Validate → Configure → Start → Bind address
11. Topic & Partition
Pulsar:
persistent://public/default/my-topic-partition-0
The full Pulsar topic name encodes what a Kafka topic name leaves implicit:
• Whether to persist the message (persistent://)
• Tenant/Namespace (public/default)
• Short topic name (my-topic)
• Partition suffix (-partition-0)
12. Topic & Partition
• my-topic => persistent://public/default/my-topic
• Tenant-0/Namespace-0/Topic-0 => persistent://Tenant-0/Namespace-0/Topic-0
• xxx/my-topic => invalid topic name
• persistent://Tenant-0/Namespace-0/Topic-0 => persistent://Tenant-0/Namespace-0/Topic-0
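The mapping rules above can be sketched as a small helper; the function name and the default tenant/namespace parameters are illustrative, not KoP's actual API:

```python
def kafka_to_pulsar_topic(name, tenant="public", namespace="default"):
    """Map a Kafka topic name to a full Pulsar topic name."""
    if name.startswith("persistent://"):
        return name  # already a full Pulsar topic name
    parts = name.split("/")
    if len(parts) == 1:  # short name: fall back to the default tenant/namespace
        return f"persistent://{tenant}/{namespace}/{name}"
    if len(parts) == 3:  # tenant/namespace/topic
        return f"persistent://{name}"
    raise ValueError(f"invalid topic name: {name}")
```

Note that a two-part name like xxx/my-topic has no valid interpretation, so it is rejected.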
13. Produce & Fetch requests
Produce: Find PersistentTopic → Encode MemoryRecords → Write the bytes to the bookie via ManagedLedger
Fetch: Find PersistentTopic → Read bytes from the bookie → Decode the bytes to MemoryRecords
Q: What if KoP fails to find the PersistentTopic?
A: It returns a NOT_LEADER_FOR_PARTITION error to trigger the client's retry behavior.
14. What's new in 2.8.0
• Continuous Offset Support
• Kafka’s entry formatter
• Heap memory optimization
• Expose metrics for requests
• OAuth 2.0 authentication support
• More support for Kafka’s admin client
• Expose advertised listeners
15. Kafka entry formatter
Before 2.8.0:
• Decompress the batch
• Get single messages from the batch
• Convert each single message
• Reconstruct the metadata
Is it necessary? If a user migrates a service from Kafka to Pulsar, their clients should all be Kafka clients.
16. Kafka entry formatter
entryFormat=kafka
• Produce: only add the metadata header
• Consume: fill the offset and length fields of each entry, then merge these entries
(Diagram: Entry 1 … Entry N are merged into Bytes 1 … Bytes N of the response.)
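As a sketch of the consume-side work: in the Kafka record batch (v2) format the first 8 bytes are the big-endian baseOffset, so filling the offset means overwriting those bytes. The helper names are illustrative, and the length field and CRC are ignored here for brevity:

```python
import struct

def fill_base_offset(batch: bytes, base_offset: int) -> bytes:
    """Overwrite the baseOffset field (first 8 bytes, big-endian int64)
    of a Kafka record batch before returning it to the consumer."""
    return struct.pack(">q", base_offset) + batch[8:]

def merge_entries(entries, first_offset):
    """Merge entries by concatenating the patched batches.
    Each element of `entries` is (batch_bytes, num_messages)."""
    out, offset = b"", first_offset
    for batch, num_messages in entries:
        out += fill_base_offset(batch, offset)
        offset += num_messages
    return out
```

With entryFormat=kafka, no per-message decompression or conversion is needed on this path.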
17. Kafka entry formatter
Limitations:
✓ Kafka producer & Kafka consumer
✓ Pulsar producer & Pulsar consumer
✗ Kafka producer & Pulsar consumer
✗ Pulsar producer & Kafka consumer
We still add a Pulsar message metadata header before the Kafka message. In KoP, it adds a property to mark that the message's format is Kafka.
In the future, we'll handle this property on the Pulsar broker or client side.
18. Heap memory optimization
When I tested a producer, I found that the heap memory increased fast and triggered GC frequently.
However, Pulsar uses direct memory for messages, so this should not happen.
20. Heap memory optimization
Once you eliminate the impossible, whatever remains, no matter how
improbable, must be the truth.
by Sherlock Holmes
21. Heap memory optimization
When a Netty CompositeByteBuf that contains more than one component is converted to a NIO
buffer, it will be created in heap memory.
22. Heap memory optimization
Cause:
• BookKeeper’s JNI CRC32 algorithm needs a NIO buffer as the argument
• When KoP handles fetch request, it uses a CompositeByteBuf to wrap multiple ByteBuf
instances but it’s converted to a NIO buffer eventually.
Fix:
• BookKeeper: compute CRC32 checksum for each component of a CompositeByteBuf
• KoP: allocate a new buffer in direct memory to merge buffers
24. Heap memory optimization
After the previous fix, when the consumer calls the poll() method in a tight loop, the heap memory still increases fast.
The growth rate is not related to the produce rate, i.e. there is no difference between 50 MB/s and 200 MB/s.
25. Heap memory optimization
Handle FETCH request: Find PersistentTopic → Read bytes from the bookie → Decode the bytes to MemoryRecords
A FETCH request can contain many partitions' requests.
The real process was (before 2.8.0):
1. Find all PersistentTopics of each partition and collect them into a list.
2. Read the messages of each partition and collect them into a list.
3. Use Maps to cache intermediate results.
4. Decode the list of messages and create the response.
Though we already use Netty's object pool for each FETCH request handler's context, the temporary lists and maps are not allocated from the pool.
26. Heap memory optimization
Handle FETCH request (2.8.0): each partition is processed independently (find PersistentTopic → read bytes from the bookie → decode to MemoryRecords), and the per-partition results are assembled into the FETCH response.
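The per-partition fan-out can be sketched with futures; the names are illustrative, not KoP's actual classes:

```python
from concurrent.futures import ThreadPoolExecutor

def handle_fetch(partitions, find_topic, read_bytes, decode):
    """Process each partition independently instead of collecting
    intermediate lists/maps across all partitions first."""
    def per_partition(p):
        topic = find_topic(p)      # find PersistentTopic
        raw = read_bytes(topic)    # read bytes from bookie
        return p, decode(raw)      # decode to records
    with ThreadPoolExecutor() as pool:
        # Assemble the per-partition results into one response.
        return dict(pool.map(per_partition, partitions))
```

Each partition's pipeline runs end to end on its own, so no cross-partition temporary collections are needed.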
27. Heap memory optimization
The GC interval increases from 90 seconds to 120 seconds.
Though there is still room for FETCH request optimization to reduce GC.
28. Performance Test
Virtual machines from AWS-CN, using openmessaging-benchmark.
• Comparison: Kafka client and Pulsar client against the same KoP cluster.
• Goal: the performance of the Kafka client (to KoP) is close to that of the Pulsar client (to the broker).
KoP is for users who want to use Pulsar but don't want to change their client.
29. Performance: Latency
Producer only, 1 topic, batch size 1 MiB, max batching latency 1 ms, 100 MiB/s produce rate.
The gap is acceptable because there is still some overhead:
• Validating Kafka records before writing to BK.
• Network framework design (request queue).
30. Performance: Latency
Both producer and consumer are created.
As I’ve said before, there is still room for FETCH request optimization to reduce GC.
31. Performance: Throughput
Disk throughput of an AWS i3.4xlarge NVMe SSD.
Adjust BookKeeper's config so that files are not rolled too frequently.
Default: 2 GiB, now: 20 GiB.
33. Metrics
See more details at https://github.com/streamnative/kop/blob/master/docs/reference-metrics.md
34. Summary & Plan
In short, KoP will be generally available for production from 2.8.0
• Continuous offset
• Performance improvement
• Metrics support
Plans
• More improvements on performance
• Support Kafka client 0.10.x.y and 0.11.x.y
• Authorization support
36. Practice of Kafka on Pulsar in
Tencent Big Data
Dawei Zhang
Senior Software Engineer at Tencent
Kafka-on-Pulsar maintainer
Apache Incubator-InLong committer
Pulsar Summit North America 2021
37. 1. Why KoP ?
2. Problems in KoP
3. Speedup Reboot
4. Publish throttle
5. Continuous offset
6. KoP in production
Agenda
38. Background
Message Queue Team
A part of the Tencent Big Data Working Group.
Provides MQ service and management, supporting 60 trillion+ messages daily.
MQ options:
- TubeMQ (core module of Apache Incubator-InLong) for high throughput
- Pulsar for high reliability and high consistency
- KoP for Kafka protocol messages (latest)
39. Why KoP?
Open Source & Collaboration In Tencent
Kafka: difficult to operate and maintain at large scale
- Performance degrades as topics/partitions grow
- Data migration is required when scaling out
KoP: reuses Pulsar capabilities
- Scalability
- Failover
- Isolation for reads & writes, etc.
Seamless migration from Kafka to KoP
- Without any code changes
40. Problems
• Poor performance
• Kafka admin CLI tools are not supported well
• No metrics for either producer or consumer
• Frequent OutOfDirectMemoryError
• The reboot process gets slower over time
• Consumer lag is not accurate
• The KoP cluster becomes unavailable after running for a long time
• …
41. Speedup reboot
The offset topic contains more and more data, so the reboot gets slower.
Fix: define the max retention time; the retained data is reduced by compacting the offset topic.
42. Producer throttle in Pulsar
Counter for entry memory usage:
• When the broker receives an entry from a producer, the counter increases by the entry size.
• When the broker writes the entry to a bookie, the counter decreases by the entry size.
• If the counter exceeds maxMessagePublishBufferSizeInMB, throttle all producers.
43. Producer throttle in KoP
Reuse the counter of the Pulsar broker:
• When KoP receives an entry from a Kafka producer, the counter increases by the entry size.
• When KoP writes the entry to a bookie, the counter decreases by the entry size.
• If the counter exceeds maxMessagePublishBufferSizeInMB, throttle all Kafka producers.
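The counter logic can be sketched as follows; the resume-at-half threshold is an assumption for illustration, not necessarily Pulsar's exact policy:

```python
class PublishBufferThrottle:
    """Toy model of the broker's publish-buffer counter."""
    def __init__(self, max_buffer_bytes):
        self.max_buffer_bytes = max_buffer_bytes
        self.pending_bytes = 0
        self.throttled = False

    def on_entry_received(self, entry_size):
        # Broker/KoP receives an entry from a producer.
        self.pending_bytes += entry_size
        if self.pending_bytes > self.max_buffer_bytes:
            self.throttled = True  # stop reading from producer connections

    def on_entry_persisted(self, entry_size):
        # The entry has been written to the bookie.
        self.pending_bytes -= entry_size
        # Resume once the buffer drains below half (assumed threshold).
        if self.throttled and self.pending_bytes <= self.max_buffer_bytes // 2:
            self.throttled = False
```

Because KoP reuses the broker's counter, Kafka producers and Pulsar producers share the same memory budget.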
44. How to find a message?
Map between the Kafka offset and the Pulsar message id:
• Kafka offset: a single 64-bit integer
• Pulsar message id: ledgerId + entryId + batchIndex, packed into 20 + 32 + 12 bits
Problems:
● Offset holes
- Not friendly to third-party systems that depend on continuous offsets
- Consumer lag is not accurate
● If the ledgerId overflows, the KoP cluster becomes unavailable
● If the entryId/batchIndex overflows, the offset is not accurate
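The 20 + 32 + 12 bit packing can be sketched as follows, which also makes the overflow problem concrete: any ledgerId at or above 2^20 simply cannot be represented.

```python
LEDGER_BITS, ENTRY_BITS, BATCH_BITS = 20, 32, 12

def message_id_to_offset(ledger_id, entry_id, batch_index):
    # Overflow checks: out-of-range ids cannot be represented at all.
    assert ledger_id < (1 << LEDGER_BITS), "ledgerId overflow"
    assert entry_id < (1 << ENTRY_BITS), "entryId overflow"
    assert batch_index < (1 << BATCH_BITS), "batchIndex overflow"
    return ((ledger_id << (ENTRY_BITS + BATCH_BITS))
            | (entry_id << BATCH_BITS)
            | batch_index)

def offset_to_message_id(offset):
    batch_index = offset & ((1 << BATCH_BITS) - 1)
    entry_id = (offset >> BATCH_BITS) & ((1 << ENTRY_BITS) - 1)
    ledger_id = offset >> (ENTRY_BITS + BATCH_BITS)
    return ledger_id, entry_id, batch_index
```

Offsets produced this way also jump whenever the ledger or entry changes, which is exactly the "offset holes" problem.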
45. Continuous offset: PIP 70
Append an offset to every entry on the broker side (BrokerEntryMetadata):
• Every topic partition has a monotonically increasing offset, starting from 0
• Every entry has an offset added on the broker side
• Offset of an entry + number of messages in the entry = offset of the next entry
• A message id can be found from a given offset, and vice versa
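The invariant "offset of an entry + message count = offset of the next entry" can be sketched with a toy index; the class and method names are illustrative:

```python
import bisect

class PartitionOffsetIndex:
    """Toy index mapping continuous offsets to entries, following the
    PIP 70 invariant: next entry offset = entry offset + message count."""
    def __init__(self):
        self.entry_offsets = []  # offset of each entry, strictly increasing
        self.next_offset = 0     # partition offset starts from 0

    def append_entry(self, num_messages):
        """Record a new entry and return the offset assigned to it."""
        self.entry_offsets.append(self.next_offset)
        self.next_offset += num_messages
        return self.entry_offsets[-1]

    def find_entry(self, offset):
        """Return the index of the entry containing the given offset."""
        if not 0 <= offset < self.next_offset:
            raise IndexError("offset out of range")
        return bisect.bisect_right(self.entry_offsets, offset) - 1
```

Because offsets are dense, the lookup in either direction is a simple binary search.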
46. Continuous offset: PIP 70
Original protocol:
[PROTOCOL_HEAD] [MAGIC_NUMBER] [CHECKSUM] [METADATA_SIZE] [METADATA] [PAYLOAD]
New protocol (add broker entry metadata):
[PROTOCOL_HEAD] [BROKER_ENTRY_META_MAGIC_NUMBER] [BROKER_ENTRY_META_SIZE] [BROKER_ENTRY_METADATA] [MAGIC_NUMBER] [CHECKSUM] [METADATA_SIZE] [METADATA] [PAYLOAD]
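The prepend-and-strip framing can be sketched as follows. The magic value and the 2-byte/4-byte field widths here are illustrative assumptions, not the exact wire format from PIP 70:

```python
import struct

# Illustrative values only; not the real PIP 70 wire format.
BROKER_ENTRY_META_MAGIC = 0x0E02

def add_broker_entry_metadata(meta: bytes, headers_and_payload: bytes) -> bytes:
    """Prepend [magic][size][metadata] to the original entry bytes."""
    header = struct.pack(">HI", BROKER_ENTRY_META_MAGIC, len(meta))
    return header + meta + headers_and_payload

def strip_broker_entry_metadata(buf: bytes):
    """Split an entry into (broker entry metadata, original bytes).
    Returns (None, buf) for an old-format entry without the magic number."""
    magic, size = struct.unpack_from(">HI", buf, 0)
    if magic != BROKER_ENTRY_META_MAGIC:
        return None, buf
    meta = buf[6:6 + size]
    return meta, buf[6 + size:]
```

The magic number lets a reader distinguish new-format entries from entries written before the upgrade.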
47. Continuous offset: PIP 70
Before: the client sends a Message (Metadata + Payload) to the broker, and the broker writes the Message to the bookie unchanged.
After: the client still sends a Message, but the broker prepends BrokerEntryMetadata and writes a MessageWithBrokerEntryMetadata to the bookie.
48. Continuous offset: KoP implementation
Produce:
1. Receive the produce request
2. Build the entry metadata
3. Put the entry metadata ahead of the original entry
4. Write the new message to the bookie
Fetch:
1. Receive the fetch request
2. Find the message id for the fetch offset
3. Build a cursor for the offset
4. Read entries using the cursor
5. Parse the offset from the entry metadata and return the fetch response
ListOffset:
1. Find the entry by timestamp
2. Get the entry metadata
3. Get the offset from the entry metadata
KoP can interact with all components of the broker, like the managed ledger, which uses a BookKeeper client to access bookies, and the load balancer, which determines topic ownership. It can also use the internal ZooKeeper cache to access ZooKeeper for metadata.
You can use Pulsar standalone with KoP for a quick verification of your Kafka client.
Before 2.8.0, the offset is not continuous, which may cause many problems. My partner will talk about how we implemented it.
I'll talk about these two improvements in detail because they're related to performance.
For the rest, I'll just give a brief introduction.
With the Kafka entry formatter (not the default), you can get better performance.
From the dump analysis, we can see there are a lot of messages in the heap. Each byte array is a message whose size is one mebibyte.
Parallelize the fetch handler and don't use lists and maps to cache intermediate results.
There's still room for improvement to reduce heap memory.
There are some blogs that compare the performance of Kafka and Pulsar.
Bookie and broker are deployed on the same machine to reduce network traffic.
Ensure the bookie's journal disk is an SSD.
Ensure the machines of the broker and client have high (10 Gbit/s) network bandwidth.
As we can see, there's a spike in the 99th-percentile publish latency. It's caused by GC.
Regarding continuous offsets, let my partner talk about them in detail.