This document discusses improvements made to Kafka-on-Pulsar (KoP) for Pulsar 2.8.0 to take it to production at internet scale. Key updates include continuous offset support, a Kafka entry formatter, heap memory optimizations, exposed request metrics, and OAuth 2.0 authentication. KoP is now generally available and is used at Tencent, where it supports 60 trillion+ daily messages, through optimizations such as faster reboots, producer throttling, and continuous offsets.
Take Kafka-on-Pulsar to Production at Internet Scale: Improvements Made for Pulsar 2.8.0 - Pulsar Summit NA 2021
1. Take Kafka-on-Pulsar to production
at internet scale:
Improvements made for Pulsar 2.8.0
Yunze Xu
Software Engineer@StreamNative
Pulsar Summit North America 2021
2. 1. Why KoP (Kafka on Pulsar)?
2. Brief introduction to KoP
3. What's new for Pulsar 2.8.0
4. Practice of KoP in Tencent Big Data
Agenda
3. About Me
• StreamNative Software Engineer
• Apache Pulsar Committer
• KoP (Kafka on Pulsar) Core Maintainer
4. Kafka & Pulsar for pub-sub
(Diagram: producers publish to a topic hosted on a broker; the topic is split into partitions that consumers subscribe to.)
6. Migrate from Kafka to Pulsar
Infra Teams: Could you change your existing code to use Pulsar?
The responses may be:
• Sorry, I have other important things to do.
• Sorry, I use Kafka Connect to sync with an external system, but there's no equivalent connector for Pulsar.
• Pulsar adapters? Fine for me, but I write PHP.
• …
• …
8. How to use KoP
1. Put the KoP NAR file under the broker's protocols directory
2. Configure broker.conf or standalone.conf
3. Start the broker
For a quick start, you can just configure KoP on Pulsar standalone.
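Concretely, the three steps might look like this for a standalone quick start. The property names below (messagingProtocols, protocolHandlerDirectory, kafkaListeners) come from the KoP documentation; the NAR filename and listener address are placeholders:

```shell
# 1. Put the KoP NAR under the protocol handler directory
mkdir -p ./protocols
cp pulsar-protocol-handler-kafka-*.nar ./protocols/

# 2. Configure conf/standalone.conf (the same keys work in broker.conf)
cat >> conf/standalone.conf <<'EOF'
messagingProtocols=kafka
protocolHandlerDirectory=./protocols
kafkaListeners=PLAINTEXT://127.0.0.1:9092
EOF

# 3. Start the broker in standalone mode
bin/pulsar standalone
```

A Kafka client can then connect to 127.0.0.1:9092 as if it were a Kafka broker.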
9. Supported clients
The core tests are based on Kafka client 2.0.
The following clients are verified by basic end-to-end tests:
• Java client 1.0 ~ 2.6
• rdkafka-based clients
• golang-sarama
NOTE, currently:
• Kafka < 1.0 is not supported
• Records with multiple batches (<= v2 or >= v8) are not supported
10. How it works?
BrokerService manages all of the broker's resources:
• Producers, subscriptions (with consumers)
• Topics and the associated managed ledgers
• Built-in admin and clients (configured with the broker's authentication)
• …
Protocol handler lifecycle: Load → Validate → Configure → Start → Bind address
11. Topic & Partition
Pulsar:
persistent://public/default/my-topic-partition-0
The full Pulsar topic name encodes what a Kafka topic name leaves implicit:
• Whether to persist the message (persistent://)
• Tenant/Namespace (public/default)
• Short topic name (my-topic)
• Partition suffix (-partition-0)
12. Topic & Partition
• my-topic => persistent://public/default/my-topic
• Tenant-0/Namespace-0/Topic-0 => persistent://Tenant-0/Namespace-0/Topic-0
• xxx/my-topic => invalid topic name
• persistent://Tenant-0/Namespace-0/Topic-0 => persistent://Tenant-0/Namespace-0/Topic-0
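The mapping rules above can be sketched as a small helper; the function name and the default tenant/namespace parameters are illustrative, not KoP's actual API:

```python
def kafka_to_pulsar_topic(name, tenant="public", namespace="default"):
    """Map a Kafka topic name to a full Pulsar topic name."""
    if name.startswith("persistent://"):
        return name  # already a full Pulsar topic name
    parts = name.split("/")
    if len(parts) == 1:  # short name: fall back to the default tenant/namespace
        return f"persistent://{tenant}/{namespace}/{name}"
    if len(parts) == 3:  # tenant/namespace/topic
        return f"persistent://{name}"
    raise ValueError(f"invalid topic name: {name}")
```

Note that a two-part name like xxx/my-topic has no valid interpretation, so it is rejected.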
13. Produce & Fetch requests
Produce: Find PersistentTopic → Encode MemoryRecords → Write the bytes to the bookie via ManagedLedger
Fetch: Find PersistentTopic → Read bytes from the bookie → Decode the bytes to MemoryRecords
Q: What if KoP fails to find the PersistentTopic?
A: It returns a NOT_LEADER_FOR_PARTITION error to trigger the client's retry behavior.
14. What's new in 2.8.0
• Continuous Offset Support
• Kafka’s entry formatter
• Heap memory optimization
• Expose metrics for requests
• OAuth 2.0 authentication support
• More support for Kafka’s admin client
• Expose advertised listeners
15. Kafka entry formatter
Before 2.8.0:
• Decompress the batch
• Get single messages from the batch
• Convert each single message
• Reconstruct the metadata
Is it necessary? If a user migrates a service from Kafka to Pulsar, their clients should all be Kafka clients.
16. Kafka entry formatter
entryFormat=kafka
• Produce: only add the metadata header
• Consume: fill the offset and length fields of each entry, then merge these entries
(Diagram: Entry 1 … Entry N are merged into Bytes 1 … Bytes N of the response.)
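As a sketch of the consume-side work: in the Kafka record batch (v2) format the first 8 bytes are the big-endian baseOffset, so filling the offset means overwriting those bytes. The helper names are illustrative, and the length field and CRC are ignored here for brevity:

```python
import struct

def fill_base_offset(batch: bytes, base_offset: int) -> bytes:
    """Overwrite the baseOffset field (first 8 bytes, big-endian int64)
    of a Kafka record batch before returning it to the consumer."""
    return struct.pack(">q", base_offset) + batch[8:]

def merge_entries(entries, first_offset):
    """Merge entries by concatenating the patched batches.
    Each element of `entries` is (batch_bytes, num_messages)."""
    out, offset = b"", first_offset
    for batch, num_messages in entries:
        out += fill_base_offset(batch, offset)
        offset += num_messages
    return out
```

With entryFormat=kafka, no per-message decompression or conversion is needed on this path.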
17. Kafka entry formatter
Limitations:
✓ Kafka producer & Kafka consumer
✓ Pulsar producer & Pulsar consumer
✗ Kafka producer & Pulsar consumer
✗ Pulsar producer & Kafka consumer
We still add a Pulsar message metadata header before the Kafka message. In KoP, it adds a property to mark that the message's format is Kafka.
In the future, we'll handle this property on the Pulsar broker or client side.
18. Heap memory optimization
When I tested a producer, I found that the heap memory increased fast and triggered GC frequently.
However, Pulsar uses direct memory for messages, so this should not happen.
20. Heap memory optimization
Once you eliminate the impossible, whatever remains, no matter how
improbable, must be the truth.
by Sherlock Holmes
21. Heap memory optimization
When a Netty CompositeByteBuf that contains more than one component is converted to a NIO
buffer, it will be created in heap memory.
22. Heap memory optimization
Cause:
• BookKeeper’s JNI CRC32 algorithm needs a NIO buffer as the argument
• When KoP handles fetch request, it uses a CompositeByteBuf to wrap multiple ByteBuf
instances but it’s converted to a NIO buffer eventually.
Fix:
• BookKeeper: compute CRC32 checksum for each component of a CompositeByteBuf
• KoP: allocate a new buffer in direct memory to merge buffers
24. Heap memory optimization
After the previous fix, when the consumer calls the poll() method in a tight loop, the heap memory still increases fast.
The growth rate is not related to the produce rate, i.e. there is no difference between 50 MB/s and 200 MB/s.
25. Heap memory optimization
Handle FETCH request: Find PersistentTopic → Read bytes from the bookie → Decode the bytes to MemoryRecords
A FETCH request can contain many partitions' requests.
The real process was (before 2.8.0):
1. Find all PersistentTopics of each partition and collect them into a list.
2. Read the messages of each partition and collect them into a list.
3. Use Maps to cache intermediate results.
4. Decode the list of messages and create the response.
Though we already use Netty's object pool for each FETCH request handler's context, the temporary lists and maps are not allocated from the pool.
26. Heap memory optimization
Handle FETCH request (2.8.0): each partition is processed independently (find PersistentTopic → read bytes from the bookie → decode to MemoryRecords), and the per-partition results are assembled into the FETCH response.
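The per-partition fan-out can be sketched with futures; the names are illustrative, not KoP's actual classes:

```python
from concurrent.futures import ThreadPoolExecutor

def handle_fetch(partitions, find_topic, read_bytes, decode):
    """Process each partition independently instead of collecting
    intermediate lists/maps across all partitions first."""
    def per_partition(p):
        topic = find_topic(p)      # find PersistentTopic
        raw = read_bytes(topic)    # read bytes from bookie
        return p, decode(raw)      # decode to records
    with ThreadPoolExecutor() as pool:
        # Assemble the per-partition results into one response.
        return dict(pool.map(per_partition, partitions))
```

Each partition's pipeline runs end to end on its own, so no cross-partition temporary collections are needed.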
27. Heap memory optimization
The GC interval increases from 90 seconds to 120 seconds.
Though there is still room for FETCH request optimization to reduce GC.
28. Performance Test
Virtual machines from AWS-CN, using openmessaging-benchmark.
• Comparison: Kafka client and Pulsar client against the same KoP cluster.
• Goal: the performance of the Kafka client (to KoP) is close to that of the Pulsar client (to the broker).
KoP is for users who want to use Pulsar but don't want to change their client.
29. Performance: Latency
Producer only, 1 topic, batch size 1 MiB, max batching latency 1 ms, 100 MiB/s produce rate.
The gap is acceptable because there is still some overhead:
• Validating Kafka records before writing to BK.
• Network framework design (request queue).
30. Performance: Latency
Both producer and consumer are created.
As I’ve said before, there is still room for FETCH request optimization to reduce GC.
31. Performance: Throughput
Disk throughput of an AWS i3.4xlarge NVMe SSD.
Adjust BookKeeper's config so that files are not rolled too frequently.
Default: 2 GiB, now: 20 GiB.
33. Metrics
See more details at https://github.com/streamnative/kop/blob/master/docs/reference-metrics.md
34. Summary & Plan
In short, KoP will be generally available for production from 2.8.0
• Continuous offset
• Performance improvement
• Metrics support
Plans
• More improvements on performance
• Support Kafka client 0.10.x.y and 0.11.x.y
• Authorization support
36. Practice of Kafka on Pulsar in
Tencent Big Data
Dawei Zhang
Senior Software Engineer at Tencent
Kafka-on-Pulsar maintainer
Apache Incubator-InLong committer
Pulsar Summit North America 2021
37. 1. Why KoP ?
2. Problems in KoP
3. Speedup Reboot
4. Publish throttle
5. Continuous offset
6. KoP in production
Agenda
38. Background
Message Queue Team
A part of the Tencent Big Data Working Group.
Provides MQ service and management, supporting 60 trillion+ messages daily.
MQ options:
- TubeMQ (core module of Apache Incubator-InLong) for high throughput
- Pulsar for high reliability and high consistency
- KoP for Kafka protocol messages (latest)
39. Why KoP?
Open Source & Collaboration In Tencent
Kafka: difficult to operate and maintain at large scale
- Performance degrades as topics/partitions grow
- Data migration is required when scaling out
KoP: reuses Pulsar capabilities
- Scalability
- Failover
- Isolation for reads & writes, etc.
Seamless migration from Kafka to KoP
- Without any code changes
40. Problems
• Poor performance
• Kafka admin CLI tools are not supported well
• No metrics for either producer or consumer
• Frequent OutOfDirectMemoryError
• The reboot process gets slower over time
• Consumer lag is not accurate
• The KoP cluster becomes unavailable after running for a long time
• …
41. Speedup reboot
The offset topic contains more and more data, so the reboot gets slower.
Fix: define the max retention time; the retained data is reduced by compacting the offset topic.
42. Producer throttle in Pulsar
Counter for entry memory usage:
• When the broker receives an entry from a producer, the counter increases by the entry size.
• When the broker writes the entry to a bookie, the counter decreases by the entry size.
• If the counter exceeds maxMessagePublishBufferSizeInMB, throttle all producers.
43. Producer throttle in KoP
Reuse the counter of the Pulsar broker:
• When KoP receives an entry from a Kafka producer, the counter increases by the entry size.
• When KoP writes the entry to a bookie, the counter decreases by the entry size.
• If the counter exceeds maxMessagePublishBufferSizeInMB, throttle all Kafka producers.
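The counter logic can be sketched as follows; the resume-at-half threshold is an assumption for illustration, not necessarily Pulsar's exact policy:

```python
class PublishBufferThrottle:
    """Toy model of the broker's publish-buffer counter."""
    def __init__(self, max_buffer_bytes):
        self.max_buffer_bytes = max_buffer_bytes
        self.pending_bytes = 0
        self.throttled = False

    def on_entry_received(self, entry_size):
        # Broker/KoP receives an entry from a producer.
        self.pending_bytes += entry_size
        if self.pending_bytes > self.max_buffer_bytes:
            self.throttled = True  # stop reading from producer connections

    def on_entry_persisted(self, entry_size):
        # The entry has been written to the bookie.
        self.pending_bytes -= entry_size
        # Resume once the buffer drains below half (assumed threshold).
        if self.throttled and self.pending_bytes <= self.max_buffer_bytes // 2:
            self.throttled = False
```

Because KoP reuses the broker's counter, Kafka producers and Pulsar producers share the same memory budget.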
44. How to find a message?
Map between the Kafka offset and the Pulsar message id:
• Kafka offset: a single 64-bit integer
• Pulsar message id: ledgerId + entryId + batchIndex, packed into 20 + 32 + 12 bits
Problems:
● Offset holes
- Not friendly to third-party systems that depend on continuous offsets
- Consumer lag is not accurate
● If the ledgerId overflows, the KoP cluster becomes unavailable
● If the entryId/batchIndex overflows, the offset is not accurate
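The 20 + 32 + 12 bit packing can be sketched as follows, which also makes the overflow problem concrete: any ledgerId at or above 2^20 simply cannot be represented.

```python
LEDGER_BITS, ENTRY_BITS, BATCH_BITS = 20, 32, 12

def message_id_to_offset(ledger_id, entry_id, batch_index):
    # Overflow checks: out-of-range ids cannot be represented at all.
    assert ledger_id < (1 << LEDGER_BITS), "ledgerId overflow"
    assert entry_id < (1 << ENTRY_BITS), "entryId overflow"
    assert batch_index < (1 << BATCH_BITS), "batchIndex overflow"
    return ((ledger_id << (ENTRY_BITS + BATCH_BITS))
            | (entry_id << BATCH_BITS)
            | batch_index)

def offset_to_message_id(offset):
    batch_index = offset & ((1 << BATCH_BITS) - 1)
    entry_id = (offset >> BATCH_BITS) & ((1 << ENTRY_BITS) - 1)
    ledger_id = offset >> (ENTRY_BITS + BATCH_BITS)
    return ledger_id, entry_id, batch_index
```

Offsets produced this way also jump whenever the ledger or entry changes, which is exactly the "offset holes" problem.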
45. Continuous offset: PIP 70
Append an offset to every entry on the broker side (BrokerEntryMetadata):
• Every topic partition has a monotonically increasing offset, starting from 0
• Every entry has an offset added on the broker side
• Offset of an entry + number of messages in the entry = offset of the next entry
• A message id can be found from a given offset, and vice versa
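The invariant "offset of an entry + message count = offset of the next entry" can be sketched with a toy index; the class and method names are illustrative:

```python
import bisect

class PartitionOffsetIndex:
    """Toy index mapping continuous offsets to entries, following the
    PIP 70 invariant: next entry offset = entry offset + message count."""
    def __init__(self):
        self.entry_offsets = []  # offset of each entry, strictly increasing
        self.next_offset = 0     # partition offset starts from 0

    def append_entry(self, num_messages):
        """Record a new entry and return the offset assigned to it."""
        self.entry_offsets.append(self.next_offset)
        self.next_offset += num_messages
        return self.entry_offsets[-1]

    def find_entry(self, offset):
        """Return the index of the entry containing the given offset."""
        if not 0 <= offset < self.next_offset:
            raise IndexError("offset out of range")
        return bisect.bisect_right(self.entry_offsets, offset) - 1
```

Because offsets are dense, the lookup in either direction is a simple binary search.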
46. Continuous offset: PIP 70
Original protocol:
[PROTOCOL_HEAD] [MAGIC_NUMBER] [CHECKSUM] [METADATA_SIZE] [METADATA] [PAYLOAD]
New protocol (add broker entry metadata):
[PROTOCOL_HEAD] [BROKER_ENTRY_META_MAGIC_NUMBER] [BROKER_ENTRY_META_SIZE] [BROKER_ENTRY_METADATA] [MAGIC_NUMBER] [CHECKSUM] [METADATA_SIZE] [METADATA] [PAYLOAD]
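The prepend-and-strip framing can be sketched as follows. The magic value and the 2-byte/4-byte field widths here are illustrative assumptions, not the exact wire format from PIP 70:

```python
import struct

# Illustrative values only; not the real PIP 70 wire format.
BROKER_ENTRY_META_MAGIC = 0x0E02

def add_broker_entry_metadata(meta: bytes, headers_and_payload: bytes) -> bytes:
    """Prepend [magic][size][metadata] to the original entry bytes."""
    header = struct.pack(">HI", BROKER_ENTRY_META_MAGIC, len(meta))
    return header + meta + headers_and_payload

def strip_broker_entry_metadata(buf: bytes):
    """Split an entry into (broker entry metadata, original bytes).
    Returns (None, buf) for an old-format entry without the magic number."""
    magic, size = struct.unpack_from(">HI", buf, 0)
    if magic != BROKER_ENTRY_META_MAGIC:
        return None, buf
    meta = buf[6:6 + size]
    return meta, buf[6 + size:]
```

The magic number lets a reader distinguish new-format entries from entries written before the upgrade.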
47. Continuous offset: PIP 70
Before: the client sends a Message (Metadata + Payload) to the broker, and the broker writes the Message to the bookie unchanged.
After: the client still sends a Message, but the broker prepends BrokerEntryMetadata and writes a MessageWithBrokerEntryMetadata to the bookie.
48. Continuous offset: KoP implementation
Produce:
1. Receive the produce request
2. Build the entry metadata
3. Put the entry metadata ahead of the original entry
4. Write the new message to the bookie
Fetch:
1. Receive the fetch request
2. Find the message id for the fetch offset
3. Build a cursor for the offset
4. Read entries using the cursor
5. Parse the offset from the entry metadata and return the fetch response
ListOffset:
1. Find the entry by timestamp
2. Get the entry metadata
3. Get the offset from the entry metadata
KoP can interact with all components of the broker, like the managed ledger, which uses a BookKeeper client to access bookies, and the load balancer, which determines topic ownership. It can also use the internal ZooKeeper cache to access ZooKeeper for metadata.
You can use Pulsar standalone with KoP for a quick verification of your Kafka client.
Before 2.8.0, the offset is not continuous, which may cause many problems. My partner will talk about how we implemented it.
I'll talk about these two improvements in detail because they're related to performance.
For the rest, I'll just give a brief introduction.
With the Kafka entry formatter (not the default), you can get better performance.
From the dump analysis, we can see there are a lot of messages in the heap. Each byte array is a message whose size is one mebibyte.
Parallelize the fetch handler and don't use lists and maps to cache intermediate results.
There's still room for improvement to reduce heap memory.
There are some blogs that compare the performance of Kafka and Pulsar.
Bookie and broker are deployed on the same machine to reduce network traffic.
Ensure the bookie's journal disk is an SSD.
Ensure the machines of the broker and client have high (10 Gbit/s) network bandwidth.
As we can see, there's a spike in the 99th-percentile publish latency. It's caused by GC.
Regarding continuous offsets, let my partner talk about them in detail.