A Kafka Client’s Request: There and Back Again with Danica Fine

dfine@confluent.io @TheDanicaFine linkedin.com/in/danica-fine/
A Kafka Client’s Request
Or, There and Back Again

A Schema
{
"doc": "Accounting for the whereabouts and current
activities of hobbits.",
"fields": [
{
"doc": "Name of the hobbit in question.",
"name": "hobbit_name",
"type": "string"
},
{
"doc": "Current location of the hobbit.",
"name": "location",
"type": "string"
},
{
"doc": "Current status of the hobbit.",
"name": "status",
"type": {
"name": "Status",
"type": "enum",
"symbols": ["EATING", "NAPPING", "SMOKING",
"ADVENTURING", "THIEVING"]
}
}
],
"name": "hobbitUpdate",
"type": "record"
}

A Schema and a Topic
{
"doc": "Accounting for the whereabouts and current
activities of hobbits.",
"fields": [
{
"doc": "Name of the hobbit in question.",
"name": "hobbit_name",
"type": "string"
},
{
"doc": "Current location of the hobbit.",
"name": "location",
"type": "string"
},
{
"doc": "Current status of the hobbit.",
"name": "status",
"type": {
"name": "Status",
"type": "enum",
"symbols": ["EATING", "NAPPING", "SMOKING",
"ADVENTURING", "THIEVING"]
}
}
],
"name": "hobbitUpdate",
"type": "record"
}
Topic hobbit-updates:
● "num.partitions"=6
● "replication.factor"=3
● "min.insync.replicas"=2
● "cleanup.policy"=delete

Producing an Event
key: "Bilbo Baggins"
{
"hobbit_name": "Bilbo Baggins",
"location": "Bag End",
"status": "EATING"
}
producer.send()

So the data is in Kafka, right?

Let’s go on a journey…

0: Within the Producer Client
A. Serializing
○ key.serializer
○ value.serializer

A. Serializing
○ key.serializer
B. Partitioning

Partitioning Conﬁgurations
● partitioner.class
○ None
■ Hash of key, if present
■ Otherwise, use sticky partition*
○ RoundRobinPartitioner
○ Customize using the Partitioner interface
● partitioner.ignore.keys
● Others to know:
○ partitioner.adaptive.partitioning.enable
○ partitioner.availability.timeout.ms

A. Serializing
○ key.serializer
B. Partitioning
C. Batching

Batching Conﬁgurations
● batch.size
○ Default 16 KB (no batching when set to 0)
○ Batches may not be full
● linger.ms
○ Default 0 ms (no batching when set to 0)
○ Directly affects latency, e.g. linger.ms=10 adds up to 10 ms of latency
● buffer.memory
○ Default ~32 MB
○ Should be > batch.size
○ Chunked into segments of batch.size

Batching Metrics
● batch-size-avg
● records-per-request-avg
● record-size-avg
● buffer-available-bytes
● record-queue-time-avg

A. Serializing
○ key.serializer
B. Partitioning
C. Batching
D. [Compressing]

A. Serializing
○ key.serializer
B. Partitioning
C. Batching
D. [Compressing]
E. Request

Request Conﬁgurations
● max.request.size
○ Default ~1 MB
○ Limits number of record batches
● acks
○ Default “all”
○ How many replicas should write the data before sending a response back?
● max.in.flight.requests.per.connection
○ Default 5
○ Limit on unacknowledged requests per broker
● enable.idempotence and transactional.id
● request.timeout.ms
○ Default 30 seconds
○ Time between issuing request and retrying or throwing exception
○ Retrying set handled by delivery.timeout.ms, retries, and retry.backoff.ms

Request Metrics (General)
● request-rate
● requests-in-flight
● request-latency-avg

Produce Request (Bound by max.request.size)
Request Metadata
- transaction ID
- acks
- timeoutMs
Producer Data
Topic Data
dwarf_updates Partition_1
index
Batch
Topic Data
hobbit_updates Partition_0
index
Batch Batch
Partition_4
index
Batch

● Initial request landing zone
● Awaiting processing
● Low-level conﬁgurations:
○ socket.receive.buffer.bytes
○ socket.request.max.bytes
1: Socket Receive Buffer

● Forms the ofﬁcial produce request
● Adds to the request queue
● Conﬁgure and monitor:
○ num.network.threads
○ NetworkProcessorAvgIdlePercent
2: Network threads

● Await processing by I/O Threads
● Conﬁgure:
○ queued.max.requests
○ queued.max.request.bytes
● Monitor
○ RequestQueueSize
○ RequestQueueTimeMs
3: Request Queue

● Finally ready to be processed!
● Data validation
● Conﬁgure and monitor:
○ num.io.threads
○ RequestHandlerAvgIdlePercent
4: I/O Threads, aka Request Handler Threads

● Underlying commit log
● Segment components
○ .log for data
○ .index for (offset, record)
○ .timeindex
○ .snapshot
5: Page Cache and To Disk

5. Page Cache/Disk Conﬁguration and Monitoring
● Conﬁgure:
○ log.segment.bytes
○ log.flush.interval.ms
○ log.flush.interval.messages
○ [log.]cleanup.policy
● Monitor:
○ LogFlushRateAndTimeMs
○ LocalTimeMs
○ Bonus: monitor page cache performance using system memory

● Hold request until data is has been
replicated as per acks
● Based on hierarchical timing wheel
● Conﬁgure:
○ default.replication.factor
○ num.replica.fetchers
○ replica.fetch.wait.max.ms
● Monitor using RemoteTimeMs
6: Mordor Purgatory (but actually)

● Await handoff to the network thread
● Unbounded, unconﬁgurable
● Monitor
○ ResponseQueueSize
○ ResponseQueueTimeMs
7: Response Queue

● Places the response on the socket
send buffer
● Last ofﬁcial task of the network thread
8: Network thread hand-off

● Final interval on the broker
● Awaiting receipt by producer
● socket.send.buffer.bytes
● Monitor with ResponseSendTimeMs
9: Socket Send Buffer

● Final summary broker metrics:
○ TotalTimeMs
■ RequestQueueTimeMs
■ LocalTimeMs
■ RemoteTimeMs
■ ResponseSendTimeMs
○ ResponseQueueTimeMs
10: And back again to the Producer

Start your own journey…
LinkTree Resources

Questions?

A Kafka Client’s Request: There and Back Again with Danica Fine

Empfohlen

Empfohlen

Weitere ähnliche Inhalte

Was ist angesagt?

Was ist angesagt? (20)

Ähnlich wie A Kafka Client’s Request: There and Back Again with Danica Fine

Ähnlich wie A Kafka Client’s Request: There and Back Again with Danica Fine (20)

Mehr von HostedbyConfluent

Mehr von HostedbyConfluent (20)

Kürzlich hochgeladen

Kürzlich hochgeladen (20)

A Kafka Client’s Request: There and Back Again with Danica Fine