Batch Message Listener capabilities of the Apache Kafka Connector
1. Batch Message Listener capabilities of the Apache Kafka Connector
New York MuleSoft Meetup Group
June 25th, 2022
2. Introduction
Neeraj Kumar - Host, NYC
▪ Around 14 years of experience in ERP and integrations.
▪ Working as Principal Architect @ Slalom LLC.
▪ 3x MuleSoft Certified, 1x AWS Certified.
▪ Managed multiple end-to-end implementation and integration projects.
▪ Mentoring Mule developers and people looking to switch to MuleSoft.
3. Safe Harbor Statement
● Both the speaker and the host are organizing this meetup in their individual capacities only. We are not representing our companies here.
● This presentation is strictly for learning purposes only. The organizer/presenter does not hold any responsibility that the same solution will work for your business requirements as well.
● This presentation is not meant for any promotional activities.
4. Introductions
Lars Grube
● 15 years of IT software engineering experience
● Works on business intelligence, data integration, and training solutions
● Specialized in highly parallel processing APIs interacting with Teradata Vantage
● MuleSoft Certified Developer - Level 1
● LinkedIn profile: https://www.linkedin.com/in/lars-grube
5. Agenda
● Why?
● The overall picture
● The detailed approach
● Live Demo
● Wrap-up - Q&A
11. Configuring the Apache Kafka Consumer
● Should match approximately 1000 messages
● Should match the default fetch maximum size
● Should be increased if the default fetch minimum size is frequently missed
● Must be at least as long as the fetch maximum wait timeout, preferably one minute longer
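These notes annotate individual consumer configuration fields shown on the original slide. As a rough, non-authoritative sketch, the same guidance could be expressed with plain Kafka client properties; the mapping of notes to properties and the values themselves are assumptions (the Mule Apache Kafka Connector exposes equivalent fields in its consumer configuration):

# Assumed mapping of the guidance above onto standard Kafka consumer properties;
# values are illustrative, not tuned recommendations
max.poll.records=1000        # a batch should match approximately 1000 messages
fetch.min.bytes=52428800     # matched to the default fetch.max.bytes (50 MB)
fetch.max.wait.ms=60000      # increased if fetch.min.bytes is frequently missed
request.timeout.ms=120000    # at least fetch.max.wait.ms, preferably one minute longer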
12. Extracting the payload from the batch message
● Implement a guard function in order to filter and log messages that cannot be parsed into the expected format
● Based on: https://blogs.mulesoft.com/dev-guides/how-to-tutorials/guarding-collections-dataweave-try-function-2/
fun guardWithDefaultOutput(fn, defaultOutput = null) = dw::Runtime::try(fn) match {
  case tr if (tr.success) -> tr.result
  else -> defaultOutput
}
● Usage:
var payloadValues = (payload.*payload default [] map (value, index) -> do {
  var jsonVal = guardWithDefaultOutput(() -> read(value, "application/json"))
  ---
  value: if (jsonVal is Null) null else
  {
...
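As a self-contained sketch (the two-element input array is invented for illustration), the guard returns the parsed result on success and the default for unparseable input:

%dw 2.0
output application/json
fun guardWithDefaultOutput(fn, defaultOutput = null) = dw::Runtime::try(fn) match {
  case tr if (tr.success) -> tr.result
  else -> defaultOutput
}
---
// Yields [{"orderNo": 1}, null]: the second element cannot be read as JSON
['{"orderNo": 1}', "not json"] map (value) -> guardWithDefaultOutput(() -> read(value, "application/json"))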
13. Inserting the data into the database
● Perform an explicit mapping for all fields to avoid implicit null values
● Segment the payload for parallel processing (a divideBy sketch follows the snippets below)
● Combine the “Parallel For Each” scope and the “Bulk insert” operation
var payloadValues = (payload.*payload default [] map (value, index) -> do {
  ...
  orderNo: jsonVal.orderNo,
  orderDate: jsonVal.orderDate,
  customerNo: jsonVal.customerNo,
  ...
import divideBy as divBy from dw::core::Arrays
---
(vars.batchMessagePayload filter not ($.value is Null)) divBy 100
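For reference, dw::core::Arrays::divideBy splits an array into fixed-size segments, so each segment can become one “Bulk insert” call inside the “Parallel For Each” scope. A minimal sketch:

%dw 2.0
import divideBy as divBy from dw::core::Arrays
output application/json
---
// Yields [[1, 2], [3, 4], [5]]; the last segment may be smaller
[1, 2, 3, 4, 5] divBy 2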
14. Inserting the data into the database
● Define the primary index of the target table on the correlation id and an ascending number by payload
CREATE MULTISET TABLE MULE.STG_FACT_ORDERS
(
  orderNo BIGINT NOT NULL,
  orderDate DATE FORMAT 'YYYY-MM-DD' NOT NULL,
  customerNo BIGINT NOT NULL,
  itemNo BIGINT NOT NULL,
  itemDesc VARCHAR(100) CHARACTER SET UNICODE NOT CASESPECIFIC NOT NULL,
  orderState VARCHAR(30) CHARACTER SET UNICODE NOT CASESPECIFIC NOT NULL,
  lastUpdatedTS TIMESTAMP(3) WITH TIME ZONE NOT NULL,
  mule_correlation_id VARCHAR(128) CHARACTER SET UNICODE NOT CASESPECIFIC NOT NULL,
  row_number_by_payload INTEGER NOT NULL
)
PRIMARY INDEX (mule_correlation_id, row_number_by_payload);
var payloadValues = (payload.*payload default [] map (value, index) -> do {
  ...
  mule_correlation_id: correlationId,
  row_number_by_payload: index
}
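The deck does not show the insert statement itself; a plausible parameterized statement for the Database connector's “Bulk insert” operation, using the columns from the DDL above and Mule's :param input-parameter syntax, might look like this:

-- Hypothetical parameterized statement, executed once per row of each segment
INSERT INTO MULE.STG_FACT_ORDERS
  (orderNo, orderDate, customerNo, itemNo, itemDesc, orderState,
   lastUpdatedTS, mule_correlation_id, row_number_by_payload)
VALUES
  (:orderNo, :orderDate, :customerNo, :itemNo, :itemDesc, :orderState,
   :lastUpdatedTS, :mule_correlation_id, :row_number_by_payload);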
15. Handling errors
● Use the message attributes (key, creation timestamp, partition, offset) for precisely logging problematic messages
● Identify irreproducible database issues:
  ● Error type "DB:CONNECTIVITY"
  ● Error type "MULE:COMPOSITE_ROUTING" containing a "DB:CONNECTIVITY" error or another nested "MULE:COMPOSITE_ROUTING"
  ● Teradata error codes (2631, 1095, 3134)
● In case of a reproducible database issue occurring during the bulk insert, split the message array into its items and perform parallel inserts on the item level
● In case of an irreproducible database issue or any other error, rely on the default behaviour (redelivery)
(error.errors ++ (flatten(error.errors.*errors) default [])).*errorType.*asString contains "DB:CONNECTIVITY"
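To see this expression in action outside Mule, here is a toy stand-in for the error binding (modeling errorType.asString as a plain field is a simplification made only for this sketch):

%dw 2.0
output application/json
// Invented sample: a composite routing error wrapping a DB connectivity error
var sampleError = {
  errors: [{
    errorType: { asString: "MULE:COMPOSITE_ROUTING" },
    errors: [{ errorType: { asString: "DB:CONNECTIVITY" } }]
  }]
}
---
// Evaluates to true: the nested DB:CONNECTIVITY error is detected
(sampleError.errors ++ (flatten(sampleError.errors.*errors) default [])).*errorType.*asString contains "DB:CONNECTIVITY"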
17. Live Demo
● Performance comparison between the single and batch message listener
● Handling irreproducible database issues
● Handling and logging problematic message data and reproducible database issues
18. Components overview
● Teradata Vantage Express on VMware Workstation Player
https://downloads.teradata.com/download/database/teradata-express-for-vmware-player
https://downloads.vmware.com/d/
● Apache Kafka on Ubuntu
https://kafka.apache.org/
https://ubuntu.com/tutorials/install-ubuntu-on-wsl2-on-windows-10
● AKHQ
https://akhq.io
● API implementation
https://github.com/larsgrube/teradata-fast-import/tree/master/teradata-import-api
21. Q1
● Functions for catching and handling DataWeave errors are available in the module dw:: …
a) Core
b) Mule
c) Runtime
d) System
22. Q2
● Which Apache Kafka Consumer configuration parameter should be checked and, if necessary, increased when increasing the fetch maximum wait timeout?
a) Request timeout
b) Session timeout
c) Maximum polling interval
d) All of the above
23. Q3
● Which primary index of a target table should be chosen for parallel bulk inserts?
a) Logical key, correlation id, insert timestamp
b) Correlation id, row number by payload
c) Surrogate key
d) No primary index