Kafka Connect
Kafka Meetup 2019
Yi Zhang
Software Engineer
Building Data Pipeline Observability using Elasticsearch and Kibana
[Diagram: Kafka → Kafka Connect Sink → Data Monitoring Platform]
Data Format Standard is important!
syntax="proto2";
message envelope {
# required fields
required string data_type = 1;
required string create_at_us = 1;
required string source_name = 1;
# optional
string schema = 1;
# payload
bytes payload = 1;
}
Why?
• To query Kafka messages in real time
• To quickly find the location of a message
• To trace a historic event for debugging/diagnosis
• To monitor data quality in the pipeline
• To monitor and project data volume in the pipeline for capacity planning
• To detect abnormal data patterns
Takeaways
• Quick overview of Kafka Connect
• How data transformation works in Kafka Connect
• What is SMT
• Some use cases for SMT
• SMT vs Kafka Streams for data transformation
• Tips for using Kafka Connect to sink data to Elasticsearch
Quick Overview of Kafka Connect
More reasons to use Kafka Connect
• Lightweight and stateless
• Scalable and fault-tolerant
• Integrates with Kafka and many other data systems
• Pluggable architecture makes customization easy and configurable
• Lots of open-source connectors and converter plugins available
• Runs in two modes:
• standalone mode is great for dev and local testing
• distributed mode is great for scaling and fault tolerance
• A REST API is available to monitor and configure your connectors in distributed mode
Order of Operation
[Diagram: Kafka → Kafka Connect Sink (Converter → Transformers → Connector) → Data Sink]
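To make this order concrete, here is a minimal sink-connector configuration sketch; the connector name, topic, and converter settings are illustrative assumptions, not from the deck:

# Sketch of an Elasticsearch sink connector config (names and URLs assumed)
name=es-observability-sink
connector.class=io.confluent.connect.elasticsearch.ElasticsearchSinkConnector
topics=pipeline-events
# 1. Converter: bytes in Kafka -> Connect records
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://localhost:8081
# 2. Transformers (SMT): per-record tweaks before the sink
transforms=insertKafkaMetadata
transforms.insertKafkaMetadata.type=org.apache.kafka.connect.transforms.InsertField$Value
transforms.insertKafkaMetadata.topic.field=kafka_topic
# 3. Connector: writes the converted, transformed records to the data sink
connection.url=http://localhost:9200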
Plugin: Data Converter
● Defaults to Avro or JSON, or write your own
● Configurable
○ Different data converters for key and value
○ Specify how null, invalid, or malformed messages should be handled
● Kafka Connect isolates each plugin from the others so that libraries in one plugin are not affected by the libraries in any other plugins
○ `plugin.path` is configured in the Kafka Connect worker configuration
○ Build your JAR with dependencies and copy it to `plugin.path`.

# Use a directory other than the home directory of Confluent Platform.
plugin.path=share/java
# Data converter plugin
value.converter.protoClassName=net.demonware.pipes.connect.data.proto.MessageEnvelopeOuterClass$MessageEnvelope
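For context, `protoClassName` above is a property of the converter itself; a sketch of how such a converter might be wired up (the converter class name here is a guess, only the protoClassName line is from the deck):

# Sketch: wiring a custom protobuf converter for values (converter class name assumed)
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=net.demonware.pipes.connect.data.ProtobufConverter
value.converter.protoClassName=net.demonware.pipes.connect.data.proto.MessageEnvelopeOuterClass$MessageEnvelope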
Single Message Transformation (SMT)

What is SMT?
• Modifies messages going out of Kafka before they reach Elasticsearch
• One message at a time
• Many built-in SMTs are already available
• Flexible within the constraints of the TransformableRecord API and 1:{0,1} mapping
• Transformations are chained
• Pluggable transformers through Connect configuration
Default Kafka Connect SMT
• InsertField: Insert a field using attributes from the record metadata or a configured static value.
• MaskField: Mask specified fields with a valid null value for the field type.
• ReplaceField: Filter or rename fields.
• TimestampConverter: Convert timestamps between different formats such as Unix epoch, strings, and Connect Date and Timestamp types.
• TimestampRouter: Update the record's topic field as a function of the original topic value and the record timestamp.
• RegexRouter: Update the record topic using the configured regular expression and replacement string.
• Cast: Cast fields or the entire key or value to a specific type, e.g. to force an integer field to a smaller width.
• ExtractField: Extract the specified field from a Struct when a schema is present, or a Map in the case of schemaless data. Any null values are passed through unmodified.
• ExtractTopic: Replace the record topic with a new topic derived from its key or value.
• Flatten: Flatten a nested data structure. This generates names for each field by concatenating the field names at each level with a configurable delimiter character.
• HoistField: Wrap data using the specified field name in a Struct when a schema is present, or a Map in the case of schemaless data.
• ValueToKey: Replace the record key with a new key formed from a subset of fields in the record value.
Configuring SMT
• An alias in transforms implies that some additional keys are configurable.
• Syntax:
• transforms.$alias.type – fully qualified class name for the transformation
• transforms.$alias.* – all other keys, as defined in Transformation.config(), are embedded with this prefix
• Example:

# list the aliases in the order they should run
transforms=insertKafkaMetadata,removeFields,convertTimestampUnit
transforms.insertKafkaMetadata.type=org.apache.kafka.connect.transforms.InsertField$Value
transforms.insertKafkaMetadata.topic.field=kafka_topic
transforms.removeFields.type=org.apache.kafka.connect.transforms.ReplaceField$Value
transforms.removeFields.blacklist=context,tracing,payload
transforms.convertTimestampUnit.type=net.demonware.pipes.kafka.connect.transforms.ConvertTimeToMillis$Value
transforms.convertTimestampUnit.timestamp.fields=created_at_us,ingested_at_us
Ordering of SMT matters!
• SMTs are chained
• SMTs are applied in the order they are specified in `transforms`.
• If your transformations are order-dependent, make sure they are specified in the correct order
• Example:

transforms=insertKafkaMetadata,indexMapping
transforms.insertKafkaMetadata.type=org.apache.kafka.connect.transforms.InsertField$Value
transforms.insertKafkaMetadata.topic.field=kafka_topic
transforms.indexMapping.type=org.apache.kafka.connect.transforms.TimestampRouter
transforms.indexMapping.topic.format=topic-changed-${timestamp}
transforms.indexMapping.timestamp.format=yyyy.MM.dd
Create Custom SMT
• Only if you cannot use the built-in SMTs and cannot use Kafka Streams for the data transformation.
• Must implement the Transformation interface (shown below, with a sketch after it).
• Consider making your SMT configurable.
• If you have multiple custom SMTs, prefer separate Transformation implementations.
Interface: Transformation

// Existing base class for SourceRecord and SinkRecord, new self type parameter.
public abstract class ConnectRecord<R extends ConnectRecord<R>> {
    // ...
    // New abstract method:
    /** Generate a new record of the same type as itself, with the specified parameter values. **/
    public abstract R newRecord(String topic, Schema keySchema, Object key, Schema valueSchema, Object value, Long timestamp);
}

public interface Transformation<R extends ConnectRecord<R>> extends Configurable, Closeable {
    // via Configurable base interface:
    // void configure(Map<String, ?> configs);

    /**
     * Apply transformation to the {@code record} and return another record object (which may be {@code record} itself) or {@code null},
     * corresponding to a map or filter operation respectively. The implementation must be thread-safe.
     */
    R apply(R record);

    /** Configuration specification for this transformation. **/
    ConfigDef config();

    /** Signal that this transformation instance will no longer be used. **/
    @Override
    void close();
}
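As a concrete illustration, here is a minimal sketch of a custom SMT in the spirit of the ConvertTimeToMillis$Value transform used in the earlier config. It is an assumption of how that transform might look, not the actual implementation (the real one likely follows the Key/Value inner-class pattern of the built-ins); note also that the released ConnectRecord.newRecord takes the Kafka partition in addition to the parameters above.

import java.util.List;
import java.util.Map;
import org.apache.kafka.common.config.AbstractConfig;
import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.connect.connector.ConnectRecord;
import org.apache.kafka.connect.data.Field;
import org.apache.kafka.connect.data.Struct;
import org.apache.kafka.connect.transforms.Transformation;

/** Sketch: convert configured microsecond timestamp fields to milliseconds. */
public class ConvertTimeToMillis<R extends ConnectRecord<R>> implements Transformation<R> {

    public static final String FIELDS_CONFIG = "timestamp.fields";

    private static final ConfigDef CONFIG_DEF = new ConfigDef()
            .define(FIELDS_CONFIG, ConfigDef.Type.LIST, ConfigDef.Importance.HIGH,
                    "Comma-separated names of microsecond timestamp fields to convert");

    private List<String> fields;

    @Override
    public void configure(Map<String, ?> configs) {
        fields = new AbstractConfig(CONFIG_DEF, configs).getList(FIELDS_CONFIG);
    }

    @Override
    public R apply(R record) {
        if (!(record.value() instanceof Struct)) {
            return record; // a schemaless (Map) branch is omitted for brevity
        }
        Struct value = (Struct) record.value();
        Struct updated = new Struct(value.schema());
        for (Field f : value.schema().fields()) {
            Object v = value.get(f);
            if (fields.contains(f.name()) && v instanceof Long) {
                v = (Long) v / 1000L; // microseconds -> milliseconds
            }
            updated.put(f, v);
        }
        // Keep topic, partition, key, and timestamp; swap in the updated value.
        return record.newRecord(record.topic(), record.kafkaPartition(), record.keySchema(),
                record.key(), value.schema(), updated, record.timestamp());
    }

    @Override
    public ConfigDef config() {
        return CONFIG_DEF;
    }

    @Override
    public void close() {
    }
}

With the ConvertTimeToMillis$Value lines shown earlier, Connect would instantiate this class, call configure() with the timestamp.fields value, and then call apply() once per record.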
Use cases for SMT:
• Map data format from Kafka to Elasticsearch
• Enhance your data
• Compute and add monitoring metric data
• Data filtering
SMT vs Kafka Streams

Use Kafka Streams when:
● Recommended practice in general
● The transformation involves multiple messages, such as aggregation.
● More complex transformations: aggregation, windowing, joining
● The transformed data will be consumed by multiple downstream consumers. Reduce overhead by running the transformation only once and allow reuse.

Use SMT when:
● Lightweight and simple data transformation.
● Covered by the Kafka Connect built-in SMTs.
● Data footprint cost is a concern: a large amount of transformed data written back to Kafka is too costly.
● Simplicity in the streaming data pipeline is important: you want to keep pipeline stages and services to a minimum.
● The transformation does not interact with external systems.
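For contrast, a windowed aggregation, which SMT's one-record-at-a-time model cannot express, takes only a few lines in Kafka Streams. A sketch with assumed topic names and serdes:

import java.time.Duration;
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.TimeWindows;

/** Sketch: count records per key in one-minute windows, e.g. for volume monitoring. */
public class EventVolumeCounter {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("pipeline-events", Consumed.with(Serdes.String(), Serdes.String()))
                .groupByKey(Grouped.with(Serdes.String(), Serdes.String()))
                .windowedBy(TimeWindows.of(Duration.ofMinutes(1)))
                .count()
                .toStream()
                // print instead of writing back to Kafka, to keep the sketch self-contained
                .foreach((windowedKey, count) ->
                        System.out.printf("%s @ %d -> %d%n",
                                windowedKey.key(), windowedKey.window().start(), count));

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "event-volume-counter");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        new KafkaStreams(builder.build(), props).start();
    }
}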
Dos and Don’ts
Do's
• Overwrite the ES @timestamp internal field
• Overwrite the document '_id' field (e.g. with a UUID such as "123e4567-e89b-12d3-a456-426655440000") to control how your data is de-duplicated
• Remove unnecessary columns/fields to reduce the footprint of your ES cluster.
• Manage your ES index by day. You can use the 'TimestampRouter' and 'RegexRouter' SMTs to generate one ES index per day for your data (see the sketch after this list).
• If you want binary data available for search in a user-friendly format, transform the binary data prior to indexing.
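Several of these tie together in the sink connector config. A sketch, with topic and field names assumed:

# Sketch: key-based document _id for de-duplication plus one index per day
connector.class=io.confluent.connect.elasticsearch.ElasticsearchSinkConnector
topics=pipeline-events
# use the record key as the Elasticsearch document _id so replays update instead of duplicating
key.ignore=false
transforms=setKey,extractId,dailyIndex
# promote a UUID field from the value to the record key, then extract it as a plain string
transforms.setKey.type=org.apache.kafka.connect.transforms.ValueToKey
transforms.setKey.fields=event_id
transforms.extractId.type=org.apache.kafka.connect.transforms.ExtractField$Key
transforms.extractId.field=event_id
# route to a daily index name, e.g. pipeline-events-2019.05.01
transforms.dailyIndex.type=org.apache.kafka.connect.transforms.TimestampRouter
transforms.dailyIndex.topic.format=${topic}-${timestamp}
transforms.dailyIndex.timestamp.format=yyyy.MM.dd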
Don'ts
● Some cosmetic data format tweaking can be done in Kibana instead:
○ Date display format
○ Base64-decoding binary data for display
○ Type casting from integer to text
● If you need to modify Kafka Connect source code for any reason, you might want to reconsider using Kafka Connect:
○ It can be hard to debug and test. Maybe you should consider Kafka Streams instead.
● When implementing your own transformations, keep each transformation implementation separate rather than having a single transformation class that does a bunch of things.
Questions?