What do you do when you have two different technologies, upstream and downstream, that are both being rapidly adopted industry-wide? How do you bridge them scalably and robustly? At WeWork, the upstream data was brokered by Kafka, and the downstream consumers were highly scalable gRPC services. Kafka efficiently channeled incoming events in near real time from a variety of sensors deployed in select WeWork spaces, while the user-facing downstream gRPC services were exceptionally good at serving requests concurrently and robustly. This was a formidable combination, if only there were a way to bridge the two effectively. Luckily, sink Connectors came to the rescue. However, there weren't any for gRPC sinks! So we wrote one.
In this talk, we will briefly cover the advantages of using Connectors and how to create new ones, and then spend most of our time on the gRPC sink Connector and its impact on WeWork's data pipeline.
5. Architecture (continued)
❏ Importance of the architecture
❏ Kafka-gRPC combination
❏ Java/Scala-Go
❏ Why gRPC?
❏ Why the combination worked so well
6. Architecture (continued)
gRPC
❏ Machine-readable API contract, so platform neutral
❏ Protobuf, so polyglot, with a faster, smaller, simpler, safer payload
❏ HTTP/2 - multiplexing, binary, safer, the future
❏ Streaming as well as request/response
❏ Community
Kafka
❏ Throughput
❏ Distributed, HA
❏ Scalability
❏ Complementary components - KStream, KTable, KSQL, Connectors
❏ Community
7. Life Without Connect
In the very beginning, there was no Connect
Kafka sources were connected to a variety of sinks at WeWork - gRPC,
Elasticsearch, etc. And hey, it worked well.
Yes, but what about
❏ Scalability?
❏ Security?
❏ Configurability?
❏ Error handling?
❏ Extendability?
Do not reinvent the wheel!
9. Then there was Connect...
Very simply, Kafka Connect is a framework to stream data into and out of Kafka
Properties:
❏ Broad copying, scalability, streaming and batch apps, parallelism.
❏ Does one thing and does it very well - copying data
❏ Extensible through Connectors
Models:
❏ Connector
❏ Worker
❏ Data
10. Connect components
❏ Connectors – abstraction that handles data streaming by managing “tasks”
❏ Tasks – the implementation of how data is copied to or from Kafka
❏ Workers – the running processes that execute connectors and tasks
❏ Converters – translate data between Connect’s internal format and the source/sink
❏ Transforms – alter messages produced by or sent to a connector
❏ Dead Letter Queue – error handling
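To make the roles concrete, here is a hedged sketch of a sink-connector configuration showing where each component plugs in. The connector class name, topic, and field values are hypothetical; the property keys themselves are standard Kafka Connect configuration.

```properties
# Hypothetical gRPC sink configuration - illustrative values only
name=grpc-sink
connector.class=com.example.GrpcSinkConnector                     # Connector
tasks.max=4                                                       # parallelism via Tasks
topics=sensor-events
value.converter=io.confluent.connect.protobuf.ProtobufConverter  # Converter

transforms=addSource                                              # Transform chain
transforms.addSource.type=org.apache.kafka.connect.transforms.InsertField$Value
transforms.addSource.static.field=source
transforms.addSource.static.value=sensor-pipeline

errors.tolerance=all                                              # route failures
errors.deadletterqueue.topic.name=grpc-sink-dlq                   # Dead Letter Queue
```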
11. Types of Connectors
❏ Source Connector - Kinesis, Zendesk, Jira, Twitter, email
❏ Sink Connector - S3, MongoDB, HDFS, YouNameIt
❏ Connect Hub - https://www.confluent.io/hub/
❏ Different Transforms and Converters are also available
❏ Availability in Confluent Cloud - fully managed
❏ Licenses, levels of verification
13. Writing your own Connectors. Yes, we can!
❏ Why?
❏ Oh, we don’t have that Connector!
❏ We have a Connector, but we need to customize it to our needs
❏ Complete control over how we want to move the data
❏ Give it back to the community
❏ How?
❏ Kafka Connect API to the rescue!
❏ Implement/extend your Connector, Task, Config interfaces/abstract classes
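A minimal sketch of what implementing those classes looks like. The class names, the "grpc.endpoint" config key, and the version string are hypothetical; the interfaces and abstract classes are the real Kafka Connect API.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Map;

import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.connect.connector.Task;
import org.apache.kafka.connect.sink.SinkConnector;
import org.apache.kafka.connect.sink.SinkRecord;
import org.apache.kafka.connect.sink.SinkTask;

// Sketch only - not the actual WeWork implementation
public class GrpcSinkConnector extends SinkConnector {
    static final ConfigDef CONFIG_DEF = new ConfigDef()
            .define("grpc.endpoint", ConfigDef.Type.STRING,
                    ConfigDef.Importance.HIGH, "host:port of the gRPC server");

    private Map<String, String> props;

    @Override public void start(Map<String, String> props) { this.props = props; }
    @Override public Class<? extends Task> taskClass() { return GrpcSinkTask.class; }
    @Override public List<Map<String, String>> taskConfigs(int maxTasks) {
        // Each task gets the same config; Connect assigns topic partitions.
        return Collections.nCopies(maxTasks, props);
    }
    @Override public void stop() { }
    @Override public ConfigDef config() { return CONFIG_DEF; }
    @Override public String version() { return "0.1.0"; }
}

class GrpcSinkTask extends SinkTask {
    @Override public void start(Map<String, String> props) { /* open channel, create stub */ }
    @Override public void put(Collection<SinkRecord> records) { /* bulkSend(records) */ }
    @Override public void stop() { /* shut down the channel */ }
    @Override public String version() { return "0.1.0"; }
}
```

Connect instantiates the Connector, asks it for task configs, and then runs the Tasks on Workers; put() is where the sink-specific copying logic lives.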
18. GrpcClient Interface
The crux of bulkSend(), AKA: how is the gRPC Connector different from a MySQL Connector?
gRPC glossary
❏ stub: generated when protoc is run if a service declaration is in the proto file
❏ service class
❏ rpc name and args
❏ channel: provides a connection to a gRPC server on a specified host and port
❏ created from the gRPC server URL and port
19. bulkSend()
bulkSend() forms the crux of the gRPC Sink Connector’s data copying
❏ Handles channel readiness (connectivity state)
❏ Manages security and all error-handling logic
❏ Controls the rate of data copying
❏ Potentially retry logic
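The retry/backoff behavior described above can be sketched in isolation. This is an illustrative model only: RpcCall, MAX_RETRIES, and the String record type are assumptions, standing in for the generated gRPC blocking stub calling over a ManagedChannel in the real connector.

```java
import java.util.List;

// Illustrative sketch of bulkSend()'s retry logic - not the actual connector code
public class BulkSender {
    @FunctionalInterface
    public interface RpcCall { void send(List<String> batch) throws Exception; }

    private static final int MAX_RETRIES = 3;

    /** Returns the number of attempts used, or rethrows after exhausting retries. */
    public static int bulkSend(List<String> batch, RpcCall call) throws Exception {
        long backoffMs = 100;
        for (int attempt = 1; ; attempt++) {
            try {
                call.send(batch);                     // e.g. stub.ingestEvents(request)
                return attempt;
            } catch (Exception e) {
                if (attempt >= MAX_RETRIES) throw e;  // surface to Connect (-> DLQ)
                Thread.sleep(backoffMs);              // wait for the channel to recover
                backoffMs *= 2;                       // exponential backoff
            }
        }
    }
}
```

Failures that survive all retries propagate out of put(), where Connect's error-tolerance settings decide between failing the task and routing to the Dead Letter Queue.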
22. Testing, Deployment, and Monitoring
❏ Testing Connector by mocking is hard and tricky as it involves two systems -
Kafka and the external source/sink
❏ Independent unit-testing is recommended for all Task and Connector classes
❏ End-to-end testing using gRPC servers created on the fly in Docker
containers of CircleCI testing plan
❏ Extremely difficult to test gRPC channel connectivity states and other error
scenarios
23. Testing, Deployment, and Monitoring (contd…)
❏ Packaging - for easy installation into Kafka Connect installations
❏ By creating an Archive
❏ create a tarball or ZIP archive
❏ contains a single directory with a unique name (typically name and version)
❏ all JAR files and other resource files needed by the connector live in that top-level directory
❏ doesn’t include Kafka Connect API or runtime libraries
❏ By creating an Uber JAR
❏ create an uber JAR that contains all JAR files and other resource files
❏ Installation -
❏ User simply unpacks the archive or places the uber JAR in a directory listed in Kafka
Connect’s plugin path
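For illustration, an archive for a hypothetical connector might unpack like this (names and versions are placeholders):

```
grpc-sink-connector-0.1.0/            # single top-level directory: name + version
├── grpc-sink-connector-0.1.0.jar     # the connector, tasks, and config classes
├── <dependency>.jar                  # third-party dependencies, e.g. gRPC jars
└── ...                               # but no Connect API or runtime libraries
```

The worker then finds it via the `plugin.path` setting in the Connect worker configuration.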
24. Monitoring Connectors
❏ Monitored via Connect’s extensive REST interface
❏ current status of a connector and its tasks
❏ worker ids to whom tasks are assigned
❏ pause/resume APIs
❏ active connectors, connector tasks, restart a connector, restart a task, update config, delete
connector
❏ Logging
❏ Connect ships with Apache Log4j, the default Java-based logging utility, to collect
runtime data and record component events
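The monitoring operations above map onto Connect's REST interface (port 8083 by default). A few representative endpoints:

```
GET    /connectors                        # list active connectors
GET    /connectors/{name}/status          # connector + task status, worker ids
PUT    /connectors/{name}/pause           # pause a connector
PUT    /connectors/{name}/resume          # resume a connector
POST   /connectors/{name}/restart         # restart a connector
POST   /connectors/{name}/tasks/{id}/restart   # restart a single task
PUT    /connectors/{name}/config          # update configuration
DELETE /connectors/{name}                 # delete a connector
```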
25. Connect metrics and metrics using Prometheus
❏ Reports a variety of metrics through Java Management Extensions (JMX)
❏ task and worker metrics - status, running-ratio,
offset-commit-success-percentage, offset-commit-avg-time-ms, task-count,
connector-count, rebalancing metrics
❏ A variety of client metrics like connection-count, connection-close-rate,
network-io-rate, outgoing-byte-rate, request-rate etc
❏ gRPC Sink Connector metrics:
❏ sink-record-read-rate
❏ sink-record-active-count
❏ sink-record-read-total
❏ sink-record-send-rate
26. Connect metrics and metrics using Prometheus (contd…)
❏ The monitoring tool Prometheus ingests metrics, makes them graphable, and
helps build alerts on top of metrics
❏ pulls metrics from HTTP endpoints added to the Prometheus configuration file
❏ provides JMX Exporter, a collector that can configurably scrape and expose
mBeans of a JMX target
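As a hedged sketch, a JMX Exporter rule for Connect's connector-task MBeans might look like the following; the exact pattern and metric naming are assumptions to be adapted to your deployment.

```yaml
# Illustrative JMX Exporter config fragment: expose Connect task metrics
# (e.g. status, offset-commit-avg-time-ms) as Prometheus metrics.
rules:
  - pattern: "kafka.connect<type=connector-task-metrics, connector=(.+), task=(.+)><>(.+): (.+)"
    name: kafka_connect_connector_task_metrics_$3
    labels:
      connector: "$1"
      task: "$2"
```

Prometheus then scrapes the exporter's HTTP endpoint like any other target listed in its configuration file.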
27. Challenges and Lessons Learned
❏ Configuring more than one RPC in a service
❏ Configuring RPCs with multiple arguments
❏ Testing the two components of the system
❏ Connectors can be extremely flexible, yet hide intricate logic when used off
the shelf