Creating Connector to bridge the worlds of Kafka and gRPC at WeWork
-- Anoop Dixith
Agenda
❏ Role of Kafka at WeWork
  ❏ Why Kafka? Why gRPC?
❏ Life without Connect
❏ Why Connectors?
❏ gRPC sink Connector at WeWork
  ❏ Configuration, implementation, monitoring
Architecture (publisher side)
Architecture (continued)
❏ Importance of the architecture
❏ Kafka-gRPC combination
❏ Java/Scala-Go
❏ Why gRPC?
❏ How the combination worked perfectly?
Architecture (continued)
gRPC
❏ Machine-readable API contract, so platform neutral
❏ Protobuf, so polyglot and faster, smaller, simpler, and safer payloads
❏ HTTP/2 - multiplexing, binary, safer, future
❏ Streaming vs request/response
❏ Community
Kafka
❏ Throughput
❏ Distributed, HA
❏ Scalability
❏ Complementary components - KStream, KTable, KSQL, Connectors
❏ Community
Life Without Connect
In the very beginning, there was no Connect
Kafka sources were connected to a variety of sinks at WeWork: gRPC,
Elasticsearch, etc. And hey, it worked well.
Yes, but what about
❏ Scalability?
❏ Security?
❏ Configurability?
❏ Error handling?
❏ Extensibility?
Do not reinvent the wheel!
Architecture without Connect
Then there was Connect...
Very simply, Kafka Connect is a framework to stream data into and out of Kafka
Properties:
❏ Broad copying, scalability, streaming and batch apps, parallelism.
❏ Does one thing, and does it very well: copying data
❏ Extensible through Connectors
Models:
❏ Connector
❏ Worker
❏ Data
Connect components
❏ Connectors – abstraction that handles data streaming by managing “tasks”
❏ Tasks – the implementation of how data is copied to or from Kafka
❏ Workers – the running processes that execute connectors and tasks
❏ Converters – translate data between Connect and the source/sink
❏ Transforms – alter messages produced by or sent to a connector
❏ Dead Letter Queue – holds records that a sink connector could not process (error handling)
Types of Connectors
❏ Source Connector - Kinesis, Zendesk, Jira, Twitter, email
❏ Sink Connector - S3, MongoDB, HDFS, YouNameIt
❏ Confluent Hub - https://www.confluent.io/hub/
❏ Transforms and Converters are also available there
❏ Availability in Confluent Cloud - fully managed
❏ Licenses, levels of verification
Connect flow
Writing your own Connectors. Yes, we can!
❏ Why?
❏ Oh, we don’t have that Connector!
❏ We have a Connector, but we need to customize it to our needs
❏ Complete control over how we want to move the data
❏ Give it back to the community
❏ How?
❏ Kafka Connect API to the rescue!
❏ Implement or extend the Connector, Task, and Config interfaces and abstract classes
Custom Connectors
gRPC Sink Connector
❏ What is it intended to do?
❏ Why not directly sink to underlying databases?
Implementation
❏ SinkRecord
❏ SinkTask
GrpcService Interface
GrpcClient Interface
Crux of bulkSend(), AKA how is the gRPC Connector different from a MySQL Connector?
gRPC glossary
❏ stub: generated when protoc is run if a service declaration is in the proto file
❏ service class
❏ rpc name and args
❏ channel: provides a connection to a gRPC server on a specified host and port
❏ along with the gRPC server URL and port
bulkSend()
bulkSend() forms the crux of the gRPC Sink Connector’s data copying
❏ Handles channel readiness (connectivity state)
❏ Manages security and all logic related to error-handling
❏ Controls the rate of data copying
❏ Potentially retry logic
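The slide with the actual bulkSend() code did not survive in this transcript, so the following is only a minimal sketch of the rate-control and retry responsibilities listed above. It assumes records are sent in fixed-size batches and that a failed batch is retried with linear backoff. The class name `BulkSendSketch`, the parameters, and the `sendBatch` callback (which stands in for the real gRPC call) are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical sketch; not the connector's actual bulkSend() implementation. */
final class BulkSendSketch {

    /**
     * Splits records into batches of batchSize and tries each batch up to
     * maxAttempts times, sleeping backoffMs * attempt after a failed attempt.
     * Returns the number of records delivered; throws if a batch never succeeds.
     */
    static <T> int bulkSend(List<T> records, int batchSize, int maxAttempts,
                            long backoffMs, Predicate<List<T>> sendBatch) {
        if (batchSize < 1 || maxAttempts < 1) {
            throw new IllegalArgumentException("batchSize and maxAttempts must be >= 1");
        }
        int delivered = 0;
        for (int start = 0; start < records.size(); start += batchSize) {
            int end = Math.min(start + batchSize, records.size());
            List<T> batch = new ArrayList<>(records.subList(start, end));
            boolean ok = false;
            for (int attempt = 1; attempt <= maxAttempts && !ok; attempt++) {
                ok = sendBatch.test(batch); // stand-in for the real RPC
                if (!ok && attempt < maxAttempts) {
                    sleepQuietly(backoffMs * attempt); // linear backoff
                }
            }
            if (!ok) {
                throw new IllegalStateException("batch starting at index " + start
                        + " failed after " + maxAttempts + " attempts");
            }
            delivered += batch.size();
        }
        return delivered;
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new IllegalStateException("interrupted while backing off", e);
        }
    }
}
```

Inside a real SinkTask, an alternative to retrying in-process is to throw Kafka Connect's `RetriableException` from `put()`, which asks the framework to redeliver the same records later.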
Limitations of bulkSend()
Uses reflection to get stub classes and methods
What you can’t do with bulkSend() (yet)?
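To illustrate what looking up stub methods by reflection can look like, here is a minimal sketch using only `java.lang.reflect`. `EchoStub` is a hypothetical stand-in for a protoc-generated stub, and `invokeRpc` is an assumed helper name. The sketch handles only methods that take a single request argument.

```java
import java.lang.reflect.Method;

/** Hypothetical stand-in for a protoc-generated blocking stub. */
final class EchoStub {
    public String echo(String request) {
        return "echo:" + request;
    }
}

/** Hypothetical sketch of resolving an rpc by its configured name. */
final class ReflectiveRpcSketch {

    /** Finds a public one-argument method named rpcName on the stub and invokes it. */
    static Object invokeRpc(Object stub, String rpcName, Object request) {
        for (Method m : stub.getClass().getMethods()) {
            boolean matches = m.getName().equals(rpcName)
                    && m.getParameterCount() == 1
                    && m.getParameterTypes()[0].isInstance(request);
            if (matches) {
                try {
                    return m.invoke(stub, request);
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException("rpc " + rpcName + " failed", e);
                }
            }
        }
        throw new IllegalArgumentException(
                "no rpc named " + rpcName + " on " + stub.getClass().getName());
    }
}
```

Example use: `ReflectiveRpcSketch.invokeRpc(new EchoStub(), "echo", "hi")` returns `"echo:hi"`. With real generated code, the stub itself would typically come from the generated service class's static `newBlockingStub(channel)` factory, which could be looked up by name in the same way.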
Configuration
Configs passed to GrpcSinkConnector object
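The configuration slide itself is not reproduced in this transcript, so the sketch below only illustrates the kind of settings such a connector might take. `name`, `connector.class`, `tasks.max`, and `topics` are standard Kafka Connect keys. Every `grpc.*` key, the `com.example` package, and the values are hypothetical names chosen to mirror the glossary (service class, rpc name, server host and port).

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of a config map for a gRPC sink connector. */
final class GrpcSinkConfigSketch {

    /** Builds an example config; all grpc.* keys are assumed names. */
    static Map<String, String> exampleConfig() {
        Map<String, String> c = new LinkedHashMap<>();
        c.put("name", "grpc-sink-example");                        // standard Connect key
        c.put("connector.class", "com.example.GrpcSinkConnector"); // package is hypothetical
        c.put("tasks.max", "2");                                   // standard Connect key
        c.put("topics", "example-topic");                          // standard Connect key
        c.put("grpc.server.host", "localhost");                    // hypothetical key
        c.put("grpc.server.port", "50051");                        // hypothetical key
        c.put("grpc.service.class", "com.example.EchoGrpc");       // hypothetical key
        c.put("grpc.rpc.name", "echo");                            // hypothetical key
        return c;
    }

    /** Returns the configured port, rejecting missing or out-of-range values. */
    static int requirePort(Map<String, String> config) {
        String raw = config.get("grpc.server.port");
        if (raw == null) {
            throw new IllegalArgumentException("grpc.server.port is required");
        }
        int port = Integer.parseInt(raw.trim());
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("port out of range: " + port);
        }
        return port;
    }
}
```

In an actual connector, this kind of validation is normally declared with Kafka's `ConfigDef` rather than hand-written checks.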
Testing, Deployment, and Monitoring
❏ Testing Connector by mocking is hard and tricky as it involves two systems -
Kafka and the external source/sink
❏ Independent unit-testing is recommended for all Task and Connector classes
❏ End-to-end testing using gRPC servers created on the fly in Docker
containers as part of the CircleCI test plan
❏ Extremely difficult to test gRPC channel connectivity states and other error
scenarios
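One way to make connectivity-state handling unit-testable is to hide the channel behind a state supplier and replay scripted states in tests. The sketch below assumes that approach; it is not taken from the talk. The local `State` enum mirrors the constant names of gRPC Java's `io.grpc.ConnectivityState`, while `awaitReady` and `scripted` are hypothetical helpers.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical sketch: readiness logic that can be tested without a real channel. */
final class ChannelReadinessSketch {

    /** Local mirror of the state names in io.grpc.ConnectivityState. */
    enum State { IDLE, CONNECTING, READY, TRANSIENT_FAILURE, SHUTDOWN }

    /** Polls up to maxPolls times; true once READY is seen, false on SHUTDOWN or exhaustion. */
    static boolean awaitReady(Supplier<State> stateSource, int maxPolls) {
        for (int i = 0; i < maxPolls; i++) {
            State s = stateSource.get();
            if (s == State.READY) {
                return true;
            }
            if (s == State.SHUTDOWN) {
                return false; // a shut-down channel never becomes ready
            }
        }
        return false;
    }

    /** Test helper: replays a fixed list of states, then repeats the last one. */
    static Supplier<State> scripted(List<State> states) {
        Iterator<State> it = states.iterator();
        State[] last = { State.IDLE };
        return () -> {
            if (it.hasNext()) {
                last[0] = it.next();
            }
            return last[0];
        };
    }
}
```

In production code the supplier could wrap `ManagedChannel.getState(true)`, mapping its result onto the local enum.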
Testing, Deployment, and Monitoring (contd…)
❏ Packaging - for easy installation into Kafka Connect installations
❏ By creating an Archive
❏ create a tarball or ZIP archive
❏ contains a single directory with unique name (name and version likely)
❏ all JAR files and other resource files needed by the connector are in the top-level directory
❏ doesn’t include Kafka Connect API or runtime libraries
❏ By creating an Uber JAR
❏ create an uber JAR that contains all JAR files and other resource files
❏ Installation -
❏ User simply unpacks the archive or places the uber JAR in a directory listed in Kafka
Connect’s plugin path
Monitoring Connectors
❏ Monitored via Connect’s extensive REST interface
❏ current status of a connector and its tasks
❏ IDs of the workers to which tasks are assigned
❏ pause/resume APIs
❏ active connectors, connector tasks, restart a connector, restart a task, update config, delete
connector
❏ Logging
❏ By default, Connect uses Apache Log4j, a Java-based logging utility, to collect runtime data
and record component events
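The REST operations listed above map onto a small set of Kafka Connect endpoints. The sketch below builds the corresponding requests with Java 11's `java.net.http` without sending them. The class name `ConnectRestSketch`, the worker address `http://localhost:8083`, and the connector name are assumptions, and connector names are assumed to need no URL encoding.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical helper that builds requests for Kafka Connect's REST interface. */
final class ConnectRestSketch {
    private final String baseUrl; // e.g. http://localhost:8083 (assumed worker address)

    ConnectRestSketch(String baseUrl) {
        this.baseUrl = baseUrl;
    }

    private URI uri(String path) {
        return URI.create(baseUrl + path);
    }

    /** GET /connectors/{name}/status: state of the connector and its tasks. */
    HttpRequest status(String connector) {
        return HttpRequest.newBuilder(uri("/connectors/" + connector + "/status"))
                .GET().build();
    }

    /** PUT /connectors/{name}/pause */
    HttpRequest pause(String connector) {
        return HttpRequest.newBuilder(uri("/connectors/" + connector + "/pause"))
                .PUT(HttpRequest.BodyPublishers.noBody()).build();
    }

    /** PUT /connectors/{name}/resume */
    HttpRequest resume(String connector) {
        return HttpRequest.newBuilder(uri("/connectors/" + connector + "/resume"))
                .PUT(HttpRequest.BodyPublishers.noBody()).build();
    }

    /** POST /connectors/{name}/tasks/{id}/restart */
    HttpRequest restartTask(String connector, int taskId) {
        return HttpRequest.newBuilder(
                        uri("/connectors/" + connector + "/tasks/" + taskId + "/restart"))
                .POST(HttpRequest.BodyPublishers.noBody()).build();
    }
}
```

A request built this way can be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`; the status response is a JSON document that includes the worker ID for the connector and for each task.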
Connect metrics and metrics using Prometheus
❏ Reports a variety of metrics through Java Management Extensions (JMX)
❏ task and worker metrics - status, running-ratio,
offset-commit-success-percentage, offset-commit-avg-time-ms, task-count,
connector-count, rebalancing metrics
❏ A variety of client metrics like connection-count, connection-close-rate,
network-io-rate, outgoing-byte-rate, request-rate, etc.
❏ gRPC Sink Connector metrics:
❏ sink-record-read-rate
❏ sink-record-active-count
❏ sink-record-read-total
❏ sink-record-send-rate
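The four sink-record-* names above also appear as attributes of Kafka Connect's standard `sink-task-metrics` MBean group. As a rough sketch of reading them over JMX from inside the worker JVM, the code below queries the platform MBean server with a wildcard pattern. The class and method names are hypothetical, and outside a running Connect worker the query simply finds nothing.

```java
import java.lang.management.ManagementFactory;
import java.util.Set;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

/** Hypothetical sketch for locating and reading sink task metrics over JMX. */
final class SinkMetricsSketch {

    /** Pattern matching the sink-task-metrics MBean of every connector and task. */
    static ObjectName sinkTaskPattern() {
        try {
            return new ObjectName("kafka.connect:type=sink-task-metrics,connector=*,task=*");
        } catch (MalformedObjectNameException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns the matching MBean names registered in this JVM (empty outside Connect). */
    static Set<ObjectName> findSinkTaskMBeans() {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        return server.queryNames(sinkTaskPattern(), null);
    }

    /** Reads one attribute, for example "sink-record-read-rate", from a matched MBean. */
    static Object read(ObjectName name, String attribute) {
        try {
            return ManagementFactory.getPlatformMBeanServer().getAttribute(name, attribute);
        } catch (JMException e) {
            throw new IllegalStateException("cannot read " + attribute + " from " + name, e);
        }
    }
}
```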
Connect metrics and metrics using Prometheus (contd…)
❏ The monitoring tool Prometheus ingests metrics, makes them graphable, and
helps build alerts on top of metrics
❏ pulls metrics from HTTP endpoints added to the Prometheus configuration file
❏ provides JMX Exporter, a collector that can be configured to scrape and expose
MBeans of a JMX target
Challenges and Lessons Learned
❏ Configuring more than one RPC in a service
❏ Configuring RPCs with multiple arguments
❏ Testing the two components of the system
❏ Connectors have the capability to be extremely flexible, and can also hide
intricate logic when used off the shelf
Thank you!
Let's Connect over questions

Creating Connector to Bridge the Worlds of Kafka and gRPC at Wework (Anoop Dixith, Wework) Kafka Summit 2020
