2. Presenter
Maheedhar Gunturu, Solutions Architect
Maheedhar has held senior roles in both engineering and sales
organizations. He has over a decade of experience designing and
developing server-side applications in the cloud and working on
big data and ETL frameworks at companies such as Samsung,
MapR, Apple, VoltDB, Zscaler, and Qualcomm. He holds a
master's degree in Electrical and Computer Engineering from
the University of Texas at San Antonio.
7. Intermediate layer for buffering.
• Provides flexibility for downstream consumers.
  • Buffers data during upgrades, migrations, or troubleshooting.
  • Downstream systems don't have to be provisioned for peak traffic, which saves hardware costs.
• Dynamically scalable layer to handle bursty loads.
  • Add more partitions/brokers to increase parallelism and throughput.
  • Use the Kafka operator and it will dynamically scale the cluster based on ingress traffic.
• Provides resiliency and fault tolerance.
  • Each topic has replicas, with multiple partitions spread across multiple brokers.
  • Set TTLs at the topic level to determine retention (see the sketch after this list).
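As a concrete illustration of the last point, here is a minimal sketch using the Java AdminClient to create a buffering topic with partitions for parallelism, replicas for fault tolerance, and a topic-level retention TTL. The broker address, topic name, and counts are assumptions, not values from the talk.

// Create a topic with partitions, replicas, and a retention TTL.
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.config.TopicConfig;

import java.util.List;
import java.util.Map;
import java.util.Properties;

public class CreateBufferTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker

        try (Admin admin = Admin.create(props)) {
            // 12 partitions for parallelism, replication factor 3 for fault tolerance
            NewTopic topic = new NewTopic("sensor-events", 12, (short) 3)
                    .configs(Map.of(TopicConfig.RETENTION_MS_CONFIG, "604800000")); // 7-day TTL
            admin.createTopics(List.of(topic)).all().get();
        }
    }
}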
9. Publish CDC streams
• Publishes record-level changes to the corresponding topics.
  • The level of detail in the change records is usually configurable.
• Upstream changes to watched rows are emitted as change records.
  • The format of these records is configurable (JSON, Avro, etc.).
• Downstream processing for reporting, caching, or full-text indexing.
  • Subscribe with the corresponding consumer (e.g. Scylla, Elasticsearch, Spark); a consumer sketch follows this list.
• Changefeeds are emitted with at-least-once delivery guarantees.
  • In most cases, each version of a row will be emitted once; however, some infrequent
    conditions (e.g. node failures, network partitions) will cause repeats.
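A minimal sketch of a downstream consumer reading such a changefeed topic. The broker address, group id, and topic name are assumptions; because delivery is at-least-once, processing should be idempotent.

// Subscribe to a CDC changefeed topic and apply each change record.
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class ChangefeedConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "reporting-service");       // assumed group
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("cdc.users")); // assumed changefeed topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // Each value is one change record (e.g. JSON); apply it idempotently
                    // so occasional redelivery after node failures is harmless.
                    System.out.printf("key=%s change=%s%n", record.key(), record.value());
                }
            }
        }
    }
}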
10. Integrations
[Diagram: a Kafka topic read by example consumers (App 1, App 2, App 3), each using a serializer that validates records against the Schema Registry; integrations shown include Scylla, Mongo, Elastic, and HBase.]
• Define the expected fields for each Kafka topic (a producer wiring sketch follows this list).
• Automatically handle schema changes (e.g. new fields).
• Prevent backwards-incompatible changes.
• Support multi-data-center environments.
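A minimal sketch of wiring a producer to the Schema Registry so every record is validated against the topic's registered schema. The registry URL and topic name are assumptions, and the Avro serializer requires Confluent's client library on the classpath.

// Configure a producer to serialize values through the Schema Registry.
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;

import java.util.Properties;

public class AvroProducerConfig {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        // Confluent's Avro serializer registers/fetches schemas automatically
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "io.confluent.kafka.serializers.KafkaAvroSerializer");
        props.put("schema.registry.url", "http://localhost:8081"); // assumed registry address

        try (KafkaProducer<String, Object> producer = new KafkaProducer<>(props)) {
            // The value must be an Avro record matching the subject's schema;
            // backwards-incompatible changes are rejected by the registry:
            // producer.send(new ProducerRecord<>("sensor-events", key, avroRecord));
        }
    }
}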
11. Streaming Data Transformations.
[Diagram: within a Kafka cluster, a producer writes to topics; Streams API applications read from topics, transform the data, and write back to topics read by consumers.]
Overview
• Write standard Java applications
• No separate processing cluster required
• Exactly-once processing semantics
• Elastic, highly scalable, fault-tolerant
• Fully integrated with Kafka security
Example Use Cases
• Event-driven microservices
• Continuous queries
• Continuous transformations (see the sketch below)
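A minimal sketch of the Overview above: a plain Java application using the Streams API as a continuous transformation, with no separate processing cluster. The topic names and conversion logic are assumptions for illustration.

// Continuously convert Celsius readings to Fahrenheit between two topics.
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

import java.util.Properties;

public class CelsiusToFahrenheit {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "temperature-transformer");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        // Opt in to exactly-once processing (EXACTLY_ONCE on pre-3.0 clients)
        props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE_V2);

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> celsius = builder.stream("temperature"); // assumed input topic
        // Continuous transformation: convert each reading and republish
        celsius.mapValues(v -> String.valueOf(Double.parseDouble(v) * 9 / 5 + 32))
               .to("temperature-fahrenheit"); // assumed output topic

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}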
12. Impedance mismatch between applications
• Applications produce and consume data at different rates.
• Provides flexibility for downstream applications to scale based on their SLAs.
  • Downstream applications can be independently scaled (see the sketch after this list).
• Dynamically move partitions to optimize resource utilization and reliability.
• Enable elastic scaling by easily adding and removing nodes from your Kafka cluster.
• Tuning a topic's configuration helps make efficient use of consumers.
  • Determine the ratio between the number of partitions in a topic and the number of consumers.
• ADB traffic is throttled during data transfers to protect network bandwidth.
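A minimal sketch of independently scaling one side of the pipeline: growing a topic's partition count so more consumers in the group can run in parallel. The topic name and counts are assumptions; per the editor's notes, a 1:1 or 1:2 partitions-to-consumers ratio is typical.

// Increase a topic's partition count to allow more parallel consumers.
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewPartitions;

import java.util.Map;
import java.util.Properties;

public class ScaleTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker

        try (Admin admin = Admin.create(props)) {
            // Grow to 24 partitions (partition counts can only increase),
            // then add consumers to the group to absorb the extra parallelism.
            admin.createPartitions(Map.of("sensor-events", NewPartitions.increaseTo(24)))
                 .all().get();
        }
    }
}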
13. Event Sourcing
• Every change to the state of an application is captured in an event object.
• The order of the events needs to be maintained.
• Ability to recreate state in your application and the supporting database.
• CQRS provides the benefit of event sourcing analogous to a materialized view.
• Need to keep track of lineage and the transformations that were run on the data.
• Newer versions of ML algorithms can operate on the raw event data to recreate the state in the database (see the sketch after this list).
  • Better model serving/benchmarking.
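A minimal, self-contained sketch of the event-sourcing idea: state is never stored directly, it is recreated by replaying an ordered event log, and a newer model can replay the same raw events to build a different view (the CQRS point above). The account/balance domain is illustrative, not from the talk.

// Rebuild state by replaying an ordered event log.
import java.util.ArrayList;
import java.util.List;

public class AccountEventSourcing {
    record Event(String type, long amount) {} // illustrative event object

    // Replaying the log in order recreates the current balance.
    static long replay(List<Event> log) {
        long balance = 0;
        for (Event e : log) {
            switch (e.type()) {
                case "DEPOSIT"  -> balance += e.amount();
                case "WITHDRAW" -> balance -= e.amount();
            }
        }
        return balance;
    }

    public static void main(String[] args) {
        List<Event> log = new ArrayList<>();
        log.add(new Event("DEPOSIT", 100));
        log.add(new Event("WITHDRAW", 30));
        log.add(new Event("DEPOSIT", 5));
        System.out.println(replay(log)); // prints 75
    }
}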
16. Kafka Connect Features
01. A standard framework for Kafka connectors.
02. Distributed and standalone modes.
03. REST interface for configuration (port 8083).
04. Distributed and scalable by default.
05. Automatic offset management.
06. Streaming/batch integration.
17. Kafka Connect API
[Diagram: sources (CDC streams, databases, Mongo, Cassandra) feed Connect workers through the Kafka Connect API into the Kafka pipeline, and Connect workers write out to sinks (Elastic, Scylla, HDFS).]
• Auto-recovery and fault tolerance.
• Manages hundreds of data sources and sinks.
• Preserves the data schema.
• Integrated within Confluent Control Center.
• Simple parallelism (a connector skeleton sketch follows this list).
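A minimal sketch of what the "standard framework" looks like from a developer's side: a sink connector and its task. The class names and logging body are illustrative; a real connector adds configuration validation and error handling.

// Skeleton of a Kafka Connect sink connector and its task.
import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.connect.connector.Task;
import org.apache.kafka.connect.sink.SinkConnector;
import org.apache.kafka.connect.sink.SinkRecord;
import org.apache.kafka.connect.sink.SinkTask;

import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class LoggingSinkConnector extends SinkConnector {
    private Map<String, String> config;

    @Override public String version() { return "0.1"; }
    @Override public void start(Map<String, String> props) { this.config = props; }
    @Override public Class<? extends Task> taskClass() { return LoggingSinkTask.class; }
    @Override public List<Map<String, String>> taskConfigs(int maxTasks) {
        // Simple parallelism: the framework spawns up to tasks.max task copies
        return Collections.nCopies(maxTasks, config);
    }
    @Override public void stop() { }
    @Override public ConfigDef config() { return new ConfigDef(); }

    public static class LoggingSinkTask extends SinkTask {
        @Override public String version() { return "0.1"; }
        @Override public void start(Map<String, String> props) { }
        @Override public void put(Collection<SinkRecord> records) {
            // Offsets are committed automatically by the framework
            records.forEach(r -> System.out.println(r.topic() + ": " + r.value()));
        }
        @Override public void stop() { }
    }
}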
18. Configuring Kafka Connect (sink)
# sample cassandra-sink.properties file
name=sink
topics=temperature
tasks.max=1
connector.class=io.confluent.connect.cassandra.CassandraSinkConnector
cassandra.contact.points=<public IPs of your Scylla cluster (IP1,IP2,IP3)>
cassandra.keyspace=demo
cassandra.compression=SNAPPY
cassandra.consistency.level=LOCAL_QUORUM
transforms=prune
transforms.prune.type=org.apache.kafka.connect.transforms.ReplaceField$Value
transforms.prune.whitelist=CreatedAt,Id,Text,Source,Truncated
1. Update the sink.properties file.
2. Update the connect-distributed.properties file.
3. Start the Connect framework using the Cassandra connector in distributed mode (a REST registration sketch follows the reference link below).
ref: https://www.scylladb.com/2018/12/19/scylla-and-confluent-for-iot/
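A minimal sketch of step 3: once the workers are running in distributed mode, the connector is created through the REST interface on port 8083. The host is an assumption, and the JSON body mirrors the properties file above.

// Register the sink connector via the Connect REST API.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RegisterSinkConnector {
    public static void main(String[] args) throws Exception {
        String body = """
            {"name": "sink",
             "config": {
               "connector.class": "io.confluent.connect.cassandra.CassandraSinkConnector",
               "topics": "temperature",
               "tasks.max": "1",
               "cassandra.contact.points": "IP1,IP2,IP3",
               "cassandra.keyspace": "demo"}}""";

        HttpRequest request = HttpRequest.newBuilder(URI.create("http://localhost:8083/connectors"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
        HttpResponse<String> response =
                HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}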
19. Kafka Connect Security
Encryption
• Kafka Connect also works with SSL-encrypted connections to the brokers.
Authentication
• Kafka Connect works with SASL, e.g. Kerberos or Active Directory.
Authorization
• Restrict who can create, write to, and read from topics, and more.
• The REST API on Kafka Connect nodes is not secured.
  • When configuring a secure cluster, an external proxy (e.g. Apache HTTP Server) is required to act as a secure gateway to the REST services (a settings sketch follows below).
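A minimal sketch of the worker-side settings described above: TLS to the brokers plus SASL authentication. The paths, password, and mechanism are placeholders, not values from the talk.

// Broker-connection security settings for a Connect worker.
import java.util.Properties;

public class SecureWorkerProps {
    static Properties secureProps() {
        Properties props = new Properties();
        props.put("security.protocol", "SASL_SSL");                        // encrypt + authenticate
        props.put("ssl.truststore.location", "/etc/kafka/truststore.jks"); // placeholder path
        props.put("ssl.truststore.password", "changeit");                  // placeholder
        props.put("sasl.mechanism", "GSSAPI");                             // e.g. Kerberos
        props.put("sasl.kerberos.service.name", "kafka");
        return props;
    }
}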
20. Confluent Hub
• Discover and share connectors.
• Cassandra (OSS) and DynamoDB source/sink connectors are available.
• A Scylla shard-aware connector will be published soon!
https://www.confluent.io/hub/confluentinc/kafka-connect-cassandra
22. Takeaways
• Message queues are useful for a variety of reasons.
• The Scylla Kafka Connector (sink and CDC source) will be coming out soon!
• Event streaming and event-driven microservices are useful; try them out!
23. Thank you! Stay in touch.
Any questions?
Maheedhar Gunturu
maheedhar@scylladb.com
@vanguard_space
24. Some useful links
Here are some useful links for further reading/watching.
1. Useful video explaining most things at an introductory level – https://www.confluent.io/kafka-summit-sf18/so-you-want-to-write-a-connector
2. Confluent's developer guide to connectors, which covers most of the basics – https://docs.confluent.io/current/connect/devguide.html
3. The source for the above developer guide is available through Maven here – https://mvnrepository.com/artifact/org.apache.kafka/connect-file/2.1.1
4. Useful guide providing additional best practices (now deprecated, though still useful) – https://docs.google.com/document/d/1jEn_G-KDsrhdecPTGIWIcke1I4gw4fR0G8OVj8e3iAI/edit#
5. Verification guide, though a little generic as it covers both connectors and consumers/producers – https://www.confluent.io/wp-content/uploads/Verification-Guide-Confluent-Platform-Connectors-Integrations.pdf
6. https://opencredo.com/blogs/kafka-connect-source-connectors-a-detailed-guide-to-connecting-to-what-you-love/
7. https://www.confluent.io/blog/simplest-useful-kafka-connect-data-pipeline-world-thereabouts-part-1/
8. https://www.confluent.io/blog/the-simplest-useful-kafka-connect-data-pipeline-in-the-world-or-thereabouts-part-2/
9. https://www.confluent.io/blog/simplest-useful-kafka-connect-data-pipeline-world-thereabouts-part-3/
Editor's notes
Kafka is highly available, resilient to node failures, and supports automatic recovery. This makes Apache Kafka ideal for communication and integration between components of large-scale, real-world data systems.
A typical ratio of the number of partitions in a topic to the number of consumers in a group would be (1:1) or (1:2)
https://www.confluent.io/blog/how-choose-number-topics-partitions-kafka-cluster
https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
https://www.confluent.io/blog/apache-kafka-supports-200k-partitions-per-cluster
The topic partition is the unit of parallelism in Kafka. On both the producer and the broker side, writes to different partitions can be done fully in parallel. Kafka can replicate partitions across a configurable number of Kafka servers. Each partition has a leader server and zero or more follower servers; leaders handle all read and write requests for a partition.
Kafka also uses partitions for parallel consumer handling within a group. Each broker handles its share of data and requests by sharing partition leadership. The partitions of each topic that the consumers are subscribed to are assigned dynamically to the consumers in round-robin fashion.
Phrases coined by Martin Fowler: CQRS and Event Sourcing.
CQRS stands for Command Query Responsibility Segregation.
https://martinfowler.com/eaaDev/EventSourcing.html
Kafka Connect is an open source framework, built as another layer on core Apache Kafka, to support large scale streaming data.
SoC (Separation of Concerns)
Includes single message transforms (SMTs).
Can communicate with the Schema Registry.
Currently the API is primarily Java and Scala only.