Jeff Bean will lead a discussion of event-driven architectures, Apache Kafka, Kafka Connect, KSQL and Confluent Cloud. Then we'll talk about some uses of Confluent and Scylla together, including a co-deployment with Lookout, ScyllaDB and Confluent in the IoT space, and the upcoming native connector.
2. Presenters
Jeff Bean
Jeff Bean is a Partner Solution Architect at Confluent. He's responsible for helping to
build and verify integrations with the Confluent Platform.
3. Agenda
■ Kafka on-the-quick
■ The Latest with Confluent
● Schema Registry
● Kafka Connect
● Kubernetes Operator
● Confluent Cloud
● KSQL and Event Streaming DB
■ Confluent + ScyllaDB in 2020
11. Schema Registry: Make Data Backwards Compatible and Future-Proof
Deploy with reliability
■ Validate data compatibility and get warnings
■ Let developers focus on deploying apps
Scale with confidence
■ Store a versioned history of all schemas
■ Enable evolution of schemas while preserving backwards compatibility for existing consumers
[Diagram: App 1 → Serializer → Schema Registry → Kafka topic]
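For example, under Schema Registry's backward-compatibility checking, adding a field with a default value is a compatible change: consumers holding the old schema can still deserialize records written with the new one. The Avro schema below is a hypothetical illustration, where the `currency` field was added in a later version.

```json
{
  "type": "record",
  "name": "Purchase",
  "fields": [
    {"name": "user_id", "type": "string"},
    {"name": "amount", "type": "double"},
    {"name": "currency", "type": "string", "default": "USD"}
  ]
}
```

Had `currency` been added without a default, the registry would reject the new version and warn the developer before any incompatible data reached the topic.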
12. REST Proxy: Talk to Non-native Kafka Apps and Outside the Firewall
■ Provides a RESTful interface to a Kafka cluster
■ Simplifies message creation and consumption
■ Lets HTTP-connected devices communicate with Kafka
[Diagram: non-Java applications → REST/HTTP → REST Proxy → native Kafka Java applications, with Schema Registry alongside]
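As a sketch of producing through the REST Proxy v2 API (the host, topic name, and payload here are hypothetical):

```http
POST /topics/iot-readings HTTP/1.1
Host: restproxy.example.com:8082
Content-Type: application/vnd.kafka.json.v2+json

{"records": [{"key": "device-42", "value": {"temperature": 21.5}}]}
```

Any HTTP-capable client can publish this way; the proxy handles serialization and the native Kafka protocol on the device's behalf.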
13. MQTT Proxy: Streamline IoT Data Integration with Kafka
■ Connect IoT data sources while leveraging all of your infrastructure investments
■ Reduce operational cost and complexity by eliminating third-party MQTT brokers and their intermediate storage and lag
■ Ensure IoT data delivery at all QoS levels (QoS 0, QoS 1, and QoS 2) of the MQTT protocol
[Diagram: devices and gateways → MQTT Proxy → Kafka broker]
14. Confluent Platform Deployment Options
■ Confluent Cloud: no DevOps
■ On Kubernetes (new!): medium DevOps
■ RPMs/Debs/Tarballs: high DevOps
15. Confluent Operator: a Custom Kubernetes Controller
■ Nodes and pods are where applications run on Kubernetes
■ Applications use objects like StatefulSets, ConfigMaps, and PVs
■ Custom controllers create custom resources that provide unique application functionality: upgrades, elasticity, Kafka operational logic
[Diagram: master node (API server, scheduler, controllers and custom controllers) and worker node (pods, StatefulSets, ConfigMaps, PVs, custom resources)]
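The custom-resource pattern can be sketched with a manifest like the one below. This is illustrative only: the exact CRD group, kind, and fields depend on the operator version, so treat every name here as an assumption rather than the real Confluent Operator schema.

```yaml
# Hypothetical custom resource: the operator watches objects like this
# and reconciles StatefulSets, ConfigMaps, and PVs to match the spec.
apiVersion: platform.confluent.io/v1beta1
kind: Kafka
metadata:
  name: kafka
spec:
  replicas: 3                  # elasticity: scale brokers by editing this
  dataVolumeCapacity: 100Gi    # backed by a PersistentVolume per broker
```

Editing `spec.replicas` and re-applying the manifest is all an administrator does; the controller carries out the broker-aware operational logic.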
16. Confluent Operator Architecture and Deployment
[Diagram: a Kubernetes cluster in which the Operator manages ZooKeeper, KSQL, REST Proxy, Schema Registry, Replicator, and Control Center (C3) pods across K8s nodes; persistent volumes backed by AWS EBS, GlusterFS, or GCE Persistent Disk; configurations via ConfigMaps; external access via load balancers]
18. Automated Security Configuration
■ SASL PLAIN, SASL_SSL, TLS with mutual authentication
■ Automate configuration of truststores and keystores with secret objects
■ Automate configuration of Kafka and all Confluent Platform components
19. Scale Horizontally
■ Elastic scaling:
● Spin up new brokers and Connect workers easily
● All components are automatically configured when scaled
■ Distribute partitions to new brokers:
● Determine a balancing plan
● Execute the balancing plan
● Monitor resources
20. Rolling Upgrade of All Components
Automated rolling upgrades of all components: Kafka brokers, ZooKeeper, Connect, Control Center.
Kafka broker upgrades:
■ Stop the broker and upgrade Kafka
■ Wait for partition leader reassignment
■ Start the upgraded broker
■ Wait for zero under-replicated partitions
■ Upgrade the next broker
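The "zero under-replicated partitions" gate in the steps above can also be checked by hand with the stock Kafka CLI (the bootstrap address here is hypothetical); empty output means replication has caught up:

```
# Lists only partitions that are under-replicated; no output means it is
# safe to proceed to the next broker.
kafka-topics --bootstrap-server localhost:9092 \
  --describe --under-replicated-partitions
```

The Operator automates exactly this kind of check between broker restarts.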
21. Confluent Cloud | Full Feature Set
■ Scale: unlimited throughput, unlimited retention
■ Availability: 99.95% uptime SLA
■ Durability: multi-AZ with 3 availability zones (option)
■ Connection: VPC peering (option)
■ Support: 24x7 Gold support (option)
■ Terms: 1-year commitment; flexible payment options
■ Cloud: AWS and GCP
22. Confluent Cloud | High Performance
● Sub-25 ms latencies* at massive scale
● 3 days or less to get up and running
● Unlimited throughput and fanout
● Infinite retention
23. Confluent Cloud | High Availability
99.95% uptime guarantee
[Diagram: customer VPC peered with the Confluent Cloud VPC; a dedicated multi-AZ Kafka cluster with 3x replication across three availability zones, fronted by an ELB with a public IP]
24. Confluent Cloud: What Does Fully Managed Mean?
Most Kafka-as-a-Service offerings are only partially managed. Confluent Cloud is a Platform-as-a-Service: mission-critical reliability that harnesses the full power of Kafka, is future-proof, and evolves as you need.
■ Infrastructure management (commodity): Infra-as-a-Service
■ Kafka-specific management:
● Upgrades (latest stable version of Kafka)
● Patching
● Maintenance
■ Scaling and operations:
● Sizing (retention, latency, throughput, storage, etc.)
● Data balancing for optimal performance
● Performance tuning for real-time and latency requirements
● Fixing Kafka bugs
● Uptime monitoring and proactive remediation of issues
● Recovery support from data corruption
● Scaling the cluster as needed
● Data balancing the cluster as nodes are added
● Support for any Kafka issue with less than 60-minute response time
25. Apache Kafka™ Connect API: Streaming Data Capture
[Diagram: sources (JDBC, Tibco EMS, MySQL) → connectors → Kafka Connect API → Kafka pipeline → Kafka Connect API → connectors → sinks (Elastic, ScyllaDB, HDFS/Hive)]
■ Fault tolerant
■ Manage hundreds of data sources and sinks
■ Preserves data schema
■ Part of the Apache Kafka project
■ Integrated within Confluent Platform's Control Center
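Connectors are configured declaratively. A minimal JDBC source sketch, submitted as JSON to the Connect REST API, might look like the following (the connector name, database address, and column choices are hypothetical):

```json
{
  "name": "mysql-source",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:mysql://db.example.com:3306/inventory",
    "mode": "incrementing",
    "incrementing.column.name": "id",
    "topic.prefix": "mysql-"
  }
}
```

Connect then polls the table and streams each new row into a Kafka topic (here, one prefixed with `mysql-`), preserving the row's schema along the way.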
34. Build a Complete Streaming App with 4 SQL Statements
Capture data:
CREATE SOURCE CONNECTOR jdbcConnector WITH (
  'connector.class' = '...JdbcSourceConnector',
  'connection.url' = '...',
  ...);
Perform continuous transformations:
CREATE STREAM purchases AS
  SELECT viewtime, userid, pageid,
    TIMESTAMPTOSTRING(viewtime, 'yyyy-MM-dd HH:mm:ss.SSS')
  FROM pageviews;
Create materialized views:
CREATE TABLE orders_by_country AS
  SELECT country, COUNT(*) AS order_count, SUM(order_total) AS order_total
  FROM purchases
  LEFT JOIN user_profiles ON purchases.customer_id = user_profiles.customer_id
  WINDOW TUMBLING (SIZE 5 MINUTES)
  GROUP BY country
  EMIT CHANGES;
Serve lookups against materialized views:
SELECT * FROM orders_by_country WHERE country='usa' LIMIT 100;
35. Connectors as First-Class Components
CREATE SINK CONNECTOR scyllaDB WITH (
  'connector.class' = '...ScyllaDBConnector',
  'topics' = 'CREDIT_SCORES',
  'connection.url' = 'http://localhost:9200',
  'type.name' = 'kafka-connect',
  ...
);
[Diagram: app → KSQL → connector → app]
36. Kafka Tutorials Are Our New Best Way to Learn KSQL
Everyone gets that stream processing is cool, but no one knows how to apply it. Kafka Tutorials is a new microsite that evangelizes end-to-end use cases for Confluent Platform, with an emphasis on KSQL.