Building a Real-time Streaming
ETL Framework Using ksqlDB
and NoSQL
Hojjat Jafarpour, Software Engineer at Confluent
Maheedhar Gunturu, Solutions Architect at ScyllaDB
Presenters
Hojjat Jafarpour, Software Engineer at Confluent
Hojjat is a software engineer and the creator of KSQL, the Streaming SQL engine for Apache
Kafka, at Confluent. Before joining Confluent he worked at NEC Labs, Informatica, Quantcast
and Tidemark on various big data management projects. He has a Ph.D. in computer
science from UC Irvine, where he worked on scalable stream processing and
publish/subscribe systems.
Maheedhar Gunturu, Solutions Architect at ScyllaDB
Maheedhar has held senior roles in both engineering and sales organizations. He has over a decade
of experience designing & developing server-side applications in the cloud and working on big
data and ETL frameworks in companies such as Samsung, MapR, Apple, VoltDB, Zscaler and
Qualcomm.
Agenda
+ Overview of ScyllaDB
+ Apache Kafka and The Confluent Platform
+ Example Use Cases
+ Q&A
About ScyllaDB
+ The Real-Time Big Data Database
+ Drop-in replacement for Apache Cassandra
and Amazon DynamoDB
+ 10X the performance & low tail latency
+ Open Source, Enterprise and Cloud options
+ Founded by the creators of KVM hypervisor
+ HQs: Palo Alto, CA, USA; Herzliya, Israel;
Warsaw, Poland
Scylla Design Principles
+ C++ instead of Java
+ Shard per Core
+ All Things Async
+ Unified Cache
+ I/O Scheduler
+ Self-Optimizing
+ Built on the Seastar Framework
Compatibility
+ CQL native protocol
+ JMX management protocol
+ Management command line / REST API
+ SSTable file format
+ Configuration file format
+ CQL language
Change Data Capture (CDC) from Scylla
+ Helps with
+ Database mirroring/replication/state propagation
+ Directing data into a Kafka stream
+ Configurable subscription options to the change log (per table)
+ Post-image (changed state)
+ Delta (changes per column)
+ Pre-image (previous state)
+ Scylla CDC-Kafka source connector coming out soon!
Ref: https://www.scylladb.com/tech-talk/change-data-capture-in-scylla/
Apache Kafka and Confluent Platform
Old World: Pre-Streaming
New World: Streaming First
Apache Kafka
A distributed commit log: publish and subscribe to
streams of records. Highly scalable, high throughput.
Supports transactions. Persisted data.
+ Writes are append only
+ Reads are a single seek & scan
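The commit-log model above can be sketched in a few lines of Python. This is a conceptual toy, not the real broker: writes only ever append to the end of a partition, and a consumer reads by seeking to an offset and scanning forward.

```python
class PartitionLog:
    """Toy append-only log modeling one Kafka-like topic partition."""

    def __init__(self):
        self._records = []  # records are immutable once appended

    def append(self, record):
        """Writes are append only: new records go to the end of the log."""
        self._records.append(record)
        return len(self._records) - 1  # offset of the new record

    def read(self, offset, max_records=10):
        """Reads are a single seek (index) plus a sequential scan."""
        return self._records[offset:offset + max_records]


log = PartitionLog()
for event in ["login", "click", "purchase"]:
    log.append(event)

print(log.read(1))  # → ['click', 'purchase']
```

Consumers track their own offsets, which is why many independent readers can scan the same persisted log without coordinating with each other.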
Apache Kafka
Kafka Connect API
Reliable and scalable integration of Kafka with other systems –
no coding required.
Apache Kafka
Kafka Streams API
Write standard Java applications & microservices to
process your data in real-time
Stream Processing by Analogy
Kafka Cluster: Connect API → Stream Processing → Connect API
$ cat < in.txt | grep "ksql" | tr a-z A-Z > out.txt
Stream Processing in Kafka
From maximum flexibility to maximum simplicity:
+ Consumer, Producer: subscribe(), poll(), send(), flush()
+ Kafka Streams: map(), filter(), aggregate(), join()
+ ksqlDB: SELECT … FROM … JOIN … GROUP BY …
ksqlDB
+ The event streaming database purpose-built for stream
processing applications
+ Enables stream processing with zero coding required
+ The simplest way to process streams of data in real
time
+ Powered by Kafka: scalable, distributed, battle-tested
+ All you need is Kafka–no complex deployments of
bespoke systems for stream processing
ksqlDB
+ Streaming ETL
CREATE STREAM vip_actions AS
SELECT userid, page, action
FROM clickstream c
LEFT JOIN users u ON c.userid = u.user_id
WHERE u.level = 'Platinum';
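The streaming ETL join above can be pictured with a small Python sketch. The data and field names below are illustrative (the real work is done by ksqlDB): each clickstream event is enriched against a `users` lookup table and kept only when the user's level is Platinum.

```python
# Hypothetical lookup table, keyed by user_id (the ksqlDB "table" side).
users = {
    "u1": {"level": "Platinum"},
    "u2": {"level": "Gold"},
}

# The ksqlDB "stream" side: raw clickstream events.
clickstream = [
    {"userid": "u1", "page": "home", "action": "view"},
    {"userid": "u2", "page": "cart", "action": "add"},
    {"userid": "u1", "page": "cart", "action": "checkout"},
]

def vip_actions(events, table):
    """LEFT JOIN each event against the table, keep only Platinum users."""
    for event in events:
        user = table.get(event["userid"])  # join on userid = user_id
        if user and user["level"] == "Platinum":
            yield {"userid": event["userid"],
                   "page": event["page"],
                   "action": event["action"]}

print(list(vip_actions(clickstream, users)))  # only the two u1 events survive
```

Unlike this batch loop, the ksqlDB version runs continuously: every new clickstream record is joined and filtered as it arrives.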
ksqlDB
+ Real-Time Monitoring
CREATE TABLE error_counts AS
SELECT error_code, count(*)
FROM monitoring_stream
WINDOW TUMBLING (SIZE 1 MINUTE)
WHERE type = 'ERROR'
GROUP BY error_code;
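Tumbling windows like the one above carve the stream into fixed, non-overlapping one-minute buckets. A rough Python model of the per-window error count (timestamps and codes are made up for illustration):

```python
from collections import Counter

WINDOW_MS = 60_000  # SIZE 1 MINUTE

# (timestamp_ms, type, error_code): hypothetical monitoring events
events = [
    (5_000, "ERROR", 500),
    (30_000, "INFO", 200),
    (59_999, "ERROR", 500),
    (61_000, "ERROR", 503),  # falls in the second one-minute window
]

counts = Counter()
for ts, kind, code in events:
    if kind != "ERROR":
        continue  # WHERE type = 'ERROR'
    window_start = (ts // WINDOW_MS) * WINDOW_MS  # tumbling bucket start
    counts[(window_start, code)] += 1

print(counts)  # window 0 has two 500s, window 60000 has one 503
```

Because each event belongs to exactly one bucket, tumbling-window counts never double-count; hopping windows, by contrast, overlap and can assign one event to several windows.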
ksqlDB
+ Features
+ Aggregation
+ Window
+ Tumbling
+ Hopping
+ Session
+ Join
+ Stream-Stream
+ Stream-Table
+ Table-Table
+ Nested data
+ STRUCT
+ UDF/UDAF/UDTF
+ AVRO, JSON, CSV
+ Protobuf support coming soon
+ And many more...
Example Use Cases
Using Syslog to Detect SSH Attacks
(Diagram: syslog data flows into Kafka, is processed by KSQL, and is written out through a sink connector.)
ksql> CREATE SINK CONNECTOR SINK_SCYLLA_SYSLOG WITH (
'connector.class' = 'io.connect.scylladb.ScyllaDbSinkConnector',
'scylladb.contact.points' = 'localhost',
'scylladb.port' = '9042',
'errors.tolerance' = 'all',
'errors.log.enable' = 'true',
'errors.log.include.messages' = 'true',
'topics' = 'syslog',
'key.converter' = 'org.apache.kafka.connect.storage.StringConverter'
);
ksql> CREATE STREAM SYSLOG WITH (KAFKA_TOPIC='syslog', VALUE_FORMAT='AVRO');
ksql> SELECT TIMESTAMPTOSTRING(S.DATE, 'yyyy-MM-dd HH:mm:ss') AS SYSLOG_TS, S.HOST,
F.DESCRIPTION AS FACILITY, S.MESSAGE, S.REMOTEADDRESS FROM SYSLOG S
LEFT OUTER JOIN FACILITY F ON S.FACILITY=F.ROWKEY WHERE S.HOST='demo' EMIT CHANGES;
ksql> CREATE STREAM SYSLOG_INVALID_USERS AS SELECT * FROM SYSLOG WHERE MESSAGE LIKE
'Invalid user%';
ksql> CREATE STREAM SSH_ATTACKS AS SELECT TIMESTAMPTOSTRING(DATE, 'yyyy-MM-dd HH:mm:ss')
AS SYSLOG_TS, HOST, SPLIT(REPLACE(MESSAGE,'Invalid user ',''),' from ')[0] AS ATTACK_USER,
SPLIT(REPLACE(MESSAGE,'Invalid user ',''),' from ')[1] AS ATTACK_IP FROM
SYSLOG_INVALID_USERS EMIT CHANGES;
ksql> CREATE TABLE SSH_ATTACKS_BY_USER AS SELECT ATTACK_USER, COUNT(*) AS ATTEMPTS FROM
SSH_ATTACKS GROUP BY ATTACK_USER;
ksql> SELECT ATTACK_USER, ATTEMPTS FROM SSH_ATTACKS_BY_USER EMIT CHANGES; -- push query
ksql> SELECT ATTACK_USER, ATTEMPTS FROM SSH_ATTACKS_BY_USER WHERE ROWKEY='oracle'; -- pull query
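The SPLIT/REPLACE expressions in SSH_ATTACKS pull the attacked username and source IP out of the syslog message. The same transformation in Python, with a sample message invented for illustration:

```python
def parse_attack(message):
    """Mirror REPLACE(MESSAGE, 'Invalid user ', '') then SPLIT on ' from '."""
    stripped = message.replace("Invalid user ", "")
    # Index 0 of the split is the user, index 1 is the attacking IP.
    attack_user, attack_ip = stripped.split(" from ")
    return attack_user, attack_ip


msg = "Invalid user oracle from 203.0.113.7"
print(parse_attack(msg))  # → ('oracle', '203.0.113.7')
```

In the ksqlDB statement this parsing happens inline on every matching syslog record, so SSH_ATTACKS is a continuously updated stream of (user, IP) pairs.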
Source and sink connectors
ksql> CREATE SOURCE CONNECTOR SOURCE_SYSLOG_UDP_01 WITH (
'tasks.max' = '1',
'connector.class' = 'io.confluent.connect.syslog.SyslogSourceConnector',
'topic' = 'syslog',
'syslog.port' = '42514',
'syslog.listener' = 'UDP',
'syslog.reverse.dns.remote.ip' = 'true',
'confluent.license' = '',
'confluent.topic.bootstrap.servers' = 'kafka:29092',
'confluent.topic.replication.factor' = '1'
);
ksql> CREATE SINK CONNECTOR SINK_ELASTIC_SYSLOG WITH (
'connector.class' =
'io.confluent.connect.elasticsearch.ElasticsearchSinkConnector',
'connection.url' = 'http://elasticsearch:9200',
'type.name' = '',
'behavior.on.malformed.documents' = 'warn',
'errors.tolerance' = 'all',
'errors.log.enable' = 'true',
'errors.log.include.messages' = 'true',
'topics' = 'SYSLOG_INVALID_USERS',
'key.ignore' = 'true',
'schema.ignore' = 'true',
'key.converter' = 'org.apache.kafka.connect.storage.StringConverter'
);
IoT - Smart Home
(Diagram: hub and device state/health streams of real-time events flow through an MQTT proxy into Kafka; a mode service and CDC provide hub info and lookup info; device data management apps and services consume the results.)
ksql> CREATE STREAM device_stream_mode WITH
(KAFKA_TOPIC='syslog',VALUE_FORMAT ='AVRO');
ksql> CREATE STREAM device_change_mode AS SELECT D.dev_id,
D.dev_type, H.hub_mode AS device_mode FROM hub_mode H LEFT OUTER
JOIN device_data D ON H.hub_id=D.hub_id EMIT CHANGES;
ksql> CREATE STREAM device_stream_mode AS SELECT DS.dev_id ,
DS.dev_type, DS.mode, F.state AS dev_state FROM device_change_mode
DS LEFT OUTER JOIN FACILITY F ON DS.dev_type=F.dev_type WHERE
DS.mode=<DEVICE_MODE> EMIT CHANGES;
### CONFIGURE THE MQTT SINK
ksql> INSERT INTO hub_mode SELECT * FROM /mqttTopicA/+/sensors
[WITHCONVERTER=`myclass.AvroConverter`]
ksql> CREATE STREAM hub_mode WITH (KAFKA_TOPIC='hub_mode', VALUE_FORMAT='AVRO');
### Create the necessary sink and CDC Source connector to SCYLLA
Source and Sink
connector
Customer Satisfaction - CES Score
(Diagram: customer interactions and customer logs stream into Kafka with CDC; downstream analytics cover segmentation, churn, customer loyalty, and support, tracking the number of attempts per issue and SLA violations.)
ksql> CREATE STREAM cust_interactions (incident_Id VARCHAR,
timestamp BIGINT) WITH (VALUE_FORMAT='JSON', PARTITIONS=1,
KAFKA_TOPIC='cust_interaction');
ksql> CREATE TABLE cust_log_aggregate AS SELECT ROWKEY AS
customer_id, COUNT(*) AS touch_points FROM cust_interactions GROUP
BY customer_id;
ksql> CREATE TABLE cust_log_by_issue AS SELECT ROWKEY AS
incident_Id, customer_id, COUNT_DISTINCT(touch_points) AS
UNIQUE_TOUCH_POINTS FROM cust_interactions GROUP BY ROWKEY EMIT
CHANGES;
ksql> SELECT C.incident_id, (C.incident_id_first_touch_point_TS -
CL.incident_id_last_touchpoint_TS)/1000/60/60 AS current_SLA_hours
FROM customer_log C INNER JOIN call_log CL ON C.incident_id =
CL.incident_id WHERE current_SLA_hours > 24;
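The SLA expression above divides a millisecond timestamp difference down to hours. The same arithmetic in Python, with illustrative epoch-millisecond timestamps:

```python
def sla_hours(first_touch_ms, last_touch_ms):
    """Convert a millisecond delta to hours: /1000 (s), /60 (min), /60 (h)."""
    return (first_touch_ms - last_touch_ms) / 1000 / 60 / 60


# A 30-hour gap between touch points violates a 24-hour SLA.
first_ts = 108_000_000  # 30 hours, expressed in milliseconds
last_ts = 0
hours = sla_hours(first_ts, last_ts)
print(hours, hours > 24)  # → 30.0 True
```

The join in the ksqlDB query recomputes this delta whenever either side of the join produces a new record, so SLA breaches surface as they happen rather than in a nightly batch report.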
Customer 360
Security - Endpoint Security
Streams of real-time events: Syslog, DNS, netflow, Firewall
### Join the various streams of data using the Cassandra source connector
### Build and deploy the custom UDF
ksql> CREATE STREAM entity_risk_score AS SELECT
source_IP, mac_ID,
derived_risk_score(priority_errors, DNS_burstiness, reputation,
firewall_intrusion_attempts) AS risk_score
FROM endpoint_profile
WHERE derived_risk_score(priority_errors, DNS_burstiness,
reputation, firewall_intrusion_attempts) > <THRESHOLD>;
Ref: https://www.confluent.io/blog/build-udf-udaf-ksql-5-0/
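derived_risk_score is a custom UDF of the kind the referenced blog post walks through building. A hypothetical scoring function of the same shape, where the weights are entirely invented for illustration and would come from your own model:

```python
def derived_risk_score(priority_errors, dns_burstiness,
                       reputation, firewall_intrusion_attempts):
    """Hypothetical weighted risk score; weights here are placeholders."""
    return (2.0 * priority_errors
            + 1.5 * dns_burstiness
            + 1.0 * (100 - reputation)       # low reputation raises risk
            + 3.0 * firewall_intrusion_attempts)


score = derived_risk_score(4, 2, 80, 1)
print(score)  # 2*4 + 1.5*2 + 1*20 + 3*1 = 34.0
```

In ksqlDB the equivalent logic would be packaged as a Java UDF, deployed to the server, and then called directly from SQL as shown in the query above.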
Takeaways
+ ScyllaDB now supports Change Data Capture (CDC)
+ ksqlDB provides a SQL interface for Streaming Applications
+ ksqlDB is easily extensible with Custom UDFs
+ Scylla has a new sink connector (CDC source connector is coming soon!)
Resources
+ Scylla Sink Connector
+ Source Connector (Cassandra)
+ Scylla CDC Presentation
+ Debezium
+ Scylla’s 7 Design Principles
+ Scylla Benchmarks
+ Useful Links:
+ Stream Processing Book Bundle
+ Kafka tutorials
+ ksqlDB
+ Confluent - Scylla Partnership Overview
+ Kafka Summits 2020
+ Kafka Summit London: April 27 - 28
+ Kafka Summit Austin: August 24 - 25
Confluent Resources
Q&A
maheedhar@scylladb.com
@vanguard_space
hojjat@confluent.io
@Hojjat
Stay in touch
United States
545 Faber Place
Palo Alto, CA 94303
Israel
11 Galgalei Haplada
Herzliya, Israel
www.scylladb.com
@scylladb
Thank you
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Klinik kandungan
 

Kürzlich hochgeladen (20)

7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
Kings of Saudi Arabia, information about them
Kings of Saudi Arabia, information about themKings of Saudi Arabia, information about them
Kings of Saudi Arabia, information about them
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
 
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
 
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
 
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
 
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptxRESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - Almora
 
Statistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numbersStatistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numbers
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
 

Event streaming webinar feb 2020

  • 1. Building a Real-time Streaming ETL Framework Using ksqlDB and NoSQL Hojjat Jafarpour, Software Engineer at Confluent Maheedhar Gunturu, Solutions Architect at ScyllaDB
  • 2. Presenters Hojjat Jafarpour, Software Engineer at Confluent Hojjat is a software engineer and the creator of KSQL, the Streaming SQL engine for Apache Kafka, at Confluent. Before joining Confluent he worked at NEC Labs, Informatica, Quantcast and Tidemark on various big data management projects. He has a Ph.D. in computer science from UC Irvine, where he worked on scalable stream processing and publish/subscribe systems. Maheedhar Gunturu, Solutions Architect at ScyllaDB Maheedhar held senior roles both in engineering and sales organizations. He has over a decade of experience designing & developing server-side applications in the cloud and working on big data and ETL frameworks in companies such as Samsung, MapR, Apple, VoltDB, Zscaler and Qualcomm. 2
  • 3. Agenda + Overview of ScyllaDB + Apache Kafka and The Confluent Platform + Example Use Cases + QA 3
  • 5. 5 + The Real-Time Big Data Database + Drop-in replacement for Apache Cassandra and Amazon DynamoDB + 10X the performance & low tail latency + Open Source, Enterprise and Cloud options + Founded by the creators of KVM hypervisor + HQs: Palo Alto, CA, USA; Herzelia, Israel; Warsaw, Poland About ScyllaDB
  • 6. Scylla Design Principles C++ instead of Java Shard per Core All Things Async Unified Cache I/O Scheduler Self-Optimizing
  • 8. Compatibility + CQL native protocol + JMX management protocol + Management command line/REST + SSTable file format + Configuration file format + CQL language
  • 9. + Helps with + Database mirroring/replication/state propagation + Direct data into a Kafka stream + Configurable subscription options to the change log (per table) + Post-image (Changed state) + Delta (changes per column) + Pre-image (Previous state) + Scylla CDC-Kafka source connector coming out soon! Ref: https://www.scylladb.com/tech-talk/change-data-capture-in-scylla/ Change Data Capture (CDC) from Scylla
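The three subscription options above (post-image, delta, pre-image) can be illustrated with a small Python sketch. This is not Scylla's actual CDC API; the row and column names are invented for illustration:

```python
def cdc_entries(pre_image, post_image):
    """Given the row state before and after a write, derive the three
    CDC log variants a consumer can subscribe to (per table)."""
    # Delta: only the columns whose values changed in this write.
    delta = {col: val for col, val in post_image.items()
             if pre_image.get(col) != val}
    return {
        "pre_image": pre_image,    # previous state
        "post_image": post_image,  # changed state
        "delta": delta,            # changes per column
    }

before = {"id": 1, "mode": "home", "temp": 21}
after = {"id": 1, "mode": "away", "temp": 21}
entry = cdc_entries(before, after)
print(entry["delta"])  # {'mode': 'away'}
```

Streaming only the delta is the cheapest option on the wire; the pre-image is what lets a downstream consumer reconstruct or audit the prior state.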
  • 10. Apache Kafka and Confluent Platform
  • 13. Apache Kafka Kafka Cluster A Distributed Commit Log. Publish and subscribe to streams of records. Highly scalable, high throughput. Supports transactions. Persisted data. Reads are a single seek & scan. Writes are append-only.
  • 14. Apache Kafka Kafka Connect API Reliable and scalable integration of Kafka with other systems – no coding required.
  • 15. Apache Kafka Kafka Streams API Write standard Java applications & microservices to process your data in real-time Orders Table Customers Kafka Streams API
  • 16. Stream Processing by Analogy Kafka Cluster Connect API Stream Processing Connect API $ cat < in.txt | grep “ksql” | tr a-z A-Z > out.txt
  • 17. Stream Processing in Kafka Simplicity Flexibility Consumer, Producer subscribe(), poll(), send(), flush()
  • 18. Stream Processing in Kafka Simplicity Flexibility Consumer, Producer subscribe(), poll(), send(), flush() Kafka Streams map(), filter(), aggregate(), join()
  • 19. Stream Processing in Kafka Simplicity Flexibility Consumer, Producer subscribe(), poll(), send(), flush() Kafka Streams map(), filter(), aggregate(), join() ksqlDB SELECT … FROM … JOIN .. GROUP BY ...
  • 20. ksqlDB + The event streaming database purpose-built for stream processing applications + Enables stream processing with zero coding required + The simplest way to process streams of data in real time + Powered by Kafka: scalable, distributed, battle-tested + All you need is Kafka–no complex deployments of bespoke systems for stream processing
  • 21. ksqlDB + Streaming ETL CREATE STREAM vip_actions AS SELECT userid, page, action FROM clickstream c LEFT JOIN users u ON c.userid = u.user_id WHERE u.level = 'Platinum';
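The semantics of the stream-table LEFT JOIN in the vip_actions example can be mimicked in plain Python — a hypothetical sketch of the logic, not ksqlDB internals; the sample data is invented:

```python
# A table is the latest value per key; a stream is a sequence of events.
users = {
    "u1": {"user_id": "u1", "level": "Platinum"},
    "u2": {"user_id": "u2", "level": "Silver"},
}

clickstream = [
    {"userid": "u1", "page": "/checkout", "action": "click"},
    {"userid": "u2", "page": "/home", "action": "view"},
    {"userid": "u3", "page": "/promo", "action": "click"},  # no user row
]

def vip_actions(events, table):
    """LEFT JOIN each stream event against the table, then filter,
    mirroring: ... LEFT JOIN users ... WHERE u.level = 'Platinum'."""
    for e in events:
        u = table.get(e["userid"])  # unmatched rows join against None
        if u is not None and u["level"] == "Platinum":
            yield {"userid": e["userid"], "page": e["page"], "action": e["action"]}

print(list(vip_actions(clickstream, users)))
# [{'userid': 'u1', 'page': '/checkout', 'action': 'click'}]
```

The key point is that the table side is a keyed lookup of the latest state, while the stream side is replayed event by event.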
  • 22. ksqlDB + Real-Time Monitoring CREATE TABLE error_counts AS SELECT error_code, count(*) FROM monitoring_stream WINDOW TUMBLING (SIZE 1 MINUTE) WHERE type = 'ERROR' GROUP BY error_code;
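Tumbling-window aggregation, as used in the error_counts example, groups events into fixed, non-overlapping time buckets. A minimal Python sketch of that semantics (illustrative only; event fields and timestamps are made up):

```python
from collections import Counter

def error_counts(events, window_ms=60_000):
    """Count errors per (window, error_code), where each event falls into
    exactly one fixed 1-minute bucket -- tumbling-window semantics."""
    counts = Counter()
    for e in events:
        if e["type"] != "ERROR":      # WHERE type = 'ERROR'
            continue
        window_start = (e["ts"] // window_ms) * window_ms
        counts[(window_start, e["error_code"])] += 1
    return counts

events = [
    {"ts": 1_000, "type": "ERROR", "error_code": 500},
    {"ts": 2_000, "type": "ERROR", "error_code": 500},
    {"ts": 61_000, "type": "ERROR", "error_code": 500},  # next window
    {"ts": 3_000, "type": "INFO", "error_code": 0},      # filtered out
]
print(error_counts(events))
```

Hopping windows differ only in that a single event can land in several overlapping buckets; session windows are bounded by gaps of inactivity instead of fixed sizes.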
  • 23. ksqlDB + Features + Aggregation + Window + Tumbling + Hopping + Session + Join + Stream-Stream + Stream-Table + Table-Table + Nested data + STRUCT + UDF/UDAF/UDTF + AVRO, JSON, CSV + Protobuf to come soon + And many more...
  • 25. Using Syslog to Detect SSH Attacks (pipeline: syslog data → Kafka → KSQL → sink connectors)
    Sink connector (Scylla):
    ksql> CREATE SINK CONNECTOR SINK_SCYLLA_SYSLOG WITH (
      'connector.class' = 'io.connect.scylladb.ScyllaDbSinkConnector',
      'connection.url' = 'localhost:9092',
      'type.name' = '',
      'behavior.on.malformed.documents' = 'warn',
      'errors.tolerance' = 'all',
      'errors.log.enable' = 'true',
      'errors.log.include.messages' = 'true',
      'topics' = 'syslog',
      'key.ignore' = 'true',
      'schema.ignore' = 'true',
      'key.converter' = 'org.apache.kafka.connect.storage.StringConverter' );
    Queries:
    ksql> CREATE STREAM SYSLOG WITH (KAFKA_TOPIC='syslog', VALUE_FORMAT='AVRO');
    ksql> SELECT TIMESTAMPTOSTRING(S.DATE, 'yyyy-MM-dd HH:mm:ss') AS SYSLOG_TS, S.HOST, F.DESCRIPTION AS FACILITY, S.MESSAGE, S.REMOTEADDRESS FROM SYSLOG S LEFT OUTER JOIN FACILITY F ON S.FACILITY=F.ROWKEY WHERE S.HOST='demo' EMIT CHANGES;
    ksql> CREATE STREAM SYSLOG_INVALID_USERS AS SELECT * FROM SYSLOG WHERE MESSAGE LIKE 'Invalid user%';
    ksql> CREATE STREAM SSH_ATTACKS AS SELECT TIMESTAMPTOSTRING(DATE, 'yyyy-MM-dd HH:mm:ss') AS SYSLOG_TS, HOST, SPLIT(REPLACE(MESSAGE,'Invalid user ',''),' from ')[0] AS ATTACK_USER, SPLIT(REPLACE(MESSAGE,'Invalid user ',''),' from ')[1] AS ATTACK_IP FROM SYSLOG_INVALID_USERS EMIT CHANGES;
    ksql> CREATE TABLE SSH_ATTACKS_BY_USER AS SELECT ATTACK_USER, COUNT(*) AS ATTEMPTS FROM SSH_ATTACKS GROUP BY ATTACK_USER;
    ksql> SELECT ATTACK_USER, ATTEMPTS FROM SSH_ATTACKS_BY_USER EMIT CHANGES; (push)
    ksql> SELECT ATTACK_USER, ATTEMPTS FROM SSH_ATTACKS_BY_USER WHERE ROWKEY='oracle'; (pull)
    Source connector (syslog):
    ksql> CREATE SOURCE CONNECTOR SOURCE_SYSLOG_UDP_01 WITH (
      'tasks.max' = '1',
      'connector.class' = 'io.confluent.connect.syslog.SyslogSourceConnector',
      'topic' = 'syslog',
      'syslog.port' = '42514',
      'syslog.listener' = 'UDP',
      'syslog.reverse.dns.remote.ip' = 'true',
      'confluent.license' = '',
      'confluent.topic.bootstrap.servers' = 'kafka:29092',
      'confluent.topic.replication.factor' = '1' );
    Sink connector (Elasticsearch):
    ksql> CREATE SINK CONNECTOR SINK_ELASTIC_SYSLOG WITH (
      'connector.class' = 'io.confluent.connect.elasticsearch.ElasticsearchSinkConnector',
      'connection.url' = 'http://elasticsearch:9200',
      'type.name' = '',
      'behavior.on.malformed.documents' = 'warn',
      'errors.tolerance' = 'all',
      'errors.log.enable' = 'true',
      'errors.log.include.messages' = 'true',
      'topics' = 'SYSLOG_INVALID_USERS',
      'key.ignore' = 'true',
      'schema.ignore' = 'true',
      'key.converter' = 'org.apache.kafka.connect.storage.StringConverter' );
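The core of the SSH_ATTACKS extraction is just prefix-stripping and splitting on ' from ', followed by a per-user count. A small Python approximation of that logic (illustrative only, not connected to Kafka; the sample log lines are invented):

```python
from collections import Counter

def extract_attack(message):
    """Mirror the slide's REPLACE + SPLIT logic: strip the
    'Invalid user ' prefix, then split on ' from '."""
    if not message.startswith("Invalid user "):
        return None  # same filtering as MESSAGE LIKE 'Invalid user%'
    user, _, ip = message.replace("Invalid user ", "", 1).partition(" from ")
    return {"attack_user": user, "attack_ip": ip}

attempts = Counter()  # plays the role of SSH_ATTACKS_BY_USER
for msg in [
    "Invalid user oracle from 203.0.113.5",
    "Invalid user oracle from 198.51.100.7",
    "Accepted password for admin",
]:
    rec = extract_attack(msg)
    if rec:
        attempts[rec["attack_user"]] += 1

print(attempts["oracle"])  # 2
```

A push query would emit each updated count as it changes; a pull query is the equivalent of the final dictionary lookup at the end.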
  • 26. IoT - Smart Home (diagram: hub, mode service, device state/health, streams of real-time events, CDC, hub info, lookup info, device data mgmt, apps and services, MQTT proxy)
    ksql> CREATE STREAM device_stream_mode WITH (KAFKA_TOPIC='syslog', VALUE_FORMAT='AVRO');
    ksql> CREATE STREAM device_change_mode AS SELECT D.dev_id, D.dev_type, H.hub_mode AS device_mode FROM hub_mode H LEFT OUTER JOIN device_data D ON H.hub_id=D.hub_id EMIT CHANGES;
    ksql> CREATE STREAM device_stream_mode AS SELECT DS.dev_id, DS.dev_type, DS.mode, F.state AS dev_state FROM device_change_mode DS LEFT OUTER JOIN FACILITY F ON DS.dev_type=F.dev_type WHERE DS.mode=<DEVICE_MODE> EMIT CHANGES;
    ### CONFIGURE THE MQTT SINK
    ksql> INSERT INTO hub_mode SELECT * FROM /mqttTopicA/+/sensors [WITHCONVERTER=`myclass.AvroConverter`]
    ksql> CREATE STREAM hub_mode WITH (KAFKA_TOPIC='hub_mode', VALUE_FORMAT='AVRO');
    ### Create the necessary sink and CDC source connector to Scylla
  • 27. Customer Satisfaction - CES Score (diagram: CDC, segmentation, churn, customer loyalty, support, customer service, attempts per issue, SLA violations, Customer 360)
    ksql> CREATE STREAM cust_interactions (incident_Id VARCHAR, timestamp) WITH (VALUE_FORMAT='JSON', PARTITIONS=1, KAFKA_TOPIC=cust_interaction);
    ksql> CREATE TABLE cust_log_aggregate AS SELECT ROWKEY AS customer_id, COUNT(*) AS touch_points FROM cust_interactions GROUP BY customer_id;
    ksql> CREATE TABLE cust_log_by_issue AS SELECT ROWKEY AS incident_Id, customer_id, COUNT_DISTINCT(touch_points) AS UNIQUE_TOUCH_POINTS FROM cust_interactions GROUP BY ROWKEY EMIT CHANGES;
    ksql> SELECT C.incident_id, (C.incident_id_first_touch_point_TS - CL.incident_id_last_touchpoint_TS)/1000/60/60 AS current_SLA_hours FROM customer_log C INNER JOIN call_log CL ON C.incident_id = CL.incident_id WHERE current_SLA_hours > 24;
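The SLA-hours calculation above is a millisecond-to-hours conversion followed by a threshold filter. A Python sketch of the same arithmetic — the field names are hypothetical, and the sketch computes elapsed time as last minus first touch so the result is positive:

```python
def sla_hours(first_touch_ms, last_touch_ms):
    # Same unit conversion as the slide's query: ms -> hours.
    return (last_touch_ms - first_touch_ms) / 1000 / 60 / 60

incidents = [
    {"incident_id": "INC-1", "first_ms": 0, "last_ms": 90_000_000},  # 25 h open
    {"incident_id": "INC-2", "first_ms": 0, "last_ms": 3_600_000},   # 1 h open
]

# WHERE current_SLA_hours > 24 -- flag incidents violating the SLA.
violations = [i["incident_id"] for i in incidents
              if sla_hours(i["first_ms"], i["last_ms"]) > 24]
print(violations)  # ['INC-1']
```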
  • 28. Security - Endpoint Security (streams of real-time events: syslog, netflow, DNS, firewall)
    # JOIN the various streams of data using the source connector from Cassandra.
    # BUILD AND DEPLOY THE CUSTOM UDF
    ksql> CREATE STREAM entity_risk_score AS SELECT source_IP, mac_ID, derived_risk_score(priority_errors, DNS_burstiness, reputation, firewall_intrusion_attempts) AS risk_score FROM endpoint_profile WHERE derived_risk_score(priority_errors, DNS_burstiness, reputation, firewall_intrusion_attempts) > <THRESHOLD>;
    Ref: https://www.confluent.io/blog/build-udf-udaf-ksql-5-0/
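The slide leaves derived_risk_score as an opaque custom UDF. As a toy stand-in, here is what such a scoring function might look like in Python; the weighted-sum formula, weights, and threshold are invented for illustration and are not the speakers' actual model:

```python
def derived_risk_score(priority_errors, dns_burstiness, reputation,
                       firewall_intrusion_attempts):
    """Toy weighted-sum risk model; the weights are made up."""
    return (2.0 * priority_errors
            + 1.5 * dns_burstiness
            + 1.0 * (100 - reputation)  # lower reputation -> higher risk
            + 3.0 * firewall_intrusion_attempts)

endpoints = [
    {"source_ip": "10.0.0.4", "pe": 5, "db": 2, "rep": 40, "fw": 3},
    {"source_ip": "10.0.0.9", "pe": 0, "db": 0, "rep": 95, "fw": 0},
]
THRESHOLD = 50

# Mirror the WHERE ... > <THRESHOLD> filter over the joined streams.
risky = [(e["source_ip"], derived_risk_score(e["pe"], e["db"], e["rep"], e["fw"]))
         for e in endpoints
         if derived_risk_score(e["pe"], e["db"], e["rep"], e["fw"]) > THRESHOLD]
print(risky)  # [('10.0.0.4', 82.0)]
```

In KSQL the equivalent function would be registered as a Java UDF (per the referenced blog post) and then called inline from the query exactly like a built-in.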
  • 30. Takeaways + ScyllaDB now Supports Change Data Capture (CDC) + ksqlDB provides a SQL interface for Streaming Applications + ksqlDB is easily extensible with Custom UDFs + Scylla has a new SINK connector (CDC source connector is coming soon!)
  • 31. Resources + Scylla Sink Connector + Source Connector (Cassandra) + Scylla CDC Presentation + Debezium + Scylla’s 7 Design Principles + Scylla Benchmarks
  • 32. + Useful Links: + Stream Processing Book Bundle + Kafka tutorials + ksqlDB + Confluent - Scylla Partnership Overview + Kafka Summits 2020 + Kafka Summit London: April 27 - 28 + Kafka Summit Austin: August 24 - 25 Confluent Resources
  • 34. United States 545 Faber Place Palo Alto, CA 94303 Israel 11 Galgalei Haplada Herzelia, Israel www.scylladb.com @scylladb Thank you

Editor's notes

  1. So here is a brief agenda for today's talk. First, we will begin with a high-level overview of ScyllaDB and the open source framework, Seastar, on which Scylla is built. We will also briefly touch on some new features and how customers can take advantage of them. Next, we will dive into Apache Kafka and the various Confluent Platform components like Kafka Connect, ksqlDB, etc. We will then explore some joint use cases based on customers and prospects who use Kafka and various NoSQL solutions across domains. Towards the end we will take some questions. Please post your questions into the Q&A window as you think of them, so we can get them prioritized and answered appropriately.
  2. Apache Kafka and the Confluent Platform
  3. A little bit about ScyllaDB and our product Scylla, the real-time big data database. Scylla is a highly performant, low-latency, scalable and autonomous NoSQL database which supports both the Apache Cassandra and DynamoDB APIs. So, the same client drivers and application code written on top of Cassandra or DynamoDB are compatible with Scylla without any code changes. Scylla has three offerings: Scylla Open Source, Scylla Enterprise and Scylla Cloud, which is a fully managed service. We also support the option of running Scylla Cloud in the customer's cloud account. We were founded by the creators of the KVM hypervisor, which, as some of you might know, is an open source virtualization technology built into Linux. ScyllaDB, the company, has about 100 employees located in more than 15 countries across the world. We are expanding and currently hiring C++ and Go developers and DevOps engineers.
  4. Now let's spend some time understanding the basic design principles that differentiate Scylla from other NoSQL databases. Scylla is written in C++, which results in faster performance, lower latencies and efficient use of the hardware. Scylla has a shard-per-core isolated architecture where each Scylla shard is associated with a CPU core; the necessary RAM is allocated per shard as well. This ensures that we are never starved of resources. Everything is async on the platform, so there are no locks. Scylla also runs on XFS, so we get the performance benefits of an asynchronous file system as well. Scylla comes with a cache that is highly optimized and auto-tuned to your workload, so there is no need to worry about key caching, row caching, Linux caching, on-heap or off-heap: everything is auto-tuned. Scylla is built on top of an open source framework called Seastar, which orchestrates both task and disk I/O scheduling; we will go into a bit more detail on the next slide. Scylla is an autonomous database, which means your administrators benefit from minimal tuning.
  5. As I mentioned before, Scylla is built on an open source framework called Seastar. The framework understands that there are essentially five different kinds of back-end operations: commitlogs, memtables, compactions, queries and repairs. These individual operations are scheduled through the Seastar scheduler. So effectively, the Seastar scheduler moves scheduling out of the kernel space and into the user space, where every request is parallelized and given its own priority. Also, when you install Scylla, it tunes itself according to the disk, RAM, CPU and network made available to the server. This in effect makes the database autonomous.
  6. ScyllaDB’s users --- benefit from Scylla’s full compatibility with Cassandra and the Cassandra ecosystem. All the ecosystem components listed here and many more---- just work out of the box with Scylla. In addition, there are many client drivers available in a variety of programming languages that can be reused as well. Scylla has created optimized drivers for developers using Java and Go. These optimized drivers can better take advantage of the internals of Scylla Enterprise and Scylla Cloud.
  7. We recently released a new feature called CDC, or Change Data Capture. Change Data Capture logging captures and keeps track of data that has changed. It can also help with mirroring your database, replication, and state propagation across various microservices. It is configured per table, with limits on the amount of disk space consumed by the CDC logs. We co-locate CDC log partitions with the base table partitions, as this greatly improves performance. There are a variety of configurable subscription options that let you shape the stream of the change log: you can choose to have the post-image (i.e. the changed state), just the delta (i.e. the changes made to the columns), or you can also stream the pre-image, that is, the previous state. Feel free to refer to the link below for more detailed information. We are planning on releasing a Scylla CDC-Kafka source connector, which overall makes it easier for developers to build globally synchronized services. This also significantly reduces the amount of data to be moved and provides a reliable way to stream data between various systems.
  8. Now let's dive into kafka and the confluent platform. Hojjat onto you --
  9. Kafka is highly available, resilient to node failures, and supports automatic recovery. This makes Apache Kafka ideal for communication and integration between components of large-scale, real-world data systems.
  14. Based on a number of customers and prospects who use Kafka and various NoSQL systems, I would like to walk you through a few use cases that are relevant for ksqlDB and CDC. Some of the examples we are going to talk about are how to detect SSH attacks, IoT smart home, Customer 360, and network and endpoint security.
  15. Here is a simple use case where we have access to certain logs, say syslogs, and we need to detect whether there were any brute-force SSH attacks. You want to see the results in Scylla and point your monitoring at it, and also in Elasticsearch for some ad hoc analysis on top of this data. <CLICK> Syslog is built into Linux, and it's also common for networking and IoT devices to stream log messages, along with metadata such as the source host, severity, message payload, tags, etc., to either a local log file or a centralized syslog server; in this case let's stream it into Kafka using the Connect framework. <CLICK> It can be done with a simple one-liner: ksqlDB now includes the ability to define connectors from within it, which makes setting things up a lot easier. Now let's set up the Elasticsearch sink, which you can do with another one-liner, and then the Scylla sink <CLICK> (currently under development; you should have it available soon). Now that we have established all the necessary connections, let's write a few simple KSQL queries. To begin, let's create a stream SYSLOG which reads from the topic syslog; you can then browse through the data, which already has the necessary schema. Now let's join it with the FACILITY table, which contains the information about the various message levels, say 0 for kernel messages, 1 for user-level messages, and so on. After you have joined the data, let's filter the messages that contain invalid SSH attempts. As you can see in the second query, you can filter out the messages which contain "Invalid user", as this indicates a failed SSH attempt. Then let's create a stream of such messages; you can do a bit of reorganization of the data using the SPLIT and REPLACE primitives.
Now, if you want to persist the query, you can simply create a table based off the stream; this is what Hojjat explained before about the stream-table duality. There are two ways to interact with this data: either via a push query or a pull query.
  16. Smart home and IoT ecosystems are becoming very popular; for example, even in my home I have about 15-20 connected devices active at any point in time. There are typically three parts to this connected ecosystem: the smart hub to which all the devices are connected; the mobile app which can control all the devices; and the cloud, which keeps track of the state of the hub, the various devices connected to it directly or via the partner network, the various services enabled, etc. The mobile app typically connects to and controls all the devices, but the orchestration of the various automations is initiated via the cloud. Let's go through an example here. <CLICK> If you were to put your smart home into "away" mode, then all your lights need to be turned off, doors need to be locked, the air-conditioning needs to go into "eco" mode, cameras need to detect motion, and so on. The state of all the devices connected to the hub is typically monitored in the cloud. The devices attached to the hub are automatically moved to the corresponding state, and all of this is communicated from the cloud to the hub. This communication can happen via multiple protocols: CoAP, MQTT or TCP. For this example, let's assume it's communicated via MQTT. Here is some sample code which can orchestrate all of this via KSQL. <CLICK> You could use the MQTT proxy, which is available as part of Confluent Enterprise, and the data, once it comes into Kafka, can be automatically transformed into an Avro schema. You should set up the necessary sink and CDC connectors to Scylla or Scylla Cloud. So, the data regarding the change of mode of the hub gets streamed into Scylla. This change is detected by CDC on the Scylla side, which then propagates the state to a few custom topics.
KSQL picks up this change and runs the stream-table join operations listed here; these join operations combine the data of all the devices connected to the hub with a lookup table containing the final state those devices need to be in. This state can then simply be communicated back via the MQTT proxy.
  17. Let's dive into another use case. Let's say you are pushing out a new release of your product, and you would like to track various customer issues and also assess your support. There are a number of ways to do this, but calculating the CES, or customer engagement score, is one easy way to find out. Eventually you want all of this information to be populated into your Customer 360 solution. You could start off by logging all your customer interaction data into Kafka from your application, or you could persist it into a MySQL instance directly; if you enable CDC on it, the necessary changes from the customer log tables are streamed out into Kafka. Now, by simply running KSQL queries on this data, you can find out the number of issues organized per customer and list them by the number of touch points per incident ID. You can also set a quality-of-service level where you track issues that have received repeated calls from a customer on the same incident over the last 24 hours. All of this can be achieved with these three simple queries.
  18. So, in this use case, let's assume we are collecting syslog, netflow, DNS and firewall logs from all the devices connected to your internal corporate network. These could be devices like mobile phones, laptops, routers, servers and switches. Using these logs, we need to identify the higher-risk endpoints that could possibly be compromised. This is typically a very high-throughput use case, and this type of risk analysis can either be done in real time or as post-mortem analysis for incident response. After identifying the bad actors within the network, the sysadmin would remove the compromised endpoints from the network. Implementing this usually involves a lot of complexity and IT infrastructure. Let's explore how we can simplify it. The data from all the various endpoints is streamed into Kafka and, using the sink connector, pushed into Scylla. Now, using a KSQL stream-stream join, you can combine these various streams to derive a risk score. <CLICK> KSQL provides a rich set of SQL-like primitives, but sometimes you might need a custom ML algorithm, often very specific to your solution, and you may want to make it available to business analysts or data scientists who want to do ad hoc analysis on this data. In this case, we derive the risk score by combining the data from different streams; as you can see, derived_risk_score is a UDF I have defined which takes the data from multiple sources and gives out a risk score, and you can simply use this UDF as part of your regular KSQL. Once we calculate the risk scores, we can order them in descending order and simply pick the endpoints that cross a given threshold. This triggers a monitoring alert for the sysadmin, who would then look into remediating the risk.
  19. So, to summarize, here are some takeaways from today's webinar.
  20. Here are some Scylla resources that you can use to further your understanding. These slides will be sent out as a follow-up to the webinar; feel free to go over them.