Build a Bridge to Cloud with Apache Kafka® for Data Analytics Cloud Services
Build a Bridge to Cloud with Apache Kafka® for Data Analytics Cloud Services, Perry Krol, Head of Systems Engineering, CEMEA, Confluent
https://www.meetup.com/Frankfurt-Apache-Kafka-Meetup-by-Confluent/events/269751169/
1
Build a Bridge to Cloud with Apache Kafka® for Data Analytics Cloud Services
Perry Krol, Head of Systems Engineering CEMEA
2
Confluent Community Slack Channel
Over 10,000 Kafkateers are collaborating every single day on the Confluent Community Slack channel!
cnfl.io/community-slack
Subscribe to the Confluent blog
Get frequent updates from key names in Apache Kafka® on best practices, product updates & more!
cnfl.io/read
Welcome to the HUG meets Kafka UG Frankfurt Meetup!
Zoom opens at 18:15
18:20 - 18:30 Virtual Cheers and Networking
18:30 - 19:15 Setup Kafka with Terraform, Dominik Fries & Qais Babie
19:15 - 20:00 Build a Bridge to Cloud with Apache Kafka® for Data Analytics Cloud Services, Perry Krol
7-8
Replicas
[Diagram: Topic A partitions 0-3 spread across Brokers 1-4; each partition is replicated onto several brokers, with one replica acting as leader and the others as followers.]
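A minimal sketch of how such a layout comes about, using Kafka's Java AdminClient (the broker address and topic name are illustrative): requesting four partitions with replication factor three lets Kafka elect a leader for each partition and spread the follower replicas across the remaining brokers.

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateReplicatedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            // Four partitions, each replicated onto three brokers; Kafka
            // elects one leader per partition and the rest are followers.
            NewTopic topicA = new NewTopic("topic-a", 4, (short) 3);
            admin.createTopics(Collections.singleton(topicA)).all().get();
        }
    }
}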
10
Record Keys & Ordering
Record keys determine the target partition under the default Kafka partitioner, so all records with the same key preserve their relative order within one partition.
Keys are used in the default partitioning algorithm:
partition = hash(key) % numPartitions
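A minimal sketch in Java (topic name and key are illustrative): because the default partitioner hashes the record key, every record with key "user-42" lands on the same partition and is therefore read back in production order.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class KeyedProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Same key => same hash => same partition => ordering preserved.
            producer.send(new ProducerRecord<>("clickstream", "user-42", "page_view"));
            producer.send(new ProducerRecord<>("clickstream", "user-42", "add_to_cart"));
        }
    }
}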
18
Confluent Hub
Online library of pre-packaged and ready-to-install extensions or add-ons for Confluent Platform and Apache Kafka®:
● Connectors
● Transforms
● Converters
Easily install the components that suit your needs into your local environment with the Confluent Hub client command line tool.
https://hub.confluent.io
25
CREATE STREAM vip_actions AS
SELECT userid,
page,
action
FROM clickstream c
LEFT JOIN users u
ON c.userid = u.user_id
WHERE u.level = 'Platinum'
EMIT CHANGES;
Simple SQL syntax for expressing processing logic along and across data streams.
You can write user-defined functions in Java (see the sketch after this slide).
Stream Processing with KSQL
@rmoff / Apache Kafka and KSQL in Action : Let’s Build a Streaming Data Pipeline!
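As a sketch of what such a Java UDF can look like, using ksqlDB's UDF annotations (the function name and masking logic are hypothetical):

import io.confluent.ksql.function.udf.Udf;
import io.confluent.ksql.function.udf.UdfDescription;

// Hypothetical UDF that masks all but the last four characters,
// callable from KSQL as MASK_LEFT(col) once packaged and deployed.
@UdfDescription(name = "mask_left", description = "Masks all but the last 4 characters")
public class MaskLeftUdf {
    @Udf(description = "Returns the input with leading characters replaced by '*'")
    public String maskLeft(final String input) {
        if (input == null || input.length() <= 4) {
            return input;
        }
        int visible = input.length() - 4;
        return "*".repeat(visible) + input.substring(visible);
    }
}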
The truth is the log.
The database is a cache
of a subset of the log.
—Pat Helland
Immutability Changes Everything
http://cidrdb.org/cidr2015/Papers/CIDR15_Paper16.pdf
Photo by Bobby Burch on Unsplash
28
KSQL for Real-Time Monitoring
● Log data monitoring, tracking, and alerting
● syslog data
● Sensor / IoT data
CREATE TABLE error_counts AS
SELECT error_code, COUNT(*) AS error_count
FROM monitoring_stream
WINDOW TUMBLING (SIZE 1 MINUTE)
WHERE type = 'ERROR'
GROUP BY error_code;
29
KSQL for Streaming ETL
CREATE STREAM engine_oil_pressure_readings AS
SELECT r.deviceid, r.reading, r.timestamp,
d.sensor_type, d.uom, d.component
FROM sensor_readings r
LEFT JOIN device_master d
ON r.deviceid = d.id
WHERE d.component = 'Engine'
AND d.sensor_type = 'Oil Pressure'
EMIT CHANGES;
Joining, filtering, and aggregating streams of event data
31
KSQL is a stream processing technology.
As such it is not yet a great fit for:
Ad-hoc queries
● No indexes yet in KSQL
● Kafka often configured to retain data for only a limited span of time
BI reports (Tableau etc.)
● No indexes yet in KSQL
● No JDBC
● Most BI tools don’t understand continuous, streaming results
32
Building blocks for Stream Processing
[Diagram: three layers.
● Core Kafka (persistence, durable pub/sub): Producer, Topic, Consumer
● Kafka Streams (compute): Processors, Operators, Transformers, Topology, State Stores, Change Logs, Stream, Table
● ksqlDB (declarative API, serverless): Push Queries, Pull Queries]
33
Streaming ETL
OLTP RDBMS to Big Data in Cloud Environments
34
Sample Use Case: Sales Data
● Dataset from Kaggle https://www.kaggle.com/kyanyoga/sample-sales-data
35
RDBMS
● Current de facto data integration technology
● Third Normal Form
● Minimises data duplication
36
Big Data
● Data storage is cheap
● Tabular data
● Flat schema
37
Hybrid Cloud
● OLTP database in the on-premises DC
● Big Data service in the Cloud environment
● Heterogeneous network, across security zones
● What’s the bridge to Cloud?
38
Connector over WAN?
● Stability of the Connector and reliability of data delivery depend on a stable WAN connection and QoS
● Tight coupling between the Kafka cluster in the on-premises DC and individual application endpoints in the Cloud
39
Bridge 2 Cloud
● Decoupling with Kafka clusters in both the DC and the Cloud environment
● Reliable data delivery with a buffer in each environment, independent of network security or QoS SLA
● A persistent bridge to cloud ensures data is sent only once to the cloud (or to the DC) and can be reused by many stream processors, connectors, and apps
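The slides don't prescribe a specific bridging tool (Confluent Replicator is one option); with plain Apache Kafka, MirrorMaker 2 can serve as this persistent bridge. A minimal sketch of an mm2.properties file, with hypothetical cluster addresses and topic pattern, run via bin/connect-mirror-maker.sh mm2.properties:

# The two clusters on either side of the bridge.
clusters = onprem, cloud
onprem.bootstrap.servers = kafka.dc.internal:9092
cloud.bootstrap.servers = kafka.cloud.example.com:9092

# Replicate topics one way, from the on-premises DC into the cloud.
onprem->cloud.enabled = true
onprem->cloud.topics = orders.*

# Replicated topics keep a durable buffer in the target cluster.
replication.factor = 3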
42
Streaming KSQL: pairwise joins
CREATE STREAM ORDER_ORDERLINE_EVENTS AS
SELECT ol.ORDERNUMBER,
o.ORDERDATE,
o.STATUS,
o.QTR_ID,
o.MONTH_ID,
o.YEAR_ID,
o.DEALSIZE,
o.CUSTOMERNAME,
ol.ORDERLINENUMBER,
ol.QUANTITYORDERED,
ol.PRICEEACH,
ol.PRODUCTCODE
FROM ORDER_LINES_EVENTS ol
LEFT JOIN ORDER_HEADERS o
ON ol.ORDERNUMBER = o.ORDERNUMBER
PARTITION BY o.CUSTOMERNAME;
43
Streaming KSQL: pairwise joins
CREATE STREAM CUSTOMER_ORDER_ORDERLINE_EVENTS AS
SELECT ool.ORDERNUMBER,ool.ORDERDATE,
ool.STATUS,ool.QTR_ID,
ool.MONTH_ID,ool.YEAR_ID,
ool.DEALSIZE, ool.CUSTOMERNAME,
c.PHONE,c.ADDRESSLINE1,c.ADDRESSLINE2,
c.CITY,c.STATE,c.POSTALCODE,c.COUNTRY,
c.CONTACTLASTNAME,c.CONTACTFIRSTNAME,
ool.ORDERLINENUMBER,
ool.QUANTITYORDERED,
ool.PRICEEACH,ool.PRODUCTCODE
FROM ORDER_ORDERLINE_EVENTS ool
LEFT JOIN CUSTOMERS c
ON ool.CUSTOMERNAME = c.CUSTOMERNAME
PARTITION BY ool.PRODUCTCODE;
44
Streaming KSQL: pairwise joins
CREATE STREAM PRODUCT_CUSTOMER_ORDERLINE_EVENTS
AS SELECT col.ORDERNUMBER,col.ORDERDATE,
col.STATUS,col.QTR_ID,col.MONTH_ID,
col.YEAR_ID,col.DEALSIZE,
col.CUSTOMERNAME,col.PHONE,
col.ADDRESSLINE1,col.ADDRESSLINE2,
col.CITY,col.STATE,col.POSTALCODE,
col.COUNTRY,col.CONTACTLASTNAME,
col.CONTACTFIRSTNAME,
col.ORDERLINENUMBER,
col.QUANTITYORDERED,col.PRICEEACH,
col.PRODUCTCODE,p.PRODUCTLINE, p.MSRP
FROM CUSTOMER_ORDER_ORDERLINE_EVENTS col
LEFT JOIN PRODUCTS p
ON col.PRODUCTCODE = p.PRODUCTCODE
PARTITION BY col.COUNTRY;
45
Streaming KSQL: pairwise joins
CREATE STREAM TABULAR_ORDER_EVENTS
WITH (KAFKA_TOPIC='orders_enriched')
AS SELECT pcol.ORDERNUMBER,pcol.ORDERDATE,
pcol.STATUS,pcol.QTR_ID,pcol.MONTH_ID,
pcol.YEAR_ID,pcol.DEALSIZE,
pcol.CUSTOMERNAME,pcol.PHONE,
pcol.ADDRESSLINE1,pcol.ADDRESSLINE2,
pcol.CITY,pcol.STATE,
pcol.POSTALCODE, pcol.COUNTRY AS COUNTRY,
c.TERRITORY,pcol.CONTACTLASTNAME,
pcol.CONTACTFIRSTNAME,pcol.ORDERLINENUMBER,
pcol.QUANTITYORDERED, pcol.PRICEEACH,
pcol.PRODUCTCODE, pcol.PRODUCTLINE,
pcol.MSRP
FROM PRODUCT_CUSTOMER_ORDERLINE_EVENTS pcol
LEFT JOIN COUNTRIES c ON pcol.COUNTRY = c.COUNTRY
PARTITION BY pcol.ORDERNUMBER;
46
What does KSQL look like?
● First load a topic into a stream
● Then flatten to a table
● Join stream to table for enrichment
CREATE STREAM orderlines1 AS
SELECT ol.*, o.ORDERDATE, o.STATUS, o.QTR_ID, o.MONTH_ID, o.YEAR_ID,
o.DEALSIZE, o.CUSTOMERNAME
FROM ORDERLINES_3NF ol
LEFT JOIN T_ORDERS_3NF o ON ol.ORDERNUMBER = o.ORDERNUMBER;
CREATE STREAM ORDER_EVENTS
WITH (KAFKA_TOPIC='orders_cdc', VALUE_FORMAT='AVRO')
PARTITION BY ORDERNUMBER;
CREATE TABLE ORDERS
WITH (KAFKA_TOPIC='ORDER_EVENTS', VALUE_FORMAT='AVRO',
KEY='ORDERNUMBER');
47
Or use the Kafka Streams API
● Java or Scala
● Can do multiple joins in one operation
● Provides an interactive query API which makes it possible to query the state store (see the sketch below)
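A minimal Kafka Streams sketch in Java (topic names and plain string values are illustrative): one stream-table join enriches each order-line event with its order header; the table behind builder.table() is backed by a state store that the interactive query API can read.

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;

public class OrderEnrichmentApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "order-enrichment");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // Order-line events and order headers, both keyed by order number.
        KStream<String, String> orderLines = builder.stream("order_lines");
        KTable<String, String> orders = builder.table("order_headers");

        // Enrich each order line with its header in a single join step.
        orderLines
            .leftJoin(orders, (line, header) -> header + "|" + line)
            .to("orders_enriched");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}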
51
Confluent is giving new users $50 of free usage per month for their first 3 months.
Here’s advice on how to use this promotion to try Confluent Cloud for free!
Sign up for a Confluent Cloud account
Please bear in mind that you will be required to enter credit card information, but you will not be charged unless you go over the $50 usage in any of the first 3 months or don’t cancel your subscription before the end of the promotion.
You won’t be charged if you don’t go over the limit!
Get the benefits of Confluent Cloud, but keep an eye on your account to make sure you have enough free credits remaining for the rest of your subscription month!
Cancel before the 3 months end if you don’t want to continue past the promotion
If you fail to cancel within your first three months, you will start being charged full price. To cancel, immediately stop all streaming and storing of data in Confluent Cloud and email cloud-support@confluent.io.
Available on bit.ly/TryConfluentCloud
52
A Confluent community catalyst is a person who invests relentlessly in the Apache Kafka® and/or Confluent communities.
● Massive bragging rights
● Access to the private MVP Slack channel
● Special swag
● The recognition of your peers
● Direct interaction with Apache Kafka contributors as well as the Confluent founders at special events
● Free pass for Kafka Summit SF
Nominate yourself or a peer at
CONFLUENT.IO/NOMINATE
53
Want to host or speak at
one of our meetups?
Please contact community@confluent.io
and we will make it happen!
54
Learn more about Apache Kafka® and Confluent Platform
55
Confluent Blog
https://confluent.io/blog
Get the latest info and read about interesting use cases and implementation details in our blog!
Categories include:
● Analytics
● Big Ideas
● Clients
● Company
● Confluent Cloud
● Connecting to Apache Kafka
● Frameworks
● Kafka Summit
● Use Cases
● Stream Processing
56
Podcast - Streaming Audio
● Real-Time Banking with Clojure and Apache Kafka ft. Bobby Calderwood
● Announcing ksqlDB ft. Jay Kreps (20.11.2019)
● Installing Apache Kafka with Ansible ft. Viktor Gamov and Justin Manchester (18.11.2019)
● Securing the Cloud with VPC Peering ft. Daniel LaMotte (13.11.2019)
● ETL and Event Streaming Explained ft. Stewart Bryson (06.11.2019)
● The Pro’s Guide to Fully Managed Apache Kafka Services ft. Ricardo Ferreira (04.11.2019)
● Many more!
57
https://www.confluent.io/apache-kafka-stream-processing-book-bundle/
58
Confluent Community Slack Channel
Over 10,000 Kafkateers are collaborating every single day on the Confluent Community Slack channel!
cnfl.io/community-slack
Confluent Community Catalyst Program
Nominate yourself or a peer at confluent.io/nominate
Subscribe to the Confluent blog
Get frequent updates from key names in Apache Kafka® on best practices, product updates & more!
cnfl.io/read
59
https://www.confluent.io/download/compare/
61
Demos
A curated list of demos that showcase Apache Kafka® event stream processing on the Confluent Platform.
This list is curated by our DevX team and includes demos for:
● Confluent Cloud
● Stream Processing
● Data Pipelines
● Confluent Platform
62
https://github.com/confluentinc/examples
63
https://github.com/confluentinc/demo-scene