Milano Apache Kafka Meetup by Confluent (First Italian Kafka Meetup) on Wednesday, November 29th 2017.
Il talk introduce Apache Kafka (incluse le APIs Kafka Connect e Kafka Streams), Confluent (la società creata dai creatori di Kafka) e spiega perché Kafka è un'ottima e semplice soluzione per la gestione di stream di dati nel contesto di due delle principali forze trainanti e trend industriali: Internet of Things (IoT) e Microservices.
How AI, OpenAI, and ChatGPT impact business and software.
Introduction to Apache Kafka and Confluent... and why they matter
1. 11
Introduction to Apache Kafka and Confluent
... and why they matter!
First Italian Kafka Meetup
Wednesday, November 29th 2017
18:45 – 20:30
PoliHub – Startup District & Incubator
via Durando 39, Milano
https://www.meetup.com/Milano-Kafka-meetup/events/244352352/
2. 22
How Organizations Handle Data Flows: a Giant Mess
Data
Warehouse
Hadoop
NoSQL
Oracle
SFDC
Logging
Bloomberg
…any sink/source
Web Custom Apps Microservices Monitoring Analytics
…and more
OLTP
ActiveMQ
App App
Caches
OLTP OLTPAppAppApp
3. 33
Apache Kafka™: A Distributed Streaming Platform
Apache Kafka
Offline Batch (+1 Hour)Near-Real Time (>100s ms)Real Time (0-100 ms)
Data
Warehouse
Hadoop
NoSQL
Oracle
SFDC
Twitter
Bloomberg
…any sink/source …any sink/source
…and more
Web Custom Apps Microservices Monitoring Analytics
4. 44
Over 35% of Fortune 500’s are using Apache Kafka™
6 of top 10
Travel
7 of top 10
Global banks
8 of top 10
Insurance
9 of top 10
Telecom
5. 55
Industry Trends… and why Apache Kafka matters!
1. From ‘big data’ (batch) to ‘fast data’ (stream processing)
2. Internet of Things (IoT) and sensor data
3. Microservices and asynchronous communication (coordination
messages and data streams) between loosely coupled and fine-
grained services
7. 77
Apache Kafka API – ETL Analogy
Source SinkConnectAPI
ConnectAPI
Streams API
Extract Transform Load
8. 88
The Connect API of Apache Kafka®
Centralized management and configuration
Support for hundreds of technologies
including RDBMS, Elasticsearch, HDFS, S3
Supports CDC ingest of events from RDBMS
Preserves data schema
Fault tolerant and automatically load balanced
Extensible API
Single Message Transforms
Part of Apache Kafka, included in
Confluent Open Source
Reliable and scalable integration of Kafka
with other systems – no coding required.
{
"connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
"connection.url": "jdbc:mysql://localhost:3306/demo?user=rmoff&password=foo",
"table.whitelist": "sales,orders,customers"
}
https://docs.confluent.io/current/connect/
9. 99
The Streams API of Apache Kafka®
No separate processing cluster required
Develop on Mac, Linux, Windows
Deploy to containers, VMs, bare metal, cloud
Powered by Kafka: elastic, scalable,
distributed, battle-tested
Perfect for small, medium, large use cases
Fully integrated with Kafka security
Exactly-once processing semantics
Part of Apache Kafka, included in
Confluent Open Source
Write standard Java applications and microservices
to process your data in real-time
KStream<User, PageViewEvent> pageViews = builder.stream("pageviews-topic");
KTable<Windowed<User>, Long> viewsPerUserSession = pageViews
.groupByKey()
.count(SessionWindows.with(TimeUnit.MINUTES.toMillis(5)), "session-views");
https://docs.confluent.io/current/streams/
10. 1010
KSQL: a Streaming SQL Engine for Apache Kafka® from Confluent
No coding required, all you need is SQL
No separate processing cluster required
Powered by Kafka: elastic, scalable,
distributed, battle-tested
CREATE TABLE possible_fraud AS
SELECT card_number, count(*)
FROM authorization_attempts
WINDOW TUMBLING (SIZE 5 SECONDS)
GROUP BY card_number
HAVING count(*) > 3;
CREATE STREAM vip_actions AS
SELECT userid, page, action
FROM clickstream c
LEFT JOIN users u
ON c.userid = u.userid
WHERE u.level = 'Platinum';
KSQL is the simplest way to process streams of data in real-time
Perfect for streaming ETL, anomaly detection,
event monitoring, and more
Part of Confluent Open Source
https://github.com/confluentinc/ksql
12. 1212
Confluent Enterprise: Physical Architecture
Rack 1
Kafka Broker #1
ToR Switch
ToR Switch
Schema Registry #1
Kafka Connect #1
Zookeeper #1
REST Proxy #1
Kafka Broker #4
Zookeeper #4
Rack 2
Kafka Broker #2
ToR Switch
ToR Switch
Schema Registry #2
Kafka Connect #2
Zookeeper #2
Kafka Broker #5
Zookeeper #5
Rack 3
Kafka Broker #3
ToR Switch
ToR Switch
Kafka Connect #3
Zookeeper #3
Core Switch Core Switch
REST Proxy #2
Load Balancer Load Balancer
Control Center #1 Control Center #2
13. 1313
Confluent Completes Kafka
Feature Benefit Apache Kafka Confluent Open Source Confluent Enterprise
Apache Kafka
High throughput, low latency, high availability, secure distributed streaming
platform
Kafka Connect API Advanced API for connecting external sources/destinations into Kafka
Kafka Streams API
Simple library that enables streaming application development within the
Kafka framework
Additional Clients Supports non-Java clients; C, C++, Python, .NET and several others
REST Proxy
Provides universal access to Kafka from any network connected device via
HTTP
Schema Registry
Central registry for the format of Kafka data – guarantees all data is always
consumable
Pre-Built Connectors
HDFS, JDBC, Elasticsearch, Amazon S3 and other connectors fully certified
and supported by Confluent
JMS Client
Support for legacy Java Message Service (JMS) applications consuming
and producing directly from Kafka
Confluent Control
Center
Enables easy connector management, monitoring and alerting for a Kafka
cluster
Auto Data Balancer Rebalancing data across cluster to remove bottlenecks
Replicator Multi-datacenter replication simplifies and automates MDC Kafka clusters
Support
Enterprise class support to keep your Kafka environment running at top
performance Community Community 24x7x365
14. 1414
Big Data and Fast Data Ecosystems
Synchronous Req/Response
0 – 100s ms
Near Real Time
> 100s ms
Offline Batch
> 1 hour
Apache Kafka
Stream Data Platform
Search
RDBMS
Apps Monitoring
Real-time
Analytics
NoSQL
Stream
Processing
Apache Hadoop
Data Lake
Impala
DWH
Hive
Spark Map-Reduce
Confluent HDFS Connector
(exactly once semantics)
https://www.confluent.io/blog/the-value-of-apache-kafka-in-big-data-ecosystem/
15. 1515
Building a Microservices Ecosystem with Kafka Streams and KSQL
https://www.confluent.io/blog/building-a-microservices-ecosystem-with-kafka-streams-and-ksql/
https://github.com/confluentinc/kafka-streams-examples/tree/3.3.0-post/src/main/java/io/confluent/examples/streams/microservices
16. 1616
About Confluent and Apache Kafka™
70% of active Kafka
Committers
Founded
September 2014
Technology developed
while at LinkedIn
Founded by the creators of
Apache Kafka
17. 1717
Apache Kafka: PMC members and committers
https://kafka.apache.org/committers
PMC
PMC PMC PMCPMC PMC PMC PMC
PMC PMC PMC