Stream Processing Airport Data
Sönke Liebau – Co-Founder and Partner @ OpenCore
October 17th 2018
Serving the Real-Time Data Needs of an Airport with Kafka Streams and KSQL
Who Am I?
• Partner & Co-Founder at OpenCore
• Small consulting company with a Big Data & Open Source focus
• First production Kafka deployment in 2014
Website: www.opencore.com
soenke.liebau@opencore.com
https://www.linkedin.com/in/soenkeliebau/
@soenkeliebau
Kafka Streams & KSQL
What Is Kafka Streams?
“The easiest way to write mission-critical real-time applications and microservices”
“Kafka Streams is a client library for building applications and microservices, where the input and output data are stored in Kafka clusters.”
Source: https://kafka.apache.org/20/documentation/streams/
What Is KSQL?
Confluent KSQL is the open source,
streaming SQL engine that enables
real-time data processing against
Apache Kafka®
Source: https://www.confluent.io/product/ksql/
© 2018 OpenCore GmbH & Co. KG
Kafka Streams In The Ecosystem
[Diagram: sources feed Kafka through Kafka Connect, Kafka Streams jobs process the data in Kafka, and Kafka Connect delivers it to the destinations]
The Big Difference
Using Kafka Streams
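// Serdes for String keys/values and Long counts.
final Serde<String> stringSerde = Serdes.String();
final Serde<Long> longSerde = Serdes.Long();
final StreamsBuilder builder = new StreamsBuilder();
// Read lines of text from the input topic.
KStream<String, String> textLines = builder.stream("streams-plaintext-input",
    Consumed.with(stringSerde, stringSerde));
// Split each line into lowercase words, group by word, and count occurrences.
KTable<String, Long> wordCounts = textLines
    .flatMapValues(value -> Arrays.asList(value.toLowerCase().split("\\W+")))
    .groupBy((key, value) -> value)
    .count();
// Write the running counts to the output topic.
wordCounts.toStream().to("streams-wordcount-output", Produced.with(stringSerde, longSerde));
Source: https://kafka.apache.org/20/documentation/streams/quickstart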
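To see what the word-count topology computes without running a broker, the same split, group and count steps can be sketched on a fixed list of lines in plain Java. This is only an illustration: the class name `WordCountSketch` is made up here, and a real Kafka Streams job updates its counts continuously instead of returning one final map.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Kafka Streams.
final class WordCountSketch {

    // Mirrors flatMapValues -> groupBy -> count on an in-memory list.
    static Map<String, Long> countWords(List<String> lines) {
        return lines.stream()
                // "\\W+" splits on runs of non-word characters.
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\W+")))
                // split() yields an empty first token if a line starts with a separator.
                .filter(word -> !word.isEmpty())
                .collect(Collectors.groupingBy(
                        Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        // prints {kafka=2, ksql=1, streams=1}
        System.out.println(countWords(List.of("Kafka Streams", "kafka KSQL")));
    }
}
```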
Using KSQL
[Diagram: the KSQL CLI and REST clients talk to the KSQL REST interface]
SELECT *
FROM security_in
WHERE status='success'
AND terminal='t1';
Running A KSQL Statement
The Competition
When To Use Which?
Kafka Streams
• Offers lower-level access
• More data formats supported
• Queryable state
• Problems that cannot be expressed in SQL
KSQL
• Easier for people used to SQL
• No need for additional orchestration
• Data exploration
Our Airport
A Few Facts Up Front
• A lot of independent data sources
• Airline ticketing
• Baggage transport system
• Passenger counting
• Retail
• Radar
• Weather
• …
• Spread over multiple companies
• Many legacy interfaces
Integrations
[Diagram: the operations database surrounded by six external systems]
Isolated Islands Of Data
• A lot of isolated data stores to provide data for necessary solutions
• Spiderweb of integrations
• Operational DB needs to push data to a lot of systems
• Many different formats
The Dream
[Diagram: sources (a weird binary source, an XML source, …) pass through stages labelled Raw, Source and Processed via stream processing and reach several destinations, one of them labelled Rest]
Ingest Transformation - Kafka Streams
StreamsBuilder builder = new StreamsBuilder();
// ProprietaryObject and both serdes are application-specific classes.
Serde<ProprietaryObject> weirdFormatSerde = new ProprietaryWeirdFormatSerde();
Serde<ProprietaryObject> avroSerde = new ProprietaryAvroSerde();
// Read with the proprietary serde, write back out with the Avro serde.
builder.stream("proprietary_input_topic",
        Consumed.with(Serdes.String(), weirdFormatSerde))
    .to("avro_output_topic",
        Produced.with(Serdes.String(), avroSerde));
Ingest Transformation - KSQL
ksql> CREATE STREAM source (uid INT, name VARCHAR)
      WITH (KAFKA_TOPIC='mysql_users', VALUE_FORMAT='JSON');
ksql> CREATE STREAM target_avro
      WITH (VALUE_FORMAT='AVRO', KAFKA_TOPIC='mysql_users_avro')
      AS SELECT * FROM source;
Source: https://gist.github.com/rmoff/165b05e4554c41719b71f1a47ee7b113
Stream Processing
• Stream processing jobs read the converted Avro topics and create enriched topics, alerts, … by
• Joining streams
• Aggregating streams
• Filtering or alerting on streams
• …
DISCLAIMER
Gate Changes
• Gate changes can be based on different information
• Delays of the incoming flight
• Changes on other outgoing flights
• …
• Join relevant streams and publish change events that are consumed by
• Apps
• Gate monitors
• Departure boards
• …
Passenger Count
• Join the stream of tickets scanned before the line to the security check with the camera count of passengers leaving the security check to estimate the number of waiting passengers
• Change routing of passengers (physical: signs change; digital: different routing in the app)
• Also consumed by
• Monitors to display predicted waiting time
• App to display predicted wait time
• Prediction systems to feed models for capacity planning
• Models to predict whether a passenger might miss their flight -> reroute to priority lane
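The estimate described above boils down to "passengers scanned in minus passengers counted out" per security area. A minimal plain-Java sketch of that arithmetic follows; the class and method names are hypothetical, and a real job would keep these counters in Kafka Streams or KSQL state, not in local maps.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for the two joined streams.
final class WaitingEstimate {
    private final Map<String, Long> entered = new HashMap<>();
    private final Map<String, Long> left = new HashMap<>();

    // One boarding pass scanned at the entrance of a security area.
    void ticketScanned(String securityArea) {
        entered.merge(securityArea, 1L, Long::sum);
    }

    // The exit camera reports a batch of passengers leaving the area.
    void cameraCounted(String securityArea, long count) {
        left.merge(securityArea, count, Long::sum);
    }

    // Estimated queue length; clamped at zero because camera counts can overshoot.
    long waiting(String securityArea) {
        long diff = entered.getOrDefault(securityArea, 0L)
                - left.getOrDefault(securityArea, 0L);
        return Math.max(0L, diff);
    }

    public static void main(String[] args) {
        WaitingEstimate estimate = new WaitingEstimate();
        estimate.ticketScanned("t1_2");
        estimate.ticketScanned("t1_2");
        estimate.ticketScanned("t1_2");
        estimate.cameraCounted("t1_2", 1);
        System.out.println(estimate.waiting("t1_2")); // prints 2
    }
}
```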
Wait Time
• Calculate how long a passenger took to clear the security checkpoint by joining the time they scanned their boarding pass with the time they are first spotted by an iBeacon beyond security
• Push offers based on wait time and flight time
• Long wait, a lot of time until take-off -> free coffee or sandwich
• Long wait, short time until take-off -> duty-free voucher
• …
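The offer rule above can be written down as a small decision function. The sketch below is an assumption-laden illustration: the 20-minute "long wait" and 90-minute "a lot of time" thresholds, the enum, and all names are invented here and do not come from the talk.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical decision logic for the wait-time offers.
final class OfferSketch {
    enum Offer { FREE_COFFEE_OR_SANDWICH, DUTY_FREE_VOUCHER, NONE }

    // Assumed thresholds, chosen only for illustration.
    static final Duration LONG_WAIT = Duration.ofMinutes(20);
    static final Duration PLENTY_OF_TIME = Duration.ofMinutes(90);

    static Offer chooseOffer(Instant passScanned, Instant firstBeaconSighting, Instant takeOff) {
        // Wait time = first beacon sighting beyond security minus boarding pass scan.
        Duration wait = Duration.between(passScanned, firstBeaconSighting);
        if (wait.compareTo(LONG_WAIT) < 0) {
            return Offer.NONE;
        }
        Duration untilTakeOff = Duration.between(firstBeaconSighting, takeOff);
        return untilTakeOff.compareTo(PLENTY_OF_TIME) >= 0
                ? Offer.FREE_COFFEE_OR_SANDWICH
                : Offer.DUTY_FREE_VOUCHER;
    }

    public static void main(String[] args) {
        Instant scanned = Instant.parse("2018-10-17T08:00:00Z");
        Instant beacon = Instant.parse("2018-10-17T08:30:00Z");
        Instant takeOff = Instant.parse("2018-10-17T09:00:00Z");
        System.out.println(chooseOffer(scanned, beacon, takeOff)); // prints DUTY_FREE_VOUCHER
    }
}
```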
Baggage Notification
• Baggage containers are scanned when they are loaded/unloaded
• By joining this with data from the baggage sorter passengers could receive
push notifications when their luggage is loaded/unloaded into/from the
plane
Arrival At Gate
• There are complex models running to estimate when the plane will arrive
at the gate after it has landed
• Based on ground radar data
• Can be used to
• Predict whether the following flight might be delayed
• Coordinate cleaning crews
• Coordinate refueling
• Feed into gate change decisions
An Example Flow
{"boardingpass_id":"123", "passenger":"smith", "flight_number":"LH454", "checked_bags":1}
{"boardingpass_id":"123", "security_area":"t1_2", "status":"success"}
{"security_area":"t1_2", "count":"1"}
{"passenger":"smith", "beacon_id":"t1_b123"}
{"boardingpass_id":"123", "item_group":"cigarettes"}
{"boardingpass_id":"123", "status":"success"}
{"flight_no":"LH454", "runway":"1north"}
{"old_gate":"a12", "new_gate":"e50"}
Check-In Event
{"boardingpass_id":"123",
"passenger":"smith",
"flight_number":"LH454",
"terminal":"terminal1"}
[Diagram: check_in → check_in_count]
CREATE TABLE check_in_count
AS SELECT terminal, count(terminal)
FROM check_in
WINDOW TUMBLING (SIZE 24 hour)
GROUP BY terminal;
What is it good for?
• Early warning for security capacity
• "Don't dawdle" warning based on security queues
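A tumbling window assigns every event to exactly one fixed-size, non-overlapping bucket, and in Kafka Streams those buckets are aligned to the epoch. The plain-Java sketch below imitates a 24-hour tumbling count per terminal; the class `TumblingCount` and its methods are hypothetical and exist only to show the bucketing arithmetic.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory imitation of a windowed count.
final class TumblingCount {
    static final long WINDOW_MS = 24L * 60 * 60 * 1000; // SIZE 24 hour

    // terminal -> (window start in epoch ms -> number of check-ins)
    private final Map<String, Map<Long, Long>> counts = new HashMap<>();

    // Start of the epoch-aligned window that contains the timestamp.
    static long windowStart(long timestampMs) {
        return timestampMs - Math.floorMod(timestampMs, WINDOW_MS);
    }

    void add(String terminal, long timestampMs) {
        counts.computeIfAbsent(terminal, key -> new TreeMap<>())
                .merge(windowStart(timestampMs), 1L, Long::sum);
    }

    long count(String terminal, long windowStartMs) {
        return counts.getOrDefault(terminal, Map.of()).getOrDefault(windowStartMs, 0L);
    }

    public static void main(String[] args) {
        TumblingCount checkIns = new TumblingCount();
        checkIns.add("terminal1", 1_000L);
        checkIns.add("terminal1", 2_000L);
        checkIns.add("terminal1", 90_000_000L); // falls into the second day
        System.out.println(checkIns.count("terminal1", 0L));          // prints 2
        System.out.println(checkIns.count("terminal1", 86_400_000L)); // prints 1
    }
}
```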
Passenger Enters Security Area
{"boardingpass_id":"123",
"security_area":"t1_2",
"status":"success"}
[Diagram: security_in → security_in_count; security_in + check_in → time_to_security]
CREATE TABLE security_in_count
AS SELECT
security_area,
count(security_area)
FROM security_in
WINDOW TUMBLING (SIZE 24 hour)
WHERE status='success'
GROUP BY security_area;
What is it good for?
• Monitor for failed attempts
• Passenger routing to security
• Unload baggage of late passengers
• …
SELECT
s.boardingpass_id, s.rowtime - c.rowtime
as time_to_security
FROM security_in s
LEFT JOIN check_in c WITHIN 1 HOUR
ON s.boardingpass_id=c.boardingpass_id;
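The time_to_security query pairs a security scan with the check-in of the same boarding pass, provided the two events lie within one hour of each other. A simplified plain-Java sketch of that matching rule is shown below. It is an illustration under stated assumptions: names are hypothetical, only the latest check-in per boarding pass is remembered, and an unmatched scan yields an empty result, whereas the LEFT JOIN in the query would still emit a row with null check-in fields.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalLong;

// Hypothetical stand-in for a stream-stream join with a WITHIN clause.
final class TimeToSecurity {
    static final long WITHIN_MS = 60L * 60 * 1000; // WITHIN 1 HOUR

    // boardingpass_id -> timestamp (epoch ms) of the latest check-in
    private final Map<String, Long> checkInTime = new HashMap<>();

    void checkIn(String boardingPassId, long timestampMs) {
        checkInTime.put(boardingPassId, timestampMs);
    }

    // Milliseconds between check-in and security scan, if they match within the window.
    OptionalLong securityScan(String boardingPassId, long timestampMs) {
        Long checkedInAt = checkInTime.get(boardingPassId);
        if (checkedInAt == null || Math.abs(timestampMs - checkedInAt) > WITHIN_MS) {
            return OptionalLong.empty();
        }
        return OptionalLong.of(timestampMs - checkedInAt);
    }

    public static void main(String[] args) {
        TimeToSecurity join = new TimeToSecurity();
        join.checkIn("123", 0L);
        System.out.println(join.securityScan("123", 900_000L));   // prints OptionalLong[900000]
        System.out.println(join.securityScan("123", 4_000_000L)); // prints OptionalLong.empty
    }
}
```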
Passenger Leaves Security Area
{"security_area":"t1_2",
"count":"1"}
[Diagram: security_out → security_out_count; security_in_count + security_out_count → current_count]
CREATE TABLE security_out_count
AS SELECT security_area, sum(count)
FROM security_out
WINDOW TUMBLING (SIZE 24 hour)
GROUP BY security_area;
What is it good for?
• Capacity planning
• Wait time prediction
• Passenger routing (apps & physical)
• Alerting on late passengers checking in
• …
SELECT
i.security_area AS security_area,
i.KSQL_COL_1 AS entry,
o.KSQL_COL_1 AS exit,
i.KSQL_COL_1 - o.KSQL_COL_1 AS waiting
FROM security_in_count i
INNER JOIN security_out_count o
ON i.security_area=o.security_area;
Passenger Located Via iBeacon
{"passenger":"smith",
"beacon_id":"t1_b123"}
[Diagram labels: security_duration, security_in, dutyfree_joined, dutyfree_in]
CREATE STREAM dutyfree_joined
AS SELECT s.boardingpass_id, d.passenger
FROM dutyfree_in d
LEFT JOIN security_in s WITHIN 1 HOURS
ON s.passenger=d.passenger;
SELECT
d.boardingpass_id,
d.d_passenger,
d.rowtime - s.rowtime as time_in_security
FROM dutyfree_in_with_bc d
LEFT JOIN security_in s WITHIN 1 HOUR
ON d.boardingpass_id=s.boardingpass_id;
What is it good for?
• Refining wait time prediction
• Targeted questionnaire (find reasons for outliers)
• Vouchers for huge delays
• …
Purchase Event
{"boardingpass_id":"123",
"item_group":"cigarettes"}
[Diagram labels: flight_information, check_in, dutyfree_joined, dutyfree_purchase]
What is it good for?
• Retail models
• Route to smoking area nearest to gate
• Advise of walk time if time is tight
• …
CREATE STREAM dutyfree_joined
AS SELECT
c.boardingpass_id,
c.passenger,
p.purchase_type,
f.gate
FROM dutyfree_purchase p
LEFT JOIN check_in c WITHIN 1 HOURS
ON c.passenger=p.passenger
LEFT JOIN flight_information f WITHIN 1 HOURS
ON f.flight_number = c.flight_number;
Gate Change
[Diagram: expected_gate_arrival + expected_gate_departure → gate_wait_time → gate_change; gate_change + flight_information → notifications]
CREATE STREAM gate_wait_time
AS SELECT
a.flight,
d.departure_time - a.arrival_time as wait_time
FROM expected_gate_arrival a
INNER JOIN expected_gate_departure d WITHIN 1 HOURS
ON a.gate=d.gate;
CREATE STREAM gate_change
AS SELECT
flight
FROM gate_wait_time
WHERE wait_time > 600000;
CREATE STREAM notifications
AS SELECT f.passenger
FROM gate_change g
LEFT JOIN flight_information f
WITHIN 1 HOURS ON f.gate=g.gate;
Passenger Boards Plane
{"boardingpass_id":"123",
"status":"success"}
[Diagram: gate_in + check_in + baggage_loaded → bags_joined]
What is it good for?
• Alert on bags without matching passengers
• Trigger unloading based on related events
• Gate closed
• Time based
• …
CREATE STREAM bag_join
AS SELECT
c.passenger,
c.bags
FROM gate_in g
LEFT JOIN check_in c
WITHIN 1 HOURS
ON c.boardingpass_id=g.boardingpass_id
LEFT JOIN baggage_loaded b
WITHIN 1 HOURS
ON b.bag_id = c.bag_id;
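The "bags without matching passengers" alert amounts to finding loaded bags whose owner has not passed the gate. The following plain-Java sketch shows that check on in-memory state. Everything in it is hypothetical (class, method names, the idea that a loaded-bag event carries the owner's boarding pass ID); a streaming job would evaluate it when a trigger such as "gate closed" arrives.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory view of the joined gate and baggage events.
final class UnmatchedBags {
    // bag_id -> boardingpass_id of the owner, for bags already loaded
    private final Map<String, String> loadedBagOwner = new HashMap<>();
    private final Set<String> boardedPasses = new HashSet<>();

    void bagLoaded(String bagId, String boardingPassId) {
        loadedBagOwner.put(bagId, boardingPassId);
    }

    void passengerBoarded(String boardingPassId) {
        boardedPasses.add(boardingPassId);
    }

    // Bags on the plane whose owner has not boarded, sorted for stable output.
    Set<String> bagsToUnload() {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, String> entry : loadedBagOwner.entrySet()) {
            if (!boardedPasses.contains(entry.getValue())) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        UnmatchedBags flight = new UnmatchedBags();
        flight.bagLoaded("bag1", "123");
        flight.bagLoaded("bag2", "456");
        flight.passengerBoarded("123");
        System.out.println(flight.bagsToUnload()); // prints [bag2]
    }
}
```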
Thank You!

Weitere ähnliche Inhalte

Was ist angesagt?

Ingesting and Processing IoT Data - using MQTT, Kafka Connect and KSQL
Ingesting and Processing IoT Data - using MQTT, Kafka Connect and KSQLIngesting and Processing IoT Data - using MQTT, Kafka Connect and KSQL
Ingesting and Processing IoT Data - using MQTT, Kafka Connect and KSQL
Guido Schmutz
 

Was ist angesagt? (20)

Service Mesh with Apache Kafka, Kubernetes, Envoy, Istio and Linkerd
Service Mesh with Apache Kafka, Kubernetes, Envoy, Istio and LinkerdService Mesh with Apache Kafka, Kubernetes, Envoy, Istio and Linkerd
Service Mesh with Apache Kafka, Kubernetes, Envoy, Istio and Linkerd
 
Deep Dive into the New Features of Apache Spark 3.0
Deep Dive into the New Features of Apache Spark 3.0Deep Dive into the New Features of Apache Spark 3.0
Deep Dive into the New Features of Apache Spark 3.0
 
The Patterns of Distributed Logging and Containers
The Patterns of Distributed Logging and ContainersThe Patterns of Distributed Logging and Containers
The Patterns of Distributed Logging and Containers
 
ksqlDB - Stream Processing simplified!
ksqlDB - Stream Processing simplified!ksqlDB - Stream Processing simplified!
ksqlDB - Stream Processing simplified!
 
Disaster Recovery Options Running Apache Kafka in Kubernetes with Rema Subra...
 Disaster Recovery Options Running Apache Kafka in Kubernetes with Rema Subra... Disaster Recovery Options Running Apache Kafka in Kubernetes with Rema Subra...
Disaster Recovery Options Running Apache Kafka in Kubernetes with Rema Subra...
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
 
KSQL Performance Tuning for Fun and Profit ( Nick Dearden, Confluent) Kafka S...
KSQL Performance Tuning for Fun and Profit ( Nick Dearden, Confluent) Kafka S...KSQL Performance Tuning for Fun and Profit ( Nick Dearden, Confluent) Kafka S...
KSQL Performance Tuning for Fun and Profit ( Nick Dearden, Confluent) Kafka S...
 
Ingesting and Processing IoT Data - using MQTT, Kafka Connect and KSQL
Ingesting and Processing IoT Data - using MQTT, Kafka Connect and KSQLIngesting and Processing IoT Data - using MQTT, Kafka Connect and KSQL
Ingesting and Processing IoT Data - using MQTT, Kafka Connect and KSQL
 
Using Queryable State for Fun and Profit
Using Queryable State for Fun and ProfitUsing Queryable State for Fun and Profit
Using Queryable State for Fun and Profit
 
Prometheus – a next-gen Monitoring System
Prometheus – a next-gen Monitoring SystemPrometheus – a next-gen Monitoring System
Prometheus – a next-gen Monitoring System
 
Schema registry
Schema registrySchema registry
Schema registry
 
Cloud spanner architecture and use cases
Cloud spanner architecture and use casesCloud spanner architecture and use cases
Cloud spanner architecture and use cases
 
Apache Airflow Introduction
Apache Airflow IntroductionApache Airflow Introduction
Apache Airflow Introduction
 
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
 
Understanding and Improving Code Generation
Understanding and Improving Code GenerationUnderstanding and Improving Code Generation
Understanding and Improving Code Generation
 
Introduction to redis - version 2
Introduction to redis - version 2Introduction to redis - version 2
Introduction to redis - version 2
 
Temporal-Joins in Kafka Streams and ksqlDB | Matthias Sax, Confluent
Temporal-Joins in Kafka Streams and ksqlDB | Matthias Sax, ConfluentTemporal-Joins in Kafka Streams and ksqlDB | Matthias Sax, Confluent
Temporal-Joins in Kafka Streams and ksqlDB | Matthias Sax, Confluent
 
High-speed Database Throughput Using Apache Arrow Flight SQL
High-speed Database Throughput Using Apache Arrow Flight SQLHigh-speed Database Throughput Using Apache Arrow Flight SQL
High-speed Database Throughput Using Apache Arrow Flight SQL
 
Running Airflow Workflows as ETL Processes on Hadoop
Running Airflow Workflows as ETL Processes on HadoopRunning Airflow Workflows as ETL Processes on Hadoop
Running Airflow Workflows as ETL Processes on Hadoop
 
Enabling Vectorized Engine in Apache Spark
Enabling Vectorized Engine in Apache SparkEnabling Vectorized Engine in Apache Spark
Enabling Vectorized Engine in Apache Spark
 

Ähnlich wie Serving the Real-Time Data Needs of an Airport with Kafka Streams and KSQL

Machine Learning with Apache Spark
Machine Learning with Apache SparkMachine Learning with Apache Spark
Machine Learning with Apache Spark
IBM Cloud Data Services
 
Anatomy of an AWS account Cryptojack
Anatomy of an AWS account CryptojackAnatomy of an AWS account Cryptojack
Anatomy of an AWS account Cryptojack
Anton Gurov
 
Meetup: Streaming Data Pipeline Development
Meetup:  Streaming Data Pipeline DevelopmentMeetup:  Streaming Data Pipeline Development
Meetup: Streaming Data Pipeline Development
Timothy Spann
 

Ähnlich wie Serving the Real-Time Data Needs of an Airport with Kafka Streams and KSQL (20)

Serving the Real-Time Data Needs of an Airport with Kafka Streams and KSQL
Serving the Real-Time Data Needs of an Airport with Kafka Streams and KSQLServing the Real-Time Data Needs of an Airport with Kafka Streams and KSQL
Serving the Real-Time Data Needs of an Airport with Kafka Streams and KSQL
 
Machine Learning with Apache Spark
Machine Learning with Apache SparkMachine Learning with Apache Spark
Machine Learning with Apache Spark
 
Intro to AWS Batch & How AQR Capital leverages AWS to Identify New Investment...
Intro to AWS Batch & How AQR Capital leverages AWS to Identify New Investment...Intro to AWS Batch & How AQR Capital leverages AWS to Identify New Investment...
Intro to AWS Batch & How AQR Capital leverages AWS to Identify New Investment...
 
UberCloud: From Experiment to Marketplace
UberCloud: From Experiment to MarketplaceUberCloud: From Experiment to Marketplace
UberCloud: From Experiment to Marketplace
 
UberCloud: From Experiment to Marketplace
UberCloud: From Experiment to MarketplaceUberCloud: From Experiment to Marketplace
UberCloud: From Experiment to Marketplace
 
Anatomy of an AWS account Cryptojack
Anatomy of an AWS account CryptojackAnatomy of an AWS account Cryptojack
Anatomy of an AWS account Cryptojack
 
AWS for Manufacturing: Digital Transformation throughout the Value Chain (MFG...
AWS for Manufacturing: Digital Transformation throughout the Value Chain (MFG...AWS for Manufacturing: Digital Transformation throughout the Value Chain (MFG...
AWS for Manufacturing: Digital Transformation throughout the Value Chain (MFG...
 
Enabling Event Driven Architecture with PubSub+
Enabling Event Driven Architecture with PubSub+Enabling Event Driven Architecture with PubSub+
Enabling Event Driven Architecture with PubSub+
 
Driving Efficiency with Splunk Cloud at Gatwick Airport
Driving Efficiency with Splunk Cloud at Gatwick AirportDriving Efficiency with Splunk Cloud at Gatwick Airport
Driving Efficiency with Splunk Cloud at Gatwick Airport
 
Smart Manufacturing: CAE in the Cloud
Smart Manufacturing: CAE in the CloudSmart Manufacturing: CAE in the Cloud
Smart Manufacturing: CAE in the Cloud
 
Breaking Up the Monolith While Migrating to AWS (GPSTEC320) - AWS re:Invent 2018
Breaking Up the Monolith While Migrating to AWS (GPSTEC320) - AWS re:Invent 2018Breaking Up the Monolith While Migrating to AWS (GPSTEC320) - AWS re:Invent 2018
Breaking Up the Monolith While Migrating to AWS (GPSTEC320) - AWS re:Invent 2018
 
Distributed Solar Systems at EDF Renewables and AWS IoT: A Natural Fit (PUT30...
Distributed Solar Systems at EDF Renewables and AWS IoT: A Natural Fit (PUT30...Distributed Solar Systems at EDF Renewables and AWS IoT: A Natural Fit (PUT30...
Distributed Solar Systems at EDF Renewables and AWS IoT: A Natural Fit (PUT30...
 
Serverless patterns
Serverless patternsServerless patterns
Serverless patterns
 
Javier Hijas & Ori Kuyumgiski - Security at the speed of DevOps [rooted2018]
Javier Hijas & Ori Kuyumgiski	- Security at the speed of DevOps [rooted2018]Javier Hijas & Ori Kuyumgiski	- Security at the speed of DevOps [rooted2018]
Javier Hijas & Ori Kuyumgiski - Security at the speed of DevOps [rooted2018]
 
Cloud computing
Cloud computingCloud computing
Cloud computing
 
Containers and Kubernetes without limits
Containers and Kubernetes without limitsContainers and Kubernetes without limits
Containers and Kubernetes without limits
 
IoT Connected Brewery
IoT Connected BreweryIoT Connected Brewery
IoT Connected Brewery
 
How AQR Capital Uses AWS to Research New Investment Signals
How AQR Capital Uses AWS to Research New Investment Signals How AQR Capital Uses AWS to Research New Investment Signals
How AQR Capital Uses AWS to Research New Investment Signals
 
Check Point and Accenture Webinar
Check Point and Accenture Webinar Check Point and Accenture Webinar
Check Point and Accenture Webinar
 
Meetup: Streaming Data Pipeline Development
Meetup:  Streaming Data Pipeline DevelopmentMeetup:  Streaming Data Pipeline Development
Meetup: Streaming Data Pipeline Development
 

Mehr von confluent

Mehr von confluent (20)

Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flink
 
Unlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsUnlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insights
 
Workshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con FlinkWorkshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con Flink
 
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
 
AWS Immersion Day Mapfre - Confluent
AWS Immersion Day Mapfre   -   ConfluentAWS Immersion Day Mapfre   -   Confluent
AWS Immersion Day Mapfre - Confluent
 
Eventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkEventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalk
 
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent CloudQ&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
 
Citi TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep DiveCiti TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep Dive
 
Build real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with ConfluentBuild real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with Confluent
 
Q&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service MeshQ&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service Mesh
 
Citi Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka MicroservicesCiti Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka Microservices
 
Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3
 
Citi Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging ModernizationCiti Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging Modernization
 
Citi Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataCiti Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time data
 
Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2
 
Data In Motion Paris 2023
Data In Motion Paris 2023Data In Motion Paris 2023
Data In Motion Paris 2023
 
Confluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with SynthesisConfluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with Synthesis
 
The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023
 
The Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data StreamsThe Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data Streams
 

Kürzlich hochgeladen

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Enterprise Knowledge
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
Earley Information Science
 

Kürzlich hochgeladen (20)

Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 

Serving the Real-Time Data Needs of an Airport with Kafka Streams and KSQL

  • 1. Stream Processing Airport Data Sönke Liebau – Co-Founder and Partner @ OpenCore October 17th 2018 Serving the Real-Time Data Needs of an Airport with Kafka Streams and KSQL
  • 2. Who Am I? • Partner & Co-Founder at • Small consulting company with a Big Data & Open Source focus • First production Kafka deployment in 2014 Website: www.opencore.com soenke.liebau@opencore.com https://www.linkedin.com/in/soenkeliebau/ @soenkeliebau
  • 3.
  • 4.
  • 5.
  • 6.
  • 7.
  • 8.
  • 9.
  • 10.
  • 11.
  • 13. Source: https://kafka.apache.org/20/documentation/streams/ 13 What Is Kafka Streams? “The easiest way to write mission-critical real-time applications and microservices” “Kafka Streams is a client library for building applications and microservices, where the input and output data are stored in Kafka clusters. “
  • 14. What Is KSQL? Confluent KSQL is the open source, streaming SQL engine that enables real-time data processing against Apache Kafka® Source: https://www.confluent.io/product/ksql/ 14
  • 15. © 2018 OpenCore GmbH & Co. KG 17 Kafka Streams In The Ecosystem Sources KafkaConnect KafkaConnect Destinations Kafka Streams Jobs
  • 16. © 2018 OpenCore GmbH & Co. KG 18 The Big Difference
  • 17. 20 Using Kafka Streams final Serde<String> stringSerde = Serdes.String(); final Serde<Long> longSerde = Serdes.Long(); KStream<String, String> textLines = builder.stream("streams-plaintext-input", Consumed.with(stringSerde, stringSerde); KTable<String, Long> wordCounts = textLines .flatMapValues(value -> Arrays.asList(value.toLowerCase().split("W+"))) .groupBy((key, value) -> value) .count() wordCounts.toStream().to("streams-wordcount-output", Produced.with(Serdes.String(), Serdes.Long())); Source: https://kafka.apache.org/20/documentation/streams/quickstart
  • 18. 21 Using Kafka Streams final Serde<String> stringSerde = Serdes.String(); final Serde<Long> longSerde = Serdes.Long(); KStream<String, String> textLines = builder.stream("streams-plaintext-input", Consumed.with(stringSerde, stringSerde); KTable<String, Long> wordCounts = textLines .flatMapValues(value -> Arrays.asList(value.toLowerCase().split("W+"))) .groupBy((key, value) -> value) .count() wordCounts.toStream().to("streams-wordcount-output", Produced.with(Serdes.String(), Serdes.Long())); Source: https://kafka.apache.org/20/documentation/streams/quickstart …
  • 19. Using KSQL
      [Diagram labels: REST Interface, CLI, REST Client]
      SELECT * FROM security_in WHERE status='success' AND terminal='t1';
  • 20. Running A KSQL Statement
  • 21. The Competition
  • 22. When To Use Which?
      Kafka Streams
        • Offers lower level access
        • More data formats supported
        • Queryable state
        • Problems that cannot be expressed in SQL
      KSQL
        • Easier for people used to SQL
        • No need for additional orchestration
        • Data exploration
  • 23. Our Airport
  • 24. A Few Facts Up Front
      • A lot of independent data sources
        • Airline ticketing
        • Baggage transport system
        • Passenger counting
        • Retail
        • Radar
        • Weather
        • …
      • Spread over multiple companies
      • Many legacy interfaces
  • 25. Integrations
      [Diagram labels: Operations Database, connected to six External Systems]
  • 26. Isolated Islands Of Data
      • A lot of isolated data stores to provide data for necessary solutions
      • Spiderweb of integrations
      • Operational DB needs to push data to a lot of systems
      • Many different formats
  • 27. The Dream
      [Diagram labels: …, Weird binary source, XML Source, Raw Source, Stream Processing, Processed, Rest, Destination, Destination, Destination]
  • 28. Ingest Transformation - Kafka Streams
      StreamsBuilder builder = new StreamsBuilder();
      // Custom Serdes: one reads the proprietary wire format, the other writes Avro
      Serde<ProprietaryObject> weirdFormatSerde = new ProprietaryWeirdFormatSerde();
      Serde<ProprietaryObject> avroSerde = new ProprietaryAvroSerde();
      // Re-serialize every record from the proprietary format to Avro
      builder.stream("proprietary_input_topic", Consumed.with(Serdes.String(), weirdFormatSerde))
          .to("avro_output_topic", Produced.with(Serdes.String(), avroSerde));
  • 29. Ingest Transformation - KSQL
      ksql> CREATE STREAM source (uid INT, name VARCHAR)
              WITH (KAFKA_TOPIC='mysql_users', VALUE_FORMAT='JSON');
      ksql> CREATE STREAM target_avro
              WITH (VALUE_FORMAT='AVRO', KAFKA_TOPIC='mysql_users_avro')
              AS SELECT * FROM source;
      Source: https://gist.github.com/rmoff/165b05e4554c41719b71f1a47ee7b113
  • 30. Stream Processing
      • Stream processing jobs read the converted Avro topics and create enriched topics/alerts/… by
        • Joining streams
        • Aggregating streams
        • Filtering or alerting on streams
        • …
  • 31. DISCLAIMER
  • 32. Gate Changes
      • Gate changes can be based on different information
        • Delays of the incoming flight
        • Changes on other outgoing flights
        • …
      • Join relevant streams and publish change events that are consumed by
        • Apps
        • Gate monitors
        • Departure boards
        • …
  • 33. Passenger Count
      • Join the stream of tickets scanned before the line to the security check with the camera count of passengers leaving the security check to estimate the number of waiting passengers
      • Change routing of passengers (physical: signs change & digital: different routing in app)
      • Also consumed by
        • Monitors to display predicted waiting time
        • App to display predicted wait time
        • Prediction systems to feed models for capacity planning
        • Models to predict if a passenger might miss their flight -> reroute to priority lane
  • 34. Wait Time
      • Calculate how long a passenger took to clear the security checkpoint by joining the time they scanned their boarding pass with the time they are first spotted by an iBeacon beyond security
      • Push offers based on wait time and flight time
        • Long wait, lot of time till take-off -> free coffee or sandwich
        • Long wait, short time till take-off -> duty free voucher
        • …
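The offer rule above could be sketched as a small decision function. This is only an illustration: the slides give no numbers, so the two thresholds (20 minutes for a "long wait", 60 minutes for "a lot of time till take-off") and all names are assumptions.

```java
import java.time.Duration;

// Hypothetical sketch of the offer rule; thresholds and names are assumptions.
final class OfferRuleSketch {

    enum Offer { FREE_COFFEE_OR_SANDWICH, DUTY_FREE_VOUCHER, NONE }

    static final Duration LONG_WAIT = Duration.ofMinutes(20);      // assumed threshold
    static final Duration PLENTY_OF_TIME = Duration.ofMinutes(60); // assumed threshold

    // All arguments are event timestamps in epoch milliseconds.
    static Offer choose(long boardingPassScanMs, long firstBeaconSightingMs, long takeOffMs) {
        // Wait time: boarding pass scan until first iBeacon sighting beyond security.
        Duration wait = Duration.ofMillis(firstBeaconSightingMs - boardingPassScanMs);
        Duration untilTakeOff = Duration.ofMillis(takeOffMs - firstBeaconSightingMs);

        if (wait.compareTo(LONG_WAIT) < 0) {
            return Offer.NONE; // short wait: no offer in this sketch
        }
        return untilTakeOff.compareTo(PLENTY_OF_TIME) >= 0
                ? Offer.FREE_COFFEE_OR_SANDWICH
                : Offer.DUTY_FREE_VOUCHER;
    }
}
```

In a streaming job, this function would be applied to each joined scan/sighting record, for example in a map step before publishing to a notification topic.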
  • 35. Baggage Notification
      • Baggage containers are scanned when they are loaded/unloaded
      • By joining this with data from the baggage sorter, passengers could receive push notifications when their luggage is loaded into or unloaded from the plane
  • 36. Arrival At Gate
      • There are complex models running to estimate when the plane will arrive at the gate after it has landed
        • Based on ground radar data
      • Can be used to
        • Predict whether the following flight might be delayed
        • Coordinate cleaning crews
        • Coordinate refueling
        • Feed into gate change decisions
  • 37. An Example Flow
      {"boardingpass_id":"123", "passenger":"smith", "flight_number":"LH454", "checked_bags":1}
      {"boardingpass_id":"123", "security_area":"t1_2", "status":"success"}
      {"security_area":"t1_2", "count":"1"}
      {"passenger":"smith", "beacon_id":"t1_b123"}
      {"boardingpass_id":"123", "item_group":"cigarettes"}
      {"boardingpass_id":"123", "status":"success"}
      {"flight_no":"LH454", "runway":"1north"}
      {"old_gate":"a12", "new_gate":"e50"}
  • 38. Check-In Event
      Event (check_in): {"boardingpass_id":"123", "passenger":"smith", "flight_number":"LH454", "terminal":"terminal1"}
      check_in_count:
        CREATE TABLE check_in_count AS
          SELECT terminal, count(terminal)
          FROM check_in
          WINDOW TUMBLING (SIZE 24 hour)
          GROUP BY terminal;
      What is it good for?
      • Early warning for security capacity
      • "Don't dawdle" warning based on security queues
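What a 24-hour tumbling window count per terminal computes can be shown in plain Java: each event falls into exactly one fixed, non-overlapping window, found by rounding its timestamp down to a multiple of the window size. This is a sketch of the semantics only, not KSQL or Kafka Streams code; the class name and the string-keyed map are illustrative choices, and timestamps are assumed to be non-negative epoch milliseconds.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of "count per terminal per 24-hour tumbling window".
final class TumblingCountSketch {

    static final long WINDOW_MS = 24L * 60 * 60 * 1000; // SIZE 24 hour

    // Key is "<windowStart>|<terminal>"; value is the number of events seen.
    private final Map<String, Long> counts = new TreeMap<>();

    // Windows are aligned to the epoch, so the start is the timestamp rounded down.
    static long windowStart(long timestampMs) {
        return timestampMs - (timestampMs % WINDOW_MS);
    }

    void add(long timestampMs, String terminal) {
        counts.merge(windowStart(timestampMs) + "|" + terminal, 1L, Long::sum);
    }

    long count(long windowStartMs, String terminal) {
        return counts.getOrDefault(windowStartMs + "|" + terminal, 0L);
    }
}
```

Because windows never overlap, an event one millisecond before midnight UTC and one at midnight land in different windows, which is why the resulting table is keyed by window as well as by terminal.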
  • 39. Passenger Enters Security Area
      Event (security_in): {"boardingpass_id":"123", "security_area":"t1_2", "status":"success"}
      security_in_count:
        CREATE TABLE security_in_count AS
          SELECT security_area, count(security_area)
          FROM security_in
          WINDOW TUMBLING (SIZE 24 hour)
          WHERE status='success'
          GROUP BY security_area;
      time_to_security:
        SELECT s.boardingpass_id, s.rowtime - c.rowtime AS time_to_security
          FROM security_in s
          LEFT JOIN check_in c WITHIN 1 HOUR
          ON s.boardingpass_id=c.boardingpass_id;
      What is it good for?
      • Monitor for failed attempts
      • Passenger routing to security
      • Unload baggage of late passengers
      • …
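The time_to_security query is a windowed left join: every security event is kept, and it is paired with a check-in only if both carry the same boarding pass ID and their timestamps lie within one hour of each other. The following plain-Java sketch models that on in-memory maps. It assumes at most one event per boarding pass on each side, reports the elapsed time as security timestamp minus check-in timestamp, and uses invented names throughout.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of a left join WITHIN 1 HOUR on boardingpass_id.
final class TimeToSecuritySketch {

    static final long WITHIN_MS = 60L * 60 * 1000; // WITHIN 1 HOUR

    // Both inputs map boardingpass_id -> event timestamp in epoch milliseconds.
    // The result maps boardingpass_id -> elapsed milliseconds, or null when no
    // check-in falls inside the join window (left join keeps the security event).
    static Map<String, Long> join(Map<String, Long> securityIn, Map<String, Long> checkIn) {
        Map<String, Long> result = new LinkedHashMap<>();
        for (Map.Entry<String, Long> s : securityIn.entrySet()) {
            Long checkedInAt = checkIn.get(s.getKey());
            Long elapsed = null;
            if (checkedInAt != null && Math.abs(s.getValue() - checkedInAt) <= WITHIN_MS) {
                elapsed = s.getValue() - checkedInAt;
            }
            result.put(s.getKey(), elapsed);
        }
        return result;
    }
}
```

The null results are what make a left join useful here: a passenger who enters security without a recent check-in still shows up and can be inspected separately.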
  • 40. Passenger Leaves Security Area
      Event (security_out): {"security_area":"t1_2", "count":"1"}
      security_out_count:
        CREATE TABLE security_out_count AS
          SELECT security_area, sum(count)
          FROM security_out
          WINDOW TUMBLING (SIZE 24 hour)
          GROUP BY security_area;
      current_count (joins security_in_count and security_out_count):
        SELECT i.security_area AS security_area,
               i.KSQL_COL_1 AS entry,
               o.KSQL_COL_1 AS exit,
               i.KSQL_COL_1 - o.KSQL_COL_1 AS waiting
          FROM security_in_count i
          INNER JOIN security_out_count o
          ON i.security_area=o.security_area;
      What is it good for?
      • Capacity planning
      • Wait time prediction
      • Passenger routing (apps & physical)
      • Alerting on late passengers checking in
      • …
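The waiting estimate is simply the number of passengers who entered a security area minus the number who left it. A plain-Java sketch of that table-table inner join follows, keyed by security area as in the two count tables; the class and method names are made up, and windowing is left out for brevity.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of joining the entry and exit count tables.
final class WaitingCountSketch {

    // Inputs map security_area -> count; the result maps security_area -> waiting.
    static Map<String, Long> waiting(Map<String, Long> entered, Map<String, Long> left) {
        Map<String, Long> result = new TreeMap<>();
        for (Map.Entry<String, Long> in : entered.entrySet()) {
            Long out = left.get(in.getKey());
            if (out != null) { // inner join: areas without an exit count are dropped
                result.put(in.getKey(), in.getValue() - out);
            }
        }
        return result;
    }
}
```

One consequence of the inner join is that an area with entries but no exit count yet produces no row at all, so a consumer cannot distinguish "nobody waiting" from "no exit data".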
  • 41. Passenger Located Via iBeacon
      Event (dutyfree_in): {"passenger":"smith", "beacon_id":"t1_b123"}
      dutyfree_joined (joins dutyfree_in and security_in):
        CREATE STREAM dutyfree_joined AS
          SELECT s.boardingpass_id, d.passenger
          FROM dutyfree_in d
          LEFT JOIN security_in s WITHIN 1 HOURS
          ON s.passenger=d.passenger;
      security_duration:
        SELECT d.boardingpass_id, d.d_passenger, d.rowtime - s.rowtime AS time_in_security
          FROM dutyfree_in_with_bc d
          LEFT JOIN security_in s WITHIN 1 HOUR
          ON d.boardingpass_id=s.boardingpass_id;
      What is it good for?
      • Refining wait time prediction
      • Targeted questionnaire (find reasons for outliers)
      • Vouchers for huge delays
      • …
  • 42. Purchase Event
      Event (dutyfree_purchase): {"boardingpass_id":"123", "item_group":"cigarettes"}
      dutyfree_joined (joins dutyfree_purchase, check_in and flight_information):
        CREATE STREAM dutyfree_joined AS
          SELECT c.boardingpass_id, c.passenger, p.purchase_type, f.gate
          FROM dutyfree_purchase p
          LEFT JOIN check_in c WITHIN 1 HOURS
          ON c.passenger=p.passenger
          LEFT JOIN flight_information f WITHIN 1 HOURS
          ON f.flight_number = c.flight_number;
      What is it good for?
      • Retail models
      • Route to smoking area nearest to gate
      • Advise of walk time if time is tight
      • …
  • 43. Gate Change
      gate_wait_time (joins expected_gate_arrival and expected_gate_departure):
        CREATE STREAM gate_wait_time AS
          SELECT a.flight, d.departure_time - a.arrival_time AS wait_time
          FROM expected_gate_arrival a
          INNER JOIN expected_gate_departure d WITHIN 1 HOURS
          ON a.gate=d.gate;
      gate_change:
        CREATE STREAM gate_change AS
          SELECT flight FROM gate_wait_time
          WHERE wait_time > 600000;
      notifications (joins gate_change and flight_information):
        CREATE STREAM notifications AS
          SELECT f.passenger
          FROM gate_change g
          LEFT JOIN flight_information f WITHIN 1 HOURS
          ON f.gate=g.gate;
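The first two queries can be read as: for each arriving flight, look up when its gate is expected to be free, and flag the flight if it would have to wait more than 600000 ms (10 minutes). The plain-Java sketch below models that reading on in-memory data. It leaves out the one-hour join window, and the `Arrival` class, method names and the interpretation of wait_time are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical model of gate_wait_time followed by the gate_change filter.
final class GateWaitSketch {

    static final long MAX_WAIT_MS = 600_000L; // threshold from the query: 10 minutes

    // One record of the expected_gate_arrival stream (field names assumed).
    static final class Arrival {
        final String flight;
        final String gate;
        final long arrivalMs;

        Arrival(String flight, String gate, long arrivalMs) {
            this.flight = flight;
            this.gate = gate;
            this.arrivalMs = arrivalMs;
        }
    }

    // departureByGate maps gate -> expected departure time of the flight occupying it.
    static List<String> flightsNeedingGateChange(List<Arrival> arrivals,
                                                 Map<String, Long> departureByGate) {
        List<String> flights = new ArrayList<>();
        for (Arrival a : arrivals) {
            Long departureMs = departureByGate.get(a.gate);
            if (departureMs == null) {
                continue; // inner join: no departure known for this gate
            }
            long waitMs = departureMs - a.arrivalMs;
            if (waitMs > MAX_WAIT_MS) {
                flights.add(a.flight);
            }
        }
        return flights;
    }
}
```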
  • 44. Passenger Boards Plane
      Event (gate_in): {"boardingpass_id":"123", "status":"success"}
      bags_joined (joins gate_in, check_in and baggage_loaded):
        CREATE STREAM bag_join AS
          SELECT c.passenger, c.bags
          FROM gate_in g
          LEFT JOIN check_in c WITHIN 1 HOURS
          ON c.boardingpass_id=g.boardingpass_id
          LEFT JOIN baggage_loaded b WITHIN 1 HOURS
          ON b.bag_id = c.bag_id;
      What is it good for?
      • Alert on bags without matching passengers
      • Trigger unloading based on related events
        • Gate closed
        • Time based
        • …
  • 45. Thank You! © 2018 OpenCore GmbH & Co. KG