1. Stream Processing Airport Data
Sönke Liebau – Co-Founder and Partner @ OpenCore
October 17th 2018
Serving the Real-Time Data Needs of an Airport with Kafka Streams and KSQL
2. Who Am I?
• Partner & Co-Founder at OpenCore
• Small consulting company with a Big Data & Open Source focus
• First production Kafka deployment in 2014
Website: www.opencore.com
soenke.liebau@opencore.com
https://www.linkedin.com/in/soenkeliebau/
@soenkeliebau
14. What Is KSQL?
Confluent KSQL is the open source, streaming SQL engine that enables real-time data processing against Apache Kafka®
Source: https://www.confluent.io/product/ksql/
15. © 2018 OpenCore GmbH & Co. KG 17
Kafka Streams In The Ecosystem
[Diagram: sources -> Kafka Connect -> Kafka -> Kafka Connect -> destinations, with Kafka Streams jobs attached to Kafka]
17. Using Kafka Streams
final StreamsBuilder builder = new StreamsBuilder();
final Serde<String> stringSerde = Serdes.String();
final Serde<Long> longSerde = Serdes.Long();
// Read the input topic as a stream of text lines
KStream<String, String> textLines = builder.stream("streams-plaintext-input",
    Consumed.with(stringSerde, stringSerde));
// Split each line into lowercase words, group by word, and count occurrences
KTable<String, Long> wordCounts = textLines
    .flatMapValues(value -> Arrays.asList(value.toLowerCase().split("\\W+")))
    .groupBy((key, value) -> value)
    .count();
// Write the running counts to the output topic
wordCounts.toStream().to("streams-wordcount-output", Produced.with(stringSerde, longSerde));
Source: https://kafka.apache.org/20/documentation/streams/quickstart
19. Using KSQL
[Diagram: KSQL REST interface, accessed by the CLI and by REST clients]
SELECT *
FROM security_in
WHERE status='success'
AND terminal='t1';
22. When To Use Which?
Kafka Streams
• Offers lower level access
• More data formats supported
• Queryable state
• Problems that cannot be expressed in SQL
KSQL
• Easier for people used to SQL
• No need for additional orchestration
• Data exploration
24. A Few Facts Up Front
• A lot of independent data sources
  • Airline ticketing
  • Baggage transport system
  • Passenger counting
  • Retail
  • Radar
  • Weather
  • …
• Spread over multiple companies
• Many legacy interfaces
25. Integrations
[Diagram: an operations database integrated directly with six external systems]
26. Isolated Islands Of Data
• A lot of isolated data stores to provide data for necessary solutions
• Spiderweb of integrations
• Operational DB needs to push data to a lot of systems
• Many different formats
27. The Dream
[Diagram: sources (a weird binary source, an XML source, …) -> raw data -> stream processing -> processed data -> destinations and a REST interface]
28. Ingest Transformation - Kafka Streams
StreamsBuilder builder = new StreamsBuilder();
Serde<ProprietaryObject> weirdFormatSerde = new ProprietaryWeirdFormatSerde();
Serde<ProprietaryObject> avroSerde = new ProprietaryAvroSerde();
// Read with the proprietary serde, write the same objects back out as Avro
builder.stream("proprietary_input_topic",
        Consumed.with(
            Serdes.String(),
            weirdFormatSerde))
    .to("avro_output_topic",
        Produced.with(
            Serdes.String(),
            avroSerde));
29. Ingest Transformation - KSQL
ksql> CREATE STREAM source (uid INT, name VARCHAR) WITH (KAFKA_TOPIC='mysql_users',
VALUE_FORMAT='JSON');
ksql> CREATE STREAM target_avro WITH (VALUE_FORMAT='AVRO', KAFKA_TOPIC='mysql_users_avro')
AS SELECT * FROM source;
Source: https://gist.github.com/rmoff/165b05e4554c41719b71f1a47ee7b113
30. Stream Processing
• Stream processing jobs read converted Avro topics and create enriched topics/alerts/… by
  • Joining streams
  • Aggregating streams
  • Filtering or alerting on streams
  • …
32. Gate Changes
• Gate changes can be based on different information
  • Delays of the incoming flight
  • Changes on other outgoing flights
  • …
• Join relevant streams and publish change events that are consumed by
  • Apps
  • Gate monitors
  • Departure boards
  • …
33. Passenger Count
• Join the stream of tickets scanned before the security line with the camera count of passengers leaving security to estimate the number of waiting passengers
• Change routing of passengers (physical: signs change; digital: different routing in the app)
• Also consumed by
  • Monitors to display predicted waiting time
  • App to display predicted waiting time
  • Prediction systems to feed models for capacity planning
  • Models to predict whether a passenger might miss their flight -> reroute to priority lane
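The estimate on this slide boils down to "passengers who entered minus passengers who left" per security area. A minimal plain-Java sketch of that arithmetic follows; it is an in-memory stand-in, not Kafka Streams or KSQL code, and the class and method names (`WaitingPassengerEstimator`, `onTicketScanned`, `onPassengersLeft`) are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory illustration of the entry/exit join; not the talk's implementation.
class WaitingPassengerEstimator {
    // Running totals per security area (e.g. "t1_2").
    private final Map<String, Long> entered = new HashMap<>();
    private final Map<String, Long> left = new HashMap<>();

    // Called for each boarding pass scanned before the security line.
    void onTicketScanned(String securityArea) {
        entered.merge(securityArea, 1L, Long::sum);
    }

    // Called for each camera event counting passengers leaving security.
    void onPassengersLeft(String securityArea, long count) {
        left.merge(securityArea, count, Long::sum);
    }

    // Estimated number of passengers still waiting; clamped at zero
    // because camera counts may overshoot the scanned tickets.
    long waiting(String securityArea) {
        long in = entered.getOrDefault(securityArea, 0L);
        long out = left.getOrDefault(securityArea, 0L);
        return Math.max(0L, in - out);
    }

    public static void main(String[] args) {
        WaitingPassengerEstimator estimator = new WaitingPassengerEstimator();
        estimator.onTicketScanned("t1_2");
        estimator.onTicketScanned("t1_2");
        estimator.onTicketScanned("t1_2");
        estimator.onPassengersLeft("t1_2", 1);
        System.out.println(estimator.waiting("t1_2")); // prints 2
    }
}
```

Clamping at zero is a design choice of this sketch; a real pipeline might instead surface negative values as a data-quality signal.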
34. Wait Time
• Calculate how long a passenger took to clear the security checkpoint by joining the time they scanned their boarding pass with the time they are first spotted by an iBeacon beyond security
• Push offers based on wait time and time until take-off
  • Long wait, lots of time until take-off -> free coffee or sandwich
  • Long wait, little time until take-off -> duty free voucher
  • …
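The offer rules above can be sketched as a small decision function. This is only an illustration: the thresholds (`LONG_WAIT` of 20 minutes, `PLENTY_OF_TIME` of 60 minutes) and the `OfferSelector` name are assumptions, since the slides give no concrete values.

```java
import java.time.Duration;
import java.util.Optional;

// Hypothetical decision rule; thresholds are assumed, not taken from the talk.
class OfferSelector {
    static final Duration LONG_WAIT = Duration.ofMinutes(20);      // assumed threshold
    static final Duration PLENTY_OF_TIME = Duration.ofMinutes(60); // assumed threshold

    // Returns an offer for passengers with a long wait, or empty otherwise.
    static Optional<String> selectOffer(Duration wait, Duration untilTakeOff) {
        if (wait.compareTo(LONG_WAIT) < 0) {
            return Optional.empty(); // short wait: no offer
        }
        if (untilTakeOff.compareTo(PLENTY_OF_TIME) >= 0) {
            return Optional.of("free coffee or sandwich"); // time to sit down
        }
        return Optional.of("duty free voucher"); // little time left
    }

    public static void main(String[] args) {
        System.out.println(selectOffer(Duration.ofMinutes(30), Duration.ofMinutes(120)));
        System.out.println(selectOffer(Duration.ofMinutes(30), Duration.ofMinutes(20)));
    }
}
```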
35. Baggage Notification
• Baggage containers are scanned when they are loaded/unloaded
• By joining this with data from the baggage sorter, passengers could receive push notifications when their luggage is loaded onto or unloaded from the plane
36. Arrival At Gate
• Complex models estimate when the plane will arrive at the gate after it has landed
  • Based on ground radar data
• Can be used to
  • Predict whether the following flight might be delayed
  • Coordinate cleaning crews
  • Coordinate refueling
  • Feed into gate change decisions
37. An Example Flow
{"boardingpass_id":"123",
"passenger":"smith",
"flight_number":"LH454",
"checked_bags":1}
{"boardingpass_id":"123",
"security_area":"t1_2",
"status":"success"}
{"security_area":"t1_2",
"count":"1"}
{"passenger":"smith",
"beacon_id":"t1_b123"}
{"boardingpass_id":"123",
"item_group":"cigarettes"}
{"boardingpass_id":"123",
"status":"success"}
{"flight_no":"LH454",
"runway":"1north"}
{"old_gate":"a12",
"new_gate":"e50"}
38. Check-In Event
{"boardingpass_id":"123",
"passenger":"smith",
"flight_number":"LH454",
"terminal":"terminal1"}
check_in_count
CREATE TABLE check_in_count
AS SELECT terminal, count(terminal)
FROM check_in
WINDOW TUMBLING (SIZE 24 HOURS)
GROUP BY terminal;
check_in
What is it good for?
• Early warning for security capacity
• "Don't dawdle" warning based on security queues
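A tumbling window assigns every event to exactly one fixed-size, non-overlapping bucket aligned to the epoch, so the window start is the timestamp rounded down to a multiple of the window size. The plain-Java sketch below mimics the per-terminal count over such windows; it is an in-memory illustration, and the `TumblingWindowCounter` class is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory illustration of a tumbling-window count per key.
class TumblingWindowCounter {
    private final long windowSizeMs;
    // Key is "<terminal>@<windowStart>", value is the event count in that window.
    private final Map<String, Long> counts = new HashMap<>();

    TumblingWindowCounter(long windowSizeMs) {
        this.windowSizeMs = windowSizeMs;
    }

    // Start of the epoch-aligned window that contains the timestamp.
    static long windowStart(long timestampMs, long windowSizeMs) {
        return timestampMs - Math.floorMod(timestampMs, windowSizeMs);
    }

    private String key(String terminal, long timestampMs) {
        return terminal + "@" + windowStart(timestampMs, windowSizeMs);
    }

    // Record one check-in event for the terminal at the given event time.
    void add(String terminal, long timestampMs) {
        counts.merge(key(terminal, timestampMs), 1L, Long::sum);
    }

    // Count for the terminal in the window containing the timestamp.
    long count(String terminal, long timestampMs) {
        return counts.getOrDefault(key(terminal, timestampMs), 0L);
    }

    public static void main(String[] args) {
        long day = 24L * 60 * 60 * 1000;
        TumblingWindowCounter counter = new TumblingWindowCounter(day);
        counter.add("terminal1", 1_000L);
        counter.add("terminal1", 2_000L);
        counter.add("terminal1", day + 1); // falls into the next window
        System.out.println(counter.count("terminal1", 5_000L)); // prints 2
    }
}
```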
39. Passenger Enters Security Area
{"boardingpass_id":"123",
"security_area":"t1_2",
"status":"success"}
security_in_count
CREATE TABLE security_in_count
AS SELECT
security_area,
count(security_area)
FROM security_in
WINDOW TUMBLING (SIZE 24 HOURS)
WHERE status='success'
GROUP BY security_area;
security_in
What is it good for?
• Monitor for failed attempts
• Passenger routing to security
• Unload baggage of late passengers
• …
time_to_security
SELECT
s.boardingpass_id,
s.rowtime - c.rowtime AS time_to_security
FROM security_in s
LEFT JOIN check_in c WITHIN 1 HOUR
ON s.boardingpass_id=c.boardingpass_id;
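The time_to_security query pairs each security event with the check-in for the same boarding pass, provided the two timestamps lie within one hour of each other, and reports the difference. A minimal plain-Java sketch of that matching follows. It is a simplification under stated assumptions: the `TimeToSecurity` class is hypothetical, only the latest check-in per boarding pass is kept, state never expires, and unmatched events yield an empty result rather than the null-padded row a LEFT JOIN would emit.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalLong;

// Hypothetical in-memory illustration of a time-bounded join on boardingpass_id.
class TimeToSecurity {
    static final long WINDOW_MS = 60L * 60 * 1000; // corresponds to WITHIN 1 HOUR

    // Latest check-in timestamp per boarding pass.
    private final Map<String, Long> checkInTimes = new HashMap<>();

    void onCheckIn(String boardingPassId, long timestampMs) {
        checkInTimes.put(boardingPassId, timestampMs);
    }

    // Milliseconds between check-in and entering security, if a check-in
    // for the same boarding pass lies within the join window.
    OptionalLong onSecurityIn(String boardingPassId, long timestampMs) {
        Long checkIn = checkInTimes.get(boardingPassId);
        if (checkIn == null || Math.abs(timestampMs - checkIn) > WINDOW_MS) {
            return OptionalLong.empty();
        }
        return OptionalLong.of(timestampMs - checkIn);
    }

    public static void main(String[] args) {
        TimeToSecurity join = new TimeToSecurity();
        join.onCheckIn("123", 1_000L);
        System.out.println(join.onSecurityIn("123", 601_000L).getAsLong()); // prints 600000
    }
}
```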
40. Passenger Leaves Security Area
{"security_area":"t1_2",
"count":"1"}
security_out security_out_count
CREATE TABLE security_out_count
AS SELECT security_area, sum(count)
FROM security_out
WINDOW TUMBLING (SIZE 24 HOURS)
GROUP BY security_area;
security_in_count
current_count
What is it good for?
• Capacity planning
• Wait time prediction
• Passenger routing (apps & physical)
• Alerting on late passengers checking in
• …
SELECT
i.security_area AS security_area,
i.KSQL_COL_1 AS entry,
o.KSQL_COL_1 AS exit,
i.KSQL_COL_1 - o.KSQL_COL_1 AS waiting
FROM security_in_count i
INNER JOIN security_out_count o
ON i.security_area=o.security_area;
41. Passenger Located Via iBeacon
{"passenger":"smith",
"beacon_id":"t1_b123"}
security_duration
security_in
dutyfree_joined
CREATE STREAM dutyfree_joined
AS SELECT s.boardingpass_id, d.passenger
FROM dutyfree_in d
LEFT JOIN security_in s WITHIN 1 HOURS
ON s.passenger=d.passenger;
dutyfree_in
SELECT
d.boardingpass_id,
d.d_passenger,
d.rowtime - s.rowtime AS time_in_security
FROM dutyfree_in_with_bc d
LEFT JOIN security_in s WITHIN 1 HOUR
ON d.boardingpass_id=s.boardingpass_id;
What is it good for?
• Refining wait time prediction
• Targeted questionnaire (find reasons for outliers)
• Vouchers for huge delays
• …
42. Purchase Event
{"boardingpass_id":"123",
"item_group":"cigarettes"}
flight_information
check_in
dutyfree_joined
What is it good for?
• Retail models
• Route to smoking area nearest to gate
• Advise of walk time if time is tight
• …
dutyfree_purchase
CREATE STREAM dutyfree_joined
AS SELECT
c.boardingpass_id,
c.passenger,
p.item_group,
f.gate
FROM dutyfree_purchase p
LEFT JOIN check_in c WITHIN 1 HOURS
ON c.boardingpass_id=p.boardingpass_id
LEFT JOIN flight_information f
WITHIN 1 HOURS
ON f.flight_number = c.flight_number;
43. Gate Change
expected_gate_arrival
notifications
expected_gate_departure
CREATE STREAM gate_wait_time
AS SELECT
a.flight,
d.departure_time - a.arrival_time as wait_time
FROM expected_gate_arrival a
INNER JOIN expected_gate_departure d WITHIN 1 HOURS
ON a.gate=d.gate;
gate_wait_time
gate_change
CREATE STREAM gate_change
AS SELECT
flight
FROM gate_wait_time
WHERE wait_time > 600000;
CREATE STREAM notifications
AS SELECT f.passenger
FROM gate_change g
LEFT JOIN flight_information f
WITHIN 1 HOURS ON f.gate=g.gate;
flight_information
44. Passenger Boards Plane
{"boardingpass_id":"123",
"status":"success"}
gate_in
What is it good for?
• Alert on bags without matching passengers
• Trigger unloading based on related events
• Gate closed
• Time based
• …
bags_joined
check_in baggage_loaded
CREATE STREAM bag_join
AS SELECT
c.passenger,
c.bags
FROM gate_in g
LEFT JOIN check_in c
WITHIN 1 HOURS
ON c.boardingpass_id=g.boardingpass_id
LEFT JOIN baggage_loaded b
WITHIN 1 HOURS
ON b.bag_id = c.bag_id;