SlideShare ist ein Scribd-Unternehmen logo
1 von 97
Downloaden Sie, um offline zu lesen
Using Location Data to
Showcase Keys, Windows, and Joins
in Kafka Streams DSL and KSQL
Where are my Keys?
Neil Buesing
Kafka Summit
London, 2019
!1
Introduction
• Object Partners, Inc
• Located : Minneapolis, Minnesota & Omaha, Nebraska, USA
• http://www.objectpartners.com
• Software Development Consulting
• JVM Technologies, Mobile/Web, DevOps, Real-Time Data
• Neil Buesing
• Director, Real-Time Data
• 19 Years with Object Partners, Inc.
• Find me at:
• https://www.linkedin.com/in/neilbuesing
• https://twitter.com/nbuesing
• https://github.com/nbuesing
!2
https://github.com/nbuesing/kafka-summit-london-2019
Source Code
• Fully Contained GitHub Repository
• Java
• Spring Boot
• Gradle Build Files
• Docker Container (Kafka Cluster)
!3
The “Pre” Projects
• Common
• Avro Data Model 

• KdTree

• Bucket / Bucket Factory

• Geolocation
• RESTful Location to Airport Lookup Service

• Connector
• OpenSky Apache Kafka Source Connector

• Web Application
• D3 / Spring Boot / Spring / Spring MVC
• Docker
• 3 broker, 1 zookeeper, 1 SR docker compose file
!4
Avro Data Model
Record (OpenSky)
Aircraft
Location
Nearest Airport
Count
Distance
!5
Airport Lookup Service
• Get Location of An Airport
• /airport/{code}
• Closest Airport
• /airport?latitude={latitude}&longitude={longitude}
!6
OpenSky Source Connector
• Pulls current data from OpenSky API
• Offset — a timestamp
• Polling - 30 seconds
The OpenSky Network, http://www.opensky-network.org
!7
OpenSky Source Connector
• Pulls current data from OpenSky API
• Offset — a timestamp
• Polling - 30 seconds
{
"time": 1535739820,
"states": [
[
"a12345",
“N0000 ",
"United States",
1535739649,
1535739649,
-122.5351,
38.1321,
167.64,
false,
31.29,
226.33,
-2.93,
null,
160.02,
null,
false,
0
]
]
}
The OpenSky Network, http://www.opensky-network.org
!7
OpenSky Source Connector
• Pulls current data from OpenSky API
• Offset — a timestamp
• Polling - 30 seconds
{
"time": 1535739820,
"states": [
[
"a12345",
“N0000 ",
"United States",
1535739649,
1535739649,
-122.5351,
38.1321,
167.64,
false,
31.29,
226.33,
-2.93,
null,
160.02,
null,
false,
0
]
]
}
transponder
callsign
geolocation
position update
The OpenSky Network, http://www.opensky-network.org
!7
OpenSky Source Connector
• Pulls current data from OpenSky API
• Offset — a timestamp
• Polling - 30 seconds
{
"time": 1535739820,
"states": [
[
"a12345",
“N0000 ",
"United States",
1535739649,
1535739649,
-122.5351,
38.1321,
167.64,
false,
31.29,
226.33,
-2.93,
null,
160.02,
null,
false,
0
]
]
}
transponder
callsign
geolocation
position update
api.getStates(0, null, new OpenSkyApi.BoundingBox(24.39, 49.38, -124.84, -66.88));
The OpenSky Network, http://www.opensky-network.org
!7
OpenSky Source Connector
• Pulls current data from OpenSky API
• Offset — a timestamp
• Polling - 30 seconds
{
"time": 1535739820,
"states": [
[
"a12345",
“N0000 ",
"United States",
1535739649,
1535739649,
-122.5351,
38.1321,
167.64,
false,
31.29,
226.33,
-2.93,
null,
160.02,
null,
false,
0
]
]
}
transponder
callsign
geolocation
position update
api.getStates(0, null, new OpenSkyApi.BoundingBox(24.39, 49.38, -124.84, -66.88));
api.getStates(0, null, new OpenSkyApi.BoundingBox(-80.0, 80.0, -180.0, 180.0));
The OpenSky Network, http://www.opensky-network.org
!7
OpenSky Source Connector
The OpenSky Network, http://www.opensky-network.org
{
"time": 1535739820,
"states": [
[
"a12345",
“N0000 ",
"United States",
1535739649,
1535739649,
-122.5351,
38.1321,
167.64,
false,
31.29,
226.33,
-2.93,
null,
160.02,
null,
false,
0
]
]
}
• Kafka Connect
• Not a lot of code needed to write
• Considering making this an Open

Source Connector
!8
Nearest Airport
• How many aircrafts are closer to a given airport than
any other airport?
• What is my stateful window/duration I want to
measure?
• How do I access the data?
• How do I count my data?
• How do I find my data?
• How do I provide a visual of this data to the user?
!9
Kafka Streams
Nearest Airport
• How many aircrafts are closer to a given airport than
any other airport?
• What is my stateful window/duration I want to
measure?
• How do I access the data?
• How do I count my data?
• How do I find my data?
• How do I provide a visual of this data to the user?
!9
D3
Kafka Streams
Nearest Airport
• How many aircrafts are closer to a given airport than
any other airport?
• What is my stateful window/duration I want to
measure?
• How do I access the data?
• How do I count my data?
• How do I find my data?
• How do I provide a visual of this data to the user?
!9
Nearest Airport
• Airport Lookup is based distance
• First thought - Airport as a KTable.
• How do I find my keys?
• Window - 5 minute tumbling window
• What to do with late arriving data?
• Source - Airports
• RESTful endpoint
• Source - OpenSky
• Kafka Source Connector
• Frequency of updates?
• Source Offsets?
!10
• Create A Windowed KTable of all flights

• 5 minute window, 1 minute grace period

• On update, keep the most recent reading

• Materialize the data so it is programmatically accessible

• Repeat for other items—Materialize for all state-stores

• What about multiple instances of my Kafka Streams Applications?
Nearest Airport - Lookup
!11
Nearest Airport - Lookup
public KTable<Windowed<String>, Record> flightsStore() {
return flights()
.groupByKey()
.windowedBy(TimeWindows.of("5m").grace("1m"))
.reduce((current, v) -> { return v; }, Materialized.as(“flights”));
}
!12
Keep latest
Make programmatically
accessible
Nearest Airport - Lookup
private ReadOnlyWindowStore<String, Record> flights() {
return kafkaStreams().store(
"flights", QueryableStoreTypes.windowStore()
);
}
• Provide access to the state store as a queryable read-only window
store.
!13
Access by Name
Nearest Airport - Lookup
public List<AircraftJson> flights(Long start, Long end) {
KeyValueIterator<Windowed<String>, Record> iterator =
flights().fetchAll(Instant.ofEpochMilli(start), Instant.ofEpochMilli(end));
…
}
• Provide access to the state store as a queryable read-only window
store.
!14
fetchAll for the given time
window
state-store for kTables only store
data in the partitions being
processed by a given streams
Nearest Airport - Lookup
Aircrafts
Pulled from flightStore()
Materialized Stores
streams.allMetadataForStore("nearest_airport").forEach(app -> {
list.add(app.host() + ":" + app.hostInfo().port());
});
!15
Nearest Airport - Count
!16
• Algorithm #1 - Count the Aircrafts
Nearest Airport - Count
.selectKey((key, value) -> value.getAirport())
.groupByKey()
.windowedBy(TimeWindows.of(“5m").grace(Duration.of("1m"))
.aggregate(
() -> 0,
(key, value, aggregate) -> {
return aggregate + 1;
},
Materialized.as(NEAREST_AIRPORT_STORE)
)
.toStream((wk, v) -> wk.key())
!17
Nearest Airport - Count
.selectKey((key, value) -> value.getAirport())
.groupByKey()
.windowedBy(TimeWindows.of(“5m").grace(Duration.of("1m"))
.aggregate(
() -> 0,
(key, value, aggregate) -> {
return aggregate + 1;
},
Materialized.as(NEAREST_AIRPORT_STORE)
)
.toStream((wk, v) -> wk.key())
What’s wrong with
this approach?
!17
Nearest Airport - Count - KSQL
@UdfDescription(name = "closestAirport", description = "return airport")
public class ClosestAirport {
private Geolocation geolocation = Feign.builder()
.options(new Request.Options(200, 200))
.encoder(new JacksonEncoder())
.decoder(new JacksonDecoder())
.target(Geolocation.class, "http://geolocation:9080");
@Udf(description = "find closest airport to given location.")
public String closestAirport(final Double latitude, final Double longitude) {
return geolocation.closestAirport(latitude, longitude).getCode();
}
}
!18
• Step 1 : create specialized User Defined Function Written in Java
Nearest Airport - Count - KSQL
!19
• Step 2 : deploy specialized function compiled as uber jar
Nearest Airport - Count - KSQL
create stream 
ksql_nearest_airport 
as select 
aircraft->transponder transponder, 
closestAirport(location->latitude, location->longitude) as airport, 
location 
from flights 
partition by transponder;
!20
• Step 3 : use specialized function to enrich the streaming data
Nearest Airport - Count - KSQL
create table 
ksql_nearest_airport_count 
as select 
airport, 
count(*) as count 
from ksql_nearest_airport window tumbling (size 5 minutes) 
group by airport;
!21
• Step 4 : aggregate the data using standard KSQL syntax
Nearest Airport - Aggregate
!22
• Algorithm #2 - Collect the Aircrafts
Nearest Airport - Aggregate
flights()
.map((key, value) -> {
Airport airport = geolocation.closestAirport(value.getLocation());
return KeyValue.pair(key, createNearestAirport(airport, value));
})
.groupBy((k, v) -> v.getAirport())
.windowedBy(TimeWindows.of("5m"))
.aggregate(() -> null,
(key, value, aggregate) -> {
if (aggregate == null) {
aggregate = createAgg(value.getAirport(), value.getAirportLocation());
}
if (!aggregate.getAircrafts().contains(value.getCallsign())) {
aggregate.getAircrafts().add(value.getCallsign());
}
return aggregate;
}, Materialized.as(NEAREST_AIRPORT_AGG_STORE));
!23
Nearest Airport - Aggregate
flights()
.map((key, value) -> {
Airport airport = geolocation.closestAirport(value.getLocation());
return KeyValue.pair(key, createNearestAirport(airport, value));
})
.groupBy((k, v) -> v.getAirport())
.windowedBy(TimeWindows.of("5m"))
.aggregate(() -> null,
(key, value, aggregate) -> {
if (aggregate == null) {
aggregate = createAgg(value.getAirport(), value.getAirportLocation());
}
if (!aggregate.getAircrafts().contains(value.getCallsign())) {
aggregate.getAircrafts().add(value.getCallsign());
}
return aggregate;
}, Materialized.as(NEAREST_AIRPORT_AGG_STORE));
Double Counting?
!23
create table 
ksql_nearest_airport_count_agg_count 
as select 
airport, 
countList(count) 
from ksql_nearest_airport window tumbling (size 5 minutes) 
group by airport;
Nearest Airport - Aggregate - KSQL
create table 
ksql_nearest_airport_count_agg 
as select 
airport, 
collect_set(transponder) as count 
from ksql_nearest_airport window tumbling (size 5 minutes) 
group by airport;
!24
Nearest Airport - Suppression
!25
• Algorithm #3 - Suppress Streaming of the Topology

• Count With Suppression 

(KIP-328: Ability to suppress updates for KTables)

• Kafka Streams 2.1
Nearest Airport - Suppression
public KStream<String, Record> flightsSuppressed() {
return flightsStore()
.suppress(Suppressed.untilWindowCloses(
Suppressed.BufferConfig.unbounded())
).toStream()
.selectKey((k, v) -> k.key());
}
KStream<String, NearestAirport> stream =
flightsSuppressed()
.map((key, value) -> {
Airport airport = geolocation.closestAirport(value.getLocation());
return KeyValue.pair(key, createNearestAirport(airport, value));
});
!26
Nearest Airport - Suppression
!27
Nearest Airport - Suppression
!27
Algorithm Concerns?
Nearest Airport - Suppression
!27
Algorithm Concerns?
Information Delay
Nearest Airport - Suppression
!27
Algorithm Concerns?
Information Delay
Memory
Nearest Airport - Suppression
!27
Algorithm Concerns?
Information Delay
Memory
Not Available in KSQL
Nearest Airport - Retrospective
• Kafka Streams
• Windows are not magic
• treating it like magic means you will get it wrong
• Window state-stores are powerful
• late arriving messages
• retention (default 24 hours)
• materialization
• change-log topics
• suppression
• keeps evolving (read the KIPs)
!28
Nearest Neighbor
!29
Nearest Neighbor
!30
• Find the nearest Blue Team aircraft for every given
Red Team aircraft.

• Make sure the algorithm can be properly sharded so
the work can be distributed.

• Selected a five minute time window.
Nearest Neighbor
For this red aircraft,
find the closest aircraft.
!31
Nearest Neighbor
Create a 3°x3° region that I call
the “bucket”
this becomes the topic key
!32
Nearest Neighbor
Aircraft 1
!33
Nearest Neighbor
!34
Bucket overlap
distance calculation performed
Nearest Neighbor
!35
Distance Object
Distance, Red Aircraft, & Blue
Aircraft
Keep all the information needed
for next operation
Nearest Neighbor
!35
Distance Object
Distance, Red Aircraft, & Blue
Aircraft
Keep all the information needed
for next operation
What did I do
wrong?
Nearest Neighbor
Aircraft 2
!36
Nearest Neighbor
Place blue aircraft
into 9 “buckets”
(replicate the data)
!37
DSL : flatMap()
KSQL : “insert into”
Nearest Neighbor
Bucket overlap
!38
Nearest Neighbor
Calculate distance
!39
Nearest Neighbor
Aircraft 3
!40
Nearest Neighbor
Place (replicate)
aircraft into 9 “buckets”
!41
Nearest Neighbor
No Bucket overlap
no distance calculated
Aircraft 3 not sharded
with red aircraft
!42
Nearest Neighbor
Nearest Neighbor
Aircraft 4
!43
Nearest Neighbor
Place (replicate)
aircraft into 9 “buckets”
!44
Nearest Neighbor
Bucket overlap
distance calculation performed
!45
51° N, 6° W51° N, 3° E 51° N, 3° W51° N, 0° W 51° N, 9° W
A Ba caa bbb ccAA BBa cb
!46
51° N, 6° W51° N, 3° E 51° N, 3° W51° N, 0° W 51° N, 9° W
A Ba ca a bbb ccAA BBa cb
!46
51° N, 6° W51° N, 3° E 51° N, 3° W51° N, 0° W 51° N, 9° W
A
B
a ca
a
b
b
b c
c
A A
B
B
a
cb
!46
51° N, 6° W51° N, 3° E 51° N, 3° W51° N, 0° W 51° N, 9° W
A
B
a ca
a
b
b
b c
c
A
A
B
B
a
cb
!46
Nearest Neighbor
!47
Nearest Neighbor
!47
Stream Concepts
Nearest Neighbor
!47
Stream Concepts
Key on bucket
Nearest Neighbor
!47
Stream Concepts
Key on bucket
Re-Key on red aircraft
Nearest Neighbor
!47
Stream Concepts
Key on bucket
Re-Key on red aircraft
Aggregate (reduce) on red aircraft
(keeping smallest distance)
Nearest Neighbor
!48
Nearest Neighbor
!48
Algorithm Limitations?
Nearest Neighbor
!48
Algorithm Limitations?
Bucket size selection
Nearest Neighbor
!48
Algorithm Limitations?
Bucket size selection
Sparse Data Location
missing result
Nearest Neighbor
!48
Algorithm Limitations?
Bucket size selection
Sparse Data Location
missing result
Sparse Data Location
wrong result
Nearest Neighbor
!49
Nearest Neighbor
!49
Performance Limitations?
Nearest Neighbor
!49
Performance Limitations?
Partitioning and Key Hash
Nearest Neighbor
!49
Performance Limitations?
Partitioning and Key Hash
Uniformity of the Data
Nearest Neighbor
!49
Performance Limitations?
Partitioning and Key Hash
Uniformity of the Data
Replication of Data
Nearest NeighborNearest Neighbor
!50
Nearest Neighbor
!51
Nearest Neighbor
!52
Nearest Neighbor
!53
red()
.map((key, value) -> KeyValue.pair(bucketFactory.create(value.getLocation()),
value))
.join(blue()
.flatMap((key, value) ->
bucketFactory.createSurronding(value.getLocation())
.stream()
.map((b) -> KeyValue.pair(b.toString(), value))
.collect(Collectors.toList())).selectKey((key, value) -> key),
(value1, value2) -> {
double d = DistanceUtil.distance(value1.getLocation(), value2.getLocation());
return new Distance(value1, value2, d);
}, JoinWindows.of(WINDOW),
Joined.with(Serdes.String(), recordSerde, recordSerde))
.to("distance", Produced.with(Serdes.String(), distaneSerde));
Nearest Neighbor
!54
red()
.map((key, value) -> KeyValue.pair(bucketFactory.create(value.getLocation()),
value))
.join(blue()
.flatMap((key, value) ->
bucketFactory.createSurronding(value.getLocation())
.stream()
.map((b) -> KeyValue.pair(b.toString(), value))
.collect(Collectors.toList())).selectKey((key, value) -> key),
(value1, value2) -> {
double d = DistanceUtil.distance(value1.getLocation(), value2.getLocation());
return new Distance(value1, value2, d);
}, JoinWindows.of(WINDOW),
Joined.with(Serdes.String(), recordSerde, recordSerde))
.to("distance", Produced.with(Serdes.String(), distaneSerde));
Nearest Neighbor
!54
red()
.map((key, value) -> KeyValue.pair(bucketFactory.create(value.getLocation()),
value))
.join(blue()
.flatMap((key, value) ->
bucketFactory.createSurronding(value.getLocation())
.stream()
.map((b) -> KeyValue.pair(b.toString(), value))
.collect(Collectors.toList())).selectKey((key, value) -> key),
(value1, value2) -> {
double d = DistanceUtil.distance(value1.getLocation(), value2.getLocation());
return new Distance(value1, value2, d);
}, JoinWindows.of(WINDOW),
Joined.with(Serdes.String(), recordSerde, recordSerde))
.to("distance", Produced.with(Serdes.String(), distaneSerde));
Nearest Neighbor
!54
red()
.map((key, value) -> KeyValue.pair(bucketFactory.create(value.getLocation()),
value))
.join(blue()
.flatMap((key, value) ->
bucketFactory.createSurronding(value.getLocation())
.stream()
.map((b) -> KeyValue.pair(b.toString(), value))
.collect(Collectors.toList())).selectKey((key, value) -> key),
(value1, value2) -> {
double d = DistanceUtil.distance(value1.getLocation(), value2.getLocation());
return new Distance(value1, value2, d);
}, JoinWindows.of(WINDOW),
Joined.with(Serdes.String(), recordSerde, recordSerde))
.to("distance", Produced.with(Serdes.String(), distaneSerde));
Nearest Neighbor
!54
red()
.map((key, value) -> KeyValue.pair(bucketFactory.create(value.getLocation()),
value))
.join(blue()
.flatMap((key, value) ->
bucketFactory.createSurronding(value.getLocation())
.stream()
.map((b) -> KeyValue.pair(b.toString(), value))
.collect(Collectors.toList())).selectKey((key, value) -> key),
(value1, value2) -> {
double d = DistanceUtil.distance(value1.getLocation(), value2.getLocation());
return new Distance(value1, value2, d);
}, JoinWindows.of(WINDOW),
Joined.with(Serdes.String(), recordSerde, recordSerde))
.to("distance", Produced.with(Serdes.String(), distaneSerde));
Nearest Neighbor
!54
KTable<Windowed<String>, Distance> result = distance()
.selectKey((k, v) -> v.getRed().getAircraft().getTransponder())
.groupByKey()
.windowedBy(TimeWindows.of(WINDOW))
.aggregate(
() -> new Distance(null, null, Double.MAX_VALUE),
(k, v, agg) -> (v.getDistance() < agg.getDistance()) ? v : agg);
result.toStream()
.map((k, v) -> KeyValue.pair(k.key().toString(), v))
.to("closest");
Nearest Neighbor
!55
KTable<Windowed<String>, Distance> result = distance()
.selectKey((k, v) -> v.getRed().getAircraft().getTransponder())
.groupByKey()
.windowedBy(TimeWindows.of(WINDOW))
.aggregate(
() -> new Distance(null, null, Double.MAX_VALUE),
(k, v, agg) -> (v.getDistance() < agg.getDistance()) ? v : agg);
result.toStream()
.map((k, v) -> KeyValue.pair(k.key().toString(), v))
.to("closest");
Nearest Neighbor
!55
KTable<Windowed<String>, Distance> result = distance()
.selectKey((k, v) -> v.getRed().getAircraft().getTransponder())
.groupByKey()
.windowedBy(TimeWindows.of(WINDOW))
.aggregate(
() -> new Distance(null, null, Double.MAX_VALUE),
(k, v, agg) -> (v.getDistance() < agg.getDistance()) ? v : agg);
result.toStream()
.map((k, v) -> KeyValue.pair(k.key().toString(), v))
.to("closest");
Nearest Neighbor
!55
KTable<Windowed<String>, Distance> result = distance()
.selectKey((k, v) -> v.getRed().getAircraft().getTransponder())
.groupByKey()
.windowedBy(TimeWindows.of(WINDOW))
.aggregate(
() -> new Distance(null, null, Double.MAX_VALUE),
(k, v, agg) -> (v.getDistance() < agg.getDistance()) ? v : agg);
result.toStream()
.map((k, v) -> KeyValue.pair(k.key().toString(), v))
.to("closest");
Nearest Neighbor
!55
KTable<Windowed<String>, Distance> result = distance()
.selectKey((k, v) -> v.getRed().getAircraft().getTransponder())
.groupByKey()
.windowedBy(TimeWindows.of(WINDOW))
.aggregate(
() -> new Distance(null, null, Double.MAX_VALUE),
(k, v, agg) -> (v.getDistance() < agg.getDistance()) ? v : agg);
result.toStream()
.map((k, v) -> KeyValue.pair(k.key().toString(), v))
.to("closest");
Nearest Neighbor
!55
KTable<Windowed<String>, Distance> result = distance()
.selectKey((k, v) -> v.getRed().getAircraft().getTransponder())
.groupByKey()
.windowedBy(TimeWindows.of(WINDOW))
.aggregate(
() -> new Distance(null, null, Double.MAX_VALUE),
(k, v, agg) -> (v.getDistance() < agg.getDistance()) ? v : agg);
result.toStream()
.map((k, v) -> KeyValue.pair(k.key().toString(), v))
.to("closest");
Nearest Neighbor
!55
Same as Join Window?
Nearest Neighbor - KSQL
!56
create stream blueBucket 
as select 
bucketLocation(location->latitude, location->longitude, 3.0) as bucket, 
aircraft->transponder as transponder, aircraft->callsign as callsign, 
location->latitude as latitude, location->longitude as longitude 
from blue partition by bucket;
create stream blueBucket_w 
as select 
bucketLocation(location->latitude, location->longitude, 3.0, 'w') as bucket, 
aircraft->transponder as transponder, aircraft->callsign as callsign, 
location->latitude as latitude, location->longitude as longitude 
from blue partition by bucket;
insert into blueBucket select * from blueBucket_w;
Nearest Neighbor - Retrospective
• Kafka Streams
• understand your keys
• flatMap / insert into
• maintain the state you need within the domain
• understand intermediate topics
!57
Retrospective
• How do these examples help me / apply to me?

• Do I really need to write a distributed application?
• Should I be programmatically accessing the Kafka Stream State Stores? 

• Kafka Streams or KSQL
• how do I choose?
• do I have to choose?

• Some settings to investigate
• group.initial.rebalance.delay.ms
• segment size on change-log topic
• num.standby.replica
!58
Credits
• Object Partners, Inc.
• Apache Kafka
• Confluent Platform
• OpenSky - https://opensky-network.org/
• D3
• D3 v4
• http://techslides.com/d3-map-starter-i
• Apache Avro
• KdTree Author Justin Wetherell
• https://github.com/phishman3579/java-algorithms-implementation/blob/master/src/com/jwetherell/algorithms/data_structures/
KdTree.java
• Apache 2.0 License
• Distance Formula
• https://stackoverflow.com/questions/837872/calculate-distance-in-meters-when-you-know-longitude-and-latitude-in-java
• Additional open source libraries referenced in repository
!59
Questions
!60

Weitere ähnliche Inhalte

Was ist angesagt?

Building Stream Processing Applications with Apache Kafka Using KSQL (Robin M...
Building Stream Processing Applications with Apache Kafka Using KSQL (Robin M...Building Stream Processing Applications with Apache Kafka Using KSQL (Robin M...
Building Stream Processing Applications with Apache Kafka Using KSQL (Robin M...confluent
 
Introduction to KSQL: Streaming SQL for Apache Kafka®
Introduction to KSQL: Streaming SQL for Apache Kafka®Introduction to KSQL: Streaming SQL for Apache Kafka®
Introduction to KSQL: Streaming SQL for Apache Kafka®confluent
 
KSQL in Practice (Almog Gavra, Confluent) Kafka Summit London 2019
KSQL in Practice (Almog Gavra, Confluent) Kafka Summit London 2019KSQL in Practice (Almog Gavra, Confluent) Kafka Summit London 2019
KSQL in Practice (Almog Gavra, Confluent) Kafka Summit London 2019confluent
 
KSQL Deep Dive - The Open Source Streaming Engine for Apache Kafka
KSQL Deep Dive - The Open Source Streaming Engine for Apache KafkaKSQL Deep Dive - The Open Source Streaming Engine for Apache Kafka
KSQL Deep Dive - The Open Source Streaming Engine for Apache KafkaKai Wähner
 
New Features in Confluent Platform 6.0 / Apache Kafka 2.6
New Features in Confluent Platform 6.0 / Apache Kafka 2.6New Features in Confluent Platform 6.0 / Apache Kafka 2.6
New Features in Confluent Platform 6.0 / Apache Kafka 2.6Kai Wähner
 
Dissolving the Problem (Making an ACID-Compliant Database Out of Apache Kafka®)
Dissolving the Problem (Making an ACID-Compliant Database Out of Apache Kafka®)Dissolving the Problem (Making an ACID-Compliant Database Out of Apache Kafka®)
Dissolving the Problem (Making an ACID-Compliant Database Out of Apache Kafka®)confluent
 
Event Sourcing, Stream Processing and Serverless (Benjamin Stopford, Confluen...
Event Sourcing, Stream Processing and Serverless (Benjamin Stopford, Confluen...Event Sourcing, Stream Processing and Serverless (Benjamin Stopford, Confluen...
Event Sourcing, Stream Processing and Serverless (Benjamin Stopford, Confluen...confluent
 
ApacheCon2019 Talk: Improving the Observability of Cassandra, Kafka and Kuber...
ApacheCon2019 Talk: Improving the Observability of Cassandra, Kafka and Kuber...ApacheCon2019 Talk: Improving the Observability of Cassandra, Kafka and Kuber...
ApacheCon2019 Talk: Improving the Observability of Cassandra, Kafka and Kuber...Paul Brebner
 
What is Apache Kafka®?
What is Apache Kafka®?What is Apache Kafka®?
What is Apache Kafka®?confluent
 
Leveraging Microservice Architectures & Event-Driven Systems for Global APIs
Leveraging Microservice Architectures & Event-Driven Systems for Global APIsLeveraging Microservice Architectures & Event-Driven Systems for Global APIs
Leveraging Microservice Architectures & Event-Driven Systems for Global APIsconfluent
 
Concepts and Patterns for Streaming Services with Kafka
Concepts and Patterns for Streaming Services with KafkaConcepts and Patterns for Streaming Services with Kafka
Concepts and Patterns for Streaming Services with KafkaQAware GmbH
 
On Track with Apache Kafka®: Building a Streaming ETL Solution with Rail Data
On Track with Apache Kafka®: Building a Streaming ETL Solution with Rail DataOn Track with Apache Kafka®: Building a Streaming ETL Solution with Rail Data
On Track with Apache Kafka®: Building a Streaming ETL Solution with Rail Dataconfluent
 
Apache Kafka vs. Integration Middleware (MQ, ETL, ESB) - Friends, Enemies or ...
Apache Kafka vs. Integration Middleware (MQ, ETL, ESB) - Friends, Enemies or ...Apache Kafka vs. Integration Middleware (MQ, ETL, ESB) - Friends, Enemies or ...
Apache Kafka vs. Integration Middleware (MQ, ETL, ESB) - Friends, Enemies or ...confluent
 
Writing Blazing Fast, and Production-Ready Kafka Streams apps in less than 30...
Writing Blazing Fast, and Production-Ready Kafka Streams apps in less than 30...Writing Blazing Fast, and Production-Ready Kafka Streams apps in less than 30...
Writing Blazing Fast, and Production-Ready Kafka Streams apps in less than 30...HostedbyConfluent
 
Confluent Kafka and KSQL: Streaming Data Pipelines Made Easy
Confluent Kafka and KSQL: Streaming Data Pipelines Made EasyConfluent Kafka and KSQL: Streaming Data Pipelines Made Easy
Confluent Kafka and KSQL: Streaming Data Pipelines Made EasyKairo Tavares
 
Data Driven Enterprise with Apache Kafka
Data Driven Enterprise with Apache KafkaData Driven Enterprise with Apache Kafka
Data Driven Enterprise with Apache Kafkaconfluent
 
Integrating Apache Kafka Into Your Environment
Integrating Apache Kafka Into Your EnvironmentIntegrating Apache Kafka Into Your Environment
Integrating Apache Kafka Into Your Environmentconfluent
 
Kafka Summit NYC 2017 Hanging Out with Your Past Self in VR
Kafka Summit NYC 2017 Hanging Out with Your Past Self in VRKafka Summit NYC 2017 Hanging Out with Your Past Self in VR
Kafka Summit NYC 2017 Hanging Out with Your Past Self in VRconfluent
 
Real-world Streaming Architectures
Real-world Streaming ArchitecturesReal-world Streaming Architectures
Real-world Streaming Architecturesconfluent
 
KSQL: Open Source Streaming for Apache Kafka
KSQL: Open Source Streaming for Apache KafkaKSQL: Open Source Streaming for Apache Kafka
KSQL: Open Source Streaming for Apache Kafkaconfluent
 

Was ist angesagt? (20)

Building Stream Processing Applications with Apache Kafka Using KSQL (Robin M...
Building Stream Processing Applications with Apache Kafka Using KSQL (Robin M...Building Stream Processing Applications with Apache Kafka Using KSQL (Robin M...
Building Stream Processing Applications with Apache Kafka Using KSQL (Robin M...
 
Introduction to KSQL: Streaming SQL for Apache Kafka®
Introduction to KSQL: Streaming SQL for Apache Kafka®Introduction to KSQL: Streaming SQL for Apache Kafka®
Introduction to KSQL: Streaming SQL for Apache Kafka®
 
KSQL in Practice (Almog Gavra, Confluent) Kafka Summit London 2019
KSQL in Practice (Almog Gavra, Confluent) Kafka Summit London 2019KSQL in Practice (Almog Gavra, Confluent) Kafka Summit London 2019
KSQL in Practice (Almog Gavra, Confluent) Kafka Summit London 2019
 
KSQL Deep Dive - The Open Source Streaming Engine for Apache Kafka
KSQL Deep Dive - The Open Source Streaming Engine for Apache KafkaKSQL Deep Dive - The Open Source Streaming Engine for Apache Kafka
KSQL Deep Dive - The Open Source Streaming Engine for Apache Kafka
 
New Features in Confluent Platform 6.0 / Apache Kafka 2.6
New Features in Confluent Platform 6.0 / Apache Kafka 2.6New Features in Confluent Platform 6.0 / Apache Kafka 2.6
New Features in Confluent Platform 6.0 / Apache Kafka 2.6
 
Dissolving the Problem (Making an ACID-Compliant Database Out of Apache Kafka®)
Dissolving the Problem (Making an ACID-Compliant Database Out of Apache Kafka®)Dissolving the Problem (Making an ACID-Compliant Database Out of Apache Kafka®)
Dissolving the Problem (Making an ACID-Compliant Database Out of Apache Kafka®)
 
Event Sourcing, Stream Processing and Serverless (Benjamin Stopford, Confluen...
Event Sourcing, Stream Processing and Serverless (Benjamin Stopford, Confluen...Event Sourcing, Stream Processing and Serverless (Benjamin Stopford, Confluen...
Event Sourcing, Stream Processing and Serverless (Benjamin Stopford, Confluen...
 
ApacheCon2019 Talk: Improving the Observability of Cassandra, Kafka and Kuber...
ApacheCon2019 Talk: Improving the Observability of Cassandra, Kafka and Kuber...ApacheCon2019 Talk: Improving the Observability of Cassandra, Kafka and Kuber...
ApacheCon2019 Talk: Improving the Observability of Cassandra, Kafka and Kuber...
 
What is Apache Kafka®?
What is Apache Kafka®?What is Apache Kafka®?
What is Apache Kafka®?
 
Leveraging Microservice Architectures & Event-Driven Systems for Global APIs
Leveraging Microservice Architectures & Event-Driven Systems for Global APIsLeveraging Microservice Architectures & Event-Driven Systems for Global APIs
Leveraging Microservice Architectures & Event-Driven Systems for Global APIs
 
Concepts and Patterns for Streaming Services with Kafka
Concepts and Patterns for Streaming Services with KafkaConcepts and Patterns for Streaming Services with Kafka
Concepts and Patterns for Streaming Services with Kafka
 
On Track with Apache Kafka®: Building a Streaming ETL Solution with Rail Data
On Track with Apache Kafka®: Building a Streaming ETL Solution with Rail DataOn Track with Apache Kafka®: Building a Streaming ETL Solution with Rail Data
On Track with Apache Kafka®: Building a Streaming ETL Solution with Rail Data
 
Apache Kafka vs. Integration Middleware (MQ, ETL, ESB) - Friends, Enemies or ...
Apache Kafka vs. Integration Middleware (MQ, ETL, ESB) - Friends, Enemies or ...Apache Kafka vs. Integration Middleware (MQ, ETL, ESB) - Friends, Enemies or ...
Apache Kafka vs. Integration Middleware (MQ, ETL, ESB) - Friends, Enemies or ...
 
Writing Blazing Fast, and Production-Ready Kafka Streams apps in less than 30...
Writing Blazing Fast, and Production-Ready Kafka Streams apps in less than 30...Writing Blazing Fast, and Production-Ready Kafka Streams apps in less than 30...
Writing Blazing Fast, and Production-Ready Kafka Streams apps in less than 30...
 
Confluent Kafka and KSQL: Streaming Data Pipelines Made Easy
Confluent Kafka and KSQL: Streaming Data Pipelines Made EasyConfluent Kafka and KSQL: Streaming Data Pipelines Made Easy
Confluent Kafka and KSQL: Streaming Data Pipelines Made Easy
 
Data Driven Enterprise with Apache Kafka
Data Driven Enterprise with Apache KafkaData Driven Enterprise with Apache Kafka
Data Driven Enterprise with Apache Kafka
 
Integrating Apache Kafka Into Your Environment
Integrating Apache Kafka Into Your EnvironmentIntegrating Apache Kafka Into Your Environment
Integrating Apache Kafka Into Your Environment
 
Kafka Summit NYC 2017 Hanging Out with Your Past Self in VR
Kafka Summit NYC 2017 Hanging Out with Your Past Self in VRKafka Summit NYC 2017 Hanging Out with Your Past Self in VR
Kafka Summit NYC 2017 Hanging Out with Your Past Self in VR
 
Real-world Streaming Architectures
Real-world Streaming ArchitecturesReal-world Streaming Architectures
Real-world Streaming Architectures
 
KSQL: Open Source Streaming for Apache Kafka
KSQL: Open Source Streaming for Apache KafkaKSQL: Open Source Streaming for Apache Kafka
KSQL: Open Source Streaming for Apache Kafka
 

Ähnlich wie Using Location Data to Showcase Keys, Windows, and Joins in Kafka Streams DSL and KSQL (Neil Buesing, Object Partners, Inc) Kafka Summit London 2019

Kick your database_to_the_curb_reston_08_27_19
Kick your database_to_the_curb_reston_08_27_19Kick your database_to_the_curb_reston_08_27_19
Kick your database_to_the_curb_reston_08_27_19confluent
 
FrenchKit 2017: Server(less) Swift
FrenchKit 2017: Server(less) SwiftFrenchKit 2017: Server(less) Swift
FrenchKit 2017: Server(less) SwiftChris Bailey
 
Automating Research Data with Globus Flows and Compute
Automating Research Data with Globus Flows and ComputeAutomating Research Data with Globus Flows and Compute
Automating Research Data with Globus Flows and ComputeGlobus
 
Dev fest 2020 taiwan how to debug microservices on kubernetes as a pros (ht...
Dev fest 2020 taiwan   how to debug microservices on kubernetes as a pros (ht...Dev fest 2020 taiwan   how to debug microservices on kubernetes as a pros (ht...
Dev fest 2020 taiwan how to debug microservices on kubernetes as a pros (ht...KAI CHU CHUNG
 
Automating Research Data Flows and Introduction to the Globus Platform
Automating Research Data Flows and Introduction to the Globus PlatformAutomating Research Data Flows and Introduction to the Globus Platform
Automating Research Data Flows and Introduction to the Globus PlatformGlobus
 
Introduction to WSO2 Data Analytics Platform
Introduction to  WSO2 Data Analytics PlatformIntroduction to  WSO2 Data Analytics Platform
Introduction to WSO2 Data Analytics PlatformSrinath Perera
 
Building a Scalable Real-Time Fleet Management IoT Data Tracker with Kafka St...
Building a Scalable Real-Time Fleet Management IoT Data Tracker with Kafka St...Building a Scalable Real-Time Fleet Management IoT Data Tracker with Kafka St...
Building a Scalable Real-Time Fleet Management IoT Data Tracker with Kafka St...HostedbyConfluent
 
Automating Research Data Flows and an Introduction to the Globus Platform
Automating Research Data Flows and an Introduction to the Globus PlatformAutomating Research Data Flows and an Introduction to the Globus Platform
Automating Research Data Flows and an Introduction to the Globus PlatformGlobus
 
FIWARE Wednesday Webinars - Short Term History within Smart Systems
FIWARE Wednesday Webinars - Short Term History within Smart SystemsFIWARE Wednesday Webinars - Short Term History within Smart Systems
FIWARE Wednesday Webinars - Short Term History within Smart SystemsFIWARE
 
ql.io: Consuming HTTP at Scale
ql.io: Consuming HTTP at Scale ql.io: Consuming HTTP at Scale
ql.io: Consuming HTTP at Scale Subbu Allamaraju
 
nuclio Overview October 2017
nuclio Overview October 2017nuclio Overview October 2017
nuclio Overview October 2017iguazio
 
StrongLoop Overview
StrongLoop OverviewStrongLoop Overview
StrongLoop OverviewShubhra Kar
 
Monitoring Docker at Scale - Docker San Francisco Meetup - August 11, 2015
Monitoring Docker at Scale - Docker San Francisco Meetup - August 11, 2015Monitoring Docker at Scale - Docker San Francisco Meetup - August 11, 2015
Monitoring Docker at Scale - Docker San Francisco Meetup - August 11, 2015Datadog
 
Hazelcast and MongoDB at Cloud CMS
Hazelcast and MongoDB at Cloud CMSHazelcast and MongoDB at Cloud CMS
Hazelcast and MongoDB at Cloud CMSuzquiano
 
Workshop Infrastructure as Code - Suestra
Workshop Infrastructure as Code - SuestraWorkshop Infrastructure as Code - Suestra
Workshop Infrastructure as Code - SuestraMario IC
 
iguazio - nuclio overview to CNCF (Sep 25th 2017)
iguazio - nuclio overview to CNCF (Sep 25th 2017)iguazio - nuclio overview to CNCF (Sep 25th 2017)
iguazio - nuclio overview to CNCF (Sep 25th 2017)Eran Duchan
 
Backend, app e internet das coisas com NodeJS no Google Cloud Platform
Backend, app e internet das coisas com NodeJS no Google Cloud PlatformBackend, app e internet das coisas com NodeJS no Google Cloud Platform
Backend, app e internet das coisas com NodeJS no Google Cloud PlatformAlvaro Viebrantz
 
Backend, app e internet das coisas com NodeJS no Google Cloud Platform
Backend, app e internet das coisas com NodeJS no Google Cloud PlatformBackend, app e internet das coisas com NodeJS no Google Cloud Platform
Backend, app e internet das coisas com NodeJS no Google Cloud PlatformDevMT
 
Serverless technologies with Kubernetes
Serverless technologies with KubernetesServerless technologies with Kubernetes
Serverless technologies with KubernetesProvectus
 
Deploying windows containers with kubernetes
Deploying windows containers with kubernetesDeploying windows containers with kubernetes
Deploying windows containers with kubernetesBen Hall
 

Ähnlich wie Using Location Data to Showcase Keys, Windows, and Joins in Kafka Streams DSL and KSQL (Neil Buesing, Object Partners, Inc) Kafka Summit London 2019 (20)

Kick your database_to_the_curb_reston_08_27_19
Kick your database_to_the_curb_reston_08_27_19Kick your database_to_the_curb_reston_08_27_19
Kick your database_to_the_curb_reston_08_27_19
 
FrenchKit 2017: Server(less) Swift
FrenchKit 2017: Server(less) SwiftFrenchKit 2017: Server(less) Swift
FrenchKit 2017: Server(less) Swift
 
Automating Research Data with Globus Flows and Compute
Automating Research Data with Globus Flows and ComputeAutomating Research Data with Globus Flows and Compute
Automating Research Data with Globus Flows and Compute
 
Dev fest 2020 taiwan how to debug microservices on kubernetes as a pros (ht...
Dev fest 2020 taiwan   how to debug microservices on kubernetes as a pros (ht...Dev fest 2020 taiwan   how to debug microservices on kubernetes as a pros (ht...
Dev fest 2020 taiwan how to debug microservices on kubernetes as a pros (ht...
 
Automating Research Data Flows and Introduction to the Globus Platform
Automating Research Data Flows and Introduction to the Globus PlatformAutomating Research Data Flows and Introduction to the Globus Platform
Automating Research Data Flows and Introduction to the Globus Platform
 
Introduction to WSO2 Data Analytics Platform
Introduction to  WSO2 Data Analytics PlatformIntroduction to  WSO2 Data Analytics Platform
Introduction to WSO2 Data Analytics Platform
 
Building a Scalable Real-Time Fleet Management IoT Data Tracker with Kafka St...
Building a Scalable Real-Time Fleet Management IoT Data Tracker with Kafka St...Building a Scalable Real-Time Fleet Management IoT Data Tracker with Kafka St...
Building a Scalable Real-Time Fleet Management IoT Data Tracker with Kafka St...
 
Automating Research Data Flows and an Introduction to the Globus Platform
Automating Research Data Flows and an Introduction to the Globus PlatformAutomating Research Data Flows and an Introduction to the Globus Platform
Automating Research Data Flows and an Introduction to the Globus Platform
 
FIWARE Wednesday Webinars - Short Term History within Smart Systems
FIWARE Wednesday Webinars - Short Term History within Smart SystemsFIWARE Wednesday Webinars - Short Term History within Smart Systems
FIWARE Wednesday Webinars - Short Term History within Smart Systems
 
ql.io: Consuming HTTP at Scale
ql.io: Consuming HTTP at Scale ql.io: Consuming HTTP at Scale
ql.io: Consuming HTTP at Scale
 
nuclio Overview October 2017
nuclio Overview October 2017nuclio Overview October 2017
nuclio Overview October 2017
 
StrongLoop Overview
StrongLoop OverviewStrongLoop Overview
StrongLoop Overview
 
Monitoring Docker at Scale - Docker San Francisco Meetup - August 11, 2015
Monitoring Docker at Scale - Docker San Francisco Meetup - August 11, 2015Monitoring Docker at Scale - Docker San Francisco Meetup - August 11, 2015
Monitoring Docker at Scale - Docker San Francisco Meetup - August 11, 2015
 
Hazelcast and MongoDB at Cloud CMS
Hazelcast and MongoDB at Cloud CMSHazelcast and MongoDB at Cloud CMS
Hazelcast and MongoDB at Cloud CMS
 
Workshop Infrastructure as Code - Suestra
Workshop Infrastructure as Code - SuestraWorkshop Infrastructure as Code - Suestra
Workshop Infrastructure as Code - Suestra
 
iguazio - nuclio overview to CNCF (Sep 25th 2017)
iguazio - nuclio overview to CNCF (Sep 25th 2017)iguazio - nuclio overview to CNCF (Sep 25th 2017)
iguazio - nuclio overview to CNCF (Sep 25th 2017)
 
Backend, app e internet das coisas com NodeJS no Google Cloud Platform
Backend, app e internet das coisas com NodeJS no Google Cloud PlatformBackend, app e internet das coisas com NodeJS no Google Cloud Platform
Backend, app e internet das coisas com NodeJS no Google Cloud Platform
 
Backend, app e internet das coisas com NodeJS no Google Cloud Platform
Backend, app e internet das coisas com NodeJS no Google Cloud PlatformBackend, app e internet das coisas com NodeJS no Google Cloud Platform
Backend, app e internet das coisas com NodeJS no Google Cloud Platform
 
Serverless technologies with Kubernetes
Serverless technologies with KubernetesServerless technologies with Kubernetes
Serverless technologies with Kubernetes
 
Deploying windows containers with kubernetes
Deploying windows containers with kubernetesDeploying windows containers with kubernetes
Deploying windows containers with kubernetes
 

Mehr von confluent

Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...confluent
 
Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flinkconfluent
 
Unlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsUnlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsconfluent
 
Workshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con FlinkWorkshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con Flinkconfluent
 
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...confluent
 
AWS Immersion Day Mapfre - Confluent
AWS Immersion Day Mapfre   -   ConfluentAWS Immersion Day Mapfre   -   Confluent
AWS Immersion Day Mapfre - Confluentconfluent
 
Eventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkEventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkconfluent
 
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent CloudQ&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent Cloudconfluent
 
Citi TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep DiveCiti TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep Diveconfluent
 
Build real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with ConfluentBuild real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with Confluentconfluent
 
Q&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service MeshQ&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service Meshconfluent
 
Citi Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka MicroservicesCiti Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka Microservicesconfluent
 
Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3confluent
 
Citi Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging ModernizationCiti Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging Modernizationconfluent
 
Citi Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataCiti Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataconfluent
 
Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2confluent
 
Data In Motion Paris 2023
Data In Motion Paris 2023Data In Motion Paris 2023
Data In Motion Paris 2023confluent
 
Confluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with SynthesisConfluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with Synthesisconfluent
 
The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023confluent
 
The Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data StreamsThe Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data Streamsconfluent
 

Mehr von confluent (20)

Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flink
 
Unlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsUnlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insights
 
Workshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con FlinkWorkshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con Flink
 
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
 
AWS Immersion Day Mapfre - Confluent
AWS Immersion Day Mapfre   -   ConfluentAWS Immersion Day Mapfre   -   Confluent
AWS Immersion Day Mapfre - Confluent
 
Eventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkEventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalk
 
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent CloudQ&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
 
Citi TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep DiveCiti TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep Dive
 
Build real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with ConfluentBuild real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with Confluent
 
Q&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service MeshQ&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service Mesh
 
Citi Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka MicroservicesCiti Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka Microservices
 
Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3
 
Citi Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging ModernizationCiti Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging Modernization
 
Citi Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataCiti Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time data
 
Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2
 
Data In Motion Paris 2023
Data In Motion Paris 2023Data In Motion Paris 2023
Data In Motion Paris 2023
 
Confluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with SynthesisConfluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with Synthesis
 
The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023
 
The Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data StreamsThe Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data Streams
 

Kürzlich hochgeladen

Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbuapidays
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusZilliz
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfOverkill Security
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 

Kürzlich hochgeladen (20)

Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 

Using Location Data to Showcase Keys, Windows, and Joins in Kafka Streams DSL and KSQL (Neil Buesing, Object Partners, Inc) Kafka Summit London 2019

  • 1. Using Location Data to Showcase Keys, Windows, and Joins in Kafka Streams DSL and KSQL Where are my Keys? Neil Buesing Kafka Summit London, 2019 !1
  • 2. Introduction • Object Partners, Inc • Located : Minneapolis, Minnesota & Omaha, Nebraska, USA • http://www.objectpartners.com • Software Development Consulting • JVM Technologies, Mobile/Web, DevOps, Real-Time Data • Neil Buesing • Director, Real-Time Data • 19 Years with Object Partners, Inc. • Find me at: • https://www.linkedin.com/in/neilbuesing • https://twitter.com/nbuesing • https://github.com/nbuesing !2
  • 3. https://github.com/nbuesing/kafka-summit-london-2019 Source Code • Fully Contained GitHub Repository • Java • Spring Boot • Gradle Build Files • Docker Container (Kafka Cluster) !3
  • 4. The “Pre” Projects • Common • Avro Data Model • KdTree • Bucket / Bucket Factory • Geolocation • RESTful Location to Airport Lookup Service • Connector • OpenSky Apache Kafka Source Connector • Web Application • D3 / Spring Boot / Spring / Spring MVC • Docker • 3 broker, 1 zookeeper, 1 SR docker compose file !4
  • 5. Avro Data Model Record (OpenSky) Aircraft Location Nearest Airport Count Distance !5
  • 6. Airport Lookup Service • Get Location of An Airport • /airport/{code} • Closest Airport • /airport?latitude={latitude}&longitude={longitude} !6
  • 7. OpenSky Source Connector • Pulls current data from OpenSky API • Offset — a timestamp • Polling - 30 seconds The OpenSky Network, http://www.opensky-network.org !7
  • 8. OpenSky Source Connector • Pulls current data from OpenSky API • Offset — a timestamp • Polling - 30 seconds { "time": 1535739820, "states": [ [ "a12345", “N0000 ", "United States", 1535739649, 1535739649, -122.5351, 38.1321, 167.64, false, 31.29, 226.33, -2.93, null, 160.02, null, false, 0 ] ] } The OpenSky Network, http://www.opensky-network.org !7
  • 9. OpenSky Source Connector • Pulls current data from OpenSky API • Offset — a timestamp • Polling - 30 seconds { "time": 1535739820, "states": [ [ "a12345", “N0000 ", "United States", 1535739649, 1535739649, -122.5351, 38.1321, 167.64, false, 31.29, 226.33, -2.93, null, 160.02, null, false, 0 ] ] } transponder callsign geolocation position update The OpenSky Network, http://www.opensky-network.org !7
  • 10. OpenSky Source Connector • Pulls current data from OpenSky API • Offset — a timestamp • Polling - 30 seconds { "time": 1535739820, "states": [ [ "a12345", “N0000 ", "United States", 1535739649, 1535739649, -122.5351, 38.1321, 167.64, false, 31.29, 226.33, -2.93, null, 160.02, null, false, 0 ] ] } transponder callsign geolocation position update api.getStates(0, null, new OpenSkyApi.BoundingBox(24.39, 49.38, -124.84, -66.88)); The OpenSky Network, http://www.opensky-network.org !7
  • 11. OpenSky Source Connector • Pulls current data from OpenSky API • Offset — a timestamp • Polling - 30 seconds { "time": 1535739820, "states": [ [ "a12345", “N0000 ", "United States", 1535739649, 1535739649, -122.5351, 38.1321, 167.64, false, 31.29, 226.33, -2.93, null, 160.02, null, false, 0 ] ] } transponder callsign geolocation position update api.getStates(0, null, new OpenSkyApi.BoundingBox(24.39, 49.38, -124.84, -66.88)); api.getStates(0, null, new OpenSkyApi.BoundingBox(-80.0, 80.0, -180.0, 180.0)); The OpenSky Network, http://www.opensky-network.org !7
  • 12. OpenSky Source Connector The OpenSky Network, http://www.opensky-network.org { "time": 1535739820, "states": [ [ "a12345", “N0000 ", "United States", 1535739649, 1535739649, -122.5351, 38.1321, 167.64, false, 31.29, 226.33, -2.93, null, 160.02, null, false, 0 ] ] } • Kafka Connect • Not a lot of code needed to write • Considering making this an Open
 Source Connector !8
  • 13. Nearest Airport • How many aircrafts are closer to a given airport than any other airport? • What is my stateful window/duration I want to measure? • How do I access the data? • How do I count my data? • How do I find my data? • How do I provide a visual of this data to the user? !9
  • 14. Kafka Streams Nearest Airport • How many aircrafts are closer to a given airport than any other airport? • What is my stateful window/duration I want to measure? • How do I access the data? • How do I count my data? • How do I find my data? • How do I provide a visual of this data to the user? !9
  • 15. D3 Kafka Streams Nearest Airport • How many aircrafts are closer to a given airport than any other airport? • What is my stateful window/duration I want to measure? • How do I access the data? • How do I count my data? • How do I find my data? • How do I provide a visual of this data to the user? !9
  • 16. Nearest Airport • Airport Lookup is based distance • First thought - Airport as a KTable. • How do I find my keys? • Window - 5 minute tumbling window • What to do with late arriving data? • Source - Airports • RESTful endpoint • Source - OpenSky • Kafka Source Connector • Frequency of updates? • Source Offsets? !10
  • 17. • Create A Windowed KTable of all flights • 5 minute window, 1 minute grace period • On update, keep the most recent reading • Materialize the data so it is programmatically accessible • Repeat for other items—Materialize for all state-stores • What about multiple instances of my Kafka Streams Applications? Nearest Airport - Lookup !11
  • 18. Nearest Airport - Lookup public KTable<Windowed<String>, Record> flightsStore() { return flights() .groupByKey() .windowedBy(TimeWindows.of("5m").grace("1m")) .reduce((current, v) -> { return v; }, Materialized.as(“flights”)); } !12 Keep latest Make programmatically accessible
  • 19. Nearest Airport - Lookup private ReadOnlyWindowStore<String, Record> flights() { return kafkaStreams().store( "flights", QueryableStoreTypes.windowStore() ); } • Provide access to the state store as a queryable read-only window store. !13 Access by Name
  • 20. Nearest Airport - Lookup public List<AircraftJson> flights(Long start, Long end) { KeyValueIterator<Windowed<String>, Record> iterator = flights().fetchAll(Instant.ofEpochMilli(start), Instant.ofEpochMilli(end)); … } • Provide access to the state store as a queryable read-only window store. !14 fetchAll for the given time window state-store for kTables only store data in the partitions being processed by a given streams
  • 21. Nearest Airport - Lookup Aircrafts Pulled from flightStore() Materialized Stores streams.allMetadataForStore("nearest_airport").forEach(app -> { list.add(app.host() + ":" + app.hostInfo().port()); }); !15
  • 22. Nearest Airport - Count !16 • Algorithm #1 - Count the Aircrafts
  • 23. Nearest Airport - Count .selectKey((key, value) -> value.getAirport()) .groupByKey() .windowedBy(TimeWindows.of(“5m").grace(Duration.of("1m")) .aggregate( () -> 0, (key, value, aggregate) -> { return aggregate + 1; }, Materialized.as(NEAREST_AIRPORT_STORE) ) .toStream((wk, v) -> wk.key()) !17
  • 24. Nearest Airport - Count .selectKey((key, value) -> value.getAirport()) .groupByKey() .windowedBy(TimeWindows.of(“5m").grace(Duration.of("1m")) .aggregate( () -> 0, (key, value, aggregate) -> { return aggregate + 1; }, Materialized.as(NEAREST_AIRPORT_STORE) ) .toStream((wk, v) -> wk.key()) What’s wrong with this approach? !17
  • 25. Nearest Airport - Count - KSQL @UdfDescription(name = "closestAirport", description = "return airport") public class ClosestAirport { private Geolocation geolocation = Feign.builder() .options(new Request.Options(200, 200)) .encoder(new JacksonEncoder()) .decoder(new JacksonDecoder()) .target(Geolocation.class, "http://geolocation:9080"); @Udf(description = "find closest airport to given location.") public String closestAirport(final Double latitude, final Double longitude) { return geolocation.closestAirport(latitude, longitude).getCode(); } } !18 • Step 1 : create specialized User Defined Function Written in Java
  • 26. Nearest Airport - Count - KSQL !19 • Step 2 : deploy specialized function compiled as uber jar
  • 27. Nearest Airport - Count - KSQL create stream ksql_nearest_airport as select aircraft->transponder transponder, closestAirport(location->latitude, location->longitude) as airport, location from flights partition by transponder; !20 • Step 3 : use specialized function to enrich the streaming data
  • 28. Nearest Airport - Count - KSQL create table ksql_nearest_airport_count as select airport, count(*) as count from ksql_nearest_airport window tumbling (size 5 minutes) group by airport; !21 • Step 4 : aggregate the data using standard KSQL syntax
  • 29. Nearest Airport - Aggregate !22 • Algorithm #2 - Collect the Aircrafts
  • 30. Nearest Airport - Aggregate flights() .map((key, value) -> { Airport airport = geolocation.closestAirport(value.getLocation()); return KeyValue.pair(key, createNearestAirport(airport, value)); }) .groupBy((k, v) -> v.getAirport()) .windowedBy(TimeWindows.of("5m")) .aggregate(() -> null, (key, value, aggregate) -> { if (aggregate == null) { aggregate = createAgg(value.getAirport(), value.getAirportLocation()); } if (!aggregate.getAircrafts().contains(value.getCallsign())) { aggregate.getAircrafts().add(value.getCallsign()); } return aggregate; }, Materialized.as(NEAREST_AIRPORT_AGG_STORE)); !23
  • 31. Nearest Airport - Aggregate flights() .map((key, value) -> { Airport airport = geolocation.closestAirport(value.getLocation()); return KeyValue.pair(key, createNearestAirport(airport, value)); }) .groupBy((k, v) -> v.getAirport()) .windowedBy(TimeWindows.of("5m")) .aggregate(() -> null, (key, value, aggregate) -> { if (aggregate == null) { aggregate = createAgg(value.getAirport(), value.getAirportLocation()); } if (!aggregate.getAircrafts().contains(value.getCallsign())) { aggregate.getAircrafts().add(value.getCallsign()); } return aggregate; }, Materialized.as(NEAREST_AIRPORT_AGG_STORE)); Double Counting? !23
  • 32. create table ksql_nearest_airport_count_agg_count as select airport, countList(count) from ksql_nearest_airport window tumbling (size 5 minutes) group by airport; Nearest Airport - Aggregate - KSQL create table ksql_nearest_airport_count_agg as select airport, collect_set(transponder) as count from ksql_nearest_airport window tumbling (size 5 minutes) group by airport; !24
  • 33. Nearest Airport - Suppression !25 • Algorithm #3 - Suppress Streaming of the Topology
 • Count With Suppression 
 (KIP-328: Ability to suppress updates for KTables)
 • Kafka Streams 2.1
  • 34. Nearest Airport - Suppression public KStream<String, Record> flightsSuppressed() { return flightsStore() .suppress(Suppressed.untilWindowCloses( Suppressed.BufferConfig.unbounded()) ).toStream() .selectKey((k, v) -> k.key()); } KStream<String, NearestAirport> stream = flightsSuppressed() .map((key, value) -> { Airport airport = geolocation.closestAirport(value.getLocation()); return KeyValue.pair(key, createNearestAirport(airport, value)); }); !26
  • 35. Nearest Airport - Suppression !27
  • 36. Nearest Airport - Suppression !27 Algorithm Concerns?
  • 37. Nearest Airport - Suppression !27 Algorithm Concerns? Information Delay
  • 38. Nearest Airport - Suppression !27 Algorithm Concerns? Information Delay Memory
  • 39. Nearest Airport - Suppression !27 Algorithm Concerns? Information Delay Memory Not Available in KSQL
  • 40. Nearest Airport - Retrospective • Kafka Streams • Windows are not magic • treating it like magic means you will get it wrong • Window state-stores are powerful • late arriving messages • retention (default 24 hours) • materialization • change-log topics • suppression • keeps evolving (read the KIPs) !28
  • 42. Nearest Neighbor !30 • Find the nearest Blue Team aircraft for every given Red Team aircraft.
 • Make sure the algorithm can be properly sharded so the work can be distributed.
 • Selected a five minute time window.
  • 43. Nearest Neighbor For this red aircraft, find the closest aircraft. !31
  • 44. Nearest Neighbor Create a 3°x3° region that I call the “bucket” this becomes the topic key !32
  • 47. Nearest Neighbor !35 Distance Object Distance, Red Aircraft, & Blue Aircraft Keep all the information needed for next operation
  • 48. Nearest Neighbor !35 Distance Object Distance, Red Aircraft, & Blue Aircraft Keep all the information needed for next operation What did I do wrong?
  • 50. Nearest Neighbor Place blue aircraft into 9 “buckets” (replicate the data) !37 DSL : flatMap() KSQL : “insert into”
  • 55. Nearest Neighbor No Bucket overlap no distance calculated Aircraft 3 not sharded with red aircraft !42 Nearest Neighbor
  • 58. Nearest Neighbor Bucket overlap distance calculation performed !45
  • 59. 51° N, 6° W51° N, 3° E 51° N, 3° W51° N, 0° W 51° N, 9° W A Ba caa bbb ccAA BBa cb !46
  • 60. 51° N, 6° W51° N, 3° E 51° N, 3° W51° N, 0° W 51° N, 9° W A Ba ca a bbb ccAA BBa cb !46
  • 61. 51° N, 6° W51° N, 3° E 51° N, 3° W51° N, 0° W 51° N, 9° W A B a ca a b b b c c A A B B a cb !46
  • 62. 51° N, 6° W51° N, 3° E 51° N, 3° W51° N, 0° W 51° N, 9° W A B a ca a b b b c c A A B B a cb !46
  • 66. Nearest Neighbor !47 Stream Concepts Key on bucket Re-Key on red aircraft
  • 67. Nearest Neighbor !47 Stream Concepts Key on bucket Re-Key on red aircraft Aggregate (reduce) on red aircraft (keeping smallest distance)
  • 71. Nearest Neighbor !48 Algorithm Limitations? Bucket size selection Sparse Data Location missing result
  • 72. Nearest Neighbor !48 Algorithm Limitations? Bucket size selection Sparse Data Location missing result Sparse Data Location wrong result
  • 76. Nearest Neighbor !49 Performance Limitations? Partitioning and Key Hash Uniformity of the Data
  • 77. Nearest Neighbor !49 Performance Limitations? Partitioning and Key Hash Uniformity of the Data Replication of Data
  • 82. red() .map((key, value) -> KeyValue.pair(bucketFactory.create(value.getLocation()), value)) .join(blue() .flatMap((key, value) -> bucketFactory.createSurronding(value.getLocation()) .stream() .map((b) -> KeyValue.pair(b.toString(), value)) .collect(Collectors.toList())).selectKey((key, value) -> key), (value1, value2) -> { double d = DistanceUtil.distance(value1.getLocation(), value2.getLocation()); return new Distance(value1, value2, d); }, JoinWindows.of(WINDOW), Joined.with(Serdes.String(), recordSerde, recordSerde)) .to("distance", Produced.with(Serdes.String(), distaneSerde)); Nearest Neighbor !54
  • 83. red() .map((key, value) -> KeyValue.pair(bucketFactory.create(value.getLocation()), value)) .join(blue() .flatMap((key, value) -> bucketFactory.createSurronding(value.getLocation()) .stream() .map((b) -> KeyValue.pair(b.toString(), value)) .collect(Collectors.toList())).selectKey((key, value) -> key), (value1, value2) -> { double d = DistanceUtil.distance(value1.getLocation(), value2.getLocation()); return new Distance(value1, value2, d); }, JoinWindows.of(WINDOW), Joined.with(Serdes.String(), recordSerde, recordSerde)) .to("distance", Produced.with(Serdes.String(), distaneSerde)); Nearest Neighbor !54
  • 84. red() .map((key, value) -> KeyValue.pair(bucketFactory.create(value.getLocation()), value)) .join(blue() .flatMap((key, value) -> bucketFactory.createSurronding(value.getLocation()) .stream() .map((b) -> KeyValue.pair(b.toString(), value)) .collect(Collectors.toList())).selectKey((key, value) -> key), (value1, value2) -> { double d = DistanceUtil.distance(value1.getLocation(), value2.getLocation()); return new Distance(value1, value2, d); }, JoinWindows.of(WINDOW), Joined.with(Serdes.String(), recordSerde, recordSerde)) .to("distance", Produced.with(Serdes.String(), distaneSerde)); Nearest Neighbor !54
  • 85. red() .map((key, value) -> KeyValue.pair(bucketFactory.create(value.getLocation()), value)) .join(blue() .flatMap((key, value) -> bucketFactory.createSurronding(value.getLocation()) .stream() .map((b) -> KeyValue.pair(b.toString(), value)) .collect(Collectors.toList())).selectKey((key, value) -> key), (value1, value2) -> { double d = DistanceUtil.distance(value1.getLocation(), value2.getLocation()); return new Distance(value1, value2, d); }, JoinWindows.of(WINDOW), Joined.with(Serdes.String(), recordSerde, recordSerde)) .to("distance", Produced.with(Serdes.String(), distaneSerde)); Nearest Neighbor !54
  • 86. red() .map((key, value) -> KeyValue.pair(bucketFactory.create(value.getLocation()), value)) .join(blue() .flatMap((key, value) -> bucketFactory.createSurronding(value.getLocation()) .stream() .map((b) -> KeyValue.pair(b.toString(), value)) .collect(Collectors.toList())).selectKey((key, value) -> key), (value1, value2) -> { double d = DistanceUtil.distance(value1.getLocation(), value2.getLocation()); return new Distance(value1, value2, d); }, JoinWindows.of(WINDOW), Joined.with(Serdes.String(), recordSerde, recordSerde)) .to("distance", Produced.with(Serdes.String(), distaneSerde)); Nearest Neighbor !54
  • 87. KTable<Windowed<String>, Distance> result = distance() .selectKey((k, v) -> v.getRed().getAircraft().getTransponder()) .groupByKey() .windowedBy(TimeWindows.of(WINDOW)) .aggregate( () -> new Distance(null, null, Double.MAX_VALUE), (k, v, agg) -> (v.getDistance() < agg.getDistance()) ? v : agg); result.toStream() .map((k, v) -> KeyValue.pair(k.key().toString(), v)) .to("closest"); Nearest Neighbor !55
  • 88. KTable<Windowed<String>, Distance> result = distance() .selectKey((k, v) -> v.getRed().getAircraft().getTransponder()) .groupByKey() .windowedBy(TimeWindows.of(WINDOW)) .aggregate( () -> new Distance(null, null, Double.MAX_VALUE), (k, v, agg) -> (v.getDistance() < agg.getDistance()) ? v : agg); result.toStream() .map((k, v) -> KeyValue.pair(k.key().toString(), v)) .to("closest"); Nearest Neighbor !55
  • 89. KTable<Windowed<String>, Distance> result = distance() .selectKey((k, v) -> v.getRed().getAircraft().getTransponder()) .groupByKey() .windowedBy(TimeWindows.of(WINDOW)) .aggregate( () -> new Distance(null, null, Double.MAX_VALUE), (k, v, agg) -> (v.getDistance() < agg.getDistance()) ? v : agg); result.toStream() .map((k, v) -> KeyValue.pair(k.key().toString(), v)) .to("closest"); Nearest Neighbor !55
  • 90. KTable<Windowed<String>, Distance> result = distance() .selectKey((k, v) -> v.getRed().getAircraft().getTransponder()) .groupByKey() .windowedBy(TimeWindows.of(WINDOW)) .aggregate( () -> new Distance(null, null, Double.MAX_VALUE), (k, v, agg) -> (v.getDistance() < agg.getDistance()) ? v : agg); result.toStream() .map((k, v) -> KeyValue.pair(k.key().toString(), v)) .to("closest"); Nearest Neighbor !55
  • 91. KTable<Windowed<String>, Distance> result = distance() .selectKey((k, v) -> v.getRed().getAircraft().getTransponder()) .groupByKey() .windowedBy(TimeWindows.of(WINDOW)) .aggregate( () -> new Distance(null, null, Double.MAX_VALUE), (k, v, agg) -> (v.getDistance() < agg.getDistance()) ? v : agg); result.toStream() .map((k, v) -> KeyValue.pair(k.key().toString(), v)) .to("closest"); Nearest Neighbor !55
  • 92. KTable<Windowed<String>, Distance> result = distance() .selectKey((k, v) -> v.getRed().getAircraft().getTransponder()) .groupByKey() .windowedBy(TimeWindows.of(WINDOW)) .aggregate( () -> new Distance(null, null, Double.MAX_VALUE), (k, v, agg) -> (v.getDistance() < agg.getDistance()) ? v : agg); result.toStream() .map((k, v) -> KeyValue.pair(k.key().toString(), v)) .to("closest"); Nearest Neighbor !55 Same as Join Window?
  • 93. Nearest Neighbor - KSQL !56 create stream blueBucket as select bucketLocation(location->latitude, location->longitude, 3.0) as bucket, aircraft->transponder as transponder, aircraft->callsign as callsign, location->latitude as latitude, location->longitude as longitude from blue partition by bucket; create stream blueBucket_w as select bucketLocation(location->latitude, location->longitude, 3.0, 'w') as bucket, aircraft->transponder as transponder, aircraft->callsign as callsign, location->latitude as latitude, location->longitude as longitude from blue partition by bucket; insert into blueBucket select * from blueBucket_w;
  • 94. Nearest Neighbor - Retrospective • Kafka Streams • understand your keys • flatMap / insert into • maintain the state you need within the domain • understand intermediate topics !57
  • 95. Retrospective • How do these examples help me / apply to me?
 • Do I really need to write a distributed application? • Should I be programmatically accessing the Kafka Stream State Stores? 
 • Kafka Streams or KSQL • how do I choose? • do I have to choose?
 • Some settings to investigate • group.initial.rebalance.delay.ms • segment size on change-log topic • num.standby.replica !58
  • 96. Credits • Object Partners, Inc. • Apache Kafka • Confluent Platform • OpenSky - https://opensky-network.org/ • D3 • D3 v4 • http://techslides.com/d3-map-starter-i • Apache Avro • KdTree Author Justin Wetherell • https://github.com/phishman3579/java-algorithms-implementation/blob/master/src/com/jwetherell/algorithms/data_structures/ KdTree.java • Apache 2.0 License • Distance Formula • https://stackoverflow.com/questions/837872/calculate-distance-in-meters-when-you-know-longitude-and-latitude-in-java • Additional open source libraries referenced in repository !59