Predictive Maintenance
with
Deep Learning and Flink
Dongwon Kim, PhD
Solution R&D center
SK Telecom
• Big Data processing engines
• MapReduce, Tez, Spark, Flink, Hive, Presto
• Recent interest
• Deep learning model serving (TensorFlow serving)
• Containerization (Docker, Kubernetes)
• Time-series data (InfluxDB, Prometheus, Grafana)
• Flink Forward 2015 Berlin
• A Comparative Performance Evaluation of Flink
About me (@eastcirclek)
Covered
in this talk
Refinery and semiconductor companies depend on equipment
Breakdown of equipment largely affects company profit
Equipment maintenance to minimize breakdown
Breakdown
Planned Maintenance
Shutdown equipment on a regular basis for
parts replacement, cleaning, adjustments, etc.
Time
2015.07 2016.07 2017.07
Predictive Maintenance (PdM)
Unplanned maintenance based on
prediction of equipment condition
Predictive
Maintenance
Planned
Maintenance
...
<equipment sensors> <predictive maintenance system>
Our approach to Predictive Maintenance
Better prediction of equipment condition
using Deep Learning
PdM
Planned
Maintenance
Breakdown
PdM
Planned
Maintenance
Breakdown
Machine Learning
& Statistics
Machine learning
& Statistics
+ Deep Learning
Contents
1
Why we use Flink
for our time-series prediction model
2
Flink pipeline design for
rendezvous and DNN ensemble
3
Solution packaging and monitoring
with Docker and Prometheus
Time-series prediction model
TF Serving
Grafana
Flink MySQL
Prometheus InfluxDB
Role Toolbox
My team consists of two groups
* The diagram is based on Ted Dunning’s slides (Flink Forward 2017 SF)
Data Engineers
Model Developers
*
history
data
training
DNN
model
*
live
data
prediction
&
alarm
DNN
model
Model developers give Convolutional LSTM to engineers
CNN
RNN
(LSTM)
...
...
Output (Ŷ)
expected sensor values
after 2 days
Input (X)
1 week time-series (10080 records)
Model developer
It does not return whether
the equipment breaks down
after 2 days
...
* assuming one minute sampling rate
Data engineers apply Convolutional LSTM to live sensor data
Equipment sensors produce a vector [y1, y2, y3, ..., ym]
We have multi-sensor data
Data engineers apply Convolutional LSTM to live sensor data
Multi-sensor data arrive
at a fixed interval of 1 minute
Timeline: the equipment's sensors emit a vector [y1, y2, y3, ..., ym] every minute
Data engineers apply Convolutional LSTM to live sensor data
X (10080 records)
We maintain a count window
over the latest 10080 records
It slides as a new record arrives
(a sliding count window)
Timeline: the equipment's sensors keep emitting vectors [y1, y2, y3, ..., ym]
Data engineers apply Convolutional LSTM to live sensor data
Given one week time-series (X), the DNN returns predicted values after 2 days (Ŷ)
Y = [y1, y2, y3, ..., ym] (measured 2 days later)
Ŷ = [ŷ1, ŷ2, ŷ3, ..., ŷm] (predicted)
Raise an alarm if the distance between the two vectors is above a defined threshold
CNN
RNN
(LSTM)
X (10080 records)
Timeline
Whenever the sliding window moves,
we apply the Convolutional LSTM to the latest X
Stream
source
Sliding
count
window
Join Score Sink
Desired streaming topology by data engineers
Apply DNN to X Ŷ
...
timeline
...
X
Requirement 1
Count window to maintain
10080 records instead of
1 week event-time window
Requirement 2
Joining of two streams
on event time
(Rendezvous)
Prediction
stream
Ŷ
Outlier
filter
Y
Measurement
stream
Proof-of-Concept of Spark Structured Streaming
Types: Input / Prediction / Score streaming Datasets
• generate an input stream from local files
• apply DNN to 1 week time-series, not 10080 records (sliding count window is not supported)
• joining of two streams
Inner join between two streaming Datasets
is not supported
in Spark Structured Streaming
Score
Dataset
Unsupported Operations on Spark Structured Streaming (v2.2)
Requirement 1
Count window to maintain
10080 records instead of
1 week event-time window
Requirement 2
Joining of two streams
on event time
(Rendezvous)
That’s why we move to Flink DataStream API
• Sliding count window : not supported
• Joining of two streams : not supported
• Micro-batch behind the scenes
• Continuous processing proposed in SPARK-20928
• Sliding count window : supported
• Joining of two streams : supported
• Scalability and performance proved by other use cases
Spark Structured Streaming
Flink DataStream API
* it might be possible to use our Convolutional LSTM model with Spark Structured Streaming in some other way
Data processing pipeline with Flink DataStream API
addSource process
countWindowAll
(custom evictor)
apply
assignTimestamps
AndWatermarks
join applywindow
...
timeline
...
X
Ŷ
Measurement
stream
Prediction
stream
+2
days
Outlier
sink
Prediction
Sink
Score
Sink
W(t) @t
Input
Sink
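To make the boxes above concrete, here is a minimal sketch of how the first half of this pipeline could be wired up with the DataStream API. SensorRecord, Prediction, MySqlSource, OutlierProcessFunction, GapAwareEvictor and DnnWindowFunction are placeholder names (elaborated in the sketches that follow), not the actual classes from the talk.

```java
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class PdmPipelineSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

        // Measurement stream: stateful source reading from MySQL (record + watermark per row)
        DataStream<SensorRecord> measurements =
            env.addSource(new MySqlSource("jdbc:mysql://..."));   // placeholder JDBC URL

        // process: filter out outliers and divert them to a side output (Outlier sink)
        DataStream<SensorRecord> cleaned =
            measurements.process(new OutlierProcessFunction(100, 3.0));

        // countWindowAll with a custom evictor: keep the latest 10080 records, slide by 1;
        // apply: run the Convolutional LSTM on X and emit Ŷ
        DataStream<Prediction> predictions = cleaned
            .countWindowAll(10080, 1)
            .evictor(new GapAwareEvictor<>(10080, 4 * 60 * 1000))  // restart after a long gap
            .apply(new DnnWindowFunction());

        // The prediction stream is then re-timestamped (+2 days) with
        // assignTimestampsAndWatermarks, joined with the measurement stream on event time,
        // scored, and written to the Input / Prediction / Score / Outlier sinks.

        env.execute("predictive-maintenance");
    }
}
```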
Flink can faithfully implement our streaming topology design
Stream
source
Sliding count
window
Join Score Sink
Apply DNN to X Ŷ
...
...
X
Y
Ŷ
Outlier
filter
<Topology design> <Flink implementation>
addSource process
countWindowAll
(custom evictor)
apply
assignTimestampsAndWatermarks
join applywindow
...
... Ŷ
Contents
1
Why we use Flink
for our time-series prediction model
2
Flink pipeline design for
rendezvous and DNN ensemble
3
Solution packaging and monitoring
with Docker and Prometheus
Time-series prediction model
TF Serving
Grafana
Flink MySQL
Prometheus InfluxDB
addSource process
countWindowAll
(custom evictor)
apply
assignTimestamps
AndWatermarks
join applywindow
...
timeline
...
X
Ŷ
Measurement
stream
Prediction
stream
+2
days
Outlier
sink
Prediction
Sink
Score
Sink
W(t) @t
Input
sink
We read data
from MySQL
Data processing pipeline with Flink DataStream API
Stateful custom source to read from MySQL
• We assume that sensor data arrive in order
• Emit an input record and a watermark of the same time
• Increase lastTimestamp afterward (11:15 → 11:16)
• Exactly-once semantics
• Store lastTimestamp when taking a snapshot
• Restore lastTimestamp when restarted
addSource
lastTimestamp (state): 2017-9-13 11:15
JDBC Connection:
SELECT timestamp, measured
FROM input
WHERE timestamp > $lastTimestamp
Input table in MySQL (timestamp | measured):
2017-9-13 11:13 | [y1, y2, ..., ym]
2017-9-13 11:14 | [y1, y2, ..., ym]
2017-9-13 11:15 | [y1, y2, ..., ym]
...
Emits the new record (2017-9-13 11:16, [y1, y2, ..., ym]) @ 11:16 and the watermark W(11:16)
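A minimal sketch of what such a stateful source could look like, assuming the table layout from the slide; SensorRecord, the JDBC URL and the parsing of the measured column are placeholders.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Timestamp;
import java.util.Collections;
import java.util.List;

import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.checkpoint.ListCheckpointed;
import org.apache.flink.streaming.api.functions.source.RichSourceFunction;
import org.apache.flink.streaming.api.watermark.Watermark;

public class MySqlSource extends RichSourceFunction<SensorRecord>
        implements ListCheckpointed<Long> {

    private final String jdbcUrl;
    private volatile boolean running = true;
    private long lastTimestamp;                 // restored from state on recovery
    private transient Connection connection;

    public MySqlSource(String jdbcUrl) {
        this.jdbcUrl = jdbcUrl;
    }

    @Override
    public void open(Configuration parameters) throws Exception {
        connection = DriverManager.getConnection(jdbcUrl);
    }

    @Override
    public void run(SourceContext<SensorRecord> ctx) throws Exception {
        PreparedStatement stmt = connection.prepareStatement(
            "SELECT timestamp, measured FROM input WHERE timestamp > ?");
        while (running) {
            stmt.setTimestamp(1, new Timestamp(lastTimestamp));
            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {
                    long ts = rs.getTimestamp("timestamp").getTime();
                    SensorRecord record = SensorRecord.parse(ts, rs.getString("measured"));
                    synchronized (ctx.getCheckpointLock()) {
                        ctx.collectWithTimestamp(record, ts);   // emit record @ts
                        ctx.emitWatermark(new Watermark(ts));   // emit W(ts)
                        lastTimestamp = ts;                     // advance afterwards
                    }
                }
            }
            Thread.sleep(1000);   // poll interval
        }
    }

    @Override
    public void cancel() { running = false; }

    @Override
    public void close() throws Exception { connection.close(); }

    @Override
    public List<Long> snapshotState(long checkpointId, long timestamp) {
        return Collections.singletonList(lastTimestamp);   // store when taking a snapshot
    }

    @Override
    public void restoreState(List<Long> state) {
        for (Long ts : state) lastTimestamp = ts;           // restore when restarted
    }
}
```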
Data processing pipeline with Flink DataStream API
addSource process
countWindowAll
(custom evictor)
apply
assignTimestamps
AndWatermarks
join applywindow
...
timeline
...
X
Ŷ
Measurement
stream
Prediction
stream
+2
days
Prediction
Sink
Score
Sink
W(t) @t
filter out
outliers
maintain last N
elements
Emit outliers to
a side output
Event time window for 1 week
cannot guarantee 10080 records
as data can be missing or filtered
We define
a custom evictor
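Before turning to the evictor: the process step above can be sketched as a ProcessFunction that keeps normal records on the main output and routes outliers to a side output. The outlier rule itself and the SensorRecord type are placeholders.

```java
import java.util.ArrayDeque;
import java.util.Deque;

import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;

public class OutlierProcessFunction extends ProcessFunction<SensorRecord, SensorRecord> {

    // side output for the Outlier sink
    public static final OutputTag<SensorRecord> OUTLIERS =
        new OutputTag<SensorRecord>("outliers") {};

    private final int n;            // "maintain last N elements"
    private final double threshold;
    private final Deque<SensorRecord> lastN = new ArrayDeque<>();   // in-memory only, illustration

    public OutlierProcessFunction(int n, double threshold) {
        this.n = n;
        this.threshold = threshold;
    }

    @Override
    public void processElement(SensorRecord value, Context ctx, Collector<SensorRecord> out) {
        if (isOutlier(value)) {
            ctx.output(OUTLIERS, value);    // emit outliers to a side output
        } else {
            out.collect(value);             // continues to countWindowAll
            lastN.addLast(value);
            if (lastN.size() > n) {
                lastN.removeFirst();
            }
        }
    }

    private boolean isOutlier(SensorRecord value) {
        // placeholder: e.g. compare against statistics of the last N accepted records
        return false;
    }
}
```

The outliers can then be fetched via getSideOutput(OutlierProcessFunction.OUTLIERS) on the operator returned by process() and routed to the Outlier sink.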
What if data is absent or filtered for a long period of time?
They look totally different!
3 missing days
We’d better start a new sliding window
for the time-series!
How to start a new sliding count window after a long break
Timeline: records arrive at 1, 2, 3, 4, then no records for a while, then 9 (timestamps 5-8 are missing)
CountTrigger.of(1) fires every time a record comes in
EvictingWindowOperator adds the new record to InternalListState: [2, 3, 4, 9]
Sliding count window of size 3:
• CountEvictor.of(3) evicts elements beyond its capacity → keeps [3, 4, 9]
• CustomEvictor.of(3, timeThreshold=4) evicts all but the last one when the last one occurs after timeThreshold → keeps only [9]
We want to start a new window
after missing 4 timestamps
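A minimal sketch of such a custom evictor, under the assumption that it should behave like CountEvictor.of(size) normally and drop everything but the newest element after a long break (timestamps in milliseconds); the talk's actual CustomEvictor may differ in details.

```java
import java.util.Iterator;

import org.apache.flink.streaming.api.windowing.evictors.Evictor;
import org.apache.flink.streaming.api.windowing.windows.GlobalWindow;
import org.apache.flink.streaming.runtime.operators.windowing.TimestampedValue;

public class GapAwareEvictor<T> implements Evictor<T, GlobalWindow> {

    private final long maxCount;              // window capacity, e.g. 10080
    private final long timeThresholdMillis;   // e.g. 4 minutes for "missing 4 timestamps"

    public GapAwareEvictor(long maxCount, long timeThresholdMillis) {
        this.maxCount = maxCount;
        this.timeThresholdMillis = timeThresholdMillis;
    }

    @Override
    public void evictBefore(Iterable<TimestampedValue<T>> elements, int size,
                            GlobalWindow window, EvictorContext ctx) {
        // Find the newest and second-newest timestamps in the window state.
        long newest = Long.MIN_VALUE;
        long secondNewest = Long.MIN_VALUE;
        for (TimestampedValue<T> e : elements) {
            long t = e.getTimestamp();
            if (t > newest) { secondNewest = newest; newest = t; }
            else if (t > secondNewest) { secondNewest = t; }
        }
        boolean longBreak = size > 1 && newest - secondNewest > timeThresholdMillis;

        int remaining = size;
        for (Iterator<TimestampedValue<T>> it = elements.iterator(); it.hasNext(); ) {
            TimestampedValue<T> e = it.next();
            if (longBreak) {
                // Start a new window: evict all but the last (newest) element.
                if (e.getTimestamp() < newest) {
                    it.remove();
                }
            } else if (remaining > maxCount) {
                // Plain CountEvictor behaviour: evict the oldest surplus elements.
                it.remove();
                remaining--;
            } else {
                break;   // capacity reached; the rest are the most recent elements
            }
        }
    }

    @Override
    public void evictAfter(Iterable<TimestampedValue<T>> elements, int size,
                           GlobalWindow window, EvictorContext ctx) {
        // no eviction after the window function
    }
}
```

It can be plugged in as .countWindowAll(10080, 1).evictor(new GapAwareEvictor<>(10080, Time.minutes(4).toMilliseconds())), which is essentially what countWindowAll builds internally (GlobalWindows + CountTrigger.of(1)) with the evictor swapped.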
Data processing pipeline with Flink DataStream API
addSource process
countWindowAll
(custom evictor)
apply
assignTimestamps
AndWatermarks
join applywindow
...
timeline
...
X
Ŷ
Measurement
stream
Prediction
stream
+2
days
Outlier
sink
Prediction
Sink
Score
Sink
W(t) @t
Input
sink
Working with model developers
They stick to using Python
They develop models using a Python library called Keras
I don’t want to use
Deeplearning4J
because that’s Java…
We use Keras on Python!
I want to develop our
solution on JVM!
Why don’t we
develop models using
Deeplearning4J?
How to load Keras models in Flink?
I don't know how to do that!
Loading Keras models in JVM
• Convert Keras models to TensorFlow SavedModel
• use tensorflow.python.saved_model.builder.SavedModelBuilder
• TensorFlow Java API (Flink TensorFlow)
• Do inference inside the JVM process
• TensorFlow Serving
• Do inference outside the JVM process
• Execute Keras through CPython inside JVM
• Do inference inside the JVM process
• Java Embedded Python (JEP) to ease the use of CPython
• https://github.com/ninia/jep
• Use KerasModelImport from Deeplearning4J
• Not mature enough
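For the first option, a minimal sketch of loading the exported SavedModel through the TensorFlow Java API and running inference in-process; the tensor names "input" and "output" depend on how the SavedModel was built and are assumptions, as is the SavedModelScorer class itself.

```java
import org.tensorflow.SavedModelBundle;
import org.tensorflow.Tensor;

public class SavedModelScorer implements AutoCloseable {

    private final SavedModelBundle bundle;

    public SavedModelScorer(String exportDir) {
        // "serve" is the standard tag for models exported for serving
        this.bundle = SavedModelBundle.load(exportDir, "serve");
    }

    /** x: [1][10080][m] one-week time-series; returns ŷ: [1][m]. */
    public float[][] predict(float[][][] x) {
        try (Tensor input = Tensor.create(x)) {
            try (Tensor output = bundle.session().runner()
                     .feed("input", input)    // placeholder tensor name
                     .fetch("output")         // placeholder tensor name
                     .run().get(0)) {
                float[][] yHat = new float[1][(int) output.shape()[1]];
                output.copyTo(yHat);
                return yHat;
            }
        }
    }

    @Override
    public void close() {
        bundle.close();
    }
}
```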
Comparison of approaches to use Keras models in JVM
1. TensorFlow Java API (a very thin wrapper over the TensorFlow native library)
• Keras model → SavedModel, loaded inside the TaskManager process
• the RichWindowFunction passes X and gets Ŷ in-process
2. Java Embedded Python (JEP)
• the RichWindowFunction talks to CPython through a JEP Java object and JEP native code, all inside the TaskManager process
• executes Python commands: import keras, load a model & weights, pass X and get Ŷ
3. TensorFlow Serving
• Keras model → SavedModel (v1, v2, ...) loaded by a separate TFServing process (DynamicManager / Loader)
• the RichWindowFunction sends X and receives Ŷ through a gRPC client
Comparison of runtime inference performance
TensorFlow Java API
77.7 milliseconds per inference
TensorFlow Serving
71.2 milliseconds per inference
Keras inside CPython
w/ TensorFlow backend
32 milliseconds per inference
(* Theano backend is extremely slow in our case)
(* We do not batch inference calls)
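Since Keras inside CPython came out fastest for us, here is a minimal sketch of a JEP-based scorer that could sit inside the RichWindowFunction; the model path, variable names and the Java-to-numpy conversion via jep.NDArray are assumptions that depend on the JEP build (numpy support enabled) and version.

```java
import jep.Jep;
import jep.NDArray;

public class KerasJepScorer implements AutoCloseable {

    private final Jep jep;
    private final int m;   // number of sensors

    public KerasJepScorer(String modelPath, int m) throws Exception {
        this.m = m;
        this.jep = new Jep();                               // one interpreter per task thread
        jep.eval("import numpy as np");
        jep.eval("from keras.models import load_model");
        jep.set("model_path", modelPath);
        jep.eval("model = load_model(model_path)");         // load a model & weights
    }

    /** xFlat holds the one-week time-series row-major: 10080 * m values. */
    public float[] predict(float[] xFlat) throws Exception {
        // With numpy support enabled, jep.NDArray maps to a numpy ndarray on the Python side
        jep.set("x", new NDArray<>(xFlat, 1, 10080, m));
        jep.eval("y_hat = model.predict(x).astype(np.float32)");   // pass X and get Ŷ
        NDArray<?> result = (NDArray<?>) jep.getValue("y_hat");
        return (float[]) result.getData();                  // shape (1, m), flattened
    }

    @Override
    public void close() throws Exception {
        jep.close();
    }
}
```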
Data processing pipeline with Flink DataStream API
addSource process
countWindowAll
(custom evictor)
apply
assignTimestamps
AndWatermarks
join applywindow
...
timeline
...
X
Ŷ
Measurement
stream
Prediction
stream
+2
days
Outlier
sink
Prediction
Sink
Score
Sink
W(t) @t
Input
sink
Tumbling
EventTimeWindows
(size = interval)
Joining two streams on event time
• At a certain time t,
• Y of timestamp t is arriving
• Ŷ of timestamp t+2d is arriving
• Ŷ of timestamp t has arrived two days ago
• TumblingEventTimeWindows.of( Time.seconds(timeUnit) )
• To maintain a window for a single pair of Y and Ŷ
• A window is triggered when watermarks from both streams have arrived
Measurement stream: Y @t, watermark W(t)
Prediction stream: Ŷ @t+2d, watermark W(t+2d) (assignTimestampsAndWatermarks shifts prediction timestamps by +2 days)
join → TumblingEventTimeWindows → apply
trigger! (once watermarks from both streams have arrived)
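A sketch of this rendezvous in code, continuing the earlier skeleton; Prediction, SensorRecord, Score and the euclideanDistance/threshold logic are placeholder names.

```java
// Shift prediction timestamps from t to t+2d and emit a watermark per element
DataStream<Prediction> shifted = predictions.assignTimestampsAndWatermarks(
    new AssignerWithPunctuatedWatermarks<Prediction>() {
        @Override
        public long extractTimestamp(Prediction p, long previousTimestamp) {
            return p.getBaseTimestamp() + Time.days(2).toMilliseconds();   // t -> t+2d
        }
        @Override
        public Watermark checkAndGetNextWatermark(Prediction p, long extractedTimestamp) {
            return new Watermark(extractedTimestamp);                      // W(t+2d)
        }
    });

// One tumbling event-time window per sampling interval holds exactly one (Y, Ŷ) pair
DataStream<Score> scores = measurements
    .join(shifted)
    .where(y -> y.getTimestamp())              // Y @t
    .equalTo(p -> p.getTargetTimestamp())      // Ŷ @t (= base timestamp + 2 days)
    .window(TumblingEventTimeWindows.of(Time.seconds(timeUnit)))
    .apply(new JoinFunction<SensorRecord, Prediction, Score>() {
        @Override
        public Score join(SensorRecord y, Prediction yHat) {
            double distance = euclideanDistance(y.getValues(), yHat.getValues());
            return new Score(y.getTimestamp(), distance, distance > threshold);  // alarm flag
        }
    });
```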
Data processing pipeline with Flink DataStream API
addSource process
countWindowAll
(custom evictor)
apply
assignTimestamps
AndWatermarks
join applywindow
...
timeline
...
X
Ŷ
Measurement
stream
Prediction
stream
+2
days
Outlier
sink
Prediction
Sink
W(t) @t
Raise an alarm
if the distance between Ŷ and Y
goes beyond
a defined threshold
Input
sink
Score
Sink
Prediction
Sink
Input
sink
Data processing pipeline with Flink DataStream API
addSource process
countWindowAll
(custom evictor)
apply
assignTimestamps
AndWatermarks
join applywindow
...
timeline
...
X
Ŷ
Measurement
stream
Prediction
stream
+2
days
Outlier
sink
W(t) @t
Input, Prediction, Score sinks write records to InfluxDB
We then plot time-series using Grafana
Predicting from a single DNN is not enough!
Prediction from a single DNN vs. measurement: possibly biased prediction
Prediction from an ensemble of 10 DNNs (mean) vs. measurement: more reliable prediction!
DNN ensemble for reliable prediction
Ŷ Ŷ Ŷ
Different Convolutional LSTMs return slightly different prediction results
Ŷ
Timeline
Y
2 days
...
...
...
Raise an alarm
if the distance between the two vectors
is above a defined threshold
X (one week time-series) ...
...
Data processing pipeline with Flink DataStream API
addSource process
join applywindow
Measurement
stream
Prediction
stream
Outlier
sink
Prediction
Sink
Score
Sink
Ŷ
…
DNN ensemble
Ŷ
Ŷ
Ŷ
Input
sink
How to implement our ensemble pipeline?
...
...
...
... ...
mean
Ŷ Ŷ Ŷ
Ŷ
…
Data processing pipeline with Flink DataStream API
addSource process
join applywindow
Measurement
stream
Prediction
stream
Outlier
sink
Prediction
Sink
Score
Sink
keyBy
countWindow (custom evictor)
apply
Ŷ
…
...
...
...
Ŷ
Ŷ
Ŷ
…
…
setParallelism(ensembleSize=10)
assign
Timestamps
And
Watermarks
…
Ŷ
Ŷ
Ŷ
setParallelism(1)
...
...
...
…
DNN ensemble
Ŷ
Ŷ
Ŷ
mean
Ŷ Ŷ Ŷ
Ŷ
flatMap
…
replicate
10 times
Input
sink
windowAll (TumblingEventTimeWindows)
apply
…
... ...
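A sketch of the averaging step at the end of the ensemble: the per-DNN predictions for the same target timestamp meet in a tumbling event-time window and are reduced to their element-wise mean (Prediction and its accessors are the same placeholder type as before).

```java
DataStream<Prediction> ensembleMean = perDnnPredictions
    .windowAll(TumblingEventTimeWindows.of(Time.seconds(timeUnit)))
    .apply(new AllWindowFunction<Prediction, Prediction, TimeWindow>() {
        @Override
        public void apply(TimeWindow window, Iterable<Prediction> values,
                          Collector<Prediction> out) {
            double[] sum = null;
            int count = 0;
            long targetTimestamp = 0L;
            for (Prediction p : values) {          // up to ensembleSize = 10 predictions
                double[] yHat = p.getValues();
                if (sum == null) {
                    sum = new double[yHat.length];
                }
                for (int i = 0; i < yHat.length; i++) {
                    sum[i] += yHat[i];
                }
                targetTimestamp = p.getTargetTimestamp();
                count++;
            }
            for (int i = 0; i < sum.length; i++) {
                sum[i] /= count;                   // element-wise mean of the Ŷ vectors
            }
            out.collect(new Prediction(targetTimestamp, sum));
        }
    });
```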
Distribute 10 keys evenly over 10 partitions
Carefully generate keys so that no two keys fall into the same partition
flatMap: replicate 10 times with different keys (KEY_0, KEY_1, ..., KEY_9)
keyBy (murmurHash): KEY_0 → PARTITION_0, KEY_1 → PARTITION_1, ..., KEY_9 → PARTITION_9
PARTITION = murmurHash(KEY) % maxParallelism * parallelism / maxParallelism
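One way to precompute such keys is to reuse Flink's own key-group assignment so that each of the 10 keys provably lands on a different parallel subtask; KeyGroupRangeAssignment is Flink's internal utility, and maxParallelism is the job's configured max parallelism (128 by default). A sketch:

```java
import org.apache.flink.runtime.state.KeyGroupRangeAssignment;

/** Picks one key per parallel subtask so that keyBy spreads the 10 replicas evenly. */
public class DisjointKeys {
    public static int[] generate(int parallelism, int maxParallelism) {
        int[] keys = new int[parallelism];
        boolean[] taken = new boolean[parallelism];
        int found = 0;
        for (int candidate = 0; found < parallelism; candidate++) {
            // PARTITION = murmurHash(KEY) % maxParallelism * parallelism / maxParallelism
            int subtask = KeyGroupRangeAssignment.assignKeyToParallelOperator(
                candidate, maxParallelism, parallelism);
            if (!taken[subtask]) {
                taken[subtask] = true;
                keys[subtask] = candidate;   // keys[i] is routed to parallel subtask i
                found++;
            }
        }
        return keys;
    }
}
```

The flatMap can then emit Tuple2.of(keys[i], x) for i = 0..9, and the downstream keyBy(t -> t.f0) routes each replica to a distinct DNN instance.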
Contents
1
Why we use Flink
for our time-series prediction model
2
Flink pipeline design for
rendezvous and DNN ensemble
3
Solution packaging and monitoring
with Docker and Prometheus
TF Serving
Grafana
Flink MySQL
Prometheus InfluxDB
Time-series prediction model
A simple software stack on top of Docker
Customer machine
MySQL
official image
Grafana
official image
Prometheus
official image
TensorFlow
Serving
official image
InfluxDB
official image
Docker engine
Flink
official image
No custom Docker image!
A single yml file is enough to deploy our software stack!
Launch JobManager & TaskManager with a few changes
to the official Flink Docker image repository
You need to get
flink-metrics-prometheus-1.4-SNAPSHOT.jar
by yourself
until Flink-1.4 is officially released
metrics.reporters: prom
metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
jobmanager.heap.mb : 10240
taskmanager.heap.mb : 10240
* Every process runs inside a Docker container
Prometheus scrapes the HTTP endpoints of the metrics exporters specified in its configuration:
• Flink JobManager & TaskManagers: Flink runtime metrics & custom metrics (:9249/metrics)
• MySQLd Exporter: MySQL metrics (:9104/metrics)
• Node Exporter: system metrics (CPU/Disk/Memory/Network of the servers) (:9100/metrics)
• cAdvisor: container metrics (:8080/metrics)
Solution deployment: submits a Flink job to launch our pipeline (MySQL source, inference via TensorFlow Serving*, InfluxDB sink), everything running on Docker
Grafana: a sensor time-series dashboard (from InfluxDB) and a solution monitoring dashboard (from Prometheus)
* if using TFServing
Solution monitoring dashboard
* this dashboard is based on ”Docker Dashboard” by Brian Christner
• Server Mem/CPU/FS usage (by node exporter)
• Container CPU usage (by cAdvisor)
• Inference time from each DNN (custom metrics)
• TaskManager JVM memory usage
• # records written to sinks (custom metrics)
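The two custom metrics above can be registered through Flink's metric system, which the Prometheus reporter then exposes on :9249/metrics. A sketch for the per-DNN inference time and a prediction counter inside the window function (metric and class names are illustrative, not the talk's actual ones):

```java
import org.apache.flink.configuration.Configuration;
import org.apache.flink.metrics.Counter;
import org.apache.flink.metrics.Gauge;
import org.apache.flink.metrics.MetricGroup;
import org.apache.flink.streaming.api.functions.windowing.RichAllWindowFunction;
import org.apache.flink.streaming.api.windowing.windows.GlobalWindow;
import org.apache.flink.util.Collector;

public class DnnWindowFunction extends RichAllWindowFunction<SensorRecord, Prediction, GlobalWindow> {

    private transient Counter predictionsEmitted;
    private volatile long lastInferenceMs;

    @Override
    public void open(Configuration parameters) {
        MetricGroup group = getRuntimeContext().getMetricGroup().addGroup("pdm");
        group.gauge("inferenceTimeMs", (Gauge<Long>) () -> lastInferenceMs);  // scraped by Prometheus
        predictionsEmitted = group.counter("predictionsEmitted");
    }

    @Override
    public void apply(GlobalWindow window, Iterable<SensorRecord> values, Collector<Prediction> out) {
        long start = System.currentTimeMillis();
        Prediction yHat = runDnn(values);                 // TF Java API / JEP / TF Serving call
        lastInferenceMs = System.currentTimeMillis() - start;
        predictionsEmitted.inc();
        out.collect(yHat);
    }

    private Prediction runDnn(Iterable<SensorRecord> x) {
        // placeholder: build the 10080 x m input matrix and call the model (see earlier sketches)
        return null;
    }
}
```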
Recap – contents
1
Why we use Flink
for our time-series prediction model
2
Flink pipeline design for
rendezvous and DNN ensemble
3
Solution packaging and monitoring
with Docker and Prometheus
Time-series prediction model
TF Serving
Grafana
Flink MySQL
Prometheus InfluxDB
Conclusion
• Flink helps us concentrate on the core logic
• DataStream API reads almost like natural language when expressing streaming topologies
• flexible windowing mechanisms (count window and evictor)
• joining of two streams on event time
• Thanks to it, we can focus on
• implementation of custom source/sink to meet customer requirements
• interaction with DNN ensembles
• It has a nice ecosystem to help build a solution
• Docker
• Prometheus metric reporter
THE END
You cannot use TFServing with Flink-1.3.2
• Netty binary incompatibility
• flink-runtime_2.11:1.3.2
• depends on io.netty:4.0.27.Final
• grpc-netty:4.1.14
• depends on io.netty:4.1.14.Final
• You could use grpc-okhttp instead of grpc-netty
• grpc-okhttp conflicts with another library (influxdb client for Java)
• You can use TFServing with FLINK-1.4-SNAPSHOT
• [FLINK-7013] Add shaded netty dependency
• io.netty.* was recently relocated to org.apache.flink.shaded.netty4.*
From “New and noteworthy in 4.1” page of Netty project
4.1 contains multiple additions which might not be fully backward-compatible with 4.0
Looking forward to the official release of v1.4
No more 1.4-SNAPSHOT on the customer site!
Weitere ähnliche Inhalte

Was ist angesagt?

Simplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta LakeSimplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta LakeDatabricks
 
Plotly dash and data visualisation in Python
Plotly dash and data visualisation in PythonPlotly dash and data visualisation in Python
Plotly dash and data visualisation in PythonVolodymyr Kazantsev
 
Sources와 Sinks를 Confluent Cloud에 원활하게 연결
Sources와 Sinks를 Confluent Cloud에 원활하게 연결Sources와 Sinks를 Confluent Cloud에 원활하게 연결
Sources와 Sinks를 Confluent Cloud에 원활하게 연결confluent
 
Best Practices for Middleware and Integration Architecture Modernization with...
Best Practices for Middleware and Integration Architecture Modernization with...Best Practices for Middleware and Integration Architecture Modernization with...
Best Practices for Middleware and Integration Architecture Modernization with...Claus Ibsen
 
Chill, Distill, No Overkill: Best Practices to Stress Test Kafka with Siva Ku...
Chill, Distill, No Overkill: Best Practices to Stress Test Kafka with Siva Ku...Chill, Distill, No Overkill: Best Practices to Stress Test Kafka with Siva Ku...
Chill, Distill, No Overkill: Best Practices to Stress Test Kafka with Siva Ku...HostedbyConfluent
 
From my sql to postgresql using kafka+debezium
From my sql to postgresql using kafka+debeziumFrom my sql to postgresql using kafka+debezium
From my sql to postgresql using kafka+debeziumClement Demonchy
 
Stream processing using Kafka
Stream processing using KafkaStream processing using Kafka
Stream processing using KafkaKnoldus Inc.
 
Kafka Connect: Real-time Data Integration at Scale with Apache Kafka, Ewen Ch...
Kafka Connect: Real-time Data Integration at Scale with Apache Kafka, Ewen Ch...Kafka Connect: Real-time Data Integration at Scale with Apache Kafka, Ewen Ch...
Kafka Connect: Real-time Data Integration at Scale with Apache Kafka, Ewen Ch...confluent
 
Inside Financial Markets
Inside Financial MarketsInside Financial Markets
Inside Financial MarketsKhader Shaik
 
Stream Processing with Apache Kafka and .NET
Stream Processing with Apache Kafka and .NETStream Processing with Apache Kafka and .NET
Stream Processing with Apache Kafka and .NETconfluent
 
devops 2년차 이직 성공기.pptx
devops 2년차 이직 성공기.pptxdevops 2년차 이직 성공기.pptx
devops 2년차 이직 성공기.pptxByungho Lee
 
Capture the Streams of Database Changes
Capture the Streams of Database ChangesCapture the Streams of Database Changes
Capture the Streams of Database Changesconfluent
 
Common Patterns of Multi Data-Center Architectures with Apache Kafka
Common Patterns of Multi Data-Center Architectures with Apache KafkaCommon Patterns of Multi Data-Center Architectures with Apache Kafka
Common Patterns of Multi Data-Center Architectures with Apache Kafkaconfluent
 
SQL Extensions to Support Streaming Data With Fabian Hueske | Current 2022
SQL Extensions to Support Streaming Data With Fabian Hueske | Current 2022SQL Extensions to Support Streaming Data With Fabian Hueske | Current 2022
SQL Extensions to Support Streaming Data With Fabian Hueske | Current 2022HostedbyConfluent
 
Room 3 - 7 - Nguyễn Như Phúc Huy - Vitastor: a fast and simple Ceph-like bloc...
Room 3 - 7 - Nguyễn Như Phúc Huy - Vitastor: a fast and simple Ceph-like bloc...Room 3 - 7 - Nguyễn Như Phúc Huy - Vitastor: a fast and simple Ceph-like bloc...
Room 3 - 7 - Nguyễn Như Phúc Huy - Vitastor: a fast and simple Ceph-like bloc...Vietnam Open Infrastructure User Group
 
Apache Flink @ NYC Flink Meetup
Apache Flink @ NYC Flink MeetupApache Flink @ NYC Flink Meetup
Apache Flink @ NYC Flink MeetupStephan Ewen
 
Autoscaled Distributed Automation using AWS at Selenium London MeetUp
Autoscaled Distributed Automation using AWS at Selenium London MeetUpAutoscaled Distributed Automation using AWS at Selenium London MeetUp
Autoscaled Distributed Automation using AWS at Selenium London MeetUparagavan
 
Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...
Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...
Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...DataWorks Summit/Hadoop Summit
 
Breaking the Monolith
Breaking the MonolithBreaking the Monolith
Breaking the MonolithVMware Tanzu
 

Was ist angesagt? (20)

Simplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta LakeSimplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta Lake
 
Plotly dash and data visualisation in Python
Plotly dash and data visualisation in PythonPlotly dash and data visualisation in Python
Plotly dash and data visualisation in Python
 
Sources와 Sinks를 Confluent Cloud에 원활하게 연결
Sources와 Sinks를 Confluent Cloud에 원활하게 연결Sources와 Sinks를 Confluent Cloud에 원활하게 연결
Sources와 Sinks를 Confluent Cloud에 원활하게 연결
 
Best Practices for Middleware and Integration Architecture Modernization with...
Best Practices for Middleware and Integration Architecture Modernization with...Best Practices for Middleware and Integration Architecture Modernization with...
Best Practices for Middleware and Integration Architecture Modernization with...
 
Chill, Distill, No Overkill: Best Practices to Stress Test Kafka with Siva Ku...
Chill, Distill, No Overkill: Best Practices to Stress Test Kafka with Siva Ku...Chill, Distill, No Overkill: Best Practices to Stress Test Kafka with Siva Ku...
Chill, Distill, No Overkill: Best Practices to Stress Test Kafka with Siva Ku...
 
Flink vs. Spark
Flink vs. SparkFlink vs. Spark
Flink vs. Spark
 
From my sql to postgresql using kafka+debezium
From my sql to postgresql using kafka+debeziumFrom my sql to postgresql using kafka+debezium
From my sql to postgresql using kafka+debezium
 
Stream processing using Kafka
Stream processing using KafkaStream processing using Kafka
Stream processing using Kafka
 
Kafka Connect: Real-time Data Integration at Scale with Apache Kafka, Ewen Ch...
Kafka Connect: Real-time Data Integration at Scale with Apache Kafka, Ewen Ch...Kafka Connect: Real-time Data Integration at Scale with Apache Kafka, Ewen Ch...
Kafka Connect: Real-time Data Integration at Scale with Apache Kafka, Ewen Ch...
 
Inside Financial Markets
Inside Financial MarketsInside Financial Markets
Inside Financial Markets
 
Stream Processing with Apache Kafka and .NET
Stream Processing with Apache Kafka and .NETStream Processing with Apache Kafka and .NET
Stream Processing with Apache Kafka and .NET
 
devops 2년차 이직 성공기.pptx
devops 2년차 이직 성공기.pptxdevops 2년차 이직 성공기.pptx
devops 2년차 이직 성공기.pptx
 
Capture the Streams of Database Changes
Capture the Streams of Database ChangesCapture the Streams of Database Changes
Capture the Streams of Database Changes
 
Common Patterns of Multi Data-Center Architectures with Apache Kafka
Common Patterns of Multi Data-Center Architectures with Apache KafkaCommon Patterns of Multi Data-Center Architectures with Apache Kafka
Common Patterns of Multi Data-Center Architectures with Apache Kafka
 
SQL Extensions to Support Streaming Data With Fabian Hueske | Current 2022
SQL Extensions to Support Streaming Data With Fabian Hueske | Current 2022SQL Extensions to Support Streaming Data With Fabian Hueske | Current 2022
SQL Extensions to Support Streaming Data With Fabian Hueske | Current 2022
 
Room 3 - 7 - Nguyễn Như Phúc Huy - Vitastor: a fast and simple Ceph-like bloc...
Room 3 - 7 - Nguyễn Như Phúc Huy - Vitastor: a fast and simple Ceph-like bloc...Room 3 - 7 - Nguyễn Như Phúc Huy - Vitastor: a fast and simple Ceph-like bloc...
Room 3 - 7 - Nguyễn Như Phúc Huy - Vitastor: a fast and simple Ceph-like bloc...
 
Apache Flink @ NYC Flink Meetup
Apache Flink @ NYC Flink MeetupApache Flink @ NYC Flink Meetup
Apache Flink @ NYC Flink Meetup
 
Autoscaled Distributed Automation using AWS at Selenium London MeetUp
Autoscaled Distributed Automation using AWS at Selenium London MeetUpAutoscaled Distributed Automation using AWS at Selenium London MeetUp
Autoscaled Distributed Automation using AWS at Selenium London MeetUp
 
Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...
Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...
Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...
 
Breaking the Monolith
Breaking the MonolithBreaking the Monolith
Breaking the Monolith
 

Ähnlich wie Flink Forward Berlin 2017: Dongwon Kim - Predictive Maintenance with Apache Flink

Predictive Maintenance with Deep Learning and Apache Flink
Predictive Maintenance with Deep Learning and Apache FlinkPredictive Maintenance with Deep Learning and Apache Flink
Predictive Maintenance with Deep Learning and Apache FlinkDongwon Kim
 
Flink 0.10 @ Bay Area Meetup (October 2015)
Flink 0.10 @ Bay Area Meetup (October 2015)Flink 0.10 @ Bay Area Meetup (October 2015)
Flink 0.10 @ Bay Area Meetup (October 2015)Stephan Ewen
 
BDA403 How Netflix Monitors Applications in Real-time with Amazon Kinesis
BDA403 How Netflix Monitors Applications in Real-time with Amazon KinesisBDA403 How Netflix Monitors Applications in Real-time with Amazon Kinesis
BDA403 How Netflix Monitors Applications in Real-time with Amazon KinesisAmazon Web Services
 
Unifying Stream, SWL and CEP for Declarative Stream Processing with Apache Flink
Unifying Stream, SWL and CEP for Declarative Stream Processing with Apache FlinkUnifying Stream, SWL and CEP for Declarative Stream Processing with Apache Flink
Unifying Stream, SWL and CEP for Declarative Stream Processing with Apache FlinkDataWorks Summit/Hadoop Summit
 
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch Analysis
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch AnalysisNoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch Analysis
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch AnalysisHelena Edelson
 
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan EwenAdvanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewenconfluent
 
Flink Streaming Hadoop Summit San Jose
Flink Streaming Hadoop Summit San JoseFlink Streaming Hadoop Summit San Jose
Flink Streaming Hadoop Summit San JoseKostas Tzoumas
 
Data Stream Analytics - Why they are important
Data Stream Analytics - Why they are importantData Stream Analytics - Why they are important
Data Stream Analytics - Why they are importantParis Carbone
 
How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...
How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...
How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...Amazon Web Services
 
Data Transformations on Ops Metrics using Kafka Streams (Srividhya Ramachandr...
Data Transformations on Ops Metrics using Kafka Streams (Srividhya Ramachandr...Data Transformations on Ops Metrics using Kafka Streams (Srividhya Ramachandr...
Data Transformations on Ops Metrics using Kafka Streams (Srividhya Ramachandr...confluent
 
Stream Processing with Flink and Stream Sharing
Stream Processing with Flink and Stream SharingStream Processing with Flink and Stream Sharing
Stream Processing with Flink and Stream Sharingconfluent
 
Chronix: Long Term Storage and Retrieval Technology for Anomaly Detection in ...
Chronix: Long Term Storage and Retrieval Technology for Anomaly Detection in ...Chronix: Long Term Storage and Retrieval Technology for Anomaly Detection in ...
Chronix: Long Term Storage and Retrieval Technology for Anomaly Detection in ...Florian Lautenschlager
 
strata_spark_streaming.ppt
strata_spark_streaming.pptstrata_spark_streaming.ppt
strata_spark_streaming.pptrveiga100
 
Chronix as Long-Term Storage for Prometheus
Chronix as Long-Term Storage for PrometheusChronix as Long-Term Storage for Prometheus
Chronix as Long-Term Storage for PrometheusQAware GmbH
 
Streaming data analytics (Kinesis, EMR/Spark) - Pop-up Loft Tel Aviv
Streaming data analytics (Kinesis, EMR/Spark) - Pop-up Loft Tel Aviv Streaming data analytics (Kinesis, EMR/Spark) - Pop-up Loft Tel Aviv
Streaming data analytics (Kinesis, EMR/Spark) - Pop-up Loft Tel Aviv Amazon Web Services
 

Ähnlich wie Flink Forward Berlin 2017: Dongwon Kim - Predictive Maintenance with Apache Flink (20)

Predictive Maintenance with Deep Learning and Apache Flink
Predictive Maintenance with Deep Learning and Apache FlinkPredictive Maintenance with Deep Learning and Apache Flink
Predictive Maintenance with Deep Learning and Apache Flink
 
Flink 0.10 @ Bay Area Meetup (October 2015)
Flink 0.10 @ Bay Area Meetup (October 2015)Flink 0.10 @ Bay Area Meetup (October 2015)
Flink 0.10 @ Bay Area Meetup (October 2015)
 
So you think you can stream.pptx
So you think you can stream.pptxSo you think you can stream.pptx
So you think you can stream.pptx
 
A Brief History of Stream Processing
A Brief History of Stream ProcessingA Brief History of Stream Processing
A Brief History of Stream Processing
 
BDA403 How Netflix Monitors Applications in Real-time with Amazon Kinesis
BDA403 How Netflix Monitors Applications in Real-time with Amazon KinesisBDA403 How Netflix Monitors Applications in Real-time with Amazon Kinesis
BDA403 How Netflix Monitors Applications in Real-time with Amazon Kinesis
 
Unifying Stream, SWL and CEP for Declarative Stream Processing with Apache Flink
Unifying Stream, SWL and CEP for Declarative Stream Processing with Apache FlinkUnifying Stream, SWL and CEP for Declarative Stream Processing with Apache Flink
Unifying Stream, SWL and CEP for Declarative Stream Processing with Apache Flink
 
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch Analysis
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch AnalysisNoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch Analysis
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch Analysis
 
Influx data basic
Influx data basicInflux data basic
Influx data basic
 
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan EwenAdvanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen
 
Telegraph Cq English
Telegraph Cq EnglishTelegraph Cq English
Telegraph Cq English
 
Flink Streaming Hadoop Summit San Jose
Flink Streaming Hadoop Summit San JoseFlink Streaming Hadoop Summit San Jose
Flink Streaming Hadoop Summit San Jose
 
From Trill to Quill and Beyond
From Trill to Quill and BeyondFrom Trill to Quill and Beyond
From Trill to Quill and Beyond
 
Data Stream Analytics - Why they are important
Data Stream Analytics - Why they are importantData Stream Analytics - Why they are important
Data Stream Analytics - Why they are important
 
How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...
How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...
How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...
 
Data Transformations on Ops Metrics using Kafka Streams (Srividhya Ramachandr...
Data Transformations on Ops Metrics using Kafka Streams (Srividhya Ramachandr...Data Transformations on Ops Metrics using Kafka Streams (Srividhya Ramachandr...
Data Transformations on Ops Metrics using Kafka Streams (Srividhya Ramachandr...
 
Stream Processing with Flink and Stream Sharing
Stream Processing with Flink and Stream SharingStream Processing with Flink and Stream Sharing
Stream Processing with Flink and Stream Sharing
 
Chronix: Long Term Storage and Retrieval Technology for Anomaly Detection in ...
Chronix: Long Term Storage and Retrieval Technology for Anomaly Detection in ...Chronix: Long Term Storage and Retrieval Technology for Anomaly Detection in ...
Chronix: Long Term Storage and Retrieval Technology for Anomaly Detection in ...
 
strata_spark_streaming.ppt
strata_spark_streaming.pptstrata_spark_streaming.ppt
strata_spark_streaming.ppt
 
Chronix as Long-Term Storage for Prometheus
Chronix as Long-Term Storage for PrometheusChronix as Long-Term Storage for Prometheus
Chronix as Long-Term Storage for Prometheus
 
Streaming data analytics (Kinesis, EMR/Spark) - Pop-up Loft Tel Aviv
Streaming data analytics (Kinesis, EMR/Spark) - Pop-up Loft Tel Aviv Streaming data analytics (Kinesis, EMR/Spark) - Pop-up Loft Tel Aviv
Streaming data analytics (Kinesis, EMR/Spark) - Pop-up Loft Tel Aviv
 

Mehr von Flink Forward

Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Flink Forward
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkFlink Forward
 
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...Flink Forward
 
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Flink Forward
 
Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorFlink Forward
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeFlink Forward
 
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Flink Forward
 
One sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async SinkOne sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async SinkFlink Forward
 
Tuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxTuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxFlink Forward
 
Flink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink Forward
 
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraFlink Forward
 
Where is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkWhere is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkFlink Forward
 
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentUsing the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentFlink Forward
 
The Current State of Table API in 2022
The Current State of Table API in 2022The Current State of Table API in 2022
The Current State of Table API in 2022Flink Forward
 
Flink SQL on Pulsar made easy
Flink SQL on Pulsar made easyFlink SQL on Pulsar made easy
Flink SQL on Pulsar made easyFlink Forward
 
Dynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsDynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsFlink Forward
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotFlink Forward
 
Processing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial ServicesProcessing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial ServicesFlink Forward
 
Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Flink Forward
 
Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergFlink Forward
 

Mehr von Flink Forward (20)

Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in Flink
 
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
 
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
 
Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes Operator
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive Mode
 
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
 
One sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async SinkOne sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async Sink
 
Tuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxTuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptx
 
Flink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink powered stream processing platform at Pinterest
Flink powered stream processing platform at Pinterest
 
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native Era
 
Where is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkWhere is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in Flink
 
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentUsing the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production Deployment
 
The Current State of Table API in 2022
The Current State of Table API in 2022The Current State of Table API in 2022
The Current State of Table API in 2022
 
Flink SQL on Pulsar made easy
Flink SQL on Pulsar made easyFlink SQL on Pulsar made easy
Flink SQL on Pulsar made easy
 
Dynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsDynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data Alerts
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
 
Processing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial ServicesProcessing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial Services
 
Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...
 
Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & Iceberg
 

Kürzlich hochgeladen

毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...
毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...
毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...ttt fff
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...Amil Baba Dawood bangali
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectBoston Institute of Analytics
 
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一F La
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档208367051
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理e4aez8ss
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryJeremy Anderson
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesTimothy Spann
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanMYRABACSAFRA2
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPTBoston Institute of Analytics
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfBoston Institute of Analytics
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改yuu sss
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024Susanna-Assunta Sansone
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our WorldEduminds Learning
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Cantervoginip
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Colleen Farrelly
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max PrincetonTimothy Spann
 
MK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxMK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxUnduhUnggah1
 

Kürzlich hochgeladen (20)

毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...
毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...
毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis Project
 
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population Mean
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our World
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max Princeton
 
MK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxMK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docx
 

Flink Forward Berlin 2017: Dongwon Kim - Predictive Maintenance with Apache Flink

  • 1. Predictive Maintenance with Deep Learning and Flink . Dongwon Kim, PhD Solution R&D center SK Telecom
  • 2. • Big Data processing engines • MapReduce, Tez, Spark, Flink, Hive, Presto • Recent interest • Deep learning model serving (TensorFlow serving) • Containerization (Docker, Kubernetes) • Time-series data (InfluxDB, Prometheus, Grafana) • Flink Forward 2015 Berlin • A Comparative Performance Evaluation of Flink About me (@eastcirclek) Covered in this talk
  • 3. Refinery and semiconductor companies depend on equipment Breakdown of equipment largely affects company profit
  • 4. Equipment maintenance to minimize breakdown Breakdown Planned Maintenance Shutdown equipment on a regular basis for parts replacement, cleaning, adjustments, etc. Time 2015.07 2016.07 2017.07 Predictive Maintenance (PdM) Unplanned maintenance based on prediction of equipment condition Predictive Maintenance Planned Maintenance ... <equipment sensors> <predictive maintenance system>
  • 5. Our approach to Predictive Maintenance Better prediction of equipment condition using Deep Learning PdM Planned Maintenance Breakdown PdM Planned Maintenance Breakdown Machine Learning & Statistics Machine learning & Statistics + Deep Learning
  • 6. Contents 1 Why we use Flink for our time-series prediction model 2 Flink pipeline design for rendezvous and DNN ensemble 3 Solution packaging and monitoring with Docker and Prometheus Time-series prediction model TF Serving Grafana Flink MySQL PrometheusInfluxDB
  • 7. Role Toolbox My team consists of two groups * The diagram is based on Ted Dunning’s slides (Flink Forward 2017 SF) Data Engineers Model Developers * history data training DNN model * live data prediction & alarm DNN model
  • 8. Model developers give Convolutional LSTM to engineers CNN RNN (LSTM) ... ... Output (Ŷ) expected sensor values after 2 days U W3 W3 U W3 O U W2 W1 Ŷ Input (X) 1 week time-series (10080 records) Model developer It does not return whether the equipment breaks down after 2 days ... * assuming one minute sampling rate
  • 9. Data engineers apply Convolutional LSTM to live sensor data Sensors Vector y1 y2 y3 ym ...... Equipment We have multi-sensor data
  • 10. Data engineers apply Convolutional LSTM to live sensor data Multi-sensor data arrive at a fixed interval of 1 minute Sensors Vector y1 y2 y3 ym ...... Equipment ... Timeline
  • 11. ... Data engineers apply Convolutional LSTM to live sensor data X (10080 records)X (10080 records)X (10080 records)X (10080 records) We maintain a count window over the latest 10080 records It slides as a new record arrives (a sliding count window) Timeline Vector y1 y2 y3 ym ... Sensors ... Equipment
  • 12. Data engineers apply Convolutional LSTM to live sensor data 2 days ... Y y1 y2 y3 ym ... Ŷ ŷ1 ŷ2 ŷ3 ŷm ... Given one week time-series (X), DNN returns predicted values after 2 days (Ŷ) Ŷ Raise an alarm if the distance of two vectors is above a defined threshold ...U W3 W3 U W3 U W2 W1 O ... CNN RNN (LSTM) X (10080 records) Timeline Whenever the sliding window moves, we apply Convolutional LSTM to ....
  • 13. Stream source Sliding count window SinkScoreJoin Desired streaming topology by data engineers Apply DNN to X Ŷ ... timeline ... X Requirement 1 Count window to maintain 10080 records instead of 1 week event-time window Requirement 2 Joining of two streams on event time (Rendezvous) Prediction stream Ŷ Outlier filter Y Measurement stream
  • 14. Proof-of-Concept of Spark Structured Streaming Types Score Streaming Dataset Input Streaming Dataset Prediction Streaming Dataset joining of two streams apply DNN to 1 week time-series, not 10080 records (Sliding count window is not supported) generate an input stream from local files
  • 15. Inner join between two streaming Datasets is not supported in Spark Structured Streaming Score Dataset
  • 16. Unsupported Operations on Spark Structured Streaming (v2.2) Requirement 1 Count window to maintain 10080 records instead of 1 week event-time window Requirement 2 Joining of two streams on event time (Rendezvous)
  • 17. That’s why we move to Flink DataStream API • Sliding count window : not supported • Joining of two streams : not supported • Micro-batch behind the scene • Continuous processing proposed in SPARK-20928 • Sliding count window : supported • Joining of two streams : supported • Scalability and performance proved by other use cases Spark Structured Streaming Flink DataStream API * it could be possible to use our Convolutional LSTM model using Spark Structured Stream in some other way
  • 18. Data processing pipeline with Flink DataStream API addSource process countWindowAll (custom evictor) apply assignTimestamps AndWatermarks join applywindow ... timeline ... X Ŷ Measurement stream Prediction stream +2 days Outlier sink Prediction Sink Score Sink W(t) @t Input Sink
  • 19. Flink can faithfully implement our streaming topology design Stream source Sliding count window SinkScoreJoin Apply DNN to X Ŷ ... ... X Y Ŷ Outlier filter <Topology design> <Flink implementation> addSource process countWindowAll (custom evictor) apply assignTimestampsAndW atermarks join applywindow ... ... Ŷ
  • 20. Contents 1 Why we use Flink for our time-series prediction model 2 Flink pipeline design for rendezvous and DNN ensemble 3 Solution packaging and monitoring with Docker and Prometheus Time-series prediction model TF Serving Grafana Flink MySQL PrometheusInfluxDB
  • 21. Data processing pipeline with the Flink DataStream API: addSource → process → countWindowAll (custom evictor) → apply → assignTimestampsAndWatermarks (+2 days) → join → window → apply, with Input, Outlier, Prediction, and Score sinks. We read data from MySQL in the addSource step.
  • 22. Stateful custom source to read from MySQL. We assume that sensor data arrive in order. The source issues "SELECT timestamp, measured FROM input WHERE timestamp > $lastTimestamp" over a JDBC connection against the input table in MySQL (columns: timestamp, measured; e.g. 2017-9-13 11:16 → [y1, y2, ..., ym]), emits each input record together with a watermark of the same time (e.g. W(11:16) at 11:16), and increases lastTimestamp afterward (11:15 → 11:16). Exactly-once semantics: lastTimestamp is stored when taking a snapshot and restored when the job is restarted. A sketch of such a source follows below.
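A minimal sketch of such a source, assuming a hypothetical SensorRecord POJO with a static parse() helper, epoch-millisecond timestamps, and placeholder connection settings; ListCheckpointed keeps lastTimestamp across restarts:

```java
import org.apache.flink.streaming.api.checkpoint.ListCheckpointed;
import org.apache.flink.streaming.api.functions.source.RichSourceFunction;
import org.apache.flink.streaming.api.watermark.Watermark;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.Collections;
import java.util.List;

public class MySqlSource extends RichSourceFunction<SensorRecord>
    implements ListCheckpointed<Long> {

  private volatile boolean running = true;
  private long lastTimestamp = 0L;   // restored from checkpointed state on recovery

  @Override
  public void run(SourceContext<SensorRecord> ctx) throws Exception {
    // connection URL and credentials are placeholders
    try (Connection conn = DriverManager.getConnection("jdbc:mysql://mysql:3306/pdm", "user", "pw")) {
      while (running) {
        try (PreparedStatement stmt = conn.prepareStatement(
                 "SELECT timestamp, measured FROM input WHERE timestamp > ? ORDER BY timestamp")) {
          stmt.setLong(1, lastTimestamp);   // timestamps assumed to be epoch milliseconds here
          try (ResultSet rs = stmt.executeQuery()) {
            while (rs.next()) {
              long ts = rs.getLong("timestamp");
              SensorRecord record = SensorRecord.parse(ts, rs.getString("measured"));
              // keep record emission, watermark emission, and state update atomic w.r.t. checkpoints
              synchronized (ctx.getCheckpointLock()) {
                ctx.collectWithTimestamp(record, ts);
                ctx.emitWatermark(new Watermark(ts));   // data is assumed to arrive in order
                lastTimestamp = ts;
              }
            }
          }
        }
        Thread.sleep(1000);   // poll for new rows once per second
      }
    }
  }

  @Override
  public void cancel() {
    running = false;
  }

  @Override
  public List<Long> snapshotState(long checkpointId, long checkpointTimestamp) {
    return Collections.singletonList(lastTimestamp);   // store lastTimestamp in the snapshot
  }

  @Override
  public void restoreState(List<Long> state) {
    for (Long ts : state) {
      lastTimestamp = ts;                              // restore lastTimestamp when restarted
    }
  }
}
```

Because the record, its watermark, and the lastTimestamp update happen under the checkpoint lock, a restored job re-issues the query from the last emitted timestamp and rows are neither skipped nor emitted twice.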
  • 23. Data processing pipeline with the Flink DataStream API. The process step filters out outliers and emits them to a side output (see the sketch below); the countWindowAll step maintains the last N elements. An event-time window of 1 week cannot guarantee 10080 records, as data can be missing or filtered, so we define a custom evictor.
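A minimal sketch of the outlier side output, assuming a hypothetical SensorRecord type with an isOutlier() check (the actual outlier rule is not part of the slides):

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;

public class OutlierFilter {

  // anonymous subclass keeps the element type information for the side output
  public static final OutputTag<SensorRecord> OUTLIERS =
      new OutputTag<SensorRecord>("outliers") {};

  public static SingleOutputStreamOperator<SensorRecord> filter(DataStream<SensorRecord> in) {
    return in.process(new ProcessFunction<SensorRecord, SensorRecord>() {
      @Override
      public void processElement(SensorRecord value, Context ctx, Collector<SensorRecord> out) {
        if (value.isOutlier()) {
          ctx.output(OUTLIERS, value);   // side output -> outlier sink
        } else {
          out.collect(value);            // main output -> sliding count window
        }
      }
    });
  }
}

// Usage:
//   SingleOutputStreamOperator<SensorRecord> clean = OutlierFilter.filter(measurements);
//   clean.getSideOutput(OutlierFilter.OUTLIERS).addSink(new OutlierSink());
```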
  • 24. What if data is absent or filtered out for a long period of time (e.g. 3 missing days)? The time-series before and after the gap look totally different, so we'd better start a new sliding window for the time-series.
  • 25. How to start a new sliding count window after a long break. CountTrigger.of(1) fires every time a record comes in, and the EvictingWindowOperator adds each new input record to InternalListState. With a sliding count window of size 3, CountEvictor.of(3) only evicts elements beyond its capacity, so after records at timestamps 1-4 and a break until timestamp 9 it still keeps {3, 4, 9}. Our CustomEvictor.of(3, timeThreshold=4) instead evicts all but the last element when that element arrives more than timeThreshold after the previous one, i.e. we start a new window {9} after 4 missing timestamps. A sketch of such an evictor follows below.
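A minimal sketch of such a gap-aware evictor, assuming timestamps in milliseconds and the GlobalWindow produced by a count window; the class name and constructor are our own, not the talk's CustomEvictor.of(...) factory:

```java
import org.apache.flink.streaming.api.windowing.evictors.Evictor;
import org.apache.flink.streaming.api.windowing.windows.GlobalWindow;
import org.apache.flink.streaming.runtime.operators.windowing.TimestampedValue;

import java.util.Iterator;

public class GapAwareCountEvictor<T> implements Evictor<T, GlobalWindow> {

  private final long maxCount;      // size of the sliding count window (e.g. 10080)
  private final long maxGapMillis;  // break after which the window restarts

  public GapAwareCountEvictor(long maxCount, long maxGapMillis) {
    this.maxCount = maxCount;
    this.maxGapMillis = maxGapMillis;
  }

  @Override
  public void evictBefore(Iterable<TimestampedValue<T>> elements, int size,
                          GlobalWindow window, EvictorContext ctx) {
    // find the timestamps of the newest and second-newest elements to detect a long break
    long last = Long.MIN_VALUE, secondLast = Long.MIN_VALUE;
    for (TimestampedValue<T> e : elements) {
      secondLast = last;
      last = e.getTimestamp();
    }
    boolean longBreak = secondLast != Long.MIN_VALUE && (last - secondLast) > maxGapMillis;

    int toEvict = longBreak
        ? size - 1                                // keep only the newest element (restart)
        : (int) Math.max(0, size - maxCount);     // plain count-based eviction otherwise

    Iterator<TimestampedValue<T>> it = elements.iterator();
    for (int evicted = 0; evicted < toEvict && it.hasNext(); evicted++) {
      it.next();
      it.remove();
    }
  }

  @Override
  public void evictAfter(Iterable<TimestampedValue<T>> elements, int size,
                         GlobalWindow window, EvictorContext ctx) {
    // no-op: eviction happens before the window function is applied
  }
}
```

It would be attached as `.countWindowAll(10080, 1).evictor(new GapAwareCountEvictor<...>(10080, maxGapMillis))`, replacing the default CountEvictor while keeping CountTrigger.of(1) firing on every record.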
  • 26. Data processing pipeline with the Flink DataStream API: addSource → process → countWindowAll (custom evictor) → apply → assignTimestampsAndWatermarks (+2 days) → join → window → apply, with Input, Outlier, Prediction, and Score sinks. Next up: the apply step, which runs the DNN.
  • 27. Working with model developers. They stick to Python and develop models using a Python library called Keras. Model developers: "We use Keras on Python! I don't want to use Deeplearning4J because that's Java…" Data engineer: "I want to develop our solution on the JVM! Why don't we develop models using Deeplearning4J?"
  • 28. How do we load Keras models in Flink? Flink has no built-in way to host them!
  • 29. Loading Keras models in JVM • Convert Keras models to TensorFlow SavedModel • use tensorflow.python.saved_model.builder.SavedModelBuilder • TensorFlow Java API (Flink TensorFlow) • Do inference inside the JVM process • TensorFlow Serving • Do inference outside the JVM process • Execute Keras through CPython inside JVM • Do inference inside the JVM process • Java Embedded Python (JEP) to ease the use of CPython • https://github.com/ninia/jep • Use KerasModelImport from Deeplearning4J • Not mature enough
  • 30. Comparison of approaches to use Keras models in the JVM: (1) TensorFlow Java API: the Keras model is exported as a SavedModel and loaded via the TensorFlow native library and its very thin Java wrapper inside the TaskManager process, within a RichWindowFunction. (2) TensorFlow Serving: the SavedModel is served by a separate TFServing process (its DynamicManager/Loader handle SavedModel versions v1, v2, ...), and the RichWindowFunction sends X and receives Ŷ via a gRPC client. (3) Java Embedded Python (JEP): CPython runs inside the TaskManager process and executes Python commands (import keras, load the model & weights, pass X and get Ŷ), with JEP bridging Java objects and native code.
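As an illustration of the first option, here is a minimal sketch of loading a SavedModel with the TensorFlow Java API, assuming an artifact with the generic Tensor API; the tensor names "input" and "output", the shapes, and the class itself are assumptions, and the real names come from the signature written by SavedModelBuilder:

```java
import org.tensorflow.SavedModelBundle;
import org.tensorflow.Tensor;

import java.util.List;

public class SavedModelScorer implements AutoCloseable {

  private final SavedModelBundle bundle;

  public SavedModelScorer(String exportDir) {
    // "serve" is the standard tag written by SavedModelBuilder for serving graphs
    this.bundle = SavedModelBundle.load(exportDir, "serve");
  }

  /** X: one week of records, shape [10080][m]; returns Ŷ, a vector of length m. */
  public float[] predict(float[][] x) {
    try (Tensor<?> input = Tensor.create(new float[][][] { x })) {   // batch of size 1
      List<Tensor<?>> outputs = bundle.session().runner()
          .feed("input", input)    // tensor names are assumptions; check the SavedModel signature
          .fetch("output")
          .run();
      try (Tensor<?> result = outputs.get(0)) {
        float[][] yHat = new float[1][x[0].length];   // assumed output shape [1, m]
        result.copyTo(yHat);
        return yHat[0];
      }
    }
  }

  @Override
  public void close() {
    bundle.close();
  }
}
```

In a Flink job, such a scorer would typically be created in the open() method of a RichWindowFunction so the model is loaded once per parallel task rather than once per window firing.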
  • 31. Comparison of runtime inference performance TensorFlow Java API 77.7 milliseconds per inference TensorFlow Serving 71.2 milliseconds per inference Keras inside CPython w/ TensorFlow backend 32 milliseconds per inference (* Theano backend is extremely slow in our case) (* We do not batch inference calls)
  • 32. Data processing pipeline with the Flink DataStream API. For the join, the prediction stream's timestamps are shifted by +2 days and both streams are windowed with TumblingEventTimeWindows (size = sampling interval).
  • 33. Joining two streams on event time. At a certain time t, Y with timestamp t is arriving on the measurement stream, Ŷ with timestamp t+2d is arriving on the prediction stream, and the Ŷ with timestamp t arrived two days ago. We use TumblingEventTimeWindows.of(Time.seconds(timeUnit)) to maintain a window for a single pair of Y and Ŷ; the window is triggered when watermarks W(t) from both streams have arrived. A sketch of the join follows below.
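A minimal sketch of this rendezvous, assuming hypothetical SensorRecord / Prediction / Score types whose timestamps are already aligned to the sampling interval (Distance.euclidean is sketched after the next slide):

```java
import org.apache.flink.api.common.functions.JoinFunction;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class Rendezvous {

  public static DataStream<Score> join(DataStream<SensorRecord> measurements,
                                       DataStream<Prediction> predictions) {
    return measurements
        .join(predictions)
        .where(new KeySelector<SensorRecord, Long>() {
          @Override
          public Long getKey(SensorRecord y) {
            return y.getTimestamp();
          }
        })
        .equalTo(new KeySelector<Prediction, Long>() {
          @Override
          public Long getKey(Prediction yHat) {
            return yHat.getTimestamp();
          }
        })
        // one window per sampling interval, so it holds exactly one (Y, Ŷ) pair;
        // it fires once watermarks from both streams have passed the window end
        .window(TumblingEventTimeWindows.of(Time.minutes(1)))
        .apply(new JoinFunction<SensorRecord, Prediction, Score>() {
          @Override
          public Score join(SensorRecord y, Prediction yHat) {
            // Distance.euclidean is a hypothetical helper, sketched after the next slide
            return new Score(y.getTimestamp(), Distance.euclidean(y.values(), yHat.values()));
          }
        });
  }
}
```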
  • 34. Data processing pipeline with the Flink DataStream API. The final apply computes the score: raise an alarm if the distance between Ŷ and Y goes beyond a defined threshold (see the sketch below).
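A minimal sketch of the scoring step; the Euclidean distance and the alarm helper are our assumptions, since the slides only say "distance" and "a defined threshold":

```java
public final class Distance {

  private Distance() {}

  /** Euclidean distance between the measured vector Y and the predicted vector Ŷ. */
  public static double euclidean(double[] y, double[] yHat) {
    double sum = 0.0;
    for (int i = 0; i < y.length; i++) {
      double d = y[i] - yHat[i];
      sum += d * d;
    }
    return Math.sqrt(sum);
  }

  /** True if an alarm should be raised for this pair of vectors. */
  public static boolean raiseAlarm(double[] y, double[] yHat, double threshold) {
    return euclidean(y, yHat) > threshold;
  }
}
```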
  • 35. Data processing pipeline with the Flink DataStream API. The Input, Prediction, and Score sinks (plus the Outlier sink) write the results out.
  • 36. The Input, Prediction, and Score sinks write records to InfluxDB; we then plot the time-series using Grafana. A sink sketch follows below.
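A minimal sketch of one of these sinks, assuming the influxdb-java client and a hypothetical Score type (epoch-millisecond timestamp plus one value); the URL, credentials, database, and retention policy are placeholders:

```java
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
import org.influxdb.InfluxDB;
import org.influxdb.InfluxDBFactory;
import org.influxdb.dto.Point;

import java.util.concurrent.TimeUnit;

public class ScoreInfluxSink extends RichSinkFunction<Score> {

  private transient InfluxDB influxDB;

  @Override
  public void open(Configuration parameters) {
    // URL and credentials are placeholders; "influxdb" would be the Docker service name here
    influxDB = InfluxDBFactory.connect("http://influxdb:8086", "admin", "admin");
  }

  @Override
  public void invoke(Score score) {
    Point point = Point.measurement("score")               // one measurement per sink
        .time(score.getTimestamp(), TimeUnit.MILLISECONDS)
        .addField("value", score.getValue())
        .build();
    influxDB.write("pdm", "autogen", point);               // database & retention policy are assumptions
  }

  @Override
  public void close() {
    // release the client here if the influxdb-java version in use provides a close/shutdown method
  }
}
```

The Input and Prediction sinks look the same, just with different measurements and one field per sensor.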
  • 37. Predicting from a single DNN is not enough! A prediction from a single DNN can be biased with respect to the measurement, whereas a prediction from an ensemble of 10 DNNs is more reliable.
  • 38. DNN ensemble for reliable prediction. Different Convolutional LSTMs return slightly different prediction results Ŷ for the same X (one-week time-series), so we take the mean of the ensemble's predictions and raise an alarm if the distance between this mean Ŷ and the measurement Y (2 days later) is above a defined threshold.
  • 39. Data processing pipeline with the Flink DataStream API: how do we implement our ensemble pipeline, where the DNN ensemble produces 10 predictions Ŷ that are averaged before the join?
  • 40. Data processing pipeline with the Flink DataStream API. For the ensemble, a flatMap replicates each record 10 times, keyBy partitions the replicas, a count window (custom evictor) per key feeds apply with setParallelism(ensembleSize=10) so that each parallel instance runs its own DNN, and after assignTimestampsAndWatermarks a TumblingEventTimeWindow over windowAll/apply with setParallelism(1) computes the mean Ŷ, which then flows into the join as before. A sketch follows below.
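A minimal sketch of this fan-out/fan-in, again with hypothetical types and helpers (SensorRecord, Prediction, DnnWindowFunction, MeanOfPredictions, ShiftTwoDays, GapAwareCountEvictor) and with ensembleKeys chosen as described on the next slide:

```java
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;

public class EnsemblePipeline {

  public static DataStream<Prediction> ensemble(DataStream<SensorRecord> filtered,
                                                int[] ensembleKeys, long maxGapMillis) {
    // Fan-out: replicate every record once per ensemble member, tagged with that member's key
    DataStream<Tuple2<Integer, SensorRecord>> replicated = filtered
        .flatMap(new FlatMapFunction<SensorRecord, Tuple2<Integer, SensorRecord>>() {
          @Override
          public void flatMap(SensorRecord r, Collector<Tuple2<Integer, SensorRecord>> out) {
            for (int key : ensembleKeys) {
              out.collect(Tuple2.of(key, r));
            }
          }
        });

    // Each key (= ensemble member) gets its own sliding count window and its own DNN instance
    DataStream<Prediction> memberPredictions = replicated
        .keyBy(new KeySelector<Tuple2<Integer, SensorRecord>, Integer>() {
          @Override
          public Integer getKey(Tuple2<Integer, SensorRecord> value) {
            return value.f0;
          }
        })
        .countWindow(10080, 1)
        .evictor(new GapAwareCountEvictor<Tuple2<Integer, SensorRecord>>(10080, maxGapMillis))
        .apply(new DnnWindowFunction())
        .setParallelism(ensembleKeys.length);   // setParallelism(ensembleSize = 10)

    // Fan-in: average the member predictions that share the same (+2 days) timestamp
    return memberPredictions
        .assignTimestampsAndWatermarks(new ShiftTwoDays())
        .windowAll(TumblingEventTimeWindows.of(Time.minutes(1)))
        .apply(new MeanOfPredictions())
        .setParallelism(1);
  }
}
```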
  • 41. Distribute the 10 keys evenly over the 10 partitions: carefully generate keys so that no two keys land in the same partition. The flatMap replicates each record 10 times with different keys (KEY_0, KEY_1, …, KEY_9), and keyBy (via murmur hashing) routes KEY_i to PARTITION_i according to PARTITION = (murmurHash(KEY.hashCode()) % maxParallelism) * parallelism / maxParallelism. A key-picking sketch follows below.
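A minimal sketch of how such keys could be picked, using Flink's internal KeyGroupRangeAssignment (a runtime class, not a stable public API) and assuming the keyBy key is the Integer itself:

```java
import org.apache.flink.runtime.state.KeyGroupRangeAssignment;

import java.util.Arrays;

public class EnsembleKeys {

  /** Returns keys such that keys[i] is routed to parallel subtask i. */
  public static int[] pick(int parallelism, int maxParallelism) {
    int[] keys = new int[parallelism];
    boolean[] found = new boolean[parallelism];
    int remaining = parallelism;

    // scan candidate integers until every subtask index 0..parallelism-1 has one key
    for (int candidate = 0; remaining > 0; candidate++) {
      int subtask = KeyGroupRangeAssignment.assignKeyToParallelOperator(
          candidate, maxParallelism, parallelism);
      if (!found[subtask]) {
        found[subtask] = true;
        keys[subtask] = candidate;
        remaining--;
      }
    }
    return keys;
  }

  public static void main(String[] args) {
    // e.g. parallelism 10 with Flink's default maxParallelism of 128
    System.out.println(Arrays.toString(pick(10, 128)));
  }
}
```

The chosen keys are then passed to the flatMap so that each replica is guaranteed to land on a distinct parallel instance of the DNN operator.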
  • 42. Contents 1 Why we use Flink for our time-series prediction model 2 Flink pipeline design for rendezvous and DNN ensemble 3 Solution packaging and monitoring with Docker and Prometheus TF Serving Grafana Flink MySQL PrometheusInfluxDB Time-series prediction model
  • 43. A simple software stack on top of Docker. On the customer machine, the Docker engine runs the official images for Flink, MySQL, InfluxDB, Grafana, Prometheus, and TensorFlow Serving; no custom Docker image! A single yml file is enough to deploy our software stack.
  • 44. Launch the JobManager & TaskManagers with some changes to the official repository of the Flink Docker image (you need to build flink-metrics-prometheus-1.4-SNAPSHOT.jar yourself until Flink 1.4 is officially released). flink-conf.yaml additions: metrics.reporters: prom; metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter; jobmanager.heap.mb: 10240; taskmanager.heap.mb: 10240.
  • 45. Solution deployment (every process runs inside a Docker container). Prometheus scrapes the HTTP endpoints of the metrics exporters specified in its configuration: the Flink JobManager and TaskManagers at :9249/metrics (Flink runtime metrics & custom metrics), the MySQLd exporter at :9104/metrics (MySQL metrics), the node exporter at :9100/metrics (server CPU/disk/memory/network metrics), and cAdvisor at :8080/metrics (container metrics). We submit a Flink job to launch our pipeline, which reads from the MySQL source, calls TensorFlow Serving for inference (if using TFServing), and writes to the InfluxDB sink; Grafana serves the sensor time-series dashboard and the solution monitoring dashboard.
  • 46. Solution monitoring dashboard (based on the "Docker Dashboard" by Brian Christner): server memory/CPU/filesystem usage (node exporter), container CPU usage (cAdvisor), TaskManager JVM memory usage, and our custom metrics, namely the inference time from each DNN and the number of records written to the sinks. A sketch of how such custom metrics are registered follows below.
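A minimal sketch of registering such custom metrics from a rich function (the names and placement are our own); the Prometheus reporter then exposes them on the TaskManager's :9249/metrics endpoint:

```java
import org.apache.flink.configuration.Configuration;
import org.apache.flink.metrics.Counter;
import org.apache.flink.metrics.Gauge;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

public class MeteredScoreSink extends RichSinkFunction<Score> {

  private transient Counter recordsWritten;
  private volatile long lastWriteMillis;

  @Override
  public void open(Configuration parameters) {
    // counter: number of records written by this sink instance
    recordsWritten = getRuntimeContext().getMetricGroup().counter("recordsWritten");
    // gauge: duration of the most recent write, in milliseconds
    getRuntimeContext().getMetricGroup().gauge("lastWriteMillis", (Gauge<Long>) () -> lastWriteMillis);
  }

  @Override
  public void invoke(Score score) {
    long start = System.currentTimeMillis();
    // ... write `score` to InfluxDB here (see the sink sketch after slide 36) ...
    lastWriteMillis = System.currentTimeMillis() - start;
    recordsWritten.inc();
  }
}
```

The per-DNN inference time would be registered the same way inside the window function that calls the model.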
  • 47. Recap – contents 1 Why we use Flink for our time-series prediction model 2 Flink pipeline design for rendezvous and DNN ensemble 3 Solution packaging and monitoring with Docker and Prometheus Time-series prediction model TF Serving Grafana Flink MySQL PrometheusInfluxDB
  • 48. Conclusion • Flink helps us concentrate on the core logic • the DataStream API reads almost like a natural language for expressing streaming topologies • flexible windowing mechanisms (count windows and evictors) • joining of two streams on event time • Thanks to that, we can focus on • implementing custom sources/sinks to meet customer requirements • interacting with the DNN ensemble • Flink also has a nice ecosystem that helps us build a solution • Docker • Prometheus metric reporter
  • 50. You cannot use TFServing with Flink 1.3.2 • Netty binary incompatibility • flink-runtime_2.11:1.3.2 depends on io.netty:4.0.27.Final • grpc-netty (used by the TFServing gRPC client) depends on io.netty:4.1.14.Final • You could use grpc-okhttp instead of grpc-netty, but grpc-okhttp conflicts with another library (the InfluxDB client for Java) • You can use TFServing with Flink 1.4-SNAPSHOT • [FLINK-7013] Add shaded netty dependency • io.netty.* was recently relocated to org.apache.flink.shaded.netty4.* From the "New and noteworthy in 4.1" page of the Netty project: "4.1 contains multiple additions which might not be fully backward-compatible with 4.0."
  • 51. Looking forward to the official release of v1.4 No more 1.4-SNAPSHOT on the customer site!