This session introduces a new open-source project, Flink TensorFlow, that enables Flink programs to operate on data using TensorFlow machine learning models. Applications include real-time image processing, NLP, and anomaly detection. The session will:
• Introduce TensorFlow and describe its component model, which allows for model reuse across environments
• Demonstrate how to use TensorFlow models in Flink ML and Flink Streaming environments
• Present a roadmap and provide opportunities to contribute
Flink Forward SF 2017: Eron Wright - Introducing Flink Tensorflow
1. Eron Wright
@eronwright
TensorFlow & Apache Flink™
An early look at a community project
Apache®, Apache Flink™, Flink™, and the Apache feather logo are trademarks of The Apache Software Foundation.
TensorFlow, the TensorFlow logo and any related marks are trademarks of Google Inc.
https://github.com/cookieai/flink-tensorflow
3. Why TensorFlow?
A powerful & flexible platform for machine intelligence
Reusable machine learning models
C++ core / Java language binding
Ease of integration with Apache Flink
4. TF Scenarios
Language Understanding
• “syntaxnet”, Google Translate
Image/Video/Audio Recognition
• “Inception”
Creative Arts
• “Magenta”
5. TF Models
Portable using “Saved Model” format
• Train on GPU-equipped cluster
• Perform inference anywhere
Well-defined interactions and data types using “signatures”
Moving towards a Model Zoo
6. TF Graphs
[Diagram: inputs x and W feed a multiply node (*); its result and b feed an add node (+); softmax() yields y]
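# TensorFlow 1.x graph-mode API (tf.placeholder is not available in TF 2.x eager mode)
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 784])   # input batch: any number of rows, 784 features each
W = tf.Variable(tf.zeros([784, 10]))          # weights: 784 features -> 10 classes
b = tf.Variable(tf.zeros([10]))               # per-class bias
y = tf.nn.softmax(tf.matmul(x, W) + b)        # class probabilities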
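To make the graph above concrete, here is a minimal sketch in plain Python (no TensorFlow required) of the same computation, y = softmax(xW + b), on a tiny 3-feature, 2-class example. The sizes and the helper names `matmul`, `softmax`, and `forward` are illustrative assumptions and do not come from the talk.

```python
import math

def matmul(x, W):
    """Multiply a batch x (n rows, k cols) by weights W (k rows, m cols)."""
    return [[sum(row[i] * W[i][j] for i in range(len(W)))
             for j in range(len(W[0]))] for row in x]

def softmax(logits):
    """Numerically stable softmax over one row of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def forward(x, W, b):
    """y = softmax(xW + b), applied row by row."""
    return [softmax([v + bj for v, bj in zip(row, b)])
            for row in matmul(x, W)]

# Zero-initialised parameters, as on the slide: every class is equally likely.
x = [[1.0, 2.0, 3.0]]
W = [[0.0, 0.0] for _ in range(3)]
b = [0.0, 0.0]
y = forward(x, W, b)
```

With all-zero weights and biases the logits are identical, so the output is a uniform distribution over the classes; training is what moves `W` and `b` away from that starting point.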
9. Project Status
A prototype focused on inference using pre-trained TF models
Scala-only (for now)
A community effort
10. Basic Idea
Use TF functionality in a Flink program; not a TF compatibility layer
“TF graph as a Flink map function”
Support inference today, online learning in the future
12. “Johnny”
A hypothetical security system based on picture passwords
• Present three specific pictures within one minute: Access Granted!
• On timeout: Access Denied!
TF model for image labeling (“Inception”)
Flink CEP library for sequence detection
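The sequence-detection rule can be sketched without any framework. The example below is a hand-written state machine in plain Python, not the Flink CEP library the talk uses; the class name `PicturePassword`, the choice to ignore non-matching pictures, and the third label `"cat"` are assumptions made for illustration. Timeouts here are only noticed when the next labeled image arrives, whereas a real streaming job would rely on event-time timers.

```python
from typing import Optional, Sequence

WINDOW_SECONDS = 60.0  # "within one minute" from the slide

class PicturePassword:
    """Hypothetical detector for an ordered sequence of image labels."""

    def __init__(self, expected: Sequence[str]):
        self.expected = list(expected)
        self.matched = 0        # how many expected labels have been seen
        self.started_at = None  # event time of the first matching picture

    def _reset(self) -> None:
        self.matched = 0
        self.started_at = None

    def on_label(self, label: str, timestamp: float) -> Optional[str]:
        """Feed one labeled image; return a decision or None."""
        # An attempt in progress expires once the window has elapsed.
        if (self.started_at is not None
                and timestamp - self.started_at > WINDOW_SECONDS):
            self._reset()
            return "Access Denied!"
        # Non-matching labels are ignored (an assumption, not from the talk).
        if label == self.expected[self.matched]:
            if self.matched == 0:
                self.started_at = timestamp
            self.matched += 1
            if self.matched == len(self.expected):
                self._reset()
                return "Access Granted!"
        return None

detector = PicturePassword(["burger", "ladybug", "cat"])
events = [("burger", 0.0), ("dog", 5.0), ("ladybug", 10.0), ("cat", 59.0)]
decisions = [detector.on_label(label, ts) for label, ts in events]
```

In this run the three expected pictures arrive within 59 seconds of the first, so the last event produces the grant decision.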
13. Inception Model
Pretrained with ImageNet dataset
Supports “retraining” for learning new objects not in original dataset
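[Diagram: images such as burger.jpg and ladybug.jpg arrive on a stream (connector); load() loads the model and label() emits per-class scores, e.g. class 1: 0.02, .., class 623: 0.97, …]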
16. Basic Usage
1. Import a TensorFlow model
2. Write code to convert your domain objects to/from tensors
3. Use the model in a batch or streaming function
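The three steps above can be sketched end to end in plain Python. Everything here is a stand-in: `Tensor` is just a list of floats, `load_stub_model` returns a toy two-class scorer instead of a real TensorFlow model, and the `Image` and `Label` types are hypothetical domain objects. The actual project is Scala and its API is not shown on these slides.

```python
from dataclasses import dataclass
from typing import Callable, List

Tensor = List[float]  # stand-in for a real tensor type

@dataclass
class Image:           # hypothetical domain object
    name: str
    pixels: List[int]  # grey values in 0..255

@dataclass
class Label:           # hypothetical result object
    name: str
    class_id: int
    score: float

# Step 1: "import" a model. This stub scores two classes by mean brightness.
def load_stub_model() -> Callable[[Tensor], Tensor]:
    def run(t: Tensor) -> Tensor:
        brightness = sum(t) / len(t)
        return [1.0 - brightness, brightness]
    return run

# Step 2: converters between domain objects and tensors.
def to_tensor(img: Image) -> Tensor:
    return [p / 255.0 for p in img.pixels]

def from_tensor(img: Image, scores: Tensor) -> Label:
    best = max(range(len(scores)), key=scores.__getitem__)
    return Label(img.name, best, scores[best])

# Step 3: use the model inside a map-style function.
model = load_stub_model()

def label(img: Image) -> Label:
    return from_tensor(img, model(to_tensor(img)))

images = [Image("dark.jpg", [0, 0, 0]), Image("bright.jpg", [255, 255, 255])]
labels = list(map(label, images))
```

Keeping the converters separate from the model call means the same model can be reused with different domain types, which is the point of step 2.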
17. Importing a Model
1. Define the graph method(s) supported by the model (ref)
2. Specify how to load the model:
• “saved model” loader, or
• GraphDef loader, or
• ad-hoc graph builder
19. Working with Tensors
Tensors are off-heap, AutoCloseable multi-dimensional arrays
Your job: convert input records to tensors
Use Scala Automatic Resource Management (ARM)
• Supports both imperative and monadic style
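Because tensors live off-heap, they must be released explicitly once they are no longer needed. The talk uses Scala ARM for this; the sketch below shows the same idea with Python's `contextlib` instead, which is a substitution and not the project's API. `FakeTensor` and `run_model` are hypothetical stand-ins that only count open handles.

```python
from contextlib import ExitStack, closing

class FakeTensor:
    """Hypothetical stand-in for an off-heap tensor that must be closed."""
    open_count = 0  # number of tensors not yet released

    def __init__(self, values):
        self.values = list(values)
        self.closed = False
        FakeTensor.open_count += 1

    def close(self):
        if not self.closed:
            self.closed = True
            FakeTensor.open_count -= 1

def run_model(inp):
    """Toy 'inference': doubles every value and returns a new tensor."""
    return FakeTensor(v * 2 for v in inp.values)

def infer(record):
    # Every tensor registered on the stack is closed on exit,
    # even if the model call raises.
    with ExitStack() as stack:
        inp = stack.enter_context(closing(FakeTensor(record)))
        out = stack.enter_context(closing(run_model(inp)))
        return list(out.values)  # copy results to the heap before release

result = infer([1.0, 2.0])
```

The important detail is copying the result out of the tensor before the scope ends, since the underlying memory is no longer valid after `close()`.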
20. Writing a Flink Function
Design goal: use TF in any transformation function
• `MapFunction`
• `ProcessFunction` (with event-time timers!)
• `WindowFunction`
Required: model lifecycle support
• `ModelAwareFunction`
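Model lifecycle support means the model is loaded when the function starts and released when it stops, rather than on every record. The sketch below illustrates that pattern in plain Python; the class `ModelAwareMap`, its `open`/`map`/`close` methods, `StubModel`, and the path are all hypothetical and are not the signatures of the project's `ModelAwareFunction`.

```python
class StubModel:
    """Hypothetical model handle; a real one would read a saved model."""

    def __init__(self, path):
        self.path = path
        self.closed = False

    def run(self, x):
        return x * 2

    def close(self):
        self.closed = True

class ModelAwareMap:
    """Illustrative only: loads the model in open(), frees it in close()."""

    def __init__(self, model_path):
        self.model_path = model_path  # cheap, serializable configuration
        self.model = None             # heavy resource, created in open()

    def open(self):
        self.model = StubModel(self.model_path)

    def map(self, value):
        if self.model is None:
            raise RuntimeError("open() must be called before map()")
        return self.model.run(value)

    def close(self):
        if self.model is not None:
            self.model.close()
            self.model = None

fn = ModelAwareMap("/models/example")  # hypothetical path
fn.open()
try:
    outputs = [fn.map(v) for v in [1, 2, 3]]
finally:
    fn.close()
```

Only the path travels with the function object; the model itself is created where the function runs, which matters when the function is shipped to workers.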
21. Runtime
TF embedded within the Flink JobManager / TaskManager
One model instance per sub-task
Large unmanaged memory blocks
No Python needed
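"One model instance per sub-task" can be illustrated with a small simulation, written in Python purely for consistency with the deck's other code sample. Records for many keys are routed to a fixed number of parallel sub-tasks, and each sub-task holds exactly one model. The names `SubTask`, `StubModel`, and `route`, and the CRC-based partitioning, are assumptions for this sketch and do not reflect how Flink assigns keys internally.

```python
import zlib

PARALLELISM = 2  # number of parallel sub-tasks in this toy setup

class StubModel:
    instances = 0  # counts how many models were ever created

    def __init__(self):
        StubModel.instances += 1

    def run(self, value):
        return value + 1

class SubTask:
    def __init__(self):
        self.model = StubModel()  # loaded once, shared by all keys routed here
        self.seen_keys = set()

    def process(self, key, value):
        self.seen_keys.add(key)
        return key, self.model.run(value)

subtasks = [SubTask() for _ in range(PARALLELISM)]

def route(key):
    """Deterministically pick a sub-task for a key (illustrative hashing)."""
    return subtasks[zlib.crc32(key.encode()) % PARALLELISM]

records = [("user-%d" % i, i) for i in range(10)]
results = [route(k).process(k, v) for k, v in records]
```

Ten distinct keys are processed, yet only two models exist, so the memory cost of the model scales with parallelism and not with the number of keys.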
24. Graph Builder (DSL)
Construct TensorFlow graphs from scratch (ref)
Code generation for TF operations
Incorporate other libraries, high-level APIs (e.g. TF Keras)
25. Other
Instrumentation
• TF Summaries (for TensorBoard)
• Flink Metrics
Model versioning
• Leverage Flink job versioning methods
• Treat models as side-input
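Treating models as a side input could look roughly like the following: one function receives both data records and model updates, and switches to a newer model when one arrives. This is a speculative Python sketch of a roadmap idea, not an existing feature; `VersionedScorer` and its method names are hypothetical, and the "models" are plain callables.

```python
class VersionedScorer:
    """Hypothetical function with a data input and a model side input."""

    def __init__(self):
        self.version = None
        self.model = None

    def on_model(self, version, model):
        # Accept only newer versions so late or replayed updates are ignored.
        if self.version is None or version > self.version:
            self.version = version
            self.model = model

    def on_record(self, value):
        # Without a model there is nothing to score with yet.
        if self.model is None:
            return None
        return self.version, self.model(value)

scorer = VersionedScorer()
out = [scorer.on_record(1)]            # no model loaded yet
scorer.on_model(1, lambda v: v + 1)
out.append(scorer.on_record(1))        # scored by version 1
scorer.on_model(2, lambda v: v * 10)
scorer.on_model(1, lambda v: v + 1)    # stale update, ignored
out.append(scorer.on_record(1))        # scored by version 2
```

Tagging each result with the model version makes it possible to tell downstream which model produced a given prediction.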
Best of both worlds:
Flink unified stream and batch programming
Machine learning with event time as first-class aspect
Connectors for rich I/O with exactly-once semantics
Integration with Flink libraries
TensorFlow GPU!
- Show labels
- Be sure to use a model-aware function.
Load from HDFS
Restoring state
Typed tensors
Converters
Scala ARM
- model-aware function
One model instance per sub-task, not per key
Uses unmanaged memory; consider tuning Flink memory settings
Has a performance advantage over remote TF, and vastly more flexibility