Build a Flink AI Ecosystem
Jiangjie (Becket) Qin
Flink Forward Berlin 2019
Agenda
• Why AI Ecosystem on Flink?
• Flink ML Pipeline & Flink ML Libs
• Deep learning on Flink
• Enhanced Iteration & Dynamic Model Serving
• Better Python support
2
Lambda - what’s everyone doing
[Diagram: data in HDFS feeds batch processing on the offline path (batch layer); data in the message queue feeds stream processing on the online path (speed layer); the serving layer combines the results to answer queries.]
3
Lambda - what’s everyone doing
• Two code bases for online and offline processing logic
• High maintenance cost
• Difficult to ensure consistent processing logic
[Diagram: the same Lambda architecture, with Spark / M-R for batch processing on the offline path and Flink / Storm for stream processing on the online path; the results are combined to answer queries.]
4
Batch-Stream Processing Unification
• Use the same engine for online and offline processing
• Spark
• Flink
[Diagram: HDFS -> batch processing (Flink / Spark) on the offline path; message queue -> stream processing (Flink / Spark Streaming) on the online path; the results are combined to answer queries.]
5
So what about ML?
• A typical ML scenario
• Offline training (TF, PyTorch, etc)
• Static models
• Online inference (Flink)
• The data preprocessing logic for training and for inference often lives in two separate code bases
[Diagram: offline path: HDFS -> Preprocessing -> Offline Training -> static model; online path: Preprocessing -> Inference using the static model.]
6
So what about ML?
• Online training is gaining popularity
• Faster model updates
• Dynamic model and continuous training
• Progressive validation
• More sophisticated monitoring and model deployment / rollback
[Diagram: online path: Message Queue -> Preprocessing -> Online Training -> dynamic model -> Inference.]
7
“Lambda” architecture for ML
• Offline training: a static base model
• Online training: incremental updates to the base model
• Users have to deal with different systems / code bases
[Diagram: offline path: HDFS -> Preprocessing -> Offline Training -> static model; online path: Message Queue -> Preprocessing -> Online Training, which takes the static model and produces a dynamic model for Inference.]
8
Value of Flink
• Inference is latency-sensitive online / nearline processing
• Flink is a good option in this case
[Diagram: the same ML “Lambda” architecture as on the previous slide.]
9
Batch-Stream Unification in ML
• Online inference is latency-sensitive online / nearline processing
• Flink is a good option in this case
• Use Flink everywhere to avoid maintaining different code bases.
[Diagram: the same ML “Lambda” architecture as on the previous slides.]
10
Additional Values
• One-stop data processing solution
• Shared dataset management
• Switch processing APIs freely
[Diagram: the DataStream, SQL, ML, and CEP APIs on top of shared Dataset Management.]
Flink AI Ecosystem By ML Stages
ML stages: Data Acquisition, Preprocessing, Model Training, Model Validation & Serving, Inference
Efforts & requirements:
• Rich connector support & dataset management
• Stream-batch unification
• Strong SQL support
• Enhanced iteration
• Flink ML Lib
• DL on Flink (TF, PyTorch)
• Dynamic model serving
• Model management
• Rollout / rollback
• Online monitoring
• Online evaluation
• Flink ML Pipeline, Python support
[Diagram: AI flow: the ML “Lambda” architecture from the earlier slides (HDFS and message queue, preprocessing, offline and online training, static and dynamic models, inference), with a Model Validation step added.]
12
Agenda
• Why AI Ecosystem on Flink?
• Flink ML Pipeline & Flink ML Libs
• Deep learning on Flink
• Enhanced Iteration & Dynamic Model Serving
• Better Python support
14
Flink ML Pipeline - Overview
Table based ML Pipeline
• PipelineStage: Transformer, Estimator, Model
• Transformer: data -> data transition (preprocessing, inference), e.g. table2 = Transformer.transform(table1)
• Estimator: data -> model transition (model training), e.g. Estimator.fit(table2)
• ML lib developers: K-Means, NaiveBayes, linear regression, DecisionTree, RandomForest, GBDT, ……
• ML lib users: chain Transformers and Estimators to turn an input Table into an output Table
15
Flink ML Libs
Rewrite Flink ML Libs
• ML pipeline based
• Table API based
• Battle-tested algorithms
ML libs: K-Means, NaiveBayes, linear regression, GBDT, DecisionTree, PCA, Random Forest, Correlation, ……
16
ML Pipeline - Simple Case
• Training: pipeline.fit(input1) runs Estimator.fit(input1) on Input1: Table and produces a Model
• Inference: pipeline.transform(input2) runs Model.transform(input2) on Input2: Table and produces the Result Table
17
ML Pipeline
Training (Estimator Pipeline): pipeline.fit(input1)
• output1 = Transformer.transform(input1), with Input1: Table
• Estimator.fit(output1) produces the Model
• The result is a Model Pipeline made of the Transformer and the Model
Inference (Model Pipeline): pipeline.transform(input2)
• output2 = Transformer.transform(input2), with Input2: Table
• Model.transform(output2) produces the Result Table
18
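The training and inference flow of the pipeline can be sketched in plain Scala. This is a minimal sketch under stated assumptions, not the Flink ML API: `Table` is a hypothetical stand-in (a vector of numbers), and `Scale`, `MeanEstimator`, `MeanModel`, and this `Pipeline` class are names invented for the illustration.

```scala
object PipelineSketch {
  // Hypothetical stand-in for a Flink Table: each row is a single Double.
  type Table = Vector[Double]

  sealed trait PipelineStage
  // Data -> data transition (preprocessing, inference).
  trait Transformer extends PipelineStage {
    def transform(input: Table): Table
  }
  // A trained model is itself a Transformer.
  trait Model extends Transformer
  // Data -> model transition (model training).
  trait Estimator extends PipelineStage {
    def fit(input: Table): Model
  }

  // Illustrative preprocessing stage: multiply every value by a factor.
  final case class Scale(factor: Double) extends Transformer {
    def transform(input: Table): Table = input.map(_ * factor)
  }

  // Illustrative model: subtract the mean learned during training.
  final case class MeanModel(mean: Double) extends Model {
    def transform(input: Table): Table = input.map(_ - mean)
  }

  // Illustrative estimator: learns the mean of its input.
  object MeanEstimator extends Estimator {
    def fit(input: Table): Model = MeanModel(input.sum / input.size)
  }

  final case class Pipeline(stages: Vector[PipelineStage]) {
    // Training: run Transformers, fit Estimators on the intermediate table,
    // and return a pipeline with the same topology in which every Estimator
    // has been replaced by its fitted Model.
    def fit(input: Table): Pipeline = {
      val (_, fitted) =
        stages.foldLeft((input, Vector.empty[PipelineStage])) {
          case ((table, done), t: Transformer) =>
            (t.transform(table), done :+ t)
          case ((table, done), e: Estimator) =>
            val model = e.fit(table)
            (model.transform(table), done :+ model)
        }
      Pipeline(fitted)
    }

    // Inference: apply every Transformer / Model in order.
    def transform(input: Table): Table =
      stages.foldLeft(input) {
        case (table, t: Transformer) => t.transform(table)
        case (_, _: Estimator) =>
          throw new IllegalStateException("call fit() before transform()")
      }
  }
}
```

Calling `Pipeline(Vector(Scale(2.0), MeanEstimator)).fit(input1)` returns a pipeline with the same two positions, now holding `Scale(2.0)` and the fitted `MeanModel`. Calling `transform(input2)` on it replays the same preprocessing before the model is applied, which is how training and inference stay consistent.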
Value of Flink ML Pipeline
• Unify APIs of Model Training and Inference for the end users
• End users only need to deal with either Estimators or Transformers
• Ensure consistent logic between training and inference
• The pipeline topology used in training is persisted and reused for inference
19
Agenda
• Why AI Ecosystem on Flink?
• Flink ML Pipeline & Flink ML Libs
• Deep learning on Flink
• Enhanced Iteration & Dynamic Model Serving
• Better Python support
20
Deep Learning Pipeline
[Diagram: Data Acquisition -> Data Process and Transformation -> Model Training -> Test and Validation -> Model Serving, plus a Model or Params Tuning step.]
21
Deep Learning Pipeline
[Diagram: Data Acquisition and Data Process and Transformation run as one Flink job in a cluster/environment (SOURCE, SOURCE -> JOIN -> UDTF) that writes to an external storage queue; Model Training runs as a distributed TF framework in a cluster/environment (three WORKERs, two PSs) that produces the resulting model.]
22
TensorFlow-Flink Integration
[Diagram: before: one Flink job in a cluster/environment (SOURCE, SOURCE -> JOIN -> UDTF) and a separate distributed TF framework (WORKERs and PSs), connected through an external storage queue. After: one single Flink job in a cluster/environment in which the TF WORKERs and PSs run next to SOURCE, JOIN, and UDTF and produce the resulting model.]
24
DL on Flink and ML Pipeline integration
[Diagram: one single Flink job in a cluster/environment; the SOURCE, SOURCE -> JOIN -> UDTF part is labelled Transformer and the WORKER / PS part that produces the resulting model is labelled Estimator.]
The ML Pipeline API could be used for both traditional ML and deep learning.
25
Agenda
• Why AI Ecosystem on Flink?
• Flink ML Pipeline & Flink ML Libs
• Deep learning on Flink
• Enhanced Iteration & Dynamic Model Serving
• Better Python support
26
Enhance Iteration in Flink
• Native iteration implemented by the processing engine
• Feedback edge on the processing DAG
• Address the caveats of DataSet / DataStream iterations
[Diagram: a Flink cluster with partitions 1 to N, each running a map operator inside the iteration loop.]
27
Multi-variable iteration
{
  val a: Table = ...
  val b: Table = ...
  val resultSeq = Table.iterate(a, b) {              // iteration variables
    val next_a = b.select('v_b + 1 as 'v_a)          // step function
    val next_b = next_a.select('v_a * 2 as 'v_b)
    Seq(next_a, next_b)
  }.times(10)                                        // termination condition
}
28
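To make the three labelled parts concrete, the following sketch emulates the same loop on local values instead of Tables. The `iterate` helper and the `Times` class are hypothetical, written only for this illustration; they are not the `Table.iterate` API shown on the slide.

```scala
object IterationSketch {
  // Hypothetical local helper: takes the iteration variables and a step
  // function, and returns an object on which the termination condition
  // (a fixed number of rounds) is set.
  def iterate[A](vars: Seq[A])(step: Seq[A] => Seq[A]): Times[A] =
    new Times(vars, step)

  final class Times[A](vars: Seq[A], step: Seq[A] => Seq[A]) {
    // Apply the step function n times, feeding each result back in.
    def times(n: Int): Seq[A] =
      (1 to n).foldLeft(vars)((current, _) => step(current))
  }

  // Same step function as on the slide, on plain Longs:
  // next_a = v_b + 1, next_b = next_a * 2.
  def run(a: Long, b: Long, rounds: Int): Seq[Long] =
    iterate(Seq(a, b)) { case Seq(_, vb) =>
      val nextA = vb + 1
      val nextB = nextA * 2
      Seq(nextA, nextB)
    }.times(rounds)
}
```

Each round feeds the previous round's outputs back in as the new iteration variables, which is the role of the feedback edge in the native iteration.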
Nested Iteration
{
  val a: Table = ...
  val b: Table = ...
  val resultSeq = iterate(a, b) {
    val next_a = iterate(a) {                        // inner iteration
      Seq(a.select('v_a + 1 as 'v_a))
    }.times(100).head
    val next_b = next_a.select('v_a * 2 as 'v_b)
    Seq(next_a, next_b)
  }.times(10)
}
29
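A nested loop can be emulated the same way on local values. This is only a sketch: `iterateTimes` is a hypothetical helper, and it assumes that `a` inside the inner loop refers to the current value of the outer iteration variable.

```scala
object NestedIterationSketch {
  // Hypothetical helper: apply `step` to `init` n times.
  def iterateTimes[A](init: A, n: Int)(step: A => A): A =
    (1 to n).foldLeft(init)((current, _) => step(current))

  // Outer loop over (v_a, v_b); every outer round runs an inner loop that
  // increments v_a, then derives v_b from the inner loop's result.
  def run(a: Long, b: Long, outerRounds: Int, innerRounds: Int): (Long, Long) =
    iterateTimes((a, b), outerRounds) { case (va, _) =>
      val nextA = iterateTimes(va, innerRounds)(_ + 1)
      val nextB = nextA * 2
      (nextA, nextB)
    }
}
```

With 10 outer rounds and 100 inner rounds, as on the slide, `v_a` grows by 100 per outer round under this assumption.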
Mini-batch iteration
• A stream is chunked into multiple mini-batches
• Each mini-batch iterates independently in the iteration loop
• The results are emitted in the mini-batch order
[Diagram: mini-batches MB1, MB2, MB3 of the stream enter a Flink cluster with partitions 1 to N, each running a map operator in the iteration loop; MB1 and MB2 are emitted in order.]
30
Mini-batch iteration
• Native support for Stochastic Gradient Descent (SGD)
• Native support for online learning
31
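As a rough illustration of why mini-batch iteration maps naturally onto SGD and online learning, the sketch below chunks a sample stream into mini-batches and applies one gradient step per mini-batch, producing a new model version each time. It is plain Scala on local collections, not Flink code; the one-parameter model y = w * x, the squared-error loss, and the names `Sample`, `gradient`, and `train` are assumptions made for the example.

```scala
object MiniBatchSgdSketch {
  final case class Sample(x: Double, y: Double)

  // Gradient of the mean squared error of y = w * x over one mini-batch.
  def gradient(w: Double, batch: Seq[Sample]): Double =
    batch.map(s => 2.0 * (w * s.x - s.y) * s.x).sum / batch.size

  // Chunk the stream into mini-batches; each mini-batch yields the next
  // model version. The result lists every version in mini-batch order,
  // starting with the initial model.
  def train(stream: Iterator[Sample], batchSize: Int,
            learningRate: Double, initial: Double): Vector[Double] =
    stream.grouped(batchSize).foldLeft(Vector(initial)) { (versions, batch) =>
      val w = versions.last
      versions :+ (w - learningRate * gradient(w, batch))
    }
}
```

Because the input is an `Iterator`, the same function works for a bounded data set and for a stream that is consumed as it arrives, which mirrors the offline / online split discussed earlier.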
Iteration and Dynamic Model Update
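[Diagram: starting from the initial model, samples pass through Gradient Computing and Gradient Reduce; each pass updates the model, producing Model_V1, Model_V2, Model_V3, … and eventually the final model.]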
32
Dynamic Model Serving
[Diagram: the ML “Lambda” architecture with Model Validation (HDFS and message queue, preprocessing, offline and online training, static and dynamic models); the model versions Model_V1, Model_V2, Model_V3, … are fed into Inference together with the samples.]
The exact same mechanism of native iteration could be used for dynamic model serving.
34
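One way to picture dynamic model serving is an inference operator that receives model updates and samples on the same ordered stream and always scores with the latest model. The sketch below is an assumption-laden illustration in plain Scala, not Flink code; `Event`, `ModelUpdate`, `Sample`, `Prediction`, and `serve` are hypothetical names, and the model is reduced to a single weight.

```scala
object DynamicServingSketch {
  sealed trait Event
  // A new model version published by training.
  final case class ModelUpdate(version: Int, weight: Double) extends Event
  // A sample waiting for inference.
  final case class Sample(x: Double) extends Event

  final case class Prediction(modelVersion: Int, value: Double)

  // Walk the event stream in order: model updates replace the serving
  // model, samples are scored with whichever version is current.
  def serve(initial: ModelUpdate, events: Seq[Event]): Vector[Prediction] = {
    val (_, predictions) =
      events.foldLeft((initial, Vector.empty[Prediction])) {
        case ((_, out), update: ModelUpdate) => (update, out)
        case ((model, out), Sample(x)) =>
          (model, out :+ Prediction(model.version, model.weight * x))
      }
    predictions
  }
}
```

Recording the model version with every prediction is a design choice of this sketch; it makes rollout and rollback decisions traceable to a specific model.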
Agenda
• Why AI Ecosystem on Flink?
• Flink ML Pipeline & Flink ML Libs
• Deep learning on Flink
• Enhanced Iteration & Dynamic Model Serving
• Better Python support
35
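Flink Python Table API
[Diagram: Python Table API: a Python process running the Python Table API talks through a Python gateway over RPC (Py4J) to a Java gateway server in the Java process, which builds the DAG graph. Python UDF: the Java process passes upstream input to a Python VM as input and forwards the Python VM's output as downstream output.]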
36
Working with Apache Beam Community
More Python API Support
• Flink ML Pipeline
• Flink-AI-Extended
• DataStream
37
Summary
• Flink offers unique value for AI use cases
• Flink fits the “Lambda” ML architecture very well
• Multiple ongoing efforts to make Flink more AI-friendly
• Flink ML Pipeline
• Flink ML Libs
• Deep learning on Flink
• Iteration enhancement
• Python API
• …
38
Q & A
We are hiring!!
becket.qin@gmail.com

[FFE19] Build a Flink AI Ecosystem

  • 1. Build a Flink AI Ecosystem Jiangjie (Becket) Qin Flink Forward Berlin 2019
  • 2. Agenda • Why AI Ecosystem on Flink? • Flink ML Pipeline & Flink ML Libs • Deep learning on Flink • Enhanced Iteration & Dynamic Model Serving • Better Python support 2
  • 3. Lambda - what’s everyone doing HDFS Message Queue Batch Processing Stream Processing Combine the results Query Result Offline path Online path 3 Batch Layer Speed Layer Serving Layer
  • 4. Lambda - what’s everyone doing HDFS Message Queue Batch Processing (Spark/M-R) Stream Processing (Flink/Storm) Combine the results Query Result • Two code bases for online and offline processing logic • High maintenance cost • Difficult to ensure consistent processing logic Offline path Online path 4
  • 5. Batch-Stream Processing Unification • Use the same engine for online and offline processing • Spark • Flink HDFS Message Queue Batch Processing (Flink/Spark) Stream Processing (Flink/Spark Streaming) Combine the results Query Result Offline path Online path 5
  • 6. So what about ML? • A typical ML scenario • Offline training (TF, PyTorch, etc) • Static models • Online inference (Flink) • The data preprocessing logic in training and inference are often two code bases HDFS Offline Training Inference Static model Preprocessing PreprocessingOffline path Online path 6
  • 7. So what about ML? • Online training is gaining popularity • More prompt model update • Dynamic model and continuous training • Progressive validation • More sophisticated monitoring and model deployment / rollback Message Queue Online Training Inference dynamic model PreprocessingOffline path Online path 7
  • 8. “Lambda” architecture for ML • Offline training: a static base model • Online training: incremental updates to the base model • Users have to deal with different systems / code bases Message Queue Offline Training Online Training Inference Dynamic model Static model Preprocessing HDFS Preprocessing Offline path Online path Static model 8
  • 9. Value of Flink • The inference is latency sensitive online / nearline processing • Flink is a good option in this case Message Queue Offline Training Online Training Inference Dynamic model Static model Preprocessing HDFS Preprocessing Offline path Online path Static model 9
  • 10. Batch-Stream Unification in ML • The online inference is latency sensitive online / nearline processing • Flink is a good option in this case • Use Flink everywhere to avoid maintaining different code bases. Message Queue Offline Training Online Training Inference Dynamic model Static model Preprocessing HDFS Preprocessing Offline path Online path Static model 10
  • 11. Additional Values • One-stop data processing solution • Shared dataset management • Switch processing APIs freely Dataset Management DataStreamSQL ML CEP
  • 12. Flink AI Ecosystem By ML Stages Rich connector support & Dataset management Stream-Batch unification Strong SQL support Enhanced Iteration Flink ML Lib DL on Flink (TF, PyTorch) Dynamic model serving Model Management Rollout / Rollback Online monitoring Online evaluation Message Queue Offline Training Online Training Inference Dynamicmodel Static model Preprocessing HDFS Preprocessing Offline path Online path Static model Model Validation Flink ML Pipeline, Python support 12 Data Acquisition Model Training Model Validation & Serving InferencePreprocessing Efforts&RequirementsAIFlowMLStage
  • 13. Flink AI Ecosystem By ML Stages Rich connector support & Dataset management Stream-Batch unification Strong SQL support Enhanced Iteration Flink ML Lib DL on Flink (TF, PyTorch) Dynamic model serving Model Management Rollout / Rollback Online monitoring Online evaluation Message Queue Offline Training Online Training Inference Dynamicmodel Static model Preprocessing HDFS Preprocessing Offline path Online path Static model Model Validation Flink ML Pipeline, Python support 13 Data Acquisition Model Training Model Validation & Serving InferencePreprocessing Efforts&RequirementsAIFlowMLStage
  • 14. Agenda • Why AI Ecosystem on Flink? • Flink ML Pipeline & Flink ML Libs • Deep learning on Flink • Enhanced Iteration & Dynamic Model Serving • Better Python support 14
  • 15. Flink ML Pipeline - Overview PipelineStage EstimatorTransformer Model K-Means NaiveBayes Linear regression DecisionTree RandomForest GBDT Table based ML Pipeline EstimatorTransformer table2=Transformer. transform(table1) Estimator.fit(table2) ML Lib Developers ML Lib Users …… Input Table Output Table 15 Data -> Data transition (Preprocessing, Inference) Data -> Model transition (Model Training)
  • 16. K-Means NaiveBayes Linear regression GBDT DecisionTree PCA Random Forest Correlation ML libs …… Rewrite Flink ML Libs • ML pipeline based • Table API based • Battle tested algorithms Flink ML Libs 16
  • 17. Training Inference Estimator Model Estimator.fit(input1) Input1: Table Model Result Table Model.transform(input2) Input2: Table pipeline.fit(input1) pipeline.transform(input2) ML Pipeline - Simple Case 17
  • 18. EstimatorTransformer output1=Transformer. transform(input1) Estimator Pipeline pipeline.fit(input1) Estimator.fit(output1) pipeline.transform(input2) Model.transform(output2) Result Table ModelTransformerInput1: TableTraining ModelTransformer output2=Transformer. transform(input2) Model Pipeline Input2: Table Model Pipeline Inference ML Pipeline 18
  • 19. Value of Flink ML Pipeline • Unify APIs of Model Training and Inference for the end users • End users only needs to deal with either Estimators or Transformers • Ensure consistent logic between training and inference • The same pipeline topology in training will be persisted and used for inference 19
  • 20. Agenda • Why AI Ecosystem on Flink? • Flink ML Pipeline & Flink ML Libs • Deep learning on Flink • Enhanced Iteration & Dynamic Model Serving • Better Python support 20
  • 21. Deep Learning Pipeline: Data Acquisition, Data Process and Transformation, Model Training, Test and Validation, Model Serving, Model or Params Tuning
  • 22. Deep Learning Pipeline. Data Acquisition, Data Process and Transformation: one Flink job in a Cluster/Environment (SOURCE, SOURCE, JOIN, UDTF) >>> External Storage / Queue >>> Model Training: distributed TF framework in a Cluster/Environment (WORKER, WORKER, WORKER, PS, PS), producing the Resulting Model.
  • 23. Deep Learning Pipeline: Data Acquisition, Data Process and Transformation, Model Training, Test and Validation, Model Serving, Model or Params Tuning
  • 24. TensorFlow-Flink Integration. Separate setup: one Flink job in a Cluster/Environment (SOURCE, SOURCE, JOIN, UDTF) >>> External Storage / Queue >>> distributed TF framework in a Cluster/Environment (WORKER, WORKER, WORKER, PS, PS), producing the Resulting Model. Integrated setup: one single Flink job in a Cluster/Environment containing SOURCE, SOURCE, JOIN, UDTF together with the TF WORKER and PS nodes, producing the Resulting Model.
  • 25. DL on Flink and ML Pipeline integration. One single Flink job in a Cluster/Environment: SOURCE, SOURCE, JOIN, UDTF (Transformer) and the TF WORKER and PS nodes (Estimator), producing the Resulting Model. The ML Pipeline API could be used for both traditional ML and deep learning.
  • 26. Agenda • Why AI Ecosystem on Flink? • Flink ML Pipeline & Flink ML Libs • Deep learning on Flink • Enhanced Iteration & Dynamic Model Serving • Better Python support 26
  • 27. Enhance Iteration in Flink • Native iteration implemented by the processing engine • Feedback edge on the processing DAG • Address the caveats of DataSet / DataStream iterations. Diagram: a Flink Cluster with Partition 1, Partition 2, Partition 3, …, Partition N, each running a map.
  • 28. Multi-variable iteration (callouts: iteration variables, step function, termination condition): { val a: Table = ... val b: Table = ... val resultSeq = Table.iterate(a, b) { val next_a = b.select('v_b + 1 as 'v_a) val next_b = next_a.select('v_a * 2 as 'v_b) Seq(next_a, next_b) }.times(10) }
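The semantics of the multi-variable iteration above can be sketched in plain Python. This is an illustration only: `iterate` here is a hypothetical helper, not a Flink API, and a list of integers stands in for a single-column Table. The starting values are made up for the example.

```python
from typing import Callable, List, Sequence

Column = List[int]  # assumption: a single integer column stands in for a Table


def iterate(
    variables: Sequence[Column],
    step: Callable[..., Sequence[Column]],
    times: int,
) -> List[Column]:
    """Apply the step function to the iteration variables a fixed number of times."""
    current = list(variables)
    for _ in range(times):            # termination condition: .times(n)
        current = list(step(*current))
    return current


def step(a: Column, b: Column) -> List[Column]:
    next_a = [v_b + 1 for v_b in b]         # b.select('v_b + 1 as 'v_a)
    next_b = [v_a * 2 for v_a in next_a]    # next_a.select('v_a * 2 as 'v_b)
    return [next_a, next_b]


# Iteration variables a and b, with arbitrary starting values.
a, b = iterate([[0], [1]], step, times=10)
```

Each pass replaces both variables at once, so the step function always sees the values produced by the previous pass.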
  • 29. Nested Iteration: { val a: Table = ... val b: Table = ... val resultSeq = iterate(a, b) { val next_a = iterate(a) { Seq(a.select('v_a + 1 as 'v_a)) }.times(100).head val next_b = next_a.select('v_a * 2 as 'v_b) Seq(next_a, next_b) }.times(10) }
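Nesting works the same way: the outer step function simply runs an inner iteration to completion before producing its next values. A minimal plain-Python sketch, assuming a hypothetical `iterate` helper (not a Flink API) and a list of integers in place of a Table:

```python
from typing import Callable, List, Sequence

Column = List[int]  # assumption: a single integer column stands in for a Table


def iterate(
    variables: Sequence[Column],
    step: Callable[..., Sequence[Column]],
    times: int,
) -> List[Column]:
    """Apply the step function to the iteration variables a fixed number of times."""
    current = list(variables)
    for _ in range(times):
        current = list(step(*current))
    return current


def inner_step(a: Column) -> List[Column]:
    return [[v_a + 1 for v_a in a]]          # a.select('v_a + 1 as 'v_a)


def outer_step(a: Column, b: Column) -> List[Column]:
    # The inner loop runs 100 times inside every outer pass: .times(100).head
    next_a = iterate([a], inner_step, times=100)[0]
    next_b = [v_a * 2 for v_a in next_a]     # next_a.select('v_a * 2 as 'v_b)
    return [next_a, next_b]


a, b = iterate([[0], [0]], outer_step, times=10)
```

With these made-up starting values the inner loop adds 100 to `a` on each of the 10 outer passes.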
  • 30. Mini-batch iteration • A stream is chunked into multiple mini-batches • Each mini-batch iterates independently in the iteration loop • The results are emitted in mini-batch order. Diagram: mini-batches MB3, MB2, MB1 enter a Flink Cluster (Partition 1, Partition 2, Partition 3, …, Partition N, each running a map); MB2 and MB1 are inside the loop.
  • 31. Mini-batch iteration • Native support for Stochastic Gradient Descent (SGD) • Native support for online learning
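To make the link between mini-batches and SGD concrete, here is a small plain-Python sketch. It is a simplification under stated assumptions, not Flink code: the model is a single weight `w` for y = w * x, each mini-batch contributes one gradient step on the mean squared error, and the function names, batch size and learning rate are all illustrative.

```python
from typing import Iterable, Iterator, List, Tuple

Sample = Tuple[float, float]  # (feature x, label y)


def mini_batches(stream: Iterable[Sample], size: int) -> Iterator[List[Sample]]:
    """Chunk a stream into consecutive mini-batches, preserving arrival order."""
    batch: List[Sample] = []
    for sample in stream:
        batch.append(sample)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:                      # emit the trailing partial batch, if any
        yield batch


def sgd_step(w: float, batch: List[Sample], lr: float) -> float:
    """One gradient step on mean((w * x - y) ** 2) over the mini-batch."""
    grad = sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)
    return w - lr * grad


def train_online(stream: Iterable[Sample], batch_size: int = 4, lr: float = 0.05) -> List[float]:
    """Return the weight after every mini-batch, i.e. successive model versions."""
    w = 0.0
    versions: List[float] = []
    for batch in mini_batches(stream, batch_size):
        w = sgd_step(w, batch, lr)
        versions.append(w)         # results come out in mini-batch order
    return versions


# Synthetic stream where the true relationship is y = 3 * x.
stream = [(x, 3.0 * x) for x in [1.0, 2.0] * 100]
versions = train_online(stream)
```

Every mini-batch yields a new model version, which is what makes the same loop usable for online learning on an unbounded stream.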
  • 32. Iteration and Dynamic Model Update. An initial model and incoming samples feed Gradient Computing and Gradient Reduce; the Model is updated on each pass, producing Model_V1, Model_V2, Model_V3, … and eventually the Final Model.
  • 34. Dynamic Model Serving. Diagram: HDFS and Message Queue feed Preprocessing; Offline Training produces a static model, which goes through Model Validation; Online Training produces a dynamic model; Inference consumes Samples together with Model_V1, Model_V2, Model_V3, … The exact same mechanism of native iteration could be used for dynamic model serving.
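One way to picture dynamic model serving is an inference operator that receives model updates and samples on the same ordered input and always scores with the newest model it has seen. The sketch below is plain Python under stated assumptions: the event tuples, the `serve` function and the single-weight model are hypothetical and not part of any Flink API.

```python
from typing import Iterable, List, Tuple, Union

ModelUpdate = Tuple[str, int, float]   # ("model", version, weight)
SampleEvent = Tuple[str, float]        # ("sample", x)
Event = Union[ModelUpdate, SampleEvent]


def serve(events: Iterable[Event], initial_weight: float) -> List[Tuple[int, float]]:
    """Score each sample with the latest model version seen so far.

    Returns (model_version, prediction) pairs in arrival order.
    """
    version, weight = 0, initial_weight
    predictions: List[Tuple[int, float]] = []
    for event in events:
        if event[0] == "model":
            _, version, weight = event         # swap in the new model
        else:
            _, x = event
            predictions.append((version, weight * x))
    return predictions


events: List[Event] = [
    ("sample", 2.0),
    ("model", 1, 3.0),     # Model_V1 arrives
    ("sample", 2.0),
    ("model", 2, 4.0),     # Model_V2 arrives
    ("sample", 2.0),
]
predictions = serve(events, initial_weight=1.0)
```

Tagging each prediction with the model version keeps the output auditable, which helps with the monitoring and rollback concerns raised earlier in the deck.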
  • 35. Agenda • Why AI Ecosystem on Flink? • Flink ML Pipeline & Flink ML Libs • Deep learning on Flink • Enhanced Iteration & Dynamic Model Serving • Better Python support 35
  • 36. Flink Python Table API. Python process (Python VM): Python Table API, Python UDF, Python gateway. Java process: Java gateway server, DAG graph. The two processes communicate over RPC (Py4J); input arrives from upstream and output goes downstream. Working with the Apache Beam community.
  • 37. More Python API Support • Flink ML Pipeline • Flink-AI-Extended • DataStream 37
  • 38. Summary • Flink offers unique value for AI use cases • Flink fits well into the “lambda” ML architecture • Multiple ongoing efforts to make Flink more AI friendly • Flink ML Pipeline • Flink ML Libs • Deep learning on Flink • Iteration enhancement • Python API • …
  • 39. Q & A We are hiring!! becket.qin@gmail.com