Fabric for Deep Learning
FfDL
FfDL Github Page
https://github.com/IBM/FfDL
FfDL dwOpen Page
https://developer.ibm.com/code/open/projects/fabric-for-deep-learning-ffdl/
FfDL Announcement Blog
http://developer.ibm.com/code/2018/03/20/fabric-for-deep-learning
FfDL Technical Architecture Blog
http://developer.ibm.com/code/2018/03/20/democratize-ai-with-fabric-for-deep-learning
Deep Learning as a Service within Watson Studio
https://www.ibm.com/cloud/deep-learning
Research paper: “Scalable Multi-Framework Management of Deep Learning Training Jobs”
http://learningsys.org/nips17/assets/papers/paper_29.pdf
FfDL
Animesh Singh, Tommy Li
@AnimeshSingh
@Tomipli
https://github.com/IBM/FfDL

that automate
decisions.

to build models
Use data

The Enterprise AI Process
Gather Data → Analyze Data → Machine Learning → Deep Learning → Deploy Model → Maintain Model
Center for Open Source Data and AI Technologies
March 30 2018 / © 2018 IBM Corporation
codait (French) = coder/coded
https://m.interglot.com/fr/en/codait
Code – Build and improve practical frameworks to enable more developers to realize immediate value (e.g. FfDL, TensorFlow, Jupyter, Spark)
Content – Showcase solutions to complex and real-world AI problems
Community – Bring developers and data scientists together to engage
Improving Enterprise AI lifecycle in Open Source
[Figure: the Enterprise AI lifecycle (Gather Data, Analyze Data, Machine Learning, Deep Learning, Deploy Model, Maintain Model) with open source projects mapped onto it: Python Data Science Stack, Pandas, Scikit-Learn, Jupyter, Apache Spark, Keras + TensorFlow, Fabric for Deep Learning (FfDL), MLeap + PFA, Model Asset eXchange]
CODAIT
codait.org
Machine Learning and AI are everywhere!
• facial recognition unlocks your phone
• fraud detection protects your credit
• recommendations help you shop faster
• speech recognition lets you go hands-free
• chat bots route calls quicker
• autonomous vehicles detect pedestrians
• machine vision detects cancer early
• spam detection unclogs your inbox
IBM Cloud / Watson and Cloud Platform / © 2018 IBM Corporation
Deep Learning Has Revolutionized Machine Learning
[Figure: accuracy versus amount of data for deep learning and traditional machine learning]
[Figure: number of searches for “Deep Learning” from 2011 to 2017, on a 0 to 100 scale. Source: Google Trends, search term “Deep Learning”]
Deep learning marathon! Not a sprint.
[Figure: a marathon course from mile 1 to mile 26.2, with “We are here!” marked around mile 4]
[Figure: timeline of AI summer, AI winter and AI spring, with 1985 and 2012 marked; the current AI spring is labelled “Deep Learning + GPUs”]
Progress in Deep Learning
[Figure: timeline of milestones: IBM Deep Blue (chess, 1997); IBM Watson (Jeopardy, 2011); Apple releases Siri; AlexNet (2012, introduced deep learning with GPUs); Facebook’s face recognition; Siri gets deep learning; AlphaGo. Years marked on the axis: 1997, 2011, 2012, 2015, 2016, 2017]
What’s slowing progress in deep learning?
• too few practitioners
• tools are young and evolving
• need to do more with less data
[Figure labels: IBM Watson Studio, TensorBoard, “30 million students”, “8 million students”]
A human brain has:
• about 86 billion neurons
• on the order of 100 trillion connections between them
A large artificial neural network may have:
• 25 million “neurons”
• 100 million connections (parameters)
Deep Learning = Training Artificial Neural Networks
What is an artificial neuron?
Think of them as calculators: a neuron takes inputs x1, x2, x3, …, xn and produces an output.
$$\text{Output} = x_1 + x_2 + x_3 + \cdots + x_n$$
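The “calculator” view of a neuron fits in a few lines of Python. This is a minimal sketch of the slide’s simplified neuron, which only adds its inputs; practical artificial neurons also weight each input and usually apply an activation function.

```python
from typing import Sequence


def neuron_output(inputs: Sequence[float]) -> float:
    """Simplified neuron from the slide: output = x1 + x2 + ... + xn."""
    return sum(inputs)


print(neuron_output([0.5, 1.0, 2.5]))  # 4.0
```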
How do humans recognize numbers?
Human brains detect patterns within variations!
Perhaps by decomposing into sub-parts?
Source: https://ml4a.github.io/ml4a/neural_networks/
[Figure: a handwritten digit shown as a 28 × 28 pixel image]
Pixels of an image capture variations in light and dark.
How do we teach a computer that these 784 pixels are the number 8?
Source: https://ml4a.github.io/ml4a/neural_networks/
[Figure: the image unrolled into 784 pixel values p1, p2, p3, …, p784]
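A 28 × 28 image has 28 × 28 = 784 pixels, which is why the input is 784 values long. A minimal sketch of unrolling the image row by row into p1 … p784; the row-major order and the 0.0 (dark) to 1.0 (light) intensity scale are assumptions for illustration.

```python
from typing import List


def flatten_image(image: List[List[float]]) -> List[float]:
    """Unroll a 2-D grid of pixel intensities row by row into one list p1..pN."""
    return [pixel for row in image for pixel in row]


# A blank 28 x 28 grayscale image (0.0 = dark, 1.0 = light).
image = [[0.0] * 28 for _ in range(28)]
image[14][10] = 1.0  # light up a single pixel in row 14, column 10

pixels = flatten_image(image)
print(len(pixels))        # 784
print(pixels.index(1.0))  # 14 * 28 + 10 = 402
```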
Our first layer of neurons
One neuron per pixel in our image example
Source: https://ml4a.github.io/ml4a/neural_networks/
Putting it all together!
Source: https://ml4a.github.io/ml4a/neural_networks/
[Figure: the complete network, ending in an output layer]
Each layer transforms the input to match the desired output.
[Figure: the output layer producing a prediction]
Source: https://ml4a.github.io/ml4a/neural_networks/
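To make “each layer transforms the input” concrete, here is a minimal forward pass from 784 pixel inputs to 10 output neurons. The hidden-layer size of 16, the ReLU and softmax functions, and one output neuron per digit 0 to 9 are assumptions for illustration; the weights are random, so the prediction is meaningless until the network is trained.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def dense(inputs: List[float], weights: Matrix, biases: List[float]) -> List[float]:
    """Fully connected layer: each neuron outputs a weighted sum of all inputs plus its bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]


def relu(values: List[float]) -> List[float]:
    return [max(0.0, v) for v in values]


def softmax(values: List[float]) -> List[float]:
    """Turn raw output scores into probabilities that sum to 1."""
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def random_layer(n_in: int, n_out: int, rng: random.Random) -> Tuple[Matrix, List[float]]:
    weights = [[rng.uniform(-0.05, 0.05) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


rng = random.Random(0)
w1, b1 = random_layer(784, 16, rng)  # hidden layer: 16 neurons (arbitrary choice)
w2, b2 = random_layer(16, 10, rng)   # output layer: one neuron per digit 0-9

pixels = [rng.random() for _ in range(784)]  # stand-in for a flattened 28 x 28 image
hidden = relu(dense(pixels, w1, b1))
probs = softmax(dense(hidden, w2, b2))
prediction = max(range(10), key=lambda d: probs[d])
print(prediction)  # arbitrary until the weights are trained
```

Training (next slides) is what turns these random weights into ones that map the pixels of an 8 to the “8” output neuron.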
Not just pixels. Text, sound and much more can be used as input!
[Figure: a paragraph of placeholder (lorem ipsum) text split into individual words that are fed to the network]
Some useful prediction, like sentiment or even fraud!
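Text has to become numbers before a network can use it. One simple encoding, shown here as a sketch (a bag-of-words count vector over a made-up vocabulary; the slide does not say which encoding it uses), counts how often each known word appears, giving a fixed-length input vector just like the 784 pixels.

```python
import re
from collections import Counter
from typing import List


def bag_of_words(text: str, vocabulary: List[str]) -> List[int]:
    """Encode text as word counts, one slot per vocabulary word (unknown words are ignored)."""
    counts = Counter(re.findall(r"[a-z']+", text.lower()))
    return [counts[word] for word in vocabulary]


vocab = ["great", "terrible", "refund", "love"]  # hypothetical sentiment vocabulary
vector = bag_of_words("Great product, I love it. Great price!", vocab)
print(vector)  # [2, 0, 0, 1]
```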
Backpropagation: Iteratively train a neuron
[Figure: inputs x1, x2, x3, …, xn, weighted by w1, w2, w3, …, wn, feed a neuron; its output is compared with the desired output, and the difference (Δ error / loss) goes to an optimization function that updates the weights]
Adjust weights until the output matches expectation.
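The loop in the figure (compute the output, measure the error against the desired output, nudge the weights) can be sketched for a single linear neuron trained by gradient descent on squared error. For one neuron there are no hidden layers to propagate through, so this shows only the weight-update step that backpropagation generalizes; the learning rate, epoch count and target function are assumptions chosen for the example.

```python
from typing import List, Tuple


def predict(weights: List[float], bias: float, inputs: List[float]) -> float:
    """Weighted neuron: output = w1*x1 + w2*x2 + ... + wn*xn + bias."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias


def train_neuron(samples: List[Tuple[List[float], float]], lr: float = 0.05, epochs: int = 1000):
    """Adjust weights by gradient descent on the loss 0.5 * (output - desired)**2."""
    n = len(samples[0][0])
    weights, bias = [0.0] * n, 0.0
    for _ in range(epochs):
        for inputs, desired in samples:
            error = predict(weights, bias, inputs) - desired  # the Δ between output and desired output
            # The gradient of the loss w.r.t. each weight is error * x_i (and error for the bias).
            weights = [w - lr * error * x for w, x in zip(weights, inputs)]
            bias -= lr * error
    return weights, bias


# Desired behaviour for this example: output = 2*x1 - 3*x2 + 1
data = [([x1, x2], 2 * x1 - 3 * x2 + 1) for x1 in (0.0, 1.0, 2.0) for x2 in (0.0, 1.0, 2.0)]
w, b = train_neuron(data)
print([round(v, 2) for v in w], round(b, 2))  # approximately [2.0, -3.0] 1.0
```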
How does deep learning work?
1. Start with your data.
2. Define a neural network.
3. The model learns to recognize patterns in historical data.
4. The result is a trained model.
5. Enter new data into your model. If patterns in the new data match the training data, then the model makes accurate predictions.
GPU = Graphics Processing Unit
There are 3 paths to AI systems
1. Application developer: pre-trained models
2. SME: transfer learning (your domain data + pre-trained model)
3. Data scientist: custom models (your domain data)
1) Pre-Trained Models
The application developer deploys an app that submits data to a pre-trained model and receives a prediction.
2) Transfer Learning
An SME combines domain data with a pre-trained model to produce a transfer learning model, which the application developer deploys to an application.
3) Create Custom Models
A data scientist builds a custom model from domain data, which the application developer deploys to an application.
Take a Multi-Framework Approach to Deep Learning
New frameworks are emerging monthly.
TensorFlow was awesome yesterday, but it has static graphs, so PyTorch’s dynamic graphs are now popular.
[Figure: deep learning framework logos, including Caffe2]
Neural Network Design Workflow
1. Start from domain data and design the neural network.
2. Run hyperparameter optimization (HPO) over the neural network structure and hyperparameters to find the optimal hyperparameters.
3. Check whether performance meets needs. If not, start another experiment.
4. If yes, deploy the trained model to the cloud and keep evaluating it to see whether it is still good or has gone bad.
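Random search is one simple HPO strategy; the slide does not say which strategy FfDL or Watson Studio uses. In this sketch, each trial samples one hyperparameter combination and keeps the best. The search space and `fake_validation_loss` are hypothetical stand-ins for “train a model with these settings and measure its validation loss”.

```python
import random
from typing import Callable, Dict, List, Optional


def random_search(
    objective: Callable[[Dict[str, float]], float],
    space: Dict[str, List[float]],
    trials: int,
    seed: int = 0,
) -> Optional[Dict[str, float]]:
    """Try `trials` random hyperparameter combinations and return the one with the lowest loss."""
    rng = random.Random(seed)
    best, best_loss = None, float("inf")
    for _ in range(trials):  # each trial is "another experiment"
        candidate = {name: rng.choice(values) for name, values in space.items()}
        loss = objective(candidate)  # in practice: train a model, return its validation loss
        if loss < best_loss:
            best, best_loss = candidate, loss
    return best


# Hypothetical objective: lowest at learning_rate=0.01 with 2 hidden layers.
def fake_validation_loss(params: Dict[str, float]) -> float:
    return abs(params["learning_rate"] - 0.01) * 100 + abs(params["hidden_layers"] - 2)


space = {"learning_rate": [0.1, 0.01, 0.001], "hidden_layers": [1, 2, 3, 4]}
best = random_search(fake_validation_loss, space, trials=50)
print(best)
```

Grid search would try all 12 combinations here; random search scales better when there are many hyperparameters, at the cost of no guarantee of hitting the optimum.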
Introducing Fabric for Deep Learning
FfDL (pronounced as ‘fiddle’)
Multi-Framework Approach to Deep Learning, on your own Cloud
Fabric for Deep Learning
https://github.com/IBM/FfDL
FfDL provides a scalable, resilient, and fault-tolerant deep-learning framework
• Fabric for Deep Learning or FfDL (pronounced as ‘fiddle’) is an open source project that aims to make deep learning easily accessible to the people who matter most, i.e. data scientists and AI developers.
• FfDL provides a consistent way to deploy, train and visualize deep learning jobs across multiple frameworks like TensorFlow, Caffe, PyTorch, Keras etc.
• FfDL is being developed in close collaboration with IBM Research and IBM Watson. It forms the open source core of Watson’s Deep Learning service.
Fabric for Deep Learning
https://github.com/IBM/FfDL
FfDL is built using a microservices architecture on Kubernetes
• The FfDL platform uses a microservices architecture to offer resilience, scalability, multi-tenancy, and security without modifying the deep learning frameworks, and with no or minimal changes to model code.
• FfDL control plane microservices are deployed as pods on Kubernetes to manage a cluster of GPU- and CPU-enabled machines effectively.
• Tested platforms: Minikube, IBM Cloud Public, IBM Cloud Private, and GPUs using both the Kubernetes Accelerators feature gate and NVIDIA device plugins.
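With the NVIDIA device plugin, Kubernetes exposes GPUs as the extended resource `nvidia.com/gpu`, which a container requests under `resources.limits`. This sketch builds a minimal Pod manifest that does so. The pod name, label and container image are placeholders, and this is not FfDL’s actual learner manifest.

```python
import json


def learner_pod_manifest(name: str, image: str, gpus: int) -> dict:
    """Build a minimal Kubernetes Pod manifest that requests NVIDIA GPUs.

    GPUs from the NVIDIA device plugin are requested via the `nvidia.com/gpu`
    resource in `limits`. Name, label and image are hypothetical placeholders.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "labels": {"app": "learner"}},
        "spec": {
            "restartPolicy": "Never",  # a finished training run should not be restarted
            "containers": [
                {
                    "name": "learner",
                    "image": image,
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ],
        },
    }


manifest = learner_pod_manifest("learner-0", "tensorflow/tensorflow:latest-gpu", gpus=2)
print(json.dumps(manifest, indent=2))  # JSON output can be piped to `kubectl apply -f -`
```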
Access to elastic compute leveraging Kubernetes
Auto-allocation means infrastructure is used only when needed.
Training assets are managed and tracked.
[Figure: source code and a training definition run in a Kubernetes container on a compute cluster of NVIDIA Tesla K80, P100 and V100 GPUs; training artifacts are stored in Cloud Object Storage]
Model training distributed across containers
[Figure: Kubernetes container orchestration runs training in containers on a server cluster with NVIDIA GPUs, reading the dataset from Cloud Object Storage]
FfDL: Architecture
FfDL: Research Papers
https://arxiv.org/abs/1709.05871
FfDL: Research Papers
http://learningsys.org/nips17/assets/papers/paper_29.pdf
And we offer more:
Model Asset eXchange (MAX) and Adversarial Robustness Toolbox (ART)
IBM Model Asset eXchange (MAX)
MAX is a one-stop exchange to find ML/DL models created using popular machine learning engines, and it provides a standardized approach to consume these models for training and inferencing.
developer.ibm.com/code/exchanges/models/
IBM Adversarial Robustness Toolbox (ART)
ART is a library dedicated to adversarial machine learning. Its purpose is to allow rapid crafting and analysis of attack and defense methods for machine learning models. The Adversarial Robustness Toolbox provides implementations of many state-of-the-art methods for attacking and defending classifiers.
https://developer.ibm.com/code/open/projects/adversarial-robustness-toolbox/
The Adversarial Robustness Toolbox contains implementations of the following attacks:
• DeepFool (Moosavi-Dezfooli et al., 2015)
• Fast Gradient Method (Goodfellow et al., 2014)
• Jacobian Saliency Map (Papernot et al., 2016)
• Universal Perturbation (Moosavi-Dezfooli et al., 2016)
• Virtual Adversarial Method (Miyato et al., 2015)
• C&W Attack (Carlini and Wagner, 2016)
• NewtonFool (Jang et al., 2017)
The following defense methods are also supported:
• Feature squeezing (Xu et al., 2017)
• Spatial smoothing (Xu et al., 2017)
• Label smoothing (Warde-Farley and Goodfellow, 2016)
• Adversarial training (Szegedy et al., 2013)
• Virtual adversarial training (Miyato et al., 2017)
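The Fast Gradient Method listed above moves an input a small step in the direction that most increases the model’s loss. This is a minimal, self-contained sketch of its sign variant (FGSM) against a toy logistic-regression classifier with made-up weights. It does not use ART’s API, and real attacks target trained deep networks rather than this three-feature model.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w: List[float], b: float, x: List[float]) -> float:
    """Probability of class 1 under a toy logistic-regression classifier."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def sign(v: float) -> int:
    return (v > 0) - (v < 0)


def fgsm(w: List[float], b: float, x: List[float], y: int, eps: float) -> List[float]:
    """Fast Gradient Sign Method: x_adv = clip(x + eps * sign(dLoss/dx), 0, 1).

    For logistic regression with cross-entropy loss, dLoss/dx = (p - y) * w.
    """
    p = predict_proba(w, b, x)
    grad = [(p - y) * wi for wi in w]
    return [min(1.0, max(0.0, xi + eps * sign(gi))) for xi, gi in zip(x, grad)]


w, b = [4.0, -4.0, 2.0], -1.0  # hypothetical "trained" weights
x, y = [0.6, 0.4, 0.5], 1      # a clean input whose true class is 1
x_adv = fgsm(w, b, x, y, eps=0.2)
print(round(predict_proba(w, b, x), 3))      # 0.69  (classified as 1)
print(round(predict_proba(w, b, x_adv), 3))  # 0.231 (now classified as 0)
```

No feature moved by more than 0.2, yet the prediction flipped; defenses such as adversarial training and feature squeezing aim to make models robust to exactly this kind of small perturbation.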
FfDL
Core of Deep Learning as a Service in Watson Studio
Watson Studio
Tools for supporting the end-to-end AI workflow
• Model Lifecycle Management: create, collaborate, deploy, and monitor
• Machine Learning Runtimes and Deep Learning Runtimes: the most popular open source frameworks and IBM best-in-class frameworks
• Authoring Tools: best of breed open source & IBM tools; code (R, Python or Scala) and no-code/visual modeling tools
• Cloud Infrastructure as a Service: fully managed service, container-based resource management, and elastic pay-as-you-go CPU/GPU power
• Train neural networks in parallel across NVIDIA GPUs.
• Pay only for what you use. Auto-deallocation means no more remembering to shut down your cloud training instances.
• Monitor batch training experiments, then compare cross-model performance without worrying about log transfers and scripts to visualize results. You focus on designing your neural networks; we’ll manage and track your assets.
• Python client, command line interface (CLI) or UI? You choose the tooling that best fits your existing workflows.
• Training history and assets are tracked, then automatically transferred to the customer’s Object Storage for quick access.
• Deploy models into production, then monitor them to evaluate performance.
• Capture new data for continuous learning and retrain models so they continually adapt to changing conditions.
Deep Learning as a Service within Watson Studio, using FfDL as its core
Neural Network Modeller within Watson Studio: an intuitive drag-and-drop, no-code interface for designing neural network structure
DLaaS Training Dashboard in Watson Studio
FfDL
Architecture Details
FfDL: Current Release
[Figure: FfDL architecture. Components: Browser/Web UI, CLIs and SDKs; REST API; Trainer Service with MongoDB; Lifecycle Manager; etcd; a training job made up of learner pods (a learner such as TensorFlow, Caffe, PyTorch or Keras, plus a controller and a log collector), a parameter server and a job monitor; Training Data Service; ELK Stack; Prometheus with Push Gateway and Alert Manager; and object storage holding the model definition, training data and trained models. Arrows trace the path to launch a training job.]
REST API
• The REST API microservice handles REST-level HTTP requests and acts as a proxy to the lower-level gRPC Trainer service.
• The service also load-balances requests and is responsible for authentication. Load balancing is implemented by registering the REST API service instances dynamically in a service registry.
• The interface is specified through a Swagger definition file.
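Registry-based load balancing, as described for the REST API, can be sketched with an in-memory registry and a round-robin picker. Everything here is hypothetical (class names, service name, addresses) and only illustrates the idea; it is not FfDL’s implementation.

```python
import itertools
import threading
from typing import Dict, List


class ServiceRegistry:
    """Toy in-memory registry that service instances join and leave at runtime."""

    def __init__(self) -> None:
        self._instances: Dict[str, List[str]] = {}
        self._lock = threading.Lock()

    def register(self, service: str, address: str) -> None:
        with self._lock:
            self._instances.setdefault(service, []).append(address)

    def deregister(self, service: str, address: str) -> None:
        with self._lock:
            if address in self._instances.get(service, []):
                self._instances[service].remove(address)

    def instances(self, service: str) -> List[str]:
        with self._lock:
            return list(self._instances.get(service, []))


class RoundRobinBalancer:
    """Spread requests across whatever instances are registered right now."""

    def __init__(self, registry: ServiceRegistry, service: str) -> None:
        self._registry = registry
        self._service = service
        self._counter = itertools.count()

    def next_instance(self) -> str:
        instances = self._registry.instances(self._service)
        if not instances:
            raise LookupError(f"no registered instances of {self._service!r}")
        return instances[next(self._counter) % len(instances)]


registry = ServiceRegistry()
for addr in ("10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"):  # hypothetical addresses
    registry.register("rest-api", addr)
balancer = RoundRobinBalancer(registry, "rest-api")
picks = [balancer.next_instance() for _ in range(4)]
print(picks)  # ['10.0.0.1:8080', '10.0.0.2:8080', '10.0.0.3:8080', '10.0.0.1:8080']
```

Because the balancer re-reads the registry on every call, instances that register or deregister are picked up without restarting anything.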
Trainer Service
• The Trainer service admits training job requests, persisting metadata and model input configuration in a database (MongoDB).
• It initiates job deployment, halting, and (user-requested) job termination by calling the appropriate gRPC methods on the Lifecycle Manager microservice.
• The Trainer also assigns a unique identifier to each job, which all other components use to track the job.
• This data can also be used for billing/chargeback purposes.
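The Trainer’s responsibilities (admit a job, assign it a unique ID, persist its metadata, ask the Lifecycle Manager to act on it) can be sketched in a few classes. This is a hypothetical simplification: a dict stands in for MongoDB, a list of recorded calls stands in for the gRPC calls to the LCM, and the ID format and object-storage paths are made up.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class TrainingJob:
    job_id: str
    framework: str
    learners: int
    model_definition: str  # e.g. a hypothetical object-storage path
    status: str = "PENDING"


@dataclass
class Trainer:
    """Toy Trainer: admits jobs, persists their metadata, and hands them to a lifecycle manager."""

    jobs: Dict[str, TrainingJob] = field(default_factory=dict)      # stands in for MongoDB
    lcm_calls: List[Tuple[str, str]] = field(default_factory=list)  # stands in for gRPC calls to the LCM

    def create_job(self, framework: str, learners: int, model_definition: str) -> str:
        if learners < 1:
            raise ValueError("a training job needs at least one learner")
        job_id = f"training-{uuid.uuid4().hex[:12]}"  # unique ID that every other component uses
        self.jobs[job_id] = TrainingJob(job_id, framework, learners, model_definition)
        self.lcm_calls.append(("deploy", job_id))
        self.jobs[job_id].status = "DEPLOYING"
        return job_id

    def halt_job(self, job_id: str) -> None:
        self.lcm_calls.append(("halt", job_id))
        self.jobs[job_id].status = "HALTED"

    def terminate_job(self, job_id: str) -> None:
        self.lcm_calls.append(("terminate", job_id))
        self.jobs[job_id].status = "TERMINATED"


trainer = Trainer()
job_a = trainer.create_job("tensorflow", 4, "s3://my-bucket/model.zip")
job_b = trainer.create_job("pytorch", 1, "s3://my-bucket/other-model.zip")
trainer.terminate_job(job_a)
print(trainer.jobs[job_a].status, trainer.jobs[job_b].status)  # TERMINATED DEPLOYING
```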
Lifecycle Manager
• The Lifecycle Manager (LCM) deploys training jobs arriving from the Trainer, and halts (pauses) and terminates training jobs.
• The LCM uses the Kubernetes cluster manager to deploy containerized training jobs.
• A training job is a set of interconnected Kubernetes pods, each containing one or more Docker containers.
Training Jobs - Learner Pods
• The LCM determines the learner pods, parameter servers, and interconnections among them based on the job configuration, and calls on Kubernetes for deployment.
• For example, if a user creates a TensorFlow training job with four learners and two CPUs/GPUs per learner, the LCM creates five pods: one for each learner (called the learner pod), and one monitoring pod called the job monitor.
• As the training job progresses, information is needed to evaluate whether the learning process is succeeding or failing. These metrics normally come in the form of scalar values and are termed evaluation metrics.
[Figure: a training job made up of learner pods (learner, e.g. TensorFlow, Caffe, PyTorch, Keras etc., plus a controller and a log collector), a parameter server, and a job monitor]
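The slide’s worked example (four learners with two GPUs each yields five pods) can be reproduced with a small planning function. It is a hypothetical sketch of the counting rule only: pod names and roles are invented, and parameter-server pods, which the LCM may also create depending on the job configuration, are left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PodSpec:
    name: str
    role: str
    gpus: int


def plan_training_pods(job_id: str, learners: int, gpus_per_learner: int) -> List[PodSpec]:
    """One learner pod per learner, plus one job-monitor pod (parameter servers omitted)."""
    pods = [PodSpec(f"{job_id}-learner-{i}", "learner", gpus_per_learner) for i in range(learners)]
    pods.append(PodSpec(f"{job_id}-jobmonitor", "job-monitor", 0))
    return pods


pods = plan_training_pods("training-abc123", learners=4, gpus_per_learner=2)
print(len(pods))                  # 5
print(sum(p.gpus for p in pods))  # 8
```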
Training Data Service
• The Training Data Service (TDS) provides short-lived storage and retrieval for logs and evaluation data from a deep learning training job.
• While the learning job is running, a process runs as a sidecar to extract the training data from the learner and push it into the TDS, which in turn pushes the data into Elasticsearch.
• The sidecars used for collecting training data are termed log-collectors. Depending on the framework and the desired extraction method, different types of log-collectors can be used. At a minimum, log-collectors are responsible for collecting log lines and extracting evaluation metrics.
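The metric-extraction half of a log-collector can be sketched as regular expressions applied to each log line. The log format below is hypothetical; real frameworks print metrics in different formats, which is one reason different log-collector types exist. Forwarding the records to the TDS is not shown.

```python
import re
from typing import Dict, Iterable, List

# Hypothetical log format, e.g. "INFO step 100: loss=0.693 accuracy=0.51".
METRIC_PATTERN = re.compile(
    r"(?P<name>loss|accuracy)\s*[=:]\s*(?P<value>[-+]?\d*\.?\d+(?:[eE][-+]?\d+)?)"
)
STEP_PATTERN = re.compile(r"step\s*[=:]?\s*(\d+)")


def extract_metrics(lines: Iterable[str]) -> List[Dict[str, float]]:
    """Extract scalar evaluation metrics (and the step number, if present) from log lines."""
    records = []
    for line in lines:
        metrics = {m["name"]: float(m["value"]) for m in METRIC_PATTERN.finditer(line)}
        if not metrics:
            continue  # ordinary log line: would be collected as-is, not as a metric
        step = STEP_PATTERN.search(line)
        if step:
            metrics["step"] = float(step.group(1))
        records.append(metrics)
    return records


log = [
    "INFO step 100: loss=0.693 accuracy=0.51",
    "INFO saving checkpoint",
    "INFO step 200: loss=0.412 accuracy=0.84",
]
print(extract_metrics(log))
```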
FfDL: Current Release
[Figure: the same FfDL architecture, annotated with the flow of a job: launching it, reporting job status, and returning job info]
FfDL: Next Release (v0.1)
[Figure: the same architecture, with Horovod added alongside the learners and object storage mounted directly into the training job]
Demos
Animesh Singh, Tommy Li
@AnimeshSingh @Tomipli
THANK YOU!

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 

Fabric for Deep Learning

  • 1. Fabric for Deep Learning (FfDL). FfDL GitHub page: https://github.com/IBM/FfDL ; FfDL dwOpen page: https://developer.ibm.com/code/open/projects/fabric-for-deep-learning-ffdl/ ; FfDL announcement blog: http://developer.ibm.com/code/2018/03/20/fabric-for-deep-learning ; FfDL technical architecture blog: http://developer.ibm.com/code/2018/03/20/democratize-ai-with-fabric-for-deep-learning ; Deep Learning as a Service within Watson Studio: https://www.ibm.com/cloud/deep-learning ; research paper “Scalable Multi-Framework Management of Deep Learning Training Jobs”: http://learningsys.org/nips17/assets/papers/paper_29.pdf . Animesh Singh (@AnimeshSingh), Tommy Li (@Tomipli)
  • 2. The Enterprise AI Process: use data to build models that automate decisions. Gather Data → Analyze Data → Machine Learning → Deep Learning → Deploy Model → Maintain Model.
  • 3. Center for Open Source Data and AI Technologies (CODAIT, codait.org), March 30 2018 / © 2018 IBM Corporation. “codait” (French) = coder/coded (https://m.interglot.com/fr/en/codait). Code: build and improve practical frameworks to enable more developers to realize immediate value (e.g. FfDL, TensorFlow, Jupyter, Spark). Content: showcase solutions to complex, real-world AI problems. Community: bring developers and data scientists together to engage. Improving the enterprise AI lifecycle (Gather Data, Analyze Data, Machine Learning, Deep Learning, Deploy Model, Maintain Model) in open source, with projects including the Python data science stack, Pandas, Scikit-Learn, Apache Spark, Jupyter, Keras + TensorFlow, Fabric for Deep Learning (FfDL), MLeap + PFA and the Model Asset eXchange.
  • 4. Machine learning and AI are everywhere: facial recognition unlocks your phone; fraud detection protects your credit; recommendations help you shop faster; speech recognition lets you go hands-free; chat bots route calls quicker; autonomous vehicles detect pedestrians; machine vision detects cancer early; spam detection unclogs your inbox. (IBM Cloud / Watson and Cloud Platform / © 2018 IBM Corporation)
  • 5. Deep Learning Has Revolutionized Machine Learning. (Chart: accuracy vs. amount of data for deep learning and traditional machine learning.) (Chart: number of searches for “Deep Learning” from 2011 to 2017, on a 0 to 100 scale. Source: Google Trends, search term “Deep Learning”.)
  • 6. Deep learning: a marathon, not a sprint! (Graphic: a marathon course with mile markers from mile 1 to mile 26.2 and a “We are here!” marker.)
  • 7. (Timeline graphic: AI spring, AI winter and AI summer, with markers at 1985 and 2012 and the label “Deep Learning + GPUs”.)
  • 8. Progress in Deep Learning. (Timeline: 1997, IBM Deep Blue chess; 2011, IBM Watson Jeopardy and Apple releases Siri; 2012, AlexNet introduced deep learning with GPUs; 2015, Facebook's face recognition; 2016, Siri gets deep learning; 2017, AlphaGo.)
  • 9. What's slowing progress in deep learning? Too few practitioners; tools are young and evolving; need to do more with less data. (Graphic labels: IBM Watson Studio, TensorBoard; 30 million students, 8 million students.)
  • 10. Deep Learning = Training Artificial Neural Networks. A human brain has 200 billion neurons and 32 trillion connections between them, versus 25 million “neurons” and 100 million connections (parameters).
  • 11. What is an artificial neuron? Think of them as calculators: a neuron takes inputs x1, x2, x3, …, xn and produces Output = x1 + x2 + x3 + … + xn.
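The slide's "calculator" view of a neuron can be made concrete in a few lines of Python. This is a minimal sketch, not FfDL code. `slide_neuron` reproduces the plain sum shown on the slide. `neuron` adds three things: per-input weights (introduced on slide 21), a bias term, and a sigmoid activation. The bias and the sigmoid are common choices assumed here for illustration.

```python
import math


def slide_neuron(inputs):
    """The slide's simplified neuron: output = x1 + x2 + ... + xn."""
    return sum(inputs)


def neuron(inputs, weights, bias=0.0):
    """A weighted neuron: sigmoid(w1*x1 + ... + wn*xn + bias).

    The bias and the sigmoid activation are illustrative assumptions.
    """
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))


print(slide_neuron([1.0, 2.0, 3.0]))             # 6.0
print(neuron([1.0, 2.0, 3.0], [0.0, 0.0, 0.0]))  # 0.5, since sigmoid(0) = 0.5
```

Setting every weight to 1 and dropping the activation recovers the slide's plain sum.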
  • 12. How do humans recognize numbers?
  • 13. Human brains detect patterns within variations!
  • 14. Perhaps by decomposing into sub-parts?
  • 15. Pixels of an image capture variations in light and dark! (Graphic: a 28 × 28 pixel image. Source: https://ml4a.github.io/ml4a/neural_networks/)
  • 16. How do we teach a computer that these 784 pixels are the number 8? (Source: https://ml4a.github.io/ml4a/neural_networks/)
  • 17. Our first layer of neurons: one neuron per pixel in our image example (784 pixels: p1, p2, p3, p4, p5, p6, p7, …, p784).
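A 28 × 28 image becomes the 784 values fed to the first layer by reading its pixels row by row. The sketch below makes two assumptions: grayscale intensities lie in [0.0, 1.0], and the input is a hypothetical all-dark image with one bright pixel. It illustrates the one-neuron-per-pixel mapping, not any particular library's preprocessing.

```python
def flatten(image):
    """Flatten a 2-D grid of pixel intensities, row by row, into a 1-D list."""
    return [pixel for row in image for pixel in row]


# Hypothetical 28 x 28 grayscale image: 0.0 = dark, 1.0 = light.
image = [[0.0] * 28 for _ in range(28)]
image[14][10] = 1.0  # a single bright pixel at row 14, column 10

inputs = flatten(image)
print(len(inputs))        # 784, one input neuron per pixel
print(inputs.index(1.0))  # 402, i.e. row 14 * 28 + column 10
```

Pixel (r, c) lands at input index r * 28 + c. So p1 on the slide corresponds to index 0 and p784 to index 783.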
  • 18. Putting it all together! (Source: https://ml4a.github.io/ml4a/neural_networks/)
  • 19. Each layer transforms the input to match the desired output; the output layer produces the prediction! (Source: https://ml4a.github.io/ml4a/neural_networks/)
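Here is a minimal forward pass in plain Python, to make "each layer transforms the input" concrete. Several details are assumptions for illustration: the layer sizes (784 → 16 → 10), the ReLU and softmax activations, and the random, untrained weights. Because the weights are untrained, the prediction is arbitrary; training (slide 21) is what makes it meaningful.

```python
import math
import random


def dense(inputs, weights, biases, activation):
    """One fully connected layer: every output neuron sees every input."""
    return [
        activation(sum(x * w for x, w in zip(inputs, row)) + b)
        for row, b in zip(weights, biases)
    ]


def relu(z):
    return max(0.0, z)


def identity(z):
    return z


def softmax(scores):
    """Turn raw output scores into probabilities (shifted by the max for stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def random_layer(n_in, n_out, rng):
    """Small random weights and zero biases: an untrained layer."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


rng = random.Random(0)
# Hypothetical sizes: 784 pixel inputs -> 16 hidden neurons -> 10 outputs (digits 0-9).
w1, b1 = random_layer(784, 16, rng)
w2, b2 = random_layer(16, 10, rng)

pixels = [rng.random() for _ in range(784)]  # stand-in for a flattened 28 x 28 image
hidden = dense(pixels, w1, b1, relu)
probs = softmax(dense(hidden, w2, b2, identity))
prediction = probs.index(max(probs))  # index of the most probable digit
```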
  • 20. Not just pixels: text, sound and much more can be used as input, producing some useful prediction like sentiment or even fraud! (Graphic: placeholder “lorem ipsum” text as the example input. Source: https://ml4a.github.io/ml4a/neural_networks/)
  • 21. Backpropagation: iteratively train a neuron. Inputs x1, x2, x3, …, xn, weighted by w1, w2, w3, …, wn, feed the output neuron. Its output is compared with the desired output to give the error (loss), and an optimization function adjusts the weights until the output matches expectation.
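For a single neuron, the loop on this slide reduces to gradient descent. You compute the output, measure the error against the desired output, and nudge each weight against the gradient of the loss. The sketch below rests on several assumptions:

- a linear neuron (no activation),
- a squared-error loss,
- a fixed learning rate,
- hypothetical training data generated from target = 2·x1 − 1·x2 + 0.5.

In a multi-layer network, backpropagation applies the same chain rule layer by layer.

```python
def train_neuron(samples, n_inputs, learning_rate=0.1, epochs=500):
    """Fit a linear neuron y = w.x + b by per-sample gradient descent on
    the squared error 0.5 * (output - target) ** 2."""
    weights = [0.0] * n_inputs
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            output = sum(w * x for w, x in zip(weights, inputs)) + bias
            error = output - target  # d(loss)/d(output)
            # Chain rule: d(loss)/d(w_i) = error * x_i and d(loss)/d(b) = error.
            weights = [w - learning_rate * error * x for w, x in zip(weights, inputs)]
            bias -= learning_rate * error
    return weights, bias


# Hypothetical training data generated from target = 2*x1 - 1*x2 + 0.5.
samples = [
    ((x1, x2), 2.0 * x1 - x2 + 0.5)
    for x1 in (0.0, 0.5, 1.0)
    for x2 in (0.0, 0.5, 1.0)
]
weights, bias = train_neuron(samples, n_inputs=2)
print([round(w, 3) for w in weights], round(bias, 3))  # close to [2.0, -1.0] 0.5
```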
  • 22. How does deep learning work? (1) Start with your data. (2) Define a neural network. (3) The model learns to recognize patterns in historical data, producing a trained model. (4) Enter new data into your model. (5) If patterns in the new data match the training data, the model makes accurate predictions.
  • 23. GPU = Graphics Processing Unit
  • 24. There are 3 paths to AI systems: (1) application developer: pre-trained model; (2) SME: transfer learning, your domain data + a pre-trained model; (3) data scientist: custom models built from your domain data.
  • 27. 3) Create Custom Models. (Diagram: domain data, data scientist, custom model, deploy to application, application developer.)
  • 28. Take a Multi-Framework Approach to Deep Learning. New frameworks are emerging monthly. TensorFlow was awesome yesterday, but it has static graphs, so PyTorch's dynamic graphs are now popular. (Framework logos include Caffe2.)
  • 29. Neural Network Design Workflow: domain data → design neural network → hyperparameter optimization (HPO) of the neural network structure and hyperparameters → optimal hyperparameters → Performance meets needs? If no, start another experiment.
  • 30. Neural Network Design Workflow (continued): if performance meets needs, the trained model is deployed to the cloud and evaluated. While it is still good it stays deployed; when it turns bad, start another experiment.
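The design loop above can be sketched as a random search: choose hyperparameters, train, check whether performance meets needs, and otherwise start another experiment. Everything here is hypothetical:

- the search space,
- the `target` threshold,
- `train_and_evaluate`, which stands in for a real training run and scores configurations with a made-up formula so the loop is runnable.

Random search is only one simple HPO strategy; grid search and Bayesian optimization are common alternatives.

```python
import math
import random

# Hypothetical search space for two hyperparameters.
SEARCH_SPACE = {
    "learning_rate": [0.1, 0.03, 0.01, 0.003, 0.001],
    "hidden_units": [32, 64, 128, 256],
}


def train_and_evaluate(learning_rate, hidden_units):
    """Hypothetical stand-in for a full training run; returns a validation score in [0, 1].

    The formula is invented so the loop runs: it peaks at learning_rate=0.01
    and hidden_units=128.
    """
    lr_penalty = abs(math.log10(learning_rate) - math.log10(0.01))
    size_penalty = abs(hidden_units - 128) / 128
    return max(0.0, 1.0 - 0.2 * lr_penalty - 0.2 * size_penalty)


def random_search(target=0.95, max_experiments=30, seed=0):
    """Start experiments until performance meets needs or the budget runs out."""
    rng = random.Random(seed)
    best_config, best_score, experiments = None, -1.0, 0
    for _ in range(max_experiments):
        experiments += 1
        config = {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}
        score = train_and_evaluate(**config)
        if score > best_score:
            best_config, best_score = config, score
        if score >= target:  # performance meets needs: stop and keep this model
            break
    return best_config, best_score, experiments


config, score, experiments = random_search()
```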
  • 31. Introducing Fabric for Deep Learning: FfDL (pronounced “fiddle”). A multi-framework approach to deep learning, on your own cloud.
  • 32. Fabric for Deep Learning (https://github.com/IBM/FfDL): FfDL provides a scalable, resilient, and fault-tolerant deep learning framework.
    • Fabric for Deep Learning, or FfDL (pronounced “fiddle”), is an open source project that aims to make deep learning easily accessible to the people who need it most, i.e. data scientists and AI developers.
    • FfDL provides a consistent way to deploy, train and visualize deep learning jobs across multiple frameworks such as TensorFlow, Caffe, PyTorch and Keras.
    • FfDL is being developed in close collaboration with IBM Research and IBM Watson. It forms the core of Watson's Deep Learning service, in open source.
  • 33. Fabric for Deep Learning (https://github.com/IBM/FfDL): FfDL is built using a microservices architecture on Kubernetes.
    • The FfDL platform uses a microservices architecture to offer resilience, scalability, multi-tenancy, and security without modifying the deep learning frameworks, and with no or minimal changes to model code.
    • FfDL control plane microservices are deployed as pods on Kubernetes to effectively manage a cluster of GPU- and CPU-enabled machines.
    • Tested platforms: Minikube, IBM Cloud Public, IBM Cloud Private, and GPUs using both the Kubernetes Accelerators feature gate and NVIDIA device plugins.
  • 34. Access to elastic compute leveraging Kubernetes: auto-allocation means infrastructure is used only when needed, and training assets are managed and tracked. (Diagram: source code and training definition, Kubernetes container, compute cluster with NVIDIA Tesla K80, P100 and V100 GPUs, training artifacts in Cloud Object Storage.)
  • 35. Model training distributed across containers. (Diagram: NVIDIA GPUs, Kubernetes container orchestration, training runs in containers on a server cluster, dataset in Cloud Object Storage.)
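One common way to distribute training across containers is data parallelism. Each learner computes a gradient on its own shard of the dataset. The gradients are then averaged before the shared weights are updated, for example by a parameter server, which appears in the FfDL architecture later in this deck.

The sketch below is a generic illustration, not FfDL's implementation. It assumes a linear model with squared-error loss, equal-sized shards, and hypothetical data. It checks that averaging the shard gradients reproduces the full-dataset gradient.

```python
def gradient(weights, samples):
    """Mean gradient of 0.5 * (w.x - y) ** 2 over the given samples."""
    grad = [0.0] * len(weights)
    for x, y in samples:
        error = sum(w * xi for w, xi in zip(weights, x)) - y
        for i, xi in enumerate(x):
            grad[i] += error * xi
    return [g / len(samples) for g in grad]


def shard(samples, n_learners):
    """Deal samples round-robin so each learner gets one shard."""
    return [samples[i::n_learners] for i in range(n_learners)]


def averaged_gradient(weights, shards):
    """What a parameter server might do: average the learners' gradients."""
    per_learner = [gradient(weights, s) for s in shards]
    return [sum(g[i] for g in per_learner) / len(per_learner) for i in range(len(weights))]


# Hypothetical dataset of 8 samples (x1, x2) -> y, split across 4 learners.
data = [((float(i), 1.0), 3.0 * i + 1.0) for i in range(8)]
weights = [0.5, 0.0]
full = gradient(weights, data)
distributed = averaged_gradient(weights, shard(data, 4))
```

With unequal shard sizes, the average would need to be weighted by shard size to match the full-dataset gradient.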
  • 39. And we offer more: Model Asset eXchange (MAX) and Adversarial Robustness Toolbox (ART).
  • 40. IBM Model Asset eXchange (MAX): MAX is a one-stop exchange for finding ML/DL models created using popular machine learning engines, and it provides a standardized approach to consuming these models for training and inferencing. (developer.ibm.com/code/exchanges/models/)
  • 41. IBM Adversarial Robustness Toolbox (ART): ART is a library dedicated to adversarial machine learning. Its purpose is to allow rapid crafting and analysis of attack and defense methods for machine learning models. ART provides implementations of many state-of-the-art methods for attacking and defending classifiers (https://developer.ibm.com/code/open/projects/adversarial-robustness-toolbox/).
    • Attacks: DeepFool (Moosavi-Dezfooli et al., 2015); Fast Gradient Method (Goodfellow et al., 2014); Jacobian Saliency Map (Papernot et al., 2016); Universal Perturbation (Moosavi-Dezfooli et al., 2016); Virtual Adversarial Method (Miyato et al., 2015); C&W Attack (Carlini and Wagner, 2016); NewtonFool (Jang et al., 2017).
    • Defenses: feature squeezing (Xu et al., 2017); spatial smoothing (Xu et al., 2017); label smoothing (Warde-Farley and Goodfellow, 2016); adversarial training (Szegedy et al., 2013); virtual adversarial training (Miyato et al., 2017).
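The Fast Gradient Method listed above perturbs an input in the direction that increases the model's loss. Its L∞ (sign) variant is x_adv = x + ε · sign(∇ₓ loss). The sketch below is a from-scratch illustration on a hypothetical two-feature logistic-regression classifier, not ART's API. With labels y ∈ {−1, +1}, the input gradient of the logistic loss for this model is −y · σ(−y(w·x + b)) · w. Since σ is always positive, its sign is simply −y · sign(w).

```python
import math


def sign(v):
    return (v > 0) - (v < 0)


def margin(w, b, x, y):
    """y * (w.x + b): positive means correctly classified."""
    return y * (sum(wi * xi for wi, xi in zip(w, x)) + b)


def loss(w, b, x, y):
    """Logistic loss log(1 + exp(-margin)) for a label y in {-1, +1}."""
    return math.log1p(math.exp(-margin(w, b, x, y)))


def fgsm(w, b, x, y, epsilon):
    """Fast Gradient (Sign) Method for logistic regression.

    grad_x loss = -y * sigmoid(-margin) * w, and sigmoid(...) > 0,
    so sign(grad_x loss)_i = -y * sign(w_i).
    """
    return [xi + epsilon * (-y * sign(wi)) for wi, xi in zip(w, x)]


# Hypothetical trained classifier and a correctly classified input.
w, b = [2.0, -1.0], 0.0
x, y = [1.0, 0.5], +1
x_adv = fgsm(w, b, x, y, epsilon=0.6)
print(margin(w, b, x, y), margin(w, b, x_adv, y))  # positive, then negative: the label flips
```

Each feature moves by at most ε, yet the prediction changes. That small-perturbation effect is what ART's attacks and defenses are built to study.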
  • 42. FfDL: Core of Deep Learning as a Service in Watson Studio
  • 43. Watson Studio: tools for supporting the end-to-end AI workflow. Layers: model lifecycle management, machine learning runtimes, deep learning runtimes, authoring tools, cloud infrastructure as a service.
    • Most popular open source frameworks
    • IBM best-in-class frameworks
    • Create, collaborate, deploy, and monitor
    • Best-of-breed open source & IBM tools
    • Code (R, Python or Scala) and no-code/visual modeling tools
    • Fully managed service
    • Container-based resource management
    • Elastic pay-as-you-go CPU/GPU power
  • 44. Deep Learning as a Service within Watson Studio, using FfDL as its core. Train neural networks in parallel across NVIDIA GPUs. Pay only for what you use: auto-deallocation means no more remembering to shut down your cloud training instances. Monitor batch training experiments, then compare cross-model performance without worrying about log transfers and scripts to visualize results. You focus on designing your neural networks; we'll manage and track your assets. Python client, command line interface (CLI) or UI? You choose the tooling that best fits your existing workflows. Training history and assets are tracked, then automatically transferred to the customer's Object Storage for quick access. Deploy models into production, then monitor them to evaluate performance. Capture new data for continuous learning and retrain models so they continually adapt to changing conditions.
  • 45. Neural Network Modeler within Watson Studio: an intuitive drag-and-drop, no-code interface for designing neural network structure.
  • 46. DLaaS (Deep Learning as a Service) Training Dashboard in Watson Studio
  • 48. FfDL: Current Release. (Architecture diagram.)
    • Clients: CLIs, SDKs and browser (Web UI).
    • Control plane: REST API; Trainer Service (MongoDB); Lifecycle Manager (etcd); Training Data Service.
    • Training job: learner pod (learner, e.g. TensorFlow, Caffe, PyTorch, Keras; controller; log collector), parameter server and job monitor.
    • Monitoring and logging: Prometheus, Push Gateway and Alert Manager; ELK stack.
    • Storage: object storage for the model definition, training data and trained models.
    • Flow label: “Launch Training Job”.
  • 49. FfDL: Current Release, REST API. (Architecture diagram as on slide 48.)
    • The REST API microservice handles REST-level HTTP requests and acts as a proxy to the lower-level gRPC Trainer service.
    • The service also load-balances requests and is responsible for authentication. Load balancing is implemented by registering the REST API service instances dynamically in a service registry.
    • The interface is specified through a Swagger definition file.
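The slide says load balancing works by registering REST API instances dynamically in a service registry. The sketch below shows that idea with a hypothetical in-memory registry and round-robin selection. The class, method names, and addresses are illustrative assumptions; the slide does not specify FfDL's actual registry or balancing policy.

```python
import itertools


class ServiceRegistry:
    """Hypothetical in-memory registry of live service instances."""

    def __init__(self):
        self._instances = []
        self._counter = itertools.count()

    def register(self, address):
        if address not in self._instances:
            self._instances.append(address)

    def deregister(self, address):
        if address in self._instances:
            self._instances.remove(address)

    def next_instance(self):
        """Round-robin over whichever instances are registered right now."""
        if not self._instances:
            raise LookupError("no instances registered")
        return self._instances[next(self._counter) % len(self._instances)]


registry = ServiceRegistry()
for address in ["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"]:
    registry.register(address)
picks = [registry.next_instance() for _ in range(4)]  # cycles and wraps to the first
registry.deregister("10.0.0.2:8080")  # e.g. an instance went away
```

Because instances register and deregister at runtime, the balancer always picks from the current set rather than a fixed configuration.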
  • 50. FfDL: Current Release, Trainer Service. (Architecture diagram as on slide 48.)
    • The Trainer service admits training job requests, persisting metadata and model input configuration in a database (MongoDB).
    • It initiates job deployment, halting, and (user-requested) job termination by calling the appropriate gRPC methods on the Lifecycle Manager microservice.
    • The Trainer also assigns a unique identifier to each job, which all other components use to track the job.
    • The data can also be used for billing/chargeback purposes.
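The Trainer's responsibilities on this slide can be sketched as below: admit a job, persist its metadata, assign a unique identifier, then ask the Lifecycle Manager to deploy it. This is a hypothetical illustration:

- a dict stands in for MongoDB,
- a plain Python callable stands in for the gRPC call to the LCM,
- the field names, ID format, and model path are assumptions, not FfDL's schema.

```python
import uuid


class Trainer:
    """Hypothetical sketch of the Trainer service's job-admission flow."""

    def __init__(self, deploy_fn):
        self.jobs = {}              # stands in for the MongoDB job collection
        self.deploy_fn = deploy_fn  # stands in for the gRPC call to the LCM

    def create_job(self, model_definition, framework, learners):
        if learners < 1:
            raise ValueError("a job needs at least one learner")
        job_id = "training-" + uuid.uuid4().hex[:12]  # unique ID used by all components
        self.jobs[job_id] = {
            "model_definition": model_definition,
            "framework": framework,
            "learners": learners,
            "status": "PENDING",
        }
        self.deploy_fn(job_id, self.jobs[job_id])  # hand off to the Lifecycle Manager
        self.jobs[job_id]["status"] = "DEPLOYING"
        return job_id


deployed = []
trainer = Trainer(deploy_fn=lambda job_id, meta: deployed.append(job_id))
job_id = trainer.create_job("models/mnist-model.zip", "tensorflow", learners=4)
```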
  • 51. FfDL: Current Release, Lifecycle Manager. (Architecture diagram as on slide 48.)
    • The Lifecycle Manager (LCM) deploys training jobs arriving from the Trainer, and halts (pauses) and terminates training jobs.
    • The LCM uses the Kubernetes cluster manager to deploy containerized training jobs.
    • A training job is a set of interconnected Kubernetes pods, each containing one or more Docker containers.
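The LCM's operations (deploy, halt/pause, terminate) imply a small job state machine. The states, the "resume" and "complete" operations, and the allowed transitions below are assumptions chosen for illustration, not FfDL's actual status model. The point is that each request is checked against the job's current state before anything is sent to Kubernetes.

```python
# Hypothetical job states and the operations allowed in each.
ALLOWED = {
    "PENDING": {"deploy": "RUNNING", "terminate": "TERMINATED"},
    "RUNNING": {"halt": "HALTED", "terminate": "TERMINATED", "complete": "COMPLETED"},
    "HALTED": {"resume": "RUNNING", "terminate": "TERMINATED"},
    "COMPLETED": {},
    "TERMINATED": {},
}


def apply(state, operation):
    """Return the next state, or raise if the operation is not allowed in this state."""
    try:
        return ALLOWED[state][operation]
    except KeyError:
        raise ValueError(f"cannot {operation} a job in state {state}") from None


state = "PENDING"
for op in ["deploy", "halt", "resume", "terminate"]:
    state = apply(state, op)
print(state)  # TERMINATED
```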
  • 52. FfDL: Current Release, Lifecycle Manager: Training Jobs and Learner Pods. (Architecture diagram as on slide 48, with the training job expanded to show the parameter server; the learner pod with learner (e.g. TensorFlow, Caffe, PyTorch, Keras), controller and log collector; and the job monitor.)
    • The LCM determines the learner pods, parameter servers, and interconnections among them based on the job configuration, and calls on Kubernetes for deployment.
    • For example, if a user creates a TensorFlow training job with four learners and two CPUs/GPUs per learner, the LCM creates five pods: one for each learner (called the learner pod), and one monitoring pod called the job monitor.
    • As the training job progresses, information is needed to evaluate the ongoing success or failure of training. These metrics normally come in the form of scalar values and are termed evaluation metrics.
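The slide's example is simple arithmetic: one learner pod per learner plus one job-monitor pod, so four learners give 4 + 1 = 5 pods. The sketch below encodes that rule. It also accepts an optional parameter-server count, which is zero in the slide's example. The pod names, fields, and job ID are hypothetical; the `gpus` field simply echoes the per-learner request and is not FfDL's real pod specification.

```python
def plan_pods(job_id, learners, gpus_per_learner, parameter_servers=0):
    """Return a hypothetical list of pods for one training job."""
    if learners < 1:
        raise ValueError("a job needs at least one learner")
    pods = [
        {"name": f"{job_id}-learner-{i}", "role": "learner", "gpus": gpus_per_learner}
        for i in range(learners)
    ]
    pods += [
        {"name": f"{job_id}-ps-{i}", "role": "parameter-server", "gpus": 0}
        for i in range(parameter_servers)
    ]
    pods.append({"name": f"{job_id}-jobmonitor", "role": "job-monitor", "gpus": 0})
    return pods


pods = plan_pods("training-abc123", learners=4, gpus_per_learner=2)
print(len(pods))  # 5: four learner pods plus one job monitor
```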
  • 53. FfDL: Current Release, Training Data Service. (Architecture diagram as on slide 48.)
    • The Training Data Service (TDS) provides short-lived storage and retrieval for logs and evaluation data from a deep learning training job.
    • While the learning job is running, a process runs as a sidecar to extract the training data from the learner and push it into the TDS, which in turn pushes the data into Elasticsearch.
    • The sidecars used for collecting training data are termed log collectors. Depending on the framework and the desired extraction method, different types of log collectors can be used. A log collector's responsibilities include, at a minimum, both log line collection and evaluation metrics extraction.
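A log collector's two duties, collecting log lines and extracting evaluation metrics, can be sketched as a small parser. The learner log format, the regular expression, and the record shapes below are hypothetical. Real collectors depend on the framework and extraction method, as the slide notes.

```python
import re

# Hypothetical learner log format: "step=<int> loss=<float> accuracy=<float>"
METRIC_LINE = re.compile(r"step=(\d+)\s+loss=([0-9.]+)\s+accuracy=([0-9.]+)")


def collect(lines):
    """Keep every log line and extract evaluation metrics from matching lines."""
    log_records, metric_records = [], []
    for line_number, line in enumerate(lines, start=1):
        log_records.append({"line": line_number, "text": line.rstrip("\n")})
        match = METRIC_LINE.search(line)
        if match:
            step, loss, accuracy = match.groups()
            metric_records.append(
                {"step": int(step), "loss": float(loss), "accuracy": float(accuracy)}
            )
    return log_records, metric_records


sample_log = [
    "Loading training data...\n",
    "step=100 loss=0.85 accuracy=0.71\n",
    "step=200 loss=0.42 accuracy=0.88\n",
]
logs, metrics = collect(sample_log)
```

In the architecture on this slide, both record lists would be pushed to the TDS rather than returned.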
  • 54. FfDL: Current Release. (Architecture diagram as on slide 48, with the added flow labels “Launch”, “Job Status” and “Job Info”, and the model definition, training data and trained models also shown next to the training job.)
  • 56. FfDL: Next Release (v0.1). (Architecture diagram as in the current release, with Horovod added to the learner and object storage mounted directly into the training job, labeled “MOUNT OBJECT STORAGE”.)
  • 57. Demos. Animesh Singh (@AnimeshSingh), Tommy Li (@Tomipli)
  • 58. THANK YOU! FfDL GitHub page: https://github.com/IBM/FfDL