SlideShare ist ein Scribd-Unternehmen logo
1 von 115
Downloaden Sie, um offline zu lesen
Prepared
and
Analyzed
Data
Trained
Model
Deployed
Model
Prepared
and
Analyzed
Data
Initial Model
Deployed
Model
AI & Machine Learning Pipelines with Knative
@AnimeshSingh @Tomipli
≈
Center for Open Source
Data and AI
Technologies (CODAIT)
Code – Build and improve practical frameworks to enable
more developers to realize immediate value.
Content – Showcase solutions for complex and
real-world AI problems.
Community – Bring developers and data
scientists to engage with IBM
Gather
Data
Analyze
Data
Machine
Learning
Deep
Learning
Deploy
Model
Maintain
Model
Python
Data Science
Stack
Fabric for
Deep Learning
(FfDL)
Mleap +
PFA
Scikit-LearnPandas
Apache
Spark
Apache
Spark
Jupyter
Model
Asset
eXchange
Keras +
Tensorflow
Improving Enterprise AI lifecycle in
Open Source
•  Team	contributes	to	over	10	open	source	projects
•  17	committers	and	many	contributors	in	Apache	projects	
•  Over	1100	JIRAs	and	66,000	lines	of	code	committed	to	Apache	Spark	itself;	over	65,000	
LoC	into	SystemML			
•  Over	25	product	lines	within	IBM	leveraging	Apache	Spark	
•  Speakers	at	over	100	conferences,	meetups,	unconferences	and	more
CODAIT
codait.org
Agenda
3
•  Progress in ML and DL
•  AI and Cloud: Complimentary
•  Why we need Knative to build AI
Platform
•  How to build a transparent and
trusted AI platform leveraging
Knative
•  Demo
CODAIT
codait.org
2011
IBM Watson
Jeopardy
2017
AlphaGo
Apple’s
releases Siri
1997
…
Facebook’s
face 
recognition
2015 2016
Siri gets
deep learning
IBM Deep Blue
chess
AlexNet
Progress in Deep Learning
2012
Introduced
deep learning
with GPUs
4
1997
2011
IBM Watson
Jeopardy
2017
AlphaGo
Apple’s
releases Siri
1997
…
Facebook’s
face 
recognition
2015 2016
Siri gets
deep learning
IBM Deep Blue
chess
AlexNet
Progress in Deep Learning
2012
Introduced
deep learning
with GPUs
5
2011
2011
IBM Watson
Jeopardy
2017
AlphaGo
Apple’s
releases Siri
1997
…
Facebook’s
face 
recognition
2015 2016
Siri gets
deep learning
IBM Deep Blue
chess
AlexNet
Progress in Deep Learning
2012
Introduced
deep learning
with GPUs
6
2017
2011
IBM Watson
Jeopardy
2017
AlphaGo
Apple’s
releases Siri
1997
…
Facebook’s
face 
recognition
2015 2016
Siri gets
deep learning
IBM Deep Blue
chess
AlexNet
Progress in Deep Learning
2012
Introduced
deep learning
with GPUs
7
2017
2018
2011
IBM Watson
Jeopardy
2017
AlphaGo
Apple’s
releases Siri
1997
…
Facebook’s
face 
recognition
2015 2016
Siri gets
deep learning
IBM Deep Blue
chess
AlexNet
Progress in Deep Learning
2012
Introduced
deep learning
with GPUs
IBM Cloud / Watson and Cloud Platform / © 2018 IBM Corporation 8
A human brain has:
•  200 billion neurons
•  32 trillion connections between them
•  25 million “neurons”
•  100 million connections (parameters)
Deep Learning = Training Artificial Neural Networks
IBM Cloud / Watson and Cloud Platform / © 2018 IBM Corporation 9
Neural Network Design Workflow! domain
data
design
neural network
 HPO
•  neural network structure
•  hyperparameters
NO
Performance
meets needs?
Start another
experiment
 optimal
hyperparameters
Neural Network Design Workflow! domain
data
HPO
•  neural network structure
•  hyperparameters
NO
yes
Performance
meets needs?
Start another
experiment
trained
model
deployCloud
optimal
hyperparameters
evaluate
BAD
 Still
good!
design
neural network
So AI in general and
Deep Learning in
particular are very
iterative and repetitive.
And they need Cloud.
Why?
AI requires
the strength
of HPC &
GPUs
Ability to
scale AI
workloads on
demand
15
!
!
1.  Model/Data
Parallelism!
!
2.  MPI!
3.  NCCL!
4.  …..!
!
Ability to utilize various
technologies and achieve high
performance computing.
16
!
!
Microservices!
!
Containers!
!
DevOps
automation!
To scale we need to go
Cloud native for AI
API!
UI!
CLI!
Kubernetes !
!
Master!
Worker Node 1!
Worker Node 2!
Worker Node 3!
Worker Node n!
Registry
•  Etcd
•  API Server
•  Controller Manager
Server
•  Scheduler Server
And Cloud native means we use Kubernetes!!
18	
But is Kubernetes enough for Cloud Native platform?!
19	
Kubernetes is not the end game – Says who?!
20	
What else do we need? Let`s understand it from the context of an AI Lifecycle!
AI		
PLATFORM
AIOps 
Prepared
and
Analyzed
Data
AI		
PLATFORM	
Initial Model
Trained
Model
Deployed
Model
We need a Cloud native AI Platform to build, train, deploy and monitor Models!
AIOps 
Prepared
and
Analyzed
Data
AI		
PLATFORM	
Trained
Model
Deployed
Model
Also, we need a transparent and trusted AI Platform!
Prepared
and
Analyzed
Data
AI		
PLATFORM	
Initial Model
Deployed
Model
Is	model	training	is	showing	increasing	
loss?	
Are	model	weights		biased?	
Is	the	model	vulnerable	to	adversarial	
attacks?	
Has	the	training	data	changed?	
Are	the	model	predictions	less	accurate?	
Are	Hyperparameters	suboptimal?	
Are	predictions	biased?	
Is	the	dataset	biased?
AIOps 
Prepared
and
Analyzed
Data
Trained
Model
Deployed
Model
Infact, we need a transparent, trusted and automated AI Pipeline!
Prepared
and
Analyzed
Data
Initial Model
Deployed
Model
Is	model	training	is	showing	increasing	
loss?	
Are	model	weights		biased?	
Is	the	model	vulnerable	to	adversarial	
attacks?	
Has	the	training	data	changed?	
Are	ate	model	predictions	less	accurate?	
Are	Hyperparameters	suboptimal?	
Are	predictions	biased?	
Is	the	dataset	biased?
Trigger:	Trained	Model	is	showing	increasing	loss		
Prepare Data
AIOps 
Prepared
and
Analyzed
Data
Initial Model
Trained
Model
Deployed
Model
Train
Deploy
Harden
Train
Deploy
Trigger:	Model	performance	is	suboptimal	
Implement Defense
Train
Deploy
Trigger:	Model	is	vulnerable	to	attack	
Trigger:	Data	changed	
Optimize
Hyperparameters
Train
Trigger:	Bias	Detected	
Actions	
Transparent, trusted, automated and event driven AI Pipeline!
Trigger:	Trained	Model	is	showing	increasing	loss		
Prepare Data
AIOps 
Prepared
and
Analyzed
Data
Initial Model
Trained
Model
Deployed
Model
Train
Deploy
Harden
Train
Deploy
Trigger:	Model	performance	is	suboptimal	
Implement Defense
Train
Deploy
Trigger:	Model	is	vulnerable	to	attack	
Trigger:	Data	changed	
Optimize
Hyperparameters
Train
Trigger:	Bias	Detected	
Actions	
Transparent, trusted, automated, event driven and auditable AI Pipeline!
Trigger:	Trained	Model	is	showing	increasing	loss		
Prepare Data
AIOps 
Prepared
and
Analyzed
Data
Initial Model
Trained
Model
Deployed
Model
Train
Deploy
Debias
Train
Deploy
Trigger:	Model	performance	is	biased	
Implement Defense
Train
Deploy
Trigger:	Model	is	vulnerable	to	attack	
Trigger:	Data	changed	
Optimize
Hyperparameters
Train
Trigger:	Bias	Detected	
Data	Scientist	
AIOps	Engineer	
Actions	
Transparent, trusted, automated, event driven, auditable AI Pipeline as a Service!


Training Pipe











Model
Validation
Pipe






If we translate it to logical architecture, it looks like this…!










Data Pipe



















Model
Deployment
Pipe




















Deployment
Analysis
Pipe







AI Pipeline (Python Definition – Orchestrate and Track)
AI Developer and Data Scientist
 AI Operator




Python
Function











Python
Function











Python
Function











Python
Function











Python
Function
So we need more than Kubernetes. We need to be able to……!
build Data
Scientists
Code
orchestrate
the ML
code
automate
the ML
workflow
send event
notifications
Data	Scientist	
AIOps	Engineer
That means, we need the concepts of..!
Build
 Serving
Pipeline
Eventing
So……We need KNative!
Build
 Serving
Pipeline
Eventing
● Build
● Eventing
● Serving
● Pipeline
KNative provides a set of building blocks that enable modern, source-centric and container-based
serverless workloads on Kubernetes. Uses K8S CRDs to define a new set of APIs around
Well….what is Knative?!
Build
 Serving
Pipeline
Eventing
Knative Build Components
•  Build
•  Builder
•  BuildTemplate
For example, you can write a build that uses
Kubernetes-native resources to obtain your source
code from a repository, build a container image, then
run that image.
•  A Build can include multiple steps where each step specifies
a Builder.
•  A Builder is a type of container image that you create to
accomplish any task, whether that's a single step in a
process, or the whole process itself.
•  The steps in a Build can push to a repository.
•  A BuildTemplate can be used to defined reusable
parameterized templates.
Knative Build!
Build — Source-to-container build orchestration
KNative Eventing components
●  Sources — Sources that are firing events
●  Channels — A single event forwarding and
persistence layer
●  Subscriptions — Deliver and forward
events to channels/services.
	
	
Knative Eventing is designed around the
following goals:
●  Knative	Eventing	services	are	loosely	coupled.	
●  Event	producers	and	event	sources	are	independent.		
●  Other	services	can	be	connected	to	the	Eventing	
system..	
●  Ensure	cross-service	interoperability.	Knative	
Eventing	is	consistent	with	the	CloudEvents	
specification	that	is	developed	by	the	CNCF	
Serverless	WG.	
Knative Eventing!
Eventing — Delivery and management of events, universal subscription, binding services to event
ecosystems,
Knative Pipeline!
Knative Pipeline components
● Task — A collection of sequential steps you would want
to run as part of your CI flow. It including the inputs/
outputs and steps
● Pipeline — A graph of Tasks to execute.
● Runs — To invoke a Pipeline or a Task.
	
High	level	details	of	this	design:	
	
•  Pipelines	do	not	know	what	will	trigger	them,	they	can	be	triggered	
by	events	or	by	manually	creating	PipelineRuns	
•  Tasks	can	exist	and	be	invoked	completely	independently	of	
Pipelines	
•  Test	results	are	a	first	class	concept,	being	able	to	navigate	test	
results	easily	is	powerful		
•  Tasks	can	depend	on	artifacts,	output	and	parameters	created	by	
other	tasks.	
•  Resources	are	the	artifacts	used	as	inputs	and	outputs	of	TaskRuns.	
Pipeline— Configure	and	run	CI/CD	style	pipelines	for	your	kubernetes	application
●  Configuration
○  Desired current state of deployment (#HEAD)
○  Records both code and configuration (separated, ala 12 factor)
○  Stamps out builds / revisions as it is updated
●  Revision
○  Code and configuration snapshot
○  k8s infra: Deployment, ReplicaSet, Pods, etc
●  Route
○  Traffic assignment to Revisions (fractional scaling or by name)
○  Built using Istio
●  Service
○  Provides a simple entry point for UI and CLI tooling to achieve
common behavior
○  Acts as a top-level controller to orchestrate Route and
Configuration.
Configuration
Revision
Revision
Revision
Route
latest
explicit
creates
Service	
creates
creates
Knative Serving!
Serving — Request-driven compute model, scale to zero, autoscaling, routing and managing
traffic	
Knative Serving components
Revision	
Serving
Knative Serving!
Deployment	
Revision	
Serving
Knative Serving Building Block!
Worker Node	 Worker Node	
Deployment	
Replica Set	
Revision	
Serving
Knative Serving Building Block!
Worker Node	
Pod	
Worker Node	
Deployment	
Replica Set	
Revision	
Serving
Knative Serving Building Block!
Worker Node	
Pod	
Worker Node	
Deployment	
Replica Set	
Revision	
Pod	
Serving
Knative Serving Building Block!
Worker Node	 Worker Node	
Deployment	
Revision	
Replica Set	
Serving
Knative Serving Building Block!
Revision 1	
Revision 2	
Revision 3	
Revision 4	
Serving
Knative Serving Building Block!
Configuration	
Revision 1	
Revision 2	
Revision 3	
Revision 4	
Serving
Knative Serving Building Block!
Configuration	
Revision 1	
Revision 2	
Revision 3	
Revision 4	
Serving
Knative Serving Building Block!
Configuration	
Revision 1	
Revision 2	
Revision 3	
Revision 4	
Revision 5	
Serving
Knative Serving Building Block!
Configuration	
Revision 1	
Revision 2	
Revision 3	
Revision 4	
Revision 5	
Route	
Serving
Knative Serving Building Block!
Configuration	
Revision 1	
Revision 2	
Revision 3	
Revision 4	
Revision 5	
Route	
Serving
Knative Serving Building Block!
Configuration	
Revision 1	
Revision 2	
Revision 3	
Revision 4	
Revision 5	
Route	
5%	
95%	
Serving
Knative Serving Building Block!
Route	
Configuration	 Revision	
Creates	
References	
References	
Serving
Knative Serving Building Block!
Route	
Configuration	 Revision	
Creates	
References	
References	
Service	
Knative Serving Building Block!
Route	
Configuration	 Revision	
Creates	
References	
References	
Creates	
Service	
Creates	
Knative Serving Building Block!
● Build — Source-to-container build orchestration
● Eventing — Universal subscription, binding services to event ecosystems, delivery and management of events
● Serving — Request-driven compute model, scale to zero, autoscaling, routing and managing traffic
● Pipeline — Configure and run CI/CD style pipelines for your kubernetes application.
.
So Knative uses K8S CRDs to define a new set of APIs
!
Build
 Serving
Pipeline
Eventing
AIOps 
Prepared
and
Analyzed
Data
Trained
Model
Deployed
Model
But do we really need something like Knative to build an ML Platform?!
Prepared
and
Analyzed
Data
Initial Model
Deployed
Model
AIOps 
Prepared
and
Analyzed
Data
Trained
Model
Deployed
Model
Let`s go through different phases of AI Lifecycle through our use cases…!
Many tools available to build initial models!
Prepared
and
Analyzed
Data
Initial Model
Deployed
Model
AIOps 
Prepared
and
Analyzed
Data
Trained
Model
Deployed
Model
Many tools to train machine learning and deep learning models!
Prepared
and
Analyzed
Data
Initial Model
Deployed
Model
AIOps 
Prepared
and
Analyzed
Data
Trained
Model
Deployed
Model
We need a multi framework ML- DL cloud native platform !
Prepared
and
Analyzed
Data
Initial Model
Deployed
Model
e.g. Tensorflow is awesome ,
but has static graphs so
PyTorch’s dynamic graphs are
becoming more popular.
How do we handle multiple
Open Source Deep Learning
frameworks in a consistent way.
and leverage
the power of cloud for
distributed computing?
AIOps 
Prepared
and
Analyzed
Data
Trained
Model
Deployed
Model
Enter: Fabric for Deep Learning!
Prepared
and
Analyzed
Data
Initial Model
Deployed
Model
FfDL
Fabric for Deep Learning
https://github.com/IBM/FfDL
FfDL Github Page
https://github.com/IBM/FfDL
FfDL dwOpen Page
https://developer.ibm.com/code/open/projects/
fabric-for-deep-learning-ffdl/
FfDL Announcement Blog
http://developer.ibm.com/code/2018/03/20/
fabric-for-deep-learning
FfDL Technical Architecture Blog
http://developer.ibm.com/code/2018/03/20/
democratize-ai-with-fabric-for-deep-learning
Deep Learning as a Service within Watson Studio
https://www.ibm.com/cloud/deep-learning
Research paper: “Scalable Multi-Framework
Management of Deep Learning Training Jobs”
http://learningsys.org/nips17/assets/papers/
paper_29.pdf
•  Fabric for Deep Learning or FfDL (pronounced as ‘fiddle’
aims at making Deep Learning easily accessible to Data
Scientists, and AI developers.
•  FfDL Provides a consistent way to train and visualize Deep
Learning jobs across multiple frameworks like TensorFlow,
Caffe, PyTorch, Keras etc.
FfDL
58
Community Partners
FfDL is one of InfoWorld’s 2018 Best of Open
Source Software Award winners for machine
learning and deep learning!
Fabric for Deep Learning
https://github.com/IBM/FfDL
FfDL is built using Microservices architecture
on Kubernetes
•  FfDL platform uses a microservices architecture to offer
resilience, scalability, multi-tenancy, and security without
modifying the deep learning frameworks, and with no or
minimal changes to model code.
•  FfDL control plane microservices are deployed as pods on
Kubernetes to manage this cluster of GPU- and CPU-
enabled machines effectively
•  Tested Platforms: Kube DIND, IBM Cloud Public, IBM Cloud
Private, GPUs using both Kubernetes feature gate
Accelerators and NVidia device plugins
59
source code
training
definition
Access to elastic compute leveraging Kubernetes
Auto-allocation means infrastructure is used only when needed
Kubernetes container
training
artifacts
compute cluster
NVIDIA Tesla K80, P100, V100
Cloud Object Storage
Training assets are
managed and tracked.
IBM Cloud / Watson and Cloud Platform / © 2018 IBM Corporation 60
NVIDIA GPUs
Kubernetes
container orchestration

training runs
containers
Model training distributed across containers
server cluster
dataset
Cloud Object Storage
61
OBJECT
STORAGE	
Model
Definition


Training
Data

Trained
Models


REST 
API
Parameter
Server
Lifecycle 
Manager
Job Monitor
Training
Data
Mongo
DB
Trainer 
Service
Launch
Job
Status
Job Info
"

Prometheus
Push Gateway
Alert Manager


Web UI
Learner (e.g. TensorFlow, Caffe,
PyTorch, Keras etc.)
Controller
Learner Pod
Log Collector
Training Job
MOUNT	
OBJECT	
STORAGE	
BUCKET	
Elastic
Searc
h
FfDL: Architecture - Current Release!
Open MPI
CLIs
Browser
AIOps 
Prepared
and
Analyzed
Data
Trained
Model
Deployed
Model
Great , but we also need a batch scheduler for AI Workloads. Enter Kube-Batch!
Prepared
and
Analyzed
Data
Initial Model
Deployed
Model
kube-batch
OBJECT
STORAGE	
Model
Definition


Training
Data

Trained
Models


REST 
API
Parameter
Server
Job Monitor
Mongo
DB
Trainer 
Service
Launch
Job
Job Info
"

Prometheus
Push Gateway
Alert Manager


Web UI
Learner (e.g. TensorFlow, Caffe,
PyTorch, Keras etc.)
Controller
Learner Pod
Log Collector
Training Job
MOUNT	
OBJECT	
STORAGE	
BUCKET	
Elastic
Searc
h
Open MPI
CLIs
Browser
Scheduling	
Engine	
Kube-
Batch	
QueueJob	
Advanced Batch Scheduling with Kube-Batch !Advanced Batch Scheduling with Kube-Batch !
!
!
!
!
!
!
!
!
!
!
!
!
Kube-Batch: Filling gaps for FfDL and AI Workloads!
•  Job Queuing for enabling job priority and preemption
•  Holistic scheduling
•  Support for job dynamic prioritization/budgeting
•  User requested time based scheduling
•  Preemption/resumption in a supported checkpoint/
restart env
•  Topology aware placement (data intensive workloads
requiring network topology aware placement)
•  Partial preemption of elastic jobs
•  Performance estimation
•  Kube-Batch
–  Kubernetes incubator project
–  Introduces batch scheduling in Kubernetes
–  Support for simple job definition and lifecycle
management
–  Support for job queuing
–  Introduces queue-based quota management
12 Watson
services/apps
represented as
800+
Kubernetes
services
IBM Watson workloads:
Proven AI workload on IBM
Cloud Kubernetes Service
One deployment example:
3000+ pods on
500+ nodes
“We no longer worry about managing the
infrastructure because IBM Cloud
Kubernetes Service takes care of that
for us.” – Watson Project Team
Jason McGee / © 2018 IBM Corporation
AIOps 
Prepared
and
Analyzed
Data
Trained
Model
Deployed
Model
Training is accomplished. Model is ready – Can we trust it?!
Prepared
and
Analyzed
Data
Initial Model
Deployed
Model
Can	the	model	be	trusted?
Is it fair?
Is it easy to
understand?
Did anyone
tamper with it? Is it accountable?
#21, #32, #93	
#21, #32, #93	
What does it take to trust a decision made by a machine?!
(Other than that it is 99% accurate)?!
AIOps 
Prepared
and
Analyzed
Data
Trained
Model
Deployed
Model
So let`s start with vulnerability detection of Models?!
Prepared
and
Analyzed
Data
Initial Model
Deployed
Model
Is	the	model	vulnerable	to	
adversarial	attacks?
AIOps 
Prepared
and
Analyzed
Data
Trained
Model
Deployed
Model
Enter: Adversarial Robustness Toolbox!
Prepared
and
Analyzed
Data
Initial Model
Deployed
Model
ART
IBM Adversarial Robustness
Toolbox
ART
ART is a library dedicated to adversarial
machine learning. Its purpose is to allow rapid
crafting and analysis of attack and defense
methods for machine learning models. The
Adversarial Robustness Toolbox provides an
implementation for many state-of-the-art
methods for attacking and defending
classifiers.
71
https://github.com/IBM/adversarial-robustness-
toolbox
The Adversarial Robustness Toolbox contains
implementations of the following attacks:
Deep Fool (Moosavi-Dezfooli et al., 2015)
Fast Gradient Method (Goodfellow et al., 2014)
Jacobian Saliency Map (Papernot et al., 2016)
Universal Perturbation (Moosavi-Dezfooli et al., 2016)
Virtual Adversarial Method (Moosavi-Dezfooli et al.,
2015)
C&W Attack (Carlini and Wagner, 2016)
NewtonFool (Jang et al., 2017)
The following defense methods are also supported:
Feature squeezing (Xu et al., 2017)
Spatial smoothing (Xu et al., 2017)
Label smoothing (Warde-Farley and Goodfellow, 2016)
Adversarial training (Szegedy et al., 2013)
Virtual adversarial training (Miyato et al., 2017)
AIOps 
Prepared
and
Analyzed
Data
Trained
Model
Deployed
Model
Robustness check accomplished. How do we check for bias throughout lifecycle?!
Prepared
and
Analyzed
Data
Initial Model
Deployed
Model
Are	model	weights		
and	classifiers	biased?	
Are	predictions	
biased?	
Is	the	dataset	biased?
AIOps 
Prepared
and
Analyzed
Data
Trained
Model
Deployed
Model
Enter: AI Fairness 360!
Prepared
and
Analyzed
Data
Initial Model
Deployed
Model
AIF360
AI Fairness 360
https://github.com/IBM/AIF360
AIF360AIF360 toolkit is an open-source library to
help detect and remove bias in machine
learning models.
The AI Fairness 360 Python package includes
a comprehensive set of metrics for datasets
and models to test for biases, explanations for
these metrics, and algorithms to mitigate bias
in datasets and models.
Toolbox
Fairness metrics (30+)
Fairness metric explanations
Bias mitigation algorithms (9+)
74
Supported bias mitigation algorithms
Optimized Preprocessing (Calmon et al., 2017)
Disparate Impact Remover (Feldman et al., 2015)
Equalized Odds Postprocessing (Hardt et al., 2016)
Reweighing (Kamiran and Calders, 2012)
Reject Option Classification (Kamiran et al., 2012)
Prejudice Remover Regularizer (Kamishima et al., 2012)
Calibrated Equalized Odds Postprocessing (Pleiss et al.,
2017)
Learning Fair Representations (Zemel et al., 2013)
Adversarial Debiasing (Zhang et al., 2018)
Supported fairness metrics
Comprehensive set of group fairness metrics derived
from selection rates and error rates
Comprehensive set of sample distortion metrics
Generalized Entropy Index (Speicher et al., 2018)
(d’Alessandro et al., 2017)
AIF 360 detects for fairness in building and deploying models throughout AI
Lifecycle!
dataset
metric
pre-
processing
algorithm in-
processing
algorithm
post-
processing
algorithm
classifier
metric
classifier
metric
explainer
dataset
metric
explainer
With Metrics, Algorithms, and Explainers!
AIOps 
Prepared
and
Analyzed
Data
Trained
Model
Deployed
Model
Model is trained, tested and validated. Just deploy as microservices on
Kubernetes? Do we need anything else?!
Prepared
and
Analyzed
Data
Initial Model
Deployed
Model
AIOps 
Prepared
and
Analyzed
Data
Trained
Model
Deployed
Model
Model is trained, tested and validated. Just deploy as microservices on
Kubernetes? Do we need anything else?!
Prepared
and
Analyzed
Data
Initial Model
Deployed
Model
q  As I develop multiple versions
of my models, how can I
easily dark launch and shift
traffic?
q  How can I add and enforce
policies on my model
microservices?
q  The network among microservices
is not reliable.
q  How can I monitor and trace my
model microservices?
q  How can I ensure the
communication among
microservices is secure?
AIOps 
Prepared
and
Analyzed
Data
Trained
Model
Deployed
Model
Enter Istio!
Prepared
and
Analyzed
Data
Initial Model
Deployed
Model
An open service mesh platform to connect, observe,
secure, and control microservices.
Istio!
Connect: Traffic Control, Discovery,
Load Balancing, Resiliency
Observe: Metrics, Logging, Tracing
Secure: Encryption (TLS),
Authentication, and Authorization of
service-to-service communication
Control: Policy Enforcement
Istio!
BA
call
How does it work?

!
1. Deploy a proxy (Envoy) beside your application (“sidecar deployment”)
Env
oy
A
Envoy
Env
oy
B
Envoy
call
How does it work?

!
2. Deploy Pilot to configure the sidecars
Envoy
Pilot
config
Env
oy
A
Envoy
Env
oy
B
Envoy
How does it work?

!
3. Deploy Telemetry to get telemetry
Envoy
Pilot
Env
oy
A
Envoy
Env
oy
B
Envoy
Envoy
Telemetry
telemetry
How does it work?

!
4. Deploy Citadel to assign identities and enable secure communication
Envoy
Pilot
Env
oy
A
Envoy
Env
oy
B
Envoy
Envoy
Telemetry
Envoy
Citadel
How does it work?

!
5. Deploy Policy to enforce policies
Envoy
Pilot
policy decisions
Envoy
A
Envoy Envoy
B
Envoy
Envoy
Policy
Envoy
Telemetry
Envoy
Citadel
How does it work?

!
Pod	
Traffic Routing
50%	
50%	
Traffic Splitting
Pod
user	=	jason	
Traffic Steering
Pod
100%	
Access Policy
Pod
And….Istio is part of Knative!!!!
AIOps 
Model is deployed. We need to ensure triggers and actions can be defined based
on any vulnerability detection, dataset changes, bias detection etc… !
Prepared
and
Analyzed
Data
Trained
Model
Deployed
Model
Prepared
and
Analyzed
Data
Initial Model
Deployed
Model
AIOps 
Enter Apache OpenWhisk!
Prepared
and
Analyzed
Data
Trained
Model
Deployed
Model
Prepared
and
Analyzed
Data
Initial Model
Deployed
Model
Triggers
(response)
Rules
Actions
(code)
Source
(events) Results
OpenWhisk	Apache OpenWhisk!
FaaS platform to
execute code in
response to events!
Delivered as

Open source via Apache
openwhisk.org
OpenWhisk	Apache OpenWhisk!
Event	
Provider	 Periodic	 IBM	Cloudant Message	Hub	
Mobile	Push	 Github	
OpenWhisk	
IBM	App	Connect	
OpenWhisk!
98
And….OpenWhisk is being developed to work with Knative!!!!
AIOps 
And last but not the least, what do we use for Pipeline orchestration?!
Prepared
and
Analyzed
Data
Trained
Model
Deployed
Model
Prepared
and
Analyzed
Data
Initial Model
Deployed
Model
100
•  Currently in IBM we use IBM DevOps toolchain pipeline, built on top of Jenkins
•  On open source, we are investigating, Pachyderm, Argo and Airflow. 
•  Pachyderm and Argo are Dataflow and Workflow orchestrators on Kubernetes
•  Both have strong merits, and provide a DevOps Operator view around pipeline execution
times, logs, metrics, exit handlers, notification systems etc.
•  As discussed, Knative is now evolving its own Pipelining capabilities, though in early stage 
AI Pipeline Orchestration!
AIOps 
Prepared
and
Analyzed
Data
Trained
Model
Deployed
Model
AIOps platform is ready. How do Data Scientists interact with it using Notebooks?!
Prepared
and
Analyzed
Data
Initial Model
Deployed
Model
Jupyter
Enterprise
Gateway
Jupyter Enterprise Gateway
March 30 2018 / © 2018 IBM Corporation
Jupyter Enterprise Gateway at IBM Code
https://developer.ibm.com/code/openprojects/jupyter-enterprise-gateway/
Jupyter Enterprise Gateway source code at GitHub
https://github.com/jupyter-incubator/enterprise_gateway
Jupyter Enterprise Gateway Documentation
http://jupyter-enterprise-gateway.readthedocs.io/en/latest/
102
A lightweight, multi-tenant, scalable and secure
gateway that enables Jupyter Notebooks to
share resources across an Apache Spark or
Kubernetes cluster for Enterprise/Cloud use
cases
Kernel
Kernel
Kernel
Kernel
Kernel
KernelKernel
Jupyter Notebooks: Support for FfDL	
User Experience
103	
•  User	select	where	to	run	the	experiment	
•  Job	is	packaged	and	submitted	on	behalf	of	
user	
•  User	has	access	to	Job	Console	to	monitor	
experiment
AIOps 
Prepared
and
Analyzed
Data
Trained
Model
Deployed
Model
Together: A Transparent, and trusted Open Source AI Pipeline using Knative
Prepared
and
Analyzed
Data
Initial Model
Deployed
Model
FfDL kube-batch
Jupyter Enterprise Gateway
MAX
AIF360 AIF360
Istio OpenWhisk
ART
DEMO
105
CODAIT
codait.org
Demo Model: Gender Classification!
Data	
UTKFace	
Simple	convolutional	neural	network	(CNN)		
with	3	convolutional	layers		
and	2	fully	connected	layers	
Male	
	
	
	
Female	
SoftMax	
Confident	Score
Demo Flow!
Fabric for Deep Learning
Data	
Robustness	
Check	
Model	
Deployment	
ADVERSARIAL ROBUSTNESS 
TOOLBOX
Fairness	
Check	
AI Fairness 360
Model		
Revision	1	
10%	
90%	
Model		
Revision	2	
Gender Classification Model
Source to Container:
Knative Build
ML
Code
Training
To fully automate the workflow, Knative Pipeline can be used with
Knative Eventing!
Fabric for Deep Learning
Data	
Robustness	
Check	
Model	
Deployment	
ADVERSARIAL ROBUSTNESS 
TOOLBOX
Fairness	
Check	
AI Fairness 360
Model		
Revision	1	
10%	
90%	
Model		
Revision	2	
Gender Classification Model
Source to Container:
Knative Build
ML
Code
Training	
Eventing
Knative Eventing Sources, Channel and Subscription can be defined!
Knative Pipeline Tasks and Flow Definition can be defined!
For this demonstration, we used Argo Workflows for Pipelining!
•  Knative brings serveless actions, events, workflows, apps, containers, service-mesh in one single
stack
•  Knative eliminates the need for stitching various individual components together (events, actions,
service mesh, pipeline etc.)
•  Knative Build and Serving are simple to use and integrate with our existing containers.
•  Traffic Routing and Canary testing on Knative can be done without any Istio knowledge.
•  Built-in monitoring tools are very helpful to visualize the benefits of building services using Knative.
•  Knative Eventing shows promise, needs to evolve more for production
•  Knative Build-Pipeline is still in very early stage, and changing rapidly.
•  Knative Build-Pipeline could get very complicated with many tasks and it needs a better debugging
tool/GUI to monitor errors and visualize the pipeline flow.
What we learnt!
AIOps 
Prepared
and
Analyzed
Data
Trained
Model
Deployed
Model
Bringing it together: A Secure, transparent, and trusted Open Source AI Pipeline
Prepared
and
Analyzed
Data
Initial Model
Deployed
Model
FfDL kube-batch
Jupyter Enterprise Gateway
MAX
AIF360 AIF360
Istio OpenWhisk
ART
Links:!
•  Knative: https://github.com/knative
•  Fabric for Deep Learning: https://github.com/IBM/FfDL
•  Adversarial Robustness Toolbox: https://github.com/IBM/adversarial-robustness-toolbox
•  AI Fairness 360: http://aif360.mybluemix.net/
•  Jupyter Enterprise Gateway: https://github.com/jupyter/enterprise_gateway
•  Kube-Batch: https://github.com/kubernetes-sigs/kube-batch
•  Istio: https://github.com/istio/istio
•  OpenWhisk: https://github.com/apache/incubator-openwhisk
•  Kiali: https://github.com/kiali/kiali
•  Argo: https://github.com/argoproj/argo
•  PyTorch: https://pytorch.org/
Prepared
and
Analyzed
Data
Trained
Model
Deployed
Model
Prepared
and
Analyzed
Data
Initial Model
Deployed
Model
@AnimeshSingh @Tomipli
Thank You!
AI & Machine Learning Pipelines with Knative

Weitere ähnliche Inhalte

Was ist angesagt?

Let's build Developer Portal with Backstage
Let's build Developer Portal with BackstageLet's build Developer Portal with Backstage
Let's build Developer Portal with BackstageOpsta
 
CI with Gitlab & Docker
CI with Gitlab & DockerCI with Gitlab & Docker
CI with Gitlab & DockerJoerg Henning
 
Using HashiCorp’s Terraform to build your infrastructure on AWS - Pop-up Loft...
Using HashiCorp’s Terraform to build your infrastructure on AWS - Pop-up Loft...Using HashiCorp’s Terraform to build your infrastructure on AWS - Pop-up Loft...
Using HashiCorp’s Terraform to build your infrastructure on AWS - Pop-up Loft...Amazon Web Services
 
Continues Integration and Continuous Delivery with Azure DevOps - Deploy Anyt...
Continues Integration and Continuous Delivery with Azure DevOps - Deploy Anyt...Continues Integration and Continuous Delivery with Azure DevOps - Deploy Anyt...
Continues Integration and Continuous Delivery with Azure DevOps - Deploy Anyt...Janusz Nowak
 
Introduction to the Container Network Interface (CNI)
Introduction to the Container Network Interface (CNI)Introduction to the Container Network Interface (CNI)
Introduction to the Container Network Interface (CNI)Weaveworks
 
AWS Black Belt Online Seminar AWS Direct Connect
AWS Black Belt Online Seminar AWS Direct ConnectAWS Black Belt Online Seminar AWS Direct Connect
AWS Black Belt Online Seminar AWS Direct ConnectAmazon Web Services Japan
 
Infrastructure-as-Code (IaC) using Terraform
Infrastructure-as-Code (IaC) using TerraformInfrastructure-as-Code (IaC) using Terraform
Infrastructure-as-Code (IaC) using TerraformAdin Ermie
 
CI and CD with Jenkins
CI and CD with JenkinsCI and CD with Jenkins
CI and CD with JenkinsMartin Málek
 
Terraform: An Overview & Introduction
Terraform: An Overview & IntroductionTerraform: An Overview & Introduction
Terraform: An Overview & IntroductionLee Trout
 
OpenShift 4, the smarter Kubernetes platform
OpenShift 4, the smarter Kubernetes platformOpenShift 4, the smarter Kubernetes platform
OpenShift 4, the smarter Kubernetes platformKangaroot
 
Kubernetes - Security Journey
Kubernetes - Security JourneyKubernetes - Security Journey
Kubernetes - Security JourneyJerry Jalava
 
Cloud-Native Integration with Apache Camel on Kubernetes (Copenhagen October ...
Cloud-Native Integration with Apache Camel on Kubernetes (Copenhagen October ...Cloud-Native Integration with Apache Camel on Kubernetes (Copenhagen October ...
Cloud-Native Integration with Apache Camel on Kubernetes (Copenhagen October ...Claus Ibsen
 
Deploying Azure DevOps using Terraform
Deploying Azure DevOps using TerraformDeploying Azure DevOps using Terraform
Deploying Azure DevOps using TerraformAdin Ermie
 
Kubernetes Docker Container Implementation Ppt PowerPoint Presentation Slide ...
Kubernetes Docker Container Implementation Ppt PowerPoint Presentation Slide ...Kubernetes Docker Container Implementation Ppt PowerPoint Presentation Slide ...
Kubernetes Docker Container Implementation Ppt PowerPoint Presentation Slide ...SlideTeam
 
Helm - Application deployment management for Kubernetes
Helm - Application deployment management for KubernetesHelm - Application deployment management for Kubernetes
Helm - Application deployment management for KubernetesAlexei Ledenev
 
Docker 101: Introduction to Docker
Docker 101: Introduction to DockerDocker 101: Introduction to Docker
Docker 101: Introduction to DockerDocker, Inc.
 

Was ist angesagt? (20)

Let's build Developer Portal with Backstage
Let's build Developer Portal with BackstageLet's build Developer Portal with Backstage
Let's build Developer Portal with Backstage
 
CI with Gitlab & Docker
CI with Gitlab & DockerCI with Gitlab & Docker
CI with Gitlab & Docker
 
Using HashiCorp’s Terraform to build your infrastructure on AWS - Pop-up Loft...
Using HashiCorp’s Terraform to build your infrastructure on AWS - Pop-up Loft...Using HashiCorp’s Terraform to build your infrastructure on AWS - Pop-up Loft...
Using HashiCorp’s Terraform to build your infrastructure on AWS - Pop-up Loft...
 
Continues Integration and Continuous Delivery with Azure DevOps - Deploy Anyt...
Continues Integration and Continuous Delivery with Azure DevOps - Deploy Anyt...Continues Integration and Continuous Delivery with Azure DevOps - Deploy Anyt...
Continues Integration and Continuous Delivery with Azure DevOps - Deploy Anyt...
 
Introduction to the Container Network Interface (CNI)
Introduction to the Container Network Interface (CNI)Introduction to the Container Network Interface (CNI)
Introduction to the Container Network Interface (CNI)
 
AWS Black Belt Online Seminar AWS Direct Connect
AWS Black Belt Online Seminar AWS Direct ConnectAWS Black Belt Online Seminar AWS Direct Connect
AWS Black Belt Online Seminar AWS Direct Connect
 
Infrastructure-as-Code (IaC) using Terraform
Infrastructure-as-Code (IaC) using TerraformInfrastructure-as-Code (IaC) using Terraform
Infrastructure-as-Code (IaC) using Terraform
 
CI and CD with Jenkins
CI and CD with JenkinsCI and CD with Jenkins
CI and CD with Jenkins
 
Terraform: An Overview & Introduction
Terraform: An Overview & IntroductionTerraform: An Overview & Introduction
Terraform: An Overview & Introduction
 
Terraform
TerraformTerraform
Terraform
 
OpenShift 4, the smarter Kubernetes platform
OpenShift 4, the smarter Kubernetes platformOpenShift 4, the smarter Kubernetes platform
OpenShift 4, the smarter Kubernetes platform
 
Kubernetes - Security Journey
Kubernetes - Security JourneyKubernetes - Security Journey
Kubernetes - Security Journey
 
Cloud-Native Integration with Apache Camel on Kubernetes (Copenhagen October ...
Cloud-Native Integration with Apache Camel on Kubernetes (Copenhagen October ...Cloud-Native Integration with Apache Camel on Kubernetes (Copenhagen October ...
Cloud-Native Integration with Apache Camel on Kubernetes (Copenhagen October ...
 
Deploying Azure DevOps using Terraform
Deploying Azure DevOps using TerraformDeploying Azure DevOps using Terraform
Deploying Azure DevOps using Terraform
 
Kubernetes Docker Container Implementation Ppt PowerPoint Presentation Slide ...
Kubernetes Docker Container Implementation Ppt PowerPoint Presentation Slide ...Kubernetes Docker Container Implementation Ppt PowerPoint Presentation Slide ...
Kubernetes Docker Container Implementation Ppt PowerPoint Presentation Slide ...
 
Helm - Application deployment management for Kubernetes
Helm - Application deployment management for KubernetesHelm - Application deployment management for Kubernetes
Helm - Application deployment management for Kubernetes
 
Terraform
TerraformTerraform
Terraform
 
Introduction to Microservices
Introduction to MicroservicesIntroduction to Microservices
Introduction to Microservices
 
KEDA Overview
KEDA OverviewKEDA Overview
KEDA Overview
 
Docker 101: Introduction to Docker
Docker 101: Introduction to DockerDocker 101: Introduction to Docker
Docker 101: Introduction to Docker
 

Ähnlich wie AI & Machine Learning Pipelines with Knative

Containerized architectures for deep learning
Containerized architectures for deep learningContainerized architectures for deep learning
Containerized architectures for deep learningAntje Barth
 
Bodywork - GitOps for Machine Learning
Bodywork - GitOps for Machine LearningBodywork - GitOps for Machine Learning
Bodywork - GitOps for Machine LearningAlex Ioannides
 
S8277 - Introducing Krylov: AI Platform that Empowers eBay Data Science and E...
S8277 - Introducing Krylov: AI Platform that Empowers eBay Data Science and E...S8277 - Introducing Krylov: AI Platform that Empowers eBay Data Science and E...
S8277 - Introducing Krylov: AI Platform that Empowers eBay Data Science and E...Henry Saputra
 
ODSC East 2020 Accelerate ML Lifecycle with Kubernetes and Containerized Da...
ODSC East 2020   Accelerate ML Lifecycle with Kubernetes and Containerized Da...ODSC East 2020   Accelerate ML Lifecycle with Kubernetes and Containerized Da...
ODSC East 2020 Accelerate ML Lifecycle with Kubernetes and Containerized Da...Abhinav Joshi
 
Hybrid Cloud, Kubeflow and Tensorflow Extended [TFX]
Hybrid Cloud, Kubeflow and Tensorflow Extended [TFX]Hybrid Cloud, Kubeflow and Tensorflow Extended [TFX]
Hybrid Cloud, Kubeflow and Tensorflow Extended [TFX]Animesh Singh
 
.NET per la Data Science e oltre
.NET per la Data Science e oltre.NET per la Data Science e oltre
.NET per la Data Science e oltreMarco Parenzan
 
Distributed Tensorflow with Kubernetes - data2day - Jakob Karalus
Distributed Tensorflow with Kubernetes - data2day - Jakob KaralusDistributed Tensorflow with Kubernetes - data2day - Jakob Karalus
Distributed Tensorflow with Kubernetes - data2day - Jakob KaralusJakob Karalus
 
Scaling AI/ML with Containers and Kubernetes
Scaling AI/ML with Containers and Kubernetes Scaling AI/ML with Containers and Kubernetes
Scaling AI/ML with Containers and Kubernetes Tushar Katarki
 
Continuous Integration with Cloud Foundry Concourse and Docker on OpenPOWER
Continuous Integration with Cloud Foundry Concourse and Docker on OpenPOWERContinuous Integration with Cloud Foundry Concourse and Docker on OpenPOWER
Continuous Integration with Cloud Foundry Concourse and Docker on OpenPOWERIndrajit Poddar
 
Your easy move to serverless computing and radically simplified data processing
Your easy move to serverless computing and radically simplified data processingYour easy move to serverless computing and radically simplified data processing
Your easy move to serverless computing and radically simplified data processinggvernik
 
Oscon 2017: Build your own container-based system with the Moby project
Oscon 2017: Build your own container-based system with the Moby projectOscon 2017: Build your own container-based system with the Moby project
Oscon 2017: Build your own container-based system with the Moby projectPatrick Chanezon
 
Serverless brewbox
Serverless   brewboxServerless   brewbox
Serverless brewboxLino Telera
 
Democratizing machine learning on kubernetes
Democratizing machine learning on kubernetesDemocratizing machine learning on kubernetes
Democratizing machine learning on kubernetesDocker, Inc.
 
Webcast: DevOps in AWS is different! How can containers help?
Webcast: DevOps in AWS is different! How can containers help? Webcast: DevOps in AWS is different! How can containers help?
Webcast: DevOps in AWS is different! How can containers help? Applatix
 

Ähnlich wie AI & Machine Learning Pipelines with Knative (20)

Containerized architectures for deep learning
Containerized architectures for deep learningContainerized architectures for deep learning
Containerized architectures for deep learning
 
Bodywork - GitOps for Machine Learning
Bodywork - GitOps for Machine LearningBodywork - GitOps for Machine Learning
Bodywork - GitOps for Machine Learning
 
S8277 - Introducing Krylov: AI Platform that Empowers eBay Data Science and E...
S8277 - Introducing Krylov: AI Platform that Empowers eBay Data Science and E...S8277 - Introducing Krylov: AI Platform that Empowers eBay Data Science and E...
S8277 - Introducing Krylov: AI Platform that Empowers eBay Data Science and E...
 
ODSC East 2020 Accelerate ML Lifecycle with Kubernetes and Containerized Da...
ODSC East 2020   Accelerate ML Lifecycle with Kubernetes and Containerized Da...ODSC East 2020   Accelerate ML Lifecycle with Kubernetes and Containerized Da...
ODSC East 2020 Accelerate ML Lifecycle with Kubernetes and Containerized Da...
 
Hybrid Cloud, Kubeflow and Tensorflow Extended [TFX]
Hybrid Cloud, Kubeflow and Tensorflow Extended [TFX]Hybrid Cloud, Kubeflow and Tensorflow Extended [TFX]
Hybrid Cloud, Kubeflow and Tensorflow Extended [TFX]
 
DevOps demystified
DevOps demystifiedDevOps demystified
DevOps demystified
 
PyData Boston 2013
PyData Boston 2013PyData Boston 2013
PyData Boston 2013
 
Deploy your machine learning models to production with Kubernetes
Deploy your machine learning models to production with KubernetesDeploy your machine learning models to production with Kubernetes
Deploy your machine learning models to production with Kubernetes
 
.NET per la Data Science e oltre
.NET per la Data Science e oltre.NET per la Data Science e oltre
.NET per la Data Science e oltre
 
Distributed Tensorflow with Kubernetes - data2day - Jakob Karalus
Distributed Tensorflow with Kubernetes - data2day - Jakob KaralusDistributed Tensorflow with Kubernetes - data2day - Jakob Karalus
Distributed Tensorflow with Kubernetes - data2day - Jakob Karalus
 
Scaling AI/ML with Containers and Kubernetes
Scaling AI/ML with Containers and Kubernetes Scaling AI/ML with Containers and Kubernetes
Scaling AI/ML with Containers and Kubernetes
 
Continuous Integration with Cloud Foundry Concourse and Docker on OpenPOWER
Continuous Integration with Cloud Foundry Concourse and Docker on OpenPOWERContinuous Integration with Cloud Foundry Concourse and Docker on OpenPOWER
Continuous Integration with Cloud Foundry Concourse and Docker on OpenPOWER
 
Webinar kubernetes and-spark
Webinar  kubernetes and-sparkWebinar  kubernetes and-spark
Webinar kubernetes and-spark
 
Your easy move to serverless computing and radically simplified data processing
Your easy move to serverless computing and radically simplified data processingYour easy move to serverless computing and radically simplified data processing
Your easy move to serverless computing and radically simplified data processing
 
Oscon 2017: Build your own container-based system with the Moby project
Oscon 2017: Build your own container-based system with the Moby projectOscon 2017: Build your own container-based system with the Moby project
Oscon 2017: Build your own container-based system with the Moby project
 
MLOps in action
MLOps in actionMLOps in action
MLOps in action
 
MLOps with Kubeflow
MLOps with Kubeflow MLOps with Kubeflow
MLOps with Kubeflow
 
Serverless brewbox
Serverless   brewboxServerless   brewbox
Serverless brewbox
 
Democratizing machine learning on kubernetes
Democratizing machine learning on kubernetesDemocratizing machine learning on kubernetes
Democratizing machine learning on kubernetes
 
Webcast: DevOps in AWS is different! How can containers help?
Webcast: DevOps in AWS is different! How can containers help? Webcast: DevOps in AWS is different! How can containers help?
Webcast: DevOps in AWS is different! How can containers help?
 

Mehr von Animesh Singh

Machine Learning Exchange (MLX)
Machine Learning Exchange (MLX)Machine Learning Exchange (MLX)
Machine Learning Exchange (MLX)Animesh Singh
 
KFServing Payload Logging for Trusted AI
KFServing Payload Logging for Trusted AIKFServing Payload Logging for Trusted AI
KFServing Payload Logging for Trusted AIAnimesh Singh
 
KFServing and Kubeflow Pipelines
KFServing and Kubeflow PipelinesKFServing and Kubeflow Pipelines
KFServing and Kubeflow PipelinesAnimesh Singh
 
Kubeflow Distributed Training and HPO
Kubeflow Distributed Training and HPOKubeflow Distributed Training and HPO
Kubeflow Distributed Training and HPOAnimesh Singh
 
Kubeflow Pipelines (with Tekton)
Kubeflow Pipelines (with Tekton)Kubeflow Pipelines (with Tekton)
Kubeflow Pipelines (with Tekton)Animesh Singh
 
KFServing - Serverless Model Inferencing
KFServing - Serverless Model InferencingKFServing - Serverless Model Inferencing
KFServing - Serverless Model InferencingAnimesh Singh
 
End to end Machine Learning using Kubeflow - Build, Train, Deploy and Manage
End to end Machine Learning using Kubeflow - Build, Train, Deploy and ManageEnd to end Machine Learning using Kubeflow - Build, Train, Deploy and Manage
End to end Machine Learning using Kubeflow - Build, Train, Deploy and ManageAnimesh Singh
 
Defend against adversarial AI using Adversarial Robustness Toolbox
Defend against adversarial AI using Adversarial Robustness Toolbox Defend against adversarial AI using Adversarial Robustness Toolbox
Defend against adversarial AI using Adversarial Robustness Toolbox Animesh Singh
 
Advanced Model Inferencing leveraging Kubeflow Serving, KNative and Istio
Advanced Model Inferencing leveraging Kubeflow Serving, KNative and IstioAdvanced Model Inferencing leveraging Kubeflow Serving, KNative and Istio
Advanced Model Inferencing leveraging Kubeflow Serving, KNative and IstioAnimesh Singh
 
Trusted, Transparent and Fair AI using Open Source
Trusted, Transparent and Fair AI using Open SourceTrusted, Transparent and Fair AI using Open Source
Trusted, Transparent and Fair AI using Open SourceAnimesh Singh
 
AIF360 - Trusted and Fair AI
AIF360 - Trusted and Fair AIAIF360 - Trusted and Fair AI
AIF360 - Trusted and Fair AIAnimesh Singh
 
Fabric for Deep Learning
Fabric for Deep LearningFabric for Deep Learning
Fabric for Deep LearningAnimesh Singh
 
Microservices, Kubernetes and Istio - A Great Fit!
Microservices, Kubernetes and Istio - A Great Fit!Microservices, Kubernetes and Istio - A Great Fit!
Microservices, Kubernetes and Istio - A Great Fit!Animesh Singh
 
How to build a Distributed Serverless Polyglot Microservices IoT Platform us...
How to build a Distributed Serverless Polyglot Microservices IoT Platform us...How to build a Distributed Serverless Polyglot Microservices IoT Platform us...
How to build a Distributed Serverless Polyglot Microservices IoT Platform us...Animesh Singh
 
How to build an event-driven, polyglot serverless microservices framework on ...
How to build an event-driven, polyglot serverless microservices framework on ...How to build an event-driven, polyglot serverless microservices framework on ...
How to build an event-driven, polyglot serverless microservices framework on ...Animesh Singh
 
As a Service: Cloud Foundry on OpenStack - Lessons Learnt
As a Service: Cloud Foundry on OpenStack - Lessons LearntAs a Service: Cloud Foundry on OpenStack - Lessons Learnt
As a Service: Cloud Foundry on OpenStack - Lessons LearntAnimesh Singh
 
Introducing Cloud Native, Event Driven, Serverless, Micrsoservices Framework ...
Introducing Cloud Native, Event Driven, Serverless, Micrsoservices Framework ...Introducing Cloud Native, Event Driven, Serverless, Micrsoservices Framework ...
Introducing Cloud Native, Event Driven, Serverless, Micrsoservices Framework ...Animesh Singh
 
Finding and-organizing Great Cloud Foundry User Groups
Finding and-organizing Great Cloud Foundry User GroupsFinding and-organizing Great Cloud Foundry User Groups
Finding and-organizing Great Cloud Foundry User GroupsAnimesh Singh
 
CAPS: What's best for deploying and managing OpenStack? Chef vs. Ansible vs. ...
CAPS: What's best for deploying and managing OpenStack? Chef vs. Ansible vs. ...CAPS: What's best for deploying and managing OpenStack? Chef vs. Ansible vs. ...
CAPS: What's best for deploying and managing OpenStack? Chef vs. Ansible vs. ...Animesh Singh
 

Mehr von Animesh Singh (20)

Machine Learning Exchange (MLX)
Machine Learning Exchange (MLX)Machine Learning Exchange (MLX)
Machine Learning Exchange (MLX)
 
KFServing Payload Logging for Trusted AI
KFServing Payload Logging for Trusted AIKFServing Payload Logging for Trusted AI
KFServing Payload Logging for Trusted AI
 
KFServing and Kubeflow Pipelines
KFServing and Kubeflow PipelinesKFServing and Kubeflow Pipelines
KFServing and Kubeflow Pipelines
 
KFServing and Feast
KFServing and FeastKFServing and Feast
KFServing and Feast
 
Kubeflow Distributed Training and HPO
Kubeflow Distributed Training and HPOKubeflow Distributed Training and HPO
Kubeflow Distributed Training and HPO
 
Kubeflow Pipelines (with Tekton)
Kubeflow Pipelines (with Tekton)Kubeflow Pipelines (with Tekton)
Kubeflow Pipelines (with Tekton)
 
KFServing - Serverless Model Inferencing
KFServing - Serverless Model InferencingKFServing - Serverless Model Inferencing
KFServing - Serverless Model Inferencing
 
End to end Machine Learning using Kubeflow - Build, Train, Deploy and Manage
End to end Machine Learning using Kubeflow - Build, Train, Deploy and ManageEnd to end Machine Learning using Kubeflow - Build, Train, Deploy and Manage
End to end Machine Learning using Kubeflow - Build, Train, Deploy and Manage
 
Defend against adversarial AI using Adversarial Robustness Toolbox
Defend against adversarial AI using Adversarial Robustness Toolbox Defend against adversarial AI using Adversarial Robustness Toolbox
Defend against adversarial AI using Adversarial Robustness Toolbox
 
Advanced Model Inferencing leveraging Kubeflow Serving, KNative and Istio
Advanced Model Inferencing leveraging Kubeflow Serving, KNative and IstioAdvanced Model Inferencing leveraging Kubeflow Serving, KNative and Istio
Advanced Model Inferencing leveraging Kubeflow Serving, KNative and Istio
 
Trusted, Transparent and Fair AI using Open Source
Trusted, Transparent and Fair AI using Open SourceTrusted, Transparent and Fair AI using Open Source
Trusted, Transparent and Fair AI using Open Source
 
AIF360 - Trusted and Fair AI
AIF360 - Trusted and Fair AIAIF360 - Trusted and Fair AI
AIF360 - Trusted and Fair AI
 
Fabric for Deep Learning
Fabric for Deep LearningFabric for Deep Learning
Fabric for Deep Learning
 
Microservices, Kubernetes and Istio - A Great Fit!
Microservices, Kubernetes and Istio - A Great Fit!Microservices, Kubernetes and Istio - A Great Fit!
Microservices, Kubernetes and Istio - A Great Fit!
 
How to build a Distributed Serverless Polyglot Microservices IoT Platform us...
How to build a Distributed Serverless Polyglot Microservices IoT Platform us...How to build a Distributed Serverless Polyglot Microservices IoT Platform us...
How to build a Distributed Serverless Polyglot Microservices IoT Platform us...
 
How to build an event-driven, polyglot serverless microservices framework on ...
How to build an event-driven, polyglot serverless microservices framework on ...How to build an event-driven, polyglot serverless microservices framework on ...
How to build an event-driven, polyglot serverless microservices framework on ...
 
As a Service: Cloud Foundry on OpenStack - Lessons Learnt
As a Service: Cloud Foundry on OpenStack - Lessons LearntAs a Service: Cloud Foundry on OpenStack - Lessons Learnt
As a Service: Cloud Foundry on OpenStack - Lessons Learnt
 
Introducing Cloud Native, Event Driven, Serverless, Micrsoservices Framework ...
Introducing Cloud Native, Event Driven, Serverless, Micrsoservices Framework ...Introducing Cloud Native, Event Driven, Serverless, Micrsoservices Framework ...
Introducing Cloud Native, Event Driven, Serverless, Micrsoservices Framework ...
 
Finding and-organizing Great Cloud Foundry User Groups
Finding and-organizing Great Cloud Foundry User GroupsFinding and-organizing Great Cloud Foundry User Groups
Finding and-organizing Great Cloud Foundry User Groups
 
CAPS: What's best for deploying and managing OpenStack? Chef vs. Ansible vs. ...
CAPS: What's best for deploying and managing OpenStack? Chef vs. Ansible vs. ...CAPS: What's best for deploying and managing OpenStack? Chef vs. Ansible vs. ...
CAPS: What's best for deploying and managing OpenStack? Chef vs. Ansible vs. ...
 

Kürzlich hochgeladen

Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Boston Institute of Analytics
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxMike Bennett
 
IBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaIBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaManalVerma4
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max PrincetonTimothy Spann
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...Dr Arash Najmaei ( Phd., MBA, BSc)
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Seán Kennedy
 
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksdeepakthakur548787
 
What To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptxWhat To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptxSimranPal17
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024Susanna-Assunta Sansone
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBoston Institute of Analytics
 
Decoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectDecoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectBoston Institute of Analytics
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxaleedritatuxx
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Boston Institute of Analytics
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Cathrine Wilhelmsen
 
World Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdf
World Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdfWorld Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdf
World Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdfsimulationsindia
 
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxThe Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxTasha Penwell
 
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelDecoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelBoston Institute of Analytics
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our WorldEduminds Learning
 
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfEnglish-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfblazblazml
 

Kürzlich hochgeladen (20)

Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptx
 
IBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaIBEF report on the Insurance market in India
IBEF report on the Insurance market in India
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max Princeton
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
 
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing works
 
What To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptxWhat To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptx
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
 
Decoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectDecoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis Project
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)
 
World Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdf
World Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdfWorld Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdf
World Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdf
 
Data Analysis Project: Stroke Prediction
Data Analysis Project: Stroke PredictionData Analysis Project: Stroke Prediction
Data Analysis Project: Stroke Prediction
 
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxThe Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
 
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelDecoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our World
 
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfEnglish-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
 

AI & Machine Learning Pipelines with Knative

  • 2. ≈ Center for Open Source Data and AI Technologies (CODAIT) Code – Build and improve practical frameworks to enable more developers to realize immediate value. Content – Showcase solutions for complex and real-world AI problems. Community – Bring developers and data scientists to engage with IBM Gather Data Analyze Data Machine Learning Deep Learning Deploy Model Maintain Model Python Data Science Stack Fabric for Deep Learning (FfDL) Mleap + PFA Scikit-LearnPandas Apache Spark Apache Spark Jupyter Model Asset eXchange Keras + Tensorflow Improving Enterprise AI lifecycle in Open Source •  Team contributes to over 10 open source projects •  17 committers and many contributors in Apache projects •  Over 1100 JIRAs and 66,000 lines of code committed to Apache Spark itself; over 65,000 LoC into SystemML •  Over 25 product lines within IBM leveraging Apache Spark •  Speakers at over 100 conferences, meetups, unconferences and more CODAIT codait.org
  • 3. Agenda 3 •  Progress in ML and DL •  AI and Cloud: Complimentary •  Why we need Knative to build AI Platform •  How to build a transparent and trusted AI platform leveraging Knative •  Demo CODAIT codait.org
  • 4. 2011 IBM Watson Jeopardy 2017 AlphaGo Apple’s releases Siri 1997 … Facebook’s face recognition 2015 2016 Siri gets deep learning IBM Deep Blue chess AlexNet Progress in Deep Learning 2012 Introduced deep learning with GPUs 4 1997
  • 5. 2011 IBM Watson Jeopardy 2017 AlphaGo Apple’s releases Siri 1997 … Facebook’s face recognition 2015 2016 Siri gets deep learning IBM Deep Blue chess AlexNet Progress in Deep Learning 2012 Introduced deep learning with GPUs 5 2011
  • 6. 2011 IBM Watson Jeopardy 2017 AlphaGo Apple’s releases Siri 1997 … Facebook’s face recognition 2015 2016 Siri gets deep learning IBM Deep Blue chess AlexNet Progress in Deep Learning 2012 Introduced deep learning with GPUs 6 2017
  • 7. 2011 IBM Watson Jeopardy 2017 AlphaGo Apple’s releases Siri 1997 … Facebook’s face recognition 2015 2016 Siri gets deep learning IBM Deep Blue chess AlexNet Progress in Deep Learning 2012 Introduced deep learning with GPUs 7 2017 2018
  • 8. 2011 IBM Watson Jeopardy 2017 AlphaGo Apple’s releases Siri 1997 … Facebook’s face recognition 2015 2016 Siri gets deep learning IBM Deep Blue chess AlexNet Progress in Deep Learning 2012 Introduced deep learning with GPUs IBM Cloud / Watson and Cloud Platform / © 2018 IBM Corporation 8
  • 9. A human brain has: •  200 billion neurons •  32 trillion connections between them •  25 million “neurons” •  100 million connections (parameters) Deep Learning = Training Artificial Neural Networks IBM Cloud / Watson and Cloud Platform / © 2018 IBM Corporation 9
  • 10. Neural Network Design Workflow! domain data design neural network HPO •  neural network structure •  hyperparameters NO Performance meets needs? Start another experiment optimal hyperparameters
  • 11. Neural Network Design Workflow! domain data HPO •  neural network structure •  hyperparameters NO yes Performance meets needs? Start another experiment trained model deployCloud optimal hyperparameters evaluate BAD Still good! design neural network
  • 12. So AI in general and Deep Learning in particular are very iterative and repetitive. And they need Cloud. Why?
  • 15. 15 ! ! 1.  Model/Data Parallelism! ! 2.  MPI! 3.  NCCL! 4.  …..! ! Ability to utilize various technologies and achieve high performance computing.
  • 17. API! UI! CLI! Kubernetes ! ! Master! Worker Node 1! Worker Node 2! Worker Node 3! Worker Node n! Registry •  Etcd •  API Server •  Controller Manager Server •  Scheduler Server And Cloud native means we use Kubernetes!!
  • 18. 18 But is Kubernetes enough for Cloud Native platform?!
  • 19. 19 Kubernetes is not the end game – Says who?!
  • 20. 20 What else do we need? Let`s understand it from the context of an AI Lifecycle! AI PLATFORM
  • 21. AIOps Prepared and Analyzed Data AI PLATFORM Initial Model Trained Model Deployed Model We need a Cloud native AI Platform to build, train, deploy and monitor Models!
  • 22. AIOps Prepared and Analyzed Data AI PLATFORM Trained Model Deployed Model Also, we need a transparent and trusted AI Platform! Prepared and Analyzed Data AI PLATFORM Initial Model Deployed Model Is model training is showing increasing loss? Are model weights biased? Is the model vulnerable to adversarial attacks? Has the training data changed? Are the model predictions less accurate? Are Hyperparameters suboptimal? Are predictions biased? Is the dataset biased?
  • 23. AIOps Prepared and Analyzed Data Trained Model Deployed Model Infact, we need a transparent, trusted and automated AI Pipeline! Prepared and Analyzed Data Initial Model Deployed Model Is model training is showing increasing loss? Are model weights biased? Is the model vulnerable to adversarial attacks? Has the training data changed? Are ate model predictions less accurate? Are Hyperparameters suboptimal? Are predictions biased? Is the dataset biased?
  • 24. Trigger: Trained Model is showing increasing loss Prepare Data AIOps Prepared and Analyzed Data Initial Model Trained Model Deployed Model Train Deploy Harden Train Deploy Trigger: Model performance is suboptimal Implement Defense Train Deploy Trigger: Model is vulnerable to attack Trigger: Data changed Optimize Hyperparameters Train Trigger: Bias Detected Actions Transparent, trusted, automated and event driven AI Pipeline!
  • 25. Trigger: Trained Model is showing increasing loss Prepare Data AIOps Prepared and Analyzed Data Initial Model Trained Model Deployed Model Train Deploy Harden Train Deploy Trigger: Model performance is suboptimal Implement Defense Train Deploy Trigger: Model is vulnerable to attack Trigger: Data changed Optimize Hyperparameters Train Trigger: Bias Detected Actions Transparent, trusted, automated, event driven and auditable AI Pipeline!
  • 26. Trigger: Trained Model is showing increasing loss Prepare Data AIOps Prepared and Analyzed Data Initial Model Trained Model Deployed Model Train Deploy Debias Train Deploy Trigger: Model performance is biased Implement Defense Train Deploy Trigger: Model is vulnerable to attack Trigger: Data changed Optimize Hyperparameters Train Trigger: Bias Detected Data Scientist AIOps Engineer Actions Transparent, trusted, automated, event driven, auditable AI Pipeline as a Service!
  • 27. 
 Training Pipe Model Validation Pipe If we translate it to logical architecture, it looks like this…! 
 
 Data Pipe Model Deployment Pipe Deployment Analysis Pipe AI Pipeline (Python Definition – Orchestrate and Track) AI Developer and Data Scientist AI Operator 
 Python Function 
 Python Function 
 Python Function 
 Python Function 
 Python Function
  • 28. So we need more than Kubernetes. We need to be able to……! build Data Scientists Code orchestrate the ML code automate the ML workflow send event notifications Data Scientist AIOps Engineer
  • 29. That means, we need the concepts of..! Build Serving Pipeline Eventing
  • 30. So……We need KNative! Build Serving Pipeline Eventing
  • 31. ● Build ● Eventing ● Serving ● Pipeline KNative provides a set of building blocks that enable modern, source-centric and container-based serverless workloads on Kubernetes. Uses K8S CRDs to define a new set of APIs around Well….what is Knative?! Build Serving Pipeline Eventing
  • 32. Knative Build Components •  Build •  Builder •  BuildTemplate For example, you can write a build that uses Kubernetes-native resources to obtain your source code from a repository, build a container image, then run that image. •  A Build can include multiple steps where each step specifies a Builder. •  A Builder is a type of container image that you create to accomplish any task, whether that's a single step in a process, or the whole process itself. •  The steps in a Build can push to a repository. •  A BuildTemplate can be used to defined reusable parameterized templates. Knative Build! Build — Source-to-container build orchestration
  • 33. KNative Eventing components ●  Sources — Sources that are firing events ●  Channels — A single event forwarding and persistence layer ●  Subscriptions — Deliver and forward events to channels/services. Knative Eventing is designed around the following goals: ●  Knative Eventing services are loosely coupled. ●  Event producers and event sources are independent. ●  Other services can be connected to the Eventing system.. ●  Ensure cross-service interoperability. Knative Eventing is consistent with the CloudEvents specification that is developed by the CNCF Serverless WG. Knative Eventing! Eventing — Delivery and management of events, universal subscription, binding services to event ecosystems,
  • 34. Knative Pipeline! Knative Pipeline components ● Task — A collection of sequential steps you would want to run as part of your CI flow. It including the inputs/ outputs and steps ● Pipeline — A graph of Tasks to execute. ● Runs — To invoke a Pipeline or a Task. High level details of this design: •  Pipelines do not know what will trigger them, they can be triggered by events or by manually creating PipelineRuns •  Tasks can exist and be invoked completely independently of Pipelines •  Test results are a first class concept, being able to navigate test results easily is powerful •  Tasks can depend on artifacts, output and parameters created by other tasks. •  Resources are the artifacts used as inputs and outputs of TaskRuns. Pipeline— Configure and run CI/CD style pipelines for your kubernetes application
  • 35. ●  Configuration ○  Desired current state of deployment (#HEAD) ○  Records both code and configuration (separated, ala 12 factor) ○  Stamps out builds / revisions as it is updated ●  Revision ○  Code and configuration snapshot ○  k8s infra: Deployment, ReplicaSet, Pods, etc ●  Route ○  Traffic assignment to Revisions (fractional scaling or by name) ○  Built using Istio ●  Service ○  Provides a simple entry point for UI and CLI tooling to achieve common behavior ○  Acts as a top-level controller to orchestrate Route and Configuration. Configuration Revision Revision Revision Route latest explicit creates Service creates creates Knative Serving! Serving — Request-driven compute model, scale to zero, autoscaling, routing and managing traffic Knative Serving components
  • 38. Worker Node Worker Node Deployment Replica Set Revision Serving Knative Serving Building Block!
  • 39. Worker Node Pod Worker Node Deployment Replica Set Revision Serving Knative Serving Building Block!
  • 40. Worker Node Pod Worker Node Deployment Replica Set Revision Pod Serving Knative Serving Building Block!
  • 41. Worker Node Worker Node Deployment Revision Replica Set Serving Knative Serving Building Block!
  • 42. Revision 1 Revision 2 Revision 3 Revision 4 Serving Knative Serving Building Block!
  • 43. Configuration Revision 1 Revision 2 Revision 3 Revision 4 Serving Knative Serving Building Block!
  • 44. Configuration Revision 1 Revision 2 Revision 3 Revision 4 Serving Knative Serving Building Block!
  • 45. Configuration Revision 1 Revision 2 Revision 3 Revision 4 Revision 5 Serving Knative Serving Building Block!
  • 46. Configuration Revision 1 Revision 2 Revision 3 Revision 4 Revision 5 Route Serving Knative Serving Building Block!
  • 47. Configuration Revision 1 Revision 2 Revision 3 Revision 4 Revision 5 Route Serving Knative Serving Building Block!
  • 48. Configuration Revision 1 Revision 2 Revision 3 Revision 4 Revision 5 Route 5% 95% Serving Knative Serving Building Block!
  • 52. ● Build — Source-to-container build orchestration ● Eventing — Universal subscription, binding services to event ecosystems, delivery and management of events ● Serving — Request-driven compute model, scale to zero, autoscaling, routing and managing traffic ● Pipeline — Configure and run CI/CD style pipelines for your kubernetes application. . So Knative uses K8S CRDs to define a new set of APIs ! Build Serving Pipeline Eventing
  • 53. AIOps Prepared and Analyzed Data Trained Model Deployed Model But do we really need something like Knative to build an ML Platform?! Prepared and Analyzed Data Initial Model Deployed Model
  • 54. AIOps Prepared and Analyzed Data Trained Model Deployed Model Let`s go through different phases of AI Lifecycle through our use cases…! Many tools available to build initial models! Prepared and Analyzed Data Initial Model Deployed Model
  • 55. AIOps Prepared and Analyzed Data Trained Model Deployed Model Many tools to train machine learning and deep learning models! Prepared and Analyzed Data Initial Model Deployed Model
  • 56. AIOps Prepared and Analyzed Data Trained Model Deployed Model We need a multi framework ML- DL cloud native platform ! Prepared and Analyzed Data Initial Model Deployed Model e.g. Tensorflow is awesome , but has static graphs so PyTorch’s dynamic graphs are becoming more popular. How do we handle multiple Open Source Deep Learning frameworks in a consistent way. and leverage the power of cloud for distributed computing?
  • 57. AIOps Prepared and Analyzed Data Trained Model Deployed Model Enter: Fabric for Deep Learning! Prepared and Analyzed Data Initial Model Deployed Model FfDL
  • 58. Fabric for Deep Learning https://github.com/IBM/FfDL FfDL Github Page https://github.com/IBM/FfDL FfDL dwOpen Page https://developer.ibm.com/code/open/projects/ fabric-for-deep-learning-ffdl/ FfDL Announcement Blog http://developer.ibm.com/code/2018/03/20/ fabric-for-deep-learning FfDL Technical Architecture Blog http://developer.ibm.com/code/2018/03/20/ democratize-ai-with-fabric-for-deep-learning Deep Learning as a Service within Watson Studio https://www.ibm.com/cloud/deep-learning Research paper: “Scalable Multi-Framework Management of Deep Learning Training Jobs” http://learningsys.org/nips17/assets/papers/ paper_29.pdf •  Fabric for Deep Learning or FfDL (pronounced as ‘fiddle’ aims at making Deep Learning easily accessible to Data Scientists, and AI developers. •  FfDL Provides a consistent way to train and visualize Deep Learning jobs across multiple frameworks like TensorFlow, Caffe, PyTorch, Keras etc. FfDL 58 Community Partners FfDL is one of InfoWorld’s 2018 Best of Open Source Software Award winners for machine learning and deep learning!
  • 59. Fabric for Deep Learning https://github.com/IBM/FfDL FfDL is built using Microservices architecture on Kubernetes •  FfDL platform uses a microservices architecture to offer resilience, scalability, multi-tenancy, and security without modifying the deep learning frameworks, and with no or minimal changes to model code. •  FfDL control plane microservices are deployed as pods on Kubernetes to manage this cluster of GPU- and CPU- enabled machines effectively •  Tested Platforms: Kube DIND, IBM Cloud Public, IBM Cloud Private, GPUs using both Kubernetes feature gate Accelerators and NVidia device plugins 59
  • 60. source code training definition Access to elastic compute leveraging Kubernetes Auto-allocation means infrastructure is used only when needed Kubernetes container training artifacts compute cluster NVIDIA Tesla K80, P100, V100 Cloud Object Storage Training assets are managed and tracked. IBM Cloud / Watson and Cloud Platform / © 2018 IBM Corporation 60
  • 61. NVIDIA GPUs Kubernetes container orchestration training runs containers Model training distributed across containers server cluster dataset Cloud Object Storage 61
  • 62. OBJECT STORAGE Model Definition 
 Training Data Trained Models REST API Parameter Server Lifecycle Manager Job Monitor Training Data Mongo DB Trainer Service Launch Job Status Job Info " Prometheus Push Gateway Alert Manager Web UI Learner (e.g. TensorFlow, Caffe, PyTorch, Keras etc.) Controller Learner Pod Log Collector Training Job MOUNT OBJECT STORAGE BUCKET Elastic Searc h FfDL: Architecture - Current Release! Open MPI CLIs Browser
  • 63. AIOps Prepared and Analyzed Data Trained Model Deployed Model Great , but we also need a batch scheduler for AI Workloads. Enter Kube-Batch! Prepared and Analyzed Data Initial Model Deployed Model kube-batch
  • 64. OBJECT STORAGE Model Definition 
 Training Data Trained Models REST API Parameter Server Job Monitor Mongo DB Trainer Service Launch Job Job Info " Prometheus Push Gateway Alert Manager Web UI Learner (e.g. TensorFlow, Caffe, PyTorch, Keras etc.) Controller Learner Pod Log Collector Training Job MOUNT OBJECT STORAGE BUCKET Elastic Searc h Open MPI CLIs Browser Scheduling Engine Kube- Batch QueueJob Advanced Batch Scheduling with Kube-Batch !Advanced Batch Scheduling with Kube-Batch !
  • 65. ! ! ! ! ! ! ! ! ! ! ! ! Kube-Batch: Filling gaps for FfDL and AI Workloads! •  Job Queuing for enabling job priority and preemption •  Holistic scheduling •  Support for job dynamic prioritization/budgeting •  User requested time based scheduling •  Preemption/resumption in a supported checkpoint/ restart env •  Topology aware placement (data intensive workloads requiring network topology aware placement) •  Partial preemption of elastic jobs •  Performance estimation •  Kube-Batch –  Kubernetes incubator project –  Introduces batch scheduling in Kubernetes –  Support for simple job definition and lifecycle management –  Support for job queuing –  Introduces queue-based quota management
  • 66. 12 Watson services/apps represented as 800+ Kubernetes services IBM Watson workloads: Proven AI workload on IBM Cloud Kubernetes Service One deployment example: 3000+ pods on 500+ nodes “We no longer worry about managing the infrastructure because IBM Cloud Kubernetes Service takes care of that for us.” – Watson Project Team Jason McGee / © 2018 IBM Corporation
  • 67. AIOps Prepared and Analyzed Data Trained Model Deployed Model Training is accomplished. Model is ready – Can we trust it?! Prepared and Analyzed Data Initial Model Deployed Model Can the model be trusted?
  • 68. Is it fair? Is it easy to understand? Did anyone tamper with it? Is it accountable? #21, #32, #93 #21, #32, #93 What does it take to trust a decision made by a machine?! (Other than that it is 99% accurate)?!
  • 69. AIOps Prepared and Analyzed Data Trained Model Deployed Model So let`s start with vulnerability detection of Models?! Prepared and Analyzed Data Initial Model Deployed Model Is the model vulnerable to adversarial attacks?
  • 70. AIOps Prepared and Analyzed Data Trained Model Deployed Model Enter: Adversarial Robustness Toolbox! Prepared and Analyzed Data Initial Model Deployed Model ART
  • 71. IBM Adversarial Robustness Toolbox ART ART is a library dedicated to adversarial machine learning. Its purpose is to allow rapid crafting and analysis of attack and defense methods for machine learning models. The Adversarial Robustness Toolbox provides an implementation for many state-of-the-art methods for attacking and defending classifiers. 71 https://github.com/IBM/adversarial-robustness- toolbox The Adversarial Robustness Toolbox contains implementations of the following attacks: Deep Fool (Moosavi-Dezfooli et al., 2015) Fast Gradient Method (Goodfellow et al., 2014) Jacobian Saliency Map (Papernot et al., 2016) Universal Perturbation (Moosavi-Dezfooli et al., 2016) Virtual Adversarial Method (Moosavi-Dezfooli et al., 2015) C&W Attack (Carlini and Wagner, 2016) NewtonFool (Jang et al., 2017) The following defense methods are also supported: Feature squeezing (Xu et al., 2017) Spatial smoothing (Xu et al., 2017) Label smoothing (Warde-Farley and Goodfellow, 2016) Adversarial training (Szegedy et al., 2013) Virtual adversarial training (Miyato et al., 2017)
  • 72. AIOps Prepared and Analyzed Data Trained Model Deployed Model Robustness check accomplished. How do we check for bias throughout lifecycle?! Prepared and Analyzed Data Initial Model Deployed Model Are model weights and classifiers biased? Are predictions biased? Is the dataset biased?
  • 73. AIOps Prepared and Analyzed Data Trained Model Deployed Model Enter: AI Fairness 360! Prepared and Analyzed Data Initial Model Deployed Model AIF360
  • 74. AI Fairness 360 https://github.com/IBM/AIF360 AIF360AIF360 toolkit is an open-source library to help detect and remove bias in machine learning models. The AI Fairness 360 Python package includes a comprehensive set of metrics for datasets and models to test for biases, explanations for these metrics, and algorithms to mitigate bias in datasets and models. Toolbox Fairness metrics (30+) Fairness metric explanations Bias mitigation algorithms (9+) 74 Supported bias mitigation algorithms Optimized Preprocessing (Calmon et al., 2017) Disparate Impact Remover (Feldman et al., 2015) Equalized Odds Postprocessing (Hardt et al., 2016) Reweighing (Kamiran and Calders, 2012) Reject Option Classification (Kamiran et al., 2012) Prejudice Remover Regularizer (Kamishima et al., 2012) Calibrated Equalized Odds Postprocessing (Pleiss et al., 2017) Learning Fair Representations (Zemel et al., 2013) Adversarial Debiasing (Zhang et al., 2018) Supported fairness metrics Comprehensive set of group fairness metrics derived from selection rates and error rates Comprehensive set of sample distortion metrics Generalized Entropy Index (Speicher et al., 2018)
  • 75. (d’Alessandro et al., 2017) AIF 360 detects for fairness in building and deploying models throughout AI Lifecycle!
  • 77. AIOps Prepared and Analyzed Data Trained Model Deployed Model Model is trained, tested and validated. Just deploy as microservices on Kubernetes? Do we need anything else?! Prepared and Analyzed Data Initial Model Deployed Model
  • 78. AIOps Prepared and Analyzed Data Trained Model Deployed Model Model is trained, tested and validated. Just deploy as microservices on Kubernetes? Do we need anything else?! Prepared and Analyzed Data Initial Model Deployed Model q  As I develop multiple versions of my models, how can I easily dark launch and shift traffic? q  How can I add and enforce policies on my model microservices? q  The network among microservices is not reliable. q  How can I monitor and trace my model microservices? q  How can I ensure the communication among microservices is secure?
  • 80. An open service mesh platform to connect, observe, secure, and control microservices. Istio!
  • 81. Connect: Traffic Control, Discovery, Load Balancing, Resiliency Observe: Metrics, Logging, Tracing Secure: Encryption (TLS), Authentication, and Authorization of service-to-service communication Control: Policy Enforcement Istio!
  • 82. BA call How does it work?
 !
  • 83. 1. Deploy a proxy (Envoy) beside your application (“sidecar deployment”) Env oy A Envoy Env oy B Envoy call How does it work?
 !
  • 84. 2. Deploy Pilot to configure the sidecars Envoy Pilot config Env oy A Envoy Env oy B Envoy How does it work?
 !
  • 85. 3. Deploy Telemetry to get telemetry Envoy Pilot Env oy A Envoy Env oy B Envoy Envoy Telemetry telemetry How does it work?
 !
  • 86. 4. Deploy Citadel to assign identities and enable secure communication Envoy Pilot Env oy A Envoy Env oy B Envoy Envoy Telemetry Envoy Citadel How does it work?
 !
  • 87. 5. Deploy Policy to enforce policies Envoy Pilot policy decisions Envoy A Envoy Envoy B Envoy Envoy Policy Envoy Telemetry Envoy Citadel How does it work?
 !
  • 92. And….Istio is part of Knative!!!!
  • 93. AIOps Model is deployed. We need to ensure triggers and actions can be defined based on any vulnerability detection, dataset changes, bias detection etc… ! Prepared and Analyzed Data Trained Model Deployed Model Prepared and Analyzed Data Initial Model Deployed Model
  • 94. AIOps Enter Apache OpenWhisk! Prepared and Analyzed Data Trained Model Deployed Model Prepared and Analyzed Data Initial Model Deployed Model
  • 95. Triggers (response) Rules Actions (code) Source (events) Results OpenWhisk Apache OpenWhisk! FaaS platform to execute code in response to events! Delivered as
 Open source via Apache openwhisk.org
  • 97. Event Provider Periodic IBM Cloudant Message Hub Mobile Push Github OpenWhisk IBM App Connect OpenWhisk!
  • 98. 98 And….OpenWhisk is being developed to work with Knative!!!!
  • 99. AIOps And last but not the least, what do we use for Pipeline orchestration?! Prepared and Analyzed Data Trained Model Deployed Model Prepared and Analyzed Data Initial Model Deployed Model
  • 100. 100 •  Currently in IBM we use IBM DevOps toolchain pipeline, built on top of Jenkins •  On open source, we are investigating, Pachyderm, Argo and Airflow. •  Pachyderm and Argo are Dataflow and Workflow orchestrators on Kubernetes •  Both have strong merits, and provide a DevOps Operator view around pipeline execution times, logs, metrics, exit handlers, notification systems etc. •  As discussed, Knative is now evolving its own Pipelining capabilities, though in early stage AI Pipeline Orchestration!
  • 101. AIOps Prepared and Analyzed Data Trained Model Deployed Model AIOps platform is ready. How do Data Scientists interact with it using Notebooks?! Prepared and Analyzed Data Initial Model Deployed Model Jupyter Enterprise Gateway
  • 102. Jupyter Enterprise Gateway March 30 2018 / © 2018 IBM Corporation Jupyter Enterprise Gateway at IBM Code https://developer.ibm.com/code/openprojects/jupyter-enterprise-gateway/ Jupyter Enterprise Gateway source code at GitHub https://github.com/jupyter-incubator/enterprise_gateway Jupyter Enterprise Gateway Documentation http://jupyter-enterprise-gateway.readthedocs.io/en/latest/ 102 A lightweight, multi-tenant, scalable and secure gateway that enables Jupyter Notebooks to share resources across an Apache Spark or Kubernetes cluster for Enterprise/Cloud use cases Kernel Kernel Kernel Kernel Kernel KernelKernel
  • 103. Jupyter Notebooks: Support for FfDL User Experience 103 •  User select where to run the experiment •  Job is packaged and submitted on behalf of user •  User has access to Job Console to monitor experiment
  • 104. AIOps Prepared and Analyzed Data Trained Model Deployed Model Together: A Transparent, and trusted Open Source AI Pipeline using Knative Prepared and Analyzed Data Initial Model Deployed Model FfDL kube-batch Jupyter Enterprise Gateway MAX AIF360 AIF360 Istio OpenWhisk ART
  • 106. Demo Model: Gender Classification! Data UTKFace Simple convolutional neural network (CNN) with 3 convolutional layers and 2 fully connected layers Male Female SoftMax Confident Score
  • 107. Demo Flow! Fabric for Deep Learning Data Robustness Check Model Deployment ADVERSARIAL ROBUSTNESS TOOLBOX Fairness Check AI Fairness 360 Model Revision 1 10% 90% Model Revision 2 Gender Classification Model Source to Container: Knative Build ML Code Training
  • 108. To fully automate the workflow, Knative Pipeline can be used with Knative Eventing! Fabric for Deep Learning Data Robustness Check Model Deployment ADVERSARIAL ROBUSTNESS TOOLBOX Fairness Check AI Fairness 360 Model Revision 1 10% 90% Model Revision 2 Gender Classification Model Source to Container: Knative Build ML Code Training Eventing
  • 109. Knative Eventing Sources, Channel and Subscription can be defined!
  • 110. Knative Pipeline Tasks and Flow Definition can be defined!
  • 111. For this demonstration, we used Argo Workflows for Pipelining!
  • 112. •  Knative brings serveless actions, events, workflows, apps, containers, service-mesh in one single stack •  Knative eliminates the need for stitching various individual components together (events, actions, service mesh, pipeline etc.) •  Knative Build and Serving are simple to use and integrate with our existing containers. •  Traffic Routing and Canary testing on Knative can be done without any Istio knowledge. •  Built-in monitoring tools are very helpful to visualize the benefits of building services using Knative. •  Knative Eventing shows promise, needs to evolve more for production •  Knative Build-Pipeline is still in very early stage, and changing rapidly. •  Knative Build-Pipeline could get very complicated with many tasks and it needs a better debugging tool/GUI to monitor errors and visualize the pipeline flow. What we learnt!
  • 113. AIOps Prepared and Analyzed Data Trained Model Deployed Model Bringing it together: A Secure, transparent, and trusted Open Source AI Pipeline Prepared and Analyzed Data Initial Model Deployed Model FfDL kube-batch Jupyter Enterprise Gateway MAX AIF360 AIF360 Istio OpenWhisk ART
  • 114. Links:! •  Knative: https://github.com/knative •  Fabric for Deep Learning: https://github.com/IBM/FfDL •  Adversarial Robustness Toolbox: https://github.com/IBM/adversarial-robustness-toolbox •  AI Fairness 360: http://aif360.mybluemix.net/ •  Jupyter Enterprise Gateway: https://github.com/jupyter/enterprise_gateway •  Kube-Batch: https://github.com/kubernetes-sigs/kube-batch •  Istio: https://github.com/istio/istio •  OpenWhisk: https://github.com/apache/incubator-openwhisk •  Kiali: https://github.com/kiali/kiali •  Argo: https://github.com/argoproj/argo •  PyTorch: https://pytorch.org/