INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION SUMMARY BREAK DEMO/WORKSHOP
Distributed Deep Learning Using Hopsworks
EIT Big Data Summer School
Kim Hammar
kim@logicalclocks.com
DISTRIBUTED COMPUTING + DEEP LEARNING = ?
[Figure: a neural network diagram bridging "Distributed Computing" and "Deep Learning"]
"A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable."
- Leslie Lamport

"A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E."
- Tom Mitchell
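Mitchell's definition can be made concrete with a toy learner (a hypothetical example, not from the talk): the task T is labeling integers as even or odd, the experience E is the set of labeled examples seen so far, and the performance P is accuracy on a fixed test set.

```python
# Toy illustration of Mitchell's (T, E, P) definition (illustrative only).

def make_learner():
    memory = {}                      # experience E accumulates here
    def learn(x, y):
        memory[x] = y
    def predict(x):
        return memory.get(x, 0)      # default guess when inexperienced
    return learn, predict

def accuracy(predict, test_set):
    # performance measure P: fraction of correct predictions on task T
    return sum(predict(x) == y for x, y in test_set) / len(test_set)

test_set = [(x, x % 2) for x in range(10)]   # task T: even -> 0, odd -> 1
learn, predict = make_learner()

acc_before = accuracy(predict, test_set)     # little experience: P = 0.5
for x, y in test_set:                        # gain experience E
    learn(x, y)
acc_after = accuracy(predict, test_set)      # P improved with E
```

Performance P at task T, as measured on the test set, improves from 0.5 to 1.0 as experience E grows, matching the definition.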
Why Combine the two?
DISTRIBUTED COMPUTING IN 2 MINUTES
- Distributed Consensus: [Figure: processes p1, p2, p3 exchanging commit and ack messages]
- Peer-to-Peer Systems: [Figure: an overlay network built on top of the physical network]
- Database Systems: [Figure: the CAP triangle, C, A, P]
- Data-Flow Processing: [Figure: pointer/memory layout, e.g. 0x10ba8, 0xf7fc4380]
- Non-Synchronized Clocks: [Figure: processes p1, p2, p3 with events stamped 1, 2, 3; logical time: 4]
- Big Data
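The non-synchronized-clocks point is usually handled with Lamport's logical clocks. A minimal sketch, assuming the standard rules (increment on local events and sends; on receive, jump to max of local and message timestamp, plus one):

```python
# Minimal Lamport logical clock sketch (illustrative, not Hopsworks code).

class Process:
    def __init__(self, name):
        self.name = name
        self.clock = 0

    def local_event(self):
        self.clock += 1
        return self.clock

    def send(self):
        self.clock += 1
        return self.clock          # timestamp carried by the message

    def receive(self, msg_time):
        # Merge rule: jump past both the local clock and the message's clock.
        self.clock = max(self.clock, msg_time) + 1
        return self.clock

p1, p2 = Process("p1"), Process("p2")
p1.local_event()        # p1: 1
t = p1.send()           # p1: 2, message timestamped 2
p2.local_event()        # p2: 1
p2.receive(t)           # p2: max(1, 2) + 1 = 3
```

This gives the partial "happened-before" ordering shown in the figure: the receive at p2 is stamped later than the send at p1 even though the two machines share no physical clock.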
DEEP LEARNING IN 2 MINUTES
Non-Convex Iterative Optimization
[Figure: features x = (x1, ..., xn) feed a model θ; the model outputs a prediction ŷ, a loss L(y, ŷ) is computed, and its gradient ∇θ L(y, ŷ) is used to update the model]
Representation Learning
[Figure: a stack of CONV layers]
HPC Tools & Hardware
Black-Box Optimization
[Figure: an optimizer sends an input x to a black-box function, observes the output f̂(x), and proposes a new input x]
Automatic Differentiation
[Figure: computational graph: x is multiplied by W0 and b0 is added, σ is applied; the result is multiplied by W1 and b1 is added, σ is applied, producing the error E]
The chain rule propagates the error backwards through the graph:
∂E/∂W0 = (∂E/∂z0)(∂z0/∂W0),  ∂E/∂z0 = (∂E/∂h0)(∂h0/∂z0)
∂E/∂b0 = (∂E/∂h0)(∂h0/∂b0),  ∂E/∂h0 = (∂E/∂a0)(∂a0/∂h0)
∂E/∂W1 = (∂E/∂z1)(∂z1/∂W1),  ∂E/∂a0 = (∂E/∂z1)(∂z1/∂a0)
∂E/∂z1 = (∂E/∂h1)(∂h1/∂z1),  ∂E/∂b1 = (∂E/∂h1)(∂h1/∂b1)
∂E/∂a1 = 1 · (∂E/∂ŷ)
Big Data
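The chain-rule computation above can be checked numerically with a scalar sketch. Sigmoid activations, a squared-error loss, and all variable names here are illustrative assumptions, not the slide's exact network:

```python
import math

# Scalar two-layer network: z0 = w0*x + b0, h0 = sigmoid(z0),
# z1 = w1*h0 + b1, y_hat = sigmoid(z1), E = 0.5*(y_hat - y)^2.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w0, b0, w1, b1):
    z0 = w0 * x + b0
    h0 = sigmoid(z0)
    z1 = w1 * h0 + b1
    y_hat = sigmoid(z1)
    return z0, h0, z1, y_hat

def grad_w0(x, y, w0, b0, w1, b1):
    z0, h0, z1, y_hat = forward(x, w0, b0, w1, b1)
    dE_dyhat = y_hat - y                       # dE/dy_hat
    dE_dz1 = dE_dyhat * y_hat * (1 - y_hat)    # chain rule through sigma
    dE_dh0 = dE_dz1 * w1                       # dE/dh0 = dE/dz1 * dz1/dh0
    dE_dz0 = dE_dh0 * h0 * (1 - h0)            # dE/dz0 = dE/dh0 * dh0/dz0
    return dE_dz0 * x                          # dE/dw0 = dE/dz0 * dz0/dw0

# Check against a central finite-difference approximation.
x, y, w0, b0, w1, b1 = 0.5, 1.0, 0.3, 0.1, -0.7, 0.2
eps = 1e-6

def loss(w0_):
    *_, y_hat = forward(x, w0_, b0, w1, b1)
    return 0.5 * (y_hat - y) ** 2

numeric = (loss(w0 + eps) - loss(w0 - eps)) / (2 * eps)
analytic = grad_w0(x, y, w0, b0, w1, b1)
```

The analytic gradient from the backward pass agrees with the numerical derivative to high precision, which is exactly what automatic differentiation frameworks guarantee at scale.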
DISTRIBUTED COMPUTING + DEEP LEARNING = ?
[Figure: a neural network diagram bridging "Distributed Computing" and "Deep Learning"]
Why combine the two?
- We like challenging problems
- More productive data science
- Unreasonable effectiveness of data [1]
- To achieve state-of-the-art results [2]

[1] Chen Sun et al. "Revisiting Unreasonable Effectiveness of Data in Deep Learning Era". In: CoRR abs/1707.02968 (2017). arXiv: 1707.02968. URL: http://arxiv.org/abs/1707.02968.
[2] Jeffrey Dean et al. "Large Scale Distributed Deep Networks". In: Advances in Neural Information Processing Systems 25. Ed. by F. Pereira et al. Curran Associates, Inc., 2012, pp. 1223-1231.
DISTRIBUTED DEEP LEARNING (DDL): PREDICTABLE SCALING [3]

[3] Jeff Dean. Building Intelligent Systems with Large Scale Deep Learning. https://www.scribd.com/document/355752799/Jeff-Dean-s-Lecture-for-YC-AI. 2018.
DDL IS NOT A SECRET ANYMORE [4]

[4] Tal Ben-Nun and Torsten Hoefler. "Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis". In: CoRR abs/1802.09941 (2018). arXiv: 1802.09941. URL: http://arxiv.org/abs/1802.09941.
Frameworks for DDL: Distributed TF, TensorflowOnSpark, CaffeOnSpark
[Figure: logos of companies using DDL]
DDL REQUIRES AN ENTIRE SOFTWARE/INFRASTRUCTURE STACK
[Figure: distributed training (executors e1, ..., e4 exchanging gradients) is only one box in a much larger stack]
- Distributed Systems
- Data Collection
- Data Validation
- Feature Engineering
- Hardware Management
- Hyperparameter Tuning
- Model Serving
- Pipeline Management
- A/B Testing
- Monitoring
OUTLINE
1. Hopsworks: background of the platform
2. Managed Distributed Deep Learning using HopsYARN, HopsML, PySpark, and Tensorflow
3. Black-Box Optimization (hyperparameter tuning) using Hopsworks, the Metadata Store, and PySpark
4. Short break
5. Demo: an end-to-end ML pipeline
6. Hands-on workshop: try out Hopsworks on the SICS ICE cluster
HOPSWORKS
[Figure: Hopsworks architecture, bottom to top: HopsFS; HopsYARN (GPU/CPU as a resource); Frameworks (ML/Data); Distributed Metadata (available from a REST API); ML/AI assets: Feature Store, Pipelines, Experiments, Models]
APIs:

from hops import featurestore
from hops import experiment

features = featurestore.get_features([
    "average_attendance",
    "average_player_age"])
experiment.collective_all_reduce(features, model)
THE TEAM
HOPS & HOPSWORKS HISTORY
INNER AND OUTER LOOP OF LARGE SCALE DEEP LEARNING
Inner loop: [Figure: data-parallel training; worker1, worker2, ..., workerN each train a replica of the model on a partition of the data and synchronize with one another]
Outer loop: a search method proposes hyperparameters h to the inner loop and receives back an evaluation metric τ.
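The inner loop, N workers computing local gradients that are then combined, can be simulated in a few lines. The toy linear model, learning rate, and Python lists are illustrative assumptions; real systems do this with all-reduce across GPUs:

```python
# Sketch of one synchronous data-parallel SGD step (illustrative).
# Model: y_hat = w * x; loss = 0.5*(y_hat - y)^2; each worker holds a
# data partition, computes a local gradient, then gradients are averaged.

def local_gradient(w, partition):
    # dL/dw averaged over the worker's partition
    return sum((w * x - y) * x for x, y in partition) / len(partition)

def sync_step(w, partitions, lr=0.01):
    grads = [local_gradient(w, p) for p in partitions]   # parallel in practice
    avg_grad = sum(grads) / len(grads)                   # the "all-reduce" step
    return w - lr * avg_grad

# Data generated from y = 2x, sharded across 3 workers.
data = [(x, 2.0 * x) for x in range(1, 10)]
partitions = [data[0::3], data[1::3], data[2::3]]

w = 0.0
for _ in range(100):
    w = sync_step(w, partitions)
```

After 100 synchronized steps the shared weight converges to the true slope w = 2, even though no single worker ever saw the whole dataset.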
INNER LOOP: DISTRIBUTED DEEP LEARNING
[Figure: four executors e1, e2, e3, e4; each holds a data partition p1, p2, p3, p4 and computes a gradient on its own partition]
DISTRIBUTED DEEP LEARNING IN PRACTICE
Implementation of distributed algorithms is becoming a commodity (TF, PyTorch, etc.)
The hardest parts of DDL are now:
- Cluster management
- Allocating GPUs
- Data management
- Operations & performance
[Figure: Models, GPUs, Data, Distribution]
HOPSWORKS DDL SOLUTION
from hops import experiment
experiment.collective_all_reduce(train_fn)

[Figure: the client API sends resource requests to the HopsYARN RM, which allocates YARN containers with GPU as a resource; each container runs a conda env and a Spark executor exchanging gradients, the Spark driver runs in its own container, and HopsFS sits underneath; executors report their IPs to the driver, e.g. "Here is my ip: 192.168.1.1"]
- Hide complexity behind a simple API
- Allocate resources using PySpark
- Allocate GPUs to Spark executors using HopsYARN
- Serve sharded training data to workers from HopsFS
- Use HopsFS for aggregating logs, checkpoints, and results
- Store experiment metadata in the metastore
- Use dynamic allocation for interactive resource management
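The "serve sharded training data" step can be illustrated with a round-robin sharding helper. The strategy and file names are assumptions for illustration; HopsFS and Spark handle this internally:

```python
# Minimal round-robin sharding sketch (illustrative, not Hopsworks code).

def shard(files, num_workers, worker_index):
    """Deterministically assign every num_workers-th file to a worker."""
    return files[worker_index::num_workers]

# Hypothetical list of training files on the distributed file system.
files = [f"part-{i:05d}.tfrecords" for i in range(10)]
shards = [shard(files, 4, w) for w in range(4)]
```

Every file lands on exactly one worker and no worker needs coordination to find its shard, which is why deterministic sharding composes well with stateless executors.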
OUTER LOOP: BLACK BOX OPTIMIZATION
[Figure: hyperparameters (η, num_layers, neurons) and features x = (x1, ..., xn) feed a model θ; the model outputs a prediction ŷ, a loss L(y, ŷ) is computed, and the gradient ∇θ L(y, ŷ) updates the model]
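Treating the whole training run as a black box, the outer loop only sees hyperparameters in and a metric out. A hedged sketch of un-directed (random) search over a space like the one above; the `train` function is a stand-in toy objective, not a real inner loop:

```python
import random

# Random search over (learning rate, num_layers, neurons); illustrative only.

def train(lr, num_layers, neurons):
    # Toy "black-box" metric with a known best point (stand-in for training).
    return -((lr - 0.01) ** 2 + (num_layers - 4) ** 2 * 1e-4
             + (neurons - 32) ** 2 * 1e-6)

def random_search(num_trials, seed=0):
    rng = random.Random(seed)
    best_h, best_metric = None, float("-inf")
    for _ in range(num_trials):
        h = {"lr": rng.uniform(0.0, 0.1),
             "num_layers": rng.randint(2, 12),
             "neurons": rng.randint(25, 45)}
        metric = train(**h)              # one full inner-loop run per trial
        if metric > best_metric:
            best_h, best_metric = h, metric
    return best_h, best_metric

best_h, best_metric = random_search(200)
```

Each trial is independent, which is what makes this loop embarrassingly parallel and a natural fit for a cluster of workers.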
OUTER LOOP: BLACK BOX OPTIMIZATION
Example use-case from one of our clients:
- Goal: train a one-class GAN model for fraud detection
- Problem: GANs are extremely sensitive to hyperparameters, and the space of possible hyperparameters is very large
- Example hyperparameters to tune: learning rates η, optimizers, layers, etc.
[Figure: GAN, a Generator network maps random noise z to synthetic samples, and a Discriminator network distinguishes them from real input x]
OUTER LOOP: BLACK BOX OPTIMIZATION
[Figure: 3-D hyperparameter search space, learning rate (0.00 to 0.10), number of layers (2 to 12), neurons per layer (25 to 45); a shared task queue dispatches configurations η1, ..., η5 to parallel workers]
Open questions when parallelizing the search:
- Which algorithm to use for search?
- How to monitor progress?
- How to aggregate results?
- Fault tolerance?
This should be managed with platform support!
MAGGY: A FRAMEWORK FOR SYNCHRONOUS/ASYNCHRONOUS HYPERPARAMETER TUNING ON HOPSWORKS [5]
- A flexible framework for running different black-box optimization algorithms on Hopsworks
- ASHA, Hyperband, Differential Evolution, Random Search, Grid Search, etc.

[5] Authors of Maggy: Moritz Meister and Sina Sheikholeslami. Author of the base framework that Maggy builds on: Robin Andersson.
FRAMEWORK SUPPORT FOR SYNCHRONOUS SEARCH ALGORITHMS
- Fits very well with the Spark BSP model
[Figure: un-directed search, the driver/parameter server hands hyperparameters h to N Spark tasks and takes the max/min over N evaluation metrics; synchronous directed search runs iterations of N Spark tasks separated by barriers, with a synchronization step over the N evaluation metrics after each iteration]
- Parallel un-directed/synchronous search is trivial using Spark and a distributed file system
- Examples of un-directed search algorithms: random and grid search
- Example of a synchronous search algorithm: differential evolution
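Differential evolution, named above as a synchronous search algorithm, fits BSP because every generation evaluates a whole population before selecting. A compact sketch; the population size, F, CR, and toy objective are illustrative choices, not Maggy's implementation:

```python
import random

# Differential evolution (DE/rand/1 with binomial-style crossover), minimizing f.
# Each generation is one bulk-synchronous round: evaluate all trials, then select.

def differential_evolution(f, bounds, pop_size=12, F=0.6, CR=0.9,
                           generations=60, seed=1):
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [f(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            a, b, c = rng.sample([x for j, x in enumerate(pop) if j != i], 3)
            trial = []
            for d in range(dim):
                if rng.random() < CR:
                    v = a[d] + F * (b[d] - c[d])     # mutation from 3 peers
                else:
                    v = pop[i][d]                    # keep target's value
                lo, hi = bounds[d]
                trial.append(min(max(v, lo), hi))    # clip to bounds
            s = f(trial)
            if s < scores[i]:                        # greedy selection
                pop[i], scores[i] = trial, s
    best = min(range(pop_size), key=lambda i: scores[i])
    return pop[best], scores[best]

# Toy 2-D objective over (learning rate, layers); optimum at (0.01, 4).
best_x, best_score = differential_evolution(
    lambda x: (x[0] - 0.01) ** 2 + (x[1] - 4) ** 2,
    bounds=[(0.0, 0.1), (1.0, 10.0)])
```

Because selection is greedy per individual, scores never get worse between generations, which makes the algorithm easy to checkpoint at round boundaries.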
PROBLEM WITH THE BULK-SYNCHRONOUS PROCESSING MODEL FOR PARALLEL SEARCH
[Figure: N Spark tasks reach a barrier; fast tasks sit waiting for stragglers, wasted compute]
- Synchronous search is sensitive to stragglers and not suitable for early stopping
- For large-scale search problems we need asynchronous search
- Problem: asynchronous search is much harder to implement with big-data processing tools such as Spark
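The straggler cost is easy to quantify. A back-of-the-envelope simulation with illustrative numbers (not measurements):

```python
# Idle worker-time under the BSP model: every round ends only when the
# slowest task finishes, so fast workers wait at the barrier.

def bsp_idle_time(task_times_per_round):
    """Total worker-time spent waiting at barriers across all rounds."""
    idle = 0.0
    for round_times in task_times_per_round:
        barrier = max(round_times)              # round ends with the slowest task
        idle += sum(barrier - t for t in round_times)
    return idle

# 4 workers, 3 rounds; one straggler per round takes 10s, the others 2s.
rounds = [[2.0, 2.0, 2.0, 10.0]] * 3
idle = bsp_idle_time(rounds)                    # 3 rounds * 3 workers * 8s = 72s
busy = sum(sum(r) for r in rounds)              # 48s of useful compute
```

With these numbers more worker-time is wasted at barriers (72s) than spent computing (48s), which is the motivation for the asynchronous design that follows.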
ENTER MAGGY: A FRAMEWORK FOR RUNNING ASYNCHRONOUS SEARCH ALGORITHMS ON HOPS
- Robust against stragglers
- Supports early stopping
- Fault tolerance with checkpointing
- Monitoring with Tensorboard
- Log aggregation with HopsFS
- Simple and extendable API
[Figure: one Spark task per worker, with many async tasks inside each; workers write checkpoints & results to HopsFS and asynchronously fetch new tasks from the async task queue/driver/parameter server over an RPC framework]
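The coordinator side of this design can be sketched single-threaded. The simplified median early-stopping rule, the 0.5 threshold, and the trial metrics below are assumptions for illustration; the real system dispatches over RPC with heartbeats:

```python
from collections import deque

# Sketch of an async coordinator: a global task queue plus a simplified
# median early-stopping rule (illustrative, not Maggy's actual rule).

def run_async_search(trials, median_rule_after=3):
    queue = deque(trials)            # global task queue on the coordinator
    finished, stopped = [], []
    while queue:
        trial = queue.popleft()      # a free worker fetches the next trial
        # Early stopping: once enough trials finished, kill trials whose
        # first-epoch metric falls far below the median of completed trials.
        if len(finished) >= median_rule_after:
            median = sorted(m for _, m in finished)[len(finished) // 2]
            if trial["first_epoch_metric"] < median * 0.5:
                stopped.append(trial["name"])
                continue             # worker immediately fetches new work
        finished.append((trial["name"], trial["first_epoch_metric"]))
    return finished, stopped

trials = [{"name": f"t{i}", "first_epoch_metric": m}
          for i, m in enumerate([0.6, 0.7, 0.65, 0.1, 0.72, 0.05])]
finished, stopped = run_async_search(trials)
```

Hopeless trials (t3 and t5 here) are cut short and their workers move straight to the next suggestion, instead of idling at a barrier as in the BSP case.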
PARALLEL EXPERIMENTS
from maggy import experiment
from maggy.searchspace import Searchspace
from maggy.randomsearch import RandomSearch

sp = Searchspace(argument_param=('DOUBLE', [1, 5]))
rs = RandomSearch(5, sp)
result = experiment.launch(train_fn, sp, optimizer=rs,
                           num_trials=5, name='test',
                           direction="max")
ASYNCHRONOUS SEARCH WORKFLOW
[Figure: each worker receives suggested hyperparameters λ, runs a Trial on the model, and reports a metric α; a Coordinator maintains a global task queue fed by black-box optimizers solving min_x f(x), x ∈ S; workers and coordinator exchange suggested tasks, results, heartbeats, and early-stop signals, and trials write checkpoints; a progress chart plots accuracy over epochs for trials such as lr=0.0021, layers=5 and lr=0.01, layers=2]
SUMMARY
- Deep Learning is going distributed
- Algorithms for DDL are available in several frameworks
- Applying DDL in practice brings a lot of operational complexity
- Hopsworks is a platform for scale-out deep learning and big data processing
- Hopsworks makes DDL simpler by providing simple abstractions for distributed training, parallel experiments, and much more
@hopshadoop
www.hops.io
@logicalclocks
www.logicalclocks.com
We are open source:
https://github.com/logicalclocks/hopsworks
https://github.com/hopshadoop/hops
Thanks to Logical Clocks Team: Jim Dowling, Seif Haridi, Theo Kakantousis, Fabio Buso,
Gautier Berthou, Ermias Gebremeskel, Mahmoud Ismail, Salman Niazi, Antonios Kouzoupis, Robin Andersson,
Alex Ormenisan, Rasmus Toivonen and Steffen Grohsschmiedt.
During the break:
1. Register for an account at: www.hops.site
2. Follow the instructions at: http://bit.ly/2OI4Ggt
3. Cheatsheet (for copy-paste): http://snurran.sics.se/hops/kim/workshop_cheat.txt
Demo Setting
[Figure: end-to-end pipeline; raw/structured data in a Data Lake feeds Feature Computation, curated features land in the Feature Store, and the features are used to train a Model]
Hands-on Workshop
1. If you haven't registered, do it now on hops.site
2. Cheatsheet: http://snurran.sics.se/hops/kim/workshop_cheat.txt
3. Python API docs: http://hops-py.logicalclocks.com/
EXERCISE 1 (HELLO HOPSWORKS)
1. Create a Deep Learning Tour Project on Hopsworks
2. Start a Jupyter Notebook with the config:
“Experiment” Mode
1 GPU
4000 (MB) memory for the driver (appmaster)
8000 (MB) memory for the executor
Rest can be default
3. Create a new “PySpark” notebook
4. In the first cell, write:
print("Hello Hopsworks")
5. Execute the cell (Ctrl + <Enter>)
EXERCISE 2 (DISTRIBUTED HELLO HOPSWORKS WITH GPU)
1. Add a new cell with the contents:
def executor():
print("Hello from GPU")
2. Add a new cell with the contents:
from hops import experiment
experiment.launch(executor)
3. Execute the two cells in order (Ctrl + <Enter>)
4. Go to the Application UI
EXERCISE 3 (SAVE DATA IN THE FEATURE STORE)
1. Enable the Feature Store service in your project
2. Add a new cell with the contents:
from hops import featurestore
import numpy as np
featurestore.create_featuregroup(np.random.rand(20,10), "eit_school")
3. Add a new cell with the contents:
featurestore.get_featuregroup("eit_school").show(5)
4. Add a new cell with the contents:
%%local
%matplotlib inline
from hops import featurestore
featurestore.visualize_featuregroup_correlations("eit_school")
5. Execute the cells in order and then go to the featurestore
registry
EXERCISE 4 (LOAD MNIST FROM HOPSFS)
1. Add a new cell with the contents:
from hops import hdfs
import tensorflow as tf

def create_tf_dataset():
    train_files = [hdfs.project_path() +
                   "TestJob/data/mnist/train/train.tfrecords"]
    dataset = tf.data.TFRecordDataset(train_files)
    def decode(example):
        example = tf.parse_single_example(example, {
            'image_raw': tf.FixedLenFeature([], tf.string),
            'label': tf.FixedLenFeature([], tf.int64)})
        image = tf.reshape(tf.decode_raw(example['image_raw'],
                                         tf.uint8), (28, 28, 1))
        label = tf.one_hot(tf.cast(example['label'], tf.int32), 10)
        return image, label
    return dataset.map(decode).batch(128).repeat()
EXERCISE 4 (LOAD MNIST FROM HOPSFS)
2. Add a new cell with the contents:
create_tf_dataset()
3. Execute the two cells in order (Ctrl + <Enter>)
EXERCISE 5 (DEFINE CNN MODEL)
from tensorflow import keras

def create_model():
    model = keras.Sequential()
    model.add(keras.layers.Conv2D(filters=32, kernel_size=3, padding='same',
                                  activation='relu', input_shape=(28, 28, 1)))
    model.add(keras.layers.BatchNormalization())
    model.add(keras.layers.MaxPooling2D(pool_size=2))
    model.add(keras.layers.Dropout(0.3))
    model.add(keras.layers.Conv2D(filters=64, kernel_size=3,
                                  padding='same', activation='relu'))
    model.add(keras.layers.BatchNormalization())
    model.add(keras.layers.MaxPooling2D(pool_size=2))
    model.add(keras.layers.Dropout(0.3))
    model.add(keras.layers.Flatten())
    model.add(keras.layers.Dense(128, activation='relu'))
    model.add(keras.layers.Dropout(0.5))
    model.add(keras.layers.Dense(10, activation='softmax'))
    return model
EXERCISE 5 (DEFINE CNN MODEL)
2. Add a new cell with the contents:
create_model().summary()
3. Execute the two cells in order (Ctrl + <Enter>)
EXERCISE 6 (DEFINE & RUN THE EXPERIMENT)
1. Add a new cell with the contents:
from hops import tensorboard
from tensorflow.python.keras.callbacks import TensorBoard

def train_fn():
    dataset = create_tf_dataset()
    model = create_model()
    model.compile(loss=keras.losses.categorical_crossentropy,
                  optimizer=keras.optimizers.Adam(), metrics=['accuracy'])
    tb_callback = TensorBoard(log_dir=tensorboard.logdir())
    model_ckpt_callback = keras.callbacks.ModelCheckpoint(
        tensorboard.logdir(), monitor='acc')
    history = model.fit(dataset, epochs=50, steps_per_epoch=80,
                        callbacks=[tb_callback, model_ckpt_callback])
    return history.history["acc"][-1]
EXERCISE 6 (DEFINE & RUN THE EXPERIMENT)
2. Add a new cell with the contents:
from hops import experiment
experiment.launch(train_fn)
3. Execute the two cells in order (Ctrl + <Enter>)
4. Go to the Application UI and monitor the training progress
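Conceptually, `experiment.launch` runs the wrapped training function and records the metric it returns. A minimal single-process sketch of that contract (an illustration only, not the actual hops `experiment` implementation):

```python
def launch(train_fn):
    # Run the training function and capture its returned metric,
    # as an experiment-tracking service would.
    metric = train_fn()
    return {"status": "finished", "metric": metric}

# Stand-in for train_fn: returns a dummy final accuracy.
result = launch(lambda: 0.97)
print(result)  # {'status': 'finished', 'metric': 0.97}
```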
REFERENCES
Example notebooks: https://github.com/logicalclocks/hops-examples
HopsML [6]
Hopsworks [7]
Hopsworks’ feature store [8]
Maggy: https://github.com/logicalclocks/maggy
HopsFS [9]

[6] Logical Clocks AB. HopsML: Python-First ML Pipelines. https://hops.readthedocs.io/en/latest/hopsml/hopsML.html. 2018.
[7] Jim Dowling. Introducing Hopsworks. https://www.logicalclocks.com/introducing-hopsworks/. 2018.
[8] Kim Hammar and Jim Dowling. Feature Store: the missing data layer in ML pipelines? https://www.logicalclocks.com/feature-store/. 2018.
[9] Salman Niazi et al. “HopsFS: Scaling Hierarchical File System Metadata Using NewSQL Databases”. In: 15th USENIX Conference on File and Storage Technologies (FAST 17). Santa Clara, CA: USENIX Association, 2017, pp. 89–104. ISBN: 978-1-931971-36-2. URL: https://www.usenix.org/conference/fast17/technical-sessions/presentation/niazi.

  • 5–7. DISTRIBUTED COMPUTING + DEEP LEARNING = ? Why combine the two? We like challenging problems; more productive data science; the unreasonable effectiveness of data [1]; to achieve state-of-the-art results [2]. [1] Chen Sun et al. “Revisiting Unreasonable Effectiveness of Data in Deep Learning Era”. In: CoRR abs/1707.02968 (2017). arXiv: 1707.02968. URL: http://arxiv.org/abs/1707.02968. [2] Jeffrey Dean et al. “Large Scale Distributed Deep Networks”. In: Advances in Neural Information Processing Systems 25. Ed. by F. Pereira et al. Curran Associates, Inc., 2012, pp. 1223–1231.
  • 8. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION SUMMARY BREAK DEMO/WORKSHOP DISTRIBUTED DEEP LEARNING (DDL): PREDICTABLE SCALING [3] [3] Jeff Dean. Building Intelligent Systems with Large Scale Deep Learning. https://www.scribd.com/document/355752799/Jeff-Dean-s-Lecture-for-YC-AI. 2018.
  • 9. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION SUMMARY BREAK DEMO/WORKSHOP DISTRIBUTED DEEP LEARNING (DDL): PREDICTABLE SCALING
  • 10. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION SUMMARY BREAK DEMO/WORKSHOP DDL IS NOT A SECRET ANYMORE [4] [4] Tal Ben-Nun and Torsten Hoefler. "Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis". In: CoRR abs/1802.09941 (2018). arXiv: 1802.09941. URL: http://arxiv.org/abs/1802.09941.
  • 11. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION SUMMARY BREAK DEMO/WORKSHOP DDL IS NOT A SECRET ANYMORE TensorflowOnSpark CaffeOnSpark Distributed TF Frameworks for DDL Companies using DDL
  • 12. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION SUMMARY BREAK DEMO/WORKSHOP DDL REQUIRES AN ENTIRE SOFTWARE/INFRASTRUCTURE STACK e1 e2 e3 Distributed Training e4 Gradient Gradient Gradient Gradient Distributed Systems Data Validation Feature Engineering Data Collection Hardware Management HyperParameter Tuning Model Serving Pipeline Management A/B Testing Monitoring
  • 13. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION SUMMARY BREAK DEMO/WORKSHOP OUTLINE 1. Hopsworks: Background of the platform 2. Managed Distributed Deep Learning using HopsYARN, HopsML, PySpark, and Tensorflow 3. Black-Box Optimization (Hyperparameter Tuning) using Hopsworks, Metadata Store and PySpark 4. Short Break 5. Demo, end-to-end ML pipeline 6. Hands-on Workshop, try out Hopsworks on SICS ICE cluster
  • 14. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION SUMMARY BREAK DEMO/WORKSHOP HOPSWORKS
  • 15. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION SUMMARY BREAK DEMO/WORKSHOP HOPSWORKS HopsFS
  • 16. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION SUMMARY BREAK DEMO/WORKSHOP HOPSWORKS HopsFS HopsYARN (GPU/CPU as a resource)
  • 17. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION SUMMARY BREAK DEMO/WORKSHOP HOPSWORKS HopsFS HopsYARN (GPU/CPU as a resource) Frameworks (ML/Data)
  • 18. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION SUMMARY BREAK DEMO/WORKSHOP HOPSWORKS HopsFS HopsYARN (GPU/CPU as a resource) Frameworks (ML/Data) Feature Store Pipelines Experiments Models b0 x0,1 x0,2 x0,3 b1 x1,1 x1,2 x1,3 ˆy ML/AI Assets
  • 19. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION SUMMARY BREAK DEMO/WORKSHOP HOPSWORKS HopsFS HopsYARN (GPU/CPU as a resource) Frameworks (ML/Data) Feature Store Pipelines Experiments Models b0 x0,1 x0,2 x0,3 b1 x1,1 x1,2 x1,3 ˆy ML/AI Assets from hops import featurestore from hops import experiment featurestore.get_features([ "average_attendance", "average_player_age"]) experiment.collective_all_reduce(features , model) APIs
  • 20. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION SUMMARY BREAK DEMO/WORKSHOP HOPSWORKS HopsFS HopsYARN (GPU/CPU as a resource) Frameworks (ML/Data) Distributed Metadata (Available from REST API) Feature Store Pipelines Experiments Models b0 x0,1 x0,2 x0,3 b1 x1,1 x1,2 x1,3 ˆy ML/AI Assets from hops import featurestore from hops import experiment featurestore.get_features([ "average_attendance", "average_player_age"]) experiment.collective_all_reduce(features , model) APIs
  • 21. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION SUMMARY BREAK DEMO/WORKSHOP THE TEAM
  • 22. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION SUMMARY BREAK DEMO/WORKSHOP HOPS & HOPSWORKS HISTORY
  • 23. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION SUMMARY BREAK DEMO/WORKSHOP INNER AND OUTER LOOP OF LARGE SCALE DEEP LEARNING Inner loop [diagram: N workers (worker1..workerN), each holding a model replica, read sharded data and synchronize gradients]
  • 24. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION SUMMARY BREAK DEMO/WORKSHOP INNER AND OUTER LOOP OF LARGE SCALE DEEP LEARNING Outer loop [diagram: a search method proposes hyperparameters h to the inner loop and receives back a metric τ; the inner loop is the N-worker setup of the previous slide]
  • 25. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION SUMMARY BREAK DEMO/WORKSHOP INNER AND OUTER LOOP OF LARGE SCALE DEEP LEARNING [same slide as the previous one]
  • 26. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION SUMMARY BREAK DEMO/WORKSHOP INNER LOOP: DISTRIBUTED DEEP LEARNING e1 e2 e3 e4
  • 27. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION SUMMARY BREAK DEMO/WORKSHOP INNER LOOP: DISTRIBUTED DEEP LEARNING e1 e2 e3 e4 Gradient Gradient Gradient Gradient
  • 28. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION SUMMARY BREAK DEMO/WORKSHOP INNER LOOP: DISTRIBUTED DEEP LEARNING e1 e2 e3 e4 p1 p2 p3 p4 Gradient Gradient Gradient Gradient Data Partition Data Partition Data Partition Data Partition
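The diagram above is data-parallel training: each executor holds a model replica, computes a gradient on its own data partition, and the gradients are combined. A minimal simulation (plain Python with a toy linear model and equal-sized shards; the real system uses TensorFlow and ring all-reduce) shows why averaging the per-worker gradients recovers the full-batch gradient:

```python
# Each "worker" computes the gradient of a squared loss on its shard;
# an all-reduce is simulated by averaging the per-worker gradients.

def shard_gradient(w, xs, ys):
    # d/dw of mean((w*x - y)^2) over one shard
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n

def all_reduce_mean(per_worker_grads):
    # stand-in for ring all-reduce: average gradients across workers
    return sum(per_worker_grads) / len(per_worker_grads)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]            # generated by y = 2x
shards = [(xs[:2], ys[:2]), (xs[2:], ys[2:])]   # two equal-sized shards

w = 0.0
for _ in range(200):
    grads = [shard_gradient(w, sx, sy) for sx, sy in shards]
    w -= 0.01 * all_reduce_mean(grads)          # synchronized SGD step
```

With equal shard sizes the average of shard gradients equals the full-batch gradient, so the replicas stay in lockstep; unequal shards would need weighted averaging.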
  • 29. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION SUMMARY BREAK DEMO/WORKSHOP DISTRIBUTED DEEP LEARNING IN PRACTICE Implementations of distributed training algorithms are becoming a commodity (TF, PyTorch, etc.) The hardest part of DDL is now: Cluster management, Allocating GPUs, Data management, Operations & performance [diagram: models, GPUs, data, distribution]
  • 30. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION SUMMARY BREAK DEMO/WORKSHOP HOPSWORKS DDL SOLUTION
  • 31. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION SUMMARY BREAK DEMO/WORKSHOP HOPSWORKS DDL SOLUTION from hops import experiment experiment.collective_all_reduce(train_fn)
  • 32. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION SUMMARY BREAK DEMO/WORKSHOP HOPSWORKS DDL SOLUTION from hops import experiment experiment. collective_all_reduce( train_fn ) HopsYARN RM YARN container GPU as a resource YARN container GPU as a resource YARN container GPU as a resource YARN container GPU as a resource Resource requests Client API YARN container
  • 33. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION SUMMARY BREAK DEMO/WORKSHOP HOPSWORKS DDL SOLUTION from hops import experiment experiment. collective_all_reduce( train_fn ) HopsYARN RM YARN container GPU as a resource Spark executor YARN container GPU as a resource Spark executor YARN container GPU as a resource Spark executor YARN container GPU as a resource Spark executor Resource requests Client API YARN container Spark driver
  • 34. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION SUMMARY BREAK DEMO/WORKSHOP HOPSWORKS DDL SOLUTION from hops import experiment experiment. collective_all_reduce( train_fn ) HopsYARN RM YARN container GPU as a resource conda env Spark executor YARN container GPU as a resource conda env Spark executor YARN container GPU as a resource conda env Spark executor YARN container GPU as a resource conda env Spark executor Resource requests Client API YARN container conda env Spark driver
  • 35. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION SUMMARY BREAK DEMO/WORKSHOP HOPSWORKS DDL SOLUTION from hops import experiment experiment. collective_all_reduce( train_fn ) HopsYARN RM YARN container GPU as a resource conda env Spark executor YARN container GPU as a resource conda env Spark executor YARN container GPU as a resource conda env Spark executor YARN container GPU as a resource conda env Spark executor Resource requests Client API Here is my ip: 192.168.1.1 Here is my ip: 192.168.1.2 Here is my ip: 192.168.1.3 Here is my ip: 192.168.1.4 YARN container conda env Spark driver
  • 36. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION SUMMARY BREAK DEMO/WORKSHOP HOPSWORKS DDL SOLUTION from hops import experiment experiment. collective_all_reduce( train_fn ) HopsYARN RM YARN container GPU as a resource conda env Spark executor YARN container GPU as a resource conda env Spark executor YARN container GPU as a resource conda env Spark executor YARN container GPU as a resource conda env Spark executor Resource requests Client API Gradient Gradient GradientGradient YARN container conda env Spark driver Hops Distributed File System (HopsFS)
  • 37. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION SUMMARY BREAK DEMO/WORKSHOP HOPSWORKS DDL SOLUTION from hops import experiment experiment. collective_all_reduce( train_fn ) HopsYARN RM YARN container GPU as a resource conda env Spark executor YARN container GPU as a resource conda env Spark executor YARN container GPU as a resource conda env Spark executor YARN container GPU as a resource conda env Spark executor Resource requests Client API Gradient Gradient GradientGradient YARN container conda env Spark driver Hops Distributed File System (HopsFS) Hide complexity behind simple API Allocate resources using pyspark Allocate GPUs for spark executors using HopsYARN Serve sharded training data to workers from HopsFS Use HopsFS for aggregating logs, checkpoints and results Store experiment metadata in metastore Use dynamic allocation for interactive resource management
  • 38. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION SUMMARY BREAK DEMO/WORKSHOP OUTER LOOP: BLACK BOX OPTIMIZATION [diagram: outer loop (search method proposes hyperparameters h, receives metric τ) wrapped around the inner loop of N workers with data synchronization]
  • 39. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION SUMMARY BREAK DEMO/WORKSHOP OUTER LOOP: BLACK BOX OPTIMIZATION [same slide as the previous one]
  • 40. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION SUMMARY BREAK DEMO/WORKSHOP OUTER LOOP: BLACK BOX OPTIMIZATION [diagram: features x1..xn → model θ with hyperparameters (η, num_layers, neurons) → prediction ŷ → loss L(y, ŷ) → gradient ∇θ L(y, ŷ)]
  • 41. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION SUMMARY BREAK DEMO/WORKSHOP OUTER LOOP: BLACK BOX OPTIMIZATION Example use-case from one of our clients. Goal: train a one-class GAN model for fraud detection. Problem: GANs are extremely sensitive to hyperparameters, and the space of possible hyperparameters is very large. Example hyperparameters to tune: learning rates η, optimizers, layers, etc. [diagram: a Generator maps random noise z to fake samples; a Discriminator scores them against real input x]
  • 42. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION SUMMARY BREAK DEMO/WORKSHOP OUTER LOOP: BLACK BOX OPTIMIZATION [plot: the 3D search space spanned by learning rate (0–0.1), num layers (2–12), and num neurons/layer (25–45), next to the model/loss/gradient pipeline]
  • 43. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION SUMMARY BREAK DEMO/WORKSHOP OUTER LOOP: BLACK BOX OPTIMIZATION [same, adding a shared task queue of hyperparameter configurations η1,.. to η5,.. consumed by parallel workers]
  • 44. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION SUMMARY BREAK DEMO/WORKSHOP OUTER LOOP: BLACK BOX OPTIMIZATION Which algorithm to use for search?
  • 45. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION SUMMARY BREAK DEMO/WORKSHOP OUTER LOOP: BLACK BOX OPTIMIZATION How to monitor progress? Which algorithm to use for search?
  • 46. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION SUMMARY BREAK DEMO/WORKSHOP OUTER LOOP: BLACK BOX OPTIMIZATION How to aggregate results? How to monitor progress? Which algorithm to use for search?
  • 47. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION SUMMARY BREAK DEMO/WORKSHOP OUTER LOOP: BLACK BOX OPTIMIZATION How to aggregate results? How to monitor progress? Which algorithm to use for search? Fault tolerance?
  • 48. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION SUMMARY BREAK DEMO/WORKSHOP OUTER LOOP: BLACK BOX OPTIMIZATION How to aggregate results? How to monitor progress? Which algorithm to use for search? Fault tolerance? This should be managed with platform support!
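Of the questions above, "which algorithm to use for search?" has a simple baseline answer: random search. A sketch under assumed names (the `validation_loss` surface below is synthetic, standing in for training and evaluating a real model):

```python
import random

def validation_loss(lr, num_layers):
    # synthetic stand-in black box: pretend lr=0.01 with 6 layers is best
    return (lr - 0.01) ** 2 * 1e4 + (num_layers - 6) ** 2

def random_search(num_trials, seed=0):
    # sample hyperparameters uniformly from the search space and keep
    # the configuration with the lowest validation loss
    rng = random.Random(seed)
    best = None
    for _ in range(num_trials):
        h = {"lr": rng.uniform(1e-4, 0.1), "num_layers": rng.randint(2, 12)}
        loss = validation_loss(**h)
        if best is None or loss < best[0]:
            best = (loss, h)
    return best

best_loss, best_h = random_search(50)
```

In the platform setting, each `validation_loss` call is one trial dispatched to a worker; the loop body is what gets parallelized.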
  • 49. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION SUMMARY BREAK DEMO/WORKSHOP MAGGY: A FRAMEWORK FOR SYNCHRONOUS/ASYNCHRONOUS HYPERPARAMETER TUNING ON HOPSWORKS [5] A flexible framework for running different black-box optimization algorithms on Hopsworks: ASHA, Hyperband, Differential Evolution, Random search, Grid search, etc. [5] Authors of Maggy: Moritz Meister and Sina Sheikholeslami. Author of the base framework that Maggy builds on: Robin Andersson
  • 50. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION SUMMARY BREAK DEMO/WORKSHOP MAGGY: A FRAMEWORK FOR SYNCHRONOUS/ASYNCHRONOUS HYPERPARAMETER TUNING ON HOPSWORKS [5] [same slide as the previous one]
  • 51. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION SUMMARY BREAK DEMO/WORKSHOP FRAMEWORK SUPPORT FOR SYNCHRONOUS SEARCH ALGORITHMS [diagram: un-directed search fans out N Spark tasks and collects N evaluation metrics at a driver/parameter server that takes the max/min over h; synchronous directed search repeats the fan-out per iteration with a synchronization barrier between iterations] Parallel un-directed/synchronous search is trivial using Spark and a distributed file system. Examples of un-directed search algorithms: random and grid search. Example of a synchronous search algorithm: differential evolution
  • 52. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION SUMMARY BREAK DEMO/WORKSHOP FRAMEWORK SUPPORT FOR SYNCHRONOUS SEARCH ALGORITHMS Fits very well with the Spark BSP model [otherwise the same slide as the previous one]
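Differential evolution, named above as a synchronous search algorithm, fits the barrier model because every candidate in a generation must be evaluated before the next generation starts. A toy sketch (synthetic quadratic objective, not a real training loss; the generation boundary plays the role of the Spark barrier):

```python
import random

def evolve(objective, bounds, pop_size=20, generations=60,
           f_weight=0.8, crossover=0.7, seed=1):
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(ind) for ind in pop]   # evaluated in parallel in practice
    for _ in range(generations):               # one barrier per generation
        for i in range(pop_size):
            # mutate: combine three distinct other population members
            a, b, c = rng.sample([p for j, p in enumerate(pop) if j != i], 3)
            trial = [a[d] + f_weight * (b[d] - c[d])
                     if rng.random() < crossover else pop[i][d]
                     for d in range(dim)]
            trial = [min(max(v, lo), hi) for v, (lo, hi) in zip(trial, bounds)]
            s = objective(trial)
            if s < scores[i]:                  # greedy selection
                pop[i], scores[i] = trial, s
    best = min(range(pop_size), key=lambda i: scores[i])
    return pop[best], scores[best]

best_x, best_s = evolve(lambda v: (v[0] - 3) ** 2 + (v[1] + 1) ** 2,
                        bounds=[(-10, 10), (-10, 10)])
```

All `objective` calls inside one generation are independent, which is exactly what maps onto a stage of N Spark tasks followed by a barrier.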
  • 53. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION SUMMARY BREAK DEMO/WORKSHOP PROBLEM WITH THE BULK-SYNCHRONOUS PROCESSING MODEL FOR PARALLEL SEARCH N spark tasks Barrier N eval metrics Waiting.. wasted compute Driver/Parameter Server Synchronous search is sensitive to stragglers and not suitable for early stopping ... For large scale search problems we need asynchronous search Problem: Asynchronous search is much harder to implement with big data processing tools such as Spark
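The asynchronous alternative can be sketched with a shared task queue: each worker pulls the next trial the moment it finishes its current one, so a straggler delays only its own trial instead of stalling a barrier. This is an illustrative stand-in using stdlib threads and a synthetic evaluation function, not Maggy's actual implementation:

```python
import queue
import threading

def run_async_search(trial_list, num_workers=3):
    # shared task queue: workers pull trials as they become free
    tasks = queue.Queue()
    for t in trial_list:
        tasks.put(t)
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                hparams = tasks.get_nowait()
            except queue.Empty:
                return                         # queue drained, worker exits
            metric = -(hparams["lr"] - 0.01) ** 2   # synthetic evaluation
            with lock:
                results.append((hparams, metric))

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return max(results, key=lambda r: r[1])    # best (hparams, metric)

trials = [{"lr": lr} for lr in (0.001, 0.005, 0.01, 0.05, 0.1)]
best_h, best_metric = run_async_search(trials)
```

The key property is that no worker ever waits at a barrier; this is what a plain Spark stage cannot express, and what the long-running-task trick on the next slides works around.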
  • 54. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION SUMMARY BREAK DEMO/WORKSHOP ENTER MAGGY: A FRAMEWORK FOR RUNNING ASYNCHRONOUS SEARCH ALGORITHMS ON HOPS 1 spark task/worker HopsFS Async Task Queue/Driver/Parameter Server
  • 55. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION SUMMARY BREAK DEMO/WORKSHOP ENTER MAGGY: A FRAMEWORK FOR RUNNING ASYNCHRONOUS SEARCH ALGORITHMS ON HOPS 1 spark task/worker, many async tasks inside HopsFS Async Task Queue/Driver/Parameter Server
  • 56. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION SUMMARY BREAK DEMO/WORKSHOP ENTER MAGGY: A FRAMEWORK FOR RUNNING ASYNCHRONOUS SEARCH ALGORITHMS ON HOPS 1 spark task/worker, many async tasks inside HopsFS Write checkpoints & results Async Task Queue/Driver/Parameter Server
  • 57. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION SUMMARY BREAK DEMO/WORKSHOP ENTER MAGGY: A FRAMEWORK FOR RUNNING ASYNCHRONOUS SEARCH ALGORITHMS ON HOPS 1 spark task/worker, many async tasks inside HopsFS Write checkpoints & results Async fetching of new tasks (RPC framework) Async Task Queue/Driver/Parameter Server
  • 58. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION SUMMARY BREAK DEMO/WORKSHOP ENTER MAGGY: A FRAMEWORK FOR RUNNING ASYNCHRONOUS SEARCH ALGORITHMS ON HOPS Robust against stragglers Supports early stopping Fault tolerance with checkpointing Monitoring with Tensorboard Log aggregation with HopsFS Simple API and extendable 1 spark task/worker, many async tasks inside HopsFS Write checkpoints & results Async fetching of new tasks (RPC framework) Async Task Queue/Driver/Parameter Server
  • 59. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION SUMMARY BREAK DEMO/WORKSHOP PARALLEL EXPERIMENTS
from maggy import experiment
from maggy.searchspace import Searchspace
from maggy.randomsearch import RandomSearch

sp = Searchspace(argument_param=('DOUBLE', [1, 5]))
rs = RandomSearch(5, sp)
result = experiment.launch(train_fn, sp, optimizer=rs, num_trials=5,
                           name='test', direction="max")
  • 60. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION SUMMARY BREAK DEMO/WORKSHOP ASYNCHRONOUS SEARCH WORKFLOW [diagram: workers turn suggested hyperparameters λ into trials and report a metric α; a coordinator with a global task queue and black-box optimizers (min_x f(x), x ∈ S) exchanges suggested tasks and results with each worker; a progress plot shows accuracy vs. epochs for trials with different lr/layers settings]
  • 61. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION SUMMARY BREAK DEMO/WORKSHOP ASYNCHRONOUS SEARCH WORKFLOW [same diagram, adding heartbeats from the workers to the coordinator]
  • 62. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION SUMMARY BREAK DEMO/WORKSHOP ASYNCHRONOUS SEARCH WORKFLOW [same diagram, adding early-stop signals returned to the workers on their heartbeats]
  • 63. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION SUMMARY BREAK DEMO/WORKSHOP ASYNCHRONOUS SEARCH WORKFLOW [same diagram, adding checkpoints written by the workers]
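The early-stop signals above need a stopping rule. One simple, commonly used choice is a median rule: stop a trial whose metric at epoch t falls below the median of what other trials reported at the same epoch. This sketch is illustrative and not necessarily the rule Maggy implements:

```python
def median(xs):
    xs = sorted(xs)
    n = len(xs)
    mid = n // 2
    return xs[mid] if n % 2 else (xs[mid - 1] + xs[mid]) / 2

def should_early_stop(curve, peer_metrics_at_epoch):
    """curve: this trial's metric history so far (higher is better);
    peer_metrics_at_epoch: metrics other trials reached at this epoch."""
    if not peer_metrics_at_epoch:
        return False           # nothing to compare against yet
    return curve[-1] < median(peer_metrics_at_epoch)

# a trial stuck near 0.40 accuracy is stopped once its peers reach ~0.7
stop = should_early_stop([0.30, 0.35, 0.40], [0.65, 0.70, 0.72, 0.80])
```

In the workflow above, the coordinator would evaluate this predicate on each heartbeat and piggy-back the stop decision on the reply.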
  • 64. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION SUMMARY BREAK DEMO/WORKSHOP SUMMARY Deep Learning is going distributed Algorithms for DDL are available in several frameworks Applying DDL in practice brings a lot of operational complexity Hopsworks is a platform for scale out deep learning and big data processing Hopsworks makes DDL simpler by providing simple abstractions for distributed training, parallel experiments and much more.. @hopshadoop www.hops.io @logicalclocks www.logicalclocks.com We are open source: https://github.com/logicalclocks/hopsworks https://github.com/hopshadoop/hops Thanks to Logical Clocks Team: Jim Dowling, Seif Haridi, Theo Kakantousis, Fabio Buso, Gautier Berthou, Ermias Gebremeskel, Mahmoud Ismail, Salman Niazi, Antonios Kouzoupis, Robin Andersson, Alex Ormenisan, Rasmus Toivonen and Steffen Grohsschmiedt.
  • 65. During the break.. 1. Register for an account at: www.hops.site 2. Follow the instructions at: http://bit.ly/2OI4Ggt 3. Cheatsheet (for copy-paste): http://snurran.sics.se/hops/kim/workshop_cheat.txt
  • 66. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION SUMMARY BREAK DEMO/WORKSHOP Feature Computation Raw/Structured Data Data Lake Feature Store Curated Features Model b0 x0,1 x0,2 x0,3 b1 x1,1 x1,2 x1,3 ˆy Demo-Setting
  • 67. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION SUMMARY BREAK DEMO/WORKSHOP Hands-on Workshop 1. If you haven't registered, do it now on hops.site 2. Cheatsheet: http://snurran.sics.se/hops/kim/workshop_cheat.txt 3. Python API Docs: http://hops-py.logicalclocks.com/
  • 68. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION SUMMARY BREAK DEMO/WORKSHOP EXERCISE 1 (HELLO HOPSWORKS) 1. Create a Deep Learning Tour Project on Hopsworks 2. Start a Jupyter Notebook with the config: “Experiment” Mode 1 GPU 4000 (MB) memory for the driver (appmaster) 8000 (MB) memory for the executor Rest can be default 3. Create a new “PySpark” notebook 4. In the first cell, write: print("Hello Hopsworks") 5. Execute the cell (Ctrl + <Enter>)
  • 69. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION SUMMARY BREAK DEMO/WORKSHOP EXERCISE 2 (DISTRIBUTED HELLO HOPSWORKS WITH GPU) 1. Add a new cell with the contents: def executor(): print("Hello from GPU") 2. Add a new cell with the contents: from hops import experiment experiment.launch(executor) 3. Execute the two cells in order (Ctrl + <Enter>) 4. Go to the Application UI
  • 70. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION SUMMARY BREAK DEMO/WORKSHOP EXERCISE 2 (DISTRIBUTED HELLO HOPSWORKS WITH GPU)
  • 71. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION SUMMARY BREAK DEMO/WORKSHOP EXERCISE 2 (DISTRIBUTED HELLO HOPSWORKS WITH GPU)
  • 72. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION SUMMARY BREAK DEMO/WORKSHOP EXERCISE 3 (SAVE DATA IN THE FEATURE STORE)
1. Enable the Feature Store service in your project
2. Add a new cell with the contents:
from hops import featurestore
import numpy as np
featurestore.create_featuregroup(np.random.rand(20,10), "eit_school")
3. Add a new cell with the contents:
featurestore.get_featuregroup("eit_school").show(5)
4. Add a new cell with the contents:
%%local
%matplotlib inline
from hops import featurestore
featurestore.visualize_featuregroup_correlations("eit_school")
5. Execute the cells in order and then go to the featurestore registry
  • 73. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION SUMMARY BREAK DEMO/WORKSHOP EXERCISE 4 (LOAD MNIST FROM HOPSFS)
1. Add a new cell with the contents:
from hops import hdfs
import tensorflow as tf

def create_tf_dataset():
    train_files = [hdfs.project_path() + "TestJob/data/mnist/train/train.tfrecords"]
    dataset = tf.data.TFRecordDataset(train_files)
    def decode(example):
        example = tf.parse_single_example(example, {
            'image_raw': tf.FixedLenFeature([], tf.string),
            'label': tf.FixedLenFeature([], tf.int64)})
        image = tf.reshape(tf.decode_raw(example['image_raw'], tf.uint8), (28,28,1))
        label = tf.one_hot(tf.cast(example['label'], tf.int32), 10)
        return image, label
    return dataset.map(decode).batch(128).repeat()
  • 74. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION SUMMARY BREAK DEMO/WORKSHOP EXERCISE 4 (LOAD MNIST FROM HOPSFS) 2. Add a new cell with the contents: create_tf_dataset() 3. Execute the two cells in order (Ctrl + <Enter>)
  • 75. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION SUMMARY BREAK DEMO/WORKSHOP EXERCISE 5 (DEFINE CNN MODEL)
1. Add a new cell with the contents:
from tensorflow import keras

def create_model():
    model = keras.Sequential()
    model.add(keras.layers.Conv2D(filters=32, kernel_size=3, padding='same',
                                  activation='relu', input_shape=(28,28,1)))
    model.add(keras.layers.BatchNormalization())
    model.add(keras.layers.MaxPooling2D(pool_size=2))
    model.add(keras.layers.Dropout(0.3))
    model.add(keras.layers.Conv2D(filters=64, kernel_size=3, padding='same',
                                  activation='relu'))
    model.add(keras.layers.BatchNormalization())
    model.add(keras.layers.MaxPooling2D(pool_size=2))
    model.add(keras.layers.Dropout(0.3))
    model.add(keras.layers.Flatten())
    model.add(keras.layers.Dense(128, activation='relu'))
    model.add(keras.layers.Dropout(0.5))
    model.add(keras.layers.Dense(10, activation='softmax'))
    return model
  • 76. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION SUMMARY BREAK DEMO/WORKSHOP EXERCISE 5 (DEFINE CNN MODEL) 2. Add a new cell with the contents: create_model().summary() 3. Execute the two cells in order (Ctrl + <Enter>)
  • 77. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION SUMMARY BREAK DEMO/WORKSHOP EXERCISE 6 (DEFINE & RUN THE EXPERIMENT)
1. Add a new cell with the contents:
from hops import tensorboard
from tensorflow.python.keras.callbacks import TensorBoard

def train_fn():
    dataset = create_tf_dataset()
    model = create_model()
    model.compile(loss=keras.losses.categorical_crossentropy,
                  optimizer=keras.optimizers.Adam(), metrics=['accuracy'])
    tb_callback = TensorBoard(log_dir=tensorboard.logdir())
    model_ckpt_callback = keras.callbacks.ModelCheckpoint(
        tensorboard.logdir(), monitor='acc')
    history = model.fit(dataset, epochs=50, steps_per_epoch=80,
                        callbacks=[tb_callback])
    return history.history["acc"][-1]
  • 78. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION SUMMARY BREAK DEMO/WORKSHOP EXERCISE 6 (DEFINE & RUN THE EXPERIMENT) 2. Add a new cell with the contents: experiment.launch(train_fn) 3. Execute the two cells in order (Ctrl + <Enter>) 4. Go to the Application UI and monitor the training progress
  • 79. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION SUMMARY BREAK DEMO/WORKSHOP REFERENCES Example notebooks https://github.com/logicalclocks/hops-examples HopsML [6] Hopsworks [7] Hopsworks' feature store [8] Maggy https://github.com/logicalclocks/maggy HopsFS [9] [6] Logical Clocks AB. HopsML: Python-First ML Pipelines. https://hops.readthedocs.io/en/latest/hopsml/hopsML.html. 2018. [7] Jim Dowling. Introducing Hopsworks. https://www.logicalclocks.com/introducing-hopsworks/. 2018. [8] Kim Hammar and Jim Dowling. Feature Store: the missing data layer in ML pipelines? https://www.logicalclocks.com/feature-store/. 2018. [9] Salman Niazi et al. "HopsFS: Scaling Hierarchical File System Metadata Using NewSQL Databases". In: 15th USENIX Conference on File and Storage Technologies (FAST 17). Santa Clara, CA: USENIX Association, 2017, pp. 89–104. ISBN: 978-1-931971-36-2. URL: https://www.usenix.org/conference/fast17/technical-sessions/presentation/niazi.