© Cloudera, Inc. All rights reserved.
Parallel/Distributed Deep Learning
and CDSW
Rafael Arana - Senior Solutions Architect
Zuling Kang - Senior Solutions Architect
© Cloudera, Inc. All rights reserved. 2
TABLE OF CONTENTS
● Motivation for distributed deep learning and distributed model training
● Distributing the model training processes
● Integrating the distributed model training into CDSW
● Discussions and future
© Cloudera, Inc. All rights reserved. 3
BACKGROUND
CONNECT: products & services (IoT)
PROTECT
DRIVE: customer insights
© Cloudera, Inc. All rights reserved. 4
Are we there yet?
QUAID: Where am I?
JOHNNY: (cheerful) You're in a JohnnyCab!
QUAID: I mean... what am I doing here?
JOHNNY: I'm sorry. Would you please rephrase the question.
QUAID: (impatient, enunciates) How did I get in this taxi?!
JOHNNY: The door opened. You got in.
© Cloudera, Inc. All rights reserved. 5
Increase in compute
Source: https://blog.openai.com/ai-and-compute/
© Cloudera, Inc. All rights reserved. 6
Model lifecycle
© Cloudera, Inc. All rights reserved. 7
The Power-law Region
More compute + more training data -> Better Accuracy
Reference: https://arxiv.org/abs/1712.00409
© Cloudera, Inc. All rights reserved. 8
The Power-law Region
More compute + more training data = Better Accuracy
Reference: https://arxiv.org/abs/1712.00409
© Cloudera, Inc. All rights reserved. 9
PROBLEM:
LABELED
TRAINING
DATA
• Supervised learning
• Reuse public data sets
• Data Augmentation
• Enterprise Data and data privacy
regulations
© Cloudera, Inc. All rights reserved. 10
TRANSFER
LEARNING
• Low budget (computation, data set labelling, …)
• Use transfer learning to transfer knowledge from large public data sets to your own problem:
• Small data set: replace the softmax layer
• Medium data set: replace the last layers
• Large data set: use the pre-trained weights just for initialization
• Sample image detection based on RetinaNet using Keras:
• Person, car, …
• But… what is that prediction on Ringo’s left leg?
© Cloudera, Inc. All rights reserved. 11
Neural Network Architectures
(diagram: neural network architectures and the training data set)
© Cloudera, Inc. All rights reserved. 12
Neural Network Architecture and Accuracy
Do DNN models with more parameters produce higher classification accuracy?
• Example: popular computer-vision DNN convnets
• VGG and AlexNet each have more than 150MB of fully-connected layer parameters; GoogLeNet has smaller fully-connected layers, and NiN has no fully-connected layers at all
• GoogLeNet and NiN use convolution filters with a resolution of 1x1 instead of 3x3 or larger
• Models with fewer parameters are more amenable to scalability, while still delivering high accuracy.
Reference: https://arxiv.org/pdf/1511.00175
© Cloudera, Inc. All rights reserved. 13
Let’s put our model in production!!!!
Photos by Unsplash
© Cloudera, Inc. All rights reserved. 14
Industrialization of ML – Efficient training
Photos by Unsplash
© Cloudera, Inc. All rights reserved. 15
Machine Learning Development Life Cycle
© Cloudera, Inc. All rights reserved. 16
Let’s scale
© Cloudera, Inc. All rights reserved. 17
Cloudera Data Science Workbench
Architecture
(diagram) Cloudera Manager · Gateway node(s) running CDSW: Master + Engine containers · CDH nodes: Hive, HDFS, ... · Container Registry · Git Repo
© Cloudera, Inc. All rights reserved. 18
Cloudera Data Science Workbench
Architecture
(diagram) Ambari · Gateway node(s) running CDSW: Master + Engine containers · HDP nodes: Hive, HDFS, ... · Container Registry · Git Repo
© Cloudera, Inc. All rights reserved. 19
Adding GPUs
Step 1. Admin > Engines > Engine Images
© Cloudera, Inc. All rights reserved. 20
Adding GPUs
Step 2. Project > Settings > Engine
© Cloudera, Inc. All rights reserved. 21
Adding GPUs
GPU Support
(diagram) GPUs attach to the CDSW gateway nodes for single-node training; the CDH/HDP nodes run on CPUs for distributed training and scoring. GPU support on CDH is coming in C6.
© Cloudera, Inc. All rights reserved. 22
Distributed Tensorflow Package
• Main concepts
• Workers
• Parameter Servers
• Key APIs: tf.train.Server(), tf.train.ClusterSpec(), tf.train.SyncReplicasOptimizer(), tf.train.replica_device_setter() (see the sketch below)
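A minimal sketch of wiring these APIs together; the host:port addresses and the trivial graph are placeholders for illustration only:

import tensorflow as tf

# Cluster layout: one parameter server, two workers (addresses are placeholders)
cluster = tf.train.ClusterSpec({
    "ps": ["ps0.example.com:2222"],
    "worker": ["worker0.example.com:2222", "worker1.example.com:2222"],
})

# Each process starts a server for its own role, e.g. the first worker:
server = tf.train.Server(cluster, job_name="worker", task_index=0)

# replica_device_setter places variables on the PS and ops on this worker
with tf.device(tf.train.replica_device_setter(cluster=cluster)):
    step = tf.train.get_or_create_global_step()
    train_op = tf.assign_add(step, 1)

with tf.train.MonitoredTrainingSession(master=server.target) as sess:
    sess.run(train_op)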
© Cloudera, Inc. All rights reserved. 23
Local Multi-GPU Training - TF Distribution Strategy
Keras API
import tensorflow as tf

distribution = tf.contrib.distribute.MirroredStrategy()
with distribution.scope():
    inputs = tf.keras.layers.Input(shape=(1,))
    predictions = tf.keras.layers.Dense(1)(inputs)
    model = tf.keras.models.Model(inputs=inputs, outputs=predictions)
    model.compile(loss='mean_squared_error',
                  optimizer=tf.train.GradientDescentOptimizer(learning_rate=0.2))

# train_dataset is a tf.data.Dataset defined elsewhere
model.fit(train_dataset, epochs=5, steps_per_epoch=10)
© Cloudera, Inc. All rights reserved. 24
Local Multi-GPU Training - TF Distribution Strategy
Estimator API
import tensorflow as tf

def model_fn(features, labels, mode):
    layer = tf.layers.Dense(1)
    logits = layer(features)
    loss = tf.losses.mean_squared_error(labels, logits)
    train_op = tf.train.GradientDescentOptimizer(0.2).minimize(
        loss, global_step=tf.train.get_or_create_global_step())
    return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)

def input_fn():
    features = tf.data.Dataset.from_tensors([[1.]]).repeat(100)
    labels = tf.data.Dataset.from_tensors(1.).repeat(100)
    return tf.data.Dataset.zip((features, labels))

distribution = tf.contrib.distribute.MirroredStrategy()
config = tf.estimator.RunConfig(train_distribute=distribution)
classifier = tf.estimator.Estimator(model_fn=model_fn, config=config)
classifier.train(input_fn=input_fn)
classifier.evaluate(input_fn=input_fn)
© Cloudera, Inc. All rights reserved.
Distributing the Model Training Processes
© Cloudera, Inc. All rights reserved. 26
PROCEDURES OF TRAINING A DEEP LEARNING MODEL
Repeat the following for num_epoch times:
    For each mini_batch (x, y) in dataset:
        Set pred_tensor = model(x)                      // feed forward
        Set diff_tensor = L2_loss(y, pred_tensor)
        // OR: Set diff_tensor = cross_entropy_loss(y, pred_tensor)
        Set grad = gradient of diff_tensor w.r.t. the model parameters
        Update the model using grad
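As a concrete illustration, a minimal sketch of the same loop in Tensorflow eager execution; the toy model and random data are our own, for illustration only:

import tensorflow as tf
tf.enable_eager_execution()  # already the default in TF 2.x

model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1)
dataset = tf.data.Dataset.from_tensor_slices(
    (tf.random_normal([100, 4]), tf.random_normal([100, 1]))).batch(10)

num_epoch = 5
for epoch in range(num_epoch):
    for x, y in dataset:                                  # mini-batches
        with tf.GradientTape() as tape:
            pred = model(x)                               # feed forward
            loss = tf.losses.mean_squared_error(y, pred)  # L2-style loss
        grads = tape.gradient(loss, model.trainable_weights)
        optimizer.apply_gradients(zip(grads, model.trainable_weights))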
© Cloudera, Inc. All rights reserved. 27
FOUR MAJOR ISSUES IN DISTRIBUTED MODEL TRAINING
• Shall we use data parallelism or model parallelism?
• How to efficiently distribute model parameters, which are normally huge in number?
• How to aggregate the model parameters from different training nodes into a global model?
• Model updating algorithms
• How to efficiently scale the training workload and provide efficient access to the huge amount of training data?
• The first 3 issues are covered in this section, while the 4th one will be addressed in the next section.
© Cloudera, Inc. All rights reserved. 28
TENSORFLOW AND MODEL PARALLELISM
● The initial idea, DistBelief, was proposed by Google
● First published in its research paper in 2012
● Serves as the basis of the built-in distributed implementation in Tensorflow
● Parameter server (PS)
● A centralized server for sharing neural network parameters
● Model parallelism:
● A method to distribute the training parameters across worker nodes
● Model updating algorithm
● Downpour SGD (sketched below)
Jeffrey Dean, et al. “Large scale distributed deep networks”, Advances in Neural Information Processing Systems (NIPS), 2012.
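Schematically, a Downpour-SGD worker runs a loop like the following; ps.pull(), ps.push() and compute_gradient() are hypothetical names used only to illustrate the idea, and there is no barrier between workers:

// Illustrative pseudocode, not Tensorflow's actual API
While not converged:
    Set params = ps.pull()        // fetch the latest global parameters from the PS
    Set grad = compute_gradient(params, next_minibatch())
    ps.push(grad)                 // the PS applies the update asynchronously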
© Cloudera, Inc. All rights reserved. 29
FROM MODEL TO DATA PARALLELISM
• Strength of model parallelism
• Applicable to models larger than the memory or GPU capacity of ONE worker node
• Weakness
• Unable to take full advantage of our hardware resources
• For models whose parameters can be held in the GPUs of ONE worker node
• Use data parallelism instead
© Cloudera, Inc. All rights reserved. 30
HARDWARE UTILIZATION AS THE NUMBER OF TRAINING NODES INCREASES
https://eng.uber.com/horovod/
© Cloudera, Inc. All rights reserved. 31
WHOLE PICTURE OF DATA PARALLELISM
https://eng.uber.com/horovod/
© Cloudera, Inc. All rights reserved. 32
FROM PS TO MPI ALLREDUCE
● Based on Baidu’s ring-allreduce algorithm (see http://andrew.gibiansky.com/ for details); a toy simulation follows below
● Uses the HPC/MPI framework, originally written in C and now wrapped in Python
● Implemented in Uber Horovod, Baidu, PyTorch, MXNet, etc.
● Found to be faster at small node counts (8-64)
○ https://cwiki.apache.org/confluence/display/MXNET/Extend+MXNet+Distributed+Training+by+MPI+AllReduce
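To make the ring idea concrete, here is a toy single-process numpy simulation of ring-allreduce; the function is our own illustration, while real implementations such as Horovod run the two phases between processes over MPI/NCCL. Each peer sends only about 2(N-1)/N of the tensor size in total, independent of the number of peers:

import numpy as np

def ring_allreduce(tensors):
    """Toy simulation: elementwise-sums one equal-shape array per peer."""
    n = len(tensors)
    chunks = [list(np.array_split(t.astype(float), n)) for t in tensors]
    # Phase 1, reduce-scatter: each step, every peer passes one chunk to its
    # right-hand neighbour, which adds it to its own copy of that chunk.
    for step in range(n - 1):
        sends = [chunks[i][(i - step) % n].copy() for i in range(n)]
        for i in range(n):
            chunks[(i + 1) % n][(i - step) % n] += sends[i]
    # Now peer i holds the fully reduced chunk (i + 1) % n.
    # Phase 2, allgather: circulate the reduced chunks around the ring.
    for step in range(n - 1):
        sends = [chunks[i][(i + 1 - step) % n].copy() for i in range(n)]
        for i in range(n):
            chunks[(i + 1) % n][(i + 1 - step) % n] = sends[i]
    return [np.concatenate(c) for c in chunks]

# Three "peers", each holding its own gradient vector:
grads = [np.arange(6.0), np.ones(6), np.full(6, 2.0)]
print(ring_allreduce(grads)[0])  # every peer ends with the elementwise sum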
© Cloudera, Inc. All rights reserved. 33
PERFORMANCE GAINS OF INFINIBAND/RDMA
https://eng.uber.com/horovod/
© Cloudera, Inc. All rights reserved. 34
MODEL UPDATING ALGORITHMS
(diagram: synchronized vs. asynchronous parameter updating)
From: Strategies and Principles of Distributed Machine Learning on
Big Data, https://doi.org/10.1016/J.ENG.2016.02.008
© Cloudera, Inc. All rights reserved. 35
UPDATING ALGORITHM: SYNCHRONIZED VS. ASYNCHRONOUS
• Synchronized algorithms lead to a more precise and consistent model; however, some workers may have to wait a long time at the synchronization barrier, which lengthens training time.
• When the minibatch is large, this inefficiency can be largely mitigated.
• From: Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour, https://arxiv.org/abs/1706.02677
• Asynchronous algorithms are said to be stochastic in their descent directions, which can make the model imprecise.
• In practice, however, they are often found to converge to a model very close to that of their synchronized counterpart.
© Cloudera, Inc. All rights reserved. 36
MODEL ERROR VS. BATCH SIZE
Priya Goyal, et al. Accurate, Large Minibatch SGD: Training ImageNet in 1
Hour. https://arxiv.org/abs/1706.02677
© Cloudera, Inc. All rights reserved. 37
FAMOUS SYNCHRONIZED AND ASYNCHRONOUS EXAMPLES
• Synchronized updating algorithms
• Microsoft CNTK: model averaging after a certain number of iterations.
• Uber Horovod: Using large minibatches.
• Asynchronous updating algorithms
• Google Tensorflow: Downpour SGD.
© Cloudera, Inc. All rights reserved. 38
ALGORITHM FRAMEWORK FOR SYNCHRONIZED SGD
For each iteration t = 1, 2, ...:
    Each worker k computes a local gradient g_k = ∇L(w_t; minibatch_k)
    Aggregate across the N workers: g = (1/N) · Σ_k g_k (via the PS or allreduce)
    Every worker applies the same update: w_{t+1} = w_t − η · g
© Cloudera, Inc. All rights reserved.
Integrating the Distributed Model Training into CDSW
© Cloudera, Inc. All rights reserved. 40
OVERVIEW OF THE ARCHITECTURE
© Cloudera, Inc. All rights reserved. 41
USING CDSW API TO SPAWN TRAIN-WORKERS
Use cdsw.launch_workers() to spawn worker sub-containers; each worker then connects back to the master using the IP address found in the CDSW_MASTER_IP environment variable. The trainer-master can then distribute the collected IP addresses, enabling all the workers to communicate with one another.
master.py

import cdsw, socket

# Spawn two train-workers, each running worker.py
workers = cdsw.launch_workers(n=2, cpu=0.2, memory=0.5,
                              script="worker.py")

# Learn the workers' IP addresses by accepting their connections
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind(("0.0.0.0", 6000))
s.listen(1)
conns = dict()
for i in range(2):
    conn, addr = s.accept()
    print("IP address of %d: %s" % (i, addr[0]))
    conns[i] = (conn, addr[0])

# Send the collected peer list back so the workers can reach each other
peers = ",".join(ip for (_, ip) in conns.values())
for conn, _ in conns.values():
    conn.send(peers.encode())
    conn.close()

worker.py

import os, socket

# Connect back to the master at the IP CDSW puts in the environment
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((os.environ["CDSW_MASTER_IP"], 6000))
data = s.recv(1024).decode()
print("Response from the master:", data)
s.close()
© Cloudera, Inc. All rights reserved. 42
CREATING CDSW DOCKER IMAGES
• CDSW Docker images for distributed model training can be created by extending the following base image (a sketch Dockerfile is shown below):
• docker.repository.cloudera.com/cdsw/engine:7
• Run the base image and install OpenMPI 4.0.0 from source inside the Docker instance.
• Do not install the OS-provided OpenMPI package, as its version is below Horovod’s requirement.
• Install the core packages:
• pip install petastorm tensorflow torch horovod
• If you wish to use GPUs for model training, make sure to install the NVIDIA driver and use the GPU versions of Tensorflow and/or PyTorch.
• See: https://www.cloudera.com/documentation/data-science-workbench/latest/topics/cdsw_gpu.html
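For illustration, a minimal Dockerfile along these lines might look as follows; the OpenMPI download URL and the build flags are assumptions to be checked against the Horovod and CDSW documentation:

FROM docker.repository.cloudera.com/cdsw/engine:7

# Build OpenMPI 4.0.0 from source; the OS-provided package is too old for Horovod
RUN wget https://download.open-mpi.org/release/open-mpi/v4.0/openmpi-4.0.0.tar.gz && \
    tar xzf openmpi-4.0.0.tar.gz && \
    cd openmpi-4.0.0 && \
    ./configure --prefix=/usr/local && \
    make -j"$(nproc)" all && \
    make install && \
    ldconfig

# Core Python packages for distributed training
RUN pip install petastorm tensorflow torch horovod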
© Cloudera, Inc. All rights reserved. 43
USING PRE-BUILT IMAGES
• You can also use our pre-built Docker image from our public Docker repo:
• docker pull rarana73/cdsw-7-horovod-gpu:1
• Content:
• CDSW Base Image v7
• CUDA_VERSION 9.0.176
• NCCL_VERSION 2.4.2
• CUDNN_VERSION 7.4.2.24
• Tensorflow 1.12.0
• Open MPI 4.0.0
© Cloudera, Inc. All rights reserved. 44
INITIALIZING OPEN-MPI PEERS
• Normally, OpenMPI peers are initialized by directly spawning Python/OpenMPI processes via the mpirun command.
• Similarly, for CDSW-Horovod processes, this can be done by invoking the mpirun command from Python (a sketch follows below).
• When doing so, however, make sure the train-worker containers are still running.
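A hypothetical sketch of doing this from the master session; worker_ips stands for the addresses gathered in the socket handshake shown earlier, and the mpirun flags follow the pattern documented by Horovod:

import subprocess

worker_ips = ["10.0.0.11", "10.0.0.12"]  # placeholders: use the IPs from the handshake
hosts = ",".join("%s:1" % ip for ip in worker_ips)  # one slot (GPU) per worker

subprocess.check_call([
    "mpirun", "-np", str(len(worker_ips)),
    "-H", hosts,
    "-bind-to", "none", "-map-by", "slot",
    "-x", "LD_LIBRARY_PATH", "-x", "PATH",
    "python", "train.py",
])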
© Cloudera, Inc. All rights reserved. 45
Horovod in Action
• Applying Horovod to a WideResNet model, trained on the Fashion MNIST dataset
• 2 NVIDIA Quadro P600 GPUs
• CUDA cores: 384 / 2 GB GDDR5
horovodrun -np 1 python fashion_mnist/fashion_mnist_solution.py --log-dir log/np-1
horovodrun -np 2 python fashion_mnist/fashion_mnist_solution.py --log-dir log/np-2
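The solution script follows the standard Horovod Keras pattern; a condensed sketch, with a small dense model standing in for WideResNet for brevity:

import tensorflow as tf
import horovod.tensorflow.keras as hvd

hvd.init()  # one process per GPU, started by horovodrun

# Pin each process to its own GPU
config = tf.ConfigProto()
config.gpu_options.visible_device_list = str(hvd.local_rank())
tf.keras.backend.set_session(tf.Session(config=config))

(x, y), _ = tf.keras.datasets.fashion_mnist.load_data()
x = x.reshape(-1, 784).astype("float32") / 255.0

model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# Scale the learning rate by the number of workers, then wrap the optimizer
# so that gradients are averaged across workers via ring-allreduce
opt = hvd.DistributedOptimizer(tf.keras.optimizers.SGD(lr=0.01 * hvd.size()))
model.compile(loss="sparse_categorical_crossentropy", optimizer=opt,
              metrics=["accuracy"])

# Start all workers from identical weights
callbacks = [hvd.callbacks.BroadcastGlobalVariablesCallback(0)]
model.fit(x, y, batch_size=64, epochs=5, callbacks=callbacks,
          verbose=1 if hvd.rank() == 0 else 0)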
© Cloudera, Inc. All rights reserved.
Discussions and Future
© Cloudera, Inc. All rights reserved. 47
Around the corner
• Spark on K8s & GPU support
• Horovod in Spark
• TensorFlow 2.0 & Distribution
Strategy
• Apache Submarine -
https://hadoop.apache.org/submarine/
• …
© Cloudera, Inc. All rights reserved.
THANK YOU