SparkNet: Training Deep
Networks in Spark
Authors: Philipp Moritz, Robert Nishihara, Ion Stoica, Michael I. Jordan
(EECS, University of California, Berkeley)
Paper Presentation By: Sneh Pahilwani (T-13)
(CISE, University of Florida, Gainesville)
Motivation
● Much research has gone into making deep learning models more accurate.
● Deeper models bring better accuracy, but at a far greater training cost, which motivates distributing training across a cluster.
● Existing batch-processing frameworks are not designed for the asynchronous, communication-intensive workloads typical of distributed deep-network training.
● Claim: SparkNet implements a scalable, distributed algorithm for training deep
networks that can be applied to existing batch-processing frameworks like
MapReduce and Spark, with minimal hardware requirements.
SparkNet Features
● Provides a convenient interface for reading data from Spark RDDs.
● Provides a Scala interface for calling Caffe (see the API sketch after this list).
● Includes a lightweight multi-dimensional tensor library.
● Scales well with cluster size and tolerates high communication latency.
● Easy to deploy, with no hyperparameter adjustment required.
● Compatible with existing Caffe models.
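As a concrete illustration, below is a minimal sketch of the core Scala API in the spirit of the paper's listing; the method names follow the paper, but the exact signatures here are approximations, not SparkNet's verbatim source.

```scala
// Sketch of SparkNet's core Net API, following the paper's listing.
// Signatures are approximate; NDArray and WeightCollection are stubbed.
class NDArray          // lightweight multi-dimensional tensor (see a later slide)
class WeightCollection // map from layer names to weights (see a later slide)

trait Net {
  def setTrainingData(data: Iterator[(NDArray, Int)]): Unit   // (input, label) pairs
  def setValidationData(data: Iterator[(NDArray, Int)]): Unit
  def train(numSteps: Int): Unit        // run numSteps of SGD inside Caffe
  def test(numSteps: Int): Float        // returns validation accuracy
  def setWeights(weights: WeightCollection): Unit
  def getWeights(): WeightCollection
}
```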
Parameter Server Model
● One or more master nodes hold the latest model parameters in memory and serve
them to worker nodes upon request.
● The worker nodes then compute gradients with respect to these parameters on a minibatch drawn from their local copy of the dataset.
● These gradients are sent back to the server, which updates the model parameters (a schematic sketch follows).
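For contrast with SparkNet's averaging scheme, here is a schematic, single-threaded sketch of one parameter-server round; the names are hypothetical, and real parameter servers are sharded and asynchronous.

```scala
// Hypothetical sketch of a synchronous parameter-server round, for contrast
// only: workers pull the current parameters, compute minibatch gradients,
// and the server applies each update. SparkNet itself avoids this pattern.
object ParameterServerSketch {
  val lr = 0.01f
  var params: Array[Float] = Array.fill(1000)(0.0f)   // latest model, held on the server

  def round(workers: Seq[Array[Float] => Array[Float]]): Unit =
    for (computeGradient <- workers) {                // real systems: asynchronous requests
      val grad = computeGradient(params.clone())      // worker pulls params, returns gradient
      for (i <- params.indices) params(i) -= lr * grad(i)  // server-side SGD update
    }
}
```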
[Figure: SparkNet architecture]
Implementations
● SparkNet is built on top of Apache Spark and the Caffe deep learning library.
● Uses Java Native Access to call Caffe code from the JVM.
● Exposes Caffe's model parameters through a Scala interface.
● Uses ScalaBuff to parse Caffe's protocol-buffer network definitions, so the network structure can be modified dynamically at run time.
● Compatible with existing Caffe model definition files, and supports loading saved Caffe model parameters.
Implementations
[Code screenshots: a map from layer names to weights, and the lightweight multi-dimensional tensor library implementation; a sketch follows.]
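The code in the slide's screenshot is not recoverable; the following is a minimal sketch of what a map-from-layer-names-to-weights type and a lightweight tensor might look like. The class names WeightCollection and NDArray appear in the paper, but the fields and methods here are assumptions.

```scala
import scala.collection.mutable

// Minimal sketch of a lightweight multi-dimensional tensor: a flat buffer
// plus a shape. Fields and methods are illustrative, not SparkNet's source.
class NDArray(val shape: List[Int], val data: Array[Float]) {
  require(data.length == shape.product, "buffer size must match shape")
  def add(other: NDArray): Unit =                 // elementwise in-place add
    for (i <- data.indices) data(i) += other.data(i)
  def scalarDivide(v: Float): Unit =              // used when averaging weights
    for (i <- data.indices) data(i) /= v
}

// A WeightCollection maps each layer name to that layer's weight tensors,
// which is all that per-round parameter averaging needs.
class WeightCollection(val allWeights: mutable.Map[String, List[NDArray]])
```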
Implementations
[Code screenshot: the network architecture definition.]
Implementations
[Code screenshot: getWeights() returns a WeightCollection.]
Stochastic Gradient Descent Comparison
● Conventional:
○ A conventional approach to parallelizing gradient computation incurs heavy broadcasting and communication overhead between the workers and the parameter server after every Stochastic Gradient Descent (SGD) iteration.
● SparkNet:
○ Each worker in the Spark cluster instead runs SGD independently for a fixed budget (a number of iterations or a time limit), after which the parameters are sent to the master and averaged.
SGD Parallelism
● A Spark cluster consists of a single master node and a number of worker nodes.
● The data is split among the Spark workers.
● In every round, the Spark master broadcasts the model parameters to each worker.
● Each worker then runs SGD on the model with its subset of the data for a fixed number of iterations (say, 50), after which the resulting model parameters on each worker are sent to the master and averaged to form the new model parameters (a runnable sketch of this loop follows).
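A minimal, runnable sketch of this round-based loop is below, adapted from the paper's pseudocode. To keep it self-contained, the Caffe-backed net is replaced by a stub whose weights are flat Float arrays; in SparkNet proper, train runs inside Caffe and the weights are a WeightCollection.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch of SparkNet's training loop: broadcast weights, run T local SGD
// iterations per partition, then average the resulting parameters.
object SparkNetLoopSketch {
  class Net(val w: Array[Float]) {
    def train(data: Iterator[(Array[Float], Int)], steps: Int): Unit = {
      // stand-in for T local SGD steps executed inside Caffe
    }
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("sparknet-sketch").setMaster("local[4]"))
    // toy dataset of (features, label) pairs, split across 4 partitions
    val trainData = sc.parallelize(Seq.fill(1000)((Array.fill(10)(0f), 0)), 4).cache()
    var weights = Array.fill(10)(0f)
    val T = 50                                       // local iterations per round
    for (round <- 1 to 20) {
      val bw = sc.broadcast(weights)                 // master -> workers
      val workerWeights = trainData.mapPartitions { data =>
        val net = new Net(bw.value.clone())          // each worker starts from current params
        net.train(data, T)
        Iterator(net.w)
      }.collect()
      weights = workerWeights.transpose.map(col => col.sum / col.length) // master averages
    }
    sc.stop()
  }
}
```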
Naive Parallelization
[Figure: N_a(b), the number of serial SGD iterations with batch size b needed to reach accuracy a.]
Practical Limitations of Naive Parallelization
● Naive parallelization would distribute each minibatch of size b over K machines, which compute gradients separately and aggregate the results on one node.
● The cost of the computation on a single node in a single iteration is C(b/K), which satisfies C(b/K) >= C(b)/K. Hence the total running time to achieve test accuracy a is, in theory, N_a(b) C(b)/K (see the display below).
● Limitation #1: for the approximation C(b)/K ~ C(b/K) to hold, we need K << b, which limits the number of machines usable for effective parallelization.
● Limitation #2: the minibatch size b could be increased to get around this, but a larger b does not decrease N_a(b) enough to justify the increase.
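In display form, the slide's cost model for the naive scheme is:

```latex
% N_a(b): serial SGD iterations with batch size b needed to reach accuracy a
% C(b):   time for one gradient computation on a batch of size b
% The minibatch is split over K machines, so the per-iteration cost is C(b/K):
C\!\left(\tfrac{b}{K}\right) \;\ge\; \frac{C(b)}{K},
\qquad
T_{\mathrm{naive}} \;\approx\; \frac{N_a(b)\,C(b)}{K}
\quad \text{(in theory, ignoring communication).}
```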
SparkNet Parallelization
[Figure: T, the number of iterations run on each machine per round, and M_a(b, K, T), the number of rounds needed to reach accuracy a.]
SparkNet parallelization
● Proceeds in rounds: in each round, each machine runs SGD for T iterations with batch size b.
● Between rounds, the parameters on the workers are gathered on the master, averaged, and broadcast back to the workers.
● In effect, synchronization happens every T iterations.
● Total number of parallel iterations: T M_a(b, K, T).
● Total time taken: (T C(b) + S) M_a(b, K, T), where S is the communication cost per round (see the display below).
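The corresponding totals for SparkNet's scheme, in display form:

```latex
% T:           local SGD iterations per round on each of the K workers
% M_a(b,K,T):  number of rounds needed to reach accuracy a
% S:           communication/synchronization cost per round
\text{parallel iterations} \;=\; T\,M_a(b,K,T),
\qquad
T_{\mathrm{SparkNet}} \;=\; \bigl(T\,C(b) + S\bigr)\,M_a(b,K,T).
```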
Results
● A speedup matrix is computed over a grid of T and K values.
● For each parameter pair, SparkNet is run with a modified AlexNet implementation on a subset of ImageNet (100 classes, roughly 1,000 data points each) for 20,000 parallel iterations.
● The reported ratio is T M_a(b, K, T) / N_a(b) with S = 0, i.e., the number of parallel iterations relative to training on a single machine.
[Figure annotations: serial SGD; single worker, no speedup; zero communication overhead (S = 0).]
Results
[Figure: speedup measured as a function of the communication overhead S on a 5-node cluster, i.e., with non-zero communication overhead.]
Training Benchmarks
● AlexNet trained on ImageNet.
● T = 50.
● Single-GPU nodes.
● Target accuracy: 45%.
● Time to reach it:
○ Caffe: 55.6 hours (baseline)
○ SparkNet, 3 nodes: 22.9 hours (2.4× speedup)
○ SparkNet, 5 nodes: 14.5 hours (3.8× speedup)
○ SparkNet, 10 nodes: 12.8 hours (4.4× speedup)
Training Benchmarks
● GoogLeNet trained on ImageNet.
● T = 50.
● Multi-GPU nodes.
● Target accuracy: 40%.
● Speedup relative to 4-GPU Caffe:
○ SparkNet, 3 nodes × 4 GPUs: 2.7×
○ SparkNet, 6 nodes × 4 GPUs: 3.2×
● This comes on top of the 4-GPU Caffe baseline's own 3.5× speedup over a single GPU.
THANK YOU!