Differences of Deep Learning Frameworks

This is the slide for the second session of the "Deep Learning Implementations and Frameworks" tutorial at PAKDD 2016.

1. Tutorial: Deep Learning Implementations and Frameworks
   Seiya Tokui*, Kenta Oono*, Atsunori Kanemura+, Toshihiro Kamishima+
   *Preferred Networks, Inc. (PFN) {tokui,oono}@preferred.jp
   +National Institute of Advanced Industrial Science and Technology (AIST) atsu-kan@aist.go.jp, mail@kamishima.net
   2nd session
2. Overview of this tutorial
   • 1st session (KO, 8:30 – 10:00)
     • Introduction
     • Basics of neural networks
     • Common design of neural network implementations
   • 2nd session (ST, 10:30 – 12:30)
     • Differences of deep learning frameworks
     • Coding examples of frameworks
     • Conclusion
3. Differences of Deep Learning Frameworks
   Seiya Tokui
   Preferred Networks, Inc.
4. Objective of this part
   • List the design choices of NN frameworks
   • Introduce the objective differences between existing frameworks on these choices
     • Two or more choices for each topic
     • Pros/cons of each choice
5. Outline
   • Recall the steps of training NNs
   • Quick comparison of existing frameworks
   • Details of design choices
6. Outline
   • Recall the steps of training NNs
   • Quick comparison of existing frameworks
   • Details of design choices
7. Steps for Training Neural Networks
   Prepare the training dataset
   Initialize the NN parameters
   Repeat until meeting some criterion:
     Prepare for the next (mini) batch
     Define how to compute the loss of this batch
     Compute the loss (forward prop)
     Compute the gradient (backprop)
     Update the NN parameters
   Save the NN parameters
8. Training of Neural Networks
   (The same flowchart as slide 7, annotated with the steps that are automated by the framework.)
9. Training of Neural Networks
   (The same flowchart as slide 7, annotated with the steps that are automated by the framework.)
10. Framework Design Choices
    • The most crucial part of NN frameworks is
      • How to define the parameters
      • How to define the loss function of the parameters (= how to write computational graphs)
    • These also influence the APIs for forward prop, backprop, and parameter updates (i.e., numerical optimization)
    • And all of these are determined by how computational graphs are implemented
    • Other parts are also important, but they are mostly common to implementations of other types of machine learning methods
11. Outline
    • Recall the steps of training NNs
    • Quick comparison of existing frameworks
    • Details of design choices
12. List of Frameworks (not exhaustive)
    • Torch.nn
    • Theano and frameworks built on top of it (Keras, Blocks, Lasagne, etc.)
      • We omit individual introductions of these frameworks, since 1) there are too many of them and 2) most share characteristics derived from Theano
    • Caffe
    • autograd (NumPy, Torch)
    • Chainer
    • MXNet
    • TensorFlow
13. Torch.nn
    • MATLAB-like environment built on LuaJIT
    • Fast scripting, CPU/GPU support with a unified array backend
14. Theano (and frameworks on top of it)
    • Supports computational optimization and compilation
    • Python package to build computational graphs
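For concreteness, the sketch below shows the Theano style of building a computational graph once and compiling it. It is not taken from the slides; the softmax classifier, layer sizes, and learning rate are illustrative. Note how the parameters are separate shared nodes, how T.grad extends the graph with backprop, and how the update formula becomes part of the compiled function (all points revisited in the design-choice slides below).

```python
import numpy as np
import theano
import theano.tensor as T

x = T.matrix('x')                       # symbolic input batch
t = T.ivector('t')                      # symbolic integer labels
W = theano.shared(np.zeros((784, 10), dtype=theano.config.floatX), name='W')
b = theano.shared(np.zeros(10, dtype=theano.config.floatX), name='b')

y = T.nnet.softmax(T.dot(x, W) + b)     # the graph is described symbolically
loss = T.nnet.categorical_crossentropy(y, t).mean()

gW, gb = T.grad(loss, [W, b])           # backprop is added as an extended graph
train = theano.function(                # compiled once, then run for every mini-batch
    [x, t], loss,
    updates=[(W, W - 0.01 * gW), (b, b - 0.01 * gb)])
```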
15. Caffe
    • Fast implementation of NNs in C++
    • Mainly focusing on computer vision applications
16. autograd (NumPy, Torch)
    • The original adds automatic differentiation on top of the NumPy API
    • It has also been ported to Torch
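A minimal sketch of the autograd idea (the HIPS autograd package), assuming a toy squared-error loss; names and shapes are illustrative:

```python
import autograd.numpy as np     # drop-in NumPy wrapper that records operations
from autograd import grad

def loss(W, x, t):
    y = np.dot(x, W)
    return np.mean((y - t) ** 2)

grad_loss = grad(loss)          # derivative w.r.t. the first argument (W)
W = np.zeros((3, 1))
x, t = np.random.randn(5, 3), np.random.randn(5, 1)
W = W - 0.1 * grad_loss(W, x, t)   # one SGD step using the derived gradient
```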
17. Chainer
    • Supports backprop through dynamically constructed graphs
    • Also provides a NumPy-compatible GPU array backend
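A minimal sketch of Chainer's define-by-run style, using the Chainer 1.x-era API current at the time of the tutorial; the two-layer classifier and sizes are illustrative and not from the slides:

```python
import numpy as np
import chainer
from chainer import links as L, functions as F, optimizers

model = chainer.Chain(l1=L.Linear(784, 100), l2=L.Linear(100, 10))
optimizer = optimizers.SGD()
optimizer.setup(model)

x = chainer.Variable(np.random.randn(32, 784).astype(np.float32))
t = chainer.Variable(np.random.randint(0, 10, 32).astype(np.int32))

h = F.relu(model.l1(x))                         # the graph is recorded as this code runs
loss = F.softmax_cross_entropy(model.l2(h), t)
model.zerograds()
loss.backward()                                 # backprop backtracks the recorded graph
optimizer.update()                              # update routine outside of the graph
```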
18. MXNet
    • Mixed paradigm support (symbolic/imperative computations)
    • Also supports distributed computations
19. TensorFlow
    • Fast execution by distributed computations
    • Also supports some control flow on top of the graphs
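A minimal sketch of TensorFlow's graph-then-session style (the 1.x-era API; the model and shapes are illustrative, not from the slides):

```python
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 784])
t = tf.placeholder(tf.int64, [None])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))

logits = tf.matmul(x, W) + b
loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=t))
train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)  # update built into the graph

with tf.Session() as sess:                      # the graph is built once, then executed
    sess.run(tf.global_variables_initializer())
    # sess.run(train_op, feed_dict={x: batch_x, t: batch_t})  # per mini-batch
    # (batch_x / batch_t stand in for your own data arrays)
```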
20. Framework Comparison: Basic information*
    Torch.nn** - GitHub stars: 4,719; started: 2002; open issues/PRs: 97/26; main developers: Facebook, Twitter, Google, etc.; core languages: C/Lua; supported languages: Lua
    Theano*** - GitHub stars: 3,457; started: 2008; open issues/PRs: 525/105; main developers: Université de Montréal; core languages: C/Python; supported languages: Python
    Caffe - GitHub stars: 9,590; started: 2013; open issues/PRs: 407/204; main developers: BVLC (U.C. Berkeley); core languages: C++; supported languages: C++/Python/MATLAB
    autograd (NumPy, Torch) - GitHub stars: NumPy 654, Torch 554; started: 2015; open issues/PRs: NumPy 9/0, Torch 3/1; main developers: NumPy: HIPS (Harvard Univ.), Torch: Twitter; core languages: Python/Lua; supported languages: Python/Lua
    Chainer - GitHub stars: 1,295; started: 2015; open issues/PRs: 95/25; main developers: Preferred Networks; core languages: Python; supported languages: Python
    MXNet - GitHub stars: 3,316; started: 2015; open issues/PRs: 271/18; main developers: DMLC; core languages: C++; supported languages: C++/Python/R/Julia/Go, etc.
    TensorFlow - GitHub stars: 20,981; started: 2015; open issues/PRs: 330/33; main developers: Google; core languages: C++/Python; supported languages: C++/Python
    * Data was taken on Apr. 12, 2016
    ** Includes statistics of Torch7
    *** There are many frameworks on top of Theano, though we omit them due to space constraints
21. List of Important Design Choices
    Programming paradigms
    1. How to write NNs in text format
    2. How to build computational graphs
    3. How to compute backprop
    4. How to represent parameters
    5. How to update parameters
    Performance improvements
    6. How to achieve the computational performance
    7. How to scale the computations
22. Framework Comparison: Design Choices
    Torch.nn - NN definition: script (Lua); graph construction: prebuild; backprop: through graph; parameters: hidden in operators; update formula: outside of graphs; optimization: -; parallel computation: multi-GPU
    Theano-based - NN definition: script* (Python); graph construction: prebuild; backprop: extended graph; parameters: separate nodes; update formula: part of graphs; optimization: advanced; parallel computation: multi-GPU (libgpuarray)
    Caffe - NN definition: data (protobuf); graph construction: prebuild; backprop: through graph; parameters: hidden in operators; update formula: outside of graphs; optimization: -; parallel computation: multi-GPU
    autograd (NumPy, Torch) - NN definition: script (Python, Lua); graph construction: dynamic; backprop: extended graph; parameters: separate nodes; update formula: outside of graphs; optimization: -; parallel computation: multi-GPU (Torch)
    Chainer - NN definition: script (Python); graph construction: dynamic; backprop: through graph; parameters: separate nodes; update formula: outside of graphs; optimization: -; parallel computation: multi-GPU
    MXNet - NN definition: script (many); graph construction: prebuild**; backprop: through graph; parameters: separate nodes; update formula: outside of graphs**; optimization: -; parallel computation: multi-GPU, multi-node
    TensorFlow - NN definition: script (Python); graph construction: prebuild; backprop: extended graph; parameters: separate nodes; update formula: part of graphs; optimization: simple; parallel computation: multi-GPU, multi-node
    * Some Theano-based frameworks use data formats (e.g., YAML)
    ** Dynamic dependency analysis and optimization is supported (no autodiff support)
23. Outline
    • Recall the steps of training NNs
    • Quick comparison of existing frameworks
    • Details of design choices
24. List of Important Design Choices
    Programming paradigms
    1. How to write NNs in text format
    2. How to build computational graphs
    3. How to compute backprop
    4. How to represent parameters
    5. How to update parameters
    Performance improvements
    6. How to achieve the computational performance
    7. How to scale the computations
25. How to write NNs in text format
    Write NNs in declarative configuration files:
      The framework builds the layers of the NN as written in the files (e.g., prototxt, YAML).
      E.g.: Caffe (prototxt), Pylearn2 (YAML)
    Write NNs by procedural scripting:
      The framework provides scripting-language APIs to build NNs.
      E.g.: most other frameworks
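To illustrate the procedural-scripting side, here is a sketch using a Keras-like API (Keras is one of the Theano-based frameworks listed earlier); exact argument names vary between Keras versions, and the layer sizes are illustrative. The same network in Caffe would instead be written as a declarative prototxt file listing the layers.

```python
from keras.models import Sequential
from keras.layers import Dense, Activation

model = Sequential()                    # the NN is built by ordinary Python calls
model.add(Dense(100, input_dim=784))
model.add(Activation('relu'))
model.add(Dense(10))
model.add(Activation('softmax'))
model.compile(optimizer='sgd', loss='categorical_crossentropy')
```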
26. How to write NNs in text format
    Write NNs in declarative configuration files:
      • High portability: the configuration files are easy to parse and to reuse in other frameworks.
      • Low flexibility: most static data formats do not support structured programming, so it is hard to write complex NNs.
    Write NNs by procedural scripting:
      • Low portability: it takes considerable effort to port NNs to other frameworks.
      • High flexibility: users can exploit the abstraction power of the scripting language when building NNs.
27. List of Important Design Choices
    Programming paradigms
    1. How to write NNs in text format
    2. How to build computational graphs
    3. How to compute backprop
    4. How to represent parameters
    5. How to update parameters
    Performance improvements
    6. How to achieve the computational performance
    7. How to scale the computations
28. 2. How to build computational graphs
    (Two variants of the training flowchart from slide 7, side by side: "build once, run several times" places the "define how to compute the loss" step before the training loop, while "build one at every iteration" places it inside the loop, next to the per-batch steps.)
29. 2. How to build computational graphs
    Build once, run several times:
      Computational graphs are built once, before entering the loop.
      E.g.: most frameworks (Torch.nn, Theano, Caffe, TensorFlow, MXNet, etc.)
    Build one at every iteration:
      Computational graphs are rebuilt at every iteration.
      E.g.: autograd, Chainer
30. 2. How to build computational graphs
    Build once, run several times:
      • Easy to optimize the computations: the framework can optimize the computational graphs while constructing them.
      • Low flexibility and usability: users cannot build different graphs for different iterations using language syntax.
    Build one at every iteration:
      • Hard to optimize the computations: optimizing at every iteration is basically too costly.
      • High flexibility and usability: users can build different graphs for different iterations using language syntax.
31. Flexibility and availability of runtime language syntax
    Example: recurrent nets for variable-length sequences (batches 1-4 of different lengths).
    In the "build once" approach, we must build all possible graphs beforehand, or use framework-specific "control flow operators".
    In the "build every time" approach, we can use for loops of the underlying language to build such graphs, with data-dependent termination conditions.
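A sketch of what the "build every time" approach buys here, using HIPS autograd: an ordinary Python for loop over a data-dependent sequence length defines the graph for that batch. The simple RNN cell and all sizes are illustrative.

```python
import autograd.numpy as np
from autograd import grad

def rnn_loss(params, xs, target):
    W_xh, W_hh, W_hy = params
    h = np.zeros(W_hh.shape[0])
    for x in xs:                        # the loop length differs from batch to batch
        h = np.tanh(np.dot(x, W_xh) + np.dot(h, W_hh))
    return np.sum((np.dot(h, W_hy) - target) ** 2)

params = [np.random.randn(4, 8) * 0.1,   # W_xh
          np.random.randn(8, 8) * 0.1,   # W_hh
          np.random.randn(8, 1) * 0.1]   # W_hy
xs = np.random.randn(np.random.randint(3, 10), 4)   # a sequence of arbitrary length
grads = grad(rnn_loss)(params, xs, np.ones(1))       # gradients for all three matrices
```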
32. List of Important Design Choices
    Programming paradigms
    1. How to write NNs in text format
    2. How to build computational graphs
    3. How to compute backprop
    4. How to represent parameters
    5. How to update parameters
    Performance improvements
    6. How to achieve the computational performance
    7. How to scale the computations
33. 3. How to compute backprop
    Backprop through graphs:
      The framework only builds the graph of the forward prop, and does backprop by backtracking that graph.
      E.g.: Torch.nn, Caffe, MXNet, Chainer
    Backprop as extended graphs:
      The framework builds graphs for backprop as well as those for forward prop.
      E.g.: Theano, TensorFlow
    (Figure: the forward graph for z = a * b - c, shown once on its own and once with additional nodes appended that compute the gradients da, db, dc starting from dz = 1.)
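To make "backprop by backtracking the graph" concrete, here is a toy, framework-agnostic sketch (not any real library): forward prop records each operation together with its local gradients, and backward() simply walks that record in reverse. It reproduces the z = a * b - c example from the figure.

```python
class Var:
    def __init__(self, value, parents=()):
        # parents is a list of (parent_node, local_gradient) pairs
        self.value, self.grad, self.parents = value, 0.0, parents

    def backward(self):
        self.grad = 1.0
        for node in reversed(topo_order(self)):       # backtrack the recorded graph
            for parent, local_grad in node.parents:
                parent.grad += node.grad * local_grad

def topo_order(root):
    order, seen = [], set()
    def visit(n):
        if id(n) not in seen:
            seen.add(id(n))
            for p, _ in n.parents:
                visit(p)
            order.append(n)
    visit(root)
    return order

def mul(a, b):
    return Var(a.value * b.value, parents=[(a, b.value), (b, a.value)])

def sub(a, b):
    return Var(a.value - b.value, parents=[(a, 1.0), (b, -1.0)])

# z = a * b - c, as in the figure above
a, b, c = Var(2.0), Var(3.0), Var(1.0)
z = sub(mul(a, b), c)
z.backward()
print(a.grad, b.grad, c.grad)   # 3.0 2.0 -1.0
```

The "extended graph" alternative would instead emit new graph nodes for da, db, and dc, as Theano's T.grad does in the earlier sketch.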
34. 3. How to compute backprop
    Backprop through graphs:
      • Easy and simple to implement: the backprop computation need not be defined as graphs.
      • Low flexibility: features available for graphs may not apply to the backprop computation (e.g., applying additional backprop through it, computational optimizations, etc.).
    Backprop as extended graphs:
      • Implementation gets complicated.
      • High flexibility: any feature available for graphs can also be applied to the backprop computation.
35. List of Important Design Choices
    Programming paradigms
    1. How to write NNs in text format
    2. How to build computational graphs
    3. How to compute backprop
    4. How to represent parameters
    5. How to update parameters
    Performance improvements
    6. How to achieve the computational performance
    7. How to scale the computations
36. 4. How to represent parameters
    Parameters as part of operator nodes:
      Parameters are owned by operator nodes (e.g., convolution layers) and do not directly appear in the graphs.
      E.g.: Torch.nn, Caffe, MXNet
    Parameters as separate nodes in the graphs:
      Parameters are represented as separate variable nodes.
      E.g.: Theano, Chainer, TensorFlow
    (Figure: x -> Affine (owning W and b) -> y, versus x, W, b -> Affine -> y.)
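A toy, framework-agnostic sketch of the two representations above; the Affine layer and the sizes are illustrative.

```python
import numpy as np

# (a) Parameters owned by the operator node (Torch.nn / Caffe style):
class Affine:
    def __init__(self, n_in, n_out):
        self.W = np.random.randn(n_in, n_out) * 0.01   # hidden inside the layer
        self.b = np.zeros(n_out)
    def __call__(self, x):
        return x.dot(self.W) + self.b

# (b) Parameters as separate variable nodes (Theano / Chainer / TensorFlow style):
def affine(x, W, b):                    # W and b enter the graph like any other input
    return x.dot(W) + b

x = np.random.randn(32, 784)
y_a = Affine(784, 10)(x)
W, b = np.random.randn(784, 10) * 0.01, np.zeros(10)
y_b = affine(x, W, b)
```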
37. 4. How to represent parameters
    Parameters as part of operator nodes:
      • Intuitiveness: this representation resembles the classical formulation of NNs.
      • Low flexibility and reusability: we cannot do the same things with the parameters that we can do with variable nodes.
    Parameters as separate nodes in the graphs:
      • High flexibility and reusability: we can apply any operation available for variable nodes to the parameters.
38. 5. How to update parameters
    Update parameters by own routines outside of the graphs:
      Update formulae are implemented directly using the backend array libraries.
      E.g.: Torch.nn, Caffe, MXNet, Chainer
    Represent update formulae as part of the graphs:
      Update formulae are built as part of the computational graphs.
      E.g.: Theano, TensorFlow
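A minimal sketch of the "own routines outside of the graphs" style: after backprop has filled in the gradients, an ordinary function over the backend arrays applies the update formula. The Param container and plain SGD are illustrative; in the "part of the graphs" style, the same formula is instead expressed as update/assign operations inside the graph, as in the Theano updates=[...] sketch earlier.

```python
import numpy as np

class Param:
    def __init__(self, value):
        self.value = value
        self.grad = np.zeros_like(value)

def sgd_update(params, lr=0.01):
    for p in params:
        p.value -= lr * p.grad          # any array operation can be used here

params = [Param(np.random.randn(784, 100)), Param(np.random.randn(100))]
# ... forward prop and backprop would fill p.grad for each parameter ...
sgd_update(params)
```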
39. 5. How to update parameters
    Update parameters by own routines outside of the graphs:
      • Easy to implement: we can use any feature of the array backend when writing update formulae.
      • Low integrity: update formulae are not integrated into the computational graphs.
    Represent update formulae as part of the graphs:
      • Implementation gets complicated: the framework must support assign or update operations within the computational graphs.
      • High integrity: we can apply, e.g., optimizations to the update formulae.
40. List of Important Design Choices
    Programming paradigms
    1. How to write NNs in text format
    2. How to build computational graphs
    3. How to compute backprop
    4. How to represent parameters
    5. How to update parameters
    Performance improvements
    6. How to achieve the computational performance
    7. How to scale the computations
41. 6. How to achieve the computational performance
    Transform the graphs to optimize the computations:
      There are many ways to optimize the computations. Theano supports various optimizations; TensorFlow does simple ones.
    Provide easy ways to write custom operator nodes:
      Users can write their own operator nodes optimized for their purposes.
      Torch, MXNet, and Chainer provide ways to write code that runs on both CPU and GPU.
      Chainer also provides ways to write custom CUDA kernels without manual compilation steps.
42. 7. How to scale the computations
    Multi-GPU parallelization:
      Nowadays, most popular frameworks have started supporting multi-GPU computation.
      Multi-GPU (one machine) is enough for most use cases today.
    Distributed computation (i.e., multi-node parallelization):
      Some frameworks also support distributed computation to scale learning further.
      MXNet uses a simple distributed key-value store.
      TensorFlow uses gRPC. It will also support easy-to-use cloud environments.
      CNTK uses simple MPI.
43. Ease and comfort of writing NNs
    • I have mainly explained the capabilities of each framework
    • But this does not cover many other aspects of framework comparison
    • The choice of framework actually depends on the ease and comfort of writing NNs in it
      • Many people choose Torch for research because Lua is simple and fast, so in most cases they do not have to worry about performance
    • Trial and error is important here again (as it is in deep learning research itself)
    • The choice of framework finally comes down to your preference
      • The capabilities are still important to satisfy your demands
44. Summary
    • The important differences between frameworks lie in how computational graphs are defined and how they are used
    • There are several design choices in framework development
    • Each of them influences the framework's performance and flexibility (i.e., the range of easily representable NNs and their learning procedures)
    • Once your demands are satisfied, choose the framework you feel comfortable with (it strongly depends on your own preferences!)
45. Conclusion
    • We introduced the basics of NNs, typical designs of their implementations, and the pros/cons of various design choices
    • Deep learning is an emerging field developing at increasing speed, so quick trial and error is crucial for research and development in this field
      • In that sense, it is important to use frameworks as sources of highly reusable NN components
    • There is a growing number of frameworks, each with different characteristics, so it is also important to choose one appropriate for your purpose
