RTSS Jun Young Park
Introduction to PyTorch
Objective
 Understanding AutoGrad
 Review
 Logistic Classifier
 Loss Function
 Backpropagation
 Chain Rule
 Example : Find gradient from a matrix
 AutoGrad
 Solve the example with AutoGrad
 Data Parallelism in PyTorch
 Why should we use GPUs?
 Inside CUDA
 How to parallelize our models
 Experiment
Simple but powerful implementation of backpropagation
Understanding AutoGrad
Logistic Classifier (Fully-Connected)
$WX + b = y$
Logits y = (2.0, 1.0, 0.1) -> S(y) -> Probabilities p = (0.7, 0.2, 0.1) for classes A, B, C
X : Input
W, b : To be trained
y : Prediction
S(y) : Softmax function (Can be other activation functions)
$S(y)_i = \dfrac{e^{y_i}}{\sum_i e^{y_i}}$ represents the probabilities of the elements in vector $y$.
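As a quick illustration (a minimal sketch, not from the slides), the logits above can be pushed through softmax in PyTorch; the values 2.0, 1.0, 0.1 give roughly the probabilities 0.7, 0.2, 0.1 shown on the slide:

```python
import torch

# Logits from the slide: y = WX + b = (2.0, 1.0, 0.1) for classes A, B, C
logits = torch.tensor([2.0, 1.0, 0.1])

# Softmax: S(y)_i = exp(y_i) / sum_i exp(y_i)
probs = torch.softmax(logits, dim=0)
print(probs)  # approximately tensor([0.66, 0.24, 0.10]); the slide rounds to 0.7, 0.2, 0.1
```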
For an instance of class A: the probability vector S(y) = (0.7, 0.2, 0.1) is compared with the one-hot encoded label L = (1, 0, 0) over the classes A, B, C.
MAX probability -> predicted label; the distance between the two vectors -> loss.
Find W, b that minimize the loss (error).
Loss Function
 The vector can be very large when there are a lot of classes.
 How can we find the distance between the vectors S (Prediction) and L (Label)?
$D(S, L) = -\sum_i L_i \log(S_i)$
S(y) = (0.7, 0.2, 0.1), L = (1.0, 0.0, 0.0)
※ D(S, L) ≠ D(L, S)
No need to worry about taking log(0), since $S(y)_i = \dfrac{e^{y_i}}{\sum_i e^{y_i}} > 0$.
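A minimal sketch (not from the slides) of the cross-entropy distance D(S, L) for the two vectors above, which also shows why the order of the arguments matters:

```python
import torch

S = torch.tensor([0.7, 0.2, 0.1])   # S(y): predicted probabilities
L = torch.tensor([1.0, 0.0, 0.0])   # L: one-hot encoded label

def D(s, l):
    # D(S, L) = -sum_i L_i * log(S_i)
    return -(l * torch.log(s)).sum()

print(D(S, L))  # tensor(0.3567): only the true class contributes, -log(0.7)
print(D(L, S))  # tensor(inf): log(0) appears, so D(S, L) != D(L, S)
```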
In-depth of Classifier
Let there be the following equations:
1. Affine Sum : $\sigma(x) = Wx + B$
2. Activation Function : $y(\sigma) = \mathrm{ReLU}(\sigma)$
3. Loss Function : $E(y) = \dfrac{1}{2}(y_{target} - y)^2$
4. Gradient Descent : $w \leftarrow w - \alpha \dfrac{\partial E}{\partial w}$, $\quad b \leftarrow b - \alpha \dfrac{\partial E}{\partial b}$
• Gradient Descent requires $\dfrac{\partial E}{\partial w}$ and $\dfrac{\partial E}{\partial b}$.
• How can we find them? -> Use the chain rule!
$y_{target}$ : Training data
$y$ : Prediction result
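A minimal sketch of these four equations for a single scalar neuron, letting AutoGrad supply the two partial derivatives; the values of x, y_target, w, b and the learning rate are illustrative assumptions, not from the slides:

```python
import torch

# Assumed toy values (not from the slides)
x, y_target, alpha = torch.tensor(2.0), torch.tensor(3.0), 0.1
w = torch.tensor(0.5, requires_grad=True)
b = torch.tensor(0.0, requires_grad=True)

sigma = w * x + b                      # 1. Affine sum
y = torch.relu(sigma)                  # 2. Activation function (ReLU)
E = 0.5 * (y_target - y) ** 2          # 3. Loss function
E.backward()                           # AutoGrad computes dE/dw and dE/db

with torch.no_grad():                  # 4. Gradient descent step
    w -= alpha * w.grad                # dE/dw = x * (y - y_target) = -4, so w -> 0.9
    b -= alpha * b.grad                # dE/db = (y - y_target) = -2, so b -> 0.2
print(w, b)
```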
Chain Rule
• Let $y(x)$ be defined as below; $x$ influences $g(x)$ and $g(x)$ influences $f(g(x))$:
$y(x) = f(g(x)) = (f \circ g)(x)$
• Find the derivative of $y(x)$:
$y'(x) = f'(g(x))\,g'(x)$
• In Leibniz notation:
$\dfrac{dy}{dx} = \dfrac{dy}{df}\dfrac{df}{dg}\dfrac{dg}{dx} = 1 \cdot f'(g(x)) \cdot g'(x)$
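As a hedged illustration (the functions f and g below are arbitrary choices, not from the slides), AutoGrad applies exactly this rule, and the hand-computed chain-rule value matches the gradient PyTorch returns:

```python
import torch

# Arbitrary example: g(x) = x**2, f(u) = sin(u), so y = f(g(x)) = sin(x**2)
x = torch.tensor(1.5, requires_grad=True)
y = torch.sin(x ** 2)
y.backward()

# Chain rule by hand: y'(x) = f'(g(x)) * g'(x) = cos(x**2) * 2x
manual = torch.cos(x.detach() ** 2) * 2 * x.detach()
print(x.grad, manual)  # the two values agree
```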
Chain Rule
$\dfrac{\partial E}{\partial w} = \dfrac{\partial E}{\partial y}\dfrac{\partial y}{\partial \sigma}\dfrac{\partial \sigma}{\partial w} = \begin{cases} x\,(y - y_{target}) & (\sigma > 0) \\ 0 & (\sigma \le 0) \end{cases}$
$\dfrac{\partial E}{\partial y} = y - y_{target}, \quad \dfrac{\partial y}{\partial \sigma} = \begin{cases} 1 & (\sigma > 0) \\ 0 & (\sigma \le 0) \end{cases}, \quad \dfrac{\partial \sigma}{\partial w} = x$
Let there be the following equations:
1. Affine Sum : $\sigma(x) = Wx + B$
2. Activation Function : $y(\sigma) = \mathrm{ReLU}(\sigma)$
3. Loss Function : $E(y) = \dfrac{1}{2}(y_{target} - y)^2$
4. Gradient Descent : $w \leftarrow w - \alpha \dfrac{\partial E}{\partial w}$, $\quad b \leftarrow b - \alpha \dfrac{\partial E}{\partial b}$
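A minimal AutoGrad check of this result, with assumed values chosen so that σ > 0 (the numbers are illustrative, not from the slides):

```python
import torch

x, y_target = torch.tensor(2.0), torch.tensor(5.0)
w = torch.tensor(1.0, requires_grad=True)
b = torch.tensor(0.5, requires_grad=True)

sigma = w * x + b              # sigma = 2.5 > 0, so the ReLU branch is active
y = torch.relu(sigma)
E = 0.5 * (y_target - y) ** 2
E.backward()

# Chain rule prediction: dE/dw = x * (y - y_target) = 2.0 * (2.5 - 5.0) = -5.0
print(w.grad)                  # tensor(-5.)
# and dE/db = (y - y_target) * 1 = -2.5
print(b.grad)                  # tensor(-2.5000)
```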
Example : Finding gradient of 𝑋
 Let the input tensor $X$ be initialized as the following square matrix of order 3:
$X = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{pmatrix}$
 And let $Y$, $Z$ be defined as follows:
$Y = X + 3, \quad Z = 6Y^2 = 6(X + 3)^2$
 And the output $\delta$ is the average of tensor $Z$:
$\delta = \mathrm{mean}(Z) = \dfrac{1}{9}\sum_i \sum_j Z_{ij}$
Example : Finding gradient of 𝑋
 We can find the scalar $Z_{ij}$ from its definition, since the operations are elementwise (linearity):
$Z_{ij} = 6(Y_{ij})^2, \quad Y_{ij} = X_{ij} + 3$
 To find the gradient, we use the chain rule so that we can find the partial gradients:
$\dfrac{\partial \delta}{\partial Z_{ij}} = \dfrac{1}{9}, \quad \dfrac{\partial Z_{ij}}{\partial Y_{ij}} = 12 Y_{ij}, \quad \dfrac{\partial Y_{ij}}{\partial X_{ij}} = 1$
$\dfrac{\partial \delta}{\partial X_{ij}} = \dfrac{\partial \delta}{\partial Z_{ij}}\dfrac{\partial Z_{ij}}{\partial Y_{ij}}\dfrac{\partial Y_{ij}}{\partial X_{ij}} = \dfrac{1}{9} \cdot 12 Y_{ij} \cdot 1 = \dfrac{4}{3}(X_{ij} + 3)$
Example : Finding gradient of 𝑋
 Thus, we can get the gradient of the (1,1) element of $X$:
$\dfrac{\partial \delta}{\partial X_{ij}}\Big|_{(i,j)=(1,1)} = \dfrac{4}{3}(X_{11} + 3) = \dfrac{4}{3}(1 + 3) = \dfrac{16}{3}$
 Likewise, we can get the whole gradient matrix of $X$:
$\dfrac{\partial \delta}{\partial X} = \begin{pmatrix} \frac{\partial\delta}{\partial X_{11}} & \frac{\partial\delta}{\partial X_{12}} & \frac{\partial\delta}{\partial X_{13}} \\ \frac{\partial\delta}{\partial X_{21}} & \frac{\partial\delta}{\partial X_{22}} & \frac{\partial\delta}{\partial X_{23}} \\ \frac{\partial\delta}{\partial X_{31}} & \frac{\partial\delta}{\partial X_{32}} & \frac{\partial\delta}{\partial X_{33}} \end{pmatrix} = \begin{pmatrix} \frac{16}{3} & \frac{20}{3} & \frac{24}{3} \\ \frac{28}{3} & \frac{32}{3} & \frac{36}{3} \\ \frac{40}{3} & \frac{44}{3} & \frac{48}{3} \end{pmatrix}$
AutoGrad : Finding gradient of 𝑋
$X = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{pmatrix}, \quad Y = X + 3, \quad Z = 6Y^2 = 6(X + 3)^2, \quad \delta = \mathrm{mean}(Z) = \dfrac{1}{9}\sum_i \sum_j Z_{ij}$
$\dfrac{\partial \delta}{\partial X} = \begin{pmatrix} \frac{\partial\delta}{\partial X_{11}} & \frac{\partial\delta}{\partial X_{12}} & \frac{\partial\delta}{\partial X_{13}} \\ \frac{\partial\delta}{\partial X_{21}} & \frac{\partial\delta}{\partial X_{22}} & \frac{\partial\delta}{\partial X_{23}} \\ \frac{\partial\delta}{\partial X_{31}} & \frac{\partial\delta}{\partial X_{32}} & \frac{\partial\delta}{\partial X_{33}} \end{pmatrix} = \begin{pmatrix} \frac{16}{3} & \frac{20}{3} & \frac{24}{3} \\ \frac{28}{3} & \frac{32}{3} & \frac{36}{3} \\ \frac{40}{3} & \frac{44}{3} & \frac{48}{3} \end{pmatrix}$
Each operation has its gradient function.
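A minimal sketch of solving the same example with AutoGrad (my reconstruction of the code the slide refers to, not the author's exact listing):

```python
import torch

X = torch.tensor([[1., 2., 3.],
                  [4., 5., 6.],
                  [7., 8., 9.]], requires_grad=True)

Y = X + 3
Z = 6 * Y ** 2
delta = Z.mean()        # delta = (1/9) * sum_ij Z_ij

delta.backward()        # AutoGrad applies the chain rule for us
print(X.grad)           # (4/3) * (X + 3): [[16/3, 20/3, 24/3], [28/3, 32/3, 36/3], [40/3, 44/3, 48/3]]
```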
Back Propagation
 Get derivatives using 'Back Propagation'.
Addition node: $z = x + y$, $\quad \dfrac{\partial z}{\partial x} = \dfrac{\partial z}{\partial y} = 1$
From the output signal $L$: $\dfrac{\partial L}{\partial z}\dfrac{\partial z}{\partial x} = \dfrac{\partial L}{\partial z}$, $\quad \dfrac{\partial L}{\partial z}\dfrac{\partial z}{\partial y} = \dfrac{\partial L}{\partial z}$
Multiplication node: $z = xy$, $\quad \dfrac{\partial z}{\partial x} = y$, $\quad \dfrac{\partial z}{\partial y} = x$
From the output signal $L$: $\dfrac{\partial L}{\partial z}\dfrac{\partial z}{\partial x} = \dfrac{\partial L}{\partial z}\cdot y$, $\quad \dfrac{\partial L}{\partial z}\dfrac{\partial z}{\partial y} = \dfrac{\partial L}{\partial z}\cdot x$
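To illustrate "each operation has its gradient function", here is a hedged sketch of the multiplication node written as a custom torch.autograd.Function; the built-in `*` operator already behaves this way, so the class is purely illustrative:

```python
import torch

class Mul(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, y):
        ctx.save_for_backward(x, y)     # remember the inputs for the backward pass
        return x * y                    # z = x * y

    @staticmethod
    def backward(ctx, grad_z):          # grad_z = dL/dz coming from the output side
        x, y = ctx.saved_tensors
        return grad_z * y, grad_z * x   # dL/dx = dL/dz * y, dL/dy = dL/dz * x

x = torch.tensor(3.0, requires_grad=True)
y = torch.tensor(4.0, requires_grad=True)
z = Mul.apply(x, y)
z.backward()                            # dL/dz = 1 at the output
print(x.grad, y.grad)                   # tensor(4.) tensor(3.)
```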
Back Propagation
 How about the exponentiation function?
Power node: $z = x^n$, $\quad \dfrac{\partial z}{\partial x} = n x^{n-1}$, $\quad \dfrac{\partial z}{\partial n} = x^n \ln x$
From the output signal $L$: $\dfrac{\partial L}{\partial z}\dfrac{\partial z}{\partial x} = \dfrac{\partial L}{\partial z}\,(n x^{n-1})$, $\quad \dfrac{\partial L}{\partial z}\dfrac{\partial z}{\partial n} = \dfrac{\partial L}{\partial z}\,(x^n \ln x)$
Derivation of $\dfrac{\partial z}{\partial n}$: $z = x^n \Rightarrow \ln z = n \ln x \Rightarrow \dfrac{1}{z}\,dz = \ln x\,dn \Rightarrow \dfrac{dz}{dn} = z \ln x = x^n \ln x$
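A short AutoGrad check of both partial derivatives of the power node, with assumed values x = 2, n = 3 (not from the slides):

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
n = torch.tensor(3.0, requires_grad=True)

z = x ** n
z.backward()

print(x.grad)   # dz/dx = n * x**(n-1) = 3 * 4 = 12
print(n.grad)   # dz/dn = x**n * ln(x) = 8 * ln 2 ≈ 5.545
print(2.0 ** 3 * torch.log(torch.tensor(2.0)))  # matches n.grad
```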
Appendix : Operation Graph of 𝛿 (Matrix)
Each element of $X$ goes through the same chain of operations: $X_{ij} \xrightarrow{+3} Y_{ij} \xrightarrow{(\cdot)^2} Y_{ij}^2 \xrightarrow{\times 6} Z_{ij}$; all $Z_{ij}$ ($Z_{11}, Z_{12}, \dots, Z_{33}$) are then summed and multiplied by $\frac{1}{9}$ to produce $\delta$.
$Z_{ij} = 6(Y_{ij})^2, \quad \delta = \mathrm{mean}(Z)$
Appendix : Operation Graph of 𝛿 (Scalar)
- Backpropagation
Forward graph (per element): $X_{ij} \xrightarrow{+3} Y_{ij} \xrightarrow{(\cdot)^2} \alpha_{ij} \xrightarrow{\times 6} Z_{ij} \xrightarrow{+} \beta_{ij}\,(= Z_{sum}) \xrightarrow{\times \frac{1}{9}} \delta$
Backward, starting from $\dfrac{\partial \delta}{\partial \delta} = 1$ and multiplying by the local gradient of each node:
$\dfrac{\partial \delta}{\partial \beta_{ij}} = \dfrac{1}{9}$
$\dfrac{\partial \delta}{\partial Z_{ij}} = \dfrac{\partial \delta}{\partial \beta_{ij}}\dfrac{\partial \beta_{ij}}{\partial Z_{ij}} = \dfrac{1}{9} \cdot 1$
$\dfrac{\partial \delta}{\partial \alpha_{ij}} = \dfrac{\partial \delta}{\partial \beta_{ij}}\dfrac{\partial \beta_{ij}}{\partial Z_{ij}}\dfrac{\partial Z_{ij}}{\partial \alpha_{ij}} = \dfrac{1}{9} \cdot 1 \cdot 6$
$\dfrac{\partial \delta}{\partial Y_{ij}} = \dfrac{\partial \delta}{\partial \beta_{ij}}\dfrac{\partial \beta_{ij}}{\partial Z_{ij}}\dfrac{\partial Z_{ij}}{\partial \alpha_{ij}}\dfrac{\partial \alpha_{ij}}{\partial Y_{ij}} = \dfrac{1}{9} \cdot 1 \cdot 6 \cdot 2Y_{ij}$
$\dfrac{\partial \delta}{\partial X_{ij}} = \dfrac{\partial \delta}{\partial Z_{ij}}\dfrac{\partial Z_{ij}}{\partial Y_{ij}}\dfrac{\partial Y_{ij}}{\partial X_{ij}} = \dfrac{\partial \delta}{\partial \beta_{ij}}\dfrac{\partial \beta_{ij}}{\partial Z_{ij}}\dfrac{\partial Z_{ij}}{\partial \alpha_{ij}}\dfrac{\partial \alpha_{ij}}{\partial Y_{ij}}\dfrac{\partial Y_{ij}}{\partial X_{ij}} = \dfrac{1}{9} \cdot 1 \cdot 6 \cdot 2Y_{ij} \cdot 1 = \dfrac{4}{3}(X_{ij} + 3)$
Comparison
Raw vs. AutoGrad (the slide shows the two implementations side by side)
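As a hedged reconstruction of that comparison (not the author's exact code), the "Raw" version applies the hand-derived formula while the AutoGrad version simply calls backward():

```python
import torch

X = torch.tensor([[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]])

# Raw: apply the hand-derived result d(delta)/dX = (4/3) * (X + 3)
raw_grad = (4.0 / 3.0) * (X + 3)

# AutoGrad: build the graph and let backward() apply the chain rule
X_ag = X.clone().requires_grad_(True)
delta = (6 * (X_ag + 3) ** 2).mean()
delta.backward()

print(torch.allclose(raw_grad, X_ag.grad))  # True: both give the same gradient matrix
```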
Data Parallelism
in PyTorch
Why GPU? (CUDA)
CPU : a few cores @ 3.6 GHz, good for a few huge tasks.
GPU : 3584 CUDA cores @ 1.6 GHz (2.0 GHz @ O.C.), good for an enormous number of small tasks.
Dataflow Diagram
A .cu source file (e.g. hello.cu) is compiled by NVCC; the GPU acts as a co-processor with its own memory.
Host (CPU) buffers h_a, h_b, h_out; device (GPU) buffers d_a, d_b, d_out allocated with cudaMalloc().
1. Memcpy : copy the inputs from host to device with cudaMemcpy()
2. Kernel call : run the __global__ sum() kernel on the GPU (or a library kernel, e.g. cuBLAS)
3. Memcpy : copy the result back from device to host
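In PyTorch the same three steps (host-to-device copy, kernel call, device-to-host copy) are hidden behind tensor methods; a minimal sketch, assuming a CUDA device is available:

```python
import torch

h_a = torch.randn(1000)   # host (CPU) tensors, analogous to h_a / h_b
h_b = torch.randn(1000)

d_a = h_a.to("cuda")      # 1. Memcpy host -> device (like cudaMemcpy)
d_b = h_b.to("cuda")

d_out = d_a + d_b         # 2. Kernel call on the GPU (an elementwise CUDA kernel)

h_out = d_out.cpu()       # 3. Memcpy device -> host
print(h_out.shape)
```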
CUDA on Multi GPU System
Quad SLI : 14,336 CUDA cores, 48 GB of VRAM
How can we use multiple GPUs in PyTorch?
Problem
- Low utilization
Only a single GPU is allocated; the remaining GPUs sit at zero utilization and hold redundant memory.
Problem
- Duration & Memory Allocation
 A large batch size causes a lack of memory.
 Out-of-memory error from PyTorch -> the Python kernel dies.
 Can't set a large batch size.
 Can afford batch_size = 5, num_workers = 2
 Can't divide the work among the other GPUs.
 Elapsed Time : 25m 44s (10 epochs)
 Reached 99% of accuracy in 9 epochs (for training set)
 It takes too much time.
Data Parallelism in PyTorch
 Implemented using torch.nn.DataParallel()
 Can be used for wrapping a module or model.
 Also supports low-level primitives (torch.nn.parallel.*); see the sketch after this list.
 Replicate : Replicate the model on multiple devices (GPUs).
 Scatter : Distribute the input in the first dimension.
 Gather : Gather and concatenate the input in the first dimension.
 Parallel-Apply : Apply a set of already-distributed inputs to a set of already-distributed models.
 PyTorch Tutorials – Multi-GPU examples
 https://pytorch.org/tutorials/beginner/former_torchies/parallelism_tutorial.html
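A hedged sketch of the primitives named above, following the pattern in the linked tutorial; the device ids and the toy linear module are assumptions:

```python
import torch
import torch.nn as nn
from torch.nn.parallel import replicate, scatter, parallel_apply, gather

device_ids = [0, 1]                            # assumed: two visible GPUs
model = nn.Linear(10, 5).to("cuda:0")
inputs = torch.randn(64, 10, device="cuda:0")

replicas = replicate(model, device_ids)        # Replicate the model on each device
scattered = scatter(inputs, device_ids)        # Scatter the batch along dim 0
outputs = parallel_apply(replicas, scattered)  # Run each replica on its chunk
result = gather(outputs, target_device=0)      # Gather the outputs back on GPU 0
print(result.shape)                            # torch.Size([64, 5])
```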
Easy to Use : nn.DataParallel(model)
- Practical Example
1. Define the model.
2. Wrap the model with nn.DataParallel().
3. Access layers through ‘module’
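A minimal sketch of these three steps (the small network here is a stand-in, not the model used in the experiment):

```python
import torch
import torch.nn as nn

# 1. Define the model (a stand-in network, not the one from the experiment).
model = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    nn.Flatten(), nn.Linear(16 * 28 * 28, 10),
)

# 2. Wrap the model with nn.DataParallel() and move it to the GPUs.
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)
model = model.cuda()

# 3. Access the original layers through `.module` after wrapping.
if isinstance(model, nn.DataParallel):
    print(model.module[0])   # the Conv2d layer of the wrapped model
```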
After Parallelism
- GPU Utilization
 Hyperparameters
 Batch Size : 128
 Number of Workers : 16
 High utilization.
 Can use a large memory space.
 All GPUs are allocated.
After Parallelism
- Training Performance
 Hyperparameters
 Batch Size : 128
 A large batch size needs more memory space.
 Number of Workers : 16
 Recommended to set it to 4 * NUM_GPUs (from the forum).
 Elapsed Time : 7m 50s (10 epochs)
 Reached 99% of accuracy in 4 epochs (for training set).
 It took just 3m 10s.
Q & A
Editor's Notes
1. To understand AutoGrad, the automatic differentiation feature provided by PyTorch, we cover the basic theory of deep learning and look more closely at backpropagation. We then compare implementations of backpropagation and AutoGrad to understand the difference. We look at why we use GPUs and how CUDA operations proceed, see how to use the data parallelism methods provided by PyTorch, and compare the performance of multiple GPUs against a single GPU.
2. A module that implements backpropagation in a simple way.
3. The basic form of a logistic classifier is a first-order linear function (WX + b = y), where X is the input, W and b are the weights and bias (training means finding appropriate weights and bias), and y is the prediction result. These results (logits) are converted into probabilities with the softmax function. Why? Logits can become very large, so they are converted into simple values between 0 and 1, and the input is classified as the class with the highest probability. Two classes? Logistic classification. Several classes? Softmax/multinomial classification.
4. How do we represent a class numerically? Make the entry for the corresponding class true in a vector (the class with the highest probability). E.g. class A -> [1 0 0 0 0 ...]: only the index corresponding to class A is true, the rest are false.
5. The distance between the answer and the prediction: cross-entropy. The softmax output will not be 0, and the order of the arguments matters; a small value (being close) means a correct judgment. Since S(y) sums to 1 and every entry is greater than 0, the log(0) problem does not occur.
6. By the chain rule, the derivative of the loss function E with respect to w is as shown. That is, how much E changes when w changes equals the product of the changes through the composed functions. This is expressed by splitting it up: y affects E, sigma affects y, and w affects sigma. The derivative of each part is as shown. Since ReLU is a non-linear function, it is differentiated piecewise.
7. Define the operations as above.
8. Since operating on the whole matrix is cumbersome, we use a scalar expression for a single element. Computing the partial derivatives gives the results shown, and expressing them as a composite function gives the chain above.
9. Substituting the (1,1) element of X, which is 1, gives the result shown. Likewise, expressing the other elements back in the original matrix form gives the gradient matrix shown.
10. A gradient function ultimately means backpropagation through the most basic computation nodes.
11. Now that we properly understand composite functions, let's move on to backpropagation. How much did x and y affect the value of z? That is, how does z change when x and y change? Backpropagation: multiply the incoming signal by the node's local derivative and pass it to the next node (in reverse). The backpropagation of an addition node passes the previous signal through unchanged. The backpropagation of a multiplication node passes the previous signal multiplied by the value from the opposite input.
12. A power node and its forward and backward passes are shown above. Likewise, it is the same in that we find how x and n affect z; computing it gives the result shown.
13. The computation graph for the matrix is shown above. Each element is computed separately, and the mean is obtained from the number of elements and their sum.
14. Since the matrix expression is hard to follow, let's write it as a scalar for each element. By the backpropagation principle covered earlier, it works out as shown above.