This document provides an overview and tutorial for PyTorch, a popular deep learning framework developed by Facebook. It covers what PyTorch is, its core packages, and key concepts such as tensors, Variables, neural network modules, loss functions, and optimizers. It then walks through defining modules and building a network, describes common layer types such as convolution, linear, and dropout layers, and closes with multi-GPU processing, saving models, RNNs, transfer learning, and a comparison with TensorFlow.
2. What is PyTorch?
• Developed by Facebook
– Python first
– Dynamic Neural Network
– This tutorial is for PyTorch 0.2.0
• Endorsed by Director of AI at Tesla
4. Packages of PyTorch
Package Description
torch a Tensor library like Numpy, with strong GPU support
torch.autograd a tape based automatic differentiation library that supports all
differentiable Tensor operations in torch
torch.nn a neural networks library deeply integrated with autograd designed for
maximum flexibility
torch.optim an optimization package to be used with torch.nn with standard
optimization methods such as SGD, RMSProp, LBFGS, Adam etc.
torch.multiprocessing python multiprocessing, but with magical memory sharing of torch
Tensors across processes. Useful for data loading and hogwild training.
torch.utils DataLoader, Trainer and other utility functions for convenience
torch.legacy(.nn/.optim) legacy code that has been ported over from torch for backward
compatibility reasons
5. Outline
• Neural Network in Brief
• Concepts of PyTorch
• Multi-GPU Processing
• RNN
• Transfer Learning
• Comparison with TensorFlow
6. Neural Network in Brief
• Supervised Learning
– Learning a function f such that f(x) = y
Data   Label
x1     y1
x2     y2
…      …
Trying to learn f(·) such that f(x) = y
7. Neural Network in Brief
[Figure: the big dataset is split into batches 1, 2, 3, …, N, which the network (weights Wi) processes one at a time; one pass over all N batches is 1 epoch.]
N = big data size / batch size; for example, 10,000 samples with a batch size of 100 give N = 100 batches per epoch.
8. Neural Network in Brief
[Figure: a batch of data flows forward through the network (weights Wi) to produce a predicted label.]
Forward Process: from data to label
9. Neural Network in Brief
[Figure: the predicted label is compared with the true label to compute the loss.]
Forward Process: from data to label
10. Neural Network in Brief
[Figure: the loss is backpropagated and the optimizer updates the weights, Wi → Wi+1.]
Forward Process: from data to label
Backward Process: update the parameters
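This forward/backward cycle maps directly onto PyTorch code. Below is a minimal sketch of one epoch in the PyTorch 0.2 style; net, criterion, optimizer, and loader are hypothetical names standing in for an nn.Module, a loss function, a torch.optim optimizer, and an iterable of (data, label) batches:

from torch.autograd import Variable

for data, label in loader:                 # one pass over all batches = 1 epoch
    x, y = Variable(data), Variable(label)
    optimizer.zero_grad()                  # clear gradients from the previous batch
    output = net(x)                        # Forward: from data to predicted label
    loss = criterion(output, y)            # compare prediction with the true label
    loss.backward()                        # Backward: compute gradients of the loss
    optimizer.step()                       # update the parameters: Wi -> Wi+1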
11. Neural Network in Brief
[Figure: inside the network — data flows forward through a chain of layers with weights W, and the gradient of the loss flows backward through the same layers.]
12. Neural Network in Brief
[Figure: the same diagram, annotated with what flows through the network.]
Data in the Neural Network:
- Tensor (n-dim array)
- Gradient of Functions
13. Concepts of PyTorch
• Modules of PyTorch
[Figure: the forward/backward diagram, with each part mapped to a PyTorch module.]
Data:
- Tensor
- Variable (for Gradient)
Function:
- NN Modules
- Optimizer
- Loss Function
- Multi-Processing
14. Concepts of PyTorch
• Modules of PyTorch • Similar to Numpy
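A minimal sketch of the Numpy-like Tensor API (all values are illustrative):

import torch

x = torch.ones(5, 3)   # like numpy.ones((5, 3))
y = torch.rand(5, 3)   # uniform random values in [0, 1)
print(x + y)           # elementwise operations, as in Numpy
print(x.size())        # torch.Size([5, 3]), analogous to ndarray.shape
print(x[:, 1])         # Numpy-style indexing and slicing work too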
15. Concepts of PyTorch
• Modules of PyTorch • Operations
– z = x + y
– torch.add(x, y, out=z)
– y.add_(x) # in-place: methods ending in _ modify the tensor in place
16. Concepts of PyTorch
• Modules of PyTorch • Numpy Bridge
• To Numpy
– a = torch.ones(5)
– b = a.numpy()
• To Tensor
– a = numpy.ones(5)
– b = torch.from_numpy(a)
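For CPU tensors, the Tensor and the Numpy array share the same underlying memory in both directions, so an in-place change to one is visible through the other:

import torch

a = torch.ones(5)
b = a.numpy()    # b shares memory with a
a.add_(1)        # an in-place add on the tensor...
print(b)         # ...shows up in the numpy array: [ 2.  2.  2.  2.  2.]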
17. Concepts of PyTorch
• Modules of PyTorch • CUDA Tensors
• Move to GPU
– x = x.cuda()
– y = y.cuda()
– x + y # runs on the GPU
• Move Net to GPU
net = Network()
if torch.cuda.is_available():
    net.cuda()
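Note that the model and its inputs must live on the same device. A minimal sketch (net and batch are hypothetical names for a module and an input tensor):

import torch
from torch.autograd import Variable

if torch.cuda.is_available():
    net.cuda()                 # moves all of the module's parameters to the GPU
    batch = batch.cuda()       # the input must be moved as well
output = net(Variable(batch))  # the computation now runs on the GPU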
18. Concepts of PyTorch
• Modules of PyTorch
[Figure: recap of the diagram from slide 13, mapping the forward/backward cycle to Tensors, Variables, NN Modules, the Optimizer, Loss Functions, and Multi-Processing.]
19. Neural Network in Brief
[Figure: recap of slide 12 — inside the network, Tensors (n-dim arrays) flow forward and gradients of functions flow backward.]
20. Concepts of PyTorch
• Modules of PyTorch • Variable
A Variable wraps:
- .data: the Tensor data
- .grad: the gradient, for the current backward process
- .grad_fn: the function that created it, handled by PyTorch automatically
21. Concepts of PyTorch
• Modules of PyTorch • Variable
from torch.autograd import Variable
x = Variable(torch.ones(2, 2), requires_grad=True)
print(x)
y = x + 2
z = y * y * 3
out = z.mean()
out.backward()   # compute d(out)/dx via autograd
print(x.grad)    # a 2x2 tensor filled with 4.5
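Why 4.5? Working the example out by hand, with every entry of x equal to 1:

\[
\mathrm{out} = \frac{1}{4}\sum_{i=1}^{4} z_i, \qquad z_i = 3\,(x_i + 2)^2
\]
\[
\frac{\partial\,\mathrm{out}}{\partial x_i} = \frac{1}{4}\cdot 6\,(x_i + 2) = \frac{3}{2}\,(x_i + 2)\,\Big|_{x_i = 1} = 4.5
\]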
38. NN Modules
• Convolution Layer
– N-th Batch (N), Channel (C)
– torch.nn.Conv1d: input [N, C, W] # moving kernel in 1D
– torch.nn.Conv2d: input [N, C, H, W] # moving kernel in 2D
– torch.nn.Conv3d: input [N, C, D, H, W] # moving kernel in 3D
[Figure, built up over slides 38–47: the Conv2d input is a Cin × Hin × Win volume. Each kernel is a Cin × k × k volume; convolving (*) the input with the 1st kernel produces one 1 × Hout × Wout output map, and stacking the maps from the 1st through the Cout-th kernel gives a Cout × Hout × Wout output. The illustrated hyperparameters are kernel size k=3, dilation d=1, stride s=1 (the moving step size), and padding p=1.]
Number of parameters: Cout × (Cin × k × k) weights, plus Cout biases.
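A minimal shape check for Conv2d (all sizes are illustrative):

import torch
from torch.autograd import Variable

conv = torch.nn.Conv2d(in_channels=3, out_channels=16,
                       kernel_size=3, stride=1, padding=1)
x = Variable(torch.randn(8, 3, 32, 32))  # [N, Cin, Hin, Win]
y = conv(x)
print(y.size())  # torch.Size([8, 16, 32, 32]); k=3, s=1, p=1 preserves H and W
# parameters: 16 x (3 x 3 x 3) weights + 16 biases = 448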
48. NN Modules
• Linear Layer
– torch.nn.Linear(in_features=3, out_features=5)
– y = Ax + b
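A quick sketch of the shapes (sizes are illustrative):

import torch
from torch.autograd import Variable

fc = torch.nn.Linear(in_features=3, out_features=5)  # A is 5x3, b has 5 entries
x = Variable(torch.randn(4, 3))  # a batch of 4 inputs, 3 features each
y = fc(x)                        # y = Ax + b, applied row by row
print(y.size())                  # torch.Size([4, 5])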
49. NN Modules
• Dropout Layer
– torch.nn.Dropout(p)
– Randomly zeroes elements of the input with probability p
– Outputs are scaled by 1/(1-p), so the expected value is unchanged
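A small sketch of the behavior; note that dropout is only active in training mode:

import torch
from torch.autograd import Variable

drop = torch.nn.Dropout(p=0.5)
x = Variable(torch.ones(1, 8))

drop.train()    # training mode: zeroes entries with probability 0.5 and
print(drop(x))  # scales survivors by 1/(1-0.5)=2, e.g. [2, 0, 2, 2, 0, 0, 2, 0]

drop.eval()     # evaluation mode: identity
print(drop(x))  # [1, 1, 1, 1, 1, 1, 1, 1]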
57. Concepts of PyTorch
• Modules of PyTorch • Basic Method
– torch.nn.DataParallel
– Recommended by PyTorch
• Advanced Methods
– torch.multiprocessing
– Hogwild (async)
58. Multi-GPU Processing
• torch.nn.DataParallel
– gpu_id = '6,7'
– os.environ['CUDA_VISIBLE_DEVICES'] = gpu_id
– net = torch.nn.DataParallel(model, device_ids=[0, 1])
– output = net(input_var)
• Important Notes:
– device_ids must start from 0: CUDA_VISIBLE_DEVICES remaps the visible GPUs, so physical GPUs 6 and 7 become devices 0 and 1
– batch_size must be divisible by the number of GPUs
59. Saving Models
• First Approach (Recommended by PyTorch)
– # save only the model parameters
– torch.save(the_model.state_dict(), PATH)
– # load only the model parameters
– the_model = TheModelClass(*args, **kwargs)
– the_model.load_state_dict(torch.load(PATH))
• Second Approach
– torch.save(the_model, PATH) # save the entire model
– the_model = torch.load(PATH) # load the entire model
http://pytorch.org/docs/master/notes/serialization.html#recommended-approach-for-saving-a-model
61. Recurrent Neural Network (RNN)
http://pytorch.org/tutorials/beginner/former_torchies/nn_tutorial.html#example-2-recurrent-net
[Figure: an RNN cell with input, hidden, and output nodes; self.i2h consumes the input concatenated with the hidden state, so input_size = 50 + 20 = 70. The same module (i.e. the same parameters) is reused across time steps.]
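A minimal cell in the spirit of the linked example; the sizes 50 and 20 follow the figure, while the layer and activation choices are assumptions for illustration:

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable

class RNN(nn.Module):
    def __init__(self, data_size=50, hidden_size=20, output_size=10):
        super(RNN, self).__init__()
        # input and hidden state are concatenated: input_size = 50 + 20 = 70
        self.i2h = nn.Linear(data_size + hidden_size, hidden_size)
        self.h2o = nn.Linear(hidden_size, output_size)

    def forward(self, data, hidden):
        combined = torch.cat((data, hidden), 1)
        hidden = F.tanh(self.i2h(combined))   # new hidden state
        output = self.h2o(hidden)
        return output, hidden

# The same module (same parameters) is applied at every time step:
rnn = RNN()
hidden = Variable(torch.zeros(1, 20))
for t in range(5):
    x_t = Variable(torch.randn(1, 50))
    output, hidden = rnn(x_t, hidden)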
62. Transfer Learning
• Freeze the parameters of the original model
– requires_grad = False
• Then add your own modules
http://pytorch.org/docs/master/notes/autograd.html#excluding-subgraphs-from-backward
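A minimal sketch following the linked note; the pretrained model and layer sizes are illustrative choices, not from the slides:

import torch.nn as nn
import torch.optim as optim
import torchvision.models as models

model = models.resnet18(pretrained=True)
for param in model.parameters():
    param.requires_grad = False   # freeze: exclude these from backward

# Replace the last layer with your own module; its fresh parameters
# have requires_grad=True by default and will be trained.
model.fc = nn.Linear(512, 100)    # resnet18's fc takes 512 input features

# Optimize only the new, unfrozen parameters:
optimizer = optim.SGD(model.fc.parameters(), lr=0.01)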
63. Comparison with TensorFlow
Properties                   | TensorFlow                           | PyTorch
Graph                        | Static (dynamic via TensorFlow Fold) | Dynamic
Ramp-up Time                 | -                                    | Win
Graph Creation and Debugging | -                                    | Win
Feature Coverage             | Win                                  | Catching up quickly
Documentation                | Tie                                  | Tie
Serialization                | Win (supports other languages)       | -
Deployment                   | Win (Cloud & Mobile)                 | -
Data Loading                 | -                                    | Win
Device Management            | Win                                  | Needs .cuda()
Custom Extensions            | -                                    | Win
Summarized from https://awni.github.io/pytorch-tensorflow/