For the full video of this presentation, please visit:
https://www.embedded-vision.com/platinum-members/embedded-vision-alliance/embedded-vision-training/videos/pages/dec-2019-alliance-vitf-facebook
For more information about embedded vision, please visit:
http://www.embedded-vision.com
Joseph Spisak, Product Manager at Facebook, delivers the presentation "PyTorch Deep Learning Framework: Status and Directions" at the Embedded Vision Alliance's December 2019 Vision Industry and Technology Forum. Spisak gives an update on the PyTorch deep learning framework and where it’s heading.
12. TORCHSCRIPT
Models are Python TorchScript programs, an optimizable subset of Python
+ Same “models are programs” idea
+ Production deployment
+ No Python dependency
+ Compilation for performance optimization
import torch
import torch.nn as nn

class RNN(nn.Module):
    def __init__(self, W_h, U_h, W_y, b_h, b_y):
        super(RNN, self).__init__()
        self.W_h = nn.Parameter(W_h)
        self.U_h = nn.Parameter(U_h)
        self.W_y = nn.Parameter(W_y)
        self.b_h = nn.Parameter(b_h)
        self.b_y = nn.Parameter(b_y)

    def forward(self, x, h):
        y = []
        for t in range(x.size(0)):
            h = torch.tanh(x[t] @ self.W_h + h @ self.U_h + self.b_h)
            y += [torch.tanh(h @ self.W_y + self.b_y)]
            if t % 10 == 0:
                print("stats: ", h.mean(), h.var())
        return torch.stack(y), h

# one annotation!
script_rnn = torch.jit.script(RNN(W_h, U_h, W_y, b_h, b_y))
13. TORCHSCRIPT
Models are Python TorchScript programs, an optimizable subset of Python
+ Same “models are programs” idea
+ Prod deployment
+ No Python dependency
+ Optimizable (incl. codegen!)
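The "models are programs" idea is easiest to see with data-dependent control flow. The sketch below is an illustration only, not taken from the talk: `gate` is a hypothetical function, compiled once with `torch.jit.script`, which keeps the `if`, and once with `torch.jit.trace`, which records only the branch taken for the example input.

```python
import warnings
import torch

def gate(x):
    # Hypothetical function whose branch depends on the input values.
    if x.sum() > 0:
        out = x * 2
    else:
        out = -x
    return out

# Scripting compiles the Python source, so the branch stays in the program.
scripted_gate = torch.jit.script(gate)

# Tracing runs the function once and records the ops it saw; only the
# positive branch is captured here. PyTorch warns about this, so the
# warning is silenced for the demonstration.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    traced_gate = torch.jit.trace(gate, torch.ones(3))

neg = -torch.ones(3)
eager_out = gate(neg)              # takes the else branch
scripted_out = scripted_gate(neg)  # also takes the else branch
traced_out = traced_gate(neg)      # replays the recorded x * 2
```

On a negative input the scripted version matches eager Python, while the traced version replays the multiplication it recorded.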
15. ~1,230 CONTRIBUTORS
50%+ YOY GROWTH
23K PYTORCH FORUM USERS
16. GROWTH IN ARXIV MENTIONS IN RESEARCH PAPERS
[Chart: mentions per month, y-axis 0 to 500, x-axis Jan 2017 to Jul 2019]
17. FRAMEWORKS
DYNAMIC VS. STATIC
19. DECLARATIVE TOOLKITS
Declare and compile a model
Repeatedly execute the model in a VM
[Diagram: a PYTHON SCRIPT hands the model to a TOOLKIT VM, which runs a repeated stack of RELU, CONV2D, and BATCHNORM blocks]
20. DECLARATIVE TOOLKITS
Computation Graph
• Declare a computation
• Placeholder variables
• Compile it
• Run it in a Session
import tensorflow as tf
import numpy as np

trX = np.linspace(-1, 1, 101)
trY = 2 * trX + np.random.randn(*trX.shape) * 0.33

X = tf.placeholder("float")
Y = tf.placeholder("float")

def model(X, w):
    return tf.multiply(X, w)

w = tf.Variable(0.0, name="weights")
y_model = model(X, w)
cost = tf.square(Y - y_model)
train_op = tf.train.GradientDescentOptimizer(0.01).minimize(cost)

with tf.Session() as sess:
    tf.global_variables_initializer().run()
    for i in range(100):
        for (x, y) in zip(trX, trY):
            sess.run(train_op, feed_dict={X: x, Y: y})
    print(sess.run(w))
21. DECLARATIVE TOOLKITS
Computation Graph
• Declare a computation
• Placeholder variables
• Compile it
• Run it in a Session
Placeholder variables:
X = tf.placeholder("float")
Y = tf.placeholder("float")
22. DECLARATIVE TOOLKITS
Computation Graph
• Declare a computation
• Placeholder variables
• Compile it
• Run it in a Session
Model definition:
def model(X, w):
    return tf.multiply(X, w)

w = tf.Variable(0.0, name="weights")
y_model = model(X, w)
cost = tf.square(Y - y_model)
train_op = tf.train.GradientDescentOptimizer(0.01).minimize(cost)
23. DECLARATIVE TOOLKITS
Computation Graph
• Declare a computation
• Placeholder variables
• Compile it
• Run it in a Session
A separate, Turing-complete virtual machine:
for i in range(100):
    for (x, y) in zip(trX, trY):
        sess.run(train_op, feed_dict={X: x, Y: y})
print(sess.run(w))
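The declare-then-run pattern can be sketched without any framework. The toy below is a hypothetical illustration of the idea, not how TensorFlow is implemented: `Node`, `placeholder`, and `run` are invented names. Building the expression computes nothing; values appear only when `run` walks the graph with a feed dictionary.

```python
class Node:
    """A node in a toy computation graph; nothing is computed at build time."""

    def __init__(self, op, inputs=(), name=None):
        self.op = op
        self.inputs = inputs
        self.name = name

    def __mul__(self, other):
        return Node("mul", (self, other))

    def __sub__(self, other):
        return Node("sub", (self, other))


def placeholder(name):
    # A named slot whose value is supplied only at run time.
    return Node("placeholder", name=name)


def run(node, feed_dict):
    # A minimal "session": recursively evaluate the declared graph.
    if node.op == "placeholder":
        return feed_dict[node.name]
    left, right = (run(inp, feed_dict) for inp in node.inputs)
    if node.op == "mul":
        return left * right
    if node.op == "sub":
        return left - right
    raise ValueError("unknown op: " + node.op)


# Declare the computation once.
X, Y, W = placeholder("X"), placeholder("Y"), placeholder("w")
error = Y - X * W
cost = error * error

# Execute it repeatedly with different inputs.
first = run(cost, {"X": 3.0, "w": 2.0, "Y": 7.0})
second = run(cost, {"X": 1.0, "w": 2.0, "Y": 2.0})
```

The same `cost` graph is evaluated twice with different feeds, which mirrors the separation between model declaration and execution described on these slides.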
25. IMPERATIVE TOOLKITS
Run a series of computations
Implicitly defining the model as execution goes
[Diagram: PYTHON INSTRUCTIONS run directly on the PYTHON NATIVE RUNTIME as a repeated stack of RELU, CONV2D, and BATCHNORM blocks]
26. IMPERATIVE TOOLKITS
• Define a model by execution
• No separate compilation stage
• No separate execution engine
import torch

trX = torch.linspace(-1, 1, 101)
trY = 2 * trX + torch.randn(*trX.size()) * 0.33  # noisy samples of y = 2x

w = torch.zeros(1, requires_grad=True)

for i in range(100):
    for (X, Y) in zip(trX, trY):
        print(X)
        print(Y)
        y_model = X * w
        cost = (Y - y_model) ** 2  # squared error
        cost.backward(torch.ones(*cost.size()))
        with torch.no_grad():
            w -= 0.01 * w.grad  # gradient descent step
        w.grad.zero_()  # clear the accumulated gradient

print(w)
27. IMPERATIVE TOOLKITS
Model constructed and values computed as we define it.
• Define a model by execution
• No separate compilation stage
• No separate execution engine
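Because the model is defined by execution, ordinary Python control flow decides which operations are recorded for differentiation. The sketch below is an illustration with a hypothetical function, `double_until_large`: the number of multiplications depends on the input value, and autograd differentiates whatever actually ran.

```python
import torch

def double_until_large(x, limit=10.0):
    # Hypothetical example: the loop count depends on the data itself.
    y = x
    steps = 0
    while y.abs().sum() < limit:
        y = y * 2
        steps += 1
    return y, steps

x = torch.tensor([1.0], requires_grad=True)
y, steps = double_until_large(x)

# Backpropagate through exactly the operations that were executed.
y.sum().backward()
```

For this input the loop doubles the value four times, so `y` is 16 and the gradient of `y` with respect to `x` is also 16.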
28. PYTORCH FOR EMBEDDED
STATE OF THE STATE
29-31. HOW DO I RUN PYTORCH MODELS ON DEVICE?
EXPORT ONNX-FORMATTED MODELS
PYTORCH MOBILE
34. ARCHITECTURE AND FLOW
[Diagram: a PyTorch Model and a Sample Input are passed to torch.onnx.export(); the JIT Tracer and TorchScript produce Torch IR; the ONNX Exporter (Optimizer, Torch IR to ONNX IR Translator) emits the ONNX Graph]
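The first stage of this flow can be sketched with a small model. Everything below is an assumption for illustration: `TinyNet`, the input shape, and the helper `export_to_onnx` are hypothetical names. Only the tracing step runs at import time; the export helper wraps `torch.onnx.export(model, args, f)` and is left for the reader to call, since depending on the PyTorch version it may require extra ONNX packages.

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    """Hypothetical model: one conv, batch norm, ReLU block."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)
        self.bn = nn.BatchNorm2d(8)

    def forward(self, x):
        return torch.relu(self.bn(self.conv(x)))

model = TinyNet().eval()
sample_input = torch.randn(1, 3, 32, 32)

# Stage 1: the tracer runs the model on the sample input and records
# the executed operations as a Torch IR graph.
traced = torch.jit.trace(model, sample_input)
torch_ir = str(traced.graph)

def export_to_onnx(model, sample_input, path):
    # Stage 2 (not executed here): hand the model and sample input to the
    # ONNX exporter, which writes the ONNX graph to the given path.
    torch.onnx.export(model, (sample_input,), path)
```

Calling `export_to_onnx(model, sample_input, "tiny_net.onnx")` would write the file; the file name is arbitrary.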
38-40. WHAT IS PYTORCH MOBILE?
IT’S PYTORCH
FOR MOBILE
BUT NO PYTHON 😃
41-45. WHAT CAN IT RUN?
ANY TORCHSCRIPT MODEL.
LOOPS? YES
FUNCTIONS? YES
TUPLES? YES
NAMEDTUPLE? YES
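The four language features listed above can be exercised in a few scripted functions. This is a minimal sketch run on a desktop PyTorch build rather than on a device; `Stats`, `running_sum`, `min_max`, and `describe` are hypothetical names chosen for illustration.

```python
from typing import NamedTuple, Tuple

import torch


class Stats(NamedTuple):
    # NamedTuple with tensor fields, returned from a scripted function.
    mean: torch.Tensor
    maximum: torch.Tensor


@torch.jit.script
def running_sum(x: torch.Tensor) -> torch.Tensor:
    # Loop: accumulate the rows of x one at a time.
    total = torch.zeros_like(x[0])
    for t in range(x.size(0)):
        total = total + x[t]
    return total


@torch.jit.script
def min_max(x: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]:
    # Tuple: return two tensors at once.
    return x.min(), x.max()


@torch.jit.script
def describe(x: torch.Tensor) -> Stats:
    # Function call: one scripted function calls another.
    lo_hi = min_max(x)
    return Stats(x.mean(), lo_hi[1])


x = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
row_total = running_sum(x)
lowest, highest = min_max(x)
stats = describe(x)
```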
46. PYTORCH MOBILE (EXPERIMENTAL)
PYTORCH 1.3
End-to-end workflows for mobile in iOS and Android:
AUTHOR A MODEL IN PYTORCH
MODEL OPTIMIZATION (OPTIONAL)
qmodel = quantization.convert(my_mobile_model)
torch.jit.script(qmodel).save("my_mobile_model.pt")
ANDROID - MAVEN
implementation 'org.pytorch:pytorch_android:1.3.0'
iOS - COCOAPODS
pod 'LibTorch'
• No separate runtime to export
COMING SOON
• Build level optimization and selective compilation
• Whole program optimization with link time optimization
47. QUANTIZATION (EXPERIMENTAL)
PYTORCH 1.3
• Neural network inference is expensive
• IoT and mobile devices have limited resources
• Quantizing models enables efficient inference at scale
4X LESS MEMORY USAGE
2-4X SPEEDUPS IN COMPUTE
import torch
from torch import quantization

model = ResNet50()
model.load_state_dict(torch.load("model.pt"))

# Insert observers that record activation ranges
qmodel = quantization.prepare(
    model, {"": quantization.default_qconfig})
qmodel.eval()

# Calibrate: run representative data through the prepared model
for batch, target in data_loader:
    qmodel(batch)

# Replace observed modules with their quantized counterparts
qmodel = quantization.convert(qmodel)
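The memory figure follows from storing each value in 8 bits instead of 32. The sketch below shows the underlying affine quantization arithmetic only, under stated assumptions: `quantize_affine` and `dequantize_affine` are hypothetical helpers written for this illustration, not the `torch.quantization` implementation, and the `uint8` storage limits it to 8 bits or fewer.

```python
import torch

def quantize_affine(x, num_bits=8):
    """Map float values to unsigned integers using a scale and zero point."""
    qmin, qmax = 0, 2 ** num_bits - 1
    # Include 0.0 in the range so that zero stays exactly representable.
    x_min = min(x.min().item(), 0.0)
    x_max = max(x.max().item(), 0.0)
    scale = (x_max - x_min) / (qmax - qmin) or 1.0  # guard against zero scale
    zero_point = int(round(qmin - x_min / scale))
    q = torch.clamp(torch.round(x / scale) + zero_point, qmin, qmax)
    return q.to(torch.uint8), scale, zero_point

def dequantize_affine(q, scale, zero_point):
    """Recover approximate float values from the integer representation."""
    return (q.to(torch.float32) - zero_point) * scale

weights = torch.linspace(-1.0, 1.0, 1000)  # stand-in for float32 weights
q_weights, scale, zero_point = quantize_affine(weights)
restored = dequantize_affine(q_weights, scale, zero_point)

float_bytes = weights.element_size() * weights.numel()
quant_bytes = q_weights.element_size() * q_weights.numel()
memory_ratio = float_bytes / quant_bytes
max_error = (weights - restored).abs().max().item()
```

The stored tensor is a quarter of the original size, and the round-trip error stays within one quantization step for this input.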
49. HOW DO I USE IT?
TorchScript
A static, high-performance subset of Python.
1. Prototype your model with PyTorch
2. Control flow is preserved
3. First-class support for lists, dicts, etc.
from typing import List

import torch
from torch import Tensor

class MyModule(torch.nn.Module):
    def __init__(self, N, M, state: List[Tensor]):
        super(MyModule, self).__init__()
        self.weight = torch.nn.Parameter(torch.rand(N, M))
        self.state = state

    def forward(self, input):
        self.state.append(input)
        if input.sum() > 0:
            output = self.weight.mv(input)
        else:
            output = self.weight + input
        return output

# Compile the model code to a static representation
my_module = MyModule(3, 4, [torch.rand(3, 4)])
my_script_module = torch.jit.script(my_module)
# Save the compiled code and model data
# so it can be loaded elsewhere
my_script_module.save("my_script_module.pt")
51. HOW DO I USE IT?
# Compile the model code to a static representation
my_module = MyModule(3, 4, [torch.rand(3, 4)])
my_script_module = torch.jit.script(my_module)
# Save the compiled code and model data
# so it can be loaded elsewhere
my_script_module.save("my_script_module.pt")
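The counterpart to saving is loading the archive where it will run. The sketch below stays in Python and round-trips through an in-memory buffer to avoid touching the file system; `Gate` is a hypothetical module used only for this illustration. On a device the same archive would be loaded through the mobile bindings instead.

```python
import io

import torch


class Gate(torch.nn.Module):
    """Hypothetical module with a data-dependent branch."""

    def __init__(self):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.rand(3, 4))

    def forward(self, input: torch.Tensor) -> torch.Tensor:
        if input.sum() > 0:
            output = self.weight.mv(input)
        else:
            output = self.weight + input
        return output


scripted = torch.jit.script(Gate())

# Serialize the compiled code and parameters, then load them back.
buffer = io.BytesIO()
torch.jit.save(scripted, buffer)
buffer.seek(0)
loaded = torch.jit.load(buffer)

pos = torch.ones(4)
neg = -torch.ones(4)
pos_out = loaded(pos)  # matrix-vector product, shape (3,)
neg_out = loaded(neg)  # broadcast addition, shape (3, 4)
```

The loaded module carries its own code, so both branches still work after the round trip.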
52. HOW DO I USE IT?
ANDROID: implementation 'org.pytorch:pytorch_android:1.3.0'
iOS: pod 'LibTorch'
54. HOW DOES IT WORK?
ANDROID: https://github.com/pytorch/android-demo-app
iOS: https://github.com/pytorch/ios-demo-app
55. WHAT'S HERE TODAY?
Full TorchScript support.
Pre-built binary releases in JCenter and CocoaPods.
Java bindings.
All forward CPU operators.
Some optimized float operators (based on Caffe2Go).
Some optimized quantized operators (based on QNNPACK w/ XNNPACK WIP).
56. WHAT'S COMING UP?
Faster.
Smaller.
Customized builds.
Obj-C/Swift API?
Kotlin wrapper?
GPU support??
Accelerator support??