PyTorch under the hood
A guide to understand PyTorch internals
Christian S. Perone
(christian.perone@gmail.com)
http://blog.christianperone.com
PyData Montreal, Feb 2019
Agenda
TENSORS
Tensors
Python objects
Zero-copy
Tensor storage
Memory allocators (CPU/GPU)
The big picture
JIT
Just-in-time compiler
Tracing
Scripting
Why TorchScript?
Building IR and JIT Phases
Optimizations
Serialization
Using models in other languages
PRODUCTION
Some tips
Q&A
WHO AM I
Christian S. Perone
14 years working with Machine
Learning, Data Science and Software
Engineering in industry R&D
Blog at
blog.christianperone.com
Open-source projects at
https://github.com/perone
Twitter @tarantulae
DISCLAIMER
PyTorch is a moving target; the Deep Learning ecosystem moves
fast and big changes happen every week;
This is not a talk to teach you the basics of PyTorch or how to
train your network, but to teach you how PyTorch
components work under the hood in an intuitive way;
This talk is updated to the PyTorch v1.0.1 release;
Section I
TENSORS
TENSORS
Simply put, TENSORS are a generalization of vectors and matrices.
In PyTorch, a tensor is a multi-dimensional matrix containing
elements of a single data type.
>>> import torch
>>> t = torch.tensor([[1., -1.], [1., -1.]])
>>> t
tensor([[ 1., -1.],
        [ 1., -1.]])
>>> t.dtype # They have a type
torch.float32
>>> t.shape # a shape
torch.Size([2, 2])
>>> t.device # and live in some device
device(type='cpu')
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
TENSORS
Although PyTorch has an elegant Python-first design, all the PyTorch
heavy work is actually implemented in C++.
In Python, the integration of C++ code is (usually) done using
what is called an extension;
PyTorch uses ATen, which is the foundational tensor operation
library on which all else is built;
To do automatic differentiation, PyTorch uses Autograd, which
is an augmentation on top of the ATen framework;
In the Python API, PyTorch previously had separate
Variable and Tensor types; after v0.4.0 they were
merged into Tensor .
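Since the merge, a plain Tensor created with requires_grad=True is
tracked by Autograd. A minimal sketch (the gradient of (x * x).sum()
with respect to x is 2x):
>>> x = torch.ones(2, 2, requires_grad=True)
>>> y = (x * x).sum()
>>> y.backward()
>>> x.grad
tensor([[2., 2.],
        [2., 2.]])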
QUICK RECAP PYTHON OBJECTS
typedef struct {
PyObject_HEAD
double ob_fval;
} PyFloatObject;
typedef struct _object {
Py_ssize_t ob_refcnt;
struct _typeobject *ob_type;
} PyObject;
[Diagram: PyFloatObject embeds the PyObject_HEAD (Py_ssize_t ob_refcnt,
struct _typeobject *ob_type) followed by its own double ob_fval field.]
struct THPVariable {
PyObject_HEAD
torch::autograd::Variable cdata;
PyObject* backward_hooks;
};
The TH prefix is from TorcH, and P means Python.
[Diagram: variable_a and variable_b both point to the same THPVariable
object; its PyObject_HEAD reference counter goes from 1 to 2.]
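You can watch that reference counter from Python; note that
sys.getrefcount() reports one extra reference held by its own argument:
>>> import sys
>>> variable_a = torch.ones(1)
>>> sys.getrefcount(variable_a)
2
>>> variable_b = variable_a
>>> sys.getrefcount(variable_a)
3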
IN PYTHON, EVERYTHING IS AN OBJECT
>>> a = 300
>>> b = 300
>>> a is b
False
>>> a = 200
>>> b = 200
>>> a is b
True
True
[Diagram: with 300, a and b point to two distinct PyIntObject instances,
each with ref count 1; with 200, both names point to the same cached
PyIntObject with ref count 2.]
A typical Python program spends much of its time
allocating and deallocating integers, so CPython caches the small
integers (by default, -5 to 256).
ZERO-COPYING TENSORS
It is very common to load tensors in numpy and convert them to
PyTorch, or vice-versa;
>>> import numpy as np
>>> np_array = np.ones((2,2))
>>> np_array
array([[1., 1.],
[1., 1.]])
>>> torch_array = torch.tensor(np_array)
>>> torch_array
tensor([[1., 1.],
[1., 1.]], dtype=torch.float64)
>>> torch_array.add_(1.0)
>>> np_array
array([[1., 1.], # array is intact, a copy was made
[1., 1.]])
An underscore suffix on an operation (such as add_) means an in-place operation.
ZERO-COPYING TENSORS
Now imagine that you have a batch of 128 images, with 3 channels
each (RGB) and a size of 224x224;
[Figure: a 3D image tensor of 0/1 values laid out along the Channel,
Row, and Column axes.]
This will yield a size in memory of ∼74 MB. We don't want to
duplicate memory (except when copying them to discrete GPUs,
of course);
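The arithmetic, assuming float32 (4 bytes per element):
>>> 128 * 3 * 224 * 224 * 4 / 1024**2
73.5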
ZERO-COPYING TENSORS
Let's now look at slightly different code, this time using the
torch.from_numpy() function:
>>> np_array
array([[1., 1.],
[1., 1.]])
>>> torch_array = torch.from_numpy(np_array)
>>> torch_array.add_(1.0)
>>> np_array
array([[2., 2.],
[2., 2.]])
The original numpy array was changed, because torch.from_numpy() is a
zero-copy operation: the tensor shares the array's memory.
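You can confirm the sharing with numpy itself, since np.shares_memory()
checks whether two arrays overlap in memory:
>>> np.shares_memory(np_array, torch_array.numpy())
True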
ZERO-COPYING TENSORS
The difference between in-place and standard operations might not be
so clear in some cases:
>>> np_array
array([[1., 1.],
[1., 1.]])
>>> torch_array = torch.from_numpy(np_array)
>>> np_array = np_array + 1.0
>>> torch_array
tensor([[1., 1.],
[1., 1.]], dtype=torch.float64)
However, if you use np_array += 1.0 , that is an in-place
operation that will change the torch_array memory as well.
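To make the contrast concrete, here is a minimal sketch starting again
from a fresh array:
>>> np_array = np.ones((2, 2))
>>> torch_array = torch.from_numpy(np_array)
>>> np_array += 1.0 # in-place: writes into the shared buffer
>>> torch_array
tensor([[2., 2.],
        [2., 2.]], dtype=torch.float64)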
ZERO-COPYING TENSORS
at::Tensor tensor_from_numpy(PyObject* obj) {
// (...) - omitted for brevity
auto array = (PyArrayObject*)obj;
int ndim = PyArray_NDIM(array);
auto sizes = to_aten_shape(ndim, PyArray_DIMS(array));
auto strides = to_aten_shape(ndim, PyArray_STRIDES(array));
// (...) - omitted for brevity
void* data_ptr = PyArray_DATA(array);
auto& type = CPU(dtype_to_aten(PyArray_TYPE(array)));
Py_INCREF(obj);
return type.tensorFromBlob(data_ptr, sizes, strides,
[obj](void* data) {
AutoGIL gil;
Py_DECREF(obj);
});
}
Pay attention to the reference counting using Py_INCREF() and to the
call to the tensorFromBlob() function.
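That Py_INCREF() is what keeps the numpy array's buffer alive even
after you drop your own reference to it; a small demonstration:
>>> arr = np.ones(3)
>>> t = torch.from_numpy(arr)
>>> del arr # the tensor still holds a reference to the array
>>> t
tensor([1., 1., 1.], dtype=torch.float64)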
DATA POINTERS
[Diagram: the PyArrayObject and the FloatTensor each hold a
data_pointer* field pointing to the same raw buffer.]
The FloatTensor copied the numpy array's data pointer, not its
contents. The reference is kept safe by the Python reference
counting mechanism.
TENSOR STORAGE
The abstraction responsible for holding the data isn’t actually the
Tensor , but the Storage .
struct C10_API StorageImpl final : (...) {
// (...)
private:
// (...)
caffe2::TypeMeta data_type_;
DataPtr data_ptr_;
int64_t numel_;
Allocator* allocator_;
}
It holds a pointer to the raw data and contains information such as
the size and the allocator;
Storage is a dumb abstraction: there is no metadata telling us
how to interpret the data it holds;
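You can inspect the flat storage behind a tensor from Python (output
formatted as PyTorch ~1.0 prints it):
>>> t = torch.ones(2, 2)
>>> t.storage()
 1.0
 1.0
 1.0
 1.0
[torch.FloatStorage of size 4]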
TENSOR STORAGE
The Storage abstraction is very powerful because it decouples
the raw data and how we can interpret it;
We can have multiple tensors sharing the same storage but with
different interpretations (also called views), without duplicating
memory:
>>> tensor_a = torch.ones((2, 2))
>>> tensor_b = tensor_a.view(4)
>>> tensor_a_data = tensor_a.storage().data_ptr()
>>> tensor_b_data = tensor_b.storage().data_ptr()
>>> tensor_a_data == tensor_b_data
True
tensor_b is a different view (interpretation) of the same data
present in the underlying storage that is shared between both
tensors.
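Writing through one view is visible through the other, since both
index the same underlying storage:
>>> tensor_b[0] = 42.0
>>> tensor_a
tensor([[42.,  1.],
        [ 1.,  1.]])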
MEMORY ALLOCATORS (CPU/GPU)
The tensor storage can be allocated either in CPU memory or on
the GPU, therefore a mechanism is required to switch between
these different allocations:
struct Allocator {
virtual ~Allocator() {}
virtual DataPtr allocate(size_t n) const = 0;
virtual DeleterFnPtr raw_deleter() const {...}
void* raw_allocate(size_t n) {...}
void raw_deallocate(void* ptr) {...}
};
There are Allocators that will use GPU allocation functions such as
cudaMallocHost() when the storage is meant for the GPU, or POSIX
functions such as posix_memalign() for data in CPU memory.
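From Python you only choose the device; the matching allocator is
picked under the hood (the second line assumes a CUDA-capable GPU is
available):
>>> cpu_t = torch.ones(2, 2) # allocated through the CPU allocator
>>> gpu_t = torch.ones(2, 2, device="cuda") # through the CUDA allocator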
THE BIG PICTURE
[Diagram: the Tensor holds a Storage *storage; the Storage holds a
DataPtr data_ptr and an Allocator *allocator; the Allocator
(raw_allocate()/raw_deallocate()) manages the Raw Data.]
The Tensor has a Storage which in turn has a pointer to
the raw data and to the Allocator to allocate memory
according to the destination device.
Section II
JIT
JIT - JUST-IN-TIME COMPILER
PyTorch is eager by design, which means that it is easily
hackable to debug, inspect, etc;
However, this poses problems for optimization and for
decoupling it from Python (the model itself is Python code);
PyTorch 1.0 introduced torch.jit , which has two main
methods to convert a PyTorch model to a serializable and
optimizable format;
TorchScript was also introduced as a statically-typed subset of
Python;
Two very different worlds with their own requirements.
EAGER MODE: prototype, debug, train, experiment.
SCRIPT MODE: optimization, other languages, deployment.
The bridge from eager mode to script mode is tracing or scripting.
TRACING
def my_function(x):
if x.mean() > 1.0:
r = torch.tensor(1.0)
else:
r = torch.tensor(2.0)
return r
>>> ftrace = torch.jit.trace(my_function, (torch.ones(2, 2)))
>>> ftrace.graph
graph(%x : Float(2, 2)) {
%4 : Float() = prim::Constant[value={2}]()
%5 : Device = prim::Constant[value="cpu"]()
%6 : int = prim::Constant[value=6]()
%7 : bool = prim::Constant[value=0]()
%8 : bool = prim::Constant[value=0]()
%9 : Float() = aten::to(%4, %5, %6, %7, %8)
%10 : Float() = aten::detach(%9)
return (%10); }
To call the JIT’ed function, just call the forward() method:
>>> x = torch.ones(2, 2)
>>> ftrace.forward(x)
tensor(2.)
However, tracing will not record control flow such as if statements
or loops; it executes the code with the given inputs and records the
resulting graph. You can see this limitation below:
>>> x = torch.ones(2, 2).add_(1.0)
>>> ftrace.forward(x)
tensor(2.)
According to my_function(), the result should have been 1.0. Tracing
also checks for divergence between the traced and the Python function,
but what about Dropout ?
SCRIPTING
Another alternative is to use scripting, where you can use decorators
such as @torch.jit.script :
@torch.jit.script
def my_function(x):
if bool(x.mean() > 1.0):
r = 1
else:
r = 2
return r
>>> my_function.graph
graph(%x : Tensor) {
%2 : float = prim::Constant[value=1]()
%5 : int = prim::Constant[value=1]()
%6 : int = prim::Constant[value=2]()
%1 : Tensor = aten::mean(%x)
%3 : Tensor = aten::gt(%1, %2)
%4 : bool = prim::Bool(%3)
%r : int = prim::If(%4)
block0() {
-> (%5)
}
block1() {
-> (%6)
}
return (%r);
}
my_function() is now a ScriptModule :
>>> type(my_function)
torch.jit.ScriptModule
When we check the results again:
>>> x = torch.ones(2, 2)
>>> my_function(x)
2
>>> x = torch.ones(2, 2).add_(1.0)
>>> my_function(x)
1
Control-flow logic was preserved!
WHY TORCHSCRIPT?
The concept of having a well-defined Intermediate
Representation (IR) is very powerful; it's the main concept
behind the LLVM platform as well;
This opens the door to:
Decouple the model (computational graph) from the Python runtime;
Use it in production with C++ (no GIL) or other languages;
Capitalize on optimizations (whole program);
Split the hackable, easy-to-debug development world from
the world of putting these models in production and optimizing
them.
BUILDING THE IR
To build the IR, PyTorch leverages the Python Abstract
Syntax Tree (AST), which is a tree representation of the syntactic
structure of the source code.
>>> import ast
>>> import astpretty
>>> ast_mod = ast.parse("print(1 + 2)")
>>> astpretty.pprint(ast_mod.body[0], show_offsets=False)
Expr(
value=Call(
func=Name(id='print', ctx=Load()),
args=[
BinOp(
left=Num(n=1),
op=Add(),
right=Num(n=2),
),
],
keywords=[],
),
)
PYTORCH JIT PHASES
[Diagram: JIT phases: Parsing → Checking → Optimization →
Translation → Execution, starting from the Python AST or from
source code.]
EXECUTING
Just like the Python interpreter executes your code, PyTorch has an
interpreter that executes the IR instructions:
bool runImpl(Stack& stack) {
auto& instructions = function->instructions;
size_t last = instructions.size();
while (pc < last) {
auto& inst = instructions[pc];
try {
loadTensorsFromRegisters(inst.inputs, stack);
size_t new_pc = pc + 1 + inst.callback(stack);
for (int i = inst.outputs.size - 1; i >= 0; --i) {
int reg = get(inst.outputs, i);
registers[reg] = pop(stack);
}
pc = new_pc;
// (...) omitted
OPTIMIZATIONS
Many optimizations can be used on the computational graph of the
model, such as Loop Unrolling:
Before:
for i .. ; i += 1:
    for j .. :
        code(i, j)
After (unrolled by 4, with a remainder loop for the leftover iterations):
for i .. ; i += 4:
    for j .. :
        code(i, j)
        code(i+1, j)
        code(i+2, j)
        code(i+3, j)
OPTIMIZATIONS
Also Peephole optimizations such as:
x.t().t() = x
Example:
def dumb_function(x):
return x.t().t()
>>> traced_fn = torch.jit.trace(dumb_function,
... torch.ones(2,2))
>>> traced_fn.graph_for(torch.ones(2,2))
graph(%x : Float(*, *)) {
return (%x);
}
Other optimizations include Constant Propagation, Dead Code
Elimination (DCE), fusion, inlining, etc.
SERIALIZATION
>>> from torchvision import models
>>> resnet = torch.jit.trace(models.resnet18(),
... torch.rand(1, 3, 224, 224))
>>> resnet.save("resnet.pt")
$ file resnet.pt
resnet.pt: Zip archive data
$ unzip resnet.pt
Archive: resnet.pt
extracting: resnet/version
extracting: resnet/code/resnet.py
extracting: resnet/model.json
extracting: resnet/tensors/0
(...)
code/resnet.py
op_version_set = 0
def forward(self, input_1: Tensor) -> Tensor:
input_2 = torch._convolution(input_1, self.conv1.weight, ...)
# (...)
input_3 = torch.batch_norm(input_2, self.bn1.weight, self.bn1.bias,
self.bn1.running_mean, self.bn1.running_var, ...)
# (...)
model.json
{"parameters":
[{ "isBuffer": false,
"tensorId": "1",
"name": "weight" }],
"name": "conv1",
"optimize": true}
model.json (bn1 entry)
[{"isBuffer": true,
"tensorId": "4",
"name": "running_mean"},
{"isBuffer": true,
"tensorId": "5",
"name": "running_var"}],
"name": "bn1",
"optimize": true}
USING THE MODEL IN C++
PyTorch also has a C++ API that you can use to load/train models in
C++. This is good for production, mobile, embedded devices, etc.
Example of loading a traced model in PyTorch C++ API:
#include <torch/script.h>
int main(int argc, const char* argv[])
{
auto module = torch::jit::load("resnet.pt");
std::vector<torch::jit::IValue> inputs;
inputs.push_back(torch::ones({1, 3, 224, 224}));
at::Tensor output = module->forward(inputs).toTensor();
}
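The same archive can also be loaded back in Python, which is handy
as a quick sanity check:
>>> loaded = torch.jit.load("resnet.pt")
>>> out = loaded(torch.rand(1, 3, 224, 224))
>>> out.shape
torch.Size([1, 1000])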
USING THE MODEL IN NODEJS
Complete tutorial at https://goo.gl/7wMJuS.
Section III
PRODUCTION
ISSUES WITH TUTORIALS
Be careful with online tutorials using Flask, etc. They are simple,
but they often fail on good practices:
They often use JSON and base64 to serialize images. This adds ∼
33% overhead per call (uncompressed);
They don’t pay attention to zero-copy practices, so they often
transform, reshape, convert to numpy, convert to PyTorch, etc;
They often use HTTP/1;
They seldom do batching (important for GPUs);
They never put that "production" code in production.
PREFER BINARY SERIALIZATION FORMATS
Prefer using good binary serialization methods such as Protobuf
that offers a schema and a schema evolution mechanism.
Example from EuclidesDB RPC message:
message AddImageRequest {
int32 image_id = 1;
bytes image_data = 2;
// This field can encode JSON data
bytes image_metadata = 3;
repeated string models = 4;
}
* http://euclidesdb.readthedocs.io
AVOID EXTRA COPIES
Be careful to avoid extra copies of your tensors, especially during
pre-processing;
You can use in-place operations. It is a functional anti-pattern
because it introduces side-effects, but it’s a fair price to pay for
performance;
Caveat: in-place operations doesn’t make much sense when you
need gradients. PyTorch uses tensor versioning to catch that:
>>> a = torch.tensor(1.0, requires_grad=True)
>>> y = a.tanh()
>>> y.add_(2.0)
>>> y.backward() # error !
>>> a._version
0
>>> y._version
1
A TALE OF TWO HTTPS
[Diagram: client/server request timelines comparing HTTP 1.0 (one
request per connection), HTTP 1.1 with pipelining, HTTP 1.1
head-of-line (HoL) blocking, and HTTP 2.0 multiplexing.]
Use HTTP 2.0 if possible, and avoid the head-of-line blocking;
Even better, you can use frameworks such as gRPC, which uses
HTTP/2.0 and Protobuf.
BATCHING
Batching data is a way to amortize the performance bottleneck.
[Diagram: without batching, requests hit the GPU one at a time; with
batching, incoming requests are grouped into batches (Batch 1,
Batch 2, ...) before being sent to the GPU.]
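A minimal sketch of server-side batching (hypothetical helper names;
it assumes each request carries an input tensor and that model is a
loaded ScriptModule):
import queue
import torch

requests = queue.Queue()

def serve(model, max_batch=32, timeout=0.01):
    while True:
        batch = [requests.get()]           # block until the first request arrives
        try:
            while len(batch) < max_batch:  # greedily gather more requests
                batch.append(requests.get(timeout=timeout))
        except queue.Empty:
            pass
        inputs = torch.stack(batch)        # one forward pass for the whole batch
        outputs = model(inputs)
        # ... dispatch outputs[i] back to the caller of request i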
Section IV
Q&A
Q&A
Thanks!

Weitere ähnliche Inhalte

Was ist angesagt?

Deep Learning Tutorial | Deep Learning TensorFlow | Deep Learning With Neural...
Deep Learning Tutorial | Deep Learning TensorFlow | Deep Learning With Neural...Deep Learning Tutorial | Deep Learning TensorFlow | Deep Learning With Neural...
Deep Learning Tutorial | Deep Learning TensorFlow | Deep Learning With Neural...
Simplilearn
 

Was ist angesagt? (20)

PyCUDAの紹介
PyCUDAの紹介PyCUDAの紹介
PyCUDAの紹介
 
Deep Learning Tutorial | Deep Learning TensorFlow | Deep Learning With Neural...
Deep Learning Tutorial | Deep Learning TensorFlow | Deep Learning With Neural...Deep Learning Tutorial | Deep Learning TensorFlow | Deep Learning With Neural...
Deep Learning Tutorial | Deep Learning TensorFlow | Deep Learning With Neural...
 
Pytorch
PytorchPytorch
Pytorch
 
実践!Elasticsearch + Sudachi を用いた全文検索エンジン
実践!Elasticsearch + Sudachi を用いた全文検索エンジン実践!Elasticsearch + Sudachi を用いた全文検索エンジン
実践!Elasticsearch + Sudachi を用いた全文検索エンジン
 
OpenStack上の環境構築自動化に向けたTerraform/Pulumiの活用
OpenStack上の環境構築自動化に向けたTerraform/Pulumiの活用OpenStack上の環境構築自動化に向けたTerraform/Pulumiの活用
OpenStack上の環境構築自動化に向けたTerraform/Pulumiの活用
 
containerdの概要と最近の機能
containerdの概要と最近の機能containerdの概要と最近の機能
containerdの概要と最近の機能
 
余ったPCをルータに変える、ソフトウェアルータ「SEIL/x86」
余ったPCをルータに変える、ソフトウェアルータ「SEIL/x86」余ったPCをルータに変える、ソフトウェアルータ「SEIL/x86」
余ったPCをルータに変える、ソフトウェアルータ「SEIL/x86」
 
輪読資料 Xception: Deep Learning with Depthwise Separable Convolutions
輪読資料 Xception: Deep Learning with Depthwise Separable Convolutions輪読資料 Xception: Deep Learning with Depthwise Separable Convolutions
輪読資料 Xception: Deep Learning with Depthwise Separable Convolutions
 
Hopper アーキテクチャで、変わること、変わらないこと
Hopper アーキテクチャで、変わること、変わらないことHopper アーキテクチャで、変わること、変わらないこと
Hopper アーキテクチャで、変わること、変わらないこと
 
Poincare embeddings for Learning Hierarchical Representations
Poincare embeddings for Learning Hierarchical RepresentationsPoincare embeddings for Learning Hierarchical Representations
Poincare embeddings for Learning Hierarchical Representations
 
コンテナネットワーキング(CNI)最前線
コンテナネットワーキング(CNI)最前線コンテナネットワーキング(CNI)最前線
コンテナネットワーキング(CNI)最前線
 
OpenStackで始めるクラウド環境構築入門(Horizon 基礎編)
OpenStackで始めるクラウド環境構築入門(Horizon 基礎編)OpenStackで始めるクラウド環境構築入門(Horizon 基礎編)
OpenStackで始めるクラウド環境構築入門(Horizon 基礎編)
 
Introduction to TensorFlow
Introduction to TensorFlowIntroduction to TensorFlow
Introduction to TensorFlow
 
DLLAB Engineer Days : ONNX Export & Optimize
DLLAB Engineer Days : ONNX Export & Optimize DLLAB Engineer Days : ONNX Export & Optimize
DLLAB Engineer Days : ONNX Export & Optimize
 
DockerコンテナでGitを使う
DockerコンテナでGitを使うDockerコンテナでGitを使う
DockerコンテナでGitを使う
 
Profiling PyTorch for Efficiency & Sustainability
Profiling PyTorch for Efficiency & SustainabilityProfiling PyTorch for Efficiency & Sustainability
Profiling PyTorch for Efficiency & Sustainability
 
P2P Container Image Distribution on IPFS With containerd and nerdctl
P2P Container Image Distribution on IPFS With containerd and nerdctlP2P Container Image Distribution on IPFS With containerd and nerdctl
P2P Container Image Distribution on IPFS With containerd and nerdctl
 
捗るリコメンドシステムの裏事情(ハッカドール)
捗るリコメンドシステムの裏事情(ハッカドール)捗るリコメンドシステムの裏事情(ハッカドール)
捗るリコメンドシステムの裏事情(ハッカドール)
 
モデル高速化百選
モデル高速化百選モデル高速化百選
モデル高速化百選
 
CPythonを読もう
CPythonを読もうCPythonを読もう
CPythonを読もう
 

Ähnlich wie PyTorch under the hood

a quick Introduction to PyPy
a quick Introduction to PyPya quick Introduction to PyPy
a quick Introduction to PyPy
Kai Aras
 
Python Programming-1.pptx of python by computer
Python Programming-1.pptx of python by computerPython Programming-1.pptx of python by computer
Python Programming-1.pptx of python by computer
sharanyarashmir5
 

Ähnlich wie PyTorch under the hood (20)

data-microscopes
data-microscopesdata-microscopes
data-microscopes
 
PyTorch 튜토리얼 (Touch to PyTorch)
PyTorch 튜토리얼 (Touch to PyTorch)PyTorch 튜토리얼 (Touch to PyTorch)
PyTorch 튜토리얼 (Touch to PyTorch)
 
a quick Introduction to PyPy
a quick Introduction to PyPya quick Introduction to PyPy
a quick Introduction to PyPy
 
chapter1(3).pdf
chapter1(3).pdfchapter1(3).pdf
chapter1(3).pdf
 
Python packaging: how did we get here, and where are we going?
Python packaging: how did we get here, and where are we going?Python packaging: how did we get here, and where are we going?
Python packaging: how did we get here, and where are we going?
 
Current State of Python Packaging
Current State of Python PackagingCurrent State of Python Packaging
Current State of Python Packaging
 
Scientist meets web dev: how Python became the language of data
Scientist meets web dev: how Python became the language of dataScientist meets web dev: how Python became the language of data
Scientist meets web dev: how Python became the language of data
 
Analyzing and Sharing HDF5 Data with Python
Analyzing and Sharing HDF5 Data with PythonAnalyzing and Sharing HDF5 Data with Python
Analyzing and Sharing HDF5 Data with Python
 
Python Programming.pptx
Python Programming.pptxPython Programming.pptx
Python Programming.pptx
 
PyPy's approach to construct domain-specific language runtime
PyPy's approach to construct domain-specific language runtimePyPy's approach to construct domain-specific language runtime
PyPy's approach to construct domain-specific language runtime
 
Getting Started with Python
Getting Started with PythonGetting Started with Python
Getting Started with Python
 
"Java 8, Lambda e la programmazione funzionale" by Theodor Dumitrescu
"Java 8, Lambda e la programmazione funzionale" by Theodor Dumitrescu"Java 8, Lambda e la programmazione funzionale" by Theodor Dumitrescu
"Java 8, Lambda e la programmazione funzionale" by Theodor Dumitrescu
 
PYTHON FOR BEGINNERS (BASICS OF PYTHON)
PYTHON FOR BEGINNERS (BASICS OF PYTHON)PYTHON FOR BEGINNERS (BASICS OF PYTHON)
PYTHON FOR BEGINNERS (BASICS OF PYTHON)
 
Python_Haegl.powerpoint presentation. tx
Python_Haegl.powerpoint presentation. txPython_Haegl.powerpoint presentation. tx
Python_Haegl.powerpoint presentation. tx
 
A Data Science Tutorial in Python
A Data Science Tutorial in PythonA Data Science Tutorial in Python
A Data Science Tutorial in Python
 
AI Deeplearning Programming
AI Deeplearning ProgrammingAI Deeplearning Programming
AI Deeplearning Programming
 
A Whirlwind Tour Of Python
A Whirlwind Tour Of PythonA Whirlwind Tour Of Python
A Whirlwind Tour Of Python
 
Introduction to python 3
Introduction to python 3Introduction to python 3
Introduction to python 3
 
Python Programming-1.pptx of python by computer
Python Programming-1.pptx of python by computerPython Programming-1.pptx of python by computer
Python Programming-1.pptx of python by computer
 
Python Tutorial .pdf
Python Tutorial .pdfPython Tutorial .pdf
Python Tutorial .pdf
 

Mehr von Christian Perone

Python - Introdução Básica
Python - Introdução BásicaPython - Introdução Básica
Python - Introdução Básica
Christian Perone
 

Mehr von Christian Perone (10)

Gradient-based optimization for Deep Learning: a short introduction
Gradient-based optimization for Deep Learning: a short introductionGradient-based optimization for Deep Learning: a short introduction
Gradient-based optimization for Deep Learning: a short introduction
 
Bayesian modelling for COVID-19 seroprevalence studies
Bayesian modelling for COVID-19 seroprevalence studiesBayesian modelling for COVID-19 seroprevalence studies
Bayesian modelling for COVID-19 seroprevalence studies
 
Uncertainty Estimation in Deep Learning
Uncertainty Estimation in Deep LearningUncertainty Estimation in Deep Learning
Uncertainty Estimation in Deep Learning
 
Word Embeddings - Introduction
Word Embeddings - IntroductionWord Embeddings - Introduction
Word Embeddings - Introduction
 
Apache Spark - Intro to Large-scale recommendations with Apache Spark and Python
Apache Spark - Intro to Large-scale recommendations with Apache Spark and PythonApache Spark - Intro to Large-scale recommendations with Apache Spark and Python
Apache Spark - Intro to Large-scale recommendations with Apache Spark and Python
 
Deep Learning - Convolutional Neural Networks - Architectural Zoo
Deep Learning - Convolutional Neural Networks - Architectural ZooDeep Learning - Convolutional Neural Networks - Architectural Zoo
Deep Learning - Convolutional Neural Networks - Architectural Zoo
 
Deep Learning - Convolutional Neural Networks
Deep Learning - Convolutional Neural NetworksDeep Learning - Convolutional Neural Networks
Deep Learning - Convolutional Neural Networks
 
Machine Learning com Python e Scikit-learn
Machine Learning com Python e Scikit-learnMachine Learning com Python e Scikit-learn
Machine Learning com Python e Scikit-learn
 
Python - Introdução Básica
Python - Introdução BásicaPython - Introdução Básica
Python - Introdução Básica
 
C++0x :: Introduction to some amazing features
C++0x :: Introduction to some amazing featuresC++0x :: Introduction to some amazing features
C++0x :: Introduction to some amazing features
 

Kürzlich hochgeladen

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Enterprise Knowledge
 

Kürzlich hochgeladen (20)

A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 

PyTorch under the hood

  • 1. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A PyTorch under the hood A guide to understand PyTorch internals Christian S. Perone (christian.perone@gmail.com) http://blog.christianperone.com PyData Montreal, Feb 2019
  • 2. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A Agenda TENSORS Tensors Python objects Zero-copy Tensor storage Memory allocators (CPU/GPU) The big picture JIT Just-in-time compiler Tracing Scripting Why TorchScript ? Building IR and JIT Phases Optimizations Serialization Using models in other languages PRODUCTION Some tips Q&A
  • 3. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A WHO AM I Christian S. Perone 14 years working with Machine Learning, Data Science and Software Engineering in industry R&D Blog at blog.christianperone.com Open-source projects at https://github.com/perone Twitter @tarantulae
  • 4. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A DISCLAIMER PyTorch is a moving target, Deep Learning ecosystem moves fast and big changes happens every week;
  • 5. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A DISCLAIMER PyTorch is a moving target, Deep Learning ecosystem moves fast and big changes happens every week; This is not a talk to teach you the basics of PyTorch or how to train your network, but to teach you how PyTorch components works under the hood in a intuitive way;
  • 6. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A DISCLAIMER PyTorch is a moving target, Deep Learning ecosystem moves fast and big changes happens every week; This is not a talk to teach you the basics of PyTorch or how to train your network, but to teach you how PyTorch components works under the hood in a intuitive way; This talk is updated to the PyTorch v.1.0.1 version;
  • 7. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A Section I TENSORS
  • 8. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A TENSORS Simply put, TENSORS are a generalization of vectors and matrices. In PyTorch, they are a multi-dimensional matrix containing elements of a single data type.
  • 9. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A TENSORS Simply put, TENSORS are a generalization of vectors and matrices. In PyTorch, they are a multi-dimensional matrix containing elements of a single data type. >>> import torch >>> t = torch.tensor([[1., -1.], [1., -1.]]) >>> t tensor([[ 1., -1.] [ 1., -1.]])
  • 10. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A TENSORS Simply put, TENSORS are a generalization of vectors and matrices. In PyTorch, they are a multi-dimensional matrix containing elements of a single data type. >>> import torch >>> t = torch.tensor([[1., -1.], [1., -1.]]) >>> t tensor([[ 1., -1.] [ 1., -1.]]) >>> t.dtype # They have a type torch.float32
  • 11. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A TENSORS Simply put, TENSORS are a generalization of vectors and matrices. In PyTorch, they are a multi-dimensional matrix containing elements of a single data type. >>> import torch >>> t = torch.tensor([[1., -1.], [1., -1.]]) >>> t tensor([[ 1., -1.] [ 1., -1.]]) >>> t.dtype # They have a type torch.float32 >>> t.shape # a shape torch.Size([2, 2])
  • 12. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A TENSORS Simply put, TENSORS are a generalization of vectors and matrices. In PyTorch, they are a multi-dimensional matrix containing elements of a single data type. >>> import torch >>> t = torch.tensor([[1., -1.], [1., -1.]]) >>> t tensor([[ 1., -1.] [ 1., -1.]]) >>> t.dtype # They have a type torch.float32 >>> t.shape # a shape torch.Size([2, 2]) >>> t.device # and live in some device device(type='cpu')
  • 13. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A TENSORS Although PyTorch has an elegant python first design, all PyTorch heavy work is actually implemented in C++. In Python, the integration of C++ code is (usually) done using what is called an extension;
  • 14. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A TENSORS Although PyTorch has an elegant python first design, all PyTorch heavy work is actually implemented in C++. In Python, the integration of C++ code is (usually) done using what is called an extension; PyTorch uses ATen, which is the foundational tensor operation library on which all else is built;
  • 15. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A TENSORS Although PyTorch has an elegant python first design, all PyTorch heavy work is actually implemented in C++. In Python, the integration of C++ code is (usually) done using what is called an extension; PyTorch uses ATen, which is the foundational tensor operation library on which all else is built; To do automatic differentiation, PyTorch uses Autograd, which is an augmentation on top of the ATen framework;
  • 16. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A TENSORS Although PyTorch has an elegant python first design, all PyTorch heavy work is actually implemented in C++. In Python, the integration of C++ code is (usually) done using what is called an extension; PyTorch uses ATen, which is the foundational tensor operation library on which all else is built; To do automatic differentiation, PyTorch uses Autograd, which is an augmentation on top of the ATen framework; In the Python API, PyTorch previously had separate Variable and a Tensor types, after v.0.4.0 they were merged into Tensor .
  • 17. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A QUICK RECAP PYTHON OBJECTS typedef struct { PyObject_HEAD double ob_fval; } PyFloatObject;
  • 18. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A QUICK RECAP PYTHON OBJECTS typedef struct { PyObject_HEAD double ob_fval; } PyFloatObject; typedef struct _object { Py_ssize_t ob_refcnt; struct _typeobject *ob_type; } PyObject;
  • 19. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A QUICK RECAP PYTHON OBJECTS typedef struct { PyObject_HEAD double ob_fval; } PyFloatObject; typedef struct _object { Py_ssize_t ob_refcnt; struct _typeobject *ob_type; } PyObject; struct _typeobject *ob_type Py_ssize_t ob_refcnt objectPyObject double ob_fval PyObject_HEAD objectPyFloatObject
  • 21. QUICK RECAP PYTHON OBJECTS
struct THPVariable {
    PyObject_HEAD
    torch::autograd::Variable cdata;
    PyObject* backward_hooks;
};

[diagram: two names, variable_a and variable_b, both pointing to the same THPVariable object (PyObject_HEAD with the ref counter); its reference count is 2]

The TH prefix is from TorcH, and P means Python.
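You can observe this reference counting from Python; a minimal sketch using sys.getrefcount (which reports one extra reference for its own argument):
>>> import sys
>>> t = torch.ones(2, 2)
>>> sys.getrefcount(t)  # 't' plus getrefcount's own argument
2
>>> alias = t  # a second name bound to the same THPVariable
>>> sys.getrefcount(t)
3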
  • 24. IN PYTHON, EVERYTHING IS AN OBJECT
>>> a = 300
>>> b = 300
>>> a is b
False
>>> a = 200
>>> b = 200
>>> a is b
True

[diagram: with 300, a and b point to two distinct PyIntObjects, each with ref count 1; with 200, both names point to one shared PyIntObject with ref count 2]

A typical Python program spends much of its time allocating/deallocating integers, so CPython caches the small integers (from -5 to 256).
  • 28. ZERO-COPYING TENSORS
It is very common to load tensors as numpy arrays and convert them to PyTorch, or vice-versa;
>>> np_array = np.ones((2,2))
>>> np_array
array([[1., 1.],
       [1., 1.]])
>>> torch_array = torch.tensor(np_array)
>>> torch_array
tensor([[1., 1.],
        [1., 1.]], dtype=torch.float64)
>>> torch_array.add_(1.0)
>>> np_array
array([[1., 1.],   # array is intact, a copy was made
       [1., 1.]])

A trailing underscore on an operation (like add_) means an in-place operation.
  • 29. ZERO-COPYING TENSORS
Now imagine that you have a batch of 128 images, 3 channels each (RGB) and with a size of 224x224;
[figure: a 3D tensor with Row, Column and Channel axes]
This will yield a size in memory of ∼74MB (128 x 3 x 224 x 224 elements x 4 bytes for float32). We don't want to duplicate memory (except when copying them to discrete GPUs, of course);
  • 33. ZERO-COPYING TENSORS
Let's now see a slightly different code using the function torch.from_numpy() this time:
>>> np_array
array([[1., 1.],
       [1., 1.]])
>>> torch_array = torch.from_numpy(np_array)
>>> torch_array.add_(1.0)
>>> np_array
array([[2., 2.],
       [2., 2.]])
The original numpy array was changed, because it used a zero-copy operation.
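Zero-copy also works in the other direction: on CPU, Tensor.numpy() returns a numpy view over the same storage. A minimal sketch:
>>> t = torch.ones(2, 2)
>>> n = t.numpy()  # shares memory with t, no copy
>>> t.add_(1.0)  # in-place change on the tensor...
>>> n  # ...is visible through the numpy view
array([[2., 2.],
       [2., 2.]], dtype=float32)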
  • 37. ZERO-COPYING TENSORS
The difference between in-place and standard operations might not be so clear in some cases:
>>> np_array
array([[1., 1.],
       [1., 1.]])
>>> torch_array = torch.from_numpy(np_array)
>>> np_array = np_array + 1.0
>>> torch_array
tensor([[1., 1.],
        [1., 1.]], dtype=torch.float64)
np_array + 1.0 allocates a new array and rebinds the name, so the shared buffer is untouched. However, if you use np_array += 1.0, that is an in-place operation that will change torch_array's memory.
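A minimal sketch of that caveat, showing the true in-place mutation on the shared buffer:
>>> np_array = np.ones((2, 2))
>>> torch_array = torch.from_numpy(np_array)  # shared buffer
>>> np_array += 1.0  # writes the shared buffer in place
>>> torch_array
tensor([[2., 2.],
        [2., 2.]], dtype=torch.float64)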
  • 38. ZERO-COPYING TENSORS
at::Tensor tensor_from_numpy(PyObject* obj) {
  // (...) - omitted for brevity
  auto array = (PyArrayObject*)obj;
  int ndim = PyArray_NDIM(array);
  auto sizes = to_aten_shape(ndim, PyArray_DIMS(array));
  auto strides = to_aten_shape(ndim, PyArray_STRIDES(array));
  // (...) - omitted for brevity
  void* data_ptr = PyArray_DATA(array);
  auto& type = CPU(dtype_to_aten(PyArray_TYPE(array)));
  Py_INCREF(obj);
  return type.tensorFromBlob(data_ptr, sizes, strides, [obj](void* data) {
    AutoGIL gil;
    Py_DECREF(obj);
  });
}
Pay attention to the reference counting using Py_INCREF() and the call to the tensorFromBlob() function.
  • 39. DATA POINTERS
[diagram: a PyArrayObject and a FloatTensor, each with their object fields and a data_pointer* pointing to the same raw buffer]
The FloatTensor copied the numpy array's data pointer, not its contents. The reference is kept safe by the Python reference counting mechanism.
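You can verify this pointer sharing from Python; a minimal sketch (numpy's __array_interface__ exposes the buffer address):
>>> np_array = np.ones((2, 2))
>>> t = torch.from_numpy(np_array)
>>> t.data_ptr() == np_array.__array_interface__['data'][0]
True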
  • 42. TENSOR STORAGE
The abstraction responsible for holding the data isn't actually the Tensor, but the Storage.
struct C10_API StorageImpl final : (...) {
  // (...)
 private:
  // (...)
  caffe2::TypeMeta data_type_;
  DataPtr data_ptr_;
  int64_t numel_;
  Allocator* allocator_;
}
It holds a pointer to the raw data and contains information such as the size and the allocator;
Storage is a dumb abstraction: there is no metadata telling us how to interpret the data it holds;
  • 46. TENSOR STORAGE
The Storage abstraction is very powerful because it decouples the raw data from how we interpret it;
We can have multiple tensors sharing the same storage but with different interpretations, also called views, without duplicating memory:
>>> tensor_a = torch.ones((2, 2))
>>> tensor_b = tensor_a.view(4)
>>> tensor_a_data = tensor_a.storage().data_ptr()
>>> tensor_b_data = tensor_b.storage().data_ptr()
>>> tensor_a_data == tensor_b_data
True
tensor_b is a different view (interpretation) of the same data present in the underlying storage that is shared between both tensors.
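Because the storage is shared, a write through one view is visible through the other; a minimal sketch continuing the session above:
>>> tensor_b[0] = 42.0  # write through the flat view...
>>> tensor_a[0, 0]  # ...shows up in the 2x2 view
tensor(42.)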
  • 48. MEMORY ALLOCATORS (CPU/GPU)
The tensor storage can be allocated either in CPU memory or on the GPU, therefore a mechanism is required to switch between these different allocations:
struct Allocator {
  virtual ~Allocator() {}
  virtual DataPtr allocate(size_t n) const = 0;
  virtual DeleterFnPtr raw_deleter() const {...}
  void* raw_allocate(size_t n) {...}
  void raw_deallocate(void* ptr) {...}
};
There are Allocators that will use GPU allocators such as cudaMallocHost() when the storage should be used for the GPU, or POSIX functions such as posix_memalign() for data in CPU memory.
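From Python the allocator choice is mostly visible indirectly; for instance, pin_memory() requests page-locked host memory (the kind cudaMallocHost() provides) so host-to-GPU copies can be faster and asynchronous. A minimal sketch (requires a CUDA build of PyTorch):
>>> t = torch.ones(1000)
>>> pinned = t.pin_memory()  # page-locked host memory
>>> pinned.is_pinned()
True
>>> # pinned.to("cuda", non_blocking=True) would then do an async copy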
  • 49. THE BIG PICTURE
[diagram: a Tensor holds a Storage *storage; the Storage holds a DataPtr data_ptr and an Allocator *allocator with raw_allocate()/raw_deallocate() that manage the raw data]
The Tensor has a Storage which in turn has a pointer to the raw data and to the Allocator that allocates memory according to the destination device.
  • 50. Section II: JIT
  • 53. JIT - JUST-IN-TIME COMPILER
PyTorch is eager by design, which means that it is easily hackable to debug, inspect, etc;
However, this poses problems for optimization and for decoupling it from Python (the model itself is Python code);
PyTorch 1.0 introduced torch.jit, which has two main methods to convert a PyTorch model to a serializable and optimizable format; TorchScript was also introduced as a statically-typed subset of Python;
  • 54. JIT - JUST-IN-TIME COMPILER
Two very different worlds with their own requirements:
EAGER MODE: prototype, debug, train, experiment;
SCRIPT MODE: optimization, other languages, deployment.
[diagram: tracing and scripting are the two paths from eager mode to script mode]
  • 57. TRACING
def my_function(x):
    if x.mean() > 1.0:
        r = torch.tensor(1.0)
    else:
        r = torch.tensor(2.0)
    return r

>>> ftrace = torch.jit.trace(my_function, (torch.ones(2, 2)))
>>> ftrace.graph
graph(%x : Float(2, 2)) {
  %4 : Float() = prim::Constant[value={2}]()
  %5 : Device = prim::Constant[value="cpu"]()
  %6 : int = prim::Constant[value=6]()
  %7 : bool = prim::Constant[value=0]()
  %8 : bool = prim::Constant[value=0]()
  %9 : Float() = aten::to(%4, %5, %6, %7, %8)
  %10 : Float() = aten::detach(%9)
  return (%10);
}
  • 59. TRACING
To call the JIT'ed function, just call the forward() method:
>>> x = torch.ones(2, 2)
>>> ftrace.forward(x)
tensor(2.)
However, tracing will not record any control flow like if statements or loops; it executes the code with the given context and creates the graph. You can see this limitation below:
>>> x = torch.ones(2, 2).add_(1.0)
>>> ftrace.forward(x)
tensor(2.)
According to my_function(), the result should have been 1.0. Tracing also checks for differences between the traced and the Python function, but what about Dropout?
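That consistency check can be exercised explicitly; a minimal sketch using the check_inputs argument of torch.jit.trace (an input that takes the other branch makes the tracer report the divergence, as a warning or an error depending on the version):
>>> ftrace = torch.jit.trace(my_function,
...                          (torch.ones(2, 2),),
...                          check_inputs=[(torch.full((2, 2), 3.0),)])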
  • 60. SCRIPTING
Another alternative is to use scripting, where you can use decorators such as @torch.jit.script:
@torch.jit.script
def my_function(x):
    if bool(x.mean() > 1.0):
        r = 1
    else:
        r = 2
    return r
  • 61. SCRIPTING
>>> my_function.graph
graph(%x : Tensor) {
  %2 : float = prim::Constant[value=1]()
  %5 : int = prim::Constant[value=1]()
  %6 : int = prim::Constant[value=2]()
  %1 : Tensor = aten::mean(%x)
  %3 : Tensor = aten::gt(%1, %2)
  %4 : bool = prim::Bool(%3)
  %r : int = prim::If(%4)
    block0() {
      -> (%5)
    }
    block1() {
      -> (%6)
    }
  return (%r);
}
  • 64. SCRIPTING
The my_function() is now a ScriptModule:
>>> type(my_function)
torch.jit.ScriptModule
When we check the results again:
>>> x = torch.ones(2, 2)
>>> my_function(x)
2
>>> x = torch.ones(2, 2).add_(1.0)
>>> my_function(x)
1
Control-flow logic was preserved!
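Scripting also works for whole modules; in the 1.0 API you subclass torch.jit.ScriptModule and mark methods with @torch.jit.script_method. A minimal sketch (the module itself is illustrative):
class MyGate(torch.jit.ScriptModule):
    def __init__(self):
        super(MyGate, self).__init__()
        self.linear = torch.nn.Linear(4, 1)

    @torch.jit.script_method
    def forward(self, x):
        # data-dependent control flow is preserved in the IR
        y = self.linear(x)
        if bool(y.sum() > 0):
            return y
        return -y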
  • 69. WHY TORCHSCRIPT?
The concept of having a well-defined Intermediate Representation (IR) is very powerful; it is also the main concept behind the LLVM platform;
This opens the door to:
Decouple the model (computational graph) from the Python runtime;
Use it in production with C++ (no GIL) or other languages;
Capitalize on optimizations (whole program);
Split the hackable, easy-to-debug development world from the world of putting these models in production and optimizing them.
  • 70. BUILDING THE IR
To build the IR, PyTorch leverages the Python Abstract Syntax Tree (AST), which is a tree representation of the syntactic structure of the source code.
>>> ast_mod = ast.parse("print(1 + 2)")
>>> astpretty.pprint(ast_mod.body[0], show_offsets=False)
Expr(
    value=Call(
        func=Name(id='print', ctx=Load()),
        args=[
            BinOp(
                left=Num(n=1),
                op=Add(),
                right=Num(n=2),
            ),
        ],
        keywords=[],
    ),
)
  • 71. BUILDING THE IR
[diagram: the AST of print(1 + 2) drawn as a tree]
  • 72. PYTORCH JIT PHASES
Code or AST -> Parsing -> Checking -> Optimization -> Translation -> Execution
  • 73. EXECUTING
Just like the Python interpreter executes your code, PyTorch has an interpreter that executes the IR instructions:
bool runImpl(Stack& stack) {
  auto& instructions = function->instructions;
  size_t last = instructions.size();
  while (pc < last) {
    auto& inst = instructions[pc];
    try {
      loadTensorsFromRegisters(inst.inputs, stack);
      size_t new_pc = pc + 1 + inst.callback(stack);
      for (int i = inst.outputs.size - 1; i >= 0; --i) {
        int reg = get(inst.outputs, i);
        registers[reg] = pop(stack);
      }
      pc = new_pc;
// (...) omitted
  • 74. OPTIMIZATIONS
Many optimizations can be used on the computational graph of the model, such as Loop Unrolling:

Before:
for i ..; i += 1:
    for j ..:
        code(i, j)

After (unrolled by a factor of 4, plus a remainder loop):
for i ..; i += 4:
    for j ..:
        code(i, j)
        code(i+1, j)
        code(i+2, j)
        code(i+3, j)
  • 77. OPTIMIZATIONS
Also peephole optimizations such as: x.t().t() = x
Example:
def dumb_function(x):
    return x.t().t()

>>> traced_fn = torch.jit.trace(dumb_function,
...                             torch.ones(2,2))
>>> traced_fn.graph_for(torch.ones(2,2))
graph(%x : Float(*, *)) {
  return (%x);
}
Other optimizations include Constant Propagation, Dead Code Elimination (DCE), fusion, inlining, etc.
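Constant Propagation can be observed the same way; a minimal sketch (the exact graph output varies by version), where a constant subexpression should be folded into a single prim::Constant instead of remaining as arithmetic nodes:
@torch.jit.script
def fold_me(x):
    return x + (2 * 3)  # 2 * 3 gets folded to the constant 6

>>> fold_me.graph_for(torch.ones(2, 2))  # look for prim::Constant[value=6]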
  • 80. SERIALIZATION
>>> resnet = torch.jit.trace(models.resnet18(),
...                          torch.rand(1, 3, 224, 224))
>>> resnet.save("resnet.pt")

$ file resnet.pt
resnet.pt: Zip archive data

$ unzip resnet.pt
Archive: resnet.pt
 extracting: resnet/version
 extracting: resnet/code/resnet.py
 extracting: resnet/model.json
 extracting: resnet/tensors/0
(...)
  • 83. SERIALIZATION
code/resnet.py:
op_version_set = 0
def forward(self, input_1: Tensor) -> Tensor:
    input_2 = torch._convolution(input_1, self.conv1.weight, ...)
    # (...)
    input_3 = torch.batch_norm(input_2, self.bn1.weight, self.bn1.bias,
                               self.bn1.running_mean, self.bn1.running_var, ...)
    # (...)

model.json:
{"parameters": [{
    "isBuffer": false,
    "tensorId": "1",
    "name": "weight"
}], "name": "conv1", "optimize": true}

model.json:
[{"isBuffer": true, "tensorId": "4", "name": "running_mean"},
 {"isBuffer": true, "tensorId": "5", "name": "running_var"}],
 "name": "bn1", "optimize": true}
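Loading the archive back in Python is symmetric; a minimal sketch (continuing the resnet example above):
>>> loaded = torch.jit.load("resnet.pt")  # a ScriptModule again
>>> out = loaded(torch.rand(1, 3, 224, 224))
>>> out.shape
torch.Size([1, 1000])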
  • 84. USING THE MODEL IN C++
PyTorch also has a C++ API that you can use to load/train models in C++. This is good for production, mobile, embedded devices, etc.
Example of loading a traced model in the PyTorch C++ API:
#include <torch/script.h>

int main(int argc, const char* argv[]) {
  auto module = torch::jit::load("resnet.pt");
  std::vector<torch::jit::IValue> inputs;
  inputs.push_back(torch::ones({1, 3, 224, 224}));
  at::Tensor output = module->forward(inputs).toTensor();
}
  • 85. USING THE MODEL IN NODEJS
Complete tutorial at https://goo.gl/7wMJuS.
  • 86. Section III: PRODUCTION
  • 91. ISSUES WITH TUTORIALS
Be careful with online tutorials using Flask, etc. They are simple, but they often fail on good practices:
They often use JSON and base64 to serialize images. This adds ∼33% overhead per call (uncompressed);
They don't pay attention to zero-copy practices, so they often transform, reshape, convert to numpy, convert to PyTorch, etc;
They often use HTTP/1;
They seldom do batching (important for GPUs);
They never put that "production" code in production.
  • 92. PREFER BINARY SERIALIZATION FORMATS
Prefer good binary serialization methods such as Protobuf, which offers a schema and a schema-evolution mechanism. Example from an EuclidesDB RPC message:
message AddImageRequest {
    int32 image_id = 1;
    bytes image_data = 2;
    // This field can encode JSON data
    bytes image_metadata = 3;
    repeated string models = 4;
}
* http://euclidesdb.readthedocs.io
  • 95. AVOID EXTRA COPIES
Be careful to avoid extra copies of your tensors, especially during pre-processing;
You can use in-place operations. It is a functional anti-pattern because it introduces side-effects, but it's a fair price to pay for performance;
Caveat: in-place operations don't make much sense when you need gradients. PyTorch uses tensor versioning to catch that:
>>> a = torch.tensor(1.0, requires_grad=True)
>>> y = a.tanh()
>>> y.add_(2.0)
>>> y.backward()  # error!
>>> a._version
0
>>> y._version
1
  • 101. A TALE OF TWO HTTPS
[diagram: request/response timelines for HTTP 1.0, HTTP 1.1 with pipelining, HTTP 1.1 head-of-line (HoL) blocking, and HTTP 2.0 multiplexing]
Use HTTP 2.0 if possible, and avoid head-of-line blocking;
Even better, you can use frameworks such as gRPC, which uses HTTP/2.0 and Protobuf.
  • 103. BATCHING
Batching data is a way to amortize the performance bottleneck.
[diagram: without batching, requests hit the GPU one at a time; with batching, requests are grouped into Batch 1 and Batch 2 before a single GPU pass each]
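A minimal sketch of server-side batching (all names here are illustrative, not a real serving framework): collect requests for a short window, stack them into one tensor, and run a single forward pass:
import queue
import torch

request_queue = queue.Queue()  # filled by the request handlers (hypothetical)
MAX_BATCH = 32

def serve_batches(model):
    while True:
        # block for the first request, then drain whatever else arrived
        items = [request_queue.get()]
        while len(items) < MAX_BATCH and not request_queue.empty():
            items.append(request_queue.get())
        inputs = torch.stack([item.tensor for item in items])
        with torch.no_grad():
            outputs = model(inputs)  # one forward pass for the whole batch
        for item, out in zip(items, outputs):
            item.reply(out)  # hypothetical callback answering each request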
  • 104. Section IV: Q&A
  • 105. Q&A
Thanks!