Chainer v3

Chainer v3
Chainer Meetup #06 @ PFN, Sep. 30, 2017
Seiya Tokui @ Preferred Networks

Recent/coming releases
• Chainer v3.0.0 RC, v2.1.0: Sep. 12
• v3 RC was the 50th release!
• CuPy v2.0.0 RC, v1.0.3 on the same day
• Next release: Chainer v3.0.0 and v4.0.0α on Oct. 17
• CuPy v2.0.0 and v3.0.0α on the same day
• Today, I mainly talk about the features of CuPy v2.0.0 RC and
Chainer v3.0.0 RC

Chainer v3.0.0rc1
• For most users, the backward compatibility is maintained
• See the release notes of v3.0.0rc1 for some small breaks that do not
affect most users
• The inner-working is greatly changed
• It may cause some existing code that directly touches the
computational graphs broken
• Thanks to this change, we now support double backprop
(a.k.a. gradient of gradients) as announced

Double backprop
• Automatic backpropagation through gradients
• When is it needed?
• Consider a loss function that includes a gradient computation as a
term/factor
• E.g. the loss function for WGAN-GP:
𝔼 𝑥∼ℙ 𝑔
𝐷 𝑥 − 𝔼 𝑥∼ℙ 𝑟
𝐷 𝑥 + 𝜆𝔼 𝑥∼ℙ 𝑥
𝛻𝑥 𝐷 𝑥 2 − 1 2
• To take the gradient of this loss function, we need to do backprop
through 𝛻𝑥 𝐷( 𝑥), which itself we want to compute with backprop!
gradient

Double backprop in Chainer v3
• Many functions now support double backprop
• Those functions are rewritten to implement a new interface named
FunctionNode (such functions are called new-style Functions)
• backward() takes Variable instead of ndarray as grad_outputs
and return values, which means backward() itself can be
differentiated
• Variable has now an attribute grad_var, which represents
the gradient as a Variable (so that we can use it in the
computational graph)

How to implement WGAN-GP
1. Using Variable.backward()
x_tilde = generator(z)
x_hat = x + u * (x_tilde – x)
D(x_hat).backward(enable_double_backprop=True)
# 1st diff
gp = lambda * (x_hat.grad_var – 1) ** 2
loss = D(x_tilde) – D(x) + gp
model.cleargrads() # to clear the 1st diff of params
loss.backward() # 2nd diff

How to implement WGAN-GP
2. Using grad()
x_tilde = generator(z)
x_hat = x + u * (x_tilde – x)
gx_hat, = chainer.grad([D(x_hat)], [x_hat],
enable_double_backprop=True) # 1st diff
gp = lambda * (gx_hat – 1) ** 2
loss = D(x_tilde) – D(x) + gp
loss.backward() # 2nd diff
This version is more efficient because grad() can skip the gradient
computation for parameters (thus also we can drop cleargrads()).

New-style Function support
• Most “standard” functions are now ported to the new-style
interface:
+, -, *, Convolution2D, Deconvolution2D, EmbedID, Linear,
LSTM, BatchNormalization, sigmoid, relu, leaky_relu, softmax,
log_softmax, tanh, exp, mean_squared_error,
softmax_cross_entropy, dropout, layer_normalization,
transpose, reshape, broadcast_to, sum, concat, __getitem__,
etc…
• We are still working on widening the double backprop
support. Contributions are also welcome!!

Other features
• Functions: layer_normalization, selu, arctan2, prod,
NumPy-compatible matmul
• Links: ChildSumTreeLSTM, NaryTreeLSTM,
BatchRenormalization
• Other new features: LeCunNormal, as_variable(),
Variable.array, strict option of load_npz(), etc.

CuPy v2.0.0rc1
• Sparse matrix support
• Complex number support
• Improved memory allocator
• Many new functions, esp. of linear algebra routines

Sparse matrix support
• cupy.sparse --- the sparse matrix support with APIs
compatible to scipy.sparse
• CSR/CSC/COO and diagonal format
• Basic arithmetics, matrix product, element indexing
• Slicing along the major axis
• Dense <-> Sparse conversion

Complex number support
• CuPy now supports complex numbers!
• Dtypes complex32, complex64, complex128 are now available
• Routines related to complex numbers:
angle, conj, imag, real

Linear algebra routines
• Solvers, matrix inversion, determinant, eigenvalues, etc.:
solve, tensorsolve, inv, pinv, det, slogdet, eigh,
eigvalsh, matrix_rank
• All under cupy.linalg namespace
• einsum is also supported (thanks, @fukatani!)
• Flexible tensor product/reduction based on Einstein convention

Improved memory allocator
• The memory pool is greatly improved
• It now uses “best-fit with coalescing” algorithm
• The memory region is reused even if the size does not exactly match
• It may also contribute to the speed improvement, thanks to the
reduced number of reallocations
• Example: the new seq2seq example originally uses all the
memory of 12GB GPU, whose usage is reduced to 3GB, and
also the execution time is reduced by appx. 25%.

Next versions
• As you may know, we slightly changed the release policy
again; the stable releases may now include some new
features (thus v2.1.0 instead of v2.0.3).
• v4 is scheduled based on our release policy: v4.0.0 will be
three months after v3.0.0 (which will be mid Jan. if there is no
delay).
• The core features of v4 is not determined yet; let’s have
discussions!

Chainer v3

Empfohlen

Empfohlen

Weitere ähnliche Inhalte

Was ist angesagt?

Was ist angesagt? (20)

Andere mochten auch

Andere mochten auch (14)

Ähnlich wie Chainer v3

Ähnlich wie Chainer v3 (20)

Mehr von Seiya Tokui

Mehr von Seiya Tokui (19)

Kürzlich hochgeladen

Kürzlich hochgeladen (20)

Chainer v3