This document summarizes Intel Nervana Graph, a graph compiler developed by Nervana Systems and now maintained by Intel. It discusses how Nervana Graph can import models from frameworks such as Caffe, TensorFlow, and MXNet and convert them to an intermediate graph representation. It then describes how different transformers convert the graph to executable code for CPUs or GPUs. The document provides code examples for using Nervana Graph with Caffe and TensorFlow models and discusses the implementation of the graph transformations and compiler passes.
8. It's out!
https://www.intelnervana.com/intel-nervana-graph-and-neon-3-0-updates/
The connection between the XLA and Intel Nervana Graph APIs was quite straightforward given the similar projects' intent for a compact and explicit intermediate representation. While today the XLA/Intel Nervana Graph integration is at a pre-alpha level, we'd love for people to take it for a spin and kick the tires. We're working on ironing out known performance issues and improving op and backend support.
Intel Nervana Graph Beta: 2017/6/22
12. neon vs cuDNN 4
“Not so fast, FFT”: Winograd (March 3, 2016)
Source: https://www.nervanasys.com/winograd/
13. cuDNN 5
Optimizing Recurrent Neural Networks in cuDNN 5 (April 6, 2016)
https://devblogs.nvidia.com/parallelforall/optimizing-recurrent-neural-networks-cudnn-5/
Faster forward and backward convolutions using the Winograd convolution algorithm
14. Speeding up with Winograd!
Fast Algorithms for Convolutional Neural Networks
Andrew Lavin, Scott Gray
https://arxiv.org/abs/1509.09308
Going beyond full utilization: The inside scoop on Nervana's Winograd kernels (June 29, 2016)
https://www.nervanasys.com/winograd-2/
15. neon v1.3 vs cuDNN v5.1
Still not slowing down: Benchmarking optimized Winograd implementations (July 25, 2016)
Source: https://www.nervanasys.com/winograd-3/
(charts: neon v1.3 vs cuDNN v4 vs cuDNN v5.1)
16. Scott Gray さん
https://twitter.com/scottgray76
High-Performance GPU kernels for deep learning
• Fast matrix multiply for small minibatches
• Direct convolution leveraging GEMM advances
• Even faster convolution with Winograd
Nervana (October 2014 to July 2017)
Now at OpenAI (as of July 2017)
Source: http://on-demand.gputechconf.com/gtc/2016/presentation/s6485-scott-gray-gpu-programming-deep-learning.pdf
20. MKL-DNN Support
Mar 23, 2017: after the acquisition by Intel
"To install with Intel MKL-DNN support, first download MKL-DNN from [here](https://github.com/01org/mkl-dnn) and follow the installation instructions there to install MKL-DNN. Set environment variable MKLDNN_ROOT to point to the installed location and follow the rest of the steps to install Ngraph."
Source: https://github.com/NervanaSystems/ngraph/commit/f3b7306214f40b4c1b4c40e3e223080797afb382
21. Transformer API
・Supports CPU and GPU
・Memory usage optimization passes
・Transformers allow users to register an included set of optional compiler passes for debug and visualization
・GPU: automatic kernel fusion/compounding for increased performance
・A mechanism similar to LLVM passes (see the sketch below)
Source: https://github.com/NervanaSystems/ngraph/blob/master/README.md
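To illustrate the LLVM-like pass mechanism, here is a minimal self-contained sketch of a pass manager that rewrites an op graph step by step. All class and pass names are hypothetical; this is not the actual ngraph pass API, only the shape of the idea:

# Conceptual sketch of an LLVM-style pass pipeline over an op graph.
# All names are hypothetical; ngraph's real passes live inside the transformers.

class GraphPass(object):
    """One graph-to-graph rewrite step."""
    def run(self, ops):
        return ops

class DeadOpElimination(GraphPass):
    """Toy pass: drop ops explicitly flagged as unused."""
    def run(self, ops):
        return [op for op in ops if op.get('used', True)]

class PassManager(object):
    """Runs registered passes in order, like an LLVM pass manager."""
    def __init__(self):
        self.passes = []

    def register(self, graph_pass):
        self.passes.append(graph_pass)

    def run(self, ops):
        for graph_pass in self.passes:
            ops = graph_pass.run(ops)
        return ops

pm = PassManager()
pm.register(DeadOpElimination())
print(pm.run([{'op': 'add'}, {'op': 'mul', 'used': False}]))  # [{'op': 'add'}]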
23. Example
import ngraph as ng
import ngraph.transformers as ngt
x = ng.placeholder(())
x_plus_one = x + 1
transformer = ngt.make_transformer()
plus_one = transformer.computation(x_plus_one, x)
for i in range(5):
    print(plus_one(i))  # prints 1.0, 2.0, 3.0, 4.0, 5.0
Source: https://github.com/NervanaSystems/ngraph/blob/master/doc/source/overview.rst
24. Future roadmap?
・Nervana Graph serialization/deserialization
・Further improvements/abstractions to graph composability for usability/optimization
・Distributed, heterogeneous backend target support
・C APIs for interoperability to enable other languages to create/execute graphs
・Better debugging
・Support for model deployment
Source: https://github.com/NervanaSystems/ngraph/blob/master/README.md
26. Example with Caffe
from __future__ import print_function
import ngraph.transformers as ngt
from ngraph.frontends.caffe.cf_importer.importer import parse_prototxt

# Parse the Caffe prototxt into a map from layer names to ngraph ops.
model = "sum.prototxt"
op_map = parse_prototxt(model, verbose=True)

# Look up the op for layer "D" and execute it with the default transformer.
op = op_map.get("D")
res = ngt.make_transformer().computation(op)()
print("Result is:", res)
Source: https://github.com/NervanaSystems/ngraph/blob/master/doc/source/caffe.rst
27. Example with TensorFlow
import tensorflow as tf
import ngraph.transformers as ngt
from ngraph.frontends.tensorflow.tf_importer.importer import TFImporter

# Build a TensorFlow graph.
x = tf.constant(1.)
y = tf.constant(2.)
f = x + y

# Import the TensorFlow GraphDef into an ngraph op graph.
importer = TFImporter()
importer.import_graph_def(tf.Session().graph_def)

# Map the TensorFlow tensor to its ngraph op and execute it.
f_ng = importer.get_op_handle(f)
transformer = ngt.make_transformer()
f_result = transformer.computation(f_ng)()
print(f_result)
Source: https://github.com/NervanaSystems/ngraph/blob/master/doc/source/tensorflow.rst
28. Transformers
Transformers are used to convert the Op graph into a backend-specific executable format. Once the graph has been defined, one or more computations are created using a transformer. Computations are handles to executable objects created by the transformer, which can be called to evaluate a subset of the entire graph. All transformers must implement a common abstract interface, allowing users to easily switch between backends without altering their computation graph definition.
Supported backends:
・CPUs (via NumPy)
・NVIDIA GPUs (via PyCUDA)
Source: https://github.com/NervanaSystems/ngraph/blob/master/doc/source/transformer_usage.rst
29. Creating transformers
1) Default:
from ngraph.transformers import make_transformer
transformer = make_transformer()
2) Using a factory:
import ngraph.transformers as ngt
available_transformers = ngt.transformer_choices()
if 'gpu' in available_transformers:
    factory = ngt.make_transformer_factory('gpu')
    ngt.set_transformer_factory(factory)
transformer = ngt.make_transformer()
Source: https://github.com/NervanaSystems/ngraph/blob/master/doc/source/transformer_usage.rst
30. Computations
Computation objects are created by the transformer and provide an interface to evaluate a subset of the graph. The format of the executable used for evaluation depends on the transformer that created the computation. For example, the CPU transformer generates Python NumPy code which is called to evaluate the computation, while the GPU transformer generates a series of CUDA kernels which can be called to evaluate the computation. A toy sketch of this code-generation idea follows below.
Source: https://github.com/NervanaSystems/ngraph/blob/master/doc/source/transformer_usage.rst
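To make "generates Python NumPy code" concrete, here is a toy sketch (not ngraph's actual code generator) that emits a NumPy expression for a tiny op graph once and then calls the compiled result repeatedly, the same compile-once/call-many pattern a Computation handle provides:

import numpy as np

# Toy op graph for (4 * b) + c, matching the example on the next slide.
graph = ('add', ('mul', ('const', 4), ('arg', 0)), ('arg', 1))

NUMPY_FUNCS = {'add': 'np.add', 'mul': 'np.multiply'}

def emit(node):
    """Emit a NumPy expression string for one toy op-graph node."""
    kind = node[0]
    if kind == 'const':
        return repr(node[1])
    if kind == 'arg':
        return 'args[{}]'.format(node[1])
    return '{}({}, {})'.format(NUMPY_FUNCS[kind], emit(node[1]), emit(node[2]))

# "Compile" once, then call many times, like a Computation handle.
source = 'lambda *args: ' + emit(graph)
computation = eval(source, {'np': np})
print(source)             # lambda *args: np.add(np.multiply(4, args[0]), args[1])
print(computation(2, 7))  # 15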
31. Creating computations
import ngraph as ng
import ngraph.transformers as ngt

a = ng.constant(4)
b = ng.placeholder(())
c = ng.placeholder(())
d = ng.multiply(a, b)
e = ng.add(d, c)

transformer = ngt.make_transformer()  # default (CPU) transformer
example_comp = transformer.computation(e, b, c)
Source: https://github.com/NervanaSystems/ngraph/blob/master/doc/source/transformer_usage.rst
32. Running computations
example_comp = transformer.computation(e, b, c)
The first argument (e) is the op whose value the computation returns; the remaining arguments (b, c) become the positional parameters of the resulting function:
result_e = example_comp(2, 7)   # b = 2, c = 7
result_e = (4 * b) + c = (4 * 2) + 7 = 15
Source: https://github.com/NervanaSystems/ngraph/blob/master/doc/source/transformer_usage.rst
33. Running computations
Multiple return values: pass a list of ops and unpack the results (a combined runnable script follows below):
example_comp2 = transformer.computation([d, e], b, c)
result_d, result_e = example_comp2(2, 7)   # b = 2, c = 7
result_d = 4 * b = 4 * 2 = 8
result_e = (4 * b) + c = (4 * 2) + 7 = 15
Source: https://github.com/NervanaSystems/ngraph/blob/master/doc/source/transformer_usage.rst
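Putting slides 31 through 33 together, the whole example runs as the following script (assuming the default CPU transformer):

import ngraph as ng
import ngraph.transformers as ngt

a = ng.constant(4)
b = ng.placeholder(())
c = ng.placeholder(())
d = ng.multiply(a, b)
e = ng.add(d, c)

transformer = ngt.make_transformer()

# Single return value: evaluate e given values for b and c.
example_comp = transformer.computation(e, b, c)
print(example_comp(2, 7))    # (4 * 2) + 7 = 15

# Multiple return values: evaluate d and e in one call.
example_comp2 = transformer.computation([d, e], b, c)
result_d, result_e = example_comp2(2, 7)
print(result_d, result_e)    # 8 15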
36. The Transformer_ABC_Meta class
class Transformer_ABC_Meta(abc.ABCMeta):
    """
    metaclass for the backend objects
    takes care of registering all the backend subclasses
    """
    def __init__(cls, name, bases, dict_):
        if not hasattr(cls, 'transformers'):
            # First possible transformer class sets things up
            cls.transformers = {}

        # If this transformer has a transformer_name, register it
        transformer_name = getattr(cls, 'transformer_name', None)
        if transformer_name is not None:
            cls.transformers[transformer_name] = cls

        super(Transformer_ABC_Meta, cls).__init__(name, bases, dict_)
Source: https://github.com/NervanaSystems/ngraph/tree/master/ngraph/transformers/base.py
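As a standalone illustration of this registration mechanism (hypothetical class names, Python 3 metaclass syntax instead of with_metaclass): the first class created with the metaclass owns the registry, and every subclass that defines transformer_name adds itself to it automatically.

import abc

class Transformer_ABC_Meta(abc.ABCMeta):
    def __init__(cls, name, bases, dict_):
        if not hasattr(cls, 'transformers'):
            cls.transformers = {}          # first class sets up the registry
        transformer_name = getattr(cls, 'transformer_name', None)
        if transformer_name is not None:
            cls.transformers[transformer_name] = cls
        super(Transformer_ABC_Meta, cls).__init__(name, bases, dict_)

class BaseTransformer(metaclass=Transformer_ABC_Meta):
    pass                                   # owns the shared registry

class DummyTransformer(BaseTransformer):
    transformer_name = 'dummy'             # registered automatically on class creation

print(BaseTransformer.transformers)        # {'dummy': <class '...DummyTransformer'>}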
37. The Transformer class
class Transformer(with_metaclass(Transformer_ABC_Meta, object)):
    """
    Produce an executable version of op-graphs.

    Computations are subsets of Ops to compute. The transformer determines storage
    allocation and transforms the computations and allocations into functions.

    Arguments:
        fusion (bool): Whether to combine sequences of operations into one operation.
        **kwargs: Args for related classes.

    Attributes:
        computations (:obj:`set` of :class:`Computation`): The set of requested computations.
        all_results (:obj:`set` of :class:`ngraph.op_graph.op_graph.Op`): A root set of Ops that
            need to be computed.
        finalized (bool): True when transformation has been performed.
        initialized (bool): True when variables have been initialized/restored.
        fusion (bool): True when fusion was enabled.
        device_buffers (set): Set of handles for storage allocations.
    """
Source: https://github.com/NervanaSystems/ngraph/tree/master/ngraph/transformers/base.py
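This registry is what connects the Transformer class back to the factory on slide 29: make_transformer_factory('gpu') can resolve a backend by name because each concrete transformer registered itself under its transformer_name. A simplified sketch of that lookup, not the exact base.py code:

def make_transformer_factory(name, **kwargs):
    # Simplified sketch: resolve the registered Transformer subclass by name
    # and return a callable that constructs it.
    transformer_class = Transformer.transformers[name]
    def factory():
        return transformer_class(**kwargs)
    return factory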
39. The Computation class
class Computation(NameableValue):
    """
    A handle for a computation function.

    Arguments:
        transformer (obj:`Transformer`): The associated transformer.
        returns: If an Op, return the value of the Op, if sequence of Ops, return
            the sequence of values, if a set return a map, if None, return None.
        *args: AllocationOps marked input will be arguments to the function.
        **kwargs: Args for related classes.
    """
Source: https://github.com/NervanaSystems/ngraph/tree/master/ngraph/transformers