8. TensorFlow tools to optimize a model (optimize_for_inference.py)
There are several common transformations that can be applied to GraphDefs
created to train a model, that help reduce the amount of computation needed
when the network is used only for inference. These include:
- Removing training-only operations like checkpoint saving.
- Stripping out parts of the graph that are never reached.
- Removing debug operations like CheckNumerics.
- Folding batch normalization ops into the pre-calculated weights.
- Fusing common operations into unified versions.
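As an illustration of the batch-normalization folding step listed above, here is a minimal pure-Python sketch. It is not the code used by optimize_for_inference.py; the function names, the list-of-lists weight layout, and the default epsilon are assumptions made for this example. It relies only on the standard identity that a batch norm applied after a linear layer can be absorbed into that layer's weights and bias.

```python
import math


def linear(x, weights, bias):
    """Compute y = x @ weights + bias, with weights[i][j] linking input i to output j."""
    return [
        sum(xi * row[j] for xi, row in zip(x, weights)) + bias[j]
        for j in range(len(bias))
    ]


def batch_norm(y, gamma, beta, mean, variance, eps=1e-3):
    """Apply inference-time batch normalization per output channel."""
    return [
        g * (v - m) / math.sqrt(var + eps) + b
        for v, g, b, m, var in zip(y, gamma, beta, mean, variance)
    ]


def fold_batch_norm(weights, bias, gamma, beta, mean, variance, eps=1e-3):
    """Return (folded_weights, folded_bias) so that one linear layer
    reproduces linear followed by batch_norm."""
    # Per-channel scale that batch norm would apply at inference time.
    scale = [g / math.sqrt(var + eps) for g, var in zip(gamma, variance)]
    folded_weights = [[w * s for w, s in zip(row, scale)] for row in weights]
    folded_bias = [
        (b - m) * s + bt for b, m, s, bt in zip(bias, mean, scale, beta)
    ]
    return folded_weights, folded_bias
```

After folding, the batch-norm op can be dropped from the inference graph, because the single linear layer already produces the normalized output.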
9. .tflite
TensorFlow Lite defines a new model file format based on
FlatBuffers. FlatBuffers is an open-source, efficient,
cross-platform serialization library.
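A quick way to see the FlatBuffers layout in a .tflite file is to look at its first eight bytes. The sketch below assumes the usual FlatBuffers header (a 4-byte little-endian offset to the root table, followed by an optional 4-byte file identifier) and the "TFL3" identifier declared in the TensorFlow Lite schema. The helper names are made up for this example, and the check is a heuristic, not a full validation of the model.

```python
import struct

# File identifier declared in the TensorFlow Lite FlatBuffers schema.
TFLITE_FILE_IDENTIFIER = b"TFL3"


def read_flatbuffer_header(data):
    """Return (root_table_offset, identifier_bytes) from the first 8 bytes."""
    if len(data) < 8:
        raise ValueError("buffer too small to hold a FlatBuffers header")
    # Bytes 0..3: little-endian unsigned offset to the root table.
    (root_offset,) = struct.unpack_from("<I", data, 0)
    # Bytes 4..7: file identifier, when the schema declares one.
    identifier = bytes(data[4:8])
    return root_offset, identifier


def looks_like_tflite(data):
    """Heuristic check: does the buffer carry the TFLite file identifier?"""
    try:
        _, identifier = read_flatbuffer_header(data)
    except ValueError:
        return False
    return identifier == TFLITE_FILE_IDENTIFIER
```

For a real file, the bytes would come from something like `open("model.tflite", "rb").read()`, where the path is a placeholder.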
10. FlatBuffers
FlatBuffers is an efficient cross-platform serialization library for C++, C#, C, Go,
Java, JavaScript, TypeScript, PHP, and Python. It was originally created at Google
for game development and other performance-critical applications.
13. TensorFlow Lite Design
- Converter (to TensorFlow Lite format)
- Interpreter core
- Operation kernels
- Hardware accelerator
- FlatBuffer-based model
- Pre-fused op kernels
- Specially optimized kernels, optimized for NEON on ARM
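The relationship between the interpreter core, the operation kernels, and pre-fused kernels can be sketched with a toy dispatcher. This is not TensorFlow Lite's implementation or API. Every name below (KERNELS, ToyInterpreter, the "ADD_RELU" op) is hypothetical and only illustrates the idea: the interpreter walks a list of ops and calls a registered kernel for each, and a fused kernel does the work of two ops in one pass.

```python
# Toy kernel registry: op name -> function operating on lists of floats.
KERNELS = {}


def register_kernel(op_name):
    """Decorator that registers a kernel function under an op name."""
    def decorator(fn):
        KERNELS[op_name] = fn
        return fn
    return decorator


@register_kernel("ADD")
def add_kernel(a, b):
    return [x + y for x, y in zip(a, b)]


@register_kernel("RELU")
def relu_kernel(a):
    return [max(0.0, x) for x in a]


@register_kernel("ADD_RELU")  # hypothetical pre-fused kernel: one pass, no intermediate tensor
def add_relu_kernel(a, b):
    return [max(0.0, x + y) for x, y in zip(a, b)]


class ToyInterpreter:
    """Runs ops in order; each op is (op_name, input_names, output_name)."""

    def __init__(self, ops):
        self.ops = ops
        self.tensors = {}

    def set_tensor(self, name, value):
        self.tensors[name] = list(value)

    def invoke(self):
        for op_name, inputs, output in self.ops:
            kernel = KERNELS[op_name]  # dispatch to the registered kernel
            self.tensors[output] = kernel(*(self.tensors[n] for n in inputs))

    def get_tensor(self, name):
        return self.tensors[name]
```

A graph with separate ADD and RELU ops and a graph with the single fused ADD_RELU op produce the same output tensor. The fused version skips one dispatch and one intermediate buffer, which is the motivation for pre-fused kernels.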
14. Arm NN SDK
Arm NN bridges the gap between existing NN frameworks and the underlying IP.
It enables efficient translation of existing neural network frameworks, such as
TensorFlow and Caffe, allowing them to run efficiently, without modification,
across Arm Cortex CPUs and Arm Mali GPUs.
15. ARM Compute Library
The Compute Library contains a comprehensive collection of software functions
implemented for the Arm Cortex-A family of CPU processors (NEON) and the Arm
Mali family of GPUs (OpenCL). It is a convenient repository of low-level optimized
functions that developers can source individually or use as part of complex
pipelines in order to accelerate their algorithms and applications.
16. ASUS Tinker Board
● CPU RK3288
○ Quad-core Cortex-A17 up to 1.8GHz
● GPU
○ ARM Mali™-T764
● Memory
○ 2GB LPDDR3
17. Run AlexNet on Tinker Board / PC
CPU: Tinker Board (RK3288 quad-core Cortex-A17 up to 1.8GHz, with NEON)
NN framework: ARM Compute Library
real 0m5.499s
user 0m13.050s
sys 0m0.750s
CPU: Lenovo (Intel(R) Core(TM) i7-6500U CPU @ 2.50GHz)
NN framework: OpenVX
real 0m16.067s
user 0m15.544s
sys 0m0.136s
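The timings above are in the real/user/sys format printed by the shell's time command. The sketch below parses them and derives two figures: the wall-clock ratio between the two runs, and the ratio (user + sys) / real, which exceeds 1 when work is spread over several cores. The dictionary keys and function names are chosen for this example only; the numbers are copied from the slide.

```python
import re


def parse_time(value):
    """Convert a string such as '0m5.499s' into seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value)
    if match is None:
        raise ValueError(f"unrecognized time format: {value!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


# Timings copied from the slide.
RUNS = {
    "tinker_board": {"real": "0m5.499s", "user": "0m13.050s", "sys": "0m0.750s"},
    "lenovo_i7": {"real": "0m16.067s", "user": "0m15.544s", "sys": "0m0.136s"},
}


def cpu_to_wall_ratio(run):
    """(user + sys) / real: values above 1 indicate multi-core execution."""
    cpu_seconds = parse_time(run["user"]) + parse_time(run["sys"])
    return cpu_seconds / parse_time(run["real"])


def wall_clock_ratio(slower, faster):
    """How many times longer the slower run took in wall-clock time."""
    return parse_time(slower["real"]) / parse_time(faster["real"])


if __name__ == "__main__":
    for name, run in RUNS.items():
        print(name, round(cpu_to_wall_ratio(run), 2))
    print(round(wall_clock_ratio(RUNS["lenovo_i7"], RUNS["tinker_board"]), 2))
```

On these numbers the Tinker Board run finishes in roughly a third of the PC run's wall-clock time while using about 2.5 cores' worth of CPU time, whereas the PC run stays close to a single core.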