Tensorflow Lite and ARM Compute Library

Kobe Yu

Advance compiler 5 28 Tensorflow lite and ARM compute library

TensorFlow Lite and Arm Compute Library
Kobe Yu

Why on-device ML?
● Lower latency, no server calls
● Works offline
● Data stays on device
● Power efficient
● All sensor data accessible on-device

On-device ML is hard
● Tight memory constraints
● Low energy usage to preserve batteries
● Little compute power

Tensorflow Lite size and speed
● Size
○ Core interpreter + all supported ops: ~400KB
○ How?
■ compact interpreter and flatbuffer parsing
■ tight dependencies
■ selective registration
● Speed
○ FlatBuffers data is accessed directly, without a parsing step
○ Pre-fused operations
○ Hardware acceleration delegates
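
Selective registration can be pictured with a small sketch. This is not TensorFlow Lite's actual API: the kernel table, `build_resolver`, and the op names below are hypothetical, chosen only to show the idea that a build registers just the kernels a given model uses instead of every supported op.

```python
# Hypothetical kernel table standing in for "all supported ops".
ALL_KERNELS = {
    "ADD": lambda a, b: [x + y for x, y in zip(a, b)],
    "MUL": lambda a, b: [x * y for x, y in zip(a, b)],
    "RELU": lambda a: [max(0.0, x) for x in a],
    "NEG": lambda a: [-x for x in a],
}


def build_resolver(model_ops):
    """Register only the kernels that the model's op list refers to."""
    needed = set(model_ops)
    missing = needed - set(ALL_KERNELS)
    if missing:
        raise KeyError(f"unsupported ops: {sorted(missing)}")
    return {name: ALL_KERNELS[name] for name in needed}


# A toy model that uses two of the four available ops.
model_ops = ["ADD", "RELU", "ADD"]
resolver = build_resolver(model_ops)
registered = sorted(resolver)

# Run ADD followed by RELU through the reduced resolver.
output = resolver["RELU"](resolver["ADD"]([1.0, -3.0], [0.5, 1.0]))
print(registered, output)
```

In a real build the saving comes from the unused kernels never being linked in; the dictionary here only models which ops end up registered.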

Tensorflow Lite Design
Converter (to TensorFlow Lite format)
Interpreter Core
Operation kernels
Hardware accelerator
Mobile device / PC

Model
https://heartbeat.fritz.ai/intro-to-machine-learning-on-android-how-to-convert-a-custom-model-to-tensorflow-lite-e07d2d9d50e3

Tensorflow tools to optimize model (optimize_for_inference.py)
There are several common transformations that can be applied to GraphDefs
created to train a model, that help reduce the amount of computation needed
when the network is used only for inference. These include:
- Removing training-only operations like checkpoint saving.
- Stripping out parts of the graph that are never reached.
- Removing debug operations like CheckNumerics.
- Folding batch normalization ops into the pre-calculated weights.
- Fusing common operations into unified versions.
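
Folding batch normalization into pre-calculated weights can be checked with a little arithmetic. The sketch below assumes a single scalar weight and bias per channel and made-up parameter values; `batch_norm` and `fold_batch_norm` are illustrative helpers, not functions from optimize_for_inference.py.

```python
import math


def batch_norm(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-time batch normalization of a single value."""
    return gamma * (y - mean) / math.sqrt(var + eps) + beta


def fold_batch_norm(w, b, gamma, beta, mean, var, eps=1e-5):
    """Return (w', b') such that w' * x + b' == batch_norm(w * x + b)."""
    scale = gamma / math.sqrt(var + eps)
    return w * scale, (b - mean) * scale + beta


# Made-up layer and batch-norm parameters for one channel.
w, b = 0.8, 0.1
gamma, beta, mean, var = 1.5, -0.2, 0.3, 4.0
w_folded, b_folded = fold_batch_norm(w, b, gamma, beta, mean, var)

x = 2.0
unfused = batch_norm(w * x + b, gamma, beta, mean, var)  # two ops at inference
fused = w_folded * x + b_folded                          # one op at inference
print(unfused, fused)
```

Because the batch-norm statistics are constants at inference time, the folded layer produces the same result (up to floating-point rounding) with one multiply-add instead of two operations.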

.tflite
TensorFlow Lite defines a new model file format based on
FlatBuffers, an open-source, efficient, cross-platform
serialization library.

FlatBuffer
FlatBuffers is an efficient cross platform serialization library for C++, C#, C, Go,
Java, JavaScript, TypeScript, PHP, and Python. It was originally created at Google
for game development and other performance-critical applications.

FlatBuffer
import java.util.List;

class Person {
    String name;
    int friendshipStatus;
    Person spouse;
    List<Person> friends;
}
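
To illustrate reading a field without parsing the whole record, here is a minimal sketch of a flat, offset-based layout for a `Person`-like record. It is a simplified layout invented for this example, not the real FlatBuffers wire format; `encode_person`, `read_friendship_status`, and `read_name` are hypothetical helpers, and the `spouse` field is left out for brevity.

```python
import struct

# Header (little-endian): int32 friendship_status, uint32 name offset,
# uint32 name length, uint32 friends offset, uint32 friend count.
HEADER = "<iIIII"


def encode_person(name, friendship_status, friend_names):
    """Pack a record into one buffer: a fixed header of offsets, then the data."""
    name_bytes = name.encode("utf-8")
    friends = b"".join(
        struct.pack("<I", len(f.encode("utf-8"))) + f.encode("utf-8")
        for f in friend_names
    )
    name_off = struct.calcsize(HEADER)
    friends_off = name_off + len(name_bytes)
    header = struct.pack(
        HEADER, friendship_status, name_off, len(name_bytes),
        friends_off, len(friend_names),
    )
    return header + name_bytes + friends


def read_friendship_status(buf):
    """Read one integer at a known offset; nothing else is decoded."""
    return struct.unpack_from("<i", buf, 0)[0]


def read_name(buf):
    """Follow the stored offset to the name and decode only those bytes."""
    off, length = struct.unpack_from("<II", buf, 4)
    return bytes(buf[off:off + length]).decode("utf-8")


buf = memoryview(encode_person("Ada", 2, ["Bob", "Cy"]))
status = read_friendship_status(buf)
name = read_name(buf)
print(status, name, len(buf))
```

The reader never builds a `Person` object or touches the friends list; each accessor jumps straight to the bytes it needs, which is the property the slides attribute to FlatBuffers.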

FlatBuffer
http://labs.gree.jp/blog/2015/11/14495/

Tensorflow Lite Design
Converter (to TensorFlow Lite format)
Interpreter Core
Operation kernels
Hardware accelerator
FlatBuffers-based model
Pre-fused op kernels
Specially optimized kernels, optimized for NEON on ARM
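
A pre-fused kernel combines steps that would otherwise run as separate ops. The sketch below assumes a bias add followed by ReLU as the fused pair; the function names are illustrative and the pure-Python lists stand in for tensors, so it shows the structure of fusion, not the NEON-optimized kernels themselves.

```python
def add_bias(values, bias):
    """Separate op 1: writes a full intermediate result."""
    return [v + bias for v in values]


def relu(values):
    """Separate op 2: reads the intermediate result back."""
    return [max(0.0, v) for v in values]


def fused_add_bias_relu(values, bias):
    """Fused op: one pass over the data, no intermediate list."""
    return [max(0.0, v + bias) for v in values]


activations = [-2.0, -0.5, 0.25, 3.0]
separate = relu(add_bias(activations, 0.5))
fused = fused_add_bias_relu(activations, 0.5)
print(separate, fused)
```

Both paths give the same values; the fused version avoids one intermediate buffer and one extra traversal of the data.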

ARM NN SDK
Arm NN bridges the gap between
existing NN frameworks and the
underlying IP. It enables efficient
translation of existing neural
network frameworks, such as
TensorFlow and Caffe, allowing
them to run efficiently – without
modification – across Arm Cortex
CPUs and Arm Mali GPUs.

Arm Compute Library
The Compute Library contains a comprehensive collection of software functions
implemented for the Arm Cortex-A family of CPU processors (NEON) and the Arm
Mali family of GPUs (OpenCL). It is a convenient repository of low-level optimized
functions that developers can source individually or use as part of complex
pipelines in order to accelerate their algorithms and applications.

ASUS Tinker Board
● CPU RK3288
○ Quad-core Cortex-A17 up to 1.8GHz
● GPU
○ ARM Mali™-T764
● Memory
○ 2GB LPDDR3

Run AlexNet on Tinker Board / PC
CPU: Tinker Board (RK3288 quad-core Cortex-A17 up to 1.8GHz, with NEON)
NN framework: Arm Compute Library
real 0m5.499s
user 0m13.050s
sys 0m0.750s
CPU: Lenovo (Intel(R) Core(TM) i7-6500U CPU @ 2.50GHz)
NN framework: OpenVX
real 0m16.067s
user 0m15.544s
sys 0m0.136s
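
The `time` figures above can be turned into two simple ratios. This sketch reads them as Tinker Board with Arm Compute Library and Lenovo with OpenVX (a pairing inferred from the slide layout); `to_seconds` and the labels are helpers made up for this example.

```python
def to_seconds(stamp):
    """Convert a `time`-style stamp such as '0m5.499s' to seconds."""
    minutes, seconds = stamp.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


# (real, user, sys) as reported on the slide.
runs = {
    "Tinker Board / Arm Compute Library": ("0m5.499s", "0m13.050s", "0m0.750s"),
    "Lenovo i7-6500U / OpenVX": ("0m16.067s", "0m15.544s", "0m0.136s"),
}

summary = {}
for label, (real, user, sys_time) in runs.items():
    wall = to_seconds(real)
    cpu = to_seconds(user) + to_seconds(sys_time)
    # CPU time divided by wall time: above 1 means several cores were busy.
    summary[label] = (wall, cpu / wall)
    print(f"{label}: wall {wall:.3f}s, CPU/wall {cpu / wall:.2f}")

wall_ratio = (
    summary["Lenovo i7-6500U / OpenVX"][0]
    / summary["Tinker Board / Arm Compute Library"][0]
)
print(f"wall-clock ratio: {wall_ratio:.2f}")
```

A CPU/wall ratio well above 1 on the Tinker Board suggests the run kept several cores busy, while a ratio close to 1 on the Lenovo suggests a mostly single-threaded run. The two runs use different frameworks and hardware, so the wall-clock ratio is not a like-for-like comparison.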
