SlideShare ist ein Scribd-Unternehmen logo
1 von 19
Downloaden Sie, um offline zu lesen
© 2022 Flex Logix Technologies 1
The Flex Logix InferX X1:
Pairing Software and
Hardware to Enable Edge
Machine Learning
Randy Allen
Vice President of Software
Flex Logix Technologies
© 2022 Flex Logix Technologies
History of ML/AI
2
1950 1960 1970 1980 1990 2000 2010 2020
KNOWLEDGE DRIVEN
1952: ML coined
DATA DRIVEN
Winter Winter
Boom
AI
ML
ML Winter
Birth Symbolic AI Tedious engineering Deep learning
Legitimized Minis Killer Micros Ubiquitous GPUs
1965 Neural
Networks
1997: Deep Blue
2006: Deep
Learning
© 2022 Flex Logix Technologies
“
“It’s not possible.”
3
On hitting power and
thermal barriers, the
hardware world threw up
a Hail Mary of
“parallelism” hoping
someone in the
software world will run
underneath and catch it
Dave Patterson,
BERKELEY ARCHITECTURE
DUDE
When we start talking about
parallelism and ease of use of truly
parallel computers, we’re talking
about a problem that’s as hard as
any that computer science has
faced… I would be panicked if I
were in the industry
John Hennessy,
STANFORD COMPILER DUDE
“
“No, it’s necessary.”
© 2022 Flex Logix Technologies
Not a new challenge
4
“Dead Parallel Computer Society”
Convex, Alliant, Multiflow, Encore, FPS, Inmos, Kendall Square,
MasPar, nCUBE, Sequent, SGI, Thinking Machines ….
RIP
They Tried
and Died
© 2022 Flex Logix Technologies
Matrix multiplication: a simple example
Scalar
Vector
for(i=0; i<n; i++)
for(j=0; j<n; j++)
for(k=0; k<n; k++)
c[i][j] += a[i][k] * b[k][j];
for(i=0; i<n; i++)
for(js=0; js<n; js+=s)
for(k=0; k<n; k++)
for(j=js; j<js+s; j++) //vector
c[i][j] += a[i][k] * b[k][j];
5
© 2022 Flex Logix Technologies
Matrix multiplication in parallel
Scalar: Cache optimized
for(ib=0; ib<n; ib+=bi)
for(jb=0; jb<n; jb+=bj)
for(k=0; k<n; k++)
for(i=ib; i<ib+bi; i++)
for(j=jb; j < jb+bj; j++)
c[i][j] += a[i][k] * b[k][j];
6
© 2022 Flex Logix Technologies
Systolic array
for(i=0; i<n; i++)
for (j=i; i>0; i–){
for (k=i; i>1; k–) {
ta(j,k) = ta(j,k-1);
tb(k,j) = tb(k-1,j);
}
}
it = i;
for (j=0; j<i; j++){
ta(j,i) = a(j,t);
tb(1,j) = b(t,j);
it = it - 1;
}
for (j=0; j<i; j++)
for k=0; k<i; k++)
c(j,k) += ta(j,k) * tb(j,k);
7
© 2022 Flex Logix Technologies
Achieving balance
Distributed
Memory
Shared
Memory
Number of processors
Delivered Performance
One Many
Efficiency
Parallelism
Easy to Build
Hard to Program
Easy to Program
Hard to Build
8
© 2022 Flex Logix Technologies
The real challenge of AI is
NOT
Packing more processors onto a die than anyone else in the world
Creating the fastest processor synchronization in the world
Designing a large ({distributed, fetch-and-phi, shared,}) memory system
BUT INSTEAD
Designing a multiprocessor system with a balance between processors and memory
With the right set of specialized instructions for acceleration
In conjunction with software that can effectively utilize the system
BECAUSE
More processors are useful only if the software can compile the net to use them
Eliminating the need for synchronization is far more effective than faster synchronization
The key to fast processing is reusing data, not fetching it quickly
9
© 2022 Flex Logix Technologies
The Flex Logix InferX™ X1
10
• Dynamic TPU Array
• Accelerator/Co-processor for
host processor
• ASIC performance but dynamic
to new models
• Low power/High performance
• Designed for edge (B=1)
applications
4MB L3
SRAM
eFPGA
LPDDR
4
x32
PCIe
Gen3/4
x4
4K MAC Units
8MB distributed
L2 SRAM
© 2022 Flex Logix Technologies
ASIC efficiency with CPU/GPU flexibility
11
2019: Xin Feng, Computer vision algorithms and hardware implementations:
A survey InferX position added by Flex Logix
InferX Technology
• Dynamic Tensor Processing
• Highly Silicon Efficient
• Programmed via standard AI Model
Paradigms (TensorFlow, PyTorch,
ONNX)
• Layer by Layer reconfiguration in
microseconds
© 2022 Flex Logix Technologies
Embedded Application
+ X1
X1 drivers X1 Runtime
Embedded App at The Edge
Runtime API
Neural Net
Conv
AvgPool
MaxPool
Concat
.ncf
InferXDK
Software
Development
Toolkit
System level view of software
12
NN Model Framework
© 2022 Flex Logix Technologies
InferX compiler: unleashes the power of X1
13
NN FRAGMENT
SELECTION OF
OPERATORS
Add
Convolution
AvgPool
MaxPool
Affine
Concatenate
….
Configure Connect
Select
L2 1DTPU
+
1DTPU
L2
1DTPU
1DTPU L2
L2 1DTPU
+
1DTPU
L2
1DTPU
1DTPU L2
Input
Memory
MACS Misc Ops Output
Memory
© 2022 Flex Logix Technologies
INFERX RUNTIME C++ API
FUNCTIONS DESCRIPTION ARGS
StartInference Executes single inference Data_handle
InferMulti Executes multiple inferences times
WaitForInferenceReady Waits for inference to finish { }
GetInferOutputData Retrieves inference results data_out
SetModel Loads model into memory filename
SetData Established Data data
Initialize Initialize X1 HW { }
Runtime software API
14
Simple API used to couple
host application to the
inference processing that is
offloaded to the InferX X1 co-
processor
Embedded Application
+ X1
X1 drivers X1 Runtime
Embedded App at The Edge
Runtime API
© 2022 Flex Logix Technologies
Assuring high quality performance
15
• Input/Output Quantization
• Activation Functions
• Padding
2
51
2
Dept
h
2
51
2
Dept
h
2
51
2
Dept
h
2
51
2
Dept
h
2
51
2
Dept
h
2
51
2
Dept
h
2
D
1
Dept
h
2
51
2
Dept
h
2
51
2
Dept
h
2
3
Dept
h
2
51
2
Dept
h
2
51
2
Dept
h print(“Hell
oWorld”)
Cross Functional
Exploration
32
2
3
Depth
HW configuration depends
on Conv2D shape
SW
HW
Apps
Input
Tensor
Input
Tensor
2
51
2
Dept
h
2
51
2
Dept
h
2
51
2
Dept
h
2
D
1
Depth
H
W
D1
3
N >>
128
608
608
Capturing All
Attributes
Random
Weight
Tensor
© 2022 Flex Logix Technologies
Accelerating AI for embedded processing
16
• Retains customer programming model
○ Does not force customer to move to Arm or Linux with
SoC approach
○ Minimize time to market and development cost
○ Only need Add-in PCIe or M.2 card plus drivers
• Reduces dependence on rapid interface technology
evolution
○ Accelerator uses standard PCIe, DRAM
○ Leave Camera and Display interface selection/evolution
to Host
• Enables R&D and silicon focus on acceleration
processing
○ Dedicate more silicon to high leverage acceleration logic
○ Reduced engineering complexity
CPU
PCIe G4x4
8GB DDR4
4GB LPDDR4
MIPI to
USB
USB 3
MIPI CSI or USB Cameras
AI Accelerator
RGMII
Ethernet
USB 3
eDP
or
LCD
© 2022 Flex Logix Technologies
Minimal effort required to go from model to
inference
17
Customer Input
InferX Workflow
Compiler
Neural net
Model
Inference
Results
Bit file
X1 Hardware
Data
Stream
Runtime
Device
Control
Application Layer
STEP 1 STEP 2 STEP 3
© 2022 Flex Logix Technologies
Can help you gain a competitive advantage
Putting it all together
18
Inferx @flex-logix.com
Contact
Us
Product Impact
• Optimized solutions
around your use
cases
Compiler SW
• Robust
• Maximum
performance
• Minimal user
intervention
Hardware
• Future proofed
• GPU flexibility with
ASIC performance
• Low power
Runtime SW
• The InferX Runtime
API is streamlined
for ease-of-use
© 2022 Flex Logix Technologies
Thank you
Visit us on the expo floor for more information
19

Weitere ähnliche Inhalte

Ähnlich wie “The Flex Logix InferX X1: Pairing Software and Hardware to Enable Edge Machine Learning,” a Presentation from Flex Logix

“Khronos Group Standards: Powering the Future of Embedded Vision,” a Presenta...
“Khronos Group Standards: Powering the Future of Embedded Vision,” a Presenta...“Khronos Group Standards: Powering the Future of Embedded Vision,” a Presenta...
“Khronos Group Standards: Powering the Future of Embedded Vision,” a Presenta...
Edge AI and Vision Alliance
 
Comparisons And Contrasts Of Windows Ce, Windows Xp, And...
Comparisons And Contrasts Of Windows Ce, Windows Xp, And...Comparisons And Contrasts Of Windows Ce, Windows Xp, And...
Comparisons And Contrasts Of Windows Ce, Windows Xp, And...
Cecilia Lucero
 

Ähnlich wie “The Flex Logix InferX X1: Pairing Software and Hardware to Enable Edge Machine Learning,” a Presentation from Flex Logix (20)

Why Pay for Open Source Linux? Avoid the Hidden Cost of DIY
Why Pay for Open Source Linux? Avoid the Hidden Cost of DIYWhy Pay for Open Source Linux? Avoid the Hidden Cost of DIY
Why Pay for Open Source Linux? Avoid the Hidden Cost of DIY
 
Implementing DevOps – How it came to the fore, its key elements and example d...
Implementing DevOps – How it came to the fore, its key elements and example d...Implementing DevOps – How it came to the fore, its key elements and example d...
Implementing DevOps – How it came to the fore, its key elements and example d...
 
Intel IoT Edge Computing 在 AI 領域的應用與商機
Intel IoT Edge Computing 在 AI 領域的應用與商機Intel IoT Edge Computing 在 AI 領域的應用與商機
Intel IoT Edge Computing 在 AI 領域的應用與商機
 
“Khronos Group Standards: Powering the Future of Embedded Vision,” a Presenta...
“Khronos Group Standards: Powering the Future of Embedded Vision,” a Presenta...“Khronos Group Standards: Powering the Future of Embedded Vision,” a Presenta...
“Khronos Group Standards: Powering the Future of Embedded Vision,” a Presenta...
 
Container Ecosystem and Docker Technology
Container Ecosystem and Docker TechnologyContainer Ecosystem and Docker Technology
Container Ecosystem and Docker Technology
 
Technical Brochure Cloud Infrastructure Cisco Systems
Technical Brochure Cloud Infrastructure Cisco SystemsTechnical Brochure Cloud Infrastructure Cisco Systems
Technical Brochure Cloud Infrastructure Cisco Systems
 
The Ultimate List of Opensource Software for #docker #decentralized #selfhost...
The Ultimate List of Opensource Software for #docker #decentralized #selfhost...The Ultimate List of Opensource Software for #docker #decentralized #selfhost...
The Ultimate List of Opensource Software for #docker #decentralized #selfhost...
 
Infrastructure student
Infrastructure studentInfrastructure student
Infrastructure student
 
Comparisons And Contrasts Of Windows Ce, Windows Xp, And...
Comparisons And Contrasts Of Windows Ce, Windows Xp, And...Comparisons And Contrasts Of Windows Ce, Windows Xp, And...
Comparisons And Contrasts Of Windows Ce, Windows Xp, And...
 
Connectivity is here (5 g, swarm,...). now, let's build interplanetary apps! (1)
Connectivity is here (5 g, swarm,...). now, let's build interplanetary apps! (1)Connectivity is here (5 g, swarm,...). now, let's build interplanetary apps! (1)
Connectivity is here (5 g, swarm,...). now, let's build interplanetary apps! (1)
 
Bahrain ch9 introduction to docker 5th birthday
Bahrain ch9 introduction to docker 5th birthday Bahrain ch9 introduction to docker 5th birthday
Bahrain ch9 introduction to docker 5th birthday
 
Supermicro AI Pod that’s Super Simple, Super Scalable, and Super Affordable
Supermicro AI Pod that’s Super Simple, Super Scalable, and Super AffordableSupermicro AI Pod that’s Super Simple, Super Scalable, and Super Affordable
Supermicro AI Pod that’s Super Simple, Super Scalable, and Super Affordable
 
Embedded Fest 2019. Dov Nimratz. Artificial Intelligence in Small Embedded Sy...
Embedded Fest 2019. Dov Nimratz. Artificial Intelligence in Small Embedded Sy...Embedded Fest 2019. Dov Nimratz. Artificial Intelligence in Small Embedded Sy...
Embedded Fest 2019. Dov Nimratz. Artificial Intelligence in Small Embedded Sy...
 
DockerDay2015: Keynote
DockerDay2015: KeynoteDockerDay2015: Keynote
DockerDay2015: Keynote
 
Accelerating Innovation from Edge to Cloud
Accelerating Innovation from Edge to CloudAccelerating Innovation from Edge to Cloud
Accelerating Innovation from Edge to Cloud
 
Why containers
Why containersWhy containers
Why containers
 
“Building Large-scale Distributed Computer Vision Solutions Without Starting ...
“Building Large-scale Distributed Computer Vision Solutions Without Starting ...“Building Large-scale Distributed Computer Vision Solutions Without Starting ...
“Building Large-scale Distributed Computer Vision Solutions Without Starting ...
 
Managed DirectX
Managed DirectXManaged DirectX
Managed DirectX
 
Docker Bday #5, SF Edition: Introduction to Docker
Docker Bday #5, SF Edition: Introduction to DockerDocker Bday #5, SF Edition: Introduction to Docker
Docker Bday #5, SF Edition: Introduction to Docker
 
The world of Docker and Kubernetes
The world of Docker and Kubernetes The world of Docker and Kubernetes
The world of Docker and Kubernetes
 

Mehr von Edge AI and Vision Alliance

“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
Edge AI and Vision Alliance
 
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
Edge AI and Vision Alliance
 
“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...
Edge AI and Vision Alliance
 
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
Edge AI and Vision Alliance
 
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
Edge AI and Vision Alliance
 
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
Edge AI and Vision Alliance
 
“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from Samsara“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from Samsara
Edge AI and Vision Alliance
 
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
Edge AI and Vision Alliance
 

Mehr von Edge AI and Vision Alliance (20)

“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
 
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
 
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
 
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
 
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
 
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
 
“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...
 
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
 
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
 
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
 
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
 
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
 
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
 
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
 
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
 
“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from Samsara“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from Samsara
 
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
 
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
 
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
 
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
 

Kürzlich hochgeladen

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 

Kürzlich hochgeladen (20)

🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 

“The Flex Logix InferX X1: Pairing Software and Hardware to Enable Edge Machine Learning,” a Presentation from Flex Logix

  • 1. © 2022 Flex Logix Technologies 1 The Flex Logix InferX X1: Pairing Software and Hardware to Enable Edge Machine Learning Randy Allen Vice President of Software Flex Logix Technologies
  • 2. © 2022 Flex Logix Technologies History of ML/AI 2 1950 1960 1970 1980 1990 2000 2010 2020 KNOWLEDGE DRIVEN 1952: ML coined DATA DRIVEN Winter Winter Boom AI ML ML Winter Birth Symbolic AI Tedious engineering Deep learning Legitimized Minis Killer Micros Ubiquitous GPUs 1965 Neural Networks 1997: Deep Blue 2006: Deep Learning
  • 3. © 2022 Flex Logix Technologies “ “It’s not possible.” 3 On hitting power and thermal barriers, the hardware world threw up a Hail Mary of “parallelism” hoping someone in the software world will run underneath and catch it Dave Patterson, BERKELEY ARCHITECTURE DUDE When we start talking about parallelism and ease of use of truly parallel computers, we’re talking about a problem that’s as hard as any that computer science has faced… I would be panicked if I were in the industry John Hennessy, STANFORD COMPILER DUDE “ “No, it’s necessary.”
  • 4. © 2022 Flex Logix Technologies Not a new challenge 4 “Dead Parallel Computer Society” Convex, Alliant, Multiflow, Encore, FPS, Inmos, Kendall Square, MasPar, nCUBE, Sequent, SGI, Thinking Machines …. RIP They Tried and Died
  • 5. © 2022 Flex Logix Technologies Matrix multiplication: a simple example Scalar Vector for(i=0; i<n; i++) for(j=0; j<n; j++) for(k=0; k<n; k++) c[i][j] += a[i][k] * b[k][j]; for(i=0; i<n; i++) for(js=0; js<n; js+=s) for(k=0; k<n; k++) for(j=js; j<js+s; j++) //vector c[i][j] += a[i][k] * b[k][j]; 5
  • 6. © 2022 Flex Logix Technologies Matrix multiplication in parallel Scalar: Cache optimized for(ib=0; ib<n; ib+=bi) for(jb=0; jb<n; jb+=bj) for(k=0; k<n; k++) for(i=ib; i<ib+bi; i++) for(j=jb; j < jb+bj; j++) c[i][j] += a[i][k] * b[k][j]; 6
  • 7. © 2022 Flex Logix Technologies Systolic array for(i=0; i<n; i++) for (j=i; i>0; i–){ for (k=i; i>1; k–) { ta(j,k) = ta(j,k-1); tb(k,j) = tb(k-1,j); } } it = i; for (j=0; j<i; j++){ ta(j,i) = a(j,t); tb(1,j) = b(t,j); it = it - 1; } for (j=0; j<i; j++) for k=0; k<i; k++) c(j,k) += ta(j,k) * tb(j,k); 7
  • 8. © 2022 Flex Logix Technologies Achieving balance Distributed Memory Shared Memory Number of processors Delivered Performance One Many Efficiency Parallelism Easy to Build Hard to Program Easy to Program Hard to Build 8
  • 9. © 2022 Flex Logix Technologies The real challenge of AI is NOT Packing more processors onto a die than anyone else in the world Creating the fastest processor synchronization in the world Designing a large ({distributed, fetch-and-phi, shared,}) memory system BUT INSTEAD Designing a multiprocessor system with a balance between processors and memory With the right set of specialized instructions for acceleration In conjunction with software that can effectively utilize the system BECAUSE More processors are useful only if the software can compile the net to use them Eliminating the need for synchronization is far more effective than faster synchronization The key to fast processing is reusing data, not fetching it quickly 9
  • 10. © 2022 Flex Logix Technologies The Flex Logix InferX™ X1 10 • Dynamic TPU Array • Accelerator/Co-processor for host processor • ASIC performance but dynamic to new models • Low power/High performance • Designed for edge (B=1) applications 4MB L3 SRAM eFPGA LPDDR 4 x32 PCIe Gen3/4 x4 4K MAC Units 8MB distributed L2 SRAM
  • 11. © 2022 Flex Logix Technologies ASIC efficiency with CPU/GPU flexibility 11 2019: Xin Feng, Computer vision algorithms and hardware implementations: A survey InferX position added by Flex Logix InferX Technology • Dynamic Tensor Processing • Highly Silicon Efficient • Programmed via standard AI Model Paradigms (TensorFlow, PyTorch, ONNX) • Layer by Layer reconfiguration in microseconds
  • 12. © 2022 Flex Logix Technologies Embedded Application + X1 X1 drivers X1 Runtime Embedded App at The Edge Runtime API Neural Net Conv AvgPool MaxPool Concat .ncf InferXDK Software Development Toolkit System level view of software 12 NN Model Framework
  • 13. © 2022 Flex Logix Technologies InferX compiler: unleashes the power of X1 13 NN FRAGMENT SELECTION OF OPERATORS Add Convolution AvgPool MaxPool Affine Concatenate …. Configure Connect Select L2 1DTPU + 1DTPU L2 1DTPU 1DTPU L2 L2 1DTPU + 1DTPU L2 1DTPU 1DTPU L2 Input Memory MACS Misc Ops Output Memory
  • 14. © 2022 Flex Logix Technologies INFERX RUNTIME C++ API FUNCTIONS DESCRIPTION ARGS StartInference Executes single inference Data_handle InferMulti Executes multiple inferences times WaitForInferenceReady Waits for inference to finish { } GetInferOutputData Retrieves inference results data_out SetModel Loads model into memory filename SetData Established Data data Initialize Initialize X1 HW { } Runtime software API 14 Simple API used to couple host application to the inference processing that is offloaded to the InferX X1 co- processor Embedded Application + X1 X1 drivers X1 Runtime Embedded App at The Edge Runtime API
  • 15. © 2022 Flex Logix Technologies Assuring high quality performance 15 • Input/Output Quantization • Activation Functions • Padding 2 51 2 Dept h 2 51 2 Dept h 2 51 2 Dept h 2 51 2 Dept h 2 51 2 Dept h 2 51 2 Dept h 2 D 1 Dept h 2 51 2 Dept h 2 51 2 Dept h 2 3 Dept h 2 51 2 Dept h 2 51 2 Dept h print(“Hell oWorld”) Cross Functional Exploration 32 2 3 Depth HW configuration depends on Conv2D shape SW HW Apps Input Tensor Input Tensor 2 51 2 Dept h 2 51 2 Dept h 2 51 2 Dept h 2 D 1 Depth H W D1 3 N >> 128 608 608 Capturing All Attributes Random Weight Tensor
  • 16. © 2022 Flex Logix Technologies Accelerating AI for embedded processing 16 • Retains customer programming model ○ Does not force customer to move to Arm or Linux with SoC approach ○ Minimize time to market and development cost ○ Only need Add-in PCIe or M.2 card plus drivers • Reduces dependence on rapid interface technology evolution ○ Accelerator uses standard PCIe, DRAM ○ Leave Camera and Display interface selection/evolution to Host • Enables R&D and silicon focus on acceleration processing ○ Dedicate more silicon to high leverage acceleration logic ○ Reduced engineering complexity CPU PCIe G4x4 8GB DDR4 4GB LPDDR4 MIPI to USB USB 3 MIPI CSI or USB Cameras AI Accelerator RGMII Ethernet USB 3 eDP or LCD
  • 17. © 2022 Flex Logix Technologies Minimal effort required to go from model to inference 17 Customer Input InferX Workflow Compiler Neural net Model Inference Results Bit file X1 Hardware Data Stream Runtime Device Control Application Layer STEP 1 STEP 2 STEP 3
  • 18. © 2022 Flex Logix Technologies Can help you gain a competitive advantage Putting it all together 18 Inferx @flex-logix.com Contact Us Product Impact • Optimized solutions around your use cases Compiler SW • Robust • Maximum performance • Minimal user intervention Hardware • Future proofed • GPU flexibility with ASIC performance • Low power Runtime SW • The InferX Runtime API is streamlined for ease-of-use
  • 19. © 2022 Flex Logix Technologies Thank you Visit us on the expo floor for more information 19