SlideShare ist ein Scribd-Unternehmen logo
1 von 42
Downloaden Sie, um offline zu lesen
ADCSS PRESENTATION
26.10.2022
pablo.ghiglino@klepsydra.com
www.klepsydra.com
Klepsydra Technologies
Part 1
Klepsydra
Technology
Part 2.1
Lock-free
programming
CONTEXT: PARALLEL PROCESSING
• Recurrent mission
failures due to
software
• Access to sensor data
from Earth is time
consuming.
• Satellites struggle to
meet power
requirements
Consequences for Space applications
Challenges on on-board processing
CPU
Usage
Low Medium
Data volume
Modern hardware and old
software:
• Computers max out with low to
medium data volumes
• Inef
fi
cient use of resources
• Excessive power for low data
processing
COMPARE AND SWAP
• Compare-and-swap (CAS) is an instruction
used in multithreading to achieve
synchronisation. It compares the contents of
a memory location with a given value and,
only if they are the same, modi
fi
es the
contents of that memory location to a new
given value. This is done as a single
atomic operation.
• Compare-and-Swap has been an integral
part of the IBM 370 architectures since
1970.
• Maurice Herlihy (1991) proved that CAS can
implement more of these algorithms than
atomic read, write, and fetch-and-add
LOCK BASED PARALLELISATION VS
LOCK FREE PARALLELISATION
• Threads need to acquire lock to access resource.
• Context switch:
• Suspended while resource is locked by
someone else
• Awaken when resource is available.
• Not deterministic, power consuming context switch.
• Threads access resources using ‘Atomic Operations’
• Compare and Swap (CAS):
• Try to update a memory entry
• If not possible tried again
• No locks involved, but ‘busy wait’
• No context switch required.
PROS AND CONS OF LOCK-FREE
PROGRAMMING
CPU
Usage
Data volume
CPU
Usage
Data volume
Lock-free programming
Pros:
• Less CPU consumption required
• Lower latency and higher data throughput
• Substantial increase in determinism
Cons:
• Extremely dif
fi
cult programming
technique
• Requires processor with CAS instructions
(90% of the market have them, though)
Part 2.2
Klepsydra
SDK
Lightweight, modular and compatible with most used operating systems
Worldwide
application
Klepsydra
SDK
Klepsydra
GPU
Streaming
Klepsydra
AI
Klepsydra
ROS2
executor
plugin
SDK – Software Development Kit
Boost data processing at the edge for general
applications and processor intensive
algorithms
AI – Artificial Intelligence
High performance deep neural network
(DNN) engine to deploy any AI or
machine learning module at the edge
ROS2 Executor plugin
Executor for ROS2 able to process up
to 10 times more data with up to 50%
reduction in CPU consumption.
GPU (Graphic Processing Unit)
High parallelisation of GPU to increase
the processing data rate and GPU
utilization
THE PRODUCT
KLEPSYDRA SDK OVERVIEW
SDK CORE
Plugins: ROS,
ROS2, OpenCV,
CANopen
Code
generator
Performance
analysis
Language bindings
Applications
Basic features
Advanced features
Event Loop
Sensor Multiplexer
Two main data
processing approaches
Producer 1
Consumer 1 Consumer 2
Producer 2
Producer 3
Consumer
Producer 1
11
Part 2.3
Algorithms in
Klepsydra
SDK
LOCK-FREE AS ALTERNATIVE TO
PARALLELISATION
Parallelisation Pipeline
APPROACH
Input
Matrix
B = A x A C = B x B
Output
Matrix
Input
Matrix B = A x A
Output
Matrix
C = B x B
Klepsydra Parallel Streaming Setup
OpenMP Sequential Setup
{
Thread 1
{
Thread 2
{
Parallelised
{
Parallelised
BENCHMARK DESCRIPTION
Description
• Given an input matrix, a number of sequential multiplications will be
performed:
• Step 1: A => B = A x A => Step 2 : C = B x B…
• Matrix A randomly generated on each new sequence
Parameters:
• Matrix dimensions: 100x100
• Data type: Float, integer
• Number of multiplications per matrix: [10, 60]
• Processing frequency: [2Hz - 100Hz]
Technical Spec
• Computer: Odroid XU4
• OS: Ubuntu 18.04
FLOAT PERFORMANCE RESULTS II
CPU Usage. 30 Steps
0,0
20,0
40,0
60,0
80,0
Publishing Rate (Hz)
2,00 6,50 11,00 15,50 20,00
OpenMp Klepsydra
Throughput. 30 Steps
0,00
5,00
10,00
15,00
20,00
Publishing Rate (Hz)
2,00 6,50 11,00 15,50 20,00
OpenMp Klepsydra
CPU Usage. 40 Steps
0,0
17,5
35,0
52,5
70,0
Publishing Rate (Hz)
2,00 5,00 8,00 11,00 14,00
OpenMp Klepsydra
Throughput. 40 Steps
0,00
3,50
7,00
10,50
14,00
Publishing Rate (Hz)
2,00 5,00 8,00 11,00 14,00
OpenMp Klepsydra
Latency. 40 Steps
0,00
60,00
120,00
180,00
240,00
Publishing Rate (Hz)
2,00 5,00 8,00 11,00 14,00
OpenMp Klepsydra
Latency. 30 Steps
0,00
45,00
90,00
135,00
180,00
Publishing Rate (Hz)
2,00 6,50 11,00 15,50 20,00
OpenMp Klepsydra
Part 2.4
Klepsydra AI
2-DIM THREADING MODEL
Input
Data
Layer
Output
Data
First dimension: pipelining
{
Thread 1 (Core 1)
Layer
Layer
Layer
Layer
Layer
{
Thread 2 (Core 2)
Layer
Layer
Layer
Layer
Layer Layer Layer Layer Layer
Deep Neural Network Structure
2-DIM THREADING MODEL
Input
Data
Output
Data
Second dimension: Matrix
multiplication parallelisation
{
T
hread
1
(Core
1)
Layer
{
T
hread
2
(Core
2)
{
T
hread
3
(Core
3)
2-DIM THREADING MODEL
Core 1 Core 2
Core 3 Core 4
Layer
Layer
Layer
Layer
Layer
Layer
Core 1 Core 2
Core 3 Core 4
Layer
Layer
Layer
Layer
Layer
Layer
Core 1 Core 2
Core 3 Core 4
Layer
Layer
Layer
Layer
Layer
Layer
Layer
Layer
Layer
• Low CPU
• High throughput CPU
• High latency
• Mid CPU
• Mid throughput CPU
• Mid latency
• High CPU
• Mid throughput CPU
• Low latency
Threading model con
fi
guration
21
Example of performance benchmarks
Klepsydra AI optimised for Latency vs CPU
CPU
Usage
0
30
60
90
Data Rate (Hz)
0 10 20 30
Latency Optimisation CPU Optimisaion TF Lite
Latency = 29ms
Low CPU Saturation
Latency = 97ms
TF Lite Saturation
Latency = 120ms
Part 2.4
The auto-tuning
software
KLEPSYDRA SDO PROCESS
• Klepsydra Streaming Optimiser (KSO):
• Runs on a separate computer
• Executes several dry runs on the OBC
• Collect statistics
• Runs a genetic algorithm to
fi
nd the optimal
solution for latency, power or throughput
• The main variable to optimise is the distribution of
layers on the HPDP cluster
KLEPSYDRA STREAMING DISTRIBUTION
OPTIMISER (SDO)
ONNX API
class KPSR_API OnnxDNNImporter
{
public:
/**
* @brief import an onnx file and uses a default eventloop factory for all processor cores
* @param onnxFileName
* @param testDNN
* @return a share pointer to a DeepNeuralNetwork object
*
* When log level is debug, dumps the YAML configuration of the default factory.
* It makes use of all processor cores.
*/
static std::shared_ptr<kpsr::ai::DeepNeuralNetworkFactory> createDNNFactory(const std::string & onnxFileName,
bool testDNN = false);
/**
* @brief importForTest an onnx file and uses a default synchronous factory
* @param onnxFileName
* @param envFileName. Klepsydra AI configuration environment file.
* @return a share pointer to a DeepNeuralNetwork object
*
* This method is intented to be used for testing purposes only.
*
*/
static std::shared_ptr<kpsr::ai::DeepNeuralNetworkFactory> createDNNFactory(const std::string & onnxFileName,
const std::string & envFileName);
};
25
KPSR File API
class KPSR_API DNNImporter
{
public:
/**
* @brief import a klepsydra model file and uses a default eventloop factory for all processor cores
* @param modelFileName
* @param testDNN
* @return a shared pointer to a DeepNeuralNetwork object
*
* When log level is debug, dumps the YAML configuration of the default factory.
* It makes use of all processor cores.
*/
static std::shared_ptr<kpsr::ai::DeepNeuralNetworkFactory> createDNNFactory(const std::string & modelFileName,
bool testDNN = false);
/**
* @brief importForTest a klepsydra model file and uses a default synchronous factory
* @param modelFileName
* @param envFileName. Klepsydra AI configuration environment file.
* @return a shared pointer to a DeepNeuralNetwork object
*
* This method is intended to be used for testing purposes only.
*/
static std::shared_ptr<kpsr::ai::DeepNeuralNetworkFactory> createDNNFactory(const std::string & modelFileName,
const std::string & envFileName);
};
26
Core API
class DeepNeuralNetwork {
public:
/**
* @brief setCallback
* @param callback. Callback function for the prediction result.
*/
virtual void setCallback(std::function<void(const unsigned long &, const kpsr::ai::F32AlignedVector &)> callback) = 0;
/**
* @brief predict. Load input matrix as input to network.
* @param inputVector. An F32AlignedVector of floats containing network input.
*
* @return Unique id corresponding to the input vector
*/
virtual unsigned long predict(const kpsr::ai::F32AlignedVector& inputVector) = 0;
/**
* @brief predict. Copy-less version of predict.
* @param inputVector. An F32AlignedVector of floats containing network input.
*
* @return Unique id corresponding to the input vector
*/
virtual unsigned long predict(const std::shared_ptr<kpsr::ai::F32AlignedVector> & inputVector) = 0;
};
27
Part 3
KATESU Project
QORIQ® LAYERSCAPE LS1046A
MULTICORE PROCESSOR
QorIQ® Layerscape LS1046A
Klepsydra AI Container
STATUS
• Successful installation of the following setup:
• LS1046 running Yocto Jethro
• Docker Installed on LS1046
• Container with the following:
• Ubuntu 20.04
• Klepsydra AI software fully supported (quantised and non-
quantised)
XILINX ZEDBOARD
ZedBoard
Klepsydra AI Container
PetaLinux
Klepsydra AI Container
STATUS
• Successful installation of the following setup:
• ZedBoard running PetaLinux 2019.2
• Docker Installed on ZedBoard
• Container with the following:
• Ubuntu 20.04
• Klepsydra AI software with quantised support only
Part 3.2
Performance
Results
PERFORMANCE RESULTS: CME ON LS1046
0
6,5
13
19,5
26
CPU / Hz
TFLite + NEON Klepsydra
0
45
90
135
180
Latency (ms)
TFLite + NEON Klepsydra
0
4,5
9
13,5
18
Throughput (Hz)
TFLite + NEON Klepsydra
PERFORMANCE RESULTS: CME-Q ON LS1046
0
6,75
13,5
20,25
27
CPU / Hz
TFLite + NEON Klepsydra
0
30
60
90
120
Latency (ms)
TFLite + NEON Klepsydra
0
7,5
15
22,5
30
Throughput (Hz)
TFLite + NEON Klepsydra
PERFORMANCE RESULTS: CME-Q ON ZEDBOARD
0
12,5
25
37,5
50
CPU / Hz
TFLite + NEON Klepsydra
0
250
500
750
1000
Latency (ms)
TFLite + NEON Klepsydra
0
0,65
1,3
1,95
2,6
Throughput (Hz)
TFLite + NEON Klepsydra
PERFORMANCE RESULTS: BSC ON LS1046
0
20
40
60
80
CPU / Hz
TFLite + NEON Klepsydra
0
1250
2500
3750
5000
Latency (ms)
TFLite + NEON Klepsydra
0
0,15
0,3
0,45
0,6
Throughput (Hz)
TFLite + NEON Klepsydra
Part 3.3
BSC Demo
Part 4
Conclusions
and Future
work
Vision-based navigation Earth Observation Telecommunications
• Process more images per
second
• Increase con
fi
dence in the
mission
• Reduce power consumption up
to 50%
• Faster access to data from Earth
• Increase processed request per
second (increase revenue)
• Enable AI telecomm (Cognitive
radios)
APPLICATION TO SPACE
40
ROADMAP
• 2022:
• Klepsydra AI for GPU (Q4 2022)
• 2023:
• Klepsydra AI For FGPA (Q2 2023)
• Support for operating systems: FreeRTOS (Q3 2023)
• Support for processor architectures:
• GR740 and GR765 (Q4 2023)
• RISC-V (Q4 2023)
• HPDP (Q4 2023)
CONTACT INFORMATION
Dr Pablo Ghiglino
pablo.ghiglino@klepsydra.com
+41786931544
www.klepsydra.com
linkedin.com/company/klepsydra-technologies

Weitere ähnliche Inhalte

Ähnlich wie ADCSS 2022

Netflix at-disney-09-26-2014
Netflix at-disney-09-26-2014Netflix at-disney-09-26-2014
Netflix at-disney-09-26-2014Monal Daxini
 
Inter Task Communication On Volatile Nodes
Inter Task Communication On Volatile NodesInter Task Communication On Volatile Nodes
Inter Task Communication On Volatile Nodesnagarajan_ka
 
Crypto Performance on ARM Cortex-M Processors
Crypto Performance on ARM Cortex-M ProcessorsCrypto Performance on ARM Cortex-M Processors
Crypto Performance on ARM Cortex-M ProcessorsHannes Tschofenig
 
Nvidia in bioinformatics
Nvidia in bioinformaticsNvidia in bioinformatics
Nvidia in bioinformaticsShanker Trivedi
 
Strata Singapore: Gearpump Real time DAG-Processing with Akka at Scale
Strata Singapore: GearpumpReal time DAG-Processing with Akka at ScaleStrata Singapore: GearpumpReal time DAG-Processing with Akka at Scale
Strata Singapore: Gearpump Real time DAG-Processing with Akka at ScaleSean Zhong
 
Inside Microsoft's FPGA-Based Configurable Cloud
Inside Microsoft's FPGA-Based Configurable CloudInside Microsoft's FPGA-Based Configurable Cloud
Inside Microsoft's FPGA-Based Configurable Cloudinside-BigData.com
 
Exploring the Final Frontier of Data Center Orchestration: Network Elements -...
Exploring the Final Frontier of Data Center Orchestration: Network Elements -...Exploring the Final Frontier of Data Center Orchestration: Network Elements -...
Exploring the Final Frontier of Data Center Orchestration: Network Elements -...Puppet
 
CUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computingCUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computinginside-BigData.com
 
Direct Code Execution - LinuxCon Japan 2014
Direct Code Execution - LinuxCon Japan 2014Direct Code Execution - LinuxCon Japan 2014
Direct Code Execution - LinuxCon Japan 2014Hajime Tazaki
 
Server side JavaScript: going all the way
Server side JavaScript: going all the wayServer side JavaScript: going all the way
Server side JavaScript: going all the wayOleg Podsechin
 
Exploring the Performance Impact of Virtualization on an HPC Cloud
Exploring the Performance Impact of Virtualization on an HPC CloudExploring the Performance Impact of Virtualization on an HPC Cloud
Exploring the Performance Impact of Virtualization on an HPC CloudRyousei Takano
 
Threading Successes 03 Gamebryo
Threading Successes 03   GamebryoThreading Successes 03   Gamebryo
Threading Successes 03 Gamebryoguest40fc7cd
 
Managing Large-scale Networks with Trigger
Managing Large-scale Networks with TriggerManaging Large-scale Networks with Trigger
Managing Large-scale Networks with Triggerjathanism
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computingArka Ghosh
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computingArka Ghosh
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computingArka Ghosh
 
Application Optimisation using OpenPOWER and Power 9 systems
Application Optimisation using OpenPOWER and Power 9 systemsApplication Optimisation using OpenPOWER and Power 9 systems
Application Optimisation using OpenPOWER and Power 9 systemsGanesan Narayanasamy
 

Ähnlich wie ADCSS 2022 (20)

Smallsat 2021
Smallsat 2021Smallsat 2021
Smallsat 2021
 
Netflix at-disney-09-26-2014
Netflix at-disney-09-26-2014Netflix at-disney-09-26-2014
Netflix at-disney-09-26-2014
 
Inter Task Communication On Volatile Nodes
Inter Task Communication On Volatile NodesInter Task Communication On Volatile Nodes
Inter Task Communication On Volatile Nodes
 
Balancing Power & Performance Webinar
Balancing Power & Performance WebinarBalancing Power & Performance Webinar
Balancing Power & Performance Webinar
 
Crypto Performance on ARM Cortex-M Processors
Crypto Performance on ARM Cortex-M ProcessorsCrypto Performance on ARM Cortex-M Processors
Crypto Performance on ARM Cortex-M Processors
 
Nvidia in bioinformatics
Nvidia in bioinformaticsNvidia in bioinformatics
Nvidia in bioinformatics
 
Strata Singapore: Gearpump Real time DAG-Processing with Akka at Scale
Strata Singapore: GearpumpReal time DAG-Processing with Akka at ScaleStrata Singapore: GearpumpReal time DAG-Processing with Akka at Scale
Strata Singapore: Gearpump Real time DAG-Processing with Akka at Scale
 
Inside Microsoft's FPGA-Based Configurable Cloud
Inside Microsoft's FPGA-Based Configurable CloudInside Microsoft's FPGA-Based Configurable Cloud
Inside Microsoft's FPGA-Based Configurable Cloud
 
Exploring the Final Frontier of Data Center Orchestration: Network Elements -...
Exploring the Final Frontier of Data Center Orchestration: Network Elements -...Exploring the Final Frontier of Data Center Orchestration: Network Elements -...
Exploring the Final Frontier of Data Center Orchestration: Network Elements -...
 
CUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computingCUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computing
 
Direct Code Execution - LinuxCon Japan 2014
Direct Code Execution - LinuxCon Japan 2014Direct Code Execution - LinuxCon Japan 2014
Direct Code Execution - LinuxCon Japan 2014
 
Defense_Presentation
Defense_PresentationDefense_Presentation
Defense_Presentation
 
Server side JavaScript: going all the way
Server side JavaScript: going all the wayServer side JavaScript: going all the way
Server side JavaScript: going all the way
 
Exploring the Performance Impact of Virtualization on an HPC Cloud
Exploring the Performance Impact of Virtualization on an HPC CloudExploring the Performance Impact of Virtualization on an HPC Cloud
Exploring the Performance Impact of Virtualization on an HPC Cloud
 
Threading Successes 03 Gamebryo
Threading Successes 03   GamebryoThreading Successes 03   Gamebryo
Threading Successes 03 Gamebryo
 
Managing Large-scale Networks with Trigger
Managing Large-scale Networks with TriggerManaging Large-scale Networks with Trigger
Managing Large-scale Networks with Trigger
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computing
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computing
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computing
 
Application Optimisation using OpenPOWER and Power 9 systems
Application Optimisation using OpenPOWER and Power 9 systemsApplication Optimisation using OpenPOWER and Power 9 systems
Application Optimisation using OpenPOWER and Power 9 systems
 

Mehr von klepsydratechnologie (6)

Robotics technical Presentation
Robotics technical PresentationRobotics technical Presentation
Robotics technical Presentation
 
OBDPC 2022
OBDPC 2022OBDPC 2022
OBDPC 2022
 
Klepsydra Company Presentation
Klepsydra Company PresentationKlepsydra Company Presentation
Klepsydra Company Presentation
 
Roscon2021 Executor
Roscon2021 ExecutorRoscon2021 Executor
Roscon2021 Executor
 
IAC 2020
IAC 2020IAC 2020
IAC 2020
 
IAC 2019
IAC 2019 IAC 2019
IAC 2019
 

Kürzlich hochgeladen

(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...gurkirankumar98700
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsArshad QA
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfkalichargn70th171
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number SystemsJheuzeDellosa
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...ICS
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software DevelopersVinodh Ram
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsAndolasoft Inc
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationkaushalgiri8080
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxbodapatigopi8531
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providermohitmore19
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackVICTOR MAESTRE RAMIREZ
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfkalichargn70th171
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AIABDERRAOUF MEHENNI
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVshikhaohhpro
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdfWave PLM
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)OPEN KNOWLEDGE GmbH
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsAlberto González Trastoy
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Modelsaagamshah0812
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
 

Kürzlich hochgeladen (20)

(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number Systems
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software Developers
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.js
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanation
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStack
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)
 
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 

ADCSS 2022

  • 4. CONTEXT: PARALLEL PROCESSING • Recurrent mission failures due to software • Access to sensor data from Earth is time consuming. • Satellites struggle to meet power requirements Consequences for Space applications Challenges on on-board processing CPU Usage Low Medium Data volume Modern hardware and old software: • Computers max out with low to medium data volumes • Inef fi cient use of resources • Excessive power for low data processing
  • 5. COMPARE AND SWAP • Compare-and-swap (CAS) is an instruction used in multithreading to achieve synchronisation. It compares the contents of a memory location with a given value and, only if they are the same, modi fi es the contents of that memory location to a new given value. This is done as a single atomic operation. • Compare-and-Swap has been an integral part of the IBM 370 architectures since 1970. • Maurice Herlihy (1991) proved that CAS can implement more of these algorithms than atomic read, write, and fetch-and-add
  • 6. LOCK BASED PARALLELISATION VS LOCK FREE PARALLELISATION • Threads need to acquire lock to access resource. • Context switch: • Suspended while resource is locked by someone else • Awaken when resource is available. • Not deterministic, power consuming context switch. • Threads access resources using ‘Atomic Operations’ • Compare and Swap (CAS): • Try to update a memory entry • If not possible tried again • No locks involved, but ‘busy wait’ • No context switch required.
  • 7. PROS AND CONS OF LOCK-FREE PROGRAMMING CPU Usage Data volume CPU Usage Data volume Lock-free programming Pros: • Less CPU consumption required • Lower latency and higher data throughput • Substantial increase in determinism Cons: • Extremely dif fi cult programming technique • Requires processor with CAS instructions (90% of the market have them, though)
  • 9. Lightweight, modular and compatible with most used operating systems Worldwide application Klepsydra SDK Klepsydra GPU Streaming Klepsydra AI Klepsydra ROS2 executor plugin SDK – Software Development Kit Boost data processing at the edge for general applications and processor intensive algorithms AI – Artificial Intelligence High performance deep neural network (DNN) engine to deploy any AI or machine learning module at the edge ROS2 Executor plugin Executor for ROS2 able to process up to 10 times more data with up to 50% reduction in CPU consumption. GPU (Graphic Processing Unit) High parallelisation of GPU to increase the processing data rate and GPU utilization THE PRODUCT
  • 10. KLEPSYDRA SDK OVERVIEW SDK CORE Plugins: ROS, ROS2, OpenCV, CANopen Code generator Performance analysis Language bindings Applications Basic features Advanced features
  • 11. Event Loop Sensor Multiplexer Two main data processing approaches Producer 1 Consumer 1 Consumer 2 Producer 2 Producer 3 Consumer Producer 1 11
  • 13. LOCK-FREE AS ALTERNATIVE TO PARALLELISATION Parallelisation Pipeline
  • 14. APPROACH Input Matrix B = A x A C = B x B Output Matrix Input Matrix B = A x A Output Matrix C = B x B Klepsydra Parallel Streaming Setup OpenMP Sequential Setup { Thread 1 { Thread 2 { Parallelised { Parallelised
  • 15. BENCHMARK DESCRIPTION Description • Given an input matrix, a number of sequential multiplications will be performed: • Step 1: A => B = A x A => Step 2 : C = B x B… • Matrix A randomly generated on each new sequence Parameters: • Matrix dimensions: 100x100 • Data type: Float, integer • Number of multiplications per matrix: [10, 60] • Processing frequency: [2Hz - 100Hz] Technical Spec • Computer: Odroid XU4 • OS: Ubuntu 18.04
  • 16. FLOAT PERFORMANCE RESULTS II CPU Usage. 30 Steps 0,0 20,0 40,0 60,0 80,0 Publishing Rate (Hz) 2,00 6,50 11,00 15,50 20,00 OpenMp Klepsydra Throughput. 30 Steps 0,00 5,00 10,00 15,00 20,00 Publishing Rate (Hz) 2,00 6,50 11,00 15,50 20,00 OpenMp Klepsydra CPU Usage. 40 Steps 0,0 17,5 35,0 52,5 70,0 Publishing Rate (Hz) 2,00 5,00 8,00 11,00 14,00 OpenMp Klepsydra Throughput. 40 Steps 0,00 3,50 7,00 10,50 14,00 Publishing Rate (Hz) 2,00 5,00 8,00 11,00 14,00 OpenMp Klepsydra Latency. 40 Steps 0,00 60,00 120,00 180,00 240,00 Publishing Rate (Hz) 2,00 5,00 8,00 11,00 14,00 OpenMp Klepsydra Latency. 30 Steps 0,00 45,00 90,00 135,00 180,00 Publishing Rate (Hz) 2,00 6,50 11,00 15,50 20,00 OpenMp Klepsydra
  • 18. 2-DIM THREADING MODEL Input Data Layer Output Data First dimension: pipelining { Thread 1 (Core 1) Layer Layer Layer Layer Layer { Thread 2 (Core 2) Layer Layer Layer Layer Layer Layer Layer Layer Layer Deep Neural Network Structure
  • 19. 2-DIM THREADING MODEL Input Data Output Data Second dimension: Matrix multiplication parallelisation { T hread 1 (Core 1) Layer { T hread 2 (Core 2) { T hread 3 (Core 3)
  • 20. 2-DIM THREADING MODEL Core 1 Core 2 Core 3 Core 4 Layer Layer Layer Layer Layer Layer Core 1 Core 2 Core 3 Core 4 Layer Layer Layer Layer Layer Layer Core 1 Core 2 Core 3 Core 4 Layer Layer Layer Layer Layer Layer Layer Layer Layer • Low CPU • High throughput CPU • High latency • Mid CPU • Mid throughput CPU • Mid latency • High CPU • Mid throughput CPU • Low latency Threading model con fi guration
  • 21. 21 Example of performance benchmarks Klepsydra AI optimised for Latency vs CPU CPU Usage 0 30 60 90 Data Rate (Hz) 0 10 20 30 Latency Optimisation CPU Optimisaion TF Lite Latency = 29ms Low CPU Saturation Latency = 97ms TF Lite Saturation Latency = 120ms
  • 23. KLEPSYDRA SDO PROCESS • Klepsydra Streaming Optimiser (KSO): • Runs on a separate computer • Executes several dry runs on the OBC • Collect statistics • Runs a genetic algorithm to fi nd the optimal solution for latency, power or throughput • The main variable to optimise is the distribution of layers on the HPDP cluster
  • 25. ONNX API class KPSR_API OnnxDNNImporter { public: /** * @brief import an onnx file and uses a default eventloop factory for all processor cores * @param onnxFileName * @param testDNN * @return a share pointer to a DeepNeuralNetwork object * * When log level is debug, dumps the YAML configuration of the default factory. * It makes use of all processor cores. */ static std::shared_ptr<kpsr::ai::DeepNeuralNetworkFactory> createDNNFactory(const std::string & onnxFileName, bool testDNN = false); /** * @brief importForTest an onnx file and uses a default synchronous factory * @param onnxFileName * @param envFileName. Klepsydra AI configuration environment file. * @return a share pointer to a DeepNeuralNetwork object * * This method is intented to be used for testing purposes only. * */ static std::shared_ptr<kpsr::ai::DeepNeuralNetworkFactory> createDNNFactory(const std::string & onnxFileName, const std::string & envFileName); }; 25
  • 26. KPSR File API class KPSR_API DNNImporter { public: /** * @brief import a klepsydra model file and uses a default eventloop factory for all processor cores * @param modelFileName * @param testDNN * @return a shared pointer to a DeepNeuralNetwork object * * When log level is debug, dumps the YAML configuration of the default factory. * It makes use of all processor cores. */ static std::shared_ptr<kpsr::ai::DeepNeuralNetworkFactory> createDNNFactory(const std::string & modelFileName, bool testDNN = false); /** * @brief importForTest a klepsydra model file and uses a default synchronous factory * @param modelFileName * @param envFileName. Klepsydra AI configuration environment file. * @return a shared pointer to a DeepNeuralNetwork object * * This method is intended to be used for testing purposes only. */ static std::shared_ptr<kpsr::ai::DeepNeuralNetworkFactory> createDNNFactory(const std::string & modelFileName, const std::string & envFileName); }; 26
  • 27. Core API class DeepNeuralNetwork { public: /** * @brief setCallback * @param callback. Callback function for the prediction result. */ virtual void setCallback(std::function<void(const unsigned long &, const kpsr::ai::F32AlignedVector &)> callback) = 0; /** * @brief predict. Load input matrix as input to network. * @param inputVector. An F32AlignedVector of floats containing network input. * * @return Unique id corresponding to the input vector */ virtual unsigned long predict(const kpsr::ai::F32AlignedVector& inputVector) = 0; /** * @brief predict. Copy-less version of predict. * @param inputVector. An F32AlignedVector of floats containing network input. * * @return Unique id corresponding to the input vector */ virtual unsigned long predict(const std::shared_ptr<kpsr::ai::F32AlignedVector> & inputVector) = 0; }; 27
  • 29. QORIQ® LAYERSCAPE LS1046A MULTICORE PROCESSOR QorIQ® Layerscape LS1046A Klepsydra AI Container
  • 30. STATUS • Successful installation of the following setup: • LS1046 running Yocto Jethro • Docker Installed on LS1046 • Container with the following: • Ubuntu 20.04 • Klepsydra AI software fully supported (quantised and non- quantised)
  • 31. XILINX ZEDBOARD ZedBoard Klepsydra AI Container PetaLinux Klepsydra AI Container
  • 32. STATUS • Successful installation of the following setup: • ZedBoard running PetaLinux 2019.2 • Docker Installed on ZedBoard • Container with the following: • Ubuntu 20.04 • Klepsydra AI software with quantised support only
  • 34. PERFORMANCE RESULTS: CME ON LS1046 0 6,5 13 19,5 26 CPU / Hz TFLite + NEON Klepsydra 0 45 90 135 180 Latency (ms) TFLite + NEON Klepsydra 0 4,5 9 13,5 18 Throughput (Hz) TFLite + NEON Klepsydra
  • 35. PERFORMANCE RESULTS: CME-Q ON LS1046 0 6,75 13,5 20,25 27 CPU / Hz TFLite + NEON Klepsydra 0 30 60 90 120 Latency (ms) TFLite + NEON Klepsydra 0 7,5 15 22,5 30 Throughput (Hz) TFLite + NEON Klepsydra
  • 36. PERFORMANCE RESULTS: CME-Q ON ZEDBOARD 0 12,5 25 37,5 50 CPU / Hz TFLite + NEON Klepsydra 0 250 500 750 1000 Latency (ms) TFLite + NEON Klepsydra 0 0,65 1,3 1,95 2,6 Throughput (Hz) TFLite + NEON Klepsydra
  • 37. PERFORMANCE RESULTS: BSC ON LS1046 0 20 40 60 80 CPU / Hz TFLite + NEON Klepsydra 0 1250 2500 3750 5000 Latency (ms) TFLite + NEON Klepsydra 0 0,15 0,3 0,45 0,6 Throughput (Hz) TFLite + NEON Klepsydra
  • 40. Vision-based navigation Earth Observation Telecommunications • Process more images per second • Increase con fi dence in the mission • Reduce power consumption up to 50% • Faster access to data from Earth • Increase processed request per second (increase revenue) • Enable AI telecomm (Cognitive radios) APPLICATION TO SPACE 40
  • 41. ROADMAP • 2022: • Klepsydra AI for GPU (Q4 2022) • 2023: • Klepsydra AI For FGPA (Q2 2023) • Support for operating systems: FreeRTOS (Q3 2023) • Support for processor architectures: • GR740 and GR765 (Q4 2023) • RISC-V (Q4 2023) • HPDP (Q4 2023)
  • 42. CONTACT INFORMATION Dr Pablo Ghiglino pablo.ghiglino@klepsydra.com +41786931544 www.klepsydra.com linkedin.com/company/klepsydra-technologies