SlideShare ist ein Scribd-Unternehmen logo
1 von 27
Downloaden Sie, um offline zu lesen
GR740 USER DAY
13.12.2022
pablo.ghiglino@klepsydra.com
www.klepsydra.com
Klepsydra Technologies
Part 1
Pre-existing work
CONTEXT: PARALLEL PROCESSING
• Recurrent mission
failures due to
software
• Access to sensor data
from Earth is time
consuming.
• Satellites struggle to
meet power
requirements
Consequences for Space applications
Challenges on on-board processing
CPU
Usage
Low Medium
Data volume
Modern hardware and old
software:
• Computers max out with low to
medium data volumes
• Inef
fi
cient use of resources
• Excessive power for low data
processing
LOCK BASED PARALLELISATION VS
LOCK FREE PARALLELISATION
• Threads need to acquire lock to access resource.
• Context switch:
• Suspended while resource is locked by
someone else
• Awaken when resource is available.
• Not deterministic, power consuming context switch.
• Threads access resources using ‘Atomic Operations’
• Compare and Swap (CAS):
• Try to update a memory entry
• If not possible tried again
• No locks involved, but ‘busy wait’
• No context switch required.
PROS AND CONS OF LOCK-FREE
PROGRAMMING
CPU
Usage
Data volume
CPU
Usage
Data volume
Lock-free programming
Pros:
• Less CPU consumption required
• Lower latency and higher data throughput
• Substantial increase in determinism
Cons:
• Extremely dif
fi
cult programming
technique
• Requires processor with CAS instructions
(90% of the market have them, though)
COMPRESSION ALGORITHM ON GR716
GR716 Benchmark
Power consumption vs Data Processing
Power
(%)
10
33
55
78
100
Data processing rate (Hz)
0 10 20 30 40
Traditional
edge software Klepsydra
Traditional thread-safe queue
Klepsydra’s Eventloop
ENCRYPTION ALGORITHM ON GR716
Lightweight, modular and compatible with most used operating systems
Worldwide
application
Klepsydra
SDK
Klepsydra
GPU
Streaming
Klepsydra
AI
Klepsydra
ROS2
executor
plugin
SDK – Software Development Kit
Boost data processing at the edge for general
applications and processor intensive
algorithms
AI – Artificial Intelligence
High performance deep neural network
(DNN) engine to deploy any AI or
machine learning module at the edge
ROS2 Executor plugin
Executor for ROS2 able to process up
to 10 times more data with up to 50%
reduction in CPU consumption.
GPU (Graphic Processing Unit)
High parallelisation of GPU to increase
the processing data rate and GPU
utilization
THE PRODUCT
LOCK-FREE AS ALTERNATIVE TO
PARALLELISATION
Parallelisation Pipeline
2-DIM THREADING MODEL
Input
Data
Layer
Output
Data
First dimension: pipelining
{
Thread 1 (Core 1)
Layer
Layer
Layer
Layer
Layer
{
Thread 2 (Core 2)
Layer
Layer
Layer
Layer
Layer Layer Layer Layer Layer
Deep Neural Network Structure
2-DIM THREADING MODEL
Input
Data
Output
Data
Second dimension: Matrix
multiplication parallelisation
{
T
hread
1
(Core
1)
Layer
{
T
hread
2
(Core
2)
{
T
hread
3
(Core
3)
2-DIM THREADING MODEL
Core 1 Core 2
Core 3 Core 4
Layer
Layer
Layer
Layer
Layer
Layer
Core 1 Core 2
Core 3 Core 4
Layer
Layer
Layer
Layer
Layer
Layer
Core 1 Core 2
Core 3 Core 4
Layer
Layer
Layer
Layer
Layer
Layer
Layer
Layer
Layer
• Low CPU
• High throughput CPU
• High latency
• Mid CPU
• Mid throughput CPU
• Mid latency
• High CPU
• Mid throughput CPU
• Low latency
Threading model con
fi
guration
13
Example of performance benchmarks
Klepsydra AI optimised for Latency vs CPU
CPU
Usage
0
30
60
90
Data Rate (Hz)
0 10 20 30
Latency Optimisation CPU Optimisaion TF Lite
Latency = 29ms
Low CPU Saturation
Latency = 97ms
TF Lite Saturation
Latency = 120ms
KPSR File API
class KPSR_API DNNImporter
{
public:
/**
* @brief import a klepsydra model file and uses a default eventloop factory for all processor cores
* @param modelFileName
* @param testDNN
* @return a shared pointer to a DeepNeuralNetwork object
*
* When log level is debug, dumps the YAML configuration of the default factory.
* It makes use of all processor cores.
*/
static std::shared_ptr<kpsr::ai::DeepNeuralNetworkFactory> createDNNFactory(const std::string & modelFileName,
bool testDNN = false);
/**
* @brief importForTest a klepsydra model file and uses a default synchronous factory
* @param modelFileName
* @param envFileName. Klepsydra AI configuration environment file.
* @return a shared pointer to a DeepNeuralNetwork object
*
* This method is intended to be used for testing purposes only.
*/
static std::shared_ptr<kpsr::ai::DeepNeuralNetworkFactory> createDNNFactory(const std::string & modelFileName,
const std::string & envFileName);
};
14
Core API
class DeepNeuralNetwork {
public:
/**
* @brief setCallback
* @param callback. Callback function for the prediction result.
*/
virtual void setCallback(std::function<void(const unsigned long &, const kpsr::ai::F32AlignedVector &)> callback) = 0;
/**
* @brief predict. Load input matrix as input to network.
* @param inputVector. An F32AlignedVector of floats containing network input.
*
* @return Unique id corresponding to the input vector
*/
virtual unsigned long predict(const kpsr::ai::F32AlignedVector& inputVector) = 0;
/**
* @brief predict. Copy-less version of predict.
* @param inputVector. An F32AlignedVector of floats containing network input.
*
* @return Unique id corresponding to the input vector
*/
virtual unsigned long predict(const std::shared_ptr<kpsr::ai::F32AlignedVector> & inputVector) = 0;
};
15
KLEPSYDRA SDO PROCESS
• Klepsydra Streaming Distribution Optimiser (SDO):
• Runs on a separate computer
• Executes several dry runs on the OBC
• Collect statistics
• Runs a genetic algorithm to
fi
nd the optimal
solution for latency, power or throughput
• The main variables to optimise are the two
dimensions of the threading model
KLEPSYDRA STREAMING DISTRIBUTION
OPTIMISER (SDO)
Part 2
The PATTERN
Project
THE PATTERN PROJECT
PATTERN: Klepsydra AI ported to the GR740 aNd
RISC-V
• Target Processor: GR740, GR765 (Leon5 & Noel-V)
• Target OS: RTMES5
• Development on commercial FPGA board
• Validation on Space quali
fi
ed hardware
THE MULTI-THREADING API
Klepsydra SDK
Multi-threading framework
POSIX Operating System
PTHREAD
Klepsydra AI
Klepsydra SDK
Threading Abstraction Layer
POSIX
POSIX Operating System
PTHREAD
Klepsydra AI
RTEMS5
Multi-threading framework
RTEMS
THE MATHEMATICAL BACKEND
Klepsydra AI
Back-ends
Full-backend (Float32, Int8) Quantized-backend (Int8)
ARM x86 ARM x86
Klepsydra AI
Back-ends
Full-backend (Float32, Int8) Quantized-backend (Int8)
ARM x86 ARM x86
Leon4?
Leon5?
THE PARALLELISATION FRAMEWORK
Klepsydra AI
Back-ends
Full-backend (Float32, Int8) Quantized-backend (Int8)
POSIX Operating System
PTHREAD
Parallelisation Framework
Klepsydra AI
Back-ends
Full-backend (Float32, Int8) Quantized-backend (Int8)
Parallelisation Framework
Threading Abstraction Layer
POSIX
RTEMS5
Multi-threading framework
POSIX Operating System
PTHREAD
RTEMS
THE PLAN
Phase 1:
Klepsydra AI for RTEMS5
Phase 2:
Klepsydra AI for GR765/Leon5
Phase 3:
Klepsydra AI for GR765/Noel-V
Phase 4:
Validation of Klepsydra AI on
GR740 and GR765
THE SCHEDULE
Work Package Start Month End Month
Duration in
Months
0 1 2 3 4 5 6 7 8 9 10 11
KOM MTR1 MTR2 FR
WP4.2 11 11 1
WP0 0 17 18
2
11
10
WP4.1
1
10
10
WP3.3
0 0 1
WP2.1
WP2.2 1 4 4
WP1.1 5
0 4
5 8 4
WP1.2
WP2.3 9 10 2
WP3.2 5 8 4
WP3.1 5 5 1
CONCLUSIONS
• Enable real AI for future missions on the GR740 and GR765
• Very easy to use, via a simple API and web-based
optimisation tool
• Highly optimised for the GR740 and GR765 processors
• Lightweight software (current version is 4Mb)
• Deterministic and full control of the dedicated resources
NEXT STEPS
• In-orbit-demonstration:
• OPSSAT OBC
• Other?
• Health Monitoring (core operation failures, etc).
CONTACT INFORMATION
Dr Pablo Ghiglino
pablo.ghiglino@klepsydra.com
+41786931544
www.klepsydra.com
linkedin.com/company/klepsydra-technologies

Weitere ähnliche Inhalte

Ähnlich wie GR740 User day

6 open capi_meetup_in_japan_final
6 open capi_meetup_in_japan_final6 open capi_meetup_in_japan_final
6 open capi_meetup_in_japan_finalYutaka Kawai
 
Using a Field Programmable Gate Array to Accelerate Application Performance
Using a Field Programmable Gate Array to Accelerate Application PerformanceUsing a Field Programmable Gate Array to Accelerate Application Performance
Using a Field Programmable Gate Array to Accelerate Application PerformanceOdinot Stanislas
 
A Dataflow Processing Chip for Training Deep Neural Networks
A Dataflow Processing Chip for Training Deep Neural NetworksA Dataflow Processing Chip for Training Deep Neural Networks
A Dataflow Processing Chip for Training Deep Neural Networksinside-BigData.com
 
BUD17-300: Journey of a packet
BUD17-300: Journey of a packetBUD17-300: Journey of a packet
BUD17-300: Journey of a packetLinaro
 
PL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PL/CUDA - Fusion of HPC Grade Power with In-Database AnalyticsPL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PL/CUDA - Fusion of HPC Grade Power with In-Database AnalyticsKohei KaiGai
 
Dataplane networking acceleration with OpenDataplane / Максим Уваров (Linaro)
Dataplane networking acceleration with OpenDataplane / Максим Уваров (Linaro)Dataplane networking acceleration with OpenDataplane / Максим Уваров (Linaro)
Dataplane networking acceleration with OpenDataplane / Максим Уваров (Linaro)Ontico
 
Speed up UDFs with GPUs using the RAPIDS Accelerator
Speed up UDFs with GPUs using the RAPIDS AcceleratorSpeed up UDFs with GPUs using the RAPIDS Accelerator
Speed up UDFs with GPUs using the RAPIDS AcceleratorDatabricks
 
Netmap presentation
Netmap presentationNetmap presentation
Netmap presentationAmir Razmjou
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computingArka Ghosh
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computingArka Ghosh
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computingArka Ghosh
 
Spark Summit EU talk by Sol Ackerman and Franklyn D'souza
Spark Summit EU talk by Sol Ackerman and Franklyn D'souzaSpark Summit EU talk by Sol Ackerman and Franklyn D'souza
Spark Summit EU talk by Sol Ackerman and Franklyn D'souzaSpark Summit
 
Sharing High-Performance Interconnects Across Multiple Virtual Machines
Sharing High-Performance Interconnects Across Multiple Virtual MachinesSharing High-Performance Interconnects Across Multiple Virtual Machines
Sharing High-Performance Interconnects Across Multiple Virtual Machinesinside-BigData.com
 
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...DataWorks Summit/Hadoop Summit
 
Design installation-commissioning-red raider-cluster-ttu
Design installation-commissioning-red raider-cluster-ttuDesign installation-commissioning-red raider-cluster-ttu
Design installation-commissioning-red raider-cluster-ttuAlan Sill
 
Build Large-Scale Data Analytics and AI Pipeline Using RayDP
Build Large-Scale Data Analytics and AI Pipeline Using RayDPBuild Large-Scale Data Analytics and AI Pipeline Using RayDP
Build Large-Scale Data Analytics and AI Pipeline Using RayDPDatabricks
 
GPU and Deep learning best practices
GPU and Deep learning best practicesGPU and Deep learning best practices
GPU and Deep learning best practicesLior Sidi
 

Ähnlich wie GR740 User day (20)

6 open capi_meetup_in_japan_final
6 open capi_meetup_in_japan_final6 open capi_meetup_in_japan_final
6 open capi_meetup_in_japan_final
 
RAPIDS Overview
RAPIDS OverviewRAPIDS Overview
RAPIDS Overview
 
Using a Field Programmable Gate Array to Accelerate Application Performance
Using a Field Programmable Gate Array to Accelerate Application PerformanceUsing a Field Programmable Gate Array to Accelerate Application Performance
Using a Field Programmable Gate Array to Accelerate Application Performance
 
A Dataflow Processing Chip for Training Deep Neural Networks
A Dataflow Processing Chip for Training Deep Neural NetworksA Dataflow Processing Chip for Training Deep Neural Networks
A Dataflow Processing Chip for Training Deep Neural Networks
 
BUD17-300: Journey of a packet
BUD17-300: Journey of a packetBUD17-300: Journey of a packet
BUD17-300: Journey of a packet
 
PL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PL/CUDA - Fusion of HPC Grade Power with In-Database AnalyticsPL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
 
Dataplane networking acceleration with OpenDataplane / Максим Уваров (Linaro)
Dataplane networking acceleration with OpenDataplane / Максим Уваров (Linaro)Dataplane networking acceleration with OpenDataplane / Максим Уваров (Linaro)
Dataplane networking acceleration with OpenDataplane / Максим Уваров (Linaro)
 
Speed up UDFs with GPUs using the RAPIDS Accelerator
Speed up UDFs with GPUs using the RAPIDS AcceleratorSpeed up UDFs with GPUs using the RAPIDS Accelerator
Speed up UDFs with GPUs using the RAPIDS Accelerator
 
0507036
05070360507036
0507036
 
Netmap presentation
Netmap presentationNetmap presentation
Netmap presentation
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computing
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computing
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computing
 
Spark Summit EU talk by Sol Ackerman and Franklyn D'souza
Spark Summit EU talk by Sol Ackerman and Franklyn D'souzaSpark Summit EU talk by Sol Ackerman and Franklyn D'souza
Spark Summit EU talk by Sol Ackerman and Franklyn D'souza
 
Exascale Capabl
Exascale CapablExascale Capabl
Exascale Capabl
 
Sharing High-Performance Interconnects Across Multiple Virtual Machines
Sharing High-Performance Interconnects Across Multiple Virtual MachinesSharing High-Performance Interconnects Across Multiple Virtual Machines
Sharing High-Performance Interconnects Across Multiple Virtual Machines
 
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
 
Design installation-commissioning-red raider-cluster-ttu
Design installation-commissioning-red raider-cluster-ttuDesign installation-commissioning-red raider-cluster-ttu
Design installation-commissioning-red raider-cluster-ttu
 
Build Large-Scale Data Analytics and AI Pipeline Using RayDP
Build Large-Scale Data Analytics and AI Pipeline Using RayDPBuild Large-Scale Data Analytics and AI Pipeline Using RayDP
Build Large-Scale Data Analytics and AI Pipeline Using RayDP
 
GPU and Deep learning best practices
GPU and Deep learning best practicesGPU and Deep learning best practices
GPU and Deep learning best practices
 

Mehr von klepsydratechnologie (8)

Robotics technical Presentation
Robotics technical PresentationRobotics technical Presentation
Robotics technical Presentation
 
OBDPC 2022
OBDPC 2022OBDPC 2022
OBDPC 2022
 
Dasia 2022
Dasia 2022Dasia 2022
Dasia 2022
 
Klepsydra Company Presentation
Klepsydra Company PresentationKlepsydra Company Presentation
Klepsydra Company Presentation
 
Roscon2021 Executor
Roscon2021 ExecutorRoscon2021 Executor
Roscon2021 Executor
 
IAC 2020
IAC 2020IAC 2020
IAC 2020
 
Smallsat 2021
Smallsat 2021Smallsat 2021
Smallsat 2021
 
IAC 2019
IAC 2019 IAC 2019
IAC 2019
 

Kürzlich hochgeladen

CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️anilsa9823
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about usDynamic Netsoft
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...ICS
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software DevelopersVinodh Ram
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...panagenda
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...OnePlan Solutions
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Modelsaagamshah0812
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdfWave PLM
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantAxelRicardoTrocheRiq
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...gurkirankumar98700
 
Active Directory Penetration Testing, cionsystems.com.pdf
Active Directory Penetration Testing, cionsystems.com.pdfActive Directory Penetration Testing, cionsystems.com.pdf
Active Directory Penetration Testing, cionsystems.com.pdfCionsystems
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsJhone kinadey
 
Clustering techniques data mining book ....
Clustering techniques data mining book ....Clustering techniques data mining book ....
Clustering techniques data mining book ....ShaimaaMohamedGalal
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...OnePlan Solutions
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsArshad QA
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionSolGuruz
 

Kürzlich hochgeladen (20)

CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about us
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software Developers
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service Consultant
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Exploring iOS App Development: Simplifying the Process
Exploring iOS App Development: Simplifying the ProcessExploring iOS App Development: Simplifying the Process
Exploring iOS App Development: Simplifying the Process
 
Active Directory Penetration Testing, cionsystems.com.pdf
Active Directory Penetration Testing, cionsystems.com.pdfActive Directory Penetration Testing, cionsystems.com.pdf
Active Directory Penetration Testing, cionsystems.com.pdf
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
Clustering techniques data mining book ....
Clustering techniques data mining book ....Clustering techniques data mining book ....
Clustering techniques data mining book ....
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...
 
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
 

GR740 User day

  • 3. CONTEXT: PARALLEL PROCESSING • Recurrent mission failures due to software • Access to sensor data from Earth is time consuming. • Satellites struggle to meet power requirements Consequences for Space applications Challenges on on-board processing CPU Usage Low Medium Data volume Modern hardware and old software: • Computers max out with low to medium data volumes • Inef fi cient use of resources • Excessive power for low data processing
  • 4. LOCK BASED PARALLELISATION VS LOCK FREE PARALLELISATION • Threads need to acquire lock to access resource. • Context switch: • Suspended while resource is locked by someone else • Awaken when resource is available. • Not deterministic, power consuming context switch. • Threads access resources using ‘Atomic Operations’ • Compare and Swap (CAS): • Try to update a memory entry • If not possible tried again • No locks involved, but ‘busy wait’ • No context switch required.
  • 5. PROS AND CONS OF LOCK-FREE PROGRAMMING CPU Usage Data volume CPU Usage Data volume Lock-free programming Pros: • Less CPU consumption required • Lower latency and higher data throughput • Substantial increase in determinism Cons: • Extremely dif fi cult programming technique • Requires processor with CAS instructions (90% of the market have them, though)
  • 6. COMPRESSION ALGORITHM ON GR716 GR716 Benchmark Power consumption vs Data Processing Power (%) 10 33 55 78 100 Data processing rate (Hz) 0 10 20 30 40 Traditional edge software Klepsydra Traditional thread-safe queue Klepsydra’s Eventloop
  • 8. Lightweight, modular and compatible with most used operating systems Worldwide application Klepsydra SDK Klepsydra GPU Streaming Klepsydra AI Klepsydra ROS2 executor plugin SDK – Software Development Kit Boost data processing at the edge for general applications and processor intensive algorithms AI – Artificial Intelligence High performance deep neural network (DNN) engine to deploy any AI or machine learning module at the edge ROS2 Executor plugin Executor for ROS2 able to process up to 10 times more data with up to 50% reduction in CPU consumption. GPU (Graphic Processing Unit) High parallelisation of GPU to increase the processing data rate and GPU utilization THE PRODUCT
  • 9. LOCK-FREE AS ALTERNATIVE TO PARALLELISATION Parallelisation Pipeline
  • 10. 2-DIM THREADING MODEL Input Data Layer Output Data First dimension: pipelining { Thread 1 (Core 1) Layer Layer Layer Layer Layer { Thread 2 (Core 2) Layer Layer Layer Layer Layer Layer Layer Layer Layer Deep Neural Network Structure
  • 11. 2-DIM THREADING MODEL Input Data Output Data Second dimension: Matrix multiplication parallelisation { T hread 1 (Core 1) Layer { T hread 2 (Core 2) { T hread 3 (Core 3)
  • 12. 2-DIM THREADING MODEL Core 1 Core 2 Core 3 Core 4 Layer Layer Layer Layer Layer Layer Core 1 Core 2 Core 3 Core 4 Layer Layer Layer Layer Layer Layer Core 1 Core 2 Core 3 Core 4 Layer Layer Layer Layer Layer Layer Layer Layer Layer • Low CPU • High throughput CPU • High latency • Mid CPU • Mid throughput CPU • Mid latency • High CPU • Mid throughput CPU • Low latency Threading model con fi guration
  • 13. 13 Example of performance benchmarks Klepsydra AI optimised for Latency vs CPU CPU Usage 0 30 60 90 Data Rate (Hz) 0 10 20 30 Latency Optimisation CPU Optimisaion TF Lite Latency = 29ms Low CPU Saturation Latency = 97ms TF Lite Saturation Latency = 120ms
  • 14. KPSR File API class KPSR_API DNNImporter { public: /** * @brief import a klepsydra model file and uses a default eventloop factory for all processor cores * @param modelFileName * @param testDNN * @return a shared pointer to a DeepNeuralNetwork object * * When log level is debug, dumps the YAML configuration of the default factory. * It makes use of all processor cores. */ static std::shared_ptr<kpsr::ai::DeepNeuralNetworkFactory> createDNNFactory(const std::string & modelFileName, bool testDNN = false); /** * @brief importForTest a klepsydra model file and uses a default synchronous factory * @param modelFileName * @param envFileName. Klepsydra AI configuration environment file. * @return a shared pointer to a DeepNeuralNetwork object * * This method is intended to be used for testing purposes only. */ static std::shared_ptr<kpsr::ai::DeepNeuralNetworkFactory> createDNNFactory(const std::string & modelFileName, const std::string & envFileName); }; 14
  • 15. Core API class DeepNeuralNetwork { public: /** * @brief setCallback * @param callback. Callback function for the prediction result. */ virtual void setCallback(std::function<void(const unsigned long &, const kpsr::ai::F32AlignedVector &)> callback) = 0; /** * @brief predict. Load input matrix as input to network. * @param inputVector. An F32AlignedVector of floats containing network input. * * @return Unique id corresponding to the input vector */ virtual unsigned long predict(const kpsr::ai::F32AlignedVector& inputVector) = 0; /** * @brief predict. Copy-less version of predict. * @param inputVector. An F32AlignedVector of floats containing network input. * * @return Unique id corresponding to the input vector */ virtual unsigned long predict(const std::shared_ptr<kpsr::ai::F32AlignedVector> & inputVector) = 0; }; 15
  • 16. KLEPSYDRA SDO PROCESS • Klepsydra Streaming Distribution Optimiser (SDO): • Runs on a separate computer • Executes several dry runs on the OBC • Collect statistics • Runs a genetic algorithm to fi nd the optimal solution for latency, power or throughput • The main variables to optimise are the two dimensions of the threading model
  • 19. THE PATTERN PROJECT PATTERN: Klepsydra AI ported to the GR740 aNd RISC-V • Target Processor: GR740, GR765 (Leon5 & Noel-V) • Target OS: RTMES5 • Development on commercial FPGA board • Validation on Space quali fi ed hardware
  • 20. THE MULTI-THREADING API Klepsydra SDK Multi-threading framework POSIX Operating System PTHREAD Klepsydra AI Klepsydra SDK Threading Abstraction Layer POSIX POSIX Operating System PTHREAD Klepsydra AI RTEMS5 Multi-threading framework RTEMS
  • 21. THE MATHEMATICAL BACKEND Klepsydra AI Back-ends Full-backend (Float32, Int8) Quantized-backend (Int8) ARM x86 ARM x86 Klepsydra AI Back-ends Full-backend (Float32, Int8) Quantized-backend (Int8) ARM x86 ARM x86 Leon4? Leon5?
  • 22. THE PARALLELISATION FRAMEWORK Klepsydra AI Back-ends Full-backend (Float32, Int8) Quantized-backend (Int8) POSIX Operating System PTHREAD Parallelisation Framework Klepsydra AI Back-ends Full-backend (Float32, Int8) Quantized-backend (Int8) Parallelisation Framework Threading Abstraction Layer POSIX RTEMS5 Multi-threading framework POSIX Operating System PTHREAD RTEMS
  • 23. THE PLAN Phase 1: Klepsydra AI for RTEMS5 Phase 2: Klepsydra AI for GR765/Leon5 Phase 3: Klepsydra AI for GR765/Noel-V Phase 4: Validation of Klepsydra AI on GR740 and GR765
  • 24. THE SCHEDULE Work Package Start Month End Month Duration in Months 0 1 2 3 4 5 6 7 8 9 10 11 KOM MTR1 MTR2 FR WP4.2 11 11 1 WP0 0 17 18 2 11 10 WP4.1 1 10 10 WP3.3 0 0 1 WP2.1 WP2.2 1 4 4 WP1.1 5 0 4 5 8 4 WP1.2 WP2.3 9 10 2 WP3.2 5 8 4 WP3.1 5 5 1
  • 25. CONCLUSIONS • Enable real AI for future missions on the GR740 and GR765 • Very easy to use, via a simple API and web-based optimisation tool • Highly optimised for the GR740 and GR765 processors • Lightweight software (current version is 4Mb) • Deterministic and full control of the dedicated resources
  • 26. NEXT STEPS • In-orbit-demonstration: • OPSSAT OBC • Other? • Health Monitoring (core operation failures, etc).
  • 27. CONTACT INFORMATION Dr Pablo Ghiglino pablo.ghiglino@klepsydra.com +41786931544 www.klepsydra.com linkedin.com/company/klepsydra-technologies