SMALLSAT 2021 PRESENTATION
DR PABLO GHIGLINO
pablo.ghiglino@klepsydra.com
www.klepsydra.com
A Low Power And High Performance Artificial Intelligence
Approach To Increase Guidance Navigation And Control
Robustness
KLEPSYDRA AI IN ACTION
The demo:
• Pose estimation of comet 67P/Churyumov–Gerasimenko.
• Using an AI deep neural network (DNN).
• Using real and synthetically generated data from the Rosetta mission.
• Comparison of three AI inference engines: Klepsydra AI, TensorFlow Lite and OpenCV-CNN.
• Three identical computers, running the same model with the same input data and FPS.
KLEPSYDRA AI OVERVIEW
[Diagram labels: Klepsydra AI; Trained Model; Basic features; Advanced features; Performance analysis; Language bindings; Images; Sensor Data; Timeseries]
TRADING SOFTWARE VS EDGE SOFTWARE
Trading Systems
Edge Systems
• A bigger computer did not solve the problem.
• It can be solved using cutting-edge lock-free programming techniques.
• Top investment banks make billions using these techniques.
• Very few developers have the required skills.
[Chart: Computer Usage vs Data volume; labels: Low, Medium, Saturation]
THE TECHNOLOGY
[Diagram labels: Klepsydra SDK; Sensors; External Comms; Other Events; Application; Operating System]
Patent pending technology
Klepsydra SDK
• 8x more real-time throughput
• 50% less CPU consumption
• No extra hardware or cloud
Two main data processing approaches
• Event Loop
• Sensor Multiplexer
[Diagram labels: Producer 1, Producer 2, Producer 3, Consumer; Producer 1, Consumer 1, Consumer 2]
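As a rough illustration of the event-loop idea, the sketch below funnels events from several producer threads into a single consumer thread. It assumes the common reading of an event loop (many producers, one consumer), which the slide labels suggest but do not spell out. The function name `run_event_loop` and the queue size are hypothetical, and Python's `queue.Queue` is a lock-based stand-in: it does not reproduce the lock-free technique the slides refer to.

```python
import queue
import threading

def run_event_loop(num_producers, events_per_producer):
    """Funnel events from several producer threads into one consumer thread."""
    events = queue.Queue(maxsize=64)   # bounded, lock-based stand-in (assumption)
    stop = object()                    # sentinel that tells the loop to exit
    handled = []

    def producer(name):
        for seq in range(events_per_producer):
            events.put((name, seq))    # blocks if the queue is full

    def event_loop():
        # Single consumer: events are handled one at a time, in arrival order.
        while True:
            event = events.get()
            if event is stop:
                break
            handled.append(event)

    consumer = threading.Thread(target=event_loop)
    consumer.start()
    producers = [
        threading.Thread(target=producer, args=(f"Producer {i + 1}",))
        for i in range(num_producers)
    ]
    for p in producers:
        p.start()
    for p in producers:
        p.join()
    events.put(stop)                   # producers are done, so the sentinel is last
    consumer.join()
    return handled

handled = run_event_loop(num_producers=3, events_per_producer=5)
print(len(handled), "events handled")
```

Because only one thread consumes, the handler code itself needs no locking; the cost of synchronisation sits entirely in the queue, which is the part a lock-free design would replace.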
Cobham GR716 Microcontroller
[Chart: CPU (%) vs Processing Rate (Hz), 8 producers; legend: Safe Queue, Klepsydra; annotations: Traditional concurrent queue, Klepsydra’s Eventloop]
[Chart: Power consumption (%) vs Data processing rate (Hz); legend: Traditional edge software, Klepsydra]
Technical Spec:
• Processor: GR716
• OS: RTEMS 5
• Middleware: Memory data sharing
Benchmark Scenario:
• Multi-sensor data processing
• Concurrent Queue and Klepsydra’s processing engine
APPROACHES TO CONCURRENT ALGORITHMIC EXECUTION
[Diagrams: Parallelisation; Pipeline]
BENCHMARK DESCRIPTION
Description
• Given an input matrix, a number of sequential multiplications are performed:
• Step 1: B = A x A; Step 2: C = B x B; …
• Matrix A is randomly generated for each new sequence.
Parameters:
• Matrix dimensions: 100x100
• Data type: float, integer
• Number of multiplications per matrix: [10, 60]
• Processing frequency: [2 Hz, 100 Hz]
Technical Spec
• Computer: Odroid XU4
• OS: Ubuntu 18.04
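The workload described above can be sketched in a few lines. This is an illustrative reconstruction only, not the benchmark code used for the measurements: the helper names are hypothetical, the matrices are plain Python lists, and the random entries are scaled by 1/n (an assumption of this sketch) so that repeated squaring stays within floating-point range.

```python
import random

def random_matrix(n, rng):
    # Entries are scaled by 1/n (assumption) so repeated squaring does not overflow.
    return [[rng.random() / n for _ in range(n)] for _ in range(n)]

def matmul(x, y):
    y_cols = list(zip(*y))  # columns of y, so each output entry is a dot product
    return [[sum(a * b for a, b in zip(row, col)) for col in y_cols] for row in x]

def run_sequence(a, steps):
    """Apply B = A x A, C = B x B, ... for the given number of steps."""
    current = a
    for _ in range(steps):
        current = matmul(current, current)
    return current

rng = random.Random(0)
a = random_matrix(8, rng)            # the slides use 100x100; 8x8 keeps the sketch fast
result = run_sequence(a, steps=10)   # the slides vary this between 10 and 60
print(len(result), "x", len(result[0]))
```

Each step depends on the previous one, so a single sequence cannot be split across threads step by step; any concurrency has to come from parallelising inside a multiplication or from overlapping different sequences.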
TESTING SCENARIOS
[Diagrams: two setups, "Klepsydra Parallel Streaming Setup" and "OpenMP Sequential Setup", each showing Input Matrix -> B = A x A -> C = B x B -> Output Matrix; braces label Thread 1, Thread 2, Vectorised, Vectorised]
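A streaming (pipelined) setup of this kind can be sketched with one thread per step, connected by queues: while the second thread squares the intermediate result of one input, the first thread can already start on the next input. The names `stage` and `run_pipeline` are hypothetical, and in CPython the global interpreter lock prevents pure-Python stages from truly running in parallel, so this shows the structure rather than the speed-up.

```python
import queue
import threading

def square(m):
    cols = list(zip(*m))  # columns of m, so each output entry is a dot product
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in m]

def stage(inbox, outbox):
    # Each stage squares whatever arrives and forwards it; None ends the stream.
    while True:
        m = inbox.get()
        if m is None:
            outbox.put(None)
            break
        outbox.put(square(m))

def run_pipeline(matrices, num_stages=2):
    queues = [queue.Queue() for _ in range(num_stages + 1)]
    threads = [
        threading.Thread(target=stage, args=(queues[i], queues[i + 1]))
        for i in range(num_stages)
    ]
    for t in threads:
        t.start()
    for m in matrices:
        queues[0].put(m)
    queues[0].put(None)              # end-of-stream marker
    results = []
    while True:
        out = queues[-1].get()
        if out is None:
            break
        results.append(out)
    for t in threads:
        t.join()
    return results

# Two stages: thread 1 computes B = A x A, thread 2 computes C = B x B.
outputs = run_pipeline([[[2]], [[3]], [[1, 1], [0, 1]]])
print(outputs)  # [[[16]], [[81]], [[1, 4], [0, 1]]]
```

Latency per input is still the sum of both steps, but throughput is bounded by the slowest single stage rather than by the whole chain.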
FLOAT PERFORMANCE RESULTS I
[Charts, OpenMP vs Klepsydra, each plotted against Publishing Rate (Hz): CPU Usage, Throughput and Latency for 10 steps (2 to 100 Hz) and for 20 steps (2 to 40 Hz)]
FLOAT PERFORMANCE RESULTS II
[Charts, OpenMP vs Klepsydra, each plotted against Publishing Rate (Hz): CPU Usage, Throughput and Latency for 30 steps (2 to 20 Hz) and for 40 steps (2 to 14 Hz)]
FLOAT PERFORMANCE RESULTS III
[Charts, OpenMP vs Klepsydra, each plotted against Publishing Rate (Hz): CPU Usage, Throughput and Latency for 50 steps (2 to 10 Hz) and for 60 steps (2 to 8 Hz)]
KLEPSYDRA AI DATA PROCESSING APPROACH
[Diagram: Input Data -> Layer -> Layer -> Output Data; braces label Thread 1 and Thread 2]
Klepsydra AI threading model
Threading model consists of:
- Number of cores assigned to event loops
- Number of event loops per core
- Number of parallelisation threads for each layer
Most layers can be parallelised and are vectorised.
Eventloops are assigned to cores.
Performance tuning
Performance Criteria
• CPU usage
• RAM usage
• Throughput (output data rate)
• Latency
Performance parameters:
• pool_size
Size of the internal queues of the event loop publish/subscribe pairs. High throughput requires large values, i.e., more RAM usage; low throughput requires smaller values, therefore less RAM.
• number_of_cores
Number of cores across which event loops are distributed (by default one event loop per core). High throughput requires more cores, i.e., more CPU usage; low throughput requires fewer cores, therefore a substantial reduction in CPU usage.
• number_of_parallel_threads
Number of threads assigned to parallelise layers. For low latency requirements, assign large values (maximum = number of cores), i.e., increased CPU usage. If there are no latency requirements, use low values (minimum = 1), therefore a substantial reduction in CPU usage.
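The trade-offs above can be captured in a small configuration object. The sketch below is hypothetical and is not the Klepsydra AI API: only the three parameter names come from the slide, while `TuningConfig`, `suggest_config`, the `available_cores` field and the concrete pool sizes (8 and 64) are assumptions made for illustration. It also reads "maximum = number of cores" as the number of cores on the machine.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TuningConfig:
    pool_size: int                    # internal queue size: larger means more RAM
    number_of_cores: int              # cores that run event loops
    number_of_parallel_threads: int   # threads used to parallelise each layer
    available_cores: int              # cores on the target machine (assumption)

    def __post_init__(self):
        if self.pool_size < 1:
            raise ValueError("pool_size must be at least 1")
        if not 1 <= self.number_of_cores <= self.available_cores:
            raise ValueError("number_of_cores must be between 1 and available_cores")
        if not 1 <= self.number_of_parallel_threads <= self.available_cores:
            raise ValueError(
                "number_of_parallel_threads must be between 1 and available_cores"
            )

def suggest_config(available_cores, high_throughput, low_latency):
    """Pick a starting point from the rules of thumb on the slide."""
    return TuningConfig(
        pool_size=64 if high_throughput else 8,            # hypothetical sizes
        number_of_cores=available_cores if high_throughput else 1,
        number_of_parallel_threads=available_cores if low_latency else 1,
        available_cores=available_cores,
    )

throughput_cfg = suggest_config(4, high_throughput=True, low_latency=False)
latency_cfg = suggest_config(4, high_throughput=False, low_latency=True)
print(throughput_cfg)
print(latency_cfg)
```

Treat the suggested values as starting points for measurement against the four performance criteria, not as recommended settings.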
Example of performance benchmarks
• TensorFlow: latency 56 ms
• Klepsydra AI: latency 35 ms
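To put the two latency figures in relative terms, the short calculation below assumes the labels pair in reading order (TensorFlow at 56 ms, Klepsydra AI at 35 ms). The function name `latency_comparison` is hypothetical, and the numbers are the only inputs taken from the slide.

```python
def latency_comparison(baseline_ms, improved_ms):
    """Return (percentage reduction, speed-up factor) between two latencies."""
    reduction_pct = 100.0 * (baseline_ms - improved_ms) / baseline_ms
    speedup = baseline_ms / improved_ms
    return reduction_pct, speedup

reduction_pct, speedup = latency_comparison(baseline_ms=56, improved_ms=35)
print(f"{reduction_pct:.1f}% lower latency, {speedup:.2f}x faster")
# prints "37.5% lower latency, 1.60x faster"
```

This is a single data point from one demo, so it says nothing about how the gap changes with model size, input rate or hardware.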
ROADMAP
Q2 2021
• No third party dependencies.
• Binaries are C/C++ only
• Custom format for models
Q3 2021
• FreeRTOS support (alpha version)
• Xilinx Ultrascale+ board
• Microchip SAM V71
Q4 2021
• PikeOS support (alpha version)
• Xilinx Zedboard
Q1 2022
• NVIDIA Jetson TX2 support (alpha release)
• Quantisation support
Q2 2022
• Graphs support
• Memory allocation new model
• C support
Legend:
Hard deadlines
Flexible dates
CONCLUSIONS
• The use of advanced lock-free algorithms for on-board data processing allows a substantial increase in real-time data throughput and a 50% reduction in power consumption.
• When combined with pipelining, it can enable ground-breaking performance improvements in AI algorithms.
• Further work will be done in the fields of GPU and FPGA, self-tuning and graph AI models.
CONTACT INFORMATION
Dr Pablo Ghiglino
pablo.ghiglino@klepsydra.com
+41786931544
www.klepsydra.com
linkedin.com/company/klepsydra-technologies
