SlideShare ist ein Scribd-Unternehmen logo
1 von 38
Downloaden Sie, um offline zu lesen
Parallel Iterative solution of the
Hermite Collocation Equations
on GPUs
Emmanuel N. Mathioudakis
Co-Authors : Elena Papadopoulou – Yiannis Saridakis – Nikolaos Vilanakis
TECHNICAL UNIVERSITY OF CRETE
DEPARTMENT OF SCIENCES
APPLIED MATHEMATICS AND COMPUTERS LABORATORY
73100 CHANIA - CRETE - GREECE
TECHNICAL UNIVERSITY OF CRETE
APPLIED MATHEMATICS AND COMPUTERS LABORATORY
Talk Overview
 Hermite Collocation for elliptic BVPs &
Derivation of the Collocation linear system
 Development of a parallel algorithm for the
Schur Complement method on Shared
Memory Parallel Architectures
 Parallel implementation on multicore computers
with GPUs
TECHNICAL UNIVERSITY OF CRETE
APPLIED MATHEMATICS AND COMPUTERS LABORATORY
( , ) ( , ) , ( , )
( , ) ( , ) , ( , )
u x y f x y x y
u x y g x y x y
 
 
L
BBVP
( , ) ( , ) 0 , ( , )
:
( , ) ( , ) 0 , (
,
, )
y y yx x x
i j i j i j
y y yx x x
i j i j i j
a
i j
u f
u g
     
     
  
  
L
B
Hermite Collocation Method
TECHNICAL UNIVERSITY OF CRETE
APPLIED MATHEMATICS AND COMPUTERS LABORATORY
Hermite Collocation Method…
TECHNICAL UNIVERSITY OF CRETE
APPLIED MATHEMATICS AND COMPUTERS LABORATORY
Hermite Collocation Method…
TECHNICAL UNIVERSITY OF CRETE
APPLIED MATHEMATICS AND COMPUTERS LABORATORY
Red – Black Collocation Linear system
TECHNICAL UNIVERSITY OF CRETE
APPLIED MATHEMATICS AND COMPUTERS LABORATORY
Red – Black Collocation Linear system
TECHNICAL UNIVERSITY OF CRETE
APPLIED MATHEMATICS AND COMPUTERS LABORATORY
Red – Black Collocation Linear system
TECHNICAL UNIVERSITY OF CRETE
APPLIED MATHEMATICS AND COMPUTERS LABORATORY
Red – Black Collocation Linear system
TECHNICAL UNIVERSITY OF CRETE
APPLIED MATHEMATICS AND COMPUTERS LABORATORY
Red – Black Collocation Linear system
with 0
( , ) ( , ) ( , ) , ( , )
( , ) ( , ) , ( , )
u x y u x y f x y x y
u x y g x y x y 


  
 
2
Model
Problem
TECHNICAL UNIVERSITY OF CRETE
APPLIED MATHEMATICS AND COMPUTERS LABORATORY
Helmholtz
Collocation
Matrix
TECHNICAL UNIVERSITY OF CRETE
APPLIED MATHEMATICS AND COMPUTERS LABORATORY
TECHNICAL UNIVERSITY OF CRETE
APPLIED MATHEMATICS AND COMPUTERS LABORATORY
Red – Black Collocation Linear system
 The collocation matrix is large, sparse and enjoys no pleasant
properties (e.g. symmetric, definite)
Iterative
+
Parallel
TECHNICAL UNIVERSITY OF CRETE
APPLIED MATHEMATICS AND COMPUTERS LABORATORY
TECHNICAL UNIVERSITY OF CRETE
APPLIED MATHEMATICS AND COMPUTERS LABORATORY
Iterative Solution
with
TECHNICAL UNIVERSITY OF CRETE
APPLIED MATHEMATICS AND COMPUTERS LABORATORY
Iterative Solution
TECHNICAL UNIVERSITY OF CRETE
APPLIED MATHEMATICS AND COMPUTERS LABORATORY
Eigenvalues of Collocation matrix
TECHNICAL UNIVERSITY OF CRETE
APPLIED MATHEMATICS AND COMPUTERS LABORATORY
Schur Complement Iterative Solution
TECHNICAL UNIVERSITY OF CRETE
APPLIED MATHEMATICS AND COMPUTERS LABORATORY
Parallel Iterative Solution of Collocation Linear system
on Shared Memory Architectures
 Uniform Load Balancing between core threads
 Minimal Idle Cycles of core threads
 Minimal Communication
TECHNICAL UNIVERSITY OF CRETE
APPLIED MATHEMATICS AND COMPUTERS LABORATORY
case of ns = 2p
( )
1
Rx ( )
2
Rx ( )
3
Rx ( )
4
Rx
TECHNICAL UNIVERSITY OF CRETE
APPLIED MATHEMATICS AND COMPUTERS LABORATORY
case of ns = 2p
( )
1
B
p
x 
( )
2
B
p
x 
( )
3
B
p
x 
( )
4
B
p
x 
TECHNICAL UNIVERSITY OF CRETE
APPLIED MATHEMATICS AND COMPUTERS LABORATORY
case of ns = 2p
V1
( )
1
Rx ( )
2
Rx ( )
3
Rx ( )
4
Rx( )
1
B
p
x 
( )
2
B
p
x 
( )
3
B
p
x 
( )
4
B
p
x 
V2 V3 V4 V5 V6 V7 V8
V1 V2 V3 V4 V5
Mapping into a fixed size Architecture of N Cores
case of k = 2p/N even 2k virtual threads
l=(j-1)k+1,…,jk
l=2p+(j-1)k+1,…,2p+jk
j=1,…,N
( )B
l
x
( )R
l
x
V2 . . .
Pj
TECHNICAL UNIVERSITY OF CRETE
APPLIED MATHEMATICS AND COMPUTERS LABORATORY
Parallel Schur Complement Iterative Solution
TECHNICAL UNIVERSITY OF CRETE
APPLIED MATHEMATICS AND COMPUTERS LABORATORY
Parallel BiCGSTAB
TECHNICAL UNIVERSITY OF CRETE
APPLIED MATHEMATICS AND COMPUTERS LABORATORY
Parallel BiCGSTAB
TECHNICAL UNIVERSITY OF CRETE
APPLIED MATHEMATICS AND COMPUTERS LABORATORY
The Dirichlet Helmholtz Problem
2100( 0.1)2with
( , ) 10 ( ) ( ) , ( , ) [0,1] [0,1]
( ) ( ) x
u x y x y x y
x x x e
 
  
  
 
TECHNICAL UNIVERSITY OF CRETE
APPLIED MATHEMATICS AND COMPUTERS LABORATORY
HP SL390s
6 core Xeon@2.8GHz
24GB memory
Oracle Linux 6.3 x64
PGI 13.5 Fortran
PCI-e gen2 x16
HP SL390s – Tesla M2070 GPUs
+ 2 x
TECHNICAL UNIVERSITY OF CRETE
APPLIED MATHEMATICS AND COMPUTERS LABORATORY
Realization on HP SL390s Tesla GPU machine
Iterations / Error measurements
ns
BiCGSTAB
Iterations
|| b – Axn||2
256 294 6.06e-11
512 589 2.85e-11
1024 1161 1.39e-11
2048 3726 9.59e-12
TECHNICAL UNIVERSITY OF CRETE
APPLIED MATHEMATICS AND COMPUTERS LABORATORY
Realization on HP SL390s Tesla GPU machine
Time measurements
TECHNICAL UNIVERSITY OF CRETE
APPLIED MATHEMATICS AND COMPUTERS LABORATORY
Realization on HP SL390s Tesla GPU machine
Time measurements
TECHNICAL UNIVERSITY OF CRETE
APPLIED MATHEMATICS AND COMPUTERS LABORATORY
Realization on HP SL390s Tesla GPU machine
Time measurements
TECHNICAL UNIVERSITY OF CRETE
APPLIED MATHEMATICS AND COMPUTERS LABORATORY
Realization on HP SL390s Tesla GPU machine
Time measurements
TECHNICAL UNIVERSITY OF CRETE
APPLIED MATHEMATICS AND COMPUTERS LABORATORY
Realization on HP SL390s Tesla GPU machine
Time measurements
TECHNICAL UNIVERSITY OF CRETE
APPLIED MATHEMATICS AND COMPUTERS LABORATORY
Realization on HP SL390s Tesla GPU machine
Time measurements
Conclusions
• A new parallel algorithm implementing the Schur
complement with BiCGSTAB iterative method for
Hermite Collocation equations has been developed.
• The algorithm is realized on Shared Memory multi-core
machines with GPUs .
• A performance acceleration of up to 30% is observed.
TECHNICAL UNIVERSITY OF CRETE
APPLIED MATHEMATICS AND COMPUTERS LABORATORY
Future work
• Design an efficient parallel Schur complement
algorithm of the Hermite Collocation equations
for Multiprocessor /Grid machines with GPUs.
TECHNICAL UNIVERSITY OF CRETE
APPLIED MATHEMATICS AND COMPUTERS LABORATORY

Weitere ähnliche Inhalte

Was ist angesagt?

Was ist angesagt? (9)

Novel reconfigurable hardware architecture for polynomial matrix multiplications
Novel reconfigurable hardware architecture for polynomial matrix multiplicationsNovel reconfigurable hardware architecture for polynomial matrix multiplications
Novel reconfigurable hardware architecture for polynomial matrix multiplications
 
A Gomez T Tat Cesga
A Gomez T Tat CesgaA Gomez T Tat Cesga
A Gomez T Tat Cesga
 
Aritra Sarkar - Search and Optimisation Algorithms for Genomics on Quantum Ac...
Aritra Sarkar - Search and Optimisation Algorithms for Genomics on Quantum Ac...Aritra Sarkar - Search and Optimisation Algorithms for Genomics on Quantum Ac...
Aritra Sarkar - Search and Optimisation Algorithms for Genomics on Quantum Ac...
 
Relaxation methods for the matrix exponential on large networks
Relaxation methods for the matrix exponential on large networksRelaxation methods for the matrix exponential on large networks
Relaxation methods for the matrix exponential on large networks
 
A Highly Parallel Semi-Dataflow FPGA Architecture for Large-Scale N-Body Simu...
A Highly Parallel Semi-Dataflow FPGA Architecture for Large-Scale N-Body Simu...A Highly Parallel Semi-Dataflow FPGA Architecture for Large-Scale N-Body Simu...
A Highly Parallel Semi-Dataflow FPGA Architecture for Large-Scale N-Body Simu...
 
User guided discovery of declarative process models
User guided discovery of declarative process modelsUser guided discovery of declarative process models
User guided discovery of declarative process models
 
FMRIPREP - robust and easy to use fMRI preprocessing pipeline
FMRIPREP - robust and easy to use fMRI preprocessing pipelineFMRIPREP - robust and easy to use fMRI preprocessing pipeline
FMRIPREP - robust and easy to use fMRI preprocessing pipeline
 
6. Implementation
6. Implementation6. Implementation
6. Implementation
 
Parallel Optimization in Machine Learning
Parallel Optimization in Machine LearningParallel Optimization in Machine Learning
Parallel Optimization in Machine Learning
 

Andere mochten auch

Andere mochten auch (7)

Prototyping is an attitude
Prototyping is an attitudePrototyping is an attitude
Prototyping is an attitude
 
10 Insightful Quotes On Designing A Better Customer Experience
10 Insightful Quotes On Designing A Better Customer Experience10 Insightful Quotes On Designing A Better Customer Experience
10 Insightful Quotes On Designing A Better Customer Experience
 
Learn BEM: CSS Naming Convention
Learn BEM: CSS Naming ConventionLearn BEM: CSS Naming Convention
Learn BEM: CSS Naming Convention
 
How to Build a Dynamic Social Media Plan
How to Build a Dynamic Social Media PlanHow to Build a Dynamic Social Media Plan
How to Build a Dynamic Social Media Plan
 
SEO: Getting Personal
SEO: Getting PersonalSEO: Getting Personal
SEO: Getting Personal
 
Lightning Talk #9: How UX and Data Storytelling Can Shape Policy by Mika Aldaba
Lightning Talk #9: How UX and Data Storytelling Can Shape Policy by Mika AldabaLightning Talk #9: How UX and Data Storytelling Can Shape Policy by Mika Aldaba
Lightning Talk #9: How UX and Data Storytelling Can Shape Policy by Mika Aldaba
 
Succession “Losers”: What Happens to Executives Passed Over for the CEO Job?
Succession “Losers”: What Happens to Executives Passed Over for the CEO Job? Succession “Losers”: What Happens to Executives Passed Over for the CEO Job?
Succession “Losers”: What Happens to Executives Passed Over for the CEO Job?
 

Ähnlich wie Parallel iterative solution of the hermite collocation equations on gpus

Virus, Vaccines, Genes and Quantum - 2020-06-18
Virus, Vaccines, Genes and Quantum - 2020-06-18Virus, Vaccines, Genes and Quantum - 2020-06-18
Virus, Vaccines, Genes and Quantum - 2020-06-18
Aritra Sarkar
 
epscor_talk_2.pptx
epscor_talk_2.pptxepscor_talk_2.pptx
epscor_talk_2.pptx
ShadowCon
 
Deep Learning, Microsoft Cognitive Toolkit (CNTK) and Azure Machine Learning ...
Deep Learning, Microsoft Cognitive Toolkit (CNTK) and Azure Machine Learning ...Deep Learning, Microsoft Cognitive Toolkit (CNTK) and Azure Machine Learning ...
Deep Learning, Microsoft Cognitive Toolkit (CNTK) and Azure Machine Learning ...
Naoki (Neo) SATO
 

Ähnlich wie Parallel iterative solution of the hermite collocation equations on gpus (20)

Parallelising Dynamic Programming
Parallelising Dynamic ProgrammingParallelising Dynamic Programming
Parallelising Dynamic Programming
 
Experiences of numerical simulations on a PC cluster
Experiences of numerical simulations on a PC clusterExperiences of numerical simulations on a PC cluster
Experiences of numerical simulations on a PC cluster
 
Virus, Vaccines, Genes and Quantum - 2020-06-18
Virus, Vaccines, Genes and Quantum - 2020-06-18Virus, Vaccines, Genes and Quantum - 2020-06-18
Virus, Vaccines, Genes and Quantum - 2020-06-18
 
The CAOS framework: democratize the acceleration of compute intensive applica...
The CAOS framework: democratize the acceleration of compute intensive applica...The CAOS framework: democratize the acceleration of compute intensive applica...
The CAOS framework: democratize the acceleration of compute intensive applica...
 
Evolutionary Algorithms and Self-Organised Systems
Evolutionary Algorithms and Self-Organised SystemsEvolutionary Algorithms and Self-Organised Systems
Evolutionary Algorithms and Self-Organised Systems
 
An Optimized Parallel Algorithm for Longest Common Subsequence Using Openmp –...
An Optimized Parallel Algorithm for Longest Common Subsequence Using Openmp –...An Optimized Parallel Algorithm for Longest Common Subsequence Using Openmp –...
An Optimized Parallel Algorithm for Longest Common Subsequence Using Openmp –...
 
epscor_talk_2.pptx
epscor_talk_2.pptxepscor_talk_2.pptx
epscor_talk_2.pptx
 
autoTVM
autoTVMautoTVM
autoTVM
 
B Eng Final Year Project Presentation
B Eng Final Year Project PresentationB Eng Final Year Project Presentation
B Eng Final Year Project Presentation
 
⭐⭐⭐⭐⭐ Device Free Indoor Localization in the 28 GHz band based on machine lea...
⭐⭐⭐⭐⭐ Device Free Indoor Localization in the 28 GHz band based on machine lea...⭐⭐⭐⭐⭐ Device Free Indoor Localization in the 28 GHz band based on machine lea...
⭐⭐⭐⭐⭐ Device Free Indoor Localization in the 28 GHz band based on machine lea...
 
NVIDIA HPC ソフトウエア斜め読み
NVIDIA HPC ソフトウエア斜め読みNVIDIA HPC ソフトウエア斜め読み
NVIDIA HPC ソフトウエア斜め読み
 
Oleksandr Frei and Murat Apishev - Parallel Non-blocking Deterministic Algori...
Oleksandr Frei and Murat Apishev - Parallel Non-blocking Deterministic Algori...Oleksandr Frei and Murat Apishev - Parallel Non-blocking Deterministic Algori...
Oleksandr Frei and Murat Apishev - Parallel Non-blocking Deterministic Algori...
 
ADAPTIVE FUZZY KERNEL CLUSTERING ALGORITHM
ADAPTIVE FUZZY KERNEL CLUSTERING ALGORITHMADAPTIVE FUZZY KERNEL CLUSTERING ALGORITHM
ADAPTIVE FUZZY KERNEL CLUSTERING ALGORITHM
 
ADAPTIVE FUZZY KERNEL CLUSTERING ALGORITHM
ADAPTIVE FUZZY KERNEL CLUSTERING ALGORITHMADAPTIVE FUZZY KERNEL CLUSTERING ALGORITHM
ADAPTIVE FUZZY KERNEL CLUSTERING ALGORITHM
 
ESAI-CEU-UCH solution for American Epilepsy Society Seizure Prediction Challenge
ESAI-CEU-UCH solution for American Epilepsy Society Seizure Prediction ChallengeESAI-CEU-UCH solution for American Epilepsy Society Seizure Prediction Challenge
ESAI-CEU-UCH solution for American Epilepsy Society Seizure Prediction Challenge
 
CV_Akram_Malak
CV_Akram_MalakCV_Akram_Malak
CV_Akram_Malak
 
Deep Learning, Microsoft Cognitive Toolkit (CNTK) and Azure Machine Learning ...
Deep Learning, Microsoft Cognitive Toolkit (CNTK) and Azure Machine Learning ...Deep Learning, Microsoft Cognitive Toolkit (CNTK) and Azure Machine Learning ...
Deep Learning, Microsoft Cognitive Toolkit (CNTK) and Azure Machine Learning ...
 
Parallel Computing with R
Parallel Computing with RParallel Computing with R
Parallel Computing with R
 
Ijetr042170
Ijetr042170Ijetr042170
Ijetr042170
 
Weight watcher Bay Area ACM Feb 28, 2022
Weight watcher Bay Area ACM Feb 28, 2022 Weight watcher Bay Area ACM Feb 28, 2022
Weight watcher Bay Area ACM Feb 28, 2022
 

Kürzlich hochgeladen

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Victor Rentea
 

Kürzlich hochgeladen (20)

Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 

Parallel iterative solution of the hermite collocation equations on gpus

  • 1. Parallel Iterative solution of the Hermite Collocation Equations on GPUs Emmanuel N. Mathioudakis Co-Authors : Elena Papadopoulou – Yiannis Saridakis – Nikolaos Vilanakis TECHNICAL UNIVERSITY OF CRETE DEPARTMENT OF SCIENCES APPLIED MATHEMATICS AND COMPUTERS LABORATORY 73100 CHANIA - CRETE - GREECE
  • 2. TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY Talk Overview  Hermite Collocation for elliptic BVPs & Derivation of the Collocation linear system  Development of a parallel algorithm for the Schur Complement method on Shared Memory Parallel Architectures  Parallel implementation on multicore computers with GPUs
  • 3. TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY ( , ) ( , ) , ( , ) ( , ) ( , ) , ( , ) u x y f x y x y u x y g x y x y     L BBVP ( , ) ( , ) 0 , ( , ) : ( , ) ( , ) 0 , ( , , ) y y yx x x i j i j i j y y yx x x i j i j i j a i j u f u g                   L B Hermite Collocation Method
  • 4. TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY Hermite Collocation Method…
  • 5. TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY Hermite Collocation Method…
  • 6. TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY Red – Black Collocation Linear system
  • 7. TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY Red – Black Collocation Linear system
  • 8. TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY Red – Black Collocation Linear system
  • 9. TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY Red – Black Collocation Linear system
  • 10. TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY Red – Black Collocation Linear system
  • 11. with 0 ( , ) ( , ) ( , ) , ( , ) ( , ) ( , ) , ( , ) u x y u x y f x y x y u x y g x y x y         2 Model Problem TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY
  • 12. Helmholtz Collocation Matrix TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY
  • 13. TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY Red – Black Collocation Linear system
  • 14.  The collocation matrix is large, sparse and enjoys no pleasant properties (e.g. symmetric, definite) Iterative + Parallel TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY
  • 15. TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY Iterative Solution with
  • 16. TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY Iterative Solution
  • 17. TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY Eigenvalues of Collocation matrix
  • 18. TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY Schur Complement Iterative Solution
  • 19. TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY Parallel Iterative Solution of Collocation Linear system on Shared Memory Architectures  Uniform Load Balancing between core threads  Minimal Idle Cycles of core threads  Minimal Communication
  • 20. TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY case of ns = 2p ( ) 1 Rx ( ) 2 Rx ( ) 3 Rx ( ) 4 Rx
  • 21. TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY case of ns = 2p ( ) 1 B p x  ( ) 2 B p x  ( ) 3 B p x  ( ) 4 B p x 
  • 22. TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY case of ns = 2p V1 ( ) 1 Rx ( ) 2 Rx ( ) 3 Rx ( ) 4 Rx( ) 1 B p x  ( ) 2 B p x  ( ) 3 B p x  ( ) 4 B p x  V2 V3 V4 V5 V6 V7 V8
  • 23. V1 V2 V3 V4 V5
  • 24. Mapping into a fixed size Architecture of N Cores case of k = 2p/N even 2k virtual threads l=(j-1)k+1,…,jk l=2p+(j-1)k+1,…,2p+jk j=1,…,N ( )B l x ( )R l x V2 . . . Pj
  • 25. TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY Parallel Schur Complement Iterative Solution
  • 26. TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY Parallel BiCGSTAB
  • 27. TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY Parallel BiCGSTAB
  • 28. TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY The Dirichlet Helmholtz Problem 2100( 0.1)2with ( , ) 10 ( ) ( ) , ( , ) [0,1] [0,1] ( ) ( ) x u x y x y x y x x x e          
  • 29. TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY HP SL390s 6 core Xeon@2.8GHz 24GB memory Oracle Linux 6.3 x64 PGI 13.5 Fortran PCI-e gen2 x16 HP SL390s – Tesla M2070 GPUs + 2 x
  • 30. TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY Realization on HP SL390s Tesla GPU machine Iterations / Error measurements ns BiCGSTAB Iterations || b – Axn||2 256 294 6.06e-11 512 589 2.85e-11 1024 1161 1.39e-11 2048 3726 9.59e-12
  • 31. TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY Realization on HP SL390s Tesla GPU machine Time measurements
  • 32. TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY Realization on HP SL390s Tesla GPU machine Time measurements
  • 33. TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY Realization on HP SL390s Tesla GPU machine Time measurements
  • 34. TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY Realization on HP SL390s Tesla GPU machine Time measurements
  • 35. TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY Realization on HP SL390s Tesla GPU machine Time measurements
  • 36. TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY Realization on HP SL390s Tesla GPU machine Time measurements
  • 37. Conclusions • A new parallel algorithm implementing the Schur complement with BiCGSTAB iterative method for Hermite Collocation equations has been developed. • The algorithm is realized on Shared Memory multi-core machines with GPUs . • A performance acceleration of up to 30% is observed. TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY
  • 38. Future work • Design an efficient parallel Schur complement algorithm of the Hermite Collocation equations for Multiprocessor /Grid machines with GPUs. TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY