Integrative Parallel Programming in HPC 
Victor Eijkhout 
2014/09/22
• Introduction
• Motivating example
• Type system
• Demonstration
• Other applications
• Tasks and processes
• Task execution
• Research
• Conclusion
GA Tech | 2014/09/22| 2
Introduction

My aims for a new parallel programming system
1. There are many types of parallelism
   ⇒ Uniform treatment of parallelism
2. Data movement is more important than computation
   ⇒ While acknowledging the realities of hardware
3. CS theory seems to ignore HPC-type of parallelism
   ⇒ Strongly theory based
IMP: Integrative Model for Parallelism

Design of a programming system
One needs to distinguish:
Programming model: how does it look in code
Execution model: how is it actually executed
Data model: how is data placed and moved about
Three different vocabularies!

Programming model
Sequential semantics
"[A]n HPF program may be understood (and debugged) using sequential semantics, a deterministic world that we are comfortable with. Once again, as in traditional programming, the programmer works with a single address space, treating an array as a single, monolithic object, regardless of how it may be distributed across the memories of a parallel machine." (Nikhil 1993)
As opposed to
"[H]umans are quickly overwhelmed by concurrency and find it much more difficult to reason about concurrent than sequential code. Even careful people miss possible interleavings among even simple collections of partially ordered operations." (Sutter and Larus 2005)

Programming model
Sequential semantics is close to the mathematics of the problem.
Note: sequential semantics in the programming model does not mean BSP synchronization in the execution.
Also note: sequential semantics is subtly different from SPMD (but at least SPMD puts you in the asynchronous mindset)

Execution model
Virtual machine: dataflow.
• Dataflow expresses the essential dependencies in an algorithm.
• Dataflow applies to multiple parallelism models.
• But it would be a mistake to program dataflow explicitly.

Data model
Distribution: mapping from processors to data.
(note: traditionally the other way around)
Needed (and missing from existing systems such as UPC, HPF):
• distributions need to be first-class objects:
  ⇒ we want an algebra of distributions
• algorithms need to be expressed in distributions

Integrative Model for Parallelism (IMP)
• Theoretical model for describing parallelism
• Library (or maybe language) for describing operations on parallel data
• Minimal, yet sufficient, specification of parallel aspects
• Many aspects are formally derived (often as first-class objects), including messages and task dependencies.
• ⇒ Specify what, not how
• ⇒ Improve programmer productivity, code quality, efficiency and robustness

Motivating example

1D example: 3-pt averaging
Data parallel calculation: $y_i = f(x_{i-1}, x_i, x_{i+1})$
Each point has a dependency on three points, some on other processing elements

$\alpha$, $\beta$, $\gamma$ distributions
Distribution: processor-to-elements mapping
• $\alpha$ distribution: data assignment on input
• $\gamma$ distribution: data assignment on output
• $\beta$ distribution: `local data' assignment
• ⇒ $\beta$ is dynamically defined from the algorithm

Dataflow
We get a dependency structure:
Interpretation:
• Tasks: local task graph
• Message passing: messages
Note: this structure follows from the distributions of the algorithm; it is not programmed.

Algorithms in the Integrative Model
Kernel: mapping between two distributed objects
• An algorithm consists of kernels
• Each kernel consists of independent operations/tasks
• Traditional elements of parallel programming are derived from the kernel specification.

Type system

Generalized data parallelism
Functions
$f : \mathbb{R}^k \to \mathbb{R}$
applied to arrays $y = f(x)$:
$y_i = f\bigl( x(I_f(i)) \bigr)$
This defines a function
$I_f : N \to 2^N$
for instance $I_f(i) = \{ i, i-1, i+1 \}$.

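The signature function can be made concrete for the three-point case. The following is only a minimal sketch: the names `threepoint_signature` and `apply_threepoint` are hypothetical, the local function is taken to be the mean, and clipping the index set at the array boundaries is an assumption that the slides do not address.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Signature function I_f of three-point averaging: output index i needs
// inputs {i-1, i, i+1}. Clipping to [0, n) is an assumption of this sketch.
std::set<int> threepoint_signature(int i, int n) {
  assert(0 <= i && i < n);
  std::set<int> needed;
  for (int j = i - 1; j <= i + 1; ++j)
    if (j >= 0 && j < n) needed.insert(j);
  return needed;
}

// y_i = f(x(I_f(i))): gather the inputs named by the signature, then apply
// the local function f (here: the mean of the gathered values).
std::vector<double> apply_threepoint(const std::vector<double>& x) {
  const int n = static_cast<int>(x.size());
  std::vector<double> y(x.size());
  for (int i = 0; i < n; ++i) {
    const std::set<int> indices = threepoint_signature(i, n);
    double sum = 0.0;
    for (int j : indices) sum += x[j];
    y[i] = sum / static_cast<double>(indices.size());
  }
  return y;
}
```

Nothing in this code mentions processors; the index sets alone carry the dependency information that the distributions later build on.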
Distributions
A distribution is a (non-disjoint, non-unique) mapping from processors to sets of indices:
$d : P \to 2^N$
Distributed data:
$x(d) : p \mapsto \{ x_i : i \in d(p) \}$
Operations on distributions:
$g : N \to N \;\Rightarrow\; g(d) : p \mapsto \{ g(i) : i \in d(p) \}$

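One way to picture a distribution as a first-class object is to store one index set per processor. This is a sketch under stated assumptions, not the IMP library's data structure: `Distribution`, `block_distribution` and `map_distribution` are hypothetical names, and the block boundaries use a simple integer-division rule.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <set>
#include <vector>

// A distribution d : P -> 2^N, stored as one index set per processor.
using Distribution = std::vector<std::set<int>>;

// Disjoint block distribution of the indices 0..n-1 over nprocs processors.
Distribution block_distribution(int n, int nprocs) {
  assert(n >= 0 && nprocs > 0);
  Distribution d(static_cast<std::size_t>(nprocs));
  for (int i = 0; i < n; ++i) {
    const long long owner = (static_cast<long long>(i) * nprocs) / n;
    d[static_cast<std::size_t>(owner)].insert(i);
  }
  return d;
}

// g(d) : p -> { g(i) : i in d(p) }, an index function lifted to distributions.
Distribution map_distribution(const Distribution& d,
                              const std::function<int(int)>& g) {
  Distribution out(d.size());
  for (std::size_t p = 0; p < d.size(); ++p)
    for (int i : d[p]) out[p].insert(g(i));
  return out;
}
```

For example, `map_distribution(block_distribution(8, 2), [](int i) { return i + 1; })` gives processor 1 the set {5, 6, 7, 8}; the result may leave the original index range, which a caller would have to clip or wrap as appropriate.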
Algorithms in terms of distributions
If $d$ is a distribution, and (funky notation)
$x \gg y \equiv x + y, \quad x \ll y \equiv x - y$
the motivating example becomes:
$y(d) = x(d) + x(d \ll 1) + x(d \gg 1)$
and the $\beta$ distribution is
$\beta = d \cup d \ll 1 \cup d \gg 1$
To reiterate: the $\beta$ distribution comes from the structure of the algorithm

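The union of shifted distributions can be computed mechanically. The sketch below assumes the index-set representation of a distribution (one `std::set<int>` per processor) and drops shifted indices that leave the range [0, n); the function names are hypothetical and the boundary treatment is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

using Distribution = std::vector<std::set<int>>;

// Shift every index of every processor by `offset`, dropping indices
// that fall outside [0, n).
Distribution shift_distribution(const Distribution& d, int offset, int n) {
  Distribution out(d.size());
  for (std::size_t p = 0; p < d.size(); ++p)
    for (int i : d[p]) {
      const int j = i + offset;
      if (j >= 0 && j < n) out[p].insert(j);
    }
  return out;
}

// Processor-wise union of two distributions over the same processor set.
Distribution unite_distributions(const Distribution& a, const Distribution& b) {
  assert(a.size() == b.size());
  Distribution out(a);
  for (std::size_t p = 0; p < b.size(); ++p)
    out[p].insert(b[p].begin(), b[p].end());
  return out;
}

// The local-data distribution of three-point averaging:
// d united with its shift by -1 and its shift by +1.
Distribution threepoint_beta(const Distribution& d, int n) {
  return unite_distributions(
      d, unite_distributions(shift_distribution(d, -1, n),
                             shift_distribution(d, +1, n)));
}
```

With two processors owning {0,1,2,3} and {4,5,6,7}, the result is {0,...,4} and {3,...,7}: each processor's set grows by exactly the halo element it needs from its neighbour.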
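Transformations of distributions
How do you go from the $\alpha$ to the $\beta$ distribution of a distributed object?
$x(\beta) = T(\alpha,\beta)\, x(\alpha)$ where $T(\alpha,\beta) = \alpha^{-1}\beta$
Define $\alpha^{-1}\beta : P \to 2^P$ by:
$q \in \alpha^{-1}\beta(p) \;\equiv\; \alpha(q) \cap \beta(p) \neq \emptyset$
`If $q \in \alpha^{-1}\beta(p)$, the task on $q$ has data for the task on $p$'
• OpenMP: task wait
• MPI: message between $q$ and $p$
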
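The relation "processor q holds data that processor p needs" follows directly from two distributions by testing index sets for overlap. A minimal sketch, with the hypothetical name `alpha_inverse_beta` and the same index-set representation as assumed elsewhere:

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

using Distribution = std::vector<std::set<int>>;

// For every processor p, collect the processors q whose input data overlaps
// the data p needs, that is, alpha(q) and beta(p) have a common index.
std::vector<std::set<int>> alpha_inverse_beta(const Distribution& alpha,
                                              const Distribution& beta) {
  std::vector<std::set<int>> senders(beta.size());
  for (std::size_t p = 0; p < beta.size(); ++p)
    for (std::size_t q = 0; q < alpha.size(); ++q)
      for (int i : beta[p])
        if (alpha[q].count(i) > 0) {
          senders[p].insert(static_cast<int>(q));
          break;  // one shared index is enough to establish the dependency
        }
  return senders;
}
```

In a message-passing setting each entry of `senders[p]` other than `p` itself would correspond to a message; in a shared-memory setting it would correspond to a task dependency.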
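Parallel computing with transformations
Let $y(\gamma)$ be the distributed output; then the total needed input is
$\beta = I_f(\gamma)$
so
$y(\gamma) = f\bigl( x(\beta) \bigr), \quad \beta = I_f(\gamma)$
is a local operation. However, the input is given as $x(\alpha)$, so
$y = f(Tx)$ where
  $y$ is distributed as $y(\gamma)$
  $x$ is distributed as $x(\alpha)$
  $\beta = I_f \gamma$
  $T = \alpha^{-1}\beta$
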
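The split into a data transformation followed by a purely local operation can be simulated in one address space. In this sketch every processor's data is a map from global index to value; `redistribute` stands in for the messages or task waits, and `local_average` is the local three-point function. All names are hypothetical, and averaging over only the neighbours that exist at the boundary is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <vector>

using Distribution = std::vector<std::set<int>>;
// Per-processor storage: global index -> value.
using Distributed = std::vector<std::map<int, double>>;

// The transformation step: every processor p collects the values for its
// local-data index set from whichever processor holds them on input.
Distributed redistribute(const Distributed& x_in, const Distribution& beta) {
  Distributed x_local(beta.size());
  for (std::size_t p = 0; p < beta.size(); ++p)
    for (int i : beta[p])
      for (const auto& owned : x_in) {
        const auto it = owned.find(i);
        if (it != owned.end()) {
          x_local[p][i] = it->second;
          break;
        }
      }
  return x_local;
}

// The local step: three-point average using only data already on p.
Distributed local_average(const Distributed& x_local, const Distribution& gamma) {
  assert(x_local.size() == gamma.size());
  Distributed y(gamma.size());
  for (std::size_t p = 0; p < gamma.size(); ++p)
    for (int i : gamma[p]) {
      double sum = 0.0;
      int count = 0;
      for (int j = i - 1; j <= i + 1; ++j) {
        const auto it = x_local[p].find(j);
        if (it != x_local[p].end()) {
          sum += it->second;
          ++count;
        }
      }
      assert(count > 0);  // index i itself must be in the local data
      y[p][i] = sum / count;
    }
  return y;
}
```

Once `redistribute` has run, `local_average` never looks at another processor's storage, which is the sense in which the second step is local.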
Dataflow
$q \in \alpha^{-1}\beta(p)$
Parts of a dataflow graph can be realized with OMP tasks or MPI messages
Total dataflow graph from all kernels and all processes in kernels


To summarize
• Distribution language is global with sequential semantics
• Leads to dataflow formulation
• Can be interpreted in multiple parallelism modes
• Execution likely to be efficient

Demonstration

Can you code this?
• As a library / internal DSL: express distributions in custom API, write local operation in ordinary C/F
  ⇒ easy integration in existing codes
• As a programming language / external DSL: requires compiler technology
  ⇒ prospect for interactions between data movement and local code.

Approach taken
• Program expresses the sequential semantics of kernels
• Base class to realize the IMP concepts
• One derived class that turns IMP into MPI
• One derived class that turns IMP into OpenMP+tasks
Total: a few thousand lines.

Code
    IMP_distribution *blocked =
        new IMP_distribution("disjoint-block", problem_environment, globalsize);
    // one distributed vector per time step, all with the same blocked distribution
    for (int step=0; step<=nsteps; ++step) {
      IMP_object *output_vector = new IMP_object( blocked );
      all_objects[step] = output_vector;
    }

    // one kernel per step, mapping the previous vector to the current one;
    // starting at step 1 keeps all_objects[step-1] in range
    for (int step=1; step<=nsteps; ++step) {
      IMP_object
        *input_vector = all_objects[step-1],
        *output_vector = all_objects[step];
      IMP_kernel *update_step =
        new IMP_kernel(input_vector,output_vector);
      update_step->localexecutefn = threepoint_execute;
      // local data needed: both shifts by one, plus the unshifted indices
      update_step->add_beta_oper( new ioperator("<<1") );
      update_step->add_beta_oper( new ioperator(">>1") );
      update_step->add_beta_oper( new ioperator("none") );
      queue->add_kernel( step,update_step );
    }

Inspector-executor
    queue->analyze_dependencies();
    queue->execute();
• Analysis done once (expensive)
• Execution multiple times (very efficient)
(In MPI context you can dispense with the queue and execute kernels directly)

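The inspector-executor split can be illustrated independently of the IMP classes. In the sketch below the inspector searches once for an owner of every needed index and records the result in a plan; the executor then only follows the plan. `FetchPlan`, `inspect_plan` and `execute_plan` are hypothetical names, and a plain vector indexed by global index stands in for the distributed storage.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

using Distribution = std::vector<std::set<int>>;

// Inspector result: for each processor p, the (owner, index) pairs to fetch.
struct FetchPlan {
  std::vector<std::vector<std::pair<int, int>>> fetch;
};

// Inspector (run once): find an owner under alpha for every index in beta(p).
FetchPlan inspect_plan(const Distribution& alpha, const Distribution& beta) {
  FetchPlan plan;
  plan.fetch.resize(beta.size());
  for (std::size_t p = 0; p < beta.size(); ++p)
    for (int i : beta[p])
      for (std::size_t q = 0; q < alpha.size(); ++q)
        if (alpha[q].count(i) > 0) {
          plan.fetch[p].push_back({static_cast<int>(q), i});
          break;
        }
  return plan;
}

// Executor (run every step): follow the plan, with no dependency analysis.
// In a distributed run the owner field would name the message source.
std::vector<std::vector<double>> execute_plan(const FetchPlan& plan,
                                              const std::vector<double>& x_global) {
  std::vector<std::vector<double>> gathered(plan.fetch.size());
  for (std::size_t p = 0; p < plan.fetch.size(); ++p)
    for (const auto& owner_and_index : plan.fetch[p]) {
      const std::size_t i = static_cast<std::size_t>(owner_and_index.second);
      assert(i < x_global.size());
      gathered[p].push_back(x_global[i]);
    }
  return gathered;
}
```

The same plan can be reused for every time step as long as the distributions do not change, which is where the cost of the analysis is amortized.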
(Do I really have to put up performance graphs?)
[Figure: Gflop under strong scaling of vector averaging, OpenMP vs. IMP]
[Figure: Gflop under weak scaling of vector averaging, MPI vs. IMP]

Summary: the motivating example in parallel language
Write the three-point averaging as
$y(u) = \bigl( x(u) + x(u \ll 1) + x(u \gg 1) \bigr)/3$
• Global description, sequential semantics
• Execution is driven by dataflow, no synchronization
• $\gamma$-distribution given by context
• $\beta$-distribution is $u + u \ll 1 + u \gg 1$
• Messages and task dependencies are derived.

Other applications

N-body problems

Distributions of the N-body problem
Going up the levels:
$\gamma^{(k-1)} = \gamma^{(k)}/2$
$\beta^{(k)} = 2\gamma^{(k)} \cup 2\gamma^{(k)} + |\gamma^{(k)}|$
Redundant computation is never explicitly mentioned.
(This can be coded; code is essentially the same as the formulas)

Tasks and processes

Task graph
Task is local execution: $\mathrm{Task} \equiv \mathrm{Kernel} \times P$
Task numbering: $\langle i,p \rangle$ where $i \le n$, $p \in P$
Dependency edge: $\bigl( \langle i,q \rangle, \langle i+1,p \rangle \bigr)$ iff $q \in \alpha^{-1}\beta(p)$
also written $t' = \langle i,q \rangle$, $t = \langle i+1,p \rangle$, $t' < t$

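Given the per-processor sender sets, the dependency edges of the task graph can be enumerated directly. This sketch uses hypothetical names (`Task`, `TaskEdge`, `build_task_edges`) and assumes, for simplicity, that the same sender relation holds between every pair of consecutive kernels, as it does when one stencil is applied repeatedly.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

using Task = std::pair<int, int>;        // <kernel index i, processor p>
using TaskEdge = std::pair<Task, Task>;  // <predecessor, successor>

// senders[p] lists the processors q that hold data needed by processor p.
// Each such q yields an edge from task <i,q> to task <i+1,p>.
std::vector<TaskEdge> build_task_edges(int nsteps,
                                       const std::vector<std::set<int>>& senders) {
  assert(nsteps >= 0);
  std::vector<TaskEdge> edges;
  for (int i = 0; i < nsteps; ++i)
    for (std::size_t p = 0; p < senders.size(); ++p)
      for (int q : senders[p])
        edges.push_back({Task{i, q}, Task{i + 1, static_cast<int>(p)}});
  return edges;
}
```

Edges whose two tasks share a processor are local dependencies; the remaining edges are the ones that need a message or a cross-thread task dependency.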
Processors and synchronization
Processor $C_p$ is a (non-disjoint) subset of tasks: $\mathrm{Task} = \cup_p C_p$.
For a task $t \in T$ we define a task $t'$ as a synchronization point if $t'$ is an immediate predecessor on another processor:
$t \in C_p \wedge t' < t \wedge t' \in C_{p'} \wedge p \neq p'$.
If $L \subset \mathrm{Task}$, base $B_L$ is
$B_L = \{ t \in L : \mathrm{pred}(t) \not\subset L \}$.

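Both definitions translate into short set computations over a predecessor map. The sketch below is a simplification: it identifies a task's processor with the second component of its numbering, so the processor sets are disjoint here, whereas the slides allow them to overlap. The names `Predecessors`, `synchronization_points` and `base_of` are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <utility>

using Task = std::pair<int, int>;  // <kernel index, processor>
using Predecessors = std::map<Task, std::set<Task>>;  // immediate predecessors

// Synchronization points of t: immediate predecessors on another processor.
std::set<Task> synchronization_points(const Task& t, const Predecessors& pred) {
  std::set<Task> points;
  const auto it = pred.find(t);
  if (it == pred.end()) return points;
  for (const Task& before : it->second)
    if (before.second != t.second) points.insert(before);
  return points;
}

// Base of a task set L: the tasks in L with at least one immediate
// predecessor outside L.
std::set<Task> base_of(const std::set<Task>& L, const Predecessors& pred) {
  std::set<Task> base;
  for (const Task& t : L) {
    const auto it = pred.find(t);
    if (it == pred.end()) continue;  // no predecessors, so none outside L
    for (const Task& before : it->second)
      if (L.count(before) == 0) {
        base.insert(t);
        break;
      }
  }
  return base;
}
```

The base of a set is where outside data enters it, so it marks the tasks that cannot start until some communication or synchronization has completed.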
Local computations
A two-parameter covering $\{ L_{k,p} \}_{k,p}$ of $T$ is called local computations if
1. the $p$ index corresponds to the division in processors:
   $C_p = \cup_k L_{k,p}$.
2. the $k$ index corresponds to the partial ordering on tasks: the sets $L_k = \cup_p L_{k,p}$ satisfy
   $t \in L_k \wedge t' < t \Rightarrow t' \in \cup_{\ell \le k} L_\ell$
3. the synchronization points synchronize only with previous levels:
   $\mathrm{pred}(B_{k,p}) - C_p \subset \cup_{\ell < k} L_\ell$
For a given $k$, all $L_{k,p}$ can be executed independently.

[Figure: three coverings of a task graph, labeled (a), (b), (c)]
Are these local computations? (a) Yes, (b) No, (c) Yes

[...] definitions can be given purely in terms of the task graph.
Programmer decides how `thick' to make the $L_{k,p}$ covering; communication avoiding scheduling is formally derived.

Co-processors
Distributions can describe data placement
Our main worry is latency of data movement: in IMP, data can be sent early-as-possible; our communication avoiding compiler transforms algorithms to maximize granularity

Task execution

What is a task?
A task is a Finite State Automaton with five states; transitions are triggered by receiving signals from other tasks:
requesting: Each task starts out by posting a request for incoming data to each of its predecessors.
accepting: The requested data is in the process of arriving or being made available.
exec: The data dependencies are satisfied and the task can execute locally; in a refinement of this model there can be a separate exec state for each predecessor.
avail: Data that was produced and that serves as origin for some dependency is published to all successor tasks.
used: All published origin data has been absorbed by the endpoint of the data dependency, and any temporary buffers can be released.

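The five states can be sketched as a small automaton that counts outstanding inputs and acknowledgements. This is an illustration only: the class name, the method names and the rule that a state is left once all predecessors (or successors) have responded are assumptions, and the precise signal protocol of the talk is not modeled.

```cpp
#include <cassert>

enum class TaskState { Requesting, Accepting, Exec, Avail, Used };

// One task with a fixed number of predecessor and successor tasks.
class TaskAutomaton {
 public:
  TaskAutomaton(int npredecessors, int nsuccessors)
      : pending_inputs_(npredecessors), pending_acks_(nsuccessors) {}

  TaskState state() const { return state_; }

  // Requests for incoming data have been posted to all predecessors.
  void requests_posted() {
    assert(state_ == TaskState::Requesting);
    state_ = pending_inputs_ > 0 ? TaskState::Accepting : TaskState::Exec;
  }

  // One predecessor delivered its data; execute once all have arrived.
  void data_received() {
    assert(state_ == TaskState::Accepting && pending_inputs_ > 0);
    if (--pending_inputs_ == 0) state_ = TaskState::Exec;
  }

  // Local execution finished; the result is published to the successors.
  void executed() {
    assert(state_ == TaskState::Exec);
    state_ = pending_acks_ > 0 ? TaskState::Avail : TaskState::Used;
  }

  // One successor absorbed the data; buffers are free once all have.
  void receipt_acknowledged() {
    assert(state_ == TaskState::Avail && pending_acks_ > 0);
    if (--pending_acks_ == 0) state_ = TaskState::Used;
  }

 private:
  TaskState state_ = TaskState::Requesting;
  int pending_inputs_;
  int pending_acks_;
};
```

A task with two predecessors and one successor passes through `Accepting` twice-signalled, then `Exec`, `Avail` and finally `Used`, at which point its temporary buffers could be released.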
[Figure: state transition diagram for a task p, its predecessors q and its successors s. States: requesting, accepting, exec, avail, used. Control messages: notifyReadyToSend, requestToSend, sendData, acknowledgeReceipt]

How does a processor manage tasks?
Theorem: if you get a request-to-send, you can release the send buffers of your predecessor tasks.
Corollary: we have a functional model that doesn't need garbage collection.

Research

Open questions
Many!
• Software is barely in demonstration stage: needs much more functionality
• Theoretical questions: SSA, cost, scheduling
• Practical questions: interaction with local code, heterogeneity, interaction with hardware
• Application: this works for traditional HPC, N-body, probably sorting and graph algorithms. Beyond?
• Software-hardware co-design: the IMP model has semantics for data movement; hardware can be made more efficient using this.

Conclusion

The future's so bright, I gotta wear shades
• IMP has the right abstraction level: global expression, yet natural derivation of practical concepts.
• Concept notation looks humanly possible: basis for an expressive programming system
• Global description without talking about processes/processors: prospect for heterogeneous programming
• All concepts are explicit: middleware for scheduling, resilience, et cetera
• Applications to most conceivable scientific operations