By Hitoshi Murai, RIKEN AICS
For higher performance and productivity on HPC systems, it is important to provide users with a good programming environment, including languages, compilers, and tools. This talk presents the programming model of the post-K supercomputer.
Hitoshi Murai Bio
Hitoshi Murai received a master's degree in information science from Kyoto University in 1996. He worked as a software developer at NEC from 1996 to 2010, and received a Ph.D. degree in computer science from the University of Tsukuba in 2010. He is currently a research scientist in the Programming Environment Research Team and the Flagship 2020 project at the RIKEN Advanced Institute for Computational Science (AICS). His research interests include compilers and parallel programming languages.
Email
h-murai@riken.jp
For more information on the Linaro High Performance Computing (HPC) SIG, visit https://www.linaro.org/sig/hpc/
Programming Languages & Tools for Higher Performance & Productivity
1. Programming Languages & Tools for Higher Performance & Productivity
Hitoshi Murai (RIKEN)
Shun Kamatsuka (Fujitsu)
Tomotake Nakamura (Fujitsu)
Dec. 13, 2017, ARM HPC Workshop
2. Introduction of this Session
• For higher performance and productivity on HPC systems, programming environments play a crucial role:
  ⦁ languages
  ⦁ compilers
  ⦁ tools
  ⦁ libraries
• RIKEN AICS and Fujitsu are collaborating to design the programming environment of the upcoming post-K computer.
3. Agenda of this Session
1. XcalableMP PGAS Language
  ⦁ by Hitoshi Murai
2. Advantages of the Compiler for the Post-K Computer
  ⦁ by Shun Kamatsuka
3. Overview of Programming Assistance Tools for the Post-K Computer
  ⦁ by Tomotake Nakamura
5. Introduction
• Message Passing Interface (MPI) is a de-facto standard for programming distributed-memory HPC systems.
• Programming with MPI is very hard work.
We are developing the XcalableMP (XMP) PGAS language, which could provide both high performance and productivity, for post-K.
6. What's PGAS?
• Partitioned Global Address Space
• "Global"
  ⦁ All processes or threads share one address space and can access any data in it.
• "Partitioned"
  ⦁ Remote and local data are distinguished and may differ in access method and cost.
[Figure: processes p0 to p3, each with its own private address space, sharing a partitioned global (PGAS) address space.]
7. What's XcalableMP?
• A directive-based PGAS language
  ⦁ An extension of C/Fortran (a minimal example follows below).
  ⦁ The latest version, 1.3, is available at www.xcalablemp.org.
  ⦁ Defined by the XMP WG of the PC Cluster Consortium.
• Two models of PGAS for distributed-memory parallel programming:
  ⦁ Global view (data/work mapping directives)
  ⦁ Local view (coarray)
• Interoperable with other languages and models (e.g. Python, MPI, OpenMP, OpenACC)
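As a flavor of the directive-based style, here is a minimal sketch of an XMP/C program. It is my own illustration, not from the slides; the xmp.h header name and the xmp_node_num()/xmp_num_nodes() intrinsics are assumptions based on the XMP specification.

#include <stdio.h>
#include <xmp.h>                 /* assumed header for the XMP intrinsic functions */

#pragma xmp nodes p(4)           /* the program runs on a set of 4 nodes */

int main(void)
{
    /* every node executes main() and reports its own node number */
    printf("node %d of %d\n", xmp_node_num(), xmp_num_nodes());
    return 0;
}

Compiled with the Omni XMP compiler, the directives and intrinsics are translated into calls to the XMP runtime.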
8. Two Parallelization Models in XMP
nGlobal view
⦁ Users specify how a set of nodes cooperate to solve a
whole problem.
⦁ Rich directives for data/work mapping and comm.
⦁ Highly productive but suitable mainly to data parallelism.
nLocal view
⦁ Users specify how each node works to solve a partial
problem.
⦁ Coarray of Fortran 2008.
⦁ Lowly productive but more flexible.
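As a concrete illustration of the global-view model, here is a minimal XMP/C sketch of my own, not from the slides: the nodes/template/distribute/align directives map the data, the loop directive maps the work, and a reduction clause combines the per-node partial sums. The directive spellings are assumptions based on the XMP specification.

#include <stdio.h>

#define N 1024

#pragma xmp nodes p(4)                   /* 4 executing nodes */
#pragma xmp template t(0:N-1)            /* template of N indices */
#pragma xmp distribute t(block) onto p   /* block distribution over p */

double a[N];
#pragma xmp align a[i] with t(i)         /* data mapping: a follows t */

int main(void)
{
    int i;
    double sum = 0.0;

    /* work mapping: each node touches only the elements it owns */
#pragma xmp loop (i) on t(i)
    for (i = 0; i < N; i++)
        a[i] = (double)i;

    /* the reduction clause combines the per-node partial sums */
#pragma xmp loop (i) on t(i) reduction(+:sum)
    for (i = 0; i < N; i++)
        sum += a[i];

    printf("sum = %f\n", sum);           /* every node now holds the global sum */
    return 0;
}

In the local-view model, the same kind of exchange would instead be written explicitly with coarrays, as in slide 11 below.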
9. Example of a Global-view XMP Program
The base Fortran loop nest, before the XMP directives are added (they appear on the next slide):

real, dimension(lx,ly,lz) :: sr, se, ...
...
do iz = 1, lz-1
  do iy = 1, ly
    do ix = 1, lx
      wu0 = sm(ix,iy,iz  ) / sr(ix,iy,iz  )
      wu1 = sm(ix,iy,iz+1) / sr(ix,iy,iz+1)
      wv0 = sn(ix,iy,iz  ) / sr(ix,iy,iz  )
      ...
10. Example of a Global-view XMP Program
!$xmp nodes p(npx,npy,npz)
!$xmp template (lx,ly,lz) :: t
!$xmp distribute (block,block,block) onto p :: t
real, dimension(lx,ly,lz) :: sr, se, ...
!$xmp align (ix,iy,iz) with t(ix,iy,iz) ::
!$xmp&   sr, se, sm, sp, sn, sl, ...
!$xmp shadow (1,1,1) ::
!$xmp&   sr, se, sm, sp, sn, sl, ...
...
!$xmp reflect (sr, sm, sp, se, sn, sl)
!$xmp loop (ix,iy,iz) on t(ix,iy,iz)
do iz = 1, lz-1
  do iy = 1, ly
    do ix = 1, lx
      wu0 = sm(ix,iy,iz  ) / sr(ix,iy,iz  )
      wu1 = sm(ix,iy,iz+1) / sr(ix,iy,iz+1)
      wv0 = sn(ix,iy,iz  ) / sr(ix,iy,iz  )
      ...
In this code, the nodes/template/distribute/align/shadow directives specify the data mapping, the reflect directive performs the stencil (halo) communication, and the loop directive specifies the work mapping (parallel loops). A 1-D sketch of the shadow/reflect pattern follows below.
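To make the shadow/reflect pattern concrete, here is a 1-D analogue written in XMP/C. It is my own sketch, not from the slides; the directive spellings, in particular the shadow-width syntax, are assumptions based on the XMP specification.

#define N 1000

#pragma xmp nodes p(4)
#pragma xmp template t(0:N-1)
#pragma xmp distribute t(block) onto p

double u[N], unew[N];
#pragma xmp align u[i] with t(i)
#pragma xmp align unew[i] with t(i)
#pragma xmp shadow u[1:1]                /* one halo (shadow) element on each side */

void step(void)
{
    int i;

    /* stencil communication: refresh the halo region of u from the neighbouring nodes */
#pragma xmp reflect (u)

    /* work mapping: each node updates only the block it owns */
#pragma xmp loop (i) on t(i)
    for (i = 1; i < N - 1; i++)
        unew[i] = 0.5 * (u[i-1] + u[i+1]);
}

The reflect before the loop plays the same role as the reflect directive in the 3-D code above: it fills the shadow elements so that u[i-1] and u[i+1] are valid at the block boundaries.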
11. Local-view Programming
• Coarray, a PGAS feature of Fortran 2008, is available in XMP/C as well as in XMP/Fortran.
• Basic idea: data declared as a coarray can be accessed by remote nodes.
XMP/Fortran:
real a(1024)[*], b(1024)
a(513:1024)[1] = b(1:512)
sync all

XMP/C:
float a[1024]:[*], b[1024];
a[512:512]:[0] = b[0:512];
xmp_sync_all(NULL);
1. The array a is declared as a coarray.
2. The local array section b(1:512) is put to the remote array section a(513:1024) on image 1.
3. A memory fence and a barrier synchronization are performed.
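The example above shows a coarray put; a get in the opposite direction uses the same section notation, with the coimage on the right-hand side. Here is a small XMP/C sketch of my own (not from the slides), reusing the declarations and the xmp_sync_all() call from the example; the xmp.h header name is an assumption.

#include <xmp.h>                 /* assumed header for xmp_sync_all() */

float a[1024]:[*], b[1024];      /* a is a coarray, as in the example above */

void fetch_from_image0(void)
{
    /* get: copy 512 elements of a on image 0 into the local array b */
    b[0:512] = a[512:512]:[0];

    /* memory fence and barrier across all images, as in the example above */
    xmp_sync_all(NULL);
}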
12. Omni XcalableMP Compiler
• An open-source reference implementation being developed by RIKEN and the University of Tsukuba.
• The latest version, 1.2.2, is available at omni-compiler.org.
• Supported platforms include: K, Fujitsu FX100, NEC SX, IBM BlueGene, Hitachi SR, Cray, and Linux clusters.
• Proven applications include:
  ⦁ Plasma (3D fluid)
  ⦁ Seismic Imaging (3D stencil)
  ⦁ Fusion (Particle-in-Cell)
  ⦁ etc.
[Figure: Omni XMP compilation flow. An XMP program is processed by the Omni XMP frontend, translator, and backend into a C/Fortran + MPI program, which is then compiled by the native C/Fortran compiler and linked with the XMP runtime and communication libraries to produce an executable.]
13. HPL (of HPC Challenge Benchmarks)
• Written in the global view of XMP/C.
• The data is distributed in a block-cyclic manner, and DGEMM is invoked for each block.
• Communication and computation are overlapped using asynchronous gmove (see the sketch after the code).
double A_L[N][NB];
#pragma xmp align A_L[i][*] with t(*,i)
  :
#pragma xmp gmove async(1)
A_L[k:len][0:NB] = A[k:len][j:NB];
  :
for (m = j+NB; m < N; m += NB) {
  for (n = j+NB; n < N; n += NB) {
    cblas_dgemm(&A[m][n], ..);
    if (xmp_test_async(1)) {
      // receive A[k:len][j:NB];
      :
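The general shape of this overlap pattern, as a self-contained XMP/C sketch of my own (not from the slides): start the transfer with an asynchronous gmove, do computation that does not depend on the transferred data, and then wait. The wait_async directive and the cyclic distribution are assumptions based on the XMP specification; xmp_test_async() is the non-blocking variant shown in the code above.

#define N 1024

#pragma xmp nodes p(4)
#pragma xmp template tb(0:N-1)
#pragma xmp template tc(0:N-1)
#pragma xmp distribute tb(block) onto p    /* source distribution */
#pragma xmp distribute tc(cyclic) onto p   /* destination distribution */

double src[N], dst[N];
#pragma xmp align src[i] with tb(i)
#pragma xmp align dst[i] with tc(i)

void overlapped_redistribute(void)
{
    /* 1. start the redistribution asynchronously, tagged 1 */
#pragma xmp gmove async(1)
    dst[0:N] = src[0:N];

    /* 2. computation that does not depend on dst goes here,
          e.g. DGEMM calls on blocks that have already arrived */

    /* 3. block until the transfer tagged 1 has completed */
#pragma xmp wait_async(1)
}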
[Figure: HPL performance, TFlops vs. number of nodes (256 to 16,384): 423 TFlops (80.7%) on 4,096 nodes and 971 TFlops (46.3%) on 16,384 nodes.]
14. NICAM-DC (of Fiber Miniapps)
[Figure: NICAM-DC speedup vs. number of MPI processes (normalized so that MPI on 10 processes = 10), comparing the XMP and MPI versions.]
• Written in the local view of XMP/Fortran with coarrays.
• The coarray-based implementation is almost comparable in performance to the original MPI-based one.
15. XcalableMP 2.0
• Dynamic multitasking for manycore processors
  ⦁ A breakaway from the Bulk Synchronous Parallel (BSP) model.
  ⦁ More chances for overlapping communication and computation.
• Enhancements to loop parallelization
• Support for newer versions of the base languages (Fortran 2008, C99, and C++11)
16. Summary
• PGAS languages are promising alternatives to MPI.
• XMP is a directive-based PGAS extension for Fortran and C.
• XMP supports both global-view and local-view programming to achieve both high performance and productivity.
• XMP will be available on post-K.
More information is available at www.xcalablemp.org and omni-compiler.org.