Programming Languages & Tools for Higher Performance & Productivity
Hitoshi Murai (RIKEN)
Shun Kamatsuka (Fujitsu)
Tomotake Nakamura (Fujitsu)
Dec. 13, 2017 ARM HPC Workshop 1
Introduction of this Session
■ Programming environments play a crucial role in achieving higher performance and productivity on HPC systems:
⦁ languages
⦁ compilers
⦁ tools
⦁ libraries
■ RIKEN AICS and Fujitsu are collaborating to design the programming environment of the upcoming post-K computer.
Agenda of this Session
1. XcalableMP PGAS Language
⦁ by Hitoshi Murai
2. Advantages of the Compiler for Post-K Computer
⦁ by Shun Kamatsuka
3. Overview of Programming Assistance Tools for Post-K Computer
⦁ by Tomotake Nakamura
XcalableMP PGAS Language
Hitoshi Murai (RIKEN)
Introduction
■ Message Passing Interface (MPI) is the de facto standard for programming distributed-memory HPC systems.
■ Programming with MPI is very hard work.
We are developing the XcalableMP (XMP) PGAS language for post-K, aiming to provide both high performance and productivity.
What's PGAS?
■ Partitioned Global Address Space
■ "Global"
⦁ All processes or threads share one address space and can access any data in it.
■ "Partitioned"
⦁ Remote and local data are distinguished and may differ in how they are accessed and in the cost of access.
[Figure: processes p0 to p3, each with its own private address space, sharing one partitioned global address space (PGAS).]
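The two properties above can be illustrated with a small single-process sketch in plain C. It is not XMP code and does not use any XMP API: the names `partition`, `pgas_owner`, `pgas_offset`, `pgas_is_local`, `pgas_put`, and `pgas_get`, as well as the sizes, are hypothetical and chosen only for this illustration. Every global index can be named from anywhere ("global"), while each index has exactly one owning partition, so a caller can tell local from remote accesses ("partitioned").

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sizes: four processes (p0..p3), eight elements each. */
enum { NPROCS = 4, LOCAL_SIZE = 8 };

/* One partition of the global address space per process. */
static double partition[NPROCS][LOCAL_SIZE];

/* Process that owns global index gidx. */
int pgas_owner(size_t gidx)
{
    assert(gidx < (size_t)NPROCS * LOCAL_SIZE);
    return (int)(gidx / LOCAL_SIZE);
}

/* Position of gidx inside its owner's partition. */
size_t pgas_offset(size_t gidx)
{
    return gidx % LOCAL_SIZE;
}

/* "Partitioned": process `me` can tell whether gidx is local or remote. */
int pgas_is_local(int me, size_t gidx)
{
    return pgas_owner(gidx) == me;
}

/* "Global": any process may read or write any global index. */
void pgas_put(size_t gidx, double value)
{
    partition[pgas_owner(gidx)][pgas_offset(gidx)] = value;
}

double pgas_get(size_t gidx)
{
    return partition[pgas_owner(gidx)][pgas_offset(gidx)];
}
```

In a real PGAS runtime a remote access would involve communication and therefore cost more than a local one; this sketch only models the addressing.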
What's XcalableMP?
■ A directive-based PGAS language
⦁ Extension for C/Fortran.
⦁ Latest version 1.3 is available at: www.xcalablemp.org
⦁ Defined by the XMP WG of the PC Cluster Consortium.
■ Two models of PGAS for distributed-memory parallel programming:
⦁ Global view (data/work mapping directives)
⦁ Local view (coarray)
■ Interoperable with other languages and models (e.g. Python, MPI, OpenMP, OpenACC)
Two Parallelization Models in XMP
■ Global view
⦁ Users specify how a set of nodes cooperates to solve the whole problem.
⦁ Rich directives for data/work mapping and communication.
⦁ Highly productive, but suited mainly to data parallelism.
■ Local view
⦁ Users specify how each node works to solve its part of the problem.
⦁ Based on the coarray feature of Fortran 2008.
⦁ Less productive, but more flexible.
Example of a Global-view XMP Program
real, dimension(lx,ly,lz) :: sr, se, ...
...
do iz = 1, lz-1
  do iy = 1, ly
    do ix = 1, lx
      wu0 = sm(ix,iy,iz  ) / sr(ix,iy,iz  )
      wu1 = sm(ix,iy,iz+1) / sr(ix,iy,iz+1)
      wv0 = sn(ix,iy,iz  ) / sr(ix,iy,iz  )
      ...
Example of a Global-view XMP Program
! data mapping
!$xmp nodes p(npx,npy,npz)
!$xmp template (lx,ly,lz) :: t
!$xmp distribute (block,block,block) onto p :: t
real, dimension(lx,ly,lz) :: sr, se, ...
!$xmp align (ix,iy,iz) with t(ix,iy,iz) ::
!$xmp& sr, se, sm, sp, sn, sl, ...
!$xmp shadow (1,1,1) ::
!$xmp& sr, se, sm, sp, sn, sl, ...
...
! stencil communication
!$xmp reflect (sr, sm, sp, se, sn, sl)
! work mapping (parallel loops)
!$xmp loop (ix,iy,iz) on t(ix,iy,iz)
do iz = 1, lz-1
  do iy = 1, ly
    do ix = 1, lx
      wu0 = sm(ix,iy,iz  ) / sr(ix,iy,iz  )
      wu1 = sm(ix,iy,iz+1) / sr(ix,iy,iz+1)
      wv0 = sn(ix,iy,iz  ) / sr(ix,iy,iz  )
      ...
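To make the data and work mapping more concrete, here is a minimal plain-C sketch of the index arithmetic for one dimension. It is not XMP code and not the Omni runtime. The type `range_t` and the functions `block_owned`, `loop_local`, and `with_shadow` are hypothetical names, and the sketch assumes that a block distribution gives every node a ceiling-sized block of 1-based indices.

```c
#include <assert.h>

/* Hypothetical helper type: inclusive 1-based index range, empty if lo > hi. */
typedef struct { int lo, hi; } range_t;

/* Indices of 1..n owned by node `rank` (0-based) out of `nnodes`,
   assuming ceiling-sized blocks. */
range_t block_owned(int n, int nnodes, int rank)
{
    assert(n > 0 && nnodes > 0 && rank >= 0 && rank < nnodes);
    int chunk = (n + nnodes - 1) / nnodes;
    range_t r = { rank * chunk + 1, (rank + 1) * chunk };
    if (r.hi > n) r.hi = n;
    return r;
}

/* Iterations of "do i = first, last" that a node executes when each
   iteration runs on the owner of index i. */
range_t loop_local(range_t owned, int first, int last)
{
    range_t r = { owned.lo > first ? owned.lo : first,
                  owned.hi < last  ? owned.hi : last };
    return r;
}

/* Indices readable once the halo has been exchanged: the owned range
   widened by shadow width w on each side, clipped to 1..n. */
range_t with_shadow(range_t owned, int n, int w)
{
    range_t r = { owned.lo - w < 1 ? 1 : owned.lo - w,
                  owned.hi + w > n ? n : owned.hi + w };
    return r;
}
```

For example, with lz = 10 on 4 nodes, node 2 owns iz = 7..9 and executes those iterations of `do iz = 1, lz-1`. The reference to `iz+1` at iz = 9 touches index 10, which belongs to node 3, so a shadow of width 1 that has been filled by a halo exchange is what makes that read possible.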
Local-view Programming
■ Coarray, a PGAS feature of Fortran 2008, is available in XMP/C as well as in XMP/Fortran.
■ Basic idea: data declared as a coarray can be accessed by remote nodes.
XMP/Fortran:
real a(1024)[*], b(1024)       ! 1
a(513:1024)[1] = b(1:512)      ! 2
sync all                       ! 3

XMP/C:
float a[1024]:[*], b[1024];    /* 1 */
a[512:512]:[0] = b[0:512];     /* 2 */
xmp_sync_all(NULL);            /* 3 */

1. An array a is declared as a coarray.
2. A local array section b(1:512) is put to the remote array section a(513:1024) on image 1. Both sections hold 512 elements; in XMP/C the same section is written a[512:512] (start:length).
3. A memory fence and barrier synchronization are performed.
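The put in step 2 can be mimicked in ordinary C to show which elements move. This is only a single-process model, not XMP/C: the per-image copies of `a` are rows of a two-dimensional array, and `coarray_put`, `NIMAGES`, and `N` are hypothetical names introduced for this sketch. No synchronization is modeled.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sizes: two images, 1024 elements per image. */
enum { NIMAGES = 2, N = 1024 };

/* Stand-in for "float a[1024]:[*]": one copy of a per image. */
float a[NIMAGES][N];

/* Stand-in for "a[start:len]:[image] = src[0:len]":
   a one-sided copy into the target image's copy of a. */
void coarray_put(int image, int start, const float *src, int len)
{
    assert(image >= 0 && image < NIMAGES);
    assert(start >= 0 && len >= 0 && start + len <= N);
    memcpy(&a[image][start], src, (size_t)len * sizeof(float));
}
```

Calling `coarray_put(0, 512, b, 512)` corresponds to `a[512:512]:[0] = b[0:512]`: the first 512 elements of the caller's `b` land in the upper half of `a` on image 0, and no other image is touched.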
Omni XcalableMP Compiler
■ An open-source reference implementation being developed by RIKEN and the University of Tsukuba.
■ Latest version 1.2.2 is available at: omni-compiler.org
■ Supported platforms include: K, Fujitsu FX100, NEC SX, IBM BlueGene, Hitachi SR, Cray, Linux clusters, etc.
■ Proven applications include:
⦁ Plasma (3D fluid)
⦁ Seismic Imaging (3D stencil)
⦁ Fusion (Particle-in-Cell)
⦁ etc.
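[Figure: Omni XMP compilation flow. An XMP program passes through the Omni XMP frontend, translator, and backend to become a C/Fortran+MPI program; a C/Fortran compiler then builds it, together with the XMP runtime and communication libraries, into an executable.]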
HPL (of HPC Challenge Benchmarks)
■ Written in the global view of XMP/C.
■ Data is distributed in a block-cyclic manner, and DGEMM is invoked for each block.
■ Communication and computation are overlapped using asynchronous gmove.
double A_L[N][NB];
#pragma xmp align A_L[i][*] with t(*,i)
:
#pragma xmp gmove async(1)
A_L[k:len][0:NB] = A[k:len][j:NB];
:
for(m=j+NB;m<N;m+=NB){
  for(n=j+NB;n<N;n+=NB){
    cblas_dgemm(&A[m][n], ..);
    if(xmp_test_async(1)){
      // receive A[k:len][j:NB];
      :
[Figure: HPL performance in TFlops versus number of nodes (256 to 16,384, logarithmic axes). Annotated points: 423 TFlops (80.7%) at 4,096 nodes and 971 TFlops (46.3%) at 16,384 nodes.]
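As a rough illustration of what a block-cyclic distribution means, the following plain-C sketch maps a global index to its owner and local index. It assumes a one-dimensional distribution with block size nb dealt round-robin to the processes; the type `bc_loc_t` and the function `block_cyclic` are hypothetical names, not part of XMP or of the HPL code on this slide.

```c
#include <assert.h>

/* Hypothetical result type: owning process and index within its local storage. */
typedef struct { int owner; int local; } bc_loc_t;

/* Map 0-based global index i to (owner, local index) when blocks of nb
   consecutive elements are dealt round-robin to nprocs processes. */
bc_loc_t block_cyclic(int i, int nb, int nprocs)
{
    assert(i >= 0 && nb > 0 && nprocs > 0);
    int block = i / nb;                      /* global block number */
    bc_loc_t loc = { block % nprocs,         /* blocks are dealt cyclically */
                     (block / nprocs) * nb + i % nb };
    return loc;
}
```

With nb = 4 and 3 processes, global index 13 lies in block 3, which wraps around to process 0 as its second local block, giving local index 5.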
NICAM-DC (of Fiber Miniapps)
[Figure: Speedup (MPI/10 = 10) versus number of MPI processes (10 to 40), comparing XMP and MPI.]
■ Written in the local view of XMP/Fortran with coarray.
■ The coarray-based implementation is almost comparable in performance to the original MPI-based one.
XcalableMP 2.0
■ Dynamic multitasking for manycore processors
⦁ Break away from the Bulk Synchronous Parallel (BSP) model.
⦁ More chances for overlapping communication and computation.
■ Enhancements to loop parallelization
■ Support for newer versions of the base languages (Fortran 2008, C99, and C++11)
Summary
■ PGAS languages are promising alternatives to MPI.
■ XMP is a directive-based PGAS extension for Fortran and C.
■ XMP supports both global-view and local-view programming to achieve both high performance and productivity.
■ XMP will be available on post-K.
More information is available at: www.xcalablemp.org and omni-compiler.org

Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Cyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfCyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfOverkill Security
 


Programming Languages & Tools for Higher Performance & Productivity

  • 1. Programming Languages & Tools for Higher Performance & Productivity. Hitoshi Murai (RIKEN), Shun Kamatsuka (Fujitsu), Tomotake Nakamura (Fujitsu). ARM HPC Workshop, Dec. 13, 2017.
  • 2. Introduction of this Session
      • Programming environments play a crucial role in achieving higher performance and productivity on HPC systems:
          ⦁ languages
          ⦁ compilers
          ⦁ tools
          ⦁ libraries
      • RIKEN AICS and Fujitsu are collaborating to design the programming environment of the upcoming post-K computer.
  • 3. Agenda of this Session
      1. XcalableMP PGAS Language, by Hitoshi Murai
      2. Advantages of the Compiler for Post-K Computer, by Shun Kamatsuka
      3. Overview of Programming Assistance Tools for Post-K Computer, by Tomotake Nakamura
  • 4. XcalableMP PGAS Language. Hitoshi Murai (RIKEN)
  • 5. Introduction
      • Message Passing Interface (MPI) is the de facto standard for programming distributed-memory HPC systems.
      • Programming with MPI is very hard work.
      • We are developing the XcalableMP (XMP) PGAS language, which could provide both high performance and productivity, for post-K.
  • 6. What's PGAS?
      • PGAS stands for Partitioned Global Address Space.
      • "Global": all processes or threads share one address space and can access any data in it.
      • "Partitioned": remote and local data are distinguished and may differ in how they are accessed and in the cost of access.
      [Figure: a PGAS spanning processes p0 to p3, alongside a private address space for each process.]
  • 7. What's XcalableMP?
      • A directive-based PGAS language
          ⦁ Extension for C/Fortran.
          ⦁ Latest ver. 1.3 is available at www.xcalablemp.org.
          ⦁ Defined by the XMP WG of the PC Cluster Consortium.
      • Two models of PGAS for distributed-memory parallel programming:
          ⦁ Global view (data/work mapping directives)
          ⦁ Local view (coarray)
      • Interoperable with other languages and models (e.g. Python, MPI, OpenMP, OpenACC)
  • 8. Two Parallelization Models in XMP
      • Global view
          ⦁ Users specify how a set of nodes cooperates to solve the whole problem.
          ⦁ Rich directives for data/work mapping and communication.
          ⦁ Highly productive, but suited mainly to data parallelism.
      • Local view
          ⦁ Users specify how each node works to solve its part of the problem.
          ⦁ Based on the coarray feature of Fortran 2008.
          ⦁ Less productive, but more flexible.
  • 9. Example of a Global-view XMP Program (code without XMP directives)
        real, dimension(lx,ly,lz) :: sr, se, ...
        ...
        do iz = 1, lz-1
        do iy = 1, ly
        do ix = 1, lx
          wu0 = sm(ix,iy,iz  ) / sr(ix,iy,iz  )
          wu1 = sm(ix,iy,iz+1) / sr(ix,iy,iz+1)
          wv0 = sn(ix,iy,iz  ) / sr(ix,iy,iz  )
          ...
  • 10. Example of a Global-view XMP Program (with XMP directives)
        !$xmp nodes p(npx,npy,npz)
        !$xmp template (lx,ly,lz) :: t
        !$xmp distribute (block,block,block) onto p :: t
        real, dimension(lx,ly,lz) :: sr, se, ...
        !$xmp align (ix,iy,iz) with t(ix,iy,iz) ::
        !$xmp& sr, se, sm, sp, sn, sl, ...
        !$xmp shadow (1,1,1) ::
        !$xmp& sr, se, sm, sp, sn, sl, ...
        ...
        !$xmp reflect (sr, sm, sp, se, sn, sl)
        !$xmp loop (ix,iy,iz) on t(ix,iy,iz)
        do iz = 1, lz-1
        do iy = 1, ly
        do ix = 1, lx
          wu0 = sm(ix,iy,iz  ) / sr(ix,iy,iz  )
          wu1 = sm(ix,iy,iz+1) / sr(ix,iy,iz+1)
          wv0 = sn(ix,iy,iz  ) / sr(ix,iy,iz  )
          ...
      Annotations on the slide:
          ⦁ data mapping: the nodes, template, distribute, align and shadow directives
          ⦁ stencil communication: the reflect directive
          ⦁ work mapping (parallel loops): the loop directive
  • 11. Local-view Programming
      • Coarray, a PGAS feature of Fortran 2008, is available in XMP/C as well as in XMP/Fortran.
      • Basic idea: data declared as a coarray can be accessed by remote nodes.
      XMP/Fortran:
        real a(1024)[*], b(1024)      ! (1)
        a(513:1024)[1] = b(1:512)     ! (2)
        sync all                      ! (3)
      XMP/C:
        float a[1024]:[*], b[1024];   /* (1) */
        a[512:512]:[0] = b[0:512];    /* (2) */
        xmp_sync_all(NULL);           /* (3) */
      1. An array a is declared as a coarray.
      2. The local array section b(1:512) is put to the remote array section a(513:1024) on image 1.
      3. A memory fence and a barrier synchronization are performed.
  • 12. Omni XcalableMP Compiler
      • An open-source reference implementation being developed by RIKEN and U. Tsukuba.
      • Latest ver. 1.2.2 is available at omni-compiler.org.
      • Supported platforms include: K, Fujitsu FX100, NEC SX, IBM BlueGene, Hitachi SR, Cray, Linux clusters, etc.
      • Proven applications include:
          ⦁ Plasma (3D fluid)
          ⦁ Seismic Imaging (3D stencil)
          ⦁ Fusion (Particle-in-Cell)
          ⦁ etc.
      [Figure: compilation flow. Omni XMP (frontend, translator, backend) turns an XMP program into a C/Fortran+MPI program; a C/Fortran compiler, together with the XMP runtime and communication libraries, produces the executable.]
  • 13. HPL (of HPC Challenge Benchmarks)
      • Written in the global view of XMP/C.
      • Data is distributed in a block-cyclic manner and DGEMM is invoked for each block.
      • Communication and calculation are overlapped using asynchronous gmove.
        double A_L[N][NB];
        #pragma xmp align A_L[i][*] with t(*,i)
        :
        #pragma xmp gmove async(1)
        A_L[k:len][0:NB] = A[k:len][j:NB];
        :
        for(m=j+NB;m<N;m+=NB){
          for(n=j+NB;n<N;n+=NB){
            cblas_dgemm(&A[m][n], ..);
            if(xmp_test_async(1)){
              // receive A[k:len][j:NB];
              :
      [Figure: TFlops (log scale, 10 to 1000) versus number of nodes (256 to 16,384). Labeled points: 423 TFlops (80.7%) on 4,096 nodes; 971 TFlops (46.3%) on 16,384 nodes.]
  • 14. NICAM-DC (of Fiber Miniapps)
      • Written in the local view of XMP/Fortran with coarrays.
      • The coarray-based implementation performs almost comparably to the original MPI-based one.
      [Figure: Speedup (MPI/10 = 10) versus number of MPI processes (10 to 40), with one curve each for XMP and MPI.]
  • 15. XcalableMP 2.0
      • Dynamic multitasking for manycore processors
          ⦁ Breaks away from the Bulk Synchronous Parallel (BSP) model.
          ⦁ More chances to overlap communication and computation.
      • Enhancements to loop parallelization
      • Support for newer versions of the base languages (Fortran 2008, C99, and C++11)
  • 16. Summary
      • PGAS languages are promising alternatives to MPI.
      • XMP is a directive-based PGAS extension for Fortran and C.
      • XMP supports both global-view and local-view programming to achieve both high performance and productivity.
      • XMP will be available on post-K.
      • More information is available at: omni-compiler.org, www.xcalablemp.org