RIKEN's presentation from the recent International Supercomputing Conference (ISC'16): a closer look at their next-generation "Post-K" supercomputer based on ARM and a Fujitsu HPC SoC
ARM-based Supercomputer from Fujitsu and RIKEN - "Post-K"
1. The Next Flagship Supercomputer in Japan
Yutaka Ishikawa, Project Leader
AICS RIKEN
02:15 pm ‐ 02:45 pm, June 21, 2016
2. Outline of Talk
Introduction of the project
An overview of the Japanese next flagship supercomputer, the so-called post K
Introduction of International Collaborations
System software stack for post K is being developed with
international collaborations
Concluding Remarks
ISC'16, June 21, 2016
3. Flagship 2020 project
Developing the next Japanese flagship computer,
so-called “post K”
Disaster prevention and global climate
Energy issues
Industrial competitiveness
Basic science
Society with health and longevity
Developing a wide range of application codes,
to run on the “post K”, to solve major social
and science issues
Vendor partner
The Japanese government selected 9 social &
scientific priority issues and their R&D
organizations.
4. Disaster prevention and global climate
Energy issues
Industrial competitiveness
Basic science
Society with health and longevity
R&D Organization
Target Applications
Architectural Parameters
• #SIMD, SIMD length, #core, #NUMA node
• cache (size and bandwidth)
• memory technologies
• specialized hardware
• Interconnect
• I/O network
To build an efficient execution environment in terms of
Power consumption,
Productivity, and
Usability
Application developers are involved in the design
5. Disaster prevention and global climate
Energy issues
Industrial competitiveness
Basic science
Society with health and longevity
R&D Organization
Target Applications
Architectural Parameters
• #SIMD, SIMD length, #core, #NUMA node
• cache (size and bandwidth)
• memory technologies
• specialized hardware
• Interconnect
• I/O network
To build an efficient execution environment in terms of
Power consumption,
Productivity, and
Usability
Application developers are involved in the design
Mutual understanding between computer architecture/system software and applications
Looking at performance predictions
Finding the best solution under constraints, e.g., power consumption, budget, and space
Profiling applications, e.g., cache misses and execution unit usage
Prediction Tool
Prediction of node-level performance
Prediction of scalability (latency and bandwidth)
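The scalability prediction from latency and bandwidth mentioned above can be illustrated with the standard latency-bandwidth (alpha-beta) cost model. This is only a sketch of that general model, not the project's prediction tool; the function names and the network figures are hypothetical.

```python
def transfer_time(message_bytes, latency_s, bandwidth_bytes_per_s):
    """Estimated time to send one message: latency plus size over bandwidth."""
    return latency_s + message_bytes / bandwidth_bytes_per_s


def halo_exchange_time(neighbors, message_bytes, latency_s, bandwidth_bytes_per_s):
    """Estimated time to send one message to each neighbor, one after another."""
    return neighbors * transfer_time(message_bytes, latency_s, bandwidth_bytes_per_s)


# Hypothetical network: 1 microsecond latency, 10 GB/s bandwidth.
LATENCY = 1e-6
BANDWIDTH = 10e9

small = transfer_time(8, LATENCY, BANDWIDTH)            # dominated by latency
large = transfer_time(100_000_000, LATENCY, BANDWIDTH)  # dominated by bandwidth
print(f"8 B message:    {small:.3e} s")
print(f"100 MB message: {large:.3e} s")
```

Small messages are governed by the latency term and large ones by the bandwidth term, which is why both parameters matter when estimating how an application scales.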
6. Disaster prevention and global climate
Energy issues
Industrial competitiveness
Basic science
Society with health and longevity
R&D Organization
Target Applications
International Collaboration
• DOE‐MEXT
• JLESC
• …
Communities
• HPCI Consortium
• PC Cluster Consortium
• OpenHPC
• …
Domestic Collaboration
• Univ. of Tsukuba
• Univ. of Tokyo
• Kyoto Univ.
7. An Overview of post K
Hardware
Manycore architecture
6D mesh/torus Interconnect
3-level hierarchical storage system
Silicon Disk
Magnetic Disk
Storage for archive
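As an illustration of the torus part of the interconnect, the sketch below computes the neighbors of a node in an N-dimensional torus with wraparound links. It treats every axis as a torus axis, and the dimension sizes are hypothetical; the slides do not give the actual post K network shape.

```python
def torus_neighbors(coord, dims):
    """Return the neighbors of `coord` in a torus with the given dimension sizes.

    Each dimension contributes a +1 and a -1 neighbor, with wraparound.
    Duplicates (possible when a dimension has size 1 or 2) are removed.
    """
    neighbors = []
    for axis, size in enumerate(dims):
        for step in (1, -1):
            n = list(coord)
            n[axis] = (n[axis] + step) % size
            n = tuple(n)
            if n != tuple(coord) and n not in neighbors:
                neighbors.append(n)
    return neighbors


# Hypothetical 6D shape, chosen only for illustration.
dims = (4, 4, 4, 3, 3, 3)
node = (0, 0, 0, 0, 0, 0)
for n in torus_neighbors(node, dims):
    print(n)
```

With every dimension of size three or more, each node has two distinct neighbors per axis, so a 6D torus node has twelve direct links.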
Target performance:
Up to 100 times that of K in capacity computing
Up to 50 times that of K in capability computing
Power consumption of 30-40 MW (cf. K computer: 12.7 MW)
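The targets above imply a rough figure for the required gain in performance per watt. The following is a minimal arithmetic sketch, assuming the maximum speedups are reached at the stated power range; the variable and function names are illustrative.

```python
K_POWER_MW = 12.7               # K computer power, from the slide
SPEEDUP_CAPACITY = 100          # maximum capacity-computing target relative to K
SPEEDUP_CAPABILITY = 50         # maximum capability-computing target relative to K
POST_K_POWER_MW = (30.0, 40.0)  # stated power range for post K


def efficiency_gain(speedup, new_power_mw, old_power_mw=K_POWER_MW):
    """Ratio of performance per watt: speedup divided by the power increase."""
    return speedup / (new_power_mw / old_power_mw)


for power in POST_K_POWER_MW:
    capacity = efficiency_gain(SPEEDUP_CAPACITY, power)
    capability = efficiency_gain(SPEEDUP_CAPABILITY, power)
    print(f"{power:.0f} MW: capacity x{capacity:.1f}, capability x{capability:.1f} per watt")
```

Under these assumptions, performance per watt would need to improve by roughly 32 to 42 times for capacity computing and roughly 16 to 21 times for capability computing.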
[System diagram: login servers, maintenance servers, portal servers, I/O network, hierarchical storage system]
System Software
Multi-Kernel: Linux with Light-weight Kernel
File I/O middleware for 3-level hierarchical storage system and application
Application-oriented file I/O middleware
MPI+OpenMP programming environment
Highly productive programming language and libraries
8. What we have done
Software
OS functional design
Communication functional design
File I/O functional design
Programming languages
Mathematical libraries
Hardware
Instruction set architecture
Continue to design
• Node architecture
• System configuration
• Storage system
9. Instruction Set Architecture
ARMv8 HPC Extension
Fujitsu is a lead partner in ARM HPC extension development
Detailed features will be announced at Hot Chips 28 (2016)
http://www.hotchips.org/program/
Mon 8/22 Day1 9:45AM GPUs & HPCs
ARMv8‐A Next Generation Vector Architecture for HPC
Fujitsuʼs inheritances
FMA
Math acceleration primitives
Inter core barrier
Sector cache
Hardware prefetch assist
10. Outline of Talk
Introduction of FLAGSHIP2020 project
An Overview of post K system
Introduction of International Collaborations
Concluding Remarks
*The Icon is made by Freepik from www.flaticon.com
More than 10 research topics
Collaboration Categories
◎ Collaborative development of open source software
◎ Evaluation and analysis of benchmarks and technologies
◎ Standardization of mature technologies
◎ Pre-standardization interface coordination
◎ Collection and publication of open data
11. System Software Collaboration: Example (DOE-MEXT)
In terms of Collaborative development of open source software
• Argonne contribution: CH4 hackathon for LLC
• AICS contribution: a part of CH4 implementation
• Memory management for new memory hierarchy
• MPICH and LLC communication libraries
MPICH Software Structure
CH4: the successor of CH3, the current
abstract network device interface
◎ Collaborative development of open source software
◎ Evaluation and analysis of benchmarks and technologies
12. System Software Collaboration: Example (DOE-MEXT)
Northwestern University
• I/O Benchmarks and pnetCDF implementations for Scientific Big Data
PI: Takemasa Miyoshi, RIKEN AICS
“Innovating Big Data Assimilation technology for revolutionizing very‐short‐range severe weather prediction”
An innovative 30-second super-rapid update numerical weather prediction system for 30-minute/1-hour severe weather forecasting will be developed, aiding disaster prevention and mitigation, as well as bringing a scientific breakthrough in meteorology.
The results of 100 ensemble simulations are read by data assimilation processes and data size in total is over 1.7 TB
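For a sense of scale, the figures on this slide can be combined in a back-of-the-envelope estimate. It assumes decimal terabytes, equal-sized ensemble members, and that the whole 1.7 TB must be read within one 30-second update cycle; the slide does not state any of these conditions.

```python
TOTAL_BYTES = 1.7e12      # "over 1.7 TB" in total (lower bound, decimal TB assumed)
ENSEMBLE_MEMBERS = 100    # number of ensemble simulations
CYCLE_SECONDS = 30        # update interval of the forecasting system

# Assumption: every ensemble member produces the same amount of data.
per_member_gb = TOTAL_BYTES / ENSEMBLE_MEMBERS / 1e9

# Assumption: all of the data has to be read within a single update cycle.
required_gb_per_s = TOTAL_BYTES / CYCLE_SECONDS / 1e9

print(f"Data per ensemble member: about {per_member_gb:.0f} GB")
print(f"Aggregate read rate to finish in one cycle: about {required_gb_per_s:.1f} GB/s")
```

Even as a lower bound, an aggregate read rate in the tens of gigabytes per second suggests why I/O benchmarks and parallel I/O libraries are part of this collaboration.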
◎ Collaborative development of open source software
13. System Software Collaboration: Example
Intel
• Meetings twice per year
• A researcher visits Intel for a few months
Lightweight kernel
McKernel runs on Intel Xeon and Xeon Phi
• Understanding the benefits of a lightweight kernel
• Understanding the differences between McKernel and mOS
• Standardization of an API for lightweight kernels (planned)
◎ Evaluation and analysis of benchmarks and technologies
◎ Pre-standardization interface coordination
14. System Software Collaboration: Example (DOE-MEXT)
AICS and U Houston, U Tsukuba:
Extension of the PGAS (Partitioned Global Address Space) model with language constructs for multitasking (multithreading) for manycore‐based exascale systems (XcalableMP 2.0)
XcalableMP (XMP) is a directive-based language for distributed memory systems
• PGAS language for large scale distributed memory system
• HPF‐like concept and OpenMP‐like description with directives
• Two memory models: Global View and Local View
• Global View: PGAS, image of large array distributed into partial ones in nodes
• Local view: MPI‐like + Coarray notation is allowed
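The global view can be pictured as a block distribution of one large array over the nodes. The Python sketch below only mimics the index arithmetic of one common block-distribution convention (ceiling-sized blocks); it is not XcalableMP syntax, and the helper names are hypothetical.

```python
def block_size(n, nodes):
    """Elements per node for a block distribution (ceiling division)."""
    return -(-n // nodes)


def owner(global_index, n, nodes):
    """Node that owns a global index under the block distribution."""
    return global_index // block_size(n, nodes)


def local_range(node, n, nodes):
    """Half-open range [start, stop) of global indices stored on `node`."""
    size = block_size(n, nodes)
    start = min(node * size, n)
    stop = min(start + size, n)
    return start, stop


# Hypothetical example: a 10-element global array distributed over 4 nodes.
N, NODES = 10, 4
for node in range(NODES):
    print(node, local_range(node, N, NODES))
```

In the global view the programmer indexes the whole array and the mapping to nodes is derived from the distribution; in the local view each node works on its own part and addresses remote data explicitly.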
◎ Collaborative development of open source software
◎ Evaluation and analysis of benchmarks and technologies
ANL and AICS, U. Tsukuba:
Runtime design for PGAS communication and multitasking using Argobots lightweight user‐level threads.
15. Concluding Remarks
Fujitsu decided that the post K CPU is based on ARMv8 with the HPC extension
Usability will be improved over the K computer by changing the architecture
Wider community support
The system software stack for post K is being designed and implemented by leveraging international collaborations
The software stack developed at RIKEN is open source
It also runs on Intel Xeon and Xeon Phi
RIKEN would like to contribute to OpenHPC