montblanc-project.eu | @MontBlanc_EU
This project has received funding from the European Union's Horizon 2020 research and innovation program under grant agreement n° 671697
The Mont-Blanc project
Updates from the Barcelona Supercomputing Center
Filippo Mantovani
Mont-Blanc
The “legacy” Mont-Blanc vision
Denver, Nov 13th 2017 | Arm HPC User Group
Vision: to leverage the fast growing market of mobile technology for
scientific computation, HPC and data centers.
2012 2013 2014 2015 2016 2017 2018
Mont-Blanc 2
Mont-Blanc 3
Phases share a common structure
 Experiment with real hardware
 Android dev-kits, mini-clusters, prototypes, production ready systems
 Push software development
 System software, HPC benchmarks/mini-apps/production codes
 Study next generation architectures
 Learn from hardware deployment and evaluation for planning new systems
Hardware platforms
[Figure: hardware platforms, from "We started here" to "We ended up here"]
N. Rajovic et al., “The Mont-Blanc Prototype: An Alternative Approach
for HPC Systems,” in Proceedings of SC’16, p. 38:1–38:12.
System Software and Use Cases
We started here We ended up here
 Different OS flavors
 Arm HPC Compiler
 Arm Performance Libraries
 Allinea tools
 …
 All well packed and distributed through OpenHPC
 Several complex HPC production codes have run on Mont-Blanc
 Alya
 AVL codes
 WRF
 FEniCS
 Source files (C, C++, FORTRAN, Python, …)
 Compilers: GNU, Arm HPC, Mercurium
 Scientific libraries: LAPACK, Boost, PETSc, Arm PL, FFTW, HDF5, ATLAS, clBLAS
 Developer tools: Scalasca, Perf, Extrae, Allinea
 Cluster management: SLURM, Ganglia, NTP, OpenLDAP, Nagios, Puppet
 Runtime libraries: Nanos++, OpenCL, CUDA, MPI
 Hardware support / Storage: Power monitor, Lustre, NFS, DVFS
 Network driver, OpenCL driver
 Linux OS / Ubuntu
 CPU, GPU, Network
Study of Next-Generation Architectures
We started here We ended up here
 A Multi-level Simulation Approach (MUSA) allows us:
 To gather performance traces on any current HPC architecture
 To replay them using almost any architecture configuration
 To study scalability and performance figures at scale, changing the number of MPI processes simulated
Credits: N. Rajovic
Credits: MUSA team @ BSC
Where is BSC contributing today?
 Evaluation of solutions
 Hardware solutions
• Mini-clusters deployed liaising with SoC providers and system integrators
 Software solutions
• Arm Performance Libraries, Arm HPC Compiler
 Use cases
 Alya: finite element code where we experiment with atomics-avoiding techniques
• GOAL: test new runtime features to be pushed into OpenMP
 HPCG: benchmark where we started looking at vectorization
• GOAL: explore techniques for exploitation of the Arm Scalable Vector Extension
 Simulation of next generation large clusters
 MUSA: combining detailed trace-driven simulation with sampling strategies to explore how architectural parameters affect performance at scale.
T. Grass et al., “MUSA: A Multi-level Simulation Approach for
Next-Generation HPC Machines,” in SC16 proceedings, pp. 526–537.
F. Banchelli et al., “Is Arm software ecosystem
ready for HPC?”, poster at SC17.
Evaluation of Arm Performance Libraries
 Goal
 Test an HPC code making use of arithmetic and FFT libraries
 Method
 Quantum Espresso pwscf input
 Compiled with GCC 7.1.0
 Platform configuration #1 (poster SC17)
 AMD Seattle
 Arm PL 2.2
 ATLAS 3.11.39
 OpenBLAS 0.2.20
 FFTW 3.3.6
 Platform configuration #2
 Cavium ThunderX2
 Arm PL v18.0
 OpenBLAS 0.2.20
 FFTW 3.3.7
Evaluation of the Arm HPC Compiler
 Goal
 Evaluate the Arm HPC Compiler v18.0 vs v1.4
 Method
 Run Polybench benchmark suite
 Including 30 benchmarks by Ohio State University
 Run on Cavium ThunderX2
[Figure panels: execution time increment, v18.0 vs v1.4; SIMD instructions, v18.0 vs v1.4]
High Performance Conjugate Gradient
 Problem
 Scalability of HPCG is very limited
 OpenMP parallelization of the reference HPCG version is poor
 Goals
1. Improve OpenMP parallelization of HPCG
2. Study current auto-vectorization for leveraging SVE
3. Analyze other performance limitations (e.g. cache effects)
[Figure: HPCG speedup versus OpenMP threads (1, 2, 4, 8, 16, 28) for Arm HPC Compiler 1.4 and GCC 7.1.0, on Cavium ThunderX2]
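The speedup shown for each thread count is the single-thread time divided by the multi-thread time. A minimal sketch of that arithmetic follows; the function name and the timings in the example are hypothetical and are not measurements from the talk.

```python
def speedup_table(timings):
    """Return {threads: (speedup, efficiency)} from {threads: seconds}."""
    t1 = timings[1]  # the single-thread run is the baseline
    table = {}
    for threads, seconds in sorted(timings.items()):
        speedup = t1 / seconds
        # Parallel efficiency: fraction of ideal (linear) speedup achieved.
        table[threads] = (speedup, speedup / threads)
    return table


if __name__ == "__main__":
    # Hypothetical timings in seconds, NOT results from the slides.
    example = {1: 100.0, 2: 55.0, 4: 30.0, 8: 18.0, 16: 12.0, 28: 10.0}
    for threads, (s, e) in speedup_table(example).items():
        print(f"{threads:2d} threads: speedup {s:5.2f}, efficiency {e:4.2f}")
```

An efficiency well below 1 at high thread counts is what "very limited scalability" looks like in these terms.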
HPCG - SIMD parallelization
 First approach
 Check auto-vectorization in current platforms
 Method
 Count SIMD instructions in the “ComputeSYMGS” region
 On Cavium ThunderX2 using Arm HPC Compiler v18.0
 On Intel Xeon Platinum 8160 (Skylake) using ICC supporting AVX512
[Figure: SIMD instruction count (x10^6) in the "ComputeSYMGS" region]
HPCG - SVE emulation
 First approach
 Check auto-vectorization when SVE is enabled
 Method
 Evaluate auto-vectorization in a whole execution of HPCG (one iteration)
 Generate binary using Arm HPC Compiler v1.4 enabling SVE
 Emulate SVE instruction using Arm Instruction Emulator in Cavium ThunderX2
[Figure: increment in SIMD instructions against NEON for SVE 128b, SVE 256b, SVE 512b, SVE 1024b and SVE 2048b]
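SVE code is vector-length agnostic: the same binary runs with any of the vector widths listed above, and a predicate masks off the lanes that fall past the end of the data. The toy model below imitates that loop structure in plain Python for a `y + a*x` kernel. It is only an illustration of the programming model, assuming 64-bit elements; it does not reproduce the instruction counts measured with the Arm Instruction Emulator.

```python
def vla_axpy(a, x, y, vector_bits):
    """Compute y + a*x in vector-sized chunks; return (result, vector_ops)."""
    lanes = vector_bits // 64  # 64-bit doubles per vector register
    out = list(y)
    vector_ops = 0
    i = 0
    while i < len(x):
        # The "predicate": only lanes inside the array are active,
        # so no separate scalar tail loop is needed.
        active = min(lanes, len(x) - i)
        for lane in range(active):
            out[i + lane] = a * x[i + lane] + y[i + lane]
        vector_ops += 1
        i += lanes
    return out, vector_ops


if __name__ == "__main__":
    x = [float(v) for v in range(10)]
    y = [0.0] * 10
    for bits in (128, 256, 512, 1024, 2048):
        _, ops = vla_axpy(2.0, x, y, bits)
        print(f"SVE {bits}b: {ops} vector operations")
```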
HPCG - Memory access evaluation
 Cache hit ratio degraded when using multi-coloring approaches
 Data related to ComputeSYMGS
 Gathered on Cavium ThunderX2
 Compiled with GCC
 Next steps
 Optimize data access patterns in memory
 Simulate “SVE gather load” instructions in order to quantify the benefits
[Figure: ~13% L1D miss ratio, ~35% L2D miss ratio]
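To make the multi-coloring idea concrete, here is a minimal sketch of greedy row coloring on a sparsity pattern. Rows with the same color share no dependency, so a Gauss-Seidel sweep can process them in parallel. The function names and the tridiagonal example are assumptions for illustration, not the coloring scheme used in the HPCG experiments.

```python
def greedy_coloring(adjacency):
    """Give each row the smallest color not used by an already-colored neighbor."""
    colors = {}
    for row in sorted(adjacency):
        taken = {colors[n] for n in adjacency[row] if n in colors}
        color = 0
        while color in taken:
            color += 1
        colors[row] = color
    return colors


def rows_by_color(colors):
    """Group rows per color; rows within one group are mutually independent."""
    groups = {}
    for row, color in colors.items():
        groups.setdefault(color, []).append(row)
    return [groups[c] for c in sorted(groups)]


if __name__ == "__main__":
    # Sparsity pattern of a 1-D Laplacian with 8 rows (tridiagonal matrix).
    n = 8
    adjacency = {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}
    for color, rows in enumerate(rows_by_color(greedy_coloring(adjacency))):
        print(f"color {color}: rows {rows}")
```

In this example each color visits every second row, so consecutive updates no longer touch adjacent memory. That strided pattern is one plausible way the locality loss reported above can arise.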
Alya: BSC code for multi-physics problems
 Analysis with Paraver:
 Reductions with indirect accesses on large arrays using
 No coloring
Use of atomic operations harms performance
 Coloring
Use of coloring harms locality
 Commutative Multidependences
• (OmpSs feature, hopefully to be included in OpenMP)
Parallelization of finite elements code
Credits: M. Garcia, J. Labarta
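The reduction with indirect accesses can be sketched as a scatter-add: each element adds its contribution to the entries of the nodes it touches. The sketch below is a simplified stand-in, not Alya code. It shows the serial assembly, marks where a concurrent version would need an atomic update, and colors elements so that no two elements of the same color share a node. All names and the tiny 1-D mesh are hypothetical.

```python
def assemble(elements, contributions, num_nodes, order=None):
    """Scatter-add each element's contribution into its nodes (indirect access)."""
    rhs = [0.0] * num_nodes
    for e in (order if order is not None else range(len(elements))):
        for node in elements[e]:
            # If elements ran concurrently, this update would need an atomic.
            rhs[node] += contributions[e]
    return rhs


def color_elements(elements):
    """Greedy coloring: elements that share a node never get the same color."""
    node_colors = {}  # node -> colors of elements already touching it
    colors = []
    for nodes in elements:
        used = set().union(*(node_colors.get(n, set()) for n in nodes))
        c = 0
        while c in used:
            c += 1
        colors.append(c)
        for n in nodes:
            node_colors.setdefault(n, set()).add(c)
    return colors


if __name__ == "__main__":
    elements = [(0, 1), (1, 2), (2, 3), (3, 4)]  # 1-D mesh, two nodes per element
    contributions = [1.0, 2.0, 3.0, 4.0]
    colors = color_elements(elements)
    # Process all elements of color 0, then color 1, and so on.
    by_color = sorted(range(len(elements)), key=lambda e: colors[e])
    print(colors, assemble(elements, contributions, 5, by_color))
```

Processing elements color by color removes the need for atomics, at the cost of visiting elements out of their natural memory order.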
Alya: taskification and dynamic load balancing
 Goal
 Quantify the effect of commutative dependences and DLB on an HPC code
 Method
 Run the “Assembly phase” of Alya (containing atomics)
 On MareNostrum 3, 2x Intel Xeon SandyBridge-EP E5-2670
 On Cavium ThunderX, 2x CN8890
16 nodes x P processes/node x T threads/process
Assembly phase
Credits: M. Josep, M. Garcia, J. Labarta
Multi-Level Simulation Approach
 Level 1: Trace generation
 HPC application execution is instrumented by three components:
• OpenMP runtime system plugin: task / chunk creation events, dependencies
• MPI call instrumentation: MPI calls
• Pintool / DynamoRIO: dynamic instructions
 All recorded events are stored in a trace
Credits: T. Grass, C. Gomez, M. Casas, M. Moreto
Multi-Level Simulation Approach
 Level 2: Network simulation (Dimemas)
 Level 3: Multi-core simulation (TaskSim + Ramulator + McPAT)
[Figure: the trace is replayed by the network simulator (timeline per rank: Rank 1, Rank 2, …) and by the multi-core simulator (timeline per thread: Thread 1, Thread 2, …)]
Credits: T. Grass, C. Gomez, M. Casas, M. Moreto
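The idea of replaying a recorded trace under different machine parameters can be illustrated with a very small model. The sketch below assumes a bulk-synchronous trace (every phase is a compute burst per rank followed by one message exchange) and a latency plus size/bandwidth communication cost. These are simplifying assumptions for illustration only; they are not the models implemented in Dimemas or TaskSim.

```python
def replay(trace, latency_s, bandwidth_bytes_per_s, cpu_speedup=1.0):
    """Estimate the runtime of a bulk-synchronous trace.

    trace: list of phases, each (compute_seconds_per_rank, message_bytes).
    """
    total = 0.0
    for compute_per_rank, message_bytes in trace:
        # Multi-core stand-in: scale the recorded bursts by a CPU speed factor.
        # The slowest rank determines when the phase ends.
        compute = max(compute_per_rank) / cpu_speedup
        # Network stand-in: latency plus transfer time.
        comm = latency_s + message_bytes / bandwidth_bytes_per_s
        total += compute + comm
    return total


if __name__ == "__main__":
    # Hypothetical two-rank trace with two phases.
    trace = [([1.0, 2.0], 1000), ([3.0, 1.0], 0)]
    print(replay(trace, latency_s=0.5, bandwidth_bytes_per_s=1000.0))
    print(replay(trace, latency_s=0.5, bandwidth_bytes_per_s=1000.0, cpu_speedup=2.0))
```

The same trace is evaluated twice with different parameters, which is the point of trace-driven simulation: the application is run once and the architecture is varied afterwards.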
Multi-Level Parameters
 Architectural
 CPU architecture
 Number of cores
 Core frequency
 Threads per core
 Reorder buffer size
 SIMD width
 Micro-architectural
 L1/2/3 Cache size/latency
 Main memory
 Memory technology
 Capacity
 Bandwidth
 Latency
Problem:
Simulation time diverges
Solution:
We supported different modes
(Burst, Detailed, Sampling)
trading accuracy for speed
Credits: T. Grass, C. Gomez, M. Casas, M. Moreto
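A parameter set like the one listed above could be represented as a small configuration record and swept over several values. The sketch below is an assumption about how such a sweep might be organized; the field names, units and mode identifiers are illustrative and are not MUSA's actual configuration format.

```python
from dataclasses import dataclass, replace
from enum import Enum
from itertools import product


class Mode(Enum):
    # The three modes named on the slide; their semantics are not modeled here.
    BURST = "burst"
    DETAILED = "detailed"
    SAMPLING = "sampling"


@dataclass(frozen=True)
class ArchConfig:
    # Field names and units are illustrative assumptions.
    cores: int
    core_freq_ghz: float
    threads_per_core: int
    rob_entries: int
    simd_bits: int
    l2_kib: int
    mem_technology: str
    mem_bandwidth_gbs: float
    mode: Mode = Mode.BURST


def sweep(base, **axes):
    """Yield one configuration per combination of the given parameter values."""
    names = list(axes)
    for values in product(*(axes[name] for name in names)):
        yield replace(base, **dict(zip(names, values)))


if __name__ == "__main__":
    base = ArchConfig(32, 2.0, 1, 128, 128, 512, "DDR4", 100.0)
    configs = list(sweep(base, cores=[32, 64], simd_bits=[128, 512]))
    print(len(configs), "configurations to simulate")
```

The number of configurations grows multiplicatively with every swept parameter, which is why faster, less accurate simulation modes are useful.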
MUSA: status
 SC’16 paper
 Validation of the methodology
with 5 applications
• BT-MZ, SP-MZ, LU-MZ, HYDRO, SPECFEM3D
 Proven performance figures at scale up to 16k MPI ranks
 Status update
 Added parameter sets for state-of-the-art architectures
 Support for power consumption modeling
• Including CPU, NoC and memory hierarchy
 Extended set of applications
 Expanded trace database
• Including traces gathered on
MareNostrum4 (Intel Skylake + OmniPath)
 Included support for DynamoRIO
Credits: T. Grass, C. Gomez, M. Casas, M. Moreto
Student Cluster Competition
 Rules
 12 teams of 6 undergraduate students
 1 cluster operating within 3 kW power budget
 3 HPC applications + 2 benchmarks
 One team from Universitat Politècnica de Catalunya (UPC, Spain)
 Participating with
Mont-Blanc technology
 3 awards to win
 Best HPL
 1st, 2nd, 3rd overall places
 Fan favorite
We are looking for
an Arm-based
cluster for 2018!!!
Interested in any of the topics presented?
Follow us!
montblanc-project.eu @MontBlanc_EU filippo.mantovani@bsc.es
Visit our booths @ SC17!
booth #1694
booth #1925
booth #1975
Denver, Nov 13th 2017Arm HPC User Group22

Weitere ähnliche Inhalte

Was ist angesagt?

A Library for Emerging High-Performance Computing Clusters
A Library for Emerging High-Performance Computing ClustersA Library for Emerging High-Performance Computing Clusters
A Library for Emerging High-Performance Computing ClustersIntel® Software
 
Standardising the compressed representation of neural networks
Standardising the compressed representation of neural networksStandardising the compressed representation of neural networks
Standardising the compressed representation of neural networksFörderverein Technische Fakultät
 
Addressing Emerging Challenges in Designing HPC Runtimes
Addressing Emerging Challenges in Designing HPC RuntimesAddressing Emerging Challenges in Designing HPC Runtimes
Addressing Emerging Challenges in Designing HPC Runtimesinside-BigData.com
 
CloudLightning and the OPM-based Use Case
CloudLightning and the OPM-based Use CaseCloudLightning and the OPM-based Use Case
CloudLightning and the OPM-based Use CaseCloudLightning
 
OpenACC Monthly Highlights: July 2021
OpenACC Monthly Highlights: July  2021OpenACC Monthly Highlights: July  2021
OpenACC Monthly Highlights: July 2021OpenACC
 
EuroMPI 2016 Keynote: How Can MPI Fit Into Today's Big Computing
EuroMPI 2016 Keynote: How Can MPI Fit Into Today's Big ComputingEuroMPI 2016 Keynote: How Can MPI Fit Into Today's Big Computing
EuroMPI 2016 Keynote: How Can MPI Fit Into Today's Big ComputingJonathan Dursi
 
Understand and Harness the Capabilities of Intel® Xeon Phi™ Processors
Understand and Harness the Capabilities of Intel® Xeon Phi™ ProcessorsUnderstand and Harness the Capabilities of Intel® Xeon Phi™ Processors
Understand and Harness the Capabilities of Intel® Xeon Phi™ ProcessorsIntel® Software
 
OpenACC Monthly Highlights: October2020
OpenACC Monthly Highlights: October2020OpenACC Monthly Highlights: October2020
OpenACC Monthly Highlights: October2020OpenACC
 
Performance Characterization and Optimization of In-Memory Data Analytics on ...
Performance Characterization and Optimization of In-Memory Data Analytics on ...Performance Characterization and Optimization of In-Memory Data Analytics on ...
Performance Characterization and Optimization of In-Memory Data Analytics on ...Ahsan Javed Awan
 
OpenACC Monthly Highlights: September 2021
OpenACC Monthly Highlights: September 2021OpenACC Monthly Highlights: September 2021
OpenACC Monthly Highlights: September 2021OpenACC
 
OpenACC Monthly Highlights: January 2021
OpenACC Monthly Highlights: January 2021OpenACC Monthly Highlights: January 2021
OpenACC Monthly Highlights: January 2021OpenACC
 
Introduction to National Supercomputer center in Tianjin TH-1A Supercomputer
Introduction to National Supercomputer center in Tianjin TH-1A SupercomputerIntroduction to National Supercomputer center in Tianjin TH-1A Supercomputer
Introduction to National Supercomputer center in Tianjin TH-1A SupercomputerFörderverein Technische Fakultät
 
"Deep Learning and Vision Algorithm Development in MATLAB Targeting Embedded ...
"Deep Learning and Vision Algorithm Development in MATLAB Targeting Embedded ..."Deep Learning and Vision Algorithm Development in MATLAB Targeting Embedded ...
"Deep Learning and Vision Algorithm Development in MATLAB Targeting Embedded ...Edge AI and Vision Alliance
 
Software-Defined Simulations for Continuous Development of Cloud and Data Cen...
Software-Defined Simulations for Continuous Development of Cloud and Data Cen...Software-Defined Simulations for Continuous Development of Cloud and Data Cen...
Software-Defined Simulations for Continuous Development of Cloud and Data Cen...Pradeeban Kathiravelu, Ph.D.
 
Introduction to OpenVX
Introduction to OpenVXIntroduction to OpenVX
Introduction to OpenVX家榮 張
 
OpenACC Monthly Highlights
OpenACC Monthly HighlightsOpenACC Monthly Highlights
OpenACC Monthly HighlightsNVIDIA
 
Hardware Acceleration of SVM Training for Real-time Embedded Systems: An Over...
Hardware Acceleration of SVM Training for Real-time Embedded Systems: An Over...Hardware Acceleration of SVM Training for Real-time Embedded Systems: An Over...
Hardware Acceleration of SVM Training for Real-time Embedded Systems: An Over...Ilham Amezzane
 
COST-EFFECTIVE LOW-DELAY DESIGN FOR MULTI-PARTY CLOUD VIDEO CONFERENCING
 COST-EFFECTIVE LOW-DELAY DESIGN FOR MULTI-PARTY CLOUD VIDEO CONFERENCING COST-EFFECTIVE LOW-DELAY DESIGN FOR MULTI-PARTY CLOUD VIDEO CONFERENCING
COST-EFFECTIVE LOW-DELAY DESIGN FOR MULTI-PARTY CLOUD VIDEO CONFERENCINGnexgentechnology
 

Was ist angesagt? (20)

A Library for Emerging High-Performance Computing Clusters
A Library for Emerging High-Performance Computing ClustersA Library for Emerging High-Performance Computing Clusters
A Library for Emerging High-Performance Computing Clusters
 
Standardising the compressed representation of neural networks
Standardising the compressed representation of neural networksStandardising the compressed representation of neural networks
Standardising the compressed representation of neural networks
 
Addressing Emerging Challenges in Designing HPC Runtimes
Addressing Emerging Challenges in Designing HPC RuntimesAddressing Emerging Challenges in Designing HPC Runtimes
Addressing Emerging Challenges in Designing HPC Runtimes
 
CloudLightning and the OPM-based Use Case
CloudLightning and the OPM-based Use CaseCloudLightning and the OPM-based Use Case
CloudLightning and the OPM-based Use Case
 
OpenACC Monthly Highlights: July 2021
OpenACC Monthly Highlights: July  2021OpenACC Monthly Highlights: July  2021
OpenACC Monthly Highlights: July 2021
 
EuroMPI 2016 Keynote: How Can MPI Fit Into Today's Big Computing
EuroMPI 2016 Keynote: How Can MPI Fit Into Today's Big ComputingEuroMPI 2016 Keynote: How Can MPI Fit Into Today's Big Computing
EuroMPI 2016 Keynote: How Can MPI Fit Into Today's Big Computing
 
Understand and Harness the Capabilities of Intel® Xeon Phi™ Processors
Understand and Harness the Capabilities of Intel® Xeon Phi™ ProcessorsUnderstand and Harness the Capabilities of Intel® Xeon Phi™ Processors
Understand and Harness the Capabilities of Intel® Xeon Phi™ Processors
 
OpenACC Monthly Highlights: October2020
OpenACC Monthly Highlights: October2020OpenACC Monthly Highlights: October2020
OpenACC Monthly Highlights: October2020
 
Performance Characterization and Optimization of In-Memory Data Analytics on ...
Performance Characterization and Optimization of In-Memory Data Analytics on ...Performance Characterization and Optimization of In-Memory Data Analytics on ...
Performance Characterization and Optimization of In-Memory Data Analytics on ...
 
OpenACC Monthly Highlights: September 2021
OpenACC Monthly Highlights: September 2021OpenACC Monthly Highlights: September 2021
OpenACC Monthly Highlights: September 2021
 
OpenACC Monthly Highlights: January 2021
OpenACC Monthly Highlights: January 2021OpenACC Monthly Highlights: January 2021
OpenACC Monthly Highlights: January 2021
 
Introduction to National Supercomputer center in Tianjin TH-1A Supercomputer
Introduction to National Supercomputer center in Tianjin TH-1A SupercomputerIntroduction to National Supercomputer center in Tianjin TH-1A Supercomputer
Introduction to National Supercomputer center in Tianjin TH-1A Supercomputer
 
Cloud, Fog, or Edge: Where and When to Compute?
Cloud, Fog, or Edge: Where and When to Compute?Cloud, Fog, or Edge: Where and When to Compute?
Cloud, Fog, or Edge: Where and When to Compute?
 
"Deep Learning and Vision Algorithm Development in MATLAB Targeting Embedded ...
"Deep Learning and Vision Algorithm Development in MATLAB Targeting Embedded ..."Deep Learning and Vision Algorithm Development in MATLAB Targeting Embedded ...
"Deep Learning and Vision Algorithm Development in MATLAB Targeting Embedded ...
 
Software-Defined Simulations for Continuous Development of Cloud and Data Cen...
Software-Defined Simulations for Continuous Development of Cloud and Data Cen...Software-Defined Simulations for Continuous Development of Cloud and Data Cen...
Software-Defined Simulations for Continuous Development of Cloud and Data Cen...
 
RL-Cache: Learning-Based Cache Admission for Content Delivery
RL-Cache: Learning-Based Cache Admission for Content DeliveryRL-Cache: Learning-Based Cache Admission for Content Delivery
RL-Cache: Learning-Based Cache Admission for Content Delivery
 
Introduction to OpenVX
Introduction to OpenVXIntroduction to OpenVX
Introduction to OpenVX
 
OpenACC Monthly Highlights
OpenACC Monthly HighlightsOpenACC Monthly Highlights
OpenACC Monthly Highlights
 
Hardware Acceleration of SVM Training for Real-time Embedded Systems: An Over...
Hardware Acceleration of SVM Training for Real-time Embedded Systems: An Over...Hardware Acceleration of SVM Training for Real-time Embedded Systems: An Over...
Hardware Acceleration of SVM Training for Real-time Embedded Systems: An Over...
 
COST-EFFECTIVE LOW-DELAY DESIGN FOR MULTI-PARTY CLOUD VIDEO CONFERENCING
 COST-EFFECTIVE LOW-DELAY DESIGN FOR MULTI-PARTY CLOUD VIDEO CONFERENCING COST-EFFECTIVE LOW-DELAY DESIGN FOR MULTI-PARTY CLOUD VIDEO CONFERENCING
COST-EFFECTIVE LOW-DELAY DESIGN FOR MULTI-PARTY CLOUD VIDEO CONFERENCING
 

Ähnlich wie Update on the Mont-Blanc Project for ARM-based HPC

Resume_Mahadevan_new (2)
Resume_Mahadevan_new (2)Resume_Mahadevan_new (2)
Resume_Mahadevan_new (2)Mahadevan N
 
OpenACC Monthly Highlights - March 2018
OpenACC Monthly Highlights - March 2018OpenACC Monthly Highlights - March 2018
OpenACC Monthly Highlights - March 2018NVIDIA
 
Improving Efficiency of Machine Learning Algorithms using HPCC Systems
Improving Efficiency of Machine Learning Algorithms using HPCC SystemsImproving Efficiency of Machine Learning Algorithms using HPCC Systems
Improving Efficiency of Machine Learning Algorithms using HPCC SystemsHPCC Systems
 
APSys Presentation Final copy2
APSys Presentation Final copy2APSys Presentation Final copy2
APSys Presentation Final copy2Junli Gu
 
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習 Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習 Herman Wu
 
OpenACC and Hackathons Monthly Highlights
OpenACC and Hackathons Monthly HighlightsOpenACC and Hackathons Monthly Highlights
OpenACC and Hackathons Monthly HighlightsOpenACC
 
OpenACC and Open Hackathons Monthly Highlights: April 2022
OpenACC and Open Hackathons Monthly Highlights: April 2022OpenACC and Open Hackathons Monthly Highlights: April 2022
OpenACC and Open Hackathons Monthly Highlights: April 2022OpenACC
 
OpenACC Monthly Highlights - February 2018
OpenACC Monthly Highlights - February 2018OpenACC Monthly Highlights - February 2018
OpenACC Monthly Highlights - February 2018NVIDIA
 
OpenACC Monthly Highlights March 2019
OpenACC Monthly Highlights March 2019OpenACC Monthly Highlights March 2019
OpenACC Monthly Highlights March 2019OpenACC
 
Scallable Distributed Deep Learning on OpenPOWER systems
Scallable Distributed Deep Learning on OpenPOWER systemsScallable Distributed Deep Learning on OpenPOWER systems
Scallable Distributed Deep Learning on OpenPOWER systemsGanesan Narayanasamy
 
OpenACC and Open Hackathons Monthly Highlights: July 2022.pptx
OpenACC and Open Hackathons Monthly Highlights: July 2022.pptxOpenACC and Open Hackathons Monthly Highlights: July 2022.pptx
OpenACC and Open Hackathons Monthly Highlights: July 2022.pptxOpenACC
 
High-Performance and Scalable Designs of Programming Models for Exascale Systems
High-Performance and Scalable Designs of Programming Models for Exascale SystemsHigh-Performance and Scalable Designs of Programming Models for Exascale Systems
High-Performance and Scalable Designs of Programming Models for Exascale Systemsinside-BigData.com
 
OpenACC Monthly Highlights: May 2019
OpenACC Monthly Highlights: May 2019OpenACC Monthly Highlights: May 2019
OpenACC Monthly Highlights: May 2019OpenACC
 
Designing Scalable HPC, Deep Learning and Cloud Middleware for Exascale Systems
Designing Scalable HPC, Deep Learning and Cloud Middleware for Exascale SystemsDesigning Scalable HPC, Deep Learning and Cloud Middleware for Exascale Systems
Designing Scalable HPC, Deep Learning and Cloud Middleware for Exascale Systemsinside-BigData.com
 
Session 1 - The Current Landscape of Big Data Benchmarks
Session 1 - The Current Landscape of Big Data BenchmarksSession 1 - The Current Landscape of Big Data Benchmarks
Session 1 - The Current Landscape of Big Data BenchmarksDataBench
 
(Im2col)accelerating deep neural networks on low power heterogeneous architec...
(Im2col)accelerating deep neural networks on low power heterogeneous architec...(Im2col)accelerating deep neural networks on low power heterogeneous architec...
(Im2col)accelerating deep neural networks on low power heterogeneous architec...Bomm Kim
 
Streaming Architecture including Rendezvous for Machine Learning
Streaming Architecture including Rendezvous for Machine LearningStreaming Architecture including Rendezvous for Machine Learning
Streaming Architecture including Rendezvous for Machine LearningTed Dunning
 
OpenACC Monthly Highlights: February 2021
OpenACC Monthly Highlights: February 2021OpenACC Monthly Highlights: February 2021
OpenACC Monthly Highlights: February 2021OpenACC
 
Scale Container Operations with AIOps
Scale Container Operations with AIOpsScale Container Operations with AIOps
Scale Container Operations with AIOpsTimothy Chen
 

Ähnlich wie Update on the Mont-Blanc Project for ARM-based HPC (20)

Resume_Mahadevan_new (2)
Resume_Mahadevan_new (2)Resume_Mahadevan_new (2)
Resume_Mahadevan_new (2)
 
OpenACC Monthly Highlights - March 2018
OpenACC Monthly Highlights - March 2018OpenACC Monthly Highlights - March 2018
OpenACC Monthly Highlights - March 2018
 
Improving Efficiency of Machine Learning Algorithms using HPCC Systems
Improving Efficiency of Machine Learning Algorithms using HPCC SystemsImproving Efficiency of Machine Learning Algorithms using HPCC Systems
Improving Efficiency of Machine Learning Algorithms using HPCC Systems
 
System mldl meetup
System mldl meetupSystem mldl meetup
System mldl meetup
 
APSys Presentation Final copy2
APSys Presentation Final copy2APSys Presentation Final copy2
APSys Presentation Final copy2
 
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習 Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
 
OpenACC and Hackathons Monthly Highlights
OpenACC and Hackathons Monthly HighlightsOpenACC and Hackathons Monthly Highlights
OpenACC and Hackathons Monthly Highlights
 
OpenACC and Open Hackathons Monthly Highlights: April 2022
OpenACC and Open Hackathons Monthly Highlights: April 2022OpenACC and Open Hackathons Monthly Highlights: April 2022
OpenACC and Open Hackathons Monthly Highlights: April 2022
 
OpenACC Monthly Highlights - February 2018
OpenACC Monthly Highlights - February 2018OpenACC Monthly Highlights - February 2018
OpenACC Monthly Highlights - February 2018
 
OpenACC Monthly Highlights March 2019
OpenACC Monthly Highlights March 2019OpenACC Monthly Highlights March 2019
OpenACC Monthly Highlights March 2019
 
Scallable Distributed Deep Learning on OpenPOWER systems
Scallable Distributed Deep Learning on OpenPOWER systemsScallable Distributed Deep Learning on OpenPOWER systems
Scallable Distributed Deep Learning on OpenPOWER systems
 
OpenACC and Open Hackathons Monthly Highlights: July 2022.pptx
OpenACC and Open Hackathons Monthly Highlights: July 2022.pptxOpenACC and Open Hackathons Monthly Highlights: July 2022.pptx
OpenACC and Open Hackathons Monthly Highlights: July 2022.pptx
 
High-Performance and Scalable Designs of Programming Models for Exascale Systems
High-Performance and Scalable Designs of Programming Models for Exascale SystemsHigh-Performance and Scalable Designs of Programming Models for Exascale Systems
High-Performance and Scalable Designs of Programming Models for Exascale Systems
 
OpenACC Monthly Highlights: May 2019
OpenACC Monthly Highlights: May 2019OpenACC Monthly Highlights: May 2019
OpenACC Monthly Highlights: May 2019
 
Designing Scalable HPC, Deep Learning and Cloud Middleware for Exascale Systems
Designing Scalable HPC, Deep Learning and Cloud Middleware for Exascale SystemsDesigning Scalable HPC, Deep Learning and Cloud Middleware for Exascale Systems
Designing Scalable HPC, Deep Learning and Cloud Middleware for Exascale Systems
 
Session 1 - The Current Landscape of Big Data Benchmarks
Session 1 - The Current Landscape of Big Data BenchmarksSession 1 - The Current Landscape of Big Data Benchmarks
Session 1 - The Current Landscape of Big Data Benchmarks
 
(Im2col)accelerating deep neural networks on low power heterogeneous architec...
(Im2col)accelerating deep neural networks on low power heterogeneous architec...(Im2col)accelerating deep neural networks on low power heterogeneous architec...
(Im2col)accelerating deep neural networks on low power heterogeneous architec...
 
Streaming Architecture including Rendezvous for Machine Learning
Streaming Architecture including Rendezvous for Machine LearningStreaming Architecture including Rendezvous for Machine Learning
Streaming Architecture including Rendezvous for Machine Learning
 
OpenACC Monthly Highlights: February 2021
OpenACC Monthly Highlights: February 2021OpenACC Monthly Highlights: February 2021
OpenACC Monthly Highlights: February 2021
 
Scale Container Operations with AIOps
Scale Container Operations with AIOpsScale Container Operations with AIOps
Scale Container Operations with AIOps
 

Mehr von inside-BigData.com

Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...inside-BigData.com
 
Transforming Private 5G Networks
Transforming Private 5G NetworksTransforming Private 5G Networks
Transforming Private 5G Networksinside-BigData.com
 
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...inside-BigData.com
 
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...inside-BigData.com
 
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...inside-BigData.com
 
HPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural NetworksHPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural Networksinside-BigData.com
 
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean MonitoringBiohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoringinside-BigData.com
 
Machine Learning for Weather Forecasts
Machine Learning for Weather ForecastsMachine Learning for Weather Forecasts
Machine Learning for Weather Forecastsinside-BigData.com
 
HPC AI Advisory Council Update
HPC AI Advisory Council UpdateHPC AI Advisory Council Update
HPC AI Advisory Council Updateinside-BigData.com
 
Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19inside-BigData.com
 
Energy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic TuningEnergy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic Tuninginside-BigData.com
 
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPODHPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPODinside-BigData.com
 
Versal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud AccelerationVersal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud Accelerationinside-BigData.com
 
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance EfficientlyZettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance Efficientlyinside-BigData.com
 
Scaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's EraScaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's Erainside-BigData.com
 
CUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computingCUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computinginside-BigData.com
 
Introducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi ClusterIntroducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi Clusterinside-BigData.com
 

Update on the Mont-Blanc Project for ARM-based HPC

  • 1. The Mont-Blanc project: Updates from the Barcelona Supercomputing Center
      Filippo Mantovani
      montblanc-project.eu | @MontBlanc_EU
      This project has received funding from the European Union's Horizon 2020 research and innovation program under grant agreement n° 671697.
  • 2. Mont-Blanc: The “legacy” Mont-Blanc vision
      Arm HPC User Group, Denver, Nov 13th 2017
      Vision: to leverage the fast-growing market of mobile technology for scientific computation, HPC and data centers.
      [Timeline, 2012 to 2018: Mont-Blanc, Mont-Blanc 2, Mont-Blanc 3]
  • 3. Mont-Blanc: The “legacy” Mont-Blanc vision
      Phases share a common structure:
      • Experiment with real hardware
          • Android dev-kits, mini-clusters, prototypes, production-ready systems
      • Push software development
          • System software, HPC benchmarks, mini-apps, production codes
      • Study next-generation architectures
          • Learn from hardware deployment and evaluation for planning new systems
  • 4. Hardware platforms
      [Figure: “We started here” and “We ended up here”]
      N. Rajovic et al., “The Mont-Blanc Prototype: An Alternative Approach for HPC Systems,” in Proceedings of SC’16, p. 38:1–38:12.
  • 5. System Software and Use Cases
      [Figure: “We started here” and “We ended up here”]
      • Different OS flavors
      • Arm HPC Compiler
      • Arm Performance Libraries
      • Allinea tools
      • …
      • All well packed and distributed through OpenHPC
      • Several complex HPC production codes have run on Mont-Blanc
          • Alya
          • AVL codes
          • WRF
          • FEniCS
      [Software stack diagram]
      • Source files: C, C++, FORTRAN, Python, …
      • Compilers: GNU, Arm HPC, Mercurium
      • Scientific libraries: LAPACK, Boost, PETSc, Arm PL, FFTW, HDF5, ATLAS, clBLAS
      • Developer tools: Scalasca, Perf, Extrae, Allinea
      • Cluster management: SLURM, Ganglia, NTP, OpenLDAP, Nagios, Puppet
      • Runtime libraries: Nanos++, OpenCL, CUDA, MPI
      • Hardware support / Storage: Power monitor, Lustre, NFS, DVFS
      • Linux OS / Ubuntu: Network driver, OpenCL driver
      • Hardware: CPU, GPU, Network
  • 6. Study of Next-Generation Architectures
      [Figure: “We started here” and “We ended up here”]
      • A Multi-level Simulation Approach (MUSA) allows us:
          • To gather performance traces on any current HPC architecture
          • To replay them using almost any architecture configuration
          • To study scalability and performance figures at scale, changing the number of MPI processes simulated
      Credits: N. Rajovic
      Credits: MUSA team @ BSC
  • 7. Where is BSC contributing today?
      • Evaluation of solutions
          • Hardware solutions: mini-clusters deployed in liaison with SoC providers and system integrators
          • Software solutions: Arm Performance Libraries, Arm HPC Compiler
      • Use cases
          • Alya: finite element code where we experiment with atomics-avoiding techniques
            GOAL: test new runtime features to be pushed into OpenMP
          • HPCG: benchmark where we started looking at vectorization
            GOAL: explore techniques for exploiting the Arm Scalable Vector Extension
      • Simulation of next-generation large clusters
          • MUSA: combining detailed trace-driven simulation with sampling strategies to explore how architectural parameters affect performance at scale
      T. Grass et al., “MUSA: A Multi-level Simulation Approach for Next-Generation HPC Machines,” in SC16 proceedings, pp. 526–537.
      F. Banchelli et al., “Is Arm software ecosystem ready for HPC?”, poster at SC17.
  • 8. Evaluation of Arm Performance Libraries
      • Goal
          • Test an HPC code making use of arithmetic and FFT libraries
      • Method
          • Quantum Espresso pwscf input
          • Compiled with GCC 7.1.0
      • Platform configuration #1 (poster SC17)
          • AMD Seattle
          • Arm PL 2.2
          • ATLAS 3.11.39
          • OpenBLAS 0.2.20
          • FFTW 3.3.6
      • Platform configuration #2
          • Cavium ThunderX2
          • Arm PL v18.0
          • OpenBLAS 0.2.20
          • FFTW 3.3.7
  • 9. Evaluation of the Arm HPC Compiler
      • Goal
          • Evaluate the Arm HPC Compiler v18.0 vs v1.4
      • Method
          • Run the Polybench benchmark suite, including 30 benchmarks by Ohio State University
          • Run on Cavium ThunderX2
      [Figures: execution time increment, v18.0 vs v1.4; SIMD instructions, v18.0 vs v1.4]
  • 10. High Performance Conjugate Gradient
      • Problem
          • Scalability of HPCG is very limited
          • OpenMP parallelization of the reference HPCG version is poor
      • Goals
          1. Improve OpenMP parallelization of HPCG
          2. Study current auto-vectorization for leveraging SVE
          3. Analyze other performance limitations (e.g. cache effects)
      [Figures: speed-up vs number of OpenMP threads (1, 2, 4, 8, 16, 28) on Cavium ThunderX2, Arm HPC Compiler 1.4 and GCC 7.1.0]
  • 11. High Performance Conjugate Gradient
      • Problem
          • Scalability of HPCG is very limited
          • OpenMP parallelization of the reference HPCG version is poor
      • Goals
          1. Improve OpenMP parallelization of HPCG
          2. Study current auto-vectorization for leveraging SVE
          3. Analyze other performance limitations (e.g. cache effects)
      [Figure: on Cavium ThunderX2]
  • 12. HPCG - SIMD parallelization
      • First approach
          • Check auto-vectorization on current platforms
      • Method
          • Count SIMD instructions in the “ComputeSYMGS” region
          • On Cavium ThunderX2 using Arm HPC Compiler v18.0
          • On Intel Xeon Platinum 8160 (Skylake) using ICC with AVX-512 support
      [Figure: SIMD instruction counts, axis in units of 10^6]
  • 13. HPCG - SVE emulation
      • First approach
          • Check auto-vectorization when SVE is enabled
      • Method
          • Evaluate auto-vectorization in a whole execution of HPCG (one iteration)
          • Generate the binary using Arm HPC Compiler v1.4 with SVE enabled
          • Emulate SVE instructions using the Arm Instruction Emulator on Cavium ThunderX2
      [Figure: increment in SIMD instructions against NEON for SVE 128b, 256b, 512b, 1024b and 2048b]
  • 14. HPCG - Memory access evaluation
      • Cache hit ratio degraded when using multi-coloring approaches
          • Data related to ComputeSYMGS
          • Gathered on Cavium ThunderX2
          • Compiled with GCC
      • Next steps
          • Optimize data access patterns in memory
          • Simulate “SVE gather load” instructions in order to quantify the benefits
      [Figure: ~13% L1D miss ratio, ~35% L2D miss ratio]
  • 15. Alya: BSC code for multi-physics problems
      • Analysis with Paraver: reductions with indirect accesses on large arrays, using
          • No coloring: the use of atomic operations harms performance
          • Coloring: the use of coloring harms locality
          • Commutative multidependences (an OmpSs feature, hopefully to be included in OpenMP)
      [Figure: parallelization of finite element code]
      Credits: M. Garcia, J. Labarta
  • 16. Alya: taskification and dynamic load balancing
      • Goal
          • Quantify the effect of commutative dependences and DLB on an HPC code
      • Method
          • Run the “Assembly phase” of Alya (containing atomics)
          • On MareNostrum 3, 2x Intel Xeon SandyBridge-EP E5-2670
          • On Cavium ThunderX, 2x CN8890
      [Figure: assembly phase, 16 nodes x P processes/node x T threads/process]
      Credits: M. Josep, M. Garcia, J. Labarta
  • 17. Multi-Level Simulation Approach
      • Level 1: Trace generation
      [Diagram: the HPC application execution is instrumented to produce a trace]
          • OpenMP Runtime System Plugin: task / chunk creation events, dependencies
          • MPI Call Instrumentation: MPI calls
          • Pintool / DynamoRIO: dynamic instructions
      Credits: T. Grass, C. Gomez, M. Casas, M. Moreto
  • 18. Multi-Level Simulation Approach
      • Level 2: Network simulation (Dimemas)
      • Level 3: Multi-core simulation (TaskSim + Ramulator + McPAT)
      [Diagram: the trace feeds a network simulator (timeline per rank: Rank 1, Rank 2, …) and a multi-core simulator (timeline per thread: Thread 1, Thread 2, …)]
      Credits: T. Grass, C. Gomez, M. Casas, M. Moreto
  • 19. Multi-Level Parameters
      • Architectural
          • CPU architecture
          • Number of cores
          • Core frequency
          • Threads per core
          • Reorder buffer size
          • SIMD width
      • Micro-architectural
          • L1/2/3 cache size/latency
      • Main memory
          • Memory technology
          • Capacity
          • Bandwidth
          • Latency
      Problem: simulation time diverges.
      Solution: we supported different modes (Burst, Detailed, Sampling), trading accuracy for speed.
      Credits: T. Grass, C. Gomez, M. Casas, M. Moreto
  • 20. MUSA: status
      • SC’16 paper
          • Validation of the methodology with 5 applications: BT-MZ, SP-MZ, LU-MZ, HYDRO, SPECFEM3D
          • Proven performance figures at scale, up to 16k MPI ranks
      • Status update
          • Added parameter sets for state-of-the-art architectures
          • Support for power consumption modeling, including CPU, NoC and memory hierarchy
          • Extended set of applications
          • Expanded trace database, including traces gathered on MareNostrum4 (Intel Skylake + OmniPath)
          • Included support for DynamoRIO
      Credits: T. Grass, C. Gomez, M. Casas, M. Moreto
  • 21. Student Cluster Competition
      • Rules
          • 12 teams of 6 undergraduate students
          • 1 cluster operating within a 3 kW power budget
          • 3 HPC applications + 2 benchmarks
      • One team from Universitat Politècnica de Catalunya (UPC, Spain)
          • Participating with Mont-Blanc technology
      • 3 awards to win
          • Best HPL
          • 1st, 2nd, 3rd overall places
          • Fan favorite
      We are looking for an Arm-based cluster for 2018!
  • 22. Interested in any of the topics presented? Follow us!
      • montblanc-project.eu
      • @MontBlanc_EU
      • filippo.mantovani@bsc.es
      Visit our booths @ SC17: booth #1694, booth #1925, booth #1975
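
Note on slide 15 (atomics versus coloring): the trade-off for reductions with indirect accesses can be illustrated with a minimal Python sketch. This is not Alya's implementation. The greedy coloring, the function names `color_elements` and `assemble_by_color`, and the toy mesh are all assumptions made for illustration. Elements that share a node receive different colors, so all elements of one color update disjoint entries and could be processed in parallel without atomic operations. The price is that elements are visited color by color rather than in mesh order, which is the locality loss the slide mentions.

```python
from collections import defaultdict


def color_elements(elements):
    """Greedy coloring: two elements that share a node never get the same color."""
    # Map each node to the elements that touch it.
    node_to_elems = defaultdict(list)
    for e, nodes in enumerate(elements):
        for n in nodes:
            node_to_elems[n].append(e)

    colors = [-1] * len(elements)
    for e, nodes in enumerate(elements):
        # Colors already taken by neighbouring (node-sharing) elements.
        used = {colors[o] for n in nodes for o in node_to_elems[n] if colors[o] >= 0}
        c = 0
        while c in used:
            c += 1
        colors[e] = c
    return colors


def assemble_by_color(elements, contributions, num_nodes):
    """Accumulate per-element contributions into a per-node array, color by color."""
    colors = color_elements(elements)
    result = [0.0] * num_nodes
    for c in range(max(colors, default=-1) + 1):
        # Within one color no two elements touch the same node, so this loop
        # is the part that could run in parallel without atomics.
        for e, nodes in enumerate(elements):
            if colors[e] != c:
                continue
            for n in nodes:
                result[n] += contributions[e]
    return result


# Hypothetical toy mesh: four 2-node elements forming a ring of four nodes.
ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
ring_colors = color_elements(ring)
ring_result = assemble_by_color(ring, [1.0, 2.0, 3.0, 4.0], num_nodes=4)
print(ring_colors, ring_result)
```

The sequential loops stand in for the parallel ones; the point is only that the per-color iteration order differs from the natural element order.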
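
Note on slide 13 (SVE vector lengths): as a rough aid for reading the emulated widths, the sketch below computes how many 64-bit elements fit in one vector register at each length listed on the slide, next to 128-bit NEON. It assumes double-precision data, which is what HPCG operates on. The helper name `lanes` is hypothetical, and the lane count says nothing about the measured instruction counts in the figure.

```python
NEON_BITS = 128    # fixed NEON register width
DOUBLE_BITS = 64   # assumption: double-precision elements


def lanes(vector_bits, element_bits=DOUBLE_BITS):
    """Number of elements of the given size that fit in one vector register."""
    return vector_bits // element_bits


# SVE vector lengths emulated on slide 13.
sve_lengths = (128, 256, 512, 1024, 2048)
sve_lanes = {bits: lanes(bits) for bits in sve_lengths}

for bits, n in sve_lanes.items():
    print(f"SVE {bits}b: {n} doubles per vector, NEON: {lanes(NEON_BITS)}")
```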
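
Note on slide 14 (miss ratios): a back-of-the-envelope reading of the two figures is sketched below. It assumes that the ~35% L2D figure is a local miss ratio (misses per L2 access) and that L2 data accesses come only from L1D misses. The slide states neither, so treat the result as illustrative. Under those assumptions, about 0.13 × 0.35 ≈ 4.55% of data accesses in ComputeSYMGS would miss both levels.

```python
def fraction_missing_both(l1d_miss_ratio, l2d_local_miss_ratio):
    """Fraction of data accesses that miss in L1D and then also miss in L2.

    Assumes the L2 ratio is local (per L2 access) and that every L2 access
    is caused by an L1D miss.
    """
    return l1d_miss_ratio * l2d_local_miss_ratio


# Approximate values read from slide 14.
beyond_l2 = fraction_missing_both(0.13, 0.35)
print(f"{beyond_l2:.2%} of data accesses go past L2")
```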