1. Possibility of HPC Application on Cloud Infrastructure by Container Cluster
22nd IEEE International Conference on Computational Science and Engineering
(IEEE CSE 2019)
Kyunam Cho†, Hyunseok Lee†, Kideuk Bang†, Sungsoo Kim†
† Samsung Research, Samsung Electronics, Republic of Korea
2 Aug. 2019 (Fri)
Kyunam Cho
2. Introduction
• The Artificial Intelligence (AI) revolution of the past 2-3 years and the increasing demand for HPC infrastructure have changed the way infrastructure is shared, e.g. enabling AI infrastructure in the public cloud
• AI technology requires large-scale GPGPU computation; cloud technology provides a cloud environment without performance loss in the GPGPU resources
• A Linux container (LXC) is a new method for creating a virtual environment in the cloud with low performance overhead
Contribution: evaluating and identifying the possibility of using container technology for HPC
Motivation: the increased demand for large-scale computation by AI applications and the evolution of container technology have increased this possibility
• We evaluate and compare the performance of several applications on
cloud infrastructure.
• We observe that HPC applications can conditionally be run on cloud infrastructure.
• We identify that there is no performance overhead in cache miss rate or InfiniBand latency on cloud infrastructure.
[Figure: application domains shared by High Performance Computing and Artificial Intelligence, e.g. disaster recovery, medical, and transportation. Image ⓒ Kamran Kowsari, Wikimedia Commons]
3. Experiments and Evaluation details
• Some comparisons between the native environment and container environments
- A) MPI application scalability, B) cache miss rate, C) InfiniBand latency, and D) machine learning training application performance
- Native environment: no software stack on top of the OS; container environment: built using container technology (a container-launch sketch follows the hardware table below)
• Hardware environment and Experiment methodology
CPU Cluster
• 2 x 2.2 GHz Intel Xeon Broadwell (E5-2640 v4) CPUs
• 8 x Micron 8 GB DDR4
• SuperMicro AOC-UR-i4XT network card with a maximum network speed of 10,000 Mbps

GPU Cluster
• 2 x 2.6 GHz Intel Xeon (E5-2690 v4) CPUs
• 32 x Samsung 8 GB DDR4
• MCX555A-ECAT / ConnectX®-5 VPI adapter card and EDR IB (100 Gb/s) network card
• 8 x NVIDIA Tesla P40 GPGPU cards
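In practice, the "container environment" means running the same benchmarks inside a Linux container on this hardware. Below is a minimal, hedged sketch of launching a containerized MPI benchmark with the Docker SDK for Python; the image name, command, and GPU runtime setting are illustrative assumptions, not the authors' actual configuration.

```python
# Hedged sketch: run a benchmark inside a Linux container via the Docker SDK.
# The image name and command below are hypothetical placeholders.
import docker

client = docker.from_env()
logs = client.containers.run(
    image="mpi-benchmark:latest",          # hypothetical image holding the MPI solver
    command="mpirun -np 4 ./poisson_solver",
    runtime="nvidia",                      # expose GPGPUs if the NVIDIA runtime is installed
    remove=True,                           # clean up the container after the run
)
print(logs.decode())
```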
Experiment Methodology
A) MPI application scalability: measurement of Poisson's equation solver scalability
B) MPI application cache miss rate: measurement of the Poisson's equation solver's cache miss rate using Valgrind's cachegrind tool
C) InfiniBand latency: measurement of InfiniBand bandwidth and latency using the OpenFabrics Enterprise Distribution (OFED)
D) Machine learning training application performance: evaluation of two machine learning training applications, ResNet-50 and RNNs with LSTM
4. Results and Discussion
A) MPI application scalability
- Measurement of Poisson’s equation solver scalability
- Only asynchronous communication is used for the scalability (strong-scaling) measurement; matrix degrees of freedom (DOF): 40,401
- Maximum conjugate gradient iterations: 10,000; solution tolerance: 1.0 × 10⁻¹⁰
Overhead denotes the overhead of the container environment relative to the native environment
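For context, the following is a minimal mpi4py sketch (not the authors' code) of the kind of domain-decomposed Poisson iteration whose strong scaling is measured here: each rank exchanges ghost rows with its neighbours using asynchronous (non-blocking) MPI calls before updating its block. A simple Jacobi update stands in for the conjugate gradient body; the 40,401-DOF grid, 10,000-iteration cap, and 1.0 × 10⁻¹⁰ tolerance mirror the slide's parameters.

```python
# Minimal sketch (assumed, not the authors' solver): 1-D domain decomposition
# with asynchronous ghost-row exchange, the communication pattern being scaled.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

N = 201                                   # 201 x 201 grid -> 40,401 degrees of freedom
rows = N // size + (rank < N % size)      # rows owned by this rank
u = np.zeros((rows + 2, N))               # local block plus one ghost row on each side
f = np.ones((rows, N))                    # right-hand side of the Poisson equation
h2 = (1.0 / (N - 1)) ** 2

tol, max_iter = 1.0e-10, 10_000
for it in range(max_iter):
    # Post asynchronous ghost-row exchanges with the neighbouring ranks.
    reqs = []
    if rank > 0:
        reqs.append(comm.Isend(u[1], dest=rank - 1))
        reqs.append(comm.Irecv(u[0], source=rank - 1))
    if rank < size - 1:
        reqs.append(comm.Isend(u[rows], dest=rank + 1))
        reqs.append(comm.Irecv(u[rows + 1], source=rank + 1))
    MPI.Request.Waitall(reqs)

    # Jacobi update of the interior (stands in for the CG iteration body).
    new = 0.25 * (u[0:rows, 1:-1] + u[2:rows + 2, 1:-1]
                  + u[1:rows + 1, 0:-2] + u[1:rows + 1, 2:] + h2 * f[:, 1:-1])
    diff = np.abs(new - u[1:rows + 1, 1:-1]).max()
    u[1:rows + 1, 1:-1] = new

    # Global convergence test requires a collective reduction.
    if comm.allreduce(diff, op=MPI.MAX) < tol:
        break
```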
5. Results and Discussion
A) MPI application scalability
- Efficiency of communication optimization in both environments
- Collective communication, and peer-to-peer communication in synchronous and asynchronous modes
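The three communication styles compared here can be sketched with mpi4py as follows; the global dot-product workload is an illustrative assumption, not taken from the paper.

```python
# Sketch of the three communication styles: collective, synchronous peer-to-peer,
# and asynchronous peer-to-peer. The summed dot product is an assumed workload.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
local = np.random.rand(1_000_000)            # this rank's share of a vector
partial = np.array([np.dot(local, local)])

# A) Collective communication: one call, optimized inside the MPI library.
total = np.empty(1)
comm.Allreduce(partial, total, op=MPI.SUM)

# B) Peer-to-peer, synchronous: rank 0 gathers partial sums with blocking calls.
if rank == 0:
    acc = partial.copy()
    buf = np.empty(1)
    for src in range(1, size):
        comm.Recv(buf, source=src)           # blocking receive
        acc += buf
else:
    comm.Send(partial, dest=0)               # blocking send

# C) Peer-to-peer, asynchronous: the same exchange with non-blocking calls,
#    letting computation overlap communication until Waitall.
if rank == 0:
    bufs = [np.empty(1) for _ in range(size - 1)]
    reqs = [comm.Irecv(bufs[i], source=i + 1) for i in range(size - 1)]
    MPI.Request.Waitall(reqs)
    acc = partial + sum(bufs)
else:
    comm.Isend(partial, dest=0).Wait()
```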
6. Results and Discussion
B) MPI application cache miss rate
- Measurement of the Poisson's equation solver's cache miss rate using Valgrind's cachegrind tool
- The cache miss rate measurement is performed separately from the MPI application scalability runs
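A hedged sketch of how such a cachegrind measurement can be driven from Python: run the solver binary under Valgrind's cachegrind tool, then read the cache-reference and miss counts reported by cg_annotate. The binary name and output path are illustrative only.

```python
# Hedged sketch: measure cache behaviour of a solver binary with cachegrind.
# "./poisson_solver" is a hypothetical binary name, not the authors' artifact.
import subprocess

binary = "./poisson_solver"
subprocess.run(["valgrind", "--tool=cachegrind",
                "--cachegrind-out-file=cachegrind.out", binary], check=True)

# cg_annotate prints totals for instruction/data cache references and misses,
# from which the miss rate (misses / references) can be derived.
report = subprocess.run(["cg_annotate", "cachegrind.out"],
                        capture_output=True, text=True, check=True)
print("\n".join(report.stdout.splitlines()[:20]))   # summary header only
```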
7. Results and Discussion
C) InfiniBand latency
- Measuring InfiniBand bandwidth and latency using the OpenFabrics Enterprise Distribution (OFED)
- Kubernetes CNI: Flannel; send data size: 8 MiB (8,388,608 bytes); repeated 1,000 times
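The paper's numbers come from the OFED utilities; as a rough analogue (an assumption, not the authors' method), the following mpi4py ping-pong measures latency and bandwidth between two ranks over the same fabric, with the 8 MiB message size and 1,000 repetitions taken from the slide.

```python
# Assumed analogue of the OFED measurement: an MPI ping-pong between two ranks.
import time
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

msg = np.zeros(8 * 1024 * 1024, dtype=np.uint8)   # 8,388,608 bytes
repeats = 1000

comm.Barrier()
start = time.perf_counter()
for _ in range(repeats):
    if rank == 0:
        comm.Send(msg, dest=1)
        comm.Recv(msg, source=1)
    elif rank == 1:
        comm.Recv(msg, source=0)
        comm.Send(msg, dest=0)
elapsed = time.perf_counter() - start

if rank == 0:
    one_way = elapsed / repeats / 2               # seconds per one-way transfer
    bandwidth = msg.nbytes / one_way / 1e9        # GB/s
    print(f"one-way latency: {one_way * 1e6:.1f} us, bandwidth: {bandwidth:.2f} GB/s")
```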
8. Results and Discussion
D) Machine learning training application performance
- Evaluating two machine learning training applications: ResNet-50 and RNNs with LSTM
- GPGPU count: 64 GPGPU cards on 8 physical servers, each server containing 8 GPGPU cards
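A minimal sketch of a comparable ResNet-50 training setup (not the authors' code), using TensorFlow/Keras with a MirroredStrategy across the 8 GPGPUs of a single server; scaling to 64 GPGPUs across 8 servers would require a multi-worker strategy or Horovod. The synthetic input data is an assumption.

```python
# Assumed single-server ResNet-50 training sketch with data-parallel GPUs.
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()       # all visible GPUs on this node
print("replicas in sync:", strategy.num_replicas_in_sync)

with strategy.scope():
    model = tf.keras.applications.ResNet50(weights=None, classes=1000)
    model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.1, momentum=0.9),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

# Synthetic ImageNet-shaped batches stand in for the real training data.
images = tf.random.uniform((256, 224, 224, 3))
labels = tf.random.uniform((256,), maxval=1000, dtype=tf.int32)
dataset = tf.data.Dataset.from_tensor_slices((images, labels)).batch(32).repeat()

model.fit(dataset, epochs=1, steps_per_epoch=100)
```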
9. Conclusions and Future work
• We observed that
- There was performance overhead in CPU-oriented applications
- The communication optimization methods could also be applied with container technology
- No cache miss rate overhead was found in the container environment
- No performance loss was found in InfiniBand usage either
- Machine learning training applications show very small overhead in the container environment
• Future work
- We will investigate the most suitable network configuration for HPC applications in a container environment
- We will study the best-fit optimization methods for HPC applications in a container-based environment