byteLAKE's NVIDIA GPU Solutions

We build AI and HPC solutions. Expertise: highly optimized AI Engines and HPC Apps.

• HPC: accelerating time to results and adapting complex algorithms to GPU, FPGA, many-CPU architectures.

Leverage byteLAKE's expertise in adapting and optimizing complex algorithms for NVIDIA GPUs, Xilinx Alveo FPGAs, and Intel, AMD and ARM solutions, from single nodes to clusters.

More: www.byteLAKE.com/en/NVIDIA

Published in: Technology

  1. NVIDIA Solutions
  2. Exceptional experience with NVIDIA
     • Expertise in all possible configurations
       – desktop, mobile, server
       – Tesla, Fermi, Kepler (K80, GeForce GTX Titan, Jetson), Maxwell (NVIDIA GeForce GTX 980), Pascal (P100), Tesla V100, T4
       – CUDA, OpenCL, OpenACC
     • Several case studies delivered
       – AI training (machine/deep learning)
       – Edge AI inferencing
       – Classic HPC simulations (CFD, weather)
     • Very active in the research space
       – several publications in prestigious journals (Concurrency and Computation: Practice and Experience, Parallel Computing, Journal of Supercomputing, etc.)
     More at: byteLAKE.com/en/Nvidia
  3. byteLAKE’s Software Autotuning
     HPC simulations optimized by Machine Learning: automatic adaptation of an algorithm to a specific hardware architecture
     • Enables software portability between different architectures:
       – CPU: different numbers of cores, memory hierarchy, cache sizes;
       – GPU: register file reuse, shared memory utilization, GPUDirect support, reduction of global memory transactions;
       – HPC: selecting the right number of nodes, scalability estimation, overlapping data transfers & communication;
       – Hybrid: load balancing (i.e. selecting appropriate parts of the algorithm for different devices that execute the code with different performance)
     • Helps build adaptable algorithms:
       – automatically selecting the size of data blocks, number of threads, number of processes, and precision of data (i.e. depending on the algorithm or input data characteristics, some data can be stored in double, single or half precision format)
       – selecting the optimization criterion: performance, energy consumption, accuracy of results, or a mix (e.g. minimizing energy while keeping execution time, or optimizing performance while keeping an energy budget)
     • Can auto-configure the system and provide the most suitable compiler flags
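As a rough illustration of the autotuning idea on this slide, the sketch below runs an exhaustive search over a small parameter space and picks the configuration that minimizes a chosen criterion. Everything in it is an assumption made for illustration: the function names, the candidate values, and above all `toy_measure`, a synthetic cost model that stands in for actually running and profiling a kernel. It is not byteLAKE's autotuner.

```python
import itertools


def autotune(search_space, measure, criterion):
    """Evaluate every configuration and return (best_config, best_score).

    search_space: dict mapping parameter name -> list of candidate values
    measure:      callable(config) -> dict of metrics, e.g. {"time": ..., "energy": ...}
    criterion:    callable(metrics) -> float, where lower is better
    """
    names = list(search_space)
    best_config, best_score = None, float("inf")
    for values in itertools.product(*(search_space[n] for n in names)):
        config = dict(zip(names, values))
        score = criterion(measure(config))
        if score < best_score:
            best_config, best_score = config, score
    return best_config, best_score


# Hypothetical search space; the parameter kinds follow the slide.
SEARCH_SPACE = {
    "block_size": [32, 64, 128, 256],
    "threads": [1, 2, 4, 8],
    "precision": ["half", "single", "double"],
}


def toy_measure(config):
    """Synthetic stand-in for running and profiling the real code."""
    bytes_per_value = {"half": 2, "single": 4, "double": 8}[config["precision"]]
    time = bytes_per_value * (1.0 / config["threads"]
                              + abs(config["block_size"] - 128) / 256)
    power = 4 + config["threads"] ** 2  # made-up power model
    return {"time": time, "energy": time * power}


def by_time(metrics):
    return metrics["time"]


def by_energy(metrics):
    return metrics["energy"]


best_time_cfg, _ = autotune(SEARCH_SPACE, toy_measure, by_time)
best_energy_cfg, _ = autotune(SEARCH_SPACE, toy_measure, by_energy)
print("fastest:", best_time_cfg)
print("lowest energy:", best_energy_cfg)
```

Swapping the criterion function is what changes the optimization target. In this toy model the fastest and the most energy-efficient configurations differ in thread count. A real tuner would also have to check the accuracy of the result before accepting a lower precision.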
  4. Research Case (concluded): CFD acceleration with GPU (MPDATA, weather forecast)
     • Goal: Reducing the energy consumption of the MPDATA algorithm (an algorithm for numerical simulation of geophysical fluid flows on micro-to-planetary scales, used especially in numerical weather prediction).
     • Hardware: Piz Daint supercomputer (ranked 3rd on the TOP500 list), equipped with the most advanced Pascal-based GPUs: NVIDIA Tesla P100.
     • Idea: Applying mixed precision arithmetic: perform part of the operations in single precision (32 bits) and the rest in double precision (64 bits).
     • Why do we use it? A single simulation of the weather phenomenon needed more than 10^13 operations. We suspected that not all of them need double precision arithmetic to preserve the same simulation accuracy. We believe that controlling the precision and accuracy of numerical results can increase performance and decrease energy consumption while still providing highly accurate results.
     • Solution: We used unsupervised learning to estimate the correlation between the precision of each matrix and its influence on the criteria (energy, accuracy of results). During a short, dynamic training stage we identified the set of operations that could be performed in single precision without loss of accuracy in the weather simulation.
     • Results: We reduced energy by 33% and increased performance by a factor of 1.27x using 25% fewer GPUs, keeping the accuracy of the results at the same level as with double precision arithmetic.
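The mixed precision idea can be sketched as follows: try storing each input array in single precision and keep the change only if the simulation result stays within a tolerance of the all-double reference. This is a deliberately simple greedy check, not the unsupervised-learning approach described on the slide. The function names, the toy `simulate` function, the array names, and the tolerance are all hypothetical. Only the storage precision is emulated here (by rounding through a 32-bit float); the arithmetic itself still runs in double precision.

```python
import struct


def to_single(x):
    """Round a Python float (64-bit) to the nearest 32-bit float value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def select_single_precision(arrays, simulate, tolerance):
    """Return the names of arrays that can be stored in single precision.

    arrays:    dict mapping array name -> list of floats (double precision)
    simulate:  callable(arrays) -> list of floats (the simulation result)
    tolerance: largest acceptable absolute deviation from the reference
    """
    reference = simulate(arrays)
    current = dict(arrays)
    demoted = []
    for name in arrays:
        trial = dict(current)
        trial[name] = [to_single(v) for v in arrays[name]]
        error = max(abs(a - b) for a, b in zip(simulate(trial), reference))
        if error <= tolerance:
            current = trial  # keep this array in single precision
            demoted.append(name)
    return demoted


def simulate(arrays):
    """Toy stand-in for a simulation step (hypothetical)."""
    return [v + (o - 1e8) for v, o in zip(arrays["velocity"], arrays["offset"])]


arrays = {
    "velocity": [0.1, 0.2, 0.3],               # small values: float32 is enough
    "offset": [1e8 + 0.1, 1e8 + 0.2, 1e8 + 0.3],  # large values: float32 loses the fraction
}
demoted = select_single_precision(arrays, simulate, tolerance=1e-6)
print("arrays safe to store in single precision:", demoted)
```

The second array shows why such a check is needed: near 1e8 the spacing between adjacent 32-bit floats is larger than the fractional part being stored, so demoting it changes the result noticeably.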
  5. Research Case: Reconfiguring HPC Simulation with AI to optimize performance and energy
     Configuration parameters: node count, accelerators per node, memory alignment, streams count, buffering types, …, CPU cores, memory policy
     About 5000 possible configurations
     We are developing a Machine Learning module to select the most fitting configuration. Among other methods, this module uses supervised learning with the random forest algorithm. Its main function is to prune the search space by eliminating the worst configurations. In this way we obtain a small set that contains the best configuration in 90% of cases.
     More at: bytelake.com/en/case-studies/hpc-configuration-optimized/
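The pruning workflow described here (measure a sample of configurations, fit a model, discard the configurations predicted to be worst) can be sketched with the standard library alone. Note the substitution: instead of a random forest, the sketch uses a much cruder model that scores a configuration by the average measured cost of each of its parameter values. The parameter names come from the slide; the candidate values, the `toy_cost` function, and all function names are assumptions for illustration.

```python
import itertools
import random
from collections import defaultdict


def prune_search_space(space, measure, n_samples, keep_fraction, seed=0):
    """Return the configurations predicted to be best, most promising first."""
    names = list(space)
    all_configs = [dict(zip(names, values))
                   for values in itertools.product(*(space[n] for n in names))]
    sample = random.Random(seed).sample(all_configs, n_samples)

    # "Training": average measured cost for every (parameter, value) pair seen.
    totals = defaultdict(float)
    counts = defaultdict(int)
    for config in sample:
        cost = measure(config)
        for item in config.items():
            totals[item] += cost
            counts[item] += 1
    overall = sum(totals.values()) / sum(counts.values())

    def predict(config):
        # Values never seen in the sample fall back to the overall mean cost.
        return sum(totals[item] / counts[item] if counts[item] else overall
                   for item in config.items())

    ranked = sorted(all_configs, key=predict)
    keep = max(1, int(len(ranked) * keep_fraction))
    return ranked[:keep]


# Hypothetical candidate values (144 configurations in total).
SPACE = {
    "node_count": [1, 2, 4, 8],
    "accelerators_per_node": [1, 2, 4],
    "streams_count": [1, 2, 4, 8],
    "cpu_cores": [4, 8, 16],
}


def toy_cost(config):
    """Synthetic stand-in for a measured execution time or energy."""
    return (8 / config["node_count"] + 4 / config["accelerators_per_node"]
            + abs(config["streams_count"] - 4) + 16 / config["cpu_cores"])


shortlist = prune_search_space(SPACE, toy_cost, n_samples=40, keep_fraction=0.1)
print(len(shortlist), "of 144 configurations kept")
```

Only the shortlisted configurations then need to be benchmarked in full. A model that ignores interactions between parameters, as this one does, is a weak substitute for a random forest, which can capture them.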
  6. MPDATA Accelerated CFD / Advection algorithm optimized for heterogeneous computing.
  7. CFD: Computational Fluid Dynamics
     • Numerical analysis and algorithms to solve fluid flow problems
       – how liquids and gases flow and interact with surfaces
     • Widely used across industries:
       – automotive, chemical, aerospace, biomedical, power and energy, construction, etc.
     • Typical applications
       – weather simulations,
       – aerodynamic characteristics modelling and optimization,
       – petroleum mass flow rate assessment
  8. Algorithm: Advection (MPDATA), General Information
     • MPDATA (Multidimensional Positive Definite Advection Transport Algorithm)
       – main part of the dynamic core of the Eulerian/semi-Lagrangian (EULAG) model
       – EULAG (MPDATA + elliptic solver) is the established computational model, developed for simulating thermo-fluid flows across a wide range of scales and physical scenarios
       – currently, this model is being implemented as the new dynamic core of the COSMO (Consortium for Small-scale Modeling) weather prediction framework
       – advection (together with the elliptic solver) is a key part of many frameworks that allow users to implement their simulations
     • Advection: movement of some material (dissolved or suspended) in the fluid.
  9. Algorithm: Advection (MPDATA), byteLAKE’s implementation compatibility
     • Easy to integrate
       – Can work as a standalone application or be called as a function via our dedicated interface (e.g. with input and output arrays)
       – Compatible with frameworks like TensorFlow for integrating deep learning with CFD codes
     • Easy to visualize the results
       – Results can be stored in a raw format as a binary file of the output arrays or converted via byteLAKE tools to a ParaView format
     • See benefits already in 1-node HPC configurations
       – Strongly adapted to Alveo U250, where a single card supports the maximum array size: 2.1 Gcells (max compute domain: 1264 x 1264 x 1264), ~60 GB
     • Scalable to many cards per node and many nodes
  10. Algorithm: Advection (MPDATA), Technical Information
      • First-order-accurate step of the advection scheme. Second-order is an option.
      • Input data
        – Array X: non-diffusive quantity (e.g. temperature of water vapor, ice, precipitation, etc.)
        – Arrays V1, V2, V3: each stores the velocity components in one direction
        – (optional) Arrays Fi, Fe: implosion and explosion forces acting on a structure of X
        – (optional) Array D with density
        – (optional) Array rho, which defines an interface for the coupling of COSMO and the EULAG dynamic core (used to provide the transformation of the X variable)
        – DT: time step (scalar)
      • Output data
        – single X array that was updated in the given time step
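To make the first-order step concrete, here is a minimal one-dimensional sketch of a donor-cell (upwind) update, which is the first pass of the published MPDATA scheme. It is a simplification under stated assumptions and not byteLAKE's implementation: one spatial dimension instead of three, a periodic grid, and the velocity expressed as a Courant number (velocity * DT / dx, where the grid spacing dx is an assumed parameter). The function names are hypothetical. The optional forces, density and rho inputs are left out.

```python
def donor_cell_flux(psi_left, psi_right, courant):
    """Upwind flux through a cell face: take the value from the upstream cell."""
    return max(courant, 0.0) * psi_left + min(courant, 0.0) * psi_right


def upwind_step(x, courant):
    """Advance the 1D field x by one time step on a periodic grid.

    x[i]        value of the transported quantity in cell i
    courant[i]  Courant number at the face between cell i-1 and cell i
    """
    n = len(x)
    # flux[i] is the flux across the left face of cell i (x[-1] wraps around).
    flux = [donor_cell_flux(x[i - 1], x[i], courant[i]) for i in range(n)]
    # Each cell loses what leaves through its right face and gains from its left.
    return [x[i] - (flux[(i + 1) % n] - flux[i]) for i in range(n)]


x = [0.0, 0.0, 1.0, 0.0, 0.0]
courant = [0.5] * len(x)  # uniform flow to the right
x_new = upwind_step(x, courant)
print(x_new)
```

With Courant numbers between -1 and 1 this update keeps non-negative fields non-negative and conserves their sum, but it smears sharp features. The optional second-order pass in MPDATA reapplies the same upwind step with a corrective ("antidiffusive") velocity to counteract that smearing.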
  11. Applications of Advection (MPDATA)
      • Applications include
        – Characterizing the effect of sub-grid scales in global numerical simulations of turbulent stellar interiors
        – Comparing anelastic and compressible convection-permitting weather forecasts for the Alpine region
        – Modeling the prediction of forest fire spread
        – Flood simulations
        – Biomechanical modeling of brain injuries within the Voigt model (a linear system of differential equations where the motion of the brain tissue depends merely on the balance between viscous and elastic forces)
        – Simulation of gravity wave turbulence in the Earth's atmosphere
        – Simulation of geophysical turbulence in the Earth's atmosphere
        – Ocean modeling: simulation of three-dimensional solitary wave generation and propagation using EULAG coupled to the barotropic NCOM (Navy Coastal Ocean Model) tidal model
  12. Applications of Advection (MPDATA), cont.
      • Applications include, cont.
        – Oil and Gas: provides a significant return on investment (ROI) in seismic analysis, reservoir modelling and basin modelling. Also used to monitor drilling and seismic data to optimize drilling trajectories and minimize environmental risk.
        – AgriTech: models to track and predict various environmental impacts on crop yield, such as weather changes. For example, daily weather predictions can be customized to the needs of each client and range from hyperlocal to global.
      • Example adopters
        – Poznan Supercomputing and Networking Center, Poland: prognosis of air pollution
        – European Centre for Medium-Range Weather Forecasts, UK: weather forecast
        – Institute of Meteorology and Water Management, Poland: weather forecasts
        – German Aerospace Center: aeronautics, transport and energy areas
        – University of Cape Town, South Africa: weather simulation
        – Montreal University: weather simulation
        – Warsaw University: ocean simulation
  13. Weather engine optimized for Europe’s fastest supercomputer (Piz Daint)
      12x better performance, 30% reduced energy consumption
      • Our solution: machine-learning-managed, dynamic application of mixed precision
      • Highlights:
        – Dynamic estimation of the algorithm’s power consumption as a function of processor frequency and the number of cores.
        – Energy-aware task management.
        – Auto-tuning procedure that takes algorithm-specific and GPU-specific parameters into account for auto-configuration.
        – Result: better performance, less energy consumed.
      Our mechanism provides energy savings of up to 1.43x compared to the default Linux scaling governor.
      More at: byteLAKE.com/en/MPDATA
  14. Dynamic Mixed Precision, cont.
      We reduced energy consumption by 33% and increased performance by a factor of 1.27x using 25% fewer GPUs. We kept the accuracy of the results at the same level as with double precision arithmetic.
  15. Dynamic Mixed Precision
      Optimize execution time
      • Ported a geophysical model (EULAG) to a parallel supercomputer architecture (Piz Daint)
      • Used Machine Learning (Random Forest) to optimize various numerical parameters, such as data block sizes, number of GPU streams, and sizes of vector data types
      Optimize energy efficiency
      • Developed a mixed precision mechanism that lowers the energy consumption of supercomputers while keeping code performance at the highest possible level
      • Developed a framework based on a software automatic tuning approach
      Results
      ✓ 10 times faster
      ✓ Then we improved it even more, reaching a speed-up of 1.27x
      ✓ Energy consumption reduced by 33%
      ✓ Optimized GPU usage while keeping the accuracy of computations
      Highlights:
      • C++, CUDA, MPI, OpenMP
  16. Our Research Studies GO PUBLIC! More at: byteLAKE.com/en/research
  17. Explore byteLAKE’s CFD Suite: AI for CFD. www.byteLAKE.com/en/CFDSuite
  18. Meet byteLAKE: AI and HPC Experts
      Your software partner for AI & HPC projects. Experts in adapting & optimizing software.
      R&I • R&D • Licensing
      AI
      • highly optimized AI engines to analyze text, image, video, sound and time series data.
      • Detecting shapes & patterns.
      • Complex tasks automation.
      • IoT/edge, Cloud, on-premise.
      HPC
      • accelerating time to results and adapting complex algorithms to GPU, FPGA, many-CPU architectures.
      • From single nodes to clusters.
      Select Products
      • AI for CFD. Ultra fast results, radically lower TCO. New possibilities.
      • Objects Detection. Edge AI and real time computer vision.
      • 56x faster AI training.
  19. HPC at byteLAKE
      Accelerating time to results and adapting complex algorithms to GPU, FPGA, many-CPU architectures.
      Unleashing the power:
      • selecting the right programming model for a given problem (task parallelism, data parallelism, or a mixture of the two)
      • providing the right balance between CPUs and GPUs/FPGAs
      • optimizing data transfers between host memory and accelerators
      • adapting code to a variety of computing platforms
      Making the most of the hardware:
      • Speedup: accelerating time to results for complex algorithms
      • Green Computing: optimizing algorithms to reduce energy consumption
      • Scalability: from single nodes to clusters
      Bottom line: lowering TCO through various optimizations (performance, energy efficiency, accuracy of calculations)
      More at: byteLAKE.com/en/HPC
  20. Products and Services: Cognitive Automation Edge AI Services HPC Products CFD Suite brainello Ewa Guard Federated Learning Green Computing (FPGA, GPU) Intelligent Restaurant Incubation
  21. byteLAKE among top AI companies in Poland!
      "It contains information on practically all meaningful companies operating in Poland which offer services or products in the field of modern technologies. We believe this map will be necessary to help both domestic and international investors looking for interesting projects in Poland." (Aleksander Kutela, President of Digital Poland Foundation)