Improving Uintah’s Scalability Through the Use of Portable Kokkos-Based Data Parallel Tasks
John Holmen (1), Alan Humphrey (1), Daniel Sunderland (2), Martin Berzins (1)
(1) University of Utah
(2) Sandia National Laboratories
Uintah Architecture
[Diagram labels: Simulation Controller; Scheduler; Load Balancer; Runtime System; Task; Data Warehouse; ARCHES; DSL: NEBO; PIDX; VisIt; UQ drivers; Hypre linear solver; CPUs, GPUs, Xeon Phis]
• Open source software
• Worldwide distribution
• Broad user base
• Applications code programming model
• Physics routines unaware of communications
• Automatically generated abstract C++ task graph
• Adaptive execution of tasks by the runtime system
• Asynchronous out-of-order execution
• Work stealing
• Overlapping of communication and computation
Uintah’s Heterogeneous Runtime System
• MPI+X schedulers support:
• MPI + PThreads + CUDA
• MPI + Kokkos
• Shared memory model on-node
• 1 MPI process per node
Exascale Target Problem
DOE NNSA PSAAP II Center
• Modeling an Alstom Power 1000 MWe ultra-supercritical clean coal boiler at scale with Uintah
• Supplies power for 1M people
• Targeted 1 mm grid resolution = 9 x 10^12 cells
• Significantly larger than the largest problems solved today
[Figure: boiler, 50-92 meters]
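To give a feel for the scale of the target problem, the sketch below converts the stated cell count into storage for a single scalar field and into resolved volume. The 8-byte value per cell and the helper names `field_bytes` and `volume_m3` are assumptions made for illustration; they are not figures or code from the talk.

```python
# Back-of-the-envelope sizing for the 9 x 10^12 cell target problem.
# Assumption (not from the slides): one 8-byte double per cell per field.

CELLS = 9 * 10**12          # cell count stated on the slide
BYTES_PER_VALUE = 8         # assumed: IEEE 754 double precision
MM3_PER_M3 = 1000**3        # cubic millimetres in one cubic metre


def field_bytes(cells, bytes_per_value=BYTES_PER_VALUE):
    """Memory needed to store one scalar field over all cells."""
    return cells * bytes_per_value


def volume_m3(cells):
    """Volume covered by `cells` cubes of 1 mm edge length, in cubic metres."""
    return cells / MM3_PER_M3


if __name__ == "__main__":
    print(f"one scalar field: {field_bytes(CELLS) / 10**12:.0f} TB")
    print(f"resolved volume:  {volume_m3(CELLS):.0f} m^3")
```

Every additional field stored per cell multiplies the first number, which is one way to see why the problem is treated as an exascale target.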
Radiation Overview
• Solving energy and radiative heat transfer equations simultaneously
• Need to compute the net radiative source term
• The net radiative source term consists of two terms, one of which requires integration of incoming intensity about a sphere
• RMCRT approximates the second term using Monte Carlo methods
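$$\frac{\partial T}{\partial t} = \text{Diffusion} - \text{Convection} + \text{Source/Sinks} \quad (\nabla \cdot q)$$
$$\int_{4\pi} I_{in}\, d\Omega \approx \frac{4\pi}{N} \sum_{ray=1}^{N} I_{ray}$$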
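The Monte Carlo approximation of the integral of incoming intensity about a sphere, (4π/N) times the sum of the intensities sampled along N rays, can be sketched as follows. This is a minimal illustration assuming uniformly distributed ray directions; `random_direction` and `sphere_integral` are hypothetical names, and the `intensity` callback stands in for whatever a traced ray would return.

```python
import math
import random


def random_direction(rng):
    """Draw a direction uniformly distributed over the unit sphere."""
    cos_theta = rng.uniform(-1.0, 1.0)        # uniform in cos(theta)
    phi = rng.uniform(0.0, 2.0 * math.pi)     # uniform azimuth
    sin_theta = math.sqrt(1.0 - cos_theta * cos_theta)
    return (sin_theta * math.cos(phi), sin_theta * math.sin(phi), cos_theta)


def sphere_integral(intensity, n_rays, seed=0):
    """Estimate the integral of intensity(direction) over 4*pi steradians
    as (4*pi / N) * sum of the intensity sampled along N random rays."""
    rng = random.Random(seed)
    total = sum(intensity(random_direction(rng)) for _ in range(n_rays))
    return 4.0 * math.pi * total / n_rays


if __name__ == "__main__":
    # Intensity proportional to cos^2(theta); the exact integral is 4*pi/3.
    estimate = sphere_integral(lambda d: d[2] ** 2, 100_000)
    print(estimate, 4.0 * math.pi / 3.0)
```

The statistical error of such an estimate shrinks with the square root of the ray count, so the number of rays per cell trades accuracy against cost.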
Reverse Monte Carlo Ray Tracing
• Randomly cast rays to compute the incoming intensity absorbed by a given cell
• Rays are traced away from the origin cell to compute incoming intensity backwards to the origin cell
• When marching rays, each cell entered adds its contribution to the incoming intensity absorbed by the origin cell
• The further a ray is traced, the smaller the contribution becomes
[Figure: back path of ray from S to emitter E, 9-cell structured mesh patch]
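The ray-marching idea above can be sketched in a few lines. This is not Uintah's implementation: it assumes a gray absorbing and emitting medium, a uniform path length per cell, and the common backward formulation in which each cell's emission is weighted by the drop in transmissivity across that cell. The function name `march_ray` and its arguments are hypothetical.

```python
import math


def march_ray(kappa, emitted, step):
    """Accumulate incoming intensity along one ray leaving the origin cell.

    kappa[i]   : absorption coefficient of the i-th cell the ray enters (1/m)
    emitted[i] : intensity emitted by that cell
    step       : path length travelled inside each cell (m), assumed uniform
    Returns the total incoming intensity and the per-cell contributions.
    """
    optical_depth = 0.0
    incoming = 0.0
    contributions = []
    for k, e in zip(kappa, emitted):
        transmitted_before = math.exp(-optical_depth)
        optical_depth += k * step
        transmitted_after = math.exp(-optical_depth)
        # Share of this cell's emission that survives the trip back to the origin.
        contribution = e * (transmitted_before - transmitted_after)
        contributions.append(contribution)
        incoming += contribution
    return incoming, contributions


if __name__ == "__main__":
    total, parts = march_ray([0.5] * 20, [2.0] * 20, 0.1)
    print(total)
    print(parts[0], parts[-1])   # first and last per-cell contributions
```

In a homogeneous medium the per-cell contributions decay with accumulated optical depth, which is the behaviour the last bullet describes.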
Parallel Reverse Monte Carlo Ray Tracing
• Lends itself to scalable parallelism
• Rays are mutually exclusive
• Multiple rays can be traced simultaneously at any given cell and/or timestep
• Backwards approach eliminates the need to track rays that never reach an origin cell
• Parallelize by splitting the computational domain across compute nodes
• Each node is responsible for tracing rays from within each origin cell that it owns across the entire domain
• Nodes must communicate and store geometry information and physics properties for the entire domain
[Figure: global mesh and local meshes on Nodes 1-4]
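The decomposition described above splits ray origins across nodes while every node still holds properties for the whole domain. A small sketch, assuming a simple block partition over a flat cell index (the actual partitioning in Uintah is patch-based, and the names `owned_range` and `per_node_cells` are hypothetical):

```python
def owned_range(n_cells, n_nodes, rank):
    """Half-open range of origin-cell indices owned by `rank` (block partition)."""
    base, extra = divmod(n_cells, n_nodes)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def per_node_cells(n_cells, n_nodes, rank):
    """Cells a node traces rays from versus cells it must store properties for."""
    start, stop = owned_range(n_cells, n_nodes, rank)
    return {
        "ray_origins": stop - start,      # shrinks as nodes are added
        "properties_stored": n_cells,     # whole domain, independent of node count
    }


if __name__ == "__main__":
    for rank in range(4):
        print(rank, owned_range(10, 4, rank), per_node_cells(10, 4, rank))
```

The work per node falls as nodes are added, but the stored and communicated data does not, which is the limitation the multi-level approach addresses.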
Multi-Level AMR RMCRT
• Global approach involves too much communication
• Use a multilevel representation of computational domain
• Reduces computational cost, memory usage, and MPI message volume
• Define Region of Interest (ROI), which is surrounded by successively coarser grids
• As rays travel away from ROI, the stride taken between cells becomes larger
[Figure: global mesh and local mesh; coarse mesh and fine mesh]
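A rough sketch of why the multilevel representation helps, assuming just two levels (a fine ROI plus one coarse copy of the whole domain), a cubic domain, and an axis-aligned ray. The refinement ratio, the sizes, and the function names are all illustrative assumptions rather than values from the talk.

```python
def cells_stored(global_edge, roi_edge, refinement_ratio):
    """Cells a node stores with a single fine level versus a two-level grid.

    global_edge      : fine cells per edge of the whole (cubic) domain
    roi_edge         : fine cells per edge of the region of interest
    refinement_ratio : fine cells per coarse cell along each edge
    """
    fine_everywhere = global_edge ** 3
    coarse_global = (global_edge // refinement_ratio) ** 3
    two_level = roi_edge ** 3 + coarse_global
    return fine_everywhere, two_level


def ray_steps(distance, roi_halfwidth, refinement_ratio):
    """Cells entered by an axis-aligned ray travelling `distance` fine-cell widths.

    Inside the ROI the ray advances one fine cell per step; outside it
    advances one coarse cell (refinement_ratio fine cells) per step.
    """
    fine_steps = min(distance, roi_halfwidth)
    remaining = distance - fine_steps
    coarse_steps = -(-remaining // refinement_ratio)   # ceiling division
    return fine_steps + coarse_steps


if __name__ == "__main__":
    print(cells_stored(512, 64, 4))
    print(ray_steps(256, 32, 4), "steps instead of", 256)
```

Both the stored data and the number of marching steps drop once cells far from the ROI are represented on the coarser level.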
Kokkos Performance Portability Library
• C++ library allowing developers to write portable, thread-scalable code optimized for CPU-, GPU-, and MIC-based architectures
• Kokkos provides abstractions to control:
• how/where kernels are executed,
• where data is allocated, and
• how data is mapped to memory
• While Kokkos enables performance portability, the user is responsible for writing performant kernels
• Source available at: https://github.com/kokkos/kokkos
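"How data is mapped to memory" refers to the layout of a multidimensional array. Kokkos offers, among others, LayoutRight (rightmost index contiguous, as in C) and LayoutLeft (leftmost index contiguous, as in Fortran). The sketch below is plain Python, not Kokkos code; it only shows how the same (i, j, k) index maps to different linear offsets under the two layouts, and the function names are hypothetical.

```python
def offset_layout_right(index, extents):
    """Linear offset when the rightmost index varies fastest (row-major)."""
    offset = 0
    for i, n in zip(index, extents):
        offset = offset * n + i
    return offset


def offset_layout_left(index, extents):
    """Linear offset when the leftmost index varies fastest (column-major)."""
    offset = 0
    for i, n in zip(reversed(index), reversed(extents)):
        offset = offset * n + i
    return offset


if __name__ == "__main__":
    extents = (2, 3, 4)
    for index in [(0, 0, 1), (1, 0, 0), (1, 2, 3)]:
        print(index,
              offset_layout_right(index, extents),
              offset_layout_left(index, extents))
```

Which layout performs better depends on which index neighbouring threads or vector lanes step through, which is why leaving the choice to the library lets one kernel suit several architectures.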
Uintah Programming Model for Stencil Timestep
Example Stencil Task: Unew = Uold + dt*F(Uold, Uhalo)
[Diagram: the task GETs Uold and Uhalo from the Old Data Warehouse, with Uhalo arriving as halo receives over the network via MPI; it PUTs Unew into the New Data Warehouse, from which halo sends go out over the network]
Use Kokkos abstraction layer that maps loops onto machine-specific data layouts and has appropriate memory abstractions
Kokkos Unmanaged Views
Memory structure: cache- and vectorization-friendly
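The update Unew = Uold + dt*F(Uold, Uhalo) can be illustrated with a toy task that reads from an old data warehouse and writes to a new one. This is a sketch under assumptions: the warehouses are plain dictionaries, the patch is one-dimensional, and F is taken to be a diffusion stencil. None of the names below are Uintah API.

```python
def stencil_task(old_dw, new_dw, dt, alpha=1.0, dx=1.0):
    """One timestep of Unew = Uold + dt * F(Uold, Uhalo) on a 1D patch.

    F is assumed to be the diffusion stencil
    alpha * (u[i-1] - 2*u[i] + u[i+1]) / dx**2.
    """
    u_old = old_dw["U"]                  # GET Uold
    left, right = old_dw["halo"]         # GET Uhalo (filled by halo receives)
    padded = [left] + list(u_old) + [right]

    u_new = []
    for i in range(1, len(padded) - 1):
        f = alpha * (padded[i - 1] - 2.0 * padded[i] + padded[i + 1]) / dx ** 2
        u_new.append(padded[i] + dt * f)

    new_dw["U"] = u_new                          # PUT Unew
    new_dw["halo_sends"] = (u_new[0], u_new[-1])  # boundary values for neighbours


if __name__ == "__main__":
    old_dw = {"U": [0.0, 1.0, 0.0], "halo": (0.0, 0.0)}
    new_dw = {}
    stencil_task(old_dw, new_dw, dt=0.1)
    print(new_dw)
```

The task body never touches MPI: it only gets and puts named data, which is the sense in which physics routines stay unaware of communications.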
Kokkos-Based RMCRT
• CPU-, GPU-, and MIC-based RMCRT efforts have resulted in several different implementations
• Introduced RMCRT:Kokkos to consolidate implementations
• Encapsulated “hot spots” within a Kokkos functor
• This new implementation:
• Required < 100 lines of new code
• Replaces a naïve cell iterator with a Kokkos parallel loop, enabling the selection of optimal iteration schemes via Kokkos
• Enables multi-threaded task execution via Kokkos back-ends
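The functor pattern separates the per-cell work from the loop that drives it, so the iteration scheme can be swapped without touching the physics. The sketch below mimics that separation in plain Python. It is not Kokkos code (Kokkos is a C++ library), and `CellFunctor`, `serial_for`, and `threaded_for` are hypothetical stand-ins; the per-cell work is a placeholder.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


class CellFunctor:
    """Per-cell work (the "hot spot") wrapped in a callable, independent of loop order."""

    def __init__(self, extents, source, result):
        self.nx, self.ny, self.nz = extents
        self.source = source
        self.result = result

    def __call__(self, i, j, k):
        idx = (i * self.ny + j) * self.nz + k
        self.result[idx] = 2.0 * self.source[idx]   # placeholder for real physics


def serial_for(extents, functor):
    """Naive cell iterator: visit every cell in order on one thread."""
    for i, j, k in product(*(range(n) for n in extents)):
        functor(i, j, k)


def threaded_for(extents, functor, workers=4):
    """Alternative scheme: hand out slabs of constant i to a thread pool."""
    nx, ny, nz = extents

    def run_slab(i):
        for j in range(ny):
            for k in range(nz):
                functor(i, j, k)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(run_slab, range(nx)))


if __name__ == "__main__":
    extents = (4, 3, 2)
    source = [float(n) for n in range(4 * 3 * 2)]
    out_serial, out_threaded = [0.0] * 24, [0.0] * 24
    serial_for(extents, CellFunctor(extents, source, out_serial))
    threaded_for(extents, CellFunctor(extents, source, out_threaded))
    print(out_serial == out_threaded)
```

Because each call writes only its own cell, the two drivers produce the same result, which is the property that lets a library pick the iteration scheme.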
Node-Level Parallelism Within Uintah
• For CPU and MIC architectures, Uintah features parallel execution of serial tasks
• 1 running task per thread
• Requires at least 1 patch per thread
• Breaks down as patches are subdivided to support more threads/cores
• Current Kokkos-based scheduler features serial execution of data parallel tasks
• 1 running task per MPI process
• Requires at least 1 patch per MPI process
• Eliminates the need to create a new patch to run with another thread
• Next step is a Kokkos-based scheduler with parallel execution of data parallel tasks
• We already do this for GPU but not for CPU and MIC
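The "at least 1 patch per thread" requirement can be made concrete with a little arithmetic. The sketch assumes a cubic block of cells per node split into cubic patches and a one-cell ghost layer, and it treats ghost-cell overhead as one plausible cost of subdividing; the talk does not state which costs dominate. The function names are hypothetical.

```python
def patches_per_edge(threads):
    """Smallest p with p**3 >= threads, i.e. a cubic split giving one patch per thread."""
    p = 1
    while p ** 3 < threads:
        p += 1
    return p


def ghost_overhead(edge, ghost=1):
    """Ghost cells per interior cell for a cubic patch with `edge` cells per side."""
    return ((edge + 2 * ghost) ** 3 - edge ** 3) / edge ** 3


if __name__ == "__main__":
    node_edge = 128                       # assumed cells per edge owned by one node
    for threads in (1, 64, 256):
        p = patches_per_edge(threads)
        edge = node_edge // p
        print(f"{threads:4d} threads -> {p**3:4d} patches of edge {edge}, "
              f"ghost overhead {ghost_overhead(edge):.3f}")
```

With data parallel tasks the node can keep a single large patch and still use every thread, so the patch count no longer has to grow with the thread count.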
[Slides 13-16: figures; only the panel labels "Medium" and "Large" survive]
Summary
• Data parallel tasks for CPU- and MIC-based architectures allow Uintah to support larger thread/core counts per node
• Data parallel tasks offer the potential to improve microarchitecture use (e.g. per-patch work can be computed cooperatively by multiple threads sharing a cache)
• Use of Kokkos allows data parallel tasks to be introduced in a portable manner
• Helps avoid code divergence and architecture-specific implementations
• Reduces the gap between development time and our ability to run on newly introduced machines
• Titan comparisons offer encouragement as we prepare for the Aurora Early Science Program
Questions?
Support provided by the Department of Energy, National Nuclear Security Administration, under Award Number(s) DE-NA0002375.
Computing time provided by the NSF Extreme Science and Engineering Discovery Environment (XSEDE) program.
Texas Advanced Computing Center resources used under Award Number(s) MCA08X004 - "Resilience and Scalability of the Uintah Software".
Thanks to TACC and those involved with the CCMSC and Uintah past and present.
Uintah Download: http://www.uintah.utah.edu

PEARC17: Improving Uintah's Scalability Through the Use of Portable Kokkos-Based Data Parallel Tasks