Two short talks on current topics
in Computer Science
Jan Prins
Department of Computer Science
University of North Carolina at Chapel Hill
1 Runtime Methods to Improve Energy Efficiency in
Supercomputing Applications
2 Computational Methods in Transcriptome Analysis
Runtime Methods to Improve Energy Efficiency in
HPC Applications
Sridutt Bhalachandra (1), Robert Fowler (2), Stephen Olivier (3),
Allan Porterfield (2), and Jan Prins (1)
(1) Department of Computer Science, University of North Carolina at Chapel Hill
(2) Renaissance Computing Institute, Chapel Hill
(3) Sandia National Laboratories
May 8, 2018
computing performance: 120 years of exponential growth!
Runtime methods to improve energy efficiency 2
What is driving performance growth?
Moore’s “law” - transistor density doubles every 18-24 months
Dennard scaling - total power remains the same and maximum
operating frequency increases
but look what has happened over the past two decades
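As a reminder of what Dennard scaling claims, the standard textbook argument can be written out as below. This is a sketch that is not taken from the slides; the symbols C (capacitance), V (supply voltage), f (clock frequency) and the scaling factor k are the usual textbook names and are assumptions here.

```latex
% Dynamic power of one transistor
P \approx C\,V^{2}\,f
% Shrink linear dimensions by 1/k:  C \to C/k,\quad V \to V/k,\quad f \to k f
P' \approx \frac{C}{k}\cdot\frac{V^{2}}{k^{2}}\cdot k f = \frac{P}{k^{2}}
% Transistor density grows by k^{2}, so power per unit area is unchanged:
k^{2}\cdot\frac{P}{k^{2}} = P
```

Once supply voltage can no longer be lowered with each generation, the second line no longer holds, and more transistors or higher frequency mean more power.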
The end of Dennard scaling and faster transistors
Consequences
additional transistors require additional area
power and heat increase commensurately
parallel computing is the only route to scaling performance
multicore processors
multiprocessor nodes
interconnection networks
Scalable parallel computing
Message Passing Interface (MPI) is used to coordinate computation
and communication among all processor cores
Current largest parallel computer
Source: http://www.nsccwx.cn/wxcyw
Sunway Taihulight
40,960 nodes
10,649,600 cores (256+4 per node) at 1.45GHz
20PB storage
$273 million
Top500 #1
93.01 PFLOPS @ 15.4MW
1 PetaFLOPS (PFLOPS) = 10^15 Floating Point Operations Per Second
1 MegaWatt (MW) can roughly power 1000 homes
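The performance and power figures quoted for this machine imply an energy efficiency that can be checked with simple arithmetic. A minimal sketch in Python; the function name `gflops_per_watt` is an illustrative choice and does not come from the talk.

```python
def gflops_per_watt(pflops: float, megawatts: float) -> float:
    """Energy efficiency in GFLOPS/W from performance (PFLOPS) and power (MW)."""
    gflops = pflops * 1e6    # 1 PFLOPS = 10^15 FLOPS = 10^6 GFLOPS
    watts = megawatts * 1e6  # 1 MW = 10^6 W
    return gflops / watts


# Figures quoted above: 93.01 PFLOPS at 15.4 MW
taihulight = gflops_per_watt(93.01, 15.4)
print(f"{taihulight:.2f} GFLOPS/W")
```

The result is roughly 6 GFLOPS/W, the unit used when comparing systems of very different sizes.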
Exascale (10^18 FLOPS) power requirements

System/Site      Performance (PFLOPS)   Power (MW)   Energy Efficiency (GFLOPS/W)
Exascale         1000                   20           50
Taihulight       93                     15           6
Tianhe 2         34                     18           2
Piz Daint        20                     2            9
TSUBAME 3.0      2                      0.14         14
kukai            0.46                   0.03         14
AIST AI Cloud    0.96                   0.08         13

5x - 10x improvement in energy efficiency required
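The size of the gap can be estimated by dividing the exascale target efficiency by each system's listed efficiency. A short sketch using only the rounded GFLOPS/W values from the table; the variable names are illustrative.

```python
# Exascale target from the table: 1000 PFLOPS within 20 MW
TARGET_GFLOPS_PER_W = 1000e6 / 20e6  # = 50 GFLOPS/W

# Rounded efficiencies (GFLOPS/W) as listed in the table
efficiency = {
    "Taihulight": 6,
    "Tianhe 2": 2,
    "Piz Daint": 9,
    "TSUBAME 3.0": 14,
    "kukai": 14,
    "AIST AI Cloud": 13,
}

# Factor by which each system would have to improve to reach the target
factors = {name: TARGET_GFLOPS_PER_W / eff for name, eff in efficiency.items()}

for name, factor in factors.items():
    print(f"{name:14s} needs about {factor:.1f}x")
```

With these rounded inputs the two largest recent systems, Taihulight and Piz Daint, come out at about 8.3x and 5.6x.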
breakdown of power use in a large parallel computer
Source: Use Case: Quantifying the Energy Efficiency of a Computing System - Hsu et al.
Opportunity to save energy
“Race to the end” in parallel regions
each processor core operates on data in its node
each processor maximizes speed while staying within thermal limit
all processors spinwait on lock at end of the region
last processor to arrive releases the lock
Computational workload imbalance
could be inherent in application
could be due to system heterogeneity
exacerbated by the race to the end
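The cost of imbalance can be made concrete with a toy model: every core waits at the end of the region until the slowest core arrives, so the wait of core i is the critical-path time minus its own compute time. A minimal sketch; the per-core times are made-up numbers and the function names are illustrative.

```python
def barrier_slack(compute_times):
    """Per-core wait at the end-of-region barrier, given per-core compute times (s)."""
    critical = max(compute_times)  # the slowest core defines the phase length
    return [critical - t for t in compute_times]


def wasted_fraction(compute_times):
    """Fraction of total core-time spent spin-waiting instead of computing."""
    slack = barrier_slack(compute_times)
    return sum(slack) / (len(compute_times) * max(compute_times))


# Hypothetical compute times of four cores in one phase
times = [1.0, 0.8, 0.6, 0.9]
print(barrier_slack(times))
print(f"{wasted_fraction(times):.1%} of core-time is spent waiting")
```

In this model the waiting cores have finished early at full speed and then spin, which is the slack a runtime could trade for lower power.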
Saving energy by mitigating workload imbalance
Challenges
each core is set to operate at a suitable frequency based on previous phase observation
the frequency can change at every phase
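One way to picture "a suitable frequency based on previous phase observation" is to slow each core just enough that it would still finish by the critical-path time. The sketch below is an illustration under two stated assumptions, not the authors' policy: compute time scales inversely with frequency, and the previous phase was measured at the maximum frequency. The frequency list is hypothetical.

```python
def next_frequency(compute_time, critical_time, available_mhz):
    """Lowest available frequency expected to finish by the critical-path time.

    Assumes compute time is inversely proportional to frequency and that
    compute_time was observed at the maximum frequency (both assumptions).
    """
    f_max = max(available_mhz)
    target = f_max * compute_time / critical_time
    candidates = [f for f in sorted(available_mhz) if f >= target]
    return candidates[0] if candidates else f_max


# Hypothetical set of selectable core frequencies (MHz)
freqs = [1200, 1500, 1800, 2100, 2300]

print(next_frequency(0.6, 1.0, freqs))  # a core with 40% slack
print(next_frequency(1.0, 1.0, freqs))  # the critical core stays at f_max
```

Because the decision is recomputed from the most recent phase, the selected frequency can differ from one phase to the next.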
Fine grained power control
Dynamic Duty Cycle Modulation (DDCM) – T-states
− Actual clock rate is not changed, DVFS and TurboBoost still operational
− Modulation range constant across architecture - 100% to 6.25%
− IA32_CLOCK_MODULATION MSR
DVFS - core specific (Haswell) – P-states
− Can slow only non-critical cores
− Operational range machine-dependent even for the same architecture
− acpi_cpufreq kernel module
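To illustrate the 100% to 6.25% modulation range, the sketch below encodes a duty-cycle level in sixteenths as a value for the IA32_CLOCK_MODULATION register. The bit layout used here (address 0x19A, bit 4 as enable, bits 3:0 as the level when extended modulation is supported) is my reading of Intel's documentation and should be treated as an assumption to verify against the Software Developer's Manual. The code only computes the value and does not touch hardware; `encode_duty_cycle` is a hypothetical helper.

```python
IA32_CLOCK_MODULATION = 0x19A  # MSR address (assumption: verify against the SDM)
ENABLE_BIT = 1 << 4            # on-demand clock modulation enable (assumed layout)
LEVELS = 16                    # 1/16 steps, i.e. 6.25% granularity


def encode_duty_cycle(level: int) -> int:
    """Encode a duty cycle of level/16 as an MSR value.

    level 16 means 100%, represented here by leaving modulation disabled.
    Assumes the extended 4-bit duty-cycle field is available on the CPU.
    """
    if not 1 <= level <= LEVELS:
        raise ValueError("level must be in 1..16")
    if level == LEVELS:
        return 0
    return ENABLE_BIT | level


for level in (16, 8, 1):
    print(f"{level / LEVELS:7.2%} -> {encode_duty_cycle(level):#04x}")
```

Writing the register normally requires privileged access to the per-core MSR interface, which is outside the scope of this sketch.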
Runtime control policy
Core-specific control
− match a core’s effective duty cycle to its workload
$$\text{Duty cycle} = \frac{\text{Time core in active state}}{\text{Total time (clock cycles)}}$$
∗ Change core active time using DDCM or clock cycles using DVFS
$$\text{Work} = \frac{\text{Compute time}}{\text{Compute time} + \text{Idle time}} \quad \text{(constant frequency)}$$
$$\text{Effective Work} = \frac{\text{Compute time}}{\text{Compute time} + \text{Idle time}} \times \frac{\text{Max frequency}}{\text{Current frequency}}$$
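The three quantities on this slide translate directly into code. A minimal sketch that follows the ratios as they appear on the slide, including the max/current frequency factor; the function and parameter names are illustrative and the sample numbers are made up.

```python
def duty_cycle(active_time: float, total_time: float) -> float:
    """Share of the total time during which the core is in the active state."""
    return active_time / total_time


def work(compute_time: float, idle_time: float) -> float:
    """Share of a phase spent computing, at constant frequency."""
    return compute_time / (compute_time + idle_time)


def effective_work(compute_time: float, idle_time: float,
                   max_freq: float, current_freq: float) -> float:
    """Work scaled by the frequency ratio, as transcribed from the slide."""
    return work(compute_time, idle_time) * max_freq / current_freq


# Hypothetical phase: 0.6 s computing, 0.4 s idle, core at its maximum frequency
print(work(0.6, 0.4))
print(effective_work(0.6, 0.4, max_freq=2300, current_freq=2300))
```

At the maximum frequency the two measures coincide; they differ only once DVFS has changed the core's frequency.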
Runtime policy
Assumes similar behavior across successive phases
Policy calculation local to core, no communication
Combined policy (Power_DVFS < Power_DDCM)
− Use DVFS policy until lowest frequency reached
− Thereafter, use DDCM policy
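The combined policy can be sketched as a two-stage choice: lower the frequency first, and only when the lowest frequency is not slow enough add duty-cycle modulation on top. The code below is an illustration under assumptions, not the ACR implementation: the target is expressed as a speed fraction of the maximum frequency, DDCM is assumed to offer 1/16 steps, and the frequency list is hypothetical.

```python
import math


def combined_setting(speed_fraction, available_mhz, ddcm_steps=16):
    """Return (frequency_mhz, duty_cycle) approximating a target speed fraction.

    speed_fraction is relative to the maximum frequency, in (0, 1].
    """
    freqs = sorted(available_mhz)
    f_min, f_max = freqs[0], freqs[-1]
    target_mhz = min(speed_fraction, 1.0) * f_max

    if target_mhz >= f_min:
        # DVFS alone is enough: lowest frequency at or above the target
        freq = next(f for f in freqs if f >= target_mhz)
        return freq, 1.0

    # Lowest frequency reached: stay there and modulate the duty cycle
    needed = target_mhz / f_min
    level = max(1, math.ceil(needed * ddcm_steps))
    return f_min, level / ddcm_steps


freqs = [1200, 1500, 1800, 2100, 2300]  # hypothetical frequencies (MHz)
print(combined_setting(0.70, freqs))
print(combined_setting(0.25, freqs))
```

Rounding up in both stages errs on the side of running slightly faster than the target, which avoids pushing a non-critical core onto the critical path.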
Adaptive Core-specific Runtime (ACR)
ACR = Runtime Policy + User Options
1 Can monitor performance degradation at the end of every phase
− Rudimentary method to detect phase change
2 Can impose a minimum phase length limit
− Useful in skipping start-up phases
3 Support for user annotations
− However, not used in current experimentation
∗ Runtime is transparent, eliminating the need for code changes to MPI applications
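The first two user options can be pictured as guards evaluated at the end of each phase. The sketch below is purely illustrative: the function name, the three outcomes, and the threshold parameters are hypothetical and are not taken from ACR.

```python
def end_of_phase_action(phase_time, reference_time,
                        min_phase_len=0.01, max_degradation=0.05):
    """Decide what to do after a phase (times in seconds).

    "skip"  : phase shorter than the minimum length, e.g. during start-up
    "reset" : phase slowed down by more than the allowed degradation,
              taken here as a crude sign that the application changed phase
    "adapt" : otherwise, keep applying the runtime policy
    """
    if phase_time < min_phase_len:
        return "skip"
    if phase_time > reference_time * (1 + max_degradation):
        return "reset"
    return "adapt"


print(end_of_phase_action(0.002, 1.0))  # too short to act on
print(end_of_phase_action(1.20, 1.0))   # 20% slower than the reference
print(end_of_phase_action(1.02, 1.0))   # within tolerance
```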
Experimental Setup
Mini-apps & Applications
Unstructured grids – MiniFE, HPCCG, AMG
Structured grids – MiniGhost
Mesh Refinement – MiniAMR
Hydrodynamics – CloverLeaf
− mini-apps representative of key production HPC applications
Dislocation Dynamics – ParaDis
System
32 Haswell node partition (Sandia Shepard) = 1024 cores
− Dell M420: two 16-core Xeon E5-2698v3 at 2.3GHz, 128GB
− RHEL6.8, Slurm 2.3.3-1.18chaos and Linux 3.17.8 kernel
− MPICH 3.2
Results are the average of 12 runs taken at stable temperatures (to promote reproducibility)
ParaDis results
ParaDis critical path on 24 nodes (768 cores) - Default
[Figure: critical-path compute time (s) and average frequency (MHz) per phase, phases 0 to 1200]
Bimodal distribution of critical path times < 1.0s and > 1.0s
Successive phases are similar, with only occasional jumps
Average critical path frequency (Default) = 2507.4MHz
ParaDis critical path on 24 nodes (768 cores) - DVFS
[Figure: critical-path compute time (s) and average frequency (MHz) per phase, phases 0 to 1200]
Average critical path frequency (DVFS) = 2467.3MHz
ParaDis critical path on 24 nodes (768 cores) - DDCM
[Figure: critical-path compute time (s) and average frequency (MHz) per phase, phases 0 to 1200]
Very low frequency on non-critical cores for prolonged periods reduces variation, and increases available thermal headroom for critical cores
Average critical path frequency (DDCM) = 2784.8MHz
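The three average critical-path frequencies reported on the Default, DVFS and DDCM slides can be compared directly. A small sketch of the arithmetic, using only those three values; the dictionary keys name the slide each value comes from.

```python
# Average critical-path frequency (MHz) reported on each ParaDis slide
critical_path_mhz = {"Default": 2507.4, "DVFS": 2467.3, "DDCM": 2784.8}

base = critical_path_mhz["Default"]
# Percentage change relative to the Default run
change = {name: (mhz / base - 1) * 100 for name, mhz in critical_path_mhz.items()}

for name, pct in change.items():
    print(f"{name:8s} {pct:+.2f}%")
```

The DDCM run shows the critical path clocked about 11% higher than Default, consistent with the thermal-headroom remark, while the DVFS run is about 1.6% lower.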
Mitigating workload imbalance
average results across all experiments

Policy     %Power reduced   %Energy saved   %Time increase   Temp decrease (C)
DDCM       19.3             15.1            5.3              3.2
DVFS       20.5             20.2            0.5              3.3
Combined   24.9             22.6            2.9              4.2

ACR demonstrates that dynamic control of power at runtime is possible
At Exascale, runtimes such as ACR will allow
− more work to be run at one time by using less power
− individual applications to run faster by allowing a higher thermal headroom on critical cores
Energy optimization can also be performance optimization
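The three percentage columns are linked by energy = power × time, so the energy saving implied by a power reduction and a time increase can be cross-checked. A sketch of that check on the table's averages; since these are averages over several experiments, the implied value is only expected to land close to the reported one, not to match it exactly.

```python
# (power reduced %, energy saved %, time increase %) per policy, from the table
results = {
    "DDCM": (19.3, 15.1, 5.3),
    "DVFS": (20.5, 20.2, 0.5),
    "Combined": (24.9, 22.6, 2.9),
}


def implied_energy_saving(power_reduced_pct, time_increase_pct):
    """Energy saving (%) implied by energy = power * time."""
    ratio = (1 - power_reduced_pct / 100) * (1 + time_increase_pct / 100)
    return (1 - ratio) * 100


for policy, (power, energy, time) in results.items():
    implied = implied_energy_saving(power, time)
    print(f"{policy:9s} reported {energy:.1f}%  implied {implied:.1f}%")
```

The check also shows why DDCM saves less energy than DVFS despite a similar power reduction: its larger time increase gives back part of the saving.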
Saving energy in memory-bound applications
Many HPC applications are memory-bound
Memory operations are seldom visible to OS/runtime
− Power wasted in CPU while waiting on memory
Approach
− sample table of request occupancy in memory subsystem
− use DVFS to slow non-critical cores
Published work
1 Improving Energy Efficiency in Memory-constrained Applications Using Core-specific Power Control (E2SC 2017)
Acknowledgements
Ph.D. research of Sridutt Bhalachandra (Argonne National Labs Exascale Group)
Published work
1 An Adaptive Core-specific Runtime for Energy Efficiency (IPDPS 2017)
2 Using Dynamic Duty Cycle Modulation to improve energy efficiency in High Performance Computing (HPPAC 2015)
Runtime methods to improve energy efficiency 24

Weitere ähnliche Inhalte

Was ist angesagt?

Was ist angesagt? (19)

Design of Compensator for Roll Control of Towing Air-Crafts
Design of Compensator for Roll Control of Towing Air-CraftsDesign of Compensator for Roll Control of Towing Air-Crafts
Design of Compensator for Roll Control of Towing Air-Crafts
 
2 2 mitchell lee_am and pwat spectral correction_pvpmc5
2 2 mitchell lee_am and pwat spectral correction_pvpmc52 2 mitchell lee_am and pwat spectral correction_pvpmc5
2 2 mitchell lee_am and pwat spectral correction_pvpmc5
 
Jh3516001603
Jh3516001603Jh3516001603
Jh3516001603
 
Energy Efficient Change Management in a Cloud Computing Environment
Energy Efficient Change Management in a Cloud Computing EnvironmentEnergy Efficient Change Management in a Cloud Computing Environment
Energy Efficient Change Management in a Cloud Computing Environment
 
Comperative Performance Analysis of PMSM Drive Using MPSO and ACO Techniques
Comperative Performance Analysis of PMSM Drive Using MPSO and ACO TechniquesComperative Performance Analysis of PMSM Drive Using MPSO and ACO Techniques
Comperative Performance Analysis of PMSM Drive Using MPSO and ACO Techniques
 
ค่า Tx Power Mode ใน Ubiquiti และ Mikrotik (RF Tx Power Mode Settings)
ค่า Tx Power Mode ใน Ubiquiti และ Mikrotik (RF Tx Power Mode Settings)ค่า Tx Power Mode ใน Ubiquiti และ Mikrotik (RF Tx Power Mode Settings)
ค่า Tx Power Mode ใน Ubiquiti และ Mikrotik (RF Tx Power Mode Settings)
 
A Brief Survey of Current Power Limiting Strategies
A Brief Survey of Current Power Limiting StrategiesA Brief Survey of Current Power Limiting Strategies
A Brief Survey of Current Power Limiting Strategies
 
Load Shifting Technique on 24Hour Basis for a Smart-Grid to Reduce Cost and P...
Load Shifting Technique on 24Hour Basis for a Smart-Grid to Reduce Cost and P...Load Shifting Technique on 24Hour Basis for a Smart-Grid to Reduce Cost and P...
Load Shifting Technique on 24Hour Basis for a Smart-Grid to Reduce Cost and P...
 
HYPPO - NECSTTechTalk 23/04/2020
HYPPO - NECSTTechTalk 23/04/2020HYPPO - NECSTTechTalk 23/04/2020
HYPPO - NECSTTechTalk 23/04/2020
 
IEEE International Conference Presentation
IEEE International Conference PresentationIEEE International Conference Presentation
IEEE International Conference Presentation
 
Optimizing energy yield in multi-MW power converters for wind” by Mikel zabal...
Optimizing energy yield in multi-MW power converters for wind” by Mikel zabal...Optimizing energy yield in multi-MW power converters for wind” by Mikel zabal...
Optimizing energy yield in multi-MW power converters for wind” by Mikel zabal...
 
Novel Adaptive Controller for PMSG Driven Wind Turbine To Improve Power Syste...
Novel Adaptive Controller for PMSG Driven Wind Turbine To Improve Power Syste...Novel Adaptive Controller for PMSG Driven Wind Turbine To Improve Power Syste...
Novel Adaptive Controller for PMSG Driven Wind Turbine To Improve Power Syste...
 
Unit commitment in power system
Unit commitment in power systemUnit commitment in power system
Unit commitment in power system
 
Test different neural networks models for forecasting of wind,solar and energ...
Test different neural networks models for forecasting of wind,solar and energ...Test different neural networks models for forecasting of wind,solar and energ...
Test different neural networks models for forecasting of wind,solar and energ...
 
Run-time power management in cloud and containerized environments
Run-time power management in cloud and containerized environmentsRun-time power management in cloud and containerized environments
Run-time power management in cloud and containerized environments
 
Maximum Power Point Tracking of a DFIG Wind Turbine System 786328456
Maximum Power Point Tracking of a DFIG Wind Turbine System 786328456Maximum Power Point Tracking of a DFIG Wind Turbine System 786328456
Maximum Power Point Tracking of a DFIG Wind Turbine System 786328456
 
Economic Load Dispatch Optimization of Six Interconnected Generating Units Us...
Economic Load Dispatch Optimization of Six Interconnected Generating Units Us...Economic Load Dispatch Optimization of Six Interconnected Generating Units Us...
Economic Load Dispatch Optimization of Six Interconnected Generating Units Us...
 
Unit commitment
Unit commitmentUnit commitment
Unit commitment
 
Enhancement of Power System Performance by Optimal Placement of Distributed G...
Enhancement of Power System Performance by Optimal Placement of Distributed G...Enhancement of Power System Performance by Optimal Placement of Distributed G...
Enhancement of Power System Performance by Optimal Placement of Distributed G...
 

Ähnlich wie Runtime Methods to Improve Energy Efficiency in HPC Applications

Energy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic TuningEnergy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic Tuning
inside-BigData.com
 

Ähnlich wie Runtime Methods to Improve Energy Efficiency in HPC Applications (20)

Energy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic TuningEnergy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic Tuning
 
RT15 Berkeley | Optimized Power Flow Control in Microgrids - Sandia Laboratory
RT15 Berkeley | Optimized Power Flow Control in Microgrids - Sandia LaboratoryRT15 Berkeley | Optimized Power Flow Control in Microgrids - Sandia Laboratory
RT15 Berkeley | Optimized Power Flow Control in Microgrids - Sandia Laboratory
 
Energy Efficiency in Large Scale Systems
Energy Efficiency in Large Scale SystemsEnergy Efficiency in Large Scale Systems
Energy Efficiency in Large Scale Systems
 
Quantifying Energy Consumption for Practical Fork-Join Parallelism on an Embe...
Quantifying Energy Consumption for Practical Fork-Join Parallelism on an Embe...Quantifying Energy Consumption for Practical Fork-Join Parallelism on an Embe...
Quantifying Energy Consumption for Practical Fork-Join Parallelism on an Embe...
 
A Study on Task Scheduling in Could Data Centers for Energy Efficacy
A Study on Task Scheduling in Could Data Centers for Energy Efficacy A Study on Task Scheduling in Could Data Centers for Energy Efficacy
A Study on Task Scheduling in Could Data Centers for Energy Efficacy
 
03 broderick qsts_sand2016-4697 c
03 broderick qsts_sand2016-4697 c03 broderick qsts_sand2016-4697 c
03 broderick qsts_sand2016-4697 c
 
Hitec Brochure QPS English_LR
Hitec Brochure QPS English_LRHitec Brochure QPS English_LR
Hitec Brochure QPS English_LR
 
LEGaTO: Low-Energy Heterogeneous Computing Use of AI in the project
LEGaTO: Low-Energy Heterogeneous Computing Use of AI in the projectLEGaTO: Low-Energy Heterogeneous Computing Use of AI in the project
LEGaTO: Low-Energy Heterogeneous Computing Use of AI in the project
 
Reliability Analysis for an Energy-Aware RAID System
Reliability Analysis for an Energy-Aware RAID SystemReliability Analysis for an Energy-Aware RAID System
Reliability Analysis for an Energy-Aware RAID System
 
Large-Scale Optimization Strategies for Typical HPC Workloads
Large-Scale Optimization Strategies for Typical HPC WorkloadsLarge-Scale Optimization Strategies for Typical HPC Workloads
Large-Scale Optimization Strategies for Typical HPC Workloads
 
PES Wind Magazine - New-generation DFIG power converters for 6-8 MW wind turb...
PES Wind Magazine - New-generation DFIG power converters for 6-8 MW wind turb...PES Wind Magazine - New-generation DFIG power converters for 6-8 MW wind turb...
PES Wind Magazine - New-generation DFIG power converters for 6-8 MW wind turb...
 
Energy Efficiency in Data Centers
Energy Efficiency in Data CentersEnergy Efficiency in Data Centers
Energy Efficiency in Data Centers
 
NOC POWER MANAGEMENT CONTROLLER DESIGN
NOC POWER MANAGEMENT CONTROLLER DESIGN  NOC POWER MANAGEMENT CONTROLLER DESIGN
NOC POWER MANAGEMENT CONTROLLER DESIGN
 
Massive Energy Saving Systems and Strategies at Equipment Level in the Highes...
Massive Energy Saving Systems and Strategies at Equipment Level in the Highes...Massive Energy Saving Systems and Strategies at Equipment Level in the Highes...
Massive Energy Saving Systems and Strategies at Equipment Level in the Highes...
 
Keep Calm and React with Foresight: Strategies for Low-Latency and Energy-Eff...
Keep Calm and React with Foresight: Strategies for Low-Latency and Energy-Eff...Keep Calm and React with Foresight: Strategies for Low-Latency and Energy-Eff...
Keep Calm and React with Foresight: Strategies for Low-Latency and Energy-Eff...
 
Indian Wind Power - New-generation DFIG Power Converters for 6-8 MW Wind Turb...
Indian Wind Power - New-generation DFIG Power Converters for 6-8 MW Wind Turb...Indian Wind Power - New-generation DFIG Power Converters for 6-8 MW Wind Turb...
Indian Wind Power - New-generation DFIG Power Converters for 6-8 MW Wind Turb...
 
IRJET- Power Scheduling Algorithm based Power Optimization of Mpsocs
IRJET-  	  Power Scheduling Algorithm based Power Optimization of MpsocsIRJET-  	  Power Scheduling Algorithm based Power Optimization of Mpsocs
IRJET- Power Scheduling Algorithm based Power Optimization of Mpsocs
 
XANT SPS PowerTower presentation microgrid inno conf sep 2017 20170905
XANT SPS PowerTower presentation microgrid inno conf sep 2017 20170905XANT SPS PowerTower presentation microgrid inno conf sep 2017 20170905
XANT SPS PowerTower presentation microgrid inno conf sep 2017 20170905
 
White paper - Ingeteam's new-generation power converters for 6-8 MW wind tur...
White paper -  Ingeteam's new-generation power converters for 6-8 MW wind tur...White paper -  Ingeteam's new-generation power converters for 6-8 MW wind tur...
White paper - Ingeteam's new-generation power converters for 6-8 MW wind tur...
 
Metaheuristics-based Optimal Reactive Power Management in Offshore Wind Farms...
Metaheuristics-based Optimal Reactive Power Management in Offshore Wind Farms...Metaheuristics-based Optimal Reactive Power Management in Offshore Wind Farms...
Metaheuristics-based Optimal Reactive Power Management in Offshore Wind Farms...
 

Mehr von Facultad de Informática UCM

Mehr von Facultad de Informática UCM (20)

¿Por qué debemos seguir trabajando en álgebra lineal?
¿Por qué debemos seguir trabajando en álgebra lineal?¿Por qué debemos seguir trabajando en álgebra lineal?
¿Por qué debemos seguir trabajando en álgebra lineal?
 
TECNOPOLÍTICA Y ACTIVISMO DE DATOS: EL MAPEO COMO FORMA DE RESILIENCIA ANTE L...
TECNOPOLÍTICA Y ACTIVISMO DE DATOS: EL MAPEO COMO FORMA DE RESILIENCIA ANTE L...TECNOPOLÍTICA Y ACTIVISMO DE DATOS: EL MAPEO COMO FORMA DE RESILIENCIA ANTE L...
TECNOPOLÍTICA Y ACTIVISMO DE DATOS: EL MAPEO COMO FORMA DE RESILIENCIA ANTE L...
 
DRAC: Designing RISC-V-based Accelerators for next generation Computers
DRAC: Designing RISC-V-based Accelerators for next generation ComputersDRAC: Designing RISC-V-based Accelerators for next generation Computers
DRAC: Designing RISC-V-based Accelerators for next generation Computers
 
uElectronics ongoing activities at ESA
uElectronics ongoing activities at ESAuElectronics ongoing activities at ESA
uElectronics ongoing activities at ESA
 
Tendencias en el diseño de procesadores con arquitectura Arm
Tendencias en el diseño de procesadores con arquitectura ArmTendencias en el diseño de procesadores con arquitectura Arm
Tendencias en el diseño de procesadores con arquitectura Arm
 
Formalizing Mathematics in Lean
Formalizing Mathematics in LeanFormalizing Mathematics in Lean
Formalizing Mathematics in Lean
 
Introduction to Quantum Computing and Quantum Service Oriented Computing
Introduction to Quantum Computing and Quantum Service Oriented ComputingIntroduction to Quantum Computing and Quantum Service Oriented Computing
Introduction to Quantum Computing and Quantum Service Oriented Computing
 
Computer Design Concepts for Machine Learning
Computer Design Concepts for Machine LearningComputer Design Concepts for Machine Learning
Computer Design Concepts for Machine Learning
 
Inteligencia Artificial en la atención sanitaria del futuro
Inteligencia Artificial en la atención sanitaria del futuroInteligencia Artificial en la atención sanitaria del futuro
Inteligencia Artificial en la atención sanitaria del futuro
 
Design Automation Approaches for Real-Time Edge Computing for Science Applic...
 Design Automation Approaches for Real-Time Edge Computing for Science Applic... Design Automation Approaches for Real-Time Edge Computing for Science Applic...
Design Automation Approaches for Real-Time Edge Computing for Science Applic...
 
Estrategias de navegación para robótica móvil de campo: caso de estudio proye...
Estrategias de navegación para robótica móvil de campo: caso de estudio proye...Estrategias de navegación para robótica móvil de campo: caso de estudio proye...
Estrategias de navegación para robótica móvil de campo: caso de estudio proye...
 
Fault-tolerance Quantum computation and Quantum Error Correction
Fault-tolerance Quantum computation and Quantum Error CorrectionFault-tolerance Quantum computation and Quantum Error Correction
Fault-tolerance Quantum computation and Quantum Error Correction
 
Cómo construir un chatbot inteligente sin morir en el intento
Cómo construir un chatbot inteligente sin morir en el intentoCómo construir un chatbot inteligente sin morir en el intento
Cómo construir un chatbot inteligente sin morir en el intento
 
Automatic generation of hardware memory architectures for HPC
Automatic generation of hardware memory architectures for HPCAutomatic generation of hardware memory architectures for HPC
Automatic generation of hardware memory architectures for HPC
 
Type and proof structures for concurrency
Type and proof structures for concurrencyType and proof structures for concurrency
Type and proof structures for concurrency
 
Hardware/software security contracts: Principled foundations for building sec...
Hardware/software security contracts: Principled foundations for building sec...Hardware/software security contracts: Principled foundations for building sec...
Hardware/software security contracts: Principled foundations for building sec...
 
Jose carlossancho slidesLa seguridad en el desarrollo de software implementad...
Jose carlossancho slidesLa seguridad en el desarrollo de software implementad...Jose carlossancho slidesLa seguridad en el desarrollo de software implementad...
Jose carlossancho slidesLa seguridad en el desarrollo de software implementad...
 
Do you trust your artificial intelligence system?
Do you trust your artificial intelligence system?Do you trust your artificial intelligence system?
Do you trust your artificial intelligence system?
 
Redes neuronales y reinforcement learning. Aplicación en energía eólica.
Redes neuronales y reinforcement learning. Aplicación en energía eólica.Redes neuronales y reinforcement learning. Aplicación en energía eólica.
Redes neuronales y reinforcement learning. Aplicación en energía eólica.
 
Challenges and Opportunities for AI and Data analytics in Offshore wind
Challenges and Opportunities for AI and Data analytics in Offshore windChallenges and Opportunities for AI and Data analytics in Offshore wind
Challenges and Opportunities for AI and Data analytics in Offshore wind
 

Kürzlich hochgeladen

Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
dharasingh5698
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Christo Ananth
 
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Christo Ananth
 
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
notes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.pptnotes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.ppt
MsecMca
 
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
dollysharma2066
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
dharasingh5698
 

Kürzlich hochgeladen (20)

Intze Overhead Water Tank Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank  Design by Working Stress - IS Method.pdfIntze Overhead Water Tank  Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank Design by Working Stress - IS Method.pdf
 
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
 
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)
 
University management System project report..pdf
University management System project report..pdfUniversity management System project report..pdf
University management System project report..pdf
 

Runtime Methods to Improve Energy Efficiency in HPC Applications

  • 1. Two short talks on current topics in Computer Science. Jan Prins, Department of Computer Science, University of North Carolina at Chapel Hill. (1) Runtime Methods to Improve Energy Efficiency in Supercomputing Applications; (2) Computational Methods in Transcriptome Analysis.
  • 2. Runtime Methods to Improve Energy Efficiency in HPC Applications. Sridutt Bhalachandra (1), Robert Fowler (2), Stephen Olivier (3), Allan Porterfield (2), and Jan Prins (1). (1) Department of Computer Science, University of North Carolina at Chapel Hill; (2) Renaissance Computing Institute, Chapel Hill; (3) Sandia National Laboratories. May 8, 2018.
  • 3. computing performance: 120 years of exponential growth! Runtime methods to improve energy efficiency 2
  • 6. What is driving performance growth? Moore’s “law”: transistor density doubles every 18-24 months. Dennard scaling: total power remains the same and maximum operating frequency increases. But look what has happened over the past two decades.
  • 7. The end of Dennard scaling and faster transistors. Consequences: additional transistors require additional area; power and heat increase commensurately; parallel computing is the only route to scaling performance (multicore processors, multiprocessor nodes, interconnection networks).
  • 8. Scalable parallel computing. The Message Passing Interface (MPI) is used to coordinate computation and communication among all processor cores.
  • 9. Current largest parallel computer (source: http://www.nsccwx.cn/wxcyw). Sunway Taihulight: 40,960 nodes; 10,649,600 cores (256+4 per node) at 1.45 GHz; 20 PB storage; $273 million; Top500 #1; 93.01 PFLOPS @ 15.4 MW. 1 PetaFLOPS (PFLOPS) = 10^15 floating point operations per second. 1 megawatt (MW) can roughly power 1000 homes.
  • 13. Exascale (10^18 FLOPS) power requirements
      System/Site     Performance (PFLOPS)   Power (MW)   Energy efficiency (GFLOPS/W)
      Exascale        1000                   20           50
      Taihulight      93                     15           6
      Tianhe 2        34                     18           2
      Piz Daint       20                     2            9
      TSUBAME 3.0     2                      0.14         14
      kukai           0.46                   0.03         14
      AIST AI Cloud   0.96                   0.08         13
      5x - 10x improvement in energy efficiency required.
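The efficiency column is performance divided by power. Since 1 PFLOPS = 10^6 GFLOPS and 1 MW = 10^6 W, the value in GFLOPS/W is numerically PFLOPS divided by MW. A minimal Python sketch of that arithmetic, using only figures quoted on the slides (the function and variable names are illustrative, not from the talk):

```python
def gflops_per_watt(pflops: float, megawatts: float) -> float:
    """Energy efficiency in GFLOPS/W from performance (PFLOPS) and power (MW)."""
    gflops = pflops * 1e6     # 1 PFLOPS = 10^6 GFLOPS
    watts = megawatts * 1e6   # 1 MW = 10^6 W
    return gflops / watts


# Figures from the slides: Taihulight at 93.01 PFLOPS and 15.4 MW,
# and the exascale target of 1000 PFLOPS within 20 MW.
taihulight = gflops_per_watt(93.01, 15.4)
exascale_target = gflops_per_watt(1000, 20)

print(f"Taihulight:      {taihulight:.2f} GFLOPS/W")
print(f"Exascale target: {exascale_target:.0f} GFLOPS/W")
print(f"Improvement needed over Taihulight: {exascale_target / taihulight:.1f}x")
```

Under these figures the gap to Taihulight falls inside the 5x - 10x range stated on the slide.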
  • 14. Breakdown of power use in a large parallel computer. Source: “Use Case: Quantifying the Energy Efficiency of a Computing System”, Hsu et al.
  • 15. Opportunity to save energy. “Race to the end” in parallel regions: each processor core operates on data in its node; each processor maximizes speed while staying within its thermal limit; all processors spin-wait on a lock at the end of the region; the last processor to arrive releases the lock.
  • 16. Computational workload imbalance: could be inherent in the application; could be due to system heterogeneity; exacerbated by the race to the end.
  • 18. Saving energy by mitigating workload imbalance. Challenges: each core is set to operate at a suitable frequency based on observation of the previous phase; the frequency can change at every phase.
  • 20. Fine-grained power control. Dynamic Duty Cycle Modulation (DDCM), T-states: actual clock rate is not changed, so DVFS (dynamic voltage and frequency scaling) and TurboBoost remain operational; modulation range is constant across architectures, 100% to 6.25%; IA32_CLOCK_MODULATION MSR. Core-specific DVFS (Haswell), P-states: can slow only non-critical cores; operational range is machine-dependent even for the same architecture; acpi_cpufreq kernel module.
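To make the 100% to 6.25% range concrete, the sketch below encodes a requested duty cycle into a value for the IA32_CLOCK_MODULATION MSR. It assumes the bit layout described in the Intel Software Developer's Manual for processors with extended clock modulation (MSR address 0x19A, enable flag in bit 4, a 4-bit duty level in bits 3:0 giving steps of 1/16 = 6.25%); that layout should be checked against the manual for the target processor. The helper names are hypothetical, and the code only computes values; it does not write to any register.

```python
import math

MSR_IA32_CLOCK_MODULATION = 0x19A  # MSR address (assumed from the Intel SDM)
ENABLE_BIT = 1 << 4                # on-demand clock modulation enable flag
LEVELS = 16                        # 4-bit duty field: steps of 1/16 = 6.25%


def encode_duty_cycle(fraction: float) -> int:
    """Return the MSR value for the smallest duty cycle that is >= fraction.

    A duty cycle of 100% is expressed by leaving modulation disabled (value 0).
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("duty cycle must be in (0, 1]")
    level = max(1, math.ceil(fraction * LEVELS))
    if level >= LEVELS:
        return 0  # modulation off: the core runs at full duty cycle
    return ENABLE_BIT | level


def decode_duty_cycle(value: int) -> float:
    """Return the duty cycle fraction represented by an MSR value."""
    if not value & ENABLE_BIT:
        return 1.0
    return (value & 0xF) / LEVELS


for requested in (1.0, 0.5, 0.3, 0.0625):
    value = encode_duty_cycle(requested)
    print(f"requested {requested:.4f} -> MSR value {value:#04x} "
          f"-> duty cycle {decode_duty_cycle(value):.4f}")
```

Rounding up rather than down is a design choice in this sketch: it never throttles a core below the requested duty cycle.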
  • 21. Runtime control policy. Core-specific control: match a core’s effective duty cycle to its workload.
      $\text{Duty cycle} = \dfrac{\text{Time core in active state}}{\text{Total time (clock cycles)}}$
      (*) Change the core active time using DDCM, or the clock cycles using DVFS.
      $\text{Work} = \dfrac{\text{Compute time}}{\text{Compute time} + \text{Idle time}}$ (constant frequency)
      $\text{Effective Work} = \dfrac{\text{Compute time}}{\text{Compute time} + \text{Idle time}} \times \dfrac{\text{Max frequency}}{\text{Current frequency}}$
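As an illustration of the Work definition at constant frequency, the sketch below computes it per core for a single phase. It assumes that every core waits at the end of the phase for the slowest one, so that a core's idle time is the longest compute time minus its own, and it ignores communication time. These are simplifying assumptions for the example, and the function name and sample times are hypothetical.

```python
def work_fractions(compute_times):
    """Work = compute / (compute + idle) for each core in one phase.

    Assumes the phase ends when the slowest core finishes, so that
    compute + idle equals the longest compute time for every core.
    """
    if not compute_times or min(compute_times) < 0:
        raise ValueError("need at least one non-negative compute time")
    phase_time = max(compute_times)
    if phase_time <= 0:
        raise ValueError("at least one core must do some work")
    return [t / phase_time for t in compute_times]


# Hypothetical per-core compute times (seconds) for one phase.
times = [2.0, 1.5, 1.0, 0.5]
fractions = work_fractions(times)
for core, (t, w) in enumerate(zip(times, fractions)):
    print(f"core {core}: compute {t:.1f} s, Work = {w:.2f}")
```

The core on the critical path has Work = 1; cores with smaller values are the candidates for slowing down.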
  • 23. Runtime policy. Assumes similar behavior across successive phases. Policy calculation is local to each core, with no communication. Combined policy (Power_DVFS < Power_DDCM): use the DVFS policy until the lowest frequency is reached; thereafter, use the DDCM policy.
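The combined policy can be sketched as a per-core decision made from the Work fraction observed in the previous phase. This is an illustrative reading of the slide, not the authors' implementation: it assumes compute time scales inversely with frequency times duty cycle, a 16-level duty cycle, and a hypothetical list of available frequencies. The function name and the frequency table are made up for the example.

```python
import math


def combined_policy(work, freqs_mhz, ddcm_levels=16):
    """Choose (frequency_mhz, duty_cycle) for the next phase.

    work: fraction of the previous phase the core spent computing, in (0, 1].
    freqs_mhz: available core frequencies (hypothetical P-state table).
    """
    if not 0.0 < work <= 1.0:
        raise ValueError("work must be in (0, 1]")
    f_min, f_max = min(freqs_mhz), max(freqs_mhz)
    target = work * f_max  # effective speed needed to finish just in time

    if target >= f_min:
        # DVFS alone suffices: slowest frequency that still meets the target.
        freq = min(f for f in freqs_mhz if f >= target)
        return freq, 1.0

    # Lowest frequency reached: stay there and add duty cycle modulation.
    level = max(1, math.ceil(target / f_min * ddcm_levels))
    return f_min, level / ddcm_levels


# Hypothetical frequency table: 1200 MHz to 2300 MHz in 100 MHz steps.
freqs = list(range(1200, 2301, 100))
for w in (1.0, 0.75, 0.5, 0.25):
    print(f"Work = {w:.2f} -> {combined_policy(w, freqs)}")
```

Because the decision uses only the core's own measurement, it matches the slide's point that the policy calculation is local and needs no communication.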
  • 24. Adaptive Core-specific Runtime (ACR). ACR = Runtime Policy + User Options. (1) Can monitor performance degradation at the end of every phase: a rudimentary method to detect phase change. (2) Can impose a minimum phase length limit: useful for skipping start-up phases. (3) Support for user annotations: however, not used in the current experimentation. (*) The runtime is transparent, eliminating the need for code changes to MPI applications.
  • 25. Experimental Setup
      Mini-apps (representative of key production HPC applications): unstructured grids (MiniFE, HPCCG, AMG); structured grids (MiniGhost); mesh refinement (MiniAMR); hydrodynamics (CloverLeaf).
      Application: dislocation dynamics (ParaDis).
      System: 32 Haswell node partition (Sandia Shepard) = 1024 cores; Dell M420: two 16-core Xeon E5-2698v3, 128 GB, at 2.3 GHz; RHEL 6.8, Slurm 2.3.3-1.18chaos and Linux 3.17.8 kernel; MPICH 3.2.
      Results are the average of 12 runs taken at stable temperatures (to promote reproducibility).
  • 26. ParaDis results
  • 29. ParaDis critical path on 24 nodes (768 cores), Default. [Figure: compute time (s) and average frequency (MHz) versus phase.] Bimodal distribution of critical path times: < 1.0 s and > 1.0 s. Successive phases are similar, with only occasional jumps. Average critical path frequency (Default) = 2507.4 MHz.
  • 30. ParaDis critical path on 24 nodes (768 cores), DVFS. [Figure: compute time (s) and average frequency (MHz) versus phase.] Average critical path frequency (DVFS) = 2467.3 MHz.
  • 31. ParaDis critical path on 24 nodes (768 cores), DDCM. [Figure: compute time (s) and average frequency (MHz) versus phase.] Very low frequency on non-critical cores for prolonged periods reduces variation and increases the available thermal headroom for critical cores. Average critical path frequency (DDCM) = 2784.8 MHz.
  • 34. Mitigating workload imbalance: average results across all experiments
      Policy     % Power reduced   % Energy saved   % Time increase   Temp decrease (°C)
      DDCM       19.3              15.1             5.3               3.2
      DVFS       20.5              20.2             0.5               3.3
      Combined   24.9              22.6             2.9               4.2
      ACR demonstrates that dynamic control of power at runtime is possible.
      At Exascale, runtimes such as ACR will allow: more work to be run at one time by using less power; individual applications to run faster by allowing a higher thermal headroom on critical cores.
      Energy optimization can also be performance optimization.
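The three percentage columns are related through energy = average power × time, which gives a quick consistency check on the reported averages. The sketch below applies that relation to the slide's figures. Because the slide reports averages across experiments, the product of averaged ratios is only expected to approximate the averaged energy saving; the function name is illustrative.

```python
# (power reduced %, energy saved %, time increase %) as reported on the slide.
results = {
    "DDCM": (19.3, 15.1, 5.3),
    "DVFS": (20.5, 20.2, 0.5),
    "Combined": (24.9, 22.6, 2.9),
}


def implied_energy_saving(power_reduced_pct, time_increase_pct):
    """Energy saving (%) implied by energy = power * time."""
    relative_power = 1 - power_reduced_pct / 100
    relative_time = 1 + time_increase_pct / 100
    return (1 - relative_power * relative_time) * 100


for policy, (power, energy, time) in results.items():
    implied = implied_energy_saving(power, time)
    print(f"{policy:8s} reported {energy:4.1f}%  implied {implied:4.1f}%")
```

The implied savings land within a few tenths of a percentage point of the reported ones, which also shows why DVFS, with almost no slowdown, converts nearly all of its power reduction into energy saved.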
  • 35. Saving energy in memory-bound applications. Many HPC applications are memory-bound. Memory operations are seldom visible to the OS/runtime: power is wasted in the CPU while waiting on memory. Approach: sample the table of request occupancy in the memory subsystem; use DVFS to slow non-critical cores. Published work: (1) “Improving Energy Efficiency in Memory-constrained Applications Using Core-specific Power Control” (E2SC 2017).
  • 36. Acknowledgements. Ph.D. research of Sridutt Bhalachandra (Argonne National Labs Exascale Group). Published work: (1) “An Adaptive Core-specific Runtime for Energy Efficiency” (IPDPS 2017); (2) “Using Dynamic Duty Cycle Modulation to improve energy efficiency in High Performance Computing” (HPPAC 2015).