Experts in numerical algorithms
and HPC services
Accelerators: the good, the bad and the ugly!
Dr Ian Reid
Ian.Reid@nag.co.uk
2
 NAG Introduction
 Accelerators – NAG experience
 NAG on Intel Xeon Phi
 Summary
Agenda
3
 Founded 1970
 Not-for-profit organisation
 Surpluses fund on-going R&D
 Mathematical and Statistical Expertise
 Libraries of components
 Consulting
 HPC Services
 Computational Science and Engineering (CSE) support
 Procurement advice, market watch, benchmarking
NAG Background
4
 Escalator?:
Want more performance? Buy the next processor!
 To get performance/efficiency we have to go
(massively) parallel
 Disruption causing serious look at ‘other’
technologies (and algorithms!)
 Even CPUs with tens of cores
 Hybrid, shared-memory and distributed-memory
parallelism
 Painful whichever way we turn!
Where has my Escalator gone?
5
 Loose definition: hardware on which to run your
software better than on your (general purpose) CPU
 Generally NOT an easy win
 Significant learning curve and effort
 Offload disadvantages…
 The good: put some effort in; get a great result!
 The bad: put effort in, get an OK result, but learn
lessons which can be re-used (often good!)
 The ugly: put significant effort in, get a poor result
and don’t learn anything substantive
Accelerators
6
 The Intel Xeon Phi is a co-processor attached to a
host system via the PCI express bus
 Highly parallel architecture
 Compiler support for OpenMP parallelism
 It has a distinct memory system from the host
 Several use cases to consider:
 Automatic Offloading
 Explicit Offloading
 Native Applications
Intel Xeon Phi
7
 Relatively easy to take existing OpenMP based code
and port to Phi
 Tuning for Phi takes some learning and expertise
 … but feedback into Xeon code is often very strong
 NAG Library for Intel Xeon Phi supports all models
 Offload (supports automatic and explicit) and Native libs
 Windows version for Intel Xeon Phi now in beta
NAG Experience with Intel Xeon Phi
8
 Offload OpenMP regions to Phi when problem sizes
are above some threshold
 Estimating problem size can be complex
 Required data is transferred to/from the host
before/after executing the OpenMP region
 Data transfer takes time and eats into the benefit of running
the OpenMP region on the Phi
 Transparent to the user of the Library
 Just recompile code containing NAG Library function calls
to benefit.
Automatic Offload
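The threshold idea on this slide can be sketched as a simple break-even test: offloading only helps when the time saved on the Phi exceeds the time spent moving data over the PCI Express bus. The sketch below is illustrative only. The function names, the latency term and all numbers are assumptions for the example, not NAG Library internals or measured values.

```python
def offload_time(phi_seconds, bytes_moved, bandwidth_bytes_per_s, latency_seconds=0.0):
    """Estimated wall time when a region runs on the coprocessor.

    Hypothetical model: transfer cost = fixed latency + bytes / bandwidth.
    """
    transfer = latency_seconds + bytes_moved / bandwidth_bytes_per_s
    return phi_seconds + transfer


def should_offload(host_seconds, phi_seconds, bytes_moved, bandwidth_bytes_per_s,
                   latency_seconds=0.0):
    """Return True when the modelled offload time beats staying on the host."""
    return offload_time(phi_seconds, bytes_moved, bandwidth_bytes_per_s,
                        latency_seconds) < host_seconds


# Made-up numbers: a large problem where compute dominates the transfer...
large = should_offload(host_seconds=10.0, phi_seconds=4.0,
                       bytes_moved=8e9, bandwidth_bytes_per_s=6e9)

# ...and a small one where the transfer outweighs the compute saving.
small = should_offload(host_seconds=0.01, phi_seconds=0.004,
                       bytes_moved=8e7, bandwidth_bytes_per_s=6e9)

print("offload large problem:", large)
print("offload small problem:", small)
```

In practice the host and Phi times are not known in advance, which is why estimating the threshold from the problem size can be complex.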
9
 All NAG functions can be explicitly offloaded by user
 user code modified to include relevant offload statements
 allows control of which functions offloaded
 Data transfers to Phi can be dissociated from function
offloading, allowing data to remain on the Phi
 user responsible for data movement
 reduces penalty of offloading data by allowing its use by
multiple offloaded function calls before returning to host
 Effort required by the user to re-code application
Explicit Offload
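The benefit of leaving data on the Phi across several offloaded calls can be sketched with a small cost model. This is a rough illustration under stated assumptions: every call costs the same compute time, a host/Phi round trip has a fixed cost, and the function name and numbers are hypothetical rather than taken from the NAG Library.

```python
def total_time(calls, compute_per_call, transfer_per_round_trip, keep_resident):
    """Modelled wall time for a sequence of offloaded calls on the same data.

    keep_resident=True  : data is sent once and fetched once (one round trip).
    keep_resident=False : every call pays its own round trip.
    """
    round_trips = 1 if keep_resident else calls
    return calls * compute_per_call + round_trips * transfer_per_round_trip


# Made-up numbers: 5 offloaded calls, 2.0 s compute each, 1.5 s per round trip.
resident = total_time(5, 2.0, 1.5, keep_resident=True)
per_call = total_time(5, 2.0, 1.5, keep_resident=False)

print(f"data kept on Phi : {resident:.1f} s")
print(f"transfer per call: {per_call:.1f} s")
```

The more calls that reuse the resident data, the smaller the share of the transfer cost in the total, at the price of the user managing data movement explicitly.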
10
 Users may choose to port their entire application
 user code modified to include relevant offload statements
 allows complete control of which functions are offloaded
 Data transfers to Phi can be dissociated from function
offloading, allowing data to remain on the Phi
 user responsible for data movement
 reduces penalty of offloading data by allowing its use by
multiple offloaded function calls before returning to host
 Effort required by the user to re-code application
Native Applications
11
 Sandy Bridge CPUs (typically using 32 threads)
 Knights Corner Phi processor (typically using 240
threads)
Performance Examples and Lessons
12
[Chart: Hierarchical Cluster Analysis (g03ec). Time (s) against problem size (n), n from 0 to 30,000. Series: 32 threads original, Phi offload original, Phi offload opt, 32 threads opt.]
 n=30k; m=3k
 Xeon 32t: 1,412s
 Phi 240t*: 1,259s
 Xeon 32t*: 1,073s
 For this size problem
best to stay on CPU
but take the 25%!
13
[Chart: Distance Matrix (g03ea). Time (s) against problem size (n), n from 0 to 30,000. Series: 32 threads original, Phi offload original, Phi offload opt, 32 threads opt.]
 n=30k; m=3k
 Xeon 32t: 192s
 Phi 240t*: 40.6s
 Xeon 32t*: 75.7s
 Phi gain ~2x (~5x
over original)
14
[Chart: Uniform RNG - Mersenne Twister (g05sa). Time (s) against size of problem (n, log scale), n from 100 to 100,000,000. Series: 8 threads original, Native Phi original, Native Phi opt, 8 threads opt.]
 n=500m
 Xeon 8t: 0.25s
 Phi 240t*: 0.08s
 Xeon 8t*: 0.22s
 Phi gain ~3x
15
[Chart: Maximum Likelihood Estimates (g03ca). Time (s) against problem size (weighted), from 0 to 4.5. Series: 32 threads original, Phi offload original, Phi offload opt, 32 threads opt.]
 n=2500; m=2500;
nfac=30; nvar=200
 Xeon 32t: 256s
 Phi 240t*: 53.6s
 Xeon 32t*: 54.7s
 Phi gain 4x, but also
Xeon speed-up (green
line under red)
16
[Chart: Solve real symmetric positive definite simultaneous linear equations using iterative refinement (f04af). Time (s) against problem size (n), n from 0 to 7,000. Series: 32 threads original, Phi offload original, Phi offload opt, 32 threads opt.]
 n=6,000; nrhs=1,000
 Xeon 32t: 171s
 Phi 240t*: 66s
 Xeon 32t*: 86s
 Phi gain ~1.3x (~3x
over original)
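The gains quoted on the five benchmark slides can be rechecked from the reported timings. The sketch below does that arithmetic; it assumes the starred entries (Phi 240t*, Xeon 32t* or 8t*) are the optimised versions shown as "opt" in the chart legends, and the dictionary layout and key names are ours, not part of the original slides.

```python
# (Xeon original, Phi optimised, Xeon optimised) times in seconds, as reported
# on the slides. Treating the starred values as the optimised runs is an assumption.
timings = {
    "g03ec": (1412.0, 1259.0, 1073.0),
    "g03ea": (192.0, 40.6, 75.7),
    "g05sa": (0.25, 0.08, 0.22),
    "g03ca": (256.0, 53.6, 54.7),
    "f04af": (171.0, 66.0, 86.0),
}

speedups = {}
for routine, (xeon_orig, phi_opt, xeon_opt) in timings.items():
    speedups[routine] = {
        # Phi against the tuned CPU code: the fair like-for-like comparison.
        "phi_vs_xeon_opt": xeon_opt / phi_opt,
        # Phi against the untuned CPU code: the "over original" figure.
        "phi_vs_xeon_orig": xeon_orig / phi_opt,
        # What the tuning work gave back on the CPU alone.
        "xeon_opt_vs_orig": xeon_orig / xeon_opt,
    }

for routine, s in speedups.items():
    print(f"{routine}: Phi vs Xeon opt {s['phi_vs_xeon_opt']:.2f}x, "
          f"Phi vs Xeon original {s['phi_vs_xeon_orig']:.2f}x, "
          f"Xeon opt vs original {s['xeon_opt_vs_orig']:.2f}x")
```

A ratio below 1 in the first column, as for g03ec, means the optimised CPU run was faster than the Phi run for that problem size, which matches the advice on that slide to stay on the CPU.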
17
 Parallelism is a real issue we all face
 Exciting for some. Challenging for others!
 Accelerators are interesting and can offer spectacular wins
 Intel Phi claiming less spectacular performance gains
 Less effort than on other Accelerators
 … and often repays on CPU as well!
 Acid test is always solving your (complete) problem!
 NAG can help you try out this technology
 NAG Library for Phi
 NAG expertise
Summary
18
Thank You
Questions?

More Related Content

What's hot

Inference accelerators
Inference acceleratorsInference accelerators
Inference acceleratorsDarshanG13
 
Understand and Harness the Capabilities of Intel® Xeon Phi™ Processors
Understand and Harness the Capabilities of Intel® Xeon Phi™ ProcessorsUnderstand and Harness the Capabilities of Intel® Xeon Phi™ Processors
Understand and Harness the Capabilities of Intel® Xeon Phi™ ProcessorsIntel® Software
 
Accelerate Your Python* Code through Profiling, Tuning, and Compilation Part ...
Accelerate Your Python* Code through Profiling, Tuning, and Compilation Part ...Accelerate Your Python* Code through Profiling, Tuning, and Compilation Part ...
Accelerate Your Python* Code through Profiling, Tuning, and Compilation Part ...Intel® Software
 
Swift for TensorFlow - Tanmay Bakshi - Advanced Spark and TensorFlow Meetup -...
Swift for TensorFlow - Tanmay Bakshi - Advanced Spark and TensorFlow Meetup -...Swift for TensorFlow - Tanmay Bakshi - Advanced Spark and TensorFlow Meetup -...
Swift for TensorFlow - Tanmay Bakshi - Advanced Spark and TensorFlow Meetup -...Chris Fregly
 
HTCC poster for CERN Openlab opendays 2015
HTCC poster for CERN Openlab opendays 2015HTCC poster for CERN Openlab opendays 2015
HTCC poster for CERN Openlab opendays 2015Karel Ha
 
Using Derivation-Free Optimization in the Hadoop Cluster with Terasort
Using Derivation-Free Optimization in the Hadoop Cluster  with TerasortUsing Derivation-Free Optimization in the Hadoop Cluster  with Terasort
Using Derivation-Free Optimization in the Hadoop Cluster with TerasortAnhanguera Educacional S/A
 

What's hot (7)

Inference accelerators
Inference acceleratorsInference accelerators
Inference accelerators
 
Open power ddl and lms
Open power ddl and lmsOpen power ddl and lms
Open power ddl and lms
 
Understand and Harness the Capabilities of Intel® Xeon Phi™ Processors
Understand and Harness the Capabilities of Intel® Xeon Phi™ ProcessorsUnderstand and Harness the Capabilities of Intel® Xeon Phi™ Processors
Understand and Harness the Capabilities of Intel® Xeon Phi™ Processors
 
Accelerate Your Python* Code through Profiling, Tuning, and Compilation Part ...
Accelerate Your Python* Code through Profiling, Tuning, and Compilation Part ...Accelerate Your Python* Code through Profiling, Tuning, and Compilation Part ...
Accelerate Your Python* Code through Profiling, Tuning, and Compilation Part ...
 
Swift for TensorFlow - Tanmay Bakshi - Advanced Spark and TensorFlow Meetup -...
Swift for TensorFlow - Tanmay Bakshi - Advanced Spark and TensorFlow Meetup -...Swift for TensorFlow - Tanmay Bakshi - Advanced Spark and TensorFlow Meetup -...
Swift for TensorFlow - Tanmay Bakshi - Advanced Spark and TensorFlow Meetup -...
 
HTCC poster for CERN Openlab opendays 2015
HTCC poster for CERN Openlab opendays 2015HTCC poster for CERN Openlab opendays 2015
HTCC poster for CERN Openlab opendays 2015
 
Using Derivation-Free Optimization in the Hadoop Cluster with Terasort
Using Derivation-Free Optimization in the Hadoop Cluster  with TerasortUsing Derivation-Free Optimization in the Hadoop Cluster  with Terasort
Using Derivation-Free Optimization in the Hadoop Cluster with Terasort
 

Viewers also liked

Cloud 2015: Connecting the Next Billion - Intel Keynote @ HP Discover 2011
Cloud 2015: Connecting the Next Billion - Intel Keynote @ HP Discover 2011Cloud 2015: Connecting the Next Billion - Intel Keynote @ HP Discover 2011
Cloud 2015: Connecting the Next Billion - Intel Keynote @ HP Discover 2011Intel IT Center
 
Accelerating the Pace of Discovery Technical Computing at Intel
Accelerating the Pace of Discovery Technical Computing at IntelAccelerating the Pace of Discovery Technical Computing at Intel
Accelerating the Pace of Discovery Technical Computing at IntelIntel IT Center
 
Enter the Age of Hadoop SuperComputing
Enter the Age of Hadoop SuperComputingEnter the Age of Hadoop SuperComputing
Enter the Age of Hadoop SuperComputingIntel IT Center
 
New Memory Solutions for Enterprise Computing
New Memory Solutions for Enterprise ComputingNew Memory Solutions for Enterprise Computing
New Memory Solutions for Enterprise ComputingIntel IT Center
 
Driving Industrial InnovationOn the Path to Exascale
Driving Industrial InnovationOn the Path to ExascaleDriving Industrial InnovationOn the Path to Exascale
Driving Industrial InnovationOn the Path to ExascaleIntel IT Center
 
High Performance Computing: The Essential tool for a Knowledge Economy
High Performance Computing: The Essential tool for a Knowledge EconomyHigh Performance Computing: The Essential tool for a Knowledge Economy
High Performance Computing: The Essential tool for a Knowledge EconomyIntel IT Center
 
Migrating Mission-Critical Workloads to Intel Architecture
Migrating Mission-Critical Workloads to Intel ArchitectureMigrating Mission-Critical Workloads to Intel Architecture
Migrating Mission-Critical Workloads to Intel ArchitectureIntel IT Center
 
Are you ready to work in the Parallel Universe? Rise to the challenge at SC13
Are you ready to work in the Parallel Universe? Rise to the challenge at SC13Are you ready to work in the Parallel Universe? Rise to the challenge at SC13
Are you ready to work in the Parallel Universe? Rise to the challenge at SC13Intel IT Center
 
Hardware and Software Co-optimization to Make Sure Oracle Fusion Middleware R...
Hardware and Software Co-optimization to Make Sure Oracle Fusion Middleware R...Hardware and Software Co-optimization to Make Sure Oracle Fusion Middleware R...
Hardware and Software Co-optimization to Make Sure Oracle Fusion Middleware R...Intel IT Center
 
Transforming Business with Advanced Analytics
Transforming Business with Advanced AnalyticsTransforming Business with Advanced Analytics
Transforming Business with Advanced AnalyticsIntel IT Center
 
Identity Protection for the Digital Age
Identity Protection for the Digital AgeIdentity Protection for the Digital Age
Identity Protection for the Digital AgeIntel IT Center
 

Viewers also liked (12)

Cloud 2015: Connecting the Next Billion - Intel Keynote @ HP Discover 2011
Cloud 2015: Connecting the Next Billion - Intel Keynote @ HP Discover 2011Cloud 2015: Connecting the Next Billion - Intel Keynote @ HP Discover 2011
Cloud 2015: Connecting the Next Billion - Intel Keynote @ HP Discover 2011
 
AIC Intel Based HPC
AIC Intel Based HPCAIC Intel Based HPC
AIC Intel Based HPC
 
Accelerating the Pace of Discovery Technical Computing at Intel
Accelerating the Pace of Discovery Technical Computing at IntelAccelerating the Pace of Discovery Technical Computing at Intel
Accelerating the Pace of Discovery Technical Computing at Intel
 
Enter the Age of Hadoop SuperComputing
Enter the Age of Hadoop SuperComputingEnter the Age of Hadoop SuperComputing
Enter the Age of Hadoop SuperComputing
 
New Memory Solutions for Enterprise Computing
New Memory Solutions for Enterprise ComputingNew Memory Solutions for Enterprise Computing
New Memory Solutions for Enterprise Computing
 
Driving Industrial InnovationOn the Path to Exascale
Driving Industrial InnovationOn the Path to ExascaleDriving Industrial InnovationOn the Path to Exascale
Driving Industrial InnovationOn the Path to Exascale
 
High Performance Computing: The Essential tool for a Knowledge Economy
High Performance Computing: The Essential tool for a Knowledge EconomyHigh Performance Computing: The Essential tool for a Knowledge Economy
High Performance Computing: The Essential tool for a Knowledge Economy
 
Migrating Mission-Critical Workloads to Intel Architecture
Migrating Mission-Critical Workloads to Intel ArchitectureMigrating Mission-Critical Workloads to Intel Architecture
Migrating Mission-Critical Workloads to Intel Architecture
 
Are you ready to work in the Parallel Universe? Rise to the challenge at SC13
Are you ready to work in the Parallel Universe? Rise to the challenge at SC13Are you ready to work in the Parallel Universe? Rise to the challenge at SC13
Are you ready to work in the Parallel Universe? Rise to the challenge at SC13
 
Hardware and Software Co-optimization to Make Sure Oracle Fusion Middleware R...
Hardware and Software Co-optimization to Make Sure Oracle Fusion Middleware R...Hardware and Software Co-optimization to Make Sure Oracle Fusion Middleware R...
Hardware and Software Co-optimization to Make Sure Oracle Fusion Middleware R...
 
Transforming Business with Advanced Analytics
Transforming Business with Advanced AnalyticsTransforming Business with Advanced Analytics
Transforming Business with Advanced Analytics
 
Identity Protection for the Digital Age
Identity Protection for the Digital AgeIdentity Protection for the Digital Age
Identity Protection for the Digital Age
 

Similar to Accelerators: the good, the bad, and the ugly

High Performance Erlang - Pitfalls and Solutions
High Performance Erlang - Pitfalls and SolutionsHigh Performance Erlang - Pitfalls and Solutions
High Performance Erlang - Pitfalls and SolutionsYinghai Lu
 
SCFE 2020 OpenCAPI presentation as part of OpenPWOER Tutorial
SCFE 2020 OpenCAPI presentation as part of OpenPWOER TutorialSCFE 2020 OpenCAPI presentation as part of OpenPWOER Tutorial
SCFE 2020 OpenCAPI presentation as part of OpenPWOER TutorialGanesan Narayanasamy
 
Threading Successes 01 Intro
Threading Successes 01   IntroThreading Successes 01   Intro
Threading Successes 01 Introguest40fc7cd
 
The deep learning tour - Q1 2017
The deep learning tour - Q1 2017 The deep learning tour - Q1 2017
The deep learning tour - Q1 2017 Eran Shlomo
 
Scaling Streaming - Concepts, Research, Goals
Scaling Streaming - Concepts, Research, GoalsScaling Streaming - Concepts, Research, Goals
Scaling Streaming - Concepts, Research, Goalskamaelian
 
Ceph Day Beijing - Optimizing Ceph Performance by Leveraging Intel Optane and...
Ceph Day Beijing - Optimizing Ceph Performance by Leveraging Intel Optane and...Ceph Day Beijing - Optimizing Ceph Performance by Leveraging Intel Optane and...
Ceph Day Beijing - Optimizing Ceph Performance by Leveraging Intel Optane and...Danielle Womboldt
 
Ceph Day Beijing - Optimizing Ceph performance by leveraging Intel Optane and...
Ceph Day Beijing - Optimizing Ceph performance by leveraging Intel Optane and...Ceph Day Beijing - Optimizing Ceph performance by leveraging Intel Optane and...
Ceph Day Beijing - Optimizing Ceph performance by leveraging Intel Optane and...Ceph Community
 
Computação Paralela: Benefícios e Desafios - Intel Software Conference 2013
Computação Paralela: Benefícios e Desafios - Intel Software Conference 2013Computação Paralela: Benefícios e Desafios - Intel Software Conference 2013
Computação Paralela: Benefícios e Desafios - Intel Software Conference 2013Intel Software Brasil
 
3.INTEL.Optane_on_ceph_v2.pdf
3.INTEL.Optane_on_ceph_v2.pdf3.INTEL.Optane_on_ceph_v2.pdf
3.INTEL.Optane_on_ceph_v2.pdfhellobank1
 
Early Successes Debugging with TotalView on the Intel Xeon Phi Coprocessor
Early Successes Debugging with TotalView on the Intel Xeon Phi CoprocessorEarly Successes Debugging with TotalView on the Intel Xeon Phi Coprocessor
Early Successes Debugging with TotalView on the Intel Xeon Phi CoprocessorIntel IT Center
 
HPC DAY 2017 | NVIDIA Volta Architecture. Performance. Efficiency. Availability
HPC DAY 2017 | NVIDIA Volta Architecture. Performance. Efficiency. AvailabilityHPC DAY 2017 | NVIDIA Volta Architecture. Performance. Efficiency. Availability
HPC DAY 2017 | NVIDIA Volta Architecture. Performance. Efficiency. AvailabilityHPC DAY
 
Advertising Fraud Detection at Scale at T-Mobile
Advertising Fraud Detection at Scale at T-MobileAdvertising Fraud Detection at Scale at T-Mobile
Advertising Fraud Detection at Scale at T-MobileDatabricks
 
Intel new processors
Intel new processorsIntel new processors
Intel new processorszaid_b
 
Performance and Power Profiling on Intel Android Devices
Performance and Power Profiling on Intel Android DevicesPerformance and Power Profiling on Intel Android Devices
Performance and Power Profiling on Intel Android DevicesIntel® Software
 
DATE 2020: Design, Automation and Test in Europe Conference
DATE 2020: Design, Automation and Test in Europe ConferenceDATE 2020: Design, Automation and Test in Europe Conference
DATE 2020: Design, Automation and Test in Europe ConferenceLEGATO project
 
What’s eating python performance
What’s eating python performanceWhat’s eating python performance
What’s eating python performancePiotr Przymus
 
Trends in Systems and How to Get Efficient Performance
Trends in Systems and How to Get Efficient PerformanceTrends in Systems and How to Get Efficient Performance
Trends in Systems and How to Get Efficient Performanceinside-BigData.com
 
Large-Scale Optimization Strategies for Typical HPC Workloads
Large-Scale Optimization Strategies for Typical HPC WorkloadsLarge-Scale Optimization Strategies for Typical HPC Workloads
Large-Scale Optimization Strategies for Typical HPC Workloadsinside-BigData.com
 

Similar to Accelerators: the good, the bad, and the ugly (20)

Introduction to Blackfin BF532 DSP
Introduction to Blackfin BF532 DSPIntroduction to Blackfin BF532 DSP
Introduction to Blackfin BF532 DSP
 
High Performance Erlang - Pitfalls and Solutions
High Performance Erlang - Pitfalls and SolutionsHigh Performance Erlang - Pitfalls and Solutions
High Performance Erlang - Pitfalls and Solutions
 
SCFE 2020 OpenCAPI presentation as part of OpenPWOER Tutorial
SCFE 2020 OpenCAPI presentation as part of OpenPWOER TutorialSCFE 2020 OpenCAPI presentation as part of OpenPWOER Tutorial
SCFE 2020 OpenCAPI presentation as part of OpenPWOER Tutorial
 
Threading Successes 01 Intro
Threading Successes 01   IntroThreading Successes 01   Intro
Threading Successes 01 Intro
 
The deep learning tour - Q1 2017
The deep learning tour - Q1 2017 The deep learning tour - Q1 2017
The deep learning tour - Q1 2017
 
Scaling Streaming - Concepts, Research, Goals
Scaling Streaming - Concepts, Research, GoalsScaling Streaming - Concepts, Research, Goals
Scaling Streaming - Concepts, Research, Goals
 
Ceph Day Beijing - Optimizing Ceph Performance by Leveraging Intel Optane and...
Ceph Day Beijing - Optimizing Ceph Performance by Leveraging Intel Optane and...Ceph Day Beijing - Optimizing Ceph Performance by Leveraging Intel Optane and...
Ceph Day Beijing - Optimizing Ceph Performance by Leveraging Intel Optane and...
 
Ceph Day Beijing - Optimizing Ceph performance by leveraging Intel Optane and...
Ceph Day Beijing - Optimizing Ceph performance by leveraging Intel Optane and...Ceph Day Beijing - Optimizing Ceph performance by leveraging Intel Optane and...
Ceph Day Beijing - Optimizing Ceph performance by leveraging Intel Optane and...
 
Computação Paralela: Benefícios e Desafios - Intel Software Conference 2013
Computação Paralela: Benefícios e Desafios - Intel Software Conference 2013Computação Paralela: Benefícios e Desafios - Intel Software Conference 2013
Computação Paralela: Benefícios e Desafios - Intel Software Conference 2013
 
3.INTEL.Optane_on_ceph_v2.pdf
3.INTEL.Optane_on_ceph_v2.pdf3.INTEL.Optane_on_ceph_v2.pdf
3.INTEL.Optane_on_ceph_v2.pdf
 
Early Successes Debugging with TotalView on the Intel Xeon Phi Coprocessor
Early Successes Debugging with TotalView on the Intel Xeon Phi CoprocessorEarly Successes Debugging with TotalView on the Intel Xeon Phi Coprocessor
Early Successes Debugging with TotalView on the Intel Xeon Phi Coprocessor
 
HPC DAY 2017 | NVIDIA Volta Architecture. Performance. Efficiency. Availability
HPC DAY 2017 | NVIDIA Volta Architecture. Performance. Efficiency. AvailabilityHPC DAY 2017 | NVIDIA Volta Architecture. Performance. Efficiency. Availability
HPC DAY 2017 | NVIDIA Volta Architecture. Performance. Efficiency. Availability
 
Advertising Fraud Detection at Scale at T-Mobile
Advertising Fraud Detection at Scale at T-MobileAdvertising Fraud Detection at Scale at T-Mobile
Advertising Fraud Detection at Scale at T-Mobile
 
Intel new processors
Intel new processorsIntel new processors
Intel new processors
 
Performance and Power Profiling on Intel Android Devices
Performance and Power Profiling on Intel Android DevicesPerformance and Power Profiling on Intel Android Devices
Performance and Power Profiling on Intel Android Devices
 
DATE 2020: Design, Automation and Test in Europe Conference
DATE 2020: Design, Automation and Test in Europe ConferenceDATE 2020: Design, Automation and Test in Europe Conference
DATE 2020: Design, Automation and Test in Europe Conference
 
What’s eating python performance
What’s eating python performanceWhat’s eating python performance
What’s eating python performance
 
Webinaron muticoreprocessors
Webinaron muticoreprocessorsWebinaron muticoreprocessors
Webinaron muticoreprocessors
 
Trends in Systems and How to Get Efficient Performance
Trends in Systems and How to Get Efficient PerformanceTrends in Systems and How to Get Efficient Performance
Trends in Systems and How to Get Efficient Performance
 
Large-Scale Optimization Strategies for Typical HPC Workloads
Large-Scale Optimization Strategies for Typical HPC WorkloadsLarge-Scale Optimization Strategies for Typical HPC Workloads
Large-Scale Optimization Strategies for Typical HPC Workloads
 

More from Intel IT Center

AI Crash Course- Supercomputing
AI Crash Course- SupercomputingAI Crash Course- Supercomputing
AI Crash Course- SupercomputingIntel IT Center
 
FPGA Inference - DellEMC SURFsara
FPGA Inference - DellEMC SURFsaraFPGA Inference - DellEMC SURFsara
FPGA Inference - DellEMC SURFsaraIntel IT Center
 
High Memory Bandwidth Demo @ One Intel Station
High Memory Bandwidth Demo @ One Intel StationHigh Memory Bandwidth Demo @ One Intel Station
High Memory Bandwidth Demo @ One Intel StationIntel IT Center
 
INFOGRAPHIC: Advantages of Intel vs. IBM Power on SAP HANA solutions
INFOGRAPHIC: Advantages of Intel vs. IBM Power on SAP HANA solutionsINFOGRAPHIC: Advantages of Intel vs. IBM Power on SAP HANA solutions
INFOGRAPHIC: Advantages of Intel vs. IBM Power on SAP HANA solutionsIntel IT Center
 
Disrupt Hackers With Robust User Authentication
Disrupt Hackers With Robust User AuthenticationDisrupt Hackers With Robust User Authentication
Disrupt Hackers With Robust User AuthenticationIntel IT Center
 
Strengthen Your Enterprise Arsenal Against Cyber Attacks With Hardware-Enhanc...
Strengthen Your Enterprise Arsenal Against Cyber Attacks With Hardware-Enhanc...Strengthen Your Enterprise Arsenal Against Cyber Attacks With Hardware-Enhanc...
Strengthen Your Enterprise Arsenal Against Cyber Attacks With Hardware-Enhanc...Intel IT Center
 
Harness Digital Disruption to Create 2022’s Workplace Today
Harness Digital Disruption to Create 2022’s Workplace TodayHarness Digital Disruption to Create 2022’s Workplace Today
Harness Digital Disruption to Create 2022’s Workplace TodayIntel IT Center
 
Don't Rely on Software Alone. Protect Endpoints with Hardware-Enhanced Security.
Don't Rely on Software Alone.Protect Endpoints with Hardware-Enhanced Security.Don't Rely on Software Alone.Protect Endpoints with Hardware-Enhanced Security.
Don't Rely on Software Alone. Protect Endpoints with Hardware-Enhanced Security.Intel IT Center
 
Achieve Unconstrained Collaboration in a Digital World
Achieve Unconstrained Collaboration in a Digital WorldAchieve Unconstrained Collaboration in a Digital World
Achieve Unconstrained Collaboration in a Digital WorldIntel IT Center
 
Intel® Xeon® Scalable Processors Enabled Applications Marketing Guide
Intel® Xeon® Scalable Processors Enabled Applications Marketing GuideIntel® Xeon® Scalable Processors Enabled Applications Marketing Guide
Intel® Xeon® Scalable Processors Enabled Applications Marketing GuideIntel IT Center
 
#NABshow: National Association of Broadcasters 2017 Super Session Presentatio...
#NABshow: National Association of Broadcasters 2017 Super Session Presentatio...#NABshow: National Association of Broadcasters 2017 Super Session Presentatio...
#NABshow: National Association of Broadcasters 2017 Super Session Presentatio...Intel IT Center
 
Three Steps to Making a Digital Workplace a Reality
Three Steps to Making a Digital Workplace a RealityThree Steps to Making a Digital Workplace a Reality
Three Steps to Making a Digital Workplace a RealityIntel IT Center
 
Three Steps to Making The Digital Workplace a Reality - by Intel’s Chad Const...
Three Steps to Making The Digital Workplace a Reality - by Intel’s Chad Const...Three Steps to Making The Digital Workplace a Reality - by Intel’s Chad Const...
Three Steps to Making The Digital Workplace a Reality - by Intel’s Chad Const...Intel IT Center
 
Intel® Xeon® Processor E7-8800/4800 v4 EAMG 2.0
Intel® Xeon® Processor E7-8800/4800 v4 EAMG 2.0Intel® Xeon® Processor E7-8800/4800 v4 EAMG 2.0
Intel® Xeon® Processor E7-8800/4800 v4 EAMG 2.0Intel IT Center
 
Intel® Xeon® Processor E5-2600 v4 Enterprise Database Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Enterprise Database Applications ShowcaseIntel® Xeon® Processor E5-2600 v4 Enterprise Database Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Enterprise Database Applications ShowcaseIntel IT Center
 
Intel® Xeon® Processor E5-2600 v4 Core Business Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Core Business Applications ShowcaseIntel® Xeon® Processor E5-2600 v4 Core Business Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Core Business Applications ShowcaseIntel IT Center
 
Intel® Xeon® Processor E5-2600 v4 Financial Security Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Financial Security Applications ShowcaseIntel® Xeon® Processor E5-2600 v4 Financial Security Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Financial Security Applications ShowcaseIntel IT Center
 
Intel® Xeon® Processor E5-2600 v4 Telco Cloud Digital Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Telco Cloud Digital Applications ShowcaseIntel® Xeon® Processor E5-2600 v4 Telco Cloud Digital Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Telco Cloud Digital Applications ShowcaseIntel IT Center
 
Intel® Xeon® Processor E5-2600 v4 Tech Computing Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Tech Computing Applications ShowcaseIntel® Xeon® Processor E5-2600 v4 Tech Computing Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Tech Computing Applications ShowcaseIntel IT Center
 
Intel® Xeon® Processor E5-2600 v4 Big Data Analytics Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Big Data Analytics Applications ShowcaseIntel® Xeon® Processor E5-2600 v4 Big Data Analytics Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Big Data Analytics Applications ShowcaseIntel IT Center
 

More from Intel IT Center (20)

AI Crash Course- Supercomputing
AI Crash Course- SupercomputingAI Crash Course- Supercomputing
AI Crash Course- Supercomputing
 
FPGA Inference - DellEMC SURFsara
FPGA Inference - DellEMC SURFsaraFPGA Inference - DellEMC SURFsara
FPGA Inference - DellEMC SURFsara
 
High Memory Bandwidth Demo @ One Intel Station
High Memory Bandwidth Demo @ One Intel StationHigh Memory Bandwidth Demo @ One Intel Station
High Memory Bandwidth Demo @ One Intel Station
 
INFOGRAPHIC: Advantages of Intel vs. IBM Power on SAP HANA solutions
INFOGRAPHIC: Advantages of Intel vs. IBM Power on SAP HANA solutionsINFOGRAPHIC: Advantages of Intel vs. IBM Power on SAP HANA solutions
INFOGRAPHIC: Advantages of Intel vs. IBM Power on SAP HANA solutions
 
Disrupt Hackers With Robust User Authentication
Disrupt Hackers With Robust User AuthenticationDisrupt Hackers With Robust User Authentication
Disrupt Hackers With Robust User Authentication
 
Strengthen Your Enterprise Arsenal Against Cyber Attacks With Hardware-Enhanc...
Strengthen Your Enterprise Arsenal Against Cyber Attacks With Hardware-Enhanc...Strengthen Your Enterprise Arsenal Against Cyber Attacks With Hardware-Enhanc...
Strengthen Your Enterprise Arsenal Against Cyber Attacks With Hardware-Enhanc...
 
Harness Digital Disruption to Create 2022’s Workplace Today
Harness Digital Disruption to Create 2022’s Workplace TodayHarness Digital Disruption to Create 2022’s Workplace Today
Harness Digital Disruption to Create 2022’s Workplace Today
 
Don't Rely on Software Alone. Protect Endpoints with Hardware-Enhanced Security.
Accelerators: the good, the bad, and the ugly

  • 1. Experts in numerical algorithms and HPC services. Accelerators: the good, the bad and the ugly! Dr Ian Reid, Ian.Reid@nag.co.uk
  • 2. Agenda
      • NAG Introduction
      • Accelerators – NAG experience
      • NAG on Intel Xeon Phi
      • Summary
  • 3. NAG Background
      • Founded 1970
      • Not-for-profit organisation
          • Surpluses fund on-going R&D
      • Mathematical and Statistical Expertise
          • Libraries of components
          • Consulting
      • HPC Services
          • Computational Science and Engineering (CSE) support
          • Procurement advice, market watch, benchmarking
  • 4. Where has my Escalator gone?
      • Escalator?: Want more performance? Buy the next processor!
      • To get performance/efficiency we have to go (massively) parallel
      • Disruption causing serious look at ‘other’ technologies (and algorithms!)
      • Even CPUs with tens of cores
      • Hybrid, shared-memory and distributed-memory parallelism
      • Painful whichever way we turn!
  • 5. Accelerators
      • Loose definition: hardware on which to run your software better than on your (general purpose) CPU
      • Generally NOT an easy win
          • Significant learning curve and effort
          • Offload disadvantages…
      • The good: put some effort in; get a great result!
      • The bad: put effort in, get an OK result, but learn lessons which can be re-used (often good!)
      • The ugly: put significant effort in, get a poor result and don’t learn anything substantive
  • 6. Intel Xeon Phi
      • The Intel Xeon Phi is a co-processor attached to a host system via the PCI Express bus
      • Highly parallel architecture
      • Compiler support for OpenMP parallelism
      • It has a distinct memory system from the host
      • Several use cases to consider:
          • Automatic Offloading
          • Explicit Offloading
          • Native Applications
  • 7. NAG Experience with Intel Xeon Phi
      • Relatively easy to take existing OpenMP-based code and port it to Phi
      • Tuning for Phi takes some learning and expertise
      • … but feedback into Xeon code is often very strong
      • NAG Library for Intel Xeon Phi supports all models
          • Offload (supports automatic and explicit) and Native libs
      • Windows version for Intel Xeon Phi now in beta
  • 8. Automatic Offload
      • OpenMP regions are offloaded to the Phi when the problem size is above some threshold
          • Estimating problem size can be complex
      • Required data is transferred from the host before the OpenMP region executes and back to the host afterwards
          • Data transfer takes time, which eats into the benefit of running the OpenMP region on the Phi
      • Transparent to the user of the Library
          • Just recompile code containing NAG Library function calls to benefit
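The threshold idea on the Automatic Offload slide can be sketched with a toy cost model: offloading only pays when the time saved on the device exceeds the time spent moving data over the bus. This is an illustration, not NAG's actual heuristic. The kernel shape (2*n**3 flops on three n-by-n double arrays) and every rate below are made-up placeholders, and the function names are hypothetical.

```python
# Illustrative cost model only: all rates below are made-up placeholders,
# not measurements of any Xeon or Xeon Phi system.

def host_time(flops, host_rate):
    """Seconds to run the region on the host CPU."""
    return flops / host_rate


def offload_time(flops, bytes_moved, device_rate, bus_rate, latency=0.0):
    """Seconds to move the data over the bus and run the region on the device."""
    return latency + bytes_moved / bus_rate + flops / device_rate


def should_offload(n, host_rate=3.0e11, device_rate=1.0e12,
                   bus_rate=6.0e9, latency=1.0e-4):
    """Decide for a hypothetical dense kernel of size n.

    Assumes 2*n**3 flops and three n-by-n arrays of 8-byte doubles
    crossing the bus in total.
    """
    flops = 2.0 * n ** 3
    bytes_moved = 3 * 8 * n ** 2
    return (offload_time(flops, bytes_moved, device_rate, bus_rate, latency)
            < host_time(flops, host_rate))


def break_even_size(max_n=100_000, **rates):
    """Smallest n at which the model prefers offloading, or None."""
    for n in range(1, max_n + 1):
        if should_offload(n, **rates):
            return n
    return None


if __name__ == "__main__":
    print("offload n=100? ", should_offload(100))
    print("offload n=2000?", should_offload(2000))
    print("break-even n:  ", break_even_size())
```

In this model the work grows faster than the data volume, so small problems stay on the host and large ones move to the device. A real threshold would need measured rates for each routine, which is one reason estimating it can be complex.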
  • 9. Explicit Offload
      • All NAG functions can be explicitly offloaded by the user
          • User code is modified to include the relevant offload statements
          • Allows control of which functions are offloaded
      • Data transfers to the Phi can be dissociated from function offloading, allowing data to remain on the Phi
          • User is responsible for data movement
          • Reduces the penalty of offloading data by allowing its use by multiple offloaded function calls before returning to the host
      • Effort is required from the user to re-code the application
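The benefit of keeping data resident on the Phi across several offloaded calls can be shown with simple arithmetic. The sketch below is a minimal model under stated assumptions: each call has the same compute time, the round-trip transfer cost is fixed, and the example figures are invented for illustration rather than taken from the slides.

```python
def per_call_transfer_total(n_calls, transfer_s, compute_s):
    """Total time when every offloaded call moves its data to the device and back."""
    return n_calls * (transfer_s + compute_s)


def resident_data_total(n_calls, transfer_s, compute_s):
    """Total time when data is moved once and reused by every offloaded call."""
    return transfer_s + n_calls * compute_s


if __name__ == "__main__":
    # Hypothetical figures: 2.0 s round-trip transfer, 0.5 s compute per call.
    calls, transfer, compute = 10, 2.0, 0.5
    naive = per_call_transfer_total(calls, transfer, compute)
    resident = resident_data_total(calls, transfer, compute)
    print(f"transfer on every call: {naive:.1f} s")
    print(f"data kept on device:    {resident:.1f} s")
```

The more calls that reuse the same data, the smaller the share of total time spent on transfers, which is the trade the slide describes: less transfer penalty in exchange for the user managing data movement.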
  • 10. Native Applications
      • Users may choose to port their entire application
          • User code is modified to include the relevant offload statements
          • Allows complete control of which functions are offloaded
      • Data transfers to the Phi can be dissociated from function offloading, allowing data to remain on the Phi
          • User is responsible for data movement
          • Reduces the penalty of offloading data by allowing its use by multiple offloaded function calls before returning to the host
      • Effort is required from the user to re-code the application
  • 11. Performance Examples and Lessons
      • Sandy Bridge CPUs (typically using 32 threads)
      • Knights Corner Phi processor (typically using 240 threads)
  • 12. Hierarchical Cluster Analysis (g03ec)
      • [Chart: Time (s) against Problem Size (n); series: 32 threads original, Phi offload original, Phi offload opt, 32 threads opt]
      • n=30k; m=3k
      • Xeon 32t: 1,412s
      • Phi 240t*: 1,259s
      • Xeon 32t*: 1,073s
      • For this size of problem it is best to stay on the CPU, but take the 25%!
  • 13. Distance Matrix (g03ea)
      • [Chart: Time (s) against Problem Size (n); series: 32 threads original, Phi offload original, Phi offload opt, 32 threads opt]
      • n=30k; m=3k
      • Xeon 32t: 192s
      • Phi 240t*: 40.6s
      • Xeon 32t*: 75.7s
      • Phi gain ~2x (~5x over original)
  • 14. Uniform RNG - Mersenne Twister (g05sa)
      • [Chart: Time (s) against Size of problem (n, log scale); series: 8 threads original, Native Phi original, Native Phi opt, 8 threads opt]
      • n=500m
      • Xeon 8t: 0.25s
      • Phi 240t*: 0.08s
      • Xeon 8t*: 0.22s
      • Phi gain ~3x
  • 15. Maximum Likelihood Estimates (g03ca)
      • [Chart: Time (s) against Problem Size (weighted); series: 32 threads original, Phi offload original, Phi offload opt, 32 threads opt]
      • n=2500; m=2500; nfac=30; nvar=200
      • Xeon 32t: 256s
      • Phi 240t*: 53.6s
      • Xeon 32t*: 54.7s
      • Phi gain 4x, but also Xeon speed-up (green line under red)
  • 16. Solve real symmetric positive definite simultaneous linear equations using iterative refinement (f04af)
      • [Chart: Time (s) against Problem Size (n); series: 32 threads original, Phi offload original, Phi offload opt, 32 threads opt]
      • n=6,000; nrhs=1,000
      • Xeon 32t: 171s
      • Phi 240t*: 66s
      • Xeon 32t*: 86s
      • Phi gain ~1.3x (~3x original)
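The gains quoted on the performance slides can be recomputed from the listed timings. The sketch below does that arithmetic. It assumes that "*" marks the optimised code, a reading based on the "opt" entries in the chart legends and not stated on the slides. The dictionary keys are hypothetical labels.

```python
# Timings in seconds as quoted on the slides. Treating "*" as "optimised code"
# is an assumption taken from the chart legends.
TIMINGS = {
    "g03ec": {"xeon_orig": 1412.0, "phi_opt": 1259.0, "xeon_opt": 1073.0},
    "g03ea": {"xeon_orig": 192.0, "phi_opt": 40.6, "xeon_opt": 75.7},
    "g05sa": {"xeon_orig": 0.25, "phi_opt": 0.08, "xeon_opt": 0.22},
    "g03ca": {"xeon_orig": 256.0, "phi_opt": 53.6, "xeon_opt": 54.7},
    "f04af": {"xeon_orig": 171.0, "phi_opt": 66.0, "xeon_opt": 86.0},
}


def ratios(t):
    """Speed-up factors for one routine; values above 1 favour the first-named side."""
    return {
        "phi_vs_xeon_opt": t["xeon_opt"] / t["phi_opt"],
        "phi_vs_xeon_orig": t["xeon_orig"] / t["phi_opt"],
        "xeon_opt_vs_orig": t["xeon_orig"] / t["xeon_opt"],
    }


if __name__ == "__main__":
    for name, t in TIMINGS.items():
        r = ratios(t)
        print(f"{name}: Phi vs optimised Xeon {r['phi_vs_xeon_opt']:.2f}x, "
              f"Phi vs original Xeon {r['phi_vs_xeon_orig']:.2f}x, "
              f"optimised vs original Xeon {r['xeon_opt_vs_orig']:.2f}x")
```

For g03ec the Phi-to-optimised-Xeon ratio comes out below 1, which matches the slide's advice to stay on the CPU for that problem size while keeping the gain from the optimised host code.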
  • 17. Summary
      • Parallelism is a real issue we all face
          • Exciting for some. Challenging for others!
      • Accelerators are interesting and can offer spectacular wins
      • Intel Phi claiming less spectacular performance gains
          • Less effort than on other Accelerators
          • … and often repays on CPU as well!
      • Acid test is always solving your (complete) problem!
      • NAG can help you try out this technology
          • NAG Library for Phi
          • NAG expertise