SlideShare ist ein Scribd-Unternehmen logo
1 von 20
Downloaden Sie, um offline zu lesen
Utility of Graphics Processing Units
for dense matrix calculations in
computing and inverting genomic
relationship matrices
Karin Meyer and Bruce Tier
Animal Genetics and Breeding Unit,
University of New England,
Armidale, Australia
AAABG 2013
GPUs for GRMs | Introduction
Computing in the genomics age
Requirements for mixed model analyses have changed!
Pre-genomics:
mixed model equations sparse (few non-zero elements)
– inverse of pedigree based relationship matrix (NRM)
concentrated on exploiting sparsity in computations
Now: Genomic relationship matrix (GRM)
dense (large proportion of elements non-zero)
require manipulation of large, dense matrices to
– compute the GRM
– invert the GRM
– predict breeding values in single-step approach
computationally demanding
exploit new hardware and software libraries
– computing in parallel
K. M. | 2 / 16
GPUs for GRMs | Introduction
Objectives
A first look to examine scope for
speeding up computations to calculate & invert GRM
using parallel computations
exploiting a Graphics Processing Unit (GPU)
“An excursion into the land of computer gaming,
modern compilers and software libraries”
K. M. | 3 / 16
GPUs for GRMs | Introduction
What is a GPU?
Graphical Processing Unit
initially: co-processor for computer gaming
– graphics card
– fast recalculation of pixel values
– remember: i387 math processor for i386
now: GP-GPU adapted to General Purpose computing
– hundreds to thousands of cores
– capable of double precision calculations
– turns desktop PC into personal super-computer
– but: limited memory (currently 6GB max.)
specialised interface for application programming
– CUDA (proprietary of NVIDIA) −→ Fortran interface
– OPENCL (C++ like)
K. M. | 4 / 16
GPUs for GRMs | Introduction
Tesla K20X
2688 cores
6 GB RAM
Peak: 3.95/1.31 Tflops (single/double precision)
K. M. | 5 / 16
GPUs for GRMs | Calculating the GRM
Memory needed: G = ZZ
Gigabytes – single precision matrices
n Gn×n Zn×s
s = 100K 500K 1000K
5 000 0.1 1.9 9.3 18.6
10 000 0.4 3.7 18.6 37.3
20 000 1.5 7.5 37.3 74.5
30 000 3.4 11.2 55.9 111.8
50 000 9.3 18.6 93.1 186.3
−→ break calculations into blocks
−→ use out-of-core storage
→ parallel computing literature
K. M. | 6 / 16
GPUs for GRMs | Calculating the GRM
Blockwise multiplication
All-in-one
=Z
Z
G
G = ZZ
Column-wise division
= + · · · +Z.1 · · · · · · Z.4
Z.1
· · ·
· · ·
Z.4
Z.1Z.1 Z.4Z.4
G =
k
Z.kZ.k
K. M. | 7 / 16
Z: n animals
× s SNPs
G: symmetric
sn(n + 1)/2
flops
GPUs for GRMs | Calculating the GRM
Blockwise multiplication - cont’
Row-wise division
=Zi.
Zj. Gij = Zi.Zj.
Row- and columnwise division
=
Gij =
k
ZikZjk
K. M. | 8 / 16
GPUs for GRMs | Inverting the GRM
Blockwise inversion
G =


GPP GPC GPT
GCP GCC GCT
GTP GTC GTT


1. Subdivide matrix into blocks
C current −→ choosen size
P previous
T trailing
2. Factor & invert chol(GCC)
3. Adjust GPP & GPC
4. ‘Absorb’ into GTT & GCT
5. Calculate inverse
6. Redefine C, P & T; repeat
Gauss-Jordan Elimination
GCC := chol(GCC)
GCC := G−1
CC
GPC := GPCGCC
GPP := GPP + GPCGPC
GCT := GCCGCT
GTT := GTT − GCTGCT
GPT := GPT − GPCGCT
GCT := −(GCCGCT)
GPC := GPCGCC
GCC := GCCGCC
−→ break matrix products into blocks as needed
K. M. | 9 / 16
GPUs for GRMs | Computing environment
Hardware
‘Standard’ multi-core CPU
– Intel I7-3930K
– 3.2 GHz, 12M cache
– 64GB RAM
– 6 cores
GPU
– Nvidia GTX 580
– 512 cores
– 3GB RAM
– capable of double precision calculations
→ high end gaming card, not GP-GPU
K. M. | 10 / 16
GPUs for GRMs | Computing environment
Software
Linux (CentOS 6)
gfortran, gcc 4.4.7; Intel composer XE 13.0.1
Intel MKL library 11.0
– BLAS: SSYRK, SGEMM, STRTRI, STRMM
– LAPACK: SPOTRF, SPOTRI
CUDA 5.0
– CUBLAS: CUBLAS_SSYRK, CUBLAS_SGEMM,
CUBLAS_STRTRI, CUBLAS_STRMM
CULA dense R16a
– CULA_SPOTRF, CULA_SPOTRI, CULA_STRTRI
MAGMA 1.3
– MAGMAF_SLAUUM
K. M. | 11 / 16
GPUs for GRMs | Results
Time to ‘make’ GRM∗
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
GPU
CPU6
CPU1
1 hour
5 hours
10−1
100
101
102
103
0 20000 40000 60000 80000
No. animals
Systemtime(min)
∗
Single precision calculations, s = 500 000
K. M. | 12 / 16
GPUs for GRMs | Results
Time to ‘make’ GRM∗
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
GPU
CPU6
CPU1
1 hour
5 hours
10−1
100
101
102
103
0 20000 40000 60000 80000
No. animals
Systemtime(min)
∗
Single precision calculations, s = 500 000
K. M. | 12 / 16
●
● ●
●
● ●
●
●
●
●
●
● ● ● ●
●
●
● ●
● ● ● ● ● ● ● ● ● ● ● ● ●
● ●
●
● ● ● ● ● ● ● ● ● ● ● ● ●
● ● ● ● ●
● ●
● ●
GPU / CPU1
CPU6 / CPU1
5
10
15
20
GPUs for GRMs | Results
Time to invert GRM†
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ● ● ● ● ● ● ● ● ●
● ● ● ● ●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
1 hour
GPU
CPU6
CPU1
10−4
10−3
10−2
10−1
100
101
102
0 20000 40000 60000 80000
No. animals
Systemtime(min)
†
Single precision calculations
K. M. | 13 / 16
GPUs for GRMs | Results
Time to invert GRM†
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ● ● ● ● ● ● ● ● ●
● ● ● ● ●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
1 hour
GPU
CPU6
CPU1
10−4
10−3
10−2
10−1
100
101
102
0 20000 40000 60000 80000
No. animals
Systemtime(min)
†
Single precision calculations
K. M. | 13 / 16
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●●●●
●●●●
●
●
●●●●●●●●
●●●●●●
●
●●●●●●●●
●
●●●●
●●
●
●●●●
●●
●●●●●●●●●
●
GPU / CPU1
CPU6 / CPU1
5
10
15
GPUs for GRMs | Results
Time to invert GRM‡
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ● ● ●
● ● ● ● ● ●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●
● ● ●
●
●
●
● ●
1 hour
GPU
CPU6
CPU1
10−3
10−2
10−1
100
101
102
0 20000 40000 60000 80000
No. animals
Systemtime(min)
‡
Double precision calculations
K. M. | 14 / 16
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
● ● ● ●
●
●
● ●
●
● ● ●
● ●
●
●
● ●
● ●
●
● ● ● ● ●
● ● ● ● ● ●
●
●
● ● ● ● ● ●
● ● ●
●
●
●
●
●
●
●
●
GPU / CPU1
CPU6 / CPU1
4
6
8
GPUs for GRMs | Results | Conclusions
Conclusions
Build & invert GRM
−→ dense matrix multiplications
−→ highly parallelisable
K. M. | 15 / 16
GPUs for GRMs | Results | Conclusions
Conclusions
Build & invert GRM
−→ dense matrix multiplications
−→ highly parallelisable
BLAS and LAPACK library routines awesome
– highly optimised
– exploit memory architecture of modern processors
– multi-threaded versions
K. M. | 15 / 16
GPUs for GRMs | Results | Conclusions
Conclusions
Build & invert GRM
−→ dense matrix multiplications
−→ highly parallelisable
BLAS and LAPACK library routines awesome
– highly optimised
– exploit memory architecture of modern processors
– multi-threaded versions
GPU: powerful new hardware for parallel computing
– thousands of processors
– can use multiple GPUs simultaneously
– BLAS/LAPACK etc. libraries available
– but: limited memory −→ blocked algorithms
– but: specialised interface, not much Fortran support yet
K. M. | 15 / 16
GPUs for GRMs | Results | Conclusions
K. M. | 16 / 16

Weitere ähnliche Inhalte

Was ist angesagt?

"NovuTensor: Hardware Acceleration of Deep Convolutional Neural Networks for ...
"NovuTensor: Hardware Acceleration of Deep Convolutional Neural Networks for ..."NovuTensor: Hardware Acceleration of Deep Convolutional Neural Networks for ...
"NovuTensor: Hardware Acceleration of Deep Convolutional Neural Networks for ...Edge AI and Vision Alliance
 
Boyang gao gpu k-means_gmm_final_v1
Boyang gao gpu k-means_gmm_final_v1Boyang gao gpu k-means_gmm_final_v1
Boyang gao gpu k-means_gmm_final_v1Gao Boyang
 
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)Kohei KaiGai
 
Parallel computing with Gpu
Parallel computing with GpuParallel computing with Gpu
Parallel computing with GpuRohit Khatana
 
GPU Computing: A brief overview
GPU Computing: A brief overviewGPU Computing: A brief overview
GPU Computing: A brief overviewRajiv Kumar
 
Parallel Simulation of Urban Dynamics on the GPU Ivan Blečić, Arnaldo Cecchi...
Parallel Simulation of Urban Dynamics on the GPU  Ivan Blečić, Arnaldo Cecchi...Parallel Simulation of Urban Dynamics on the GPU  Ivan Blečić, Arnaldo Cecchi...
Parallel Simulation of Urban Dynamics on the GPU Ivan Blečić, Arnaldo Cecchi...Beniamino Murgante
 
GPU Architecture NVIDIA (GTX GeForce 480)
GPU Architecture NVIDIA (GTX GeForce 480)GPU Architecture NVIDIA (GTX GeForce 480)
GPU Architecture NVIDIA (GTX GeForce 480)Fatima Qayyum
 
Memory Requirements for Convolutional Neural Network Hardware Accelerators
Memory Requirements for Convolutional Neural Network Hardware AcceleratorsMemory Requirements for Convolutional Neural Network Hardware Accelerators
Memory Requirements for Convolutional Neural Network Hardware AcceleratorsSepidehShirkhanzadeh
 
A Scalable Implementation of a MapReduce-based Graph Processing Algorithm for...
A Scalable Implementation of a MapReduce-based Graph Processing Algorithm for...A Scalable Implementation of a MapReduce-based Graph Processing Algorithm for...
A Scalable Implementation of a MapReduce-based Graph Processing Algorithm for...Koichi Shirahata
 
improve deep learning training and inference performance
improve deep learning training and inference performanceimprove deep learning training and inference performance
improve deep learning training and inference performances.rohit
 
Resource Management with Systemd and cgroups
Resource Management with Systemd and cgroupsResource Management with Systemd and cgroups
Resource Management with Systemd and cgroupsTsung-en Hsiao
 
Protecting Real-Time GPU Kernels in Integrated CPU-GPU SoC Platforms
Protecting Real-Time GPU Kernels in Integrated CPU-GPU SoC PlatformsProtecting Real-Time GPU Kernels in Integrated CPU-GPU SoC Platforms
Protecting Real-Time GPU Kernels in Integrated CPU-GPU SoC PlatformsHeechul Yun
 
Graphics Processing Units
Graphics Processing UnitsGraphics Processing Units
Graphics Processing UnitsAmrik Sadhra
 
Designing High Performance Computing Architectures for Reliable Space Applica...
Designing High Performance Computing Architectures for Reliable Space Applica...Designing High Performance Computing Architectures for Reliable Space Applica...
Designing High Performance Computing Architectures for Reliable Space Applica...Fisnik Kraja
 
06.09.2017 Computer Science, Machine Learning & Statistiks Meetup - MULTI-GPU...
06.09.2017 Computer Science, Machine Learning & Statistiks Meetup - MULTI-GPU...06.09.2017 Computer Science, Machine Learning & Statistiks Meetup - MULTI-GPU...
06.09.2017 Computer Science, Machine Learning & Statistiks Meetup - MULTI-GPU...Zalando adtech lab
 

Was ist angesagt? (20)

GPU
GPUGPU
GPU
 
"NovuTensor: Hardware Acceleration of Deep Convolutional Neural Networks for ...
"NovuTensor: Hardware Acceleration of Deep Convolutional Neural Networks for ..."NovuTensor: Hardware Acceleration of Deep Convolutional Neural Networks for ...
"NovuTensor: Hardware Acceleration of Deep Convolutional Neural Networks for ...
 
Introduction to GPU Programming
Introduction to GPU ProgrammingIntroduction to GPU Programming
Introduction to GPU Programming
 
Boyang gao gpu k-means_gmm_final_v1
Boyang gao gpu k-means_gmm_final_v1Boyang gao gpu k-means_gmm_final_v1
Boyang gao gpu k-means_gmm_final_v1
 
Ultra Fast SOM using CUDA
Ultra Fast SOM using CUDAUltra Fast SOM using CUDA
Ultra Fast SOM using CUDA
 
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
 
Parallel computing with Gpu
Parallel computing with GpuParallel computing with Gpu
Parallel computing with Gpu
 
GPU Computing: A brief overview
GPU Computing: A brief overviewGPU Computing: A brief overview
GPU Computing: A brief overview
 
Parallel Simulation of Urban Dynamics on the GPU Ivan Blečić, Arnaldo Cecchi...
Parallel Simulation of Urban Dynamics on the GPU  Ivan Blečić, Arnaldo Cecchi...Parallel Simulation of Urban Dynamics on the GPU  Ivan Blečić, Arnaldo Cecchi...
Parallel Simulation of Urban Dynamics on the GPU Ivan Blečić, Arnaldo Cecchi...
 
GPU Architecture NVIDIA (GTX GeForce 480)
GPU Architecture NVIDIA (GTX GeForce 480)GPU Architecture NVIDIA (GTX GeForce 480)
GPU Architecture NVIDIA (GTX GeForce 480)
 
Graphics processing unit
Graphics processing unitGraphics processing unit
Graphics processing unit
 
Memory Requirements for Convolutional Neural Network Hardware Accelerators
Memory Requirements for Convolutional Neural Network Hardware AcceleratorsMemory Requirements for Convolutional Neural Network Hardware Accelerators
Memory Requirements for Convolutional Neural Network Hardware Accelerators
 
A Scalable Implementation of a MapReduce-based Graph Processing Algorithm for...
A Scalable Implementation of a MapReduce-based Graph Processing Algorithm for...A Scalable Implementation of a MapReduce-based Graph Processing Algorithm for...
A Scalable Implementation of a MapReduce-based Graph Processing Algorithm for...
 
improve deep learning training and inference performance
improve deep learning training and inference performanceimprove deep learning training and inference performance
improve deep learning training and inference performance
 
cnsm2011_slide
cnsm2011_slidecnsm2011_slide
cnsm2011_slide
 
Resource Management with Systemd and cgroups
Resource Management with Systemd and cgroupsResource Management with Systemd and cgroups
Resource Management with Systemd and cgroups
 
Protecting Real-Time GPU Kernels in Integrated CPU-GPU SoC Platforms
Protecting Real-Time GPU Kernels in Integrated CPU-GPU SoC PlatformsProtecting Real-Time GPU Kernels in Integrated CPU-GPU SoC Platforms
Protecting Real-Time GPU Kernels in Integrated CPU-GPU SoC Platforms
 
Graphics Processing Units
Graphics Processing UnitsGraphics Processing Units
Graphics Processing Units
 
Designing High Performance Computing Architectures for Reliable Space Applica...
Designing High Performance Computing Architectures for Reliable Space Applica...Designing High Performance Computing Architectures for Reliable Space Applica...
Designing High Performance Computing Architectures for Reliable Space Applica...
 
06.09.2017 Computer Science, Machine Learning & Statistiks Meetup - MULTI-GPU...
06.09.2017 Computer Science, Machine Learning & Statistiks Meetup - MULTI-GPU...06.09.2017 Computer Science, Machine Learning & Statistiks Meetup - MULTI-GPU...
06.09.2017 Computer Science, Machine Learning & Statistiks Meetup - MULTI-GPU...
 

Andere mochten auch

7 loại thực phẩm giúp giảm cholesterol
7 loại thực phẩm giúp giảm cholesterol7 loại thực phẩm giúp giảm cholesterol
7 loại thực phẩm giúp giảm cholesterolmegan508
 
nhận làm dịch vụ giúp việc nhà cao cấp ở sài gòn
nhận làm dịch vụ giúp việc nhà cao cấp ở sài gònnhận làm dịch vụ giúp việc nhà cao cấp ở sài gòn
nhận làm dịch vụ giúp việc nhà cao cấp ở sài gònmaxwell700
 
โครงงานคอมพิวเตอร์
โครงงานคอมพิวเตอร์โครงงานคอมพิวเตอร์
โครงงานคอมพิวเตอร์nuttavorn kesornsiri
 
Brain-Inspired Robotics and Neural Dynamics: Lecture 02 (2015)
Brain-Inspired Robotics and Neural Dynamics: Lecture 02 (2015)Brain-Inspired Robotics and Neural Dynamics: Lecture 02 (2015)
Brain-Inspired Robotics and Neural Dynamics: Lecture 02 (2015)Hiroaki Wagatsuma
 
Piano BI - Un'app per il quartiere di Milano Bicocca
Piano BI - Un'app per il quartiere di Milano BicoccaPiano BI - Un'app per il quartiere di Milano Bicocca
Piano BI - Un'app per il quartiere di Milano BicoccaGiulia Caliandro
 
cần thuê dịch vụ giúp việc theo tháng ở tphcm
cần thuê dịch vụ giúp việc theo tháng ở tphcmcần thuê dịch vụ giúp việc theo tháng ở tphcm
cần thuê dịch vụ giúp việc theo tháng ở tphcmsung379
 
çİkolatali tatlilar
çİkolatali tatlilarçİkolatali tatlilar
çİkolatali tatlilarhtcuysl13
 
nhận làm dịch vụ giúp việc quận 6 tại hcm
nhận làm dịch vụ giúp việc quận 6 tại hcmnhận làm dịch vụ giúp việc quận 6 tại hcm
nhận làm dịch vụ giúp việc quận 6 tại hcmkraig109
 
Penalized maximum likelihood estimates of genetic covariance matrices wit...
  Penalized maximum likelihood  estimates of genetic covariance  matrices wit...  Penalized maximum likelihood  estimates of genetic covariance  matrices wit...
Penalized maximum likelihood estimates of genetic covariance matrices wit...prettygully
 
nhận làm dịch vụ giúp việc theo giờ bảo đảm ở sài gòn
nhận làm dịch vụ giúp việc theo giờ bảo đảm ở sài gònnhận làm dịch vụ giúp việc theo giờ bảo đảm ở sài gòn
nhận làm dịch vụ giúp việc theo giờ bảo đảm ở sài gònmitch422
 
Have you ever thought how the perfect packaging should be?
Have you ever thought how the perfect packaging should be?Have you ever thought how the perfect packaging should be?
Have you ever thought how the perfect packaging should be?Selena Norcini
 

Andere mochten auch (15)

Hipertensi¢n arterial
Hipertensi¢n arterialHipertensi¢n arterial
Hipertensi¢n arterial
 
7 loại thực phẩm giúp giảm cholesterol
7 loại thực phẩm giúp giảm cholesterol7 loại thực phẩm giúp giảm cholesterol
7 loại thực phẩm giúp giảm cholesterol
 
nhận làm dịch vụ giúp việc nhà cao cấp ở sài gòn
nhận làm dịch vụ giúp việc nhà cao cấp ở sài gònnhận làm dịch vụ giúp việc nhà cao cấp ở sài gòn
nhận làm dịch vụ giúp việc nhà cao cấp ở sài gòn
 
Com59
Com59Com59
Com59
 
โครงงานคอมพิวเตอร์
โครงงานคอมพิวเตอร์โครงงานคอมพิวเตอร์
โครงงานคอมพิวเตอร์
 
Brain-Inspired Robotics and Neural Dynamics: Lecture 02 (2015)
Brain-Inspired Robotics and Neural Dynamics: Lecture 02 (2015)Brain-Inspired Robotics and Neural Dynamics: Lecture 02 (2015)
Brain-Inspired Robotics and Neural Dynamics: Lecture 02 (2015)
 
Piano BI - Un'app per il quartiere di Milano Bicocca
Piano BI - Un'app per il quartiere di Milano BicoccaPiano BI - Un'app per il quartiere di Milano Bicocca
Piano BI - Un'app per il quartiere di Milano Bicocca
 
cần thuê dịch vụ giúp việc theo tháng ở tphcm
cần thuê dịch vụ giúp việc theo tháng ở tphcmcần thuê dịch vụ giúp việc theo tháng ở tphcm
cần thuê dịch vụ giúp việc theo tháng ở tphcm
 
çİkolatali tatlilar
çİkolatali tatlilarçİkolatali tatlilar
çİkolatali tatlilar
 
nhận làm dịch vụ giúp việc quận 6 tại hcm
nhận làm dịch vụ giúp việc quận 6 tại hcmnhận làm dịch vụ giúp việc quận 6 tại hcm
nhận làm dịch vụ giúp việc quận 6 tại hcm
 
Penalized maximum likelihood estimates of genetic covariance matrices wit...
  Penalized maximum likelihood  estimates of genetic covariance  matrices wit...  Penalized maximum likelihood  estimates of genetic covariance  matrices wit...
Penalized maximum likelihood estimates of genetic covariance matrices wit...
 
nhận làm dịch vụ giúp việc theo giờ bảo đảm ở sài gòn
nhận làm dịch vụ giúp việc theo giờ bảo đảm ở sài gònnhận làm dịch vụ giúp việc theo giờ bảo đảm ở sài gòn
nhận làm dịch vụ giúp việc theo giờ bảo đảm ở sài gòn
 
ISA 2015
ISA  2015 ISA  2015
ISA 2015
 
The pitch
The pitchThe pitch
The pitch
 
Have you ever thought how the perfect packaging should be?
Have you ever thought how the perfect packaging should be?Have you ever thought how the perfect packaging should be?
Have you ever thought how the perfect packaging should be?
 

Ähnlich wie Slides meyer116

Monte Carlo on GPUs
Monte Carlo on GPUsMonte Carlo on GPUs
Monte Carlo on GPUsfcassier
 
PG-Strom - GPGPU meets PostgreSQL, PGcon2015
PG-Strom - GPGPU meets PostgreSQL, PGcon2015PG-Strom - GPGPU meets PostgreSQL, PGcon2015
PG-Strom - GPGPU meets PostgreSQL, PGcon2015Kohei KaiGai
 
Monte Carlo G P U Jan2010
Monte  Carlo  G P U  Jan2010Monte  Carlo  G P U  Jan2010
Monte Carlo G P U Jan2010John Holden
 
Statistical power consumption analysis and modeling
Statistical power consumption analysis and modelingStatistical power consumption analysis and modeling
Statistical power consumption analysis and modelingnadikari123
 
hbaseconasia2017: HBase Practice At XiaoMi
hbaseconasia2017: HBase Practice At XiaoMihbaseconasia2017: HBase Practice At XiaoMi
hbaseconasia2017: HBase Practice At XiaoMiHBaseCon
 
RAPIDS: ускоряем Pandas и scikit-learn на GPU Павел Клеменков, NVidia
RAPIDS: ускоряем Pandas и scikit-learn на GPU  Павел Клеменков, NVidiaRAPIDS: ускоряем Pandas и scikit-learn на GPU  Павел Клеменков, NVidia
RAPIDS: ускоряем Pandas и scikit-learn на GPU Павел Клеменков, NVidiaMail.ru Group
 
Intro to Machine Learning for GPUs
Intro to Machine Learning for GPUsIntro to Machine Learning for GPUs
Intro to Machine Learning for GPUsSri Ambati
 
Measuring a 25 and 40Gb/s Data Plane
Measuring a 25 and 40Gb/s Data PlaneMeasuring a 25 and 40Gb/s Data Plane
Measuring a 25 and 40Gb/s Data PlaneOpen-NFP
 
A Methodology for Automatic GPU Kernel Optimization - NECSTTechTalk 4/06/2020
A Methodology for Automatic GPU Kernel Optimization - NECSTTechTalk 4/06/2020A Methodology for Automatic GPU Kernel Optimization - NECSTTechTalk 4/06/2020
A Methodology for Automatic GPU Kernel Optimization - NECSTTechTalk 4/06/2020NECST Lab @ Politecnico di Milano
 
Memory Bandwidth QoS
Memory Bandwidth QoSMemory Bandwidth QoS
Memory Bandwidth QoSRohit Jnagal
 
GPUrdma - Presentation
GPUrdma - PresentationGPUrdma - Presentation
GPUrdma - PresentationFeras Daoud
 
PACT_conference_2019_Tutorial_02_gpgpusim.pptx
PACT_conference_2019_Tutorial_02_gpgpusim.pptxPACT_conference_2019_Tutorial_02_gpgpusim.pptx
PACT_conference_2019_Tutorial_02_gpgpusim.pptxssuser30e7d2
 
Graphics processing unit (GPU)
Graphics processing unit (GPU)Graphics processing unit (GPU)
Graphics processing unit (GPU)Amal R
 
Part 3 Maximizing the utilization of GPU resources on-premise and in the cloud
Part 3 Maximizing the utilization of GPU resources on-premise and in the cloudPart 3 Maximizing the utilization of GPU resources on-premise and in the cloud
Part 3 Maximizing the utilization of GPU resources on-premise and in the cloudUniva, an Altair Company
 
Exploiting a Synergy between Greedy Approach and NSGA for Scheduling in Compu...
Exploiting a Synergy between Greedy Approach and NSGA for Scheduling in Compu...Exploiting a Synergy between Greedy Approach and NSGA for Scheduling in Compu...
Exploiting a Synergy between Greedy Approach and NSGA for Scheduling in Compu...Tarik Reza Toha
 
Kaggle Lyft Motion Prediction for Autonomous Vehicles 4th Place Solution
Kaggle Lyft Motion Prediction for Autonomous Vehicles 4th Place SolutionKaggle Lyft Motion Prediction for Autonomous Vehicles 4th Place Solution
Kaggle Lyft Motion Prediction for Autonomous Vehicles 4th Place SolutionPreferred Networks
 

Ähnlich wie Slides meyer116 (20)

Monte Carlo on GPUs
Monte Carlo on GPUsMonte Carlo on GPUs
Monte Carlo on GPUs
 
PG-Strom - GPGPU meets PostgreSQL, PGcon2015
PG-Strom - GPGPU meets PostgreSQL, PGcon2015PG-Strom - GPGPU meets PostgreSQL, PGcon2015
PG-Strom - GPGPU meets PostgreSQL, PGcon2015
 
Monte Carlo G P U Jan2010
Monte  Carlo  G P U  Jan2010Monte  Carlo  G P U  Jan2010
Monte Carlo G P U Jan2010
 
Statistical power consumption analysis and modeling
Statistical power consumption analysis and modelingStatistical power consumption analysis and modeling
Statistical power consumption analysis and modeling
 
hbaseconasia2017: HBase Practice At XiaoMi
hbaseconasia2017: HBase Practice At XiaoMihbaseconasia2017: HBase Practice At XiaoMi
hbaseconasia2017: HBase Practice At XiaoMi
 
RAPIDS: ускоряем Pandas и scikit-learn на GPU Павел Клеменков, NVidia
RAPIDS: ускоряем Pandas и scikit-learn на GPU  Павел Клеменков, NVidiaRAPIDS: ускоряем Pandas и scikit-learn на GPU  Павел Клеменков, NVidia
RAPIDS: ускоряем Pandas и scikit-learn на GPU Павел Клеменков, NVidia
 
Intro to Machine Learning for GPUs
Intro to Machine Learning for GPUsIntro to Machine Learning for GPUs
Intro to Machine Learning for GPUs
 
Measuring a 25 and 40Gb/s Data Plane
Measuring a 25 and 40Gb/s Data PlaneMeasuring a 25 and 40Gb/s Data Plane
Measuring a 25 and 40Gb/s Data Plane
 
GPU
GPUGPU
GPU
 
Can FPGAs Compete with GPUs?
Can FPGAs Compete with GPUs?Can FPGAs Compete with GPUs?
Can FPGAs Compete with GPUs?
 
A Methodology for Automatic GPU Kernel Optimization - NECSTTechTalk 4/06/2020
A Methodology for Automatic GPU Kernel Optimization - NECSTTechTalk 4/06/2020A Methodology for Automatic GPU Kernel Optimization - NECSTTechTalk 4/06/2020
A Methodology for Automatic GPU Kernel Optimization - NECSTTechTalk 4/06/2020
 
Memory Bandwidth QoS
Memory Bandwidth QoSMemory Bandwidth QoS
Memory Bandwidth QoS
 
GPUrdma - Presentation
GPUrdma - PresentationGPUrdma - Presentation
GPUrdma - Presentation
 
PACT_conference_2019_Tutorial_02_gpgpusim.pptx
PACT_conference_2019_Tutorial_02_gpgpusim.pptxPACT_conference_2019_Tutorial_02_gpgpusim.pptx
PACT_conference_2019_Tutorial_02_gpgpusim.pptx
 
Graphics processing unit (GPU)
Graphics processing unit (GPU)Graphics processing unit (GPU)
Graphics processing unit (GPU)
 
Part 3 Maximizing the utilization of GPU resources on-premise and in the cloud
Part 3 Maximizing the utilization of GPU resources on-premise and in the cloudPart 3 Maximizing the utilization of GPU resources on-premise and in the cloud
Part 3 Maximizing the utilization of GPU resources on-premise and in the cloud
 
NVIDIA CUDA
NVIDIA CUDANVIDIA CUDA
NVIDIA CUDA
 
Hotspot gc
Hotspot gcHotspot gc
Hotspot gc
 
Exploiting a Synergy between Greedy Approach and NSGA for Scheduling in Compu...
Exploiting a Synergy between Greedy Approach and NSGA for Scheduling in Compu...Exploiting a Synergy between Greedy Approach and NSGA for Scheduling in Compu...
Exploiting a Synergy between Greedy Approach and NSGA for Scheduling in Compu...
 
Kaggle Lyft Motion Prediction for Autonomous Vehicles 4th Place Solution
Kaggle Lyft Motion Prediction for Autonomous Vehicles 4th Place SolutionKaggle Lyft Motion Prediction for Autonomous Vehicles 4th Place Solution
Kaggle Lyft Motion Prediction for Autonomous Vehicles 4th Place Solution
 

Kürzlich hochgeladen

Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...Silpa
 
chemical bonding Essentials of Physical Chemistry2.pdf
chemical bonding Essentials of Physical Chemistry2.pdfchemical bonding Essentials of Physical Chemistry2.pdf
chemical bonding Essentials of Physical Chemistry2.pdfTukamushabaBismark
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)Areesha Ahmad
 
Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Silpa
 
Module for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learningModule for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learninglevieagacer
 
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxCOST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxFarihaAbdulRasheed
 
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryFAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryAlex Henderson
 
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...Monika Rani
 
Grade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its FunctionsGrade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its FunctionsOrtegaSyrineMay
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bSérgio Sacani
 
Molecular markers- RFLP, RAPD, AFLP, SNP etc.
Molecular markers- RFLP, RAPD, AFLP, SNP etc.Molecular markers- RFLP, RAPD, AFLP, SNP etc.
Molecular markers- RFLP, RAPD, AFLP, SNP etc.Silpa
 
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRLKochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRLkantirani197
 
development of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virusdevelopment of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virusNazaninKarimi6
 
pumpkin fruit fly, water melon fruit fly, cucumber fruit fly
pumpkin fruit fly, water melon fruit fly, cucumber fruit flypumpkin fruit fly, water melon fruit fly, cucumber fruit fly
pumpkin fruit fly, water melon fruit fly, cucumber fruit flyPRADYUMMAURYA1
 
The Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptxThe Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptxseri bangash
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.Nitya salvi
 
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....muralinath2
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPirithiRaju
 
Sector 62, Noida Call girls :8448380779 Model Escorts | 100% verified
Sector 62, Noida Call girls :8448380779 Model Escorts | 100% verifiedSector 62, Noida Call girls :8448380779 Model Escorts | 100% verified
Sector 62, Noida Call girls :8448380779 Model Escorts | 100% verifiedDelhi Call girls
 

Kürzlich hochgeladen (20)

Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
 
chemical bonding Essentials of Physical Chemistry2.pdf
chemical bonding Essentials of Physical Chemistry2.pdfchemical bonding Essentials of Physical Chemistry2.pdf
chemical bonding Essentials of Physical Chemistry2.pdf
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)
 
Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.
 
Module for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learningModule for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learning
 
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxCOST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
 
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryFAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
 
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
 
Grade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its FunctionsGrade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its Functions
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
 
Molecular markers- RFLP, RAPD, AFLP, SNP etc.
Molecular markers- RFLP, RAPD, AFLP, SNP etc.Molecular markers- RFLP, RAPD, AFLP, SNP etc.
Molecular markers- RFLP, RAPD, AFLP, SNP etc.
 
Site Acceptance Test .
Site Acceptance Test                    .Site Acceptance Test                    .
Site Acceptance Test .
 
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRLKochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
 
development of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virusdevelopment of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virus
 
pumpkin fruit fly, water melon fruit fly, cucumber fruit fly
pumpkin fruit fly, water melon fruit fly, cucumber fruit flypumpkin fruit fly, water melon fruit fly, cucumber fruit fly
pumpkin fruit fly, water melon fruit fly, cucumber fruit fly
 
The Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptxThe Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptx
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
 
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdf
 
Sector 62, Noida Call girls :8448380779 Model Escorts | 100% verified
Sector 62, Noida Call girls :8448380779 Model Escorts | 100% verifiedSector 62, Noida Call girls :8448380779 Model Escorts | 100% verified
Sector 62, Noida Call girls :8448380779 Model Escorts | 100% verified
 

Slides meyer116

  • 1. Utility of Graphics Processing Units for dense matrix calculations in computing and inverting genomic relationship matrices Karin Meyer and Bruce Tier Animal Genetics and Breeding Unit, University of New England, Armidale, Australia AAABG 2013
  • 2. GPUs for GRMs | Introduction Computing in the genomics age Requirements for mixed model analyses have changed! Pre-genomics: mixed model equations sparse (few non-zero elements) – inverse of pedigree based relationship matrix (NRM) concentrated on exploiting sparsity in computations Now: Genomic relationship matrix (GRM) dense (large proportion of elements non-zero) require manipulation of large, dense matrices to – compute the GRM – invert the GRM – predict breeding values in single-step approach computationally demanding exploit new hardware and software libraries – computing in parallel K. M. | 2 / 16
  • 3. GPUs for GRMs | Introduction Objectives A first look to examine scope for speeding up computations to calculate & invert GRM using parallel computations exploiting a Graphics Processing Unit (GPU) “An excursion into the land of computer gaming, modern compilers and software libraries” K. M. | 3 / 16
  • 4. GPUs for GRMs | Introduction What is a GPU? Graphical Processing Unit initially: co-processor for computer gaming – graphics card – fast recalculation of pixel values – remember: i387 math processor for i386 now: GP-GPU adapted to General Purpose computing – hundreds to thousands of cores – capable of double precision calculations – turns desktop PC into personal super-computer – but: limited memory (currently 6GB max.) specialised interface for application programming – CUDA (proprietary of NVIDIA) −→ Fortran interface – OPENCL (C++ like) K. M. | 4 / 16
  • 5. GPUs for GRMs | Introduction Tesla K20X 2688 cores 6 GB RAM Peak: 3.95/1.31 Tflops (single/double precision) K. M. | 5 / 16
  • 6. GPUs for GRMs | Calculating the GRM Memory needed: G = ZZ Gigabytes – single precision matrices n Gn×n Zn×s s = 100K 500K 1000K 5 000 0.1 1.9 9.3 18.6 10 000 0.4 3.7 18.6 37.3 20 000 1.5 7.5 37.3 74.5 30 000 3.4 11.2 55.9 111.8 50 000 9.3 18.6 93.1 186.3 −→ break calculations into blocks −→ use out-of-core storage → parallel computing literature K. M. | 6 / 16
  • 7. GPUs for GRMs | Calculating the GRM Blockwise multiplication All-in-one =Z Z G G = ZZ Column-wise division = + · · · +Z.1 · · · · · · Z.4 Z.1 · · · · · · Z.4 Z.1Z.1 Z.4Z.4 G = k Z.kZ.k K. M. | 7 / 16 Z: n animals × s SNPs G: symmetric sn(n + 1)/2 flops
  • 8. GPUs for GRMs | Calculating the GRM Blockwise multiplication - cont’ Row-wise division =Zi. Zj. Gij = Zi.Zj. Row- and columnwise division = Gij = k ZikZjk K. M. | 8 / 16
  • 9. GPUs for GRMs | Inverting the GRM Blockwise inversion G =   GPP GPC GPT GCP GCC GCT GTP GTC GTT   1. Subdivide matrix into blocks C current −→ choosen size P previous T trailing 2. Factor & invert chol(GCC) 3. Adjust GPP & GPC 4. ‘Absorb’ into GTT & GCT 5. Calculate inverse 6. Redefine C, P & T; repeat Gauss-Jordan Elimination GCC := chol(GCC) GCC := G−1 CC GPC := GPCGCC GPP := GPP + GPCGPC GCT := GCCGCT GTT := GTT − GCTGCT GPT := GPT − GPCGCT GCT := −(GCCGCT) GPC := GPCGCC GCC := GCCGCC −→ break matrix products into blocks as needed K. M. | 9 / 16
  • 10. GPUs for GRMs | Computing environment Hardware ‘Standard’ multi-core CPU – Intel I7-3930K – 3.2 GHz, 12M cache – 64GB RAM – 6 cores GPU – Nvidia GTX 580 – 512 cores – 3GB RAM – capable of double precision calculations → high end gaming card, not GP-GPU K. M. | 10 / 16
  • 11. GPUs for GRMs | Computing environment Software Linux (CentOS 6) gfortran, gcc 4.4.7; Intel composer XE 13.0.1 Intel MKL library 11.0 – BLAS: SSYRK, SGEMM, STRTRI, STRMM – LAPACK: SPOTRF, SPOTRI CUDA 5.0 – CUBLAS: CUBLAS_SSYRK, CUBLAS_SGEMM, CUBLAS_STRTRI, CUBLAS_STRMM CULA dense R16a – CULA_SPOTRF, CULA_SPOTRI, CULA_STRTRI MAGMA 1.3 – MAGMAF_SLAUUM K. M. | 11 / 16
  • 12. GPUs for GRMs | Results Time to ‘make’ GRM∗ ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● GPU CPU6 CPU1 1 hour 5 hours 10−1 100 101 102 103 0 20000 40000 60000 80000 No. animals Systemtime(min) ∗ Single precision calculations, s = 500 000 K. M. | 12 / 16
  • 13. GPUs for GRMs | Results Time to ‘make’ GRM∗ ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● GPU CPU6 CPU1 1 hour 5 hours 10−1 100 101 102 103 0 20000 40000 60000 80000 No. animals Systemtime(min) ∗ Single precision calculations, s = 500 000 K. M. | 12 / 16 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● GPU / CPU1 CPU6 / CPU1 5 10 15 20
  • 14. GPUs for GRMs | Results Time to invert GRM† ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 1 hour GPU CPU6 CPU1 10−4 10−3 10−2 10−1 100 101 102 0 20000 40000 60000 80000 No. animals Systemtime(min) † Single precision calculations K. M. | 13 / 16
  • 15. GPUs for GRMs | Results Time to invert GRM† ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 1 hour GPU CPU6 CPU1 10−4 10−3 10−2 10−1 100 101 102 0 20000 40000 60000 80000 No. animals Systemtime(min) † Single precision calculations K. M. | 13 / 16 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ●●●● ●●●● ● ● ●●●●●●●● ●●●●●● ● ●●●●●●●● ● ●●●● ●● ● ●●●● ●● ●●●●●●●●● ● GPU / CPU1 CPU6 / CPU1 5 10 15
  • 16. GPUs for GRMs | Results Time to invert GRM‡ ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● 1 hour GPU CPU6 CPU1 10−3 10−2 10−1 100 101 102 0 20000 40000 60000 80000 No. animals Systemtime(min) ‡ Double precision calculations K. M. | 14 / 16 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● GPU / CPU1 CPU6 / CPU1 4 6 8
  • 17. GPUs for GRMs | Results | Conclusions Conclusions Build & invert GRM −→ dense matrix multiplications −→ highly parallelisable K. M. | 15 / 16
  • 18. GPUs for GRMs | Results | Conclusions Conclusions Build & invert GRM −→ dense matrix multiplications −→ highly parallelisable BLAS and LAPACK library routines awesome – highly optimised – exploit memory architecture of modern processors – multi-threaded versions K. M. | 15 / 16
  • 19. GPUs for GRMs | Results | Conclusions Conclusions Build & invert GRM −→ dense matrix multiplications −→ highly parallelisable BLAS and LAPACK library routines awesome – highly optimised – exploit memory architecture of modern processors – multi-threaded versions GPU: powerful new hardware for parallel computing – thousands of processors – can use multiple GPUs simultaneously – BLAS/LAPACK etc. libraries available – but: limited memory −→ blocked algorithms – but: specialised interface, not much Fortran support yet K. M. | 15 / 16
  • 20. GPUs for GRMs | Results | Conclusions K. M. | 16 / 16