GPU ACCELERATION OF BIOINFORMATICS PIPELINES
Jonathan Cohen and Mark Berger, NVIDIA
Agenda
GPU Programming in 10 slides – Cohen (10 minutes)
GPUs for Bioinformatics – Cohen (10 minutes)
Experiences porting SeqAn to CUDA – Siragusa (15 minutes)
Resources – Berger (5 minutes)
Discussion, Q&A – All (20 minutes)
GPU Programming in Ten Slides
CUDA – Programming for Throughput
CPU threads:
Large amount of memory per thread
Full-featured instruction set
1-16 execute simultaneously
CUDA threads:
Lightweight footprint
Full-featured instruction set
10,000 execute simultaneously
CPU (host): executes functions
Run few threads, each one very fast
GPU (device): executes kernels
Run many threads, each one slow => total throughput high
CUDA Kernels: Parallel Threads
A kernel is an array of threads, executed in parallel
All threads execute the same code
Each thread has an ID, used to:
  Select input/output data
  Control decisions
float x = input[threadID];
float y = func(x);
output[threadID] = y;
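The three-line fragment above can be expanded into a complete program. The following is a minimal sketch, not the presenters' code: the body of `func`, the kernel name `apply_func`, and the host helper `apply_on_gpu` are assumptions made for illustration, and a single block is launched, so the input must fit in one block (at most 1024 elements on current GPUs).

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical per-element function standing in for func() on the slide.
__device__ float func(float x) { return 2.0f * x + 1.0f; }

// Every thread runs the same code; its ID selects which element it handles.
__global__ void apply_func(const float* input, float* output)
{
    int threadID = threadIdx.x;   // ID within the single block launched below
    float x = input[threadID];
    float y = func(x);
    output[threadID] = y;
}

// Hypothetical host helper: copy in, launch one block of n threads, copy out.
std::vector<float> apply_on_gpu(const std::vector<float>& host_in)
{
    const int n = static_cast<int>(host_in.size());
    std::vector<float> host_out(n);
    if (n == 0) return host_out;

    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemcpy(d_in, host_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    apply_func<<<1, n>>>(d_in, d_out);   // 1 block, n threads

    // This copy waits for the kernel to finish before reading the results.
    cudaMemcpy(host_out.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return host_out;
}

int main()
{
    std::vector<float> in = {0.0f, 1.0f, 2.0f, 3.0f};
    std::vector<float> out = apply_on_gpu(in);
    for (size_t i = 0; i < in.size(); ++i)
        printf("%g -> %g\n", in[i], out[i]);
    return 0;
}
```

Error checking on the CUDA calls is omitted to keep the sketch short; production code would test each return value.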
CUDA Kernels: Subdivide into Blocks
Threads are grouped into blocks
Blocks are grouped into a grid
A kernel is executed as a grid of blocks of threads
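In code, the grid/block hierarchy shows up as the built-in variables `blockIdx`, `blockDim`, and `threadIdx`, which a thread combines into a global index. The sketch below assumes a one-dimensional grid; the kernel `scale`, the helper `scale_on_gpu`, and the block size of 128 are illustrative choices, not taken from the slides.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Each thread derives a global index from its block and thread coordinates.
__global__ void scale(const float* in, float* out, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                    // the last block may be only partly used
        out[i] = factor * in[i];
}

// Hypothetical host helper: launches enough blocks to cover all n elements.
std::vector<float> scale_on_gpu(const std::vector<float>& host_in, float factor)
{
    const int n = static_cast<int>(host_in.size());
    std::vector<float> host_out(n);
    if (n == 0) return host_out;

    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemcpy(d_in, host_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    const int threadsPerBlock = 128;                                  // threads in a block
    const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;   // blocks in the grid, rounded up
    scale<<<blocks, threadsPerBlock>>>(d_in, d_out, n, factor);

    cudaMemcpy(host_out.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return host_out;
}

int main()
{
    std::vector<float> data(1000, 1.5f);
    std::vector<float> result = scale_on_gpu(data, 2.0f);
    printf("first = %g, last = %g\n", result.front(), result.back());
    return 0;
}
```

Rounding the block count up and guarding with `if (i < n)` lets the same kernel handle sizes that are not a multiple of the block size.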
GPU
Accelerated Computing
Multi-core plus Many-cores
CPU: Optimized for Serial Tasks
GPU Accelerator: Optimized for Many Parallel Tasks
3-10x+ compute throughput
7x memory bandwidth
5x energy efficiency
How GPU Acceleration Works
Application Code
GPU: Compute-Intensive Functions (5% of Code)
+
CPU: Rest of Sequential CPU Code
Hello World in CUDA
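#include <cstdio>

__global__
void parallel_hello_world()
{
    printf("Hello, world. This is thread %d, block %d!\n",
           threadIdx.x, blockIdx.x);
}
int main()
{
    // launch 128 blocks of 128 threads each
    parallel_hello_world<<<128,128>>>();
    // wait for the kernel so its printf output is flushed before exit
    cudaDeviceSynchronize();
    return 0;
}
> nvcc -o hello_world -arch=sm_30 main.cu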
> ./hello_world
Hello, world. This is thread 0, block 0!
Hello, world. This is thread 1, block 0!
...
GPUs for Bioinformatics
Life Technologies
Ion Proton
3 GPUs per Device
S3229 - GPU Accelerated Signal Processing in Ion Proton
Whole Genome Sequencer
Mohit Gupta ( Life Technologies )
Jakob Siegel ( Life Technologies )
https://registration.gputechconf.com/form/session-listing
BGI & NVIDIA
Joint Innovation Lab
SOAP3 Aligner
S3257 - Tackling Big Data in Genomics with GPU
BingQiang Wang (Beijing Genomics Institute)
https://registration.gputechconf.com/form/session-listing
CUDASW++
From Bertil Schmidt’s group: http://cudasw.sourceforge.net/homepage.htm
Y. Liu, A. Wirawan, B. Schmidt: "CUDASW++ 3.0: accelerating Smith-Waterman protein database search
by coupling CPU and GPU SIMD instructions". BMC Bioinformatics, 2013, 14:117.
Performance comparisons on
the Swiss-Prot database.
“On GTX680 (GTX690),
CUDASW++ 3.0 yields an
average performance of 109.4
(169.7) GCUPS, with a
maximum of 119.0 (185.6)
GCUPS.”
NVIDIA GPU Life Science Focus
Molecular Dynamics: All codes are available
AMBER, CHARMM, DESMOND, DL_POLY,
GROMACS, LAMMPS, NAMD
Great multi-GPU performance
GPU codes: Abalone, ACEMD, HOOMD-Blue
Focus: scaling to large numbers of GPUs
Quantum Chemistry: key codes ported or optimizing
Active GPU acceleration projects:
VASP, NWChem, Gaussian, GAMESS, ABINIT,
Quantum Espresso, BigDFT, CP2K, GPAW, etc.
GPU code: TeraChem
Analytical and Medical Imaging Instruments
NVBIO
A GPU based C++ framework for
High Throughput Sequence Analysis
Short & Long Read Alignment
Variant Calling
Compression
…
Overall Design:
flexibility & customizability – a templated library
parallelism at every level
optimize throughput, server-like design
optimize the whole pipeline, not just a single component
(e.g. including data transfers, SAM, BAM, CRAM I/O, …)
A modular library
Tries: FM-index, Suffix Trie, Radix Tree, Sorted Dictionary
DP Alignment: Edit Distance, Smith-Waterman, Needleman-Wunsch, Gotoh, Banded/Full DP
Text Search: Exact Search, Backtracking
Sequence I/O: FASTQ, FASTA
Alignment I/O: SAM, BAM, CRAM
Support Tools: HTML report generators
GPU: O(1k-10k) threads
CPU: O(10-100) threads
nvBowtie2 - Real Datasets
Ion Proton, 100M x 175bp (8-350), end-to-end: speedup 4.3x, alignment rate +0.5%, disagreement 0.002%
Illumina Genome Analyzer II, 10M x 100bp x 2, end-to-end (ERR161544): speedup 2.4x, alignment rate =, disagreement 0.006%
Ion Proton, 100M x 175bp (8-350), local: speedup 7.6x, alignment rate -0.6%, disagreement 0.03%
Illumina Genome Analyzer II, 10M x 100bp x 2, local (ERR161544): speedup 2.6x, alignment rate =, disagreement 0.022%
TT32
NVBIO: efficient sequences analysis on GPUs
Jacopo Pantaleoni
Tuesday 2:10 pm, Hall 9
GPU Technology Conference
https://registration.gputechconf.com/form/session-listing
Tag: “Bioinformatics and Genomics”
http://www.gputechconf.com/page/home.html
Google: “GPU Technology Conference”
Resources
3 Ways to Accelerate Applications
Applications
Libraries: “Drop-in” Acceleration
OpenACC Directives: Easily Accelerate Applications
Programming Languages: Maximum Flexibility
GPU Accelerated Libraries
“Drop-in” Acceleration for your Applications
Linear Algebra: FFT, BLAS, SPARSE, Matrix
Numerical & Math: RAND, Statistics
Data Struct. & AI: Sort, Scan, Zero Sum
Visual Processing: Image & Video
Libraries shown: NVIDIA cuFFT, cuBLAS, cuSPARSE; NVIDIA Math Lib; NVIDIA cuRAND; NVIDIA NPP; NVIDIA Video Encode; GPU AI – Board Games; GPU AI – Path Finding
OpenACC: Open, Simple, Portable
• Open Standard
• Easy, Compiler-Driven Approach
• Portable on GPUs and Xeon Phi
main() {
  …
  <serial code>
  …
  #pragma acc kernels   // compiler hint
  {
    <compute intensive code>
  }
  …
}
CAM-SE Climate
6x Faster on GPU
2x Faster on CPU only
Top Kernel: 50% of Runtime
Available from:
GPU Programming Languages
Fortran: OpenACC, CUDA Fortran
C: OpenACC, CUDA C
C++: Thrust, CUDA C++
Python: PyCUDA, Anaconda Accelerate
C#: GPU.NET
Numerical analytics: R, MATLAB, Mathematica, LabVIEW
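To give a feel for the C++ option, here is a minimal sketch using Thrust, the C++ template library that ships with the CUDA toolkit. It sorts integer keys on the GPU without any hand-written kernel. The helper name `sort_on_gpu` and the sample data are assumptions for illustration only.

```cuda
#include <cstdio>
#include <vector>
#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <thrust/copy.h>

// Hypothetical helper: copy keys to the GPU, sort them there, copy them back.
std::vector<int> sort_on_gpu(const std::vector<int>& host_keys)
{
    // Constructing a device_vector from host iterators performs the host-to-device copy.
    thrust::device_vector<int> d_keys(host_keys.begin(), host_keys.end());

    // Runs on the GPU because the iterators refer to device memory.
    thrust::sort(d_keys.begin(), d_keys.end());

    // Copy the sorted keys back to host memory.
    std::vector<int> sorted(host_keys.size());
    thrust::copy(d_keys.begin(), d_keys.end(), sorted.begin());
    return sorted;
}

int main()
{
    std::vector<int> keys = {5, 3, 9, 1, 7};
    std::vector<int> sorted = sort_on_gpu(keys);
    for (int k : sorted) printf("%d ", k);
    printf("\n");
    return 0;
}
```

The file is compiled with `nvcc` like any other `.cu` source; no extra library needs to be linked because Thrust is header-only.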
Reaching New Developers - CUDA Python
Python Productivity + GPU Performance
Easy to Learn
Powerful Libraries
Popular in New Developers
HPC & Data Analytics
Data from CodeEval.com, based on 100k+ code samples
Easiest Way to Learn CUDA
50K Registered
127 Countries
Learn from the Best
Anywhere, Any Time
It’s Free!
Engage with an Active Community
Feedback/Discussion

Nvidia in bioinformatics

  • 1. GPU ACCELERATION OF BIOINFORMATICS PIPELINES Jonathan Cohen and Mark Berger, NVIDIA
  • 2. Agenda: GPU Programming in 10 slides – Cohen (10 minutes); GPUs for Bioinformatics – Cohen (10 minutes); Experiences porting SeqAn to CUDA – Siragusa (15 minutes); Resources – Berger (5 minutes); Discussion, Q&A – All (20 minutes)
  • 3. GPU Programming in Ten Slides
  • 4. CUDA – Programming for Throughput. CPU threads: large amount of memory per thread; full-featured instruction set; 1-16 execute simultaneously. CUDA threads: lightweight footprint; full-featured instruction set; 10,000 execute simultaneously. The CPU (host) executes functions; the GPU (device) executes kernels. CPU: run few threads, each one very fast. GPU: run many threads, each one slow, so total throughput is high.
  • 5. CUDA Kernels: Parallel Threads. A kernel is an array of threads, executed in parallel. All threads execute the same code. Each thread has an ID, which it uses to select input/output data and to make control decisions: float x = input[threadID]; float y = func(x); output[threadID] = y;
  • 7-10. CUDA Kernels: Subdivide into Blocks. Threads are grouped into blocks. Blocks are grouped into a grid. A kernel is executed on the GPU as a grid of blocks of threads.
  • 11. Accelerated Computing: Multi-core plus Many-cores. CPU: optimized for serial tasks. GPU accelerator: optimized for many parallel tasks; 3-10x+ compute throughput, 7x memory bandwidth, 5x energy efficiency.
  • 12. How GPU Acceleration Works. The application code is split: the compute-intensive functions (5% of the code) run on the GPU; the rest of the sequential code runs on the CPU.
  • 13. Hello World in CUDA: __global__ void parallel_hello_world() { printf("Hello, world. This is thread %d, block %d!\n", threadIdx.x, blockIdx.x); } int main() { parallel_hello_world<<<128,128>>>(); cudaDeviceSynchronize(); return 0; } > nvcc -o hello_world -arch=sm_30 main.cu > ./hello_world Hello, world. This is thread 0, block 0! Hello, world. This is thread 1, block 0! ...
  • 15. Life Technologies Ion Proton: 3 GPUs per device. S3229 - GPU Accelerated Signal Processing in Ion Proton Whole Genome Sequencer, Mohit Gupta (Life Technologies), Jakob Siegel (Life Technologies). https://registration.gputechconf.com/form/session-listing
  • 16. BGI & NVIDIA Joint Innovation Lab: SOAP3 Aligner. S3257 - Tackling Big Data in Genomics with GPU, BingQiang Wang (Beijing Genomics Institute). https://registration.gputechconf.com/form/session-listing
  • 17. CUDASW++ From Bertil Schmidt’s group: http://cudasw.sourceforge.net/homepage.htm Y. Liu, A. Wirawan, B. Schmidt: "CUDASW++ 3.0: accelerating Smith-Waterman protein database search by coupling CPU and GPU SIMD instructions". BMC Bioinformatics, 2013, 14:117. Performance comparisons on the Swiss-Prot database. “On GTX680 (GTX690), CUDASW++ 3.0 yields an average performance of 109.4 (169.7) GCUPS, with a maximum of 119.0 (185.6) GCUPS.”
  • 18. NVIDIA GPU Life Science Focus. Molecular Dynamics: all codes are available (AMBER, CHARMM, DESMOND, DL_POLY, GROMACS, LAMMPS, NAMD); great multi-GPU performance; GPU codes: Abalone, ACEMD, HOOMD-Blue; focus: scaling to large numbers of GPUs. Quantum Chemistry: key codes ported or optimizing; active GPU acceleration projects: VASP, NWChem, Gaussian, GAMESS, ABINIT, Quantum Espresso, BigDFT, CP2K, GPAW, etc.; GPU code: TeraChem. Analytical and Medical Imaging Instruments.
  • 19. NVBIO: a GPU-based C++ framework for high-throughput sequence analysis: short & long read alignment, variant calling, compression, … Overall design: flexibility & customizability (a templated library); parallelism at every level; optimize throughput, server-like design; optimize the whole pipeline, not just a single component (e.g. including data transfers, SAM, BAM, CRAM I/O, …).
  • 20. A modular library. Tries: FM-index, Suffix Trie, Radix Tree, Sorted Dictionary. DP Alignment: Edit Distance, Smith-Waterman, Needleman-Wunsch, Gotoh, Banded/Full DP. Text Search: Exact Search, Backtracking. Sequence I/O: FASTQ, FASTA. Alignment I/O: SAM, BAM, CRAM. Support Tools: HTML report generators. GPU: O(1k-10k) threads. CPU: O(10-100) threads.
  • 21. nvBowtie2 - Real Datasets. Ion Proton, 100M x 175bp (8-350), end-to-end: speedup 4.3x, alignment rate +0.5%, disagreement 0.002%. Illumina Genome Analyzer II, 10M x 100bp x 2, end-to-end (ERR161544): speedup 2.4x, alignment rate =, disagreement 0.006%. Ion Proton, 100M x 175bp (8-350), local: speedup 7.6x, alignment rate -0.6%, disagreement 0.03%. Illumina Genome Analyzer II, 10M x 100bp x 2, local (ERR161544): speedup 2.6x, alignment rate =, disagreement 0.022%.
  • 22. TT32 NVBIO: efficient sequences analysis on GPUs Jacopo Pantaleoni Tuesday 2:10 pm, Hall 9
  • 23. GPU Technology Conference https://registration.gputechconf.com/form/session-listing Tag: “Bioinformatics and Genomics” http://www.gputechconf.com/page/home.html Google: “GPU Technology Conference”
  • 25. 3 Ways to Accelerate Applications. Libraries: “drop-in” acceleration. Programming languages: maximum flexibility. OpenACC directives: easily accelerate applications.
  • 26. GPU Accelerated Libraries: “Drop-in” Acceleration for your Applications. Linear Algebra: FFT, BLAS, SPARSE, Matrix. Numerical & Math: RAND, Statistics. Data Struct. & AI: Sort, Scan, Zero Sum. Visual Processing: Image & Video. Libraries shown: NVIDIA cuFFT, cuBLAS, cuSPARSE; NVIDIA Math Lib; NVIDIA cuRAND; NVIDIA NPP; NVIDIA Video Encode; GPU AI – Board Games; GPU AI – Path Finding.
  • 27. OpenACC: Open, Simple, Portable. Open standard; easy, compiler-driven approach; portable on GPUs and Xeon Phi. main() { … <serial code> … #pragma acc kernels /* compiler hint */ { <compute intensive code> } … } CAM-SE Climate: 6x faster on GPU, 2x faster on CPU only; top kernel: 50% of runtime.
  • 28. GPU Programming Languages. Fortran: OpenACC, CUDA Fortran. C: OpenACC, CUDA C. C++: Thrust, CUDA C++. Python: PyCUDA, Anaconda Accelerate. C#: GPU.NET. Numerical analytics: R, MATLAB, Mathematica, LabVIEW.
  • 29. Reaching New Developers - CUDA Python. Python productivity + GPU performance. Easy to learn; powerful libraries; popular in: new developers, HPC & data analytics. Data from CodeEval.com, based on 100k+ code samples.
  • 30. Easiest Way to Learn CUDA. 50K registered, 127 countries. Learn from the best. Anywhere, any time. It’s free! Engage with an active community.
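
The kernel slides describe each thread using its ID to pick its input and output element, with threads grouped into blocks and blocks into a grid. A minimal sketch of that pattern follows. It is not code from the deck: the kernel name apply_func, the body of func, the array size, and the block size of 128 are all assumptions chosen for illustration.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical stand-in for the slide's func(x); any per-element function works.
__device__ float func(float x) { return 2.0f * x + 1.0f; }

// Every thread runs the same code and uses its ID to select one element.
__global__ void apply_func(const float* input, float* output, int n)
{
    // Global thread ID = block offset within the grid + thread offset within the block.
    int threadID = blockIdx.x * blockDim.x + threadIdx.x;
    if (threadID < n) {               // the last block may be only partly used
        float x = input[threadID];
        float y = func(x);
        output[threadID] = y;
    }
}

int main()
{
    const int n = 1000;               // assumed problem size
    const size_t bytes = n * sizeof(float);

    float* h_in  = (float*)malloc(bytes);
    float* h_out = (float*)malloc(bytes);
    for (int i = 0; i < n; ++i) h_in[i] = (float)i;

    float* d_in  = nullptr;
    float* d_out = nullptr;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);

    // Launch a grid of blocks of threads; round up so every element is covered.
    const int threadsPerBlock = 128;  // assumed block size
    const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    apply_func<<<blocks, threadsPerBlock>>>(d_in, d_out, n);

    // Report launch errors, then copy back (the copy waits for the kernel to finish).
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("output[0] = %f, output[%d] = %f\n", h_out[0], n - 1, h_out[n - 1]);

    cudaFree(d_in);
    cudaFree(d_out);
    free(h_in);
    free(h_out);
    return 0;
}
```

The bounds check matters because the grid is rounded up to a whole number of blocks, so some threads in the last block have no element to process. The file can be built the same way as the Hello World example, with nvcc and an -arch value matching the installed GPU.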