Nvidia in bioinformatics

GPU ACCELERATION OF BIOINFORMATICS PIPELINES
Jonathan Cohen and Mark Berger, NVIDIA

Agenda
GPU Programming in 10 slides – Cohen (10 minutes)
GPUs for Bioinformatics – Cohen (10 minutes)
Experiences porting SeqAn to CUDA – Siragusa (15 minutes)
Resources – Berger (5 minutes)
Discussion, Q&A – All (20 minutes)

CUDA – Programming for Throughput
CPU threads:
Large amount of memory per thread
Full-featured instruction set
1-16 execute simultaneously
CUDA threads:
Lightweight footprint
Full-featured instruction set
10,000 execute simultaneously
CPU (host) executes functions: run a few threads, each one very fast
GPU (device) executes kernels: run many threads, each one relatively slow, so total throughput is high

CUDA Kernels: Parallel Threads
A kernel is an array of threads, executed in parallel
All threads execute the same code
Each thread has an ID, which it uses to:
Select input/output data
Make control decisions
float x = input[threadID];
float y = func(x);
output[threadID] = y;
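The three-line kernel body above can be made concrete with a small serial sketch in plain C (not CUDA). The loop in the hypothetical `launch` function stands in for the GPU running one thread per ID, `func` is an assumed placeholder (here it squares its input), and the bounds check shows the thread ID driving a control decision.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-element function standing in for func() on the slide. */
static float func(float x)
{
    return x * x;
}

/* Code run by every "thread". Its ID selects the input/output element and
   also drives a control decision: threads past the end of the data do nothing. */
static void kernel_body(size_t threadID, const float *input, float *output, size_t n)
{
    if (threadID >= n)
        return;
    float x = input[threadID];
    float y = func(x);
    output[threadID] = y;
}

/* Serial stand-in for a kernel launch with num_threads threads. On a GPU the
   iterations would run concurrently and in no guaranteed order, which is safe
   here because each thread writes only its own output element. */
void launch(size_t num_threads, const float *input, float *output, size_t n)
{
    assert(input != NULL && output != NULL);
    for (size_t tid = 0; tid < num_threads; tid++)
        kernel_body(tid, input, output, n);
}
```

Launching more threads than there are elements (for example 4 threads for 3 inputs) is harmless because the extra thread takes the early return.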

CUDA Kernels: Subdivide into Blocks

Threads are grouped into blocks

Blocks are grouped into a grid

A kernel is executed as a grid of blocks of threads
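With blocks, each thread forms a global index from its block and thread indices (`blockIdx.x * blockDim.x + threadIdx.x` in CUDA), and the grid is sized by ceiling division so that every element is covered. The plain-C sketch below emulates a 1-D grid serially; `grid_dim_for` and `emulate_grid_fill` are hypothetical helper names, and the nested loops only model what the hardware runs in parallel.

```c
#include <assert.h>
#include <stddef.h>

/* Number of blocks needed to cover n elements with block_dim threads per
   block (ceiling division); the last block may be only partly used. */
size_t grid_dim_for(size_t n, size_t block_dim)
{
    assert(block_dim > 0);
    return (n + block_dim - 1) / block_dim;
}

/* Serial emulation of a 1-D grid: each (block_idx, thread_idx) pair computes
   the global index a CUDA thread would form as
   blockIdx.x * blockDim.x + threadIdx.x. Returns how many threads did work. */
size_t emulate_grid_fill(int *out, size_t n, size_t block_dim)
{
    size_t grid_dim = grid_dim_for(n, block_dim);
    size_t active = 0;
    for (size_t block_idx = 0; block_idx < grid_dim; block_idx++) {
        for (size_t thread_idx = 0; thread_idx < block_dim; thread_idx++) {
            size_t i = block_idx * block_dim + thread_idx;
            if (i < n) {          /* guard: threads past the end do nothing */
                out[i] = (int)i;
                active++;
            }
        }
    }
    return active;
}
```

For example, 1,000 elements with 128 threads per block need 8 blocks; the last block has 24 idle threads that fail the `i < n` guard.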


Accelerated Computing
Multi-core plus many-core
CPU: optimized for serial tasks
GPU accelerator: optimized for many parallel tasks
Compared with the CPU: 3-10x+ compute throughput, 7x memory bandwidth, 5x energy efficiency

How GPU Acceleration Works
Application code:
Compute-intensive functions (about 5% of the code) run on the GPU
The rest of the sequential code runs on the CPU
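Moving 5% of the code to the GPU pays off only insofar as that code dominates the runtime. Amdahl's law makes this precise: if a fraction p of the runtime is accelerated by a factor s, the overall speedup is 1 / ((1 - p) + p / s). A minimal C sketch of the formula follows; the function name and example values are illustrative assumptions, not figures from the slide.

```c
#include <assert.h>

/* Amdahl's law: overall speedup when a fraction p of the *runtime* is made
   s times faster and the remaining (1 - p) is left unchanged. */
double amdahl_speedup(double p, double s)
{
    assert(p >= 0.0 && p <= 1.0 && s > 0.0);
    return 1.0 / ((1.0 - p) + p / s);
}
```

For instance, if the compute-intensive functions account for 90% of runtime (an assumed figure) and run 10x faster on the GPU, the whole application speeds up by only about 5.3x; the code left on the CPU sets the ceiling.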

Hello World in CUDA
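#include <cstdio>

__global__
void parallel_hello_world()
{
    // Each thread prints its own thread index and block index
    printf("Hello, world. This is thread %u, block %u!\n",
           threadIdx.x, blockIdx.x);
}

int main()
{
    // Launch 128 blocks of 128 threads each
    parallel_hello_world<<<128,128>>>();
    // Wait for the kernel to finish so device-side printf output is flushed
    cudaDeviceSynchronize();
    return 0;
}
> nvcc -o hello_world -arch=sm_30 main.cu
> ./hello_world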
Hello, world. This is thread 0, block 0!
Hello, world. This is thread 1, block 0!
...

Life Technologies
Ion Proton
3 GPUs per device
S3229 - GPU Accelerated Signal Processing in Ion Proton Whole Genome Sequencer
Mohit Gupta (Life Technologies)
Jakob Siegel (Life Technologies)
https://registration.gputechconf.com/form/session-listing

BGI & NVIDIA
Joint Innovation Lab
SOAP3 Aligner
S3257 - Tackling Big Data in Genomics with GPU
BingQiang Wang (Beijing Genomics Institute)

CUDASW++
From Bertil Schmidt’s group: http://cudasw.sourceforge.net/homepage.htm
Y. Liu, A. Wirawan, B. Schmidt: "CUDASW++ 3.0: accelerating Smith-Waterman protein database search by coupling CPU and GPU SIMD instructions". BMC Bioinformatics, 2013, 14:117.
Performance comparisons on the Swiss-Prot database:
“On GTX680 (GTX690), CUDASW++ 3.0 yields an average performance of 109.4 (169.7) GCUPS, with a maximum of 119.0 (185.6) GCUPS.”
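GCUPS counts billions of Smith-Waterman dynamic-programming cell updates per second, i.e. query length times database residues divided by (seconds x 10^9). The serial C sketch below computes a local alignment score with the classic recurrence and converts a run to GCUPS. It is deliberately simplified and is not CUDASW++'s method: the match/mismatch scores and the linear gap penalty are assumed values, whereas protein search tools typically use a substitution matrix such as BLOSUM62 with affine gap penalties.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumed toy scoring scheme, not CUDASW++'s parameters. */
#define SW_MATCH     3
#define SW_MISMATCH (-3)
#define SW_GAP       2   /* linear penalty per gap position */

static int max_int(int a, int b) { return a > b ? a : b; }

/* Best local alignment score of a and b. Keeps only two rows of the DP
   matrix, so memory is O(strlen(b)). Returns -1 on allocation failure. */
int sw_score(const char *a, const char *b)
{
    size_t m = strlen(a), n = strlen(b);
    int *prev = calloc(n + 1, sizeof *prev);   /* row i-1, starts as row 0 = zeros */
    int *curr = calloc(n + 1, sizeof *curr);   /* row i */
    if (prev == NULL || curr == NULL) {
        free(prev);
        free(curr);
        return -1;
    }
    int best = 0;
    for (size_t i = 1; i <= m; i++) {
        curr[0] = 0;
        for (size_t j = 1; j <= n; j++) {
            int s = (a[i - 1] == b[j - 1]) ? SW_MATCH : SW_MISMATCH;
            int h = max_int(0, prev[j - 1] + s);   /* diagonal: (mis)match */
            h = max_int(h, prev[j] - SW_GAP);      /* gap in b */
            h = max_int(h, curr[j - 1] - SW_GAP);  /* gap in a */
            curr[j] = h;
            best = max_int(best, h);
        }
        int *tmp = prev;                           /* current row becomes previous */
        prev = curr;
        curr = tmp;
    }
    free(prev);
    free(curr);
    return best;
}

/* Giga cell updates per second for one query searched against a database. */
double gcups(double query_len, double db_residues, double seconds)
{
    assert(seconds > 0.0);
    return query_len * db_residues / (seconds * 1e9);
}
```

As a hypothetical example, a 1,000-residue query searched against 10^9 database residues in 10 seconds corresponds to 100 GCUPS. GPU implementations gain their speed by computing many cells (or many database sequences) of this recurrence in parallel.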

NVIDIA GPU Life Science Focus
Molecular Dynamics: All codes are available
AMBER, CHARMM, DESMOND, DL_POLY,
GROMACS, LAMMPS, NAMD
Great multi-GPU performance
GPU codes: Abalone, ACEMD, HOOMD-Blue
Focus: scaling to large numbers of GPUs
Quantum Chemistry: key codes ported or optimizing
Active GPU acceleration projects:
VASP, NWChem, Gaussian, GAMESS, ABINIT,
Quantum Espresso, BigDFT, CP2K, GPAW, etc.
GPU code: TeraChem
Analytical and Medical Imaging Instruments

NVBIO
A GPU-based C++ framework for high-throughput sequence analysis:
Short & Long Read Alignment
Variant Calling
Compression
…
Overall Design:
flexibility & customizability – a templated library
parallelism at every level
optimize throughput, server-like design
optimize the whole pipeline, not just a single component
(e.g. including data transfers, SAM, BAM, CRAM I/O, …)

A modular library
FM-index
Tries: Suffix Trie, Radix Tree, Sorted Dictionary
DP Alignment: Edit Distance, Smith-Waterman, Needleman-Wunsch, Gotoh, Banded/Full DP
Text Search: Exact Search, Backtracking
Sequence I/O: FASTQ, FASTA
Alignment I/O: SAM, BAM, CRAM
Support Tools: HTML report generators
GPU: O(1k-10k) threads
CPU: O(10-100) threads
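The FM-index and exact-search entries rest on backward search: the pattern is processed from its last character to its first, narrowing a range of rows in the sorted-suffix (BWT) matrix with a count table C and a rank function Occ. The plain-C sketch below is a textbook version under simplifying assumptions (naive suffix sorting, linear-scan Occ, a '$' terminator); it is not NVBIO's implementation, and the names `fm_build`, `fm_count` and `fm_free` are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char  *bwt;      /* Burrows-Wheeler transform of text + '$' */
    size_t len;      /* strlen(text) + 1 */
    size_t C[256];   /* C[c] = number of characters in bwt smaller than c */
} fm_index;

static int cmp_suffix(const void *x, const void *y)
{
    return strcmp(*(const char *const *)x, *(const char *const *)y);
}

/* Builds the index by naively sorting all suffixes: fine for a demo, far too
   slow for a genome. Returns 0 on success, -1 on allocation failure. */
int fm_build(fm_index *fm, const char *text)
{
    assert(strchr(text, '$') == NULL);         /* '$' is reserved as terminator */
    size_t n = strlen(text) + 1;
    char *t = malloc(n + 1);
    const char **sa = malloc(n * sizeof *sa);
    fm->bwt = malloc(n + 1);
    if (t == NULL || sa == NULL || fm->bwt == NULL) {
        free(t); free(sa); free(fm->bwt); fm->bwt = NULL;
        return -1;
    }
    memcpy(t, text, n - 1);
    t[n - 1] = '$';                            /* sorts before A, C, G, T */
    t[n] = '\0';
    for (size_t i = 0; i < n; i++)
        sa[i] = t + i;                         /* suffix array as pointers */
    qsort(sa, n, sizeof *sa, cmp_suffix);
    for (size_t i = 0; i < n; i++) {
        size_t pos = (size_t)(sa[i] - t);
        fm->bwt[i] = (pos == 0) ? t[n - 1] : t[pos - 1];  /* char preceding suffix */
    }
    fm->bwt[n] = '\0';
    fm->len = n;
    size_t counts[256] = {0};
    for (size_t i = 0; i < n; i++)
        counts[(unsigned char)fm->bwt[i]]++;
    size_t sum = 0;
    for (int c = 0; c < 256; c++) {
        fm->C[c] = sum;
        sum += counts[c];
    }
    free(t);
    free(sa);
    return 0;
}

/* Occ(c, i): occurrences of c in bwt[0, i). A linear scan here; practical
   FM-indexes use sampled rank tables to answer this quickly. */
static size_t occ(const fm_index *fm, unsigned char c, size_t i)
{
    size_t k = 0;
    for (size_t j = 0; j < i; j++)
        k += ((unsigned char)fm->bwt[j] == c);
    return k;
}

/* Backward search: number of exact occurrences of pattern in the text. */
size_t fm_count(const fm_index *fm, const char *pattern)
{
    size_t lo = 0, hi = fm->len;               /* current row range [lo, hi) */
    for (size_t k = strlen(pattern); k-- > 0; ) {
        unsigned char c = (unsigned char)pattern[k];
        lo = fm->C[c] + occ(fm, c, lo);
        hi = fm->C[c] + occ(fm, c, hi);
        if (lo >= hi)
            return 0;
    }
    return hi - lo;
}

void fm_free(fm_index *fm)
{
    free(fm->bwt);
    fm->bwt = NULL;
}
```

Each pattern step costs two Occ queries regardless of text length once Occ is table-driven, which is why FM-index search suits aligners that run many independent read lookups, one per thread.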

nvBowtie2 - Real Datasets
Ion Proton, 100M x 175bp (8-350), end-to-end: speedup 4.3x, alignment rate +0.5%, disagreement 0.002%
Illumina Genome Analyzer II (ERR161544), 10M x 100bp x 2, end-to-end: speedup 2.4x, alignment rate unchanged, disagreement 0.006%
Ion Proton, 100M x 175bp (8-350), local: speedup 7.6x, alignment rate -0.6%, disagreement 0.03%
Illumina Genome Analyzer II (ERR161544), 10M x 100bp x 2, local: speedup 2.6x, alignment rate unchanged, disagreement 0.022%

TT32
NVBIO: efficient sequence analysis on GPUs
Jacopo Pantaleoni
Tuesday 2:10 pm, Hall 9

GPU Technology Conference
Tag: “Bioinformatics and Genomics”
http://www.gputechconf.com/page/home.html
Google: “GPU Technology Conference”

3 Ways to Accelerate Applications
Libraries: "Drop-in" acceleration
OpenACC directives: easily accelerate applications
Programming languages: maximum flexibility

GPU Accelerated Libraries
“Drop-in” Acceleration for your Applications
Linear Algebra (FFT, BLAS, SPARSE, Matrix): NVIDIA cuFFT, cuBLAS, cuSPARSE
Numerical & Math (RAND, Statistics): NVIDIA Math Lib, NVIDIA cuRAND
Data Struct. & AI (Sort, Scan, Zero Sum): GPU AI – Board Games, GPU AI – Path Finding
Visual Processing (Image & Video): NVIDIA NPP, NVIDIA Video Encode
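A scan (prefix sum) is one of the basic data-parallel building blocks in the list above; C++ libraries such as Thrust expose it as a single call (`thrust::exclusive_scan`). The plain-C sketch below emulates the work-efficient Blelloch up-sweep/down-sweep scheme serially. It is an illustration of the algorithm, not any library's implementation; the function name is hypothetical and it assumes a power-of-two length.

```c
#include <assert.h>
#include <stddef.h>

/* In-place exclusive prefix sum using the Blelloch up-sweep/down-sweep
   scheme. n must be a power of two. Within each level, the iterations of the
   inner loop are independent, so on a GPU each would be one thread. */
void blelloch_exclusive_scan(int *a, size_t n)
{
    assert(n > 0 && (n & (n - 1)) == 0);
    /* Up-sweep: build partial sums in a balanced binary tree. */
    for (size_t d = 1; d < n; d *= 2)
        for (size_t i = 2 * d - 1; i < n; i += 2 * d)
            a[i] += a[i - d];
    /* Down-sweep: clear the root, then push prefixes back down the tree. */
    a[n - 1] = 0;
    for (size_t d = n / 2; d >= 1; d /= 2) {
        for (size_t i = 2 * d - 1; i < n; i += 2 * d) {
            int left = a[i - d];
            a[i - d] = a[i];
            a[i] += left;
        }
    }
}
```

For example, scanning {3, 1, 7, 0, 4, 1, 6, 3} yields {0, 3, 4, 11, 11, 15, 16, 22}: each output is the sum of all inputs before it, computed in O(log n) parallel steps with O(n) total work.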

OpenACC: Open, Simple, Portable
• Open Standard
• Easy, Compiler-Driven Approach
• Portable on GPUs and Xeon Phi
int main(void)
{
    /* ... serial code ... */

    #pragma acc kernels  /* compiler hint: offload this region */
    {
        /* ... compute-intensive code ... */
    }

    /* ... */
    return 0;
}
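Filling in the skeleton, a loop such as SAXPY (y = a*x + y) needs only one directive. This C sketch assumes an OpenACC-capable compiler to actually offload the loop; other compilers ignore the `acc` pragma and run the loop serially, which is what keeps the source portable. The function name, the `copyin`/`copy` data clauses and the `restrict` qualifiers (which tell the compiler the arrays do not alias, helping it prove the loop is parallel) are choices made for this example, not taken from the slide.

```c
#include <assert.h>
#include <stddef.h>

/* y = a*x + y. With OpenACC enabled, the kernels directive asks the compiler
   to generate an accelerator kernel for the loop, copying x to the device and
   y to and from it. Without OpenACC, the pragma is ignored. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    assert(x != NULL && y != NULL);
    #pragma acc kernels copyin(x[0:n]) copy(y[0:n])
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```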
CAM-SE Climate
6x Faster on GPU
2x Faster on CPU only
Top Kernel: 50% of Runtime
Available from:

GPU Programming Languages
Fortran: OpenACC, CUDA Fortran
C: OpenACC, CUDA C
C++: Thrust, CUDA C++
Python: PyCUDA, Anaconda Accelerate
C#: GPU.NET
Numerical analytics: R, MATLAB, Mathematica, LabVIEW

Reaching New Developers - CUDA Python
Python Productivity + GPU Performance
Easy to Learn
Powerful Libraries
Popular with New Developers
HPC & Data Analytics
Data from CodeEval.com, based on 100k+ code samples

Easiest Way to Learn CUDA
50K registered, 127 countries
Learn from the best
Anywhere, any time
It’s free!
Engage with an active community
