SeqAn (www.seqan.de) is an open-source C++ template library (BSD license) that implements efficient, generic data structures and algorithms for Next-Generation Sequencing (NGS) analysis. It contains gapped k-mer indices, enhanced suffix arrays (ESA) and an FM-index, as well as algorithms for fast and accurate alignment and read mapping. Based on those data types and fast I/O routines, users can develop tools that are efficient and easy to maintain. Besides multi-core support, the research team at Freie Universität Berlin has started adding generic support for accelerators such as NVIDIA GPUs. Go through the slides to learn more. For your own bioinformatics development you can try GPUs for free here: www.Nvidia.com/GPUTestDrive
Introduction to SeqAn, an Open-source C++ Template Library
1. Test Drive NVIDIA GPUs!
Experience The Acceleration
Develop your codes on latest
GPUs today
Sign up for FREE GPU Test Drive
on remotely hosted clusters
www.nvidia.com/GPUTestDrive
2. Prof. Dr. Knut Reinert
Algorithmische Bioinformatik, FB Mathematik und Informatik
Intro to SeqAn
An Open-Source C++ template library
for biological sequence analysis
Knut Reinert, David Weese
Freie Universität Berlin
Institute for Computer Science
4. ~ 15 years ago...
Data volume and cost:
In 2000 the 3 billion base pairs of the
human genome were sequenced for
about 3 billion US dollars
100 million bp per day
Nvidia Webinar, 22.10.2013
5. Sequencing today...
Illumina HiSeq
100 billion bp per day
Within roughly ten years sequencing has
become about 10 million times cheaper
6. Future of NGS data analysis
8. SeqAn
Now SeqAn/SeqAn tools have been cited more than 360 times
SeqAn is under BSD license and hence free for academic AND commercial use.
Among the institutions are (omitting German institutes):
Department of Genetics, Harvard Medical School, Boston,
European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton,
J. Craig Venter Institute, Rockville MD, USA,
Department of Molecular Biology, Princeton University,
Applied Mathematics Program, Yale University, New Haven,
IBM T.J. Watson Research Center, Yorktown Heights,
The Ohio State University, Columbus, University of Minnesota,
Australian National University, Canberra,
Department of Statistics, University of Oxford,
Swedish University of Agricultural Sciences (SLU), Uppsala,
Graduate School of Life Sciences, University of Cambridge,
Broad Institute, Cambridge, USA,
EMBL-EBI, University of California, University of Chicago,
Iowa State University, Ames, The Pennsylvania State University,
Peking University, Beijing, University of Science and Technology of China,
BGI-Shenzhen, China, Beijing Institute of Genomics……
30. SeqAn SDK Components
Review Board to ensure code quality
CDash/CTest to automatically compile and test across platforms
Code coverage reports
32. Unified Alignment Algorithms
Versatile & Extensible DP-Interface
For Example ...
Standard DP-Algorithms
Global & Semi Global Alignments
Local Alignments
Modified DP-Algorithms
Split Breakpoint Detection
Banded Chain Alignment
33. Unified Alignment Algorithms
For Example ...
Needleman-Wunsch with Traceback:
DPProfile<GlobalAlignment<>, LinearGaps, TracebackOn<> >
Semi-Global Gotoh without Traceback:
DPProfile<GlobalAlignment<FreeEndGaps<True, False, True, False> >,
AffineGaps, TracebackOff>
Banded Smith-Waterman with Affine Gap Costs:
DPBand<BandOn>(lowerDiag, upperDiag),
DPProfile<LocalAlignment<>, AffineGaps, TracebackOn<> >
Split-Breakpoint Detection for Right Anchor:
DPProfile<SplitAlignment<>, AffineGaps, TracebackOn<GapsRight> >
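As a point of reference for the first profile above: Needleman-Wunsch with linear gap costs fills a table in which each cell is the best of a diagonal (match/mismatch), vertical (gap) or horizontal (gap) step. The sketch below is plain standard C++ and does not use SeqAn's DP interface; the LinearScore struct, its values and the function name are made up for illustration, and only the score is computed (no traceback).

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical scoring parameters, chosen only for illustration.
struct LinearScore {
    int match = 1;
    int mismatch = -1;
    int gap = -1;
};

// Global alignment score with linear gap costs, keeping two rows of the DP table.
int globalAlignmentScore(const std::string& a, const std::string& b,
                         const LinearScore& s = LinearScore())
{
    std::vector<int> prev(b.size() + 1), curr(b.size() + 1);
    // First row: aligning a prefix of b against the empty string costs only gaps.
    for (std::size_t j = 0; j <= b.size(); ++j)
        prev[j] = static_cast<int>(j) * s.gap;

    for (std::size_t i = 1; i <= a.size(); ++i) {
        curr[0] = static_cast<int>(i) * s.gap;
        for (std::size_t j = 1; j <= b.size(); ++j) {
            int diag = prev[j - 1] + (a[i - 1] == b[j - 1] ? s.match : s.mismatch);
            int up = prev[j] + s.gap;        // gap in b
            int left = curr[j - 1] + s.gap;  // gap in a
            curr[j] = std::max(diag, std::max(up, left));
        }
        std::swap(prev, curr);
    }
    return prev[b.size()];
}
```

With these example scores, aligning "ACGT" against "AGT" gives three matches and one gap.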
34. Support for Common File Formats
Important file formats for HTS analysis
Sequences
FASTA, FASTQ
Indexed FASTA (FAI) for random access
Genomic Features
GFF 2, GFF 3, GTF, BED
Read Mapping
SAM, BAM (plus BAM indices)
Variants
VCF
SequenceStream ss("file.fa.gz");
while (!atEnd(ss))
{
    readRecord(id, seq, ss);
    cout << id << '\t' << seq << '\n';
}
BamStream bs("file.bam");
while (!atEnd(bs))
{
    readRecord(record, bs);
    cout << record.qName << '\t' << record.pos << '\n';
}
… or write your own parser
Tutorials and helper routines for writing your own parsers.
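To give an idea of what "write your own parser" can mean at its simplest, here is a minimal FASTA reader in plain standard C++. It is a sketch only: FastaRecord and readFasta are hypothetical names, it does not use SeqAn's helper routines, and it assumes uncompressed input in which every record starts with a '>' header line.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// One FASTA entry: the header text after '>' and the concatenated sequence lines.
struct FastaRecord {
    std::string id;
    std::string seq;
};

// Reads all records from an uncompressed FASTA stream.
std::vector<FastaRecord> readFasta(std::istream& in)
{
    std::vector<FastaRecord> records;
    std::string line;
    while (std::getline(in, line)) {
        // Tolerate Windows line endings.
        if (!line.empty() && line.back() == '\r')
            line.pop_back();
        if (line.empty())
            continue;
        if (line[0] == '>')
            records.push_back(FastaRecord{line.substr(1), std::string()});
        else if (!records.empty())
            records.back().seq += line;  // sequence may span several lines
    }
    return records;
}
```

Any std::istream works as input, for example a std::ifstream or a std::istringstream in a unit test.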
36. Fragment Store
(Multi) Read Alignments
Read alignments can be easily imported:
std::ifstream file("ex1.sam");
read(file, store, Sam());
… and accessed as a multiple alignment, e.g. for visualization:
AlignedReadLayout layout;
layoutAlignment(layout, store);
printAlignment(svgFile, Raw(), layout, store, 1, 0, 150, 0, 36);
37. Unified Full-Text Indexing Framework
Available Indices
Suffix Trees:
• suffix array
• enhanced suffix array
• lazy suffix tree
Prefix Trie:
• FM-index
q-Gram Indices:
• direct addressing
• open addressing
• gapped
Index<TSeq, IndexEsa<> >
Index<StringSet<TSeq>, FMIndex<> >
All indices support multiple strings and external memory construction/usage.
Index Lookup Interface
All indices support the (sequential) find interface:
Finder<TIndex> finder(index);
while (find(finder, "TATAA"))
    cout << "Hit at position " << position(finder) << endl;
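The idea behind a q-gram index with direct addressing can be sketched in a few lines: every q-gram over {A,C,G,T} is hashed to a number in [0, 4^q), and that number directly addresses a bucket of text positions. The code below is an illustration in standard C++, not SeqAn's implementation; all names (QGramIndex, buildQGramIndex, findQGram) are assumptions, and q-grams containing other characters are simply skipped.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Maps A, C, G, T to 0..3 and anything else to -1.
inline int dnaRank(char c)
{
    switch (c) {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        case 'T': return 3;
        default:  return -1;
    }
}

struct QGramIndex {
    std::size_t q;
    std::vector<std::vector<std::size_t> > buckets;  // 4^q buckets of text positions
};

// Computes the base-4 hash of s[pos, pos + q); returns false on a non-ACGT character.
inline bool hashQGram(const std::string& s, std::size_t pos, std::size_t q, std::size_t& hash)
{
    hash = 0;
    for (std::size_t i = 0; i < q; ++i) {
        int r = dnaRank(s[pos + i]);
        if (r < 0)
            return false;
        hash = hash * 4 + static_cast<std::size_t>(r);
    }
    return true;
}

QGramIndex buildQGramIndex(const std::string& text, std::size_t q)
{
    QGramIndex index{q, std::vector<std::vector<std::size_t> >(std::size_t(1) << (2 * q))};
    for (std::size_t pos = 0; pos + q <= text.size(); ++pos) {
        std::size_t h;
        if (hashQGram(text, pos, q, h))
            index.buckets[h].push_back(pos);
    }
    return index;
}

// Returns all start positions of the given q-gram (empty if its length differs from q).
std::vector<std::size_t> findQGram(const QGramIndex& index, const std::string& qgram)
{
    std::size_t h;
    if (qgram.size() != index.q || !hashQGram(qgram, 0, index.q, h))
        return std::vector<std::size_t>();
    return index.buckets[h];
}
```

The table grows as 4^q, which is why direct addressing suits small q; open addressing trades that table for a hash map.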
40. Masai read mapper
Reads
Genome: Chr. 1, Chr. 2, Chr. X
ACGCTTCATCGCCCT…
Index of reads (Radix tree of seeds)
Index of genome (e.g. FM-index)
Algorithm is based on the simultaneous traversal of two string indices (e.g., FM-index, enhanced suffix array, lazy suffix tree)
41. Read Mapping: Masai
Faster and more accurate than BWA and Bowtie2
Timings on a single core
43. Collaboration to parallelize indices and
verification algorithms in SeqAn, to speed up any
applications making use of indices
What about multi-core implementation?
44. SeqAn going parallel
GOAL
Parallelize the finder interface of SeqAn
so it works on CPU and accelerators like GPU
Will be replaced by hg18 and 10 million 20-mers
46. SeqAn going parallel : NVIDIA GPUs
Copy needles and index to GPU
SAME count function as on CPU!
47. SeqAn going parallel
Count occurrences of 10 million 20-mers in the human genome using an FM-index
Platform                            Time      Speedup
I7, 3.2 GHz                         18.6 s    1 X
…12...                              2.66 s    7 X
Intel Xeon Phi 7120, 244 threads    2.18 s    8.5 X
NVIDIA Tesla K20                    0.4 s     47 X
48. SeqAn going parallel
Approx. count occurrences of 1.2 million 33-mers in the human genome using an FM-index
Platform                            Time      Speedup
I7, 3.2 GHz                         66.1 s    1 X
…12...                              9.0 s     7.3 X
Intel Xeon Phi 7120, 244 threads    3.9 s     16.9 X
NVIDIA Tesla K20                    3.2 s     20.7 X
49. Part II: The details
51. CUDA preliminaries
In order to use CUDA we first had to adapt some parts of SeqAn:
• CUDA requires each function to be prefixed with domain qualifiers __host__ or __device__ in order to generate CPU/GPU code
• We prefixed all basic template functions with a SEQAN_HOST_DEVICE macro
#ifdef __CUDACC__
#define SEQAN_HOST_DEVICE inline __device__ __host__
#else
#define SEQAN_HOST_DEVICE inline
#endif
• Static const arrays are not allowed in the way SeqAn defines them
• We replaced alphabet conversion lookup tables (e.g. Dna <--> char) by conversion functions
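A conversion function that replaces a lookup table might look roughly like the following. This is a sketch, not SeqAn's code: the macro name EXAMPLE_HOST_DEVICE and both function names are invented here, and mapping unknown characters to rank 0 is an arbitrary assumption. Compiled with a plain C++ compiler the macro expands to inline; under nvcc the same functions would also be generated for the device.

```cpp
// Same pattern as the macro on the slide, under a hypothetical name.
#ifdef __CUDACC__
#define EXAMPLE_HOST_DEVICE inline __host__ __device__
#else
#define EXAMPLE_HOST_DEVICE inline
#endif

// char -> Dna rank (A=0, C=1, G=2, T=3) without a static lookup table.
// Assumption for this sketch: any other character maps to 0.
EXAMPLE_HOST_DEVICE unsigned char charToDnaRank(char c)
{
    switch (c) {
        case 'C': case 'c': return 1;
        case 'G': case 'g': return 2;
        case 'T': case 't': return 3;
        default:            return 0;
    }
}

// Dna rank -> char; only the two lowest bits of the rank are used.
EXAMPLE_HOST_DEVICE char dnaRankToChar(unsigned char rank)
{
    switch (rank & 3) {
        case 0:  return 'A';
        case 1:  return 'C';
        case 2:  return 'G';
        default: return 'T';
    }
}
```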
52. Strings
• Instead of defining a new CUDA string we simply use the Thrust library:
• Provides host_vector and device_vector classes, which are vectors with buffers in host or device memory
• However, Thrust functions are callable only from host-side
• We made both vectors accessible from SeqAn
• SeqAn strings have to provide a set of global (meta-)functions, e.g. Value<>, resize(), …
• We simply defined the required wrapper functions for these two vectors
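The wrapper idea can be illustrated without Thrust by adapting std::vector in the same way. The sketch below is an analogy under stated assumptions: it lives in a made-up namespace sketch, implements only Value<>, length() and resize(), and the generic function fillWithIndices exists only to show that an algorithm written against the global interface then works on the adapted container.

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

namespace sketch {

// Metafunction: element type of a container (specialized per container).
template <typename TContainer>
struct Value;

template <typename T, typename TAlloc>
struct Value<std::vector<T, TAlloc> > {
    typedef T Type;
};

// Global wrapper functions forwarding to the member functions.
template <typename T, typename TAlloc>
inline std::size_t length(const std::vector<T, TAlloc>& v) { return v.size(); }

template <typename T, typename TAlloc>
inline void resize(std::vector<T, TAlloc>& v, std::size_t n) { v.resize(n); }

// A generic algorithm that only relies on Value<>, resize() and length().
template <typename TContainer>
void fillWithIndices(TContainer& c, std::size_t n)
{
    typedef typename Value<TContainer>::Type TValue;
    resize(c, n);
    for (std::size_t i = 0; i < length(c); ++i)
        c[i] = static_cast<TValue>(i);
}

}  // namespace sketch
```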
53. Standard Strings
• Up to here, all strings can only be used on the side of their scope
[Diagram labels: Host Memory, Device Memory, thrust::host_vector, thrust::device_vector, seqan::String, Buffer]
54. Host-Device String
• How to access a device_vector from device-side?
• We could pass (POD) iterators to the kernel
• However, many SeqAn algorithms work on more complex containers
• We need the same interface of the container on the device side
• For strings we developed a so-called ContainerView (POD type)
• Provides a container interface given the begin/end pointers of vector buffer
• The view() function creates the ContainerView object for a given device_vector
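A view of this kind can be sketched on the host with std::vector standing in for device_vector. The struct below is not SeqAn's ContainerView; ContainerViewSketch and the free functions around it are hypothetical names. It only demonstrates the principle: a trivially copyable object holding begin/end pointers, which offers a container-like interface on a buffer it does not own.

```cpp
#include <algorithm>
#include <cstddef>
#include <type_traits>
#include <vector>

namespace sketch {

// Non-owning view: two raw pointers into someone else's buffer.
template <typename T>
struct ContainerViewSketch {
    T* begin_;
    T* end_;
    T& operator[](std::size_t i) const { return begin_[i]; }
};

template <typename T>
inline std::size_t length(const ContainerViewSketch<T>& v)
{
    return static_cast<std::size_t>(v.end_ - v.begin_);
}

// Works on a copy of the view, yet modifies the underlying buffer.
template <typename T>
inline void reverse(ContainerViewSketch<T> v)
{
    std::reverse(v.begin_, v.end_);
}

// Creates a view of a vector's buffer; the vector must outlive the view.
template <typename T>
inline ContainerViewSketch<T> view(std::vector<T>& owner)
{
    return ContainerViewSketch<T>{owner.data(), owner.data() + owner.size()};
}

// The view can be copied bytewise, which is what passing it to a kernel needs.
static_assert(std::is_trivially_copyable<ContainerViewSketch<char> >::value,
              "view must be trivially copyable");
static_assert(std::is_standard_layout<ContainerViewSketch<char> >::value,
              "view must have standard layout");

}  // namespace sketch
```

Passing the view by value copies only the two pointers, so changes made through it are visible in the owning vector.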
55. Host-Device String
• How to use a device_vector on the device
[Diagram labels: Host Memory, Device Memory, Buffer, thrust::device_vector, view(), seqan::ContainerView, kernel launch, seqan::ContainerView]
56. Device and View metafunctions
• For generic GPU programming:
• The Device metafunction returns the device-memory equivalent of a class
// Replaces String with thrust::device_vector.
template <typename TValue, typename TSpec>
struct Device<String<TValue, TSpec> >
{
    typedef thrust::device_vector<TValue> Type;
};
• The View metafunction returns the (POD) view type of a class
// Returns a view type that can be passed to a CUDA kernel.
template <typename TValue, typename TAlloc>
struct View<thrust::device_vector<TValue, TAlloc> >
{
    typedef ContainerView<thrust::device_vector<TValue, TAlloc> > Type;
};
57. Hello world
• A simple example to reverse a string on the GPU
// A standard SeqAn string over the Dna alphabet.
String<Dna> myString = "ACGT";

// A Dna string on device global memory.
typename Device<String<Dna> >::Type myDeviceString;

// Copy the string to global memory.
assign(myDeviceString, myString);

// Pass a view of the device string to the CUDA kernel.
myKernel<<<1,1>>>(view(myDeviceString));

// TString is ContainerView<device_vector<Dna> >.
template <typename TString>
__global__ void myKernel(TString string)
{
    printf("length(string) = %d\n", length(string));
    reverse(string);
}
58. Porting complex data structures
• More complex structures (e.g. Index, Graph) can only be ported to the GPU if they …
• don’t use pointers
• use only strings of POD types (String<Dna>, but not String<String<…> >)
• use only 1-dimensional StringSets (ConcatDirect)
• Nested classes are no problem
• View metafunction converts all member types into their view types
• view() function is called recursively on all members
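How a View metafunction and a recursive view() function can cooperate on a nested class is sketched below in standard C++, again with std::vector in place of device_vector. Everything here (PointerView, TinyIndex, the namespace) is hypothetical and serves only to illustrate the mechanism: the view type of a composite is the same composite instantiated with the view types of its members.

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

namespace sketch {

// Leaf view: begin/end pointers into a buffer owned elsewhere.
template <typename T>
struct PointerView {
    T* begin_;
    T* end_;
};

// Metafunction returning the view type of a class.
template <typename T>
struct View;

template <typename T>
struct View<std::vector<T> > {
    typedef PointerView<T> Type;
};

template <typename T>
PointerView<T> view(std::vector<T>& v)
{
    return PointerView<T>{v.data(), v.data() + v.size()};
}

// A nested structure whose members are string-like containers.
template <typename TText, typename TTable>
struct TinyIndex {
    TText text;
    TTable table;
};

// The view type of the composite converts every member type into its view type.
template <typename TText, typename TTable>
struct View<TinyIndex<TText, TTable> > {
    typedef TinyIndex<typename View<TText>::Type, typename View<TTable>::Type> Type;
};

// view() on the composite calls view() on each member.
template <typename TText, typename TTable>
typename View<TinyIndex<TText, TTable> >::Type view(TinyIndex<TText, TTable>& index)
{
    typename View<TinyIndex<TText, TTable> >::Type result = {view(index.text), view(index.table)};
    return result;
}

}  // namespace sketch
```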
63. The FM-index in SeqAn
• The FM-index can be implemented using a number of string-based lookup tables
• ... as well as other indices, e.g. enhanced suffix array, q-gram index
• There is a space-time tradeoff between all these indices
• The FM index has the minimal memory requirements
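To make "string-based lookup tables" concrete, here is a deliberately naive FM-index in standard C++ that counts exact occurrences by backward search. It is a teaching sketch, not SeqAn's FMIndex: the names are invented, the Burrows-Wheeler transform is built by sorting all suffixes, and the occurrence counts are recomputed by scanning instead of being sampled, so it only suits tiny texts. It assumes the text does not contain the sentinel '$' or any character smaller than it.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

namespace sketch {

struct TinyFmIndex {
    std::string bwt;                       // Burrows-Wheeler transform of text + '$'
    std::array<std::size_t, 256> smaller;  // smaller[c]: number of characters < c
};

TinyFmIndex buildTinyFmIndex(std::string text)
{
    text.push_back('$');  // sentinel, assumed smaller than all text characters
    std::vector<std::size_t> sa(text.size());
    std::iota(sa.begin(), sa.end(), std::size_t(0));
    std::sort(sa.begin(), sa.end(), [&text](std::size_t a, std::size_t b) {
        return text.compare(a, std::string::npos, text, b, std::string::npos) < 0;
    });

    TinyFmIndex index;
    for (std::size_t pos : sa)  // character preceding each sorted suffix
        index.bwt.push_back(text[(pos + text.size() - 1) % text.size()]);

    std::array<std::size_t, 256> counts{};
    for (unsigned char c : text)
        ++counts[c];
    std::size_t sum = 0;
    for (std::size_t c = 0; c < 256; ++c) {
        index.smaller[c] = sum;
        sum += counts[c];
    }
    return index;
}

// Number of occurrences of c in bwt[0, end); a real index would sample this.
inline std::size_t occ(const TinyFmIndex& index, char c, std::size_t end)
{
    return static_cast<std::size_t>(
        std::count(index.bwt.begin(), index.bwt.begin() + static_cast<std::ptrdiff_t>(end), c));
}

// Backward search: processes the pattern from its last to its first character.
std::size_t countOccurrences(const TinyFmIndex& index, const std::string& pattern)
{
    std::size_t lo = 0, hi = index.bwt.size();
    for (std::string::const_reverse_iterator it = pattern.rbegin();
         it != pattern.rend() && lo < hi; ++it) {
        const unsigned char c = static_cast<unsigned char>(*it);
        lo = index.smaller[c] + occ(index, *it, lo);
        hi = index.smaller[c] + occ(index, *it, hi);
    }
    return lo < hi ? hi - lo : 0;
}

}  // namespace sketch
```

Only the BWT string and one small table are stored, which is the intuition behind the low memory footprint; practical implementations add sampled occurrence tables and a sampled suffix array.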
64. A generic FM-index
• SeqAn's FM-index consists of some nested classes storing Strings
[Diagram: FM-index (host-only)]
65. A generic FM-index
• The Device type of the FM index uses device_vector instead of String
[Diagram: GPU FM-index (host-part)]
• The view of this object (= device-part) is the same tree, where leaves are replaced by ContainerViews of device_vectors
66. CPU vs. GPU
• Invoking an FM-index based search on CPU and GPU:
The findGPU kernel AND the findCPU function will invoke many instances of the SAME generic function which will perform a backtracking algorithm on our generic index interface
// Select the index type.
typedef Index<DnaString, FMIndex<> > TIndex;

// Type is Index<device_vector<Dna>, FMIndex<> >.
typedef typename Device<TIndex>::Type TDeviceIndex;

// ======== On CPU ========
// Create an index.
TIndex index("ACGTTGCAA");

// Use the FM-index on CPU.
findCPU(index,…);

template <typename TIndex>
void
findCPU(TIndex & index,…);

// ========== On GPU ===========
// Create a device index.
TIndex index("ACGTTGCAA");
TDeviceIndex deviceIndex;
assign(deviceIndex, index);

// Use the FM-index in a CUDA kernel.
findGPU<<<...>>>(view(deviceIndex),…);
template <typename TIndex>
__global__ void
findGPU(TIndex index,…);
67. Approximate search via backtracking
do {
    if (finder.score == finder.scoreThreshold)
    {
        if (goDown(textIt, suffix(pattern, patternIt))) delegate(finder);
        goUp(textIt);
        if (isRoot(textIt)) break;
    }
    else if (finder.score < finder.scoreThreshold)
    {
        if (atEnd(patternIt)) delegate(finder);
        else if (goDown(textIt))
        {
            finder.score += parentEdgeLabel(textIt) != value(patternIt);
            goNext(patternIt);
            continue;
        }
    }

    do {
        goPrevious(patternIt);
        finder.score -= parentEdgeLabel(textIt) != value(patternIt);
    } while (!goRight(textIt) && goUp(textIt));
    if (isRoot(textIt)) break;
    finder.score += parentEdgeLabel(textIt) != value(patternIt);
    goNext(patternIt);
}
while (true);
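The same backtracking idea can be written recursively in a few lines of standard C++. The sketch below is not the iterative SeqAn code above: it simulates the suffix trie with a plain sorted suffix array, a node being the range of suffixes that share the current prefix, and it counts occurrences within a given Hamming distance. Function names are hypothetical, and the alphabet is assumed to be {A,C,G,T}.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

namespace sketch {

// Suffix start positions in lexicographic order (naive construction).
std::vector<int> buildSuffixArray(const std::string& text)
{
    std::vector<int> sa(text.size());
    std::iota(sa.begin(), sa.end(), 0);
    std::sort(sa.begin(), sa.end(), [&text](int a, int b) {
        return text.compare(a, std::string::npos, text, b, std::string::npos) < 0;
    });
    return sa;
}

// Counts suffixes in sa[lo, hi) whose next characters match pattern[depth..]
// with at most `budget` further mismatches. All suffixes in the range share
// their first `depth` characters, so they are sorted by the character at `depth`.
std::size_t countApprox(const std::string& text, const std::vector<int>& sa,
                        std::size_t lo, std::size_t hi, std::size_t depth,
                        const std::string& pattern, int budget)
{
    if (depth == pattern.size())
        return hi - lo;
    // Character of suffix s at offset depth, or -1 past the end of the text.
    auto charAt = [&](int s) -> int {
        return s + depth < text.size() ? text[s + depth] : -1;
    };
    std::size_t total = 0;
    for (char c : std::string("ACGT")) {
        // "Go down" the edge labelled c: narrow the range by binary search.
        auto first = std::lower_bound(sa.begin() + lo, sa.begin() + hi, c,
                                      [&](int s, char ch) { return charAt(s) < ch; });
        auto last = std::upper_bound(first, sa.begin() + hi, c,
                                     [&](char ch, int s) { return ch < charAt(s); });
        if (first == last)
            continue;
        const int cost = (c != pattern[depth]) ? 1 : 0;
        if (cost > budget)
            continue;  // prune: this branch would exceed the error threshold
        total += countApprox(text, sa, first - sa.begin(), last - sa.begin(),
                             depth + 1, pattern, budget - cost);
    }
    return total;
}

std::size_t countWithMismatches(const std::string& text, const std::string& pattern,
                                int maxMismatches)
{
    const std::vector<int> sa = buildSuffixArray(text);
    return countApprox(text, sa, 0, sa.size(), 0, pattern, maxMismatches);
}

}  // namespace sketch
```

The recursion makes the pruning explicit: a branch is abandoned as soon as its mismatch count would exceed the threshold, which is what the score comparisons in the iterative version achieve.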
68. Outlook for GPU support
• Our next steps are:
• Provide parallelFor() to hide CUDA kernel call/OpenMP for-loop
• Develop classes for concurrent access (String, job queues)
• Port more indices and index iterators to be used with CUDA
• Port SeqAn's alignment module
• Develop a CPU/GPU version of the FM-index based read mapper Masai
• ...
• Follow our development:
• Sources: https://github.com/seqan/seqan/tree/develop
• Code examples: http://trac.seqan.de/wiki/HowTo/DevelopCUDA
70. Multicore parallelization
• We first introduced Tags to switch between serial and parallel algorithms:
struct Serial_;
typedef Tag<Serial_> Serial;

struct Parallel_;
typedef Tag<Parallel_> Parallel;
• Then we defined basic atomic operations required for thread safety:
template <typename T>
inline T atomicInc(T &x, Serial)
{
    return ++x;
}

template <typename T>
inline T atomicInc(volatile T &x, Parallel)
{
    // Return the incremented value, as the Serial overload does.
    return __sync_add_and_fetch(&x, 1);
}
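The same tag-dispatch pattern can be expressed with the C++11 standard library instead of the GCC builtin. This is a sketch in a made-up namespace, with plain empty structs as tags and std::atomic as the thread-safe counter type, not the SeqAn definitions shown above.

```cpp
#include <atomic>

namespace sketch {

// Empty tag types select the overload at compile time.
struct Serial {};
struct Parallel {};

// Serial version: a plain increment is sufficient.
template <typename T>
inline T atomicInc(T& x, Serial)
{
    return ++x;
}

// Parallel version: atomic read-modify-write; returns the incremented value.
template <typename T>
inline T atomicInc(std::atomic<T>& x, Parallel)
{
    return x.fetch_add(1) + 1;
}

}  // namespace sketch
```

Because the tag is an ordinary function argument, a generic algorithm can forward it and thereby choose the serial or the thread-safe increment without any runtime branch.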
71. Splitter
• To this end, we developed the Splitter<TValue, TSpec> to compute a partition into subintervals of (almost) equal length …
Splitter<unsigned> splitter(10, 20, 3);
for (unsigned i = 0; i < length(splitter); ++i)
    cout << '[' << splitter[i] << ',' << splitter[i+1] << ')' << endl;

// [10,14)
// [14,17)
// [17,20)
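One way to compute such a partition is to give every part the base length and hand the remainder, one element each, to the leading parts. The function below is a standalone sketch with a hypothetical name, not the Splitter class itself; the choice to place the longer intervals first is an assumption made to match the output shown above.

```cpp
#include <cstddef>
#include <vector>

namespace sketch {

// Returns parts + 1 boundaries b[0] <= ... <= b[parts] with b[0] = begin and
// b[parts] = end; interval i is [b[i], b[i+1]). Lengths differ by at most one.
inline std::vector<std::size_t> splitInterval(std::size_t begin, std::size_t end,
                                              std::size_t parts)
{
    std::vector<std::size_t> bounds;
    if (parts == 0 || end < begin)
        return bounds;  // nothing sensible to return for invalid input

    const std::size_t len = end - begin;
    const std::size_t base = len / parts;   // minimum length of each part
    const std::size_t extra = len % parts;  // this many parts get one more element

    bounds.push_back(begin);
    for (std::size_t i = 0; i < parts; ++i) {
        begin += base + (i < extra ? 1 : 0);
        bounds.push_back(begin);
    }
    return bounds;
}

}  // namespace sketch
```

For the interval [10, 20) and 3 parts the lengths are 4, 3 and 3.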
72. Splitter
• The Splitter can also be used with iterators directly
• With the Serial tag the range becomes a single interval; with the Parallel tag it is divided into as many intervals as there are threads
template <typename TIter, typename TVal, typename TParallelTag>
inline void arrayFill(TIter begin_, TIter end_,
                      TVal const &value, Tag<TParallelTag> parallelTag)
{
    Splitter<TIter> splitter(begin_, end_, parallelTag);

    SEQAN_OMP_PRAGMA(parallel for)
    for (int job = 0; job < (int)length(splitter); ++job)
        arrayFill(splitter[job], splitter[job + 1], value, Serial());
}
• The parallel tag parameter can be used to switch off the parallel behaviour (by passing Serial)
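A self-contained variant of this pattern, written against the standard library and an optional OpenMP pragma, could look as follows. It is a sketch under several assumptions: the tags are plain structs, jobCount and fillInChunks are invented names, the chunk boundaries are computed inline instead of through a Splitter, and when OpenMP is disabled the Parallel tag simply uses an arbitrary fixed number of chunks that are processed one after another.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

namespace sketch {

struct Serial {};
struct Parallel {};

// Number of chunks selected by the tag.
inline std::size_t jobCount(Serial) { return 1; }
inline std::size_t jobCount(Parallel)
{
#ifdef _OPENMP
    return static_cast<std::size_t>(omp_get_max_threads());
#else
    return 4;  // arbitrary stand-in when OpenMP is not enabled
#endif
}

// Fills [first, last) with value, one std::fill call per chunk.
// Requires random access iterators.
template <typename TIter, typename TValue, typename TTag>
void fillInChunks(TIter first, TIter last, const TValue& value, TTag tag)
{
    const std::ptrdiff_t len = last - first;
    const std::ptrdiff_t jobs = static_cast<std::ptrdiff_t>(jobCount(tag));

    // Without OpenMP the pragma is ignored and the loop runs sequentially.
    #pragma omp parallel for
    for (std::ptrdiff_t job = 0; job < jobs; ++job) {
        // Chunk `job` covers offsets [len*job/jobs, len*(job+1)/jobs).
        std::fill(first + len * job / jobs, first + len * (job + 1) / jobs, value);
    }
}

}  // namespace sketch
```

The chunks are disjoint, so the fill itself needs no synchronization; passing Serial() yields a single chunk and therefore the sequential behaviour.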
73. SeqAn going parallel
Count occurrences of 10 million 20-mers in the human genome using an FM-index
Platform                            Time      Speedup
I7, 3.2 GHz                         18.6 s    1 X
…12...                              2.66 s    7 X
Intel Xeon Phi 7120, 244 threads    2.18 s    8.5 X
NVIDIA Tesla K20                    0.4 s     47 X
Thank you for your attention
74. Upcoming GTC Express Webinars
October 23 - Revolutionize Virtual Desktops with the One
Missing Piece: A Scalable GPU
October 30 - OpenACC 2.0 Enhancements for Cray
Supercomputers
October 31 - Getting the Most out of NVIDIA GRID vGPU with
Citrix XenServer
November 5 - Accelerating Face-in-the-Crowd Recognition with
GPU Technology
November 6 - Bright Cluster Manager: A CUDA-ready
Management Solution for GPU-based HPC
Register at www.gputechconf.com/gtcexpress
75. GTC 2014 Call for Posters
Posters should describe novel or interesting topics in
§ Science and research
§ Professional graphics
§ Mobile computing
§ Automotive applications
§ Game development
§ Cloud computing
Call opens October 29
www.gputechconf.com