Prof. Dr. Knut Reinert
Algorithmic Bioinformatics, Department of Mathematics and Computer Science

Intro to SeqAn
An Open-Source C++ template library
for biological sequence analysis
Knut Reinert, David Weese
Freie Universität Berlin
Institute for Computer Science
This talk

Why SeqAn?
SeqAn as SDK
SeqAn concept/content
Generic Parallelization
~ 15 years ago...

Data volume and cost:
In 2000 the 3 billion base pairs of the
human genome were sequenced for
about 3 billion US dollars
100 million bp per day

Nvidia Webinar, 22.10.2013

Sequencing today...

Illumina HiSeq
100 billion bp per day

Within roughly ten years sequencing has
become about 10 million times cheaper
Future of NGS data analysis

Software libraries bridge gap
Applications (experimentalists, analysis pipelines): structural variants, RNA-Seq, ChIP-Seq, metagenomics abundance, sequence assembly, cancer genomics
Between both worlds: maintainable tool, prototype implementation, algorithm libraries
Algorithm design and theoretical considerations (computer scientists): FM-index, suffix arrays, k-mer filter, multicore, secondary memory, fast I/O, hardware acceleration

SeqAn
Now SeqAn/SeqAn tools have been cited more than 360 times.
SeqAn is under BSD license and hence free for academic AND commercial use.
Among the institutions are (omitting German institutes):
Department of Genetics, Harvard Medical School, Boston,
European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton,
J. Craig Venter Institute, Rockville MD, USA,
Department of Molecular Biology, Princeton University,
Applied Mathematics Program, Yale University, New Haven,
IBM T.J. Watson Research Center, Yorktown Heights,
The Ohio State University, Columbus, University of Minnesota,
Australian National University, Canberra,
Department of Statistics, University of Oxford,
Swedish University of Agricultural Sciences (SLU), Uppsala,
Graduate School of Life Sciences, University of Cambridge,
Broad Institute, Cambridge, USA,
EMBL-EBI, University of California, University of Chicago,
Iowa State University, Ames, The Pennsylvania State University,
Peking University, Beijing, University of Science and Technology of China,
BGI-Shenzhen, China, Beijing Institute of Genomics, …
SeqAn developers
[Figure: number of SeqAn developers per year, 2003 to 2012 (axis 0 to 16), broken down by funding source: FU, IMPRS, DFG, BMBF, CSC, External]

SeqAn main concepts

length(str)

Value<T>::Type

String<Subclass>
void swap(string & str)
{
char help = str[1];
str[1] = str[0];
str[0] = help;
}

template <typename T>
void swap(T & str)
{
char help = str[1];
str[1] = str[0];
str[0] = help;
}

template <typename T>
void swap(String<T> & str)
{
T help = str[1];
str[1] = str[0];
str[0] = help;
}

template <typename T>
void swap(T & str)
{
typename T::value_type help = str[1];
str[1] = str[0];
str[0] = help;
}

template <typename T>
void swap(T & str)
{
typename Value<T>::Type help = str[1];
str[1] = str[0];
str[0] = help;
}

Metafunction
template <typename T>
struct Value
{
typedef T Type;
};

template <typename T>
struct Value
{
    typedef T Type;
};
template <typename T>
struct Value< String<T> >
{
    typedef T Type;
};
template <typename T>
struct Value
{
    typedef T Type;
};
template < >
struct Value< char * >
{
    typedef char Type;
};
template <typename T>
struct Value< String<T> >
{
    typedef T Type;
};
template < size_t N >
struct Value< char [N] >
{
    typedef char Type;
};
template <typename T>
void swap(T & str)
{
typename Value<T>::Type help = str[1];
str[1] = str[0];
str[0] = help;
}

template <typename T>
void swap(T & str)
{
typename Value<T>::Type help = value(str,1);
value(str,1) = value(str,0);
value(str,0) = help;
}
Shim Function
template <typename T>
typename Value<T>::Type & value(T & str, int i)
{
    return str[i];
}
Generic Algorithm
template <typename T>
void swap(T & str)
{
typename Value<T>::Type help = value(str,1);
value(str,1) = value(str,0);
value(str,0) = help;
}
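
The pieces from the last slides (metafunction, shim function, generic algorithm) can be put together in a small program that compiles without SeqAn. This is only a sketch: std::string stands in for SeqAn's String<T>, and the name swapFirstTwo is made up here to avoid a clash with std::swap.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Metafunction: by default a type is its own value type.
template <typename T>
struct Value
{
    typedef T Type;
};

// Specialization for std::string (assumed stand-in for String<T>).
template <>
struct Value<std::string>
{
    typedef char Type;
};

// Specialization for fixed-size char arrays.
template <std::size_t N>
struct Value<char[N]>
{
    typedef char Type;
};

// Shim function: uniform element access for every supported container.
template <typename T>
typename Value<T>::Type & value(T & str, int i)
{
    return str[i];
}

// Generic algorithm: swaps the first two elements of any such container.
template <typename T>
void swapFirstTwo(T & str)
{
    typename Value<T>::Type help = value(str, 1);
    value(str, 1) = value(str, 0);
    value(str, 0) = help;
}
```

The same swapFirstTwo works on a std::string and on a plain char array, because only Value<> and value() need to know the concrete type.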
SeqAn Content - SDK

SeqAn SDK Components - Tutorials

SeqAn SDK Components –
Reference Manual

SeqAn SDK Components

Review Board to ensure code quality
CDash/CTest to automatically compile and test across platforms
Code coverage reports
SeqAn Content
algorithms & data structures

Unified Alignment Algorithms
Versatile & Extensible DP-Interface
For Example ...
Standard DP-Algorithms
Global & Semi Global Alignments
Local Alignments

Modified DP-Algorithms
Split Breakpoint Detection
Banded Chain Alignment

Unified Alignment Algorithms
For Example ...
Needleman-Wunsch with Traceback:
DPProfile<GlobalAlignment<>, LinearGaps, TracebackOn<> >
Semi-Global Gotoh without Traceback:
DPProfile<GlobalAlignment<FreeEndGaps<True, False, True, False> >,
AffineGaps, TracebackOff>
Banded Smith-Waterman with Affine Gap Costs:
DPBand<BandOn>(lowerDiag, upperDiag),
DPProfile<LocalAlignment<>, AffineGaps, TracebackOn<> >
Split-Breakpoint Detection for Right Anchor:
DPProfile<SplitAlignment<>, AffineGaps, TracebackOn<GapsRight> >
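
The profiles above pick an algorithm variant at compile time. A minimal sketch of that idea, not SeqAn's DP interface: two hypothetical tags (GlobalAlignmentTag, LocalAlignmentTag) and a traits class switch one generic score recursion with linear gap costs between Needleman-Wunsch and Smith-Waterman behaviour. All names here are assumptions for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct GlobalAlignmentTag {};
struct LocalAlignmentTag {};

// Traits that configure the generic recursion (hypothetical names).
template <typename TTag> struct DPTraits;

template <> struct DPTraits<GlobalAlignmentTag>
{
    static const bool clampAtZero = false;   // scores may become negative
    static const bool bestAnywhere = false;  // result is the last cell
};

template <> struct DPTraits<LocalAlignmentTag>
{
    static const bool clampAtZero = true;    // restart at score 0
    static const bool bestAnywhere = true;   // result is the best cell
};

// One generic score computation with linear gap costs, no traceback.
template <typename TTag>
int alignmentScore(std::string const & a, std::string const & b,
                   int match, int mismatch, int gap, TTag)
{
    typedef DPTraits<TTag> TTraits;
    std::vector<int> prev(b.size() + 1), cur(b.size() + 1);
    int best = 0;

    // First row: free for local alignment, gap penalties otherwise.
    for (std::size_t j = 0; j <= b.size(); ++j)
        prev[j] = TTraits::clampAtZero ? 0 : static_cast<int>(j) * gap;

    for (std::size_t i = 1; i <= a.size(); ++i)
    {
        cur[0] = TTraits::clampAtZero ? 0 : static_cast<int>(i) * gap;
        for (std::size_t j = 1; j <= b.size(); ++j)
        {
            int diag = prev[j - 1] + (a[i - 1] == b[j - 1] ? match : mismatch);
            int v = std::max(diag, std::max(prev[j], cur[j - 1]) + gap);
            if (TTraits::clampAtZero)
                v = std::max(v, 0);
            cur[j] = v;
            best = std::max(best, v);
        }
        std::swap(prev, cur);
    }
    return TTraits::bestAnywhere ? best : prev[b.size()];
}
```

Because the tag is a template argument, the branches on the traits are resolved at compile time and the inner loop stays the same for both variants.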

Support for Common File Formats
Important file formats for HTS analysis
Sequences
FASTA, FASTQ
Indexed FASTA (FAI) for random access

Genomic Features
GFF 2, GFF 3, GTF, BED
Read Mapping
SAM, BAM (plus BAM indices)
Variants
VCF

SequenceStream ss("file.fa.gz");
while (!atEnd(ss))
{
    readRecord(id, seq, ss);
    cout << id << '\t' << seq << '\n';
}

BamStream bs("file.bam");
while (!atEnd(bs))
{
    readRecord(record, bs);
    cout << record.qName << '\t' << record.pos << '\n';
}

… or write your own parser
Tutorials and helper routines for writing your own parsers.
Journaled Sequences
Store Multiple Genomes
Save Storage Capacities

String<Dna, Journaled<Alloc<> > >

StringSet<TJournaled, Owner<JournalSet> > set;
setGlobalReference(set, refSeq);
appendValue(set, seq1);
join(set, idx, JoinConfig<>());

[Figure: reference sequence Ref and journaled genomes G1, G2, …, GN]
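
To illustrate why journaling saves storage, here is a minimal sketch of the idea, not SeqAn's implementation: a genome is stored as a list of segments that point either into the shared reference or into a small private buffer of changed characters. The class JournaledStringSketch and its members are hypothetical, and only single-character substitutions are supported.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One segment of the virtual sequence.
struct JournalNode
{
    bool fromReference;     // true: segment of the reference, false: of the insertion buffer
    std::size_t sourcePos;  // start position in the source
    std::size_t length;     // segment length
};

class JournaledStringSketch
{
    std::string const * ref_;          // shared reference, not copied
    std::string insertions_;           // only the characters that differ
    std::vector<JournalNode> nodes_;   // segments in virtual order

public:
    explicit JournaledStringSketch(std::string const & ref) : ref_(&ref)
    {
        nodes_.push_back({true, 0, ref.size()});
    }

    // Replace the character at virtual position pos by c.
    void substitute(std::size_t pos, char c)
    {
        std::size_t offset = 0;
        for (std::size_t i = 0; i < nodes_.size(); ++i)
        {
            JournalNode node = nodes_[i];
            if (pos < offset + node.length)
            {
                std::size_t left = pos - offset;
                std::size_t right = node.length - left - 1;
                std::vector<JournalNode> parts;
                if (left > 0)
                    parts.push_back({node.fromReference, node.sourcePos, left});
                parts.push_back({false, insertions_.size(), 1});
                if (right > 0)
                    parts.push_back({node.fromReference, node.sourcePos + left + 1, right});
                insertions_ += c;
                nodes_.erase(nodes_.begin() + i);
                nodes_.insert(nodes_.begin() + i, parts.begin(), parts.end());
                return;
            }
            offset += node.length;
        }
    }

    // Number of characters stored privately for this genome.
    std::size_t storedCharacters() const { return insertions_.size(); }

    // Expand the journal into an ordinary string.
    std::string materialize() const
    {
        std::string result;
        for (JournalNode const & node : nodes_)
            result.append(node.fromReference ? *ref_ : insertions_,
                          node.sourcePos, node.length);
        return result;
    }
};
```

Each additional genome then costs memory proportional to its differences from the reference rather than to its full length.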

Fragment Store
(Multi) Read Alignments
Read alignments can be easily imported:
std::ifstream file("ex1.sam");
read(file, store, Sam());
… and accessed as a multiple alignment, e.g. for visualization:
AlignedReadLayout layout;
layoutAlignment(layout, store);
printAlignment(svgFile, Raw(), layout, store, 1, 0, 150, 0, 36);

Unified Full-Text Indexing Framework
Available Indices
Suffix Trees:
•  suffix array
•  enhanced suffix array
•  lazy suffix tree

Prefix Trie:
•  FM-index

q-Gram Indices:
•  direct addressing
•  open addressing
•  gapped

Index<TSeq, IndexEsa<> >
Index<StringSet<TSeq>, FMIndex<> >

All indices support multiple strings and external memory construction/usage.
Index Lookup Interface
All indices support the (sequential) find interface:
Finder<TIndex> finder(index);
while (find(finder, "TATAA"))
    cout << "Hit at position" << position(finder) << endl;
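
What such a lookup does internally can be sketched with the simplest of the listed indices, a plain suffix array. This is a standalone illustration under simplifying assumptions (naive construction by sorting, std::string as text); buildSuffixArray and findAll are hypothetical helper names, not SeqAn functions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Build a suffix array by sorting all suffix start positions.
// Naive construction, for illustration only.
std::vector<std::size_t> buildSuffixArray(std::string const & text)
{
    std::vector<std::size_t> sa(text.size());
    std::iota(sa.begin(), sa.end(), 0);
    std::sort(sa.begin(), sa.end(), [&text](std::size_t a, std::size_t b) {
        return text.compare(a, std::string::npos, text, b, std::string::npos) < 0;
    });
    return sa;
}

// Return all positions where pattern occurs in text, in increasing order.
std::vector<std::size_t> findAll(std::string const & text,
                                 std::vector<std::size_t> const & sa,
                                 std::string const & pattern)
{
    // First suffix whose prefix of length |pattern| is not smaller than pattern.
    auto lower = std::lower_bound(sa.begin(), sa.end(), pattern,
        [&text](std::size_t suffix, std::string const & p) {
            return text.compare(suffix, p.size(), p) < 0;
        });
    // First suffix whose prefix of length |pattern| is greater than pattern.
    auto upper = std::upper_bound(lower, sa.end(), pattern,
        [&text](std::string const & p, std::size_t suffix) {
            return text.compare(suffix, p.size(), p) > 0;
        });
    std::vector<std::size_t> hits(lower, upper);
    std::sort(hits.begin(), hits.end());
    return hits;
}
```

All suffixes that start with the pattern form one contiguous block of the suffix array, so two binary searches are enough to report every hit.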
  
	
  
	
  

SeqAn Performance

Masai read mapper
Reads
Genome: Chr. 1, Chr. 2, Chr. X
ACGCTTCATCGCCCT…

Index of reads (Radix tree of seeds)
Index of genome (e.g. FM-index)

Algorithm is based on the simultaneous traversal of two string indices
(e.g., FM-index, Enhanced suffix array, Lazy suffix tree)
  
Read Mapping: Masai
Faster and more accurate than BWA and Bowtie2
Timings on a single core
  

Easily exchange index….

Collaboration to parallelize indices and
verification algorithms in SeqAn, to speed up any
applications making use of indices

What about multi-core implementation?

SeqAn going parallel
GOAL
Parallelize the finder interface of SeqAn
so it works on CPU and accelerators like GPU

Will be replaced by hg18 and 10 million 20-mers
  

SeqAn going parallel

Construct FM-index on reverse genome
Set # OMP threads
Call generic count function
  

SeqAn going parallel : NVIDIA GPUs

Copy needles and index to GPU
SAME count function as on CPU!
  

SeqAn going parallel
Count occurrences of 10 million 20-mers in the human genome using an FM-index

Platform                            Time      Speedup
i7, 3.2 GHz                         18.6 s    1 X
i7, 3.2 GHz (…12…)                  2.66 s    7 X
Intel Xeon Phi 7120, 244 threads    2.18 s    8.5 X
NVIDIA Tesla K20                    0.4 s     47 X

SeqAn going parallel
Approx. count occurrences of 1.2 million 33-mers in the human genome using an FM-index

Platform                            Time      Speedup
i7, 3.2 GHz                         66.1 s    1 X
i7, 3.2 GHz (…12…)                  9.0 s     7.3 X
Intel Xeon Phi 7120, 244 threads    3.9 s     16.9 X
NVIDIA Tesla K20                    3.2 s     20.7 X

Part II: The details

Parallelization on the GPU

CUDA preliminaries
In order to use CUDA we first had to adapt some parts of SeqAn:
•  CUDA requires each function to be prefixed with domain qualifiers __host__ or __device__ in order to generate CPU/GPU code
•  We prefixed all basic template functions with a SEQAN_HOST_DEVICE macro

#ifdef __CUDACC__
#define SEQAN_HOST_DEVICE inline __device__ __host__
#else
#define SEQAN_HOST_DEVICE inline
#endif

•  Static const arrays are not allowed in the way SeqAn defines them
•  We replaced alphabet conversion lookup tables (e.g. Dna <--> char) by conversion functions
Strings
•  Instead of defining a new CUDA string we simply use the Thrust library:
•  Provides host_vector and device_vector classes, which are vectors with buffers in host or device memory
•  However, Thrust functions are callable only from host-side

•  We made both vectors accessible from SeqAn
•  SeqAn strings have to provide a set of global (meta-)functions, e.g. Value<>, resize(), …
•  We simply defined the required wrapper functions for these two vectors
  

Standard Strings
•  Up to here, all strings can only be used on the side of their scope

[Figure: thrust::host_vector, thrust::device_vector and seqan::String objects with their buffers in host memory and device memory]
  
Host-Device String
•  How to access a device_vector from device-side?
•  We could pass (POD) iterators to the kernel
•  However, many SeqAn algorithms work on more complex containers

•  We need the same interface of the container on the device side
•  For strings we developed a so-called ContainerView (POD type)
•  Provides a container interface given the begin/end pointers of vector buffer
•  The view() function creates the ContainerView object for a given device_vector
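
The view idea can be sketched on the host alone. Assumptions: std::vector stands in for thrust::device_vector, and ContainerViewSketch, length() and reverseInPlace() are names invented for this example, not SeqAn's classes. The point is that the view is a trivially copyable pair of pointers and can therefore be passed by value, as a kernel argument would be.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

// A POD-like view: just the begin/end pointers of someone else's buffer.
template <typename TValue>
struct ContainerViewSketch
{
    TValue * begin_;
    TValue * end_;

    TValue & operator[](std::size_t i) const { return begin_[i]; }
};

static_assert(std::is_trivially_copyable<ContainerViewSketch<char> >::value,
              "a view must be copyable bit by bit");

// Container interface on top of the two pointers.
template <typename TValue>
std::size_t length(ContainerViewSketch<TValue> const & v)
{
    return static_cast<std::size_t>(v.end_ - v.begin_);
}

// Create a view for a given vector; the vector keeps owning the buffer.
template <typename TValue>
ContainerViewSketch<TValue> view(std::vector<TValue> & vec)
{
    return {vec.data(), vec.data() + vec.size()};
}

// A generic algorithm that takes the view by value, like a kernel would.
template <typename TView>
void reverseInPlace(TView v)
{
    if (length(v) < 2)
        return;
    for (std::size_t i = 0, j = length(v) - 1; i < j; ++i, --j)
        std::swap(v[i], v[j]);
}
```

Although the view is copied, the algorithm modifies the original buffer, which is the behaviour needed when the buffer lives in device memory.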

Host-Device String
•  How to use a device_vector on the device

[Figure: view() turns a thrust::device_vector (buffer in device memory) into a seqan::ContainerView in host memory; the kernel launch passes the seqan::ContainerView to the device]
Device and View metafunctions
•  For generic GPU programming:
•  The Device metafunction returns the device-memory equivalent of a class
// Replaces String with thrust::device_vector.
template <typename TValue, typename TSpec>
struct Device<String<TValue, TSpec> >
{
    typedef thrust::device_vector<TValue> Type;
};

•  The View metafunction returns the (POD) view type of a class
// Returns a view type that can be passed to a CUDA kernel.
template <typename TValue, typename TAlloc>
struct View<thrust::device_vector<TValue, TAlloc> >
{
    typedef ContainerView<thrust::device_vector<TValue, TAlloc> > Type;
};

Hello world
•  A simple example to reverse a string on the GPU
// A standard SeqAn string over the Dna alphabet.
String<Dna> myString = "ACGT";

// A Dna string on device global memory.
typename Device<String<Dna> >::Type myDeviceString;

// Copy the string to global memory.
assign(myDeviceString, myString);

// Pass a view of the device string to the CUDA kernel.
myKernel<<<1,1>>>(view(myDeviceString));

// TString is ContainerView<device_vector<Dna> >.
template <typename TString>
__global__ void myKernel(TString string)
{
    printf("length(string) = %d\n", length(string));
    reverse(string);
}

Porting complex data structures
•  More complex structures (e.g. Index, Graph) can only be ported to the GPU if they …
•  don’t use pointers
•  use only strings of POD types (String<Dna>, but not String<String<…> >)
•  use only 1-dimensional StringSets (ConcatDirect)

•  Nested classes are no problem
•  View metafunction converts all member types into their view types
•  view() function is called recursively on all members
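
The recursive conversion can be sketched as follows. Everything here is a host-only illustration with invented names (PodView, TableSketch) and std::vector in place of device_vector; it only shows how a View metafunction and an overloaded view() can walk through a nested structure.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Minimal POD view over a contiguous buffer.
template <typename T>
struct PodView
{
    T * begin_;
    T * end_;
    std::size_t size() const { return static_cast<std::size_t>(end_ - begin_); }
    T & operator[](std::size_t i) const { return begin_[i]; }
};

// View metafunction: maps a type to its view type.
template <typename T> struct View;

template <typename T>
struct View<std::vector<T> >
{
    typedef PodView<T> Type;
};

template <typename T>
PodView<T> view(std::vector<T> & v)
{
    return {v.data(), v.data() + v.size()};
}

// A nested structure whose members are strings (hypothetical stand-in for an index).
template <typename TCounts, typename TText>
struct TableSketch
{
    TCounts counts;
    TText text;
};

// The view type of the structure is the same structure over the members' view types.
template <typename TCounts, typename TText>
struct View<TableSketch<TCounts, TText> >
{
    typedef TableSketch<typename View<TCounts>::Type,
                        typename View<TText>::Type> Type;
};

// view() of the structure calls view() on every member.
template <typename TCounts, typename TText>
typename View<TableSketch<TCounts, TText> >::Type
view(TableSketch<TCounts, TText> & table)
{
    typename View<TableSketch<TCounts, TText> >::Type result =
        { view(table.counts), view(table.text) };
    return result;
}
```

A deeper nesting works the same way: each level only needs a View specialization and a view() overload that delegate to its members.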
  

Example: FM Index

The FM-index (BWT, LF-mapping)

The FM-index (search ssi)

a3 = C(‘i’) + Occ(‘i’,0) + 1 = 1 + 0 + 1
b3 = C(‘i’) + Occ(‘i’,12) = 1 + 4
  
The FM-index (backwards search)

a1 = C(‘s’) + Occ(‘s’,8) + 1 = 8 + 2 + 1
b1 = C(‘s’) + Occ(‘s’,10) = 8 + 4
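
The interval updates above can be reproduced with a small standalone program. This is a sketch under simplifying assumptions, not SeqAn's FM-index: the BWT is built from a naively sorted suffix array, Occ is computed by scanning the BWT instead of using sampled tables, and '$' is assumed to be an end marker that does not occur in the text. FMIndexSketch is a hypothetical name.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

struct FMIndexSketch
{
    std::string bwt;                 // BWT of text + '$'
    std::array<std::size_t, 256> C;  // C[c]: number of text characters smaller than c

    explicit FMIndexSketch(std::string text)
    {
        text += '$';
        std::vector<std::size_t> sa(text.size());
        std::iota(sa.begin(), sa.end(), 0);
        std::sort(sa.begin(), sa.end(), [&text](std::size_t a, std::size_t b) {
            return text.compare(a, std::string::npos, text, b, std::string::npos) < 0;
        });
        // BWT: the character preceding each sorted suffix.
        for (std::size_t pos : sa)
            bwt += text[(pos + text.size() - 1) % text.size()];
        C.fill(0);
        for (unsigned char c : text)
            for (std::size_t d = c + 1u; d < 256; ++d)
                ++C[d];
    }

    // Occ(c, i): occurrences of c among the first i characters of the BWT.
    std::size_t occ(char c, std::size_t i) const
    {
        return static_cast<std::size_t>(std::count(bwt.begin(), bwt.begin() + i, c));
    }

    // Backward search: process the pattern from its last character to its first,
    // keeping the 1-based closed interval [a, b] of matching suffixes.
    std::size_t count(std::string const & pattern) const
    {
        std::size_t a = 1, b = bwt.size();
        for (std::size_t k = pattern.size(); k-- > 0 && a <= b; )
        {
            unsigned char c = static_cast<unsigned char>(pattern[k]);
            a = C[c] + occ(pattern[k], a - 1) + 1;
            b = C[c] + occ(pattern[k], b);
        }
        return a <= b ? b - a + 1 : 0;
    }
};
```

For the text mississippi and the pattern ssi the search passes through the same values as on the slides: the first step (character i) gives [2,5] and the last step (character s) gives [11,12], i.e. two occurrences.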
  
The FM-index in SeqAn
•  The FM-index can be implemented using a number of string-based lookup tables
•  ... as well as other indices, e.g. enhanced suffix array, q-gram index
•  There is a space-time tradeoff between all these indices
•  The FM index has the minimal memory requirements
  

A generic FM-index
•  SeqAn‘s FM-index consists of some nested classes storing Strings
[Figure: FM-index (host-only)]
  

A generic FM-index
•  The Device type of the FM index uses device_vector instead of String
[Figure: GPU FM-index (host-part)]

•  The view of this object (= device-part) is the same tree, where leaves are replaced by ContainerViews of device_vectors
  	
  

CPU vs. GPU
•  Invoking an FM-index based search on CPU and GPU:
// Select the index type.
typedef Index<DnaString, FMIndex<> > TIndex;

// Type is Index<device_vector<Dna>, FMIndex<> >.
typedef typename Device<TIndex>::Type TDeviceIndex;

// ========== On CPU ==========

// Create an index.
TIndex index("ACGTTGCAA");

// Use the FM-index on CPU.
findCPU(index,…);

template <typename TIndex>
void
findCPU(TIndex & index,…);

// ========== On GPU ===========

// Create a device index.
TIndex index("ACGTTGCAA");
TDeviceIndex deviceIndex;
assign(deviceIndex, index);

// Use the FM-index in a CUDA kernel.
findGPU<<<...>>>(view(deviceIndex),…);

template <typename TIndex>
__global__ void
findGPU(TIndex index,…);

The findGPU kernel AND the findCPU function will invoke many instances of the SAME generic function which will perform a backtracking algorithm on our generic index interface
Approximate search via backtracking
do {
    if (finder.score == finder.scoreThreshold)
    {
        if (goDown(textIt, suffix(pattern, patternIt))) delegate(finder);
        goUp(textIt);
        if (isRoot(textIt)) break;
    }
    else if (finder.score < finder.scoreThreshold)
    {
        if (atEnd(patternIt)) delegate(finder);
        else if (goDown(textIt))
        {
            finder.score += parentEdgeLabel(textIt) != value(patternIt);
            goNext(patternIt);
            continue;
        }
    }

    do {
        goPrevious(patternIt);
        finder.score -= parentEdgeLabel(textIt) != value(patternIt);
    } while (!goRight(textIt) && goUp(textIt));
    if (isRoot(textIt)) break;
    finder.score += parentEdgeLabel(textIt) != value(patternIt);
    goNext(patternIt);
}
while (true);

Outlook for GPU support
•  Our next steps are:
•  Provide parallelFor() to hide CUDA kernel call/OpenMP for-loop
•  Develop classes for concurrent access (String, job queues)
•  Port more indices and index iterators to be used with CUDA
•  Port SeqAn‘s alignment module
•  Develop a CPU/GPU version of the FM-index based read mapper Masai
•  ...

•  Follow our development:
•  Sources: https://github.com/seqan/seqan/tree/develop
•  Code examples: http://trac.seqan.de/wiki/HowTo/DevelopCUDA
  
	
  

Generic Parallelization

Multicore parallelization
•  We first introduced Tags to switch between serial and parallel algorithms:

struct Serial_;
typedef Tag<Serial_> Serial;

struct Parallel_;
typedef Tag<Parallel_> Parallel;

•  Then we defined basic atomic operations required for thread safety:

template <typename T>
inline T atomicInc(T &x, Serial)
{
    return ++x;
}

template <typename T>
inline T atomicInc(volatile T &x, Parallel)
{
    return __sync_add_and_fetch(&x, 1);
}
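
The same tag dispatch can be sketched in portable C++11. This variant swaps the GCC builtin for std::atomic, which is an assumption of this example and not what the slide shows; Serial and Parallel are plain empty structs here, and countEven is a hypothetical helper. The OpenMP pragma is ignored by compilers without OpenMP support, in which case the loop simply runs serially with the same result.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

struct Serial {};
struct Parallel {};

// Serial variant: a plain increment is sufficient.
template <typename T>
inline T atomicInc(T & x, Serial)
{
    return ++x;
}

// Parallel variant: atomic read-modify-write, returns the new value.
template <typename T>
inline T atomicInc(std::atomic<T> & x, Parallel)
{
    return x.fetch_add(1) + 1;
}

// Count the even numbers, incrementing a shared counter from a parallel loop.
inline int countEven(std::vector<int> const & values)
{
    std::atomic<int> counter(0);
    #pragma omp parallel for
    for (long i = 0; i < static_cast<long>(values.size()); ++i)
        if (values[static_cast<std::size_t>(i)] % 2 == 0)
            atomicInc(counter, Parallel());
    return counter.load();
}
```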
Splitter
•  To this end, we developed the Splitter<TValue, TSpec> to compute a partition into subintervals of (almost) equal length …

Splitter<unsigned> splitter(10, 20, 3);
for (unsigned i = 0; i < length(splitter); ++i)
    cout << '[' << splitter[i] << ',' << splitter[i+1] << ')' << endl;

// [10,14)
// [14,17)
// [17,20)
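
One way to compute such boundaries is to give every part the same base length and hand the remainder to the first parts. This is a sketch with a hypothetical class SplitterSketch, not SeqAn's Splitter; it reproduces the boundaries 10, 14, 17, 20 from the example above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

template <typename TValue>
class SplitterSketch
{
    TValue begin_;
    std::size_t count_;  // number of elements in [begin, end)
    std::size_t parts_;  // number of subintervals

public:
    SplitterSketch(TValue begin, TValue end, std::size_t parts) :
        begin_(begin),
        count_(static_cast<std::size_t>(end - begin)),
        // Never more parts than elements, but at least one part.
        parts_(std::max<std::size_t>(1, std::min(parts, count_)))
    {}

    // Number of subintervals.
    std::size_t size() const { return parts_; }

    // i-th boundary; subinterval i is [splitter[i], splitter[i + 1]).
    TValue operator[](std::size_t i) const
    {
        std::size_t base = count_ / parts_;
        std::size_t remainder = count_ % parts_;
        return static_cast<TValue>(begin_ + i * base + std::min(i, remainder));
    }
};
```

With 10 elements and 3 parts the base length is 3 and the remainder is 1, so the first part gets one extra element.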
Splitter
•  The Splitter can also be used with iterators directly
•  The Serial / Parallel tag divides an interval range into 1 / #thread_num many intervals
template <typename TIter, typename TVal, typename TParallelTag>
inline void arrayFill(TIter begin_, TIter end_,
                      TVal const &value, Tag<TParallelTag> parallelTag)
{
    Splitter<TIter> splitter(begin_, end_, parallelTag);

    SEQAN_OMP_PRAGMA(parallel for)
    for (int job = 0; job < (int)length(splitter); ++job)
        arrayFill(splitter[job], splitter[job + 1], value, Serial());
}

•  The parallel tag can be used to switch off the parallel behaviour
  
Thank you for your attention
Test Drive NVIDIA GPUs!
Experience The Acceleration

Develop your codes on latest
GPUs today
Sign up for FREE GPU Test Drive
on remotely hosted clusters

www.nvidia.com/GPUTestDrive

Weitere ähnliche Inhalte

Was ist angesagt?

Introduction to CUDA
Introduction to CUDAIntroduction to CUDA
Introduction to CUDARaymond Tay
 
Accelerating HPC Applications on NVIDIA GPUs with OpenACC
Accelerating HPC Applications on NVIDIA GPUs with OpenACCAccelerating HPC Applications on NVIDIA GPUs with OpenACC
Accelerating HPC Applications on NVIDIA GPUs with OpenACCinside-BigData.com
 
PL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PL/CUDA - Fusion of HPC Grade Power with In-Database AnalyticsPL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PL/CUDA - Fusion of HPC Grade Power with In-Database AnalyticsKohei KaiGai
 
助教が吼える! 各界の若手研究者大集合「ハードウェアはやわらかい」
助教が吼える! 各界の若手研究者大集合「ハードウェアはやわらかい」助教が吼える! 各界の若手研究者大集合「ハードウェアはやわらかい」
助教が吼える! 各界の若手研究者大集合「ハードウェアはやわらかい」Shinya Takamaeda-Y
 
Cuda introduction
Cuda introductionCuda introduction
Cuda introductionHanibei
 
Kato Mivule: An Overview of CUDA for High Performance Computing
Kato Mivule: An Overview of CUDA for High Performance ComputingKato Mivule: An Overview of CUDA for High Performance Computing
Kato Mivule: An Overview of CUDA for High Performance ComputingKato Mivule
 
GTC 2018 で発表された自動運転最新情報
GTC 2018 で発表された自動運転最新情報GTC 2018 で発表された自動運転最新情報
GTC 2018 で発表された自動運転最新情報NVIDIA Japan
 
Intro to GPGPU with CUDA (DevLink)
Intro to GPGPU with CUDA (DevLink)Intro to GPGPU with CUDA (DevLink)
Intro to GPGPU with CUDA (DevLink)Rob Gillen
 
最新の HPC 技術を生かした AI・ビッグデータインフラの東工大 TSUBAME3.0 及び産総研 ABCI
最新の HPC 技術を生かした AI・ビッグデータインフラの東工大 TSUBAME3.0 及び産総研 ABCI最新の HPC 技術を生かした AI・ビッグデータインフラの東工大 TSUBAME3.0 及び産総研 ABCI
最新の HPC 技術を生かした AI・ビッグデータインフラの東工大 TSUBAME3.0 及び産総研 ABCINVIDIA Japan
 
FPT17: An object detector based on multiscale sliding window search using a f...
FPT17: An object detector based on multiscale sliding window search using a f...FPT17: An object detector based on multiscale sliding window search using a f...
FPT17: An object detector based on multiscale sliding window search using a f...Hiroki Nakahara
 
Introduction to parallel computing using CUDA
Introduction to parallel computing using CUDAIntroduction to parallel computing using CUDA
Introduction to parallel computing using CUDAMartin Peniak
 
A Platform for Accelerating Machine Learning Applications
 A Platform for Accelerating Machine Learning Applications A Platform for Accelerating Machine Learning Applications
A Platform for Accelerating Machine Learning ApplicationsNVIDIA Taiwan
 
IoT: From Arduino Microcontrollers to Tizen Products using IoTivity
IoT: From Arduino Microcontrollers to Tizen Products using IoTivityIoT: From Arduino Microcontrollers to Tizen Products using IoTivity
IoT: From Arduino Microcontrollers to Tizen Products using IoTivitySamsung Open Source Group
 
Backend.AI Technical Introduction (19.09 / 2019 Autumn)
Backend.AI Technical Introduction (19.09 / 2019 Autumn)Backend.AI Technical Introduction (19.09 / 2019 Autumn)
Backend.AI Technical Introduction (19.09 / 2019 Autumn)Lablup Inc.
 
GPU Computing with Ruby
GPU Computing with RubyGPU Computing with Ruby
GPU Computing with RubyShin Yee Chung
 
A beginner’s guide to programming GPUs with CUDA
A beginner’s guide to programming GPUs with CUDAA beginner’s guide to programming GPUs with CUDA
A beginner’s guide to programming GPUs with CUDAPiyush Mittal
 
Gpu workshop cluster universe: scripting cuda
Gpu workshop cluster universe: scripting cudaGpu workshop cluster universe: scripting cuda
Gpu workshop cluster universe: scripting cudaFerdinand Jamitzky
 

Was ist angesagt? (20)

Introduction to CUDA
Introduction to CUDAIntroduction to CUDA
Introduction to CUDA
 
Accelerating HPC Applications on NVIDIA GPUs with OpenACC
Accelerating HPC Applications on NVIDIA GPUs with OpenACCAccelerating HPC Applications on NVIDIA GPUs with OpenACC
Accelerating HPC Applications on NVIDIA GPUs with OpenACC
 
PL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PL/CUDA - Fusion of HPC Grade Power with In-Database AnalyticsPL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
 
助教が吼える! 各界の若手研究者大集合「ハードウェアはやわらかい」
助教が吼える! 各界の若手研究者大集合「ハードウェアはやわらかい」助教が吼える! 各界の若手研究者大集合「ハードウェアはやわらかい」
助教が吼える! 各界の若手研究者大集合「ハードウェアはやわらかい」
 
Cuda introduction
Cuda introductionCuda introduction
Cuda introduction
 
Cuda
CudaCuda
Cuda
 
Kato Mivule: An Overview of CUDA for High Performance Computing
Kato Mivule: An Overview of CUDA for High Performance ComputingKato Mivule: An Overview of CUDA for High Performance Computing
Kato Mivule: An Overview of CUDA for High Performance Computing
 
GTC 2018 で発表された自動運転最新情報
GTC 2018 で発表された自動運転最新情報GTC 2018 で発表された自動運転最新情報
GTC 2018 で発表された自動運転最新情報
 
Intro to GPGPU with CUDA (DevLink)
Intro to GPGPU with CUDA (DevLink)Intro to GPGPU with CUDA (DevLink)
Intro to GPGPU with CUDA (DevLink)
 
最新の HPC 技術を生かした AI・ビッグデータインフラの東工大 TSUBAME3.0 及び産総研 ABCI
最新の HPC 技術を生かした AI・ビッグデータインフラの東工大 TSUBAME3.0 及び産総研 ABCI最新の HPC 技術を生かした AI・ビッグデータインフラの東工大 TSUBAME3.0 及び産総研 ABCI
最新の HPC 技術を生かした AI・ビッグデータインフラの東工大 TSUBAME3.0 及び産総研 ABCI
 
FPT17: An object detector based on multiscale sliding window search using a f...
FPT17: An object detector based on multiscale sliding window search using a f...FPT17: An object detector based on multiscale sliding window search using a f...
FPT17: An object detector based on multiscale sliding window search using a f...
 
Introduction to parallel computing using CUDA
Introduction to parallel computing using CUDAIntroduction to parallel computing using CUDA
Introduction to parallel computing using CUDA
 
A Platform for Accelerating Machine Learning Applications
 A Platform for Accelerating Machine Learning Applications A Platform for Accelerating Machine Learning Applications
A Platform for Accelerating Machine Learning Applications
 
IoT: From Arduino Microcontrollers to Tizen Products using IoTivity
IoT: From Arduino Microcontrollers to Tizen Products using IoTivityIoT: From Arduino Microcontrollers to Tizen Products using IoTivity
IoT: From Arduino Microcontrollers to Tizen Products using IoTivity
 
Backend.AI Technical Introduction (19.09 / 2019 Autumn)
Backend.AI Technical Introduction (19.09 / 2019 Autumn)Backend.AI Technical Introduction (19.09 / 2019 Autumn)
Backend.AI Technical Introduction (19.09 / 2019 Autumn)
 
Cuda Architecture
Cuda ArchitectureCuda Architecture
Cuda Architecture
 
GPU Computing with Ruby
GPU Computing with RubyGPU Computing with Ruby
GPU Computing with Ruby
 
Cuda
CudaCuda
Cuda
 
A beginner’s guide to programming GPUs with CUDA
A beginner’s guide to programming GPUs with CUDAA beginner’s guide to programming GPUs with CUDA
A beginner’s guide to programming GPUs with CUDA
 
Gpu workshop cluster universe: scripting cuda
Gpu workshop cluster universe: scripting cudaGpu workshop cluster universe: scripting cuda
Gpu workshop cluster universe: scripting cuda
 

Ähnlich wie Introduction to SeqAn, an Open-source C++ Template Library

JavaDayKiev'15 Java in production for Data Mining Research projects
JavaDayKiev'15 Java in production for Data Mining Research projectsJavaDayKiev'15 Java in production for Data Mining Research projects
JavaDayKiev'15 Java in production for Data Mining Research projectsAlexey Zinoviev
 
Accelerating Data Science With GPUs
Introduction to SeqAn, an Open-source C++ Template Library

  • 1. Test Drive NVIDIA GPUs! Experience The Acceleration Develop your codes on latest GPUs today Sign up for FREE GPU Test Drive on remotely hosted clusters www.nvidia.com/GPUTestDrive
  • 2. Intro to SeqAn: An Open-Source C++ template library for biological sequence analysis. Knut Reinert, David Weese, Freie Universität Berlin, Institute for Computer Science. (Prof. Dr. Knut Reinert, Algorithmic Bioinformatics, Department of Mathematics and Computer Science)
  • 3. This talk: Why SeqAn? SeqAn as SDK. SeqAn concept/content. Generic Parallelization.
  • 4. ~15 years ago... Data volume and cost: In 2000 the 3 billion base pairs of the human genome were sequenced for about 3 billion US dollars. Throughput: 100 million bp per day. Nvidia Webinar, 22.10.2013
  • 5. Sequencing today... Illumina HiSeq: 100 billion bp per DAY. Within roughly ten years sequencing has become about 10 million times cheaper.
  • 6. Future of NGS data analysis
  • 7. Software libraries bridge gap. Applications: structural variants, RNA-Seq, ChIP-Seq, metagenomics abundance, sequence assembly, cancer genomics. From experimentalists to computer scientists: analysis pipelines, maintainable tool, prototype implementation, algorithm libraries, algorithm design, theoretical considerations. Techniques: FM-index, multicore, suffix arrays, secondary memory, fast I/O, k-mer filter, hardware acceleration.
  • 8. SeqAn is under BSD license and hence free for academic AND commercial use. SeqAn/SeqAn tools have now been cited more than 360 times. Among the institutions are (omitting German institutes): Department of Genetics, Harvard Medical School, Boston; European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton; J. Craig Venter Institute, Rockville MD, USA; Department of Molecular Biology, Princeton University; Applied Mathematics Program, Yale University, New Haven; IBM T.J. Watson Research Center, Yorktown Heights; The Ohio State University, Columbus; University of Minnesota; Australian National University, Canberra; Department of Statistics, University of Oxford; Swedish University of Agricultural Sciences (SLU), Uppsala; Graduate School of Life Sciences, University of Cambridge; Broad Institute, Cambridge, USA; EMBL-EBI; University of California; University of Chicago; Iowa State University, Ames; The Pennsylvania State University; Peking University, Beijing; University of Science and Technology of China; BGI-Shenzhen, China; Beijing Institute of Genomics; …
  • 9. SeqAn developers [bar chart: number of developers per year, 2003 to 2012, on a scale of 0 to 16; legend: External, CSC, BMBF, DFG, IMPRS, FU]
  • 10. SeqAn main concepts
  • 12. void swap(string & str)
        {
            char help = str[1];
            str[1] = str[0];
            str[0] = help;
        }
  • 13. template <typename T>
        void swap(T & str)
        {
            char help = str[1];
            str[1] = str[0];
            str[0] = help;
        }
  • 14. template <typename T>
        void swap(T & str)
        {
            char help = str[1];
            str[1] = str[0];
            str[0] = help;
        }
  • 15. template <typename T>
        void swap(String<T> & str)
        {
            T help = str[1];
            str[1] = str[0];
            str[0] = help;
        }
  • 16. template <typename T>
        void swap(T & str)
        {
            typename T::value_type help = str[1];  // dependent type names need "typename"
            str[1] = str[0];
            str[0] = help;
        }
  • 17. template <typename T>
        void swap(T & str)
        {
            typename Value<T>::Type help = str[1];
            str[1] = str[0];
            str[0] = help;
        }
  • 18. Metafunction
        template <typename T>
        struct Value
        {
            typedef T Type;
        };
  • 19. template <typename T>
        struct Value
        {
            typedef T Type;
        };

        template <typename T>
        struct Value< String<T> >
        {
            typedef T Type;
        };
  • 20. template <typename T>
        struct Value
        {
            typedef T Type;
        };

        template <typename T>
        struct Value< String<T> >
        {
            typedef T Type;
        };

        template < >
        struct Value< char * >
        {
            typedef char Type;
        };
  • 21. template <typename T>
        struct Value< String<T> >
        {
            typedef T Type;
        };

        template < >
        struct Value< char * >
        {
            typedef char Type;
        };

        template < size_t N >
        struct Value< char [N] >
        {
            typedef char Type;
        };
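The specializations on slides 18 to 21 can be checked at compile time. The following is a minimal sketch rather than SeqAn's actual code: it assumes `std::string` as a stand-in for SeqAn's `String<T>`, and the `static_assert` checks are added here purely for illustration.

```cpp
#include <cstddef>
#include <string>
#include <type_traits>

// Generic case from the slides: with no further knowledge, Value<T> is T itself.
template <typename T>
struct Value
{
    typedef T Type;
};

// std::string stands in for SeqAn's String<T> in this sketch (an assumption).
template <>
struct Value<std::string>
{
    typedef char Type;
};

// C-style strings held through a pointer.
template <>
struct Value<char *>
{
    typedef char Type;
};

// Character arrays of any fixed length N.
template <std::size_t N>
struct Value<char[N]>
{
    typedef char Type;
};

// Compile-time checks: each container type maps to its element type.
static_assert(std::is_same<Value<std::string>::Type, char>::value, "std::string holds char");
static_assert(std::is_same<Value<char *>::Type, char>::value, "char * points to char");
static_assert(std::is_same<Value<char[6]>::Type, char>::value, "char[6] holds char");
static_assert(std::is_same<Value<int>::Type, int>::value, "generic fallback returns T");
```

Because the mapping is resolved entirely by the compiler, adding support for a new container type means adding one more specialization, without touching any generic algorithm that uses `Value<T>::Type`.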
  • 22. template <typename T>
        void swap(T & str)
        {
            typename Value<T>::Type help = str[1];
            str[1] = str[0];
            str[0] = help;
        }
  • 23. template <typename T>
        void swap(T & str)
        {
            typename Value<T>::Type help = str[1];
            str[1] = str[0];
            str[0] = help;
        }
  • 24. template <typename T>
        void swap(T & str)
        {
            typename Value<T>::Type help = value(str, 1);
            value(str, 1) = value(str, 0);
            value(str, 0) = help;
        }
  • 25. Shim Function
        template <typename T>
        typename Value<T>::Type & value(T & str, int i)
        {
            return str[i];
        }
  • 26. Generic Algorithm
        template <typename T>
        void swap(T & str)
        {
            typename Value<T>::Type help = value(str, 1);
            value(str, 1) = value(str, 0);
            value(str, 0) = help;
        }
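Putting the metafunction, the shim function and the generic algorithm together, the pattern from slides 17 to 26 might look as follows in plain C++. This is a sketch under assumptions: the standard containers stand in for SeqAn strings, the algorithm is renamed `swapFirstTwo` (a hypothetical name, chosen to avoid confusion with `std::swap`), and the `length()` shim is added here only to check the precondition.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Metafunction: maps a container type to its element type.
template <typename T>
struct Value { typedef T Type; };

template <typename T>
struct Value<std::vector<T> > { typedef T Type; };

template <>
struct Value<std::string> { typedef char Type; };

template <typename T, std::size_t N>
struct Value<T[N]> { typedef T Type; };

// Shim functions: uniform global interface for class types and built-in arrays.
template <typename T>
std::size_t length(T const & str) { return str.size(); }

template <typename T, std::size_t N>
std::size_t length(T const (&)[N]) { return N; }

template <typename T>
typename Value<T>::Type & value(T & str, std::size_t i)
{
    return str[i];
}

// Generic algorithm: exchanges the first two elements of any supported container.
template <typename T>
void swapFirstTwo(T & str)
{
    assert(length(str) >= 2);
    typename Value<T>::Type help = value(str, 1);
    value(str, 1) = value(str, 0);
    value(str, 0) = help;
}
```

Called on a `std::string` holding "AC", `swapFirstTwo` leaves "CA"; the same function body also works on a `std::vector<int>` or a plain `char` array, because all type and access details are resolved through `Value` and `value()`.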
  • 27. SeqAn Content - SDK
  • 28. SeqAn SDK Components - Tutorials
  • 29. SeqAn SDK Components - Reference Manual
  • 30. SeqAn SDK Components: Review Board to ensure code quality; CDash/CTest to automatically compile and test across platforms; code coverage reports
  • 31. SeqAn Content: algorithms & data structures
  • 32. Unified Alignment Algorithms: Versatile & Extensible DP-Interface. For example:
        Standard DP-Algorithms: Global & Semi-Global Alignments, Local Alignments
        Modified DP-Algorithms: Split Breakpoint Detection, Banded Chain Alignment
  • 33. Unified Alignment Algorithms. For example:
        Needleman-Wunsch with Traceback:
            DPProfile<GlobalAlignment<>, LinearGaps, TracebackOn<> >
        Semi-Global Gotoh without Traceback:
            DPProfile<GlobalAlignment<FreeEndGaps<True, False, True, False> >, AffineGaps, TracebackOff>
        Banded Smith-Waterman with Affine Gap Costs:
            DPBand<BandOn>(lowerDiag, upperDiag), DPProfile<LocalAlignment<>, AffineGaps, TracebackOn<> >
        Split-Breakpoint Detection for Right Anchor:
            DPProfile<SplitAlignment<>, AffineGaps, TracebackOn<GapsRight> >
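As a point of reference for the first configuration above (global alignment, linear gap costs), here is a minimal sketch of the Needleman-Wunsch recurrence computing only the score, without traceback. It does not use SeqAn's DP interface; the function name and the scoring parameters are assumptions made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Global alignment score (Needleman-Wunsch) with linear gap costs.
// Only two DP rows are kept, since no traceback is computed.
int globalAlignmentScore(std::string const & a, std::string const & b,
                         int match, int mismatch, int gap)
{
    assert(gap <= 0 && mismatch <= match);  // sanity check on the scoring scheme

    std::vector<int> prev(b.size() + 1), cur(b.size() + 1);

    // First row: aligning the empty prefix of a against prefixes of b costs gaps only.
    for (std::size_t j = 0; j <= b.size(); ++j)
        prev[j] = static_cast<int>(j) * gap;

    for (std::size_t i = 1; i <= a.size(); ++i)
    {
        cur[0] = static_cast<int>(i) * gap;
        for (std::size_t j = 1; j <= b.size(); ++j)
        {
            int diag = prev[j - 1] + (a[i - 1] == b[j - 1] ? match : mismatch);
            int up   = prev[j] + gap;      // gap in b
            int left = cur[j - 1] + gap;   // gap in a
            cur[j] = std::max(diag, std::max(up, left));
        }
        prev.swap(cur);
    }
    return prev[b.size()];
}
```

With match = 1, mismatch = -1 and gap = -1, aligning "ACGT" against "AGT" scores 2: three matches and one gap. Affine gaps, banding or free end gaps change the recurrence and initialization, which is the kind of variation the slides describe as template parameters.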
  • 34. Support for Common File Formats. Important file formats for HTS analysis:
        Sequences: FASTA, FASTQ, Indexed FASTA (FAI) for random access
        Genomic Features: GFF 2, GFF 3, GTF, BED
        Read Mapping: SAM, BAM (plus BAM indices)
        Variants: VCF
        Reading sequences:
            SequenceStream ss("file.fa.gz");
            while (!atEnd(ss))
            {
                readRecord(id, seq, ss);
                cout << id << '\t' << seq << '\n';
            }
        Reading BAM records:
            BamStream bs("file.bam");
            while (!atEnd(bs))
            {
                readRecord(record, bs);
                cout << record.qName << '\t' << record.pos << '\n';
            }
        … or write your own parser: tutorials and helper routines for writing your own parsers.
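To illustrate what "write your own parser" can mean for the simplest of these formats, here is a small FASTA reader in standard C++. It is a sketch only and does not use SeqAn's helper routines; `FastaRecord`, `readFasta` and `readFastaFromString` are hypothetical names, and compressed input is not handled.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// A record is an (id, sequence) pair.
typedef std::pair<std::string, std::string> FastaRecord;

// Reads all records from a FASTA stream. Header lines start with '>';
// sequence lines are concatenated until the next header.
std::vector<FastaRecord> readFasta(std::istream & in)
{
    std::vector<FastaRecord> records;
    std::string line;
    while (std::getline(in, line))
    {
        if (!line.empty() && line[line.size() - 1] == '\r')
            line.erase(line.size() - 1);  // tolerate Windows line endings
        if (line.empty())
            continue;
        if (line[0] == '>')
            records.push_back(FastaRecord(line.substr(1), std::string()));
        else if (!records.empty())
            records.back().second += line;
        // Sequence lines before the first header are ignored in this sketch.
    }
    return records;
}

// Convenience wrapper for in-memory data.
std::vector<FastaRecord> readFastaFromString(std::string const & text)
{
    std::istringstream in(text);
    return readFasta(in);
}
```

For the input ">chr1\nACGT\nAC\n>chr2\nGG\n" the function returns two records, ("chr1", "ACGTAC") and ("chr2", "GG").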
  • 35. Journaled Sequences: Store Multiple Genomes, Save Storage Capacities
        Type shown: String<Dna, Journaled<Alloc<> > >
            StringSet<TJournaled, Owner<JournalSet> > set;
            setGlobalReference(set, refSeq);
            appendValue(set, seq1);
            join(set, idx, JoinConfig<>());
        [diagram: reference sequence Ref and genomes G1, G2, ..., GN]
  • 36. Fragment Store: (Multi) Read Alignments
        Read alignments can be easily imported:
            std::ifstream file("ex1.sam");
            read(file, store, Sam());
        … and accessed as a multiple alignment, e.g. for visualization:
            AlignedReadLayout layout;
            layoutAlignment(layout, store);
            printAlignment(svgFile, Raw(), layout, store, 1, 0, 150, 0, 36);
  • 37. Unified Full-Text Indexing Framework
        Available indices:
            Suffix trees: suffix array, enhanced suffix array, lazy suffix tree
            Prefix trie: FM-index
            q-gram indices: direct addressing, open addressing, gapped
        Examples: Index<TSeq, IndexEsa<> >, Index<StringSet<TSeq>, FMIndex<> >
        All indices support multiple strings and external memory construction/usage.
        Index lookup interface: all indices support the (sequential) find interface:
            Finder<TIndex> finder(index);
            while (find(finder, "TATAA"))
                cout << "Hit at position " << position(finder) << endl;
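The idea behind an index-backed find can be sketched with a plain suffix array: once the suffixes are sorted, all occurrences of a pattern form one contiguous range that binary search locates. This is an illustrative toy with a naive construction, not SeqAn's Index or Finder implementation; `buildSuffixArray` and `findAll` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Naive suffix array construction: sort all suffix start positions
// lexicographically. Fine for a demo, far too slow for genomes.
std::vector<std::size_t> buildSuffixArray(std::string const & text)
{
    std::vector<std::size_t> sa(text.size());
    for (std::size_t i = 0; i < sa.size(); ++i)
        sa[i] = i;
    std::sort(sa.begin(), sa.end(), [&text](std::size_t a, std::size_t b) {
        return text.compare(a, std::string::npos, text, b, std::string::npos) < 0;
    });
    return sa;
}

// Returns all start positions of pattern in text, in increasing order.
std::vector<std::size_t> findAll(std::string const & text,
                                 std::vector<std::size_t> const & sa,
                                 std::string const & pattern)
{
    assert(sa.size() == text.size());
    // First suffix whose prefix of length |pattern| is not smaller than pattern.
    std::vector<std::size_t>::const_iterator lo = std::lower_bound(
        sa.begin(), sa.end(), pattern,
        [&text](std::size_t suffix, std::string const & p) {
            return text.compare(suffix, p.size(), p) < 0;
        });
    // First suffix whose prefix of length |pattern| is greater than pattern.
    std::vector<std::size_t>::const_iterator hi = std::upper_bound(
        lo, sa.end(), pattern,
        [&text](std::string const & p, std::size_t suffix) {
            return text.compare(suffix, p.size(), p) > 0;
        });
    std::vector<std::size_t> hits(lo, hi);
    std::sort(hits.begin(), hits.end());
    return hits;
}
```

For the text "GATATATAAGTATAA", searching for "TATAA" yields the positions 4 and 10. An enhanced suffix array or FM-index answers the same query with different time and memory trade-offs, which is why a common find interface over exchangeable indices is useful.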
  • 39. Masai read mapper
  • 40. Masai read mapper [diagram labels: Reads; Genome (Chr. 1, Chr. 2, Chr. X); ACGCTTCATCGCCCT…; Index of reads (radix tree of seeds); Index of genome (e.g. FM-index)] The algorithm is based on the simultaneous traversal of two string indices (e.g., FM-index, enhanced suffix array, lazy suffix tree).
  • 41. Read Mapping: Masai. Faster and more accurate than BWA and Bowtie2. Timings on a single core.
  • 42. Easily exchange index…
  • 43. Collaboration to parallelize indices and verification algorithms in SeqAn, to speed up any applications making use of indices. What about multi-core implementation?
  • 44. SeqAn going parallel. GOAL: Parallelize the finder interface of SeqAn so it works on CPU and accelerators like GPU. [slide note: Will be replaced by hg18 and 10 million 20-mers]
  • 45. SeqAn going parallel [code annotations: Construct FM-index on reverse genome; Set # OMP threads; Call generic count function]
  • 46. SeqAn going parallel: NVIDIA GPUs [code annotations: Copy needles and index to GPU; SAME count function as on CPU!]
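The code on this slide is not part of the transcript, so the following is only a rough sketch of the three annotated steps on the CPU side: set the number of OpenMP threads, then call one count function over all needles. A naive string search stands in for the FM-index, and `countOccurrences` and `countAll` are hypothetical names, not SeqAn functions. Without OpenMP support the pragma is ignored and the loop simply runs sequentially.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Counts (possibly overlapping) occurrences of needle in haystack.
// Stand-in for an index-based count.
std::size_t countOccurrences(std::string const & haystack, std::string const & needle)
{
    assert(!needle.empty());
    std::size_t hits = 0;
    for (std::size_t pos = haystack.find(needle); pos != std::string::npos;
         pos = haystack.find(needle, pos + 1))
        ++hits;
    return hits;
}

// Sums the occurrence counts of all needles, distributing needles over threads.
std::size_t countAll(std::string const & haystack,
                     std::vector<std::string> const & needles,
                     int threads)
{
#ifdef _OPENMP
    omp_set_num_threads(threads);  // "Set # OMP threads"
#else
    (void)threads;                 // sequential fallback
#endif
    long long total = 0;
    long long const n = static_cast<long long>(needles.size());
    #pragma omp parallel for reduction(+:total)
    for (long long i = 0; i < n; ++i)
        total += static_cast<long long>(countOccurrences(haystack, needles[i]));
    return static_cast<std::size_t>(total);
}
```

Each needle is searched independently, so the loop needs no synchronization beyond the final reduction of the per-thread sums.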
  • 47. SeqAn going parallel: Count occurrences of 10 million 20-mers in the human genome using an FM-index
        Platform                            Time      Speedup
        i7, 3.2 GHz                         18.6 s    1 X
        i7, 3.2 GHz (…12...)                2.66 s    7 X
        Intel Xeon Phi 7120, 244 threads    2.18 s    8.5 X
        NVIDIA Tesla K20                    0.4 s     47 X
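For readers unfamiliar with how an FM-index counts occurrences, here is a toy version of backward search. It is a sketch for small inputs and not SeqAn's FMIndex: construction sorts suffixes naively, occurrence counts are obtained by scanning the BWT instead of using sampled rank tables, and `ToyFMIndex` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

class ToyFMIndex
{
public:
    explicit ToyFMIndex(std::string text)
    {
        assert(text.find('$') == std::string::npos);  // '$' is reserved as sentinel
        text += '$';
        std::size_t const n = text.size();

        // Sort suffix start positions; with a unique sentinel this equals
        // sorting all rotations of the text.
        std::vector<std::size_t> sa(n);
        for (std::size_t i = 0; i < n; ++i)
            sa[i] = i;
        std::sort(sa.begin(), sa.end(), [&text](std::size_t a, std::size_t b) {
            return text.compare(a, std::string::npos, text, b, std::string::npos) < 0;
        });

        // BWT: the character preceding each sorted suffix (cyclically).
        bwt_.resize(n);
        for (std::size_t i = 0; i < n; ++i)
            bwt_[i] = text[(sa[i] + n - 1) % n];

        // c_[x]: number of characters in the text that are smaller than x.
        std::size_t counts[256] = {0};
        for (std::size_t i = 0; i < n; ++i)
            ++counts[static_cast<unsigned char>(text[i])];
        std::size_t sum = 0;
        for (int x = 0; x < 256; ++x)
        {
            c_[x] = sum;
            sum += counts[x];
        }
    }

    // Number of occurrences of pattern, found by backward search:
    // the pattern is processed from its last character to its first.
    std::size_t count(std::string const & pattern) const
    {
        assert(!pattern.empty());
        std::size_t lo = 0, hi = bwt_.size();  // half-open range of matching rows
        for (std::size_t k = pattern.size(); k > 0 && lo < hi; --k)
        {
            unsigned char const ch = static_cast<unsigned char>(pattern[k - 1]);
            lo = c_[ch] + occ(ch, lo);
            hi = c_[ch] + occ(ch, hi);
        }
        return hi - lo;
    }

private:
    // Occurrences of ch in the first `end` characters of the BWT (linear scan).
    std::size_t occ(unsigned char ch, std::size_t end) const
    {
        return static_cast<std::size_t>(
            std::count(bwt_.begin(), bwt_.begin() + static_cast<std::ptrdiff_t>(end),
                       static_cast<char>(ch)));
    }

    std::string bwt_;
    std::size_t c_[256];
};
```

Counting a pattern of length m takes m steps regardless of how often it occurs, and each of the 10 million queries in the benchmark is independent of the others, which is what makes the workload easy to spread across threads.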
  • 48. SeqAn going parallel: Approx. count occurrences of 1.2 million 33-mers in the human genome using an FM-index
        Platform                            Time      Speedup
        i7, 3.2 GHz                         66.1 s    1 X
        i7, 3.2 GHz (…12...)                9.0 s     7.3 X
        Intel Xeon Phi 7120, 244 threads    3.9 s     16.9 X
        NVIDIA Tesla K20                    3.2 s     20.7 X
  • 49. Part II: The details
  • 50. Parallelization on the GPU
  • 51. CUDA preliminaries. In order to use CUDA we first had to adapt some parts of SeqAn:
        • CUDA requires each function to be prefixed with domain qualifiers __host__ or __device__ in order to generate CPU/GPU code
        • We prefixed all basic template functions with a SEQAN_HOST_DEVICE macro:
            #ifdef __CUDACC__
            #define SEQAN_HOST_DEVICE inline __device__ __host__
            #else
            #define SEQAN_HOST_DEVICE inline
            #endif
        • Static const arrays are not allowed in the way SeqAn defines them
        • We replaced alphabet conversion lookup tables (e.g. Dna <--> char) by conversion functions
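A conversion function of the kind mentioned in the last point might look like this. The macro is the one from the slide; the two function names and the choice to map unknown characters to 'A' are assumptions for this sketch, not SeqAn's actual behaviour. The code compiles as ordinary C++ and, when built with nvcc, the same functions become callable from device code.

```cpp
#include <cassert>

#ifdef __CUDACC__
#define SEQAN_HOST_DEVICE inline __device__ __host__
#else
#define SEQAN_HOST_DEVICE inline
#endif

// char -> Dna rank (A=0, C=1, G=2, T=3), written as a function
// instead of a static lookup table.
SEQAN_HOST_DEVICE unsigned char dnaOrdValue(char c)
{
    switch (c)
    {
        case 'A': case 'a': return 0;
        case 'C': case 'c': return 1;
        case 'G': case 'g': return 2;
        case 'T': case 't': return 3;
        default:            return 0;  // assumption: unknown characters map to 'A'
    }
}

// Dna rank -> char.
SEQAN_HOST_DEVICE char dnaToChar(unsigned char ord)
{
    assert(ord < 4);
    return ord == 0 ? 'A' : ord == 1 ? 'C' : ord == 2 ? 'G' : 'T';
}
```

The switch replaces a 256-entry table lookup with a few comparisons, trading a memory access for branches, and avoids the static const array that the slide identifies as the problem.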
  • 52. Strings
        • Instead of defining a new CUDA string we simply use the Thrust library:
        • Provides host_vector and device_vector classes, which are vectors with buffers in host or device memory
        • However, Thrust functions are callable only from host-side
        • We made both vectors accessible from SeqAn
        • SeqAn strings have to provide a set of global (meta-)functions, e.g. Value<>, resize(), …
        • We simply defined the required wrapper functions for these two vectors
  • 53. Standard Strings
        • Up to here, all strings can only be used on the side of their scope
        [diagram labels: Device Memory, Host Memory; thrust::host_vector, thrust::device_vector and seqan::String, each with its Buffer]
  • 54. Host-Device String
        • How to access a device_vector from device-side?
        • We could pass (POD) iterators to the kernel
        • However, many SeqAn algorithms work on more complex containers
        • We need the same interface of the container on the device side
        • For strings we developed a so-called ContainerView (POD type)
        • Provides a container interface given the begin/end pointers of vector buffer
        • The view() function creates the ContainerView object for a given device_vector
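The view idea can be illustrated on the host alone. Below, a trivially copyable struct holds only begin/end pointers into a buffer owned by a `std::vector` and offers a small container interface, so it can be passed by value the way a kernel argument would be. `ToyView` and `reverseInPlace` are hypothetical names; this is a sketch of the concept and not SeqAn's ContainerView.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

// A non-owning view: two raw pointers, no destructor, no allocation.
template <typename TValue>
struct ToyView
{
    TValue * beginPtr;
    TValue * endPtr;

    TValue & operator[](std::size_t i) const
    {
        assert(beginPtr + i < endPtr);
        return beginPtr[i];
    }
};

// The view can be copied bit by bit, which is what passing it to a kernel requires.
static_assert(std::is_trivially_copyable<ToyView<char> >::value,
              "a view must be trivially copyable");

template <typename TValue>
std::size_t length(ToyView<TValue> const & v)
{
    return static_cast<std::size_t>(v.endPtr - v.beginPtr);
}

// Creates a view onto the buffer of an existing vector.
template <typename TValue>
ToyView<TValue> view(std::vector<TValue> & vec)
{
    ToyView<TValue> v;
    v.beginPtr = vec.data();
    v.endPtr = vec.data() + vec.size();
    return v;
}

// Stand-in for a kernel: receives the view by value and modifies
// the underlying buffer through it.
template <typename TView>
void reverseInPlace(TView v)
{
    for (std::size_t i = 0, j = length(v); i + 1 < j; ++i, --j)
        std::swap(v[i], v[j - 1]);
}
```

Because the view does not own the buffer, the vector must outlive every view created from it; in the GPU setting that means the device_vector has to stay alive until the kernel has finished.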
  • 55. Host-Device String
    • How to use a device_vector on the device
    [Figure: on the host side, view() creates a seqan::ContainerView of a thrust::device_vector whose buffer lies in device memory; the seqan::ContainerView is passed to the device side at kernel launch.]
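The idea of a POD view can be sketched in host-only C++ as follows. SimpleView, makeView and reverseInPlace are hypothetical names for illustration; std::vector stands in for thrust::device_vector, and this is not how SeqAn's ContainerView is actually implemented.

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical, simplified view: nothing but two raw pointers, so it can be
// copied bytewise (which is what a kernel argument needs).
template <typename TValue>
struct SimpleView
{
    TValue * first;
    TValue * last;
};

static_assert(std::is_trivially_copyable<SimpleView<char> >::value,
              "the view must stay a plain-old-data type");

// Global interface functions in the style of SeqAn's free functions.
template <typename TValue>
inline std::size_t length(SimpleView<TValue> const & v)
{
    return static_cast<std::size_t>(v.last - v.first);
}

template <typename TValue>
inline TValue & value(SimpleView<TValue> const & v, std::size_t pos)
{
    return v.first[pos];
}

// Creates a view of an existing buffer; the view does not own the memory.
template <typename TValue>
inline SimpleView<TValue> makeView(std::vector<TValue> & buffer)
{
    SimpleView<TValue> v = { buffer.data(), buffer.data() + buffer.size() };
    return v;
}

// A generic algorithm that only uses the view interface.
template <typename TValue>
inline void reverseInPlace(SimpleView<TValue> v)
{
    for (std::size_t i = 0, j = length(v); i + 1 < j; ++i, --j)
    {
        TValue tmp = value(v, i);
        value(v, i) = value(v, j - 1);
        value(v, j - 1) = tmp;
    }
}
```

Because the view is passed by value and only holds pointers, changes made through it are visible in the underlying buffer, which is the behaviour a kernel working on device memory relies on.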
  • 56. Device and View metafunctions
    • For generic GPU programming:
    • The Device metafunction returns the device-memory equivalent of a class
        // Replaces String with thrust::device_vector.
        template <typename TValue, typename TSpec>
        struct Device<String<TValue, TSpec> >
        {
            typedef thrust::device_vector<TValue> Type;
        };
    • The View metafunction returns the (POD) view type of a class
        // Returns a view type that can be passed to a CUDA kernel.
        template <typename TValue, typename TAlloc>
        struct View<thrust::device_vector<TValue, TAlloc> >
        {
            typedef ContainerView<thrust::device_vector<TValue, TAlloc> > Type;
        };
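How the two metafunctions compose can be shown with stand-in types in plain C++. HostString, DeviceString, ViewOf and KernelArgument are hypothetical placeholders for seqan::String, thrust::device_vector, ContainerView and a helper invented for this sketch; only the metafunction pattern itself is taken from the slide.

```cpp
#include <type_traits>
#include <vector>

// Stand-in types (assumptions for this sketch, not SeqAn or Thrust classes).
template <typename T> struct HostString   { std::vector<T> data; };
template <typename T> struct DeviceString { std::vector<T> data; };
template <typename T> struct ViewOf       { T * first; T * last; };

// Device metafunction: maps a host type to its device-memory equivalent.
template <typename T> struct Device;
template <typename T>
struct Device<HostString<T> >
{
    typedef DeviceString<T> Type;
};

// View metafunction: maps a device container to its POD view type.
template <typename T> struct View;
template <typename T>
struct View<DeviceString<T> >
{
    typedef ViewOf<T> Type;
};

// Hypothetical helper: the type a kernel would receive for a given host type,
// obtained by chaining both metafunctions.
template <typename THost>
struct KernelArgument
{
    typedef typename View<typename Device<THost>::Type>::Type Type;
};
```

Generic code can then be written once against the host type and ask the metafunctions for the matching device and view types at compile time.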
  • 57. Hello world
    • A simple example to reverse a string on the GPU
        // A standard SeqAn string over the Dna alphabet.
        String<Dna> myString = "ACGT";

        // A Dna string on device global memory.
        typename Device<String<Dna> >::Type myDeviceString;

        // Copy the string to global memory.
        assign(myDeviceString, myString);

        // Pass a view of the device string to the CUDA kernel.
        myKernel<<<1,1>>>(view(myDeviceString));

        // TString is ContainerView<device_vector<Dna> >.
        template <typename TString>
        __global__ void myKernel(TString string)
        {
            // Cast so that the argument matches the %d format specifier.
            printf("length(string) = %d\n", (int)length(string));
            reverse(string);
        }
  • 58. Porting complex data structures
    • More complex structures (e.g. Index, Graph) can only be ported to the GPU if they …
      • don't use pointers
      • use only strings of POD types (String<Dna>, but not String<String<…> >)
      • use only 1-dimensional StringSets (ConcatDirect)
    • Nested classes are no problem
      • The View metafunction converts all member types into their view types
      • The view() function is called recursively on all members
  • 59. Example: FM Index
  • 60. The FM-index (BWT, LF-mapping)
  • 61. The FM-index (search ssi)
    a3 = C('i') + Occ('i', 0) + 1 = 1 + 0 + 1 = 2
    b3 = C('i') + Occ('i', 12) = 1 + 4 = 5
  • 62. The FM-index (backwards search)
    a1 = C('s') + Occ('s', 8) + 1 = 8 + 2 + 1 = 11
    b1 = C('s') + Occ('s', 10) = 8 + 4 = 12
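The interval updates on the two slides above can be reproduced with a deliberately naive sketch. It assumes the text is mississippi$, which is consistent with the numbers shown (C('i') = 1, Occ('i', 12) = 4, C('s') = 8). NaiveFmIndex and backwardSearch are hypothetical names; a real FM-index stores sampled tables instead of rescanning the BWT for every C and Occ query.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Naive stand-in for an FM-index: keeps only the BWT and scans it on demand.
struct NaiveFmIndex
{
    std::string bwt;

    // The text must end with a unique sentinel '$' that is smaller than all
    // other characters.
    explicit NaiveFmIndex(std::string const & text)
    {
        std::size_t const n = text.size();
        std::vector<std::size_t> sa(n);
        for (std::size_t i = 0; i < n; ++i) sa[i] = i;
        // Sort suffix start positions lexicographically (quadratic, demo only).
        std::sort(sa.begin(), sa.end(), [&text](std::size_t x, std::size_t y)
        {
            return text.compare(x, std::string::npos, text, y, std::string::npos) < 0;
        });
        // BWT[i] is the character preceding the i-th smallest suffix.
        for (std::size_t i = 0; i < n; ++i)
            bwt += text[(sa[i] + n - 1) % n];
    }

    // C(c): number of characters in the text that are smaller than c.
    std::size_t C(char c) const
    {
        return static_cast<std::size_t>(
            std::count_if(bwt.begin(), bwt.end(), [c](char x) { return x < c; }));
    }

    // Occ(c, i): number of occurrences of c in the first i characters of the BWT.
    std::size_t Occ(char c, std::size_t i) const
    {
        return static_cast<std::size_t>(std::count(bwt.begin(), bwt.begin() + i, c));
    }
};

// Returns the 1-based, inclusive suffix array interval [a, b] of the pattern.
// The interval is empty if a > b.
inline std::pair<std::size_t, std::size_t>
backwardSearch(NaiveFmIndex const & index, std::string const & pattern)
{
    std::size_t a = 1, b = index.bwt.size();
    for (std::size_t i = pattern.size(); i > 0 && a <= b; --i)
    {
        char const c = pattern[i - 1];
        a = index.C(c) + index.Occ(c, a - 1) + 1;
        b = index.C(c) + index.Occ(c, b);
    }
    return std::make_pair(a, b);
}
```

For the pattern ssi the three steps give the intervals [2, 5], [9, 10] and [11, 12], so the pattern occurs b - a + 1 = 2 times.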
  • 63. The FM-index in SeqAn
    • The FM-index can be implemented using a number of string-based lookup tables
      • ... as well as other indices, e.g. enhanced suffix array, q-gram index
    • There is a space-time tradeoff between all these indices
      • The FM-index has the minimal memory requirements
  • 64. A generic FM-index
    • SeqAn's FM-index consists of some nested classes storing Strings
    [Figure: FM-index (host-only)]
  • 65. A generic FM-index
    • The Device type of the FM-index uses device_vector instead of String
    [Figure: GPU FM-index (host-part)]
    • The view of this object (= device-part) is the same tree, where leaves are replaced by ContainerViews of device_vectors
  • 66. CPU vs. GPU
    • Invoking an FM-index based search on CPU and GPU:
        // Select the index type.
        typedef Index<DnaString, FMIndex<> > TIndex;

        // Type is Index<device_vector<Dna>, FMIndex<> >.
        typedef typename Device<TIndex>::Type TDeviceIndex;

        // ========== On CPU ==========
        // Create an index.
        TIndex index("ACGTTGCAA");

        // Use the FM-index on CPU.
        findCPU(index,…);

        template <typename TIndex>
        void findCPU(TIndex & index,…);

        // ========== On GPU ===========
        // Create a device index.
        TIndex index("ACGTTGCAA");
        TDeviceIndex deviceIndex;
        assign(deviceIndex, index);

        // Use the FM-index in a CUDA kernel.
        findGPU<<<...>>>(view(deviceIndex),…);

        template <typename TIndex>
        __global__ void
        findGPU(TIndex index,…);
    • The findGPU kernel AND the findCPU function will invoke many instances of the SAME generic function, which will perform a backtracking algorithm on our generic index interface
  • 67. Approximate search via backtracking
        do {
            if (finder.score == finder.scoreThreshold)
            {
                if (goDown(textIt, suffix(pattern, patternIt))) delegate(finder);
                goUp(textIt);
                if (isRoot(textIt)) break;
            }
            else if (finder.score < finder.scoreThreshold)
            {
                if (atEnd(patternIt)) delegate(finder);
                else if (goDown(textIt))
                {
                    finder.score += parentEdgeLabel(textIt) != value(patternIt);
                    goNext(patternIt);
                    continue;
                }
            }

            do {
                goPrevious(patternIt);
                finder.score -= parentEdgeLabel(textIt) != value(patternIt);
            } while (!goRight(textIt) && goUp(textIt));
            if (isRoot(textIt)) break;
            finder.score += parentEdgeLabel(textIt) != value(patternIt);
            goNext(patternIt);
        }
        while (true);
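The loop above walks a virtual tree with iterators and an explicit score. The same idea, searching with a mismatch budget and backing out of branches that exceed it, can be sketched recursively on a naive BWT-based index. This is a different formulation from the slide (recursive, backward search, Hamming distance only), and TinyFmIndex and countApprox are hypothetical names, not SeqAn's interface.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Naive stand-in for an FM-index: keeps the BWT and scans it for C and Occ.
struct TinyFmIndex
{
    std::string bwt;
    std::string alphabet;  // distinct text characters, sentinel '$' excluded

    // The text must end with a unique sentinel '$'.
    explicit TinyFmIndex(std::string const & text)
    {
        std::size_t const n = text.size();
        std::vector<std::size_t> sa(n);
        for (std::size_t i = 0; i < n; ++i) sa[i] = i;
        std::sort(sa.begin(), sa.end(), [&text](std::size_t x, std::size_t y)
        {
            return text.compare(x, std::string::npos, text, y, std::string::npos) < 0;
        });
        for (std::size_t i = 0; i < n; ++i)
        {
            bwt += text[(sa[i] + n - 1) % n];
            if (text[i] != '$' && alphabet.find(text[i]) == std::string::npos)
                alphabet += text[i];
        }
    }

    std::size_t C(char c) const
    {
        return static_cast<std::size_t>(
            std::count_if(bwt.begin(), bwt.end(), [c](char x) { return x < c; }));
    }

    std::size_t Occ(char c, std::size_t i) const
    {
        return static_cast<std::size_t>(std::count(bwt.begin(), bwt.begin() + i, c));
    }
};

// Counts the text positions matching the pattern with at most errorsLeft
// mismatches. [a, b] is the current 1-based interval; remaining is the number
// of pattern characters still to be processed, from right to left.
inline std::size_t countApprox(TinyFmIndex const & index, std::string const & pattern,
                               std::size_t remaining, std::size_t a, std::size_t b,
                               unsigned errorsLeft)
{
    if (a > b) return 0;                   // dead branch: backtrack
    if (remaining == 0) return b - a + 1;  // whole pattern consumed: report hits
    std::size_t hits = 0;
    for (char c : index.alphabet)
    {
        unsigned const cost = (c != pattern[remaining - 1]) ? 1u : 0u;
        if (cost > errorsLeft) continue;   // error budget exhausted on this branch
        std::size_t const na = index.C(c) + index.Occ(c, a - 1) + 1;
        std::size_t const nb = index.C(c) + index.Occ(c, b);
        hits += countApprox(index, pattern, remaining - 1, na, nb, errorsLeft - cost);
    }
    return hits;
}

// Convenience overload starting from the full interval.
inline std::size_t countApprox(TinyFmIndex const & index, std::string const & pattern,
                               unsigned maxErrors)
{
    return countApprox(index, pattern, pattern.size(), 1, index.bwt.size(), maxErrors);
}
```

With the text mississippi$, the pattern sip has one exact occurrence and two occurrences when one mismatch is allowed (sip itself and sis). An iterative version with an explicit stack, as on the slide, avoids recursion, which matters inside a GPU kernel.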
  • 68. Outlook for GPU support
    • Our next steps are:
      • Provide parallelFor() to hide CUDA kernel call/OpenMP for-loop
      • Develop classes for concurrent access (String, job queues)
      • Port more indices and index iterators to be used with CUDA
      • Port SeqAn's alignment module
      • Develop a CPU/GPU version of the FM-index based read mapper Masai
      • ...
    • Follow our development:
      • Sources: https://github.com/seqan/seqan/tree/develop
      • Code examples: http://trac.seqan.de/wiki/HowTo/DevelopCUDA
  • 70. Multicore parallelization
    • We first introduced Tags to switch between serial and parallel algorithms:
        struct Serial_;
        typedef Tag<Serial_> Serial;

        struct Parallel_;
        typedef Tag<Parallel_> Parallel;
    • Then we defined basic atomic operations required for thread safety:
        template <typename T>
        inline T atomicInc(T &x, Serial)
        {
            return ++x;
        }

        template <typename T>
        inline T atomicInc(volatile T &x, Parallel)
        {
            // Return the incremented value, as the serial overload does.
            return __sync_add_and_fetch(&x, 1);
        }
  • 71. Splitter
    • To this end, we developed the Splitter<TValue, TSpec> to compute a partition into subintervals of (almost) equal length …
        Splitter<unsigned> splitter(10, 20, 3);
        for (unsigned i = 0; i < length(splitter); ++i)
            cout << '[' << splitter[i] << ',' << splitter[i+1] << ')' << endl;

        // [10,14)
        // [14,17)
        // [17,20)
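The arithmetic behind such a partition can be sketched in a few lines. splitInterval is a hypothetical helper, not SeqAn's Splitter; giving the longer intervals to the first chunks reproduces the output shown on the slide, but that SeqAn distributes the remainder this way in general is an assumption.

```cpp
#include <vector>

// Returns the parts + 1 boundaries that split [first, last) into `parts`
// subintervals whose lengths differ by at most one. The first (len % parts)
// intervals receive the extra element.
inline std::vector<unsigned> splitInterval(unsigned first, unsigned last, unsigned parts)
{
    std::vector<unsigned> bounds;
    if (parts == 0 || last < first)
        return bounds;  // nothing sensible to split

    unsigned const len = last - first;
    unsigned const base = len / parts;
    unsigned const extra = len % parts;

    bounds.push_back(first);
    for (unsigned i = 0; i < parts; ++i)
    {
        first += base + (i < extra ? 1u : 0u);
        bounds.push_back(first);
    }
    return bounds;
}
```

Calling splitInterval(10, 20, 3) yields the boundaries 10, 14, 17, 20, i.e. the intervals [10,14), [14,17) and [17,20).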
  • 72. Splitter
    • The Splitter can also be used with iterators directly
    • The Serial / Parallel tag divides an interval range into 1 / #thread_num many intervals
        template <typename TIter, typename TVal, typename TParallelTag>
        inline void arrayFill(TIter begin_, TIter end_,
                              TVal const &value, Tag<TParallelTag> parallelTag)
        {
            // The splitter is parameterized with the iterator type TIter.
            Splitter<TIter> splitter(begin_, end_, parallelTag);

            SEQAN_OMP_PRAGMA(parallel for)
            for (int job = 0; job < (int)length(splitter); ++job)
                arrayFill(splitter[job], splitter[job + 1], value, Serial());
        }
    • The parallel tag can be used to switch off the parallel behaviour
  • 73. SeqAn going parallel
    Count occurrences of 10 million 20-mers in the human genome using an FM-index:
      Platform                            Time      Speedup
      i7, 3.2 GHz                         18.6 s    1x
      …12…                                2.66 s    7x
      Intel Xeon Phi 7120, 244 threads    2.18 s    8.5x
      NVIDIA Tesla K20                    0.4 s     47x
    Thank you for your attention