The document describes a parallel algorithm for 1-D sequence alignment under the Coarse-Grained Multicomputer (CGM) model. The algorithm partitions the sequence alignment problem into smaller subproblems that are distributed across multiple processors. It requires O(n²/(p log n)) time and O(p) communication rounds on p processors, improving on the sequential O(n²) dynamic programming algorithm. Each processor first computes a lookup table efficiently in parallel; equal-sized macro-blocks of the task graph are then distributed across the processors, which compute the sequence alignment in parallel.
CGM-Based Parallel 1-D Sequence Alignment
1. A CGM-Based Parallel Algorithm Using the Four-Russians Speedup for the 1-D Sequence Alignment Problem
By Jerry Lacmou Zeutouo, Grace Colette Tessa Masse, Franklin Ingrid Kamga Youmbi
at the African Conference on Research in Computer Science and Applied Mathematics, CARI'2020, Polytech School of Thiès, Senegal, October 2020.
2. Opening thoughts
The processing of sequences, such as DNA sequencing, gives rise to problems that involve massive computations whose results are hard to obtain on a single PC. One way to speed up their solution is parallelism.
Lacmou et al. 2 / 55
6. Biological information
Definition
Biological information can be:
a sequence (DNA, RNA, proteins);
a structure (primary, secondary);
a function (molecular function, e.g. enzyme; cell component, e.g. membrane protein; biological process, e.g. oxygen transport);
interactions (proteins, metabolic pathways, gene networks).
7. The privileged axes of bioinformatics
1 Formalization of genetic information: analysis of sequences (biomolecules) and of their structure (especially 3D structure);
2 Biological interpretation of genetic information: data integration (establishment of maps and networks of gene interactions, protein interactions, etc.);
3 Functional prediction.
8. Bioinformatics methods
Comparative method: unknown sequences or structures are compared against databases (of sequences and structures) of known genes and proteins to establish similarities (similarities, homologies or identities);
Statistical method: software applies statistical analyses to the data (sequence syntax) to try to identify rules and constraints that are systematic, regular or general in nature;
Modelling: probabilistic approaches.
9. Sequence alignment
One of the fundamental problems of bioinformatics is sequence alignment, which is crucial for molecular prediction, molecular interactions and phylogenetic analysis.
10. Sequence alignment definition
Definition
Sequence alignment is the problem of comparing biological sequences by looking for a series of nucleotides or amino acids that appear in the same order in the input sequences, possibly introducing gaps. It is a means of visualizing the similarity between sequences, based on notions of similarity or distance.
Figure: Sequence alignment
11. Types of alignments
Two types of alignments are considered:
1 global alignments, which take into account the whole of the two sequences to be compared;
2 local alignments, which detect the segment of the first sequence that is most similar to a segment of the other.
12. Dynamic programming complexity
Thanks to dynamic programming, both types of alignments can be computed in O(n²) time and space, where n is the length of the two strings. Based on the Four-Russians method, several sped-up sequential solutions have been proposed, such as Brubach and Ghurye's algorithm, which runs in O(n²/log n) time.
13. Why parallelize?
Despite the speedup of the sequential algorithm, solving the problem remains time- and space-consuming.
14. The parallelization of the sequential algorithm
A PRAM algorithm running in O(log n log m) time with nm/log nm CREW processors was proposed by Apostolico et al., 1990;
Alves et al., 2002 proposed a parallel solution for a variant of the problem under the CGM model, using weighted grid graphs, which requires log p communication rounds and runs in O((n²/p) log m) time;
Recently, Kim et al., 2016 proposed a space-efficient alphabet-independent Four-Russians lookup table and a multithreaded Four-Russians edit distance algorithm.
15. Our aim
We tackle the problem of parallelizing Brubach and Ghurye's sequential algorithm for the one-dimensional sequence alignment problem on the CGM (Coarse-Grained Multicomputer) model.
The CGM model was chosen because it is well suited to current machines and has already been used to solve several problems efficiently, such as the optimal binary search tree problem.
16. What is a CGM algorithm?
Its design is based on:
partitioning of the problem into subproblems;
distribution of the subproblems among the different processors;
the CGM algorithm itself: a sequence of rounds of computation and communication.
17. Parallelisation criteria for a CGM algorithm
Minimize the idle time and the latency of the processors;
Minimize the overall communication time;
Minimize the number of communication rounds;
Minimize the local computation time of each processor;
Balance the load between the processors.
18. Our results
Our solution requires O(n²/(p log n)) execution time with O(p) communication rounds on p processors.
19. Outline
1 Introduction
2 Sequential algorithm
3 CGM algorithm
4 Experimental results
5 Conclusion
20. Formal definition
Definition
The sequence alignment method seeks to optimize the alignment score. This score reflects the rate of similarity between the two compared sequences: it measures the number of edit operations needed to convert a sequence X into another sequence Y.
21. Elementary editing operations
They include:
1 insertions;
2 deletions;
3 substitutions of a single character.
The weighted variant assigns a cost to each of these operations, through the use of a penalty matrix when the costs are not constant.
22. Global alignment overview
A global alignment of two sequences X and Y can be obtained by computing the distance between X and Y;
The elementary editing operations are:
substitution of a symbol a by a symbol b;
deletion of a symbol a;
insertion of a symbol b.
Global alignments can also be computed using similarity scores instead of distances.
23. Global alignment score computation
T[i, j] = max { T[i−1, j−1] + Sub(X[i], Y[j]),
                T[i−1, j] + Del(X[i]),
                T[i, j−1] + Ins(Y[j]) }    (1)
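For concreteness, recurrence (1) can be implemented directly. The scoring scheme below (match +1, mismatch −1, gap −1) is an illustrative assumption, not the talk's; the recurrence works for any Sub/Del/Ins costs.

```python
def global_score(X, Y, match=1, mismatch=-1, gap=-1):
    """Global alignment score via recurrence (1).

    The scores (match/mismatch/gap) are illustrative constants; in the
    weighted variant they would come from a penalty matrix.
    """
    n, m = len(X), len(Y)
    # T[i][j] = best score aligning the prefix X[:i] with the prefix Y[:j].
    T = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        T[i][0] = i * gap          # X[:i] aligned against the empty prefix
    for j in range(m + 1):
        T[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if X[i - 1] == Y[j - 1] else mismatch
            T[i][j] = max(T[i - 1][j - 1] + sub,   # substitution / match
                          T[i - 1][j] + gap,       # deletion of X[i]
                          T[i][j - 1] + gap)       # insertion of Y[j]
    return T[n][m]

print(global_score("ACGT", "AGT"))  # A-GT vs ACGT: 3 matches, 1 gap → 2
```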
24. Local alignment overview
A local alignment of two sequences X and Y consists in finding the segment of X that is most similar to a segment of Y;
The local edit score of two sequences X and Y is defined as s(X, Y), the maximum similarity between a segment of X and a segment of Y.
25. Local alignment score computation
Ts[i, j] = max { Ts[i−1, j−1] + Sub(X[i], Y[j]),
                 Ts[i−1, j] + Del(X[i]),
                 Ts[i, j−1] + Ins(Y[j]),
                 0 }    (2)
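Recurrence (2) differs from (1) only by the extra 0, which lets an alignment restart anywhere. A minimal sketch, again assuming an illustrative match +1 / mismatch −1 / gap −1 scoring:

```python
def local_score(X, Y, match=1, mismatch=-1, gap=-1):
    """Local alignment score s(X, Y) via recurrence (2).

    Because the 0 allows restarting, the best segment-to-segment score is
    the maximum over the whole table, not the bottom-right cell.
    Scores are illustrative constants, as in the global case.
    """
    n, m = len(X), len(Y)
    T = [[0] * (m + 1) for _ in range(n + 1)]  # first row/column stay 0
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if X[i - 1] == Y[j - 1] else mismatch
            T[i][j] = max(T[i - 1][j - 1] + sub,
                          T[i - 1][j] + gap,
                          T[i][j - 1] + gap,
                          0)
            best = max(best, T[i][j])
    return best

print(local_score("TTACGTT", "ACG"))  # the segment ACG matches exactly → 3
```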
27. Speedup techniques
Dynamic programming algorithms can sometimes be made even
faster by applying speedups such as the Knuth-Yao
quadrangle-inequality speedup or the Four-Russians speedup.
28. Idea behind the Four-Russians' speedup
The idea behind the speedup is to tile the dynamic programming table into smaller blocks whose solutions are precomputed and stored in a lookup table. The goal is to spend less time on these blocks by merely looking their solutions up.
30. The Four-Russians' speedup
This sub-quadratic time bound was first obtained using the Four-Russians technique by Masek and Paterson in 1980, achieving O(n²/log n);
Crochemore et al., 2003 later achieved the same time bound for unrestricted scoring matrices.
31. Brubach and Ghurye's efficient lookup table
The bottleneck in the aforementioned speedup lies in the space demanded by the lookup table and the time spent computing it.
32. Lookup table complexity
The Four-Russians lookup table can be built in O(t² lg t) time, queried in O(t) time, and stored in O(t²) space.
33. The block function
The t-blocks within the dynamic programming table are tiled so that adjacent blocks overlap by one column/row on each side;
Given the first row and first column of a t-block, the block function of the lookup process provides its last row and last column;
Applying the block function repeatedly in a row-wise (or column-wise) manner eventually yields the global edit distance score in the bottom-right cell.
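The block-function mechanics described above can be sketched as follows. This is a simplified illustration, not Brubach and Ghurye's actual construction: each t-block is evaluated directly by dynamic programming instead of being fetched from a precomputed lookup table, and the scoring scheme (match +1, mismatch −1, gap −1) is an assumed example.

```python
def block_fn(top, left, x, y, match=1, mismatch=-1, gap=-1):
    """Given the first row/column of a t-block and the substrings x, y
    labelling its rows/columns, return its last row and last column."""
    t = len(top)
    T = [[0] * t for _ in range(t)]
    T[0] = list(top)
    for i in range(t):
        T[i][0] = left[i]          # top[0] == left[0] at the shared corner
    for i in range(1, t):
        for j in range(1, t):
            sub = match if x[i - 1] == y[j - 1] else mismatch
            T[i][j] = max(T[i - 1][j - 1] + sub,
                          T[i - 1][j] + gap,
                          T[i][j - 1] + gap)
    return T[t - 1], [T[i][t - 1] for i in range(t)]

def blockwise_score(X, Y, t):
    """Tile the (n+1) x (m+1) table into t-blocks overlapping by one
    row/column and apply the block function row-wise; only boundary
    cells are ever stored."""
    n, m = len(X), len(Y)
    assert n % (t - 1) == 0 and m % (t - 1) == 0
    T = [[None] * (m + 1) for _ in range(n + 1)]
    for j in range(m + 1):
        T[0][j] = -j               # gap cost -1, matching block_fn's default
    for i in range(n + 1):
        T[i][0] = -i
    for bi in range(0, n, t - 1):  # block origins, stepping by t - 1
        for bj in range(0, m, t - 1):
            top = T[bi][bj:bj + t]
            left = [T[bi + i][bj] for i in range(t)]
            last_row, last_col = block_fn(top, left,
                                          X[bi:bi + t - 1], Y[bj:bj + t - 1])
            for j in range(t):
                T[bi + t - 1][bj + j] = last_row[j]
            for i in range(t):
                T[bi + i][bj + t - 1] = last_col[i]
    return T[n][m]                 # global score in the bottom-right cell
```

Composing the block function this way reproduces the cell-by-cell recurrence exactly; the Four-Russians speedup comes from replacing the inner dynamic program in block_fn with a table lookup.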
35. Our contribution
Our solution is divided into two parts:
1 First, each processor computes the lookup table using Brubach and Ghurye's sequential algorithm in O(t² lg t) time;
2 Second, we partition the task graph into subgraphs of the same size and distribute them fairly onto the processors.
36. Partitioning strategy
Our technique partitions the task graph in two steps:
37. Partitioning steps
1 First, we partition the task graph into small blocks of size t, referred to as t-blocks, such that any two adjacent t-blocks overlap by either a row or a column. After this first partitioning, the task graph is divided into k rows and k columns of t-blocks, where k = n/(t−1);
2 Next, we group these t-blocks into p rows and p columns of blocks referred to as macro-blocks and denoted MB(i, j). MB(i, j) is a matrix of (k/p) × (k/p) t-blocks and is identified by the node in its lower-right corner.
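The arithmetic of this two-level partitioning can be checked with a short sketch; the divisibility assumptions (n a multiple of t−1, and k a multiple of p) are implicit in the slides:

```python
def partition_params(n, t, p):
    """Return (k, s): the task graph forms a k x k grid of t-blocks
    (adjacent blocks overlap by one row/column, hence k = n / (t - 1)),
    grouped into a p x p grid of macro-blocks of s x s t-blocks each."""
    assert n % (t - 1) == 0, "n must be a multiple of t - 1"
    k = n // (t - 1)
    assert k % p == 0, "k must be a multiple of p"
    return k, k // p

print(partition_params(24, 4, 2))  # k = 24/3 = 8, macro-blocks of 4 x 4 t-blocks
```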
38. Scenario of this partitioning
Example for p = 4: each macro-block is labelled with the index of its anti-diagonal (the entry in row i, column j is i + j − 1):
1 2 3 4
2 3 4 5
3 4 5 6
4 5 6 7
39. Snake-like mapping
This distribution assigns all the macro-blocks of an anti-diagonal, from the top-left corner to the bottom-right corner;
The process is repeated until all processors have been used, starting with processor 0 and traversing the blocks along a "snake-like" path.
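One plausible reading of this scheme (an illustrative sketch, not necessarily the authors' exact indexing): walk the p × p grid of macro-blocks anti-diagonal by anti-diagonal, alternating direction to form the snake path, and hand blocks to processors 0, 1, …, p − 1 in round-robin order.

```python
def snake_mapping(p):
    """Assign the p x p macro-blocks to p processors.

    Blocks are enumerated anti-diagonal by anti-diagonal; every other
    diagonal is traversed in reverse (the "snake"), and the k-th block
    visited goes to processor k mod p.
    """
    order = []
    for d in range(2 * p - 1):                      # anti-diagonals i + j = d
        diag = [(i, d - i) for i in range(p) if 0 <= d - i < p]
        if d % 2 == 1:
            diag.reverse()
        order.extend(diag)
    return {block: k % p for k, block in enumerate(order)}

mapping = snake_mapping(4)
print(mapping[(0, 0)])  # the first block visited goes to processor 0
```

Since the p² blocks are handed out round-robin, every processor receives exactly p of them, regardless of p.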
41. Lemma
After the partitioning of the task graph and the snake-like distribution scheme, each processor evaluates exactly p macro-blocks.
Proof.
The partitioning yields p² macro-blocks, which the snake-like scheme distributes evenly over the p processors; hence each processor evaluates exactly p macro-blocks.
42. Our CGM algorithm
We present our CGM solution for the sequence alignment problem. Brubach and Ghurye's sequential algorithm is used for the local computations. Based on the previous partitioning strategy and the mapping of macro-blocks onto processors, we obtain the corresponding CGM algorithm.
44. Local computation time
Lemma
The evaluation of a macro-block of size n/(p(t−1)) × n/(p(t−1)) requires O(n²/(p²t)) computation time.
Lemma
Our CGM algorithm requires O(n²/(pt)) time steps per processor.
45. Algorithm complexity
Theorem
By subdividing the task graph into macro-blocks of size n/(p(t−1)) × n/(p(t−1)) and using the snake-like distribution scheme, our CGM algorithm requires O(n²/(p log n)) execution time with O(p) communication rounds when t = log n.
47. Environment
We measure the performance of our solution on the Matrics platform backed by the University of Picardie Jules Verne:
Intel Xeon(R) CPU E5-2680 v4 @ 2.40 GHz;
28 cores;
128 GB of RAM;
inter-processor communication implemented with the MPI library (OpenMPI version 1.10.4).
48. Samples
We use real biological DNA sequences with |Σ| = 4. The results presented here are derived from executions for different values of the triplet (m, n, p), where m and n are the sizes of the sequences, with values ranging from 10⁵ to 10⁶, and p is the number of processors, with values in the set {1, 2, 4, 8, 16, 32, 48}.
51. Result comments
For m = 10⁶:
our algorithm runs about 11.43 times faster than Brubach and Ghurye's sequential algorithm on 16 processors;
the speedup increases to 20.40 on 32 processors;
and to 30.05 on 48 processors.
From all this, we conclude that our algorithm scales well with the size of the sequences and the number of processors.
53. Wrapping things up
In this work, we parallelized Brubach and Ghurye's algorithm as applied to the one-dimensional sequence alignment problem;
Based on the CGM model, our solution clusters t-blocks into macro-blocks and follows the snake-like distribution pattern to map them onto p processors, achieving an execution time of O(n²/(p log n)) with O(p) rounds of communication;
Experimental results show good agreement with the theoretical forecasts.
54. Future directions
Since each processor needlessly computes large sets of entries in its lookup table, one might be interested in cutting off this waste, or in extending our solution to the two-dimensional sequence alignment problem.