SlideShare a Scribd company logo
1 of 54
Download to read offline
GARY BENSON, YOZEN HERNANDEZ, &
JOSHUA LOVING
BIOINFORMATICS PROGRAM
B O S T O N U N I V E R S I T Y
J L O V I N G @ B U . E D U
A Bit-Parallel,
General Integer-Scoring
Sequence Alignment Algorithm
Introduction: Problem Description
Input:
• Sequences X and Y
• Integer weights M; I; G
match; mismatch; indel or gap that define a
similarity or distance scoring function S
Output:
• Calculate the global alignment score for X and Y
Introduction
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18 -23 -28
G -10 -3 -1 -6 -6 -11 -16 -21
T -15 -8 -6 1 -4 -9 -14 -19
C -20 -13 -6 -4 -2 -2 -7 -12
A -25 -18 -11 -9 -7 -5 0 -5
A -30 -23 -16 -14 -12 -10 -3 2
Match = 2, Mismatch = -3, Indel = -5
Global Alignment – Needleman-Wunsch
Alignment Scoring Matrix
Sequence X
SequenceY
Introduction
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18 -23 -28
G -10 -3 -1 -6 -6 -11 -16 -21
T -15 -8 -6 1 -4 -9 -14 -19
C -20 -13 -6 -4 -2 -2 -7 -12
A -25 -18 -11 -9 -7 -5 0 -5
A -30 -23 -16 -14 -12 -10 -3 2
Match = 2, Mismatch = -3, Indel = -5
Integer Scores
Introduction
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18 -23 -28
G -10 -3 -1 -6 -6 -11 -16 -21
T -15 -8 -6 1 -4 -9 -14 -19
C -20 -13 -6 -4 -2 -2 -7 -12
A -25 -18 -11 -9 -7 -5 0 -5
A -30 -23 -16 -14 -12 -10 -3 2
Match = 2, Mismatch = -3, Indel = -5
Penalty from beginning
Introduction
- A C T G C A A
- 0 0 0 0 0 0 0 0
A 0 2 -3 -3 -3 -3 2 2
G 0 -3 -1 -6 -1 -6 -3 -1
T 0 -3 -6 1 -4 -4 -8 -6
C 0 -3 -1 -4 -2 -2 -7 -11
A 0 2 -3 -9 -7 -5 0 -5
A 0 2 -1 -6 -7 -10 -3 2
Match = 2, Mismatch = -3, Indel = -5
No initial Penalty
Needleman-Wunsch Alignment
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5
G -10
T -15
C -20
A -25
A -30
Match = 2, Mismatch = -3, Indel = -5
Needleman-Wunsch Alignment
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2
G -10
T -15
C -20
A -25
A -30
Match = 2, Mismatch = -3, Indel = -5
Needleman-Wunsch Alignment
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3
G -10
T -15
C -20
A -25
A -30
Match = 2, Mismatch = -3, Indel = -5
Needleman-Wunsch Alignment
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8
G -10
T -15
C -20
A -25
A -30
Match = 2, Mismatch = -3, Indel = -5
Needleman-Wunsch Alignment
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8
G -10
T -15
C -20
A -25
A -30
Match = 2, Mismatch = -3, Indel = -5
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13
G -10
T -15
C -20
A -25
A -30
Needleman-Wunsch Alignment
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8
G -10
T -15
C -20
A -25
A -30
Match = 2, Mismatch = -3, Indel = -5
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13
G -10
T -15
C -20
A -25
A -30
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18
G -10
T -15
C -20
A -25
A -30
Needleman-Wunsch Alignment
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8
G -10
T -15
C -20
A -25
A -30
Match = 2, Mismatch = -3, Indel = -5
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13
G -10
T -15
C -20
A -25
A -30
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18
G -10
T -15
C -20
A -25
A -30
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18 -23
G -10
T -15
C -20
A -25
A -30
Needleman-Wunsch Alignment
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8
G -10
T -15
C -20
A -25
A -30
Match = 2, Mismatch = -3, Indel = -5
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13
G -10
T -15
C -20
A -25
A -30
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18
G -10
T -15
C -20
A -25
A -30
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18 -23
G -10
T -15
C -20
A -25
A -30
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18 -23 -28
G -10
T -15
C -20
A -25
A -30
Needleman-Wunsch Alignment
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8
G -10
T -15
C -20
A -25
A -30
Match = 2, Mismatch = -3, Indel = -5
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13
G -10
T -15
C -20
A -25
A -30
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18
G -10
T -15
C -20
A -25
A -30
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18 -23
G -10
T -15
C -20
A -25
A -30
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18 -23 -28
G -10
T -15
C -20
A -25
A -30
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18 -23 -28
G -10 -3
T -15
C -20
A -25
A -30
Needleman-Wunsch Alignment
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8
G -10
T -15
C -20
A -25
A -30
Match = 2, Mismatch = -3, Indel = -5
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13
G -10
T -15
C -20
A -25
A -30
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18
G -10
T -15
C -20
A -25
A -30
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18 -23
G -10
T -15
C -20
A -25
A -30
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18 -23 -28
G -10
T -15
C -20
A -25
A -30
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18 -23 -28
G -10 -3
T -15
C -20
A -25
A -30
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18 -23 -28
G -10 -3 -1
T -15
C -20
A -25
A -30
Needleman-Wunsch Alignment
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8
G -10
T -15
C -20
A -25
A -30
Match = 2, Mismatch = -3, Indel = -5
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13
G -10
T -15
C -20
A -25
A -30
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18
G -10
T -15
C -20
A -25
A -30
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18 -23
G -10
T -15
C -20
A -25
A -30
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18 -23 -28
G -10
T -15
C -20
A -25
A -30
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18 -23 -28
G -10 -3
T -15
C -20
A -25
A -30
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18 -23 -28
G -10 -3 -1
T -15
C -20
A -25
A -30
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18 -23 -28
G -10 -3 -1 -6
T -15
C -20
A -25
A -30
Needleman-Wunsch Alignment
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8
G -10
T -15
C -20
A -25
A -30
Match = 2, Mismatch = -3, Indel = -5
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13
G -10
T -15
C -20
A -25
A -30
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18
G -10
T -15
C -20
A -25
A -30
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18 -23
G -10
T -15
C -20
A -25
A -30
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18 -23 -28
G -10
T -15
C -20
A -25
A -30
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18 -23 -28
G -10 -3
T -15
C -20
A -25
A -30
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18 -23 -28
G -10 -3 -1
T -15
C -20
A -25
A -30
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18 -23 -28
G -10 -3 -1 -6
T -15
C -20
A -25
A -30
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18 -23 -28
G -10 -3 -1 -6 -6
T -15
C -20
A -25
A -30
Needleman-Wunsch Alignment
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8
G -10
T -15
C -20
A -25
A -30
Match = 2, Mismatch = -3, Indel = -5
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13
G -10
T -15
C -20
A -25
A -30
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18
G -10
T -15
C -20
A -25
A -30
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18 -23
G -10
T -15
C -20
A -25
A -30
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18 -23 -28
G -10
T -15
C -20
A -25
A -30
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18 -23 -28
G -10 -3
T -15
C -20
A -25
A -30
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18 -23 -28
G -10 -3 -1
T -15
C -20
A -25
A -30
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18 -23 -28
G -10 -3 -1 -6
T -15
C -20
A -25
A -30
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18 -23 -28
G -10 -3 -1 -6 -6
T -15
C -20
A -25
A -30
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18 -23 -28
G -10 -3 -1 -6 -6 -11
T -15
C -20
A -25
A -30
Needleman-Wunsch Alignment
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8
G -10
T -15
C -20
A -25
A -30
Match = 2, Mismatch = -3, Indel = -5
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13
G -10
T -15
C -20
A -25
A -30
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18
G -10
T -15
C -20
A -25
A -30
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18 -23
G -10
T -15
C -20
A -25
A -30
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18 -23 -28
G -10
T -15
C -20
A -25
A -30
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18 -23 -28
G -10 -3
T -15
C -20
A -25
A -30
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18 -23 -28
G -10 -3 -1
T -15
C -20
A -25
A -30
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18 -23 -28
G -10 -3 -1 -6
T -15
C -20
A -25
A -30
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18 -23 -28
G -10 -3 -1 -6 -6
T -15
C -20
A -25
A -30
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18 -23 -28
G -10 -3 -1 -6 -6 -11
T -15
C -20
A -25
A -30
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18 -23 -28
G -10 -3 -1 -6 -6 -11 -16
T -15
C -20
A -25
A -30
Needleman-Wunsch Alignment
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8
G -10
T -15
C -20
A -25
A -30
Match = 2, Mismatch = -3, Indel = -5
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13
G -10
T -15
C -20
A -25
A -30
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18
G -10
T -15
C -20
A -25
A -30
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18 -23
G -10
T -15
C -20
A -25
A -30
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18 -23 -28
G -10
T -15
C -20
A -25
A -30
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18 -23 -28
G -10 -3
T -15
C -20
A -25
A -30
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18 -23 -28
G -10 -3 -1
T -15
C -20
A -25
A -30
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18 -23 -28
G -10 -3 -1 -6
T -15
C -20
A -25
A -30
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18 -23 -28
G -10 -3 -1 -6 -6
T -15
C -20
A -25
A -30
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18 -23 -28
G -10 -3 -1 -6 -6 -11
T -15
C -20
A -25
A -30
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18 -23 -28
G -10 -3 -1 -6 -6 -11 -16
T -15
C -20
A -25
A -30
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18 -23 -28
G -10 -3 -1 -6 -6 -11 -16 -21
T -15
C -20
A -25
A -30
Bit-parallel alignment
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5
G -10
T -15
C -20
A -25
A -30
Match = 2, Mismatch = -3, Indel = -5
Integer Scores
Bit-parallel alignment
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5
G -10
T -15
C -20
A -25
A -30
Match = 2, Mismatch = -3, Indel = -5
Integer Scores
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18 -23 -28
G -10
T -15
C -20
A -25
A -30
Bit-parallel alignment
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5
G -10
T -15
C -20
A -25
A -30
Match = 2, Mismatch = -3, Indel = -5
Integer Scores
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18 -23 -28
G -10
T -15
C -20
A -25
A -30
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18 -23 -28
G -10 -3 -1 -6 -6 -11 -16 -21
T -15
C -20
A -25
A -30
Bit-parallel alignment
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5
G -10
T -15
C -20
A -25
A -30
Match = 2, Mismatch = -3, Indel = -5
Integer Scores
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18 -23 -28
G -10
T -15
C -20
A -25
A -30
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18 -23 -28
G -10 -3 -1 -6 -6 -11 -16 -21
T -15
C -20
A -25
A -30
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18 -23 -28
G -10 -3 -1 -6 -6 -11 -16 -21
T -15 -8 -6 1 -4 -9 -14 -19
C -20
A -25
A -30
Bit-parallel alignment
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5
G -10
T -15
C -20
A -25
A -30
Match = 2, Mismatch = -3, Indel = -5
Integer Scores
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18 -23 -28
G -10
T -15
C -20
A -25
A -30
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18 -23 -28
G -10 -3 -1 -6 -6 -11 -16 -21
T -15
C -20
A -25
A -30
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18 -23 -28
G -10 -3 -1 -6 -6 -11 -16 -21
T -15 -8 -6 1 -4 -9 -14 -19
C -20
A -25
A -30
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18 -23 -28
G -10 -3 -1 -6 -6 -11 -16 -21
T -15 -8 -6 1 -4 -9 -14 -19
C -20 -13 -6 -4 -2 -2 -7 -12
A -25
A -30
Bit-parallel alignment
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5
G -10
T -15
C -20
A -25
A -30
Match = 2, Mismatch = -3, Indel = -5
Integer Scores
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18 -23 -28
G -10
T -15
C -20
A -25
A -30
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18 -23 -28
G -10 -3 -1 -6 -6 -11 -16 -21
T -15
C -20
A -25
A -30
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18 -23 -28
G -10 -3 -1 -6 -6 -11 -16 -21
T -15 -8 -6 1 -4 -9 -14 -19
C -20
A -25
A -30
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18 -23 -28
G -10 -3 -1 -6 -6 -11 -16 -21
T -15 -8 -6 1 -4 -9 -14 -19
C -20 -13 -6 -4 -2 -2 -7 -12
A -25
A -30
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18 -23 -28
G -10 -3 -1 -6 -6 -11 -16 -21
T -15 -8 -6 1 -4 -9 -14 -19
C -20 -13 -6 -4 -2 -2 -7 -12
A -25 -18 -11 -9 -7 -5 0 -5
A -30
Bit-parallel alignment
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5
G -10
T -15
C -20
A -25
A -30
Match = 2, Mismatch = -3, Indel = -5
Integer Scores
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18 -23 -28
G -10
T -15
C -20
A -25
A -30
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18 -23 -28
G -10 -3 -1 -6 -6 -11 -16 -21
T -15
C -20
A -25
A -30
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18 -23 -28
G -10 -3 -1 -6 -6 -11 -16 -21
T -15 -8 -6 1 -4 -9 -14 -19
C -20
A -25
A -30
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18 -23 -28
G -10 -3 -1 -6 -6 -11 -16 -21
T -15 -8 -6 1 -4 -9 -14 -19
C -20 -13 -6 -4 -2 -2 -7 -12
A -25
A -30
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18 -23 -28
G -10 -3 -1 -6 -6 -11 -16 -21
T -15 -8 -6 1 -4 -9 -14 -19
C -20 -13 -6 -4 -2 -2 -7 -12
A -25 -18 -11 -9 -7 -5 0 -5
A -30
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18 -23 -28
G -10 -3 -1 -6 -6 -11 -16 -21
T -15 -8 -6 1 -4 -9 -14 -19
C -20 -13 -6 -4 -2 -2 -7 -12
A -25 -18 -11 -9 -7 -5 0 -5
A -30 -23 -16 -14 -12 -10 -3 2
Motivation
 Cheaper
sequencing of
DNA means that
larger datasets
are being
generated
 Sequence
analysis of such
large datasets
can be
accelerated by
faster alignment
algorithms
Wetterstrand KA. DNA Sequencing Costs: Data from the NHGRI Genome Sequencing Program
(GSP) Available at: www.genome.gov/sequencingcosts. Accessed June 10, 2013.
Earlier Bit-parallel Pattern Matching Algorithms
 Longest Common Subsequence (LCS) (Allison & Dix, 1986;
Crochemore et al, 2001; Hyyro, 2004)
 Unit-cost edit-distance (Myers, 99; Hyyro et al, 2005)
 K-differences (agrep; Wu-Manber, 92)
 Regular expression search (Navarro, 04)
 Arbitrary weights edit-distance (Bergeron&Hamel, 02)
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18 -23 -28
G -10 -3 -1 -6 -6 -11 -16 -21
T -15 -8 -6 1 -4 -9 -14 -19
C -20 -13 -6 -4 -2 -2 -7 -12
A -25 -18 -11 -9 -7 -5 0 -5
A -30 -23 -16 -14 -12 -10 -3 2
Algorithm Foundation
Match = 2, Mismatch = -3, Indel = -5
Algorithm Foundation
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18 -23 -28
G -10 -3 -1 -6 -6 -11 -16 -21
T -15 -8 -6 1 -4 -9 -14 -19
C -20 -13 -6 -4 -2 -2 -7 -12
A -25 -18 -11 -9 -7 -5 0 -5
A -30 -23 -16 -14 -12 -10 -3 2
Match = 2, Mismatch = -3, Indel = -5
-3 -8
-1 -6
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18 -23 -28
G -10 -3 -1 -6 -6 -11 -16 -21
T -15 -8 -6 1 -4 -9 -14 -19
C -20 -13 -6 -4 -2 -2 -7 -12
A -25 -18 -11 -9 -7 -5 0 -5
A -30 -23 -16 -14 -12 -10 -3 2
-3 -8
-1 -6
-5
2 2
-5
Algorithm Foundation
Match = 2, Mismatch = -3, Indel = -5
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18 -23 -28
G -10 -3 -1 -6 -6 -11 -16 -21
T -15 -8 -6 1 -4 -9 -14 -19
C -20 -13 -6 -4 -2 -2 -7 -12
A -25 -18 -11 -9 -7 -5 0 -5
A -30 -23 -16 -14 -12 -10 -3 2
-3 -8
-1 -6
∆H
∆V ∆VNEXT
∆HNEXT
-5
2 2
-5
Algorithm Foundation
Match = 2, Mismatch = -3, Indel = -5
∆H
-5
∆V -
-5
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18 -23 -28
G -10 -3 -1 -6 -6 -11 -16 -21
T -15 -8 -6 1 -4 -9 -14 -19
C -20 -13 -6 -4 -2 -2 -7 -12
A -25 -18 -11 -9 -7 -5 0 -5
A -30 -23 -16 -14 -12 -10 -3 2
-3 -8
-1 -6
-5
2 2
-5
∆H
∆V ∆VNEXT
∆HNEXT
Algorithm Foundation
Match = 2, Mismatch = -3, Indel = -5
Output
Input
Function Table
∆VNEXT output values given ∆V and ∆H
input values
What is the range of differences?
What is the range of differences?
Match = 2,
Mismatch = -3,
Indel = -5
What is the range of differences?
Match = 2,
Mismatch = -3,
Indel = -5
Minimum Value = Indel = -5
Maximum Value = Match – Indel = 2 - (- 5) = 7
Generalized Function Table
M = match score
I = mismatch score
G = indel (gap) penalty
Zones in Example Function Table
Bit-parallel Representation
 Bit-vectors: computer words 64 bits long
 A bit-vector for each possible difference, both horizontally
and vertically (∆V and ∆H)
 A set of Match vectors (MatchA, MatchC, MatchG, MatchT
in the DNA case)
 We keep track of match positions because they are a special
case in the function table.
Example ∆H Bit-vector Storage
- A C T G C A A
C -20 -13 -6 -4 -2 -2 -7 -12
∆H values
7 7 2 2 0 -5 -5
∆H Bit-Vectors
7 1 1 0 0 0 0 0
6 0 0 0 0 0 0 0
5 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0
2 0 0 1 1 0 0 0
1 0 0 0 0 0 0 0
0 0 0 0 0 1 0 0
-1 0 0 0 0 0 0 0
-2 0 0 0 0 0 0 0
-3 0 0 0 0 0 0 0
-4 0 0 0 0 0 0 0
-5 0 0 0 0 0 1 1
Example Match Vectors
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18 -23 -28
G -10 -3 -1 -6 -6 -11 -16 -21
T -15 -8 -6 1 -4 -9 -14 -19
C -20 -13 -6 -4 -2 -2 -7 -12
A -25 -18 -11 -9 -7 -5 0 -5
A -30 -23 -16 -14 -12 -10 -3 2
Match Vectors
MatchesA 1 0 0 0 0 1 1
MatchesC 0 1 0 0 1 0 0
MatchesT 0 0 1 0 0 0 0
MatchesG 0 0 0 1 0 0 0
Algorithm
 Start with ∆H values
 Compute ∆V values
 Then compute the new ∆H values
Algorithm: Example
- A C T G C A A
C -20 -13 -6 -4 -2 -2 -7 -12
A -25 -18 -11 -9 -7 -5 0 -5
∆H values
7 7 2 2 0 -5 -5
-5 -5 -5 -5 -3 7 7
∆V values-5
Time Complexity
𝑂 𝑧𝑛
𝑚
𝑤
where
n = |Sequence Y|
m = |Sequence X|
w = length of computer word
𝑧 =
(𝑀 − 2𝐺 + 1)2
− (𝐼 − 2𝐺)2
2
Implementation
 Python script that generates C code based on input
parameters (M; I; G)
 Will eventually have web page for download of code
Experimental Analysis
 Compared BHL with
 Wu-Manber K-differences algorithm
 Unit cost edit distance bit-parallel algorithm
 Longest Common Subsequence bit-parallel algorithm
 Needleman-Wunsch dynamic programming algorithm
 Computed 25 million alignments with each program
 Each DNA sequence was 63 bases long
 All programs compiled using GCC, optimization level
O3
 Computation done on a typical desktop computer
Results: Comparison to NW algorithm
Results: comparison to bit-parallel algorithms
Current and Future Work
 Implementation for sequences longer than one word
 Single Instruction Multiple Data (SIMD) implementation
 BLOSUM and PAM type substitution matrix support
 General Purpose Graphics Processing Unit (GPGPU)
implementation
 New bit-parallel representations for greater speed and
compactness of data
Acknowledgements
My advisor, Dr. Gary Benson
Lab members
Yevgeniy Gelfand Yozen Hernandez
Funded by the National Science Foundation (NSF)
Questions

More Related Content

Recently uploaded

Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 

Recently uploaded (20)

Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 

Featured

2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by HubspotMarius Sescu
 
Everything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTEverything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTExpeed Software
 
Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsPixeldarts
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthThinkNow
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfmarketingartwork
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024Neil Kimberley
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)contently
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024Albert Qian
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsKurio // The Social Media Age(ncy)
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Search Engine Journal
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summarySpeakerHub
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next Tessa Mero
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentLily Ray
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best PracticesVit Horky
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project managementMindGenius
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...RachelPearson36
 

Featured (20)

2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot
 
Everything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTEverything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPT
 
Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage Engineerings
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental Health
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
 
Skeleton Culture Code
Skeleton Culture CodeSkeleton Culture Code
Skeleton Culture Code
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
 

A bit parallel, general integer-scoring

  • 1. GARY BENSON, YOZEN HERNANDEZ, & JOSHUA LOVING BIOINFORMATICS PROGRAM B O S T O N U N I V E R S I T Y J L O V I N G @ B U . E D U A Bit-Parallel, General Integer-Scoring Sequence Alignment Algorithm
  • 2. Introduction: Problem Description Input: • Sequences X and Y • Integer weights M; I; G match; mismatch; indel or gap that define a similarity or distance scoring function S Output: • Calculate the global alignment score for X and Y
  • 3. Introduction - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 -6 -11 -16 -21 T -15 -8 -6 1 -4 -9 -14 -19 C -20 -13 -6 -4 -2 -2 -7 -12 A -25 -18 -11 -9 -7 -5 0 -5 A -30 -23 -16 -14 -12 -10 -3 2 Match = 2, Mismatch = -3, Indel = -5 Global Alignment – Needleman-Wunsch Alignment Scoring Matrix Sequence X SequenceY
  • 4. Introduction - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 -6 -11 -16 -21 T -15 -8 -6 1 -4 -9 -14 -19 C -20 -13 -6 -4 -2 -2 -7 -12 A -25 -18 -11 -9 -7 -5 0 -5 A -30 -23 -16 -14 -12 -10 -3 2 Match = 2, Mismatch = -3, Indel = -5 Integer Scores
  • 5. Introduction - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 -6 -11 -16 -21 T -15 -8 -6 1 -4 -9 -14 -19 C -20 -13 -6 -4 -2 -2 -7 -12 A -25 -18 -11 -9 -7 -5 0 -5 A -30 -23 -16 -14 -12 -10 -3 2 Match = 2, Mismatch = -3, Indel = -5 Penalty from beginning
  • 6. Introduction - A C T G C A A - 0 0 0 0 0 0 0 0 A 0 2 -3 -3 -3 -3 2 2 G 0 -3 -1 -6 -1 -6 -3 -1 T 0 -3 -6 1 -4 -4 -8 -6 C 0 -3 -1 -4 -2 -2 -7 -11 A 0 2 -3 -9 -7 -5 0 -5 A 0 2 -1 -6 -7 -10 -3 2 Match = 2, Mismatch = -3, Indel = -5 No initial Penalty
  • 7. Needleman-Wunsch Alignment - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 G -10 T -15 C -20 A -25 A -30 Match = 2, Mismatch = -3, Indel = -5
  • 8. Needleman-Wunsch Alignment - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 G -10 T -15 C -20 A -25 A -30 Match = 2, Mismatch = -3, Indel = -5
  • 9. Needleman-Wunsch Alignment - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 G -10 T -15 C -20 A -25 A -30 Match = 2, Mismatch = -3, Indel = -5
  • 10. Needleman-Wunsch Alignment - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 G -10 T -15 C -20 A -25 A -30 Match = 2, Mismatch = -3, Indel = -5
  • 11. Needleman-Wunsch Alignment - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 G -10 T -15 C -20 A -25 A -30 Match = 2, Mismatch = -3, Indel = -5 - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 G -10 T -15 C -20 A -25 A -30
  • 12. Needleman-Wunsch Alignment - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 G -10 T -15 C -20 A -25 A -30 Match = 2, Mismatch = -3, Indel = -5 - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 G -10 T -15 C -20 A -25 A -30 - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 G -10 T -15 C -20 A -25 A -30
  • 13. Needleman-Wunsch Alignment - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 G -10 T -15 C -20 A -25 A -30 Match = 2, Mismatch = -3, Indel = -5 - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 G -10 T -15 C -20 A -25 A -30 - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 G -10 T -15 C -20 A -25 A -30 - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 G -10 T -15 C -20 A -25 A -30
  • 14. Needleman-Wunsch Alignment - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 G -10 T -15 C -20 A -25 A -30 Match = 2, Mismatch = -3, Indel = -5 - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 G -10 T -15 C -20 A -25 A -30 - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 G -10 T -15 C -20 A -25 A -30 - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 G -10 T -15 C -20 A -25 A -30 - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 T -15 C -20 A -25 A -30
  • 15. Needleman-Wunsch Alignment - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 G -10 T -15 C -20 A -25 A -30 Match = 2, Mismatch = -3, Indel = -5 - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 G -10 T -15 C -20 A -25 A -30 - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 G -10 T -15 C -20 A -25 A -30 - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 G -10 T -15 C -20 A -25 A -30 - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 T -15 C -20 A -25 A -30 - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 T -15 C -20 A -25 A -30
  • 16. Needleman-Wunsch Alignment - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 G -10 T -15 C -20 A -25 A -30 Match = 2, Mismatch = -3, Indel = -5 - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 G -10 T -15 C -20 A -25 A -30 - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 G -10 T -15 C -20 A -25 A -30 - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 G -10 T -15 C -20 A -25 A -30 - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 T -15 C -20 A -25 A -30 - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 T -15 C -20 A -25 A -30 - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 T -15 C -20 A -25 A -30
  • 17. Needleman-Wunsch Alignment - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 G -10 T -15 C -20 A -25 A -30 Match = 2, Mismatch = -3, Indel = -5 - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 G -10 T -15 C -20 A -25 A -30 - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 G -10 T -15 C -20 A -25 A -30 - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 G -10 T -15 C -20 A -25 A -30 - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 T -15 C -20 A -25 A -30 - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 T -15 C -20 A -25 A -30 - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 T -15 C -20 A -25 A -30 - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 T -15 C -20 A -25 A -30
  • 18. Needleman-Wunsch Alignment - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 G -10 T -15 C -20 A -25 A -30 Match = 2, Mismatch = -3, Indel = -5 - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 G -10 T -15 C -20 A -25 A -30 - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 G -10 T -15 C -20 A -25 A -30 - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 G -10 T -15 C -20 A -25 A -30 - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 T -15 C -20 A -25 A -30 - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 T -15 C -20 A -25 A -30 - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 T -15 C -20 A -25 A -30 - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 T -15 C -20 A -25 A -30 - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 -6 T -15 C -20 A -25 A -30
  • 19. Needleman-Wunsch Alignment - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 G -10 T -15 C -20 A -25 A -30 Match = 2, Mismatch = -3, Indel = -5 - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 G -10 T -15 C -20 A -25 A -30 - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 G -10 T -15 C -20 A -25 A -30 - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 G -10 T -15 C -20 A -25 A -30 - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 T -15 C -20 A -25 A -30 - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 T -15 C -20 A -25 A -30 - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 T -15 C -20 A -25 A -30 - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 T -15 C -20 A -25 A -30 - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 -6 T -15 C -20 A -25 A -30 - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 -6 -11 T -15 C -20 A -25 A -30
  • 20. Needleman-Wunsch Alignment - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 G -10 T -15 C -20 A -25 A -30 Match = 2, Mismatch = -3, Indel = -5 - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 G -10 T -15 C -20 A -25 A -30 - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 G -10 T -15 C -20 A -25 A -30 - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 G -10 T -15 C -20 A -25 A -30 - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 T -15 C -20 A -25 A -30 - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 T -15 C -20 A -25 A -30 - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 T -15 C -20 A -25 A -30 - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 T -15 C -20 A -25 A -30 - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 -6 T -15 C -20 A -25 A -30 - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 -6 -11 T -15 C -20 A -25 A -30 - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 -6 -11 -16 T -15 C -20 A -25 A -30
  • 21. Needleman-Wunsch Alignment - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 G -10 T -15 C -20 A -25 A -30 Match = 2, Mismatch = -3, Indel = -5 - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 G -10 T -15 C -20 A -25 A -30 - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 G -10 T -15 C -20 A -25 A -30 - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 G -10 T -15 C -20 A -25 A -30 - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 T -15 C -20 A -25 A -30 - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 T -15 C -20 A -25 A -30 - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 T -15 C -20 A -25 A -30 - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 T -15 C -20 A -25 A -30 - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 -6 T -15 C -20 A -25 A -30 - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 -6 -11 T -15 C -20 A -25 A -30 - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 -6 -11 -16 T -15 C -20 A -25 A -30 - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 -6 -11 -16 -21 T -15 C -20 A -25 A -30
  • 22. Bit-parallel alignment - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 G -10 T -15 C -20 A -25 A -30 Match = 2, Mismatch = -3, Indel = -5 Integer Scores
  • 23. Bit-parallel alignment - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 G -10 T -15 C -20 A -25 A -30 Match = 2, Mismatch = -3, Indel = -5 Integer Scores - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 T -15 C -20 A -25 A -30
  • 24. Bit-parallel alignment - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 G -10 T -15 C -20 A -25 A -30 Match = 2, Mismatch = -3, Indel = -5 Integer Scores - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 T -15 C -20 A -25 A -30 - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 -6 -11 -16 -21 T -15 C -20 A -25 A -30
  • 25. Bit-parallel alignment - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 G -10 T -15 C -20 A -25 A -30 Match = 2, Mismatch = -3, Indel = -5 Integer Scores - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 T -15 C -20 A -25 A -30 - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 -6 -11 -16 -21 T -15 C -20 A -25 A -30 - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 -6 -11 -16 -21 T -15 -8 -6 1 -4 -9 -14 -19 C -20 A -25 A -30
  • 26. Bit-parallel alignment - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 G -10 T -15 C -20 A -25 A -30 Match = 2, Mismatch = -3, Indel = -5 Integer Scores - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 T -15 C -20 A -25 A -30 - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 -6 -11 -16 -21 T -15 C -20 A -25 A -30 - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 -6 -11 -16 -21 T -15 -8 -6 1 -4 -9 -14 -19 C -20 A -25 A -30 - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 -6 -11 -16 -21 T -15 -8 -6 1 -4 -9 -14 -19 C -20 -13 -6 -4 -2 -2 -7 -12 A -25 A -30
  • 27. Bit-parallel alignment - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 G -10 T -15 C -20 A -25 A -30 Match = 2, Mismatch = -3, Indel = -5 Integer Scores - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 T -15 C -20 A -25 A -30 - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 -6 -11 -16 -21 T -15 C -20 A -25 A -30 - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 -6 -11 -16 -21 T -15 -8 -6 1 -4 -9 -14 -19 C -20 A -25 A -30 - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 -6 -11 -16 -21 T -15 -8 -6 1 -4 -9 -14 -19 C -20 -13 -6 -4 -2 -2 -7 -12 A -25 A -30 - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 -6 -11 -16 -21 T -15 -8 -6 1 -4 -9 -14 -19 C -20 -13 -6 -4 -2 -2 -7 -12 A -25 -18 -11 -9 -7 -5 0 -5 A -30
  • 28. Bit-parallel alignment - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 G -10 T -15 C -20 A -25 A -30 Match = 2, Mismatch = -3, Indel = -5 Integer Scores - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 T -15 C -20 A -25 A -30 - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 -6 -11 -16 -21 T -15 C -20 A -25 A -30 - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 -6 -11 -16 -21 T -15 -8 -6 1 -4 -9 -14 -19 C -20 A -25 A -30 - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 -6 -11 -16 -21 T -15 -8 -6 1 -4 -9 -14 -19 C -20 -13 -6 -4 -2 -2 -7 -12 A -25 A -30 - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 -6 -11 -16 -21 T -15 -8 -6 1 -4 -9 -14 -19 C -20 -13 -6 -4 -2 -2 -7 -12 A -25 -18 -11 -9 -7 -5 0 -5 A -30 - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 -6 -11 -16 -21 T -15 -8 -6 1 -4 -9 -14 -19 C -20 -13 -6 -4 -2 -2 -7 -12 A -25 -18 -11 -9 -7 -5 0 -5 A -30 -23 -16 -14 -12 -10 -3 2
  • 29. Motivation  Cheaper sequencing of DNA means that larger datasets are being generated  Sequence analysis of such large datasets can be accelerated by faster alignment algorithms Wetterstrand KA. DNA Sequencing Costs: Data from the NHGRI Genome Sequencing Program (GSP) Available at: www.genome.gov/sequencingcosts. Accessed June 10, 2013.
  • 30. Earlier Bit-parallel Pattern Matching Algorithms  Longest Common Subsequence (LCS) (Allison & Dix, 1986; Crochemore et al, 2001; Hyyro, 2004)  Unit-cost edit-distance (Myers, 99; Hyyro et al, 2005)  K-differences (agrep; Wu-Manber, 92)  Regular expression search (Navarro, 04)  Arbitrary weights edit-distance (Bergeron&Hamel, 02)
  • 31. - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 -6 -11 -16 -21 T -15 -8 -6 1 -4 -9 -14 -19 C -20 -13 -6 -4 -2 -2 -7 -12 A -25 -18 -11 -9 -7 -5 0 -5 A -30 -23 -16 -14 -12 -10 -3 2 Algorithm Foundation Match = 2, Mismatch = -3, Indel = -5
  • 32. Algorithm Foundation - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 -6 -11 -16 -21 T -15 -8 -6 1 -4 -9 -14 -19 C -20 -13 -6 -4 -2 -2 -7 -12 A -25 -18 -11 -9 -7 -5 0 -5 A -30 -23 -16 -14 -12 -10 -3 2 Match = 2, Mismatch = -3, Indel = -5 -3 -8 -1 -6
  • 33. - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 -6 -11 -16 -21 T -15 -8 -6 1 -4 -9 -14 -19 C -20 -13 -6 -4 -2 -2 -7 -12 A -25 -18 -11 -9 -7 -5 0 -5 A -30 -23 -16 -14 -12 -10 -3 2 -3 -8 -1 -6 -5 2 2 -5 Algorithm Foundation Match = 2, Mismatch = -3, Indel = -5
  • 34. - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 -6 -11 -16 -21 T -15 -8 -6 1 -4 -9 -14 -19 C -20 -13 -6 -4 -2 -2 -7 -12 A -25 -18 -11 -9 -7 -5 0 -5 A -30 -23 -16 -14 -12 -10 -3 2 -3 -8 -1 -6 ∆H ∆V ∆VNEXT ∆HNEXT -5 2 2 -5 Algorithm Foundation Match = 2, Mismatch = -3, Indel = -5
  • 35. ∆H -5 ∆V - -5 - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 -6 -11 -16 -21 T -15 -8 -6 1 -4 -9 -14 -19 C -20 -13 -6 -4 -2 -2 -7 -12 A -25 -18 -11 -9 -7 -5 0 -5 A -30 -23 -16 -14 -12 -10 -3 2 -3 -8 -1 -6 -5 2 2 -5 ∆H ∆V ∆VNEXT ∆HNEXT Algorithm Foundation Match = 2, Mismatch = -3, Indel = -5 Output Input
  • 36. Function Table ∆VNEXT output values given ∆V and ∆H input values
  • 37. What is the range of differences?
  • 38. What is the range of differences? Match = 2, Mismatch = -3, Indel = -5
  • 39. What is the range of differences? Match = 2, Mismatch = -3, Indel = -5 Minimum Value = Indel = -5 Maximum Value = Match – Indel = 2 - (- 5) = 7
  • 40. Generalized Function Table M = match score I = mismatch score G = indel (gap) penalty
  • 41. Zones in Example Function Table
  • 42. Bit-parallel Representation  Bit-vectors: computer words 64 bits long  A bit-vector for each possible difference, both horizontally and vertically (∆V and ∆H)  A set of Match vectors (MatchA, MatchC, MatchG, MatchT in the DNA case)  We keep track of match positions because they are a special case in the function table.
  • 43. Example ∆H Bit-vector Storage - A C T G C A A C -20 -13 -6 -4 -2 -2 -7 -12 ∆H values 7 7 2 2 0 -5 -5 ∆H Bit-Vectors 7 1 1 0 0 0 0 0 6 0 0 0 0 0 0 0 5 0 0 0 0 0 0 0 4 0 0 0 0 0 0 0 3 0 0 0 0 0 0 0 2 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 -1 0 0 0 0 0 0 0 -2 0 0 0 0 0 0 0 -3 0 0 0 0 0 0 0 -4 0 0 0 0 0 0 0 -5 0 0 0 0 0 1 1
  • 44. Example Match Vectors - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 -6 -11 -16 -21 T -15 -8 -6 1 -4 -9 -14 -19 C -20 -13 -6 -4 -2 -2 -7 -12 A -25 -18 -11 -9 -7 -5 0 -5 A -30 -23 -16 -14 -12 -10 -3 2 Match Vectors MatchesA 1 0 0 0 0 1 1 MatchesC 0 1 0 0 1 0 0 MatchesT 0 0 1 0 0 0 0 MatchesG 0 0 0 1 0 0 0
  • 45. Algorithm  Start with ∆H values  Compute ∆V values  Then compute the new ∆H values
  • 46. Algorithm: Example - A C T G C A A C -20 -13 -6 -4 -2 -2 -7 -12 A -25 -18 -11 -9 -7 -5 0 -5 ∆H values 7 7 2 2 0 -5 -5 -5 -5 -5 -5 -3 7 7 ∆V values-5
  • 47. Time Complexity 𝑂 𝑧𝑛 𝑚 𝑤 where n = |Sequence Y| m = |Sequence X| w = length of computer word 𝑧 = (𝑀 − 2𝐺 + 1)2 − (𝐼 − 2𝐺)2 2
  • 48. Implementation  Python script that generates C code based on input parameters (M; I; G)  Will eventually have web page for download of code
  • 49. Experimental Analysis  Compared BHL with  Wu-Manber K-differences algorithm  Unit cost edit distance bit-parallel algorithm  Longest Common Subsequence bit-parallel algorithm  Needleman-Wunsch dynamic programming algorithm  Computed 25 million alignments with each program  Each DNA sequence was 63 bases long  All programs compiled using GCC, optimization level O3  Computation done on a typical desktop computer
  • 50. Results: Comparison to NW algorithm
  • 51. Results: comparison to bit-parallel algorithms
  • 52. Current and Future Work  Implementation for sequences longer than one word  Single Instruction Multiple Data (SIMD) implementation  BLOSUM and PAM type substitution matrix support  General Purpose Graphics Processing Unit (GPGPU) implementation  New bit-parallel representations for greater speed and compactness of data
  • 53. Acknowledgements My advisor, Dr. Gary Benson Lab members Yevgeniy Gelfand Yozen Hernandez Funded by the National Science Foundation (NSF)

Editor's Notes

  1. In computational biology, sequence alignment forms an important part of biological sequence analysis.To formally state the problem we are trying to solve, given two input sequences we want to find their global alignment score, using integer parameters to weight the alignment. We aren’t computing the alignment, which takes extra steps, we are solely interested in computing the alignment score.
  2. The Needleman Wunsch algorithm is the standard method of global alignment.Here is the alignment scoring matrix for an example alignment of two sequences with a particular set of alignment scoring weights.
  3. We restrict the scores that we are using to be only integer valued.
  4. In this alignment example, we want to align the strings from their beginnings, so we impose a gap penalty for starting from a place other than the beginning in either string.
  5. However, if this constraint is not present, we can allow alignment to start at any point by initializing the first row and column to be all zero.Our method allows either case, but we will focus on the case using an initial penalty.
  6. As a brief overview, the NW algorithm proceeds by initializing the scoring matrix’s first row and column.
  7. To fill in the scoring matrix, we iterate over it, with each cell’s score being determined by the three cells to the left, diagonally, and above.
  8. To fill in the entire scoring matrix, each cell must be visited once.
  9. Because cells depend on the cells that preceded them, they must be done in order.
  10. Bit parallel alignment allows us to represent multiple cells by bits in a computer word.Rather than computing each cell individually, we compute values for entire rows at once.The benefit to bit-parallel algorithms lie in their speed and efficiency.
  11. In this figure, we have cost, in dollars, shown logarithmically on the y-axis against time on the x-axis. The white line represents how Moore’s Law shrinks the cost of computation over time. In green, we have the actual cost of sequencing a human genome, so it is clear that the cost of sequencing is dropping much faster than the cost of computing.This makes implementing faster sequence alignment algorithms important, which is what we are doing.
  12. Say that Hyyroacheieved 4 bit operations per word for LCS, and 15 bit operations per word for Unit-cost edit distance.Arbitrary weights edit-distance – does it solve the same problem? They gave no implementation and were only able to guess at the time complexity. We have an implementation that uses a different methodology.
  13. As we recalled, each cell in the NW alignment matrix is derived from the 3 adjacent cells to its left and above.
  14. As an example, we will consider a small block of the scoring matrix.
  15. We know that the scores in the scoring matrix have unbounded values.However, the differences between adjacent cells are bounded. In our bit-parallel algorithm and others these differences are used as the problem representation. This is similar to how the 4 russians’ technique uses differences between cells. (possibly refer to 4 russian’s technique if questions are asked)
  16. We will call these differences Delta H and Delta V.
  17. Of course, it is obvious that DeltaVnext and DeltaHnext can be derived from Delta H and Delta V, but what was not obvious was the regular pattern that emerges if you look at all possible inputs and outputs.
  18. In fact, the output values produced by input values of Delta H and Delta V follow a very regular pattern.This is shown here in this function table for Delta Vnext output values.
  19. As an example, we can look at the function table shown before. The input Delta V and Delta H values range from -5 to positive 7.
  20. These values derive from the input parameters of 2, -3, and -5
  21. The minimum value is equal to the indel penalty, and the maximum value is equal to the match score minus the indel penalty. 2 minus (negative 5)
  22. Similar relationships determine important boundaries between these zones in this function table.The algorithm needs to compute the values in these zones separately Thus, using a set of scoring parameters we can generate the corresponding function table
  23. Here are the same zones shown in the example function table we were looking at.
  24. What are bit-vectors? They are computer words, in current machines, generally 64 bits long. We use a bit-vector to represent each possible difference horizontally and vertically. We end up with two sets, one that represents Delta H, and another set representing the same integer values of difference for Delta V.We also need a set of Match vectors. For each character in Sequence Y, we wish to know whether there are matches in sequence X. In the case of DNA, this means that we must store 4 vectors for matches with A, C, G or T.These match vectors are necessary because matches represent a special case in the function table.
  25. Here we have a single row in the scoring matrix. The bit-vectors that represent the horizontal changes are shown directly below the values they represent.
  26. Similarly, this is an example set of Match vectors used to store the matches corresponding to each base in Sequence X
  27. Point out the squared values, note that these tend to be small, and mention that we will be showing an actual runtime demonstration of how the parameters affect efficiency.
  28. As the scoring parameters change, the code changes as well. Since it would be very difficult to write a program for each possible input parameter set, I wrote a script that generates the C code for each scoresWe will host it on a website.
  29. For all experiments, we used human DNA and did a total of 25 million alignments. All sequences were 63 bases long, to allow the bit-parallel representations to fit in single computer words. (if they ask: We are working on the multiple word implementation, but were unable to finish in time for this paper.)Single core of 3 GHz computer (intel core 2 duo).
  30. We implemented our algorithm and compared it to several other pattern matching algorithms.This figure compares our method with Needleman Wunsch. Say what the Y – axis is: minutes to compute 25 million alignmentsSay what the X-axis is: number of bit operations that each of our versions of the algorithm uses.Say what the labels on the BHL algorithms mean.Note that as the parameters increase, the time increases.Note that NW takes over 7 minutes, our 2-3-5 version takes under 2
  31. Here, we compare the unit-cost edit distance bit-parallel algorithm, the LCS bit-parallel algorithm, Wu-Manbers K-difference algorithm, and the version of our algorithm equivalent to unit-cost edit-distance.While our algorithm has slightly more operations, 23, than the unit-cost edit distance algorithm with 15, our algorithm is not optimized for any particular parameter set. Even though our algorithm is general purpose, we come quite close to the best known solution.
  32. I’d like to thank my advisor, Dr. Gary Benson, and the other members of my lab, YevgeniyGelfand and Yozen Hernandez, as well as the NSF for funding this work.