MSA & Rooted/Unrooted Trees

• "Phylogenetics" is the study or estimation of the evolutionary history that underlies biological diversity.
• The results of a phylogenetic analysis are usually presented as a collection of nodes and branches, that is, a tree.
• In such a tree, taxa that are closely related in an evolutionary sense appear close to each other, while distantly related taxa sit on distant branches.
• Phylogenetic trees are also important for multiple sequence alignment: progressive methods use a tree to decide the order in which sequences are merged.

• Trees may be rooted or unrooted.
• A rooted tree shows the most basal ancestor of the taxa in question.
• An unrooted tree does not imply a known ancestral root.
• There are competing techniques for rooting a tree; one of the most common uses an "outgroup".
• An outgroup is a species known to have separated from the other species under study before they separated from one another.

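Rooting with an outgroup can be sketched in a few lines. This is a minimal illustration, not a phylogenetics library: it assumes the unrooted tree is stored as undirected adjacency lists and that the outgroup is a single leaf, then places the root on the outgroup's branch and prints the result in Newick format. The taxa A, B, C, O, the internal nodes x, y and all function names are hypothetical.

```python
from typing import Dict, List

# An unrooted tree stored as undirected adjacency lists; leaves have one neighbour.
Tree = Dict[str, List[str]]


def add_edge(tree: Tree, a: str, b: str) -> None:
    """Insert the undirected branch a-b."""
    tree.setdefault(a, []).append(b)
    tree.setdefault(b, []).append(a)


def subtree_newick(tree: Tree, node: str, parent: str) -> str:
    """Newick text for `node`, with every branch directed away from `parent`."""
    children = [n for n in tree[node] if n != parent]
    if not children:  # a leaf: its only neighbour is the parent
        return node
    return "(" + ",".join(subtree_newick(tree, c, node) for c in children) + ")"


def root_with_outgroup(tree: Tree, outgroup: str) -> str:
    """Place the root on the branch that joins the outgroup to the ingroup."""
    if len(tree.get(outgroup, [])) != 1:
        raise ValueError("the outgroup must be a leaf of the tree")
    (neighbour,) = tree[outgroup]
    ingroup = subtree_newick(tree, neighbour, outgroup)
    return f"({outgroup},{ingroup});"


# Hypothetical unrooted tree: A and B join at x; C and the outgroup O join at y.
tree: Tree = {}
for a, b in [("A", "x"), ("B", "x"), ("x", "y"), ("C", "y"), ("O", "y")]:
    add_edge(tree, a, b)

print(root_with_outgroup(tree, "O"))  # (O,((A,B),C));
```

The same unrooted tree gives a different rooted tree for a different outgroup: `root_with_outgroup(tree, "A")` returns `(A,(B,(C,O)));`.
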
• Multiple sequence alignment (MSA) can be viewed as an extension of pairwise sequence alignment, but the computational cost grows exponentially with the number of sequences.
• MSA applies to both nucleotide and amino acid sequences.
• It has been one of the most essential tools in molecular biology since 1987.
• MSA can reveal biological facts about proteins, e.g. by supporting the analysis of secondary/tertiary structure.
• MSA supports phylogenetic analysis of the sequences, i.e. the construction of evolutionary trees.

• Exhaustive search: extension of dynamic programming (DP) to multiple dimensions.
• Progressive alignment: compute a tree of the sequences by hierarchical clustering, then greedily merge the closest sequences first. E.g. ClustalW.
• Block-based global alignment: find highly conserved regions, then grow the alignment around them (the same seed-and-extend idea that BLAST uses for pairwise local search).
• Iterative search: e.g. based on genetic-algorithm search.
• Local alignments:
  • Profile analysis
  • Block analysis
  • Pattern searching and/or statistical methods

VTISCTGSSSNIGAG-NHVKWYQQLPG 
VTISCTGTSSNIGS--ITVNWYQQLPG 
LRLSCSSSGFIFSS--YAMYWVRQAPG 
LSLTCTVSGTSFDD--YYSTWVRQPPG 
PEVTCVVVDVSHEDPQVKFNWYVDG-- 
ATLVCLISDFYPGA--VTVAWKADS-- 
AALGCLVKDYFPEP--VTVSWNSG--- 
VSLTCLVKGFYPSD--IAVEWWSNG--
• An alignment of 2 sequences is represented as a 2-row matrix.
• Similarly, an alignment of 3 sequences is represented as a 3-row matrix:
A T _ G C G _
A _ C G T _ A
A T C A C _ A
• Score: the more conserved columns, the better the alignment.

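The slide does not fix a scoring function. One common choice, used here as an assumption, is the sum-of-pairs (SP) score: every pair of rows is scored in every column, with match = +1 and mismatch/gap = -1 as in the worked example later in these slides, and a gap aligned with a gap scoring 0 (our assumption). The sketch also lists fully conserved columns. Gaps are written as "-" instead of "_", and the function names are hypothetical.

```python
from itertools import combinations
from typing import List

MATCH, MISMATCH_OR_GAP = 1, -1  # scheme borrowed from the worked example later on
GAP = "-"


def pair_score(a: str, b: str) -> int:
    if a == GAP and b == GAP:
        return 0  # assumption: gap/gap pairs contribute nothing
    return MATCH if a == b else MISMATCH_OR_GAP


def check_rows(rows: List[str]) -> None:
    if len({len(r) for r in rows}) != 1:
        raise ValueError("all rows of an alignment must have the same length")


def sum_of_pairs(rows: List[str]) -> int:
    """Add pair_score over every column and every pair of rows."""
    check_rows(rows)
    return sum(pair_score(a, b) for column in zip(*rows) for a, b in combinations(column, 2))


def conserved_columns(rows: List[str]) -> List[int]:
    """1-based positions of columns where every row has the same residue (no gaps)."""
    check_rows(rows)
    return [i for i, col in enumerate(zip(*rows), start=1)
            if GAP not in col and len(set(col)) == 1]


three_rows = ["AT-GCG-", "A-CGT-A", "ATCAC-A"]  # the 3-row matrix above
print(sum_of_pairs(three_rows))       # -4
print(conserved_columns(three_rows))  # [1]

protein_rows = [  # the 8-row protein alignment shown earlier
    "VTISCTGSSSNIGAG-NHVKWYQQLPG", "VTISCTGTSSNIGS--ITVNWYQQLPG",
    "LRLSCSSSGFIFSS--YAMYWVRQAPG", "LSLTCTVSGTSFDD--YYSTWVRQPPG",
    "PEVTCVVVDVSHEDPQVKFNWYVDG--", "ATLVCLISDFYPGA--VTVAWKADS--",
    "AALGCLVKDYFPEP--VTVSWNSG---", "VSLTCLVKGFYPSD--IAVEWWSNG--",
]
print(conserved_columns(protein_rows))  # [5, 21]
```

In the protein example only the cysteine (C) column and the tryptophan (W) column are conserved across all eight rows.
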
• Align 3 sequences: ATGC, AATC, ATGC

A - T G C    x coordinate: 0 1 1 2 3 4
A A T - C    y coordinate: 0 1 2 3 3 4
- A T G C    z coordinate: 0 0 1 2 3 4

• Resulting path in (x,y,z) space: (0,0,0), (1,1,0), (1,2,1), (2,3,2), (3,3,3), (4,4,4)

• In 2-D, each unit square contributes 3 edges: cell C(i,j) is entered from C(i-1,j-1), C(i-1,j) and C(i,j-1).
• In 3-D, each unit cube contributes 7 edges: cell C(i,j,k) is entered from C(i-1,j-1,k-1), C(i-1,j-1,k), C(i-1,j,k-1), C(i,j-1,k-1), C(i-1,j,k), C(i,j-1,k) and C(i,j,k-1).
• Enumerate all possibilities and choose the best one.

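A compact sketch of the exhaustive 3-D dynamic program. Assumptions: sum-of-pairs column scores with match = +1, mismatch/gap = -1 and gap/gap = 0, and hypothetical function names. Each cell C(i,j,k) takes the best of its 7 predecessors plus the score of the column that the connecting edge adds; a traceback then recovers one optimal alignment.

```python
from itertools import combinations, product
from typing import Dict, List, Tuple

GAP = "-"
MOVES = [m for m in product((0, 1), repeat=3) if any(m)]  # the 7 edges into each cell


def column_score(column: Tuple[str, ...]) -> int:
    """Sum-of-pairs score of one column: +1 match, -1 mismatch or gap, 0 gap/gap."""
    total = 0
    for a, b in combinations(column, 2):
        if a == GAP and b == GAP:
            continue
        total += 1 if a == b else -1
    return total


def align3(x: str, y: str, z: str) -> Tuple[int, List[str]]:
    seqs = (x, y, z)
    dims = (len(x) + 1, len(y) + 1, len(z) + 1)
    score: Dict[Tuple[int, ...], int] = {(0, 0, 0): 0}
    back: Dict[Tuple[int, ...], Tuple[int, ...]] = {}
    # Lexicographic order guarantees every predecessor is filled in before its cell.
    for cell in product(*(range(d) for d in dims)):
        if cell == (0, 0, 0):
            continue
        best = None
        for move in MOVES:
            prev = tuple(c - m for c, m in zip(cell, move))
            if min(prev) < 0:
                continue
            # residue from sequence s if its coordinate advanced, otherwise a gap
            column = tuple(seqs[s][prev[s]] if move[s] else GAP for s in range(3))
            candidate = score[prev] + column_score(column)
            if best is None or candidate > best:
                best, back[cell] = candidate, move
        score[cell] = best
    # Trace back from the far corner to recover one optimal alignment.
    end = tuple(d - 1 for d in dims)
    rows = ["", "", ""]
    cell = end
    while cell != (0, 0, 0):
        move = back[cell]
        prev = tuple(c - m for c, m in zip(cell, move))
        for s in range(3):
            rows[s] = (seqs[s][prev[s]] if move[s] else GAP) + rows[s]
        cell = prev
    return score[end], rows


best_score, rows = align3("ATGC", "AATC", "ATGC")
print(best_score)  # 6
for row in rows:
    print(row)
```

Under this assumed scheme the path drawn for ATGC, AATC, ATGC scores 3, while the optimum is 6, so that path illustrates the lattice rather than the best alignment. The triple loop visits every cell once and tries 7 edges per cell, matching the $7n^3$ cost stated next.
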
• For three sequences of length n, the run time is proportional to the number of edges in the 3-D grid, i.e. about $7n^3$.
• For a k-way alignment, build a k-dimensional Manhattan graph with $n^k$ nodes.
• Most nodes have $2^k - 1$ incoming edges.
• Runtime: $O(2^k n^k)$.
• Consider 2 protein sequences, each 100 amino acids long.
• If it takes $100^2 = 10^4$ seconds to align these two sequences exhaustively, it will take $100^3 = 10^6$ seconds to align 3 sequences, $10^8$ seconds to align 4 sequences, and so on.
• Aligning 20 sequences would take about $100^{20} = 10^{40}$ seconds. One year is about $3 \times 10^7$ seconds, and the age of the visible universe is about $4 \times 10^{17}$ seconds.

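The growth described above can be checked with a little arithmetic. This sketch assumes, as the slide does, one second per lattice cell and uses $n^k$ cells (the exact count, $(n+1)^k$, differs only slightly); the edge count is the upper bound $n^k(2^k - 1)$, and the year and universe-age constants are rounded values. The function name is hypothetical.

```python
from typing import Tuple

SECONDS_PER_YEAR = 3.15e7    # about 3 x 10^7 s
AGE_OF_UNIVERSE_S = 4.35e17  # about 13.8 billion years


def lattice_counts(n: int, k: int) -> Tuple[int, int]:
    """(nodes, upper bound on edges) of the k-dimensional alignment lattice."""
    nodes = n ** k
    return nodes, nodes * (2 ** k - 1)  # each node has at most 2^k - 1 incoming edges


for k in (2, 3, 4, 20):
    nodes, edges = lattice_counts(100, k)
    years = nodes / SECONDS_PER_YEAR  # assumption: one second per cell
    print(f"k={k:2d}: {nodes:.0e} cells, <= {edges:.0e} edges, "
          f"{years:.1e} years ({nodes / AGE_OF_UNIVERSE_S:.1e} x age of universe)")
```

For k = 3 the edge bound is $7 \times 10^6$, i.e. the $7n^3$ of the first bullet with n = 100.
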
• The greedy method follows the problem-solving heuristic of making the locally optimal choice at each stage: it combines the most similar pair of the k sequences into a profile, reducing the problem to an alignment of k-1 sequences/profiles, in the hope of finding a global optimum.

k sequences:
u1 = ACGTACGTACGT…
u2 = TTAATTAATTAA…
u3 = ACTACTACTACT…
…
uk = CCGGCCGGCCGG

k-1 sequences/profiles (u3 merged into u1 as a profile):
u1 = ACg/tTACg/tTACg/cT…
u2 = TTAATTAATTAA…
…
uk = CCGGCCGGCCGG…

• Consider these 4 sequences:
s1 GATTCA
s2 GTCTGA
s3 GATATT
s4 GTCAGC

• Scoring: match = +1, mismatch/gap = -1
• There are $\binom{4}{2} = 6$ possible pairwise alignments:

s2 GTCTGA
s4 GTCAGC    (score = 2)

s1 GAT-TCA
s2 G-TCTGA   (score = 1)

s1 GAT-TCA
s3 GATAT-T   (score = 1)

s1 GATTCA--
s4 G-T-CAGC  (score = 0)

s2 G-TCTGA
s3 GATAT-T   (score = -1)

s3 GAT-ATT
s4 G-TCAGC   (score = -1)

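The greedy choice of the closest pair can be reproduced by scoring the six alignments as drawn and taking the maximum. A minimal sketch with hypothetical names; it scores the given alignments only and does not search for optimal pairwise alignments, which a real progressive aligner would do first (for example with Needleman-Wunsch).

```python
from typing import Dict, Tuple

MATCH, MISMATCH_OR_GAP = 1, -1


def pairwise_score(a: str, b: str) -> int:
    """Score two aligned rows: +1 per identical residue, -1 per mismatch or gap."""
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    return sum(MATCH if x == y and x != "-" else MISMATCH_OR_GAP for x, y in zip(a, b))


# The six pairwise alignments as drawn on the slide (not re-optimised here).
alignments: Dict[Tuple[str, str], Tuple[str, str]] = {
    ("s2", "s4"): ("GTCTGA", "GTCAGC"),
    ("s1", "s2"): ("GAT-TCA", "G-TCTGA"),
    ("s1", "s3"): ("GAT-TCA", "GATAT-T"),
    ("s1", "s4"): ("GATTCA--", "G-T-CAGC"),
    ("s2", "s3"): ("G-TCTGA", "GATAT-T"),
    ("s3", "s4"): ("GAT-ATT", "G-TCAGC"),
}

scores = {pair: pairwise_score(a, b) for pair, (a, b) in alignments.items()}
closest = max(scores, key=scores.get)  # greedy step: the highest-scoring pair
print(scores)
print(closest)  # ('s2', 's4')
```
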
s2 and s4 are closest; combine them:
s2   GTCTGA
s4   GTCAGC
s2,4 GTCt/aGa/c  (profile)

New set of 3 sequences:
s1   GATTCA
s3   GATATT
s2,4 GTCt/aGa/c

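The profile notation can be generated from two aligned rows: a shared residue is kept, and a differing column is written as "x/y" in lower case. The string form is only a display format; a working aligner would typically store a profile as per-column residue (and gap) frequencies, which the second function sketches. Both function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def merge_to_profile(a: str, b: str) -> str:
    """Join two aligned rows into the slide's notation: shared residue, or 'x/y' in lower case."""
    if len(a) != len(b):
        raise ValueError("rows must be aligned to the same length")
    return "".join(x if x == y else f"{x.lower()}/{y.lower()}" for x, y in zip(a, b))


def column_frequencies(rows: List[str]) -> List[Dict[str, float]]:
    """Per-column frequency of each symbol (residue or gap) across the aligned rows."""
    return [{sym: count / len(rows) for sym, count in Counter(col).items()}
            for col in zip(*rows)]


print(merge_to_profile("GTCTGA", "GTCAGC"))  # GTCt/aGa/c
print(column_frequencies(["GTCTGA", "GTCAGC"]))
```
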
s1     GATTC--A
s2,4   G-T-CTGA   (score = 0)

s3     GATATT-
s2,4   G-TCTGA    (score = -1)

Of these, s1 and s2,4 are closer; combine them:
s1     GATTC--A
s2,4   G-T-CTGA
s1,2,4 Ga/-Tt/-ct/-g/-A

New set of 2 sequences:
s3     GATATT
s1,2,4 Ga/-Tt/-ct/-g/-A

s3     GATAT-T--
s1,2,4 GAT-TCTGA  (score = 1)

Final alignment:
s1,2,3,4 GATa/-Tc/-Tg/-a/-

• Computationally complex
  • If the MSA scores matches, mismatches and gaps and also accounts for the degree of variation, it can be applied to only a few sequences.
• Difficult to score
  • Multiple comparisons are necessary in each column of the MSA to obtain a cumulative score.
  • Placement of gaps and scoring of substitutions are more difficult than in pairwise alignment.
• Difficulty increases with diversity
  • Relatively easy for a set of closely related sequences.
  • Identifying the correct ancestral relationships for a set of distantly related sequences is more challenging.
  • Even harder if some members are much more alike than others.

• EMBL-EBI: http://www.ebi.ac.uk/clustalw/
• BCM Search Launcher: Multiple Alignment: http://dot.imgen.bcm.tmc.edu:9331/multi-align/multi-align.html
• Multiple Sequence Alignment for Proteins (Wash. U. St. Louis): http://www.ibc.wustl.edu/service/msa/
web.warwick.ac.uk/telri/Bioinfo/
http://science.marshall.edu/murraye/
http://www.cs.iastate.edu/~cs544/Lectures/