EVOLUTIONARY INACCURACY OF
PAIRWISE STRUCTURAL ALIGNMENTS
Presenter: Nguyen Dinh Chien (阮庭戰)
Authors:
M. I. Sadowski and W. R. Taylor
From Division of Mathematical Biology,
MRC National Institute for Medical Research, London, UK
➢ Structural alignment attempts to establish homology between two or more polymer
structures based on their shape and 3D conformation. This process is usually
applied to protein tertiary structures but can also be used for large RNA molecules.
➢ In this study, the authors analyzed the self-consistency of seven widely used structural
alignment methods (SAP, TM-align, Fr-TM-align, MAMMOTH, DALI, CE and FATCAT)
on a diverse, non-redundant set of 1863 domains from the SCOP database.
➢ Results:
▪ The degree of inconsistency of the alignments at the residue level is high (30%).
▪ Some methods produce more consistent alignments than the rest (SAP, Fr-TM-align and DALI
are the three most consistent).
▪ The ability of the methods to identify good structural alignments is also assessed using geometric
measures.
Outline
INTRODUCTION
METHODS
RESULTS
DISCUSSION
INTRODUCTION
➢ The problem of aligning pairs of protein structures has attracted a significant level of
research effort.
➢ Kolodny et al., 2005 and Mayr et al., 2007 are important contributions. Kolodny's study tested
the ability of methods to find a good solution as judged by geometric criteria, and Mayr's study
assessed the agreement of the aligned residues with a set of manually curated 'gold standard' alignments.
➢ Geometric measures were used to assess the ability of aligners. The consistency argument is that, if
A and B are homologous and B and C are homologous, then A and C must also be homologous.
➢ In this study, the authors compared the most widely used methods for pairwise structural alignment and
considered alignment accuracy relative to other annotation sources: DSSP structural classes and
solvent accessibilities.
➢ They also used SCOP folds, GO annotations, topological distances and several geometric scores as
external points of comparison.
➢ The different assessment methods highlight different strengths and weaknesses of each method.
Outline
INTRODUCTION
METHODS
RESULTS
DISCUSSION
Data set
Structural alignment methods
Inconsistency measure
Calibration of data
Other geometric measures
Residue annotations
Assessment of symmetry
Data set
➢ In this study, the authors used a set of 1863 domains, which was
derived from the ASTRAL SCOP10 databases.
✓ SCOP: http://scop.mrc-lmb.cam.ac.uk/scop/data/scop.b.html
✓ ASTRAL: http://astral.berkeley.edu/
➢ The set was restricted to high-quality structures by requiring a SPACI
(Summary PDB ASTRAL Check Index, http://astral.berkeley.edu/spaci.html)
score > 0.5 and by excluding NMR (Nuclear Magnetic Resonance) structures
(http://nmr.cit.nih.gov/xplor-nih/xplorMan/node470.html)
and those with missing residues.
Structural alignment methods
➢ All-versus-all pairwise structural alignments were generated using seven methods: SAP,
DALILite, MAMMOTH-MULT, FATCAT, CE, TM-align and Fr-TM-align.
➢ These methods were selected because they
▪ are in many cases used to compute large sets of alignments for publicly available resources
(FATCAT and CE for PDB; DALILite for the DALI FSSP database), or
▪ have been used to draw conclusions about fold space (SAP, TM-align, DALILite and
MAMMOTH-MULT).
➢ All methods were used with default parameters.
➢ Andreas Prlić's Java implementations of FATCAT and CE were used.
Inconsistency measure
➢ Inconsistency was assessed for all positions in every triplet of aligned structures (within a
particular threshold). If a gap was found at that position in any of the three alignments,
the position was ignored.
➢ For each position, they determined whether the condition E(Ai,Bj) ∧ E(Bj,Ck) ∧ E(Ai,Ck) was
true, where
▪ the predicate E(Xi,Yj) means that position i in sequence X is aligned to position j in
sequence Y.
If the condition is false, inconsistency = 1; otherwise, inconsistency = 0.
➢ The proportion of inconsistent positions was found over all aligned triples for each method
at each threshold and expressed as a percentage. Computed over all residues, this is the absolute
inconsistency.
➢ Restricted to subsets of residues with particular annotations, it is called the relative inconsistency.
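The triplet test can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each pairwise alignment is represented as a dictionary mapping an aligned position in the first structure to its partner in the second (unaligned positions are simply absent), and the function name `triplet_inconsistency` is hypothetical rather than taken from the study.

```python
def triplet_inconsistency(ab, bc, ac):
    """Percentage of positions in a triplet that fail transitive consistency.

    ab, bc, ac: dicts mapping an aligned position in the first structure
    to the aligned position in the second (assumed representation).
    Returns None when no position is aligned in all three alignments.
    """
    considered = 0
    inconsistent = 0
    for i, j in ab.items():
        # Skip positions that fall in a gap in any of the three alignments.
        if j not in bc or i not in ac:
            continue
        considered += 1
        # Consistent only if A_i -> B_j -> C_k agrees with the direct A_i -> C_k.
        if ac[i] != bc[j]:
            inconsistent += 1
    if considered == 0:
        return None
    return 100.0 * inconsistent / considered


# Hypothetical toy alignments: position 1 of A is inconsistent, position 3 hits a gap.
ab = {0: 0, 1: 1, 2: 2, 3: 3}
bc = {0: 0, 1: 1, 2: 3}
ac = {0: 0, 1: 2, 2: 3, 3: 4}
toy_result = triplet_inconsistency(ab, bc, ac)
```

Averaging this quantity over all triplets gives an absolute figure; applying it only to positions carrying a given annotation would give the per-annotation figure.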
Calibration of data
➢ The RMSD (root-mean-square deviation) and coverage values were used to
approximate TM-scores (fTM) for the alignments generated by each method.
For TM-align the reported values were 0.981 for the approximate TM-score and 0.985 for the real TM-score.
The approximate TM-scores for the other methods correlated with TM-align as
follows: SAP 0.739; DALILite 0.643; FATCAT 0.774; FATCAT (flexible mode) 0.639;
CE 0.837; Fr-TM-align 0.923.
➢ Next, they compared the fTM score with each method's own summary score to determine
which was likely to provide the best ranking.
\[ fTM = \frac{C}{L} \cdot \frac{1}{1 + \left( R / D_0 \right)^2}, \qquad \text{where } D_0 = 1.24 \sqrt[3]{L - 15} - 1.8 \quad \text{(Zhang and Skolnick, 2004)} \]
C being the number of aligned residues
L being the mean length of the structures
R - the reported RMSD
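As a worked illustration of the approximate TM-score, the sketch below computes it from coverage, mean length and RMSD. The function names `d0` and `approx_tm_score` are hypothetical, and the arrangement of terms assumes the form fTM = (C/L) / (1 + (R/D0)^2) with D0 = 1.24 (L - 15)^(1/3) - 1.8 from Zhang and Skolnick (2004).

```python
def d0(mean_length):
    """Length-dependent distance scale D0 (Zhang and Skolnick, 2004)."""
    if mean_length <= 15:
        raise ValueError("D0 is undefined for mean lengths of 15 residues or fewer")
    value = 1.24 * (mean_length - 15) ** (1.0 / 3.0) - 1.8
    if value <= 0:
        raise ValueError("D0 is not positive for such short structures")
    return value


def approx_tm_score(n_aligned, mean_length, rmsd):
    """Approximate TM-score from coverage C, mean length L and reported RMSD R."""
    scale = d0(mean_length)
    return (n_aligned / mean_length) / (1.0 + (rmsd / scale) ** 2)


# With full coverage and an RMSD equal to D0, the score is exactly one half.
half_score = approx_tm_score(115, 115, d0(115))
```

The score falls as the RMSD grows relative to D0 and as coverage drops, so a method cannot raise it simply by aligning fewer residues more tightly.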
Calibration of data
Method .985 .986 .987 .988 .989 .990 .991 .992 .993 .994 .995 .996 .997 .998 .999
MMT 5.15 5.28 5.42 5.57 5.75 5.95 6.16 6.43 6.73 7.11 7.61 8.34 9.33 11.03 14.56
TM 0.417 0.421 0.426 0.431 0.438 0.445 0.453 0.461 0.472 0.484 0.499 0.517 0.540 0.570 0.617
FrTM 0.437 0.441 0.446 0.452 0.458 0.465 0.473 0.481 0.492 0.505 0.520 0.539 0.561 0.590 0.636
SAP 0.263 0.270 0.276 0.283 0.292 0.301 0.312 0.325 0.339 0.355 0.376 0.401 0.433 0.476 0.548
FTCT 0.397 0.403 0.408 0.415 0.422 0.430 0.440 0.450 0.462 0.475 0.492 0.512 0.538 0.572 0.625
DALI 0.364 0.370 0.376 0.382 0.390 0.398 0.407 0.418 0.431 0.446 0.463 0.483 0.510 0.547 0.603
FTCF 0.011 0.01 0.009 0.007 0.006 0.005 0.004 0.003 0.003 0.002 0.001 7e-04 3e-04 6e-05 1e-06
CE 0.398 0.403 0.409 0.415 0.422 0.430 0.439 0.449 0.460 0.474 0.490 0.510 0.535 0.570 0.621
Table S1: Thresholds used for the top 15 increments from 98.5% to 99.9% of alignments
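Table S1 gives one score cut-off per method for each level from 98.5% to 99.9%. The slides do not say how these cut-offs were computed; the sketch below shows one way such values could be obtained, assuming a nearest-rank percentile over all pairwise scores and assuming higher scores are better. The function name and the per-mille keys are hypothetical.

```python
def top_fraction_thresholds(scores, first_permille=985, last_permille=999):
    """Nearest-rank score cut-offs at each per-mille level (assumed procedure).

    Returns a dict mapping the level in per mille (985 means 98.5%)
    to the score found at that rank of the sorted score list.
    """
    ordered = sorted(scores)
    n = len(ordered)
    if n == 0:
        raise ValueError("no scores supplied")
    thresholds = {}
    for level in range(first_permille, last_permille + 1):
        # Integer ceiling of level * n / 1000 avoids floating-point rounding.
        rank = max(1, (level * n + 999) // 1000)
        thresholds[level] = ordered[rank - 1]
    return thresholds


# Hypothetical scores 1..1000, so the cut-off at each level equals the level itself.
toy_thresholds = top_fraction_thresholds(range(1, 1001))
```

For a score where smaller values indicate a better match, as the FTCF row appears to be, the list would be sorted in descending order instead.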
Other geometric measures
➢ To assess the geometric quality of reported alignments, they used the following formulae:
\[ SI = \frac{R \cdot \min(L_1, L_2)}{C} \]
\[ MI = 1 - \frac{1 + C}{\left( 1 + \frac{R}{W_0} \right) \left( 1 + \min(L_1, L_2) \right)} \]
\[ SAS = \frac{100 \, R}{C} \]
R - RMSD
C - alignment coverage
L1 and L2 - lengths of the two sequences
W0 - weighting parameter
W0=1.5 as in Kolodny et al., 2005
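The three measures can be written directly as functions of R, C, L1 and L2. This is a minimal sketch: the function names are hypothetical, the expansions of SI, MI and SAS as similarity index, match index and structural alignment score follow Kolodny et al. (2005), and the input values in the example are invented.

```python
def similarity_index(rmsd, coverage, len1, len2):
    """SI: RMSD scaled by the shorter length over the number of aligned residues."""
    return rmsd * min(len1, len2) / coverage


def match_index(rmsd, coverage, len1, len2, w0=1.5):
    """MI with weighting parameter W0 (1.5 as in Kolodny et al., 2005)."""
    return 1.0 - (1.0 + coverage) / ((1.0 + rmsd / w0) * (1.0 + min(len1, len2)))


def structural_alignment_score(rmsd, coverage):
    """SAS: RMSD per 100 aligned residues."""
    return 100.0 * rmsd / coverage


# Invented example: 100 residues aligned at 3.0 A between chains of 120 and 150 residues.
si = similarity_index(3.0, 100, 120, 150)
mi = match_index(3.0, 100, 120, 150)
sas = structural_alignment_score(3.0, 100)
```

For all three measures a lower value indicates a better alignment, since each penalises high RMSD and rewards high coverage.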
Residue annotations
➢ The Catalytic Site Atlas annotations (Porter et al., 2004) and annotations
from PDB SITE records were used to produce datasets of functional residues
http://www.ebi.ac.uk/thornton-srv/databases/CSA_NEW/ .
➢ Secondary structure assignments and accessibility values were taken from DSSP
(Define Secondary Structure of Proteins) http://swift.cmbi.ru.nl/gv/dssp/
➢ The consistency of the annotations was assessed
separately using a chi-square test.
Class I (π-helix) almost always aligns with class H (α-helix). Isolated β-bridges (B)
align mostly with strands (class E), and the remaining non-coil classes
align significantly together, suggesting that at greater
distances these regions are interchangeable.
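The slides do not describe how the chi-square test was set up. One plausible reading is a contingency table counting how often a residue of one DSSP class is aligned to a residue of another; the sketch below computes the Pearson chi-square statistic for such a table. The function name and the counts are hypothetical and serve only to show the arithmetic.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table.

    table: list of rows of observed counts; every row and column total
    must be non-zero so that all expected counts are defined.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of row and column class.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Hypothetical counts of aligned residue pairs for two classes (rows: first
# structure, columns: second structure), not data from the study.
toy_stat, toy_dof = chi_square_statistic([[10, 20], [20, 10]])
```

A large statistic relative to the chi-square distribution with the returned degrees of freedom would indicate that the classes do not pair up at random.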
Residue annotations
Fold   Description                               Mean inconsistency  SD inconsistency  N
b.80   Single-stranded right-handed beta-helix   100.00%             0.00%             5
a.118  Alpha/alpha superhelix                    86.70%              7.30%             10
a.24   Four helix up/down bundle                 80.00%              8.10%             13
b.69   7-bladed beta propeller                   58.70%              5.60%             8
a.102  Alpha/alpha toroid                        57.60%              28.50%            8
b.55   PH domain-like barrel                     8.60%               3.30%             10
d.38   Thioesterase                              8.00%               2.60%             7
d.131  DNA clamp                                 7.90%               0.40%             5
b.34   SH3-like barrel                           6.90%               6.20%             11
d.37   CBS-domain pair                           6.50%               1.80%             5
Table S2: Most and least consistent domains. The SCOP folds and their names are shown
for the five least and five most consistently aligned folds at the highest threshold across all
methods, along with the number of neighbours (N) at that level in the dataset.
Assessment of symmetry
➢ Symmetry values for protein structures were derived using the Fourier
transform-based approach described by Taylor et al. (2002).
➢ Inconsistency values per domain were the mean over all methods at the
highest threshold, which had 803 members; domains with fewer than 5
neighbours for TM-align were culled from the set, leaving 207 domains.
Outline
INTRODUCTION
METHODS
RESULTS
DISCUSSION
Choosing a score for ranking: ROC assessment
Assessment of self-consistency for structural aligners
Determining structural features associated with inconsistencies
Assessment by geometric measures
Choosing a score for ranking: ROC assessment
➢ Mean AUC values for ROC curves derived from each possible score for the methods presented.
➢ CE, DALI, FATCAT (rigid mode) and Fr-TM-align all perform excellently when scored using the
approximate TM-score.
➢ MAMMOTH, FATCAT (flexible mode) and SAP all perform less well regardless of score.
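For readers unfamiliar with the AUC, it equals the probability that a randomly chosen positive pair scores higher than a randomly chosen negative pair. The sketch below computes it that way. The labelling is an assumption (for example, positives could be pairs sharing a SCOP fold, which the slides do not state), and the function name and scores are hypothetical.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count one half).

    scores: one alignment score per structure pair, higher meaning more similar.
    labels: truthy for positive pairs, falsy for negative pairs (assumed labelling).
    """
    positives = [s for s, is_pos in zip(scores, labels) if is_pos]
    negatives = [s for s, is_pos in zip(scores, labels) if not is_pos]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative pair")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Invented scores: three of the four positive/negative comparisons are won.
toy_auc = roc_auc([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
```

The pairwise loop is quadratic in the number of pairs, which is fine for illustration; a rank-based formulation would be preferable for an all-versus-all set of this size.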
Assessment of self-consistency for structural aligners
➢ Fig 2: Inconsistency of pairwise structural alignments. The proportion of positions failing
transitive consistency is shown for all alignment pairs in the relevant fraction of the set.
The methods appear in the order FATCAT-flexible, MAMMOTH, CE, FATCAT, TM-align, DALI,
Fr-TM-align, SAP from top to bottom on the left-hand edge of the graph.
Determining structural features associated with inconsistencies
➢ Fig 3: Improved consistency at residues marked functional. Absolute rates of inconsistency
are shown for functional residues (solid lines) and all residues (dashed lines) for the three
most consistent methods. These appear in the order DALI, Fr-TM-align, SAP from the top
downwards along the left-hand edge.
Determining structural features associated with inconsistencies
➢ Fig 4. Relative inconsistencies for DSSP residue classes. Inconsistencies are shown as a
percentage of the absolute value for each method. The upper panel shows results for
the top 0.01% of alignments, the bottom the top 1.5%.
Determining structural features associated with inconsistencies
➢ Fig 5. Relative inconsistency for three methods in relation to solvent accessibility. Solvent
accessibility was split into classes in bins of 20% with 0 being the lowest. Panels are
arranged as in Figure 4.
Determining structural features associated with inconsistencies
➢ Figure S1: Symmetry and inconsistency. Mean inconsistency (X-axis) for 233 domains with
more than 5 neighbours at the highest level of structural similarity is plotted against the
power of the Fourier series as a measure of the internal symmetry of the structure
(Y-axis, arbitrary units).
Determining structural features associated with inconsistencies
➢ Fig 6. Relative inconsistency as a function of gap distance. Panels are arranged as
in Figure 4.
Assessment by geometric measures
➢ TM-align, FATCAT (flexible) and Fr-TM-align are the best three methods in all cases,
regardless of the metric used.
➢ SAP and MAMMOTH both rank worst by all metrics.
Discussion
Even for the most consistent methods the level of inconsistency is very high.
The most significant contributory factor to inconsistent structural alignments is
the treatment of gaps.
Another important issue is that optimization of structural similarity is not in all
cases the ideal strategy for identifying homology.
Flexible alignment is correctly seen as an important innovation in aligning
protein structures; however, the results demonstrate that it is not a panacea.
The least consistently aligned domains are repeats such as beta-helices,
and the least consistently aligned elements are generally helices.
Another possibility for improving the results of large-scale pairwise alignments
(e.g. in database search or when using large datasets) is to realign significantly
similar structures using a consistency criterion.
Thanks for your attention!
