Symmetry is a common and significant feature of protein structures. Symmetry has been found to be important for understanding protein evolution, DNA binding, allosteric regulation, cooperativity, and folding. We have compiled a census of internal symmetry, conducted using the novel CE-Symm algorithm. We find that internal symmetry is present in at least 18% of superfamilies. To elucidate the relationship between symmetry and protein function, the census is analyzed with respect to structural classification, enzyme activity, and ligand binding. The CE-Symm algorithm was benchmarked against a manually curated set of ~1000 domains.
Myers-Turnbull, D., Bliven, S. E., Rose, P. W., Aziz, Z. K., Youkharibache, P., Bourne, P. E., & Prlić, A. (2014). Systematic Detection of Internal Symmetry in Proteins Using CE-Symm. Journal of Molecular Biology, 426(11), 2255–2268. PMID: 24681267
This poster was presented at the 22nd Annual International Conference on Intelligent Systems for Molecular Biology (2014).
ISMB 2014 Poster: Systematic detection of internal symmetry in proteins
Case Studies
Glyoxalase I is a dimer in both C. acetobutylicum and E. coli.
However, the arrangement of the chains is quite different. Detecting
the C2 internal symmetry in each chain reveals that the two active
sites are each composed of two structural repeats, giving an overall
dihedral symmetry. A monomer from 1,2-dihydroxy-naphthalene
dioxygenase also contains four copies of the repeat in the same
arrangement.
ABC transporters are responsible for transporting a wide range of
metabolites across the cell membrane. The Vitamin B12 transporter
BtuCD is a heterotetramer (two BtuC and two BtuD subunits) with C2
quaternary symmetry. It binds a single BtuF subunit on the
periplasmic face. BtuF has two-fold pseudosymmetry; it contacts each
of the two BtuC subunits and induces a slight asymmetry in the
overall conformation.
Systematic detection of internal symmetry in proteins
Spencer E. Bliven1,8,*, Douglas Myers-Turnbull2, Peter W. Rose3,7, Zaid K. Aziz4, Philippe Youkharibache6, Philip E. Bourne5,7, Andreas Prlić3,7,**
Bioinformatics and Systems Biology Program1, Department of Computer Science and Engineering2, San Diego Supercomputer Center3, and Department of Chemistry and Biochemistry4, Skaggs School of Pharmacy and Pharmaceutical Sciences5, University of California San Diego. InPharmatics Corporation6. RCSB PDB7. Intramural Research Program of the National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health8.
*sbliven@ucsd.edu
CE-Symm
The CE-Symm algorithm has been created to detect internal
symmetry in proteins. It is available as a stand-alone command line
tool, as part of the BioJava software library,14 and as a web server (see
Availability).
CE-Symm first identifies structurally similar regions within the protein
structure. It then refines this alignment to improve the correspondence
between structural repeats.
1. Identify structurally similar regions
The CE-Symm algorithm starts by identifying a non-trivial structural
alignment between a protein and itself using Combinatorial Extension10
(CE). This uses the dynamic programming and progressive refinement
of CE, but with two modifications.
1. A strong penalty term is added to self-aligned residues to prevent
the trivial 0° rotation from dominating.
2. The alignment matrix is duplicated in the manner of Uliel et al.11
to account for the circular permutation which is introduced when
comparing a symmetric protein against a rotated copy of itself.
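
The two modifications above can be illustrated with a toy sketch. This is not the CE-Symm implementation: it replaces CE's dynamic programming and refinement with a simple gapless diagonal score, and every name in it (`penalize_diagonal`, `duplicate_columns`, `best_diagonal_offset`, the `window` and `penalty` values, the letter-coded toy "structure") is a hypothetical chosen for illustration only.

```python
# Toy illustration only; CE-Symm itself uses CE's alignment machinery.

def penalize_diagonal(scores, window, penalty):
    """Copy a square self-similarity matrix, lowering entries with
    |i - j| < window so the trivial self-alignment stops dominating."""
    n = len(scores)
    return [
        [scores[i][j] - penalty if abs(i - j) < window else scores[i][j]
         for j in range(n)]
        for i in range(n)
    ]

def duplicate_columns(scores):
    """Turn an n x n matrix into n x 2n by repeating each row, so a
    circularly permuted alignment becomes one unbroken diagonal."""
    return [row + row for row in scores]

def best_diagonal_offset(doubled):
    """Score every gapless diagonal of the n x 2n matrix and return
    (offset, score) for the first best-scoring one."""
    n = len(doubled)
    best = None
    for offset in range(n):
        total = sum(doubled[i][i + offset] for i in range(n))
        if best is None or total > best[1]:
            best = (offset, total)
    return best

# Hypothetical 12-residue "protein" made of three identical 4-residue repeats.
labels = "abcd" * 3
similarity = [[1 if a == b else 0 for b in labels] for a in labels]

# Without the penalty the identity alignment (offset 0) is found first.
trivial = best_diagonal_offset(duplicate_columns(similarity))

# With the penalty, the first non-trivial repeat offset is found instead.
penalized = penalize_diagonal(similarity, window=2, penalty=5)
nontrivial = best_diagonal_offset(duplicate_columns(penalized))

# In this idealized case the offset maps directly onto a rotation angle.
angle = 360 * nontrivial[0] / len(labels)
```

In this idealized example the non-trivial offset of 4 residues out of 12 corresponds to a 120° rotation, the same angle as the first three-fold operation in the fibroblast growth factor figure. Real structures have unequal repeats and gaps, which is why the actual algorithm relies on CE rather than a fixed diagonal.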
(Left) Fibroblast growth factor 1 [3JUT], colored to show internal symmetry. (Right) Dot plot
showing equivalent residues within the protein. Red lines correspond to a 120° clockwise
rotation of the protein around the 3-fold axis, and cyan to the 240° rotation. After
duplicating the matrix, each alignment forms a sequential diagonal line which can be fully
detected by CE. Gray shading indicates regions near the diagonal which are penalized by the
scoring function.
The RCSB PDB is supported by the National Science Foundation [NSF DBI 0829586]; National Institute of General
Medical Sciences; Office of Science, Department of Energy; National Library of Medicine; National Cancer Institute;
National Institute of Neurological Disorders and Stroke; and the National Institute of Diabetes & Digestive & Kidney
Diseases. The RCSB PDB is a member of the wwPDB.
This research was supported by the Intramural Research Program of the National Center for Biotechnology Information,
National Library of Medicine, National Institutes of Health.
References
1. Lee, J. & Blaber, M. PNAS 108, 126–130 (2011).
2. Monod, J. et al. J Mol Biol 12, 88–118 (1965).
3. Juo, Z. S. et al. J Mol Biol 261, 239–254 (1996).
4. Goodsell, D. S. & Olson, A. J. Annu Rev Biophys Biomol Struct 29, 105–153 (2000).
5. Gosavi, S. et al. J Mol Biol 357, 986–996 (2006).
6. Fortenberry, C. et al. J Am Chem Soc 133, 18026–18029 (2011).
7. Murray, K. B. et al. J Mol Biol 316, 341–363 (2002).
8. Kim, C. et al. BMC Bioinformatics 11, 303 (2010).
9. Guerler, A. et al. J Chem Inf Model 49, 2147–2151 (2009).
10. Shindyalov, I. N. & Bourne, P. E. Protein Eng 11, 739–747 (1998).
11. Uliel, S. et al. Bioinformatics 15, 930–936 (1999).
12. Abraham, A.-L. et al. J Mol Biol 394, 522–534 (2009).
13. Zhang, Y. & Skolnick, J. Proteins 57, 702–710 (2004).
14. Prlić, A. et al. Bioinformatics 28, 2693–2695 (2012).
15. Neuwald, A. F. Nucleic Acids Res 33, 3614–3628 (2005).
16. Zuccola, H. J., Filman, D. J., Coen, D. M. & Hogle, J. M. Mol Cell 5, 267–278 (2000).
Quaternary Structure Symmetry
Quaternary symmetry consists of multiple identical polypeptide chains arranged in a symmetric
fashion. Such symmetry is extremely common in proteins, occurring in approximately 80% of
structures in the Protein Data Bank (PDB). Detecting quaternary symmetry relies on accurate
assignment of the correct biological assembly for each protein. The PDB now annotates protein
structures with their quaternary symmetry (Peter Rose et al., in preparation).
For quaternary symmetry, only the subunits in the biological assembly are considered. The subunits
may surround either a crystallographic axis (for crystal structures) or a non-crystallographic axis.
In either case, because the equivalent chains are identical, a one-to-one relationship exists between
atoms in each symmetry unit.
Internal Symmetry
Proteins can also have internal symmetry, when a single chain contains two or more equivalent
structural repeats. The repeats generally will differ in the exact sequence, but have substantially similar
structures. Internal symmetry is sometimes styled as pseudosymmetry to reflect that the equivalence
between repeats is generally at the level of residues or secondary structure elements rather than precise
coordinates, as with quaternary symmetry.
GTP cyclohydrolase I [1A8R]: D5
Rhinovirus 2 [3DPR]: Icosahedral
Hemoglobin [4HHB]: C2 (but pseudo D2)
AmtB Ammonia Channel [1U7G]: C3
Symmetry & Function
Both quaternary and internal symmetry are linked to a wide range of protein functions.
Ligand Binding
Ligands often bind near the axis of symmetry. Of symmetric domains
with ligands, 63% have the ligand within 5Å of the axis of symmetry;
in 37% it is within 1Å.
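
The distance criterion in these statistics amounts to a point-to-line distance. The sketch below shows one way to compute it, assuming the symmetry axis is given as a point plus a direction vector and the ligand is reduced to a single coordinate. The poster does not state which ligand atoms or centroid were used, so that choice and the function name `distance_to_axis` are assumptions for illustration.

```python
import math

def distance_to_axis(point, axis_point, axis_direction):
    """Perpendicular distance (same units as the inputs, e.g. Å) from a
    3D point to the line through axis_point along axis_direction."""
    norm = math.sqrt(sum(d * d for d in axis_direction))
    if norm == 0:
        raise ValueError("axis_direction must be non-zero")
    unit = [d / norm for d in axis_direction]
    # Vector from the axis reference point to the query point.
    v = [p - a for p, a in zip(point, axis_point)]
    # Remove the component along the axis; what remains is perpendicular.
    along = sum(vi * ui for vi, ui in zip(v, unit))
    perpendicular = [vi - along * ui for vi, ui in zip(v, unit)]
    return math.sqrt(sum(c * c for c in perpendicular))

# Hypothetical ligand position and a symmetry axis along z through the origin.
ligand = (3.0, 4.0, 10.0)
d = distance_to_axis(ligand, (0.0, 0.0, 0.0), (0.0, 0.0, 2.0))
within_5A = d <= 5.0
within_1A = d <= 1.0
```

Here the ligand sits exactly 5 Å from the axis, so it would count toward the 5 Å category but not the 1 Å one.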
Symmetric proteins often bind symmetric ligands, such as metals.
DNA binding proteins often utilize symmetry. Many transcription
factors are symmetric dimers and recognize palindromic sequences.
The TATA binding protein (right) is an internally symmetric monomer
which has evolved to recognize a non-palindromic sequence.3
Allosteric Regulation
Cooperativity can arise from coordinated movements in symmetric subunits.2 This mechanism holds
for both quaternary symmetry (e.g. in hemoglobin) and for internally symmetric proteins.4
Protein Folding
Internal symmetry can smooth the folding landscape and reduce folding time.5
Internal repeats can fold quasi-independently.
Misfolding of one repeat can trigger degradation of the whole protein, unlike in quaternary
symmetric complexes.
Experimental Tools
Symmetry can aid the computational design of large proteins6 and improve the search for
distant homologs.15
CE-Symm Availability
Web server: source.rcsb.org/jfatcatserver/symmetry.jsp
Download & Source code: github.com/rcsb/symmetry (LGPL)
Screenshot of the CE-Symm interface,
showing a two-fold axis of EPSP
synthase [1G6S].
TATA Binding
Evolution
Internal symmetry can arise from quaternary symmetry by gene duplication or fusion. Thus, in addition
to the many functional implications of symmetry, identifying protein symmetry can provide
information about the evolutionary history of a protein. Such fission and fusion events often preserve
the overall structure and function of the active complex.1
Many proteins with higher order symmetry appear to have undergone several duplication events. For
instance, DNA clamps are composed of 12 structural repeats arranged in a ring. Pairs of these repeats
form domains with the ‘processivity fold,’ which can also be found in non-ring conformations in some
species.16 Six such domains form a complete ring, but they are fused together into either two (bacteria)
or three (eukaryotes, archaea, and viruses) chains.
Dimeric bacterial clamp: DNA polymerase III beta subunit from E. coli [1MMI]
Trimeric eukaryotic clamp: proliferating cell nuclear antigen from humans [1VYM]
Trimeric clamp, colored to show the 12 structural repeats
Single domain, as viewed from the center of the ring.
Diagram labels: 12-mer, 6-mer, Eukaryotic Trimer, Bacterial Dimer
Benchmark
A benchmark of 1007 proteins from different SCOP superfamilies was created by manually inspecting
each for internal symmetry. Structures with fewer than 4 secondary structure elements were omitted
from the benchmark. 24% of the superfamilies were found to have internal symmetry or large structural repeats.
Comparison of CE-Symm and SymD8 performance. Dots represent default thresholds for determining symmetry. AUC = 0.95, 0.87.

Symmetry type   Order   Number of superfamilies   % of benchmark
Asymmetric      -       766                       76.1%
Rotational      2       166                       16.5%
Rotational      3       10                        1.0%
Rotational      4       2                         0.2%
Rotational      5       3                         0.3%
Rotational      6       9                         0.9%
Rotational      7       9                         0.9%
Rotational      8       21                        2.1%
Dihedral        2       2                         0.2%
Dihedral        4       1                         0.1%
Helical         2       9                         0.9%
Helical         3       2                         0.2%
Non-integral    -       2                         0.2%
Superhelical    -       2                         0.2%
Translational   -       3                         0.3%
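
As a consistency check on the benchmark figures, the sketch below tallies the per-category counts and compares them with the stated totals (1007 superfamilies, about 24% symmetric). The counts are copied from the poster's table; the dictionary layout and variable names are illustrative choices, not part of the original analysis.

```python
# Counts copied from the benchmark table; keys are (symmetry type, order).
benchmark_counts = {
    ("Asymmetric", None): 766,
    ("Rotational", 2): 166,
    ("Rotational", 3): 10,
    ("Rotational", 4): 2,
    ("Rotational", 5): 3,
    ("Rotational", 6): 9,
    ("Rotational", 7): 9,
    ("Rotational", 8): 21,
    ("Dihedral", 2): 2,
    ("Dihedral", 4): 1,
    ("Helical", 2): 9,
    ("Helical", 3): 2,
    ("Non-integral", None): 2,
    ("Superhelical", None): 2,
    ("Translational", None): 3,
}

total = sum(benchmark_counts.values())
asymmetric = benchmark_counts[("Asymmetric", None)]
symmetric = total - asymmetric

# Percentages rounded the same way the table reports them.
percent_asymmetric = round(100 * asymmetric / total, 1)
percent_symmetric = round(100 * symmetric / total)
```

The categories sum to the stated benchmark size, and the symmetric share rounds to the 24% quoted in the text.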
Census
CE-Symm was run on every domain in
SCOPe 2.03. The census is available at
source.rcsb.org/jfatcatserver/scopResults.jsp.
Percentage of SCOP superfamilies with internal symmetry, as detected by CE-Symm

SCOP class     Number of superfamilies   % symmetric
α              507                       18.5%
β              354                       24.6%
α/β            244                       16.8%
α+β            551                       14.3%
multi-domain   66                        4.5%
membrane       109                       23.8%
All classes    1831                      18.0%
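
The "All classes" row should equal the average of the per-class percentages weighted by the number of superfamilies in each class. The short sketch below checks this using the values from the census table; the data structure and names are illustrative only.

```python
# (number of superfamilies, % symmetric) per SCOP class, from the census table.
census = {
    "alpha": (507, 18.5),
    "beta": (354, 24.6),
    "alpha/beta": (244, 16.8),
    "alpha+beta": (551, 14.3),
    "multi-domain": (66, 4.5),
    "membrane": (109, 23.8),
}

total_superfamilies = sum(count for count, _ in census.values())

# Weight each class percentage by its superfamily count.
weighted_percent = (
    sum(count * percent for count, percent in census.values())
    / total_superfamilies
)
overall = round(weighted_percent, 1)
```

The weighted average reproduces the 18.0% reported for all 1831 superfamilies, which is also the "at least 18%" figure quoted in the abstract.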
Percentage of internal symmetry detected by CE-Symm in
domains annotated with Enzyme Commission numbers.
Glyoxalase I from Clostridium acetobutylicum [3HDP] (Nickel; Dimer)
Glyoxalase I from E. coli [1F9Z] (Nickel; Dimer)
1,2-dihydroxy-naphthalene dioxygenase from Pseudomonas sp. strain C18 [2EHZ] (Iron; Octamer)
Pseudo D2 symmetry of the complex, colored to show the four repeats.
Vitamin B12 transporter BtuCD from E. coli, in complex with periplasmic-binding protein BtuF (pink) [PDB:4FI3].
Ferredoxin-like [d2j5aa1]: C2
Beta-trefoil [3JUT]: C3