3. WHY ARMADILLOS? BIOLOGICAL INTEREST
Roles in localization, protein transport, & more
β-catenin: cell adhesion, development
α-importin: nuclear localization
APC: tumor suppressor gene, linked to colorectal cancer
Probably homologous to HEAT repeat family
Slightly different structure
huntingtin: disease, nerve signaling, protein transport
β-importin: nuclear localization
phosphatases/kinases
Ancient family (primarily eukaryotic, but predates metazoans)
4. WHY ARMADILLOS? PROTEIN-PROTEIN BINDING
Bind peptides, so could be used like antibodies or
DARPins (therapeutics, biotech, assays, etc.)
Bind extended chains
Target disordered regions and termini
Linear epitope, so much easier to design
Modular binding
[Structure: PDB 5AEI]
6. TANDEM REPEAT EVOLUTION
Duplications & fusions within a gene lead to tandem repeats
Speciation and gene duplication lead to orthologs and paralogs
Pattern of repeats tells us the sequence of evolutionary events
7. HEAT & ARM
Andrade MA, Petosa C, O'Donoghue SI, Müller CW, Bork P. Comparison of ARM and HEAT protein repeats. J Mol Biol.
Academic Press; 2001 May 25;309(1):1–18.
8. ARM FAMILY
Gul, I. S., Hulpiau, P., Saeys, Y., & van Roy, F. (2017) Cellular and Molecular Life Sciences, 74(3), 525–541
δ-catenins & ARM formins
β-catenin not with δ-catenins
[Tree labels: β-importin; HEAT (outgroup); catenin beta-like; α-importin]
9. LIMITATIONS OF PRIOR STUDIES
Don’t model repeat evolution
Either use full-length sequences (no support for copy variation) or single
repeats (inconsistent boundaries, repeats segregate differently between
species)
No reconciliation between gene tree and repeat tree
Older papers use limited species and sequences
Inconsistent inclusion of HEAT repeats
MY APPROACH
Detect repeats with TRAL (cpHMM)
Alignment & tree inference with ProGraphML+TR
Joint gene tree and repeat tree inference (future work)
10. TRAL
Tandem Repeat Annotation Library
Circularly permuted Hidden Markov Model (cpHMM) for tandem
repeat alignment
Integrates repeat detection software
Important for expanding analysis beyond ArmRP family
Schaper et al. (2015). TRAL: tandem repeat annotation library. Bioinformatics, 31(18), 3051–3053.
Schaper E, Gascuel O, Anisimova M. Deep conservation of human protein tandem repeats within the eukaryotes. Mol Biol Evol. 2014
May;31(5):1132–48.
11. DETECTED REPEATS BY SPECIES (GUL HMM)
Species                    ArmRP proteins
Macrostomum lignano        170
Echinostoma caproni        163
Lingula anatina            125
human                      107
zebrafish                  107
scaled quail               100
tropical clawed frog       95
owl limpet                 93
starlet sea anemone        93
Florida lancelet           90
Japanese sea cucumber      84
Schistocephalus solidus    84
Octopus bimaculoides       82
Biomphalaria glabrata      82
purple sea urchin          81
platypus                   75
green sea turtle           75
Stylophora pistillata      75
Wild Bactrian camel        72
Amphimedon queenslandica   68
[Figure: histogram over 94 species; axes: number of proteins, number of species]
12. PROGRAPHML+TR
Szalkowski AM, Anisimova M. Graph-based modeling of tandem repeats improves global multiple sequence alignment. Nucleic Acids Res.
2013 Sep;41(17):e162–2.
13. OUTLOOK: EVOLUTION
Improve Arm profiles based on structural searches
MMTF-pySpark for rapid structural searches
Finish phylogenetic reconstruction with ProGraphML+TR on diverse
species
Joint gene-repeat reconstruction
Analogous to joint species-gene tree inference (e.g. Szöllősi et al., 2015)
15. MOTIVATION
Antibodies
Nature’s solution to binding molecules
Used in diagnostics, therapy, labelling, biochemistry research
$105 billion industry (2016)
3D epitope
Produced in vivo in animals (polyclonal) then optimized biochemically (monoclonal)
16. MOTIVATION
DARPins
Designed Ankyrin Repeat Proteins
Developed by Andreas Plückthun, UZH
Commercialized by Molecular Partners AG ($571 million market cap)
Similar uses to antibodies
3D epitope
Produced in vitro from a randomized library
17. MOTIVATION
dArmRP
Designed Armadillo Repeat Proteins
Bind extended peptides (tails, disordered regions, denatured proteins)
1D epitope
Rationally designed in silico?
18. ARM STRUCTURE & CONSERVATION
Gul 2017 Fig 1B
Structure: Repeat from designed ARM YIIIM5AII (Hansen…Plückthun, 2016) [5aei], colored and labeled as in the alignment
[Figure labels: helices H1, H2, H3; hydrophobic core]
19. BINDING HINTS FROM DARMRP ((KR)N BINDING)
Gul 2017 Fig 1B
Structure: Repeat from designed ARM YIIIM5AII (Hansen…Plückthun, 2016) [5aei], colored and labeled as in the alignment
Mutants available for 7 residues in Arg pocket
Lys pocket has only one specific interaction
[Figure labels: helices H1, H2, H3; hydrophobic core; nonspecific binding]
20. BINDING MODULARITY
For dArmRP, binding energy scales linearly with the number of repeats,
and single-residue mutations contribute additively
Predictable binding energies
Single-residue resolution
[Figure labels: mutation series K→A, R→A, 2K→2A, 2R→2A]
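One way to write down the modularity claimed on this slide is as an additive free-energy model. This is only a sketch: the symbols $\Delta G_0$ (repeat-independent offset), $\Delta G_i$ (contribution of repeat $i$), and $\Delta\Delta G_j$ (penalty of mutation $j$) are notation introduced here, not taken from the slides, and $K_d$ is expressed relative to a 1 M standard concentration.

```latex
\Delta G_{\text{bind}} \approx \Delta G_0 + \sum_{i=1}^{n} \Delta G_i
\qquad
\Delta G_{\text{bind}}^{\text{mut}} \approx \Delta G_{\text{bind}} + \sum_{j \in \text{mutations}} \Delta\Delta G_j
\qquad
\log_{10} K_d = \frac{\Delta G_{\text{bind}}}{RT \ln 10}
```

Under these assumptions $\log_{10} K_d$ is linear in the number of repeats and in the mutation terms, which is why a double mutant such as 2K→2A would be expected to cost roughly twice the single K→A penalty.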
21. KERNEL MODEL
Regression problem: predict binding affinity from sequence at 7
positions
Extract 5 features based on amino acid properties (Atchley 2005)
Use kernel ridge regression (regularized linear regression with various kernels)
$\log_{10} \hat{Y} = K (K + \lambda I)^{-1} \log_{10} Y$
Linear kernel: $k(a, b) = a^{T} b$
Gaussian kernel: $k(a, b) = \exp(-\sigma \lVert a - b \rVert^{2})$
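The regression on this slide can be sketched in a few lines of NumPy. This is a minimal illustration, not the code used for the talk: the function names, the default values of `sigma` and `lam`, and the synthetic data are all assumptions. The feature matrix has 35 columns on the assumption that each of the 7 positions is encoded by 5 amino acid factors; the random values stand in for real features and affinities.

```python
import numpy as np

def linear_kernel(A, B):
    """k(a, b) = a^T b for every pair of rows in A and B."""
    return A @ B.T

def gaussian_kernel(A, B, sigma=0.1):
    """k(a, b) = exp(-sigma * ||a - b||^2) for every pair of rows."""
    sq_dist = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-sigma * np.maximum(sq_dist, 0.0))

def fit_predict(X_train, log_y_train, X_test, kernel, lam=1.0):
    """Kernel ridge prediction: K_test (K + lam*I)^{-1} log10(Y)."""
    K = kernel(X_train, X_train)
    # Solve the linear system instead of forming the inverse explicitly.
    alpha = np.linalg.solve(K + lam * np.eye(len(K)), log_y_train)
    return kernel(X_test, X_train) @ alpha

# Synthetic stand-in data (hypothetical): 20 sequences, 7 positions x 5 factors.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 35))
log_y = X @ rng.normal(size=35)

fitted = fit_predict(X, log_y, X, linear_kernel, lam=1e-6)
```

Solving the system with `np.linalg.solve` avoids an explicit matrix inverse, which is one of the simpler ways to reduce numerical trouble when the kernel matrix is poorly conditioned.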
22. RESULTS
Train on 138 datapoints from Plückthun group
Essentially all “positive” binding cases
Leave-one-out cross validation for error estimation
Linear: 0.42 standard error (log10 M units)
Gaussian: 1.42, but numerically unstable
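Leave-one-out error estimation for a kernel ridge model could look like the sketch below. It refits the model once per held-out point and reports the root-mean-square of the held-out residuals. Treating that quantity as the "standard error" quoted on the slide is an assumption, as are the function names, the value of `lam`, and the synthetic data, which only stand in for the measured affinities.

```python
import numpy as np

def loo_errors(K, log_y, lam=1.0):
    """Leave-one-out residuals for kernel ridge regression, by explicit refitting."""
    n = len(log_y)
    residuals = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        # Fit on all points except i.
        K_train = K[np.ix_(keep, keep)]
        alpha = np.linalg.solve(K_train + lam * np.eye(n - 1), log_y[keep])
        # Predict the held-out point from its kernel values against the rest.
        residuals[i] = log_y[i] - K[i, keep] @ alpha
    return residuals

def loo_standard_error(K, log_y, lam=1.0):
    """Root-mean-square leave-one-out residual (in log10 units)."""
    return float(np.sqrt(np.mean(loo_errors(K, log_y, lam) ** 2)))

# Synthetic stand-in data (hypothetical): 40 points, 35 features, linear kernel.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 35))
log_y = X @ rng.normal(size=35) + 0.1 * rng.normal(size=40)
K = X @ X.T

se = loo_standard_error(K, log_y, lam=1.0)
```

The same function accepts any precomputed kernel matrix, so the linear and Gaussian kernels can be compared on identical folds.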
26. OUTLOOK: BINDING
Switch from regression to classification
Additional training data from collaborators
In particular, need non-binding examples
More sophisticated classifiers
Numerically stable implementation
Better kernels?
Proactively suggest informative instances for our collaborators to
measure
27. THANKS!
ACGTeam: Maria Anisimova, Manuel Gil, Victor Garcia,
Lorenzo Gatti, Max Maiolo, Simone Ulzega, Erich
Zbinden
Matteo Delucci & Lina Naef (ACLS masters) – TRAL
Elke Schaper – TRAL
Somayeh Danafar – Kernel methods
Andreas Plückthun, Patrick Ernst, Yvonne Stark (UZH) –
Binding data
But wait, there’s more! MMTF format coming next…