1. Trials and tribulations of curating peptide and
antibody ligands for the IUPHAR/BPS Guide to
Pharmacology
Christopher Southan, Joanna L. Sharman, Adam J. Pawson, Simon D.
Harding, Elena Faccenda and Jamie A. Davies, IUPHAR/BPS Guide to Pharmacology,
Discovery Brain Sciences, University of Edinburgh, UK.
ACS Boston 2018, Biologics & Registration Session, Mon Aug 20,
15:50 - 16:15, Harbor Ballroom II
https://www.slideshare.net/cdsouthan
2. Abstract (will not be shown)
As an expert-curated database of approved, clinical or research pharmacological targets mapped to
defined ligands, the IUPHAR/BPS Guide to PHARMACOLOGY (GtoPdb) and its precursor IUPHAR-DB,
have been extracting and annotating bioactive peptides from papers for well over a decade. The current
total has reached 2089 peptides, split between exogenous and endogenous, within the 9144 ligand
entries submitted to PubChem in our 2018.2 database release. More recently, as approved drugs or
clinical candidates we have curated 235 antibodies and a small number of therapeutic nucleotides.
Indexing these entity types in GtoPdb presents challenges similar to those being encountered for the
registration of biologicals as explicitly defined structures. In addition, we target-map the citation-supported
quantitative binding parameters where possible. This presentation will outline these
curatorial challenges and our efforts to at least partially ameliorate the problems. For peptides below
the PubChem CID SMILES limit of approximately 70 residues we have been using Sugar and Splice from
NextMove Software to convert more of our peptide SIDs to join the 6969 CIDs we already have.
However, we are often confounded by authors' equivocal structural specifications with respect to
post-translational modifications and the exact positions of radiolabel incorporation. Nevertheless, we do capture at
least a primary sequence string as an interim compromise that users can hit by BLAST. For reported
receptor-binding endogenous peptides we find some that do not match the Swiss-Prot features for the
precursor protein. PubChem has been encouraging and supporting us in converting more activity-
mapped peptides to CIDs and InChIKeys which should enhance inter-source connectivity. Otherwise,
biological SID data can only be joined by equivocal name matching. Antibodies and other large-
biological SIDs may also currently remain structurally orphaned and present their own challenges.
Notwithstanding, GtoPdb has successfully curated at least primary sequences for the molecular
specification of clinical Mabs. For this we use the IMGT/mAb-DB for approved monoclonals as a first
stop shop since they extract sequences from INN documents. For these and clinical candidates with
code names we also use the patent sequence databases to source a UniParc accession number and can
sometimes get binding data that has not appeared in papers.
3. Outline
• Introducing GtoPdb
• GtoPdb peptide content and stats
• Peptide tribulations
• PubChem peptidic pros and cons
• Getting more peptides > SMILES
• GtoPdb antibody content
• Antibody tribulations
• Stats and examples
• Exploiting PubChem SID tagging
• Where we go from here
• Further information
4. Introducing the IUPHAR/BPS Guide to
PHARMACOLOGY (GtoPdb)
• IUPHAR = International Union of Basic and Clinical Pharmacology, BPS = British
Pharmacological Society
• Formerly known as IUPHAR-DB for receptors and channels since 2003
• Since 2012 funded by the Wellcome Trust to cover all targets in the human genome
• Since 2015 Wellcome Trust “fork” as Guide to IMMUNOPHARMACOLOGY
• Molecular mechanism of action (mmoa) mapping primary & secondary targets
• Release cycle time (with PubChem refreshes) ~ 2 months
• Six well-cited NAR Annual Database issues, latest as PMID 29149325 (2018)
• Distilled into the 2-yearly British Journal of Pharmacology “Concise Guide to
PHARMACOLOGY” as a nine-paper series (see PMID 29055037) with outlinks
• Presents users with selected quality compounds for pharmacology research in
silico, in vitro, in cellulo, in vivo, in clinico
• An ELIXIR UK Node resource since 2016 http://www.guidetopharmacology.org
5. Expert-curated, citation-provenanced,
quantitative binding data
Document > assay > result > compound > location > protein target
D-A-R-C-L-P
Where “C” is not a small molecule, we have ~ 2000 peptides and ~ 250
antibodies included in the ~ 9000 substances we submit to PubChem
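As a minimal sketch of the D-A-R-C-L-P chain above, the six linked entities can be modelled as one record type. The class name, field names and field comments are illustrative assumptions, not the GtoPdb schema, and the slide does not define "location" any further.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class BindingRecord:
    """One curated D-A-R-C-L-P chain (hypothetical field names)."""

    document: str  # D: the cited source, e.g. a PMID
    assay: str     # A: the assay the result came from
    result: str    # R: the quantitative binding value as reported
    compound: str  # C: the ligand (small molecule, peptide or antibody)
    location: str  # L: location, as named on the slide
    protein: str   # P: the protein target


def chain_label(record_type=BindingRecord) -> str:
    """Build the chain abbreviation from the field order of the record."""
    return "-".join(f.name[0].upper() for f in fields(record_type))


if __name__ == "__main__":
    print(chain_label())
```

Keeping the record immutable reflects the idea that each curated relationship is tied to a specific citation and should not be altered without going back to the source.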
10. Tribulations with peptides
• Author specifications may be insufficient for complete molecular definition
• Consequent structural equivocalities slip through the editor/referee net
• Correct IUPAC peptide nomenclature is rare (ad-hoc more common)
• Exact location of radiolabels often not specified
• Absence of purity verification and/or in vivo stability
• Need to surface user-intuitive renderings (but HELM rules OK)
• Poor resolution of peptide name-to-structure (n2s)
• SMILES only copes with up to ~ 70 residues
• Searching patents for corroborative peptide prior-art is much more difficult than
small-molecules
• Literature extraction or author database submissions for bioactive peptides
proportionally lower than small molecules
• Species “zoo” for venom peptides and their names
• Conjugates (peptides + linkers + proteins etc.) even more difficult
• The PIR RESID Database of Protein Modifications is no longer maintained
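The ~ 70-residue SMILES limit in the list above implies a simple routing decision for each peptide. The sketch below is only an illustration of that decision: the function name, the return labels and the exact cut-off of 70 are assumptions (the slide gives the limit as approximate).

```python
# Approximate residue limit quoted on the slide; treated here as a hard cut-off
# purely for illustration.
SMILES_RESIDUE_LIMIT = 70


def representation_for(sequence: str, limit: int = SMILES_RESIDUE_LIMIT) -> str:
    """Return which structural representation a peptide could be given.

    "smiles"        -> short enough to attempt a full SMILES structure
    "sequence-only" -> too long; fall back to a primary sequence string
    """
    n_residues = len(sequence)
    if n_residues == 0:
        raise ValueError("empty sequence")
    return "smiles" if n_residues <= limit else "sequence-only"


if __name__ == "__main__":
    # Synthetic sequences, not real peptides.
    print(representation_for("A" * 21))
    print(representation_for("A" * 120))
```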
11. The classic peptidic triple-whammy
Endothelin-1, CID 91928636, 1470 “Similar Compounds” and top-100 BLAST hits
• Too big to search or cluster by SMILES
• Too small to BLAST cleanly (and sans PTMs)
• Too many species splits for precursors
19. Tribulations with antibody curation
• Getting at least a primary Mab sequence as a molecular definition
• Not all clinical Mab sequences > patents > INN > IMGT-DB
• May get persistent UniParc ID sequence (on a good day)
• Papers often omit in vitro binding data
• Challenging to track press releases back to primary data
• Papers usually don't cite the patents
• But we sometimes get binding data from patents
• The biosimilars are piling in
• No open specification of glycan chains linked to primary sequences
• Some journals publish Mab characterisation with blinded code names
• Considering research reagents with vendor IDs if well provenanced
23. GtoPdb plans
• Continue back-fill of peptides > CIDs using Sugar & Splice (S&S)
• Resolve our sequences against Swiss-Prot x-refs, ChEMBL and GPCRdb
• Continue adding antibody biosimilar cross-pointers
• Consider adding “peptide” as a new SID tag
• For IUPHAR Guide to Immunopharmacology
– Sub-committee feedback on peptides, antibodies, targets and indications
– Continue curation of peptides relevant to immunity and inflammation
• Anticipate curation of new “binder” therapeutics including minibodies,
polyvalents and hybrids
• Keep watching brief on large-molecule InChIKeys
• Belt-and-braces of linking SMILES with compromise (i.e. sans modifications)
FASTA approximations for BLAST indexing and clustering of peptide ligands
• Introduce local HELM rendering
• Revise legacy data model (e.g. introduce a protein ligand classification)
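The "compromise" FASTA approximation in the plans above (a primary sequence sans modifications, for BLAST indexing) can be sketched as follows. This is not the Sugar & Splice workflow; the input convention (hyphen-separated three-letter codes with optional terminal caps), the set of cap tokens and the function name are assumptions made for illustration. Anything outside the 20 standard residues is written as X.

```python
# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# Terminal groups that are dropped rather than encoded (assumed list).
TERMINAL_CAPS = {"H", "OH", "NH2", "Ac"}


def fasta_approximation(name: str, three_letter: str, width: int = 60) -> str:
    """Return a FASTA record that ignores modifications.

    Terminal caps are removed; any non-standard or modified residue token
    becomes "X" so that sequence length and positions are preserved.
    """
    tokens = [t.strip() for t in three_letter.split("-") if t.strip()]
    if tokens and tokens[0] in TERMINAL_CAPS:
        tokens = tokens[1:]
    if tokens and tokens[-1] in TERMINAL_CAPS:
        tokens = tokens[:-1]
    sequence = "".join(THREE_TO_ONE.get(t, "X") for t in tokens)
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([f">{name}"] + lines)


if __name__ == "__main__":
    # Synthetic example; "Xyz" stands in for a modified residue.
    print(fasta_approximation("demo", "Ac-Ala-Gly-Xyz-Lys-NH2"))
```

Tokens that themselves contain hyphens (for example a D-residue prefix) are not handled by this simple split and would need a proper parser.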
24. Acknowledgments, info, COI
https://sites.google.com/view/tw2informatics/home
Conflict of interest (minor): has consulted in the peptide area
Thanks to:
• The NextMove team for S&S support
• Lin Yikai, for her M.Sc. project, “Developing bio/cheminformatics methods
for converting bioactive peptide structures into machine-readable formats”
• Anna Gaulton for ChEMBL FASTA sequences
• Paul Thiessen for PubChem FASTA sequences