Curating Published Bioactive Peptides in Guide to PHARMACOLOGY
1. Tribulations of curating published key bioactive
peptides for the Guide to PHARMACOLOGY
Christopher Southan, Joanna L. Sharman, Adam J. Pawson, Simon D.
Harding, Elena Faccenda and Jamie A. Davies, IUPHAR/BPS Guide to Pharmacology,
Discovery Brain Sciences, University of Edinburgh, UK.
BPS 2018 Molecular and Cellular Pharmacology Oral Communications 1
Tuesday, December 18, 15:00
https://www.slideshare.net/cdsouthan
2. Abstract (will not be shown)
Introduction: The crucial roles of bioactive peptides in pharmacology, drug discovery and chemical biology are well
established. Consequently, the IUPHAR/BPS Guide to PHARMACOLOGY (GtoPdb) and its precursor IUPHAR-DB have
been curating peptide entries for over a decade. While small-molecule chemical structures present curatorial challenges
with which the GtoPdb team has to grapple, these are exacerbated for peptides. Because of their increasing
importance both in endogenous pharmacology (e.g. GPCR ligands) and in the development of new exogenous modified
peptide therapeutics, we undertook a review of our peptide statistics, curation strategies, indexing in PubChem and
enhancement options.
Methods: We assessed our internal peptide statistics for release 2018.3, including our submitted PubChem substance
entries (SIDs), and undertook a retrospective assessment of their tribulations. We also looked at equivocality problems
with searching peptides in PubChem and major sequence sources. To enhance our own curation, we piloted the Sugar
and Splice (S&S) program from NextMove Software to convert more of our medium-sized peptides from sequence
strings, including formally specified post-translational modifications (PTMs), to SMILES molecular representations
that could then merge with PubChem compound entries (CIDs).
Results: The current database includes 786 endogenous and 1310 exogenous peptide entries (n.b. the presentation will
update these stats from the upcoming 2018.4 release). These are nested within our 9345 PubChem SIDs but many have
not formed CIDs. Legacy problems were mostly due to equivocal structural specifications of PTMs and of the exact positions
of radiolabel incorporations. However, our capturing of at least a primary sequence string is a compromise that users
can match by BLAST search. As an example, exploring “Endothelin-1” in PubChem and by NCBI sequence search
exposed major name-to-structure mapping problems and multiple structures, including Swiss-Prot features for the
precursor protein. We assessed the major problem of similarity ascertainment for peptides because they are too large
for chemical clustering but too small for clean sequence searching. We successfully incorporated S&S into our peptide
curation triage and converted many legacy sequences to SMILES strings.
Conclusion: Despite their increasing importance, pharmacology database entries for bioactive peptides are
associated with tribulations that GtoPdb, PubChem and other databases have so far confronted with only partial
success. Many stem from equivocal representations in papers, which render many reported experiments
irreproducible. We urge authors and journal editors to make peptide specifications more precise. In collaboration
with PubChem and NextMove we are improving our peptide curation, including for some of our legacy entries.
3. Outline
• Introducing GtoPdb
• GtoPdb peptide content and stats
• Peptide tribulations
• PubChem peptidic pros and cons
• Getting more peptides > SMILES
• Stats and examples
• Exploiting PubChem SID tagging
• Where we go from here
• Further information
4. Introducing the IUPHAR/BPS Guide to
PHARMACOLOGY (GtoPdb)
• IUPHAR = International Union of Basic and Clinical Pharmacology, BPS = British
Pharmacological Society
• Formerly known as IUPHAR-DB for receptors and channels since 2003
• Since 2012 funded by the Wellcome Trust to cover all targets in the human genome
• Since 2015 Wellcome Trust “fork” as Guide to IMMUNOPHARMACOLOGY
• Molecular mechanism of action (mmoa) mapping primary & secondary targets
• Release cycle time (with PubChem refreshes) ~ 2 months
• Six NAR Annual Database issues, latest as PMID 29149325 (2018)
• Distilled into the 2-yearly British Journal of Pharmacology “Concise Guide to
PHARMACOLOGY” as a nine-paper series (see PMID 29055037) with outlinks
• Presents users with selected quality compounds for pharmacology research in
silico, in vitro, in cellulo, in vivo, in clinico
• An ELIXIR UK Node resource since 2016 http://www.guidetopharmacology.org
5. The GtoPdb hallmark: quantitative binding data
Document > assay > result > compound > location > protein target
D-A-R-C-L-P
Where “C” is not a small molecule, we have ~ 2000 peptides included in
the ~ 9000 substances we submit to PubChem
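The D-A-R-C-L-P chain can be pictured as one record per curated result. The sketch below is only an illustration: the class name, every field name and the example values are hypothetical and are not taken from the GtoPdb schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BindingRecord:
    """One D-A-R-C-L-P chain. All field names are hypothetical."""
    document: str              # D: source reference
    assay: str                 # A: assay description
    result_type: str           # R: kind of quantitative value
    result_value: float        # R: the value itself
    compound: str              # C: ligand name
    compound_is_peptide: bool  # True when "C" is not a small molecule
    location: str              # L: location, as named on the slide
    protein: str               # P: protein target


# Invented placeholder values, for illustration only.
records = [
    BindingRecord("example-doc-1", "example binding assay", "pKi", 9.0,
                  "example peptide", True, "example location", "example target"),
    BindingRecord("example-doc-2", "example binding assay", "pIC50", 7.5,
                  "example small molecule", False, "example location", "example target"),
]

# Select the records whose compound is a peptide.
peptide_records = [r for r in records if r.compound_is_peptide]
print(len(peptide_records), "of", len(records), "records are peptides")
```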
8. GtoPdb peptide stats
• Peptide ligands/all ligands = 22%
• Ligands with quantitative binding data/all ligands = 75%
• Peptides with quantitative binding data/all peptides = 63%
• Peptides with CIDs and quantitative binding data/all peptides with CIDs = 89%
• These are from release 2018.3, so the figures differ slightly in the current 2018.4
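As a rough consistency check, the 22% peptide share can be recomputed from the counts given in the abstract (786 endogenous and 1310 exogenous peptides, 9345 PubChem SIDs). This assumes that "all ligands" corresponds to the SID count, which the slides do not state explicitly.

```python
# Counts quoted in the abstract for release 2018.3.
endogenous_peptides = 786
exogenous_peptides = 1310
total_sids = 9345  # assumption: used here as the "all ligands" denominator

peptides = endogenous_peptides + exogenous_peptides
fraction = peptides / total_sids

print(f"{peptides} peptides / {total_sids} SIDs = {fraction:.1%}")
```

Under that assumption the result rounds to the 22% shown on the slide.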
9. Tribulations with peptides
• Author specifications often insufficient for complete molecular definition
• Consequent structural equivocalities slip through the editor/referee net
• Correct IUPAC peptide nomenclature, esp. for modified residues, is rare (ad hoc
more common)
• Poor resolution of peptide name-to-structure (n2s)
• Exact location of radiolabels often not specified
• Absence of purity verification and/or in vivo stability
• Different graphic rendering styles
• SMILES only for < ~70 residues in PubChem (grey zone of small peptides)
• Literature and patent extraction for database feeds is proportionally lower
than for small molecules
• Searching patents for peptide prior art or analogues is much more difficult
than for small molecules
• Species “zoo” for venom peptides and names
• Conjugates (peptides + linkers + proteins etc.) provide even more tribulations
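Several of the tribulations above amount to missing fields in what a paper reports. The following is a minimal sketch of how a curator might flag them; the `PeptideReport` class, its field names and the example values are all hypothetical, and the 70-residue threshold is the approximate figure from the slide rather than an exact PubChem rule.

```python
from dataclasses import dataclass, field
from typing import Optional

SMILES_RESIDUE_LIMIT = 70  # approximate figure from the slide


@dataclass
class PeptideReport:
    """What a paper states about a peptide. All field names are hypothetical."""
    name: str
    sequence: Optional[str] = None
    # Maps a modification name to its residue position, or None if unstated.
    modifications: dict = field(default_factory=dict)
    radiolabel: Optional[str] = None
    radiolabel_position: Optional[int] = None
    purity_reported: bool = False


def equivocalities(report: PeptideReport) -> list:
    """Return the specification gaps found in a report."""
    issues = []
    if not report.sequence:
        issues.append("no sequence given")
    for mod, position in report.modifications.items():
        if position is None:
            issues.append(f"position of {mod} not specified")
    if report.radiolabel and report.radiolabel_position is None:
        issues.append(f"position of {report.radiolabel} label not specified")
    if not report.purity_reported:
        issues.append("no purity verification")
    if report.sequence and len(report.sequence) >= SMILES_RESIDUE_LIMIT:
        issues.append("likely too long for a SMILES representation in PubChem")
    return issues


# Invented example: a short peptide with an unplaced modification and label.
example = PeptideReport(
    name="example peptide",
    sequence="ACDEFGHIK",
    modifications={"amidation": None},
    radiolabel="125I",
)
for issue in equivocalities(example):
    print(issue)
```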
10. The classic peptidic triple-whammy
Endothelin-1, CID 91928636, 1470 “Similar Compounds” and top-100 BLAST hits
• Too big to search or cluster by SMILES
• Too small to BLAST cleanly (and sans PTMs)
• Too many species splits for precursors
17. GtoPdb plans
• Continue back-fill of peptides > CIDs using Sugar & Splice
• Resolve our sequences against Swiss-Prot x-refs, ChEMBL and GPCRdb
• Consider adding “peptide” as a new SID tag
• For IUPHAR Guide to Immunopharmacology
– Sub-committee feedback on peptides, antibodies, targets and
indications
– Continue curation of peptides relevant to immunity and inflammation
• Anticipate curation of new “binder” therapeutics including minibodies,
polyvalents and hybrids
• Belt-and-braces of linking SMILES with compromise (i.e. sans
modifications) FASTA approximations to facilitate BLAST indexing and
clustering of peptide ligands
• Introduce local HELM rendering
• Revise legacy data model (e.g. introduce a protein ligand classification)
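The "compromise FASTA" idea in the plans above can be sketched as stripping modifications from an annotated sequence so that only searchable residues remain. The inline notation assumed here (modifications in square brackets, terminal caps written as `Ac-` or `-NH2`) is hypothetical, as are the function name and the toy peptide; this is not the format used by GtoPdb or Sugar & Splice.

```python
import re

CANONICAL_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")
N_TERMINAL_CAPS = ("Ac-", "H-")    # assumed notation
C_TERMINAL_CAPS = ("-NH2", "-OH")  # assumed notation


def compromise_fasta(name: str, annotated: str) -> str:
    """Return a FASTA record with modifications and terminal caps removed."""
    # Drop bracketed modification tags such as [Acm] or [125I].
    bare = re.sub(r"\[[^\]]*\]", "", annotated)
    # Drop at most one recognised cap from each terminus.
    for cap in N_TERMINAL_CAPS:
        if bare.startswith(cap):
            bare = bare[len(cap):]
            break
    for cap in C_TERMINAL_CAPS:
        if bare.endswith(cap):
            bare = bare[:-len(cap)]
            break
    # Upper-casing deliberately loses any lower-case distinction in the input.
    bare = bare.upper()
    unknown = set(bare) - CANONICAL_RESIDUES
    if unknown:
        raise ValueError(f"non-canonical characters remain: {sorted(unknown)}")
    return f">{name}\n{bare}"


# Toy peptide, invented for illustration.
print(compromise_fasta("toy_peptide", "Ac-AC[Acm]DEFY[125I]K-NH2"))
```

The stripped sequence is an approximation by design: it loses the modification information, so it complements a SMILES or HELM representation instead of replacing it.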
18. Acknowledgments, info and COI
https://sites.google.com/view/tw2informatics/home
Conflict of interest (minor): has consulted in the peptide area
Thanks to:
• The NextMove Software team for S&S support
• Lin Yikai, for her M.Sc. project “Developing bio/cheminformatics methods for converting bioactive peptide structures into machine-readable formats”
• Paul Thiessen from PubChem for support and FASTA sequences