Opening up and connecting antimalarial data: Progress with caveats
1. www.guidetopharmacology.org
Opening up and connecting antimalarial data:
Progress with caveats
Christopher Southan
ACS CINF session: The Growing Impact of Openness in
Chemistry: A Symposium in Honour of JC Bradley
http://www.slideshare.net/cdsouthan/southan-malaria-acs
2. Abstract
Among JCB's achievements, his work on Open Notebook Science (ONS) has had perhaps the
largest impact, and its ripple effect continues to broaden. This is particularly the case in Open
Source Drug Discovery (OSDD), where ONS is a natural fit. This presentation will review the
“findability” of new antimalarial drug discovery data. While antimalarials are very much a poster
child for OSDD, the patterns of result disclosure and the practical extent of openness vary widely.
This recent blogpost
(http://cdsouthan.blogspot.se/2014/06/getting-into-box-with-some-recent.html) describes “digging
out” 26 antimalarial leads to add to a new MMV pathogen box. The difficulties associated with this
task will be outlined. In particular, examples are still emerging from conventional (i.e. closed) drug
discovery operations, even to the extent of finding patent-only lead compounds. Even for the
academic groups that do publish papers, examples show the system can be slow and patchy in
getting the structures surfaced in database records. This may not happen at all if MeSH curation
fails to link the lead compound to PubChem, in which case manual curation of the paper is
necessary. This slowness contrasts with the Sydney University Open Source Malaria project (OSM,
http://opensourcemalaria.org/) with its declared open source principles. It thus comes closest to
ONS in that the team and their collaborators endeavour to surface results in close to real time.
Technical aspects of extracting the information from open web instantiations will be described,
including the use of SMILES, InChI strings and InChIKeys. The latter comes close to a perfect ONS
vehicle for chemistry since it makes an explicit chemical structure globally “findable” literally within
minutes of being written into a blogpost, via a search taking ~0.3 seconds (PMID 23399051).
Because JCB's ideas still need wider implementation, issues around improving connections
between papers, patents, database entries, OSM data and potential new box inclusions will be
discussed.
3. Introduction
• As we have heard, Jean-Claude Bradley’s (JCB) work on Open
Notebook Science (ONS) was a major innovation
• The core revolutionary philosophy is real-time data surfaced on the open
web via an Electronic Laboratory Notebook (ELN).
• It has been embraced by Open Source Drug Discovery (OSDD, used
here as a generic term not specific to any group)
• The openness is a radical departure from what could be termed
Traditional Closed Drug Discovery (TCDD)
• ONS touches several contemporary themes
• Disclosure of results for others to build on
• Exposure of detailed protocols
• Reproducibility (i.e. warts-and-all sharing of positive and negative results)
• A logical extrapolation of the “open access” publication principle
• Transparency – knowing what different groups are doing globally
• Potential to accelerate discovery research by telescoping timelines
4. Origins of Open Notebook Science from 2005
A 2012 page from the JCB lab run through ChemAxon chemicalize.org
5. Antimalarial research and context
• Research progress for all NTDs is crucial, but antimalarial research has become
something of a poster child for OSDD
• The boundaries between OSDD and TCDD are blurred
• The majority of current leads have still come through the TCDD route (e.g. many are
patented)
• Antimalarial research has become a test bed for new approaches (e.g. open data sets
from GSK and others, the Medicines for Malaria Venture (MMV) “Malaria Box” of
physical compounds, and WIPO Re:Search intellectual property sharing)
• So far, the Sydney Open Source Malaria project is the only ONS instantiation
http://opensourcemalaria.org/#
• For context, I have donated small amounts of voluntary support to the OSM team
since 2012
• This has focused on chemical structure searching, data organisation and
surfacing strategies
• I blog occasionally on the themes of data connectivity in general and for
antimalarial leads in particular
• The surfacing of these leads illustrates “shades of openness”, and the problems
thereof, particularly well
6. Useful recent review of leads - but
• Link-free zone (except for references)
• PDF “tomb”
• Images for structures
• No systematic chemical descriptions
• No chemical database identifiers
• No target protein database identifiers
• DDD107498 was blinded at that time (no structure)
• I decided to address the problem as a community service
7. Consequently, much effort was needed to get from this to this
http://www.ncbi.nlm.nih.gov/sites/myncbi/christopher.southan.1/collections/48460617/public/
8. Getting compounds out of papers into the Pathogen Box: not easy
http://cdsouthan.blogspot.se/2014/06/getting-into-box-with-some-recent.html
On a good day, MeSH curators will index the lead structures specified in
PubMed and connect them to PubChem. On a bad day (as in this case), they
may record the name but without any link to a chemical structure.
9. But a little curatorial perspicacity did resolve DDD107498
IUPAC name from supplementary data > chemicalize.org > PubChem > SureChEMBL > SAR table
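The PubChem hop in a chain like this can be scripted rather than done by hand. The sketch below is a minimal illustration, not the workflow used in the talk: it assumes the public PubChem PUG REST URL pattern for looking up compound identifiers (CIDs) by name or by InChIKey, and the helper names `cid_lookup_url` and `fetch_cids` are hypothetical. An empty result for a code name would correspond to the situation where the name is not a PubChem synonym.

```python
import json
from urllib.error import HTTPError
from urllib.parse import quote
from urllib.request import urlopen

# Base of the PubChem PUG REST service (assumed URL pattern, see lead-in).
PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def cid_lookup_url(namespace, identifier):
    """Build a PUG REST URL asking for the CIDs that match a name or InChIKey."""
    if namespace not in ("name", "inchikey"):
        raise ValueError("namespace must be 'name' or 'inchikey'")
    # Percent-encode everything, since chemical names can contain slashes etc.
    return f"{PUG_REST}/compound/{namespace}/{quote(identifier, safe='')}/cids/JSON"


def fetch_cids(namespace, identifier, timeout=10):
    """Return the list of matching CIDs, or [] when PubChem reports no match.

    Needs network access; a 404 response is treated as 'not found'.
    """
    try:
        with urlopen(cid_lookup_url(namespace, identifier), timeout=timeout) as resp:
            payload = json.load(resp)
    except HTTPError as err:
        if err.code == 404:
            return []
        raise
    return payload.get("IdentifierList", {}).get("CID", [])


# Building the URL needs no network access.
print(cid_lookup_url("name", "DDD107498"))
```

Calling `fetch_cids("name", "DDD107498")` would then show whether the code name currently resolves; no expected result is given here because database content changes over time.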
10. But it was still a tough job to get 28 antimalarial structures
• The 6 structures not in PubChem are de facto unfindable in open databases,
but some may get Google InChIKey matches via the chemicalize.org cache
• The only systematic identifier encountered was the IUPAC name which often
had to be dug out of the supplementary data (i.e. neither SMILES nor InChI in
papers or patents)
• No authors made direct database submissions
• The code name was often not a PubChem synonym
• ChEMBL had picked up 16 with data in PubChem BioAssay
• 13 had patent-extraction matches and 11 chemical vendor matches
• The MeSH annotation had only linked two directly to PMIDs
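To put the counts above on a common scale, the short sketch below turns them into fractions of the 28 curated structures. The only derived number is the PubChem count, taken as 28 minus the 6 structures reported as absent; the labels and the helper name `coverage_table` are illustrative.

```python
TOTAL_LEADS = 28  # structures curated, per the slide title

# Counts as stated in the bullets; "in PubChem" is derived as 28 - 6.
coverage = {
    "in PubChem": TOTAL_LEADS - 6,
    "in ChEMBL (data in PubChem BioAssay)": 16,
    "patent-extraction match": 13,
    "chemical vendor match": 11,
    "MeSH link to a PMID": 2,
}


def coverage_table(counts, total):
    """Return (label, count, percent) rows, percent rounded to one decimal."""
    return [(label, n, round(100 * n / total, 1)) for label, n in counts.items()]


for label, n, pct in coverage_table(coverage, TOTAL_LEADS):
    print(f"{label:<38} {n:>2}/{TOTAL_LEADS}  {pct:5.1f}%")
```

On these figures, roughly four in five structures were in PubChem, while the MeSH route linked well under one in ten directly to a PMID.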
12. The entire portfolio is open: even the new designs
Chemicalize.org does open name-to-structure (n2s) conversion on the web pages
13. Googling the InChIKey for global findability
• Direct from the Open Lab Book sheet
• Or from a chemicalize conversion
• Gives exact match instantly
• Works also with inner layer
• Can cross-check from PubChem <> ELN
• Many directly uploaded > ChEMBL then > PubChem BioAssay
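The two searches described above can be sketched as follows. This is an illustration under stated assumptions: "inner layer" is taken to mean the first 14-character block of a standard InChIKey (the connectivity hash), the search is expressed as an ordinary quoted Google query URL, and the helper name `google_search_urls` is hypothetical. The example key is the well-known InChIKey for aspirin, used only to show the format; it is not one of the antimalarial leads.

```python
import re
from urllib.parse import quote_plus

# Standard InChIKey layout: 14 letters, hyphen, 10 letters, hyphen, 1 letter.
INCHIKEY_RE = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")


def google_search_urls(inchikey):
    """Return exact-phrase search URLs for the full key and its first block."""
    key = inchikey.strip().upper()
    if not INCHIKEY_RE.match(key):
        raise ValueError(f"not a standard InChIKey: {inchikey!r}")
    inner = key.split("-")[0]  # first block: connectivity only
    base = "https://www.google.com/search?q="
    return {
        # Quoting the key asks the search engine for an exact string match.
        "full": base + quote_plus(f'"{key}"'),
        # The first block alone also matches stereoisomers and other variants
        # that share the same connectivity.
        "inner": base + quote_plus(f'"{inner}"'),
    }


urls = google_search_urls("BSYNRYMUTXBXSQ-UHFFFAOYSA-N")
print(urls["full"])
print(urls["inner"])
```

The full-key search is the exact-match case; the first-block search is the looser one, useful when the record on the web differs from the query only in stereochemistry or similar detail.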
17. Extending connectivity to target and pathway mapping
http://www.wikipathways.org/index.php/WikiPathways
18. Conclusions
• Challenges of curating published antimalarial leads were similar to
those encountered by the GtoPdb team for human targets and their
ligands on a daily basis
• This impedes progress in many ways
• Authors spend little effort on ensuring their leads and SAR are
surfaced and connected in databases with a retrievable name
• There are also gaps in reciprocal mappings between leads, targets
and pathways
• Journals should step up efforts towards author chemistry mark up
(Nature Chemical Biology being a good example)
• Authors seem peculiarly reluctant to cite even their own patents
• Compared to TCDD, the way Sydney OSM and their collaborators
work in the open makes a huge difference in the pace of research
• JCB's pioneering work continues to spread out into the open science
community and will extend its impact