www.guidetopharmacology.org
Opening up and connecting antimalarial data:
Progress with caveats
Christopher Southan
ACS CINF session: The Growing Impact of Openness in
Chemistry: A Symposium in Honour of JC Bradley
http://www.slideshare.net/cdsouthan/southan-malaria-acs
Abstract
Among JCB’s achievements, his work on Open Notebook Science (ONS) has had perhaps the
largest impact, and its ripple effect continues to broaden. This is particularly the case in Open
Source Drug Discovery (OSDD), where ONS is a natural fit. This presentation will review the
“findability” of new antimalarial drug discovery data. While antimalarials are very much a poster
child for OSDD, the patterns of result disclosure and the practical extent of openness vary widely.
This recent blogpost
(http://cdsouthan.blogspot.se/2014/06/getting-into-box-with-some-recent.html) describes “digging
out” 26 antimalarial leads to add to a new MMV pathogen box. The difficulties associated with this
task will be outlined. In particular, examples are still emerging from conventional (i.e. closed) drug
discovery operations, even to the extent of finding patent-only lead compounds. Even for the
academic groups that do publish papers, examples show the system can be slow and patchy in
getting the structures surfaced in database records. This may not happen at all if MeSH curation
fails to index the lead compound in PubChem, so curation of the paper is necessary. This slowness
contrasts with the Sydney University Open Source Malaria project (OSM,
http://opensourcemalaria.org/) with its declared open source principles. It thus comes closest to
ONS in that the team and their collaborators endeavour to surface results in close to real time.
Technical aspects of extracting the information from open web instantiations will be described,
including the use of SMILES, InChI strings and InChIKeys. The latter comes close to a perfect ONS
vehicle for chemistry since it makes an explicit chemical structure globally “findable” literally within
minutes of being written into a blogpost, via a search taking ~0.3 seconds (PMID 23399051).
Because JCB’s ideas still need wider implementation, issues around improving connections
between papers, patents, database entries, OSM data and potential new box inclusions will be
discussed.
Introduction
• As we have heard, Jean-Claude Bradley’s (JCB) work on Open
Notebook Science (ONS) was a major innovation
• The core revolutionary philosophy is real-time data surfaced on the open
web via an Electronic Laboratory Notebook (ELN).
• It has been embraced by Open Source Drug Discovery (OSDD, used
here as a generic term, not specific to any group)
• The openness is a radical departure from what could be termed
Traditional Closed Drug Discovery (TCDD)
• ONS touches several contemporary themes
• Disclosure of results for others to build on
• Exposure of detailed protocols
• Reproducibility (i.e. warts-and-all sharing of positive and negative results)
• A logical extrapolation of the “open access” publication principle
• Transparency – knowing what different groups are doing globally
• Potential to accelerate discovery research by telescoping timelines
Origins of Open Notebook Science from 2005
A 2012 page from the JCB lab run through ChemAxon chemicalize.org
Antimalarial research and context
• Research progress for all NTDs is crucial, but antimalarial research has become something
of a poster child for OSDD
• The boundaries between OSDD and TCDD are blurred
• The majority of current leads have still come through the TCDD route (e.g. many are
patented)
• Antimalarial research has become a test bed for new approaches (e.g. open data sets
from GSK and others, the Medicines for Malaria Venture (MMV) “Malaria Box” of
physical compounds, and WIPO Re:Search intellectual property sharing)
• So far, the Sydney Open Source Malaria project is the only ONS instantiation
http://opensourcemalaria.org/#
• For context, I have donated small amounts of voluntary support to the OSM team
since 2012
• This has focused on chemical structure searching, data organisation and
surfacing strategies
• I blog occasionally on the themes of data connectivity in general and for
antimalarial leads in particular
• The surfacing of these leads illustrates “shades of openness”, and the problems
thereof, particularly well
Useful recent review of leads, but:
• Link-free zone (except for references)
• PDF “tomb”
• Images for structures
• No systematic chemical descriptions
• No chemical database identifiers
• No target protein database identifiers
• DDD107498 was blinded at that time (no structure)
• I decided to address the problem as a community service
Consequently, much effort was needed to get from this to this
http://www.ncbi.nlm.nih.gov/sites/myncbi/christopher.southan.1/collections/48460617/public/
Getting compounds out of papers into the Pathogen Box: not easy
http://cdsouthan.blogspot.se/2014/06/getting-into-box-with-some-recent.html
On a good day, MeSH curators will index the lead structures specified in
PubMed and connect them to PubChem. On a bad day (as in this case), they
may record the name but without any link to a chemical structure.
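One way to check whether a paper has been connected to PubChem at all is to ask NCBI E-utilities for PubMed-to-PubChem Compound links. The sketch below only builds the ELink request URL; the function name is hypothetical, and using ELink for this check is an assumption rather than the workflow used in the talk. The URL alone does not show whether any link that comes back originated from MeSH indexing.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities service.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def pubmed_to_pubchem_url(pmid: int) -> str:
    """Build an ELink URL asking for PubChem Compound records linked to a PMID.

    Hypothetical helper for illustration; it performs no network access.
    """
    params = {
        "dbfrom": "pubmed",    # source database: PubMed
        "db": "pccompound",    # target database: PubChem Compound
        "id": str(pmid),       # the PubMed identifier to query
    }
    return f"{EUTILS}/elink.fcgi?{urlencode(params)}"


# PMID 23399051 is one of the references listed at the end of the deck.
print(pubmed_to_pubchem_url(23399051))
```

An empty link set in the response would correspond to the "bad day" case, where the name may be recorded but no structure is attached.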
But a little curatorial perspicacity did resolve DDD107498
IUPAC name from supplementary data > chemicalize.org > PubChem > SureChEMBL > SAR table
But it was still a tough job to get 28 antimalarial structures
• The 6 structures not in PubChem are de facto unfindable in open databases,
but some may get Google InChIKey matches via the chemicalize.org cache
• The only systematic identifier encountered was the IUPAC name, which often
had to be dug out of the supplementary data (i.e. neither SMILES nor InChI appeared in
papers or patents)
• No authors made direct database submissions
• The code name was often not a PubChem synonym
• ChEMBL had picked up 16 with data in PubChem BioAssay
• 13 had patent-extraction matches and 11 chemical vendor matches
• The MeSH annotation had only linked two directly to PMIDs
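Whether a structure is "in PubChem" can also be tested programmatically once an InChIKey is in hand. The sketch below assumes the PubChem PUG REST service is used for the lookup (the talk itself used the web interfaces); the helper names are hypothetical, the example key is aspirin's standard InChIKey chosen only because it is well known, and the two payloads are hand-written illustrations of the response shape, not captured results.

```python
from urllib.parse import quote

# Base address of PubChem's PUG REST service.
PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def inchikey_to_cids_url(inchikey: str) -> str:
    """Build the PUG REST URL that maps an InChIKey to PubChem CIDs."""
    key = inchikey.strip().upper()
    return f"{PUG_REST}/compound/inchikey/{quote(key)}/cids/JSON"


def cids_from_payload(payload: dict) -> list:
    """Extract CIDs from a decoded JSON response; empty list if none."""
    return list(payload.get("IdentifierList", {}).get("CID", []))


def is_findable(payload: dict) -> bool:
    """True when the lookup returned at least one compound record."""
    return bool(cids_from_payload(payload))


# Illustrative only: aspirin's key, and hand-written payload shapes.
print(inchikey_to_cids_url("BSYNRYMUTXBXSQ-UHFFFAOYSA-N"))
hit = {"IdentifierList": {"CID": [2244]}}
miss = {"Fault": {"Code": "PUGREST.NotFound"}}
print(is_findable(hit), is_findable(miss))
```

A structure that comes back empty here is in the same position as the six leads above: it has to be found by other routes, such as a web search on the InChIKey.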
Because OSM practices ONS, finding things is much easier
The entire portfolio is open: even the new designs
Chemicalize.org does open name-to-structure (n2s) conversion on the web pages
Googling the InChIKey for global findability
• Direct from the Open Lab Book sheet
• Or from a chemicalize conversion
• Gives exact match instantly
• Works also with inner layer
• Can cross-check from PubChem <> ELN
• Many directly uploaded > ChEMBL then > PubChem BioAssay
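The two search modes on this slide can be sketched in a few lines. A standard InChIKey has three hyphen-separated blocks of 14, 10 and 1 uppercase letters; the sketch reads "inner layer" as the first 14-character block, which is an assumption on my part. The function names are hypothetical, and the key shown is aspirin's, used purely as a familiar example rather than an OSM compound.

```python
import re
from urllib.parse import quote_plus

# Standard InChIKey layout: 14 letters, hyphen, 10 letters, hyphen, 1 letter.
INCHIKEY_RE = re.compile(r"^([A-Z]{14})-([A-Z]{10})-([A-Z])$")


def split_inchikey(key: str) -> tuple:
    """Return the three blocks of a well-formed InChIKey."""
    match = INCHIKEY_RE.match(key.strip().upper())
    if match is None:
        raise ValueError(f"not a well-formed InChIKey: {key!r}")
    return match.groups()


def search_terms(key: str) -> dict:
    """Quoted search strings for the full key and for its first block."""
    first, second, last = split_inchikey(key)
    full = f"{first}-{second}-{last}"
    return {"exact": f'"{full}"', "inner_layer": f'"{first}"'}


def google_url(term: str) -> str:
    """Turn a search string into a Google query URL."""
    return "https://www.google.com/search?q=" + quote_plus(term)


terms = search_terms("BSYNRYMUTXBXSQ-UHFFFAOYSA-N")
for label, term in terms.items():
    print(label, google_url(term))
```

Searching on the first block alone trades precision for recall: it can surface pages that carry related forms of the same skeleton, while the full key gives the exact match.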
Speed sharing via OSM > Twitter
If PubChem is negative, search the chemicalize.org cache
In this case the similarity search hit other OSM compounds
Rapid triage in PubChem
Identity matches; 90% similarity
chemicalize download > PubChem upload > search
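For readers unfamiliar with what a "90% similarity" cut-off means, the sketch below works through the Tanimoto coefficient that 2D fingerprint similarity searches of this kind are based on. The bit sets are toy values invented for illustration, not real PubChem fingerprints, and the function and compound names are hypothetical.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto coefficient: shared 'on' bits divided by all 'on' bits."""
    union = len(a | b)
    if union == 0:
        return 0.0  # undefined for two empty fingerprints; treated as 0 here
    return len(a & b) / union


def triage(query: set, candidates: dict, threshold: float = 0.90) -> list:
    """Return (name, score) pairs at or above the threshold, best first."""
    hits = []
    for name, fingerprint in candidates.items():
        score = tanimoto(query, fingerprint)
        if score >= threshold:
            hits.append((name, round(score, 3)))
    return sorted(hits, key=lambda hit: -hit[1])


# Toy fingerprints: each integer stands for one 'on' bit position.
query = set(range(20))
candidates = {
    "same_bits": set(range(20)),      # 20 shared / 20 total
    "one_bit_fewer": set(range(19)),  # 19 shared / 20 total
    "half_overlap": set(range(10)),   # 10 shared / 20 total
}
print(triage(query, candidates))
```

A score of 1.0 only says the fingerprints match; confirming that two records are the same structure still needs an identity search or an InChIKey comparison.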
Extending connectivity to target and pathway mapping
http://www.wikipathways.org/index.php/WikiPathways
Conclusions
• Challenges of curating published antimalarial leads were similar to
those encountered by the GtoPdb team for human targets and their
ligands on a daily basis
• This impedes progress in many ways
• Authors spend little effort on ensuring their leads and SAR are
surfaced and connected in databases with a retrievable name
• There are also gaps in reciprocal mappings between leads, targets
and pathways
• Journals should step up efforts towards author chemistry markup
(Nature Chemical Biology being a good example)
• Authors seem peculiarly reluctant to cite even their own patents
• Compared to TCDD, the way Sydney OSM and their collaborators
work in the open makes a huge difference in the pace of research
• JCB’s pioneering work continues to spread out into the open science
community and will extend its impact
Questions welcome
http://www.ncbi.nlm.nih.gov/pubmed/24234439
http://www.ncbi.nlm.nih.gov/pubmed/23618056
http://www.ncbi.nlm.nih.gov/pubmed/23399051
http://cdsouthan.blogspot.com/

The Philosophy of ScienceThe Philosophy of Science
The Philosophy of Science
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdf
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdf
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdf
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptx
 
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
 

Opening up and connecting antimalarial data: Progress with caveats

  • 1. www.guidetopharmacology.org. Opening up and connecting antimalarial data: Progress with caveats. Christopher Southan. ACS CINF session: The Growing Impact of Openness in Chemistry: A Symposium in Honour of JC Bradley. http://www.slideshare.net/cdsouthan/southan-malaria-acs
  • 2. Abstract. Among JCB's achievements, his work on Open Notebook Science (ONS) has had perhaps the largest impact, and its ripple effect continues to broaden. This is particularly the case in Open Source Drug Discovery (OSDD), where ONS is a natural fit. This presentation will review the “findability” of new antimalarial drug discovery data. While antimalarials are very much a poster child for OSDD, the patterns of result disclosure and the practical extent of openness vary widely. This recent blogpost (http://cdsouthan.blogspot.se/2014/06/getting-into-box-with-some-recent.html) describes “digging out” 26 antimalarial leads to add to a new MMV pathogen box. The difficulties associated with this task will be outlined. In particular, examples are still emerging from conventional (i.e. closed) drug discovery operations, even to the extent of finding patent-only lead compounds. Even for the academic groups that do publish papers, examples show the system can be slow and patchy in getting the structures surfaced in database records. This may not happen at all if MeSH curation fails to index the lead compound in PubChem, so curation of the paper is necessary. This slowness contrasts with the Sydney University Open Source Malaria project (OSM, http://opensourcemalaria.org/) with its declared open source principles. It thus comes closest to ONS in that the team and their collaborators endeavour to surface results in close to real time. Technical aspects of extracting the information from open web instantiations will be described, including the use of SMILES, InChI strings and InChIKeys. The InChIKey comes close to a perfect ONS vehicle for chemistry since it makes an explicit chemical structure globally “findable” literally within minutes of being written into a blogpost, via a search taking ~0.3 seconds (PMID 23399051). Because JCB's ideas still need wider implementation, issues around improving connections between papers, patents, database entries, OSM data and potential new box inclusions will be discussed.
  • 3. Introduction
      • As we have heard, Jean-Claude Bradley’s (JCB) work on Open Notebook Science (ONS) was a major innovation
      • The core revolutionary philosophy is real-time data surfaced on the open web via an Electronic Laboratory Notebook (ELN)
      • It has been embraced by Open Source Drug Discovery (OSDD, used here as a generic term not specific to any group)
      • The openness is a radical departure from what could be termed Traditional Closed Drug Discovery (TCDD)
      • ONS touches several contemporary themes
          • Disclosure of results for others to build on
          • Exposure of detailed protocols
          • Reproducibility (i.e. warts-and-all sharing of positive and negative results)
          • A logical extrapolation of the “open access” publication principle
          • Transparency: knowing what different groups are doing globally
          • Potential to accelerate discovery research by telescoping timelines
  • 4. Origins of Open Notebook Science from 2005. A 2012 page from the JCB lab run through ChemAxon chemicalize.org
  • 5. Antimalarial research and context
      • Research progress for all NTDs is crucial, but antimalarials have become something of a poster child for OSDD
      • The boundaries between OSDD and TCDD are blurred
      • The majority of current leads have still come through the TCDD route (e.g. many are patented)
      • Antimalarials have become a test bed for new approaches (e.g. open data sets from GSK and others, the Medicines for Malaria Venture (MMV) “Malaria Box” of physical compounds, and WIPO Re:Search intellectual property sharing)
      • So far, the Sydney Open Source Malaria project is the only ONS instantiation http://opensourcemalaria.org/#
      • For context, I have donated small amounts of voluntary support to the OSM team since 2012
      • This has focused on chemical structure searching, data organisation and surfacing strategies
      • I blog occasionally on the themes of data connectivity in general and for antimalarial leads in particular
      • The surfacing of these leads illustrates “shades of openness”, and the problems thereof, particularly well
  • 6. Useful recent review of leads, but:
      • Link-free zone (except for references)
      • PDF “tomb”
      • Images for structures
      • No systematic chemical descriptions
      • No chemical database identifiers
      • No target protein database identifiers
      • DDD107498 was blinded at that time (no structure)
      • I decided to address the problem as a community service
  • 7. Consequently, much effort was needed to get from this to this: http://www.ncbi.nlm.nih.gov/sites/myncbi/christopher.southan.1/collections/48460617/public/
  • 8. Getting compounds out of papers into the Pathogen Box: not easy. http://cdsouthan.blogspot.se/2014/06/getting-into-box-with-some-recent.html On a good day, MeSH curators will index the lead structures specified in PubMed and connect them to PubChem. On a bad day (as in this case), they may record the name but without any link to a chemical structure.
  • 9. But a little curatorial perspicacity did resolve DDD107498. IUPAC name from supplementary data > chemicalize.org > PubChem > SureChEMBL > SAR table
  • 10. But it was still a tough job to get 28 antimalarial structures
      • The 6 structures not in PubChem are de facto unfindable in open databases, but some may get Google InChIKey matches via the chemicalize.org cache
      • The only systematic identifier encountered was the IUPAC name, which often had to be dug out of the supplementary data (i.e. neither SMILES nor InChI in papers or patents)
      • No authors made direct database submissions
      • The code name was often not a PubChem synonym
      • ChEMBL had picked up 16 with data in PubChem BioAssay
      • 13 had patent-extraction matches and 11 had chemical vendor matches
      • MeSH annotation had linked only two directly to PMIDs
  • 11. Because OSM practices ONS, finding stuff is much easier
  • 12. The entire portfolio is open: even the new designs. Chemicalize.org does open name-to-structure (n2s) conversion on the web pages
  • 13. Googling the InChIKey for global findability
      • Direct from the Open Lab Book sheet
      • Or from a chemicalize conversion
      • Gives an exact match instantly
      • Also works with the inner layer
      • Can cross-check between PubChem and the ELN
      • Many are uploaded directly to ChEMBL and then to PubChem BioAssay
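The InChIKey search on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the workflow used in the talk: it assumes that the "inner layer" means the first 14-character block of the key (the connectivity skeleton), the helper names `split_inchikey` and `google_query_url` are hypothetical, and aspirin's key stands in for an OSM compound.

```python
import re
from urllib.parse import quote_plus

# Standard InChIKey layout: 14 letters, hyphen, 10 letters, hyphen, 1 letter.
INCHIKEY_RE = re.compile(r"^([A-Z]{14})-([A-Z]{10})-([A-Z])$")


def split_inchikey(key):
    """Return the three hyphen-separated blocks of an InChIKey (hypothetical helper)."""
    match = INCHIKEY_RE.match(key.strip().upper())
    if match is None:
        raise ValueError(f"not a well-formed InChIKey: {key!r}")
    return match.group(1), match.group(2), match.group(3)


def google_query_url(term):
    """Build a Google URL that searches for the term as an exact quoted phrase."""
    return "https://www.google.com/search?q=" + quote_plus('"' + term + '"')


# Aspirin, used purely as a stand-in example compound.
EXAMPLE_KEY = "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"

skeleton, _stereo_block, _protonation = split_inchikey(EXAMPLE_KEY)
exact_url = google_query_url(EXAMPLE_KEY)  # exact-structure search
skeleton_url = google_query_url(skeleton)  # looser search on the first block only
```

Searching on the first block alone would be expected to also retrieve records that share the same connectivity but differ in stereochemistry or isotopes, which is one plausible reading of "also works with the inner layer".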
  • 14. Speed sharing via OSM > Twitter
  • 15. If PubChem is negative, then search the chemicalize.org cache. In this case we got similarity hits to other OSM compounds
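The "PubChem first, then fall back" decision can be sketched as below. The slides do not show any code, so this is only an illustration: it assumes the PubChem PUG REST InChIKey-to-CID endpoint, the two JSON bodies are hand-written samples rather than captured responses, and the function names are hypothetical. No network call is made.

```python
import json

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def pubchem_cid_url(inchikey):
    """URL for a PUG REST lookup of compound IDs (CIDs) by InChIKey."""
    return f"{PUG_REST}/compound/inchikey/{inchikey}/cids/JSON"


def cids_from_response(body):
    """Extract the CID list from a PUG REST JSON body; empty list if none."""
    data = json.loads(body)
    return data.get("IdentifierList", {}).get("CID", [])


def next_step(body):
    """Decide where to look next, following the slide's fallback rule."""
    cids = cids_from_response(body)
    if cids:
        return "use PubChem record", cids
    return "search chemicalize.org cache", []


# Hand-written sample bodies (assumed shapes, for illustration only).
SAMPLE_HIT = '{"IdentifierList": {"CID": [2244]}}'
SAMPLE_MISS = '{"Fault": {"Code": "PUGREST.NotFound", "Message": "No CID found"}}'

hit_decision = next_step(SAMPLE_HIT)
miss_decision = next_step(SAMPLE_MISS)
```

In practice the body would come from fetching `pubchem_cid_url(key)`; a miss may also arrive as a non-200 HTTP status, which a real client would need to handle.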
  • 16. Rapid triage in PubChem: identity matches and 90% similarity matches. Workflow: chemicalize download > PubChem upload > search
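The split into identity matches and 90% similarity matches can be illustrated with a Tanimoto score over fingerprint bits. This is a toy sketch under stated assumptions: the fingerprints are made-up sets of integers rather than real PubChem fingerprints, the compound labels are invented, and an identical toy fingerprint is treated as an identity match, whereas PubChem's own identity search compares structures.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient: shared on-bits divided by total distinct on-bits."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def triage(query_bits, library, threshold=0.9):
    """Sort library entries into identity matches and similarity matches."""
    identical, similar = [], []
    for name, bits in library.items():
        score = tanimoto(query_bits, bits)
        if score == 1.0:
            identical.append(name)
        elif score >= threshold:
            similar.append(name)
    return identical, similar


# Toy data: integers stand in for fingerprint bit positions (invented labels).
query = set(range(20))
library = {
    "cmpd_A": set(range(20)),  # same bits as the query, score 1.0
    "cmpd_B": set(range(19)),  # 19 of 20 bits shared, score 0.95
    "cmpd_C": set(range(10)),  # 10 of 20 bits shared, score 0.5
}

identical, similar = triage(query, library)
```

With the toy data above, `cmpd_A` lands in the identity list, `cmpd_B` passes the 0.9 threshold, and `cmpd_C` is dropped.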
  • 17. Extending connectivity to target and pathway mapping. http://www.wikipathways.org/index.php/WikiPathways
  • 18. Conclusions
      • The challenges of curating published antimalarial leads were similar to those encountered by the GtoPdb team for human targets and their ligands on a daily basis
      • This impedes progress in many ways
      • Authors spend little effort on ensuring their leads and SAR are surfaced and connected in databases with a retrievable name
      • There are also gaps in reciprocal mappings between leads, targets and pathways
      • Journals should step up efforts towards author chemistry markup (Nature Chemical Biology being a good example)
      • Authors seem peculiarly reluctant to cite even their own patents
      • Compared to TCDD, the way Sydney OSM and their collaborators work in the open makes a huge difference in the pace of research
      • JCB's pioneering work continues to spread out into the open science community and will extend its impact