Reproducibility in cheminformatics and
computational chemistry research: Certainly
we can do better than this
Gregory Landrum Ph.D.
NIBR IT
Novartis Institutes for BioMedical Research
Basel
GCC 2012 Goslar
Outline
§  Reproducibility?
§  Requirements for reproducibility of published research
§  Practical aspects
Landrum, G. A. & Stiefl, N. Is that a scientific publication or an advertisement?
Reproducibility, source code and data in the computational chemistry literature. Future
Medicinal Chemistry 4, 1885–1887 (2012).
But first!
A new fingerprint for similarity-based virtual
screening
§  Start with Morgan fingerprints (a.k.a. circular fingerprints1)
§  The usual FCFP algorithm uses fairly crude feature definitions
§  Combine the RDKit Morgan fingerprint algorithm with pharmacophoric
features calculated using “better” feature definitions2.
1.  Rogers, D. & Hahn, M. Extended-Connectivity Fingerprints. J. Chem. Inf. Model.
50, 742–754 (2010).
2.  Gobbi, A. & Poppinger, D. Genetic Optimization of Combinatorial Libraries.
Biotechnology and Bioengineering (Combinatorial Chemistry) 61, 47–54 (1998).
"[$([N;!H0;v3,v4&+1]),$([O,S;H1;+0]),n&H1&+0]", // Donor
"[$([O,S;H1;v2;!$(*-*=[O,N,P,S])]),$([O,S;H0;v2]),$([O,S;-]),
$([N;v3;!$(N-*=[O,N,P,S])]),n&H0&+0,
$([o,s;+0;!$([o,s]:n);!$([o,s]:c:n)])]", // Acceptor
"[a]", //Aromatic
"[F,Cl,Br,I]",//Halogen
"[#7;+,$([N;H2&+0][$([C,a]);!$([C,a](=O))]),
$([N;H1&+0]([$([C,a]);!$([C,a](=O))])[$([C,a]);!$([C,a](=O))]),
$([N;H0&+0]([C;!$(C(=O))])([C;!$(C(=O))])[C;!$(C(=O))])]", // Basic
"[$([C,S](=[O,S,P])-[O;H1,-1])]" //Acidic
Validation data
§  Diverse ChEMBL actives for 50 target classes1
§  Data taken from ChEMBL v14
§  Active: reported activity < 10 µM and confidence = 9
§  Diverse: 100 actives picked using RDKit’s implementation of the
MaxMin algorithm2 with radius 0 Morgan fingerprints (ECFP-like)
§  Inactives: 10000 molecules selected from the ZINC druglike set.
Selection criterion: two randomly selected neighbors (similarity via Morgan0
fingerprint>=0.5) for each of the 5000 actives
1.  Heikamp, K. & Bajorath, J. Large-Scale Similarity Search Profiling of ChEMBL
Compound Data Sets. JCIM 51, 1831–1839 (2011).
2.  Ashton, M. et al. Identification of Diverse Database Subsets using Property-Based
and Fragment-Based Molecular Descriptions. QSAR & Combinatorial Science 21,
598–604 (2002).
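The "Diverse" selection above relies on MaxMin picking. A minimal pure-Python sketch of the idea follows. It is not RDKit's implementation: the set-of-bits fingerprints and the names `tanimoto` and `maxmin_pick` are assumptions made for illustration.

```python
import random


def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints stored as sets of on-bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def maxmin_pick(fps, n_pick, seed=0):
    """Greedy MaxMin: keep adding the item whose closest picked neighbor is farthest away."""
    if n_pick > len(fps):
        raise ValueError("cannot pick more items than are available")
    if n_pick == 0:
        return []
    rng = random.Random(seed)
    picked = [rng.randrange(len(fps))]  # random starting point
    # min_dist[i] = distance from item i to its closest already-picked item
    min_dist = [1.0 - tanimoto(fp, fps[picked[0]]) for fp in fps]
    while len(picked) < n_pick:
        candidates = [i for i in range(len(fps)) if i not in picked]
        nxt = max(candidates, key=lambda i: min_dist[i])
        picked.append(nxt)
        for i, fp in enumerate(fps):
            min_dist[i] = min(min_dist[i], 1.0 - tanimoto(fp, fps[nxt]))
    return picked
```

Fixing the seed matters here: the first pick is random, so the seed belongs in the method description if the subset is to be regenerated.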
Validation procedure
§  Repeat 50 times for each data set:
•  Randomly pick 5 actives
•  Mix the remaining 95 actives with the 10K inactives
•  Rank that pool of compounds based on maximum similarity to the 5 actives
•  Calculate performance based on enrichment at 5% of the total dataset size
(10095)
§  Look at average enrichments within each assay
§  Compare the new fingerprint to other standard fingerprints:
MACCS, Morgan6 (bv + counts), Morgan4 (bv + counts), Morgan0 (bv +
counts), Topological Torsions (bv + counts), Atom Pairs (bv + counts), Avalon,
2D Pharmacophore, RDKit, 2 internal fingerprints
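The procedure above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the script used for the talk: fingerprints are plain sets of on-bits, similarity is Tanimoto, and the names `enrichment_factor`, `one_trial` and `mean_enrichment` are hypothetical.

```python
import random


def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints stored as sets of on-bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def enrichment_factor(ranked_labels, fraction):
    """EF = (hit rate in the top fraction) / (hit rate in the whole ranked pool)."""
    n_total = len(ranked_labels)
    n_actives = sum(ranked_labels)
    if n_actives == 0:
        return 0.0
    n_top = max(1, int(round(fraction * n_total)))
    hits = sum(ranked_labels[:n_top])
    return (hits / n_top) / (n_actives / n_total)


def one_trial(actives, inactives, rng, n_query=5, fraction=0.05):
    """Pick query actives, rank the rest by maximum similarity, return the EF."""
    query_idx = set(rng.sample(range(len(actives)), n_query))
    queries = [actives[i] for i in query_idx]
    # Pool = remaining actives (label 1) mixed with all inactives (label 0)
    pool = [(fp, 1) for i, fp in enumerate(actives) if i not in query_idx]
    pool += [(fp, 0) for fp in inactives]
    scored = [(max(tanimoto(fp, q) for q in queries), label) for fp, label in pool]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return enrichment_factor([label for _, label in scored], fraction)


def mean_enrichment(actives, inactives, n_repeats=50, seed=42):
    """Average the EF over repeated random choices of the query actives."""
    rng = random.Random(seed)
    trials = [one_trial(actives, inactives, rng) for _ in range(n_repeats)]
    return sum(trials) / n_repeats
```

Note that the number of actives in the pool is counted from the labels rather than passed in by hand.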
Results
The new fingerprint is the best for 29 of the 50 datasets
[Plot; labels: FeatureMorgan2, Morgan0]
Back to the talk…
§  Reproducibility?
§  Requirements for reproducibility of published research
§  Practical aspects
Landrum, G. A. & Stiefl, N. Is that a scientific publication or an advertisement?
Reproducibility, source code and data in the computational chemistry literature. Future
Medicinal Chemistry 4, 1885–1887 (2012).
Reproducibility
Scientific publications have at least two goals: (i) to announce a result and (ii)
to convince readers that the result is correct. Mathematics papers are
expected to contain a proof complete enough to allow knowledgeable
readers to fill in any details. Papers in experimental science should describe
the results and provide a clear enough protocol to allow successful repetition
and extension.
Mesirov, J. P. Accessible Reproducible Research. Science 327,
415–416 (2010).
Reproducibility
An author’s central obligation is to present an accurate and complete account
of the research performed, absolutely avoiding deception, including the data
collected or used, as well as an objective discussion of the significance of the
research. Data are defined as information collected or used in generating
research conclusions. The research report and the data collected should
contain sufficient detail and reference to public sources of information to permit
a trained professional to reproduce the experimental observations.
ACS “Ethical Guidelines to Publication of Chemical Research”
Reproducibility
With these thoughts in mind, the editors of journals published by the American
Chemical Society now present a set of ethical guidelines for persons engaged
in the publication of chemical research, specifically, for editors, authors, and
manuscript reviewers. These guidelines are offered not in the sense that there
is any immediate crisis in ethical behavior, but rather from a conviction that the
observance of high ethical standards is so vital to the whole scientific enterprise
that a definition of those standards should be brought to the attention of all
concerned.
We believe that most of the guidelines now offered are already understood and
subscribed to by the majority of experienced research chemists. They may,
however, be of substantial help to those who are relatively new to research.
Even well-established scientists may appreciate an opportunity to review
matters so significant to the practice of science.
ACS “Ethical Guidelines to Publication of Chemical Research”
Reproducibility
Experimental reproducibility is the coin of the scientific realm. The extent to
which measurements or observations agree when performed by different
individuals defines this important tenet of the scientific method. The formal
essence of experimental reproducibility was born of the philosophy of logical
positivism or logical empiricism, which purports to gain knowledge of the world
through the use of formal logic linked to observation. A key principle of logical
positivism is verificationism, which holds that every truth is verifiable by
experience. In this rational context, truth is defined by reproducible experience,
and unbiased scientific observation and determinism are its underpinnings.
…
The assumption that objectively true scientific observations must be reproducible
is implicit, yet direct tests of reproducibility are rarely found in the published
literature. This lack of published evidence of reproducibility stems from the
limited appeal of studies reproducing earlier work to most funding bodies and to
most editors. Furthermore, many readers of scientific journals— especially of
higher-impact journals—assume that if a study is of sufficient quality to pass the
scrutiny of rigorous reviewers, it must be true; this assumption is based on the
inferred equivalence of reproducibility and truth described above.
Loscalzo, J. Irreproducible Experimental Results: Causes,
(Mis)interpretations, and Consequences. Circulation 125, 1211–1214 (2012).
If it’s not reproducible science?
“Let me show you some cool pictures from my lab…”
Requirements for Reproducibility
§  Data used
§  Code/algorithm description
§  Results
Peng, R. D. Reproducible Research in Computational Science.
Science 334, 1226–1227 (2011).
Requirements for Reproducibility:
Data
§  This is a no-brainer, right?
§  Unless it’s completely unprocessed (or the processing is part of the
detailed method description/code), it’s better to include the actual data
§  For sources like ChEMBL, a version number and SQL to grab the data
are probably adequate
§  “Ligands from PDB structures X, Y, and Z” is probably not good enough
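One way to make "a version number and SQL" concrete is to store them next to a checksum of the extracted data. The sketch below is an assumption-laden illustration: `build_manifest` and the manifest keys are hypothetical, and the query uses placeholder table and column names, not the real ChEMBL schema.

```python
import hashlib
import json


def build_manifest(source, version, query, data_bytes):
    """Bundle the provenance needed to re-fetch and verify a data extract."""
    return {
        "source": source,
        "version": version,
        "query": " ".join(query.split()),  # collapse whitespace for stable diffs
        "sha256": hashlib.sha256(data_bytes).hexdigest(),
        "n_bytes": len(data_bytes),
    }


# Placeholder query: table and column names are hypothetical, not the real schema.
QUERY = """
    SELECT compound_id, smiles, activity_value
    FROM activity_table
    WHERE activity_value < 10000 AND confidence = 9
"""

# Stand-in for the bytes of the exported data file.
extract = b"compound_id,smiles,activity_value\nC1,CCO,120\n"
manifest = build_manifest("ChEMBL", "14", QUERY, extract)
print(json.dumps(manifest, indent=2, sort_keys=True))
```

A reader who reruns the query against the same release can compare checksums to confirm they are working from the same data.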
Requirements for Reproducibility:
Data
As a condition of publication, authors must agree to make available all data
necessary to understand and assess the conclusions of the manuscript to
any reader of Science. Data must be included in the body of the paper or in
the supplementary materials, where they can be viewed free of charge by all
visitors to the site. Certain types of data must be deposited in an approved
online database, including DNA and protein sequences, microarray data,
crystal structures, and climate records.
http://www.sciencemag.org/site/feature/contribinfo/faq/
index.xhtml#data_faq
Requirements for Reproducibility:
Data
§  What about chemical structures?
•  a table with drawings of molecules?
•  names instead of structures?
§  Why not include the structures in a machine-readable format?
This expanded use of electronic resources offers an excellent opportunity to make chemical
information more accessible and user-friendly to readers of scientific papers.
To take advantage of these opportunities, we have developed several online features that expand
the usefulness of chemical compound information for Nature Chemical Biology readers … In all
original research papers, compounds that are relevant to the background or results of the paper
are assigned a bolded, Arabic numeral that serves as a unique identifier for the compound. Each
numerical abbreviation in the HTML and PDF versions of the article is linked to a Compound Data
page, which shows the structure and the IUPAC or common name of the chemical compound.
From there, readers can download a ChemDraw file of the compound…To provide readers with
rapid access to all of the chemical compounds discussed in an article, we feature a Compound
Data Index page, which is accessible from the Compound Data page, the table of contents entry
for the paper, and the navigation tools on the right side of the Nature Chemical Biology website.
http://www.nature.com/nchembio/journal/v3/n6/full/nchembio0607-297.htm
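A low-tech form of "machine-readable" is a tab-separated file of compound identifiers and SMILES. The sketch below is only an illustration: the compound numbers are made up, and `write_structures` and `read_structures` are hypothetical helper names.

```python
import csv
import io

# Illustrative records: the compound numbers are made up for this example.
COMPOUNDS = [
    {"compound": "1", "name": "aspirin", "smiles": "CC(=O)Oc1ccccc1C(=O)O"},
    {"compound": "2", "name": "caffeine", "smiles": "Cn1cnc2c1c(=O)n(C)c(=O)n2C"},
]

FIELDS = ["compound", "name", "smiles"]


def write_structures(records, handle):
    """Write one compound per line as tab-separated text with a header row."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)


def read_structures(handle):
    """Read the table back into a list of dictionaries."""
    return [dict(row) for row in csv.DictReader(handle, delimiter="\t")]


# Round trip through an in-memory file to show nothing is lost.
buffer = io.StringIO()
write_structures(COMPOUNDS, buffer)
buffer.seek(0)
round_tripped = read_structures(buffer)
```

Unlike a table of drawings, a file like this can be fed directly to a cheminformatics toolkit.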
Requirements for Reproducibility:
Chemical Data
From Nature Chemical Biology
Requirements for Reproducibility:
Code
Data and materials availability All data necessary to understand,
assess, and extend the conclusions of the manuscript must be available
to any reader of Science. All computer codes involved in the creation or
analysis of data must also be available to any reader of Science. After
publication, all reasonable requests for data and materials must be
fulfilled. Any restrictions on the availability of data, codes, or materials,
including fees and original data obtained from other sources (Materials
Transfer Agreements), must be disclosed to the editors upon submission.
http://www.sciencemag.org/site/feature/contribinfo/prep/
gen_info.xhtml#dataavail
Requirements for Reproducibility:
Code
An inherent principle of publication is that others should be able to
replicate and build upon the authors' published claims. Therefore, a
condition of publication in a Nature journal is that authors are required to
make materials, data and associated protocols promptly available to
readers without undue qualifications. Any restrictions on the availability of
materials or information must be disclosed to the editors at the time of
submission. Any restrictions must also be disclosed in the submitted
manuscript, including details of how readers can obtain materials and
information. If materials are to be distributed by a for-profit company, this
must be stated in the paper.
http://www.nature.com/authors/policies/availability.html
In the meantime, researchers must, when they are arranging the
commercialization of their work, bear in mind the implications that these
deals may have on their freedom to publish to the standards that the
community is entitled to expect.
http://www.nature.com/nature/journal/v442/
n7098/full/442001a.html
Requirements for Reproducibility:
Code
Ince, D. C., Hatton, L. & Graham-Cumming, J. The case for open
computer programs. Nature 482, 485–488 (2012).
We argue that, with some exceptions, anything less
than the release of source programs is intolerable for
results that depend on computation. The vagaries of
hardware, software and natural language will always
ensure that exact reproducibility remains uncertain, but
withholding code increases the chances that efforts to
reproduce results will fail.
Requirements for Reproducibility:
Code
§  “Black box” code sharing: installing the software on a publicly
accessible server, or providing executables for people to test
§  Does this help with reproducibility?
§  Not cut and dried. Needs discussion.
Requirements for Reproducibility:
Results
§  Including the actual results is even more of a no-brainer, right?
Homology Models of Human All-Trans Retinoic Acid Metabolizing Enzymes
CYP26B1 and CYP26B1 Spliced Variant
Homology models of CYP26B1 (cytochrome P450RAI2) and CYP26B1 spliced variant were
derived using the crystal structure of cyanobacterial CYP120A1 as template for the model building.
The quality of the homology models generated were carefully evaluated, and the natural substrate
all-trans-retinoic acid (atRA), several tetralone-derived retinoic acid metabolizing blocking agents
(RAMBAs), and a well-known potent inhibitor of CYP26B1 (R115866) were docked into the
homology model of full-length cytochrome P450 26B1. The results show that in the model of the
full-length CYP26B1, the protein is capable of distinguishing between the natural substrate (atRA),
R115866, and the tetralone derivatives. The spliced variant of CYP26B1 model displays a reduced
affinity for atRA compared to the full-length enzyme, in accordance with recently described
experimental information.
This paper, presenting two new homology models, does not
include either model.
Unfortunately, I didn’t have to search long to find this example.
How are we doing?
§  Survey of recent publications:
•  Everything in JCIM vol 52 #10
•  Everything in JCAMD vol 26 #10
•  Journal of Cheminformatics from July 2012 to Nov 4, 2012
§  Big differences between journals
§  Plenty of room for improvement
Journal    Type of paper   Count   Full Data   Partial Data   Missing Data   Code?
JCIM       Method          13      6           3              4              1
JCIM       Non-method      16      10          3              3              0
JCAMD      Method          3       3           0              0              0
JCAMD      Non-method      4       0           3              1              0
JChemInf   Method          12      7           3              3              8
JChemInf   Non-method      3       0           0              0              0
Practical considerations
§  Where to put the data and code?
•  Supplementary material
•  Code-sharing sites (SourceForge, Google Code, GitHub)
•  Figshare
§  Considerations:
•  It needs to still be there 10+ years from now
•  Having a solid connection to the original paper is good
Tools for reproducible research
KNIME
§  Open-source workflow tool
§  Strong data manipulation and mining capabilities
§  Data and results can be stored with the workflow.
Tools for reproducible research
IPython notebook
§  Python session running in a browser
•  Tab completion
•  Access to docstrings
§  Text formatting options available for including discussion or capturing
mathematics
§  Captures all data transformations and displays output
§  Tight integration with matplotlib
Back to the earlier interruption
§  Data? YES
§  Solid description of method? YES
§  Code? NO
Still ok, though, right?
Ooops
§  I had a typo in the script where I calculated EF_5 for the new
fingerprint:
ef_5 = calcEnrichment(rankedSims,nActivesTotal=80)
(nActivesTotal should be 95)
§  Fixing that yields:
[Plot; labels: FeatureMorgan2, Morgan0]
The new fingerprint is no
better than the others.
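The `nActivesTotal=80` typo is a hard-coded count that disagreed with the data. One way to make that class of error harder is to derive the count from the labels and to reject a forced value that does not match. The `calcEnrichment` function from the talk was not shown, so `calc_enrichment` below is a hypothetical stand-in with an assumed formula.

```python
def calc_enrichment(ranked_labels, fraction=0.05, n_actives_total=None):
    """Enrichment factor; counts actives from the data unless a count is forced.

    ranked_labels: 1 for active, 0 for inactive, best-scoring compound first.
    """
    n_total = len(ranked_labels)
    counted = sum(ranked_labels)
    if n_actives_total is None:
        n_actives_total = counted
    elif n_actives_total != counted:
        # A mismatch means the caller's number and the data disagree.
        raise ValueError(
            f"n_actives_total={n_actives_total} but the ranked list "
            f"contains {counted} actives"
        )
    if n_actives_total == 0:
        return 0.0
    n_top = max(1, int(round(fraction * n_total)))
    hits = sum(ranked_labels[:n_top])
    return (hits / n_top) / (n_actives_total / n_total)


# 95 actives ranked ahead of 10000 inactives: the best possible ordering.
ranked = [1] * 95 + [0] * 10000
ef_5 = calc_enrichment(ranked)
```

Under this assumed formula the count appears only in the denominator, so entering 80 where 95 is correct would inflate the result by 95/80, roughly 19%. Whether the original script behaved exactly this way is not known from the slides.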
Requirements for Reproducibility
§  Data used
§  Code/algorithm description
§  Results
Perhaps the biggest barrier to reproducible
research is the lack of a deeply ingrained
culture that simply requires reproducibility for
all scientific claims.
Peng, R. D. Reproducible Research in Computational Science.
Science 334, 1226–1227 (2011).
Acknowledgements
§  NIBR:
•  Nik Stiefl (GDC/CADD)
•  Nikolas Fechner (NIBR IT/IS Sigma)
•  Sereina Riniker (NIBR IT/IS Sigma)
§  Matthias Rarey

Weitere ähnliche Inhalte

Was ist angesagt?

Is one enough? Data warehousing for biomedical research
Is one enough? Data warehousing for biomedical researchIs one enough? Data warehousing for biomedical research
Is one enough? Data warehousing for biomedical researchGreg Landrum
 
Gene Ontology Enrichment Network Analysis -Tutorial
Gene Ontology Enrichment Network Analysis -TutorialGene Ontology Enrichment Network Analysis -Tutorial
Gene Ontology Enrichment Network Analysis -TutorialDmitry Grapov
 
Drug Discovery and Development Using AI
Drug Discovery and Development Using AIDrug Discovery and Development Using AI
Drug Discovery and Development Using AIDatabricks
 
Knowledge graph applications for cosmetics industry
Knowledge graph applications for cosmetics industryKnowledge graph applications for cosmetics industry
Knowledge graph applications for cosmetics industryAnton Yuryev
 
Mining 'Bigger' Datasets to Create, Validate and Share Machine Learning Models
Mining 'Bigger' Datasets to Create, Validate and Share Machine Learning ModelsMining 'Bigger' Datasets to Create, Validate and Share Machine Learning Models
Mining 'Bigger' Datasets to Create, Validate and Share Machine Learning ModelsSean Ekins
 
Drug Repurposing using Deep Learning on Knowledge Graphs
Drug Repurposing using Deep Learning on Knowledge GraphsDrug Repurposing using Deep Learning on Knowledge Graphs
Drug Repurposing using Deep Learning on Knowledge GraphsDatabricks
 
Reaxys rmc unified platform_ webinar_
Reaxys rmc unified platform_ webinar_Reaxys rmc unified platform_ webinar_
Reaxys rmc unified platform_ webinar_Ann-Marie Roche
 
Deep Learning on nVidia GPUs for QSAR, QSPR and QNAR predictions
Deep Learning on nVidia GPUs for QSAR, QSPR and QNAR predictionsDeep Learning on nVidia GPUs for QSAR, QSPR and QNAR predictions
Deep Learning on nVidia GPUs for QSAR, QSPR and QNAR predictionsValery Tkachenko
 
Medicinal Chemistry Due Diligence: Computational Predictions of an expert’s e...
Medicinal Chemistry Due Diligence: Computational Predictions of an expert’s e...Medicinal Chemistry Due Diligence: Computational Predictions of an expert’s e...
Medicinal Chemistry Due Diligence: Computational Predictions of an expert’s e...Sean Ekins
 
Aiding Computer Aided Drug Design
Aiding Computer Aided Drug DesignAiding Computer Aided Drug Design
Aiding Computer Aided Drug DesignShahir Shamsir
 
Why are we still doing industrial age drug
Why are we still doing industrial age drugWhy are we still doing industrial age drug
Why are we still doing industrial age drugSean Ekins
 
Reproducible research: First steps.
Reproducible research: First steps. Reproducible research: First steps.
Reproducible research: First steps. Richard Layton
 
2015 balti-and-bioinformatics
2015 balti-and-bioinformatics2015 balti-and-bioinformatics
2015 balti-and-bioinformaticsc.titus.brown
 
Resolving cryptic needles to molecular structures: The GtoPdb experience
Resolving cryptic needles to molecular structures: The GtoPdb experienceResolving cryptic needles to molecular structures: The GtoPdb experience
Resolving cryptic needles to molecular structures: The GtoPdb experienceChris Southan
 
Multi-omics methods and resources for Bioconductor
Multi-omics methods and resources for BioconductorMulti-omics methods and resources for Bioconductor
Multi-omics methods and resources for BioconductorLevi Waldron
 
II-SDV 2015, 20 - 21 April, in Nice
II-SDV 2015, 20 - 21 April, in NiceII-SDV 2015, 20 - 21 April, in Nice
II-SDV 2015, 20 - 21 April, in NiceDr. Haxel Consult
 
CINF 29: Visualization and manipulation of Matched Molecular Series for decis...
CINF 29: Visualization and manipulation of Matched Molecular Series for decis...CINF 29: Visualization and manipulation of Matched Molecular Series for decis...
CINF 29: Visualization and manipulation of Matched Molecular Series for decis...NextMove Software
 

Was ist angesagt? (20)

Is one enough? Data warehousing for biomedical research
Is one enough? Data warehousing for biomedical researchIs one enough? Data warehousing for biomedical research
Is one enough? Data warehousing for biomedical research
 
Gene Ontology Enrichment Network Analysis -Tutorial
Gene Ontology Enrichment Network Analysis -TutorialGene Ontology Enrichment Network Analysis -Tutorial
Gene Ontology Enrichment Network Analysis -Tutorial
 
Drug Discovery and Development Using AI
Drug Discovery and Development Using AIDrug Discovery and Development Using AI
Drug Discovery and Development Using AI
 
Knowledge graph applications for cosmetics industry
Knowledge graph applications for cosmetics industryKnowledge graph applications for cosmetics industry
Knowledge graph applications for cosmetics industry
 
Mining 'Bigger' Datasets to Create, Validate and Share Machine Learning Models
Mining 'Bigger' Datasets to Create, Validate and Share Machine Learning ModelsMining 'Bigger' Datasets to Create, Validate and Share Machine Learning Models
Mining 'Bigger' Datasets to Create, Validate and Share Machine Learning Models
 
Drug Repurposing using Deep Learning on Knowledge Graphs
Drug Repurposing using Deep Learning on Knowledge GraphsDrug Repurposing using Deep Learning on Knowledge Graphs
Drug Repurposing using Deep Learning on Knowledge Graphs
 
AXP302
AXP302AXP302
AXP302
 
Reaxys rmc unified platform_ webinar_
Reaxys rmc unified platform_ webinar_Reaxys rmc unified platform_ webinar_
Reaxys rmc unified platform_ webinar_
 
Deep Learning on nVidia GPUs for QSAR, QSPR and QNAR predictions
Deep Learning on nVidia GPUs for QSAR, QSPR and QNAR predictionsDeep Learning on nVidia GPUs for QSAR, QSPR and QNAR predictions
Deep Learning on nVidia GPUs for QSAR, QSPR and QNAR predictions
 
An examination of data quality on QSAR Modeling in regards to the environment...
An examination of data quality on QSAR Modeling in regards to the environment...An examination of data quality on QSAR Modeling in regards to the environment...
An examination of data quality on QSAR Modeling in regards to the environment...
 
Medicinal Chemistry Due Diligence: Computational Predictions of an expert’s e...
Medicinal Chemistry Due Diligence: Computational Predictions of an expert’s e...Medicinal Chemistry Due Diligence: Computational Predictions of an expert’s e...
Medicinal Chemistry Due Diligence: Computational Predictions of an expert’s e...
 
Aiding Computer Aided Drug Design
Aiding Computer Aided Drug DesignAiding Computer Aided Drug Design
Aiding Computer Aided Drug Design
 
Why are we still doing industrial age drug
Why are we still doing industrial age drugWhy are we still doing industrial age drug
Why are we still doing industrial age drug
 
Reproducible research: First steps.
Reproducible research: First steps. Reproducible research: First steps.
Reproducible research: First steps.
 
2015 balti-and-bioinformatics
2015 balti-and-bioinformatics2015 balti-and-bioinformatics
2015 balti-and-bioinformatics
 
Resolving cryptic needles to molecular structures: The GtoPdb experience
Resolving cryptic needles to molecular structures: The GtoPdb experienceResolving cryptic needles to molecular structures: The GtoPdb experience
Resolving cryptic needles to molecular structures: The GtoPdb experience
 
Multi-omics methods and resources for Bioconductor
Multi-omics methods and resources for BioconductorMulti-omics methods and resources for Bioconductor
Multi-omics methods and resources for Bioconductor
 
II-SDV 2015, 20 - 21 April, in Nice
II-SDV 2015, 20 - 21 April, in NiceII-SDV 2015, 20 - 21 April, in Nice
II-SDV 2015, 20 - 21 April, in Nice
 
CINF 29: Visualization and manipulation of Matched Molecular Series for decis...
CINF 29: Visualization and manipulation of Matched Molecular Series for decis...CINF 29: Visualization and manipulation of Matched Molecular Series for decis...
CINF 29: Visualization and manipulation of Matched Molecular Series for decis...
 
OpenTox Europe 2013
OpenTox Europe 2013OpenTox Europe 2013
OpenTox Europe 2013
 

Andere mochten auch

CSS Media Queries (WordCamp 2010)
CSS Media Queries (WordCamp 2010)CSS Media Queries (WordCamp 2010)
CSS Media Queries (WordCamp 2010)Michael Jendryschik
 
Machine learning in the life sciences with knime
Machine learning in the life sciences with knimeMachine learning in the life sciences with knime
Machine learning in the life sciences with knimeGreg Landrum
 
Open-source from/in the enterprise: the RDKit
Open-source from/in the enterprise: the RDKitOpen-source from/in the enterprise: the RDKit
Open-source from/in the enterprise: the RDKitGreg Landrum
 
How To Treat A Bleeding Wound From A Popped Pimple
How To Treat A Bleeding Wound From A Popped PimpleHow To Treat A Bleeding Wound From A Popped Pimple
How To Treat A Bleeding Wound From A Popped PimpleM. David Cole, MD
 
Scottish Public Opinion Monitor: Gordon's Fightback
Scottish Public Opinion Monitor: Gordon's FightbackScottish Public Opinion Monitor: Gordon's Fightback
Scottish Public Opinion Monitor: Gordon's FightbackIpsos UK
 
Presentación sobre desarrollo de nuevos negocios con Grupo Supernova y Cedice...
Presentación sobre desarrollo de nuevos negocios con Grupo Supernova y Cedice...Presentación sobre desarrollo de nuevos negocios con Grupo Supernova y Cedice...
Presentación sobre desarrollo de nuevos negocios con Grupo Supernova y Cedice...Alejandro Bermudez
 
What About Semantics? - Stefan Gradmann, WWW2012, Lyon, France
What About Semantics? - Stefan Gradmann, WWW2012, Lyon, FranceWhat About Semantics? - Stefan Gradmann, WWW2012, Lyon, France
What About Semantics? - Stefan Gradmann, WWW2012, Lyon, FranceDigitised Manuscripts to Europeana
 
Congreso eucarístico vale neme y ferchu albañil
Congreso eucarístico vale neme y ferchu albañilCongreso eucarístico vale neme y ferchu albañil
Congreso eucarístico vale neme y ferchu albañilferchualba
 
Inclusión laboral. empleo y discapacidad.
Inclusión laboral. empleo y discapacidad.Inclusión laboral. empleo y discapacidad.
Inclusión laboral. empleo y discapacidad.José María
 
Las políticas sobre discapacidad en el sistema universitario español.
Las políticas sobre discapacidad en el sistema universitario español.Las políticas sobre discapacidad en el sistema universitario español.
Las políticas sobre discapacidad en el sistema universitario español.José María
 
TREAT YOUR COMPUTER NECK
TREAT YOUR COMPUTER NECKTREAT YOUR COMPUTER NECK
TREAT YOUR COMPUTER NECKEason Chan
 
Atividades de Natal adaptadas em spc
Atividades de Natal adaptadas em spcAtividades de Natal adaptadas em spc
Atividades de Natal adaptadas em spcMadalena Charruadas
 
Introducción a la computadora - Parte I
Introducción a la computadora - Parte IIntroducción a la computadora - Parte I
Introducción a la computadora - Parte IManuel Otero
 
Johan Stuve, presentación de servicios
Johan Stuve, presentación de serviciosJohan Stuve, presentación de servicios
Johan Stuve, presentación de serviciosJohan Stuve
 
Upliftment of man based on character
Upliftment of man based on characterUpliftment of man based on character
Upliftment of man based on characterjasvirsandhu
 
3 d pie chart circular with hole in center 10 stages powerpoint presentation ...
3 d pie chart circular with hole in center 10 stages powerpoint presentation ...3 d pie chart circular with hole in center 10 stages powerpoint presentation ...
3 d pie chart circular with hole in center 10 stages powerpoint presentation ...SlideTeam.net
 
Introduction To Ant1
Introduction To  Ant1Introduction To  Ant1
Introduction To Ant1Rajesh Kumar
 

Andere mochten auch (20)

CSS Media Queries (WordCamp 2010)
CSS Media Queries (WordCamp 2010)CSS Media Queries (WordCamp 2010)
CSS Media Queries (WordCamp 2010)
 
Machine learning in the life sciences with knime
Machine learning in the life sciences with knimeMachine learning in the life sciences with knime
Machine learning in the life sciences with knime
 
Open-source from/in the enterprise: the RDKit
Open-source from/in the enterprise: the RDKitOpen-source from/in the enterprise: the RDKit
Open-source from/in the enterprise: the RDKit
 
Getting started
Getting startedGetting started
Getting started
 
How To Treat A Bleeding Wound From A Popped Pimple
How To Treat A Bleeding Wound From A Popped PimpleHow To Treat A Bleeding Wound From A Popped Pimple
How To Treat A Bleeding Wound From A Popped Pimple
 
Scottish Public Opinion Monitor: Gordon's Fightback
Scottish Public Opinion Monitor: Gordon's FightbackScottish Public Opinion Monitor: Gordon's Fightback
Scottish Public Opinion Monitor: Gordon's Fightback
 
Presentación sobre desarrollo de nuevos negocios con Grupo Supernova y Cedice...
Presentación sobre desarrollo de nuevos negocios con Grupo Supernova y Cedice...Presentación sobre desarrollo de nuevos negocios con Grupo Supernova y Cedice...
Presentación sobre desarrollo de nuevos negocios con Grupo Supernova y Cedice...
 
What About Semantics? - Stefan Gradmann, WWW2012, Lyon, France
What About Semantics? - Stefan Gradmann, WWW2012, Lyon, FranceWhat About Semantics? - Stefan Gradmann, WWW2012, Lyon, France
What About Semantics? - Stefan Gradmann, WWW2012, Lyon, France
 
Congreso eucarístico vale neme y ferchu albañil
Congreso eucarístico vale neme y ferchu albañilCongreso eucarístico vale neme y ferchu albañil
Congreso eucarístico vale neme y ferchu albañil
 
Inclusión laboral. empleo y discapacidad.
Inclusión laboral. empleo y discapacidad.Inclusión laboral. empleo y discapacidad.
Inclusión laboral. empleo y discapacidad.
 
Las políticas sobre discapacidad en el sistema universitario español.
Las políticas sobre discapacidad en el sistema universitario español.Las políticas sobre discapacidad en el sistema universitario español.
Las políticas sobre discapacidad en el sistema universitario español.
 
TREAT YOUR COMPUTER NECK
TREAT YOUR COMPUTER NECKTREAT YOUR COMPUTER NECK
TREAT YOUR COMPUTER NECK
 
Test ppt
Test pptTest ppt
Test ppt
 
Atividades de Natal adaptadas em spc
Atividades de Natal adaptadas em spcAtividades de Natal adaptadas em spc
Atividades de Natal adaptadas em spc
 
Introducción a la computadora - Parte I
Introducción a la computadora - Parte IIntroducción a la computadora - Parte I
Introducción a la computadora - Parte I
 
Johan Stuve, presentación de servicios
Johan Stuve, presentación de serviciosJohan Stuve, presentación de servicios
Johan Stuve, presentación de servicios
 
Upliftment of man based on character
Upliftment of man based on characterUpliftment of man based on character
Upliftment of man based on character
 
Folha de setembro
Folha de setembroFolha de setembro
Folha de setembro
 
3 d pie chart circular with hole in center 10 stages powerpoint presentation ...
3 d pie chart circular with hole in center 10 stages powerpoint presentation ...3 d pie chart circular with hole in center 10 stages powerpoint presentation ...
3 d pie chart circular with hole in center 10 stages powerpoint presentation ...
 
Introduction To Ant1
Introduction To  Ant1Introduction To  Ant1
Introduction To Ant1
 

Reproducibility in cheminformatics and computational chemistry research: certainly we can do better than this

  • 1. Reproducibility in cheminformatics and computational chemistry research: Certainly we can do better than this Gregory Landrum Ph.D. NIBR IT Novartis Institutes for BioMedical Research Basel GCC 2012 Goslar
  • 2. Outline §  Reproducibility? §  Requirements for reproducibility of published research §  Practical aspects Landrum, G. A. & Stiefl, N. Is that a scientific publication or an advertisement? Reproducibility, source code and data in the computational chemistry literature. Future Medicinal Chemistry 4, 1885–1887 (2012).
  • 4. A new fingerprint for similarity-based virtual screening §  Start with Morgan fingerprints (a.k.a. circular fingerprints [1]) §  The usual FCFP algorithm uses fairly crude feature definitions §  Combine the RDKit Morgan fingerprint algorithm with pharmacophoric features calculated using “better” feature definitions [2]. [1] Rogers, D. & Hahn, M. Extended-Connectivity Fingerprints. J. Chem. Inf. Model. 50, 742–754 (2010). [2] Gobbi, A. & Poppinger, D. Genetic Optimization of Combinatorial Libraries. Biotechnology and Bioengineering (Combinatorial Chemistry) 61, 47–54 (1998).
    "[$([N;!H0;v3,v4&+1]),$([O,S;H1;+0]),n&H1&+0]", // Donor
    "[$([O,S;H1;v2;!$(*-*=[O,N,P,S])]),$([O,S;H0;v2]),$([O,S;-]),$([N;v3;!$(N-*=[O,N,P,S])]),n&H0&+0,$([o,s;+0;!$([o,s]:n);!$([o,s]:c:n)])]", // Acceptor
    "[a]", // Aromatic
    "[F,Cl,Br,I]", // Halogen
    "[#7;+,$([N;H2&+0][$([C,a]);!$([C,a](=O))]),$([N;H1&+0]([$([C,a]);!$([C,a](=O))])[$([C,a]);!$([C,a](=O))]),$([N;H0&+0]([C;!$(C(=O))])([C;!$(C(=O))])[C;!$(C(=O))])]", // Basic
    "[$([C,S](=[O,S,P])-[O;H1,-1])]" // Acidic
  • 5. Validation data §  Diverse ChEMBL actives for 50 target classes [1] §  Data taken from ChEMBL v14 §  Active: reported activity < 10 µM and confidence = 9 §  Diverse: 100 actives picked using RDKit’s implementation of the MaxMin algorithm [2] with radius 0 Morgan fingerprints (ECFP-like) §  Inactives: 10000 molecules selected from the ZINC druglike set. Selection criterion: two randomly selected neighbors (similarity via Morgan0 fingerprint >= 0.5) for each of the 5000 actives [1] Heikamp, K. & Bajorath, J. Large-Scale Similarity Search Profiling of ChEMBL Compound Data Sets. JCIM 51, 1831–1839 (2011). [2] Ashton, M. et al. Identification of Diverse Database Subsets using Property-Based and Fragment-Based Molecular Descriptions. QSAR & Combinatorial Science 21, 598–604 (2002).
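The diversity selection on this slide can be illustrated with a small pure-Python sketch of greedy MaxMin picking. This is not RDKit's implementation: fingerprints are represented here as plain sets of on-bits standing in for radius 0 Morgan fingerprints, and the names `tanimoto_distance` and `maxmin_pick` are hypothetical helpers introduced only for illustration.

```python
import random


def tanimoto_distance(a, b):
    """1 - Tanimoto similarity for fingerprints stored as sets of on-bits."""
    union = len(a | b)
    if union == 0:
        return 0.0  # two empty fingerprints are treated as identical
    return 1.0 - len(a & b) / union


def maxmin_pick(fingerprints, n_pick, seed=0):
    """Greedy MaxMin: each new pick maximizes its minimum distance to the picks so far.

    Returns the indices of the picked fingerprints, in pick order.
    The random first pick is an assumption; other seeding strategies are possible.
    """
    if n_pick > len(fingerprints):
        raise ValueError("cannot pick more items than are available")
    if n_pick == 0:
        return []
    rng = random.Random(seed)
    first = rng.randrange(len(fingerprints))
    picked = [first]
    picked_set = {first}
    # min_dist[i] is the distance from item i to its nearest already-picked item
    min_dist = [tanimoto_distance(fp, fingerprints[first]) for fp in fingerprints]
    while len(picked) < n_pick:
        candidates = [i for i in range(len(fingerprints)) if i not in picked_set]
        nxt = max(candidates, key=min_dist.__getitem__)
        picked.append(nxt)
        picked_set.add(nxt)
        for i, fp in enumerate(fingerprints):
            min_dist[i] = min(min_dist[i], tanimoto_distance(fp, fingerprints[nxt]))
    return picked


if __name__ == "__main__":
    toy = [{1, 2, 3}, {1, 2, 3, 4}, {1, 2, 4}, {7, 8, 9}, {20, 21}]
    print(maxmin_pick(toy, 3))
```

Fixing the seed matters for reproducibility: the first pick is random, so the selected subset can change between runs unless the seed is reported along with the method.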
  • 6. Validation procedure §  Repeat 50 times for each data set: •  Randomly pick 5 actives •  Mix the remaining 95 actives with the 10K inactives •  Rank that pool of compounds based on maximum similarity to the 5 actives •  Calculate performance based on enrichment at 5% of the total dataset size (10095) §  Look at average enrichments within each assay §  Compare the new fingerprint to other standard fingerprints: MACCS, Morgan6 (bv + counts), Morgan4 (bv + counts), Morgan0 (bv + counts), Topological Torsions (bv + counts), Atom Pairs (bv + counts), Avalon, 2D Pharmacophore, RDKit, and 2 internal fingerprints
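The procedure on this slide can be sketched in pure Python as follows. This is a minimal illustration under stated assumptions, not the code used for the talk: fingerprints are plain sets of on-bits, similarity is Tanimoto, the pool is shuffled before ranking so that ties are not resolved in favor of actives, and all function names are hypothetical.

```python
import random


def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints stored as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def enrichment_factor(ranked_labels, fraction=0.05):
    """Hit rate in the top `fraction` of a ranked list divided by the overall hit rate."""
    n_top = max(1, int(round(fraction * len(ranked_labels))))
    total_actives = sum(ranked_labels)
    if total_actives == 0:
        return 0.0
    top_actives = sum(ranked_labels[:n_top])
    return (top_actives / n_top) / (total_actives / len(ranked_labels))


def run_trial(actives, inactives, n_query=5, fraction=0.05, rng=random):
    """One repetition: pick query actives, rank everything else by max similarity."""
    query_idx = set(rng.sample(range(len(actives)), n_query))
    queries = [fp for i, fp in enumerate(actives) if i in query_idx]
    # label 1 = active, 0 = inactive
    pool = [(fp, 1) for i, fp in enumerate(actives) if i not in query_idx]
    pool += [(fp, 0) for fp in inactives]
    rng.shuffle(pool)  # avoid a systematic tie-breaking bias in the stable sort
    scored = [(max(tanimoto(fp, q) for q in queries), label) for fp, label in pool]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return enrichment_factor([label for _, label in scored], fraction)


def mean_enrichment(actives, inactives, n_repeats=50, seed=0):
    """Average enrichment over repeated random query selections."""
    rng = random.Random(seed)
    trials = [run_trial(actives, inactives, rng=rng) for _ in range(n_repeats)]
    return sum(trials) / len(trials)
```

Details such as the random seed, the tie-breaking rule, and whether the 5% cutoff is computed on the pool or on the full data set all affect the reported numbers, which is why they belong in a reproducible description.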
  • 7. Results The new fingerprint is the best for 29 of the 50 datasets [Plot labels: FeatureMorgan2, Morgan0]
  • 8. Back to the talk… §  Reproducibility? §  Requirements for reproducibility of published research §  Practical aspects Landrum, G. A. & Stiefl, N. Is that a scientific publication or an advertisement? Reproducibility, source code and data in the computational chemistry literature. Future Medicinal Chemistry 4, 1885–1887 (2012).
  • 9. Reproducibility Scientific publications have at least two goals: (i) to announce a result and (ii) to convince readers that the result is correct. Mathematics papers are expected to contain a proof complete enough to allow knowledgeable readers to fill in any details. Papers in experimental science should describe the results and provide a clear enough protocol to allow successful repetition and extension. Mesirov, J. P. Accessible Reproducible Research. Science 327, 415–416 (2010).
  • 10. Reproducibility An author’s central obligation is to present an accurate and complete account of the research performed, absolutely avoiding deception, including the data collected or used, as well as an objective discussion of the significance of the research. Data are defined as information collected or used in generating research conclusions. The research report and the data collected should contain sufficient detail and reference to public sources of information to permit a trained professional to reproduce the experimental observations. ACS “Ethical Guidelines to Publication of Chemical Research”
  • 11. Reproducibility With these thoughts in mind, the editors of journals published by the American Chemical Society now present a set of ethical guidelines for persons engaged in the publication of chemical research, specifically, for editors, authors, and manuscript reviewers. These guidelines are offered not in the sense that there is any immediate crisis in ethical behavior, but rather from a conviction that the observance of high ethical standards is so vital to the whole scientific enterprise that a definition of those standards should be brought to the attention of all concerned. We believe that most of the guidelines now offered are already understood and subscribed to by the majority of experienced research chemists. They may, however, be of substantial help to those who are relatively new to research. Even well-established scientists may appreciate an opportunity to review matters so significant to the practice of science. ACS “Ethical Guidelines to Publication of Chemical Research”
  • 12. Reproducibility Experimental reproducibility is the coin of the scientific realm. The extent to which measurements or observations agree when performed by different individuals defines this important tenet of the scientific method. The formal essence of experimental reproducibility was born of the philosophy of logical positivism or logical empiricism, which purports to gain knowledge of the world through the use of formal logic linked to observation. A key principle of logical positivism is verificationism, which holds that every truth is verifiable by experience. In this rational context, truth is defined by reproducible experience, and unbiased scientific observation and determinism are its underpinnings. … The assumption that objectively true scientific observations must be reproducible is implicit, yet direct tests of reproducibility are rarely found in the published literature. This lack of published evidence of reproducibility stems from the limited appeal of studies reproducing earlier work to most funding bodies and to most editors. Furthermore, many readers of scientific journals— especially of higher-impact journals—assume that if a study is of sufficient quality to pass the scrutiny of rigorous reviewers, it must be true; this assumption is based on the inferred equivalence of reproducibility and truth described above. Loscalzo, J. Irreproducible Experimental Results: Causes, (Mis)interpretations, and Consequences. Circulation 125, 1211–1214 (2012).
  • 13. If it’s not reproducible science? “Let me show you some cool pictures from my lab…”
  • 14. Requirements for Reproducibility §  Data used §  Code/algorithm description §  Results Peng, R. D. Reproducible Research in Computational Science. Science 334, 1226–1227 (2011).
  • 15. Requirements for Reproducibility: Data §  This is a no brainer, right? §  Unless it’s completely unprocessed (or the processing is part of the detailed method description/code), it’s better to include the actual data §  For sources like ChEMBL, a version number and SQL to grab the data are probably adequate §  “Ligands from PDB structures X, Y, and Z” probably not good enough
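As a rough illustration of the "version number and SQL" point on this slide, the sketch below keeps the database version and the exact query together with the extracted rows, plus a checksum of the result. The table and column names (`activities`, `molecule_id`, `value_um`, `confidence`), the version string, and the in-memory SQLite database are hypothetical stand-ins and not the real ChEMBL schema.

```python
import hashlib
import json
import sqlite3

# Hypothetical stand-in for a versioned database release (NOT the real ChEMBL schema).
DB_VERSION = "example-db v14"
QUERY = (
    "SELECT molecule_id, value_um FROM activities "
    "WHERE value_um < 10 AND confidence = 9 ORDER BY molecule_id"
)


def build_toy_db():
    """Create a tiny in-memory database so the example is self-contained."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE activities (molecule_id TEXT, value_um REAL, confidence INTEGER)"
    )
    conn.executemany(
        "INSERT INTO activities VALUES (?, ?, ?)",
        [("mol-1", 0.5, 9), ("mol-2", 25.0, 9), ("mol-3", 2.0, 8), ("mol-4", 7.5, 9)],
    )
    return conn


def extract_with_provenance(conn):
    """Run the query and return the rows together with what is needed to rerun it."""
    rows = conn.execute(QUERY).fetchall()
    payload = json.dumps(rows).encode("utf-8")
    return {
        "db_version": DB_VERSION,
        "sql": QUERY,
        "n_rows": len(rows),
        "sha256": hashlib.sha256(payload).hexdigest(),
        "rows": rows,
    }


record = extract_with_provenance(build_toy_db())
print(json.dumps({k: record[k] for k in ("db_version", "sql", "n_rows", "sha256")}, indent=2))
```

Publishing a record like this alongside a paper would let a reader rerun the same query against the same release and compare checksums.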
  • 16. Requirements for Reproducibility: Data As a condition of publication, authors must agree to make available all data necessary to understand and assess the conclusions of the manuscript to any reader of Science. Data must be included in the body of the paper or in the supplementary materials, where they can be viewed free of charge by all visitors to the site. Certain types of data must be deposited in an approved online database, including DNA and protein sequences, microarray data, crystal structures, and climate records. http://www.sciencemag.org/site/feature/contribinfo/faq/index.xhtml#data_faq
  • 17. Requirements for Reproducibility: Data §  What about chemical structures? •  a table with drawings of molecules? •  names instead of structures? §  Why not include the structures in a machine-readable format? This expanded use of electronic resources offers an excellent opportunity to make chemical information more accessible and user-friendly to readers of scientific papers. To take advantage of these opportunities, we have developed several online features that expand the usefulness of chemical compound information for Nature Chemical Biology readers … In all original research papers, compounds that are relevant to the background or results of the paper are assigned a bolded, Arabic numeral that serves as a unique identifier for the compound. Each numerical abbreviation in the HTML and PDF versions of the article is linked to a Compound Data page, which shows the structure and the IUPAC or common name of the chemical compound. From there, readers can download a ChemDraw file of the compound…To provide readers with rapid access to all of the chemical compounds discussed in an article, we feature a Compound Data Index page, which is accessible from the Compound Data page, the table of contents entry for the paper, and the navigation tools on the right side of the Nature Chemical Biology website. http://www.nature.com/nchembio/journal/v3/n6/full/nchembio0607-297.htm
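One minimal way to ship structures in a machine-readable form, sketched here with only the Python standard library, is a small CSV that pairs each compound number from the paper with a name and a SMILES string. The two compounds and the column names are illustrative assumptions, not taken from any of the papers discussed.

```python
import csv
import io

# Illustrative compounds only; the IDs mimic the bold compound numbers used in a paper.
COMPOUNDS = [
    {"compound_id": "1", "name": "aspirin", "smiles": "CC(=O)Oc1ccccc1C(=O)O"},
    {"compound_id": "2", "name": "caffeine", "smiles": "Cn1cnc2c1c(=O)n(C)c(=O)n2C"},
]
FIELDS = ["compound_id", "name", "smiles"]


def compounds_to_csv(compounds):
    """Serialize the compound table to CSV text suitable for supplementary material."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(compounds)
    return buffer.getvalue()


def csv_to_compounds(text):
    """Read the table back, as a reader of the paper would."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


csv_text = compounds_to_csv(COMPOUNDS)
print(csv_text)
```

A reader can load such a file directly into a cheminformatics toolkit instead of redrawing structures from a figure.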
  • 18. Requirements for Reproducibility: Chemical Data From Nature Chemical Biology
  • 19. Requirements for Reproducibility: Code Data and materials availability All data necessary to understand, assess, and extend the conclusions of the manuscript must be available to any reader of Science. All computer codes involved in the creation or analysis of data must also be available to any reader of Science. After publication, all reasonable requests for data and materials must be fulfilled. Any restrictions on the availability of data, codes, or materials, including fees and original data obtained from other sources (Materials Transfer Agreements), must be disclosed to the editors upon submission. http://www.sciencemag.org/site/feature/contribinfo/prep/gen_info.xhtml#dataavail
  • 20. Requirements for Reproducibility: Code An inherent principle of publication is that others should be able to replicate and build upon the authors' published claims. Therefore, a condition of publication in a Nature journal is that authors are required to make materials, data and associated protocols promptly available to readers without undue qualifications. Any restrictions on the availability of materials or information must be disclosed to the editors at the time of submission. Any restrictions must also be disclosed in the submitted manuscript, including details of how readers can obtain materials and information. If materials are to be distributed by a for-profit company, this must be stated in the paper. http://www.nature.com/authors/policies/availability.html In the meantime, researchers must, when they are arranging the commercialization of their work, bear in mind the implications that these deals may have on their freedom to publish to the standards that the community is entitled to expect. http://www.nature.com/nature/journal/v442/n7098/full/442001a.html
  • 21. Requirements for Reproducibility: Code Ince, D. C., Hatton, L. & Graham-Cumming, J. The case for open computer programs. Nature 482, 485–488 (2012). We argue that, with some exceptions, anything less than the release of source programs is intolerable for results that depend on computation. The vagaries of hardware, software and natural language will always ensure that exact reproducibility remains uncertain, but withholding code increases the chances that efforts to reproduce results will fail.
  • 22. Requirements for Reproducibility: Code §  “Black box” code sharing: installing the software on a publicly accessible server, or providing executables for people to test §  Does this help with reproducibility? §  Not cut and dried. Needs discussion
  • 23. Requirements for Reproducibility: Results §  Including the actual results is even more of a no brainer, right? Homology Models of Human All-Trans Retinoic Acid Metabolizing Enzymes CYP26B1 and CYP26B1 Spliced Variant Homology models of CYP26B1 (cytochrome P450RAI2) and CYP26B1 spliced variant were derived using the crystal structure of cyanobacterial CYP120A1 as template for the model building. The quality of the homology models generated were carefully evaluated, and the natural substrate all-trans-retinoic acid (atRA), several tetralone-derived retinoic acid metabolizing blocking agents (RAMBAs), and a well-known potent inhibitor of CYP26B1 (R115866) were docked into the homology model of full-length cytochrome P450 26B1. The results show that in the model of the full-length CYP26B1, the protein is capable of distinguishing between the natural substrate (atRA), R115866, and the tetralone derivatives. The spliced variant of CYP26B1 model displays a reduced affinity for atRA compared to the full-length enzyme, in accordance with recently described experimental information. This paper, presenting two new homology models, does not include either model. Unfortunately I didn’t have to search long to find this example
  • 24. How are we doing? §  Survey of recent publications: •  Everything in JCIM vol 52 #10 •  Everything in JCAMD vol 26 #10 •  Journal of Cheminformatics from July 2012-Nov 4 2012 §  Big differences between journals §  Plenty of room for improvement
    Journal  | Type of paper | Count | Full Data | Partial Data | Missing Data | Code?
    JCIM     | Method        | 13    | 6         | 3            | 4            | 1
    JCIM     | Non-method    | 16    | 10        | 3            | 3            | 0
    JCAMD    | Method        | 3     | 3         | 0            | 0            | 0
    JCAMD    | Non-method    | 4     | 0         | 3            | 1            | 0
    JChemInf | Method        | 12    | 7         | 3            | 3            | 8
    JChemInf | Non-method    | 3     | 0         | 0            | 0            | 0
  • 25. Practical considerations §  Where to put the data and code? •  Supplementary material •  Code-sharing sites (sourceforge.net, Google Code, GitHub) •  Figshare §  Considerations: •  It needs to still be there 10+ years from now •  Having a solid connection to the original paper is good
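Wherever the files end up, one hedged way to keep a verifiable link between a deposit and the paper is a small manifest that lists each file with a checksum next to the paper's reference. The sketch below is only an illustration: the file names, the placeholder paper reference, and the manifest layout are all assumptions rather than any repository's required format.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of(path):
    """Checksum a file in chunks so large data sets do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(directory, paper_reference):
    """List every file under `directory` with its checksum, tied to the paper."""
    directory = Path(directory)
    files = sorted(p for p in directory.rglob("*") if p.is_file())
    return {
        "paper": paper_reference,
        "files": {p.relative_to(directory).as_posix(): sha256_of(p) for p in files},
    }


# Toy deposit with hypothetical file names, created in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "actives.csv").write_bytes(b"compound_id,smiles\n1,CCO\n")
    (Path(tmp) / "analysis.py").write_bytes(b"print('hello')\n")
    manifest = build_manifest(tmp, "placeholder: citation or DOI of the paper")

print(json.dumps(manifest, indent=2))
```

Anyone who later retrieves the files, from whatever host, can recompute the checksums and confirm they match what was described in the paper.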
  • 26. Tools for reproducible research KNIME §  Open-source workflow tool §  Strong data manipulation and mining capabilities §  Data and results can be stored with the workflow.
  • 27. Tools for reproducible research IPython notebook §  Python session running in a browser •  Tab completion •  Access to docstrings §  Text formatting options available for including discussion or capturing mathematics §  Captures all data transformations and displays output §  Tight integration with matplotlib
  • 28. Tools for reproducible research IPython notebook
  • 29. Tools for reproducible research IPython notebook
  • 30. Tools for reproducible research IPython notebook
  • 31. Tools for reproducible research IPython notebook
  • 32. Back to the earlier interruption §  Data? YES §  Solid description of method? YES §  Code? NO Still ok, though, right?
  • 33. Ooops §  I had a typo in the script where I calculated EF_5 for the new fingerprint: ef_5 = calcEnrichment(rankedSims,nActivesTotal=80) (nActivesTotal should be 95) §  Fixing that yields: [Plot comparing FeatureMorgan2 and Morgan0] The new fingerprint is no better than the others.
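To make concrete how one wrong constant of this kind can inflate a result, here is a minimal sketch of an enrichment-factor calculation. It assumes the common definition (hit rate in the top fraction of the ranked list divided by the overall fraction of actives). The function `enrichment_factor`, its parameters, and the toy ranking are hypothetical; this is not the `calcEnrichment` function from the talk, whose implementation is not shown.

```python
def enrichment_factor(ranked_is_active, fraction=0.05, n_actives_total=None):
    """Enrichment factor at the given fraction of a ranked list.

    ranked_is_active: booleans ordered from most to least similar.
    n_actives_total: total number of actives; counted from the list if omitted.
    """
    n_total = len(ranked_is_active)
    if n_total == 0:
        raise ValueError("ranked list is empty")
    if n_actives_total is None:
        n_actives_total = sum(ranked_is_active)
    if n_actives_total <= 0:
        raise ValueError("need at least one active")
    n_top = max(1, int(round(fraction * n_total)))
    hits = sum(ranked_is_active[:n_top])
    # Hit rate in the selected top slice relative to the hit rate expected at random.
    return (hits / n_top) / (n_actives_total / n_total)


# Toy ranking (made-up numbers): 1000 molecules, 95 actives, 20 of them in the top 5%.
ranking = [True] * 20 + [False] * 30 + [True] * 75 + [False] * 875

ef_wrong = enrichment_factor(ranking, 0.05, n_actives_total=80)  # hard-coded, too low
ef_right = enrichment_factor(ranking, 0.05)  # actives counted from the data
print(ef_wrong, ef_right)
```

In this toy case the understated active count inflates the enrichment factor. Deriving the count from the data instead of hard-coding it removes that failure mode, and publishing the script lets a reader spot such a slip.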
  • 34. Requirements for Reproducibility §  Data used §  Code/algorithm description §  Results
  • 35. Perhaps the biggest barrier to reproducible research is the lack of a deeply ingrained culture that simply requires reproducibility for all scientific claims. Peng, R. D. Reproducible Research in Computational Science. Science 334, 1226–1227 (2011).
  • 36. Acknowledgements §  NIBR: •  Nik Stiefl (GDC/CADD) •  Nikolas Fechner (NIBR IT/IS Sigma) •  Sereina Riniker (NIBR IT/IS Sigma) §  Matthias Rarey