Alasdair J G Gray
A.J.G.Gray@hw.ac.uk
www.macs.hw.ac.uk/~ajg33
@gray_alasdair
Using a Jupyter Notebook to perform a reproducible scientific analysis over semantic web sources
Reproducibility Crisis
9 October 2018 www.macs.hw.ac.uk/SWeL – @hw_swel 2
Images from: https://www.nature.com/news/1-500-scientists-lift-the-lid-on-reproducibility-1.19970
Computation Notebook
• Literate programming: combines
– Analysis: computation environment
– Narrative: explanatory text
• Cross-discipline take-up:
– Astronomy
– Biology
– Oceanography
• Gravitational Waves
– http://mybinder.org/repo/losc-tutorial/LOSC_Event_tutorial
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005425
Aim
Use a computation notebook to:
1. Perform an analysis over Semantic Web resources
– Reproduce an analysis performed through website
– Exploit recent Guide to Pharmacology RDF data publication and other Linked Open Data endpoints
2. Publish the analysis for ease of reproducibility
3. Embed semantics into the notebook
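A minimal sketch of what the first aim might look like in a notebook cell: build a SPARQL query that counts the instances of a class and encode it as a GET request in the way the SPARQL 1.1 Protocol describes. The endpoint URL, the class IRI and both function names are assumptions made for illustration; they are not taken from the talk or from any of the datasets it uses.

```python
from urllib.parse import urlencode

# Hypothetical endpoint and class IRI, for illustration only.
EXAMPLE_ENDPOINT = "https://example.org/sparql"
EXAMPLE_CLASS = "http://example.org/vocab/Compound"


def build_count_query(class_iri: str) -> str:
    """Return a SPARQL query that counts distinct instances of class_iri."""
    return (
        "SELECT (COUNT(DISTINCT ?compound) AS ?count)\n"
        "WHERE { ?compound a <" + class_iri + "> }"
    )


def build_request_url(endpoint: str, query: str) -> str:
    """Encode the query in the 'query' parameter of a GET request."""
    return endpoint + "?" + urlencode({"query": query})


query = build_count_query(EXAMPLE_CLASS)
url = build_request_url(EXAMPLE_ENDPOINT, query)
print(query)
print(url)
```

Running one such query per dataset endpoint would give per-dataset compound counts; the real class IRIs differ for each dataset and would need to be looked up in its documentation.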
Pharmacology Analysis to Reproduce
• Using PubChem
• Compound count in several datasets
– ChEBI
– ChEMBL
– DrugBank
– GtP
• Intersection of compounds across datasets
– Results reproduced 15 March 2018
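The two quantities on this slide, a count per dataset and the intersection across datasets, can be sketched with plain Python sets. This assumes each dataset's compounds have already been reduced to a comparable key such as an InChIKey; the keys below are made-up placeholders, not real identifiers, and the sketch is not the notebook's actual code.

```python
# Toy identifier sets. The values are invented and stand in for
# structure-based keys exported from each dataset.
datasets = {
    "ChEBI": {"KEY-A", "KEY-B", "KEY-C"},
    "ChEMBL": {"KEY-A", "KEY-B", "KEY-D"},
    "DrugBank": {"KEY-A", "KEY-B"},
    "GtP": {"KEY-A", "KEY-E"},
}

# Compound count per dataset.
counts = {name: len(keys) for name, keys in datasets.items()}

# Compounds present in every dataset.
shared = set.intersection(*datasets.values())

print(counts)
print(sorted(shared))
```

The intersection is only meaningful if every dataset writes its keys the same way, so any normalisation has to happen before the sets are built.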
Developed Jupyter Notebook
Analysis Results
Dataset                PubChem       SPARQL        SPARQL        SPARQL        PubChem
                       (2018-03-15)  (2018-06-08)  (2018-07-24)  (2018-10-01)  (2018-10-01)
ChEBI                  91,407        184,393       90,510        90,510        92,367
ChEMBL                 1,729,327     1,820,035     1,820,035     1,820,035     1,821,997
DrugBank               9,789         6,810         6,810         6,810         9,823
Guide to Pharmacology  6,969         7,065         7,146         7,235         7,249
Intersection           1,523         --            --            --            1,547
• PubChem
– Receives regular updates
– ChEMBL count doesn’t correspond to release notes
• ChEBI RDF
– Accessed through OLS
– Issued: 2018-01-01
– Double load in June?
• ChEMBL RDF
– Quarterly update: last release 2018-04-23
– Count corresponds to release notes
• DrugBank RDF
– Last update: 2014-07-25
• Guide to Pharmacology
– Regular updates
• Intersection
– Unable to compute over RDF
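As a worked check on the observations above, the following sketch recomputes two things from the figures in the results table: the gap between the PubChem and SPARQL counts on 2018-10-01, and the ratio of the June to the July ChEBI count. The variable names are illustrative; only the numbers come from the table.

```python
# Counts copied from the Analysis Results table (both taken on 2018-10-01).
sparql_counts = {
    "ChEBI": 90_510,
    "ChEMBL": 1_820_035,
    "DrugBank": 6_810,
    "Guide to Pharmacology": 7_235,
}
pubchem_counts = {
    "ChEBI": 92_367,
    "ChEMBL": 1_821_997,
    "DrugBank": 9_823,
    "Guide to Pharmacology": 7_249,
}

# How many more compounds PubChem reports than the RDF source.
differences = {
    name: pubchem_counts[name] - sparql_counts[name] for name in sparql_counts
}

# ChEBI SPARQL counts from 2018-06-08 and 2018-07-24.
chebi_june = 184_393
chebi_july = 90_510
june_to_july_ratio = chebi_june / chebi_july

for name, diff in differences.items():
    print(f"{name}: PubChem - SPARQL = {diff:,}")
print(f"ChEBI June/July ratio: {june_to_july_ratio:.2f}")
```

A ratio of roughly 2.04 is consistent with the "double load" question raised for June, although it does not confirm it. The largest gap, 3,013 for DrugBank, fits the note that the DrugBank RDF was last updated in 2014.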
Jupyter Notebook Experience
• Easy to interlace explanation and code
• Writing style:
– Papers tend to be formal
– Code explanation informal
• How to represent results at time of writing vs live results?
– Used static table
• Embed myBinder link
• No referencing support (out of the box)
– cite2c plugin: https://github.com/takluyver/cite2c
• No standard metadata
– Metadata not displayed
– No markup, e.g. ORCID
• Couldn’t include environment details
• Generating HTML using print dialogue
– LaTeX generation didn’t work
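On the point about environment details: one possible workaround, offered here as an assumption and not as something the talk describes, is to have a cell print a small snapshot of the interpreter and platform so that it is stored with the notebook output. The function name and dictionary keys are illustrative; the calls are from the Python standard library.

```python
import json
import platform
import sys
from datetime import datetime, timezone


def environment_snapshot() -> dict:
    """Collect basic details about the interpreter running the notebook."""
    return {
        "captured_at": datetime.now(timezone.utc).isoformat(),
        "python_version": platform.python_version(),
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
        "executable": sys.executable,
    }


snapshot = environment_snapshot()
print(json.dumps(snapshot, indent=2))
```

Because the snapshot carries a timestamp, the same cell also marks when a set of live results was produced, which bears on the question of static versus live results.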
Conclusions
Use a computation notebook to:
1. Perform an analysis over Semantic Web resources
– Reproduce an analysis performed through website
– Exploit recent Guide to Pharmacology RDF data publication and other Linked Open Data endpoints
2. Publish the analysis for ease of reproducibility
– https://mybinder.org/v2/gh/AlasdairGray/SemSci2018/master?filepath=SemSci2018%20Publication.ipynb
3. Embed semantics into the notebook


Editor's notes

  1. Nature survey of over 1,500 scientists, published May 2016. Includes a nice video.
  2. Literate programming: combines narrative with computation.
  3. Click the screenshot to launch the HTML version of the Notebook. Click on the Binder link to launch the executable version. Have a local backup in case of network issues.
  4. Each execution captures a point in time. Datasets are constantly evolving. Intersection: differences in InChI Key representation. Datasets too large to load all data into the Notebook (standard configuration). Federated queries timed out.
  5. Caveat: very simple analysis performed.