SlideShare ist ein Scribd-Unternehmen logo
1 von 24
Eureka Research Workbench:
An Open Source eScience
Laboratory Notebook
Stuart J. Chalk
Department of Chemistry
University of North Florida
schalk@unf.edu
2014 Spring ACS Meeting – CINF Paper 38
 Big Data
 Electronic Notebooks
 The Eureka Research Workbench
 Experiment Markup Language
 ExptML Schema and Files
 Semantic Data and Ontologies
 File Storage
 Eureka Interface
 Web Interface
 Conclusion
Outline
 Current buzz word for “this bring together lots of data and
build tools on top to extract knowledge”
 This is great, except…
 …how do we do that for science?
 Platform, data structures, and exchange protocols to
capture, identify, and disseminate scientific information
 Research Data Alliance (https://rd-alliance.org/)
 “Research Data Sharing without barriers”
 Fran Berman at RPI is NSF funded co-chair of RDA
Big Data
 Scientists need to move to
digital notebooks…
 ...and record not just the data
but the flow and context
 How science is done
is important for searching,
aggregation, meta-analysis
 We need more than an electronic version of a notebook
 We need a science version of “Second Life” (SciLife?)
Electronic Notebooks
 Started in 2006 after getting involved in the
Analytical Information Markup Language (AnIML) project
 Store all research notes/data in a digital format
 Capture the workflow of scientists
 Writing in a lab notebook is equivalent to
“multi-type” blogging in the digital world
 How to capture information? Many data types! (ExptML)
 How to store files “online”? (Fedora-Commons)
 How to access files in the browser? (CakePHP)
 How to represent laboratory resources? (ExptML)
 How to link data together? RDF (in Fedora-Commons)
Eureka Research Workbench
 A specification (written in XML) that describes different
types of information recorded during the scientific process
(http://exptml.sourceforge.net)
Experiment Markup Language (ExptML)
 Sample
 Solution
 Space
 Specimen
 Substance
 Task
 Template
 Timeline
 User
 Vendor
 Annotation
 Api
 Calculation
 Chemical
 Citation
 Customer
 Data
 Dataset
 Definition
 Element
 Equipment
 Event
 Experiment
 Group
 Message
 Project
 Protocol
 Quote
 Report
 Result
ExptML Chemical Schema
ExptML
Chemical
Schema
ExptML Chemical (Instance)
 Data are connected to other data – ‘Linked Data’
(http://www.w3.org/standards/semanticweb/data)
 The ‘Semantic Web’ approach to contextualize data
 Proposed storage of ‘relationships’ between data is the
Resource Description Format (RDF - http://www.w3.org/RDF/)
Semantic Data
 Digital repository software http://fedora-commons.org/
 Creation and management of online digital libraries
 Fedora ‘Digital Object’ consists of metadata + streams
 Metadata stored as Dublin Core (DC stream)
 ExptML file stored as EXPTML stream
 Other files (PDFs, Images, Word etc.) stored as streams
 Relationships stored as RDF (RELS-EXT stream)
 Features: Version control, Checksumming, Archiving
 Built-in search of objects and relationships
 Add-on for file content search (Fedora GSearch)
Fedora Commons
 Fedora-Commons defines and works on digital objects
 In the definition of a Fedora object an ExptML file is just
one stream of many. By default each object also has a
“DC” stream of metadata and an “RELS-EXT” stream of
relationships
 Each Fedora object can have any number of additional
streams for
 Paper PDFs, product/sample pictures,
binary file formats (if a conversion has been done)
 Video, audio, RDF, anything…
 You can export individual streams or the whole Fedora
object with streams binary encoded (Sharing/archiving)
Fedora for File Storage
Fedora Object Storage
 Web interface written in PHP using the CakePHP Framework
 Communicates with Fedora-Commons API to create,
retrieve, update and delete (CRUD) ExptML and other files
 Representational State Transfer (REST) format for URLs
 E.g. http://example.com/chemicals/view/exptml:chm1
 Creation of ExptML via interface
 Provides search via Fedora and Gsearch
 Can extract data out of XML files
 Can gather data from other websites (via API controller)
and integrate into ExptML files
Eureka Web Application
Eureka Website – Group View
Only data types related to the research group show up on left
Eureka Website – Bench View
Clicking on the “Add” menu on the right
allows you add a comment or link to data
Eureka Website – Notebook View
Eureka Website – Laboratory View
The “Rel” menu shows you the information related to this instrument
Eureka Website – Library View
You can add the PDF of the paper to the citation. The contents of the PDF are searchable in the system
Eureka Website – Stockroom View
 Web Application
 Server: Fedora 4, JSON-LD, ElasticSearch
 Client: CakePHP 3/HTML5, Recline.js, Annotator, JQuery
 Standards
 Linked Data Platform (http://www.w3.org/TR/ldp/)
 Datapackage/Simple Data Format (http://dataprotocols.org/)
 Markup Languages: AnIML, UnitsML, CML
 Other Molecular File Formats: MOL/SDF/CDX/CIF/PDB etc.
 Open Framework for Laboratory Data (Allotrope Foundation)
 Datasources
 ChemSpider, CIR, PubChem, Google Scholar, CrossRef, VIVO
 ExchangeNetwork (EPA), NIST, SDBS (no API’s yet)
 Tools
 Marvin for JS, JSXGraph, JSpecView, Chemicalize.org
Eureka Technology Stack
 Implement ingest of all data types, file (if appropriate) and web based
 In browser processing of data -> dataset -> result, report writing
 Extraction of file based legacy data -> ExptML format data
 Open access to data/spectra, ‘available data’ page (browser only)
 Access to data/spectra via linked data server (discovery/indexing)
 Publishing of packaged datasets with authenticated download option
 Automated ingestion of data from instruments/sensors
 Collaborative research: authentication and data exchange
 Timeframe? Depends on securing funding
Eureka Roadmap
 Eureka: Web application to create ExptML files
 Built on ExptML to capture data/resources/workflows
 Reliable storage/archiving system for ExptML files (Fedora)
 Storage of relationships between data (RDF)
 TODO
 Provide mechanism for sharing of data (different levels)
 Add tools to find, visualize and work on science data
 Integration into the RDA model for sharing research data
 Get the word out and test system with many users
Conclusion
References
 Eureka – http://sourceforge.net/projects/eureka
 Fedora-Commons – http://fedora-commons.org
 XML – http://www.w3.org/standards/xml
 AnIML – http://animl.sourceforge.net
 ExptML – http://exptml.sourceforge.net/
 UnitsML – http://unitsml.nist.gov/
 CML – http://www.xml-cml.org/
 JSON-LD – http://www.w3.org/TR/json-ld/
 RDF – http://www.w3.org/RDF/
 CIR – http://cactus.nci.nih.gov/chemical/structure
 RDA – http://rd-alliance.org
 ChemSpider – http://www.chemspider.com/
 Allotrope Foundation – http://allotrope.org

Weitere ähnliche Inhalte

Was ist angesagt?

ACS 248th Paper 136 JSmol/JSpecView Eureka Integration
ACS 248th Paper 136 JSmol/JSpecView Eureka IntegrationACS 248th Paper 136 JSmol/JSpecView Eureka Integration
ACS 248th Paper 136 JSmol/JSpecView Eureka IntegrationStuart Chalk
 
Cornell20080516
Cornell20080516Cornell20080516
Cornell20080516charper
 
Advances in Scientific Workflow Environments
Advances in Scientific Workflow EnvironmentsAdvances in Scientific Workflow Environments
Advances in Scientific Workflow EnvironmentsCarole Goble
 
Mtsr2015 goble-keynote
Mtsr2015 goble-keynoteMtsr2015 goble-keynote
Mtsr2015 goble-keynoteCarole Goble
 
FAIRDOM - FAIR Asset management and sharing experiences in Systems and Synthe...
FAIRDOM - FAIR Asset management and sharing experiences in Systems and Synthe...FAIRDOM - FAIR Asset management and sharing experiences in Systems and Synthe...
FAIRDOM - FAIR Asset management and sharing experiences in Systems and Synthe...Carole Goble
 
FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...
FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...
FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...Carole Goble
 
2016 Bio-IT World Cell Line Coordination Poster 2016-04-05v1
2016 Bio-IT World Cell Line Coordination Poster 2016-04-05v12016 Bio-IT World Cell Line Coordination Poster 2016-04-05v1
2016 Bio-IT World Cell Line Coordination Poster 2016-04-05v1Bruce Kozuma
 
ACS 248th Paper 71 ChAMP Project
ACS 248th Paper 71 ChAMP ProjectACS 248th Paper 71 ChAMP Project
ACS 248th Paper 71 ChAMP ProjectStuart Chalk
 
Crediting informatics and data folks in life science teams
Crediting informatics and data folks in life science teamsCrediting informatics and data folks in life science teams
Crediting informatics and data folks in life science teamsCarole Goble
 
2009 0807 Lod Gmod
2009 0807 Lod Gmod2009 0807 Lod Gmod
2009 0807 Lod GmodJun Zhao
 
FAIR Workflows and Research Objects get a Workout
FAIR Workflows and Research Objects get a Workout FAIR Workflows and Research Objects get a Workout
FAIR Workflows and Research Objects get a Workout Carole Goble
 
Acs collaborative computational technologies for biomedical research an enabl...
Acs collaborative computational technologies for biomedical research an enabl...Acs collaborative computational technologies for biomedical research an enabl...
Acs collaborative computational technologies for biomedical research an enabl...Sean Ekins
 
Being Reproducible: SSBSS Summer School 2017
Being Reproducible: SSBSS Summer School 2017Being Reproducible: SSBSS Summer School 2017
Being Reproducible: SSBSS Summer School 2017Carole Goble
 
The Rhetoric of Research Objects
The Rhetoric of Research ObjectsThe Rhetoric of Research Objects
The Rhetoric of Research ObjectsCarole Goble
 
Reproducibility of model-based results: standards, infrastructure, and recogn...
Reproducibility of model-based results: standards, infrastructure, and recogn...Reproducibility of model-based results: standards, infrastructure, and recogn...
Reproducibility of model-based results: standards, infrastructure, and recogn...FAIRDOM
 
ACS 248th Paper 108 NIST-IUPAC Solubility Data
ACS 248th Paper 108 NIST-IUPAC Solubility DataACS 248th Paper 108 NIST-IUPAC Solubility Data
ACS 248th Paper 108 NIST-IUPAC Solubility DataStuart Chalk
 

Was ist angesagt? (20)

The Chemtools LaBLog
The Chemtools LaBLogThe Chemtools LaBLog
The Chemtools LaBLog
 
ACS 248th Paper 136 JSmol/JSpecView Eureka Integration
ACS 248th Paper 136 JSmol/JSpecView Eureka IntegrationACS 248th Paper 136 JSmol/JSpecView Eureka Integration
ACS 248th Paper 136 JSmol/JSpecView Eureka Integration
 
Cornell20080516
Cornell20080516Cornell20080516
Cornell20080516
 
Advances in Scientific Workflow Environments
Advances in Scientific Workflow EnvironmentsAdvances in Scientific Workflow Environments
Advances in Scientific Workflow Environments
 
Mtsr2015 goble-keynote
Mtsr2015 goble-keynoteMtsr2015 goble-keynote
Mtsr2015 goble-keynote
 
FAIRDOM - FAIR Asset management and sharing experiences in Systems and Synthe...
FAIRDOM - FAIR Asset management and sharing experiences in Systems and Synthe...FAIRDOM - FAIR Asset management and sharing experiences in Systems and Synthe...
FAIRDOM - FAIR Asset management and sharing experiences in Systems and Synthe...
 
FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...
FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...
FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...
 
2016 Bio-IT World Cell Line Coordination Poster 2016-04-05v1
2016 Bio-IT World Cell Line Coordination Poster 2016-04-05v12016 Bio-IT World Cell Line Coordination Poster 2016-04-05v1
2016 Bio-IT World Cell Line Coordination Poster 2016-04-05v1
 
ACS 248th Paper 71 ChAMP Project
ACS 248th Paper 71 ChAMP ProjectACS 248th Paper 71 ChAMP Project
ACS 248th Paper 71 ChAMP Project
 
Crediting informatics and data folks in life science teams
Crediting informatics and data folks in life science teamsCrediting informatics and data folks in life science teams
Crediting informatics and data folks in life science teams
 
2009 0807 Lod Gmod
2009 0807 Lod Gmod2009 0807 Lod Gmod
2009 0807 Lod Gmod
 
FAIR Workflows and Research Objects get a Workout
FAIR Workflows and Research Objects get a Workout FAIR Workflows and Research Objects get a Workout
FAIR Workflows and Research Objects get a Workout
 
Acs collaborative computational technologies for biomedical research an enabl...
Acs collaborative computational technologies for biomedical research an enabl...Acs collaborative computational technologies for biomedical research an enabl...
Acs collaborative computational technologies for biomedical research an enabl...
 
Hosting a compound centric community resource for chemistry data
Hosting a compound centric community resource for chemistry dataHosting a compound centric community resource for chemistry data
Hosting a compound centric community resource for chemistry data
 
Being Reproducible: SSBSS Summer School 2017
Being Reproducible: SSBSS Summer School 2017Being Reproducible: SSBSS Summer School 2017
Being Reproducible: SSBSS Summer School 2017
 
The Rhetoric of Research Objects
The Rhetoric of Research ObjectsThe Rhetoric of Research Objects
The Rhetoric of Research Objects
 
Crosslinks
Crosslinks Crosslinks
Crosslinks
 
ROHub
ROHubROHub
ROHub
 
Reproducibility of model-based results: standards, infrastructure, and recogn...
Reproducibility of model-based results: standards, infrastructure, and recogn...Reproducibility of model-based results: standards, infrastructure, and recogn...
Reproducibility of model-based results: standards, infrastructure, and recogn...
 
ACS 248th Paper 108 NIST-IUPAC Solubility Data
ACS 248th Paper 108 NIST-IUPAC Solubility DataACS 248th Paper 108 NIST-IUPAC Solubility Data
ACS 248th Paper 108 NIST-IUPAC Solubility Data
 

Ähnlich wie 247th ACS Meeting: The Eureka Research Workbench

Liberating Laboratory Data - Eureka
Liberating Laboratory Data - EurekaLiberating Laboratory Data - Eureka
Liberating Laboratory Data - EurekaStuart Chalk
 
Discovery Engines for Big Data: Accelerating Discovery in Basic Energy Sciences
Discovery Engines for Big Data: Accelerating Discovery in Basic Energy SciencesDiscovery Engines for Big Data: Accelerating Discovery in Basic Energy Sciences
Discovery Engines for Big Data: Accelerating Discovery in Basic Energy SciencesIan Foster
 
RO-Crate: packaging metadata love notes into FAIR Digital Objects
RO-Crate: packaging metadata love notes into FAIR Digital ObjectsRO-Crate: packaging metadata love notes into FAIR Digital Objects
RO-Crate: packaging metadata love notes into FAIR Digital ObjectsCarole Goble
 
eScience: A Transformed Scientific Method
eScience: A Transformed Scientific MethodeScience: A Transformed Scientific Method
eScience: A Transformed Scientific MethodDuncan Hull
 
Apache Tika: 1 point Oh!
Apache Tika: 1 point Oh!Apache Tika: 1 point Oh!
Apache Tika: 1 point Oh!Chris Mattmann
 
A Look into the Apache OODT Ecosystem
A Look into the Apache OODT EcosystemA Look into the Apache OODT Ecosystem
A Look into the Apache OODT EcosystemChris Mattmann
 
Research Objects: more than the sum of the parts
Research Objects: more than the sum of the partsResearch Objects: more than the sum of the parts
Research Objects: more than the sum of the partsCarole Goble
 
ACS 248th Paper 67 Eureka Collaboration
ACS 248th Paper 67 Eureka CollaborationACS 248th Paper 67 Eureka Collaboration
ACS 248th Paper 67 Eureka CollaborationStuart Chalk
 
2008 Jun Zhao Eswc
2008 Jun Zhao Eswc2008 Jun Zhao Eswc
2008 Jun Zhao EswcJun Zhao
 
Structured Dynamics' Semantic Technologies Product Stack
Structured Dynamics' Semantic Technologies Product StackStructured Dynamics' Semantic Technologies Product Stack
Structured Dynamics' Semantic Technologies Product StackMike Bergman
 
Big data & hadoop framework
Big data & hadoop frameworkBig data & hadoop framework
Big data & hadoop frameworkTu Pham
 
Flexible Resources In 3 6 And E4
Flexible Resources In 3 6 And E4Flexible Resources In 3 6 And E4
Flexible Resources In 3 6 And E4szbra
 
RO-Crate: A framework for packaging research products into FAIR Research Objects
RO-Crate: A framework for packaging research products into FAIR Research ObjectsRO-Crate: A framework for packaging research products into FAIR Research Objects
RO-Crate: A framework for packaging research products into FAIR Research ObjectsCarole Goble
 
Blogs Logs Pods: Smart Labs
Blogs Logs Pods: Smart LabsBlogs Logs Pods: Smart Labs
Blogs Logs Pods: Smart LabsJeremy Frey
 
Searching Heterogenous E Learning Resources
Searching Heterogenous E Learning ResourcesSearching Heterogenous E Learning Resources
Searching Heterogenous E Learning Resourcesimranlatif
 
Rbms 2011 edwards
Rbms 2011 edwardsRbms 2011 edwards
Rbms 2011 edwardsglynnedw
 

Ähnlich wie 247th ACS Meeting: The Eureka Research Workbench (20)

Liberating Laboratory Data - Eureka
Liberating Laboratory Data - EurekaLiberating Laboratory Data - Eureka
Liberating Laboratory Data - Eureka
 
Discovery Engines for Big Data: Accelerating Discovery in Basic Energy Sciences
Discovery Engines for Big Data: Accelerating Discovery in Basic Energy SciencesDiscovery Engines for Big Data: Accelerating Discovery in Basic Energy Sciences
Discovery Engines for Big Data: Accelerating Discovery in Basic Energy Sciences
 
RO-Crate: packaging metadata love notes into FAIR Digital Objects
RO-Crate: packaging metadata love notes into FAIR Digital ObjectsRO-Crate: packaging metadata love notes into FAIR Digital Objects
RO-Crate: packaging metadata love notes into FAIR Digital Objects
 
eScience: A Transformed Scientific Method
eScience: A Transformed Scientific MethodeScience: A Transformed Scientific Method
eScience: A Transformed Scientific Method
 
dotte.ppt
dotte.pptdotte.ppt
dotte.ppt
 
Apache Tika: 1 point Oh!
Apache Tika: 1 point Oh!Apache Tika: 1 point Oh!
Apache Tika: 1 point Oh!
 
Fedora
FedoraFedora
Fedora
 
A Look into the Apache OODT Ecosystem
A Look into the Apache OODT EcosystemA Look into the Apache OODT Ecosystem
A Look into the Apache OODT Ecosystem
 
Research Objects: more than the sum of the parts
Research Objects: more than the sum of the partsResearch Objects: more than the sum of the parts
Research Objects: more than the sum of the parts
 
ACS 248th Paper 67 Eureka Collaboration
ACS 248th Paper 67 Eureka CollaborationACS 248th Paper 67 Eureka Collaboration
ACS 248th Paper 67 Eureka Collaboration
 
2008 Jun Zhao Eswc
2008 Jun Zhao Eswc2008 Jun Zhao Eswc
2008 Jun Zhao Eswc
 
Structured Dynamics' Semantic Technologies Product Stack
Structured Dynamics' Semantic Technologies Product StackStructured Dynamics' Semantic Technologies Product Stack
Structured Dynamics' Semantic Technologies Product Stack
 
Big data & hadoop framework
Big data & hadoop frameworkBig data & hadoop framework
Big data & hadoop framework
 
Flexible Resources In 3 6 And E4
Flexible Resources In 3 6 And E4Flexible Resources In 3 6 And E4
Flexible Resources In 3 6 And E4
 
RO-Crate: A framework for packaging research products into FAIR Research Objects
RO-Crate: A framework for packaging research products into FAIR Research ObjectsRO-Crate: A framework for packaging research products into FAIR Research Objects
RO-Crate: A framework for packaging research products into FAIR Research Objects
 
Dspace
DspaceDspace
Dspace
 
Dspace
DspaceDspace
Dspace
 
Blogs Logs Pods: Smart Labs
Blogs Logs Pods: Smart LabsBlogs Logs Pods: Smart Labs
Blogs Logs Pods: Smart Labs
 
Searching Heterogenous E Learning Resources
Searching Heterogenous E Learning ResourcesSearching Heterogenous E Learning Resources
Searching Heterogenous E Learning Resources
 
Rbms 2011 edwards
Rbms 2011 edwardsRbms 2011 edwards
Rbms 2011 edwards
 

Mehr von Stuart Chalk

Semantic properties and units
Semantic properties and unitsSemantic properties and units
Semantic properties and unitsStuart Chalk
 
Open semantic chemical structures
Open semantic chemical structuresOpen semantic chemical structures
Open semantic chemical structuresStuart Chalk
 
ChemExtractor: Enhanced Rule-Based Capture and Identification of PDF Based Pr...
ChemExtractor: Enhanced Rule-Based Capture and Identification of PDF Based Pr...ChemExtractor: Enhanced Rule-Based Capture and Identification of PDF Based Pr...
ChemExtractor: Enhanced Rule-Based Capture and Identification of PDF Based Pr...Stuart Chalk
 
AnIML: A New Analytical Data Standard
AnIML: A New Analytical Data StandardAnIML: A New Analytical Data Standard
AnIML: A New Analytical Data StandardStuart Chalk
 
The Electronic Notebook Ontology
The Electronic Notebook OntologyThe Electronic Notebook Ontology
The Electronic Notebook OntologyStuart Chalk
 
Sharing Science Data: Semantically Reimagining the IUPAC Solubility Series Data
Sharing Science Data: Semantically Reimagining the IUPAC Solubility Series DataSharing Science Data: Semantically Reimagining the IUPAC Solubility Series Data
Sharing Science Data: Semantically Reimagining the IUPAC Solubility Series DataStuart Chalk
 
Bringing Flow injection Analysis to the Semantic Web
Bringing Flow injection Analysis to the Semantic WebBringing Flow injection Analysis to the Semantic Web
Bringing Flow injection Analysis to the Semantic WebStuart Chalk
 
Reactions to the Open Spectral Database
Reactions to the Open Spectral DatabaseReactions to the Open Spectral Database
Reactions to the Open Spectral DatabaseStuart Chalk
 
Integrating AnIML Files in Electronic Laboratory Notebooks - PittCon 2015
Integrating AnIML Files in Electronic Laboratory Notebooks - PittCon 2015Integrating AnIML Files in Electronic Laboratory Notebooks - PittCon 2015
Integrating AnIML Files in Electronic Laboratory Notebooks - PittCon 2015Stuart Chalk
 
Building a Standard for Standards: The ChAMP Project
Building a Standard for Standards: The ChAMP ProjectBuilding a Standard for Standards: The ChAMP Project
Building a Standard for Standards: The ChAMP ProjectStuart Chalk
 
A Standard Data Format for Computational Chemistry: CSX
A Standard Data Format for Computational Chemistry: CSXA Standard Data Format for Computational Chemistry: CSX
A Standard Data Format for Computational Chemistry: CSXStuart Chalk
 
Overview of the Analytical Information Markup Language (AnIML)
Overview of the Analytical Information Markup Language (AnIML)Overview of the Analytical Information Markup Language (AnIML)
Overview of the Analytical Information Markup Language (AnIML)Stuart Chalk
 
ACS 248th Paper 104 ChemData Project
ACS 248th Paper 104 ChemData ProjectACS 248th Paper 104 ChemData Project
ACS 248th Paper 104 ChemData ProjectStuart Chalk
 
247th ACS Meeting: Experiment Markup Language (ExptML)
247th ACS Meeting: Experiment Markup Language (ExptML)247th ACS Meeting: Experiment Markup Language (ExptML)
247th ACS Meeting: Experiment Markup Language (ExptML)Stuart Chalk
 
Liberating Laboratory Data - AnIML
Liberating Laboratory Data - AnIMLLiberating Laboratory Data - AnIML
Liberating Laboratory Data - AnIMLStuart Chalk
 

Mehr von Stuart Chalk (15)

Semantic properties and units
Semantic properties and unitsSemantic properties and units
Semantic properties and units
 
Open semantic chemical structures
Open semantic chemical structuresOpen semantic chemical structures
Open semantic chemical structures
 
ChemExtractor: Enhanced Rule-Based Capture and Identification of PDF Based Pr...
ChemExtractor: Enhanced Rule-Based Capture and Identification of PDF Based Pr...ChemExtractor: Enhanced Rule-Based Capture and Identification of PDF Based Pr...
ChemExtractor: Enhanced Rule-Based Capture and Identification of PDF Based Pr...
 
AnIML: A New Analytical Data Standard
AnIML: A New Analytical Data StandardAnIML: A New Analytical Data Standard
AnIML: A New Analytical Data Standard
 
The Electronic Notebook Ontology
The Electronic Notebook OntologyThe Electronic Notebook Ontology
The Electronic Notebook Ontology
 
Sharing Science Data: Semantically Reimagining the IUPAC Solubility Series Data
Sharing Science Data: Semantically Reimagining the IUPAC Solubility Series DataSharing Science Data: Semantically Reimagining the IUPAC Solubility Series Data
Sharing Science Data: Semantically Reimagining the IUPAC Solubility Series Data
 
Bringing Flow injection Analysis to the Semantic Web
Bringing Flow injection Analysis to the Semantic WebBringing Flow injection Analysis to the Semantic Web
Bringing Flow injection Analysis to the Semantic Web
 
Reactions to the Open Spectral Database
Reactions to the Open Spectral DatabaseReactions to the Open Spectral Database
Reactions to the Open Spectral Database
 
Integrating AnIML Files in Electronic Laboratory Notebooks - PittCon 2015
Integrating AnIML Files in Electronic Laboratory Notebooks - PittCon 2015Integrating AnIML Files in Electronic Laboratory Notebooks - PittCon 2015
Integrating AnIML Files in Electronic Laboratory Notebooks - PittCon 2015
 
Building a Standard for Standards: The ChAMP Project
Building a Standard for Standards: The ChAMP ProjectBuilding a Standard for Standards: The ChAMP Project
Building a Standard for Standards: The ChAMP Project
 
A Standard Data Format for Computational Chemistry: CSX
A Standard Data Format for Computational Chemistry: CSXA Standard Data Format for Computational Chemistry: CSX
A Standard Data Format for Computational Chemistry: CSX
 
Overview of the Analytical Information Markup Language (AnIML)
Overview of the Analytical Information Markup Language (AnIML)Overview of the Analytical Information Markup Language (AnIML)
Overview of the Analytical Information Markup Language (AnIML)
 
ACS 248th Paper 104 ChemData Project
ACS 248th Paper 104 ChemData ProjectACS 248th Paper 104 ChemData Project
ACS 248th Paper 104 ChemData Project
 
247th ACS Meeting: Experiment Markup Language (ExptML)
247th ACS Meeting: Experiment Markup Language (ExptML)247th ACS Meeting: Experiment Markup Language (ExptML)
247th ACS Meeting: Experiment Markup Language (ExptML)
 
Liberating Laboratory Data - AnIML
Liberating Laboratory Data - AnIMLLiberating Laboratory Data - AnIML
Liberating Laboratory Data - AnIML
 

Kürzlich hochgeladen

"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Zilliz
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024The Digital Insurer
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfOverkill Security
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...apidays
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 

Kürzlich hochgeladen (20)

"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 

247th ACS Meeting: The Eureka Research Workbench

  • 1. Eureka Research Workbench: An Open Source eScience Laboratory Notebook Stuart J. Chalk Department of Chemistry University of North Florida schalk@unf.edu 2014 Spring ACS Meeting – CINF Paper 38
  • 2.  Big Data  Electronic Notebooks  The Eureka Research Workbench  Experiment Markup Language  ExptML Schema and Files  Semantic Data and Ontologies  File Storage  Eureka Interface  Web Interface  Conclusion Outline
  • 3.  Current buzz word for “this bring together lots of data and build tools on top to extract knowledge”  This is great, except…  …how do we do that for science?  Platform, data structures, and exchange protocols to capture, identify, and disseminate scientific information  Research Data Alliance (https://rd-alliance.org/)  “Research Data Sharing without barriers”  Fran Berman at RPI is NSF funded co-chair of RDA Big Data
  • 4.  Scientists need to move to digital notebooks…  ...and record not just the data but the flow and context  How science is done is important for searching, aggregation, meta-analysis  We need more than an electronic version of a notebook  We need a science version of “Second Life” (SciLife?) Electronic Notebooks
  • 5.  Started in 2006 after getting involved in the Analytical Information Markup Language (AnIML) project  Store all research notes/data in a digital format  Capture the workflow of scientists  Writing in a lab notebook is equivalent to “multi-type” blogging in the digital world  How to capture information? Many data types! (ExptML)  How to store files “online”? (Fedora-Commons)  How to access files in the browser? (CakePHP)  How to represent laboratory resources? (ExptML)  How to link data together? RDF (in Fedora-Commons) Eureka Research Workbench
  • 6.  A specification (written in XML) that describes different types of information recorded during the scientific process (http://exptml.sourceforge.net) Experiment Markup Language (ExptML)  Sample  Solution  Space  Specimen  Substance  Task  Template  Timeline  User  Vendor  Annotation  Api  Calculation  Chemical  Citation  Customer  Data  Dataset  Definition  Element  Equipment  Event  Experiment  Group  Message  Project  Protocol  Quote  Report  Result
  • 10.  Data are connected to other data – ‘Linked Data’ (http://www.w3.org/standards/semanticweb/data)  The ‘Semantic Web’ approach to contextualize data  Proposed storage of ‘relationships’ between data is the Resource Description Format (RDF - http://www.w3.org/RDF/) Semantic Data
  • 11.  Digital repository software http://fedora-commons.org/  Creation and management of online digital libraries  Fedora ‘Digital Object’ consists of metadata + streams  Metadata stored as Dublin Core (DC stream)  ExptML file stored as EXPTML stream  Other files (PDFs, Images, Word etc.) stored as streams  Relationships stored as RDF (RELS-EXT stream)  Features: Version control, Checksumming, Archiving  Built-in search of objects and relationships  Add-on for file content search (Fedora GSearch) Fedora Commons
  • 12.  Fedora-Commons defines and works on digital objects  In the definition of a Fedora object an ExptML file is just one stream of many. By default each object also has a “DC” stream of metadata and an “RELS-EXT” stream of relationships  Each Fedora object can have any number of additional streams for  Paper PDFs, product/sample pictures, binary file formats (if a conversion has been done)  Video, audio, RDF, anything…  You can export individual streams or the whole Fedora object with streams binary encoded (Sharing/archiving) Fedora for File Storage
  • 14.  Web interface written in PHP using the CakePHP Framework  Communicates with Fedora-Commons API to create, retrieve, update and delete (CRUD) ExptML and other files  Representational State Transfer (REST) format for URLs  E.g. http://example.com/chemicals/view/exptml:chm1  Creation of ExptML via interface  Provides search via Fedora and Gsearch  Can extract data out of XML files  Can gather data from other websites (via API controller) and integrate into ExptML files Eureka Web Application
  • 15. Eureka Website – Group View Only data types related to the research group show up on left
  • 16. Eureka Website – Bench View Clicking on the “Add” menu on the right allows you add a comment or link to data
  • 17. Eureka Website – Notebook View
  • 18. Eureka Website – Laboratory View The “Rel” menu shows you the information related to this instrument
  • 19. Eureka Website – Library View You can add the PDF of the paper to the citation. The contents of the PDF are searchable in the system
  • 20. Eureka Website – Stockroom View
  • 21.  Web Application  Server: Fedora 4, JSON-LD, ElasticSearch  Client: CakePHP 3/HTML5, Recline.js, Annotator, JQuery  Standards  Linked Data Platform (http://www.w3.org/TR/ldp/)  Datapackage/Simple Data Format (http://dataprotocols.org/)  Markup Languages: AnIML, UnitsML, CML  Other Molecular File Formats: MOL/SDF/CDX/CIF/PDB etc.  Open Framework for Laboratory Data (Allotrope Foundation)  Datasources  ChemSpider, CIR, PubChem, Google Scholar, CrossRef, VIVO  ExchangeNetwork (EPA), NIST, SDBS (no API’s yet)  Tools  Marvin for JS, JSXGraph, JSpecView, Chemicalize.org Eureka Technology Stack
  • 22.  Implement ingest of all data types, file (if appropriate) and web based  In browser processing of data -> dataset -> result, report writing  Extraction of file based legacy data -> ExptML format data  Open access to data/spectra, ‘available data’ page (browser only)  Access to data/spectra via linked data server (discovery/indexing)  Publishing of packaged datasets with authenticated download option  Automated ingestion of data from instruments/sensors  Collaborative research: authentication and data exchange  Timeframe? Depends on securing funding Eureka Roadmap
  • 23.  Eureka: Web application to create ExptML files  Built on ExptML to capture data/resources/workflows  Reliable storage/archiving system for ExptML files (Fedora)  Storage of relationships between data (RDF)  TODO  Provide mechanism for sharing of data (different levels)  Add tools to find, visualize and work on science data  Integration into the RDA model for sharing research data  Get the word out and test system with many users Conclusion
  • 24. References  Eureka – http://sourceforge.net/projects/eureka  Fedora-Commons – http://fedora-commons.org  XML – http://www.w3.org/standards/xml  AnIML – http://animl.sourceforge.net  ExptML – http://exptml.sourceforge.net/  UnitsML – http://unitsml.nist.gov/  CML – http://www.xml-cml.org/  JSON-LD – http://www.w3.org/TR/json-ld/  RDF – http://www.w3.org/RDF/  CIR – http://cactus.nci.nih.gov/chemical/structure  RDA – http://rd-alliance.org  ChemSpider – http://www.chemspider.com/  Allotrope Foundation – http://allotrope.org