A Standard Data Format for Computational Chemistry: CSX
Stuart J. Chalk (1, 2), Neil Ostlund (1), Mirek Sopek (1), Bing Wang (1)
1) Chemical Semantics Inc., Gainesville FL
2) Department of Chemistry, University of North Florida
schalk@unf.edu
249th ACS Meeting, Denver, CO – March 2015
Outline
• Semantic Annotation of Data
• Current DOE Project
• Data Transformations
• Common Standard for eXchange (CSX)
• CSX as a Standard Data Format
• The CSX Schema
• CSX - Publication Information
• CSX - Molecular System Information
• CSX - Calculated Result Information
• Future Plans
• Conclusion
Semantic Annotation of Data
• Create a way to ‘teach’ computers what information means, that is, to contextualize the data
• Example
  • What is this? 904-620-1938
  • A computer just sees it as… a string
• By using an appropriate semantic definition in RDF (the Resource Description Framework), we can tell the computer that the text is a phone number (using the Friend of a Friend (FOAF) specification), i.e.
<foaf:phone rdf:datatype="#string">904-620-1938</foaf:phone>
RDF Specification http://www.w3.org/RDF/
FOAF Specification http://xmlns.com/foaf/spec/
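To make the idea concrete, the sketch below (Python standard library only; an illustration, not part of any CSX tooling) parses the annotation with xml.etree.ElementTree. It assumes the snippet is wrapped in an rdf:RDF root that declares the standard RDF and FOAF namespace URIs, which the slide leaves implicit, and it spells out the full XML Schema URI in place of the shorthand "#string". Once the namespaces are resolved, the program sees a property with a global identifier rather than a bare string.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF_NS = "http://xmlns.com/foaf/0.1/"

# The slide's annotation, wrapped in an rdf:RDF root that declares both
# namespaces (assumption: the slide omits these declarations). The full XSD
# string URI replaces the slide's shorthand "#string".
DOC = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:foaf="{FOAF_NS}">
  <foaf:phone rdf:datatype="http://www.w3.org/2001/XMLSchema#string">904-620-1938</foaf:phone>
</rdf:RDF>"""


def describe_annotation(xml_text):
    """Return (property URI, datatype URI, literal) for the first child element."""
    root = ET.fromstring(xml_text)
    elem = root[0]
    # ElementTree stores qualified names as "{namespace}localname".
    ns, local = elem.tag[1:].split("}")
    datatype = elem.get(f"{{{RDF_NS}}}datatype")
    return ns + local, datatype, elem.text


prop, dtype, value = describe_annotation(DOC)
print(prop)   # http://xmlns.com/foaf/0.1/phone
print(dtype)  # http://www.w3.org/2001/XMLSchema#string
print(value)  # 904-620-1938
```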
Semantic Annotation of Data
• RDF can be used to relate information as well as annotate it
• The following RDF/XML (XML is the eXtensible Markup Language) shows how some pieces of information are related
• Applying this technology to computational chemistry calculations will allow the calculations and their results to be integrated with data about chemicals from other sources
<rdf:Description rdf:about="http://example.org/StuartChalk">
  <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
  <foaf:knows rdf:resource="http://example.org/NeilOstlund"/>
  <foaf:phone rdf:datatype="…#string">904-620-1938</foaf:phone>
</rdf:Description>
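A minimal sketch of how a program can read these relationships as subject-predicate-object statements (triples). It assumes the description is wrapped in an rdf:RDF root with namespace declarations, and it drops the phone datatype, since in RDF 1.1 a plain literal is already an xsd:string. The hand-rolled converter handles only this flat shape (rdf:resource objects and text literals) and prints N-Triples, a line-based subset of Turtle; a real RDF/XML parser covers far more of the syntax.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF = "http://xmlns.com/foaf/0.1/"

# The slide's description, with namespace declarations added (assumption)
# so that it parses as RDF/XML.
DOC = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:foaf="{FOAF}">
  <rdf:Description rdf:about="http://example.org/StuartChalk">
    <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
    <foaf:knows rdf:resource="http://example.org/NeilOstlund"/>
    <foaf:phone>904-620-1938</foaf:phone>
  </rdf:Description>
</rdf:RDF>"""


def description_to_ntriples(xml_text):
    """Emit one N-Triples line per property of each flat rdf:Description.

    Only rdf:resource objects and plain-text literals are handled.
    """
    lines = []
    for desc in ET.fromstring(xml_text).iter(f"{{{RDF}}}Description"):
        subject = desc.get(f"{{{RDF}}}about")
        for prop in desc:
            # ElementTree writes qualified names as "{namespace}local";
            # dropping the braces yields the full predicate URI.
            predicate = prop.tag[1:].replace("}", "", 1)
            resource = prop.get(f"{{{RDF}}}resource")
            if resource is not None:
                obj = f"<{resource}>"  # object is another resource
            else:
                text = (prop.text or "").strip()
                text = text.replace("\\", "\\\\").replace('"', '\\"')
                obj = f'"{text}"'      # object is a literal
            lines.append(f"<{subject}> <{predicate}> {obj} .")
    return lines


for line in description_to_ntriples(DOC):
    print(line)
```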
DOE SBIR Grant
• Chemical Semantics is funded by the DOE to create a web portal that collects, organizes, and makes searchable the results output by computational chemistry (CC) calculations
• The portal will be freely available and will accept output from all CC software packages
• The intent is to capture the calculation results and…
  • Software used to calculate the results
  • Input parameters used in the calculation
  • Methodology by which the calculation was done
  • Details of the molecular system studied
Data Transformations
• The approach Chemical Semantics is taking is to:
1. Add code to software packages to generate an XML file alongside the normal output file, OR parse an existing output file (using a free application) and generate the XML file
2. Send the XML file to the web portal
3. Convert the XML file into RDF, serialized in Turtle format (TTL)
4. Finally, ingest the TTL into a triplestore (Virtuoso)
• All the data in Virtuoso can then be searched using SPARQL (SPARQL Protocol and RDF Query Language)
Virtuoso http://virtuoso.openlinksw.com/
SPARQL http://www.w3.org/TR/sparql11-query/
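SPARQL queries are built from triple patterns containing variables. As a toy illustration of that matching step (not how Virtuoso works internally), the sketch below joins two patterns over an in-memory list of triples. The example.org vocabulary, the calculation URIs, and the package names are hypothetical.

```python
EX = "http://example.org/"  # hypothetical vocabulary for illustration only

# A few hypothetical triples of the kind a calculation portal might store.
TRIPLES = [
    (EX + "calc1", EX + "software", "PackageA"),
    (EX + "calc1", EX + "method", "B3LYP"),
    (EX + "calc2", EX + "software", "PackageB"),
    (EX + "calc2", EX + "method", "MP2"),
    (EX + "calc3", EX + "software", "PackageA"),
    (EX + "calc3", EX + "method", "MP2"),
]


def match(patterns, triples, binding=None):
    """Yield variable bindings that satisfy every triple pattern.

    Terms starting with '?' are variables; a variable already bound by an
    earlier pattern must take the same value (this is the join).
    """
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        new = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against its existing value.
                if new.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(rest, triples, new)


# Roughly: PREFIX ex: <http://example.org/>
#          SELECT ?calc ?sw WHERE { ?calc ex:method "MP2" . ?calc ex:software ?sw }
query = [("?calc", EX + "method", "MP2"), ("?calc", EX + "software", "?sw")]
for row in match(query, TRIPLES):
    print(row["?calc"], row["?sw"])
```

A real triplestore indexes the triples instead of scanning them for every pattern, but the bindings it returns for this query would have the same shape.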
Intermediate XML File
• Why XML?
  • Human readable (plain text, UTF-8)
  • Platform neutral
  • Archivable
  • Validatable
• Why not use CML (Chemical Markup Language)?
  • Inability to represent complex structures, e.g. residues
  • No standard way to add CC results
Common Standard for eXchange (CSX)
• A CSX file is a text-based file written in XML
• It is a structured data container designed to hold CC result data and additional metadata
• Version 0.x was developed by Neil Ostlund
• Version 1.0 is the current stable release, developed as part of Phase 1 of the SBIR grant (limited scope)
• Version 2.0 is currently under development as part of Phase 2 of the SBIR grant
CSX as a Standard Data Format
• It is well known that the formats in which data are reported in CC output files are:
  • Highly variable (software specific)
  • Sometimes difficult to interpret
• Standardization would:
  • Allow data from different packages to be compared more easily
  • Open up opportunities for software development to display and reuse data for different applications
• This mirrors a movement in the CC community toward a common driver base for CC software packages
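To illustrate why a common format helps, here is a sketch that maps two package-specific ways of reporting the same quantity onto one record. The package names, output lines, and regular expressions are invented for illustration and do not reproduce any real package's output; only the hartree to kcal/mol conversion factor (about 627.509) is a real constant.

```python
import re

HARTREE_TO_KCAL_PER_MOL = 627.509  # standard conversion factor, rounded

# Hypothetical per-package rules: a regex that captures the number, plus the
# unit that package reports it in. Both are invented for this example.
RULES = {
    "PackageA": (re.compile(r"Total energy\s*=\s*(-?\d+\.\d+)\s*Eh"), "hartree"),
    "PackageB": (re.compile(r"E\(total\):\s*(-?\d+\.\d+)\s*kcal/mol"), "kcal/mol"),
}


def normalize(package, line):
    """Map one package-specific output line to a common record in hartree."""
    pattern, unit = RULES[package]
    m = pattern.search(line)
    if m is None:
        raise ValueError(f"no total energy found for {package}: {line!r}")
    value = float(m.group(1))
    if unit == "kcal/mol":
        value /= HARTREE_TO_KCAL_PER_MOL
    return {"package": package, "total_energy_hartree": value}


a = normalize("PackageA", "Total energy =   -76.4053 Eh")
b = normalize("PackageB", "E(total): -47945.3 kcal/mol")
# Once both records share a name and a unit, comparing them is trivial.
print(a["total_energy_hartree"] - b["total_energy_hartree"])
```

With a shared standard, this kind of per-package mapping would be done once, when the file is written, so downstream tools only ever see the common form.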
The CSX Schema
• A schema document is available for the CSX specification; it describes the layout and allowed names of elements and attributes, and the allowed values for both
• It can be used to help new users write valid CSX files (using XML editing applications such as XMLSpy and oxygenXML) and…
• … validate existing CSX files using any of a number of XML validators (e.g. Xerces) …
• … and understand the structure of the data, especially for less frequently calculated results
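Python's standard library has no XML Schema (XSD) validator (the third-party lxml package provides lxml.etree.XMLSchema, and Xerces is mentioned above), so the sketch below only imitates the idea: a content model lists which child elements each element may and must contain, and a document is checked against it. The element names are hypothetical and are not taken from the actual CSX schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model, NOT the actual CSX schema: for each element,
# the child elements it may contain and those it must contain.
ALLOWED = {
    "csx": {"publication", "molecularSystem", "results"},
    "publication": {"title", "author"},
    "molecularSystem": {"atom"},
    "results": {"energy"},
}
REQUIRED = {
    "csx": {"molecularSystem", "results"},
}


def check(elem, path=""):
    """Return a list of rule violations found under elem (empty if valid)."""
    errors = []
    here = f"{path}/{elem.tag}"
    children = [child.tag for child in elem]
    # Every child must appear in the parent's allowed set.
    for tag in children:
        if tag not in ALLOWED.get(elem.tag, set()):
            errors.append(f"{here}: <{tag}> is not allowed here")
    # Every required child must be present.
    for tag in REQUIRED.get(elem.tag, set()) - set(children):
        errors.append(f"{here}: missing required <{tag}>")
    for child in elem:
        errors.extend(check(child, here))
    return errors


doc = ET.fromstring("<csx><publication><title>Water</title></publication>"
                    "<results><energy/></results><notes/></csx>")
for problem in check(doc):
    print(problem)
```

A real XSD also constrains element order, cardinality, attribute names, and value types, which is what makes it useful for validating files and for documenting rarely used result types.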
CSX Schema v1.0 (schema diagrams, four slides)
CSX – Publication Information
CSX – Molecular System Information
CSX – Calculated Result Information
Future Plans
• Work on CSX 2.0 is ongoing: expanding it to multiple systems and sets of calculated results
• Develop a CSX-focused website with converter functionality, libraries, and documentation
• Engage CC software users/programmers to get involved with the project
• Organize a community developer workshop in summer 2015
• Publish version 2.0 of CSX in Fall 2015
Conclusion
• CSX started out as a stepping stone to transfer information to the Chemical Semantics (CS) portal
• Having a data standard for CC is an important development in and of itself
• The CC community can do more with their data:
  • Leverage XML tools to visualize, process, etc.
  • Compare results across CC packages
  • Validate results
  • Reference basis sets (https://bse.pnl.gov/)
Questions?
• schalk@unf.edu
• Phone: 904-620-1938
• Skype: stuartchalk
• LinkedIn/SlideShare: https://www.linkedin.com/in/stuchalk
• ORCID: http://orcid.org/0000-0002-0703-7776
• ResearcherID: http://www.researcherid.com/rid/D-8577-2013
