SlideShare ist ein Scribd-Unternehmen logo
1 von 40
Capturing Chemistry in XML/CML J. A. Townsend * ,  S. E. Adams *  , J. M. Goodman * ,  P. Murray-Rust * , C. A. Waudby *   Capturing Chemistry in XML/CML ACS March 2004 *  Unilever Centre for Molecular Informatics, University of Cambridge
The Agony Of  Publication - Loss Capturing Chemistry in XML/CML ACS March 2004 The World
The Agony Of  Publication - Loss Capturing Chemistry in XML/CML ACS March 2004 The World Sad The Scientist The Lab Journals Web Pages
The Vision-1 Capturing Chemistry in XML/CML ACS March 2004 < scalar  dictRef =“ ccml:mp ” units =“units:c” minValue =“65” maxValue =“66”  /> mp 65-66   C Human-readable Machine-readable
The Vision-2 ,[object Object],Capturing Chemistry in XML/CML ACS March 2004 ,[object Object],[object Object],[object Object],[object Object],But also
Our Approach ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Capturing Chemistry in XML/CML ACS March 2004
Machine Parsing  of Chemistry Capturing Chemistry in XML/CML ACS March 2004 Structured (CompChem) Semi-Structured (Articles) Unstructured (Discussion) Structured  documents and data in  XML MACHINE PARSING   ?
How? Abstract Discussion Experimental Capturing Chemistry in XML/CML ACS March 2004 Article semi- structured Add  Structure Parse with Regular Expressions Legacy to CML  converters
Regular Expressions Capturing Chemistry in XML/CML ACS March 2004 ,[object Object],Maybe ‘.’ Any  punctuation 0 or more digits Capital ‘ C’ Melting point: two possible syntaxes Capital or  lowercase ‘m’ Lowercase ‘ p’ Maybe whitespace Maybe degrees sign m.p. > 23.5 °C mp 23.5 – 25 °C
CML - XML For  Chemistry ,[object Object],[object Object],[object Object],[object Object],[object Object],Capturing Chemistry in XML/CML ACS March 2004 J. Chem. Inf. Comp. Sci.,  2003 ,  43 , 757
The CML Family Controlled XMLNamespaces: CMLCore – compounds and properties CMLReact – reactions CMLSpect – spectra * CMLComp – compChem CMLCryst – crystallography and condensed matter Interoperates with HTML, MathML, SVG,  * AniML + ,  * ThermoML $ , etc. Capturing Chemistry in XML/CML ACS March 2004 + spectra: ANSI/JCAMP $ thermochemistry: NIST J. Chem. Inf. Comp. Sci.,  2003 ,  43 , 757
Case Studies Parsing output from 750,000 MOPAC jobs High-throughput parsing of journals Capturing Chemistry in XML/CML ACS March 2004
CompChem Logs Capturing Chemistry in XML/CML ACS March 2004 Coordinates Molecular Formula Calculation Type Point Group Dipole Total Energy
Loss From CompChem Capturing Chemistry in XML/CML ACS March 2004 Coordinates Molecular Formula Calculation Type Dipole Total Energy Ionisation Potential
Loss From CompChem Capturing Chemistry in XML/CML ACS March 2004 Coordinates Molecular Formula Calculation Type Dipole Total Energy Ionisation Potential
Parsing Data CompChem Output Capturing Chemistry in XML/CML ACS March 2004 Coordinates Energy Levels Vibrations Coordinates Energy Level Vibration CML File CMLCore CMLCore CMLComp CMLSpect Input/jobControl General Parsers
Display Process 1 Capturing Chemistry in XML/CML ACS March 2004 CompChem Log Xindice CML XSLT
Display Process 2 Capturing Chemistry in XML/CML ACS March 2004 CML File CMLCore CMLCore CMLComp CMLSpect compChem Output 3D structure, electronic properties Coordinates Energy Levels Vibrations Input/jobControl XSLT Display Normal modes 2D structure,  thermodynamic properties
Parsing Data Capturing Chemistry in XML/CML ACS March 2004 Dictionary Entry: The pointgroup of a molecule ... The Schoenflies convention is  normally used, but Hermann  Mauguin is also allowed. D [debye] ParentSI: c.m Multiplier: 3.335641E-30 CGS units for electric dipole
Dictionaries Capturing Chemistry in XML/CML ACS March 2004 < scalar  dictRef =“ ccml:mp ” units =“units:c” minValue =“65” maxValue =“66”  /> Linked to CML schema Accesses CCML  namespace Units dictionary id =&quot;celsius&quot;  name =&quot;Celsius&quot;  parentSI =&quot;k&quot; multiplierToSI =&quot;1&quot;  constantToSI =&quot;273.15&quot;  abbreviation =&quot;C&quot;  unitType =&quot;temp&quot; id =&quot;meltrange&quot;  term =&quot;Melting range&quot; definition =&quot;Minimum and maximum values of melting range in degrees Celsius&quot;
OSCAR Open Source Chemistry Analysis Routines Capturing Chemistry in XML/CML ACS March 2004 Sponsored by the Royal Society of Chemistry (Cambridge) Mounted on http://www.rsc.org/
Article Structure Capturing Chemistry in XML/CML ACS March 2004 Front Matter Abstract Introduction Discussion Experimental References Results Article
Article Structure Capturing Chemistry in XML/CML ACS March 2004 Front Matter Abstract Introduction Discussion Experimental References Results Article
Article Structure Capturing Chemistry in XML/CML ACS March 2004 Front Matter Abstract Introduction Discussion Experimental References Results Article
Article Structure Capturing Chemistry in XML/CML ACS March 2004 Front Matter Abstract Introduction Discussion Experimental References Results Article
Article Structure Capturing Chemistry in XML/CML ACS March 2004 Front Matter Abstract Introduction Discussion Experimental References Results Article
Article Structure Capturing Chemistry in XML/CML ACS March 2004 Front Matter Abstract Introduction Discussion Experimental References Results Synthesis Set up Analysis Compound Name Article Experimental
Information  Checked / Extracted Capturing Chemistry in XML/CML ACS March 2004 ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
OSCAR Parsing Data Capturing Chemistry in XML/CML ACS March 2004 H NMR Nature HRMS
OSCAR Parsing Data Capturing Chemistry in XML/CML ACS March 2004
OSCAR Data Found Capturing Chemistry in XML/CML ACS March 2004 Results from one paper
OSCAR Error Checking Capturing Chemistry in XML/CML ACS March 2004 Serious Error Warning Type 1 Warning Type 2
OSCAR Error Checking Capturing Chemistry in XML/CML ACS March 2004 ~30 errors / warnings  searched for This article has: 4 errors 2 warnings (type 1) 30 warnings (type 2) Elemental analysis, incorrect – calculations are for a different molecular formula
OSCAR Data Presentation Capturing Chemistry in XML/CML ACS March 2004
OSCAR Speed Capturing Chemistry in XML/CML ACS March 2004 A typical paper contains ca. 20 compounds JOC (Feb 2004) contains ~600 compounds OSCAR could extract and tabulate in under 5 minutes OBC (Feb 2004) contains ~300 compounds OSCAR could extract and tabulate in under 3 minutes High throughput, high precision
OSCAR Accuracy Capturing Chemistry in XML/CML ACS March 2004 92 % of Data Correctly Identified 3 % incorrect  author entry 5 % missed 437 items, ~10,000 data fields in test set, working with current Regular Expressions False-positives: 3 %
XML-CML Databases Capturing Chemistry in XML/CML ACS March 2004 CML Journals Theses CompChem XMLDb can support > 250,000 molecules Millisecond retrieval on INChI, properties Xindice
Capturing Molecules Capturing Chemistry in XML/CML ACS March 2004 ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Encourage chemists to
NLP & Parsing Names Capturing Chemistry in XML/CML ACS March 2004 KEY:  Locant  Characteristic Group  Mono valent parent hydride Multiplier  Heterocyclic parent hydride
Thank You Unilever RSC Jonathan Goodman Sam Adams Fraser Norton Chris Waudby Yong Zhang Capturing Chemistry in XML/CML ACS March 2004

Weitere ähnliche Inhalte

Was ist angesagt?

General Concepts in QSAR for Using the QSAR Application Toolbox Part 2
General Concepts in QSAR for Using the QSAR Application Toolbox Part 2General Concepts in QSAR for Using the QSAR Application Toolbox Part 2
General Concepts in QSAR for Using the QSAR Application Toolbox Part 2International QSAR Foundation
 
ACSSA Halide-Water Poster
ACSSA Halide-Water PosterACSSA Halide-Water Poster
ACSSA Halide-Water PosterJiarong Zhou
 
General Concepts in QSAR for Using the QSAR Application Toolbox Part 1
General Concepts in QSAR for Using the QSAR Application Toolbox Part 1General Concepts in QSAR for Using the QSAR Application Toolbox Part 1
General Concepts in QSAR for Using the QSAR Application Toolbox Part 1International QSAR Foundation
 
Quantitative Structure Activity Relationship (QSAR)
Quantitative Structure Activity Relationship (QSAR)Quantitative Structure Activity Relationship (QSAR)
Quantitative Structure Activity Relationship (QSAR)Theabhi.in
 
DavidWooChemEResearchPosterv2
DavidWooChemEResearchPosterv2DavidWooChemEResearchPosterv2
DavidWooChemEResearchPosterv2David Woo
 
Introduction to OECD QSAR Toolbox
Introduction to OECD QSAR ToolboxIntroduction to OECD QSAR Toolbox
Introduction to OECD QSAR Toolboxguestcfca1eb1
 
Fac/Mer Isomerism in Fe(II) Complexes
Fac/Mer Isomerism in Fe(II) ComplexesFac/Mer Isomerism in Fe(II) Complexes
Fac/Mer Isomerism in Fe(II) ComplexesRafia Aslam
 
Linking Ab Initio-Calphad for the Assessment of the AluminiumLutetium System
Linking Ab Initio-Calphad for the Assessment of the AluminiumLutetium SystemLinking Ab Initio-Calphad for the Assessment of the AluminiumLutetium System
Linking Ab Initio-Calphad for the Assessment of the AluminiumLutetium SystemIRJESJOURNAL
 
Free wilson analysis qsar
Free wilson analysis qsarFree wilson analysis qsar
Free wilson analysis qsarRahul B S
 
Introduction to Quantitative Structure Activity Relationships
Introduction to Quantitative Structure Activity RelationshipsIntroduction to Quantitative Structure Activity Relationships
Introduction to Quantitative Structure Activity RelationshipsOmar Sokkar
 
Quantum mechanical study the kinetics, mechanisms and
Quantum mechanical study the kinetics, mechanisms andQuantum mechanical study the kinetics, mechanisms and
Quantum mechanical study the kinetics, mechanisms andAlexander Decker
 
Chem 2 - Chemical Kinetics III - Determining the Rate Law with the Method of ...
Chem 2 - Chemical Kinetics III - Determining the Rate Law with the Method of ...Chem 2 - Chemical Kinetics III - Determining the Rate Law with the Method of ...
Chem 2 - Chemical Kinetics III - Determining the Rate Law with the Method of ...Lumen Learning
 
A correlation for the prediction of thermal conductivity of liquids
A correlation for the prediction of thermal conductivity of liquidsA correlation for the prediction of thermal conductivity of liquids
A correlation for the prediction of thermal conductivity of liquidsJosemar Pereira da Silva
 
Steric parameters taft’s steric factor (es)
Steric parameters  taft’s steric factor (es)Steric parameters  taft’s steric factor (es)
Steric parameters taft’s steric factor (es)Shikha Popali
 

Was ist angesagt? (20)

General Concepts in QSAR for Using the QSAR Application Toolbox Part 2
General Concepts in QSAR for Using the QSAR Application Toolbox Part 2General Concepts in QSAR for Using the QSAR Application Toolbox Part 2
General Concepts in QSAR for Using the QSAR Application Toolbox Part 2
 
ACSSA Halide-Water Poster
ACSSA Halide-Water PosterACSSA Halide-Water Poster
ACSSA Halide-Water Poster
 
General Concepts in QSAR for Using the QSAR Application Toolbox Part 1
General Concepts in QSAR for Using the QSAR Application Toolbox Part 1General Concepts in QSAR for Using the QSAR Application Toolbox Part 1
General Concepts in QSAR for Using the QSAR Application Toolbox Part 1
 
Quantitative Structure Activity Relationship (QSAR)
Quantitative Structure Activity Relationship (QSAR)Quantitative Structure Activity Relationship (QSAR)
Quantitative Structure Activity Relationship (QSAR)
 
DavidWooChemEResearchPosterv2
DavidWooChemEResearchPosterv2DavidWooChemEResearchPosterv2
DavidWooChemEResearchPosterv2
 
Introduction to OECD QSAR Toolbox
Introduction to OECD QSAR ToolboxIntroduction to OECD QSAR Toolbox
Introduction to OECD QSAR Toolbox
 
Computer Simulation of EPR Orthorhombic Jahn-Teller Spectra of Cu2+in Cd2(NH4...
Computer Simulation of EPR Orthorhombic Jahn-Teller Spectra of Cu2+in Cd2(NH4...Computer Simulation of EPR Orthorhombic Jahn-Teller Spectra of Cu2+in Cd2(NH4...
Computer Simulation of EPR Orthorhombic Jahn-Teller Spectra of Cu2+in Cd2(NH4...
 
Fac/Mer Isomerism in Fe(II) Complexes
Fac/Mer Isomerism in Fe(II) ComplexesFac/Mer Isomerism in Fe(II) Complexes
Fac/Mer Isomerism in Fe(II) Complexes
 
Qsar lecture
Qsar lectureQsar lecture
Qsar lecture
 
Linking Ab Initio-Calphad for the Assessment of the AluminiumLutetium System
Linking Ab Initio-Calphad for the Assessment of the AluminiumLutetium SystemLinking Ab Initio-Calphad for the Assessment of the AluminiumLutetium System
Linking Ab Initio-Calphad for the Assessment of the AluminiumLutetium System
 
Free wilson analysis qsar
Free wilson analysis qsarFree wilson analysis qsar
Free wilson analysis qsar
 
Introduction to Quantitative Structure Activity Relationships
Introduction to Quantitative Structure Activity RelationshipsIntroduction to Quantitative Structure Activity Relationships
Introduction to Quantitative Structure Activity Relationships
 
Poster
PosterPoster
Poster
 
Quantum mechanical study the kinetics, mechanisms and
Quantum mechanical study the kinetics, mechanisms andQuantum mechanical study the kinetics, mechanisms and
Quantum mechanical study the kinetics, mechanisms and
 
Chem 2 - Chemical Kinetics III - Determining the Rate Law with the Method of ...
Chem 2 - Chemical Kinetics III - Determining the Rate Law with the Method of ...Chem 2 - Chemical Kinetics III - Determining the Rate Law with the Method of ...
Chem 2 - Chemical Kinetics III - Determining the Rate Law with the Method of ...
 
QSAR
QSARQSAR
QSAR
 
Qsar ppt
Qsar pptQsar ppt
Qsar ppt
 
A correlation for the prediction of thermal conductivity of liquids
A correlation for the prediction of thermal conductivity of liquidsA correlation for the prediction of thermal conductivity of liquids
A correlation for the prediction of thermal conductivity of liquids
 
QSAR
QSARQSAR
QSAR
 
Steric parameters taft’s steric factor (es)
Steric parameters  taft’s steric factor (es)Steric parameters  taft’s steric factor (es)
Steric parameters taft’s steric factor (es)
 

Andere mochten auch

Effective Capability Building
Effective Capability BuildingEffective Capability Building
Effective Capability BuildingMohit Mittal
 
CHUYÊN ĐỀ LƯỢNG GIÁC CHƯƠNG 1 ĐẠI SỐ 11 MỚI NHẤT - HAY NHẤT
CHUYÊN ĐỀ LƯỢNG GIÁC CHƯƠNG 1 ĐẠI SỐ 11 MỚI NHẤT - HAY NHẤTCHUYÊN ĐỀ LƯỢNG GIÁC CHƯƠNG 1 ĐẠI SỐ 11 MỚI NHẤT - HAY NHẤT
CHUYÊN ĐỀ LƯỢNG GIÁC CHƯƠNG 1 ĐẠI SỐ 11 MỚI NHẤT - HAY NHẤTHoàng Thái Việt
 
ly thuyet va de kiem tra vat ly 7 hoc ky 1
ly thuyet va de kiem tra vat ly 7 hoc ky 1ly thuyet va de kiem tra vat ly 7 hoc ky 1
ly thuyet va de kiem tra vat ly 7 hoc ky 1Hoàng Thái Việt
 
Luento ammattilaiset, lauran muokkaama pohja 2014
Luento ammattilaiset, lauran muokkaama pohja 2014Luento ammattilaiset, lauran muokkaama pohja 2014
Luento ammattilaiset, lauran muokkaama pohja 20140458452713
 
Digital transformation : The Necessity
Digital transformation : The NecessityDigital transformation : The Necessity
Digital transformation : The NecessityMohit Mittal
 
TONG HOP DE KIEM TRA CHUONG 2 DAI SO 11 HAY
TONG HOP DE KIEM TRA CHUONG 2 DAI SO 11 HAYTONG HOP DE KIEM TRA CHUONG 2 DAI SO 11 HAY
TONG HOP DE KIEM TRA CHUONG 2 DAI SO 11 HAYHoàng Thái Việt
 
Juknis lomba karya ilmiah
Juknis  lomba karya ilmiahJuknis  lomba karya ilmiah
Juknis lomba karya ilmiahJack Sudarto
 
Enabling Voice Applications with WebRTC and ORTC in Microsoft Edge
Enabling Voice Applications with WebRTC and ORTC in Microsoft EdgeEnabling Voice Applications with WebRTC and ORTC in Microsoft Edge
Enabling Voice Applications with WebRTC and ORTC in Microsoft EdgeMark Roberts
 
Chuyen de hinh hoc khong gian
Chuyen de hinh hoc khong gianChuyen de hinh hoc khong gian
Chuyen de hinh hoc khong gianonthi360
 
SE_Lec 11_ Project Management
SE_Lec 11_ Project ManagementSE_Lec 11_ Project Management
SE_Lec 11_ Project ManagementAmr E. Mohamed
 
Automated Securities Accounting System
Automated Securities Accounting System Automated Securities Accounting System
Automated Securities Accounting System SRI Infotech
 
Singapore intresting facts
Singapore intresting factsSingapore intresting facts
Singapore intresting factsYasir Shah
 
evowatcger - computer monitoring system
evowatcger - computer monitoring systemevowatcger - computer monitoring system
evowatcger - computer monitoring systemCatalin Muresan
 
وحدة التعلم الذاتي 2015
وحدة التعلم الذاتي 2015وحدة التعلم الذاتي 2015
وحدة التعلم الذاتي 2015Haitham El-Ghareeb
 

Andere mochten auch (20)

Effective Capability Building
Effective Capability BuildingEffective Capability Building
Effective Capability Building
 
MS Dynamics AX 2012
MS Dynamics AX 2012MS Dynamics AX 2012
MS Dynamics AX 2012
 
Neha_Resume_Dev
Neha_Resume_DevNeha_Resume_Dev
Neha_Resume_Dev
 
F.D
F.DF.D
F.D
 
Jane Howard
Jane HowardJane Howard
Jane Howard
 
Xp day roberto20130323
Xp day roberto20130323Xp day roberto20130323
Xp day roberto20130323
 
CHUYÊN ĐỀ LƯỢNG GIÁC CHƯƠNG 1 ĐẠI SỐ 11 MỚI NHẤT - HAY NHẤT
CHUYÊN ĐỀ LƯỢNG GIÁC CHƯƠNG 1 ĐẠI SỐ 11 MỚI NHẤT - HAY NHẤTCHUYÊN ĐỀ LƯỢNG GIÁC CHƯƠNG 1 ĐẠI SỐ 11 MỚI NHẤT - HAY NHẤT
CHUYÊN ĐỀ LƯỢNG GIÁC CHƯƠNG 1 ĐẠI SỐ 11 MỚI NHẤT - HAY NHẤT
 
diningfacilityconcept
diningfacilityconceptdiningfacilityconcept
diningfacilityconcept
 
ly thuyet va de kiem tra vat ly 7 hoc ky 1
ly thuyet va de kiem tra vat ly 7 hoc ky 1ly thuyet va de kiem tra vat ly 7 hoc ky 1
ly thuyet va de kiem tra vat ly 7 hoc ky 1
 
Luento ammattilaiset, lauran muokkaama pohja 2014
Luento ammattilaiset, lauran muokkaama pohja 2014Luento ammattilaiset, lauran muokkaama pohja 2014
Luento ammattilaiset, lauran muokkaama pohja 2014
 
Digital transformation : The Necessity
Digital transformation : The NecessityDigital transformation : The Necessity
Digital transformation : The Necessity
 
TONG HOP DE KIEM TRA CHUONG 2 DAI SO 11 HAY
TONG HOP DE KIEM TRA CHUONG 2 DAI SO 11 HAYTONG HOP DE KIEM TRA CHUONG 2 DAI SO 11 HAY
TONG HOP DE KIEM TRA CHUONG 2 DAI SO 11 HAY
 
Juknis lomba karya ilmiah
Juknis  lomba karya ilmiahJuknis  lomba karya ilmiah
Juknis lomba karya ilmiah
 
Enabling Voice Applications with WebRTC and ORTC in Microsoft Edge
Enabling Voice Applications with WebRTC and ORTC in Microsoft EdgeEnabling Voice Applications with WebRTC and ORTC in Microsoft Edge
Enabling Voice Applications with WebRTC and ORTC in Microsoft Edge
 
Chuyen de hinh hoc khong gian
Chuyen de hinh hoc khong gianChuyen de hinh hoc khong gian
Chuyen de hinh hoc khong gian
 
SE_Lec 11_ Project Management
SE_Lec 11_ Project ManagementSE_Lec 11_ Project Management
SE_Lec 11_ Project Management
 
Automated Securities Accounting System
Automated Securities Accounting System Automated Securities Accounting System
Automated Securities Accounting System
 
Singapore intresting facts
Singapore intresting factsSingapore intresting facts
Singapore intresting facts
 
evowatcger - computer monitoring system
evowatcger - computer monitoring systemevowatcger - computer monitoring system
evowatcger - computer monitoring system
 
وحدة التعلم الذاتي 2015
وحدة التعلم الذاتي 2015وحدة التعلم الذاتي 2015
وحدة التعلم الذاتي 2015
 

Ähnlich wie Capturing Chemistry in XML/CML ACS March 2004

Quantum pharmacology. Basics
Quantum pharmacology. BasicsQuantum pharmacology. Basics
Quantum pharmacology. BasicsMobiliuz
 
Cheminformatics II
Cheminformatics IICheminformatics II
Cheminformatics IIbaoilleach
 
Computational Organic Chemistry
Computational Organic ChemistryComputational Organic Chemistry
Computational Organic ChemistryIsamu Katsuyama
 
AWMA Presentation Application of Two State-of-the-art Dispersion Models
AWMA Presentation Application of Two State-of-the-art Dispersion ModelsAWMA Presentation Application of Two State-of-the-art Dispersion Models
AWMA Presentation Application of Two State-of-the-art Dispersion Modelsmtingle
 
Free online access to experimental and predicted chemical properties through ...
Free online access to experimental and predicted chemical properties through ...Free online access to experimental and predicted chemical properties through ...
Free online access to experimental and predicted chemical properties through ...Kamel Mansouri
 
How to use data to design and optimize reaction? A quick introduction to work...
How to use data to design and optimize reaction? A quick introduction to work...How to use data to design and optimize reaction? A quick introduction to work...
How to use data to design and optimize reaction? A quick introduction to work...Ichigaku Takigawa
 
Molecular Simulation to build models for enzyme induced fit
Molecular Simulation to build models for enzyme induced fit Molecular Simulation to build models for enzyme induced fit
Molecular Simulation to build models for enzyme induced fit MinSung Kim
 
Energy Minimization Using Gromacs
Energy Minimization Using GromacsEnergy Minimization Using Gromacs
Energy Minimization Using GromacsRajendra K Labala
 
Cheminformatics, concept by kk sahu sir
Cheminformatics, concept by kk sahu sirCheminformatics, concept by kk sahu sir
Cheminformatics, concept by kk sahu sirKAUSHAL SAHU
 
Lecture_No._2_Computational_Chemistry_Tools___Application_of_computational_me...
Lecture_No._2_Computational_Chemistry_Tools___Application_of_computational_me...Lecture_No._2_Computational_Chemistry_Tools___Application_of_computational_me...
Lecture_No._2_Computational_Chemistry_Tools___Application_of_computational_me...ManavBhugun3
 
Parameterization of force field
Parameterization of force fieldParameterization of force field
Parameterization of force fieldJose Luis
 
Machine Learning in Chemistry: Part I
Machine Learning in Chemistry: Part IMachine Learning in Chemistry: Part I
Machine Learning in Chemistry: Part IJon Paul Janet
 
molecular mechanics and quantum mechnics
molecular mechanics and quantum mechnicsmolecular mechanics and quantum mechnics
molecular mechanics and quantum mechnicsRAKESH JAGTAP
 
Canonicalized systematic nomenclature in cheminformatics
Canonicalized systematic nomenclature in cheminformaticsCanonicalized systematic nomenclature in cheminformatics
Canonicalized systematic nomenclature in cheminformaticsJeremy Yang
 
Hydrogen fuel cells for the automotive system
Hydrogen fuel cells for the automotive systemHydrogen fuel cells for the automotive system
Hydrogen fuel cells for the automotive systemOmar Qasim
 

Ähnlich wie Capturing Chemistry in XML/CML ACS March 2004 (20)

Poster_Jun 2014
Poster_Jun 2014Poster_Jun 2014
Poster_Jun 2014
 
Quantum pharmacology. Basics
Quantum pharmacology. BasicsQuantum pharmacology. Basics
Quantum pharmacology. Basics
 
Cheminformatics II
Cheminformatics IICheminformatics II
Cheminformatics II
 
Computational Organic Chemistry
Computational Organic ChemistryComputational Organic Chemistry
Computational Organic Chemistry
 
AWMA Presentation Application of Two State-of-the-art Dispersion Models
AWMA Presentation Application of Two State-of-the-art Dispersion ModelsAWMA Presentation Application of Two State-of-the-art Dispersion Models
AWMA Presentation Application of Two State-of-the-art Dispersion Models
 
Identification of “Known Unknowns” Utilizing Accurate Mass Data and ChemSpider
Identification of “Known Unknowns” Utilizing Accurate Mass Data and ChemSpiderIdentification of “Known Unknowns” Utilizing Accurate Mass Data and ChemSpider
Identification of “Known Unknowns” Utilizing Accurate Mass Data and ChemSpider
 
Free online access to experimental and predicted chemical properties through ...
Free online access to experimental and predicted chemical properties through ...Free online access to experimental and predicted chemical properties through ...
Free online access to experimental and predicted chemical properties through ...
 
How to use data to design and optimize reaction? A quick introduction to work...
How to use data to design and optimize reaction? A quick introduction to work...How to use data to design and optimize reaction? A quick introduction to work...
How to use data to design and optimize reaction? A quick introduction to work...
 
Molecular Simulation to build models for enzyme induced fit
Molecular Simulation to build models for enzyme induced fit Molecular Simulation to build models for enzyme induced fit
Molecular Simulation to build models for enzyme induced fit
 
Energy Minimization Using Gromacs
Energy Minimization Using GromacsEnergy Minimization Using Gromacs
Energy Minimization Using Gromacs
 
Cheminformatics, concept by kk sahu sir
Cheminformatics, concept by kk sahu sirCheminformatics, concept by kk sahu sir
Cheminformatics, concept by kk sahu sir
 
Lecture_No._2_Computational_Chemistry_Tools___Application_of_computational_me...
Lecture_No._2_Computational_Chemistry_Tools___Application_of_computational_me...Lecture_No._2_Computational_Chemistry_Tools___Application_of_computational_me...
Lecture_No._2_Computational_Chemistry_Tools___Application_of_computational_me...
 
A01 9-1
A01 9-1A01 9-1
A01 9-1
 
Parameterization of force field
Parameterization of force fieldParameterization of force field
Parameterization of force field
 
Machine Learning in Chemistry: Part I
Machine Learning in Chemistry: Part IMachine Learning in Chemistry: Part I
Machine Learning in Chemistry: Part I
 
molecular mechanics and quantum mechnics
molecular mechanics and quantum mechnicsmolecular mechanics and quantum mechnics
molecular mechanics and quantum mechnics
 
The importance of standards for data exchange and interchange on the Royal So...
The importance of standards for data exchange and interchange on the Royal So...The importance of standards for data exchange and interchange on the Royal So...
The importance of standards for data exchange and interchange on the Royal So...
 
The EPA Online Prediction Physicochemical Prediction Platform to Support Envi...
The EPA Online Prediction Physicochemical Prediction Platform to Support Envi...The EPA Online Prediction Physicochemical Prediction Platform to Support Envi...
The EPA Online Prediction Physicochemical Prediction Platform to Support Envi...
 
Canonicalized systematic nomenclature in cheminformatics
Canonicalized systematic nomenclature in cheminformaticsCanonicalized systematic nomenclature in cheminformatics
Canonicalized systematic nomenclature in cheminformatics
 
Hydrogen fuel cells for the automotive system
Hydrogen fuel cells for the automotive systemHydrogen fuel cells for the automotive system
Hydrogen fuel cells for the automotive system
 

Kürzlich hochgeladen

Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
The Evolution of Money: Digital Transformation and CBDCs in Central Banking
The Evolution of Money: Digital Transformation and CBDCs in Central BankingThe Evolution of Money: Digital Transformation and CBDCs in Central Banking
The Evolution of Money: Digital Transformation and CBDCs in Central BankingSelcen Ozturkcan
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGSujit Pal
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 

Kürzlich hochgeladen (20)

Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
The Evolution of Money: Digital Transformation and CBDCs in Central Banking
The Evolution of Money: Digital Transformation and CBDCs in Central BankingThe Evolution of Money: Digital Transformation and CBDCs in Central Banking
The Evolution of Money: Digital Transformation and CBDCs in Central Banking
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 

Capturing Chemistry in XML/CML ACS March 2004

  • 1. Capturing Chemistry in XML/CML J. A. Townsend * , S. E. Adams * , J. M. Goodman * , P. Murray-Rust * , C. A. Waudby * Capturing Chemistry in XML/CML ACS March 2004 * Unilever Centre for Molecular Informatics, University of Cambridge
  • 2. The Agony Of Publication - Loss Capturing Chemistry in XML/CML ACS March 2004 The World
  • 3. The Agony Of Publication - Loss Capturing Chemistry in XML/CML ACS March 2004 The World Sad The Scientist The Lab Journals Web Pages
  • 4. The Vision-1 Capturing Chemistry in XML/CML ACS March 2004 < scalar dictRef =“ ccml:mp ” units =“units:c” minValue =“65” maxValue =“66” /> mp 65-66  C Human-readable Machine-readable
  • 5.
  • 6.
  • 7. Machine Parsing of Chemistry Capturing Chemistry in XML/CML ACS March 2004 Structured (CompChem) Semi-Structured (Articles) Unstructured (Discussion) Structured documents and data in XML MACHINE PARSING ?
  • 8. How? Abstract Discussion Experimental Capturing Chemistry in XML/CML ACS March 2004 Article semi- structured Add Structure Parse with Regular Expressions Legacy to CML converters
  • 9.
  • 10.
  • 11. The CML Family Controlled XMLNamespaces: CMLCore – compounds and properties CMLReact – reactions CMLSpect – spectra * CMLComp – compChem CMLCryst – crystallography and condensed matter Interoperates with HTML, MathML, SVG, * AniML + , * ThermoML $ , etc. Capturing Chemistry in XML/CML ACS March 2004 + spectra: ANSI/JCAMP $ thermochemistry: NIST J. Chem. Inf. Comp. Sci., 2003 , 43 , 757
  • 12. Case Studies Parsing output from 750,000 MOPAC jobs High-throughput parsing of journals Capturing Chemistry in XML/CML ACS March 2004
  • 13. CompChem Logs Capturing Chemistry in XML/CML ACS March 2004 Coordinates Molecular Formula Calculation Type Point Group Dipole Total Energy
  • 14. Loss From CompChem Capturing Chemistry in XML/CML ACS March 2004 Coordinates Molecular Formula Calculation Type Dipole Total Energy Ionisation Potential
  • 15. Loss From CompChem Capturing Chemistry in XML/CML ACS March 2004 Coordinates Molecular Formula Calculation Type Dipole Total Energy Ionisation Potential
  • 16. Parsing Data CompChem Output Capturing Chemistry in XML/CML ACS March 2004 Coordinates Energy Levels Vibrations Coordinates Energy Level Vibration CML File CMLCore CMLCore CMLComp CMLSpect Input/jobControl General Parsers
  • 17. Display Process 1 Capturing Chemistry in XML/CML ACS March 2004 CompChem Log Xindice CML XSLT
  • 18. Display Process 2 Capturing Chemistry in XML/CML ACS March 2004 CML File CMLCore CMLCore CMLComp CMLSpect compChem Output 3D structure, electronic properties Coordinates Energy Levels Vibrations Input/jobControl XSLT Display Normal modes 2D structure, thermodynamic properties
  • 19. Parsing Data Capturing Chemistry in XML/CML ACS March 2004 Dictionary Entry: The pointgroup of a molecule ... The Schoenflies convention is normally used, but Hermann Mauguin is also allowed. D [debye] ParentSI: c.m Multiplier: 3.335641E-30 CGS units for electric dipole
  • 20. Dictionaries Capturing Chemistry in XML/CML ACS March 2004 < scalar dictRef =“ ccml:mp ” units =“units:c” minValue =“65” maxValue =“66” /> Linked to CML schema Accesses CCML namespace Units dictionary id =&quot;celsius&quot; name =&quot;Celsius&quot; parentSI =&quot;k&quot; multiplierToSI =&quot;1&quot; constantToSI =&quot;273.15&quot; abbreviation =&quot;C&quot; unitType =&quot;temp&quot; id =&quot;meltrange&quot; term =&quot;Melting range&quot; definition =&quot;Minimum and maximum values of melting range in degrees Celsius&quot;
  • 21. OSCAR Open Source Chemistry Analysis Routines Capturing Chemistry in XML/CML ACS March 2004 Sponsored by the Royal Society of Chemistry (Cambridge) Mounted on http://www.rsc.org/
  • 22. Article Structure Capturing Chemistry in XML/CML ACS March 2004 Front Matter Abstract Introduction Discussion Experimental References Results Article
  • 23. Article Structure Capturing Chemistry in XML/CML ACS March 2004 Front Matter Abstract Introduction Discussion Experimental References Results Article
  • 24. Article Structure Capturing Chemistry in XML/CML ACS March 2004 Front Matter Abstract Introduction Discussion Experimental References Results Article
  • 25. Article Structure Capturing Chemistry in XML/CML ACS March 2004 Front Matter Abstract Introduction Discussion Experimental References Results Article
  • 26. Article Structure Capturing Chemistry in XML/CML ACS March 2004 Front Matter Abstract Introduction Discussion Experimental References Results Article
  • 27. Article Structure Capturing Chemistry in XML/CML ACS March 2004 Front Matter Abstract Introduction Discussion Experimental References Results Synthesis Set up Analysis Compound Name Article Experimental
  • 28.
  • 29. OSCAR Parsing Data Capturing Chemistry in XML/CML ACS March 2004 H NMR Nature HRMS
  • 30. OSCAR Parsing Data Capturing Chemistry in XML/CML ACS March 2004
  • 31. OSCAR Data Found Capturing Chemistry in XML/CML ACS March 2004 Results from one paper
  • 32. OSCAR Error Checking Capturing Chemistry in XML/CML ACS March 2004 Serious Error Warning Type 1 Warning Type 2
  • 33. OSCAR Error Checking Capturing Chemistry in XML/CML ACS March 2004 ~30 errors / warnings searched for This article has: 4 errors 2 warnings (type 1) 30 warnings (type 2) Elemental analysis, incorrect – calculations are for a different molecular formula
  • 34. OSCAR Data Presentation Capturing Chemistry in XML/CML ACS March 2004
  • 35. OSCAR Speed Capturing Chemistry in XML/CML ACS March 2004 A typical paper contains ca. 20 compounds JOC (Feb 2004) contains ~600 compounds OSCAR could extract and tabulate in under 5 minutes OBC (Feb 2004) contains ~300 compounds OSCAR could extract and tabulate in under 3 minutes High throughput, high precision
  • 36. OSCAR Accuracy Capturing Chemistry in XML/CML ACS March 2004 92 % of Data Correctly Identified 3 % incorrect author entry 5 % missed 437 items, ~10,000 data fields in test set, working with current Regular Expressions False-positives: 3 %
  • 37. XML-CML Databases Capturing Chemistry in XML/CML ACS March 2004 CML Journals Theses CompChem XMLDb can support > 250,000 molecules Millisecond retrieval on INChI, properties Xindice
  • 38.
  • 39. NLP & Parsing Names Capturing Chemistry in XML/CML ACS March 2004 KEY: Locant Characteristic Group Mono valent parent hydride Multiplier Heterocyclic parent hydride
  • 40. Thank You Unilever RSC Jonathan Goodman Sam Adams Fraser Norton Chris Waudby Yong Zhang Capturing Chemistry in XML/CML ACS March 2004