CHIC – Converting Hamburgers Into Cows
Joseph Townsend, jat45@cam.ac.uk
The Scholarly Publication Cycle
What is a Cow?
The character encoding is clearly stated.
The document uses a markup technology to identify components.
The components have meaning, and possibly behaviour, associated with them.
Unreduced data are available.
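As a rough illustration of the first three criteria, here is a minimal sketch of a "cow" fragment. It declares its encoding and tags each component so that software can read it back without guessing. The element and attribute names (`preparation`, `compound`, `role`, `quantity`, `unit`) are hypothetical and are not taken from SciXML, CML, or any other published schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: element/attribute names are illustrative only.
DOC = """<?xml version="1.0" encoding="UTF-8"?>
<preparation id="82">
  <compound role="product">5-Cyclobutyl-2,3-dihydro-[1H]-2-benzazepine</compound>
  <compound role="reagent">Potassium carbonate</compound>
  <quantity unit="g">0.63</quantity>
</preparation>
"""

def components(xml_bytes: bytes) -> list[tuple[str, str, str]]:
    """Return (tag, role-or-unit, text) for each component of the preparation."""
    # Parsing bytes lets the parser honour the declared encoding.
    root = ET.fromstring(xml_bytes)
    out = []
    for el in root:
        attr = el.get("role") or el.get("unit") or ""
        out.append((el.tag, attr, el.text))
    return out
```

Calling `components(DOC.encode("utf-8"))` yields one tuple per tagged component, for example `("compound", "product", "5-Cyclobutyl-2,3-dihydro-[1H]-2-benzazepine")`. No heuristics are needed, which is exactly the difference between a cow and a hamburger.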
What we thought the workflow should look like
[Diagram: standoff annotation file]
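Standoff annotation keeps the source text untouched and records each annotation separately as character offsets plus a label. The sketch below shows that idea under simple assumptions. The `Annotation` class, the `annotate`/`resolve` helpers and the label names are hypothetical. This is not OSCAR's SAF format itself.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Annotation:
    start: int  # character offset into the unmodified source text
    end: int    # exclusive end offset
    kind: str   # hypothetical label, e.g. "chemical" or "quantity"

TEXT = "Potassium carbonate (0.63 g, 4.56 mmol) and thiophenol (0.19 g, 1.69 mmol)"

def annotate(text: str, surface: str, kind: str) -> Annotation:
    """Build an annotation for the first occurrence of `surface` in `text`."""
    i = text.index(surface)
    return Annotation(i, i + len(surface), kind)

def resolve(text: str, anns: list[Annotation]) -> list[tuple[str, str]]:
    """Recover (label, surface string) pairs in document order; the text is never edited."""
    return [(a.kind, text[a.start:a.end]) for a in sorted(anns, key=lambda a: a.start)]

anns = [
    annotate(TEXT, "thiophenol", "chemical"),
    annotate(TEXT, "Potassium carbonate", "chemical"),
    annotate(TEXT, "0.63 g", "quantity"),
]
```

Because the annotations live outside the text, several tools can add their own layers to the same document without fighting over inline markup.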
OSCAR
http://sourceforge.net/projects/oscar3-chem/
http://www.omii.ac.uk/wiki/Nwsltr1209OSCAR
http://tinyurl.com/yakzgkd
Article: Front Matter, Abstract, Introduction, Discussion, Results, Experimental, References
Experimental
Article sections: Front Matter, Abstract, Introduction, Discussion, Results, Experimental, References
Within Experimental: Set up, Compound Name, Synthesis, Analysis
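Knowing which section a sentence comes from helps suppress false positives. One crude way to recover sections is to split on known headings, assuming each heading sits alone on its own line. That assumption is exactly what PDF extraction tends to break. In the sketch below, the heading list comes from the slide. The `split_sections` helper is hypothetical and is not how SciXML is produced.

```python
import re

# Headings taken from the slide; real papers vary, so this list is an assumption.
HEADINGS = ["Abstract", "Introduction", "Results", "Discussion", "Experimental", "References"]
HEADING_RE = re.compile(
    r"^[ \t]*(" + "|".join(HEADINGS) + r")[ \t]*$",  # a heading alone on its line
    re.IGNORECASE | re.MULTILINE,
)

def split_sections(text: str) -> dict[str, str]:
    """Map each recognised heading to the text that follows it, up to the next heading.

    A repeated heading overwrites the earlier entry.
    """
    matches = list(HEADING_RE.finditer(text))
    sections = {}
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections[m.group(1).capitalize()] = text[m.end():end].strip()
    return sections
```

This only works while line breaks survive, which is why the notes recommend recording sections before the text reaches OSCAR.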
DOCX Workflow (part 1)
DOCX Workflow (part 2)
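A DOCX file is a ZIP package whose main body is `word/document.xml` (WordprocessingML). Formatting such as the bold compound numbers mentioned in the notes therefore survives and can be read directly. The sketch below is a minimal one, assuming that bold is set directly on runs (`w:rPr/w:b`). Bold inherited from a paragraph or character style is not detected, and the `bold_runs` helper is hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, in ElementTree's {uri}tag form.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

def bold_runs(docx: bytes) -> list[str]:
    """Return the text of every directly-bolded run in a DOCX package."""
    with zipfile.ZipFile(io.BytesIO(docx)) as z:
        root = ET.fromstring(z.read("word/document.xml"))
    found = []
    for run in root.iter(W + "r"):
        b = run.find(f"{W}rPr/{W}b")
        # <w:b/> means bold; <w:b w:val="0"/> or "false" explicitly turns it off.
        if b is not None and b.get(W + "val", "true") not in ("0", "false"):
            found.append("".join(t.text or "" for t in run.iter(W + "t")))
    return found
```

In an experimental section, the bold runs returned this way are good candidates for compound identifiers such as "82".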
OREChem
[Diagram labels: PDF, PSU, Soton, Atom, SVG, Text, Cam, CrystalEye, PubChem, Molecules, Gaussian workflow, ORE, Triplestore, IU]
http://research.microsoft.com/en-us/projects/orechem/
What can we do with a Cow?
5-Cyclobutyl-2,3-dihydro-[1H]-2-benzazepine 82: Potassium carbonate (0.63 g, 4.56 mmol) and thiophenol (0.19 g, 1.69 mmol) were added to the 2-nitrobenzene sulfonamide 50 (0.50 g, 1.302 mmol) in N,N-dimethylformamide (33 cm³) at room temperature and the mixture was stirred for 16 h. Deionised water (50 cm³) was added and the aqueous phase was extracted with ethyl acetate (5 × 50 cm³). The organic extracts were dried (MgSO₄) and concentrated under reduced pressure to give the title compound 82 (0.259 g, 1.302 mmol, ca. 100%) as an oil, used without further purification.
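Once quantities are machine-readable, simple consistency checks become possible. The sketch below pulls out every "(mass g, amount mmol)" pair with a regular expression. It then checks the implied molar mass, since 0.63 g / 4.56 mmol ≈ 138.16 g/mol, close to that of K₂CO₃. It also recomputes the yield. The pattern handles only this one notation, and the helper names are hypothetical. The yield calculation assumes sulfonamide 50 is the limiting reagent.

```python
import re

# Matches "(0.63 g, 4.56 mmol" style quantities; other notations are ignored.
QTY = re.compile(r"\((\d+(?:\.\d+)?)\s*g,\s*(\d+(?:\.\d+)?)\s*mmol")

def quantities(text: str) -> list[tuple[float, float]]:
    """Return (grams, mmol) pairs in the order they appear."""
    return [(float(g), float(mmol)) for g, mmol in QTY.findall(text)]

def implied_molar_mass(grams: float, mmol: float) -> float:
    """g/mol = mass (g) / amount (mol) = 1000 * grams / mmol."""
    return 1000.0 * grams / mmol

def percent_yield(product_mmol: float, limiting_mmol: float) -> float:
    """Yield relative to the (assumed) limiting reagent."""
    return 100.0 * (product_mmol / limiting_mmol)
```

On the paragraph above this finds four pairs. The last pair, (0.259, 1.302), belongs to product 82. Dividing its 1.302 mmol by the 1.302 mmol of sulfonamide 50 reproduces the quoted "ca. 100%".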
Parsing and Semantics
Tokenization and Chunking
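Chemical text is awkward to tokenize. Names such as "N,N-dimethylformamide" contain commas and hyphens that a naive tokenizer would split on, while "thiophenol(0.19" must be split. The toy regular-expression tokenizer below is a hypothetical illustration of the problem, not OSCAR's tokenizer. Its rule is that a word may contain internal commas, hyphens and brackets but must start and end on a letter, digit or "]". One known failure is that a list written without spaces, such as "DMF,THF", stays fused.

```python
import re

TOKEN = re.compile(
    r"[A-Za-z0-9][A-Za-z0-9,\-\[\]]*[A-Za-z0-9\]]"  # multi-character words and chemical names
    r"|\d+(?:\.\d+)?"                              # decimals such as 0.63
    r"|[A-Za-z0-9]"                                # single-character words such as "g" or "h"
    r"|[()\[\],.:;]"                               # punctuation as separate tokens
)

def tokenize(text: str) -> list[str]:
    """Split text into tokens; whitespace and unlisted characters are dropped."""
    return TOKEN.findall(text)
```

Alternation order matters here. The name rule is tried first so that "2,3-dihydro" is not cut at the leading digit. It needs at least two characters, so decimals such as "0.19" and unit letters such as "g" fall through to the later rules.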
Phrase identification
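The speaker notes mention recognising "stir" and "add" phrases. As a deliberately crude stand-in for phrase identification, the sketch below maps trigger verbs to action labels and reports them in document order. The lexicon and labels are hypothetical. A real system would chunk part-of-speech-tagged tokens rather than match bare keywords.

```python
import re

# Hypothetical verb-to-action lexicon (past participles as used in the example text).
ACTIONS = {
    "added": "Add",
    "stirred": "Stir",
    "extracted": "Extract",
    "dried": "Dry",
    "concentrated": "Concentrate",
}

def action_sequence(text: str) -> list[str]:
    """Return action labels in the order their trigger verbs occur."""
    return [ACTIONS[w] for w in re.findall(r"[a-z]+", text.lower()) if w in ACTIONS]
```

For the preparation of compound 82 this yields Add, Stir, Add, Extract, Dry, Concentrate. "The organic extracts" does not trigger Extract, because only the exact form "extracted" is listed.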
RDF of reaction components
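The reaction components can be written out as RDF triples. The sketch below serialises a preparation as N-Triples using only the standard library. The DMF and Solvent URIs appear in the speaker notes. The predicate names match the query there, but putting them in the same `http://www.polymerinformatics.com/#` namespace is an assumption, as are the `prep82`, `compound82` and `Product` identifiers.

```python
# Assumed namespace for all terms; only the DMF and Solvent URIs are attested.
NS = "http://www.polymerinformatics.com/#"
XSD_DECIMAL = "<http://www.w3.org/2001/XMLSchema#decimal>"

def uri(local: str) -> str:
    return f"<{NS}{local}>"

def preparation_triples(prep: str, substances: list[dict]) -> list[tuple[str, str, str]]:
    """One blank node per substance, linked to its molecule, role and optional yield."""
    triples = []
    for i, s in enumerate(substances):
        node = f"_:{prep}_s{i}"
        triples.append((uri(prep), uri("hasSubstance"), node))
        triples.append((node, uri("hasMolecule"), uri(s["molecule"])))
        triples.append((node, uri("hasRole"), uri(s["role"])))
        if "yield" in s:
            triples.append((node, uri("hasYield"), f'"{s["yield"]}"^^{XSD_DECIMAL}'))
    return triples

def to_ntriples(triples: list[tuple[str, str, str]]) -> str:
    """Serialise as N-Triples: one 'subject predicate object .' statement per line."""
    return "".join(f"{s} {p} {o} .\n" for s, p, o in triples)
```

For example, `preparation_triples("prep82", [{"molecule": "DMF", "role": "Solvent"}, {"molecule": "compound82", "role": "Product", "yield": 100}])` gives seven triples that any RDF store can load.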
Double Circles: Oil
Speaker Notes

  1. Most scientific research is communicated in a formal manner. Group vs. rest of community. Full text and supplementary info. More data points require semantics. Sliding scale: syntax, vocabulary, ontology, model. (Re)use: very hard; it has required human glue before now. This is why we need semantics.
  2. Scan of a printout. Picture with text. Comp Chem: more structure, but still hard. Free text.
  3. Character encoding: many papers are unreadable because the various glyphs are unresolved. Markup: XML, RDF, Semantic Web. The components have meaning and possibly behaviour associated with them: ontology. Not just interpreted data. Not the whole document: sometimes entities, sometimes sections.
  4. PDF to text is hard. SAF. OSCAR.
  5. NCEs, chemical terms, chemical data. OMII. Sections are important: false positives.
  6. The only way to determine sections correctly is to preprocess the document before it goes into OSCAR, using SciXML to hold the section information. This is hard with PDF because of the loss of line breaks and of text from pictures.
  7. SciXML: sections, formatting. Embedded objects can be directly turned into CML (JumboConverters). Suddenly we find Data XML too.
  8. DataXML loses formatting, so regexes are needed. Hard to recombine. We need to know which data are associated with which preparation, and hence which molecule. Each step adds semantics: incremental addition of information.
  9. Object Reuse and Exchange
  10. We know that this is a preparation. Bold numbers. Stir phrase. Add phrase.
  11. Tokens, entities, POS, chunking.
  12. Tokens are in boxes; double boxes are entities.
  13. Chunks.
  14. Complete description of the reaction and added data (structures). The following query could be used to search for all reactions using N,N-dimethylformamide as a solvent with yields greater than 80%:
      SELECT ?preparation
      WHERE {
        ?preparation hasSubstance ?substance .
        ?substance hasMolecule <http://www.polymerinformatics.com/#DMF> .
        ?substance hasRole <http://www.polymerinformatics.com/#Solvent> .
        ?preparation hasSubstance ?product .
        ?product hasYield ?yield .
        FILTER(?yield > 80) .
      }
  15. Maps outside. 55 compounds made. A completely new view of this thesis.
  16. University of Cambridge (UC) and the University of Southern Queensland (USQ), funded by the JISC. Integrated repository deposition into the author workflow. Fine-grained embargo. ICE allows linking to, or inclusion of, external data files. Chem4Word. Semantic authoring for chemistry. Linked zones. Chemically intelligent authoring.
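To make the logic of the query in note 14 concrete without a triplestore, the sketch below evaluates the same WHERE clause as a hand-written join over an in-memory list of triples. One substance must have both the DMF molecule and the Solvent role, and any substance of the same preparation must have a yield above 80. The toy graph and its identifiers are hypothetical. Results are de-duplicated, as `SELECT DISTINCT` would do.

```python
from collections import defaultdict

# Toy graph; identifiers stand in for the URIs and blank nodes of a real store.
TRIPLES = [
    ("prep82", "hasSubstance", "s1"), ("s1", "hasMolecule", "DMF"), ("s1", "hasRole", "Solvent"),
    ("prep82", "hasSubstance", "s2"), ("s2", "hasYield", 100),
    ("prep90", "hasSubstance", "s3"), ("s3", "hasMolecule", "THF"), ("s3", "hasRole", "Solvent"),
    ("prep90", "hasSubstance", "s4"), ("s4", "hasYield", 95),
    ("prep91", "hasSubstance", "s5"), ("s5", "hasMolecule", "DMF"), ("s5", "hasRole", "Solvent"),
    ("prep91", "hasSubstance", "s6"), ("s6", "hasYield", 40),
]

def dmf_high_yield(triples, min_yield=80):
    """Preparations with a DMF solvent and at least one substance yielding > min_yield."""
    index = defaultdict(list)  # (subject, predicate) -> [objects]
    for s, p, o in triples:
        index[(s, p)].append(o)
    preps = {s for s, p, _ in triples if p == "hasSubstance"}
    hits = set()
    for prep in preps:
        subs = index[(prep, "hasSubstance")]
        dmf_solvent = any("DMF" in index[(x, "hasMolecule")] and "Solvent" in index[(x, "hasRole")]
                          for x in subs)
        high_yield = any(y > min_yield for x in subs for y in index[(x, "hasYield")])
        if dmf_solvent and high_yield:
            hits.add(prep)
    return sorted(hits)
```

Here only `prep82` qualifies. `prep90` uses THF, and `prep91` yields only 40%.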