Automated Molecular Data Extraction
using Open Babel & ChemSpotlight:
       The Semantic Desktop

         Prof. Geoff Hutchison
         Department of Chemistry
         University of Pittsburgh
         geoffh@pitt.edu


         ACS CINF: Skolnik Symposium
         21 August 2012
         http://hutchison.chem.pitt.edu
“
I can plug my iPod into any
computer and it will recognize
my music and give me all sorts
of metadata: artist, title, type of
music...

Why can’t I read the chemical
metadata off my chemistry files?
                                      ”
— Prof. Henry S. Rzepa (Imperial College)
  Spring 2005 ACS Meeting, San Diego, CA
Pre-History: Chem://Dig


                                Index files, websites
                                Based on Chem MIME
                                Find files by extension
                                Perceive chemistry
                                Database Store
                                Search, Filter
                                Retrieval

    H. Rzepa et al. New J. Chem (2002) 26 p. 656
Open Babel
              Open Babel (Started 2001)
                 Free, open source chemical toolbox
                 Cross-platform: Win, Mac, Linux...
                 Both user-tools & C++ library
                 Interfaces in Python, Perl, Ruby,
                 Java, C#
                 Supports chemistry, bioinformatics,
                 solid-state…
                 100+ file formats and variants

          http://openbabel.org/
    O’Boyle et al. J. Cheminf. 2011, 3:33
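
The Python interface listed above can be sketched as follows. This is a minimal illustration, assuming the Open Babel 3.x bindings and their `pybel` module are installed; the helper name `describe` is hypothetical, and the guarded import makes the function return `None` where Open Babel is absent.

```python
# Minimal sketch: perceive basic chemical metadata with Open Babel's pybel.
# Assumes Open Babel 3.x Python bindings; describe() is a hypothetical helper.
try:
    from openbabel import pybel
except ImportError:  # Open Babel bindings not installed
    pybel = None


def describe(smiles):
    """Return formula, molecular weight and canonical SMILES for a molecule.

    Returns None when the Open Babel bindings are unavailable.
    """
    if pybel is None:
        return None
    mol = pybel.readstring("smi", smiles)
    return {
        "formula": mol.formula,
        "mass": mol.molwt,
        # write("can") returns the SMILES followed by a title field and newline
        "smiles": mol.write("can").split()[0],
    }
```

Calling `describe("CCO")` (ethanol) returns a small dictionary of perceived properties, which is the kind of information an indexer could hand on to a search database.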
Chemical Database?


    1. Some way to store data
         (Organize it)
    2. Index it
    3. Search / filter
    4. Visualize results
ChemSpotlight: Indexing Architecture



      Spotlight + Open Babel + ~300 lines of code

    http://chemspotlight.openmolecules.net/
ChemSpotlight: “Un” Database


      Use the system-wide search database
      No (Visible) Database!
      Index files in-place
      Includes textual data
      (e.g., chemical names, formulas, etc.)
      Multiple retrieval and filtering interfaces
      (i.e., any third-party search tool works)

      http://chemspotlight.openmolecules.net/
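
Because the index lives in Spotlight, one retrieval interface is the stock `mdfind` command-line tool on macOS. The sketch below only assembles the command and does not run it. The helper name `mdfind_command` and the example directory are hypothetical; the attribute names are the ones shown on the "So What's Stored / Perceived" slide, and the query uses Spotlight's `attribute == "value"` comparison syntax.

```python
# Sketch: assemble an mdfind invocation for a ChemSpotlight attribute.
# mdfind_command() is a hypothetical helper; it does not run anything itself.

def mdfind_command(attribute, op, value, only_in=None):
    """Build an argv list for macOS `mdfind` using a raw Spotlight query."""
    if op not in ("==", "!=", "<", "<=", ">", ">="):
        raise ValueError(f"unsupported operator: {op}")
    # Strings are quoted; numbers are written bare so comparisons stay numeric.
    if isinstance(value, str):
        literal = '"' + value.replace('"', '\\"') + '"'
    else:
        literal = repr(value)
    argv = ["mdfind"]
    if only_in is not None:
        argv += ["-onlyin", only_in]  # restrict the search to one directory
    argv.append(f"{attribute} {op} {literal}")
    return argv


by_formula = mdfind_command(
    "net_sourceforge_openbabel_Formula", "==", "C21H36N7O8S"
)
polar = mdfind_command(
    "net_sourceforge_chemspotlight_DipoleMoment", ">", 3.0,
    only_in="/path/to/calculations",  # placeholder directory
)
```

On a Mac, either list could be passed to `subprocess.run(argv, capture_output=True, text=True)`; `mdfind` prints one matching file path per line.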
So What’s Stored / Perceived
       Formula, mass, SMILES, InChI
       net_sourceforge_openbabel_Formula = C21H36N7O8S

       Fingerprints, number of atoms, bonds, residues
       PDB, SDF keywords, properties
       Calculation keywords:
       kMDItemComment = "Gaussian 09 #n B3LYP/6-31G(d) Opt"

       Calculation results (HOMO, LUMO, Dipole Moment)
       net_sourceforge_chemspotlight_DipoleMoment = 3.5
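
The attribute listings on this slide follow a `key = value` layout like the one printed by the macOS `mdls` tool. As an illustration, the sketch below parses such text into a dictionary. The helper name `parse_attributes` is hypothetical, the sample text is copied from the slide, and multi-line array values are deliberately not handled.

```python
# Sketch: parse `key = value` metadata lines into a dict.
# parse_attributes() is a hypothetical helper, not part of ChemSpotlight.

def parse_attributes(text):
    """Map attribute names to str or float values; skips lines without '='."""
    attributes = {}
    for line in text.splitlines():
        key, sep, raw = line.partition("=")
        if not sep:
            continue
        key, raw = key.strip(), raw.strip()
        if len(raw) >= 2 and raw[0] == raw[-1] == '"':
            value = raw[1:-1]          # quoted string
        else:
            try:
                value = float(raw)     # bare number
            except ValueError:
                value = raw            # bare token, kept as text
        attributes[key] = value
    return attributes


sample = '''
net_sourceforge_openbabel_Formula          = C21H36N7O8S
kMDItemComment                             = "Gaussian 09 #n B3LYP/6-31G(d) Opt"
net_sourceforge_chemspotlight_DipoleMoment = 3.5
'''
metadata = parse_attributes(sample)
```

Keeping numeric values as floats makes later filtering (for example, by dipole moment) a plain comparison.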
How Do We Visualize?

   “QuickLook” previews
   New code ~800 lines
   Generate SDF, PDB, CIF (if needed)
   Pass off to ChemDoodle Web Components
   Pseudo-3D, interactive JS + HTML5
   … or SVG generation from Open Babel

             http://web.chemdoodle.com/
Organic Heterojunction Solar Cells



  [Figure: device stack, top to bottom: incoming light, transparent electrode, p-type material (+), n-type material (-), reflective electrode, with an external circuit]
Organic Heterojunction Solar Cells

  [Figure: the device stack from the previous slide (light, transparent electrode, p-type material (+), n-type material (-), reflective electrode, circuit) beside an energy-level diagram labeled: optical excitation (hν), ΔE ≥ exciton binding energy, hole-conducting polymer, electron conductor (nanoparticle), effective heterojunction bandgap, cathode, anode, e-, h+]
Pipeline Model for Finding New Molecules

   Monomers
   >10^6 Possible Structures
   Electronic Properties (~9 minutes)
   Optical Properties
   Synthetic Score


J Phys Chem C 2011 vol. 115 pp. 16200       ...
Pipeline Model for Finding New Molecules

   Monomers
   >10^6 Possible Structures
   Fast Screening (top) to Slower (bottom):
     Electronic Properties (~9 minutes)
     Optical Properties
     Synthetic Score


J Phys Chem C 2011 vol. 115 pp. 16200       ...
New Genetic Algorithm Approach

      Rather than directly driving calculations and waiting for results:
      Check Spotlight for new results
        “What are top HOMO energies?”
      Update GA, generate new candidates, submit new jobs
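
The loop described above can be sketched as below. Everything here is illustrative: `fetch_indexed_results` and `submit_job` are hypothetical stand-ins for a Spotlight query and a job submission, candidates are modelled as equal-length tuples of monomer indices, "top" is assumed to mean the highest (least negative) HOMO energy, and the crossover and mutation operators are generic textbook choices, not necessarily the ones used in this work.

```python
import random

# Sketch of an index-driven GA step. All names here are hypothetical.

def top_candidates(results, n):
    """Pick the n records with the highest HOMO energy (an assumed ranking)."""
    return sorted(results, key=lambda r: r["homo"], reverse=True)[:n]


def next_generation(parents, n_children, n_monomers, rng, mutation_rate=0.1):
    """Create children by one-point crossover plus per-gene mutation.

    Parents must be equal-length tuples with at least two genes.
    """
    children = []
    for _ in range(n_children):
        a, b = rng.choice(parents), rng.choice(parents)
        cut = rng.randrange(1, len(a))            # one-point crossover
        child = list(a[:cut] + b[cut:])
        for i in range(len(child)):
            if rng.random() < mutation_rate:      # swap in a random monomer
                child[i] = rng.randrange(n_monomers)
        children.append(tuple(child))
    return children


def ga_step(fetch_indexed_results, submit_job, n_parents, n_children,
            n_monomers, rng):
    """One pass: read whatever is indexed, breed, submit new jobs."""
    results = fetch_indexed_results()             # stand-in for a Spotlight query
    if not results:
        return []                                 # nothing has finished yet
    parents = [r["genes"] for r in top_candidates(results, n_parents)]
    children = next_generation(parents, n_children, n_monomers, rng)
    for child in children:
        submit_job(child)                         # stand-in for queue submission
    return children
```

The point of this design is that the optimizer never blocks on a running calculation: each pass uses only the results that have already been indexed.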
Scaling Up the Polymer Solar Search


   2nd Gen. Search:
   680 Monomers
   2800+ Fragments
   Search Space: 500+ million oligomers
   ~9 minutes per core

   [Plot: LUMO Energy (eV), 0 to −3, vs. HOMO Energy (eV), −9.5 to −6.5]
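
To get a feel for this scale, the two figures on the slide can be multiplied out. The sketch treats "500+ million" as exactly 5 × 10^8 and "~9 minutes" as exactly 9, so the result is only a rough lower bound; the function name and the 1000-core comparison are illustrative assumptions.

```python
# Rough cost of evaluating every oligomer, using the slide's two numbers.

def exhaustive_cost(n_structures, minutes_each, n_cores=1):
    """Return (core_hours, wall_clock_years) for a brute-force sweep."""
    core_minutes = n_structures * minutes_each
    core_hours = core_minutes / 60
    wall_years = core_hours / n_cores / (24 * 365)
    return core_hours, wall_years


core_hours, years_one_core = exhaustive_cost(500_000_000, 9)
_, years_1000_cores = exhaustive_cost(500_000_000, 9, n_cores=1000)
print(f"{core_hours:,.0f} core-hours")           # 75,000,000 core-hours
print(f"{years_one_core:,.0f} years on one core")
print(f"{years_1000_cores:.1f} years on 1000 cores")
```

Under these assumptions a full sweep costs about 75 million core-hours, which is consistent with steering the search by genetic algorithm instead of enumerating every oligomer.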
Take-Home Messages

   “Big Data” is a Big Headache
   ChemSpotlight & Un-Databases Work!
   Keep data as native files w/separate index
   Integrate into user-friendly tools
   Sell to users: “What’s in it for me?”
    Indexing, retrieval
    Improved workflows
Marcus Hanwell, Pitt / Kitware
Dr. Noel O’Boyle, U.C. Cork, Ireland
Casey Campbell, Pitt (2010)
