2012 ACS Skolnik Symposium - ChemSpotlight
1. Automated Molecular Data Extraction
using Open Babel & ChemSpotlight:
The Semantic Desktop
Prof. Geoff Hutchison
Department of Chemistry
University of Pittsburgh
geoffh@pitt.edu
ACS CINF: Skolnik Symposium
21 August 2012
http://hutchison.chem.pitt.edu
2. “I can plug my iPod into any computer and it will recognize my music and give me all sorts of metadata: artist, title, type of music... Why can’t I read the chemical metadata off my chemistry files?”
— Prof. Henry S. Rzepa (Imperial College)
Spring 2005 ACS Meeting, San Diego, CA
3. Pre-History: Chem://Dig
Index files, websites
Based on Chem MIME
Find files on extension
Perceive chemistry
Database Store
Search, Filter
Retrieval
H. Rzepa et al. New J. Chem (2002) 26 p. 656
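The "find files on extension" step can be sketched in a few lines of Python. This is an illustration under assumptions, not Chem://Dig's code: the extension table is a short hypothetical subset, and the MIME strings simply follow the Chemical MIME `chemical/x-...` naming convention.

```python
import os
from pathlib import Path

# Hypothetical, deliberately short extension -> Chemical MIME table.
CHEMICAL_MIME = {
    ".xyz": "chemical/x-xyz",
    ".mol": "chemical/x-mdl-molfile",
    ".sdf": "chemical/x-mdl-sdfile",
    ".pdb": "chemical/x-pdb",
    ".cif": "chemical/x-cif",
}


def find_chemical_files(root):
    """Yield (path, mime_type) for every file under root with a known chemical extension."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            # Extension match is case-insensitive, so "c.PDB" is still found.
            mime = CHEMICAL_MIME.get(Path(name).suffix.lower())
            if mime is not None:
                yield os.path.join(dirpath, name), mime
```

Later stages (perceiving chemistry, storing, searching) would consume this stream of (path, type) pairs.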
4. Open Babel
Open Babel (Started 2001)
Free, open source chemical toolbox
Cross-platform: Win, Mac, Linux...
Both user-tools & C++ library
Interfaces in Python, Perl, Ruby,
Java, C#
Supports chemistry, bioinformatics,
solid-state…
100+ file formats and variants
http://openbabel.org/
O’Boyle et al. J. Cheminf. 2011, 3:33
5. Chemical Database?
1. Some way to store data
(Organize it)
2. Index it
3. Search / filter
4. Visualize results
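A toy, in-memory sketch of the first three steps (store, index, search/filter), assuming each file has already been reduced to a set of element symbols plus a few properties. The `ChemIndex` class and its methods are hypothetical; step 4, visualization, is left to a viewer.

```python
from collections import defaultdict


class ChemIndex:
    """Hypothetical toy index: small metadata records plus an element -> ids lookup."""

    def __init__(self):
        self.records = {}                   # 1. store: id -> metadata dict
        self.by_element = defaultdict(set)  # 2. index: element symbol -> set of ids

    def add(self, rec_id, elements, **props):
        self.records[rec_id] = {"elements": set(elements), **props}
        for el in elements:
            self.by_element[el].add(rec_id)

    def search(self, must_contain=(), where=lambda rec: True):
        """3. search/filter: ids containing every listed element and passing `where`."""
        if must_contain:
            # Intersect the posting sets; .get avoids creating empty entries.
            ids = set.intersection(*(self.by_element.get(el, set()) for el in must_contain))
        else:
            ids = set(self.records)
        return sorted(i for i in ids if where(self.records[i]))
```

For example, `idx.search(must_contain=["C", "N"])` returns only the nitrogen-containing organics, and `where` can filter on any stored property.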
7. ChemSpotlight: “Un” Database
Use the system-wide search database
No (Visible) Database!
Index files in-place
Includes textual data
(e.g., chemical names, formulas, etc.)
Multiple retrieval and filtering interfaces
(i.e., any third-party search tool works)
http://chemspotlight.openmolecules.net/
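The "textual data" an importer hands to the system-wide index can be as simple as a formula and a molecular weight. ChemSpotlight relies on Open Babel for this; the sketch below is a standalone stand-in that handles only XYZ files and uses a small, hypothetical mass table, so it illustrates the idea rather than reproducing the importer.

```python
from collections import Counter

# Standard atomic weights (g/mol) for a few elements only; an assumption for illustration.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "S": 32.06}


def hill_formula(counts):
    """Hill order: C, then H, then the rest alphabetically; all alphabetical if no C."""
    def term(el):
        n = counts[el]
        return el if n == 1 else f"{el}{n}"

    if "C" in counts:
        rest = sorted(e for e in counts if e not in ("C", "H"))
        order = ["C"] + (["H"] if "H" in counts else []) + rest
    else:
        order = sorted(counts)
    return "".join(term(el) for el in order)


def xyz_metadata(text):
    """Extract title, atom count, Hill formula and molecular weight from XYZ text."""
    lines = text.strip().splitlines()
    n_atoms = int(lines[0])
    title = lines[1].strip()
    symbols = [line.split()[0] for line in lines[2:2 + n_atoms]]
    counts = Counter(symbols)
    weight = sum(ATOMIC_MASS[el] * n for el, n in counts.items())
    return {"title": title, "atoms": n_atoms,
            "formula": hill_formula(counts), "weight": round(weight, 3)}
```

Hill order is the convention most chemical databases use for formulas, so a formula written this way can be matched by an ordinary text search.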
11. How Do We Visualize?
“QuickLook” previews
New code ~800 lines
Generate SDF, PDB, or CIF (if needed)
Pass off to ChemDoodle Web Components
Pseudo-3D, interactive JS + HTML5
… or SVG generation from Open Babel
http://web.chemdoodle.com/
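When a file must be converted before it is handed to ChemDoodle, the target can be a plain SDF record. In ChemSpotlight, Open Babel performs this conversion (including bond perception). The hypothetical `xyz_to_sdf` helper below only illustrates the MDL V2000 layout and writes zero bonds, which is a simplifying assumption.

```python
def xyz_to_sdf(text):
    """Write one SDF (MDL V2000) record from XYZ text; bonds are NOT perceived."""
    lines = text.strip().splitlines()
    n_atoms = int(lines[0])
    title = lines[1].strip()
    atoms = []
    for line in lines[2:2 + n_atoms]:
        sym, x, y, z = line.split()[:4]
        atoms.append((sym, float(x), float(y), float(z)))

    out = [title, "", ""]  # header block: title, program/timestamp line, comment line
    # Counts line: atoms, bonds (0 here), eight zero fields, "999", version tag.
    out.append(f"{n_atoms:3d}{0:3d}" + "  0" * 8 + "999 V2000")
    for sym, x, y, z in atoms:
        # Atom block: fixed-width coordinates, symbol, then zeroed per-atom fields.
        out.append(f"{x:10.4f}{y:10.4f}{z:10.4f} {sym:<3}" + " 0" + "  0" * 11)
    out += ["M  END", "$$$$"]  # end of molfile, then SDF record separator
    return "\n".join(out) + "\n"
```

Because no bond block is written, a viewer would show isolated atoms; a real pipeline would let Open Babel perceive connectivity first.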
12. Organic Heterojunction Solar Cells
[Figure: device schematic. Light enters through the transparent electrode; + p-type material and - n-type material layers; reflective electrode; external circuit]
13. Organic Heterojunction Solar Cells
[Figure: the same device, annotated: optical excitation (light, hν); hole conducting polymer (+ p-type material); electron conductor (nanoparticle, - n-type material); effective heterojunction bandgap; ΔE ≥ exciton binding energy; e- and h+ carriers; cathode and anode; transparent and reflective electrodes; circuit]
14. Pipeline Model for Finding New Molecules
Monomers → >10⁶ possible structures → electronic properties (~9 minutes) → optical properties → synthetic score
J Phys Chem C 2011 vol. 115 pp. 16200 ...
15. Pipeline Model for Finding New Molecules
Monomers → >10⁶ possible structures
Fast screening: electronic properties (~9 minutes)
Slower: optical properties → synthetic score
J Phys Chem C 2011 vol. 115 pp. 16200 ...
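The fast/slow split amounts to a two-stage funnel: run the cheap test on every candidate and spend the expensive one only on the survivors. In this sketch `fast_ok` and `slow_score` are hypothetical stand-ins for the electronic screen and the slower optical/synthetic scoring.

```python
from typing import Callable, Iterable, List, Tuple


def screen(candidates: Iterable[str],
           fast_ok: Callable[[str], bool],
           slow_score: Callable[[str], float],
           keep: int) -> List[Tuple[str, float]]:
    """Two-stage funnel: cheap filter on everything, costly scorer only on survivors."""
    survivors = [c for c in candidates if fast_ok(c)]        # fast screening
    scored = [(c, slow_score(c)) for c in survivors]         # slower stage
    scored.sort(key=lambda pair: pair[1], reverse=True)      # best score first
    return scored[:keep]
```

The saving comes entirely from how many candidates the fast filter rejects: the slow stage's cost scales with the survivors, not with the full list.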
16. New Genetic Algorithm Approach
Rather than directly driving calculations and waiting for results:
Check Spotlight for new results
“What are the top HOMO energies?”
Update GA, generate new candidates, submit new jobs
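A minimal sketch of one asynchronous GA update, assuming a dictionary of finished results stands in for the Spotlight query, oligomers are tuples of hypothetical monomer labels, and "top" means highest HOMO energy. The one-point crossover and random mutation are generic textbook operators, not necessarily the ones used in this work.

```python
import random
from typing import Dict, List, Set, Tuple

Oligomer = Tuple[str, ...]  # hypothetical representation: a sequence of monomer labels


def crossover(a: Oligomer, b: Oligomer, rng: random.Random) -> Oligomer:
    """One-point crossover of two equal-length parents (length >= 2)."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mutate(seq: Oligomer, monomers: List[str], rng: random.Random,
           rate: float = 0.2) -> Oligomer:
    """Swap each position for a random monomer with probability `rate`."""
    return tuple(rng.choice(monomers) if rng.random() < rate else m for m in seq)


def ga_step(finished: Dict[Oligomer, float], submitted: Set[Oligomer],
            monomers: List[str], rng: random.Random,
            n_parents: int = 4, n_children: int = 6) -> List[Oligomer]:
    """Rank whatever has finished, breed, and return new jobs to submit."""
    # "What are the top HOMO energies?": only completed results are ranked.
    parents = sorted(finished, key=finished.get, reverse=True)[:n_parents]
    if len(parents) < 2:
        return []  # too few results so far; try again on the next poll
    children: List[Oligomer] = []
    for _ in range(100 * n_children):  # bounded number of attempts
        if len(children) == n_children:
            break
        a, b = rng.sample(parents, 2)
        child = mutate(crossover(a, b, rng), monomers, rng)
        if child not in submitted and child not in children:
            children.append(child)  # skip anything already computed or queued
    return children
```

Because `ga_step` only reads what has already finished, it can run on a timer while jobs are still in the queue; slow calculations simply join the ranking on a later poll.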
17. Scaling Up the Polymer Solar Search
2nd Gen. Search: 680 monomers, 2800+ fragments
Search space: 500+ million oligomers
~9 minutes per core
[Plot: LUMO energy (eV) vs. HOMO energy (eV)]
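A back-of-the-envelope estimate shows why brute force is out of reach. It takes the slide's ~9 minutes per core for each oligomer and treats 500 million as a lower bound on the search space.

```python
def brute_force_core_years(n_oligomers: int, minutes_each: float) -> float:
    """Core-years needed to compute every oligomer once, one core per calculation."""
    minutes_per_year = 60 * 24 * 365.25
    return n_oligomers * minutes_each / minutes_per_year


print(f"{brute_force_core_years(500_000_000, 9):,.0f} core-years")  # prints "8,556 core-years"
```

Even spread over 1,000 cores, that is more than eight years of computing, which is what makes a targeted search such as the GA necessary.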
18. Take-Home Messages
“Big Data” is a Big Headache
ChemSpotlight & Un-Databases Work!
Keep data as native files w/separate index
Integrate into user-friendly tools
Sell to users: “What’s in it for me?”
Indexing, retrieval
Improved workflows
19. Marcus Hanwell (Pitt / Kitware)
Dr. Noel O’Boyle (U.C. Cork, Ireland)
Casey Campbell (Pitt, 2010)