1. Interactive Datamining of Large-Scale
Screening Datasets
Frank Oellien, Wolf D. Ihlenfeldt
Computer-Chemie-Centrum
University of Erlangen-Nuremberg
Klaus Engel, Thomas Ertl
Visualization and Interactive Systems Group
University of Stuttgart
C 3
© Oellien, Ihlenfeldt, Engel, Ertl MMWS 2002
2. Overview
Multi-variate and multi-dimensional datasets
• Motivation
• Information Visualization Techniques
• Examples (ChemCodes Inc., NCI)
• Demo
4. Chemical data
[Figure: bar chart. Vertical axis: 0 to 18,000,000 in steps of 2,000,000. Labels: Merck Katalog, Synopsys PG, ACX, NCI DTP, ChemInform, Spresi, Beilstein, CAS, Current datasets.]
5. Multi-Variate and Multi-Dimensional
Numeric Datasets Today
Change in chemical synthesis technology
• new technologies (high-throughput screening (HTS), combinatorial synthesis)
→ experiments generate terabytes of data per year
• development of data mining and visualization tools has not kept pace
• this is the most critical bottleneck in R&D today
→ tools for interactive mining and information visualization are needed
6. Tools for Interactive Visualization of
Multi-Variate and Multi-Dimensional Data
Standard applications
• bar charts, 2D and pseudo-3D scatter plots, molecular spreadsheets
• limited to small subsets
• platform-dependent
Our goal: applications that
• are simple to use
• allow straightforward interpretation of results
• provide generalized access to tabular numeric data
• are platform-independent
7. Overview
Multi-variate and multi-dimensional datasets
• Motivation
• Information Visualization Techniques
• Examples (ChemCodes Inc., NCI)
• Demo
8. 3D Tools for Interactive
Information Visualization
Information visualization applications that use the
3D capabilities of modern clients
• Glyph-based InfVis approaches
• Volume-based InfVis approaches
9. Glyph-based InfVis Tools
• 3 orthogonal axes
• color
• shape
• size
• transparency
• surface effects
• animation
• up to ~100 Glyphs
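
The glyph attributes listed above can each carry one column of a tabular dataset. The following is a minimal sketch of such a mapping, not the applet's actual implementation: the `Glyph` class, the `to_glyphs` function, the column-to-attribute dictionary, and the shape names are all hypothetical, and only position, color, size, and shape are covered.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    x: float      # position on the three orthogonal axes, scaled to [0, 1]
    y: float
    z: float
    color: float  # position on a color ramp, scaled to [0, 1]
    size: float   # relative size, scaled to [0.5, 1.5]
    shape: str    # one shape name per category


def normalize(value, lo, hi):
    """Scale value linearly into [0, 1]; a constant column maps to 0.5."""
    if hi == lo:
        return 0.5
    return (value - lo) / (hi - lo)


def to_glyphs(rows, mapping, shapes=("sphere", "cube", "cone")):
    """Turn rows (dicts) into glyphs; mapping is attribute -> column name."""
    numeric = ("x", "y", "z", "color", "size")
    # Value range of every numeric column that is mapped to an attribute.
    ranges = {}
    for attr in numeric:
        values = [row[mapping[attr]] for row in rows]
        ranges[attr] = (min(values), max(values))
    # One shape per distinct category; shapes repeat if categories run out.
    categories = sorted({row[mapping["shape"]] for row in rows})
    shape_of = {c: shapes[i % len(shapes)] for i, c in enumerate(categories)}

    glyphs = []
    for row in rows:
        n = {a: normalize(row[mapping[a]], *ranges[a]) for a in numeric}
        glyphs.append(Glyph(n["x"], n["y"], n["z"], n["color"],
                            0.5 + n["size"], shape_of[row[mapping["shape"]]]))
    return glyphs
```

Because every record becomes one glyph object, the amount of geometry grows linearly with the number of records.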
12. Java/Java3D InfVis Applet
3D Tool Panel
Dynamic Filter Tools
Selection Tools
Detail Tools
14. Advantages of Volume-based InfVis Tools
Databases with millions of data points
– Glyph-based InfVis approaches
• produce millions of geometric primitives
• interactive visualization not possible
– Volume-based InfVis approaches
• can handle large numbers of data points
• interactive visualization using low-cost graphics hardware is possible
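
One way to see why a volume scales better than glyphs is to bin the data points into a fixed voxel grid, so that the data handed to the renderer depends on the grid resolution and not on the number of records. The sketch below assumes this binning approach for illustration; the slides do not state how the volume is actually built, and `bin_to_volume` and its parameters are hypothetical names.

```python
def bin_to_volume(points, resolution=32):
    """Count points per voxel.

    points: iterable of (x, y, z) tuples with each coordinate in [0, 1].
    Returns a flat list of resolution**3 counts, indexed as
    (i * resolution + j) * resolution + k.
    """
    volume = [0] * (resolution ** 3)
    for x, y, z in points:
        # Clamp so that a coordinate of exactly 1.0 falls into the last voxel.
        i = min(int(x * resolution), resolution - 1)
        j = min(int(y * resolution), resolution - 1)
        k = min(int(z * resolution), resolution - 1)
        volume[(i * resolution + j) * resolution + k] += 1
    return volume
```

The returned list always has `resolution ** 3` entries, whether the input holds a hundred points or several million; only the counts inside the voxels change.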
15. Overview
Multi-variate and multi-dimensional datasets
• Motivation
• Information Visualization Techniques
• Examples (ChemCodes Inc., NCI)
• Demo
16. ChemCodes Reaction Database
• The 100 most important functional groups (FGs) account for ~75% of chemistry
• 100 standard reactions
• Limits of standard reactions
• Functional Group Compatibility
• Generating Rules
Goal: Analysis of the reaction space
17. ChemCodes - Reaction Optimization I
• Goal: reaction optimization, > 95% yield
• 7 dimensions: reagent, solvent, time, temperature, stoichiometry, reagent order, FG compatibility
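
The optimization goal above amounts to a query over records that carry the seven dimensions plus a measured yield. The sketch below shows one possible shape for such a query; the `Experiment` record, its field names and units, and the `meets_goal` function are illustrative assumptions and do not reflect the ChemCodes database schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Experiment:
    # The seven dimensions named on the slide (field names are hypothetical).
    reagent: str
    solvent: str
    time_h: float
    temperature_c: float
    stoichiometry: float
    reagent_order: str
    functional_group: str
    # Measured outcome.
    yield_pct: float


def meets_goal(experiments, min_yield=95.0):
    """Return experiments with yield strictly above min_yield, best first."""
    hits = [e for e in experiments if e.yield_pct > min_yield]
    return sorted(hits, key=lambda e: e.yield_pct, reverse=True)
```

In an interactive tool the same predicate would typically be driven by a slider, so that the threshold can be changed while the view updates.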
19. ChemCodes - Reaction Planning
Functional Group Compatibility Check
[Figure: chemical structure diagram; only the atom labels H, N, and O survive]
20. Example 2: NCI Anti-tumor
/ Anti-viral Database
• Initiated in April 1990 (modified 1994)
• ~250,000 compounds
• ~30,000 with anti-tumor screening data
Enhanced NCI Database Browser
• > 30 different molecular properties
• up to 23 3D conformers per compound
23. Overview
Multi-variate and multi-dimensional datasets
• Motivation
• Information Visualization Techniques
• Examples (ChemCodes Inc., NCI)
• Demo
24. Acknowledgment
• Prof. Johann Gasteiger
Computer-Chemie-Centrum
University of Erlangen-Nuremberg
• Prof. Thomas Ertl, Dipl. Inf. Klaus Engel
Visualization and Interactive Systems Group
University of Stuttgart
• Dr. Patrick Kiser, Dr. Gary Eichenbaum
ChemCodes Inc.
• Marc Nicklaus
Laboratory of Medicinal Chemistry
NCI, NIH
• Deutsche Forschungsgemeinschaft