Interactive Datamining of Large-Scale
           Screening Datasets
                                     Frank Oellien, Wolf D. Ihlenfeldt
                                     Computer-Chemie-Centrum
                                     University of Erlangen-Nuremberg


                                     Klaus Engel, Thomas Ertl
                                     Visualization and Interactive Systems Group
                                     University of Stuttgart




C  3
© Oellien, Ihlenfeldt, Engel, Ertl                                   MMWS 2002
Overview

 Multi-variate and multi-dimensional datasets
 • Motivation
 • Information Visualization Techniques
 • Examples (ChemCodes Inc., NCI)
 • Demo




Chemical data

 [Figure: chart, vertical axis from 0 to 18,000,000; legend: Merck Katalog,
  Synopsys PG, ACX, NCI DTP, ChemInform, Spresi, Beilstein, CAS,
  current datasets]

Multi-Variate and Multi-Dimensional
               Numeric Datasets Today

 Change in chemical synthesis technology

 • new technologies (HTS, combinatorial synthesis)
      → experiments generate terabytes of data per year
 • development of data mining and visualization tools
   has not kept pace
 • most critical bottleneck in R&D today
  → tools for interactive mining and information
    visualization are needed

Tools for Interactive Visualization of
    Multi-Variate and Multi-Dimensional Data
Standard applications
   • bar charts, 2D and pseudo-3D
     scatter plots, molecular spreadsheets
   • limited to small subsets
   • platform-dependent
Our goal: applications that
   • are simple to use
   • allow straightforward interpretation of results
   • offer generalized access to tabular numeric data
   • are platform-independent

3D Tools for Interactive
                       Information Visualization

 Information visualization applications that use the
 3D capabilities of modern clients

 • Glyph-based InfVis approaches

 • Volume-based InfVis approaches




Glyph-based InfVis Tools

 • 3 orthogonal axes
 • color
 • shape
 • size
 • transparency
 • surface effects
 • animation
 • up to ~100 Glyphs
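
The channels listed above can each carry one variable of a record. A minimal Java sketch of such a mapping follows. It is not the applet's actual code: the class names, the choice of six variables, the linear scaling, and the size range are all assumptions made for illustration.

```java
/** Hypothetical sketch: maps one multi-variate record onto glyph attributes. */
final class GlyphMapping {

    // Assumed glyph size range in scene units (illustrative values).
    static final double MIN_SIZE = 0.01;
    static final double MAX_SIZE = 0.05;

    /** Visual attributes of a single glyph; all field names are illustrative. */
    static final class Glyph {
        final double x, y, z;  // position on the three orthogonal axes, each in [0, 1]
        final double hue;      // color channel, in [0, 1]
        final double size;     // glyph size, in [MIN_SIZE, MAX_SIZE]
        final double alpha;    // opacity: 0 = fully transparent, 1 = opaque

        Glyph(double x, double y, double z, double hue, double size, double alpha) {
            this.x = x; this.y = y; this.z = z;
            this.hue = hue; this.size = size; this.alpha = alpha;
        }
    }

    /** Rescales value linearly from [min, max] to [0, 1], clamping outliers. */
    static double normalize(double value, double min, double max) {
        if (max <= min) {
            return 0.0; // degenerate column: every record gets the same value
        }
        double t = (value - min) / (max - min);
        return Math.max(0.0, Math.min(1.0, t));
    }

    /** Maps the first six variables of a record to x, y, z, hue, size, alpha. */
    static Glyph toGlyph(double[] record, double[] min, double[] max) {
        if (record.length < 6 || min.length < 6 || max.length < 6) {
            throw new IllegalArgumentException("need at least six variables");
        }
        double[] t = new double[6];
        for (int i = 0; i < 6; i++) {
            t[i] = normalize(record[i], min[i], max[i]);
        }
        double size = MIN_SIZE + t[4] * (MAX_SIZE - MIN_SIZE);
        return new Glyph(t[0], t[1], t[2], t[3], size, t[5]);
    }

    public static void main(String[] args) {
        double[] min = {0, 0, 0, 0, 0, 0};
        double[] max = {100, 100, 100, 100, 100, 100};
        Glyph g = toGlyph(new double[] {50, 25, 100, 0, 50, 75}, min, max);
        System.out.println(g.x + " " + g.y + " " + g.z + " "
                + g.hue + " " + g.size + " " + g.alpha);
    }
}
```

Shape, surface effects, and animation are left out of the sketch; they would be further fields on the hypothetical Glyph class, each driven by one more variable.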
Java/Java3D InfVis Applet

 [Figure: applet layout]
 • Java3D canvas
 • Tool panel (filters, selection tools, details)
 • Control panel
Java/Java3D InfVis Applet
                            3D Render Panel

 [Figure: render panel showing 3D glyphs and a 3D bar chart]
Java/Java3D InfVis Applet
                              3D Tool Panel

 [Figure: tool panel with dynamic filter tools, selection tools, and detail tools]
Java/Java3D InfVis Applet
                            3D Control Panel




Advantages of Volume-based InfVis Tools
Databases with millions of data points
      – Glyph-based InfVis approaches
         • produce millions of geometric
           primitives
         • interactive visualization not possible

      – Volume-based InfVis approaches
         • can handle a large number of
           data points
         • interactive visualization using
           low-cost graphics hardware is possible
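
One way to see why a volume scales where glyphs do not: if the data points are accumulated into a fixed-size voxel grid, the amount of data to render depends on the grid resolution, not on the number of points. The Java sketch below illustrates this under that assumption. The slides do not say how the volume is actually built, so the binning scheme and all names here are hypothetical.

```java
import java.util.Random;

/** Hypothetical sketch: accumulates data points into a fixed-size voxel grid. */
final class DensityVolume {
    final int n;         // grid resolution per axis
    final int[] counts;  // n * n * n voxel counters, flattened in x-fastest order

    DensityVolume(int n) {
        this.n = n;
        this.counts = new int[n * n * n];
    }

    /** Converts a normalized coordinate in [0, 1] to a voxel index in [0, n - 1]. */
    int index(double t) {
        int i = (int) (t * n);
        return Math.min(Math.max(i, 0), n - 1);
    }

    /** Adds one data point; coordinates are assumed to be normalized to [0, 1]. */
    void add(double x, double y, double z) {
        counts[(index(z) * n + index(y)) * n + index(x)]++;
    }

    /** Returns the number of points that fell into voxel (ix, iy, iz). */
    int count(int ix, int iy, int iz) {
        return counts[(iz * n + iy) * n + ix];
    }

    public static void main(String[] args) {
        DensityVolume volume = new DensityVolume(64);
        Random rnd = new Random(42); // synthetic points, for illustration only
        for (int i = 0; i < 1_000_000; i++) {
            volume.add(rnd.nextDouble(), rnd.nextDouble(), rnd.nextDouble());
        }
        // The grid size is fixed, however many points were added.
        System.out.println("voxels: " + volume.counts.length);
    }
}
```

With glyphs, a million points would mean a million geometric objects; here they collapse into 64 x 64 x 64 counters that a volume renderer could map to color and opacity.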

ChemCodes Reaction Database


 • 100 most important functional groups (FGs) cover ~75% of chemistry
 • 100 standard reactions
 • Limits of standard reactions
 • Functional Group Compatibility
 • Generating Rules

 Goal: Analysis of the reaction space


ChemCodes - Reaction Optimization I
 • Goal:
   Reaction Optimization: > 95% Yield

 • 7 Dimensions:
   reagent, solvent,
   time, temperature,
   stoichiometry,
   reagent order,
   FG-compatibility
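
As a rough illustration of the optimization goal, the Java sketch below stores one experiment per record, with the seven dimensions named above plus the measured yield, and keeps the runs above a yield threshold. The record layout, field types (for example FG-compatibility as a boolean), and sample values are assumptions and do not reflect the ChemCodes database schema.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: selects reaction runs whose yield exceeds a threshold. */
final class ReactionFilter {

    /** One experiment; the fields mirror the seven dimensions plus the yield. */
    static final class Run {
        final String reagent;
        final String solvent;
        final double timeHours;
        final double temperatureC;
        final double stoichiometry;
        final int reagentOrder;
        final boolean fgCompatible; // assumed to be a yes/no flag
        final double yieldPercent;

        Run(String reagent, String solvent, double timeHours, double temperatureC,
            double stoichiometry, int reagentOrder, boolean fgCompatible,
            double yieldPercent) {
            this.reagent = reagent;
            this.solvent = solvent;
            this.timeHours = timeHours;
            this.temperatureC = temperatureC;
            this.stoichiometry = stoichiometry;
            this.reagentOrder = reagentOrder;
            this.fgCompatible = fgCompatible;
            this.yieldPercent = yieldPercent;
        }
    }

    /** Returns the runs with a yield strictly above thresholdPercent. */
    static List<Run> aboveYield(List<Run> runs, double thresholdPercent) {
        List<Run> hits = new ArrayList<>();
        for (Run r : runs) {
            if (r.yieldPercent > thresholdPercent) {
                hits.add(r);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Placeholder values, not measured data.
        List<Run> runs = new ArrayList<>();
        runs.add(new Run("reagent-A", "solvent-1", 2.0, 25.0, 1.0, 1, true, 97.5));
        runs.add(new Run("reagent-B", "solvent-1", 4.0, 25.0, 1.0, 1, false, 40.0));
        for (Run r : aboveYield(runs, 95.0)) {
            System.out.println(r.reagent + " in " + r.solvent + ": " + r.yieldPercent + "%");
        }
    }
}
```

In an interactive tool the threshold would come from a filter control rather than a constant, and the surviving runs would be the ones highlighted in the 3D view.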



ChemCodes - Reaction Optimization II




ChemCodes - Reaction Planning
 Functional Group Compatibility Check

 [Figure: chemical structure diagram (atom labels H, N, O)]
Example 2: NCI Anti-tumor
                       / Anti-viral Database

 • Initiated in April 1990 (modified 1994)
 • ~250,000 compounds
 • ~30,000 with anti-tumor screening data

 Enhanced NCI Database Browser
 • > 30 different molecular properties
 • up to 23 3D conformers per compound


Lead Compound Discovery II




Acknowledgment
 • Prof. Johann Gasteiger
   Computer-Chemie-Centrum
   University of Erlangen-Nuremberg

 • Prof. Thomas Ertl, Dipl. Inf. Klaus Engel
   Visualization and Interactive Systems Group
   University of Stuttgart
 • Dr. Patrick Kiser, Dr. Gary Eichenbaum
   ChemCodes Inc.
 • Marc Nicklaus
   Laboratory of Medicinal Chemistry
   NCI, NIH

 • Deutsche Forschungsgemeinschaft