SlideShare ist ein Scribd-Unternehmen logo
1 von 59
Downloaden Sie, um offline zu lesen
NINA JELIAZKOVA
          IdeaConsult Ltd.
          Sofia, Bulgaria
          www.ideaconsult.net


15TH INTERNATIONAL QSAR WORKSHOP
                  TALLINN, ESTONIA
LAST DECADE CHANGES
Biology, Bioinformatics
 Data, human genome
Internet
 Online databases
 Online collaboration
 Crowd Sourcing
Open access, open source
 Database management systems           Williams J M et al. Brief Bioinform 2010;11:598-609
                                         Although biological databases are
 Machine learning and statistics        only a fraction of all available
 Cheminformatics                        bioinformatics software resources,
                                         their rise is representative of the
                                         overall growth of these resources,
                                         and is concurrent with the number of
                                         base pairs released in GenBank.


                                    15TH INTERNATIONAL QSAR WORKSHOP
                                                      TALLINN, ESTONIA                   2
OPEN SOURCE CHEMINFORMATICS




                                Noel M O'Boyle et.al. Open Data,
                                Open Source and Open Standards in
A.Dalke EuroQSAR 2008 poster    chemistry: The Blue Obelisk five
                                years on. Journal of
                                Cheminformatics 2011, 3:37



                               15TH INTERNATIONAL QSAR WORKSHOP
                                                 TALLINN, ESTONIA   3
DATABASES
 2001: Binding DB                 2006: Chemical Structure>80 mln
                                    Lookup Service (CSLS) structures
 2002: NIAID ChemDB
                                   2007: ChemSpider
  HIV/AIDS Database
                                   2008/2009: ChEMBL
 2003 Ligand Depot                2009: Chemical Identifier
 2004: ZINC database               Resolver (CIR)
 2004: PDBbind database           2003: First Nucleic Acids
 2004: PubChem >30 mln             Research (NAR) annual
                    structures      Database issue
 2004: sc-PDB                       > 1000 databases published
 2004: Binding MOAD                  per year since 2008;
 2005: DrugBank                     >1300 in 2011
                        Marc Nicklaus, 5th Meeting on U.S. Government
                        Chemical Databases and Open Chemistry, 2011
                        http://cactus.nci.nih.gov/presentations/meeting-08-
                        2011/meeting-2011-08-25.html
                                   15TH INTERNATIONAL QSAR WORKSHOP
                                                     TALLINN, ESTONIA     4
EASY ACCESS
Increased activity
Technology advances, more products
What was revolutionary few decades ago with
  establishing the software for chemicals
  registration (e.g. CAS database) is a routine
  nowadays.
 It could be a Computer Science Master project
 or a research project …




                               15TH INTERNATIONAL QSAR WORKSHOP
                                                 TALLINN, ESTONIA   5
EU PROJECTS (AN INCOMPLETE LIST)
 EU FP6/ FP7 : 2-FUN, ACROPOLIS , ACuteTox, AQUATERRA,
  ARTEMIS, CADASTER, CAESAR, carcinoGENOMICS, CONTAMED,
  ESNATS, GENESIS, HEIMSTA, INVITROHEART, LIINTOP,
  MENTRANS, MODELKEY, NOMIRACLE, OSIRIS, PREDICT-IV,
  OpenTox, ReProTect, RISKBASE, RISKCYCLE, Sens-it-iv ,
  VITROCELLOMICS

 Innovative Medicines Initiative – 30 ongoing projects , including eTox,
  OpenPhacts

 Safety Evaluation Ultimately Replacing Animal Testing (SEURAT)
  cluster – 6 research projects




                                       15TH INTERNATIONAL QSAR WORKSHOP
                                                         TALLINN, ESTONIA   6
SCIENTIFIC DATABASES IMPLEMENTATION
   Identify the data model and functionality
   Translate the data model into a database schema
   Implement the database and user interface functionality
   (Optionally) provide libraries or expose (some) of the functionality as web
    services
Advantages
 Use one’s favourite technology and jump directly into implementation
 Attract end-users with nice GUI relatively quickly
 Relatively easy to persuade funding organisations this will be a useful resource
Disadvantages
 Proliferation of incompatible resources, providing similar functionality, but incompatible
  programming interface
 Difficult to extract and collate data automatically


                       This is not the only nor the best way!
                                                  15TH INTERNATIONAL QSAR WORKSHOP
                                                                    TALLINN, ESTONIA           7
EASY ACCESS , HENCE MANY NEW SYSTEMS

 SOFTWARE SYSTEMS                     THE SILO EFFECT

Life sciences software                Silo storage system
 and toxicology in particular         Designed to store one single
                                        type of grain.
Live in their own world
 Mostly developed independently      Information silo
 Compatibility is rarely perceived    Rigid design
  as a primary design goal.            No easy exchange of information
Lack of communication                  No integration with other systems
  and common goals




                                      15TH INTERNATIONAL QSAR WORKSHOP
                                                        TALLINN, ESTONIA   8
THE NEED OF INTEGRATION
2005: “Integrated Informatics in Life and
  Materials Sciences: An Oxymoron?” *
 Calculations, Descriptors, Statistics, Models
 Data (chemical structures, properties, predictions)
Approaches toward integration:
Workflow management systems
Standalone container applications (chassis)
Web applications
Web services, web mashups

                     * Gilardoni, F., Curcin, V., Karunanayake, K., Norgaard,
                     J., & Guo, Y. (2005). QSAR Combinatorial Science, 24(1),
                     120-130.
                                    15TH INTERNATIONAL QSAR WORKSHOP
                                                      TALLINN, ESTONIA     9
WORKFLOW MANAGEMENT SYSTEMS
                                   Disadvantages
Commercial                          Overhead: 30% of the tasks
 Accelrys Pipeline Pilot            defined and run in workflows are
                                     related to data conversion,rather
Open source:                         than data analysis.
 Kepler, KNIME, Taverna, Triana    Convenience:
                                      Domain experts often prefer simple
Workflow repositories                  user interfaces or software with
                                       predefined functionality;
 MyExperiment                        Power users prefer scripting
  http://www.myexperiment.org/         languages, rather than GUIs or
                                       graphical workflow builders with their
                                       specific constraints.
Advantages                          Compatibility: Nodes are not
 Flexibility                        transferable between workflow
 Reproducibility                    engines




                                   15TH INTERNATIONAL QSAR WORKSHOP
                                                     TALLINN, ESTONIA      10
INTEGRATION PLATFORMS
C O N TA I N E R
A P P L I C AT I O N S            WEB TOOLS
                                   ChemBench
OECD Toolbox                      ChemMine
   Windows only                   Collaborative Drug Discovery
   OECD / ECHA funded              (CDD)
                                   OCHEM
Bioclipse                         OpenTox
   Multiplatform, Java based;
                                  Recently reviewed in
    standard OSGI interface for
                                  Jeliazkova, N. (2012). Web tools for
    modules                       predictive toxicology model building.
   Open Source                   Expert opinion on drug metabolism
                                  & toxicology, 8(5), 1-11.




                                  15TH INTERNATIONAL QSAR WORKSHOP
                                                    TALLINN, ESTONIA   11
CHEMBENCH *

SCREENSHOT                      WORD CLOUD




             * Walker, T., Grulke, C. M., Pozefsky, D., & Tropsha, A. (2010).
             Chembench: a cheminformatics workbench. Bioinformatics (Oxford,
             England), 26(23), 1-2.

                                15TH INTERNATIONAL QSAR WORKSHOP
                                                  TALLINN, ESTONIA      12
CHEMMINE TOOLS *

SCREENSHOT                     WORD CLOUD




             *Backman, T. W. H., Cao, Y., & Girke, T. (2011). ChemMine tools:
             an online service for analyzing and clustering small molecules.
             Nucleic Acids Research, 39(Web Server issue), W486-W491.
             Oxford University Press.
                               15TH INTERNATIONAL QSAR WORKSHOP
                                                 TALLINN, ESTONIA       13
COLLABORATIVE DRUG DISCOVERY (CDD) *

SCREENSHOT                      WORD CLOUD




             * Hohman, M., Gregory, K., Chibale, K., Smith, P. J., Ekins, S., &
             Bunin, B. (2009). Novel web-based tools combining chemistry
             informatics, biology and social networks for drug discovery. Drug
             Discovery Today, 14(5-6), 261-270. Elsevier Ltd.
                                15TH INTERNATIONAL QSAR WORKSHOP
                                                  TALLINN, ESTONIA         14
OCHEM *

SCREENSHOT                      WORD CLOUD




             * Sushko, I., Novotarskyi, S., Körner, et al. (2011). Online chemical
             modeling environment (OCHEM): web platform for data storage,
             model development and publishing of chemical information. Journal
             of computer-aided molecular design,
                                 15TH INTERNATIONAL QSAR WORKSHOP
                                                   TALLINN, ESTONIA         15
OPENTOX *,**
SCREENSHOT                              WORLD CLOUD




        *Hardy, B., et al. (2010). Collaborative development of predictive toxicology
        applications. Journal of cheminformatics, 2(1), 7.
        **Jeliazkova, N., et. al. (2011). AMBIT RESTful web services: an
        implementation of the OpenTox application programming interface. Journal
        of cheminformatics, 3, 18. 1 5 T H I N T E R N A T I O N A L Q S A R W O R K S H O P
                                                                    TALLINN, ESTONIA         16
MORE WEB PROJECTS




            *Jeliazkova, N. (2012). Web tools for predictive toxicology
            model building. Expert opinion on drug metabolism & toxicology,
            8(5), 1-11.

                              15TH INTERNATIONAL QSAR WORKSHOP
                                                TALLINN, ESTONIA       17
OPENTOX ALGORITHMS
Uniform interface: (OpenTox web services API)
 Descriptor calculation, feature selection;
 Classification and regression algorithms;
 Rule based algorithms;
 Applicability domain algorithms;
 Visualization, similarity and substructure queries ;
 Composite algorithms (workflows);
 Structure optimization, metabolite generation, tautomer
  generation, etc.

               an algorithm i/ˈælɡərɪðəm/ is a step-by-step procedure for
               calculations. Algorithms are used for calculation, data
               processing, and automated reasoning.
                                   15TH INTERNATIONAL QSAR WORKSHOP
                                                     TALLINN, ESTONIA   18
UNIFORM APPROACH TO DATA PROCESSING
  Read data from a web address – process – write to a web address



     Dataset             +    Algorithm          =
                                                              Dataset
                                                                  =
     GET                        GET                           GET
     POST                       POST                          POST
     PUT                        PUT                           PUT
     DELETE                     DELETE                        DELETE


http://myhost1.com/dataset/trainingset1

                 http://myhost2.com/algorithm/{myalgorithm}
                                              http://myhost3.com/dataset/results

                         Once we have a set of uniform building
                         blocks, we can build new tools.
                                          15TH INTERNATIONAL QSAR WORKSHOP
                                                            TALLINN, ESTONIA   19
UNIFORM APPROACH TO MODELS BUILDING
  Read data from a web address – process – write to a web address



     Dataset             +     Algorithm
                                                  =            Model
                                                                  =
     GET                         GET                           GET
     POST                        POST                          POST
     PUT                         PUT                           PUT
     DELETE                      DELETE                        DELETE


http://myhost1.com/dataset/trainingset1

                http://myhost2.com/algorithm/{myalgorithm}
                                   http://myhost.com/model/predictivemodel1


                         Once we have a set of uniform building
                         blocks, we can build new tools.
                                           15TH INTERNATIONAL QSAR WORKSHOP
                                                             TALLINN, ESTONIA   20
ALGORITHM & MODEL SERVICES




          Model management is obtained as a side effect
                            15TH INTERNATIONAL QSAR WORKSHOP
                                              TALLINN, ESTONIA   21
OPENTOX WEB SERVICES AS BUILDING BLOCKS

W E B A P P L I C AT I O N S             BIOCLIPSE-OPENTOX*
• http://ToxPredict.org –
  aggregates remote
  predictions
• OpenTox Wrapper for
  OCHEM/CADASTER web
  services
• QMRF database
• Applicability domain used
  by CADASTER web site

                    *Willighagen E., Jeliazkova N., Hardy B., Grafstrom R., Spjuth
                    O., Computational toxicology using the OpenTox application
                    programming interface and Bioclipse, BMC Research
                    Notes 2011 4 (1), 487
                                          15TH INTERNATIONAL QSAR WORKSHOP
                                                            TALLINN, ESTONIA     22
HOW TO CREATE/PUBLISH A MODEL
Upload the training set and rebuild the model;
 Example: Upload the Open Melting Point dataset and run the linear
regression or random forest algorithm
Develop an OpenTox API compatible solutions, allowing
to train and run predictive models;
 Example : Lazar models, OpenTox partner models (BG, DE, CH, GR, RU)
Use thin wrappers for third-party models, and exposing
them through the compatible web service API.
 OCHEM models are exposed as OpenTox API compatible models
using this approach
                      All models become potentially visible to client
                      applications (ToxPredict, Bioclipse),
                      subject to access rights.

                                      15TH INTERNATIONAL QSAR WORKSHOP
                                                        TALLINN, ESTONIA   23
OPEN MELTING POINT DATASET




      Uploading a dataset makes it structure and similarity
      searchable; access rights could be controlled. An unique
      URI is assigned. This is a web service as well.
                                    15TH INTERNATIONAL QSAR WORKSHOP     24
                                                      TALLINN, ESTONIA
LOAD DATA, BUILD MODEL, APPLY MODEL




                    15TH INTERNATIONAL QSAR WORKSHOP
                                      TALLINN, ESTONIA   25
MELTING POINT MODELS COMPARISON




                   15TH INTERNATIONAL QSAR WORKSHOP
                                     TALLINN, ESTONIA   26
EXAMPLE: REPDOSE DATASETS




             Similarity search results


                         15TH INTERNATIONAL QSAR WORKSHOP
                                           TALLINN, ESTONIA   27
PUBLISH/SHARE A MODEL
Upload the training set and rebuild the model;
 Use existing algorithms (descriptors, statistics), under existing license.
Host on existing servers.
Develop an OpenTox API compatible solutions, allowing to
train and run predictive models;
 Guaranteed exact reproduction of the model!
 Any license (closed or open source). Host anywhere.
Use thin wrappers for third-party models, and exposing them
through the compatible web service API.
 Guaranteed exact reproduction of the model!
 Any license (closed or open source). Host anywhere.

                         Any OpenTox resource could be assigned restricted
                         access


                                          15TH INTERNATIONAL QSAR WORKSHOP
                                                            TALLINN, ESTONIA   28
PUBLISH/SHARE A MODEL
The advantage of a clean underlying API
 Compounds, properties, dataset, algorithms and models
Any kind of processing
 Could be formalized as algorithm or a model
 Applicability domain is an algorithm
 Structural alerts are models! Models of a human expertise
 We don’t need a separate database to handle structural
  alerts
 Just publish the structural alerts as models and apply the
  models to your dataset!




                                15TH INTERNATIONAL QSAR WORKSHOP
                                                  TALLINN, ESTONIA   29
DATASET COMPARISON
 USING OPENTOX ALGORITHMS AND MODELS
      ECHA Preregistration list                        PubChem
100                                     100
 90                                      90
 80                                      80
 70                                      70
 60                                      60
 50                                      50
 40                                      40
 30                                      30
 20                                      20
 10                               No     10                                    No
  0                                       0
                                  Yes                                          Yes




                                          15TH INTERNATIONAL QSAR WORKSHOP
                                                            TALLINN, ESTONIA        30
MODELS FOUND IN THE LITERATURE :AMBIGUOUS!
Cramer rules – textual description
 Q29. Readily Hydrolysed
 How to implement – abiotic / biotic ?
  Context: oral toxicity
  Yet there are different implementations
 Knowledge of the model internals is essential; compare
  predictions!
Substructure alerts
 Have come a long way since the practice of publishing only
  textual description.
 Mostly SMARTS
 Custom languages in some of the tools


                    No established procedure to extend
                    SMARTS or SMILES – thus all the custom
                    extensions.    15TH INTERNATIONAL QSAR WORKSHOP
                                                   TALLINN, ESTONIA   31
SMARTS
Multiple implementations
 Not entirely compatible
 No real standard, except the Daylight web page
Steep learning curve (even for experts)
 Q: Does SMARTS query [CH2][NH2] match [CH2][NH2+]?
 A: YES. Any attribute which is not specified in the SMARTS is
  not tested. So if you do not mention formal charge for an atom,
  any charge is allowed. The same is true for any other query
  attribute.
Validate published structural alerts!
 Publishing structural alerts should be no different than publishing
  QSAR models and require validation dataset

                    http://blueobelisk.shapado.com/questions/does-smarts-
                    query-ch2-nh2-match-ch2-nh2

                                     15TH INTERNATIONAL QSAR WORKSHOP
                                                       TALLINN, ESTONIA     32
THE EXPERT KNOWLEDGE IS AMBIGUOUS
Mode of actions
Adverse outcome pathways
 Effectopedia http://www.qsari.org/index.php/software/100-effectopedia
  – an attempt to gather the expert knowledge
   Hopefully will not become yet another silo
   Could be useful, if adopting the approach biology and bioinformatics are using to capture
    similar information
Ontologies; semantic annotation
 Knowledge representation with strong ground in logic and computer
  science
 Allow automatic reasoning
 Many computer frameworks
 Huge amount of biology/bioinformatic resources

                           Ontology:
                           A formal, shared conceptualization of a domain

                                                  15TH INTERNATIONAL QSAR WORKSHOP
                                                                    TALLINN, ESTONIA            33
CHEBI




        Chemical entities of biological interest
        http://www.ebi.ac.uk/chebi/

                       15TH INTERNATIONAL QSAR WORKSHOP
                                         TALLINN, ESTONIA   34
CHEBI




        15TH INTERNATIONAL QSAR WORKSHOP
                          TALLINN, ESTONIA   35
OBO FOUNDRY




              15TH INTERNATIONAL QSAR WORKSHOP
                                TALLINN, ESTONIA   36
BIOPORTAL




            15TH INTERNATIONAL QSAR WORKSHOP
                              TALLINN, ESTONIA   37
BIOPORTAL




            Search BioPortal for “hepatocellular necrosis”


                          15TH INTERNATIONAL QSAR WORKSHOP
                                            TALLINN, ESTONIA   38
WIKI PATHWAYS




                15TH INTERNATIONAL QSAR WORKSHOP
                                  TALLINN, ESTONIA   39
LINKED OPEN DATA CLOUD




           http://www.w3.org/DesignIssues/LinkedData.html
           SPARQL: http://linkedlifedata.com/sparql


                           15TH INTERNATIONAL QSAR WORKSHOP
                                             TALLINN, ESTONIA   40
BIOINFORMATICS
 Proliferation of databases
 Compatibility is an issue
 Many (relatively recent) open data
  initiatives / standardisation efforts.
 MIABE: Minimum Information about a
  Bioactive Entity, Nature Reviews Drug
  Discovery (2011): “Industry and
  academic actors from chemistry world
  agree on new bioactive molecule
  standard”
 ISA-TAB: Susanna-Assunta Sansone
  et. al., Toward interoperable
  bioscience data, Nature
  Genetics 44, 121–126 (2012)
                   http://isa-tools.org


                                           15TH INTERNATIONAL QSAR WORKSHOP
                                                             TALLINN, ESTONIA   41
CHEMINFORMATICS
 Historically, the cheminformatics world has been driven by
  de facto standards, developed and proposed by different
  vendors.
 Examples: SDF, MOL, SMILES, PDB
 No agreed way to modify or extend the formats!
 A number of initiatives (relatively recent), have adopted
  open standardisation procedures
 Examples: InChI, CML, BlueObelisk initiatives, ToXML
 No requirements for independent interoperable
  implementations so far




                                15TH INTERNATIONAL QSAR WORKSHOP
                                                  TALLINN, ESTONIA   42
DATA STANDARDS (LACK OF)
ID 73  CAS     00091-66-7       NAME      N,N-Diethylaniline WA    2.873
       Mv      0.58     H-073   0         nCb-     1         MAXDP 0.333
       nN      1        Log1/LC50 Exp     3.959    Y-Pred. 3.51    Hat
       0.034
OpenBabel11300911173D

 26 26 0 0 0 0 0 0 0 0999 V2000
   0.2812 0.8575 -0.6609 N 0 0 0 0 0
  -0.3126 2.0484 -1.0695 C 0 0 0 0 0
…… [output skipped]
11 25 1 0 0 0
 11 26 1 0 0 0
M END
$$$$




                       SDF files come in many different flavours

                                        15TH INTERNATIONAL QSAR WORKSHOP
                                                          TALLINN, ESTONIA   43
DATA STANDARDS (LACK OF)
    CPDBAS:                                         QMRF
    Carcinogenic Potency Database                   NAME 0616091047
                                                    56 55 0 0 0 0 0 0 0 0999 V2000
    http://www.epa.gov/ncct/dsstox/sdf_cpdbas.
                                                      0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0
    html#SDFFields
                                                        00000
     ActivityOutcome                                  1.4940 0.0000 0.0000 C 0 0 0 0 0 0 0
     active                                             00000
     unspecified/blank                              …. [output skipped]
     inactive                                       20 55 1 0 0 0 0
                                                    20 56 1 0 0 0 0
ISSCAN:                                             M END
Chemical Carcinogens Database                       > <Eye irritation SP pred.>
http://www.iss.it/ampp/dati/cont.php?id=233&lan     4.380000114440918
g=1&tipo=7
     Canc                                           > <CAS>
                                                    30399-84-9
     3 = carcinogen;
     2 = equivocal;                                 > <Type>
     1 = noncarcinogen                              Testing
                                                    $$$$

                    No common meaning of SDF fields!
                    Each tool provides means to map into its internal semantic.
                                                  15TH INTERNATIONAL QSAR WORKSHOP
                                                                    TALLINN, ESTONIA     44
OPENTOX FEATURES AND ONTOLOGY LINKS




            Every data field is a resource with unique URI and
            metadata assigned (not only name!)

                          15TH INTERNATIONAL QSAR WORKSHOP
                                            TALLINN, ESTONIA   45
DATA CURATION
“JUST structure-name validation is a long,
  torturous, iterative task.”
Antony Williams, RSC, “ChemSpider: A Crowdsourcing Environment
for Hosting and Validating Chemistry Resources”,
5th Meeting on U.S. Government Chemical Databases and Open
Chemistry, Frederick, MD, August 25-26, 2011

Approaches:
• Manual, crowdsourced
• A “standard” workflow - these may differ across toolkits and models!
• Compare with PubChem
• Compare as many sources as possible
                     “obviously , cheminformaticians must only use
                     correct chemical structures and biological activities
                     in their studies”
                                     15TH INTERNATIONAL QSAR WORKSHOP
                                                       TALLINN, ESTONIA   46
OPENTOX DATABASE QUALITY ASSURANCE




 Automatic
                                   Initial Quality Label Assigned
Classification
 Consensus                                      OK

                     Probably OK for the structure that belongs to the majority
  Majority
                 Probably ERROR for the structure(s) that belong(s) to the minority

Ambiguous                          Unknown (multiple sources)
Unconfirmed                         Unknown (single source)

                                          15TH INTERNATIONAL QSAR WORKSHOP
                                                            TALLINN, ESTONIA      47
10
                                                                                        20
                                                                                             30
                                                                                                  40
                                                                                                       50
                                                                                                            60
                                                                                                                 70
                                                                                                                      80
                                                                                                                           90




                                                                              0
                                                                                                                                100
                                                       ChemIDplus (20110503)
                                                   Chemical Identifier Resolver…
                                                        ChemDraw (20110505)
                                                                      CPDBAS
                                                                      DBPCAN
                                                                      EPAFHM
                                                                      FDAMDD
                                                                        HPVCSI
                                                                        HPVISD
                                                                          IRISTR
                                                                         KIERBL
                                                                      NCTRER
                                                                        NTPBSI
                                                                       NTPHTS
                                                                        ISSCAN
                                                                         ISSMIC
                                                                         ISSSTY
                                                                       TOXCST
                                                                       TXCST2
                                                                      ECETOC
                                                                            LLNA
                                                         LLNA-2nd compilation
                                                  Benchmark Data Set for pKa…
                                               Benchmark Data Set for In Silico…
                                                  Bursi AMES Toxicity Dataset
                                                                      EPI_AOP
                                                                      EPI_BCF




                                     OK
                                                                    EPI_BioHC
                                                                    EPI_Biowin
                                                                   EPI_Boil_Pt
                                                                     EPI_Henry
                                                                        EPI_KM
                                                                      EPI_KOA


                                     ProbablyOK
                                                                  EPI_Kowwin
                                                                  EPI_Melt_Pt
                                                                  EPI_PCKOC
                                                                         EPI_VP
                                                               EPI_WaterFrag
                                     Unknown


                                                               EPI_Wskowwin
                                                                      ECBPRS
                                                 NAME2STRUCTURE (OPSIN)
                                                 PubChem Structures + Assays
                                                      Leadscope_carc_level_2
                                                     Leadscope_ccris_genetox
                                                      Leadscope_cder_chronic
                                                     Leadscope_cder_genetox
                                     Probably ERROR




                                                   Leadscope_cder_repro_dev
                                                       Leadscope_cfsan_acute
                                                     Leadscope_cfsan_chronic
                                                    Leadscope_cfsan_genetox
                                                  Leadscope_cfsan_repro_dev
                                               Leadscope_fda_marketed_drugs
15TH INTERNATIONAL QSAR WORKSHOP
                  TALLINN, ESTONIA




                                                  Leadscope_genetox_level_2
                                                      Leadscope_ntp_genetox
                                            Pharmatrope_AERS_hepatobiliary_s…
                                                                                                                                      OPENTOX DATABASE QUALITY LABELS DISTRIBUTION




                                                              Entire database
   48
QMRF DATABASE : NOT ERROR FREE!




                       15TH INTERNATIONAL QSAR WORKSHOP
                                         TALLINN, ESTONIA   49
CHEMICAL STRUCTURE REPRESENTATION
                      IT DEPENDS!




              http://tinyurl.com/smilesquiz
                      15TH INTERNATIONAL QSAR WORKSHOP
                                        TALLINN, ESTONIA   50
CHEMICAL STRUCTURE REPRESENTATION

                      IT DEPENDS!




              http://tinyurl.com/smilesquiz
                      15TH INTERNATIONAL QSAR WORKSHOP
                                        TALLINN, ESTONIA   51
PROLIFERATION OF NEW TOOLS THAT ARE
RARELY ABLE TO TALK TO EACH OTHER.
                             THE SILO EFFECT
• Chemistry & Biology
  software and databases     Silo storage system
  may continue to live in     Designed to store one single
  their own worlds, unless     type of grain.
  we want data shared and
                             Information silo
  tools interoperable.
                              Rigid design
• Interoperability /          No easy exchange of information
  standards may affect        No integration with other systems
  business models.




                             15TH INTERNATIONAL QSAR WORKSHOP
                                               TALLINN, ESTONIA   52
THE INTERNET & INTERNET STANDARDS
Internet Engineering Task Force (IETF) working groups have the responsibility for
    developing and reviewing specifications intended as Internet Standards. The
    process starts by publishing a Request for Comments (RFC) – the goal is peer
    review or to convey new concepts or information.
IETF accepts some RFCs as Internet standards via its three step standardisation
   process. If an RFC is labelled as a Proposed Standard, it needs to be implemented
   by at least two independent and interoperable implementations, further reviewed
   and after correction becomes a Draft Standard.
With a sufficient level of technical maturity, a Draft Standard can become an Internet
   Standard. Organisations such as the World Wide Web consortium and OASIS
   support collaborations of open standards for software interoperability,.
The existence of the Internet itself, based on compatible hardware and software
   components and services is a demonstration of the opportunities offered by
   collaborative innovation, flexibility, interoperability, cost effectiveness and freedom
   of action.


                         An unique example of what society can
                         achieve by adopting common standards
                                                15TH INTERNATIONAL QSAR WORKSHOP
                                                                  TALLINN, ESTONIA       53
SCIENTIFIC DATABASES IMPLEMENTATION
   Identify the data model and functionality
   Translate the data model into a database schema
   Implement the database and user interface functionality
   (Optionally) provide libraries or expose (some) of the functionality as web
    services
Advantages
 Use one’s favourite technology and jump directly into implementation
 Attract end-users with nice GUI relatively quickly
 Relatively easy to persuade funding organisations this will be a useful resource
Disadvantages
 Proliferation of incompatible resources, providing similar functionality, but
  incompatible programming interface
 Difficult to extract and collate data automatically

                      This is not the only nor the best way!
                                              15TH INTERNATIONAL QSAR WORKSHOP
                                                                TALLINN, ESTONIA   54
DON’T LIVE IN THE PAST
“Twenty to thirty years ago, most applications were written
to solve a particular problem and were bound to a single
database. The application was the only way data got into
and out of the database.
Today, data is much more distributed and data consistency,
particularly in the face of extreme scale, poses some very
interesting challenges”




                    http://www.zdnet.com/blog/microsoft/microsoft-big-
                    brains-dave-campbell/1749
                                 15TH INTERNATIONAL QSAR WORKSHOP
                                                   TALLINN, ESTONIA   55
A COMMON API FIRST, MULTIPLE INDEPENDENT
INTEROPERABLE IMPLEMENTATIONS LATER
Advantages:
 Compatibility! Facilitates collation of distributed resources!
 Avoid proliferation of incompatible resources (this however
  only makes sense if the API is adopted beyond a single
  implementation)
 Easy to develop multiple GUI applications, once the
  API/library functionality is in place
Disadvantages: Think first, then implement .
 GUI comes last
 Harder to persuade funding organisations (because
  reviewers usually look for nice GUIs)

          Integration means compatibility and interaction;
          NOT necessarily storing everything on a single place.
                                  15TH INTERNATIONAL QSAR WORKSHOP
                                                    TALLINN, ESTONIA   56
DECENTRALIZED INFORMATION INTEGRATION
 Distributed, yet sufficiently interoperable model for
  information access.
 The future convergence between cheminformatics and
  bioinformatics databases poses new challenges to the
  management and analysis of large data sets.
 Evolution towards the right mix of flexibility, performance,
  scalability, interoperability, sets of unique features offered,
  friendly user interfaces, programmatic access for advanced
  users, platform independence, results reproducibility, curation
  and crowdsourcing utilities, collaborative sharing and secure
  access.

                       Interoperability is a key.
                                  15TH INTERNATIONAL QSAR WORKSHOP
                                                    TALLINN, ESTONIA   57
WHY DOES INTEROPERABILITY MATTER
Facilitates:
 Data , models and prediction results comparison
 A key to decision making
No need to wait until the perfect standard
 emerges
 Annotate and link the sources
 The least powerful standard wins!
There will always be new databases and tools
 Let them talk to each other




                                15TH INTERNATIONAL QSAR WORKSHOP
                                                  TALLINN, ESTONIA   58
THANK YOU!

Acknowledgments:




           OpenTox REST API
           http://opentox.org/dev/apis
           Download AMBIT Implementation of OpenTox
           API and launch your OpenTox service
           http://ambit.sourceforge.net               59

Weitere ähnliche Inhalte

Ähnlich wie Linking the silos. Data and predictive models integration in toxicology.

Understanding the Big Picture of e-Science
Understanding the Big Picture of e-ScienceUnderstanding the Big Picture of e-Science
Understanding the Big Picture of e-ScienceAndrew Sallans
 
The Symbiotic Nature of Provenance and Workflow
The Symbiotic Nature of Provenance and WorkflowThe Symbiotic Nature of Provenance and Workflow
The Symbiotic Nature of Provenance and WorkflowEric Stephan
 
2015-04-28 Open PHACTS at Swedish Linked Data Network Meet-up
2015-04-28 Open PHACTS at Swedish Linked Data Network Meet-up2015-04-28 Open PHACTS at Swedish Linked Data Network Meet-up
2015-04-28 Open PHACTS at Swedish Linked Data Network Meet-upopen_phacts
 
On chemical structures, substances, nanomaterials and measurements
On chemical structures, substances, nanomaterials and measurementsOn chemical structures, substances, nanomaterials and measurements
On chemical structures, substances, nanomaterials and measurementsNina Jeliazkova
 
Processing Amplicon Sequence Data for the Analysis of Microbial Communities
Processing Amplicon Sequence Data for the Analysis of Microbial CommunitiesProcessing Amplicon Sequence Data for the Analysis of Microbial Communities
Processing Amplicon Sequence Data for the Analysis of Microbial CommunitiesMartin Hartmann
 
Oracle Exadata
Oracle Exadata Oracle Exadata
Oracle Exadata Mainstay
 
2015-02-10 The Open PHACTS Discovery Platform: Semantic Data Integration for ...
2015-02-10 The Open PHACTS Discovery Platform: Semantic Data Integration for ...2015-02-10 The Open PHACTS Discovery Platform: Semantic Data Integration for ...
2015-02-10 The Open PHACTS Discovery Platform: Semantic Data Integration for ...open_phacts
 
Data Integration vs Transparency: Tackling the tension
Data Integration vs Transparency: Tackling the tensionData Integration vs Transparency: Tackling the tension
Data Integration vs Transparency: Tackling the tensionPaul Groth
 
BDE SC1 Workshop 3 - Open PHACTS Pilot (Kiera McNeice)
BDE SC1 Workshop 3 - Open PHACTS Pilot (Kiera McNeice)BDE SC1 Workshop 3 - Open PHACTS Pilot (Kiera McNeice)
BDE SC1 Workshop 3 - Open PHACTS Pilot (Kiera McNeice)BigData_Europe
 
NeOn: Lifecycle Support for Networked Ontologies - Case Studies in the Pharma...
NeOn: Lifecycle Support for Networked Ontologies - Case Studies in the Pharma...NeOn: Lifecycle Support for Networked Ontologies - Case Studies in the Pharma...
NeOn: Lifecycle Support for Networked Ontologies - Case Studies in the Pharma...Jose Manuel Gómez-Pérez
 
Model repositories and standard formats for model reusability
Model repositories and standard formats for model reusabilityModel repositories and standard formats for model reusability
Model repositories and standard formats for model reusabilityUniversity Medicine Greifswald
 
Let's work towards a #SaferWorldbyDesign - Knowledge-integrated Toxicology an...
Let's work towards a #SaferWorldbyDesign - Knowledge-integrated Toxicology an...Let's work towards a #SaferWorldbyDesign - Knowledge-integrated Toxicology an...
Let's work towards a #SaferWorldbyDesign - Knowledge-integrated Toxicology an...Barry Hardy
 
Kuchinsky_Cytoscape_BOSC2009
Kuchinsky_Cytoscape_BOSC2009Kuchinsky_Cytoscape_BOSC2009
Kuchinsky_Cytoscape_BOSC2009bosc
 
Apollo and i5K: Collaborative Curation and Interactive Analysis of Genomes
Apollo and i5K: Collaborative Curation and Interactive Analysis of GenomesApollo and i5K: Collaborative Curation and Interactive Analysis of Genomes
Apollo and i5K: Collaborative Curation and Interactive Analysis of GenomesMonica Munoz-Torres
 
Outlook for 2011: Four Trends in Translational Genomics
Outlook for 2011: Four Trends in Translational GenomicsOutlook for 2011: Four Trends in Translational Genomics
Outlook for 2011: Four Trends in Translational Genomicssimon8duke
 
Software Sustainability Institute
Software Sustainability InstituteSoftware Sustainability Institute
Software Sustainability InstituteNeil Chue Hong
 
NanoAgents: Molecular Docking Using Multi-Agent Technology
NanoAgents: Molecular Docking Using Multi-Agent TechnologyNanoAgents: Molecular Docking Using Multi-Agent Technology
NanoAgents: Molecular Docking Using Multi-Agent TechnologyCSCJournals
 
2011-12-02 Open PHACTS at STM Innovation
2011-12-02 Open PHACTS at STM Innovation2011-12-02 Open PHACTS at STM Innovation
2011-12-02 Open PHACTS at STM Innovationopen_phacts
 

Ähnlich wie Linking the silos. Data and predictive models integration in toxicology. (20)

Understanding the Big Picture of e-Science
Understanding the Big Picture of e-ScienceUnderstanding the Big Picture of e-Science
Understanding the Big Picture of e-Science
 
The Symbiotic Nature of Provenance and Workflow
The Symbiotic Nature of Provenance and WorkflowThe Symbiotic Nature of Provenance and Workflow
The Symbiotic Nature of Provenance and Workflow
 
2015-04-28 Open PHACTS at Swedish Linked Data Network Meet-up
2015-04-28 Open PHACTS at Swedish Linked Data Network Meet-up2015-04-28 Open PHACTS at Swedish Linked Data Network Meet-up
2015-04-28 Open PHACTS at Swedish Linked Data Network Meet-up
 
On chemical structures, substances, nanomaterials and measurements
On chemical structures, substances, nanomaterials and measurementsOn chemical structures, substances, nanomaterials and measurements
On chemical structures, substances, nanomaterials and measurements
 
Processing Amplicon Sequence Data for the Analysis of Microbial Communities
Processing Amplicon Sequence Data for the Analysis of Microbial CommunitiesProcessing Amplicon Sequence Data for the Analysis of Microbial Communities
Processing Amplicon Sequence Data for the Analysis of Microbial Communities
 
Oracle Exadata
Oracle Exadata Oracle Exadata
Oracle Exadata
 
2015-02-10 The Open PHACTS Discovery Platform: Semantic Data Integration for ...
2015-02-10 The Open PHACTS Discovery Platform: Semantic Data Integration for ...2015-02-10 The Open PHACTS Discovery Platform: Semantic Data Integration for ...
2015-02-10 The Open PHACTS Discovery Platform: Semantic Data Integration for ...
 
LogMap in Practice OM 2019
LogMap in Practice OM 2019LogMap in Practice OM 2019
LogMap in Practice OM 2019
 
Data Integration vs Transparency: Tackling the tension
Data Integration vs Transparency: Tackling the tensionData Integration vs Transparency: Tackling the tension
Data Integration vs Transparency: Tackling the tension
 
BDE SC1 Workshop 3 - Open PHACTS Pilot (Kiera McNeice)
BDE SC1 Workshop 3 - Open PHACTS Pilot (Kiera McNeice)BDE SC1 Workshop 3 - Open PHACTS Pilot (Kiera McNeice)
BDE SC1 Workshop 3 - Open PHACTS Pilot (Kiera McNeice)
 
NeOn: Lifecycle Support for Networked Ontologies - Case Studies in the Pharma...
NeOn: Lifecycle Support for Networked Ontologies - Case Studies in the Pharma...NeOn: Lifecycle Support for Networked Ontologies - Case Studies in the Pharma...
NeOn: Lifecycle Support for Networked Ontologies - Case Studies in the Pharma...
 
Model repositories and standard formats for model reusability
Model repositories and standard formats for model reusabilityModel repositories and standard formats for model reusability
Model repositories and standard formats for model reusability
 
Let's work towards a #SaferWorldbyDesign - Knowledge-integrated Toxicology an...
Let's work towards a #SaferWorldbyDesign - Knowledge-integrated Toxicology an...Let's work towards a #SaferWorldbyDesign - Knowledge-integrated Toxicology an...
Let's work towards a #SaferWorldbyDesign - Knowledge-integrated Toxicology an...
 
Kuchinsky_Cytoscape_BOSC2009
Kuchinsky_Cytoscape_BOSC2009Kuchinsky_Cytoscape_BOSC2009
Kuchinsky_Cytoscape_BOSC2009
 
Apollo and i5K: Collaborative Curation and Interactive Analysis of Genomes
Apollo and i5K: Collaborative Curation and Interactive Analysis of GenomesApollo and i5K: Collaborative Curation and Interactive Analysis of Genomes
Apollo and i5K: Collaborative Curation and Interactive Analysis of Genomes
 
Outlook for 2011: Four Trends in Translational Genomics
Outlook for 2011: Four Trends in Translational GenomicsOutlook for 2011: Four Trends in Translational Genomics
Outlook for 2011: Four Trends in Translational Genomics
 
BioNLPSADI
BioNLPSADIBioNLPSADI
BioNLPSADI
 
Software Sustainability Institute
Software Sustainability InstituteSoftware Sustainability Institute
Software Sustainability Institute
 
NanoAgents: Molecular Docking Using Multi-Agent Technology
NanoAgents: Molecular Docking Using Multi-Agent TechnologyNanoAgents: Molecular Docking Using Multi-Agent Technology
NanoAgents: Molecular Docking Using Multi-Agent Technology
 
2011-12-02 Open PHACTS at STM Innovation
2011-12-02 Open PHACTS at STM Innovation2011-12-02 Open PHACTS at STM Innovation
2011-12-02 Open PHACTS at STM Innovation
 

Kürzlich hochgeladen

Presentation Activity 2. Unit 3 transv.pptx
Presentation Activity 2. Unit 3 transv.pptxPresentation Activity 2. Unit 3 transv.pptx
Presentation Activity 2. Unit 3 transv.pptxRosabel UA
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPCeline George
 
The Contemporary World: The Globalization of World Politics
The Contemporary World: The Globalization of World PoliticsThe Contemporary World: The Globalization of World Politics
The Contemporary World: The Globalization of World PoliticsRommel Regala
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxHumphrey A Beña
 
4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptxmary850239
 
Measures of Position DECILES for ungrouped data
Measures of Position DECILES for ungrouped dataMeasures of Position DECILES for ungrouped data
Measures of Position DECILES for ungrouped dataBabyAnnMotar
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Celine George
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Mark Reed
 
Active Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdfActive Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdfPatidar M
 
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfTechSoup
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designMIPLM
 
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfGrade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfJemuel Francisco
 
EMBODO Lesson Plan Grade 9 Law of Sines.docx
EMBODO Lesson Plan Grade 9 Law of Sines.docxEMBODO Lesson Plan Grade 9 Law of Sines.docx
EMBODO Lesson Plan Grade 9 Law of Sines.docxElton John Embodo
 
Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4JOYLYNSAMANIEGO
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management SystemChristalin Nelson
 
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxQ4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxlancelewisportillo
 
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...JojoEDelaCruz
 

Kürzlich hochgeladen (20)

Presentation Activity 2. Unit 3 transv.pptx
Presentation Activity 2. Unit 3 transv.pptxPresentation Activity 2. Unit 3 transv.pptx
Presentation Activity 2. Unit 3 transv.pptx
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERP
 
The Contemporary World: The Globalization of World Politics
The Contemporary World: The Globalization of World PoliticsThe Contemporary World: The Globalization of World Politics
The Contemporary World: The Globalization of World Politics
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
 
4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx
 
Measures of Position DECILES for ungrouped data
Measures of Position DECILES for ungrouped dataMeasures of Position DECILES for ungrouped data
Measures of Position DECILES for ungrouped data
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17
 
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptxFINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)
 
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptxYOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
 
Active Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdfActive Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdf
 
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-design
 
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfGrade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
 
EMBODO Lesson Plan Grade 9 Law of Sines.docx
EMBODO Lesson Plan Grade 9 Law of Sines.docxEMBODO Lesson Plan Grade 9 Law of Sines.docx
EMBODO Lesson Plan Grade 9 Law of Sines.docx
 
Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management System
 
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptxINCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
 
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxQ4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
 
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
 

Linking the silos. Data and predictive models integration in toxicology.

  • 1. NINA JELIAZKOVA IdeaConsult Ltd. Sofia, Bulgaria www.ideaconsult.net 15TH INTERNATIONAL QSAR WORKSHOP TALLINN, ESTONIA
  • 2. LAST DECADE CHANGES Biology, Bioinformatics  Data, human genome Internet  Online databases  Online collaboration  Crowd Sourcing Open access, open source  Database management systems Williams J M et al. Brief Bioinform 2010;11:598-609 Although biological databases are  Machine learning and statistics only a fraction of all available  Cheminformatics bioinformatics software resources, their rise is representative of the overall growth of these resources, and is concurrent with the number of base pairs released in GenBank. 15TH INTERNATIONAL QSAR WORKSHOP TALLINN, ESTONIA 2
  • 3. OPEN SOURCE CHEMINFORMATICS Noel M O'Boyle et.al. Open Data, Open Source and Open Standards in A.Dalke EuroQSAR 2008 poster chemistry: The Blue Obelisk five years on. Journal of Cheminformatics 2011, 3:37 15TH INTERNATIONAL QSAR WORKSHOP TALLINN, ESTONIA 3
  • 4. DATABASES  2001: Binding DB  2006: Chemical Structure>80 mln Lookup Service (CSLS) structures  2002: NIAID ChemDB  2007: ChemSpider HIV/AIDS Database  2008/2009: ChEMBL  2003 Ligand Depot  2009: Chemical Identifier  2004: ZINC database Resolver (CIR)  2004: PDBbind database  2003: First Nucleic Acids  2004: PubChem >30 mln Research (NAR) annual structures Database issue  2004: sc-PDB  > 1000 databases published  2004: Binding MOAD per year since 2008;  2005: DrugBank  >1300 in 2011 Marc Nicklaus, 5th Meeting on U.S. Government Chemical Databases and Open Chemistry, 2011 http://cactus.nci.nih.gov/presentations/meeting-08- 2011/meeting-2011-08-25.html 15TH INTERNATIONAL QSAR WORKSHOP TALLINN, ESTONIA 4
  • 5. EASY ACCESS Increased activity Technology advances, more products What was revolutionary few decades ago with establishing the software for chemicals registration (e.g. CAS database) is a routine nowadays.  It could be a Computer Science Master project  or a research project … 15TH INTERNATIONAL QSAR WORKSHOP TALLINN, ESTONIA 5
  • 6. EU PROJECTS (AN INCOMPLETE LIST)  EU FP6/ FP7 : 2-FUN, ACROPOLIS , ACuteTox, AQUATERRA, ARTEMIS, CADASTER, CAESAR, carcinoGENOMICS, CONTAMED, ESNATS, GENESIS, HEIMSTA, INVITROHEART, LIINTOP, MENTRANS, MODELKEY, NOMIRACLE, OSIRIS, PREDICT-IV, OpenTox, ReProTect, RISKBASE, RISKCYCLE, Sens-it-iv , VITROCELLOMICS  Innovative Medicines Initiative – 30 ongoing projects , including eTox, OpenPhacts  Safety Evaluation Ultimately Replacing Animal Testing (SEURAT) cluster – 6 research projects 15TH INTERNATIONAL QSAR WORKSHOP TALLINN, ESTONIA 6
  • 7. SCIENTIFIC DATABASES IMPLEMENTATION  Identify the data model and functionality  Translate the data model into a database schema  Implement the database and user interface functionality  (Optionally) provide libraries or expose (some) of the functionality as web services Advantages  Use one’s favourite technology and jump directly into implementation  Attract end-users with nice GUI relatively quickly  Relatively easy to persuade funding organisations this will be a useful resource Disadvantages  Proliferation of incompatible resources, providing similar functionality, but incompatible programming interface  Difficult to extract and collate data automatically This is not the only nor the best way! 15TH INTERNATIONAL QSAR WORKSHOP TALLINN, ESTONIA 7
  • 8. EASY ACCESS , HENCE MANY NEW SYSTEMS SOFTWARE SYSTEMS THE SILO EFFECT Life sciences software Silo storage system  and toxicology in particular  Designed to store one single type of grain. Live in their own world  Mostly developed independently Information silo  Compatibility is rarely perceived  Rigid design as a primary design goal.  No easy exchange of information Lack of communication  No integration with other systems and common goals 15TH INTERNATIONAL QSAR WORKSHOP TALLINN, ESTONIA 8
  • 9. THE NEED OF INTEGRATION 2005: “Integrated Informatics in Life and Materials Sciences: An Oxymoron?” *  Calculations, Descriptors, Statistics, Models  Data (chemical structures, properties, predictions) Approaches toward integration: Workflow management systems Standalone container applications (chassis) Web applications Web services, web mashups * Gilardoni, F., Curcin, V., Karunanayake, K., Norgaard, J., & Guo, Y. (2005). QSAR Combinatorial Science, 24(1), 120-130. 15TH INTERNATIONAL QSAR WORKSHOP TALLINN, ESTONIA 9
  • 10. WORKFLOW MANAGEMENT SYSTEMS Disadvantages Commercial  Overhead: 30% of the tasks  Accelrys Pipeline Pilot defined and run in workflows are related to data conversion,rather Open source: than data analysis.  Kepler, KNIME, Taverna, Triana  Convenience:  Domain experts often prefer simple Workflow repositories user interfaces or software with predefined functionality;  MyExperiment  Power users prefer scripting http://www.myexperiment.org/ languages, rather than GUIs or graphical workflow builders with their specific constraints. Advantages  Compatibility: Nodes are not  Flexibility transferable between workflow  Reproducibility engines 15TH INTERNATIONAL QSAR WORKSHOP TALLINN, ESTONIA 10
  • 11. INTEGRATION PLATFORMS C O N TA I N E R A P P L I C AT I O N S WEB TOOLS  ChemBench OECD Toolbox  ChemMine  Windows only  Collaborative Drug Discovery  OECD / ECHA funded (CDD)  OCHEM Bioclipse  OpenTox  Multiplatform, Java based; Recently reviewed in standard OSGI interface for Jeliazkova, N. (2012). Web tools for modules predictive toxicology model building.  Open Source Expert opinion on drug metabolism & toxicology, 8(5), 1-11. 15TH INTERNATIONAL QSAR WORKSHOP TALLINN, ESTONIA 11
  • 12. CHEMBENCH * SCREENSHOT WORD CLOUD * Walker, T., Grulke, C. M., Pozefsky, D., & Tropsha, A. (2010). Chembench: a cheminformatics workbench. Bioinformatics (Oxford, England), 26(23), 1-2. 15TH INTERNATIONAL QSAR WORKSHOP TALLINN, ESTONIA 12
  • 13. CHEMMINE TOOLS * SCREENSHOT WORD CLOUD *Backman, T. W. H., Cao, Y., & Girke, T. (2011). ChemMine tools: an online service for analyzing and clustering small molecules. Nucleic Acids Research, 39(Web Server issue), W486-W491. Oxford University Press. 15TH INTERNATIONAL QSAR WORKSHOP TALLINN, ESTONIA 13
  • 14. COLLABORATIVE DRUG DISCOVERY (CDD) * SCREENSHOT WORD CLOUD * Hohman, M., Gregory, K., Chibale, K., Smith, P. J., Ekins, S., & Bunin, B. (2009). Novel web-based tools combining chemistry informatics, biology and social networks for drug discovery. Drug Discovery Today, 14(5-6), 261-270. Elsevier Ltd. 15TH INTERNATIONAL QSAR WORKSHOP TALLINN, ESTONIA 14
  • 15. OCHEM * SCREENSHOT WORD CLOUD * Sushko, I., Novotarskyi, S., Körner, et al. (2011). Online chemical modeling environment (OCHEM): web platform for data storage, model development and publishing of chemical information. Journal of computer-aided molecular design, 15TH INTERNATIONAL QSAR WORKSHOP TALLINN, ESTONIA 15
  • 16. OPENTOX *,** SCREENSHOT WORLD CLOUD *Hardy, B., et al. (2010). Collaborative development of predictive toxicology applications. Journal of cheminformatics, 2(1), 7. **Jeliazkova, N., et. al. (2011). AMBIT RESTful web services: an implementation of the OpenTox application programming interface. Journal of cheminformatics, 3, 18. 1 5 T H I N T E R N A T I O N A L Q S A R W O R K S H O P TALLINN, ESTONIA 16
  • 17. MORE WEB PROJECTS *Jeliazkova, N. (2012). Web tools for predictive toxicology model building. Expert opinion on drug metabolism & toxicology, 8(5), 1-11. 15TH INTERNATIONAL QSAR WORKSHOP TALLINN, ESTONIA 17
  • 18. OPENTOX ALGORITHMS Uniform interface: (OpenTox web services API)  Descriptor calculation, feature selection;  Classification and regression algorithms;  Rule based algorithms;  Applicability domain algorithms;  Visualization, similarity and substructure queries ;  Composite algorithms (workflows);  Structure optimization, metabolite generation, tautomer generation, etc. an algorithm i/ˈælɡərɪðəm/ is a step-by-step procedure for calculations. Algorithms are used for calculation, data processing, and automated reasoning. 15TH INTERNATIONAL QSAR WORKSHOP TALLINN, ESTONIA 18
  • 19. UNIFORM APPROACH TO DATA PROCESSING Read data from a web address – process – write to a web address Dataset + Algorithm = Dataset = GET GET GET POST POST POST PUT PUT PUT DELETE DELETE DELETE http://myhost1.com/dataset/trainingset1 http://myhost2.com/algorithm/{myalgorithm} http://myhost3.com/dataset/results Once we have a set of uniform building blocks, we can build new tools. 15TH INTERNATIONAL QSAR WORKSHOP TALLINN, ESTONIA 19
  • 20. UNIFORM APPROACH TO MODELS BUILDING Read data from a web address – process – write to a web address Dataset + Algorithm = Model = GET GET GET POST POST POST PUT PUT PUT DELETE DELETE DELETE http://myhost1.com/dataset/trainingset1 http://myhost2.com/algorithm/{myalgorithm} http://myhost.com/model/predictivemodel1 Once we have a set of uniform building blocks, we can build new tools. 15TH INTERNATIONAL QSAR WORKSHOP TALLINN, ESTONIA 20
  • 21. ALGORITHM & MODEL SERVICES Model management is obtained as a side effect 15TH INTERNATIONAL QSAR WORKSHOP TALLINN, ESTONIA 21
  • 22. OPENTOX WEB SERVICES AS BUILDING BLOCKS W E B A P P L I C AT I O N S BIOCLIPSE-OPENTOX* • http://ToxPredict.org – aggregates remote predictions • OpenTox Wrapper for OCHEM/CADASTER web services • QMRF database • Applicability domain used by CADASTER web site *Willighagen E., Jeliazkova N., Hardy B., Grafstrom R., Spjuth O., Computational toxicology using the OpenTox application programming interface and Bioclipse, BMC Research Notes 2011 4 (1), 487 15TH INTERNATIONAL QSAR WORKSHOP TALLINN, ESTONIA 22
  • 23. HOW TO CREATE/PUBLISH A MODEL Upload the training set and rebuild the model;  Example: Upload the Open Melting Point dataset and run the linear regression or random forest algorithm Develop an OpenTox API compatible solutions, allowing to train and run predictive models;  Example : Lazar models, OpenTox partner models (BG, DE, CH, GR, RU) Use thin wrappers for third-party models, and exposing them through the compatible web service API.  OCHEM models are exposed as OpenTox API compatible models using this approach All models become potentially visible to client applications (ToxPredict, Bioclipse), subject to access rights. 15TH INTERNATIONAL QSAR WORKSHOP TALLINN, ESTONIA 23
  • 24. OPEN MELTING POINT DATASET Uploading a dataset makes it structure and similarity searchable; access rights could be controlled. An unique URI is assigned. This is a web service as well. 15TH INTERNATIONAL QSAR WORKSHOP 24 TALLINN, ESTONIA
  • 25. LOAD DATA, BUILD MODEL, APPLY MODEL 15TH INTERNATIONAL QSAR WORKSHOP TALLINN, ESTONIA 25
  • 26. MELTING POINT MODELS COMPARISON 15TH INTERNATIONAL QSAR WORKSHOP TALLINN, ESTONIA 26
  • 27. EXAMPLE: REPDOSE DATASETS Similarity search results 15TH INTERNATIONAL QSAR WORKSHOP TALLINN, ESTONIA 27
  • 28. PUBLISH/SHARE A MODEL Upload the training set and rebuild the model;  Use existing algorithms (descriptors, statistics), under existing license. Host on existing servers. Develop an OpenTox API compatible solutions, allowing to train and run predictive models;  Guaranteed exact reproduction of the model!  Any license (closed or open source). Host anywhere. Use thin wrappers for third-party models, and exposing them through the compatible web service API.  Guaranteed exact reproduction of the model!  Any license (closed or open source). Host anywhere. Any OpenTox resource could be assigned restricted access 15TH INTERNATIONAL QSAR WORKSHOP TALLINN, ESTONIA 28
  • 29. PUBLISH/SHARE A MODEL The advantage of a clean underlying API  Compounds, properties, dataset, algorithms and models Any kind of processing  Could be formalized as algorithm or a model  Applicability domain is an algorithm  Structural alerts are models! Models of a human expertise  We don’t need a separate database to handle structural alerts  Just publish the structural alerts as models and apply the models to your dataset! 15TH INTERNATIONAL QSAR WORKSHOP TALLINN, ESTONIA 29
  • 30. DATASET COMPARISON USING OPENTOX ALGORITHMS AND MODELS ECHA Preregistration list PubChem 100 100 90 90 80 80 70 70 60 60 50 50 40 40 30 30 20 20 10 No 10 No 0 0 Yes Yes 15TH INTERNATIONAL QSAR WORKSHOP TALLINN, ESTONIA 30
  • 31. MODELS FOUND IN THE LITERATURE :AMBIGUOUS! Cramer rules – textual description  Q29. Readily Hydrolysed  How to implement – abiotic / biotic ?  Context: oral toxicity  Yet there are different implementations  Knowledge of the model internals is essential; compare predictions! Substructure alerts  Have come a long way since the practice of publishing only textual description.  Mostly SMARTS  Custom languages in some of the tools No established procedure to extend SMARTS or SMILES – thus all the custom extensions. 15TH INTERNATIONAL QSAR WORKSHOP TALLINN, ESTONIA 31
  • 32. SMARTS Multiple implementations  Not entirely compatible  No real standard, except the Daylight web page Steep learning curve (even for experts)  Q: Does SMARTS query [CH2][NH2] match [CH2][NH2+]?  A: YES. Any attribute which is not specified in the SMARTS is not tested. So if you do not mention formal charge for an atom, any charge is allowed. The same is true for any other query attribute. Validate published structural alerts!  Publishing structural alerts should be no different than publishing QSAR models and require validation dataset http://blueobelisk.shapado.com/questions/does-smarts- query-ch2-nh2-match-ch2-nh2 15TH INTERNATIONAL QSAR WORKSHOP TALLINN, ESTONIA 32
  • 33. THE EXPERT KNOWLEDGE IS AMBIGUOUS Mode of actions Adverse outcome pathways  Effectopedia http://www.qsari.org/index.php/software/100-effectopedia – an attempt to gather the expert knowledge  Hopefully will not become yet another silo  Could be useful, if adopting the approach biology and bioinformatics are using to capture similar information Ontologies; semantic annotation  Knowledge representation with strong ground in logic and computer science  Allow automatic reasoning  Many computer frameworks  Huge amount of biology/bioinformatic resources Ontology: A formal, shared conceptualization of a domain 15TH INTERNATIONAL QSAR WORKSHOP TALLINN, ESTONIA 33
  • 34. CHEBI Chemical entities of biological interest http://www.ebi.ac.uk/chebi/ 15TH INTERNATIONAL QSAR WORKSHOP TALLINN, ESTONIA 34
  • 35. CHEBI 15TH INTERNATIONAL QSAR WORKSHOP TALLINN, ESTONIA 35
  • 36. OBO FOUNDRY 15TH INTERNATIONAL QSAR WORKSHOP TALLINN, ESTONIA 36
  • 37. BIOPORTAL 15TH INTERNATIONAL QSAR WORKSHOP TALLINN, ESTONIA 37
  • 38. BIOPORTAL Search BioPortal for “hepatocellular necrosis” 15TH INTERNATIONAL QSAR WORKSHOP TALLINN, ESTONIA 38
  • 39. WIKI PATHWAYS 15TH INTERNATIONAL QSAR WORKSHOP TALLINN, ESTONIA 39
  • 40. LINKED OPEN DATA CLOUD http://www.w3.org/DesignIssues/LinkedData.html SPARQL: http://linkedlifedata.com/sparql 15TH INTERNATIONAL QSAR WORKSHOP TALLINN, ESTONIA 40
  • 41. BIOINFORMATICS  Proliferation of databases  Compatibility is an issue  Many (relatively recent) open data initiatives / standardisation efforts.  MIABE: Minimum Information about a Bioactive Entity, Nature Reviews Drug Discovery (2011): “Industry and academic actors from chemistry world agree on new bioactive molecule standard”  ISA-TAB: Susanna-Assunta Sansone et. al., Toward interoperable bioscience data, Nature Genetics 44, 121–126 (2012)  http://isa-tools.org 15TH INTERNATIONAL QSAR WORKSHOP TALLINN, ESTONIA 41
  • 42. CHEMINFORMATICS  Historically, the cheminformatics world has been driven by de facto standards, developed and proposed by different vendors.  Examples: SDF, MOL, SMILES, PDB  No agreed way to modify or extend the formats!  A number of initiatives (relatively recent), have adopted open standardisation procedures  Examples: InChI, CML, BlueObelisk initiatives, ToXML  No requirements for independent interoperable implementations so far 15TH INTERNATIONAL QSAR WORKSHOP TALLINN, ESTONIA 42
  • 43. DATA STANDARDS (LACK OF) ID 73 CAS 00091-66-7 NAME N,N-Diethylaniline WA 2.873 Mv 0.58 H-073 0 nCb- 1 MAXDP 0.333 nN 1 Log1/LC50 Exp 3.959 Y-Pred. 3.51 Hat 0.034 OpenBabel11300911173D 26 26 0 0 0 0 0 0 0 0999 V2000 0.2812 0.8575 -0.6609 N 0 0 0 0 0 -0.3126 2.0484 -1.0695 C 0 0 0 0 0 …… [output skipped] 11 25 1 0 0 0 11 26 1 0 0 0 M END $$$$ SDF files come in many different flavours 15TH INTERNATIONAL QSAR WORKSHOP TALLINN, ESTONIA 43
  • 44. DATA STANDARDS (LACK OF) CPDBAS: QMRF Carcinogenic Potency Database NAME 0616091047 56 55 0 0 0 0 0 0 0 0999 V2000 http://www.epa.gov/ncct/dsstox/sdf_cpdbas. 0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 html#SDFFields 00000 ActivityOutcome 1.4940 0.0000 0.0000 C 0 0 0 0 0 0 0 active 00000 unspecified/blank …. [output skipped] inactive 20 55 1 0 0 0 0 20 56 1 0 0 0 0 ISSCAN: M END Chemical Carcinogens Database > <Eye irritation SP pred.> http://www.iss.it/ampp/dati/cont.php?id=233&lan 4.380000114440918 g=1&tipo=7 Canc > <CAS> 30399-84-9 3 = carcinogen; 2 = equivocal; > <Type> 1 = noncarcinogen Testing $$$$ No common meaning of SDF fields! Each tool provides means to map into its internal semantic. 15TH INTERNATIONAL QSAR WORKSHOP TALLINN, ESTONIA 44
  • 45. OPENTOX FEATURES AND ONTOLOGY LINKS Every data field is a resource with unique URI and metadata assigned (not only name!) 15TH INTERNATIONAL QSAR WORKSHOP TALLINN, ESTONIA 45
  • 46. DATA CURATION “JUST structure-name validation is a long, torturous, iterative task.” Antony Williams, RSC, “ChemSpider: A Crowdsourcing Environment for Hosting and Validating Chemistry Resources”, 5th Meeting on U.S. Government Chemical Databases and Open Chemistry, Frederick, MD, August 25-26, 2011 Approaches: • Manual, crowdsourced • A “standard” workflow - these may differ across toolkits and models! • Compare with PubChem • Compare as many sources as possible “obviously , cheminformaticians must only use correct chemical structures and biological activities in their studies” 15TH INTERNATIONAL QSAR WORKSHOP TALLINN, ESTONIA 46
  • 47. OPENTOX DATABASE QUALITY ASSURANCE Automatic Initial Quality Label Assigned Classification Consensus OK Probably OK for the structure that belongs to the majority Majority Probably ERROR for the structure(s) that belong(s) to the minority Ambiguous Unknown (multiple sources) Unconfirmed Unknown (single source) 15TH INTERNATIONAL QSAR WORKSHOP TALLINN, ESTONIA 47
  • 48. 10 20 30 40 50 60 70 80 90 0 100 ChemIDplus (20110503) Chemical Identifier Resolver… ChemDraw (20110505) CPDBAS DBPCAN EPAFHM FDAMDD HPVCSI HPVISD IRISTR KIERBL NCTRER NTPBSI NTPHTS ISSCAN ISSMIC ISSSTY TOXCST TXCST2 ECETOC LLNA LLNA-2nd compilation Benchmark Data Set for pKa… Benchmark Data Set for In Silico… Bursi AMES Toxicity Dataset EPI_AOP EPI_BCF OK EPI_BioHC EPI_Biowin EPI_Boil_Pt EPI_Henry EPI_KM EPI_KOA ProbablyOK EPI_Kowwin EPI_Melt_Pt EPI_PCKOC EPI_VP EPI_WaterFrag Unknown EPI_Wskowwin ECBPRS NAME2STRUCTURE (OPSIN) PubChem Structures + Assays Leadscope_carc_level_2 Leadscope_ccris_genetox Leadscope_cder_chronic Leadscope_cder_genetox Probably ERROR Leadscope_cder_repro_dev Leadscope_cfsan_acute Leadscope_cfsan_chronic Leadscope_cfsan_genetox Leadscope_cfsan_repro_dev Leadscope_fda_marketed_drugs 15TH INTERNATIONAL QSAR WORKSHOP TALLINN, ESTONIA Leadscope_genetox_level_2 Leadscope_ntp_genetox Pharmatrope_AERS_hepatobiliary_s… OPENTOX DATABASE QUALITY LABELS DISTRIBUTION Entire database 48
  • 49. QMRF DATABASE : NOT ERROR FREE! 15TH INTERNATIONAL QSAR WORKSHOP TALLINN, ESTONIA 49
  • 50. CHEMICAL STRUCTURE REPRESENTATION IT DEPENDS! http://tinyurl.com/smilesquiz 15TH INTERNATIONAL QSAR WORKSHOP TALLINN, ESTONIA 50
  • 51. CHEMICAL STRUCTURE REPRESENTATION IT DEPENDS! http://tinyurl.com/smilesquiz 15TH INTERNATIONAL QSAR WORKSHOP TALLINN, ESTONIA 51
  • 52. PROLIFERATION OF NEW TOOLS THAT ARE RARELY ABLE TO TALK TO EACH OTHER. THE SILO EFFECT • Chemistry & Biology software and databases Silo storage system may continue to live in  Designed to store one single their own worlds, unless type of grain. we want data shared and Information silo tools interoperable.  Rigid design • Interoperability /  No easy exchange of information standards may affect  No integration with other systems business models. 15TH INTERNATIONAL QSAR WORKSHOP TALLINN, ESTONIA 52
  • 53. THE INTERNET & INTERNET STANDARDS Internet Engineering Task Force (IETF) working groups have the responsibility for developing and reviewing specifications intended as Internet Standards. The process starts by publishing a Request for Comments (RFC) – the goal is peer review or to convey new concepts or information. IETF accepts some RFCs as Internet standards via its three step standardisation process. If an RFC is labelled as a Proposed Standard, it needs to be implemented by at least two independent and interoperable implementations, further reviewed and after correction becomes a Draft Standard. With a sufficient level of technical maturity, a Draft Standard can become an Internet Standard. Organisations such as the World Wide Web consortium and OASIS support collaborations of open standards for software interoperability,. The existence of the Internet itself, based on compatible hardware and software components and services is a demonstration of the opportunities offered by collaborative innovation, flexibility, interoperability, cost effectiveness and freedom of action. An unique example of what society can achieve by adopting common standards 15TH INTERNATIONAL QSAR WORKSHOP TALLINN, ESTONIA 53
  • 54. SCIENTIFIC DATABASES IMPLEMENTATION  Identify the data model and functionality  Translate the data model into a database schema  Implement the database and user interface functionality  (Optionally) provide libraries or expose (some) of the functionality as web services Advantages  Use one’s favourite technology and jump directly into implementation  Attract end-users with nice GUI relatively quickly  Relatively easy to persuade funding organisations this will be a useful resource Disadvantages  Proliferation of incompatible resources, providing similar functionality, but incompatible programming interface  Difficult to extract and collate data automatically This is not the only nor the best way! 15TH INTERNATIONAL QSAR WORKSHOP TALLINN, ESTONIA 54
  • 55. DON’T LIVE IN THE PAST “Twenty to thirty years ago, most applications were written to solve a particular problem and were bound to a single database. The application was the only way data got into and out of the database. Today, data is much more distributed and data consistency, particularly in the face of extreme scale, poses some very interesting challenges” http://www.zdnet.com/blog/microsoft/microsoft-big- brains-dave-campbell/1749 15TH INTERNATIONAL QSAR WORKSHOP TALLINN, ESTONIA 55
  • 56. A COMMON API FIRST, MULTIPLE INDEPENDENT INTEROPERABLE IMPLEMENTATIONS LATER Advantages:  Compatibility! Facilitates collation of distributed resources!  Avoid proliferation of incompatible resources (this however only makes sense if the API is adopted beyond a single implementation)  Easy to develop multiple GUI applications, once the API/library functionality is in place Disadvantages: Think first, then implement .  GUI comes last  Harder to persuade funding organisations (because reviewers usually look for nice GUIs) Integration means compatibility and interaction; NOT necessarily storing everything on a single place. 15TH INTERNATIONAL QSAR WORKSHOP TALLINN, ESTONIA 56
  • 57. DECENTRALIZED INFORMATION INTEGRATION  Distributed, yet sufficiently interoperable model for information access.  The future convergence between cheminformatics and bioinformatics databases poses new challenges to the management and analysis of large data sets.  Evolution towards the right mix of flexibility, performance, scalability, interoperability, sets of unique features offered, friendly user interfaces, programmatic access for advanced users, platform independence, results reproducibility, curation and crowdsourcing utilities, collaborative sharing and secure access. Interoperability is a key. 15TH INTERNATIONAL QSAR WORKSHOP TALLINN, ESTONIA 57
  • 58. WHY DOES INTEROPERABILITY MATTER Facilitates:  Data , models and prediction results comparison  A key to decision making No need to wait until the perfect standard emerges  Annotate and link the sources  The least powerful standard wins! There will always be new databases and tools  Let them talk to each other 15TH INTERNATIONAL QSAR WORKSHOP TALLINN, ESTONIA 58
  • 59. THANK YOU! Acknowledgments: OpenTox REST API http://opentox.org/dev/apis Download AMBIT Implementation of OpenTox API and launch your OpenTox service http://ambit.sourceforge.net 59