SlideShare ist ein Scribd-Unternehmen logo
1 von 41
Supported by




Prominent international speakers from




             h"p://workshop.eisbm.eu1
IntelliGO semantic similarity measure
     for Gene Ontology annotations



                            Marie-Dominique Devignes
Laboratoire Lorrain de Recherche en Informatique et ses
                                   Applications (LORIA)
             Equipe Orpailleur – INRIA Nancy Grand-Est
Outline of the talk

1.    Introduction: Knowledge Discovery guided by Domain Knowledge

2.    IntelliGO : a semantic similarity measure for GO annotations

3.    IntelliGO-based clustering of genes

4.    IntelliGO-based abstraction for data mining

5.    Conclusion




Lyon, 14 juin 2012                                                   2/42
LORIA, Team Orpailleur
    Making data talk : from data to knowledge
                                                      KDD*

                         K
                                            Information        K
                  Information



                     Data
                                               Data

      Static picture : pyramid        Dynamic picture : loop
                                                   * KDD : Knowledge Discovery
                                                       from Databases

    Lyon, 14 juin 2012                                                  3
The data deluge…
                                                 Données NGS


   Croissance de EMBL                                                              !
                                                                            xity
90 000 000
                                   A
                                AT                                       ple
80 000 000


                            ig D                                    om
70 000 000
60 000 000
50 000 000
                           B     !                       EST
                                                                   C
40 000 000                                               non-EST
30 000 000                                               WGS
20 000 000
10 000 000
        0
             1992   1994   1997   2000    2004    2009




                                                  t
                                              r ma i t y !
                                            Fo ene                                      No
                                             rog                              knowledge
                                         hete                                  without
                                                                             knowledge …
                                                                              (Amedeo Napoli)


      Lyon, 14 juin 2012                                                                      4
Knowledge Discovery from Databases (KDD)
A three-step iterative process…
                                                         3. Interpretation

                                      2. Data mining


                                                                             Knowledge
           1. Preparation            Formatting

                                                                Rules, patterns
                        Selection

         Integration                               Formatted data                 Expert
                                     Dataset

                        Integrated
                           data

     Database

                                                  …interactively controlled by an expert.

Lyon, 14 juin 2012                                                                         5/42
Knowledge Discovery guided by Domain
Knowledge : KDDK
 Ontologies to assist the expert…                       3. Interpretation

                                      2. Data mining


                                                                            Knowledge
           1. Preparation            Formatting

                                                               Rules, patterns
                        Selection

         Integration                              Formatted data                 Expert
                                     Dataset

                        Integrated
                           data

     Database


                                               … at each step of the process.

Lyon, 14 juin 2012                                                                        6/42
Our scope today :

   How ontologies can be used to improve the data mining step :
        Design of a semantic similarity measure on Gene Ontology

        Use it for functional clustering of genes
              Improvement of gene classification


        Apply it to another ontology (MedDRA) for secondary effects
         clustering
              Data abstraction and dimension reduction
              Essential for execution of data mining programms




Lyon, 14 juin 2012                                                     7/42
Outline of the talk

1.    Introduction: Knowledge Discovery guided by Domain Knowledge

2.    IntelliGO : a semantic similarity measure for GO annotations

3.    IntelliGO-based clustering of genes

4.    IntelliGO-based abstraction for data mining

5.    Conclusion




Lyon, 14 juin 2012                                                   8/42
Semantic similarity in biomedical ontologies
   Biological objects are « annotated » by terms / keywords
       facilitates data retrieval from database


   Annotation terms / Keywords are organized as controlled
    vocabularies
      Tree : e.g. Swissprot Keywords
      Thesaurus : e.g. MESH, UMLS
      Ontologies : e.g. Gene Ontology


   Similarity measure comparing annotations can take advantage of
    semantic relationships between terms /keywords
       « semantic similarity measure »



Lyon, 14 juin 2012                                                   9/42
Semantic similarity of annotations

   General intuition….
                                                     Semantic
                                                   relationships
                                                  between terms


                      GO-t1   GO-t2   GO-t3   …

    Gene1              X       X       O      …              Objects as
    Gene2              X       O       X      …          « feature vectors »

    …


« Euclidean » Score    1       0       0                 1

« Semantic » Score     1      0.5     0.5                2



Lyon, 14 juin 2012                                                  10/42
Brief outlook on GO (1)
   Annotation vocabulary (GO
    consortium since 2000)
        More 30,000 terms
        Millions of annotations (genes and
         proteins in various species) recorded
         in public databases
        Three aspects :
        Biological Process
        Cellular Component
        Molecular Function
   Structured as a Rooted Directed Acyclic
    Graph
      Nodes = terms
      Edges = semantic relationships
              Is_a, part_of, related to
        IMPORTANT : More than one parent        from Pesquita et al., 2009


    Lyon, 14 juin 2012                                               11/42
Example of similar GO annotations

                                Retrieved from Gene at NCBI




   identical GO terms
   similar GO terms
Lyon, 14 juin 2012                               12/42
GO evidence codes
   Traces the origin of each annotation
        IEA (« Inferred from Electronic Annotation ») = not assigned by
         curator
        All others (16) manually assigned
                                 Computational                 Author     Curatorial
      Experimental
                                   analysis                   statement   statement
         EXP                          ISS                       TAS          IC
         IDA                          ISO                       NAS          ND
         IPI                          ISA
         IMP                          ISM
         IGI                          IGC
         IEP                          RCA


« Evidence codes cannot be used as a measure of the quality of the annotation »
    http://www.geneontology.org/GO.evidence.shtml
           To be used as a filter for certain type of annotations

Lyon, 14 juin 2012                                                                     13/42
Semantic similarity measures : overview (1)
      Pesquita et al., 2009: Semantic similarity in biomedical ontologies, PLOS Comp. Biol. July
      2009

      1. Gene-Gene or Protein-Protein comparison
          = comparing two lists of terms (feature vectors)


               « groupwise »                                                   « pairwise »




      Set               Graph             Vector                   All pairs                   Best pairs



                                                           - Lord et al. (2003) : All pairs/ Average/ Resnick, Lin,
- Huang et al. (2007) : Vector/ Kappa-statistics (DAVID)   Jiang measures
- Martin et al. (2004) : Graph/Jaccard                     - Wang et al. (2007) : best pairs/Average/ Wang
                                                           measure

   Lyon, 14 juin 2012                                                                                  14/42
Semantic similarity measures : overview (2)
  2. Term-term comparison
      = exploiting the semantic relationships


      Root                                   Node-based approaches
                                                Information content of
                          __ Depth(LCA)          lowest common ancestor
                          __ SPL(t ,t )          (LCA)
                                  1 2
                                                       Relative to a corpus
                                                   Depth(LCA)
                                                       Independent of corpus


             t1      t2
                                             Edge-based approaches
                                                Shortest Path Length (SPL)



Lyon, 14 juin 2012                                                        15/42
IntelliGO semantic similarity measure (1)
  Benabderrahmane et al., 2010 : BMC Bioinformatics

  1. Term-term comparison
        = combining node-based and edge-based approaches in a DAG

                                             In a tree (MeSH, Ganesan 2003)
      Root
                                                                               2 x Depth(LCA)
                                                   SimGanesan(t1,t2) =
                          __ MaxDepth(LCA)                                   Depth(t1) + Depth(t2)

                          __ MinSPL(t ,t )
                                     1 2

                                               In a rooted DAG (GO, IntelliGO 2010)

                                                   SimIntelliGO(t1,t2)

                                                                         2 x MaxDepth(LCA)
                                                          =
             t1      t2                                       MinSPL(t1,t2) + 2 x MaxDepth(LCA)



Lyon, 14 juin 2012                                                                               16/42
IntelliGO semantic similarity measure (2)
    2. Gene (or protein) representation
   Representation of genes in a vector space model

       g = ∑i α i e i            ei   : basis vector, one per term (ti)
                                 αi   : Coefficient for term ti
   Definition of coefficients
     α i = w(g, ti) x IAF(ti)    w(g, ti)      : weight of evidence code * qualifying the
                                               assignment of feature ti to gène g
                                 IAF(ti)        : « Inverse Annotation Frequency » ~ Information
                                                Content of feature ti in annotation corpus.
                                                                          * When more than one code, take
   Definition of information content                                                  the maximal weight

                        NTOT     NTOT     : Total number of genes in the corpus
      IAF(ti) = log
                         Nti     Nti      : Number of gènes with feature ti




Lyon, 14 juin 2012                                                                               17/42
IntelliGO semantic similarity measure (3)
  3. Gene-gene comparison

    Generalized dot-product between two gene vectors

               g . h = ∑ i,j α i x β j x e i . ej           with e i . ej = SimIntelliGO (ti,tj)
                                                                  1 for i = j
                                                                ≠ 0 for i ≠ j

     Generalized cosine similarity


                                                          g . h
                      SimIntelliGO(g , h ) =
                                                    √g   . g x√ h . h




Lyon, 14 juin 2012                                                                         18/42
Implementation
                                                                                     (Tax_ID, Gene_ID, GO_ID, Evid_Code,
                                                         NCBI AnnotationFile         GO_Def, GO_aspect)
Parameter 1             Species

Parameter 2           GO aspect


Parameter 3    Liste of weights for EC                  CuratedAnnotationFile



                                 IAF calculation                           MaxDepth(LCA                 MinSP
                Genes                                    Terms                               LCA          L          SPL

           (Gene_ID , Array of                                                          (GO_ID pairs,
                                                   (GO_ID, IAF, Array of                                        (GO_ID pairs,
          [GO_ID, Evid-Code])                                                       LCA_Depth, LCA_ID_List
                                                      [Gene_IDs])                                                  SPL)

                    Liste of genes of                                                           Sim(ti, tj ) calculation
                  interest (Gene_IDs)

                                                                                                    IntelliGO
                                       Vector
                                   representation                                                  term-term

                                                    Gene-Gene IntelliGO Similarity Matrix



     Lyon, 14 juin 2012                                                                                                    19/42
IntelliGO on-line
    http://plateforme-mbi.loria.fr/intelligo




Lyon, 14 juin 2012                              20/42
Outline of the talk

1.    Introduction: Knowledge Discovery guided by Domain Knowledge

2.    IntelliGO : a semantic similarity measure for GO annotations

3.    IntelliGO-based clustering of genes

4.    IntelliGO-based abstraction for data mining

5.    Conclusion




Lyon, 14 juin 2012                                                   21/42
Gene clustering with IntelliGO : reference
datasets
   Clustering means grouping together most similar objects and putting in different
    clusters most dissimilar objects
         relies on a similarity/distance measure
   Evaluation purpose : Reference sets of genes
      participating in similar biological process  pathways
      sharing similar molecular functions  Pfam Clans
      In two different species

          Dataset    Species          Source         Number of   Total genes
                                                       sets
              1       Human       KEGG pathways          13         275

              2       Yeast       KEGG pathways          13         169

              3       Human         Pfam Clans           10          94

              4       Yeast         Pfam Clans           10         118


Lyon, 14 juin 2012                                                               22/42
1. Hierarchical clustering for comparing
IntelliGO with other semantic similarity
measures
   For each collection of gene sets
        List all genes from all sets
        Pairwise similarity calculation - > distance matrix
              Lord’s measure (normalized) : based on IC(LCA)
              Al-Mubaid’s measure : based on SPL
              SimGIC measure : based on count of common ancestors
              IntelliGO measure : based on both MaxDepth(LCA) and minSPL


        Hierarchical clustering and heatmap visualisation
              R BioConductor




Lyon, 14 juin 2012                                                          23/42
Heatmap visualisation of hierarchical
  clustering


Lord et al.                               Al-Mubaid
(normalized)




   SIMGIC                                 IntelliGO :
                                          - well-balanced
                                          - suggests fuzziness



  Lyon, 14 juin 2012                              24/42
2. Functional classification : the DAVID
software (1)
   Functional classification = Clustering of functional annotation of genes:
   On-line tool : DAVID (Database for Annotation Visualisation and
    Integrated Discovery) software




Lyon, 14 juin 2012                                                          25/42
DAVID functional classification

    Rich feature vectors : GO terms, domains, Enzyme Classification, Pathways, etc.
    Kappa statistics : similarity measure based on co-occurrence and co-absence of
     features
    Fuzzzy clustering agorithm : one gene can belong to more than one cluster.
    Optimization of K number by varying kappa threshold -> % excluded genes



             GO-t1   GO-t2   GO-t3   …   PfamD1   …
                                                           Counting present and absent
     Gene1      X      X       O     …      X      …       features + ‘Kappa statistics’
     Gene2      X      O       X     …      X      …       Not a semantic similarity measure
                                                           No feature selection possible
     …



                                                                                   26/22
3. IntelliGO-based fuzzy C-means
    IntelliGO distance matrix -> fanny implementation of fuzzy C-means in R
     environment, minimization of objective function

    Remark : not a feature vector matrix + euclidean distance  no centroid calculation,
     no appropriate validity index
    Membership probability matrix: one gene can belong to more than one cluster.
    -> % excluded genes
             C1      C2      C3     …       Membership threshold :               Assignt
                                                     20 %
     Gene1    90 %     0      8%     …                               Gene1          C1

     Gene2    25 %    40 %    0%     …                               Gene2        C1, C2

     Gene3    0%      10 %   13 %   …                                Gene 3        Excl.




                                                                                   27/22
4. Comparison using overlap analysis
                                   R1                R2                     Rn
                    Available
                  classification
                                                                …

   602495
    21259
Input list
    65910    of                               Overlap analysis                                     Global F-score =
    12151
  genes                                      Pair-wise F-scores                                 averageRi(MaxRiF_score)
   157805
     5246
      489
   171553
       ...
                   Functional
                  classification
                                                                    …
                    at various
                     K value        C1                 C2                     Ck
     Precision (Ci, Rj) : % genes present in Ci that belong to Rj                            2 × Precision( , Rj) × Recall( , Rj)
                                                                                                          Ci              Ci
                                                                        F_score (Ci, Rj) =
      Recall (Ci, Rj): % genes from Rj that are found in Ci                                    Precision( , Rj) + Recall( , Rj)
                                                                                                         Ci             Ci



                                                                                                                        28/22
Comparison : optimal F-score and K number

         IntelliGO +Hierarchical        IntelliGO+ fanny               DAVID

Ref        Opt.       Opt.   Excl.    Opt.    Opt.    Excl.    Opt.     Opt.   Excl.
sets      Fscore       K     genes   Fscore    K      genes   Fscore     K     genes
1 (13)     0.58       17     0%       0.70     7       0%      0.67      10    21%
2 (13)     0.57       17     0%       0.74     9       1%      0.68      9     18 %

3 (10)     0.82       18     0%       0.77     15      5%      0.64      11    27 %

4 (10)     0.68       13     0%       0.71     12      2%      0.70      10    41 %




  Functional classification is reliable and robust with IntelliGO measure
                                      Benabderrahmane et al., BIBM workshop IDASB 2011


 Lyon, 14 juin 2012                                                                    29/42
5. CR analysis : enrichment of reference sets

    CR : Similar annotation but not in reference set
         Suggestion : include this gene in the reference set
         Calculate a weight :
                                        Enrichment ( R ) ∩ Annotation ( g )
                     Suggest(g, R) =
                                                    Enrichment ( R )

      where Enrichment(R) contains annotation terms specific of R (P-value < Pthreshold)


    Illustration
         The STON1 gene was suggested to join human pathway hsa04130
          (SNARE interactions in vesicular transport) on the basis of its BP
          annotation: ‘intracellular transport’.
         To be confirmed by biologist experts of this pathway !



                                                                                      30/22
6. RC analysis: enrichment of gene annotation

     RC : Members of reference set not functionally classified with the rest of
      the set
          Suggestion : enrich gene annotation with terms t specific of C (present in
           Enrichment(C), given a P-value threshold)

          Calculate a weight :
                        Suggest(t) =
                                        {g t ∈Enrichment    ( C )  Annotation ( g )}
                                                           R C

     Illustration
          Adding BP term “Methionine biosynthetic process” to ARO8 gene in
           yeast because this term is significantly enriched in cluster C9 matching
           with pathway R6 (Lysine metabolism) to which ARO8 belongs to.
          Indeed, the expert confirmed that ARO8 is involved in a methionine
           biosynthesis pathway
                                                                                   31/22
Outline of the talk

1.    Introduction: Knowledge Discovery guided by Domain Knowledge

2.    IntelliGO : a semantic similarity measure for GO annotations

3.    IntelliGO-based clustering of genes

4.    IntelliGO-based abstraction for data mining

5.    Conclusion




Lyon, 14 juin 2012                                                   32/42
Application of IntelliGO measure to another
ontology: MedDRA
   Study with MedDRA : Medical Directory of Regulatory Activities
        Part of UMLS, about 20,000 terms.
        Used for side-effect description = subset of MedDRA previously called
         COSTART, about 1288 terms.
        MedDRA is organized as a rDAG with five depth levels: System Organ
         class, High Level Group Term, High Level Term, Preferred Term, Lowest
         Level Term.
        About 1/3 of MedDRA terms have more than one parent.
   SIDER : Side Effect Repository at the EMBL (http://sideeffects.embl.de)
        About 800 drugs associated with 1288 side effect features (MedDRA terms)
        Challenge : to apply symbolic data mining methods on this large dataset to
         discover patterns and regularities(Emmanuel Bresso’s PhD Thesis,
         Harmonic Pharma)

Lyon, 14 juin 2012                                                             33/42
Replacing MedDRA terms with term clusters (TC)

           Large matrices (1288 attributes) are untractable with symbolic methods
                  Search for frequent itemsets fails because not enough features are shared !


           Abstraction : Reduce the number of attributes without losing information
                  Classical problem in data mining
                  Generalisation in a tree, not possible with a rDAG
                  Semantic clustering is a solution
                  IntelliGO similarity between MedDRA terms


                             2 DepthMax[LCA (ti, tj)]                                                   SPL (ti, tj)
SimIntelliGO(ti, tj) =                                           DIstIntelliGO(ti, tj) =
                         SPL(ti, tj) + 2 DepthMax[LCA (ti,tj)]                             SPL(ti, tj) + 2 DepthMax[LCA (ti,tj)]




      Lyon, 14 juin 2012                                                                                               34/42
Hierarchical clustering of MedDRA terms
                                                      Cluster terms            AvgDist to
                                                           T54                other cluster
   Pairwise distances calculated for a                                          terms
    subset of 1288 terms                          Erythema                        0.31
                                                  Lichen planus                   0.32
   Hierarchical clustering + Kelley’s            Parapsoriasis                   0.32
                                                  Pityriasis alba                 0.32
    optimisation of cluster number                Rash papular                    0.32
         112 term clusters                      Decubitus ulcer                 0.35
                                                  Lupus miliaris                  0.35
        Calculation of the most representative   disseminatus faciei
         element                                  Pruritus                        0.35
        Validation by the expert                 Rash                            0.35
                                                  Sunburn                         0.35
   Example : TermCluster T54 Erythema            Vulvovaginal pruritus           0.35
                                                  Dandruff                        0.37
        15 terms related to skin pathologies     Rash                            0.37
                                                  Photosensitivity reaction       0.37
                                                  Psoriasis
                                                                                  0.37


Lyon, 14 juin 2012                                                                  35/42
Mining Cardiovascular Agents (CA) and Anti-
  Infective Agents (AIA)

CAAll           1288 attributes (terms)            CATC       112 attributes (term clusters)

           T1    T2    T3     T4     …                         TC1     TC2     TC3    …

 Drug1                                              Drug 1

 Drug2                                              Drug 2

 …                                                  …

94 drugs                                           94 drugs




        Idem for Anti-infective Agents : 76 drugs, 2 datasets AIAAll and AIATC

  Lyon, 14 juin 2012                                                                 36/42
Frequent Closed Itemset (FCI) extraction (1)

Minimal           50 %         60 %          70 %         80 %         90 %         100 %
support
CAAll                386         94           41            11            1            0
CATC              5,564        1,379         256            62            6            0
AIAAll               178         41            9            2             0            0
AIATC                654        154           30            3             0            0


       Use of Zart algorithm (Coron platform for symbolic datamining)
       FCI : maximal subsets of drugs sharing similar side effects (All) or similar TCs
       More FCIs are found with TC representation



Lyon, 14 juin 2012                                                                         37/42
Frequent Closed Itemset (FCI) extraction (2)

   FCIs obtained with TC representation are more informative
   Example : comparison of top- 5 FCIs obtained with AIA datasets

              All representation                       TC representation

     Nausea_and_vomitting_symptoms,         54_Erythema (88 %)
     Vomiting (82%)
                                            64_Nausea_and_vomitting_symptoms
     Nausea (80%)                           (88%)
     Pruritus (79%)                         99_Neuromyopathy (82 %)

     Nausea_and_vomitting_symptomes,        54_Erythema, 64_Nausea_and_
     Vomiting, Nausea (78%)                 vomitting_symptoms (79%)
     Headache (76%)                         65_Blepharitis (78%)

                                                           Bresso et al., KDIR 2011

Lyon, 14 juin 2012                                                             38/42
Conclusion
   Bio-ontologies and data mining
        Semantic distances can lead to better clustering than euclidean
         distances
              Exploit ‘omics’ datasets
              Overlap analysis for annotation curation


        Semantic abstraction make symbolic methods practicable
              -> extract explicit knowledge from large real-world datasets


   Systems Biology
        Data-driven modelling of complex systems implies KDD
         approaches
        Semantic similarity may help in reducing data complexity

Lyon, 14 juin 2012                                                            39/42
Acknowldegements
                                                    Harmonic Pharma
LORIA, Equipe Orpailleur                            (Start-up), Nancy
Nancy
MD Devignes                                         Michel Souchet
Malika Smaïl-Tabbone                                Emmanuel Bresso
Sidahmed Benabderrahmane
Jean-François Kneib

http://plateforme-mbi.loria.fr/intelliGO

                                           Financements
                                           INCa (bourse de thèse interdisciplinaire)
                                           Communauté Urbaine du Grand Nancy
                                           Contrat Plan Etat Région : MISN




 Lyon, 14 juin 2012                                                             40/42

Weitere ähnliche Inhalte

Was ist angesagt?

CCIA'2008: Can Evolution Strategies Improve Learning Guidance in XCS? Design ...
CCIA'2008: Can Evolution Strategies Improve Learning Guidance in XCS? Design ...CCIA'2008: Can Evolution Strategies Improve Learning Guidance in XCS? Design ...
CCIA'2008: Can Evolution Strategies Improve Learning Guidance in XCS? Design ...Albert Orriols-Puig
 
Lecture16 - Advances topics on association rules PART III
Lecture16 - Advances topics on association rules PART IIILecture16 - Advances topics on association rules PART III
Lecture16 - Advances topics on association rules PART IIIAlbert Orriols-Puig
 
Towards a 2-dimensional Self-organized Framework for Structured Population-ba...
Towards a 2-dimensional Self-organized Framework for Structured Population-ba...Towards a 2-dimensional Self-organized Framework for Structured Population-ba...
Towards a 2-dimensional Self-organized Framework for Structured Population-ba...Carlos M. Fernandes
 
Identification of Entities in Swedish
Identification of Entities in SwedishIdentification of Entities in Swedish
Identification of Entities in SwedishFindwise
 
Lecture15 - Advances topics on association rules PART II
Lecture15 - Advances topics on association rules PART IILecture15 - Advances topics on association rules PART II
Lecture15 - Advances topics on association rules PART IIAlbert Orriols-Puig
 
GECCO'2007: Modeling XCS in Class Imbalances: Population Size and Parameter S...
GECCO'2007: Modeling XCS in Class Imbalances: Population Size and Parameter S...GECCO'2007: Modeling XCS in Class Imbalances: Population Size and Parameter S...
GECCO'2007: Modeling XCS in Class Imbalances: Population Size and Parameter S...Albert Orriols-Puig
 

Was ist angesagt? (11)

CCIA'2008: Can Evolution Strategies Improve Learning Guidance in XCS? Design ...
CCIA'2008: Can Evolution Strategies Improve Learning Guidance in XCS? Design ...CCIA'2008: Can Evolution Strategies Improve Learning Guidance in XCS? Design ...
CCIA'2008: Can Evolution Strategies Improve Learning Guidance in XCS? Design ...
 
Lecture2 - Machine Learning
Lecture2 - Machine LearningLecture2 - Machine Learning
Lecture2 - Machine Learning
 
Lecture16 - Advances topics on association rules PART III
Lecture16 - Advances topics on association rules PART IIILecture16 - Advances topics on association rules PART III
Lecture16 - Advances topics on association rules PART III
 
Lecture4 - Machine Learning
Lecture4 - Machine LearningLecture4 - Machine Learning
Lecture4 - Machine Learning
 
Towards a 2-dimensional Self-organized Framework for Structured Population-ba...
Towards a 2-dimensional Self-organized Framework for Structured Population-ba...Towards a 2-dimensional Self-organized Framework for Structured Population-ba...
Towards a 2-dimensional Self-organized Framework for Structured Population-ba...
 
Pherogenic Drawings
Pherogenic DrawingsPherogenic Drawings
Pherogenic Drawings
 
Identification of Entities in Swedish
Identification of Entities in SwedishIdentification of Entities in Swedish
Identification of Entities in Swedish
 
Lecture19
Lecture19Lecture19
Lecture19
 
Lecture15 - Advances topics on association rules PART II
Lecture15 - Advances topics on association rules PART IILecture15 - Advances topics on association rules PART II
Lecture15 - Advances topics on association rules PART II
 
GECCO'2007: Modeling XCS in Class Imbalances: Population Size and Parameter S...
GECCO'2007: Modeling XCS in Class Imbalances: Population Size and Parameter S...GECCO'2007: Modeling XCS in Class Imbalances: Population Size and Parameter S...
GECCO'2007: Modeling XCS in Class Imbalances: Population Size and Parameter S...
 
Lecture1 - Machine Learning
Lecture1 - Machine LearningLecture1 - Machine Learning
Lecture1 - Machine Learning
 

Andere mochten auch

Information Retrieval using Semantic Similarity
Information Retrieval using Semantic SimilarityInformation Retrieval using Semantic Similarity
Information Retrieval using Semantic SimilaritySaswat Padhi
 
JIST2015-Computing the Semantic Similarity of Resources in DBpedia for Recomm...
JIST2015-Computing the Semantic Similarity of Resources in DBpedia for Recomm...JIST2015-Computing the Semantic Similarity of Resources in DBpedia for Recomm...
JIST2015-Computing the Semantic Similarity of Resources in DBpedia for Recomm...GUANGYUAN PIAO
 
B.sc biochem i bobi u 4 gene prediction
B.sc biochem i bobi u 4 gene predictionB.sc biochem i bobi u 4 gene prediction
B.sc biochem i bobi u 4 gene predictionRai University
 
MATLAB IMPLEMENTATION OF SELF-ORGANIZING MAPS FOR CLUSTERING OF REMOTE SENSIN...
MATLAB IMPLEMENTATION OF SELF-ORGANIZING MAPS FOR CLUSTERING OF REMOTE SENSIN...MATLAB IMPLEMENTATION OF SELF-ORGANIZING MAPS FOR CLUSTERING OF REMOTE SENSIN...
MATLAB IMPLEMENTATION OF SELF-ORGANIZING MAPS FOR CLUSTERING OF REMOTE SENSIN...Daksh Raj Chopra
 
Temporal Pattern Mining
Temporal Pattern MiningTemporal Pattern Mining
Temporal Pattern MiningPrakhar Dhama
 
Frequent Pattern Mining - Krishna Sridhar, Feb 2016
Frequent Pattern Mining - Krishna Sridhar, Feb 2016Frequent Pattern Mining - Krishna Sridhar, Feb 2016
Frequent Pattern Mining - Krishna Sridhar, Feb 2016Seattle DAML meetup
 
OUTDATED Text Mining 4/5: Text Classification
OUTDATED Text Mining 4/5: Text ClassificationOUTDATED Text Mining 4/5: Text Classification
OUTDATED Text Mining 4/5: Text ClassificationFlorian Leitner
 
gene regulation sdk 2013
gene regulation sdk 2013gene regulation sdk 2013
gene regulation sdk 2013Dr-HAMDAN
 
CSMR: A Scalable Algorithm for Text Clustering with Cosine Similarity and Map...
CSMR: A Scalable Algorithm for Text Clustering with Cosine Similarity and Map...CSMR: A Scalable Algorithm for Text Clustering with Cosine Similarity and Map...
CSMR: A Scalable Algorithm for Text Clustering with Cosine Similarity and Map...Victor Giannakouris
 
Bioinformatics Project Training for 2,4,6 month
Bioinformatics Project Training for 2,4,6 monthBioinformatics Project Training for 2,4,6 month
Bioinformatics Project Training for 2,4,6 monthbiinoida
 
Secondary Structure Prediction of proteins
Secondary Structure Prediction of proteins Secondary Structure Prediction of proteins
Secondary Structure Prediction of proteins Vijay Hemmadi
 
Text and text stream mining tutorial
Text and text stream mining tutorialText and text stream mining tutorial
Text and text stream mining tutorialmgrcar
 
Sequence Alignment In Bioinformatics
Sequence Alignment In BioinformaticsSequence Alignment In Bioinformatics
Sequence Alignment In BioinformaticsNikesh Narayanan
 
Spatial vs non spatial
Spatial vs non spatialSpatial vs non spatial
Spatial vs non spatialSumant Diwakar
 

Andere mochten auch (20)

Information Retrieval using Semantic Similarity
Information Retrieval using Semantic SimilarityInformation Retrieval using Semantic Similarity
Information Retrieval using Semantic Similarity
 
JIST2015-Computing the Semantic Similarity of Resources in DBpedia for Recomm...
JIST2015-Computing the Semantic Similarity of Resources in DBpedia for Recomm...JIST2015-Computing the Semantic Similarity of Resources in DBpedia for Recomm...
JIST2015-Computing the Semantic Similarity of Resources in DBpedia for Recomm...
 
Analyzing and integrating probabilistic and deterministic computational model...
Analyzing and integrating probabilistic and deterministic computational model...Analyzing and integrating probabilistic and deterministic computational model...
Analyzing and integrating probabilistic and deterministic computational model...
 
B.sc biochem i bobi u 4 gene prediction
B.sc biochem i bobi u 4 gene predictionB.sc biochem i bobi u 4 gene prediction
B.sc biochem i bobi u 4 gene prediction
 
MATLAB IMPLEMENTATION OF SELF-ORGANIZING MAPS FOR CLUSTERING OF REMOTE SENSIN...
MATLAB IMPLEMENTATION OF SELF-ORGANIZING MAPS FOR CLUSTERING OF REMOTE SENSIN...MATLAB IMPLEMENTATION OF SELF-ORGANIZING MAPS FOR CLUSTERING OF REMOTE SENSIN...
MATLAB IMPLEMENTATION OF SELF-ORGANIZING MAPS FOR CLUSTERING OF REMOTE SENSIN...
 
SCoT and RAPD
SCoT and RAPDSCoT and RAPD
SCoT and RAPD
 
Temporal Pattern Mining
Temporal Pattern MiningTemporal Pattern Mining
Temporal Pattern Mining
 
Bioalgo 2012-01-gene-prediction-sim
Bioalgo 2012-01-gene-prediction-simBioalgo 2012-01-gene-prediction-sim
Bioalgo 2012-01-gene-prediction-sim
 
Frequent Pattern Mining - Krishna Sridhar, Feb 2016
Frequent Pattern Mining - Krishna Sridhar, Feb 2016Frequent Pattern Mining - Krishna Sridhar, Feb 2016
Frequent Pattern Mining - Krishna Sridhar, Feb 2016
 
OUTDATED Text Mining 4/5: Text Classification
OUTDATED Text Mining 4/5: Text ClassificationOUTDATED Text Mining 4/5: Text Classification
OUTDATED Text Mining 4/5: Text Classification
 
Gene aswa (2)
Gene aswa (2)Gene aswa (2)
Gene aswa (2)
 
gene regulation sdk 2013
gene regulation sdk 2013gene regulation sdk 2013
gene regulation sdk 2013
 
CSMR: A Scalable Algorithm for Text Clustering with Cosine Similarity and Map...
CSMR: A Scalable Algorithm for Text Clustering with Cosine Similarity and Map...CSMR: A Scalable Algorithm for Text Clustering with Cosine Similarity and Map...
CSMR: A Scalable Algorithm for Text Clustering with Cosine Similarity and Map...
 
Bioinformatics Project Training for 2,4,6 month
Bioinformatics Project Training for 2,4,6 monthBioinformatics Project Training for 2,4,6 month
Bioinformatics Project Training for 2,4,6 month
 
Secondary Structure Prediction of proteins
Secondary Structure Prediction of proteins Secondary Structure Prediction of proteins
Secondary Structure Prediction of proteins
 
Spatial Data Model
Spatial Data ModelSpatial Data Model
Spatial Data Model
 
Text and text stream mining tutorial
Text and text stream mining tutorialText and text stream mining tutorial
Text and text stream mining tutorial
 
Homology modelling
Homology modellingHomology modelling
Homology modelling
 
Sequence Alignment In Bioinformatics
Sequence Alignment In BioinformaticsSequence Alignment In Bioinformatics
Sequence Alignment In Bioinformatics
 
Spatial vs non spatial
Spatial vs non spatialSpatial vs non spatial
Spatial vs non spatial
 

Ähnlich wie IntelliGO semantic similarity measure for Gene Ontology annotations

Marco Roos: Newton's ideas and methods are preserved forever: how about yours?
Marco Roos: Newton's ideas and methods are preserved forever: how about yours?Marco Roos: Newton's ideas and methods are preserved forever: how about yours?
Marco Roos: Newton's ideas and methods are preserved forever: how about yours?GigaScience, BGI Hong Kong
 
Health Insurance Predictive Analysis with Hadoop and Machine Learning. JULIEN...
Health Insurance Predictive Analysis with Hadoop and Machine Learning. JULIEN...Health Insurance Predictive Analysis with Hadoop and Machine Learning. JULIEN...
Health Insurance Predictive Analysis with Hadoop and Machine Learning. JULIEN...Big Data Spain
 
Educating a New Breed of Data Scientists for Scientific Data Management
Educating a New Breed of Data Scientists for Scientific Data Management Educating a New Breed of Data Scientists for Scientific Data Management
Educating a New Breed of Data Scientists for Scientific Data Management Jian Qin
 
Analyse prédictive en assurance santé par Julien Cabot
Analyse prédictive en assurance santé par Julien CabotAnalyse prédictive en assurance santé par Julien Cabot
Analyse prédictive en assurance santé par Julien CabotModern Data Stack France
 
Combining Data Mining and Ontology Engineering to enrich Ontologies and Linke...
Combining Data Mining and Ontology Engineering to enrich Ontologies and Linke...Combining Data Mining and Ontology Engineering to enrich Ontologies and Linke...
Combining Data Mining and Ontology Engineering to enrich Ontologies and Linke...Mathieu d'Aquin
 
Linked Open Data to support content based Recommender Systems
Linked Open Data to support content based Recommender SystemsLinked Open Data to support content based Recommender Systems
Linked Open Data to support content based Recommender SystemsVito Ostuni
 
Linked Open Data to Support Content-based Recommender Systems - I-SEMANTIC…
Linked Open Data to Support Content-based Recommender Systems - I-SEMANTIC…Linked Open Data to Support Content-based Recommender Systems - I-SEMANTIC…
Linked Open Data to Support Content-based Recommender Systems - I-SEMANTIC…Roku
 
小數據如何實現電腦視覺,微軟AI研究首席剖析關鍵
小數據如何實現電腦視覺,微軟AI研究首席剖析關鍵小數據如何實現電腦視覺,微軟AI研究首席剖析關鍵
小數據如何實現電腦視覺,微軟AI研究首席剖析關鍵CHENHuiMei
 
Jena based implementation of a iso 11179 meta data registry
Jena based implementation of a iso 11179 meta data registryJena based implementation of a iso 11179 meta data registry
Jena based implementation of a iso 11179 meta data registryA. Anil Sinaci
 
Scientific data management from the lab to the web
Scientific data management   from the lab to the webScientific data management   from the lab to the web
Scientific data management from the lab to the webJose Manuel Gómez-Pérez
 
Multidimensionalscaling 090920230900-phpapp02
Multidimensionalscaling 090920230900-phpapp02Multidimensionalscaling 090920230900-phpapp02
Multidimensionalscaling 090920230900-phpapp02Sherrin Jv
 
Going local with a world-class data infrastructure: Enabling SDMX for researc...
Going local with a world-class data infrastructure: Enabling SDMX for researc...Going local with a world-class data infrastructure: Enabling SDMX for researc...
Going local with a world-class data infrastructure: Enabling SDMX for researc...Rob Grim
 
Presentatie Big Data Forum 22 januari 2013 - Big Data en Big Society
Presentatie Big Data Forum 22 januari 2013 - Big Data en Big SocietyPresentatie Big Data Forum 22 januari 2013 - Big Data en Big Society
Presentatie Big Data Forum 22 januari 2013 - Big Data en Big SocietySURFnet
 
SP1: Exploratory Network Analysis with Gephi
SP1: Exploratory Network Analysis with GephiSP1: Exploratory Network Analysis with Gephi
SP1: Exploratory Network Analysis with GephiJohn Breslin
 
FDS Module I 20.1.2022.ppt
FDS Module I 20.1.2022.pptFDS Module I 20.1.2022.ppt
FDS Module I 20.1.2022.pptPerumalPitchandi
 
Desafios e Oportunidades derivados da Explosao de Dados (Big Data)
Desafios e Oportunidades derivados da Explosao de Dados (Big Data)Desafios e Oportunidades derivados da Explosao de Dados (Big Data)
Desafios e Oportunidades derivados da Explosao de Dados (Big Data)Francisco Pires
 
Presentation on Machine Learning and Data Mining
Presentation on Machine Learning and Data MiningPresentation on Machine Learning and Data Mining
Presentation on Machine Learning and Data Miningbutest
 
Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Big Data [sorry] & Data Science: What Does a Data Scientist Do?Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Big Data [sorry] & Data Science: What Does a Data Scientist Do?Data Science London
 

Ähnlich wie IntelliGO semantic similarity measure for Gene Ontology annotations (20)

Marco Roos: Newton's ideas and methods are preserved forever: how about yours?
Marco Roos: Newton's ideas and methods are preserved forever: how about yours?Marco Roos: Newton's ideas and methods are preserved forever: how about yours?
Marco Roos: Newton's ideas and methods are preserved forever: how about yours?
 
Health Insurance Predictive Analysis with Hadoop and Machine Learning. JULIEN...
Health Insurance Predictive Analysis with Hadoop and Machine Learning. JULIEN...Health Insurance Predictive Analysis with Hadoop and Machine Learning. JULIEN...
Health Insurance Predictive Analysis with Hadoop and Machine Learning. JULIEN...
 
Educating a New Breed of Data Scientists for Scientific Data Management
Educating a New Breed of Data Scientists for Scientific Data Management Educating a New Breed of Data Scientists for Scientific Data Management
Educating a New Breed of Data Scientists for Scientific Data Management
 
Analyse prédictive en assurance santé par Julien Cabot
Analyse prédictive en assurance santé par Julien CabotAnalyse prédictive en assurance santé par Julien Cabot
Analyse prédictive en assurance santé par Julien Cabot
 
Combining Data Mining and Ontology Engineering to enrich Ontologies and Linke...
Combining Data Mining and Ontology Engineering to enrich Ontologies and Linke...Combining Data Mining and Ontology Engineering to enrich Ontologies and Linke...
Combining Data Mining and Ontology Engineering to enrich Ontologies and Linke...
 
NISO Forum, Denver, Sept. 24, 2012: Scientific discovery and innovation in an...
NISO Forum, Denver, Sept. 24, 2012: Scientific discovery and innovation in an...NISO Forum, Denver, Sept. 24, 2012: Scientific discovery and innovation in an...
NISO Forum, Denver, Sept. 24, 2012: Scientific discovery and innovation in an...
 
Linked Open Data to support content based Recommender Systems
Linked Open Data to support content based Recommender SystemsLinked Open Data to support content based Recommender Systems
Linked Open Data to support content based Recommender Systems
 
Linked Open Data to Support Content-based Recommender Systems - I-SEMANTIC…
Linked Open Data to Support Content-based Recommender Systems - I-SEMANTIC…Linked Open Data to Support Content-based Recommender Systems - I-SEMANTIC…
Linked Open Data to Support Content-based Recommender Systems - I-SEMANTIC…
 
小數據如何實現電腦視覺,微軟AI研究首席剖析關鍵
小數據如何實現電腦視覺,微軟AI研究首席剖析關鍵小數據如何實現電腦視覺,微軟AI研究首席剖析關鍵
小數據如何實現電腦視覺,微軟AI研究首席剖析關鍵
 
Jena based implementation of a iso 11179 meta data registry
Jena based implementation of a iso 11179 meta data registryJena based implementation of a iso 11179 meta data registry
Jena based implementation of a iso 11179 meta data registry
 
Scientific data management from the lab to the web
Scientific data management   from the lab to the webScientific data management   from the lab to the web
Scientific data management from the lab to the web
 
Multidimensionalscaling 090920230900-phpapp02
Multidimensionalscaling 090920230900-phpapp02Multidimensionalscaling 090920230900-phpapp02
Multidimensionalscaling 090920230900-phpapp02
 
Going local with a world-class data infrastructure: Enabling SDMX for researc...
Going local with a world-class data infrastructure: Enabling SDMX for researc...Going local with a world-class data infrastructure: Enabling SDMX for researc...
Going local with a world-class data infrastructure: Enabling SDMX for researc...
 
Linked Sensor Data 101 (FIS2011)
Linked Sensor Data 101 (FIS2011)Linked Sensor Data 101 (FIS2011)
Linked Sensor Data 101 (FIS2011)
 
Presentatie Big Data Forum 22 januari 2013 - Big Data en Big Society
Presentatie Big Data Forum 22 januari 2013 - Big Data en Big SocietyPresentatie Big Data Forum 22 januari 2013 - Big Data en Big Society
Presentatie Big Data Forum 22 januari 2013 - Big Data en Big Society
 
SP1: Exploratory Network Analysis with Gephi
SP1: Exploratory Network Analysis with GephiSP1: Exploratory Network Analysis with Gephi
SP1: Exploratory Network Analysis with Gephi
 
FDS Module I 20.1.2022.ppt
FDS Module I 20.1.2022.pptFDS Module I 20.1.2022.ppt
FDS Module I 20.1.2022.ppt
 
Desafios e Oportunidades derivados da Explosao de Dados (Big Data)
Desafios e Oportunidades derivados da Explosao de Dados (Big Data)Desafios e Oportunidades derivados da Explosao de Dados (Big Data)
Desafios e Oportunidades derivados da Explosao de Dados (Big Data)
 
Presentation on Machine Learning and Data Mining
Presentation on Machine Learning and Data MiningPresentation on Machine Learning and Data Mining
Presentation on Machine Learning and Data Mining
 
Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Big Data [sorry] & Data Science: What Does a Data Scientist Do?Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Big Data [sorry] & Data Science: What Does a Data Scientist Do?
 

Mehr von European Institute for Systems Biology & Medicine.

Mehr von European Institute for Systems Biology & Medicine. (9)

Prediction the outcome of Lung Transplantation within the COLT cohort
Prediction the outcome of Lung Transplantation within the COLT cohortPrediction the outcome of Lung Transplantation within the COLT cohort
Prediction the outcome of Lung Transplantation within the COLT cohort
 
SBGN comprehensive disease maps at LCSB.
SBGN comprehensive disease maps at LCSB.SBGN comprehensive disease maps at LCSB.
SBGN comprehensive disease maps at LCSB.
 
AirProm Harmonisation and Statistical Analysis
AirProm Harmonisation and Statistical AnalysisAirProm Harmonisation and Statistical Analysis
AirProm Harmonisation and Statistical Analysis
 
A systems biology approach for understanding skeletal muscle abnormalities in...
A systems biology approach for understanding skeletal muscle abnormalities in...A systems biology approach for understanding skeletal muscle abnormalities in...
A systems biology approach for understanding skeletal muscle abnormalities in...
 
Understanding and predicting biological complex system.
Understanding and predicting biological complex system.Understanding and predicting biological complex system.
Understanding and predicting biological complex system.
 
Nova Discovery - Advancing 4P medicine with the EISBM
Nova Discovery - Advancing 4P medicine with the EISBMNova Discovery - Advancing 4P medicine with the EISBM
Nova Discovery - Advancing 4P medicine with the EISBM
 
ALTRABio presents WikiBioPath: new perspectives in biological data analysis
ALTRABio presents WikiBioPath: new perspectives in biological data analysisALTRABio presents WikiBioPath: new perspectives in biological data analysis
ALTRABio presents WikiBioPath: new perspectives in biological data analysis
 
Translational Informatics in the Pre-Competitive Era
Translational Informatics in the Pre-Competitive EraTranslational Informatics in the Pre-Competitive Era
Translational Informatics in the Pre-Competitive Era
 
Data Analysis and Knowledge Management using BioXM in MeDALL, AirPROM and Syn...
Data Analysis and Knowledge Management using BioXM in MeDALL, AirPROM and Syn...Data Analysis and Knowledge Management using BioXM in MeDALL, AirPROM and Syn...
Data Analysis and Knowledge Management using BioXM in MeDALL, AirPROM and Syn...
 

Kürzlich hochgeladen

High Profile Call Girls Kodigehalli - 7001305949 Escorts Service with Real Ph...
High Profile Call Girls Kodigehalli - 7001305949 Escorts Service with Real Ph...High Profile Call Girls Kodigehalli - 7001305949 Escorts Service with Real Ph...
High Profile Call Girls Kodigehalli - 7001305949 Escorts Service with Real Ph...narwatsonia7
 
Russian Call Girls Chickpet - 7001305949 Booking and charges genuine rate for...
Russian Call Girls Chickpet - 7001305949 Booking and charges genuine rate for...Russian Call Girls Chickpet - 7001305949 Booking and charges genuine rate for...
Russian Call Girls Chickpet - 7001305949 Booking and charges genuine rate for...narwatsonia7
 
9873777170 Full Enjoy @24/7 Call Girls In North Avenue Delhi Ncr
9873777170 Full Enjoy @24/7 Call Girls In North Avenue Delhi Ncr9873777170 Full Enjoy @24/7 Call Girls In North Avenue Delhi Ncr
9873777170 Full Enjoy @24/7 Call Girls In North Avenue Delhi NcrDelhi Call Girls
 
Low Rate Call Girls Mumbai Suman 9910780858 Independent Escort Service Mumbai
Low Rate Call Girls Mumbai Suman 9910780858 Independent Escort Service MumbaiLow Rate Call Girls Mumbai Suman 9910780858 Independent Escort Service Mumbai
Low Rate Call Girls Mumbai Suman 9910780858 Independent Escort Service Mumbaisonalikaur4
 
Call Girls Service Nandiambakkam | 7001305949 At Low Cost Cash Payment Booking
Call Girls Service Nandiambakkam | 7001305949 At Low Cost Cash Payment BookingCall Girls Service Nandiambakkam | 7001305949 At Low Cost Cash Payment Booking
Call Girls Service Nandiambakkam | 7001305949 At Low Cost Cash Payment BookingNehru place Escorts
 
Housewife Call Girls Bangalore - Call 7001305949 Rs-3500 with A/C Room Cash o...
Housewife Call Girls Bangalore - Call 7001305949 Rs-3500 with A/C Room Cash o...Housewife Call Girls Bangalore - Call 7001305949 Rs-3500 with A/C Room Cash o...
Housewife Call Girls Bangalore - Call 7001305949 Rs-3500 with A/C Room Cash o...narwatsonia7
 
Case Report Peripartum Cardiomyopathy.pptx
Case Report Peripartum Cardiomyopathy.pptxCase Report Peripartum Cardiomyopathy.pptx
Case Report Peripartum Cardiomyopathy.pptxNiranjan Chavan
 
Kolkata Call Girls Services 9907093804 @24x7 High Class Babes Here Call Now
Kolkata Call Girls Services 9907093804 @24x7 High Class Babes Here Call NowKolkata Call Girls Services 9907093804 @24x7 High Class Babes Here Call Now
Kolkata Call Girls Services 9907093804 @24x7 High Class Babes Here Call NowNehru place Escorts
 
Hematology and Immunology - Leukocytes Functions
Hematology and Immunology - Leukocytes FunctionsHematology and Immunology - Leukocytes Functions
Hematology and Immunology - Leukocytes FunctionsMedicoseAcademics
 
Statistical modeling in pharmaceutical research and development.
Statistical modeling in pharmaceutical research and development.Statistical modeling in pharmaceutical research and development.
Statistical modeling in pharmaceutical research and development.ANJALI
 
Call Girls Jayanagar Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Jayanagar Just Call 7001305949 Top Class Call Girl Service AvailableCall Girls Jayanagar Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Jayanagar Just Call 7001305949 Top Class Call Girl Service Availablenarwatsonia7
 
High Profile Call Girls Jaipur Vani 8445551418 Independent Escort Service Jaipur
High Profile Call Girls Jaipur Vani 8445551418 Independent Escort Service JaipurHigh Profile Call Girls Jaipur Vani 8445551418 Independent Escort Service Jaipur
High Profile Call Girls Jaipur Vani 8445551418 Independent Escort Service Jaipurparulsinha
 
Dwarka Sector 6 Call Girls ( 9873940964 ) Book Hot And Sexy Girls In A Few Cl...
Dwarka Sector 6 Call Girls ( 9873940964 ) Book Hot And Sexy Girls In A Few Cl...Dwarka Sector 6 Call Girls ( 9873940964 ) Book Hot And Sexy Girls In A Few Cl...
Dwarka Sector 6 Call Girls ( 9873940964 ) Book Hot And Sexy Girls In A Few Cl...rajnisinghkjn
 
Glomerular Filtration rate and its determinants.pptx
Glomerular Filtration rate and its determinants.pptxGlomerular Filtration rate and its determinants.pptx
Glomerular Filtration rate and its determinants.pptxDr.Nusrat Tariq
 
Call Girls Frazer Town Just Call 7001305949 Top Class Call Girl Service Avail...
Call Girls Frazer Town Just Call 7001305949 Top Class Call Girl Service Avail...Call Girls Frazer Town Just Call 7001305949 Top Class Call Girl Service Avail...
Call Girls Frazer Town Just Call 7001305949 Top Class Call Girl Service Avail...narwatsonia7
 
Call Girls Hebbal Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Hebbal Just Call 7001305949 Top Class Call Girl Service AvailableCall Girls Hebbal Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Hebbal Just Call 7001305949 Top Class Call Girl Service Availablenarwatsonia7
 
Housewife Call Girls Hsr Layout - Call 7001305949 Rs-3500 with A/C Room Cash ...
Housewife Call Girls Hsr Layout - Call 7001305949 Rs-3500 with A/C Room Cash ...Housewife Call Girls Hsr Layout - Call 7001305949 Rs-3500 with A/C Room Cash ...
Housewife Call Girls Hsr Layout - Call 7001305949 Rs-3500 with A/C Room Cash ...narwatsonia7
 
Call Girls Kanakapura Road Just Call 7001305949 Top Class Call Girl Service A...
Call Girls Kanakapura Road Just Call 7001305949 Top Class Call Girl Service A...Call Girls Kanakapura Road Just Call 7001305949 Top Class Call Girl Service A...
Call Girls Kanakapura Road Just Call 7001305949 Top Class Call Girl Service A...narwatsonia7
 
Call Girls Viman Nagar 7001305949 All Area Service COD available Any Time
Call Girls Viman Nagar 7001305949 All Area Service COD available Any TimeCall Girls Viman Nagar 7001305949 All Area Service COD available Any Time
Call Girls Viman Nagar 7001305949 All Area Service COD available Any Timevijaych2041
 
Pharmaceutical Marketting: Unit-5, Pricing
Pharmaceutical Marketting: Unit-5, PricingPharmaceutical Marketting: Unit-5, Pricing
Pharmaceutical Marketting: Unit-5, PricingArunagarwal328757
 

Kürzlich hochgeladen (20)

High Profile Call Girls Kodigehalli - 7001305949 Escorts Service with Real Ph...
High Profile Call Girls Kodigehalli - 7001305949 Escorts Service with Real Ph...High Profile Call Girls Kodigehalli - 7001305949 Escorts Service with Real Ph...
High Profile Call Girls Kodigehalli - 7001305949 Escorts Service with Real Ph...
 
Russian Call Girls Chickpet - 7001305949 Booking and charges genuine rate for...
Russian Call Girls Chickpet - 7001305949 Booking and charges genuine rate for...Russian Call Girls Chickpet - 7001305949 Booking and charges genuine rate for...
Russian Call Girls Chickpet - 7001305949 Booking and charges genuine rate for...
 
9873777170 Full Enjoy @24/7 Call Girls In North Avenue Delhi Ncr
9873777170 Full Enjoy @24/7 Call Girls In North Avenue Delhi Ncr9873777170 Full Enjoy @24/7 Call Girls In North Avenue Delhi Ncr
9873777170 Full Enjoy @24/7 Call Girls In North Avenue Delhi Ncr
 
Low Rate Call Girls Mumbai Suman 9910780858 Independent Escort Service Mumbai
Low Rate Call Girls Mumbai Suman 9910780858 Independent Escort Service MumbaiLow Rate Call Girls Mumbai Suman 9910780858 Independent Escort Service Mumbai
Low Rate Call Girls Mumbai Suman 9910780858 Independent Escort Service Mumbai
 
Call Girls Service Nandiambakkam | 7001305949 At Low Cost Cash Payment Booking
Call Girls Service Nandiambakkam | 7001305949 At Low Cost Cash Payment BookingCall Girls Service Nandiambakkam | 7001305949 At Low Cost Cash Payment Booking
Call Girls Service Nandiambakkam | 7001305949 At Low Cost Cash Payment Booking
 
Housewife Call Girls Bangalore - Call 7001305949 Rs-3500 with A/C Room Cash o...
Housewife Call Girls Bangalore - Call 7001305949 Rs-3500 with A/C Room Cash o...Housewife Call Girls Bangalore - Call 7001305949 Rs-3500 with A/C Room Cash o...
Housewife Call Girls Bangalore - Call 7001305949 Rs-3500 with A/C Room Cash o...
 
Case Report Peripartum Cardiomyopathy.pptx
Case Report Peripartum Cardiomyopathy.pptxCase Report Peripartum Cardiomyopathy.pptx
Case Report Peripartum Cardiomyopathy.pptx
 
Kolkata Call Girls Services 9907093804 @24x7 High Class Babes Here Call Now
Kolkata Call Girls Services 9907093804 @24x7 High Class Babes Here Call NowKolkata Call Girls Services 9907093804 @24x7 High Class Babes Here Call Now
Kolkata Call Girls Services 9907093804 @24x7 High Class Babes Here Call Now
 
Hematology and Immunology - Leukocytes Functions
Hematology and Immunology - Leukocytes FunctionsHematology and Immunology - Leukocytes Functions
Hematology and Immunology - Leukocytes Functions
 
Statistical modeling in pharmaceutical research and development.
Statistical modeling in pharmaceutical research and development.Statistical modeling in pharmaceutical research and development.
Statistical modeling in pharmaceutical research and development.
 
Call Girls Jayanagar Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Jayanagar Just Call 7001305949 Top Class Call Girl Service AvailableCall Girls Jayanagar Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Jayanagar Just Call 7001305949 Top Class Call Girl Service Available
 
High Profile Call Girls Jaipur Vani 8445551418 Independent Escort Service Jaipur
High Profile Call Girls Jaipur Vani 8445551418 Independent Escort Service JaipurHigh Profile Call Girls Jaipur Vani 8445551418 Independent Escort Service Jaipur
High Profile Call Girls Jaipur Vani 8445551418 Independent Escort Service Jaipur
 
Dwarka Sector 6 Call Girls ( 9873940964 ) Book Hot And Sexy Girls In A Few Cl...
Dwarka Sector 6 Call Girls ( 9873940964 ) Book Hot And Sexy Girls In A Few Cl...Dwarka Sector 6 Call Girls ( 9873940964 ) Book Hot And Sexy Girls In A Few Cl...
Dwarka Sector 6 Call Girls ( 9873940964 ) Book Hot And Sexy Girls In A Few Cl...
 
Glomerular Filtration rate and its determinants.pptx
Glomerular Filtration rate and its determinants.pptxGlomerular Filtration rate and its determinants.pptx
Glomerular Filtration rate and its determinants.pptx
 
Call Girls Frazer Town Just Call 7001305949 Top Class Call Girl Service Avail...
Call Girls Frazer Town Just Call 7001305949 Top Class Call Girl Service Avail...Call Girls Frazer Town Just Call 7001305949 Top Class Call Girl Service Avail...
Call Girls Frazer Town Just Call 7001305949 Top Class Call Girl Service Avail...
 
Call Girls Hebbal Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Hebbal Just Call 7001305949 Top Class Call Girl Service AvailableCall Girls Hebbal Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Hebbal Just Call 7001305949 Top Class Call Girl Service Available
 
Housewife Call Girls Hsr Layout - Call 7001305949 Rs-3500 with A/C Room Cash ...
Housewife Call Girls Hsr Layout - Call 7001305949 Rs-3500 with A/C Room Cash ...Housewife Call Girls Hsr Layout - Call 7001305949 Rs-3500 with A/C Room Cash ...
Housewife Call Girls Hsr Layout - Call 7001305949 Rs-3500 with A/C Room Cash ...
 
Call Girls Kanakapura Road Just Call 7001305949 Top Class Call Girl Service A...
Call Girls Kanakapura Road Just Call 7001305949 Top Class Call Girl Service A...Call Girls Kanakapura Road Just Call 7001305949 Top Class Call Girl Service A...
Call Girls Kanakapura Road Just Call 7001305949 Top Class Call Girl Service A...
 
Call Girls Viman Nagar 7001305949 All Area Service COD available Any Time
Call Girls Viman Nagar 7001305949 All Area Service COD available Any TimeCall Girls Viman Nagar 7001305949 All Area Service COD available Any Time
Call Girls Viman Nagar 7001305949 All Area Service COD available Any Time
 
Pharmaceutical Marketting: Unit-5, Pricing
Pharmaceutical Marketting: Unit-5, PricingPharmaceutical Marketting: Unit-5, Pricing
Pharmaceutical Marketting: Unit-5, Pricing
 

IntelliGO semantic similarity measure for Gene Ontology annotations

  • 1. Supported by Prominent international speakers from h"p://workshop.eisbm.eu1
  • 2. IntelliGO semantic similarity measure for Gene Ontology annotations Marie-Dominique Devignes Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA) Equipe Orpailleur – INRIA Nancy Grand-Est
  • 3. Outline of the talk 1. Introduction: Knowledge Discovery guided by Domain Knowledge 2. IntelliGO : a semantic similarity measure for GO annotations 3. IntelliGO-based clustering of genes 4. IntelliGO-based abstraction for data mining 5. Conclusion Lyon, 14 juin 2012 2/42
  • 4. LORIA, Team Orpailleur  Making data talk : from data to knowledge KDD* K Information K Information Data Data Static picture : pyramid Dynamic picture : loop * KDD : Knowledge Discovery from Databases Lyon, 14 juin 2012 3
  • 5. The data deluge… Données NGS Croissance de EMBL ! xity 90 000 000 A AT ple 80 000 000 ig D om 70 000 000 60 000 000 50 000 000 B ! EST C 40 000 000 non-EST 30 000 000 WGS 20 000 000 10 000 000 0 1992 1994 1997 2000 2004 2009 t r ma i t y ! Fo ene  No rog knowledge hete without knowledge … (Amedeo Napoli) Lyon, 14 juin 2012 4
  • 6. Knowledge Discovery from Databases (KDD) A three-step iterative process… 3. Interpretation 2. Data mining Knowledge 1. Preparation Formatting Rules, patterns Selection Integration Formatted data Expert Dataset Integrated data Database …interactively controlled by an expert. Lyon, 14 juin 2012 5/42
  • 7. Knowledge Discovery guided by Domain Knowledge : KDDK Ontologies to assist the expert… 3. Interpretation 2. Data mining Knowledge 1. Preparation Formatting Rules, patterns Selection Integration Formatted data Expert Dataset Integrated data Database … at each step of the process. Lyon, 14 juin 2012 6/42
  • 8. Our scope today :  How ontologies can be used to improve the data mining step :  Design of a semantic similarity measure on Gene Ontology  Use it for functional clustering of genes  Improvement of gene classification  Apply it to another ontology (MedDRA) for secondary effects clustering  Data abstraction and dimension reduction  Essential for execution of data mining programms Lyon, 14 juin 2012 7/42
  • 9. Outline of the talk 1. Introduction: Knowledge Discovery guided by Domain Knowledge 2. IntelliGO : a semantic similarity measure for GO annotations 3. IntelliGO-based clustering of genes 4. IntelliGO-based abstraction for data mining 5. Conclusion Lyon, 14 juin 2012 8/42
  • 10. Semantic similarity in biomedical ontologies  Biological objects are « annotated » by terms / keywords   facilitates data retrieval from database  Annotation terms / Keywords are organized as controlled vocabularies  Tree : e.g. Swissprot Keywords  Thesaurus : e.g. MESH, UMLS  Ontologies : e.g. Gene Ontology  Similarity measure comparing annotations can take advantage of semantic relationships between terms /keywords   « semantic similarity measure » Lyon, 14 juin 2012 9/42
  • 11. Semantic similarity of annotations  General intuition…. Semantic relationships between terms GO-t1 GO-t2 GO-t3 … Gene1 X X O … Objects as Gene2 X O X … « feature vectors » … « Euclidean » Score 1 0 0  1 « Semantic » Score 1 0.5 0.5  2 Lyon, 14 juin 2012 10/42
  • 12. Brief outlook on GO (1)  Annotation vocabulary (GO consortium since 2000)  More 30,000 terms  Millions of annotations (genes and proteins in various species) recorded in public databases  Three aspects :  Biological Process  Cellular Component  Molecular Function  Structured as a Rooted Directed Acyclic Graph  Nodes = terms  Edges = semantic relationships  Is_a, part_of, related to  IMPORTANT : More than one parent from Pesquita et al., 2009 Lyon, 14 juin 2012 11/42
  • 13. Example of similar GO annotations Retrieved from Gene at NCBI identical GO terms similar GO terms Lyon, 14 juin 2012 12/42
  • 14. GO evidence codes  Traces the origin of each annotation  IEA (« Inferred from Electronic Annotation ») = not assigned by curator  All others (16) manually assigned Computational Author Curatorial Experimental analysis statement statement EXP ISS TAS IC IDA ISO NAS ND IPI ISA IMP ISM IGI IGC IEP RCA « Evidence codes cannot be used as a measure of the quality of the annotation » http://www.geneontology.org/GO.evidence.shtml  To be used as a filter for certain type of annotations Lyon, 14 juin 2012 13/42
  • 15. Semantic similarity measures : overview (1) Pesquita et al., 2009: Semantic similarity in biomedical ontologies, PLOS Comp. Biol. July 2009 1. Gene-Gene or Protein-Protein comparison = comparing two lists of terms (feature vectors) « groupwise » « pairwise » Set Graph Vector All pairs Best pairs - Lord et al. (2003) : All pairs/ Average/ Resnick, Lin, - Huang et al. (2007) : Vector/ Kappa-statistics (DAVID) Jiang measures - Martin et al. (2004) : Graph/Jaccard - Wang et al. (2007) : best pairs/Average/ Wang measure Lyon, 14 juin 2012 14/42
  • 16. Semantic similarity measures : overview (2) 2. Term-term comparison = exploiting the semantic relationships Root  Node-based approaches  Information content of __ Depth(LCA) lowest common ancestor __ SPL(t ,t ) (LCA) 1 2  Relative to a corpus  Depth(LCA)  Independent of corpus t1 t2  Edge-based approaches  Shortest Path Length (SPL) Lyon, 14 juin 2012 15/42
  • 17. IntelliGO semantic similarity measure (1) Benabderrahmane et al., 2010 : BMC Bioinformatics 1. Term-term comparison = combining node-based and edge-based approaches in a DAG In a tree (MeSH, Ganesan 2003) Root 2 x Depth(LCA) SimGanesan(t1,t2) = __ MaxDepth(LCA) Depth(t1) + Depth(t2) __ MinSPL(t ,t ) 1 2 In a rooted DAG (GO, IntelliGO 2010) SimIntelliGO(t1,t2) 2 x MaxDepth(LCA) = t1 t2 MinSPL(t1,t2) + 2 x MaxDepth(LCA) Lyon, 14 juin 2012 16/42
  • 18. IntelliGO semantic similarity measure (2) 2. Gene (or protein) representation  Representation of genes in a vector space model g = ∑i α i e i ei : basis vector, one per term (ti) αi : Coefficient for term ti  Definition of coefficients α i = w(g, ti) x IAF(ti) w(g, ti) : weight of evidence code * qualifying the assignment of feature ti to gène g IAF(ti) : « Inverse Annotation Frequency » ~ Information Content of feature ti in annotation corpus. * When more than one code, take  Definition of information content the maximal weight NTOT NTOT : Total number of genes in the corpus IAF(ti) = log Nti Nti : Number of gènes with feature ti Lyon, 14 juin 2012 17/42
  • 19. IntelliGO semantic similarity measure (3) 3. Gene-gene comparison  Generalized dot-product between two gene vectors g . h = ∑ i,j α i x β j x e i . ej with e i . ej = SimIntelliGO (ti,tj) 1 for i = j ≠ 0 for i ≠ j  Generalized cosine similarity g . h SimIntelliGO(g , h ) = √g . g x√ h . h Lyon, 14 juin 2012 18/42
  • 20. Implementation (Tax_ID, Gene_ID, GO_ID, Evid_Code, NCBI AnnotationFile GO_Def, GO_aspect) Parameter 1 Species Parameter 2 GO aspect Parameter 3 Liste of weights for EC CuratedAnnotationFile IAF calculation MaxDepth(LCA MinSP Genes Terms LCA L SPL (Gene_ID , Array of (GO_ID pairs, (GO_ID, IAF, Array of (GO_ID pairs, [GO_ID, Evid-Code]) LCA_Depth, LCA_ID_List [Gene_IDs]) SPL) Liste of genes of Sim(ti, tj ) calculation interest (Gene_IDs) IntelliGO Vector representation term-term Gene-Gene IntelliGO Similarity Matrix Lyon, 14 juin 2012 19/42
  • 21. IntelliGO on-line  http://plateforme-mbi.loria.fr/intelligo Lyon, 14 juin 2012 20/42
  • 22. Outline of the talk 1. Introduction: Knowledge Discovery guided by Domain Knowledge 2. IntelliGO : a semantic similarity measure for GO annotations 3. IntelliGO-based clustering of genes 4. IntelliGO-based abstraction for data mining 5. Conclusion Lyon, 14 juin 2012 21/42
  • 23. Gene clustering with IntelliGO : reference datasets  Clustering means grouping together most similar objects and putting in different clusters most dissimilar objects   relies on a similarity/distance measure  Evaluation purpose : Reference sets of genes  participating in similar biological process  pathways  sharing similar molecular functions  Pfam Clans  In two different species Dataset Species Source Number of Total genes sets 1 Human KEGG pathways 13 275 2 Yeast KEGG pathways 13 169 3 Human Pfam Clans 10 94 4 Yeast Pfam Clans 10 118 Lyon, 14 juin 2012 22/42
  • 24. 1. Hierarchical clustering for comparing IntelliGO with other semantic similarity measures  For each collection of gene sets  List all genes from all sets  Pairwise similarity calculation - > distance matrix  Lord’s measure (normalized) : based on IC(LCA)  Al-Mubaid’s measure : based on SPL  SimGIC measure : based on count of common ancestors  IntelliGO measure : based on both MaxDepth(LCA) and minSPL  Hierarchical clustering and heatmap visualisation  R BioConductor Lyon, 14 juin 2012 23/42
  • 25. Heatmap visualisation of hierarchical clustering Lord et al. Al-Mubaid (normalized) SIMGIC IntelliGO : - well-balanced - suggests fuzziness Lyon, 14 juin 2012 24/42
  • 26. 2. Functional classification : the DAVID software (1)  Functional classification = Clustering of functional annotation of genes:  On-line tool : DAVID (Database for Annotation Visualisation and Integrated Discovery) software Lyon, 14 juin 2012 25/42
  • 27. DAVID functional classification  Rich feature vectors : GO terms, domains, Enzyme Classification, Pathways, etc.  Kappa statistics : similarity measure based on co-occurrence and co-absence of features  Fuzzzy clustering agorithm : one gene can belong to more than one cluster.  Optimization of K number by varying kappa threshold -> % excluded genes GO-t1 GO-t2 GO-t3 … PfamD1 … Counting present and absent Gene1 X X O … X … features + ‘Kappa statistics’ Gene2 X O X … X … Not a semantic similarity measure No feature selection possible … 26/22
  • 28. 3. IntelliGO-based fuzzy C-means  IntelliGO distance matrix -> fanny implementation of fuzzy C-means in R environment, minimization of objective function  Remark : not a feature vector matrix + euclidean distance  no centroid calculation, no appropriate validity index  Membership probability matrix: one gene can belong to more than one cluster.  -> % excluded genes C1 C2 C3 … Membership threshold : Assignt 20 % Gene1 90 % 0 8% … Gene1 C1 Gene2 25 % 40 % 0% … Gene2 C1, C2 Gene3 0% 10 % 13 % … Gene 3 Excl. 27/22
  • 29. 4. Comparison using overlap analysis R1 R2 Rn Available classification … 602495 21259 Input list 65910 of Overlap analysis Global F-score = 12151 genes Pair-wise F-scores averageRi(MaxRiF_score) 157805 5246 489 171553 ... Functional classification … at various K value C1 C2 Ck Precision (Ci, Rj) : % genes present in Ci that belong to Rj 2 × Precision( , Rj) × Recall( , Rj) Ci Ci F_score (Ci, Rj) = Recall (Ci, Rj): % genes from Rj that are found in Ci Precision( , Rj) + Recall( , Rj) Ci Ci 28/22
  • 30. Comparison : optimal F-score and K number IntelliGO +Hierarchical IntelliGO+ fanny DAVID Ref Opt. Opt. Excl. Opt. Opt. Excl. Opt. Opt. Excl. sets Fscore K genes Fscore K genes Fscore K genes 1 (13) 0.58 17 0% 0.70 7 0% 0.67 10 21% 2 (13) 0.57 17 0% 0.74 9 1% 0.68 9 18 % 3 (10) 0.82 18 0% 0.77 15 5% 0.64 11 27 % 4 (10) 0.68 13 0% 0.71 12 2% 0.70 10 41 % Functional classification is reliable and robust with IntelliGO measure Benabderrahmane et al., BIBM workshop IDASB 2011 Lyon, 14 juin 2012 29/42
  • 31. 5. CR analysis : enrichment of reference sets  CR : Similar annotation but not in reference set  Suggestion : include this gene in the reference set  Calculate a weight : Enrichment ( R ) ∩ Annotation ( g ) Suggest(g, R) = Enrichment ( R ) where Enrichment(R) contains annotation terms specific of R (P-value < Pthreshold)  Illustration  The STON1 gene was suggested to join human pathway hsa04130 (SNARE interactions in vesicular transport) on the basis of its BP annotation: ‘intracellular transport’.  To be confirmed by biologist experts of this pathway ! 30/22
  • 32. 6. RC analysis: enrichment of gene annotation  RC : Members of reference set not functionally classified with the rest of the set  Suggestion : enrich gene annotation with terms t specific of C (present in Enrichment(C), given a P-value threshold)  Calculate a weight : Suggest(t) = {g t ∈Enrichment ( C ) Annotation ( g )} R C  Illustration  Adding BP term “Methionine biosynthetic process” to ARO8 gene in yeast because this term is significantly enriched in cluster C9 matching with pathway R6 (Lysine metabolism) to which ARO8 belongs to.  Indeed, the expert confirmed that ARO8 is involved in a methionine biosynthesis pathway 31/22
  • 33. Outline of the talk 1. Introduction: Knowledge Discovery guided by Domain Knowledge 2. IntelliGO : a semantic similarity measure for GO annotations 3. IntelliGO-based clustering of genes 4. IntelliGO-based abstraction for data mining 5. Conclusion Lyon, 14 juin 2012 32/42
  • 34. Application of IntelliGO measure to another ontology: MedDRA  Study with MedDRA : Medical Directory of Regulatory Activities  Part of UMLS, about 20,000 terms.  Used for side-effect description = subset of MedDRA previously called COSTART, about 1288 terms.  MedDRA is organized as a rDAG with five depth levels: System Organ class, High Level Group Term, High Level Term, Preferred Term, Lowest Level Term.  About 1/3 of MedDRA terms have more than one parent.  SIDER : Side Effect Repository at the EMBL (http://sideeffects.embl.de)  About 800 drugs associated with 1288 side effect features (MedDRA terms)  Challenge : to apply symbolic data mining methods on this large dataset to discover patterns and regularities(Emmanuel Bresso’s PhD Thesis, Harmonic Pharma) Lyon, 14 juin 2012 33/42
  • 35. Replacing MedDRA terms with term clusters (TC)  Large matrices (1288 attributes) are untractable with symbolic methods  Search for frequent itemsets fails because not enough features are shared !  Abstraction : Reduce the number of attributes without losing information  Classical problem in data mining  Generalisation in a tree, not possible with a rDAG  Semantic clustering is a solution  IntelliGO similarity between MedDRA terms 2 DepthMax[LCA (ti, tj)] SPL (ti, tj) SimIntelliGO(ti, tj) = DIstIntelliGO(ti, tj) = SPL(ti, tj) + 2 DepthMax[LCA (ti,tj)] SPL(ti, tj) + 2 DepthMax[LCA (ti,tj)] Lyon, 14 juin 2012 34/42
  • 36. Hierarchical clustering of MedDRA terms Cluster terms AvgDist to T54 other cluster  Pairwise distances calculated for a terms subset of 1288 terms Erythema 0.31 Lichen planus 0.32  Hierarchical clustering + Kelley’s Parapsoriasis 0.32 Pityriasis alba 0.32 optimisation of cluster number Rash papular 0.32   112 term clusters Decubitus ulcer 0.35 Lupus miliaris 0.35  Calculation of the most representative disseminatus faciei element Pruritus 0.35  Validation by the expert Rash 0.35 Sunburn 0.35  Example : TermCluster T54 Erythema Vulvovaginal pruritus 0.35 Dandruff 0.37  15 terms related to skin pathologies Rash 0.37 Photosensitivity reaction 0.37 Psoriasis 0.37 Lyon, 14 juin 2012 35/42
  • 37. Mining Cardiovascular Agents (CA) and Anti- Infective Agents (AIA) CAAll 1288 attributes (terms) CATC 112 attributes (term clusters) T1 T2 T3 T4 … TC1 TC2 TC3 … Drug1 Drug 1 Drug2 Drug 2 … … 94 drugs 94 drugs  Idem for Anti-infective Agents : 76 drugs, 2 datasets AIAAll and AIATC Lyon, 14 juin 2012 36/42
  • 38. Frequent Closed Itemset (FCI) extraction (1) Minimal 50 % 60 % 70 % 80 % 90 % 100 % support CAAll 386 94 41 11 1 0 CATC 5,564 1,379 256 62 6 0 AIAAll 178 41 9 2 0 0 AIATC 654 154 30 3 0 0  Use of Zart algorithm (Coron platform for symbolic datamining)  FCI : maximal subsets of drugs sharing similar side effects (All) or similar TCs  More FCIs are found with TC representation Lyon, 14 juin 2012 37/42
  • 39. Frequent Closed Itemset (FCI) extraction (2)  FCIs obtained with TC representation are more informative  Example : comparison of top- 5 FCIs obtained with AIA datasets All representation TC representation Nausea_and_vomitting_symptoms, 54_Erythema (88 %) Vomiting (82%) 64_Nausea_and_vomitting_symptoms Nausea (80%) (88%) Pruritus (79%) 99_Neuromyopathy (82 %) Nausea_and_vomitting_symptomes, 54_Erythema, 64_Nausea_and_ Vomiting, Nausea (78%) vomitting_symptoms (79%) Headache (76%) 65_Blepharitis (78%) Bresso et al., KDIR 2011 Lyon, 14 juin 2012 38/42
  • 40. Conclusion  Bio-ontologies and data mining  Semantic distances can lead to better clustering than euclidean distances  Exploit ‘omics’ datasets  Overlap analysis for annotation curation  Semantic abstraction make symbolic methods practicable  -> extract explicit knowledge from large real-world datasets  Systems Biology  Data-driven modelling of complex systems implies KDD approaches  Semantic similarity may help in reducing data complexity Lyon, 14 juin 2012 39/42
  • 41. Acknowldegements Harmonic Pharma LORIA, Equipe Orpailleur (Start-up), Nancy Nancy MD Devignes Michel Souchet Malika Smaïl-Tabbone Emmanuel Bresso Sidahmed Benabderrahmane Jean-François Kneib http://plateforme-mbi.loria.fr/intelliGO Financements INCa (bourse de thèse interdisciplinaire) Communauté Urbaine du Grand Nancy Contrat Plan Etat Région : MISN Lyon, 14 juin 2012 40/42