Integrating Large, Disparate, Biomedical
Ontologies to Boost Organ Development
Network Connectivity

    Chimezie Ogbuji¹ and Rong Xu²
    ¹ Metacognition LLC
    ² Case Western Reserve University
Outline
   Outline
    ◦   Background
    ◦   Motivation
    ◦   Literature review / related work
    ◦   Opportunity / specific example
    ◦   Hypothesis
    ◦   Method
    ◦   Evaluation
    ◦   Discussion
Background
   Controlled biomedical vocabulary systems
    (and ontologies) play a key role in the
    analysis of genetic disease
    ◦ Structured, interoperable, and machine-readable
    ◦ Facilitate reproducibility of scientific results and
      use of intelligent software that can leverage
      underlying meaning
    ◦ Scientific results and the structured biomedical
      knowledge they are based on may be used for
      multiple - even unanticipated - purposes
Motivation
 Want descriptive relations that comprise
  terminology paths between (congenital)
  diseases and the anatomical entities that
  become malformed
 Want to use these as the basis for
  analysis and classification of congenital
  disorders according to their underlying
  molecular mechanism
Opportunity
   The Gene Ontology (GO) is arguably the most prominent
    example of how highly-organized and structured medical
    knowledge can be leveraged to facilitate medical genetics
    ◦ Has a hierarchy of biological processes involving organ
      development.  
   The Foundational Model of Anatomy (FMA) is a vast
    ontology with an objective to conceptualize the physical
    objects and spaces that constitute the human body
    ◦ macroscopic, microscopic and sub-cellular canonical anatomy.
   Their skeletal relations (is_a, part_of, and has_part) have the
    same meaning
Opportunity (continued)
 There are no immediately usable
  terminology paths between concepts in
  the GO's anatomy development process
  hierarchy and participating anatomical
  entities defined in the FMA
Literature review
  Cellular components function via interaction with
   each other in a highly-complex and
   interconnected network
  Interdependencies among a cell’s molecular
   components lead to functional, molecular, and
   causal relationships among distinct phenotypes.
  Network-based approaches to disease have the
   potential to provide a framework for classifying
   disease, defining susceptibility, predicting disease
   outcome, and identifying tailored therapeutic
   strategies


Barabási et al. Network Medicine: A Network-based Approach to Human Disease, Nature Reviews Genetics, 2011.

  For over a decade, analysis of biological networks via network and graph theory
   has revealed the importance of locally dense and well-connected subgraphs (hubs)
    ◦ Schwikowski et al. A network of protein-protein interactions in yeast, 2000
[Figure credit: Barabási et al. 2011]
Related work
   Investigation of structural and lexical
    concordance between anatomy terms in the FMA
    and SNOMED-CT
    ◦ Bodenreider & Zhang 2006
   Leveraging this concordance for integrating
    modules from each for a specific domain
    ◦ Ogbuji et al. 2010
   Discussion of logical consequences of using
    part_of between both anatomical entities (in the
    FMA) and biological processes (the GO)
    ◦ Jimenez-Ruiz et al. 2010
Opportunity: Cardiovascular
disease and development
 Understanding the formation of the heart is
  critical to the understanding of
  cardiovascular diseases
 The study of genes and gene products
  involved in cardiovascular development is an
  important research area
 There have been recent efforts to expand
  the subset of the GO's anatomy
  development hierarchy involved in heart
  development
Marfan Syndrome (MFS)

[…] mainly characterized by aneurysm formation in the
proximal ascending aorta, leading to aortic dissection
or rupture at a young age when left untreated. The
identification of the underlying genetic cause of MFS, namely
mutations in the fibrillin-1 gene (FBN1), has further
enhanced [...] insights into the complex pathophysiology of
aneurysm formation


In UMLS Metathesaurus
• Finding site: connective tissue structure (SNOMED-CT)
• Category: congenital skeletal disorder (CRISP Thesaurus and NLM MTH)
Marfan Syndrome example
 In the GO, FBN1 is annotated with the
  GO_0001501 (skeletal system development)
  and GO_0007507 (heart development)
  concepts (amongst others)
 The former coincides with the more common
  finding site and classification of MFS as a
  congenital skeletal disorder
 This is in spite of the fact that associations (causal
  and otherwise) between MFS and cardiovascular
  diseases such as aortic root dilation are well-
  documented in the medical literature
Hypothesis
   A high-quality integration of the GO's
    development process hierarchy with the FMA will
    have several benefits:
    ◦ New biological pathways from genetic diseases to the
      anatomical entities whose development is involved
      in their underlying molecular mechanisms
    ◦ Graph and network analysis can benefit from an
      increase in connectivity for discovering biologically
      meaningful motifs
    ◦ Similarly, classification algorithms can also take
      advantage of this
[Figure legend: copper = annotates a human gene; gold = does not annotate a human gene]
Method and materials
   Integration is performed on the following GO
    development process hierarchies
    ◦ Anatomical structure development
    ◦ Anatomical structure arrangement
    ◦ Anatomical structure morphogenesis
 Only GO concepts that annotate human genes
  are considered
 In processing the GO, the logical properties
  (transitivity, for example) of the relations are fully
  considered
    ◦ This will always be the case, henceforth
Method and materials (continued)
 The FMA ontology is loaded (as OWL/RDF) into a
  triple store for remote querying via SPARQL
 The prefix of the human-readable label for each GO
  concept in the development hierarchies is stemmed
  and used as a basis for case-insensitive, lexical
  matching on primary labels and exact synonyms of
  FMA classes via a SPARQL query
 FMA classes that match exactly are considered to
  denote the anatomical entities that participate in the
  corresponding GO biological process
Example
 GO_0007507 (heart development)
 Prefix: heart
 Matching FMA concept: FMA_7088
  (Heart)
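
The matching step can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the slides do not say which stemmer was used, so `go_label_prefix` simply drops an assumed trailing process word, and the `rdfs:label` property in `sparql_for_prefix` is an assumption about how FMA labels are stored in the triple store. `match_locally` is a hypothetical offline stand-in for running the query.

```python
# Assumed process suffixes; the slides only show "development" being dropped.
PROCESS_SUFFIXES = ("development", "morphogenesis", "arrangement")

def go_label_prefix(label):
    """Return the lower-cased anatomical prefix of a GO process label."""
    words = label.strip().lower().split()
    if len(words) > 1 and words[-1] in PROCESS_SUFFIXES:
        words = words[:-1]
    return " ".join(words)

def sparql_for_prefix(prefix, label_property="rdfs:label"):
    """Build a case-insensitive exact-match query (property name is assumed)."""
    escaped = prefix.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?cls WHERE {\n"
        f"  ?cls {label_property} ?label .\n"
        f'  FILTER (LCASE(STR(?label)) = "{escaped}")\n'
        "}"
    )

def match_locally(prefix, fma_labels):
    """Offline stand-in for the query: fma_labels maps class id -> list of labels."""
    return sorted(
        cls for cls, labels in fma_labels.items()
        if any(text.lower() == prefix for text in labels)
    )

# The only pairing given in the slides: heart development -> FMA_7088 (Heart).
fma_labels = {"FMA_7088": ["Heart"]}
prefix = go_label_prefix("heart development")
print(prefix, match_locally(prefix, fma_labels))  # heart ['FMA_7088']
```

Exact synonyms would need a second triple pattern (or a `UNION`) over whatever synonym property the FMA OWL export uses; that property is left out here because the slides do not name it.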
Evaluation
 Result: 1644 development process and
  anatomical entity pairs
 We calculate the Jaccard coefficient of
  the overlap between hierarchies for 6
  major organs and the anatomical
  development processes they participate
  in
Evaluation (continued)
 Using the GO development process for some
  FMA organ O as the starting point, the set of all
  subordinate terms is calculated: GOsubgraph(O)
 Example:
    ◦ GO_0007507 (heart development) has
      GO_0003170 (heart valve development) as a
      component (via has_part)
    ◦ GO_0003170 subsumes GO_0003176 (aortic valve
      development) and has GO_0003179 (heart valve
      morphogenesis) as a component
    ◦ Each of these would be considered as subordinates of
      GO_0007507
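
As a rough sketch, GOsubgraph(O) can be read as the set of terms reachable from the starting term by repeatedly following subsumption and has_part links downward. The edge table below encodes only the four terms named in the example; the function name `subgraph` and the dictionary layout are illustrative assumptions.

```python
from collections import deque

# Each key maps to its direct subordinates, as read off the example:
# "X has_part Y" and "X subsumes Y" both make Y subordinate to X.
SUBORDINATES = {
    "GO_0007507": {"GO_0003170"},                # heart development has_part heart valve development
    "GO_0003170": {"GO_0003176", "GO_0003179"},  # subsumes aortic valve development;
                                                 # has_part heart valve morphogenesis
}

def subgraph(root, edges):
    """Collect every term reachable from root (breadth-first, transitive)."""
    seen = set()
    queue = deque([root])
    while queue:
        term = queue.popleft()
        for child in edges.get(term, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen

print(sorted(subgraph("GO_0007507", SUBORDINATES)))
# ['GO_0003170', 'GO_0003176', 'GO_0003179']
```

The same traversal applies to the FMA side if the edge table is built from its is_a, part_of, and has_part relations.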
Evaluation (continued)
   In a similar fashion, the subordinate anatomical
    entities for each O amongst the 6 chosen organs
    are calculated:
    ◦ FMAsubgraph(O)
 For each O, we calculate the GO terms that are
  both in GOsubgraph(O) and were matched with an
  FMA class that is in FMAsubgraph(O)
 This resulting set of GO terms is considered the
  intersecting set and the Jaccard coefficient is
  calculated with respect to this, FMAsubgraph(O), and
  GOsubgraph(O)
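
One plausible reading of this calculation is sketched below. The slides do not spell out the formula, so the code assumes the standard Jaccard form (size of the intersection over size of the union) and treats each matched GO term as one element shared by both subgraphs. All identifiers in the toy data are hypothetical placeholders.

```python
def jaccard_overlap(go_subgraph, fma_subgraph, go_to_fma):
    """Jaccard coefficient under the assumption stated above.

    go_to_fma maps a GO term to the set of FMA classes it was matched with.
    """
    # GO terms in the subgraph whose matched FMA class lies in the FMA subgraph.
    intersecting = {
        term for term in go_subgraph
        if go_to_fma.get(term, set()) & fma_subgraph
    }
    union_size = len(go_subgraph) + len(fma_subgraph) - len(intersecting)
    return len(intersecting) / union_size if union_size else 0.0

# Hypothetical toy data: only GO_A is matched to a class inside the FMA subgraph.
go_sub = {"GO_A", "GO_B", "GO_C"}
fma_sub = {"FMA_X", "FMA_Y"}
mapping = {"GO_A": {"FMA_X"}, "GO_C": {"FMA_Z"}}
print(jaccard_overlap(go_sub, fma_sub, mapping))  # 0.25
```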
Jaccard Coefficient (overlap)
Evaluation: network connectivity
   We calculate the number of new paths from
    OMIM diseases through their genes to
    the anatomical entities in the FMA:
    ◦ P+dgo
   Similarly, we calculate the number of new
    paths starting from the genes to
    additional FMA anatomical entities
    ◦ P+go
Network connectivity: continued
   Only genes that are annotated with
    anatomical development processes
    matched to FMA classes and OMIM
    diseases associated with these genes
    were considered
    ◦ Genesdev
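
The two path counts can be illustrated with a simplified sketch. It counts only the direct hop from a matched GO process to its FMA class; the reported figures presumably also follow the ontology hierarchies, which is not modeled here. The function names are assumptions, and "MFS" stands in for an OMIM disease entry.

```python
def gene_paths(gene_go, go_fma):
    """Paths gene -> GO process -> FMA class, counted per gene."""
    return {
        gene: sum(len(go_fma.get(term, ())) for term in terms)
        for gene, terms in gene_go.items()
    }

def disease_paths(disease_genes, gene_go, go_fma):
    """Paths disease -> gene -> GO process -> FMA class, counted per disease."""
    per_gene = gene_paths(gene_go, go_fma)
    return {
        disease: sum(per_gene.get(gene, 0) for gene in genes)
        for disease, genes in disease_genes.items()
    }

# Toy data from the Marfan syndrome example.
disease_genes = {"MFS": ["FBN1"]}
gene_go = {"FBN1": ["GO_0001501", "GO_0007507"]}
go_fma = {"GO_0007507": ["FMA_7088"]}  # only the heart match is given in the slides

print(gene_paths(gene_go, go_fma))                    # {'FBN1': 1}
print(disease_paths(disease_genes, gene_go, go_fma))  # {'MFS': 1}
```

Because a disease's count is the sum over all of its genes, a disease associated with many genes accumulates paths faster than any single gene does.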
[Figure: number of additional P+dgo paths, on a logarithmic scale]
[Figure: histogram of the distribution of additional P+dgo paths, as a whole and normalized by the number of genes associated with each disease]
[Figure: log-scaled histogram of additional paths from Genesdev to FMA classes, only for those genes that had additional paths]
Evaluation summary
 On average, the mapping introduces 9,549
  additional P+dgo paths per OMIM disease
 On average, each gene in Genesdev had 17,037
  additional paths to FMA classes
 Caveat in normalizing the number of P+dgo
  paths by number of genes
    ◦ paths from diseases to anatomical entities
      introduce combinatorial factor of disease-gene
      pairings
Discussion
 Overlap results indicate little overlap
  between the GO hierarchies and
  corresponding FMA hierarchies
 Not surprising as both cover disparate
  domains within medicine and one is
  specific to humans while the other is not
Discussion (continued)
 This, along with the size of the FMA as a
  whole and within the portions mapped to
  the GO hierarchies, indicates an opportunity to
  build on the mapping and to integrate both
  ontologies in a meaningful way
 Connectivity results demonstrate a significant
  increase in biological paths from genetic
  diseases (and their genes) to the anatomical
  entities participating in the development
  process
Discussion (continued)
 As these paths are at least as logically and
  biologically sound as the ontologies they
  were forged from, we expect that an
  appreciable number of them will be useful
  for analysis
 To our knowledge, this is the first attempt
  of this kind to integrate the anatomical
  structure development, morphogenesis, and
  arrangement hierarchies in the GO with the
  FMA
Limitations
   Regarding deductions (formal or
    otherwise) that follow from an
    integration of the FMA and GO
    ◦ Need to be careful to only consider
      annotations for humans or to have a robust
      way to manage the uncertainty introduced in
      not doing so

Connecting the Dots for Information Discovery.pdfNeo4j
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 

Kürzlich hochgeladen (20)

Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architectures
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 

Integrating Large, Disparate, Biomedical Ontologies to Boost Organ Development Network Connectivity

  • 1. Integrating Large, Disparate, Biomedical Ontologies to Boost Organ Development Network Connectivity Chimezie Ogbuji1 and Rong Xu2 Metacognition LLC1 Case Western Reserve University2
  • 2. Outline  Outline ◦ Background ◦ Motivation ◦ Literature review / related work ◦ Opportunity / specific example ◦ Hypothesis ◦ Method ◦ Evaluation ◦ Discussion
  • 3. Background  Controlled biomedical vocabulary systems (and ontologies) play a key role in the analysis of genetic disease ◦ Structured, interoperable, and machine-readable ◦ Facilitate reproducibility of scientific results and use of intelligent software that can leverage underlying meaning ◦ Scientific results and the structured biomedical knowledge they are based on may be used for multiple - even unanticipated - purposes
  • 4. Motivation  Want descriptive relations that comprise terminology paths between (congenital) diseases and the anatomical entities that become malformed  Want to use these as the basis for analysis and classification of congenital disorders according to their underlying molecular mechanism
  • 5.
  • 6. Opportunity  The Gene Ontology (GO) is arguably the most prominent example of how highly-organized and structured medical knowledge can be leveraged to facilitate medical genetics ◦ Has a hierarchy of biological processes involving organ development.    The Foundational Model of Anatomy (FMA) is a vast ontology with an objective to conceptualize the physical objects and spaces that constitute the human body ◦ macroscopic, microscopic and sub-cellular canonical anatomy.  Their skeletal relations (is_a, part_of, and has_part) have the same meaning
  • 7. Opportunity (continued)  Their skeletal relations (is_a, part_of, and has_part) have the same meaning  There are no immediately usable terminology paths between concepts in the GO's anatomy development process hierarchy and participating anatomical entities defined in the FMA
  • 8. Literature review  Cellular components function via interaction with each other in a highly-complex and interconnected network  Interdependencies among a cell’s molecular components lead to functional, molecular, and causal relationships among distinct phenotypes.  Network-based approaches to disease have the potential to provide a framework for classifying disease, defining susceptibility, predicting disease outcome, and identifying tailored therapeutic strategies Barabási et al. Network Medicine: A Network-based Approach to Human Disease, Nature Reviews Genetics 2011.
  • 9. For over a decade, analysis of biological networks via network and graph theory has revealed the importance of locally-dense and well-connected subgraphs (hubs). Schwikowski et al. A network of protein-protein interactions in yeast 2000 Barabási et al. 2011
  • 10. Related work  Investigation of structural and lexical concordance between anatomy terms in the FMA and SNOMED-CT ◦ Bodenreider & Zhang 2006  Leveraging this concordance for integrating modules from each for a specific domain ◦ Ogbuji et al. 2010  Discussion of logical consequences of using part_of between both anatomical entities (in the FMA) and biological processes (the GO) ◦ Jimenez-Ruiz et al. 2010
  • 11. Opportunity: Cardiovascular disease and development  Understanding the formation of the heart is critical to the understanding of cardiovascular diseases  The study of genes and gene products involved in cardiovascular development is an important research area  There have been recent efforts to expand the subset of the GO's anatomy development hierarchy involved in heart development
  • 12. Marfan Syndrome (MFS) […] mainly characterized by aneurysm formation in the proximal ascending aorta, leading to aortic dissection or rupture at a young age when left untreated. The identification of the underlying genetic cause of MFS, namely mutations in the fibrillin-1 gene (FBN1), has further enhanced [...] insights into the complex pathophysiology of aneurysm formation. In UMLS Metathesaurus: • Finding site: connective tissue structure (SNOMED-CT) • Category: congenital skeletal disorder (CRISP Thesaurus and NLM MTH)
  • 13. Marfan Syndrome example  In the GO, FBN1 is annotated with the GO_0001501 (skeletal system development) and GO_0007507 (heart development) concepts (amongst others)  The former coincides with the more common finding site and classification of MFS as a congenital skeletal disorder  This is in spite of the fact that associations (causal and otherwise) between MFS and cardiovascular diseases such as aortic root dilation are well-documented in the medical literature
  • 14. Hypothesis  A high-quality integration of the GO's development process hierarchy with the FMA will have several benefits: ◦ New biological pathways from genetic diseases to the anatomical entities whose development is involved in their underlying molecular mechanisms ◦ Graph and network analysis can benefit from an increase in connectivity for discovering biologically meaningful motifs ◦ Similarly, classification algorithms can also take advantage of this
  • 15. Copper: annotates human gene Gold : does not annotate human gene
  • 16. Method and materials  Integration is performed on the following GO development process hierarchies ◦ Anatomical structure development ◦ Anatomical structure arrangement ◦ Anatomical structure morphogenesis  Only GO concepts that annotate human genes are considered  In processing the GO, the logical properties (transitivity, for example) of the relations are fully considered ◦ This will always be the case, henceforth
  • 17. Method and materials (continued)  The FMA ontology is loaded (as OWL/RDF) into a triple store for remote querying via SPARQL  The prefix of the human-readable label for each GO concept in the development hierarchies is stemmed and used as a basis for case-insensitive, lexical matching on primary labels and exact synonyms of FMA classes via a SPARQL query  FMA classes that match exactly are considered to denote the anatomical entities that participate in the corresponding GO biological process
  • 18. Example  GO_0007507 (heart development)  Prefix: heart  Matching FMA concept: FMA_7088 (Heart)
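The matching step described in the method slides can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it swaps the SPARQL query against a triple store for an in-memory dictionary, skips stemming, and assumes the prefix is whatever precedes a trailing "development", "morphogenesis", or "arrangement". All function names and the `FMA_HYPOTHETICAL_1` identifier are invented for the example; only GO_0007507 and FMA_7088 come from the slides.

```python
import re

# Assumed process words, taken from the three GO hierarchies named in the slides.
PROCESS_SUFFIX = re.compile(
    r"^(.*?)\s+(development|morphogenesis|arrangement)$", re.IGNORECASE
)


def development_prefix(go_label):
    """Return the lower-cased label prefix before the process word, or None."""
    match = PROCESS_SUFFIX.match(go_label.strip())
    return match.group(1).lower() if match else None


def build_label_index(fma_classes):
    """Map each lower-cased FMA label or exact synonym to the class ids using it."""
    index = {}
    for fma_id, labels in fma_classes.items():
        for label in labels:
            index.setdefault(label.lower(), set()).add(fma_id)
    return index


def match_go_to_fma(go_terms, fma_classes):
    """Pair GO terms with FMA classes whose label equals the GO prefix exactly."""
    index = build_label_index(fma_classes)
    pairs = []
    for go_id, label in go_terms.items():
        prefix = development_prefix(label)
        if prefix is None:
            continue
        for fma_id in sorted(index.get(prefix, ())):
            pairs.append((go_id, fma_id))
    return pairs


# Toy data: the second FMA identifier is a placeholder, not a real FMA id.
go_terms = {
    "GO_0007507": "heart development",
    "GO_0003170": "heart valve development",
}
fma_classes = {
    "FMA_7088": ["Heart"],
    "FMA_HYPOTHETICAL_1": ["Cardiac valve", "Heart valve"],
}

print(match_go_to_fma(go_terms, fma_classes))
```

Indexing synonyms alongside primary labels is what lets "heart valve development" reach a class whose primary label is "Cardiac valve" in this toy data.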
  • 19. Evaluation  Result: 1644 development process and anatomical entity pairs  We calculate the Jaccard coefficient of the overlap between hierarchies for 6 major organs and the anatomical development processes they participate in
  • 20. Evaluation (continued)  Using the GO development process for some FMA organ O as the starting point, the set of all subordinate terms is calculated: GOsubgraph(O)  Example: ◦ GO_0007507 (heart development) has GO_0003170 (heart valve development) as a component (via has_part) ◦ GO_0003170 subsumes GO_0003176 (aortic valve development) and has GO_0003179 (heart valve morphogenesis) as a component ◦ Each of these would be considered as subordinates of GO_0007507
  • 21. Evaluation (continued)  In a similar fashion, the subordinate anatomical entities for each O amongst the 6 chosen organs are calculated: ◦ FMAsubgraph(O)  For each O, we calculate the GO terms that are both in GOsubgraph(O) and were matched with an FMA class that is in FMAsubgraph(O)  This resulting set of GO terms is considered the intersecting set and the Jaccard coefficient is calculated with respect to this, FMAsubgraph(O), and GOsubgraph(O)
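The overlap calculation in the three evaluation slides above might look like the sketch below. It rests on assumptions the slides do not spell out: subordinates are collected by following child edges (is_a children and has_part components) transitively, the root itself is excluded, and the Jaccard coefficient is read as the size of the intersecting set divided by |GOsubgraph(O)| + |FMAsubgraph(O)| minus that size. The function names and the `FMA_A` to `FMA_D` identifiers are hypothetical.

```python
from collections import deque


def subordinates(root, children):
    """Collect every term reachable from root via subordinate edges."""
    seen = set()
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children.get(node, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    seen.discard(root)  # guard against cycles leading back to the root
    return seen


def jaccard_overlap(go_sub, fma_sub, go_to_fma):
    """Jaccard coefficient under the assumed reading described above."""
    # GO terms in the subgraph that were matched to an FMA class in the FMA subgraph.
    intersecting = {g for g in go_sub if go_to_fma.get(g, set()) & fma_sub}
    union_size = len(go_sub) + len(fma_sub) - len(intersecting)
    return len(intersecting) / union_size if union_size else 0.0


# Edges from the heart development example in the slides.
go_children = {
    "GO_0007507": ["GO_0003170"],                # has_part
    "GO_0003170": ["GO_0003176", "GO_0003179"],  # subsumes, has_part
}
go_sub = subordinates("GO_0007507", go_children)

# Placeholder FMA subgraph and mapping, for illustration only.
fma_sub = {"FMA_A", "FMA_B", "FMA_C", "FMA_D"}
go_to_fma = {"GO_0003170": {"FMA_A"}}

print(sorted(go_sub))
print(jaccard_overlap(go_sub, fma_sub, go_to_fma))
```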
  • 23. Evaluation: network connectivity  We calculate the number of new paths from OMIM diseases through their genes to the anatomical entities in the FMA: ◦ P+dgo  Similarly, we calculate the number of new paths starting from the genes to additional FMA anatomical entities ◦ P+go
  • 24. Network connectivity: continued  Only genes that are annotated with anatomical development processes matched to FMA classes and OMIM diseases associated with these genes were considered ◦ Genesdev
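As a rough illustration of the two path counts, the sketch below counts gene → GO term → FMA class paths (in the spirit of P+go) and disease → gene → GO term → FMA class paths (in the spirit of P+dgo). It is a simplification under stated assumptions: it counts only the directly mapped FMA class and does not follow further relations inside either ontology, so it will not reproduce the magnitudes reported in the slides. The function names and the `FMA_HYPOTHETICAL_SKELETAL_SYSTEM` identifier are invented; the FBN1 annotations and the Marfan syndrome association come from the slides.

```python
def gene_path_counts(gene_to_go, go_to_fma):
    """Count gene -> GO term -> FMA class paths for each gene.

    Genes with no matched annotation are dropped, mirroring the
    restriction to Genesdev described in the slides.
    """
    counts = {}
    for gene, go_ids in gene_to_go.items():
        total = sum(len(go_to_fma.get(go_id, ())) for go_id in go_ids)
        if total:
            counts[gene] = total
    return counts


def disease_path_counts(disease_to_genes, gene_counts):
    """Count disease -> gene -> GO term -> FMA class paths for each disease."""
    counts = {}
    for disease, genes in disease_to_genes.items():
        total = sum(gene_counts.get(gene, 0) for gene in genes)
        if total:
            counts[disease] = total
    return counts


gene_to_go = {"FBN1": {"GO_0001501", "GO_0007507"}}
go_to_fma = {
    "GO_0007507": {"FMA_7088"},
    # Placeholder id: the slides do not give the FMA class for the skeletal system.
    "GO_0001501": {"FMA_HYPOTHETICAL_SKELETAL_SYSTEM"},
}
disease_to_genes = {"Marfan syndrome": {"FBN1"}}

per_gene = gene_path_counts(gene_to_go, go_to_fma)
per_disease = disease_path_counts(disease_to_genes, per_gene)
print(per_gene, per_disease)
```

Summing per-gene counts for each disease makes the combinatorial effect visible: a disease linked to many genes accumulates paths from every disease-gene pairing.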
  • 25. Number of additional P+dgo paths on a logarithmic scale
  • 26. Histogram of the distribution of additional P+dgo paths as a whole and normalized by the number of genes associated with each disease
  • 27.
  • 28. Log-scaled histogram of additional paths from Genesdev to FMA classes, only for those genes that had additional paths
  • 29. Evaluation summary  On average, mapping introduces 9,549 additional P+dgo paths per OMIM disease  On average, each Genesdev gene had 17,037 additional paths to FMA classes  Caveat in normalizing the number of P+dgo paths by number of genes ◦ paths from diseases to anatomical entities introduce combinatorial factor of disease-gene pairings
  • 30. Discussion  Overlap results indicate little overlap between the GO hierarchies and corresponding FMA hierarchies  Not surprising as both cover disparate domains within medicine and one is specific to humans while the other is not
  • 31. Discussion (continued)  This, along with the size of the FMA as a whole and within the portions mapped to the GO hierarchies, indicates an opportunity to build on the mapping and to integrate both ontologies in a meaningful way  Connectivity results demonstrate a significant increase in biological paths from genetic diseases (and their genes) to the anatomical entities participating in the development process
  • 32. Discussion (continued)  As these paths are at least as logically and biologically sound as the ontologies they were forged from, we expect that an appreciable amount of them will be useful for analysis  To our knowledge, this is the first attempt of this kind to integrate the anatomical structural development, morphogenesis, and organization hierarchies in the GO with the FMA
  • 33. Limitations  Regarding deductions (formal or otherwise) that follow from an integration of the FMA and GO ◦ Need to be careful to only consider annotations for humans or to have a robust way to manage the uncertainty introduced in not doing so