1. Integrating Large, Disparate, Biomedical
Ontologies to Boost Organ Development
Network Connectivity
Chimezie Ogbuji1 and Rong Xu2
Metacognition LLC1
Case Western Reserve University2
2. Outline
◦ Background
◦ Motivation
◦ Literature review / related work
◦ Opportunity / specific example
◦ Hypothesis
◦ Method
◦ Evaluation
◦ Discussion
3. Background
Controlled biomedical vocabulary systems
(and ontologies) play a key role in the
analysis of genetic disease
◦ Structured, interoperable, and machine-readable
◦ Facilitate reproducibility of scientific results and
use of intelligent software that can leverage
underlying meaning
◦ Scientific results and the structured biomedical
knowledge they are based on may be used for
multiple - even unanticipated - purposes
4. Motivation
Want descriptive relations that comprise
terminology paths between (congenital)
diseases and the anatomical entities that
become malformed
Want to use these as the basis for
analysis and classification of congenital
disorders according to their underlying
molecular mechanism
6. Opportunity
The Gene Ontology (GO) is arguably the most prominent
example of how highly-organized and structured medical
knowledge can be leveraged to facilitate medical genetics
◦ Has a hierarchy of biological processes involving organ
development.
The Foundational Model of Anatomy (FMA) is a vast
ontology with an objective to conceptualize the physical
objects and spaces that constitute the human body
◦ macroscopic, microscopic and sub-cellular canonical anatomy.
Their skeletal relations (is_a, part_of, and has_part) have the
same meaning
7. Opportunity (continued)
There are no immediately usable
terminology paths between concepts in
the GO's anatomy development process
hierarchy and participating anatomical
entities defined in the FMA
8. Literature review
Cellular components function via interaction with
each other in a highly-complex and
interconnected network
Interdependencies among a cell’s molecular
components lead to functional, molecular, and
causal relationships among distinct phenotypes.
Network-based approaches to disease have the
potential to provide a framework for classifying
disease, defining susceptibility, predicting disease
outcome, and identifying tailored therapeutic
strategies
Barabási et al. Network Medicine: A Network-based Approach to Human Disease, Nature Reviews
Genetics 2011.
9. For over a decade, analysis of biological networks via network and graph theory
has revealed the importance of locally-dense and
well-connected subgraphs (hubs).
Schwikowski et al. A network of protein-protein interactions in yeast 2000
Barabási et al. 2011
10. Related work
Investigation of structural and lexical
concordance between anatomy terms in the FMA
and SNOMED-CT
◦ Bodenreider & Zhang 2006
Leveraging this concordance for integrating
modules from each for a specific domain
◦ Ogbuji et al. 2010
Discussion of logical consequences of using
part_of between both anatomical entities (in the
FMA) and biological processes (the GO)
◦ Jimenez-Ruiz et al. 2010
11. Opportunity: Cardiovascular
disease and development
Understanding the formation of the heart is
critical to the understanding of
cardiovascular diseases
The study of genes and gene products
involved in cardiovascular development is an
important research area
There have been recent efforts to expand
the subset of the GO's anatomy
development hierarchy involved in heart
development
12. Marfan Syndrome (MFS)
[…] mainly characterized by aneurysm formation in the
proximal ascending aorta, leading to aortic dissection
or rupture at a young age when left untreated. The
identification of the underlying genetic cause of MFS, namely
mutations in the fibrillin-1 gene (FBN1), has further
enhanced [...] insights into the complex pathophysiology of
aneurysm formation
In UMLS Metathesaurus
• Finding site: connective tissue structure (SNOMED-CT)
• Category: congenital skeletal disorder (CRISP Thesaurus and NLM MTH)
13. Marfan Syndrome example
In the GO, FBN1 is annotated with the
GO_0001501 (skeletal system development)
and GO_0007507 (heart development)
concepts (amongst others)
The former coincides with the more common
finding site and classification of MFS as a
congenital skeletal disorder
This is in spite of the fact that associations (causal
and otherwise) between MFS and cardiovascular
diseases such as aortic root dilation are well-
documented in the medical literature
14. Hypothesis
A high-quality integration of the GO's
development process hierarchy with the FMA will
have several benefits:
◦ New biological pathways from genetic diseases to the
anatomical entities whose development is involved
in their underlying molecular mechanisms
◦ Graph and network analysis can benefit from an
increase in connectivity for discovering biologically
meaningful motifs
◦ Similarly, classification algorithms can also take
advantage of this
16. Method and materials
Integration is performed on the following GO
development process hierarchies
◦ Anatomical structure development
◦ Anatomical structure arrangement
◦ Anatomical structure morphogenesis
Only GO concepts that annotate human genes
are considered
In processing the GO, the logical properties
(transitivity, for example) of the relations are fully
considered
◦ This will always be the case, henceforth
17. Method and materials (continued)
The FMA ontology is loaded (as OWL/RDF) into a
triple store for remote querying via SPARQL
The prefix of the human-readable label for each GO
concept in the development hierarchies is stemmed
and used as a basis for case-insensitive, lexical
matching on primary labels and exact synonyms of
FMA classes via a SPARQL query
FMA classes that match exactly are considered to
denote the anatomical entities that participate in the
corresponding GO biological process
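The matching step can be sketched in memory as follows. This is only an illustration, not the SPARQL query used in the work: the suffix list, the function names, and the FMA identifiers are assumptions, and "stemming" is approximated here by stripping a trailing process word and lowercasing.

```python
# Hypothetical process suffixes; the slides do not list the exact stemming rules.
PROCESS_SUFFIXES = (" development", " morphogenesis", " arrangement")


def anatomical_prefix(go_label):
    """Return the lowercased label prefix, or None if no known suffix ends the label."""
    label = go_label.strip().lower()
    for suffix in PROCESS_SUFFIXES:
        if label.endswith(suffix):
            return label[: -len(suffix)]
    return None


def match_go_to_fma(go_labels, fma_names):
    """Pair GO terms with FMA classes whose label or exact synonym equals the prefix.

    go_labels: {go_id: human-readable label}
    fma_names: {fma_id: [primary label, exact synonym, ...]}
    """
    # Case-insensitive index from every FMA name to the classes that carry it.
    index = {}
    for fma_id, names in fma_names.items():
        for name in names:
            index.setdefault(name.strip().lower(), set()).add(fma_id)

    pairs = set()
    for go_id, label in go_labels.items():
        prefix = anatomical_prefix(label)
        for fma_id in index.get(prefix, ()):
            pairs.add((go_id, fma_id))
    return pairs


# Toy data; the FMA identifiers are placeholders, not real FMA IDs.
go_labels = {
    "GO_0007507": "heart development",
    "GO_0003176": "aortic valve development",
}
fma_names = {
    "fma:heart": ["Heart"],
    "fma:aortic_valve": ["Aortic valve"],
    "fma:lung": ["Lung"],
}
pairs = match_go_to_fma(go_labels, fma_names)
```

Only exact (case-insensitive) matches are kept, mirroring the rule that exactly matching FMA classes denote the participating anatomical entities.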
19. Evaluation
Result: 1644 development process and
anatomical entity pairs
We calculate the Jaccard coefficient of
the overlap between hierarchies for 6
major organs and the anatomical
development processes they participate
in
20. Evaluation (continued)
Using the GO development process for some
FMA organ O as the starting point, the set of all
subordinate terms is calculated: GOsubgraph(O)
Example:
◦ GO_0007507 (heart development) has
GO_0003170 (heart valve development) as a
component (via has_part)
◦ GO_0003170 subsumes GO_0003176 (aortic valve
development) and has GO_0003179 (heart valve
morphogenesis) as a component
◦ Each of these would be considered as subordinates of
GO_0007507
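A minimal sketch of how GOsubgraph(O) could be collected, assuming the hierarchy is available as (parent, relation, child) triples and that both subsumption and has_part are followed transitively. The function name and the edge representation are assumptions for illustration.

```python
from collections import defaultdict


def subordinates(root, edges):
    """Return every term reachable from root via subsumption or has_part edges.

    edges: iterable of (parent, relation, child) triples.
    The root itself is not included in the result.
    """
    children = defaultdict(set)
    for parent, _relation, child in edges:
        children[parent].add(child)

    seen, stack = set(), [root]
    while stack:
        for child in children[stack.pop()]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


# The heart development example from this slide.
edges = [
    ("GO_0007507", "has_part", "GO_0003170"),  # heart valve development
    ("GO_0003170", "subsumes", "GO_0003176"),  # aortic valve development
    ("GO_0003170", "has_part", "GO_0003179"),  # heart valve morphogenesis
]
go_subgraph_heart = subordinates("GO_0007507", edges)
```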
21. Evaluation (continued)
In a similar fashion, the subordinate anatomical
entities for each O amongst the 6 chosen organs
are calculated:
◦ FMAsubgraph(O)
For each O, we calculate the GO terms that are
both in GOsubgraph(O) and were matched with an
FMA class that is in FMAsubgraph(O)
This resulting set of GO terms is considered the
intersecting set and the Jaccard coefficient is
calculated with respect to this, FMAsubgraph(O), and
GOsubgraph(O)
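One plausible reading of this calculation, sketched under the assumption that the coefficient is |I| / (|GOsubgraph(O)| + |FMAsubgraph(O)| - |I|), where I is the intersecting set of GO terms. The slides do not spell out the exact formula, so the denominator and the names below are assumptions.

```python
def overlap_jaccard(go_subgraph, fma_subgraph, matches):
    """Jaccard-style overlap between a GO subgraph and an FMA subgraph.

    matches: set of (go_id, fma_id) pairs produced by the lexical matching.
    """
    # GO terms in the GO subgraph whose matched FMA class lies in the FMA subgraph.
    intersecting = {
        go_id
        for go_id, fma_id in matches
        if go_id in go_subgraph and fma_id in fma_subgraph
    }
    union_size = len(go_subgraph) + len(fma_subgraph) - len(intersecting)
    return len(intersecting) / union_size if union_size else 0.0


# Toy sets with made-up identifiers.
go_sub = {"go:a", "go:b", "go:c"}
fma_sub = {"fma:x", "fma:y", "fma:z", "fma:w"}
matches = {("go:a", "fma:x"), ("go:b", "fma:q")}  # fma:q lies outside fma_sub
score = overlap_jaccard(go_sub, fma_sub, matches)  # 1 / (3 + 4 - 1)
```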
23. Evaluation: network connectivity
We calculate the number of new paths from
OMIM diseases through their genes to
the anatomical entities in the FMA:
◦ P+dgo
Similarly, we calculate the number of new
paths starting from the genes to
additional FMA anatomical entities
◦ P+go
24. Network connectivity: continued
Only genes that are annotated with
anatomical development processes
matched to FMA classes and OMIM
diseases associated with these genes
were considered
◦ Genesdev
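The counting can be sketched as below, assuming a path is a single disease, gene, GO term, FMA class chain through the new mapping. This simplification ignores further traversal of the GO and FMA hierarchies, which the reported counts may include; all names and the toy data are illustrative.

```python
from collections import Counter


def count_new_paths(disease_genes, gene_annotations, go_to_fma):
    """Count mapping-induced paths per disease and per gene.

    disease_genes: {disease_id: set of gene symbols}
    gene_annotations: {gene symbol: set of GO ids}
    go_to_fma: {go_id: set of FMA ids} from the GO-FMA mapping
    """
    # Paths from each gene: one per (GO annotation, matched FMA class) pair.
    paths_per_gene = Counter()
    for gene, go_ids in gene_annotations.items():
        n = sum(len(go_to_fma.get(go_id, ())) for go_id in go_ids)
        if n:
            paths_per_gene[gene] = n

    # Paths from each disease: the sum over its associated genes.
    paths_per_disease = Counter()
    for disease, genes in disease_genes.items():
        n = sum(paths_per_gene[gene] for gene in genes)
        if n:
            paths_per_disease[disease] = n
    return paths_per_disease, paths_per_gene


# Toy data based on the Marfan syndrome example; FMA ids are placeholders.
disease_genes = {"MFS": {"FBN1"}}
gene_annotations = {"FBN1": {"GO_0001501", "GO_0007507"}}
go_to_fma = {
    "GO_0001501": {"fma:skeletal_system"},
    "GO_0007507": {"fma:heart"},
}
per_disease, per_gene = count_new_paths(disease_genes, gene_annotations, go_to_fma)
```

Genes with no mapped annotation and diseases with no such gene are left out of the counters, matching the restriction to Genesdev described above.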
26. Histogram of the distribution of additional P+dgo paths as a whole
and normalized by the number of genes associated with each disease
28. Log-scaled histogram of additional paths from Genesdev to FMA
classes, only for those genes that had additional paths
29. Evaluation summary
On average, mapping introduces 9,549
additional P+dgo paths per OMIM disease
On average, each Genesdev gene had 17,037
additional paths to FMA classes
Caveat in normalizing the number of P+dgo
paths by number of genes
◦ paths from diseases to anatomical entities
introduce combinatorial factor of disease-gene
pairings
30. Discussion
Overlap results indicate little overlap
between the GO hierarchies and
corresponding FMA hierarchies
Not surprising as both cover disparate
domains within medicine and one is
specific to humans while the other is not
31. Discussion (continued)
This, along with the size of the FMA as a
whole and within the portions mapped to
the GO hierarchies, indicates an opportunity to
build on the mapping and to integrate both
ontologies in a meaningful way
Connectivity results demonstrate a significant
increase in biological paths from genetic
diseases (and their genes) to the anatomical
entities participating in the development
process
32. Discussion (continued)
As these paths are at least as logically and
biologically sound as the ontologies they
were forged from, we expect that an
appreciable number of them will be useful
for analysis
To our knowledge, this is the first attempt
of this kind to integrate the anatomical
structure development, morphogenesis, and
organization hierarchies in the GO with the
FMA
33. Limitations
Regarding deductions (formal or
otherwise) that follow from an
integration of the FMA and GO
◦ Need to be careful to only consider
annotations for humans or to have a robust
way to manage the uncertainty introduced in
not doing so