Technology R&D Theme 3: Multi-scale Network Representations

TRD 3: MULTI-SCALE NETWORKS – PROJECT SUMMARY
Although networks have been extremely useful for representing molecular interactions and
mechanisms, network diagrams do not visually resemble the contents of cells. Rather, the cell
involves a multi-scale hierarchy of components – proteins are subunits of protein complexes which, in
turn, are parts of pathways, biological processes, organelles, cells, tissues, and so on. In this
Technology Research and Development Project (TRD), we will pursue methods that move Network
Biology towards such hierarchical, multi-scale views of the structure and function of biological
systems. Biological ontologies are one very successful framework for capturing hierarchical multi-
scale organization, but they have so far been only indirectly connected to biological networks and
other types of ‘omics data. Recently, we introduced methods for inferring the terms and term relations
of a gene ontology directly from the hierarchical structure contained in molecular networks, and we
prototyped a web resource to distribute network-based ontologies (NeXO, nexontology.org). This
recent progress motivates and lays groundwork for our present focus on hierarchical multi-scale
representations. Specific aims are to develop tools that: (1) Iteratively and flexibly incorporate new
network experimental results into a ‘working’ NeXO ontology, (2) Use a gene ontology structure, either
inferred or literature curated, to guide an engine for generalized functional predictions, and (3) Explore
multi-scale analysis above the cellular level, by bridging ligand-receptor networks to networks of cell-
cell communication. These aims are stimulated by a range of Driving Biomedical Projects involving
the Gene Ontology project, the Saccharomyces Genome Database, a Cancer Gene Ontology, and
multi-scale analysis of viral-host, cell-cell communication and social networks. Ultimately, all research
aims synergize to use network data to propel hierarchical models of biological structure and function.

TRD 3: MULTI-SCALE NETWORKS – PROJECT NARRATIVE
Although networks have been extremely useful for representing interactions and group formation,
network diagrams fail to capture important aspects of biological structure and function. We will pursue
methods that move Network Biology towards more accurate hierarchical, multi-scale views of
biological systems. The hierarchical models developed here will enable integration of both basic and
clinical data to predict disease outcomes in response to specific therapies.

TRD 3: MULTI-SCALE NETWORKS – SPECIFIC AIMS
Although networks have been very useful for representing molecular interactions and mechanisms,
network diagrams do not visually resemble the contents of cells. Rather, the cell involves a multi-scale
hierarchy of components – proteins are subunits of protein complexes which, in turn, are parts of
pathways, biological processes, organelles, cells, tissues, and so on. In this technology research
project, we will pursue methods that move Network Biology towards such hierarchical, multi-scale
views of biological structure and function.
Aim 1. Assembly and refinement of gene ontology structure from biological network data.
Ontologies have been very successful at capturing hierarchical, multi-scale cellular organization. In
the prior period of support we introduced methods for assembling a gene ontology directly from the
hierarchical structure evidenced by molecular networks and other ‘omics data. We prototyped a web
resource to distribute network-based ontologies (NeXO, nexontology.org), but it is still at an early
stage. In the next support period, we will research methods to iteratively and flexibly incorporate new
experimental results and data into a ‘working’ NeXO ontology, highlighting new terms and term
relations that are created, alongside existing terms/relations that are further supported or weakened.
We will transform nexontology.org to an interactive community resource that enables investigators not
only to browse an existing ontology but to create, share, and iteratively update, revise, correct, and
expand these ontologies. The potential of this aim is to effectively systematize and crowd-source an
important type of biological model – the ontology.
Aim 2. Functionalized gene ontologies as a hierarchy of phenotypic prediction. Hierarchy and
scale are important not only for capturing the physical architecture of a system (Aim 1) but also its
function. Recent progress in artificial intelligence (AI), embodied by agents such as Siri and Watson,
inspires an approach for moving from networks and gene ontologies, which are currently descriptive in
nature, towards models that can predict a range of cellular phenotypes and answer
biological questions. Using these AIs as a rough guide, we will develop gene
ontologies as a major platform for the functional translation of genotype to phenotype, with a particular
focus on personalized cancer therapeutics. This aim intersects with the separate TRD project on
Predictive Networks and serves as a bridge between the two TRDs.
Aim 3. Bridging ligand-receptor networks to cell-cell communication networks. We will also
explore multi-scale network analysis above the cellular level, in the context of an emerging class of
biological networks called cell-cell interaction networks. In these networks nodes are cells, and edges
represent physical or chemical (e.g. hormonal) interactions. Inter-cellular signaling and regulation
networks could in the future be controlled to grow artificial organs, heal tissues and develop novel
therapies. We will infer, analyze and visualize multi-scale models of inter-cellular communication
networks and their corresponding intracellular signaling networks and pathways, which link to
traditional molecular interaction network analysis methods. For instance, we will use network analysis
methods to identify potential control points in the cell-cell and intracellular interaction network with
applications to regenerative medicine (growing blood from stem cells). Growth of data and analysis
methods in this area will enable network science to contribute to the wider understanding of
physiological systems.
These aims are stimulated by a range of Driving Biomedical Projects involving the Gene Ontology
project, the Saccharomyces Genome Database, a Cancer Gene Ontology, and multi-scale analysis of
viral-host, cell-cell communication and social networks. Ultimately, all research aims synergize to use
network data to propel hierarchical models of biological structure and function.

TRD 3: MULTI-SCALE NETWORKS – RESEARCH STRATEGY
SIGNIFICANCE
Why it is time to move beyond flat models of biological networks. Like any model of the world,
our view of the cell is inescapably bound by the time and place in which we live. Over the years
different schools have fashioned the cell in a variety of forms, from bags of enzymes [1], to metabolic
channels [2], to feedback circuits [3], to complex systems [4], to gels [5], to self-modifying programs in
software [6]. A model that has pervaded cell biology for the past fifteen years is the so-called “network”
view (Figure 1A), which has bloomed in parallel with the emergence of human-made networks such
as the Internet and Facebook. This view treats cells as containers for vast networks of “nodes”
(genes, gene products, metabolites, or other biomolecules) connected by “links” (physical interactions
or functional associations) [7]. Network representations of the cell flow directly from the ability to
characterize not only genes and proteins in isolation, but also their functional similarities and physical
binding partners, a major outcome of transcriptomics and proteomics approaches. Analysis of
network information, whether biological or human-made, is an active field leading to algorithms that
detect nodes with strategic positions within a network [7] or that analyze networks to identify modular
structures [8] (a topic of earlier progress during the past period of support for the NRNB).
While incredibly influential, the network is likely not the ultimate representation of a cell, for two
reasons. First, network diagrams do not visually resemble the contents of cells. Nowhere in the cell do
we observe actual wires running between genes and proteins, unlike the Internet, which is truly a
network of wires among processing units. Rather, the cell involves a multi-scale hierarchy of
components that is not readily captured by basic network representations. For example, the
proteasome has been mapped extensively to identify its key genes and interactions, but the network
visualization of these data (Figure 1A) is very different from the proteasome’s spatial appearance
(Figure 1B). The interactions making up the proteasome factor into a regulatory particle and a core,
which, in turn, factor into a base and a lid, and an alpha and beta subunit, respectively. This
hierarchical structure is obscured by the network visualization of pairwise relationships between gene
products. Aim 1 will address this shortcoming, by using molecular networks and other ‘omics data to
build hierarchical models of the cell parallel to the Gene Ontology [9].
Figure 1. From networks to ontologies. (A) Network representation of three types of interactions that form the
proteasome structure, displayed using a force directed layout. (B) Cartoon representation of the structure of the
proteasome (PDB entry 4b4t), created by integrating partial crystallographic structures obtained by analysis of
2.4 million images from electron microscopy. (C) Hierarchical factorization of the proteasome sub-components
as described by our data-driven gene ontology NeXO. Across all panels, colors indicate membership to the core
complex beta subunit (red), core complex alpha subunit (orange), regulatory particle lid complex (blue) and
regulatory particle base complex (purple) according to the GO (A), the Protein Data Bank (B) and NeXO (C).

From description to prediction. Second, many of the molecular networks published to date,
including many from the NRNB or earlier research by our labs [10-21], are descriptive maps of physical or
functional connectivity rather than predictive models. For example, technologies such as yeast
two-hybrid, protein affinity purification, and chromatin immunoprecipitation are often used to define and
draw large networks of protein-protein and protein-DNA interactions [22], but these static maps do not,
by themselves, predict cell behavior. Although we and many others in the field of network biology
have inferred networks capable of predicting gene function or phenotypic responses (reviewed
in [23,24]; network inference was the focus of Aim 4 in the past funding period), these efforts
have tended to focus on a specific class of predictions, e.g. gene expression level or cell growth rate.
Assembling a model that would predict a range of phenotypes, rather than only one type of outcome,
requires understanding how phenotypes are interrelated. Here again a hierarchy is important, since
cellular organization involves a multi-scale hierarchy not only in structure but also in function. For
example, the proteasome is a central component of ubiquitin-mediated protein degradation, which,
depending on an intricate set of inputs and rules, can result in cellular homeostasis, differentiation,
death, and other fates. This multi-scale hierarchy of processes is, again, simply not exposed by a
standard pairwise network representation. Aim 2 will address this shortcoming by developing methods
to ‘functionalize’ the Gene Ontology, so that it is not merely a static description of the contents of cells,
but an active framework for predicting phenotype from genotype.
From networks to ontologies: Building better models of cell structure from omics data. To
capture hierarchical organization, a particularly promising direction in computer science has been the
development of the ontology, a model that divides its subject domain into a set of fundamental
concepts or entities and relationships among those entities [25]. Ontologies arise from the metaphysics
branch of philosophy, which is concerned with the nature of what exists and the categories into which
the world’s objects naturally fall. Ontologies build upon and extend network models in two key ways:
‘entities’ refer not only to elemental objects but also to any meaningful grouping of objects, and
‘relationships’ refer not only to direct connections but also to nested structures, such as one entity
being a part or type of another. Thus, ontologies explicitly allow for a higher order organization of
knowledge, missing from raw networks. They have been key for building powerful knowledge
representation and reasoning systems in many domains [26], including biomedicine [27].
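To make the distinction between a flat network and an ontology concrete, the following is a minimal sketch of an ontology stored as a directed acyclic graph of terms. The term names follow the proteasome hierarchy described earlier in the text; the gene identifiers (gene_1 to gene_4), the relation label part_of, and the helper genes_under are illustrative assumptions, not part of GO or NeXO.

```python
from typing import Dict, List, Set, Tuple

# Child term -> list of (relation, parent term). A term may list several
# parents, which is what makes the structure a DAG rather than a tree.
RELATIONS: Dict[str, List[Tuple[str, str]]] = {
    "regulatory particle": [("part_of", "proteasome")],
    "core": [("part_of", "proteasome")],
    "base": [("part_of", "regulatory particle")],
    "lid": [("part_of", "regulatory particle")],
    "alpha subunit": [("part_of", "core")],
    "beta subunit": [("part_of", "core")],
}

# Direct gene annotations; the gene names are placeholders, not real genes.
ANNOTATIONS: Dict[str, Set[str]] = {
    "base": {"gene_1"},
    "lid": {"gene_2"},
    "alpha subunit": {"gene_3"},
    "beta subunit": {"gene_4"},
}


def genes_under(term: str) -> Set[str]:
    """Collect genes annotated to a term or to any of its descendants."""
    genes = set(ANNOTATIONS.get(term, set()))
    for child, parents in RELATIONS.items():
        if any(parent == term for _, parent in parents):
            genes |= genes_under(child)
    return genes


print(sorted(genes_under("core")))
print(sorted(genes_under("proteasome")))
```

A grouping such as "core" is itself an entity here, and its gene content is derived from nested relations rather than from pairwise links, which is the property that raw networks lack.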
Ontologies became very influential in cell biology through the development of the Gene Ontology
(GO) [9]. GO is a major resource of knowledge about genes, gene products, and the hierarchy of cellular
components, molecular functions and biological processes in which they participate. Entities in GO
(GO terms) are hierarchical groupings of other entities. The GO resource is presently very large, with
nearly 35,000 GO terms connected by ~65,000 hierarchical term-term relations, describing more than
80 different species. The impact of GO is hard to overstate – just try to think of a single modern ‘omics
analysis that does not use GO to validate a novel data set or approach, or to generate new
mechanistic hypotheses. In a sense GO is the most universal, and universally accepted, model of a
cell that we currently have.
One limitation of GO lies in the fact that the ontology structure is constructed by a diverse team of
scientists according to their best abilities to curate the published scientific literature. Thus, GO
inevitably misses the large proportion of cell biology that is not yet known or has not yet been curated,
and it contains biases that are hard to control. To address these challenges, in the prior period of
support we investigated whether gene ontologies could be inferred computationally directly from
systematic molecular interaction networks [28]. In this study, a large fraction of the GO hierarchy was
recapitulated de novo, directly from network data gathered in budding yeast. For example, the
pairwise interaction network for genes and gene products encoding the proteasome (Figure 1A) was
transformed to infer the hierarchical structure of proteasomal components to a high degree of
accuracy (Figure 1C). In addition, several hundred cellular entities were identified from the data that
had not yet been catalogued in GO, pointing to potentially novel or uncurated molecular machinery
which we are pursuing in collaboration with the Gene Ontology Consortium (formerly a CSP, now a
DBP).

Over the next few years, we will expand on this preliminary work to introduce a system for organizing
molecular interactions and cancer ‘omics data as a genomics-driven, crowd-sourced Gene Ontology.
This will address several parallel challenges in the ‘omics sciences:
(1) The need to move beyond clustering to recognize the multi-scale structure embedded in data
(2) The need to improve ontologies of gene function in their scalability, consistency and coverage
(3) The continued need to provide biomedicine with an accurate map of hallmark pathways and
processes that drive disease progression.
Taking clues from Siri: ‘Active’ networks and ontologies. Whether based on expert knowledge or
inferred from data, current gene ontologies are static descriptions of cellular organization. They
enable representing and reasoning on the structural relationships among biological entities [27,29] but
lack any native capacity to capture dynamic biological states or make phenotypic predictions.
However, since gene ontologies inherently represent multi-scale hierarchy in cellular organization,
they provide in theory an ideal substrate for building models that would also be predictive of a range
of cellular responses and phenotypes.
In this respect, intelligent agents developed in the field of knowledge representation and reasoning [26],
such as Apple’s Siri and IBM’s Watson, provide an excellent example of what a predictive, or
‘executable’, ontology looks like. At Siri’s core is a series of ontologies containing knowledge that
concerns Siri – answers to questions one would normally ask an iPhone [30]. For instance, Siri uses an
ontology for event planning which treats both meals and movies as types of events, where meals
involve a restaurant and a restaurant consists of components such as a name, address, and style of
food. In many ways, such ontologies are similar in structure to bio-ontologies such as GO (Figure 2).
Figure 2. From ontologies to active ontologies. A subset of the Gene Ontology [9], left, alongside a subset of
an active ontology for event planning [30], right. Red relationships and entities indicate dynamic computation.
Unlike gene ontologies, however, which are essentially descriptive, Siri’s ontologies are coupled with
dynamic reasoning systems that render them active: “Whereas a conventional ontology is a formal
representation of domain knowledge with distinct concepts and relations among concepts, an Active
Ontology is a processing formalism where distinct processing elements are arranged according to
ontology notions; it is an execution environment” [30]. These active ontologies not only encode entities
and relations but also associate entities with states and relations with rule sets that
perform actions within and among entities. Through a bottom-up execution, input states are
incrementally propagated up the hierarchy to impact higher-level entities, whose states are output as
the answer to the initial question – the best prediction based on the inputs.
For example, try asking Siri to “Find a good sushi restaurant for two tonight”. This query is translated
by setting the states of several entities: style is set to ‘sushi’, address to the user’s current location,
party size to the value ‘2’, and event date to today’s date (Figure 2). These values are propagated
through the ontology to generate a list of restaurants, which becomes the state of the event entity.
This event result can then be provided to the user or included in further computations. In Aim 2, we
will explore whether such systems can teach us how to develop question-and-answer, or genotype-to-
phenotype prediction, systems for cell biology [31].
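The bottom-up execution described above can be illustrated with a small sketch. It is a toy under stated assumptions and not Siri's implementation: the Entity class, the propagate function, the rule signatures, and the three-entry restaurant table are all hypothetical names and data introduced only for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass
class Entity:
    """An ontology entity with a state and an optional rule over its children."""
    name: str
    children: List["Entity"] = field(default_factory=list)
    rule: Optional[Callable[[Dict[str, Any]], Any]] = None
    state: Any = None


def propagate(entity: Entity) -> Any:
    """Bottom-up execution: resolve children first, then apply this entity's rule."""
    if entity.rule is None:
        return entity.state  # leaf entity: its state was set directly by the query
    child_states = {child.name: propagate(child) for child in entity.children}
    entity.state = entity.rule(child_states)
    return entity.state


# Hypothetical toy data standing in for a restaurant database.
RESTAURANTS = [
    {"name": "Sushi A", "style": "sushi", "max_party": 4},
    {"name": "Pizza B", "style": "pizza", "max_party": 8},
    {"name": "Sushi C", "style": "sushi", "max_party": 1},
]


def find_restaurants(states: Dict[str, Any]) -> List[str]:
    """Rule attached to the restaurant entity: filter by style and party size."""
    return [
        r["name"]
        for r in RESTAURANTS
        if r["style"] == states["style"] and r["max_party"] >= states["party_size"]
    ]


# The query sets the states of the leaf entities.
style = Entity("style", state="sushi")
party = Entity("party_size", state=2)
restaurant = Entity("restaurant", children=[style, party], rule=find_restaurants)
event = Entity("event", children=[restaurant], rule=lambda s: s["restaurant"])

print(propagate(event))
```

By analogy, a functionalized gene ontology would set leaf states from genotype data and attach rules to term relations, so that the state of a high-level term plays the role of the predicted phenotype.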
Cell-cell interaction networks. We will also develop technology for understanding network structure
above the cellular level. In so-called cell-cell interaction networks, nodes are cells and edges
represent physical or chemical (e.g. hormonal) interactions. Chemical interactions are of greatest
interest as they describe inter-cellular signaling and regulation pathways, which could in the future be
controlled to grow artificial organs, heal tissues and develop novel therapies. Increasing information in
this area will enable network science to contribute to the wider understanding of physiological
systems. We have gained experience and interest in this area via analysis of two novel experimentally
mapped cell-cell interaction networks of the developing human hematopoietic system [32,33] in
collaboration with Peter Zandstra at the University of Toronto (Zandstra is now a DBP). The Zandstra
lab is interested in mapping inter-cellular networks and feedback in regulating stem and progenitor cell
fate for the purposes of growing blood from stem cells, which would be safer than blood donations.
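One simple way to bridge a ligand-receptor network to a cell-cell network is sketched below. The construction rule (draw an edge from a sender to a receiver whenever the sender secretes a ligand whose receptor the receiver expresses) is an assumption made for illustration, as are the function name cell_cell_edges and the placeholder cell types and molecules; it is not the method used in the cited hematopoietic studies.

```python
from typing import Dict, List, Set, Tuple


def cell_cell_edges(
    secreted: Dict[str, Set[str]],
    receptors: Dict[str, Set[str]],
    pairs: List[Tuple[str, str]],
) -> List[Tuple[str, str, str, str]]:
    """Return (sender, receiver, ligand, kind) edges implied by ligand-receptor pairs.

    An edge is labeled 'autocrine' when a cell type signals to itself and
    'paracrine' when it signals to a different cell type.
    """
    edges = []
    for sender in sorted(secreted):
        for receiver in sorted(receptors):
            for ligand, receptor in pairs:
                if ligand in secreted[sender] and receptor in receptors[receiver]:
                    kind = "autocrine" if sender == receiver else "paracrine"
                    edges.append((sender, receiver, ligand, kind))
    return edges


# Placeholder inputs; real inputs would come from expression or proteomic data.
secreted = {"cell_type_1": {"ligand_X"}, "cell_type_2": set()}
receptors = {"cell_type_1": {"receptor_X"}, "cell_type_2": {"receptor_X"}}
pairs = [("ligand_X", "receptor_X")]

for edge in cell_cell_edges(secreted, receptors, pairs):
    print(edge)
```

Keeping the ligand on each edge preserves the link back to the intracellular signaling network, so the same data can be viewed at either scale.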
Cell-cell interaction networks demand new analysis tools that consider their autocrine and paracrine
structure and how they are controlled by intra-cellular molecular networks. Despite the recognized
importance of inter-cellular networks and feedback in regulating multicellular organism development,
the specific cell populations involved and underlying molecular mechanisms are largely undefined. For
example, blood cells are known to secrete and respond to a large number of regulatory proteins in
lineage- and differentiation stage-specific patterns [34,35]. Dynamic mathematical models of cells
patterning into tissues during development have been built [36-38], but they function at the cell
population/tissue level, treat cells as a compartment or spatial gradient, and do not consider actual
cell-cell interactions. Perhaps the best-studied cell-cell interaction network is that of the worm,
Caenorhabditis elegans, which has been completely mapped over organism development by
microscopy. Network analysis by clustering found that interneurons are more densely connected in
the nervous system compared to sensory or motor neurons, leading to the interpretation that these
cells act as central processing units [39]. More recent work predicted cell-cell networks involved in
cancer therapy resistance [40], and found that specific network motifs are enriched in inter-cellular
cytokine-mediated communication networks [41] and that specific components are more important than
others [42]. However, this work has thus far studied small cell-cell network models that were never
experimentally validated. As technology for single cell and stem cell measurement improves, we
expect a growth in the amount of cell-cell network information. We are already observing this growth
in projects such as a new CSP from Laurie Ailles at the University Health Network in Toronto, who is
studying how cancer-associated fibroblasts provide a supportive microenvironment for cancer stem
cells within high-grade serous ovarian cancer and other cancers. New technology she has developed
quantifies the protein levels of 363 cell surface antigens in single cell populations [43].
INNOVATION
Central innovation and hypothesis. The central innovation of this TRD project is a set of ideas and
approaches for transitioning Network Biology from the current status-quo of flat, pairwise, and
descriptive representations of biological interactions, to a future in which the same interaction data
lead to the construction of hierarchical models of biological structure and function. We will explore the
hypothesis that current network representations, which view a dataset of pairwise interactions as a
mathematical graph of nodes and edges, may be “too close” to the raw data to allow for complete or
even accurate biological insight. Models derived from the same interactions, such as gene ontologies
and biological process diagrams, may form a more intuitive result, provided these multi-scale
formulations can avoid the tendency towards over-fitting or over-interpretation.
The most direct representations of data are not always the most desirable for meaningful
interpretation of those data. In x-ray crystallography, the most direct representations of x-ray
diffraction patterns are two-dimensional images [44]. However, when many such images are integrated
and analyzed, exquisite 3D structural models of proteins emerge which, in turn, enable accurate
predictions of protein dynamics and function. Similarly, from many molecular measurements and
interaction data sets the higher order structure and function of the cell might emerge, if only we could
figure out how to assemble these images properly.
Turning networks into ontologies: towards a Network-eXtracted Ontology. Recently we and
others have shown very promising results in the hierarchical analysis of physical and genetic
networks, namely that networks harbor rich structure which is not only modular but also hierarchical and
multi-scale [45-50] (Aim 1 Progress Report). In particular, we have been able to recover ~60% of the
GO Cellular Component hierarchy de novo, directly from physical and genetic network
data gathered in S. cerevisiae and in a manner that is completely independent from the known
structure of GO or from the literature. The resulting Network-eXtracted Ontology, which we call NeXO,
provides a structured hierarchical interpretation of network data which will in most cases be vastly
preferable to flat lists of interactions (a.k.a. interaction ‘hairballs’) or flat lists of network
clusters/complexes. The focus of Aim 1, and an innovative aspect of this proposal, is to explore how
these ontologies can be iteratively updated by a community of biomedical investigators.
3.1 ASSEMBLY AND REFINEMENT OF ONTOLOGY STRUCTURE FROM
BIOLOGICAL NETWORK DATA
Project Leader: Trey Ideker (UCSD)
Overview. Ontologies have been very successful at capturing hierarchical, multi-scale cellular
organization. In the prior period of support we introduced methods for assembling a gene ontology
directly from the hierarchical structure contained in molecular networks and other ‘omics data. We
prototyped a web resource to distribute network-based ontologies (NeXO, nexontology.org), but it is
still at an early stage. In the next support period, we will research methods to iteratively and flexibly
incorporate new experimental results and data into a ‘working’ NeXO ontology, highlighting new terms
and term relations that are created, alongside existing terms/relations that are further supported or
weakened. We will transform nexontology.org to an interactive community resource that enables
investigators not only to browse an existing ontology but to create, share, and iteratively update,
revise, correct, and expand these ontologies. These tools will be built and explored alongside Driving
Biomedical Projects including a Yeast Gene Ontology, a Cancer Gene Ontology, a Viral-Host Gene
Ontology and a hierarchical exploration of social networks. The goal is a means of systematically
incorporating ‘omics data into whole-cell ontological models, with the potential to systematize and
crowd-source an important type of model construction.
Preliminary Results and Progress Report: Proof-of-concept and maturation of a Network-
eXtracted Ontology (NeXO). The previous award supported research by NRNB investigators that led
to creation and prototyping of the first gene ontology inferred from ‘omics data, the NeXO
Resource [28,51] (http://nexontology.org). This work fell naturally under previous TRD-C: Visualization
and Representation of Biological Networks. NeXO provides a methodology whereby physical and
genetic network data can be transformed to assemble a structured ontology of protein complexes.
Using this system, we assembled an ontology based on four large yeast networks capturing current
knowledge of physical protein-protein interactions, genetic interactions (synthetic-lethality and
epistasis), co-expressed genes, as well as an integrated functional network known as YeastNet [52]. The
resulting Network-eXtracted Ontology (NeXO) contains a total of 4,123 terms and 7,804 term-term
relationships (Figure 3). Based on alignment of the systematic NeXO to the literature-curated Gene
Ontology (GO), it appears that NeXO captures ~60% of terms in the Cellular Component branch of
GO. To further validate NeXO vs. GO, we have used both ontologies to perform functional enrichment
of gene sets, the task to which GO is most often applied. In this regard, NeXO performs at least as
well as GO for functional enrichment in several different genome-scale data sets. Thus, the computed
ontology provides functionally relevant terms which cover a wide spectrum of yeast biology to an
extent comparable to manually curated efforts. Since the original proof-of-concept work was published
in early 2013 [28], we have released a visually integrated website for browsing NeXO and GO ontologies
in the style of Google Maps [51]. This summer we published a major improvement to the ontology
inference algorithm [53], which was presented and well received at the Intelligent Systems for Molecular
Biology (ISMB 2014) conference. Progress Report Publications 1-11.
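For readers unfamiliar with functional enrichment, the comparison above scores how strongly a query gene set overlaps the genes assigned to an ontology term. The text does not state which statistic was used; the sketch below assumes the common one-sided hypergeometric test, and the function name and argument names are illustrative.

```python
from math import comb


def enrichment_pvalue(n_genome: int, n_term: int, n_query: int, n_overlap: int) -> float:
    """One-sided hypergeometric p-value for term enrichment.

    Probability of seeing at least n_overlap term genes in a query of size
    n_query drawn without replacement from n_genome genes, of which n_term
    are annotated to the term.
    """
    upper = min(n_term, n_query)
    total = comb(n_genome, n_query)
    tail = sum(
        comb(n_term, k) * comb(n_genome - n_term, n_query - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Toy numbers: 10 genes in total, 5 annotated to the term, and a query of 5
# genes that are all annotated to the term.
print(enrichment_pvalue(10, 5, 5, 5))
```

The same function can be applied unchanged to terms from either ontology, since a term enters only through its gene count and its overlap with the query.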
Methods
Basic inference of ontologies and alignment to a reference. To construct a data-driven ontology, a set
of input features is first gathered for each gene, representing information collected from ‘omics studies
such as its interaction partners in molecular networks, its expression levels over time or conditions, or
other data depending on the DBP. These features are analyzed to generate a pairwise gene-gene
similarity matrix, in which the similarity between two genes reflects their closeness in input features.
Many methods have been proposed for this purpose [54-56]; presently we have been successful with the
technique of random forest regression [57]. The pairwise similarity matrix is then clustered (Figure 4)
using any of several algorithms we have published in prior work [28,53]. For example, our original
method is to use a hierarchical probabilistic model for community detection [50,58], which constructs a
binary tree, or dendrogram, seeking to maximize the overall probability of the network data by
iteratively joining sets of genes with similar patterns of interaction. Gene sets, represented by nodes in
the tree, are suggestive of biological entities or ‘terms’ in an ontology. Joining of two sets, represented
by connecting two nodes beneath a third, suggests specialized terms that are part of a more general
one. The tree is then expanded to allow for creation of terms with multiple (>2) children and/or parents,
which is important for identifying complexes that have many subunits or that participate in multiple parent
processes (this transforms the hierarchical tree into a directed acyclic graph; we do not detail this
method here, but it involves evaluating the probability of the network under the new vs. old structure).
This method yields a novel structure that we call the Network-eXtracted Ontology, or NeXO, in which
genes are organized under a hierarchy of terms and parent-child term relations strongly supported by
the input datasets. At this stage terms simply represent structures detected in data and are given
systematic IDs, much like ORFs detected in a newly-sequenced genome. To annotate these terms
with information from known biology, the NeXO structure is aligned against a reference ontology,
much like ORFs are annotated by alignment against a reference genome whose genes are well annotated.
As in past work, our default reference ontology for this step will be the literature-curated
Gene Ontology. The desired result of aligning NeXO and GO is to identify NeXO terms that
correspond to well-known versus novel structures, as well as GO terms that are well-supported by the
available data. For high confidence matches, the GO annotations are transferred to the NeXO term,
including the term name and description. Terms that are novel (similar to ‘ORFaned’ genes) may
become extremely interesting for further biological exploration and experimental follow-up.
Figure 3. Building the NeXO ontology. The ontology is reduced to a tree, with nodes indicating terms and
edges indicating hierarchical relations between terms, i.e. that one term contains another. Node sizes
indicate the number of genes assigned to a term. Node colors represent the degree of correspondence to a
term in GO as determined by ontology alignment, with high-level alignments labeled. Insets show the
hierarchy identified for the ribosome and actin cytoskeleton.
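The first two inference steps (pairwise gene-gene similarity, then construction of a binary tree whose internal nodes are candidate terms) can be sketched as follows. This sketch deliberately swaps in simpler techniques than those cited: Jaccard similarity of interaction partners instead of random forest regression, and greedy average-linkage merging instead of the hierarchical probabilistic community model. The function names, the TERM:n identifiers, and the toy partner sets are assumptions for illustration.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Similarity of two genes as the overlap of their interaction partners."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_tree(partners: Dict[str, Set[str]]) -> List[Tuple[str, str, str, FrozenSet[str]]]:
    """Greedy average-linkage clustering of genes into a binary tree.

    Returns one (term_id, child_a, child_b, genes) tuple per merge, in merge
    order; each merge is a candidate ontology term with a systematic ID.
    """
    genes = sorted(partners)
    sim = {
        frozenset(pair): jaccard(partners[pair[0]], partners[pair[1]])
        for pair in combinations(genes, 2)
    }
    clusters: Dict[str, FrozenSet[str]] = {g: frozenset([g]) for g in genes}

    def avg_sim(c1: str, c2: str) -> float:
        vals = [sim[frozenset((x, y))] for x in clusters[c1] for y in clusters[c2]]
        return sum(vals) / len(vals)

    terms = []
    while len(clusters) > 1:
        # Join the two clusters whose member genes are most similar on average.
        a, b = max(combinations(sorted(clusters), 2), key=lambda pair: avg_sim(*pair))
        term_id = f"TERM:{len(terms) + 1}"
        merged = clusters.pop(a) | clusters.pop(b)
        clusters[term_id] = merged
        terms.append((term_id, a, b, merged))
    return terms


# Placeholder genes (g1..g4) and interaction partners (p1..p5).
partners = {
    "g1": {"p1", "p2", "p3"},
    "g2": {"p1", "p2", "p3"},
    "g3": {"p3", "p4"},
    "g4": {"p4", "p5"},
}
terms = build_tree(partners)
for term_id, child_a, child_b, members in terms:
    print(term_id, child_a, child_b, sorted(members))
```

The output is strictly binary; the later expansion to terms with multiple children or parents, which turns the tree into a directed acyclic graph, is not covered by this sketch.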
Although methods for ontology alignment have not received much attention in molecular biology or
bioinformatics, they are under active research in the computer science and semantic web
communities. We will implement an ontology alignment algorithm based on a previously proposed
method called ASMOV [59], which was the winning ontology alignment algorithm in the 2010 Ontology
Alignment Evaluation Initiative (om2010.ontologymatching.org/). The method was designed to align
semantic ontologies, and it is based on a score function that measures the lexical similarity of text
labels and comments associated with terms. Hence, we will modify and expand this approach to align
ontologies in which the terms refer to sets of genes (technically, the set of genes assigned to a term
defines the ‘label’ of that term).
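As a minimal sketch of how gene-set ‘labels’ could stand in for lexical labels in an alignment score, the example below scores term pairs by Jaccard overlap and greedily keeps one-to-one matches. This is not ASMOV itself and does not enforce the ancestor-descendant constraints. The function names, the choice of Jaccard similarity, the 0.5 threshold, and the toy gene sets are all illustrative assumptions.

```python
def jaccard(a, b):
    """Jaccard index of two gene sets (0.0 when both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def align_terms(inferred, reference, min_score=0.5):
    """Greedy one-to-one alignment of inferred terms to reference terms.

    inferred, reference: dict mapping term name -> iterable of gene names.
    Returns a dict: inferred term -> (reference term, score).
    """
    scored = [
        (jaccard(genes_i, genes_r), term_i, term_r)
        for term_i, genes_i in inferred.items()
        for term_r, genes_r in reference.items()
    ]
    # Best-scoring pairs first; ties broken by name for determinism.
    scored.sort(key=lambda t: (-t[0], t[1], t[2]))

    mapping, used_reference = {}, set()
    for score, term_i, term_r in scored:
        if score < min_score:
            break  # all remaining pairs score lower still
        if term_i in mapping or term_r in used_reference:
            continue  # enforce unique (one-to-one) mappings
        mapping[term_i] = (term_r, score)
        used_reference.add(term_r)
    return mapping


# Hypothetical toy data, for illustration only.
inferred = {
    "NeXO:1": {"RPL3", "RPL5", "RPS2", "RPS3"},
    "NeXO:2": {"ACT1", "ARP2", "ARP3"},
    "NeXO:3": {"YFG1", "YFG2"},
}
reference = {
    "ribosome": {"RPL3", "RPL5", "RPS2", "RPS3", "RPS9"},
    "Arp2/3 complex": {"ARP2", "ARP3", "ARC15"},
}

alignment = align_terms(inferred, reference)
# Inferred terms with no confident match are candidates for novel structures.
novel_terms = sorted(set(inferred) - set(alignment))
```

In this toy case the first two inferred terms would inherit reference names, while the unmatched third term is flagged as potentially novel.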
Application of current and new procedures for data-driven ontologies to Driving Projects. We will begin
work immediately to construct and/or revise data-driven ontologies with each of our Driving
Biomedical Projects, an activity that is expected to continue for most of the next five-year performance
period. The projects are:
1. Creating new terms and term relations in the Gene Ontology. Our previous efforts to infer gene
ontologies from network data were initially carried out as a Collaboration and Service Project (CSP)
with Mike Cherry, Professor of Genetics at Stanford and head of the Gene Ontology Consortium for
the Saccharomyces model organism. Together with Cherry, we will continually apply tools developed
in this TRD to revise and expand the yeast NeXO based on new data, and to communicate the most
promising new terms and term relations it identifies to the Saccharomyces GO.
2. Elucidating the hierarchy of modules in the virus-human protein interaction network. Dr. Nevan
Krogan at UCSF is a world leader in generating large-scale maps of protein complexes based on
affinity purification mass spectrometry as well as in systems for synthetic lethal genetic interaction
screening. NRNB and Krogan have a long-standing relationship in developing physical and genetic
interaction maps of biological systems of interest [11,12,18,60-62], including the original NeXO paper [28]. We
expect this productive relationship to continue as we develop tools for data-driven assembly and
refinement of gene ontologies within this TRD, initially as applied to physical and genetic interactions
of viral protein subunits with proteins encoded by the human host.
Figure 4. Automated assembly and alignment of gene ontologies. (A) Probabilistic community detection
within the input networks yields a binary tree in which nodes correspond to ontology terms and links
correspond to parent-child term relations. Unsupported terms are replaced by multi-way joins, and additional
parent-child relations are added based on network data. The resulting ontology is aligned against the Gene
Ontology, in a way that (B) prohibits non-unique mappings and ancestor-descendant criss-crossing.

3. Gene ontology inference based on binding-site-resolved ‘edgetic’ protein networks. Drs. Marc Vidal
and David Hill are pioneers in protein interaction mapping via the yeast-two-hybrid system. Recently
they developed the capability to map interactions at binding site resolution, by using modular protein
domains as baits combined with phage display knowledge of the preferred binding motif of each
domain. We will together explore whether this binding interface information can be used to inform the
inferred gene ontology structures we are building in this TRD.
4. Hierarchical analysis of cancer subtypes with TCGA / ICGC and Sage Bionetworks. Cancer
genomics projects are generating large cancer-specific ‘omics data sets. Therefore, natural DBPs for
this project are provided by The Cancer Genome Atlas, the International Cancer Genome
Consortium, and Sage Bionetworks, all of which are associated with major cancer genomics projects
nationally and internationally. Our focus will be to construct a Cancer Gene Ontology based on a pan-
cancer analysis of data from all ~20 major TCGA tissue types. Such a Cancer GO would provide
insight into the hierarchy of biological processes and cellular components that is somatically mutated
or differentially activated during cancer progression.
5. Understanding the multi-scale hierarchy of social interactions. We will work with UCSD Professor
James Fowler, a renowned social networks researcher, to apply the hierarchical methods developed
in this aim to analyze the structure of a large social network generated from the Framingham Heart
Study. This study has surveyed health behaviors, disease outcomes, and social relationships among
>12,000 people for over 37 years [25-27].
During these collaborations, we will experiment with ontologies constructed with different sources and
types of data, e.g. using genetic interactions only versus those that also include physical interactions
and other types. Such exploration is needed to evaluate which interaction types are most revealing of
cellular componentry such as protein complexes and larger macro-molecular structures, and how to
weight genetic versus physical interactions for this purpose. We will seek to determine how much
interaction data one needs to construct a robust ontology for each of the DBP datasets, e.g., one
which is able to faithfully recover a substantial fraction of knowledge in the manually-curated GO. At
present, what we know is that this is possible using an integrated network including all genetic and
physical interactions that have been mapped to-date for budding yeast.
Development of iterative procedures for incorporating new data into a data-driven ontology. We will
conduct a major program of exploratory research and development on approaches by which data-
driven gene ontologies such as NeXO can evolve over time, by incorporating new datasets as they
are generated and published. We will begin by evaluating a relatively straightforward approach, which
is to integrate the new dataset(s) into the pairwise gene similarity matrix which forms the input to the
ontology inference method (see above). Once the similarities have been adjusted, an ‘updated’
ontology is constructed based on the old+new data and aligned against the ‘previous’ ontology based
on old data only. Similar to alignment against GO (see above), the desired result is to identify terms
and term relations in the updated ontology that are newly-created as well as previous terms / relations
that are reinforced by the new data. Ultimately one might also imagine downgrading or retiring terms
that have remained unsupported over many diverse dataset updates, but this is admittedly a more
delicate proposition than adding new terms. A limitation of this simple update approach is that the
complete ontology must be reconstituted each time a new data set is evaluated. An alternative and
potentially more efficient approach is to modify the previous ontology directly using information from
the new data set. We will explore both the simple approach and these more advanced approaches in
the course of research.
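The simple update approach can be sketched as two steps: blend the new data into the pairwise similarity table, then compare the terms of the rebuilt ontology with those of the previous one. The sketch below is illustrative only. The weighted average, the `weight_new` parameter, and the exact gene-set matching used in place of a full ontology alignment are assumptions, and the ontology inference step itself is not shown.

```python
def update_similarity(old, new, weight_new=0.5):
    """Blend two pairwise gene-similarity tables.

    old, new: dict mapping frozenset({gene_a, gene_b}) -> similarity in [0, 1].
    Pairs present in only one table keep that table's value.
    """
    if not 0.0 <= weight_new <= 1.0:
        raise ValueError("weight_new must be in [0, 1]")
    merged = {}
    for pair in old.keys() | new.keys():
        if pair in old and pair in new:
            merged[pair] = (1 - weight_new) * old[pair] + weight_new * new[pair]
        else:
            merged[pair] = old.get(pair, new.get(pair))
    return merged


def classify_terms(previous_terms, updated_terms):
    """Compare term gene sets before and after an update.

    Each argument is a collection of gene sets, one per ontology term.
    Exact gene-set identity is the simplest possible stand-in for alignment.
    """
    previous = {frozenset(t) for t in previous_terms}
    updated = {frozenset(t) for t in updated_terms}
    return {
        "reinforced": previous & updated,   # present before and after
        "new": updated - previous,          # created by the new data
        "unsupported": previous - updated,  # not recovered after the update
    }


# Hypothetical toy similarities and terms.
old = {frozenset({"A", "B"}): 0.8, frozenset({"B", "C"}): 0.2}
new = {frozenset({"B", "C"}): 0.6, frozenset({"C", "D"}): 0.9}
merged = update_similarity(old, new)

status = classify_terms(
    previous_terms=[{"A", "B"}, {"A", "B", "C"}],
    updated_terms=[{"A", "B"}, {"C", "D"}],
)
```

Terms that stay in the "unsupported" group across many updates would be the candidates for the downgrading or retirement discussed above.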
Given an update procedure, the experimentalist may wish to design further studies aimed at the new
terms. These specially directed new data could then spawn another ontology update, enabling the
exciting possibility of continued iteration between improving the ontology (aka the biological model)
and the experimental data generation phases of a study.

An online system for distribution and community construction of data-driven ontologies. Ontology
models developed with our DBPs will be made available to the scientific community via query from the
stand-alone NeXO website, nexontology.org, as well as through a specialized App for Cytoscape. We
will also prototype a web-based system whereby a unified and common ‘Crowd-Sourced NeXO
Ontology’ can be iteratively updated from biological data sets uploaded by investigators from the
biomedical research community at large. Achieving this vision will require the addition of major
features to nexontology.org, including user accounts, data upload, and a cloud-based implementation
of ontology inference. If successful, we will seek to transition the new website to independent funding
to support what could ultimately become a large community of users. The allure of such a system is
that the wealth of ‘omics data being generated every year could be analyzed to assemble different
types of gene ontology systematically, with less and less reliance on back curation of the literature.
Ultimately, the desired outcome is to enable a shift from using ontologies to evaluate data to using
data to construct and evaluate ontologies—that is, from a regime in which the ontology is viewed as
gold standard to one in which it is the major result.
3.2 FUNCTIONALIZED GENE ONTOLOGIES AS A HIERARCHY OF
PHENOTYPIC PREDICTION
Project Leader: Trey Ideker (UCSD)
Overview. Whether based on expert knowledge (GO) or inferred from data (NeXO in Aim 1), current
gene ontologies are static descriptions of cellular structure and organization. They enable
representing and reasoning on the structural relationships among biological entities [27,29] but lack any
native capacity to capture dynamic functional states or make phenotypic predictions. However, since
gene ontologies inherently represent multi-scale hierarchy in cellular organization, they provide in
theory an ideal substrate for building models that would also be predictive of a range of cellular
functions and phenotypes. In this respect, question-answering systems developed in the field of
knowledge representation and reasoning [26], such as Apple’s Siri and IBM’s Watson, provide an
excellent example of what a predictive, or ‘executable’, ontology looks like. In this aim, we will explore
whether such systems can teach us how to develop predictive systems for cell biology [31]. This aim
intersects with the separate TRD project on Predictive Networks and serves as a bridge between the
two TRDs.
Preliminary Results and Progress Report: Activating static networks as predictive models. The
Ideker laboratory has over the years introduced a progression of approaches that seek to use
molecular network information to guide the prediction of phenotypic outcomes such as disease state
or drug response. Relevant works include ActiveModules [63], Network-Based Classification [64],
Network-Guided Forests [65], Network-Based Stratification [66], and several influential reviews on using
networks predictively [67,68]. The more recent works (2011 to present) were supported by the past period of NRNB
funding. Generally, our methodology has been to identify subnetworks of genes whose expression
levels (molecular profile) or mutation states (genotype) can be functionally combined to predict
disease outcome (phenotype or class). For example, Network-Guided Forests is a classification
method that associates subnetworks of genes with decision trees that evaluate the expression levels
of those genes to predict sample class. Such approaches have shown success in classification of
metastatic vs. non-metastatic breast cancer [64] and aggressive vs. indolent leukemia [69], as well as
classification of cell fate decisions during development [16,65]. We have found repeatedly that, unlike the
gene sets identified by regular classifiers, the subnetworks identified by network-based methods are
highly enriched for causal factors of disease, and they show very consistent performance across
different sample datasets. Progress Report Publications 12-17.
Methods
Taking clues from Siri: propagation of state on predictive ontologies. We will explore use of the
structure of ontologies, rather than the structure of networks, in making phenotypic predictions. The
key distinction is that networks are concerned mainly with pairwise associations between genes,
whereas ontologies represent hierarchical relations across a range of biological modules at various
scales including genes and proteins, protein complexes, pathways and processes, and organelles.
Question-answering systems such as Apple’s Siri provide a useful model of how hierarchical
relations in an ontology can propagate state information. Unlike current gene ontologies which are
descriptive, Siri’s ontologies are coupled with dynamic reasoning systems that render them active:
“Whereas a conventional ontology is a formal representation of domain knowledge with distinct
concepts and relations among concepts, an Active Ontology is a processing formalism where distinct
processing elements are arranged according to ontology notions; it is an execution environment” [30].
These active ontologies not only encode entities and relations, but entities are associated with states
and relations are associated with rule sets that perform actions within and among entities. During
execution, input states are incrementally propagated up and down the hierarchy to impact other
entities, whose states provide the answer to the initial question – the best prediction based on the
inputs. How the ontologies within Siri are used to answer questions, however, is very different from
how GO is used today in bioinformatics. Typically, GO terms are associated with a set of genes
(annotations), but not with dynamic states; the relationships between GO terms are not associated
with rule sets that perform actions, at least beyond propagation of gene set annotations. Given this
contrast, we will explore construction of such an ‘active’ gene ontology as a general engine for
genotype-phenotype translation.
Genotype-to-phenotype prediction challenges from Driving Biological Projects. We will base our
methods development on data and prediction challenges motivated by DBPs in yeast (Cherry DBP)
and cancer (TCGA DBP). Yeast has by far the largest number of genotype-phenotype measurements
of any organism: most single and double gene knockout strains have been constructed and assayed
for growth, yielding over 10 million ‘simple’ genotypes systematically tested for the same phenotype [70-72].
In addition, hundreds of natural yeast genetic isolates have been fully sequenced and extensively
phenotyped, providing examples of complex genotype backgrounds [73]. In cancer, TCGA currently has
tumor exomes available for over 8000 cancer patients (genotypes), along with clinical information
such as survival time, tumor grade, and in some cases drug response (phenotypes). In both yeast and
cancer, the goal is to predict the phenotype of growth, survival, etc. given the genotype of a strain or
patient.
Transformation of genotype to ‘ontotype’. The genotype indicates the set of mutation states of all
genes, which for each gene might be represented simply as {mutated, wildtype} or {loss-of-function,
wild-type, gain-of-function} before considering more precise values. We will prototype propagation
approaches by which these states on genes can be integrated with a gene ontology to infer
corresponding states on terms. For example, since the gene SWI4 encodes a subunit of the SBF
complex, the yeast swi4Δ genotype {Swi4 <= loss-of-function} might propagate upwards in the
ontology to set the state of the parent term {SBF transcription complex <= loss-of-function}, and
continue to propagate upwards to affect ancestor terms at higher scales such as ‘RNA pol II
transcription factor complex’ and ultimately ‘nucleus’ and ‘cell’. We call the set of mutation states of all
terms the ‘ontotype.’
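The swi4Δ example can be sketched with a small upward propagation over a parent map. The sketch below assumes the simplest possible rule (any altered gene marks all of its ancestor terms with the same state). The `propagate_ontotype` function and the abbreviated hierarchy are hypothetical illustrations rather than the propagation approach to be developed.

```python
from collections import deque


def propagate_ontotype(parents, genotype):
    """Propagate gene mutation states upward through an ontology DAG.

    parents: dict mapping each gene or term -> list of parent terms.
    genotype: dict mapping gene -> state, e.g. "loss-of-function".
    Returns dict term -> set of states inherited from altered descendant genes.
    """
    ontotype = {}
    for gene, state in genotype.items():
        if state == "wild-type":
            continue  # wild-type genes contribute nothing in this simple rule
        queue = deque(parents.get(gene, []))
        seen = set()
        while queue:
            term = queue.popleft()
            if term in seen:
                continue
            seen.add(term)
            ontotype.setdefault(term, set()).add(state)
            queue.extend(parents.get(term, []))
    return ontotype


# Abbreviated toy hierarchy based on the SWI4 example in the text.
parents = {
    "SWI4": ["SBF transcription complex"],
    "SWI6": ["SBF transcription complex"],
    "SBF transcription complex": ["RNA pol II transcription factor complex"],
    "RNA pol II transcription factor complex": ["nucleus"],
    "nucleus": ["cell"],
}
genotype = {"SWI4": "loss-of-function", "SWI6": "wild-type"}
ontotype = propagate_ontotype(parents, genotype)
```

Marking every ancestor with the same state is deliberately naive. Attenuating or combining states at higher scales is one of the open questions for this aim.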
For prediction problems, the ontotype and genotype can then be used together or separately as a set
of features for classification of a phenotypic class, e.g. {alive, dead}, or regression against a
quantitative phenotype, e.g. numerical growth rate or progression-free time interval. Alternatively, the
state of any particular term, representing a cellular component or process, can itself be considered as
the phenotype of interest. Predictions will be benchmarked using metrics such as ROC and PR curves
along with standard statistical techniques such as cross-validation or bootstrapping.
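As one concrete benchmarking metric, the area under the ROC curve can be computed directly from predicted scores through the rank-sum identity. The sketch below is illustrative only. The labels and scores are hypothetical (for example, a per-strain count of altered ontology terms), and the pairwise loop is written for clarity rather than speed.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    labels: iterable of 0/1 class labels.
    scores: predicted scores, where higher means more likely to be class 1.
    Tied positive/negative pairs count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical data: 1 = growth defect, 0 = normal growth;
# score = number of ontology terms in an altered state for that strain.
labels = [1, 1, 0, 0, 1, 0]
scores = [3, 2, 2, 0, 4, 1]
auc = roc_auc(labels, scores)
```

In a cross-validation setting, the same function would be applied to the held-out fold only, so that the reported AUC reflects predictions on samples not used for fitting.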
Open questions and milestones. A major research question will be to determine how to dynamically
compute the states of ontology terms based on the states of their children, parents, descendants, and
ancestors. The underlying mathematical function could take many forms, including logic gates such as
AND / OR, linear or additive functions, probabilistic functions, or polynomial or logistic equations. How
to determine the specific forms and parameters of these functions, regardless of what form they take,
is also unclear. This step could happen by statistical association from many input-output examples
using machine learning methods, by including externally generated biological knowledge specific to
each entity, or by manual curation from literature. As this aim is quite exploratory, we do not include
specific algorithmic plans or mathematical details here. Some important milestones for success,
however, will be (1) a proof-of-principle bioinformatic method for propagating molecular profiles on a
gene ontology to predict a phenotypic outcome, and (2) implementation of this method in a robust
software tool as a Cytoscape App.
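To make the candidate functional forms concrete, the sketch below computes a parent term's state from its children's states under four of the options named above. It is an illustration under stated assumptions: states are encoded as numbers in [0, 1] with 1.0 meaning altered, and the 0.5 threshold, weights, and bias are arbitrary placeholders rather than fitted parameters.

```python
import math


def or_gate(children, threshold=0.5):
    """1.0 if any child state is at or above the threshold."""
    return 1.0 if any(s >= threshold for s in children) else 0.0


def and_gate(children, threshold=0.5):
    """1.0 only if there is at least one child and all reach the threshold."""
    return 1.0 if children and all(s >= threshold for s in children) else 0.0


def additive(children):
    """Mean of the child states (0.0 for a term with no children)."""
    return sum(children) / len(children) if children else 0.0


def logistic(children, weights, bias=0.0):
    """Logistic function of a weighted sum of the child states."""
    z = bias + sum(w * s for w, s in zip(weights, children))
    return 1.0 / (1.0 + math.exp(-z))


children = [1.0, 0.0, 0.0]  # one of three children is in the altered state
results = {
    "or": or_gate(children),
    "and": and_gate(children),
    "additive": additive(children),
    "logistic": logistic(children, weights=[2.0, 2.0, 2.0], bias=-1.0),
}
```

The same input yields quite different parent states depending on the form chosen, which is why selecting and parameterizing these functions is treated as a research question here.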
3.3 BRIDGING LIGAND-RECEPTOR NETWORKS TO CELL-CELL
COMMUNICATION NETWORKS
Project Leader: Gary Bader (University of Toronto)
Overview. Cell-cell interaction networks are an emerging area of network science. In collaboration
with the Zandstra DBP, which is mapping cell-cell interaction networks in the hematopoietic system to
help engineer blood tissue, we will develop novel technology for cell network analysis. We will develop
methods to infer cell-cell interaction networks from molecular profiling data of purified cell populations,
cell-cell interaction network topology analysis software, methods to identify intracellular pathways that
control cell-cell interactions and methods to visualize multi-scale models of inter-cellular
communication networks and their intracellular signaling systems.
Preliminary Results and Progress Report. In the past funding period, we worked with the Zandstra
lab to prototype cell-cell interaction network inference methods and their analysis. Two papers were
published in Molecular Systems Biology that experimentally mapped novel cell-cell interaction
networks for the purpose of identifying growth and inhibitory factors that modulate self-renewal, which
is useful for blood stem cell control. The second paper included network topology analysis and
discovered that ligand production is cell type dependent, whereas ligand binding is promiscuous.
Consequently, additional control strategies such as cell frequency modulation and
compartmentalization were needed to achieve specificity in HSC fate regulation. These proof-of-
concept methods now need to be further developed to extend and streamline their use, as described
below. Progress Report Publications 20,118.
Methods
Cell-cell interaction network inference from single cell population molecular profiles. Cell-cell
interaction networks are currently mapped by inferring regulatory relationships based on the
expression of transmitters and receptors at the cell surface. For instance, if cell type A expresses the
epidermal growth factor peptide hormone and cell type B expresses the epidermal growth factor
receptor protein, and there is a means to transmit the hormone to the target receptor (e.g. by diffusion
within a tissue or in the blood stream), then a directional edge is inferred from cell type A to cell type
B. This process depends on the availability of relatively pure cell populations and the ability to measure
the expression of their secreted and surface proteins, both of which are practical with current
technology [43,74,75]. We will develop technology to automatically process mRNA and protein expression
profiles from cell populations into cell-cell interaction networks using the following steps:
1. Identify all known ligands and receptors based on known gene function annotation. For
instance, using gene ontology terms “cytokine activity,” “growth factor activity,” “hormone
activity,” and “receptor activity,” genes with ligand or receptor activity will be compiled from the
Ensembl BioMart web service [76].
2. Collect all known protein interactions between ligands and receptors (e.g. from iRefIndex [77],
GeneMANIA [78], Pathway Commons [79], and related comprehensive interaction resources). We
have previously literature-curated ~270 ligand-receptor pairs not currently in standard
databases, and these will also be included [32,33].
3. Compile a list of expressed ligands and receptors from each available cell type population,
based on available gene or protein expression data [43,74,75]. We will prefer protein expression
information, but will use mRNA expression levels as a proxy when protein levels are not available
(with appropriate caveats).
4. Infer directed regulatory edges between expressed ligand and receptor pairs.
5. Visualize the resulting cell-cell interaction network.
Preliminary work successfully used this approach, but we will develop it into a generally applicable
technology that can be updated conveniently and automatically. Our initial focus will be on available
human data, but the technology will be applicable to any organism with enough information available.
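Step 4 of the procedure above can be sketched as a join between the ligand-receptor pair list and the per-cell-type expression lists. The sketch below is illustrative. The `infer_cell_cell_edges` function, the cell type names, and the expression sets are hypothetical, and a real implementation would also need expression thresholds and a check that the ligand can physically reach the target cell.

```python
def infer_cell_cell_edges(expressed, ligand_receptor_pairs):
    """Infer directed cell-cell edges from expressed ligands and receptors.

    expressed: dict mapping cell type -> set of expressed gene symbols.
    ligand_receptor_pairs: iterable of (ligand, receptor) gene-symbol pairs.
    Returns a sorted list of (source cell, target cell, ligand, receptor).
    An edge with source == target represents potential autocrine signaling.
    """
    edges = []
    for ligand, receptor in ligand_receptor_pairs:
        sources = [c for c, genes in expressed.items() if ligand in genes]
        targets = [c for c, genes in expressed.items() if receptor in genes]
        for src in sources:
            for dst in targets:
                edges.append((src, dst, ligand, receptor))
    return sorted(edges)


# Hypothetical toy input.
pairs = [("EGF", "EGFR"), ("KITLG", "KIT")]
expressed = {
    "cell type A": {"EGF", "KIT"},
    "cell type B": {"EGFR"},
    "cell type C": {"KITLG", "EGFR"},
}
edges = infer_cell_cell_edges(expressed, pairs)
```

Keeping the ligand and receptor on each edge, rather than collapsing to cell-type pairs, preserves the information needed later to link each edge to intracellular pathways.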
Discovery of key players and rules of cell-cell interaction networks. We will develop technology to
make it easy for biologists to computationally analyze the topological properties of cell-cell interaction
networks to help identify key control points and general organizational principles. We will use multiple
established measures of node importance in networks (centrality measures), including hub detection
(find highly connected nodes that when removed cause the network to split into parts [80]) and
betweenness centrality (find important connection points between different network regions [81]). This
analysis will be accomplished using the CytoHubba, CentiScaPe and/or NetMatch network analyzer
Cytoscape apps, which we will tailor to function on directed cell-cell interaction networks. In particular,
selected network analysis functions in these apps will be published as Cytoscape commands so they
can be made available in a cell-cell interaction network analysis app that we will develop.
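For readers unfamiliar with these measures, the sketch below computes total degree (a simple proxy for hubs) and unnormalized betweenness centrality on a small directed graph, using Brandes' algorithm. It is a standalone illustration and does not reflect the interfaces of the Cytoscape apps named above. The function names and the four-node toy graph are assumptions.

```python
from collections import deque


def all_nodes(graph):
    """Every node that appears as a source or a target."""
    return set(graph) | {v for succ in graph.values() for v in succ}


def total_degree(graph):
    """In-degree plus out-degree of each node; high values suggest hubs."""
    degree = dict.fromkeys(all_nodes(graph), 0)
    for u, succ in graph.items():
        for v in succ:
            degree[u] += 1
            degree[v] += 1
    return degree


def betweenness(graph):
    """Unnormalized betweenness for a directed, unweighted graph (Brandes).

    graph: dict mapping node -> list of successor nodes.
    """
    nodes = all_nodes(graph)
    score = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack = []
        preds = {v: [] for v in nodes}
        sigma = dict.fromkeys(nodes, 0)
        dist = dict.fromkeys(nodes, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph.get(v, []):
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate dependencies in order of decreasing distance.
        delta = dict.fromkeys(nodes, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score


# Toy directed network: A signals to B, which signals to C and D.
graph = {"A": ["B"], "B": ["C", "D"]}
degree = total_degree(graph)
centrality = betweenness(graph)
```

Here node B lies on both shortest paths that pass through an intermediate (A to C and A to D), so it receives the only nonzero betweenness score.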
Identify intracellular pathways that control and are controlled by cell-cell interactions. We will develop
novel computational methods to explain how signals observed to occur between cells are controlled
by and control internal molecular networks and pathways. First, we will gather an intracellular network
of physical molecular and control interactions between all identified receptors and secreted chemical
signal genes from available molecular interaction and pathway databases (e.g. iRefIndex,
GeneMANIA, Pathway Commons). We will then use established path finding algorithms (e.g. as
implemented in Cytoscape apps such as PathExplorer and in the Pathway Commons web service
system) to identify potential signaling pathways that control chemical signal secretion, and links from
activated receptors to activation of pathways in target cells. Paths will be limited to genes expressed
in the given cell population. To identify pathways that are controlled by a given cell-cell
communication path, we will apply pathway enrichment analysis to downstream molecules in target
cells. Thus, we will predict how inter-cellular signaling impinges on intracellular systems, which in turn
could impinge on additional cell-cell signaling paths. We will also use the Pathway Extraction and
Reduction Algorithm (PERA) method described in TRD1 to identify signaling systems involving cell-
cell communication factors.
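The restriction of paths to expressed genes can be illustrated with a breadth-first search that skips unexpressed nodes. This is a minimal sketch and not the algorithm used by PathExplorer, Pathway Commons, or PERA. The `shortest_path` function, the placeholder node names, and the toy network are hypothetical.

```python
from collections import deque


def shortest_path(network, source, target, expressed):
    """Shortest directed path from source to target using only expressed genes.

    network: dict mapping gene -> list of downstream genes.
    expressed: set of genes expressed in the cell population of interest.
    Returns the path as a list of genes, or None if no such path exists.
    """
    if source not in expressed or target not in expressed:
        return None
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in network.get(node, []):
            if nxt in expressed and nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None


# Placeholder network: the shorter route runs through an unexpressed adaptor.
network = {
    "RECEPTOR": ["ADAPTOR1", "ADAPTOR2"],
    "ADAPTOR1": ["TF"],
    "ADAPTOR2": ["KINASE"],
    "KINASE": ["TF"],
}
expressed = {"RECEPTOR", "ADAPTOR2", "KINASE", "TF"}
path = shortest_path(network, "RECEPTOR", "TF", expressed)
```

Because ADAPTOR1 is not expressed in this hypothetical population, the search returns the longer route through ADAPTOR2 and KINASE, which is the behavior the expression filter is meant to produce.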
Multi-scale visualization of cell-cell interaction networks in the context of internal molecular networks.
We will develop novel multi-scale network visualization methods to help interpret networks generated
in this aim. In particular, we will group ligand and receptor families (using Cytoscape’s grouping
function) to reduce complexity of the resulting network, based on family information in the Gene
Ontology. We will also develop methods to display intracellular molecular paths, where nodes
represent genes, within nodes representing cells. These paths will also connect to intracellular nodes
representing pathways to visualize which pathways are activated by specific cell-cell communication
signals.
Links with other TRDs. As the active collection of molecular profiles for secreted and receptor protein
expression grows, we expect data sets to become available that cover multiple time points and
samples (e.g. disease patients and healthy controls). Thus, we will develop multi-scale cell-cell
interaction networks across conditions and use technology developed in the Differential Networks
TRD to compare them. We will also explore how patient specific versions of these networks can be
used as predictive features in work described in the Predictive Networks TRD.

TRD 3: MULTI-SCALE NETWORKS –
BIBLIOGRAPHY AND REFERENCES CITED
1. Mathews, C.K. The Cell-Bag of Enzymes or Network of Channels? J Bacteriol 175, 6377-81
(1993).
2. Reddy, G.P., Singh, A., Stafford, M.E. & Mathews, C.K. Enzyme Associations in T4 Phage
DNA Precursor Synthesis. Proc Natl Acad Sci U S A 74, 3152-6 (1977).
3. Monod, J., Changeux, J.P. & Jacob, F. Allosteric Proteins and Cellular Control Systems.
Journal of Molecular Biology 6, 306-& (1963).
4. Kauffman, S.A. The Origins of Order : Self-Organization and Selection in Evolution, xviii, 709
p. (Oxford University Press, New York, 1993).
5. Pollack, G.H. Cells, Gels and the Engines of Life : A New, Unifying Approach to Cell Function,
xiv, 305 p. (Ebner & Sons, Seattle, WA, 2001).
6. Bray, D. Wetware : A Computer in Every Living Cell, xii, 267 p. (Yale University Press, New
Haven ; London, 2009).
7. Barabasi, A.L. & Oltvai, Z.N. Network Biology: Understanding the Cell's Functional
Organization. Nat Rev Genet 5, 101-13 (2004).
8. Mitra, K., Carvunis, A.R., Ramesh, S.K. & Ideker, T. Integrative Approaches for Finding
Modular Structure in Biological Networks. Nat Rev Genet 14, 719-32 (2013).
9. Ashburner, M. et al. Gene Ontology: Tool for the Unification of Biology. The Gene Ontology
Consortium. Nat Genet 25, 25-9 (2000).
10. Novarino, G. et al. Exome Sequencing Links Corticospinal Motor Neuron Disease to Common
Neurodegenerative Disorders. Science 343, 506-11 (2014).
11. Bandyopadhyay, S. et al. Rewiring of Genetic Networks in Response to DNA Damage.
Science 330, 1385-9 (2010).
12. Roguev, A. et al. Conservation and Rewiring of Functional Modules Revealed by an Epistasis
Map in Fission Yeast. Science 322, 405-10 (2008).
13. Workman, C.T. et al. A Systems Approach to Mapping DNA Damage Response Pathways.
Science 312, 1054-9 (2006).
14. Konig, R. et al. Human Host Factors Required for Influenza Virus Replication. Nature 463,
813-7 (2010).
15. Suthram, S., Sittler, T. & Ideker, T. The Plasmodium Protein Network Diverges from Those of
Other Eukaryotes. Nature 438, 108-12 (2005).
16. Ravasi, T. et al. An Atlas of Combinatorial Transcriptional Regulation in Mouse and Man. Cell
140, 744-52 (2010).
17. Bandyopadhyay, S. et al. A Human MAP Kinase Interactome. Nat Methods 7, 801-5 (2010).
18. Guenole, A. et al. Dissection of DNA Damage Responses Using Multiconditional Genetic
Interaction Maps. Mol Cell 49, 346-58 (2013).
19. Begley, T.J., Rosenbach, A.S., Ideker, T. & Samson, L.D. Hot Spots for Modulating Toxicity
Identified by Genomic Phenotyping and Localization Mapping. Mol Cell 16, 117-25 (2004).
20. Srivas, R. et al. A UV-Induced Genetic Network Links the RSC Complex to Nucleotide Excision
Repair and Shows Dose-Dependent Rewiring. Cell Rep 5, 1714-24 (2013).
21. Jaehnig, E.J., Kuo, D., Hombauer, H., Ideker, T.G. & Kolodner, R.D. Checkpoint Kinases
Regulate a Global Network of Transcription Factors in Response to DNA Damage. Cell Rep 4,
174-88 (2013).
22. Chuang, H.Y., Hofree, M. & Ideker, T. A Decade of Systems Biology. Annu Rev Cell Dev Biol
26, 721-44 (2010).
23. Walhout, A.J.M., Vidal, M. & Dekker, J. Handbook of Systems Biology : Concepts and Insights,
xiii, 538 p. (Waltham Academic Press, London ;, 2013).
24. Koller, D. & Friedman, N. Probabilistic Graphical Models : Principles and Techniques, xxi,
1231 p. (MIT Press, Cambridge, MA, 2009).
25. Gruber, T.R. Toward Principles for the Design of Ontologies Used for Knowledge Sharing.
International Journal of Human-Computer Studies 43, 907-928 (1995).

26. Brachman, R.J. & Levesque, H.J. Knowledge Representation and Reasoning, xxix, 381 p.
(Morgan Kaufmann, Amsterdam ; Boston, 2004).
27. Robinson, P.N. & Bauer, S. Introduction to Bio-Ontologies, xxvii, 488 p. (Taylor & Francis,
Boca Raton, 2011).
28. Dutkowski, J. et al. A Gene Ontology Inferred from Molecular Networks. Nat Biotechnol 31, 38-
45 (2013).
29. Myhre, S., Tveit, H., Mollestad, T. & Laegreid, A. Additional Gene Ontology Structure for
Improved Biological Reasoning. Bioinformatics 22, 2020-7 (2006).
30. Guzzoni, D., Baur, C. & Cheyer, A. Active: A Unified Platform for Building Intelligent Web
Interaction Assistants. 2006 IEEE/WIC/ACM International Conference on Web Intelligence and
Intelligent Agent Technology, Workshops Proceedings, 417-420 (2006).
31. Wren, J.D. Question Answering Systems in Biology and Medicine--the Time Is Now.
Bioinformatics 27, 2025-6 (2011).
32. Qiao, W. et al. Intercellular Network Structure and Regulatory Motifs in the Human
Hematopoietic System. Mol Syst Biol 10, 741 (2014).
33. Kirouac, D.C. et al. Dynamic Interaction Networks in a Hierarchically Organized Tissue. Mol
Syst Biol 6, 417 (2010).
34. Billia, F., Barbara, M., McEwen, J., Trevisan, M. & Iscove, N.N. Resolution of Pluripotential
Intermediates in Murine Hematopoietic Differentiation by Global Complementary DNA
Amplification from Single Cells: Confirmation of Assignments by Expression Profiling of
Cytokine Receptor Transcripts. Blood 97, 2257-68 (2001).
35. Majka, M. et al. Numerous Growth Factors, Cytokines, and Chemokines Are Secreted by
Human CD34(+) Cells, Myeloblasts, Erythroblasts, and Megakaryoblasts and Regulate Normal
Hematopoiesis in an Autocrine/Paracrine Manner. Blood 97, 3075-85 (2001).
36. von Dassow, G., Meir, E., Munro, E.M. & Odell, G.M. The Segment Polarity Network Is a
Robust Developmental Module. Nature 406, 188-92 (2000).
37. Kondo, S. Cell-Cell Interaction Network That Generates the Skin Pattern of Animal. Genome
Inform 16, 287-91 (2005).
38. De Matteis, G., Graudenzi, A. & Antoniotti, M. A Review of Spatial Computational Models for
Multi-Cellular Systems, with Regard to Intestinal Crypts and Colorectal Cancer Development.
J Math Biol 66, 1409-62 (2013).
39. Eckmann, J.P. & Moses, E. Curvature of Co-Links Uncovers Hidden Thematic Layers in the
World Wide Web. Proc Natl Acad Sci U S A 99, 5825-9 (2002).
40. Komurov, K. Modeling Community-Wide Molecular Networks of Multicellular Systems.
Bioinformatics 28, 694-700 (2012).
41. Frankenstein, Z., Alon, U. & Cohen, I.R. The Immune-Body Cytokine Network Defines a Social
Architecture of Cell Interactions. Biol Direct 1, 32 (2006).
42. Tieri, P. et al. Quantifying the Relevance of Different Mediators in the Human Immune Cell
Network. Bioinformatics 21, 1639-43 (2005).
43. Gedye, C.A. et al. Cell Surface Profiling Using High-Throughput Flow Cytometry: A Platform
for Biomarker Discovery and Analysis of Cellular Heterogeneity. PLoS ONE 9, e105602
(2014).
44. McPherson, A. Introduction to Macromolecular Crystallography, x, 267 p. (Wiley-Blackwell,
Hoboken, N.J., 2009).
45. Ravasz, E., Somera, A.L., Mongru, D.A., Oltvai, Z.N. & Barabasi, A.L. Hierarchical
Organization of Modularity in Metabolic Networks. Science 297, 1551-5 (2002).
46. Dotan-Cohen, D., Letovsky, S., Melkman, A.A. & Kasif, S. Biological Process Linkage
Networks. PLoS One 4, e5313 (2009).
47. Tanay, A., Sharan, R., Kupiec, M. & Shamir, R. Revealing Modularity and Organization in the
Yeast Molecular Network by Integrated Analysis of Highly Heterogeneous Genomewide Data.
Proc Natl Acad Sci U S A 101, 2981-6 (2004).
48. Kelley, R. & Ideker, T. Systematic Interpretation of Genetic Interactions Using Protein
Networks. Nat Biotechnol 23, 561-6 (2005).

49. Jaimovich, A., Rinott, R., Schuldiner, M., Margalit, H. & Friedman, N. Modularity and
Directionality in Genetic Interaction Maps. Bioinformatics 26, i228-36 (2010).
50. Park, Y. & Bader, J.S. Resolving the Structure of Interactomes with Hierarchical Agglomerative
Clustering. BMC Bioinformatics 12 Suppl 1, S44 (2011).
51. Dutkowski, J. et al. NeXO Web: The NeXO Ontology Database and Visualization Platform.
Nucleic Acids Res 42, D1269-74 (2014).
52. Lee, I., Li, Z. & Marcotte, E.M. An Improved, Bias-Reduced Probabilistic Functional Gene
Network of Baker's Yeast, Saccharomyces cerevisiae. PLoS One 2, e988 (2007).
53. Kramer, M., Dutkowski, J., Yu, M., Bafna, V. & Ideker, T. Inferring Gene Ontologies from
Pairwise Similarity Data. Bioinformatics 30, i34-42 (2014).
54. Jensen, L.J. et al. STRING 8--a Global View on Proteins and Their Functional Interactions in 630
Organisms. Nucleic Acids Res 37, D412-6 (2009).
55. Lee, I., Date, S.V., Adai, A.T. & Marcotte, E.M. A Probabilistic Functional Network of Yeast
Genes. Science 306, 1555-8 (2004).
56. Jansen, R. et al. A Bayesian Networks Approach for Predicting Protein-Protein Interactions
from Genomic Data. Science 302, 449-53 (2003).
57. Breiman, L. Random Forests. Machine Learning 45, 5-32 (2001).
58. Clauset, A., Moore, C. & Newman, M.E. Hierarchical Structure and the Prediction of Missing
Links in Networks. Nature 453, 98-101 (2008).
59. Jean-Mary, Y.R., Shironoshita, E.P. & Kabuka, M.R. Ontology Matching with Semantic
Verification. Web Semant 7, 235-251 (2009).
60. Ryan, C.J. et al. Hierarchical Modularity and the Evolution of Genetic Interactomes across
Species. Mol Cell 46, 691-704 (2012).
61. Wilmes, G.M. et al. A Genetic Interaction Map of RNA-Processing Factors Reveals Links
between Sem1/Dss1-Containing Complexes and mRNA Export and Splicing. Mol Cell 32,
735-46 (2008).
62. Hannum, G. et al. Genome-Wide Association Data Reveal a Global Map of Genetic
Interactions among Protein Complexes. PLoS Genet 5, e1000782 (2009).
63. Ideker, T., Ozier, O., Schwikowski, B. & Siegel, A.F. Discovering Regulatory and Signalling
Circuits in Molecular Interaction Networks. Bioinformatics 18 Suppl 1, S233-40 (2002).
64. Chuang, H.Y., Lee, E., Liu, Y.T., Lee, D. & Ideker, T. Network-Based Classification of Breast
Cancer Metastasis. Mol Syst Biol 3, 140 (2007).
65. Dutkowski, J. & Ideker, T. Protein Networks as Logic Functions in Development and Cancer.
PLoS Comput Biol 7, e1002180 (2011).
66. Hofree, M., Shen, J.P., Carter, H., Gross, A. & Ideker, T. Network-Based Stratification of
Tumor Mutations. Nat Methods 10, 1108-15 (2013).
67. Ideker, T., Dutkowski, J. & Hood, L. Boosting Signal-to-Noise in Complex Biology: Prior
Knowledge Is Power. Cell 144, 860-3 (2011).
68. Carvunis, A.R. & Ideker, T. Siri of the Cell: What Biology Could Learn from the iPhone. Cell
157, 534-8 (2014).
69. Chuang, H.Y. et al. Subnetwork-Based Analysis of Chronic Lymphocytic Leukemia Identifies
Pathways That Associate with Disease Progression. Blood 120, 2639-49 (2012).
70. Costanzo, M. et al. The Genetic Landscape of a Cell. Science 327, 425-31 (2010).
71. Winzeler, E.A. et al. Functional Characterization of the S. cerevisiae Genome by Gene
Deletion and Parallel Analysis. Science 285, 901-6 (1999).
72. Hillenmeyer, M.E. et al. The Chemical Genomic Portrait of Yeast: Uncovering a Phenotype for
All Genes. Science 320, 362-5 (2008).
73. Bloom, J.S., Ehrenreich, I.M., Loo, W.T., Lite, T.L. & Kruglyak, L. Finding the Sources of
Missing Heritability in a Yeast Cross. Nature 494, 234-7 (2013).
74. Novershtern, N. et al. Densely Interconnected Transcriptional Circuits Control Cell States in
Human Hematopoiesis. Cell 144, 296-309 (2011).
75. Laurenti, E. et al. The Transcriptional Architecture of Early Human Hematopoiesis Identifies
Multilevel Control of Lymphoid Commitment. Nat Immunol 14, 756-63 (2013).

76. Kinsella, R.J. et al. Ensembl BioMarts: A Hub for Data Retrieval across Taxonomic Space.
Database (Oxford) 2011, bar030 (2011).
77. Turner, B. et al. iRefWeb: Interactive Analysis of Consolidated Protein Interaction Data and
Their Supporting Evidence. Database (Oxford) 2010, baq023 (2010).
78. Zuberi, K. et al. GeneMANIA Prediction Server 2013 Update. Nucleic Acids Res 41,
W115-22 (2013).
79. Cerami, E.G. et al. Pathway Commons, a Web Resource for Biological Pathway Data. Nucleic
Acids Res (2010).
80. Jeong, H., Mason, S.P., Barabasi, A.L. & Oltvai, Z.N. Lethality and Centrality in Protein
Networks. Nature 411, 41-2 (2001).
81. Yu, H., Kim, P.M., Sprecher, E., Trifonov, V. & Gerstein, M. The Importance of Bottlenecks in
Protein Networks: Correlation with Gene Essentiality and Expression Dynamics. PLoS Comput
Biol 3, e59 (2007).
