1. Statistics for Systems Biology
Pathway analysis is an interdisciplinary approach that resulted from recent technological advances in
next-generation sequencing of DNA, which revolutionized genomic research. Statistics is integral to the analysis
and visualization of these data in order to better understand the underlying biological systems.
Pathway Analysis
Pathway analysis is a rapidly growing field that combines
biology, computer science, and statistics to build a
working computational model of the living cell. The
completion of the Human Genome Project in 2003,
which took over 13 years and nearly $3 billion, fueled
the development of sequencing technologies, known as
next-generation sequencing (NGS), that were less costly
and took less time. Every year since 2003, the cost to
generate a whole human genome sequence has fallen,
resulting in an exponential increase in data. Molecular
interactions can be measured with all of these data to
understand biological functions and predict cell behavior
in response to outside stimuli.
Among the most commonly used datasets is gene
expression profiling data, from microarrays or from
NGS-based RNA sequencing. Statistical methods are
used to process these data for applications such as
drug discovery. Genes within a specific pathway can be
analyzed to determine whether they are mutated
significantly more often than expected by chance. For
example, if many of the genes
altered in a cancer appear to affect a particular pathway,
then drugs targeting this pathway could be effective for
that cancer. As a result, pathway analysis could be used
to develop personalized therapies that are effective and
reduce costs as well as side effects.
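
One common way to ask whether a pathway contains more altered genes than chance would predict is an over-representation test based on the hypergeometric distribution. The sketch below is illustrative only and is not taken from the source: the function name and all gene counts are hypothetical, and it assumes every gene is equally likely to be altered under the null hypothesis.

```python
from math import comb

def pathway_enrichment_pvalue(n_genes, n_pathway, n_altered, n_overlap):
    """One-sided hypergeometric p-value for pathway over-representation.

    Returns P(X >= n_overlap), where X is the number of pathway genes among
    n_altered genes drawn without replacement from n_genes genes, of which
    n_pathway belong to the pathway.
    """
    max_overlap = min(n_pathway, n_altered)
    total = comb(n_genes, n_altered)
    # Sum the upper tail of the hypergeometric distribution.
    tail = sum(
        comb(n_pathway, k) * comb(n_genes - n_pathway, n_altered - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total

# Hypothetical counts: 20,000 genes in total, 100 in the pathway,
# 200 altered in the tumor, 8 of which fall in the pathway.
p_value = pathway_enrichment_pvalue(20000, 100, 200, 8)
print(f"enrichment p-value: {p_value:.3g}")
```

With these made-up counts only about one pathway gene would be expected among the altered genes by chance, so an overlap of eight yields a small p-value. In a real analysis many pathways are tested at once, so the p-values would also need a multiple-testing correction.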
Visualizing Biological Data
Figure from the paper “Gaussian graphical modeling reconstructs pathway
reactions from high-throughput metabolomics data” by Jan Krumsiek,
Karsten Suhre, Thomas Illig, Jerzy Adamski and Fabian J Theis.
Networks are commonly used to visualize biological data:
nodes represent genes and edges represent the
interactions between them. Gaussian graphical models
(GGMs) are a popular method for inferring a network and
assume that the data are normally distributed. GGMs
simplify the structure in the data and can predict
pathways as well as discover novel ones. Bayesian
nonparanormal graphical models relax the normality
assumption by transforming non-normal data to normal
data and place a prior distribution on the model
parameters in order to construct the network. Current
research involves improving the estimation of the
parameters in order to better detect significant
interactions and remove superfluous data. Combining
robust computational algorithms with statistical analysis
and visualization to describe biological data allows for
effective communication and education among members
of the scientific community and the public.
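
The idea behind a GGM can be sketched with partial correlations, which are obtained by rescaling the inverse covariance (precision) matrix. The example below is a minimal illustration on simulated data, not the method of the cited paper: the variable names, the simulated chain a -> b -> c, and the 0.2 edge threshold are all assumptions made for demonstration.

```python
import numpy as np

def partial_correlations(data):
    """Estimate partial correlations from an (n_samples, n_variables) array.

    The partial correlation between variables i and j, given all others, is
    -K[i, j] / sqrt(K[i, i] * K[j, j]), where K is the precision matrix.
    """
    cov = np.cov(data, rowvar=False)
    precision = np.linalg.inv(cov)
    scale = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(scale, scale)
    np.fill_diagonal(pcor, 1.0)
    return pcor

# Simulated chain: a influences b, and b influences c.
# a and c are correlated, but only through b.
rng = np.random.default_rng(0)
n_samples = 2000
a = rng.normal(size=n_samples)
b = a + 0.5 * rng.normal(size=n_samples)
c = b + 0.5 * rng.normal(size=n_samples)
data = np.column_stack([a, b, c])

pcor = partial_correlations(data)
marginal = np.corrcoef(data, rowvar=False)

# Hypothetical threshold for drawing an edge between two nodes.
edges = np.abs(pcor) > 0.2
print("marginal corr(a, c):", round(float(marginal[0, 2]), 2))
print("partial  corr(a, c):", round(float(pcor[0, 2]), 2))
print("edge matrix:\n", edges)
```

Although a and c are strongly correlated, their partial correlation is close to zero once b is accounted for, so the inferred network keeps the edges a-b and b-c and drops the indirect a-c link. When there are more variables than samples, the sample covariance cannot be inverted directly, and regularized estimators such as the graphical lasso are typically used instead.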