2. About Me
Born: Minsk, Belarus, 1981
University of Utah, Salt Lake City, UT (2000-2007)
•B.S. Biology
•B.S. Chemistry
University of California, Davis, Davis, CA (2007-2012)
•Ph.D. Analytical Chemistry with Emphasis in Biotechnology
•Post doc, Oliver Fiehn Lab
•Principal Statistician at the NIH West Coast Metabolomics Center (WCMC)
Bioinformatics and Data Science
•CDS - Creative Data Solutions, St. Louis, MO
data visualization, network analysis, machine learning, predictive modeling, biochemistry software
5. http://www.archaeology.org/issues/207-1603/features/4157-arles-roman-wall-paintings
Materials of Connected Biological Data Analysis and Visualization
Quality Assessment
• use replicated measurements and/or internal standards to estimate analytical variance
Statistical and Multivariate
• use the experimental design to test hypotheses and/or identify trends in analytes
Functional
• use statistical and multivariate results to identify impacted biochemical domains
Network and Predictive
• integrate statistical and multivariate results with the experimental design and analyte metadata
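The statistical and functional steps are often chained: per-analyte tests are corrected for multiple comparisons, and the resulting hits are checked for over-representation in annotated biochemical domains. The slide does not name specific methods, so the sketch below assumes Benjamini-Hochberg false discovery rate adjustment and a one-sided hypergeometric over-representation test. The function names, analytes, p-values, and domain membership are all hypothetical.

```python
# Illustrative sketch: function names, data, and the 0.05 cutoff are hypothetical.
from math import comb


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    # Walk from the largest p-value down; the running minimum keeps the
    # adjusted values monotone with respect to the raw p-value ranking.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, i in enumerate(order):
        rank = m - position  # 1-based rank in ascending p-value order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def enrichment_pvalue(n_measured, n_in_domain, n_significant, n_sig_in_domain):
    """One-sided hypergeometric P(X >= n_sig_in_domain) when n_significant
    analytes are drawn from n_measured, n_in_domain of which are annotated
    to the biochemical domain."""
    upper = min(n_in_domain, n_significant)
    tail = sum(
        comb(n_in_domain, k) * comb(n_measured - n_in_domain, n_significant - k)
        for k in range(n_sig_in_domain, upper + 1)
    )
    return tail / comb(n_measured, n_significant)


# Hypothetical per-analyte p-values and a hypothetical domain annotation.
pvals = {"alanine": 0.001, "glycine": 0.004, "serine": 0.03,
         "glucose": 0.20, "lactate": 0.45, "citrate": 0.012}
domain = {"alanine", "glycine", "serine"}

names = list(pvals)
qvals = dict(zip(names, benjamini_hochberg([pvals[n] for n in names])))
significant = {n for n, q in qvals.items() if q < 0.05}

p_domain = enrichment_pvalue(
    n_measured=len(names),
    n_in_domain=len(domain),
    n_significant=len(significant),
    n_sig_in_domain=len(significant & domain),
)
print(sorted(significant), p_domain)
```

With real data the background set should be all measured and annotated analytes, and domain definitions would come from a curated pathway resource; six analytes are far too few for the enrichment p-value to mean anything.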
6. Predictive Modeling Within a Biochemical Context
Grapov et al., Circ. Cardiovasc. Genet. (2014)
Personalized Medicine
Complex Data Integration
Grapov et al., PLoS ONE (2014) doi:10.1371/journal.pone.0084260
J. Proteome Res. (2015) 14(1):557–566, doi:10.1021/pr500782g
Biomarker Discovery
7. Biochemical Signal Over Time
Drift in >400 replicated measurements across >100 analytical batches for a single analyte
Quality Controls (QCs) embedded among >5,500 samples (1:10) collected over 1.5 years
Principal Component Analysis (PCA) of all analytes, showing QC sample scores
Biological effect vs. analytical variance
[Figure axes: abundance, time, analytical batch]
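Comparing the spread of the embedded QCs with the spread of the study samples gives a per-analyte sense of how much observed variation is analytical rather than biological. A minimal sketch, assuming a simple ratio of standard deviations; the function names, the 0.5 cutoff, and the data are hypothetical, not values from the study.

```python
# Illustrative sketch: function names, cutoff, and data are hypothetical.
from statistics import stdev


def qc_to_sample_sd_ratio(qc_values, sample_values):
    """SD of replicated QC injections divided by SD of study samples.

    Near 0: biological variation dominates. Near or above 1: most of the
    analyte's variation is analytical.
    """
    return stdev(qc_values) / stdev(sample_values)


def flag_noisy_analytes(data, max_ratio=0.5):
    """data maps analyte -> (qc_values, sample_values); returns analytes
    whose ratio exceeds the hypothetical max_ratio cutoff."""
    return sorted(
        name for name, (qc, samples) in data.items()
        if qc_to_sample_sd_ratio(qc, samples) > max_ratio
    )


# Hypothetical measurements for two analytes.
data = {
    "analyte_a": ([10.0, 10.5, 9.5, 10.0], [5.0, 10.0, 15.0, 20.0]),
    "analyte_b": ([1.0, 2.0, 3.0], [11.0, 12.0, 13.0]),
}
print(flag_noisy_analytes(data))  # ['analyte_b']
```

Here `analyte_a` varies far more across samples than across QC replicates, while `analyte_b` varies as much in the QCs as in the samples, so any apparent biological effect for it would be indistinguishable from analytical noise.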
8. Data Quality and Normalization
Analyte-specific data quality overview
Normalizations can be used to remove analytical variance
[Figure panels: Raw Data, Normalized Data; axes: log mean, %RSD (low precision to high precision)]
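The %RSD in this overview is typically computed per analyte from the replicated QC measurements (100 × SD / mean; lower is more precise). The sketch below assumes a simple batch-wise QC-median scaling as the normalization, which is one of several possible strategies and not necessarily the one behind the figure. Function names and data are hypothetical.

```python
# Illustrative sketch: function names and data are hypothetical.
from statistics import mean, median, stdev


def percent_rsd(values):
    """Relative standard deviation in percent: 100 * SD / mean."""
    return 100.0 * stdev(values) / mean(values)


def qc_median_batch_normalize(values, batches, is_qc):
    """Rescale each batch so its QC median equals the overall QC median.

    values, batches and is_qc are parallel lists for one analyte; every
    batch is assumed to contain at least one QC injection.
    """
    overall = median([v for v, q in zip(values, is_qc) if q])
    batch_qc_median = {
        b: median([v for v, bb, q in zip(values, batches, is_qc) if q and bb == b])
        for b in set(batches)
    }
    return [v * overall / batch_qc_median[b] for v, b in zip(values, batches)]


# Hypothetical analyte: batch b2 reads twice as high as batch b1.
values = [100.0, 100.0, 50.0, 200.0, 200.0, 100.0]
batches = ["b1", "b1", "b1", "b2", "b2", "b2"]
is_qc = [True, True, False, True, True, False]

normalized = qc_median_batch_normalize(values, batches, is_qc)
qc_raw = [v for v, q in zip(values, is_qc) if q]
qc_norm = [v for v, q in zip(normalized, is_qc) if q]
print(round(percent_rsd(qc_raw), 1), round(percent_rsd(qc_norm), 1))  # 38.5 0.0
```

After scaling, the two study samples (50 and 100 in raw units) agree at 75, because the batch offset was purely analytical. A single scale factor per batch cannot remove drift within a batch, which is what a smooth trend fitted to the QCs addresses.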
9. Example of data normalization using a LOESS model fit to QCs
[Figure panels: Raw Data, Normalized Data; legend: Samples, QCs, LOESS]
Data Normalization Strategies
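LOESS-based QC correction can be sketched as: fit a smooth trend to the QC signal against injection order, then divide every injection (QCs and samples) by the trend value at its position. This is a minimal single-pass LOESS (local linear fits with tricube weights, without the robustness iterations that full lowess implementations add). The function names, the 0.75 span, the multiplicative correction, and the data are assumptions. In practice one would more likely call R's `loess()` or statsmodels' `lowess`.

```python
# Illustrative sketch: function names, span, and data are hypothetical.
from statistics import median


def loess_fit(x, y, x_new, span=0.75):
    """Single-pass LOESS: a weighted local linear fit at each x0 in x_new,
    with tricube weights over the nearest span * len(x) points."""
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 points")
    k = min(n, max(3, round(span * n)))  # neighbourhood size
    fitted = []
    for x0 in x_new:
        dist = [abs(xi - x0) for xi in x]
        h = sorted(dist)[k - 1]  # distance to the k-th nearest point
        w = [(1 - (d / h) ** 3) ** 3 if h > 0 and d < h else 0.0 for d in dist]
        if sum(w) == 0:  # degenerate neighbourhood (ties): equal weights
            w = [1.0 if d <= h else 0.0 for d in dist]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


def loess_normalize(order, values, is_qc, span=0.75):
    """Divide out the QC trend (multiplicative correction) and rescale to
    the median QC level. Assumes the fitted trend stays positive."""
    qc_x = [o for o, q in zip(order, is_qc) if q]
    qc_y = [v for v, q in zip(values, is_qc) if q]
    trend = loess_fit(qc_x, qc_y, order, span)
    target = median(qc_y)
    return [v * target / t for v, t in zip(values, trend)]


# Hypothetical run: signal drifts upward with injection order and every
# 4th injection is a QC; the true level is constant.
order = list(range(20))
is_qc = [i % 4 == 0 for i in order]
values = [100.0 + 5.0 * i for i in order]
normalized = loess_normalize(order, values, is_qc)
```

Because this hypothetical drift is exactly linear, every local linear fit recovers it and all normalized values collapse to the median QC level. Real drift is nonlinear, which is why a local smoother rather than a single global line is fitted.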
13. Is More Data the Answer?
~10% variance explained
Many diseases, including aging, have dominant metabolic components (e.g. metabolic syndrome)
PMID:24204828
Genotype + metabolome: >40% variance explained
Type 2 Diabetes
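The percentages on this slide are shares of outcome variance explained by a model. As a hypothetical sketch (not the analysis behind PMID:24204828), in-sample R² and adjusted R² for nested ordinary least squares models show how adding a predictor block, such as a genotype next to a metabolite, changes the variance explained. Because in-sample R² never decreases when predictors are added, adjusted R² or held-out validation is the fairer comparison. Function names and data are hypothetical.

```python
# Illustrative sketch: function names and data are hypothetical.


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting
    (raises ZeroDivisionError for a singular system)."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][c] * x[c] for c in range(i + 1, n))) / m[i][i]
    return x


def r_squared(predictors, y):
    """In-sample R^2 and adjusted R^2 of an OLS fit with intercept.

    predictors is a list of columns, each as long as y; needs n > p + 1.
    """
    n, p = len(y), len(predictors)
    rows = [[1.0] + [col[i] for col in predictors] for i in range(n)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p + 1)] for i in range(p + 1)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p + 1)]
    beta = solve(xtx, xty)  # normal equations: (X'X) beta = X'y
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    y_bar = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    adj = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    return r2, adj


# Hypothetical outcome driven by a metabolite level and a 0/1 genotype.
metabolite = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
genotype = [0.0, 1.0, 0.0, 1.0, 1.0, 0.0]
outcome = [1.0 + 2.0 * m + 3.0 * g for m, g in zip(metabolite, genotype)]

r2_metab, adj_metab = r_squared([metabolite], outcome)
r2_both, adj_both = r_squared([metabolite, genotype], outcome)
print(round(r2_metab, 3), round(r2_both, 3))
```

The toy outcome is noiseless, so the combined model explains essentially all of it; with real cohorts both numbers would be far lower and should be estimated on held-out samples.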