Data Normalization Approaches for
Large-Scale Metabolomic Studies
Dmitry Grapov, PhD
Analytical Variance
Variation in sample measurements stemming from sample
handling, data acquisition, processing, etc.
• Can modify or mask true biological variability
• Calculated based on variance in replicated measurements
• Can be accounted for using data normalization approaches
Goal: minimize analytical variance using data normalization
Drift in >400 replicated measurements across >100 batches
Need for Normalization
To remove non-biological (e.g. analytical)
drift/variance/artifacts in measurements
Figure: signal by acquisition order and by processing/acquisition batch, for samples and quality controls (QCs)
Quantifying Data Quality (precision)
Calculate median inter- and intra-batch %RSD
(for replicated measurements)
Analyte-specific performance across whole study
Within-batch performance
Visualizing Performance
Intra-batch (within) precision for
normalization methods
Inter-batch (across) precision for
normalization methods
RSD = relative standard deviation = standard deviation/mean
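The precision metrics above can be sketched in a few lines. This is a minimal illustration, not the author's implementation: the function names and the toy QC values are assumptions, and %RSD is taken as 100 × standard deviation / mean using the sample standard deviation.

```python
from statistics import mean, median, stdev

def percent_rsd(values):
    """Relative standard deviation as a percentage: 100 * sd / mean."""
    return 100.0 * stdev(values) / mean(values)

def median_within_batch_rsd(qc_values, batches):
    """Median of the per-batch %RSD for one analyte's QC measurements."""
    by_batch = {}
    for value, batch in zip(qc_values, batches):
        by_batch.setdefault(batch, []).append(value)
    # A batch needs at least two QCs before a standard deviation exists.
    rsds = [percent_rsd(v) for v in by_batch.values() if len(v) >= 2]
    return median(rsds)

# Hypothetical QC intensities for one analyte measured in two batches.
qc = [100.0, 110.0, 90.0, 200.0, 220.0, 180.0]
batch = ["A", "A", "A", "B", "B", "B"]

within = median_within_batch_rsd(qc, batch)  # intra-batch precision
across = percent_rsd(qc)                     # inter-batch (study-wide) precision
print(f"within-batch %RSD: {within:.1f}, study-wide %RSD: {across:.1f}")
```

In this toy case each batch is tight on its own, but the offset between batches inflates the study-wide %RSD, which is the pattern normalization is meant to remove.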
Visualizing Metabolite Performance
Figure: univariate view (signal vs. acquisition time, by batch) and multivariate view (PCA)
Common Normalization Approaches
Sample-wise scalar corrections
• L2 norm, mean, median, sum, etc.
Internal standard (ISTD)
• Ratio response (metabolite/ISTD)
• NOMIS (Sysi-Aho et al., 2007; selection of an optimal combination of ISTDs)
• CCRMN (Redestig et al., 2009; removal of metabolite cross-contribution to ISTDs)
Quality control (QC) or reference sample
• Batch ratio (mean, median)
• Loess (doi:10.1038/nprot.2011.335; locally estimated scatterplot smoothing)
• Hierarchical mixed effects (Jauhiainen et al. 2014)
• Quantile (Bolstad et al., 2003; minimize variance in metabolite distribution)
Variance Based
• RUV-2 (De Livera et al., 2012; variance removal for hypothesis testing)
• Variance stabilizing normalization (Huber et al. 2002)
Evaluation of Normalizations
Use QC to define:
• Median within batch %RSD
• Median analyte study wide %RSD
• All normalization specific parameters
• Split QCs into training and test set
• Optimize tuning parameters using leave-one-out
cross-validation
• Assess performance on test set
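The train/test and leave-one-out steps above can be sketched as follows. To keep the example short, a nearest-neighbour mean smoother stands in for the real normalization model, and its neighbour count `k` plays the role of the tuning parameter; the simulated QC drift, the 30/10 split, and all names are assumptions made for illustration.

```python
import random
from statistics import mean

def smooth(x_train, y_train, x0, k):
    """Mean of the k training points closest to x0 (a stand-in for LOESS)."""
    nearest = sorted(zip(x_train, y_train), key=lambda p: abs(p[0] - x0))[:k]
    return mean(y for _, y in nearest)

def loo_mse(x, y, k):
    """Leave-one-out mean squared error of the smoother for a given k."""
    errors = []
    for i in range(len(x)):
        x_rest, y_rest = x[:i] + x[i + 1:], y[:i] + y[i + 1:]
        errors.append((smooth(x_rest, y_rest, x[i], k) - y[i]) ** 2)
    return mean(errors)

# Simulated QC signal with linear drift over acquisition order plus noise.
random.seed(0)
order = list(range(40))
signal = [100 + 0.5 * t + random.gauss(0, 2) for t in order]

# Split QCs into a training set (30) and a held-out test set (10).
idx = order[:]
random.shuffle(idx)
train, test = sorted(idx[:30]), sorted(idx[30:])
x_tr, y_tr = [order[i] for i in train], [signal[i] for i in train]
x_te, y_te = [order[i] for i in test], [signal[i] for i in test]

# Tune k by leave-one-out cross-validation on the training QCs only.
candidates = [1, 2, 4, 8, 16]
cv_error = {k: loo_mse(x_tr, y_tr, k) for k in candidates}
best_k = min(cv_error, key=cv_error.get)

# Assess the tuned model on the QCs it has never seen.
test_mse = mean((smooth(x_tr, y_tr, x, best_k) - y) ** 2
                for x, y in zip(x_te, y_te))
print(best_k, round(test_mse, 2))
```

Keeping the test QCs out of the tuning loop is what makes the final error an honest estimate of how the normalization behaves on unseen injections.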
Image: http://pingax.com/regularization-implementation-r/?utm_source=rss&utm_medium=rss&utm_campaign=regularization-implementation-r
Scalar Normalization
Calculate a sample-specific scalar so that each sample’s summary signal (sum, mean, median, etc.) is equivalent
• Sum signal normalization (sum norm) assumes equivalent total metabolite signal per sample
• Can correct for batch effects when valid
BMC Bioinformatics 2007, 8:93 doi:10.1186/1471-2105-8-93
These normalizations may hide true biological trends or create false ones
After sum norm, phospholipids seem lower in ob/ob samples when in reality they are the same as in wt samples
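A toy calculation shows how sum normalization can create the false trend described above. The two metabolite names and all intensities are invented for illustration; only the mechanism matters.

```python
def sum_normalize(sample):
    """Divide every metabolite by the sample's total signal."""
    total = sum(sample.values())
    return {name: value / total for name, value in sample.items()}

# Hypothetical intensities: phospholipid is identical in both groups,
# but the ob/ob sample carries three times as much triglyceride.
wt = {"phospholipid": 50.0, "triglyceride": 50.0}
ob = {"phospholipid": 50.0, "triglyceride": 150.0}

wt_norm = sum_normalize(wt)
ob_norm = sum_normalize(ob)

print(wt_norm["phospholipid"])  # 0.5
print(ob_norm["phospholipid"])  # 0.25
```

The phospholipid signal is unchanged between groups, yet after normalization it appears halved in ob/ob because the total signal, which the method assumes to be equal, is not.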
Batch Ratio (BR) Normalization
Use QCs to calculate:
1. Batch- and analyte-specific correction factor = batch median / global median
2. Apply ratio to samples
• simple
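The two steps can be sketched for a single analyte as below. This is a minimal illustration under stated assumptions: "apply ratio" is taken to mean dividing each measurement by its batch's correction factor, and the function name and toy values are hypothetical.

```python
from statistics import median

def batch_ratio_normalize(values, batches, is_qc):
    """Divide each value by (QC batch median / QC global median) for its batch."""
    global_median = median(v for v, q in zip(values, is_qc) if q)
    factors = {}
    for b in set(batches):
        # median() raises StatisticsError if a batch contains no QCs.
        qc_b = [v for v, bb, q in zip(values, batches, is_qc) if q and bb == b]
        factors[b] = median(qc_b) / global_median
    return [v / factors[b] for v, b in zip(values, batches)]

# Hypothetical analyte: batch B reads about twice as high as batch A.
values  = [100.0, 104.0, 96.0, 50.0, 200.0, 208.0, 192.0, 100.0]
batches = ["A", "A", "A", "A", "B", "B", "B", "B"]
is_qc   = [True, True, True, False, True, True, True, False]

normalized = batch_ratio_normalize(values, batches, is_qc)
print([round(v, 1) for v in normalized])
```

The two biological samples (50 in batch A, 100 in batch B) land on the same value after correction, since their difference was entirely a batch offset. A single outlying QC would shift a batch median built from only three QCs, which is why the method needs many QCs per batch.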
LOESS Normalization (local smoothing)
For each analyte use QCs to:
• Tune LOESS model (span or degree of smoothing)
• LOESS model to remove analytical variance from samples
raw LOESS normalized
LOESS Normalization
LOESS span has a large effect on model fit
span (α) defines the degree of
smoothing and is critical for
controlling overfitting
LOESS Normalization
raw samples (red) normalized based on QCs (black)
model is trained on QCs and applied to samples
span: too high just right?
Cannot assume convergence of training and test performance because
test data has analytical + biological variance
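Training on QCs and applying the model to samples can be sketched with a bare-bones local linear smoother. This is not the author's R code: the tricube weighting, the bandwidth rule, and the choice to divide by the fitted QC trend and rescale to the QC median are common conventions assumed here, and all names and data are illustrative.

```python
from statistics import median

def loess_predict(x_train, y_train, x_new, span=0.75):
    """Local linear regression with tricube weights (a minimal LOESS)."""
    n = len(x_train)
    k = max(2, min(n, int(round(span * n))))  # neighbours used per local fit
    fitted = []
    for x0 in x_new:
        nearest = sorted((abs(x - x0), x, y)
                         for x, y in zip(x_train, y_train))[:k]
        # Bandwidth set just beyond the k-th neighbour (an assumption)
        # so that every selected point keeps a positive weight.
        h = nearest[-1][0] * 1.001 + 1e-12
        w = [(1 - (d / h) ** 3) ** 3 for d, _, _ in nearest]
        sw = sum(w)
        mx = sum(wi * x for wi, (_, x, _) in zip(w, nearest)) / sw
        my = sum(wi * y for wi, (_, _, y) in zip(w, nearest)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, (_, x, _) in zip(w, nearest))
        sxy = sum(wi * (x - mx) * (y - my)
                  for wi, (_, x, y) in zip(w, nearest))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted

def loess_normalize(order, values, is_qc, span=0.75):
    """Fit the drift on QCs only, then remove it from every measurement."""
    qc_x = [t for t, q in zip(order, is_qc) if q]
    qc_y = [v for v, q in zip(values, is_qc) if q]
    trend = loess_predict(qc_x, qc_y, order, span)
    target = median(qc_y)
    return [v / f * target for v, f in zip(values, trend)]

# Hypothetical run: QCs drift from 100 to 140; samples share the same drift.
order  = [0, 5, 10, 15, 20, 25, 30, 35, 40]
is_qc  = [True, False, True, False, True, False, True, False, True]
values = [100.0, 52.5, 110.0, 57.5, 120.0, 62.5, 130.0, 67.5, 140.0]

normalized = loess_normalize(order, values, is_qc)
print([round(v, 2) for v in normalized])
```

Because the drift here is purely analytical and identical for QCs and samples, the corrected samples collapse to a single value. With real samples the residual spread also contains biological variance, so sample precision cannot be expected to converge to QC precision.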
LOESS Normalization
Avoiding overfitting is critical when using LOESS normalization
Example LOESS Normalization
Figure panels: raw | span = 0.75 | span = 0.005
Metabolomic Data Case Study I
GC-TOF
• 310 metabolites for 4930 samples
• 132 batches
• ~41 samples per batch
• ~1:10 QCs/samples (487 QCs or 9%)
• No Internal Standards (ISTDs)
Normalizations Implemented
• Batch ratio
• LOESS
• Sum known metabolite signal (mTIC) normalization
Batch Performance (GC-TOF Raw)
Within batch
• Median: 26
• Min: 19
• Max: 69
Median RSD    count    cumulative %
10-20         3        2
20-30         98       76
30-40         26       96
40-50         3        98
50-60         1        99
60-70         1        100
Median RSD    count    cumulative %
0-10          10       3
10-20         83       30
20-30         100      62
30-40         69       84
40-50         32       94
50-60         6        96
60-70         3        97
70-80         5        98
80-90         1        99
90-100        1        100
Analyte Performance (GC-TOF Raw)
Within Batch
• Median: 24
• Min: 7
• Max: 79
PCA (GC-TOF Raw)
Within batches
• Median: 23
• Min: 17
• Max: 69
Median RSD    count    cumulative %
10-20         25       23
20-30         67       85
30-40         15       99
40-50         1        100
60-70         1        101
Batch Performance (GC-TOF BR)
Median RSD    count    cumulative %
0-10          17       6
10-20         103      39
20-30         112      75
30-40         57       93
40-50         12       97
50-60         5        99
60-70         3        100
70-80         1        100
Across batches
• Median: 24
• Min: 7
• Max: 79
Batch Performance (GC-TOF BR)
PCA (GC-TOF BR)
BR Normalization Limitations
• Very susceptible to outliers
• Requires many QCs
• Can inflate variance when training and test set trends do not match
Within batches
• Median: 19
• Min: 11
• Max: 58
Median RSD    count    cumulative %
10-20         75       57
20-30         51       96
30-40         4        99
40-50         1        99
50-60         1        100
Batch Performance (GC-TOF LOESS)
Median RSD    count    cumulative %
0-10          17       6
10-20         103      39
20-30         112      75
30-40         57       93
40-50         12       97
50-60         5        99
60-70         3        100
70-80         1        100
Across batches
• Median: 19
• Min: 2.9
• Max: 66
Batch Performance (GC-TOF LOESS)
PCA (GC-TOF LOESS)
LOESS Normalization Limitations
Figure panels: raw | normalized
LOESS normalization can inflate variance when:
• overtrained
• training examples do not match test set
Sum mTIC Normalization (GC-TOF)
Improved performance over raw and BR, but converts the data from absolute magnitudes to compositional (relative) values
Sum mTIC Normalization (GC-TOF)
Poor removal of trends due to acquisition time, but limits the magnitude of outlier samples compared to other approaches
Figure panels: raw | mTIC normalized (signal vs. time)
Metabolomic Data Case Study II
LC-Q-TOF
• 340+ metabolites for 4930 samples
• 132 batches
• ~41 samples per batch
• ~1:10 QC/samples (524 QCs or 11%)
• NIST reference (63 or 1%)
• 14 internal standards (ISTDs)
• NOMIS (IS = ISTD)
• qcISTD
Internal Standards Normalization
Analyte
Retention time
Internal standards (ISTD)
• qcISTD (QC-optimized metabolite/ISTD)
• NOMIS (Sysi-Aho et al., 2007; selection of an optimal combination of ISTDs)
• CCRMN (Redestig et al., 2009; removal of metabolite cross-contribution to ISTDs)
NOMIS
ISTD Based Normalizations (LC/Q-TOF)
• NOMIS (linear combination of optimal ISTDs;
Sysi-Aho et al., 2007)
• qcISTD (QC optimized ISTD strategy)
PC 38:6: poor performance with NOMIS
qcISTD Normalization
Use QC samples to:
1. Evaluate analyte %RSD before and after corrections using all ISTDs
2. Select analyte/ISTD combinations with %RSD improvement over raw data at some threshold (e.g. 10%)
3. Correct sample analytes with the QC-defined ISTD if ISTD recovery is above some minimal threshold (e.g. > 20% of median)
• Subject to overfitting
191 of 326 (60%) are
ISTD corrected
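The three qcISTD steps can be sketched for one analyte as follows. Several details are assumptions, since the slide leaves them open: the 10% threshold is read as a relative reduction in QC %RSD, "median" is read as the ISTD's median signal across QCs, corrected values are rescaled by that median to stay on the original scale, and all names and numbers are hypothetical.

```python
from statistics import mean, median, stdev

def percent_rsd(values):
    return 100.0 * stdev(values) / mean(values)

def select_istd(qc_analyte, qc_istds, min_improvement=10.0):
    """Steps 1-2: return the ISTD giving the lowest QC %RSD, or None
    if no ISTD improves on the raw %RSD by at least min_improvement percent."""
    raw = percent_rsd(qc_analyte)
    best_name, best_rsd = None, raw
    for name, istd in qc_istds.items():
        rsd = percent_rsd([a / i for a, i in zip(qc_analyte, istd)])
        if rsd < best_rsd:
            best_name, best_rsd = name, rsd
    if best_name is not None and 100.0 * (raw - best_rsd) / raw >= min_improvement:
        return best_name
    return None

def correct_samples(analyte, istd, istd_qc_median, min_recovery=0.2):
    """Step 3: ratio to the ISTD only where ISTD recovery is adequate."""
    corrected = []
    for a, i in zip(analyte, istd):
        if i > min_recovery * istd_qc_median:
            corrected.append(a / i * istd_qc_median)
        else:
            corrected.append(a)  # low ISTD recovery: leave the raw value
    return corrected

# Hypothetical QC data: IS1 tracks the analyte, IS2 is flat.
qc_analyte = [100.0, 120.0, 80.0, 110.0]
qc_istds = {"IS1": [10.0, 12.0, 8.0, 11.0], "IS2": [10.0, 10.0, 10.0, 10.0]}
chosen = select_istd(qc_analyte, qc_istds)

# Apply to samples; the third sample has almost no ISTD signal.
istd_median = median(qc_istds["IS1"])
samples = correct_samples([50.0, 60.0, 30.0], [10.0, 12.0, 1.0], istd_median)
print(chosen, samples)
```

Because the ISTD is chosen by its effect on the same QCs used to score it, the selection can overfit; holding out some QCs for evaluation is one way to check this.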
qcISTD Normalization
Figure panels: ISTD used by retention time (Rt) | total number of analytes corrected by ISTD
Optimal Lipidomic ISTDs
Normalizations (LC-Q-TOF)
LOESS performs very poorly for two metabolites
• qcISTD performs better than LOESS
• qcISTD + LOESS leads to highest replicate
precision
PCA (LC/Q-TOF)
Figure panels: Raw (%RSD = 13) | qcISTD (9) | LOESS (12) | qcISTD + LOESS (8)
Only normalizations that include LOESS effectively remove analytical batch effects
Conclusion
• Comparison of common data normalization approaches
suggests that in addition to ISTD corrections, LOESS
(analyte-specific, non-linear adjustment based on QC
performance at various data acquisition times) is superior
to batch based corrections.
• Further validations need to be completed to confirm the
effects of normalizations on samples’ variance
• These findings suggest that inclusion of “batch” as a
covariate in statistical models will not fully account for
analytical variance
R code for all normalization functions can be found at:
https://github.com/dgrapov/devium/blob/master/R/Devium%20Normalization.r
dgrapov@ucdavis.edu
metabolomics.ucdavis.edu
This research was supported in part by NIH 1 U24 DK097154

 
Third Battle of Panipat detailed notes.pptx
Third Battle of Panipat detailed notes.pptxThird Battle of Panipat detailed notes.pptx
Third Battle of Panipat detailed notes.pptx
 
Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...
 
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docx
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdf
 
Spatium Project Simulation student brief
Spatium Project Simulation student briefSpatium Project Simulation student brief
Spatium Project Simulation student brief
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdf
 

Data Normalization Approaches for Large-scale Biological Studies

  • 1. Data Normalization Approaches for Large-Scale Metabolomic Studies Dmitry Grapov, PhD
  • 2. Analytical Variance Variation in sample measurements stemming from sample handling, data acquisition, processing, etc. • Can modify or mask true biological variability • Calculated based on variance in replicated measurements • Can be accounted for using data normalization approaches Goal: minimize analytical variance using data normalization. Drift in >400 replicated measurements across >100 batches
  • 3. Need for Normalization To remove non-biological (e.g. analytical) drift/variance/artifacts in measurements Acquisition order Processing/acquisition batches Samples Quality Controls (QCs)
  • 4. Quantifying Data Quality (precision) Calculate median inter- and intra-batch %RSD (for replicated measurements) Analyte specific performance across whole study Within batch performance
  • 5. Visualizing Performance Intra-batch (within) precision for normalization methods Inter-batch (across) precision for normalization methods RSD = relative standard deviation = standard deviation/mean
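The %RSD definition on slides 4 and 5 can be sketched in a few lines of Python. This is a minimal illustration, not the author's R implementation: the function names and the toy QC values are hypothetical, and the study-wide value is assumed here to be the %RSD over all QC replicates of one analyte.

```python
from statistics import mean, median, stdev

def percent_rsd(values):
    """Relative standard deviation as a percentage: 100 * sd / mean."""
    return 100.0 * stdev(values) / mean(values)

def median_intra_batch_rsd(qc_values, batches):
    """Median of the per-batch %RSD for one analyte's QC replicates."""
    by_batch = {}
    for value, batch in zip(qc_values, batches):
        by_batch.setdefault(batch, []).append(value)
    # A batch needs at least two replicates to have a standard deviation.
    rsds = [percent_rsd(v) for v in by_batch.values() if len(v) > 1]
    return median(rsds)

# Toy QC data (hypothetical): two batches with a twofold shift between them.
qc = [100.0, 110.0, 90.0, 200.0, 220.0, 180.0]
batch = ["b1", "b1", "b1", "b2", "b2", "b2"]

intra = median_intra_batch_rsd(qc, batch)  # precision within batches
inter = percent_rsd(qc)                    # precision across the whole study
print(f"intra-batch %RSD: {intra:.1f}, study-wide %RSD: {inter:.1f}")
```

In this toy case the within-batch precision is good while the study-wide precision is poor, which is the pattern a batch effect produces.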
  • 6. Visualizing Metabolite Performance acquisition time batch Univariate Multivariate PCA
  • 7. Common Normalization Approaches Sample-wise scalar corrections • L2 norm, mean, median, sum, etc. Internal standard (ISTD) • Ratio response (metabolite/ISTD) • NOMIS (Sysi-Aho et al., 2007; selection of optimal combination ISTDs) • CCRMN (Redestig et al., 2009; removal of metabolite cross contribution to ISTDs) Quality control (QC) or reference sample • Batch ratio (mean, median) • Loess (doi:10.1038/nprot.2011.335; locally estimated scatterplot smoothing) • Hierarchical mixed effects (Jauhiainen et al. 2014) • Quantile (Bolstad et al., 2003; minimize variance in metabolite distribution) Variance Based • RUV-2 (De Livera et al., 2012; variance removal for hypothesis testing) • Variance stabilizing normalization (Huber et al. 2002)
  • 8. Evaluation of Normalizations Use QC to define: • Median within batch %RSD • Median analyte study wide %RSD • All normalization specific parameters • Split QCs into training and test set • Optimize tuning parameters using leave-one-out cross-validation • Assess performance on test set Image: http://pingax.com/regularization-implementation-r/?utm_source=rss&utm_medium=rss&utm_campaign=regularization-implementation-r
  • 9. Scalar Normalization Calculate sample-specific scalar to ensure each sample’s (sum, mean, median, etc.) signal is equivalent • Using sum signal normalization (sum norm) assumes equivalent total metabolite signal per sample • Can correct for batch effects when valid. BMC Bioinformatics 2007, 8:93 doi:10.1186/1471-2105-8-93. These normalizations may hide true biological trends or create false ones. After sum norm, phospholipids seem lower in ob/ob when in reality they are the same as in wt samples
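A small sketch of sum normalization shows how the caveat on slide 9 can arise. The function name, the choice of the mean total signal as the common target, and the toy values are assumptions made for illustration only.

```python
def sum_normalize(samples):
    """Scale each sample (a list of metabolite signals) to a common total signal."""
    totals = [sum(s) for s in samples]
    target = sum(totals) / len(totals)  # assumed target: mean total signal
    return [[x * target / t for x in s] for s, t in zip(samples, totals)]

# Hypothetical data: metabolites 0 and 1 are identical in both samples,
# metabolite 2 is truly fourfold higher in the second sample.
wt = [10.0, 10.0, 10.0]
ob = [10.0, 10.0, 40.0]

norm_wt, norm_ob = sum_normalize([wt, ob])
print(norm_wt)
print(norm_ob)
```

After normalization the two unchanged metabolites appear lower in the second sample, because the assumption of equal total signal does not hold for these data.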
  • 10. Batch Ratio (BR) Normalization Use QCs to calculate: 1. batch/analyte specific correction factor = (batch median / global median) 2. Apply ratio to samples • simple
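The two batch ratio steps can be sketched for a single analyte as follows. The function name and data are hypothetical, and "apply ratio" is assumed here to mean dividing every measurement in a batch by that batch's correction factor.

```python
from statistics import median

def batch_ratio_normalize(values, batches, is_qc):
    """Divide each value by (QC batch median / QC global median) for one analyte."""
    global_median = median(v for v, q in zip(values, is_qc) if q)
    factors = {}
    for b in set(batches):
        qc_in_batch = [v for v, bb, q in zip(values, batches, is_qc) if q and bb == b]
        factors[b] = median(qc_in_batch) / global_median
    return [v / factors[b] for v, b in zip(values, batches)]

# Hypothetical data: batch b2 reads twice as high as batch b1.
values  = [100.0, 100.0, 50.0, 200.0, 200.0, 100.0]
batches = ["b1", "b1", "b1", "b2", "b2", "b2"]
is_qc   = [True, True, False, True, True, False]

normalized = batch_ratio_normalize(values, batches, is_qc)
print(normalized)
```

Because each factor rests on a single median per batch, a batch with few or atypical QCs gets a poor correction, which is consistent with the limitations listed on slide 23.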
  • 11. LOESS Normalization (local smoothing) For each analyte use QCs to: • Tune LOESS model (span or degree of smoothing) • Use LOESS model to remove analytical variance from samples. Panels: raw, LOESS normalized
  • 12. LOESS Normalization LOESS span has a large effect on model fit; span (α) defines the degree of smoothing and is critical for controlling overfitting
  • 13. LOESS Normalization Raw samples (red) normalized based on QCs (black); model is trained on QCs and applied to samples. Span: too high, just right? Cannot assume convergence of training and test performance because test data has analytical + biological variance
  • 14. LOESS Normalization Avoiding overfitting is critical when using LOESS normalization
  • 15. Example LOESS Normalization Panels: raw, span = 0.75, span = 0.005
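The QC-based LOESS idea from slides 11 to 15 can be sketched without external libraries as a local linear regression with tricube weights. This is a simplified stand-in for a full LOESS implementation (no robustness iterations), and several details are assumptions: the function names, the ratio-style correction rescaled to the QC median, and the toy drift data.

```python
from statistics import median

def loess_predict(x_train, y_train, x_new, span=0.75):
    """Local linear regression with tricube weights, evaluated at each x in x_new."""
    n = len(x_train)
    k = max(2, min(n, round(span * n)))  # number of points in each local window
    fitted = []
    for x0 in x_new:
        dists = sorted(abs(x - x0) for x in x_train)
        # Window half-width: distance to the k-th nearest point, nudged up
        # slightly so boundary points keep a tiny nonzero weight.
        h = dists[k - 1] * (1 + 1e-9) + 1e-12
        w = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in x_train]
        sw = sum(w)
        mx = sum(wi * x for wi, x in zip(w, x_train)) / sw
        my = sum(wi * y for wi, y in zip(w, y_train)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, x_train))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, x_train, y_train))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted

def loess_normalize(order, values, is_qc, span=0.75):
    """Fit the drift on QCs only, then divide every measurement by the fitted trend."""
    qc_x = [o for o, q in zip(order, is_qc) if q]
    qc_y = [v for v, q in zip(values, is_qc) if q]
    trend = loess_predict(qc_x, qc_y, order, span)
    center = median(qc_y)  # assumed rescaling target: QC median
    return [v / t * center for v, t in zip(values, trend)]

# Hypothetical data: QCs drift linearly with acquisition order; the three
# samples sit at half the QC level and drift in the same way.
qc_order = list(range(0, 101, 10))
sample_order = [5, 55, 95]
order = qc_order + sample_order
values = [100 + 0.5 * o for o in qc_order] + [0.5 * (100 + 0.5 * o) for o in sample_order]
is_qc = [True] * len(qc_order) + [False] * len(sample_order)

normalized = loess_normalize(order, values, is_qc, span=0.75)
print(normalized[-3:])
```

A very small span lets the curve chase individual QC points, so noise in the QCs is transferred to the samples; this is the overfitting risk the slides warn about, and it is why the span should be tuned on held-out QCs rather than on the training fit.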
  • 16. Metabolomic Data Case Study I GC-TOF • 310 metabolites for 4930 samples • 132 batches • ~41 samples per batch • ~1:10 QCs/samples (487 QCs or 9%) • No Internal Standards (ISTDs) Normalizations Implemented • Batch ratio • LOESS • Sum known metabolite signal (mTIC) normalization
  • 17. Batch Performance (GC-TOF Raw) Within batch • Median: 26 • Min: 19 • Max: 69
      Median RSD | count | cumulative %
      10-20 | 3 | 2
      20-30 | 98 | 76
      30-40 | 26 | 96
      40-50 | 3 | 98
      50-60 | 1 | 99
      60-70 | 1 | 100
  • 18. Analyte Performance (GC-TOF Raw) Within Batch • Median: 24 • Min: 7 • Max: 79
      Median RSD | count | cumulative %
      0-10 | 10 | 3
      10-20 | 83 | 30
      20-30 | 100 | 62
      30-40 | 69 | 84
      40-50 | 32 | 94
      50-60 | 6 | 96
      60-70 | 3 | 97
      70-80 | 5 | 98
      80-90 | 1 | 99
      90-100 | 1 | 100
  • 20. Batch Performance (GC-TOF BR) Within batches • Median: 23 • Min: 17 • Max: 69
      Median RSD | count | cumulative %
      10-20 | 25 | 23
      20-30 | 67 | 85
      30-40 | 15 | 99
      40-50 | 1 | 100
      60-70 | 1 | 101
  • 21. Batch Performance (GC-TOF BR) Across batches • Median: 24 • Min: 7 • Max: 79
      Median RSD | count | cumulative %
      0-10 | 17 | 6
      10-20 | 103 | 39
      20-30 | 112 | 75
      30-40 | 57 | 93
      40-50 | 12 | 97
      50-60 | 5 | 99
      60-70 | 3 | 100
      70-80 | 1 | 100
  • 23. BR Normalization Limitations • Very susceptible to outliers • Requires many QCs • Can inflate variance when training and test set trends do not match
  • 24. Batch Performance (GC-TOF LOESS) Within batches • Median: 19 • Min: 11 • Max: 58
      Median RSD | count | cumulative %
      10-20 | 75 | 57
      20-30 | 51 | 96
      30-40 | 4 | 99
      40-50 | 1 | 99
      50-60 | 1 | 100
  • 25. Batch Performance (GC-TOF LOESS) Across batches • Median: 19 • Min: 2.9 • Max: 66
      Median RSD | count | cumulative %
      0-10 | 17 | 6
      10-20 | 103 | 39
      20-30 | 112 | 75
      30-40 | 57 | 93
      40-50 | 12 | 97
      50-60 | 5 | 99
      60-70 | 3 | 100
      70-80 | 1 | 100
  • 27. LOESS Normalization Limitations raw normalized LOESS normalization can inflate variance when: • overtrained • training examples do not match test set
  • 28. Sum mTIC Normalization (GC-TOF) Improved performance over raw and BR, but alters data from magnitudinal to compositional
  • 29. Sum mTIC Normalization (GC-TOF) Poor removal of trends due to acquisition time, but limits magnitude of outlier samples compared to other approaches. Panels: raw, mTIC normalized (x-axis: time)
  • 30. Metabolomic Data Case Study II LC-Q-TOF • 340+ metabolites for 4930 samples • 132 batches • ~41 samples per batch • ~1:10 QC/samples (524 QCs or 11%) • NIST reference (63 or 1%) • 14 internal standards (ISTDs) • NOMIS (IS = ISTD) • qcISTD
  • 31. Internal Standards Normalization Analyte Retention time Internal standards (ISTD) • qcISTD (QC optimized metabolite/ISTD) • NOMIS (Sysi-Aho et al., 2007; selection of optimal combination ISTDs) • CCRMN (Redestig et al., 2009; removal of metabolite cross contribution to ISTDs) NOMIS
  • 32. ISTD Based Normalizations (LC/Q-TOF) • NOMIS (linear combination of optimal ISTDs; Sysi-Aho et al., 2007) • qcISTD (QC optimized ISTD strategy) PC 38:6 Poor performance with NOMIS
  • 33. qcISTD Normalization Use QC samples to: 1. Evaluate analyte %RSD before and after corrections using all ISTDs 2. Select analyte/ISTD combinations with %RSD improvement over raw data at some threshold (e.g. 10%) 3. Correct sample analytes with QC defined ISTD if ISTD recovery is above some minimal threshold (e.g. > 20% of median) • Subject to overfitting. 191 of 326 (60%) are ISTD corrected
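The three qcISTD steps can be sketched for one analyte as below. This is a hedged illustration rather than the author's code: the function names are hypothetical, the 10% threshold is interpreted as a relative reduction in QC %RSD, low-recovery samples are assumed to be left uncorrected, and corrected values are assumed to be rescaled by the ISTD median to keep the original magnitude.

```python
from statistics import mean, median, stdev

def percent_rsd(values):
    """Relative standard deviation as a percentage."""
    return 100.0 * stdev(values) / mean(values)

def select_istd(analyte_qc, istd_qc, min_improvement=10.0):
    """Steps 1-2: return the ISTD name giving the lowest QC %RSD, or None
    if no ISTD improves on the raw %RSD by at least min_improvement percent."""
    raw_rsd = percent_rsd(analyte_qc)
    best_name, best_rsd = None, raw_rsd
    for name, istd in istd_qc.items():
        rsd = percent_rsd([a / i for a, i in zip(analyte_qc, istd)])
        if rsd < best_rsd:
            best_name, best_rsd = name, rsd
    if best_name is None:
        return None
    improvement = 100.0 * (raw_rsd - best_rsd) / raw_rsd
    return best_name if improvement >= min_improvement else None

def apply_istd(analyte, istd, min_recovery=0.2):
    """Step 3: ratio-correct each sample unless its ISTD recovery is too low."""
    istd_median = median(istd)
    corrected = []
    for a, i in zip(analyte, istd):
        if i > min_recovery * istd_median:
            corrected.append(a / i * istd_median)
        else:
            corrected.append(a)  # assumed fallback: leave the raw value
    return corrected

# Hypothetical QC data: ISTD "A" tracks the analyte's variation, "B" does not.
analyte_qc = [100.0, 120.0, 80.0, 110.0]
istd_qc = {"A": [1.0, 1.2, 0.8, 1.1], "B": [1.0, 0.8, 1.2, 0.9]}
chosen = select_istd(analyte_qc, istd_qc)
rejected = select_istd(analyte_qc, {"B": istd_qc["B"]})

# Hypothetical samples: the third has an ISTD recovery below 20% of the median.
corrected = apply_istd([100.0, 120.0, 50.0], [1.0, 1.2, 0.1])
print(chosen, rejected, corrected)
```

Selecting the ISTD on the same QCs used to judge the improvement is what makes the approach subject to overfitting, as the slide notes; holding out some QCs for evaluation would be one way to check it.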
  • 34. qcISTD Normalization ISTD used by retention time (Rt) Total number of analytes corrected by ISTD
  • 36. Normalizations (LC-Q-TOF) LOESS performs very poorly for two metabolites • qcISTD performs better than LOESS • qcISTD + LOESS leads to highest replicate precision
  • 37. PCA (LC/Q-TOF) Raw (%RSD = 13) qcISTD (9) LOESS (12) qcISTD + LOESS (8) Only normalizations that include LOESS effectively remove analytical batch effects
  • 38. Conclusion • Comparison of common data normalization approaches suggests that in addition to ISTD corrections, LOESS (analyte-specific, non-linear adjustment based on QC performance at various data acquisition times) is superior to batch based corrections. • Further validations need to be completed to confirm the effects of normalizations on samples’ variance • These findings suggest that inclusion of “batch” as a covariate in statistical models will not fully account for analytical variance. R code for all normalization functions can be found at: https://github.com/dgrapov/devium/blob/master/R/Devium%20Normalization.r
  • 39. dgrapov@ucdavis.edu metabolomics.ucdavis.edu This research was supported in part by NIH 1 U24 DK097154