Data Normalization Approaches for
Large-Scale Metabolomic Studies
Dmitry Grapov, PhD
Analytical Variance
Variation in sample measurements stemming from sample
handling, data acquisition, processing, etc.
• Can modify or mask true biological variability
• Calculated based on variance in replicated measurements
• Can be accounted for using data normalization approaches
Goal: minimize analytical variance using data normalization
Drift in >400 replicated measurements across >100 batches
Need for Normalization
To remove non-biological (e.g. analytical)
drift/variance/artifacts in measurements
Acquisition order Processing/acquisition batches
Samples
Quality Controls (QCs)
Quantifying Data Quality (precision)
Calculate median inter- and intra-batch %RSD
(for replicated measurements)
Analyte specific
performance across
whole study
Within batch
performance
Visualizing Performance
Intra-batch (within) precision for
normalization methods
Inter-batch (across) precision for
normalization methods
RSD = relative standard deviation = standard deviation/mean
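The precision metrics above can be sketched as follows. This is a minimal illustration, not the author's implementation: the function names, the dictionary layout (batch label mapped to QC intensities for one analyte), and the choice to pool all QCs for the inter-batch value are assumptions.

```python
from statistics import mean, median, stdev

def percent_rsd(values):
    """Relative standard deviation as a percentage: 100 * sd / mean."""
    return 100.0 * stdev(values) / mean(values)

def median_intra_batch_rsd(qc_by_batch):
    """Median of the per-batch %RSD values for one analyte.

    Batches with fewer than two QCs are skipped (sd is undefined).
    """
    rsds = [percent_rsd(v) for v in qc_by_batch.values() if len(v) >= 2]
    return median(rsds)

def inter_batch_rsd(qc_by_batch):
    """%RSD of all QC measurements pooled across batches (assumed definition)."""
    pooled = [x for v in qc_by_batch.values() for x in v]
    return percent_rsd(pooled)

# Hypothetical QC intensities for one analyte in three batches
qc = {
    "b1": [100.0, 110.0, 90.0],
    "b2": [150.0, 160.0, 140.0],
    "b3": [80.0, 85.0, 75.0],
}
intra = median_intra_batch_rsd(qc)
inter = inter_batch_rsd(qc)
```

In this made-up example the batches differ in level, so the pooled (inter-batch) %RSD is much larger than the median within-batch %RSD.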
Visualizing Metabolite Performance
acquisition time
batch
Univariate Multivariate
PCA
Common Normalization Approaches
Sample-wise scalar corrections
• L2 norm, mean, median, sum, etc.
Internal standard (ISTD)
• Ratio response (metabolite/ISTD)
• NOMIS (Sysi-Aho et al., 2007; selection of optimal combination of ISTDs)
• CCRMN (Redestig et al., 2009; removal of metabolite cross contribution to ISTDs)
Quality control (QC) or reference sample
• Batch ratio (mean, median)
• Loess (doi:10.1038/nprot.2011.335; locally estimated scatterplot smoothing)
• Hierarchical mixed effects (Jauhiainen et al. 2014)
• Quantile (Bolstad et al., 2003; minimize variance in metabolite distribution)
Variance Based
• RUV-2 (De Livera et al., 2012; variance removal for hypothesis testing)
• Variance stabilizing normalization (Huber et al. 2002)
Evaluation of Normalizations
Use QC to define:
• Median within batch %RSD
• Median analyte study wide %RSD
• All normalization specific parameters
• Split QCs into training and test set
• Optimize tuning parameters using leave-one-out
cross-validation
• Assess performance on test set
Image: http://pingax.com/regularization-implementation-r/?utm_source=rss&utm_medium=rss&utm_campaign=regularization-implementation-r
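The evaluation scheme above (split QCs, tune by leave-one-out cross-validation, assess on the held-out QCs) can be sketched as follows. The k-nearest-neighbour smoother is only a stand-in for whichever tunable normalization model is being evaluated; all function names, the 30% test fraction, and the mean absolute error metric are assumptions.

```python
import random
from statistics import mean

def knn_smooth(x_train, y_train, x0, k):
    """Stand-in tunable model: mean of the k training points nearest to x0."""
    nearest = sorted(zip(x_train, y_train), key=lambda p: abs(p[0] - x0))[:k]
    return mean(y for _, y in nearest)

def loo_cv_error(x, y, k):
    """Leave-one-out mean absolute prediction error for a given k."""
    errors = []
    for i in range(len(x)):
        x_rest = x[:i] + x[i + 1:]
        y_rest = y[:i] + y[i + 1:]
        errors.append(abs(knn_smooth(x_rest, y_rest, x[i], k) - y[i]))
    return mean(errors)

def tune_and_test(qc_x, qc_y, candidates, test_fraction=0.3, seed=0):
    """Split QCs, pick k by leave-one-out CV on the training QCs, score on test QCs."""
    idx = list(range(len(qc_x)))
    random.Random(seed).shuffle(idx)
    n_test = max(1, int(len(idx) * test_fraction))
    test, train = sorted(idx[:n_test]), sorted(idx[n_test:])
    tx = [qc_x[i] for i in train]
    ty = [qc_y[i] for i in train]
    best_k = min(candidates, key=lambda k: loo_cv_error(tx, ty, k))
    test_error = mean(abs(knn_smooth(tx, ty, qc_x[i], best_k) - qc_y[i]) for i in test)
    return best_k, test_error

# Hypothetical QC series with a linear drift over acquisition order
qc_x = list(range(20))
qc_y = [100.0 + 2.0 * t for t in qc_x]
candidates = [1, 3, 5, 11]
best_k, test_error = tune_and_test(qc_x, qc_y, candidates)
```

The test QCs are never used for tuning, so the test error gives an estimate of performance on unseen QC injections.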
Scalar Normalization
Calculate sample-
specific scalar to ensure
each sample’s (sum,
mean, median, etc)
signal is equivalent
• Using sum signal
normalization (sum
norm) assumes
equivalent total
metabolite signal per
sample
• Can correct for batch
effects when valid
BMC Bioinformatics 2007, 8:93 doi:10.1186/1471-2105-8-93
These normalizations may hide true
biological trends or create false ones
After sum norm phospholipids
seem lower in ob/ob when in
reality these are the same as
in wt samples
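A toy illustration of this artifact, using made-up intensities: the phospholipid signal is identical in both groups, while a second metabolite (here a hypothetical triglyceride signal) is three times higher in ob/ob. The function name and all values are assumptions.

```python
def sum_normalize(sample):
    """Scale one sample so that its metabolite signals sum to 1."""
    total = sum(sample.values())
    return {name: value / total for name, value in sample.items()}

# Hypothetical raw intensities: same phospholipid level, different triglyceride level
wt = {"phospholipid": 100.0, "triglyceride": 100.0}
ob = {"phospholipid": 100.0, "triglyceride": 300.0}

wt_norm = sum_normalize(wt)
ob_norm = sum_normalize(ob)

# After sum normalization the unchanged phospholipid appears lower in ob/ob
phospholipid_ratio = ob_norm["phospholipid"] / wt_norm["phospholipid"]
```

Because the total signal differs between groups, dividing by it turns a real change in one metabolite into an apparent change in the others.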
Batch Ratio (BR) Normalization
Use QCs to calculate:
1. batch/analyte specific
correction factor =
(batch median /global
median)
2. Apply ratio to samples
• simple
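The two steps above can be sketched for a single analyte as follows. The function name and argument layout are assumptions, and "apply ratio" is interpreted here as dividing each measurement by its batch correction factor.

```python
from statistics import median

def batch_ratio_normalize(values, batches, is_qc):
    """Divide each measurement of one analyte by its batch correction factor.

    Correction factor = median of the batch's QCs / median of all QCs.
    Every batch must contain at least one QC, otherwise median() raises.
    """
    global_median = median(v for v, q in zip(values, is_qc) if q)
    factors = {}
    for b in set(batches):
        qc_b = [v for v, bb, q in zip(values, batches, is_qc) if q and bb == b]
        factors[b] = median(qc_b) / global_median
    return [v / factors[b] for v, b in zip(values, batches)]

# Hypothetical data: batch B reads twice as high as batch A
values = [100.0, 90.0, 100.0, 200.0, 180.0, 200.0]
batches = ["A", "A", "A", "B", "B", "B"]
is_qc = [True, False, True, True, False, True]

normalized = batch_ratio_normalize(values, batches, is_qc)
```

After correction the QCs and the samples from both batches sit on the same scale in this example.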
LOESS Normalization (local smoothing)
For each analyte use QCs to:
• Tune LOESS model (span or degree of smoothing)
• LOESS model to remove analytical variance from samples
raw LOESS normalized
LOESS Normalization
LOESS span has a large effect on model fit
span (α) defines the degree of
smoothing and is critical for
controlling overfitting
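A simplified sketch of QC-based LOESS normalization for one analyte follows. It uses a degree-1 local regression with tricube weights and no robustness iterations, so it is not equivalent to a full LOESS implementation. The function names, the multiplicative correction (sample / fitted QC trend * QC median), and the example data are assumptions.

```python
from statistics import median

def loess_predict(x_train, y_train, x_new, span=0.75):
    """Degree-1 local regression with tricube weights (no robustness iterations)."""
    n = len(x_train)
    k = min(n, max(3, round(span * n)))  # number of neighbours in each local fit
    fitted = []
    for x0 in x_new:
        dist = [abs(x - x0) for x in x_train]
        h = max(sorted(dist)[k - 1], 1e-12)  # bandwidth: distance to k-th nearest QC
        w = [(1.0 - min(d / h, 1.0) ** 3) ** 3 for d in dist]
        if sum(w) == 0.0:  # all neighbours sit exactly on the bandwidth edge
            w = [1.0 if d <= h else 0.0 for d in dist]
        sw = sum(w)
        x_bar = sum(wi * x for wi, x in zip(w, x_train)) / sw
        y_bar = sum(wi * y for wi, y in zip(w, y_train)) / sw
        sxx = sum(wi * (x - x_bar) ** 2 for wi, x in zip(w, x_train))
        sxy = sum(wi * (x - x_bar) * (y - y_bar)
                  for wi, x, y in zip(w, x_train, y_train))
        slope = sxy / sxx if sxx > 1e-12 else 0.0  # fall back to a weighted mean
        fitted.append(y_bar + slope * (x0 - x_bar))
    return fitted

def loess_normalize(qc_order, qc_values, sample_order, sample_values, span=0.75):
    """Fit the drift on QCs, then rescale samples to the QC median."""
    trend = loess_predict(qc_order, qc_values, sample_order, span)
    target = median(qc_values)
    return [v / t * target for v, t in zip(sample_values, trend)]

# Hypothetical drift: signal rises 2% per 1 unit of acquisition order
qc_order = [0, 5, 10, 15, 20]
qc_values = [100.0, 110.0, 120.0, 130.0, 140.0]
sample_order = [2, 7, 12]
sample_values = [52.0, 57.0, 62.0]  # same true level, distorted by the drift

normalized = loess_normalize(qc_order, qc_values, sample_order, sample_values, span=1.0)
```

A smaller span uses fewer neighbouring QCs per local fit, so the curve follows individual QCs more closely and is more prone to overfitting.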
LOESS Normalization
raw samples (red) normalized based on QCs (black)
model is trained on QCs and applied to samples
span: too high just right?
Cannot assume convergence of training and test performance because
test data has analytical + biological variance
LOESS Normalization
Avoiding overfitting is critical when using LOESS normalization
Example LOESS Normalization
raw span =0.75 span =0.005
Metabolomic Data Case Study I
GC-TOF
• 310 metabolites for 4930 samples
• 132 batches
• ~41 samples per batch
• ~1:10 QCs/samples (487 QCs or 9%)
• No Internal Standards (ISTDs)
Normalizations Implemented
• Batch ratio
• LOESS
• Sum known metabolite signal (mTIC) normalization
Batch Performance (GC-TOF Raw)
Within batch
• Median: 26
• Min: 19
• Max: 69
Median RSD   Count   Cumulative %
10-20            3              2
20-30           98             76
30-40           26             96
40-50            3             98
50-60            1             99
60-70            1            100
Median RSD   Count   Cumulative %
0-10            10              3
10-20           83             30
20-30          100             62
30-40           69             84
40-50           32             94
50-60            6             96
60-70            3             97
70-80            5             98
80-90            1             99
90-100           1            100
Analyte Performance (GC-TOF Raw)
Within Batch
• Median: 24
• Min: 7
• Max: 79
PCA (GC-TOF Raw)
Within batches
• Median: 23
• Min: 17
• Max: 69
Median RSD   Count   Cumulative %
10-20           25             23
20-30           67             85
30-40           15             99
40-50            1            100
60-70            1            101
Batch Performance (GC-TOF BR)
Median RSD   Count   Cumulative %
0-10            17              6
10-20          103             39
20-30          112             75
30-40           57             93
40-50           12             97
50-60            5             99
60-70            3            100
70-80            1            100
Across batches
• Median: 24
• Min: 7
• Max: 79
Batch Performance (GC-TOF BR)
PCA (GC-TOF BR)
BR Normalization Limitations
• Very susceptible to
outliers
• Requires many QCs
• Can inflate variance
when training and test
set trends do not
match
Within batches
• Median: 19
• Min: 11
• Max: 58
Median RSD   Count   Cumulative %
10-20           75             57
20-30           51             96
30-40            4             99
40-50            1             99
50-60            1            100
Batch Performance (GC-TOF LOESS)
Median RSD   Count   Cumulative %
0-10            17              6
10-20          103             39
20-30          112             75
30-40           57             93
40-50           12             97
50-60            5             99
60-70            3            100
70-80            1            100
Across batches
• Median: 19
• Min: 2.9
• Max: 66
Batch Performance (GC-TOF LOESS)
PCA (GC-TOF LOESS)
LOESS Normalization Limitations
raw normalized
LOESS normalization can
inflate variance when:
• overtrained
• training examples do
not match test set
Sum mTIC Normalization (GC-TOF)
Improved performance over
raw and BR, but alters data
from magnitudinal to
compositional
Sum mTIC Normalization (GC-TOF)
Poor removal of trends due to acquisition time, but limits magnitude of
outlier samples compared to other approaches
time
Raw
mTIC Normalized
Metabolomic Data Case
Study II
LC-Q-TOF
• 340+ metabolites for 4930 samples
• 132 batches
• ~41 samples per batch
• ~1:10 QC/samples (524 QCs or 11%)
• NIST reference (63 or 1%)
• 14 internal standards (ISTDs)
• NOMIS (IS = ISTD)
• qcISTD
Internal Standards Normalization
Analyte
Retention time
Internal standards (ISTD)
• qcISTD (QC optimized
metabolite/ISTD)
• NOMIS (Sysi-Aho et al., 2007;
selection of optimal combination of ISTDs)
• CCRMN (Redestig et al., 2009;
removal of metabolite cross contribution
to ISTDs)
NOMIS
ISTD Based Normalizations (LC/Q-TOF)
• NOMIS (linear combination of optimal ISTDs;
Sysi-Aho et al., 2007)
• qcISTD (QC optimized ISTD strategy)
PC 38:6
Poor
performance
with NOMIS
qcISTD Normalization
Use QC samples to:
1. Evaluate analyte %RSD
before and after corrections
using all ISTDs
2. Select analyte/ISTD
combinations with %RSD
improvement over raw data
at some threshold (e.g. 10%)
3. Correct sample analytes
with QC defined ISTD if ISTD
recovery is above some
minimal threshold (e.g. >
20% of median)
• Subject to overfitting
191 of 326 (60%) are
ISTD corrected
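The three qcISTD steps above can be sketched for one analyte as follows. This is an illustration under stated assumptions, not the author's code: the function names are hypothetical, the 10% threshold is read as an absolute reduction in %RSD points, and corrected values are rescaled by the ISTD's QC median so that corrected and uncorrected values stay on a similar scale.

```python
from statistics import mean, stdev

def percent_rsd(values):
    """Relative standard deviation as a percentage."""
    return 100.0 * stdev(values) / mean(values)

def select_istd(qc_analyte, qc_istds, min_improvement=10.0):
    """Pick the ISTD whose ratio gives the lowest QC %RSD.

    Returns None when no ISTD lowers %RSD by at least min_improvement points.
    """
    best_name = None
    best_rsd = percent_rsd(qc_analyte) - min_improvement
    for name, istd in qc_istds.items():
        ratio_rsd = percent_rsd([a / s for a, s in zip(qc_analyte, istd)])
        if ratio_rsd <= best_rsd:
            best_name, best_rsd = name, ratio_rsd
    return best_name

def correct_sample(analyte, istd_value, istd_qc_median, min_recovery=0.2):
    """Ratio-correct one sample if ISTD recovery exceeds the threshold, else keep raw."""
    if istd_value > min_recovery * istd_qc_median:
        return analyte * istd_qc_median / istd_value
    return analyte

# Hypothetical QC data: istd_A tracks the analyte's analytical variation, istd_B does not
qc_analyte = [100.0, 120.0, 80.0, 110.0]
qc_istds = {
    "istd_A": [1.0, 1.2, 0.8, 1.1],
    "istd_B": [1.0, 1.0, 1.0, 1.0],
}
chosen = select_istd(qc_analyte, qc_istds)
```

Because the analyte/ISTD pairing is chosen on the same QCs used to judge it, the selection should be checked on held-out QCs, which is the overfitting risk the slide points out.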
qcISTD Normalization
ISTD used by retention time (Rt) Total number of analytes corrected by ISTD
Optimal Lipidomic ISTDs
Normalizations (LC-Q-TOF)
LOESS performs very
poorly for two
metabolites
• qcISTD performs better than LOESS
• qcISTD + LOESS leads to highest replicate
precision
PCA (LC/Q-TOF)
Raw (%RSD = 13) qcISTD (9)
LOESS (12)
qcISTD +
LOESS (8)
Only normalizations that
include LOESS effectively
remove analytical batch
effects
Conclusion
• Comparison of common data normalization approaches
suggests that in addition to ISTD corrections, LOESS
(analyte-specific, non-linear adjustment based on QC
performance at various data acquisition times) is superior
to batch based corrections.
• Further validations need to be completed to confirm the
effects of normalizations on samples’ variance
• These findings suggest that inclusion of “batch” as a
covariate in statistical models will not fully account for
analytical variance
R code for all normalization functions can be found at:
https://github.com/dgrapov/devium/blob/master/R/Devium%20Normalization.r
dgrapov@ucdavis.edu
metabolomics.ucdavis.edu
This research was supported in part by NIH 1 U24 DK097154

Weitere ähnliche Inhalte

Was ist angesagt?

Role of bioinformatics in life sciences research
Role of bioinformatics in life sciences researchRole of bioinformatics in life sciences research
Role of bioinformatics in life sciences researchAnshika Bansal
 
Bioinformatics data mining
Bioinformatics data miningBioinformatics data mining
Bioinformatics data miningSangeeta Das
 
RNASeq - Analysis Pipeline for Differential Expression
RNASeq - Analysis Pipeline for Differential ExpressionRNASeq - Analysis Pipeline for Differential Expression
RNASeq - Analysis Pipeline for Differential ExpressionJatinder Singh
 
Dna sequencing (bacteriophage m13 and primer walking)
Dna sequencing (bacteriophage m13 and primer walking)Dna sequencing (bacteriophage m13 and primer walking)
Dna sequencing (bacteriophage m13 and primer walking)Shivani Thorat
 
Gene sequencing methods
Gene sequencing methodsGene sequencing methods
Gene sequencing methodsDeepak Kumar
 
Genomics, Transcriptomics, Proteomics, Metabolomics - Basic concepts for clin...
Genomics, Transcriptomics, Proteomics, Metabolomics - Basic concepts for clin...Genomics, Transcriptomics, Proteomics, Metabolomics - Basic concepts for clin...
Genomics, Transcriptomics, Proteomics, Metabolomics - Basic concepts for clin...Prasenjit Mitra
 
Database in bioinformatics
Database in bioinformaticsDatabase in bioinformatics
Database in bioinformaticsVinaKhan1
 
Light Intro to the Gene Ontology
Light Intro to the Gene OntologyLight Intro to the Gene Ontology
Light Intro to the Gene Ontologynniiicc
 
The ensembl database
The ensembl databaseThe ensembl database
The ensembl databaseAshfaq Ahmad
 
Illumina (sequencing by synthesis) method
Illumina (sequencing by synthesis) methodIllumina (sequencing by synthesis) method
Illumina (sequencing by synthesis) methodFekaduKorsa
 
bone tissue engineering
bone tissue engineeringbone tissue engineering
bone tissue engineeringSomdutt Sharma
 
Informal presentation on bioinformatics
Informal presentation on bioinformaticsInformal presentation on bioinformatics
Informal presentation on bioinformaticsAtai Rabby
 
Xenotransplantion
 Xenotransplantion Xenotransplantion
XenotransplantionAchyut Bora
 

Was ist angesagt? (20)

Role of bioinformatics in life sciences research
Role of bioinformatics in life sciences researchRole of bioinformatics in life sciences research
Role of bioinformatics in life sciences research
 
Evaluating Published Research
Evaluating Published ResearchEvaluating Published Research
Evaluating Published Research
 
Bioinformatics data mining
Bioinformatics data miningBioinformatics data mining
Bioinformatics data mining
 
RNASeq - Analysis Pipeline for Differential Expression
RNASeq - Analysis Pipeline for Differential ExpressionRNASeq - Analysis Pipeline for Differential Expression
RNASeq - Analysis Pipeline for Differential Expression
 
genetic variation
genetic variationgenetic variation
genetic variation
 
Dna sequencing (bacteriophage m13 and primer walking)
Dna sequencing (bacteriophage m13 and primer walking)Dna sequencing (bacteriophage m13 and primer walking)
Dna sequencing (bacteriophage m13 and primer walking)
 
Nanopore Sequencing
Nanopore SequencingNanopore Sequencing
Nanopore Sequencing
 
Bio-engineering
Bio-engineeringBio-engineering
Bio-engineering
 
Gene sequencing methods
Gene sequencing methodsGene sequencing methods
Gene sequencing methods
 
Genomics, Transcriptomics, Proteomics, Metabolomics - Basic concepts for clin...
Genomics, Transcriptomics, Proteomics, Metabolomics - Basic concepts for clin...Genomics, Transcriptomics, Proteomics, Metabolomics - Basic concepts for clin...
Genomics, Transcriptomics, Proteomics, Metabolomics - Basic concepts for clin...
 
Database in bioinformatics
Database in bioinformaticsDatabase in bioinformatics
Database in bioinformatics
 
Light Intro to the Gene Ontology
Light Intro to the Gene OntologyLight Intro to the Gene Ontology
Light Intro to the Gene Ontology
 
The ensembl database
The ensembl databaseThe ensembl database
The ensembl database
 
Illumina (sequencing by synthesis) method
Illumina (sequencing by synthesis) methodIllumina (sequencing by synthesis) method
Illumina (sequencing by synthesis) method
 
Knock out technology (final)
Knock out technology (final)Knock out technology (final)
Knock out technology (final)
 
bone tissue engineering
bone tissue engineeringbone tissue engineering
bone tissue engineering
 
Informal presentation on bioinformatics
Informal presentation on bioinformaticsInformal presentation on bioinformatics
Informal presentation on bioinformatics
 
DNAzymes
DNAzymesDNAzymes
DNAzymes
 
Xenotransplantion
 Xenotransplantion Xenotransplantion
Xenotransplantion
 
Biological networks
Biological networksBiological networks
Biological networks
 

Andere mochten auch

5 data analysis case study
5  data analysis case study5  data analysis case study
5 data analysis case studyDmitry Grapov
 
4 partial least squares modeling
4  partial least squares modeling4  partial least squares modeling
4 partial least squares modelingDmitry Grapov
 
6 metabolite enrichment analysis
6  metabolite enrichment analysis6  metabolite enrichment analysis
6 metabolite enrichment analysisDmitry Grapov
 
3 principal components analysis
3  principal components analysis3  principal components analysis
3 principal components analysisDmitry Grapov
 
Multivarite and network tools for biological data analysis
Multivarite and network tools for biological data analysisMultivarite and network tools for biological data analysis
Multivarite and network tools for biological data analysisDmitry Grapov
 
1 statistical analysis
1  statistical analysis1  statistical analysis
1 statistical analysisDmitry Grapov
 

Andere mochten auch (9)

2 cluster analysis
2  cluster analysis2  cluster analysis
2 cluster analysis
 
7 network mapping i
7  network mapping i7  network mapping i
7 network mapping i
 
5 data analysis case study
5  data analysis case study5  data analysis case study
5 data analysis case study
 
0 introduction
0  introduction0  introduction
0 introduction
 
4 partial least squares modeling
4  partial least squares modeling4  partial least squares modeling
4 partial least squares modeling
 
6 metabolite enrichment analysis
6  metabolite enrichment analysis6  metabolite enrichment analysis
6 metabolite enrichment analysis
 
3 principal components analysis
3  principal components analysis3  principal components analysis
3 principal components analysis
 
Multivarite and network tools for biological data analysis
Multivarite and network tools for biological data analysisMultivarite and network tools for biological data analysis
Multivarite and network tools for biological data analysis
 
1 statistical analysis
1  statistical analysis1  statistical analysis
1 statistical analysis
 

Ähnlich wie Data Normalization Approaches for Large-scale Biological Studies

Normalization of Large-Scale Metabolomic Studies 2014
Normalization of Large-Scale Metabolomic Studies 2014Normalization of Large-Scale Metabolomic Studies 2014
Normalization of Large-Scale Metabolomic Studies 2014Dmitry Grapov
 
Bagley_HNRS_CRM_talk_2015
Bagley_HNRS_CRM_talk_2015Bagley_HNRS_CRM_talk_2015
Bagley_HNRS_CRM_talk_2015Thomas Bagley
 
Multivariate Analysis and Visualization of Proteomic Data
Multivariate Analysis and Visualization of Proteomic DataMultivariate Analysis and Visualization of Proteomic Data
Multivariate Analysis and Visualization of Proteomic DataUC Davis
 
Analytical mehod validation explained sadasiva
Analytical mehod validation explained sadasivaAnalytical mehod validation explained sadasiva
Analytical mehod validation explained sadasivaSada Siva Rao Maddiguntla
 
Analytical mehod validation explained sadasiva
Analytical mehod validation explained sadasivaAnalytical mehod validation explained sadasiva
Analytical mehod validation explained sadasivaSada Siva Rao Maddiguntla
 
Analytical Method Validation.pptx
Analytical Method Validation.pptxAnalytical Method Validation.pptx
Analytical Method Validation.pptxBholakant raut
 
Evaluation of methods in clinical laboratory
Evaluation of methods in clinical laboratoryEvaluation of methods in clinical laboratory
Evaluation of methods in clinical laboratoryDrMAnwar2
 
Bagley_HNRS_CRM_talk_2015
Bagley_HNRS_CRM_talk_2015Bagley_HNRS_CRM_talk_2015
Bagley_HNRS_CRM_talk_2015Thomas Bagley
 
Good laboratory practices. Internal quality control by z score approach
Good laboratory practices. Internal quality control by z score approachGood laboratory practices. Internal quality control by z score approach
Good laboratory practices. Internal quality control by z score approachSoils FAO-GSP
 
Quality Control for Quantitative Tests by Prof Aamir Ijaz (Pakistan)
Quality Control for Quantitative Tests by Prof Aamir Ijaz (Pakistan)Quality Control for Quantitative Tests by Prof Aamir Ijaz (Pakistan)
Quality Control for Quantitative Tests by Prof Aamir Ijaz (Pakistan)Aamir Ijaz Brig
 
Qualification of HPLC & LCMS.pptxfjddjdjdhdjdjj
Qualification of HPLC & LCMS.pptxfjddjdjdhdjdjjQualification of HPLC & LCMS.pptxfjddjdjdhdjdjj
Qualification of HPLC & LCMS.pptxfjddjdjdhdjdjjPratik434909
 
Qualification of HPLC & LCMS.pptdjdjdjdjfjkfx
Qualification of HPLC & LCMS.pptdjdjdjdjfjkfxQualification of HPLC & LCMS.pptdjdjdjdjfjkfx
Qualification of HPLC & LCMS.pptdjdjdjdjfjkfxPratik434909
 
Analytical QBD -CPHI 25-27 July R00
Analytical QBD  -CPHI 25-27 July R00Analytical QBD  -CPHI 25-27 July R00
Analytical QBD -CPHI 25-27 July R00Vijay Dhonde
 
From Screening to QC: Development Considerations for Octet Methods
From Screening to QC: Development Considerations for Octet MethodsFrom Screening to QC: Development Considerations for Octet Methods
From Screening to QC: Development Considerations for Octet MethodsKBI Biopharma
 
Quantitation techniques used in chromatography
Quantitation techniques used in chromatographyQuantitation techniques used in chromatography
Quantitation techniques used in chromatographyVrushali Tambe
 
Biological variation as an uncertainty component
Biological variation as an uncertainty componentBiological variation as an uncertainty component
Biological variation as an uncertainty componentGH Yeoh
 
Case Study: Overview of Metabolomic Data Normalization Strategies
Case Study: Overview of Metabolomic Data Normalization StrategiesCase Study: Overview of Metabolomic Data Normalization Strategies
Case Study: Overview of Metabolomic Data Normalization StrategiesDmitry Grapov
 
Bioequivalence of Highly Variable Drug Products
Bioequivalence of Highly Variable Drug ProductsBioequivalence of Highly Variable Drug Products
Bioequivalence of Highly Variable Drug ProductsBhaswat Chakraborty
 
INSTRUMENTAL ANALYSIS INTRODUCTION
INSTRUMENTAL ANALYSIS INTRODUCTIONINSTRUMENTAL ANALYSIS INTRODUCTION
INSTRUMENTAL ANALYSIS INTRODUCTIONHamunyare Ndwabe
 

Ähnlich wie Data Normalization Approaches for Large-scale Biological Studies (20)

Normalization of Large-Scale Metabolomic Studies 2014
Normalization of Large-Scale Metabolomic Studies 2014Normalization of Large-Scale Metabolomic Studies 2014
Normalization of Large-Scale Metabolomic Studies 2014
 
Bagley_HNRS_CRM_talk_2015
Bagley_HNRS_CRM_talk_2015Bagley_HNRS_CRM_talk_2015
Bagley_HNRS_CRM_talk_2015
 
Multivariate Analysis and Visualization of Proteomic Data
Multivariate Analysis and Visualization of Proteomic DataMultivariate Analysis and Visualization of Proteomic Data
Multivariate Analysis and Visualization of Proteomic Data
 
Analytical mehod validation explained sadasiva
Analytical mehod validation explained sadasivaAnalytical mehod validation explained sadasiva
Analytical mehod validation explained sadasiva
 
Analytical mehod validation explained sadasiva
Analytical mehod validation explained sadasivaAnalytical mehod validation explained sadasiva
Analytical mehod validation explained sadasiva
 
Analytical Method Validation.pptx
Analytical Method Validation.pptxAnalytical Method Validation.pptx
Analytical Method Validation.pptx
 
Evaluation of methods in clinical laboratory
Evaluation of methods in clinical laboratoryEvaluation of methods in clinical laboratory
Evaluation of methods in clinical laboratory
 
Bagley_HNRS_CRM_talk_2015
Bagley_HNRS_CRM_talk_2015Bagley_HNRS_CRM_talk_2015
Bagley_HNRS_CRM_talk_2015
 
ICP QC protocol
ICP  QC  protocolICP  QC  protocol
ICP QC protocol
 
Good laboratory practices. Internal quality control by z score approach
Good laboratory practices. Internal quality control by z score approachGood laboratory practices. Internal quality control by z score approach
Good laboratory practices. Internal quality control by z score approach
 
Quality Control for Quantitative Tests by Prof Aamir Ijaz (Pakistan)
Quality Control for Quantitative Tests by Prof Aamir Ijaz (Pakistan)Quality Control for Quantitative Tests by Prof Aamir Ijaz (Pakistan)
Quality Control for Quantitative Tests by Prof Aamir Ijaz (Pakistan)
 
Qualification of HPLC & LCMS.pptxfjddjdjdhdjdjj
Qualification of HPLC & LCMS.pptxfjddjdjdhdjdjjQualification of HPLC & LCMS.pptxfjddjdjdhdjdjj
Qualification of HPLC & LCMS.pptxfjddjdjdhdjdjj
 
Qualification of HPLC & LCMS.pptdjdjdjdjfjkfx
Qualification of HPLC & LCMS.pptdjdjdjdjfjkfxQualification of HPLC & LCMS.pptdjdjdjdjfjkfx
Qualification of HPLC & LCMS.pptdjdjdjdjfjkfx
 
Analytical QBD -CPHI 25-27 July R00
Analytical QBD  -CPHI 25-27 July R00Analytical QBD  -CPHI 25-27 July R00
Analytical QBD -CPHI 25-27 July R00
 
From Screening to QC: Development Considerations for Octet Methods
From Screening to QC: Development Considerations for Octet MethodsFrom Screening to QC: Development Considerations for Octet Methods
From Screening to QC: Development Considerations for Octet Methods
 
Quantitation techniques used in chromatography
Quantitation techniques used in chromatographyQuantitation techniques used in chromatography
Quantitation techniques used in chromatography
 
Biological variation as an uncertainty component
Biological variation as an uncertainty componentBiological variation as an uncertainty component
Biological variation as an uncertainty component
 
Case Study: Overview of Metabolomic Data Normalization Strategies
Case Study: Overview of Metabolomic Data Normalization StrategiesCase Study: Overview of Metabolomic Data Normalization Strategies
Case Study: Overview of Metabolomic Data Normalization Strategies
 
Bioequivalence of Highly Variable Drug Products
Bioequivalence of Highly Variable Drug ProductsBioequivalence of Highly Variable Drug Products
Bioequivalence of Highly Variable Drug Products
 
INSTRUMENTAL ANALYSIS INTRODUCTION
INSTRUMENTAL ANALYSIS INTRODUCTIONINSTRUMENTAL ANALYSIS INTRODUCTION
INSTRUMENTAL ANALYSIS INTRODUCTION
 

Mehr von Dmitry Grapov

R programming for Data Science - A Beginner’s Guide
R programming for Data Science - A Beginner’s GuideR programming for Data Science - A Beginner’s Guide
R programming for Data Science - A Beginner’s GuideDmitry Grapov
 
Network mapping 101 course
Network mapping 101 courseNetwork mapping 101 course
Network mapping 101 courseDmitry Grapov
 
Rise of Deep Learning for Genomic, Proteomic, and Metabolomic Data Integratio...
Rise of Deep Learning for Genomic, Proteomic, and Metabolomic Data Integratio...Rise of Deep Learning for Genomic, Proteomic, and Metabolomic Data Integratio...
Rise of Deep Learning for Genomic, Proteomic, and Metabolomic Data Integratio...Dmitry Grapov
 
Dmitry Grapov Resume and CV
Dmitry Grapov Resume and CVDmitry Grapov Resume and CV
Dmitry Grapov Resume and CVDmitry Grapov
 
Machine Learning Powered Metabolomic Network Analysis
Machine Learning Powered Metabolomic Network AnalysisMachine Learning Powered Metabolomic Network Analysis
Machine Learning Powered Metabolomic Network AnalysisDmitry Grapov
 
Complex Systems Biology Informed Data Analysis and Machine Learning
Complex Systems Biology Informed Data Analysis and Machine LearningComplex Systems Biology Informed Data Analysis and Machine Learning
Complex Systems Biology Informed Data Analysis and Machine LearningDmitry Grapov
 
Data analysis workflows part 1 2015
Data analysis workflows part 1 2015Data analysis workflows part 1 2015
Data analysis workflows part 1 2015Dmitry Grapov
 
Data analysis workflows part 2 2015
Data analysis workflows part 2 2015Data analysis workflows part 2 2015
Data analysis workflows part 2 2015Dmitry Grapov
 
Metabolomics and Beyond Challenges and Strategies for Next-gen Omic Analyses
Metabolomics and Beyond Challenges and Strategies for Next-gen Omic Analyses Metabolomics and Beyond Challenges and Strategies for Next-gen Omic Analyses
Metabolomics and Beyond Challenges and Strategies for Next-gen Omic Analyses Dmitry Grapov
 
Mapping to the Metabolomic Manifold
Mapping to the Metabolomic ManifoldMapping to the Metabolomic Manifold
Mapping to the Metabolomic ManifoldDmitry Grapov
 
3 data normalization (2014 lab tutorial)
3  data normalization (2014 lab tutorial)3  data normalization (2014 lab tutorial)
3 data normalization (2014 lab tutorial)Dmitry Grapov
 
Metabolomic Data Analysis Workshop and Tutorials (2014)
Metabolomic Data Analysis Workshop and Tutorials (2014)Metabolomic Data Analysis Workshop and Tutorials (2014)
Metabolomic Data Analysis Workshop and Tutorials (2014)Dmitry Grapov
 
Gene Ontology Enrichment Network Analysis -Tutorial
Gene Ontology Enrichment Network Analysis -TutorialGene Ontology Enrichment Network Analysis -Tutorial
Gene Ontology Enrichment Network Analysis -TutorialDmitry Grapov
 
Prote-OMIC Data Analysis and Visualization
Prote-OMIC Data Analysis and VisualizationProte-OMIC Data Analysis and Visualization
Prote-OMIC Data Analysis and VisualizationDmitry Grapov
 
American Society of Mass Spectrommetry Conference 2014
American Society of Mass Spectrommetry Conference 2014American Society of Mass Spectrommetry Conference 2014
American Society of Mass Spectrommetry Conference 2014Dmitry Grapov
 
Omic Data Integration Strategies
Omic Data Integration StrategiesOmic Data Integration Strategies
Omic Data Integration StrategiesDmitry Grapov
 
Automation of (Biological) Data Analysis and Report Generation
Automation of (Biological) Data Analysis and Report GenerationAutomation of (Biological) Data Analysis and Report Generation
Automation of (Biological) Data Analysis and Report GenerationDmitry Grapov
 
Metabolomic data analysis and visualization tools
Metabolomic data analysis and visualization toolsMetabolomic data analysis and visualization tools
Metabolomic data analysis and visualization toolsDmitry Grapov
 
High Dimensional Biological Data Analysis and Visualization
High Dimensional Biological Data Analysis and VisualizationHigh Dimensional Biological Data Analysis and Visualization
High Dimensional Biological Data Analysis and VisualizationDmitry Grapov
 

Mehr von Dmitry Grapov (20)

R programming for Data Science - A Beginner’s Guide
R programming for Data Science - A Beginner’s GuideR programming for Data Science - A Beginner’s Guide
R programming for Data Science - A Beginner’s Guide
 
Network mapping 101 course
Network mapping 101 courseNetwork mapping 101 course
Network mapping 101 course
 
Rise of Deep Learning for Genomic, Proteomic, and Metabolomic Data Integratio...
Rise of Deep Learning for Genomic, Proteomic, and Metabolomic Data Integratio...Rise of Deep Learning for Genomic, Proteomic, and Metabolomic Data Integratio...
Rise of Deep Learning for Genomic, Proteomic, and Metabolomic Data Integratio...
 
Dmitry Grapov Resume and CV
Dmitry Grapov Resume and CVDmitry Grapov Resume and CV
Dmitry Grapov Resume and CV
 
Machine Learning Powered Metabolomic Network Analysis
Machine Learning Powered Metabolomic Network AnalysisMachine Learning Powered Metabolomic Network Analysis
Machine Learning Powered Metabolomic Network Analysis
 
Complex Systems Biology Informed Data Analysis and Machine Learning
Complex Systems Biology Informed Data Analysis and Machine LearningComplex Systems Biology Informed Data Analysis and Machine Learning
Complex Systems Biology Informed Data Analysis and Machine Learning
 
Data analysis workflows part 1 2015
Data analysis workflows part 1 2015Data analysis workflows part 1 2015
Data analysis workflows part 1 2015
 
Data analysis workflows part 2 2015
Data analysis workflows part 2 2015Data analysis workflows part 2 2015
Data analysis workflows part 2 2015
 
Metabolomics and Beyond Challenges and Strategies for Next-gen Omic Analyses
Metabolomics and Beyond Challenges and Strategies for Next-gen Omic Analyses Metabolomics and Beyond Challenges and Strategies for Next-gen Omic Analyses
Metabolomics and Beyond Challenges and Strategies for Next-gen Omic Analyses
 
Modeling poster
Modeling posterModeling poster
Modeling poster
 
Mapping to the Metabolomic Manifold
Mapping to the Metabolomic ManifoldMapping to the Metabolomic Manifold
Mapping to the Metabolomic Manifold
 
3 data normalization (2014 lab tutorial)
3  data normalization (2014 lab tutorial)3  data normalization (2014 lab tutorial)
3 data normalization (2014 lab tutorial)
 
Metabolomic Data Analysis Workshop and Tutorials (2014)
Metabolomic Data Analysis Workshop and Tutorials (2014)Metabolomic Data Analysis Workshop and Tutorials (2014)
Metabolomic Data Analysis Workshop and Tutorials (2014)
 
Gene Ontology Enrichment Network Analysis -Tutorial
Gene Ontology Enrichment Network Analysis -TutorialGene Ontology Enrichment Network Analysis -Tutorial
Gene Ontology Enrichment Network Analysis -Tutorial
 
Prote-OMIC Data Analysis and Visualization
Prote-OMIC Data Analysis and VisualizationProte-OMIC Data Analysis and Visualization
Prote-OMIC Data Analysis and Visualization
 
American Society of Mass Spectrommetry Conference 2014
American Society of Mass Spectrommetry Conference 2014American Society of Mass Spectrommetry Conference 2014
American Society of Mass Spectrommetry Conference 2014
 
Omic Data Integration Strategies
Omic Data Integration StrategiesOmic Data Integration Strategies
Omic Data Integration Strategies
 
Automation of (Biological) Data Analysis and Report Generation
Automation of (Biological) Data Analysis and Report GenerationAutomation of (Biological) Data Analysis and Report Generation
Automation of (Biological) Data Analysis and Report Generation
 
Metabolomic data analysis and visualization tools
Metabolomic data analysis and visualization toolsMetabolomic data analysis and visualization tools
Metabolomic data analysis and visualization tools
 
High Dimensional Biological Data Analysis and Visualization
High Dimensional Biological Data Analysis and VisualizationHigh Dimensional Biological Data Analysis and Visualization
High Dimensional Biological Data Analysis and Visualization
 

Data Normalization Approaches for Large-scale Biological Studies

  • 1. Data Normalization Approaches for Large-Scale Metabolomic Studies Dmitry Grapov, PhD
  • 2. Analytical Variance: variation in sample measurements stemming from sample handling, data acquisition, processing, etc.
      • Can modify or mask true biological variability
      • Calculated from the variance in replicated measurements
      • Can be accounted for using data normalization approaches
      Goal: minimize analytical variance using data normalization.
      Figure: drift in >400 replicated measurements across >100 batches.
  • 3. Need for Normalization: to remove non-biological (e.g. analytical) drift, variance and artifacts in measurements. Figure: samples and quality controls (QCs) by acquisition order and by processing/acquisition batch.
  • 4. Quantifying Data Quality (precision): calculate median inter- and intra-batch %RSD for replicated measurements. Panels: analyte-specific performance across the whole study; within-batch performance.
  • 5. Visualizing Performance: intra-batch (within) and inter-batch (across) precision for normalization methods. RSD = relative standard deviation = standard deviation / mean.
  • 6. Visualizing Metabolite Performance: univariate (by acquisition time and by batch) and multivariate (PCA).
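The precision metrics on slides 4 and 5 can be sketched as follows. This is a minimal illustration, not the author's R implementation; the function names and the toy QC values are assumptions made for the example.

```python
from statistics import mean, median, stdev

def percent_rsd(values):
    """Relative standard deviation as a percentage: 100 * sd / mean."""
    return 100.0 * stdev(values) / mean(values)

def median_within_batch_rsd(qc_values, batches):
    """Median of the per-batch %RSD for one analyte's QC measurements."""
    by_batch = {}
    for value, batch in zip(qc_values, batches):
        by_batch.setdefault(batch, []).append(value)
    # A batch needs at least two replicates to have a standard deviation.
    rsds = [percent_rsd(v) for v in by_batch.values() if len(v) >= 2]
    return median(rsds)

# Hypothetical QC intensities for one analyte in two batches with a level shift.
qc = [100.0, 110.0, 90.0, 200.0, 220.0, 180.0]
batch = ["a", "a", "a", "b", "b", "b"]

within = median_within_batch_rsd(qc, batch)  # intra-batch precision
across = percent_rsd(qc)                     # study-wide precision for this analyte
print(f"within-batch %RSD: {within:.1f}, study-wide %RSD: {across:.1f}")
```

In this toy case each batch is internally precise, so the study-wide %RSD is dominated by the shift between batches, which is the kind of variance normalization is meant to remove.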
  • 7. Common Normalization Approaches
      Sample-wise scalar corrections
        • L2 norm, mean, median, sum, etc.
      Internal standard (ISTD)
        • Ratio response (metabolite/ISTD)
        • NOMIS (Sysi-Aho et al., 2007; selection of an optimal combination of ISTDs)
        • CCRMN (Redestig et al., 2009; removal of metabolite cross-contribution to ISTDs)
      Quality control (QC) or reference sample
        • Batch ratio (mean, median)
        • Loess (doi:10.1038/nprot.2011.335; locally estimated scatterplot smoothing)
        • Hierarchical mixed effects (Jauhiainen et al., 2014)
        • Quantile (Bolstad et al., 2003; minimize variance in metabolite distribution)
      Variance based
        • RUV-2 (De Livera et al., 2012; variance removal for hypothesis testing)
        • Variance stabilizing normalization (Huber et al., 2002)
  • 8. Evaluation of Normalizations. Use QCs to define:
      • Median within-batch %RSD
      • Median analyte study-wide %RSD
      • All normalization-specific parameters
      • Split QCs into training and test sets
      • Optimize tuning parameters using leave-one-out cross-validation
      • Assess performance on the test set
      Image: http://pingax.com/regularization-implementation-r/?utm_source=rss&utm_medium=rss&utm_campaign=regularization-implementation-r
  • 9. Scalar Normalization: calculate a sample-specific scalar to ensure each sample's (sum, mean, median, etc.) signal is equivalent.
      • Sum signal normalization (sum norm) assumes equivalent total metabolite signal per sample
      • Can correct for batch effects when that assumption is valid
      • These normalizations may hide true biological trends or create false ones (BMC Bioinformatics 2007, 8:93, doi:10.1186/1471-2105-8-93)
      • Example: after sum norm, phospholipids seem lower in ob/ob samples when in reality they are the same as in wt samples
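A minimal sketch of sum normalization and of the artifact the slide warns about. The two three-metabolite samples are invented numbers, and scaling to the mean total signal is an assumption; any common target gives the same relative picture.

```python
def sum_normalize(samples, target=None):
    """Scale each sample so that its total signal equals a common target."""
    totals = [sum(s) for s in samples]
    if target is None:
        # Assumption: use the mean total signal across samples as the target.
        target = sum(totals) / len(totals)
    return [[x * target / t for x in s] for s, t in zip(samples, totals)]

# Hypothetical samples: the first two metabolites are identical in both,
# only the third is truly higher in the ob/ob sample.
wt = [10.0, 10.0, 10.0]
ob = [10.0, 10.0, 40.0]

norm_wt, norm_ob = sum_normalize([wt, ob])
print(norm_wt)  # every metabolite scaled up
print(norm_ob)  # the unchanged metabolites now look lower than in wt
```

Because the third metabolite inflates the ob/ob total, the scalar pushes the two unchanged metabolites below their wt values, creating a false difference.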
  • 10. Batch Ratio (BR) Normalization. Use QCs to:
      1. Calculate a batch- and analyte-specific correction factor = (batch median / global median)
      2. Apply the ratio to samples
      • Simple
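The two steps above can be sketched for a single analyte as follows. This assumes that "apply the ratio" means dividing every measurement in a batch by that batch's factor; the function name and the data are illustrative only.

```python
from statistics import median

def batch_ratio_normalize(values, batches, is_qc):
    """Divide each value by (QC median of its batch / QC median of all batches)."""
    global_median = median(v for v, q in zip(values, is_qc) if q)
    factors = {}
    for b in set(batches):
        # Raises StatisticsError if a batch contains no QC measurements.
        batch_qc = [v for v, bb, q in zip(values, batches, is_qc) if q and bb == b]
        factors[b] = median(batch_qc) / global_median
    return [v / factors[b] for v, b in zip(values, batches)]

# Hypothetical intensities: batch "b" reads about twice as high as batch "a".
values = [100.0, 110.0, 90.0, 105.0, 200.0, 220.0, 180.0, 210.0]
batches = ["a", "a", "a", "a", "b", "b", "b", "b"]
is_qc = [True, True, True, False, True, True, True, False]

normalized = batch_ratio_normalize(values, batches, is_qc)
print([round(v, 1) for v in normalized])
```

After correction the QC median of every batch equals the global QC median, so a single outlying QC can shift the factor for a whole batch when few QCs are available.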
  • 11. LOESS Normalization (local smoothing). For each analyte use QCs to:
      • Tune the LOESS model (span, or degree of smoothing)
      • Fit a LOESS model to remove analytical variance from samples
      Panels: raw; LOESS normalized.
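A rough sketch of QC-based LOESS correction for one analyte. The smoother below is a simplified LOESS-style fit (local linear regression with tricube weights, no robustness iterations), not the R `loess` function, and rescaling samples to the QC median is an assumption about how the fitted trend is removed.

```python
import math
from statistics import median

def loess_predict(x_train, y_train, x_new, span=0.75):
    """Predict y at x_new by local linear regression with tricube weights."""
    n = len(x_train)
    k = min(n, max(2, math.ceil(span * n)))  # neighbours used for the local fit
    nearest = sorted(range(n), key=lambda i: abs(x_train[i] - x_new))[:k]
    d_max = max(abs(x_train[i] - x_new) for i in nearest) or 1.0
    w = [(1.0 - (abs(x_train[i] - x_new) / d_max) ** 3) ** 3 for i in nearest]
    if sum(w) == 0.0:  # all neighbours equally far away: weight them equally
        w = [1.0] * k
    xs = [x_train[i] for i in nearest]
    ys = [y_train[i] for i in nearest]
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, xs)) / sw
    my = sum(wi * yi for wi, yi in zip(w, ys)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, xs))
    if sxx < 1e-12:  # effectively one distinct x: fall back to the weighted mean
        return my
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, xs, ys))
    return my + (sxy / sxx) * (x_new - mx)

def loess_normalize(qc_order, qc_values, sample_order, sample_values, span=0.75):
    """Divide each sample by the QC trend at its acquisition time, then rescale."""
    center = median(qc_values)  # assumption: restore the QC median as the scale
    return [v * center / loess_predict(qc_order, qc_values, t, span)
            for t, v in zip(sample_order, sample_values)]

# Hypothetical QCs with a linear upward drift over acquisition order.
qc_order = [float(t) for t in range(0, 100, 10)]
qc_values = [100.0 + 2.0 * t for t in qc_order]
sample_order = [5.0, 45.0, 85.0]
sample_values = [110.0, 190.0, 270.0]  # samples that follow the same drift

corrected = loess_normalize(qc_order, qc_values, sample_order, sample_values)
print([round(v, 2) for v in corrected])
```

The model is fitted only on QCs and then evaluated at the sample acquisition times, which is why the QCs need to bracket the samples throughout the run.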
  • 12. LOESS Normalization: the LOESS span has a large effect on model fit. The span (α) defines the degree of smoothing and is critical for controlling overfitting.
  • 13. LOESS Normalization: raw samples (red) are normalized based on QCs (black); the model is trained on QCs and applied to samples. Span: too high, or just right? Convergence of training and test performance cannot be assumed, because the test data contain analytical + biological variance.
  • 14. LOESS Normalization: avoiding overfitting is critical when using LOESS normalization.
  • 15. Example LOESS Normalization. Panels: raw; span = 0.75; span = 0.005.
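One way to choose the span is leave-one-out cross-validation on the QCs, as listed on the evaluation slide. The sketch below is an assumption-laden illustration: it uses a simplified local linear smoother with tricube weights instead of R's `loess`, mean absolute error as the score, and made-up QC values.

```python
import math

def loess_predict(x_train, y_train, x_new, span):
    """Simplified LOESS-style prediction: local linear fit, tricube weights."""
    n = len(x_train)
    k = min(n, max(2, math.ceil(span * n)))
    nearest = sorted(range(n), key=lambda i: abs(x_train[i] - x_new))[:k]
    d_max = max(abs(x_train[i] - x_new) for i in nearest) or 1.0
    w = [(1.0 - (abs(x_train[i] - x_new) / d_max) ** 3) ** 3 for i in nearest]
    if sum(w) == 0.0:  # all neighbours equally far away: weight them equally
        w = [1.0] * k
    xs = [x_train[i] for i in nearest]
    ys = [y_train[i] for i in nearest]
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, xs)) / sw
    my = sum(wi * yi for wi, yi in zip(w, ys)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, xs))
    if sxx < 1e-12:  # effectively one distinct x: use the weighted mean
        return my
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, xs, ys))
    return my + (sxy / sxx) * (x_new - mx)

def loocv_error(x, y, span):
    """Mean absolute error when each QC is predicted from all the other QCs."""
    errors = []
    for i in range(len(x)):
        x_rest, y_rest = x[:i] + x[i + 1:], y[:i] + y[i + 1:]
        errors.append(abs(y[i] - loess_predict(x_rest, y_rest, x[i], span)))
    return sum(errors) / len(errors)

def tune_span(x, y, candidate_spans):
    """Return the candidate span with the lowest leave-one-out error."""
    return min(candidate_spans, key=lambda s: loocv_error(x, y, s))

# Hypothetical QCs with no drift, only alternating measurement noise.
order = [float(t) for t in range(12)]
signal = [101.0 if t % 2 == 0 else 99.0 for t in range(12)]

best = tune_span(order, signal, [0.2, 1.0])
print("selected span:", best)
```

With pure noise and no drift, the small span chases individual QCs and predicts held-out points badly, so the larger span is selected; real data with genuine drift would favor an intermediate value.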
  • 16. Metabolomic Data Case Study I: GC-TOF
      • 310 metabolites for 4930 samples
      • 132 batches
      • ~41 samples per batch
      • ~1:10 QCs/samples (487 QCs or 9%)
      • No internal standards (ISTDs)
      Normalizations implemented
      • Batch ratio
      • LOESS
      • Sum known metabolite signal (mTIC) normalization
  • 17. Batch Performance (GC-TOF Raw). Within batch: median 26, min 19, max 69.
      Median RSD   Count   Cumulative %
      10-20            3        2
      20-30           98       76
      30-40           26       96
      40-50            3       98
      50-60            1       99
      60-70            1      100
  • 18. Analyte Performance (GC-TOF Raw). Within batch: median 24, min 7, max 79.
      Median RSD   Count   Cumulative %
      0-10            10        3
      10-20           83       30
      20-30          100       62
      30-40           69       84
      40-50           32       94
      50-60            6       96
      60-70            3       97
      70-80            5       98
      80-90            1       99
      90-100           1      100
  • 20. Batch Performance (GC-TOF BR). Within batches: median 23, min 17, max 69.
      Median RSD   Count   Cumulative %
      10-20           25       23
      20-30           67       85
      30-40           15       99
      40-50            1      100
      60-70            1      101
  • 21. Batch Performance (GC-TOF BR). Across batches: median 24, min 7, max 79.
      Median RSD   Count   Cumulative %
      0-10            17        6
      10-20          103       39
      20-30          112       75
      30-40           57       93
      40-50           12       97
      50-60            5       99
      60-70            3      100
      70-80            1      100
  • 23. BR Normalization Limitations
      • Very susceptible to outliers
      • Requires many QCs
      • Can inflate variance when training and test set trends do not match
  • 24. Batch Performance (GC-TOF LOESS). Within batches: median 19, min 11, max 58.
      Median RSD   Count   Cumulative %
      10-20           75       57
      20-30           51       96
      30-40            4       99
      40-50            1       99
      50-60            1      100
  • 25. Batch Performance (GC-TOF LOESS). Across batches: median 19, min 2.9, max 66.
      Median RSD   Count   Cumulative %
      0-10            17        6
      10-20          103       39
      20-30          112       75
      30-40           57       93
      40-50           12       97
      50-60            5       99
      60-70            3      100
      70-80            1      100
  • 27. LOESS Normalization Limitations (panels: raw; normalized). LOESS normalization can inflate variance when:
      • the model is overtrained
      • training examples do not match the test set
  • 28. Sum mTIC Normalization (GC-TOF): improved performance over raw and BR, but alters data from magnitudinal to compositional.
  • 29. Sum mTIC Normalization (GC-TOF): poor removal of trends due to acquisition time, but limits the magnitude of outlier samples compared to other approaches. Panels (over time): raw; mTIC normalized.
  • 30. Metabolomic Data Case Study II: LC-Q-TOF
      • 340+ metabolites for 4930 samples
      • 132 batches
      • ~41 samples per batch
      • ~1:10 QC/samples (524 QCs or 11%)
      • NIST reference (63 or 1%)
      • 14 internal standards (ISTDs)
      • NOMIS (IS = ISTD)
      • qcISTD
  • 31. Internal Standards Normalization (figure: analytes by retention time; NOMIS). Internal standards (ISTD):
      • qcISTD (QC-optimized metabolite/ISTD)
      • NOMIS (Sysi-Aho et al., 2007; selection of an optimal combination of ISTDs)
      • CCRMN (Redestig et al., 2009; removal of metabolite cross-contribution to ISTDs)
  • 32. ISTD Based Normalizations (LC/Q-TOF)
      • NOMIS (linear combination of optimal ISTDs; Sysi-Aho et al., 2007)
      • qcISTD (QC-optimized ISTD strategy)
      Example: PC 38:6, poor performance with NOMIS.
  • 33. qcISTD Normalization. Use QC samples to:
      1. Evaluate analyte %RSD before and after corrections using all ISTDs
      2. Select analyte/ISTD combinations with a %RSD improvement over raw data at some threshold (e.g. 10%)
      3. Correct sample analytes with the QC-defined ISTD if ISTD recovery is above some minimal threshold (e.g. > 20% of median)
      • Subject to overfitting
      191 of 326 (60%) analytes are ISTD corrected.
  • 34. qcISTD Normalization. Panels: ISTD used by retention time (Rt); total number of analytes corrected by ISTD.
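The three qcISTD steps might look like the sketch below for one analyte. Several details are assumptions not stated on the slide: the 10% threshold is read as a relative reduction in QC %RSD, the best-performing ISTD is chosen when several qualify, and corrected values are rescaled by the median ISTD response to keep the original scale. All names and numbers are hypothetical.

```python
from statistics import mean, stdev

def percent_rsd(values):
    """Relative standard deviation as a percentage."""
    return 100.0 * stdev(values) / mean(values)

def select_istd(analyte_qc, istd_qc, min_improvement=0.10):
    """Steps 1-2: return the ISTD name that most reduces QC %RSD, or None."""
    raw_rsd = percent_rsd(analyte_qc)
    best_name, best_rsd = None, raw_rsd * (1.0 - min_improvement)
    for name, istd_values in istd_qc.items():
        ratio = [a / s for a, s in zip(analyte_qc, istd_values)]
        rsd = percent_rsd(ratio)
        if rsd <= best_rsd:  # must beat raw by at least the threshold
            best_name, best_rsd = name, rsd
    return best_name

def correct_sample(analyte, istd, istd_median, min_recovery=0.20):
    """Step 3: apply the ratio only when ISTD recovery is acceptable."""
    if istd <= min_recovery * istd_median:
        return analyte  # recovery too low: leave the measurement uncorrected
    return analyte * istd_median / istd

# Hypothetical QC responses: ISTD "A" tracks the analyte, ISTD "B" does not.
analyte_qc = [100.0, 120.0, 80.0, 110.0]
istd_qc = {"A": [1.0, 1.2, 0.8, 1.1], "B": [1.0, 1.0, 1.0, 1.0]}

chosen = select_istd(analyte_qc, istd_qc)
print("selected ISTD:", chosen)
print(correct_sample(50.0, 0.5, 1.0))  # good recovery: corrected
print(correct_sample(50.0, 0.1, 1.0))  # poor recovery: unchanged
```

Because the ISTD is chosen on the same QCs used to judge the improvement, the selection can overfit, which is the caveat noted on the slide.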
  • 36. Normalizations (LC-Q-TOF)
      • LOESS performs very poorly for two metabolites
      • qcISTD performs better than LOESS
      • qcISTD + LOESS leads to the highest replicate precision
  • 37. PCA (LC/Q-TOF). Panels: raw (%RSD = 13), qcISTD (9), LOESS (12), qcISTD + LOESS (8). Only normalizations that include LOESS effectively remove analytical batch effects.
  • 38. Conclusion
      • Comparison of common data normalization approaches suggests that, in addition to ISTD corrections, LOESS (analyte-specific, non-linear adjustment based on QC performance at various data acquisition times) is superior to batch-based corrections.
      • Further validations need to be completed to confirm the effects of normalizations on samples' variance.
      • These findings suggest that inclusion of "batch" as a covariate in statistical models will not fully account for analytical variance.
      R code for all normalization functions can be found at: https://github.com/dgrapov/devium/blob/master/R/Devium%20Normalization.r
  • 39. Contact: dgrapov@ucdavis.edu, metabolomics.ucdavis.edu. This research was supported in part by NIH 1 U24 DK097154.