SlideShare a Scribd company logo
1 of 10
Biology

Chemistry

Partial Least Squares (O-/PLS/-DA)

Informatics

Partial Least Squares (O-/PLS/-DA)
modeling of metabolomic sample
processing methods

Goal:
Use PLS to identify metabolites which best
discriminate (most different between) sample
processing methods
(Used DATA: Pumpkin data 1.csv)
Topics:
1.Model Selection
2.Results visualization
3.Feature Selection
4.Validation
Biology

Chemistry

Partial Least Squares (O-/PLS/-DA)

Informatics

Partial Least Squares Modeling
Discriminant Analysis (PLS-DA)

Steps
1.Calculate a single Y PLS model to discriminate between extraction/treatment
2.Select optimal scaling and model latent variable (LV) number
3.Overview PLS scores and loadings plots
4.Validate model
5.Repeat steps 1-4 for an O-PLS model
Visualize:
1.Sample scores annotated by extraction and treatment
2.Variable loadings plot
Exercise:
1.How are scores different between 1-Y PLS, 1-Y O-PLS, 2-Y O-PLS?
2.Are there any moderate or extreme outliers?
3.What variables contribute most to the differences between treatments?
Model Performance

Biology

Chemistry

Partial Least Squares (O-/PLS/-DA)

Informatics

RMSEP - root mean
squared error of
prediction (fit to left
out set)
Q2 - cross-validated fit
to the training data
Xvar- explained
variance in X
(measurements)
Model Validation

Biology

Chemistry

Partial Least Squares (O-/PLS/-DA)

Informatics

single model
performance estimates

Model properties and
validation settings

Based on training/test
splitting
Based on training/test
splitting and permuted Y
Significance of difference between model performance
and permuted NULL distributions
Biology

Chemistry
Informatics

Model Scores single Y
(extraction_treatment)

Partial Least Squares (O-/PLS/-DA)

Treatment

Extraction

Variance in extraction
dominates model
Biology

Chemistry
Informatics

Model Scores
Comparison

Partial Least Squares (O-/PLS/-DA)

1-Y PLS

treatment

extraction

1-Y O-PLS

2-Y O-PLS
Top Discriminants

Biology

Fatty acids, lipids

Chemistry

Partial Least Squares (O-/PLS/-DA)

Informatics

2-Y O-PLS
treatment

extraction
Sugar phosphates, polyols
Feature Selection

Biology

Chemistry

Steps
1.Identify top 10% of discriminants for extraction based on metabolite
• correlation with model scores
• importance (loading, coeffcient, VIP)

Visualize:
1.Feature selection diagnostics

correlation

Partial Least Squares (O-/PLS/-DA)

Informatics

Loading (LV1)
Feature Selection for Extraction

Biology

Chemistry

Partial Least Squares (O-/PLS/-DA)

Informatics

1

2

higher in 3
3

higher in 1
Summary

Biology

Chemistry

Partial Least Squares (O-/PLS/-DA)

Informatics

Modeling Strategy (advanced)
1. Fit preliminary model
2. Evaluate of scores/loadings
3. Remove outliers
4. split data into test and train set (optional)
5. model selection (LV, OLV, etc)
6. internal (training set) model validation (permutation testing,
training/testing)
7. Feature selection (optional)
•

Comparison of selected to excluded feature models

8. External model validation (test set)

More Related Content

What's hot

Virtual screening techniques
Virtual screening techniquesVirtual screening techniques
Virtual screening techniquesROHIT PAL
 
STATISTICAL METHOD OF QSAR
STATISTICAL METHOD OF QSARSTATISTICAL METHOD OF QSAR
STATISTICAL METHOD OF QSARRaniBhagat1
 
Role of Target Identification and Target Validation in Drug Discovery Process
Role of Target Identification and Target Validation in Drug Discovery ProcessRole of Target Identification and Target Validation in Drug Discovery Process
Role of Target Identification and Target Validation in Drug Discovery ProcessPallavi Duggal
 
Structure based in silico virtual screening
Structure based in silico virtual screeningStructure based in silico virtual screening
Structure based in silico virtual screeningJoon Jyoti Sahariah
 
screening of anti-fertility agent
screening of anti-fertility agentscreening of anti-fertility agent
screening of anti-fertility agentJaineel Dharod
 
Immunoassay of insulin (neha)
Immunoassay of insulin (neha)Immunoassay of insulin (neha)
Immunoassay of insulin (neha)ANANYAPANDEY71
 
Pharmacophore mapping
Pharmacophore mapping Pharmacophore mapping
Pharmacophore mapping GamitKinjal
 
Alternative methods to animal toxicity testing
Alternative methods to        animal toxicity testingAlternative methods to        animal toxicity testing
Alternative methods to animal toxicity testingSachin Sharma
 
Target identification and validation
Target identification and validationTarget identification and validation
Target identification and validationAshishVerma571
 
In silico lead discovery technique.pptx
In silico lead discovery technique.pptxIn silico lead discovery technique.pptx
In silico lead discovery technique.pptxSIRAJUDDIN MOLLA
 
Assignment on Reproductive toxicology studies
Assignment on Reproductive toxicology studiesAssignment on Reproductive toxicology studies
Assignment on Reproductive toxicology studiesDeepak Kumar
 
Target discovery and validation
Target discovery and validation Target discovery and validation
Target discovery and validation ANAND SAGAR TIWARI
 
Dermal Irritation and Dermal Toxicity Studies
Dermal Irritation and Dermal Toxicity Studies Dermal Irritation and Dermal Toxicity Studies
Dermal Irritation and Dermal Toxicity Studies Dinesh Gangoda
 
Lead Optimization in Drug Discovery
Lead Optimization in Drug DiscoveryLead Optimization in Drug Discovery
Lead Optimization in Drug Discoveryavinashdhake3
 
Fertility and antifertility screening
Fertility and antifertility screeningFertility and antifertility screening
Fertility and antifertility screeningnazuk sharma
 
“Docking Studies and Drug Design”
“Docking  Studies and Drug Design”“Docking  Studies and Drug Design”
“Docking Studies and Drug Design”Naresh Juttu
 
drug discovery & development
drug discovery & developmentdrug discovery & development
drug discovery & developmentRohit K.
 

What's hot (20)

3D QSAR
3D QSAR3D QSAR
3D QSAR
 
Virtual screening techniques
Virtual screening techniquesVirtual screening techniques
Virtual screening techniques
 
Lead identification
Lead identification Lead identification
Lead identification
 
REPRODUCTIVE TOXICITY STUDIES
REPRODUCTIVE TOXICITY STUDIESREPRODUCTIVE TOXICITY STUDIES
REPRODUCTIVE TOXICITY STUDIES
 
STATISTICAL METHOD OF QSAR
STATISTICAL METHOD OF QSARSTATISTICAL METHOD OF QSAR
STATISTICAL METHOD OF QSAR
 
Role of Target Identification and Target Validation in Drug Discovery Process
Role of Target Identification and Target Validation in Drug Discovery ProcessRole of Target Identification and Target Validation in Drug Discovery Process
Role of Target Identification and Target Validation in Drug Discovery Process
 
Structure based in silico virtual screening
Structure based in silico virtual screeningStructure based in silico virtual screening
Structure based in silico virtual screening
 
screening of anti-fertility agent
screening of anti-fertility agentscreening of anti-fertility agent
screening of anti-fertility agent
 
Immunoassay of insulin (neha)
Immunoassay of insulin (neha)Immunoassay of insulin (neha)
Immunoassay of insulin (neha)
 
Pharmacophore mapping
Pharmacophore mapping Pharmacophore mapping
Pharmacophore mapping
 
Alternative methods to animal toxicity testing
Alternative methods to        animal toxicity testingAlternative methods to        animal toxicity testing
Alternative methods to animal toxicity testing
 
Target identification and validation
Target identification and validationTarget identification and validation
Target identification and validation
 
In silico lead discovery technique.pptx
In silico lead discovery technique.pptxIn silico lead discovery technique.pptx
In silico lead discovery technique.pptx
 
Assignment on Reproductive toxicology studies
Assignment on Reproductive toxicology studiesAssignment on Reproductive toxicology studies
Assignment on Reproductive toxicology studies
 
Target discovery and validation
Target discovery and validation Target discovery and validation
Target discovery and validation
 
Dermal Irritation and Dermal Toxicity Studies
Dermal Irritation and Dermal Toxicity Studies Dermal Irritation and Dermal Toxicity Studies
Dermal Irritation and Dermal Toxicity Studies
 
Lead Optimization in Drug Discovery
Lead Optimization in Drug DiscoveryLead Optimization in Drug Discovery
Lead Optimization in Drug Discovery
 
Fertility and antifertility screening
Fertility and antifertility screeningFertility and antifertility screening
Fertility and antifertility screening
 
“Docking Studies and Drug Design”
“Docking  Studies and Drug Design”“Docking  Studies and Drug Design”
“Docking Studies and Drug Design”
 
drug discovery & development
drug discovery & developmentdrug discovery & development
drug discovery & development
 

Viewers also liked

5 data analysis case study
5  data analysis case study5  data analysis case study
5 data analysis case studyDmitry Grapov
 
3 principal components analysis
3  principal components analysis3  principal components analysis
3 principal components analysisDmitry Grapov
 
6 metabolite enrichment analysis
6  metabolite enrichment analysis6  metabolite enrichment analysis
6 metabolite enrichment analysisDmitry Grapov
 
1 statistical analysis
1  statistical analysis1  statistical analysis
1 statistical analysisDmitry Grapov
 
Data Normalization Approaches for Large-scale Biological Studies
Data Normalization Approaches for Large-scale Biological StudiesData Normalization Approaches for Large-scale Biological Studies
Data Normalization Approaches for Large-scale Biological StudiesDmitry Grapov
 
Multivarite and network tools for biological data analysis
Multivarite and network tools for biological data analysisMultivarite and network tools for biological data analysis
Multivarite and network tools for biological data analysisDmitry Grapov
 

Viewers also liked (9)

5 data analysis case study
5  data analysis case study5  data analysis case study
5 data analysis case study
 
7 network mapping i
7  network mapping i7  network mapping i
7 network mapping i
 
0 introduction
0  introduction0  introduction
0 introduction
 
3 principal components analysis
3  principal components analysis3  principal components analysis
3 principal components analysis
 
2 cluster analysis
2  cluster analysis2  cluster analysis
2 cluster analysis
 
6 metabolite enrichment analysis
6  metabolite enrichment analysis6  metabolite enrichment analysis
6 metabolite enrichment analysis
 
1 statistical analysis
1  statistical analysis1  statistical analysis
1 statistical analysis
 
Data Normalization Approaches for Large-scale Biological Studies
Data Normalization Approaches for Large-scale Biological StudiesData Normalization Approaches for Large-scale Biological Studies
Data Normalization Approaches for Large-scale Biological Studies
 
Multivarite and network tools for biological data analysis
Multivarite and network tools for biological data analysisMultivarite and network tools for biological data analysis
Multivarite and network tools for biological data analysis
 

Similar to PLS-DA Modeling Discriminates Sample Processing Methods

3 data normalization (2014 lab tutorial)
3  data normalization (2014 lab tutorial)3  data normalization (2014 lab tutorial)
3 data normalization (2014 lab tutorial)Dmitry Grapov
 
1501 Refocus metabolomics
1501 Refocus metabolomics1501 Refocus metabolomics
1501 Refocus metabolomicsIS-X
 
Lecture 9 molecular descriptors
Lecture 9  molecular descriptorsLecture 9  molecular descriptors
Lecture 9 molecular descriptorsRAJAN ROLTA
 
Semantics for integrated laboratory analytical processes - The Allotrope Pers...
Semantics for integrated laboratory analytical processes - The Allotrope Pers...Semantics for integrated laboratory analytical processes - The Allotrope Pers...
Semantics for integrated laboratory analytical processes - The Allotrope Pers...OSTHUS
 
Development and Validation of a RP-HPLC method
Development and Validation of a RP-HPLC methodDevelopment and Validation of a RP-HPLC method
Development and Validation of a RP-HPLC methodUshaKhanal3
 
Pg. 05Question FiveAssignment #Deadline Day 22.docx
Pg. 05Question FiveAssignment #Deadline Day 22.docxPg. 05Question FiveAssignment #Deadline Day 22.docx
Pg. 05Question FiveAssignment #Deadline Day 22.docxmattjtoni51554
 
IRSAE aquatic ecology 28 June 2018 metabolomics
IRSAE aquatic ecology 28 June 2018 metabolomicsIRSAE aquatic ecology 28 June 2018 metabolomics
IRSAE aquatic ecology 28 June 2018 metabolomicsPanagiotis Arapitsas
 
Worked examples of sampling uncertainty evaluation
Worked examples of sampling uncertainty evaluationWorked examples of sampling uncertainty evaluation
Worked examples of sampling uncertainty evaluationGH Yeoh
 
PurposeStudents will conduct one brief research exercise that w.docx
PurposeStudents will conduct one brief research exercise that w.docxPurposeStudents will conduct one brief research exercise that w.docx
PurposeStudents will conduct one brief research exercise that w.docxamrit47
 
Systematic review and meta analaysis course - part 2
Systematic review and meta analaysis course - part 2Systematic review and meta analaysis course - part 2
Systematic review and meta analaysis course - part 2Ahmed Negida
 
Module 4 data analysis
Module 4 data analysisModule 4 data analysis
Module 4 data analysisILRI-Jmaru
 
Final presentation dwi riyono
Final presentation dwi riyonoFinal presentation dwi riyono
Final presentation dwi riyonoDwi Riyono
 
Final Applied Lab Project-BIOL 103The Effect of low pH on Enzyme A.docx
Final Applied Lab Project-BIOL 103The Effect of low pH on Enzyme A.docxFinal Applied Lab Project-BIOL 103The Effect of low pH on Enzyme A.docx
Final Applied Lab Project-BIOL 103The Effect of low pH on Enzyme A.docxmydrynan
 
Review of "Survey Research Methods & Design in Psychology"
Review of "Survey Research Methods & Design in Psychology"Review of "Survey Research Methods & Design in Psychology"
Review of "Survey Research Methods & Design in Psychology"James Neill
 
LamiaFinal data ( results).docx1- label all lanes, label ma.docx
LamiaFinal data ( results).docx1- label all lanes, label ma.docxLamiaFinal data ( results).docx1- label all lanes, label ma.docx
LamiaFinal data ( results).docx1- label all lanes, label ma.docxDIPESH30
 
Prediction Of Bioactivity From Chemical Structure
Prediction Of Bioactivity From Chemical StructurePrediction Of Bioactivity From Chemical Structure
Prediction Of Bioactivity From Chemical StructureJeremy Besnard
 
Lab report 100 points
Lab report 100 pointsLab report 100 points
Lab report 100 pointsMaria Donohue
 

Similar to PLS-DA Modeling Discriminates Sample Processing Methods (20)

Food metabolomics Arapitsas 2017
Food metabolomics Arapitsas 2017Food metabolomics Arapitsas 2017
Food metabolomics Arapitsas 2017
 
Bioanalytical ppt
Bioanalytical pptBioanalytical ppt
Bioanalytical ppt
 
3 data normalization (2014 lab tutorial)
3  data normalization (2014 lab tutorial)3  data normalization (2014 lab tutorial)
3 data normalization (2014 lab tutorial)
 
1501 Refocus metabolomics
1501 Refocus metabolomics1501 Refocus metabolomics
1501 Refocus metabolomics
 
Lecture 9 molecular descriptors
Lecture 9  molecular descriptorsLecture 9  molecular descriptors
Lecture 9 molecular descriptors
 
Semantics for integrated laboratory analytical processes - The Allotrope Pers...
Semantics for integrated laboratory analytical processes - The Allotrope Pers...Semantics for integrated laboratory analytical processes - The Allotrope Pers...
Semantics for integrated laboratory analytical processes - The Allotrope Pers...
 
Development and Validation of a RP-HPLC method
Development and Validation of a RP-HPLC methodDevelopment and Validation of a RP-HPLC method
Development and Validation of a RP-HPLC method
 
Pg. 05Question FiveAssignment #Deadline Day 22.docx
Pg. 05Question FiveAssignment #Deadline Day 22.docxPg. 05Question FiveAssignment #Deadline Day 22.docx
Pg. 05Question FiveAssignment #Deadline Day 22.docx
 
IRSAE aquatic ecology 28 June 2018 metabolomics
IRSAE aquatic ecology 28 June 2018 metabolomicsIRSAE aquatic ecology 28 June 2018 metabolomics
IRSAE aquatic ecology 28 June 2018 metabolomics
 
Worked examples of sampling uncertainty evaluation
Worked examples of sampling uncertainty evaluationWorked examples of sampling uncertainty evaluation
Worked examples of sampling uncertainty evaluation
 
PurposeStudents will conduct one brief research exercise that w.docx
PurposeStudents will conduct one brief research exercise that w.docxPurposeStudents will conduct one brief research exercise that w.docx
PurposeStudents will conduct one brief research exercise that w.docx
 
Instrumentation
InstrumentationInstrumentation
Instrumentation
 
Systematic review and meta analaysis course - part 2
Systematic review and meta analaysis course - part 2Systematic review and meta analaysis course - part 2
Systematic review and meta analaysis course - part 2
 
Module 4 data analysis
Module 4 data analysisModule 4 data analysis
Module 4 data analysis
 
Final presentation dwi riyono
Final presentation dwi riyonoFinal presentation dwi riyono
Final presentation dwi riyono
 
Final Applied Lab Project-BIOL 103The Effect of low pH on Enzyme A.docx
Final Applied Lab Project-BIOL 103The Effect of low pH on Enzyme A.docxFinal Applied Lab Project-BIOL 103The Effect of low pH on Enzyme A.docx
Final Applied Lab Project-BIOL 103The Effect of low pH on Enzyme A.docx
 
Review of "Survey Research Methods & Design in Psychology"
Review of "Survey Research Methods & Design in Psychology"Review of "Survey Research Methods & Design in Psychology"
Review of "Survey Research Methods & Design in Psychology"
 
LamiaFinal data ( results).docx1- label all lanes, label ma.docx
LamiaFinal data ( results).docx1- label all lanes, label ma.docxLamiaFinal data ( results).docx1- label all lanes, label ma.docx
LamiaFinal data ( results).docx1- label all lanes, label ma.docx
 
Prediction Of Bioactivity From Chemical Structure
Prediction Of Bioactivity From Chemical StructurePrediction Of Bioactivity From Chemical Structure
Prediction Of Bioactivity From Chemical Structure
 
Lab report 100 points
Lab report 100 pointsLab report 100 points
Lab report 100 points
 

More from Dmitry Grapov

R programming for Data Science - A Beginner’s Guide
R programming for Data Science - A Beginner’s GuideR programming for Data Science - A Beginner’s Guide
R programming for Data Science - A Beginner’s GuideDmitry Grapov
 
Network mapping 101 course
Network mapping 101 courseNetwork mapping 101 course
Network mapping 101 courseDmitry Grapov
 
Rise of Deep Learning for Genomic, Proteomic, and Metabolomic Data Integratio...
Rise of Deep Learning for Genomic, Proteomic, and Metabolomic Data Integratio...Rise of Deep Learning for Genomic, Proteomic, and Metabolomic Data Integratio...
Rise of Deep Learning for Genomic, Proteomic, and Metabolomic Data Integratio...Dmitry Grapov
 
Dmitry Grapov Resume and CV
Dmitry Grapov Resume and CVDmitry Grapov Resume and CV
Dmitry Grapov Resume and CVDmitry Grapov
 
Machine Learning Powered Metabolomic Network Analysis
Machine Learning Powered Metabolomic Network AnalysisMachine Learning Powered Metabolomic Network Analysis
Machine Learning Powered Metabolomic Network AnalysisDmitry Grapov
 
Complex Systems Biology Informed Data Analysis and Machine Learning
Complex Systems Biology Informed Data Analysis and Machine LearningComplex Systems Biology Informed Data Analysis and Machine Learning
Complex Systems Biology Informed Data Analysis and Machine LearningDmitry Grapov
 
Data analysis workflows part 1 2015
Data analysis workflows part 1 2015Data analysis workflows part 1 2015
Data analysis workflows part 1 2015Dmitry Grapov
 
Data analysis workflows part 2 2015
Data analysis workflows part 2 2015Data analysis workflows part 2 2015
Data analysis workflows part 2 2015Dmitry Grapov
 
Metabolomics and Beyond Challenges and Strategies for Next-gen Omic Analyses
Metabolomics and Beyond Challenges and Strategies for Next-gen Omic Analyses Metabolomics and Beyond Challenges and Strategies for Next-gen Omic Analyses
Metabolomics and Beyond Challenges and Strategies for Next-gen Omic Analyses Dmitry Grapov
 
Case Study: Overview of Metabolomic Data Normalization Strategies
Case Study: Overview of Metabolomic Data Normalization StrategiesCase Study: Overview of Metabolomic Data Normalization Strategies
Case Study: Overview of Metabolomic Data Normalization StrategiesDmitry Grapov
 
Mapping to the Metabolomic Manifold
Mapping to the Metabolomic ManifoldMapping to the Metabolomic Manifold
Mapping to the Metabolomic ManifoldDmitry Grapov
 
Metabolomic Data Analysis Workshop and Tutorials (2014)
Metabolomic Data Analysis Workshop and Tutorials (2014)Metabolomic Data Analysis Workshop and Tutorials (2014)
Metabolomic Data Analysis Workshop and Tutorials (2014)Dmitry Grapov
 
Normalization of Large-Scale Metabolomic Studies 2014
Normalization of Large-Scale Metabolomic Studies 2014Normalization of Large-Scale Metabolomic Studies 2014
Normalization of Large-Scale Metabolomic Studies 2014Dmitry Grapov
 
Gene Ontology Enrichment Network Analysis -Tutorial
Gene Ontology Enrichment Network Analysis -TutorialGene Ontology Enrichment Network Analysis -Tutorial
Gene Ontology Enrichment Network Analysis -TutorialDmitry Grapov
 
Prote-OMIC Data Analysis and Visualization
Prote-OMIC Data Analysis and VisualizationProte-OMIC Data Analysis and Visualization
Prote-OMIC Data Analysis and VisualizationDmitry Grapov
 
American Society of Mass Spectrommetry Conference 2014
American Society of Mass Spectrommetry Conference 2014American Society of Mass Spectrommetry Conference 2014
American Society of Mass Spectrommetry Conference 2014Dmitry Grapov
 
Omic Data Integration Strategies
Omic Data Integration StrategiesOmic Data Integration Strategies
Omic Data Integration StrategiesDmitry Grapov
 
Automation of (Biological) Data Analysis and Report Generation
Automation of (Biological) Data Analysis and Report GenerationAutomation of (Biological) Data Analysis and Report Generation
Automation of (Biological) Data Analysis and Report GenerationDmitry Grapov
 
Metabolomic data analysis and visualization tools
Metabolomic data analysis and visualization toolsMetabolomic data analysis and visualization tools
Metabolomic data analysis and visualization toolsDmitry Grapov
 

More from Dmitry Grapov (20)

R programming for Data Science - A Beginner’s Guide
R programming for Data Science - A Beginner’s GuideR programming for Data Science - A Beginner’s Guide
R programming for Data Science - A Beginner’s Guide
 
Network mapping 101 course
Network mapping 101 courseNetwork mapping 101 course
Network mapping 101 course
 
Rise of Deep Learning for Genomic, Proteomic, and Metabolomic Data Integratio...
Rise of Deep Learning for Genomic, Proteomic, and Metabolomic Data Integratio...Rise of Deep Learning for Genomic, Proteomic, and Metabolomic Data Integratio...
Rise of Deep Learning for Genomic, Proteomic, and Metabolomic Data Integratio...
 
Dmitry Grapov Resume and CV
Dmitry Grapov Resume and CVDmitry Grapov Resume and CV
Dmitry Grapov Resume and CV
 
Machine Learning Powered Metabolomic Network Analysis
Machine Learning Powered Metabolomic Network AnalysisMachine Learning Powered Metabolomic Network Analysis
Machine Learning Powered Metabolomic Network Analysis
 
Complex Systems Biology Informed Data Analysis and Machine Learning
Complex Systems Biology Informed Data Analysis and Machine LearningComplex Systems Biology Informed Data Analysis and Machine Learning
Complex Systems Biology Informed Data Analysis and Machine Learning
 
Data analysis workflows part 1 2015
Data analysis workflows part 1 2015Data analysis workflows part 1 2015
Data analysis workflows part 1 2015
 
Data analysis workflows part 2 2015
Data analysis workflows part 2 2015Data analysis workflows part 2 2015
Data analysis workflows part 2 2015
 
Metabolomics and Beyond Challenges and Strategies for Next-gen Omic Analyses
Metabolomics and Beyond Challenges and Strategies for Next-gen Omic Analyses Metabolomics and Beyond Challenges and Strategies for Next-gen Omic Analyses
Metabolomics and Beyond Challenges and Strategies for Next-gen Omic Analyses
 
Case Study: Overview of Metabolomic Data Normalization Strategies
Case Study: Overview of Metabolomic Data Normalization StrategiesCase Study: Overview of Metabolomic Data Normalization Strategies
Case Study: Overview of Metabolomic Data Normalization Strategies
 
Modeling poster
Modeling posterModeling poster
Modeling poster
 
Mapping to the Metabolomic Manifold
Mapping to the Metabolomic ManifoldMapping to the Metabolomic Manifold
Mapping to the Metabolomic Manifold
 
Metabolomic Data Analysis Workshop and Tutorials (2014)
Metabolomic Data Analysis Workshop and Tutorials (2014)Metabolomic Data Analysis Workshop and Tutorials (2014)
Metabolomic Data Analysis Workshop and Tutorials (2014)
 
Normalization of Large-Scale Metabolomic Studies 2014
Normalization of Large-Scale Metabolomic Studies 2014Normalization of Large-Scale Metabolomic Studies 2014
Normalization of Large-Scale Metabolomic Studies 2014
 
Gene Ontology Enrichment Network Analysis -Tutorial
Gene Ontology Enrichment Network Analysis -TutorialGene Ontology Enrichment Network Analysis -Tutorial
Gene Ontology Enrichment Network Analysis -Tutorial
 
Prote-OMIC Data Analysis and Visualization
Prote-OMIC Data Analysis and VisualizationProte-OMIC Data Analysis and Visualization
Prote-OMIC Data Analysis and Visualization
 
American Society of Mass Spectrommetry Conference 2014
American Society of Mass Spectrommetry Conference 2014American Society of Mass Spectrommetry Conference 2014
American Society of Mass Spectrommetry Conference 2014
 
Omic Data Integration Strategies
Omic Data Integration StrategiesOmic Data Integration Strategies
Omic Data Integration Strategies
 
Automation of (Biological) Data Analysis and Report Generation
Automation of (Biological) Data Analysis and Report GenerationAutomation of (Biological) Data Analysis and Report Generation
Automation of (Biological) Data Analysis and Report Generation
 
Metabolomic data analysis and visualization tools
Metabolomic data analysis and visualization toolsMetabolomic data analysis and visualization tools
Metabolomic data analysis and visualization tools
 

PLS-DA Modeling Discriminates Sample Processing Methods

  • 1. Biology Chemistry Partial Least Squares (O-/PLS/-DA) Informatics Partial Least Squares (O-/PLS/-DA) modeling of metabolomic sample processing methods Goal: Use PLS to identify metabolites which best discriminate (most different between) sample processing methods (Used DATA: Pumpkin data 1.csv) Topics: 1.Model Selection 2.Results visualization 3.Feature Selection 4.Validation
  • 2. Biology Chemistry Partial Least Squares (O-/PLS/-DA) Informatics Partial Least Squares Modeling Discriminant Analysis (PLS-DA) Steps 1.Calculate a single Y PLS model to discriminate between extraction/treatment 2.Select optimal scaling and model latent variable (LV) number 3.Overview PLS scores and loadings plots 4.Validate model 5.Repeat steps 1-4 for an O-PLS model Visualize: 1.Sample scores annotated by extraction and treatment 2.Variable loadings plot Exercise: 1.How are scores different between 1-Y PLS, 1-Y O-PLS, 2-Y O-PLS? 2.Are there any moderate or extreme outliers? 3.What variables contribute most to the differences between treatments?
  • 3. Model Performance Biology Chemistry Partial Least Squares (O-/PLS/-DA) Informatics RMSEP - root mean squared error of prediction (fit to left out set) Q2 - cross-validated fit to the training data Xvar- explained variance in X (measurements)
  • 4. Model Validation Biology Chemistry Partial Least Squares (O-/PLS/-DA) Informatics single model performance estimates Model properties and validation settings Based on training/test splitting Based on training/test splitting and permuted Y Significance of difference between model performance and permuted NULL distributions
  • 5. Biology Chemistry Informatics Model Scores single Y (extraction_treatment) Partial Least Squares (O-/PLS/-DA) Treatment Extraction Variance in extraction dominates model
  • 6. Biology Chemistry Informatics Model Scores Comparison Partial Least Squares (O-/PLS/-DA) 1-Y PLS treatment extraction 1-Y O-PLS 2-Y O-PLS
  • 7. Top Discriminants Biology Fatty acids, lipids Chemistry Partial Least Squares (O-/PLS/-DA) Informatics 2-Y O-PLS treatment extraction Sugar phosphates, polyols
  • 8. Feature Selection Biology Chemistry Steps 1.Identify top 10% of discriminants for extraction based on metabolite • correlation with model scores • importance (loading, coeffcient, VIP) Visualize: 1.Feature selection diagnostics correlation Partial Least Squares (O-/PLS/-DA) Informatics Loading (LV1)
  • 9. Feature Selection for Extraction Biology Chemistry Partial Least Squares (O-/PLS/-DA) Informatics 1 2 higher in 3 3 higher in 1
  • 10. Summary Biology Chemistry Partial Least Squares (O-/PLS/-DA) Informatics Modeling Strategy (advanced) 1. Fit preliminary model 2. Evaluate of scores/loadings 3. Remove outliers 4. split data into test and train set (optional) 5. model selection (LV, OLV, etc) 6. internal (training set) model validation (permutation testing, training/testing) 7. Feature selection (optional) • Comparison of selected to excluded feature models 8. External model validation (test set)