Introduction

Introduction to
Metabolomic Data Analysis

Dmitry Grapov, PhD
Introduction

Important
• This is an introduction to a series of 8 tutorials for metabolomic data analysis
• Download all the required files and software here:
https://sourceforge.net/projects/teachingdemos/files/Winter%202014%20LC-MS%20and%20Statistics%20Course/
• Then follow the directions in software/startup.R to launch all accompanying software
Goals?
Analysis at the Metabolomic Scale
Cycle of Scientific Discovery
Hypothesis

Hypothesis Generation

Data Acquisition

Data Processing

Data Analysis

Data
Univariate vs. Multivariate
Univariate
• Hypothesis testing (t-Test, ANOVA, etc.)

Multivariate
• PCA
• Predictive modeling (O-/PLS-DA)

[Figure: Group 1 vs. Group 2]
Univariate vs. Multivariate
univariate/bivariate vs. multivariate
• outliers?
• mixed up samples?
Data Analysis Goals
Exploration
• Are there any trends in my data?
– analytical sources
– meta data/covariates
• Useful Methods
– matrix decomposition (PCA, ICA, NMF)
– cluster analysis

Classification
• Differences/similarities between groups?
– discrimination, classification, significant changes
• Useful Methods
– analysis of variance (ANOVA), mixed effects models
– partial least squares discriminant analysis (O-/PLS-DA)
– Others: random forest, CART, SVM, ANN

Prediction
• What is related to or predictive of my variable(s) of interest?
– regression, correlation
• Useful Methods
– correlation
– partial least squares (O-/PLS)
Data Complexity
• Data: n samples × m variables, plus meta data
• Experimental design = complexity
• Variable # = dimensionality (1-D, 2-D, m-D)
Univariate Qualities
• length (sample size)
• center (mean, median, geometric mean)
• dispersion (variance, standard deviation)
• range (min / max)
• quantiles
• shape (skewness, kurtosis, normality, etc.)

[Figure: distribution annotated with mean and standard deviation]
Data Quality
Metrics
• Precision
• Accuracy

Remedies
• normalization
• outlier detection
*Start lab 1-statistical analysis
Univariate Analyses
• Identify differences in sample population means
• sensitive to distribution shape
• parametric = assumes normality
• error in Y, not in X (Y = mX + error)
• optimal for long data
• assumed independence
• false discovery rate (FDR)

[Figure: data shapes: wide, long, n-of-one]
False Discovery Rate (FDR)
• Type I Error: False Positives
• Type II Error: False Negatives
• Type I risk = $1 - (1 - \text{p.value})^{m}$, where m = number of variables tested

FDR correction
• p-value adjustment or estimate of FDR (Fdr, q-value)
Bioinformatics (2008) 24 (12):1461-1462
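
The Type I risk formula and a p-value adjustment can be sketched in a few lines. This is a minimal illustration, not the course software: the function names are hypothetical, and the Benjamini-Hochberg procedure is assumed here as one common FDR adjustment (the slide does not name a specific method).

```python
def familywise_error_rate(p_value, m):
    """Chance of at least one false positive across m independent tests."""
    return 1.0 - (1.0 - p_value) ** m


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Risk of one or more false positives when testing 100 variables at 0.05.
    print(familywise_error_rate(0.05, 100))
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

With many variables the unadjusted risk approaches 1, which is why an adjustment is applied before interpreting individual p-values.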
Achieving “significance” is a function of:
significance level (α) and power (1-β )

effect size (standardized difference in means)

sample size (n)
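
How these four quantities trade off can be sketched with the standard normal-approximation sample size formula for comparing two group means. This is an assumption-laden sketch: the function name is hypothetical, a two-sided test with equal group sizes is assumed, and an exact t-based calculation would give slightly larger numbers.

```python
from math import ceil
from statistics import NormalDist


def samples_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-sample comparison of means.

    effect_size is the standardized difference in means (Cohen's d).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1.0 - alpha / 2.0)  # critical value for significance
    z_power = z.inv_cdf(power)              # quantile for the desired power
    return ceil(2.0 * ((z_alpha + z_power) / effect_size) ** 2)


if __name__ == "__main__":
    for d in (0.2, 0.5, 1.0):
        print(d, samples_per_group(d))
```

Halving the effect size roughly quadruples the required sample size, since n scales with 1/d².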

*finish lab 1-statistical analysis
Clustering
Identify
•patterns
•group structure

•relationships
•Evaluate/refine hypothesis

•Reduce complexity

Artist: Chuck Close
Cluster Analysis
Use the concept of similarity/dissimilarity to group a collection of samples or variables

Approaches
• hierarchical (HCA)
• non-hierarchical (k-NN, k-means)
• distribution (mixture models)
• density (DBSCAN)
• self organizing maps (SOM)

[Figure panels: Linkage, Distribution, k-means, Density]
Hierarchical Cluster Analysis
• similarity/dissimilarity defines “nearness” or distance

[Figure panels (X vs. Y): euclidean, manhattan, Mahalanobis, non-euclidean]
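
Two of the distances named above can be written directly from their definitions. This is a minimal sketch with hypothetical function names; Mahalanobis distance is left out because it also needs the inverse covariance matrix of the data.

```python
from math import sqrt


def euclidean(a, b):
    """Straight-line distance between two samples."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute coordinate differences (city-block distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def distance_matrix(rows, metric):
    """Pairwise distances between all rows using the given metric."""
    return [[metric(r1, r2) for r2 in rows] for r1 in rows]


if __name__ == "__main__":
    samples = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
    print(distance_matrix(samples, euclidean))
    print(distance_matrix(samples, manhattan))
```

The choice of metric changes which samples count as "near", so the same data can cluster differently under each.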
Hierarchical Cluster Analysis
Agglomerative/linkage algorithm defines how points are grouped

[Figure panels: single, complete, centroid, average]
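
The effect of the linkage rule can be sketched with a naive agglomerative loop. This is an illustrative sketch only: the function name is hypothetical, Euclidean distance is assumed, centroid linkage is omitted for brevity, and the brute-force search is far slower than a real HCA implementation.

```python
from math import dist  # Euclidean distance, Python 3.8+


def agglomerate(points, n_clusters, linkage="average"):
    """Merge the closest clusters until n_clusters remain.

    Returns a list of clusters, each a list of indices into points.
    """
    def cluster_distance(c1, c2):
        pairwise = [dist(points[i], points[j]) for i in c1 for j in c2]
        if linkage == "single":      # nearest pair of members
            return min(pairwise)
        if linkage == "complete":    # farthest pair of members
            return max(pairwise)
        if linkage == "average":     # mean over all member pairs
            return sum(pairwise) / len(pairwise)
        raise ValueError(f"unknown linkage: {linkage}")

    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = cluster_distance(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (10, 10), (10, 11)]
    print(agglomerate(pts, 2, linkage="single"))
```

Recording the distance at each merge, rather than stopping at a fixed cluster count, gives the heights used to draw a dendrogram.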
Dendrograms

[Figure: dendrogram with a similarity axis]
Hierarchical Cluster Analysis
How does my metadata match my data structure?

Exploration

*finish lab 2-Cluster Analysis

Confirmation
Projection of Data

The algorithm defines the position of the light source
Principal Components Analysis (PCA)
• unsupervised
• maximize variance (X)
Partial Least Squares Projection to Latent Structures (PLS)
• supervised
• maximize covariance (Y ~ X)
James X. Li, 2009, VisuMap Tech.
Interpreting PCA Results
Variance explained (eigenvalues)

Row (sample) scores and column (variable) loadings
How are scores and loadings related?
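
One way to see the relationship is to compute the first principal component by hand: the loadings are the leading eigenvector of the covariance matrix, the scores are the centered data projected onto the loadings, and the variance explained is the eigenvalue divided by the total variance. The sketch below assumes power iteration as the eigenvector method and uses a hypothetical function name; it is not how the course software computes PCA.

```python
from math import sqrt


def first_principal_component(rows, n_iter=200):
    """Return (loadings, scores, fraction of variance explained) for PC1."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - means[j] for j in range(m)] for r in rows]
    # Sample covariance matrix of the variables (m x m).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(m)]
           for i in range(m)]
    # Power iteration: repeated multiplication converges to the leading
    # eigenvector unless the start vector is orthogonal to it.
    v = [1.0 / sqrt(m)] * m
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("start vector is orthogonal to the data variance")
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(m))
                     for i in range(m))
    total_variance = sum(cov[i][i] for i in range(m))
    # Scores: each centered sample projected onto the loadings.
    scores = [sum(c[j] * v[j] for j in range(m)) for c in centered]
    return v, scores, eigenvalue / total_variance


if __name__ == "__main__":
    data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
    loadings, scores, explained = first_principal_component(data)
    print(loadings, scores, explained)
```

Each sample gets one score per component and each variable gets one loading per component, which is why scores describe rows and loadings describe columns.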
Centering and Scaling

PMID: 16762068
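
Centering and two common scaling options can be sketched as follows. The function name and the `method` labels are hypothetical; autoscaling (unit variance) and Pareto scaling (dividing by the square root of the standard deviation) are assumed here as representative choices.

```python
from math import sqrt
from statistics import mean, stdev


def scale_columns(rows, method="auto"):
    """Center each variable (column) and optionally scale it.

    method: "center" (mean-center only), "auto" (divide by the standard
    deviation), or "pareto" (divide by the square root of the standard
    deviation). Constant columns are not handled in this sketch.
    """
    scaled_columns = []
    for column in zip(*rows):
        center = mean(column)
        sd = stdev(column)
        if method == "center":
            divisor = 1.0
        elif method == "auto":
            divisor = sd
        elif method == "pareto":
            divisor = sqrt(sd)
        else:
            raise ValueError(f"unknown method: {method}")
        scaled_columns.append([(x - center) / divisor for x in column])
    # Transpose back to samples-by-variables.
    return [list(row) for row in zip(*scaled_columns)]


if __name__ == "__main__":
    data = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
    print(scale_columns(data, "auto"))
    print(scale_columns(data, "pareto"))
```

Without scaling, PCA is dominated by the variables with the largest variance; autoscaling gives every variable equal weight, and Pareto scaling sits between the two.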

*finish lab 3-Principal Components Analysis
Use PLS to test a hypothesis
Partial Least Squares (PLS) is used to identify planes of maximum covariance between X measurements and Y (hypothesis)

[Figure panels: PLS, PCA; labels: time = 0, 120 min.]

Modeling multifactorial relationships
• ~two-way ANOVA
• dynamic changes among groups
PLS Related Objects
Model
•dimensions, latent variables (LV)
•performance metrics (Q2, RMSEP, etc)
•validation (training/testing, permutation, cross-validation)
•orthogonal correction
Samples
•scores
•predicted values
•residuals
Variables
•Loadings
•Coefficients, summary of loadings based on all LVs
•VIP, variable importance in projection
•Feature selection
“goodness” of the model is all about the perspective

Determine in-sample (Q2) and out-of-sample error (RMSEP) and compare to a random model
•permutation tests

•training/testing
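
The idea behind a permutation test can be sketched with a simple stand-in statistic. Here the absolute Pearson correlation between one predictor and Y plays the role of the model performance metric, which is an assumption for illustration; in a real PLS workflow the model would be refit on each permuted Y and a metric such as Q2 compared instead. All function names are hypothetical.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def permutation_p_value(x, y, n_permutations=999, seed=0):
    """Fraction of random Y orderings that perform at least as well as the real one."""
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break the X-Y pairing
        if abs(pearson(x, shuffled)) >= observed:
            hits += 1
    # Add one to both counts so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


if __name__ == "__main__":
    x = [float(i) for i in range(10)]
    y = [2.0 * v + 1.0 for v in x]
    print(permutation_p_value(x, y))
```

A small p-value means the real model does better than almost all models built on shuffled outcomes, which is the comparison to a "random model" described above.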
*finish lab 4-Partial Least Squares and lab 5-Data Analysis Case Study
Biological Interpretation
Projection or mapping of analysis results
into a biological context.
• Visualization
• Enrichment
• Networks
– biochemical
– structural
– spectral
– empirical
Identification of alterations in
biochemical domains
Organism specific biochemical relationships and information

Multiple organism DBs
• KEGG
• BioCyc
• Reactome

Human
• HMDB
• SMPDB
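
Enrichment of a biochemical domain is often assessed with an over-representation test. The sketch below assumes a hypergeometric model, which the slides do not name, and uses a hypothetical function name and made-up counts purely for illustration.

```python
from math import comb


def enrichment_p_value(n_background, n_in_pathway, n_selected, n_overlap):
    """Probability of seeing n_overlap or more pathway members by chance.

    n_background: all measured metabolites
    n_in_pathway: measured metabolites annotated to the pathway
    n_selected:   metabolites flagged as changed
    n_overlap:    flagged metabolites that belong to the pathway
    """
    total = comb(n_background, n_selected)
    upper = min(n_in_pathway, n_selected)
    tail = sum(
        comb(n_in_pathway, k) * comb(n_background - n_in_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Illustrative counts only: 200 measured, 15 in pathway, 20 changed, 6 shared.
    print(enrichment_p_value(200, 15, 20, 6))
```

Because many pathways are tested at once, the resulting p-values would normally be adjusted for multiple testing as well.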
*finish lab 6-Metabolite Enrichment Analysis
Network Mapping
1. Generate Connections
2. Calculate Mappings
3. Create Network

Grapov D., Fiehn O., Multivariate and network tools for analysis and visualization of metabolomic data, ASMS, June 08, 2013, Minneapolis, MN
Connections and Contexts

Biochemical (substrate/product)
• Database lookup
• Web query

Chemical (structural or spectral similarity)
• fingerprint generation
BMC Bioinformatics 2012, 13:99 doi:10.1186/1471-2105-13-99

Empirical (dependency)
• correlation, partial-correlation
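
Empirical connections can be sketched as an edge list built from pairwise correlations. This is a minimal illustration with hypothetical names and an arbitrary 0.8 cutoff; partial correlation, which removes indirect associations, is not shown.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_edges(profiles, threshold=0.8):
    """Return (metabolite_a, metabolite_b, r) for pairs with |r| >= threshold.

    profiles maps a metabolite name to its values across samples.
    """
    names = sorted(profiles)
    edges = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(profiles[a], profiles[b])
            if abs(r) >= threshold:
                edges.append((a, b, round(r, 3)))
    return edges


if __name__ == "__main__":
    example = {
        "A": [1.0, 2.0, 3.0, 4.0],
        "B": [2.0, 4.0, 6.0, 8.0],
        "C": [1.0, -1.0, 1.0, -1.0],
    }
    print(correlation_edges(example))
```

Each edge becomes a connection in the network; the sign and magnitude of r can then be mapped to edge color or width.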
Mapping Analysis Results
Analysis results

Network Annotation

*finish lab 7-Network Mapping I

Mapped Network
Biochemical Relationships

http://www.genome.jp/dbget-bin/www_bget?rn:R00975
Structural Similarity

http://pubchem.ncbi.nlm.nih.gov//score_matrix/score_matrix.cgi
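
Structural similarity between two molecules is commonly scored by comparing their fingerprints. The sketch below assumes the Tanimoto coefficient, a widely used choice, and represents a fingerprint as the set of bit positions that are switched on; the function name and the example bit positions are hypothetical.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient: shared on-bits divided by all on-bits."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)


if __name__ == "__main__":
    # Made-up bit positions for two hypothetical molecules.
    print(tanimoto({1, 2, 3}, {2, 3, 4}))
```

Pairs scoring above a chosen cutoff can be connected as edges, giving a structural similarity network.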
Mass Spectral Connections

Watrous J et al. PNAS 2012;109:E1743-E1752

*finish lab 8-Network Mapping II
