Statistical integration of
methylation, transcriptome and
proteome in cell lines
Said el Bouhaddani¹, Hae-Won Uh¹, Jeanine Houwing-Duistermaat¹,²
¹ Department of Biostatistics and Research Support, Julius Center, University Medical Center Utrecht,
Netherlands;
² Department of Statistics, University of Leeds, UK.
Background
Multiple System Atrophy (MSA) is a rare neurodegenerative disorder. Almost 80% of patients
are disabled within 5 years of disease onset. The key pathogenic event when developing MSA
is an abnormal accumulation of harmful proteins. Molecular causes and consequences of this
aggregation need to be elucidated, e.g. using multiple omics datasets.
We have access to DNA-methylome, transcriptome, and proteome data, measured in cell
lines that show harmful protein aggregation and in negative controls. Standard sequential
analysis of these data shows no overlap of the significant genes.
Our aim is to develop a data integration method to identify consistent molecular biomarkers
that can classify cells with protein aggregation across all datasets. Apart from the high
dimensionality (p>N), platform-specific heterogeneity between the omics datasets also needs
to be considered.
Motivating data & challenges
Methylome
- 850k sites on 4 cases, 4 controls
Transcriptome
- 25k probes on 3 cases, 3 controls
Proteome
- 2k proteins on 9 cases, 9 controls
Preprocessing: normalize data and map all IDs
to gene IDs
Final dataset: 1732 overlapping genes on 16
cases and 16 controls
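The gene-overlap step can be illustrated with a minimal sketch. The gene IDs and values below are hypothetical toy data, and normalization and the mapping of probe, site, or protein IDs to gene IDs are assumed to have been done already; this is not the authors' pipeline.

```python
# Hypothetical toy data: each platform maps gene ID -> list of per-sample values.
# Real preprocessing (normalization, ID-to-gene mapping) is assumed done upstream.
methylome = {"GENE_A": [0.1, 0.4], "GENE_B": [0.3, 0.2], "GENE_C": [0.9, 0.8]}
transcriptome = {"GENE_B": [5.1, 4.9], "GENE_C": [7.2, 6.8], "GENE_D": [1.0, 1.1]}
proteome = {"GENE_A": [2.0, 2.2], "GENE_B": [3.1, 3.0], "GENE_C": [0.5, 0.7]}


def overlapping_genes(*platforms):
    """Return the sorted gene IDs that are measured on every platform."""
    common = set(platforms[0])
    for platform in platforms[1:]:
        common &= set(platform)
    return sorted(common)


def restrict(platform, genes):
    """Build a samples-by-genes matrix (list of rows) for the given genes."""
    n_samples = len(next(iter(platform.values())))
    return [[platform[g][i] for g in genes] for i in range(n_samples)]


genes = overlapping_genes(methylome, transcriptome, proteome)
# One samples-by-genes block per omics platform, all with the same columns.
blocks = [restrict(p, genes) for p in (methylome, transcriptome, proteome)]
```

Keeping one block per platform, rather than one stacked matrix, leaves room for a platform-specific part in each block.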
Challenges
- High dimensional (p>N)
- Highly correlated
- Different platforms
Methods
Several estimation methods have been proposed. The
general model is written as
$$x_k = t_k W^\top + t_{s,k} W_{s,k}^\top + e_k$$
$$y_k = t_k B + h_k$$
Underlying general model
For each omics dataset $k = 1, \dots, 3$, we introduce
- Joint latent variables $t_k$ underlying omics data $x_k$
and MSA outcome $y_k$
- Omic-specific latent variables $t_{s,k}$ for each omics
dataset
Methods
Three methods considered
- Sparse PLS-DA: $t_s = 0$; algorithmic, sequential estimation
- Sparse OPLS-DA: algorithmic, sequential estimation
- Probabilistic OPLS-DA: likelihood-based, simultaneous estimation
Estimation methods
Sparse PLS-DA (sPLS-DA) [1]
1. Convert binary $y$ to a numerical 'dummy' $y$
2. Maximize $w^\top X^\top y$ with an L1 penalty on $w$
3. Calculate $\hat{y} = x W B$ and obtain class predictions
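The three sPLS-DA steps can be sketched for a single component as follows. A common way to solve this kind of L1-penalized problem under a unit-norm constraint is to soft-threshold $X^\top y$ and rescale; the sketch assumes that approach. The data, the threshold choice, and the function names are illustrative assumptions, not the implementation in [1].

```python
import numpy as np


def soft_threshold(z, lam):
    """Elementwise soft-thresholding, the proximal operator of the L1 norm."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)


def sparse_weight(X, y, lam):
    """One sparse weight vector: soft-threshold X^T y, then scale to unit norm."""
    w = soft_threshold(X.T @ y, lam)
    norm = np.linalg.norm(w)
    return w / norm if norm > 0 else w


rng = np.random.default_rng(0)
n, p = 8, 20
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# Step 1: dummy-code the binary outcome (here simply centered 0/1 labels).
y = labels - labels.mean()

# Toy data (assumption): only the first three features carry class signal.
X = rng.normal(size=(n, p))
X[:, :3] += 2.0 * y[:, None]
X -= X.mean(axis=0)

# Step 2: sparse weights; the threshold is set to the fourth-largest |X^T y|,
# so that the three largest entries survive.
z = X.T @ y
lam = np.sort(np.abs(z))[-4]
w = sparse_weight(X, y, lam)

# Step 3: regress y on the scores and turn fitted values into class predictions.
scores = X @ w
b = (scores @ y) / (scores @ scores)
y_hat = scores * b
predicted = (y_hat > 0).astype(int)
```

In practice the sparsity level would be chosen by cross-validation or fixed in advance, as with the 50 genes retained in the data analysis.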
Sparse OPLS-DA (sOPLS-DA) [2]
1. Obtain estimates for $t_s W_s^\top$ using OPLS
2. Subtract these parts from the original data matrix $X$
3. Follow the steps in sparse PLS-DA using the corrected $X$
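Steps 1 and 2 can be sketched with the usual single-component OPLS filter, which estimates one component of $X$ that is orthogonal to $y$ and subtracts it. This is a minimal sketch on random toy data; the function name is hypothetical, and the sparse step 3 is not repeated here.

```python
import numpy as np


def remove_orthogonal_component(X, y):
    """Remove one y-orthogonal component from X (single OPLS filtering step)."""
    # Predictive weight and score.
    w = X.T @ y
    w = w / np.linalg.norm(w)
    t = X @ w
    p = X.T @ t / (t @ t)

    # Part of the loading that is orthogonal to the predictive weight.
    w_orth = p - (w @ p) * w
    w_orth = w_orth / np.linalg.norm(w_orth)

    # Orthogonal score and loading: the estimate of t_s and W_s.
    t_orth = X @ w_orth
    p_orth = X.T @ t_orth / (t_orth @ t_orth)

    # Step 2: subtract the estimated specific part from X.
    X_corrected = X - np.outer(t_orth, p_orth)
    return X_corrected, t_orth, p_orth


rng = np.random.default_rng(0)
X = rng.normal(size=(10, 30))
X -= X.mean(axis=0)
y = np.array([0.0] * 5 + [1.0] * 5)
y -= y.mean()

X_corrected, t_orth, p_orth = remove_orthogonal_component(X, y)
```

The corrected matrix can then be passed to the sparse PLS-DA steps. With few samples, each extra orthogonal component costs additional parameters per dataset.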
Probabilistic OPLS-DA (POPLS-DA)
1. Formulate the observed likelihood $f(x, y)$
2. Formulate the complete likelihood $f(x, y, t) = f(x \mid t)\, f(y \mid t)\, f(t)$
   - Each term can be optimized in a computationally efficient way
3. Use the EM algorithm on $f(x, y, t)$ to obtain maximizers of $f(x, y)$
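The relation between the observed and the complete likelihood can be written out as below. This is the generic EM scheme, not the authors' exact update equations. The parameter vector $\theta$ and the iteration index $m$ are notation introduced here, and the omic-specific latent variables are left out, as in the factorization above.

```latex
\begin{align*}
f(x, y; \theta) &= \int f(x \mid t; \theta)\, f(y \mid t; \theta)\, f(t; \theta)\, dt, \\
Q(\theta \mid \theta^{(m)}) &= \mathbb{E}\left[ \log f(x \mid t; \theta) + \log f(y \mid t; \theta) + \log f(t; \theta) \,\middle|\, x, y; \theta^{(m)} \right], \\
\theta^{(m+1)} &= \arg\max_{\theta}\, Q(\theta \mid \theta^{(m)}).
\end{align*}
```

Because the complete log-likelihood is a sum of three terms, the M-step can treat each term separately, and no EM iteration decreases $f(x, y; \theta)$.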
Simulation study
Conclusions
- POPLS-DA scores highest on accuracy, even with small sample sizes
- Sparse OPLS-DA is likely to overfit: it estimates omics-specific parts in each dataset, while the
sample size is low
Setup
- Simulate $X$ and $y$ from the "underlying model"
- Setup close to real data:
  - 1000 features
  - 3 data types with 8, 6 and 18 samples, respectively
  - Two joint, two specific components
- Calculate accuracy of prediction using large
simulated test data:
  - 500 × {8, 6, 18} samples
- Compare sPLS-DA, sOPLS-DA, POPLS-DA
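The setup can be sketched as a data-generating script. The dimensions and sample sizes come from the list above; the standard-normal loadings, latent variables and noise, the unit noise scale, and the rule of thresholding the continuous outcome at zero to get class labels are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

p, r, r_s = 1000, 2, 2                 # features, joint and specific components
train_sizes = [8, 6, 18]               # one entry per data type
test_sizes = [500 * n for n in train_sizes]

# Assumption: loadings drawn from a standard normal distribution.
W = rng.normal(size=(p, r))            # joint loadings, shared by all data types
B = rng.normal(size=r)                 # links the joint components to the outcome
W_specific = [rng.normal(size=(p, r_s)) for _ in train_sizes]


def simulate(n, W, W_s, B, rng, noise_sd=1.0):
    """Draw n samples from x = t W^T + t_s W_s^T + e and y = t B + h."""
    t = rng.normal(size=(n, W.shape[1]))        # joint latent variables
    t_s = rng.normal(size=(n, W_s.shape[1]))    # omic-specific latent variables
    e = rng.normal(scale=noise_sd, size=(n, W.shape[0]))
    h = rng.normal(scale=noise_sd, size=n)
    x = t @ W.T + t_s @ W_s.T + e
    # Assumption: class labels come from thresholding the continuous outcome at 0.
    y = ((t @ B + h) > 0).astype(int)
    return x, y


train = [simulate(n, W, W_s, B, rng) for n, W_s in zip(train_sizes, W_specific)]


def iter_test_sets():
    """Yield the large test sets one at a time to limit memory use."""
    for n, W_s in zip(test_sizes, W_specific):
        yield simulate(n, W, W_s, B, rng)


def accuracy(true_labels, predicted_labels):
    """Fraction of correctly predicted class labels."""
    return float(np.mean(np.asarray(true_labels) == np.asarray(predicted_labels)))
```

Each method would be fitted on `train` and scored with `accuracy` on the sets from `iter_test_sets()`, repeated over many simulated replicates.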
Data analysis
Results
- Two joint, two specific
components
- Sparsity level: 50 genes
retained (not for POPLS-DA)
- All methods separate MSA cases from controls
Conclusions
- sOPLS-DA yields more homogeneous clusters
- POPLS-DA shows more spread and is less certain about its
predictions
- The top ten genes are directly involved in harmful protein
aggregation
Conclusions
- POPLS-DA discriminates MSA based on multiple omics data, performs best
for small sample size
- Simulation: algorithmic methods sPLS-DA and sOPLS-DA likely to overfit,
need larger sample size
- MSA cases separated from controls based on 3 omics datasets, top genes
biologically important
- POPLS-DA will be added to the OmicsPLS package (on
cran.r-project.org/package=OmicsPLS)
s.elbouhaddani@umcutrecht.nl
Acknowledgments
Günter Höglinger
Jörg Tost
Matthias Höllerhage
E-Rare EU project: MSAomics
H2020 project: IMFORFUTURE
References
[1] Lê Cao, K., Boitard, S. and Besse, P. (2011), Sparse PLS discriminant analysis:
biologically relevant feature selection and graphical displays for multiclass
problems. BMC Bioinformatics, 12: 253. doi:10.1186/1471-2105-12-253
[2] Bylesjö, M., Rantalainen, M., Cloarec, O., Nicholson, J.K., Holmes, E. and
Trygg, J. (2006), OPLS discriminant analysis: combining the strengths of PLS-DA
and SIMCA classification. J. Chemometrics, 20: 341-351. doi:10.1002/cem.1006

Omics data integration for MSA | International Society for Clinical Biostatistics 2020
