An analytic approach for interpretable predictive models in high-dimensional data, in the presence of interactions with exposures
Sahir Rai Bhatnagar, PhD Candidate
Joint with Yi Yang, Mathieu Blanchette, Luigi Bouchard, Celia Greenwood
Biostatistics, McGill University
Preprint available at sahirbhatnagar.com
Simple Rule 11: Simulated Data ≠ Real Data
Motivation
One predictor variable at a time
[Figure: each predictor variable is tested separately against the phenotype (Test 1 to Test 5)]
A network-based view
[Figure: predictor variables connected in a network and tested against the phenotype with a single test (Test 1)]
System-level changes due to the environment
[Figure: predictor variables, phenotype, and environment, with network states A and B and a single test (Test 1)]
Motivating Dataset: Newborn epigenetic adaptations to gestational diabetes exposure (Luigi Bouchard, USherbrooke)
Environment: Gestational diabetes
Large Data: Child's epigenome (p ≈ 450k)
Phenotype: Obesity measures
Differential Correlation between Environments
[Figure: (a) gestational-diabetes-affected pregnancies; (b) controls]
NIH MRI Brain Study
Environment: Age
Large Data: Cortical thickness (p ≈ 80k)
Phenotype: Intelligence
Goals of this study
Objectives
(i) Determine whether clustering that incorporates known covariate or exposure information can improve prediction models
(ii) Determine whether the resulting clusters provide an easier route to interpretation
Methods
ECLUST - our proposed method: 2 steps
[Figure: ECLUST schematic. The original data are split by exposure status (E = 0, E = 1); 1a) gene similarity is computed within each group; 1b) each cluster is summarized by an n × 1 representation; 2) penalized regression Y_{n×1} ∼ cluster representations + (cluster representations × E)]
The object of statistical methods is the reduction of data. A quantity of data . . . is to be replaced by relatively few quantities which shall adequately represent . . . the relevant information contained in the original data.
- Sir R. A. Fisher, 1922
Step 1a: Method to detect gene clusters
(i) Hierarchical clustering (average linkage) with topological overlap matrix (TOM)¹ scoring; dissimilarity²: |TOM_{E=1} − TOM_{E=0}|
(ii) Number of clusters chosen using the dynamicTreeCut algorithm³
[Figure: ECLUST schematic, step 1a) Gene Similarity computed separately for E = 0 and E = 1]
¹ Ravasz et al., Science (2002)
² Klein Oros et al., Frontiers in Genetics (2016)
³ Langfelder and Zhang, Bioinformatics (2008)
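The two parts of Step 1a can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the ECLUST R implementation. The adjacency is assumed to be the WGCNA-style soft threshold |cor|^β with β = 6 (the slide does not specify it). |TOM_{E=1} − TOM_{E=0}| is treated as a similarity and converted to a distance as 1 minus that value before clustering, which is also an assumption. dynamicTreeCut is swapped for a plain fixed-height cut using SciPy's `fcluster`. The function names `tom` and `eclust_step1a` and the `height` parameter are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def tom(x, beta=6):
    """Weighted topological overlap matrix (TOM) of the columns (genes) of x.

    Assumption: adjacency a_ij = |cor(x_i, x_j)| ** beta (WGCNA-style soft threshold).
    """
    a = np.abs(np.corrcoef(x, rowvar=False)) ** beta
    np.fill_diagonal(a, 0.0)                      # no self-connections
    k = a.sum(axis=1)                             # connectivity k_i
    shared = a @ a                                # l_ij: sum over u != i, j (zero diagonal)
    t = (shared + a) / (np.minimum.outer(k, k) + 1.0 - a)
    np.fill_diagonal(t, 1.0)
    return t


def eclust_step1a(x, e, height=0.9, beta=6):
    """Cluster genes by how much their topological overlap changes with E.

    x: n x p data matrix; e: length-n array of 0/1 exposure labels.
    Returns a length-p array of integer cluster labels.
    """
    e = np.asarray(e)
    diff = np.abs(tom(x[e == 1], beta) - tom(x[e == 0], beta))
    # Assumption: the absolute TOM difference is used as a similarity, so genes
    # whose co-expression changes together with E end up close to each other.
    dissim = 1.0 - diff
    np.fill_diagonal(dissim, 0.0)
    tree = linkage(squareform(dissim, checks=False), method="average")
    # Stand-in for dynamicTreeCut: cut the dendrogram at a fixed height.
    return fcluster(tree, t=height, criterion="distance")
```

For example, `eclust_step1a(x, e)` on an n × p matrix `x` with a 0/1 vector `e` returns one label per gene. The fixed cut height is the main simplification: dynamicTreeCut adapts the cut to the shape of each branch, which this sketch does not attempt.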
Step 1b: Cluster Representation
(i) Average⁴
(ii) 1st Principal Component⁵
[Figure: ECLUST schematic, step 1b) each cluster is summarized by an n × 1 vector]
⁴ Hastie et al., Genome Biology (2001); Park et al., Biostatistics (2007)
⁵ Kendall, A Course in Multivariate Analysis (1957)
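Both cluster summaries in Step 1b reduce each cluster to a single n × 1 vector. The sketch below assumes genes are standardized before averaging and takes the first principal component scores from an SVD of the centered cluster submatrix. The helper names (`cluster_average`, `cluster_pc1`, `represent_clusters`) are hypothetical; `labels` is a per-gene cluster label array such as the output of Step 1a.

```python
import numpy as np


def cluster_average(x, labels, k):
    """(i) Average: row-wise mean of the standardized genes in cluster k."""
    sub = x[:, labels == k]
    z = (sub - sub.mean(axis=0)) / sub.std(axis=0)   # assumption: standardize first
    return z.mean(axis=1)                            # n-vector


def cluster_pc1(x, labels, k):
    """(ii) First principal component scores of the genes in cluster k."""
    sub = x[:, labels == k]
    centered = sub - sub.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, 0] * s[0]                            # n-vector; sign is arbitrary


def represent_clusters(x, labels, method="average"):
    """Return an n x K matrix with one representative column per cluster."""
    if method == "average":
        summarize = cluster_average
    elif method == "pc1":
        summarize = cluster_pc1
    else:
        raise ValueError("method must be 'average' or 'pc1'")
    labels = np.asarray(labels)
    return np.column_stack([summarize(x, labels, k) for k in np.unique(labels)])
```

PC1 scores are only defined up to sign, so a cluster's coefficient in the regression step can flip sign between fits; the average has no such ambiguity.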
Step 2: Variable Selection
(i) Linear effects: Lasso, Elastic Net⁶
(ii) Non-linear effects: MARS (multivariate adaptive regression splines)⁷
[Figure: ECLUST schematic, step 2) penalized regression Y_{n×1} ∼ cluster representations + (cluster representations × E)]
⁶ Tibshirani, JRSSB (1996); Zou and Hastie, JRSSB (2005)
⁷ Friedman, Annals of Statistics (1991)
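A sketch of the linear branch of Step 2, under assumptions: the design matrix stacks the cluster representations, E, and the products of each representation with E, and a plain lasso is fitted by cyclic coordinate descent at a fixed penalty `lam`. In practice one would choose `lam` by cross-validation and use an established solver (for example R's glmnet or scikit-learn's `Lasso`/`ElasticNet`). The elastic net and MARS branches are not shown, and all function names here are hypothetical.

```python
import numpy as np


def soft_threshold(z, gamma):
    """Soft-thresholding operator S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)


def lasso_cd(X, y, lam, n_iter=500, tol=1e-8):
    """Lasso by cyclic coordinate descent on standardized columns Z of X.

    Minimizes (1 / (2n)) * ||(y - mean(y)) - Z b||^2 + lam * ||b||_1, then maps
    b back to the original scale of X. Returns (intercept, coef).
    """
    n, p = X.shape
    mu, sd = X.mean(axis=0), X.std(axis=0)
    Z = (X - mu) / sd                        # each column: mean 0, variance 1
    yc = y - y.mean()
    b = np.zeros(p)
    r = yc.copy()                            # current residual yc - Z b
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            old = b[j]
            rho = Z[:, j] @ r / n + old      # partial-residual correlation
            b[j] = soft_threshold(rho, lam)
            if b[j] != old:
                r -= Z[:, j] * (b[j] - old)
                max_change = max(max_change, abs(b[j] - old))
        if max_change < tol:
            break
    coef = b / sd
    intercept = y.mean() - mu @ coef
    return intercept, coef


def eclust_design(reps, e):
    """Design matrix [cluster representations, E, representations x E]."""
    e = np.asarray(e, dtype=float)[:, None]
    return np.hstack([reps, e, reps * e])
```

Because every column is standardized before penalization, a single `lam` penalizes main effects and interactions on a common scale; the returned coefficients are mapped back to the original scale.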
Simulation Study
Simulated TOM by Exposure Status
[Figure: (a) TOM(X_{E=1}); (b) TOM(X_{E=0})]
Difference of TOMs
[Figure: (a) |TOM(X_{E=1}) − TOM(X_{E=0})|]
TOM Based on All Subjects
[Figure: (a) TOM(X_all)]
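The three simulation figures contrast exposure-specific TOMs with a TOM computed on all subjects. The hypothetical simulation below (not the authors' design) makes the contrast concrete: five genes share a latent factor only when E = 1, and the mean within-block TOM is reported for each matrix. Under these settings the block stands out in TOM(X_{E=1}) and in the absolute difference, while pooling all subjects weakens it considerably. The adjacency |cor|^6 and every parameter value are assumptions.

```python
import numpy as np


def tom(x, beta=6):
    """Weighted topological overlap matrix of the columns of x (WGCNA-style)."""
    a = np.abs(np.corrcoef(x, rowvar=False)) ** beta   # assumed soft-threshold adjacency
    np.fill_diagonal(a, 0.0)
    k = a.sum(axis=1)
    t = (a @ a + a) / (np.minimum.outer(k, k) + 1.0 - a)
    np.fill_diagonal(t, 1.0)
    return t


def simulate(n=200, p=20, block=5, noise=0.3, seed=0):
    """Hypothetical design: genes 0..block-1 are co-regulated only when E = 1."""
    rng = np.random.default_rng(seed)
    e = np.repeat([0, 1], n // 2)
    x = rng.standard_normal((n, p))
    z = rng.standard_normal(n // 2)                    # shared latent factor
    x[e == 1, :block] = z[:, None] + noise * rng.standard_normal((n // 2, block))
    return x, e


def mean_block_overlap(t, block):
    """Average off-diagonal entry within the co-regulated block."""
    sub = t[:block, :block]
    return (sub.sum() - np.trace(sub)) / (block * (block - 1))


x, e = simulate()
t1, t0, t_all = tom(x[e == 1]), tom(x[e == 0]), tom(x)
summary = {
    "TOM(X_E=1)": mean_block_overlap(t1, 5),
    "TOM(X_E=0)": mean_block_overlap(t0, 5),
    "|difference|": mean_block_overlap(np.abs(t1 - t0), 5),
    "TOM(X_all)": mean_block_overlap(t_all, 5),
}
for name, value in summary.items():
    print(f"{name:>14}: {value:.3f}")
```

Pooling dilutes the block because the correlation in the unexposed half pulls the overall correlation down, and the power β = 6 then shrinks the reduced correlation sharply.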
Real Data Analysis
Gestational Diabetes: Prediction Performance
Gestational Diabetes: Interpretation of Clusters with IPA (Ingenuity Pathway Analysis)
• Canonical Pathways: 1,25-dihydroxyvitamin D3 Biosynthesis (vitamin D is associated with obesity)
• Diseases and Disorders: Hepatic System Disease (metabolism of glucose and lipids)
• Physiological System Development and Function:
(i) Behavior and neurodevelopment (associated with obesity)
(ii) Embryonic and organ development (gestational diabetes is associated with macrosomia)
NIHPD: Age
NIHPD: Income
Final Remarks
Discussion and Contributions
• Large, system-wide changes associated with environmental exposures are observed in many data types (DNA methylation, cortical thickness, gene expression)
• Environment-dependent clustering can improve prediction performance in high-dimensional settings (n ≪ p)
• Clusters can be interpreted, but doing so requires considerable expert knowledge
• Leverages existing, computationally fast algorithms and can run on a laptop computer (p ≈ 10k)
• Software implementation in R: sahirbhatnagar.com
Limitations
• There must be a high-dimensional signature of the exposure
• Covariance estimation is difficult in high-dimensional settings
• Currently limited to a binary environment
• Interpretation can be difficult
Acknowledgements
• Dr. Celia Greenwood
• Dr. Blanchette and Dr. Yang
• Dr. Luigi Bouchard, André Anne Houde
• Dr. Steele, Dr. Kramer, Dr. Abrahamowicz
• Maxime Turgeon, Kevin McGregor, Lauren Mokry, Dr. Forest
• Greg Voisin, Dr. Forgetta, Dr. Klein
• Mothers and children from the study
