Pathways-Driven Sparse Regression Identifies
Pathways and Genes Associated with High-Density
Lipoprotein Cholesterol in Two Asian Cohorts
Silver M, Chen P, Li R, Cheng C-Y, Wong T-Y, et al.
In PLOS Genetics, 2013
Introduction
• Genes do not act in isolation, but interact in complex networks or pathways
• Rather than univariate approaches, a joint modelling approach is proposed: a dual-level, sparse regression model
• The model can simultaneously identify pathways and, within selected pathways, genes
• Pathways-driven gene selection is used to search for pathways and genes associated with variation in high-density lipoprotein cholesterol
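Sparse group lasso model
• N individuals, P SNPs, (N × P) genotype matrix X, L pathways; y is the N-vector of phenotypes
• Assumptions
• All P SNPs may be mapped to L groups or pathways
• Pathways are disjoint (non-overlapping)
[Figure: causal SNPs within causal pathways]
$$\hat{\beta}_{\mathrm{SGL}} = \arg\min_{\beta}\ \frac{1}{2}\Big\|y - \sum_{l=1}^{L} X_l\beta_l\Big\|_2^2 + \underbrace{(1-\alpha)\lambda\sum_{l=1}^{L} w_l\|\beta_l\|_2}_{\text{pathway level constraint}} + \underbrace{\alpha\lambda\|\beta\|_1}_{\text{SNP level constraint}}$$
• X_l and β_l are the genotype columns and coefficients of the SNPs in pathway l; w_l is the weight of pathway l
• α controls how the sparsity constraint is distributed between the two penalties
• λ controls the degree of sparsity in β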
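To make the two penalties concrete, here is a minimal NumPy sketch that evaluates the SGL objective for given coefficients. It assumes the standard SGL form ½‖y − Σ X_l β_l‖² + (1 − α)λ Σ w_l‖β_l‖₂ + αλ‖β‖₁ with disjoint pathways passed as a list of genotype sub-matrices; the function name `sgl_objective` is hypothetical.

```python
import numpy as np


def sgl_objective(y, X_groups, beta_groups, lam, alpha, weights):
    """Sparse group lasso objective for disjoint pathways (illustrative sketch).

    y           : (N,) phenotype vector
    X_groups    : list of (N, P_l) genotype sub-matrices, one per pathway
    beta_groups : list of (P_l,) coefficient vectors, one per pathway
    lam, alpha  : regularisation parameters lambda and alpha
    weights     : list of pathway weights w_l
    """
    fitted = sum(X_l @ b_l for X_l, b_l in zip(X_groups, beta_groups))
    loss = 0.5 * np.sum((y - fitted) ** 2)
    # Pathway-level constraint: weighted l2 norm of each pathway's coefficients
    pathway_penalty = (1 - alpha) * lam * sum(
        w_l * np.linalg.norm(b_l) for w_l, b_l in zip(weights, beta_groups)
    )
    # SNP-level constraint: l1 norm over all SNP coefficients
    snp_penalty = alpha * lam * sum(np.abs(b_l).sum() for b_l in beta_groups)
    return loss + pathway_penalty + snp_penalty


# Toy usage: two single-SNP pathways whose coefficients fit y exactly
y = np.array([1.0, 2.0])
X_groups = [np.array([[1.0], [0.0]]), np.array([[0.0], [1.0]])]
beta_groups = [np.array([1.0]), np.array([2.0])]
print(sgl_objective(y, X_groups, beta_groups, lam=1.0, alpha=0.5, weights=[1.0, 1.0]))  # 3.0
```

Moving α towards 1 shifts the penalty from the pathway term to the SNP term, which is why α decides how group-sparse versus SNP-sparse a solution is.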
SGL model estimation
• To estimate $\hat{\beta}_{\mathrm{SGL}}$, a block (group-wise) coordinate gradient descent (BCGD) algorithm is used
• Select a pathway l
• Select a SNP j in the selected pathway l
• Compute pathway and SNP partial residuals, regressing out the current estimated effects of all other pathways and SNPs
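The estimation loop on this slide can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: pathways are disjoint, the objective is the standard SGL form with a ½ squared-error loss, and, inside a pathway that passes the group-level check, the sketch swaps single-SNP coordinate updates for proximal gradient steps on the whole pathway (soft-thresholding followed by group shrinkage), which targets the same objective. The name `sgl_bcgd` is hypothetical.

```python
import numpy as np


def soft_threshold(z, t):
    """Element-wise soft-thresholding operator S(z, t)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def sgl_bcgd(y, X_groups, lam, alpha, weights, n_outer=100, n_inner=50, tol=1e-8):
    """Block-wise descent for the sparse group lasso (illustrative sketch)."""
    betas = [np.zeros(X_l.shape[1]) for X_l in X_groups]
    fitted = np.zeros(len(y))
    for _ in range(n_outer):
        max_change = 0.0
        for l, X_l in enumerate(X_groups):  # select a pathway l
            # Pathway partial residual: regress out all other pathways' current effects
            r_l = y - fitted + X_l @ betas[l]
            # Pathway-level check: beta_l = 0 is optimal if this norm is small enough
            z0 = soft_threshold(X_l.T @ r_l, alpha * lam)
            if np.linalg.norm(z0) <= (1 - alpha) * lam * weights[l]:
                new_b = np.zeros_like(betas[l])
            else:
                new_b = betas[l].copy()
                step = 1.0 / np.linalg.norm(X_l, 2) ** 2  # 1 / Lipschitz constant
                for _ in range(n_inner):
                    # Gradient step on pathway l, then the SGL proximal map
                    u = soft_threshold(new_b + step * (X_l.T @ (r_l - X_l @ new_b)),
                                       step * alpha * lam)
                    norm_u = np.linalg.norm(u)
                    if norm_u == 0.0:
                        new_b = u
                    else:
                        new_b = max(0.0, 1 - step * (1 - alpha) * lam * weights[l] / norm_u) * u
            fitted += X_l @ (new_b - betas[l])
            max_change = max(max_change, float(np.max(np.abs(new_b - betas[l]))))
            betas[l] = new_b
        if max_change < tol:
            break
    return betas


y = np.array([3.0, 0.2])
X_groups = [np.array([[1.0], [0.0]]), np.array([[0.0], [1.0]])]
betas = sgl_bcgd(y, X_groups, lam=1.0, alpha=0.5, weights=[1.0, 1.0])
# betas[0] is approximately [2.0]; the weak second pathway is set exactly to zero
```

The group-level check is what makes whole pathways drop out: if even the soft-thresholded correlation with the partial residual is too small, the pathway is skipped without any SNP-level work.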
SGL simulation study 1
• Hypothesis: when causal SNPs are enriched in a given pathway, pathways-driven SNP selection using SGL will outperform simple lasso selection
• 5 causal SNPs are randomly selected from a single pathway; SGL is compared with lasso selection from all 2,500 SNPs (without pathway information)
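A sketch of how data for a study of this kind might be generated. All settings here are assumptions chosen for illustration, not the paper's design: 50 disjoint pathways of 50 SNPs (2,500 SNPs in total), genotypes drawn as Binomial(2, MAF) with MAF uniform on [0.05, 0.5] and then standardised, a common effect size for the 5 causal SNPs, and standard normal noise. `simulate_study1` is a hypothetical name.

```python
import numpy as np


def simulate_study1(n=400, n_pathways=50, snps_per_pathway=50, n_causal=5,
                    effect=0.3, seed=0):
    """Hypothetical generator: causal SNPs drawn from one randomly chosen pathway."""
    rng = np.random.default_rng(seed)
    p = n_pathways * snps_per_pathway
    maf = rng.uniform(0.05, 0.5, size=p)
    # Additive genotype coding 0/1/2, one MAF per SNP (broadcast across individuals)
    X = rng.binomial(2, maf, size=(n, p)).astype(float)
    X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)
    groups = [np.arange(l * snps_per_pathway, (l + 1) * snps_per_pathway)
              for l in range(n_pathways)]
    causal_pathway = int(rng.integers(n_pathways))
    causal = rng.choice(groups[causal_pathway], size=n_causal, replace=False)
    beta = np.zeros(p)
    beta[causal] = effect
    y = X @ beta + rng.standard_normal(n)
    return X, y, groups, causal_pathway, np.sort(causal)


X, y, groups, causal_pathway, causal = simulate_study1()
```

SGL would be fitted with `groups`, while the lasso comparison ignores them and sees only the 2,500 columns of `X`.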
The problem of overlapping pathways
• Genes and SNPs may map to multiple pathways
• The optimization is then no longer separable into groups (pathways), so pathways cannot be selected independently
• By duplicating SNP predictors, SNPs belonging to more than one pathway can enter the model separately
• SNPs are then selected in each pathway whose joint effects pass a pathway selection threshold, irrespective of overlaps between pathways
• Pathways are treated as independent: they do not compete in the model estimation process
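The duplication step can be sketched as follows: each pathway gets its own copy of the genotype columns of its SNPs, giving an expanded matrix X* whose column groups are disjoint, plus an index that maps every duplicated column back to its original SNP. This is an illustrative sketch; `expand_overlapping` is a hypothetical name.

```python
import numpy as np


def expand_overlapping(X, pathways):
    """Duplicate SNP columns so that overlapping pathways become disjoint groups.

    X        : (N, P) genotype matrix
    pathways : list of SNP index lists (pathways may share SNPs)
    Returns X_star, the column groups of X_star, and origin[c] = SNP index of column c.
    """
    blocks, groups, origin = [], [], []
    start = 0
    for snps in pathways:
        snps = np.asarray(snps, dtype=int)
        blocks.append(X[:, snps])  # this pathway's own copy of its SNP columns
        groups.append(np.arange(start, start + len(snps)))
        origin.extend(snps.tolist())
        start += len(snps)
    return np.hstack(blocks), groups, np.asarray(origin, dtype=int)


X = np.arange(12, dtype=float).reshape(3, 4)  # 3 individuals, 4 SNPs
X_star, groups, origin = expand_overlapping(X, [[0, 1, 2], [2, 3]])
# SNP 2 is in both pathways, so X_star has 5 columns and origin is [0, 1, 2, 2, 3]
selected_columns = np.array([2, 3])  # e.g. columns selected in the expanded model
selected_snps = np.unique(origin[selected_columns])  # both map back to SNP 2
```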
[Figure: partially overlapping causal SNPs]
The problem of overlapping pathways
• Each pathway is regressed against the phenotype vector y separately
• Coordinate gradient descent is run only within each selected pathway (SGL-CGD)
• Under the independence assumption, the estimate of each $\beta_l^*$ does not depend on the other estimates $\beta_k^*$ ($k \neq l$)
• Only the set of selected SNPs in each selected pathway needs to be recorded
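Because pathways are treated as independent, each one can be screened and fitted against y on its own. The sketch below assumes an expanded matrix X* produced by SNP duplication and the standard SGL penalties with pathway weights w_l; within a selected pathway it uses proximal gradient steps in place of single-SNP coordinate updates, which is a swapped-in technique. It records only the selected SNP columns per selected pathway. `sgl_cgd_select` is a hypothetical name.

```python
import numpy as np


def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def sgl_cgd_select(y, X_star, groups, lam, alpha, weights, n_iter=200):
    """Fit each pathway independently against y; return {pathway: selected columns}."""
    selected = {}
    for l, idx in enumerate(groups):
        idx = np.asarray(idx, dtype=int)
        X_l = X_star[:, idx]
        # Pathway selection threshold, evaluated against y alone (no competition)
        if np.linalg.norm(soft_threshold(X_l.T @ y, alpha * lam)) <= (1 - alpha) * lam * weights[l]:
            continue
        b = np.zeros(len(idx))
        step = 1.0 / np.linalg.norm(X_l, 2) ** 2
        for _ in range(n_iter):
            u = soft_threshold(b + step * (X_l.T @ (y - X_l @ b)), step * alpha * lam)
            norm_u = np.linalg.norm(u)
            b = u if norm_u == 0.0 else max(0.0, 1 - step * (1 - alpha) * lam * weights[l] / norm_u) * u
        selected[l] = idx[b != 0]  # only the selected SNP columns are recorded
    return selected


X_star = np.eye(3)  # 3 individuals, 3 (duplicated) SNP columns
groups = [[0, 1], [2]]
y = np.array([3.0, 0.0, 0.1])
selected = sgl_cgd_select(y, X_star, groups, lam=1.0, alpha=0.5, weights=[1.0, 1.0])
# only pathway 0 is selected, with SNP column 0
```

Since no pathway's estimate feeds into another's, the loop over pathways could also be run in parallel.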
SGL simulation study 2
Figure 5. SGL Simulation Study with overlapping pathways
Table 1. Mean number of pathways and SNPs selected by each model
at each effect size, γ, across 2000 MC simulations
• SNPs are mapped to 50 overlapping pathways,
each containing 30 SNPs
• Each pathway overlaps any adjacent pathway
by 10 SNPs
• The number of selected pathways and SNPs increases as the effect size decreases, since more pathways lie close to the selection threshold
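One reading of this design, assumed here for illustration, is a chain in which pathway l shares its last 10 SNPs with pathway l + 1. Under that assumption the layout below gives 30 + 49 × 20 = 1,010 distinct SNPs, and 1,500 columns after SNP duplication. `overlapping_pathways` is a hypothetical helper.

```python
def overlapping_pathways(n_pathways=50, size=30, overlap=10):
    """Chain of pathways in which each pathway shares `overlap` SNPs with the next."""
    stride = size - overlap
    return [list(range(l * stride, l * stride + size)) for l in range(n_pathways)]


paths = overlapping_pathways()
n_snps = len(set().union(*paths))       # distinct SNPs: 1010
n_columns = sum(len(p) for p in paths)  # columns after duplication: 1500
shared = set(paths[0]) & set(paths[1])  # SNPs 20..29 are in pathways 0 and 1
```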
SGL simulation study 2
• Pathway and SNP selection power and false positive rates (FPR) across MC simulations
• SGL-CGD consistently outperforms SGL-BCGD, both in pathway selection sensitivity and in control of false positives
• SGL-BCGD typically has a higher FPR than SGL-CGD, since more SNPs are selected from non-causal pathways
• SGL-CGD is more often able to select both causal pathways, and to select additional causal SNPs that are missed by SGL-BCGD
Figure 6. SGL-CGD vs SGL-BCGD performance
Pathway and SNP selection bias
• Biasing factors: pathway size, varying patterns of SNP-SNP correlation, and gene size
• An adaptive weight-tuning strategy is used to reduce selection bias
• The pathway weight vector w is tuned so that each pathway has an equal chance of being selected
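The exact tuning rule is not given on this slide, so the following is a hypothetical multiplicative update that only illustrates the idea: pathways selected more often than average under a null (no-association) phenotype receive a larger weight, and hence a stronger penalty, until null selection frequencies are roughly equal. `null_freq` stands for a user-supplied function (for example, refitting SGL on permuted phenotypes) and is an assumption, as are `eta` and the stopping rule.

```python
import numpy as np


def tune_pathway_weights(null_freq, w0, n_iter=50, eta=0.5, tol=1e-3):
    """Hypothetical weight-tuning loop.

    null_freq : callable mapping a weight vector w to an array of null pathway
                selection frequencies (assumed to be supplied by the user)
    w0        : initial pathway weights
    """
    w = np.asarray(w0, dtype=float).copy()
    for _ in range(n_iter):
        f = np.asarray(null_freq(w), dtype=float)
        if f.std() / f.mean() < tol:  # frequencies are (nearly) equal: stop
            break
        # Over-selected pathways get a larger weight (stronger penalty), and vice versa
        w *= (np.maximum(f, 1e-12) / f.mean()) ** eta
    return w


# Toy null model: selection frequency grows with a pathway-specific bias (e.g. size)
bias = np.array([1.0, 2.0, 4.0])
toy_null_freq = lambda w: 0.1 * bias / w
w = tune_pathway_weights(toy_null_freq, np.ones(3))
```

In the toy model the weights end up proportional to the bias, which equalises the null frequencies; with a real refitting procedure the frequencies are noisy, so a tolerance and an iteration cap matter.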
Ranking variables
• A resampling strategy
• Pathway, gene and SNP selection frequencies are calculated by repeatedly fitting the model over B subsamples of the data, at fixed values of α and λ
• This exploits the finite-sample variability revealed by subsampling to obtain better estimates of each variable's importance
• Pathways, genes and SNPs can then be ranked by their strength of association with the phenotype, i.e. in order of their selection probabilities
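The resampling procedure can be sketched as below. The choices here are assumptions for illustration: subsamples of half the individuals drawn without replacement, and a user-supplied `fit_select` callable that fits the model at fixed α and λ and returns the indices of the selected variables. Both `selection_frequencies` and `fit_select` are hypothetical names.

```python
import numpy as np


def selection_frequencies(X, y, fit_select, n_vars, B=100, frac=0.5, seed=0):
    """Selection frequency of each variable over B random subsamples, plus a ranking."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    m = int(frac * n)
    counts = np.zeros(n_vars)
    for _ in range(B):
        idx = rng.choice(n, size=m, replace=False)  # subsample without replacement
        sel = np.unique(np.asarray(fit_select(X[idx], y[idx]), dtype=int))
        counts[sel] += 1
    freq = counts / B
    ranking = np.argsort(-freq, kind="stable")  # most frequently selected first
    return freq, ranking


# Toy usage: a stand-in selector picking the column with the largest |X^T (y - mean)|
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 5))
y = 2.0 * X[:, 3] + rng.standard_normal(100)
top_one = lambda Xs, ys: [int(np.argmax(np.abs(Xs.T @ (ys - ys.mean()))))]
freq, ranking = selection_frequencies(X, y, top_one, n_vars=5, B=50)
```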
Simulation study 3
• Evaluate ranking strategies
• Use real genotype and pathway data
• genome-wide SNP dataset ‘SP2’
• KEGG pathways database
• SNP ranking
• TP: selected SNPs that tag at least one causal
SNP
• FP: selected SNPs which do not tag any causal
SNP
• Gene ranking
• TP: selected causal genes (genes that map to a true causal SNP)
• FP: selected non-causal genes
• Both rankings are compared with SNP and gene rankings from a univariate, regression-based quantitative trait test (QTT)
K: the number of causal SNPs
GV, TV: proportion of trait variance
Simulation study 3
TPR: The proportion of subsamples in
which the correct causal pathway is
selected
Figure 7. A–F: SNP and gene ranking performance for the six different scenarios
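Counting true and false positives among the top-k ranked SNPs could look like the sketch below. The rule that a selected SNP "tags" a causal SNP when their squared genotype correlation r² reaches a threshold, and the default threshold of 0.8, are assumptions for illustration; `tp_fp_at_k` is a hypothetical name.

```python
import numpy as np


def tp_fp_at_k(ranking, k, X, causal, r2_threshold=0.8):
    """TP = top-k SNPs tagging at least one causal SNP; FP = top-k SNPs tagging none."""
    top = np.asarray(ranking[:k], dtype=int)
    # Correlations between each top-k SNP (rows) and each causal SNP (columns)
    r = np.corrcoef(X[:, top].T, X[:, causal].T)[: len(top), len(top):]
    tags = (r ** 2 >= r2_threshold).any(axis=1)
    return int(tags.sum()), int((~tags).sum())


a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
b = np.array([2.0, 1.0, 2.0, 1.0, 2.0])  # uncorrelated with a
X = np.column_stack([a, a, b])           # SNP 1 is a perfect proxy for causal SNP 0
tp, fp = tp_fp_at_k([1, 2], 2, X, [0])   # SNP 1 tags SNP 0, SNP 2 does not
```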
Pathway mapping
• Genes are mapped to pathways using information on
gene-gene interactions.
• Many SNPs and genes do not map to any known
pathway.
• Genes and SNPs may map to more than one pathway.
• Many SNPs cannot be mapped to a pathway since they do not map to a gene that is itself mapped to a pathway.
SNP, gene and pathway mapping counts:
• Available SNPs: 492,639 (SP2); 515,503 (SiMES)
• Genes (hg18, NCBI Build 36): 21,004 genes
• SNP to gene mapping (SNPs within 10 kbp of a gene): 239,757 SNPs mapped to 18,845 genes (SP2); 251,089 SNPs mapped to 18,919 genes (SiMES)
• Pathways (KEGG): 185 pathways containing 5,267 distinct genes
• SNP to pathway mapping: 75,389 SNPs mapped to 4,734 genes and 185 pathways (SP2); 78,933 SNPs mapped to 4,751 genes and 185 pathways (SiMES)
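The two mapping steps can be illustrated with a naive sketch: a SNP maps to a gene if it lies within 10 kbp of the gene's start or end coordinate on the same chromosome, and to every pathway containing one of its genes. The input dictionaries and the function name are hypothetical, and a genome-wide mapping would use a sorted or interval-tree lookup rather than this quadratic scan.

```python
def map_snps_to_pathways(snps, genes, pathways, window=10_000):
    """snps: {snp: (chrom, pos)}; genes: {gene: (chrom, start, end)};
    pathways: {pathway: set of genes}. Returns (snp -> genes, snp -> pathways)."""
    snp_genes = {}
    for snp, (chrom, pos) in snps.items():
        hits = {g for g, (g_chrom, start, end) in genes.items()
                if g_chrom == chrom and start - window <= pos <= end + window}
        if hits:
            snp_genes[snp] = hits
    snp_pathways = {}
    for snp, gs in snp_genes.items():
        ps = {p for p, members in pathways.items() if gs & members}
        if ps:  # SNPs whose genes are in no pathway are dropped here
            snp_pathways[snp] = ps
    return snp_genes, snp_pathways


snps = {"rs1": ("1", 5_000), "rs2": ("1", 50_000), "rs3": ("2", 100)}
genes = {"A": ("1", 12_000, 20_000), "B": ("2", 200, 300)}
pathways = {"P1": {"A"}}
snp_genes, snp_pathways = map_snps_to_pathways(snps, genes, pathways)
# rs2 maps to no gene; rs3 maps to gene B, which is in no pathway
```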
Results
• Pathways-driven SNP selection using SGL, applied to the SP2 and SiMES datasets separately
• Combined with the subsampling procedure to highlight pathways and genes associated with variation in HDL cholesterol
• Results from the two datasets are compared
• The resulting pathway and SNP selection frequency distributions are compared with null distributions
• Selecting a greater number of SNPs increases the number of pathways selected
• The number of SNPs selected may therefore affect the resulting pathway and SNP rankings
• What is the optimal α?
Table 5. Separate combinations of regularisation parameters, λ and α, used for analysis of the SP2 dataset.
Pathway and SNP selection results
Figure 11. Empirical and null pathway selection
frequency distributions for all 185 KEGG pathways
with the SP2 dataset
Figure 12. Empirical and null SNP selection
frequency distributions with the SP2 dataset
Figure 14. Empirical and null pathway (top) and
SNP (bottom) selection frequency distributions for
the SiMES dataset
[Figure annotations: α = 0.85 gives biased empirical pathway and SNP selection frequency distributions; α = 0.95 gives clearer separation of empirical and null distributions (SP2); SiMES: α = 0.95]
Pathway and SNP selection results
Figure 13. SP2 dataset: scatter plots comparing empirical and null
selection frequencies presented in Figures 11 and 12
Figure 15. SiMES dataset: Scatter plots comparing empirical and null pathway (left)
and SNP (right) selection frequencies presented in Figure 14
• At the lower α, increased correlation between the empirical and null selection frequency distributions indicates greater bias in the empirical results
• Selecting too many SNPs adds noise and bias
Table 6. SP2 dataset: Pearson correlation coefficients (r) and p-
values for the data plotted in Figure 13
Table 9. SiMES dataset: Pearson correlation coefficients (r) and
p-values for the data plotted in Figure 15.
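Correlation between empirical and null selection frequencies, as reported in these tables, can be computed with NumPy. To stay dependency-free, this sketch uses a two-sided permutation p-value instead of the usual parametric p-value, which is a substitution rather than the method behind the tables; `pearson_perm` is a hypothetical name.

```python
import numpy as np


def pearson_perm(emp, null, n_perm=2000, seed=0):
    """Pearson r between two frequency vectors, with a two-sided permutation p-value."""
    emp = np.asarray(emp, dtype=float)
    null = np.asarray(null, dtype=float)
    r = np.corrcoef(emp, null)[0, 1]
    rng = np.random.default_rng(seed)
    perm_r = np.array([np.corrcoef(rng.permutation(emp), null)[0, 1]
                       for _ in range(n_perm)])
    # Add-one correction keeps the p-value strictly positive
    p = (1 + np.sum(np.abs(perm_r) >= abs(r))) / (n_perm + 1)
    return r, p


null_freq = np.arange(20, dtype=float) / 20
emp_freq = 2 * null_freq + 0.1  # perfectly (and suspiciously) tracks the null
r, p = pearson_perm(emp_freq, null_freq)
```

A high r here is the warning sign described above: empirical frequencies that mirror the null ones reflect selection bias rather than association.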
Pathway and SNP selection results
Top 30 pathways and genes
Table 7. SP2 dataset: Top 30 pathways, ranked by pathway selection frequency, $\pi_{path}$.
Table 8. SP2 and SiMES datasets: Top 30 genes ranked by gene selection frequency, $\pi_{gene}$.
Top 30 pathways
Table 10. SiMES dataset: Top 30 pathways, ranked by pathway selection frequency, $\pi_{path}$.
Comparison of ranked pathway and gene lists
• Pathway rankings
Figure 16. Comparison of top-k SP2 and SiMES pathway rankings
Normalized Canberra distance (left), FDR q-values (right)
Table 11. Consensus set of important pathways, $\Psi_{25}^{path}$, for SP2 and SiMES datasets with k = 25.
• Closest agreement between the SP2 and SiMES pathway rankings when k = 25
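A sketch of a top-k Canberra distance between two ranked lists: items absent from a list's top k are assigned rank k + 1, and the distance sums |a − b| / (a + b) over items in either top-k list. Normalising by a Monte Carlo estimate of the distance between random rankings of the same item universe is an assumption made here for illustration; under it, values near 0 indicate close agreement and values near 1 indicate agreement no better than chance. Function names are hypothetical.

```python
import numpy as np


def topk_canberra(list1, list2, k):
    """Canberra distance between the top-k parts of two rankings (missing items get rank k+1)."""
    r1 = {item: i + 1 for i, item in enumerate(list1[:k])}
    r2 = {item: i + 1 for i, item in enumerate(list2[:k])}
    d = 0.0
    for item in set(r1) | set(r2):  # items outside both top-k lists contribute 0
        a, b = r1.get(item, k + 1), r2.get(item, k + 1)
        d += abs(a - b) / (a + b)
    return d


def normalized_topk_canberra(list1, list2, k, universe, n_mc=1000, seed=0):
    """Divide by a Monte Carlo estimate of the distance between random rankings (assumption)."""
    rng = np.random.default_rng(seed)
    items = list(universe)
    random_d = [
        topk_canberra(list(rng.permutation(items)), list(rng.permutation(items)), k)
        for _ in range(n_mc)
    ]
    return topk_canberra(list1, list2, k) / float(np.mean(random_d))


sp2 = ["P1", "P2", "P3", "P4", "P5"]
simes = ["P2", "P1", "P3", "P6", "P7"]
universe = [f"P{i}" for i in range(1, 21)]
score = normalized_topk_canberra(sp2, simes, k=3, universe=universe)
```

Scanning k = 1, 2, … and picking the k with the smallest normalised distance is one way to locate the point of closest agreement between two rankings.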
Comparison of ranked pathway and gene lists
• Gene rankings
Figure 17. Comparison of top-k SP2 and SiMES gene rankings, for k = 1,…,500.
Normalized Canberra distance (left), FDR q-values (right)
Table 13. Top 30 consensus genes, $\Psi_{244}^{gene}$, ordered by their average rank
• Closest agreement between the SP2 and SiMES gene rankings when k = 244
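A consensus set of this kind can be sketched as the items appearing in both top-k lists, ordered by their average rank across the two lists. That definition is an assumption for illustration, and `consensus_topk` is a hypothetical name.

```python
def consensus_topk(list1, list2, k):
    """Items in both top-k lists, ordered by average rank (ties broken by name)."""
    r1 = {item: i + 1 for i, item in enumerate(list1[:k])}
    r2 = {item: i + 1 for i, item in enumerate(list2[:k])}
    common = set(r1) & set(r2)
    return sorted(common, key=lambda g: ((r1[g] + r2[g]) / 2, g))


sp2_genes = ["A", "B", "C", "D"]
simes_genes = ["C", "A", "E", "B"]
top3 = consensus_topk(sp2_genes, simes_genes, 3)  # A (avg 1.5), C (avg 2.0)
top4 = consensus_topk(sp2_genes, simes_genes, 4)  # A, C, then B (avg 3.0)
```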
Discussion
• A method for detecting pathways and genes associated with a quantitative trait
• It uses a sparse regression model, the sparse group lasso, which enforces sparsity at both the pathway and SNP levels
• It aims to identify important pathways while also maximizing the power to detect causal SNPs
• Simulation studies
• SGL has greater SNP selection power than the lasso
• A modified estimation algorithm, SGL-CGD, which treats pathways as independent, may offer greater sensitivity for the detection of causal SNPs and pathways
• This is combined with a weight-tuning algorithm to reduce selection bias
• A resampling technique provides a robust measure of variable importance
Thank you
Q & A

 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Seán Kennedy
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Cathrine Wilhelmsen
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max PrincetonTimothy Spann
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...Amil Baba Dawood bangali
 
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksdeepakthakur548787
 
Decoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectDecoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectBoston Institute of Analytics
 
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...Jack Cole
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024Susanna-Assunta Sansone
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxMike Bennett
 
Cyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataCyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataTecnoIncentive
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Thomas Poetter
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBoston Institute of Analytics
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Boston Institute of Analytics
 
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelDecoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelBoston Institute of Analytics
 
Networking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxNetworking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxHimangsuNath
 

Kürzlich hochgeladen (20)

SMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxSMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptx
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our World
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max Princeton
 
Insurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis ProjectInsurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis Project
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
 
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing works
 
Decoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectDecoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis Project
 
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptx
 
Cyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataCyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded data
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
 
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelDecoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
 
Networking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxNetworking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptx
 

Pathways-Driven Sparse Regression Identifies Pathways and Genes Associated with High-Density Lipoprotein Cholesterol in Two Asian Cohorts

  • 1. Pathways-Driven Sparse Regression Identifies Pathways and Genes Associated with High-Density Lipoprotein Cholesterol in Two Asian Cohorts Silver M, Chen P, Li R, Cheng C-Y, Wong T-Y, et al. In PLOS Genetics, 2013
  • 2. Introduction • Genes do not act in isolation but interact in complex networks or pathways • Instead of a univariate approach, a joint modelling approach is proposed: a dual-level, sparse regression model • The model can simultaneously select pathways and the genes within them • Pathways-driven gene selection is used to search for pathways and genes associated with variation in high-density lipoprotein cholesterol
  • 3. Sparse group lasso (SGL) model • N individuals, P SNPs, an N × P genotype matrix X, and L pathways • Assumptions: all P SNPs can be mapped to L groups (pathways), and the pathways are disjoint (non-overlapping) • The model combines a pathway-level constraint, which selects causal pathways, with a SNP-level constraint, which selects causal SNPs within them • α controls how the sparsity constraint is distributed between the two penalties • λ controls the overall degree of sparsity in β
  • 4. SGL model estimation • $\hat{\beta}^{\mathrm{SGL}}$ is estimated with a block (group-wise) coordinate gradient descent (BCGD) algorithm • Cycle over pathways l; within each selected pathway l, cycle over SNPs j • Pathway and SNP partial residuals: regress out the current estimated effects of all other pathways and SNPs
  • 5. SGL simulation study 1 • Hypothesis: when causal SNPs are enriched in a given pathway, pathway-driven SNP selection using SGL will outperform simple lasso selection • Five causal SNPs are randomly selected from a single pathway; SGL uses the pathway mapping, while the lasso searches all 2,500 SNPs without pathway information
  • 6. The problem of overlapping pathways • Genes and SNPs may map to multiple pathways • The optimization is then no longer separable into groups (pathways), so pathways cannot be selected independently • Duplicating SNP predictors lets a SNP that belongs to more than one pathway enter the model separately for each pathway • SNPs are then selected in every pathway whose joint effects pass the pathway selection threshold, irrespective of overlaps between pathways • Pathways are treated as independent: they do not compete in the model estimation process • Figure: partially overlapping causal SNPs
  • 7. The problem of overlapping pathways • Each pathway is regressed separately against the phenotype vector y • Coordinate gradient descent is run only within selected pathways (SGL-CGD) • Under the independence assumption, the estimate of each $\beta_l^*$ does not depend on the other estimates $\beta_k^*$ (k ≠ l) • Only the set of selected SNPs in each selected pathway needs to be recorded
  • 8. SGL simulation study 2 • Figure 5. SGL simulation study with overlapping pathways • Table 1. Mean number of pathways and SNPs selected by each model at each effect size, γ, across 2000 MC simulations • SNPs are mapped to 50 overlapping pathways, each containing 30 SNPs • Each pathway overlaps any adjacent pathway by 10 SNPs • The number of selected pathways and SNPs increases as the effect size decreases, since more pathways lie close to the selection threshold
  • 9. SGL simulation study 2 • Pathway and SNP selection power and false positive rates (FPR) at each MC simulation, z • SGL-CGD consistently outperforms SGL-BCGD, both in pathway selection sensitivity and in control of false positives • SGL-BCGD typically has a higher FPR than SGL-CGD, since it selects more SNPs from non-causal pathways • SGL-CGD more often selects both causal pathways, and selects additional causal SNPs that SGL-BCGD misses • Figure 6. SGL-CGD vs SGL-BCGD performance
  • 10. Pathway and SNP selection bias • Biasing factors: pathway size, varying patterns of SNP-SNP correlation, and gene size • An adaptive weight-tuning strategy reduces selection bias by tuning the pathway weight vector w so that each pathway has an equal chance of being selected
  • 11. Ranking variables • A resampling strategy: pathway, gene and SNP selection frequencies are calculated by repeatedly fitting the model over B subsamples of the data at fixed values of α and λ • Knowledge of finite-sample variability obtained by subsampling yields better estimates of a variable's importance • Pathways, genes and SNPs are ranked in order of their selection probabilities, which reflect their strength of association with the phenotype
  • 12. Simulation study 3 • Goal: evaluate the ranking strategies using real genotype and pathway data, namely the genome-wide SNP dataset 'SP2' and the KEGG pathways database • SNP ranking: TP = selected SNPs that tag at least one causal SNP; FP = selected SNPs that do not tag any causal SNP • Gene ranking: TP = selected causal genes (genes that map to a true causal SNP); FP = selected non-causal genes • Rankings are compared with SNP and gene rankings from a univariate, regression-based quantitative trait test (QTT) • Notation: K = the number of causal SNPs; GV, TV = proportion of trait variance
  • 13. Simulation study 3 • TPR: the proportion of subsamples in which the correct causal pathway is selected • Figure 7. A–F: SNP and gene ranking performance for the six different scenarios
  • 14. Pathway mapping • Genes are mapped to pathways using information on gene-gene interactions • Many SNPs and genes do not map to any known pathway • Genes and SNPs may map to more than one pathway • Many SNPs cannot be mapped to a pathway because they do not map to a gene that is itself mapped to a pathway • Available SNPs: 492,639 (SP2) and 515,503 (SiMES) • SNP-to-gene mapping (genes: GRCH36/hg18, 21,004 genes; SNPs within 10 kbp of a gene): 239,757 SNPs (SP2) mapped to 18,845 genes, and 251,089 SNPs (SiMES) mapped to 18,919 genes • SNP-to-pathway mapping (pathways: KEGG, 185 pathways containing 5,267 distinct genes): 75,389 SNPs (SP2) mapped to 4,734 genes, and 78,933 SNPs (SiMES) mapped to 4,751 genes, across 185 pathways
  • 15. Results • SGL is used for pathways-driven SNP selection on the SP2 and SiMES datasets separately • This is combined with the subsampling procedure to highlight pathways and genes associated with variation in high-density lipoprotein cholesterol • Results from the two datasets are then compared
  • 16. Pathway and SNP selection results • The resulting pathway and SNP selection frequency distributions are compared with null distributions • Selecting a greater number of SNPs increases the number of selected pathways • The number of selected SNPs may therefore affect the resulting pathway and SNP rankings • Which value of α, balancing the pathway-level and SNP-level constraints, is optimal? • Table 5. Separate combinations of regularisation parameters, λ and α, used for analysis of the SP2 dataset
  • 17. Pathway and SNP selection results • Figure 11. Empirical and null pathway selection frequency distributions for all 185 KEGG pathways with the SP2 dataset • Figure 12. Empirical and null SNP selection frequency distributions with the SP2 dataset • Figure 14. Empirical and null pathway (top) and SNP (bottom) selection frequency distributions for the SiMES dataset • α = 0.85: biased empirical pathway and SNP selection frequency distributions • α = 0.95: clearer separation of empirical and null distributions
  • 18. Pathway and SNP selection results • Figure 13. SP2 dataset: scatter plots comparing the empirical and null selection frequencies presented in Figures 11 and 12 • Figure 15. SiMES dataset: scatter plots comparing the empirical and null pathway (left) and SNP (right) selection frequencies presented in Figure 14
  • 19. Pathway and SNP selection results • At the lower α, the increased correlation between empirical and null selection frequency distributions reflects increased bias in the empirical results • Selecting too many SNPs adds noise and bias • Table 6. SP2 dataset: Pearson correlation coefficients (r) and p-values for the data plotted in Figure 13 • Table 9. SiMES dataset: Pearson correlation coefficients (r) and p-values for the data plotted in Figure 15
  • 20. Top 30 pathways and genes • Table 7. SP2 dataset: top 30 pathways, ranked by pathway selection frequency, $\pi^{\mathrm{path}}$ • Table 8. SP2 and SiMES datasets: top 30 genes, ranked by gene selection frequency, $\pi^{\mathrm{gene}}$
  • 21. Top 30 pathways • Table 10. SiMES dataset: top 30 pathways, ranked by pathway selection frequency, $\pi^{\mathrm{path}}$
  • 22. Comparison of ranked pathway and gene lists • Pathway rankings • Figure 16. Comparison of top-k SP2 and SiMES pathway rankings: normalized Canberra distance (left), FDR q-values (right) • The pathway rankings agree most closely at k = 25 • Table 11. Consensus set of important pathways, $\Psi_{25}^{\mathrm{path}}$, for the SP2 and SiMES datasets with k = 25
  • 23. Comparison of ranked pathway and gene lists • Gene rankings • Figure 17. Comparison of top-k SP2 and SiMES gene rankings for k = 1, …, 500: normalized Canberra distance (left), FDR q-values (right) • The gene rankings agree most closely at k = 244 • Table 13. Top 30 consensus genes in $\Psi_{244}^{\mathrm{gene}}$, ordered by their average rank
  • 24. Discussion • A method for detecting pathways and genes associated with a quantitative trait • It uses a sparse regression model, the sparse group lasso, that enforces sparsity at both the pathway and the SNP level • It aims to identify important pathways while maximizing the power to detect causal SNPs • Simulation studies: SGL has greater SNP selection power than the lasso • A modified estimation algorithm, SGL-CGD, which treats pathways as independent, may offer greater sensitivity for detecting causal SNPs and pathways • The method is combined with a weight-tuning algorithm to reduce selection bias • A resampling technique provides a robust measure of variable importance
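Slide 3 names the two penalties of the SGL objective but the equation itself did not survive extraction. A hedged reconstruction in the standard sparse group lasso form is shown below, where $X_l$ holds the SNPs in pathway $l$, $\beta_l$ their coefficients, and $w_l$ the pathway weights of slide 10; the $\tfrac{1}{2}$ scaling of the loss term is an assumption and may differ from the paper.

```latex
\hat{\beta}^{\mathrm{SGL}} = \arg\min_{\beta} \left\{
  \frac{1}{2} \Big\lVert y - \sum_{l=1}^{L} X_l \beta_l \Big\rVert_2^2
  + (1 - \alpha)\,\lambda \sum_{l=1}^{L} w_l \,\lVert \beta_l \rVert_2
  + \alpha \lambda \,\lVert \beta \rVert_1
\right\}
```

The middle term is the pathway-level constraint and the last term the SNP-level constraint: α = 0 leaves only the pathway-level penalty (group lasso) and α = 1 only the SNP-level penalty (lasso), as note 2 describes.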
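The BCGD loop on slide 4 can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes disjoint pathways, the ½-scaled squared-error loss, and pathway weights $w_l = \sqrt{P_l}$ (a placeholder for the tuned weights of slide 10); `sgl_bcgd` and `update_snp` are hypothetical names. The pathway partial residual decides whether a whole pathway is zero; otherwise each SNP is updated against its SNP partial residual by solving a one-dimensional convex problem with bisection.

```python
import numpy as np


def soft_threshold(z, t):
    """Elementwise soft-thresholding: sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def update_snp(c, xx, a, b, s, iters=100):
    """Minimize 0.5*xx*t**2 - c*t + a*sqrt(t**2 + s**2) + b*|t| over t.

    c = x_j' r_j (SNP partial residual), xx = x_j' x_j,
    a = pathway-level penalty, b = SNP-level penalty,
    s = L2 norm of the other coefficients in the same pathway.
    """
    # Right-derivative at t = 0: the group term only adds a when s == 0.
    if b + (a if s == 0 else 0.0) >= abs(c):
        return 0.0
    lo, hi = 0.0, abs(c) / xx  # the minimizer's magnitude lies in [0, |c|/xx]
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        # Derivative of the (convex) objective at |t| = mid.
        if xx * mid - abs(c) + a * mid / np.hypot(mid, s) + b > 0:
            hi = mid
        else:
            lo = mid
    return float(np.sign(c)) * 0.5 * (lo + hi)


def sgl_bcgd(X, y, pathways, lam, alpha, weights=None, max_iter=200, tol=1e-6):
    """Block coordinate descent for SGL with disjoint pathways (sketch).

    pathways: list of integer index arrays, one per pathway (no overlaps).
    weights: pathway weights w_l; sqrt(pathway size) by default (assumption).
    """
    beta = np.zeros(X.shape[1])
    if weights is None:
        weights = [np.sqrt(len(idx)) for idx in pathways]
    b = alpha * lam  # SNP-level (lasso) penalty
    r = np.asarray(y, dtype=float).copy()  # full residual y - X @ beta
    for _ in range(max_iter):
        max_change = 0.0
        for idx, w in zip(pathways, weights):
            Xl = X[:, idx]
            a = (1 - alpha) * lam * w  # pathway-level (group) penalty
            # Pathway partial residual: remove the effects of all other pathways.
            r_l = r + Xl @ beta[idx]
            # Pathway-level check: beta_l = 0 is optimal exactly when this holds.
            if np.linalg.norm(soft_threshold(Xl.T @ r_l, b)) <= a:
                new = np.zeros(len(idx))
            else:
                new = beta[idx].copy()
                for j in range(len(idx)):
                    xj = Xl[:, j]
                    # SNP partial residual: also remove the other SNPs in pathway l.
                    r_j = r_l - Xl @ new + xj * new[j]
                    s = np.linalg.norm(np.delete(new, j))
                    new[j] = update_snp(xj @ r_j, xj @ xj, a, b, s)
            max_change = max(max_change, float(np.max(np.abs(new - beta[idx]))))
            beta[idx] = new
            r = r_l - Xl @ new
        if max_change < tol:
            break
    return beta
```

Purely coordinate-wise updates on the non-separable $\ell_2$ term can occasionally stall; general-purpose SGL solvers typically take a proximal-gradient step within each group instead.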
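Slides 6 to 8 replace joint estimation with SGL-CGD, in which every pathway is fit against y on its own. The sketch below assumes the simulation-study-2 layout arranged as a simple chain (50 pathways of 30 SNPs, adjacent pathways sharing 10 SNPs) and pathway weights $\sqrt{P_l}$. As a swapped-in technique, the within-pathway fit uses proximal gradient steps rather than coordinate-wise updates. Slicing `X[:, idx]` separately for each pathway plays the role of duplicating shared SNP predictors. All function names are hypothetical.

```python
import numpy as np


def chain_pathways(n_pathways=50, size=30, overlap=10):
    """Hypothetical layout: pathway l shares `overlap` SNPs with pathway l+1."""
    stride = size - overlap
    return [np.arange(l * stride, l * stride + size) for l in range(n_pathways)]


def sgl_prox(z, step, lam, alpha, w):
    """Proximal operator of step*(alpha*lam*||.||_1 + (1-alpha)*lam*w*||.||_2)."""
    u = np.sign(z) * np.maximum(np.abs(z) - step * alpha * lam, 0.0)
    norm = np.linalg.norm(u)
    if norm == 0.0:
        return u
    return max(0.0, 1.0 - step * (1 - alpha) * lam * w / norm) * u


def sgl_cgd(X, y, pathways, lam, alpha, n_iter=500):
    """Fit each pathway independently against y; return {pathway: selected SNPs}."""
    selected = {}
    for l, idx in enumerate(pathways):
        Xl = X[:, idx]  # this pathway's own copy of any shared SNP columns
        w = np.sqrt(len(idx))  # assumed pathway weight
        # Pathway selection: with independent pathways the residual is just y.
        z = Xl.T @ y
        u = np.sign(z) * np.maximum(np.abs(z) - alpha * lam, 0.0)
        if np.linalg.norm(u) <= (1 - alpha) * lam * w:
            continue  # pathway l is not selected
        step = 1.0 / np.linalg.norm(Xl, 2) ** 2  # 1 / Lipschitz constant of the loss gradient
        b = np.zeros(len(idx))
        for _ in range(n_iter):
            b = sgl_prox(b - step * (Xl.T @ (Xl @ b - y)), step, lam, alpha, w)
        selected[l] = idx[b != 0]  # record only the selected SNPs
    return selected
```

With causal SNPs placed in the overlap of two pathways, both pathways can be selected, because neither screens out the other's effect.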
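The resampling strategy of slide 11 reduces to counting how often each variable is selected across B subsamples. In the sketch below, `fit_fn` stands in for an SGL fit at fixed α and λ. Half-sized subsamples drawn without replacement are an assumption. `top_k_screen` is a hypothetical placeholder used only so the example runs.

```python
import numpy as np


def selection_frequencies(X, y, fit_fn, n_vars, B=100, frac=0.5, seed=0):
    """Selection frequency of each variable over B random subsamples.

    fit_fn(X_sub, y_sub) must return the indices of the selected variables
    (SNPs, genes or pathways) for one fit at fixed alpha and lambda.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    m = int(frac * n)  # subsample size (half the data is an assumption)
    counts = np.zeros(n_vars)
    for _ in range(B):
        rows = rng.choice(n, size=m, replace=False)  # sample without replacement
        chosen = np.unique(np.fromiter(fit_fn(X[rows], y[rows]), dtype=int))
        counts[chosen] += 1
    freq = counts / B
    ranking = np.argsort(-freq, kind="stable")  # most frequently selected first
    return freq, ranking


def top_k_screen(Xs, ys, k=3):
    """Hypothetical stand-in for an SGL fit: keep the k largest |x_j' y|."""
    scores = np.abs(Xs.T @ (ys - ys.mean()))
    return np.argsort(-scores)[:k]
```

Ranking by `freq` mirrors the slide's ordering of pathways, genes and SNPs by selection probability.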
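The mapping on slide 14 assigns a SNP to every gene within 10 kbp and then to every pathway containing one of those genes. The sketch below uses made-up coordinates and hypothetical gene and pathway names. It shows both outcomes the slide mentions: a SNP mapping to several genes and pathways, and a SNP that maps to a gene but not to any pathway. The nested scan is for clarity only; genome-scale data would need a sorted interval search.

```python
from collections import defaultdict

WINDOW_BP = 10_000  # SNPs within 10 kbp of a gene are mapped to it


def map_snps_to_genes(snps, genes, window=WINDOW_BP):
    """snps: {snp: (chrom, pos)}; genes: {gene: (chrom, start, end)}."""
    snp_genes = defaultdict(set)
    for snp, (chrom, pos) in snps.items():
        for gene, (g_chrom, start, end) in genes.items():
            if chrom == g_chrom and start - window <= pos <= end + window:
                snp_genes[snp].add(gene)  # a SNP may map to several genes
    return dict(snp_genes)


def map_snps_to_pathways(snp_genes, gene_pathways):
    """Keep only SNPs whose genes belong to at least one pathway."""
    snp_pathways = {}
    for snp, gene_set in snp_genes.items():
        paths = set().union(*(gene_pathways.get(g, set()) for g in gene_set))
        if paths:
            snp_pathways[snp] = paths
    return snp_pathways


# Hypothetical toy data (not from the SP2 or SiMES datasets).
genes = {"GENE_A": ("1", 100_000, 120_000),
         "GENE_B": ("1", 125_000, 140_000),
         "GENE_C": ("2", 50_000, 60_000)}
gene_pathways = {"GENE_A": {"path1"}, "GENE_B": {"path1", "path2"}}
snps = {"rs1": ("1", 122_000), "rs2": ("1", 200_000), "rs3": ("2", 55_000)}
```

Here `rs1` maps to two genes and two pathways, `rs2` maps to no gene, and `rs3` maps to `GENE_C`, which lies in no pathway.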

Speaker notes

  1. Pathways analysis methods aim to identify aspects of a disease's or trait's genetic architecture that might be missed by more conventional approaches. Most existing pathways methods take a univariate approach: pathway significance is assessed first, and important genetic variants within significant pathways are analyzed afterwards.
  2. Sparsity patterns enforced by the group lasso and the sparse group lasso. The number of pathways and SNPs selected by the model increases as λ is reduced. As α → 0, sparsity is imposed only at the pathway level (group lasso); as α → 1, the model becomes the lasso and pathway information is ignored.
  3. SGL outperforms the lasso above an effect size threshold of 0.04. The lasso shows a smooth distribution in power, with mean power increasing with effect size; with SGL the distribution is almost bimodal, with power typically either 0 or 1, depending on whether or not the correct causal pathway is selected. When pathways are important, pathway-driven SNP selection has a clear advantage for detecting causal SNPs. Setup: 50 pathways of 50 SNPs each (2,500 SNPs in total), 400 individuals, 5 causal SNPs.
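  4. With BCGD, once $\beta_k$ has been estimated, the estimated effect of the overlapping causal SNPs is removed from the regression for $\beta_l$, because the partial residual for pathway l subtracts the fitted effects of pathway k.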
  5. Counts show the number of simulations, out of all 2,000 MC simulations, in which one method outperforms the other. The additional causal SNPs found by SGL-CGD are harder to detect with SGL-BCGD, because the effects of overlapping SNPs are screened out during BCGD estimation.
  6. Different scenarios, in which the number of causal SNPs and the SNP effect sizes are varied. 400 MC simulations are run for each scenario.
  7. Since only one pathway is selected in each subsample, the true positive rate is the mean proportion of subsamples in which a causal pathway is selected, taken across all MC simulations. SGL maintains pathway selection power in both scenarios, and it also maintains superior gene ranking performance compared with QTT, with relatively high power and good control of false positives. In combination with gene ranking via the proposed subsampling approach, SGL demonstrates good power and specificity over a range of scenarios using real genotype and pathway data.
  8. Lower α → reduced penalty on the SNP coefficient vector → many SNPs selected → increased group penalty → more pathways selected.
  9. At the lower value, α = 0.85, the empirical pathway and SNP selection frequency distributions appear to be biased: pathways and SNPs with the highest empirical selection frequencies also tend to be selected more often under the null, where there is no association between genotype and phenotype.
  10. Certain pathways and SNPs tend to be selected with a higher frequency, irrespective of whether a true signal is present. At the higher α, correlations between empirical and null selection frequency distributions are reduced but still significant.
  11. Only the very top-ranked variables are likely to reflect any true signal, so more emphasis is placed on differences in the ranks of highly ranked variables in either dataset. Ca* = 0 corresponds to exact agreement between the lists.
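Note 3 specifies the layout of simulation study 1. The sketch below generates one Monte Carlo replicate under stated assumptions that are not taken from the paper: independent SNPs with minor allele frequencies drawn uniformly from [0.05, 0.5], standardized genotypes, an effect size γ applied to each causal SNP, and unit-variance Gaussian noise. The helper `selection_metrics` computes SNP selection power and FPR in the sense of slide 9. Both names are hypothetical.

```python
import numpy as np


def simulate_study1(n=400, n_pathways=50, pathway_size=50, n_causal=5,
                    gamma=0.1, seed=0):
    """One hypothetical replicate: 5 causal SNPs drawn from one random pathway."""
    rng = np.random.default_rng(seed)
    p = n_pathways * pathway_size
    maf = rng.uniform(0.05, 0.5, size=p)  # assumed allele frequencies
    X = rng.binomial(2, maf, size=(n, p)).astype(float)  # minor allele counts
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0  # guard against monomorphic SNPs
    X = (X - X.mean(axis=0)) / sd
    pathways = [np.arange(l * pathway_size, (l + 1) * pathway_size)
                for l in range(n_pathways)]
    causal_pathway = int(rng.integers(n_pathways))
    causal = rng.choice(pathways[causal_pathway], size=n_causal, replace=False)
    beta = np.zeros(p)
    beta[causal] = gamma  # assumed effect-size parameterization
    y = X @ beta + rng.standard_normal(n)
    return X, y, pathways, causal_pathway, np.sort(causal)


def selection_metrics(selected, causal, p):
    """Power = share of causal SNPs selected; FPR = share of non-causal selected."""
    selected, causal = set(selected), set(causal)
    power = len(selected & causal) / len(causal)
    fpr = len(selected - causal) / (p - len(causal))
    return power, fpr
```

Passing the SNPs selected by a lasso fit and by an SGL fit to `selection_metrics` on the same replicate gives the paired comparison summarized in note 3.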
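Slides 22 and 23 and note 11 compare top-k lists with a normalized Canberra distance. The sketch below implements the truncated top-k Canberra distance, in which every rank beyond k is replaced by k + 1, so disagreements among low-ranked variables cost nothing while differences near the top cost the most. Normalizing by a Monte Carlo estimate of the expected distance between random rankings is an assumption that stands in for whatever normalization the authors used. The function names are hypothetical, and the FDR q-values on the slides are not covered.

```python
import numpy as np


def canberra_topk(rank_a, rank_b, k):
    """Top-k Canberra distance between two 1-based rank vectors of the same items."""
    a = np.minimum(np.asarray(rank_a), k + 1).astype(float)
    b = np.minimum(np.asarray(rank_b), k + 1).astype(float)
    return float(np.sum(np.abs(a - b) / (a + b)))  # a + b >= 2, no division by zero


def normalized_canberra_topk(rank_a, rank_b, k, n_perm=2000, seed=0):
    """Divide by the mean distance between random rankings (Monte Carlo estimate)."""
    rng = np.random.default_rng(seed)
    p = len(rank_a)
    expected = np.mean([canberra_topk(rng.permutation(p) + 1,
                                      rng.permutation(p) + 1, k)
                        for _ in range(n_perm)])
    return canberra_topk(rank_a, rank_b, k) / expected


def aligned_ranks(order_a, order_b):
    """1-based ranks of each item in two orderings of the same item set."""
    items = sorted(order_a)
    pos_a = {item: i + 1 for i, item in enumerate(order_a)}
    pos_b = {item: i + 1 for i, item in enumerate(order_b)}
    return (np.array([pos_a[i] for i in items]),
            np.array([pos_b[i] for i in items]))
```

Identical rankings give Ca* = 0, matching note 11. Sweeping k and taking the minimum mirrors the search for the k of closest agreement (k = 25 for pathways, k = 244 for genes).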