A short and naive introduction to epistasis in
association studies
Nathalie Villa-Vialaneix
nathalie.villa-vialaneix@inra.fr
http://www.nathalievilla.org
EpiFun
June 1st, 2018 - Paris
Nathalie Villa-Vialaneix | Epistasis and GWAS 1/23
What is this presentation about?
Standard GWAS
Disease
Healthy
What we are interested in:
epistasis: the interaction between two (or more) SNPs influences the phenotype although no single SNP does on its own
how to detect SNP/SNP and gene/gene interactions?
Everything is easier with a picture
Disclaimer
naive (but hopefully comprehensible) presentation
aims at giving an overview rather than precise directions
might contain errors, overclaims, missing references, badly understood concepts... to keep you awake
Two main reviews used to make these slides:
[Neil et al., 2015, Stanislas, 2017, Emily, 2018].
Material:
References at the end of the slides
these slides on my website
http://www.nathalievilla.org/seminars2018.html
most articles available online at http://nextcloud.nathalievilla.org/index.php/s/VLlheqpwhwD8eeZ (ask me to be granted write rights)
Evidence for epistasis
1 missing heritability: in GWAS, the variants found with a “one locus at a time” strategy explain only a small part of the genetic variance of the phenotype
2 small effect size of most SNPs
3 possible explanation from an evolutionary perspective: epistasis yields robust systems that are resistant to variations
(a bit) More formal definition(s)...
no consensus on the definition...!
biology/statistics [Neil et al., 2015]
biological point of view: (originally) effect of an allele at a given locus
is hidden by the effect of another allele at a second locus – (more
recently) effect of an allele at a given locus depends on the presence
or absence of a genetic variant at another locus
statistical point of view [Fisher, 1918]: departure from additive effects of
genetic variants with respect to their global contribution to the
phenotype
[Emily, 2018]: for a phenotype Y ∈ {0, 1} (controls and cases) and two loci with variants {A, a} and {B, b} respectively, epistasis can be defined:
at the allele level ((A, B), (A, b), (a, B), (a, b)) or at the genotype level ((AA, BB), (Aa, BB), (Aa, Bb), ...)
from a statistical point of view (departure from linearity, measured by odds ratios between cases and controls) or from a biological point of view (measures of association assumed to be equal in cases and controls)
Back to pictures
[Figure: three panels, in order: “G hides effect of B” (original definition); “interaction” (extension); “independent effects” (lack of epistasis?)]
[Cordell, 2002]: statistical definition is less ambiguous even though it is often
hard to interpret from a biological point of view
Challenges for epistasis detection
statistical: “small n, large p” problems (at least at genome scale)
computational: complexity is linear in n but grows exponentially with the order of the interactions (about p^k candidate sets of k SNPs)
biological: gap between statistical and biological (functional) interpretations
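To give a feel for the computational challenge, the number of candidate SNP sets can be counted directly. The sketch below is only an illustration: the function name `n_candidate_interactions` and the value p = 500,000 are assumptions chosen for the example, not figures from the talk.

```python
from math import comb


def n_candidate_interactions(p: int, order: int) -> int:
    """Number of distinct SNP sets of size `order` among p SNPs."""
    return comb(p, order)


if __name__ == "__main__":
    p = 500_000  # hypothetical number of SNPs on a genotyping array
    for order in (2, 3, 4):
        # Each extra SNP in the interaction multiplies the search space by roughly p.
        print(order, n_candidate_interactions(p, order))
```

Each candidate set still has to be tested on all n individuals, which is where the "linear in n" part comes from.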
1 A tentative definition
2 SNP-SNP approaches
3 SNPset-SNPset approaches
4 GWAS
Background
Purpose
Given two loci X1 and X2 (allelic or genotype level), how to detect their
epistatic effect on Y (cases/controls)?
1 regression based methods (mostly linear)
2 comparison of correlation in cases / controls (or odds-ratio
differences)
3 information theory based methods
Other approaches based on ROC analysis for instance (not discussed)
Regression based methods
1 {stat, allele} PLINK [Purcell et al., 2007]: logistic regression
$$\operatorname{logit} P(Y = 1 \mid (x_1, x_2)) = \underbrace{\alpha + \beta\, \mathbb{I}_{\{x_1 = A\}} + \gamma\, \mathbb{I}_{\{x_2 = B\}}}_{\text{additive effect}} + \underbrace{\delta\, \mathbb{I}_{\{(x_1, x_2) = (A, B)\}}}_{\text{departure from additivity}}$$
and test of δ = 0 (genotypic version in [Cordell, 2002])
2 {stat, geno} BOOST [Wan et al., 2010]: Poisson GLM (same approach with a count model and Boolean computations)
Limitations: the maximum likelihood (ML) fit requires numerical optimization, which can be unstable or difficult; only linear interactions are captured
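With two binary indicators, the logistic model with an interaction term is saturated, so δ has a closed form: it is a log ratio of odds ratios computed from the eight cell counts, and a Wald test follows from the usual large-sample variance of a log odds. The sketch below is a minimal illustration under that assumption; the function name and the toy counts are hypothetical, and this is not how PLINK or BOOST are implemented.

```python
import math

CELLS = [(0, 0), (0, 1), (1, 0), (1, 1)]  # (I{x1=A}, I{x2=B})


def interaction_wald_test(cases, controls):
    """Closed-form estimate and Wald test of delta in the saturated model.

    `cases` and `controls` map each cell (i1, i2) to a strictly positive count.
    Returns (delta_hat, z, two_sided_p_value).
    """
    log_odds = {c: math.log(cases[c] / controls[c]) for c in CELLS}
    # delta = logit(1,1) - logit(1,0) - logit(0,1) + logit(0,0)
    delta = (log_odds[(1, 1)] - log_odds[(1, 0)]
             - log_odds[(0, 1)] + log_odds[(0, 0)])
    # Large-sample variance: sum of inverse counts over the eight cells.
    variance = sum(1.0 / cases[c] + 1.0 / controls[c] for c in CELLS)
    z = delta / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail
    return delta, z, p_value


if __name__ == "__main__":
    # Toy counts (hypothetical): the (A, B) cell has more cases than additivity predicts.
    cases = {(0, 0): 100, (0, 1): 200, (1, 0): 200, (1, 1): 800}
    controls = {c: 100 for c in CELLS}
    print(interaction_wald_test(cases, controls))
```

A cell with a zero count makes the closed form undefined, which is one concrete way the numerical difficulties of the ML fit show up.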
Wald-like test methods
Principle: test H0 (no epistasis) with a statistic W that measures the “association” between X1 and X2 for the outcome Y, where (usually) W ∼ χ² under H0
Example (the simplest): [Zhao et al., 2006] {bio, allele}
$$W = \frac{(r_1 - r_0)^2}{\operatorname{Var}(r_1) + \operatorname{Var}(r_0)}$$
where $r_k = \operatorname{Cor}(\mathbb{I}_{\{X_1 = A\}}, \mathbb{I}_{\{X_2 = B\}} \mid Y = k)$.
Other approaches are based on odds ratios [Emily, 2002] {bio, geno}.
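The correlation contrast can be sketched in a few lines. The slide does not say how Var(r_k) is estimated, so the code below plugs in the textbook large-sample approximation (1 − r²)² / n, which is an assumption of this example and not necessarily the estimator of [Zhao et al., 2006]. Function names and the one-degree-of-freedom χ² reference are likewise illustrative choices.

```python
import math


def pearson(u, v):
    """Pearson correlation of two equal-length numeric sequences.

    Raises ZeroDivisionError if one sequence is constant.
    """
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


def correlation_contrast(ind1, ind2, y):
    """W = (r1 - r0)^2 / (Var(r1) + Var(r0)) from 0/1 allele indicators.

    Var(r_k) is approximated by (1 - r_k^2)^2 / n_k (assumption of this sketch).
    Returns (W, p_value) using a chi-square reference with 1 degree of freedom.
    """
    r, var = {}, {}
    for k in (0, 1):
        u = [a for a, lab in zip(ind1, y) if lab == k]
        v = [b for b, lab in zip(ind2, y) if lab == k]
        r[k] = pearson(u, v)
        var[k] = (1.0 - r[k] ** 2) ** 2 / len(u)
    w = (r[1] - r[0]) ** 2 / (var[1] + var[0])
    p_value = math.erfc(math.sqrt(w / 2.0))  # P(chi2_1 > w)
    return w, p_value
```

Under this approximation the statistic is only trustworthy for reasonably large groups and correlations away from ±1.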
Entropy based methods
Methods based on information theory [Shannon, 1948] (powerful to catch nonlinear interactions)
Mutual information
$$I(X_1, X_2) = \sum_{x_1 \in \{AA, Aa, aa\}} \sum_{x_2 \in \{BB, Bb, bb\}} p_{12} \log \frac{p_{12}}{p_1 p_2}$$
with $p_{12} = P(X_1 = x_1, X_2 = x_2)$ and $p_j = P(X_j = x_j)$.
Example [Fan et al., 2011]: information gain $IG = I(X_1, X_2 \mid Y = 1) - I(X_1, X_2)$, with resampling methods to test significance
Limitation: no known distribution under H0
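A plug-in version of the mutual information and of the information gain, with a label-permutation test standing in for the unknown null distribution, might look like the sketch below. Probabilities are replaced by empirical frequencies; the function names, the number of permutations and the seed are assumptions of this example, and the resampling scheme is a generic one, not necessarily the one used by [Fan et al., 2011].

```python
import math
import random
from collections import Counter


def mutual_information(x1, x2):
    """Plug-in estimate of I(X1, X2) from two sequences of genotypes."""
    n = len(x1)
    joint = Counter(zip(x1, x2))
    m1, m2 = Counter(x1), Counter(x2)
    # p12 * log(p12 / (p1 * p2)), written with counts: c/n * log(c*n / (c1*c2))
    return sum((c / n) * math.log(c * n / (m1[a] * m2[b]))
               for (a, b), c in joint.items())


def information_gain(x1, x2, y):
    """IG = I(X1, X2 | Y = 1) - I(X1, X2)."""
    in_cases = [(a, b) for a, b, lab in zip(x1, x2, y) if lab == 1]
    c1 = [a for a, _ in in_cases]
    c2 = [b for _, b in in_cases]
    return mutual_information(c1, c2) - mutual_information(x1, x2)


def permutation_p_value(x1, x2, y, n_perm=199, seed=0):
    """Share of label permutations with an IG at least as large as observed."""
    rng = random.Random(seed)
    observed = information_gain(x1, x2, y)
    labels = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if information_gain(x1, x2, labels) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```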
Background
Purpose
Given two sets of SNPs (genes, haplotypes, ...) $X_1 = (X_{11}, \ldots, X_{1m_1})$ and $X_2 = (X_{21}, \ldots, X_{2m_2})$ (allelic or genotype level), how to detect a global epistatic effect on Y (cases/controls)?
⇒ “summary” of SNP-level analyses.
1 combination of tests (multiple testing or global test)
2 multidimensional analysis (regression models, tests, entropy based methods at the set level)
3 kernel based methods
Combining tests
1 Multiple testing: test all interactions $(X_{1j}, X_{2k})$ to obtain $m_1 m_2$ p-values (of non-independent tests), then apply a multiple testing procedure (Simes to control the intersection of null hypotheses, and a “number of effective tests” to account for correlations): GATES [Li et al., 2011]
(other approaches combining p-values have been proposed)
2 Global distribution of test statistics: with $W_{jk}$ the test statistic of the logistic regression for $(X_{1j}, X_{2k})$,
$$W = [W_{11}, \ldots, W_{m_1 m_2}] \sim \mathcal{N}(0, \Sigma)$$
derive a p-value from $\mathcal{N}(0, \Sigma)$, with an estimation of Σ: minP [Emily, 2016]
Limitations: only linear interactions; computational issues (both methods); hyper-parameter hard to set (effective number of tests; GATES)
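The plain Simes combination of the m1·m2 pairwise p-values can be written as follows. This is the textbook Simes procedure only: GATES additionally replaces the raw counts with effective numbers of tests to account for correlation, which is not reproduced here. The function name is hypothetical.

```python
def simes_p_value(p_values):
    """Simes p-value for the intersection of the null hypotheses.

    With ordered p-values p_(1) <= ... <= p_(m), returns min_i m * p_(i) / i.
    """
    m = len(p_values)
    ordered = sorted(p_values)
    return min(m * p / rank for rank, p in enumerate(ordered, start=1))


if __name__ == "__main__":
    # Hypothetical p-values of four SNP-SNP interaction tests between two genes.
    print(simes_p_value([0.01, 0.04, 0.03, 0.5]))
```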
Multidimensional methods I
1 dimension reduction: summarize each SNP set with a few numerical values (PCA, CCA...) and perform logistic regression with a test of the interaction on the summaries:
$$\operatorname{logit} P(Y = 1 \mid (x_1, x_2)) = \alpha + \beta\, PC_1(x_1) + \gamma\, PC_1(x_2) + \delta\, PC_1(x_1)\, PC_1(x_2)$$
and test “δ = 0” [Li et al., 2009, Stanislas et al., 2017]
2 tests: summarize the correlation between the two SNP sets in cases and in controls (CCA) and compare these two quantities with a test:
$$\frac{z_1 - z_0}{\sqrt{\operatorname{Var}(z_1 - z_0)}} \sim_{H_0} \mathcal{N}(0, 1)$$
for $z_k$ an adequate transformation of $\operatorname{Cor}(CCA_1(X_1 \mid Y = k), CCA_1(X_2 \mid Y = k))$ [Peng et al., 2010]
extensions to PLS, KCCA, ...
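As an illustration of the second approach, the sketch below compares two correlations after a Fisher transformation z = atanh(r), with the classical approximation Var(z) ≈ 1/(n − 3). Both choices are assumptions of this example: the slide only says "an adequate transformation", and the variance of a canonical correlation may need a different estimate (for instance by resampling). The canonical correlations themselves are taken as given inputs; the function name is hypothetical.

```python
import math


def fisher_z_contrast(r_cases, n_cases, r_controls, n_controls):
    """Standardized difference of Fisher-transformed correlations.

    Returns (u, two_sided_p_value) with u compared to N(0, 1).
    Assumes independent groups and Var(z_k) ~ 1 / (n_k - 3).
    """
    z1 = math.atanh(r_cases)
    z0 = math.atanh(r_controls)
    variance = 1.0 / (n_cases - 3) + 1.0 / (n_controls - 3)
    u = (z1 - z0) / math.sqrt(variance)
    p_value = math.erfc(abs(u) / math.sqrt(2.0))
    return u, p_value


if __name__ == "__main__":
    # Hypothetical first canonical correlations between two genes.
    print(fisher_z_contrast(r_cases=0.6, n_cases=500, r_controls=0.1, n_controls=500))
```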
Multidimensional methods II
Here the purpose is a bit different: only one SNP set X = (X1, . . . , Xm). Is this SNP set associated with the phenotype? (similar to what is done in genomic selection)
3 Kernel methods: SKAT [Wu et al., 2010]
what is a kernel?
K is a measure of similarity between individuals described by their SNP sets $(x_1, \ldots, x_n)$: $K(x_i, x_j)$ measures a “resemblance” between i and j.
RKHS: under mild conditions, K defines a unique Hilbert space $\mathcal{H}$ and a unique mapping $\Phi$ of the individuals into $\mathcal{H}$ such that:
$$K(x_i, x_j) = \langle \Phi(x_i), \Phi(x_j) \rangle_{\mathcal{H}}$$
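The kernel/feature-map identity can be checked by hand on a small case. The sketch below uses a quadratic kernel on genotypes coded 0/1/2, chosen here only because its feature map is easy to write down: it is made of all pairwise products of SNPs, which also shows how a kernel can encode SNP-SNP interactions. The kernel choice and function names are assumptions of this example, not the kernels used in SKAT.

```python
from itertools import product


def dot(u, v):
    """Euclidean inner product."""
    return sum(a * b for a, b in zip(u, v))


def quadratic_kernel(x, z):
    """K(x, z) = <x, z>^2 for two genotype vectors."""
    return dot(x, z) ** 2


def feature_map(x):
    """Explicit map Phi of the quadratic kernel: all ordered products x_i * x_j."""
    return [a * b for a, b in product(x, repeat=2)]


if __name__ == "__main__":
    x, z = [0, 1, 2], [2, 1, 1]  # two individuals, three SNPs (toy values)
    # Both numbers agree: the kernel is an inner product in the feature space.
    print(quadratic_kernel(x, z), dot(feature_map(x), feature_map(z)))
```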
the only purpose of the previous slide was to finish off the people not paying close enough attention to my talk
Multidimensional methods II (again)
3 Kernel methods: SKAT [Wu et al., 2010]
fixed effect model in RKHS:
$$\operatorname{logit}_i \sim \alpha + h(X_i) \quad \text{with } h \in \mathcal{H} \text{ to be estimated}$$
is equivalent to a mixed effect model
$$\operatorname{logit}_i \sim \alpha + h_i \quad \text{with } (h_1, \ldots, h_n) \sim \mathcal{N}(0, \tau K), \ \tau \text{ to be estimated}$$
and tests of “h(X) = 0” can be performed using the kernel K
Idea: h is able to capture high order interactions between SNPs within the set X.
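One way to see how a test "uses the kernel K" is a score-type statistic of the form Q = (y − ȳ)ᵀ K (y − ȳ), which is large when individuals that are similar according to K also share the same phenotype. The sketch below is a simplified illustration, not the SKAT implementation: it has no covariates, uses a linear kernel, and replaces the analytical null distribution by label permutations. All function names and defaults are hypothetical.

```python
import random


def linear_kernel(genotypes):
    """Gram matrix K[i][j] = <g_i, g_j> for a list of genotype vectors."""
    return [[sum(a * b for a, b in zip(gi, gj)) for gj in genotypes]
            for gi in genotypes]


def kernel_score_statistic(kernel, y):
    """Q = (y - mean(y))^T K (y - mean(y))."""
    n = len(y)
    mean = sum(y) / n
    r = [v - mean for v in y]
    return sum(r[i] * kernel[i][j] * r[j] for i in range(n) for j in range(n))


def kernel_permutation_test(kernel, y, n_perm=499, seed=0):
    """Return (Q, permutation p-value) obtained by shuffling the phenotype."""
    rng = random.Random(seed)
    observed = kernel_score_statistic(kernel, y)
    labels = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if kernel_score_statistic(kernel, labels) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

Swapping `linear_kernel` for a nonlinear kernel is what would let the same statistic pick up interactions within the SNP set.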
Background
Purpose
How to detect epistatic effects genome-wide?
Basics: combine information between SNP-SNP effects or
SNPset-SNPset effects... but combinatorial issues, especially to catch
high order interactions
1 exhaustive approaches
2 filtering
3 machine learning
Exhaustive approaches
1 exhaustive testing: PLINK (which multiple testing corrections?) or (penalized) regression [Wu et al., 2009] (Lasso, but not really genome-wide)
Limitations: mostly restricted to linear effects and pairwise interactions
2 Multifactor Dimensionality Reduction (MDR) [Ritchie et al., 2001]: non parametric, model free, can deal with high order interactions
Limitations: can fail to detect pure epistasis, strongly depends on several hyperparameters, overfits
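The core step of MDR is to pool the multilocus genotype cells into a "high risk" and a "low risk" group and to score the resulting one-dimensional classifier. The sketch below does this for SNP pairs only and scores on the training data with balanced accuracy; the cross-validation that MDR relies on to limit overfitting is deliberately left out, and the threshold (overall case/control ratio) and function names are assumptions of this example.

```python
from collections import Counter
from itertools import combinations


def mdr_pair_score(g_a, g_b, y):
    """Balanced accuracy of the high/low-risk rule built from one SNP pair."""
    cases, controls = Counter(), Counter()
    for a, b, lab in zip(g_a, g_b, y):
        (cases if lab == 1 else controls)[(a, b)] += 1
    n_cases, n_controls = sum(cases.values()), sum(controls.values())
    threshold = n_cases / n_controls  # overall case/control ratio
    # A genotype cell is "high risk" when its own ratio reaches the threshold.
    high_risk = {cell for cell in set(cases) | set(controls)
                 if cases[cell] >= threshold * controls[cell]}
    sensitivity = sum(cases[c] for c in high_risk) / n_cases
    specificity = sum(n for c, n in controls.items() if c not in high_risk) / n_controls
    return 0.5 * (sensitivity + specificity)


def mdr_best_pair(snps, y):
    """Exhaustive search over SNP pairs; `snps` is a list of genotype columns."""
    return max(((mdr_pair_score(snps[i], snps[j], y), (i, j))
                for i, j in combinations(range(len(snps)), 2)))
```

Because the score is computed on the same data used to build the rule, it is optimistic, which is the overfitting issue listed among the limitations.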
Filtering
Idea: filter SNPs or SNP pairs before exhaustive search
1 filtering on marginal effects (prevents the detection of pure epistasis) [Marchini et al., 2005]
2 Relief: a genetic distance between individuals is used to compute the importance of each SNP, based on how the SNP differs between neighbors that have the same or a different Y [Robnik-Šikonja and Kononenko, 2003]
3 biofilter: combines information coming from 13 datasets to identify whether SNP sets are related to the same pathway, to proteins that interact (PPI), ... [Pendergrass et al., 2013]
Limitation: strong bias toward the most documented genes/pathways
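The Relief idea can be sketched with the basic one-neighbor version of the algorithm: for each individual, find the nearest individual with the same phenotype (hit) and the nearest with a different phenotype (miss), then reward SNPs that differ on the miss and penalize SNPs that differ on the hit. The cited ReliefF variant uses several neighbors and handles missing data; the version below is a simplification, and the genotype coding (0/1/2), the distance and the function name are assumptions of this example.

```python
def relief_weights(rows, y):
    """Basic Relief weights, one per SNP, using every individual once.

    `rows` is a list of genotype vectors coded 0/1/2, `y` the 0/1 phenotype.
    Each class must contain at least two individuals.
    """
    n, m = len(rows), len(rows[0])

    def diff(u, v, f):
        return abs(u[f] - v[f]) / 2.0  # scaled to [0, 1]

    def dist(u, v):
        return sum(diff(u, v, f) for f in range(m))

    weights = [0.0] * m
    for i in range(n):
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        hit = min(hits, key=lambda j: dist(rows[i], rows[j]))
        miss = min(misses, key=lambda j: dist(rows[i], rows[j]))
        for f in range(m):
            weights[f] += (diff(rows[i], rows[miss], f)
                           - diff(rows[i], rows[hit], f)) / n
    return weights
```

Since neighbors are defined from all SNPs at once, a SNP can receive a high weight even when it has no marginal effect, which is why this filter is compatible with pure epistasis.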
ML approaches
Idea: fit a ML model that predicts Y given all SNPs and try to extract information about interactions: random forests (with conditional variable importance [Bureau et al., 2004, Strobl et al., 2008]), Bayesian network (BEAM, [Zhang and Liu, 2007]), ... (I guess: evolutionary algorithms, deep NN, ant colony, ...)
Limitations: n might be too small for nonparametric estimation to be reliable (from a statistical perspective)
no conclusion because this is just the beginning of the discussion...
(and I was dead tired finishing my slides at 4 am this morning)
References
Bureau, A., Dupuis, J., Falls, K., Lunetta, K. L., Hayward, B., Keith, T. P., and van Eerdewegh, P. (2004).
Identifying SNPs predictive of phenotype using random forests.
Genetic Epidemiology, 28(2):171–182.
Cordell, H. J. (2002).
Epistasis: what it means, what it doesn’t mean, and statistical methods to detect it in humans.
Human Molecular Genetics, 11(20):2463–2468.
Emily, M. (2002).
IndOR: a new statistical procedure to test for SNP-SNP epistasis in genome-wide association studies.
Statistics in Medicine, 31(21):2359–2373.
Emily, M. (2016).
AGGrEGATOr: a gene-based gene-gene interaction test for case-control association studies.
Statistical Applications in Genetics and Molecular Biology, 15(2):151–171.
Emily, M. (2018).
A survey of statistical methods for gene-gene interaction in case-control genome-wide association studies.
Journal de la Société Française de Statistique, 159(1):27–67.
Fan, R., Zhong, M., Wang, S., Andrew, A., Karagas, M., Chen, H., Amos, C., Xiong, M., and Moore, J. (2011).
Entropy-based information gain approaches to detect and to characterize gene-gene and gene-environment
interactions/correlations of complex diseases.
Genetic Epidemiology, 35:706–721.
Fisher, R. (1918).
The correlation between relatives on the supposition of Mendelian inheritance.
Transactions of the Royal Society of Edinburgh, 52(9):399–433.
Li, J., Tang, R., Biernacka, J. M., and de Andrade, M. (2009).
Identification of gene-gene interaction using principal components.
BMC Proceedings, 3(Suppl 7):S78.
Li, M.-X., Gui, H.-S., Kwan, J. S. H., and Sham, P. C. (2011).
GATES: a rapid and powerful gene-based association test using extended Simes procedure.
The American Journal of Human Genetics, 88(3):283–293.
Marchini, J., Donnelly, P., and Cardon, L. R. (2005).
Genome-wide strategies for detecting multiple loci that influence complex diseases.
Nature Genetics, 37:413–417.
Neil, C., Sinoquet, C., Dina, C., and Rocheleau, G. (2015).
A survey about methods dedicated to epistasis detection.
Frontiers in Genetics.
Pendergrass, S. A., Frase, A., Wallace, J., Wolfe, D., Katiyar, N., Moore, C., and Ritchie, M. D. (2013).
Genomic analyses with biofilter 2.0: knowledge driven filtering, annotation, and model development.
BioData Mining, 6:25.
Peng, Q., Zhao, J., and Xue, F. (2010).
A gene-based method for detecting gene-gene co-association in a case-control association study.
European Journal of Human Genetics, 18:582–587.
Purcell, S., Neale, B., Todd-Brown, K., Thomas, L., Ferreira, M. A., Bender, D., Maller, J., and Sklar, P. (2007).
PLINK: a tool set for whole-genome association and population-based linkage analyses.
The American Journal of Human Genetics, 81(3):559–575.
Ritchie, M. D., Hahn, L. W., Roodi, N., Bailey, L. R., Dupont, W. D., Parl, F. F., and Moore, J. H. (2001).
Multifactor-dimensionality reduction reveals high-order interactions among estrogen-metabolism genes in sporadic breast
cancer.
The American Journal of Human Genetics, 69(1):138–147.
Robnik-Šikonja, M. and Kononenko, I. (2003).
Theoretical and empirical analysis of ReliefF and RReliefF.
Machine Learning, 53(1-2):23–69.
Shannon, C. E. (1948).
A mathematical theory of communication.
Bell System Technical Journal, 27:347–423 and 623–656.
Stanislas, V. (2017).
Approches statistiques pour la detection d’épistasie dans les études d’associations pangénomiques.
Thèse de doctorat, Université Paris Saclay, Paris, France.
Stanislas, V., Dalmasso, C., and Ambroise, C. (2017).
Eigen-epistasis for detecting gene-gene interactions.
BMC Bioinformatics, 18:54.
Strobl, C., Boulesteix, A.-L., Kneib, T., Augustin, T., and Zeileis, A. (2008).
Conditional variable importance for random forests.
BMC Bioinformatics, 9:307.
Wan, X., Yang, C., Yang, Q., Xue, H., Fan, X., Tang, N. L., and Yu, W. (2010).
BOOST: a fast approach to detecting gene-gene interactions in genome-wide case-control studies.
The American Journal of Human Genetics, 87(3):325–340.
Wu, M. C., Kraft, P., Epstein, M. P., Taylor, D. M., Chanock, S. J., Hunter, D. J., and Lin, X. (2010).
Powerful SNP-set analysis for case-control genome-wide association studies.
American Journal of Human Genetics, 86(6):929–942.
Wu, T. T., Chen, Y. F., Hastie, T., Sobel, E., and Lange, K. (2009).
Genome-wide association analysis by lasso penalized logistic regression.
Bioinformatics, 25(6):714–721.
Zhang, Y. and Liu, J. S. (2007).
Bayesian inference of epistatic interactions in case-control studies.
Nature Genetics, 39:1167–1173.
Zhao, J., Jin, L., and Xiong, M. (2006).
Test for interaction between two unlinked loci.
The American Journal of Human Genetics, 79(5):831–845.
Selective inference and single-cell differential analysis
 
SOMbrero : un package R pour les cartes auto-organisatrices
SOMbrero : un package R pour les cartes auto-organisatricesSOMbrero : un package R pour les cartes auto-organisatrices
SOMbrero : un package R pour les cartes auto-organisatrices
 
Graph Neural Network for Phenotype Prediction
Graph Neural Network for Phenotype PredictionGraph Neural Network for Phenotype Prediction
Graph Neural Network for Phenotype Prediction
 
A short and naive introduction to using network in prediction models
A short and naive introduction to using network in prediction modelsA short and naive introduction to using network in prediction models
A short and naive introduction to using network in prediction models
 
Explanable models for time series with random forest
Explanable models for time series with random forestExplanable models for time series with random forest
Explanable models for time series with random forest
 
Présentation du projet ASTERICS
Présentation du projet ASTERICSPrésentation du projet ASTERICS
Présentation du projet ASTERICS
 

Kürzlich hochgeladen

Bacterial Identification and Classifications
Bacterial Identification and ClassificationsBacterial Identification and Classifications
Bacterial Identification and Classifications
Areesha Ahmad
 
biology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGYbiology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGY
1301aanya
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Sérgio Sacani
 
dkNET Webinar "Texera: A Scalable Cloud Computing Platform for Sharing Data a...
dkNET Webinar "Texera: A Scalable Cloud Computing Platform for Sharing Data a...dkNET Webinar "Texera: A Scalable Cloud Computing Platform for Sharing Data a...
dkNET Webinar "Texera: A Scalable Cloud Computing Platform for Sharing Data a...
dkNET
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdf
PirithiRaju
 
Module for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learningModule for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learning
levieagacer
 

Kürzlich hochgeladen (20)

Bacterial Identification and Classifications
Bacterial Identification and ClassificationsBacterial Identification and Classifications
Bacterial Identification and Classifications
 
Unit5-Cloud.pptx for lpu course cse121 o
Unit5-Cloud.pptx for lpu course cse121 oUnit5-Cloud.pptx for lpu course cse121 o
Unit5-Cloud.pptx for lpu course cse121 o
 
biology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGYbiology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGY
 
CELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdfCELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdf
 
SAMASTIPUR CALL GIRL 7857803690 LOW PRICE ESCORT SERVICE
SAMASTIPUR CALL GIRL 7857803690  LOW PRICE  ESCORT SERVICESAMASTIPUR CALL GIRL 7857803690  LOW PRICE  ESCORT SERVICE
SAMASTIPUR CALL GIRL 7857803690 LOW PRICE ESCORT SERVICE
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
 
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxCOST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
 
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
 
Clean In Place(CIP).pptx .
Clean In Place(CIP).pptx                 .Clean In Place(CIP).pptx                 .
Clean In Place(CIP).pptx .
 
module for grade 9 for distance learning
module for grade 9 for distance learningmodule for grade 9 for distance learning
module for grade 9 for distance learning
 
dkNET Webinar "Texera: A Scalable Cloud Computing Platform for Sharing Data a...
dkNET Webinar "Texera: A Scalable Cloud Computing Platform for Sharing Data a...dkNET Webinar "Texera: A Scalable Cloud Computing Platform for Sharing Data a...
dkNET Webinar "Texera: A Scalable Cloud Computing Platform for Sharing Data a...
 
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdf
 
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRLKochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
 
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts ServiceJustdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
 
GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)
 
Module for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learningModule for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learning
 
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
 

A short and naive introduction to epistasis in association studies

  • 1. A short and naive introduction to epistasis in association studies Nathalie Villa-Vialaneix nathalie.villa-vialaneix@inra.fr http://www.nathalievilla.org EpiFun June 1st, 2018 - Paris Nathalie Villa-Vialaneix | Epistasis and GWAS 1/23
  • 3. What is this presentation about?
      [Figure: standard GWAS, disease vs. healthy]
      What we are interested in:
      epistasis: the interaction between two (or more) SNPs influences the phenotype, but no single SNP does on its own
      how to detect SNP/SNP and gene/gene interactions?
  • 4. Everything is easier with a picture
  • 6. Disclaimer
      naive (but hopefully comprehensive) presentation
      aims to give an overview rather than precise directions
      might contain errors, overclaims, missing references, badly understood concepts... to keep you awake
      Main reviews used to make these slides: [Neil et al., 2015, Stanislas, 2017, Emily, 2018].
      Material:
      references at the end of the slides
      these slides on my website http://www.nathalievilla.org/seminars2018.html
      most articles available online at http://nextcloud.nathalievilla.org/index.php/s/VLlheqpwhwD8eeZ (ask me to be granted write rights)
  • 7. Evidence for epistasis
      1. missing heritability: in GWAS, the detected variants explain only a small part of the genetic variance of the phenotype (with a “one locus at a time” strategy)
      2. small effect size of most SNPs
      3. possible explanation from an evolutionary perspective: epistasis yields robust systems that are resistant to variations
  • 9. (a bit) More formal definition(s)... no consensus on the definition...!
      Biology vs. statistics [Neil et al., 2015]
      biological point of view: (originally) the effect of an allele at a given locus is hidden by the effect of another allele at a second locus; (more recently) the effect of an allele at a given locus depends on the presence or absence of a genetic variant at another locus
      statistical point of view [Fisher, 1918]: departure from additive effects of genetic variants with respect to their global contribution to the phenotype
  • 11. (a bit) More formal definition(s)... no consensus on the definition...!
      [Emily, 2018]: in the case of a phenotype Y ∈ {0, 1} (cases and controls) and two loci with variants {A, a} and {B, b} respectively, epistasis can be defined:
      at the allele ((A, B), (A, b), (a, B), (a, b)) or genotype ((AA, BB), (Aa, BB), (Aa, Bb), ...) level
      from a statistical (departure from linearity measured by odds ratios between cases and controls) or a biological (measures of association assumed to be equal in cases and controls) point of view
  • 12. Back to pictures
      [Figure: panels labelled “G hides effect of B”, “interaction” and “independent effects”, with captions “original definition”, “extension” and “lack of epistasis?”]
      [Cordell, 2002]: the statistical definition is less ambiguous, even though it is often hard to interpret from a biological point of view
  • 13. Challenges for epistasis detection
      statistical: “small n, large p” problems (at least at genome scale)
      computational: complexity is linear in n but grows exponentially with the order of the interactions (about p^k candidate sets of k SNPs)
      biological: gap between statistical and biological (functional) interpretations
  • 14. Outline
      1. A tentative definition
      2. SNP-SNP approaches
      3. SNPset-SNPset approaches
      4. GWAS
  • 16. Background
      Purpose: given two loci X1 and X2 (allelic or genotype level), how to detect their epistatic effect on Y (cases/controls)?
      1. regression-based methods (mostly linear)
      2. comparison of correlations in cases and controls (or odds-ratio differences)
      3. information-theory-based methods
      Other approaches, based on ROC analysis for instance, are not discussed.
  • 18. Regression based methods
      1. {stat, allele} PLINK [Purcell et al., 2007]: logistic regression
         $$\operatorname{logit} P(Y = 1 \mid (x_1, x_2)) = \underbrace{\alpha + \beta\,\mathbb{I}_{\{x_1 = A\}} + \gamma\,\mathbb{I}_{\{x_2 = B\}}}_{\text{additive effect}} + \underbrace{\delta\,\mathbb{I}_{\{(x_1, x_2) = (A, B)\}}}_{\text{departure from additivity}}$$
         and test of $\delta = 0$ (genotypic version in [Cordell, 2002])
      2. {stat, geno} BOOST [Wan et al., 2010]: Poisson GLM (same approach with a count model and Boolean computations)
      Limitations: computational optimization of the maximum likelihood (can be numerically unstable or difficult), only linear interactions
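The interaction test of slide 18 can be illustrated with a minimal sketch. It assumes the allelic model with two binary indicators, in which case the model is saturated and the maximum likelihood estimate of δ is the interaction contrast of the empirical log-odds of the four cells, with the usual large-sample variance (sum of the inverse cell counts). The function name and the toy counts are hypothetical, and this is not PLINK's implementation.

```python
import math

def interaction_wald_test(counts):
    """Wald test of delta = 0 in the saturated allelic logistic model.

    counts[(i, j)] = (n_cases, n_controls), where i = 1 if x1 == A and
    j = 1 if x2 == B. All counts are assumed to be strictly positive.
    """
    delta = 0.0
    variance = 0.0
    for (i, j), (cases, controls) in counts.items():
        # delta = L11 - L10 - L01 + L00, with Lij the log-odds in cell (i, j)
        sign = 1.0 if i == j else -1.0
        delta += sign * math.log(cases / controls)
        # large-sample variance of an empirical log-odds: 1/cases + 1/controls
        variance += 1.0 / cases + 1.0 / controls
    wald = delta ** 2 / variance
    # P(chi2 with 1 df > wald), written with the complementary error function
    p_value = math.erfc(math.sqrt(wald / 2.0))
    return delta, wald, p_value

# Hypothetical toy counts: only carriers of both A and B are enriched in cases.
toy_counts = {(0, 0): (50, 50), (1, 0): (50, 50), (0, 1): (50, 50), (1, 1): (80, 20)}
delta_hat, wald_stat, p_val = interaction_wald_test(toy_counts)
print(delta_hat, wald_stat, p_val)
```

With more than two genotype levels or with covariates the model is no longer saturated, and δ has to be estimated by iterative maximum likelihood, which is where the numerical difficulties mentioned on the slide come from.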
  • 20. Wald-like test methods
      Principle: test “H0: W = 0” for a statistic W that measures the “association” between X1 and X2 for the outcome Y, where (usually) $W \sim \chi^2$ under H0
      Example (the simplest) [Zhao et al., 2006] {bio, allele}:
      $$W = \frac{(r_1 - r_0)^2}{\operatorname{Var}(r_1) + \operatorname{Var}(r_0)} \quad \text{where } r_k = \operatorname{Cor}\left(\mathbb{I}_{\{X_1 = A\}}, \mathbb{I}_{\{X_2 = B\}} \mid Y = k\right).$$
      Other approaches are based on odds ratios [Emily, 2002] {bio, geno}.
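A rough sketch of the statistic W of slide 20 is given here. The slide does not say how Var(r_k) is estimated, so the code assumes the generic large-sample approximation Var(r) ≈ (1 − r²)²/n, which is not necessarily the estimator used by [Zhao et al., 2006]. The function names and the toy haplotypes are hypothetical.

```python
import math

def pearson(u, v):
    """Pearson correlation between two numeric sequences of equal length."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    ss_u = sum((a - mean_u) ** 2 for a in u)
    ss_v = sum((b - mean_v) ** 2 for b in v)
    return cov / math.sqrt(ss_u * ss_v)

def correlation_difference_test(x1, x2, y, allele1="A", allele2="B"):
    """W = (r1 - r0)^2 / (Var(r1) + Var(r0)), compared with a chi2(1)."""
    stats = {}
    for k in (0, 1):
        u = [1.0 if a == allele1 else 0.0 for a, lab in zip(x1, y) if lab == k]
        v = [1.0 if b == allele2 else 0.0 for b, lab in zip(x2, y) if lab == k]
        r = pearson(u, v)
        # ASSUMPTION: large-sample approximation of the variance of r
        stats[k] = (r, (1.0 - r * r) ** 2 / len(u))
    (r0, var0), (r1, var1) = stats[0], stats[1]
    w = (r1 - r0) ** 2 / (var1 + var0)
    return w, math.erfc(math.sqrt(w / 2.0))

# Hypothetical toy data: alleles are correlated in cases, independent in controls.
cases = [("A", "B")] * 15 + [("A", "b")] * 5 + [("a", "B")] * 5 + [("a", "b")] * 15
controls = [("A", "B")] * 10 + [("A", "b")] * 10 + [("a", "B")] * 10 + [("a", "b")] * 10
x1 = [pair[0] for pair in cases + controls]
x2 = [pair[1] for pair in cases + controls]
y = [1] * len(cases) + [0] * len(controls)
print(correlation_difference_test(x1, x2, y))
```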
  • 22. Entropy based methods
      Methods based on information theory [Shannon, 1948] (powerful to catch nonlinear interactions)
      Mutual information:
      $$I(X_1, X_2) = \sum_{x_1 \in \{AA, Aa, aa\}} \sum_{x_2 \in \{BB, Bb, bb\}} p_{12} \log \frac{p_{12}}{p_1 p_2}$$
      with $p_{12} = P(X_1 = x_1, X_2 = x_2)$ and $p_j = P(X_j = x_j)$.
      Example [Fan et al., 2011]: $IG = I(X_1, X_2 \mid Y = 1) - I(X_1, X_2)$, combined with resampling methods to test significance
      Limitation: lack of a known distribution under H0
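The quantities of slide 22 can be computed from plug-in frequency estimates, as in the sketch below. The resampling scheme is an assumption (a plain permutation of the case/control labels); the slide only mentions "resampling methods", and [Fan et al., 2011] may proceed differently. All function names and the toy data are hypothetical.

```python
import math
import random
from collections import Counter

def mutual_information(x1, x2):
    """Plug-in estimate of I(X1, X2) from two aligned genotype sequences."""
    n = len(x1)
    joint = Counter(zip(x1, x2))
    marg1, marg2 = Counter(x1), Counter(x2)
    return sum((c / n) * math.log((c / n) / ((marg1[a] / n) * (marg2[b] / n)))
               for (a, b), c in joint.items())

def information_gain(x1, x2, y):
    """IG = I(X1, X2 | Y = 1) - I(X1, X2), with Y coded 0/1."""
    case_pairs = [(a, b) for a, b, lab in zip(x1, x2, y) if lab == 1]
    c1, c2 = zip(*case_pairs)
    return mutual_information(c1, c2) - mutual_information(x1, x2)

def permutation_p_value(x1, x2, y, n_perm=200, seed=0):
    """ASSUMPTION: significance assessed by permuting the phenotype labels."""
    rng = random.Random(seed)
    observed = information_gain(x1, x2, y)
    labels = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if information_gain(x1, x2, labels) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical toy data: genotypes are dependent among cases only.
g1 = ["AA"] * 10 + ["aa"] * 10 + ["AA"] * 10 + ["aa"] * 10
g2 = ["BB"] * 10 + ["bb"] * 10 + ["bb"] * 10 + ["BB"] * 10
status = [1] * 20 + [0] * 20
print(permutation_p_value(g1, g2, status))
```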
  • 24. Background
      Purpose: given two sets of SNPs (genes, haplotypes, ...) $X_1 = (X_{11}, \ldots, X_{1m_1})$ and $X_2 = (X_{21}, \ldots, X_{2m_2})$ (allelic or genotype level), how to detect a global epistatic effect on Y (cases/controls)? ⇒ “summary” of SNP analyses.
      1. combination of tests (multiple testing or global test)
      2. multidimensional analysis (regression models, tests, entropy-based methods at the set level)
      3. kernel-based methods
  • 27. Combining tests
      1. Multiple testing: test all interactions $(X_{1j}, X_{2k})$ and obtain $m_1 m_2$ p-values (of non-independent tests), then apply a multiple testing procedure (Simes to control the intersection of null hypotheses, and a “number of effective tests” to account for correlations): GATES [Li et al., 2011] (other approaches combining p-values have been proposed)
      2. Global distribution of test statistics: with $W_{jk}$ the test statistic for logistic regression, $W = [W_{11}, \ldots, W_{m_1 m_2}] \sim N(0, \Sigma)$; derive a p-value from $N(0, \Sigma)$, with an estimate of $\Sigma$: minP [Emily, 2016]
      Limitations: only linear interactions; computational issues (both methods); hyper-parameter hard to set (effective number of tests; GATES)
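For reference, the plain Simes combination mentioned on slide 27 fits in a few lines. This sketch deliberately leaves out the "number of effective tests" correction that GATES adds to account for correlated SNPs, so it is the textbook Simes p-value and not the GATES procedure itself. The function name and the p-values are hypothetical.

```python
def simes_p_value(p_values):
    """Simes p-value for the intersection of m null hypotheses:
    min over ranks k of m * p_(k) / k, with p_(1) <= ... <= p_(m)."""
    m = len(p_values)
    ordered = sorted(p_values)
    return min(m * p / rank for rank, p in enumerate(ordered, start=1))

# Hypothetical p-values of the m1 * m2 pairwise interaction tests of a gene pair.
pairwise = [0.02, 0.02, 0.02, 0.02]
print(simes_p_value(pairwise))                 # Simes uses the whole ordered list
print(min(1.0, len(pairwise) * min(pairwise)))  # Bonferroni, for comparison
```

When several p-values are moderately small, as in this toy list, the Simes value is smaller than the Bonferroni one; the two coincide when only the smallest p-value matters.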
  • 29. Multidimensional methods I
      1. Dimension reduction: summarize each SNP set with a few numerical values (PCA, CCA, ...) and perform a logistic regression with a test of the interaction on the summaries:
         $$\operatorname{logit} P(Y = 1 \mid (x_1, x_2)) = \alpha + \beta\,\mathrm{PC}_1(x_1) + \gamma\,\mathrm{PC}_1(x_2) + \delta\,\mathrm{PC}_1(x_1)\,\mathrm{PC}_1(x_2)$$
         and test “$\delta = 0$” [Li et al., 2009, Stanislas et al., 2017]
      2. Tests: summarize the correlations of the SNP sets in cases and controls (CCA) and compare these two quantities with a test:
         $$\frac{z_1 - z_0}{\sqrt{\operatorname{Var}(z_1 - z_0)}} \sim_{H_0} N(0, 1)$$
         for $z_k$ an adequate transformation of $\operatorname{Cor}(\mathrm{CCA}_1(X_1 \mid Y = k), \mathrm{CCA}_1(X_2 \mid Y = k))$ [Peng et al., 2010]; extensions to PLS, KCCA, ...
  • 31. Multidimensional methods II
      Here the purpose is a bit different: there is only one SNP set $X = (X_1, \ldots, X_m)$. Is this SNP set associated with the phenotype? (similar to what is done in genomic selection)
      3. Kernel methods: SKAT [Wu et al., 2010]
         What is a kernel? K is a measure of association between individuals described by their SNP sets $(x_1, \ldots, x_n)$: $K(x_i, x_j)$ measures a “resemblance” between i and j.
         RKHS: under mild conditions, K defines a unique Hilbert space $\mathcal{H}$ and a unique mapping $\Phi$ of the individuals into $\mathcal{H}$ such that $K(x_i, x_j) = \langle \Phi(x_i), \Phi(x_j) \rangle_{\mathcal{H}}$
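To make the notion of a kernel between individuals concrete, here is a small sketch of one common choice for SNP data, the identity-by-state (IBS) kernel. Slide 31 does not say which kernel is used, so both the choice of kernel and the 0/1/2 coding of genotypes (number of copies of one allele) are assumptions. The function names and the toy genotypes are hypothetical.

```python
def ibs_kernel(xi, xj):
    """IBS kernel between two individuals whose SNP set is coded 0/1/2:
    average proportion of alleles shared identical by state over the m SNPs."""
    m = len(xi)
    return sum(2 - abs(a - b) for a, b in zip(xi, xj)) / (2 * m)

def kernel_matrix(genotypes, kernel):
    """n x n matrix K with K[i][j] = kernel(x_i, x_j)."""
    return [[kernel(xi, xj) for xj in genotypes] for xi in genotypes]

# Hypothetical SNP set of m = 3 SNPs observed on n = 3 individuals.
snp_set = [[0, 1, 2], [0, 1, 1], [2, 1, 0]]
for row in kernel_matrix(snp_set, ibs_kernel):
    print(row)
```

The resulting matrix is symmetric, with ones on the diagonal and larger entries for individuals whose genotypes are closer.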
  • 32. Multidimensional methods II (same slide, with the RKHS statement replaced by:) the only purpose of the previous slide was to finish off people not paying close enough attention to my talk
  • 34. Multidimensional methods II (again)
      Here the purpose is a bit different: there is only one SNP set $X = (X_1, \ldots, X_m)$. Is this SNP set associated with the phenotype? (similar to what is done in genomic selection)
      3. Kernel methods: SKAT [Wu et al., 2010]
         The fixed effect model in the RKHS, $\operatorname{logit}_i \sim \alpha + h(X_i)$ with $h \in \mathcal{H}$ to be estimated, is equivalent to a mixed effect model, $\operatorname{logit}_i \sim \alpha + h_i$ with $(h_1, \ldots, h_n) \sim N(0, \tau K)$ and $\tau$ to be estimated, and tests of “$h(X) = 0$” can be performed using the kernel K.
         Idea: h is able to capture high-order interactions between SNPs within the set X.
  • 37. Background
      Purpose: how to detect epistatic effects genome-wide?
      Basics: combine information between SNP-SNP effects or SNPset-SNPset effects... but there are combinatorial issues, especially to catch high-order interactions
      1. exhaustive approaches
      2. filtering
      3. machine learning
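The combinatorial issue of slide 37 is easy to quantify: the number of candidate interactions of order k among p SNPs is the binomial coefficient C(p, k). The sketch below uses a hypothetical panel of 500,000 SNPs, a figure chosen for illustration only and not taken from the slides.

```python
import math

def n_candidate_interactions(n_snps, order=2):
    """Number of distinct SNP sets of the given order among n_snps SNPs."""
    return math.comb(n_snps, order)

p = 500_000  # hypothetical genome-wide panel size
n_pairs = n_candidate_interactions(p, order=2)
print(n_pairs)                               # → 124999750000
print(0.05 / n_pairs)                        # Bonferroni threshold for all pairs
print(n_candidate_interactions(p, order=3))  # third-order interactions
```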
  • 39. Exhaustive approaches
      1. Exhaustive testing: PLINK (which multiple testing corrections?) or (penalized) regression [Wu et al., 2009] (Lasso, but not really genome-wide). Limitation: mostly restricted to linear effects and pairwise interactions.
      2. Multifactor Dimensionality Reduction (MDR) [Ritchie et al., 2001]: non-parametric, model-free, can deal with high-order interactions. Limitations: can fail to detect pure epistasis, strongly depends on several hyperparameters, overfits.
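The core step of MDR can be sketched for a single pair of SNPs: each genotype combination is labelled high risk when its case/control ratio exceeds the overall ratio, which reduces the pair to one binary variable. This is a simplified illustration only; the cross-validation and the search over SNP combinations described in [Ritchie et al., 2001] are left out, and the function name, the 0/1 genotype coding and the toy data are hypothetical.

```python
from collections import defaultdict

def mdr_pair(x1, x2, y):
    """Label genotype cells as high risk and return the training accuracy.

    x1, x2: genotype codes of two SNPs; y: phenotype coded 0 (control) or 1 (case).
    """
    counts = defaultdict(lambda: [0, 0])  # cell -> [n_controls, n_cases]
    for a, b, label in zip(x1, x2, y):
        counts[(a, b)][label] += 1
    n_cases = sum(y)
    n_controls = len(y) - n_cases
    # cases / controls > n_cases / n_controls, written without division
    high_risk = {cell for cell, (controls, cases) in counts.items()
                 if cases * n_controls > controls * n_cases}
    correct = sum(((a, b) in high_risk) == (label == 1)
                  for a, b, label in zip(x1, x2, y))
    return high_risk, correct / len(y)

# Hypothetical toy data: cases carry matching codes at the two SNPs.
snp1 = [0, 0, 1, 1] * 5
snp2 = [0, 1, 0, 1] * 5
pheno = [1, 0, 0, 1] * 5
print(mdr_pair(snp1, snp2, pheno))
```

The accuracy returned here is computed on the same data used to label the cells, so it is optimistic, which is one way to see the overfitting issue listed on the slide.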
  • 42. Filtering
      Idea: filter SNPs or SNP pairs before the exhaustive search
      1. Filtering on marginal effects [Marchini et al., 2005] (prevents the detection of pure epistasis)
      2. Relief [Robnik-Šikonja and Kononenko, 2003]: a genetic distance between individuals is used to compute a measure of the importance of each SNP, according to the differences in this SNP between neighbors when they have the same or a different Y
      3. Biofilter [Pendergrass et al., 2013]: combines information coming from 13 datasets that identify whether SNP sets are related to the same pathway, to proteins that interact (PPI), ... Limitation: strong bias toward the most documented genes/pathways
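The Relief idea of slide 42 can be sketched as follows. This is the basic version with a single nearest hit and a single nearest miss per individual, not the ReliefF variant analysed in the cited reference (which averages over several neighbors). The Manhattan distance on 0/1/2 genotype codes, the function name and the toy data are assumptions made for the illustration.

```python
def relief_weights(genotypes, y):
    """Basic Relief: a SNP gains weight when it differs from the nearest
    individual with a different Y (miss) and loses weight when it differs
    from the nearest individual with the same Y (hit)."""
    n, m = len(genotypes), len(genotypes[0])
    weights = [0.0] * m

    def distance(u, v):
        # ASSUMPTION: Manhattan distance on 0/1/2 genotype codes
        return sum(abs(a - b) for a, b in zip(u, v))

    for i in range(n):
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        hit = min(hits, key=lambda j: distance(genotypes[i], genotypes[j]))
        miss = min(misses, key=lambda j: distance(genotypes[i], genotypes[j]))
        for k in range(m):
            diff_miss = abs(genotypes[i][k] - genotypes[miss][k])
            diff_hit = abs(genotypes[i][k] - genotypes[hit][k])
            # differences are divided by 2 to lie in [0, 1], then averaged over n
            weights[k] += (diff_miss - diff_hit) / (2 * n)
    return weights

# Hypothetical toy data: the first SNP separates cases from controls,
# the second one is unrelated to the phenotype.
toy_genotypes = [[0, 0], [0, 2], [0, 1], [2, 0], [2, 2], [2, 1]]
toy_status = [0, 0, 0, 1, 1, 1]
print(relief_weights(toy_genotypes, toy_status))
```

SNPs with the largest weights would be kept for the subsequent exhaustive search.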
  • 44. ML approaches
      Idea: fit a ML model that predicts Y given all SNPs and try to extract information about interactions: random forests (with conditional variable importance [Bureau et al., 2004, Strobl et al., 2008]), Bayesian networks (BEAM [Zhang and Liu, 2007]), ... (I guess: evolutionary algorithms, deep NN, ant colony, ...)
      Limitations: n might be too small to make a non-parametric estimation affordable (from a statistical perspective)
  • 45. No conclusion, because this is just the beginning of the discussion... (and I was dead tired finishing my slides at 4 am this morning)
  • 46-48. References
      Bureau, A., Dupuis, J., Falls, K., Lunetta, K. L., Hayward, B., Keith, T. P., and van Eerdewegh, P. (2004). Identifying SNPs predictive of phenotype using random forests. Genetic Epidemiology, 28(2):171–182.
      Cordell, H. J. (2002). Epistasis: what it means, what it doesn’t mean, and statistical methods to detect it in humans. Human Molecular Genetics, 11(20):2463–2468.
      Emily, M. (2002). IndOR: a new statistical procedure to test for SNP-SNP epistasis in genome-wide association studies. Statistics in Medicine, 31(21):2359–2373.
      Emily, M. (2016). AGGrEGATOr: a gene-based gene-gene interaction test for case-control association studies. Statistical Applications in Genetics and Molecular Biology, 15(2):151–171.
      Emily, M. (2018). A survey of statistical methods for gene-gene interaction in case-control genome-wide association studies. Journal de la Société Française de Statistique, 159(1):27–67.
      Fan, R., Zhong, M., Wang, S., Andrew, A., Karagas, M., Chen, H., Amos, C., Xiong, M., and Moore, J. (2011). Entropy-based information gain approaches to detect and to characterize gene-gene and gene-environment interactions/correlations of complex diseases. Genetic Epidemiology, 35:706–721.
      Fisher, R. (1918). The correlation between relatives on the supposition of Mendelian inheritance. Transactions of the Royal Society of Edinburgh, 52(9):399–433.
      Li, J., Tang, R., Biernacka, J. M., and de Andrade, M. (2009). Identification of gene-gene interaction using principal components. BMC Proceedings, 3(Suppl 7):S78.
      Li, M.-X., Gui, H.-S., Kwan, J. S. H., and Sham, P. C. (2011). GATES: a rapid and powerful gene-based association test using extended Simes procedure. The American Journal of Human Genetics, 88(3):283–293.
      Marchini, J., Donnelly, P., and Cardon, L. R. (2005). Genome-wide strategies for detecting multiple loci that influence complex diseases. Nature Genetics, 37:413–417.
      Neil, C., Sinoquet, C., Dina, C., and Rocheleau, G. (2015). A survey about methods dedicated to epistasis detection. Frontiers in Genetics.
      Pendergrass, S. A., Frase, A., Wallace, J., Wolfe, D., Katiyar, N., Moore, C., and Ritchie, M. D. (2013). Genomic analyses with biofilter 2.0: knowledge driven filtering, annotation, and model development. BioData Mining, 6:25.
      Peng, Q., Zhao, J., and Xue, F. (2010). A gene-based method for detecting gene-gene co-association in a case-control association study. European Journal of Human Genetics, 18:582–587.
      Purcell, S., Neale, B., Todd-Brown, K., Thomas, L., Ferreira, M. A., Bender, D., Maller, J., and Sklar, P. (2007). PLINK: a tool set for whole-genome association and population-based linkage analyses. The American Journal of Human Genetics, 81(3):559–575.
      Ritchie, M. D., Hahn, L. W., Roodi, N., Bailey, L. R., Dupont, W. D., Parl, F. F., and Moore, J. H. (2001). Multifactor-dimensionality reduction reveals high-order interactions among estrogen-metabolism genes in sporadic breast cancer. The American Journal of Human Genetics, 69(1):138–147.
      Robnik-Šikonja, M. and Kononenko, I. (2003). Theoretical and empirical analysis of ReliefF and RReliefF. Machine Learning, 53(1-2):23–69.
      Shannon, C. E. (1948). A mathematical theory of communication. Bell System Technical Journal, 27:379–423 and 623–656.
      Stanislas, V. (2017). Approches statistiques pour la detection d’épistasie dans les études d’associations pangénomiques. PhD thesis, Université Paris Saclay, Paris, France.
      Stanislas, V., Dalmasso, C., and Ambroise, C. (2017). Eigen-epistasis for detecting gene-gene interactions. BMC Bioinformatics, 18:54.
      Strobl, C., Boulesteix, A.-L., Kneib, T., Augustin, T., and Zeileis, A. (2008). Conditional variable importance for random forests. BMC Bioinformatics, 9:307.
      Wan, X., Yang, C., Yang, Q., Xue, H., Fan, X., Tang, N. L., and Yu, W. (2010). BOOST: a fast approach to detecting gene-gene interactions in genome-wide case-control studies. The American Journal of Human Genetics, 87(3):325–340.
      Wu, M. C., Kraft, P., Epstein, M. P., Taylor, D. M., Chanock, S. J., Hunter, D. J., and Lin, X. (2010). Powerful SNP-set analysis for case-control genome-wide association studies. The American Journal of Human Genetics, 86(6):929–942.
      Wu, T. T., Chen, Y. F., Hastie, T., Sobel, E., and Lange, K. (2009). Genome-wide association analysis by lasso penalized logistic regression. Bioinformatics, 25(6):714–721.
      Zhang, Y. and Liu, J. S. (2007). Bayesian inference of epistatic interactions in case-control studies. Nature Genetics, 39:1167–1173.
      Zhao, J., Jin, L., and Xiong, M. (2006). Test for interaction between two unlinked loci. The American Journal of Human Genetics, 79(5):831–845.