Simulating Genes in Genome-wide Association Studies

Simulating Genes in
GWAS
Kevin R. Thornton
Ecology and Evolutionary Biology
UC Irvine
slides will be available at
http://www.slideshare.net/molpopgen
http://www.molpopgen.org

Acknowledgements
Tony Long
Andrew Foran
Jaleal Sanjak

Several genomic regions have been implicated in linkage studies and, recently, replicated evidence implicating specific genes has been reported. Increasing evidence suggests an overlap in genetic susceptibility with schizophrenia, a psychotic disorder with many similarities to BD. In particular association findings have been reported with [...]
[...] expanded reference group analysis (Supplementary Table 9), it is of interest that the closest gene to the signal at rs1526805 ($P = 2.2 \times 10^{-7}$) is KCNC2 which encodes the Shaw-related voltage-gated potassium channel. Ion channelopathies are well-recognized as causes of episodic central nervous system disease, including seizures, ataxias [...]
[Figure: Manhattan plots of −log10(P) (axis 0 to 15) by chromosome (1 to 22 and X), one panel per disease: bipolar disorder, coronary artery disease, Crohn's disease, hypertension, rheumatoid arthritis, type 1 diabetes, type 2 diabetes.]
Figure 4 | Genome-wide scan for seven diseases. For each of seven diseases, $-\log_{10}$ of the trend test P value for quality-control-positive SNPs, excluding those in each disease that were excluded for having poor clustering after visual inspection, are plotted against position on each chromosome. Chromosomes are shown in alternating colours for clarity, with P values $< 1 \times 10^{-5}$ highlighted in green. All panels are truncated at $-\log_{10}(P\ \text{value}) = 15$, although some markers (for example, in the MHC in T1D and RA) exceed this significance threshold.
doi:10.1038/nature05911
Burton et al.

[...] the differences observed in their allelic architecture. Some apparent differences may simply be due to differences in the stage of investigation across traits. Studies in several conditions have clearly demonstrated that the number of detected variants increases with increasing sample size (refs 22–24).
Population genetic theory suggests an explanation for the paucity of variants explaining a large proportion of disease predisposition, in that decreased reproductive fitness should typically act to reduce the frequencies of high-risk variants. This might explain the relative lack of variants detected so far for some neuropsychiatric conditions, such as autism spectrum disorders, given their low reproductive fitness (ref. 25). Yet for a condition such as type 1 diabetes, which has a similar prevalence, familial risk, early onset and poor reproductive fitness (at [...]
[...] yielded intriguing new variants (refs 33,34). Studies of populations of recent African ancestry in particular is likely to increase the yield of rare variants and narrow the large chromosomal regions of association identified in the ‘younger’ population due to extended linkage disequilibrium, or the tendency for adjacent genetic loci to be inherited together (ref. 31). Isolated populations may also be of value given their potential to be enriched in unique variants (ref. 35).
The accuracy of current heritability estimates is also important, because experimentally identified variants could never explain all the variance in an erroneously inflated heritability estimate. Heritability of quantitative traits, formally defined as the proportion of phenotypic variance in a population attributable to additive genetic factors (narrow-sense heritability, $h^2$ (ref. 36)) is typically estimated from [...]
Table 1 | Estimates of heritability and number of loci for several complex traits
Disease (ref.) | Number of loci | Proportion of heritability explained | Heritability measure
Age-related macular degeneration (72) | 5 | 50% | Sibling recurrence risk
Crohn’s disease (21) | 32 | 20% | Genetic risk (liability)
Systemic lupus erythematosus (73) | | |
Type 2 diabetes (74) | | |
HDL cholesterol (75) | 7 | 5.2% | Residual* phenotypic variance
Height (15) | 40 | 5% | Phenotypic variance
Early onset myocardial infarction (76) | 9 | 2.8% | Phenotypic variance
Fasting glucose (77) | 4 | 1.5% | Phenotypic variance
* Residual is after adjustment for age, gender, diabetes.
doi:10.1038/nature08494
Manolio et al.

NHGRI GWA Catalog
www.genome.gov/GWAStudies
www.ebi.ac.uk/fgpt/gwas/
Published Genome-Wide Associations through 12/2012
$P \le 5 \times 10^{-8}$ for 17 trait categories

doi:10.1371/journal.pbio.1000579
Wray et al.

Unsurprisingly, since the GWAS method is primarily powered [...]
common alleles, risk allele frequencies were well above 5% [...]
all TASPs (reported index TASs with an association p valu[...]
$5.0 \times 10^{-8}$ and all HapMap phase II CEU SNPs in LD [$r^2 > 0$[...]
[Figure: odds ratio (log scale) versus reported risk allele frequency (%, 0 to 100). Labeled points: OCA2, eye color; MC1R, hair color; LOXL1, exfoliation glaucoma.]
Fig. 1. Published odds ratios for discrete traits by reported risk allele frequencies. Labeled SNP-trait associations are those with the highest ORs. Note that the y axis is on the log scale.
www.pnas.org/cgi/doi/10.1073/pnas.0903103106
Hindorff et al.

[...] proportion explained by rare variants, because natural selection should minimize the frequency of deleterious variants in the population [24]. Therefore, for any phenotype, many causal variants will be rare, and the proportion of population-level genetic variance in complex phenotypes attributable to variants across the allele frequency spectrum will depend upon the strength of selection in our evolutionary past. The problem is that this is something that we do not [...]
[...] that the power of detection is proportional to $pa^2$, but it is clear that for each complex trait, variance is contributed from the entire allele frequency spectrum. This highlights the scarcity of low-frequency variants identified by GWAS for quantitative traits and com[...] disease in humans. Detecting these variants will require a combination of greater sample size, better genotyping, and improved phenotyping.
[Figure, two panels: (A) absolute effect (SD units) versus minor allele frequency; (B) odds ratio versus risk allele frequency. Frequency axes run from <0.001 to 0.5 or 1.]
Figure I. For quantitative traits (A), the absolute effect is plotted against the minor allele frequency, whereas for complex common diseases (B), the odds ratio is plotted against the risk allele frequency. Each of the 38 quantitative traits and 43 disease traits are represented by different colors. Abbreviation: SD, standard deviation.
http://dx.doi.org/10.1016/j.tig.2014.02.003
Robinson et al.

[Figure: odds ratio (log scale, 1 to 10) by annotation set (non-synonymous sites, promoters, UTRs, miRTS, intronic and intergenic regions, TFBSs, CpG islands, PReMod sites, ORegAnno elements, EAR regions, MCSs, HARs, PSGs). Panel title: Enrichment/depletion analysis after adjusting for “hitchhiking” effects from non-synonymous sites.]
Fig. 2. Odds ratios for TAS block enrichment/depletion analysis after adjusting for “hitchhiking” effects from nonsynonymous sites. Four annotation sets (Splice sites, Validated enhancers, EvoFold elements, and noncoding RNAs) are not represented here because no TAS blocks mapped to these annotation sets. The blue circle represents the point estimate of the odds ratio (OR) and the red lines represent the 95% CI. Possible “hitchhiking” effects from nonsynonymous sites are reduced by discarding any TASP/control SNP in $r^2 > 0.6$ with a nonsynonymous SNP. For an explanation of the annotation sets on the x axis, we refer the reader to Table S4. Note that the y axis is on the log scale. Nonsynonymous OR computation is not adjusted for “hitchhiking” effects.
www.pnas.org/cgi/doi/10.1073/pnas.0903103106
Hindorff et al.

Observation | Interpretation
Missing H | Lots
Uniform frequencies of “hits” | Common associations exist
Rare hits have larger OR | Rare alleles may have larger effects
Larger OR in genes | Genes matter

Observation | Interpretation
Rare hits have larger OR | Rare alleles may have larger effects
 | Disease is harmful with respect to fitness (in the evolutionary sense).
Larger OR in genes | Genes matter

[Figure: panel a, frequency of observations (y axis); panel b, causal variant frequency (y axis). Partial view of the figure.]
Figure 3 | Inconsistency between genome-wide association stu[...]
a | The frequency distribution of risk allele frequencies (shown in ligh[...]
doi:10.1038/nrg3118
Gibson

[Figure: panel a, frequency of observations (0 to 0.4); panel b, causal variant frequency (0.005 to 0.020), with odds ratio classes from 2 to > 9.]
Figure 3 | Inconsistency between genome-wide association study results and rare variant expectations.
a | The frequency distribution of risk allele frequencies (shown in light red) for 414 common variant associations with 17
diseases is only slightly skewed towards lower-frequency variants. By contrast, simulations — in this case, assuming up
to nine rare causal variants inducing the common variant association with SNPs at the same frequency as observed on
common genotyping platforms (light green bars) — result in a marked left-skew with a peak for common variants
whose frequency is less than 10%. (The skew is even stronger if only a single causal variant is responsible.) The observed
data are thus not immediately consistent with the rare variant model. b | Part of the problem with synthetic associations
is that they would explain too much heritability if they were pervasively responsible for common variant effects. This is
due to the relationship between allele frequency, maximum possible linkage disequilibrium (LD) and the amount of
variance explained (ref. 19). The plot shows the expected odds ratio due to a rare variant of the indicated frequency (from
0.5% to 2%) if it increases the odds ratio at a common SNP (with which it is in maximum possible LD) by 1.1-fold.
Intermediate effect sizes (2 < odds ratio < 5) require combined causal variant frequencies in excess of 1%. As the
number of rare variants increases, the likelihood that they are in high LD with the common variant also drops, further
reducing the probability that they can explain observed common variant association.

The multiplicative model
$G = \prod_i (1 + e_i)$
Risch & colleagues, Pritchard,
countless others

The multiplicative model
$G = \prod_i (1 + e_i)$
[Figure: contour plot over the number of causative mutations on the paternal allele (x axis, 0 to 10) and on the maternal allele (y axis, 0 to 10); contour levels from 0.2 to 1.4.]
Risch & colleagues, Pritchard,
countless others
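
As a minimal sketch of the multiplicative model above, the genetic value can be computed as a running product over per-mutation effects. The function name and the effect sizes below are illustrative assumptions, not values from the talk.

```python
import math


def multiplicative_genetic_value(effects):
    """Return G = prod_i (1 + e_i) for an iterable of per-mutation effects e_i.

    An individual carrying no causative mutations has G = 1.
    """
    return math.prod(1.0 + e for e in effects)


if __name__ == "__main__":
    # Hypothetical effect sizes, chosen only for illustration.
    print(multiplicative_genetic_value([0.1, 0.1, 0.2]))
```

Under this model each additional mutation scales the genetic value by the same factor regardless of which parental allele carries it.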

WWHD?
(What would Haldane do?)
Genotype | AA | Aa | aa
Mating frequency | $p^2$ | $2pq$ | $q^2$
Fitness | $1$ | $1 - sh$ | $1 - 2s$
$\hat{q} = \frac{u}{sh}$
$\hat{q} \approx \sqrt{\frac{u}{s}}$ as $h \to 0$
DOI: 10.1017/S0305004100015644
Haldane
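
The two equilibrium approximations on this slide can be evaluated numerically. The sketch below simply codes the slide's expressions, $\hat{q} = u/(sh)$ and $\hat{q} \approx \sqrt{u/s}$; the function names and the parameter values in the example are assumptions made for illustration.

```python
import math


def q_hat_partial_dominance(u, s, h):
    """Equilibrium frequency from the slide's approximation q = u / (s h), h > 0."""
    return u / (s * h)


def q_hat_recessive(u, s):
    """Equilibrium frequency from the slide's approximation q = sqrt(u / s), h -> 0."""
    return math.sqrt(u / s)


if __name__ == "__main__":
    # Hypothetical mutation rate and selection coefficient.
    u, s = 1e-5, 0.01
    print(q_hat_partial_dominance(u, s, h=0.5))
    print(q_hat_recessive(u, s))
```

For the same u and s, the recessive approximation gives a much higher equilibrium frequency, because selection only acts on the rare homozygotes.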

Mutation at rate u (per gamete per generation)
“A” allele
“a” allele is heterogeneous in its molecular origin
Trans-heterozygotes are at risk.
Phenotype has (weak) effect on individual fitness
doi:10.1371/journal.pgen.1003258
Thornton et al.

Effect sizes $\sim \mathrm{Exp}(\lambda)$
[Figure: density of effect sizes (x axis: effect size, 0.0 to 0.9; y axis: density).]
$h_i$ = effect of haplotype $i$. Additive over causative mutations.
Thornton et al.
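
A haplotype effect of this kind can be sketched by drawing exponential effect sizes and summing them. This assumes that λ is the mean of the exponential distribution, as the later axis label "Mean effect size (λ)" suggests; the function name and the numbers in the example are hypothetical.

```python
import random


def haplotype_effect(n_mutations, mean_effect, rng):
    """Sum of n_mutations effect sizes, each drawn from an exponential
    distribution with the given mean (rate = 1 / mean)."""
    return sum(rng.expovariate(1.0 / mean_effect) for _ in range(n_mutations))


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the sketch is repeatable
    # Hypothetical haplotype carrying three causative mutations, mean effect 0.1.
    print(haplotype_effect(3, 0.1, rng))
```

Note that `random.expovariate` takes a rate, so the mean effect size is inverted before sampling.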

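$G_{ij} = \sqrt{h_i \times h_j}$
(geometric mean)
[Figure: contour plot over the number of causative mutations on the paternal allele (x axis, 0 to 10) and on the maternal allele (y axis, 0 to 10); contour levels from 0.05 to 0.4.]
$P_{i,j} = G_{i,j} + N(0, \sigma_e)$
$w = e^{-\frac{(P_{i,j})^2}{2\sigma_S^2}}$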
Thornton et al.
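
The three steps on this slide (geometric-mean genetic value, Gaussian noise, Gaussian fitness function) can be sketched as follows. The function names are hypothetical, the noise parameter is treated here as a standard deviation, and the parameter values in the example are placeholders rather than the values used in the study.

```python
import math
import random


def genetic_value(h_i, h_j):
    """G_ij = sqrt(h_i * h_j): geometric mean of the two haplotype effects."""
    return math.sqrt(h_i * h_j)


def phenotype(g, sigma_e, rng):
    """P = G + Gaussian noise with mean 0 and standard deviation sigma_e."""
    return g + rng.gauss(0.0, sigma_e)


def fitness(p, sigma_s):
    """w = exp(-P^2 / (2 sigma_s^2)): fitness declines as P moves away from 0."""
    return math.exp(-(p ** 2) / (2.0 * sigma_s ** 2))


if __name__ == "__main__":
    rng = random.Random(7)
    g = genetic_value(0.2, 0.8)               # hypothetical haplotype effects
    p = phenotype(g, sigma_e=0.1, rng=rng)    # placeholder noise level
    print(g, p, fitness(p, sigma_s=1.0))      # placeholder selection width
```

Because the genetic value is a geometric mean, an individual with one mutation-free haplotype (effect 0) has G = 0, so only individuals carrying causative mutations on both haplotypes are shifted.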

Aside: simulation tools
• C++ library for rapid forward simulation
• Available from https://github.com/molpopgen/fwdpp
• Preprint on arXiv at http://arxiv.org/abs/1401.3786

[Figure: mean run time (days) and mean peak memory use (Mb) versus population size (N = 1000, 10000, 50000 diploids), for θ = ρ = 100 and θ = ρ = 500, comparing sfs_code, SLiM, fwdpp (gamete-based), and fwdpp (individual-based).]
http://arxiv.org/abs/1401.3786
Thornton

[Figure: mean run time (hours) and mean peak memory use (megabytes) versus the proportion of new mutations that are deleterious (0.1, 0.5, 1), for 2Nsh = 1, 10, and 100, comparing fwdpp (gamete-based), fwdpp (individual-based), and SLiM.]
http://arxiv.org/abs/1401.3786
Thornton

Selection is weak
[Figure: relative fitness (0.70 to 1.00) versus mean effect size (λ, 0.0 to 0.5), showing population mean fitness, average fitness of a case, and average minimum fitness.]
Thornton et al.

Heritability plateaus
[Figure: broad-sense heritability (0.00 to 0.06) versus mean effect size (λ, 0.0 to 0.5).]
Thornton et al.

Rare alleles
[Figure: proportion (0.0 to 0.4) versus derived allele frequency; λ = 0.25.]
Thornton et al.

GWAS have poor power
[Figure: power (0.0 to 0.8) for GWAS and resequencing, each with and without recombination; x axis from 0.0 to 0.5.]

Compare model to data…
[Figure: Figure 3 of Gibson (panels a and b, with its caption), repeated from the earlier slide and shown with a figure from Wray et al.]
doi:10.1038/nrg3118 doi:10.1371/journal.pbio.1000579
Gibson Wray et al.

…reveals a pretty good fit
doi:10.1371/journal.pbio.1000579
Wray et al.
[Figure: mean number of markers (0 to 10) versus MAF of most significant marker (in cases), 0 to 0.5; n = 36.899; λ = 0.05.]
(Based on simulating
imperfect SNP chips)

“Burden” tests do badly…
[Figure: power (0 to 1) of two burden tests, Madsen and Browning (2009) and Li and Leal (2008). One panel compares GWAS and resequencing, with and without recombination; the other compares 50, 100, 200, and 250 markers, with and without recombination. x axes run from 0.0 to 0.5.]
Thornton et al.

…because the model is
wrong.
[Figure: mean number of causative mutations per diploid (0 to 8) for controls and cases, for all mutations and for rare mutations only; x axis from 0.0 to 0.5.]
Thornton et al.

SKAT does ok
[Figure: power (0 to 1) of SKAT for resequencing and GWAS data, using default weights or Madsen-Browning weights, with optimal p-values; x axis from 0.0 to 0.5.]
Thornton et al.

Manhattan plots
[Figure: two simulated Manhattan plots, −log10(p) (0 to 15) versus position (0 to 100 kbp), with markers classed as common, common causative, rare, and rare causative, shown beside Figure 2 of Burton et al.]
Figure 2 | Genome-wide picture of geographic variation. a, P values for the 11-d.f. test for difference in SNP allele frequencies between geographical regions, within the 9 collections. SNPs have been excluded using the project quality control filters described in Methods. Green dots indicate SNPs with a P value $< 1 \times 10^{-5}$. b, Quantile-quantile plots of these test statistics. SNPs at which the test statistic exceeds 100 are represented by triangles at the top of the plot, and the shaded region is the 95% concentration band (see Methods). Also shown in blue is the quantile-quantile plot resulting from removal of all SNPs in the 13 most differentiated regions (Table 1).
NATURE|Vol 447|7 June 2007
doi:10.1038/nature05911
Burton et al.
Thornton et al.

A new association test
[Background image: page excerpt and Figure 2 of Burton et al. (genome-wide picture of geographic variation; NATURE|Vol 447|7 June 2007), repeated from the previous slide.]
$ESM_K = \sum_{i=1}^{K} \left( -\log_{10}(p_i) + \log_{10}\frac{i}{K} \right)$
Thornton et al.
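
A minimal sketch of the ESM statistic as written on this slide. It assumes that $p_i$ denotes the i-th smallest single-marker p-value in the region and that all p-values are strictly positive; the function name and the example p-values are hypothetical.

```python
import math


def esm_statistic(p_values, k):
    """ESM_K = sum over i = 1..K of ( -log10(p_(i)) + log10(i / K) ),
    where p_(i) is the i-th smallest p-value (an assumption of this sketch)."""
    if not 1 <= k <= len(p_values):
        raise ValueError("k must be between 1 and the number of p-values")
    smallest = sorted(p_values)[:k]
    return sum(
        -math.log10(p) + math.log10(i / k)
        for i, p in enumerate(smallest, start=1)
    )


if __name__ == "__main__":
    # Hypothetical single-marker p-values from one region.
    print(esm_statistic([0.2, 1e-4, 0.03, 0.6, 1e-3], k=3))
```

Each term compares an observed $-\log_{10}$ p-value with the value expected for its rank, so the statistic grows when the most significant markers in a region are jointly more significant than their ranks alone would predict.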

ESM is a more powerful test
[Figure: power (0 to 1) of the ESM test for GWAS and resequencing data, with and without recombination; x axis from 0.0 to 0.5.]
(Caveat: requires permutation to get p-values)
Thornton et al.
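
The permutation caveat can be illustrated with a generic sketch: shuffle case/control labels, recompute the statistic, and count how often the permuted value is at least as large as the observed one. The function name, its arguments, and the (hits + 1) / (n_perm + 1) convention are assumptions of this sketch, not a description of the authors' implementation.

```python
import random


def permutation_p_value(statistic, labels, genotypes, n_perm, rng):
    """Permutation p-value for statistic(labels, genotypes) -> float,
    where larger values are taken to be more extreme."""
    observed = statistic(labels, genotypes)
    shuffled = list(labels)  # copy so the caller's labels are not modified
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if statistic(shuffled, genotypes) >= observed:
            hits += 1
    # Adding 1 to numerator and denominator avoids reporting p = 0.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy data: the statistic counts derived alleles carried by cases.
    labels = [1, 1, 1, 0, 0, 0]
    genotypes = [2, 1, 1, 0, 0, 1]
    stat = lambda lab, gen: sum(l * g for l, g in zip(lab, gen))
    print(permutation_p_value(stat, labels, genotypes, 999, random.Random(3)))
```

The cost scales with the number of permutations times the cost of one region-wide test, which is why a permutation-based statistic is more expensive to run than tests with analytic p-values.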

Running ESM on real data
• We think we can implement ESM using a mix of the
PLINK toolkit plus some custom programs.
• We need data to test it out on.
• There are very few modern GWAS available for
reanalysis.
• Lack of data sharing hurts the field.

Rare alleles and missing
heritability
• Current tests are underpowered
• Heterogeneity means that GWAS “hits” tag few
causative mutations
• Causative mutations that are tagged tend to be
(relatively) common. These “common” mutations
have effect sizes much smaller than the typical
causative mutation that segregates

[Figure: mean number of causative singletons per individual (0.0000 to 0.0030) versus number of copies of the derived allele at the focal SNP (0, 1, 2), in panels labeled 0.010 to 0.500, comparing the most significant marker with an unassociated SNP.]
Thornton et al.

Population growth
[Figure: population size versus time, from past to present.]

H^2 insensitive to growth
[Figure: mean broad-sense heritability (0.01 to 0.04) versus average effect size of new mutation (0.0 to 0.5), for the constant-size and growth models.]
Unpublished

Consistent with recent
findings from other groups
[...] But despite these substantial shifts in the overall frequency spectrum, the impact on genetic load, namely the mean number of deleterious variants per individual and thus the average fitness, is much more subtle. In the semidominant case, the individual burden is essentially unaffected by these demographic events (Fig. 1c,d). With growth, the increased number of segregating sites is balanced exactly by a decrease in the mean frequency (with the converse being true for the bottleneck model) so that the number of variants per individual stays constant. This kind of balance is predicted by classic mutation-selection balance models (ref. 18) and can be shown to hold for general changes in population size, provided that selection is strong and deleterious alleles are at least partially dominant (Supplementary Note).
The behavior of the recessive model is more complicated (Fig. 1e,f). In the bottleneck [...]
[Figure: panels a to f. (a, b) population size over time (generations) for the bottleneck and growth models; (c to f) number per MB of segregating sites, rare segregating sites, and deleterious alleles per individual (load), for semidominant and recessive mutations.]
Figure 1 Time course of load and other key aspects of variation through a bottleneck and exponential growth. (a,b) The bottleneck (a) and exponential growth (b). (c–f) The expected number of variants and alleles per MB assuming semidominant mutations (c,d) or recessive mutations (e,f) with s = 1% and a mutation rate per site per generation of $10^{-8}$.
Simons et al.
doi:10.1038/ng.2896

Power is affected
[Figure, two panels: frequency in population (0.00 to 0.08) versus effect size of segregating causative mutation (0.000 to 0.100), for the constant and growth models; power (0.0 to 0.8) versus mean effect size of causative mutation (0.0 to 0.5), for the ESM50, Logit, and SKAT statistics under the constant and growth models.]
Unpublished

Excellent fit to empirical
data
[Figure: number of markers (0 to 14) versus frequency of most-associated marker (0.0 to 1.0).]
Unpublished

Implications
• Power to detect regions with modest effects on risk
(4-5% contribution to broad-sense heritability) is
very low in growing populations
• The explanatory power of simple models is
probably far from exhausted

Implications
• Much more likely to detect loci
with mutations of modest
effect
• Underlying distribution of
mean effect size across loci is
completely unknown in any
system
[Figure: power (0.0 to 0.8) versus mean effect size of causative mutation (0.0 to 0.5), for the ESM50, Logit, and SKAT statistics under the constant and growth models.]
Unpublished

Future work
• Multilocus models with epistasis
• Machine learning approaches: do they work?
• Develop new simulation tools
• Make simulation output available
• Implement ESM test for analyzing real GWAS data

Other work in the lab
• Copy number variation in Drosophila: doi: 10.1093/molbev/msu124
• Detecting TE insertions using paired-end data in
Drosophila: doi: 10.1093/molbev/mst129
• Modeling experimental evolution: doi: 10.1093/molbev/msu048
• Structural variation and variation in gene
expression
