Epidemiologisk Fredagsmøde (Epidemiological Friday Meeting), 15 February 2008

Association Mapping
Through local genealogies

Thomas Mailund
Bioinformatics Research Center
http://www.birc.au.dk/

“Genetic” Diseases
Gunshot wounds
Car accidents
Smoking-induced lung cancer
Cardiovascular disease
Obesity
Diabetes 2
Alzheimer
Schizophrenia
BRCA1 breast cancer
Cystic fibrosis
Haemophilia

Disease Mapping...
Locate disease-affecting polymorphism

Cases (affected)
--A--------C--------A----G---X----T---C---A----
--T--------G--------A----G---X----C---C---A----
--A--------G--------G----G---X----C---C---A----
--A--------C--------A----G---X----T---C---A----
--T--------C--------A----G---X----T---C---A----
--T--------C--------A----T---X----T---A---A----

Controls (unaffected)
--A--------C--------A----G---X----T---C---A----
--A--------C--------A----G---X----T---C---A----
--A--------C--------A----G---X----T---C---G----
--T--------C--------A----T---X----T---C---A----
--A--------C--------A----G---X----T---C---A----
--A--------C--------G----T---X----C---A---A----
--A--------C--------A----G---X----C---C---G----

Unrealistic Assumptions
We only measure -A-- -C- -A--

“unphased” data
--T-- --G- -G-- -C-
-A-- -A--
--A-- --C- -G-- -T- -C-

We first need to infer the phase

--T--------G--------A----G--------C---C---A----
--A--------C--------A----G--------T---C---A----

--T--------G--------A----G--------T---C---A----
--A--------C--------A----G--------C---C---A----

--T--------C--------A----G--------T---C---A----
--A--------G--------A----G--------C---C---A----

?
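The phase ambiguity can be made concrete with a small sketch. It assumes the genotype is given as one pair of alleles per marker (a representation chosen here for illustration, not taken from the slides) and lists every haplotype pair consistent with it. With k heterozygous markers there are 2^(k-1) distinct pairs.

```python
from itertools import product

def phasings(genotype):
    """Enumerate the distinct haplotype pairs consistent with an unphased genotype.

    `genotype` is a list of 2-character strings, one per marker, giving the
    two observed alleles ("TA" is heterozygous, "GG" is homozygous).
    """
    het = [i for i, g in enumerate(genotype) if g[0] != g[1]]
    pairs = []
    # The first heterozygous marker keeps its orientation, so each unordered
    # pair of haplotypes is listed exactly once.
    for flips in product([False, True], repeat=max(len(het) - 1, 0)):
        flip_at = dict(zip(het[1:], flips))
        first, second = [], []
        for i, (a, b) in enumerate(genotype):
            if flip_at.get(i, False):
                a, b = b, a
            first.append(a)
            second.append(b)
        pairs.append(("".join(first), "".join(second)))
    return pairs

# Hypothetical encoding of the seven markers shown on the slide:
# heterozygous at markers 1, 2 and 5, homozygous elsewhere.
example = ["TA", "GC", "AA", "GG", "CT", "CC", "AA"]
candidates = phasings(example)
for first, second in candidates:
    print(first, second)
```

Three heterozygous markers give four candidate pairs, which is why the phase has to be inferred rather than read off the genotype.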

Disease Mapping...
Markers are locally correlated


Disease Mapping...
Search for indirect signals


Marker Relatedness
Linkage disequilibrium (LD)

[Figure: LD (r²) plotted against recombination rate. Panels: empirical results (Clark et al. 2003, AJHG 73:285-300) and theoretical results (Hein et al. 2005).]
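As a reminder of what r² measures, here is a minimal sketch that computes it for two biallelic markers from phased haplotypes. The function name and the string encoding of haplotypes are assumptions made for this example.

```python
def r_squared(haplotypes, i, j):
    """LD r^2 between biallelic markers i and j across phased haplotypes.

    Each haplotype is a string with one allele character per marker.
    """
    n = len(haplotypes)
    a = haplotypes[0][i]  # allele used as reference at marker i
    b = haplotypes[0][j]  # allele used as reference at marker j
    p_a = sum(h[i] == a for h in haplotypes) / n
    p_b = sum(h[j] == b for h in haplotypes) / n
    p_ab = sum(h[i] == a and h[j] == b for h in haplotypes) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both markers must be polymorphic")
    d = p_ab - p_a * p_b  # the disequilibrium coefficient D
    return d * d / denom

linked = ["AC", "AC", "TG", "TG"]    # alleles always travel together
unlinked = ["AC", "AG", "TC", "TG"]  # all four combinations, equally common
print(r_squared(linked, 0, 1), r_squared(unlinked, 0, 1))
```

Markers whose alleles always occur together give r² = 1, and markers whose alleles combine independently give r² = 0.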

Indirect Association
“Tag” markers Unobserved marker
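Testing each tag marker on its own can be sketched as an allelic chi-square test on a 2x2 table of allele by case/control status. This is a generic single-marker test chosen for illustration, not necessarily the test used in the talk. The haplotypes below are the seven tag markers from the Disease Mapping slide with the unobserved site X left out.

```python
def allelic_chi_square(cases, controls, marker):
    """Pearson chi-square statistic (1 d.f.) for allele counts at one marker."""
    ref = (cases + controls)[0][marker]          # arbitrary reference allele
    a = sum(h[marker] == ref for h in cases)     # reference allele in cases
    b = len(cases) - a                           # other allele in cases
    c = sum(h[marker] == ref for h in controls)  # reference allele in controls
    d = len(controls) - c                        # other allele in controls
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # marker is monomorphic in the sample
    return n * (a * d - b * c) ** 2 / denom

def scan(cases, controls):
    """Score every marker; a peak suggests a nearby causal variant."""
    return [allelic_chi_square(cases, controls, m)
            for m in range(len(cases[0]))]

cases = ["ACAGTCA", "TGAGCCA", "AGGGCCA", "ACAGTCA", "TCAGTCA", "TCATTAA"]
controls = ["ACAGTCA", "ACAGTCA", "ACAGTCG", "TCATTCA",
            "ACAGTCA", "ACGTCAA", "ACAGCCG"]
statistics = scan(cases, controls)
print([round(s, 3) for s in statistics])
```

A tag marker in LD with the unobserved variant picks up part of its signal, which is what makes the association indirect.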


Indirect Multi-Marker Association

The Ancestral
Recombination Graph

Hudson 1990; Griffiths and Marjoram 1996

A Reasonable Local Model
Copyright © 2007 by the Genetics Society of America
DOI: 10.1534/genetics.107.071126

On Recombination-Induced Multiple and Simultaneous Coalescent Events

Joanna L. Davies, František Šimančík, Rune Lyngsø, Thomas Mailund and Jotun Hein
Department of Statistics, University of Oxford, Oxford, OX1 3TG, United Kingdom
Manuscript received January 18, 2007
Accepted for publication October 2, 2007

ABSTRACT
Coalescent theory deals with the dynamics of how sampled genetic material has spread through a
population from a single ancestor over many generations and is ubiquitous in contemporary molecular
population genetics. Inherent in most applications is a continuous-time approximation that is derived
under the assumption that sample size is small relative to the actual population size. In effect, this
precludes multiple and simultaneous coalescent events that take place in the history of large samples. If
sequences do not recombine, the number of sequences ancestral to a large sample is reduced sufficiently
after relatively few generations such that use of the continuous-time approximation is justified. However,
in tracing the history of large chromosomal segments, a large recombination rate per generation will
consistently maintain a large number of ancestors. This can create a major disparity between discrete-time
and continuous-time models and we analyze its importance, illustrated with model parameters typical of
the human genome. The presence of gene conversion exacerbates the disparity and could seriously
undermine applications of coalescent theory to complete genomes. However, we show that multiple and
simultaneous coalescent events influence global quantities, such as total number of ancestors, but have
negligible effect on local quantities, such as linkage disequilibrium. Reassuringly, most applications of the
coalescent model with recombination (including association mapping) focus on local quantities.

KINGMAN (1982) models the ancestry of a sample of sequences with a continuous-time Markov process referred to as the Kingman coalescent. Lineages collide or coalesce after random exponential waiting [...]

[...] ulation size, the probability of such events occurring becomes nonnegligible and consequently in these instances the rate of coalescence is underestimated by Hudson’s continuous-time model. Hudson’s model [...]

A Reasonable Local Model
• The “back in time” approach (in general) means we ignore selection
• Implicit assumption that the disease is selectively neutral
• Which may or may not be reasonable...
• Might be okay for late onset diseases...

The ARG as a
Statistical Model

lhd(θ) = P(D | θ) = ∫ P(D | G, θ) P(G | θ) dG

where D is the data, G the ARG and θ the model parameters.

Integration by “magic”: statistical sampling
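The statistical-sampling step can be written out as a Monte Carlo estimate. This is a sketch in notation chosen here rather than taken from the slides: D is the data, G an ARG, θ the model parameters, and Q a hypothetical proposal distribution over ARGs. The first line samples ARGs from the model itself. The second samples them from Q and reweights, which is the importance-sampling form.

```latex
\begin{align}
\mathrm{lhd}(\theta) &= \int P(D \mid G, \theta)\, P(G \mid \theta)\, dG
  \approx \frac{1}{N} \sum_{i=1}^{N} P(D \mid G_i, \theta),
  & G_i &\sim P(G \mid \theta), \\
\mathrm{lhd}(\theta) &\approx \frac{1}{N} \sum_{i=1}^{N}
  \frac{P(D \mid G_i, \theta)\, P(G_i \mid \theta)}{Q(G_i)},
  & G_i &\sim Q(G).
\end{align}
```

The first estimator is unbiased but wasteful when almost every sampled ARG gives P(D | G, θ) close to zero. A proposal Q that concentrates on ARGs compatible with the data lowers the variance.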

ARG Methods

• Sampling ARGs from the coalescence process
  • This is a no go -- you would never sample an ARG that can explain the data
• Sampling ARGs conditional on the data (importance sampling)
  • Larribe, Lessard and Schork 2002 -- scales to tens of individuals and tens of markers
• Sampling parsimonious ARGs conditional on the data
  • Lyngsø, Song & Hein 2005 (calculates parsimonious ARGs -- a 2008 paper in press for sampling)
  • Minichiello & Durbin 2006 (samples parsimonious ARGs and scores local genealogies)
  • Both preferentially select mutations and coalescence events over recombinations
  • Scales to thousands of individuals and hundreds of markers

Local Phylogenies
For each “point” on the chromosome, the ARG
determines a (local) tree:

Changing Phylogenies
Type 1: No change

Type 2: Change in branch lengths

Type 3: Change in topology

From Hein et al. 2005

Trees and LD

[Figure: two panels plotted against recombination rate, one showing tree similarity and one showing LD (r²).]

Clustering on a Tree
Case/control clustering is not random on the tree...

[Figure: a tree with a disease-affecting mutation, shown under complete penetrance, incomplete penetrance and spurious disease; figure labels: 25% / 75% and 40% / 60%.]

Sampling Trees
(with recombination)
We only sample the process on the left -- far fewer events

Zöllner & Pritchard 2005

Using “Perfect Phylogenies”
Use the four-gamete test to find regions that can be explained by a tree with no recurrent mutations

Build trees for each such region

Each marker splits a sub-tree in two

Much faster (and much cruder)

Catches the essential tree structure

Mailund, Besenbacher & Schierup 2006
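The four-gamete test itself is short enough to sketch. Two biallelic markers are compatible with a single tree without recurrent mutation if at most three of the four possible allele combinations are observed. The greedy left-to-right region split below is an assumption made for illustration and is not claimed to be how the cited method chooses its regions.

```python
def compatible(haplotypes, i, j):
    """Four-gamete test: True if markers i and j show fewer than four gametes."""
    gametes = {(h[i], h[j]) for h in haplotypes}
    return len(gametes) < 4

def compatible_regions(haplotypes):
    """Greedily split markers into runs that are pairwise compatible.

    Returns (start, end) index pairs with `end` exclusive.
    """
    n_markers = len(haplotypes[0])
    regions, start = [], 0
    for m in range(1, n_markers):
        # Close the current region as soon as marker m conflicts with it.
        if not all(compatible(haplotypes, k, m) for k in range(start, m)):
            regions.append((start, m))
            start = m
    regions.append((start, n_markers))
    return regions

# Markers 0-2 fit one tree; marker 3 shows all four gametes with marker 0.
haps = ["0000", "0101", "1100", "1101", "0001"]
print(compatible_regions(haps))
```

Each region returned this way can be explained by one tree, so a tree can be built per region and scored separately.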


Scoring the Clustering

Red=cases
Green=controls

Are the case chromosomes significantly
over-represented in some clusters?

Wild-types

Mutation

Mutants

We can place “mutations” on the tree edges
and partition chromosomes into “mutants”
and “wild-types” and test for different
distributions of cases and controls

Use average or maximum to score the tree

Average is kosher Bayesian stats; maximum
needs to be corrected for over-fitting.
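The scoring idea can be sketched as follows: place a hypothetical mutation on each edge in turn, call the chromosomes below it “mutants” and the rest “wild-types”, and score the resulting 2x2 table. The nested-tuple tree encoding and the chi-square statistic are stand-ins chosen for this example. The averaging the slide calls Bayesian would average a likelihood-based score, not this statistic.

```python
def leaf_counts(tree):
    """Return (cases, controls) below a node; leaves are "case" or "control"."""
    if isinstance(tree, str):
        return (1, 0) if tree == "case" else (0, 1)
    counts = [leaf_counts(child) for child in tree]
    return sum(c[0] for c in counts), sum(c[1] for c in counts)

def chi_square(a, b, c, d):
    """Pearson chi-square for the table [[a, b], [c, d]]; 0.0 if degenerate."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0

def edge_scores(tree):
    """Score the mutant/wild-type split induced by every edge of the tree."""
    total_cases, total_controls = leaf_counts(tree)
    scores = []

    def visit(node, is_root):
        if not is_root:  # every non-root node hangs below exactly one edge
            cases, controls = leaf_counts(node)
            scores.append(chi_square(cases, controls,
                                     total_cases - cases,
                                     total_controls - controls))
        if not isinstance(node, str):
            for child in node:
                visit(child, False)

    visit(tree, True)
    return scores

def score_tree(tree, how="max"):
    """Summarise a tree with at least one edge by its best or mean edge score."""
    scores = edge_scores(tree)
    return max(scores) if how == "max" else sum(scores) / len(scores)

tree = (("case", "case"), ("control", "control"))
print(score_tree(tree, "max"), score_tree(tree, "mean"))
```

The maximum picks the best of many candidate splits, so its null distribution has to account for that selection, for example by permuting case/control labels.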

Blossoc
(BLOck aSSOCiation)
Homepage: www.birc.au.dk/~mailund/Blossoc
Command line and
graphical user interface
(with limited functionality)

Fast enough to analyse
tens of thousands of
individuals in hundreds of
thousands of markers in a
day or two on a desktop
computer...

Localisation Accuracy
A single causal mutation
Max BF / min p-value used as point estimate

Localisation Accuracy
Two causal mutations
Max BF / min p-value used as point estimate

Thank you!

More information at
http://www.birc.au.dk/~mailund/association-mapping/
