2. All about my classes
LekkiWood@Gmail.com
• Lectures are stand alone - No preparation needed
except for previous course content.
• Nearly always provide additional resources
• Take-home exercise
• Papers referenced
• Resources such as other lecture slides
5. Try to always orient you to the session
• Go over the theory of linkage disequilibrium and
haplotypes
• Calculate linkage disequilibrium by hand
• Relaxing session: story of HapMap
• Lab: today I will walk you through a guided, hand-holding look
at HapMap.
• Each part takes ~30 minutes, so please spend extra time
familiarizing yourself with HapMap.
6. Try to give you your learning objectives
• Primary objectives
• Describe linkage disequilibrium and a haplotype
• Explain the meaning of r2 = 1.0, r2 = .8 and r2 = .5
• Find a region of interest (ROI) on HapMap
• Locate tagSNPs for an ROI on HapMap.
• Secondary objectives
• Describe how mutations and recombination give rise to linkage
disequilibrium and haplotypes
• Calculate D, D’ and r2 by hand
• List key differences between D, D’ and r2
• Evaluate the contribution of HapMap to public health genetics
8. One source of variation in our DNA occurs
through mutation events….
[Figure: a mutation event in the ancestral population changes A to C; both alleles are then present in the population.]
9. Mutations that proliferate are ‘SNPs’
• Single Nucleotide Polymorphisms
• The most common type of variation in DNA
• Substitution of 1 nucleotide for another
• About 2/3 of SNPs involve a C→T substitution
• Definition is evolving:
• Old definition: a SNP must be seen in at least 1% of the
population
• SNPs occur ~ every 300 bp
• Therefore ~ 10 million SNPs in the human genome
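The 10 million figure follows from simple division. A minimal sketch of the arithmetic, assuming a haploid genome length of roughly 3 billion base pairs (that length is an assumption for illustration; it is not stated on the slide):

```python
# Assumed haploid genome length (~3 billion bp); not given on the slide.
GENOME_LENGTH_BP = 3_000_000_000
# From the slide: roughly one SNP every 300 bp.
BP_PER_SNP = 300

expected_snps = GENOME_LENGTH_BP // BP_PER_SNP
print(f"{expected_snps:,} SNPs")  # 10,000,000 SNPs
```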
10. The number of mutations increases over time
[Figure: chromosomes after a 1st mutation event (A→C) and a 2nd mutation event at a second site (G→C).]
11. Proliferating SNPs give rise to haplotypes
• A haplotype is “A specific set of DNA variants observed
on a single chromosome, or part of a chromosome”
• In practice, the term usually refers to a set of SNPs within a
single gene
13. Resolve the population haplotypes!
Sequence with 2 SNPs (G/C and A/T): C G A C T A G T
Possible haplotypes: GA, CA, GT, CT
Sequence with 3 SNPs (G/C, A/T and G/T): C G A C T A G T A C C A
Possible haplotypes: GAG, CAG, GTG, CTG, GAT, CAT, GTT, CTT
14. How many possible haplotypes?
Sequence with 2 SNPs (G/C and A/T): C G A C T A G T
Possible haplotypes: GA, CA, GT, CT
2² = 4
Sequence with 3 SNPs (G/C, A/T and G/T): C G A C T A G T A C C A
Possible haplotypes: GAG, CAG, GTG, CTG, GAT, CAT, GTT, CTT
2³ = 8
15. How many possible haplotypes?
2 (alleles) to the power of n loci: 2^n
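The 2^n rule can be checked by enumerating every combination of one allele per locus. A minimal sketch, assuming biallelic SNPs and using the alleles from the earlier slides; the function name `possible_haplotypes` is made up for this illustration:

```python
from itertools import product


def possible_haplotypes(loci):
    """Return every combination that takes one allele from each locus."""
    return ["".join(alleles) for alleles in product(*loci)]


# Two SNPs (G/C and A/T), then three SNPs (G/C, A/T and G/T).
two_snps = possible_haplotypes([("G", "C"), ("A", "T")])
three_snps = possible_haplotypes([("G", "C"), ("A", "T"), ("G", "T")])

print(len(two_snps), two_snps)   # 4 haplotypes = 2**2
print(len(three_snps))           # 8 haplotypes = 2**3
```

This counts the haplotypes that are possible; the number actually observed in a population is often much smaller.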
16. How many haplotypes does a person have for a
given chromosomal region?
[Figure: the sequence C G A C T A G T shown three times, with the SNPs G/C and A/T marked.]
17. But what if the person is homozygous at both
loci?
[Figure: the sequence C G A C T A G T with the SNPs G/C and A/T marked.]
Possible haplotypes in the population: GA, CA, GT, CT
A person homozygous at both loci (C/C and T/T): CT, CT, CT, CT
18. Haplotype overview
• Method of characterizing variation at more than one
locus on a chromosome
• Only 1 allele from each locus
• But as many alleles as there are loci on the
chromosome, IF those loci contain variation (SNPs)
• As with SNP genotypes, each person has 2 haplotypes,
which (like the 2 alleles at a SNP) may be the same
• The number of possible haplotypes in the population is
2 to the power of n loci.
19. Variation in our DNA also occurs through
recombination
[Figure: before recombination the haplotypes are A-G, C-G and C-C; after recombination a new A-C haplotype appears alongside them.]
20. The number of recombination events increases
over time
21. Our chromosomes are mosaics….
• The extent and conservation of pieces depends on:
• Recombination rate
• Mutation rate
• Population size
• Natural selection
24. Linkage Disequilibrium (LD)
• The nonrandom association of alleles at different
loci
• Equilibrium: when things are ‘in balance’, or as we
would expect
• Disequilibrium: a particular allele at one locus is found
on the same chromosome as a specific allele at a second
locus more often than expected if the loci were
segregating independently in the population. The loci
are ‘out of balance’, or not what we would expect
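Stated in symbols, as a sketch: here P_AB is the frequency of the haplotype carrying allele A at the first locus and allele B at the second, and P_A and P_B are the frequencies of those two alleles.

```latex
\text{Equilibrium: } P_{AB} = P_A \, P_B
\qquad
\text{Disequilibrium: } P_{AB} \neq P_A \, P_B
```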
28. Summary of part 1
• Mutations give rise to SNPs
• SNPs give rise to haplotypes
• A haplotype is a specific set of DNA variants
• Recombination patterns lead to linkage
disequilibrium
• Linkage disequilibrium is when we see particular haplotypes
more often than expected by chance
Questions before we
proceed to calculating LD?
30. All about Punnett squares….
2 loci; A: A/a, B: B/b

                    Locus B
                    B      b      Totals
Locus A    A        PAB    PAb    PA
           a        PaB    Pab    Pa
Totals              PB     Pb     1.0

What are our haplotypes?
31. All about Punnett squares (in LD calculation)….
• Each cell contains the frequency of a haplotype
• Row and column ends contain the frequency of an
allele
• The row totals should sum to 1.0, and so should the
column totals
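The bookkeeping in the square can be sketched in a few lines. This is an illustration only; it borrows the four haplotype frequencies from the worked example later in the lecture, and the variable names are invented here:

```python
# Haplotype frequencies: the four cells of the square.
hap = {"AB": 0.2, "Ab": 0.5, "aB": 0.3, "ab": 0.0}

# Row totals give the allele frequencies at locus A.
p_A = hap["AB"] + hap["Ab"]
p_a = hap["aB"] + hap["ab"]

# Column totals give the allele frequencies at locus B.
p_B = hap["AB"] + hap["aB"]
p_b = hap["Ab"] + hap["ab"]

# Sanity check: each pair of totals should sum to 1.0.
print(round(p_A + p_a, 2), round(p_B + p_b, 2))  # 1.0 1.0
```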
32. Measures of Linkage Disequilibrium
• (A Little History lesson)
• Three measures of LD:
• D
• D’
• r2
33. Measures of Linkage Disequilibrium - D
• 1960 Lewontin & Kojima
• D – unstandardized measure of how far the
association between two alleles differs from that
expected by chance
37. Linkage Disequilibrium – an example
Given the following haplotype frequencies –
are the alleles in linkage disequilibrium?
PAB = .2
PAb = .5
PaB = .3
Pab = .0
i.e. what is D?
D = PAB - (PAPB)
38. Step 1: Complete the Punnett square
PAB = .2
PAb = .5
PaB = .3
Pab = .0

                    Locus B
                    B      b      Totals
Locus A    A        .2     .5     .7
           a        .3     .0     .3
Totals              .5     .5     1.0

D = PAB - (PAPB)
40. Step 3: Calculate D
PAB = .2
PAb = .5
PaB = .3
Pab = .0
PA = .7
Pa = .3
PB = .5
Pb = .5
D = PAB - (PAPB)
D = .2 - (.7 * .5)
D = -.15
Are the alleles in linkage disequilibrium?
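The same step as a small function, offered as a sketch rather than a reference implementation (the name `ld_d` is made up for this example):

```python
def ld_d(p_AB, p_A, p_B):
    """D = observed AB haplotype frequency minus the frequency expected by chance."""
    return p_AB - p_A * p_B


d = ld_d(p_AB=0.2, p_A=0.7, p_B=0.5)
print(round(d, 2))  # -0.15
```

D would be 0 if the observed haplotype frequency matched the product of the allele frequencies, so a nonzero value means the observed frequency departs from what independence predicts.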
41. Measures of Linkage Disequilibrium - D
Problems:
• Sign is arbitrary
• Range depends on allele frequencies
42. Measures of Linkage Disequilibrium – D’
• 1964 Lewontin
• D’: standardize D to the maximum possible value it
can take
• D’ = D / Dmax (when D is positive) or D / Dmin (when D is negative)
43. Step 4: Calculate Dmax/min
PAB = .2
PAb = .5
PaB = .3
Pab = .0
PA = .7
Pa = .3
PB = .5
Pb = .5
D = -.15
• Where D is positive:
Dmax = the lesser of PAPb or PaPB
• Where D is negative:
Dmin = the larger of -PAPB or -PaPb
What is our Dmax/min?
Dmin = Max{-.7*.5, -.3*.5} = Max{-.35, -.15} = -.15
D’ = D / Dmin = -.15 / -.15 = 1.0
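The Dmax/Dmin rule can be written as a short function. This is a sketch under the lecture's two-allele setup; the function name and the choice to return 0.0 when D is exactly 0 are assumptions made for this illustration:

```python
def d_prime(d, p_A, p_B):
    """Standardize D by the most extreme value it could take given the allele frequencies."""
    p_a, p_b = 1 - p_A, 1 - p_B
    if d == 0:
        return 0.0  # assumed convention: no disequilibrium to standardize
    if d > 0:
        bound = min(p_A * p_b, p_a * p_B)    # Dmax: the lesser of PAPb or PaPB
    else:
        bound = max(-p_A * p_B, -p_a * p_b)  # Dmin: the larger of -PAPB or -PaPb
    return d / bound


print(round(d_prime(-0.15, p_A=0.7, p_B=0.5), 2))  # 1.0
```

Because a negative D is divided by a negative Dmin, the result here comes out as +1.0.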
45. Measures of Linkage Disequilibrium – D’
• D’= +/- 1 = complete LD
• No evidence for recombination
• Ancestral haplotype not disrupted
Problems
• D’ is inflated in small samples (small N)
• D’ is inflated with rare alleles
• No information on allele frequency
46. Measures of Linkage Disequilibrium – r2
• 1968 Hill & Robertson
• r2 = correlation coefficient between 2 alleles
47. Step 5: Calculate r2
PAB = .2
PAb = .5
PaB = .3
Pab = .0
PA = .7
Pa = .3
PB = .5
Pb = .5
D = -.15
Dmin = -.15
r2 = D² / (PA Pa PB Pb)
r2 = (-.15)² / [.7*.3*.5*.5] = .0225 / .0525 = .43
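A sketch of the same calculation in code, assuming two alleles per locus so that Pa = 1 - PA and Pb = 1 - PB (the function name is invented for this example):

```python
def r_squared(d, p_A, p_B):
    """r2 = D squared divided by the product of all four allele frequencies."""
    p_a, p_b = 1 - p_A, 1 - p_B
    return d ** 2 / (p_A * p_a * p_B * p_b)


print(round(r_squared(-0.15, p_A=0.7, p_B=0.5), 2))  # 0.43
```

Squaring D removes its sign, which is why r2 does not suffer from the arbitrary-sign problem noted for D.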
48. Measures of Linkage Disequilibrium – r2
• r2 ranges from 0 to 1
• 1 = two markers give identical information
Problems
49. What can we learn from our 3 measures of LD?
• D = -.15
• D’ = 1.0
• r2 = .43
50. D’ vs r2
• Both are a measure of association with 1 being the
maximum, and indicating most LD
• BUT r2 can only reach 1 when the two loci have equal allele frequencies.
51. Perfect LD
• Equal allele frequency
• Allelic association is as strong
as possible
– 2 haplotypes observed
– No detected recombination
between SNPs
D´ = 1
r2 = 1
52. Complete LD
Unequal allele frequency
– 3 haplotypes observed
– No detected recombination
between SNPs
D´ = 1
r2 < 1
53. Calculate your own Linkage Disequilibrium
measures of D, D’ and r2
PAB = .6
PAb = .1
PaB = .2
Pab = .1
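One way to check a hand calculation is to put all the steps into a single helper. The sketch below is an illustration, not part of the original exercise; the function name, the input check, and the choice to return 0.0 for D’ and r2 when they are undefined are all assumptions. It is shown on the lecture's earlier worked example rather than on the exercise values.

```python
def ld_measures(p_AB, p_Ab, p_aB, p_ab):
    """Return (D, D', r2) from the four haplotype frequencies of two biallelic loci."""
    if abs(p_AB + p_Ab + p_aB + p_ab - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1.0")

    # Allele frequencies are the row and column totals of the Punnett square.
    p_A, p_a = p_AB + p_Ab, p_aB + p_ab
    p_B, p_b = p_AB + p_aB, p_Ab + p_ab

    d = p_AB - p_A * p_B

    # Standardize by Dmax (D positive) or Dmin (D negative).
    if d > 0:
        bound = min(p_A * p_b, p_a * p_B)
    elif d < 0:
        bound = max(-p_A * p_B, -p_a * p_b)
    else:
        bound = 0.0
    d_prime = d / bound if bound else 0.0

    denom = p_A * p_a * p_B * p_b
    r2 = d ** 2 / denom if denom else 0.0
    return d, d_prime, r2


# The worked example from the earlier slides.
d, d_prime, r2 = ld_measures(0.2, 0.5, 0.3, 0.0)
print(round(d, 2), round(d_prime, 2), round(r2, 2))  # -0.15 1.0 0.43
```

Calling the helper with the four frequencies on this slide gives values to compare against your own working.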
54. At the end of the day…..
Linkage disequilibrium is the non-random
association of markers [SNPs] at two or more loci
….. But what does this mean for applying
genetics to public health?
(finally we get there….)
59. Inflammatory bowel disease
• Likely had many causal variants
• Heritable: MZ twin concordance > DZ twin concordance
• 10% of those with IBD had 1 relative with IBD
• Reasonable linkage signal on Chr 5
• What could explain this structure?
63. Haplotype Map
• Adds information on diversity to the Human Genome
Project
• How did HapMap and the Human Genome Project
differ?
• ‘Chunks’ of data
8 SNPs
GGACAACC
AATTCGGG
64. “Short cuts”
[Figure: haplotypes aligned across several SNPs; genotyping a subset of the SNPs is enough to tell the haplotypes apart.]
SNPs 1, 3 and 4 are TagSNPs
65. HapMap
• Launched in 2001
• Open access resource for all researchers
• In real time
• Spin-off from the Human Genome Project (HGP)
• Q: What was the key difference between the HGP
and HapMap?
• Characterizes LD across the genome
• Also developed analytic tools
• Haploview
66. HapMap
“The success of the HapMap will be measured in terms of
the genetic discoveries enabled, and improved
knowledge of disease aetiology.”
67. HapMap
Mark Daly: “The community’s response after a number of years of struggling and to not finding genetic factors for complex disease”.
68. HapMap – Phase 1
• Launched in 2001; Production 2002-3
• Phase I
• Not comprehensive
• 90 Yoruba individuals
• 90 individuals of European descent
• 45 Han Chinese
• 45 Japanese
• 1,000,000 SNPs
70. HapMap – Phase I
• Released in 2005
• 1 million SNPs
• August 2006, “dbSNP included more than ten million SNPs, and
more than 40% of them were known to be polymorphic. By
comparison, at the start of the project, fewer than 3 million
SNPs were identified, and no more than 10% of them were
known to be polymorphic.”
74. Tagger
Table 7. Number of selected tag SNPs to capture all observed
common SNPs in the Phase I HapMap for the three analysis
panels using pairwise tagging at different r2 thresholds

Pairwise threshold    YRI        CEU        CHB+JPT
r2 ≥ 0.5              324,865    178,501    159,029
r2 ≥ 0.8              474,409    293,835    259,779
r2 = 1                604,886    447,579    434,476
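The pattern in the table, where a stricter r2 threshold needs more tag SNPs, can be illustrated with a toy greedy selection. This is a simplified sketch and not the Tagger program's actual implementation; the SNP names and pairwise r2 values are hypothetical, and any pair not listed is assumed to have r2 = 0.

```python
def greedy_tag_snps(snps, r2, threshold):
    """Pick tags until every SNP is a tag or has r2 >= threshold with some tag."""
    def captures(tag, other):
        return tag == other or r2.get(frozenset((tag, other)), 0.0) >= threshold

    untagged = set(snps)
    tags = []
    while untagged:
        # Choose the SNP that captures the most still-untagged SNPs.
        best = max(snps, key=lambda s: sum(captures(s, o) for o in untagged))
        tags.append(best)
        untagged -= {o for o in untagged if captures(best, o)}
    return tags


# Hypothetical SNPs and pairwise r2 values, for illustration only.
snps = ["snp1", "snp2", "snp3", "snp4"]
r2 = {
    frozenset(("snp1", "snp2")): 1.0,
    frozenset(("snp3", "snp4")): 0.85,
    frozenset(("snp2", "snp3")): 0.5,
}

for threshold in (0.5, 0.8, 1.0):
    print(threshold, greedy_tag_snps(snps, r2, threshold))
```

In this toy data set the loosest threshold needs two tags and the strictest needs three, the same direction as the table.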
75. Will tag SNPs picked from HapMap
apply to other population samples?
Population differences add very little inefficiency
(stolen slide from ASHG... I can’t source this)
[Figure: tag SNPs picked from CEU (Utah residents with European ancestry, CEPH) applied to CEU, to Whites from Los Angeles, CA, and to samples from Botnia, Finland.]
76. HapMap – Phases II and III
• Phase II
• >3.1 million genetic variants
• Captured 90 to 96 percent of common genetic
variation
• Phase III
• 1,301 samples from 11 populations
77. HapMap and Public Health
• How has HapMap helped us in the quest to find genes
for disorders?
78. What is next for HapMap?
• 1,000 Genomes Project
80. Goals of this lab
Part 1
1. Find HapMap SNPs near a gene.
2. View patterns of LD amongst the SNPs.
3. Select tag SNPs.
4. Download information on the SNPs for use in
Haploview.
5. Evaluate genotype data in a paper against HapMap
data.
Part 2
6. Make a file from data for use in Haploview
83. Goals of this lab
Part 1
1. Find HapMap SNPs near a gene.
>Navigate to HapMap
>Using release #27 (Phase 3) locate the LRP1 gene (hint:
it is a landmark).
>Answer questions 1-3
100. 8 possible SNP combinations:
Haplotype 1: C T G A C T A A G T A C C G A
Haplotype 2: C T G A C T A A G T A C C T A
Haplotype 3: C T G A C T A G G T A C C G A
Haplotype 4: C T G A C T A G G T A C C T A
Haplotype 5: C C G A C T A A G T A C C G A
Haplotype 6: C C G A C T A A G T A C C T A
Haplotype 7: C C G A C T A G G T A C C G A
Haplotype 8: C C G A C T A G G T A C C T A
102. 8 possible haplotypes:
[Figure: the same 8 haplotypes as on slide 100, annotated with the numbers 1, 7 and 2.]
103. 8 possible haplotypes, but 3 observed haplotypes:
C T G A C T A A G T A C C G A    TAG
C T G A C T A A G T A C C T A    TAT
C T G A C T A G G T A C C G A    TGG
C T G A C T A G G T A C C T A    TGT
C C G A C T A A G T A C C G A    CAG
C C G A C T A A G T A C C T A    CAT
C C G A C T A G G T A C C G A    CGG
C C G A C T A G G T A C C T A    CGT
104. 1. Information about our population
• Factors that influence linkage disequilibrium:
• Genetic drift
• Mutation
• Founder effects
• Selection
• Stratification
• Factors that maintain linkage disequilibrium:
• Selection
• Non-random mating
• Linkage
• Mainstay of ‘population genetics’
105. 2. Interpretation of our findings
• Genetic association is correlational; therefore, we
cannot make causal inferences
• SNP1 -> Trait
• SNP1 and SNP2 are in LD
• We don’t know which is the true causal
variant
107. Linkage Disequilibrium coefficient D
At equilibrium: PAB = PAPB
DAB = PAB - PAPB
So in general: PAB = PAPB + DAB
Problems:
• Sign is arbitrary
• Range depends on allele frequencies
Q: Why are these problems for applied genetics in public
health?
115. S.M. Bray, J.G. Mulle, A.F. Dodd, A.E. Pulver, S. Wooding and S.T.
Warren. Signatures of founder effects, admixture and selection in the
Ashkenazi Jewish population. PNAS Early Edition (2010).
116. 8 possible haplotypes:
Haplotype 1: C T G A C T A A G T A C C G A
Haplotype 2: C T G A C T A A G T A C C T A
Haplotype 3: C T G A C T A G G T A C C G A
Haplotype 4: C T G A C T A G G T A C C T A
Haplotype 5: C C G A C T A A G T A C C G A
Haplotype 6: C C G A C T A A G T A C C T A
Haplotype 7: C C G A C T A G G T A C C G A
Haplotype 8: C C G A C T A G G T A C C T A
119. Then we get recombination
[Figure: before recombination the haplotypes are A-G, C-G and C-C; after recombination a new A-C haplotype appears alongside them.]
127. New Concept – Linkage Disequilibrium
• Linkage Disequilibrium is the tendency for 2
(or more) SNPs to be inherited together
• AATAAGCCTGATC
• ATTAAGCCTGATC
• AATTAGCCTGATC
• ATTAAGGCTGATC
128. Why is this important?
• Allows us to genotype only certain SNPs in the
genome…
• ….. so we can infer more than we type