Population neutral theory
3. The neutral theory
• The neutral theory and its predictions for levels of polymorphism and rates of divergence.
• The nearly neutral theory
4. There is much genetic variation within almost all species. The amount of genetic variation is too much to be maintained by selection.
5. Natural Selection view of Evolution
► Mutations arise by chance (meaning the mutations are not directed to match environmental needs).
► Favorable (= higher fitness) mutations increase in frequency via selection (changed fitness is often associated with a changed environment).
► Deleterious (= lower fitness) mutations are reduced in frequency (many resistance mutations are deleterious unless the toxic agent is present).
6. New Idea
► The neutral theory of evolution was developed by Motoo Kimura.
► The neutral theory departed from all existing models by using N, the population size, as the most important population parameter.
► What is the neutral theory of evolution?
7. The Neutral Theory
► 1) There are no fitness differences among almost all of the molecular variants that are detected in populations.
► “Neutral” is the word chosen to describe this lack of fitness differences (functionally equivalent alleles).
► 2) The amount of genetic variation in a population is determined by a balance between an increase due to mutation, at rate μ, and a decrease due to finite population size (genetic drift).
8. The neutral theory now forms the basis of the most widely employed null model in molecular evolution.
The neutral theory adopts the perspective that most mutations have little or no fitness advantage or disadvantage and are therefore selectively neutral.
Genetic drift is therefore the primary evolutionary process that dictates the fate (fixation or loss) of newly occurring mutations.
9. In the 1950s and 1960s, it was widely thought that most mutations would have substantial fitness differences and therefore the fate of most mutations was dictated by natural selection.
Motoo Kimura argued instead that the interplay of mutation and genetic drift could explain many of the patterns of genetic variation and the evolution of protein and DNA sequences seen in biological populations.
10. The neutral theory null model makes two major predictions under the assumption that genetic drift alone determines the fate of new mutations.
One prediction concerns the amount of polymorphism for sequences sampled within a population of one species.
The other prediction concerns the degree and rate of divergence among sequences sampled from separate species.
11. Divergence: Fixed genetic differences that accumulate between two completely isolated lineages that were originally identical when they diverged from a common ancestor.
Polymorphism: The existence in a population of two or more alleles at one locus. Populations with genetic polymorphisms have heterozygosity, gene diversity, or nucleotide diversity measures that are greater than zero.
12. The balance of genetic drift and mutation determines polymorphism in the neutral theory.
More alleles segregating in the population indicate more polymorphism. Segregating alleles, and therefore polymorphism, result from the random walk in frequency that each mutation takes under genetic drift.
13. The neutral theory then predicts that the rate of fixation is μ and therefore the expected time between fixations is 1/μ generations. For the subset of mutations that eventually fix, the expected time from introduction to fixation is 4Ne generations.
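14. For a single new neutral mutation in a population of N diploid individuals (2N gene copies):
the chance of eventual fixation is $\frac{1}{2N}$
the chance of eventual loss is $1 - \frac{1}{2N}$
the average time to fixation of a new mutation approaches $4N_e$ generations
the average time to loss approaches just $2\frac{N_e}{N}\ln(2N)$ generations
The combined processes of mutation and genetic drift produce equilibrium heterozygosity:
$H = \frac{4N_e\mu}{4N_e\mu + 1}$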
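The slide-14 quantities are simple functions of N, Ne and μ, so they can be tabulated directly. The sketch below assumes a diploid population of N individuals (2N gene copies). The helper name `neutral_expectations` is hypothetical and does not come from the source.

```python
import math

def neutral_expectations(N, Ne, mu):
    """Expectations for one new neutral mutation (diploid, 2N gene copies)."""
    p0 = 1.0 / (2 * N)  # initial frequency of a new mutation
    return {
        "p_fix": p0,                                 # chance of eventual fixation
        "p_loss": 1.0 - p0,                          # chance of eventual loss
        "t_fix": 4.0 * Ne,                           # mean generations to fixation
        "t_loss": 2.0 * (Ne / N) * math.log(2 * N),  # mean generations to loss
        "H_eq": 4 * Ne * mu / (4 * Ne * mu + 1),     # drift-mutation equilibrium heterozygosity
    }

for key, value in neutral_expectations(N=1000, Ne=1000, mu=1e-5).items():
    print(f"{key:7s} {value:.6g}")
```

Take N = Ne = 1000 as an example. A new mutation that is lost disappears in roughly 15 generations on average. One that fixes takes about 4000 generations.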
16. With neutral mutations, most new mutations are lost fairly rapidly and a few eventually go to fixation.
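A small Wright–Fisher simulation can illustrate slide 16. It assumes discrete generations, Ne = N, and binomial sampling of the 2N gene copies each generation. It also assumes no further mutation once the new allele has appeared. `simulate_new_mutations` is a hypothetical helper. With 2N = 20 gene copies, roughly 1 in 20 new mutations should fix. Lost mutations should disappear within a few generations, while fixed ones take tens of generations.

```python
import random

def _mean(values):
    return sum(values) / len(values) if values else float("nan")

def simulate_new_mutations(N, replicates, seed=1):
    """Follow single new neutral mutations under Wright-Fisher drift until loss or fixation.

    Assumes N diploid individuals (2N gene copies), Ne = N, and no further mutation.
    Returns (fraction fixed, mean generations to loss, mean generations to fixation).
    """
    rng = random.Random(seed)
    two_n = 2 * N
    fixed = 0
    loss_times, fix_times = [], []
    for _ in range(replicates):
        count, generation = 1, 0  # one copy of the new allele
        while 0 < count < two_n:
            p = count / two_n
            # binomial sampling: each of the 2N copies in the next generation
            # is drawn independently from the current generation
            count = sum(rng.random() < p for _ in range(two_n))
            generation += 1
        if count == two_n:
            fixed += 1
            fix_times.append(generation)
        else:
            loss_times.append(generation)
    return fixed / replicates, _mean(loss_times), _mean(fix_times)

frac_fixed, t_loss, t_fix = simulate_new_mutations(N=10, replicates=20_000)
print(f"fraction fixed {frac_fixed:.4f} (1/2N = {1 / 20:.4f})")
print(f"mean generations to loss {t_loss:.1f}, to fixation {t_fix:.1f} (4N = 40)")
```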
17. Balancing selection greatly increases the segregation time of alleles and increases polymorphism compared to neutrality.
18. The neutral theory also predicts the rate of divergence between sequences. Genetic divergence occurs by substitutions that accumulate in two DNA sequences over time.
19. As substitutions accumulate, the two sequences diverge from the ancestral sequence as well as from each other. In this example, the two sequences are eventually divergent at five of 16 nucleotide sites due to substitutions.
20. Substitution: The complete replacement of one allele previously most frequent in the population with another allele that originally arose by mutation.
The neutral theory predicts the rate at which allelic substitutions occur and thereby the rate at which divergence occurs. Predicting the substitution rate for neutral alleles requires knowing the probability that an allele becomes fixed in a population and the number of new mutations that occur each generation.
21. The initial frequency of a new mutation is $\frac{1}{2N}$.
Under genetic drift, the chance of fixation of any neutral allele is simply its initial frequency.
The chance that an allele copy mutates is μ, so the expected number of new mutations in a population each generation is 2Nμ.
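22. The rate at which alleles that originally entered the population as mutations go to fixation per generation is
$k = 2N\mu \times \frac{1}{2N}$
Notice that this equation simplifies to k = μ: the neutral substitution rate equals the mutation rate and does not depend on population size.
Since the rate of neutral substitution is μ, the expected time between neutral substitutions is 1/μ generations.
Using a clock that chimes on the hour as an example, the rate of chiming is 24 per day (or 24/day), so the expected time between chimes is 1/24 of a day, or one hour.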
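The cancellation of N in k = 2Nμ × 1/(2N) can be checked numerically. The sketch below uses a hypothetical helper `neutral_substitution_rate` and an illustrative μ of 10⁻⁸. It multiplies the expected number of new mutations by the fixation chance of each one, for population sizes spanning five orders of magnitude.

```python
def neutral_substitution_rate(N, mu):
    """k = (new mutations per generation) x (fixation probability of each)."""
    new_mutations = 2 * N * mu  # expected new mutations per generation
    p_fix = 1 / (2 * N)         # fixation chance = initial frequency
    return new_mutations * p_fix

mu = 1e-8
for N in (10, 1_000, 1_000_000):
    k = neutral_substitution_rate(N, mu)
    print(f"N = {N:>9,}: k = {k:.3g} per generation, "
          f"mean time between substitutions = {1 / k:.3g} generations")
```

Larger populations produce more mutations, but each has a proportionally smaller chance of fixing, so k stays at μ.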
23. NEARLY NEUTRAL THEORY
The nearly neutral theory considers the fate of new mutations if some portion of new mutations are acted on by natural selection of different strengths.
The nearly neutral theory recognizes three categories of new mutations: neutral mutations; mutations acted on strongly by either positive or negative natural selection; and mutations acted on weakly by natural selection relative to the strength of genetic drift. This last category contains mutations that are nearly neutral, since neither natural selection nor genetic drift will determine their fate exclusively.
24. For a new mutation in a finite population that experiences natural selection, the forces of directional selection and genetic drift oppose each other.
Genetic drift causes heterozygosity to decrease at a rate of $\frac{1}{2N_e}$ per generation.
The selection coefficient (s) on a genotype describes the “push” on alleles toward fixation or loss due to natural selection.
For a new advantageous mutation, the chance of fixation is approximately 2s.
25. Setting these forces equal to each other,
$\frac{1}{2N_e} = 2s$
gives 4Nes = 1 as the condition where the processes of genetic drift and natural selection are equal. When 4Nes is much greater than one, natural selection is the stronger process, whereas when 4Nes is much less than one, genetic drift is the stronger process.
Using more sophisticated mathematical techniques, the probability of fixation for a new mutation in a finite population is
$u = \frac{1 - e^{-4N_e s p}}{1 - e^{-4N_e s}}, \quad p = \frac{1}{2N}$
which becomes $\frac{1 - e^{-2s}}{1 - e^{-4Ns}}$ when $N_e = N$.
26. Under the nearly neutral theory the probability of fixation depends on the balance between natural selection and genetic drift, expressed in the product of the effective population size and the selection coefficient (Nes).
27. Measures of divergence and polymorphism
• Measuring divergence of DNA sequences.
• Nucleotide substitution models correct divergence estimates for saturation.
• DNA polymorphism measured by number of segregating sites and nucleotide diversity.
28. The smallest possible unit of the genome is a homologous nucleotide site: a single base-pair position in the exact same genome location that could be compared among individuals.
Genetic variation at such nucleotide sites is characterized by the existence of DNA sequences that have different nucleotides and is called nucleotide polymorphism.
p distance: The number of nucleotide sites that differ between two DNA sequences divided by the total number of nucleotide sites; a shorthand for proportion-distance. Sometimes symbolized as d for distance.
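The p distance definition translates directly into code. The sketch assumes two sequences that are already aligned, without gaps, to the same length. The function name and the example sequences are hypothetical.

```python
def p_distance(seq1, seq2):
    """Proportion of aligned nucleotide sites that differ between two sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(1 for a, b in zip(seq1, seq2) if a != b)
    return differences / len(seq1)

print(p_distance("ACGTACGTAC", "ACGTTCGTAC"))  # 1 difference in 10 sites -> 0.1
```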
29. SATURATION
Saturation is the phenomenon where DNA sequence divergence appears to slow and eventually reaches a plateau even as time since divergence continues to increase.
Saturation in nucleotide changes over time is caused by substitution occurring multiple times at the same nucleotide site, a phenomenon called multiple hit substitution.
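Saturation is easy to see by simulation. The sketch below applies a fixed number of substitution events to a random ancestral sequence and compares the observed p distance to the true number of substitutions per site. Each event picks a random site and one of the three other bases, as in the Jukes–Cantor model. The helper name `apply_substitutions`, the sequence length, and the seed are hypothetical choices. Because some events hit sites that have already changed, the observed p distance falls further behind as substitutions accumulate and levels off near 3/4.

```python
import random

BASES = "ACGT"

def apply_substitutions(seq, n_events, rng):
    """Apply n_events substitutions at random sites; each picks one of the other 3 bases."""
    seq = list(seq)
    for _ in range(n_events):
        i = rng.randrange(len(seq))
        seq[i] = rng.choice([b for b in BASES if b != seq[i]])
    return "".join(seq)

rng = random.Random(42)
L = 5000
ancestor = "".join(rng.choice(BASES) for _ in range(L))
results = []
for subs_per_site in (0.05, 0.25, 0.5, 1.0, 2.0, 4.0):
    descendant = apply_substitutions(ancestor, int(subs_per_site * L), rng)
    observed = sum(a != b for a, b in zip(ancestor, descendant)) / L
    results.append((subs_per_site, observed))
    print(f"actual substitutions/site {subs_per_site:4.2f}  observed p distance {observed:.3f}")
```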
31. There is a wide variety of methods to correct the perceived divergence between two DNA sequences to obtain a better estimate of the true divergence after accounting for multiple hits. These correction methods are called nucleotide substitution models and use parameters for DNA base frequencies and substitution rates to obtain a modified estimate of the divergence between two DNA sequences. The simplest of these is the Jukes and Cantor (1969) nucleotide-substitution model, named for its authors.
32. The three types of event that a single nucleotide site may experience over two generations.
33. The probability of a nucleotide substitution (to one specific other base in one generation) is customarily represented by α.
The probability of any substitution is 3α.
The probability that a nucleotide stays the same (e.g., G) one generation later is 1 − 3α.
The probability of no substitutions over two generations is $(1 - 3\alpha)^2$.
34. The probability that a nucleotide site retains its original base pair under the Jukes–Cantor model of nucleotide substitution.
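35. The probability of a multiple hit nucleotide substitution that restores the initial nucleotide (a change to one of the three other bases, then a change back) is $3\alpha \times \alpha = 3\alpha^2$.
The probability that a nucleotide site has the same base pair after two generations is therefore
$P_G(2) = (1 - 3\alpha)^2 + 3\alpha^2$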
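The two-generation result can be checked by squaring the one-generation Jukes–Cantor transition matrix. In that matrix each base stays put with probability 1 − 3α and changes to each specific other base with probability α. The diagonal entry of the product should equal "no change twice" plus "change and revert". The sketch uses plain Python lists. The helper names and α = 0.01 are hypothetical illustration choices.

```python
def jc_matrix(alpha):
    """One-generation Jukes-Cantor transition matrix over A, C, G, T."""
    return [[1 - 3 * alpha if i == j else alpha for j in range(4)] for i in range(4)]

def matmul(a, b):
    """Product of two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]

alpha = 0.01
two_gen = matmul(jc_matrix(alpha), jc_matrix(alpha))
G = 2  # index of G in A, C, G, T
from_matrix = two_gen[G][G]
from_formula = (1 - 3 * alpha) ** 2 + 3 * alpha ** 2  # no change + change-and-revert
print(from_matrix, from_formula)
```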
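36. The change in the probability that a given nucleotide (here G) is found at a site over one generation is then
$\Delta P_G = P_G(t+1) - P_G(t) = (1 - 3\alpha)P_G(t) + \alpha\left[1 - P_G(t)\right] - P_G(t)$
which then simplifies to
$\Delta P_G = \alpha - 4\alpha P_G(t)$
If we consider the rate of change at any time t, treating time as continuous,
$\frac{dP_G(t)}{dt} = \alpha - 4\alpha P_G(t)$, which has the solution $P_G(t) = \frac{1}{4} + \left(P_G(0) - \frac{1}{4}\right)e^{-4\alpha t}$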
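37. As t grows large, the exponential term $e^{-4\alpha t}$ approaches zero, so that $P_G(t)$ approaches ¼.
If the nucleotide at a site is initially a G, then $P_G(0) = 1$ and the probability the site remains a G over time is
$P_G(t) = \frac{1}{4} + \frac{3}{4}e^{-4\alpha t}$
If the site is not initially a G, then $P_G(0) = 0$ and the probability the site becomes a G over time is
$P_G(t) = \frac{1}{4} - \frac{1}{4}e^{-4\alpha t}$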
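Under Jukes–Cantor, the one-generation change is ΔP_G = α − 4αP_G(t). Iterating P_G(t+1) = P_G(t) + α − 4αP_G(t) generation by generation should therefore track the continuous-time curve 1/4 + (P_G(0) − 1/4)e^(−4αt) closely when α is small. Both starting cases (site initially G, site initially not G) should converge to 1/4. The helper names and α = 0.001 below are hypothetical.

```python
import math

def iterate_pg(p0, alpha, generations):
    """Iterate P_G(t+1) = P_G(t) + alpha - 4*alpha*P_G(t) one generation at a time."""
    p = p0
    for _ in range(generations):
        p = p + alpha - 4 * alpha * p
    return p

def pg_continuous(p0, alpha, t):
    """Continuous-time solution P_G(t) = 1/4 + (P_G(0) - 1/4) * exp(-4 * alpha * t)."""
    return 0.25 + (p0 - 0.25) * math.exp(-4 * alpha * t)

alpha = 1e-3
for t in (0, 100, 500, 2000):
    print(t,
          round(iterate_pg(1.0, alpha, t), 4), round(pg_continuous(1.0, alpha, t), 4),
          round(iterate_pg(0.0, alpha, t), 4), round(pg_continuous(0.0, alpha, t), 4))
```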
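38. For two DNA sequences originally identical by descent at every nucleotide site at time 0, at some later time t the probability that any site will possess the same nucleotide in both sequences is
$P_{\text{same}}(t) = \frac{1}{4} + \frac{3}{4}e^{-8\alpha t}$
The exponential term is now $e^{-8\alpha t}$ because there are two DNA sequences, each accumulating substitutions independently over time t.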
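39. The probability that a site is different, or divergent, between the two sequences over time (call it d) is one minus the probability that the site is identical:
$d = 1 - \left(\frac{1}{4} + \frac{3}{4}e^{-8\alpha t}\right) = \frac{3}{4}\left(1 - e^{-8\alpha t}\right)$
Rearranging and taking the natural logarithm of both sides gives
$8\alpha t = -\ln\left(1 - \frac{4}{3}d\right)$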
40. For two DNA sequences that were originally identical by descent, we expect that each site has a 3αt chance of substitution in each sequence. Since there are two sequences, there is a 6αt chance of a site being divergent between the two sequences.
If we set expected divergence K = 6αt, then we notice K is close to the 8αt above. In fact, K is 3/4 of the expression for 8αt:
$K = -\frac{3}{4}\ln\left(1 - \frac{4}{3}d\right)$
41. Imagine two DNA sequences that differ at 1 site in 10, so the p distance is 10% or d = 0.10. This level of observed divergence is an underestimate because it does not account for multiple hits. To adjust for multiple hits we compute corrected divergence as
$K = -\frac{3}{4}\ln\left(1 - \frac{4}{3}(0.10)\right) \approx 0.107$
which shows that even at the low apparent divergence of 10%, about 0.7% of sites are expected to have experienced multiple hits.
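The Jukes–Cantor correction K = −(3/4) ln(1 − (4/3)d) can be applied to a range of observed p distances. This shows how the gap K − d (the estimated multiple hits) grows as d approaches the 0.75 plateau, where the correction is undefined. The function name `jukes_cantor` is a hypothetical helper.

```python
import math

def jukes_cantor(d):
    """Jukes-Cantor (1969) corrected divergence K from an observed p distance d."""
    if not 0 <= d < 0.75:
        raise ValueError("correction is undefined for d >= 0.75 (saturation)")
    return -0.75 * math.log(1 - (4 / 3) * d)

for d in (0.01, 0.10, 0.30, 0.50, 0.70):
    K = jukes_cantor(d)
    print(f"d = {d:.2f}  K = {K:.4f}  multiple-hit excess K - d = {K - d:.4f}")
```

For d = 0.10 this reproduces K of about 0.1073, matching the roughly 0.7% excess in slide 41.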
43. Variable DNA sequences at one locus within a species represent different alleles that are present in the population.
To compare them, construct a multiple sequence alignment so that the homologous nucleotide sites for each sequence are all lined up in the same columns.
One measure of DNA polymorphism is the number of segregating sites, S. A segregating site is any of the L nucleotide sites that maintains two or more nucleotides within the population.
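Counting segregating sites amounts to scanning the alignment column by column and flagging columns with more than one nucleotide. The sketch assumes a gap-free alignment of equal-length strings. The four sequences are hypothetical. The proportion S/L is printed alongside S.

```python
def segregating_sites(alignment):
    """Return (S, L): columns carrying two or more nucleotides, and total columns."""
    lengths = {len(seq) for seq in alignment}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    L = lengths.pop()
    S = sum(1 for column in zip(*alignment) if len(set(column)) > 1)
    return S, L

alignment = [  # hypothetical aligned sequences from one locus
    "ACGTACGTAC",
    "ACGTTCGTAC",
    "ACGTACGAAC",
    "ACCTACGTAC",
]
S, L = segregating_sites(alignment)
print(S, L, S / L)  # → 3 10 0.3
```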
44. The proportion of segregating sites is obtained by dividing the number of segregating sites by the total number of sites:
$p_S = \frac{S}{L}$
The number of segregating sites (S) under neutrality is a function of the scaled mutation rate θ = 4Neμ. Watterson (1975) first developed a way to estimate θ from the number of segregating sites observed in a sample of DNA sequences. The expected number of segregating sites at drift–mutation equilibrium can more easily be determined using the logic of the coalescent model.
46. Under the infinite sites model of mutation, each mutation that occurs increases the number of segregating sites by one. The expected number of segregating sites is therefore just the expected number of mutations for a given genealogy.
When there are k lineages, the expected number of mutations in one generation is kμ. If the expected time to coalescence for k lineages is $T_k$, then $k\mu T_k$ mutations are expected for each value of k.
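47. The expected number of mutations is obtained by summing over all k between the present (k = n) and the most recent common ancestor (MRCA, reached when the last k = 2 lineages coalesce):
$E(S) = \sum_{k=2}^{n} k\mu T_k$
The probability of k lineages coalescing in one generation is
$\frac{k(k-1)}{4N_e}$
and the expected time to coalescence is the inverse:
$T_k = \frac{4N_e}{k(k-1)}$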
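48. The expected number of segregating sites in a sample of n DNA sequences is then
$E(S) = \sum_{k=2}^{n} k\mu\frac{4N_e}{k(k-1)} = 4N_e\mu\sum_{i=1}^{n-1}\frac{1}{i}$
Notice that θ = 4Neμ can be substituted in the equation to give
$E(S) = \theta\sum_{i=1}^{n-1}\frac{1}{i}$
and then rearranging,
$\theta = \frac{E(S)}{\sum_{i=1}^{n-1} 1/i}$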
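The coalescent argument can be checked by Monte Carlo. The sketch draws each waiting time T_k as an exponential with mean 4Ne / (k(k−1)) generations, which is the continuous-time approximation to the per-generation coalescence probability. It adds up kμT_k over k = n, …, 2 and averages over many genealogies. The result should agree with E(S) = θ Σ 1/i. The helper names and the parameter values (n = 10, θ = 1) are hypothetical. Mutations themselves are not drawn; only their expected number per genealogy is summed.

```python
import random

def expected_S(n, theta):
    """E(S) = theta * sum_{i=1}^{n-1} 1/i under the neutral coalescent."""
    return theta * sum(1 / i for i in range(1, n))

def simulated_mean_S(n, Ne, mu, replicates, seed=7):
    """Average of sum_k k*mu*T_k over random coalescent genealogies.

    T_k is exponential with mean 4Ne / (k(k-1)) generations (continuous-time
    approximation of the per-generation coalescence probability).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(replicates):
        for k in range(n, 1, -1):
            rate = k * (k - 1) / (4 * Ne)  # coalescence probability per generation
            T_k = rng.expovariate(rate)    # waiting time while k lineages remain
            total += k * mu * T_k          # expected mutations on k branches
    return total / replicates

n, Ne, mu = 10, 10_000, 2.5e-5  # theta = 4 * Ne * mu = 1.0
print(expected_S(n, 4 * Ne * mu), simulated_mean_S(n, Ne, mu, replicates=20_000))
```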
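49. An estimate of the scaled mutation rate determined from the number of segregating sites in a sample of DNA sequences is symbolized as $\hat{\theta}_W$ (W for Watterson) or $\hat{\theta}_S$ (S for segregating sites). If we define a new variable,
$a_n = \sum_{i=1}^{n-1}\frac{1}{i}$
then, using the absolute number of segregating sites,
$\hat{\theta}_W = \frac{S}{a_n}$
Dividing by the number of sites L gives the estimate per nucleotide site.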
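Watterson's estimator needs only S, the sample size n, and optionally L for a per-site value. The sketch computes a_n as the harmonic sum up to n − 1. The function name and the example numbers (12 segregating sites among 10 sequences of 500 bp) are hypothetical.

```python
def watterson_theta(S, n, L=None):
    """Watterson's estimator theta_W = S / a_n, with a_n = sum_{i=1}^{n-1} 1/i.

    If L (number of aligned sites) is given, the estimate is returned per site.
    """
    a_n = sum(1 / i for i in range(1, n))
    theta = S / a_n
    return theta / L if L is not None else theta

# hypothetical sample: 12 segregating sites among n = 10 sequences of 500 bp
print(watterson_theta(12, 10), watterson_theta(12, 10, L=500))
```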
50. A second measure of DNA polymorphism is the nucleotide diversity in a sample of DNA sequences, symbolized by π (pronounced “pie”) and also known as the average number of pairwise differences in a sample of DNA sequences.
The nucleotide diversity is the sum of the number of nucleotide differences seen for each pair of DNA sequences, divided by the number of pairs compared:
$\pi = \frac{\sum_{i<j} d_{ij}}{n(n-1)/2}$
51. where i and j are indices that refer to individual DNA sequences, $d_{ij}$ is the number of nucleotide sites that differ between sequences i and j, and n is the total number of DNA sequences in the sample.
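The pairwise definition of π can be computed by comparing every pair of sequences once, summing the d_ij, and dividing by n(n − 1)/2. The sketch assumes a gap-free alignment. The function name and sequences are hypothetical. The per-site option divides by the alignment length.

```python
from itertools import combinations

def nucleotide_diversity(seqs, per_site=False):
    """Average pairwise differences: sum of d_ij over all pairs / (n(n-1)/2)."""
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    total = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in combinations(seqs, 2))
    pi = total / (n * (n - 1) / 2)
    return pi / len(seqs[0]) if per_site else pi

alignment = [  # hypothetical aligned sequences: 4 sequences, 10 sites
    "ACGTACGTAC",
    "ACGTTCGTAC",
    "ACGTACGAAC",
    "ACCTACGTAC",
]
print(nucleotide_diversity(alignment), nucleotide_diversity(alignment, per_site=True))  # → 1.5 0.15
```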
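52. In larger samples that may include multiple identical DNA sequences, the nucleotide diversity can be estimated by
$\pi = \sum_{i=1}^{k}\sum_{j=1}^{k} p_i p_j d_{ij}$
where $p_i$ and $p_j$ are the frequencies of alleles i and j, respectively, in a sample of k different sequences that each represent one allele. Multiplying by n/(n − 1), where n is the total number of sequences, makes this equal to the average of pairwise differences.
Estimates of nucleotide diversity are useful because π is a measure of heterozygosity for DNA sequences.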
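When a sample contains repeated identical sequences, π can be computed from allele frequencies instead of from every pair of individuals. The sketch groups identical sequences into k alleles and sums p_i p_j d_ij over all ordered allele pairs, where d_ii = 0. It also applies the n/(n − 1) rescaling, which recovers the average pairwise difference. The function name and the 10-sequence sample (3 distinct alleles) are hypothetical.

```python
from collections import Counter

def differences(a, b):
    """d_ij: number of aligned sites that differ between two sequences."""
    return sum(x != y for x, y in zip(a, b))

def pi_from_frequencies(seqs):
    """Return (sum_i sum_j p_i p_j d_ij, the same value times n/(n-1))."""
    n = len(seqs)
    counts = Counter(seqs)                    # distinct alleles and their counts
    alleles = list(counts)
    freqs = [counts[a] / n for a in alleles]  # allele frequencies p_i
    raw = sum(freqs[i] * freqs[j] * differences(alleles[i], alleles[j])
              for i in range(len(alleles)) for j in range(len(alleles)))
    return raw, raw * n / (n - 1)

sample = ["ACGTACGTAC"] * 5 + ["ACGTTCGTAC"] * 3 + ["ACCTACGTAC"] * 2  # n = 10, k = 3
raw, corrected = pi_from_frequencies(sample)
print(raw, corrected)
```

Here the rescaled value equals 37 differences over the 45 pairs in the sample, the same answer a direct pairwise comparison of all 10 sequences would give.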