4. Selection: The core of Plant Breeding
• Improvement of traits is affected by selection
• Phenotypic selection has been less effective for complex traits
• Direct selection for these traits is less effective because they are
controlled by a large number of genes with substantial G×E
interaction
• DNA markers are being increasingly used as surrogates for
selection of genotypes for combination of desirable but
conventionally difficult-to-breed traits
Advantages of indirect selections through markers
• Off season selection
• Early selection
• Cost effective
• High throughput
• Pyramiding and stacking
• For MABC: Transfer of recessive gene (Avoids linkage drag)
• MARS
• Genomic selection
Singh A. K. and Singh B. D., 2015 Irri.org
5. Requirements for indirect selection
Reliable marker trait association
https://www.semanticscholar.org/paper/3-Association-Mapping-in-Plant-Genomes-Soto-Cerda-Cloutier/0ecf0db269a995ebb23e1f2334d7c663bc1376ea
How to go about
GENETIC MAPPING
Family based Mapping
• Select Parents
• Develop mapping population (MP)
• Genotype
• Phenotype
• Computations and QTL identification
Population based mapping
(Association Mapping)
• GWAS
• Candidate Gene based mapping
6. Limitations of Bi Parental mapping approaches
θ Capture only those QTL alleles for which
the parents differ.
θ Require large population size
θ low resolution.
θ Longer research time.
θ Not feasible in perennial crops and
animals.
θ Suitable mostly for coarse mapping-except
BC derived populations
Singh A. K. and Singh B. D., 2015
8. What is Association mapping
A tool to resolve complex trait variation down to the sequence
level by exploiting historical and evolutionary recombination
events at the population level
Association mapping, also known as "linkage disequilibrium
mapping ", is a method of mapping quantitative trait loci (QTLs)
that takes advantage of historic linkage disequilibrium (LD) to
find associations between phenotypes and genotypes
Greater precision in QTL location than family-based linkage
analysis.
Can be applied to a range of experimental and non-experimental
populations.
9. Advantages of association mapping
• Time and cost saved in
developing mapping
populations
• High resolution
• Larger number of alleles
detected
• Detection even of small-effect QTLs
Yu and Buckler, 2006
10. Towards a better resolution..
• GWAS can give up to 1cM of
resolution in comparison to 10cM
in DH and 5cM in RILs
• In rice, 1cM=500kb; genes are
mapped with resolution of 100 kb
(Crowell et al., 2016)
Sujan Mamidi, 2020
11. Concept of Linkage Disequilibrium (LD)
Concept of LD was first described by Jennings (1917).
Term: Lewontin and Kojima (1960).
Measure: D, the coefficient of LD, developed by Lewontin (1964).
LD is the ‘non-random association of alleles at different loci’.
Gametic phase disequilibrium of loci in population
θ Non-Random association of alleles from loci from markers/QTLs or a marker
and QTL
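Haplotype frequencies, observed / expected (expected = product of the allele frequencies):

                  B (f(B) = 0.6)   b (f(b) = 0.4)   Total
A (f(A) = 0.2)    0.1 / 0.12       0.1 / 0.08       0.2
a (f(a) = 0.8)    0.5 / 0.48       0.3 / 0.32       0.8
Total             0.6              0.4              1.0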
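The haplotype table above can be turned into the usual LD statistics. The sketch below is only an illustration: the function name `ld_measures` is hypothetical, and it assumes the standard two-locus definitions D = p(AB) - p(A)p(B), D' = D / Dmax and r² = D² / (p(A)p(a)p(B)p(b)), applied to the observed A-B haplotype frequency of 0.1.

```python
def ld_measures(p_ab_hap, p_a, p_b):
    """Return (D, D', r^2) for two biallelic loci.

    p_ab_hap: observed frequency of the A-B haplotype
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab_hap - p_a * p_b
    # D' rescales D by the largest magnitude it could take given the allele frequencies
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


# Values from the slide: f(AB) observed = 0.1, f(A) = 0.2, f(B) = 0.6
d, d_prime, r2 = ld_measures(0.1, 0.2, 0.6)
print(round(d, 4), round(d_prime, 4), round(r2, 4))
```

D' is kept signed here; many programs report its absolute value instead.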
13. Measures of Linkage Disequilibrium
Measure | Formula | Reference | Remarks
D | D = Pr(A1B1) − Pr(A1) Pr(B1); for three loci, $D_{ABC} = p_{ABC} - p_A D_{BC} - p_B D_{AC} - p_C D_{AB} - p_A p_B p_C$ | Weir, 1996 | General account of LD
D' | | Lewontin, 1964 | Unitless measure
r | | Hill and Robertson, 1968 | Commonly used for bi-allelic markers
δ | | Levin and Bertell, 1978 |
Odds ratio | | Devlin and Risch, 1995 |
15. Illustration
• Functional allele T is in LD with
berry number from simple
association test
• Allele C/T is in LD with functional
allele
Myles et al., 2009
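16. Principle of association mapping

Individual (natural population) | Phenotypic data | M1 | M2 | M3 | M4 | M5
1  | R |  1 |  1 |  0 | -1 | -1
2  | R |  1 |  1 |  0 | -1 | -1
3  | R | -1 |  1 |  0 | -1 | -1
4  | R |  0 |  1 |  1 | -1 |  0
5  | R |  0 |  1 |  1 | -1 | -1
6  | S |  1 | -1 | -1 |  1 |  1
7  | S | -1 | -1 | -1 |  1 |  1
8  | S |  0 | -1 |  0 |  1 |  1
9  | S | -1 | -1 | -1 |  1 |  1
10 | S |  0 | -1 |  1 |  1 |  1

Genotype coding: M1M1 = 1, M1M2 = 0, M2M2 = -1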
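To make the principle concrete, the marker table can be scanned one marker at a time. The following is a minimal sketch, not the method of any particular package: it assumes the two phenotype classes are coded R = 1 and S = 0 and uses a plain Pearson correlation as the association statistic; the names `pearson` and `correlations` are illustrative.

```python
from math import sqrt

# Phenotype classes of the 10 individuals, coded R = 1 and S = 0 (an assumption)
phenotype = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

# Marker scores copied from the table (1, 0, -1 genotype coding)
markers = {
    "M1": [1, 1, -1, 0, 0, 1, -1, 0, -1, 0],
    "M2": [1, 1, 1, 1, 1, -1, -1, -1, -1, -1],
    "M3": [0, 0, 0, 1, 1, -1, -1, 0, -1, 1],
    "M4": [-1, -1, -1, -1, -1, 1, 1, 1, 1, 1],
    "M5": [-1, -1, -1, 0, -1, 1, 1, 1, 1, 1],
}


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


correlations = {name: pearson(scores, phenotype) for name, scores in markers.items()}
for name, r in correlations.items():
    print(name, round(r, 3))
```

In this toy panel M2 and M4 track the phenotype perfectly (in opposite directions), while M1 shows only a weak correlation.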
17. Testing the Statistical Significance of LD
Chi-square test
Fisher’s exact test (Fisher, 1935)
Likelihood ratio test
Multi-factorial permutation analysis
A threshold p-value of ≤ 0.05 is often used to declare significant
LD
These statistical methods are implemented in software such as
PowerMarker (Liu & Muse, 2005), TASSEL (Bradbury et al., 2007)
and R
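As an illustration of the chi-square test, a minimal sketch follows. It assumes two biallelic loci, for which the 2×2 haplotype-table chi-square equals n·r² with 1 degree of freedom (n = number of haplotypes sampled). The sample size of 200 haplotypes and the r² value of 0.0104 are hypothetical inputs, and `ld_chi_square` is an illustrative name.

```python
from math import erfc, sqrt


def ld_chi_square(r2, n_haplotypes):
    """Chi-square statistic (1 df) and p-value for LD between two biallelic loci."""
    chi2 = n_haplotypes * r2
    # Survival function of a chi-square variable with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = erfc(sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical example: r^2 = 0.0104 estimated from 200 sampled haplotypes
chi2, p_value = ld_chi_square(0.0104, 200)
print(round(chi2, 3), p_value <= 0.05)
```

With these inputs the statistic stays below the 1-df critical value of 3.84, so LD would not be declared significant at p ≤ 0.05.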
19. LD Decay plot
Zegeye, Habtemariam; Rasheed, Awais; Makdis, Farid; Badebo, Ayele; C. Ogbonnaya, Francis (2015): Linkage disequilibrium (LD) decay as a function of
genetic distance.. PLOS ONE. Figure. https://doi.org/10.1371/journal.pone.0105593.g003
20. Manhattan Plots: Representing GWAS results
θ Scatter plot arranged chromosome-wise to summarize results
θ The X-axis is the genomic position of each SNP, and the Y-axis is the
negative logarithm of the P-value obtained from the GWAS model
Skyscrapers
GAPIT manual, Zhiwu Zhang, 2020 pp: 27
22. Mutation & Recombination
A new mutation is initially in LD with all other loci; recombination then causes LD to decay as new
haplotypes are created.
LD is broken down by recombination, hence blocks of LD are expected.
D is expressed in standardized units as D' or r²
Decay of D per generation, where r is the recombination fraction: $D_{t+1} = (1 - r) D_t$
Curves shown for r = 0.5, 0.05, 0.005 and 0.0005
r = 0.5 for unlinked loci, so
LD decays by half each
generation
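The recurrence D(t+1) = (1 - r) D(t) implies D(t) = D(0) (1 - r)^t. The short sketch below tabulates this for the four recombination fractions shown on the slide; the function name `ld_after` and the starting value D(0) = 1 are illustrative assumptions.

```python
def ld_after(d0, r, generations):
    """LD remaining after a number of generations, from D(t+1) = (1 - r) * D(t)."""
    return d0 * (1.0 - r) ** generations


# Fraction of the initial LD left after 0, 1, 10 and 100 generations
for r in (0.5, 0.05, 0.005, 0.0005):
    remaining = [round(ld_after(1.0, r, t), 3) for t in (0, 1, 10, 100)]
    print(r, remaining)
```

Tightly linked loci (small r) keep most of their LD for hundreds of generations, which is why old mutations show LD only with nearby markers.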
23. Mating system
Selfing reduces opportunities for effective recombination, so LD extends over much larger
distances (a lower marker density suffices, but resolution is lower).
LD declines more rapidly in outcrossing plant species, so a higher marker density is needed and
higher resolution is expected.
[Figure: D' (0 to 1) plotted against generation (1 to 40) for recombination fractions r = 0.05, 0.25 and 0.50 (no linkage), each under outcrossing (s = 0.00) and 99% selfing (s = 0.99)]
24. Selection:
It generates LD between unlinked loci through a "hitchhiking" effect and
epistatic selection of co-adapted genes.
Hitchhiking: fixation of alleles flanking a favored variant
Domestication and modern plant breeding have considerably modified
genome architecture and reduced genetic diversity, creating population
structure that in turn affects LD.
25. Genetic Drift and Bottleneck:
Genetic drift results in the consistent loss
of rare allelic combinations which increases
LD level (Flint-Garcia et al., 2003).
Bottleneck: a marked reduction in the size of a population
for one or more generations.
It enhances genetic drift, since few allelic
combinations are transmitted
26. Inferences from Linkage Disequilibrium
θ Populations with slow decay can be helpful in AM
θ Populations with long LD blocks are amenable for coarse
mapping with fewer markers
θ short LD blocks-fine mapping
29. Population structure
• Population structure affects LD throughout the genome.
• Population structure occurs from the unequal
distribution of alleles among subpopulations of different
ancestries.
• When lines from these subgroups are sampled together to construct an AM
panel, the differing allele
frequencies create spurious LD, leading to false positives.
• Subgroups form within a population due to allele
frequency differences
• Needs to be accounted for if present in the association
panel
• PCA is generally used to represent and estimate structure
• STRUCTURE (developed by the Pritchard Lab, Stanford
University): identifies K clusters to which individuals are
assigned; time consuming
Shi et al., 2017
Detection: AMOVA,
Wright's F statistic
31. Population Structure (contd..)
A set of 201 upland cotton germplasm lines from the Agricultural Research Station, University of Agricultural Sciences (UAS),
Dharwad, was used for association studies, out of a total of 557 available. It included indigenous, exotic, released
varieties and breeding lines
Population structure was inferred using the program fastSTRUCTURE (Raj, Stephens, & Pritchard, 2014).
23,254 polymorphic SNPs with minor allele frequencies greater than 0.05 were used for population structure analysis
33. Another confounding effect in marker-trait association: Kinship
θ Kinship is co-ancestry, or half the relatedness
θ Coefficient of kinship is the probability that alleles at a
particular locus, chosen at random from two individuals, are
identical by descent
θ Can be estimated based on pedigree information
θ $K_f = \sum_k \sum_a (f_{ai} f_{aj}) / D$
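The kinship formula can be read as: for every locus k and every allele a, multiply the frequency of allele a in unit i by its frequency in unit j, add everything up, and divide by the number of loci D. The sketch below follows that reading. The allele frequencies are hypothetical, and the function name `kinship_coefficient` and the list-of-dicts layout are illustrative choices only.

```python
def kinship_coefficient(freqs_i, freqs_j):
    """K_f = sum over loci k and alleles a of (f_ai * f_aj), divided by the number of loci D.

    freqs_i, freqs_j: one dict per locus, mapping allele name to its frequency.
    """
    n_loci = len(freqs_i)
    total = 0.0
    for locus_i, locus_j in zip(freqs_i, freqs_j):
        for allele, f_ai in locus_i.items():
            # An allele absent from the other unit has frequency 0 there
            total += f_ai * locus_j.get(allele, 0.0)
    return total / n_loci


# Hypothetical allele frequencies at two loci
unit_i = [{"A": 1.0}, {"B": 0.5, "b": 0.5}]
unit_j = [{"A": 0.5, "a": 0.5}, {"b": 1.0}]
k_ij = kinship_coefficient(unit_i, unit_j)
print(k_ij)
```

Two units fixed for the same allele at every locus give a value of 1, and units sharing no alleles give 0.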
35. Kinship matrices
θ Marker based approaches: method of moments based
estimate
θ Kinship estimates for pairs of individuals
36. Statistical Models of association mapping
θ Case control approaches: affected by Q
θ Transmission disequilibrium tests
θ Structured association models
37. Statistical Models of association mapping
General linear models (GLM): only fixed effects
θ Include naïve models and models with a fixed effect for population
structure (Q)
θ Trait = Markers + Error, or Trait = Markers + Population structure + Error
Mixed models: both fixed and random effects
θ Trait = Markers + Population structure + Kinship + Error
38. Overview of models
GAPIT manual, Zhiwu Zhang, 2020 pp: 14
θ Models like MLMM, SUPER,
FarmCPU and BLINK are feedback
models: after each regression,
significant markers are used as
covariates to negate their effect
39. Selecting the best model
θ The quantile-quantile (QQ) plot assesses
how well the model used in GWAS
accounts for population structure
and familial relatedness
θ Negative logarithms of the P-values
from the models fitted in GWAS are
plotted against their expected values
under the null hypothesis of no
association with the trait
GAPIT manual, Zhiwu zhang, 2020 pp: 26
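41. Controlling false discoveries
θ The overall (family-wise) error rate increases when each marker is tested in a
separate equation
θ Bonferroni correction: $\alpha_c = \alpha_E / m$ (to control the family-wise error
rate, FWER)
θ Benjamini-Hochberg
θ Order the m unadjusted p-values from hypothesis testing: $P_{(1)} \le P_{(2)} \le \dots \le P_{(m)}$
θ Let k be the largest i for which $P_{(i)} \le \frac{i}{m} q^*$ (q* is the expected proportion of false discoveries, i.e. the FDR level)
θ Reject all $H_{(i)}$ for i = 1, 2, ..., k
Outcomes of m hypothesis tests:
Null hypothesis | Accepted | Rejected | Total
True | U | V | m0
False | T | S | m − m0
Total | m − R | R | m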
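The two corrections can be sketched in a few lines. This is an illustrative implementation under the definitions on the slide (Bonferroni threshold α/m, and the Benjamini-Hochberg step-up rule with FDR level q*). The function names and the example p-values are hypothetical.

```python
def bonferroni_threshold(alpha, m):
    """Per-test significance threshold that controls the family-wise error rate."""
    return alpha / m


def benjamini_hochberg(p_values, q=0.05):
    """Return one True/False flag per p-value: True if that hypothesis is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    # Find the largest rank i with P(i) <= (i / m) * q
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            k = rank
    rejected = [False] * m
    # Reject the k smallest p-values, even if some of them missed their own threshold
    for idx in order[:k]:
        rejected[idx] = True
    return rejected


p_vals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
threshold = bonferroni_threshold(0.05, len(p_vals))
bonferroni_hits = [p <= threshold for p in p_vals]
bh_hits = benjamini_hochberg(p_vals, q=0.05)
print(threshold, sum(bonferroni_hits), sum(bh_hits))
```

On these example values Bonferroni keeps one marker and Benjamini-Hochberg keeps two, which reflects the less conservative nature of FDR control.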
42. Marker effects
θ The additive effect of the variant allele is calculated as half
the difference between mean of the variant allele and mean
of the reference allele
θ Ref allele mean = 10
θ Var allele mean = 16
θ Additive effect = (16 − 10) / 2 = 3
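The same arithmetic, written as a small sketch that starts from per-line phenotype values. The phenotype lists are hypothetical numbers chosen to reproduce the slide's means of 10 and 16, and `additive_effect` is an illustrative name.

```python
def additive_effect(ref_values, var_values):
    """Half the difference between the variant-allele mean and the reference-allele mean."""
    ref_mean = sum(ref_values) / len(ref_values)
    var_mean = sum(var_values) / len(var_values)
    return (var_mean - ref_mean) / 2.0


# Hypothetical phenotypes of lines homozygous for each allele
ref_lines = [9.0, 10.0, 11.0]   # mean = 10
var_lines = [15.0, 16.0, 17.0]  # mean = 16
effect = additive_effect(ref_lines, var_lines)
print(effect)
```

A positive value means the variant allele increases the trait relative to the reference allele.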
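43. Variation explained and effect of allele
θ Regression R² is used to denote the PVE (phenotypic variance explained)
θ Likelihood-ratio-based R² (Sun et al., 2010): $R^2_{LR} = 1 - \exp\left(-\frac{2}{n}(\log L_M - \log L_0)\right)$, where n is the number of observations
θ log L_M is the log-likelihood of the full model
Ex: y = m1 + PCA + kinship + ε
θ log L_0 is the log-likelihood of the reduced model. Ex: y = PCA + kinship + ε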
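A minimal numerical sketch of the likelihood-ratio R². It assumes the form commonly attributed to Sun et al. (2010), R²_LR = 1 − exp(−(2/n)(log L_M − log L_0)), with n the number of observations. The log-likelihood values and the sample size are hypothetical, and `r2_likelihood_ratio` is an illustrative name.

```python
from math import exp


def r2_likelihood_ratio(loglik_full, loglik_reduced, n):
    """Likelihood-ratio-based R^2 of a marker, from full- and reduced-model log-likelihoods."""
    return 1.0 - exp(-2.0 / n * (loglik_full - loglik_reduced))


# Hypothetical fits on n = 200 lines:
# full model (marker + PCA + kinship) vs reduced model (PCA + kinship)
r2_lr = r2_likelihood_ratio(-250.0, -260.0, 200)
print(round(r2_lr, 4))
```

When the marker adds nothing to the fit (equal log-likelihoods), the statistic is 0.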
44. Phenotyping
θ Since large populations are involved, a suitable strategy to
minimize G×E interaction has to be designed
θ Incomplete block designs are suitable, or phenomics should
be used
θ Multi-location and/or multi-environment testing is suggested
θ Boxplots are an effective means of identifying outliers
θ Normality tests; if the data are not normal, apply transformations
45. Success stories
θ In humans, GWAS has identified SNPs linked to diabetes,
Alzheimer's, Parkinson's, obesity and many more
θ Phenotypic variation in flowering time, endosperm
color, starch production, maysin and chlorogenic acid
accumulation, cell wall digestibility, and forage quality
has been associated with SNP markers in candidate
genes
47. Current issues
θ Missing heritability
θ Algorithms for efficient estimation of epistasis are lacking
θ Controlling false positives
θ Accurate phenotyping of panel
48. Limitations
θ Results of AM are affected by selection history, K, Q
θ Linkage may not be always the basis of significant LD
θ Demands large number of markers-increases cost on
genotyping
θ Rate of recombination is not uniform throughout the
genome-reduces reliability of using LD estimates
θ Low frequency alleles with larger effects cannot be detected
50. Materials and methods
θ The material for the study comprised 64 core set germplasm
accessions of Dolichos bean and two check varieties (HA-4 and
kadalavare (KA)) maintained at All India Coordinated Research
Project (AICRP) on pigeon pea, UAS, Bengaluru.
θ The core germplasm accessions include accessions of Indian (78%-
collected from Karnataka, Andhra Pradesh, Maharashtra, Gujarat,
Tamil Nadu and Kerala states of India), exotic (6%) and unknown
origin (16%).
θ Nine productivity traits were recorded: days to 50% flowering, primary branches per plant, racemes per
plant, raceme length, nodes per raceme, pods per plant, fresh pod
yield per plant, fresh seed yield per plant and 100-fresh seed weight
51. Genotyping
θ Those core germplasm accessions were genotyped with a total of
234 SSR markers which included 198 in-house developed SSR
markers and 36 transferable cross legume species/genera SSR
markers.
θ The population structure of Dolichos bean core set was worked
out using 95 polymorphic SSR markers
θ K and Q were estimated from STRUCTURE 2.3.2
θ To confirm population structure, AMOVA and Wright's F statistics
were computed
θ Marker-trait association: TASSEL 3.1
54. θ Objective: To identify the QTLs related to soybean protein and oil
content
θ Population size: 185 diverse soybean germplasm accessions (China,
America, Canada, Japan and some European countries)
θ Genotyping approach: Whole genome sequencing using SLAF-seq
approach (12,072 SNPs used)
θ Phenotyping for 2 years: Protein and oil content (Infratec 1241 NIR Grain
Analyzer)
Population structure
evaluation
PCA was used to assess the
population structure using
the GAPIT package
55. Association mapping
θ Compressed mixed linear model (cMLM) in GAPIT: based
on SNPs from the 185 soybean accessions and ~12k SNPs
θ A P-value of 0.001 constituted the Type I error significance
threshold
59. In summary..
θ 9 of 23 SNPs were in line with previous reports
θ Due to overlapping detection of some SNPs in protein and oil
content, pleiotropy could be suspected
60. Materials and Methods
229 accessions from a worldwide B. napus collection were divided into two panels of 96 and 133 accessions
Biochemical analysis: tocopherol content and composition were assessed by HPLC: α-tocopherol content (ATC), γ-tocopherol content
(GTC) and δ-tocopherol content (DTC); tocopherol composition was expressed as the ratio of α- to γ-
tocopherol (AGR)
Glucosinolate (GSL), seed oil (SOC), and seed protein (SPC) contents were also assessed by near-infrared
spectroscopy (NIRS)
61. Genotyping
θ The 13 tocopherol candidate genes
(BnaX.VTE1.a, BnaX.VTE1.b, BnaA.VTE2.a, BnaX.VTE2.b, BnaX.VTE3.a, BnaX.VTE3.b,
BnaA.VTE4.a, BnaX.VTE4.b, BnaX.VTE4.c, BnaC.VTE5, BnaX.PDS1.a, BnaX.PDS1.b,
and BnaA.PDS1.c) were identified by BAC library screening and characterized
by functional and mapping approaches (Fritsche et al; Wang et al.).
θ Population structure: assessed by 31 publicly available genome-wide microsatellite
SSR markers (Cheng et al., 2009)
θ Principal component analysis (PCA) was performed on the SSR marker data, and the first and
second principal components were used (D matrix) for the association analysis.
θ Kinship matrices were also calculated
62. LD and association analysis
θ r² values of LD and corresponding p-values for all locus pairs were calculated
using the software R
θ The two models, general linear model (GLM) and PK-mixed model, were
used to analyze associations between polymorphic sites and the traits in
panel 1
θ Identification of Polymorphisms within Tocopherol Genes
θ Among the 13 genes, specific primer pairs yielding high-quality sequences could be developed
for only nine
θ The remaining four candidate genes had poor sequence quality
θ Setting a frequency threshold of 5%, they found polymorphic sites in only two candidate genes
(BnaA.PDS1.c, BnaX.VTE3.a), whereas low-frequency polymorphic sites (frequency < 5%) were
detected in three genes. They found no polymorphisms in the amplified fragments of
the remaining four genes
66.
They concluded that the polymorphisms within the tocopherol genes
clearly impact tocopherol content and composition in B. napus seeds.
They therefore suggest that these nucleotide variations may be used as
selectable markers for breeding rapeseed with enhanced tocopherol
quality.
The flow of my seminar is as follows.
First, I will introduce the concept of genomic selection (GS).
Then I will cover the process of genomic selection, which includes the training population, its genotyping and phenotyping, and the estimation of GEBVs.
After that I will explain the insights into GS, then the applications of GS, i.e., where and when we need to apply GS in plant breeding, and then the research studies on GS.
Finally, I will conclude my seminar.
Coming to the introduction: as plant breeders, our major goal is to breed for novel traits and genotypes. We all know that plant breeding is the art and science of improving the genetic makeup of crop plants.
During crop breeding we need to carry out several activities. First, we need to create variability, by natural or artificial means. Natural sources include domestication (bringing wild species under human management), germplasm collection from different countries or locations, and introduction of a cultivar into a new area where it was not grown earlier. Artificially, we hybridize plants, induce mutation and polyploidy, exploit somaclonal variation in clonally propagated crops, and use recombinant DNA technology, i.e., genetic engineering.
After creating the variability, we need to select the variability required for improving a particular trait. Here selection is the key step.
It plays a crucial role for plant breeders. There are two types of selection. The first is natural selection, in which nature selects on the principle of survival of the fittest, as proposed by Charles Darwin. The other is artificial selection, in which humans select based on experience and phenotypic observations of a particular trait.
Later, Hazel and Lush introduced the concept of the selection index, in which selection is based on a linear combination of characters associated with a particular trait. It is more reliable than single-trait selection and increases aggregate genetic gain.
But selection based on phenotype alone is not precise and may mislead; undesirable traits or individuals may be selected. So markers were developed.
Conventional selection is based on phenotype, so it is called phenotypic selection; here the environment has a drastic impact.
Breeders choose good offspring using their experience and the observed phenotypes of crops, so as to achieve genetic improvement of the target traits.
Here the breeder considers one trait at a time.
So, in 1942 Hazel and Lush proposed the selection index method, which uses a total score to select for multiple traits simultaneously. It improves the aggregate genetic gain.
With the development of computer science, genetic evaluation methods for the combined analysis of multiple traits were also developed.
In the 1990s, markers came to the rescue of plant breeders. As we all know, molecular markers are landmarks on the chromosome that are used to track the trait of interest.
Molecular markers (MM) are not crop-stage specific,
they have simple inheritance, and
they are environmentally neutral, so they are more effective for selection than relying on phenotype alone.
Markers are surrogates for the trait of interest. Selecting genotypes based on marker data is called marker-assisted selection (MAS).
MAS is suitable for traits controlled by a small number of major genes.
But most economic traits of crops are complex and affected by a large number of genes, each having a small effect.
So MAS also has some limitations:
It is effective only for major genes/QTLs
Success has been achieved only with qualitative traits
MAS does not identify minor QTL effects
But most traits are quantitative in nature and involve both large- and small-effect QTLs
So MAS for quantitative traits (QTs) and small-effect QTLs has resulted in less genetic gain
So there is a need for another method that overcomes these limitations.
The method that addresses these limitations of MAS is genomic selection.
Principles of linkage disequilibrium and association mapping. a. Linkage disequilibrium. Locus 1 and Locus 2 present an unusual pattern of association between alleles A-G and T-C, which deviate from Hardy-Weinberg expectations, but without any statistical correlation with a phenotype. b. Association mapping. Locus 1 and Locus 2 are in LD. Significant covariance with the seed colour phenotype is considered evidence of association.
QTLs with larger effect size are identified with high power
Historic LD
Concentrate also on allele number on the other axis
LD and linkage equilibrium are population genetics terms describing the likelihood of co-occurrence of alleles at different loci in a population. Linkage is the physical proximity of loci on a chromosome;
linkage equilibrium is the random association of alleles at different loci (each independent of the others), so haplotype frequencies follow the product rule.
Hypothetical scenarios of LD between linked polymorphisms caused by different mutational and recombinational histories. The starting population has only two haplotypes, AG at locus 1 and TT at locus 2. Mutation later occurs at locus 2, with "T" being replaced by "C" in some cases. (A) shows maintenance of LD due to lack of recombination between loci 1 and 2 in the generations following mutation. The contingency table shows the haplotype counts.
For higher order LD
D' is a unitless measure, but only the values 0 and 1 have a clear interpretation; intermediate values are hard to interpret
r² (also written Δ²) is the square of the correlation coefficient between loci; it is more reliable and the most widely used. Significance is tested with the chi-square or Fisher's exact test
In the odds ratio, if all the allele frequencies are the same, the numerator of the odds ratio is 0
A Fictional Depiction of a Simple Genotype-Phenotype Association Test.The functional SNP responsible for variation in berry number in grapevine is in gray and is not genotyped. The genotyped SNPs lie on either side of the functional SNP. The genotyped SNP to the right is in high LD with the functional SNP, while the genotyped SNP to the left is not in LD with the functional SNP. The results of a simple association test (Pearson correlation) are shown in the bottom box. The C allele of the high LD SNP is significantly associated with berry number (P = 0.037), while there is no significant association for the low LD SNP (P = 0.77)
The ultimate aim of most mapping studies is to identify the functional genetic variants, or the quantitative trait nucleotides, that are responsible for phenotypic variation. Current data sets are commonly obtained using genotyping microarrays and often consist of hundreds of thousands of genotypes from hundreds or even thousands of individuals. Even with such large numbers of markers, however, it is unlikely that sought after functional variant(s) will be among the markers genotyped. (Though in the near future, data sets will likely contain [nearly] all variants!) The experimenter often can only hope that genetic markers that are in strong linkage disequilibrium (LD) with the functional variant(s) have been genotyped. LD refers to the correlation between polymorphisms in a population. Thus, the genotyped markers become proxies, or sentinels, for the functional variant because their genotypes are highly correlated with the genotypes of the functional variant. The power of an association study depends on the strength of this correlation (i.e., on the degree of LD between the genotyped marker and the functional variant). Figure 1 depicts a scenario in which two markers have been genotyped at a locus, one of which is associated with the phenotype and is in LD with the functional variant and one of which is not in LD with the functional variant and is therefore not associated with the phenotype. In general, the strength of the correlation between two markers is a function of the distance between them: the closer two markers are, the stronger the LD. The resolution with which a QTL can be mapped is a function of how quickly LD decays over distance. Therefore, the first step in the design of an association study is an analysis of the structure of LD in the population under study. The decay of LD has been shown to differ dramatically
In wheat 21 chromosomes measured with 54 SNPs. Bottom half: P values top half R2 values
The LD decay plot is a scatter plot. For a recombination rate of θ = 10^-8 per bp, it takes about 693 generations for LD between loci 100 kb apart to decay to half; for loci 1 kb apart, it takes about 69,315 generations
The higher the rate of recombination, the faster the decay
Decay is represented by a decay curve. Gene conversion also leads to decay in LD, e.g., on chromosome X in different populations of Drosophila, where, although recombination is low, Langley et al. observed evidence for a large amount of recombination
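The generation counts quoted above follow from the decay rule D(t) = D(0)(1 − r)^t by solving for the time at which D falls to half. The sketch below assumes that the recombination fraction between two loci is simply the per-bp rate multiplied by the distance in bp, which is reasonable only for short distances; the function names are illustrative.

```python
from math import log


def recombination_fraction(rate_per_bp, distance_bp):
    """Approximate recombination fraction between two loci (short distances only)."""
    return rate_per_bp * distance_bp


def ld_half_life(r):
    """Generations needed for LD to fall to half, from D(t) = D(0) * (1 - r)**t."""
    return log(0.5) / log(1.0 - r)


half_life_100kb = ld_half_life(recombination_fraction(1e-8, 100_000))
half_life_1kb = ld_half_life(recombination_fraction(1e-8, 1_000))
print(round(half_life_100kb), round(half_life_1kb))
```

The results agree with the figures in the notes to within one generation; the small difference comes from using the exact logarithm instead of the approximation ln(2)/r.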
Large peaks in the Manhattan plot (i.e., “skyscrapers”) suggest that the surrounding genomic region has a strong association with the trait
These skyscrapers are in LD with the surrounding region
Both LD and association mapping are affected by a variety of genetic and demographic factors; demographic factors are those related to population structure. Factors such as recent occurrence of mutation, self-pollination, population structure, kinship, genetic drift, selection, admixture and epistasis increase LD, whereas factors such as high recombination rate, high mutation rate and gene conversion reduce LD.
Immediately after a mutation occurs, it is in LD with all other loci.
In successive generations, recombination causes LD to decay as new haplotypes are created.
Decline in the LD over time with different values of recombination rates.
Most of the polymorphisms we observe are old; many generations are required for allele frequencies to rise to a frequency at which we detect them.
Therefore, most pairs of polymorphic loci show little LD originating from mutation unless closely linked.
The blue line represents LD values between adjacent genes; LD reaches a maximum in the vicinity of the gene under selection.
The red line represents the changes in diversity of genes. Diversity is low at the gene under selection but the effect of selection extends to the surrounding genes.
GWAS: a comprehensive approach that systematically searches the genome for causal genetic variation; a large number of markers are tested for association with various complex traits, and prior information on candidate genes is not required
CGAM: candidate genes are selected based on prior knowledge from mutational analysis, biochemical pathways or linkage analysis. An independent set of markers needs to be scored to infer genetic relationships. It is low cost and hypothesis driven, but misses unknown loci
Shi et al., 2017 analysed population structure in 343 spinach genotypes
STRUCTURE was developed by the Pritchard Lab, Stanford University
Latest version is 2.3.4
Population structure is defined by the organization of genetic variation and is driven by the combined effects of evolutionary processes that include recombination, mutation, genetic drift, demographic history, and natural selection
Identical by descent: the identical alleles from two individuals arise from the same allele in an earlier generation
Where ∑k∑a (fai * faj)/D is the sum of all loci and all alleles;
fai is the frequency of allele a in population i, faj is the frequency of allele a in population j, D is the number of loci.
Sum across loci and alleles; here, 3 loci and 4 alleles, if we take b in pop 1, b=1 and in pop2 b=0
In pop1 B=0 and B=1 in pop2
A relatedness value of 0.19 indicates that, if two alleles are chosen at random, there is a 19% chance that they are identical by descent
But markers usually give identity by state
In the TDT, all possible genotypes at a locus are considered, but only heterozygous parents are tested for transmission of alleles to the affected progeny. In the absence of LD, the ratio of transmitted to non-transmitted alleles is 1:1
A distorted ratio indicates LD
$y = X\alpha + Q\beta + K\gamma + \varepsilon$, where y is a vector of phenotypic observations, α is a vector of fixed effects related to the SNP markers, β is a vector of fixed effects related to the population structure, γ is a vector of random effects related to the relatedness among individuals, and ε is a vector of residual effects. X is the matrix of SNP marker genotypes, Q is the subpopulation (structure) matrix, and K is the kinship matrix
The variances of the random effects were estimated as $\mathrm{Var}(u) = 2 K V_g$ and $\mathrm{Var}(e) = I V_R$, where K is the kinship matrix, I is an identity matrix, $V_g$ is the genetic variance and $V_R$ is the residual variance.
Quantile-quantile (QQ) plot of P-values. The Y-axis is the observed negative base 10 logarithm of the P-values, and the X-axis is the expected negative base 10 logarithm of the P-values under the assumption that the P-values follow a uniform[0,1] distribution. The dotted lines show the 95% confidence interval for the QQ-plot under the null hypothesis of no association between the SNP and the trait.
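The coordinates of such a QQ plot can be computed without any plotting library. The sketch below is an assumption-laden illustration: it uses the plotting positions (i − 0.5)/m for the expected uniform quantiles, which is one common convention and not necessarily the one used by GAPIT, and `qq_points` is an illustrative name.

```python
from math import log10


def qq_points(p_values):
    """Return (expected, observed) pairs of -log10(P) for a QQ plot, in ascending order."""
    m = len(p_values)
    observed = sorted(-log10(p) for p in p_values)
    # Expected quantiles of m uniform[0, 1] P-values (plotting positions are an assumption)
    expected = sorted(-log10((i - 0.5) / m) for i in range(1, m + 1))
    return list(zip(expected, observed))


# P-values that are exactly uniform fall on the diagonal (expected == observed)
null_like = [(i - 0.5) / 10 for i in range(1, 11)]
points = qq_points(null_like)
for expected, observed in points:
    print(round(expected, 3), round(observed, 3))
```

Points rising above the diagonal across the whole range suggest uncorrected structure or relatedness, while a departure only at the upper tail suggests true associations.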
Decision Reviev System
The BH paper is one of the most cited papers in statistics
It controls the expected false discovery proportion: FDR = E(FDP)
Selection of q* is ambiguous
In the presence of marker, what is the likelihood ratio of
Phenotype is the major player in GWAS
Objective: To know SSR markers linked to 9 productivity traits
In AMOVA, a significant among-population variance component indicates the presence of population structure
Significance of Wright's F statistics indicates genetic structure
Soy bean research institute Shenyang
Genotyping using Illumina genome analyser
PCA of population structure. (A) Distribution of the accessions in the association panel under PC1 and PC2. (B) The genetic variation explained by the first 10 PCs.
AM has the advantages of detecting a larger number of alleles, fine mapping, lower time requirement, etc., but bi-parental mapping is effective in capturing minor alleles with major effects
Thus, for an effective crop improvement strategy, AM combined with family-based approaches will identify potential markers associated with the targeted trait, which can then be used for MAS