Genome-wide Association Mapping
Avjinder Singh Kaler
PhD Candidate
Department of Crop, Soil, and Environmental Sciences
University of Arkansas
Nov-15-2016
Plant Breeding Lecture
Identify genomic regions associated with
phenotypes
Phenotypic Data
• Flowering time
• Plant height
• Yield
• Phenotype Variation
• Phenotypes are response
variables
Genotypic Data
• Genomic markers that span the
entire genome
• Single nucleotide
polymorphisms (SNPs) are
commonly used as markers
• Markers are explanatory
variables
[Figure: functional diversity in phenotype; panels: plant height, seed color]
Genetic Architecture of Complex Traits
Phenotype
Genotype Environment
P = G + E + GE
How do we connect genotype to phenotype?
Functional Diversity: Phenotype Variation
• Linkage mapping (biparental populations): few recombination events, resulting in relatively low mapping resolution
• Association mapping (diverse panels): historical recombination events and natural genetic diversity, resulting in high mapping resolution
GWAS based on Linkage Disequilibrium (LD)
• LD is the non-random correlation or association of alleles at
two loci
• D, D′ (normalized D), and r² are commonly used summary statistics to estimate pairwise LD
• r² is preferred in association studies because it is more indicative of how markers might correlate with QTL
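The three LD statistics can be sketched from haplotype counts at two biallelic loci. This is a minimal illustration, not taken from the lecture: the function name `pairwise_ld` and the four count arguments (haplotypes AB, Ab, aB, ab) are hypothetical, and phased haplotype counts are assumed to be available.

```python
def pairwise_ld(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, D_prime, r2) for two biallelic loci from haplotype counts."""
    n = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / n                      # frequency of the AB haplotype
    p_A = (n_AB + n_Ab) / n              # frequency of allele A at locus 1
    p_B = (n_AB + n_aB) / n              # frequency of allele B at locus 2

    # D: deviation of the observed haplotype frequency from independence
    D = p_AB - p_A * p_B

    # D' divides D by the largest magnitude it could take given the allele frequencies
    if D >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = D / d_max if d_max > 0 else 0.0

    # r2: squared correlation between alleles at the two loci
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r2 = D * D / denom if denom > 0 else 0.0
    return D, d_prime, r2


# Hypothetical counts: 40 AB, 10 Ab, 10 aB, 40 ab haplotypes
D, d_prime, r2 = pairwise_ld(40, 10, 10, 40)
```

D′ is returned here with the sign of D; many reports quote its absolute value instead.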
Visualize extent of LD between pairs of loci
[Figure panels: LD decay; LD block (haplotype view)]
Genome-wide association study (GWAS)
• Identify genomic regions associated with a phenotype
• Fit a statistical model at each SNP in genome
• Use the fitted models to test H0: no association between SNP and phenotype
Associating SNPs with phenotypes
• At each SNP: Conduct a test of association with trait
• Significant SNP/trait association suggests:
– SNP has direct biological function (functional polymorphism)
– SNP in LD with functional polymorphism(s)
[Figure: example SNP genotypes (A/C, T/C, G/A, A/G, G/T) across six lines, Line 1 to Line 6]
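A single-marker test of association can be sketched as a simple linear regression of the phenotype on allele dosage (0, 1, or 2 copies of one allele). This is an assumption-laden illustration rather than the lecture's method: it ignores population structure and kinship, and the function name `snp_regression` and the example values are hypothetical. The p-value would come from a t distribution with n - 2 degrees of freedom, which is not computed here.

```python
import math


def snp_regression(dosage, phenotype):
    """Regress phenotype on allele dosage; return (slope, t_statistic) or None."""
    n = len(dosage)
    mean_x = sum(dosage) / n
    mean_y = sum(phenotype) / n
    sxx = sum((x - mean_x) ** 2 for x in dosage)
    if sxx == 0:
        return None  # monomorphic marker: no variation to test

    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosage, phenotype))
    slope = sxy / sxx                    # estimated marker effect
    intercept = mean_y - slope * mean_x

    # Residual sum of squares and standard error of the slope
    sse = sum((y - intercept - slope * x) ** 2 for x, y in zip(dosage, phenotype))
    se = math.sqrt(sse / (n - 2) / sxx)
    t_stat = slope / se if se > 0 else float("nan")  # undefined for a perfect fit
    return slope, t_stat


# Hypothetical data for six lines at one SNP
dosage = [0, 0, 1, 1, 2, 2]
plant_height = [10, 12, 15, 17, 20, 22]
result = snp_regression(dosage, plant_height)
```

In a GWAS this test is repeated at every SNP, which is why the multiple-testing corrections discussed later matter.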
Genetic diversity can lead to false positives in a GWAS
• Two sources for false positives:
– Population structure: allele frequency differences among individuals due to local adaptation or diversifying selection
– Familial relatedness: allele frequency differences among individuals due to recent co-ancestry
[Figure: genetic diversity of 2,815 maize inbreds, plotted as Principal Coordinate 1 vs. Principal Coordinate 2; Romay et al. (2013)]
Controlling False Positives due to Population
Structure
• STRUCTURE (Q matrix)
– Identifies different subpopulations within a sample of individuals collected from a population of unknown structure
– Estimates the Q matrix
– Time-consuming
• Principal Component Analysis (PCA)
– Fast and effective approach to diagnose population structure
– Summarizes variation observed across all markers into a smaller number of underlying component variables
– Estimates the PC matrix
Principal Component Analysis
• Scree plot: shows the fraction of total variance in the data explained by each PC
• PCs are selected based on the elbow (the L-shaped bend) of the curve
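The quantities behind a scree plot can be sketched as follows. The eigenvalues are made up for illustration, and the elbow rule (largest second difference of the explained-variance curve) is only one possible heuristic, assumed here; in practice the elbow is often judged by eye. Both function names are hypothetical.

```python
def explained_variance(eigenvalues):
    """Fraction of total variance explained by each PC (eigenvalues sorted descending)."""
    total = sum(eigenvalues)
    return [ev / total for ev in eigenvalues]


def pcs_before_elbow(fractions):
    """Number of PCs before the elbow, taken as the point of largest second difference."""
    if len(fractions) < 3:
        return len(fractions)
    # Second difference at each interior point: large values mark a sharp bend
    curvature = {
        i: fractions[i - 1] - 2 * fractions[i] + fractions[i + 1]
        for i in range(1, len(fractions) - 1)
    }
    elbow = max(curvature, key=curvature.get)
    return elbow  # 0-based index of the elbow equals the count of PCs before it


# Hypothetical eigenvalues from a marker-based PCA
eigenvalues = [8.0, 4.0, 1.0, 0.8, 0.7, 0.5]
fractions = explained_variance(eigenvalues)
n_pcs = pcs_before_elbow(fractions)
```

Whether to keep the PC at the elbow itself is a judgment call; this sketch keeps only the PCs before it.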
Controlling False Positives due to Familial
relatedness
• A kinship coefficient (F) is the probability that two homologous alleles, one drawn at random from each of two individuals, are identical by descent
• Kinship estimated from genetic markers is a relative kinship, based on the probability that alleles are identical by state
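A raw identity-by-state (IBS) similarity can be sketched from genotypes coded as allele dosages (0, 1, 2). This is an illustrative assumption, not the estimator used in the lecture: GWAS software typically applies scaled or centered kinship estimators, and the function names and the `None` coding for missing calls are hypothetical.

```python
def ibs_similarity(g1, g2):
    """Mean proportion of alleles shared identical by state between two individuals."""
    shared = [
        1 - abs(a - b) / 2            # 1 if same genotype, 0.5 if one allele differs, 0 if both differ
        for a, b in zip(g1, g2)
        if a is not None and b is not None   # skip markers with a missing call
    ]
    return sum(shared) / len(shared) if shared else float("nan")


def ibs_matrix(genotypes):
    """Symmetric matrix of pairwise IBS similarities for a list of individuals."""
    n = len(genotypes)
    return [[ibs_similarity(genotypes[i], genotypes[j]) for j in range(n)] for i in range(n)]


# Hypothetical dosages for three lines at four SNPs
lines = [
    [0, 2, 1, 2],
    [0, 2, 2, 0],
    [2, 0, 1, 0],
]
K_ibs = ibs_matrix(lines)
```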
Mixed models reduce false positives in GWAS
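$Y_i = \mu + \sum_{j=1}^{k} \beta_j \, PC_{ij} + \alpha \, x_i + Line_i + \varepsilon_i$
• $Y_i$: phenotype of the ith individual
• $\mu$: grand mean
• $\sum_{j} \beta_j \, PC_{ij}$: fixed effects, account for population structure
• $\alpha$: marker effect
• $x_i$: observed SNP alleles of the ith individual
• $Line_i$: random effects, account for familial relatedness
• $\varepsilon_i$: random error term
• $(Line_1, \ldots, Line_n) \sim MVN(0, 2K\sigma^2_G)$
• $K$ = kinship matrix, measures relatedness between individuals
• $\varepsilon_i \sim$ i.i.d. $N(0, \sigma^2_E)$
Yu et al. (2006)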
Association Mapping Pipeline
Germplasm Selection
• Choice of germplasm is critical to the success of the association analysis
• Phenotyping
– Design the experiment
– Collect high-quality phenotypic data
Phenotypic Outliers
• Outliers are “unusual” data points that deviate substantially from the mean and strongly influence parameter estimates
• ALWAYS check data sets for outliers
• Do NOT ignore outliers if detected
Phenotypic Outliers
• Outliers can
• increase error variance
• reduce the power of statistical tests
• distort estimates
• decrease normality if non-randomly distributed
• Potential Causes of Outliers
• Human errors in data collection, recording, or entry
• Technical errors from faulty or non-calibrated phenotyping equipment
• Intentional or motivated mis-reporting such as “speed” phenotyping in a
hot field environment
Evaluate Data for Outliers
•Histogram
•Box-plot (Box and Whisker plot)
•Quantile-Quantile plot – graphical method for
comparing two probability distributions to assess
goodness-of-fit
Get to know your data!
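The points a box plot draws beyond its whiskers can be listed with the usual 1.5 × IQR rule. This is a minimal sketch under that assumed convention; the function name `iqr_outliers` and the plant-height values are hypothetical.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the first or third quartile."""
    q1, _, q3 = statistics.quantiles(values, n=4)   # quartiles (default 'exclusive' method)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr       # whisker limits
    return [v for v in values if v < lower or v > upper]


# Hypothetical plant heights (cm); 240 looks like a recording error
heights = [150, 152, 155, 157, 158, 160, 161, 163, 165, 240]
flagged = iqr_outliers(heights)
```

Quartile definitions differ slightly between tools, so a point close to a whisker limit may be flagged by one package and not by another.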
Statistical Identification of Outliers
• Cook’s distance: measures the influence of a data point, i.e., how much the effect estimates change when that point is removed.
• Deleted studentized residuals: measure how far a data point lies from the least-squares fit obtained without it (leverage itself is measured by the hat values).
Two of several possible methods
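Both diagnostics can be sketched for a simple linear regression with one predictor. This is an illustration only: a mixed model for phenotypic data would have more terms, and the function name `influence_diagnostics` and the example data are hypothetical. Common rules of thumb for cut-offs vary and are not applied here.

```python
import math


def influence_diagnostics(x, y):
    """Return a list of (leverage, cooks_distance, deleted_studentized_residual)."""
    n, p = len(x), 2                      # p = number of coefficients (intercept + slope)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x

    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sse = sum(e * e for e in residuals)
    mse = sse / (n - p)

    diagnostics = []
    for xi, e in zip(x, residuals):
        h = 1 / n + (xi - mean_x) ** 2 / sxx              # leverage (hat value)
        cooks = (e * e / (p * mse)) * h / (1 - h) ** 2    # Cook's distance
        mse_deleted = (sse - e * e / (1 - h)) / (n - p - 1)  # error variance without this point
        t_deleted = e / math.sqrt(mse_deleted * (1 - h))  # deleted studentized residual
        diagnostics.append((h, cooks, t_deleted))
    return diagnostics


# Hypothetical data in which the last observation departs from the trend
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 14.2, 25.0]
diag = influence_diagnostics(x, y)
```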
Removal of Outliers
•Removing anomalous data points from data sets is
controversial to some folks.
•If outliers are not removed, inferences made from the
fitted model may not be representative of the
population under study.
•If you remove outliers, then be sure to report it in the
manuscript.
Non-Normal Trait Data
•When fitting a mixed model, two very important
assumptions are that the error terms follow a normal
distribution and that there is a constant variance.
•When data are non-normal, these two assumptions in
particular could be violated.
Analysis of Non-Normal Trait Data
•Generalized linear mixed models can be used to
analyze non-normal data
• The Box-Cox procedure can be used to find the most appropriate transformation that corrects for non-normality of the error terms and unequal variances.
Box-Cox Transformation
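The Box-Cox transformation is y(λ) = (y^λ - 1)/λ for λ ≠ 0 and ln(y) for λ = 0, with λ chosen to maximize a profile log-likelihood. The sketch below assumes an intercept-only model and a grid of λ values from -2 to 2; a real analysis would evaluate the likelihood on the residuals of the full model. Function names are hypothetical, and all trait values must be positive.

```python
import math


def box_cox(y, lam):
    """Apply the Box-Cox transformation with parameter lam to positive values."""
    if abs(lam) < 1e-12:
        return [math.log(v) for v in y]
    return [(v ** lam - 1) / lam for v in y]


def profile_loglik(y, lam):
    """Profile log-likelihood of lam for an intercept-only normal model."""
    z = box_cox(y, lam)
    n = len(z)
    mean_z = sum(z) / n
    var_z = sum((v - mean_z) ** 2 for v in z) / n       # maximum-likelihood variance
    jacobian = (lam - 1) * sum(math.log(v) for v in y)  # change-of-variable term
    return -0.5 * n * math.log(var_z) + jacobian


def best_lambda(y, grid=None):
    """Pick the lam on the grid with the highest profile log-likelihood."""
    if grid is None:
        grid = [i / 10 for i in range(-20, 21)]         # -2.0, -1.9, ..., 2.0
    return max(grid, key=lambda lam: profile_loglik(y, lam))


# Hypothetical right-skewed trait values
trait = [math.exp(v) for v in (-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5)]
lam_hat = best_lambda(trait)
transformed = box_cox(trait, lam_hat)
```

A value of λ near 0 points to a log transformation, near 0.5 to a square root, and near 1 to no transformation.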
Association Mapping Pipeline
Genotyping
• SNPs most commonly used in association mapping
Genotype-Quality Control
• Remove monomorphic markers
• Remove markers with minor allele frequency (MAF) < 5% (or < 3%)
• Remove markers with a high missing rate (e.g., > 10%)
• Impute remaining missing data (LD-kNNi, FILLIN, FSFHap, BEAGLE)
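The marker filters above can be sketched as follows, assuming genotypes coded as allele dosages (0, 1, 2) with `None` for a missing call. The function names, default thresholds as keyword arguments, and example markers are hypothetical; imputation is not shown.

```python
def marker_stats(calls):
    """Return (missing_rate, minor_allele_frequency) for one marker."""
    observed = [g for g in calls if g is not None]
    missing_rate = 1 - len(observed) / len(calls)
    if not observed:
        return missing_rate, 0.0
    alt_freq = sum(observed) / (2 * len(observed))   # frequency of the counted allele
    maf = min(alt_freq, 1 - alt_freq)                # monomorphic markers give 0
    return missing_rate, maf


def passes_qc(calls, min_maf=0.05, max_missing=0.10):
    """Keep a marker only if it is polymorphic enough and not too often missing."""
    missing_rate, maf = marker_stats(calls)
    return maf >= min_maf and missing_rate <= max_missing


# Hypothetical markers scored on ten lines
markers = {
    "snp_mono": [0] * 10,
    "snp_ok": [0, 0, 0, 0, 0, 0, 0, 0, 1, 2],
    "snp_missing": [0, 2, None, None, 1, 0, 2, 1, 0, 2],
}
kept = [name for name, calls in markers.items() if passes_qc(calls)]
```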
Controlling False Positives
• Population structure—allele frequency differences among individuals
due to local adaptation or diversifying selection
• Familial relatedness—allele frequency differences among individuals
due to recent co-ancestry
• If not properly controlled both can cause spurious associations in
GWAS
Controlling False Positives
• Population structure
• STRUCTURE (Q-matrix)
• Principal Component Analysis (PC matrix)
• Familial relatedness
• Kinship matrix
Association Mapping Pipeline
What is a significant association?
• Bonferroni correction: procedure to control the family-wise error rate (FWER), i.e., the probability of making one or more type I errors
– Simplest and most conservative method to control FWER
– Significance threshold calculated as α/n, where n is the number of hypotheses (i.e., SNPs tested)
• False discovery rate (FDR): procedure to control the expected proportion of false discoveries among the tests declared significant
– Less stringent than Bonferroni
– The q-value is the FDR analogue of the p-value; e.g., q = 0.10 means about 10 expected false discoveries per 100 tests declared significant
• Use the list of p-values from ALL SNP tests as input to the R function p.adjust or to packages such as qvalue, fdrtool, and others
Slide adapted from Prof. Jim Holland
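Both corrections can be sketched in a few lines. The Benjamini-Hochberg adjustment below follows the same step-up calculation as `p.adjust(method = "BH")` in R; it is not the q-value estimator of the qvalue package, which additionally estimates the proportion of true null hypotheses. Function names and p-values are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that controls the family-wise error rate."""
    return alpha / n_tests


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])   # indices from smallest to largest p
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values from four SNP tests
pvals = [0.01, 0.04, 0.03, 0.005]
threshold = bonferroni_threshold(0.05, len(pvals))
significant_bonferroni = [p < threshold for p in pvals]
significant_fdr = [q <= 0.10 for q in bh_adjust(pvals)]
```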
Genome-wide Association Mapping Results
Manhattan plot: summarize GWAS results
Genome-wide Association Mapping Results
QQ-plot: assess performance of the statistical model
[Figure panels: simple model without correction for population structure; mixed linear model]
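The coordinates of a QQ-plot, and a single-number summary of how far the observed p-values depart from the null expectation, can be sketched as below. The genomic inflation factor is an addition for illustration and is not mentioned on the slide; the expected quantiles use the assumed plotting positions (i - 0.5)/n, and function names are hypothetical. P-values must be strictly between 0 and 1.

```python
import math
from statistics import NormalDist, median


def qq_points(pvalues):
    """Return (expected, observed) pairs of -log10(p), sorted ascending."""
    n = len(pvalues)
    observed = sorted(-math.log10(p) for p in pvalues)
    expected = sorted(-math.log10((i - 0.5) / n) for i in range(1, n + 1))
    return list(zip(expected, observed))


def genomic_inflation(pvalues):
    """Median observed 1-df chi-square statistic divided by its null median."""
    std_normal = NormalDist()
    # A two-sided p-value p corresponds to the chi-square statistic z^2 with z = inv_cdf(p / 2)
    chi2 = [std_normal.inv_cdf(p / 2) ** 2 for p in pvalues]
    null_median = std_normal.inv_cdf(0.25) ** 2   # about 0.455
    return median(chi2) / null_median


# Hypothetical well-behaved p-values (uniform on a grid)
n = 1000
null_pvals = [(i - 0.5) / n for i in range(1, n + 1)]
points = qq_points(null_pvals)
lam = genomic_inflation(null_pvals)
```

Points lying along the diagonal and an inflation factor near 1 suggest that the model controls false positives; a curve that lifts off the diagonal early suggests uncorrected structure or relatedness.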
Genome-wide Association Mapping Results
GWAS results for all SNPs that were analyzed
Software for GWAS
• TASSEL
• GAPIT
• PLINK
• GEMMA
• FarmCPU
• JMP Genomics
• https://omictools.com/gwas-category
• Tutorials
– http://www.slideshare.net/AvjinderSingh/basic-tutorial-of-association-mapping-by-avjinder-kaler
– http://www.slideshare.net/AvjinderSingh/tutorial-for-association-mapping-with-farm-cpu
