SlideShare ist ein Scribd-Unternehmen logo
1 von 37
A Simulation-Based Approach to
Teaching Population Genetics:
R as a Teaching Platform
Bruce J. Cochrane
Department of Zoology/Biology
Miami University
Oxford OH
Two Time Points
• 1974
o Lots of Theory
o Not much Data
o Allozymes Rule
• 2013
o Even More Theory
o Lots of Data
o Sequences, -omics, ???
The Problem
• The basic approach hasn’t changed, e. g.
o Hardy Weinberg
o Mutation
o Selection
o Drift
o Etc.
• Much of it is deterministic
And
• There is little initial connection with real data
o The world seems to revolve around A and a
• At least in my hands, it doesn’t work
The Alternative
• Take a numerical (as opposed to analytical) approach
• Focus on understanding random variables and distributions
• Incorporate “big data”
• Introduce current approaches – coalescence, Bayesian
Analysis, etc. – in this context
Why R?
• Open Source
• Platform-independent (Windows, Mac, Linux)
• Object oriented
• Facile Graphics
• Web-oriented
• Packages available for specialized functions
Where We are Going
• The Basics – Distributions, chi-square and the Hardy Weinberg
Equilibrium
• Simulating the Ewens-Watterson Distribution
• Coalescence and summary statistics
• What works and what doesn’t
The RStudio Interface
The Normal Distribution
dat.norm <-rnorm(1000)
hist(dat.norm,freq=FALSE,ylim=c(0,.5))
curve(dnorm(x,0,1),add=TRUE,col="red")
mean(dat.norm)
var(dat.norm)
> mean(dat.norm)
[1] 0.003546691
> var(dat.norm)
[1] 1.020076
Sample Size and Cutoff Values
n <-c(10,30,100,1000)
res <-sapply(n,ndist)
colnames(res)=n
res
> res
10 30 100 1000
2.5% -1.110054 -1.599227 -1.713401 -1.981675
97.5% 2.043314 1.679208 1.729095 1.928852
What is chi-square All About?
xsq <-rchisq(10000,1)
hist(xsq, main="Chi Square Distribution, N=1000, 1 d. f",xlab="Value")
p05 <-quantile(xsq,.95)
abline(v=p05, col="red")
p05
95%
3.867886
Simple Generation of Critical Values
d <-1:10
chicrit <-qchisq(.95,d)
chitab <-cbind(d,chicrit)
chitab
d chicrit
[1,] 1 3.841459
[2,] 2 5.991465
[3,] 3 7.814728
[4,] 4 9.487729
[5,] 5 11.070498
[6,] 6 12.591587
[7,] 7 14.067140
[8,] 8 15.507313
[9,] 9 16.918978
[10,] 10 18.307038
Calculating chi-squared
The function
function(obs,exp,df=1){
chi <-sum((obs-exp)^2/exp)
pr <-1-pchisq(chi,df)
c(chi,pr)
A sample function call
obs <-c(315,108,101,32)
z <-sum(obs)/16
exp <-c(9*z,3*z,3*z,z)
chixw(obs,exp,3)
The output
chi-square = 0.47
probability(<.05) = 0.93
deg. freedom = 3
Basic Hardy Weinberg Calculations
The Biallelic Case
Sample input
obs <-c(13,35,70)
hw(obs)
Output
[1] "p= 0.2585 q= 0.7415"
obs exp
[1,] 13 8
[2,] 35 45
[3,] 70 65
[1] "chi squared = 5.732 p = 0.017 with 1 d. f."
Illustrating With Ternary Plots
library(HardyWeinberg)
dat <-(HWData(100,100))
gdist <-dat$Xt #create a variable with the working data
HWTernaryPlot(gdist, hwcurve=TRUE,addmarkers=FALSE,region=0,vbounds=FALSE,axis=2,
vertexlab=c("0","","1"),main="Theoretical Relationship",cex.main=1.5)
Access to Data
• Direct access of data
o HapMap
o Dryad
o Others
• Manipulation and visualization within R
• Preparation for export (e. g. Genalex)
Direct Access of HapMap Data
library (chopsticks)
chr21 <-read.HapMap.data("http://hapmap.ncbi.nlm.nih.gov/downloads/genotypes/latest_phaseII_ncbi_b36/fwd_strand/
non-redundant/genotypes_chr21_YRI_r24_nr.b36_fwd.txt.gz")
chr21.sum <-summary(chr21$snp.data)
head(chr21.sum)
Calls Call.rate MAF P.AA P.AB P.BB z.HWE
rs885550 90 1.0000000 0.09444444 0.8111111 0.1888889 0.00000000 0.9894243
rs1468022 90 1.0000000 0.00000000 0.0000000 0.0000000 1.00000000 NA
rs169758 90 1.0000000 0.31666667 0.4000000 0.5666667 0.03333333 2.9349509
rs150482 89 0.9888889 0.00000000 0.0000000 0.0000000 1.00000000 NA
rs12627229 89 0.9888889 0.00000000 0.0000000 0.0000000 1.00000000 NA
rs9982283 90 1.0000000 0.05555556 0.0000000 0.1111111 0.88888889 0.5580490
Distribution of Hardy Weinberg Deviation on
Chromosome 22 Markers
And Determining the Number of Outliers
nsnps <- length(hwdist)
quant <-quantile(hwdist,c(.025,.975))
low <-length(hwdist[hwdist<quant[1]])
high <-length(hwdist[hwdist>quant[2]])
accept <-nsnps-low-high
low; accept; high
[1] 982
[1] 37330
[1] 976
Sampling and Plotting Deviation from Hardy Weinberg
chr21.poly <-na.omit(chr21.sum) #remove all NA's (fixed SNPs)
chr21.samp <-sample(nrow(chr21.poly),1000, replace=FALSE)
plot(chr21.poly$z.HWE[chr21.samp])
Plotting F for Randomly Sampled Markers
chr21.sub <-chr21.poly[chr21.samp,]
Hexp <- 2*chr21.sub$MAF*(1-chr21.sub$MAF)
Fi <- 1-(chr21.sub$P.AB/Hexp)
plot(Fi,xlab="Locus",ylab="F")
Additional Information
head(chr21$snp.support)
dbSNPalleles Assignment Chromosome Position Strand
rs885550 C/T C/T chr21 9887804 +
rs1468022 C/T C/T chr21 9887958 +
rs169758 C/T C/T chr21 9928786 +
rs150482 A/G A/G chr21 9932218 +
rs12627229 C/T C/T chr21 9935312 +
rs9982283 C/T C/T chr21 9935844 +
The Ewens- Watterson Test
• Based on Ewens (1977) derivation of the theoretical
equilibrium distribution of allele frequencies under the
infinite allele model.
• Uses expected homozygosity (Σp2) as test statistic
• Compares observed homozygosity in sample to expected
distribution in n random simulations
• Observed data are
o N=number of samples
o k= number of alleles
o Allele Frequency Distribution
Classic Data (Keith et al., 1985)
• Xdh in D. pseudoobscura, analyzed by sequential
electrophoresis
• 89 samples, 15 distinct alleles
Testing the Data
1. Input the Data
Xdh <- c(52,9,8,4,4,2,2,1,1,1,1,1,1,1,1) # vector of allele numbers
length(Xdh) # number of alleles = k
sum(Xdh) #number of samples = n
2. Calculate Expected Homozygosity
Fx <-fhat(Xdh)
3. Run the Analysis
Ewens(n,k,Fx)
The Result
With Newer (and more complete) Data
Lactase Haplotypes in European and African Populations
1. Download data for Lactase gene from HapMap (CEU, YRI)
o 25 SNPS
o 48,000 KB
2. Determine numbers of haplotypes and frequencies for each
3. Apply Ewens-Waterson test to each.
The Results
par(mfrow=c(2,1))
pops <-c("ceu","yri")
sapply(pops,hapE)
CEU
YRI
Some Basic Statistics from Sequence Data
library(seqinR)
library(pegas)
dat <-read.fasta(file="./Data/FGB.fas")
#additional code needed to rearrange data
sites <-seg.sites(dat.dna)
nd <-nuc.div(dat.dna)
taj <-tajima.test(dat.dna)
length(sites); nd;taj$D
[1] 23
[1] 0.007561061
[1] -0.7759744
Intron sequences, 433 nucleotides each
from Peters JL, Roberts TE, Winker K, McCracken KG (2012)
PLoS ONE 7(2): e31972. doi:10.1371/journal.pone.0031972
Coalescence I – A Bunch of Trees
trees <-read.tree("http://dl.dropbox.com/u/9752688/ZOO%20422P/R/msfiles/tree.1.txt")
plot(trees[1:9],layout=9)
Coalescence II - MRCA
msout.1.txt <-system("./ms 10 1000 -t .1 -L", intern=TRUE)
ms.1 <- read.ms.output(msout.1.txt)
hist(ms.1$times[,1],main="MRCA, Theta=0.1",xlab="4N")
Coalescence III – Summary Statistics
system("./ms 50 1000 -s 10 -L | ./sample_stats >samp.ss")
# 1000 simulations of 50 samples, with number of sites set to 10
ss.out <-read_ss("samp.ss")
head(ss.out)
pi S D thetaH H
1. 1.825306 10 -0.521575 2.419592 -0.594286
2. 2.746939 10 0.658832 2.518367 0.228571
3. 3.837551 10 2.055665 3.631837 0.205714
4. 2.985306 10 0.964128 2.280000 0.705306
5. 1.577959 10 -0.838371 5.728163 -4.150204
6. 2.991020 10 0.971447 3.539592 -0.548571
Coalescence IV – Distribution of Summary Statistics
hist(ss.out$D,main="Distribution of Tajima's D (N=1000)",xlab="D")
abline(v=mean(ss.out$D),col="blue")
abline(v=quantile(ss.out$D,c(.025,.975)),col="red")
Other Uses
• Data Manipulation
o Conversion of HapMap Data for use elsewhere (e. g. Genalex)
o Other data sources via API’s (e. g. package rdryad)
• Other Analyses
o Hierarchical F statistics (hierfstat)
o Haplotype networking (pegas)
o Phylogenetics (ape, phyclust, others)
o Approximate Bayesian Computation (abc)
• Access for students
o Scripts available via LMS
o Course specific functions can be accessed (source("http://db.tt/A6tReYEC")
o Notes with embedded code in HTML (Rstudio, knitr)
Sample HTML Rendering
Challenges
• Some coding required
• Data Structures are a challenge
• Packages are heterogeneous
• Students resist coding
Nevertheless
• Fundamental concepts can be easily visualized graphically
• Real data can be incorporated from the outset
• It takes students from fundamental concepts to real-world
applications and analyses
For Further information:
cochrabj@miamioh.edu
Functions
http://db.tt/A6tReYEC

Weitere ähnliche Inhalte

Was ist angesagt?

Multiplicative Decompositions of Stochastic Distributions and Their Applicat...
 Multiplicative Decompositions of Stochastic Distributions and Their Applicat... Multiplicative Decompositions of Stochastic Distributions and Their Applicat...
Multiplicative Decompositions of Stochastic Distributions and Their Applicat...Toshiyuki Shimono
 
Laplace's Demon: seminar #1
Laplace's Demon: seminar #1Laplace's Demon: seminar #1
Laplace's Demon: seminar #1Christian Robert
 
Theory to consider an inaccurate testing and how to determine the prior proba...
Theory to consider an inaccurate testing and how to determine the prior proba...Theory to consider an inaccurate testing and how to determine the prior proba...
Theory to consider an inaccurate testing and how to determine the prior proba...Toshiyuki Shimono
 
Computing the Nucleon Spin from Lattice QCD
Computing the Nucleon Spin from Lattice QCDComputing the Nucleon Spin from Lattice QCD
Computing the Nucleon Spin from Lattice QCDChristos Kallidonis
 
An overview of Bayesian testing
An overview of Bayesian testingAn overview of Bayesian testing
An overview of Bayesian testingChristian Robert
 
Can we estimate a constant?
Can we estimate a constant?Can we estimate a constant?
Can we estimate a constant?Christian Robert
 
Multiple estimators for Monte Carlo approximations
Multiple estimators for Monte Carlo approximationsMultiple estimators for Monte Carlo approximations
Multiple estimators for Monte Carlo approximationsChristian Robert
 
10.1.1.474.2861
10.1.1.474.286110.1.1.474.2861
10.1.1.474.2861pkavitha
 
ABC short course: final chapters
ABC short course: final chaptersABC short course: final chapters
ABC short course: final chaptersChristian Robert
 
ESL 4.4.3-4.5: Logistic Reression (contd.) and Separating Hyperplane
ESL 4.4.3-4.5: Logistic Reression (contd.) and Separating HyperplaneESL 4.4.3-4.5: Logistic Reression (contd.) and Separating Hyperplane
ESL 4.4.3-4.5: Logistic Reression (contd.) and Separating HyperplaneShinichi Tamura
 
Inference in generative models using the Wasserstein distance [[INI]
Inference in generative models using the Wasserstein distance [[INI]Inference in generative models using the Wasserstein distance [[INI]
Inference in generative models using the Wasserstein distance [[INI]Christian Robert
 
"reflections on the probability space induced by moment conditions with impli...
"reflections on the probability space induced by moment conditions with impli..."reflections on the probability space induced by moment conditions with impli...
"reflections on the probability space induced by moment conditions with impli...Christian Robert
 
Monte Carlo in Montréal 2017
Monte Carlo in Montréal 2017Monte Carlo in Montréal 2017
Monte Carlo in Montréal 2017Christian Robert
 
ABC short course: model choice chapter
ABC short course: model choice chapterABC short course: model choice chapter
ABC short course: model choice chapterChristian Robert
 

Was ist angesagt? (20)

Multiplicative Decompositions of Stochastic Distributions and Their Applicat...
 Multiplicative Decompositions of Stochastic Distributions and Their Applicat... Multiplicative Decompositions of Stochastic Distributions and Their Applicat...
Multiplicative Decompositions of Stochastic Distributions and Their Applicat...
 
Laplace's Demon: seminar #1
Laplace's Demon: seminar #1Laplace's Demon: seminar #1
Laplace's Demon: seminar #1
 
Theory to consider an inaccurate testing and how to determine the prior proba...
Theory to consider an inaccurate testing and how to determine the prior proba...Theory to consider an inaccurate testing and how to determine the prior proba...
Theory to consider an inaccurate testing and how to determine the prior proba...
 
Computing the Nucleon Spin from Lattice QCD
Computing the Nucleon Spin from Lattice QCDComputing the Nucleon Spin from Lattice QCD
Computing the Nucleon Spin from Lattice QCD
 
ABC-Gibbs
ABC-GibbsABC-Gibbs
ABC-Gibbs
 
the ABC of ABC
the ABC of ABCthe ABC of ABC
the ABC of ABC
 
ABC-Gibbs
ABC-GibbsABC-Gibbs
ABC-Gibbs
 
An overview of Bayesian testing
An overview of Bayesian testingAn overview of Bayesian testing
An overview of Bayesian testing
 
Can we estimate a constant?
Can we estimate a constant?Can we estimate a constant?
Can we estimate a constant?
 
asymptotics of ABC
asymptotics of ABCasymptotics of ABC
asymptotics of ABC
 
Multiple estimators for Monte Carlo approximations
Multiple estimators for Monte Carlo approximationsMultiple estimators for Monte Carlo approximations
Multiple estimators for Monte Carlo approximations
 
10.1.1.474.2861
10.1.1.474.286110.1.1.474.2861
10.1.1.474.2861
 
ABC-Gibbs
ABC-GibbsABC-Gibbs
ABC-Gibbs
 
ABC short course: final chapters
ABC short course: final chaptersABC short course: final chapters
ABC short course: final chapters
 
ESL 4.4.3-4.5: Logistic Reression (contd.) and Separating Hyperplane
ESL 4.4.3-4.5: Logistic Reression (contd.) and Separating HyperplaneESL 4.4.3-4.5: Logistic Reression (contd.) and Separating Hyperplane
ESL 4.4.3-4.5: Logistic Reression (contd.) and Separating Hyperplane
 
Inference in generative models using the Wasserstein distance [[INI]
Inference in generative models using the Wasserstein distance [[INI]Inference in generative models using the Wasserstein distance [[INI]
Inference in generative models using the Wasserstein distance [[INI]
 
"reflections on the probability space induced by moment conditions with impli...
"reflections on the probability space induced by moment conditions with impli..."reflections on the probability space induced by moment conditions with impli...
"reflections on the probability space induced by moment conditions with impli...
 
Monte Carlo in Montréal 2017
Monte Carlo in Montréal 2017Monte Carlo in Montréal 2017
Monte Carlo in Montréal 2017
 
Intro to ABC
Intro to ABCIntro to ABC
Intro to ABC
 
ABC short course: model choice chapter
ABC short course: model choice chapterABC short course: model choice chapter
ABC short course: model choice chapter
 

Ähnlich wie Teaching Population Genetics with R

Delayed acceptance for Metropolis-Hastings algorithms
Delayed acceptance for Metropolis-Hastings algorithmsDelayed acceptance for Metropolis-Hastings algorithms
Delayed acceptance for Metropolis-Hastings algorithmsChristian Robert
 
Exploring temporal graph data with Python: 
a study on tensor decomposition o...
Exploring temporal graph data with Python: 
a study on tensor decomposition o...Exploring temporal graph data with Python: 
a study on tensor decomposition o...
Exploring temporal graph data with Python: 
a study on tensor decomposition o...André Panisson
 
Introduction to Chainer Chemistry
Introduction to Chainer ChemistryIntroduction to Chainer Chemistry
Introduction to Chainer ChemistryPreferred Networks
 
Relaxation methods for the matrix exponential on large networks
Relaxation methods for the matrix exponential on large networksRelaxation methods for the matrix exponential on large networks
Relaxation methods for the matrix exponential on large networksDavid Gleich
 
Revisiting the fundamental concepts and assumptions of statistics pps
Revisiting the fundamental concepts and assumptions of statistics ppsRevisiting the fundamental concepts and assumptions of statistics pps
Revisiting the fundamental concepts and assumptions of statistics ppsD Dutta Roy
 
TENSOR DECOMPOSITION WITH PYTHON
TENSOR DECOMPOSITION WITH PYTHONTENSOR DECOMPOSITION WITH PYTHON
TENSOR DECOMPOSITION WITH PYTHONAndré Panisson
 
Imecs2012 pp440 445
Imecs2012 pp440 445Imecs2012 pp440 445
Imecs2012 pp440 445Rasha Orban
 
Dynamics of structures with uncertainties
Dynamics of structures with uncertaintiesDynamics of structures with uncertainties
Dynamics of structures with uncertaintiesUniversity of Glasgow
 
block-mdp-masters-defense.pdf
block-mdp-masters-defense.pdfblock-mdp-masters-defense.pdf
block-mdp-masters-defense.pdfJunghyun Lee
 
Efficient Simulations for Contamination of Groundwater Aquifers under Uncerta...
Efficient Simulations for Contamination of Groundwater Aquifers under Uncerta...Efficient Simulations for Contamination of Groundwater Aquifers under Uncerta...
Efficient Simulations for Contamination of Groundwater Aquifers under Uncerta...Alexander Litvinenko
 
Complex models in ecology: challenges and solutions
Complex models in ecology: challenges and solutionsComplex models in ecology: challenges and solutions
Complex models in ecology: challenges and solutionsPeter Solymos
 
Hands-On Algorithms for Predictive Modeling
Hands-On Algorithms for Predictive ModelingHands-On Algorithms for Predictive Modeling
Hands-On Algorithms for Predictive ModelingArthur Charpentier
 
K-adaptive partitioning for survival data
K-adaptive partitioning for survival dataK-adaptive partitioning for survival data
K-adaptive partitioning for survival data수행 어
 
BASIC STATISTICS AND PROBABILITY.pptx
BASIC STATISTICS AND PROBABILITY.pptxBASIC STATISTICS AND PROBABILITY.pptx
BASIC STATISTICS AND PROBABILITY.pptxRanjuBijoy
 
Compressed learning for time series classification
Compressed learning for time series classificationCompressed learning for time series classification
Compressed learning for time series classification學翰 施
 

Ähnlich wie Teaching Population Genetics with R (20)

Delayed acceptance for Metropolis-Hastings algorithms
Delayed acceptance for Metropolis-Hastings algorithmsDelayed acceptance for Metropolis-Hastings algorithms
Delayed acceptance for Metropolis-Hastings algorithms
 
Exploring temporal graph data with Python: 
a study on tensor decomposition o...
Exploring temporal graph data with Python: 
a study on tensor decomposition o...Exploring temporal graph data with Python: 
a study on tensor decomposition o...
Exploring temporal graph data with Python: 
a study on tensor decomposition o...
 
Introduction to Chainer Chemistry
Introduction to Chainer ChemistryIntroduction to Chainer Chemistry
Introduction to Chainer Chemistry
 
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
 
MUMS Opening Workshop - Extrapolation: The Art of Connecting Model-Based Pred...
MUMS Opening Workshop - Extrapolation: The Art of Connecting Model-Based Pred...MUMS Opening Workshop - Extrapolation: The Art of Connecting Model-Based Pred...
MUMS Opening Workshop - Extrapolation: The Art of Connecting Model-Based Pred...
 
Relaxation methods for the matrix exponential on large networks
Relaxation methods for the matrix exponential on large networksRelaxation methods for the matrix exponential on large networks
Relaxation methods for the matrix exponential on large networks
 
Technical
TechnicalTechnical
Technical
 
Revisiting the fundamental concepts and assumptions of statistics pps
Revisiting the fundamental concepts and assumptions of statistics ppsRevisiting the fundamental concepts and assumptions of statistics pps
Revisiting the fundamental concepts and assumptions of statistics pps
 
TENSOR DECOMPOSITION WITH PYTHON
TENSOR DECOMPOSITION WITH PYTHONTENSOR DECOMPOSITION WITH PYTHON
TENSOR DECOMPOSITION WITH PYTHON
 
Imecs2012 pp440 445
Imecs2012 pp440 445Imecs2012 pp440 445
Imecs2012 pp440 445
 
Dynamics of structures with uncertainties
Dynamics of structures with uncertaintiesDynamics of structures with uncertainties
Dynamics of structures with uncertainties
 
block-mdp-masters-defense.pdf
block-mdp-masters-defense.pdfblock-mdp-masters-defense.pdf
block-mdp-masters-defense.pdf
 
Efficient Simulations for Contamination of Groundwater Aquifers under Uncerta...
Efficient Simulations for Contamination of Groundwater Aquifers under Uncerta...Efficient Simulations for Contamination of Groundwater Aquifers under Uncerta...
Efficient Simulations for Contamination of Groundwater Aquifers under Uncerta...
 
Complex models in ecology: challenges and solutions
Complex models in ecology: challenges and solutionsComplex models in ecology: challenges and solutions
Complex models in ecology: challenges and solutions
 
Hands-On Algorithms for Predictive Modeling
Hands-On Algorithms for Predictive ModelingHands-On Algorithms for Predictive Modeling
Hands-On Algorithms for Predictive Modeling
 
K-adaptive partitioning for survival data
K-adaptive partitioning for survival dataK-adaptive partitioning for survival data
K-adaptive partitioning for survival data
 
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
 
Families of Triangular Norm Based Kernel Function and Its Application to Kern...
Families of Triangular Norm Based Kernel Function and Its Application to Kern...Families of Triangular Norm Based Kernel Function and Its Application to Kern...
Families of Triangular Norm Based Kernel Function and Its Application to Kern...
 
BASIC STATISTICS AND PROBABILITY.pptx
BASIC STATISTICS AND PROBABILITY.pptxBASIC STATISTICS AND PROBABILITY.pptx
BASIC STATISTICS AND PROBABILITY.pptx
 
Compressed learning for time series classification
Compressed learning for time series classificationCompressed learning for time series classification
Compressed learning for time series classification
 

Kürzlich hochgeladen

Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfPoh-Sun Goh
 
Understanding Accommodations and Modifications
Understanding  Accommodations and ModificationsUnderstanding  Accommodations and Modifications
Understanding Accommodations and ModificationsMJDuyan
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxDenish Jangid
 
REMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptxREMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptxDr. Ravikiran H M Gowda
 
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...pradhanghanshyam7136
 
Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jisc
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.pptRamjanShidvankar
 
FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024Elizabeth Walsh
 
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...Nguyen Thanh Tu Collection
 
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptxHMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptxmarlenawright1
 
SOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning PresentationSOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning Presentationcamerronhm
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibitjbellavia9
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptxMaritesTamaniVerdade
 
Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Jisc
 
Fostering Friendships - Enhancing Social Bonds in the Classroom
Fostering Friendships - Enhancing Social Bonds  in the ClassroomFostering Friendships - Enhancing Social Bonds  in the Classroom
Fostering Friendships - Enhancing Social Bonds in the ClassroomPooky Knightsmith
 
How to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptxHow to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptxCeline George
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxRamakrishna Reddy Bijjam
 
Plant propagation: Sexual and Asexual propapagation.pptx
Plant propagation: Sexual and Asexual propapagation.pptxPlant propagation: Sexual and Asexual propapagation.pptx
Plant propagation: Sexual and Asexual propapagation.pptxUmeshTimilsina1
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...Nguyen Thanh Tu Collection
 
Wellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptxWellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptxJisc
 

Kürzlich hochgeladen (20)

Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdf
 
Understanding Accommodations and Modifications
Understanding  Accommodations and ModificationsUnderstanding  Accommodations and Modifications
Understanding Accommodations and Modifications
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
REMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptxREMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptx
 
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
 
Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024
 
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
 
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptxHMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
 
SOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning PresentationSOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning Presentation
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibit
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
 
Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)
 
Fostering Friendships - Enhancing Social Bonds in the Classroom
Fostering Friendships - Enhancing Social Bonds  in the ClassroomFostering Friendships - Enhancing Social Bonds  in the Classroom
Fostering Friendships - Enhancing Social Bonds in the Classroom
 
How to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptxHow to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptx
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docx
 
Plant propagation: Sexual and Asexual propapagation.pptx
Plant propagation: Sexual and Asexual propapagation.pptxPlant propagation: Sexual and Asexual propapagation.pptx
Plant propagation: Sexual and Asexual propapagation.pptx
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
 
Wellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptxWellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptx
 

Teaching Population Genetics with R

  • 1. A Simulation-Based Approach to Teaching Population Genetics: R as a Teaching Platform Bruce J. Cochrane Department of Zoology/Biology Miami University Oxford OH
  • 2. Two Time Points • 1974 o Lots of Theory o Not much Data o Allozymes Rule • 2013 o Even More Theory o Lots of Data o Sequences, -omics, ???
  • 3. The Problem • The basic approach hasn’t changed, e. g. o Hardy Weinberg o Mutation o Selection o Drift o Etc. • Much of it is deterministic
  • 4. And • There is little initial connection with real data o The world seems to revolve around A and a • At least in my hands, it doesn’t work
  • 5. The Alternative • Take a numerical (as opposed to analytical) approach • Focus on understanding random variables and distributions • Incorporate “big data” • Introduce current approaches – coalescence, Bayesian Analysis, etc. – in this context
  • 6. Why R? • Open Source • Platform-independent (Windows, Mac, Linux) • Object oriented • Facile Graphics • Web-oriented • Packages available for specialized functions
  • 7. Where We are Going • The Basics – Distributions, chi-square and the Hardy Weinberg Equilibrium • Simulating the Ewens-Watterson Distribution • Coalescence and summary statistics • What works and what doesn’t
  • 9. The Normal Distribution dat.norm <-rnorm(1000) hist(dat.norm,freq=FALSE,ylim=c(0,.5)) curve(dnorm(x,0,1),add=TRUE,col="red") mean(dat.norm) var(dat.norm) > mean(dat.norm) [1] 0.003546691 > var(dat.norm) [1] 1.020076
  • 10. Sample Size and Cutoff Values n <-c(10,30,100,1000) res <-sapply(n,ndist) colnames(res)=n res > res 10 30 100 1000 2.5% -1.110054 -1.599227 -1.713401 -1.981675 97.5% 2.043314 1.679208 1.729095 1.928852
  • 11. What is chi-square All About? xsq <-rchisq(10000,1) hist(xsq, main="Chi Square Distribution, N=1000, 1 d. f",xlab="Value") p05 <-quantile(xsq,.95) abline(v=p05, col="red") p05 95% 3.867886
  • 12. Simple Generation of Critical Values d <-1:10 chicrit <-qchisq(.95,d) chitab <-cbind(d,chicrit) chitab d chicrit [1,] 1 3.841459 [2,] 2 5.991465 [3,] 3 7.814728 [4,] 4 9.487729 [5,] 5 11.070498 [6,] 6 12.591587 [7,] 7 14.067140 [8,] 8 15.507313 [9,] 9 16.918978 [10,] 10 18.307038
  • 13. Calculating chi-squared The function function(obs,exp,df=1){ chi <-sum((obs-exp)^2/exp) pr <-1-pchisq(chi,df) c(chi,pr) A sample function call obs <-c(315,108,101,32) z <-sum(obs)/16 exp <-c(9*z,3*z,3*z,z) chixw(obs,exp,3) The output chi-square = 0.47 probability(<.05) = 0.93 deg. freedom = 3
  • 14. Basic Hardy Weinberg Calculations The Biallelic Case Sample input obs <-c(13,35,70) hw(obs) Output [1] "p= 0.2585 q= 0.7415" obs exp [1,] 13 8 [2,] 35 45 [3,] 70 65 [1] "chi squared = 5.732 p = 0.017 with 1 d. f."
  • 15. Illustrating With Ternary Plots library(HardyWeinberg) dat <-(HWData(100,100)) gdist <-dat$Xt #create a variable with the working data HWTernaryPlot(gdist, hwcurve=TRUE,addmarkers=FALSE,region=0,vbounds=FALSE,axis=2, vertexlab=c("0","","1"),main="Theoretical Relationship",cex.main=1.5)
  • 16. Access to Data • Direct access of data o HapMap o Dryad o Others • Manipulation and visualization within R • Preparation for export (e. g. Genalex)
  • 17. Direct Access of HapMap Data library (chopsticks) chr21 <-read.HapMap.data("http://hapmap.ncbi.nlm.nih.gov/downloads/genotypes/latest_phaseII_ncbi_b36/fwd_strand/ non-redundant/genotypes_chr21_YRI_r24_nr.b36_fwd.txt.gz") chr21.sum <-summary(chr21$snp.data) head(chr21.sum) Calls Call.rate MAF P.AA P.AB P.BB z.HWE rs885550 90 1.0000000 0.09444444 0.8111111 0.1888889 0.00000000 0.9894243 rs1468022 90 1.0000000 0.00000000 0.0000000 0.0000000 1.00000000 NA rs169758 90 1.0000000 0.31666667 0.4000000 0.5666667 0.03333333 2.9349509 rs150482 89 0.9888889 0.00000000 0.0000000 0.0000000 1.00000000 NA rs12627229 89 0.9888889 0.00000000 0.0000000 0.0000000 1.00000000 NA rs9982283 90 1.0000000 0.05555556 0.0000000 0.1111111 0.88888889 0.5580490
  • 18. Distribution of Hardy Weinberg Deviation on Chromosome 22 Markers
  • 19. And Determining the Number of Outliers nsnps <- length(hwdist) quant <-quantile(hwdist,c(.025,.975)) low <-length(hwdist[hwdist<quant[1]]) high <-length(hwdist[hwdist>quant[2]]) accept <-nsnps-low-high low; accept; high [1] 982 [1] 37330 [1] 976
  • 20. Sampling and Plotting Deviation from Hardy Weinberg chr21.poly <-na.omit(chr21.sum) #remove all NA's (fixed SNPs) chr21.samp <-sample(nrow(chr21.poly),1000, replace=FALSE) plot(chr21.poly$z.HWE[chr21.samp])
  • 21. Plotting F for Randomly Sampled Markers chr21.sub <-chr21.poly[chr21.samp,] Hexp <- 2*chr21.sub$MAF*(1-chr21.sub$MAF) Fi <- 1-(chr21.sub$P.AB/Hexp) plot(Fi,xlab="Locus",ylab="F")
  • 22. Additional Information head(chr21$snp.support) dbSNPalleles Assignment Chromosome Position Strand rs885550 C/T C/T chr21 9887804 + rs1468022 C/T C/T chr21 9887958 + rs169758 C/T C/T chr21 9928786 + rs150482 A/G A/G chr21 9932218 + rs12627229 C/T C/T chr21 9935312 + rs9982283 C/T C/T chr21 9935844 +
  • 23. The Ewens- Watterson Test • Based on Ewens (1977) derivation of the theoretical equilibrium distribution of allele frequencies under the infinite allele model. • Uses expected homozygosity (Σp2) as test statistic • Compares observed homozygosity in sample to expected distribution in n random simulations • Observed data are o N=number of samples o k= number of alleles o Allele Frequency Distribution
  • 24. Classic Data (Keith et al., 1985) • Xdh in D. pseudoobscura, analyzed by sequential electrophoresis • 89 samples, 15 distinct alleles
  • 25. Testing the Data 1. Input the Data Xdh <- c(52,9,8,4,4,2,2,1,1,1,1,1,1,1,1) # vector of allele numbers length(Xdh) # number of alleles = k sum(Xdh) #number of samples = n 2. Calculate Expected Homozygosity Fx <-fhat(Xdh) 3. Run the Analysis Ewens(n,k,Fx)
  • 27. With Newer (and more complete) Data Lactase Haplotypes in European and African Populations 1. Download data for Lactase gene from HapMap (CEU, YRI) o 25 SNPS o 48,000 KB 2. Determine numbers of haplotypes and frequencies for each 3. Apply Ewens-Waterson test to each.
  • 29. Some Basic Statistics from Sequence Data library(seqinR) library(pegas) dat <-read.fasta(file="./Data/FGB.fas") #additional code needed to rearrange data sites <-seg.sites(dat.dna) nd <-nuc.div(dat.dna) taj <-tajima.test(dat.dna) length(sites); nd;taj$D [1] 23 [1] 0.007561061 [1] -0.7759744 Intron sequences, 433 nucleotides each from Peters JL, Roberts TE, Winker K, McCracken KG (2012) PLoS ONE 7(2): e31972. doi:10.1371/journal.pone.0031972
  • 30. Coalescence I – A Bunch of Trees trees <-read.tree("http://dl.dropbox.com/u/9752688/ZOO%20422P/R/msfiles/tree.1.txt") plot(trees[1:9],layout=9)
  • 31. Coalescence II - MRCA msout.1.txt <-system("./ms 10 1000 -t .1 -L", intern=TRUE) ms.1 <- read.ms.output(msout.1.txt) hist(ms.1$times[,1],main="MRCA, Theta=0.1",xlab="4N")
  • 32. Coalescence III – Summary Statistics system("./ms 50 1000 -s 10 -L | ./sample_stats >samp.ss") # 1000 simulations of 50 samples, with number of sites set to 10 ss.out <-read_ss("samp.ss") head(ss.out) pi S D thetaH H 1. 1.825306 10 -0.521575 2.419592 -0.594286 2. 2.746939 10 0.658832 2.518367 0.228571 3. 3.837551 10 2.055665 3.631837 0.205714 4. 2.985306 10 0.964128 2.280000 0.705306 5. 1.577959 10 -0.838371 5.728163 -4.150204 6. 2.991020 10 0.971447 3.539592 -0.548571
  • 33. Coalescence IV – Distribution of Summary Statistics hist(ss.out$D,main="Distribution of Tajima's D (N=1000)",xlab="D") abline(v=mean(ss.out$D),col="blue") abline(v=quantile(ss.out$D,c(.025,.975)),col="red")
  • 34. Other Uses • Data Manipulation o Conversion of HapMap Data for use elsewhere (e. g. Genalex) o Other data sources via API’s (e. g. package rdryad) • Other Analyses o Hierarchical F statistics (hierfstat) o Haplotype networking (pegas) o Phylogenetics (ape, phyclust, others) o Approximate Bayesian Computation (abc) • Access for students o Scripts available via LMS o Course specific functions can be accessed (source("http://db.tt/A6tReYEC") o Notes with embedded code in HTML (Rstudio, knitr)
  • 36. Challenges • Some coding required • Data Structures are a challenge • Packages are heterogeneous • Students resist coding
  • 37. Nevertheless • Fundamental concepts can be easily visualized graphically • Real data can be incorporated from the outset • It takes students from fundamental concepts to real-world applications and analyses For Further information: cochrabj@miamioh.edu Functions http://db.tt/A6tReYEC