Probability Distributions
PART 3
3. Probability distributions
3.1. Normal distribution
3.2. Chi-square distribution
3.3. Student’s t-distribution
3.4. Summary of applications of different distributions
3.5. Central Limit Theorem
© akhila prabhakaran
Probability Distributions
Recap
When the value of a variable is the outcome of a statistical experiment, that variable is
a random variable.
Sample Space = set of all possible outcomes of an experiment.
Event = subset of the Sample Space. (example coin toss)
S = sample space {all outcomes of the experiment}
= {e1, e2, e3, e4…..en}
Probability Distribution = {p1 = P(e1), p2 = P(e2)…….pn = P(en)}
Population vs Sample
A population is a group of phenomena that have something in common. The term
often refers to a group of people, as in the following examples:
• All registered voters in Bangalore
• All members of the IEEE
• All cricketers who played at least one league match in the past year
Populations can refer to things as well as people:
• All sensors installed in a high security location
• All daily maximum temperatures in July for major Indian cities
• All basal ganglia cells from a particular rhesus monkey
Sample vs Population
A sample is a smaller group of members of a population selected to represent the population.
PARAMETER => Population characteristic, such as the population mean.
STATISTIC => Sample characteristic.
Probability Distribution
Experiment: Flip a coin two times.
All possible outcomes: HH, HT, TH, and TT.
Random variable X : Number of Heads that result from this experiment.
All possible values of X : 0, 1, or 2.
A probability distribution is a table or an equation that links each outcome of a statistical experiment
with its probability of occurrence.
Number of heads (x)    Probability P(X = x)
0                      0.25
1                      0.50
2                      0.25
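The table above can be reproduced by enumerating the four equally likely outcomes. This is a minimal sketch in R; the names `flips`, `heads` and `dist` are illustrative and do not come from the slides.

```r
# Enumerate all outcomes of two coin flips (H or T on each flip).
flips <- expand.grid(first = c("H", "T"), second = c("H", "T"),
                     stringsAsFactors = FALSE)

# Random variable X: number of heads in each outcome.
heads <- rowSums(flips == "H")

# Each of the 4 outcomes has probability 1/4; tabulate P(X = x).
dist <- table(heads) / nrow(flips)
print(dist)

# Cross-check against the binomial probabilities with n = 2, p = 0.5.
print(dbinom(0:2, size = 2, prob = 0.5))
```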
Cumulative Probability Distribution
A cumulative probability refers to the probability that the value of a random variable falls within a specified range.
Experiment: Flip a coin two times.
All possible outcomes: HH, HT, TH, and TT.
What is the probability that the coin flips would result in one or fewer heads?
P(X ≤ 1) = P(X = 0) + P(X = 1) = 0.25 + 0.50 = 0.75
Number of heads (x)    Probability P(X = x)    Cumulative probability P(X ≤ x)
0                      0.25                    0.25
1                      0.50                    0.75
2                      0.25                    1.00
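Cumulative probabilities are running sums of the individual probabilities. The sketch below rebuilds the cumulative column with `cumsum()` and checks the "one or fewer heads" value with `pbinom()`; the variable names are illustrative.

```r
# Probability distribution of X = number of heads in two fair coin flips.
x <- 0:2
p <- dbinom(x, size = 2, prob = 0.5)

# Cumulative probabilities P(X <= x) are running sums of P(X = x).
cum_p <- cumsum(p)
print(data.frame(x = x, probability = p, cumulative = cum_p))

# pbinom() returns the same cumulative probability directly.
p_one_or_fewer <- pbinom(1, size = 2, prob = 0.5)
print(p_one_or_fewer)
```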
UNIFORM Distribution
All of the values of a random variable occur with equal probability.
Suppose the random variable X can assume k different values.
Suppose also that the P(X = xk) is constant.
P(X = xk) = 1/k
Example: Suppose a die is tossed. What is the probability that the die will land on 5?
There are 6 possible outcomes, represented by S = { 1, 2, 3, 4, 5, 6 }.
The random variable X is the number the die lands on, and each outcome is equally likely to occur. Therefore P(X = 5) = 1/6.
What is the probability that the die will land on a number that is smaller than 5?
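One way to answer the last question is to add up the equal probabilities of the faces below 5. A small sketch, assuming a fair six-sided die; the variable names are illustrative.

```r
# Sample space of a fair six-sided die; each face has probability 1/6.
faces <- 1:6
p <- rep(1 / length(faces), length(faces))

# P(X = 5)
p_five <- p[faces == 5]

# P(X < 5) = P(1) + P(2) + P(3) + P(4)
p_less_than_five <- sum(p[faces < 5])

print(c(p_five = p_five, p_less_than_five = p_less_than_five))
```

Under these assumptions P(X < 5) is 4/6, about 0.667.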
Probability Distributions: Discrete or Continuous
A probability distribution is discrete or continuous depending on whether it is associated with a discrete variable or a continuous variable.
Discrete data
When the values in the batch are whole numbers (counts), the data set is called discrete.
Examples of discrete measurements are:
Continuous data
When the data are not constrained to be whole numbers, the data set is called continuous.
Examples are:
the maximum temperatures each day in January in your local city,
Discrete Probability Distributions
If a random variable is a discrete variable, its probability distribution is called a discrete probability
distribution.
The earlier examples of flipping a coin and rolling a die are discrete probability distributions.
Binomial probability distribution
• A binomial experiment is a statistical experiment that consists of n repeated trials. Each trial can result in just two possible outcomes (success or failure). The probability of success, denoted by p, is the same on every trial. The trials are independent; that is, the outcome on one trial does not affect the outcome on other trials.
• A binomial random variable is the number of successes x in n repeated trials of a binomial experiment.
The probability distribution of a binomial random variable is called a binomial distribution.
Binomial distribution
Probability of r successes in n trials: P(r) = nCr × p^r × q^(n-r), where q = 1 - p and nCr = n! / (r! (n-r)!)
Mean = np, Variance = npq, S.D. = sqrt(npq)
Applications of Binomial distribution
The binomial distribution is used in modeling driver behavior, intersection turning movements, and speed studies.
For example, if the probability of a vehicle turning left at an intersection is 0.15, then the probability of 3 vehicles out of 10 turning left equals
10C3 × (0.15)^3 × (0.85)^7 = 0.130
In the above example, a specific vehicle turning left or not is a Bernoulli trial and it is assumed
that the arrivals of individual vehicles at the junction are independent events.
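The turning-vehicle probability can be checked in R with `dbinom()` and, as a cross-check, with the binomial formula written out by hand. This is a sketch with illustrative variable names.

```r
# Probability that exactly 3 of 10 vehicles turn left,
# when each vehicle independently turns left with probability 0.15.
p_three <- dbinom(3, size = 10, prob = 0.15)

# The same value from the formula nCr * p^r * q^(n - r).
p_formula <- choose(10, 3) * 0.15^3 * 0.85^7

print(round(c(dbinom = p_three, formula = p_formula), 3))
```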
Applications of Binomial distribution
A Biological Application of the Binomial Distribution
Suppose that 1% of the population is infected with a virus. There are no obvious symptoms that
can be used to recognise carriers, thus individuals must be selected at random and tested. A
decision is made to obtain a sample of 20 individuals.
Is this sample size adequate? Will any infected individuals be found?
If 1% of the population is infected then p = 0.01 (1% infected) and q = 0.99 (99% non-infected).
Picking an individual at random has only a 1% chance of an infection, but surely at least 1
infected person should be found in 20 individuals? In order to answer this question lateral
thinking is needed.
To find the probability of finding some infected individuals (i.e. 1 or more), the easiest way is to calculate the probability of no cases, P(0), and subtract it from 1.
Set the number of successes, r, to 0 and the number of trials, n, to 20. This gives the probability of taking a sample of 20 individuals and finding no infected individuals.
P(0) = 20C0 × p^0 × q^20
P(0) = 20! / (0! × (20-0)!) × 0.01^0 × 0.99^20 = 0.82
Thus, if 1% of the population is infected, there is an 82% chance that a sample of 20 individuals will fail to find any infections.
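The subtraction approach can be sketched in R as follows. The last step, which asks how large a sample would be needed for a 95% chance of finding at least one infected individual, is a hypothetical follow-up and not part of the slides.

```r
p <- 0.01   # infection rate from the example
n <- 20     # sample size from the example

# P(0): probability that none of the 20 sampled individuals is infected.
p_none <- dbinom(0, size = n, prob = p)

# P(1 or more) by subtraction.
p_some <- 1 - p_none

# Hypothetical follow-up: smallest sample size giving at least a 95% chance
# of finding one or more infected individuals, from (1 - p)^n <= 0.05.
n_needed <- ceiling(log(0.05) / log(1 - p))

print(round(c(p_none = p_none, p_some = p_some), 2))
print(n_needed)
```

With these numbers the chance of finding at least one infection in 20 individuals is only about 18%, which suggests the sample size is not adequate.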
Poisson Distribution
A Poisson distribution is the probability distribution that results from a Poisson experiment.
Attributes of a Poisson Experiment
• Outcomes that can be classified as successes or failures.
• Average number of successes (μ) that occurs in a specified region is known.
• Probability that a success will occur is proportional to the size of the region.
• The probability that a success will occur in an extremely small region is virtually zero.
• The specified region could take many forms. For instance, it could be a length, an
area, a volume, a period of time, etc.
Poisson formula: P(x; μ) = (e^(-μ) × μ^x) / x!, the probability of exactly x successes when the average number of successes in the region is μ.
Poisson Distribution Examples
Suppose the average number of lions seen on a 1-day safari is 5. What is the probability that tourists
will see fewer than four lions on the next 1-day safari?
This is a Poisson experiment in which we know the following:
μ = 5; since 5 lions are seen per safari, on average.
x = 0, 1, 2, or 3;
Find the likelihood that tourists will see fewer than 4 lions; we want the probability that they will see 0,
1, 2, or 3 lions.
e = 2.71828; since e is a constant equal to approximately 2.71828.
We need to calculate the sum of four probabilities: P(0; 5) + P(1; 5) + P(2; 5) + P(3; 5).
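The sum of the four probabilities can be computed in R either term by term with `dpois()` or in one call with `ppois()`. A minimal sketch; the variable names are illustrative.

```r
mu <- 5  # average number of lions seen per 1-day safari

# Sum of the four probabilities P(0; 5) + P(1; 5) + P(2; 5) + P(3; 5).
p_sum <- sum(dpois(0:3, lambda = mu))

# ppois() gives the cumulative probability P(X <= 3) directly.
p_cum <- ppois(3, lambda = mu)

print(round(c(p_sum = p_sum, p_cum = p_cum), 4))
```

Both approaches give a probability of about 0.265 of seeing fewer than four lions.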
Past experience shows that, on average, 3 cyclones hit the coastal area of Andhra Pradesh and Orissa every two years. Assuming that cyclones hitting the coastal area follow a Poisson distribution, what is the probability of two cyclones crossing the coastal area of Andhra Pradesh and Orissa in the next two years?
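A sketch of the calculation in R, reading the question as "exactly two cyclones" (an interpretation, since the slide does not say "exactly"):

```r
mu <- 3  # average number of cyclones per two-year period

# Probability of exactly 2 cyclones in the next two years.
p_two <- dpois(2, lambda = mu)

# The same value from the Poisson formula e^(-mu) * mu^x / x!.
p_formula <- exp(-mu) * mu^2 / factorial(2)

print(round(c(dpois = p_two, formula = p_formula), 4))
```

Under this reading the probability is about 0.224.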
The most common application is the arrival pattern of vehicles. In this case μ is the average number of vehicles per stated time interval.
Queueing systems use the Poisson distribution, or variations of it, extensively to understand and optimize queueing patterns and workflow.
Probability Density Function
There are three basic differences between a continuous and a discrete probability distribution:
1. The probability that a continuous variable will take a specific value is equal to zero.
2. Because of this, we can never express a continuous probability distribution in tabular form.
3. Thus we require an equation or a formula to describe this kind of distribution. Such an equation is called a probability density function.
PDF and CDF
Expected Value / Mean
Variance
Uniform (continuous) Distribution
Probability Density Function of Uniform Distribution
Normal Distribution
Probability Density Function
Cumulative Distribution Function
• Normal distributions are symmetric around their mean.
• The mean, median, and mode of a normal distribution are equal.
• The area under the normal curve is equal to 1.0.
• Normal distributions are denser in the center and less dense in the tails.
• Normal distributions are defined by two parameters, the mean (μ) and the standard deviation (σ).
• Approximately 68% of the area of a normal distribution is within one standard deviation of the mean.
• Approximately 95% of the area of a normal distribution is within two standard deviations of the mean.
One of the first applications of the normal distribution was to the analysis of errors of measurement
made in astronomical observations, errors that occurred because of imperfect instruments and
imperfect observers.
Galileo in the 17th century noted that these errors were symmetric and that small errors occurred more
frequently than large errors.
This led to several hypothesized distributions of errors, but it was not until the early 19th century that it
was discovered that these errors followed a normal distribution.
Independently, the mathematicians Adrain in 1808 and Gauss in 1809 developed the formula for the
normal distribution and showed that errors were fit well by this distribution.
This same distribution had been discovered by Laplace in 1778 when he derived the extremely
important central limit theorem.
Laplace showed that even if a distribution is not normally distributed, the means of repeated samples
from the distribution would be very nearly normally distributed, and that the larger the sample size, the
closer the distribution of means would be to a normal distribution.
Most statistical procedures for testing differences between means assume normal distributions. These
tests work well even if the original distribution is only roughly normal.
Quételet was the first to apply the normal distribution to human characteristics. He noted that
characteristics such as height, weight, and strength were normally distributed.
Normal Distribution – Area under the curve
http://onlinestatbook.com/2/calculators/normal_dist.html
> pnorm(1, mean=0, sd=1)
[1] 0.8413447
> x=seq(-4,4,length=200)
> y=dnorm(x)
> plot(x,y,type="l", lwd=2, col="blue")
> x=seq(-4,1,length=200)
> y=dnorm(x)
> polygon(c(-4,x,1),c(0,y,0),col="gray")
Interpretation of area as a probability
This result indicates that if we draw a number at
random from the standard normal distribution, the
probability that we draw a number that is less than or
equal to 1 is 0.8413447.
Normal Distribution: Area under the curve
Consider the probability that a randomly selected number from the standard normal distribution falls within one standard deviation of the mean.
This probability is represented by the area under the standard normal curve between x = -1 and x = 1.
> pnorm(1, mean=0, sd=1)-pnorm(-1, mean=0, sd=1)
[1] 0.6826895
> x=seq(-4,4,length=200)
> y=dnorm(x)
> plot(x,y,type="l", lwd=2, col="blue")
> x=seq(-1,1,length=100)
> y=dnorm(x)
> polygon(c(-1,x,1),c(0,y,0),col="gray")
Normal Distribution: Quantiles
Given the probability (or area under the curve) find the x value.
What is the 95th percentile of a standard normal distribution?
> qnorm(0.95,mean=0,sd=1)
[1] 1.644854
Find all quantiles of the standard normal distribution.
Display pdfs of normal distributions with mean of 50 and with
standard deviations of 10 and 5 respectively.
Display pdfs of normal distributions with mean of 50 and 70
& standard deviations of 10 and 15 respectively
Sum of Normal Random Variables
If X ~ N(μX, σX^2) and Y ~ N(μY, σY^2) are independent normally distributed random variables, then their sum is also normally distributed: X + Y ~ N(μX + μY, σX^2 + σY^2).
Degrees of Freedom
The degrees of freedom (df) of an estimate is the number of
independent pieces of information on which the estimate is
based.
For example, an estimate of the variance based on a sample
size of 100 is based on more information than an estimate of
the variance based on a sample size of 5.
Suppose we know that the mean height of Martians is 6 and wish to estimate the variance of their heights. We randomly sample one Martian and find that its height is 8.
Variance estimate = (8-6)^2 = 4, with 1 degree of freedom.
If we have the height of another Martian, say 9, the new variance estimate is [(8-6)^2 + (9-6)^2] × 1/2 = 6.5, with 2 degrees of freedom.
Now, if we do not know the mean and must estimate it from the sample, the degrees of freedom are reduced by 1.
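The Martian example can be sketched in R to show how the divisor changes when the mean is estimated from the sample. The variable names are illustrative; `var()` is R's built-in sample variance, which divides by n - 1.

```r
heights <- c(8, 9)   # sampled Martian heights from the example
known_mean <- 6      # population mean, assumed known

# Known mean: average squared deviation from 6, with 2 degrees of freedom.
var_known_mean <- sum((heights - known_mean)^2) / length(heights)

# Unknown mean: var() uses the sample mean and divides by n - 1,
# leaving 1 degree of freedom.
var_unknown_mean <- var(heights)

print(c(known_mean = var_known_mean, unknown_mean = var_unknown_mean))
```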
What is inferential statistics?
Generalizing from sample to population
A critical part of inferential statistics involves determining how far
sample statistics are likely to vary from each other and from the
population parameter.
These are determined based on Sampling Distributions.
What is a sampling distribution?
A sampling distribution is the probability distribution of a statistic computed over repeated samples from a population; it is often shown as a graph.
Technically, you could choose any statistic; some common ones are:
• Mean
• Mean absolute value of the deviation from the mean
• Range
• Standard deviation of the sample
• Unbiased estimate of variance
• Variance of the sample
Sampling distributions
• A set of three pool balls, each with a number on it.
• Two of the balls are selected randomly (with replacement) and the average of their
numbers is computed.
• Tabulate each outcome and its mean.
• Tabulate the frequencies of the mean of each outcome
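The steps above can be sketched in R by listing every possible pair instead of sampling at random. The sketch assumes the three balls are numbered 1, 2 and 3; the names `outcomes` and `sampling_dist` are illustrative.

```r
balls <- 1:3  # assumed ball numbers

# All 9 equally likely ordered pairs when drawing two balls with replacement.
outcomes <- expand.grid(ball1 = balls, ball2 = balls)

# Mean of each outcome.
outcomes$mean <- rowMeans(outcomes[, c("ball1", "ball2")])
print(outcomes)

# Relative frequency of each mean: the sampling distribution of the mean.
sampling_dist <- table(outcomes$mean) / nrow(outcomes)
print(sampling_dist)
```

The means 1, 1.5, 2, 2.5 and 3 occur with relative frequencies 1/9, 2/9, 3/9, 2/9 and 1/9.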
sample(1:3, 9, replace=TRUE)
EXERCISE: SAMPLING DISTRIBUTION OF RANGE
for (i in 1:10) {
  print(sample(c(1, 2, 3), 2, replace = TRUE, prob = NULL))
}
Sampling distributions and inferential statistics
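library(ggplot2)  # provides ggplot() and geom_histogram()
# SachinNoNAs is assumed to be a data frame loaded earlier, with a numeric Runs column.
# 20 samples of size 2: the sample means are widely spread.
s <- list()
for (i in 1:20) {
  l1 <- sample(SachinNoNAs$Runs, 2, replace = TRUE, prob = NULL)
  s <- append(s, mean(l1))
}
ggplot() + geom_histogram(aes(x = unlist(s)), bins = 100, color = "white", fill = "blue")
#########################################
# 100 samples of size 50: the sample means cluster around the overall mean.
s <- list()
for (i in 1:100) {
  l1 <- sample(SachinNoNAs$Runs, 50, replace = TRUE, prob = NULL)
  s <- append(s, mean(l1))
}
ggplot() + geom_histogram(aes(x = unlist(s)), bins = 100, color = "white", fill = "blue")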
Normal Approximation to Binomial
Assume you have a fair coin and wish to know the probability that you would get 8 heads out of 10 flips.
Using dbinom:
dbinom(8,10,0.5)
#[1] 0.04394531
plot(0:100, dbinom(0:100, 100, 0.5), col="red", pch=19)
The binomial distribution has a mean of μ = Np = (10)(0.5) = 5 and a variance of σ^2 = Np(1-p) = (10)(0.5)(0.5) = 2.5.
The standard deviation is therefore 1.5811.
A total of 8 heads is (8 - 5)/1.5811 = 1.897 standard deviations above the mean of the distribution.
Because the normal distribution is continuous, the probability of any single exact value is zero. Solution: round off and consider any value from 7.5 to 8.5 to represent an outcome of 8 heads. Using this approach, we figure out the area under a normal curve from 7.5 to 8.5.
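The area from 7.5 to 8.5 can be computed with `pnorm()` and compared with the exact binomial value from `dbinom()`. A minimal sketch; the variable names are illustrative.

```r
n <- 10
p <- 0.5

# Mean and standard deviation of the binomial distribution.
mu <- n * p
sigma <- sqrt(n * p * (1 - p))

# Normal approximation: area under the normal curve from 7.5 to 8.5.
approx_p <- pnorm(8.5, mean = mu, sd = sigma) - pnorm(7.5, mean = mu, sd = sigma)

# Exact binomial probability of 8 heads in 10 flips.
exact_p <- dbinom(8, size = n, prob = p)

print(round(c(approximation = approx_p, exact = exact_p), 4))
```

The two values agree to within about 0.001 even for a sample this small.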
Central limit theorem
Given a population with a finite mean μ and a finite non-zero variance σ^2, the sampling distribution of the mean approaches a normal distribution with a mean of μ and a variance of σ^2/N as N, the sample size, increases.
If a population has a mean μ, then the mean of the sampling distribution of the mean is also μ.
μM = μ
The variance of the sampling distribution of the mean is
σM^2 = σ^2/N
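A small simulation can illustrate the theorem. The sketch below assumes an exponential population with rate 1 (mean 1, variance 1), chosen only because it is clearly non-normal; the sample size, number of repetitions and variable names are illustrative choices.

```r
set.seed(42)  # fixed seed so the sketch is reproducible

# Assumed population: exponential with rate 1 (mean 1, variance 1).
mu <- 1
sigma2 <- 1
N <- 30          # sample size
reps <- 10000    # number of repeated samples

# Draw `reps` samples of size N and keep each sample mean.
sample_means <- replicate(reps, mean(rexp(N, rate = 1)))

# Compare the simulated mean and variance with mu and sigma^2 / N.
print(c(mean_of_means = mean(sample_means), expected = mu))
print(c(var_of_means = var(sample_means), expected = sigma2 / N))
```

Calling `hist(sample_means)` afterwards should show a roughly bell-shaped histogram, even though the population itself is strongly skewed.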
EXERCISE
1. X = sum of two 6-faced dice. What is the sample space of X? Can you
simulate this using R? The experiment is performed N(=10,20,30) times.
What is the distribution of X? Plot a histogram.
2. Find the sampling distribution of the means of X.
3. What is the mean and variance of the sampling distribution?
Central limit theorem - Usage
Three central limit theorem examples:
Find the probability that the mean is greater than a certain number
Find the probability that the mean is less than a certain number
Find the probability that the mean is between a certain set of numbers either
side of the mean
Problem: A certain group of welfare recipients receives SNAP benefits of $110
per week with a standard deviation of $20. If a random sample of 25 people is
taken, what is the probability their mean benefit will be greater than $120 per
week?
The mean (average or μ) = 110
The standard deviation (σ) = 20
Sample size (n) = 25
In other words, the problem is asking you “What is the probability that a sample mean of 25 items will be greater than a given number?”
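Let X be the mean of the random sample. We want P(X > $120).
By the central limit theorem, X is approximately normal with mean 110 and standard deviation 20/sqrt(25) = 4.
(X - 110)/4 ~ N(0,1)
The problem translates to P[(X-110)/4 > (120-110)/4], or P(Y > 2.5) where Y ~ N(0,1).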
1 - pnorm(2.5)
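The same calculation can be written out step by step in R. This is a sketch with illustrative variable names; `lower.tail = FALSE` is a standard `pnorm()` argument that returns the upper-tail area directly.

```r
mu <- 110     # mean weekly benefit
sigma <- 20   # standard deviation of weekly benefit
n <- 25       # sample size

# Standard deviation of the sample mean (standard error).
se <- sigma / sqrt(n)

# Standardize the threshold of $120.
z <- (120 - mu) / se

# Upper-tail probability, computed two equivalent ways.
p_upper <- 1 - pnorm(z)
p_direct <- pnorm(120, mean = mu, sd = se, lower.tail = FALSE)

print(c(z = z, p_upper = p_upper, p_direct = p_direct))
```

The probability is about 0.0062, so a sample mean above $120 would be unusual.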
Problem: A population of 29 year-old males has a mean salary of $29,321 with
a standard deviation of $2,120. If a sample of 100 men is taken, what is the
probability their mean salaries will be less than $29,000?
The mean (average or μ) = 29321
The standard deviation (σ) = 2120
Sample size (n) = 100
In other words, the problem is asking you “What is the probability that a sample mean of 100 items will be less than a given number?”
X ~ sample mean
Y = [(X - μ)/(σ/sqrt(n))] ~ N(0,1)
P(Y < [(29000 - μ)/(σ/sqrt(n))]) = P(Y < -321/212) = pnorm(-1.51)
Problem: There are 250 dogs at a dog show who weigh an average of 12
pounds, with a standard deviation of 8 pounds. If 4 dogs are chosen at
random, what is the probability they have an average weight of greater than 8
pounds and less than 25 pounds?
The mean (average or μ) = 12
The standard deviation (σ) = 8
Sample size (n) = 4
In other words, the problem is asking you “What is the probability that a sample mean of 4 items will be less than 25 and more than 8?”
X ~ sample mean
Y = [(X - μ)/(σ/sqrt(n))] ~ N(0,1)
P([(8 - μ)/(σ/sqrt(n))] < Y < [(25 - μ)/(σ/sqrt(n))])
= P(-4/4 < Y < 13/4)
= pnorm(3.25) - pnorm(-1)
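The salary and dog-show problems follow the same pattern, so they can share one helper. The function `clt_prob()` below is a hypothetical helper written for this sketch, not an R built-in. It assumes the sample mean is approximately normal, which for a sample as small as 4 relies on the population itself being roughly normal.

```r
# Hypothetical helper: probability that a sample mean lies between
# `lower` and `upper`, using the normal approximation for the mean.
clt_prob <- function(lower, upper, mu, sigma, n) {
  se <- sigma / sqrt(n)  # standard deviation of the sample mean
  pnorm(upper, mean = mu, sd = se) - pnorm(lower, mean = mu, sd = se)
}

# Salary problem: P(sample mean < 29000), n = 100.
p_salary <- clt_prob(-Inf, 29000, mu = 29321, sigma = 2120, n = 100)

# Dog show problem: P(8 < sample mean < 25), n = 4.
p_dogs <- clt_prob(8, 25, mu = 12, sigma = 8, n = 4)

print(round(c(p_salary = p_salary, p_dogs = p_dogs), 4))
```

Under these assumptions the salary probability is about 0.065 and the dog-show probability is about 0.84.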
Chi-square distribution
If X is a standard normal random variable (mean 0 and variance 1), then X^2 has a Chi-square distribution with 1 degree of freedom.
If X1, X2, X3, ..., Xn are independent standard normal random variables, then Y = X1^2 + X2^2 + X3^2 + ... + Xn^2 has a Chi-square distribution with n degrees of freedom.
X ~ Chi-square with n degrees of freedom
Prob. Density function: f(x) = c × x^(n/2 - 1) × e^(-x/2) for x > 0,
where c is a constant, c = 1 / (2^(n/2) × Γ(n/2))
E[X] = n
Var[X] = 2n
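The mean and variance can be checked by simulating sums of squared standard normal values. This is a sketch; the choice of n = 4, the number of repetitions and the variable names are illustrative.

```r
set.seed(1)  # fixed seed so the sketch is reproducible

n <- 4         # degrees of freedom (illustrative choice)
reps <- 20000  # number of simulated values

# Each simulated value is a sum of n squared standard normal draws.
y <- replicate(reps, sum(rnorm(n)^2))

# Simulated mean and variance, to compare with E[X] = n and Var[X] = 2n.
print(c(mean = mean(y), expected = n))
print(c(variance = var(y), expected = 2 * n))

# The mean can also be checked by integrating x * f(x) with dchisq().
mean_by_integration <- integrate(function(x) x * dchisq(x, df = n), 0, Inf)$value
print(mean_by_integration)
```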
?Chisquare
dchisq(x, df, ncp = 0, log = FALSE)
pchisq(q, df, ncp = 0, lower.tail = TRUE, log.p = FALSE)
qchisq(p, df, ncp = 0, lower.tail = TRUE, log.p = FALSE)
rchisq(n, df, ncp = 0)
x <- seq(from = 0.005, to = 10, by = 0.005)  # start just above 0, where the df = 1 density is infinite
plot(x, dchisq(x, df=1), type="l")
plot(x, dchisq(x, df=2), type="l")
plot(x, dchisq(x, df=3), type="l")
plot(x, dchisq(x, df=4), type="l")
Let X1 and X2 be two independent normal random variables having mean μ = 0 and variance σ^2 = 16. Compute the following probability:
Let X be a chi-square random variable with 3 degrees of freedom. Compute the following probability: P(0.35 < X < 7.81)
pchisq(7.81, df = 3) - pchisq(0.35, df = 3)
Student’s T - Distribution
X1, ..., Xn are independent and identically distributed as N(μ, σ^2), i.e. this is a sample of size n from a normally distributed population with expected value μ and variance σ^2.
Sample mean: X̄ = (X1 + ... + Xn)/n
Sample variance: S^2 = [(X1 - X̄)^2 + ... + (Xn - X̄)^2]/(n - 1)
(X̄ - μ)/(σ/sqrt(n)) has a standard normal distribution.
(X̄ - μ)/(S/sqrt(n)) has a Student's t distribution with n-1 degrees of freedom.
Properties of the t Distribution
• The mean of the distribution is equal to 0.
• The variance is equal to v / (v - 2), where v is the degrees of freedom and v > 2.
• The variance is always greater than 1, although it is close to 1 when there are many degrees of freedom.
• With infinite degrees of freedom, the t distribution is the same as the standard normal distribution.
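These properties can be illustrated numerically with `qt()`, `dt()` and `qnorm()`. A sketch; the degrees of freedom chosen and the variable names are illustrative.

```r
df <- c(3, 5, 10, 30, 100)  # illustrative degrees of freedom

# Variance v / (v - 2): always above 1, approaching 1 as v grows.
t_var <- df / (df - 2)

# 97.5th percentile of t, compared with the standard normal value.
t_crit <- qt(0.975, df)
z_crit <- qnorm(0.975)

print(data.frame(df = df, variance = t_var, t_crit = t_crit))
print(z_crit)

# Check the variance for v = 5 by integrating x^2 * f(x) with dt().
var_df5 <- integrate(function(x) x^2 * dt(x, df = 5), -Inf, Inf)$value
print(var_df5)
```

The t percentiles shrink toward the normal value of about 1.96 as the degrees of freedom increase.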
?TDist
dt(x, df, ncp, log = FALSE)
pt(q, df, ncp, lower.tail = TRUE, log.p = FALSE)
qt(p, df, ncp, lower.tail = TRUE, log.p = FALSE)
rt(n, df, ncp)
Exercise: Plot the probability density function of Student's t distribution for 1 to 10 degrees of freedom.

 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 

Statistical Analysis with R- III

  • 2. PART 3 3. Probability distributions 3.1. Normal distribution 3.2. Chi-square distribution 3.3. Student’s t-distribution 3.4. Summary of applications of different distributions 3.5 Central Limit Theorem © akhila prabhakaran
  • 3. Probability Distributions Recap When the value of a variable is the outcome of a statistical experiment, that variable is a random variable. Sample Space = set of all possible outcomes of an experiment. Event = subset of the Sample Space. (example coin toss) S = sample space {all outcomes of the experiment} = {e1, e2, e3, e4…..en} Probability Distribution = {p1 = P(e1), p2 = P(e2)…….pn = P(en)} © akhila prabhakaran
  • 4. Population vs Sample A population is a group of phenomena that have something in common. The term often refers to a group of people, as in the following examples: all registered voters in Bangalore; all members of the IEEE; all cricketers who played at least one league match in the past year. Populations can refer to things as well as people: all sensors installed in a high-security location; all daily maximum temperatures in July for major Indian cities; all basal ganglia cells from a particular rhesus monkey.
  • 5. Sample vs Population A sample is a smaller group of members of a population selected to represent the population. PARAMETER => Population characteristic like population mean etc. STATISTIC => Sample characteristic © akhila prabhakaran
  • 6. Probability Distribution Experiment: Flip a coin two times. All possible outcomes: HH, HT, TH, and TT. Random variable X: number of heads that result from this experiment. All possible values of X: 0, 1, or 2. A probability distribution is a table or an equation that links each outcome of a statistical experiment with its probability of occurrence.
      Number of heads (x)   Probability P(X = x)
      0                     0.25
      1                     0.50
      2                     0.25
  • 7. Cumulative Probability Distribution Refers to the probability that the value of a random variable falls within a specified range. Experiment: Flip a coin two times. All possible outcomes: HH, HT, TH, and TT. What is the probability that the coin flips would result in one or fewer heads? P(X ≤ 1) = P(X = 0) + P(X = 1) = 0.25 + 0.50 = 0.75
      Number of heads (x)   Probability P(X = x)   Cumulative probability P(X ≤ x)
      0                     0.25                   0.25
      1                     0.50                   0.75
      2                     0.25                   1.00
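The two-flip table can be reproduced in R. This is a minimal sketch that treats the number of heads as a Binomial(2, 0.5) variable; the variable names are illustrative and not from the slides.

```r
# Number of heads in two fair coin flips: Binomial(size = 2, prob = 0.5)
x <- 0:2
pmf <- dbinom(x, size = 2, prob = 0.5)  # P(X = x)
cdf <- pbinom(x, size = 2, prob = 0.5)  # P(X <= x)

coin_table <- data.frame(heads = x, probability = pmf, cumulative = cdf)
print(coin_table)
```

The cumulative column at x = 1 is the "one or fewer heads" probability from the slide.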
  • 8. UNIFORM Distribution All of the values of a random variable occur with equal probability. Suppose the random variable X can assume k different values x1, x2, ..., xk, each with the same probability. Then P(X = xi) = 1/k for every i. Example: Suppose a die is tossed. What is the probability that the die will land on 5? There are 6 possible outcomes, represented by S = { 1, 2, 3, 4, 5, 6 }. Each possible outcome is a value of the random variable X, and each outcome is equally likely to occur, so P(X = 5) = 1/6. What is the probability that the die will land on a number that is smaller than 5?
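One way to answer the closing question is to add up the equal probabilities of the qualifying faces. The sketch below assumes a fair six-sided die; the names are illustrative.

```r
# Discrete uniform distribution on the faces of a fair die
faces <- 1:6
p <- rep(1 / length(faces), length(faces))  # each face has probability 1/k

p_five <- p[faces == 5]                # P(X = 5)
p_less_than_five <- sum(p[faces < 5])  # P(X < 5) = P(1) + P(2) + P(3) + P(4)

print(c(p_five = p_five, p_less_than_five = p_less_than_five))
```

Four of the six faces are smaller than 5, so the second probability is 4/6.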
  • 9. Probability Distributions: Discrete or Continuous Whether a probability distribution is discrete or continuous depends on whether it is associated with discrete variables or continuous variables. Discrete data: when the values in the batch are whole numbers (counts), the data set is called discrete. Examples of discrete measurements are the number of admissions in a hospital's accident and emergency unit each day over a period of two months, and the number of people in each household in a survey of 10,000 households. Continuous data: when the data are not constrained to be whole numbers, the data set is called continuous. An example is the maximum temperature each day in January in your local city.
  • 10. Discrete Probability Distributions If a random variable is a discrete variable, its probability distribution is called a discrete probability distribution. The earlier examples of flipping a coin and rolling a die are discrete. Binomial probability distribution: A binomial experiment is a statistical experiment that consists of n repeated trials. Each trial can result in just two possible outcomes (success or failure). The probability of success, denoted by p, is the same on every trial. The trials are independent; that is, the outcome of one trial does not affect the outcome of other trials. A binomial random variable is the number of successes x in n repeated trials of a binomial experiment. The probability distribution of a binomial random variable is called a binomial distribution.
  • 11. Binomial distribution Probability of r successes in n trials: P(r) = nCr × p^r × q^(n − r), where p is the probability of success on a single trial and q = 1 − p.
  • 14. Applications of Binomial distribution This distribution is used in modeling driver behavior, intersection turning movements, and speed studies. For example, if the probability of a vehicle turning left at an intersection is 0.15, then the probability that exactly 3 out of 10 vehicles turn left is 10C3 × (0.15)^3 × (0.85)^7 = 0.130. In this example, a specific vehicle turning left or not is a Bernoulli trial, and it is assumed that the arrivals of individual vehicles at the junction are independent events.
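The left-turn probability can be checked in R both from the formula and with the built-in `dbinom`. This is a minimal sketch; the variable names are illustrative and not from the slides.

```r
# Probability that exactly 3 of 10 vehicles turn left when P(left) = 0.15
n_vehicles <- 10
p_left <- 0.15

# Direct use of the binomial formula: nCr * p^r * q^(n - r)
p_by_formula <- choose(n_vehicles, 3) * p_left^3 * (1 - p_left)^(n_vehicles - 3)

# Same quantity from R's binomial probability mass function
p_by_dbinom <- dbinom(3, size = n_vehicles, prob = p_left)

print(round(c(formula = p_by_formula, dbinom = p_by_dbinom), 3))
```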
  • 15. Applications of Binomial distribution © akhila prabhakaran A Biological Application of the Binomial Distribution Suppose that 1% of the population is infected with a virus. There are no obvious symptoms that can be used to recognise carriers, thus individuals must be selected at random and tested. A decision is made to obtain a sample of 20 individuals. Is this sample size adequate? Will any infected individuals be found? If 1% of the population is infected then p = 0.01 (1% infected) and q = 0.99 (99% non-infected). Picking an individual at random has only a 1% chance of an infection, but surely at least 1 infected person should be found in 20 individuals? In order to answer this question lateral thinking is needed.
  • 16. Applications of Binomial distribution A Biological Application of the Binomial Distribution To find the probability of finding some (i.e. 1 or more) infected individuals, the easiest way is to calculate the probability of no cases, P(0), and then subtract it from 1. Set the number of successes, r, to 0 and the number of trials, n, to 20. This gives the probability of taking a sample of 20 individuals and finding no infected individuals. P(0) = 20C0 × p^0 × q^20 = [20! / (0! × (20 − 0)!)] × 0.01^0 × 0.99^20 = 0.82. Thus, if 1% of the population is infected, there is an 82% chance that a sample of 20 individuals will fail to find any infections, and only about an 18% chance of finding at least one.
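The complement calculation can be sketched in R as follows. The last step, a search for a sample size that gives a 95% chance of detection, is an added illustration and not part of the slides; the 95% target is an assumption chosen for the example.

```r
# Chance of missing every infected individual in a sample of 20 when p = 0.01
p_infected <- 0.01
n_sampled <- 20

p_none <- dbinom(0, size = n_sampled, prob = p_infected)  # equals 0.99^20
p_at_least_one <- 1 - p_none

print(round(c(none = p_none, at_least_one = p_at_least_one), 2))

# Illustration only: smallest sample size with at least a 95% chance
# of finding one or more infected individuals (the target is an assumption)
n_needed <- ceiling(log(1 - 0.95) / log(1 - p_infected))
print(n_needed)
```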
  • 17. Poisson Distribution © akhila prabhakaran Probability distribution that results from a Poisson experiment. Attributes of a Poisson Experiment • Outcomes that can be classified as successes or failures. • Average number of successes (μ) that occurs in a specified region is known. • Probability that a success will occur is proportional to the size of the region. • The probability that a success will occur in an extremely small region is virtually zero. • The specified region could take many forms. For instance, it could be a length, an area, a volume, a period of time, etc.
  • 20. Poisson Distribution Examples Suppose the average number of lions seen on a 1-day safari is 5. What is the probability that tourists will see fewer than four lions on the next 1-day safari? This is a Poisson experiment in which we know the following: μ = 5, since 5 lions are seen per safari on average; x = 0, 1, 2, or 3, since we want the probability that tourists will see fewer than 4 lions; e ≈ 2.71828, the base of the natural logarithm. We need to calculate the sum of four probabilities: P(0; 5) + P(1; 5) + P(2; 5) + P(3; 5).
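The sum of the four Poisson probabilities can be computed in R. This sketch uses the standard Poisson formula P(x; μ) = e^(−μ) μ^x / x!, which is not shown in this transcript, and compares it with the built-in `ppois`; the variable names are illustrative.

```r
# P(fewer than 4 lions) when the mean number of lions per safari is 5
mu <- 5
x <- 0:3

# Term by term from the Poisson formula e^(-mu) * mu^x / x!
terms <- exp(-mu) * mu^x / factorial(x)
p_by_formula <- sum(terms)

# Same probability from the cumulative distribution function, P(X <= 3)
p_by_ppois <- ppois(3, lambda = mu)

print(round(c(formula = p_by_formula, ppois = p_by_ppois), 4))
```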
  • 21. Poisson Distribution Examples © akhila prabhakaran
  • 22. Poisson Distribution Suppose past experience shows that, on average, 3 cyclones hit the coastal area of Andhra Pradesh and Orissa every two years. If the number of cyclones hitting the coast is assumed to follow a Poisson distribution, what is the probability that exactly two cyclones cross the coastal area of Andhra Pradesh and Orissa in the next two years?
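A possible way to work the cyclone question in R, reading it as "exactly two cyclones" with a mean of 3 per two-year period; the variable names are illustrative.

```r
# Mean of 3 cyclones per two-year period; probability of exactly 2
mu <- 3
p_two <- dpois(2, lambda = mu)        # e^(-3) * 3^2 / 2!
p_two_by_formula <- exp(-mu) * mu^2 / factorial(2)

print(round(p_two, 3))
```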
  • 23. Poisson Distribution The most widely used application is the arrival pattern of vehicles. In this case μ is the average number of vehicles in a stated time interval. Queueing systems use the Poisson distribution, or variations of it, extensively to understand and optimize queueing patterns and workflow.
  • 24. Probability Density Function There are three basic differences between a continuous and a discrete probability distribution: 1. The probability that a continuous variable will take a specific value is equal to zero. 2. Because of this, a continuous probability distribution can never be expressed in tabular form. 3. Thus an equation or formula is required to describe such a distribution. This equation is called a probability density function.
  • 25. Probability Density Function © akhila prabhakaran
  • 26. PDF and CDF © akhila prabhakaran
  • 27. Expected Value / MEAN © akhila prabhakaran Mean or Expected Value
  • 29. Uniform (continuous) Distribution © akhila prabhakaran Probability Density Function of Uniform Distribution
  • 31. Normal Distribution © akhila prabhakaran Probability Density Function
  • 33. Normal Distribution Normal distributions are symmetric around their mean. The mean, median, and mode of a normal distribution are equal. The area under the normal curve is equal to 1.0. Normal distributions are denser in the center and less dense in the tails. Normal distributions are defined by two parameters, the mean (μ) and the standard deviation (σ). About 68% of the area of a normal distribution is within one standard deviation of the mean. Approximately 95% of the area of a normal distribution is within two standard deviations of the mean.
  • 35. Normal Distribution © akhila prabhakaran One of the first applications of the normal distribution was to the analysis of errors of measurement made in astronomical observations, errors that occurred because of imperfect instruments and imperfect observers. Galileo in the 17th century noted that these errors were symmetric and that small errors occurred more frequently than large errors. This led to several hypothesized distributions of errors, but it was not until the early 19th century that it was discovered that these errors followed a normal distribution. Independently, the mathematicians Adrain in 1808 and Gauss in 1809 developed the formula for the normal distribution and showed that errors were fit well by this distribution. This same distribution had been discovered by Laplace in 1778 when he derived the extremely important central limit theorem. Laplace showed that even if a distribution is not normally distributed, the means of repeated samples from the distribution would be very nearly normally distributed, and that the larger the sample size, the closer the distribution of means would be to a normal distribution. Most statistical procedures for testing differences between means assume normal distributions. These tests work well even if the original distribution is only roughly normal. Quételet was the first to apply the normal distribution to human characteristics. He noted that characteristics such as height, weight, and strength were normally distributed.
  • 36. Normal Distribution – Area under the curve http://onlinestatbook.com/2/calculators/normal_dist.html
      > pnorm(1, mean=0, sd=1)
      [1] 0.8413447
      > x=seq(-4,4,length=200)
      > y=dnorm(x)
      > plot(x,y,type="l", lwd=2, col="blue")
      > x=seq(-4,1,length=200)
      > y=dnorm(x)
      > polygon(c(-4,x,1),c(0,y,0),col="gray")
    Interpretation of area as a probability: this result indicates that if we draw a number at random from the standard normal distribution, the probability that we draw a number that is less than or equal to 1 is 0.8413447.
  • 37. Normal Distribution: Area under the curve The probability that a randomly selected number from the standard normal distribution falls within one standard deviation of the mean is represented by the area under the standard normal curve between x = -1 and x = 1.
      > pnorm(1, mean=0, sd=1) - pnorm(-1, mean=0, sd=1)
      [1] 0.6826895
      > x=seq(-4,4,length=200)
      > y=dnorm(x)
      > plot(x,y,type="l", lwd=2, col="blue")
      > x=seq(-1,1,length=100)
      > y=dnorm(x)
      > polygon(c(-1,x,1),c(0,y,0),col="gray")
  • 38. Normal Distribution: Quantiles Given the probability (or area under the curve), find the x value. What is the 95th percentile of a standard normal distribution?
      > qnorm(0.95,mean=0,sd=1)
      [1] 1.644854
    Exercises: Find the quartiles of the standard normal distribution. Display the pdfs of normal distributions with a mean of 50 and standard deviations of 10 and 5 respectively. Display the pdfs of normal distributions with means of 50 and 70 and standard deviations of 10 and 15 respectively.
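A possible starting point for these exercises is sketched below. The chosen probabilities, plotting range, and colors are assumptions made for illustration, not part of the slides.

```r
# Selected quantiles of the standard normal distribution
probs <- c(0.25, 0.50, 0.75, 0.95)
q <- qnorm(probs, mean = 0, sd = 1)
names(q) <- paste0(probs * 100, "%")
print(round(q, 4))

# pnorm is the inverse of qnorm: it maps the quantiles back to the probabilities
round_trip <- pnorm(q)

# Densities of N(50, sd = 10) and N(50, sd = 5) on a common grid
x <- seq(0, 100, length.out = 400)
d_sd10 <- dnorm(x, mean = 50, sd = 10)
d_sd5 <- dnorm(x, mean = 50, sd = 5)

plot(x, d_sd5, type = "l", col = "red", ylab = "density")
lines(x, d_sd10, col = "blue")
```

The curve with the smaller standard deviation is narrower and taller, since both must enclose an area of 1.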
  • 39. Sum of Normal Random Variables X and Y are independent, normally distributed random variables.
  • 40. Sum of Normal Random Variables © akhila prabhakaran
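The formulas on these two slides did not survive extraction. The standard result is that if X ~ N(μx, σx²) and Y ~ N(μy, σy²) are independent, then X + Y ~ N(μx + μy, σx² + σy²). The simulation below is a sketch of that result; the parameter values, seed, and sample size are arbitrary assumptions.

```r
# Simulate the sum of two independent normal random variables
set.seed(1)
n <- 100000

x <- rnorm(n, mean = 2, sd = 3)    # X ~ N(2, 3^2)
y <- rnorm(n, mean = -1, sd = 4)   # Y ~ N(-1, 4^2)
s <- x + y

# Theory: mean 2 + (-1) = 1, variance 3^2 + 4^2 = 25
print(c(mean = mean(s), variance = var(s)))
```

Note that the variances add, not the standard deviations: the standard deviation of the sum here is 5, not 7.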
  • 41. Degrees of Freedom The degrees of freedom (df) of an estimate is the number of independent pieces of information on which the estimate is based. For example, an estimate of the variance based on a sample size of 100 is based on more information than an estimate of the variance based on a sample size of 5. Suppose we know that the mean height of Martians is 6 and wish to estimate the variance of their heights. We randomly sample one Martian and find that its height is 8. The variance estimate is (8 − 6)² = 4, with 1 degree of freedom. If we have the height of another Martian, say 9, the new estimate is [(8 − 6)² + (9 − 6)²] × 1/2 = 6.5, with 2 degrees of freedom. If we do not know the mean and must estimate it from the sample, the degrees of freedom reduce by 1.
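The Martian example can be worked in R for both cases. This is a minimal sketch with illustrative variable names; `var()` is R's sample variance, which divides by n − 1.

```r
heights <- c(8, 9)

# Case 1: the population mean is known to be 6, so both observations
# contribute independent information (2 degrees of freedom)
known_mean <- 6
var_known_mean <- sum((heights - known_mean)^2) / length(heights)

# Case 2: the mean is unknown and estimated by the sample mean (8.5),
# which uses up one degree of freedom; var() divides by n - 1
var_unknown_mean <- var(heights)

print(c(known_mean = var_known_mean, unknown_mean = var_unknown_mean))
```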
  • 43. What is inferential statistics? © akhila prabhakaran Generalizing from sample to population A critical part of inferential statistics involves determining how far sample statistics are likely to vary from each other and from the population parameter. These are determined based on Sampling Distributions.
  • 44. What is a sampling distribution? A sampling distribution is the distribution of a statistic computed over repeated samples from the same population; it is usually shown as a graph. Technically, you could choose any statistic. Some common ones are: • Mean • Mean absolute value of the deviation from the mean • Range • Standard deviation of the sample • Unbiased estimate of variance • Variance of the sample
  • 45. Sampling distributions © akhila prabhakaran • A set of three pool balls, each with a number on it. • Two of the balls are selected randomly (with replacement) and the average of their numbers is computed. • Tabulate each outcome and its mean. • Tabulate the frequencies of the mean of each outcome
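The pool-ball tabulation can be done exhaustively in R, since there are only 3 × 3 equally likely ordered outcomes. The sketch below assumes the balls are numbered 1, 2, and 3; the column names are illustrative.

```r
# All ordered outcomes of drawing two balls with replacement from {1, 2, 3}
outcomes <- expand.grid(first = 1:3, second = 1:3)
outcomes$mean <- rowMeans(outcomes[, c("first", "second")])
print(outcomes)

# Frequency and relative frequency of each possible mean:
# this is the sampling distribution of the mean for samples of size 2
freq <- table(outcomes$mean)
rel_freq <- freq / sum(freq)
print(rel_freq)
```

The same table with `apply(outcomes[, 1:2], 1, function(r) diff(range(r)))` in place of the mean would give the sampling distribution of the range.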
  • 46. Sampling distributions © akhila prabhakaran sample(1:3, 9, replace=TRUE)
  • 48. EXERCISE: SAMPLING DISTRIBUTION OF RANGE
      for (i in 1:10) {
        print(sample(c(1, 2, 3), 2, replace = TRUE, prob = NULL))
      }
  • 49. Sampling distributions and inferential statistics
      library(ggplot2)
      # SachinNoNAs is assumed to be a data frame loaded earlier, with a numeric Runs column

      # 20 sample means, each from a sample of size 2
      s <- list()
      for (i in 1:20) {
        l1 <- sample(SachinNoNAs$Runs, 2, replace = TRUE, prob = NULL)
        s <- append(s, mean(l1))
      }
      ggplot() + geom_histogram(aes(x = unlist(s)), bins = 100, color = "white", fill = "blue")

      #########################################
      # 100 sample means, each from a sample of size 50
      s <- list()
      for (i in 1:100) {
        l1 <- sample(SachinNoNAs$Runs, 50, replace = TRUE, prob = NULL)
        s <- append(s, mean(l1))
      }
      ggplot() + geom_histogram(aes(x = unlist(s)), bins = 100, color = "white", fill = "blue")
  • 50. Normal Approximation to Binomial Assume you have a fair coin and wish to know the probability that you would get 8 heads out of 10 flips. Using dbinom:
      dbinom(8, 10, 0.5)
      # [1] 0.04394531
      plot(dbinom(1:100, 100, 0.5), col="red", pch=19)
  • 51. Normal Approximation to Binomial The binomial distribution has a mean of μ = Np = (10)(0.5) = 5 and a variance of σ² = Np(1 − p) = (10)(0.5)(0.5) = 2.5. The standard deviation is therefore 1.5811. A total of 8 heads is (8 − 5)/1.5811 = 1.897 standard deviations above the mean of the distribution. Because the normal distribution is continuous, the probability of exactly 8 is zero. Solution: round off and consider any value from 7.5 to 8.5 to represent an outcome of 8 heads. Using this approach, we find the area under a normal curve from 7.5 to 8.5.
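The area from 7.5 to 8.5 can be computed with `pnorm` and compared with the exact binomial value. This is a minimal sketch with illustrative variable names.

```r
# Exact binomial probability of 8 heads in 10 flips of a fair coin
n <- 10
p <- 0.5
exact_p <- dbinom(8, size = n, prob = p)

# Normal approximation with the same mean and standard deviation,
# taking the interval 7.5 to 8.5 to stand for "8 heads"
mu <- n * p
sigma <- sqrt(n * p * (1 - p))
approx_p <- pnorm(8.5, mean = mu, sd = sigma) - pnorm(7.5, mean = mu, sd = sigma)

print(round(c(exact = exact_p, normal_approx = approx_p), 4))
```

The two values agree to about three decimal places even though N is only 10.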
  • 52. Central limit theorem Given a population with a finite mean μ and a finite non-zero variance σ², the sampling distribution of the mean approaches a normal distribution with a mean of μ and a variance of σ²/N as N, the sample size, increases. If a population has a mean μ, then the mean of the sampling distribution of the mean is also μ: μM = μ. The variance of the sampling distribution of the mean is σ²M = σ²/N.
  • 53. Central limit theorem © akhila prabhakaran
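The theorem can be illustrated by simulation. The sketch below draws repeated samples from an exponential population, which is strongly skewed; the choice of population, sample size, seed, and number of replicates are all assumptions made for the example.

```r
# Population: exponential with rate 1 (mean 1, variance 1), far from normal
set.seed(123)
N <- 30             # sample size
replicates <- 10000 # number of samples drawn

sample_means <- replicate(replicates, mean(rexp(N, rate = 1)))

# Theory: mean of the sample means = 1, variance = 1 / N
print(c(mean = mean(sample_means),
        variance = var(sample_means),
        expected_variance = 1 / N))

hist(sample_means, breaks = 50,
     main = "Sampling distribution of the mean (N = 30)",
     xlab = "sample mean")
```

Rerunning with a smaller N shows a more skewed histogram; with a larger N it moves closer to a normal curve.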
  • 54. EXERCISE 1. X = sum of two 6-faced dice. What is the sample space of X? Can you simulate this using R? The experiment is performed N (= 10, 20, 30) times. What is the distribution of X? Plot a histogram. 2. Find the sampling distribution of the means of X. 3. What are the mean and variance of the sampling distribution?
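One possible approach to this exercise is sketched below. The helper `roll_sum`, the seed, and the number of replicates are hypothetical choices for illustration, not the author's solution.

```r
set.seed(42)

# Exact distribution of X: all 36 equally likely pairs of faces
exact_pmf <- table(outer(1:6, 1:6, "+")) / 36
print(round(exact_pmf, 3))

# Hypothetical helper: simulate n rolls of two dice and return the sums
roll_sum <- function(n) {
  sample(1:6, n, replace = TRUE) + sample(1:6, n, replace = TRUE)
}

# Part 1: one experiment of N = 30 rolls
x <- roll_sum(30)
hist(x, breaks = seq(1.5, 12.5, by = 1),
     main = "Sum of two dice, N = 30", xlab = "X")

# Parts 2 and 3: sampling distribution of the mean of 30 rolls
sample_means <- replicate(5000, mean(roll_sum(30)))
print(c(mean = mean(sample_means), variance = var(sample_means)))
```

For comparison, theory gives E[X] = 7 and Var[X] = 35/6, so the sample means should centre near 7 with variance near (35/6)/30.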
  • 55. Central limit theorem - Usage © akhila prabhakaran
  • 56. Central limit theorem - Usage Three central limit theorem examples: find the probability that the mean is greater than a certain number; find the probability that the mean is less than a certain number; find the probability that the mean is between a certain set of numbers on either side of the mean.
  • 57. Central limit theorem - Usage Problem: A certain group of welfare recipients receives SNAP benefits of $110 per week with a standard deviation of $20. If a random sample of 25 people is taken, what is the probability their mean benefit will be greater than $120 per week? The problem gives the mean (average or μ), the standard deviation (σ), and the sample size (n). In other words, the problem is asking: "What is the probability that the mean of a sample of n items will be greater than a given number?"
  • 58. Central limit theorem - Usage Quantities to identify: the mean (average or μ), the standard deviation (σ), the population size, and the sample size (n). In other words, the problem is asking: "What is the probability that the mean of a sample of n items will be greater than a given number?"
  • 59. Central limit theorem - Usage Problem: A certain group of welfare recipients receives SNAP benefits of $110 per week with a standard deviation of $20. If a random sample of 25 people is taken, what is the probability their mean benefit will be greater than $120 per week? Let X be the mean of the random sample. To find P(X > $120): X is approximately normal with mean 110 and standard deviation 20/sqrt(25) = 4, so (X − 110)/4 ~ N(0,1). The problem translates to P[(X − 110)/4 > (120 − 110)/4], or P(Y > 2.5) where Y ~ N(0,1): 1 - pnorm(2.5)
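The SNAP calculation can be run in R either through the standardized value or directly on the original scale. This is a minimal sketch with illustrative variable names.

```r
mu <- 110      # population mean benefit
sigma <- 20    # population standard deviation
n <- 25        # sample size

se <- sigma / sqrt(n)   # standard deviation of the sample mean: 4
z <- (120 - mu) / se    # standardized value: 2.5

p_greater <- 1 - pnorm(z)

# Same probability without standardizing by hand
p_direct <- pnorm(120, mean = mu, sd = se, lower.tail = FALSE)

print(round(c(z = z, p_greater = p_greater), 4))
```

The probability is well under 1%, so a sample mean above $120 would be unusual.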
  • 60. Central limit theorem - Usage Problem: A population of 29-year-old males has a mean salary of $29,321 with a standard deviation of $2,120. If a sample of 100 men is taken, what is the probability their mean salary will be less than $29,000? The mean (average or μ) = 29321. The standard deviation (σ) = 2120. Sample size (n) = 100. In other words, the problem is asking: "What is the probability that the mean of a sample of 100 items will be less than a given number?" Let X be the sample mean. Y = (X − μ)/(σ/sqrt(n)) ~ N(0,1). P(Y < (29000 − μ)/(σ/sqrt(n))) = pnorm(-1.51)
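The salary problem follows the same steps; a sketch in R with illustrative variable names:

```r
mu <- 29321    # population mean salary
sigma <- 2120  # population standard deviation
n <- 100       # sample size

se <- sigma / sqrt(n)     # 212
z <- (29000 - mu) / se    # about -1.51
p_less <- pnorm(z)

print(round(c(z = z, p_less = p_less), 4))
```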
  • 61. Central limit theorem - Usage Problem: There are 250 dogs at a dog show who weigh an average of 12 pounds, with a standard deviation of 8 pounds. If 4 dogs are chosen at random, what is the probability they have an average weight of greater than 8 pounds and less than 25 pounds? The mean (average or μ) = 12. The standard deviation (σ) = 8. Sample size (n) = 4. In other words, the problem is asking: "What is the probability that the mean of a sample of 4 items will be less than 25 and more than 8?" Let X be the sample mean. Y = (X − μ)/(σ/sqrt(n)) ~ N(0,1). P((8 − μ)/(σ/sqrt(n)) < Y < (25 − μ)/(σ/sqrt(n)))
  • 62. Central limit theorem - Usage The mean (average or μ) = 12. The standard deviation (σ) = 8. Sample size (n) = 4. Let X be the sample mean. Y = (X − μ)/(σ/sqrt(n)) ~ N(0,1). P((8 − μ)/(σ/sqrt(n)) < Y < (25 − μ)/(σ/sqrt(n))) = P(-4/4 < Y < 13/4) = P(-1 < Y < 3.25) = pnorm(3.25) - pnorm(-1)
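For a probability between two values, the lower cumulative probability is subtracted from the upper one. A sketch of the dog-show calculation in R, with illustrative variable names:

```r
mu <- 12     # population mean weight (pounds)
sigma <- 8   # population standard deviation
n <- 4       # sample size

se <- sigma / sqrt(n)      # 4
z_low <- (8 - mu) / se     # -1
z_high <- (25 - mu) / se   # 3.25

# P(z_low < Y < z_high) = P(Y < z_high) - P(Y < z_low)
p_between <- pnorm(z_high) - pnorm(z_low)

print(round(c(z_low = z_low, z_high = z_high, p_between = p_between), 4))
```

With a sample of only 4 dogs, the normal approximation relies on the weights themselves being roughly normal.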
  • 63. Chi-square distribution If X is a standard normal random variable (mean 0 and variance 1), then X² has a Chi-square distribution with 1 degree of freedom. If X1, X2, X3, ..., Xn are independent standard normal random variables, then Y = X1² + X2² + X3² + ... + Xn² has a Chi-square distribution with n degrees of freedom. A normal variable with mean μ and variance σ² must first be standardized as (X − μ)/σ.
  • 64. Chi-square distribution X ~ Chi-square with n degrees of freedom. Probability density function: f(x) = c × x^(n/2 − 1) × e^(−x/2) for x > 0, where c is a constant, c = 1 / (2^(n/2) Γ(n/2)). E[X] = n. Var[X] = 2n.
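The definition and the two moments can be checked by simulation. The sketch below builds chi-square draws from sums of squared standard normals; the degrees of freedom, seed, and number of replicates are arbitrary assumptions.

```r
set.seed(7)
df <- 5             # degrees of freedom = number of squared normals summed
replicates <- 20000

# Each column holds df independent standard normal draws;
# the column sums of their squares are chi-square(df) draws
z <- matrix(rnorm(df * replicates), nrow = df)
y <- colSums(z^2)

# Theory: E[Y] = df and Var[Y] = 2 * df
print(c(mean = mean(y), variance = var(y)))

# The simulated proportion below a cutoff should be close to pchisq
print(c(simulated = mean(y <= 4), theoretical = pchisq(4, df = df)))
```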
  • 66. Chi-square distribution
      ?Chisquare
      dchisq(x, df, ncp = 0, log = FALSE)
      pchisq(q, df, ncp = 0, lower.tail = TRUE, log.p = FALSE)
      qchisq(p, df, ncp = 0, lower.tail = TRUE, log.p = FALSE)
      rchisq(n, df, ncp = 0)

      plot(dchisq(seq(from = 0, to = 10, by = 0.005), df=1))
      plot(dchisq(seq(from = 0, to = 10, by = 0.005), df=2))
      plot(dchisq(seq(from = 0, to = 10, by = 0.005), df=3))
      plot(dchisq(seq(from = 0, to = 10, by = 0.005), df=4))
  • 67. Chi-square distribution Let X1 and X2 be two independent normal random variables having mean μ = 0 and variance σ² = 16. Compute the following probability: (the expression appears only as an image on the original slide). Let X be a chi-square random variable with 3 degrees of freedom. Compute the probability P(0.35 < X < 7.81): pchisq(7.81, df = 3) - pchisq(0.35, df = 3)
  • 68. Student’s T - Distribution X1, ..., Xn are independent and identically distributed as N(μ, σ²), i.e. this is a sample of size n from a normally distributed population with mean μ and variance σ². Sample mean: X̄ = (X1 + ... + Xn)/n. Sample variance: S² = Σ(Xi − X̄)² / (n − 1). Then (X̄ − μ)/(σ/√n) has a standard normal distribution, and (X̄ − μ)/(S/√n) has a Student's t distribution with n − 1 degrees of freedom.
  • 69. Student’s T - Distribution Properties of the t Distribution: The mean of the distribution is equal to 0 (for v > 1). The variance is equal to v / (v − 2), where v is the degrees of freedom and v > 2. The variance is always greater than 1, although it is close to 1 when there are many degrees of freedom. With infinite degrees of freedom, the t distribution is the same as the standard normal distribution.
  • 70. Student’s T - Distribution
      ?TDist
      dt(x, df, ncp, log = FALSE)
      pt(q, df, ncp, lower.tail = TRUE, log.p = FALSE)
      qt(p, df, ncp, lower.tail = TRUE, log.p = FALSE)
      rt(n, df, ncp)
    Exercise: Plot the probability density function of Student's t distribution for 1 to 10 degrees of freedom.
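A possible approach to the plotting exercise is sketched below. The grid, colors, and the dashed standard normal reference curve are choices made for illustration, not the author's solution.

```r
# Densities of Student's t for 1 to 10 degrees of freedom on a common grid
x <- seq(-4, 4, length.out = 200)
dfs <- 1:10
dens <- sapply(dfs, function(v) dt(x, df = v))  # one column per df value

matplot(x, dens, type = "l", lty = 1, col = rainbow(length(dfs)),
        xlab = "t", ylab = "density",
        main = "Student's t density, df = 1 to 10")

# Standard normal for reference: the t curves approach it as df grows
lines(x, dnorm(x), lwd = 2, lty = 2)
legend("topright", legend = paste("df =", dfs),
       col = rainbow(length(dfs)), lty = 1, cex = 0.7)

# Theoretical variance v / (v - 2), defined only for v > 2
t_variance <- dfs[dfs > 2] / (dfs[dfs > 2] - 2)
print(round(t_variance, 3))
```

The curves with few degrees of freedom have lower peaks and heavier tails, which matches the variance being greater than 1.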

Editor's notes

  1. All probability distributions can be classified as discrete probability distributions or as continuous probability distributions, depending on whether they define probabilities associated with discrete variables or continuous variables. Examples of discrete data: the number of admissions in a hospital's accident and emergency unit each day over a period of two months; the number of people in each household in a survey of 10,000 households.
  2. http://stattrek.com/probability-distributions/binomial.aspx This has several applications in other fields of civil engineering, such as the probability of occurrence of peak floods greater than the design peak flood in a particular time period, probability of peak ground acceleration exceeding certain design value in a given time interval etc.
  5. The Standard Normal curve, shown here, has mean 0 and standard deviation 1. If a dataset follows a normal distribution, then about 68% of the observations will fall within one standard deviation σ of the mean μ, which in this case is the interval (-1,1). About 95% of the observations will fall within 2 standard deviations of the mean, which is the interval (-2,2) for the standard normal, and about 99.7% of the observations will fall within 3 standard deviations of the mean, which corresponds to the interval (-3,3) in this case. Although it may appear as if a normal distribution does not include any values beyond a certain interval, the density is actually positive for all values, (-∞, ∞). Data from any normal distribution may be transformed into data following the standard normal distribution by subtracting the mean μ and dividing by the standard deviation σ.
  6. You can use the online calculator (http://onlinestatbook.com/2/calculators/normal_dist.html) to find the proportion of a normal distribution with a mean of 90 and a standard deviation of 12 that is above 110. Set the mean to 90 and the standard deviation to 12. Then enter "110" in the box to the right of the radio button "Above." At the bottom of the display you will see that the shaded area is 0.0478. See if you can use the calculator to find that the area between 115 and 120 is 0.0124.
  9. Tail risk can be evaluated by assuming a normal distribution and computing the probability of such an event. Is that how "tail risk" should be evaluated?  http://onlinestatbook.com/2/normal_distribution/ch6_exercises.html
  10. http://rpubs.com/Lionel/11497
  11. http://stattrek.com/probability-distributions/t-distribution.aspx