Elements of Inference covers the following concepts and takes off right from where we left off in the previous slide https://www.slideshare.net/GiridharChandrasekar1/statistics1-the-basics-of-statistics.
Population Vs Sample (Measures)
Probability
Random Variables
Probability Distributions
Statistical Inference – The Concept
2. A Small Recap of Previous Presentation
• Descriptive Vs Inferential Statistics
• Sample Vs Population
• Need for sampling
3. STATISTICS
Descriptive
• Descriptive statistics, in a simple sense, provides a description of the data that we currently have.
• Example: what are the statistics of the performance of a class of students? An answer would be something like "the mean mark is 64.5".
Inferential
• Inferential statistics is when we have to infer an outcome by looking at just a small portion of the data.
• Example: who will win this election? An answer would be something like "a survey of 10000 people suggests that XYZ has a 60-70% chance, with 95% confidence".
4. Sample Vs Population
Sample
• A sample is a portion of the
population which is readily
available or easily attainable.
• Example would be a survey
or just a million people from
the population.
Population
• A population is the entire set of data that would ideally be used to compute a statistic.
• Example would be a census
or population of a country
5. Why do we go for a sample?
• Going around and asking the entire
population of people who they are going to
vote for is impossible.
• Taking the heights of the entire population is
not feasible as a lot would die and a lot would
be born by the time we finish.
• Sometimes the sample would be a
“Representative sample” meaning it has the
same nature and characteristics of the
population.
6. Elements of Inference
• Population Vs Sample (Measures)
• Probability
• Random Variables
• Probability Distributions
• Statistical Inference – The Concept
7. Population Vs Sample (Measures)
Population Measures
• Mean = μ
• SD = σ
• Var = σ²
• Variance formula: average squared distance from the population mean, $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i-\mu)^2$.
Sample Measures
• Mean = X̅
• SD = S
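• Var = S²
• Variance formula: MODIFIED average squared distance from the sample mean. The sum of squared distances is divided by n − 1 instead of n: $S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{X})^2$.
• The reason for the modification is explained in a separate simulation.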
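The difference between the two variance formulas can be sketched in a few lines of Python. The marks below are hypothetical illustration data, not taken from the slides, and the standard library's `statistics.pvariance` and `statistics.variance` are used only as a cross-check.

```python
import statistics

# Hypothetical marks for five students (illustrative data, not from the slides).
marks = [52, 61, 64, 70, 75]

mean = statistics.mean(marks)
squared_distances = sum((x - mean) ** 2 for x in marks)

# Population variance: average squared distance, divide by N.
population_variance = squared_distances / len(marks)
# Sample variance: "modified" average, divide by n - 1.
sample_variance = squared_distances / (len(marks) - 1)

print(population_variance, statistics.pvariance(marks))
print(sample_variance, statistics.variance(marks))
```

For the same data the sample variance is always the larger of the two, because the same sum is divided by a smaller number.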
10. Probability – Measure of Randomness
• Probability can be thought of as a measure of randomness
• It is usually stated about a population quantity
11. A Simple rule in subsets
• Advanced rule: if the occurrence of A implies the occurrence of B (A is a subset of B), then the probability of A occurring is at most the probability of B occurring.
• P(A) ≤ P(B).
[Figure: Venn diagram with A inside B]
12. Probability in Statistics
• Why probability is a part of statistics?
• Probability calculus helps model the
randomness and hence is a part of statistical
measures.
• Mass functions and density functions are the starting point.
• An example is the bell curve (normal distribution), along with other similar distributions.
13. Types of Data
Quantitative
• Quantitative data are called discrete if the sample space contains a finite or countably infinite number of values.
• S = {0, 1, 2, ..., 31} – students
passed
• S = {0, 1, 2, ...} – cars crossed in a
given hour
• Quantitative data are
called continuous if the sample
space contains an interval or
continuous span of real numbers.
• S = {h: h ≥ 0 hours} – number of
hours spent studying
Qualitative
• Qualitative data are
called categorical if the
sample space contains objects
that are grouped or
categorized based on some
qualitative trait.
• When there are only two such
groups or categories, the data
are considered binary.
• S = {yes, no} – binary
• S = {Male, Female, Other} -
Categorical
15. Random Variables
• Similar to variables in a computer program
• An RV is a variable which holds the numeric outcome of an experiment
• Types : Discrete and Continuous
16. Discrete random variables
• Examples of Discrete : Die roll
/ Coin toss etc.
• Coin toss – 2 possible values
{H, T}
• Roll of Dice – 6 possible
values {1,2,3,4,5,6}
• Modelling Discrete : We
associate a probability to all
individual outcomes.
• Web traffic in a given day: a whole-number count with no fixed upper bound, so the possible values are {0, 1, 2, ...}
Continuous random variables
• Example of continuous: number of hours I sleep daily (discrete?).
• Let's develop the above into a continuous variable.
• If you answer 7, I may ask: is it exactly 7 or 7.000001?
• Modelling continuous: we associate a probability with ranges of outcomes.
17. Continuous RV Example
• Height differentials of students in a class.
• Try to put the values in a SET -> { , , }
• Can you be sure if you get a value of 1.24?
• It can be 1.245.... Or 1.242...
• So a continuous RV does not have a fixed list of values; it can take infinitely many values.
• That is why we use ranges to model them, like [1-2], [2-3], [3-4].
19. Types of Distributions
Discrete Probability Distribution
• Bernoulli Distribution
• Binomial Distribution
• Geometric Distribution
• Poisson Distribution
Continuous Probability Distribution
• Uniform Distribution
• Normal Distribution
• Exponential Distribution
• Gamma Distribution
• Chi-Squared Distribution
20. Probability mass function (p.m.f)
• The probability that a discrete random variable X takes on a particular value x, that is, P(X = x), is also denoted f(x).
• The function f(x) is typically called the probability
mass function.
• A.k.a
• probability function
• frequency function
• probability density function.
21. PMF of Discrete Random Variables
• PMF(H) -> f(coin toss) -> P(coin toss = H) = ½
• Definition: the PMF is a function of a value x of the random variable X which gives the probability associated with that value, f(x) = P(X = x).
• f(x) cannot be negative (f(x) > 0 for every possible value x), and the probabilities over all values of the RV sum to 1.
22. Cumulative Distribution Function
• The function F(x) = P(X ≤ x) is called the cumulative probability distribution (also the cumulative distribution function, CDF).
• For a discrete random variable X, the cumulative probability distribution F(x) is determined by: $F(x) = \sum_{t \le x} f(t)$
23. PMF Vs CDF
• Note that the probability mass function, f(x),
of a discrete random variable X is
distinguished from the cumulative probability
distribution, F(x), of a discrete random
variable X by the use of a lowercase f and an
uppercase F.
• That is, the notation f(3) means P(X = 3), while
the notation F(3) means P(X ≤ 3).
24. Survival function
• The CDF and the survival function are simply named functions that make life easier.
• CDF(x) is the probability that the random variable takes the value x or lower: F(x) = P(X ≤ x).
• The survival function is the complement of the CDF.
• Survival fn(x) -> P(X > x) = 1 − F(x).
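As a small sketch of how f(x), F(x) and the survival function relate, the fair six-sided die from the earlier slide can be coded directly. The function names `cdf` and `sf` are illustrative choices, and exact fractions are used so that no rounding hides the arithmetic.

```python
from fractions import Fraction

# PMF of a fair six-sided die: each face has probability 1/6.
pmf = {face: Fraction(1, 6) for face in range(1, 7)}

def cdf(x):
    """F(x) = P(X <= x): sum the PMF over all values up to x."""
    return sum(p for value, p in pmf.items() if value <= x)

def sf(x):
    """Survival function S(x) = P(X > x) = 1 - F(x)."""
    return 1 - cdf(x)

print(pmf[3], cdf(3), sf(3))  # 1/6 1/2 1/2
```

Here f(3) = P(X = 3) = 1/6, while F(3) = P(X ≤ 3) = 1/2 and the survival function gives the remaining 1/2.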
25. Bernoulli Distribution
• X = 0 -> Tails
• X = 1 -> Heads
• $P(x) = (\tfrac{1}{2})^{x} (\tfrac{1}{2})^{1-x}$ for x = 0, 1 (fair coin)
• $P(x) = \theta^{x} (1-\theta)^{1-x}$ for a biased coin with P(Heads) = θ
• The above is the Bernoulli distribution, which models a single coin toss.
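A minimal sketch of the Bernoulli p.m.f. follows; the function name `bernoulli_pmf` and the bias value 0.7 are assumptions chosen for illustration.

```python
def bernoulli_pmf(x, theta):
    """P(X = x) = theta**x * (1 - theta)**(1 - x) for x in {0, 1}."""
    if x not in (0, 1):
        raise ValueError("a Bernoulli variable only takes the values 0 and 1")
    return theta ** x * (1 - theta) ** (1 - x)

theta = 0.7  # hypothetical probability of heads for a biased coin
p_heads = bernoulli_pmf(1, theta)
p_tails = bernoulli_pmf(0, theta)
print(p_heads, p_tails)
```

With x = 1 the second factor becomes 1 and the result is θ; with x = 0 the first factor becomes 1 and the result is 1 − θ.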
27. Binomial Distribution
• A discrete random variable X is a binomial random
variable if:
• An experiment, or trial, is performed in exactly the same
way n times.
• Each of the n trials has only two possible outcomes. One of
the outcomes is called a "success," while the other is called
a "failure." Such a trial is called a Bernoulli trial.
• The n trials are independent.
• The probability of success, denoted p, is the same for each
trial. The probability of failure is q = 1 − p.
• The random variable X = the number of successes in
the n trials.
28. Binomial Distribution
• The probability mass function of a binomial random
variable X is:
• $f(x) = \binom{n}{x} p^{x} (1-p)^{n-x} = \binom{n}{x} p^{x} q^{n-x}$ for x = 0, 1, ..., n
• We denote the binomial distribution as b(n, p). That is, we say:
• X ~ b(n, p)
• where the tilde (~) is read "is distributed as," and n and p are called parameters of the distribution.
29. Function, Mean and Variance of
Binomial Distribution
• P.m.f -> $f(x) = \binom{n}{x} p^{x} (1-p)^{n-x}$
• Mean -> np
• Sd -> $\sigma = \sqrt{np(1-p)}$
• Variance -> $\sigma^2 = np(1-p)$
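The formulas above can be checked numerically with a short sketch. The values n = 10 and p = 0.3 are hypothetical, and `binomial_pmf` is an illustrative helper name.

```python
from math import comb

def binomial_pmf(x, n, p):
    """f(x) = C(n, x) * p**x * (1 - p)**(n - x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 10, 0.3  # hypothetical parameters
probs = [binomial_pmf(x, n, p) for x in range(n + 1)]

# Mean and variance computed directly from the p.m.f.
mean = sum(x * f for x, f in enumerate(probs))
variance = sum((x - mean) ** 2 * f for x, f in enumerate(probs))

print(sum(probs))               # total probability, should be close to 1
print(mean, n * p)              # both close to 3
print(variance, n * p * (1 - p))  # both close to 2.1
```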
30. Geometric Distribution
• Assume Bernoulli trials — that is,
• (1) there are two possible outcomes,
• (2) the trials are independent, and
• (3) p, the probability of success, remains the same from trial to trial.
• Let X denote the number of trials until the first success. Then, the
probability mass function of X is:
• $f(x) = P(X = x) = (1-p)^{x-1} p$
• for x = 1, 2, ... In this case, we say that X follows a geometric
distribution.
31. Geometric Distribution Example
• A representative from the National Football
League's Marketing Division randomly selects
people on a random street in Kansas City, Kansas
until he finds a person who attended the last
home football game. Let p, the probability that he
succeeds in finding such a person, equal 0.20.
And, let X denote the number of people he
selects until he finds his first success. What is the
probability that the marketing representative
must select 4 people before he finds one who
attended the last home football game?
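One way to work this example, assuming the question is read as "the first success occurs on the 4th person selected" (X = 4), is to plug p = 0.20 into the geometric p.m.f. The helper name `geometric_pmf` is an illustrative choice.

```python
def geometric_pmf(x, p):
    """f(x) = (1 - p)**(x - 1) * p: first success on trial x."""
    return (1 - p) ** (x - 1) * p

p = 0.20  # probability that a selected person attended the game
prob_first_success_on_4th = geometric_pmf(4, p)
print(round(prob_first_success_on_4th, 4))  # 0.1024
```

Under that reading, the answer is three failures followed by one success: 0.8 × 0.8 × 0.8 × 0.2 = 0.1024.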
32. Function, Mean and Variance of
Geometric Distribution
• P.m.f -> $f(x) = P(X = x) = (1-p)^{x-1} p$
• Mean -> 1/p
• Sd -> $\sigma = \sqrt{1-p}\,/\,p$
• Variance -> $\sigma^2 = (1-p)/p^2$
33. Negative Binomial Distribution
• Assume Bernoulli trials from the same geometric distribution
example— that is,
• (1) there are two possible outcomes,
• (2) the trials are independent, and
• (3) p, the probability of success, remains the same from trial to trial.
• Let X denote the number of trials until the rth success. Then, the
probability mass function of X is:
• $f(x) = P(X = x) = \binom{x-1}{r-1} (1-p)^{x-r} p^{r}$
• for x = r, r + 1, r + 2, ... In this case, we say that X follows a negative
binomial distribution.
• A geometric distribution is a special case of a negative binomial
distribution with r = 1
34. Function, Mean and Variance of
Negative Binomial Distribution
• P.m.f -> $f(x) = \binom{x-1}{r-1} (1-p)^{x-r} p^{r}$
• Mean -> r/p
• Sd -> $\sqrt{r(1-p)}\,/\,p$
• Variance -> $r(1-p)/p^2$
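The sketch below codes the negative binomial p.m.f. and checks two statements from the slides: that r = 1 reduces to the geometric distribution, and that the mean is r/p. The parameter values and function names are assumptions for illustration, and the mean is approximated by truncating the infinite sum.

```python
from math import comb

def negative_binomial_pmf(x, r, p):
    """f(x) = C(x-1, r-1) * (1-p)**(x-r) * p**r: r-th success on trial x."""
    return comb(x - 1, r - 1) * (1 - p) ** (x - r) * p ** r

def geometric_pmf(x, p):
    """f(x) = (1-p)**(x-1) * p: first success on trial x."""
    return (1 - p) ** (x - 1) * p

p, r = 0.2, 3  # hypothetical parameters

# With r = 1 the negative binomial is the geometric distribution.
same_as_geometric = all(
    abs(negative_binomial_pmf(x, 1, p) - geometric_pmf(x, p)) < 1e-12
    for x in range(1, 50)
)

# Approximate the mean by summing x * f(x) over a long but finite range.
approx_mean = sum(x * negative_binomial_pmf(x, r, p) for x in range(r, 500))
print(same_as_geometric, approx_mean, r / p)
```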
35. Poisson Distribution
• Let the discrete random variable X denote the number of times an event occurs in an interval of time (or space). Then X may be a Poisson random variable with x = 0, 1, 2, ...
• Examples:
• Let X equal the number of typos on a printed page. (This is an example of
an interval of space — the space being the printed page.)
• Let X equal the number of cars passing through the intersection of Allen
Street and College Avenue in one minute. (This is an example of an
interval of time — the time being one minute.)
• Let X equal the number of Alaskan salmon caught in a squid driftnet. (This
is again an example of an interval of space — the space being the squid
driftnet.)
• Let X equal the number of customers at an ATM in 10-minute intervals.
• Let X equal the number of students arriving during office hours.
36. Function, Mean and Variance of
Poisson Distribution
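• P.m.f -> $f(x) = \frac{e^{-\lambda} \lambda^{x}}{x!}$ for x = 0, 1, 2, ...
• Mean -> λ
• Sd -> √λ
• Variance -> λ
The binomial distribution b(n, p) can be approximated by the Poisson distribution with λ = np when n is large and p is small.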
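To illustrate the relationship between the binomial and the Poisson distributions when n is large and p is small, the following sketch compares the two p.m.f.s. The values n = 1000 and p = 0.003 are hypothetical, and the helper names are illustrative.

```python
from math import comb, exp, factorial

def poisson_pmf(x, lam):
    """f(x) = exp(-lam) * lam**x / x!"""
    return exp(-lam) * lam ** x / factorial(x)

def binomial_pmf(x, n, p):
    """f(x) = C(n, x) * p**x * (1 - p)**(n - x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 1000, 0.003  # hypothetical: many trials, small success probability
lam = n * p         # matching Poisson mean

# Largest gap between the two p.m.f.s over x = 0..20.
max_gap = max(
    abs(binomial_pmf(x, n, p) - poisson_pmf(x, lam)) for x in range(21)
)
print(max_gap)
```

The gap shrinks further as n grows while np is held fixed.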
38. Continuous Distributions
• In this section, as the title suggests, we are going to investigate probability distributions of continuous random variables, that is, random variables whose set of possible values contains an interval, and therefore infinitely many possible outcomes.
39. Useful Pre-requisites
• Empirical rule (the 68-95-99.7 rule).
• It applies when the gathered data is mound-shaped or bell-shaped.
• We can use the following rules of thumb to identify the share of data points within 1, 2 and 3 standard deviations of the mean.
• About 68% of the data is between μ ± 1σ
• About 95% of the data is between μ ± 2σ
• About 99.7% of the data is between μ ± 3σ
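For an exactly normal distribution these percentages can be reproduced with Python's `statistics.NormalDist`; this is a check against the theoretical normal curve, not against any particular data set.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Share of a normal distribution within k standard deviations of the mean.
shares = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k, share in shares.items():
    print(k, round(share, 4))
```

The three printed shares are 0.6827, 0.9545 and 0.9973, which the rule rounds to 68%, 95% and 99.7%.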
40. Useful Pre-requisites
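• Quantiles
• A percentile is a quantile expressed on a scale of 0 to 100.
• The pth percentile is the value at or below which p% of the data falls.
• A quantile is stated on a scale of 0 to 1, so the 0.25 quantile is the 25th percentile.
• The median is the 0.5 quantile (50th percentile).
• The 0.25 quantile (25th percentile) has 1/4th of the data at or below it.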
41. Useful Pre-requisite
• The 25th percentile is also called the first
quartile and is denoted as q1.
• The 50th percentile is also called the second
quartile or median, and is denoted as q2 or m.
• The 75th percentile is also called the third
quartile and is denoted as q3.
• The interquartile range (IQR) is the difference between the third and first quartiles: IQR = q3 − q1.
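Quartiles and the IQR can be computed with `statistics.quantiles`. The data below are a hypothetical toy sample, and the `inclusive` method is one of several common interpolation conventions, so other tools may give slightly different quartiles.

```python
import statistics

# Hypothetical toy sample (illustrative data, not from the slides).
data = [2, 4, 4, 5, 7, 8, 9, 10, 12]

# n=4 splits the data into four parts and returns the three cut points.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

five_number_summary = (min(data), q1, q2, q3, max(data))
print(five_number_summary, iqr)
```

The same five values (minimum, q1, median, q3, maximum) make up the five-number summary.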
42. Useful Pre-requisite
• Five-Number Summary
• Example: for a random sample of 20 concentrations of calcium carbonate (CaCO3), in milligrams per litre, the five-number summary is:
• Minimum: 127.8
• First quartile: 130.12
• Median: 131.45
• Third quartile: 132.70
• Maximum: 134.8
43. Useful Pre-requisite
• Skewness and Symmetry
• For a distribution that is skewed left, the bulk of the
data values (including the median) lie to the right of
the mean, and there is a long tail on the left side.
• For a distribution that is skewed right, the bulk of the
data values (including the median) lie to the left of the
mean, and there is a long tail on the right side.
• For a distribution that is symmetric, approximately half
of the data values lie to the left of the mean, and
approximately half of the data values lie to the right of
the mean.
45. Probability Density Function
• The probability that X takes on any particular
value x is 0. That is, finding P(X = x) for a
continuous random variable X is not going to
work.
• Instead, we'll need to find the probability
that X falls in some interval (a, b), that is, we'll
need to find P(a < X < b). We'll do that using a
probability density function ("p.d.f.").
46. Probability Density Function
• PDF -> Helps model a continuous RV; probabilities are represented by areas under the curve.
• The value is never negative anywhere along the curve or function: f(x) ≥ 0.
• Total area under the curve is 1.
• The area under the curve over an interval gives the probability that the random variable falls in that interval.
• Example: the probability that a person has an IQ greater than 100 but less than 115 is the area under the PDF between 100 and 115.
[Figure: PDF curve with the area between 100 and 115 shaded]
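The "area under the curve" idea can be sketched by numerically integrating a density between 100 and 115. The slide does not state which PDF is drawn, so the N(100, 16²) IQ model below is an assumption, and the trapezoid rule is just one simple way to approximate the area.

```python
from statistics import NormalDist

# Assumed IQ model for illustration: normal with mean 100 and sd 16.
iq = NormalDist(mu=100, sigma=16)

def trapezoid_area(pdf, a, b, steps=1000):
    """Approximate the area under pdf between a and b with trapezoids."""
    width = (b - a) / steps
    total = 0.5 * (pdf(a) + pdf(b))
    for i in range(1, steps):
        total += pdf(a + i * width)
    return total * width

area = trapezoid_area(iq.pdf, 100, 115)
exact = iq.cdf(115) - iq.cdf(100)  # same probability from the CDF
print(area, exact)
```

Both numbers estimate P(100 < X < 115) and agree to several decimal places.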
47. Special Considerations on PDF
• The probability that the random variable takes any specific value is 0, since the area above a single point (a line on the curve, not a region under it) is 0.
• The PDF always describes the population (NOT the sample).
48. Cumulative Distribution Function
• The function F(x) = P(X ≤ x) is called the cumulative probability distribution.
• For a discrete random variable X, F(x) is: $F(x) = \sum_{t \le x} f(t)$
• For a continuous RV X, F(x) is: $F(x) = \int_{-\infty}^{x} f(t)\,dt$
• The summation is replaced by an integral.
49. Survival function
• The CDF and the survival function are simply named functions that make life easier.
• CDF(x) is the probability that the random variable takes the value x or lower: F(x) = P(X ≤ x).
• The survival function is the complement of the CDF.
• Survival fn(x) -> P(X > x) = 1 − F(x).
[Figure: CDF and SF curves]
50. Continuous Distributions
• Uniform Distribution
• Beta Distribution
• Normal Distribution
• Exponential Distribution
• Gamma Distribution
• Chi-Squared Distribution
51. Uniform Distribution
• A continuous random variable X has a uniform distribution, denoted U(a, b), if its probability density function is $f(x) = \frac{1}{b-a}$ for a ≤ x ≤ b (and 0 elsewhere).
52. P.d.f, mean and variance of a Uniform
distribution
• P.d.f -> $f(x) = \frac{1}{b-a}$ for a ≤ x ≤ b
• Mean -> $\mu = \frac{a+b}{2}$
• Variance -> $\sigma^2 = \frac{(b-a)^2}{12}$
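As a rough check of the uniform distribution's mean (a + b)/2 and variance (b − a)²/12, the sketch below integrates a U(a, b) density numerically with the midpoint rule. The endpoints a = 2 and b = 8 and the helper name are hypothetical.

```python
def uniform_pdf(x, a, b):
    """Density of U(a, b): constant 1/(b - a) on [a, b], zero elsewhere."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

a, b = 2.0, 8.0  # hypothetical endpoints
steps = 10_000
width = (b - a) / steps
midpoints = [a + (i + 0.5) * width for i in range(steps)]

# Midpoint-rule approximations of the area, the mean and the variance.
area = sum(uniform_pdf(x, a, b) for x in midpoints) * width
mean = sum(x * uniform_pdf(x, a, b) for x in midpoints) * width
variance = sum((x - mean) ** 2 * uniform_pdf(x, a, b) for x in midpoints) * width

print(area, mean, variance)
print((a + b) / 2, (b - a) ** 2 / 12)
```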
53. Exponential Distribution
• Suppose X, following an (approximate) Poisson process,
equals the number of customers arriving at a bank in an
interval of length 1.
• If λ, the mean number of customers arriving in an interval
of length 1, is 6, say, then we might observe something like
this:
• [Figure: customer arrivals within the interval, not reproduced]
• w – waiting time
54. Exponential Distribution – contd..
• Previously, our focus would have been on the
discrete random variable X, the number of
customers arriving.
• As the picture suggests, however, we could
alternatively be interested in the continuous
random variable W, the waiting time until
the first customer arrives.
55. p.d.f, mean and variance of
Exponential Distribution
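• P.d.f -> $f(w) = \lambda e^{-\lambda w} = \frac{1}{\theta} e^{-w/\theta}$ for w ≥ 0, where θ = 1/λ
• Mean -> θ = 1/λ
• Variance -> θ² = 1/λ²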
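The link between the Poisson count X and the exponential waiting time W can be sketched as follows: waiting longer than w for the first customer is the same event as seeing zero arrivals in an interval of length w. The rate λ = 6 comes from the bank example; the waiting time w = 0.25 and the function names are assumptions for illustration.

```python
from math import exp, factorial

def exponential_cdf(w, lam):
    """P(W <= w) = 1 - exp(-lam * w) for w >= 0."""
    return 1.0 - exp(-lam * w) if w >= 0 else 0.0

def poisson_pmf(x, mean):
    """P(X = x) for a Poisson count with the given mean."""
    return exp(-mean) * mean ** x / factorial(x)

lam = 6.0  # mean number of customers per unit interval
w = 0.25   # hypothetical waiting time, a quarter of the interval

p_wait_longer = 1.0 - exponential_cdf(w, lam)  # P(W > w)
p_no_arrivals = poisson_pmf(0, lam * w)        # P(no customers in [0, w])
print(p_wait_longer, p_no_arrivals)
```

Both expressions equal e^(−λw), which is why the exponential distribution appears as the waiting-time distribution of a Poisson process.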
56. Gamma Distributions
• We learned that in an approximate Poisson process with mean λ, the waiting time X until the first event occurs follows an exponential distribution with mean θ = 1/λ.
• We now let W denote the waiting time until the αth event occurs and find the distribution of W. We could represent the situation as follows:
• [Figure: not reproduced]
57. p.d.f, mean and variance of Gamma
Distribution
• P.d.f -> $f(w) = \frac{1}{\Gamma(\alpha)\,\theta^{\alpha}} w^{\alpha-1} e^{-w/\theta}$ for w > 0
• Mean -> αθ
• Variance -> αθ²
58. Chi-Square Distribution
• The chi-square distribution is just a special case of the gamma distribution!
• Let X follow a gamma distribution with θ = 2 and α = r/2, where r is a positive integer. Then we say that X follows a chi-square distribution with r degrees of freedom, denoted χ²(r) and read "chi-square-r."
59. p.d.f, mean and variance of Chi-Square
Distribution
• P.d.f -> $f(x) = \frac{1}{\Gamma(r/2)\,2^{r/2}} x^{r/2-1} e^{-x/2}$ for x > 0
• Mean -> r
• Variance -> 2r
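The "special case" relationships can be checked numerically. In this sketch the function names and the test points are illustrative assumptions; the gamma density uses the shape α and scale θ parameterization, with θ = 1/λ as in the exponential slides.

```python
from math import exp, gamma

def gamma_pdf(w, alpha, theta):
    """Gamma density with shape alpha and scale theta, for w > 0."""
    return w ** (alpha - 1) * exp(-w / theta) / (gamma(alpha) * theta ** alpha)

def chi_square_pdf(x, r):
    """Chi-square density with r degrees of freedom, for x > 0."""
    return x ** (r / 2 - 1) * exp(-x / 2) / (gamma(r / 2) * 2 ** (r / 2))

def exponential_pdf(w, theta):
    """Exponential density with mean theta, for w >= 0."""
    return exp(-w / theta) / theta

x, r = 3.0, 4  # hypothetical evaluation point and degrees of freedom
chi_via_gamma = gamma_pdf(x, alpha=r / 2, theta=2)  # gamma with theta=2, alpha=r/2
chi_direct = chi_square_pdf(x, r)

exp_via_gamma = gamma_pdf(0.5, alpha=1, theta=0.25)  # alpha = 1 gives the exponential
exp_direct = exponential_pdf(0.5, theta=0.25)
print(chi_via_gamma, chi_direct, exp_via_gamma, exp_direct)
```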
60. Normal Distribution
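• One of the most frequently seen distributions in the natural world.
• P.d.f -> $f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$ for −∞ < x < ∞
• Mean -> μ
• Variance -> σ²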
61. Properties of Normal Distribution
• All normal curves are bell-shaped.
• All normal curves are symmetric about the mean μ.
• The area under an entire normal curve is 1.
• All normal curves are positive for all x. That is, f(x) > 0
for all x.
• The limit of f(x) as x goes to infinity is 0, and the limit
of f(x) as x goes to negative infinity is 0.
• The height of any normal curve is maximized at x = µ.
• The shape of any normal curve depends on its
mean μ and standard deviation σ.
62. Standard Normal Distribution
• If X ~ N(μ, σ²), then Z = (X − μ)/σ follows N(0,1).
• This means that Z is a random variable which
follows the N(0,1) distribution, which is called
the standardized (or standard) normal
distribution.
• Now we can use the standard normal N(0,1)
table, typically referred to as the Z-table, to find
the desired probability.
63. Finding probabilities in normal
distribution?
• Let's look at the following question:
• Let X equal the IQ of a randomly selected American. Assume X ~ N(100, 16²). What is the probability that a randomly selected American has an IQ below 90?
64. Finding probabilities in normal
distribution?
• The following integral gives us the answer to the question: $P(X < 90) = \int_{-\infty}^{90} \frac{1}{16\sqrt{2\pi}} \exp\left(-\frac{(x-100)^2}{2 \cdot 16^2}\right) dx$
• But there is just one problem.
• It is not possible to integrate the normal p.d.f. in closed form. That is, no simple expression exists for the antiderivative. We can only approximate the integral using numerical analysis techniques.
• So we would need a normal probability table for a normal distribution with mean μ = 100 and standard deviation σ = 16. If every case needed its own table, there would have to be an infinite number of normal probability tables for the various μ and σ.
65. Finding probabilities in normal
distribution? - Solution
• The cumulative probabilities have been tabled
for the N(0,1) distribution.
• All we need to do is transform our N(100, 16²) distribution to a N(0,1) distribution
• Then use the cumulative probability table for
the N(0,1) distribution to calculate our desired
probability.
66. Transforming X ~ N(μ, σ²) to Z ~ N(0,1)
• If X ~ N(μ, σ²), then: $Z = \frac{X-\mu}{\sigma}$
• follows the N(0,1) distribution, which is called
the standardized (or standard) normal
distribution.
• Use the standard normal N(0,1) table, typically
referred to as the Z-table, to find the desired
probability.
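The standardization step for the IQ question can be sketched in Python, with `statistics.NormalDist` standing in for the Z-table. The variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 100, 16  # X ~ N(100, 16^2), as in the IQ question
x = 90

# Step 1: transform x to a z value.
z = (x - mu) / sigma  # (90 - 100) / 16 = -0.625

# Step 2: look up the cumulative probability for z in N(0, 1).
p_via_z = NormalDist().cdf(z)

# Cross-check: the same probability computed directly on N(100, 16^2).
p_direct = NormalDist(mu, sigma).cdf(x)
print(z, p_via_z, p_direct)
```

Both routes give the same probability, about 0.266, which is the point of the transformation: one N(0,1) table serves every μ and σ.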
67. Functions to find probability or X value
• The first method is to use well-known standard normal quantiles: cumulative probabilities of 90%, 95%, 97.5% and 99% correspond to μ + 1.28σ, μ + 1.645σ, μ + 1.96σ and μ + 2.33σ respectively
• Second method is to use the Z-Table.
• Excel and R Formulas are described in detail in
the following slides.
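The z values quoted in the first method can be reproduced with the inverse CDF of the standard normal distribution; this sketch uses `statistics.NormalDist.inv_cdf` from the Python standard library.

```python
from statistics import NormalDist

std_normal = NormalDist()  # N(0, 1)

# z value below which the given cumulative probability lies.
z_values = {p: std_normal.inv_cdf(p) for p in (0.90, 0.95, 0.975, 0.99)}

for p, z in z_values.items():
    print(p, round(z, 3))
```

The results round to 1.28, 1.645, 1.96 and 2.33.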
68. Excel Formula for finding probability
• X value, μ, σ are given.
• NORMDIST(x, μ, σ, cumulative)
• X is the value for which you want the distribution.
• μ is the arithmetic mean of the distribution.
• σ is the standard deviation of the distribution.
• Cumulative :
• If TRUE, returns the c.d.f. (Area to the left) P(X ≤ x)
• if FALSE, it returns the p.d.f.
• 1 - NORMDIST(x, μ, σ, TRUE) (Area to the right) P(X > x)
69. Excel Formula for finding x value
• Probability, μ, σ are given.
• NORMINV(probability, μ, σ )
• Probability is a probability corresponding to the normal
distribution.
• μ is the arithmetic mean of the distribution.
• σ is the standard deviation of the distribution.
• For STANDARD NORMAL.
• NORMSDIST(z)
• Z is the value for which you want the distribution.
• NORMSINV(probability)
• Probability is a probability corresponding to the normal
distribution.
70. R functions for finding probability
• pnorm(q, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)
• Defaults are specified
• q - The x values that we need to find the
probability for.
• We can also directly calculate z value and use z value like
given below
• pnorm(z, lower.tail = TRUE, log.p = FALSE)
71. R functions for finding x value
• qnorm(p, mean, sd, lower.tail, log.p)
• p - probability (no default)
(usually given as a fraction, like
0.92 for 92%)
• mean - mean (defaults to 0)
• sd - standard deviation (defaults to 1)
• lower.tail - TRUE -> p is P(X ≤ x)
- FALSE -> p is P(X > x)
• log.p - TRUE -> p is given as log(p)
- FALSE -> p is a plain probability
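For readers without Excel or R at hand, the same two lookups can be sketched in Python. The helper functions below are hypothetical Python analogues written for this illustration; they borrow the names `pnorm` and `qnorm` but are not the R functions themselves, and they omit the `log.p` option.

```python
from statistics import NormalDist

def pnorm(q, mean=0.0, sd=1.0, lower_tail=True):
    """Probability lookup: P(X <= q), or P(X > q) when lower_tail is False."""
    p = NormalDist(mean, sd).cdf(q)
    return p if lower_tail else 1.0 - p

def qnorm(p, mean=0.0, sd=1.0, lower_tail=True):
    """Value lookup: the x with P(X <= x) = p, or P(X > x) = p when lower_tail is False."""
    if not lower_tail:
        p = 1.0 - p
    return NormalDist(mean, sd).inv_cdf(p)

left = pnorm(90, mean=100, sd=16)                     # area to the left of 90
right = pnorm(90, mean=100, sd=16, lower_tail=False)  # area to the right of 90
x_back = qnorm(left, mean=100, sd=16)                 # recovers 90
print(left, right, x_back)
```

The probability lookup plays the role of NORMDIST(x, μ, σ, TRUE), and the value lookup plays the role of NORMINV(probability, μ, σ).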
72. Relationship between normal and Chi-Square Distribution
• If X is normally distributed with mean μ and variance σ² > 0, then: $V = \left(\frac{X-\mu}{\sigma}\right)^2 = Z^2$
• is distributed as a chi-square random variable with 1 degree of freedom.
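A small simulation can make this relationship plausible. The sketch below draws normal values, standardizes and squares them, and compares the share that falls at or below a cut-off with the chi-square(1) CDF, which has the closed form erf(√(v/2)). The parameters μ = 100 and σ = 16, the cut-off 3.84, the sample size and the seed are all assumptions chosen for illustration.

```python
import random
from math import erf, sqrt

random.seed(0)  # fixed seed so the run is repeatable

mu, sigma = 100.0, 16.0  # hypothetical normal parameters
v = 3.84                 # hypothetical cut-off on the chi-square scale
n = 100_000

# Simulate V = ((X - mu) / sigma)^2 and count how often V <= v.
hits = sum(
    ((random.gauss(mu, sigma) - mu) / sigma) ** 2 <= v for _ in range(n)
)
empirical = hits / n

# CDF of a chi-square distribution with 1 degree of freedom at v.
theoretical = erf(sqrt(v / 2))
print(empirical, theoretical)
```

Both values land near 0.95, matching the familiar fact that about 95% of a normal distribution lies within 1.96 standard deviations of the mean (1.96² ≈ 3.84).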
74. What is Statistical Inference ?
• Generating conclusions about a population from a noisy sample
• We try to estimate population quantities from the data available in the form of samples
• Historical data is one of the most widely available kinds of data
• Survey data is the other common form of available data
75. Statistical Inference – The process
• We have sample data
• Hence we will have a measure for it, like the mean, median or mode.
• The sample measure is called the estimator; it is used to estimate the corresponding population measure.
• The sample mean is an estimate of the population mean.
• The sample median is an estimate of the population median.
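The process can be sketched with a simulated population. Everything in this example is hypothetical: the population of 100,000 heights is generated from a normal model with mean 170 and standard deviation 10, and a random sample of 1,000 stands in for the data we would actually observe.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

# Hypothetical population: 100,000 simulated heights in centimetres.
population = [random.gauss(170, 10) for _ in range(100_000)]

# In practice we only see a sample, here 1,000 values drawn at random.
sample = random.sample(population, 1_000)

# Sample measures (estimators) next to the population measures they estimate.
sample_mean = statistics.mean(sample)
population_mean = statistics.mean(population)
sample_median = statistics.median(sample)
population_median = statistics.median(population)

print(sample_mean, population_mean)
print(sample_median, population_median)
```

The sample values are close to, but not exactly equal to, the population values; quantifying that gap is what the rest of statistical inference is about.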