STATISTICS - 2
Elements of Inference
A Small Recap of Previous Presentation
• Descriptive Vs Inferential Statistics
• Sample Vs Population
• Need for sampling
STATISTICS
Descriptive
• In simple terms, descriptive statistics describes the data that we currently have.
• For example, asked "How did this class of students perform?", a descriptive answer would be "the mean mark is 64.5".
Inferential
• Inferential statistics is when we infer an outcome by looking at only a small portion of the data.
• For example, asked "Who will win this election?", an inferential answer would be "a survey of 10,000 people suggests that XYZ will receive 60-70% of the vote, with 95% confidence".
Sample Vs Population
Sample
• A sample is a portion of the population that is readily available or easily attainable.
• For example, a survey of, say, a million people drawn from the population.
Population
• A population is the entire set of data that would ideally be used to compute a statistic.
• For example, a census of a country's whole population.
Why do we sample?
• Going around and asking the entire population who they are going to vote for is impossible.
• Measuring the heights of the entire population is not feasible: many people would die and many would be born by the time we finished.
• Ideally, the sample is a “representative sample”, meaning it has the same nature and characteristics as the population.
Elements of Inference
• Population Vs Sample (Measures)
• Probability
• Random Variables
• Probability Distributions
• Statistical Inference – The Concept
Population Vs Sample (Measures)
Population Measures
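• Mean = μ
• SD = σ
• Var = σ²
• Variance formula: the average squared distance from the population mean, $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$
Sample Measures
• Mean = X̅
• SD = S
• Var = S²
• Variance formula: a MODIFIED average squared distance from the sample mean, dividing by n − 1 instead of n: $S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{X})^2$
(The reason for the modification is explained in a separate simulation.)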
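Alternative formula for variance
• Population variance: $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} x_i^2 - \mu^2$
• Sample variance: $S^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} x_i^2 - n\bar{X}^2\right)$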
Find the proof here
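A minimal Python sketch (Python and the data values are illustrative choices, not from the slides) that computes both variances from their definitions and from the alternative formulas, and compares them with the standard library's `statistics.pvariance` and `statistics.variance`:

```python
from statistics import pvariance, variance

def population_variance(xs):
    """Average squared distance from the population mean (divide by N)."""
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)

def sample_variance(xs):
    """Modified average: divide by n - 1 instead of n."""
    x_bar = sum(xs) / len(xs)
    return sum((x - x_bar) ** 2 for x in xs) / (len(xs) - 1)

def population_variance_alt(xs):
    """Alternative form: mean of the squares minus the square of the mean."""
    n = len(xs)
    mu = sum(xs) / n
    return sum(x * x for x in xs) / n - mu ** 2

def sample_variance_alt(xs):
    """Alternative form: (sum of squares - n * x_bar**2) / (n - 1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    return (sum(x * x for x in xs) - n * x_bar ** 2) / (n - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values
print(population_variance(data), population_variance_alt(data), pvariance(data))
print(sample_variance(data), sample_variance_alt(data), variance(data))
```

The alternative formulas are convenient by hand, but with floating-point numbers they can lose precision when the values are large relative to their spread, so the definitional forms are usually safer in code.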
Probability
Probability – Measure of Randomness
• Probability can be thought of as a measure of randomness.
• It is usually stated for a population quantity.
A simple rule for subsets
• If the occurrence of A implies the occurrence of B (A is a subset of B), then the probability that A occurs is less than or equal to the probability that B occurs:
• P(A) ≤ P(B).
[Figure: Venn diagram with A drawn inside B]
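A small exact check of the subset rule, assuming a fair six-sided die (a hypothetical example): A = "roll a 6" implies B = "roll an even number", so P(A) should not exceed P(B).

```python
from fractions import Fraction

# Sample space of one fair six-sided die; each outcome has probability 1/6
omega = range(1, 7)

def prob(event):
    """Exact probability of an event (a predicate on outcomes) for a fair die."""
    favourable = sum(1 for w in omega if event(w))
    return Fraction(favourable, len(omega))

def is_six(w):   # event A: roll a 6
    return w == 6

def is_even(w):  # event B: roll an even number
    return w % 2 == 0

# A implies B: every outcome in A is also in B
assert all(is_even(w) for w in omega if is_six(w))
print(prob(is_six), prob(is_even))  # 1/6 1/2
```

The rule uses ≤ rather than <, because equality holds when A and B contain exactly the same outcomes.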
Probability in Statistics
• Why is probability part of statistics?
• Probability calculus helps model randomness, and hence is part of statistical measures.
• Probability mass functions and probability density functions are the starting point.
• An example is the bell curve (normal distribution), along with other similar distributions.
Types of Data
Quantitative
• Quantitative data are called discrete if the sample space contains a finite or countably infinite number of values.
• S = {0, 1, 2, ..., 31}: number of students (out of 31) who passed
• S = {0, 1, 2, ...}: number of cars crossing in a given hour
• Quantitative data are called continuous if the sample space contains an interval or continuous span of real numbers.
• S = {h: h ≥ 0}: number of hours spent studying
Qualitative
• Qualitative data are
called categorical if the
sample space contains objects
that are grouped or
categorized based on some
qualitative trait.
• When there are only two such
groups or categories, the data
are considered binary.
• S = {yes, no} – binary
• S = {Male, Female, Other} -
Categorical
Random Variables
Random Variables
• Similar to variables in a computer program.
• An RV is a variable that holds the numeric outcome of an experiment.
• Types : Discrete and Continuous
Discrete random variables
• Examples of discrete RVs: die roll, coin toss, etc.
• Coin toss: 2 possible values {H, T}
• Roll of a die: 6 possible values {1, 2, 3, 4, 5, 6}
• Modelling discrete RVs: we associate a probability with every individual outcome.
• Web traffic on a given day: a whole-number count with no fixed upper bound.
Continuous random variables
• Example of a continuous RV: the number of hours I sleep daily (discrete?).
• Let's develop the above into a continuous variable.
• If you answer 7, I may ask: is it 7 or 7.000001?
• Modelling continuous RVs: we associate probabilities with ranges of outcomes.
Continuous RV Example
• Height differences of students in a class.
• Try to put the values in a set -> { , , }
• If you record a value of 1.24, can you be sure it is exactly 1.24?
• It could be 1.245... or 1.242...
• So a continuous RV does not take a handful of well-defined values; it can take infinitely many values.
• That is why we model it using ranges, such as [1, 2], [2, 3], [3, 4].
Probability Distributions
Types of Distributions
Discrete Probability Distribution
• Bernoulli Distribution
• Binomial Distribution
• Geometric Distribution
• Poisson Distribution
Continuous Probability Distribution
• Uniform Distribution
• Normal Distribution
• Exponential Distribution
• Gamma Distribution
• Chi-Squared Distribution
Probability mass function (p.m.f)
• The probability that a discrete random variable X takes on a particular value x is written P(X = x), also denoted f(x).
• The function f(x) is typically called the probability mass function.
• Also known as the:
• probability function
• frequency function
• probability density function (used by some authors, although the term usually refers to continuous RVs).
PMF of Discrete Random Variables
• For a fair coin, f(H) = P(X = H) = ½.
• The PMF is a function of a value x of the random variable X that gives the probability associated with that value.
• A PMF value can never be negative (it may be zero), and the values over all possible x sum to 1.
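A sketch of a PMF as a lookup table, using a fair die as an assumed example; values outside the sample space get probability 0, which is allowed.

```python
from fractions import Fraction

# PMF of a fair six-sided die: f(x) = P(X = x) = 1/6 for x in {1, ..., 6}
die_pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def f(x):
    """PMF lookup: values the die cannot show get probability 0."""
    return die_pmf.get(x, Fraction(0))

assert all(p >= 0 for p in die_pmf.values())  # never negative
assert sum(die_pmf.values()) == 1             # total probability is 1
print(f(3), f(7))  # 1/6 0
```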
Cumulative Distribution Function
• The function F(x) = P(X ≤ x) is called a cumulative probability distribution (cumulative distribution function, CDF).
• For a discrete random variable X, the cumulative probability distribution F(x) is determined by:
$F(x) = P(X \le x) = \sum_{t \le x} f(t)$
PMF Vs CDF
• Note that the probability mass function, f(x), of a discrete random variable X is distinguished from the cumulative probability distribution, F(x), of a discrete random variable X by the use of a lowercase f and an uppercase F.
• That is, the notation f(3) means P(X = 3), while the notation F(3) means P(X ≤ 3).
Survival function
• The CDF and the survival function are simply named functions that make calculations easier.
• CDF(x) is the probability that the random variable takes the value x or lower.
• The survival function is the complement of the CDF.
• Survival fn(x) -> P(X > x) = 1 − F(x).
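Continuing with an assumed fair die, the CDF sums the PMF up to x and the survival function is its complement; the printed values illustrate the f(3) versus F(3) distinction.

```python
from fractions import Fraction

die_pmf = {x: Fraction(1, 6) for x in range(1, 7)}  # assumed fair die

def cdf(x):
    """F(x) = P(X <= x): sum the PMF over all values t <= x."""
    return sum((p for t, p in die_pmf.items() if t <= x), Fraction(0))

def survival(x):
    """S(x) = P(X > x) = 1 - F(x)."""
    return 1 - cdf(x)

print(die_pmf[3], cdf(3), survival(3))  # f(3) = 1/6, F(3) = 1/2, S(3) = 1/2
```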
Bernoulli Distribution
• X = 0 -> Tails
• X = 1 -> Heads
• $P(x) = \left(\tfrac{1}{2}\right)^{x}\left(\tfrac{1}{2}\right)^{1-x}$ for a fair coin
• $P(x) = \theta^{x}(1-\theta)^{1-x}$ for a biased coin
• The above is the Bernoulli distribution, which models a coin toss.
Bernoulli Distributions
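• Bernoulli p.m.f.: $f(x) = p^{x}(1-p)^{1-x}$, for x = 0, 1
• Mean = p
• Variance = p(1 − p)
[Figure: Bernoulli p.m.f. with a bar of height 1 − p at x = 0 and a bar of height p at x = 1]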
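A quick numerical check of the Bernoulli mean and variance for a hypothetical biased coin with p = 0.3, computing both directly from the p.m.f.:

```python
def bernoulli_pmf(x, p):
    """f(x) = p**x * (1 - p)**(1 - x) for x in {0, 1}; 0 elsewhere."""
    if x not in (0, 1):
        return 0.0
    return p ** x * (1 - p) ** (1 - x)

p = 0.3  # hypothetical P(heads) for a biased coin
mean = sum(x * bernoulli_pmf(x, p) for x in (0, 1))
var = sum((x - mean) ** 2 * bernoulli_pmf(x, p) for x in (0, 1))
print(mean, var)  # approximately 0.3 and 0.21, i.e. p and p(1 - p)
```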
Binomial Distribution
• A discrete random variable X is a binomial random
variable if:
• An experiment, or trial, is performed in exactly the same
way n times.
• Each of the n trials has only two possible outcomes. One of
the outcomes is called a "success," while the other is called
a "failure." Such a trial is called a Bernoulli trial.
• The n trials are independent.
• The probability of success, denoted p, is the same for each
trial. The probability of failure is q = 1 − p.
• The random variable X = the number of successes in
the n trials.
Binomial Distribution
• The probability mass function of a binomial random
variable X is:
• $f(x) = \binom{n}{x} p^{x}(1-p)^{n-x}$, or equivalently $\binom{n}{x} p^{x} q^{n-x}$
• We denote the binomial distribution as b(n, p). That is, we
say:
• X ~ b(n, p)
• where the tilde (~) is read "as distributed as,"
and n and p are called parameters of the distribution.
Function, Mean and Variance of
Binomial Distribution
• P.m.f -> $f(x) = \binom{n}{x} p^{x}(1-p)^{n-x}$
• Mean -> np
• Sd -> $\sigma = \sqrt{np(1-p)}$
• Variance -> $\sigma^2 = np(1-p)$
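A sketch of the binomial p.m.f. with `math.comb` (Python 3.8+), checking numerically that the probabilities sum to 1 and that the mean and variance match np and np(1 − p); n = 10 and p = 0.5 are hypothetical values.

```python
from math import comb, isclose, sqrt

def binom_pmf(x, n, p):
    """f(x) = C(n, x) * p**x * (1 - p)**(n - x), for x = 0, 1, ..., n."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 10, 0.5  # hypothetical: 10 tosses of a fair coin
probs = [binom_pmf(x, n, p) for x in range(n + 1)]
mean = sum(x * q for x, q in enumerate(probs))
var = sum((x - mean) ** 2 * q for x, q in enumerate(probs))

assert isclose(sum(probs), 1.0)
assert isclose(mean, n * p) and isclose(var, n * p * (1 - p))
print(binom_pmf(5, n, p), mean, sqrt(var))  # P(X = 5) = 252/1024
```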
Geometric Distribution
• Assume Bernoulli trials — that is,
• (1) there are two possible outcomes,
• (2) the trials are independent, and
• (3) p, the probability of success, remains the same from trial to trial.
• Let X denote the number of trials until the first success. Then, the
probability mass function of X is:
• $f(x) = P(X = x) = (1-p)^{x-1}p$
• for x = 1, 2, ... In this case, we say that X follows a geometric
distribution.
Geometric Distribution Example
• A representative from the National Football
League's Marketing Division randomly selects
people on a random street in Kansas City, Kansas
until he finds a person who attended the last
home football game. Let p, the probability that he
succeeds in finding such a person, equal 0.20.
And, let X denote the number of people he
selects until he finds his first success. What is the
probability that the marketing representative
must select 4 people before he finds one who
attended the last home football game?
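The slide leaves the answer open. A sketch using the geometric p.m.f. with p = 0.20; the question's wording can be read two ways, so both readings are computed:

```python
def geom_pmf(x, p):
    """f(x) = (1 - p)**(x - 1) * p: the first success occurs on trial x."""
    return (1 - p) ** (x - 1) * p

p = 0.20
# Reading 1: four people fail, then the 5th succeeds, so X = 5
print(geom_pmf(5, p))  # 0.8**4 * 0.2, about 0.0819
# Reading 2: the 4th person selected is the first success, so X = 4
print(geom_pmf(4, p))  # 0.8**3 * 0.2, about 0.1024
```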
Function, Mean and Variance of
Geometric Distribution
• P.m.f -> $f(x) = P(X = x) = (1-p)^{x-1}p$
• Mean -> 1/p
• Sd -> $\sigma = \sqrt{1-p}\,/\,p$
• Variance -> $\sigma^2 = (1-p)/p^2$
Negative Binomial Distribution
• Assume Bernoulli trials, as in the geometric distribution example; that is,
• (1) there are two possible outcomes,
• (2) the trials are independent, and
• (3) p, the probability of success, remains the same from trial to trial.
• Let X denote the number of trials until the rth success. Then, the
probability mass function of X is:
• $f(x) = P(X = x) = \binom{x-1}{r-1}(1-p)^{x-r}p^{r}$
• for x = r, r + 1, r + 2, ... In this case, we say that X follows a negative
binomial distribution.
• A geometric distribution is a special case of a negative binomial
distribution with r = 1
Function, Mean and Variance of
Negative Binomial Distribution
• P.m.f -> $f(x) = \binom{x-1}{r-1}(1-p)^{x-r}p^{r}$
• Mean -> r/p
• Sd -> $\sqrt{r(1-p)}\,/\,p$
• Variance -> $r(1-p)/p^2$
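A sketch checking two claims numerically: with r = 1 the negative binomial p.m.f. equals the geometric p.m.f., and the mean is r/p (the infinite sum is truncated at x = 499, where the remaining tail is negligible). The values of p and r are hypothetical.

```python
from math import comb, isclose

def nbinom_pmf(x, r, p):
    """f(x) = C(x-1, r-1) * (1-p)**(x-r) * p**r: the r-th success occurs on trial x."""
    return comb(x - 1, r - 1) * (1 - p) ** (x - r) * p ** r

def geom_pmf(x, p):
    """f(x) = (1-p)**(x-1) * p: the first success occurs on trial x."""
    return (1 - p) ** (x - 1) * p

p = 0.20
# With r = 1 the negative binomial p.m.f. reduces to the geometric p.m.f.
assert all(isclose(nbinom_pmf(x, 1, p), geom_pmf(x, p)) for x in range(1, 50))

r = 3
# Mean computed from the p.m.f., truncating the infinite sum at x = 499
mean = sum(x * nbinom_pmf(x, r, p) for x in range(r, 500))
print(mean, r / p)  # both approximately 15
```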
Poisson Distribution
• Let the discrete random variable X denote the number of times an event occurs in an interval of time (or space). Then X may be a Poisson random variable with x = 0, 1, 2, ...
• Examples:
• Let X equal the number of typos on a printed page. (This is an example of
an interval of space — the space being the printed page.)
• Let X equal the number of cars passing through the intersection of Allen
Street and College Avenue in one minute. (This is an example of an
interval of time — the time being one minute.)
• Let X equal the number of Alaskan salmon caught in a squid driftnet. (This
is again an example of an interval of space — the space being the squid
driftnet.)
• Let X equal the number of customers at an ATM in 10-minute intervals.
• Let X equal the number of students arriving during office hours.
Function, Mean and Variance of
Poisson Distribution
• P.m.f -> $f(x) = \dfrac{e^{-\lambda}\lambda^{x}}{x!}$, for x = 0, 1, 2, ...
• Mean -> λ
• Sd -> √λ
• Variance -> λ
The binomial distribution can be approximated by the Poisson distribution (with λ = np) when n is large and p is small.
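A sketch comparing binomial and Poisson probabilities for a hypothetical case with large n and small p, using λ = np:

```python
from math import comb, exp, factorial

def poisson_pmf(x, lam):
    """f(x) = exp(-lam) * lam**x / x!, for x = 0, 1, 2, ..."""
    return exp(-lam) * lam ** x / factorial(x)

def binom_pmf(x, n, p):
    """f(x) = C(n, x) * p**x * (1 - p)**(n - x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 1000, 0.003  # hypothetical: many trials, small success probability
lam = n * p         # matching Poisson mean, lambda = np = 3
for x in range(7):
    print(x, round(binom_pmf(x, n, p), 4), round(poisson_pmf(x, lam), 4))
```

The two columns agree closely; the approximation degrades as p grows.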
Continuous Probability Distributions
Continuous Distributions
• In this section, as the title suggests, we investigate probability distributions of continuous random variables, that is, random variables whose possible values form an interval containing infinitely many outcomes.
Useful Pre-requisites
• Empirical rule for 68%, 95% and 99.7%.
• It applies when the gathered data are mound- or bell-shaped.
• It tells us what proportion of data points lie within a given number of standard deviations of the mean:
• 68% of the data lie within μ ± 1σ
• 95% of the data lie within μ ± 2σ
• 99.7% of the data lie within μ ± 3σ
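The percentages can be checked against the standard normal distribution with `statistics.NormalDist` (Python 3.8+); for data that are only roughly bell-shaped, the rule is an approximation.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, sd 1
for k in (1, 2, 3):
    # P(mu - k*sigma < X < mu + k*sigma) is the same for every normal distribution
    coverage = z.cdf(k) - z.cdf(-k)
    print(k, round(coverage, 4))  # about 0.6827, 0.9545, 0.9973
```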
Useful Pre-requisites
• Quantiles
• A percentile is a quantile expressed as a percentage: the pth percentile has p% of the data at or below it.
• A quantile expresses the same position as a proportion between 0 and 1.
• The median is the 0.5 quantile (the 50th percentile).
• The 0.25 quantile (the 25th percentile) has one quarter of the data at or below it.
Useful Pre-requisite
• The 25th percentile is also called the first
quartile and is denoted as q1.
• The 50th percentile is also called the second
quartile or median, and is denoted as q2 or m.
• The 75th percentile is also called the third
quartile and is denoted as q3.
• The interquartile range (IQR) is the difference
between the first and third quartiles.
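A sketch computing quartiles and the IQR with `statistics.quantiles` on hypothetical data. Software packages use different interpolation rules for quartiles, so results can differ slightly between tools; `method="inclusive"` is one common choice.

```python
from statistics import median, quantiles

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical sample
q1, q2, q3 = quantiles(data, n=4, method="inclusive")  # the three quartile cut points
iqr = q3 - q1                       # interquartile range
print(q1, q2, q3, iqr)              # 3.0 5.0 7.0 4.0
print(q2 == median(data))           # the second quartile is the median
```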
Useful Pre-requisite
• Five-number summary
• Example: for a random sample of 20 concentrations of calcium carbonate (CaCO3) in milligrams per litre, the summary is:
• Minimum: 127.8
• First quartile: 130.12
• Median: 131.45
• Third quartile: 132.70
• Maximum: 134.8
Useful Pre-requisite
• Skewness and Symmetry
• For a distribution that is skewed left, the bulk of the
data values (including the median) lie to the right of
the mean, and there is a long tail on the left side.
• For a distribution that is skewed right, the bulk of the
data values (including the median) lie to the left of the
mean, and there is a long tail on the right side.
• For a distribution that is symmetric, approximately half
of the data values lie to the left of the mean, and
approximately half of the data values lie to the right of
the mean.
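A sketch of a common rule of thumb on hypothetical data: a long right tail tends to pull the mean above the median, and a long left tail pulls it below. The rule usually holds but is not guaranteed for every distribution.

```python
from statistics import mean, median

# Hypothetical right-skewed data: mostly small values plus a long right tail
right_skewed = [1, 2, 2, 3, 3, 3, 4, 10, 20]
left_skewed = [-x for x in right_skewed]  # mirror image: long left tail

print(mean(right_skewed), median(right_skewed))  # mean above the median
print(mean(left_skewed), median(left_skewed))    # mean below the median
```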
Contd..
[Figure: symmetric, skewed-right and skewed-left distributions]
Probability Density Function
• The probability that X takes on any particular
value x is 0. That is, finding P(X = x) for a
continuous random variable X is not going to
work.
• Instead, we'll need to find the probability
that X falls in some interval (a, b), that is, we'll
need to find P(a < X < b). We'll do that using a
probability density function ("p.d.f.").
Probability Density Function
• A PDF models a continuous RV; probabilities are represented by areas under its curve.
• Its value is never negative anywhere along the curve: f(x) ≥ 0.
• The total area under the curve is 1.
• The area under the curve over an interval gives the probability that the random variable falls in that interval.
• The figure shows the probability that a person's IQ is greater than 100 but less than 115 as the area under the PDF between those values.
[Figure: PDF of IQ with the area between 100 and 115 shaded]
Special Considerations on PDF
• The probability that the random variable takes any specific value is 0, since the area under the curve above a single point (an interval of zero width) is 0.
• The PDF always describes the population measure (NOT the sample measure).
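A sketch of "probability = area under the PDF", assuming IQ ~ N(100, 16²) as in the IQ question later in this module: a midpoint-rule integral of the PDF over (100, 115) is compared with the CDF difference from `statistics.NormalDist`.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

MU, SIGMA = 100, 16  # assumed IQ model, N(100, 16^2)

def normal_pdf(x, mu=MU, sigma=SIGMA):
    """Normal density at x."""
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

def area(a, b, steps=10_000):
    """Midpoint-rule approximation of the area under the PDF between a and b."""
    h = (b - a) / steps
    return sum(normal_pdf(a + (i + 0.5) * h) for i in range(steps)) * h

iq = NormalDist(MU, SIGMA)
print(area(100, 115), iq.cdf(115) - iq.cdf(100))  # both about 0.326
print(area(115, 115))  # an interval of zero width has zero area
```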
Cumulative Distribution Function
• The function F(x) = P(X ≤ x) is called a cumulative probability distribution.
• For a discrete random variable X, F(x) is: $F(x) = \sum_{t \le x} f(t)$
• For a continuous RV X, F(x) is: $F(x) = \int_{-\infty}^{x} f(t)\,dt$
• The summation becomes an integral.
[Figure: CDF and survival function (SF) of a continuous random variable]
Continuous Distributions
• Uniform Distribution
• Beta Distribution
• Normal Distribution
• Exponential Distribution
• Gamma Distribution
• Chi-Squared Distribution
Uniform Distribution
• A continuous random variable X has a uniform distribution, denoted U(a, b), if its p.d.f. is $f(x) = \dfrac{1}{b-a}$ for a ≤ x ≤ b, and 0 otherwise.
P.d.f, mean and variance of a Uniform
distribution
• P.d.f -> $f(x) = \dfrac{1}{b-a}$, for a ≤ x ≤ b
• Mean -> $\dfrac{a+b}{2}$
• Variance -> $\dfrac{(b-a)^2}{12}$
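A sketch checking the uniform mean and variance by simulation for a hypothetical U(2, 8); the seed is fixed only so the run is reproducible.

```python
import random
from statistics import fmean

a, b = 2.0, 8.0                # hypothetical U(2, 8)
height = 1 / (b - a)           # the p.d.f. is constant at 1/(b - a) on [a, b]
exact_mean = (a + b) / 2       # (a + b) / 2 = 5.0
exact_var = (b - a) ** 2 / 12  # (b - a)^2 / 12 = 3.0

rng = random.Random(42)        # fixed seed for reproducibility
draws = [rng.uniform(a, b) for _ in range(100_000)]
sim_mean = fmean(draws)
sim_var = fmean((x - sim_mean) ** 2 for x in draws)
print("pdf height:", height)
print(exact_mean, sim_mean)
print(exact_var, sim_var)
```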
Exponential Distribution
• Suppose X, following an (approximate) Poisson process,
equals the number of customers arriving at a bank in an
interval of length 1.
• If λ, the mean number of customers arriving in an interval
of length 1, is 6, say, then we might observe something like
this:
[Figure: customer arrivals marked on a time line, with w denoting the waiting time]
Exponential Distribution – contd..
• Previously, our focus would have been on the
discrete random variable X, the number of
customers arriving.
• As the picture suggests, however, we could
alternatively be interested in the continuous
random variable W, the waiting time until
the first customer arrives.
p.d.f, mean and variance of
Exponential Distribution
• P.d.f -> $f(w) = \dfrac{1}{\theta} e^{-w/\theta}$ for w ≥ 0, where θ = 1/λ
• Mean -> θ = 1/λ
• Variance -> θ² = 1/λ²
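A simulation sketch of the bank example with λ = 6: `random.expovariate(lambd)` draws exponential waiting times given the rate, and the sample mean and variance should be close to θ = 1/6 and θ² = 1/36.

```python
import random
from statistics import fmean

lam = 6          # mean number of customers per unit interval (from the example)
theta = 1 / lam  # mean waiting time until the first customer

rng = random.Random(0)  # fixed seed for reproducibility
# random.expovariate takes the rate lambda and returns exponential waiting times
waits = [rng.expovariate(lam) for _ in range(100_000)]
m = fmean(waits)
v = fmean((w - m) ** 2 for w in waits)
print(theta, m)       # mean waiting time, about 1/6
print(theta ** 2, v)  # variance, about 1/36
```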
Gamma Distributions
• We learned that in an approximate Poisson process with mean λ, the waiting time X until the first event occurs follows an exponential distribution with mean θ = 1/λ.
• We now let W denote the waiting time until the αth event occurs and find the distribution of W. We could represent the situation as follows:
[Figure: events marked on a time line, with W denoting the waiting time until the αth event]
p.d.f, mean and variance of Gamma
Distribution
• P.d.f -> $f(w) = \dfrac{1}{\Gamma(\alpha)\,\theta^{\alpha}}\, w^{\alpha-1} e^{-w/\theta}$ for w > 0
• Mean -> αθ
• Variance -> αθ²
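A simulation sketch with hypothetical λ = 6 and α = 3: the waiting time until the αth event is the sum of α independent exponential waiting times, and its mean and variance should be close to αθ and αθ². `random.gammavariate(alpha, beta)` takes the scale as `beta`, which plays the role of θ here.

```python
import random
from statistics import fmean

lam, alpha = 6, 3   # hypothetical rate (events per unit time) and event count
theta = 1 / lam     # mean waiting time between events

rng = random.Random(1)  # fixed seed for reproducibility
# Waiting time until the alpha-th event = sum of alpha independent exponential waits
waits = [sum(rng.expovariate(lam) for _ in range(alpha)) for _ in range(100_000)]
m = fmean(waits)
v = fmean((w - m) ** 2 for w in waits)
print(alpha * theta, m)       # mean: alpha * theta = 0.5
print(alpha * theta ** 2, v)  # variance: alpha * theta^2 = 1/12

# Drawing from the gamma distribution directly (scale parameter = theta)
direct = [rng.gammavariate(alpha, theta) for _ in range(100_000)]
print(fmean(direct))          # also close to 0.5
```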
Chi-Square Distribution
• The chi-square distribution is just a special case of the gamma distribution!
• Let X follow a gamma distribution with θ = 2
and α = r/2, where r is a positive integer. Then
we say that X follows a chi-square distribution
with r degrees of freedom, denoted χ2(r) and
read "chi-square-r."
p.d.f, mean and variance of Chi-Square
Distribution
• P.d.f -> $f(x) = \dfrac{1}{\Gamma(r/2)\,2^{r/2}}\, x^{r/2-1} e^{-x/2}$ for x > 0
• Mean -> r
• Variance -> 2r
Normal Distribution
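• One of the most frequently observed distributions in the natural world.
• P.d.f -> $f(x) = \dfrac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\dfrac{(x-\mu)^2}{2\sigma^2}\right)$, for −∞ < x < ∞
• Mean -> μ
• Variance -> σ²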
Properties of Normal Distribution
• All normal curves are bell-shaped.
• All normal curves are symmetric about the mean μ.
• The area under an entire normal curve is 1.
• All normal curves are positive for all x. That is, f(x) > 0
for all x.
• The limit of f(x) as x goes to infinity is 0, and the limit
of f(x) as x goes to negative infinity is 0.
• The height of any normal curve is maximized at x = µ.
• The shape of any normal curve depends on its
mean μ and standard deviation σ.
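A sketch of the normal p.d.f. written out by hand, with hypothetical μ = 100 and σ = 16, checking several of the listed properties on a grid of points:

```python
from math import exp, isclose, pi, sqrt
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """f(x) = 1 / (sigma * sqrt(2*pi)) * exp(-(x - mu)**2 / (2 * sigma**2))."""
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

mu, sigma = 100, 16  # hypothetical parameters
# Agrees with the standard library's implementation
assert isclose(normal_pdf(110, mu, sigma), NormalDist(mu, sigma).pdf(110))
# Symmetric about the mean
assert isclose(normal_pdf(mu - 7, mu, sigma), normal_pdf(mu + 7, mu, sigma))
# Highest at x = mu and positive everywhere on the grid
xs = [mu + k for k in range(-50, 51)]
assert max(xs, key=lambda x: normal_pdf(x, mu, sigma)) == mu
assert all(normal_pdf(x, mu, sigma) > 0 for x in xs)
# Peak height versus a value far in the tail (close to 0)
print(normal_pdf(mu, mu, sigma), normal_pdf(mu + 10 * sigma, mu, sigma))
```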
Standard Normal Distribution
• If X ~ N(μ, σ²), then Z = (X − μ)/σ follows N(0,1).
• This means that Z is a random variable which
follows the N(0,1) distribution, which is called
the standardized (or standard) normal
distribution.
• Now we can use the standard normal N(0,1)
table, typically referred to as the Z-table, to find
the desired probability.
Finding probabilities in normal
distribution?
• Let's look at the following question:
• Let X equal the IQ of a randomly selected American. Assume X ~ N(100, 16²). What is the probability that a randomly selected American has an IQ below 90?
Finding probabilities in normal
distribution?
• The following integral gives the answer to the question:
$P(X < 90) = \int_{-\infty}^{90} \dfrac{1}{16\sqrt{2\pi}} \exp\!\left(-\dfrac{(x-100)^2}{2 \cdot 16^2}\right) dx$
• But there is just one problem.
• The normal p.d.f. cannot be integrated in closed form. That is, no simple expression exists for its antiderivative; we can only approximate the integral using numerical analysis techniques.
• We could look for a normal probability table for a normal distribution with mean μ = 100 and standard deviation σ = 16. But then there would have to be an infinite number of normal probability tables, one for every combination of μ and σ.
Finding probabilities in normal
distribution? - Solution
• The cumulative probabilities have been tabled
for the N(0,1) distribution.
• All we need to do is transform our N(100, 16²) distribution to a N(0,1) distribution.
• Then use the cumulative probability table for
the N(0,1) distribution to calculate our desired
probability.
Transforming X ~ N(μ, σ²) to Z ~ N(0,1)
• If X ~ N(μ, σ²), then $Z = \dfrac{X-\mu}{\sigma}$ follows the N(0,1) distribution, which is called the standardized (or standard) normal distribution.
• Use the standard normal N(0,1) table, typically
referred to as the Z-table, to find the desired
probability.
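The IQ question can be worked through in a few lines with `statistics.NormalDist`, which computes the CDF numerically instead of relying on a printed table:

```python
from statistics import NormalDist

mu, sigma, x = 100, 16, 90
z = (x - mu) / sigma                     # standardize: (90 - 100) / 16 = -0.625
p_table = NormalDist().cdf(z)            # what a Z-table lookup gives, computed numerically
p_direct = NormalDist(mu, sigma).cdf(x)  # same probability without standardizing
print(z, round(p_table, 4), round(p_direct, 4))  # both probabilities are about 0.266
```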
Functions to find probability or X value
• The first and quickest method is to use the commonly memorized standard normal quantiles: the 90th, 95th, 97.5th and 99th percentiles lie at μ + 1.28σ, μ + 1.645σ, μ + 1.96σ and μ + 2.33σ, respectively.
• The second method is to use the Z-table.
• Excel and R formulas are described in detail in the following slides.
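The quoted multipliers are standard normal quantiles and can be reproduced with `NormalDist().inv_cdf`:

```python
from statistics import NormalDist

z = NormalDist()  # standard normal N(0, 1)
for p in (0.90, 0.95, 0.975, 0.99):
    # inv_cdf(p) returns the value z_p with P(Z <= z_p) = p
    print(p, round(z.inv_cdf(p), 3))  # about 1.282, 1.645, 1.96, 2.326
```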
Excel Formula for finding probability
• X value, μ, σ are given.
• NORMDIST(x, μ, σ, cumulative)
• X is the value for which you want the distribution.
• μ is the arithmetic mean of the distribution.
• σ is the standard deviation of the distribution.
• Cumulative :
• If TRUE, returns the c.d.f. (Area to the left) P(X ≤ x)
• if FALSE, it returns the p.d.f.
• 1 - NORMDIST(x, μ, σ, TRUE) returns the area to the right, P(X > x).
Excel Formula for finding x value
• Probability, μ, σ are given.
• NORMINV(probability, μ, σ )
• Probability is a probability corresponding to the normal
distribution.
• μ is the arithmetic mean of the distribution.
• σ is the standard deviation of the distribution.
• For STANDARD NORMAL.
• NORMSDIST(z)
• Z is the value for which you want the distribution.
• NORMSINV(probability)
• Probability is a probability corresponding to the normal
distribution.
R functions for finding probability
• pnorm(q, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)
• Defaults are specified
• q - The x values that we need to find the
probability for.
• We can also calculate the z value directly and use it with the default mean and sd, as below:
• pnorm(z, lower.tail = TRUE, log.p = FALSE)
R functions for finding x value
• qnorm(p, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)
• p: probability (no default), e.g. 0.92 for the 92nd percentile
• mean: mean (default 0)
• sd: standard deviation (default 1)
• lower.tail: TRUE -> p is P(X ≤ x); FALSE -> p is P(X > x)
• log.p: TRUE -> p is given as log(p); FALSE -> p is given as a plain probability
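For readers working in Python, a hypothetical pair of helpers (`pnorm_like` and `qnorm_like` are illustrative names, not a library API) that mirrors the `lower.tail` behaviour of R's `pnorm` and `qnorm` using `statistics.NormalDist`; `log.p` is not implemented.

```python
from statistics import NormalDist

def pnorm_like(q, mean=0.0, sd=1.0, lower_tail=True):
    """Hypothetical analogue of R's pnorm: P(X <= q), or P(X > q) if lower_tail is False."""
    p = NormalDist(mean, sd).cdf(q)
    return p if lower_tail else 1 - p

def qnorm_like(p, mean=0.0, sd=1.0, lower_tail=True):
    """Hypothetical analogue of R's qnorm: x with P(X <= x) = p, or P(X > x) = p."""
    return NormalDist(mean, sd).inv_cdf(p if lower_tail else 1 - p)

print(pnorm_like(90, 100, 16))                    # P(X <= 90), about 0.266
print(pnorm_like(90, 100, 16, lower_tail=False))  # P(X > 90), about 0.734
print(qnorm_like(0.92, 100, 16))                  # 92nd percentile of X
```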
Relationship between normal and Chi-
Square Distribution
• If X is normally distributed with mean μ and variance σ² > 0, then:
• $V = \left(\dfrac{X-\mu}{\sigma}\right)^2 = Z^2$
is distributed as a chi-square random variable with 1 degree of freedom.
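A simulation sketch of this relationship, with hypothetical μ = 100 and σ = 16: the squared standardized values should have mean about 1 and variance about 2, matching χ²(1). It also illustrates the related fact that a sum of r independent squared standard normals follows χ²(r).

```python
import random
from statistics import fmean

mu, sigma = 100, 16     # hypothetical normal parameters
rng = random.Random(7)  # fixed seed for reproducibility

# V = ((X - mu) / sigma)**2 for X ~ N(mu, sigma^2); should behave like chi-square(1)
v1 = [((rng.gauss(mu, sigma) - mu) / sigma) ** 2 for _ in range(100_000)]
m1 = fmean(v1)
var1 = fmean((v - m1) ** 2 for v in v1)
print(m1, var1)  # chi-square(1) has mean 1 and variance 2

# A sum of r independent squared standard normals follows chi-square(r)
r = 4
vr = [sum(rng.gauss(0, 1) ** 2 for _ in range(r)) for _ in range(100_000)]
print(fmean(vr), r)  # chi-square(r) has mean r
```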
Statistical Inference
What is Statistical Inference ?
• Generating conclusions about a population from a noisy sample.
• We try to estimate population quantities from the data available in the form of samples.
• Historical data is one of the most widely available sources of data.
• Survey data is another.
Statistical Inference – The process
• We have sample data.
• Hence we can compute a measure for it, such as the mean, median or mode.
• The sample measure is called the estimator.
• It tries to estimate the corresponding population measure.
• The sample mean is an estimate of the population mean.
• The sample median is an estimate of the population median.
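A simulation sketch of an estimator at work: a hypothetical population of 200,000 normally distributed heights stands in for data we could not measure in full, and a random sample of 500 provides the estimates.

```python
import random
from statistics import fmean, median

rng = random.Random(3)  # fixed seed for reproducibility
# Hypothetical population: 200,000 heights in cm, drawn from N(170, 10^2)
population = [rng.gauss(170, 10) for _ in range(200_000)]
mu = fmean(population)                # population mean (the parameter)

sample = rng.sample(population, 500)  # the data we actually get to see
x_bar = fmean(sample)                 # sample mean (the estimate)

print(round(mu, 2), round(x_bar, 2))
print(round(median(population), 2), round(median(sample), 2))
```

With a sample of 500, the estimates typically land within about a centimetre of the population values; larger samples tighten this further.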
END OF MODULE
~~X~~
Coming up Next = Statistical Inference - Core
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...shambhavirathore45
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceDelhi Call girls
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxolyaivanovalion
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfadriantubila
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 

Kürzlich hochgeladen (20)

BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptx
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 

Statistics-2 : Elements of Inference

  • 1. STATISTICS - 2 Elements of Inference
  • 2. A Small Recap of the Previous Presentation • Descriptive Vs Inferential Statistics • Sample Vs Population • The need for sampling
  • 3. STATISTICS Descriptive • In simple terms, descriptive statistics describes the data that we currently have. • Example: how did a class of students perform? The answer would be something like "the mean mark is 64.5". Inferential • Inferential statistics infers an outcome by looking at only a small portion of the data. • Example: who will win this election? The answer would be something like "a survey of 10,000 people suggests that XYZ's share of the vote is 60-70%, with 95% confidence".
  • 4. Sample Vs Population Sample • A sample is a portion of the population that is readily available or easily attainable. • Example: a survey of, say, a million people drawn from the population. Population • A population is the entire set of data that would ideally be used to compute a statistic. • Example: a census covering the whole population of a country.
  • 5. Why do we sample? • Going around and asking the entire population whom they are going to vote for is impossible. • Measuring the heights of the entire population is not feasible, as many people would die and many would be born before we finished. • A sample is called a “representative sample” when it has the same nature and characteristics as the population.
  • 6. Elements of Inference • Population Vs Sample (Measures) • Probability • Random Variables • Probability Distributions • Statistical Inference – The Concept
  • 7. Population Vs Sample (Measures) Population Measures • Mean = μ • SD = σ • Var = σ² • Variance formula: the average squared distance from the population mean, $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$. Sample Measures • Mean = X̅ • SD = S • Var = S² • Variance formula: a MODIFIED average squared distance from the sample mean, dividing by n − 1 instead of n: $S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{X})^2$. (Explained in a separate simulation.)
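A minimal sketch of the two variance conventions, written in Python with only the standard library (the language and the data values are assumptions for illustration). It computes the average squared distance with divisor N and with divisor n − 1, and checks both against `statistics.pvariance` and `statistics.variance`, which use those same conventions.

```python
import statistics

# Hypothetical data used only for illustration.
data = [4.0, 7.0, 6.0, 3.0, 5.0]

mean = statistics.fmean(data)                       # 5.0
squared_distances = [(x - mean) ** 2 for x in data]

# Population variance: divide the summed squared distances by N.
pop_var = sum(squared_distances) / len(data)

# Sample variance: the "modified" average divides by n - 1 instead of n.
sample_var = sum(squared_distances) / (len(data) - 1)

# The standard library uses the same two conventions.
assert abs(pop_var - statistics.pvariance(data)) < 1e-12
assert abs(sample_var - statistics.variance(data)) < 1e-12

print(pop_var, sample_var)  # 2.0 2.5
```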
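  • 8. Alternative formula for variance • Population variance: $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} x_i^2 - \mu^2$ • Sample variance: $S^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} x_i^2 - n\bar{X}^2\right)$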
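As a quick check that the shortcut (sum-of-squares) forms agree with the definitional ones, here is a small Python sketch on hypothetical data; it is an illustration, not the proof referred to in the original slide.

```python
import statistics

data = [4.0, 7.0, 6.0, 3.0, 5.0]   # hypothetical data
n = len(data)
mean = sum(data) / n
sum_sq = sum(x * x for x in data)  # sum of squared values

# Shortcut (computational) forms of the two variances
pop_var_shortcut = sum_sq / n - mean ** 2
sample_var_shortcut = (sum_sq - n * mean ** 2) / (n - 1)

assert abs(pop_var_shortcut - statistics.pvariance(data)) < 1e-9
assert abs(sample_var_shortcut - statistics.variance(data)) < 1e-9
```

The shortcut form is convenient by hand, but it subtracts two large, nearly equal numbers, so in floating-point code it can lose precision when the values are large relative to their spread.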
  • 10. Probability – Measure of Randomness • Probability can be thought of as a measure of randomness • It is usually applied to a population quantity
  • 11. A Simple Rule for Subsets • Rule: if the occurrence of A implies the occurrence of B (A is a subset of B), then the probability that A occurs is at most the probability that B occurs. • P(A) ≤ P(B). [Figure: Venn diagram with A drawn inside B]
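A small Python illustration of the subset rule, assuming a fair six-sided die (an example chosen here, not from the slides): rolling a 6 implies rolling an even number, so its probability can be no larger.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}   # fair six-sided die (assumed example)
A = {6}                             # event "roll a 6"
B = {2, 4, 6}                       # event "roll an even number"

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(sample_space))

assert A <= B                 # A is a subset of B: if A happens, B happens
assert prob(A) <= prob(B)     # so P(A) <= P(B): 1/6 <= 1/2
```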
  • 12. Probability in Statistics • Why is probability a part of statistics? • Probability calculus helps model randomness, and hence underpins statistical measures. • Mass functions and density functions are the starting point. • Examples are the bell curve (normal distribution) and other similar distributions.
  • 13. Types of Data Quantitative • Quantitative data are called discrete if the sample space contains a finite or countably infinite number of values. • S = {0, 1, 2, ..., 31} – number of students who passed • S = {0, 1, 2, ...} – number of cars crossing in a given hour • Quantitative data are called continuous if the sample space contains an interval or continuous span of real numbers. • S = {h: h ≥ 0 hours} – number of hours spent studying Qualitative • Qualitative data are called categorical if the sample space contains objects that are grouped or categorized based on some qualitative trait. • When there are only two such groups or categories, the data are considered binary. • S = {yes, no} – binary • S = {Male, Female, Other} – categorical
  • 15. Random Variables • Similar to variables in a computer program • A random variable (RV) is a variable that holds the numeric outcome of an experiment • Types: discrete and continuous
  • 16. Discrete random variables • Examples of discrete RVs: die roll, coin toss, etc. • Coin toss – 2 possible values {H, T} • Roll of a die – 6 possible values {1, 2, 3, 4, 5, 6} • Modelling discrete RVs: we associate a probability with each individual outcome. • Web traffic on a given day – a whole-number count on any given day, with no fixed upper bound Continuous random variables • Example of a continuous RV: the number of hours I sleep daily (discrete?). • Let's see why it is continuous. • If you answer 7, I may ask: is it 7 or 7.000001? • Modelling continuous RVs: we associate probabilities with ranges of outcomes
  • 17. Continuous RV Example • Heights of students in a class. • Try to list all the possible values in a set { , , }. • If you get a value of 1.24, can you be sure of it? • It could be 1.245... or 1.242... • So a continuous RV does not take one of a list of defined values; it can take infinitely many values. • That is why we model it using ranges such as [1, 2], [2, 3], [3, 4].
  • 19. Types of Distributions Discrete Probability Distribution • Bernoulli Distribution • Binomial Distribution • Geometric Distribution • Poisson Distribution Continuous Probability Distribution • Uniform Distribution • Normal Distribution • Exponential Distribution • Gamma Distribution • Chi-Squared Distribution
  • 20. Probability mass function (p.m.f.) • The probability that a discrete random variable X takes on a particular value x, that is, P(X = x), is also denoted f(x). • The function f(x) is typically called the probability mass function. • Some authors also call it the • probability function • frequency function • probability density function.
  • 21. PMF of Discrete Random Variables • For a fair coin toss X: f(H) = P(X = H) = ½ • The PMF is a function of a value x of the random variable X that gives the probability associated with that value. • Its values are never negative (f(x) ≥ 0), and they sum to 1 over all values of the RV.
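A minimal Python sketch of a p.m.f. as a lookup table, assuming a fair six-sided die as the random variable. It checks the two defining properties: non-negative values that sum to 1.

```python
from fractions import Fraction

# p.m.f. of a fair six-sided die: f(x) = P(X = x) = 1/6 for x = 1..6
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

assert all(p >= 0 for p in pmf.values())  # probabilities are non-negative
assert sum(pmf.values()) == 1             # and sum to 1 over all values

print(pmf[3])  # 1/6
```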
  • 22. Cumulative Distribution Function • The function F(x) = P(X ≤ x) is called a cumulative probability distribution. • For a discrete random variable X, the cumulative probability distribution F(x) is determined by: $F(x) = \sum_{t \le x} f(t)$
  • 23. PMF Vs CDF • Note that the probability mass function, f(x), of a discrete random variable X is distinguished from its cumulative probability distribution, F(x), by the use of a lowercase f and an uppercase F. • That is, the notation f(3) means P(X = 3), while the notation F(3) means P(X ≤ 3).
  • 24. Survival function • The CDF and the survival function are simply named functions that make calculations easier. • CDF(x) is the probability that the random variable takes a value less than or equal to x. • The survival function is the complement of the CDF. • Survival fn(x) = P(X > x) = 1 − F(x).
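To make the f / F / survival notation concrete, here is a Python sketch for a fair die (an assumed example): the CDF sums the p.m.f. up to x, and the survival function is its complement.

```python
from fractions import Fraction

pmf = {x: Fraction(1, 6) for x in range(1, 7)}  # fair die (assumed example)

def cdf(x):
    """F(x) = P(X <= x): sum the p.m.f. over all values t <= x."""
    return sum((p for t, p in pmf.items() if t <= x), Fraction(0))

def survival(x):
    """S(x) = P(X > x) = 1 - F(x)."""
    return 1 - cdf(x)

print(pmf[3], cdf(3), survival(3))  # 1/6 1/2 1/2
```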
  • 25. Bernoulli Distribution • X = 0 -> Tails • X = 1 -> Heads • $P(x) = (\tfrac{1}{2})^{x} (\tfrac{1}{2})^{1-x}$ for a fair coin • $P(x) = \theta^{x} (1-\theta)^{1-x}$ for a biased coin, where x ∈ {0, 1} • The above is the Bernoulli distribution, which models a coin toss.
  • 26. Bernoulli Distributions • Bernoulli • $f(x) = p^{x} (1-p)^{1-x}$, x ∈ {0, 1} • Mean = p • Variance = p(1 − p) [Figure: p.m.f. with bars of height 1 − p at x = 0 and p at x = 1]
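A Python sketch that evaluates the Bernoulli p.m.f. and recovers its mean and variance directly from the definitions; the value p = 0.3 is a hypothetical choice.

```python
def bernoulli_pmf(x, p):
    """f(x) = p**x * (1 - p)**(1 - x) for x in {0, 1}; 0 elsewhere."""
    if x not in (0, 1):
        return 0.0
    return p ** x * (1 - p) ** (1 - x)

p = 0.3  # hypothetical probability of heads (X = 1)
mean = sum(x * bernoulli_pmf(x, p) for x in (0, 1))
variance = sum((x - mean) ** 2 * bernoulli_pmf(x, p) for x in (0, 1))

assert abs(mean - p) < 1e-12                 # mean = p
assert abs(variance - p * (1 - p)) < 1e-12   # variance = p(1 - p)
```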
  • 27. Binomial Distribution • A discrete random variable X is a binomial random variable if: • An experiment, or trial, is performed in exactly the same way n times. • Each of the n trials has only two possible outcomes. One of the outcomes is called a "success," while the other is called a "failure." Such a trial is called a Bernoulli trial. • The n trials are independent. • The probability of success, denoted p, is the same for each trial. The probability of failure is q = 1 − p. • The random variable X = the number of successes in the n trials.
  • 28. Binomial Distribution • The probability mass function of a binomial random variable X is: • $f(x) = \binom{n}{x} p^{x} (1-p)^{n-x} = \binom{n}{x} p^{x} q^{n-x}$, for x = 0, 1, ..., n • We denote the binomial distribution as b(n, p). That is, we say: • X ~ b(n, p) • where the tilde (~) is read "is distributed as," and n and p are called parameters of the distribution.
  • 29. Function, Mean and Variance of Binomial Distribution • P.m.f -> $f(x) = \binom{n}{x} p^{x} (1-p)^{n-x}$ • Mean -> np • Sd -> $\sigma = \sqrt{np(1-p)}$ • Variance -> $\sigma^2 = np(1-p)$
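A Python sketch that evaluates the binomial p.m.f. with `math.comb` and checks numerically that the mean is np and the variance is np(1 − p). The parameters n = 10 and p = 0.4 are hypothetical.

```python
from math import comb, sqrt

def binom_pmf(x, n, p):
    """f(x) = C(n, x) * p**x * (1 - p)**(n - x), for x = 0..n."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 10, 0.4  # hypothetical parameters
probs = [binom_pmf(x, n, p) for x in range(n + 1)]
mean = sum(x * f for x, f in enumerate(probs))
variance = sum((x - mean) ** 2 * f for x, f in enumerate(probs))

assert abs(sum(probs) - 1) < 1e-12
assert abs(mean - n * p) < 1e-9                 # np = 4.0
assert abs(variance - n * p * (1 - p)) < 1e-9   # np(1 - p) = 2.4
print(round(sqrt(variance), 4))                 # sd = sqrt(2.4)
```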
  • 30. Geometric Distribution • Assume Bernoulli trials, that is, • (1) there are two possible outcomes, • (2) the trials are independent, and • (3) p, the probability of success, remains the same from trial to trial. • Let X denote the number of trials until the first success. Then, the probability mass function of X is: • $f(x) = P(X = x) = (1-p)^{x-1} p$ • for x = 1, 2, ... In this case, we say that X follows a geometric distribution.
  • 31. Geometric Distribution Example • A representative from the National Football League's Marketing Division randomly selects people on a random street in Kansas City, Kansas until he finds a person who attended the last home football game. Let p, the probability that he succeeds in finding such a person, equal 0.20. And, let X denote the number of people he selects until he finds his first success. What is the probability that the marketing representative must select 4 people before he finds one who attended the last home football game?
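A Python sketch of the calculation. It reads "must select 4 people before he finds one" as four failures followed by a success, so the first success is on trial x = 5. That interpretation is an assumption; if the fourth person is meant to be the first success, use x = 4 instead.

```python
def geometric_pmf(x, p):
    """f(x) = (1 - p)**(x - 1) * p: first success on trial x (x = 1, 2, ...)."""
    return (1 - p) ** (x - 1) * p

p = 0.20
# Four failures followed by a success means the first success is on trial 5.
answer = geometric_pmf(5, p)
print(round(answer, 5))  # 0.08192
```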
  • 32. Function, Mean and Variance of Geometric Distribution • P.m.f -> $f(x) = P(X = x) = (1-p)^{x-1} p$ • Mean -> 1/p • Sd -> $\sigma = \sqrt{1-p}/p$ • Variance -> $\sigma^2 = (1-p)/p^2$
  • 33. Negative Binomial Distribution • Assume Bernoulli trials, as in the geometric distribution example, that is, • (1) there are two possible outcomes, • (2) the trials are independent, and • (3) p, the probability of success, remains the same from trial to trial. • Let X denote the number of trials until the rth success. Then, the probability mass function of X is: • $f(x) = P(X = x) = \binom{x-1}{r-1} (1-p)^{x-r} p^{r}$ • for x = r, r + 1, r + 2, ... In this case, we say that X follows a negative binomial distribution. • A geometric distribution is a special case of a negative binomial distribution with r = 1.
  • 34. Function, Mean and Variance of Negative Binomial Distribution • P.m.f -> $f(x) = \binom{x-1}{r-1} (1-p)^{x-r} p^{r}$ • Mean -> r/p • Sd -> $\sqrt{r(1-p)}/p$ • Variance -> $r(1-p)/p^2$
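A Python sketch of the negative binomial p.m.f. with hypothetical parameters p = 0.2 and r = 3. It checks that r = 1 gives back the geometric p.m.f. and that truncated sums reproduce the mean r/p and variance r(1 − p)/p². The truncation point is an assumption; the tail beyond it is negligible for these parameters.

```python
from math import comb

def neg_binom_pmf(x, r, p):
    """f(x) = C(x-1, r-1) * (1-p)**(x-r) * p**r: r-th success on trial x (x >= r)."""
    if x < r:
        return 0.0
    return comb(x - 1, r - 1) * (1 - p) ** (x - r) * p ** r

def geometric_pmf(x, p):
    return (1 - p) ** (x - 1) * p

p, r = 0.2, 3  # hypothetical parameters

# With r = 1 the negative binomial reduces to the geometric distribution.
assert all(abs(neg_binom_pmf(x, 1, p) - geometric_pmf(x, p)) < 1e-15 for x in range(1, 50))

# Truncated sums: the tail beyond x = 1000 is negligible for these parameters.
xs = range(r, 1000)
mean = sum(x * neg_binom_pmf(x, r, p) for x in xs)
variance = sum((x - mean) ** 2 * neg_binom_pmf(x, r, p) for x in xs)
print(mean, variance)  # close to r/p = 15 and r(1-p)/p^2 = 60
```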
  • 35. Poisson Distribution • Let the discrete random variable X denote the number of times an event occurs in an interval of time (or space). Then X may be a Poisson random variable with x = 0, 1, 2, ... • Examples: • Let X equal the number of typos on a printed page. (This is an example of an interval of space, the space being the printed page.) • Let X equal the number of cars passing through the intersection of Allen Street and College Avenue in one minute. (This is an example of an interval of time, the time being one minute.) • Let X equal the number of Alaskan salmon caught in a squid driftnet. (This is again an example of an interval of space, the space being the squid driftnet.) • Let X equal the number of customers at an ATM in 10-minute intervals. • Let X equal the number of students arriving during office hours.
  • 36. Function, Mean and Variance of Poisson Distribution • P.m.f -> $f(x) = \frac{e^{-\lambda} \lambda^{x}}{x!}$, x = 0, 1, 2, ... • Mean -> λ • Sd -> √λ • Variance -> λ • The binomial distribution can be approximated by the Poisson distribution with λ = np when n is large and p is small.
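A Python sketch with hypothetical parameters. It checks that a Poisson(λ = 3) p.m.f. has mean and variance λ, then compares b(n = 1000, p = 0.003) with Poisson(np) to illustrate the large-n, small-p approximation.

```python
from math import comb, exp, factorial

def poisson_pmf(x, lam):
    """f(x) = exp(-lam) * lam**x / x!, for x = 0, 1, 2, ..."""
    return exp(-lam) * lam ** x / factorial(x)

def binom_pmf(x, n, p):
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

lam = 3.0
# Mean and variance should both equal lambda (the tail beyond x = 100 is negligible).
xs = range(0, 100)
mean = sum(x * poisson_pmf(x, lam) for x in xs)
var = sum((x - mean) ** 2 * poisson_pmf(x, lam) for x in xs)
assert abs(mean - lam) < 1e-9 and abs(var - lam) < 1e-9

# Large n, small p: b(n, p) is close to Poisson(lam = n * p).
n, p = 1000, 0.003
max_gap = max(abs(binom_pmf(x, n, p) - poisson_pmf(x, n * p)) for x in range(20))
print(max_gap)  # small
```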
  • 38. Continuous Distributions • In this section, as the title suggests, we investigate probability distributions of continuous random variables, that is, random variables whose possible values form an interval containing infinitely many outcomes.
  • 39. Useful Pre-requisites • The Empirical Rule (68%, 95% and 99.7%) • Applies when the gathered data are mound- or bell-shaped. • It tells us approximately what share of the data points lies within 1, 2 or 3 standard deviations of the mean: • About 68% of the data lie within μ ± 1σ • About 95% of the data lie within μ ± 2σ • About 99.7% of the data lie within μ ± 3σ
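The 68/95/99.7 figures can be recovered from the standard normal CDF. This sketch uses Python's `statistics.NormalDist`. The shares are the same for any μ and σ, because the bounds are measured in standard deviations from the mean.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal N(0, 1)
# Share of the area within k standard deviations of the mean
shares = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}
for k, share in shares.items():
    print(k, round(share, 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```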
  • 40. Useful Pre-requisites • Quantiles • A percentile is a quantile expressed on a 0-100 scale • A quantile expresses the same position as a fraction between 0 and 1 • The median is the 50th percentile (the 0.5 quantile). • The 25th percentile (the 0.25 quantile) has one quarter of the data at or below it
  • 41. Useful Pre-requisite • The 25th percentile is also called the first quartile and is denoted as q1. • The 50th percentile is also called the second quartile or median, and is denoted as q2 or m. • The 75th percentile is also called the third quartile and is denoted as q3. • The interquartile range (IQR) is the difference between the first and third quartiles.
  • 42. Useful Pre-requisite • Five-Number Summary • Suppose we have a random sample of 20 concentrations of calcium carbonate (CaCO3) in milligrams per litre: • Minimum: 127.8 • First quartile: 130.12 • Median: 131.45 • Third quartile: 132.70 • Maximum: 134.8
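The 20 CaCO3 readings are not listed on the slide, so this Python sketch builds a five-number summary and IQR from hypothetical data instead. `statistics.quantiles` uses its default "exclusive" interpolation, and other tools (Excel, R) may interpolate quartiles slightly differently. The last line applies IQR = q3 − q1 to the slide's own summary.

```python
import statistics

# Hypothetical data (the slide's 20 calcium carbonate readings are not listed).
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

q1, q2, q3 = statistics.quantiles(data, n=4)  # default method="exclusive"
five_number = (min(data), q1, q2, q3, max(data))
iqr = q3 - q1
print(five_number, iqr)  # (1, 2.5, 5.0, 7.5, 9) 5.0

# From the slide's own summary: IQR = 132.70 - 130.12 = 2.58 mg/L.
print(round(132.70 - 130.12, 2))  # 2.58
```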
  • 43. Useful Pre-requisite • Skewness and Symmetry • For a distribution that is skewed left, the bulk of the data values (including the median) lie to the right of the mean, and there is a long tail on the left side. • For a distribution that is skewed right, the bulk of the data values (including the median) lie to the left of the mean, and there is a long tail on the right side. • For a distribution that is symmetric, approximately half of the data values lie to the left of the mean, and approximately half of the data values lie to the right of the mean.
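A quick Python illustration on hypothetical right-skewed data. One large value creates a long right tail that pulls the mean to the right, while the median stays with the bulk of the data.

```python
import statistics

# Hypothetical right-skewed data: one large value forms a long right tail.
data = [1, 2, 2, 3, 3, 3, 4, 20]
mean = statistics.mean(data)      # pulled toward the tail
median = statistics.median(data)
print(mean, median)  # 4.75 3.0
assert median < mean   # bulk of the data (and the median) lies left of the mean
```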
  • 45. Probability Density Function • The probability that X takes on any particular value x is 0. That is, finding P(X = x) for a continuous random variable X is not going to work. • Instead, we'll need to find the probability that X falls in some interval (a, b), that is, we'll need to find P(a < X < b). We'll do that using a probability density function ("p.d.f.").
  • 46. Probability Density Function • A PDF models a continuous RV; probabilities are represented by areas under its curve. • Its value is never negative anywhere along the curve (f(x) ≥ 0). • The total area under the curve is 1. • The area under the curve over an interval gives the probability that the random variable falls in that interval. • [Figure: a p.d.f. with the area between IQ 100 and 115 shaded, representing the probability that a person's IQ is greater than 100 but less than 115]
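To show that "probability = area under the p.d.f.", this Python sketch integrates a normal density between 100 and 115 with the trapezoidal rule and compares the result with a CDF difference. It assumes IQ ~ N(100, 16²), borrowing the parameters from the later IQ example; the figure's own parameters are not given.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def area_under(f, a, b, steps=10_000):
    """Trapezoidal approximation of the integral of f from a to b."""
    h = (b - a) / steps
    inner = sum(f(a + i * h) for i in range(1, steps))
    return h * (0.5 * f(a) + inner + 0.5 * f(b))

mu, sigma = 100, 16  # assumption: IQ ~ N(100, 16^2), as in the later IQ example
area = area_under(lambda x: normal_pdf(x, mu, sigma), 100, 115)
exact = NormalDist(mu, sigma).cdf(115) - NormalDist(mu, sigma).cdf(100)
print(round(area, 4), round(exact, 4))
assert abs(area - exact) < 1e-6
```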
  • 47. Special Considerations on PDF • The probability that the random variable takes any specific value is 0, since the area under the curve above a single point (a line of zero width) is 0. • The PDF describes the population (NOT a sample).
  • 48. Cumulative Distribution Function • The function F(x) = P(X ≤ x) is called a cumulative probability distribution. • For a discrete random variable X, F(x) is: $F(x) = \sum_{t \le x} f(t)$ • For a continuous RV X, F(x) is: $F(x) = \int_{-\infty}^{x} f(t)\,dt$ • The summation becomes an integral.
  • 49. Survival function • The CDF and the survival function are simply named functions that make calculations easier. • CDF(x) is the probability that the random variable takes a value less than or equal to x. • The survival function is the complement of the CDF. • Survival fn(x) = P(X > x) = 1 − F(x). [Figure: CDF and survival function curves]
  • 50. Continuous Distributions • Uniform Distribution • Beta Distribution • Normal Distribution • Exponential Distribution • Gamma Distribution • Chi-Squared Distribution
  • 51. Uniform Distribution • A continuous random variable X has a uniform distribution, denoted U(a, b), if its p.d.f. is: • $f(x) = \frac{1}{b-a}$ for a ≤ x ≤ b (and 0 otherwise)
  • 52. P.d.f, mean and variance of a Uniform distribution • P.d.f -> $f(x) = \frac{1}{b-a}$, a ≤ x ≤ b • Mean -> $\frac{a+b}{2}$ • Variance -> $\frac{(b-a)^2}{12}$
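A Python sketch that integrates the uniform p.d.f. numerically (midpoint rule) to confirm the mean (a + b)/2 and variance (b − a)²/12. The endpoints a = 2 and b = 5 are hypothetical.

```python
def uniform_pdf(x, a, b):
    """f(x) = 1/(b - a) on [a, b], 0 elsewhere."""
    return 1 / (b - a) if a <= x <= b else 0.0

def integrate(f, lo, hi, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))

a, b = 2.0, 5.0  # hypothetical endpoints
mean = integrate(lambda x: x * uniform_pdf(x, a, b), a, b)
var = integrate(lambda x: (x - mean) ** 2 * uniform_pdf(x, a, b), a, b)

assert abs(mean - (a + b) / 2) < 1e-9        # (a + b)/2 = 3.5
assert abs(var - (b - a) ** 2 / 12) < 1e-6   # (b - a)^2/12 = 0.75
```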
  • 53. Exponential Distribution • Suppose X, following an (approximate) Poisson process, equals the number of customers arriving at a bank in an interval of length 1. • If λ, the mean number of customers arriving in an interval of length 1, is 6, say, then we might observe something like the following: • [Figure: timeline of customer arrivals, with w marking the waiting time]
  • 54. Exponential Distribution – contd.. • Previously, our focus would have been on the discrete random variable X, the number of customers arriving. • As the picture suggests, however, we could alternatively be interested in the continuous random variable W, the waiting time until the first customer arrives.
  • 55. p.d.f, mean and variance of Exponential Distribution • P.d.f -> $f(w) = \frac{1}{\theta} e^{-w/\theta}$, w ≥ 0, where θ = 1/λ • Mean -> θ • Variance -> θ²
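A simulation sketch in Python using the slide's λ = 6 arrivals per unit interval. `random.expovariate` takes the rate λ, so its draws have mean θ = 1/λ. The seed and the sample size are arbitrary choices.

```python
import random
import statistics

lam = 6.0          # mean number of arrivals per unit interval (the slide's example)
theta = 1 / lam    # mean waiting time theta = 1/lambda

random.seed(0)
# random.expovariate takes the rate lambda, so its mean is 1/lambda.
waits = [random.expovariate(lam) for _ in range(100_000)]

print(statistics.fmean(waits), theta)           # close to 1/6
print(statistics.pvariance(waits), theta ** 2)  # close to 1/36
```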
  • 56. Gamma Distributions • We learned that in an approximate Poisson process with mean λ, the waiting time X until the first event occurs follows an exponential distribution with mean θ = 1/λ. • We now let W denote the waiting time until the αth event occurs and find the distribution of W. • [Figure: timeline of events, with W marking the waiting time until the αth event]
  • 57. p.d.f, mean and variance of Gamma Distribution • P.d.f -> $f(w) = \frac{1}{\Gamma(\alpha)\,\theta^{\alpha}} w^{\alpha-1} e^{-w/\theta}$, w > 0 • Mean -> αθ • Variance -> αθ²
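A Python sketch, assuming a rate λ = 6 and α = 3 (hypothetical values). It checks that the gamma p.d.f. integrates to 1, then simulates W as the sum of α independent exponential waits and compares the sample mean and variance with αθ and αθ².

```python
import math
import random
import statistics

def gamma_pdf(w, alpha, theta):
    """f(w) = w**(alpha - 1) * exp(-w/theta) / (Gamma(alpha) * theta**alpha), w > 0."""
    return w ** (alpha - 1) * math.exp(-w / theta) / (math.gamma(alpha) * theta ** alpha)

lam, alpha = 6.0, 3   # hypothetical: rate 6 per unit time, wait for the 3rd event
theta = 1 / lam       # mean of each exponential wait

# The p.d.f. should integrate to 1 (midpoint rule on (0, 10]; the tail beyond is negligible).
steps = 100_000
h = 10 / steps
total_area = h * sum(gamma_pdf((i + 0.5) * h, alpha, theta) for i in range(steps))

random.seed(1)
# Waiting time until the alpha-th event = sum of alpha independent exponential waits.
waits = [sum(random.expovariate(lam) for _ in range(alpha)) for _ in range(50_000)]
sim_mean = statistics.fmean(waits)      # should be near alpha * theta = 0.5
sim_var = statistics.pvariance(waits)   # should be near alpha * theta**2, about 0.0833

print(round(total_area, 6), sim_mean, sim_var)
```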
  • 58. Chi-Square Distribution • The chi-square distribution is just a special case of the gamma distribution! • Let X follow a gamma distribution with θ = 2 and α = r/2, where r is a positive integer. Then we say that X follows a chi-square distribution with r degrees of freedom, denoted χ2(r) and read "chi-square-r."
  • 59. p.d.f, mean and variance of Chi-Square Distribution • P.d.f -> $f(x) = \frac{1}{\Gamma(r/2)\,2^{r/2}} x^{r/2-1} e^{-x/2}$, x > 0 • Mean -> r • Variance -> 2r
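A Python sketch showing that the chi-square(r) p.d.f. coincides with the gamma p.d.f. at α = r/2 and θ = 2, so its mean αθ = r and variance αθ² = 2r follow from the gamma formulas. The value r = 5 is a hypothetical choice.

```python
import math

def gamma_pdf(x, alpha, theta):
    return x ** (alpha - 1) * math.exp(-x / theta) / (math.gamma(alpha) * theta ** alpha)

def chi2_pdf(x, r):
    """f(x) = x**(r/2 - 1) * exp(-x/2) / (Gamma(r/2) * 2**(r/2)), x > 0."""
    return x ** (r / 2 - 1) * math.exp(-x / 2) / (math.gamma(r / 2) * 2 ** (r / 2))

r = 5  # hypothetical degrees of freedom
# chi-square(r) is the gamma distribution with alpha = r/2 and theta = 2 ...
for x in (0.5, 1.0, 3.0, 7.5):
    assert math.isclose(chi2_pdf(x, r), gamma_pdf(x, r / 2, 2))

# ... so its mean is alpha*theta = r and its variance is alpha*theta^2 = 2r.
alpha, theta = r / 2, 2
mean_chi2, var_chi2 = alpha * theta, alpha * theta ** 2
print(mean_chi2, var_chi2)  # 5.0 10.0
```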
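  • 60. Normal Distribution • One of the most commonly observed distributions in the natural world. • P.d.f -> $f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$, −∞ < x < ∞ • Mean -> μ • Variance -> σ²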
  • 61. Properties of Normal Distribution • All normal curves are bell-shaped. • All normal curves are symmetric about the mean μ. • The area under an entire normal curve is 1. • All normal curves are positive for all x. That is, f(x) > 0 for all x. • The limit of f(x) as x goes to infinity is 0, and the limit of f(x) as x goes to negative infinity is 0. • The height of any normal curve is maximized at x = µ. • The shape of any normal curve depends on its mean μ and standard deviation σ.
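A Python sketch of the normal p.d.f. formula, checked against `statistics.NormalDist.pdf`. The loop also confirms three of the listed properties: symmetry about μ, a peak at μ, and positivity. The parameters μ = 100 and σ = 16 are hypothetical.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """f(x) = exp(-(x - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi))."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

mu, sigma = 100, 16  # hypothetical parameters
ref = NormalDist(mu, sigma)

for d in (1, 5, 16, 40):
    # matches the standard library's density
    assert math.isclose(normal_pdf(mu + d, mu, sigma), ref.pdf(mu + d))
    # symmetric about mu
    assert math.isclose(normal_pdf(mu + d, mu, sigma), normal_pdf(mu - d, mu, sigma))
    # highest at mu, and positive everywhere checked
    assert normal_pdf(mu, mu, sigma) > normal_pdf(mu + d, mu, sigma) > 0
```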
  • 62. Standard Normal Distribution • If X ~ N(μ, σ²), then: Z = (X-μ)/σ follows N(0,1). • This means that Z is a random variable which follows the N(0,1) distribution, which is called the standardized (or standard) normal distribution. • Now we can use the standard normal N(0,1) table, typically referred to as the Z-table, to find the desired probability.
  • 63. Finding probabilities in normal distribution? • Let's look at the following question: • Let X equal the IQ of a randomly selected American. Assume X ~ N(100, 16²). What is the probability that a randomly selected American has an IQ below 90?
  • 64. Finding probabilities in normal distribution? • The following integral gives us the answer to the question: $P(X < 90) = \int_{-\infty}^{90} \frac{1}{16\sqrt{2\pi}} \exp\left(-\frac{(x-100)^2}{2 \cdot 16^2}\right) dx$ • But there is just one problem • It is not possible to integrate the normal p.d.f. in closed form. That is, no simple expression exists for the antiderivative; we can only approximate the integral using numerical analysis techniques. • So we would need a normal probability table for a normal distribution with mean μ = 100 and standard deviation σ = 16. But then there would have to be an infinite number of normal probability tables, one for each combination of μ and σ.
  • 65. Finding probabilities in normal distribution? - Solution • The cumulative probabilities have been tabled for the N(0,1) distribution. • All we need to do is transform our N(100, 16²) distribution to a N(0,1) distribution • Then use the cumulative probability table for the N(0,1) distribution to calculate our desired probability.
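A Python sketch of the IQ example: standardize x = 90 to z = (90 − 100)/16, then read the standard normal CDF, with `statistics.NormalDist` playing the role of the Z-table. It also computes the same probability directly from N(100, 16²) as a cross-check.

```python
from statistics import NormalDist

mu, sigma, x = 100, 16, 90
z = (x - mu) / sigma                     # standardize: z = -0.625
p_std = NormalDist().cdf(z)              # look up N(0, 1), like a Z-table
p_direct = NormalDist(mu, sigma).cdf(x)  # same probability without standardizing
print(z, round(p_std, 4), round(p_direct, 4))  # both probabilities approximately 0.266
```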
  • 66. Transforming X ~ N(μ, σ²) to Z ~ N(0,1) • If X ~ N(μ, σ²), then: • Z = (X − μ)/σ follows the N(0,1) distribution, which is called the standardized (or standard) normal distribution. • Use the standard normal N(0,1) table, typically referred to as the Z-table, to find the desired probability.
  • 67. Functions to find probability or X value • The first method is to remember the common standard normal values 1.28, 1.645, 1.96 and 2.33: the points μ + 1.28σ, μ + 1.645σ, μ + 1.96σ and μ + 2.33σ have 90%, 95%, 97.5% and 99% of the area to their left, respectively • The second method is to use the Z-table. • Excel and R formulas are described in detail in the following slides.
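These reference values are quantiles of N(0, 1), and `NormalDist().inv_cdf` in Python recovers them. The slide's 1.28 and 2.33 are rounded versions of 1.282 and 2.326.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal
for area_left in (0.90, 0.95, 0.975, 0.99):
    print(area_left, round(z.inv_cdf(area_left), 3))
# 0.9 1.282
# 0.95 1.645
# 0.975 1.96
# 0.99 2.326
```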
  • 68. Excel Formula for finding probability • X value, μ, σ are given. • NORMDIST(x, μ, σ, cumulative) • x is the value for which you want the distribution. • μ is the arithmetic mean of the distribution. • σ is the standard deviation of the distribution. • Cumulative: • If TRUE, it returns the c.d.f. (area to the left), P(X ≤ x) • If FALSE, it returns the p.d.f. • 1 - NORMDIST(x, μ, σ, TRUE) (area to the right) gives P(X > x)
  • 69. Excel Formula for finding x value • Probability, μ, σ are given. • NORMINV(probability, μ, σ ) • Probability is a probability corresponding to the normal distribution. • μ is the arithmetic mean of the distribution. • σ is the standard deviation of the distribution. • For STANDARD NORMAL. • NORMSDIST(z) • Z is the value for which you want the distribution. • NORMSINV(probability) • Probability is a probability corresponding to the normal distribution.
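For readers working outside Excel, this is a rough Python mapping of the four worksheet functions onto `statistics.NormalDist`. The inputs are hypothetical (taken from the IQ example), and the correspondences are noted in comments as an aid, not as an exact re-implementation of Excel.

```python
from statistics import NormalDist

mu, sigma, x = 100, 16, 90   # hypothetical inputs (the IQ example)
dist = NormalDist(mu, sigma)

left_area = dist.cdf(x)              # like NORMDIST(x, mu, sigma, TRUE): P(X <= x)
density = dist.pdf(x)                # like NORMDIST(x, mu, sigma, FALSE): p.d.f. value
right_area = 1 - dist.cdf(x)         # like 1 - NORMDIST(x, mu, sigma, TRUE): P(X > x)
x_back = dist.inv_cdf(left_area)     # like NORMINV(probability, mu, sigma)

std = NormalDist()                   # standard normal N(0, 1)
z_area = std.cdf(-0.625)             # like NORMSDIST(z)
z_back = std.inv_cdf(z_area)         # like NORMSINV(probability)

assert abs(x_back - x) < 1e-6 and abs(z_back + 0.625) < 1e-6
```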
  • 70. R functions for finding probability • pnorm(q, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE) • Defaults are specified • q - the x values that we need to find the probability for. • We can also compute the z value directly and pass it in, as below: • pnorm(z, lower.tail = TRUE, log.p = FALSE)
  • 71. R functions for finding x value • qnorm(p, mean, sd, lower.tail, log.p) • p - probability (no default) (usually given as a proportion, like 0.92 for 92%) • mean - mean (default 0) • sd - standard deviation (default 1) • lower.tail - TRUE (default) -> p is P(X ≤ x); FALSE -> p is P(X > x) • log.p - TRUE -> p is given as log(p); FALSE (default) -> p is given as a plain probability
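To show what `lower.tail` and `log.p` change, here is a Python sketch of two hypothetical helpers named after R's `pnorm` and `qnorm`. They are built on `statistics.NormalDist` and are not R itself. Computing the upper tail as 1 − p is a simplification that loses precision far out in the tail.

```python
import math
from statistics import NormalDist

def pnorm(q, mean=0.0, sd=1.0, lower_tail=True, log_p=False):
    """Hypothetical Python mirror of R's pnorm()."""
    p = NormalDist(mean, sd).cdf(q)
    if not lower_tail:
        p = 1 - p                    # P(X > q)
    return math.log(p) if log_p else p

def qnorm(p, mean=0.0, sd=1.0, lower_tail=True, log_p=False):
    """Hypothetical Python mirror of R's qnorm()."""
    if log_p:
        p = math.exp(p)              # p was supplied as log(p)
    if not lower_tail:
        p = 1 - p                    # p was an upper-tail area
    return NormalDist(mean, sd).inv_cdf(p)

print(round(pnorm(90, 100, 16), 4))                      # P(X <= 90)
print(round(qnorm(0.92, 100, 16), 2))                    # x with 92% of the area to its left
print(round(qnorm(0.08, 100, 16, lower_tail=False), 2))  # same x, upper-tail form
```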
  • 72. Relationship between Normal and Chi-Square Distribution • If X is normally distributed with mean μ and variance σ² > 0, then: • $Z^2 = \left(\frac{X-\mu}{\sigma}\right)^2$ is distributed as a chi-square random variable with 1 degree of freedom.
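A simulation sketch in Python with hypothetical μ = 100 and σ = 16. It squares standardized normal draws and checks that their mean and variance are near 1 and 2, the chi-square(1) values (mean r, variance 2r with r = 1). The seed and sample size are arbitrary.

```python
import random
import statistics

mu, sigma = 100, 16   # hypothetical normal parameters
random.seed(2)

# Square the standardized values: ((X - mu) / sigma)^2 should follow chi-square(1).
squares = [((random.gauss(mu, sigma) - mu) / sigma) ** 2 for _ in range(100_000)]

# chi-square(r) has mean r and variance 2r, so here roughly 1 and 2.
print(statistics.fmean(squares), statistics.pvariance(squares))
```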
  • 74. What is Statistical Inference? • Generating conclusions about a population from a noisy sample • We try to estimate population quantities from the data available in the form of samples • Historical data is one of the most widely available kinds of data • Survey data is another available form
  • 75. Statistical Inference – The process • We have sample data • From it, we compute a measure such as the mean, median or mode. • This sample measure is called an estimator. • It is used to estimate the corresponding population measure. • The sample mean is an estimate of the population mean • The sample median is an estimate of the population median.
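A Python sketch of estimation on a hypothetical, fully known population (normal values around 50). Drawing one sample of 1,000 lets us compare the sample mean and median (the estimates) with the population mean and median (the quantities being estimated). The population, the sample size and the seed are all illustrative assumptions.

```python
import random
import statistics

random.seed(3)
# Hypothetical population: 100,000 values centred near 50.
population = [random.gauss(50, 10) for _ in range(100_000)]
sample = random.sample(population, 1_000)   # the data we actually observe

print(statistics.fmean(population), statistics.fmean(sample))    # parameter vs. estimate
print(statistics.median(population), statistics.median(sample))
```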
  • 76. END OF MODULE ~~X~~ Coming up next: Statistical Inference - Core