2. Probability
• Chance of observing a particular outcome
• Likelihood of an event
• Assumes a “stochastic” or “random” process, i.e.,
the outcome is not predetermined; there is an
element of chance
3. • Probability theory developed from the study of
games of chance like dice and cards.
• Processes like flipping a coin, rolling a die, or
drawing a card from a deck are probability
experiments.
4. Why Probability in Statistics?
• Results are not certain
• To evaluate how accurate our results are:
– Given how our data were collected, are our
results accurate?
– Given the level of accuracy needed, how many
observations need to be collected?
5. • Probability theory is a foundation for
statistical inference.
• It allows us to draw conclusions about a
population of patients based on
information obtained from a sample of
patients drawn from that population.
6. More importantly, probability theory is used to
understand:
– Probability distributions: Binomial,
Poisson, and Normal distributions
– Sampling and sampling distributions
– Estimation
– Hypothesis testing
– Advanced statistical analysis
7. Two Categories of Probability
• Objective and Subjective Probabilities.
• Objective probability
1) Classical probability and
2) Relative frequency probability.
8. Classical Probability
• Is based on gambling ideas
• Rolling a die:
– There are 6 possible outcomes: {1, 2, 3, 4, 5, 6}.
• Each is equally likely:
– P(i) = 1/6, i = 1, 2, ..., 6.
P(1) = 1/6
P(2) = 1/6
...
P(6) = 1/6
SUM = 1
9. • Definition: If an event can occur in N mutually
exclusive and equally likely ways, and if m of these
possess a characteristic E, the probability of the
occurrence of E is
P(E) = m/N
• If we toss a die, what is the probability of 4 coming up?
m = 1 (the outcome 4) and N = 6
The probability of 4 coming up is 1/6.
10. • Another “equally likely” setting is the tossing
of a coin –
– There are 2 possible outcomes in the set of all
possible outcomes {H, T}.
P(H) = 0.5
P(T) = 0.5
SUM = 1
11. Relative Frequency Probability
Definition: The probability that something occurs
is the proportion of times it occurs when exactly
the same experiment is repeated a very large
(preferably infinite!) number of times in
independent trials.
• If a process is repeated a large number of times (n), and
if an event with the characteristic E occurs m times, the
relative frequency of E,
Probability of E = P(E) = m/n.
12. • If you toss a coin 100 times and heads comes
up 40 times,
P(H) = 40/100 = 0.4.
• If we toss a coin 10,000 times and heads
comes up 5,562 times,
P(H) = 5,562/10,000 = 0.5562.
• Therefore, the longer the series and the larger the sample size,
the closer the estimate tends to be to the true value.
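The convergence of the relative frequency can be illustrated with a small simulation. The sketch below assumes a fair coin (true P(H) = 0.5) and uses Python's pseudo-random generator in place of physical tosses; the function name and the seed are illustrative choices, not part of the original material.

```python
import random


def relative_frequency_of_heads(n_tosses, seed=0):
    """Simulate n_tosses tosses of a fair coin; return the proportion of heads."""
    rng = random.Random(seed)  # seeded so that repeated runs give the same result
    heads = sum(1 for _ in range(n_tosses) if rng.random() < 0.5)
    return heads / n_tosses


# The estimate m/n tends to settle near 0.5 as the number of tosses grows.
for n in (100, 10_000, 100_000):
    print(n, relative_frequency_of_heads(n))
```

Any single short series can still land far from 0.5; only the long-run tendency is guaranteed.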
13. • Since trials cannot be repeated an infinite
number of times, theoretical probabilities are
often estimated by empirical probabilities based
on a finite amount of data
• Example:
Of 158 people who attended a dinner party, 99
were ill.
P (Illness) = 99/158 = 0.63 = 63%.
14. • In 1998, there were 2,500,000 registered live
births; of these, 200,000 were LBW infants.
• Therefore, the probability that a newborn is LBW
is estimated by
P (LBW) = 200,000/2,500,000
= 0.08
15. Subjective Probability
• Personalistic (represents one’s degree of belief in the
occurrence of an event).
• Personal assessment of which is more effective to
provide cure – traditional/modern
• Personal assessment of which sports team will win a
match.
• Also uses classical and relative frequency methods to
assess the likelihood of an event.
16. • E.g., If someone says that he is 95% certain that
a cure for AIDS will be discovered within 5
years, then he means that:
P(discovery of cure for AIDS within 5 years) = 95% =
0.95
• Although the subjective view of probability has enjoyed
increased attention over the years, it has not been fully accepted by
scientists.
17. Definitions of some terms commonly
encountered in probability
Experiment: In statistics, anything that results in a
count or a measurement is called an experiment.
Sample space: The set of all possible outcomes of
an experiment, for example {H, T} for a coin toss.
Event: Any subset of the sample space, for example {H} or {T}.
18. Mutually Exclusive Events
Two events A and B are mutually exclusive if they
cannot both happen at the same time
P (A and B) = 0
• Example:
– A coin toss cannot produce heads and tails
simultaneously.
– Weight of an individual can’t be classified
simultaneously as “underweight”, “normal”,
“overweight”
19. Independent Events
• Two events A and B are independent if the
probability of the first one happening is the
same no matter how the second one turns
out. Equivalently, the outcome of one event has no effect on the
occurrence or non-occurrence of the other.
P(A ∩ B) = P(A) × P(B) (independent events)
20. Example of independent events
A classic example is n tosses of a coin and the
chances that on each toss it lands heads. These
are independent events.
The chance of heads on any one toss is
independent of the number of previous heads.
No matter how many heads have already been
observed, the chance of heads on the next toss
is ½.
21. Intersection and union
• The intersection of two events A and B, A ∩ B, is the
event that A and B happen simultaneously
P ( A and B ) = P (A ∩ B )
• Let A represent the event that a randomly selected
newborn is LBW, and B the event that he or she is
from a multiple birth
• The intersection of A and B is the event that the infant
is both LBW and from a multiple birth
22. • The union of A and B, A ∪ B, is the event that
either A happens or B happens or they both
happen simultaneously
P(A or B) = P(A ∪ B)
• In the example above, the union of A and B is
the event that the newborn is either LBW or
from a multiple birth, or both
23. Basic Probability Rules
1. Addition rule
If events A and B are mutually exclusive:
P(A or B) = P(A) + P(B)
P(A and B) = 0
More generally:
P(A or B) = P(A) + P(B) - P(A and B)
where P(A or B) is the probability that event A occurs, event B occurs, or they both occur.
24. The additive law, when applied to two mutually
exclusive events, states that the probability of
either of the two events occurring is obtained by
adding the probabilities of each event.
25. Example 1: A thrown die may show a one or a two,
but not both. The probability that it shows a one
or a two is
Pr(1 or 2) = Pr(1) + Pr(2)
= 1/6 + 1/6 = 2/6 = 1/3
26. Extension of the additive law to more than two
events indicates that if A, B, C, ... are mutually
exclusive events, Pr(A or B or C or ...) = Pr(A) +
Pr(B) + Pr(C) + ...
When A and B are not mutually exclusive, Pr(A
or B) = Pr(A) + Pr(B) − Pr(A and B).
27. Example 2:
• Of 200 seniors at a certain college, 98
are women, 34 are majoring in Biology,
and 20 Biology majors are women. If
one student is chosen at random from
the senior class, what is the probability
that the choice will be either a Biology
major or a woman?
28. • Pr(Biology major or woman) = Pr(Biology
major) + Pr(woman) − Pr(Biology major and
woman)
= 34/200 + 98/200 − 20/200 = 112/200 = 0.56
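The arithmetic of the general addition rule in this example can be checked with exact fractions. This is a minimal sketch; the variable names are illustrative and the counts are the ones given in the example.

```python
from fractions import Fraction

total_seniors = 200
p_biology = Fraction(34, total_seniors)  # P(Biology major)
p_woman = Fraction(98, total_seniors)    # P(woman)
p_both = Fraction(20, total_seniors)     # P(Biology major and woman)

# General addition rule: subtract the overlap so it is not counted twice.
p_either = p_biology + p_woman - p_both

print(p_either, float(p_either))  # 14/25 0.56
```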
30. Example
Suppose we toss a coin twice. Because the two tosses are
independent, the probability of two heads occurring is the
product of the individual probabilities, that is
Pr(two heads) = (1/2) × (1/2) = 1/4
31. Conditional Probability
• Refers to the probability of an event, given that
another event is known to have occurred.
• “What happened first is assumed”
• Hint - When thinking about conditional probabilities, think in stages. Think of
the two events A and B occurring chronologically, one after the other, either
in time or space.
32. • The conditional probability that event B
occurs given that event A has already
occurred is denoted P(B|A) and is defined by
P(B|A) = P(A and B) / P(A),
provided that P(A) ≠ 0.
33. Example
• Suppose in country X the chance that an
infant lives to age 25 is .95, whereas the
chance that he lives to age 65 is .65.
• What is the chance that a person 25 years of
age survives to age 65?
34.
Notation   Event                                                 Probability
A          Survive from birth to age 25                          0.95
A and B    Survive from birth to age 25 and from age 25 to 65    0.65
B|A        Survive from age 25 to 65 given survival to age 25    ?
• Then, Pr(B|A) = Pr(A and B) / Pr(A) = 0.65/0.95 = 0.684. That is, a
person aged 25 has a 68.4 percent chance of living to age 65.
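The survival calculation above can be sketched as a small function. The function name is hypothetical; the guard mirrors the requirement that P(A) must not be zero.

```python
def conditional_probability(p_a_and_b, p_a):
    """Return P(B|A) = P(A and B) / P(A); undefined when P(A) = 0."""
    if p_a == 0:
        raise ValueError("P(B|A) is undefined when P(A) = 0")
    return p_a_and_b / p_a


# A: survive from birth to age 25; A and B: survive from birth to age 65.
p_65_given_25 = conditional_probability(p_a_and_b=0.65, p_a=0.95)
print(round(p_65_given_25, 3))  # 0.684
```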
35. Properties of Probability
1. The numerical value of a probability always lies
between 0 and 1.
0 ≤ P(E) ≤ 1
A value of 0 means the event cannot occur
A value of 1 means the event definitely will occur
A value of 0.5 means that the probability that the
event will occur is the same as the probability that it
will not occur.
36. 2. The sum of the probabilities of all mutually
exclusive and exhaustive outcomes is equal to 1.
P(E1) + P(E2) + ... + P(En) = 1.
3. For two mutually exclusive events A and B,
P(A or B ) = P(A) + P(B).
If not mutually exclusive:
P(A or B) = P(A) + P(B) - P(A and B)
4. For two independent events A and B,
P(A and B ) = P(A)*P(B).
37. 5. The complement of an event A, denoted by Ā or
A^c, is the event that A does not occur
• Consists of all the outcomes in which event A
does NOT occur
P(Ā) = P(not A) = 1 – P(A)
• Ā occurs only when A does not occur.
• These are complementary events.
38. • In the LBW example, the complement of A is the
event that a newborn is not LBW
• In other words, Ā is the event that the child
weighs at least 2500 grams at birth
P(Ā) = 1 − P(A)
P(not low bwt) = 1 − P(low bwt)
= 1− 0.08
= 0.92
39. EXERCISE
In a certain area X, the probability of having
hookworm infestation is 0.5 and the probability of
having schistosomiasis is 0.6.
What is the probability of having hookworm or
schistosomiasis?
41. • A probability distribution is a device used to
describe the behavior that a random variable may
have by applying the theory of probability.
• It is the way data are distributed, in order to draw
conclusions about a set of data
• Random Variable = Any quantity or characteristic
that is able to assume a number of different values
such that any particular outcome is determined by
chance
42. • Random variables:
can be either discrete or continuous
• A discrete random variable is able to assume
only a finite or countable number of outcomes
• A continuous random variable can take on any
value in a specified interval
43. Therefore, the probability distribution can be
displayed in the form of a table giving the
values and their associated probabilities
and/or it can be expressed as a mathematical
formula giving the probability of all possible
values.
44. A. Discrete Probability Distributions
• For a discrete random variable, the probability
distribution specifies each of the possible outcomes
of the random variable along with the probability
that each will occur
45. • We represent a potential outcome of the
random variable X by x
0 ≤ P(X = x) ≤ 1
∑ P(X = x) = 1
46. Example 1: The following data shows the number of
diagnostic services a patient receives
47. • What is the probability that a patient receives
exactly 3 diagnostic services?
P(X=3) = 0.031
• What is the probability that a patient receives at
most one diagnostic service?
P (X≤1) = P(X = 0) + P(X = 1)
= 0.671 + 0.229
= 0.900
48. • What is the probability that a patient receives
at least four diagnostic services?
P (X≥4) = P(X = 4) + P(X = 5)
= 0.010 + 0.006
= 0.016
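The three questions above are sums over parts of the probability distribution. The sketch below uses the probabilities quoted in the worked answers; the data table itself is not reproduced in this text, so P(X = 2) = 0.053 is an assumption, inferred from the requirement that the probabilities sum to 1. The helper name `probability_of` is illustrative.

```python
# Probabilities for x = 0, 1, 3, 4, 5 are quoted in the worked answers.
# P(X = 2) = 0.053 is inferred so that the distribution sums to 1 (assumption).
pmf = {0: 0.671, 1: 0.229, 2: 0.053, 3: 0.031, 4: 0.010, 5: 0.006}


def probability_of(pmf, condition):
    """Sum P(X = x) over all values x that satisfy the condition."""
    return sum(p for x, p in pmf.items() if condition(x))


exactly_three = probability_of(pmf, lambda x: x == 3)
at_most_one = probability_of(pmf, lambda x: x <= 1)
at_least_four = probability_of(pmf, lambda x: x >= 4)

print(round(exactly_three, 3), round(at_most_one, 3), round(at_least_four, 3))
# 0.031 0.9 0.016
```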
49. Probability distributions can also
be displayed using a graph
[Figure: bar chart of probability P(X = x), from 0 to 0.8, against no. of diagnostic services, x = 0 to 5]
50. Example 2
Toss a coin 2 times. Let X be the number of
heads obtained. Find the probability distribution
of X.
Pr(X = x), x = 0, 1, 2:
Pr(X = 0) = 1/4 (TT)
Pr(X = 1) = 1/2 (HT, TH)
Pr(X = 2) = 1/4 (HH)
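This distribution can be derived by listing the sample space and counting heads in each outcome. The sketch assumes a fair coin, so that the four outcomes are equally likely; variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space of two tosses: HH, HT, TH, TT (equally likely for a fair coin).
outcomes = list(product("HT", repeat=2))

# Count how many outcomes give each number of heads.
heads_counts = Counter(outcome.count("H") for outcome in outcomes)

# Classical probability: favourable outcomes / total outcomes.
distribution = {
    x: Fraction(count, len(outcomes)) for x, count in sorted(heads_counts.items())
}

for x, p in distribution.items():
    print(f"Pr(X = {x}) = {p}")
```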
52. The Expected Value of a Discrete Random
Variable
The expected value, denoted by E(X) or μ,
represents the “average” value of the random
variable in the long run.
• It is obtained by multiplying each
possible value by its respective probability and
summing over all the values.
53. $E(X) = \mu = \sum_{i=1}^{n} x_i \, P(X = x_i)$
where the $x_i$ are the values the random
variable assumes with positive probability
54. Probability distribution of X (number of heads in two tosses of a coin).
E(X) = μ = 0(1/4) + 1(1/2) + 2(1/4)
= 1
Thus, on average we would expect one head in
every two tosses of a coin.
55. The Variance of a Discrete Random
Variable
The variance represents the spread of all values
that have positive probability relative to the
expected value.
In particular, the variance is obtained by
multiplying the squared distance of each
possible value from the expected value by its
respective probability and summing over all the
values that have positive probability.
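The verbal recipes for the expected value and the variance can be sketched for the two-toss example. The function names are illustrative; the variance line is a direct transcription of "squared distance from the expected value, weighted by probability, summed over all values".

```python
from fractions import Fraction

# Distribution of X = number of heads in two tosses of a fair coin.
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}


def expected_value(pmf):
    """Sum of each value times its probability."""
    return sum(x * p for x, p in pmf.items())


def variance(pmf):
    """Sum of squared distances from the mean, weighted by probability."""
    mu = expected_value(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())


print(expected_value(pmf))  # 1
print(variance(pmf))        # 1/2
```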
57. 1. Binomial Distribution
• Is the distribution followed by the number of successes
in n independent trials when the probability of any
single trial being a success is p.
• It is one of the most widely encountered discrete
probability distributions.
• Consider a dichotomous (binary) random variable, for example:
Patient survives or dies
A specimen is positive or negative
A child is vaccinated or not vaccinated
58. Example:
• We are interested in determining whether a
newborn infant will survive until his/her 70th
birthday
• Let Y represent the survival status of the
child at age 70 years
• Y = 1 if the child survives and Y = 0 if he/she
does not
59. • The outcomes are mutually exclusive and
exhaustive
• Suppose that 72% of infants born survive to age
70 years
P(Y = 1) = p = 0.72
P(Y = 0) = 1 − p = 0.28
61. Binomial assumptions
1. The same experiment is carried out n times (n
trials are made).
2. The result of each trial is independent of the
result of any other trial.
3. Each trial has only two possible
outcomes.
• Usually these outcomes are called “success” and
“failure”.
• If p is the probability of success in one trial, then
1 − p is the probability of failure.
62. • If the binomial assumptions are satisfied, the
probability of r successes in n trials is:
$P(X = r) = \binom{n}{r} p^{r} (1 - p)^{n - r}, \quad r = 0, 1, 2, \ldots, n$
63. • $\binom{n}{r}$ = nCr is the number of ways of choosing
r items from n.
• The general formula for the coefficients is
$\binom{n}{r} = \frac{n!}{r!\,(n - r)!}$
64. • n denotes the number of fixed trials
• r denotes the number of successes in
the n trials
• p denotes the probability of success
• q denotes the probability of failure (1- p)
65. Example:
• Suppose we know that 40% of a certain
population are cigarette smokers. If we take a
random sample of 10 people from this
population, what is the probability that we
will have exactly 4 smokers in our sample?
66. • If the probability that any individual in the
population is a smoker is p = 0.40, then the
probability of r = 4 smokers out of n = 10 subjects
selected is:
P(X = 4) = 10C4 (0.4)^4 (1 − 0.4)^(10−4)
= 10C4 (0.4)^4 (0.6)^6
= 210(0.0256)(0.04666)
= 0.25
• The probability of obtaining exactly 4 smokers in the
sample is about 0.25.
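The smoker calculation can be reproduced with the binomial formula. This is a minimal sketch; `binomial_pmf` is an illustrative name, and `math.comb` supplies the coefficient nCr.

```python
from math import comb


def binomial_pmf(r, n, p):
    """Probability of exactly r successes in n independent trials."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


p_four_smokers = binomial_pmf(r=4, n=10, p=0.4)
print(comb(10, 4))               # 210
print(round(p_four_smokers, 4))  # 0.2508
```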
67. • We can compute the probability of observing zero
smokers out of 10 subjects selected at random, exactly
1 smoker, and so on, and display the results in a table,
as given below.
• The third column, P(X ≤ x), gives the cumulative
probability. E.g., the probability of selecting 3 or fewer
smokers into the sample of 10 subjects is
P(X ≤ 3) = 0.3823, or about 38%.
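A table of this kind can be generated directly from the binomial formula. The sketch below assumes the same n = 10 and p = 0.4 as the smoker example; the column layout is an illustrative choice and may not match the original table.

```python
from itertools import accumulate
from math import comb

n, p = 10, 0.4

# P(X = x) for x = 0..n, then running totals for P(X <= x).
pmf = [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]
cdf = list(accumulate(pmf))

print(" x  P(X = x)  P(X <= x)")
for x, (px, cx) in enumerate(zip(pmf, cdf)):
    print(f"{x:>2}  {px:8.4f}  {cx:9.4f}")
```

The row for x = 3 should show a cumulative probability of about 0.3823, matching the value quoted above.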
69. The probabilities in the above table can be
converted into the following graph
[Figure: bar chart of probability, from 0 to 0.3, against no. of smokers, 0 to 10]
70. • If the true proportion of events of interest is p,
then in a sample of size n the mean of the
binomial distribution is np and the standard
deviation is
$\sqrt{np(1 - p)}$
71. Example:
• 70% of a certain population has been immunized for
polio. If a sample of size 50 is taken, what is the
“expected total number”, in the sample who have
been immunized?
µ = np = 50(.70) = 35
• This tells us that “on the average” we expect to see
35 immunized subjects in a sample of 50 from this
population.
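The mean and standard deviation formulas for the binomial distribution can be applied to this example as follows. The slide computes only the mean; the standard deviation printed here is an additional illustration, and the function name is hypothetical.

```python
from math import sqrt


def binomial_mean_sd(n, p):
    """Return (mean, standard deviation) of a binomial distribution."""
    mean = n * p
    sd = sqrt(n * p * (1 - p))
    return mean, sd


mean, sd = binomial_mean_sd(n=50, p=0.70)
print(round(mean, 2), round(sd, 2))  # 35.0 3.24
```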
72. Exercise
Suppose that in a certain malarious area past
experience indicates that the probability that a
person with a high fever will be positive for
malaria is 0.7. Consider 3 randomly selected
patients (with high fever) in that same area.
73. • What is the probability that no patient will be
positive for malaria?
• What is the probability that exactly one
patient will be positive for malaria?
• What is the probability that exactly two of the
patients will be positive for malaria?
74. • What is the probability that all patients will be
positive for malaria?
• Find the mean and the SD of the probability
distribution given above.
75. 2. The Normal distribution
• The normal distribution (ND) is the most important probability
distribution in statistics.
• Frequently called the “Gaussian distribution”
or bell-shaped curve.
• Variables such as blood pressure, weight,
height, serum cholesterol level, and IQ score
are approximately normally distributed
76. A random variable is said to have a normal
distribution if it has a probability distribution that is
symmetric and bell-shaped
77. The important characteristics of the Normal
Distribution are:
1. It is unimodal, bell-shaped and symmetrical
about x = μ.
2. It is determined by two quantities: its mean
(μ) and SD (σ).
3. The total area under the curve above the x-axis
is 1 square unit.
4. It is a probability distribution of a continuous
variable. It extends from minus infinity (−∞) to
plus infinity (+∞).
78. • We have different normal distributions
depending on the values of μ and σ².
• We cannot tabulate every possible
distribution
• Tabulated normal probability calculations are
available only for the ND with µ = 0 and σ² = 1
79. Standard Normal Distribution
It is a normal distribution that has a mean
equal to 0 and a SD equal to 1, and is denoted
by N(0, 1).
The main idea is to standardize all the data
that is given by using Z-scores.
These Z-scores can then be used to find the
area (and thus the probability) under the
normal curve.
80. • The standard normal distribution has
mean 0 and variance 1
81. • If a random variable X ~ N(μ, σ²), then we can
transform it to the SND with the help of the
Z-transformation
SND = Z score = $\frac{x - \mu}{\sigma}$
• Z represents the Z-score for a given x value
82. Area under any Normal curve
To find the area under a normal curve (with
mean μ and standard deviation σ) between
x=a and x=b, find the Z scores corresponding to
a and b (call them Z1 and Z2) and then find the
area under the standard normal curve between
Z1 and Z2 from the published table.
83. Z-scores are important because, given a Z-value,
we can find the probability of obtaining a
score this large or larger (or this low or lower)
by looking up the value in a Z-table.
84. Example
a) What is the probability that z < -1.96?
(1) Sketch a normal curve
(2) Draw a perpendicular line at z = -1.96
(3) Find the area in the table
(4) The answer is the area to the left of the line: P(z < -1.96)
= 0.0250
85. b) What is the probability that -1.96 < z < 1.96?
The area between the values: P(-1.96 < z <
1.96) = 1 - 0.0250 - 0.0250 = 0.9500
86. c) What is the probability that z > 1.96?
• The answer is the area to the right of the line P(z > 1.96) =
0.0250
N.B. From the symmetry properties of the standard normal
distribution,
P(Z ≤ -x) = P(Z ≥ x)
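The three table look-ups in parts a) to c) can be cross-checked numerically. This sketch uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later) in place of a printed Z-table; variable names are illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

# a) area to the left of z = -1.96
p_left = standard_normal.cdf(-1.96)
# b) area between z = -1.96 and z = 1.96
p_between = standard_normal.cdf(1.96) - standard_normal.cdf(-1.96)
# c) area to the right of z = 1.96 (equal to a) by symmetry)
p_right = 1.0 - standard_normal.cdf(1.96)

print(f"{p_left:.4f} {p_between:.4f} {p_right:.4f}")  # 0.0250 0.9500 0.0250
```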
87. Exercise
1. Compute P(-1 ≤ Z ≤ 1.5)
2. Compute P(-1.66 < Z < 2.85)
3. Find the area under SND for p(0.83 < z < 1.25)
4. Find the area for Z<1.96
88. Example
The height of adult men in the United Kingdom
is approximately normal with mean =
171.5 cm and standard deviation = 6.5 cm.
1. What is the probability that a randomly
selected man is taller than 180 cm?
89. 1. First find the corresponding SND = Z score
SND = Z = (x − μ)/σ
Z = (180 − 171.5)/6.5 = 1.31
P(z > 1.31) = 0.0951
or equivalently 9.51% of adult men are taller than
180 cm.
90. 2. What is the probability that a randomly
selected man is shorter than 160 cm?
z = (160 − 171.5)/6.5 = −1.77
P(z < −1.77) = 0.0384
Thus 3.84% of men are shorter than 160 cm.
91. 3. What is the probability that a randomly
selected man has a height between 165 cm and
175 cm?
SND corresponding to 165 cm:
z = (165 − 171.5)/6.5 = −1
The proportion below this height is 0.1587.
92. SND corresponding to 175 cm:
z = (175 − 171.5)/6.5 = 0.54
• The proportion above this height is 0.2946.
• Proportion of men with height between 165 cm
and 175 cm
= 1 − proportion below 165 cm − proportion
above 175 cm
= 1 − 0.1587 − 0.2946 = 0.5467, or 54.67%.
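The three height questions can be checked without a table. The sketch below uses `statistics.NormalDist` with the mean and SD from the example; because it does not round z to two decimals, its results may differ from the table-based answers in the third or fourth decimal place.

```python
from statistics import NormalDist

# Heights of adult men: mean 171.5 cm, SD 6.5 cm (values from the example).
heights = NormalDist(mu=171.5, sigma=6.5)

z_180 = (180 - 171.5) / 6.5                        # Z-transformation, about 1.31
p_taller_180 = 1.0 - heights.cdf(180)              # table answer: 0.0951
p_shorter_160 = heights.cdf(160)                   # table answer: 0.0384
p_165_to_175 = heights.cdf(175) - heights.cdf(165) # table answer: 0.5467

print(round(z_180, 2))
print(round(p_taller_180, 3), round(p_shorter_160, 3), round(p_165_to_175, 3))
```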
93. Exercise
• The diastolic blood pressures of males 35–44
years of age are normally distributed with µ =
80 mm Hg and σ = 12 mm Hg
• Individuals with a diastolic blood pressure (DBP) above 95 mm Hg are
considered to be hypertensive.
94. a. What is the probability that a randomly selected male
has a DBP above 95 mm Hg?
Ans. P(z>1.25)= 0.1056
Approximately 10.6% of this population would be
classified as hypertensive
b. What is the probability that a randomly selected male
has a DBP above 110 mm Hg?
Ans. P(z>2.50)= 0.0062
Approximately 0.6% of the population has a DBP above
110 mm Hg
95. c. What is the probability that a randomly
selected male has a DBP below 60 mm Hg?
Ans. P (Z < -1.67) = 0.0475
Approximately 4.8% of the population has a
DBP below 60 mm Hg
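The answers to parts a to c can be verified in the same way. This is a sketch using `statistics.NormalDist` with µ = 80 and σ = 12 from the exercise; small differences from the stated answers arise because the table look-ups round z to two decimals.

```python
from statistics import NormalDist

# Diastolic blood pressure of males aged 35-44: mean 80, SD 12 mm Hg.
dbp = NormalDist(mu=80, sigma=12)

p_above_95 = 1.0 - dbp.cdf(95)    # z = 1.25, stated answer 0.1056
p_above_110 = 1.0 - dbp.cdf(110)  # z = 2.50, stated answer 0.0062
p_below_60 = dbp.cdf(60)          # z about -1.67, stated answer 0.0475

for label, value in [("P(DBP > 95)", p_above_95),
                     ("P(DBP > 110)", p_above_110),
                     ("P(DBP < 60)", p_below_60)]:
    print(f"{label} = {value:.4f}")
```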