2. Based on the Normal distribution
Probability distribution of a continuous variable
Most important probability distribution in statistical inference
NORMAL : statistical properties of a set of data
Most biomedical variables follow this
It's not a law
Truth: many of these characteristics approximately follow it
No variable is precisely normally distributed
Introduction
3. Can be used to model the distribution of a variable of interest
Allows us to make useful probability statements
Examples: human stature and human intelligence
A probability distribution is a powerful tool for summarizing and describing a set of data
It supports conclusions about a population based on a sample
It gives the relationship between the values of a random variable and the probability
of their occurrence
Expressed as a graph or a formula
Introduction
4. Abraham de Moivre (French) discovered the normal distribution in 1733
Quetelet (Belgian) noticed it in the heights of soldiers
5. Also called the Gaussian distribution, after Carl Friedrich Gauss (German)
Marquis de Laplace (French) proved the central limit theorem in 1810
For a large sample size, the sampling distribution of the mean approximately follows a normal distribution
If the sample studied is large enough, a normal distribution can be assumed for all practical purposes
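The central limit theorem statement can be checked with a small simulation. The sketch below is only an illustration: the exponential population, the sample size of 50, the 2000 repetitions and the fixed seed are all assumptions chosen for the demo, not values from the slides.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable (assumption)

SAMPLE_SIZE = 50    # observations per sample (illustrative choice)
N_SAMPLES = 2000    # number of repeated samples (illustrative choice)

# Population: exponential with mean 1 and SD 1, which is strongly right-skewed.
sample_means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(SAMPLE_SIZE))
    for _ in range(N_SAMPLES)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
expected_sd = 1.0 / SAMPLE_SIZE ** 0.5  # sigma / sqrt(n)

# Share of sample means within 1 SD of their centre; roughly 0.68 if near normal.
within_one_sd = sum(
    abs(m - mean_of_means) <= sd_of_means for m in sample_means
) / N_SAMPLES

print(f"mean of sample means: {mean_of_means:.3f} (population mean 1.0)")
print(f"SD of sample means:   {sd_of_means:.3f} (sigma/sqrt(n) = {expected_sd:.3f})")
print(f"share within 1 SD:    {within_one_sd:.3f}")
```

Even though the population is skewed, the sample means cluster around the population mean with a spread close to sigma / sqrt(n).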
8. The Normal Distribution
[Figure: normal curve f(X) plotted against X, centred at the mean µ with spread σ]
Changing μ shifts the
distribution left or right.
Changing σ increases or
decreases the spread.
The normal curve is not a single curve but a
family of curves, each of which is determined
by its mean and standard deviation.
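A short sketch of the "family of curves" idea, using Python's standard-library `statistics.NormalDist`. The three parameter pairs are illustrative assumptions, not values from the slides.

```python
from statistics import NormalDist

# Three members of the normal family (parameter values are illustrative only).
base = NormalDist(mu=0, sigma=1)
shifted = NormalDist(mu=2, sigma=1)  # same spread, centre moved to the right
wider = NormalDist(mu=0, sigma=2)    # same centre, larger spread

# The peak of each curve sits at its own mean.
peak_base = base.pdf(0)
peak_shifted = shifted.pdf(2)
peak_wider = wider.pdf(0)

print(f"mu=0, sigma=1: peak height {peak_base:.4f} at X=0")
print(f"mu=2, sigma=1: peak height {peak_shifted:.4f} at X=2")
print(f"mu=0, sigma=2: peak height {peak_wider:.4f} at X=0")
```

Changing µ moves the peak without changing its height, while doubling σ halves the peak height because the same total area is spread over a wider range.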
13. Properties Of Normal Curve
Perfectly symmetrical about its mean µ
Has a so-called 'bell-shaped' form
Unimodal & Unskewed
The mean of a distribution is the midpoint of the
curve and mean = median = mode
Two points of inflection
The tails are asymptotic
As the number of observations n → ∞
and the width of the class interval → 0,
the frequency polygon approaches a smooth curve
14. Properties Of Normal Curve
The “area under the curve” is measured in standard
deviations from the mean
Total area between the curve and the x-axis = 1 square unit (it represents
total probability)
Transformed to a standard curve for comparison
Proportion of the area under the curve is the relative
frequency of the z-score
Mean = 0 and SD = 1 , unit normal distribution
15. Properties of the normal curve
General relationships:
±1 SD = about 68.26%
±2 SD = about 95.44%
±3 SD = about 99.72%
[Figure: standard normal curve, z from -5 to 5, with the ±1, ±2 and ±3 SD areas marked]
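The percentages above can be reproduced from the standard normal cumulative distribution. This is a minimal sketch using `statistics.NormalDist`; the helper name `area_within` is made up for the illustration. Doubling four-digit table values (for example 2 × 0.3413) gives the figures quoted on the slide, while the unrounded areas come out one unit higher in the last decimal place.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, SD 1


def area_within(k):
    """Area under the standard normal curve between -k and +k SD."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within ±{k} SD: {area_within(k) * 100:.2f}%")
```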
16. Consider the distribution of a group of runners:
mean = 127.8
SD = 15.5
68-95-99.7 Rule: about 68%, 95% and 99.7% of the data fall within 1, 2 and 3 SD of the mean
17. [Figure: histogram of the runners' weights; x-axis weight from 80 to 160, y-axis percent from 0 to 25; mean 127.8 with 112.3 and 143.3 (±1 SD) marked]
68% of 120 = 0.68 × 120 ≈ 82 runners
In fact, 79 runners fall within ±1 SD (15.5 kg) of the mean.
18. [Figure: the same histogram with 96.8 and 158.8 (±2 SD) marked around the mean 127.8]
95% of 120 = 0.95 × 120 = 114 runners
In fact, 115 runners fall within ±2 SD of the mean.
19. [Figure: the same histogram with 81.3 and 174.3 (±3 SD) marked around the mean 127.8]
99.7% of 120 = 0.997 × 120 ≈ 119.6 runners
In fact, all 120 runners fall within ±3 SD of the mean.
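The interval limits and expected counts in the runner example follow directly from the mean, the SD and the group size. A minimal sketch of that arithmetic, using the rounded 68%, 95% and 99.7% shares from the rule (the variable names are made up for the illustration):

```python
mean, sd, n_runners = 127.8, 15.5, 120

# (number of SDs, share of data expected by the 68-95-99.7 rule)
rule = ((1, 0.68), (2, 0.95), (3, 0.997))

intervals = []
for k, share in rule:
    low, high = mean - k * sd, mean + k * sd
    expected = share * n_runners
    intervals.append((k, low, high, expected))
    print(f"±{k} SD: {low:.1f} to {high:.1f}, expected about {expected:.1f} runners")
```

The observed counts quoted on the slides (79, 115 and 120) are close to these expected values, which is what an approximately normal variable should show.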
20. Standard Scores are expressed in standard deviation units
To compare variables measured on different scales.
There are many kinds of Standard Scores. The most common is
the 'z' score.
A z score shows how far the original score lies above or below the mean of a
normal curve
All normal distributions can be converted into the standard
normal curve by subtracting the mean and dividing by the
standard deviation
The Standard Normal Distribution (Z)
21. Z scores
What is a z-score?
A z score is a raw score expressed in
standard deviation units.
Here is the formula for a z score:
$$z = \frac{X - \bar{X}}{S}$$
22. Comparing X and Z units
[Figure: the same normal curve on two scales: X (µ = 100, σ = 50) and Z (µ = 0, σ = 1); X = 200 corresponds to Z = 2.00]
What we need is a standardized normal curve which can
be used for any normally distributed variable. Such a
curve is called the Standard Normal Curve.
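The conversion between the X scale and the Z scale can be sketched in a few lines. The function names `z_score` and `raw_score` are made up for the illustration; the numbers are the ones from the X and Z comparison (mean 100, SD 50).

```python
def z_score(x, mean, sd):
    """Express a raw score in standard deviation units."""
    return (x - mean) / sd


def raw_score(z, mean, sd):
    """Convert a z score back to the original scale."""
    return mean + z * sd


z_at_200 = z_score(200, mean=100, sd=50)   # 2.0
z_at_mean = z_score(100, mean=100, sd=50)  # 0.0
x_back = raw_score(2.0, mean=100, sd=50)   # 200.0

print(z_at_200, z_at_mean, x_back)
```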
23. Application of Normal Curve Model
Using z scores to compare two raw scores from different
distributions
Can determine relative frequency and probability
Can determine percentile rank
Can determine the proportion of scores between the mean
and a particular score
Can determine the number of people within a particular
range of scores by multiplying the proportion by N
24. Using z scores to compare two raw scores
from different distributions
You score 80/100 on a statistics test and your friend also scores 80/100 on
their test in another section. "Hey, congratulations," your friend says, "we are
both doing equally well in statistics." What do you need to know to decide whether the two
scores are equivalent?
The mean?
What if the mean of both tests was 75?
You also need to know the
standard deviation
What would you say about the two test scores if the S in your
class was 5 and the S in your friend's class was 10?
25. Calculating z scores
What is the z score for your test: raw
score = 80; mean = 75, S = 5?
$$z = \frac{X - \bar{X}}{S} = \frac{80 - 75}{5} = 1$$
What is the z score of your friend's test:
raw score = 80; mean = 75, S = 10?
$$z = \frac{X - \bar{X}}{S} = \frac{80 - 75}{10} = 0.5$$
Who do you think did better on their test? Why do you think this?
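One way to answer the question is to compare the two z scores and, if both classes are assumed to be roughly normal (an assumption, not something the example states), the percentile each z score corresponds to. A minimal sketch with a made-up helper `z_score`:

```python
from statistics import NormalDist


def z_score(raw, mean, sd):
    """Raw score expressed in standard deviation units."""
    return (raw - mean) / sd


z_you = z_score(80, mean=75, sd=5)      # 1.0
z_friend = z_score(80, mean=75, sd=10)  # 0.5

# Percentile ranks, valid only under the normality assumption.
std_normal = NormalDist()
pct_you = std_normal.cdf(z_you) * 100
pct_friend = std_normal.cdf(z_friend) * 100

print(f"you:    z = {z_you:.1f}, about percentile {pct_you:.2f}")
print(f"friend: z = {z_friend:.1f}, about percentile {pct_friend:.2f}")
```

The same raw score of 80 sits further above the mean, in SD units, in the class with the smaller spread.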
27. Area under curve
Procedure:
To find areas, first compute Z scores.
Substitute the score of interest for Xi.
Use the sample mean (X̄) in place of µ and the sample standard deviation (S) in place of σ.
The formula changes a "raw" score (Xi) to a standardized
score (Z):
$$z = \frac{X_i - \bar{X}}{S}$$
28. Finding Probabilities
If a distribution has:
Mean (X̄) = 13
s = 4
What is the probability of randomly selecting a score of
19 or more?
Find the Z score.
For Xi = 19, Z = (19 - 13) / 4 = 1.50.
Find the area to the left of Z = 1.50 in the Z table: 0.9332
Probability is 1 - 0.9332 = 0.0668, or about 0.07
Areas under the curve can also be expressed as probabilities
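The table lookup in this example can be reproduced with the standard normal cumulative distribution. A minimal sketch, assuming the scores are normally distributed as the example does:

```python
from statistics import NormalDist

mean, sd, score = 13, 4, 19

z = (score - mean) / sd           # 1.5
area_below = NormalDist().cdf(z)  # what the Z table reports for Z = 1.50
p_at_least = 1 - area_below       # upper-tail probability

print(f"Z = {z:.2f}")
print(f"area below Z = {area_below:.4f}")
print(f"P(score >= 19) = {p_at_least:.4f}")
```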
30. In Class Example
After an exam, you learn that the mean for the class is 60,
with a standard deviation of 10. Suppose your exam score is
70.
What is your Z-score?
Where, relative to the mean, does your score lie?
What is the probability associated with your score (use Z
table)?
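33. Your Z-score of +1.0 is exactly 1 s.d. above the mean (an area
of 34.13% + 50%). You are at the 84.13th percentile.
[Figure: standard normal curve, z from -5 to 5, with the mean = 60 at z = 0 and your score at Z = +1.0; 50% of the area lies on each side of the mean and 34.13% lies between the mean and 1 s.d. on either side; ±1, ±2 and ±3 s.d. cover 68.26%, 95.44% and 99.72%]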
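With software there is no need to standardize by hand: a normal distribution with the class mean and SD can be queried directly. A minimal sketch for the exam score of 70, assuming the marks are normally distributed:

```python
from statistics import NormalDist

exam = NormalDist(mu=60, sigma=10)  # class mean and SD from the example
score = 70

z = (score - exam.mean) / exam.stdev
percentile = exam.cdf(score) * 100      # area to the left of the score
between_mean_and_score = exam.cdf(score) - exam.cdf(exam.mean)

print(f"Z = {z:+.1f}")
print(f"percentile = {percentile:.2f}")
print(f"area between mean and score = {between_mean_and_score:.4f}")
```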
34. What if your score is 72?
Calculate your Z-score.
What percentage of students have a score below
your score? Above?
What proportion of students lie between your score and the mean?
What percentile are you at?
35. Answer:
Z = 1.2, area = 0.8849 (from the left tail up to Z)
(88.49% of marks are below yours)
The area beyond Z = 1 - 0.8849 = 0.1151
(11.51% of marks are above yours)
Area between the mean and Z = 0.8849 - 0.50 =
0.3849, or about 38%
Your mark is at the 88th percentile!
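The four quantities in this answer can be collected by one small helper. This is a sketch only: the function name `score_summary` and its dictionary keys are made up for the illustration, and the marks are assumed to be normally distributed with mean 60 and SD 10.

```python
from statistics import NormalDist


def score_summary(score, mean, sd):
    """Return z, area below, area above and area between the mean and the score."""
    z = (score - mean) / sd
    below = NormalDist().cdf(z)
    return {
        "z": z,
        "below": below,
        "above": 1 - below,
        "between_mean_and_score": abs(below - 0.5),
    }


summary = score_summary(72, mean=60, sd=10)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```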
37. What if your mark is 55%?
Calculate your Z-score.
What percentage of students have a score below
your score? Above?
What percentile are you at?
38. Answer:
Z = -0.5
The area below Z = 0.3085
(30.85% of the marks are below yours)
Area above Z = 1 - 0.3085 = 0.6915
(69.15% of the marks are above yours)
Your mark is only at the 31st percentile!
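Many Z tables list only positive values of Z. In that case the area below a negative Z can be obtained from the symmetry of the curve. A worked version for this mark, using the class mean of 60 and SD of 10 from the in-class example:

```latex
z = \frac{55 - 60}{10} = -0.5
\qquad
P(Z < -0.5) = 1 - P(Z < 0.5) = 1 - 0.6915 = 0.3085
```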
40. Another Question…
What if you want to know how much better
or worse you did than someone else?
Suppose you have 72% and your classmate has
55%?
How much better is your score?
41. Answer:
Z for 72% = 1.2; area between the mean and Z = 0.8849 - 0.5 = 0.3849 (above the mean)
Z for 55% = -0.5; area below Z = 0.3085 (from the table)
Area above Z = 1 - 0.3085 = 0.6915
Area between Z and the mean = 0.6915 - 0.5 = 0.1915 (below the mean)
Area between Z = 1.2 and Z = -0.5 = 0.3849 +
0.1915 = 0.5764
57.64% of the class scored between your classmate's mark and yours, so you are about 58 percentile points ahead of your classmate.
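The area between two marks is the difference of their cumulative areas, which gives the same result as adding the two pieces on either side of the mean. A minimal sketch, assuming normally distributed marks with mean 60 and SD 10:

```python
from statistics import NormalDist

marks = NormalDist(mu=60, sigma=10)

below_yours = marks.cdf(72)       # area below Z = 1.2
below_classmate = marks.cdf(55)   # area below Z = -0.5
between = below_yours - below_classmate

print(f"area below 72: {below_yours:.4f}")
print(f"area below 55: {below_classmate:.4f}")
print(f"area between:  {between:.4f}")
```

The result is the share of the class that scored between the two marks, which is also the gap between the two percentile ranks.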
42. Probability:
Let’s say your classmate won’t show you the mark….
How can you make an informed guess about what
your neighbour’s mark might be?
What is the probability that your classmate has a mark
between 60% (the mean) and 70% (1 s.d. above the
mean)?
43. Answer:
Calculate Z for 70%......Z = 1.0
In looking at Z table, you see that the area between
the mean and Z is .3413
There is a .34 probability (or 34% chance) that your
classmate has a mark between 60% and 70%.
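Written out as a probability statement, with Φ denoting the area to the left of Z under the standard normal curve (a notation assumed here, not used on the slides):

```latex
P(60 < X < 70) = P(0 < Z < 1) = \Phi(1) - \Phi(0) = 0.8413 - 0.5000 = 0.3413
```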
45. The probability of your classmate having a mark
between 60% and 70% is 0.34:
[Figure: standard normal curve, z from -5 to 5, with the mean = 60 at z = 0 and Z = +1.0 at 70%; the area between them is 34.13%]
46. Mean cholesterol of a sample: 210 mg%, SD = 20 mg%
Cholesterol value is normally distributed in a sample of 1000.
Find the number of persons with values 1) > 210 2) > 260 3) < 250 4) between
210 and 230.
Z1 = (210-210)/20 = 0, area above = 0.5, persons = 1000 × 0.5 = 500
Z2 = (260-210)/20 = 2.5, area below = 0.9938
area above = 1 - 0.9938 = 0.0062
persons = 1000 × 0.0062 = 6.2 (about 6)
Z3 = (250-210)/20 = 2, area below = 0.9772, persons = 1000 × 0.9772 =
977.2 (about 977)
Z4 = (230-210)/20 = 1, area between mean and Z = 0.3413, persons = 1000 × 0.3413 =
341.3 (about 341)
Medical problem
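The four counts in the cholesterol problem can be checked with the cumulative distribution of a normal curve with mean 210 and SD 20. A minimal sketch (variable names are made up for the illustration); each count is the relevant area multiplied by the sample size of 1000:

```python
from statistics import NormalDist

cholesterol = NormalDist(mu=210, sigma=20)  # mg%
n_persons = 1000

above_210 = n_persons * (1 - cholesterol.cdf(210))
above_260 = n_persons * (1 - cholesterol.cdf(260))
below_250 = n_persons * cholesterol.cdf(250)
between_210_230 = n_persons * (cholesterol.cdf(230) - cholesterol.cdf(210))

print(f"> 210:           {above_210:.1f}")
print(f"> 260:           {above_260:.1f}")
print(f"< 250:           {below_250:.1f}")
print(f"between 210-230: {between_210_230:.1f}")
```

In practice the counts would be rounded to whole persons.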
50. Why z-scores?
Transforming scores in order to make comparisons, especially
when using different scales
Gives information about the relative standing of a score in
relation to the characteristics of the sample or population
Location relative to mean measured in standard deviations
Relative frequency and percentile
Gives us information about the location of that score relative
to the “average” deviation of all scores