This is part one of a series of learning sessions designed to explain the basics of statistics used in pharmaceutical companies.
This presentation includes the following topics:
Accuracy and Precision
Tendency of data
Sampling errors and their mitigation
Confidence interval and range
T-test
2. Statistics is the study of the collection, analysis, interpretation,
presentation, and organization of data.
STATISTICS AND ITS USE IN PHARMACEUTICALS
The use of statistics has become
a necessity in the pharmaceutical
industry. It is not only used to
provide the rationale for many
criteria that are already defined,
but also helps to re-examine the system
statistically, in scientific language,
eliminating words such as "maybe" and
"to some extent".
3. The following topics will be covered during this training session:
Basic terminologies and their interpretation
Tendency of data
Sampling errors and their mitigation
Confidence interval and range
T-test
4. Accuracy describes the nearness of a measurement to the
standard or true value, i.e., a highly accurate measuring device
will provide measurements very close to the standard, true or
known values.
Precision is the degree to which
several measurements provide
answers very close to each other.
It is an indicator of the scatter in
the data. The less the scatter,
the higher the precision.
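The difference between the two terms can be illustrated with a minimal sketch. The readings and the reference value below are hypothetical, chosen only for illustration: accuracy is judged by how far the mean reading is from the true value, and precision by the scatter (standard deviation) of the readings.

```python
from statistics import mean, stdev

TRUE_VALUE = 100.0  # hypothetical reference (standard) value

# Hypothetical replicate readings from two instruments (illustrative only)
instrument_a = [100.1, 99.9, 100.0, 100.2, 99.8]    # close to the true value
instrument_b = [102.0, 102.1, 101.9, 102.0, 102.0]  # tightly grouped but offset


def accuracy_error(readings, true_value):
    """Mean reading minus the true value (closer to 0 means more accurate)."""
    return mean(readings) - true_value


def precision_sd(readings):
    """Sample standard deviation of the readings (smaller means more precise)."""
    return stdev(readings)


for name, data in [("A", instrument_a), ("B", instrument_b)]:
    print(name,
          "error:", round(accuracy_error(data, TRUE_VALUE), 3),
          "SD:", round(precision_sd(data), 3))
```

In this made-up data, instrument B is the more precise of the two (smaller scatter) but the less accurate (its mean sits about 2 units above the true value).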
5. Mean
The usual approach to showing the central tendency of a set of
data is to quote the average.
But the average does not always give a true picture of the process. It only
represents the central value of the data, ignoring the spread of the data
and extreme individual values that may be out of
specification.
INDICATOR OF TENDENCY
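A short sketch of why the average alone can mislead. The results and the specification limits are hypothetical: the mean sits exactly on target while two individual results are out of specification.

```python
from statistics import mean

# Hypothetical specification limits (illustrative only)
LOWER_SPEC, UPPER_SPEC = 95.0, 105.0

# Hypothetical individual results
results = [100.0, 94.0, 106.0, 99.0, 101.0]

average = mean(results)

# Individual values that fall outside the specification limits
out_of_spec = [x for x in results if not (LOWER_SPEC <= x <= UPPER_SPEC)]

print("average:", average)
print("out of specification:", out_of_spec)
```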
6. Standard deviation
In statistics, the standard deviation (SD) (represented by the
Greek letter sigma, σ) is a measure that is used to quantify the
amount of variation or dispersion of a set of data values.
INDICATOR OF TENDENCY
A standard deviation close to 0
indicates that the data points tend
to be very close to the mean (also
called the expected value) of the
set, while a high standard deviation
indicates that the data points are
spread out over a wider range of
values.
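As a minimal sketch with hypothetical data, the two sets below share the same mean but have very different standard deviations. Note that `statistics.stdev` computes the sample standard deviation (dividing by n - 1); `statistics.pstdev` would give the population version.

```python
from statistics import mean, stdev

# Two hypothetical data sets with the same mean (illustrative only)
tight = [100.0, 100.1, 99.9, 100.0, 100.0]  # points close to the mean
wide = [100.0, 104.0, 96.0, 102.0, 98.0]    # points spread over a wider range

for name, data in [("tight", tight), ("wide", wide)]:
    print(name, "mean:", round(mean(data), 2), "SD:", round(stdev(data), 3))
```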
7. The coefficient of variation
\[ \text{Coefficient of variation} = \frac{\text{SD}}{\text{Mean}} \]
Expresses variation relative to the magnitude of the data.
INDICATOR OF TENDENCY
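A minimal sketch of the coefficient of variation, using hypothetical data. Both sets have the same SD, but relative to the magnitude of the data the variation is ten times larger in the first set. The helper name `coefficient_of_variation` is chosen here for illustration only.

```python
from statistics import mean, stdev


def coefficient_of_variation(data):
    """SD divided by the mean (multiply by 100 to express it as a percentage)."""
    return stdev(data) / mean(data)


# Hypothetical data sets with identical SD but different magnitude
low_magnitude = [10.0, 11.0, 9.0, 10.0, 10.0]
high_magnitude = [100.0, 101.0, 99.0, 100.0, 100.0]

print("CV (low): ", round(coefficient_of_variation(low_magnitude), 4))
print("CV (high):", round(coefficient_of_variation(high_magnitude), 4))
```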
8. About 68% of values drawn
from a normal distribution are
within one standard deviation
(σ) of the mean; about
95% of the values lie within
two standard deviations; and
about 99.7% are within three
standard deviations. This fact
is known as the 68-95-99.7
(empirical) rule, or the 3-sigma
rule.
NORMAL DISTRIBUTION
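The 68-95-99.7 figures can be checked with a short sketch using the standard normal distribution from Python's standard library. The helper name `fraction_within` is illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)


def fraction_within(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} SD: {fraction_within(k) * 100:.2f}%")
```

The rule applies to data that follow a normal distribution; it should not be assumed for skewed data.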
9. Population
The complete collection of
individuals about whom we wish to
draw some conclusion.
e.g. all the units in a batch, or the
total number of packs produced
Sample
A random selection of individuals
from the population we wish to
study.
POPULATION AND SAMPLE
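Drawing a random sample from a population can be sketched as follows. The population of 10,000 pack serial numbers and the sample size of 50 are hypothetical; the seed is fixed only so that the example is repeatable.

```python
import random

# Hypothetical population: serial numbers of every pack produced in a batch
population = list(range(1, 10001))

rng = random.Random(42)  # fixed seed so the example is repeatable

# Simple random sample without replacement: every pack is equally likely to be chosen
sample = rng.sample(population, k=50)

print("first five sampled packs:", sample[:5])
```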
10. Bias: systematic error
A consistent form of mis-estimation of the mean.
Either most such samples would over-estimate
the value or most would under-estimate it.
Bias arises from flaws in our experimental design.
We can remove the bias by improving our
experimental design.
For example, if we always sample at the start of every
batch, changes that occur with the passage of time
cannot be detected.
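The start-of-batch example can be simulated with a rough sketch. Everything here is an assumption made for illustration: a batch of 1,000 units whose measured value drifts upward steadily during the run, plus a little random noise.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the example is repeatable

N_UNITS = 1000
# Hypothetical batch: value drifts from about 100 to about 110 over the run
batch = [100.0 + 0.01 * i + rng.gauss(0, 0.5) for i in range(N_UNITS)]

population_mean = mean(batch)

# Biased design: always take the first 20 units of the batch
start_mean = mean(batch[:20])

# Improved design: take 20 units at random from the whole batch
random_mean = mean(rng.sample(batch, k=20))

print("true batch mean:      ", round(population_mean, 2))
print("start-of-batch sample:", round(start_mean, 2))
print("random sample:        ", round(random_mean, 2))
```

The start-of-batch sample under-estimates the batch mean every time the simulation is repeated, which is what distinguishes bias from random error.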
11. Random error
Any given sample has an equal chance of under- or over-estimating
the population mean value.
1. Over- and under-estimation are
equally likely.
2. Even the best designed
experiments are subject to random
error.
3. It is impossible to get rid of random
error.
Random error can be reduced by increasing
the sample size and by reducing the
variability within the data.
12. Sample Size
Big Samples : Good
Small Samples: Bad
Variability in the Data
Big SDs : Bad
Small SDs : Good
STANDARD ERROR OF MEAN (SEM)
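The slide does not state the formula, so as a reminder: the standard error of the mean is conventionally calculated as SD divided by the square root of the sample size. A minimal sketch with illustrative numbers shows why big samples and small SDs are "good":

```python
import math


def standard_error(sd, n):
    """Standard error of the mean: SD / sqrt(n)."""
    return sd / math.sqrt(n)


# Illustrative values only
for sd in (1.0, 2.0):
    for n in (4, 16, 64):
        print(f"SD={sd}, n={n}: SEM={standard_error(sd, n):.3f}")
```

Quadrupling the sample size halves the SEM, and halving the SD also halves it.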
13. A mean derived from a sample is unlikely to be a perfect estimate
of the population mean. Since it is not possible to produce a
single reliable value, a commonly used way forward is to quote a
range within which we are reasonably confident the true
population mean lies.
Such a range is referred to as a "confidence interval".
We add and subtract a suitable amount to the point estimate to
define upper and lower limits of an interval. We then state that
the true population mean probably lies somewhere within the
range we have now defined.
CONFIDENCE INTERVAL
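One common way to choose the "suitable amount" is to multiply the standard error of the mean by a critical value of the t distribution. The sketch below assumes that approach. The ten assay results are hypothetical, and 2.262 is the two-sided 95% t value for 9 degrees of freedom taken from standard t tables.

```python
import math
from statistics import mean, stdev

# Two-sided 95% critical value of t for 9 degrees of freedom (from t tables)
T_CRIT_95_DF9 = 2.262

# Hypothetical assay results, n = 10 (illustrative only)
assay = [99.2, 100.4, 100.1, 99.7, 100.6, 99.9, 100.3, 99.5, 100.2, 100.1]


def confidence_interval(data, t_crit):
    """Return (lower, upper) limits: point estimate -/+ t_crit * SEM."""
    point_estimate = mean(data)
    sem = stdev(data) / math.sqrt(len(data))
    half_width = t_crit * sem
    return point_estimate - half_width, point_estimate + half_width


lower, upper = confidence_interval(assay, T_CRIT_95_DF9)
print(f"mean = {mean(assay):.2f}, 95% CI = {lower:.2f} to {upper:.2f}")
```

The critical value must match the sample size: a different n needs a different t value from the table.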
14. The range should be wider
than the SEM, but not so
wide that it becomes
meaningless.
RANGE AND SEM
15. "95%" CONFIDENCE
Imagine a scientist who regularly makes sets of measurements
and then expresses the results as 95 per cent confidence
intervals for the mean. The width of the intervals is calculated
so that 19 out of every 20 will include the true population mean.
In the remaining twentieth case, an unusually unrepresentative
sample generates an interval that fails to include it.
Confidence intervals become wider with
greater SDs, narrower with larger sample
sizes and wider if higher levels of
confidence are required. Most
dramatically, small samples give
horribly wide 95 per cent confidence
intervals.
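Both ideas on this slide can be explored with a rough simulation. All values are assumptions for illustration: a normally distributed population with mean 100 and SD 2, and two-sided 95% t critical values for 4, 9 and 29 degrees of freedom taken from standard t tables.

```python
import math
import random
from statistics import mean, stdev

# Two-sided 95% t critical values, keyed by degrees of freedom (from t tables)
T_CRIT_95 = {4: 2.776, 9: 2.262, 29: 2.045}


def half_width(sd, n):
    """Half-width of the 95% confidence interval for a sample of size n."""
    return T_CRIT_95[n - 1] * sd / math.sqrt(n)


def coverage(n, trials=2000, true_mean=100.0, true_sd=2.0, seed=1):
    """Fraction of simulated 95% intervals that include the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        if abs(mean(sample) - true_mean) <= half_width(stdev(sample), n):
            hits += 1
    return hits / trials


print("fraction of intervals containing the true mean:", coverage(10))
for n in (5, 10, 30):
    print(f"n={n}: half-width for SD=2.0 is {half_width(2.0, n):.2f}")
```

The coverage should come out close to 0.95, that is, roughly 19 intervals in every 20, and the half-width shrinks noticeably as the sample size grows from 5 to 30.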
16. The t-test can be used to determine whether two sets of data are significantly
different from each other or whether the difference is just the result of
sampling error.
STUDENT'S T-TEST
Although it can be calculated
manually, statistical software
is available to calculate the
t-value at a specified confidence
level and the corresponding p-value,
which is used to judge whether the
difference is real or merely the
result of sampling error.
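The manual calculation can be sketched as follows, assuming the two-sample Student's t-test with equal variances (pooled SD). The two sets of results are hypothetical, and 2.306 is the two-sided 95% critical value of t for 8 degrees of freedom from standard t tables.

```python
import math
from statistics import mean, variance

# Two-sided 95% critical value of t for 8 degrees of freedom (from t tables)
T_CRIT_95_DF8 = 2.306

# Hypothetical results from two batches, n = 5 each (illustrative only)
batch_a = [99.8, 100.1, 100.0, 99.9, 100.2]
batch_b = [100.9, 101.1, 101.0, 100.8, 101.2]


def pooled_t_statistic(x, y):
    """Two-sample Student's t statistic, assuming equal variances."""
    n1, n2 = len(x), len(y)
    # Pooled variance: average of the sample variances weighted by degrees of freedom
    pooled_var = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(x) - mean(y)) / standard_error


t_value = pooled_t_statistic(batch_a, batch_b)
significant = abs(t_value) > T_CRIT_95_DF8

print(f"t = {t_value:.2f}, significant at 95% confidence: {significant}")
```

Where SciPy is available, `scipy.stats.ttest_ind` performs the same test and also returns the p-value, which avoids looking up critical values by hand.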