2. Introduction to Frequency Distributions
• In healthcare, we deal with vast quantities
of clinical data. Since it is very difficult to
look at data in raw form, data are
summarized into frequency distributions.
• A frequency distribution shows the values
that a variable can take and the number of
observations associated with each value.
3. Ungrouped Frequency Distribution
Frequency Distribution for Patient LOS
LOS (days)   Frequency
1            2
2            6
3            6
4            5
5            11
6            6
7            8
8            5
9            3
10           1
11           2
12           3
4. Grouped Frequency Distribution
Frequency Distribution for Patient LOS
Class Interval (days)   Frequency
1-2                     8
3-4                     11
5-6                     17
7-8                     13
9-10                    4
11-12                   5
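The step from the ungrouped to the grouped table can be sketched in Python. Because the raw LOS values are not given, this sketch assumes they can be rebuilt from the ungrouped frequencies, and `class_interval` is a hypothetical helper that assigns each value to a two-day class.

```python
from collections import Counter

# Ungrouped frequency distribution of patient LOS (days -> number of patients).
ungrouped = {1: 2, 2: 6, 3: 6, 4: 5, 5: 11, 6: 6,
             7: 8, 8: 5, 9: 3, 10: 1, 11: 2, 12: 3}

# Assumption: rebuild one LOS value per patient from the frequencies.
raw_los = [los for los, freq in ungrouped.items() for _ in range(freq)]

def class_interval(los, width=2, start=1):
    """Hypothetical helper: label of the class interval (e.g. '3-4') holding los."""
    lower = start + ((los - start) // width) * width
    return f"{lower}-{lower + width - 1}"

# Count the observations that fall into each class interval.
grouped = Counter(class_interval(los) for los in raw_los)

for interval in sorted(grouped, key=lambda s: int(s.split("-")[0])):
    print(interval, grouped[interval])
```

Grouping loses detail (the 11 patients with LOS 5 and the 6 with LOS 6 merge into one class of 17) in exchange for a shorter table.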
5. The Variable
• A variable is a characteristic or property
that may take on different values.
• Height, weight, gender, and third-party
payer are examples of variables.
6. The Variable
Variables can be classified into:
1. A Quantitative variable: measures outcomes that
are expressed numerically. Examples include
patient’s age, LOS, weight, height.
2. A Qualitative variable: consists of outcomes that
cannot be expressed numerically without
modification/coding. Examples include patient
satisfaction (very satisfied, satisfied, neutral,
dissatisfied, very dissatisfied) and evaluation of
departmental performance (poor, average, good).
7. The Variable
Quantitative variables can either be:
i. Discrete variables: can assume only certain values, and there
are usually gaps between the consecutive values. For
example, the number of beds in a hospital can take only
integer values (e.g., 92, 95); thus, there is a “gap” between
possible values. Furthermore, you cannot say that the
hospital has 93.56 beds. Typically, discrete variables result
from counting.
ii. Continuous variables: here, observations can assume any
value within a specific range. Measuring body weight is an
example; it can take decimal values (e.g., 80.7 kg).
Typically, continuous variables result from measuring.
8. Scales/Levels of Measurement
(arranged from the lowest to the highest level of measurement)
1. Nominal scale (used with qualitative variables)
2. Ordinal scale (used with qualitative variables)
3. Scales for metric variables: interval and ratio
(used with quantitative variables)
- The level of measurement of the data often dictates
the calculations that can be done to summarize and
present the data. It also determines the statistical
tests that should be performed on these data.
9. Scales/Levels of Measurement
1. Nominal scale:
• Here, observations of a qualitative variable may only
be classified and counted.
• Measures are organized into categories; there is no
recognition of order within these categories.
Examples of categories on the nominal scale of
measurement are gender, nationality, and
classification of the six colors of M&M’s milk
chocolate candies.
10. Scales/Levels of Measurement
Nominal scale:
Gender   Number
Male     6
Female   4
TOTAL    10
Here, we classified patients by the gender attribute and counted them.
11. Scales/Levels of Measurement
2. Ordinal scale:
• Here, observations of a qualitative variable can be
classified, ranked, and counted.
• Here, categories have an order.
• An example of an ordinal variable is the ordering of
adjectives describing patient satisfaction; numbers
may be assigned to represent the order of the
categories (e.g., a Likert-type scale with five
points from “strongly disagree” to “strongly
agree”).
12. Scales/Levels of Measurement
Ordinal scale:
Degree of Agreement     Frequency
Strongly agree = 5      10
Agree = 4               8
Neutral = 3             3
Disagree = 2            2
Strongly disagree = 1   1
Here, we classified the degree of agreement on a ranked scale and counted the responses.
13. Scales/Levels of Measurement
2. Ordinal scale:
• Here, the order of the numbers (e.g., 1-5) is meaningful,
and the numbers can be treated as weights/scores
that can yield a mean.
• However, there are no equal intervals between successive
categories, so we cannot say that a patient who
gives a score of 3 (neutral) agrees three times as much
with the statement as a patient who gives a score of
1 (strongly disagree). We cannot measure the
magnitude of the difference between categories.
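A short Python sketch using the frequencies from the agreement table (coded 5 = strongly agree to 1 = strongly disagree) computes both summaries. The median uses only the order of the categories. The mean additionally treats the scores as numeric weights, which is the assumption that the unequal intervals call into question.

```python
import statistics

# Frequencies of each agreement score from the ordinal-scale table.
frequencies = {5: 10, 4: 8, 3: 3, 2: 2, 1: 1}

# Expand into one score per respondent.
scores = [score for score, freq in frequencies.items() for _ in range(freq)]

n = len(scores)                        # number of responses
median_score = statistics.median(scores)
mean_score = statistics.mean(scores)   # treats the ranks as weights/scores

print(f"n = {n}, median = {median_score}, mean = {mean_score}")
```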
14. Scales/Levels of Measurement
3. Scales for Metric variables:
Interval scale:
– Here, the intervals between successive values
are equal.
– There is a defined unit of measure, values can
be ranked and there is a meaningful difference
between values; but no true zero point and no
ratios between values (but there are ratios
between intervals).
15. Scales/Levels of Measurement
3. Scales for Metric variables:
Interval scale:
– Time of day is measured on an interval scale: we
cannot say that 10:00 AM is twice 5:00 AM, but
we can say that the interval between 0:00 AM
(midnight) and 10:00 AM is twice as long as the
interval between 0:00 AM and 5:00 AM.
– We cannot say that 0:00 AM means an absence of
time.
16. Scales/Levels of Measurement
3. Scales for Metric variables:
Ratio scale:
– This is the highest level of measurement.
– The intervals between successive values are
equal.
– There is a true zero point, and ratios between
values are meaningful. For example, a weight of
0 kg means no weight at all; on a length scale,
we can say "Mary is twice as tall as Jill" (i.e., the
ratio of two numbers is meaningful).
18. Scales/Levels of Measurement
Ratio scale:
Student   Score (out of 10)
A         8
B         4
C         10
D         3
E         6
Here, we scored each student (counted the right answers for each student).
20. Scales/Levels of Measurement
Ordinal scale:
Success status   Frequency
Excellent        1
Good             2
Fair             2
Here, we classified the students on a ranked scale and counted them.
21. Measures of Central Tendency and Variability
• Two main types of measures are used to
describe frequency distributions: measures
of central tendency and measures of
variability.
• Measures of central tendency focus on the
typical value of a data set, while measures of
variability measure dispersion around the
typical value of a data set.
22. Measures of Central Tendency
• Measures of central tendency summarize the
typical value of a variable.
• There are three major measures of central
tendency:
1. Mode: appropriate for nominal and metric data
2. Median: appropriate for ordinal and metric data
3. Mean: appropriate for metric data
23. Measures of Central Tendency
Mode
– The mode is the simplest measure of central
tendency.
– Mode is defined as the most frequently occurring
observation for metric data or the most
frequently occurring category for nominal data.
– It is the only measure of central tendency that is
appropriate for nominal data, because an average
cannot be taken from data that are placed in
categories (averages only work when a variable
has a unit of measurement).
24. Measures of Central Tendency
Mode
– The mode offers several advantages: for
example, it is not sensitive to extreme values and
it is easy to communicate and explain to others.
– On the other hand, the mode does not use
information from the entire frequency
distribution: it reports only the most frequently
occurring value and ignores all other values in the
distribution (unlike the mean).
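The mode can be found by counting, for nominal and metric data alike. This Python sketch uses the gender table and the ungrouped LOS table from the earlier slides.

```python
from collections import Counter

# Nominal data: gender of the 10 patients in the nominal-scale table.
genders = ["Male"] * 6 + ["Female"] * 4

# Metric data: ungrouped LOS frequency distribution (days -> patients).
los_frequencies = {1: 2, 2: 6, 3: 6, 4: 5, 5: 11, 6: 6,
                   7: 8, 8: 5, 9: 3, 10: 1, 11: 2, 12: 3}

# Mode of nominal data: the most frequently occurring category.
gender_mode, gender_count = Counter(genders).most_common(1)[0]

# Mode of metric data: the value with the highest frequency.
los_mode = max(los_frequencies, key=los_frequencies.get)

print(gender_mode, gender_count)
print(los_mode)
```

If two values tie, this sketch reports only one of them; `statistics.multimode` returns all of them.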
25. Measures of Central Tendency
Median
– When categories of a variable are ordered, the
measure of central tendency should take order into
account. The median does so by finding the value of
the variable that corresponds to the middle case.
– The median is the midpoint of the frequency
distribution (50% of the cases fall above it and
50% fall below it); it is appropriate for both
ordinal and metric data.
– It is usually used with metric data sets that contain
outliers, which can distort the mean.
26. Measures of Central Tendency
Median
– If there is an odd number of observations, the median is
the middle number.
– If there is an even number of observations, the median
is the average of the two middle observations
(i.e., the midpoint between them).
If the two middle observations take on the same value,
the median is that value.
– There are several advantages of using the median:
1. It is relatively easy to obtain,
2. It is not influenced by extreme values.
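The odd/even rule can be written out directly. In this Python sketch, `median` is an illustrative helper (the standard library's `statistics.median` applies the same rule); the odd case uses the five student scores and the even case uses the 58 LOS values implied by the ungrouped LOS table.

```python
def median(values):
    """Illustrative helper: middle value (odd n) or mean of the two middle values (even n)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

# Odd n: student scores from the ratio-scale table.
student_scores = [8, 4, 10, 3, 6]

# Even n: one LOS value per patient, rebuilt from the ungrouped LOS table.
los_frequencies = {1: 2, 2: 6, 3: 6, 4: 5, 5: 11, 6: 6,
                   7: 8, 8: 5, 9: 3, 10: 1, 11: 2, 12: 3}
los_values = [los for los, f in los_frequencies.items() for _ in range(f)]

print(median(student_scores))  # sorted: 3, 4, 6, 8, 10 -> 6
print(median(los_values))      # 29th and 30th values are both 5 -> 5.0
```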
27. Measures of Central Tendency
Median
The following example illustrates one common use
of the median: assume that a faculty has 50
available scholarships and must choose 50 scholars
from the 101 graduates who applied. One way to
choose the best 50 is to arrange the applicants in
ascending order of their scores and choose the 50
applicants ranked above the one with the median
score (the 51st of the 101 scores, i.e., the 50th
percentile score).
28. Measures of Central Tendency
Mean
– Symbolized by X̄ (X-bar), it is the arithmetic average
of the values of the variable; it is appropriate for
metric data.
– The mode and the median can be computed on
metric data, but they do not take full advantage
of the numeric information in the frequency
distribution.
– It is calculated by dividing the sum of the
observed values by the total number of
observations in the distribution.
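Applied to the five student scores from the ratio-scale table, the definition above reduces to one line of Python:

```python
# Student scores (out of 10) from the ratio-scale table.
scores = [8, 4, 10, 3, 6]

# Mean = sum of the observed values / number of observations.
mean = sum(scores) / len(scores)
print(mean)  # 31 / 5 = 6.2
```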
29. Measures of Central Tendency
Mean
– There are two disadvantages associated with the
mean:
• First, the mean can take a fractional value even
when the variable itself can take only integer
values (e.g., a mean of 16.7 days, a value between
16 and 17 days).
• Second, the mean is sensitive to extreme
values (i.e., it is strongly influenced by outliers),
which can produce a misleading value.
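The effect of an outlier can be shown with the five LOS values from the total-LOS table plus one hypothetical patient with a 60-day stay (an assumed value, not from the slides). The mean jumps from 5.4 to 14.5 while the median moves only from 5 to 6.5.

```python
import statistics

# LOS (days) of the five patients in the total-LOS table.
los = [5, 3, 1, 8, 10]

# Hypothetical: one additional patient with an unusually long stay.
los_with_outlier = los + [60]

for label, data in (("original", los), ("with outlier", los_with_outlier)):
    print(label, statistics.mean(data), statistics.median(data))
```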
30. Measures of Central Tendency
Weighted Mean
– Often in the healthcare setting, we have
separate samples (e.g., for different time
intervals) with separate means for each, and
each sample may be of different size.
– The weighted mean takes into account the
difference in the sizes of the samples and is
therefore more precise.
31. Measures of Central Tendency
Weighted Mean
Month   Discharges   ALOS (days)
Jan     974          4.46
Feb     763          5.20
Mar     574          3.21
Average of the means = (4.46 + 5.20 + 3.21) / 3 = 4.29
Weighted mean = [974(4.46) + 763(5.20) + 574(3.21)] / 2311 = 4.39 (more precise)
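The two calculations above can be checked in Python; each month's ALOS is weighted by its number of discharges.

```python
# Monthly discharges and ALOS from the table above.
months = {
    "Jan": (974, 4.46),
    "Feb": (763, 5.20),
    "Mar": (574, 3.21),
}

# Unweighted: every month counts equally, regardless of its size.
average_of_means = sum(alos for _, alos in months.values()) / len(months)

# Weighted: each month's ALOS is weighted by its discharges.
total_discharges = sum(n for n, _ in months.values())
weighted_mean = sum(n * alos for n, alos in months.values()) / total_discharges

print(round(average_of_means, 2), round(weighted_mean, 2))  # 4.29 4.39
```

The weighted result is pulled toward January and February because those months contribute more discharges than March.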
32. Measures of Central Tendency
N.B.
– If a data set has a normal (symmetrical)
distribution, the three measures of central
tendency coincide at the same point
(i.e., mean = median = mode).
33. Measures of Variability
• Measures of variability describe the spread of the
frequency distribution (i.e., how widely the
observations are spread out around the measure
of central tendency).
• The most commonly used measures of spread are
the variance and the standard deviation.
• Measures of spread increase in value with greater
variation on the variable. Measures of spread
equal zero when there is no variation.
34. Measures of Variability
Range
– The simplest measure of spread is the range. It is
simply the difference between the smallest and
the largest values in a frequency distribution:
Range = Xmax – Xmin
– The range is easy to calculate, but it is affected by
extreme values; only the two most extreme
scores affect its value, so it is not sensitive to
other values in the distribution. The range is also
dependent on the sample size: in general, the
larger the sample size, the greater the range.
35. Measures of Variability
Range
– Two frequency distributions may have the same
range, but the observations may differ greatly in
variability. For example, consider the following two
frequency distributions:
Distribution 1
1 2 3 4 5 6 7 8 9 10
Distribution 2
1 1.5 3 3.5 3.7 7 8 8.2 10 10
36. Measures of Variability
Range (continued from previous slide)
– The range for both distributions is 9. But if we
compare the two distributions, we see that there is
more variation in distribution 2 than in distribution 1.
– This is confirmed when the variance for each
distribution is calculated: the variances for
distributions 1 and 2 are about 9.17 and 11.77,
respectively (standard deviations of about 3.03 and 3.43).
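Python's `statistics` module can compute the range, the sample variance (divisor n - 1), and the standard deviation for both distributions from slide 35:

```python
import statistics

dist1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
dist2 = [1, 1.5, 3, 3.5, 3.7, 7, 8, 8.2, 10, 10]

for name, data in (("Distribution 1", dist1), ("Distribution 2", dist2)):
    data_range = max(data) - min(data)     # Range = Xmax - Xmin
    variance = statistics.variance(data)   # sample variance, divides by n - 1
    sd = statistics.stdev(data)            # square root of the variance
    print(f"{name}: range={data_range}, s2={variance:.2f}, s={sd:.2f}")
```

Both ranges equal 9, but the variance and standard deviation are larger for distribution 2.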
37. Measures of Variability
Variance
– The variance (s²) is the average of the squared
deviations from the mean. Each deviation is
calculated by subtracting the mean of the frequency
distribution from a value in the distribution; the
difference is then squared. The squared differences
are summed and divided by (n - 1).
38. Measures of Variability
Variance
$$ s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} $$
The interpretation of the variance is not easy at
the descriptive level because the original units of
measure are squared.
39. Measures of Variability
Standard Deviation
– The square root of the variance is the standard
deviation (s). It measures variability in the same
unit of measurement as the original data.
– The standard deviation is the most widely used
measure of variation in descriptive statistics.
40. Measures of Variability
Standard Deviation
– Example: if the mean age of nursing home
residents is 82 years and the SD is 4.45 years, then,
assuming ages are approximately normally distributed,
about 68% of the residents are between the ages of
77.55 and 86.45 (mean ± 1 SD).
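The 68% figure holds only if ages are roughly normally distributed. This Python sketch, assuming a normal distribution with the slide's mean and SD, computes the mean ± 1 SD band and the share of the distribution that falls inside it.

```python
from statistics import NormalDist

mean_age, sd_age = 82, 4.45

# Mean +/- 1 SD.
lower, upper = mean_age - sd_age, mean_age + sd_age

# Assumption: ages follow a normal distribution with these parameters.
ages = NormalDist(mu=mean_age, sigma=sd_age)
share = ages.cdf(upper) - ages.cdf(lower)

print(round(lower, 2), round(upper, 2), round(share, 4))
```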
41. Percentiles
– Percentiles are measures of ranking.
– If a student in a graduating class has the 85th
percentile rank, 85% of the students in the class
are ranked lower than this student. Thus, the
percentile rank of the student who graduates last
is 0%, because no classmates are ranked below.
In a class of 200, the top student has a percentile
rank of 99.5: 99.5% of the class (199 students)
are ranked lower.
– The student with the median rank has the 50th
percentile rank.
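The percentile-rank definition used on this slide (percentage of classmates ranked lower) can be sketched as follows. `percentile_rank` is a hypothetical helper, and other textbooks use slightly different percentile-rank formulas.

```python
def percentile_rank(position_from_top, class_size):
    """Hypothetical helper: percentage of the class ranked below a student.

    position_from_top is 1 for the top-ranked student and class_size for the last.
    """
    ranked_below = class_size - position_from_top
    return 100 * ranked_below / class_size

print(percentile_rank(1, 200))    # 199 of 200 ranked below -> 99.5
print(percentile_rank(200, 200))  # nobody ranked below -> 0.0
```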
42. Ratios, Proportions, and Rates
– Qualitative nominal variables often have only
two possible categories, such as alive or dead, or
male or female. Variables having only two
possible categories are called dichotomous.
– The frequency measures used with dichotomous
variables are ratios, proportions, and rates.
– The 3 measures are based on the same formula:
$$ \text{Ratio, proportion, rate} = \frac{X}{Y} \times 10^{n} $$
43. Ratios, Proportions, and Rates
Ratios
– In a ratio, the values of a variable, such as sex (x =
female, y = male), may be expressed so that x and y are
completely independent of each other, or x may be
included in y.
– For example, the sex of patients discharged from a
hospital could be compared in either of two ways:
Female/male or x/y
Female/(male + female) or x/(x+y)
– Both expressions are considered ratios (the second type is
called a proportion).
44. Ratios, Proportions, and Rates
Ratios
– For example, suppose that the female
discharges from your hospital during July were
457 and the male discharges were 395, then the
female-to-male ratio would be 457/395, or
1.16:1 (i.e., there were 1.16 female discharges
for every male discharge).
45. Ratios, Proportions, and Rates
Proportions
– A proportion is a particular type of ratio.
– A proportion is a ratio in which x is a portion of
the whole, x+y.
– In the previous example, the proportion of
female discharges during July would be
457/(457 + 395) = 457/852, or 0.54/1.00 (i.e., the
proportion of discharges that were female is 0.54).
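The ratio and the proportion use the same counts and differ only in the denominator, as this Python sketch with the July discharge figures shows:

```python
female, male = 457, 395   # July discharges

ratio = female / male                  # x / y
proportion = female / (female + male)  # x / (x + y)

print(f"female-to-male ratio = {ratio:.2f}:1")  # 1.16:1
print(f"proportion female = {proportion:.2f}")  # 0.54
```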
46. Ratios, Proportions, and Rates
Rates
– In healthcare, rates are often used to
measure an event over time and are
sometimes used as performance
improvement measures.
– The basic formula for a rate is:
$$ \text{Rate} = \frac{\text{No. of cases or events occurring during a given time period}}{\text{No. of cases or population at risk during the same time period}} \times 10^{n} $$
47. Ratios, Proportions, and Rates
Rates
– In inpatient facilities, there are many commonly
computed rates.
– In computing the cesarean section (CS) rate, for example, we count the
number of C-sections performed during a given period
of time; this value is placed in the numerator. The
number of cases or population at risk is the number of
women who delivered during the same time period; this
number is placed in the denominator.
– By convention, inpatient hospital rates are calculated as
a rate per 100 cases and are expressed as a percentage.
48. Rates
• Example: for the month of July, 23 C-sections
were performed; during the same time period,
149 women delivered.
What is the CS rate for the month of July?
The rate would be (23/149) x 100 = 15.4%;
thus, the CS rate for the month of July is 15.4%.
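The general rate formula can be written as a small function; with 10^n = 100 it gives the inpatient convention of a percentage. `rate` is a hypothetical helper name used only for illustration.

```python
def rate(events, population_at_risk, multiplier=100):
    """Hypothetical helper: (events / population at risk) x 10^n; 100 gives a percentage."""
    return events / population_at_risk * multiplier

cs_rate = rate(23, 149)   # July C-sections / July deliveries
print(f"{cs_rate:.1f}%")  # 15.4%
```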
49. Inpatient Census
• Inpatient census refers to the number of hospital
inpatients present at any one time.
• Because the census may change throughout the
day as admissions and discharges occur, in most
facilities the official count takes place at
midnight (12:00 AM). The daily inpatient census
is based on this count.
• The daily inpatient census also includes any patient
who was admitted and discharged on the same day
(e.g., a patient who was admitted at 1 PM and
died at 4 PM on the same day).
50. Sample Daily Inpatient Census Report
May 2
Number of patients in hospital at midnight, May 1                      230
+ Number of patients admitted, May 2                                   +35
- Number of patients discharged, including deaths, May 2               -40
Number of patients in hospital at midnight, May 2                      225
+ Number of patients both admitted and discharged on May 2,
  including deaths                                                      +5
  (these patients are not present at the time of the census, but
  they used hospital services during this day)
Daily inpatient census, May 2                                          230
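The census report is simple bookkeeping; this Python sketch reproduces the two totals from the May 2 figures.

```python
census_prev_midnight = 230              # in hospital at midnight, May 1
admitted = 35                           # admitted May 2
discharged = 40                         # discharged May 2, including deaths
admitted_and_discharged_same_day = 5    # not present at the midnight count

midnight_census = census_prev_midnight + admitted - discharged
daily_inpatient_census = midnight_census + admitted_and_discharged_same_day

print(midnight_census, daily_inpatient_census)  # 225 230
```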
51. Inpatient Bed Occupancy Rate
• The (Inpatient Bed Occupancy Rate) is the
percentage of official beds occupied by hospital
inpatients for a given period of time.
• The numerator is the sum of the daily inpatient
census counts for a certain period, while the
denominator is the total number of bed count days
(the number of beds multiplied by the number of days
in the same period).
• Formula:
$$ \text{Inpatient bed occupancy rate} = \frac{\text{Total daily inpatient census for a given period}}{\text{Total inpatient bed count days for the same period}} \times 100 $$
52. Inpatient Bed Occupancy Rate
• For example, if a hospital with 280 beds had a daily
inpatient census of 200 on May 2, the inpatient bed
occupancy rate would be (200/280) × 100 = 71.4%.
• If the total inpatient census for 7 days is 1,729, the
number of beds (280) is multiplied by the number of
days (7), and the inpatient bed occupancy rate for that
week is [1729/(280 × 7)] × 100 = 88.2%.
• The inpatient bed occupancy rate may exceed 100%; this
may occur during epidemics or disasters, when
hospitals set up extra temporary beds that are not
included in the official bed count.
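Both occupancy examples follow from one formula. `occupancy_rate` is a hypothetical helper in which `days` multiplies the bed count to give bed count days.

```python
def occupancy_rate(total_census, bed_count, days=1):
    """Hypothetical helper: total daily inpatient census / (beds x days) x 100."""
    return total_census / (bed_count * days) * 100

print(round(occupancy_rate(200, 280), 1))           # one day: 71.4
print(round(occupancy_rate(1729, 280, days=7), 1))  # one week: 88.2
```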
53. Bed Turnover Rate
• The bed turnover rate tells us how many times, on
average, each hospital bed changes occupants.
• Formula:
$$ \text{Bed turnover rate} = \frac{\text{Total discharges, including deaths, for a given time period}}{\text{Average bed count for the same period}} $$
Average bed count = sum of the daily number of available beds
divided by the number of days
• For example, if hospital XYZ had 2,060 discharges
(including deaths) during April and the average bed
count was 400, then the bed turnover rate =
2060/400 = 5.15 (i.e., on average each hospital bed
had about 5 occupants during April).
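This Python sketch computes the average bed count and then the turnover rate. The daily bed counts are hypothetical (assumed constant at 400 for the 30 days of April), because the slide gives only the average.

```python
# Hypothetical: 400 available beds on each of the 30 days of April.
daily_available_beds = [400] * 30
average_bed_count = sum(daily_available_beds) / len(daily_available_beds)

discharges_incl_deaths = 2060   # April discharges, including deaths

turnover = discharges_incl_deaths / average_bed_count
print(average_bed_count, turnover)  # 400.0 5.15, i.e. about 5 occupants per bed
```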
54. Length of Stay (LOS) Data
• LOS is calculated after the patient is discharged.
• It refers to the number of calendar days from the day
of admission to the day of discharge; the day of
discharge is not counted.
• It is calculated by subtracting the date of admission
from the date of discharge. For example, the LOS of a
patient admitted on May 12 and discharged on May
17 is 5 days (17 - 12 = 5).
• When a patient is admitted and discharged on the
same day, the LOS = 1 day. The LOS is also 1 day for a
patient who is admitted on one day and discharged
the next day.
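Date arithmetic handles the subtraction rule, including stays that cross a month boundary. In this Python sketch, `length_of_stay` is a hypothetical helper, and the year 2024 is an assumption because the slide gives only day and month.

```python
from datetime import date

def length_of_stay(admitted, discharged):
    """Hypothetical helper: discharge date minus admission date; same-day stays count as 1."""
    days = (discharged - admitted).days
    return max(days, 1)

# The year (2024) is assumed for illustration.
print(length_of_stay(date(2024, 5, 12), date(2024, 5, 17)))  # 5
print(length_of_stay(date(2024, 5, 12), date(2024, 5, 12)))  # same day -> 1
print(length_of_stay(date(2024, 5, 12), date(2024, 5, 13)))  # next day -> 1
```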
55. Length of Stay (LOS) Data
• When the LOS values for all patients discharged
(including deaths) during a given period are summed,
the result is the total LOS.
Patient   LOS (days)
1 5
2 3
3 1
4 8
5 10
Total LOS 27
56. Length of Stay (LOS) Data
• The total LOS divided by the number of patients
discharged is the average LOS (ALOS).
• Formula:
$$ \text{ALOS} = \frac{\text{Total LOS for a given time period}}{\text{Total discharges (including deaths) for the same period}} $$
• For the previous example, 27/5 = 5.4 days
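The total LOS and ALOS for the five patients in the table take two lines of Python:

```python
# LOS (days) of the five discharged patients in the total-LOS table.
los_by_patient = {1: 5, 2: 3, 3: 1, 4: 8, 5: 10}

total_los = sum(los_by_patient.values())   # 27
alos = total_los / len(los_by_patient)     # 27 / 5
print(total_los, alos)                     # 27 5.4
```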
57. Hospital Inpatient Death (Mortality) Rate
• It is the basic indicator of mortality in a healthcare
facility.
$$ \text{Inpatient death rate} = \frac{\text{Total inpatient deaths for a given time period}}{\text{Total discharges, including deaths, for the same period}} \times 100 $$
• There are more specific mortality measures
(e.g., maternal death rates).
58. Measures of Morbidity
• Some commonly used measures to describe the
presence of disease in a community or a specific
location, such as a nursing home (a long-term care
facility), are incidence and prevalence rates.
59. Incidence Rate
• The formula for calculating the incidence rate is:
$$ \text{Incidence rate} = \frac{\text{Total new cases of a specific disease during a given time interval}}{\text{Total population at risk during the same time interval}} \times 10^{n} $$
For 10^n, a value is selected so that the smallest rate calculated
results in a whole number.
60. Prevalence Rate
• The formula for calculating the prevalence rate is:
$$ \text{Prevalence rate} = \frac{\text{All new and preexisting cases of a specific disease during a given time interval}}{\text{Total population during the same time interval}} \times 10^{n} $$
Example: At Manor Nursing Home, 10 new cases of Klebsiella pneumoniae
occurred in January. For the month of January there were a total of 17
cases of Klebsiella pneumoniae. The facility had 250 residents during
January.
Incidence rate for the month of January= (10/250) x 100 = 4%
Prevalence rate for the month of January = (17/250) x 100 = 6.8%
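Incidence and prevalence share the denominator and differ only in which cases are counted, as this Python sketch of the Manor Nursing Home example shows (10^n = 100).

```python
residents = 250   # Manor Nursing Home, January
new_cases = 10    # new Klebsiella pneumoniae cases in January
all_cases = 17    # new + preexisting cases in January

incidence = new_cases / residents * 100
prevalence = all_cases / residents * 100

print(f"incidence = {incidence:.1f}%, prevalence = {prevalence:.1f}%")  # 4.0%, 6.8%
```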
61. Hospital Infection Rates
• The most common morbidity rates calculated for
hospitals are related to hospital-acquired
(nosocomial) infections.
• Examples of nosocomial infections include urinary
tract infections, infections related to intravascular
catheters, surgical wound infections, and respiratory
tract infections.
• The hospital infection rate can be calculated for the
entire hospital or for a specific unit in the hospital.
62. Hospital Infection Rates
• The formula for calculating the nosocomial infection
rate is:
$$ \text{Nosocomial infection rate} = \frac{\text{Total hospital infections for a given time period}}{\text{Total discharges, including deaths, for the same period}} \times 100 $$
• The formula for calculating the post-operative
infection rate is:
$$ \text{Post-operative infection rate} = \frac{\text{No. of infections in clean surgical cases for a given time period}}{\text{Total no. of surgical operations for the same period}} \times 100 $$
A clean surgical case is one in which no infection existed prior to surgery.
63. Safety-Related Measures
• Incidence rates are used to monitor safety-related
issues. The nosocomial infection rate is a safety-related
measure.
• The numerator of the rate is the number of times the
specific event occurred in the observed population.
The denominator is the number of patient-days
(obtained by summing the LOS of all patients during
a given time interval).
64. Safety-Related Measures
• Using the number of hospital discharges as the
denominator for error and infection rates is crude
and inaccurate. It is better to use patient-days as
the denominator, since a patient with a 4-day admission
has, on average, twice the exposure risk of a patient
with a 2-day admission.
• For example, the rate of ventilator-associated pneumonia
(VAP) in ICUs can be calculated per 1,000 ventilated
patient-days rather than per total number of ICU discharges.
65. Safety-Related Measures
• The use of unified denominators allows for valid
comparisons and identification of real differences in
frequency.
• For example, suppose ward "A" at one hospital has
6 falls in 182 patient-days (an incidence rate of about
33 per 1,000 patient-days), while ward "B" has 12 falls
in 720 patient-days (about 17 per 1,000 patient-days).
Judging by the raw number of falls, attention would
focus on ward B, diverting it from ward A, which
actually has the higher rate of falls.
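Expressing both wards per 1,000 patient-days gives them a common denominator. `rate_per_1000_patient_days` is a hypothetical helper name.

```python
def rate_per_1000_patient_days(events, patient_days):
    """Hypothetical helper: events per 1,000 patient-days."""
    return events / patient_days * 1000

ward_a = rate_per_1000_patient_days(6, 182)
ward_b = rate_per_1000_patient_days(12, 720)
print(round(ward_a, 1), round(ward_b, 1))  # 33.0 16.7
```

Ward B has twice as many falls, but ward A's rate is about twice as high.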
66. Safety-Related Measures
Risk Stratification and Subgroup Analysis
• Heterogeneity of risk within the patient
population or across the period of observation
calls for risk stratification.
• For example, risk stratification may involve
sorting the population at risk by a common
variable, such as age, gender, or admitting
diagnosis. More precise stratification can help to
identify the groups within a population at
greatest risk.
67. Safety-Related Measures
Risk Stratification by Wound Class
Wound Class              Number of Infections   Number of Surgeries   Rate of Infection (per 1,000 surgeries)
I. Clean                 3                      160                   19
II. Clean-contaminated   11                     240                   46
III. Contaminated        13                     56                    232
IV. Dirty                5                      12                    417
TOTAL                    32                     468                   68
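The stratified rates can be recomputed from the infection and surgery counts, rounding to the nearest whole number:

```python
# Wound class -> (number of infections, number of surgeries).
wound_classes = {
    "I. Clean": (3, 160),
    "II. Clean-contaminated": (11, 240),
    "III. Contaminated": (13, 56),
    "IV. Dirty": (5, 12),
}

# Rate of infection per 1,000 surgeries within each stratum.
rates = {wc: i / s * 1000 for wc, (i, s) in wound_classes.items()}
for wc, r in rates.items():
    print(f"{wc}: {r:.0f} per 1,000 surgeries")

# Crude (unstratified) rate for all surgeries combined.
total_infections = sum(i for i, _ in wound_classes.values())
total_surgeries = sum(s for _, s in wound_classes.values())
total_rate = total_infections / total_surgeries * 1000
print(f"TOTAL: {total_rate:.0f} per 1,000 surgeries")
```

The crude rate of about 68 hides a more than twentyfold difference between clean and dirty wounds.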
68. Safety-Related Measures
Risk Stratification and Subgroup Analysis
• When medication administration errors are sorted by
step in the medication administration process
(i.e., prescribing, transcribing, dispensing, administration,
and monitoring), this is not stratification (i.e., developing
strata based on the risk of error) but rather a subgrouping
of different types of errors under the broad heading of
medication error.
• Here, each patient may be subject to the same error
multiple times during his/her hospital stay, so it is better
to use the total number of administered doses as the
denominator.
69. Safety-Related Measures
Risk Stratification and Subgroup Analysis
• Even for a single type of error (e.g., administration), it is
possible to perform a further subgrouping analysis as
shown in the table in the next slide.
70. Safety-Related Measures
Errors While Administering Oral Medication, by Subgroup of Error
Type of Error            Number of Errors   Number of Doses   Incidence Rate (per 10,000 doses)
Wrong medication given   14                 14,284            9.8
Wrong dose               32                 14,284            22.4
Wrong patient            12                 14,284            8.4
Patient allergic         4                  14,284            2.8
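Every subgroup shares one denominator (administered doses), so each incidence rate per 10,000 doses can be recomputed directly:

```python
doses = 14_284  # oral doses administered (same denominator for every subgroup)
errors = {
    "Wrong medication given": 14,
    "Wrong dose": 32,
    "Wrong patient": 12,
    "Patient allergic": 4,
}

# Incidence rate per 10,000 doses for each subgroup of error.
rates = {kind: n / doses * 10_000 for kind, n in errors.items()}
for kind, r in rates.items():
    print(f"{kind}: {r:.1f} per 10,000 doses")
```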
71. N.B.
For calculating rates, we have 3 options for the
denominator:
1. The total number of patients/population subject to
the event (i.e., the total number of discharges for the
whole hospital or for one unit inside the hospital).
2. Patient-days (obtained by summing the LOS of all
patients subject to the event).
3. Events (obtained by summing the total number of
events, such as surgeries performed or doses
administered). Here, one patient may be subject to
the same event more than once.
Options 2 and 3 are used with safety-related measures.
72. Criteria for the Proper Measurement Process
These criteria are:
1. Validity
2. Reliability
3. Sensitivity
4. Specificity
73. Validity
• Accuracy in measurement cannot happen without validity. The
measuring instrument, whether a ruler, an IQ test, or a survey
instrument, is considered valid if it measures what it is intended
to measure and for the intended purpose.
• A ruler or scale is a direct measure. In healthcare, because we
cannot measure quality with scales, quality is often assessed
through indirect measures.
• For example, measuring the number of inpatient admissions for
long-term complications of diabetes in a metropolitan area is
actually a measure of how effectively these patients are managed
in ambulatory settings: the more effective this management, the
fewer hospitalizations among patients with chronic diabetes
mellitus (DM). Here, inpatient admissions act as a proxy measure.
74. Reliability
• Error is integral to the measurement process, whether it is
measurement of weight, height, or blood pressure. Even
when measurement is made as accurately as the instrument
allows and all procedures are followed, repeated measures
do not always give exactly the same results.
• However, an instrument that is reliable will tend to have
results that are consistent with each other over repeated
trials. A measurement process is said to be reliable if
repeated measurements over time on the same property or
attribute give the same or approximately the same results.
• An example of an unreliable measuring device is a scale that
gives widely different weights each time the same object is
weighed.
75. Sensitivity
• A measure is sensitive if it correctly identifies the true
positives (those who actually have the condition), even
though it may also pick up some false positives.
• Sensitive tests are used as rapid tests (e.g., for HCV);
their positive results are then confirmed by more
time-consuming, more expensive, and more invasive
specific tests.
• They are used for screening and are usually performed
through urine analysis or finger-prick blood samples.
76. Specificity
• A measure is specific if it correctly classifies those who
do not have the condition as negative (true negatives),
so that its positive results are true positives.
• For example, suppose 1,000 people were screened for
HCV with an initial sensitive rapid test, which yielded
600 positives; those 600 are then tested with an
ELISA. Of the 600, 200 may turn out to be negative.
Thus, the more expensive test is performed on 600
people instead of 1,000.
• Specific tests are more expensive and more
invasive (they require drawing blood samples).
• Specific tests are usually used for research purposes.
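The HCV screening example can be worked through with the standard definitions sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). This Python sketch makes two assumptions that are not in the example: the ELISA result is treated as the true status, and none of the 400 rapid-test negatives is a true case (no false negatives). Under those assumptions the rapid test is fully sensitive but not very specific.

```python
screened = 1000
rapid_positive = 600                  # positives on the sensitive rapid test
elisa_negative_among_positive = 200   # rapid-test positives not confirmed by ELISA

true_positive = rapid_positive - elisa_negative_among_positive  # TP
false_positive = elisa_negative_among_positive                  # FP of the rapid test
confirmatory_tests_avoided = screened - rapid_positive

# Assumption (not in the example): no true cases among the rapid-test negatives.
false_negative = 0
true_negative = screened - rapid_positive - false_negative

sensitivity = true_positive / (true_positive + false_negative)  # TP / (TP + FN)
specificity = true_negative / (true_negative + false_positive)  # TN / (TN + FP)

print(true_positive, confirmatory_tests_avoided)     # 400 400
print(round(sensitivity, 2), round(specificity, 2))  # 1.0 0.67
```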