Case control & cohort study

Dr. Bhumika Bhatt
Junior Resident

 DEFINITION
 TYPES OF STUDY
 ANALYTICAL STUDIES
 CASE CONTROL STUDY
 VARIANTS OF CASE CONTROL STUDY
 SUMMARY
 COHORT STUDY
 DIFFERENCE
 SUMMARY
 REFERENCE

 The most conventional definition of epidemiology
is "the study of the distribution and determinants
of health-related states or events in specified
populations, and the application of this study to
control health problems" (John M. Last, 1988).

Types of study:
 Experimental: RCT, Non RCT
 Observational: Analytical, Descriptive
 Analytical: Ecological, Cross-sectional, Case-control, Cohort

 In analytical studies , the subject of interest is the
individual within the population.
 The object is not to formulate but to test the
hypothesis.
 To evaluate an association between exposure and
disease.
 Analytical studies focus on the magnitude of the
association between the exposure and the health
problem under study.

 Unit of study: cases and controls (individuals)
 Study question: What has happened?
 Direction of inquiry: from outcome back to exposure (E ← O)
 Study design:
 Cases: exposed / not exposed
 Controls: exposed / not exposed

 A case–control study is an observational study in
which subjects are sampled based upon presence
or absence of disease and then their prior exposure
status is determined.
 Distinct features:
a. Both exposure and outcome (disease) have
occurred before the start of the study.
b. The study proceeds backwards from effect to
cause.
c. It uses a control or comparison group to
support or refute an inference.

RISK FACTORS   CASES (Disease Present)   CONTROLS (Disease Absent)
PRESENT        a                         b
ABSENT         c                         d
Total          a + c                     b + d

 Selection of cases and controls.
 Matching.
 Measurement of exposure.
 Analysis and interpretation.

 The study begins with cases, i.e. the patients in whom
the disease has already occurred.
 Patients with the disease in question (cases) are
asked for all the details of their exposure to the
suspected cause.
 New cases that are similar clinically,
histologically, pathologically and in their duration
of exposure (stage) are chosen to avoid
error and to allow better comparison.

Definition of a case: it involves two specifications:
(i) Diagnostic criteria: State clear-cut
diagnostic criteria for the disease of interest. As far
as possible, use criteria given by expert bodies.
(ii) Eligibility criteria: It is always advisable to take
incident cases, since prevalent cases might
have changed their exposure status due to medical
advice etc.
Sources of cases:
 Hospitals
 General population

 Controls must be free from the disease under
study.
 The usual principle to observe while
selecting controls is that “like should be
compared with like”, to avoid errors and to allow
better comparison.
Sources of controls:
 Hospital controls
 General population
 Relatives/Neighborhood
To do:
 Select controls from various diagnostic groups, so that no particular risk
factor is overrepresented.
 Select controls from patients with acute conditions, so that earlier
exposures could not have been influenced by the condition.
To avoid:
 Do not select patients who have multiple concurrent conditions.
 Do not select patients with diagnoses known to be related to the risk
factor of interest.
-Source of controls (healthy
population based or hospital based)
- No. of controls
- No. of control groups
- Method of sampling the controls
- Matching, if considered.

Population-based:
 Source population is better defined.
 Easier to make certain that cases and controls derive from the same
source population.
 Exposure histories of controls are more likely to reflect those of
persons without the disease of interest.
Hospital-based:
 Subjects are more accessible.
 Subjects tend to be more cooperative.
 Easier to collect exposure information from medical records and
biological specimens.

 A confounder is defined as a factor which is associated both with
exposure and disease and is distributed unequally
in study and control groups.
A confounder:
(i) is associated with the exposure of interest;
(ii) is related to the outcome of interest;
(iii) is not in the direct chain or link between the exposure and the
outcome.
[Diagram: confounder linked to both exposure and outcome]

Hypothesis: Consumption of alcohol is a risk factor for oral cancer (CA).
100 cases of oral CA and 100 healthy subjects were asked about
their history of alcohol consumption during the past 15 years.
History of alcohol   Oral cancer present   Oral cancer absent   Total
Present              80 (a)                20 (b)               100
Absent               20 (c)                80 (d)               100
Total                100                   100                  200
Odds ratio = (a x d) / (b x c) = (80 x 80) / (20 x 20) = 16
Taken at face value, the odds of oral cancer are 16 times higher in a person who drinks alcohol.
This finding may be false because of the “hidden” effect of tobacco use:
people who drink alcohol are often the ones who
also use tobacco, and tobacco use is itself a direct
cause of oral cancer, whether one drinks or not.
Dissecting the hypothetical data into two strata:
Tobacco users: stratum OR = (60 x 5) / (20 x 15) = 1
Non-tobacco users: stratum OR = (5 x 60) / (15 x 20) = 1
Conclusion: In both strata the OR falls to 1, i.e. there is no risk of
oral cancer from alcohol after adjusting for the effect of tobacco.
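The crude and stratum-specific odds ratios in this example can be reproduced with a minimal Python sketch. The function name `odds_ratio` is illustrative only, and the assignment of the stratum numbers to cells a, b, c, d is an assumption: the slide gives only the cross-products, not the full stratum tables.

```python
def odds_ratio(a, b, c, d):
    """Cross-products ratio (a*d)/(b*c) for a 2x2 table.

    a = exposed cases, b = exposed controls,
    c = unexposed cases, d = unexposed controls.
    """
    return (a * d) / (b * c)

# Crude table from the alcohol and oral cancer example.
crude_or = odds_ratio(a=80, b=20, c=20, d=80)

# Stratum cross-products as printed on the slide.
# Assumption: the cell labels below are inferred from the (a x d)/(b x c) pattern.
tobacco_or = odds_ratio(a=60, b=20, c=15, d=5)
non_tobacco_or = odds_ratio(a=5, b=15, c=20, d=60)

print(crude_or, tobacco_or, non_tobacco_or)  # 16.0 1.0 1.0
```

A crude OR of 16 that falls to 1 within each stratum is the pattern expected when the stratifying variable, here tobacco, is a confounder.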

 Randomisation: If a group of subjects is divided
into two using “random allocation” (syn.
randomisation), the two groups will tend to be similar to
each other in all respects.
 Restriction: Subjects having the particular
confounding variable(s) are not taken up at all.
 Matching

 Matching is defined as the process by
which we select controls in
such a way that they are
similar to cases with regard
to certain pertinent selected
variables (e.g. age, sex,
occupation, social status etc.)
which are known to influence
the outcome of the disease.

Advantages:
 May increase the precision of case-control comparisons and thus allow a
smaller study.
 The sampling process is easy to understand and explain.
 If analyzed correctly, provides reassurance that matched variables
cannot explain case-control differences in the risk factor of interest.
Disadvantages:
 May be time-consuming and expensive to perform.
 Some potential cases and controls may be excluded because matches
cannot be made.
 The matched variables cannot be evaluated as risk factors in the study
population.
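As a rough illustration of individual (1:1) matching, the sketch below pairs each case with one unused control that has the same sex and age band. The function `match_controls`, the field names and all the records are hypothetical, invented only to show the mechanics; a real study would define its own matching variables.

```python
from collections import defaultdict

def match_controls(cases, pool, keys=("sex", "age_band")):
    """Pair each case with one unused control sharing the same matching keys."""
    by_profile = defaultdict(list)
    for person in pool:
        by_profile[tuple(person[k] for k in keys)].append(person)

    pairs, unmatched = [], []
    for case in cases:
        candidates = by_profile[tuple(case[k] for k in keys)]
        if candidates:
            pairs.append((case, candidates.pop(0)))  # each control is used once
        else:
            unmatched.append(case)                   # no match available
    return pairs, unmatched

# Hypothetical records, for illustration only.
cases = [
    {"id": "C1", "sex": "M", "age_band": "40-49"},
    {"id": "C2", "sex": "F", "age_band": "50-59"},
    {"id": "C3", "sex": "F", "age_band": "60-69"},
]
pool = [
    {"id": "P1", "sex": "F", "age_band": "50-59"},
    {"id": "P2", "sex": "M", "age_band": "40-49"},
    {"id": "P3", "sex": "M", "age_band": "40-49"},
]

pairs, unmatched = match_controls(cases, pool)
print([(case["id"], ctrl["id"]) for case, ctrl in pairs])  # [('C1', 'P2'), ('C2', 'P1')]
print([case["id"] for case in unmatched])                   # ['C3']
```

Case C3 stays unmatched, which shows the disadvantage listed above: some potential cases are excluded because no match can be made.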

 Information about the exposure should be obtained
in precisely the same manner for both cases and
controls.
 This may be obtained by the interviews, by
questionnaires, or by studying past records of
cases such as hospital records, employment
records.

The final step is Analysis:
 Exposure rate among cases and controls to
suspected factors.
 Estimation of the Disease risk associated with
exposure (Odds ratio).

              CASES (with lung cancer)   CONTROLS (without lung cancer)
SMOKERS       33 (a)                     55 (b)
NON-SMOKERS   2 (c)                      27 (d)
TOTAL         35 (a + c)                 82 (b + d)

 Exposure rates:
A. Cases = a/(a + c) = 33/35 = 94.3%
B. Controls = b/(b + d) = 55/82 = 67.1%
 This shows that smoking was definitely more frequent among
lung cancer cases than among controls.

The chance of something happening can be
expressed as a risk and/or as an odds.
Risk = (the chances of something happening) / (the chances of all things happening)
Odds = (the chances of something happening) / (the chances of it not happening)

 Example-1: If we choose a student randomly from your
class of, say, 9, how likely is it that you will be chosen?
Risk (probability) = 1/9 = 0.111
Odds = 1/8 = 0.125
 Example-2: Among 100 people at baseline, 20 develop
influenza over a year.
The risk is 1 in 5 (i.e. 20 among 100) = 0.2
The odds are 1 to 4 (i.e. 20 compared to 80) = 0.25
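The two examples can be checked with a short Python sketch that uses exact fractions. The helper names `risk`, `odds` and `odds_from_risk` are illustrative and not part of the original slides.

```python
from fractions import Fraction

def risk(events, total):
    """Risk: events divided by everyone at risk."""
    return Fraction(events, total)

def odds(events, total):
    """Odds: events divided by non-events."""
    return Fraction(events, total - events)

def odds_from_risk(p):
    """Convert a risk (probability) p into odds, p / (1 - p)."""
    return p / (1 - p)

# Example 1: one student chosen from a class of 9.
print(risk(1, 9), odds(1, 9))        # 1/9 1/8

# Example 2: 20 of 100 people develop influenza over a year.
print(risk(20, 100), odds(20, 100))  # 1/5 1/4
print(odds_from_risk(risk(20, 100))) # 1/4
```

Risk and odds are close when the event is uncommon (1/9 versus 1/8) and drift apart as the event becomes more frequent.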

 The odds ratio is a measure of the strength of association between a risk factor
and an outcome.
 Odds = P/(1 - P), where P = probability; the odds ratio is the ratio of two such odds.
 The odds ratio is also known as the cross-products ratio.
 Based on 3 assumptions:
1. The disease being investigated must be relatively rare. In fact, the
majority of chronic diseases have a low incidence in the
general population.
2. The cases must be representative of those with the
disease.
3. The controls must be representative of those without the
disease.

[Figure: cohort study versus case control study]

 Odds ratio = ad/bc = (33 x 27)/(55 x 2) = 8.1
 The odds ratio is a key parameter in the analysis of case
control studies.
 It is interpreted as the odds of cases being exposed being so
many times higher than the odds of controls
being exposed.
 In our example, the estimated risk of lung cancer in smokers is
8.1 times that in non-smokers.
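The exposure rates and the odds ratio for the lung cancer table can be verified with the minimal sketch below. Variable names are illustrative. The last lines compute the odds of exposure among cases and among controls separately, to show that their ratio is the same cross-products ratio.

```python
# 2x2 table from the smoking and lung cancer example.
a, b, c, d = 33, 55, 2, 27   # a, c = cases; b, d = controls

cases_rate = a / (a + c)      # proportion of cases who smoked
controls_rate = b / (b + d)   # proportion of controls who smoked
odds_ratio = (a * d) / (b * c)

# Odds of exposure in each group; their ratio equals ad/bc.
odds_cases = a / c
odds_controls = b / d

print(f"{cases_rate:.1%} {controls_rate:.1%}")                   # 94.3% 67.1%
print(f"{odds_ratio:.1f} {odds_cases / odds_controls:.1f}")      # 8.1 8.1
```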

 Selection Biases
 Berksonian Bias : The probability of admission to hospital or detection of
the outcome (disease) may be more among the cases simply because of the
exposure.
 Selection of inappropriate Cases or Controls : Cases or controls who do
not have adequate chance of exposure.
 Self selection Bias : Patients who are admitted to a particular hospital and
hence taken as cases may be systematically very different from most of the
patients with the disease but who are not admitted to that hospital, as regards
the exposure status.
 Survivorship Bias : A case control study generally takes the patients who are
living. Cases who have died are generally not taken, and these may be
systematically very different from living cases as regards the exposure status.
 Selection of wrong control group : Controls who are not from the same
source population from where the cases have come; selection of close friends
of cases - since they would in general have the same behavioural factors as
cases (birds of a feather flock together ), example of condom use and STDs.
 Information (measurement) Biases
 Recall bias : Cases who are suffering from a disease are likely to recall much
more as regards their exposure (example on congenital malformation and
exposure to X - rays).
 Observer bias : If observer is aware of the case - control status, he/she may
subconsciously tend to ask much more from cases.
 Confounding Bias

 A nested case control study combines the advantages of a cohort and a case
control study.
 Firstly, the study becomes inexpensive and the
logistics are taken care of.
 Secondly, we can calculate the incidence of the
disease, which would not have been possible in a
usual case control study.
 Thirdly, the problems of recall bias and of
controls coming from a different source population
than cases (which occur in case control studies) are
prevented.

Hypothesis: High serum lithium levels are a cause of subsequent mental illness.
1. Take a cohort of, say, 1000 persons who are free of mental disease,
collect their blood samples and preserve them in cold storage.
2. Watch for 15 - 20 years.
3. Identify the 20 cases of mental disease (cases) and 20 randomly selected
samples of those who have not developed mental illness (controls).
4. Analyse these 40 samples for serum lithium and make comparisons
between the two groups.
5. The rest of the cohort is continuously followed.
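The sampling step of this design can be sketched in Python as follows. This is only an illustration under stated assumptions: the participant identifiers are made up, and which 20 cohort members become cases is chosen at random purely to have something to sample from.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# 1000 cohort members whose baseline blood samples are in cold storage.
cohort = [f"P{i:04d}" for i in range(1, 1001)]

# Hypothetical: the 20 members who developed the disease during follow-up.
cases = set(rng.sample(cohort, 20))

# Controls: 20 randomly selected members who have not developed the disease.
non_cases = [pid for pid in cohort if pid not in cases]
controls = set(rng.sample(non_cases, 20))

# Only these 40 stored samples are sent for laboratory analysis.
to_analyse = sorted(cases | controls)
print(len(to_analyse))  # 40
```

Analysing 40 stored samples instead of 1000 is where the saving in cost comes from, while cases and controls still come from the same source cohort.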

Advantages:
 Recall bias is eliminated.
 If abnormalities in biologic characteristics such as
laboratory values are found, because the specimens were
obtained years before the development of clinical disease, it
is more likely that these findings represent risk factors or
other premorbid characteristics than a manifestation of
early, subclinical disease. When such abnormalities are
found in the traditional case-control study, we do not know
whether they preceded the disease or were a result of the
disease.
 More economical to conduct.
 It is possible to study different diseases (different sets of
cases) in the same case-cohort study using the same cohort
for controls.

Advantages:
 Efficient for the study of rare diseases.
 Efficient for the study of chronic diseases.
 Tends to require a smaller sample size than other designs.
 Less expensive than alternative designs.
 May be completed more rapidly than alternative designs.
Disadvantages:
 Risk of disease cannot be estimated directly.
 Not efficient for the study of rare exposures.
 More susceptible to selection bias than alternative designs.
 Information on exposure may be less accurate than that available in
alternative designs.

Steps:
 Review the research question and confirm that a case
control study is the right design.
 Specify the total population and the actual (study)
population.
 Specify the major study variables
(exposure, outcome, confounding factors) and their
‘scales’ of measurement (dichotomous etc.).
 Calculate the sample size.
 Specify the selection criteria for cases.
 Specify the selection procedure for controls.
 Specify the procedures of measurement and
take special care to ensure validity and
reliability.
 Do a pilot study on 5 to 10 cases and controls.
 Conduct the study.
 Analyse the data.
A case control study is:
• Well suited for diseases which have a long latent
period (e.g. cancers, AIDS, MI, CVA etc.)
• Well suited for an outcome which is ‘rare’
• Well suited for conditions in which medical care is
usually sought
• Helpful in examining multiple etiologic factors: once we
have the cases of the disease, we can take a history of all
the factors that we feel may be risk factors
• Reasonably good for diseases that have a “relatively
rapid onset” and are usually hospitalised (e.g. most of the
acute infections, injuries etc.)

Also called a forward looking, incidence, longitudinal, prospective
or follow up study.
 Cohort = group of people who share a common
characteristic or experience within a defined time
period (age, occupation, exposure etc.).
 Cohort study: an observational study
in which the investigator determines the exposure
status of subjects and then follows them for subsequent
outcomes.
 Quantified with relative risk, incidence
rates and attributable risk.
 Cohorts are identified prior to the appearance of the
disease under investigation.

 In a cohort study the exposure has occurred, but the
disease has not.
Cohort                        With disease   Without disease   Total
Exposed (etiologic factor)    a              b                 a + b
Not exposed                   c              d                 c + d
a/(a + b) = incidence of disease in the exposed
c/(c + d) = incidence of disease in the non-exposed
If a/(a + b) > c/(c + d), it would suggest that the disease and the suspected
cause are associated.

 Cohorts must be free from the disease under study.
 Study and control groups must be equally susceptible
to the disease under study.
 Both the groups must be comparable in respect to
all the possible variables which may influence the
frequency of the disease.
 The diagnostic and eligibility criteria of the disease
must be defined beforehand.
 Groups are then followed, under
identical conditions, over a period of time to
determine the outcome of the exposure.

[Figure: design of a cohort study. A defined population is divided, without randomisation, into exposed and non-exposed groups; each group is followed to see who becomes diseased and who does not.]
[Figure: timelines of prospective, retrospective and combined cohort studies.]

 SELECTION OF STUDY SUBJECT
 OBTAINING THE DATA ON THE EXPOSURE.
 SELECTION OF THE COMPARISON GROUP.
 FOLLOW UP
 ANALYSIS

 Special Exposure Groups (e.g. radiologists for
studies on effect of radiation; ANC cases having
PIH for studying the outcome of pregnancy, etc.)
 Cohort defined on basis of geographical or
administrative boundaries (e.g. people living in a
given state or district like Framingham heart
study). The special advantage of such cohort is that
the same group will give an exposed as well as
unexposed (comparison) cohort.
 Groups offering special resources (e.g. all
registered doctors can be followed up for
development of IHD after recording their physical
activity levels).

Sources of data:
 Exposure
 External sources: hospital records
 Internal sources: questionnaires, physical examinations and/or blood
tests, other diagnostic tests
 Event
 External sources: disease registries, death certificates, physician and
hospital records
 Internal sources: questionnaires, physical examinations and/or blood
tests, other diagnostic tests
 Confounder
 External sources: hospital records, registries
 Internal sources: questionnaires, physical examinations

 Internal Control Group
 Exposed and non-exposed in
the same Study population
(Framingham study)
 Minimise the differences
between exposed and non-
exposed
 External Control Group
 When information on degree
of exposure is not available,
choose another group, i.e. another
cohort (e.g. smokers and non
smokers)
 General Population: If neither of the
above comparison groups is available, then
the mortality experience of the
exposed group is compared with the
mortality experience of the general
population in the same geographic
area as the exposed people.
 E.g. comparison of frequency of
cancer among uranium mine
workers with the rate in general
population in same geographic area.

 One of the problems in cohort studies is the regular
follow up of the participants.
 Therefore, at the start of the study, methods
should be devised, depending upon the outcome to
be determined (morbidity or death), to obtain the
data for assessing the outcome.
Procedures:
 Routine surveillance of death records.
 Review of physician and hospital records.
 Mailed questionnaires, telephone calls, periodic home visits.
 Periodic medical examination of each member of the cohort.
Losses to follow up arise from:
 Death.
 Change of residence.
 Migration.
 Withdrawal from occupation etc.

 Absolute comparison
 Risk difference = I exposed - I unexposed
 Measures the public health problem caused by the exposure
 Relative comparison
 Relative risk: RR = I exposed / I unexposed
 Odds ratio
 Measures the strength of an association

 DATA ARE ANALYSED IN TERMS OF
a) Incidence rates of outcome among exposed and
non- exposed.
b) Estimation of risk.
(i) relative risk
(ii) attributable risk

Cigarette smoking   Developed CHD   Did not develop CHD   Total          Incidence
Yes                 70 (a)          6930 (b)              7000 (a + b)   70/7000 = 10 per 1000
No                  3 (c)           2997 (d)              3000 (c + d)   3/3000 = 1 per 1000

R.R = (incidence of disease (or death) among exposed) / (incidence of disease (or death) among non-exposed)
Incidence among smokers = a/(a + b) = 70/7000 = 0.01
Incidence among non-smokers = c/(c + d) = 3/3000 = 0.001
RR = [a/(a + b)] / [c/(c + d)] = (70/7000) / (3/3000) = 10
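The cohort measures for the smoking and CHD data can be reproduced with a minimal Python sketch. The variable names are illustrative. The odds ratio is computed alongside the relative risk only to show how close the two are when the disease is rare.

```python
# Smoking and CHD cohort data.
a, b = 70, 6930    # smokers: developed CHD, did not develop CHD
c, d = 3, 2997     # non-smokers: developed CHD, did not develop CHD

i_exposed = a / (a + b)       # incidence among smokers
i_unexposed = c / (c + d)     # incidence among non-smokers

relative_risk = i_exposed / i_unexposed
risk_difference = i_exposed - i_unexposed
odds_ratio = (a * d) / (b * c)

print(f"RR = {relative_risk:.1f}")                                  # RR = 10.0
print(f"Risk difference = {risk_difference * 1000:.0f} per 1000")   # 9 per 1000
print(f"OR = {odds_ratio:.2f}")                                     # OR = 10.09
```

With an incidence of 1% or less in both groups, the odds ratio (10.09) is almost identical to the relative risk (10), which is why the rare disease assumption lets a case control study approximate the relative risk.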

 RR = 1: no association between exposure and disease
 incidence rates are identical between groups
 RR > 1: positive association
 exposed group has higher incidence than non-exposed group
 RR < 1: negative association or protective effect
 exposed group has lower incidence than non-exposed group.
As a purely hypothetical illustration, RR = 10% / 20% = 0.5 would indicate
that if one smokes, the risk of getting IHD is 10%, whereas
if one does not smoke, the risk is 20%; in that hypothetical case
smoking would reduce the risk of getting IHD by half.

 Risk difference =I exposed- I non exposed
 Attributable risk percent
 Population attributable risk percent.

Risk difference = I exposed - I unexposed, where I = incidence
Attributable risk percent = ((I exposed - I unexposed) / I exposed) x 100
Attributable risk in our example: AR% = ((0.01 - 0.001) / 0.01) x 100 = 90%
 It indicates to what extent the disease under study can be attributed to the
exposure. If smoking is given up, there will be a 90% reduction in
CHD among smokers.
 The limitation of AR% is that it tells us only
the quantum of reduction
in the disease that would be achieved
in the “exposed” group if
“exposure” was given up by them.
It does not tell us
about the reduction that will occur in
the “total population”.

 Population attributable risk percent
 Proportion of disease in the study population that could be
eliminated if the exposure is removed
PAR% = ((incidence in total population - incidence in unexposed) / incidence in total population) x 100
= ((73/10,000) - (3/3,000)) / (73/10,000) x 100 = 86%
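Both attributable risk measures can be computed from the same smoking and CHD table, as in the short sketch below. The variable names are illustrative. The total-population incidence of 73 per 10,000 comes from adding the two rows of the table.

```python
# Smoking and CHD cohort data.
a, b = 70, 6930    # smokers: developed CHD, did not develop CHD
c, d = 3, 2997     # non-smokers: developed CHD, did not develop CHD

i_exposed = a / (a + b)                 # 70/7000
i_unexposed = c / (c + d)               # 3/3000
i_total = (a + c) / (a + b + c + d)     # 73/10,000

# Share of disease among the exposed that is attributable to the exposure.
ar_percent = (i_exposed - i_unexposed) / i_exposed * 100

# Share of disease in the whole population attributable to the exposure.
par_percent = (i_total - i_unexposed) / i_total * 100

print(f"AR% = {ar_percent:.0f}, PAR% = {par_percent:.0f}")  # AR% = 90, PAR% = 86
```

PAR% is lower than AR% because 30% of this population does not smoke, so removing the exposure cannot affect their share of the disease.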

 Measurement (Ascertainment) bias : For obviating this, inform all
subjects of both groups well in advance of the dates and timings of
medical examination and ensure that both the groups are examined by
observers who have similar type of training and using similar type of
instruments and techniques.
 Observer bias : This occurs because the investigator is aware about
the fact as to which subject is ‘exposed’ and who is not exposed. For
obviating this, if possible, ‘blind’ the observer to the exposure status,
the details of exposure being known only to another co - worker who is,
himself, not making any observation regarding ascertainment of
outcome.
 Cross over bias : This may happen because those having the exposure
(e.g. smokers) may cross over to the non exposed group (i.e. become
non smokers) and vice versa. Periodic evaluation of both the groups as
regards level of exposure, making record entries and subsequent
adjustments in the data analysis can help overcoming this problem.
 ‘Loss to follow up’ bias : Some subjects in any case are likely to be
lost to follow up / drop out.

Advantages:
 Incidence can be calculated.
 Several possible outcomes related to exposure can be
studied simultaneously.
 Cohort studies provide a direct estimate of relative risk (R.R).
 Dose-response ratio can also be calculated.
Disadvantages:
• Involves a large number of people.
• Very lengthy: takes a very long time to complete.
• Certain administrative problems:
• Loss of experienced staff.
• Loss of funding.
• Extensive record keeping.
 Selection of the comparison group is a limiting factor.
 There may be changes in study methods or diagnostic criteria
of the disease over the prolonged period.
 Cohort studies are expensive.
 The study may itself alter the participants' behaviour.

 One of the best-known cohort studies is the Framingham Study of
cardiovascular disease.
 Started in 1948.
 Framingham is a town in Massachusetts, about 20
miles from Boston.
 Residents between 30 and 62 years of age were
considered eligible for the study.
 In 1971, a second generation of participants was enrolled.
 In April 2002, a third generation was enrolled in the
core study.

 Hypotheses:
 Incidence of CHD increases with age.
 Persons with hypertension develop CHD at a greater rate.
 Elevated cholesterol is associated with increased CHD.
 Tobacco smoking and habitual use of alcohol are associated with increased CHD.
 Increased physical activity is associated with decreased incidence of
CHD.
 Increased body weight increases the incidence of CHD.
 Diabetes increases the incidence of CHD.
 New coronary events were identified by examining the study
population every 2 years and by daily surveillance of
hospitalizations at the only hospital in Framingham.

 Results:
 1960s: Cigarette smoking, increased cholesterol,
elevated blood pressure and obesity increase the risk of heart
disease. Exercise decreases the risk of heart disease.
 1970s: Elevated blood pressure increases the risk of
stroke. Postmenopausal women have an increased risk of heart disease
compared with women who are premenopausal.
 1980s: High levels of HDL cholesterol reduce the risk of
heart disease.
 1990s: Elevated blood pressure can progress to heart
failure. At 40 years of age, the lifetime risk for CHD is
50% for men and 33% for women.

 2000s: “High normal blood pressure” increases risk of
cardiovascular disease (high normal blood pressure is
called prehypertension in medicine; it is defined as a
systolic pressure of 120–139 mm Hg and/or a diastolic
pressure of 80–89 mm Hg). Lifetime risk of developing
elevated blood pressure is 90%. Serum aldosterone
levels predict risk of elevated blood pressure. Lifetime
risk for obesity is approximately 50%.

Steps:
 Specify the research question, objectives and
background significance; confirm that a cohort study is
to be done.
 Specify the variables of interest and their scales
of measurement (exposure variable, outcome
variable, confounders).
 Specify the exclusion criteria (e.g. restricting
the study to males).
 Calculate the sample size.
 Select the study cohort (special exposure groups, or
on the basis of geographical or administrative
boundaries).
 Select the comparison cohort (external group, internal
group).
 Specify the sampling procedure (simple random
or systematic random sampling).
 Exclude the disease or outcome of interest in
both the exposed and unexposed cohorts.
 Obtain data on exposure level.
 Obtain data on all potential
confounding factors.
 Consider matching (matching is not
essential; if it is done, use
frequency matching).
 Follow up and ascertain the
‘outcome’ of interest.
 Analysis.
A cohort study is indicated:
• Where there is good evidence of association
between exposure and disease, as derived
from clinical observation and supported by
descriptive and case control studies.
• When exposure is rare, but the incidence of
disease is high among the exposed.
• When attrition of the study population can be
minimized, e.g. follow up is easy and the cohort is
stable.
• When ample funds are available.

 Textbook of PSM, 19th ed., by K. Park
 Lange Medical Epidemiology, 4th ed., by Raymond S.
Greenberg, Stephen R. Daniels, John William
Eley
 Epidemiology by Leon Gordis
 Textbook of Public Health and Community
Medicine by Rajvir Bhalwar, Rajesh Vaidya, Reena
Tilak
 http://en.wikipedia.org/wiki/Cohort_(statistics)
