1. Reliability and Validity
Hatim Al-Jifree
MB;ChB(Hon), FRCSC, GOC, MMedEd
2. Lecture objectives
To review the definitions of reliability and validity
To review methods of evaluating reliability and validity in survey research
EBM perspective
4. Definition
The degree of stability exhibited when a measurement is repeated under identical conditions
Lack of reliability may arise from divergences between observers or instruments of measurement, or from instability of the attribute being measured
(from Last. Dictionary of Epidemiology)
5. Assessment of reliability
Reliability is assessed in 3 forms
1. Test-retest reliability
2. Alternate-form reliability
3. Internal consistency reliability
6. Test-retest reliability
Most common form in surveys
Same respondents complete a survey at two different points in time
Usually quantified with a correlation coefficient (r value)
r values are considered good if r ≥ 0.70
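As a rough illustration of the r ≥ 0.70 rule, the sketch below computes a Pearson correlation coefficient between two administrations of the same survey. The scores and the names `pearson_r`, `time_1`, and `time_2` are hypothetical and not part of the lecture; the slide does not say which correlation coefficient is meant, so Pearson's r is an assumption here.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists.

    Assumes both lists contain some variation (non-zero variance).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

# Hypothetical total scores for six respondents at two time points.
time_1 = [12, 15, 9, 20, 17, 11]
time_2 = [13, 14, 10, 19, 18, 10]

r = pearson_r(time_1, time_2)
is_good = r >= 0.70  # threshold quoted on the slide
print(f"test-retest r = {r:.2f}, good = {is_good}")
```

With real data, the two lists would hold each respondent's score from the first and second administration, in the same respondent order.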
7. Test-retest reliability (2)
If data are recorded by an observer, you can have the same observer make two separate measurements
The comparison between the two measurements is intraobserver reliability
What does a difference mean?
8. Test-retest reliability (3)
You can test-retest specific questions or the entire survey instrument
For variables likely to change over a short period of time, such as energy, happiness, or anxiety:
Test-retest over very short periods of time
9. Test-retest reliability (4)
A potential problem with test-retest is the practice effect
Individuals become familiar with the items
What effect does this have on your reliability estimates?
It inflates the reliability estimate
10. Alternate-form reliability
Use differently worded forms to measure the same attribute
Questions or responses are reworded
Or their order is changed
To produce two items that are similar but not identical
11. Alternate-form reliability (2)
Two items address:
The same aspect of behavior
Same vocabulary
Same level of difficulty
Items should differ in wording only
It is common to simply change the order of the response alternatives
This reduces the practice effect
12. Example: Assessment of depression
Circle one item
Version A:
During the past 4 weeks, I have felt downhearted:
Every day 1
Some days 2
Never 3
Version B:
During the past 4 weeks, I have felt downhearted:
Never 1
Some days 2
Every day 3
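Because Version B reverses the response order, its codes have to be mapped back onto the Version A coding before the two forms can be compared. The sketch below shows one way to do that. The respondent answers and the names `recode_b_to_a`, `answers_a`, and `answers_b` are hypothetical, and the simple share of matching answers is only an illustrative check, not the method named in the lecture, which quantifies reliability with a correlation coefficient.

```python
# Version A codes: Every day = 1, Some days = 2, Never = 3
# Version B codes: Never = 1, Some days = 2, Every day = 3

def recode_b_to_a(code_b):
    """Map a Version B code onto the Version A coding (reverse a 1-3 scale)."""
    if code_b not in (1, 2, 3):
        raise ValueError(f"unexpected response code: {code_b}")
    return 4 - code_b

# Hypothetical answers from five respondents who completed both versions.
answers_a = [1, 2, 3, 2, 3]
answers_b = [3, 2, 1, 1, 1]

answers_b_recoded = [recode_b_to_a(code) for code in answers_b]

# Share of respondents giving the same answer on both forms after recoding.
matches = sum(a == b for a, b in zip(answers_a, answers_b_recoded))
agreement = matches / len(answers_a)
print(answers_b_recoded, agreement)
```

Skipping the recoding step would make two perfectly consistent forms look negatively related, so it is worth doing before any reliability statistic is computed.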
13. Alternate-form reliability (3)
You could also change the wording of the response alternatives without changing the meaning
14. Example: Assessment of urinary function
Version A:
During the past week, how often did you usually empty your bladder?
1 to 2 times per day
3 to 4 times per day
5 to 8 times per day
12 times per day
More than 12 times per day
15. Example: Assessment of urinary function
Version B:
During the past week, how often did you usually empty your bladder?
Every 12 to 24 hours
Every 6 to 8 hours
Every 3 to 5 hours
Every 2 hours
More than every 2 hours
16. Alternate-form reliability (4)
You could also change the actual wording of the question
The two items must be equivalent
Items with different degrees of difficulty do not measure the same attribute
What might they measure?
Reading comprehension or cognitive function
17. Example: Assessment of loneliness
Version A:
How often in the past month have you felt alone in the world?
Every day
Some days
Occasionally
Never
Version B:
During the past 4 weeks, how often have you felt a sense of loneliness?
All of the time
Sometimes
From time to time
Never
18. Example of nonequivalent item rewording
Version A:
When your boss blames you for something you did not do, how often do you stick up for yourself?
All the time
Some of the time
None of the time
Version B:
When presented with difficult professional situations where a superior censures you for an act for which you are not responsible, how frequently do you respond in an assertive way?
All of the time
Some of the time
None of the time
19. Alternate-form reliability (5)
You can measure alternate-form reliability at the same timepoint or at separate timepoints
If the sample is large enough:
You can split it in half and administer one item to each half
Then compare the two halves
This is called a split-halves method
You can also split the sample into thirds and administer three forms of the item
20. Internal consistency reliability
Applied to groups of items that are thought to measure different aspects of the same concept
Cronbach's coefficient alpha
Measures internal consistency reliability
It is a reflection of how well the different items complement each other
Interpret like a correlation coefficient (≥ 0.70 is good)
21. Example: Assessment of physical function
Response options: 1 = Limited a lot, 2 = Limited a little, 3 = Not limited
Vigorous activities, such as running, lifting heavy objects, participating in strenuous sports 1 2 3
Moderate activities, such as moving a table, pushing a vacuum cleaner, bowling, or playing golf 1 2 3
Lifting or carrying groceries 1 2 3
Climbing several flights of stairs 1 2 3
Bending, kneeling, or stooping 1 2 3
Walking more than a mile 1 2 3
Walking several blocks 1 2 3
Walking one block 1 2 3
Bathing or dressing yourself 1 2 3
22. Calculation of Cronbach's coefficient alpha
Example: Assessment of emotional health
During the past month: (Yes = 1, No = 0)
Have you been a very nervous person? 1 0
Have you felt downhearted and blue? 1 0
Have you felt so down in the dumps that nothing could cheer you up? 1 0
24. Calculations
Mean score = 2
Sample variance:
\[ \mathrm{Var} = \frac{(2-2)^2 + (3-2)^2 + (0-2)^2 + (3-2)^2 + (2-2)^2}{5-1} = 1.5 \]
Cronbach's coefficient alpha:
\[ \alpha = \frac{k}{k-1}\left(1 - \frac{\sum_i (\%\mathrm{pos}_i)(\%\mathrm{neg}_i)}{\mathrm{Var}}\right) = \frac{3}{2}\left(1 - \frac{(.6)(.4) + (.8)(.2) + (.6)(.4)}{1.5}\right) = 0.86 \]
Conclude that this scale has good reliability
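The arithmetic on this slide can be checked with a short script. The respondent-level data are not in this transcript, so the `responses` matrix below is a hypothetical set of yes/no answers chosen only to match the slide's total scores (2, 3, 0, 3, 2) and item proportions (.6, .8, .6). As on the slide, the total-score variance uses the n − 1 denominator while each item contributes (%pos)(%neg).

```python
# Hypothetical yes/no answers (1 = yes, 0 = no): 5 respondents x 3 items,
# chosen to match the slide's totals (2, 3, 0, 3, 2) and item proportions.
responses = [
    [0, 1, 1],
    [1, 1, 1],
    [0, 0, 0],
    [1, 1, 1],
    [1, 1, 0],
]

n = len(responses)     # number of respondents
k = len(responses[0])  # number of items

# Total score per respondent and its sample variance (n - 1 denominator).
totals = [sum(row) for row in responses]
mean_total = sum(totals) / n
sample_var = sum((t - mean_total) ** 2 for t in totals) / (n - 1)

# Sum over items of (%pos)(%neg), i.e. p * (1 - p) for each yes/no item.
sum_pq = 0.0
for item in range(k):
    p = sum(row[item] for row in responses) / n
    sum_pq += p * (1 - p)

alpha = (k / (k - 1)) * (1 - sum_pq / sample_var)
print(round(alpha, 2))
```

For these inputs the script reproduces the slide's mean of 2, variance of 1.5, and alpha of 0.86. For items that are not scored yes/no, the p(1 − p) term would be replaced by each item's own variance.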
25. Internal consistency reliability (2)
If internal consistency is low:
You can add more items
Re-examine existing items for clarity
26. Interobserver reliability
How well two evaluators agree in their assessment of a variable
Use correlation coefficient to compare data between observers
May be used as a property of the test or as an outcome variable
28. Definition
How well a survey measures what it sets out to measure
29. Assessment of validity
Validity is measured in four forms
Face validity
Content validity
Criterion validity
Construct validity
30. Face validity
Cursory review of survey items by untrained judges
Ex. Showing the survey to untrained individuals to see whether they think the items look okay
Very casual, soft
Many don't really consider this a measure of validity at all
31. Content validity
Subjective measure of how appropriate the items seem to a set of reviewers who have some knowledge of the subject matter
Usually consists of an organized review of the survey's contents
Still very qualitative
32. Criterion validity
Measure of how well one instrument stacks up against another instrument or predictor
Concurrent: assess your instrument against a "gold standard"
Predictive: assess the ability of your instrument to forecast future events, behavior, attitudes, or outcomes
Assess with correlation coefficient
33. Construct validity
Most valuable and most difficult measure of validity
Basically, it is a measure of how meaningful the scale or instrument is when it is in practical use
34. Construct validity (2)
Convergent: Implies that several different methods for obtaining the same information about a given trait or concept produce similar results
Evaluation is analogous to alternate-form reliability, except that it is more theoretical and requires a great deal of work, usually by multiple investigators with different approaches
35. Construct validity (3)
Divergent: The ability of a measure to estimate the underlying truth in a given area; it must be shown not to correlate too closely with similar but distinct concepts or traits
37. Introduction
Three Steps in Using Medical Literature Articles:
Are the results of the study valid?
What are the results?
How can I apply these results to patient care?
38. Introduction
Four types of papers:
Therapy
Diagnostic Intervention
Prognosis
Systematic review
39. Therapy
Study design: RCT
Were Patients Randomized?
Was Randomization Concealed?
Were Patients Analyzed in the Groups to Which They Were Randomized?
Intention-to-treat analysis
40. Therapy
Were Patients in the Treatment and Control Groups Similar With Respect to Known Prognostic Factors?
Were Patients Aware of Group Allocation?
41. Therapy
Were Clinicians Aware of Group Allocation?
Were Outcome Assessors Aware of Group Allocation?
Was Follow-up Complete?
Was Follow-up Long Enough?
42. Diagnostic Intervention
Study Design: Cross-sectional
Was there an independent, blind comparison with a reference standard?
• Spectrum of patients
• Did the results of the test being evaluated influence the decision to perform the reference standard?
• Did the description of the methods permit replication?
43. Prognosis
• Study design: Cohort
• Was a
– defined,
– representative sample of patients
– assembled at a common point in the course of their disease?
• Inception cohort; early
• Late stage prognosis
• Patients equal in all prognostic factors
• Stratified analysis?
• Follow-up complete and long enough
• Valid and reliable data collection