Poster: Test-Retest Reliability and Equivalence of PRO Measures

A literature review of the variance in ‘interval length’ between administrations
for assessment of test-retest reliability and equivalence of PRO measures
Helen Anderson1, Nuz Quadri1, Diane Wild1, Paul O’Donohoe2, Willie Muehlhausen1
1 Oxford Outcomes, an ICON plc Company, Oxford, United Kingdom
2 CRF Health, London, United Kingdom
www.oxfordoutcomes.com
Background
Repeatability or test-retest reliability is an important component of the psychometric validation of
patient-reported outcome (PRO) measures, and is referred to in the FDA PRO guidance document
(2009) as being a key indicator of an instrument’s validity. Equivalence testing is designed to
evaluate the comparability between PRO scores from an electronic mode of administration and
paper and pencil administration, or between various electronic platforms. Coons et al (2009)
recommend that when the original PRO has undergone a moderate change during its migration to
an electronic platform, an equivalence study is required to ensure that the psychometric properties
have not changed.
There are a number of related designs available for both test-retest reliability and equivalence
administration (paper or electronic), but in equivalence studies respondents will complete one
administration on the original version (usually paper) and the other on an electronic platform.
scores (Laenen et al., 2006), often measured by test-retest correlations. Discrepancies between
the scores can occur due to transient or temporal error, which is error due to the repeated
measurement of the same subject at different time points (Schmidt et al., 2003). Various factors
can contribute towards this type of error: carryover effects such as memory and practice, the
recall period used in the PRO, and the stability of the condition being measured. One of the
the measure. A shorter interval runs the risk of potential memory or practice effects and a longer
period runs the risk of the condition having changed between intervals.
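As an illustration of the test-retest correlations mentioned above, the following is a minimal sketch that computes a Pearson correlation between paired scores from two administrations. The poster does not say which coefficient the reviewed studies used, so Pearson's r is an assumption here, and the function name and scores are hypothetical.

```python
from math import sqrt


def pearson_r(first, second):
    """Pearson correlation between paired scores from two administrations."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length score lists with at least 2 respondents")
    n = len(first)
    mean_1 = sum(first) / n
    mean_2 = sum(second) / n
    # Sum of cross-products and sums of squares around each administration's mean.
    cov = sum((a - mean_1) * (b - mean_2) for a, b in zip(first, second))
    ss_1 = sum((a - mean_1) ** 2 for a in first)
    ss_2 = sum((b - mean_2) ** 2 for b in second)
    if ss_1 == 0 or ss_2 == 0:
        raise ValueError("scores must vary within each administration")
    return cov / sqrt(ss_1 * ss_2)


# Hypothetical PRO total scores for five respondents whose condition is stable.
time_1 = [10, 14, 18, 22, 30]
time_2 = [11, 13, 19, 21, 31]
r = pearson_r(time_1, time_2)
```

A high r only indicates that respondents keep their rank order across administrations. It does not show whether that agreement reflects true stability or memory of earlier answers, which is why the interval length matters.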
There is very little literature addressing the issue of the appropriate length of interval required
between two administrations (Marx et al., 2003). The FDA PRO guidance document states that
“the time interval chosen depends on the variability of the state or experience being evaluated and
the condition rather than variability in stable patients.”
The objective of this literature review was to determine what administration intervals are commonly
used in the development and validation of PROs and to determine whether there is any pattern in
terms of what is currently done based on the criteria described above.
Method
A literature search was conducted in PsycINFO, using the following search terms:
‘test retest reliability’,
‘equivalence testing’,
‘washout period’,
‘interval’.
The search was limited to the past 10 years (2003-2013) and to ‘English language’ articles,
yielding a total of 554 abstracts.
Forty-six additional abstracts from a meta-analytic review of equivalence studies conducted by
Gwaltney et al (2008) were included. A further 65 abstracts were included from a more recent
meta-analysis (in press), resulting in a total of 665 abstracts.
The abstracts were reviewed by researchers, who extracted and collated the administration
interval where available. Full papers were retrieved where required in order to obtain
the interval used.
Abstracts were included if they described test-retest and/or equivalence studies and used a
PRO measure. Studies were excluded if clinical outcome assessments other than
PROs were used, if a cross-over design was not used, or if the interval was not clear
from the full paper.
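The inclusion and exclusion criteria can be read as a single screening rule in which any one exclusion criterion is enough to drop a study. The sketch below expresses that reading. The `Study` record and all of its field names are hypothetical and are not taken from the review's actual extraction form.

```python
from dataclasses import dataclass


@dataclass
class Study:
    # All field names are hypothetical labels for the criteria in the text.
    is_test_retest: bool
    is_equivalence: bool
    uses_pro: bool
    crossover_design: bool
    interval_clear: bool


def is_eligible(study):
    """Return True if the study meets the inclusion criteria and no exclusion applies."""
    right_type = study.is_test_retest or study.is_equivalence
    return (
        right_type
        and study.uses_pro
        and study.crossover_design
        and study.interval_clear
    )


# A hypothetical equivalence study that reports its interval clearly.
kept = is_eligible(Study(False, True, True, True, True))
# The same study without a clear interval would be excluded.
dropped = is_eligible(Study(False, True, True, True, False))
```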
of studies reviewed to extract the information.
Results
Of the 375 studies reviewed, 99 were equivalence studies and 276 were test-retest
studies. The administration interval varied widely across studies, ranging
from no interval (completed immediately) to a 7-year interval.
Administration intervals for test-retest studies ranged from 1 minute to 7 years. The
most commonly used interval was 2 weeks (22%). Administration intervals for
equivalence studies ranged from no interval to 1 month (with an outlier of a 6-month interval). The most
commonly used interval was one hour or less (30%).
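To show how a "most commonly used interval" can be derived, the sketch below sorts intervals into the interval length categories used in the poster's figures and reports the modal category with its share. The poster does not state how category boundaries were handled. Treating each upper bound as inclusive and a month as 30 days are assumptions, and the interval values are hypothetical.

```python
from collections import Counter

HOUR = 1.0
DAY = 24 * HOUR
WEEK = 7 * DAY
MONTH = 30 * DAY  # assumption: a month is treated as 30 days

# (upper bound in hours, label); upper bounds are inclusive, which is an assumption.
CATEGORIES = [
    (HOUR, "1 hour or less"),
    (DAY, "1 hour to 1 day"),
    (WEEK, "1 day to 1 week"),
    (2 * WEEK, "1 week to 2 weeks"),
    (MONTH, "2 weeks to 1 month"),
    (2 * MONTH, "1 to 2 months"),
]


def categorise(interval_hours):
    """Map an administration interval in hours to an interval length category."""
    if interval_hours < 0:
        raise ValueError("interval cannot be negative")
    for upper, label in CATEGORIES:
        if interval_hours <= upper:
            return label
    return "2 months or over"


def modal_category(intervals_hours):
    """Return the most frequent category and its share of all studies."""
    counts = Counter(categorise(h) for h in intervals_hours)
    label, count = counts.most_common(1)[0]
    return label, count / len(intervals_hours)


# Hypothetical intervals (in hours) extracted from seven studies.
intervals = [0.0, 0.5, 12, 240, 240, 240, 500]
label, share = modal_category(intervals)
```

With tied counts, `most_common` returns the category that was encountered first, so ties should be checked separately before reporting a single modal interval.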
Information on the medical conditions that were investigated in the studies was also extracted.
For the test-retest studies the most common conditions were mental health conditions (such as
anxiety, depression, and bipolar disorder), fatigue, cancer, and pain. For the equivalence studies,
the most common conditions were mental health, respiratory (such as asthma and chronic
obstructive pulmonary disease (COPD)), arthritic conditions (such as rheumatoid arthritis and
osteoarthritis), cancer, and pain.
To understand more about how the intervals were used across both types of studies, the
intervals for three conditions were assessed more closely: pain, mental health and cancer. The
intervals used for these three conditions are shown in Figures 2 and 3.
Figure 2. Interval used in equivalence studies for pain, cancer and mental health
Figure 3. Interval used in test-retest studies for pain, cancer and mental health
Analysis of these three conditions shows that, even when the same conditions are being
investigated, the interval differs by the type of study being conducted.
The equivalence study intervals are shorter, with a modal interval of one hour or less, whereas the
test-retest study intervals are longer, with a modal interval of two weeks to one month.
Figure 1. Flow chart of number of abstracts reviewed
Conclusion
There is no clear guidance on what interval is most appropriate to use in test-retest or equivalence
studies, beyond the need to balance considerations of changes in health state and the need for
complications are seen in the difference of interval lengths used for different types of studies
(i.e. test-retest and equivalence) and also for different conditions. While the literature seems to
indicate the use of different interval lengths for test-retest versus equivalence studies in the same
appropriate interval length for test-retest and equivalence studies. Issues that need to be
considered when selecting the most appropriate interval include: the stability of the condition, the
complexity and length of the measure, and the recall period used in the measure.
References
Coons SJ, Gwaltney CJ, Hays RD, et al. (2009). Recommendations on evidence needed to support
measurement equivalence between electronic and paper-based patient-reported outcome (PRO)
measures: ISPOR ePRO Good Research Practices Task Force report. Value Health, 12, 419-429.
Gwaltney CJ, Shields AL, Shiffman S. (2008). Equivalence of electronic and paper-and-pencil
administration of patient-reported outcomes measures: a meta-analytic review. Value Health, 11, 322-333.
Laenen A, Vangeneugden T, Geys H, et al. (2006). Generalized reliability estimation using repeated
measurements. British Journal of Mathematical and Statistical Psychology, 59, 113-131.
Marx RG, Menezes A, Horovitz L, et al. (2003). A comparison of two time intervals for test-retest
reliability of health status instruments. Journal of Clinical Epidemiology, 56, 730-735.
Schmidt FL, Le H, Ilies R. (2003). Beyond alpha: an empirical examination of the effects of different
sources of measurement error on reliability estimates for measures of individual differences constructs.
Psychological Methods, 8, 206-224.
US Food and Drug Administration: Final Guidance for Industry (2009). Patient-reported outcome
measures: Use in medical product development to support labelling claims.
Figure 1 data:
Source:                             PsycINFO   Gwaltney et al   Recent meta-analysis (in press)
Number of abstracts reviewed:       554        46               65
Number of relevant studies:         282        40               53
Total number of studies reviewed:   375
[Figure 3 chart: Test-retest studies. Y-axis: Frequency (0 to 14). X-axis: Interval length categories (1 hour or less; 1 hour to 1 day; 1 day to 1 week; 1 week to 2 weeks; 2 weeks to 1 month; 1 to 2 months; 2 months or over). Series: Pain, Mental health, Cancer.]
[Figure 2 chart: Equivalence studies. Y-axis: Frequency (0 to 7). X-axis: Interval length categories (1 hour or less; 1 hour to 1 day; 1 day to 1 week; 1 week to 2 weeks; 2 weeks to 1 month; 1 to 2 months; 2 months or over). Series: Pain, Mental health, Cancer.]
