ORIGINAL ARTICLE
Reliability of Binocular Vision Measurements
Used in the Classification of
Convergence Insufficiency
MICHAEL W. ROUSE, OD, MS, FAAO, ERIC BORSTING, OD, MS, PAUL N. DELAND, PhD,
and The Convergence Insufficiency and Reading Study (CIRS) Group
Southern California College of Optometry, Fullerton, California (MWR, EB), Mathematics Department, California State University at
Fullerton, Fullerton, California (PND)
ABSTRACT: Purpose. To evaluate the reliability of binocular vision measurements used in the classification of
convergence insufficiency. Methods. Two examiners tested 20 fifth and sixth graders in a school setting who passed a
screening of visual acuity, refraction, and binocularity. The tests, conducted using a standard protocol, consisted of von
Graefe near heterophoria (NH), phorometric positive fusional vergence (PFV), nearpoint of convergence (NPC), and
monocular pushup accommodative amplitude (AA). Each examiner measured each child three consecutive times for
each test, on two separate occasions, spaced approximately 1 week apart. Intraexaminer and interexaminer agreement
was assessed using intraclass correlation coefficients (ICC), the median absolute difference (MAD), and the coefficient
of repeatability (COR). Results. The within-session reliability of the NH (ICC: 0.95 to 0.99), NPC (ICC: 0.94 to 0.98),
and AA (ICC: 0.88 to 0.95) were good, whereas the PFV was less reliable (ICC: 0.71 to 0.94). The intraexaminer
reliability between sessions was good for the NPC (ICC: 0.92 and 0.89), less reliable for NH (ICC: 0.81 and 0.81) and
AA (ICC: 0.89 and 0.69), and much less reliable for PFV break (ICC: 0.59 and 0.53). Typical between-session PFV
differences (MAD) were between 3 and 4 Δ, whereas the COR differences were as large as 12 Δ. Conclusions. Three
of the four measures (NH, NPC, and AA) often used in the classification of convergence insufficiency generally have
good within-session and between-session reliability. The PFV break was found to have only fair reliability with clinically
significant differences between sessions. The large potential test-retest differences found could complicate clinical
decision-making with regard to diagnosis and treatment. (Optom Vis Sci 2002;79:254–264)
Key Words: reliability, repeatability, heterophoria, nearpoint of convergence, accommodative amplitude, positive
fusional vergence, convergence insufficiency, binocular vision
The Convergence Insufficiency and Reading Study (CIRS)
group is investigating the relationship between convergence
insufficiency (CI) and reading. The CIRS Group has com-
pleted the initial steps of developing a CI classification system and
standardized protocols for each of the diagnostic methods.1–3
The
classification system relies on three diagnostic signs: near hetero-
phoria, near positive fusional vergence, and the nearpoint of con-
vergence. Accommodative amplitude was also included as a fourth
diagnostic sign because of the high association between accommo-
dative insufficiency and CI.3
The diagnostic methods selected to
assess whether these signs are present were von Graefe heterophoria
(NH) at 30 cm, von Graefe positive fusional vergence (PFV) at 30 cm,
the nearpoint of convergence (NPC), and the pushup method of
accommodative amplitude (AA). The von Graefe phorometry meth-
ods were selected by the CIRS Group in 1994 because of their com-
mon use in practice, use in the diagnosis of CI, and the availability of
associated normative data studies. Even though these methods are
common and accepted measures used in the diagnosis of CI, we found
no single study that evaluated the intraexaminer and interexaminer
reliability for this group of methods on children.
Reliability (or the commonly used synonym repeatability) re-
flects the amount of error, both random and systematic, inherent
in any measurement.4
Reliability helps determine the confidence
with which we can appraise the presence or absence of functional
The complete Manual of Testing Protocols for the CIRS Group (1996) is
available by contacting Michael W. Rouse, OD, MS, Southern California College
of Optometry, 2575 Yorba Linda Blvd., Fullerton, CA 92831 (or by e-mail:
mrouse@scco.edu).
1040-5488/02/7904-0254/0 VOL. 79, NO. 4, PP. 254–264
OPTOMETRY AND VISION SCIENCE
Copyright © 2002 American Academy of Optometry
Optometry and Vision Science, Vol. 79, No. 4, April 2002
abnormalities, trends of deterioration or of spontaneous improve-
ment, and the effects of therapy. Tests should be repeatable, with
the same examiner at different times (intraexaminer reliability) as
well as with different examiners (interexaminer reliability) obtain-
ing similar results. Reliability is critical information for both the
clinician and researcher who want to obtain an accurate time
course of the patient’s condition.
Reliability of von Graefe Heterophoria
Hirsch and Bing5
reported the reliability of the near von Graefe
method using 38 adult subjects (optometry students) measured by
two examiners on two separate occasions. The exact time interval
between the two sessions was not specified. Hirsch and Bing found
good-to-excellent reliability for both intraexaminer (r = 0.88 for
examiners 1 and 2) and interexaminer (r = 0.94) measurements. They
also reported relatively small intraexaminer mean differences of
2.16 Δ (SD = 1.84) for one examiner, 2.05 Δ (SD = 1.75) for the
other examiner, and a small interexaminer mean difference of 2.00
Δ. Morgan6 reported good intraexaminer reliability (r = 0.81) on
23 optometry students who first served as subject and then examiner
each week over a 5-week period. Rainey et al.7 evaluated interexaminer
repeatability of heterophoria tests on 72 second- and
third-year optometry students. They reported fair-to-good reliability
(r = 0.75) for the near von Graefe method, with a small mean
difference of −0.20 Δ, but clinically large 95% limits of agreement
of (−0.20 ± 6.7 Δ).
Reliability of von Graefe Fusional Vergence
There have been few investigations regarding the reliability of
fusional vergence measurements. The general opinion is that when
fusional vergence tests are repeated on the same patient, the second
value found may be quite different from the first.8
Sheedy9
sug-
gested that “A difference of 10 prism diopters from one fusional
vergence amplitude measurement to another is not unusual unless
rigorous controls are applied.”
Brozek et al.10
examined the PFV at distance on six occasions in
six subjects between the ages of 20 and 30 years. A Risley prism was
held in front of one eye while the subject fixated a spot of light at
6 m. It was not clear whether single or multiple examiners were
used. Brozek et al. found good consistency among the six measures
(rc = 0.81, where rc is a modified intraclass correlation coefficient
[ICC]). The actual ICC (which we calculated from their data) for
these data is 0.72, which still indicates good reliability. Assuming
that the mean difference between two vergence measurements is
zero and using our ICC calculation, we estimated the 95% limits of
agreement for Brozek’s data to be ±5.06 Δ. Penisten et al.11 recently
completed a similar study, but of phoropter-mounted Risley
prism fusional vergence at 4 m and 40 cm on eight young adult
subjects. The authors reported that the distance PFV break and
near PFV blur and recovery were the least repeatable with an estimated
intrasubject SD on replicated measurements of about 2.75
Δ (compared with 3.45 Δ for Brozek et al.) whereas the distance
PFV blur and recovery were slightly more repeatable (SD = 2.00
to 2.25 Δ). The PFV break at near had the smallest SD of 1.5 to 2
Δ.
Feldman et al.12
compared the near PFV taken twice within a
single session (5 min apart) by a single examiner. Subjects were
adults (optometric students, faculty, and staff) with a mean age of
25 years. They reported good-to-excellent within-session reliability
for both PFV break (r = 0.87) and PFV recovery (r = 0.86).
Reliability of Nearpoint of Convergence
Brozek et al.10
also examined the nearpoint of convergence on
six occasions in six subjects between the ages of 20 and 30 years. A
Prentice rule with a white circular target, 2 mm in diameter, was
brought in from the distance of clear vision to the point of binocular
diplopia. It was not clear whether single or multiple examiners
were used. They reported good consistency (rc = 0.79), but the
corresponding ICC was only 0.65, which reflects only a fair level of
reliability.
Reliability of Accommodative Amplitude
Brozek et al.10
also examined the nearpoint of accommodation
on six occasions in six subjects between the ages of 20 and 30 years.
A Prentice rule with a 20/30 line of letters was brought in from a
distance of clear vision to the point of first blur. It was not clear
whether single or multiple examiners were used. Three measurements
were taken and averaged on each of six occasions, and good
consistency (rc = 0.76) was reported, although the ICC was only
0.51, which suggests only fair reliability. The AAs of the six subjects
were fairly homogeneous, which may have artificially lowered the
ICC.
Rosenfield and Cohen13
evaluated the pushup method of ac-
commodative amplitude on five occasions separated by at least
24 h. The maximum separation between the sessions was not re-
ported. It was also not clear whether single or multiple examiners
were used. Thirteen adult subjects (mean age of 24 years) viewed a
single optotype within the smallest line of letters that could be
resolved at a viewing distance of 40 cm, and the target was brought
from clear vision to first sustained blur. They reported that the
range over which 95% of accommodative amplitude values would
be predicted to lie was 10.11 D ± 1.44 (i.e., mean = 10.11 D, SD
= 0.73 D, and 1.96 × 0.73 = 1.44 D). These authors inappropriately
characterized this range as the Bland-Altman 95% limits of
agreement.14
In this case the Bland and Altman limits of agreement
should provide an interval in which 95% of the differences between
two measurements of amplitude, not the actual amplitude values,
would be predicted to lie. From the results of Rosenfield and Co-
hen and by making certain reasonable assumptions, estimated val-
ues of the Bland-Altman limits of agreement can be calculated. In
particular, assuming that there is no bias between two measure-
ments and that the ICC is a moderate 0.70, the 95% limits of
agreement can be estimated to be ±1.11 D.
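The two estimated limits of agreement quoted in this review (for the Brozek et al. data and for the Rosenfield and Cohen data) can be reproduced with a short calculation. The sketch below is illustrative only. It assumes no bias between measurements and that the SD of the difference between two measurements equals SD × sqrt(2 × (1 − ICC)), where SD is the SD of single measurements. The function name is hypothetical, and treating 3.45 as that SD for the Brozek et al. data is an assumption that happens to reproduce the stated value.

```python
import math


def estimated_limits_of_agreement(sd_single, icc, z=1.96):
    """Half-width of the 95% limits of agreement, assuming zero bias.

    Assumed model (not stated in the article): the within-subject variance
    is (1 - ICC) times the total variance, and the difference of two
    measurements has twice the within-subject variance.
    """
    sd_of_differences = sd_single * math.sqrt(2.0 * (1.0 - icc))
    return z * sd_of_differences


# Rosenfield and Cohen: SD = 0.73 D, assumed ICC = 0.70
rosenfield_cohen = round(estimated_limits_of_agreement(0.73, 0.70), 2)

# Brozek et al.: SD taken as 3.45 prism diopters (assumption), ICC = 0.72
brozek = round(estimated_limits_of_agreement(3.45, 0.72), 2)

print(rosenfield_cohen, brozek)  # 1.11 5.06
```

Under these assumptions the half-widths come out at 1.11 D and 5.06 prism diopters, matching the estimates given in the text.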
Chen and O’Leary15
measured accommodative amplitude on
18 adults on two separate occasions (the exact time period was not
reported). A modified pushup method of blur to first detection was
used with a target size of N8 reduced Lea symbols. They reported
a correlation coefficient of 0.99, with a mean difference of 0.07 D
and 95% limits of agreement of 0.07 ± 1.22 D.
The literature review reveals heterogeneity in the reporting of
reliability (or repeatability) study results on binocular measures,
making direct and clear comparisons between studies difficult.
Classification of Convergence Insufficiency—Rouse et al. 255
Many of the above-cited studies inappropriately have used the
Pearson product-moment correlation coefficient (r) as an index of
reliability. Other studies utilized the methods of Bland and Alt-
man14
in reporting the limits of agreement—a range of values in
which it is reasonable to expect the difference between two mea-
surements of the same parameter to occur just by chance. The
studies by Rosenfield and Cohen13
and Penisten et al.11
inappro-
priately used the Bland-Altman “limits of agreement” terminol-
ogy, but in fact reported ranges of values on a single measurement
of a parameter that one might expect from a typical patient. This
indirect view of reliability is difficult to compare with the Bland-
Altman approach, which is a direct view of the distribution of
differences between replicated measurements.
There is a clear need to evaluate intraexaminer and interexam-
iner reliability of common binocular vision measurements in
school-aged children. Although a few studies suggest that there is
good reliability for some measurements (near heterophoria and
accommodative amplitude), it may not be appropriate to apply
adult results to children because binocular function depends on
both examiner instructions and the patient’s subjective response.
Children may be poorer observers, have more trouble understand-
ing instructions or expected endpoints, or be slower to respond
than adults. The purpose of this paper is to evaluate the reliability
of the primary binocular vision measurements used in determining
the diagnosis of CI in school-aged children.
METHODS
This study was approved by the Southern California College of
Optometry institutional review board, and informed consent was
obtained for all subjects in the study.
Study Population
Fifth and sixth graders were screened in a school setting by two
CIRS examiners according to a standard protocol. Screening cri-
teria were as follows:
• either no glasses or had worn glasses or contact lenses ≥1 month
by subject report;
• visual acuity 20/30 or better in each eye with habitual correction
using a Snellen wall chart;
• uncorrected refractive error between −0.50 and +1.00 D
inclusive, and ≤1.00 D of astigmatism in either eye or ≤1.00
D of anisometropia by retinoscopy;
• no strabismus at 3 m or 30 cm by unilateral cover test.
Data Collection
The first 20 consecutive children who passed the vision screen-
ing were used as subjects. Intraexaminer and interexaminer reli-
ability were evaluated for the following measurements:
• von Graefe heterophoria at 30 cm using a single line of 20/30
reduced Snellen target;
• von Graefe PFV and NFV at 30 cm (blur/break/recovery) using
a 2 × 5 block of 20/30 reduced Snellen target;
• NPC (break/recovery) using a single line of 20/30 reduced
Snellen target on an Astron International (ACR/21) Accommodative
Rule;
• monocular accommodative amplitude (Donders’ Pushup
Method) of the right eye only using a single line of 20/30
reduced Snellen target on an Astron International (ACR/21)
Accommodative Rule.
Each examiner took three consecutive measurements on each
subject according to the standard protocol outlined in the Appen-
dix. The exception was vergence measures, where a counterbal-
anced method of negative and then positive fusional vergence was
conducted until three measurements of each were reached. The
examiners performed independent measurements on the same sub-
jects without knowledge of the other examiner’s results. Measure-
ments were taken again by the same examiners on the same subjects
1 week later.
One problem we noted while reviewing the literature was a lack
of detail in the Methods sections. Because there is some variation in
the literature and especially among practitioners as to the exact
procedure for our four measures, we are providing detailed meth-
ods as an Appendix to the manuscript as outlined from the CIRS
Manual of Procedures.1
a
Data Analysis
This study design allows for the consideration of intraexaminer
reliability both within and between sessions as well as within-
session interexaminer reliability for each of the four principal CI
diagnostic variables: NH, PFV break, NPC break, and AA.
Within-session intraexaminer reliability was assessed using both
the within-session ranges and the intraclass correlation coefficient
(ICC). The range for each subject is the difference between the
maximum and minimum of the three within-session measure-
ments. We will report first the sample mean range, which provides
a measure of a typical patient’s within-session difference in mea-
sures; second, the 95th percentile of the ranges (R95), which gives
a practical upper limit on the differences between within-session
measurements. We estimate that 95% of all patients would have a
maximum within-session difference in measures no greater than
this limit.
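The within-session range statistics described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function names and the sample readings are hypothetical, and the 95th percentile is computed by linear interpolation between order statistics, which is an assumption because the article does not state which percentile definition was used.

```python
import math


def percentile(values, p):
    """p-th percentile by linear interpolation between order statistics."""
    xs = sorted(values)
    if len(xs) == 1:
        return xs[0]
    pos = (len(xs) - 1) * p / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def summarize_ranges(measurements):
    """measurements: one list of within-session readings per subject."""
    # Range for each subject = maximum minus minimum of that subject's readings
    ranges = [max(m) - min(m) for m in measurements]
    return {
        "mean_range": sum(ranges) / len(ranges),
        "R95": percentile(ranges, 95),
    }


# Made-up readings (prism diopters) for four hypothetical subjects
readings = [[4, 5, 4], [6, 6, 8], [2, 3, 2], [10, 9, 12]]
summary = summarize_ranges(readings)
print(summary)
```

With only four subjects the R95 is dominated by the largest range; with the 20 subjects used here it lies close to the largest one or two observed ranges.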
The ICC is an overall index of reliability ranging between zero
and one. A value of one indicates perfect repeatability—meaning,
in this case, each subject obtained the same value on the three
within-session measures. A value of zero indicates no reproducibil-
ity of the measurement and, hence, no reliability. The ICC is
commonly interpreted as follows:16
ICC < 0.4 indicates poor reliability;
0.4 < ICC < 0.75 indicates fair-to-good reliability;
ICC > 0.75 indicates good-to-excellent reliability.
The ICC depends on both the between- and within-subject
variability. It will be high when the within-subject variability is low
relative to the total of between- and within-subject variability. It
will be low when within-subject variability is high relative to this
total variability. Hence, a sample that either overestimates or un-
derestimates the population variability may result in a distorted
ICC estimate. Consequently, it is important that the ICC is inter-
preted in conjunction with measures of variability like the range.
Low values of the range will correspond typically to a higher ICC.
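How the ICC depends on between- and within-subject variability can be illustrated with a short calculation. The sketch below uses the one-way random-effects form computed from ANOVA mean squares. That choice is an assumption, since the article does not state which ICC form was used. The function names and the sample data are hypothetical, and the handling of values exactly at 0.4 or 0.75 is also an assumption.

```python
def icc_oneway(data):
    """One-way random-effects ICC for n subjects with k readings each."""
    n, k = len(data), len(data[0])
    grand_mean = sum(sum(row) for row in data) / (n * k)
    subject_means = [sum(row) / k for row in data]
    # Between-subject mean square
    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    # Within-subject mean square
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(data, subject_means) for x in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


def interpret_icc(icc):
    """Map an ICC to the interpretation bands quoted in the text."""
    if icc < 0.4:
        return "poor"
    if icc < 0.75:
        return "fair-to-good"
    return "good-to-excellent"


# Made-up readings for four hypothetical subjects, three readings each
readings = [[4, 5, 4], [6, 6, 8], [2, 3, 2], [10, 9, 12]]
icc = icc_oneway(readings)
print(round(icc, 3), interpret_icc(icc))
```

In this example the subjects differ widely from one another relative to the spread of each subject's own readings, so the ICC is high. Shrinking the differences between subjects while keeping the same within-subject spread would lower it.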
For intraexaminer between-session reliability, each examiner’s ses-
sion 1 and session 2 means were compared. A principal focus is the
distribution of the between-session differences in these means.
Here the methods of Bland and Altman14
are useful in the case
where the distribution of differences is approximately normal and
the mean difference is close to zero. To check these assumptions, a
preliminary Anderson-Darling test for normality17
and a matched-
paired t-test were conducted. If both of these tests were nonsignif-
icant, then we proceeded with the Bland-Altman methodology.
The mean difference, the SD of the differences, the coefficient
of repeatability (COR), which is 1.96 × SD, and the 95% limits of
agreement (mean difference ± COR) were computed. In cases
where either of the preliminary tests was significant, we considered
the distribution of absolute differences. In place of the COR,
we computed the 95th percentile of the absolute differences
(AD95). This 95th percentile provides, as does the COR in the
case of normality and a zero mean, a threshold for differences of
successive measures that would have to be exceeded to conclude
that a true shift in value has likely occurred, as opposed to an
observed difference that can be explained by the natural variability
in the measure. In both cases, we find it useful to compute the
median absolute difference (MAD). It provides a measure of
the typical difference in mean between the two sessions where the
distribution of differences may not be normal. The ICC also was
computed as an index of agreement between the means of the two
sessions. These same methods were also used in assessing within-
session interexaminer reliability.
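The between-session summary statistics described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function names and sample session means are hypothetical, the percentile uses linear interpolation (the article does not specify a definition), and the preliminary normality test and paired t-test are omitted. Those two checks could be run separately, for example with `scipy.stats.anderson` and `scipy.stats.ttest_rel`.

```python
import math
import statistics


def percentile(values, p):
    """p-th percentile by linear interpolation between order statistics."""
    xs = sorted(values)
    if len(xs) == 1:
        return xs[0]
    pos = (len(xs) - 1) * p / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def between_session_summary(session1, session2):
    """Compare per-subject session means (session 2 minus session 1)."""
    diffs = [b - a for a, b in zip(session1, session2)]
    abs_diffs = [abs(d) for d in diffs]
    mean_diff = statistics.mean(diffs)
    cor = 1.96 * statistics.stdev(diffs)  # coefficient of repeatability
    return {
        "mean_diff": mean_diff,
        "COR": cor,
        "limits": (mean_diff - cor, mean_diff + cor),  # 95% limits of agreement
        "MAD": statistics.median(abs_diffs),  # median absolute difference
        "AD95": percentile(abs_diffs, 95),  # used when the COR is not appropriate
    }


# Made-up session means for five hypothetical subjects
s1 = [4, 6, 2, 10, 5]
s2 = [5, 5, 4, 9, 5]
result = between_session_summary(s1, s2)
print(result)
```

As a consistency check on the reported values, the limits shown in Fig. 1 (−6.93 and 6.63) are the examiner 1 near heterophoria COR of 6.78 placed around a mean difference of −0.15. The mean difference is not reported directly and is derived here from the two limits.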
Sample Size
The sample size of 20 was selected based on the available re-
sources to CIRS at the time of testing. In testing the hypothesis of
having just a fair level of reliability, say H0: ICC = 0.50, at the 0.05
level of significance, 23 subjects would be required to have 80%
power to reject it in favor of the alternative of excellent reliability, say
ICC = 0.80. This assumes that a one-tailed test is to be conducted.
For the same test conducted at 70% power, 18 subjects would be
required. Hence, our sample size of 20 renders our test slightly
underpowered, but is also a substantial improvement on several of
the frequently referenced studies (e.g., Brozek et al.,10
Penisten et
al.,11
and Rosenfield and Cohen13
) that were described previously.
RESULTS
Twenty fifth and sixth graders (8 males, 12 females; mean age
10.8 years, SD 0.34 years, range 10.2 to 11.5 years) served as the
subjects.
Near Heterophoria
Table 1 provides a summary of reliability measures for the NH.
The results indicate a high level of intraexaminer reliability, both
within and between sessions. The within-session ICC’s are excellent
(0.95 or higher), and the mean ranges are <2 Δ with R95 ≤4
Δ. Intraexaminer between-session reliability was good for both
examiners (ICC = 0.81) with MAD’s ≤2 Δ. The COR for examiner
1, which was ~7 Δ, and the corresponding limits of agreement
are illustrated in Fig. 1. The interexaminer within-session reliability
was excellent for session 1 (ICC = 0.91) and good for session 2
(ICC = 0.72). The COR’s were <9 Δ, with the MAD’s ≤2.5 Δ.
Positive Fusional Vergence Break
Table 2 shows the summary of reliability measures for PFV
break. The within-session measurements for examiners 1 and 2
indicate different levels of intraexaminer reliability depending on
the testing session. Both examiners had good session 1 reliability
(ICC: 0.76 and 0.71) but excellent session 2 reliability (ICC: 0.94
and 0.93). Consistent with the ICC’s, for examiners 1 and 2,
session 1 mean ranges and R95’s were higher (means: 5.30 Δ and
5.40 Δ) than the corresponding session 2 values (means: 3.80 Δ
and 2.45 Δ). Intraexaminer between-session reliability was fair
(ICC: 0.59 and 0.53) with COR’s of 14.07 Δ and 12 Δ. The 95%
TABLE 1.
Near heterophoria.a

Intraexaminer Within-Session Reliability
                   Session 1     Session 2
Examiner 1
  Mean             4.20 Δ XP     4.05 Δ XP
  Mean range       1.95 Δ        1.65 Δ
  R95              4.00 Δ        4.00 Δ
  ICC              0.95          0.95
Examiner 2
  Mean             4.82 Δ XP     4.23 Δ XP
  Mean range       1.70 Δ        0.90 Δ
  R95              3.00 Δ        2.00 Δ
  ICC              0.97          0.99

Intraexaminer Between-Session Reliability
                   ICC      COR         MAD
  E1S1 vs. E1S2    0.81     6.78 Δ      1.67 Δ
  E2S1 vs. E2S2    0.81     7.64 Δ      2.00 Δ

Interexaminer Within-Session Reliability
                   ICC      COR         MAD
  E1S1 vs. E2S1    0.91     4.86 Δ      1.33 Δ
  E1S2 vs. E2S2    0.72     8.86 Δ      2.50 Δ

a For intraexaminer within-session reliability, mean is the average
of the 60 (20 patients times 3 measurements per patient)
within-session measures, mean range is the average of the 20
individual patient ranges, R95 is the 95th percentile of those 20
ranges, and ICC is the intraclass correlation coefficient. For both
the intraexaminer between-session and interexaminer within-session
reliability, ICC is the intraclass correlation coefficient for the
session means, COR is the coefficient of repeatability, which is
1.96 times the SD of the session differences, and MAD is the
median absolute difference of those session differences. In cases
where the COR value is asterisked, the 95th percentile of the
absolute differences is being substituted. E1S1, examiner 1/session 1;
E2S2, examiner 2/session 2.
limits of agreement for examiner 2 are illustrated in Fig. 2. Inter-
examiner within-session reliability was also fair. The ICC’s were
0.64 (session 1) and 0.53 (session 2), with COR’s of 10.30 Δ or
higher.
NPC Break
The summary of reliability measures for NPC break is shown in
Table 3. In all three comparisons, NPC break has excellent reliability,
with ICC’s no lower than 0.86. Intraexaminer within-session
reliability is especially high, with all ICC’s ≥0.94 and mean
ranges ≤1.25 cm. The intraexaminer between-session reliability
was excellent (ICC: 0.92 and 0.89) with MAD’s of ~1 cm. The
limits of agreement for examiner 2 are illustrated in Fig. 3. This
plot also shows a fairly strong positive trend (r = 0.78) between the
differences between the two measures and their means, indicating
FIGURE 1.
Examiner 1 between-session reliability on near heterophoria; the plot of
the difference between the two session averages (session 2 − session 1) vs.
the mean of those two averages. The lines at L = −6.93 and U = 6.63
show, respectively, the lower and upper 95% limits of agreement.
TABLE 2.
Positive fusional vergence break.a

Intraexaminer Within-Session Reliability
                   Session 1     Session 2
Examiner 1
  Mean             22.10 Δ       24.10 Δ
  Mean range       5.30 Δ        3.80 Δ
  R95              8.00 Δ        8.00 Δ
  ICC              0.76          0.94
Examiner 2
  Mean             22.78 Δ       19.06 Δ
  Mean range       5.40 Δ        2.45 Δ
  R95              12.00 Δ       6.00 Δ
  ICC              0.71          0.93

Intraexaminer Between-Session Reliability
                   ICC      COR         MAD
  E1S1 vs. E1S2    0.59     14.07 Δ     3.67 Δ
  E2S1 vs. E2S2    0.53     12.00 Δ*    4.00 Δ

Interexaminer Within-Session Reliability
                   ICC      COR         MAD
  E1S1 vs. E2S1    0.64     10.30 Δ     3.33 Δ
  E1S2 vs. E2S2    0.53     16.00 Δ*    5.67 Δ

a See notes for Table 1.
FIGURE 2.
Examiner 2 between-session reliability on PFV break; the plot of the
difference between the two session averages (session 2 − session 1) vs. the
mean of those two averages. The lines at L = −12.00 and U = 12.00
show, respectively, the lower and upper empirical 95% limits of
agreement.
TABLE 3.
Nearpoint of convergence break.a

Intraexaminer Within-Session Reliability
                   Session 1     Session 2
Examiner 1
  Mean             5.45 cm       5.72 cm
  Mean range       1.10 cm       0.80 cm
  R95              2.00 cm       2.00 cm
  ICC              0.98          0.98
Examiner 2
  Mean             4.54 cm       5.68 cm
  Mean range       0.78 cm       1.25 cm
  R95              2.00 cm       3.00 cm
  ICC              0.98          0.94

Intraexaminer Between-Session Reliability
                   ICC      COR         MAD
  E1S1 vs. E1S2    0.92     5.33 cm*    1.17 cm
  E2S1 vs. E2S2    0.89     5.00 cm*    1.00 cm

Interexaminer Within-Session Reliability
                   ICC      COR         MAD
  E1S1 vs. E2S1    0.86     4.43 cm     1.68 cm
  E1S2 vs. E2S2    0.97     2.55 cm     0.67 cm

a See notes for Table 1.
a tendency for the difference to increase with the NPC break. A
similar pattern (not shown) is evident for examiner 1, but two
highly influential outliers lower the correlation (r = 0.01, but r =
0.58 with the outliers excluded). The interexaminer within-session
reliability was also excellent, with smaller COR’s than the intraex-
aminer between-session reliability.
Accommodative Amplitude
Table 4 is a summary of the reliability measures for AA. Intraexaminer
within-session reliability is excellent with ICC’s ≥0.88,
mean ranges ≤2.29 D, and R95 of 5.00 D in all cases. Intraexaminer
between-session reliability differed by examiner (ICC: 0.82 vs. 0.69)
with MAD’s of ~2 D. The limits of agreement for examiner
1 are illustrated in Fig. 4. The interexaminer within-session reliability
was good (ICC: 0.81 and 0.85), with slightly higher MAD’s and
smaller COR’s than the intraexaminer between-session reliability.
Positive Fusional Vergence Recovery and
NPC Recovery
Although PFV recovery and NPC recovery are not used in our
diagnostic classification system, they typically are measured in a
clinical assessment of binocular vision. In Tables 5 and 6, the
summary of reliability information is provided for these binocular
measures.
DISCUSSION
Our multifaceted data analysis approach provides different per-
spectives on the issue of reliability or repeatability. The ICC is a
reliability index ranging from zero to one regardless of the units of
the measure under consideration. It readily allows for direct com-
parison of reliability between different measurements. The ICC
takes into account intrasubject and intersubject variability, but it
does not directly convey the level of intrasubject variability. For
example, the ICC was 0.81 for the examiner 1 between-session reliability
of NH (Table 1). This means that 81% of the variability in these
measurements is due to intersubject variability and only 19% is
due to intrasubject variability. It is this intrasubject variability that
is more clinically relevant to the practitioner. A more direct clinical
summary of these data is provided by the MAD, which is 1.67 Δ,
and the COR, which is 6.78 Δ. That is, NH measurements
taken 1 week apart would typically differ by <2 Δ, but it would be
FIGURE 3.
Examiner 2 between-session reliability on NPC break; the plot of the
difference between the two session averages (session 2 − session 1) vs. the
mean of those two averages. The lines at L = −5.00 and U = 5.00 show,
respectively, the lower and upper empirical 95% limits of agreement.
TABLE 4.
Accommodative amplitude.a

Intraexaminer Within-Session Reliability
                   Session 1     Session 2
Examiner 1
  Mean             14.18 D       15.46 D
  Mean range       2.04 D        2.29 D
  R95              5.00 D        5.00 D
  ICC              0.88          0.90
Examiner 2
  Mean             14.41 D       15.17 D
  Mean range       2.25 D        1.70 D
  R95              5.00 D        5.00 D
  ICC              0.90          0.95

Intraexaminer Between-Session Reliability
                   ICC      COR         MAD
  E1S1 vs. E1S2    0.82     5.32 D      1.63 D
  E2S1 vs. E2S2    0.69     10.48 D     2.06 D

Interexaminer Within-Session Reliability
                   ICC      COR         MAD
  E1S1 vs. E2S1    0.81     4.13 D*     1.82 D
  E1S2 vs. E2S2    0.85     6.86 D      2.58 D

a See notes for Table 1.
FIGURE 4.
Examiner 1 between-session reliability on accommodative amplitude; the
plot of the difference between the two session averages (session 2 −
session 1) vs. the mean of those two averages. The lines at L = −4.04 and
U = 6.60 show, respectively, the lower and upper 95% limits of
agreement.
possible for the difference to be as large as ~7 Δ. Fig. 1 shows these
95% limits of agreement. Any finding outside this range has only a
5% probability of being due to measurement error alone. We feel,
as do others,18 that the Bland-Altman approach gives a more relevant
clinical picture of measurement error because it details the nature
of the intrasubject variability. However, Bland and Altman have
acknowledged the appropriateness of the ICC in reliability stud-
ies.19
In addition, most of the older studies have used the standard
product moment correlation coefficient (r) to evaluate reliability.
The ICC is preferable because it is a measure of agreement between
measures, whereas r is a measure of association. Because the ICC is
usually close to r and always less than or equal to it, we are still able
to draw comparisons between this and previous studies.
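The distinction between agreement and association can be shown with a small made-up example in which the second measurement is always 3 units higher than the first. The sketch below is illustrative only: the data and function names are hypothetical, and the one-way random-effects ICC is assumed because the article does not state which ICC form was used.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def icc_oneway(data):
    """One-way random-effects ICC for n subjects with k readings each."""
    n, k = len(data), len(data[0])
    grand_mean = sum(sum(row) for row in data) / (n * k)
    subject_means = [sum(row) / k for row in data]
    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    ms_within = sum(
        (v - m) ** 2 for row, m in zip(data, subject_means) for v in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical data: session 2 is systematically 3 units above session 1
session1 = [2, 4, 6, 8, 10]
session2 = [v + 3 for v in session1]

r = pearson_r(session1, session2)
icc = icc_oneway([list(pair) for pair in zip(session1, session2)])
print(round(r, 3), round(icc, 3))  # 1.0 0.633
```

The constant offset leaves r at 1.0 because the two sessions are perfectly linearly related, whereas the ICC drops to about 0.63 because the two sessions do not agree in value.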
Near Heterophoria
The intraexaminer within-session reliability was found to be
excellent (0.95 to 0.99). Hence, repeated measurements within a
single testing session are very repeatable in children. The previous
adult studies measuring intraexaminer reliability ranged from 0.81
to 0.88, with a MAD of ~2 Δ.5, 6
Our intraexaminer between-session
reliability results (0.81) using the ICC are similar to these
previous studies. Therefore, the clinician can expect typical differences
of ~2 Δ (MAD) but can measure differences as large as 6 to 7
Δ (COR). Our interexaminer reliability was also similar (0.91 and
0.72) to previous adult studies, which ranged from 0.75 to 0.94.5, 7
In general, children appear to respond as reliably as adults on this
near heterophoria measure. However, most clinicians would prob-
ably be uncomfortable with the large COR values for intraexam-
iner reliability between sessions.
Positive Fusional Vergence
Intraexaminer within-session reliability varied between the two
testing sessions, with session 1 being lower (0.76 and 0.71) and
session 2 higher (0.94 and 0.93) than that reported by Feldman et
al.12
(0.87). The mean ranges were higher for session 1, which
resulted in the lower ICC’s. The initial testing session may have
served as training, and the children may have learned to respond
better to the PFV on the second testing session where the mean
range and R95 are smaller.
Our findings do indicate that the intraexaminer between-ses-
sion and interexaminer within-session reliability is at best fair. The
intraexaminer between-session reliability results are lower (0.59
and 0.53) than Brozek et al.10
(0.72). However, Brozek et al. (and
Penisten et al.11
) took their measurements on consecutive days,
evaluated distance PFV, and used adult subjects, which makes
direct comparison difficult. One might expect PFV at distance to
be more stable, thus more repeatable than PFV at near where the
accommodative-convergence relationship is more complex, al-
though Penisten et al. found intrasubject variability to be lowest
with the near PFV break. Based on our results, the clinician can
expect typical differences of 3 to 4 Δ, but can measure differences
as large as 12 Δ on follow-up visits. The large differences may cause
problems with accurately classifying patients as CI and monitoring
treatment outcomes. It may also explain why some patients appear
to have CI, based for example on Sheard’s criteria, and are
asymptomatic or vice versa. Additionally, when evaluating the effects of
vision therapy, a single examiner would need a change of 12 Δ,
whereas different examiners might need changes as large as 10 to
16 Δ to be confident that the change was real and not the result of
measurement variability.

TABLE 5.
Positive fusional vergence recovery.a

Intraexaminer Within-Session Reliability
                 Session 1   Session 2
Examiner 1
  Mean           6.72 Δ      9.08 Δ
  Mean range     5.70 Δ      4.75 Δ
  R95            13.00 Δ     8.00 Δ
  ICC            0.71        0.88
Examiner 2
  Mean           6.47 Δ      5.78 Δ
  Mean range     5.15 Δ      3.55 Δ
  R95            13.00 Δ     7.00 Δ
  ICC            0.68        0.90

Intraexaminer Between-Session Reliability
                 ICC     COR        MAD
E1S1 vs. E1S2    0.27    16.54 Δ    3.00 Δ
E2S1 vs. E2S2    0.50    12.15 Δ    4.00 Δ

Interexaminer Within-Session Reliability
                 ICC     COR        MAD
E1S1 vs. E2S1    0.57    10.62 Δ    4.17 Δ
E1S2 vs. E2S2    0.65    10.00 Δ*   4.17 Δ
a See notes for Table 1.

TABLE 6.
Nearpoint of convergence recovery.a

Intraexaminer Within-Session Reliability
                 Session 1   Session 2
Examiner 1
  Mean           7.88 cm     8.33 cm
  Mean range     1.20 cm     1.10 cm
  R95            3.00 cm     2.00 cm
  ICC            0.97        0.98
Examiner 2
  Mean           6.03 cm     7.38 cm
  Mean range     1.03 cm     1.25 cm
  R95            2.00 cm     3.00 cm
  ICC            0.97        0.97

Intraexaminer Between-Session Reliability
                 ICC     COR        MAD
E1S1 vs. E1S2    0.90    5.15 cm    1.00 cm
E2S1 vs. E2S2    0.84    7.33 cm*   0.92 cm

Interexaminer Within-Session Reliability
                 ICC     COR        MAD
E1S1 vs. E2S1    0.80    6.00 cm*   2.17 cm
E1S2 vs. E2S2    0.96    2.70 cm*   1.00 cm
a See notes for Table 1.

260 Classification of Convergence Insufficiency—Rouse et al.
Optometry and Vision Science, Vol. 79, No. 4, April 2002
The large PFV break differences could be due to children having
more difficulty with the psychophysical aspects of this test. Chil-
dren may be poorer observers, have more trouble understanding
the instructions or expected endpoints, or be slower to respond
than adults. Presently we are evaluating the PFV in adults to ad-
dress this issue. If the large break differences are not due to subject
age, then the differences may be related to the fusional vergence
system being inherently variable over time.
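The 12 prism diopter figure quoted in the text for a single examiner can be read as a decision threshold for follow-up visits. The sketch below is one possible reading, assuming that a change counts as real only when its magnitude meets or exceeds the COR; the function name and the sample PFV break values are hypothetical.

```python
def change_is_beyond_variability(before, after, cor):
    """True if |after - before| meets or exceeds the coefficient of repeatability."""
    return abs(after - before) >= cor


# Same-examiner PFV break threshold in prism diopters, as quoted in the text.
PFV_COR_SAME_EXAMINER = 12.0

# Hypothetical pre- and post-therapy PFV break values for two patients.
small_gain = change_is_beyond_variability(10.0, 18.0, PFV_COR_SAME_EXAMINER)
large_gain = change_is_beyond_variability(10.0, 24.0, PFV_COR_SAME_EXAMINER)
print(small_gain, large_gain)
```

Under this reading, an 8 prism diopter gain could still be measurement variability, whereas a 14 prism diopter gain could not.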
Nearpoint of Convergence
The intraexaminer within-session reliability was found to be
excellent (0.95 to 0.99) for the NPC break. Hence, the measure-
ments within a single testing session are very repeatable in children.
Regarding intraexaminer between-session reliability, the only
previous study10 with adult subjects reported fair intraexaminer
reliability (ICC: 0.65). We found higher ICC values (0.92 and 0.89),
suggesting that the NPC break is a reliable measure over time in
children. The clinician can expect typical differences of ~1 cm, but
differences as large as ~5 cm may be measured. One caveat is that
patients with receded NPCs (>6 cm) will generally have larger
differences when tested over time. The results for the NPC
recovery showed similarly high ICCs, suggesting it is also a reliable
measure.
Accommodative Amplitude
The intraexaminer within-session reliability was found to be
excellent (0.88 to 0.95) for the AA. Hence, the measurements
within a single testing session are very repeatable in children. Our
intraexaminer between-session results in children are consistent
with the previous study by Chen and O’Leary15 in adults showing
excellent (r = 0.99) reliability. Our results show a higher level of
reliability than Brozek’s adult study in which the ICC was 0.51.10
Based on our results, a clinician can expect typical within-session
differences of ~2 D, but differences as large as ~5 D may be
measured. These differences are difficult to compare with the
results reported in the two most often quoted adult studies.10, 13
These studies reported the typical patient would have a SD of
about 0.75 D and, hence, a range of values of about ±1.5 D. From
this, Rosenfield and Cohen13 suggested that the typical difference
between two AA measurements on the same subject would be
within 1.5 D of each other. This conclusion is erroneous. The
±1.5 D range understates the variability of the AA measure: it
implies that the typical patient could actually have AA measurements
spanning ~3 D. Thus, two AA measurements on such
a patient could readily differ by more than 1.5 D (and by up to 3 D)
and still be within the bounds of the natural variability of
this patient. This 3 D difference for adults is lower than this study’s
5 D difference for children, which indicates adults tend to have
better within-session reliability than children.
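The arithmetic behind this argument can be written out. The symbols below are introduced here for illustration only: σ is the typical patient's SD and x̄ the patient's mean AA, and the ±1.5 D figure is taken to be about two SDs on either side of that mean.

```latex
\sigma \approx 0.75\ \mathrm{D}
\quad\Longrightarrow\quad
\bar{x} \pm 2\sigma \approx \bar{x} \pm 1.5\ \mathrm{D}
\quad\Longrightarrow\quad
(\bar{x} + 2\sigma) - (\bar{x} - 2\sigma) = 4\sigma \approx 3\ \mathrm{D}
```

Two measurements that fall at opposite ends of this interval therefore differ by about 3 D, not 1.5 D.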
In both of these previous adult studies, the intervening time
between measurements varies from several hours to a day or more,
whereas in our study, the between-session measurements were
taken 1 week apart. Our within-session reliability is more compa-
rable to the short-term repeatability results of these adult studies.
There are no comparable studies for estimating the between-session
differences that this study found. Based on our results, a
clinician can expect typical between-session differences of ~1 to 2 D
(Table 7), but differences as large as ~5 to 10 D may be measured. The large
COR values for intraexaminer reliability between sessions are be-
yond the comfort level of most clinicians.
CONCLUSIONS
The study and analysis of measurement reliability is extensive
and intricate, and different authors have divergent views on which
methods are most appropriate. We elected to use a multifaceted
approach in presenting our reliability results because there is no
one accepted mode of analysis, and each method gives a different
and useful perspective on the problem.
The ICC, perhaps the most common index of reliability in the
health science literature,4, 20 provides a method to compare the
reliability of tests that have different units of measurement (in our
case, tests using prism diopters, lens diopters, and centimeters).
We can view the relative reliability of the group of tests typically
used in evaluating the syndrome diagnosis of CI. The ICC also
allows us to compare our results with older literature that may have
only used the correlation coefficient in their analysis. Three of the
four measures (NH, NPC, and AA) often used in the classification
of CI generally have good-to-excellent intraexaminer and interex-
aminer reliability based on the ICC evaluation. The PFV break was
found to have only fair intraexaminer and interexaminer reliability.
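For readers who want to see how an ICC is obtained from repeated measurements, the following is a minimal sketch. The ICC model is not stated in this section, so the one-way random-effects, single-measure form is used here purely as an assumption; the function name and the sample data are hypothetical.

```python
import statistics


def icc_oneway(ratings):
    """One-way random-effects, single-measure ICC (an assumed model).

    ratings: one list per subject, each holding the same number k of
    repeated measurements.
    """
    n = len(ratings)
    k = len(ratings[0])
    all_values = [x for row in ratings for x in row]
    grand_mean = statistics.fmean(all_values)
    subject_means = [statistics.fmean(row) for row in ratings]
    # Mean squares from a one-way analysis of variance.
    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, subject_means) for x in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical NPC break values (cm): four children, three readings each.
npc_readings = [[6, 7, 6], [10, 11, 10], [3, 3, 4], [8, 8, 9]]
icc = icc_oneway(npc_readings)
print(f"ICC = {icc:.2f}")
```

Because the ICC compares between-subject variance with within-subject variance, it is unitless, which is what allows tests recorded in different units to be compared.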
A difficulty with the ICC is that its interpretation is problematic
for the clinician. Knowing that the ICC for a test is 0.90 does not
help the clinician with the question of “how much difference
should I reasonably expect between two measurements of that
same test?” The Bland-Altman approach provides a more clinician-
friendly view of reliability. We have presented both the typical
difference between measurements (mean range for within session
and MAD for between sessions) and what the clinician may think
of as the worst-case difference, or as we have described in the results
section, “the difference can be as large as” (R95 within session or
the COR between sessions).
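The within-session summaries can be sketched in the same spirit. The code below assumes that the range for each child is the largest minus the smallest of the three consecutive readings, that the mean range is the average of those ranges, and that R95 is the value at or below which about 95% of the ranges fall (a nearest-rank percentile). These definitions are assumptions made for illustration, as are the function name and the sample data.

```python
import math
import statistics


def within_session_summary(triplets):
    """Return (mean range, R95) for repeated readings taken in one session."""
    ranges = sorted(max(t) - min(t) for t in triplets)
    mean_range = statistics.fmean(ranges)
    # Nearest-rank 95th percentile of the per-child ranges (assumed R95).
    rank = math.ceil(0.95 * len(ranges))
    r95 = ranges[rank - 1]
    return mean_range, r95


# Hypothetical NPC break readings (cm): five children, three readings each.
session = [[6, 7, 6], [10, 12, 10], [3, 3, 4], [8, 8, 9], [5, 9, 6]]
mean_range, r95 = within_session_summary(session)
print(f"mean range = {mean_range:.1f} cm, R95 = {r95} cm")
```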
We feel that the clinician who routinely takes these binocular
measurements on children will find the typical differences within
session and between sessions to be in line with what they generally
expect. See the summary in Table 7. The worst-case difference will
be greater, and in some cases much greater (two to five times the
typical differences) than those differences expected by that same
clinician. These “worst-case” differences represent the largest
difference between measurements that a clinician would observe
in nearly all patients.
It may be unfair to look at each new patient in light of the
worst-case difference scenario. Most patients are close to typical,
but of course, a few problematic patients are not! We suggest
viewing a patient using the typical difference in most cases and
asking the following question: would the diagnosis be altered if the
observed measurement changed by as much as the typical differ-
ence? What if it changed by as much as the worst-case difference? It
is especially important to consider the worst-case differences when
there are inconsistencies in the case findings; for example, when a
patient’s clinical findings support the diagnosis of CI, but the
patient has few or no symptoms, or when a patient presents with
CI-type symptoms, but has clinical findings that appear within
acceptable limits.
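The two questions posed above can be turned into a simple check. The sketch below uses the NPC between-session values from Table 7 (typical 1 cm, worst case 5 cm) and, purely for illustration, treats an NPC break greater than 6 cm as receded. The cutoff, the function names, and the measured value are assumptions, not a diagnostic rule from this study.

```python
def is_receded(npc_cm, cutoff_cm=6.0):
    """Illustrative classification: NPC break beyond the cutoff is receded."""
    return npc_cm > cutoff_cm


def classification_could_change(npc_cm, difference_cm, cutoff_cm=6.0):
    """True if shifting the reading by +/- difference_cm flips the classification."""
    baseline = is_receded(npc_cm, cutoff_cm)
    return any(
        is_receded(npc_cm + delta, cutoff_cm) != baseline
        for delta in (-difference_cm, difference_cm)
    )


NPC_TYPICAL_CM = 1.0  # Table 7, between-session typical difference
NPC_WORST_CM = 5.0    # Table 7, between-session worst-case difference

measured = 8.0  # hypothetical NPC break in cm
flips_typical = classification_could_change(measured, NPC_TYPICAL_CM)
flips_worst = classification_could_change(measured, NPC_WORST_CM)
print(flips_typical, flips_worst)
```

A reading whose classification survives the typical difference but not the worst-case difference is the kind of finding that deserves a second look when symptoms and signs disagree.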
Unfortunately, the clinician treating and monitoring this
condition will need to use the worst-case differences to feel
confident that the changes being seen are not just natural
variation in the between-session measurements. The large
potential test-retest differences found could complicate clinical
decision-making with regard to diagnosis and treatment. Changes in the
testing protocol used in this study, as well as other PFV procedures,
should be investigated in an attempt to improve both
intraexaminer and interexaminer reliability.
APPENDIX
von Graefe Near Heterophoria Test
A table stand with phoropter (B and L style) was used. Risley
prisms were marked in 2 Δ increments from 0 to 30 Δ. The fixation
target was a vertical column of 20/30 reduced Snellen letters.
Illumination was provided by a floor-stand lamp with 100- to
150-ft-cd/m2 on the card face. The patient’s interpupillary distance was
taken by pupillometer and dialed into the phoropter. The subject’s
habitual distance refractive correction was placed in the phoropter
if the subject was wearing glasses.
Before testing, the subject was shown a two-picture demonstra-
tion of the test responses. The first picture showed the initial pre-
sentation. The subject was told, “First you will see two lines of
letters with one being higher and to the right.” The second picture
showed vernier alignment of the two lines. The subject was told,
“The upper line of letters will flash on and off several times. Each
time they come on tell me whether they are to the right, to the left,
or directly above the lower letters as shown here.”
The examiner then introduced 4 to 6 Δ base-up over the left eye
(OS) for dissociation and 12 Δ base-in over the right eye (OD) for
biasing. The subject was asked to “Please read the line of letters”
first with the OS and then with the OD. The occluder was
removed, and the subject was asked, “Do you see two lines of letters
with one being higher and to the right?” (the subject’s right shoulder
was tapped to reinforce the concept of “right” direction). If only
one target was seen, the prisms were readjusted. If the subject was
still unable to see the two targets, the suppressing eye was
determined and testing was stopped.
The subject was then instructed to “Keep the letters as clear as
you can. The upper letters will flash on and off several times. Each
time they come on tell me whether they are to the right, to the left,
or directly above the lower letters.” The OD was occluded, and then
the OD was uncovered and re-covered (~1 s flash exposure time). If
the upper target was seen to the right, the OD prism was reduced by
4 Δ and the flashing was repeated until the upper target was first
seen to the left. The prism was then changed in 2 Δ increments until the
subject reported alignment. The prism amount and direction of
the deviation (eso, exo, and ortho) were recorded. The procedure
was repeated two additional times, and results were recorded.
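The bracketing procedure described above can be sketched as a small simulation. Everything in it beyond the 12 Δ starting prism and the 4 Δ and 2 Δ step sizes is an assumption made for illustration: the simulated subject, the tolerance for reporting alignment, the sign convention (positive values are base-in, negative values base-out), and the cap on the number of flashes.

```python
def make_subject(alignment_prism, tolerance=1):
    """Hypothetical subject who reports where the upper line appears."""
    def respond(prism):
        if prism > alignment_prism + tolerance:
            return "right"
        if prism < alignment_prism - tolerance:
            return "left"
        return "aligned"
    return respond


def bracket_alignment(respond, start=12, coarse=4, fine=2, max_flashes=40):
    """Return the prism at which alignment is reported, or None if not reached."""
    prism = start
    flashes = 0
    # Coarse phase: reduce by 4 while the upper line is reported to the right.
    while flashes < max_flashes and respond(prism) == "right":
        prism -= coarse
        flashes += 1
    # Fine phase: move in steps of 2 toward alignment.
    while flashes < max_flashes:
        answer = respond(prism)
        flashes += 1
        if answer == "aligned":
            return prism
        prism += fine if answer == "left" else -fine
    return None


exo_result = bracket_alignment(make_subject(6))
eso_result = bracket_alignment(make_subject(-3))
print(exo_result, eso_result)
```

Because the prism is changed in fixed steps, the recorded value can sit up to one step away from the simulated subject's true alignment point, which is one source of the measurement differences discussed in the main text.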
von Graefe Fusional Vergence Tests
The initial setup for the von Graefe fusional vergence measures
was the same as for the von Graefe heterophoria measurement
except the fixation target was a two letters by five letters block of
20/30 reduced Snellen letters.
Before testing, the subject was shown a three-picture demon-
stration of the test responses. The first picture had the block of
letters in focus. The subject was told, “This is what I mean by
clear.” The second picture had the block of letters photographically
blurred to a level of 0.50 to 0.75 D to simulate the first sustained
blur point. The subject was told, “This is what I mean by blurred.”
The third picture had two blocks of letters, one printed on paper and
one overlaid on a plastic sheet. The subject was told, “This is what
I mean by double (as the examiner slowly slides the targets apart).
I will separate them a little more, like this... (as the examiner slides the
targets so they are distinctly separate). Then tell me when the two
blocks come together into one block, like this... (as the examiner
slides the two targets together until they are one).”
The subject was positioned behind the phoropter and was in-
structed to “Read aloud the letters in the top row of the block of
letters you see in front of you.” Then the subject was instructed, “I
want you to say ‘now’ or ‘blur’ when the letters become blurred as I
showed you in the picture.” Base-in prism was added at the
approximate rate of 4 Δ/s.
Once blur was reported or if no blur was reported, the subject
was instructed, “I want you to say ‘now’ or ‘double’ when the block
of letters are seen as double as I showed you in the last picture.”
Once again, base-in prism was added at the approximate rate of 4
Δ/s until double vision was reported. Once diplopia was reported,
the subject was instructed, “Now I want you to tell me when you
see only one block of letters. The letters may either be clear or
blurred, but they must be single.” Base-in prism was reduced at the
approximate rate of 4 Δ/s until single vision was reported. The results
were recorded in prism diopters for the blur, break, and recovery
findings. A 20-s “wait” period was used between fusional vergence
measurements. PFV was measured next using base-out prisms with
instructions as above. In alternate order, NFV and then PFV were
repeated two additional times and recorded.
TABLE 7.
Summary of the typical differences and worst-case differences that
clinicians might expect for the four binocular measures.a

             Typical Difference                 Worst-Case Difference
             Within Session   Between Session   Within Session   Between Session
NH           1–2 Δ            1–2 Δ             2–4 Δ            7–8 Δ
PFV-break    2–5 Δ            4 Δ               6–12 Δ           12–14 Δ
NPC          1 cm             1 cm              2–3 cm           5 cm
AA           2 D              1–2 D             5 D              5–10 D
a NH, near heterophoria; PFV-break, positive fusional vergence
break point; NPC, nearpoint of convergence; AA, accommodative
amplitude. Worst-case difference (within session = R95;
between session = COR or LoA); typical difference (within
session = mean range; between session = MAD).
Nearpoint of Convergence Test
The accommodative rule was used with a fixation target consist-
ing of a single vertical column of five 20/30 reduced Snellen letters.
Subjects wore their habitual spectacle or contact lens prescription
during the testing.
The subject was given the following instructions: “This proce-
dure is designed to measure your ability to converge your eyes; that
is, turn your eyes in toward your nose. Look directly at the line of
letters on this card (examiner pointed to target card on accommo-
dative rule) with both eyes open as the card is moved toward you.
The image may appear to blur. That is okay. However, if you see
the letter double (that is, split into two), say ‘two.’ I will then pull
the card back. If you see the two images join into a single image
again, say ‘one.’ This procedure will be repeated three times.”
The examiner sat slightly to the side of the subject and viewed
the subject from a slightly elevated position. The examiner held the
accommodative rule in the horizontal position with the ruler po-
sitioned against the middle of the subject’s forehead approximately
1 cm above the eyebrow line and tested convergence along the
anterior-polar or z axis. The target card was started at approxi-
mately 20 cm. The target card was moved along the accommoda-
tive rule in a smooth linear manner at the rate of approximately 1
cm/s toward the subject. The slider was stopped when the subject’s
eyes were observed to fail to converge or when the subject reported
diplopia. The target card was stopped at this point, and the subject
was asked if the images remained double. If the images remained
double or the subject remained strabismic relative to the target, the
centimeter position of the card on the accommodative rule was
recorded. If the images became fused into one or the subject was
observed to be bifixating the target, the target was moved toward
the subject’s face until the eyes were objectively observed to fail to
converge or the subject reported diplopia. This cycle was contin-
ued until the subject remained diplopic or strabismic relative to the
target.
The target was then moved away from the subject at approxi-
mately 1 cm/s until the eyes were observed to reestablish bifixation
or the subject reported the two images fused into a single image.
The centimeter position of the card on the accommodative rule
was recorded. The target card was moved back to the starting
position of 20 cm, and the subject was given a 10-s rest period
before starting the next measurement. The procedure was repeated
two additional times and recorded.
Pushup Method for Monocular Accommodative
Amplitude
The accommodative rule was used with a fixation target consist-
ing of a single vertical column of five 20/30 reduced Snellen letters.
Subjects wore their habitual spectacle or contact lens prescription
during the testing.
The subject was given the following instructions: “This proce-
dure is designed to measure your ability to focus your eyes on a
target that is slowly moved closer to your eyes. I am going to cover
your left eye with this patch. Look directly at the line of letters on
this card (pointing to the target card on the accommodative rule) as
the card is moved toward you. Tell me when the target first be-
comes blurry. This is what I mean by blurry (demo card shown).
This procedure will be repeated three times.”
The examiner sat in front of the subject and viewed the
subject from a slightly elevated position. The examiner held the
accommodative rule in the horizontal position with the ruler
positioned against the subject’s forehead (approximately 1 cm
above the eyebrow line) and above the right eye. The examiner
tested the accommodative amplitude of the right eye only. The
target card was started at approximately 20 cm. The target card
was moved along the accommodative rule in a smooth linear
manner at the rate of approximately 1 cm/s toward the subject’s
face. The target was stopped when the subject first reported that
the letters were blurry. The subject was then asked if the letters
remained blurry or became clear. If the letters remained blurry,
the centimeter position of the target on the accommodative rule
was recorded. If the print became clear, the target was moved
toward the subject until the print first became blurry. This cycle
was continued until the subject reported a sustained blur. The
target was moved back to the starting position of approximately
20 cm, and the subject was given a 10-s rest period before
starting the next measurement. The procedure was repeated two
additional times and recorded.
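The procedure records a position in centimeters, whereas AA is reported in diopters. The conversion is not described in this section; the sketch below assumes the standard reciprocal relationship (diopters = 100 / distance in cm) and ignores any adjustment for the habitual correction or for the reference plane of the accommodative rule. The function name and the sample readings are hypothetical.

```python
def accommodative_amplitude_diopters(blur_distance_cm):
    """Convert a sustained-blur distance in cm to diopters (D = 100 / cm)."""
    if blur_distance_cm <= 0:
        raise ValueError("blur distance must be positive")
    return 100.0 / blur_distance_cm


# Hypothetical pushup readings (cm) from three consecutive measurements.
readings_cm = [8.0, 10.0, 12.5]
amplitudes = [accommodative_amplitude_diopters(cm) for cm in readings_cm]
print(amplitudes)
```

Because the relationship is reciprocal, a 1 cm difference at a near blur point corresponds to a larger dioptric difference than the same 1 cm at a farther blur point, so high amplitudes are inherently more sensitive to small positioning errors.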
ACKNOWLEDGMENTS
The Convergence Insufficiency and Reading Study (CIRS) group is Michael
W. Rouse, Leslie Hyman, Mohamed Hussein, Harold Solan, Eric Borsting,
Susan Cotter, David Grisham, Leonard Press, and Mitchell Scheiman.
Received June 26, 2000; revision received December 20, 2001.
REFERENCES
1. Rouse MW, Borsting E, Hyman L, Hussein M, the Convergence
Insufficiency and Reading Study (CIRS) group. Pilot study to evalu-
ate convergence insufficiency in a school-aged population. Optom
Vis Sci 1995;72(Suppl):218.
2. Rouse MW, Hyman L, Hussein M, Solan H, the Convergence Insuf-
ficiency and Reading Study (CIRS) Group. Frequency of conver-
gence insufficiency in optometry clinic settings. Optom Vis Sci 1998;
75:88–96.
3. Rouse MW, Borsting E, Hyman L, Hussein M, Cotter SA, Flynn M,
Scheiman M, Gallaway M, De Land PN, the Convergence Insuffi-
ciency and Reading Study (CIRS) Group. Frequency of convergence
insufficiency among fifth and sixth graders. Optom Vis Sci 1999;76:
643–9.
4. Streiner DL, Norman GR. Health Measurement Scales: a Practical
Guide to Their Development and Use. New York: Oxford University
Press, 1995:104–27.
5. Hirsch MJ, Bing LB. The effect of testing method on values obtained
for phoria at forty centimeters. Am J Optom Arch Am Acad Optom
1948;25:407–16.
6. Morgan MW. The reliability of clinical measurements with special
reference to distance heterophoria. Am J Optom Arch Am Acad Op-
tom 1955;32:167–79.
7. Rainey BB, Schroeder TL, Goss DA, Grosvenor TP. Inter-examiner
repeatability of heterophoria tests. Optom Vis Sci 1998;75:719–26.
8. Larson WL. Vergence breaks with a stepping prism. Am J Optom
Arch Am Acad Optom 1972;49:569–74.
9. Sheedy JE, Saladin JJ. Validity of diagnostic criteria and case analysis
in binocular vision disorders. In: Schor CM, Ciuffreda KJ, eds.
Vergence Eye Movements: Basic and Clinical Aspects. Boston:
Butterworth, 1983:517–40.
10. Brozek J, Simonson E, Bushard JW, Peterson HJ. Effects of practice
and the consistency of repeated measurements of accommodation
and vergence. Am J Ophthalmol 1948;31:191–8.
11. Penisten DK, Hofstetter HW, Goss DA. Reliability of rotary prism
fusional vergence ranges. Optometry 2001;72:117–22.
12. Feldman JM, Cooper J, Carniglia P, Schiff FM, Skeete JN. Compar-
ison of fusional ranges measured by Risley prisms, vectograms, and
computer orthopter. Optom Vis Sci 1989;66:375–82.
13. Rosenfield M, Cohen AS. Repeatability of clinical measurements of the am-
plitude of accommodation. Ophthalmic Physiol Opt 1996;16:247–9.
14. Bland JM, Altman DG. Statistical methods for assessing agreement be-
tween two methods of clinical measurement. Lancet 1986;1:307–10.
15. Chen AH, O’Leary DJ. Validity and repeatability of the modified
push-up method for measuring the amplitude of accommodation.
Clin Exper Optom 1998;81:63–71.
16. Fleiss JL. The Design and Analysis of Clinical Experiments. New
York: Wiley, 1986.
17. D’Agostino RB, Stephens MA. Tests for the Normal Distribution.
New York: Marcel Dekker, 1986:367–419.
18. Zadnik K, Mutti DO, Bullimore MA. Use of statistics for comparing
two measurement methods. Optom Vis Sci 1994;71:539–41.
19. Bland JM, Altman DG. A note on the use of the intraclass correlation
coefficient in the evaluation of agreement between two methods of
measurement. Comput Biol Med 1990;20:337–40.
20. Shoukri MM, Pause CA. Statistical Methods for Health Sciences,
2nd ed. Boca Raton, FL: CRC Press, 1999:19–42.
Michael W. Rouse
Southern California College of Optometry
2575 Yorba Linda Blvd.
Fullerton, California 92831
e-mail: mrouse@scco.edu
Tfm final final_2011
 
Tfm lucia morchón
Tfm lucia morchónTfm lucia morchón
Tfm lucia morchón
 
Terapia visual-y-comportamental-frente-al-aprendizaje
Terapia visual-y-comportamental-frente-al-aprendizajeTerapia visual-y-comportamental-frente-al-aprendizaje
Terapia visual-y-comportamental-frente-al-aprendizaje
 
Terapia visual-ii
Terapia visual-iiTerapia visual-ii
Terapia visual-ii
 
Terapia de__accion__visual
Terapia  de__accion__visualTerapia  de__accion__visual
Terapia de__accion__visual
 
Terapia visual
Terapia visualTerapia visual
Terapia visual
 
Terapia visual en la escuela
Terapia visual en la escuelaTerapia visual en la escuela
Terapia visual en la escuela
 
Terapia visual 1
Terapia visual 1Terapia visual 1
Terapia visual 1
 
Tema 2-format-paloma-sobrado
Tema 2-format-paloma-sobradoTema 2-format-paloma-sobrado
Tema 2-format-paloma-sobrado
 
Tema 1 ocw
Tema 1 ocwTema 1 ocw
Tema 1 ocw
 
Spasm of the_near_reflex_triggered_by_disruption.9
Spasm of the_near_reflex_triggered_by_disruption.9Spasm of the_near_reflex_triggered_by_disruption.9
Spasm of the_near_reflex_triggered_by_disruption.9
 
Revital visioninyourpractice
Revital visioninyourpracticeRevital visioninyourpractice
Revital visioninyourpractice
 
Puell óptica fisiológica
Puell óptica fisiológicaPuell óptica fisiológica
Puell óptica fisiológica
 
Prom coi vision 2
Prom coi vision 2Prom coi vision 2
Prom coi vision 2
 
Prescribing spectacles in_children__a_pediatric.9
Prescribing spectacles in_children__a_pediatric.9Prescribing spectacles in_children__a_pediatric.9
Prescribing spectacles in_children__a_pediatric.9
 
Op00306 c
Op00306 cOp00306 c
Op00306 c
 
Leccion 17 texto
Leccion 17 textoLeccion 17 texto
Leccion 17 texto
 
Evolucion del ojo
Evolucion del ojoEvolucion del ojo
Evolucion del ojo
 
Eoft m01 t03
Eoft m01 t03Eoft m01 t03
Eoft m01 t03
 
Entrenamiento visual
Entrenamiento visualEntrenamiento visual
Entrenamiento visual
 

Kürzlich hochgeladen

Gorgeous Call Girls Dehradun {8854095900} ❤️VVIP ROCKY Call Girls in Dehradun...
Gorgeous Call Girls Dehradun {8854095900} ❤️VVIP ROCKY Call Girls in Dehradun...Gorgeous Call Girls Dehradun {8854095900} ❤️VVIP ROCKY Call Girls in Dehradun...
Gorgeous Call Girls Dehradun {8854095900} ❤️VVIP ROCKY Call Girls in Dehradun...
Sheetaleventcompany
 
Premium Call Girls Nagpur {9xx000xx09} ❤️VVIP POOJA Call Girls in Nagpur Maha...
Premium Call Girls Nagpur {9xx000xx09} ❤️VVIP POOJA Call Girls in Nagpur Maha...Premium Call Girls Nagpur {9xx000xx09} ❤️VVIP POOJA Call Girls in Nagpur Maha...
Premium Call Girls Nagpur {9xx000xx09} ❤️VVIP POOJA Call Girls in Nagpur Maha...
Sheetaleventcompany
 
Pune Call Girl Service 📞9xx000xx09📞Just Call Divya📲 Call Girl In Pune No💰Adva...
Pune Call Girl Service 📞9xx000xx09📞Just Call Divya📲 Call Girl In Pune No💰Adva...Pune Call Girl Service 📞9xx000xx09📞Just Call Divya📲 Call Girl In Pune No💰Adva...
Pune Call Girl Service 📞9xx000xx09📞Just Call Divya📲 Call Girl In Pune No💰Adva...
Sheetaleventcompany
 
Goa Call Girl Service 📞9xx000xx09📞Just Call Divya📲 Call Girl In Goa No💰Advanc...
Goa Call Girl Service 📞9xx000xx09📞Just Call Divya📲 Call Girl In Goa No💰Advanc...Goa Call Girl Service 📞9xx000xx09📞Just Call Divya📲 Call Girl In Goa No💰Advanc...
Goa Call Girl Service 📞9xx000xx09📞Just Call Divya📲 Call Girl In Goa No💰Advanc...
Sheetaleventcompany
 
Control of Local Blood Flow: acute and chronic
Control of Local Blood Flow: acute and chronicControl of Local Blood Flow: acute and chronic
Control of Local Blood Flow: acute and chronic
MedicoseAcademics
 
👉Chandigarh Call Girl Service📲Niamh 8868886958 📲Book 24hours Now📲👉Sexy Call G...
👉Chandigarh Call Girl Service📲Niamh 8868886958 📲Book 24hours Now📲👉Sexy Call G...👉Chandigarh Call Girl Service📲Niamh 8868886958 📲Book 24hours Now📲👉Sexy Call G...
👉Chandigarh Call Girl Service📲Niamh 8868886958 📲Book 24hours Now📲👉Sexy Call G...
Sheetaleventcompany
 
❤️Amritsar Escorts Service☎️9815674956☎️ Call Girl service in Amritsar☎️ Amri...
❤️Amritsar Escorts Service☎️9815674956☎️ Call Girl service in Amritsar☎️ Amri...❤️Amritsar Escorts Service☎️9815674956☎️ Call Girl service in Amritsar☎️ Amri...
❤️Amritsar Escorts Service☎️9815674956☎️ Call Girl service in Amritsar☎️ Amri...
Sheetaleventcompany
 

Kürzlich hochgeladen (20)

Circulatory Shock, types and stages, compensatory mechanisms
Circulatory Shock, types and stages, compensatory mechanismsCirculatory Shock, types and stages, compensatory mechanisms
Circulatory Shock, types and stages, compensatory mechanisms
 
Intramuscular & Intravenous Injection.pptx
Intramuscular & Intravenous Injection.pptxIntramuscular & Intravenous Injection.pptx
Intramuscular & Intravenous Injection.pptx
 
Ahmedabad Call Girls Book Now 9630942363 Top Class Ahmedabad Escort Service A...
Ahmedabad Call Girls Book Now 9630942363 Top Class Ahmedabad Escort Service A...Ahmedabad Call Girls Book Now 9630942363 Top Class Ahmedabad Escort Service A...
Ahmedabad Call Girls Book Now 9630942363 Top Class Ahmedabad Escort Service A...
 
Gorgeous Call Girls Dehradun {8854095900} ❤️VVIP ROCKY Call Girls in Dehradun...
Gorgeous Call Girls Dehradun {8854095900} ❤️VVIP ROCKY Call Girls in Dehradun...Gorgeous Call Girls Dehradun {8854095900} ❤️VVIP ROCKY Call Girls in Dehradun...
Gorgeous Call Girls Dehradun {8854095900} ❤️VVIP ROCKY Call Girls in Dehradun...
 
Premium Call Girls Nagpur {9xx000xx09} ❤️VVIP POOJA Call Girls in Nagpur Maha...
Premium Call Girls Nagpur {9xx000xx09} ❤️VVIP POOJA Call Girls in Nagpur Maha...Premium Call Girls Nagpur {9xx000xx09} ❤️VVIP POOJA Call Girls in Nagpur Maha...
Premium Call Girls Nagpur {9xx000xx09} ❤️VVIP POOJA Call Girls in Nagpur Maha...
 
Pune Call Girl Service 📞9xx000xx09📞Just Call Divya📲 Call Girl In Pune No💰Adva...
Pune Call Girl Service 📞9xx000xx09📞Just Call Divya📲 Call Girl In Pune No💰Adva...Pune Call Girl Service 📞9xx000xx09📞Just Call Divya📲 Call Girl In Pune No💰Adva...
Pune Call Girl Service 📞9xx000xx09📞Just Call Divya📲 Call Girl In Pune No💰Adva...
 
Goa Call Girl Service 📞9xx000xx09📞Just Call Divya📲 Call Girl In Goa No💰Advanc...
Goa Call Girl Service 📞9xx000xx09📞Just Call Divya📲 Call Girl In Goa No💰Advanc...Goa Call Girl Service 📞9xx000xx09📞Just Call Divya📲 Call Girl In Goa No💰Advanc...
Goa Call Girl Service 📞9xx000xx09📞Just Call Divya📲 Call Girl In Goa No💰Advanc...
 
Chandigarh Call Girls Service ❤️🍑 9809698092 👄🫦Independent Escort Service Cha...
Chandigarh Call Girls Service ❤️🍑 9809698092 👄🫦Independent Escort Service Cha...Chandigarh Call Girls Service ❤️🍑 9809698092 👄🫦Independent Escort Service Cha...
Chandigarh Call Girls Service ❤️🍑 9809698092 👄🫦Independent Escort Service Cha...
 
Control of Local Blood Flow: acute and chronic
Control of Local Blood Flow: acute and chronicControl of Local Blood Flow: acute and chronic
Control of Local Blood Flow: acute and chronic
 
Call Girls Mussoorie Just Call 8854095900 Top Class Call Girl Service Available
Call Girls Mussoorie Just Call 8854095900 Top Class Call Girl Service AvailableCall Girls Mussoorie Just Call 8854095900 Top Class Call Girl Service Available
Call Girls Mussoorie Just Call 8854095900 Top Class Call Girl Service Available
 
Kolkata Call Girls Naktala 💯Call Us 🔝 8005736733 🔝 💃 Top Class Call Girl Se...
Kolkata Call Girls Naktala  💯Call Us 🔝 8005736733 🔝 💃  Top Class Call Girl Se...Kolkata Call Girls Naktala  💯Call Us 🔝 8005736733 🔝 💃  Top Class Call Girl Se...
Kolkata Call Girls Naktala 💯Call Us 🔝 8005736733 🔝 💃 Top Class Call Girl Se...
 
Most Beautiful Call Girl in Chennai 7427069034 Contact on WhatsApp
Most Beautiful Call Girl in Chennai 7427069034 Contact on WhatsAppMost Beautiful Call Girl in Chennai 7427069034 Contact on WhatsApp
Most Beautiful Call Girl in Chennai 7427069034 Contact on WhatsApp
 
👉Chandigarh Call Girl Service📲Niamh 8868886958 📲Book 24hours Now📲👉Sexy Call G...
👉Chandigarh Call Girl Service📲Niamh 8868886958 📲Book 24hours Now📲👉Sexy Call G...👉Chandigarh Call Girl Service📲Niamh 8868886958 📲Book 24hours Now📲👉Sexy Call G...
👉Chandigarh Call Girl Service📲Niamh 8868886958 📲Book 24hours Now📲👉Sexy Call G...
 
🚺LEELA JOSHI WhatsApp Number +91-9930245274 ✔ Unsatisfied Bhabhi Call Girls T...
🚺LEELA JOSHI WhatsApp Number +91-9930245274 ✔ Unsatisfied Bhabhi Call Girls T...🚺LEELA JOSHI WhatsApp Number +91-9930245274 ✔ Unsatisfied Bhabhi Call Girls T...
🚺LEELA JOSHI WhatsApp Number +91-9930245274 ✔ Unsatisfied Bhabhi Call Girls T...
 
❤️Amritsar Escorts Service☎️9815674956☎️ Call Girl service in Amritsar☎️ Amri...
❤️Amritsar Escorts Service☎️9815674956☎️ Call Girl service in Amritsar☎️ Amri...❤️Amritsar Escorts Service☎️9815674956☎️ Call Girl service in Amritsar☎️ Amri...
❤️Amritsar Escorts Service☎️9815674956☎️ Call Girl service in Amritsar☎️ Amri...
 
7 steps How to prevent Thalassemia : Dr Sharda Jain & Vandana Gupta
7 steps How to prevent Thalassemia : Dr Sharda Jain & Vandana Gupta7 steps How to prevent Thalassemia : Dr Sharda Jain & Vandana Gupta
7 steps How to prevent Thalassemia : Dr Sharda Jain & Vandana Gupta
 
Gastric Cancer: Сlinical Implementation of Artificial Intelligence, Synergeti...
Gastric Cancer: Сlinical Implementation of Artificial Intelligence, Synergeti...Gastric Cancer: Сlinical Implementation of Artificial Intelligence, Synergeti...
Gastric Cancer: Сlinical Implementation of Artificial Intelligence, Synergeti...
 
VIP Hyderabad Call Girls KPHB 7877925207 ₹5000 To 25K With AC Room 💚😋
VIP Hyderabad Call Girls KPHB 7877925207 ₹5000 To 25K With AC Room 💚😋VIP Hyderabad Call Girls KPHB 7877925207 ₹5000 To 25K With AC Room 💚😋
VIP Hyderabad Call Girls KPHB 7877925207 ₹5000 To 25K With AC Room 💚😋
 
❤️Call Girl Service In Chandigarh☎️9814379184☎️ Call Girl in Chandigarh☎️ Cha...
❤️Call Girl Service In Chandigarh☎️9814379184☎️ Call Girl in Chandigarh☎️ Cha...❤️Call Girl Service In Chandigarh☎️9814379184☎️ Call Girl in Chandigarh☎️ Cha...
❤️Call Girl Service In Chandigarh☎️9814379184☎️ Call Girl in Chandigarh☎️ Cha...
 
Kolkata Call Girls Shobhabazar 💯Call Us 🔝 8005736733 🔝 💃 Top Class Call Gir...
Kolkata Call Girls Shobhabazar  💯Call Us 🔝 8005736733 🔝 💃  Top Class Call Gir...Kolkata Call Girls Shobhabazar  💯Call Us 🔝 8005736733 🔝 💃  Top Class Call Gir...
Kolkata Call Girls Shobhabazar 💯Call Us 🔝 8005736733 🔝 💃 Top Class Call Gir...
 

Reliability of binocular_vision_measurements_used.12

  • 1. ORIGINAL ARTICLE
Reliability of Binocular Vision Measurements Used in the Classification of Convergence Insufficiency
MICHAEL W. ROUSE, OD, MS, FAAO, ERIC BORSTING, OD, MS, PAUL N. DELAND, PhD, and The Convergence Insufficiency and Reading Study (CIRS) Group
Southern California College of Optometry, Fullerton, California (MWR, EB), Mathematics Department, California State University at Fullerton, Fullerton, California (PND)
ABSTRACT: Purpose. To evaluate the reliability of binocular vision measurements used in the classification of convergence insufficiency. Methods. Two examiners tested 20 fifth and sixth graders in a school setting who passed a screening of visual acuity, refraction, and binocularity. The tests, conducted using a standard protocol, consisted of von Graefe near heterophoria (NH), phorometric positive fusional vergence (PFV), nearpoint of convergence (NPC), and monocular pushup accommodative amplitude (AA). Each examiner measured each child three consecutive times for each test, on two separate occasions, spaced approximately 1 week apart. Intraexaminer and interexaminer agreement was assessed using intraclass correlation coefficients (ICC), the median absolute difference (MAD), and the coefficient of repeatability (COR). Results. The within-session reliability of the NH (ICC: 0.95 to 0.99), NPC (ICC: 0.94 to 0.98), and AA (ICC: 0.88 to 0.95) were good, whereas the PFV was less reliable (ICC: 0.71 to 0.94). The intraexaminer reliability between sessions was good for the NPC (ICC: 0.92 and 0.89), less reliable for NH (ICC: 0.81 and 0.81) and AA (ICC: 0.89 and 0.69), and much less reliable for PFV break (ICC: 0.59 and 0.53). Typical between-session PFV differences (MAD) were between 3 and 4 Δ, whereas the COR differences were as large as 12 Δ. Conclusions. Three of the four measures (NH, NPC, and AA) often used in the classification of convergence insufficiency generally have good within-session and between-session reliability.
The PFV break was found to have only fair reliability with clinically significant differences between sessions. The large potential test-retest differences found could complicate clinical decision-making with regard to diagnosis and treatment. (Optom Vis Sci 2002;79:254–264)
Key Words: reliability, repeatability, heterophoria, nearpoint of convergence, accommodative amplitude, positive fusional vergence, convergence insufficiency, binocular vision
The Convergence Insufficiency and Reading Study (CIRS) group is investigating the relationship between convergence insufficiency (CI) and reading. The CIRS Group has completed the initial steps of developing a CI classification system and standardized protocols for each of the diagnostic methods.¹⁻³ The classification system relies on three diagnostic signs: near heterophoria, near positive fusional vergence, and the nearpoint of convergence. Accommodative amplitude was also included as a fourth diagnostic sign because of the high association between accommodative insufficiency and CI.³ The diagnostic methods selected to assess whether these signs are present were von Graefe heterophoria (NH) at 30 cm, von Graefe positive fusional vergence (PFV) at 30 cm, the nearpoint of convergence (NPC), and the pushup method of accommodative amplitude (AA). The von Graefe phorometry methods were selected by the CIRS Group in 1994 because of their common use in practice, use in the diagnosis of CI, and the availability of associated normative data studies. Even though these methods are common and accepted measures used in the diagnosis of CI, we found no single study that evaluated the intraexaminer and interexaminer reliability for this group of methods on children.
Reliability (or the commonly used synonym repeatability) reflects the amount of error, both random and systematic, inherent in any measurement.⁴ Reliability helps determine the confidence with which we can appraise the presence or absence of functional abnormalities, trends of deterioration or of spontaneous improvement, and the effects of therapy.
ᵃ The complete Manual of Testing Protocols for the CIRS Group (1996) is available by contacting Michael W. Rouse, OD, MS, Southern California College of Optometry, 2575 Yorba Linda Blvd., Fullerton, CA 92831 (or by e-mail: mrouse@scco.edu).
1040-5488/02/7904-0254/0 VOL. 79, NO. 4, PP. 254–264 OPTOMETRY AND VISION SCIENCE Copyright © 2002 American Academy of Optometry
Optometry and Vision Science, Vol. 79, No. 4, April 2002
  • 2. Tests should be repeatable, with the same examiner at different times (intraexaminer reliability) as well as with different examiners (interexaminer reliability) obtaining similar results. Reliability is critical information for both the clinician and researcher who want to obtain an accurate time course of the patient’s condition.
Reliability of von Graefe Heterophoria
Hirsch and Bing⁵ reported the reliability of the near von Graefe method using 38 adult subjects (optometry students) measured by two examiners on two separate occasions. The exact time interval between the two sessions was not specified. Hirsch and Bing found good-to-excellent reliability for both intraexaminer (r = 0.88 for each of examiners 1 and 2) and interexaminer (r = 0.94) measurements. They also reported relatively small intraexaminer mean differences of 2.16 Δ (SD = 1.84) for one examiner, 2.05 Δ (SD = 1.75) for the other examiner, and a small interexaminer mean difference of 2.00 Δ. Morgan⁶ reported good intraexaminer reliability (r = 0.81) on 23 optometry students who first served as subject and then examiner each week over a 5-week period. Rainey et al.⁷ evaluated interexaminer repeatability of heterophoria tests on 72 second- and third-year optometry students. They reported fair-to-good reliability (r = 0.75) for the near von Graefe method, with a small mean difference of −0.20 Δ, but clinically large 95% limits of agreement (−0.20 ± 6.7 Δ).
Reliability of von Graefe Fusional Vergence
There have been few investigations regarding the reliability of fusional vergence measurements.
The general opinion is that when fusional vergence tests are repeated on the same patient, the second value found may be quite different from the first.⁸ Sheedy⁹ suggested that “A difference of 10 prism diopters from one fusional vergence amplitude measurement to another is not unusual unless rigorous controls are applied.” Brozek et al.¹⁰ examined the PFV at distance on six occasions in six subjects between the ages of 20 and 30 years. A Risley prism was held in front of one eye while the subject fixated a spot of light at 6 m. It was not clear whether single or multiple examiners were used. Brozek et al. found good consistency among the six measures (rc = 0.81, where rc is a modified intraclass correlation coefficient [ICC]). The actual ICC (which we calculated from their data) for these data is 0.72, which still indicates good reliability. Assuming that the mean difference between two vergence measurements is zero and using our ICC calculation, we estimated the 95% limits of agreement for Brozek’s data to be ±5.06 Δ. Penisten et al.¹¹ recently completed a similar study, but of phoropter-mounted Risley prism fusional vergence at 4 m and 40 cm on eight young adult subjects. The authors reported that the distance PFV break and near PFV blur and recovery were the least repeatable with an estimated intrasubject SD on replicated measurements of about 2.75 Δ (compared with 3.45 Δ for Brozek et al.) whereas the distance PFV blur and recovery were slightly more repeatable (SD = 2.00 to 2.25 Δ). The PFV break at near had the smallest SD of 1.5 to 2 Δ. Feldman et al.¹² compared the near PFV taken twice within a single session (5 min apart) by a single examiner. Subjects were adults (optometric students, faculty, and staff) with a mean age of 25 years. They reported good-to-excellent within-session reliability for both PFV break (r = 0.87) and PFV recovery (r = 0.86).
Reliability of Nearpoint of Convergence
Brozek et al.¹⁰ also examined the nearpoint of convergence on six occasions in six subjects between the ages of 20 and 30 years. A Prentice rule with a white circular target, 2 mm in diameter, was brought in from the distance of clear vision to the point of binocular diplopia. It was not clear whether single or multiple examiners were used. They reported good consistency (rc = 0.79), but the corresponding ICC was only 0.65, which reflects only a fair level of reliability.
Reliability of Accommodative Amplitude
Brozek et al.¹⁰ also examined the nearpoint of accommodation on six occasions in six subjects between the ages of 20 and 30 years. A Prentice rule with a 20/30 line of letters was brought in from a distance of clear vision to the point of first blur. It was not clear whether single or multiple examiners were used. Three measurements were taken and averaged on each of six occasions, and good consistency (rc = 0.76) was reported, although the ICC was only 0.51, which suggests only fair reliability. The AAs of the six subjects were fairly homogeneous, which may have artificially lowered the ICC. Rosenfield and Cohen¹³ evaluated the pushup method of accommodative amplitude on five occasions separated by at least 24 h. The maximum separation between the sessions was not reported. It was also not clear whether single or multiple examiners were used. Thirteen adult subjects (mean age of 24 years) viewed a single optotype within the smallest line of letters that could be resolved at a viewing distance of 40 cm, and the target was brought from clear vision to first sustained blur. They reported that the range over which 95% of accommodative amplitude values would be predicted to lie was 10.11 D ± 1.44 (i.e., mean = 10.11 D, SD = 0.73 D, and 1.96 × 0.73 = 1.44 D).
These authors inappropriately characterized this range as the Bland-Altman 95% limits of agreement.¹⁴ In this case the Bland and Altman limits of agreement should provide an interval in which 95% of the differences between two measurements of amplitude, not the actual amplitude values, would be predicted to lie. From the results of Rosenfield and Cohen and by making certain reasonable assumptions, estimated values of the Bland-Altman limits of agreement can be calculated. In particular, assuming that there is no bias between two measurements and that the ICC is a moderate 0.70, the 95% limits of agreement can be estimated to be ±1.11 D. Chen and O’Leary¹⁵ measured accommodative amplitude on 18 adults on two separate occasions (the exact time period was not reported). A modified pushup method of blur to first detection was used with a target size of N8 reduced Lea symbols. They reported a correlation coefficient of 0.99, with a mean difference of 0.07 D and 95% limits of agreement of 0.07 ± 1.22 D.
The literature review reveals heterogeneity in the reporting of reliability (or repeatability) study results on binocular measures, making direct and clear comparisons between studies difficult.
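The ±1.11 D estimate quoted above for the Rosenfield and Cohen data can be reproduced with a short calculation. The sketch below is one reading of the stated assumptions, not the authors' published working: it treats the reported SD of 0.73 D as the total SD (between-subject plus within-subject), takes the within-subject variance to be (1 − ICC) times the total variance, and assumes no bias between the two measurements. The function name `loa_half_width` is ours.

```python
import math

def loa_half_width(total_sd, icc, z=1.96):
    """Half-width of the 95% limits of agreement for the difference between
    two measurements, assuming zero bias and ICC = between / total variance."""
    within_var = (1.0 - icc) * total_sd ** 2   # within-subject (error) variance
    diff_sd = math.sqrt(2.0 * within_var)      # SD of a difference of two measurements
    return z * diff_sd

# Values taken from the text: SD = 0.73 D, assumed ICC = 0.70.
print(round(loa_half_width(0.73, 0.70), 2))
```

With SD = 0.73 D and ICC = 0.70 this evaluates to about 1.11 D, in line with the estimate given in the text.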
  • 3. Many of the above-cited studies inappropriately have used the Pearson product-moment correlation coefficient (r) as an index of reliability. Other studies utilized the methods of Bland and Altman¹⁴ in reporting the limits of agreement, a range of values in which it is reasonable to expect the difference between two measurements of the same parameter to occur just by chance. The studies by Rosenfield and Cohen¹³ and Penisten et al.¹¹ inappropriately used the Bland-Altman “limits of agreement” terminology, but in fact reported ranges of values on a single measurement of a parameter that one might expect from a typical patient. This indirect view of reliability is difficult to compare with the Bland-Altman approach, which is a direct view of the distribution of differences between replicated measurements.
There is a clear need to evaluate intraexaminer and interexaminer reliability of common binocular vision measurements in school-aged children. Although a few studies suggest that there is good reliability for some measurements (near heterophoria and accommodative amplitude), it may not be appropriate to apply adult results to children because binocular function depends on both examiner instructions and the patient’s subjective response. Children may be poorer observers, have more trouble understanding instructions or expected endpoints, or be slower to respond than adults.
The purpose of this paper is to evaluate the reliability of the primary binocular vision measurements used in determining the diagnosis of CI in school-aged children.
METHODS
This study was approved by the Southern California College of Optometry institutional review board, and informed consent was obtained for all subjects in the study.
Study Population
Fifth and sixth graders were screened in a school setting by two CIRS examiners according to a standard protocol.
Screening criteria were as follows:
• either no glasses or had worn glasses or contact lenses ≥1 month by subject report;
• visual acuity 20/30 or better in each eye with habitual correction using a Snellen wall chart;
• uncorrected refractive error equal to or between −0.50 and +1.00 D, and ≤1.00 D of astigmatism in either eye or ≤1.00 D of anisometropia by retinoscopy;
• no strabismus at 3 m or 30 cm by unilateral cover test.
Data Collection
The first 20 consecutive children who passed the vision screening were used as subjects. Intraexaminer and interexaminer reliability were evaluated for the following measurements:
• von Graefe heterophoria at 30 cm using a single line of 20/30 reduced Snellen target;
• von Graefe PFV and NFV at 30 cm (blur/break/recovery) using a 2 × 5 block of 20/30 reduced Snellen target;
• NPC (break/recovery) using a single line of 20/30 reduced Snellen target on an Astron International (ACR/21) Accommodative Rule;
• monocular accommodative amplitude (Donders’ Pushup Method) of the right eye only using a single line of 20/30 reduced Snellen target on an Astron International (ACR/21) Accommodative Rule.
Each examiner took three consecutive measurements on each subject according to the standard protocol outlined in the Appendix. The exception was vergence measures, where a counterbalanced method of negative and then positive fusional vergence was conducted until three measurements of each were reached. The examiners performed independent measurements on the same subjects without knowledge of the other examiner’s results. Measurements were taken again by the same examiners on the same subjects 1 week later.
One problem we noted while reviewing the literature was a lack of detail in the Methods sections.
Because there is some variation in the literature and especially among practitioners as to the exact procedure for our four measures, we are providing detailed methods as an Appendix to the manuscript as outlined from the CIRS Manual of Procedures.¹,ᵃ
Data Analysis
This study design allows for the consideration of intraexaminer reliability both within and between sessions as well as within-session interexaminer reliability for each of the four principal CI diagnostic variables: NH, PFV break, NPC break, and AA.
Within-session intraexaminer reliability was assessed using both the within-session ranges and the intraclass correlation coefficient (ICC). The range for each subject is the difference between the maximum and minimum of the three within-session measurements. We will report first the sample mean range, which provides a measure of a typical patient’s within-session difference in measures; second, the 95th percentile of the ranges (R95), which gives a practical upper limit on the differences between within-session measurements. We estimate that 95% of all patients would have a maximum within-session difference in measures no greater than this limit.
The ICC is an overall index of reliability ranging between zero and one. A value of one indicates perfect repeatability, meaning, in this case, that each subject obtained the same value on the three within-session measures. A value of zero indicates no reproducibility of the measurement and, hence, no reliability. The ICC is commonly interpreted as follows¹⁶: ICC < 0.4 indicates poor reliability; 0.4 < ICC < 0.75 indicates fair-to-good reliability; ICC > 0.75 indicates good-to-excellent reliability. The ICC depends on both the between- and within-subject variability. It will be high when the within-subject variability is low relative to the total of between- and within-subject variability. It will be low when within-subject variability is high relative to this total variability.
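The within-session statistics described above (per-subject range, mean range, R95, and ICC) can be sketched as follows. This is an illustrative implementation under stated assumptions: the text does not say which ICC form or percentile rule was used, so the code assumes a one-way random-effects ICC for single measurements and a linearly interpolated percentile. The function names and the example data are hypothetical.

```python
import math

def percentile(values, p):
    """p-th percentile using linear interpolation between order statistics
    (an assumed rule; the paper does not specify one)."""
    s = sorted(values)
    pos = (len(s) - 1) * p / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def within_session_summary(data):
    """data: one row per subject, each row holding k repeated measurements."""
    n, k = len(data), len(data[0])
    ranges = [max(row) - min(row) for row in data]
    subject_means = [sum(row) / k for row in data]
    grand_mean = sum(subject_means) / n
    # One-way ANOVA mean squares: between subjects and within subjects.
    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    ms_within = sum((x - m) ** 2
                    for row, m in zip(data, subject_means)
                    for x in row) / (n * (k - 1))
    # Assumed form: one-way random-effects ICC for a single measurement.
    icc = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
    return {"mean_range": sum(ranges) / n,
            "r95": percentile(ranges, 95),
            "icc": icc}

# Hypothetical data: 4 subjects, 3 consecutive measurements each.
example = [[4, 5, 4], [10, 10, 11], [2, 2, 2], [7, 8, 6]]
print(within_session_summary(example))
```

In this form the ICC equals one when every subject gives identical repeated values and falls as the within-subject mean square grows relative to the between-subject mean square, which is the behavior the text describes.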
Hence, a sample that either overestimates or underestimates the population variability may result in a distorted ICC estimate. Consequently, it is important that the ICC is interpreted in conjunction with measures of variability like the range.
  • 4. Low values of the range will correspond typically to a higher ICC.
For intraexaminer between-session reliability, each examiner’s session 1 and session 2 means were compared. A principal focus is the distribution of the between-session differences in these means. Here the methods of Bland and Altman¹⁴ are useful in the case where the distribution of differences is approximately normal and the mean difference is close to zero. To check these assumptions, a preliminary Anderson-Darling test for normality¹⁷ and a matched-pairs t-test were conducted. If both of these tests were nonsignificant, then we proceeded with the Bland-Altman methodology. The mean difference, the SD of the differences, the coefficient of repeatability (COR), which is 1.96 × SD, and the 95% limits of agreement (mean difference ± COR) were computed. In cases where either of the preliminary tests was significant, we considered the distribution of absolute differences. In place of the COR, we computed the 95th percentile of the absolute differences (AD95). This 95th percentile provides, as does the COR in the case of normality and a zero mean, a threshold for differences of successive measures that would have to be exceeded to conclude that a true shift in value has likely occurred, as opposed to an observed difference that can be explained by the natural variability in the measure. In both cases, we find it useful to compute the median absolute difference (MAD). It provides a measure of the typical difference in mean between the two sessions where the distribution of differences may not be normal. The ICC also was computed as an index of agreement between the means of the two sessions. These same methods were also used in assessing within-session interexaminer reliability.
Sample Size
The sample size of 20 was selected based on the available resources to CIRS at the time of testing.
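The between-session procedure described under Data Analysis (Bland-Altman statistics when the differences look normal with a mean near zero, the 95th percentile of the absolute differences otherwise, and the MAD in both cases) can be sketched as follows. This is a minimal illustration, not the authors' code: the Anderson-Darling and paired t-tests are not reproduced, so their outcome is passed in as the hypothetical flag `assumptions_ok`, and the interpolated percentile rule is an assumption.

```python
import math
import statistics

def percentile(values, p):
    """p-th percentile with linear interpolation (assumed rule)."""
    s = sorted(values)
    pos = (len(s) - 1) * p / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def limits_of_agreement(mean_diff, cor):
    """95% limits of agreement: mean difference ± COR."""
    return (mean_diff - cor, mean_diff + cor)

def between_session_summary(session1, session2, assumptions_ok):
    """session1, session2: per-subject session means for one examiner.
    assumptions_ok: True if both preliminary tests were nonsignificant."""
    diffs = [b - a for a, b in zip(session1, session2)]  # session 2 − session 1
    abs_diffs = [abs(d) for d in diffs]
    result = {"mad": statistics.median(abs_diffs)}
    if assumptions_ok:
        mean_diff = statistics.mean(diffs)
        cor = 1.96 * statistics.stdev(diffs)  # coefficient of repeatability
        result["cor"] = cor
        result["limits"] = limits_of_agreement(mean_diff, cor)
    else:
        result["ad95"] = percentile(abs_diffs, 95)  # substitute for the COR
    return result

# Hypothetical session means for four subjects.
print(between_session_summary([4.0, 5.0, 6.0, 3.0], [5.0, 4.0, 6.0, 5.0], True))
```

As a check against the reported values, examiner 1's near heterophoria results in Table 1 give a mean difference of 4.05 − 4.20 = −0.15 Δ and a COR of 6.78 Δ; `limits_of_agreement(-0.15, 6.78)` returns approximately −6.93 and 6.63, the limits shown in Fig. 1.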
In testing the hypothesis of having just a fair level of reliability, say H₀: ICC = 0.50, at the 0.05 level of significance, 23 subjects would be required to have 80% power to reject it in favor of the alternative of having excellent reliability, say ICC = 0.80. This assumes that a one-tailed test is to be conducted. For the same test conducted at 70% power, 18 subjects would be required. Hence, our sample size of 20 renders our test slightly underpowered, but is also a substantial improvement on several of the frequently referenced studies (e.g., Brozek et al.,¹⁰ Penisten et al.,¹¹ and Rosenfield and Cohen¹³) that were described previously.
RESULTS
Twenty fifth and sixth graders (8 males, 12 females; mean age 10.8 years, SD 0.34 years, range 10.2 to 11.5 years) served as the subjects.
Near Heterophoria
Table 1 provides a summary of reliability measures for the NH. The results indicate a high level of intraexaminer reliability, both within and between sessions. The within-session ICC’s are excellent (0.95 or higher), and the mean ranges are <2 Δ with R95 ≤4 Δ. Intraexaminer between-session reliability was good for both examiners (ICC = 0.81) with MAD’s ≤2 Δ. The COR for examiner 1, which was ~7 Δ, and the corresponding limits of agreement are illustrated in Fig. 1. The interexaminer within-session reliability was excellent for session 1 (ICC = 0.91) and good for session 2 (ICC = 0.72). The COR’s were <9 Δ, with the MAD’s ≤2.5 Δ.
Positive Fusional Vergence Break
Table 2 shows the summary of reliability measures for PFV break. The within-session measurements for examiners 1 and 2 indicate different levels of intraexaminer reliability depending on the testing session. Both examiners had good session 1 reliability (ICC: 0.76 and 0.71) but excellent session 2 reliability (ICC: 0.94 and 0.93). Consistent with the ICC’s, for examiners 1 and 2, session 1 mean ranges and R95’s were higher (means: 5.30 Δ and 5.40 Δ) than the corresponding session 2 values (means: 3.80 Δ and 2.45 Δ).
Intraexaminer between-session reliability was fair (ICC: 0.59 and 0.53) with COR’s of 14.07 Δ and 12 Δ. The 95% limits of agreement for examiner 2 are illustrated in Fig. 2. Interexaminer within-session reliability was also fair. The ICC’s were 0.64 (session 1) and 0.53 (session 2), with COR’s of 10.30 Δ or higher.

TABLE 1. Near heterophoria.a

Intraexaminer Within-Session Reliability
                  Session 1     Session 2
Examiner 1
  Mean            4.20 Δ XP     4.05 Δ XP
  Mean range      1.95 Δ        1.65 Δ
  R95             4.00 Δ        4.00 Δ
  ICC             0.95          0.95
Examiner 2
  Mean            4.82 Δ XP     4.23 Δ XP
  Mean range      1.70 Δ        0.90 Δ
  R95             3.00 Δ        2.00 Δ
  ICC             0.97          0.99

Intraexaminer Between-Session Reliability
                  ICC      COR         MAD
E1S1 vs. E1S2     0.81     6.78 Δ      1.67 Δ
E2S1 vs. E2S2     0.81     7.64 Δ      2.00 Δ

Interexaminer Within-Session Reliability
                  ICC      COR         MAD
E1S1 vs. E2S1     0.91     4.86 Δ      1.33 Δ
E1S2 vs. E2S2     0.72     8.86 Δ      2.50 Δ

a For intraexaminer within-session reliability, mean is the average of the 60 (20 patients times 3 measurements per patient) within-session measures, mean range is the average of the 20 individual patient ranges, R95 is the 95th percentile of those 20 ranges, and ICC is the intraclass correlation coefficient. For both the intraexaminer between-session and interexaminer within-session reliability, ICC is the intraclass correlation coefficient for the session means, COR is the coefficient of repeatability, which is 1.96 times the SD of the session differences, and MAD is the median absolute difference of those session differences. In cases where the COR value is asterisked, the 95th percentile of the absolute differences is being substituted. E1S1, examiner 1/session 1; E2S2, examiner 2/session 2.

Classification of Convergence Insufficiency—Rouse et al. 257 Optometry and Vision Science, Vol. 79, No. 4, April 2002

FIGURE 1. Examiner 1 between-session reliability on near heterophoria; the plot of the difference between the two session averages (session 2 − session 1) vs. the mean of those two averages. The lines at L = −6.93 and U = 6.63 show, respectively, the lower and upper 95% limits of agreement.

TABLE 2. Positive fusional vergence break.a

Intraexaminer Within-Session Reliability
                  Session 1     Session 2
Examiner 1
  Mean            22.10 Δ       24.10 Δ
  Mean range      5.30 Δ        3.80 Δ
  R95             8.00 Δ        8.00 Δ
  ICC             0.76          0.94
Examiner 2
  Mean            22.78 Δ       19.06 Δ
  Mean range      5.40 Δ        2.45 Δ
  R95             12.00 Δ       6.00 Δ
  ICC             0.71          0.93

Intraexaminer Between-Session Reliability
                  ICC      COR         MAD
E1S1 vs. E1S2     0.59     14.07 Δ     3.67 Δ
E2S1 vs. E2S2     0.53     12.00 Δ*    4.00 Δ

Interexaminer Within-Session Reliability
                  ICC      COR         MAD
E1S1 vs. E2S1     0.64     10.30 Δ     3.33 Δ
E1S2 vs. E2S2     0.53     16.00 Δ*    5.67 Δ

a See notes for Table 1.

FIGURE 2. Examiner 2 between-session reliability on PFV break; the plot of the difference between the two session averages (session 2 − session 1) vs. the mean of those two averages. The lines at L = −12.00 and U = 12.00 show, respectively, the lower and upper empirical 95% limits of agreement.

NPC Break

The summary of reliability measures for NPC break is shown in Table 3. In all three comparisons, NPC break has excellent reliability, with ICC’s no lower than 0.86. Intraexaminer within-session reliability is especially high, with all ICC’s ≥0.94 and mean ranges ≤1.25 cm. The intraexaminer between-session reliability was excellent (ICC: 0.92 and 0.89) with MAD’s of ~1 cm. The limits of agreement for examiner 2 are illustrated in Fig. 3. This plot also shows a fairly strong positive trend (r = 0.78) between the differences between the two measures and their means, indicating a tendency for the difference to increase with the NPC break. A similar pattern (not shown) is evident for examiner 1, but two highly influential outliers lower the correlation (r = 0.01, but r = 0.58 with the outliers excluded). The interexaminer within-session reliability was also excellent, with smaller COR’s than the intraexaminer between-session reliability.

TABLE 3. Nearpoint of convergence break.a

Intraexaminer Within-Session Reliability
                  Session 1     Session 2
Examiner 1
  Mean            5.45 cm       5.72 cm
  Mean range      1.10 cm       0.80 cm
  R95             2.00 cm       2.00 cm
  ICC             0.98          0.98
Examiner 2
  Mean            4.54 cm       5.68 cm
  Mean range      0.78 cm       1.25 cm
  R95             2.00 cm       3.00 cm
  ICC             0.98          0.94

Intraexaminer Between-Session Reliability
                  ICC      COR         MAD
E1S1 vs. E1S2     0.92     5.33 cm*    1.17 cm
E2S1 vs. E2S2     0.89     5.00 cm*    1.00 cm

Interexaminer Within-Session Reliability
                  ICC      COR         MAD
E1S1 vs. E2S1     0.86     4.43 cm     1.68 cm
E1S2 vs. E2S2     0.97     2.55 cm     0.67 cm

a See notes for Table 1.

Accommodative Amplitude

Table 4 is a summary of the reliability measures for AA. Intraexaminer within-session reliability is excellent with ICC’s ≥0.88, mean ranges ≤2.29 D, and R95 of 5.00 D in all cases. Intraexaminer between-session reliability differed by examiner (0.82 vs. 0.69) with MAD’s of ~2 D or less. The limits of agreement for examiner 1 are illustrated in Fig. 4. The interexaminer within-session reliability was good (0.81 and 0.85), with slightly higher MAD’s and smaller COR’s than the intraexaminer between-session reliability.

Positive Fusional Vergence Recovery and NPC Recovery

Although PFV recovery and NPC recovery are not used in our diagnostic classification system, they typically are measured in a clinical assessment of binocular vision. In Tables 5 and 6, the summary of reliability information is provided for these binocular measures.

DISCUSSION

Our multifaceted data analysis approach provides different perspectives on the issue of reliability or repeatability. The ICC is a reliability index ranging from zero to one regardless of the units of the measure under consideration. It readily allows for direct comparison of reliability between different measurements. The ICC takes into account intrasubject and intersubject variability, but it does not directly convey the level of intrasubject variability. For example, the ICC was 0.81 for the between-session reliability of NH (examiner 1). This means that 81% of the variability in these measurements is due to intersubject variability and only 19% is due to intrasubject variability.
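The variance-share reading of the ICC described above can be illustrated with a minimal sketch. It assumes a one-way random-effects ICC computed from ANOVA mean squares; the article does not state which ICC form was used, so this choice, the function name `icc_oneway`, and the demonstration readings are all assumptions made for illustration only.

```python
from statistics import mean


def icc_oneway(scores):
    """One-way random-effects ICC for rows of repeated measurements.

    scores: list of per-subject lists, each with the same number of
    measurements (for example, two session means per subject).
    This ICC form is an assumption; the article does not specify one.
    """
    n = len(scores)      # number of subjects
    k = len(scores[0])   # measurements per subject
    grand_mean = mean([v for row in scores for v in row])
    row_means = [mean(row) for row in scores]
    # Between-subject mean square (intersubject variability).
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    # Within-subject mean square (intrasubject variability).
    ms_within = sum(
        (v - m) ** 2 for row, m in zip(scores, row_means) for v in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical phoria readings in prism diopters (not data from the study):
# five subjects, two sessions each.
demo = [[2.0, 3.0], [6.0, 5.0], [0.0, 1.0], [8.0, 9.0], [4.0, 4.0]]
icc_demo = icc_oneway(demo)
print(round(icc_demo, 3))
```

In this invented example the subjects differ from one another far more than each subject differs from session to session, so most of the variance is intersubject and the ICC is close to one.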
It is this intrasubject variability that is more clinically relevant to the practitioner. A more direct clinical summary of these data is provided by the MAD, which is 1.67 Δ, and the COR, which is 6.78 Δ. That is, NH measurements taken 1 week apart would typically differ by <2 Δ, but it would be possible for the difference to be as large as ~7 Δ. Fig. 1 shows these 95% limits of agreement. Any finding outside this range has only a 5% probability of being due to measurement error alone. We feel, as do others,18 that the Bland-Altman approach gives a more relevant clinical picture of measurement error because it details the nature of the intrasubject variability. However, Bland and Altman have acknowledged the appropriateness of the ICC in reliability studies.19 In addition, most of the older studies have used the standard product moment correlation coefficient (r) to evaluate reliability. The ICC is preferable because it is a measure of agreement between measures, whereas r is a measure of association. Because the ICC is usually close to r and always less than or equal to it, we are still able to draw comparisons between this and previous studies.

FIGURE 3. Examiner 2 between-session reliability on NPC break; the plot of the difference between the two session averages (session 2 − session 1) vs. the mean of those two averages. The lines at L = −5.00 and U = 5.00 show, respectively, the lower and upper empirical 95% limits of agreement.

TABLE 4. Accommodative amplitude.a

Intraexaminer Within-Session Reliability
                  Session 1     Session 2
Examiner 1
  Mean            14.18 D       15.46 D
  Mean range      2.04 D        2.29 D
  R95             5.00 D        5.00 D
  ICC             0.88          0.90
Examiner 2
  Mean            14.41 D       15.17 D
  Mean range      2.25 D        1.70 D
  R95             5.00 D        5.00 D
  ICC             0.90          0.95

Intraexaminer Between-Session Reliability
                  ICC      COR         MAD
E1S1 vs. E1S2     0.82     5.32 D      1.63 D
E2S1 vs. E2S2     0.69     10.48 D     2.06 D

Interexaminer Within-Session Reliability
                  ICC      COR         MAD
E1S1 vs. E2S1     0.81     4.13 D*     1.82 D
E1S2 vs. E2S2     0.85     6.86 D      2.58 D

a See notes for Table 1.

FIGURE 4. Examiner 1 between-session reliability on accommodative amplitude; the plot of the difference between the two session averages (session 2 − session 1) vs. the mean of those two averages. The lines at L = −4.04 and U = 6.60 show, respectively, the lower and upper 95% limits of agreement.

Near Heterophoria

The intraexaminer within-session reliability was found to be excellent (0.95 to 0.99). Hence, repeated measurements within a single testing session are very repeatable in children. In the previous adult studies, intraexaminer reliability ranged from 0.81 to 0.88, with a MAD of ~2 Δ.5, 6 Our intraexaminer between-session reliability results (0.81) using the ICC are similar to these previous studies. Therefore, the clinician can expect typical differences of ~2 Δ (MAD) but can measure differences as large as 6 to 7 Δ (COR). Our interexaminer reliability (0.91 and 0.72) was also similar to that of previous adult studies, which ranged from 0.75 to 0.94.5, 7 In general, children appear to respond as reliably as adults on this near heterophoria measure. However, most clinicians would probably be uncomfortable with the large COR values for intraexaminer reliability between sessions.
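The between-session summaries discussed here follow directly from the definitions in the Table 1 notes: the COR is 1.96 times the SD of the session differences, the MAD is the median absolute difference, and the 95% limits of agreement are the mean difference plus or minus the COR. The sketch below applies those definitions to invented session means; the function name `between_session_summary` and the data are hypothetical, and the empirical 95th-percentile substitute used for asterisked COR values is not implemented.

```python
from statistics import mean, median, stdev


def between_session_summary(session1, session2):
    """Bland-Altman style summary for paired session means.

    Differences are taken as session 2 minus session 1, matching the
    figure captions. COR and MAD follow the Table 1 notes.
    """
    diffs = [b - a for a, b in zip(session1, session2)]
    bias = mean(diffs)              # mean between-session difference
    cor = 1.96 * stdev(diffs)       # coefficient of repeatability
    mad = median([abs(d) for d in diffs])
    return {
        "bias": bias,
        "COR": cor,
        "MAD": mad,
        "lower": bias - cor,        # lower 95% limit of agreement
        "upper": bias + cor,        # upper 95% limit of agreement
    }


# Hypothetical session means in prism diopters (not data from the study).
s1 = [4.0, 2.0, 6.0, 1.0, 5.0]
s2 = [5.0, 1.0, 8.0, 1.0, 3.0]
summary = between_session_summary(s1, s2)
print(summary)
```

The same relationship can be checked against the published values: the limits in Fig. 1 (−6.93 and 6.63) are centered on −0.15 with a half-width of 6.78 Δ, which is the examiner 1 COR in Table 1.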
Positive Fusional Vergence

Intraexaminer within-session reliability varied between the two testing sessions, with session 1 being lower (0.76 and 0.71) and session 2 higher (0.94 and 0.93) than that reported by Feldman et al.12 (0.87). The mean ranges were higher for session 1, which resulted in the lower ICC’s. The initial testing session may have served as training, and the children may have learned to respond better to the PFV on the second testing session, where the mean range and R95 are smaller.

Our findings do indicate that the intraexaminer between-session and interexaminer within-session reliability is at best fair. The intraexaminer between-session reliability results (0.59 and 0.53) are lower than those of Brozek et al.10 (0.72). However, Brozek et al. (and Penisten et al.11) took their measurements on consecutive days, evaluated distance PFV, and used adult subjects, which makes direct comparison difficult. One might expect PFV at distance to be more stable, and thus more repeatable, than PFV at near, where the accommodative-convergence relationship is more complex, although Penisten et al. found intrasubject variability to be lowest with the near PFV break. Based on our results, the clinician can expect typical differences of 3 to 4 Δ, but can measure differences as large as 12 Δ on follow-up visits. The large differences may cause problems with accurately classifying patients as CI and monitoring treatment outcomes. It may also explain why some patients appear to have CI, based for example on Sheard’s criteria, and are asymptomatic or vice versa. Additionally, when evaluating the effects of vision therapy, a single examiner would need a change of 12 Δ, whereas different examiners might need changes as large as 10 to 16 Δ to be confident that the change was real and not the result of measurement variability.

The large PFV break differences could be due to children having more difficulty with the psychophysical aspects of this test. Children may be poorer observers, have more trouble understanding the instructions or expected endpoints, or be slower to respond than adults. Presently we are evaluating the PFV in adults to address this issue. If the large break differences are not due to subject age, then the differences may be related to the fusional vergence system being inherently variable over time.

TABLE 5. Positive fusional vergence recovery.a

Intraexaminer Within-Session Reliability
                  Session 1     Session 2
Examiner 1
  Mean            6.72 Δ        9.08 Δ
  Mean range      5.70 Δ        4.75 Δ
  R95             13.00 Δ       8.00 Δ
  ICC             0.71          0.88
Examiner 2
  Mean            6.47 Δ        5.78 Δ
  Mean range      5.15 Δ        3.55 Δ
  R95             13.00 Δ       7.00 Δ
  ICC             0.68          0.90

Intraexaminer Between-Session Reliability
                  ICC      COR         MAD
E1S1 vs. E1S2     0.27     16.54 Δ     3.00 Δ
E2S1 vs. E2S2     0.50     12.15 Δ     4.00 Δ

Interexaminer Within-Session Reliability
                  ICC      COR         MAD
E1S1 vs. E2S1     0.57     10.62 Δ     4.17 Δ
E1S2 vs. E2S2     0.65     10.00 Δ*    4.17 Δ

a See notes for Table 1.

TABLE 6. Nearpoint of convergence recovery.a

Intraexaminer Within-Session Reliability
                  Session 1     Session 2
Examiner 1
  Mean            7.88 cm       8.33 cm
  Mean range      1.20 cm       1.10 cm
  R95             3.00 cm       2.00 cm
  ICC             0.97          0.98
Examiner 2
  Mean            6.03 cm       7.38 cm
  Mean range      1.03 cm       1.25 cm
  R95             2.00 cm       3.00 cm
  ICC             0.97          0.97

Intraexaminer Between-Session Reliability
                  ICC      COR         MAD
E1S1 vs. E1S2     0.90     5.15 cm     1.00 cm
E2S1 vs. E2S2     0.84     7.33 cm*    0.92 cm

Interexaminer Within-Session Reliability
                  ICC      COR         MAD
E1S1 vs. E2S1     0.80     6.00 cm*    2.17 cm
E1S2 vs. E2S2     0.96     2.70 cm*    1.00 cm

a See notes for Table 1.

Nearpoint of Convergence

The intraexaminer within-session reliability was found to be excellent (0.94 to 0.98) for the NPC break. Hence, the measurements within a single testing session are very repeatable in children. Regarding intraexaminer between-session reliability, the only previous study10 with adult subjects reported fair intraexaminer reliability (ICC: 0.65). We found higher ICC values (0.92 and 0.89), suggesting that the NPC break is a reliable measure over time in children. The clinician can expect typical differences of ~1 cm, but differences as large as ~5 cm may be measured. One caveat is that patients with receded NPC’s (>6 cm) will generally have larger differences when tested over time. The results for the NPC recovery showed similarly high ICC’s, suggesting it is also a reliable measure.

Accommodative Amplitude

The intraexaminer within-session reliability was found to be excellent (0.88 to 0.95) for the AA. Hence, the measurements within a single testing session are very repeatable in children.
Our intraexaminer between-session results in children are consistent with the previous study by Chen and O’Leary15 in adults showing excellent (r = 0.99) reliability. Our results show a higher level of reliability than Brozek’s adult study, in which the ICC was 0.51.10

Based on our results, a clinician can expect typical within-session differences of ~2 D, but differences as large as ~5 D may be measured. These differences are difficult to compare with the results reported in the two most often quoted adult studies.10, 13 These studies reported that the typical patient would have a SD of about 0.75 D and, hence, a range of values of about ±1.5 D. From this, Rosenfield and Cohen13 suggested that two AA measurements on the same subject would typically be within 1.5 D of each other. This conclusion is erroneous. A ±1.5 D range means that the typical patient could actually have a range of AA measurements of ~3 D. Thus, the two AA measurements on such a patient could readily be more than 1.5 D (and up to 3 D) different and still be within the bounds of the natural variability of this patient. This 3 D difference for adults is lower than this study’s 5 D difference for children, which indicates that adults tend to have better within-session reliability than children.

In both of these previous adult studies, the intervening time between measurements varied from several hours to a day or more, whereas in our study, the between-session measurements were taken 1 week apart. Our within-session reliability is more comparable to the short-term repeatability results of these adult studies. There are no comparable studies for estimating the between-session differences that this study found. Based on our results, a clinician can expect typical between-session differences of ~2 D, but differences as large as ~5 to 10 D may be measured.
The large COR values for intraexaminer reliability between sessions are beyond the comfort level of most clinicians.

CONCLUSIONS

The study and analysis of measurement reliability is extensive and intricate, and different authors have divergent views on which methods are most appropriate. We elected to use a multifaceted approach in presenting our reliability results because there is no one accepted mode of analysis, and each method gives a different and useful perspective on the problem. The ICC, perhaps the most common index of reliability in the health science literature,4, 20 provides a method to compare the reliability of tests that have different units of measurement (in our case, prism diopters, lens diopters, and centimeters). We can view the relative reliability of the group of tests typically used in evaluating the syndrome diagnosis of CI. The ICC also allows us to compare our results with older literature that may have only used the correlation coefficient in their analysis. Three of the four measures (NH, NPC, and AA) often used in the classification of CI generally have good-to-excellent intraexaminer and interexaminer reliability based on the ICC evaluation. The PFV break was found to have only fair intraexaminer and interexaminer reliability.

A difficulty with the ICC is that its interpretation is problematic for the clinician. Knowing that the ICC for a test is 0.90 does not help the clinician with the question of “how much difference should I reasonably expect between two measurements of that same test?” The Bland-Altman approach provides a more clinician-friendly view of reliability. We have presented both the typical difference between measurements (mean range for within session and MAD for between sessions) and what the clinician may think of as the worst-case difference, or as we have described in the results section, “the difference can be as large as” (R95 within session or the COR between sessions).
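The within-session counterparts of these summaries, the mean range (typical difference) and R95 (worst-case difference), can be sketched from the definitions in the Table 1 notes. The percentile convention is not given in the article, so the nearest-rank rule below is an assumption, as are the function name `within_session_summary` and the invented readings.

```python
import math
from statistics import mean


def within_session_summary(readings):
    """Mean range and R95 for repeated within-session measurements.

    readings: list of per-subject lists (three readings per subject in
    the study design). R95 uses a nearest-rank 95th percentile, which is
    an assumed convention rather than the article's stated method.
    """
    ranges = [max(r) - min(r) for r in readings]   # one range per subject
    ordered = sorted(ranges)
    rank = max(1, math.ceil(0.95 * len(ordered)))  # nearest-rank position
    return {"mean_range": mean(ranges), "R95": ordered[rank - 1]}


# Hypothetical PFV break readings in prism diopters (not study data):
# four subjects, three consecutive readings each.
demo_readings = [
    [20.0, 22.0, 24.0],
    [18.0, 18.0, 19.0],
    [25.0, 30.0, 27.0],
    [22.0, 21.0, 22.0],
]
within = within_session_summary(demo_readings)
print(within)
```

With only a few subjects the nearest-rank R95 is simply the largest range; with the 20 subjects of this study it would be the 19th of the 20 ordered ranges under this convention.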
We feel that the clinician who routinely takes these binocular measurements on children will find the typical differences within session and between sessions to be in line with what they generally expect. See the summary in Table 7. The worst-case difference will be greater, and in some cases much greater (two to five times the typical differences), than the differences expected by that same clinician. These “worst-case” differences represent the maximum difference between measurements that a clinician would ever observe on nearly all patients.

It may be unfair to look at each new patient in light of the worst-case difference scenario. Most patients are close to typical, but of course, a few problematic patients are not! We suggest viewing a patient using the typical difference in most cases and asking the following question: would the diagnosis be altered if the observed measurement changed by as much as the typical difference? What if it changed by as much as the worst-case difference? It is especially important to consider the worst-case differences when there are inconsistencies in the case findings; for example, when a patient’s clinical findings support the diagnosis of CI, but the patient has no or few symptoms, or when a patient presents with CI-type symptoms, but has clinical findings that appear within acceptable limits.

Unfortunately for the clinician treating and monitoring this condition, they will need to use the worst-case differences to feel confident that the changes that are being seen are not just natural variation in the between-session measurements. The large potential test-retest differences found could complicate clinical decision-making with regard to diagnosis and treatment. Changes in the testing protocol used in this study, as well as other PFV procedures, should be investigated in an attempt to improve both intraexaminer and interexaminer reliability.

APPENDIX

von Graefe Near Heterophoria Test

A table stand with phoropter (B and L style) was used. Risley prisms were marked in 2 Δ increments from 0 to 30 Δ. The fixation target was a vertical column of 20/30 reduced Snellen letters. Illumination was provided by a floor-stand lamp with 100- to 150-ft-cd/m² on the card face. The patient’s interpupillary distance was taken by pupillometer and dialed into the phoropter. The subject’s habitual distance refractive correction was placed in the phoropter if the subject was wearing glasses.

Before testing, the subject was shown a two-picture demonstration of the test responses. The first picture showed the initial presentation. The subject was told, “First you will see two lines of letters with one being higher and to the right.” The second picture showed vernier alignment of the two lines. The subject was told, “The upper line of letters will flash on and off several times.
Each time they come on tell me whether they are to the right, to the left, or directly above the lower letters as shown here.”

The examiner then introduced 4 to 6 Δ base-up over the left eye (OS) for dissociation and 12 Δ base-in over the right eye (OD) for biasing. The subject was asked to “Please read the line of letters” first with the OS and then with the OD. The occluder was removed, and the subject was asked, “Do you see two lines of letters with one being higher and to the right?” (The subject’s right shoulder was tapped to reinforce the concept of “right” direction.) If only one target was seen, the prisms were readjusted. If the subject was still unable to see the two targets, the suppressing eye was determined and testing was stopped.

The subject was then instructed to “Keep the letters as clear as you can. The upper letters will flash on and off several times. Each time they come on tell me whether they are to the right, to the left, or directly above the lower letters.” The RE was occluded, and then the RE was uncovered and recovered (~1 s flash exposure time). If the upper target was seen to the right, the RE prism was reduced by 4 Δ and the flashing was repeated until the upper target was first seen to the left. The prism was changed in 2 Δ increments until the subject reported alignment. The prism amount and direction of the deviation (eso, exo, and ortho) were recorded. The procedure was repeated two additional times, and results were recorded.

von Graefe Fusional Vergence Tests

The initial setup for the von Graefe fusional vergence measures was the same as for the von Graefe heterophoria measurement except the fixation target was a two-letter by five-letter block of 20/30 reduced Snellen letters.

Before testing, the subject was shown a three-picture demonstration of the test responses. The first picture had the block of letters in focus.
The subject was told, “This is what I mean by clear.” The second picture had the block of letters photographically blurred to a level of 0.50 to 0.75 D to simulate the first sustained blur point. The subject was told, “This is what I mean by blurred.” The third picture had two blocks of letters, one printed on paper and one overlaid on a plastic sheet. The subject was told, “This is what I mean by double (as the examiner slowly slides the targets apart). I will separate them a little more, like this... (as the examiner slides the targets so they are distinctly separate). Then tell me when the two blocks come together into one block, like this... (as the examiner slides the two targets together until they are one).”

The subject was positioned behind the phoropter and was instructed to “Read aloud the letters in the top row of the block of letters you see in front of you.” Then the subject was instructed, “I want you to say ‘now or blur’ when the letters become blurred as I showed you in the picture.” Base-in prism was added at the approximate rate of 4 Δ/s. Once blur was reported or if no blur was reported, the subject was instructed, “I want you to say ‘now or double’ when the block of letters are seen as double as I showed you in the last picture.” Once again, base-in prism was added at the approximate rate of 4 Δ/s until double vision was reported. Once diplopia was reported, the subject was instructed, “Now I want you to tell me when you see only one block of letters. The letters may either be clear or blurred, but they must be single.” Base-in prism was reduced at the approximate rate of 4 Δ/s until single vision was reported. The results were recorded in prism diopters for the blur, break, and recovery findings. A 20-s “wait” period was used between fusional vergence measurements. PFV was measured next using base-out prisms with instructions as above. In alternate order, NFV and then PFV were repeated two additional times and recorded.

TABLE 7. Summary of the typical differences and worst-case differences that clinicians might expect for the four binocular measures.a

              Typical Difference                 Worst-Case Difference
              Within Session   Between Session   Within Session   Between Session
NH            1–2 Δ            1–2 Δ             2–4 Δ            7–8 Δ
PFV-break     2–5 Δ            4 Δ               6–12 Δ           12–14 Δ
NPC           1 cm             1 cm              2–3 cm           5 cm
AA            2 D              1–2 D             5 D              5–10 D

a NH, near heterophoria; PFV-break, positive fusional vergence break point; NPC, near point of convergence; AA, accommodative amplitude. Worst-case difference: within session = R95, between session = COR (or LoA). Typical difference: within session = mean range, between session = MAD.
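The question posed in the Conclusions (would the diagnosis change if a measurement moved by the typical or by the worst-case difference?) can be sketched as a simple lookup against the between-session columns of Table 7. Where the table gives a range, the sketch takes the upper end, which is an assumption made only for illustration; the dictionary name, function name, and category labels are likewise hypothetical and are not part of the study's protocol.

```python
# Between-session differences taken from Table 7. Where the table lists a
# range (for example 7-8 prism diopters), the upper end is used here; that
# choice is an assumption of this sketch, not a rule from the article.
TABLE7_BETWEEN_SESSION = {
    "NH": {"typical": 2.0, "worst": 8.0, "unit": "prism diopters"},
    "PFV_break": {"typical": 4.0, "worst": 14.0, "unit": "prism diopters"},
    "NPC": {"typical": 1.0, "worst": 5.0, "unit": "cm"},
    "AA": {"typical": 2.0, "worst": 10.0, "unit": "D"},
}


def classify_change(test, change):
    """Label an observed between-session change for one test.

    test: one of the keys of TABLE7_BETWEEN_SESSION.
    change: follow-up value minus baseline value, in the test's unit.
    """
    limits = TABLE7_BETWEEN_SESSION[test]
    size = abs(change)
    if size <= limits["typical"]:
        return "within typical variability"
    if size <= limits["worst"]:
        return "within worst-case variability"
    return "exceeds worst-case variability"


# Hypothetical follow-up changes, not patient data.
print(classify_change("PFV_break", 10.0))
print(classify_change("NPC", -6.0))
print(classify_change("NH", 1.5))
```

Under these assumed thresholds, a 10 Δ change in PFV break still falls inside the worst-case band, which mirrors the article's caution that changes of this size may reflect measurement variability rather than a real effect.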
Nearpoint of Convergence Test

The accommodative rule was used with a fixation target consisting of a single vertical column of five 20/30 reduced Snellen letters. Subjects wore their habitual spectacle or contact lens prescription during the testing.

The subject was given the following instructions: “This procedure is designed to measure your ability to converge your eyes; that is, turn your eyes in toward your nose. Look directly at the line of letters on this card (examiner pointed to target card on accommodative rule) with both eyes open as the card is moved toward you. The image may appear to blur. That is okay. However, if you see the letter double (that is, split into two), say ‘two.’ I will then pull the card back. If you see the two images join into a single image again, say ‘one.’ This procedure will be repeated three times.”

The examiner sat slightly to the side of the subject and viewed the subject from a slightly elevated position. The examiner held the accommodative rule in the horizontal position with the ruler positioned against the middle of the subject’s forehead approximately 1 cm above the eyebrow line and tested convergence along the anterior-polar or z axis. The target card was started at approximately 20 cm. The target card was moved along the accommodative rule in a smooth linear manner at the rate of approximately 1 cm/s toward the subject. The slider was stopped when the subject’s eyes were observed to fail to converge or when the subject reported diplopia. The target card was stopped at this point, and the subject was asked if the images remained double. If the images remained double or the subject remained strabismic relative to the target, the centimeter position of the card on the accommodative rule was recorded.
If the images became fused into one or the subject was observed to be bifixating the target, the target was moved toward the subject’s face until the eyes were objectively observed to fail to converge or the subject reported diplopia. This cycle was continued until the subject remained diplopic or strabismic relative to the target.

The target was then moved away from the subject at approximately 1 cm/s until the eyes were observed to reestablish bifixation or the subject reported the two images fused into a single image. The centimeter position of the card on the accommodative rule was recorded. The target card was moved back to the starting position of 20 cm, and the subject was given a 10-s rest period before starting the next measurement. The procedure was repeated two additional times and recorded.

Pushup Method for Monocular Accommodative Amplitude

The accommodative rule was used with a fixation target consisting of a single vertical column of five 20/30 reduced Snellen letters. Subjects wore their habitual spectacle or contact lens prescription during the testing.

The subject was given the following instructions: “This procedure is designed to measure your ability to focus your eyes on a target that is slowly moved closer to your eyes. I am going to cover your left eye with this patch. Look directly at the line of letters on this card (pointing to the target card on the accommodative rule) as the card is moved toward you. Tell me when the target first becomes blurry. This is what I mean by blurry (demo card shown). This procedure will be repeated three times.”

The examiner sat in front of the subject and viewed the subject from a slightly elevated position. The examiner held the accommodative rule in the horizontal position with the ruler positioned against the subject’s forehead (approximately 1 cm above the eyebrow line) and above the right eye. The examiner tested the accommodative amplitude of the right eye only.
The target card was started at approximately 20 cm. The target card was moved along the accommodative rule in a smooth linear manner at the rate of approximately 1 cm/s toward the subject’s face. The target was stopped when the subject first reported that the letters were blurry. The subject was then asked if the letters remained blurry or became clear. If the letters remained blurry, the centimeter position of the target on the accommodative rule was recorded. If the print became clear, the target was moved toward the subject until the print first became blurry. This cycle was continued until the subject reported a sustained blur. The target was moved back to the starting position of approximately 20 cm, and the subject was given a 10-s rest period before starting the next measurement. The procedure was repeated two additional times and recorded.

ACKNOWLEDGMENTS

The Convergence Insufficiency and Reading Study (CIRS) group is Michael W. Rouse, Leslie Hyman, Mohamed Hussein, Harold Solan, Eric Borsting, Susan Cotter, David Grisham, Leonard Press, and Mitchell Scheiman.

Received June 26, 2000; revision received December 20, 2001.

REFERENCES

1. Rouse MW, Borsting E, Hyman L, Hussein M, the Convergence Insufficiency and Reading Study (CIRS) group. Pilot study to evaluate convergence insufficiency in a school-aged population. Optom Vis Sci 1995;72(Suppl):218.
2. Rouse MW, Hyman L, Hussein M, Solan H, the Convergence Insufficiency and Reading Study (CIRS) Group. Frequency of convergence insufficiency in optometry clinic settings. Optom Vis Sci 1998;75:88–96.
3. Rouse MW, Borsting E, Hyman L, Hussein M, Cotter SA, Flynn M, Scheiman M, Gallaway M, De Land PN, the Convergence Insufficiency and Reading Study (CIRS) Group. Frequency of convergence insufficiency among fifth and sixth graders. Optom Vis Sci 1999;76:643–9.
4. Streiner DL, Norman GR. Health Measurement Scales: a Practical Guide to Their Development and Use. New York: Oxford University Press, 1995:104–27.
5. Hirsch MJ, Bing LB. The effect of testing method on values obtained for phoria at forty centimeters. Am J Optom Arch Am Acad Optom 1948;25:407–16.
6. Morgan MW. The reliability of clinical measurements with special reference to distance heterophoria. Am J Optom Arch Am Acad Optom 1955;32:167–79.
7. Rainey BB, Schroeder TL, Goss DA, Grosvenor TP. Inter-examiner repeatability of heterophoria tests. Optom Vis Sci 1998;75:719–26.
8. Larson WL. Vergence breaks with a stepping prism. Am J Optom Arch Am Acad Optom 1972;49:569–74.
9. Sheedy JE, Saladin JJ. Validity of diagnostic criteria and case analysis in binocular vision disorders. In: Schor CM, Ciuffreda KJ, eds. Vergence Eye Movements: Basic and Clinical Aspects. Boston: Butterworth, 1983:517–40.
10. Brozek J, Simonson E, Bushard JW, Peterson HJ. Effects of practice and the consistency of repeated measurements of accommodation and vergence. Am J Ophthalmol 1948;31:191–8.
11. Penisten DK, Hofstetter HW, Goss DA. Reliability of rotary prism fusional vergence ranges. Optometry 2001;72:117–22.
12. Feldman JM, Cooper J, Carniglia P, Schiff FM, Skeete JN. Comparison of fusional ranges measured by Risley prisms, vectograms, and computer orthopter. Optom Vis Sci 1989;66:375–82.
13. Rosenfield M, Cohen AS. Repeatability of clinical measurements of the amplitude of accommodation. Ophthalmic Physiol Opt 1996;16:247–9.
14. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986;1:307–10.
15. Chen AH, O’Leary DJ. Validity and repeatability of the modified push-up method for measuring the amplitude of accommodation. Clin Exper Optom 1998;81:63–71.
16. Fleiss JL. The Design and Analysis of Clinical Experiments. New York: Wiley, 1986.
17. D’Augostino RB, Stevens MA. Tests for the Normal Distribution. New York: Marcel Dekker, 1986:367–419.
18. Zadnik K, Mutti DO, Bullimore MA. Use of statistics for comparing two measurement methods. Optom Vis Sci 1994;71:539–41.
19. Bland JM, Altman DG. A note on the use of the intraclass correlation coefficient in the evaluation of agreement between two methods of measurement. Comput Biol Med 1990;20:337–40.
20. Shoukri MM, Pause CA. Statistical Methods for Health Sciences, 2nd ed. Boca Raton, FL: CRC Press, 1999:19–42.

Michael W. Rouse
Southern California College of Optometry
2575 Yorba Linda Blvd.
Fullerton, California 92831
e-mail: mrouse@scco.edu