A remote eye tracker is evaluated for its utility in measuring ocular vergence. Vergence responses to anaglyphic stereoscopic stimuli are then compared with responses to monoscopic presentation on a standard display. Results indicate a highly significant effect of anaglyphic stereoscopic display on ocular vergence when viewing a stereoscopic calibration video: significant convergence was measured for stimuli fused in the anterior image plane.
Figure 1: Binocular disparity of point P with respect to fixation point F, at viewing distance D with (assumed) interocular distance a [Howard and Rogers 2002]. Given the binocular gaze point coordinates on the image plane (xl, yl) and (xr, yr), the distance between F and P, ∆d, is obtained via triangle similarity. Assuming symmetrical vergence and small disparities, angular disparity η is derived (see text).

The distance ∆d between F and P is obtained via triangle similarity:

    a / (D + ∆d) = (xr − xl) / ∆d  ⇒  ∆d = (xr − xl) D / (a − (xr − xl)).

For objects in the median plane of the head, φl = φr, so the total disparity η is 2φ. By elementary geometry, φ = θF − θP [Howard and Rogers 2002]. If the interocular distance is a,

    tan(θP / 2) = a / (2(D + ∆d))   and   tan(θF / 2) = a / (2D).

For small angles, the tangent of an angle is equal to the angle in radians. Therefore,

    η = 2φ ≈ a / (D + ∆d) − a / D,   or   η ≈ −a∆d / (D² + D∆d).   (1)

Since for objects within Panum's fusional area ∆d is usually small by comparison with D, we can write

    η ≈ −a∆d / D².   (2)

Thus, for symmetrical vergence and small disparities, the disparity between the images of a small object is approximately proportional to the distance in depth of the object from the fixation point.

In the current analysis, the following assumptions are made for simplicity. Interocular distance is assumed to be the average separation between the eyes (a = 6.3 cm), i.e., the average for all people regardless of gender, although this can vary considerably (ranging from about 5.3 to 7.3 cm) [Smith and Atchison 1997]. Vergence is assumed to be symmetrical (although in our experiments no chin rest was used and so the head was free to rotate, violating this assumption; for a derivation of angular disparity under this condition see Howard and Rogers [2002]). Viewing distance is assumed to be D = 50 cm, the operating range of the eye tracker, although the tracker allows head movement within a 30 × 15 × 20 cm volume (see below). It is also important to note that the above derivation of angular disparity (vergence) assumes zero binocular disparity when viewing point F at the screen surface. Empirical measurements during calibration show that this assumption does not always hold, or it may be obscured by the noise inherent in the eye tracker's position signal.

2 Methodology

A within-subjects experiment was conducted to test vergence measurement. One of nine video clips served as the independent variable in the analysis, a subset of a larger study. Two versions of the video clip were used to test vergence response. The only difference between the two versions was that one was rendered in a standard two-dimensional monoscopic format while the other was rendered in a red-cyan anaglyphic stereoscopic format. Ocular angular vergence response served as the dependent variable. The operational hypothesis was simply that a significant difference in vergence response would be observed between watching the monoscopic and stereoscopic versions of the video.

2.1 Apparatus

A Tobii ET-1750 video-based corneal reflection (binocular) eye tracker was used for real-time gaze coordinate measurement (and recording). The eye tracker operates at a sampling rate of 50 Hz with an accuracy typically better than 0.3° over a ±20° horizontal and vertical range using the pupil/corneal reflection difference [Tobii Technology AB 2003] (in practice, measurement error ranges roughly ±10 pixels). The eye tracker's 17″ LCD monitor was set to 1280 × 1024 resolution and the stimulus display was maximized to cover the entire screen (save for its title bar at the top of the screen). The eye tracking server ran on a dual 2.0 GHz AMD Opteron 246 PC (2 GB RAM) running Windows XP. The client display application ran on a 2.2 GHz AMD Opteron 148 Sun Ultra 20 running the CentOS operating system. The client/server PCs were connected via 1 Gb Ethernet (through a switch on the same subnet). Participants initially sat at a viewing distance of about 50 cm from the monitor, the tracker video camera's focal length.

2.2 Participants

Twelve college students (9 M, 3 F; ages 22–27) participated in the study, recruited verbally on a volunteer basis. Only three participants had previously seen a stereoscopic film. Participants were not screened for color blindness or impaired depth perception.

2.3 Stimulus

The anaglyphic stereograms used in this study were created with a red image for the left eye and a cyan image for the right eye. Likewise, viewers of these images wore glasses with a red lens in front of the left eye and a cyan lens in front of the right. The distance between corresponding pixels in the red and cyan images creates an illusion of depth when the composite image is fused by the viewer.

Eight anaglyphic video clips were shown to participants, with a ninth rendered traditionally (monoscopically). All nine videos were computer-generated. The first of the anaglyphic videos was of a
[Figure 2 panels: (a) back visual plane; (b) middle visual plane; (c) front visual plane.]

Figure 2: Calibration video, showing a white disk visiting each of the four corners and the center of each visual plane, along with a viewer's gaze point (represented by a small rectangle) during visualization: (a) the disk appears to sink into the screen; (b) the disk appears at the monocular, middle image plane; (c) the disk appears to "pop out" of the screen. The size of the gaze point rectangle is scaled to visually depict horizontal disparity. A smaller rectangle, as in (a), represents divergence, while a larger rectangle, as in (c), represents convergence.
roving disk in three-dimensional space, as shown in Figure 2. The purpose of this video was calibration of vergence normalization, as the stereoscopic depth of the roving disk matched the depth of the other video clips. The goal was to elicit divergent eye movements as the disk sank into and beyond the monocular image plane, and to elicit convergent eye movements as the disk passed through and in front of the monocular image plane. The roving disk moves within a cube, texture-mapped with a checkerboard texture to provide additional depth cues. The disk starts moving in the back plane. After stopping at all four corners and the center, the disk moves closer to the viewer, to the middle plane. The disk again visits each of the four corners and the center, before translating to the front plane, where again each of the four corners and the center is visited. Only the 40 s calibration video clip is relevant to the analysis given in this paper. The video was always the first viewed by each participant.

2.4 Procedure

Demographic information consisting of the age and gender of each participant was collected. Each participant filled out a short pre-test questionnaire regarding his or her familiarity with anaglyphic stereographs. A quick (two-dimensional) calibration of the eye tracker was performed by having participants visually follow a roving dot between nine different locations on the screen. After 2D calibration, participants were presented with a second, this time 3D (stereoscopic), calibration video of the roving disk, translating in 2D as well as in depth. Next, participants were shown three short videos, one of the long videos (stereo or mono), three more short videos, and once again the long video (stereo or mono). The order of presentation of the short videos followed a Latin square rotation. The order of the long videos was toggled for viewers such that all odd-numbered viewers saw the stereoscopic version first and all even-numbered viewers saw the monoscopic version first.

Participants were instructed to keep looking at the center of the roving calibration disk as it moved on the screen, during each of the 2D and 3D calibrations. No other instructions were given to participants as they viewed the other 9 stimulus video clips (a "free viewing" task was implied).

2.5 Data Stream Synchronization

A major issue concerning gaze data analysis over dynamic media such as video is synchronization. It is imperative that gaze data be properly aligned with the video frames over which eye movements were recorded (this is needed for subsequent visualization). Neither data stream necessarily begins at the same time, nor are the two streamed at the same data rate. The video display library used (xine-lib) provides media player style functionality (versus video processing), and as such is liable to drop frames following video stream decompression in order to maintain the desired playback speed. All videos used in the experiment were created to run at 25 frames per second. If no video frames are dropped, synchronization is straightforward, since the eye tracker records data at 50 Hz, and relies mainly on identification of a common start point (the eye tracker provides a timestamp that can be used for this purpose, assuming both streams are initiated at about the same time, e.g., as controlled by the application).

For binocular vergence analysis, eye movement analysis is required both to smooth the data, reducing inherent noise due to eye movement jitter, and to identify fixations within the eye movement data stream. A simple and popular approach to denoising is the use of a smoothing (averaging) filter. For visualization playback, the coordinate used is the linear average of the filter (of width 10 frames, in the present case).

The use of a smoothing filter can introduce lag, depending on the filter's temporal position within the data stream. If the filter is aligned to compute the average of the last ten gaze data points, and the difference in timestamps between successive gaze points is 20 ms, then the average gaze coordinate from the filter summation will be 100 ms behind. To alleviate this lag, the filter is temporally shifted forward by half its length.

Care is taken in the filtering summation to ignore invalid gaze data points. The eye tracker will, on occasion (e.g., due to blinks or other loss of the eye image in the tracker's cameras), flag gaze data as invalid (a validity code is provided by the eye tracking server). In addition to the validity code being set, the gaze data is set to (−1, −1), which, if folded into the smoothing filter's summation, would inappropriately skew the gaze centroid. Invalid data is therefore ignored in the filter's summation, resulting in potentially fewer gaze points considered for the average calculation. To avoid the problem of the filter being given only a few or no valid points with which to compute the average, a threshold of 80% is used: if more than 80% of the filter's data is invalid, the filter's output is flagged as invalid and is not drawn.
3 Results

Following recording of raw eye movement data, the collected gaze points (xl, yl), (xr, yr) and timestamp t were analyzed to detect fixations in the data stream. The angular disparity between the left and right gaze coordinates, given by equation (1), was calculated for every gaze point that occurred during a fixation, as identified by an implementation of the position-variance approach, with a spatial deviation threshold of 0.029 and the number of samples set to 10. Note that the gaze data used in this analysis is normalized, hence the deviation threshold specified is in dimensionless units, although it is typically expressed in pixels or degrees visual angle. The fixation analysis code is freely available on the web.1 The angular disparity serves as the dependent variable in the analysis of the experiment.

Averaging across each of the three calibration segments, when the calibration stimulus was shown in each of the back, mid, and front stereo planes, with image plane and subject acting as fixed factors, a repeated-measures (within-subjects) one-way ANOVA shows a highly significant effect of stereo plane on vergence response (F(2,22) = 8.15, p < 0.01).2 Pairwise comparisons using t-tests with pooled SD indicate highly significant differences between disparities measured when viewing the front plane and each of the back and mid planes (p < 0.01), but not between the back and mid planes, as shown in Figure 3.

Figure 3: Binocular disparity when viewing the stereoscopic calibration video, averaging over all viewers within ∼40 s viewing time split in thirds, i.e., when the calibration dot was seen in the back plane, the mid plane, and then in the front plane. (Bar plot: "Mean Disparity During Calibration"; mean disparity in deg. visual angle, with SE, by stereo plane: Back, Mid, Front.)

Acknowledgments

This work was supported in part by IIS grant #0915085 from the National Science Foundation (HCC: Small: Eye Movement in Stereoscopic Displays, Implications for Visualization).

References

BÜTTNER-ENNEVER, J. A., Ed. 1988. Neuroanatomy of the Oculomotor System. Reviews of Oculomotor Research, vol. II. Elsevier Press, Amsterdam, Holland.

ESSIG, K., POMPLUN, M., AND RITTER, H. 2004. Application of a Novel Neural Approach to 3D Gaze Tracking: Vergence Eye-Movements in Autostereograms. In Proceedings of the Twenty-Sixth Annual Meeting of the Cognitive Science Society, K. Forbus, D. Gentner, and T. Regier, Eds. Cognitive Science Society, 357–362.

HOLLIMAN, N., FRONER, B., AND LIVERSEDGE, S. 2007. An Application Driven Comparison of Depth Perception on Desktop 3D Displays. In Stereoscopic Displays and Applications XVIII. SPIE.

HOWARD, I. P. 2002. Seeing in Depth. Vol. I: Basic Mechanisms. I Porteous, University of Toronto Press, Thornhill, ON, Canada.

HOWARD, I. P., AND ROGERS, B. J. 2002. Seeing in Depth. Vol. II: Depth Perception. I Porteous, University of Toronto Press, Thornhill, ON, Canada.

JULESZ, B. 1964. Binocular Depth Perception without Familiarity Cues. Science 145, 3630 (Jul), 356–362.

KWON, Y.-M., AND SHUL, J. K. 2006. Experimental Researches on Gaze-Based 3D Interaction to Stereo Image Display. In Edutainment, Z. Pan et al., Ed. Springer-Verlag, Berlin, 1112–1120. LNCS 3942.

LIPTON, L. 1982. Foundations of the Stereoscopic Cinema: A Study in Depth. Van Nostrand Reinhold Company Inc., New York, NY. ISBN 0-442-24724-9, URL: <http://www.stereoscopic.org>.

SHAKHNOVICH, A. R. 1977. The Brain and Regulation of Eye Movement. Plenum Press, New York, NY.

SMITH, G., AND ATCHISON, D. A. 1997. The Eye and Visual Optical Instruments. Cambridge Univ. Press, Cambridge, UK.

TOBII TECHNOLOGY AB. 2003. Tobii ET-17 Eye-tracker Product Description. (Version 1.1).
4 Conclusion
Results suggest that vergence is more active when viewing stereo-
scopic imagery than when no stereo disparity is present. Moreover,
a commercially available binocular eye tracker can be used to mea-
sure vergence via estimation of horizontal disparity between the left
and right gaze points recorded when fixating.
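The disparity-based estimate just described reduces to a few lines of arithmetic from equations (1) and (2). In this Python sketch, treating xl and xr as horizontal gaze coordinates already converted to centimeters on the screen plane is an assumption, as are the function names:

```python
A = 6.3   # assumed interocular distance, cm (Section 1)
D = 50.0  # assumed viewing distance, cm

def depth_offset(xl, xr, a=A, d=D):
    """Depth offset ∆d of the fused point from the screen plane,
    via triangle similarity: ∆d = (xr - xl) * D / (a - (xr - xl))."""
    s = xr - xl  # horizontal screen disparity, cm
    return s * d / (a - s)

def angular_disparity(xl, xr, a=A, d=D):
    """Angular disparity in radians, per eq. (2): η ≈ -a∆d / D²."""
    return -a * depth_offset(xl, xr, a, d) / (d * d)
```

With these sign conventions, uncrossed disparity (xr > xl) places the fused point behind the screen and yields a negative η, consistent with divergence.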
1 The position-variance fixation analysis code was originally made available by LC Technologies. The C++ interface and implementation ported from C by Mike Ashmore are available at: <http://andrewd.ces.clemson.edu/courses/cpsc412/fall08>.
2 With sphericity assumed by R, the statistical package used throughout.
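The position-variance approach that footnote 1 refers to can be sketched roughly as follows. This Python sketch is not the LC Technologies code: the windowing strategy, function names, and use of positional standard deviation about the centroid are assumptions, chosen to be consistent with the deviation threshold of 0.029 (normalized units) and the 10-sample minimum given in Section 3:

```python
def _deviation(xs, ys):
    """Standard deviation of gaze positions about their centroid."""
    n = len(xs)
    cx, cy = sum(xs) / n, sum(ys) / n
    return (sum((x - cx) ** 2 + (y - cy) ** 2
                for x, y in zip(xs, ys)) / n) ** 0.5

def detect_fixations(xs, ys, threshold=0.029, min_samples=10):
    """Position-variance fixation detection (sketch).

    A fixation is a run of at least min_samples consecutive gaze
    samples whose positional deviation stays within threshold;
    returns (start, end) index pairs, end exclusive.
    """
    fixations = []
    start = 0
    while start + min_samples <= len(xs):
        end = start + min_samples
        if _deviation(xs[start:end], ys[start:end]) > threshold:
            start += 1  # no fixation starts here; slide the window
            continue
        # grow the fixation while the deviation stays below threshold
        while end < len(xs) and _deviation(xs[start:end + 1],
                                           ys[start:end + 1]) <= threshold:
            end += 1
        fixations.append((start, end))
        start = end
    return fixations
```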