A remote eye tracker is evaluated for its utility in measuring ocular vergence. Vergence responses to anaglyphic stereoscopic stimuli are then compared with responses to monoscopic presentation on a standard display. Results indicate a highly significant effect of anaglyphic stereoscopic display on ocular vergence when viewing a stereoscopic calibration video: significant convergence was measured for stimuli fused in the anterior image plane.
Figure 1: Binocular disparity of point P with respect to fixation point F, at viewing distance D with (assumed) interocular distance a [Howard and Rogers 2002]. Given the binocular gaze point coordinates on the image plane (xl, yl) and (xr, yr), the distance between F and P, ∆d, is obtained via triangle similarity. Assuming symmetrical vergence and small disparities, angular disparity η is derived (see text).

The distance ∆d between F and P is obtained via triangle similarity:

    a / (D + ∆d) = (xr − xl) / ∆d  ⇒  ∆d = (xr − xl) D / (a − (xr − xl)).

For objects in the median plane of the head, φl = φr, so the total disparity η is 2φ. By elementary geometry, φ = θF − θP [Howard and Rogers 2002]. If the interocular distance is a,

    tan(θP / 2) = a / (2(D + ∆d))   and   tan(θF / 2) = a / (2D).

For small angles, the tangent of an angle is equal to the angle in radians. Therefore,

    η = 2φ ≈ a / (D + ∆d) − a / D,   or   η ≈ −a∆d / (D² + D∆d).   (1)

Since for objects within Panum's fusional area ∆d is usually small by comparison with D, we can write

    η ≈ −a∆d / D².   (2)

Thus, for symmetrical vergence and small disparities, the disparity between the images of a small object is approximately proportional to the distance in depth of the object from the fixation point.

In the current analysis, the following assumptions are made for simplicity. Interocular distance is assumed to be the average separation between the eyes (a = 6.3 cm), i.e., the average for all people regardless of gender, although this can vary considerably (ranging from about 5.3 to 7.3 cm) [Smith and Atchison 1997]. Vergence is assumed to be symmetrical (although in our experiments no chin rest was used and so the head was free to rotate, violating this assumption; for a derivation of angular disparity under this condition see Howard and Rogers [2002]). Viewing distance is assumed to be D = 50 cm, the operating range of the eye tracker, although the tracker allows head movement within a 30 × 15 × 20 cm volume (see below). It is also important to note that the above derivation of angular disparity (vergence) assumes zero binocular disparity when viewing point F at the screen surface. Empirical measurements during calibration show that this assumption does not always hold, or it may be obscured by the noise inherent in the eye tracker's position signal.

2 Methodology

A within-subjects experiment was conducted to test vergence measurement. One of nine video clips served as the independent variable in the analysis, a subset of a larger study. Two versions of the video clip were used to test vergence response. The only difference between the two versions was that one was rendered in a standard two-dimensional monoscopic format while the other was rendered in a red-cyan anaglyphic stereoscopic format. Ocular angular vergence response served as the dependent variable. The operational hypothesis was simply that a significant difference in vergence response would be observed between watching the monoscopic and stereoscopic versions of the video.

2.1 Apparatus

A Tobii ET-1750 video-based corneal reflection (binocular) eye tracker was used for real-time gaze coordinate measurement (and recording). The eye tracker operates at a sampling rate of 50 Hz with an accuracy typically better than 0.3° over a ±20° horizontal and vertical range using the pupil/corneal reflection difference [Tobii Technology AB 2003] (in practice, measurement error ranges roughly ±10 pixels). The eye tracker's 17″ LCD monitor was set to 1280 × 1024 resolution and the stimulus display was maximized to cover the entire screen (save for its title bar at the top of the screen). The eye tracking server ran on a dual 2.0 GHz AMD Opteron 246 PC (2 GB RAM) running Windows XP. The client display application ran on a 2.2 GHz AMD Opteron 148 Sun Ultra 20 running the CentOS operating system. The client/server PCs were connected via 1 Gb Ethernet (through a switch on the same subnet). Participants initially sat at a viewing distance of about 50 cm from the monitor, the tracker video camera's focal length.

2.2 Participants

Twelve college students (9 M, 3 F; ages 22–27) participated in the study, recruited verbally on a volunteer basis. Only three participants had previously seen a stereoscopic film. Participants were not screened for color blindness or impaired depth perception.

2.3 Stimulus

The anaglyphic stereograms used in this study were created with a red image for the left eye and a cyan image for the right eye. Likewise, viewers of these images wore glasses with a red lens in front of the left eye and a cyan lens in front of the right. The distance between corresponding pixels in the red and cyan images creates an illusion of depth when the composite image is fused by the viewer.

Eight anaglyphic video clips were shown to participants, with a ninth rendered traditionally (monoscopically). All nine videos were computer-generated. The first of the anaglyphic videos was of a
[Figure 2 panels: (a) back visual plane; (b) middle visual plane; (c) front visual plane.]

Figure 2: Calibration video, showing a white disk visiting each of the four corners and the center of each visual plane, along with a viewer's gaze point (represented by a small rectangle) during visualization: (a) the disk appears to sink into the screen; (b) the disk appears at the monocular, middle image plane; (c) the disk appears to "pop out" of the screen. The size of the gaze point rectangle is scaled to visually depict horizontal disparity. A smaller rectangle, as in (a), represents divergence, while a larger rectangle, as in (c), represents convergence.
roving disk in three-dimensional space, as shown in Figure 2. The purpose of this video was calibration of vergence normalization, as the stereoscopic depth of the roving disk matched the depth of the other video clips. The goal was to elicit divergent eye movements as the disk sank into and beyond the monocular image plane, and to elicit convergent eye movements as the disk passed through and in front of the monocular image plane. The roving disk moves within a cube, texture-mapped with a checkerboard texture to provide additional depth cues. The disk starts moving in the back plane. After stopping at all four corners and the center, the disk moves closer to the viewer, to the middle plane. The disk again visits each of the four corners and the center, before translating to the front plane, where again each of the four corners and the center is visited. Only the 40 s calibration video clip is relevant to the analysis given in this paper. The video was always the first viewed by each participant.

2.4 Procedure

Demographic information consisting of the age and gender of each participant was collected. Each participant filled out a short pre-test questionnaire regarding his or her familiarity with anaglyphic stereographs. A quick (two-dimensional) calibration of the eye tracker was performed by having participants visually follow a roving dot between nine different locations on the screen. After 2D calibration, participants were presented with a second, this time 3D (stereoscopic), calibration video of the roving disk, translating in 2D as well as in depth. Next, participants were shown three short videos, one of the long videos (stereo or mono), three more short videos, and once again the long video (stereo or mono). The order of presentation of the short videos followed a Latin square rotation. The order of the long videos was toggled for viewers such that all odd-numbered viewers saw the stereoscopic version first and all even-numbered viewers saw the monoscopic version first.

Participants were instructed to keep looking at the center of the roving calibration disk as it moved on the screen, during each of the 2D and 3D calibrations. No other instructions were given to participants as they viewed the other 9 stimulus video clips (a "free viewing" task was implied).

2.5 Data Stream Synchronization

A major issue concerning gaze data analysis over dynamic media such as video is synchronization. It is imperative that gaze data be properly aligned with the video frames over which eye movements were recorded (this is needed for subsequent visualization). Neither data stream necessarily begins at the same time, nor are the two streamed at the same data rate. The video display library used (xine-lib) provides media player style functionality (versus video processing), and as such is liable to drop frames following video stream decompression in order to maintain the desired playback speed. All videos used in the experiment were created to run at 25 frames per second. If no video frames are dropped, synchronization is straightforward, since the eye tracker records data at 50 Hz, and relies mainly on identification of a common start point (the eye tracker provides a timestamp that can be used for this purpose, assuming both streams are initiated at about the same time, e.g., as controlled by the application).

For binocular vergence analysis, eye movement analysis is required both to smooth the data, reducing inherent noise due to eye movement jitter, and to identify fixations within the eye movement data stream. A simple and popular approach to denoising is the use of a smoothing (averaging) filter. For visualization playback, the coordinate used is the linear average of the filter (of width 10 frames, in the present case).

The use of a smoothing filter can introduce lag, depending on the filter's temporal position within the data stream. If the filter is aligned to compute the average of the last ten gaze data points, and the difference in timestamps between successive gaze points is 20 ms, then the average gaze coordinate from the filter summation will be 100 ms behind. To alleviate this lag, the filter is temporally shifted forward by half its length.

Care is taken in the filtering summation to ignore invalid gaze data points. The eye tracker will, on occasion (e.g., due to blinks or other loss of the eye image in the tracker's cameras), flag gaze data as invalid (a validity code is provided by the eye tracking server). In addition to the validity code being set, the gaze data is set to (−1, −1), which, if folded into the smoothing filter's summation, would inappropriately skew the gaze centroid. Invalid data is therefore ignored in the filter's summation, resulting in potentially fewer gaze points considered for the average calculation. To avoid the problem of the filter being given only a few or no valid points with which to compute the average, a threshold of 80% is used: if more than 80% of the filter's data is invalid, the filter's output is flagged as invalid and is not drawn.
3 Results

Following recording of raw eye movement data, the collected gaze points (xl, yl), (xr, yr) and timestamp t were analyzed to detect fixations in the data stream. The angular disparity between the left and right gaze coordinates, given by equation (1), was calculated for every gaze point that occurred during a fixation, as identified by an implementation of the position-variance approach, with a spatial deviation threshold of 0.029 and the number of samples set to 10. Note that the gaze data used in this analysis is normalized, hence the deviation threshold specified is in dimensionless units, although it is typically expressed in pixels or degrees visual angle. The fixation analysis code is freely available on the web.1 The angular disparity serves as the dependent variable in the analysis of the experiment.

Averaging across each of the three calibration segments, when the calibration stimulus was shown in each of the back, mid, and front stereo planes, with image plane and subject acting as fixed factors, a repeated-measures (within-subjects) one-way ANOVA shows a highly significant effect of stereo plane on vergence response (F(2,22) = 8.15, p < 0.01).2 Pairwise comparisons using t-tests with pooled SD indicate highly significant differences between disparities measured when viewing the front plane and each of the back and mid planes (p < 0.01), but not between the back and mid planes, as shown in Figure 3.

Figure 3: Binocular disparity when viewing the stereoscopic calibration video, averaging over all viewers within ∼40 s viewing time split in thirds, i.e., when the calibration dot was seen in the back plane, the mid plane, and then in the front plane. (Bar plot: "Mean Disparity During Calibration"; mean disparity in deg. visual angle, with SE, by stereo plane: Back, Mid, Front.)

Acknowledgments

This work was supported in part by IIS grant #0915085 from the National Science Foundation (HCC: Small: Eye Movement in Stereoscopic Displays, Implications for Visualization).

References

BÜTTNER-ENNEVER, J. A., Ed. 1988. Neuroanatomy of the Oculomotor System. Reviews of Oculomotor Research, vol. II. Elsevier Press, Amsterdam, Holland.

ESSIG, K., POMPLUN, M., AND RITTER, H. 2004. Application of a Novel Neural Approach to 3D Gaze Tracking: Vergence Eye-Movements in Autostereograms. In Proceedings of the Twenty-Sixth Annual Meeting of the Cognitive Science Society, K. Forbus, D. Gentner, and T. Regier, Eds. Cognitive Science Society, 357–362.

HOLLIMAN, N., FRONER, B., AND LIVERSEDGE, S. 2007. An Application Driven Comparison of Depth Perception on Desktop 3D Displays. In Stereoscopic Displays and Applications XVIII. SPIE.

HOWARD, I. P. 2002. Seeing in Depth. Vol. I: Basic Mechanisms. I Porteous, University of Toronto Press, Thornhill, ON, Canada.

HOWARD, I. P., AND ROGERS, B. J. 2002. Seeing in Depth. Vol. II: Depth Perception. I Porteous, University of Toronto Press, Thornhill, ON, Canada.

JULESZ, B. 1964. Binocular Depth Perception without Familiarity Cues. Science 145, 3630 (Jul), 356–362.

KWON, Y.-M., AND SHUL, J. K. 2006. Experimental Researches on Gaze-Based 3D Interaction to Stereo Image Display. In Edutainment, Z. Pan et al., Ed. Springer-Verlag, Berlin, 1112–1120. LNCS 3942.

LIPTON, L. 1982. Foundations of the Stereoscopic Cinema: A Study in Depth. Van Nostrand Reinhold Company Inc., New York, NY. ISBN 0-442-24724-9, URL: <http://www.stereoscopic.org>.

SHAKHNOVICH, A. R. 1977. The Brain and Regulation of Eye Movement. Plenum Press, New York, NY.

SMITH, G., AND ATCHISON, D. A. 1997. The Eye and Visual Optical Instruments. Cambridge Univ. Press, Cambridge, UK.

TOBII TECHNOLOGY AB. 2003. Tobii ET-17 Eye-tracker Product Description. (Version 1.1).
4 Conclusion
Results suggest that vergence is more active when viewing stereo-
scopic imagery than when no stereo disparity is present. Moreover,
a commercially available binocular eye tracker can be used to mea-
sure vergence via estimation of horizontal disparity between the left
and right gaze points recorded when fixating.
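The disparity-based estimate just described reduces to a few lines of arithmetic from equations (1) and (2). In this Python sketch, treating xl and xr as horizontal gaze coordinates already converted to centimeters on the screen plane is an assumption, as are the function names:

```python
A = 6.3   # assumed interocular distance, cm (Section 1)
D = 50.0  # assumed viewing distance, cm

def depth_offset(xl, xr, a=A, d=D):
    """Depth offset ∆d of the fused point from the screen plane,
    via triangle similarity: ∆d = (xr - xl) * D / (a - (xr - xl))."""
    s = xr - xl  # horizontal screen disparity, cm
    return s * d / (a - s)

def angular_disparity(xl, xr, a=A, d=D):
    """Angular disparity in radians, per eq. (2): η ≈ -a∆d / D²."""
    return -a * depth_offset(xl, xr, a, d) / (d * d)
```

With these sign conventions, uncrossed disparity (xr > xl) places the fused point behind the screen and yields a negative η, consistent with divergence.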
1 The position-variance fixation analysis code was originally made available by LC Technologies. The C++ interface and implementation ported from C by Mike Ashmore are available at: <http://andrewd.ces.clemson.edu/courses/cpsc412/fall08>.
2 With sphericity assumed by R, the statistical package used throughout.
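The position-variance approach that footnote 1 refers to can be sketched roughly as follows. This Python sketch is not the LC Technologies code: the windowing strategy, function names, and use of positional standard deviation about the centroid are assumptions, chosen to be consistent with the deviation threshold of 0.029 (normalized units) and the 10-sample minimum given in Section 3:

```python
def _deviation(xs, ys):
    """Standard deviation of gaze positions about their centroid."""
    n = len(xs)
    cx, cy = sum(xs) / n, sum(ys) / n
    return (sum((x - cx) ** 2 + (y - cy) ** 2
                for x, y in zip(xs, ys)) / n) ** 0.5

def detect_fixations(xs, ys, threshold=0.029, min_samples=10):
    """Position-variance fixation detection (sketch).

    A fixation is a run of at least min_samples consecutive gaze
    samples whose positional deviation stays within threshold;
    returns (start, end) index pairs, end exclusive.
    """
    fixations = []
    start = 0
    while start + min_samples <= len(xs):
        end = start + min_samples
        if _deviation(xs[start:end], ys[start:end]) > threshold:
            start += 1  # no fixation starts here; slide the window
            continue
        # grow the fixation while the deviation stays below threshold
        while end < len(xs) and _deviation(xs[start:end + 1],
                                           ys[start:end + 1]) <= threshold:
            end += 1
        fixations.append((start, end))
        start = end
    return fixations
```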