Analysis of recordings made by a wearable eye tracker is complicated by video stream synchronization, pupil coordinate mapping, eye movement analysis, and tracking of dynamic Areas Of Interest (AOIs) within the scene. In this paper a semi-automatic system is developed to help automate these processes. Synchronization is accomplished via side-by-side video playback control. A deformable eye template and calibration dot marker allow reliable initialization via simple drag and drop, as well as a user-friendly way to correct the algorithm when it fails. Specifically, drift may be corrected by nudging the detected pupil center to the appropriate coordinates. In a case study, the impact of surrogate nature views on physiological health and perceived well-being is examined via analysis of gaze over images of nature. A match-moving methodology was developed to track AOIs for this particular application, but it is applicable to similar future studies.
Figure 3: Screen flash for synchronization visible as eye reflection.
Figure 4: Initialization of pupil/limbus and dot tracking.
No IR illumination is used, simplifying the hardware and reducing cost. The eye tracker functions in environments with significant ambient IR illumination (e.g., outdoors on a sunny day; see Ryan et al. [2008]). However, lacking a stable corneal reflection and visible spectrum filtering, video processing is more challenging. Specular reflections often occlude the limbus, and contrast at the pupil boundary is inconsistent.

3.1 Stimulus for Video Processing

For video synchronization and calibration, a laptop computer is placed in front of the participant. To synchronize the two videos, a simple program that flashes the display several times is executed. Next, a roving dot is displayed for calibration purposes. The participant is asked to visually track the dot as it moves. The laptop display is then flashed again to signify the end of calibration. For good calibration the laptop display should appear entirely within the scene image frame, and should span most of the frame. After calibration the laptop is moved away and the participant is free to view the scene normally. After a period of time (in this instance about two minutes) the recording is stopped and video collection is complete. All subsequent processing is then carried out offline. Note that during recording it is impossible to judge camera alignment. Poor camera alignment is the single greatest impediment to successful data processing.

3.2 Synchronization

Video processing begins with synchronization. Synchronization is necessary because the two cameras might not begin recording at precisely the same time. This situation would be alleviated if the cameras could be synchronized via hardware or software control (e.g., via IEEE 1394 bus control). In the present case, no such mechanism was available. As suggested previously [Li and Parkhurst 2006], a flash of light visible in both videos is used as a marker. Using the marker, an offset necessary for proper frame alignment is established. To find these marker locations in the two video streams, both are displayed side by side, each with its own playback control. The playback speed is adjustable in forward and reverse directions. Single frame advance is also possible. To synchronize the videos, the playback controls are used to manually advance/rewind each video to the last frame where the light flash is visible (see Figure 3).

3.3 Calibration & Gaze Point Mapping

Pupil center coordinates are produced by a search algorithm executed over eye video frames. The goal is to map the pupil center to gaze coordinates in the corresponding scene video frame. Calibration requires sequential viewing of a set of spatially distributed calibration points with known scene coordinates. Once calibration is complete the eye is tracked and gaze coordinates are computed for the remainder of the video. A traditional video-oculography approach [Pelz et al. 2000; Li et al. 2006] calculates the point of gaze by mapping the pupil center (x, y) to scene coordinates (sx, sy) via a second-order polynomial [Morimoto and Mimica 2005],

sx = a0 + a1 x + a2 y + a3 xy + a4 x² + a5 y²
sy = b0 + b1 x + b2 y + b3 xy + b4 x² + b5 y².   (1)

The unknown parameters ak and bk are computed via least squares fitting (e.g., see Lancaster and Šalkauskas [1986]).

3.3.1 Initialization of Pupil/Limbus and Dot Tracking

Pupil center in the eye video stream and calibration dot in the scene video stream are tracked by different local search algorithms, both initialized by manually positioning a template over recognizable eye features and a crosshair over the calibration dot. Grip boxes allow for adjustment of the eye template (see Figure 4). During initialization, only one playback control is visible, controlling advancement of both video streams. It may be necessary to advance to the first frame with a clearly visible calibration dot. Subsequent searches exploit temporal coherence by using the previous search result as the starting location.

3.3.2 Dot Tracking

A simple greedy algorithm is used to track the calibration dot. The underlying assumption is that the dot is a set of bright pixels surrounded by darker pixels (see Figure 4). The sum of differences is largest at a bright pixel surrounded by dark pixels. The dot moves from one location to the next in discrete steps determined by the refresh rate of the display. To the human eye this appears as smooth motion, but in a single frame of video it appears as a short trail of multiple dots. To mitigate this effect the image is blurred with a Gaussian smoothing function, increasing the algorithm's tolerance to variations in dot size. In the present application the dot radius was roughly 3 to 5 pixels in the scene image frame.

The dot tracking algorithm begins with an assumed dot location obtained from the previous frame of video, or from initialization. A sum of differences is evaluated over an 8×8 reference window:

∑i ∑j [I(x, y) − I(x − i, y − j)],   −8 < i, j < 8.   (2)

This evaluation is repeated over a 5×5 search field centered at the assumed location (x, y). If the assumed location yields a maximum within the 25 pixel field then the algorithm stops. Otherwise the location with the highest sum of differences becomes the new assumed location and the computation is repeated.

One drawback of this approach is that the dot is not well tracked near the edge of the laptop display. Reducing the search field and reference window allows better discrimination between the dot and display edges while reducing the tolerance to rapid dot movement.

3.4 Pupil/Limbus Tracking

A two-step process is used to locate the limbus (iris-sclera boundary) and hence the pupil center in an eye image.
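The calibration fit of Equation (1) reduces to ordinary least squares on six monomial terms. A minimal numpy sketch (function and variable names are ours, not from the system described here):

```python
import numpy as np

def fit_gaze_map(pupil, scene):
    """Fit Equation (1): second-order polynomial from pupil (x, y)
    to scene (sx, sy) via least squares.
    pupil, scene: (n, 2) arrays of corresponding calibration samples."""
    x, y = pupil[:, 0], pupil[:, 1]
    # design matrix of monomials: 1, x, y, xy, x^2, y^2
    Phi = np.column_stack([np.ones_like(x), x, y, x * y, x**2, y**2])
    a, *_ = np.linalg.lstsq(Phi, scene[:, 0], rcond=None)
    b, *_ = np.linalg.lstsq(Phi, scene[:, 1], rcond=None)
    return a, b

def map_gaze(a, b, x, y):
    """Map one pupil-center sample to scene coordinates."""
    phi = np.array([1.0, x, y, x * y, x * x, y * y])
    return phi @ a, phi @ b
```

With nine or more well-spread calibration points the design matrix is full rank and the six coefficients per axis are uniquely determined.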
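The greedy dot search of Section 3.3.2 can be sketched as a hill climb over the sum-of-differences score of Equation (2); the window and field sizes follow the text (differences over −8 < i, j < 8, a 5×5 search field), while the image and helper names are our own:

```python
import numpy as np

def sum_of_differences(img, x, y, r=7):
    """Equation (2) score: large when (x, y) is a bright pixel
    surrounded by darker ones, over a (2r+1) x (2r+1) window."""
    win = img[y - r:y + r + 1, x - r:x + r + 1]
    return win.size * float(img[y, x]) - float(win.sum())

def track_dot(img, start, r=7, field=2, max_iter=100):
    """Greedy local search over a (2*field+1)^2 field (5x5 here),
    starting from the dot location in the previous frame."""
    x, y = start
    for _ in range(max_iter):
        best = max(
            ((sum_of_differences(img, x + i, y + j, r), x + i, y + j)
             for i in range(-field, field + 1)
             for j in range(-field, field + 1)),
            key=lambda t: t[0])
        if (best[1], best[2]) == (x, y):  # current location is the maximum
            return x, y
        x, y = best[1], best[2]
    return x, y
```

As in the paper, this relies on temporal coherence: the previous frame's result must start the search close enough to the (blurred) dot for the climb to reach it.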
Figure 5: Constrained search for limbic feature points: (a) constrained ray origin and termination point; (b) resultant rays, fitted ellipse, and center. For clarity of presentation only 36 rays are displayed in (b); in practice 360 feature points are identified.
Figure 6: Display of fitted ellipse and computed gaze point.
First, feature points are detected. Second, an ellipse is fit to the feature points. The ellipse center is a good estimate of the pupil center.

3.4.1 Feature Detection

The purpose of feature detection is to identify point locations on the limbus. We use a technique similar to Starburst [Li et al. 2005]. A candidate feature point is found by casting a ray R away from an origin point O and terminating the ray as it exits a dark region. We determine if the ray is exiting a dark region by checking the gradient magnitude collinear with the ray. The location with maximum collinear gradient component max ∇ is recorded as a feature point. Starburst used a fixed threshold value rather than the maximum and did not constrain the length of the rays.

Consistent and accurate feature point identification and selection is critical for stable and accurate eye tracking. Erroneous feature points are often located at the edges of the pupil, eyelid, or at a specular reflection. To mitigate these effects the feature point search area is constrained by further exploiting temporal coherence. The limbic boundary is not expected to move much from one frame to the next; therefore it is assumed that feature points will be near the ellipse E identified in the previous frame. If P is the intersection of ray R and ellipse E, the search is constrained according to:

max ∇(O + α(P − O) : 0.8 < α < 1.2),   (3)

as depicted in Figure 5. For the first frame in the video we use the eye model manually aligned at initialization to determine P.

3.4.2 Ellipse Fitting and Evaluation

Ellipses are fit to the set of feature points using linear least squares minimization (e.g., [Lancaster and Šalkauskas 1986]). This method will generate ellipses even during blinks when no valid ellipse is attainable. In order to detect these invalid ellipses we implemented an ellipse evaluation method.

Each pixel that the ellipse passes through is labeled as acceptable or not depending upon the magnitude and direction of the gradient at that pixel. The percentage of acceptable pixels is computed and included in the output as a confidence measure.

3.4.3 Recovery From Failure

The ellipse fitting algorithm occasionally fails to identify a valid ellipse due to blinks or other occlusions. Reliance on temporal coherence can prevent the algorithm from recovering from such situations. To mitigate this problem we incorporated both manual and automatic recovery strategies. Automatic recovery relies on ellipse evaluation: if an ellipse evaluates poorly, it is not used to constrain the search for feature points in the subsequent frame. Instead, we revert to using the radius of the eye model as determined at initialization, in conjunction with the center of the last good ellipse. Sometimes this automatic recovery is insufficient to provide a good fit, however. Manual recovery is provided by displaying each fitted ellipse on the screen. If the user observes drift in the computed ellipse, the center may be nudged to the correct location using a simple drag and drop action.

These strategies are analogous to traditional keyframing operations, e.g., when match-moving. If a feature tracker fails to track a given pixel pattern, manual intervention is required at specific frames. The result is a semi-automatic combination of manual trackbox positioning and automatic trackbox translation. Although not as fast as a fully automatic approach, this is still considerably better than the fully manual, frame-by-frame alternative. A screenshot of the user interface is shown in Figure 6.

3.4.4 Tracking Accuracy

The DejaView camera has approximately a 60° field of view, with video resolution of 320×240. Therefore a simple multiplication by 0.1875 converts our measurement in pixels of Euclidean distance between gaze point and calibration coordinates to degrees visual angle. Using this metric, the eye tracker's horizontal accuracy is better than 2°, on average [Ryan et al. 2008]. Vertical and horizontal accuracy is roughly equivalent.

3.5 Fixation Detection

After mapping eye coordinates to scene coordinates via Equation (1), the collected gaze points and timestamps x = (x, y, t) are analyzed to detect fixations in the data stream. Prior to this type of analysis, raw eye movement data is not very useful as it represents a conjugate eye movement signal, composed of a rapidly changing component (generated by fast saccadic eye movements) and a comparatively stationary component representative of fixations, the eye movements generally associated with cognitive processing.

There are two leading methods for detecting fixations in the raw eye movement data stream: the position-variance and velocity-based approaches. The former defines fixations spatially, with centroid and variance indicating spatial distribution [Anliker 1976]. If the variance of a given point is above some threshold, then that point is considered outside of any fixation cluster and is considered to be part of a saccade. The latter approach, which could be considered a dual of the former, examines the velocity of a gaze point, e.g., via differential filtering,

ẋi = (1/∆t) ∑j=0..k xi+j gj,   i ∈ [0, n − k),

where k is the filter length and ∆t = tk − ti. A 2-tap filter with coefficients gj = {1, −1}, while noisy, can produce acceptable results. The point xi is considered to be a saccade if the velocity ẋi is above threshold [Duchowski et al. 2002]. It is possible to combine these methods by either checking the two threshold detector outputs (e.g., for agreement) or by deriving state-probability estimates, e.g., via Hidden Markov Models [Salvucci and Goldberg 2000].

In the present implementation, fixations are identified by a variant of the position-variance approach, with a spatial deviation threshold of 19 pixels and number of samples set to 10 (the fixation analysis code is freely available on the web¹).
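The 2-tap differential filter described above amounts to a finite difference on successive gaze samples. A sketch (the threshold and data below are illustrative, not the study's values):

```python
import numpy as np

def detect_saccades(gaze, dt, threshold):
    """Flag inter-sample velocities above threshold, using the 2-tap
    filter g = {1, -1}: v_i = (x_i - x_{i+1}) / dt.
    gaze: (n, 2) array of (x, y); returns a boolean mask of length n-1
    (sign is irrelevant, since only the speed magnitude is thresholded)."""
    diffs = gaze[:-1] - gaze[1:]
    speed = np.hypot(diffs[:, 0], diffs[:, 1]) / dt
    return speed > threshold
```

Samples not flagged as saccadic can then be clustered into fixations, e.g., by the position-variance criterion the implementation actually uses.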
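For the ellipse fit of Section 3.4.2, one common linear least-squares formulation (a sketch of ours, not necessarily the exact parameterization used in the system) solves for conic coefficients and recovers the center from the zero of the conic's gradient:

```python
import numpy as np

def fit_ellipse_center(pts):
    """Least-squares conic fit a x^2 + b xy + c y^2 + d x + e y = 1,
    then the center from grad F = 0. Assumes the conic does not pass
    through the origin (the constant term is normalized to 1).
    pts: (n, 2) array of feature points, n >= 5."""
    x, y = pts[:, 0], pts[:, 1]
    A = np.column_stack([x * x, x * y, y * y, x, y])
    a, b, c, d, e = np.linalg.lstsq(A, np.ones_like(x), rcond=None)[0]
    # grad F = 0:  [2a b; b 2c] [x0 y0]^T = [-d -e]^T
    x0, y0 = np.linalg.solve([[2 * a, b], [b, 2 * c]], [-d, -e])
    return x0, y0
```

Because the fit is linear in the conic coefficients, it is cheap enough to run per frame; the paper's ellipse evaluation step then decides whether the result is trustworthy.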
Note that this approach is independent of frame rate, so long as each gaze point is listed with its timestamp, unlike a previous approach where fixation detection was tied to the video frame rate [Munn et al. 2008].

The sequence of detected fixations can be processed to gain insight into the attentional deployment strategy employed by the wearer of the eye tracking apparatus. A common approach is to count the number of fixations observed over given Areas Of Interest, or AOIs, in the scene. To do so in dynamic media, i.e., over video, it is necessary to track the AOIs as their apparent position in the video translates due to camera movement.

3.6 Feature Tracking

By tracking the movement of individual features it is possible to approximate the movement of identified AOIs. We allow the user to place trackboxes at any desired feature in the scene. The trackbox then follows the feature as it translates from frame to frame. This is similar in principle to the match-moving tracker window in common compositing software packages (e.g., Apple's Shake [Paolini 2006]). Locations of trackboxes are written to the output data file along with corresponding gaze coordinates. We then post-process the data to compute fixation and AOI information from gazepoint and trackbox data.

The user places a trackbox by clicking on the trackbox symbol, dragging and dropping it onto the desired feature. A user may place as many trackboxes as desired. For our study trackboxes were placed at the corners of each monitor.

Feature tracking is similar to that used for tracking the calibration dot, with some minor adaptations. Computation is reduced by precomputing a summed area table S [Crow 1984]. The value of any pixel in S stores the sum of all pixels above and to the left of the corresponding pixel in the original image,

S(x, y) = ∑i ∑j I(i, j),   0 < i < x, 0 < j < y.   (4)

Computation of the summation table is efficiently performed by a dynamic programming approach (see Algorithm 1).

for (y = 0 to h)
  sum = 0
  for (x = 0 to w)
    sum = sum + I(x, y)
    S(x, y) = sum + S(x, y − 1)

Algorithm 1: Single-pass computation of summation table.

The summation table is then used to efficiently compute the average pixel value within the reference window of the trackbox (see Figure 7). As in dot tracking, a 5×5 search field is used within an 8×8 reference window. Equation (2) is now replaced with I(x, y) − µ, where

µ = ((S(A) + S(B)) − (S(C) + S(D))) / (p × q).

Figure 7: AOI trackbox with corners labeled (A, B, C, D).
Figure 8: Trackboxes t1, t2, t3, AOIs A, B, ..., I, and fixation x.

Trackable features include both bright spots and dark spots in the scene image. For a bright spot, I(x, y) − µ is maximum at the target location. Dark spots produce minima at target locations. Initial placement of the trackbox determines whether the feature to be tracked is a bright or dark spot, based on the sign of the initial evaluation of I(x, y) − µ.

Some features cannot be correctly tracked because they exit the camera field. For this study three trackboxes were sufficient to properly track all areas of interest within the scene viewed by participants in the study. Extra trackboxes were placed and the three that appeared to be providing the best track were selected manually. Our implementation output a text file and a video. The text file contained one line per frame of video. Each line included a frame number, the (x, y) coordinates of each trackbox, the (x, y) coordinates of the corresponding gaze point, and a confidence number. See Figure 6 for a sample frame of the output video. Note the frame number in the upper left corner.

The video was visually inspected to determine frame numbers for the beginning and end of stimulus presentation, and the most usable trackboxes. Text files were then manually edited to remove extraneous information.

3.7 AOI Labeling

The most recent approach to AOI tracking used structure from motion to compute 3D information from eye gaze data [Munn and Pelz 2008]. We found such complex computation unnecessary because we did not need 3D information. We only wanted analysis of fixations in AOIs. While structure from motion is able to extract 3D information including head movement, it assumes a static scene. Our method makes no such assumption: AOIs may move independently from the observer, and independently from each other. Structure from motion can, however, handle some degree of occlusion that our approach does not. Trackboxes are unable to locate any feature that becomes obstructed from view.

AOI labeling begins with the text files containing gaze data and trackbox locations as described above. The text files were then automatically parsed and fed into our fixation detection algorithm. Using the location of the trackboxes at the end of fixation, we were able to assign AOI labels to each fixation. For each video a short program was written to apply translation, rotation, and scaling before labeling the fixations, with selected trackboxes defining the local frame of reference.

¹ The position-variance fixation analysis code was originally made available by LC Technologies. The original fixfunc.c can still be found on Andrew R. Freed's eye tracking web page: <http://freedville.com/professional/thesis/eyetrack-readme.html>. The C++ interface and implementation ported from C by Mike Ashmore are available at: <http://andrewd.ces.clemson.edu/courses/cpsc412/fall08>.
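Algorithm 1 and the window-mean computation can be reproduced with a prefix-sum sketch; the inclusion-exclusion below is the standard four-corner lookup, and the naming is ours:

```python
import numpy as np

def summed_area_table(img):
    """S holds, at each pixel, the sum of all pixels above and to the
    left (inclusive), per Equation (4) / Algorithm 1."""
    return img.cumsum(axis=0).cumsum(axis=1)

def window_mean(S, y0, x0, y1, x1):
    """Mean pixel value over img[y0:y1+1, x0:x1+1] in O(1) time;
    this is the quantity the trackbox evaluation calls mu."""
    total = S[y1, x1]
    if y0 > 0:
        total -= S[y0 - 1, x1]
    if x0 > 0:
        total -= S[y1, x0 - 1]
    if y0 > 0 and x0 > 0:
        total += S[y0 - 1, x0 - 1]
    return total / ((y1 - y0 + 1) * (x1 - x0 + 1))
```

Once the table is built, every candidate trackbox position costs four lookups instead of a full window sum, which is what makes the per-frame search cheap.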
The programs varied slightly depending upon which trackboxes were chosen. For example, consider a fixation detected at location x, with trackboxes t1, t2, t3, and AOIs A, B, ..., I as illustrated in Figure 8. Treating t1 as the origin of the reference frame, trackboxes t2 and t3 as well as the fixation x are translated to the origin by subtracting the coordinates of trackbox t1. Following translation, the coordinates of trackbox t2 define the rotation angle, θ = tan−1(t2y/t2x). A standard rotation matrix is used to rotate fixation point x to bring it into alignment with the horizontal x-axis. Finally, if trackbox t3 is located two-thirds across and down the panel display, then the fixation coordinates are scaled by 2/3. The now axis-aligned and scaled fixation point x is checked for which third of the axis-aligned box it is positioned in, and the appropriate label is assigned. Note that this method of AOI tracking is scale- and 2D-rotationally-invariant. It is not, however, invariant to shear resulting from feature rotation in 3D (e.g., perspective rotation).

Figure 9: Labeling AOIs. Trackboxes, usually at image corners, are used to maintain position and orientation of the 9-window display panel; each of the AOIs is labeled in sequential alphanumeric order from top-left to bottom-right—the letter 'O' is used to record when a fixation falls outside of the display panels. In this screenshot, the viewer is looking at the purple flower field.

Following fixation localization, another text file is then output with one line per fixation. Each line contains the subject number, stimulus identifier, AOI label, and fixation duration. This information is then reformatted for subsequent statistical analysis by the statistical package used (R in this case).

4 Applied Example

In an experiment conducted to better understand the potential health benefits of images of nature in a hospital setting, participants' gaze was recorded along with physiological and self-reported psychological data.

Eye Movement Analysis. For analysis of fixations within AOIs, trackboxes were placed at the corners of the 3×3 panel display in the scene video. All 9 AOIs were assumed to be equally-sized connected rectangles (see Figure 9). Trackboxes were used to determine AOI position, orientation, and scale. Out-of-plane rotation was not considered. Trackboxes on the outside corners of the 3×3 grid were preferred. Otherwise linear interpolation was used to determine exterior boundaries of the grid.

Stimulus. Using the prospect-refuge theory of landscape preference [Appleton 1996], four different categories of images (see Figure 10) were viewed by participants before and after undergoing a pain stressor (hand in ice water for up to 120 seconds). A fifth group of participants (control) viewed the same display wall (see below) with the monitors turned off.

Apparatus, Environment, & Data Collected. Participants viewed each image on a display wall consisting of nine video monitors arranged in a 3×3 grid. Each of the nine video monitors' display areas measured 36″ wide × 21″ high, with each monitor framed by a 1/2″ black frame for an overall measurement of 9′ wide × 5′3″ high.

The mock patient room measured approximately 15.6′ × 18.6′. Participants viewed the display wall from a hospital bed facing the monitors. The bed was located approximately 5′3″ from the display wall with its footboard measuring 3.6′ high off the floor (the monitors were mounted 3′ from the floor). As each participant lay on the bed, their head location measured approximately 9.6′ from the center of the monitors. Given these dimensions and distances, and using θ = 2 tan−1(r/(2D)) to represent visual angle, with r = 9′ and D = 9.6′, the monitors subtended θ = 50.2° visual angle.

Pain perception, mood, blood pressure, and heart rate were continually assessed during the experiment. Results from these measurements are omitted here; they are mentioned to give the reader a sense of the complete procedure employed in the experiment.

Procedure. Each participant was greeted and asked to provide documentation of informed consent. After situating themselves on the bed facing the display wall, each participant involved in the eye tracking portion of the study donned the wearable eye tracker. A laptop was then placed in front of them on a small rolling table and the participant was asked to view the calibration dot sequence. Following calibration, each participant viewed the image stimulus (or blank monitors) for two minutes as timed by a stopwatch.

Subjects. 109 healthy college students took part in the study, with a small subsample (21) participating in the eye tracking portion.

Experimental Design. The study used a mixed randomized design. Analysis of recorded gaze points by participants wearing the eye tracker was performed based on a repeated-measures design where the set of fixations generated by each individual was treated as the within-subjects fixed factor.

Discarded Data. Four recordings were collected over each of four stimulus images, with four additional recordings displaying no image as control. There was one failed attempt to record data over the purple flower field stimulus. A replacement recording was made. There were 21 sessions in all.

Ten recordings were discarded during post-processing because video quality prohibited effective eye tracking. In each of these videos some combination of multiple factors rendered them unusable. These factors included heavy mascara, eyelid occlusion, frequent blinking, low contrast between iris and sclera, poor positioning of eye cameras, and calibration dots not in the field of view. We successfully processed 2 control, 4 yellow field, 1 tree, 2 fire, and 2 purple flower field videos.

Poor camera positioning could have been discovered and corrected if the cameras provided real-time video feedback. Our hardware did not support online processing. Online processing could have provided additional feedback allowing for detection and mitigation of most other video quality issues.

5 Results

Using AOIs and image type as fixed factors (with participant as the random factor [Baron and Li 2007]), repeated-measures two-way ANOVA indicates a marginally significant main effect of AOI on fixation duration (F(9,1069) = 2.08, p < 0.05, see Figure 11).²

² Assuming sphericity as computed by R.
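The translate-rotate-scale labeling of Section 3.7 can be sketched as follows (a hypothetical helper of ours; the original per-video programs are not available). Trackbox t1 acts as the origin, t2 fixes the rotation, and t3 is assumed, as in the paper's example, to sit two-thirds across and down the panel:

```python
import math

def rotate(p, ang):
    """Rotate point p = (x, y) by ang radians about the origin."""
    c, s = math.cos(ang), math.sin(ang)
    return (c * p[0] - s * p[1], s * p[0] + c * p[1])

def label_fixation(t1, t2, t3, fix):
    """Assign an AOI label A-I (or 'O' for outside the panel) to a
    fixation, given trackboxes t1 (origin), t2 (on the x-axis), and
    t3 (two-thirds across and down the 3x3 panel)."""
    # translate so t1 is the origin
    t2 = (t2[0] - t1[0], t2[1] - t1[1])
    t3 = (t3[0] - t1[0], t3[1] - t1[1])
    f = (fix[0] - t1[0], fix[1] - t1[1])
    # rotate so t2 lies on the horizontal axis
    theta = math.atan2(t2[1], t2[0])
    t3 = rotate(t3, -theta)
    f = rotate(f, -theta)
    # scale so the panel spans [0,1] x [0,1]; t3 marks (2/3, 2/3)
    u, v = f[0] / (t3[0] / (2 / 3)), f[1] / (t3[1] / (2 / 3))
    if not (0 <= u <= 1 and 0 <= v <= 1):
        return 'O'
    col, row = min(int(u * 3), 2), min(int(v * 3), 2)
    return "ABCDEFGHI"[row * 3 + col]
```

As the paper notes, this labeling is invariant to 2D rotation and scale of the tracked panel, but not to shear from out-of-plane rotation.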
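The reported visual angle follows directly from θ = 2 tan⁻¹(r/(2D)); as a quick check (assuming, as the dimensions suggest, r = 9 ft and D = 9.6 ft):

```python
import math

# theta = 2 * atan(r / (2D)): display width r = 9 ft, viewing distance D = 9.6 ft
theta = math.degrees(2 * math.atan(9.0 / (2 * 9.6)))  # ~50.3 degrees
```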
Figure 10: Stimulus images: (a) yellow field: prospect (Getty Images), (b) tree: refuge (Getty Images), (c) fire: hazard (Getty Images), (d) purple flower field: mixed prospect and refuge (courtesy Ellen A. Vincent).
Figure 11: Comparison of mean fixation duration per AOI averaged over image types, with standard error bars.
Figure 12: Comparison of mean fixation duration per AOI and per image type, with standard error bars.
Averaging over image types, pair-wise t-tests with pooled SD indicate no significant differences in fixation durations between any pair of AOIs.

Repeated-measures ANOVA also indicates a significant main effect of image type on fixation duration (F(34,1044) = 1.78, p < 0.01), with the AOI × image interaction not significant (see Figure 12). Averaging over AOIs, pair-wise t-tests with pooled SD indicate significantly different fixation durations between the control image (blank screen) and the tree image (p < 0.01, with Bonferroni correction). No other significant differences were detected.

6 Discussion

Averaging over image types, the marginally significant difference in fixation durations over AOIs suggests that the longest durations tend to fall on the central AOIs (E and H). This simply suggests that viewers tend to fixate the image center. This is not unusual, particularly in the absence of a specific viewing task [Wooding 2002]. Post-hoc pair-wise comparisons failed to reveal significant differences, which is likely due to the relatively high variability of the data.

Averaging over AOIs shows that the tree image drew significantly shorter fixations than the control (blank) screen. Due to averaging, however, it is difficult to infer further details regarding fixation duration distributions over particular image regions. Cursory examination of Figure 12 suggests shorter fixations over the center panels (E & H), compared to the longer dwell times made when the screen was blank. Considering the averaging inherent in ANOVA, this could just mean that fixations are more evenly distributed over the tree image than over the blank display, where it is fairly clear that viewers mainly looked at the center panels. This may suggest a greater amount of visual interest offered by the tree image and a propensity of viewers to look around more when presented with a stimulus than when there is nothing of interest at all.

A similar observation could be made regarding fixation durations found over region C (upper right) for the purple flower field image, an image with which viewers perceived lower sensory pain compared to those who viewed other landscape images and no images, with statistical significance at α = 0.1 [Vincent et al. 2009]. However, the difference in fixation durations over region C is not significant according to the pair-wise post-hoc analysis.

7 Conclusion

A match-moving approach was presented to help automate analysis of eye movements collected by a wearable eye tracker. Technical contributions addressed video stream synchronization, pupil detection, eye movement analysis, and tracking of dynamic scene Areas Of Interest (AOIs). The techniques were demonstrated in the evaluation of eye movements on images of nature viewed by subjects participating in an experiment on the perception of well-being. Although descriptive statistics of gaze locations over AOIs failed to show significance of any particular AOI except the center, the methodology is applicable toward similar future studies.

References

Anliker, J. 1976. Eye Movements: On-Line Measurement, Analysis, and Control. In Eye Movements and Psychological Processes, R. A. Monty and J. W. Senders, Eds. Lawrence Erlbaum Associates, Hillsdale, NJ, 185–202.

Appleton, J. 1996. The Experience of Landscape. John Wiley & Sons, Ltd., Chichester, UK.
Babcock, J. S. and Pelz, J. B. 2004. Building a Lightweight Eyetracking Headgear. In ETRA '04: Proceedings of the 2004 Symposium on Eye Tracking Research & Applications. ACM, San Antonio, TX, 109–114.

Ballard, D. H., Hayhoe, M. M., and Pelz, J. B. 1995. Memory Representations in Natural Tasks. Journal of Cognitive Neuroscience 7, 1, 66–80.

Baron, J. and Li, Y. 2007. Notes on the use of R for psychology experiments and questionnaires. Online Notes. URL: <http://www.psych.upenn.edu/~baron/rpsych/rpsych.html> (last accessed December 2007).

Buswell, G. T. 1935. How People Look At Pictures. University of Chicago Press, Chicago, IL.

Crow, F. C. 1984. Summed-area tables for texture mapping. In SIGGRAPH '84: Proceedings of the 11th Annual Conference on Computer Graphics and Interactive Techniques. ACM, New York, NY, 207–212.

Duchowski, A., Medlin, E., Cournia, N., Gramopadhye, A., Nair, S., Vorah, J., and Melloy, B. 2002. 3D Eye Movement Analysis. Behavior Research Methods, Instruments, & Computers (BRMIC) 34, 4 (November), 573–591.

Freed, A. R. 2003. The Effects of Interface Design on Telephone Dialing Performance. M.S. thesis, Pennsylvania State University, University Park, PA.

Jacob, R. J. K. and Karn, K. S. 2003. Eye Tracking in Human-Computer Interaction and Usability Research: Ready to Deliver the Promises. In The Mind's Eye: Cognitive and Applied Aspects of Eye Movement Research, J. Hyönä, R. Radach, and H. Deubel, Eds. Elsevier Science, Amsterdam, The Netherlands, 573–605.

Lancaster, P. and Šalkauskas, K. 1986. Curve and Surface Fitting: An Introduction. Academic Press, San Diego, CA.

Land, M., Mennie, N., and Rusted, J. 1999. The Roles of Vision and Eye Movements in the Control of Activities of Daily Living. Perception 28, 11, 1307–1432.

Land, M. F. and Hayhoe, M. 2001. In What Ways Do Eye Movements Contribute to Everyday Activities? Vision Research 41, 25-26, 3559–3565. (Special Issue on Eye Movements and Vision in the Natural World, with most contributions to the volume originally presented at the 'Eye Movements and Vision in the Natural World' symposium held at the Royal Netherlands Academy of Sciences, Amsterdam, September 2000).

Li, D. 2006. Low-Cost Eye-Tracking for Human Computer Interaction. M.S. thesis, Iowa State University, Ames, IA. Techreport TAMU-88-010.

Li, D., Babcock, J., and Parkhurst, D. J. 2006. openEyes: A Low-Cost Head-Mounted Eye-Tracking Solution. In ETRA '06: Proceedings of the 2006 Symposium on Eye Tracking Research & Applications. ACM, San Diego, CA.

Li, D. and Parkhurst, D. 2006. Open-Source Software for Real-Time Visible-Spectrum Eye Tracking. In Conference on Communication by Gaze Interaction. COGAIN, Turin, Italy.

Li, D., Winfield, D., and Parkhurst, D. J. 2005. Starburst: A hybrid algorithm for video-based eye tracking combining feature-based and model-based approaches. In Vision for Human-Computer Interaction Workshop (in conjunction with CVPR).

Megaw, E. D. and Richardson, J. 1979. Eye Movements and Industrial Inspection. Applied Ergonomics 10, 145–154.

Morimoto, C. H. and Mimica, M. R. M. 2005. Eye Gaze Tracking Techniques for Interactive Applications. Computer Vision and Image Understanding 98, 4–24.

Munn, S. M. and Pelz, J. B. 2008. 3D point-of-regard, position and head orientation from a portable monocular video-based eye tracker. In ETRA '08: Proceedings of the 2008 Symposium on Eye Tracking Research & Applications. ACM, Savannah, GA, 181–188.

Munn, S. M., Stefano, L., and Pelz, J. B. 2008. Fixation-identification in dynamic scenes: Comparing an automated algorithm to manual coding. In APGV '08: Proceedings of the 5th Symposium on Applied Perception in Graphics and Visualization. ACM, New York, NY, 33–42.

Paolini, M. 2006. Apple Pro Training Series: Shake 4. Peachpit Press, Berkeley, CA.

Pelz, J. B., Canosa, R., and Babcock, J. 2000. Extended Tasks Elicit Complex Eye Movement Patterns. In ETRA '00: Proceedings of the 2000 Symposium on Eye Tracking Research & Applications. ACM, Palm Beach Gardens, FL, 37–43.

Reich, S., Goldberg, L., and Hudek, S. 2004. Deja View Camwear Model 100. In CARPE '04: Proceedings of the 1st ACM Workshop on Continuous Archival and Retrieval of Personal Experiences. ACM Press, New York, NY, 110–111.

Ryan, W. J., Duchowski, A. T., and Birchfield, S. T. 2008. Limbus/pupil switching for wearable eye tracking under variable lighting conditions. In ETRA '08: Proceedings of the 2008 Symposium on Eye Tracking Research & Applications. ACM, New York, NY, 61–64.

Salvucci, D. D. and Goldberg, J. H. 2000. Identifying Fixations and Saccades in Eye-Tracking Protocols. In ETRA '00: Proceedings of the 2000 Symposium on Eye Tracking Research & Applications. ACM, Palm Beach Gardens, FL, 71–78.

Smeets, J. B. J., Hayhoe, H. M., and Ballard, D. H. 1996. Goal-Directed Arm Movements Change Eye-Head Coordination. Experimental Brain Research 109, 434–440.

Vincent, E., Battisto, D., Grimes, L., and McCubbin, J. 2009. Effects of nature images on pain in a simulated hospital patient room. Health Environments Research and Design. In press.

Webb, N. and Renshaw, T. 2008. Eyetracking in HCI. In Research Methods for Human-Computer Interaction, P. Cairns and A. L. Cox, Eds. Cambridge University Press, Cambridge, UK, 35–69.

Wooding, D. 2002. Fixation Maps: Quantifying Eye-Movement Traces. In Proceedings of ETRA '02. ACM, New Orleans, LA.