In some situations, high quality eye tracking systems are not affordable. This creates a demand for inexpensive systems built upon non-specialized, off-the-shelf devices. Investigations show that algorithms developed for high resolution systems do not perform satisfactorily on such low-cost, low resolution systems. We investigate algorithms specifically tailored to such low resolution input devices, based on a combination of different strategies. An approach called gradient direction consensus is introduced and compared to image based correlation with adaptive templates as well as other known methods. The results are compared using synthetic input data with known ground truth.
Figure 2: Low resolution pupil image (infrared, with glint), enlarged to identify individual pixels.
Figure 3: Gradients (red arrows are shortened).
Figure 4: Relevant gradients chosen (see Section 3.1).
For their circle approximation method the rim points are used to estimate a circle. Obvious outliers to the circle fitting are weighted down to find a good approximation after a few iterations.

The Starburst approach presented in [Li et al. 2005] first removes any glints and makes an initial rough guess for the pupil center. From here radial rays are sent and observed for a significant jump in intensity. From these pupil rim points secondary rays are sent like a fan in the opposite direction in a range of ±50°, providing additional points on the rim. Using the RANSAC algorithm, an ellipse is fitted to the detected rim points.

[Pérez et al. 2003] describes a similar technique, starting from an initial pupil center guess, the center of gravity of pixels which are darker than a threshold. However, only primary rays are used for the pupil rim detection, which is done by employing a Laplace filter. If the detected points are not equidistant from the estimated center, a new center is iteratively chosen by using the mid points of the diagonals.

The algorithm described in [Ohno et al. 2002] is called double ellipse fitting. First, dark, round regions give an initial guess for pupils. Rim points are determined similar to Starburst's primary rays. An ellipse is fitted to these points and its center is used as starting point for a second run with a doubled number of rays, ignoring obvious outliers.

This is by far not a comprehensive list of published algorithms; several other approaches exist, e.g. [Poursaberi and Araabi 2005], but they largely follow similar principles.

3 Approaches

Using low-cost COTS (commercial off-the-shelf) devices limits the possibilities for the system setup. Common inexpensive cameras have a rather large field of view, resulting in very small projections of the eye on the image sensor. While this has the advantage of covering a larger range of head movements, which is desirable for some users, the disadvantage is the very small number of pixels forming the eye in the image, as shown in Figure 2.

Given such coarse input data it is obvious that working on the level of discrete pixels is not sufficient to achieve the accuracy required to control a gaze based interaction system. Only some of the published algorithms deal with sub-pixel accuracy, and only very few put a specific focus on this aspect. For head mounted systems as well as for remote systems with a narrow field of view, sub-pixel accuracy is of limited importance, which explains this commonly neglected point. For a setup with COTS components, however, it is of great significance.

For remote infrared eye trackers like our system, pupil and glint centers are the key values for the gaze estimation. Their relative position, together with the geometry of the setup, provides the input for subsequent gaze estimation. While the glint is easily identified as the brightest area in the eye image, the pupil's outer form is not intact when the glint partly covers it. Additionally, the pupil border is not very sharp, causing further handicaps in its determination. To a certain degree this can be compensated by using the iris border as additional support, but it might be partly covered by the upper (and sometimes lower) eye lid(s) (see Figure 2).

For low resolution images the comparison presented in [Droege et al. 2008] shows that none of the tested algorithms (e.g. those in Section 2) performs significantly better than the calculation of the center of gravity of sufficiently dark pixels. This center of gravity, though, often gives a good first guess for subsequent steps, as performed in the Starburst algorithm and others.

3.1 Gradient Direction Consensus

Observation shows that in an image around the iris, a large number of gradients point near the pupil/iris center (Figure 3). Most of those not pointing towards the center are caused by noise and iris structure, mostly having a small strength. Other outliers are caused by the glint and denoted by a high gradient strength. By using a lower and a higher threshold for the gradient strength, all "reasonably" strong gradients are selected for further processing.

Using the Hessian normal form for straight lines we can describe any point p on a line by p · n = d, where n is the line's normal and d is its distance from the origin. The gradient components determined using the Sobel operator give the gradient direction as g = (g_x, g_y)^T and thus the normal to it as n_g = (−g_y, g_x)^T. Given any point on the straight line, which in this case is just the coordinate (x, y)^T where we determined the gradient, the distance d_g to the origin can be calculated as d_g = (x, y)^T · n_g = y·g_x − x·g_y. Thus, every point p = (x, y)^T on the gradient line solves p · n_g − d_g = Δ with Δ = 0; for other points, Δ gives the distance to the line.

The goal is to estimate a common center point m̂ minimizing

    m̂ · n_g − d_g = Δ    (1)

for all (relevant) gradients g. Expanding (1) yields

    Δ = m̂ · n_g − d_g = (m̂_y − y)·g_x + (x − m̂_x)·g_y

and the partial derivatives compute as

    ∂Δ/∂m̂_x = −g_y,    ∂Δ/∂m̂_y = g_x.
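Stacking one such line per selected gradient yields a small linear system: each gradient contributes a row n_g = (−g_y, g_x) and a right-hand side d_g. The following sketch illustrates this consensus step on ideal data. It is a plain least-squares stand-in rather than the paper's implementation, which employs an M-estimator ([Staudte and Sheather 1990]) to down-weight outliers; all variable names are illustrative.

```python
import numpy as np

def consensus_center(points, grads):
    """Estimate the common intersection point of gradient lines.

    A gradient g at image position (x, y) defines a line with
    normal n_g = (-g_y, g_x) and distance d_g = y*g_x - x*g_y.
    The center m minimizing the summed squared distances
    m . n_g - d_g is the least-squares solution of A m = d.
    """
    points = np.asarray(points, dtype=float)
    grads = np.asarray(grads, dtype=float)
    # Row i of A is the line normal (-g_y, g_x) of gradient i.
    A = np.column_stack([-grads[:, 1], grads[:, 0]])
    # d_g = y*g_x - x*g_y for each sample position.
    d = points[:, 1] * grads[:, 0] - points[:, 0] * grads[:, 1]
    m, *_ = np.linalg.lstsq(A, d, rcond=None)
    return m

# Synthetic check: gradients pointing exactly at a known center.
center = np.array([226.0, 221.0])
pts = np.array([[220.0, 218.0], [230.0, 225.0],
                [223.0, 226.0], [229.0, 217.0]])
g = center - pts                               # ideal gradient directions
print(np.round(consensus_center(pts, g), 3))   # -> [226. 221.]
```

With noise-free gradients the lines intersect in a single point and the least-squares solution recovers it exactly; with real gradients the robust estimator takes over this role.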
Algorithm                      Tag     RMSE
center of gravity              CG11    0.1351
[Pérez et al. 2003]            PE10    0.3047
ellipse fit, 20° steps         EFS2    0.4016
ellipse fit, 10° steps         EFS1    0.4022
adaptive image pattern match   PM01    0.1059
gradient direction consensus   GD02    0.0501

Table 1: RMSE values for different algorithms
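The RMSE reported in Table 1 is the root of the mean squared Euclidean distance between detected and ground-truth pupil centers (Section 4). As a minimal sketch with illustrative names, not taken from the original code:

```python
import numpy as np

def rmse(detected, reference):
    """Root mean square of Euclidean distances between detected
    pupil centers and the corresponding ground-truth positions."""
    detected = np.asarray(detected, dtype=float)
    reference = np.asarray(reference, dtype=float)
    dist = np.linalg.norm(detected - reference, axis=1)
    return np.sqrt(np.mean(dist ** 2))

# One detection off by one pixel, one exact hit:
print(rmse([[1.0, 0.0], [0.0, 0.0]],
           [[0.0, 0.0], [0.0, 0.0]]))  # -> 0.7071067811865476
```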
Using an M-estimator (described e.g. in [Staudte and Sheather 1990]), a common intersection point is determined.

Figure 4 shows the positions in the image with gradient strengths within the abovementioned limits. Used positions are marked in dark blue, outliers in light blue. It is apparent in this sample image that the resulting center (red cross) is not 'pushed away' by the glint, as opposed to the reference point (green) calculated by the simple center of gravity approach.

3.2 Image Based Template Matching

Given the relatively small number of pixels to be analyzed, direct image comparison with an adaptable image template is possible. Such a template must reflect the situation of low resolution by taking into account the smooth transitions in intensities between the different regions. That is, the generation of the template image must provide sub-pixel accuracy, anti-aliasing and blending in the drawing process.

Choosing appropriate (initial) pupil and iris radii and luminosity values for the eye regions, an initial template is generated. By iteratively shifting it to different positions around the initial guess, the position with the best match is taken for the next step with reduced search range and step width. This is repeated until a minimum step width, e.g. 0.1 pixel, is reached.

Of course, using appropriate parameters for the generation is essential. Therefore these parameters should be updated over time to account for changes in the appearance of the eye. Intensity values can be adapted by biasing the current values towards the actual values found after every successful match. The pupil radius can be adapted from the ellipse parameters determined by an analysis as described in Section 3.3.

The comparison of two images f and g of size W × H is performed using common methods like the mean squared error or a cross-correlation measure:

    MSE = 1/(W·H) · Σ_{x,y}^{W,H} (f_xy − g_xy)²    (2)

    COR = 1/(W·H) · Σ_{x,y}^{W,H} (f_xy · g_xy)     (3)

Producing a template image with all relevant aspects would cause a multitude of additional parameters to be considered, e.g. glint position(s), glint radius, upper and lower lid positions and form. Since most of these parameters are hard to determine and of little further use, it is easier to mask out these regions from the comparison (Figure 5).

Figure 5: Example template images. Black denotes masked out pixels. The glint position is masked out.

3.3 Ellipse Fitting

For comparison, an ellipse fitting approach inspired by the algorithms presented in Section 2 has been implemented. First, pupil rim points are detected similar to [Pérez et al. 2003]. To eliminate the influence of erroneously detected points, again an M-estimator is employed.

Using 18 sample lines the result is not satisfactory. The estimator sometimes still includes glint border points in the set of input samples to optimize, resulting in an ellipse distorted by the glint (Figure 6).

Sending a ray every 20° seems rather coarse; raising the number of rays to 36 (10° steps), however, does not give a noticeable improvement (see Section 4).

Figure 6: Ellipse fitting

4 Comparison

Providing reference data for eye tracking is difficult. Instructing a subject to follow a point with known coordinates on a computer screen bears two error sources when comparing pupil detection algorithms: errors introduced by the gaze estimation algorithms due to non-linearities cannot be distinguished from detection errors, and due to unintentional saccades the position of the pupil center does not necessarily correspond to the reference position on screen.

Given the low contrast in eye images gathered from a low resolution device, it is also difficult to assign a ground truth value to real images matching the desired accuracy. Hence, as a first means of evaluation, synthetic images have been generated with graphics algorithms capable of producing sub-pixel accurate anti-aliased output. While these images cannot be seen as a valid replacement for real images, it can at least be argued that bad performance on such almost ideal images will not become better on real images.

To distinguish between pure pupil center detection and the situation where glints put an additional strain on the algorithm, the first tests have been performed without glints. As some algorithms benefit from prior knowledge of the glint position (if any), an accurate glint position detection should be performed prior to the pupil center detection; it has been neglected in this work.

A series of 700 images with synthetic eye positions at regular sub-pixel positions was generated as test input. The algorithms from Section 3 as well as the center of gravity and the algorithm described in [Pérez et al. 2003] (which performed best in [Droege et al. 2008]) have been applied to this sequence. The Euclidean distance to the reference coordinate is measured; the resulting root mean square error (RMSE) is listed in Table 1. The deviation of the measured positions (red dots) from the reference (light blue dots) is shown in Figures 7-9.
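The masked comparison and coarse-to-fine shift search of the adaptive template matching (Section 3.2) could be sketched as follows. This is a simplified illustration, not the authors' code: the template renderer and parameter adaptation are omitted, bilinear interpolation is one plausible way to realize sub-pixel shifts, and the function names and the 3×3 search pattern are assumptions.

```python
import numpy as np

def masked_mse(window, template, mask):
    """Equation (2), restricted to the unmasked template pixels."""
    return ((window - template) ** 2)[mask].mean()

def sample_shifted(image, x, y, h, w):
    """Bilinearly sample an h x w window whose top-left corner
    sits at the sub-pixel position (x, y) of the image."""
    xs = np.arange(w) + x
    ys = np.arange(h) + y
    x0 = np.floor(xs).astype(int)
    y0 = np.floor(ys).astype(int)
    fx = (xs - x0)[np.newaxis, :]
    fy = (ys - y0)[:, np.newaxis]
    # Blend the four neighbouring pixels of every sample position.
    a = image[np.ix_(y0, x0)]
    b = image[np.ix_(y0, x0 + 1)]
    c = image[np.ix_(y0 + 1, x0)]
    d = image[np.ix_(y0 + 1, x0 + 1)]
    return (a * (1 - fx) * (1 - fy) + b * fx * (1 - fy)
            + c * (1 - fx) * fy + d * fx * fy)

def refine_position(image, template, mask, x0, y0, step=1.0, min_step=0.1):
    """Coarse-to-fine search: evaluate a 3x3 neighbourhood of shifts,
    keep the best match, halve the step until min_step is reached."""
    h, w = template.shape
    x, y = float(x0), float(y0)
    while step >= min_step:
        candidates = [(x + dx * step, y + dy * step)
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
        def err(p):
            win = sample_shifted(image, p[0], p[1], h, w)
            return masked_mse(win, template, mask)
        x, y = min(candidates, key=err)
        step /= 2.0
    return x, y
```

Halving the step width each round reaches a sub-pixel resolution of 0.1 pixel after a handful of iterations, matching the stopping criterion described above.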
Figure 7: Ellipse fit sub-sampled, 10° steps, synthetic input.
Figure 8: Adaptive image pattern match, synthetic input.
Figure 9: Gradient direction consensus, synthetic input.
Some algorithms show periodic patterns repeating at pixel frequency, as in Figure 7. This is easily explained by the effects of aliasing and thresholds, but is nevertheless undesirable.

The RMSE values from Table 1 seem to contradict the plots at first glance, but the variance for some of the other algorithms is much higher, resulting in a bad RMSE result. The most appealing result is found in Figure 9 for the gradient direction consensus. This is backed by the best RMSE value; however, it must be remembered that this is measured on 'clean' synthetic input images. Adding noise to the images reveals that the performance decreases the stronger the noise in the images is. Additionally, no real iris structure is present in the synthetic images, which would introduce noisy regions even in the case of a good input signal.

Figure 10: RMSE values dependent on noise strength (σ)

Figure 10 shows the RMSE values for some of the algorithms as a function of the strength of noise added to the synthetic images, for the left and right eye (marked 'L' and 'R'). The sigma of the Gaussian noise is varied from 0 to 7. The labels correspond to Table 1.

While the subjective impression of the performance on real input images looks good, no quantifiable results can be given up to now. Since the overall performance in a complete eye tracking system is always influenced by the gaze estimation process, such results are difficult to interpret and to attribute to the specific algorithm used.

5 Conclusion

It is evident that a reasonable comparison must not be based on RMSE values of the dislocation distance alone. Additional measures like the variance must be used to characterize the performance. The approaches of adaptive pattern matching and gradient direction consensus seem to be good candidates for further investigation. Their robustness to noisy signals and distortions by glints is still to be investigated further; the subjective impressions observed with real input images look promising.

By improving the test framework and increasing the naturalness of the test image series, a more thorough investigation of new and existing algorithms will be possible and lead to improved results.

References

DAUGMAN, J. 2004. How iris recognition works. IEEE Transactions on Circuits and Systems for Video Technology 14, 1, 21-30.

DAUNYS, G., AND RAMANAUSKAS, N. 2004. The accuracy of eye tracking using image processing. In NordiCHI '04, ACM, New York, NY, USA, 377-380.

DROEGE, D., SCHMIDT, C., AND PAULUS, D. 2008. A comparison of pupil center estimation algorithms. In COGAIN 2008, H. Istance, O. Stepankova, and R. Bates, Eds. (short paper).

HANSEN, D. W., AND JI, Q. 2009. In the eye of the beholder: A survey of models for eyes and gaze. IEEE Transactions on Pattern Analysis and Machine Intelligence. (in print).

LI, D., WINFIELD, D., AND PARKHURST, D. J. 2005. Starburst: A hybrid algorithm for video-based eye tracking combining feature-based and model-based approaches. In CVPR '05, IEEE, Washington, 79-86.

MAJARANTA, P., AND RÄIHÄ, K.-J. 2002. Twenty years of eye typing. In Proceedings of Eye Tracking Research and Applications, ACM, 15-22.

OHNO, T., MUKAWA, N., AND YOSHIKAWA, A. 2002. Freegaze: A gaze tracking system for everyday gaze interaction. In Proceedings of the symposium on ETRA 2002: eye tracking research and applications symposium, 125-132.

PÉREZ, A., CÓRDOBA, M., GARCÍA, A., MÉNDEZ, R., MUÑOZ, M., PEDRAZA, J., AND SÁNCHEZ, F. 2003. A precise eye-gaze detection and tracking system. In WSCG POSTERS proceedings, February 3-7, 2003.

POURSABERI, A., AND ARAABI, B. 2005. A novel iris recognition system using morphological edge detector and wavelet phase features. ICGST International Journal on Graphics, Vision and Image Processing 05.

SAN AGUSTIN, J., SKOVSGAARD, H., HANSEN, J. P., AND HANSEN, D. W. 2009. Low-cost gaze interaction: ready to deliver the promises. In CHI EA '09: 27th intl. conf. on Human factors in computing systems, ACM, New York, USA, 4453-4458.

STAUDTE, R. G., AND SHEATHER, S. J. 1990. Robust Estimation and Testing. Wiley, New York.