In some situations, high quality eye tracking systems are not affordable. This creates a demand for inexpensive systems built upon non-specialized, off-the-shelf devices. Investigations show that algorithms developed for high resolution systems do not perform satisfactorily on such low-cost, low resolution systems. We investigate algorithms specifically tailored to such low resolution input devices, based on a combination of different strategies. An approach called gradient direction consensus is introduced and compared to image based correlation with adaptive templates as well as other known methods. The results are compared using synthetic input data with known ground truth.
Figure 2: Low resolution pupil image (infrared, with glint), enlarged to identify individual pixels.
Figure 3: Gradients (red arrows are shortened).
Figure 4: Relevant gradients chosen (see Section 3.1).
For their circle approximation method the rim points are used to estimate a circle. Obvious outliers to the circle fitting are weighted down to find a good approximation after a few iterations.

The Starburst approach presented in [Li et al. 2005] first removes any glints and makes an initial rough guess for the pupil center. From here radial rays are sent and observed for a significant jump in intensity. From these pupil rim points secondary rays are sent like a fan in the opposite direction in a range of ±50°, providing additional points on the rim. Using the RANSAC algorithm, an ellipse is fitted to the detected rim points.

[Pérez et al. 2003] describes a similar technique, starting from an initial pupil center guess, the center of gravity of pixels which are darker than a threshold. However, only primary rays are used for the pupil rim detection, which is done by employing a Laplace filter. If the detected points are not equidistant from the estimated center, a new center is iteratively chosen by using the mid points of the diagonals.

The algorithm described in [Ohno et al. 2002] is called double ellipse fitting. First, dark, round regions give an initial guess for pupils. Rim points are determined similar to Starburst's primary rays. An ellipse is fitted to these points and its center is used as starting point for a second run with a doubled number of rays, ignoring obvious outliers.

This is by far not a comprehensive list of published algorithms; several other approaches exist, e.g. [Poursaberi and Araabi 2005], but they largely follow similar principles.

3 Approaches

Using low-cost COTS (commercial off-the-shelf) devices limits the possibilities for the system setup. Common inexpensive cameras have a rather large field of view, resulting in very small projections of the eye on the image sensor. While this has the advantage of covering a larger range of head movements, which is desirable for some users, the disadvantage is the very small number of pixels forming the eye in the image, as shown in Figure 2.

Given such coarse input data it is obvious that working on the level of discrete pixels is not sufficient to achieve the accuracy required to control a gaze based interaction system. Only some of the published algorithms deal with sub-pixel accuracy, and only very few put a specific focus on this aspect. For head mounted systems as well as for remote systems with a narrow field of view, sub-pixel accuracy is of limited importance, which explains this commonly neglected point. For a setup with COTS components, however, it is of great significance.

For remote infrared eye trackers like our system, pupil and glint centers are the key values for the gaze estimation. Their relative position, together with the geometry of the setup, provides the input for subsequent gaze estimation. While the glint is easily identified as the brightest area in the eye image, the pupil's outer form is not intact when the glint partly covers it. Additionally, the pupil border is not very sharp, causing further handicaps in its determination. To a certain degree this can be compensated by using the iris border as additional support, but it might be partly covered by the upper (and sometimes lower) eye lid(s) (see Figure 2).

For low resolution images the comparison presented in [Droege et al. 2008] shows that none of the tested algorithms (e.g. those in Section 2) performs significantly better than the calculation of the center of gravity of sufficiently dark pixels. This center of gravity, though, often gives a good first guess for subsequent steps, as performed in the Starburst algorithm and others.

3.1 Gradient Direction Consensus

Observation shows that in an image around the iris, a large number of gradients point near the pupil/iris center (Figure 3). Most of those not pointing towards the center are caused by noise and iris structure, mostly having a small strength. Other outliers are caused by the glint and denoted by a high gradient strength. By using a lower and a higher threshold for the gradient strength, all "reasonably" strong gradients are selected for further processing.

Using the Hessian normal form for straight lines we can describe any point p on a line by p · n = d, where n is the line's normal and d is its distance from the origin. The gradient components determined using the Sobel operator give the gradient direction as g = (g_x, g_y)^T and thus the normal to it as n_g = (−g_y, g_x)^T. Given any point on the straight line, which in this case is just the coordinate (x, y)^T where we determined the gradient, the distance d_g to the origin can be calculated as d_g = (x, y)^T · n_g = y·g_x − x·g_y. Thus, every point p = (x, y)^T on the gradient line solves p · n_g − d_g = Δ with Δ = 0; for other points, Δ gives the distance to the line.

The goal is to estimate a common center point m̂ minimizing

    m̂ · n_g − d_g = Δ    (1)

for all (relevant) gradients g. Expanding (1) yields

    Δ = m̂ · n_g − d_g = (m̂_y − y)·g_x + (x − m̂_x)·g_y

and the partial derivatives compute as

    ∂Δ/∂m̂_x = −g_y,    ∂Δ/∂m̂_y = g_x.
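Stacking one such line per selected gradient yields a small linear system: each gradient contributes a row n_g = (−g_y, g_x) and a right-hand side d_g. The following sketch illustrates this consensus step on ideal data. It is a plain least-squares stand-in rather than the paper's implementation, which employs an M-estimator ([Staudte and Sheather 1990]) to down-weight outliers; all variable names are illustrative.

```python
import numpy as np

def consensus_center(points, grads):
    """Estimate the common intersection point of gradient lines.

    A gradient g at image position (x, y) defines a line with
    normal n_g = (-g_y, g_x) and distance d_g = y*g_x - x*g_y.
    The center m minimizing the summed squared distances
    m . n_g - d_g is the least-squares solution of A m = d.
    """
    points = np.asarray(points, dtype=float)
    grads = np.asarray(grads, dtype=float)
    # Row i of A is the line normal (-g_y, g_x) of gradient i.
    A = np.column_stack([-grads[:, 1], grads[:, 0]])
    # d_g = y*g_x - x*g_y for each sample position.
    d = points[:, 1] * grads[:, 0] - points[:, 0] * grads[:, 1]
    m, *_ = np.linalg.lstsq(A, d, rcond=None)
    return m

# Synthetic check: gradients pointing exactly at a known center.
center = np.array([226.0, 221.0])
pts = np.array([[220.0, 218.0], [230.0, 225.0],
                [223.0, 226.0], [229.0, 217.0]])
g = center - pts                               # ideal gradient directions
print(np.round(consensus_center(pts, g), 3))   # -> [226. 221.]
```

With noise-free gradients the lines intersect in a single point and the least-squares solution recovers it exactly; with real gradients the robust estimator takes over this role.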
Algorithm                      Tag     RMSE
center of gravity              CG11    0.1351
[Pérez et al. 2003]            PE10    0.3047
ellipse fit, 20° steps         EFS2    0.4016
ellipse fit, 10° steps         EFS1    0.4022
adaptive image pattern match   PM01    0.1059
gradient direction consensus   GD02    0.0501

Table 1: RMSE values for different algorithms
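The RMSE reported in Table 1 is the root of the mean squared Euclidean distance between detected and ground-truth pupil centers (Section 4). As a minimal sketch with illustrative names, not taken from the original code:

```python
import numpy as np

def rmse(detected, reference):
    """Root mean square of Euclidean distances between detected
    pupil centers and the corresponding ground-truth positions."""
    detected = np.asarray(detected, dtype=float)
    reference = np.asarray(reference, dtype=float)
    dist = np.linalg.norm(detected - reference, axis=1)
    return np.sqrt(np.mean(dist ** 2))

# One detection off by one pixel, one exact hit:
print(rmse([[1.0, 0.0], [0.0, 0.0]],
           [[0.0, 0.0], [0.0, 0.0]]))  # -> 0.7071067811865476
```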
Using an M-estimator (described e.g. in [Staudte and Sheather 1990]), a common intersection point is determined.

Figure 4 shows the positions in the image with gradient strengths within the abovementioned limits. Used positions are marked in dark blue, outliers in light blue. It is apparent in this sample image that the resulting center (red cross) is not 'pushed away' by the glint, as opposed to the reference point (green) calculated by the simple center of gravity approach.

3.2 Image Based Template Matching

Given the relatively small number of pixels to be analyzed, direct image comparison with an adaptable image template is possible. Such a template must reflect the situation of low resolution by taking into account the smooth transitions in intensities between the different regions. That is, the generation of the template image must provide sub-pixel accuracy, anti-aliasing and blending in the drawing process.

Choosing appropriate (initial) pupil and iris radii and luminosity values for the eye regions, an initial template is generated. By iteratively shifting it to different positions around the initial guess, the position with the best match is taken for the next step with reduced search range and step width. This is repeated until a minimum step width, e.g. 0.1 pixel, is reached.

Of course, using appropriate parameters for the generation is essential. Therefore these parameters should be updated over time to account for changes in the appearance of the eye. Intensity values can be adapted by biasing the current values towards the actual values found after every successful match. The pupil radius can be adapted from the ellipse parameters determined by an analysis as described in Section 3.3.

The comparison of two images f and g of size W × H is performed using common methods like the mean squared error or a cross-correlation measure:

    MSE = 1/(W·H) · Σ_{x,y}^{W,H} (f_xy − g_xy)²    (2)

    COR = 1/(W·H) · Σ_{x,y}^{W,H} (f_xy · g_xy)     (3)

Producing a template image with all relevant aspects would cause a multitude of additional parameters to be considered, e.g. glint position(s), glint radius, upper and lower lid positions and form. Since most of these parameters are hard to determine and of little further use, it is easier to mask out these regions from the comparison (Figure 5).

Figure 5: Example template images. Black denotes masked out pixels. The glint position is masked out.

3.3 Ellipse Fitting

For comparison, an ellipse fitting approach inspired by the algorithms presented in Section 2 has been implemented. First, pupil rim points are detected similar to [Pérez et al. 2003]. To eliminate the influence of erroneously detected points, again an M-estimator is employed.

Using 18 sample lines the result is not satisfactory. The estimator sometimes still includes glint border points in the set of input samples to optimize, resulting in an ellipse distorted by the glint (Figure 6).

Sending a ray every 20° seems rather coarse; raising the number of rays to 36 (10° steps), however, does not give a noticeable improvement (see Section 4).

Figure 6: Ellipse fitting

4 Comparison

Providing reference data for eye tracking is difficult. Instructing a subject to follow a point with known coordinates on a computer screen bears two error sources when comparing pupil detection algorithms: errors introduced by the gaze estimation algorithms due to non-linearities cannot be distinguished from detection errors, and due to unintentional saccades the position of the pupil center does not necessarily correspond to the reference position on screen.

Given the low contrast in eye images gathered from a low resolution device, it is also difficult to assign a ground truth value to real images matching the desired accuracy. Hence, as a first means of evaluation, synthetic images have been generated with graphics algorithms capable of producing sub-pixel accurate anti-aliased output. While these images cannot be seen as a valid replacement for real images, it can at least be argued that bad performance on such almost ideal images will not become better on real images.

To distinguish between pure pupil center detection and the situation where glints put an additional strain on the algorithm, the first tests have been performed without glints. As some algorithms benefit from prior knowledge of the glint position (if any), an accurate glint position detection should be performed prior to the pupil center detection; it has been neglected in this work.

A series of 700 images with synthetic eye positions at regular sub-pixel positions was generated as test input. The algorithms from Section 3 as well as the center of gravity and the algorithm described in [Pérez et al. 2003] (which performed best in [Droege et al. 2008]) have been applied to this sequence. The Euclidean distance to the reference coordinate is measured; the resulting root mean square error (RMSE) is listed in Table 1. The deviation of the measured positions (red dots) from the reference (light blue dots) is shown in Figures 7-9.
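The masked comparison and coarse-to-fine shift search of the adaptive template matching (Section 3.2) could be sketched as follows. This is a simplified illustration, not the authors' code: the template renderer and parameter adaptation are omitted, bilinear interpolation is one plausible way to realize sub-pixel shifts, and the function names and the 3×3 search pattern are assumptions.

```python
import numpy as np

def masked_mse(window, template, mask):
    """Equation (2), restricted to the unmasked template pixels."""
    return ((window - template) ** 2)[mask].mean()

def sample_shifted(image, x, y, h, w):
    """Bilinearly sample an h x w window whose top-left corner
    sits at the sub-pixel position (x, y) of the image."""
    xs = np.arange(w) + x
    ys = np.arange(h) + y
    x0 = np.floor(xs).astype(int)
    y0 = np.floor(ys).astype(int)
    fx = (xs - x0)[np.newaxis, :]
    fy = (ys - y0)[:, np.newaxis]
    # Blend the four neighbouring pixels of every sample position.
    a = image[np.ix_(y0, x0)]
    b = image[np.ix_(y0, x0 + 1)]
    c = image[np.ix_(y0 + 1, x0)]
    d = image[np.ix_(y0 + 1, x0 + 1)]
    return (a * (1 - fx) * (1 - fy) + b * fx * (1 - fy)
            + c * (1 - fx) * fy + d * fx * fy)

def refine_position(image, template, mask, x0, y0, step=1.0, min_step=0.1):
    """Coarse-to-fine search: evaluate a 3x3 neighbourhood of shifts,
    keep the best match, halve the step until min_step is reached."""
    h, w = template.shape
    x, y = float(x0), float(y0)
    while step >= min_step:
        candidates = [(x + dx * step, y + dy * step)
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
        def err(p):
            win = sample_shifted(image, p[0], p[1], h, w)
            return masked_mse(win, template, mask)
        x, y = min(candidates, key=err)
        step /= 2.0
    return x, y
```

Halving the step width each round reaches a sub-pixel resolution of 0.1 pixel after a handful of iterations, matching the stopping criterion described above.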
Figure 7: Ellipse fit sub-sampled, 10° steps, synthetic input.
Figure 8: Adaptive image pattern match, synthetic input.
Figure 9: Gradient direction consensus, synthetic input.
Some algorithms show periodic patterns repeating at pixel frequency, as in Figure 7. This is easily explained by the effects of aliasing and thresholds, but is nevertheless undesirable.

The RMSE values from Table 1 seem to contradict the plots at first glance, but the variance for some of the other algorithms is much higher, resulting in a bad RMSE result. The most appealing result is found in Figure 9 for the gradient direction consensus. This is backed by the best RMSE value; however, it must be remembered that this is measured on 'clean' synthetic input images. Adding noise to the images reveals that the performance decreases the stronger the noise in the images is. Additionally, no real iris structure is present in the synthetic images, which would introduce noisy regions even in the case of a good input signal.

Figure 10: RMSE values dependent on noise strength (σ)

Figure 10 shows the RMSE values for some of the algorithms as a function of the strength of noise added to the synthetic images, for the left and right eye (marked 'L' and 'R'). The sigma of the Gaussian noise is varied from 0 to 7. The labels correspond to Table 1.

While the subjective impression of the performance on real input images looks good, no quantifiable results can be given up to now. Since the overall performance in a complete eye tracking system is always influenced by the gaze estimation process, such results are difficult to interpret and to attribute to the specific algorithm used.

5 Conclusion

It is evident that a reasonable comparison must not be based on RMSE values of the dislocation distance alone. Additional measures like the variance must be used to characterize the performance. The approaches of adaptive pattern matching and gradient direction consensus seem to be good candidates for further investigation. Their robustness to noisy signals and distortions by glints is still to be investigated further; the subjective impressions observed with real input images look promising.

By improving the test framework and increasing the naturalness of the test image series, a more thorough investigation of new and existing algorithms will be possible and lead to improved results.

References

DAUGMAN, J. 2004. How iris recognition works. IEEE Transactions on Circuits and Systems for Video Technology 14, 1, 21-30.

DAUNYS, G., AND RAMANAUSKAS, N. 2004. The accuracy of eye tracking using image processing. In NordiCHI '04, ACM, New York, NY, USA, 377-380.

DROEGE, D., SCHMIDT, C., AND PAULUS, D. 2008. A comparison of pupil center estimation algorithms. In COGAIN 2008, H. Istance, O. Stepankova, and R. Bates, Eds. (short paper).

HANSEN, D. W., AND JI, Q. 2009. In the eye of the beholder: A survey of models for eyes and gaze. IEEE Transactions on Pattern Analysis and Machine Intelligence. (in print).

LI, D., WINFIELD, D., AND PARKHURST, D. J. 2005. Starburst: A hybrid algorithm for video-based eye tracking combining feature-based and model-based approaches. In CVPR '05, IEEE, Washington, 79-86.

MAJARANTA, P., AND RÄIHÄ, K.-J. 2002. Twenty years of eye typing. In Proceedings of Eye Tracking Research and Applications, ACM, 15-22.

OHNO, T., MUKAWA, N., AND YOSHIKAWA, A. 2002. Freegaze: A gaze tracking system for everyday gaze interaction. In Proceedings of the symposium on ETRA 2002: eye tracking research and applications symposium, 125-132.

PÉREZ, A., CÓRDOBA, M., GARCÍA, A., MÉNDEZ, R., MUÑOZ, M., PEDRAZA, J., AND SÁNCHEZ, F. 2003. A precise eye-gaze detection and tracking system. In WSCG POSTERS proceedings, February 3-7, 2003.

POURSABERI, A., AND ARAABI, B. 2005. A novel iris recognition system using morphological edge detector and wavelet phase features. ICGST International Journal on Graphics, Vision and Image Processing 05.

SAN AGUSTIN, J., SKOVSGAARD, H., HANSEN, J. P., AND HANSEN, D. W. 2009. Low-cost gaze interaction: ready to deliver the promises. In CHI EA '09: 27th intl. conf. on Human factors in computing systems, ACM, New York, USA, 4453-4458.

STAUDTE, R. G., AND SHEATHER, S. J. 1990. Robust Estimation and Testing. Wiley, New York.