GaZIR is a gaze-based interface for searching and browsing images. We first describe the system in detail: how users interact with it and how it uses eye-tracking to predict the relevance of the images the user is searching for. We then evaluate the system by testing both its image-relevance predictions and its actual image-retrieval accuracy.
GaZIR: Gaze-based Zooming Interface for Image Retrieval
1. Intelligent Multimodal Interaction 2014
Francesco Bonadiman
Craig Kershaw
GaZIR
“Gaze-based Zooming Interface
for Image Retrieval”
László Kozma, Arto Klami, Samuel Kaski
Helsinki Institute for Information Technology HIIT
2. What is GaZIR?
● Gaze-based interface for searching and
browsing images
○ Collecting information from what the user would do
naturally via eye-tracking (implicit feedback)
● The user can zoom-in and out
○ Focusing on the centre or the borders
○ Allowing Image retrieval
3. Put a Ring on it!
● Consists of 3 Rings of images
○ Each consecutive ring shows the next set of
relevant images based on information gathered
from the previous ring
● Better for predicting image relevance
○ Avoids users scanning images row-by-row, as with
grid-based layouts
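The ring layout above can be sketched with a little trigonometry: images are spaced evenly around concentric circles rather than in a grid. The counts and radii below are illustrative, not the paper's exact configuration.

```python
import math

def ring_layout(counts, radii):
    """Place images evenly on concentric rings around the centre.

    counts: number of images on each ring (inner to outer)
    radii:  radius of each ring
    Returns a list of (ring_index, x, y) positions.
    """
    positions = []
    for ring, (n, r) in enumerate(zip(counts, radii)):
        for i in range(n):
            angle = 2 * math.pi * i / n  # evenly spaced angles
            positions.append((ring, r * math.cos(angle), r * math.sin(angle)))
    return positions

# e.g. 8, 12, and 16 images on three rings (hypothetical numbers)
layout = ring_layout([8, 12, 16], [1.0, 2.0, 3.0])
```

Zooming in then simply shifts each ring outwards and fills the new inner ring with the next set of retrieved images.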
5. Eye Tracking
● Eye tracker follows the user's pupil movements
○ Fixation of >120ms → Relevant image
● 3 main advantages of Eye-Tracking
○ Effortlessness, user only needs to look at the images
○ “I-will-know-it-when-I-see-it” search problems
○ Hands are not needed → Motor disabilities
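The fixation rule on this slide reduces to a simple threshold test: any image that receives a fixation longer than 120 ms is treated as implicitly relevant. A minimal sketch, assuming a gaze log of per-image fixation durations (the log format and names are hypothetical):

```python
def relevant_images(fixations, threshold_ms=120):
    """Mark an image as implicitly relevant if any single fixation
    on it lasted longer than the threshold (120 ms in GaZIR)."""
    return {img for img, durations in fixations.items()
            if any(d > threshold_ms for d in durations)}

# hypothetical gaze log: image id -> fixation durations in ms
log = {"img1": [80, 150], "img2": [40], "img3": [300]}
relevant_images(log)  # -> {"img1", "img3"}
```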
6. Similar Work
● Only preliminary studies
● Oyekoya et al.
○ Simple retrieval → relevance from viewing time
● Klami et al.
○ More complex predictions
○ Only measure isolated predictions
○ Artificial setup
7. … and GaZIR?
● GaZIR → combines two approaches
○ More sophisticated relevance predictor
○ Real retrieval search engine
○ Interface designed for gaze-based interaction
8. Aim of the research paper
● To provide a user interface that is more
fluid and natural for searching
● Test whether such a system is feasible to
build and whether it works in practice with
pre-existing CBIR (content-based image
retrieval) search engines
○ Designed to work with any CBIR engine
9. Data collection
● Simplifications were made
○ user was only expected to zoom inwards
○ not allowed to reset the process
○ images only retrieved when zooming in
○ mouse wheel used for zooming (no eye control)
● Training data collected to create a model
○ show images closer to users’ expectations
10. Experiment 1
● 6 different users
● Each of them performing 6 search tasks
○ look into the MirFlickr database
○ search images matching the category description
○ indicate which ones were relevant
● On average around 120 images per task
○ eye-movement data over 4300 user-task-image instances
11. Experiment 2
● 3 of the users from the previous experiment
● 6 new search tasks:
○ 2 with the gaze-based relevance predictor
○ 2 with a dummy interface
○ 2 same interface + explicit feedback (mouse click)
● Performance measured:
○ as the proportion of relevant images retrieved
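The measure above is just precision over the images shown during a task. A minimal sketch (function name and inputs are my own, not from the paper):

```python
def precision(shown, relevant):
    """Proportion of shown images that the user judged relevant."""
    if not shown:
        return 0.0
    return sum(1 for img in shown if img in relevant) / len(shown)

precision(["a", "b", "c", "d"], {"a", "c"})  # -> 0.5
```

Comparing this proportion across the three interface conditions (gaze-based, dummy, explicit feedback) gives the per-condition performance.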
12. Results
● Prediction accuracy > random for all users
○ confirms “relevance through eye movements”
● Huge differences between the users
○ due to different tasks or different use of the system
○ for some users, prediction accuracy → excellent
○ for others → only slightly better than random
○ explicit feedback (mouse click) → the best
○ predicted feedback → comparable for 50% of tasks
13. Contribution
● Distinction between false positives and false negatives
○ former: images that look similar to relevant ones but miss details
○ latter: images (too) easy to recognise as relevant, so fixated only briefly
● Promising results → further experiments
● GaZIR is concluded to be
○ “first attempt of building a sophisticated image
retrieval interface utilizing implicit gaze information”