Can AI say from our eyes when we read relevant information?

EYE TRACKING
COMPUTER VISION
INFORMATION
RELEVANCE

RELEVANCE PREDICTION FROM EYE MOVEMENTS
Using Semi-interpretable Convolutional Neural Networks
Nilavra Bhattacharya¹, Somnath Rakshit¹, Jacek Gwizdka¹, Paul Kogut²
¹ School of Information, The University of Texas at Austin
² Rotary and Mission Systems, Lockheed Martin Corporation
ACM SIGIR CHIIR 2020 • VANCOUVER VIRTUAL
ixlab.ischool.utexas.edu

INTRODUCTION

Two Worlds in the Information Field
Image: Tefko Saracevic (https://studylib.net/doc/15399702)
• Situational relevance or utility: “situationally relevant items of information are those that answer, or logically help to answer, questions of concern” (Wilson, 1973)
• This work: situational relevance = users’ perceived relevance of the documents they examine for answering a question
Eye-tracking (Image: https://www.noldus.com/applications/eye-tracking-physiology)

Background: Eye-tracking & Information Relevance
• Drawback 1: aggregate ET data at stimulus / trial / participant level
• aggregated fixation counts/durations (Fahey+ 2011; Frey+ 2013; Gwizdka 2014; Loboda+ 2011; Puolamäki+ 2008; Wenzel+ 2017; Wittek+ 2016)
• reading related preprocessing before aggregation (Buscher+ 2008; 2012; Gwizdka, 2014a; 2014b, 2017; Gwizdka+ 2017)
• ET features from 2-second windows near the end of a trial have more discriminating power (Gwizdka+ 2017)
=> collapsing ET data leads to loss of information
• Drawback 2: lack of standard feature selection => varied prediction performance; accuracy rarely above 70% (Simola+ 2008; Slanzi+ 2017; Wenzel+ 2017; Gwizdka+ 2017)
Figure: Eye Movement Scanpath 1 vs. Eye Movement Scanpath 2. Similar? Different? How much?
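A minimal sketch of Drawback 1, using made-up fixation values (the `Fixation` tuple, the coordinates, and the durations are all hypothetical, not data from the study). It shows two scanpaths whose aggregated features are identical even though the eye moved very differently:

```python
from collections import namedtuple

# Hypothetical fixation record: screen position (px), start time and duration (ms).
Fixation = namedtuple("Fixation", ["x", "y", "start_ms", "duration_ms"])

def aggregate(scanpath):
    """Collapse a scanpath into trial-level features, discarding order and position."""
    durations = [f.duration_ms for f in scanpath]
    return {
        "fixation_count": len(durations),
        "total_duration_ms": sum(durations),
        "mean_duration_ms": sum(durations) / len(durations),
    }

# Scanpath A: steady left-to-right movement along one line of text.
reading = [
    Fixation(100, 200, 0, 200),
    Fixation(200, 200, 250, 250),
    Fixation(300, 200, 550, 150),
    Fixation(400, 200, 750, 300),
]

# Scanpath B: the same set of durations, but the eye jumps around the page.
scanning = [
    Fixation(400, 600, 0, 300),
    Fixation(100, 100, 350, 150),
    Fixation(350, 450, 550, 250),
    Fixation(150, 300, 850, 200),
]

same_aggregates = aggregate(reading) == aggregate(scanning)
same_scanpath = [(f.x, f.y) for f in reading] == [(f.x, f.y) for f in scanning]
print(same_aggregates, same_scanpath)
```

Any classifier fed only the aggregated features cannot tell these two trials apart, which is the information loss the slide refers to.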

Background: Convolutional Neural Networks
• image classification is a major application of CNNs
• take an input image and predict a label for the image (e.g. “cat” or “dog”?)
• transfer learning: training received by a CNN for solving one task can be re-used to solve another related task
• e.g. training from cat/dog classifier can be re-used to classify traffic symbols
• benchmark CNN models, pre-trained on millions of images for classification tasks (ImageNet challenge) are readily available
• e.g. VGG, ResNet, DenseNet, etc.
Image: https://towardsdatascience.com/covolutional-neural-network-cb0883dd6529

Proposed Approach
Image: https://dev.to/frosnerd/handwritten-digit-recognition-using-convolutional-neural-networks-11g0
Pipeline: Eye-movement scanpath → Scanpath-image → CNN image classifier → Prediction: user perceived relevant?

EYE-TRACKING
USER STUDY

Experimental Design
• Participants (N = 25, college-age students)
Example trigger question: The submarine Kursk was part of which Russian fleet?
Trial elements (from the design figure):
• Trigger question (TREC 2005 Q&A Task); Spacebar
• Fixation cross (+), 1 s
• Short news article (AQUAINT Corpus of English News Text)
• Relevance judgement (binary): Y/N, then Spacebar
• Fixation cross (+), fixation >= 2 s
Judgement outcomes: Perceived Relevant, Perceived Irrelevant

GENERATING
SCANPATH-IMAGES

Generating Scanpath-Images
Figure panels: SCANPATH, SCANPATH-IMAGE
Encode three attributes of eye fixations:
1. fixation location
2. fixation duration
3. fixation start time, for temporal ordering
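The three attributes above can be sketched as a mapping from raw fixations to drawable markers. This is an illustration under assumptions: the `Fixation` class, the `to_markers` helper, the linear duration-to-radius scaling, and the radius limits are hypothetical choices, since the slide does not state how duration is drawn.

```python
from dataclasses import dataclass

@dataclass
class Fixation:
    x: float            # fixation location on screen (px)
    y: float
    start_ms: float     # fixation start time, used for temporal ordering
    duration_ms: float  # fixation duration

def to_markers(fixations, min_radius=2.0, max_radius=12.0):
    """Return (x, y, radius) tuples in temporal order.

    Assumed encoding: radius grows linearly with fixation duration,
    from min_radius (shortest fixation) to max_radius (longest).
    """
    ordered = sorted(fixations, key=lambda f: f.start_ms)
    durations = [f.duration_ms for f in ordered]
    lo, hi = min(durations), max(durations)
    span = (hi - lo) or 1.0  # avoid division by zero when all durations match
    markers = []
    for f in ordered:
        scale = (f.duration_ms - lo) / span
        markers.append((f.x, f.y, min_radius + scale * (max_radius - min_radius)))
    return markers

# Fixations given out of order; to_markers restores the temporal sequence.
fixations = [
    Fixation(50, 60, start_ms=700, duration_ms=200),
    Fixation(10, 20, start_ms=0, duration_ms=100),
    Fixation(30, 40, start_ms=150, duration_ms=300),
]
markers = to_markers(fixations)
print(markers)
```

The ordered marker list carries location and duration directly; the ordering itself is what a colour scale can then make visible in the image.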

Generating Scanpath-Images: Fixation Duration

Generating Scanpath-Images: Fixation Start Time
• saccade colour follows Matplotlib’s winter colourmap, running from the first saccade to the last saccade
• each linearized saccade has a solid colour
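A rough sketch of the temporal colour coding, in plain Python. The `winter` function here is a hand-written linear approximation of Matplotlib's winter colourmap (blue at 0, green at 1), and `saccade_colours` is a hypothetical helper; the actual plotting code used for the slides is not shown in the source.

```python
def winter(t):
    """Approximate Matplotlib's 'winter' colourmap as an RGB tuple.

    t = 0 gives blue (0, 0, 1); t = 1 gives green (0, 1, 0.5).
    """
    t = min(max(t, 0.0), 1.0)
    return (0.0, t, 1.0 - 0.5 * t)

def saccade_colours(fixation_points):
    """Pair consecutive fixations into straight-line saccades, one solid colour each.

    The colour index runs from 0 for the first saccade to 1 for the last.
    """
    segments = list(zip(fixation_points, fixation_points[1:]))
    n = len(segments)
    coloured = []
    for i, (start, end) in enumerate(segments):
        t = i / (n - 1) if n > 1 else 0.0
        coloured.append((start, end, winter(t)))
    return coloured

# Four fixations in temporal order give three saccades.
points = [(10, 20), (30, 40), (50, 60), (70, 80)]
coloured = saccade_colours(points)
for start, end, rgb in coloured:
    print(start, "->", end, rgb)
```

With Matplotlib installed, each segment could be drawn with `plt.plot` and `color=rgb`, or the colours taken from the library's own winter colourmap instead of this approximation.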

Scanpath-Images
PERCEIVED RELEVANT PERCEIVED IRRELEVANT

SCANPATH-IMAGE
CLASSIFICATION

Scanpath-Image Classification
Given only the scanpath-image of a user’s eye movements on the news article, predict if the user perceived the article to be relevant for answering the trigger question.
Image: https://dev.to/frosnerd/handwritten-digit-recognition-using-convolutional-neural-networks-11g0
Pipeline: Scanpath-image → CNN image classifier → Perceived relevance prediction

Scanpath-Image Classification: Neural Network Architecture
Model pipeline:
• Pre-trained CNN model (ImageNet weights)
• Fully connected layer (256 nodes, ReLU, with/without L1L2 regularization)
• Dropout (prob = 0.2)
• Output layer (1 node, Sigmoid)
Pre-trained CNN models compared:
• Shallow(er) models: VGG16, VGG19
• Really deep models: ResNet50, DenseNet121, DenseNet201, InceptionResNetV2
Optimizer: Stochastic Gradient Descent (SGD) with momentum
Final hyperparameters: epochs: 6, batch-size: 16, momentum: 0.9
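To make the optimizer and output layer concrete, here is a toy sketch of SGD with momentum driving a single sigmoid output node, in plain Python. It is not the network from the slides: the two input features, the zero initial weights, and the learning rate are hypothetical (the slide does not report a learning rate); only the momentum of 0.9, the 6 epochs, and the sigmoid output come from the source.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, features):
    """Sigmoid output node: probability of the 'perceived relevant' class."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)))

def bce_gradient(weights, features, label):
    """Gradient of binary cross-entropy w.r.t. the weights: (p - y) * x."""
    p = predict(weights, features)
    return [(p - label) * x for x in features]

def sgd_momentum_step(weights, velocity, grads, lr=0.01, momentum=0.9):
    """Classical momentum update: v <- momentum * v - lr * grad; w <- w + v."""
    new_velocity = [momentum * v - lr * g for v, g in zip(velocity, grads)]
    new_weights = [w + v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity

# One hypothetical training example labelled 'relevant' (1).
features, label = [1.0, 2.0], 1.0
weights, velocity = [0.0, 0.0], [0.0, 0.0]

p_before = predict(weights, features)
for _ in range(6):  # 6 passes, mirroring the epoch count on the slide
    grads = bce_gradient(weights, features, label)
    weights, velocity = sgd_momentum_step(weights, velocity, grads)
p_after = predict(weights, features)

print(round(p_before, 3), round(p_after, 3))
```

The velocity term accumulates past gradients, so repeated steps in the same direction get progressively larger; in a deep-learning framework the same behaviour comes from the optimizer's momentum setting.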

Scanpath-Image Classification: Results
For this specific task:
• Models do not overfit
• Shallow models classify better than deep models
Table 1 from paper (Shallow vs. Deep models)

CNN PREDICTION
INTERPRETABILITY

Attempt to Interpret CNN Predictions
Gradient-Weighted Class Activation Mapping (Grad-CAM)
Figure panels: Original Image, CAM for “Cat” class, CAM for “Dog” class
2017 IEEE International Conference on Computer Vision
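The core Grad-CAM computation can be sketched in a few lines of plain Python. The sketch assumes the feature maps of the last convolutional layer and the gradients of the class score with respect to them have already been obtained from the network; the 2×2 maps below are made-up numbers, and `grad_cam` is an illustrative helper rather than a library function.

```python
def grad_cam(activations, gradients):
    """Compute a Grad-CAM heatmap from K feature maps and their gradients.

    activations, gradients: lists of K maps, each an H x W nested list.
    Each map is weighted by its global-average-pooled gradient, the weighted
    maps are summed, and negative values are clipped to zero (ReLU).
    """
    height, width = len(activations[0]), len(activations[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for fmap, grad in zip(activations, gradients):
        cells = [g for row in grad for g in row]
        alpha = sum(cells) / len(cells)  # importance weight of this feature map
        for i in range(height):
            for j in range(width):
                cam[i][j] += alpha * fmap[i][j]
    return [[max(0.0, value) for value in row] for row in cam]

# Two hypothetical 2x2 feature maps: the first supports the class, the second opposes it.
activations = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[4.0, 3.0], [2.0, 1.0]],
]
gradients = [
    [[1.0, 1.0], [1.0, 1.0]],
    [[-1.0, -1.0], [-1.0, -1.0]],
]
cam = grad_cam(activations, gradients)
print(cam)
```

In practice the resulting map is upsampled to the input image size and overlaid on it, which is how the highlighted regions in the cat/dog example are produced.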

Attempt to Interpret CNN Predictions
Figure columns: SCANPATH, CLASS ACTIVATION MAP (CAM), AVERAGE CAM (across all scanpath-images in this relevance class)
Figure rows: Perceived Irrelevant, Perceived Relevant
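One plausible way to form a per-class average CAM is an element-wise mean over all CAMs that share a relevance label. The sketch below assumes that reading; the `average_cams` helper, the label strings, and the 2×2 maps are hypothetical, and the slides do not state whether any normalisation was applied before averaging.

```python
def average_cams(cams_with_labels):
    """Average same-sized CAMs element-wise within each class label.

    cams_with_labels: list of (label, cam) pairs, cam being an H x W nested list.
    Returns a dict mapping each label to its mean CAM.
    """
    grouped = {}
    for label, cam in cams_with_labels:
        grouped.setdefault(label, []).append(cam)
    averages = {}
    for label, cams in grouped.items():
        height, width = len(cams[0]), len(cams[0][0])
        averages[label] = [
            [sum(c[i][j] for c in cams) / len(cams) for j in range(width)]
            for i in range(height)
        ]
    return averages

# Made-up CAMs for three scanpath-images.
cams_with_labels = [
    ("relevant", [[0.0, 2.0], [4.0, 6.0]]),
    ("relevant", [[2.0, 2.0], [0.0, 2.0]]),
    ("irrelevant", [[1.0, 0.0], [0.0, 1.0]]),
]
avg = average_cams(cams_with_labels)
print(avg["relevant"])
print(avg["irrelevant"])
```

Regions that stay bright in the averaged map are the ones the classifier attends to consistently across a class, rather than in a single scanpath-image.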

CONCLUSION

Conclusion
Limitations:
• very simple information search task
• short texts of similar type
• relatively uniform group of participants (college-age students)
Future Directions:
• complex scenarios, e.g., freely searching on the open web
• diverse participants, e.g., young vs. older adults
• Eye-movement scanpath-image classification:
  • no aggregate measures: all eye-tracking data is used
  • spatio-temporal aspects of eye-movements are preserved
  • knowledge of screen content not needed
  • additional insights (e.g. reading / scanning) not needed
• Proof of concept:
  • promising results, even with small dataset, without overfitting
  • CNNs trained for a different task can detect patterns in eye-movements which are concordant with prior literature

Acknowledgements
Student Travel Grant
Experimental Design Contribution, Data Collection:
• Prof. Bradley Hatfield
• Dr. Rodolphe Gentili
• Dr. Joe Dien
• Hyuk Oh
• Kyle James Jaquess
• Li-Chuan Lo
Department of Kinesiology, University of Maryland, College Park
For inspiration: Blog post on using mouse trajectories for fraud detection (Gleb Esman, Splunk Inc.)
THANK YOU
@NilavraB • nilavra@ieee.org • ixlab.ischool.utexas.edu
Full paper:
https://dl.acm.org/doi/10.1145/3343413.3377960
https://arxiv.org/abs/2001.05152
