MindCare 2014: Towards a Smart Wearable Tool to Enable People with SSPI to Communicate by Sentence Fragments

Gyula Vörös (ELTE)
Anita Verő (ELTE)
Balázs Pintér (ELTE)
Brigitta Miksztay-Réthey (ELTE)
Takumi Toyama (DFKI)
Andras Lorincz (ELTE)
Daniel Sonntag (DFKI)
MindCare 2014, May 8, University of Tokyo
Tuesday, May 20, 14

Motivation
• The ability to communicate with others is of paramount importance
for mental well-being.
• SSPI: Severe speech and physical impairments / communication
disorders
– Cerebral palsy, stroke, ALS/motor neurone disease (MND), muscle spasticity,
etc.
• People with SSPI face enormous challenges in daily life activities.
• A person who cannot speak may be forced to communicate only
with their closest relatives, thereby relying on them completely
for interacting with the external world.
• Understanding people who use traditional alternative and
augmentative communication (AAC) methods (such as gestures and
communication boards) requires training.
• These methods restrict possible communication partners to those
who are already familiar with AAC.

Motivation
• In AAC, utterances consisting of multiple symbols are often
telegraphic: unlike natural sentences, they often lack the words needed
for successful and detailed communication (e.g., “Water now!” instead of “I
would appreciate a glass of water, immediately.”).
– Some systems allow users to produce whole utterances or
sentences that consist of multiple words. The main task of the
AAC system is to store and retrieve such utterances.
– However, using a predefined set of sentences severely restricts
the applicability.
• Other approaches allow for a generation of utterances from an
unordered, incomplete set of words, but they use predefined rules
that constrain communication.

Augmented Reality in Medicine
Video: https://dl.dropboxusercontent.com/u/48051165/ERmed_CHI2013video.mp4

System Architecture: Mobile Web and App Design
[Architecture diagram. Labels: ERmed Proxy, ERmed Bridge, XML-RPC, TCP; speech-based interaction, HMD & gaze-based interaction, pen-based interaction]

Goals of presented work
• To enable broad accessibility and
communication possibilities for people
with SSPI
• Technical Challenges:
– Overcome physical impairments (for user
input)
– Help non-trained people to understand
people with SSPI (towards generation)

Approach
• The most effective way for people to communicate would be
spontaneous language generation.
• Novel utterance generation: the ability to say “almost everything,”
without a strictly predefined set of possible utterances/generations/
productions.
• We attempt to give people with SSPI the ability to say almost anything.
For this reason, we would like to build a general system that produces
novel utterances without predefined rules. We chose a data-driven
approach in the form of statistical language modeling.
• In some conditions (e.g., cerebral palsy), people suffer from
communication and very severe movement disorders at the same
time. For them, special peripherals are necessary. Eye tracking
provides a promising alternative to people who cannot use their
hands.

Fourfold solution
(1) Smart glasses with gaze trackers (thereby
extending ) (gaze tracking and
interpretation/graphical symbol selection)
(2) Symbol / Utterance selection
(3) Language generation (sentence generation,
thereby using language models to propose
best hypotheses)
(4) Text-To-Speech functionality (TTS)

Hardware components
• Eye tracking glasses (ETG)
– Forward-looking camera
– Eye-tracking cameras
• Head mounted display (HMD)
– See-through
• Motion Processing Unit (MPU)
– Capture head gestures that indicate
recalibration need

Eye Tracking Glasses
Eye Tracking Glasses by SensoMotoric
Instruments GmbH

Head Mounted Display
AiRScouter Head Mounted Display by Brother
Industries

Motion Processing Unit
MPU-9150 Motion Processing Unit by InvenSense
Inc.

Communication Board

Main functions
• Symbol selection with gaze tracking
– Calibration is crucial and poses a problem
– Adapt or recalibrate?
• Utterance generation
– Based on selected symbols
– Uses natural language models

Usage scenario and steps

Experiment 1
Retina HMD Setup

Symbol selection on HMD
• Idea: the user selects communication
symbols on the HMD, using eye
tracking
• Crucial problems with calibration
– Eye tracking has to be calibrated
– Calibration may degrade over time
– Adapt or recalibrate?
• User might tolerate calibration errors (when
they are small and negligible)

Symbol selection on HMD

Symbol selection game on HMD
• Goal: select the numbers in correct order
(1234123…)
• Each selection adds a small amount of error
• Participants indicate when the error is too significant to
be tolerated
• 4 participants, 4 experiments each
• On average, errors up to 2.7° deviation from actual
eye focus are tolerable.

Real communication experiments
• The Brother Retina HMD was too small to show a
full-sized communication board (CB) (although
resolution is quite good)
• We use a projector / beamer to display the CB
– Simulate a large HMD
• Recognition Feedback is provided (to the user)
– Estimated gaze position + selection
• Speech generation
– The word for the selected symbol is synthesized

Experiment 2
Projector as HMD Surrogate
Real patient(s)

https://www.youtube.com/watch?v=C4DOmMtgqz0
Participant
• The participant in this example test is a 30-year-old
man with cerebral palsy.
• He usually communicates with a headstick and an
alphabetical communication board, or with a PC-
based virtual keyboard controlled by head tracking.
• Mousense is already an improvement.

Bliss symbols and words (in Hungarian)
on board

tea, two, sugar
(tea, kettő, cukor)
tea, lemon, sugar
(tea, citrom, cukor)
one, glass, soda
(egy, pohár, kóla)

Communication Setting
• The participant could move his head
• The head position must be accounted for (MPU)
• Fiducial markers around the board are recognized
by vision-based pattern recognition techniques
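
One common way to account for head movement with markers is to map the gaze point from the scene-camera image onto board coordinates through a homography estimated from four marker positions. The sketch below is an assumption about how this could be done, not the method the slides document; the marker coordinates and all function names are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def estimate_homography(src, dst):
    """Return the 9 homography entries (last one fixed to 1) from 4 point pairs."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        b.append(u)
        a.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        b.append(v)
    return solve_linear(a, b) + [1.0]


def camera_to_board(h, point):
    """Map a scene-camera pixel to normalized board coordinates."""
    x, y = point
    w = h[6] * x + h[7] * y + h[8]
    return ((h[0] * x + h[1] * y + h[2]) / w,
            (h[3] * x + h[4] * y + h[5]) / w)


# Hypothetical marker centers in the camera image (pixels) and on the board (0..1).
MARKERS_CAMERA = [(100.0, 80.0), (540.0, 100.0), (520.0, 400.0), (90.0, 380.0)]
MARKERS_BOARD = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]

H = estimate_homography(MARKERS_CAMERA, MARKERS_BOARD)
gaze_on_board = camera_to_board(H, (310.0, 240.0))  # a gaze sample in camera pixels
```

Re-estimating the mapping for every camera frame would keep the gaze-to-board mapping valid while the head moves, as long as all four markers stay visible.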

Usage and HCI details
• The estimated gaze position was indicated as a
small red circle on the projected board (similar to
the previous test).
• A symbol was selected by keeping the red circle on
it for two seconds.
• The eye tracker sometimes needs recalibration:
– The user could initiate recalibration by raising his
head (detected by the MPU).
– Once the recalibration process was triggered, a
distinct sound was played, and an arrow
indicated where the participant had to look
to recalibrate.
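
The two-second dwell selection described above can be sketched as a small state machine. This is a minimal illustration, assuming rectangular symbol regions and timestamped gaze samples; the class and field names are hypothetical and not taken from the system.

```python
from dataclasses import dataclass


@dataclass
class Symbol:
    name: str
    x: float  # left edge of the symbol on the board
    y: float  # top edge
    w: float  # width
    h: float  # height

    def contains(self, gx, gy):
        return self.x <= gx < self.x + self.w and self.y <= gy < self.y + self.h


class DwellSelector:
    """Select a symbol once the gaze has rested on it for dwell_seconds."""

    def __init__(self, symbols, dwell_seconds=2.0):
        self.symbols = symbols
        self.dwell_seconds = dwell_seconds
        self._current = None  # symbol currently under the gaze point
        self._since = None    # time at which the gaze entered it

    def update(self, gx, gy, t):
        """Feed one gaze sample; return a symbol name when a dwell completes."""
        hit = next((s for s in self.symbols if s.contains(gx, gy)), None)
        if hit is not self._current:
            # Gaze moved to another symbol (or off the board): restart the timer.
            self._current = hit
            self._since = t
            return None
        if hit is not None and t - self._since >= self.dwell_seconds:
            self._since = t  # restart so the symbol is not re-selected at once
            return hit.name
        return None


board = [Symbol("tea", 0, 0, 100, 100), Symbol("sugar", 100, 0, 100, 100)]
selector = DwellSelector(board)
```

Feeding `selector.update(gx, gy, t)` with each new gaze estimate returns `None` until the gaze has stayed on one symbol for the full dwell time.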

Communication Setup

Communication experiments
• Goal: communicate with a partner
• Two situations (with different boards)
– Buying food
– Discussing an appointment

Experimental results
• Verification
– To verify that communication was successful,
the participant indicated misunderstandings
using well-known yes-no gestures, which
were quick and reliable. Moreover, a certified
expert in AAC was present to indicate
apparent communication problems.
– 205 symbol selections happened
– 23 of them were incorrect
• 89% accuracy
– The error rate was acceptable
• Real communication took place!
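
The accuracy figure above follows directly from the two selection counts; as a quick check (variable names are illustrative):

```python
total_selections = 205
incorrect_selections = 23

# Fraction of selections that were correct.
accuracy = (total_selections - incorrect_selections) / total_selections
print(f"{accuracy:.1%}")  # 88.8%, which rounds to the reported 89%
```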

Experiment 3
External Symbols (towards
mixed reality setup)

External symbols
• Idea
– Not all symbols are present in the system
– Optical character recognition can help
– (Object recognition can help)
• Example
– The user wants to buy a certain type of
sandwich
– In the store, there are labels with the
names of the sandwich types on them

External symbols - Communication Setting

Results
• Wizard-of-Oz (WoZ) experiment for OCR (due to bad
lighting conditions)
• Similarly good results as in Experiment 2

Technical input processing
and sentence generation
methods

Sentence fragment generation
• A word guessing game
– Good afternoon, how are ___ → you?
– I apologize for being late, I am very ___ → sorry!
– My favorite OS is ___ → [Linux, Mac OS, Windows XP, Win CE].
• A symbol interpretation game
• LM needs to help
– To select words with the right sense (disambiguate homonyms: e.g.,
river bank versus money bank)
– To select hypotheses where graphical symbols (words) are ‘in the
right place’
– To increase cohesion between words (agreement)

Sentence fragment generation
• Understanding symbol communication
requires practice
• Symbol communication is non-syntactical
– Function words are rarely used
– Order of symbols may vary
• e.g., {lemon, sugar, tea} -> tea with lemon and
sugar
• Idea
– Generate fragments by inserting function words
– Rank them based on language models
– User should select from top 4 variants
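
The idea above (insert function words, rank with a language model, offer the top 4) can be sketched as follows. This is a toy illustration only: the bigram scores are invented stand-ins for a real language model, and the function names are hypothetical.

```python
from itertools import permutations, product

# Hypothetical bigram log-scores standing in for a real language model.
TOY_BIGRAM_SCORE = {
    ("tea", "with"): -1.0, ("with", "lemon"): -1.2, ("with", "sugar"): -1.1,
    ("lemon", "and"): -1.0, ("and", "sugar"): -0.9, ("sugar", "and"): -1.3,
    ("and", "lemon"): -1.4,
}
UNSEEN = -6.0  # assumed penalty for word pairs missing from the table


def score(words):
    """Sum of bigram log-scores over adjacent word pairs."""
    return sum(TOY_BIGRAM_SCORE.get(pair, UNSEEN) for pair in zip(words, words[1:]))


def candidate_fragments(symbols, function_words):
    """Yield every ordering of the symbols with an optional function word per gap."""
    fillers = [None] + list(function_words)
    for order in permutations(symbols):
        for gaps in product(fillers, repeat=len(order) - 1):
            words = [order[0]]
            for filler, word in zip(gaps, order[1:]):
                if filler is not None:
                    words.append(filler)
                words.append(word)
            yield words


def top_fragments(symbols, function_words, k=4):
    """Rank all candidates by score and return the k best as strings."""
    ranked = sorted(candidate_fragments(symbols, function_words), key=score, reverse=True)
    return [" ".join(words) for words in ranked[:k]]


variants = top_fragments(["lemon", "sugar", "tea"], ["with", "and"])
print(variants[0])  # tea with lemon and sugar
```

Exhaustive enumeration like this grows quickly with the number of symbols, which is one motivation for a pruned search.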

Language Models
• Estimate the probabilities of n-grams (sequences of
words)
• P(tea with sugar) > P(tea the sugar)
– Use a corpus (collection of texts)
• Sparsity problem
– Long n-grams tend to be rare
– Smoothing and backoff methods are used to deal
with this problem

Backoff and Smoothing
• Backoff: use lower order statistics when the n-gram is rare
– Example: stupid backoff
• Smoothing: interpolate with lower order statistics
– Example: Kneser-Ney smoothing
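
The difference between the two families can be written out in generic textbook form. The notation is an assumption made for this sketch: $w_i^j$ is the token sequence $w_i \ldots w_j$, $f(\cdot)$ is a corpus frequency, $P^{*}$ is a discounted estimate, $\alpha(\cdot)$ is a backoff weight, and $\lambda$ is an interpolation weight. These are not the exact formulas of Stupid Backoff or Kneser-Ney smoothing.

```latex
% Backoff: use the shorter history only when the full n-gram is unseen
P_{\mathrm{bo}}(w_i \mid w_{i-n+1}^{i-1}) =
\begin{cases}
  P^{*}(w_i \mid w_{i-n+1}^{i-1}) & \text{if } f(w_{i-n+1}^{i}) > 0 \\
  \alpha(w_{i-n+1}^{i-1})\, P_{\mathrm{bo}}(w_i \mid w_{i-n+2}^{i-1}) & \text{otherwise}
\end{cases}

% Interpolation: always mix in the lower-order estimate
P_{\mathrm{int}}(w_i \mid w_{i-n+1}^{i-1}) =
  \lambda\, P_{\mathrm{ML}}(w_i \mid w_{i-n+1}^{i-1})
  + (1 - \lambda)\, P_{\mathrm{int}}(w_i \mid w_{i-n+2}^{i-1})
```

In the backoff form the lower-order model contributes only when the higher-order count is zero, while in the interpolated form it contributes to every estimate.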

Stupid backoff
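Let $w_1^L = (w_1, \ldots, w_L)$ denote a string of $L$ tokens of a fixed vocabulary. An n-gram language model assigns it the probability

$$P(w_1^L) = \prod_{i=1}^{L} P(w_i \mid w_1^{i-1}) \approx \prod_{i=1}^{L} \hat{P}(w_i \mid w_{i-n+1}^{i-1})$$

where the approximation reflects the Markov assumption that only the most recent n-1
tokens are relevant when predicting the next word.
For any substring $w_i^j$ of $w_1^L$, let $f(w_i^j)$ denote the frequency of occurrence of that
substring in the longer training data. The maximum likelihood probability
estimates for the n-grams are given by their relative frequencies

$$r(w_i \mid w_{i-n+1}^{i-1}) = \frac{f(w_{i-n+1}^{i})}{f(w_{i-n+1}^{i-1})}$$

Problematic because the numerator or denominator can be zero.
Appropriate conditions are needed on the right-hand side.
Noisy: estimates need smoothing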
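
A minimal sketch of Stupid Backoff scoring in the sense of Brants et al. (2007): use the relative frequency when the n-gram has been seen, otherwise back off to a shorter context and multiply by a fixed factor (0.4 in that paper). The toy corpus and function names are assumptions for illustration; this is not the BerkeleyLM implementation.

```python
from collections import Counter

ALPHA = 0.4  # backoff factor suggested by Brants et al. (2007)


def count_ngrams(tokens, max_order):
    """Count all n-grams of order 1..max_order as tuples."""
    counts = Counter()
    for n in range(1, max_order + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def stupid_backoff(word, context, counts, total_tokens):
    """Score S(word | context); the scores are not normalized probabilities."""
    context = tuple(context)
    penalty = 1.0
    while context:
        ngram = context + (word,)
        if counts[ngram] > 0:
            # Seen n-gram: relative frequency, scaled by accumulated backoff.
            return penalty * counts[ngram] / counts[context]
        context = context[1:]  # drop the oldest context word
        penalty *= ALPHA
    # Unigram base case: frequency of the word over the corpus size.
    return penalty * counts[(word,)] / total_tokens


# Toy corpus, assumed for illustration only.
tokens = "tea with sugar and tea with lemon and tea with milk".split()
counts = count_ngrams(tokens, max_order=3)

seen = stupid_backoff("sugar", ("tea", "with"), counts, len(tokens))
backed_off = stupid_backoff("sugar", ("and", "with"), counts, len(tokens))
```

Here `seen` uses the trigram count directly, while `backed_off` falls back to the bigram "with sugar" and is scaled by 0.4.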

Language corpora in use
• Google Books n-gram corpus
– Collection of digitized books from Google
– Very large, freely available
– Represents written language
• OpenSubtitles corpus
– Collection of film and TV subtitles
– Moderate size, freely available
– Represents spoken language

Language modeling tools
• Google Books n-gram corpus
– Software: BerkeleyLM
• Pauls, A., Klein, D.: Faster and Smaller N-Gram Language Models. In: 49th Annual
Meeting of the ACL: Human Language Technologies, Vol. 1, pp. 258--267. ACL,
Stroudsburg, USA (2011)
– Method: Stupid Backoff
• Brants, T., Popat, A. C., Xu, P., Och, F. J., Dean, J.: Large language models in machine
translation. In: EMNLP 2007, pp. 858--867.
• OpenSubtitles corpus
– Software: KenLM
• Heafield, K.: KenLM: Faster and Smaller Language Model Queries. In: EMNLP 2011
Sixth Workshop on Statistical Machine Translation, pp. 187--197. ACL,
Stroudsburg, USA (2011)
– Method: Modified Kneser-Ney smoothing
• Heafield, K., Pouzyrevsky, I., Clark, J. H., Koehn, P.: Scalable Modified Kneser-Ney
Language Model Estimation. In: 51st Annual Meeting of the Association for
Computational Linguistics, pp. 690--696. Curran Associates, Inc., New York, USA

Prefix tree building algorithm
• Representation
– Work with a prefix tree, where each path represents an
n-gram
• Input
– Set of named entities (e.g., tea, sugars)
– Set of function words (e.g., you, with, for)
– Desired length of sentence fragment (e.g., 3 words)
• Parameters
– Score threshold (minimal score, e.g., 10^-30)
– Leaf limit (maximal number of open leaves, e.g., 200)
• Output
– Tree (or a list) of potential sentence fragments

Prefix tree building algorithm
• Essentially a breadth-first search, with
some constraints
• Start with an empty tree, root node is
open
• While there is an open node:
– Extend all open leaves with all available
words, if the resulting n-gram’s “score” is
above a certain threshold
– Discard every open leaf node that cannot
be extended (falls below threshold)
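
The search described above can be sketched as a level-by-level expansion. This is a simplified illustration under stated assumptions: each tuple stands for one root-to-leaf path instead of an explicit tree node, the leaf limit is applied by keeping the best-scoring open leaves, and `toy_score` with its bigram table is an invented stand-in for a real language model.

```python
def build_fragments(content_words, function_words, length, score, threshold, leaf_limit):
    """Breadth-first expansion of word sequences, one tree level per word."""
    vocabulary = list(content_words) + list(function_words)
    open_leaves = [()]  # the root: an empty sequence
    for _ in range(length):
        extended = []
        for leaf in open_leaves:
            for word in vocabulary:
                candidate = leaf + (word,)
                if score(candidate) >= threshold:
                    extended.append(candidate)
        # Leaves without any surviving extension are dropped here.
        extended.sort(key=score, reverse=True)
        open_leaves = extended[:leaf_limit]
    return open_leaves


# Hypothetical bigram probabilities; unseen pairs get a tiny value.
TOY_BIGRAMS = {
    ("tea", "with"): 0.5, ("with", "sugar"): 0.4, ("with", "lemon"): 0.3,
    ("tea", "and"): 0.1, ("and", "sugar"): 0.2,
}


def toy_score(sequence):
    """Product of bigram probabilities along the sequence."""
    p = 1.0
    for pair in zip(sequence, sequence[1:]):
        p *= TOY_BIGRAMS.get(pair, 1e-40)
    return p


fragments = build_fragments(
    content_words=["tea", "sugar"],
    function_words=["with", "and"],
    length=3,
    score=toy_score,
    threshold=1e-30,
    leaf_limit=200,
)
print(fragments[0])  # ('tea', 'with', 'sugar')
```

The threshold prunes implausible paths early, and the leaf limit bounds the work per level when many paths survive.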

Initial state

Generated text fragments examples

Conclusion
• A system to enable people with SSPI to
communicate with natural language
sentences
• Demonstrated the feasibility of the
approach

Our contribution
• An interaction system to reduce communication barriers for people
with severe speech and physical impairments (SSPI) such as cerebral
palsy.
• The system consists of two main components: (i) the head-mounted
human-computer interaction (HCI) part consisting of smart glasses
with gaze trackers and text-to-speech functionality (which implement
a communication board and the selection tool), and (ii) a natural
language processing pipeline in the backend in order to generate
complete sentences from the symbols on the board.
• We developed the components to provide smooth interaction
between the user and the system, covering gaze tracking,
symbol selection, symbol recognition, and sentence generation.
• Our results suggest that such systems can dramatically increase
communication efficiency of people with SSPI.

Eye Tracking Requirements
• Eye gaze is a compelling interaction
modality but requires user calibration
before interaction can commence.
• State-of-the-art procedures require the
user to fixate on a succession of
calibration markers, a task that is often
experienced as difficult and tedious.

Possible improvements and outlook
• 3D gaze tracker that estimates the part of 3D space observed by the
user
• OCR, sign, and object recognition methods to
bring external “symbols” onto the communication board
• New HMDs and eye tracker setups (ergonomic)
• Advanced NLP to transform series of symbols into whole domain
sentences within the context
• Integrated calibration: e.g., https://www.andreas-bulling.de/fileadmin/docs/pfeuffer13_uist.pdf
• Personalisation according to patient record
• CPS integration

http://getnarrative.com/
The Narrative Clip is a tiny, automatic camera and app
that gives you a searchable and shareable photographic
memory.

https://www.spaceglasses.com

Thank you for your
attention!

Potential CPS systems
• Intervention (e.g., collision avoidance);
• Precision (e.g., robotic surgery);
• Operation in dangerous or inaccessible environments (e.g., search and rescue);
• Augmented reality, and the augmentation of human capabilities (e.g., healthcare
monitoring and decision support for doctors and patients).
– We bring together augmented reality and augmentation of human capabilities in the
mobile medical context: mobile CPS, in which the physical system with a medical
purpose has inherent mobility.
– This is motivated by the rise in popularity of mobile interaction devices such as
smartphones and tablets, which has increased interest among CPS developers.
– As technologies have become small, mobile, and pervasive, the logical next step
(beyond the rapid growth of smartphones, tablet PCs, or surrounding computers)
is the usage of mobile augmented reality and mobile decision support CPS.
• Alternative and augmentative communication
– AAC & AR (mixed reality) for the patient
– Dementia and SSPI
http://www.dfki.de/RadSpeech/Kognit

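N-gram LM
• Probability for a sentence $w_1^L = (w_1, \ldots, w_L)$: $P(w_1^L)$
• Factorize (chain rule) without loss of generality: $P(w_1^L) = \prod_{i=1}^{L} P(w_i \mid w_1^{i-1})$
• Limit length of history
– Unigram LM: $P(w_1^L) \approx \prod_{i=1}^{L} P(w_i)$
– Bigram LM: $P(w_1^L) \approx \prod_{i=1}^{L} P(w_i \mid w_{i-1})$
– Trigram LM: $P(w_1^L) \approx \prod_{i=1}^{L} P(w_i \mid w_{i-2}, w_{i-1})$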

MPU
• The accelerometer is used to detect orientation and
acceleration. It can also be applied to detect free-fall.
The accelerometer returns three values (in units
of g), one each along the x, y, and z axes.
• The gyroscope is used to detect rotation in space,
and returns information in units of degrees per
second. The gyroscope also returns three values,
one each along the x, y, and z axes.
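
As a rough sketch of how accelerometer values in units of g could be turned into a "head raised" trigger for recalibration: estimate pitch from the gravity direction and require it to stay above a baseline for several samples. The axis convention, the 20° threshold, the sample count, and the function names are all assumptions, not details from the system.

```python
import math


def pitch_degrees(ax, ay, az):
    """Estimate pitch from a near-static accelerometer reading (units of g).

    Assumes x points forward and z points up when the head is level.
    """
    return math.degrees(math.atan2(-ax, math.sqrt(ay * ay + az * az)))


def head_raised(samples, baseline_deg, threshold_deg=20.0, min_samples=5):
    """Return True if pitch exceeds baseline + threshold for min_samples in a row."""
    run = 0
    for ax, ay, az in samples:
        if pitch_degrees(ax, ay, az) - baseline_deg > threshold_deg:
            run += 1
            if run >= min_samples:
                return True
        else:
            run = 0  # the gesture must be held without interruption
    return False


level = [(0.0, 0.0, 1.0)] * 10                  # head level: gravity along z
raised = [(-0.5, 0.0, math.sqrt(3) / 2)] * 5    # head tilted up by 30 degrees
```

Requiring several consecutive samples is one simple way to keep brief, involuntary head movements from triggering recalibration.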
