MindCare 2014: Towards a Smart Wearable Tool to Enable People with SSPI to Communicate by Sentence Fragments
1. Towards a Smart Wearable Tool to Enable
People with SSPI to Communicate by
Sentence Fragments
Gyula Vörös (ELTE)
Anita Verő (ELTE)
Balázs Pintér (ELTE)
Brigitta Miksztay-Réthey (ELTE)
Takumi Toyama (DFKI)
Andras Lorincz (ELTE)
Daniel Sonntag (DFKI)
MindCare 2014, May 8, University of Tokyo
Tuesday, May 20, 14
2. Motivation
• The ability to communicate with others is of paramount importance
for mental well-being.
• SSPI: Severe speech and physical impairments / communication
disorders
– Cerebral palsy, stroke, ALS/motor neurone disease (MND), muscle spasticity,
etc.
• People with SSPI face enormous challenges in daily life activities.
• A person who cannot speak may be forced to communicate only
with their closest relatives, thereby relying completely on them
for interacting with the external world.
• Understanding people using traditional alternative and
augmentative communication (AAC) methods (such as gestures and
communication boards) requires training.
• These methods restrict possible communication partners to those
who are already familiar with AAC.
3. Motivation
• The ability to communicate with others is of paramount importance
for mental well-being.
• In AAC, utterances consisting of multiple symbols are often
telegraphic: unlike natural sentences, they often lack the words needed
for successful and detailed communication. (“Water now!” instead of “I
would appreciate a glass of water, immediately.”)
– Some systems allow users to produce whole utterances or
sentences that consist of multiple words. The main task of the
AAC system is to store and retrieve such utterances.
– However, using a predefined set of sentences severely restricts
the applicability.
• Other approaches allow for a generation of utterances from an
unordered, incomplete set of words, but they use predefined rules
that constrain communication.
4. Augmented Reality in Medicine
Video: https://dl.dropboxusercontent.com/u/48051165/ERmed_CHI2013video.mp4
5. System Architecture: Mobile Web and App Design
[Architecture diagram. Labels: XML-RPC, ERmed Proxy, ERmed Bridge, TCP. Interaction modes: speech-based, HMD & gaze-based, pen-based.]
6. Goals of presented work
• To enable broad accessibility and
communication possibilities for people
with SSPI
• Technical Challenges:
– Overcome physical impairments (for user
input)
– Help non-trained people to understand
people with SSPI (towards generation)
7. Approach
• The most effective way for people to communicate would be
spontaneous language generation.
• Novel utterance generation: the ability to say “almost everything,”
without a strictly predefined set of possible utterances/generations/
productions.
• We attempt to give people with SSPI the ability to say almost anything.
For this reason, we would like to build a general system that produces
novel utterances without predefined rules. We chose a data-driven
approach in the form of statistical language modeling.
• In some conditions (e.g., cerebral palsy), people suffer from
communication and very severe movement disorders at the same
time. For them, special peripherals are necessary. Eye tracking
provides a promising alternative to people who cannot use their
hands.
8. Fourfold solution
(1) Smart glasses with gaze trackers (thereby
extending ) (gaze tracking and
interpretation/graphical symbol selection)
(2) Symbol / Utterance selection
(3) Language generation (sentence generation,
thereby using language models to propose
best hypotheses)
(4) Text-To-Speech functionality (TTS)
9. Hardware components
• Eye tracking glasses (ETG)
– Forward-looking camera
– Eye-tracking cameras
• Head mounted display (HMD)
– See-through
• Motion Processing Unit (MPU)
– Capture head gestures that indicate
recalibration need
14. Main functions
• Symbol selection with gaze tracking
– Calibration is crucial and poses a problem
– Adapt or recalibrate?
• Utterance generation
– Based on selected symbols
– Uses natural language models
17. Symbol selection on HMD
• Idea: the user selects communication
symbols on the HMD, using eye
tracking
• Crucial problems with calibration
– Eye tracking has to be calibrated
– Calibration may degrade over time
– Adapt or recalibrate?
• User might tolerate calibration errors (when
they are small and negligible)
19. Symbol selection game on HMD
• Goal: select the numbers in correct order
(1234123…)
• Each selection adds a small amount of error
• Participants indicate when error is too significant to
be tolerated
• 4 participants, 4 experiments each
• On average, errors up to 2.7° deviation from actual
eye focus are tolerable.
20. Real communication experiments
• The Brother Retina HMD was too small to show a
full-sized communication board (CB) (although
resolution is quite good)
• We use a projector / beamer to display the CB
– Simulate a large HMD
• Recognition Feedback is provided (to the user)
– Estimated gaze position + selection
• Speech generation
– word for the selected symbol synthesized
22. Participant
Video: https://www.youtube.com/watch?v=C4DOmMtgqz0
• The participant of this example test is a 30-year-old
man with cerebral palsy.
• He usually communicates with a headstick and an
alphabetical communication board, or with a PC-based
virtual keyboard controlled by head tracking.
• Mousense is already an improvement.
25. Communication Setting
• The participant could move his head
• The head position must be accounted for (MPU)
• Fiducial markers around the board are recognized
by vision-based pattern recognition techniques
26. Usage and HCI details
• The estimated gaze position was indicated as a
small red circle on the projected board (similar to
the previous test).
• A symbol was selected by keeping the red circle on
it for two seconds.
• The eye tracker sometimes needs recalibration:
– the user could initiate recalibration by raising his
head (detected by the MPU).
– Once the recalibration process was triggered, a
distinct sound was played, and an arrow
indicated where the participant had to look to
recalibrate.
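The two-second dwell rule described above can be sketched roughly as follows. This is only an illustration, not the authors' implementation: the class name `DwellSelector`, its fields, and the idea of feeding it one (symbol, timestamp) gaze sample at a time are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional

DWELL_SECONDS = 2.0  # from the slide: keep the red circle on a symbol for two seconds


@dataclass
class DwellSelector:
    """Hypothetical helper that tracks how long the gaze rests on one symbol."""
    dwell_seconds: float = DWELL_SECONDS
    current: Optional[str] = None  # symbol currently under the gaze cursor
    since: float = 0.0             # timestamp (seconds) when the gaze entered it

    def update(self, symbol: Optional[str], timestamp: float) -> Optional[str]:
        """Feed one gaze sample; return the symbol once its dwell time is reached."""
        if symbol != self.current:
            # Gaze moved to another symbol (or off the board): restart the timer.
            self.current = symbol
            self.since = timestamp
            return None
        if symbol is not None and timestamp - self.since >= self.dwell_seconds:
            # Restart the timer so the same symbol is not re-selected on the next sample.
            self.since = timestamp
            return symbol
        return None


if __name__ == "__main__":
    selector = DwellSelector()
    for t, sym in [(0.0, "tea"), (1.0, "tea"), (1.5, "sugar"), (3.5, "sugar")]:
        print(t, sym, selector.update(sym, t))
```

Restarting the timer after a selection is one possible design choice; a real system might instead require the gaze to leave the symbol before it can be selected again.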
28. Communication experiments
• Goal: communicate with a partner
• Two situations (with different boards)
– Buying food
– Discussing an appointment
29. Experimental results
• Verification
– To verify that communication was successful,
the participant indicated misunderstandings
using well-known yes-no gestures, which
were quick and reliable. Moreover, a certified
expert in AAC was present to indicate
apparent communication problems.
– 205 symbol selections happened
– 23 of them were incorrect
• 89% accuracy
– The error rate was acceptable
• Real communication took place!
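The accuracy figure follows directly from the two counts reported on this slide. A short check of the arithmetic (the variable names are illustrative):

```python
total_selections = 205     # symbol selections, from the slide
incorrect_selections = 23  # incorrect selections, from the slide

correct_selections = total_selections - incorrect_selections
accuracy = correct_selections / total_selections

# 182 of 205 selections were correct, which rounds to the 89% on the slide.
print(f"{correct_selections} of {total_selections} correct, accuracy = {accuracy:.1%}")
```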
31. External symbols
• Idea
– Not all symbols are present in the system
– Optical character recognition can help
– (Object recognition can help)
• Example
– The user wants to buy a certain type of
sandwich
– In the store, there are labels with the
names of the sandwich types on them
36. Sentence fragment generation
• A word guessing game
– Good afternoon, how are -> you?
– I apologize for being late, I am very -> sorry!
– My favorite OS is -> [Linux, Mac OS, Windows XP, Win CE].
• A symbol interpretation game
• LM needs to help
– To select words with the right sense (disambiguate homonyms: e.g.,
river bank versus money bank)
– To select hypotheses where graphical symbols (words) are ‘in the
right place’
– To increase cohesion between words (agreement)
37. Sentence fragment generation
• Understanding symbol communication
requires practice
• Symbol communication is non-syntactical
– Function words are rarely used
– Order of symbols may vary
• e.g., {lemon, sugar, tea} -> tea with lemon and
sugar
• Idea
– Generate fragments by inserting function words
– Rank them based on language models
– User should select from top 4 variants
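The idea on this slide can be sketched as a brute-force procedure: try every order of the selected symbols, optionally insert a function word into each gap, score the candidates, and keep the best four. The sketch below makes several assumptions: the function names are made up, at most one function word is inserted per gap, and `toy_score` is a hand-written stand-in for a real language model.

```python
from itertools import permutations, product
from typing import Callable, List, Sequence, Tuple

Fragment = Tuple[str, ...]


def candidate_fragments(symbols: Sequence[str], function_words: Sequence[str]) -> List[Fragment]:
    """Enumerate every symbol order, with an optional function word in each gap."""
    if not symbols:
        return []
    fillers = [None, *function_words]  # None means "insert nothing in this gap"
    candidates = []
    for order in permutations(symbols):
        for gaps in product(fillers, repeat=len(order) - 1):
            words = [order[0]]
            for filler, symbol in zip(gaps, order[1:]):
                if filler is not None:
                    words.append(filler)
                words.append(symbol)
            candidates.append(tuple(words))
    return candidates


def top_fragments(symbols: Sequence[str], function_words: Sequence[str],
                  score: Callable[[Fragment], float], k: int = 4) -> List[str]:
    """Rank all candidates by the supplied score and keep the k best."""
    ranked = sorted(candidate_fragments(symbols, function_words), key=score, reverse=True)
    return [" ".join(words) for words in ranked[:k]]


# Hypothetical stand-in for a language model: counts bigrams marked as plausible.
PLAUSIBLE_BIGRAMS = {("tea", "with"), ("with", "lemon"), ("lemon", "and"), ("and", "sugar")}


def toy_score(words: Fragment) -> float:
    return float(sum((a, b) in PLAUSIBLE_BIGRAMS for a, b in zip(words, words[1:])))


if __name__ == "__main__":
    for fragment in top_fragments(["lemon", "sugar", "tea"], ["with", "and"], toy_score):
        print(fragment)
```

Exhaustive enumeration grows quickly with the number of symbols, so it is only practical for very small inputs.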
38. Language Models
• Estimate the probabilities of n-grams (sequences of
words)
• P(tea with sugar) > P(tea the sugar)
– Use a corpus (collection of texts)
• Sparsity problem
– Long n-grams tend to be rare
– Smoothing and backoff methods are used to deal
with this problem
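A minimal sketch of how n-gram probabilities can be estimated from counts, and of the sparsity problem. The four-sentence corpus is invented for illustration and the function names are assumptions; the real corpora are far larger.

```python
from collections import Counter
from typing import List, Tuple

# Tiny made-up corpus, for illustration only.
CORPUS = [
    "i drink tea with sugar",
    "she likes tea with lemon",
    "tea with sugar is sweet",
    "pass the sugar please",
]


def ngram_counts(sentences: List[str], n: int) -> Counter:
    """Count all n-grams of order n, sentence by sentence."""
    counts: Counter = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def relative_frequency(ngram: Tuple[str, ...], sentences: List[str]) -> float:
    """Estimate P(last word | preceding words) as count(n-gram) / count(history)."""
    if len(ngram) < 2:
        raise ValueError("this sketch needs at least a bigram")
    history_count = ngram_counts(sentences, len(ngram) - 1)[ngram[:-1]]
    if history_count == 0:
        return 0.0  # unseen history: no estimate without smoothing or backoff
    return ngram_counts(sentences, len(ngram))[ngram] / history_count


if __name__ == "__main__":
    print(relative_frequency(("tea", "with", "sugar"), CORPUS))
    print(relative_frequency(("tea", "the", "sugar"), CORPUS))
    print(relative_frequency(("tea", "with", "honey"), CORPUS))  # plausible but unseen
```

The last query is a reasonable phrase that simply never occurs in the corpus, so its raw estimate is zero. This is the sparsity problem that smoothing and backoff address.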
39. Backoff and Smoothing
• Backoff: use lower order statistics when the n-gram is rare
– Example: stupid backoff
• Smoothing: interpolate with lower order statistics
– Example: Kneser-Ney smoothing
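40. Stupid backoff
Let $w_1^L = (w_1, \ldots, w_L)$ denote a string of $L$ tokens of a fixed vocabulary. An n-gram language model assigns it the probability
$$P(w_1^L) = \prod_{i=1}^{L} P(w_i \mid w_1^{i-1}) \approx \prod_{i=1}^{L} \hat{P}(w_i \mid w_{i-n+1}^{i-1}),$$
where the approximation reflects the Markov assumption that only the most recent n-1
tokens are relevant when predicting the next word.
For any substring $w_i^j$ of $w_1^L$, let $f(w_i^j)$ denote the frequency of occurrence of that
substring in the longer training data. The maximum likelihood probability
estimates for the n-grams are given by their relative frequencies
$$r(w_i \mid w_{i-n+1}^{i-1}) = \frac{f(w_{i-n+1}^{i})}{f(w_{i-n+1}^{i-1})}.$$
Problematic because (de-)nominator can be zero.
Appropriate conditions are needed on the r.h.s.
Noisy: estimates need smoothing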
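A minimal sketch of the stupid backoff score as described by Brants et al. (2007): use the relative frequency when the n-gram was observed, otherwise multiply the score for the shortened history by a fixed factor, ending at the unigram frequency. The factor 0.4 is the value suggested in that paper; the function names and the toy token sequence are assumptions. The result is a score, not a normalized probability.

```python
from collections import Counter
from typing import Sequence, Tuple

ALPHA = 0.4  # backoff factor suggested by Brants et al. (2007)


def count_ngrams(tokens: Sequence[str], max_order: int) -> Counter:
    """Count all n-grams of order 1..max_order in one token sequence."""
    counts: Counter = Counter()
    for n in range(1, max_order + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def stupid_backoff(word: str, history: Tuple[str, ...], counts: Counter, total_tokens: int) -> float:
    """Score `word` given `history`, backing off to shorter histories when unseen."""
    if not history:
        # Recursion ends at the unigram relative frequency.
        return counts[(word,)] / total_tokens
    ngram = history + (word,)
    if counts[ngram] > 0:
        # Seen n-gram: plain relative frequency, no discounting.
        return counts[ngram] / counts[history]
    # Unseen n-gram: drop the oldest history word and penalize by ALPHA.
    return ALPHA * stupid_backoff(word, history[1:], counts, total_tokens)


if __name__ == "__main__":
    tokens = "tea with sugar and tea with lemon".split()
    counts = count_ngrams(tokens, max_order=3)
    print(stupid_backoff("sugar", ("tea", "with"), counts, len(tokens)))
    print(stupid_backoff("lemon", ("and", "with"), counts, len(tokens)))
```

Because an observed n-gram always has an observed history, the denominator is never zero in the first branch, which sidesteps the division problem noted on the slide.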
41. Language corpora in use
• Google Books n-gram corpus
– Collection of digitized books from Google
– Very large, freely available
– Represents written language
• OpenSubtitles corpus
– Collection of film and TV subtitles
– Moderate size, freely available
– Represents spoken language
42. Language modeling tools
• Google Books n-gram corpus
– Software: BerkeleyLM
• Pauls, A., Klein, D.: Faster and Smaller N-Gram Language Models. In: 49th Annual
Meeting of the ACL: Human Language Technologies, Vol. 1, pp. 258--267. ACL,
Stroudsburg, USA (2011)
– Method: Stupid Backoff
• Brants, T., Popat, A. C., Xu, P., Och, F. J., Dean, J.: Large language models in machine
translation. In: EMNLP 2007, pp. 858--867.
• OpenSubtitles corpus
– Software: KenLM
• Heafield, K.: KenLM: Faster and Smaller Language Model Queries. In: EMNLP 2011
Sixth Workshop on Statistical Machine Translation, pp. 187--197. ACL,
Stroudsburg, USA (2011)
– Method: Modified Kneser-Ney smoothing
• Heafield, K., Pouzyrevsky, I., Clark, J. H., Koehn, P.: Scalable Modified Kneser-Ney
Language Model Estimation. In: 51st Annual Meeting of the Association for
Computational Linguistics, pp. 690--696. Curran Associates, Inc., New York, USA
43. Prefix tree building algorithm
• Representation
– Work with a prefix tree, where each path represents an
n-gram
• Input
– Set of named entities (e.g., tea, sugars)
– Set of function words (e.g., you, with, for)
– Desired length of sentence fragment (e.g., 3 words)
• Parameters
– Score threshold (minimal score, e.g., 10^-30)
– Leaf limit (maximal number of open leafs, e.g., 200)
• Output
– Tree (or a list) of potential sentence fragments
44. Prefix tree building algorithm
• Essentially a breadth-first search, with
some constraints
• Start with an empty tree, root node is
open
• While there is an open node:
– Extend all open leafs with all available
words, if the resulting n-gram’s “score” is
above a certain threshold
– Discard every open leaf node that cannot
be extended (falls below threshold)
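The search described above can be sketched as a level-by-level expansion. This is a rough illustration under several assumptions: the tree is kept implicitly as a list of root-to-leaf paths, the leaf limit is enforced by keeping the highest-scoring leaves, words may repeat within a fragment, and `toy_score` with its hand-made bigram table stands in for a real language model.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Fragment = Tuple[str, ...]


def build_fragments(words: Sequence[str], length: int, score: Callable[[Fragment], float],
                    threshold: float, leaf_limit: int) -> List[Fragment]:
    """Breadth-first expansion of a prefix tree, pruned by score and leaf count."""
    open_leaves: List[Fragment] = [()]  # the root is the empty prefix
    for _ in range(length):             # one tree level per word position
        extended = [
            leaf + (word,)
            for leaf in open_leaves
            for word in words
            if score(leaf + (word,)) > threshold
        ]
        # Leaves without any surviving extension are dropped implicitly.
        extended.sort(key=score, reverse=True)
        open_leaves = extended[:leaf_limit]
        if not open_leaves:
            break
    return open_leaves


# Hypothetical bigram table; unseen pairs get a tiny default value.
BIGRAM: Dict[Tuple[str, str], float] = {
    ("tea", "with"): 0.5,
    ("with", "sugar"): 0.4,
    ("with", "lemon"): 0.3,
    ("sugar", "with"): 0.01,
}


def toy_score(fragment: Fragment) -> float:
    """Product of bigram values along the path; single words score 1.0."""
    value = 1.0
    for pair in zip(fragment, fragment[1:]):
        value *= BIGRAM.get(pair, 1e-6)
    return value


if __name__ == "__main__":
    for fragment in build_fragments(["tea", "sugar", "with"], 3, toy_score, 1e-3, 200):
        print(" ".join(fragment), toy_score(fragment))
```

With a leaf limit of 1 the procedure degenerates into a greedy search, which is a quick way to see the effect of that parameter.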
53. Conclusion
• A system to enable people with SSPI to
communicate with natural language
sentences
• Demonstrated the feasibility of the
approach
54. Our contribution
• An interaction system to reduce communication barriers for people
with severe speech and physical impairments (SSPI) such as cerebral
palsy.
• The system consists of two main components: (i) the head-mounted
human-computer interaction (HCI) part consisting of smart glasses
with gaze trackers and text-to-speech functionality (which implement
a communication board and the selection tool), and (ii) a natural
language processing pipeline in the backend in order to generate
complete sentences from the symbols on the board.
• We developed the components to provide a smooth interaction
between the user and the system thereby including gaze tracking,
symbol selection, symbol recognition, and sentence generation.
• Our results suggest that such systems can dramatically increase
communication efficiency of people with SSPI.
55. Eye Tracking Requirements
• Eye gaze is a compelling interaction
modality but requires user calibration
before interaction can commence.
• State of the art procedures require the
user to fixate on a succession of
calibration markers, a task that is often
experienced as difficult and tedious.
56. Possible improvements and outlook
• 3D gaze tracker that estimates the part of 3D space observed by the
user
• OCR, signs and object recognition methods to
convert “symbols” to the communication board
• new HMDs and eye tracker setups (ergonomic)
• Advanced NLP to transform series of symbols to whole domain
sentences within the context
• integrated calibration: e.g., https://www.andreas-bulling.de/fileadmin/docs/pfeuffer13_uist.pdf
• personalisation according to patient record
• CPS integration
61. Potential CPS systems
• Intervention (e.g., collision avoidance);
• Precision (e.g., robotic surgery);
• Operation in dangerous or inaccessible environments (e.g., search and rescue);
• Augmented reality, and the augmentation of human capabilities (e.g., healthcare
monitoring and decision support for doctors and patients).
– We bring together augmented reality and augmentation of human capabilities in the
mobile medical context: mobile CPS, in which the physical system with a medical
purpose has got inherent mobility.
– This is motivated by the rise in popularity of mobile interaction devices such as
smartphones and tablets which has increased interest for CPS developers.
– As technologies have become small, mobile, and pervasive, the logical next step
(beyond the rapid growth of smartphones, tablet PCs, or surrounding computers)
is the usage of mobile augmented reality and mobile decision support CPS.
• Alternative and augmentative communication
– AAC & AR (mixed reality) for the patient
– Dementia and SSPI
http://www.dfki.de/RadSpeech/Kognit
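62. N-gram LM
Probability for a sentence $w_1^L = (w_1, \ldots, w_L)$: $P(w_1^L) = P(w_1, w_2, \ldots, w_L)$
Factorize (chain rule) without loss of generality: $P(w_1^L) = \prod_{i=1}^{L} P(w_i \mid w_1^{i-1})$
Limit length of history
– Unigram LM: $P(w_1^L) \approx \prod_{i=1}^{L} P(w_i)$
– Bigram LM: $P(w_1^L) \approx \prod_{i=1}^{L} P(w_i \mid w_{i-1})$
– Trigram LM: $P(w_1^L) \approx \prod_{i=1}^{L} P(w_i \mid w_{i-2}, w_{i-1})$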
63. MPU
• The accelerometer is used to detect orientation and
acceleration. It can also be applied to detect free
fall. The accelerometer returns three values (in units
of g), one along each of the x, y, and z axes.
• The gyroscope is used to detect rotation in space,
and returns information in units of degrees per
second. The gyroscope also returns three values,
one along each of the x, y, and z axes.
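As an illustration of how accelerometer values in g could indicate a raised head, the sketch below estimates the pitch angle from the direction of gravity and compares it with a threshold. The axis convention (x forward, z up when the head is level), the 20° threshold, and the function names are all assumptions, not details from the slides.

```python
import math

HEAD_UP_THRESHOLD_DEG = 20.0  # hypothetical threshold, not taken from the slides


def pitch_degrees(ax: float, ay: float, az: float) -> float:
    """Pitch angle from a static accelerometer reading in units of g.

    Assumed axes: the reading is (0, 0, 1) when the head is level, and the
    x component grows as the head tilts upward.
    """
    return math.degrees(math.atan2(ax, math.hypot(ay, az)))


def head_raised(ax: float, ay: float, az: float, baseline_deg: float = 0.0) -> bool:
    """True if the pitch exceeds the resting baseline by more than the threshold."""
    return pitch_degrees(ax, ay, az) - baseline_deg > HEAD_UP_THRESHOLD_DEG


if __name__ == "__main__":
    print(pitch_degrees(0.0, 0.0, 1.0))               # level head
    print(head_raised(0.5, 0.0, math.sqrt(0.75)))     # tilted about 30 degrees
```

This only works while the head is nearly still, since the accelerometer then measures gravity alone; a practical detector would likely combine it with the gyroscope's rotation rates.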