This document summarises Himanshu Bansal's internship project on developing a Speech and Eye Tracking Enabled Computer Assisted Translation (SEECAT) system. The internship involved developing techniques for word-fixation remapping and for mutual disambiguation between eye gaze and speech recognition, to improve machine translation and automatic speech recognition. Experiments on a reading task and on translation tasks from English into other languages found significant improvements in eye gaze and speech recognition accuracy when information from eye tracking and speech recognition was integrated, compared to using either alone. The best performing configuration paired an unweighted gaze bag-of-words with a weighted ASR lattice (UGWA).
Intern presentation
1. Presentation on Internship Work
Speech and Eye Tracking Enabled Computer Assisted Translation
(SEECAT)
Copenhagen Business School
By: Himanshu Bansal
4. BACKGROUND
We need translation
To convey our thoughts to foreign-language speakers
To understand foreign language text and speech
-------------------------------------------------
Training data for automated systems
To prepare high-quality manuscripts of the same text in different languages
8. BACKGROUND
SEECAT as an extension of CASMACAT
Translator reads a source text on a computer screen and speaks out the translation
in the target language, a process called sight translation. This sight translation
process is supported by an Automatic Speech Recognition (ASR) and a Machine
Translation (MT) system, which transcribe the spoken speech signal into the target
text and which assist the translator with partial translation proposals, predictions
and completions on the computer monitor. An eye-tracking device follows the
translator's gaze path on the screen, detects where he or she faces translation
problems, and triggers reactive assistance.
9. STRUCTURE OF INTERNSHIP
21 May- 7 Jun
Lectures and hands on sessions (CBS)
8 Jun- 28 Jun
Divided into teams, worked at a summer house (Nykøbing Falster)
29 Jun- 21 July
Integration (CBS)
# Excursions planned for every weekend
10. GAZE TEAM
Himanshu, Kritika and Rucha
Part -1
Word- Fixation Remapping
Part-2
Mutual disambiguation between gaze and speech
11. Word- Fixation Remapping
Word-fixation mapping is useful for cognitive/linguistic research and usability
studies, and most importantly for adding interactivity to the system
12. Word- Fixation Remapping
Issues
Identification of the Fixations in a stream of gaze samples.
Mapping the Fixations to words/characters (Dealing with variable error)
Evaluation scheme for the fixation mapping
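One common way to handle the first issue, identifying fixations in the raw stream of gaze samples, is dispersion-based detection (I-DT). The sketch below is illustrative and not necessarily the algorithm used in the project; the dispersion and duration thresholds are assumptions.

```python
# Illustrative dispersion-threshold (I-DT) fixation detection.
# samples: list of (timestamp_ms, x, y); thresholds are assumptions.

def detect_fixations(samples, max_dispersion=35, min_duration=100):
    """Return fixations as (start_ms, end_ms, centroid_x, centroid_y)."""
    fixations = []
    i, n = 0, len(samples)
    while i < n:
        # Grow an initial window covering at least min_duration
        j = i
        while j < n and samples[j][0] - samples[i][0] < min_duration:
            j += 1
        if j >= n:
            break
        xs = [s[1] for s in samples[i:j + 1]]
        ys = [s[2] for s in samples[i:j + 1]]
        if (max(xs) - min(xs)) + (max(ys) - min(ys)) <= max_dispersion:
            # Extend the window while the dispersion stays under threshold
            while j + 1 < n:
                xs.append(samples[j + 1][1])
                ys.append(samples[j + 1][2])
                if (max(xs) - min(xs)) + (max(ys) - min(ys)) > max_dispersion:
                    xs.pop()
                    ys.pop()
                    break
                j += 1
            fixations.append((samples[i][0], samples[j][0],
                              sum(xs) / len(xs), sum(ys) / len(ys)))
            i = j + 1
        else:
            i += 1  # no fixation starts here; slide the window by one sample
    return fixations
```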
17. Word- Fixation Remapping
Our Approach -> Evaluation
● Input:
– Manually annotated fixation to word mapping (Gold Standard)
– Machine computed fixation to word mapping
● Output:
– The average character/word error.
● Method:
– Compute the overlaps in the gaze fixation durations in the manual and
machine annotations.
– For the overlapping fixations, compute the absolute differences in the
cursor positions of the two mappings.
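The evaluation method above can be sketched as follows; the fixation representation (start, end, mapped word index) and the use of word indices as cursor positions are simplifying assumptions.

```python
# Sketch of the evaluation: for every pair of temporally overlapping
# fixations in the gold and machine mappings, accumulate the absolute
# difference of the mapped cursor (word-index) positions.

def average_mapping_error(gold, machine):
    """gold, machine: lists of (start_ms, end_ms, word_index)."""
    total_err, n_overlaps = 0, 0
    for g_start, g_end, g_word in gold:
        for m_start, m_end, m_word in machine:
            overlap = min(g_end, m_end) - max(g_start, m_start)
            if overlap > 0:  # the two fixations overlap in time
                total_err += abs(g_word - m_word)
                n_overlaps += 1
    return total_err / n_overlaps if n_overlaps else 0.0
```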
20. Mutual disambiguation between gaze and speech
Motivation
Ambiguity in gaze
Variable error
- Midas touch
System errors
- Eye tracker
- Algorithm
- Calibration
Ambiguity in ASR
Domain of training data
Accent of speaker
Morphology of language
Speaking style
- Co-articulation
21. Mutual disambiguation between gaze and speech
Motivation
Consider a simple example:
● User reads the text “where is the bat”
● ASR can help map gaze points to words
● Gaze can help disambiguate ASR output
(Figure: the stimulus dialogue reads "Where is the bat. / There it is, behind the
door. / I can't find it! Where is it? / Look properly! It's right there." The
possible words being gazed include "Here is the mat" and "Here is the bat"; the
ASR hypotheses are "where is the bat" and "where is a pat". The intersection of
the gazed words with the ASR hypotheses resolves both ambiguities.)
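The intersection idea on this slide can be sketched as a simple two-way re-ranking; the bag-of-words contents below are illustrative, not the project's actual data structures.

```python
# The gaze bag-of-words re-ranks the ASR hypotheses, and the chosen
# hypothesis in turn narrows down the possible gazed words.

def rerank_asr(hypotheses, gaze_bow):
    """Pick the hypothesis sharing the most words with the gaze bag-of-words."""
    return max(hypotheses, key=lambda h: len(set(h.split()) & gaze_bow))

def filter_gaze(gaze_bow, hypothesis):
    """Keep only the gazed-word candidates that the hypothesis supports."""
    return gaze_bow & set(hypothesis.split())

gaze_bow = {"here", "where", "is", "the", "bat", "mat"}
hypotheses = ["where is a pat", "where is the bat"]

best = rerank_asr(hypotheses, gaze_bow)  # gaze resolves the ASR ambiguity
gazed = filter_gaze(gaze_bow, best)      # ASR resolves the gaze ambiguity
```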
22. Mutual disambiguation between gaze and speech
Inspiration from literature research
Meyer et al. studied eye movements in an object naming task and showed that
people consistently fixate objects prior to naming them.
Griffin showed that when multiple objects were named in a single utterance,
speech about one object was produced while the next object was being fixated
and lexically processed.
23. Mutual disambiguation between gaze and speech
Experiments -> Reading Task
• 5 participants read English Text
• Eye Gaze and Speech Recorded
• 6 sets of 11 sentences
• 5 sets in domain and 1 out of domain
24. Mutual disambiguation between gaze and speech
Experiments -> Translation Task
• 4 participants translated English Text
• 4 sets of 10 very simple sentences each
• Target languages: Hindi (Hi), Spanish (Sp), Danish (Da), Italian (It)
• Eye gaze on source language words and speech in target languages recorded
25. Mutual disambiguation between gaze and speech
Approach
ASR word lattice
Reference sentence: Leaving next day in the morning
28. Mutual disambiguation between gaze and speech
Approach
Composed word-lattice
Reference sentence: Leaving next day in the morning
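The slides compose the ASR word lattice with the gaze information using OpenFST; the plain-Python sketch below only mimics that effect on an n-best list rather than a true lattice, with illustrative costs in a tropical-semiring style (costs add, lower total wins).

```python
# n-best rescoring in the spirit of lattice composition: each hypothesis
# carries an ASR cost, and every word unsupported by gaze adds a penalty.
# All costs here are made up for illustration.

def compose_scores(nbest, gaze_bow, gaze_penalty=1.0):
    """nbest: list of (hypothesis, asr_cost); returns the list re-sorted by total cost."""
    rescored = []
    for hyp, asr_cost in nbest:
        unsupported = [w for w in hyp.split() if w not in gaze_bow]
        rescored.append((hyp, asr_cost + gaze_penalty * len(unsupported)))
    return sorted(rescored, key=lambda pair: pair[1])

nbest = [("leaving next day in a morning", 2.0),
         ("leaving next day in the morning", 2.5)]
gaze_bow = {"leaving", "next", "day", "in", "the", "morning"}

best_hyp, best_cost = compose_scores(nbest, gaze_bow)[0]
```

Here the gaze support flips the ranking: the acoustically cheaper hypothesis contains the unsupported word "a" and ends up more expensive than the one matching the reference sentence.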
29. Mutual disambiguation between gaze and speech
System
• Performed experiments on Translog
• Speech hypotheses are provided by the AT&T Watson server
• Transformed this format into the word-lattice format using Python
• Generated the bag of words from x, y coordinates using our Part 1 algorithm, in C#
and Python
• In the translation tasks, the gaze bag of words is transformed into a target-language
bag of words using lexicons (one more level of ambiguity)
• Composed these lattices using OpenFST
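The lexicon step for the translation tasks can be sketched as follows; the toy English-Spanish lexicon is purely illustrative.

```python
# Translating the gazed source-language bag of words into a target-language
# bag of words through a lexicon; every translation alternative is kept,
# which is the extra level of ambiguity the slide mentions.

def translate_bow(source_bow, lexicon):
    """Expand each gazed source word into all of its lexicon translations."""
    target_bow = set()
    for word in source_bow:
        target_bow.update(lexicon.get(word, []))
    return target_bow

# Toy English-Spanish lexicon (illustrative only)
lexicon = {"where": ["donde"], "is": ["es", "esta"], "the": ["el", "la"]}
```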
30. Mutual disambiguation between gaze and speech
System -> Experiments with the algorithm
• Weights of gaze words: consider them or not
• Weights of ASR words: consider them or not
Then used a Latin square ->
• Unweighted ASR with Weighted Gaze Bag-of-words (WGUA)
• Unweighted ASR with Unweighted Gaze Bag-of-words (UGUA)
• Weighted ASR with Weighted Gaze Bag-of-words (WGWA)
• Weighted ASR with Unweighted Gaze Bag-of-words (UGWA)
31. Mutual disambiguation between gaze and speech
Evaluation
ASR
SCLITE was used to get the word accuracies of the n-best hypotheses
with respect to the reference sentence.
Eye Gaze
Precision = |Wg ∩ Wr| / |Wg|
Recall = |Wg ∩ Wr| / |Wr|
F-Measure (harmonic mean of precision and recall)
Sentence Recognition Error (SRI; 1 or 0)
Wg = unique words in the gazed words
Wr = unique words in the reference sentence
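The gaze evaluation metrics (precision, recall and F-measure over the unique gazed words Wg versus the unique reference words Wr) can be sketched as below; the exact 0/1 criterion behind SRI is an assumption here.

```python
# Gaze evaluation metrics over unique word sets, as defined on the slide.
# The 0/1 SRI criterion used here (exact set match) is an assumption.

def gaze_metrics(gazed_words, reference_words):
    wg, wr = set(gazed_words), set(reference_words)
    inter = wg & wr
    precision = len(inter) / len(wg) if wg else 0.0
    recall = len(inter) / len(wr) if wr else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    sri = 1 if wg == wr else 0
    return precision, recall, f_measure, sri
```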
32. Mutual disambiguation between gaze and speech
Research Design
Reading Task
Independent Variables
Domain of test data
Weights of ASR
Weights of Eye Gaze
Dependent Variables
Gaze f-measure
Gaze SRI
ASR Word accuracy
Translation Task
Independent Variables
Target language
Weights of ASR
Weights of Eye Gaze
Dependent Variables
Gaze f-measure
Gaze SRI
ASR Word accuracy
33. Mutual disambiguation between gaze and speech
Reading - Paired t-test (p-values; "no improvement" where scores did not improve)

WGUA            In domain     Out of domain   All
Gaze_Precision  8.63075E-07   no improvement  9.2475E-07
Gaze_Recall     0.048194288   no improvement  0.048220416
Gaze_F-Measure  1.68557E-06   no improvement  1.7924E-06
ASR_WrdAccr     0.040033133   0.86206786      0.067110316

UGWA            In domain     Out of domain   All
Gaze_Precision  8.63075E-07   no improvement  9.2475E-07
Gaze_Recall     0.048194288   no improvement  0.048220416
Gaze_F-Measure  1.68557E-06   no improvement  1.7924E-06
ASR_WrdAccr     0.007594247   0.86206786      0.017268861

UGUA            In domain     Out of domain   All
Gaze_Precision  8.63075E-07   no improvement  9.2475E-07
Gaze_Recall     0.048194288   no improvement  0.048220416
Gaze_F-Measure  1.68557E-06   no improvement  1.7924E-06
ASR_WrdAccr     0.040033133   0.86206786      0.067110316

WGWA            In domain     Out of domain   All
Gaze_Precision  8.63075E-07   no improvement  9.2475E-07
Gaze_Recall     0.048194288   no improvement  0.048220416
Gaze_F-Measure  1.68557E-06   no improvement  1.7924E-06
ASR_WrdAccr     0.040033133   0.363217468     0.099456245
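The p-values above come from paired t-tests of scores before versus after integration. A stdlib-only sketch of the paired t statistic (the p-value would then be read off a t-distribution with n-1 degrees of freedom, e.g. via scipy.stats.ttest_rel):

```python
import math
from statistics import mean, stdev

# Paired t statistic for per-item scores before vs. after integration.
# scipy.stats.ttest_rel computes both the statistic and the p-value at once.

def paired_t(before, after):
    diffs = [a - b for a, b in zip(after, before)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))
```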
37. Mutual disambiguation between gaze and speech
Conclusions
Reading Task
• Significant improvement in both Gaze F-measure and ASR accuracy after
integration
• Gaze recall fell significantly
• SRI also improved
• The improvement on the in-domain task was much larger than on the out-of-domain task
• Of the four experiments, UGWA was observed to be the best
38. Mutual disambiguation between gaze and speech
Conclusions
Translation Task
• Significant improvement in Gaze F-measure only, for all languages
• ASR accuracy improved, but not significantly
• For Hindi and Danish, SRI decreased considerably
• Again UGWA was observed to be the best (for 3 of the 4 languages)
39. Mutual disambiguation between gaze and speech
Overview flowchart
Gaze pipeline: a static text reading experiment is run; eye gaze data is captured
with Translog, and x, y coordinates are taken from the already-logged files. The
fixation-word remapping algorithm turns these into a bag of words. EVALUATION:
fixation-duration intersection between machine and manual mappings (3 manual
annotations, 1 machine), and comparison of the bag of words with the bag of words
of the reference sentence (precision and recall).
ASR pipeline: audio is recorded at sentence level and sent to the Watson server,
which returns the 10-best hypotheses; these are converted into word lattices.
EVALUATION: the 1st-best hypothesis is compared with the reference text (edit
distance, SCLITE).
Disambiguation: the gaze bag of words is composed with weighted and unweighted
ASR lattices, producing an improved bag of words (eye gaze disambiguation) and an
improved hypothesis (ASR disambiguation), each evaluated as above. A majority-based
sentence identification was also tried.
40. LEARNING
Academic
Worked with Tobii T-60
Experiment Design
Python
Latex
Moses
Translog
Putty
Cygwin
Audacity
OpenFST
Got an idea of MT and ASR
Personal
Communication Skills
Project management
morning reporting
presentations
weekly targets and checks
Kayaking
Two string kite
Bit of Cooking