Using Vision and Voice to Create a Multimodal Interface for Microsoft Word 2007

T.R. Beelders and P.J. Blignaut
The University of the Free State, Bloemfontein, South Africa
{beelderstr; pieterb}@ufs.ac.za

Abstract

There has recently been a call to move away from the standard WIMP type of interfaces and give users access to more intuitive interaction techniques. Therefore, in order to test the usability of a multimodal interface in Word 2007, the most popular word processor, the additional modalities of eye gaze and speech recognition were added within Word 2007 as interaction techniques. This paper discusses the developed application and the way in which the interaction techniques are included within the well-established environment of Word 2007. The additional interaction techniques are fully customizable and can be used in isolation or in combination. Eye gaze can be used with dwell time, look and shoot or blinking, and speech recognition can be used for dictation and for verbal commands for both formatting purposes and navigation through a document. Additionally, the look and shoot method can also be combined with a verbal command to facilitate completely hands-free interaction. Magnification of the interface is also provided to improve accuracy, and multiple onscreen keyboards are provided to enable hands-free typing.

Keywords: Eye-tracking, speech recognition, usability, word processing, multimodal

Copyright © 2010 by the Association for Computing Machinery, Inc. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions Dept, ACM Inc., fax +1 (212) 869-0481 or e-mail permissions@acm.org. ETRA 2010, Austin, TX, March 22-24, 2010. © 2010 ACM 978-1-60558-994-7/10/0003 $10.00

1 Introduction

The word processor application has evolved substantially since its inception and has since undergone a virtual metamorphosis to achieve the capabilities that are available in these applications today. As an integral part of everyday life for many people, it caters for a very diverse group of users. Furthermore, users with disabilities or needs other than those of mainstream users are not always taken into consideration during system development and often have to compensate by using specially designed applications which do not necessarily compare with the more popular applications. This study therefore aims to investigate various means to increase the usability of a word processor for as wide a user group as possible.

For this reason, the interface of the most popular word processor application will be extended into a multimodal interface. This interface should facilitate use of the mainstream product by marginalized users, whilst at the same time enhancing the user experience for novice, intermediate and expert users. Ideally the interface should be customizable and allow users to select any combination of interaction techniques which suits their needs.

The premise of the research study is not to develop a new word processor but rather to incorporate additional interaction techniques, besides the keyboard and mouse, into an application which has already been accepted by the user community. This will allow for the improvement of an already popular product and stimulate inclusiveness of non-mainstream users into the mainstream market. Therefore, one aim is to determine whether it is possible to customize an interface to such an extent that all user groups are catered for with an all-inclusive interface.

The research study is still in the beginning phase, where development of the tool is underway. Therefore, for the purposes of this paper, the application as it has been developed will be the main focus. The paper will, however, conclude with a short discussion of the next phases of the research study.

2 Interaction Techniques

Using a physical input device in order to communicate or perform a task in human-computer dialogue is called an interaction technique [Foley et al., 1990, as cited in Jacob, 1995]. The interaction techniques of speech recognition and eye tracking will be included in a popular word processor interface to create a multimodal interface as a means to determine whether the usability of this product can be enhanced in this way.

Although this approach has received limited attention thus far, the multimodal approach has always focused on the development of a third-party application, for example EyeTalk [Hatfield and Jenkins, 1997]. Contrary to this, this study will use an already existing application, namely Microsoft Word, which currently enjoys a high prevalence in the commercial market.

3 Development environment

The development environment used was Visual Studio 2008, making use of the .NET Framework 3.5. Visual Studio Tools for Microsoft Office System 2008 (VSTO) in C# was used for development. VSTO allows programmers to use managed code to build Office-based solutions in C# and VB.NET [Anderson, 2009]. In order to incorporate the speech recognition, the Microsoft Speech Application Programming Interface (SAPI) with version 5.1 of the SDK was used. The SDK provides the capability of compiling customized grammars and accessing the functionalities of the speech recognizer. In order to provide gaze interaction, Tobii SDK 1.5.4 was used. For magnification purposes, which will be discussed in an upcoming section, the commercial product Magnifying Glass Pro 1.7 was chosen as a relatively inexpensive solution, but primarily because it was one of the few applications which incorporated clickable areas within the magnified area which are then correctly transferred to the underlying area. This is essential in the developed product, as the magnification will increase the accuracy of cursor positioning via eye gaze and the correct interpretation of user intention; requiring the user to disable magnification before clicking on the interface would negate all the advantages gained from magnification.
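The paper does not include source code, but a minimal sketch of how a VSTO Word add-in can bootstrap the additional modalities may help to situate the tools listed above. The class and member names are illustrative only, and the managed System.Speech wrapper around SAPI is used purely for brevity (the authors worked against SAPI 5.1 directly); initialization of the Tobii SDK is omitted because its API is not described in the paper.

```csharp
using System;
using System.Speech.Recognition;            // managed wrapper around SAPI
using Word = Microsoft.Office.Interop.Word;

namespace MultimodalWordAddIn
{
    public partial class ThisAddIn
    {
        // One recognizer is shared by dictation mode and command mode.
        private SpeechRecognitionEngine recognizer;

        private void ThisAddIn_Startup(object sender, EventArgs e)
        {
            // this.Application exposes the running Word instance to the add-in.
            this.Application.StatusBar = "Multimodal Add-Ins loaded";

            // Speech initialization; gaze initialization via the Tobii SDK is
            // omitted here because its API is not shown in the paper.
            recognizer = new SpeechRecognitionEngine();
            recognizer.SetInputToDefaultAudioDevice();
            recognizer.SpeechRecognized += OnSpeechRecognized;
        }

        private void OnSpeechRecognized(object sender, SpeechRecognizedEventArgs e)
        {
            // Dispatch of dictation text and commands is sketched in a later listing.
        }

        private void ThisAddIn_Shutdown(object sender, EventArgs e)
        {
            if (recognizer != null) recognizer.Dispose();
        }
    }
}
```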
                                                                            which additional interaction techniques they would like to use
The aim of the development process was to incorporate speech recognition and eye tracking as additional interaction techniques in the Microsoft Word environment. The user should also be given the freedom to determine in which combination the interaction techniques must be used, while still having the option of continued use of the traditional interaction techniques. As illustrated in Figure 1, an extra tab was added to the established Microsoft Word ribbon. This tab (circled in red) was named Multimodal Add-Ins.

The new tab provides numerous options to the user to select which additional interaction techniques they would like to use (Figure 1). As is evident from Figure 1, complete customization of the techniques is allowed via selection of any combination of techniques as well as in what capacity the techniques must be implemented.




Figure 1: Multimodal Add-ins for Word 2007
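Figure 1 itself is not reproduced here. As a hedged illustration of how such a tab can be declared, the sketch below returns hand-written Ribbon XML from a VSTO add-in; all control identifiers, labels and callback names are hypothetical, since the paper does not document the actual ribbon definition.

```csharp
using Office = Microsoft.Office.Core;

namespace MultimodalWordAddIn
{
    // Supplies Ribbon XML for the custom Multimodal Add-Ins tab shown in Figure 1.
    [System.Runtime.InteropServices.ComVisible(true)]
    public class MultimodalRibbon : Office.IRibbonExtensibility
    {
        public string GetCustomUI(string ribbonID)
        {
            return @"
<customUI xmlns='http://schemas.microsoft.com/office/2006/01/customui'>
  <ribbon>
    <tabs>
      <tab id='tabMultimodal' label='Multimodal Add-Ins'>
        <group id='grpSpeech' label='Speech'>
          <toggleButton id='btnSpeech' label='Enable speech' onAction='OnToggleSpeech'/>
          <dropDown id='ddProfiles' label='Speech profile'
                    getItemCount='GetProfileCount' getItemLabel='GetProfileLabel'
                    onAction='OnProfileSelected'/>
        </group>
        <group id='grpGaze' label='Eye gaze'>
          <toggleButton id='btnGaze' label='Enable eye gaze' onAction='OnToggleGaze'/>
          <editBox id='txtSensitivity' label='Sensitivity Setting' onChange='OnDwellChanged'/>
        </group>
        <group id='grpTools' label='Tools'>
          <toggleButton id='btnMagnify' label='Magnification' onAction='OnToggleMagnify'/>
          <toggleButton id='btnKeyboard' label='Onscreen keyboard' onAction='OnToggleKeyboard'/>
        </group>
      </tab>
    </tabs>
  </ribbon>
</customUI>";
        }

        // The callbacks named above (OnToggleSpeech, OnToggleGaze, ...) would be
        // implemented here with the signatures required by the RibbonX callback model.
    }
}
```

In a VSTO project this object would be returned from the add-in's CreateRibbonExtensibilityObject() override; the same tab could equally be built with the visual Ribbon designer.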



Additional tools which are available to enhance the user experience are a magnification tool and an onscreen keyboard which can be displayed at the bottom of the Word document. The magnification tool magnifies the immediate area under the mouse cursor, thereby providing increased accuracy for users with weak eyesight and those making use of the gaze-sensitive interface. Magnification is available when using the mouse or when using eye gaze as an interaction technique. The use of the magnification tool is entirely at the discretion of the user, who can turn magnification on and off at will or as needed. Magnification can be toggled using either a verbal command or the available button on the ribbon.

Onscreen keyboards are available as an alternative to using a traditional keyboard. The onscreen keyboard can be used either through use of the traditional mouse or to achieve hands-free typing using eye gaze or a combination of eye gaze and speech recognition. The final adapted interface, as envisioned when the onscreen keyboard is in use, is shown in Figure 2.

The layout of the onscreen keyboard can be changed to either a traditional QWERTY layout or an alphabetic layout. Each keyboard contains all 26 alphabetic letters, a Space bar, Backspace and Delete keys, as well as special keys which simplify movement through the document. The special keys provided are Page up, Page down, Home and End. The user can also toggle between upper case and lower case by activating and deactivating the CAPS lock key. A Select All key is provided as a means for the user to select all the text in the document. The two red arrows in the lower left corner of the keyboard (Figure 2) change the size of all keyboard keys in decrements and increments of 10 pixels respectively, thereby providing even more customization of the keyboard for the user. Auditory feedback in the form of a soft beep is given when a keyboard key is clicked.

Speech recognition

The user has the option of enabling the speech engine so that Microsoft Word can respond to verbal utterances. In terms of the customizable options, the user can toggle between dictation mode and command mode. In dictation mode, the speech recognition is implemented in the well-known manner of capturing vocalizations, translating those vocalizations into text and writing the result to the currently activated document in Microsoft Word. In order for the dictation mode to be effective, the user must select a previously trained profile. A unique profile can be trained through the Windows Speech wizard. All the available speech profiles are provided in a drop-down box on the multimodal add-in tab for the convenience of the user.
In command mode, a grammar is activated which accepts only isolated commands and responds to these in a pre-determined manner. Command mode provides the functions of cursor control, formatting capabilities and certain document handling capabilities. Several different commands are provided which trigger the same application response, thereby contributing to further customization, as the user can determine which command is most desirable for them to use. Moreover, simple cursor control is provided through directional commands, while more complex cursor control is provided by allowing line selection and movement of the cursor as though control keys (such as Shift) were being pressed in combination with the verbal command. These types of commands simplify selection of text and provide verbal commands for complex key combinations which are not always known to novice and intermediate users. For example, the word “Bold” causes the activation or deactivation of the bold formatting style. Similarly, the words “Italic” and “Underline” activate or deactivate their formatting styles. Words such as “Cut”, “Copy” and “Paste” allow for text manipulation; their subsequent actions are, of course, the cutting or copying of the currently selected text and the pasting of the clipboard contents at the position of the cursor. More complex commands for text selection are also available, such as “Select line”, which selects the whole line on which the cursor is situated, and “Select word”, which selects the word nearest to the right of the current cursor position. Cursor control is achieved through the commands “Left”, “Right”, “Up” and “Down”. Verbal commands can be issued in sequence to perform relatively complex document manipulation.
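The listing below is a rough sketch of how dictation mode and command mode could be wired up. It uses the managed System.Speech.Recognition classes (a wrapper over SAPI) rather than the raw SAPI 5.1 interfaces the authors used, and it covers only a small subset of the command vocabulary described above; the class name and structure are assumptions, not the authors' implementation.

```csharp
using System;
using System.Speech.Recognition;
using Word = Microsoft.Office.Interop.Word;

// Sketch of the two speech modes: a free-text dictation grammar and a small
// command grammar, both loaded into one recognizer and toggled via Enabled.
public class SpeechModes
{
    private readonly SpeechRecognitionEngine engine = new SpeechRecognitionEngine();
    private readonly Word.Application word;
    private readonly Grammar dictation = new DictationGrammar();
    private readonly Grammar commands;

    public SpeechModes(Word.Application wordApp)
    {
        word = wordApp;

        // Command mode: isolated commands only (a subset of the paper's vocabulary).
        var phrases = new Choices("Bold", "Italic", "Cut", "Copy", "Paste",
                                  "Left", "Right", "Up", "Down", "Select line");
        commands = new Grammar(new GrammarBuilder(phrases));

        engine.LoadGrammar(dictation);
        engine.LoadGrammar(commands);
        dictation.Enabled = false;
        commands.Enabled = false;

        engine.SetInputToDefaultAudioDevice();
        engine.SpeechRecognized += OnRecognized;
        engine.RecognizeAsync(RecognizeMode.Multiple);
    }

    public void UseDictationMode() { commands.Enabled = false; dictation.Enabled = true; }
    public void UseCommandMode()   { dictation.Enabled = false; commands.Enabled = true; }

    private void OnRecognized(object sender, SpeechRecognizedEventArgs e)
    {
        Word.Selection sel = word.Selection;

        if (e.Result.Grammar == dictation)
        {
            sel.TypeText(e.Result.Text + " ");   // dictation: write the utterance at the cursor
            return;
        }

        // Command mode: map each isolated command onto the Word object model.
        // (C# 4 optional-argument syntax is used on the interop calls for brevity.)
        switch (e.Result.Text)
        {
            case "Bold":   sel.Font.Bold = (int)Word.WdConstants.wdToggle; break;
            case "Italic": sel.Font.Italic = (int)Word.WdConstants.wdToggle; break;
            case "Cut":    sel.Cut(); break;
            case "Copy":   sel.Copy(); break;
            case "Paste":  sel.Paste(); break;
            case "Left":   sel.MoveLeft(Word.WdUnits.wdCharacter, 1); break;
            case "Right":  sel.MoveRight(Word.WdUnits.wdCharacter, 1); break;
            case "Up":     sel.MoveUp(Word.WdUnits.wdLine, 1); break;
            case "Down":   sel.MoveDown(Word.WdUnits.wdLine, 1); break;
            case "Select line":
                sel.HomeKey(Word.WdUnits.wdLine);
                sel.EndKey(Word.WdUnits.wdLine, Word.WdMovementType.wdExtend);
                break;
        }
    }
}
```

Loading both grammars once and toggling Grammar.Enabled avoids restarting the recognizer when the user switches modes on the ribbon.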




Figure 2: Adapted interface of Word 2007 when the onscreen keyboard is activated



Eye-tracking

The eye tracker can be calibrated for use directly through the Microsoft Word interface. This increases the usability of the application, as the user is not required to move between applications to achieve their goal of using gaze as an interaction technique. Since the word processor is the focus of this study, this meets the requirement of the research question scope. The user has the option to activate eye gaze, which can then be used to position the cursor in the document or over an object to be manipulated. Customization is provided by allowing the user to choose the activation method. For use purely as a single interaction method, the choices of dwell time, look and shoot, and blinking are provided. When dwell time is selected, the user is able to set the interval of the dwell time (see the Sensitivity Setting text box in Figure 1). This provides additional customization, as the user can determine the speed with which they are most comfortable and leaves the option of adjusting this interval as the user gains more confidence and experience with gaze-based interaction. The interval can be changed at any time during the use of the application. Dwell time requires the user to fixate on a position for the set dwell time interval before a left mouse click is simulated. When selecting the look and shoot method, the user can position the cursor using eye gaze and then press the Enter key to simulate a left mouse click. This has the effect of either placing the cursor at the position of the eye gaze or clicking on the icon directly under the eye gaze of the user. The third option available to the user is that of blinking. In this scenario, the user fixates on the desired object or position and then blinks their eyes to simulate a left mouse click.
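The dwell-time logic itself is not spelled out in the paper; the following minimal sketch illustrates the general idea, assuming gaze samples arrive from the tracker SDK as screen coordinates with timestamps and that a subscriber simulates the actual mouse click. The class and its threshold values are illustrative, not the authors' implementation.

```csharp
using System;
using System.Drawing;

// Minimal dwell-time detector: when gaze stays within a small radius for longer
// than the configured interval, an activation is raised at that position. The
// sample feed (tracker SDK) and the click simulation are assumed to live elsewhere.
public class DwellDetector
{
    private readonly double radiusPixels;
    private Point anchor;                 // where the current dwell started
    private DateTime anchorTime;
    private bool fired;

    public TimeSpan DwellInterval { get; set; }   // the ribbon's Sensitivity Setting

    public event Action<Point> DwellActivated;    // subscriber simulates the left click

    public DwellDetector(double radiusPixels, TimeSpan dwellInterval)
    {
        this.radiusPixels = radiusPixels;
        DwellInterval = dwellInterval;
    }

    // Call once per gaze sample (screen coordinates).
    public void OnGazeSample(Point gaze, DateTime timestamp)
    {
        double dx = gaze.X - anchor.X, dy = gaze.Y - anchor.Y;

        if (Math.Sqrt(dx * dx + dy * dy) > radiusPixels)
        {
            // Gaze moved away: start a new potential dwell.
            anchor = gaze;
            anchorTime = timestamp;
            fired = false;
        }
        else if (!fired && timestamp - anchorTime >= DwellInterval)
        {
            fired = true;                          // fire once per dwell
            var handler = DwellActivated;
            if (handler != null) handler(anchor);
        }
    }
}
```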
Multiple interaction techniques

When the user selects No activation (Figure 1) for eye gaze, this implies that they will instead be using voice commands to respond to the current eye gaze position. In this instance, the speech recognition must also be enabled, and the user can then issue verbal commands to move the cursor to the current gaze position, which is analogous to executing a left mouse click at that position. In this way, it is possible for the user to place the cursor at any position in the document, or to click one of the Microsoft Word icons on the ribbon. The verbal commands “Go”, “Click” or “Select” all simulate a left mouse click at the button closest to the current gaze position. In this way, the user is free to choose the command which they find most suitable for them.
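How the click is synthesized at the gaze position is not detailed in the paper; one plausible sketch, assuming the latest gaze sample is cached by the gaze callback and using the Win32 SetCursorPos and mouse_event functions to generate the click, is shown below. The helper name and structure are hypothetical.

```csharp
using System;
using System.Drawing;
using System.Runtime.InteropServices;

// Combines the two modalities: a recognized verbal command ("Go", "Click" or
// "Select") triggers a left mouse click at the most recent gaze position.
// The gaze feed is assumed to keep LastGaze up to date.
public static class GazeVoiceClicker
{
    private const uint MOUSEEVENTF_LEFTDOWN = 0x0002;
    private const uint MOUSEEVENTF_LEFTUP   = 0x0004;

    [DllImport("user32.dll")]
    private static extern bool SetCursorPos(int x, int y);

    [DllImport("user32.dll")]
    private static extern void mouse_event(uint flags, uint dx, uint dy, uint data, UIntPtr extraInfo);

    // Updated by the gaze-tracking callback (screen coordinates).
    public static Point LastGaze { get; set; }

    // Called from the speech handler when "Go", "Click" or "Select" is recognized.
    public static void ClickAtGaze()
    {
        SetCursorPos(LastGaze.X, LastGaze.Y);
        mouse_event(MOUSEEVENTF_LEFTDOWN, 0, 0, 0, UIntPtr.Zero);
        mouse_event(MOUSEEVENTF_LEFTUP, 0, 0, 0, UIntPtr.Zero);
    }
}
```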
In most instances it is envisioned that the onscreen keyboard will also be activated under these circumstances. When the onscreen keyboard is activated in conjunction with eye gaze, visual feedback is given to the user to indicate which button will be clicked when the verbal command is issued. With each fixation that is detected within the boundaries of the keyboard, the button which is closest to that fixation is determined to be the target and a shape is displayed in the centre of the button. The user can also select which shape they would like to use for visual feedback. The available shapes are a solid square, a hollow square, a solid disk and a hollow circle. The hollow shapes do not obscure the letter of the key and in so doing provide the necessary visual feedback whilst still allowing the user to see the letter which will be written to the document. Feedback is only given on the keyboard in order to minimize interference during normal document browsing. In order to achieve increased stabilization of the feedback within a targeted object, the algorithm suggested by Kumar (2007) was used.

If the user is satisfied that the correct button has been determined, they can then issue any of the verbal commands to simulate a left mouse click. The letter shown on the keyboard is then written to the document at the current cursor position.
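As an illustration of the nearest-key targeting described above, the sketch below performs a simple nearest-centre search over the keys of the onscreen keyboard; it assumes the keys are WinForms buttons and deliberately omits the feedback drawing and the Kumar (2007) stabilization step, neither of which is specified in the paper in enough detail to reproduce.

```csharp
using System.Drawing;
using System.Windows.Forms;

// Finds the onscreen-keyboard key closest to the current fixation so that the
// chosen feedback shape can be drawn on it. Keys are assumed to be Button
// controls hosted on the keyboard panel.
public static class KeyTargeting
{
    public static Button FindClosestKey(Control keyboardPanel, Point fixationOnScreen)
    {
        Point local = keyboardPanel.PointToClient(fixationOnScreen);
        Button closest = null;
        double best = double.MaxValue;

        foreach (Control c in keyboardPanel.Controls)
        {
            Button key = c as Button;
            if (key == null) continue;

            // Squared distance from the fixation to the centre of this key.
            Point centre = new Point(key.Left + key.Width / 2, key.Top + key.Height / 2);
            double dx = centre.X - local.X, dy = centre.Y - local.Y;
            double dist = dx * dx + dy * dy;

            if (dist < best) { best = dist; closest = key; }
        }
        return closest;   // caller draws the hollow/solid square or circle on this key
    }
}
```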
4 Where to next?

As previously mentioned, the research study is still in the preliminary stages of an empirical study. An application has been developed to investigate the effect of multimodal interaction techniques on the usability of a mainstream word processor application. Further enhancements to the application will include the expansion of the keyboards to include numerical keys, and the magnification will be refined to respond to eye gaze and voice commands. More voice commands will be provided, particularly for commands that currently have shortcut keys assigned to them, such as Save, and for displaying certain dialog boxes.

Additionally, a back-end will be written for the application which will capture certain measurements which can be used for usability analysis. Measurements such as the number of errors made during a task, the number of actions required and the percentage of the task completed correctly will automatically be saved to a database for further analysis.

Once the application has been completed, user testing will commence. Both disabled and non-disabled users of a local university will be approached to participate in the study. A longitudinal study will be conducted whereby the participants will be required to spend periods interacting with the system. After each exposure to the system, users will be required to complete a number of tasks for which measurements will be captured. In this way, the learnability of the application can be measured over a period of time by comparing the results of these sessions to determine whether user performance increases in correlation with user exposure to the application. Since it is expected that there will be a learning curve associated with the application, it is deemed more applicable to capture usability measurements over a period of time rather than only after a single session with the application. In order to determine whether the application succeeds in providing for disabled users whilst simultaneously providing a better user experience for mainstream users, it is imperative that users from both these demographics be included in the sample.

Furthermore, to further investigate the usability of the newly developed application, user efficiency and effectiveness can be measured in a within-subjects experiment by requiring users to complete identical tasks in both the commercial Microsoft Word and the new multimodal Microsoft Word.

Moreover, the usability of the various interaction techniques will also be analyzed to determine which combination of the interaction techniques provides the most usable interface – if any. User satisfaction will be measured by means of a questionnaire in order to gauge user reaction, over both a short-term and a long-term exposure period.

5 Summary

A multimodal interface was developed for Microsoft Word in order to eventually determine whether the usability of this application can be enhanced for mainstream users whilst simultaneously providing an adaptable and usable interface for disabled users. For these purposes, eye tracking and speech recognition capabilities were built into the Word interface. These interaction techniques can be used in isolation or in combination, and the way in which they are used can be customized in a number of ways. Once the development has been completed and measurements can be captured automatically in the background during user interaction, a longitudinal usability study will be undertaken. Both disabled and able-bodied users will be included in the sample and will be required to complete a number of practice sessions with the application over a prolonged period of time. After each session, participants will be required to complete a number of tasks, during which measurements will be captured for further analysis. In this way, it will be possible to determine whether users are able to improve their performance on the system over an extended period – in other words, whether the system is usable. Additionally, user performance between the new application and the commercially available application will be compared to determine whether users can achieve comparable performance on both systems. In this way, it will be possible to determine whether a popular commercial application can be fully extended into a worthwhile multimodal application which caters for a diverse group of users comprising both disabled and able-bodied users.

References

ANDERSON, T. (2009). Pro Office 2007 Development with VSTO. Apress, United States of America.

HATFIELD, F. AND JENKINS, E.A. (1997). An interface integrating eye gaze and voice recognition for hands-free computer access. In Proceedings of the CSUN 1997 Conference.

JACOB, R.J. (1995). Eye tracking in advanced interface design. In Virtual Environments and Advanced Interface Design, W. Barfield and T.A. Furness, Eds. Oxford University Press, New York, NY, 258-288.

KUMAR, M. (2007). Gaze-enhanced User Interface Design. PhD thesis, Stanford University.

Weitere ähnliche Inhalte

Was ist angesagt?

DelMonte Foods Reduces Travel and Improves Collaboration with Microsoft Web C...
DelMonte Foods Reduces Travel and Improves Collaboration with Microsoft Web C...DelMonte Foods Reduces Travel and Improves Collaboration with Microsoft Web C...
DelMonte Foods Reduces Travel and Improves Collaboration with Microsoft Web C...Microsoft Private Cloud
 
Guide presentation aegis-fp7-projects-round_table_2011-11-30_v0.1
Guide presentation aegis-fp7-projects-round_table_2011-11-30_v0.1Guide presentation aegis-fp7-projects-round_table_2011-11-30_v0.1
Guide presentation aegis-fp7-projects-round_table_2011-11-30_v0.1AEGIS-ACCESSIBLE Projects
 
Video. The new dialtone for business communications
Video. The new dialtone for business communicationsVideo. The new dialtone for business communications
Video. The new dialtone for business communicationsschinarro
 
Presentación Collaboration Video Cablevisión Day 2010
Presentación Collaboration Video Cablevisión Day 2010Presentación Collaboration Video Cablevisión Day 2010
Presentación Collaboration Video Cablevisión Day 2010Logicalis Latam
 
Php In The Enterprise 01 24 2010
Php In The Enterprise 01 24 2010Php In The Enterprise 01 24 2010
Php In The Enterprise 01 24 2010phptechtalk
 
Forum Nokia Dev. Camp - Biz Dev_ Paris 17&18 Nov Final
Forum Nokia Dev. Camp - Biz Dev_ Paris 17&18 Nov FinalForum Nokia Dev. Camp - Biz Dev_ Paris 17&18 Nov Final
Forum Nokia Dev. Camp - Biz Dev_ Paris 17&18 Nov FinalDALEZ
 
LCTY09 - Beyond office with Lotus Symphony
LCTY09 - Beyond office with Lotus SymphonyLCTY09 - Beyond office with Lotus Symphony
LCTY09 - Beyond office with Lotus SymphonyStuart McIntyre
 
Virtualize Your Telephony Platform with Cisco UCS
Virtualize Your Telephony Platform with Cisco UCSVirtualize Your Telephony Platform with Cisco UCS
Virtualize Your Telephony Platform with Cisco UCSAdvanced Logic Industries
 

Was ist angesagt? (10)

DelMonte Foods Reduces Travel and Improves Collaboration with Microsoft Web C...
DelMonte Foods Reduces Travel and Improves Collaboration with Microsoft Web C...DelMonte Foods Reduces Travel and Improves Collaboration with Microsoft Web C...
DelMonte Foods Reduces Travel and Improves Collaboration with Microsoft Web C...
 
Guide presentation aegis-fp7-projects-round_table_2011-11-30_v0.1
Guide presentation aegis-fp7-projects-round_table_2011-11-30_v0.1Guide presentation aegis-fp7-projects-round_table_2011-11-30_v0.1
Guide presentation aegis-fp7-projects-round_table_2011-11-30_v0.1
 
Video. The new dialtone for business communications
Video. The new dialtone for business communicationsVideo. The new dialtone for business communications
Video. The new dialtone for business communications
 
Presentación Collaboration Video Cablevisión Day 2010
Presentación Collaboration Video Cablevisión Day 2010Presentación Collaboration Video Cablevisión Day 2010
Presentación Collaboration Video Cablevisión Day 2010
 
Php In The Enterprise 01 24 2010
Php In The Enterprise 01 24 2010Php In The Enterprise 01 24 2010
Php In The Enterprise 01 24 2010
 
Forum Nokia Dev. Camp - Biz Dev_ Paris 17&18 Nov Final
Forum Nokia Dev. Camp - Biz Dev_ Paris 17&18 Nov FinalForum Nokia Dev. Camp - Biz Dev_ Paris 17&18 Nov Final
Forum Nokia Dev. Camp - Biz Dev_ Paris 17&18 Nov Final
 
LCTY09 - Beyond office with Lotus Symphony
LCTY09 - Beyond office with Lotus SymphonyLCTY09 - Beyond office with Lotus Symphony
LCTY09 - Beyond office with Lotus Symphony
 
My+Magic
My+MagicMy+Magic
My+Magic
 
480 483
480 483480 483
480 483
 
Virtualize Your Telephony Platform with Cisco UCS
Virtualize Your Telephony Platform with Cisco UCSVirtualize Your Telephony Platform with Cisco UCS
Virtualize Your Telephony Platform with Cisco UCS
 

Andere mochten auch

Contemporary approach in management
Contemporary approach in managementContemporary approach in management
Contemporary approach in managementshivam singh
 
C:\Documents And Settings\Joanne\My Documents\A Level Subjects\Media\Joanne H...
C:\Documents And Settings\Joanne\My Documents\A Level Subjects\Media\Joanne H...C:\Documents And Settings\Joanne\My Documents\A Level Subjects\Media\Joanne H...
C:\Documents And Settings\Joanne\My Documents\A Level Subjects\Media\Joanne H...guest99f048
 
Guía para la gestión del uso de medicamentos
Guía para la gestión del uso de medicamentosGuía para la gestión del uso de medicamentos
Guía para la gestión del uso de medicamentosMANUEL RIVERA
 
Override Meeting S01E02
Override Meeting S01E02Override Meeting S01E02
Override Meeting S01E02Marcelo Paiva
 
Hyves Cbw Mitex Harry Van Wouter
Hyves Cbw Mitex Harry Van WouterHyves Cbw Mitex Harry Van Wouter
Hyves Cbw Mitex Harry Van Wouterguest2f17d3
 
What Is A Canadian
What Is A  CanadianWhat Is A  Canadian
What Is A Canadiansamspector93
 
Measuring social media - updated
Measuring social media - updatedMeasuring social media - updated
Measuring social media - updatedEmerson Povey
 
ZFConf 2010: Using Message Queues in Day-to-Day Projects (Zend_Queue)
ZFConf 2010: Using Message Queues in Day-to-Day Projects (Zend_Queue)ZFConf 2010: Using Message Queues in Day-to-Day Projects (Zend_Queue)
ZFConf 2010: Using Message Queues in Day-to-Day Projects (Zend_Queue)ZFConf Conference
 
Невидимый гос долг в Казахстане
Невидимый гос долг в КазахстанеНевидимый гос долг в Казахстане
Невидимый гос долг в КазахстанеKassymkhan Kapparov
 
Viena – praga
Viena –  pragaViena –  praga
Viena – pragatsanchezro
 

Andere mochten auch (20)

Contemporary management theory
Contemporary management theoryContemporary management theory
Contemporary management theory
 
Contemporary approach in management
Contemporary approach in managementContemporary approach in management
Contemporary approach in management
 
C:\Documents And Settings\Joanne\My Documents\A Level Subjects\Media\Joanne H...
C:\Documents And Settings\Joanne\My Documents\A Level Subjects\Media\Joanne H...C:\Documents And Settings\Joanne\My Documents\A Level Subjects\Media\Joanne H...
C:\Documents And Settings\Joanne\My Documents\A Level Subjects\Media\Joanne H...
 
Guía para la gestión del uso de medicamentos
Guía para la gestión del uso de medicamentosGuía para la gestión del uso de medicamentos
Guía para la gestión del uso de medicamentos
 
Override Meeting S01E02
Override Meeting S01E02Override Meeting S01E02
Override Meeting S01E02
 
Lua on Steroids
Lua on SteroidsLua on Steroids
Lua on Steroids
 
อาฟกานิสถาน
อาฟกานิสถานอาฟกานิสถาน
อาฟกานิสถาน
 
Kennady
KennadyKennady
Kennady
 
Igualdad ikea
Igualdad ikeaIgualdad ikea
Igualdad ikea
 
อาหรับและปาเลสไตน์
อาหรับและปาเลสไตน์อาหรับและปาเลสไตน์
อาหรับและปาเลสไตน์
 
กลุ่มก่อการร้ายมุสลิม
กลุ่มก่อการร้ายมุสลิมกลุ่มก่อการร้ายมุสลิม
กลุ่มก่อการร้ายมุสลิม
 
Hyves Cbw Mitex Harry Van Wouter
Hyves Cbw Mitex Harry Van WouterHyves Cbw Mitex Harry Van Wouter
Hyves Cbw Mitex Harry Van Wouter
 
What Is A Canadian
What Is A  CanadianWhat Is A  Canadian
What Is A Canadian
 
Oxycontin®
Oxycontin®Oxycontin®
Oxycontin®
 
Measuring social media - updated
Measuring social media - updatedMeasuring social media - updated
Measuring social media - updated
 
Pr1
Pr1Pr1
Pr1
 
ZFConf 2010: Using Message Queues in Day-to-Day Projects (Zend_Queue)
ZFConf 2010: Using Message Queues in Day-to-Day Projects (Zend_Queue)ZFConf 2010: Using Message Queues in Day-to-Day Projects (Zend_Queue)
ZFConf 2010: Using Message Queues in Day-to-Day Projects (Zend_Queue)
 
Невидимый гос долг в Казахстане
Невидимый гос долг в КазахстанеНевидимый гос долг в Казахстане
Невидимый гос долг в Казахстане
 
Mitex Pine
Mitex PineMitex Pine
Mitex Pine
 
Viena – praga
Viena –  pragaViena –  praga
Viena – praga
 

Ähnlich wie Beelders Using Vision And Voice To Create A Multimodal Interface For Microsoft Word 2007

Hindi speech enabled windows application using microsoft
Hindi speech enabled windows application using microsoftHindi speech enabled windows application using microsoft
Hindi speech enabled windows application using microsoftIAEME Publication
 
Interactive speech based games for autistic children with asperger syndrome
Interactive speech based games for autistic children with asperger syndromeInteractive speech based games for autistic children with asperger syndrome
Interactive speech based games for autistic children with asperger syndromeAmal Abduallah
 
Comparative Analysis of Visual Basic VS Java programming language.docx
Comparative Analysis of Visual Basic VS Java programming language.docxComparative Analysis of Visual Basic VS Java programming language.docx
Comparative Analysis of Visual Basic VS Java programming language.docxRichwellIanAfrica
 
LANGUAGE TRANSLATOR APP
LANGUAGE TRANSLATOR APPLANGUAGE TRANSLATOR APP
LANGUAGE TRANSLATOR APPIRJET Journal
 
A Research Paper On A Progress Tracking Application Using Flutter And Firebase
A Research Paper On A Progress Tracking Application Using Flutter And FirebaseA Research Paper On A Progress Tracking Application Using Flutter And Firebase
A Research Paper On A Progress Tracking Application Using Flutter And FirebaseNat Rice
 
Why Use Flutter for App Development- Features and Benefits
Why Use Flutter for App Development- Features and BenefitsWhy Use Flutter for App Development- Features and Benefits
Why Use Flutter for App Development- Features and BenefitsLucy Zeniffer
 
SMARCOS CNR Paper Workshop Distributed UI
SMARCOS CNR Paper Workshop Distributed UISMARCOS CNR Paper Workshop Distributed UI
SMARCOS CNR Paper Workshop Distributed UISmarcos Eu
 
Fluent Interfaces
Fluent InterfacesFluent Interfaces
Fluent InterfacesIDES Editor
 
The Holistic Consumer Experience
The Holistic Consumer ExperienceThe Holistic Consumer Experience
The Holistic Consumer ExperienceMLD/ Mel Lim Design
 
Automated Java Code Generation (ICDIM 2006)
Automated Java Code Generation (ICDIM 2006)Automated Java Code Generation (ICDIM 2006)
Automated Java Code Generation (ICDIM 2006)IT Industry
 
IRJET - Survey Paper on Tools Used to Enhance User's Experience with Cons...
IRJET -  	  Survey Paper on Tools Used to Enhance User's Experience with Cons...IRJET -  	  Survey Paper on Tools Used to Enhance User's Experience with Cons...
IRJET - Survey Paper on Tools Used to Enhance User's Experience with Cons...IRJET Journal
 
IRJET- Voice based Email Application for Blind People
IRJET-  	  Voice based Email Application for Blind PeopleIRJET-  	  Voice based Email Application for Blind People
IRJET- Voice based Email Application for Blind PeopleIRJET Journal
 
SMARCOS_Paper_Mobile hci12 246
SMARCOS_Paper_Mobile hci12 246SMARCOS_Paper_Mobile hci12 246
SMARCOS_Paper_Mobile hci12 246Smarcos Eu
 
Hennessey An Open Source Eye Gaze Interface Expanding The Adoption Of Eye Gaz...
Hennessey An Open Source Eye Gaze Interface Expanding The Adoption Of Eye Gaz...Hennessey An Open Source Eye Gaze Interface Expanding The Adoption Of Eye Gaz...
Hennessey An Open Source Eye Gaze Interface Expanding The Adoption Of Eye Gaz...Kalle
 

Ähnlich wie Beelders Using Vision And Voice To Create A Multimodal Interface For Microsoft Word 2007 (20)

White Paper
White PaperWhite Paper
White Paper
 
Hindi speech enabled windows application using microsoft
Hindi speech enabled windows application using microsoftHindi speech enabled windows application using microsoft
Hindi speech enabled windows application using microsoft
 
Oracle Fusion Applications 101
Oracle Fusion Applications 101Oracle Fusion Applications 101
Oracle Fusion Applications 101
 
Interactive speech based games for autistic children with asperger syndrome
Interactive speech based games for autistic children with asperger syndromeInteractive speech based games for autistic children with asperger syndrome
Interactive speech based games for autistic children with asperger syndrome
 
Aj31253258
Aj31253258Aj31253258
Aj31253258
 
Comparative Analysis of Visual Basic VS Java programming language.docx
Comparative Analysis of Visual Basic VS Java programming language.docxComparative Analysis of Visual Basic VS Java programming language.docx
Comparative Analysis of Visual Basic VS Java programming language.docx
 
LANGUAGE TRANSLATOR APP
LANGUAGE TRANSLATOR APPLANGUAGE TRANSLATOR APP
LANGUAGE TRANSLATOR APP
 
A Research Paper On A Progress Tracking Application Using Flutter And Firebase
A Research Paper On A Progress Tracking Application Using Flutter And FirebaseA Research Paper On A Progress Tracking Application Using Flutter And Firebase
A Research Paper On A Progress Tracking Application Using Flutter And Firebase
 
Poster Serenoa
Poster SerenoaPoster Serenoa
Poster Serenoa
 
Why Use Flutter for App Development- Features and Benefits
Why Use Flutter for App Development- Features and BenefitsWhy Use Flutter for App Development- Features and Benefits
Why Use Flutter for App Development- Features and Benefits
 
SMARCOS CNR Paper Workshop Distributed UI
SMARCOS CNR Paper Workshop Distributed UISMARCOS CNR Paper Workshop Distributed UI
SMARCOS CNR Paper Workshop Distributed UI
 
Pert15
Pert15Pert15
Pert15
 
Fluent Interfaces
Fluent InterfacesFluent Interfaces
Fluent Interfaces
 
The Holistic Consumer Experience
The Holistic Consumer ExperienceThe Holistic Consumer Experience
The Holistic Consumer Experience
 
Automated Java Code Generation (ICDIM 2006)
Automated Java Code Generation (ICDIM 2006)Automated Java Code Generation (ICDIM 2006)
Automated Java Code Generation (ICDIM 2006)
 
IRJET - Survey Paper on Tools Used to Enhance User's Experience with Cons...
IRJET -  	  Survey Paper on Tools Used to Enhance User's Experience with Cons...IRJET -  	  Survey Paper on Tools Used to Enhance User's Experience with Cons...
IRJET - Survey Paper on Tools Used to Enhance User's Experience with Cons...
 
IRJET- Voice based Email Application for Blind People
IRJET-  	  Voice based Email Application for Blind PeopleIRJET-  	  Voice based Email Application for Blind People
IRJET- Voice based Email Application for Blind People
 
SMARCOS_Paper_Mobile hci12 246
SMARCOS_Paper_Mobile hci12 246SMARCOS_Paper_Mobile hci12 246
SMARCOS_Paper_Mobile hci12 246
 
Hennessey An Open Source Eye Gaze Interface Expanding The Adoption Of Eye Gaz...
Hennessey An Open Source Eye Gaze Interface Expanding The Adoption Of Eye Gaz...Hennessey An Open Source Eye Gaze Interface Expanding The Adoption Of Eye Gaz...
Hennessey An Open Source Eye Gaze Interface Expanding The Adoption Of Eye Gaz...
 
Tandberg movi
Tandberg  moviTandberg  movi
Tandberg movi
 

Mehr von Kalle

Blignaut Visual Span And Other Parameters For The Generation Of Heatmaps
Blignaut Visual Span And Other Parameters For The Generation Of HeatmapsBlignaut Visual Span And Other Parameters For The Generation Of Heatmaps
Blignaut Visual Span And Other Parameters For The Generation Of HeatmapsKalle
 
Zhang Eye Movement As An Interaction Mechanism For Relevance Feedback In A Co...
Zhang Eye Movement As An Interaction Mechanism For Relevance Feedback In A Co...Zhang Eye Movement As An Interaction Mechanism For Relevance Feedback In A Co...
Zhang Eye Movement As An Interaction Mechanism For Relevance Feedback In A Co...Kalle
 
Yamamoto Development Of Eye Tracking Pen Display Based On Stereo Bright Pupil...
Yamamoto Development Of Eye Tracking Pen Display Based On Stereo Bright Pupil...Yamamoto Development Of Eye Tracking Pen Display Based On Stereo Bright Pupil...
Yamamoto Development Of Eye Tracking Pen Display Based On Stereo Bright Pupil...Kalle
 
Wastlund What You See Is Where You Go Testing A Gaze Driven Power Wheelchair ...
Wastlund What You See Is Where You Go Testing A Gaze Driven Power Wheelchair ...Wastlund What You See Is Where You Go Testing A Gaze Driven Power Wheelchair ...
Wastlund What You See Is Where You Go Testing A Gaze Driven Power Wheelchair ...Kalle
 
Vinnikov Contingency Evaluation Of Gaze Contingent Displays For Real Time Vis...
Vinnikov Contingency Evaluation Of Gaze Contingent Displays For Real Time Vis...Vinnikov Contingency Evaluation Of Gaze Contingent Displays For Real Time Vis...
Vinnikov Contingency Evaluation Of Gaze Contingent Displays For Real Time Vis...Kalle
 
Urbina Pies With Ey Es The Limits Of Hierarchical Pie Menus In Gaze Control
Urbina Pies With Ey Es The Limits Of Hierarchical Pie Menus In Gaze ControlUrbina Pies With Ey Es The Limits Of Hierarchical Pie Menus In Gaze Control
Urbina Pies With Ey Es The Limits Of Hierarchical Pie Menus In Gaze ControlKalle
 
Urbina Alternatives To Single Character Entry And Dwell Time Selection On Eye...
Urbina Alternatives To Single Character Entry And Dwell Time Selection On Eye...Urbina Alternatives To Single Character Entry And Dwell Time Selection On Eye...
Urbina Alternatives To Single Character Entry And Dwell Time Selection On Eye...Kalle
 
Tien Measuring Situation Awareness Of Surgeons In Laparoscopic Training
Tien Measuring Situation Awareness Of Surgeons In Laparoscopic TrainingTien Measuring Situation Awareness Of Surgeons In Laparoscopic Training
Tien Measuring Situation Awareness Of Surgeons In Laparoscopic TrainingKalle
 
Takemura Estimating 3 D Point Of Regard And Visualizing Gaze Trajectories Und...
Takemura Estimating 3 D Point Of Regard And Visualizing Gaze Trajectories Und...Takemura Estimating 3 D Point Of Regard And Visualizing Gaze Trajectories Und...
Takemura Estimating 3 D Point Of Regard And Visualizing Gaze Trajectories Und...Kalle
 
Stevenson Eye Tracking With The Adaptive Optics Scanning Laser Ophthalmoscope
Stevenson Eye Tracking With The Adaptive Optics Scanning Laser OphthalmoscopeStevenson Eye Tracking With The Adaptive Optics Scanning Laser Ophthalmoscope
Stevenson Eye Tracking With The Adaptive Optics Scanning Laser OphthalmoscopeKalle
 
Stellmach Advanced Gaze Visualizations For Three Dimensional Virtual Environm...
Stellmach Advanced Gaze Visualizations For Three Dimensional Virtual Environm...Stellmach Advanced Gaze Visualizations For Three Dimensional Virtual Environm...
Stellmach Advanced Gaze Visualizations For Three Dimensional Virtual Environm...Kalle
 
Skovsgaard Small Target Selection With Gaze Alone
Skovsgaard Small Target Selection With Gaze AloneSkovsgaard Small Target Selection With Gaze Alone
Skovsgaard Small Target Selection With Gaze AloneKalle
 
San Agustin Evaluation Of A Low Cost Open Source Gaze Tracker
San Agustin Evaluation Of A Low Cost Open Source Gaze TrackerSan Agustin Evaluation Of A Low Cost Open Source Gaze Tracker
San Agustin Evaluation Of A Low Cost Open Source Gaze TrackerKalle
 
Ryan Match Moving For Area Based Analysis Of Eye Movements In Natural Tasks
Ryan Match Moving For Area Based Analysis Of Eye Movements In Natural TasksRyan Match Moving For Area Based Analysis Of Eye Movements In Natural Tasks
Ryan Match Moving For Area Based Analysis Of Eye Movements In Natural TasksKalle
 
Rosengrant Gaze Scribing In Physics Problem Solving
Rosengrant Gaze Scribing In Physics Problem SolvingRosengrant Gaze Scribing In Physics Problem Solving
Rosengrant Gaze Scribing In Physics Problem SolvingKalle
 
Qvarfordt Understanding The Benefits Of Gaze Enhanced Visual Search
Qvarfordt Understanding The Benefits Of Gaze Enhanced Visual SearchQvarfordt Understanding The Benefits Of Gaze Enhanced Visual Search
Qvarfordt Understanding The Benefits Of Gaze Enhanced Visual SearchKalle
 
Prats Interpretation Of Geometric Shapes An Eye Movement Study
Prats Interpretation Of Geometric Shapes An Eye Movement StudyPrats Interpretation Of Geometric Shapes An Eye Movement Study
Prats Interpretation Of Geometric Shapes An Eye Movement StudyKalle
 
Porta Ce Cursor A Contextual Eye Cursor For General Pointing In Windows Envir...
Porta Ce Cursor A Contextual Eye Cursor For General Pointing In Windows Envir...Porta Ce Cursor A Contextual Eye Cursor For General Pointing In Windows Envir...
Porta Ce Cursor A Contextual Eye Cursor For General Pointing In Windows Envir...Kalle
 
Pontillo Semanti Code Using Content Similarity And Database Driven Matching T...
Pontillo Semanti Code Using Content Similarity And Database Driven Matching T...Pontillo Semanti Code Using Content Similarity And Database Driven Matching T...
Pontillo Semanti Code Using Content Similarity And Database Driven Matching T...Kalle
 
Park Quantification Of Aesthetic Viewing Using Eye Tracking Technology The In...
Park Quantification Of Aesthetic Viewing Using Eye Tracking Technology The In...Park Quantification Of Aesthetic Viewing Using Eye Tracking Technology The In...
Park Quantification Of Aesthetic Viewing Using Eye Tracking Technology The In...Kalle
 

Mehr von Kalle (20)

Blignaut Visual Span And Other Parameters For The Generation Of Heatmaps
Blignaut Visual Span And Other Parameters For The Generation Of HeatmapsBlignaut Visual Span And Other Parameters For The Generation Of Heatmaps
Blignaut Visual Span And Other Parameters For The Generation Of Heatmaps
 
Zhang Eye Movement As An Interaction Mechanism For Relevance Feedback In A Co...
Zhang Eye Movement As An Interaction Mechanism For Relevance Feedback In A Co...Zhang Eye Movement As An Interaction Mechanism For Relevance Feedback In A Co...
Zhang Eye Movement As An Interaction Mechanism For Relevance Feedback In A Co...
 
Yamamoto Development Of Eye Tracking Pen Display Based On Stereo Bright Pupil...
Yamamoto Development Of Eye Tracking Pen Display Based On Stereo Bright Pupil...Yamamoto Development Of Eye Tracking Pen Display Based On Stereo Bright Pupil...
Yamamoto Development Of Eye Tracking Pen Display Based On Stereo Bright Pupil...
 
Wastlund What You See Is Where You Go Testing A Gaze Driven Power Wheelchair ...
Wastlund What You See Is Where You Go Testing A Gaze Driven Power Wheelchair ...Wastlund What You See Is Where You Go Testing A Gaze Driven Power Wheelchair ...
Wastlund What You See Is Where You Go Testing A Gaze Driven Power Wheelchair ...
 
Vinnikov Contingency Evaluation Of Gaze Contingent Displays For Real Time Vis...
Vinnikov Contingency Evaluation Of Gaze Contingent Displays For Real Time Vis...Vinnikov Contingency Evaluation Of Gaze Contingent Displays For Real Time Vis...
Vinnikov Contingency Evaluation Of Gaze Contingent Displays For Real Time Vis...
 
Urbina Pies With Ey Es The Limits Of Hierarchical Pie Menus In Gaze Control
Urbina Pies With Ey Es The Limits Of Hierarchical Pie Menus In Gaze ControlUrbina Pies With Ey Es The Limits Of Hierarchical Pie Menus In Gaze Control
Urbina Pies With Ey Es The Limits Of Hierarchical Pie Menus In Gaze Control
 
Urbina Alternatives To Single Character Entry And Dwell Time Selection On Eye...
Urbina Alternatives To Single Character Entry And Dwell Time Selection On Eye...Urbina Alternatives To Single Character Entry And Dwell Time Selection On Eye...
Urbina Alternatives To Single Character Entry And Dwell Time Selection On Eye...
 
Tien Measuring Situation Awareness Of Surgeons In Laparoscopic Training
Tien Measuring Situation Awareness Of Surgeons In Laparoscopic TrainingTien Measuring Situation Awareness Of Surgeons In Laparoscopic Training
Tien Measuring Situation Awareness Of Surgeons In Laparoscopic Training
 
Takemura Estimating 3 D Point Of Regard And Visualizing Gaze Trajectories Und...
Takemura Estimating 3 D Point Of Regard And Visualizing Gaze Trajectories Und...Takemura Estimating 3 D Point Of Regard And Visualizing Gaze Trajectories Und...
Takemura Estimating 3 D Point Of Regard And Visualizing Gaze Trajectories Und...
 
Stevenson Eye Tracking With The Adaptive Optics Scanning Laser Ophthalmoscope
Stevenson Eye Tracking With The Adaptive Optics Scanning Laser OphthalmoscopeStevenson Eye Tracking With The Adaptive Optics Scanning Laser Ophthalmoscope
Stevenson Eye Tracking With The Adaptive Optics Scanning Laser Ophthalmoscope
 
Stellmach Advanced Gaze Visualizations For Three Dimensional Virtual Environm...
Stellmach Advanced Gaze Visualizations For Three Dimensional Virtual Environm...Stellmach Advanced Gaze Visualizations For Three Dimensional Virtual Environm...
Stellmach Advanced Gaze Visualizations For Three Dimensional Virtual Environm...
 
Skovsgaard Small Target Selection With Gaze Alone
Skovsgaard Small Target Selection With Gaze AloneSkovsgaard Small Target Selection With Gaze Alone
Skovsgaard Small Target Selection With Gaze Alone
 
San Agustin Evaluation Of A Low Cost Open Source Gaze Tracker
San Agustin Evaluation Of A Low Cost Open Source Gaze TrackerSan Agustin Evaluation Of A Low Cost Open Source Gaze Tracker
San Agustin Evaluation Of A Low Cost Open Source Gaze Tracker
 
Ryan Match Moving For Area Based Analysis Of Eye Movements In Natural Tasks
Ryan Match Moving For Area Based Analysis Of Eye Movements In Natural TasksRyan Match Moving For Area Based Analysis Of Eye Movements In Natural Tasks
Ryan Match Moving For Area Based Analysis Of Eye Movements In Natural Tasks
 
Rosengrant Gaze Scribing In Physics Problem Solving
Rosengrant Gaze Scribing In Physics Problem SolvingRosengrant Gaze Scribing In Physics Problem Solving
Rosengrant Gaze Scribing In Physics Problem Solving
 
Qvarfordt Understanding The Benefits Of Gaze Enhanced Visual Search
Qvarfordt Understanding The Benefits Of Gaze Enhanced Visual SearchQvarfordt Understanding The Benefits Of Gaze Enhanced Visual Search
Qvarfordt Understanding The Benefits Of Gaze Enhanced Visual Search
 
Prats Interpretation Of Geometric Shapes An Eye Movement Study
Prats Interpretation Of Geometric Shapes An Eye Movement StudyPrats Interpretation Of Geometric Shapes An Eye Movement Study
Prats Interpretation Of Geometric Shapes An Eye Movement Study
 
Porta Ce Cursor A Contextual Eye Cursor For General Pointing In Windows Envir...
Porta Ce Cursor A Contextual Eye Cursor For General Pointing In Windows Envir...Porta Ce Cursor A Contextual Eye Cursor For General Pointing In Windows Envir...
Porta Ce Cursor A Contextual Eye Cursor For General Pointing In Windows Envir...
 
Pontillo Semanti Code Using Content Similarity And Database Driven Matching T...
Pontillo Semanti Code Using Content Similarity And Database Driven Matching T...Pontillo Semanti Code Using Content Similarity And Database Driven Matching T...
Pontillo Semanti Code Using Content Similarity And Database Driven Matching T...
 
Park Quantification Of Aesthetic Viewing Using Eye Tracking Technology The In...
Park Quantification Of Aesthetic Viewing Using Eye Tracking Technology The In...Park Quantification Of Aesthetic Viewing Using Eye Tracking Technology The In...
Park Quantification Of Aesthetic Viewing Using Eye Tracking Technology The In...
 

Beelders Using Vision And Voice To Create A Multimodal Interface For Microsoft Word 2007

Keywords: Eye-tracking, speech recognition, usability, word processing, multimodal

1 Introduction

The word processor application has evolved substantially since its initial inception and has since undergone a virtual metamorphosis to achieve the capabilities that are available in these applications today. As an integral part of everyday life for many people, it caters for a very diverse group of users. Furthermore, users with disabilities or needs other than those of mainstream users are not always taken into consideration during system development and often have to compensate by using specially designed applications which do not necessarily compare with the more popular applications. This study therefore aims to investigate various means to increase the usability of a word processor for as wide a user group as possible.

For this reason, the interface of the most popular word processor application will be extended into a multimodal interface. This interface should facilitate use of the mainstream product by marginalized users, whilst at the same time enhancing the user experience for novice, intermediate and expert users. Ideally, the interface should be customizable and allow users to select any combination of interaction techniques which suit their needs.

The premise of the research study is not to develop a new word processor but rather to incorporate additional interaction techniques, besides the keyboard and mouse, into an application which has already been accepted by the user community. This will allow for the improvement of an already popular product and stimulate inclusiveness of non-mainstream users into the mainstream market. Therefore, one aim is to determine whether it is possible to customize an interface to such an extent that all user groups are catered for with an all-inclusive interface.

The research study is still in the beginning phase, where development of the tool is underway. Therefore, for the purposes of this paper, the application as it has been developed will be the main focus. The paper will, however, conclude with a short discussion of the next phases of the research study.

2 Interaction Techniques

Using a physical input device in order to communicate or perform a task in human-computer dialogue is called an interaction technique [Foley et al., 1990, as cited in Jacob, 1995]. The interaction techniques of speech recognition and eye tracking will be included in a popular word processor interface to create a multimodal interface as a means to determine whether the usability of this product can be enhanced in this way.

Although this approach has received limited attention thus far, the multimodal approach has always focused on the development of a third-party application, for example EyeTalk [Hatfield and Jenkins, 1997]. Contrary to this, this study will use an already existing application, namely Microsoft Word, which currently enjoys a high prevalence in the commercial market.

3 Development environment

The development environment used was Visual Studio 2008, making use of the .NET Framework 3.5. Visual Studio Tools for the Microsoft Office System 2008 (VSTO) in C# was used for development. VSTO allows programmers to use managed code to build Office-based solutions in C# and VB.NET [Anderson, 2009]. In order to incorporate speech recognition, the Microsoft Speech Application Programming Interface (SAPI) with version 5.1 of the SDK was used. The SDK provides the capability of compiling customized grammars and accessing the functionalities of the speech recognizer. In order to provide gaze interaction, Tobii SDK 1.5.4 was used. For magnification purposes, which will be discussed in an upcoming section, the commercial product Magnifying Glass Pro 1.7 was chosen as a relatively inexpensive solution, but primarily because it was one of the few applications which incorporate clickable areas within the magnified area which are then correctly transferred to the underlying area. This is essential in the developed product, as the magnification increases the accuracy of cursor positioning via eye gaze and the correct interpretation of user intention; requiring the user to disable magnification before clicking on the interface would negate all the advantages gained from magnification.
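To make the toolchain above concrete, the following is a minimal sketch of a VSTO add-in startup that enables dictation and writes recognized text into the active document. It uses the managed System.Speech API as a stand-in for the SAPI 5.1 interop named above, and the field and handler names are illustrative rather than taken from the authors' code; the Tobii SDK and Magnifying Glass Pro would be initialized in the same place.

    using System.Speech.Recognition;                 // managed stand-in for the SAPI 5.1 interop
    using Word = Microsoft.Office.Interop.Word;

    public partial class ThisAddIn
    {
        private SpeechRecognitionEngine recognizer;  // illustrative field name

        private void ThisAddIn_Startup(object sender, System.EventArgs e)
        {
            // Dictation mode: free-form utterances are converted to text.
            recognizer = new SpeechRecognitionEngine();
            recognizer.LoadGrammar(new DictationGrammar());
            recognizer.SetInputToDefaultAudioDevice();
            recognizer.SpeechRecognized += OnSpeechRecognized;
            recognizer.RecognizeAsync(RecognizeMode.Multiple);
        }

        private void OnSpeechRecognized(object sender, SpeechRecognizedEventArgs e)
        {
            // Write the recognized utterance at the current cursor position.
            Word.Selection sel = this.Application.Selection;
            sel.TypeText(e.Result.Text + " ");
        }

        private void ThisAddIn_Shutdown(object sender, System.EventArgs e)
        {
            if (recognizer != null) recognizer.Dispose();
        }
    }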
The aim of the development process was to incorporate speech recognition and eye tracking as additional interaction techniques in the Microsoft Word environment. The user should also be given the freedom to determine in which combination the interaction techniques must be used, while still having the option of continued use of the traditional interaction techniques. As illustrated in Figure 1, an extra tab was added to the established Microsoft Word ribbon. This tab (circled in red) was named Multimodal Add-Ins.

The new tab provides numerous options for the user to select which additional interaction techniques they would like to use (Figure 1). As is evident from Figure 1, complete customization of the techniques is allowed via selection of any combination of techniques, as well as the capacity in which the techniques must be implemented.

Figure 1: Multimodal Add-ins for Word 2007

Additional tools which are available to enhance the user experience are a magnification tool and an onscreen keyboard which can be displayed at the bottom of the Word document. The magnification tool magnifies the immediate area under the mouse cursor, thereby providing increased accuracy for users with weak eyesight and those making use of the gaze-sensitive interface. Magnification is available when using the mouse or when using eye gaze as an interaction technique. The use of the magnification tool is entirely at the discretion of the user, who can turn magnification on and off at will or as needed. Magnification can be toggled using either a verbal command or the available button on the ribbon.

Onscreen keyboards are available as an alternative to using a traditional keyboard. The onscreen keyboard can be used either through use of the traditional mouse or to achieve hands-free typing using eye gaze or a combination of eye gaze and speech recognition. The final adapted interface, as envisioned when the onscreen keyboard is in use, is shown in Figure 2.

The layout of the onscreen keyboard can be changed to either a traditional QWERTY keyboard layout or an alphabetic layout. Each keyboard contains all 26 alphabetic letters, a Space bar, Backspace and Delete keys, as well as special keys which simplify movement through the document. The special keys provided are Page up, Page down, Home and End. The user can also toggle between upper case and lower case by activating and deactivating the Caps Lock key. A Select All key is provided as a means for the user to select all the text in the document. The two red arrows in the lower left corner of the keyboard (Figure 2) change the size of all keyboard keys in decrements and increments of 10 pixels respectively, thereby providing even more customization of the keyboard for the user. Auditory feedback in the form of a soft beep is given when a keyboard key is clicked.
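The paper does not show the keyboard implementation, but a key press on such an onscreen keyboard could plausibly be handled along the following lines. The class, method and key labels are assumptions; only the Word object-model calls (TypeText, TypeBackspace, WholeStory) and the SystemSounds beep are real APIs.

    using Word = Microsoft.Office.Interop.Word;

    // Hypothetical key handler for the onscreen keyboard; names are illustrative only.
    public class OnscreenKeyboard
    {
        private bool capsOn;

        public void OnKeyClicked(string key, Word.Application word)
        {
            Word.Selection sel = word.Selection;

            switch (key)
            {
                case "Backspace":  sel.TypeBackspace(); break;
                case "Space":      sel.TypeText(" ");   break;
                case "Select All": sel.WholeStory();    break;  // selects the entire document
                case "Caps":       capsOn = !capsOn;    break;  // toggle upper/lower case
                // Delete, Page up, Page down, Home and End would call the corresponding
                // Selection methods in the same way.
                default:                                        // an ordinary letter key
                    sel.TypeText(capsOn ? key.ToUpper() : key.ToLower());
                    break;
            }

            // Auditory feedback: a soft beep confirms the key press.
            System.Media.SystemSounds.Beep.Play();
        }
    }

Resizing the keys in 10-pixel steps, as the red arrows do, would then amount to adjusting a single key-size value from which the button layout is recomputed.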
Speech recognition

The user has the option of enabling the speech engine so that Microsoft Word can respond to verbal utterances. In terms of the customizable options, the user can toggle between dictation mode and command mode. In dictation mode, the speech recognition is implemented in the well-known manner of capturing vocalizations, translating those vocalizations into text and writing the result to the currently activated document in Microsoft Word. In order for dictation mode to be effective, the user must select a previously trained profile. A unique profile can be trained through the Windows Speech wizard. All the available speech profiles are provided in a drop-down box on the multimodal add-in tab for the convenience of the user.

In command mode, a grammar is activated which accepts only isolated commands and responds to these in a pre-determined manner. Command mode provides the functions of cursor control, formatting capabilities and certain document handling capabilities. Several different commands are provided which have the same application reaction, thereby contributing to further customization, as the user can determine which command is most desirable for them to use. Simple cursor control is provided through directional commands, while more complex cursor control is provided by allowing line selection and movement of the cursor as though control keys (such as Shift) were being pressed in combination with the verbal command. These types of commands simplify selection of text and provide verbal commands for complex key combinations which are not always known to novice and intermediate users. For example, the word “Bold” causes the activation or deactivation of the bold formatting style. Similarly, the words “Italic” and “Underline” activate or deactivate their formatting styles. Words such as “Cut”, “Copy” and “Paste” allow for text manipulation; their actions are the cutting or copying of the currently selected text and the pasting of the clipboard contents at the position of the cursor. More complex commands for text selection are available, such as “Select line”, which selects the whole line on which the cursor is situated, and “Select word”, which selects the word nearest to the right of the current cursor position. Cursor control is achieved through the commands “Left”, “Right”, “Up” and “Down”. Verbal commands can be issued in sequence to perform relatively complex document manipulation.
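Command mode as described here amounts to a fixed-choice grammar plus a dispatch routine. The sketch below illustrates this, again with the managed System.Speech API standing in for the SAPI 5.1 interop; the class and method names are assumptions, while the Word object-model members used (Selection.Font, Cut, Copy, Paste, HomeKey, EndKey, MoveRight) are real. A recognizer loaded with BuildGrammar() would call Execute(e.Result.Text, ...) from its SpeechRecognized handler.

    using System.Speech.Recognition;                 // stand-in for the SAPI 5.1 interop
    using Word = Microsoft.Office.Interop.Word;

    public class CommandMode
    {
        // A small isolated-command grammar; the real grammar contains many more phrases.
        public static Grammar BuildGrammar()
        {
            Choices commands = new Choices(new string[] {
                "Bold", "Italic", "Underline", "Cut", "Copy", "Paste",
                "Select line", "Select word", "Left", "Right", "Up", "Down" });
            return new Grammar(new GrammarBuilder(commands));
        }

        // Map a recognized phrase onto an action in the active document.
        public static void Execute(string phrase, Word.Application word)
        {
            Word.Selection sel = word.Selection;
            object unit, extend;
            object count = 1;

            switch (phrase)
            {
                case "Bold":   sel.Font.Bold   = (sel.Font.Bold   == 0) ? 1 : 0; break;
                case "Italic": sel.Font.Italic = (sel.Font.Italic == 0) ? 1 : 0; break;
                case "Underline":
                    sel.Font.Underline = (sel.Font.Underline == Word.WdUnderline.wdUnderlineNone)
                        ? Word.WdUnderline.wdUnderlineSingle
                        : Word.WdUnderline.wdUnderlineNone;
                    break;
                case "Cut":   sel.Cut();   break;
                case "Copy":  sel.Copy();  break;
                case "Paste": sel.Paste(); break;
                case "Select line":
                    // Move to the start of the line, then extend the selection to its end.
                    unit = Word.WdUnits.wdLine; extend = Word.WdMovementType.wdMove;
                    sel.HomeKey(ref unit, ref extend);
                    unit = Word.WdUnits.wdLine; extend = Word.WdMovementType.wdExtend;
                    sel.EndKey(ref unit, ref extend);
                    break;
                case "Select word":
                    // Select the word to the right of the cursor.
                    unit = Word.WdUnits.wdWord; extend = Word.WdMovementType.wdExtend;
                    sel.MoveRight(ref unit, ref count, ref extend);
                    break;
                // "Left", "Right", "Up" and "Down" map onto Selection.MoveLeft, MoveRight,
                // MoveUp and MoveDown in the same way.
            }
        }
    }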
Figure 2: Adapted interface of Word 2007 when the onscreen keyboard is activated

Eye-tracking

The eye tracker can be calibrated for use directly through the Microsoft Word interface. This increases the usability of the application, as the user is not required to move between applications to achieve their goal of using gaze as an interaction technique. Since the word processor is the focus of this study, this meets the requirement of the research question scope. The user has the option to activate eye gaze, which can then be used to position the cursor in the document or over an object to be manipulated. Customization is provided by allowing the user to choose the activation method. For use purely as a single interaction method, the choices of dwell time, look and shoot, and blinking are provided.

When dwell time is selected, the user is able to set the interval of the dwell time (see the Sensitivity Setting text box in Figure 1). This provides additional customization, as the user can determine the speed with which they are most comfortable, and leaves the option of adjusting this interval as the user gains more confidence and experience with gaze-based interaction. The interval can be changed at any time during the use of the application. Dwell time requires the user to fixate on a position for the set dwell time interval before a left mouse click is simulated. When selecting the look and shoot method, the user can position the cursor using eye gaze and then press the Enter key to simulate a left mouse click. This has the effect of either placing the cursor at the position of the eye gaze or clicking on the icon directly under the eye gaze of the user. The third option available to the user is that of blinking. In this scenario, the user fixates on the desired object or position and then blinks their eyes to simulate a left mouse click.

Multiple interaction techniques

When the user selects No activation (Figure 1) for eye gaze, this implies that they will instead be using voice commands to respond to their current eye gaze position. In this instance, the speech recognition must also be enabled, and the user can then issue verbal commands to move the cursor to the current gaze position, which is analogous to executing a left mouse click at that position. In this way, it is possible for the user to place the cursor at any position in the document, or to click one of the Microsoft Word icons on the ribbon. The verbal commands “Go”, “Click” or “Select” all simulate a left mouse click at the button closest to the current gaze position. In this way, the user is free to choose the command which they find most suitable.
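The dwell-time and voice-activated click behaviour described in this section can be summarized in a small helper along the following lines. Everything here is an assumption about structure rather than the authors' code: the Tobii SDK delivers gaze data through its own callback types, the dwell interval and fixation radius are configuration values, and the click injection itself (for example via the Win32 SendInput API) is only indicated as a placeholder.

    using System;
    using System.Drawing;

    // Hypothetical activation helper; OnGazeSample stands in for the eye tracker's callback.
    public class GazeActivation
    {
        private readonly TimeSpan dwellInterval;    // the user-configurable Sensitivity Setting
        private readonly int radiusPixels;          // how far gaze may drift and still count as a fixation
        private Point fixationPoint;
        private DateTime fixationBegan;
        private bool fixating;

        public GazeActivation(TimeSpan dwellInterval, int radiusPixels)
        {
            this.dwellInterval = dwellInterval;
            this.radiusPixels = radiusPixels;
        }

        // Called for every gaze sample; implements the dwell-time activation method.
        public void OnGazeSample(Point gaze, DateTime timestamp, bool dwellEnabled)
        {
            if (!fixating || Distance(gaze, fixationPoint) > radiusPixels)
            {
                fixating = true;                    // gaze moved: start timing a new fixation
                fixationPoint = gaze;
                fixationBegan = timestamp;
            }
            else if (dwellEnabled && timestamp - fixationBegan >= dwellInterval)
            {
                SimulateLeftClick(fixationPoint);   // dwell satisfied: click where the user is looking
                fixating = false;                   // require a fresh fixation before the next click
            }
        }

        // Called from the speech handler when "Go", "Click" or "Select" is recognized,
        // i.e. the hands-free combination of gaze position and verbal command.
        public void OnActivationCommand()
        {
            if (fixating) SimulateLeftClick(fixationPoint);
        }

        private static double Distance(Point a, Point b)
        {
            int dx = a.X - b.X, dy = a.Y - b.Y;
            return Math.Sqrt(dx * dx + dy * dy);
        }

        private static void SimulateLeftClick(Point screenPoint)
        {
            // Placeholder: a real implementation would move the cursor and inject a click,
            // for example via the Win32 SendInput API.
        }
    }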
In most instances it is envisioned that the onscreen keyboard will also be activated under these circumstances. When the onscreen keyboard is activated in conjunction with eye gaze, visual feedback is given to the user to indicate which button will be clicked when the verbal command is issued. With each fixation that is detected within the boundaries of the keyboard, the button closest to that fixation is determined to be the target and a shape is displayed in the centre of the button. The user can also select which shape they would like to use for visual feedback. The available shapes are a solid square, a hollow square, a solid disk and a hollow circle. The hollow shapes do not obscure the letter of the key and in so doing provide the necessary visual feedback whilst still allowing the user to see the letter which will be written to the document. Feedback is given only on the keyboard, to minimize interference during normal document browsing. In order to achieve increased stabilization of the feedback within a targeted object, the algorithm suggested by Kumar (2007) was used.

If the user is satisfied that the correct button has been determined, they can then issue any of the verbal commands to simulate a left mouse click. The letter shown on the keyboard is then written to the document at the current cursor position.

4 Where to next?

As previously mentioned, the research study is still in the preliminary stages of an empirical study. An application has been developed to investigate the effect of multimodal interaction techniques on the usability of a mainstream word processor application. Further enhancements to the application will include the expansion of the keyboards to include numerical keys, and the magnification will be refined to respond to eye gaze and voice commands. More voice commands will be provided, particularly for commands that currently have shortcut keys assigned to them, such as Save, and for displaying certain dialog boxes.

Additionally, a back-end will be written for the application which will capture certain measurements that can be used for usability analysis. Measurements such as the number of errors made during a task, the number of actions required and the percentage of the task completed correctly will automatically be saved to a database for further analysis.
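A back-end of this kind could be as simple as a parameterised insert per completed task, sketched below. The table name, column names and connection string are placeholders, since the paper only states that the measurements will be saved to a database.

    using System.Data.SqlClient;

    // Hypothetical logging back-end; schema and names are placeholders.
    public class UsabilityLogger
    {
        private readonly string connectionString;

        public UsabilityLogger(string connectionString)
        {
            this.connectionString = connectionString;
        }

        public void LogTask(int participantId, int taskId, int errorCount,
                            int actionCount, double percentCompleted)
        {
            using (SqlConnection conn = new SqlConnection(connectionString))
            using (SqlCommand cmd = new SqlCommand(
                "INSERT INTO TaskMeasurements " +
                "(ParticipantId, TaskId, Errors, Actions, PercentCompleted) " +
                "VALUES (@p, @t, @e, @a, @c)", conn))
            {
                cmd.Parameters.AddWithValue("@p", participantId);
                cmd.Parameters.AddWithValue("@t", taskId);
                cmd.Parameters.AddWithValue("@e", errorCount);
                cmd.Parameters.AddWithValue("@a", actionCount);
                cmd.Parameters.AddWithValue("@c", percentCompleted);
                conn.Open();
                cmd.ExecuteNonQuery();
            }
        }
    }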
Once the application has been completed, user testing will commence. Both disabled and non-disabled users of a local university will be approached to participate in the study. A longitudinal study will be conducted whereby the participants will be required to spend periods interacting with the system. After each exposure to the system, users will be required to complete a number of tasks for which measurements will be captured. In this way, the learnability of the application can be measured over a period of time by comparing the results of these sessions to determine if user performance increases in correlation with user exposure to the application. Since it is expected that there will be a learning curve associated with the application, it is deemed more applicable to capture usability measurements over a period of time rather than only after a single session with the application.

In order to determine whether the application succeeds in providing for disabled users whilst simultaneously providing a better user experience for mainstream users, it is imperative that users from both these demographics be included in the sample.

Furthermore, to further investigate the usability of the newly developed application, user efficiency and effectiveness can be measured in a within-subjects experiment by requiring users to complete identical tasks in both the commercial Microsoft Word and the new multimodal Microsoft Word.

Moreover, the usability of the various interaction techniques will also be analyzed to determine which combination of the interaction techniques provides the most usable interface, if any. User satisfaction will be measured by means of a questionnaire in order to gauge user reaction, over both short-term and long-term exposure periods.

5 Summary

A multimodal interface was developed for Microsoft Word in order to eventually determine whether the usability of this application can be enhanced for mainstream users whilst simultaneously providing an adaptable and usable interface for disabled users. For these purposes, eye tracking and speech recognition capabilities were built into the Word interface. These interaction techniques can be used in isolation or in combination, and the way in which they are used can be customized in a number of ways. Once the development has been completed and measurements can be captured automatically in the background during user interaction, a longitudinal usability study will be undertaken. Both disabled and able-bodied users will be included in the sample and will be required to complete a number of practice sessions with the application over a prolonged period of time. After each session, participants will be required to complete a number of tasks, during which measurements will be captured for further analysis. In this way, it will be possible to determine whether users are able to improve their performance on the system over an extended period, in other words, whether the system is usable. Additionally, user performance on the new application and the commercially available application will be compared to determine whether users can achieve comparable performance on both systems. In this way, it will be possible to determine whether a popular commercial application can be fully extended into a worthwhile multimodal application which caters for a diverse group of users comprising both disabled and able-bodied users.

References

ANDERSON, T. (2009). Pro Office 2007 Development with VSTO. Apress, United States of America.

HATFIELD, F. AND JENKINS, E.A. (1997). An interface integrating eye gaze and voice recognition for hands-free computer access. In Proceedings of the CSUN 1997 Conference.

JACOB, R.J. (1995). Eye tracking in advanced interface design. In Virtual Environments and Advanced Interface Design, W. Barfield and T.A. Furness, Eds. Oxford University Press, New York, NY, 258-288.

KUMAR, M. (2007). Gaze-enhanced User Interface Design. PhD Thesis, Stanford University.