3. Measuring the User Experience
• The next slides are based on the core textbook
for this module, "Measuring the User
Experience"
4. Study goals
• There are two main types of usability
evaluations: formative and summative
5. Formative usability evaluations
• The goal is to improve the design before
the release of a project
• Formative usability evaluations allow you to
identify usability issues that:
– Prevent users from accomplishing their goals
– Result in inefficiencies or user errors
• Formative evaluations should run when
there is an opportunity to impact
design
6. Summative usability evaluations
• The goal is to evaluate whether an interface
meets its objectives
• Summative evaluations allow you to:
– Identify the overall usability of an interface
– Compare the interface against the
competition
– Identify whether the interface meets the
original requirements
8. What should you measure?
• Last week we discussed how to analyse
data
• What metrics should we choose? (i.e. what
data should we collect to evaluate
usability?)
– Keep in mind that different types of studies will
use different metrics
9. Usability metrics
• Usability metrics include:
– Performance metrics
– Issue metrics
– Self-reported metrics
– Behavioural and physiological metrics
• This is an introduction – we are going to
see them in detail in the next weeks
10. Performance metrics
• Performance metrics include:
– Task success
– Time on task
– Errors
– Efficiency
– Learnability (effort required for maximum
efficiency)
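As an illustration, here is a minimal sketch of how such performance metrics might be tabulated from raw session data. The participant records, task name and figures are invented; only the calculations matter.

```python
# Minimal sketch (illustrative data): computing task success, time on task
# and error counts from hypothetical per-participant results.
from statistics import mean

# Each record: (participant, task, completed_successfully, seconds, error_count)
results = [
    ("P1", "checkout", True, 94.0, 1),
    ("P2", "checkout", True, 61.5, 0),
    ("P3", "checkout", False, 180.0, 4),
    ("P4", "checkout", True, 72.3, 2),
]

successes = [r for r in results if r[2]]
task_success_rate = len(successes) / len(results)      # task success
mean_time_on_task = mean(r[3] for r in successes)      # time on task (successful attempts only)
mean_errors = mean(r[4] for r in results)              # errors per attempt

print(f"Task success: {task_success_rate:.0%}")
print(f"Mean time on task: {mean_time_on_task:.1f} s")
print(f"Mean errors per attempt: {mean_errors:.1f}")
```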
11. Issue-based metrics
• Issue-based metrics relate to usability
issues that have been identified
• They include:
– Frequency of unique issues (e.g. in iterative
design or comparing with competitors)
– Frequency of issues per participant
– Issues by category
– Issues by task
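A minimal sketch of how these tallies could be produced from an issue log; the issue IDs, categories, tasks and participants below are invented for illustration.

```python
# Minimal sketch (illustrative data): issue-based metrics from a hypothetical issue log.
from collections import Counter

# Each record: (participant, task, issue_id, category)
issue_log = [
    ("P1", "search", "ISS-01", "navigation"),
    ("P2", "search", "ISS-01", "navigation"),
    ("P2", "checkout", "ISS-02", "terminology"),
    ("P3", "checkout", "ISS-03", "feedback"),
    ("P3", "search", "ISS-01", "navigation"),
]

unique_issues = {rec[2] for rec in issue_log}                   # frequency of unique issues
issues_per_participant = Counter(rec[0] for rec in issue_log)   # issues per participant
issues_by_category = Counter(rec[3] for rec in issue_log)       # issues by category
issues_by_task = Counter(rec[1] for rec in issue_log)           # issues by task

print(f"{len(unique_issues)} unique issues found")
print("Per participant:", dict(issues_per_participant))
print("By category:", dict(issues_by_category))
print("By task:", dict(issues_by_task))
```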
12. Self-reported metrics
• Self-reported metrics relate to the user's
perception of the interaction
• They include:
– Rating scales (e.g. Likert scales)
– After-Scenario Questionnaires
• Assessing specific attributes, such as visual appeal,
perceived efficiency, usefulness, enjoyment and
ease of navigation
• Open-ended questions
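A minimal sketch of how post-task rating-scale answers (e.g. 1–7 items from an After-Scenario-style questionnaire) might be summarised; the item names and scores are invented.

```python
# Minimal sketch (illustrative data): summarising self-reported rating scales.
from statistics import mean, stdev

ratings = {
    "perceived ease":       [6, 5, 7, 4, 6],
    "perceived efficiency": [5, 5, 6, 3, 5],
    "visual appeal":        [7, 6, 6, 5, 7],
}

for item, scores in ratings.items():
    print(f"{item}: mean={mean(scores):.1f}, sd={stdev(scores):.1f} (n={len(scores)})")
```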
13. Behavioural and physiological
metrics
• These metrics relate to emotions such as
stress, excitement or frustration that users
may experience while interacting with an
interface
• They include:
– Eye tracking
– Measuring stress, emotion and other
physiological responses
15. Usability studies
• Each usability study is different – and will
select a different combination of metrics
• In the next slides we are going to discuss
some examples of usability studies
– Make sure that you complement the slides
with the textbook!
16. Completing a transaction
• Should identify:
– Task success
– The most common points of failure
– Whether the interface meets the expectations
of the users
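A minimal sketch of one way to locate the most common points of failure in a transaction: count the last step each unsuccessful participant reached. The step names and data are invented.

```python
# Minimal sketch (illustrative data): task success and failure points in a transaction.
from collections import Counter

STEPS = ["browse", "add to basket", "delivery details", "payment", "confirmation"]
# Last step each participant reached (reaching "confirmation" counts as success)
furthest_step = ["payment", "confirmation", "delivery details",
                 "payment", "confirmation", "add to basket"]

success_rate = furthest_step.count("confirmation") / len(furthest_step)
failure_points = Counter(s for s in furthest_step if s != "confirmation")

print(f"Task success: {success_rate:.0%}")
print("Most common failure points:", failure_points.most_common())
```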
17. Comparing products
• The metrics selected depend on the
aims of the product you are comparing
against the baseline
• For example, your product might aim to
increase efficiency (do a better job) or
improve satisfaction
18. Frequent use of the same
product
• Efficiency metrics (such as task time or steps
required)
• Learnability metrics
19. Evaluating navigation and/or
information architecture
• This is commonly used as part of the
analysis of web sites
• It commonly includes efficiency metrics (such as
task time, steps required or number of
errors)
20. Problem discovery
• This is normally done on a product that is
already built but has not gone through
usability evaluation – or requires further
work
• Participants would often generate their
own tasks
21. Maximizing usability for a critical
product
• Critical products are required as part of
very important tasks – e.g. a defibrillator, a
voting machine or an emergency exit
– User performance should be measured
against a target goal
– Task success and number of errors should
be recorded and analysed
22. Comparing alternative designs
• One of the most common usability
evaluation scenarios
• Typically runs early in the development
process
• It commonly includes:
– Issue-based metrics
– Performance metrics
– Satisfaction
25. Types of evaluation
• Three main categories of evaluation methods
(Sharp, Rogers and Preece, 2006):
– Controlled settings involving users, e.g. usability
testing & experiments in laboratories and living labs.
– Natural settings involving users, e.g. field studies to
see how the product is used in the real world.
– Any settings not involving users, e.g. consultants'
critiques (usability inspections) and analytical
evaluations
26. Types of evaluation (2)
• Different authors classify usability
evaluation methods in different ways
– In 1996, James Hom popularised the use of
the categories testing, inquiry and
inspection through his (at the time) popular
web site, http://usability.jameshom.com/
– The categories are also used in
usabilityhome.com and by many subsequent
works
27. Testing, inspections and inquiry
• Testing: representative users work on typical
tasks with an interface
• Inspection: usability specialists examine an
interface
• Inquiry: usability evaluators obtain
information about users' likes, dislikes, needs,
and understanding of the system by talking to
them, letting them answer questions or
observing them using the system in real work
28. Usability study
• In practice, a usability study will include a
set of methods, collecting different metrics,
used in a complementary way
• Each method may be used in one or more
settings (we’ll cover them in the next
slides)
30. Settings
• The next slides will discuss the different
types of settings (and we will be using
Sharp et al.'s categories):
– Controlled settings
– Natural settings
– Without users
31. Controlled settings involving
users
• User activities are controlled (typically in
labs) in order to evaluate an artefact by:
– Testing hypotheses
– Measuring or observing certain behaviours
32. Natural settings involving users
• Natural setting methods focus (to different
degrees) on analysing an artefact as used
in its natural environment
• The focus is on observation
– There is little or no control over users'
activities, to try to replicate how the artefact
would be used in the real world
– Used to obtain information about users' likes,
dislikes, needs, and understanding of the
system
33. Any settings not involving users
• This category includes all other methods,
which do not require direct user involvement
• It typically includes consultants and
researchers analysing and modelling
aspects of the interaction with an
interface in order to identify usability
problems
– E.g. usability inspections, heuristics,
walkthroughs, models and analytical
evaluations
35. Usability testing
• Usability testing methods are used to evaluate
an artefact by testing it on users
• Avoid confusion: Sharp, Rogers and Preece use
the term “controlled settings involving users”.
Nielsen uses "usability testing" – the two terms
cover very similar ground
– The next slides are based on Usability Engineering
(Nielsen, 1994) and so use Nielsen’s terminology
– Tullis and Albert focus on usability testing
36. Usability testing (2)
• Usability testing generally involves
measuring how well test subjects interact
with an artefact in terms of:
– Effectiveness (whether the artefact can be used
for specific tasks, and how well)
– Efficiency (the effort necessary)
– Satisfaction (the user's emotional response to
the interface)
37. Usability testing (3)
• Involves recording performance of typical users
doing typical tasks
– Controlled settings
– Users are observed and timed
– Data is recorded on video & key presses are logged.
– The data is used to calculate performance times, and
to identify & explain errors.
• User satisfaction is evaluated using
questionnaires & interviews.
38. Avoid confusion!
• The term usability testing is often used to
refer to any technique used to evaluate a
product or system
– In the next slides we use the term usability
testing to refer to a process that employs
people as testing participants who are
representative of the target audience to
evaluate the degree to which a product meets
specific usability criteria
39. Experimental design
• Employing an experimental design for usability testing
(Rubin and Chisnell, 2006) would require:
1. A hypothesis to be formulated
2. Randomly selected participants (who must represent the
characteristics of the target population)
3. Tight controls must be employed (all participants should have
a nearly identical experience)
4. Control groups (whose treatment should vary only in the single
variable being tested)
5. A sample (of users) of sufficient size to measure statistically
significant differences between groups.
40. Experimental design (2)
• Predicts the relationship between two or
more variables.
• The independent variable is manipulated by
the researcher
– The dependent variable depends on the
independent variable
– Typical experimental designs have one or two
independent variables
• Validated statistically & replicable
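A minimal sketch of what statistical validation of such a design might look like: one independent variable (design A vs design B, between subjects) and one dependent variable (task time), compared with an independent-samples t-test. It assumes SciPy is available and the task times are invented.

```python
# Minimal sketch (illustrative data): comparing two conditions statistically.
from scipy import stats

times_design_a = [52.1, 60.4, 48.9, 71.0, 55.3, 63.2]   # participants who saw design A
times_design_b = [44.7, 50.2, 41.8, 58.6, 47.5, 49.9]   # participants who saw design B

t_stat, p_value = stats.ttest_ind(times_design_a, times_design_b)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
if p_value < 0.05:
    print("Difference unlikely to be due to chance at the 5% level")
```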
41. Experimental design (3)
• An experimental design is often unrealistic and
inappropriate:
– The aim is typically not to conduct research, but rather to
make informed decisions on how to improve an artefact
– It is often very difficult to apply the principle of randomly
assigning participants
– The classical methodology is designed to obtain quantitative
proof of research hypotheses that one design is better than
another
• The next slides are also based on Rubin and Chisnell's
work
42. Usability testing & experiments
Usability testing:
• Improve products
• Few participants (typically)
• Results inform design
• Conditions controlled as much as possible
• Procedure planned
• Results reported to developers
Experiments for research:
• Discover knowledge
• Many participants
• Results validated statistically
• Strongly controlled conditions
• Experimental design
• Scientific report to scientific community
43. Usability testing methodology
• Development of research questions or test objectives rather than
hypotheses
• Use of a representative sample of end users who may or may not
be randomly chosen
• Representation of the actual work environment
• Observation of end users who either use or review a
representation of the product
• Controlled and sometimes extensive interviewing and probing by
the test moderator.
• Collection of quantitative and qualitative data
• Recommendation of improvements to the design
44. How to start? With a test plan
• The test plan is the foundation for the
entire test.
– It is a document that addresses the how,
when, where, who, why, and what of your
usability test
45. Test plan
• Purpose and goals of the test
• Research questions
• Participant characteristics
• Method (test design)
• Task list
• Test environment, equipment, and logistics
• Test moderator role
• Data to be collected and evaluation measures
• Report contents and presentation
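One lightweight way to keep these sections explicit while planning is to represent the plan as a simple data structure, as in the minimal sketch below. The section names mirror the list above; the field values are invented examples.

```python
# Minimal sketch (illustrative values): a test plan with the sections listed above.
from dataclasses import dataclass

@dataclass
class TestPlan:
    purpose_and_goals: str
    research_questions: list[str]
    participant_characteristics: str
    method: str
    task_list: list[str]
    environment_equipment_logistics: str
    moderator_role: str
    data_and_measures: list[str]
    report_contents: str

plan = TestPlan(
    purpose_and_goals="Find out why checkout abandonment has risen",
    research_questions=["How easily do users complete a purchase?"],
    participant_characteristics="6-8 existing customers, mixed experience",
    method="Moderated usability test, thinking aloud",
    task_list=["Find a product", "Complete a purchase"],
    environment_equipment_logistics="Lab with screen and audio recording",
    moderator_role="Brief, observe, prompt only when stuck",
    data_and_measures=["Task success", "Time on task", "Errors", "Ratings"],
    report_contents="Findings, severity ratings, recommendations",
)
```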
46. Purpose and Goals of the Test
• Used to describe at a high level the
reasons for performing the test
– Is the test attempting to resolve problems that
have been reported?
– Is there a new policy?
– Are visitors not completing transactions on an
e-commerce site?
• Testing must be tied to business goals!
47. Research questions
• This section describes the issues and
questions that need to be resolved and
focuses the research
– Avoid vague research questions! e.g. "Is the
web site X usable?"
48. Research questions (examples)
• For web sites:
– How easily do users understand what is clickable?
– How easily and successfully do users find the
products or information they are looking for?
– How easily and successfully do users register for the
site?
– Where in the site do users go to find Search? Why?
– How easily can users return to the home page?
49. Research questions
(examples)(2)
• General research questions:
– What are the major usability problems that
prevent users from completing the most
common tasks?
– Is usability better or worse than in the
previous release?
– Is usability better or worse than in the designs
from competitors?
51. Method
• This section describes the method that
you are going to use in your usability test
– You can use one or more methods
• We will cover usability testing methods in
the next slides
– and we will cover inquiry and inspection in the
next weeks
52. Method (2)
• It should provide an overview of each
aspect of the test from the time the
participants arrive until the time they leave
• A typical test consists of:
– Testing several different users
– Having them perform a series of
representative tasks on/with your artefact
53. Task list
• The task list comprises those tasks that the
participants will perform during the test.
• The list should consist of tasks that will ordinarily
be performed on/with the artefact
• Include success criteria: when is a task completed
successfully?
54. Test Environment, Equipment,
and Logistics
• This section describes the environment
you will attempt to simulate during the test
and the equipment that will be required
– e.g. a student bedroom or a busy office
56. Moderator role
• This section describes what the test
moderator will be doing
– This will depend on the usability testing
method used!
57. Data collected
• This section includes an overview of the
types of data you will collect and the
metrics used to describe it
58. Data collected (examples)
• Number and percentage of tasks completed
correctly with and without prompts or assistance
• Number and type of prompts given
• Number and percentage of tasks completed
incorrectly
• Count of all errors
• Count of errors of omission
• Count of incorrect menu choices
• Count of incorrect icons selected
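A minimal sketch of how several of the counts listed above could be derived from logged session results; the field names and records are invented for illustration.

```python
# Minimal sketch (illustrative data): counts and percentages from a hypothetical task log.
task_log = [
    # (participant, task, completed, prompts_given, errors)
    ("P1", "register", True,  0, 0),
    ("P2", "register", True,  2, 1),
    ("P3", "register", False, 1, 3),
    ("P4", "register", True,  0, 2),
]

n = len(task_log)
unassisted = sum(1 for r in task_log if r[2] and r[3] == 0)   # completed without prompts
assisted   = sum(1 for r in task_log if r[2] and r[3] > 0)    # completed with prompts
failed     = sum(1 for r in task_log if not r[2])             # completed incorrectly
prompts    = sum(r[3] for r in task_log)                      # prompts given
errors     = sum(r[4] for r in task_log)                      # all errors

print(f"Completed without prompts: {unassisted}/{n} ({unassisted / n:.0%})")
print(f"Completed with prompts:    {assisted}/{n} ({assisted / n:.0%})")
print(f"Completed incorrectly:     {failed}/{n} ({failed / n:.0%})")
print(f"Total prompts: {prompts}, total errors: {errors}")
```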
59. A sample?
• You can download a sample test plan from
here: http://bit.ly/testplan2014
60. Reliability and validity
• There are several methodological pitfalls
in usability testing (Nielsen 1994)
– Reliability is the question of whether one
would get the same result if the test were to
be repeated
– Validity is the question of whether the result
actually reflects the usability issues one wants
to test.
61. Reliability
• There are huge individual differences between
test users.
– It is not uncommon to find that the best user is
10 times as fast as the slowest user, and the
best 25% of the users are normally about
twice as fast as the slowest 25% of the users
(Egan, 1988, in Nielsen 1994)
– Observing that User A using Interface X could
perform a certain task 40% faster than User B
using Interface Y might not mean much
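A minimal sketch of checking this kind of spread in your own data, comparing the fastest and slowest users and quartiles of task times; the times below are invented.

```python
# Minimal sketch (illustrative data): spread of individual task times.
from statistics import mean, quantiles

task_times = [38, 45, 52, 60, 66, 74, 90, 110, 150, 210]   # seconds, one per user

q1, q2, q3 = quantiles(task_times, n=4)                     # quartile boundaries
fastest_quartile = [t for t in task_times if t <= q1]
slowest_quartile = [t for t in task_times if t >= q3]

print(f"Fastest vs slowest user: {max(task_times) / min(task_times):.1f}x")
print(f"Slowest 25% vs fastest 25% (means): "
      f"{mean(slowest_quartile) / mean(fastest_quartile):.1f}x")
```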
62. Validity
• Typical validity problems involve:
– Using the wrong users
– Giving them the wrong tasks
– Not considering time constraints and social influences
• For example, a management information system
might have different results when tested with
students compared to when it's tested with
experienced users of similar systems
64. Usability testing methods
• The next slides cover some of the
methods that are used in usability testing:
– Thinking aloud (asking users)
– Co-discovery (asking users)
– Question asking (asking users)
– Performance measurement (testing)
– Activity recording (testing)
– Remote testing (testing)
65. Usability testing methods (2)
• Some of those methods can be adapted
and used for other types of usability
evaluations
– E.g. Thinking aloud could be adapted to be
used in a usability inquiry too
• (Hopefully) some of the following methods
will already be familiar to you
66. Methods and techniques
• Each usability evaluation method will use one or more
techniques to collect data (e.g. "Asking users",
"Observing users", "Questionnaires")
– Some techniques may be compound (the composition of more
basic techniques)
– Researchers may disagree on the classification of techniques
and methods (e.g. when it comes to surveys and questionnaires
– are they methods or techniques?)
– Some researchers do not differentiate between method and
technique
– You need to familiarise yourself with the subject specific
terminology and its variations
67. Thinking aloud
• Thinking aloud consists of an interaction
(scenario) during which the participants
are requested to perform several tasks
and to freely talk and express their
thoughts, feelings and opinions
68. Co-discovery
• A variation of the thinking aloud method
with two users interacting co-operatively
– Aims to reflect real-life situations in which
users can ask for help from other people
69. Question asking
• Another variation on the Thinking aloud
method, in which the evaluator asks the
user questions while s/he is performing
tasks with the artefact under analysis
70. Performance measurement
• Performance measurement methods
consist of users interacting with an artefact
while trying to achieve quantifiable
objectives
– GOMS (Goals, Operators, Methods, and
Selection rules) is a popular performance
measurement method
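As a flavour of this kind of prediction, the sketch below uses the Keystroke-Level Model, a simplified member of the GOMS family, to estimate expert task time. The operator times are the commonly cited approximations and the task sequence is invented for illustration.

```python
# Minimal sketch (illustrative task): Keystroke-Level Model time estimate.
KLM_SECONDS = {
    "K": 0.2,    # keystroke or button press
    "P": 1.1,    # point with the mouse at a target
    "H": 0.4,    # move hands between keyboard and mouse
    "M": 1.35,   # mental preparation
}

# Hypothetical task: think, point at a field, click, think, type a 5-character code, press Enter
sequence = ["M", "P", "K", "M"] + ["K"] * 5 + ["K"]

estimated_time = sum(KLM_SECONDS[op] for op in sequence)
print(f"Estimated expert time: {estimated_time:.1f} s")
```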
71. Activity recording
• Activity recording is based on recording
user behaviour during a usability test for
later analysis
– Activity recording is commonly used as an
add-on to other methods
72. Remote testing
• Remote testing is used to remotely
evaluate an artefact, by gathering
quantitative (and in some cases
qualitative) data about the user’s
behaviour while performing tasks in a
scenario
– It is typically used for software interfaces
74. What ISN'T usability testing?
• Techniques that do not require
representative users (e.g. expert
evaluations, use of heuristics,
walkthroughs) as part of the process are not
usability testing
75. Bruce Tognazzini: why you need to
evaluate
“Iterative design, with its repeating cycle of
design and testing, is the only validated
methodology in existence that will
consistently produce successful results. If
you don’t have user-testing as an integral part of
your design process you are going to throw
buckets of money down the drain.”
(http://www.asktog.com/columns/037TestOrElse.html)
76. Bibliography and suggested
readings
• Apart from your core textbook, suggested
readings for this week include:
– Dix et al. (2003) Human Computer Interaction.
(Chapter 9)
– Nielsen, J. (1994) Usability Engineering
– Rubin, J. and Chisnell, D. (2010) Handbook of
Usability Testing
– Goodman, E., Kuniavsky, M. and Moed, A. (2012)
Observing the User Experience.
– Hom, J. (1998) The Usability Methods Toolbox
[online]. Available from http://usability.jameshom.com/
(Accessed: 24 March 2014)