This lecture covers the basics of user experiment design in human-computer interaction. Computer scientists and developers often create interfaces for a particular purpose. This lecture explains how a user experiment can be designed and conducted to systematically compare one interface with another.
User Experiments in Human-Computer Interaction
1. LECTURE 5:
USER EXPERIMENTS IN HCI
COMP 4026 - Advanced HCI
Semester 5 - 2017
Arindam Dey
University of South Australia
2. OVERVIEW
• Why do we need user experiments?
• How to design a user experiment?
• Activity
• How to run a user experiment?
• Ethical considerations
4. You (designer / developer) ≠ User
Because you
• know your system well
• have special skills
• know what you are measuring
5. Who should your users be in the study?
Sample must be a true representation of the population
[Diagram: participants in your study are drawn from everyone who may use your product]
6. What do users do and say?
To what extent do they do it?
Why do they do it, and how do we fix it?
courtesy: uxdesign.cc
7. Categories of usability tests based on goals
• Formative
- Beginning of and during the product development phase
- Usability problems and fixes
• Summative
- Towards the end of the development phase
- Statistically measured usability
8. Categories of usability tests based on data collected
• Qualitative
- Descriptions (verbal or behavioral)
- Directly measured
- Takes more effort to analyze
- Mostly earlier in the design phase
• Quantitative
- Measurements (numbers)
- Indirectly measured
- Later in the design phase
10. User Experiments
• A method of academic research in HCI
- To discover/test/prove new knowledge
• Hypothesis driven
- Compares multiple conditions to discover causal relationships
• Replicable (generalizable)
- Strives to remove bias and error (random assignment)
• Draws conclusions with statistical tests of the hypothesis
11. Usability Testing vs. User Experiments
• The methods can be the same
• The goals are often different
• Usability testing goals
- Identify usability problems and issues of a product
• User experiment goals
- Answer research questions, discover new knowledge (generalizable results)
12. Usability Testing vs. User Experiments
Usability Testing                              | User Experiment
-----------------------------------------------+-------------------------------------------
Improve products                               | Discover knowledge
Few participants                               | Many participants
Results inform design                          | Results validated statistically
Usually not completely replicable              | Must be replicable
(case-specific results)                        | (generalizable results)
Condition(s) controlled as much as possible    | Strongly controlled conditions
Procedure planned                              | Experimental design
Results reported to product designer/developer | Scientific report to scientific community
14. Hypothesis
• A prediction of the outcome
- Based on the research question, but narrower
- A research question can be tested via multiple hypotheses
- Causal relationship between IV and DV
- A precise statement that can be directly tested through an experiment
e.g. Condition A will be faster than Condition B
15. Hypothesis
• Null hypothesis (H0)
- Predicts there is no effect of the IV on the DV
- Statistical tests reject or fail to reject the null hypothesis
• Alternative hypothesis (HA)
- Predicts there is an effect of the IV on the DV
• H0 and HA are mutually exclusive
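In practice, the decision between H0 and HA comes from a statistical test on the measured DV. A minimal sketch in Python (the completion times are made-up illustrative numbers; SciPy is assumed to be available):

```python
from scipy import stats

# Made-up task completion times (seconds) for two conditions
cond_a = [11.2, 12.5, 10.8, 13.1, 11.9, 12.2, 10.5, 12.8]
cond_b = [14.0, 13.2, 15.1, 14.6, 13.8, 15.4, 14.9, 13.5]

# Independent-samples t-test (suits a between-subjects design)
t_stat, p_value = stats.ttest_ind(cond_a, cond_b)

# Reject H0 ("no effect of IV on DV") when p falls below alpha
alpha = 0.05
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(decision, round(p_value, 4))
```

For a within-subjects design, the paired test `stats.ttest_rel` would be used instead, since the same participants produce both samples.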
17. Experimental Task
• A task that participants will do in a study under different conditions
e.g. in Fitts's law studies participants click on buttons using different input devices
• Must be suitable to the application
- depends on the research question
• Ideally risk-free
18. Independent Variables (IV)
• Variables that are independent of participants' behaviour
• Systematically manipulated by the experimenter
• Variables that the experimenter is interested in
• There can be one or more IVs in an experiment
19. Typical Independent Variables
• Technology (controlled)
- Types of technology, device, interface, design
• User
- Physical/mental/social status
- Age, gender, computer experience, professional domain, education, culture, motivation, mood, and disabilities
• Context of use
- Environmental status (physical/social)
- Lighting, noise, indoor/outdoor, public/private
21. Dependent Variables (DV)
• The outcome or effect that the researchers are interested in
• Dependent on participants' behavior or the changes in the IVs
• Usually the outcomes that the researchers need to measure
- measurements or observations
22. Dependent Variables (DV)
• Subjective
- Based on users' opinions, interpretations, points of view, emotions and judgment
- More vulnerable to context and users' status
- e.g. questionnaires, NASA TLX
• Objective
- Not influenced by personal feeling/opinion
- Based on observation, compared against a standardized scale
- More consistent
- e.g. time, error
23. Typical Dependent Variables
• Efficiency
- e.g. task completion time, speed
• Accuracy
- e.g. error rate, success rate
• Subjective satisfaction
- e.g. Likert scale ratings
• Ease of learning
- e.g. test score, learning curve, retention rate
• Physical or cognitive demand
- e.g. NASA Task Load Index (TLX)
24. Other Variables
• Controlled Variables
- Set so they do not change during an experiment
- The more controlled, the higher the internal validity, but the less generalizable the results
• Random Variables
- The more influence random variables have, the lower the internal validity
• Confounding Variables
- Variables that the researchers failed to control
- Damage internal validity
25. Validity of User Experiments
• Internal Validity
- The approximate truth of inferences regarding cause-effect or causal relationships
- Not relevant for observational studies
- Higher under strictly controlled lab conditions
• External Validity
- The extent to which the conclusions of the experiment are generalizable
- Three types: population, environmental, and temporal
26. Experimental Designs
• Within-subjects
- Each subject performs under all the different conditions
- Repeated measures
• Between-subjects
- Each subject is assigned to one experimental condition
- Independent samples
- Matched groups
• Mixed-factorial
- Combination of the two
- More than one IV needed
28. Within-Subjects vs. Between-Subjects
Within-subjects                                 | Between-subjects
------------------------------------------------+------------------------------------------------
Subject to interference effects                 | Avoids interference effects
(e.g. practice / learning effects)              | (e.g. practice / learning effects)
Longer time for each participant                | Shorter time for each participant
(larger impact of fatigue and frustration)      | (less fatigue and frustration)
Individual differences can be isolated          | Impact of individual differences
Easier to detect differences between conditions | Harder to detect differences between conditions
Requires smaller sample size                    | Requires larger sample size
Counterbalance/randomize the order of conditions| Randomized assignment to conditions or matched groups
29. Randomization
• A critical condition of a true experiment
• The random assignment of treatments to the experimental units or participants
• No one, including the experimenters, can control the assignments
• The main way to minimize the effects of random variables
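The assignment step can be automated so that no one, including the experimenter, chooses who gets which condition. A small sketch using only the Python standard library (`assign_conditions` is a hypothetical helper name):

```python
import random

def assign_conditions(participant_ids, conditions, seed=None):
    """Randomly assign each participant to one condition
    (between-subjects), keeping group sizes as equal as possible."""
    rng = random.Random(seed)
    # Repeat the condition list across participants so groups stay balanced
    pool = [conditions[i % len(conditions)] for i in range(len(participant_ids))]
    rng.shuffle(pool)
    return dict(zip(participant_ids, pool))

# 20 participants, two conditions -> 10 per group, in random order
assignment = assign_conditions(list(range(1, 21)), ["A", "B"], seed=7)
```

Fixing the seed makes the assignment reproducible for the write-up while still being outside anyone's control during recruitment.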
30. Counterbalancing
• All possible permutations
- 3 conditions => 3P3 = 6 permutations
- (1,2,3), (1,3,2), (2,1,3), (2,3,1), (3,1,2), (3,2,1)
- 4 conditions => 4P4 = 24 permutations
- (1,2,3,4), (1,2,4,3), (1,3,2,4), (1,3,4,2), …
• The number of participants must be a multiple of the number of permutations
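The full set of presentation orders is easy to enumerate, which also verifies the counts above (Python standard library):

```python
from itertools import permutations

# Every possible presentation order for 3 and 4 conditions
orders_3 = list(permutations((1, 2, 3)))
orders_4 = list(permutations((1, 2, 3, 4)))

print(len(orders_3))  # 3P3 = 6
print(len(orders_4))  # 4P4 = 24
print(orders_3)       # (1,2,3), (1,3,2), (2,1,3), (2,3,1), (3,1,2), (3,2,1)
```

This grows factorially, which is why full counterbalancing quickly becomes impractical and Latin square designs are used instead.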
31. Balanced Latin Square
• Latin Square
- Each item occurs once in each row and column
• Balanced Latin Square
- Each item both precedes and follows each other item an equal number of times
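One common construction, assuming an even number of conditions: build a first row that alternates between the low and high ends of the condition list, then shift it by one for each subsequent row. A sketch under that assumption (`balanced_latin_square` is a hypothetical helper name):

```python
def balanced_latin_square(n):
    """Return an n x n balanced Latin square of 0-indexed conditions.

    For even n: each condition appears once per row and column, and
    each condition immediately precedes every other condition exactly
    once across the rows."""
    if n % 2 != 0:
        raise ValueError("this construction requires an even number of conditions")
    # First row alternates from the low and high ends: 0, n-1, 1, n-2, ...
    first, lo, hi = [], 0, n - 1
    for k in range(n):
        if k % 2 == 0:
            first.append(lo)
            lo += 1
        else:
            first.append(hi)
            hi -= 1
    # Each later row shifts every entry by the row index (mod n)
    return [[(c + r) % n for c in first] for r in range(n)]

square = balanced_latin_square(4)
for row in square:
    print(row)  # one presentation order per group of participants
```

For an odd number of conditions, a single square cannot balance precedence; the usual fix is to run the square plus its mirror image.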
32. Participants / Subjects
• The sample in your experiment
• Number of participants
- Between-subjects design: 15-20 per condition
- Within-subjects design: 15-20
- The smaller the effect size, the more participants needed
- The more variance between users, the more participants needed
- The more conditions in the experiment, the more participants needed
33. Power Analysis
• You can calculate the ideal number of participants you have to test
• Parameters needed:
- α: the probability of rejecting H0 given that H0 is true (usually set to 0.05)
- β: where 1−β = power, the probability of observing a difference when one really exists (power usually set to 0.8)
- Effect size: difference of means divided by the standard deviation
• Free program for power analysis: G*Power
http://www.gpower.hhu.de/en.html
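As a rough cross-check of such tools, the per-group sample size for a two-sided, two-sample t-test can be approximated with normal quantiles (SciPy assumed available; G*Power itself uses the exact noncentral t distribution, so its answers can differ slightly):

```python
import math
from scipy.stats import norm

def sample_size_per_group(effect_size, alpha=0.05, power=0.8):
    """Normal-approximation sample size per group for a
    two-sided, two-sample t-test."""
    z_alpha = norm.ppf(1 - alpha / 2)  # critical value for two-sided alpha
    z_beta = norm.ppf(power)           # quantile matching the desired power
    n = 2 * ((z_alpha + z_beta) / effect_size) ** 2
    return math.ceil(n)

# Medium effect (Cohen's d = 0.5), alpha = 0.05, power = 0.8
n_per_group = sample_size_per_group(0.5)
print(n_per_group)  # about 63 per group
```

Note how the required n scales with the inverse square of the effect size: halving the expected effect roughly quadruples the sample you need, which matches the guidance on the previous slide.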
34. Errors
• Random Errors
- Also called "chance errors" or "noise"
- Cause variations in both directions
- Can be controlled by a large sample size + randomization
• Systematic Errors
- Also called "biases"
- Always push the measured value in the same direction
- No matter how large the sample is, they cannot be offset unless the source of error is controlled
36. After Designing the Study
• Write down the design
- hypotheses
- task
- IVs and DVs
- design of the experiment
- participants
- randomization / counterbalancing
- data collection
• Critically review your own design
• Ask others to review your design
37. Activity
Fill out the template with your study design
You have designed a new application to quickly resize photos on mobile phones (Condition A). There are several alternative solutions available in the market; pick any one of them (Condition B). Design a user experiment to compare Condition A and Condition B.
39. Typical Experimental Session (1/2)
• Ensure the apparatus is ready
- Both the system under test and the measurement devices
- Do a test run
- Make sure forms, questionnaires, etc. are printed
• Greet the participants
• Introduce the purpose of the study and the procedures (experimenter script)
• Get the consent of the participants
• Assign the participants to a specific experimental condition according to the pre-defined randomization method
40. Typical Experimental Session (2/2)
• Participants complete the training task
• Participants complete the experimental tasks
• Participants answer questionnaires (if any)
• If within-subjects design
- change conditions and repeat the above
• Debriefing session
- Collect details through an interview
• Compensation (always give some gift)
41. Pilot Study
• A small trial run of the main testing
- Can identify the majority of issues with both the prototype and the experimental design
• Pilot testing checks:
- that the experimental plan is viable
- that you can conduct the procedure
- that your prototype and measurement instruments work appropriately
- the experimental task and environment
• Iron out problems before running the main experiment
• This is not optional
42. As an Experimenter
• Offload your brain!
- Write down instructions and important information
- Prepare checklists
- Print questionnaires and documents in advance
• Take notes, document oddities
- Create templates
• Rehearse procedures
- Do you need assistants?
• Nothing is as bad as lost data - AVOID!!!
- Collect ASAP, back up
44. Consent
• The participant has the right to know
- The experimental procedure
- What kind of data is collected
- The risks involved
- How the data will be stored and presented
• The experimenter must
- Explain the experiment in detail
- Ask the participant to sign a consent form
45. Respect Participants
• They are volunteers and should be allowed to
- Take a rest (between conditions)
- Leave the experiment at any time without giving a reason
- Be given a token of appreciation (gift, money, etc.)
- Take time to get organized (but don't waste their time)
"Do unto others as you would have them do unto you."
- MATTHEW 7:12
46. Privacy
• Never disclose their identifiable data to anyone without written consent
• Data must be stored in secure locations
- Digitally and physically
• Don't use identifiable data, images, or videos in reports or publications
47. Limitations
• No data collection method will be perfect
- control problems
- available technical equipment
• Differences
- Multiple researchers
- Multiple methods
- Multiple measures
- Objective vs. Subjective
- Qualitative vs. Quantitative
48. Limitations
• A single study cannot tell us everything
- Important to make sure it's replicable
• One paper ≠ scientific truth
- Different researchers, different methods, all coming to the same conclusion: that's when you find consensus
• Science is not static
- Theories evolve and change over time