Baobab Health, March 2014. Harry Hochheiser, harryh@pitt.edu
Usability Studies and Empirical Studies
Harry Hochheiser
University of Pittsburgh
Department of Biomedical Informatics
harryh@pitt.edu
+1 412 648 9300
Licensed under Creative Commons Attribution-ShareAlike (CC BY-SA)
Outline
Usability Studies
Think-Aloud
Summative Studies
Empirical Studies
Beyond Inspections
Inspections won't tell you which problems users will face in action
They might not identify mental models and confusions
Usability studies are about finding out where things go wrong.
No bright dividing line in process
Design → Paper Prototype → Fully-functional Prototype → Release
Usability Inspections → Usability Studies → Empirical User Studies, Case Studies, Longitudinal Studies, Acceptance Tests
Low cost, low validity → Higher cost, higher validity
Formative Usability Studies: Goals
• Generally, to understand if the proposed design supports completion of
intended tasks
• Be specific:
• Tasks and users
• Define success
• User Satisfaction?
• Do users like the tool?
• What are the important metrics?
Formative Usability Studies: Tasks
• Representative and specific
• What would users do?
• Realistic – given available time and resources
• Appropriate for assessment of goals
• Possibly some user-defined/suggested
• Particularly if participants were informants in earlier
requirements-gathering
Formative Usability Studies: Which Tasks?
Bad: "Give this a try"
Better: "Try to send an email, find a contact, and file a response"
Still better: a detailed scenario with multiple actions that requires coordinated use of diverse components of an application's functionality
Formative Usability Studies: Conditions
• Usability Lab
• Two-way mirrors/separate rooms
• Workspace
• Online?
• Often video and/or audio-recorded
• Screen-capture
• Logs and instrumented software
• Goal: Ecological Validity
Formative Usability Studies: Measures
• Key question to answer: "Can users complete tasks?"
• Generally, lists of usability problems
• Description of difficulty
• Severity
• Task completion times – depending on methods
• Error rates?
• User Satisfaction
• Quantitative results for measuring success
• Not comparative
Formative Usability Studies: Methodology
• Define Scope
• Users complete tasks
• Researchers observe process
• What happens?
• What goes right? What goes wrong?
• Note difficulties, confusions?
• Record – audio/video, screen capture
Formative Usability Studies: Participants
• Somewhat representative of likely users
• Willing guinea-pigs
• Need folks who are patient, willing to deal with problems
• Well-motivated
• Compensated
• Eager to use the tool
• Small numbers – repeat until diminishing returns
• How many?
Only 5 users – or maybe not
Nielsen – why you only need to test with 5 users: http://www.useit.com/alertbox/20000319.html
Hwang & Salvendy (2010) – maybe you need 10 ± 2
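Nielsen's claim rests on a simple model: if each test user independently uncovers a fixed fraction λ of the problems (he estimated λ ≈ 0.31 across published studies), then n users find 1 − (1 − λ)^n of them. A minimal sketch of that arithmetic (the λ value is Nielsen's estimate, not specific to any one study):

```python
def proportion_found(n, lam=0.31):
    """Expected share of usability problems found by n test users,
    assuming each user independently finds a fraction lam of them."""
    return 1 - (1 - lam) ** n

for n in (1, 3, 5, 10, 15):
    print(f"{n:2d} users -> {proportion_found(n):.0%} of problems found")
```

With λ = 0.31, five users already surface about 84% of the problems, which is the basis of the rule of thumb; Hwang & Salvendy's larger numbers roughly correspond to assuming a smaller per-user discovery rate.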
Two approaches
• Observation
• Subject performs tasks, researchers observe
• Ecological validity, but no insight into users' mental states
• "Think aloud"
• User describes mental state and goals
Think-Aloud Protocols
• User describes what they are doing and why as they try to complete a task
• Describe both goals and steps taken to achieve those goals.
• Observe
• Confusions – when steps taken don't lead to expected results
• Misinterpretations – when choices don't lead to expected outcomes
• Goal: identify both micro- and macro-level usability concerns
• Strong similarities with contextual inquiry, but:
• focus is specifically on the tool
• participant is encouraged to narrate
• evaluator generally doesn't ask questions
Caveats
• Think-aloud is harder than it might sound
• What is the role of the investigator?
• How much feedback to provide?
• Very Little
• What (if anything) do you say when the user runs into problems?
• Not much
• What if it's a system that you built?
• How to identify/describe a usability problem?
Olmsted-Hawala et al., 2010: "A Comparison of Three Think-Aloud Protocols for use in Testing Data-Dissemination Web Sites for Usability"
"... it is recommended that rather than writing a vague statement such as 'we
had participants think aloud,' practitioners need to document their type of
TA protocol more completely, including the kind and frequency of probing.”
Reporting Usability Problems (adapted from Mack & Montaniz, 1994)
• Breakdowns in goal-directed behavior
• Correct action, noticeable effort
• To find
• To execute
• Confused by consequence
• Correct action, confusing outcome
• Incorrect action requires recovery
• Problem tangles
• Qualitative analysis by interface interactions
• Objects and actions
• Higher-level categorization of interface interactions
(These breakdowns correspond to Norman's Gulf of Execution and Gulf of Evaluation.)
Reporting Usability Problems (adapted from Mack & Montaniz, 1994)
• Inferring possible causes of problems
• Problem reports
• Design-relevant descriptions
• Quantitative analysis of problems by severity
Formative Usability Studies: Analysis
• Challenge: identify problems at the right level of granularity
• When does a series of related difficulties lead to a need for
redesign?
• What if these difficulties come from different tasks?
• When appropriate, relate usability observations back to contextual inquiry or
other earlier investigations
• Does the implementation fail to line up with the needs?
• Perhaps in some unforeseen manner?
Formative Usability Studies: Analysis
• Multiple observers
• Calculate agreement metrics?
• Use audio, video, transcripts to illustrate difficulties
• Particularly useful for demonstrating problems to implementation
folks
• Rate problem severity
• Which are show-stoppers and which are nuisances?
• Which require redesign vs. small changes?
• Must prioritize...
Completion – Summative User Studies
• Demonstrate successful execution of system
• With respect to
• Alternative system – even if straw man
• Stated performance goals – Acceptance Tests
• Generally empirical
Completion – Summative: Studies of Systems in Use
• Case studies
• Descriptions of individual deployments
• Qualitative
• Longitudinal study of ongoing use
• Collect data regarding impact
• Similar to case studies, but potentially more quantitative.
• Use observations and interviews to see what works?
Summative Tests
After the system is complete
More realistic conditions?
Acceptance tests
• Usability tests aimed at measuring success
• Does the tool do what the client wants?
• e.g., 95% task completion rate within 3 minutes
• Client has a clearer idea of the goal – not just "user friendly"
What: Empirical Studies
• Quantitative measure of some aspect of successful system use
• Task completion time (faster is better)
• Error rate
• Learnability
• Retention
• User satisfaction...
• Quality of output?
Tension in empirical studies
• Metrics that are easy to measure may not be most interesting
• Task completion time
• Error rate
• Great for repetitive data entry tasks, less so for complex tasks
• Analytics, writing...
Empirical User Studies: Goals
• I have two interfaces – A and B.
• Which is better? And by how much?
• Want to determine if there is a measurable, consistent difference in
• Task completion times
• Error rates
• Learnability
• Memorability
• Satisfaction
Running Example: Menu Structures
• Hierarchical Menu structures
• Multiple possibilities for any number of leaf nodes
• Broad/Shallow vs. Narrow/Deep
• which is faster?
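The trade-off can be made concrete with a toy model: with N leaf items and b items per menu, a user descends roughly log_b(N) levels, scanning up to b items at each. A sketch (the linear-scan cost is an illustrative assumption, not a validated performance model; real selection time also depends on pointing and practice):

```python
def menu_levels(n_items, branching):
    """Menu levels needed to reach n_items leaves with a fixed
    branching factor (items per menu) at every level."""
    levels, reach = 0, 1
    while reach < n_items:
        reach *= branching
        levels += 1
    return levels

def scan_cost(n_items, branching):
    """Toy cost model: assume a linear scan of every menu on the path,
    so cost = items scanned per level * number of levels."""
    return branching * menu_levels(n_items, branching)

for b in (2, 4, 8, 64):
    print(f"branching {b:2d}: {menu_levels(64, b)} levels, toy cost {scan_cost(64, b)}")
```

Even this crude model shows the tension: fewer levels means more scanning per level, which is exactly what the empirical studies try to settle.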
Hypothesis
• Testable theory about the world
• Galileo: falling objects fall at a rate independent of their weight
• Menus
• Users will be able to find items more quickly with broad/shallow trees
than with narrow/deep trees.
• Often stated as a “null hypothesis” that you expect will be disproven:
• There will be no difference in task performance time between broad/shallow
trees and narrow/deep trees.
Background/Context
• Controlled experiments from cognitive psychology
• State a testable/falsifiable hypothesis
• Identify a small number of independent variables to manipulate
• hold all else constant
• choose dependent variables
• assign users to groups
• collect data
• statistically analyze & model
Other goals
• Strive for
• removal of bias
• replicable results
• Generalizable theory that can inform future work
• or, demonstrable evidence of preference for one design over another.
Empirical User Studies: Tasks
• Use variants of the design to complete some meaningful operation
• Usually relatively close-ended, well-defined
• Relatively clear success/failure
Empirical User Studies: Conditions
• Lab-like?
• Simulated realistic conditions?
Independent Variables
• What are you going to test?
• Condition that is “independent” of results
• independent of user's behaviors
• independent of what you're measuring.
• one of 2 (or 3 or 4) things you're comparing.
• can arise from subjects being classified into groups
• Examples
• Galileo: dropping a feather vs. bowling ball
• Menu structures – broad/shallow vs. narrow/deep
Dependent variable
• The values the hypothesis makes predictions about
• falling time
• task performance time, etc.
• May have more than one
• Goal: show that changes in independent variable lead to measurable, reliable
changes in dependent variables.
• With multiple independent variables, look for interactions
• Differences between interfaces increase with differences in task
complexity
Controls
• In order to reliably say that independent variables are responsible for
changes in dependent variables, we must control for possible confounds
• Control – keep other possible factors constant for each condition/value of
independent variables
• types of users, contexts, network speeds, computing environments
• confound – uncontrolled factor that could lead to an alternate explanation
for the results
• What happens if you don’t control as much as possible?
• Confounds, not independent variables, may be the cause of changes in
dependent variables.
Examples of Controls
• Galileo:
• windy day vs. not windy?
• Menus
• network speed/delays? (do everything on one machine)
• skills of users? (more on participant selection later)
• font size, display information, etc.?
Bias
• Related to controls
• Experimenter can introduce biases that might influence outcomes
• Instructions?
• Choice of participants?
• more on this in a moment
• Protocols
• prepare scripts ahead of time
• Learning effects?
(Figure credit: Jinjuan Feng)
Between-Groups vs. Within-Groups Design
• How do you assign participants to conditions?
• All people do all tasks/cells?
• Within-groups – compare within groups of individuals.
• one group of test participants
• Certain people for certain cells?
• between groups – compare between groups of individuals
• 2 or more groups
• Mixed models
Between Groups
• Pros
• Simpler design
• Avoid learning effect
• Don't have to worry about ordering
• Cons
• may need more participants
• to get enough data for statistical tests
• to avoid influence of some individuals.
Within-Groups
• Pros:
• Can be more powerful statistically
• same person uses each of multiple interfaces
• Fewer Participants
• Cons
• Learning effects require appropriate randomization of tasks/interfaces
• Fatigue is possible
Mixed Models
• Elements of both
• Example: 3 different interfaces
• Want to compare performance of different groups
• Doctors vs. nurses?
• Each interface is a within-subjects experiment
• Across professions is between-subjects.
Other Challenges
• Ordering tasks?
• How many?
• Want to avoid fatigue, boredom, and expense of long sessions
• How many users?
• 20 or more?
• Variability among subjects
• May be unforeseen.
• Bi-modal distribution of education or computer experience?
• Training materials
• Run a pilot
Procedure
• Users conduct tasks
• Measure
• record task completion times
• errors
• etc.
• Now what?
• Analyze the data to see if there is support for the hypothesis
• alternatively, whether the null hypothesis can be rejected
Hypothesis Testing
• Not about proof or disproof
• Instead, examine the data
• Find the likelihood that the observed data would occur by chance if the null hypothesis is true
• If this likelihood is small, we say the data support the hypothesis
Data, Stats, and R
• Need to talk about 

• data distributions

• statistical analyses

• to do hypothesis testing

• Tools:

• R - r-project.org

• R-Studio - rstudio.org
Sampling
• Data sets come from some ideal universe
• all possible task performance times for a given menu selection task
• Compare two samples with given means and deviations
• Are they really different? Or do they just appear different by chance?
• Statistical testing gives us a p-value
• the probability of observing differences this large by chance alone
• low values indicate significance
The Key Question
• Given two sets of measurements, or samples, did they come from the same underlying source or distribution?
• x = [29 33 89 56 86 85 7 84 67 78 59 28 10 76 11 12 97 61 66 9 40 95 90 4 31 18 24 48 45 82]
• y = [51 3 10 11 5 90 87 13 64 86 67 98 12 55 56 80 59 63 94 93 25 4 79 52 36 73 99 22 62 2]
• mean(x) = 50.67, sd(x) = 31.01
• mean(y) = 51.70, sd(y) = 33.26
• Are they from the same distribution?
Boxplot
• Show quartiles

• Are they the same?
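The quartiles behind a boxplot are easy to compute directly; a sketch using the x sample from the earlier slide (Python's exclusive quantile method; R's default uses a slightly different rule, so values can differ a little at the edges):

```python
import statistics

x = [29, 33, 89, 56, 86, 85, 7, 84, 67, 78, 59, 28, 10, 76, 11,
     12, 97, 61, 66, 9, 40, 95, 90, 4, 31, 18, 24, 48, 45, 82]

# Quartile cut points: 25%, 50% (the median), and 75%.
q1, median, q3 = statistics.quantiles(x, n=4)
print(f"Q1 = {q1}, median = {median}, Q3 = {q3}, IQR = {q3 - q1}")
```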
“Normal” distributions
• Characterized by a mean and standard deviation (a measure of variation)
• 95% of the area under the curve lies within 2 standard deviations of the mean
• If you take many samples from a population
• their averages will tend toward a normal distribution (the Central Limit Theorem)
• Statistical testing -> comparison of distributions
Histograms
• Draw a random subset of a population, 1000 times
• take the average of each subset
• The averages form a normal distribution
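That demonstration takes only a few lines; a sketch drawing from a uniform population (flat, decidedly non-normal), whose subset means nonetheless pile up in a bell shape around the population mean:

```python
import random
import statistics

random.seed(42)  # reproducible run

# Population: uniform on [0, 100] -- not normal at all.
def subset_mean(size=30):
    return statistics.mean(random.uniform(0, 100) for _ in range(size))

means = [subset_mean() for _ in range(1000)]

# The subset means cluster around the population mean (50), with
# spread shrunk by sqrt(subset size) relative to the population's sd.
print(f"mean of means = {statistics.mean(means):.2f}")
print(f"sd of means   = {statistics.stdev(means):.2f}")
```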
Hypothesis testing
• Test probability that there is no difference between two
distributions

• Possible errors

• Type 1 Error: α – reject the null hypothesis when it is true

• believe there is a difference when there is none

• False positive

• Type 2 Error: β – accept the null hypothesis when it is false

• believe no difference when there is

• False Negative
Significance Levels and Errors
• Highly significant (p < 0.001)
• Don't believe there is a difference unless it's really clear
• low chance of false positive – Type 1
• greater chance of false negative – Type 2
• Less significant (p < 0.05)
• More ready to believe there is a difference
• more false positives / Type 1 errors
• fewer Type 2 errors
• Usually use p = 0.05 as the cut-off.
Type 1 and Type 2 Errors
• Type 1 error: reject the null hypothesis when it is, in fact, true
• Type 2 error: accept the null hypothesis when it is, in fact, false

Decision \ Reality     | Null is true | Null is false
Reject null hypothesis | Type 1 error | Correct
Accept null hypothesis | Correct      | Type 2 error
Statistical Methods – Crash Course
• Comparisons of samples
• t-tests: compare 2 alternatives
• ANOVA: > 2 alternatives, multiple independent variables
• Correlation
• Regression
t-test
• x = [29 33 89 56 86 85 7 84 67 78 59 28 10 76 11 12 97 61 66 9 40 95 90 4 31 18 24 48 45 82]
• y = [51 3 10 11 5 90 87 13 64 86 67 98 12 55 56 80 59 63 94 93 25 4 79 52 36 73 99 22 62 2]
• t.test(x,y)
Results
Welch Two Sample t-test
data: x and y
t = -0.1245, df = 57.72, p-value = 0.9014
alternative hypothesis: true difference in means is not equal
to 0
95 percent confidence interval:
-17.65522 15.58855
sample estimates:
mean of x mean of y
50.66667 51.70000
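The t statistic and Welch degrees of freedom in that output can be reproduced from their definitions with stdlib Python (a sketch; for the p-value itself you would use R's t.test or a stats library, which integrate the t distribution):

```python
import statistics

x = [29, 33, 89, 56, 86, 85, 7, 84, 67, 78, 59, 28, 10, 76, 11,
     12, 97, 61, 66, 9, 40, 95, 90, 4, 31, 18, 24, 48, 45, 82]
y = [51, 3, 10, 11, 5, 90, 87, 13, 64, 86, 67, 98, 12, 55, 56, 80,
     59, 63, 94, 93, 25, 4, 79, 52, 36, 73, 99, 22, 62, 2]

nx, ny = len(x), len(y)
vx, vy = statistics.variance(x), statistics.variance(y)  # sample variances

# Welch's t statistic: difference of means over the combined standard error.
se2 = vx / nx + vy / ny
t = (statistics.mean(x) - statistics.mean(y)) / se2 ** 0.5

# Welch-Satterthwaite approximation for the degrees of freedom.
df = se2 ** 2 / ((vx / nx) ** 2 / (nx - 1) + (vy / ny) ** 2 / (ny - 1))

print(f"t = {t:.4f}, df = {df:.2f}")  # matches R: t = -0.1245, df = 57.72
```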
xkcd on significance testing

http://xkcd.com/882/
Correlation
• Attributing causality
• a correlation does not imply cause and effect
• an apparent effect may be due to a third "hidden" variable related to both other variables
• Drawing strong conclusions from small numbers
• correlations are unreliable with small groups
Regression
Calculates a line of "best fit"
Use the value of one variable to predict the value of the other
Example: r² = .67, p < 0.01 (r = .82)
Be careful

http://xkcd.com/552/
User Modeling (Hourcade, et al. 2004)
Predict performance characteristics?
Calculate an index of difficulty
similar to Fitts' law: MT = a + b log2(A/W + 1)
Linear regression to see how well the model fits
Longitudinal use
• Lab studies are artificial
• Many tools used over time.
• use and understanding evolve
• Longitudinal studies look at usage over time
• Expensive, but better data
• Techniques
• interviews, usability tests with multiple sessions, continuous data logging, instrumented software, diaries
Case Studies
• In-depth work with small number of users
• Multiple sessions
• Describe scenarios
• Illustrate use of tool to accomplish goals
• Good for novel designs, expert users
• Formative evaluation – can be used to gather requirements
• Summative – show validity of idea
• Possibly less compelling than usability evaluations.
Informed Consent
• Research must be done in a way that protects participants

• Principles 

• Respect for persons

• Beneficence – minimize possible harms, maximize possible benefits

• Justice – costs and benefits should not be limited to certain
populations

• Institutional Review Board (IRB) – approves experiments
and requires signatures on “informed consent” form.

• Crucial for responsible research
Other Metrics
What if task completion time is not the most important
metric?

Insight?
Automated Usability Testing
Possible for well-defined criteria
Text complexity?
Accessibility
• WCAG
• Section 508
Example: wave.webaim.org
Log File Analysis
• Use clickstream and usage data to study actual use

• Which parts of the system are people using? 

• Which are they not using?

• Are they going in circles? 

• Are they having problems?

• Rich data, but hard to interpret

• particularly without observations or interviews to
provide context.
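Even a simple frequency count over a clickstream begins to answer "which parts are people using?" and "are they going in circles?"; a minimal sketch over invented log entries (the format and the repeat-visit threshold are illustrative assumptions):

```python
from collections import Counter

# Hypothetical clickstream entries: (user, page visited).
clicks = [
    ("u1", "/register"), ("u1", "/vitals"), ("u1", "/register"),
    ("u2", "/register"), ("u2", "/register"), ("u2", "/register"),
    ("u3", "/register"), ("u3", "/vitals"), ("u3", "/reports"),
]

# Which parts of the system are people using (and not using)?
page_counts = Counter(page for _, page in clicks)
print(page_counts.most_common())

# Repeated visits to one page by one user may hint at going in circles.
loops = Counter((user, page) for user, page in clicks)
circling = [k for k, n in loops.items() if n >= 3]
print(circling)  # [('u2', '/register')]
```

As the slide notes, counts like these only flag candidates; observations or interviews are needed to tell a problem from a popular feature.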
Shortcomings of User Studies
What happens in the lab may not be reflected in real use
Deployment/post-mortem, etc.
Case studies, qualitative work
How can we meaningfully evaluate a system in use, when deployment itself presents a significant expense?

Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 

Introduction to usability studies, presented to Baobab Health Trust

  • 1. Baobab Health, March 2014Harry Hochheiser, harryh@pitt.edu Usability Studies and Empirical Studies Harry Hochheiser University of Pittsburgh Department of Biomedical Informatics harryh@pitt.edu +1 412 648 9300 Attribution-ShareAlike CC BY-SA
  • 2. Baobab Health, March 2014Harry Hochheiser, harryh@pitt.edu Outline Usability Studies Think-Aloud Summative Studies Empirical Studies
  • 3. Baobab Health, March 2014Harry Hochheiser, harryh@pitt.edu Beyond Inspections Inspections won't tell you which problems users will face in action Might not identify mental models and confusions ..finding out where things go wrong.
  • 4. Baobab Health, March 2014 Harry Hochheiser, harryh@pitt.edu No bright dividing line in process Design Paper Prototype Fully-functional Prototype Release Usability Inspections Usability Studies Empirical User Studies, Case Studies, Longitudinal Studies, Acceptance Tests Low cost, low validity → Higher cost, higher validity
  • 5. Baobab Health, March 2014Harry Hochheiser, harryh@pitt.edu Formative Usability Studies: Goals • Generally, to understand if the proposed design supports completion of intended tasks • Be specific - • Tasks and users • Define success • User Satisfaction? • Do users like the tool? • What are the important metrics?
  • 6. Baobab Health, March 2014Harry Hochheiser, harryh@pitt.edu Formative Usability Studies: Tasks • Representative and specific • What would users do? • Realistic – given available time and resources • Appropriate for assessment of goals • Possibly some user-defined/suggested • Particularly if participants were informants in earlier requirements-gathering
  • 7. Baobab Health, March 2014Harry Hochheiser, harryh@pitt.edu Formative Usability Studies: Which Tasks? Bad: Give this a try? Better: Try to send an email, find a contact, and file a response Still better: Detailed scenario with multiple actions that required coordinated use of diverse components of an application's functionality
  • 8. Baobab Health, March 2014Harry Hochheiser, harryh@pitt.edu Formative Usability Studies: Conditions • Usability Lab • Two-way mirrors/separate rooms • Workspace • Online? • Often video and/or audio-recorded • Screen-capture • Logs and instrumented software • Goal: Ecological Validity
  • 9. Baobab Health, March 2014Harry Hochheiser, harryh@pitt.edu Formative Usability Studies: 
 Measures • Key question to answer: “can users complete tasks”? • Generally, lists of usability problems • Description of difficulty • Severity • Task completion times – depending on methods • Error rates? • User Satisfaction • Quantitative results for measuring success • Not comparative
  • 10. Baobab Health, March 2014Harry Hochheiser, harryh@pitt.edu Formative Usability Studies: 
 Methodology • Define Scope • Users complete tasks • Researchers observe process • What happens? • What goes right? What goes wrong? • Note difficulties, confusions? • Record – audio/video, screen capture
  • 11. Baobab Health, March 2014Harry Hochheiser, harryh@pitt.edu Formative Usability Studies: 
 Participants • Somewhat representative of likely users • Willing guinea-pigs • Need folks who are patient, willing to deal with problems • Well-motivated • Compensated • Eager to use the tool • Small numbers – repeat until diminishing returns • How many?
  • 12. Baobab Health, March 2014Harry Hochheiser, harryh@pitt.edu Nielsen – why you only need to test with 5 users http://www.useit.com/alertbox/20000319.html Hwang & Salvendy (2010) – maybe need 10 +/- 2 Only 5 users – or maybe not
  • 13. Baobab Health, March 2014Harry Hochheiser, harryh@pitt.edu Two approaches • Observation •Subject performs tasks, researchers observe • Ecological validity, but no insight into users • “Think aloud” •User describes mental state and goals
  • 14. Baobab Health, March 2014Harry Hochheiser, harryh@pitt.edu Think-Aloud Protocols • User describes what they are doing and why as they try to complete a task • Describe both goals and steps taken to achieve those goals. • Observe • Confusions – when steps taken don't lead to expected results • Misinterpretations – when choices don't lead to expected outcomes • Goal: identify both micro- and macro-level usability concerns • Strong similarities with contextual inquiry, but.. • Focus specifically on tool • Participant encouraged to narrate • Evaluator generally doesn’t ask questions
  • 15. Baobab Health, March 2014Harry Hochheiser, harryh@pitt.edu Caveats • Think-aloud is harder than it might sound • What is the role of the investigator? • How much feedback to provide? • Very Little • What (if anything) do you say when the user runs into problems? • Not much • What if it's a system that you built? • How to identify/describe a usability problem?
  • 16. Baobab Health, March 2014Harry Hochheiser, harryh@pitt.edu Think-Aloud Protocols: A Comparison of Three Think-Aloud Protocols for use in Testing Data-Dissemination Web Sites for Usability Olmsted-Hawala, et al. 2010 "... it is recommended that rather than writing a vague statement such as 'we had participants think aloud,' practitioners need to document their type of TA protocol more completely, including the kind and frequency of probing.”
  • 17. Baobab Health, March 2014Harry Hochheiser, harryh@pitt.edu Reporting Usability Problems
 adapted from Mack & Montaniz, 1994 • Breakdowns in goal-directed behavior • Correct action, noticeable effort • To find • To execute • Confused by consequence • Correct action, confusing outcome • Incorrect action requires recovery • Problem tangles • Qualitative analysis by interface interactions • Objects and actions • Higher-level categorization of interface interactions Gulf of Execution Gulf of Evaluation
  • 18. Baobab Health, March 2014Harry Hochheiser, harryh@pitt.edu Reporting Usability Problems
 adapted from Mack & Montaniz, 1994 • Inferring possible causes of problems • Problem reports • Design-relevant descriptions • Quantitative analysis of problems by severity
  • 19. Baobab Health, March 2014Harry Hochheiser, harryh@pitt.edu Formative Usability Studies: 
 Analysis • Challenge – identify problems at the right level of granularity? • When does a series of related difficulties lead to a need for redesign? • What if these difficulties come from different tasks? • When appropriate, relate usability observations back to contextual inquiry or other earlier investigations • Does the implementation fail to line up with the needs? • Perhaps in some unforeseen manner?
  • 20. Baobab Health, March 2014Harry Hochheiser, harryh@pitt.edu Formative Usability Studies: 
 Analysis • Multiple observers • Calculate agreement metrics? • Use audio, video, transcripts to illustrate difficulties • Particularly useful for demonstrating problems to implementation folks • Rate problem severity • Which are show-stoppers and which are nuisances? • Which require redesign vs. small changes? • Must prioritize...
  • 21. Baobab Health, March 2014Harry Hochheiser, harryh@pitt.edu Completion – Summative User Studies • Demonstrate successful execution of system • With respect to • Alternative system – even if straw man • Stated performance goals – Acceptance Tests • Generally empirical
  • 22. Baobab Health, March 2014Harry Hochheiser, harryh@pitt.edu Completion – Summative
 Studies of systems in use • Case studies • Descriptions of individual deployments • Qualitative • Longitudinal study of ongoing use • Collect data regarding impact • Similar to case studies, but potentially more quantitative. • Use observations and interviews to see what works?
  • 23. Baobab Health, March 2014Harry Hochheiser, harryh@pitt.edu After system is complete More realistic conditions? Acceptance tests Usability tests aimed at measuring success Does the tool do what the client wants • 95% task completion rate within 3 minutes, etc.? Client has clearer idea – not just “user friendly” Summative Tests
  • 24. Baobab Health, March 2014Harry Hochheiser, harryh@pitt.edu What: Empirical Studies • Quantitative measure of some aspect of successful system use • Task completion time (faster is better) • Error rate • Learnability • Retention • User satisfaction... • Quality of output?
  • 25. Baobab Health, March 2014Harry Hochheiser, harryh@pitt.edu Tension in empirical studies • Metrics that are easy to measure may not be most interesting • Task completion time • Error rate • Great for repetitive data entry tasks, less so for complex tasks • Analytics, writing...
  • 26. Baobab Health, March 2014Harry Hochheiser, harryh@pitt.edu Empirical User Studies: Goals • I have two interfaces – A and B. • Which is better? and how much better? • Want to determine if there is a measurable, consistent difference in • Task completion times • Error rates • Learnability • Memorability • Satisfaction
  • 27. Baobab Health, March 2014Harry Hochheiser, harryh@pitt.edu Running Example: Menu Structures • Hierarchical Menu structures • Multiple possibilities for any number of leaf nodes • Broad/Shallow vs. Narrow/Deep • which is faster?
  • 28. Baobab Health, March 2014Harry Hochheiser, harryh@pitt.edu Hypothesis • Testable Theory about the world • Galileo: The rate at which falling items fall is independent of their weight • Menus • Users will be able to find items more quickly with broad/shallow trees than with narrow/deep trees. • Often stated as a “null hypothesis” that you expect will be disproven: • There will be no difference in task performance time between broad/shallow trees and narrow/deep trees.
  • 29. Baobab Health, March 2014Harry Hochheiser, harryh@pitt.edu Background/Context • Controlled experiments from cognitive psychology • State a testable/falsifiable hypothesis • Identify a small number of independent variables to manipulate • hold all else constant • choose dependent variables • assign users to groups • collect data • statistically analyze & model
  • 30. Baobab Health, March 2014Harry Hochheiser, harryh@pitt.edu Other goals • Strive for • removal of bias • replicable results • Generalizable theory that can inform future work • or, demonstrable evidence of preference for one design over another.
  • 31. Baobab Health, March 2014Harry Hochheiser, harryh@pitt.edu Empirical User Studies: Tasks • Use variants of the design to complete some meaningful operation • Usually relatively close-ended, well-defined • Relatively clear success/failure
  • 32. Baobab Health, March 2014Harry Hochheiser, harryh@pitt.edu Empirical User Studies: Conditions • Lab-like? • Simulated realistic conditions?
  • 33. Baobab Health, March 2014Harry Hochheiser, harryh@pitt.edu Independent Variables • What are you going to test? • Condition that is “independent” of results • independent of user's behaviors • independent of what you're measuring. • one of 2 (or 3 or 4) things you're comparing. • can arise from subjects being classified into groups • Examples • Galileo: dropping a feather vs. bowling ball • Menu structures – broad/shallow vs. narrow/deep
  • 34. Baobab Health, March 2014Harry Hochheiser, harryh@pitt.edu Dependent variable • Values that the hypothesis tests • falling time • task performance time, etc. • May have more than one • Goal: show that changes in independent variable lead to measurable, reliable changes in dependent variables. • With multiple independent variables, look for interactions • Differences between interfaces increase with differences in task complexity
  • 35. Baobab Health, March 2014Harry Hochheiser, harryh@pitt.edu Controls • In order to reliably say that independent variables are responsible for changes in dependent variables, we must control for possible confounds • Control – keep other possible factors constant for each condition/value of independent variables • types of users, contexts, network speeds, computing environments • confound – uncontrolled factor that could lead to an alternate explanation for the results • What happens if you don’t control as much as possible? • Confounds, not independent variables, may be the cause of changes in dependent variables.
  • 36. Baobab Health, March 2014Harry Hochheiser, harryh@pitt.edu Examples of Controls • Galileo: • windy day vs. not windy? • Menus • network speed/delays? (do everything on one machine) • skills of users? (more on participant selection later) • font size, display information, etc.?
  • 37. Baobab Health, March 2014Harry Hochheiser, harryh@pitt.edu • Related to controls • Experimenter can introduce biases that might influence outcomes • Instructions? • Choice of participants? • more on this in a moment • Protocols • prepare scripts ahead of time • Learning Effects? Bias Thanks to Jinjuan Feng for figure
  • 38. Baobab Health, March 2014Harry Hochheiser, harryh@pitt.edu Between-Groups vs. Within-Groups Design • How do you assign participants to conditions? • All people do all tasks/cells? • Within-groups – compare within groups of individuals. • one group of test participants • Certain people for certain cells? • between groups – compare between groups of individuals • 2 or more groups • Mixed models
  • 39. Baobab Health, March 2014Harry Hochheiser, harryh@pitt.edu Between Groups • Pros • Simpler design • Avoid learning effect • Don't have to worry about ordering • Cons • may need more participants • to get enough data for statistical tests • to avoid influence of some individuals.
  • 40. Baobab Health, March 2014Harry Hochheiser, harryh@pitt.edu Within-Groups • Pros: • Can be more powerful statistically • same person uses each of multiple interfaces • Fewer Participants • Cons • Learning effects require appropriate randomization of tasks/ interfaces • Fatigue is possible
  • 41. Baobab Health, March 2014Harry Hochheiser, harryh@pitt.edu Mixed Models • Elements of both • 3 different interfaces • Want to compare performance of different groups • Docs vs. Nurses? • Each interface a within-subject experiment • Across professions is between-subjects.
  • 42. Baobab Health, March 2014Harry Hochheiser, harryh@pitt.edu Other Challenges • Ordering tasks? • How many? • Want to avoid fatigue, boredom, and expense of long sessions • How many users? • 20 or more? • Variability among subjects • May be unforeseen. • Bi-modal distribution of education or computer experience? • Training materials • Run a pilot
  • 43. Baobab Health, March 2014Harry Hochheiser, harryh@pitt.edu Procedure • Users conduct tasks • Measure • record task completion times • errors • etc. • Now what? • Analyze data to see if there is support for the hypothesis • alternatively, whether the null hypothesis can be rejected
  • 44. Baobab Health, March 2014Harry Hochheiser, harryh@pitt.edu Hypothesis Testing • Not about proof or disproof • Instead, examine data • Find likelihood that the data occurred randomly if the null hypothesis is true • If this is small, say that we have support for the hypothesis
  • 45. Baobab Health, March 2014Harry Hochheiser, harryh@pitt.edu Data, Stats, and R • Need to talk about • data distributions • statistical analyses • to do hypothesis testing • Tools: • R - r-project.org • R-Studio - rstudio.org
  • 46. Baobab Health, March 2014Harry Hochheiser, harryh@pitt.edu Sampling • Data sets come from some ideal universe • all possible task performance times for a given menu selection task • Compare two samples with given means and deviations • Are they really different? Or do they just appear different by chance? • Statistical testing gives us a p-value • probability that differences are random chance • low values are significant
  • 47. Baobab Health, March 2014Harry Hochheiser, harryh@pitt.edu The key questions • Given two sets of measurements, or samples, did they come from the same underlying source or distribution? • x = [29 33 89 56 86 85 7 84 67 78 59 28 10 76 11 12 97 61 66 9 40 95 90 4 31 18 24 48 45 82] • y = [51 3 10 11 5 90 87 13 64 86 67 98 12 55 56 80 59 63 94 93 25 4 79 52 36 73 99 22 62 2] • mean(x) = 50.67, sd(x) = 31.01 • mean(y) = 51.7, sd(y) = 33.26 • are they from the same distribution?
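The summary statistics quoted on this slide can be checked directly. The deck's own examples use R; the following is an equivalent sketch using only the Python standard library:

```python
import statistics

# The two samples from the slide
x = [29, 33, 89, 56, 86, 85, 7, 84, 67, 78, 59, 28, 10, 76, 11, 12,
     97, 61, 66, 9, 40, 95, 90, 4, 31, 18, 24, 48, 45, 82]
y = [51, 3, 10, 11, 5, 90, 87, 13, 64, 86, 67, 98, 12, 55, 56, 80,
     59, 63, 94, 93, 25, 4, 79, 52, 36, 73, 99, 22, 62, 2]

# Sample mean and sample standard deviation (n-1 denominator, like R's sd())
mean_x, sd_x = statistics.mean(x), statistics.stdev(x)
mean_y, sd_y = statistics.mean(y), statistics.stdev(y)

print(round(mean_x, 2), round(mean_y, 2))  # 50.67 51.7
```

Note that `statistics.stdev` uses the n−1 sample formula, matching R's `sd()`.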
  • 48. Baobab Health, March 2014Harry Hochheiser, harryh@pitt.edu Boxplot • Show quartiles • Are they the same?
  • 49. Baobab Health, March 2014Harry Hochheiser, harryh@pitt.edu “Normal” distributions • Given mean and standard deviation (measure of variation) • 95% of area under curve within 2 standard deviations • If you take many samples from a space • Their averages will go to a normal distribution • Statistical testing -> comparison of distributions.
  • 50. Baobab Health, March 2014Harry Hochheiser, harryh@pitt.edu Histograms Run a subset of a population, 1000 times get average of each subset Normal distribution
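The "run a subset of a population many times" idea on this slide can be simulated in a few lines. The population and seed below are arbitrary choices for illustration:

```python
import random
import statistics

random.seed(42)  # arbitrary seed, for a reproducible illustration

# An arbitrary, decidedly non-normal population: the integers 0..99
population = range(100)

# Draw 1000 subsets of size 30 and record the mean of each.
# The means pile up in a roughly bell-shaped (normal) distribution
# around the population mean, even though the population itself is flat.
sample_means = [statistics.mean(random.sample(population, 30))
                for _ in range(1000)]

print(round(statistics.mean(sample_means), 1))  # close to the population mean of 49.5
```

Plotting a histogram of `sample_means` reproduces the bell curve shown on the slide.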
  • 51. Baobab Health, March 2014Harry Hochheiser, harryh@pitt.edu Hypothesis testing • Test probability that there is no difference between two distributions • Possible errors • Type 1 Error: α - reject null hypothesis when it is true • believe there is a difference when there is none • False positive • Type 2 Error: β- accept null when false • believe no difference when there is • False Negative
  • 52. Baobab Health, March 2014Harry Hochheiser, harryh@pitt.edu Significance Levels and Errors • Highly significant (p < 0.001) • Don't believe there is a difference unless it's really clear • low chance of false positive – Type 1 • Greater chance of a false negative – Type 2 • Less significant (p < 0.05) • More ready to believe there is a difference • More false positive/type 1 errors • fewer type 2 errors • Usually use p = 0.05 as cut-off.
  • 53. Baobab Health, March 2014Harry Hochheiser, harryh@pitt.edu Type 1 and Type 2 errors Type 1 error reject the null hypothesis when it is, in fact, true Type 2 error accept the null hypothesis when it is, in fact, false Decision Reality
  • 54. Baobab Health, March 2014Harry Hochheiser, harryh@pitt.edu Statistical Methods - Crash Course • Comparisons of samples • t-tests: 2 alternatives to compare • ANOVA: > 2 alternatives, multiple independent variables • Correlation • Regression
  • 55. Baobab Health, March 2014Harry Hochheiser, harryh@pitt.edu t-test • x = [29 33 89 56 86 85 7 84 67 78 59 28 10 76 11 12 97 61 66 9 40 95 90 4 31 18 24 48 45 82] • y = [51 3 10 11 5 90 87 13 64 86 67 98 12 55 56 80 59 63 94 93 25 4 79 52 36 73 99 22 62 2] • t.test(x,y)
  • 56. Baobab Health, March 2014Harry Hochheiser, harryh@pitt.edu Results Welch Two Sample t-test data: x and y t = -0.1245, df = 57.72, p-value = 0.9014 alternative hypothesis: true difference in means is not equal to 0 95 percent confidence interval: -17.65522 15.58855 sample estimates: mean of x mean of y 50.66667 51.70000
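The t and df values in this R output can be reproduced by hand. A sketch in Python (stdlib only, so the p-value, which requires the t distribution's CDF, is omitted):

```python
import math
import statistics

# Same two samples passed to t.test(x, y) in R
x = [29, 33, 89, 56, 86, 85, 7, 84, 67, 78, 59, 28, 10, 76, 11, 12,
     97, 61, 66, 9, 40, 95, 90, 4, 31, 18, 24, 48, 45, 82]
y = [51, 3, 10, 11, 5, 90, 87, 13, 64, 86, 67, 98, 12, 55, 56, 80,
     59, 63, 94, 93, 25, 4, 79, 52, 36, 73, 99, 22, 62, 2]

nx, ny = len(x), len(y)
vx, vy = statistics.variance(x), statistics.variance(y)  # sample variances

# Welch t statistic: difference of means over its standard error
se2 = vx / nx + vy / ny
t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(se2)

# Welch-Satterthwaite approximation for the degrees of freedom
df = se2 ** 2 / ((vx / nx) ** 2 / (nx - 1) + (vy / ny) ** 2 / (ny - 1))

print(round(t, 3), round(df, 1))  # -0.124 57.7, matching R's t = -0.1245, df = 57.72
```

With |t| this small, R's large p-value (0.9014) follows: there is no evidence that the two samples come from different distributions.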
  • 57. Baobab Health, March 2014Harry Hochheiser, harryh@pitt.edu xkcd on significance testing http://xkcd.com/882/
  • 58. Baobab Health, March 2014Harry Hochheiser, harryh@pitt.edu Correlation
  • 59. Baobab Health, March 2014Harry Hochheiser, harryh@pitt.edu Correlation • Attributing causality • a correlation does not imply cause and effect • cause may be due to a third “hidden” variable related to both other variables • drawing strong conclusion from small numbers • unreliable with small groups
  • 60. Baobab Health, March 2014Harry Hochheiser, harryh@pitt.edu Regression Calculates a line of “best fit” Use the value of one variable to predict the value of the other r2=.67, p < 0.01 r=.82
  • 61. Baobab Health, March 2014Harry Hochheiser, harryh@pitt.edu Be careful http://xkcd.com/552/
  • 62. Baobab Health, March 2014Harry Hochheiser, harryh@pitt.edu User Modeling
 Hourcade, et al. 2004 Predict performance characteristics? Calculate index of difficulty similar to MT = a + b log2 (A/W+1) Linear regression to see how well it fits
  • 63. Baobab Health, March 2014Harry Hochheiser, harryh@pitt.edu Longitudinal use • Lab studies are artificial • Many tools used over time. • use and understanding evolve • Longitudinal studies look at usage over time • Expensive, but better data • Techniques • Interviews, usability tests with multiple sessions, continuous data logging, Instrumented software, Diaries
  • 64. Baobab Health, March 2014Harry Hochheiser, harryh@pitt.edu Case Studies • In-depth work with small number of users • Multiple sessions • Describe scenarios • Illustrate use of tool to accomplish goals • Good for novel designs, expert users • Formative evaluation – can be used to gather requirements • Summative – show validity of idea • Possibly less compelling than usability evaluations.
  • 65. Baobab Health, March 2014Harry Hochheiser, harryh@pitt.edu Informed Consent • Research must be done in a way that protects participants • Principles • Respect for persons • Beneficence – minimize possible harms, maximize possible benefits • Justice – costs and benefits should not be limited to certain populations • Institutional Review Board (IRB) – approves experiments and requires signatures on “informed consent” form. • Crucial for responsible research
  • 66. Baobab Health, March 2014Harry Hochheiser, harryh@pitt.edu Other Metrics What if task completion time is not the most important metric? Insight?
  • 67. Baobab Health, March 2014Harry Hochheiser, harryh@pitt.edu Automated Usability Testing Possible for defined criteria Text complexity? Accessibility WCAG Section 508 Example: wave.webaim.org.
  • 68. Baobab Health, March 2014Harry Hochheiser, harryh@pitt.edu Log File Analysis • Use clickstream and usage data to study actual use • Which parts of the system are people using? • Which are they not using? • Are they going in circles? • Are they having problems? • Rich data, but hard to interpret • particularly without observations or interviews to provide context.
  • 69. Baobab Health, March 2014Harry Hochheiser, harryh@pitt.edu Shortcomings of User Studies What happens in the lab may not be reflected in real use Deployment/post-mortem, etc. Case studies, qualitative work How can we meaningfully evaluate a system in use … when deployment presents a significant expense...