evaluation, validation and
empirical methods

Alan Dix
http://www.alandix.com/
evaluation

   you’ve designed it, but is it right?
different kinds of evaluation

endless arguments
  quantitative vs. qualitative
  in the lab vs. in the wild
  experts vs. real users (vs UG students!)


really need to
  combine methods
         quantitative – what is true   &   qualitative – why
  what is appropriate and possible
purpose


Three types of evaluation

                         purpose               stage
   formative        improve a design        development

   summative        say “this is good”    contractual/sales

   investigative    gain understanding      research
    / exploratory
when does it end?

in a world of perpetual beta ...

      real use is the ultimate evaluation

logging, bug reporting, etc.
how do people really use the product?
are some features never used?
studies and experiments
what varies (and what you choose)

individuals / groups (not only UG students!)
tasks / activities
products / systems
principles / theories
prior knowledge and experience
learning and order effects

      which are you trying to find out about?
      which are ‘noise’
a little story …

BIG ACM sponsored conference
‘good’ empirical paper
looking at collaborative support for a task X
three pieces of software:
  A – domain specific software, synchronous
  B – generic software, synchronous
  C – generic software, asynchronous

                     sync     async
  domain spec.        A
  generic             B         C
experiment

                     sync     async
  domain spec.        A
  generic             B         C

reasonable nos. subjects in each condition
quality measures

significant results p<0.05
  domain spec. > generic
  asynchronous > synchronous
  [charts: generic vs. domain spec.; sync vs. async]



conclusion: really want async domain specific
what’s wrong with that?

interaction effects
   gap is interesting to study
   not necessarily end up best

                     sync     async
  domain spec.        A         ?
  generic             B         C

more important …
 if you blinked at the wrong moment …

NOT independent variables
   three different pieces of software
   like experiment on 3 people!
   say system B was just bad
   [charts: generic vs. domain spec. (B < A); sync vs. async (B < C)]
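
The confound can be made concrete with invented numbers. The sketch below assumes hypothetical quality scores (not data from the paper in the story) in which A and C are equally good and only B is poor. The names `scores`, `design` and `marginal_mean` are illustrative.

```python
# Hypothetical mean quality scores per system (illustrative only).
scores = {"A": 7.0, "B": 4.0, "C": 7.0}

# The two "factors" are only labels on three systems; the
# domain-specific + asynchronous cell is empty.
design = {
    "A": ("domain_spec", "sync"),
    "B": ("generic", "sync"),
    "C": ("generic", "async"),
}


def marginal_mean(factor_index, level):
    """Average score over all systems that carry the given factor level."""
    values = [scores[name] for name, levels in design.items()
              if levels[factor_index] == level]
    return sum(values) / len(values)


# Apparent "main effects" computed the way the paper's analysis would.
domain_gap = marginal_mean(0, "domain_spec") - marginal_mean(0, "generic")
async_gap = marginal_mean(1, "async") - marginal_mean(1, "sync")

print(domain_gap, async_gap)
```

Both gaps are positive here even though A and C score the same: each "effect" is produced entirely by system B appearing in the losing group of both comparisons.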
what went wrong?

borrowed psych method
    … but method embodies assumptions
    single simple cause, controlled environment


interaction needs ecologically valid exp.
    multiple causes, open situations


what to do?
    understand assumptions and modify
numbers and statistics
are five users enough?

one of the myths of usability!
    from a study by Nielsen and Landauer (1993)
        empirical work, cost–benefit analysis and averages,
        many assumptions: simplified model, iterative steps, ...

basic idea: decreasing returns
    each extra user gives less new information

really ... it depends
    for robust statistics – many many more
    for something interesting – one may be enough
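
The decreasing-returns idea can be sketched with the formula usually quoted from that study: if each user independently reveals any given problem with probability L, then n users reveal a proportion 1 - (1 - L)^n of the problems. The value 0.31 below is the commonly cited average and is an assumption here, as are the names `proportion_found` and `detect_rate`. Real products can differ a lot.

```python
def proportion_found(n_users, detect_rate=0.31):
    """Expected share of problems seen at least once by n_users.

    Assumes every user finds each problem independently with the same
    probability detect_rate (a strong simplification).
    """
    return 1.0 - (1.0 - detect_rate) ** n_users


# Print the expected share for a few study sizes.
for n in (1, 3, 5, 10, 15):
    print(n, round(proportion_found(n), 2))
```

With a lower detection rate (say 0.1) the same five users would be expected to reveal well under half of the problems, which is one reason the answer is "it depends".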
points of comparison
measures:
  average satisfaction 3.2 on a 5 point scale
  time to complete task in range 13.2–27.6 seconds
  good or bad?
need a point of comparison
  but what?
  self, similar system, created or real??
  think purpose ...
what constitutes a ‘control’
  think!!
do I need statistics?


finding some problem to fix           NO
to know                               YES
   how frequently it occurs
   whether most users experience it
   if you’ve found most problems
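
As an illustration of why "whether most users experience it" needs statistics, the sketch below computes a Wilson score interval for a proportion. The scenario (3 of 5 users hit a problem) is hypothetical, and `wilson_interval` is an illustrative helper, not part of any library.

```python
import math


def wilson_interval(hits, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion hits/n."""
    p = hits / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


# Hypothetical study: 3 of 5 users ran into the problem.
low, high = wilson_interval(3, 5)
print(round(low, 2), round(high, 2))
```

The interval spans both sides of one half, so five users cannot settle whether most users would meet the problem, even though the problem is clearly worth fixing.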
statistics


need a course in itself!
             experimental design
             choosing right test
             etc., etc., etc.


a few things ...
statistical significance

stat. sig. (p) = likelihood of seeing an effect this large by chance when there is no real effect
           5% (p <0.05) = 1 in 20 chance
           beware many tests and cherry picking!
           10 independent tests means about a 40% chance of seeing at least one p<0.05
  not necessarily large effect (i.e. ≠ important)

non-significant = not proven (NOT no effect)
  may simply not be sensitive enough
  e.g. too few users
  to show no (small) effect need other methods
           find out about confidence intervals!
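
The risk from running many tests can be worked out directly. This sketch assumes the tests are independent and that no real effect exists. `chance_of_false_positive` is an illustrative name.

```python
def chance_of_false_positive(n_tests, alpha=0.05):
    """Probability that at least one of n_tests independent tests
    gives p < alpha when there is no real effect anywhere."""
    return 1.0 - (1.0 - alpha) ** n_tests


# Print the chance for a few numbers of tests.
for n in (1, 5, 10, 14, 20):
    print(n, round(chance_of_false_positive(n), 2))
```

Under these assumptions the chance is about 40% for 10 tests and passes 50% at 14 tests, so a single "significant" result picked from many comparisons deserves suspicion.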
statistical power

how likely effect will show up in experiment
   more users means more ‘power’
            2x sensitivity needs 4x number of users

manipulate it!
   more users (but usually many more)
   within subject/group (‘cancels’ individual diffs.)
   choice of task (particularly good/bad)
   add distracter task
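
The "4x users" rule follows from the standard error of a mean shrinking with the square root of the sample size. A minimal sketch, assuming a simple comparison of means with a fixed standard deviation. The function names and numbers are illustrative.

```python
import math


def standard_error(sd, n_users):
    """Standard error of a mean estimated from n_users measurements."""
    return sd / math.sqrt(n_users)


def users_needed(current_n, sensitivity_gain):
    """Users needed to shrink the standard error by sensitivity_gain."""
    return current_n * sensitivity_gain ** 2


# Hypothetical task times with a standard deviation of 10 seconds.
print(standard_error(10.0, 10), standard_error(10.0, 40))
print(users_needed(10, 2))
```

Quadrupling the group from 10 to 40 only halves the noise, which is why the other options on the slide (within-subject designs, task choice) are often cheaper ways to gain power.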
from data to knowledge
types of knowledge

descriptive
   explaining what happened

predictive
   saying what will happen
                    cause ⇒ effect
   where science often ends

synthetic
   working out what to do to make what you want happen
                    effect ⇒ cause
   design and engineering
generalisation?

can we ever generalise?
every situation is unique, but ...
     ... to use past experience is to generalise

generalisation ≠ abstraction
           cases, descriptive frameworks, etc.

data ≠ generalisation
           interpolation – maybe
           extrapolation??
generalisation ...

    never comes (solely) from data

     always comes from the head

        requires understanding
mechanism


reduction reconstruction
   – formal hypothesis testing
   + may be qualitative too
   – more scientific precision

wholistic analytic
   – field studies, ethnographies
   + ‘end to end’ experiments
   – more ecological validity
from evaluation to validation
validating work

                            your work

 • justification                        • evaluation
       – expert opinion                      – experiments
       – previous research                   – user studies
       – new experiments                     – peer review

 sampling
       singularity?
       different people
       different situations
generative artefacts

                            artefact

toolkits
devices
interfaces
guidelines
methodologies

 • justification                        • evaluation
       – expert opinion                      – experiments
       – previous research                   – user studies
       – new experiments                     – peer review

 singularity
       people, situations
       plus ...
       different designers
       different briefs
 too many to sample

     (pure) evaluation of generative artefacts
           is methodologically unsound
validating work

                           your work

 • justification                        • evaluation
       – expert opinion                      – experiments
       – previous research                   – user studies
       – new experiments                     – peer review
justification vs. validation


    justification                          evaluation



 • different disciplines
    – mathematics: proof = justification
    – medicine: drug trials = evaluation

 • combine them:
    – look for weakness in justification
    – focus evaluation there
example – scroll arrows ...
Xerox STAR – first commercial GUI
      precursor of Mac, Windows, ...
      principled design decisions

which direction for scroll arrows?
      not obvious: moving document or handle?
=> do a user study!
      gap in justification => evaluation
unfortunately ...
      Apple got the wrong designs
