HCI has long moved beyond the evaluation setting of a single user
sitting in front of a single desktop computer, yet many of our fundamentally
held viewpoints about evaluation continue to be ruled by outdated biases
derived from this legacy. We need to engage with real users in 'Living
Laboratories', in which researchers either adopt or create functioning systems
that are used in real settings. These new experimental platforms will
enable researchers to conduct evaluations that span many users, places, times,
and social factors in ways that were unimaginable before.
2. As a field, early fundamental contributions from:
– Computer scientists interested in changes in ways we
interact with information systems
– Psychologists interested in the implications of these
changes
Combustible, because:
– Computer scientists wanted to create great tools, but didn’t
know how to measure impact
– Psychologists wanted to go beyond classical research of the
brain and human cognition
The need to establish HCI as a science
– Adopt methods from psychology
– Good Examples: Fitts’ Law, Models of Human Memory,
Cognitive and Behavioral Modeling, Information Foraging
– Dual purpose: understand nature of human behavior and
build up a science of HCI techniques.
7/24/09 HCIC "Living Lab" 2
3. Many problems don’t fit the laboratory experimental methods
anymore
– Beyond a user in front of computer; Yet evaluation methods mostly
stayed the same
– Controlled lab study as the gold standard for acceptance
Changes and Trends in Social Computing and UbiComp
Old Assumptions                      | New Considerations
Single display                       | Multiple displays
Knowledge work                       | Games, communication, social apps
Isolated worker                      | Collaborative and social groups
Stationary location                  | Mobile and stationary
Short task durations                 | Short and long tasks, and tasks with no time boundaries
Controllable experimental conditions | Uncontrollable experimental conditions
4. Artificial experimental setups can only tell us about behavior in
constrained situations
Hard to generalize to new task contexts (with interruptions,
other tasks, other goals, unfocused attention, more displays)
Hard to generalize to other tools, apps
Ecological considerations
Adoption of mobile technology
iPhones in Japan, single‐handed input [PARC]
Best-selling phones in Indonesia come with a compass [Bell]
Impossible to answer questions about aggregate behaviors
of groups
Aggregate behavior of Wikipedia or Delicious users
5. Conduct research on real platforms and services
– Not to replace controlled lab studies
– Expand our arsenal to cover new situations
Some principles:
– Embedded in the real world
– Ecologically valid situations
– Embrace the complexity
– Rely on big‐data‐science to extract patterns
Not first to suggest this:
– S. Carter, J. Mankoff, S. Klemmer and T. Matthews. Exiting the cleanroom: On
ecological validity and ubiquitous computing. HCI Journal, 2008
– EClass [Abowd], PlaceLab [Intille], Plasma Poster [Churchill and Nelson], Digital
Family Portrait [Rowan, Mynatt]
14. Master's degree was in computational molecular biology
Analogy: Biologists work on model plants and genomes in the lab,
but this tells us only how they behave in an isolated
environment under controlled conditions, not how the
plant will behave in the real world.
Biologists don’t just study models in the lab, but also in
the wild.
15. Two dimensions
– 1. Whether the system is under the control of the researcher
– 2. Whether the study is conducted in the lab or in the wild
                  | System Control                                    | System Not in Control
Laboratory        | (1) Build a system, study in the Lab              | (2) Adopt a system, study in the Lab
Wild (Real World) | (4) Build a system, release it, study in the Wild | (3) Adopt a system, study in the Wild
16. Traditional Approach; Numerous examples
Favored by HCI field reviewers
Typical situation is the study of some interaction technique
– Pen input, gestures, perception of some visualized data, reading tasks,
mobile text input
Typical measures are quantitative in nature
– performance in time, performance in accuracy, eyetracking, learning
measures, user preferences
Issues:
– Not always ecologically valid
– Hard to take all interactions into account
– Often time‐consuming; even though we thought we could do it fast.
17. Harder to find in the literature
Often comparing against an older system as baseline
Typical case is comparison of two systems
– (one website with another, one word processor vs. another)
– Which highlighting feature works better
– Two text input techniques on a cell phone
Typical measures are similar to (1)
Issues:
– Some similar issues to (1) because it’s in lab
– System feature not in control, so not able to compare fairly, or
isolate the feature
19. Real applications in ecologically valid situations
Real findings can be applied to a running system
Impact of research is more immediate, since system is already
running
Typical case is log analytics with large subject pools
– log studies of web sites, real mobile calling usages, web search logs,
studies of Wikipedia edits.
Typical measures are stickiness, amount of activity, clustering
analysis, correlational analysis
Issues:
– Factors not in control, findings not comparable
– Factors cannot be isolated
– Reasons for failure are often just guesswork
20. Hypothesis: Conflict is what drives Wikipedia forward.
How to study this?
– John Tukey paradigm
– Get a large paper, and plot all of the data!
– Downloaded all of Wikipedia and all of the revisions
– Hadoop/MapReduce, MySQL, etc.
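The "plot all of the data" approach can be sketched at small scale as a map step followed by a reduce step. This is a minimal illustration only: the `revisions` rows, the page-type labels, and the function names are hypothetical, and plain Python stands in for a Hadoop job. It computes the percentage of each year's edits that fall on each page type.

```python
from collections import Counter, defaultdict

# Hypothetical revision records: (year, page_type). A real run would parse
# these from a full revision dump; the rows below are made up.
revisions = [
    (2005, "Article"), (2005, "Article"), (2005, "Article Talk"),
    (2006, "Article"), (2006, "User Talk"), (2006, "Maintenance"),
    (2006, "Article"),
]

def map_revision(record):
    """Map step: emit a ((year, page_type), 1) pair for one revision."""
    year, page_type = record
    return (year, page_type), 1

def reduce_counts(pairs):
    """Reduce step: sum the counts for each (year, page_type) key."""
    totals = Counter()
    for key, count in pairs:
        totals[key] += count
    return totals

def edit_shares(records):
    """Return {year: {page_type: percentage of that year's edits}}."""
    totals = reduce_counts(map_revision(r) for r in records)
    per_year = Counter()
    for (year, _), count in totals.items():
        per_year[year] += count
    shares = defaultdict(dict)
    for (year, page_type), count in totals.items():
        shares[year][page_type] = 100.0 * count / per_year[year]
    return dict(shares)

shares = edit_shares(revisions)
for year in sorted(shares):
    print(year, shares[year])
```

Keeping the map and reduce steps as separate functions mirrors how the same counting would be split across machines, although nothing here is distributed.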
21. [Figure: Percentage of total edits by page type (Article, Article Talk, User, User Talk, Other, Maintenance), 2001 to 2006; y-axis from 60% to 100%]
22. [Figure: user groups, labeled Group A, Group B, Group C, Group D]
Number of users in user group     | A  | B | C | Total
Users with Korean point of view   | 10 | 6 | 0 | 16
Users with Japanese point of view | 1  | 8 | 7 | 16
Neutral or Unidentified           | 7  | 3 | 6 | 17
23. [Figure: user groups, labeled Anonymous (vandals/spammers), Sympathetic to husband, Mediators, Sympathetic to parents]
28. Hypothesis: Social Tagging doesn’t scale over time.
How to study this?
– Crawl as much tagging data as we can.
– Study the noise in the system.
– 40 machines for 3 months
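One possible way to quantify "noise" in crawled tagging data is the Shannon entropy of the tag distribution. The slide does not say which measure was used, so treat this as an assumed stand-in; the function name and the sample tag lists are hypothetical.

```python
import math
from collections import Counter

def tag_entropy(tags):
    """Shannon entropy, in bits, of the empirical distribution of tags."""
    counts = Counter(tags)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Made-up samples: the later sample spreads its tags over more distinct labels.
early_tags = ["python", "python", "web", "web"]
later_tags = ["python", "web", "code", "howto"]

print(tag_entropy(early_tags), tag_entropy(later_tags))
```

Under this assumed measure, entropy that keeps rising across successive crawl snapshots would suggest that tags are becoming less informative for finding items.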
35. Similar to (3), practical for running systems; ecologically
valid, impact can be immediate.
– Good for cases in which economics makes sense [Google]
– Changes to the system are possible; factors can be controlled.
Typical case might be A/B testing, large subject pools
Typical measures are being developed
– Impact measures: large visit numbers and interest (measured by blog
posts?), new business inquiries?
– Usability measures vs. Usefulness measures
Issues:
– Effort and resource requirements are dropping but still significant
– Hard for a research lab to take on
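The A/B testing case mentioned above can be illustrated with a two-proportion z-test, one common way to compare two variants of a running system. The slide does not name a test, so this is a sketch under that assumption; the function name and the visitor counts are hypothetical.

```python
import math

def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Return (z, two-sided p-value) for the difference in success rates B - A."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    # Pooled rate under the null hypothesis that both variants perform equally.
    pooled = (success_a + success_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Made-up counts: 1000 visitors per variant, 100 vs. 130 completing the task.
z, p_value = two_proportion_z_test(100, 1000, 130, 1000)
print(f"z = {z:.3f}, p = {p_value:.4f}")
```

The normal approximation behind this test is reasonable only with large subject pools, which is the setting the slide describes.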
43. Evaluation methods are inseparable from the kinds of
science and models that can be built in a field.
Platform advances make inserting real technology into
real-world situations cheaper and more manageable.
[Diagram labels: Characterization, Models, Evaluations, Prototypes]
44. Research Vision: Understand how social computing systems can
enhance the ability of a group of people to remember, think, and
reason.
Living Laboratory: Create applications that harness collective
intelligence to improve knowledge capture, transfer, and discovery.
http://asc‐parc.blogspot.com
http://www.edchi.net
echi@parc.com
WikiDashboard MrTaggy SparTag.us