Design at Large: Integrating Teaching and Experiments Online featuring Scott Klemmer

DESIGN AT LARGE
Scaling the Studio 
and the Lab 
to the Globe
Scott Klemmer

The successes
are tremendously
exciting

…but the failure rate is high. 
The challenge: Design is often faith-based rather than research-based.

“Nothing is as
practical as a
good theory”
–Kurt Lewin

Norman & Klemmer (2014) How design education must change
Why a shortfall of design principles?
• Engineering excels at practical theory 
…from the physical sciences.
• The human world is different
• Introspection is valuable 
…but often misleading
• Industry is empirical 
…but product focused

www.solveforx.com
• Build practical theory with
real-world experiments
• Bake that theory into
software that transforms
<X>
DESIGN AT LARGE

Is this possible?
Klemmer & Carroll (2014) HCI Special Issue: Understanding Design Thinking

“There are no rules of
composition in photography,
there are only good
photographs”
-Ansel Adams

Smith et al. 1993 
Examples can increase conformity...

Will nothing new
ever be created?

Just for Small Innovations?
“By ... metaphors and analogies we try to link the new to the old,
the novel to the familiar. Under sufficiently slow and gradual
change, it works reasonably well;
in the case of a sharp discontinuity, however, the method breaks
down ... our past experience is no longer relevant, the analogies
become too shallow, and the metaphors become more
misleading than illuminating. This is the situation ... for radical
novelty.”
—E.W. Dijkstra, On the Cruelty of Really Teaching Computer Science

Les Demoiselles d'Avignon
John Richardson, A Life of Picasso: The Cubist Rebel, 1907-1916

“Good artists borrow, great artists steal”
—Pablo Picasso
19th century Fang sculpture
Les Demoiselles d'Avignon
John Richardson, A Life of Picasso: The Cubist Rebel, 1907-1916

Design Learning at Large
Chinmay Kulkarni et al.
• Peer and Self Assessment in Massive Online Classes, Chinmay Kulkarni, Koh Pang Wei, Huy Le, Daniel Chia, Kathryn Papadopoulos, Justin Cheng, Daphne Koller, Scott R. Klemmer. TOCHI: ACM Transactions on Computer-Human Interaction, 2013
• The identify-verify pattern scales short-answer grading by combining peer assessment with algorithmic scoring, Chinmay Kulkarni, Richard Socher, Michael S. Bernstein, Scott R. Klemmer. Learning at Scale, 2014
• Talkabout: Making distance matter with small groups in massive classes, Chinmay Kulkarni, Julia Cambre, Yasmine Kotturi, Michael S. Bernstein, Scott Klemmer. CSCW: ACM Conference on Computer Supported Cooperative Work, 2015
• Structure and messaging techniques for online peer learning systems that increase stickiness, Yasmine Kotturi, Chinmay Kulkarni, Michael Bernstein, Scott Klemmer. ACM Learning at Scale, 2015
• PeerStudio: Rapid Peer Feedback Emphasizes Revision and Improves Performance, Chinmay Kulkarni, Michael S Bernstein, Scott R Klemmer. ACM Learning at Scale, 2015

Beyond Being There
Hollan, Jim, and Scott Stornetta. “Beyond being there.” ACM, 1992.

3 ingredients central to
learning,
but hard to scale

1. Feedback on open-ended work
Schön, D. (1987). Educating the reflective practitioner: Toward a new
design for teaching and learning in the professions.

2. Engaging Diverse Perspectives
Model UN, Design Crit
Gurin, P. et al. (2002) Diversity and higher education: Theory
and impact on educational outcomes, Harvard Educational Review

3. Revision for mastery
Ericsson, K. A., Krampe, R. T., & Tesch-Römer, C. (1993). The role of
deliberate practice in the acquisition of expert performance.
Image Courtesy IDEO

Global peer assessment
Interfaces for accurate assessment of open-
ended work
Talkabout
Diversity as a design opportunity:
Small group discussions in massive classes
PeerStudio
Scale as a design opportunity:
Immediate feedback for mastery

The paradox of peer
processes
Non-experts performing expert work

Our approach:
Calibrated peer review
Chinmay Kulkarni, et al.
Peer and Self Assessment in Massive Online Classes, TOCHI 2013
1) Train: calibrate
2) Assess: Peers
3) Reflect (Assess: Self)

Large scale peer assessment 
• Human-computer Interaction: Design
• Teaching character: Management
• Constitutional law: Arguments
• Introduction to Philosophy: Essays
• Social Psychology: Essays
• Programming in Python: Code
• Child Nutrition: Recipes
• World Music: Music
used by 100,000+ students

Assessment training is crucial
[Scatter plots: peer grade (%) vs. self grade (%)]
No training: r=0.58
With training: r=0.73
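The r values compare self grades with peer grades. Assuming they are Pearson correlation coefficients (the slide does not say which coefficient was used), the computation can be sketched as below. The grades are invented for illustration and are not data from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances, each left unnormalized (the 1/n factors cancel).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical grades (%), one entry per student.
self_grades = [90, 75, 60, 85, 70, 95]
peer_grades = [80, 70, 65, 90, 60, 85]
r = pearson_r(self_grades, peer_grades)
print(f"r = {r:.2f}")
```

A value near 1 means students who grade themselves high also tend to be graded high by peers; it says nothing about whether the two sets of grades agree in absolute level.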

How well do peer and staff
assessments correlate?
1) Assess: calibrate
2) Assess: Peers
3) Reflect (Assess: Self)
staff-graded
Dataset: 99 submissions with ~160 peer assessments each.

Grading for a pass-fail class
Extrapolated results from a bootstrapped simulation
• Earn certificate if staff-graded: certificate awarded 97.8%, no certificate 2.2%
• No certificate if staff-graded: no certificate 99.3%, certificate awarded 0.7%
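The slide reports only that these rates come from a bootstrapped simulation. One way such a simulation could work is sketched below: resample a few peer grades per submission, aggregate them, and check whether the pass/fail outcome matches the staff grade. The number of raters, the median aggregation, the pass mark, and the grades are all assumptions of this sketch, not details from the study.

```python
import random
import statistics


def bootstrap_agreement(submissions, pass_mark, raters=4, rounds=1000, seed=0):
    """Estimate how often a peer-derived grade matches the staff pass/fail outcome.

    submissions: list of (staff_grade, [peer_grades]) pairs.
    Each round resamples `raters` peer grades with replacement and
    aggregates them with the median (an assumption of this sketch).
    Returns a dict keyed by the staff outcome (True = pass, False = fail)
    giving the fraction of rounds in which the peer outcome agreed.
    """
    rng = random.Random(seed)
    agree = {True: 0, False: 0}
    total = {True: 0, False: 0}
    for staff_grade, peer_grades in submissions:
        staff_pass = staff_grade >= pass_mark
        for _ in range(rounds):
            sample = rng.choices(peer_grades, k=raters)
            peer_pass = statistics.median(sample) >= pass_mark
            total[staff_pass] += 1
            agree[staff_pass] += peer_pass == staff_pass
    return {k: agree[k] / total[k] for k in total if total[k]}


# Hypothetical submissions: (staff grade, pool of peer grades).
submissions = [
    (85, [80, 90, 75, 88, 70, 95]),
    (40, [35, 50, 45, 30, 65, 42]),
    (72, [60, 78, 74, 55, 81, 69]),
]
rates = bootstrap_agreement(submissions, pass_mark=60)
print(rates)
```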

Students with novel answers
sometimes penalized unfairly
“damn peer review - it was a bunch of
[students] just making things fit into a rubric -
checking off a check sheet - like talking about
dog poop. what is this world coming to?”
-A student in a peer-assessed class

“I've never seen something like that!”
Introduction to Art
“Treasure Cage” from Canada
“Magical lights” from Norway

The return of the
novices-as-experts paradox
Experts:
capture the structure
of rubric
“fully interactive, page flow is
complete… make it clearer
what people should do next”
Peers:
Focus on superficial
features, even when
asked not to
“unpolished… Try to make UI less
coloured.”

Fortune cookies for qualitative,
personalized feedback
• Peers can recognize errors from a list of
patterns, even if they can’t articulate them
• Most errors are variations on a theme
+
“...because _____________________”
Cue Variation

Students Made it Theirs
• Sharing cool interfaces, resources,
articles
• Collating reading lists, creating
assignment aids
• Doing really creative work
• Helping other students
  • heuristic evaluation feedback
  • answering forum questions
  • extra peer assessment

I am Chandramouli Sharma, a junior year undergraduate in Computer Science from the National Institute of Technology Karnataka, India.
I am one of those thousands of students who took the HCI class on Coursera in October 2012. I had timing clashes, so I had to finish the
course during December in vacations.
Here is my amazing journey from a small project in HCI class to a platform that will now be used by thousands of schools in 47 countries
and the awards I won along the way. I have illustrated it through pictures. This is a tribute to you for the great class you took at no fee.
Note: Images might take some time to load.
What I worked on..
I worked on a web application which could display complex environmental pollution data sets into interactive visualizations. This could be
used by school students to understand environmental issues. Below is one of the paper prototypes that I developed during the class.
After a few iterations I came up with a digital prototype. It looked something like this.

Why diversity?
Information [Tudge ’08]:
Different professional
knowledge, educational
systems, and cultural
values
Cognition [Gurin et al. ’02] [Nemeth ’86] [Schwartz et al ’04]:
From passivity to active,
effortful, conscious
thinking

Students are often homophilic
Hurtado, S. et al. (1998) The Climate for Diversity: Key Issues
for Institutional Self-Study.

Talkabout: video discussions
with global peers
Kulkarni, C., et al. “Talkabout: Making distance matter with
small groups in massive classes”, CSCW 2015

Group assignment algorithm
• Talkabout assigns each student to one of many parallel
groups
• Assignment is greedy, constrained by
preferred group size
• Balances gender
• Improves geographic diversity
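A minimal sketch of a greedy assignment along these lines, not Talkabout's actual implementation. The `Student` and `Group` types, the `gain` scoring function, and the default group size are hypothetical; the slide states only that assignment is greedy, bounded by a preferred group size, and favors gender balance and geographic diversity.

```python
from dataclasses import dataclass, field


@dataclass
class Student:
    name: str
    gender: str
    country: str


@dataclass
class Group:
    members: list = field(default_factory=list)


def gain(group, student):
    """Hypothetical score: higher means the student adds more diversity."""
    new_country = all(m.country != student.country for m in group.members)
    same_gender = sum(m.gender == student.gender for m in group.members)
    # Reward a country not yet in the group; penalize piling up one gender.
    return (1 if new_country else 0) - same_gender / (len(group.members) + 1)


def assign(student, groups, preferred_size=4):
    """Greedily place the student in the open group with the best gain."""
    open_groups = [g for g in groups if len(g.members) < preferred_size]
    if not open_groups:
        # Every group is full: open a new parallel group.
        open_groups = [Group()]
        groups.append(open_groups[0])
    best = max(open_groups, key=lambda g: gain(g, student))
    best.members.append(student)
    return best


# Hypothetical arrivals, assigned one at a time to two parallel groups.
groups = [Group(), Group()]
arrivals = [
    Student("A", "F", "US"),
    Student("B", "F", "US"),
    Student("C", "M", "IN"),
    Student("D", "M", "IN"),
]
for s in arrivals:
    assign(s, groups, preferred_size=2)
for g in groups:
    print([(m.name, m.gender, m.country) for m in g.members])
```

Because the choice is made per arrival and never revisited, the result depends on arrival order; that is the usual trade-off of a greedy scheme against a global optimization.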

Lex, student in
Organizational
Analysis

Discussants as far apart as
New York and London
Median pair-wise distance 4,100 mi (6,600 km)
[Histogram: number of discussions by number of countries in discussion (2 to 6)]
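A median pair-wise distance for one discussion can be computed as sketched below, assuming great-circle (haversine) distance between participant locations. The slide does not state the distance measure, and the coordinates are illustrative.

```python
import math
import statistics
from itertools import combinations

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def median_pairwise_km(locations):
    """Median distance over every unordered pair of participants."""
    return statistics.median(
        haversine_km(a, b) for a, b in combinations(locations, 2)
    )


# Hypothetical three-person discussion.
new_york = (40.7128, -74.0060)
london = (51.5074, -0.1278)
mumbai = (19.0760, 72.8777)
median_km = median_pairwise_km([new_york, london, mumbai])
print(f"{median_km:.0f} km ({median_km / 1.609344:.0f} mi)")
```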

Students discuss twice as long
as instructors asked them to
[Histogram: number of students by discussion duration (minutes), with the recommended duration and the median duration marked]

Do diverse, small-group discussions
improve learning outcomes?
1. Does participation help?
2. Does diversity amplify
participation benefits?
IRB #30319

Study: Benefits of
Participation
• n=934, Irrational Behavior
• Dependent measure: total course grade (%)
• Between-subjects
• Wait list condition: no Talkabout for first half of class
• Discussion condition: Talkabout throughout class

Course grades higher in
discussion condition
Irrational Behavior (p<0.05)
[Bar chart: total grade (%), Discussion vs. Wait list (control)]
6% of total grade

Study: Benefits of Diversity
• n=2,422, Social Psychology
• Quasi-experiment: discussants assigned to
first available group
• Result: natural variation in diversity
• Measure: performance on final exam
• OLS regression controls for prior performance
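A sketch of what "controls for prior performance" means in an OLS regression: the exam score is regressed on diversity and prior performance together, so the diversity coefficient is estimated with prior performance held fixed. The variable names, the diversity measure (countries per discussion), and the data are hypothetical. The data are constructed without noise from final = 10 + 3*diversity + 0.5*prior so the fitted coefficients can be checked.

```python
def ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    rows: one list of predictor values per observation.
    An intercept column is added here; returns [intercept, b1, b2, ...].
    """
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * yi for x, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Hypothetical observations: (diversity, prior performance %).
predictors = [(0, 50), (1, 60), (2, 40), (3, 80), (1, 70), (2, 55)]
final_exam = [10 + 3 * d + 0.5 * p for d, p in predictors]

intercept, b_diversity, b_prior = ols(predictors, final_exam)
print(round(intercept, 3), round(b_diversity, 3), round(b_prior, 3))
```

Here `b_diversity` is the quantity of interest: the expected change in exam score per unit of diversity among students with the same prior performance.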

Diverse discussions lead to
higher final scores
[Bar chart: grade difference (most diverse minus least diverse) for Social Psychology and Organizational Analysis; values shown: 3.6%, 2.4%]

Evaluation Goals

Talkabout as a springboard to
global friendships
“We shared emails because we are discussing
issues that require a strong, networked group
to change the status quo… the impact would
be far greater if participants could connect
and engage outside of the course”
-Student in International Women’s Health and Human Rights class
[Bar chart, “Shared contact info with group” (0% to 100%): bars for Average (9 classes) and International Women’s Health and Human Rights; values shown: 92%, 47.2%]

5,000+ students from 134 countries
• Social Psychology
• International Women’s Health & Human Rights
• Learning How to Learn
• How to Change the World
• Understanding Research Methods
• Irrational Behavior
• Critical Perspectives on Management
• Organizational Analysis
• Think Again: How to Reason and Argue
translated by students into French & Spanish

PeerStudio scales interactive
peer feedback
Kulkarni C., Bernstein M., Klemmer S. (2015)
“PeerStudio: Rapid Peer Feedback Emphasizes Revision and Improves
Performance”, Learning@Scale
1) Submit for feedback
2) Give feedback to two peers
3) Read feedback & revise
4) Submit for grades
How might we lower the
training burden?
Median hours in activity:
• Training (evaluation): 1.9 hours
• Creating own work (submission): 10.5 hours

Solution: contrasting cases for
training-free micro-expertise
Thompson, Gentner, Loewenstein (2000),
“Analogical Training More Powerful Than Individual Case Training”
Average Peer-majority/Staff
difference: 5.7%

Time to first feedback:
Learning How to Learn
[Histogram: number of submissions by time to first review, binned from <10 min to >24 hr]

Problem: Accurate feedback
is not always actionable

Solution: Real-time tips for
actionable feedback
• Correctness and velocity feedback lead to
large improvements
• Specific, topic-relevant feedback is more useful
• Logistic regression with bag-of-words features
predicts relevance
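A minimal sketch of logistic regression over bag-of-words features, written from scratch so it has no dependencies. It is not PeerStudio's model: the tokenizer, learning rate, epoch count, and the six labeled comments are all assumptions made for illustration (label 1 = relevant to the work, 0 = not).

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased word counts; word order is discarded."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def predict(weights, bias, text):
    """Probability that a comment is relevant, under the learned weights."""
    z = bias + sum(weights.get(w, 0.0) * c for w, c in bag_of_words(text).items())
    return 1.0 / (1.0 + math.exp(-z))


def train(examples, epochs=200, lr=0.1):
    """Fit weights by stochastic gradient descent on the log loss."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for text, label in examples:
            error = predict(weights, bias, text) - label  # d(loss)/dz
            bias -= lr * error
            for w, c in bag_of_words(text).items():
                weights[w] = weights.get(w, 0.0) - lr * error * c
    return weights, bias


# Hypothetical training comments.
examples = [
    ("I think you should add a clearer call to action on the home page", 1),
    ("your work could use a larger font for the navigation labels", 1),
    ("you need to explain how the prototype handles errors", 1),
    ("I hope you do well", 0),
    ("great job, I wish you the best", 0),
    ("nice work, I think you are very talented", 0),
]
weights, bias = train(examples)
for text, label in examples:
    print(label, round(predict(weights, bias, text), 2), text)
```

With so few examples the model simply memorizes the training set; a usable predictor would need many labeled comments and held-out evaluation.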

Solution: Real-time tips for
actionable feedback
1 Calculate an internal score
for each rubric dimension
2 Generate tips for reviewer
Overall, 81% of students
received actionable
comments

Without hints, students focus on
author, and what’s good
I think you are, I wish you, I hope you…
With hints, students focus on
work and what could be better
I think you should, you need to,
your work could…

Study: Does fast feedback
improve final performance?
N=104 in “Medical Education in the New Millennium” (edX)
• Early feedback, fast (<1 hr): grades 4.4% higher than No early feedback
• Early feedback, delayed 24 hours: grades same as No early feedback

• Build practical theory with
real-world experiments
• Bake pedagogy into
software that transforms
learning
SCALING THE STUDIO
http://d.ucsd.edu/peer

This is a multidisciplinary effort.

fundamental understanding practical impact

Pasteur’s Quadrant
Axes: fundamental understanding, practical impact
Stokes (1997) Pasteur's Quadrant: Basic Science and Technological Innovation

The Design Lab creates postcards from the future

How might we…
scale personalized mastery-learning
experiences?
http://d.ucsd.edu/peer

Let’s match this enthusiasm with insight
Be the thermostat, not the thermometer

http://designlab.ucsd.edu
Scott Klemmer
@DesignAtLarge

Thomke (2000) Experimentation matters: unlocking the potential of new technologies for innovation
Learning through Prototyping
“Never go to a meeting
without a prototype...” 
—Boyle’s Law

Design Process at Large
Steven Dow
Asst Prof, CMU
• Early and Repeated Exposure to Examples Improves Creative Work, Chinmay Kulkarni, Steven P Dow, Scott R Klemmer. Cognitive Science, 2012.
• Prototyping Dynamics: Sharing Multiple Designs Improves Exploration, Group Rapport, and Results, Steven P Dow, Julie Fortuna, Dan Schwartz, Beth Altringer, Daniel L Schwartz, Scott R Klemmer. CHI: ACM Conference on Human Factors in Computing Systems, 2011.
• Parallel Prototyping Leads to Better Design Results, More Divergence, and Increased Self-Efficacy, Steven P Dow, Alana Glassco, Jonathan Kass, Melissa Schwarz, Daniel Schwartz, Scott R Klemmer. ACM Transactions on Computer-Human Interaction, 2010
• The Efficacy of Prototyping Under Time Constraints, Steven P. Dow, Kate Heddleston, Scott R Klemmer. Creativity & Cognition, 2009

Participants picked their concept early
“I went with the whole parachute idea and what I had from the
beginning...”
“This is the best approach for such a design...”
“I am not a very good outside-the-box thinker, so I kinda just had one idea
and I was going to try to make it work...”
“No... for some reason... this seems to be the only idea. There needs to be a
platform and then as good of cushion as possible... I don’t see any other way.”

Duncker, 1945
Functional Fixation

Can process
offer a fixation
antidote?
SERIAL: Prototype -> Feedback -> Prototype -> Feedback -> Prototype

Web-scale
experiments as a
research platform
DESIGN AT LARGE

Task: Design a Web Ad (N=33)
Parallel prototyping condition
Serial prototyping condition
FINAL

Parallel design -> more clicks
Clicks per million impressions: Parallel 445, Serial 398
F(1,30)=4.227, p<.05

...and more time on the site
Average time on client site per visitor (seconds): Parallel condition 31.3, Serial condition 12.9
F(1,493)=3.172, p=0.076

...and higher expert ratings
Likert-scale rating (0-50): Parallel condition 24.4, Serial condition 21.7
F(1,5)=7.948, p<0.05

...and more diverse designs
Similarity rating (0=not at all similar, 7=highly similar): Parallel 2.78, Serial 3.18
F=182, p<0.001

Gentner, Loewenstein, & Thompson, 2003
Comparison aids learning
Training session:
• SEPARATE CASES: Case #1, “Describe the solution.” Case #2, “Describe the solution.”
• COMPARISON CASES: Case #1 and Case #2, “Describe the parallels of these solutions”
Learning outcome: solutions to a landlord-renter lease
~ 3x

Sharing Multiple Benefits
• User engagement
• Expert rating
• Individual exploration
• Feature sharing
• Conversational turns
• Consensus
• Rapport
[Bar chart: clicks per million impressions for Share Multiple, Share Best, and Share One; values shown: 774.6, 734.9, 1072.1]
χ2=4.72, p<0.05
