My keynote for the iSay conference "The Shape of Things"
http://isayevents.wordpress.com/shapeofthings/program/
My notes from the conference are at http://openobjects.blogspot.co.uk/2013/02/notes-from-shape-of-things-new-and.html
1. The gift that gives twice:
crowdsourcing as productive
engagement with cultural heritage
Mia Ridge, Open University
http://openobjects.org.uk
@mia_out
The Shape of Things: New and emerging technology-enabled models of participation through VGC (visitor-generated content)
4. What is crowdsourcing?
'the spare processing power of millions of human brains'
National Library NZ on The Commons
https://www.flickr.com/photos/nationallibrarynz_commons/3326203787/
5. Crowdsourcing and related terms
The Library of Congress
https://www.flickr.com/photos/library_of_congress/2163131283/
6. What’s VGC, what’s crowdsourcing?
When there's no clearly defined direction, shared goal or research question?
Smithsonian Institution Archives, http://photography.si.edu/SearchImage.aspx?id=279
7. Participatory project models
Contributory: the public contributes data to a project designed by the organisation
Collaborative: both active partners, but led by the organisation
Co-creative: all partners define goals together
Center for Advancement of Informal Science Education (CAISE)
8. But who really has agency?
"I participate, you participate, he participates, we participate, you participate... they profit." (1968)
Via 'A Ladder of Citizen Participation', Sherry R. Arnstein
10. Crowdsourcing in GLAMs
(GLAMs: galleries, libraries, archives and museums)
Cornell University Library https://www.flickr.com/photos/cornelluniversitylibrary/3611951684/
11. Why crowdsourcing in GLAMs?
Cornell University Library https://www.flickr.com/photos/cornelluniversitylibrary/4558314825/
12. Who participates in crowdsourcing?
UW Digital Collections http://www.flickr.com/photos/uw_digital_images/4476958262/
13. Super-contributors and drive-bys
‘16,400 little boxes – one for each person who’s contributed to oldWeather. The area of each box is
proportional to the number of pages transcribed, between us all we’ve done 1,090,745 pages.’
http://blog.oldweather.org/2012/09/05/theres-a-green-one-and-a-pink-one-and-a-blue-one-and-a-yellow-one/
14. Crowdsourcing before the web
• 19th-century natural history collecting
• 1849 Smithsonian meteorological observation project
• 1857, 1879 Oxford English Dictionary
• WWII soldiers given a Field Collector's Manual in Natural History by the US Museum of Natural History
James Murray, editor, OED, with contributor slips
https://en.wikipedia.org/wiki/File:James-Murray.jpg
15. Productive process and outcomes
The Library of Congress https://secure.flickr.com/photos/library_of_congress/2178435033/
18. Trove
Over 85 million lines of text corrected; 1.9 million tags; 50k comments; 82k registered users
19. FamilySearch
2012 Statistics
Total records indexed: 534,108,416
Total records arbitrated: 263,254,447
Total volunteers contributing: 348,796
Total estimated hours contributed: 12,764,859
On “5 Million Name Fame” event day, July 2012:
Indexed Records: 7,258,151
Arbitrated Records: 3,082,728
Total Records Worked: 10,340,879
Volunteers participating: 46,091
https://familysearch.org
33. Engagement
1. 'attending'
2. 'participating'
3. 'deciding'
4. 'producing'
Department for Culture, Media and Sport, 'Culture and Sport Evidence', 2011
The U.S. National Archives http://www.flickr.com/photos/usnationalarchives/3678706327/
34. 'Levels of Engagement' in citizen science
• Level 1: participating in simple classification tasks
• Level 2: participating in community discussion
• Level 3: 'working independently on self-identified research projects'
(Raddick et al, 2009)
State Library of Queensland, Australia
https://www.flickr.com/photos/statelibraryqueensland/4603281578/
36. FamilySearch ‘stepping stones’
• Indexing as 'introductory, family history education', including:
– Knowledge about Record Types
– Genealogical Information
– Handwriting Practice
• From indexing, can move to arbitration
– Invited after transcribing 2,000 records with 94% accuracy or higher
37. Crowdsourcing and motivations for participation
Powerhouse Museum Collection https://secure.flickr.com/photos/powerhouse_museum/2633069104/
38. Motivations for participation
• Altruistic – helping to provide an accurate record of local history
• Intrinsic – reading 18th-century handwriting is an enjoyable puzzle
• Extrinsic – an academic collecting a quote from a primary source
41. Intrinsic motivations
• fun
• the pleasure in doing hobbies
• the enjoyment in learning
• mastering new skills, practicing existing skills
• recognition
• community
• passion for the subject
State Library of Queensland, Australia
https://secure.flickr.com/photos/statelibraryqueensland/3198305152/
42. Intrinsic motivations
People crave:
• satisfying work to do
• the experience of being good at something
• time spent with people we like
• the chance to be a part of something bigger
(Jane McGonigal, 2009)
State Library of New South Wales collection
https://secure.flickr.com/photos/statelibraryofnsw/2880982738/
43. Crowdsourcing in cultural heritage
• Fun, challenging tasks
• Interesting subjects
• Interfaces and community provide scaffolding
• Lots of content for museums
The U.S. National Archives
http://www.flickr.com/photos/usnationalarchives/4266498500/
46. Thank you!
Questions?
Mia Ridge
Open University
http://openobjects.org.uk
@mia_out
The Library of Congress
https://www.flickr.com/photos/library_of_congress/2179923364/
Editor's Notes
Hi, I’m Mia... Going to talk about how crowdsourcing in cultural heritage is productive in two ways: it’s very effective for producing content, but the process of producing that content is also a form of productive engagement. Understanding what audiences get out of participating in projects helps design better, more productive projects.
Can you tell I've been playing with visualisations lately? Currently a PhD student in Digital Humanities in the Department of History, Open University, and I'm also Chair of the Museums Computer Group, or MCG. Previously, cultural heritage technologist (Science Museum, Museum of London, Melbourne Museum; programmer, analyst, user experience design and research) until I realised I had to get the PhD out of my system. My doctoral research investigates the potential technical, interface and cultural requirements for collaboratively creating digital resources by aggregating the collecting and digitisation work that academic, family/local historians and other researchers would be doing as part of their normal research work. In other words, I'm researching how historians use, evaluate and contribute to collaboratively created resources as a form of scholarly crowdsourcing. This scholarly crowdsourcing is also a form of 'participant digitisation', where the work that people are already doing is centrally harvested or aggregated. For my MSc in Human Computer Interaction I researched, designed and evaluated crowdsourcing games designed to help enhance metadata about 'difficult' (technical, repetitive, boring) museum objects, using principles of casual game design...
Image: Election night crowd, Wellington, 1931. Source: National Library NZ on The Commons https://www.flickr.com/photos/nationallibrarynz_commons/3326203787/
Crowdsourcing (Jeff Howe and Mark Robinson, Wired, 2006): "the act of a company or institution taking a function once performed by employees and outsourcing it to an undefined (and generally large) network of people in the form of an open call". Or, using 'the spare processing power of millions of human brains' – the idea that there's a 'cognitive surplus' of potential attention that could be turned away from TV to other activities.
So what's not crowdsourcing? As Estellés-Arolas and González-Ladrón-de-Guevara (2012) point out, crowdsourcing is evolving to the extent that the label may be applied to almost any internet-based collaborative activity. The lines are also blurred between crowdsourcing and related terms such as cognitive surplus (Shirky, 2011). So we might exclude some activities designed to supplement computational methods, but 'human computation' – 'using human effort to perform tasks that computers cannot yet perform, usually in an enjoyable manner' (Law and von Ahn, 2009), such as image recognition – is strongly related, where collective intelligence (Quinn and Bederson, 2011) and the 'wisdom of the crowds' might not be. There are a number of terms to describe directed work around historical or scientific material. Terms such as 'community-sourcing', or working with people who already have a relationship with your institution (Phillips, 2010), and 'nichesourcing', where tasks are 'distributed amongst a small crowd of amateur experts' (de Boer, Hildebrand, Aroyo et al, 2012), have also been used. Citizen science: projects that involve 'volunteers from the general public in scientific investigations', and 'citizen history' has been coined from that. 'Scholarly crowdsourcing' is how I'm describing the kinds of participant digitisation I'm interested in – the collaborative creation of resources through collection, digitisation, description or transcription for use in academic and amateur scholarly research. Content created with organisations, such as visitor- or user-generated content, can be tricky...
Difficult area to define, and the division may not matter too much, but it’s sometimes a useful distinction.
Crowdsourcing is usually contributory, though more ambitious projects aim to be collaborative or co-creative (see caveats in Whose Cake Is It Anyway? report)
Another way to look at the question, particularly for collaborative and co-creative projects... Who has agency in the process? Important to think throughout the day about whose voice can effect substantive change. Image via http://www.lithgow-schmidt.dk/sherry-arnstein/ladder-of-citizen-participation.html Arnstein, Sherry R. "A Ladder of Citizen Participation," JAIP, Vol. 35, No. 4, July 1969, pp. 216-224
It’s important to address the ethics of crowdsourcing, though I think museums are lucky that people are generally only going to participate if they’re also getting something out of it, and we're working toward the common good.
When well designed, crowdsourcing projects meet the core mission of museums as they go beyond digitising content, improving metadata or identifying specimens to providing engaging experiences; they position museums as platforms for enjoyable, meaningful activity.
Two kinds of gap – semantic gap, gap in resources to digitise backlog
People who are passionate about your subject. People who like doing the task you're offering. People who can't volunteer in regular hours or at your venues.
Some people do a lot of the work, and a lot of people do some of the work. This represents all 16,400 people who have transcribed at least one page for Old Weather. Source: http://blog.oldweather.org/2012/09/05/theres-a-green-one-and-a-pink-one-and-a-blue-one-and-a-yellow-one/
The long tradition of volunteering in museums and cultural heritage encompasses both citizen science and citizen history. The Oxford English Dictionary is one famous example, though in today’s terms we might say they started out with nichesourcing and moved to crowdsourcing. There are many examples of natural history collecting and observation.
People don't realise that they're helping digitise books 'One Word at a Time' every time they fill in one of these... 'reCAPTCHA is a free CAPTCHA service that helps to digitize books, newspapers and old time radio shows. ... Currently, we are helping to digitize old editions of the New York Times and books from Google Books.' http://www.google.com/recaptcha/learnmore The audio version is helping transcribe 'audio from old time radio shows that speech recognition software could not decipher correctly' http://blog.recaptcha.net/2008/12/new-audio-recaptcha.html So not strictly crowdsourcing in cultural heritage, but a good example of the types of jobs that crowdsourcing helps people accomplish that computers can't do yet, and of how productive it can be to tie crowdsourcing in with existing tasks like posting a comment somewhere. It's also a counter-example, as it's not as engaging for the person doing it.
Alongside typical crowdsourcing actions like commenting and tagging, the National Library of Australia's Trove newspaper database makes visible the issues with optical character recognition (OCR) – the errors in transcribing old typefaces and newspaper layouts that can make searches in the database inaccurate and render content meaningless – and asks the public to help improve it. The crowdsourced functionality in Trove is closely aligned to the needs of its users, who would already be correcting text from the digitised originals for their own uses. By providing a tool through which participants can share their corrections, their individual work benefits all users. 1,888,907 tags; 50,377 comments; 82,122 registered users, 7,699 active so far in January. Source: http://trove.nla.gov.au/system/stats?env=prod
Your contribution makes a visible difference immediately... Supported by effective design that makes correcting text a satisfying interaction, the user experience is further enhanced by the immediate appearance of the corrected text on the page (alongside the editing history). This shows participants the value of their contribution by making their corrections immediately available for the benefit of other users.
Source: https://familysearch.org
'The FamilySearch Indexing app simplifies indexing by allowing you to transcribe individual names, or “snippets,” on your mobile device instead of downloading larger batches of names that must all be transcribed as part of a group. (You also have the option to view the entire document so you can see the name in context.) You can set a difficulty level and skip snippets that are too hard to read.' They're experimenting with tasks that fit into lifestyles, while checking accuracy rates remain at the high level they need. Source: https://tech.lds.org/blog/455-new-familysearch-indexing-app-now-available
OCR correction games from the National Library of Finland. Who doesn’t want to save moles from certain death? Also built in validation activities into the system as well as transcription.
Very productive super-taggers… designed for scholarly use, but manual validation creates backlog. Post to the blog about progress. Overall: 60% of the 8,164 manuscripts uploaded to the website have been transcribed thus far.
Great example of power of community and serious validation. Climate data informs climate science models, so has to be right. Used proper maths to work out how many contributors needed to transcribe something to get accurate data. Over a million pages transcribed. Did 1 million pages within about 18 months, and finished them a few months later. So successful, had to find new content to give participants…
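The notes say Old Weather 'used proper maths' to work out how many independent transcriptions each page needed before the data was accurate enough for climate models. The project's actual statistical method isn't given here; a minimal sketch of the general idea, assuming independent transcribers and simple majority voting (function names are illustrative, not from the project):

```python
from math import comb

def majority_accuracy(p: float, n: int) -> float:
    """Probability that a strict majority of n independent transcribers,
    each correct with probability p, produces the correct reading."""
    k_needed = n // 2 + 1  # smallest strict majority
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(k_needed, n + 1))

def transcribers_needed(p: float, target: float, max_n: int = 99) -> int:
    """Smallest odd number of independent transcriptions per page
    whose majority vote meets the target accuracy."""
    for n in range(1, max_n + 1, 2):  # odd n avoids tied votes
        if majority_accuracy(p, n) >= target:
            return n
    raise ValueError("target not reachable within max_n transcribers")
```

For example, with transcribers who are individually 90% accurate, three independent transcriptions give about 97% majority-vote accuracy, and reaching 99.9% takes nine; more accurate volunteers need far less redundancy.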
Really focussed design. Altruistic and subject specialist motivation; clear sense of what to do next… Also topical content - new menus, presidential inauguration menu… 1,165,528 dishes transcribed from 16,301 menus (from their collections of around 45,000 menus dating back to the 1840s)
Further down the page, requests for help with specialist tasks, supporting a range of potential users.'Help fix misspellings, fill in missing data... ' - if you can't bear to see a misspelled word go by, you might not be able to resist that. They also include proof of the value of your contributions - '16,243 menus digitized and counting...' and a way to check out the content before committing to action.
Highly structured task – not much room for uncertainty, reduces cognitive overhead of the task, makes it more enjoyable
UK RED is an open-access database with over 30,000 records documenting the history of reading in Britain from 1450 to 1945. Evidence of reading presented in UK RED is drawn from published and unpublished sources as diverse as diaries, commonplace books, memoirs, sociological surveys, and criminal court and prison records. The point here is the difference motivation can make – people working on this are often using it as a research database, so they're deeply motivated to contribute, even if the form (and this is just part of it) looks complicated. Source: http://www.open.ac.uk/Arts/reading/UK/about.php and Contribute page
'The initial pilot, released in February 2012, was a tremendous success, with all 724 maps georeferenced by the public in less than one week.' They've just released more, but if you want to try it you'll have to be quick. http://www.bl.uk/maps/index.html This example shows the power of finding the right audience, particularly through something as fast as social media. People started tweeting about it and by the time the press release was ready, all the maps were georeferenced. Christopher Fleet, Kimberly C. Kowal, Petr Přidal. 'Georeferencer: Crowdsourced Georeferencing for Map Library Collections'. D-Lib Magazine. http://www.bl.uk/maps/
So to sum up...
Think of tasks as atoms of crowdsourcing; tasks join together to form activities, etc.
So what kinds of engagement can you get through participating in crowdsourcing projects? But first, what is engagement? Not just 'turning up'...
A useful model for thinking about engagement and participation in cultural heritage projects... Department for Culture Media and Sport 'Culture and Sport Evidence' CASE (2011) defines four types of engagement, each of which builds on the previous level: 1) 'attending' – paying conscious, intentional attention to content; 2) 'participating' – interaction that contributes to the creation of content; 3) 'deciding' – making decisions about the delivery of resources for content creation; and 4) 'producing' – creating content 'which has a public economic impact'.
Participating in community also seems to be a key reason for on-going participation in traditional and online volunteering... see some examples of self-identified research projects next. Suggests you should leave room for curiosity to develop...
Herbaria@Home started in 2006. It aims to document historical herbarium collections within museums based on photographs of specimens supplied by museums. So far participants have documented # historic specimens, and some have also found themselves being interested in the people whose specimens they were documenting. As a result, the project has expanded to include biographies of the original collectors. http://herbariaunited.org/wiki/Harry_Corbyn_Levinge or http://herbariaunited.org/wiki/Augustin_Ley We've seen a similar thing in Old Weather, where some people who've become intrigued by the oddities they've come across while transcribing weather observations started to get interested in the history of the ships they were working on, started learning about maritime history and writing ships' histories. FamilySearch quite deliberately aims to turn indexers into family historians...
FamilySearch encourage people who aren't interested in family history to start out as transcribers. Transcribing words is a good example of a microtask, a good way to start a complex process. FamilySearch know that transcribers will probably end up being interested in finding out more about their families as they're exposed to other people's histories. They also encourage some people to move into positions of responsibility for checking other transcriptions. Source: Davis, Jessie. 2012. 'Stepping Stones of Genealogy'. FamilySearch Blog. November 20. https://familysearch.org/blog/en/stepping-stones-genealogy/.
Went back to literature on motivations for participation in open source projects, Wikipedia, volunteering in museums etc for a grounding in relevant theory about participation.
The same task (such as transcribing sections of a historic document) could be undertaken for altruistic, intrinsic or extrinsic reasons... Usually more than one motivation per person. Motivations change over time – different when deciding to start participating than when continuing. Much research on motivations for participation in non-commercial crowdsourcing projects comes from citizen science, or other 'community-based peer-production projects' like open source software. Intrinsic motivation: behaviour, such as a hobby, that is initiated without obvious external incentives. Extrinsic motivation is activated by external incentives, such as direct or indirect monetary compensation, or recognition by others.
Extrinsic motivations include getting your own work done, getting research published, or playing a game. Those activities might have their own intrinsic motivation, but as it's not directly related to the crowdsourcing task, it's extrinsic in this context.
Turns out there's a whole ecosystem around matching volunteers to opportunities. In the research, 'the importance of the project's goals' and 'helping out' were important motivations for participating in crowdsourcing and volunteering in cultural heritage. Some Wikipedians are motivated by 'ideology'. However, it's often not a pure motivation – I discovered in my game project that people can use altruistic projects to justify enjoyable pastimes.
Some things are just intrinsically fun, whether it’s playing with a hose or collecting Pokemon.Some of the learning scaffolding or progression comes from the community, not from the interface.
Source: http://www.aam-us.org/resources/publications/museum-magazine/museums-as-happiness-engineers and http://www.youtube.com/watch?v=zJ9j7kIZuoQ&feature=plcp
So, to sum up...
So, then, it seems crowdsourcing in cultural heritage is at the intersection between tasks that need to be done that people can do better than computers, and content created through engagement with cultural heritage that contributes towards a shared, significant goal. The process might have as much value as the results. Working towards a shared significant goal, whether through tasks related to your own work, as a side-effect of game play or engaging in an intrinsically enjoyable task, is what's important – working towards something bigger than yourself. Venn diagram made via https://www.lucidchart.com/documents/edit/4d95-5078-5108522a-9bb1-6a540a004234#?demo=on