Presentation by Michael Zimmer at the Internet Research Ethics preconference workshop on 10/20/2010. Part of Internet Research 11.0, the 11th annual conference of the Association of Internet Researchers (AoIR).
1. What is a text, as opposed to a (private) utterance? What can be used, and does it matter who collected it, and how? Oct 20, 2010. Ethics and Internet Research Commons: Building a sustainable future
2. A current discussion on Air-L asks whether blogs are “texts” and thus completely outside the purview of IRBs: “The act of publication is to make public a set of ideas, and at that point it becomes an artifact--a text--game for analysis without the concern of human subject research ethics (in my opinion). Again, if the authors attempt to password-protect their work, that's an IRB-worthy issue, but otherwise, even if it's about a "personal matter," the act of publication is a public thing...thus no IRB needed.” (source redacted)
3. What is a text? Presumption that any publication online is fair game for collection and use in research, so long as no proactive attempt was made to restrict access.
4. Complications: Not all “publications” online are equal. Blogging is often a broadcast medium, whereas status updates and tweets have more of an imagined audience, even when public. Items are re-blogged and re-tweeted, typically without the consent or control of the source. Password protection should not be the gold standard: not everyone is technically literate, and many users presume limited visibility.
5. Is this an open text? A parent starts a blog for the local PTA, and the comments include remarks intended for that limited context and audience. An 18-year-old starts a LiveJournal with personal content; anyone with an account can view it. A Twitter feed includes re-tweets from identifiable accounts, and it is unclear whether the original tweets were open or restricted.
6. What can be used? Presumption that anything publicly accessible by any means is fair game.
7. Complications: Just because something is accessible to a researcher doesn’t mean the owner meant it to be scraped, harvested, or mined. The owner might have a presumption of obscurity, or might not recognize the power of crawlers and scrapers.
8. Should these be used? A public blog, now deleted from Blogger, that remains freely accessible via the Internet Archive. A long-abandoned Twitter stream with only 5 followers. Facebook profile data from an account started in 2007 with no recent activity, but with information forced to be public by platform changes.
9. Case 1: Pete Warden Facebook DB. An independent engineer devises a way to scrape public Facebook accounts. He doesn’t access them from within Facebook, thus (presumably) avoiding a violation of Facebook’s TOS. He harvests public profile information from 215 million accounts and plans to release it to the public without any de-identification.
10. Case 1: Pete Warden Facebook DB. Is this method of scraping, without even logging into Facebook, acceptable? Did users envision this type of access and harvesting when making their profile information public? Can researchers use this data? What would an IRB say?
11. Case 2: Library of Congress to Archive Public Twitter Streams. The LOC and Twitter strike an agreement to archive all public tweets. After a 6-month delay, all public tweets are sent to the LOC. Use is non-commercial only, and the archive is neither publicly available nor available for bulk download, but “analytic tools” will be created.
12. Case 2: Library of Congress to Archive Public Twitter Streams. What personal information will also be included? Name and bio? Geo-locational data for each tweet? My private tweets that have been retweeted in the public stream are included, with no opt-out. Can researchers use this data? What would an IRB say?
13. What is a text, as opposed to a (private) utterance? What can be used, and does it matter who collected it, and how?