A conversation about Twitter's recent moves to enforce aspects of its API Terms of Service (TOS) and prohibit online research services from offering archives for download. This was informed by recent discussion on the AoIR (Association of Internet Researchers) mailing list and by my own experiences.
6. Twitter-History a.k.a. “Twistory”: “We hope Twitter will realize the value of enabling researchers, journalists and citizens better ways to search, sort and analyze clusters of this important historical information.”
8. Twitter says “desist!”: Prohibited other services from offering archives (for download), e.g., 140kit, TwapperKeeper, DiscoverText, ... Shut down 3rd-party clients (Twidroyd & UberTwitter) for: private Direct Messages longer than 140 characters; trademark infringement; changing the content of users' Tweets in order to make money.
9. Twitter responds: “... abide by a simple set of rules that are in the interests of our users, as well as the health and vitality of the platform as a whole.” “... on an average day we turn off more than one hundred services that violate our API rules of the road.” “You can download Twitter for Blackberry, Twitter for Android and other official Twitter apps here. You can also try our mobile web site or apps from other third-party developers.”
11. Perspectives: online social messaging service (users); open ecosystem infrastructure (developers); historical social record (researchers). Post “tweets” of at most 140 characters in real time; publicly accessible (cf. CB radio) with some privacy; provides (limited) search; uses & develops open-source software (e.g., Cassandra, Lucene, FlockDB, ...).
13. Some Twitter numbers: Valuation: $4 billion (January 2011); Investment: $360 million in total ($200 million in Dec 2010); Employees: 400 (Jan 2011), 200 of them engineers; Revenue: ad revenue estimated at $150 million for 2011; No. of tweets: 140-150 million per day; Users/accounts: approx. 200 million; Website ranking: Top 10 to Top 20; Twitter search: one billion queries per day.
18. Twitter research: Services: 140kit, TwapperKeeper, DiscoverText, The Archivist, ... Some hundreds of publications. Areas: social network analysis, recommendation systems, social influence, user sentiment, business strategy, disaster prediction & alerts, education, software engineering, politics, ... Using: content analysis (narrative), ethnography, SVMs, TextRank, TF-IDF, bag-of-words (BoW), part-of-speech (POS) tagging, ...
19. The Twitter API: REST API over HTTP; all website features supported through the API; programming libraries available. Rate limiting (per user & per IP): anonymous, 150 requests per hour; OAuth, 350 requests per hour; whitelisted, e.g., 20,000 requests per hour. Streaming offerings: Spritzer (1%), Gardenhose (10%), Firehose (100%).
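The limits on this slide translate directly into a request budget that any collection script has to respect. The sketch below is a minimal illustration under stated assumptions: it only does arithmetic on the slide's early-2011 figures and never calls Twitter. All names (`HOURLY_LIMITS`, `STREAM_PERCENT`, `min_spacing_seconds`, `daily_stream_volume`, `HourlyBudget`) are hypothetical, and the sliding-window limiter is a conservative client-side choice, not a description of how Twitter's servers counted requests.

```python
from collections import deque
import time

# Hourly REST limits as listed on the slide (early-2011 figures, not current ones).
HOURLY_LIMITS = {"anonymous": 150, "oauth": 350, "whitelisted": 20_000}

# Streaming sample sizes as listed on the slide, in percent of all public tweets.
STREAM_PERCENT = {"spritzer": 1, "gardenhose": 10, "firehose": 100}


def min_spacing_seconds(access_level: str) -> float:
    """Even spacing between REST calls that uses exactly the hourly budget."""
    return 3600 / HOURLY_LIMITS[access_level]


def daily_stream_volume(stream: str, tweets_per_day: int) -> int:
    """Expected tweets per day from a sampled stream (integer arithmetic)."""
    return tweets_per_day * STREAM_PERCENT[stream] // 100


class HourlyBudget:
    """Hypothetical client-side limiter: at most `limit` calls in any `window` seconds.

    A sliding window is stricter than a fixed hourly reset, so staying within it
    also stays within a fixed-window limit of the same size.
    """

    def __init__(self, limit: int, window: float = 3600.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock   # injectable so the limiter can be tested without waiting
        self.sent = deque()  # timestamps of calls still inside the window

    def try_acquire(self) -> bool:
        """Record a call and return True if the budget allows it, else return False."""
        now = self.clock()
        # Forget calls that have aged out of the window.
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()
        if len(self.sent) < self.limit:
            self.sent.append(now)
            return True
        return False
```

For example, `min_spacing_seconds("anonymous")` gives 24 seconds between calls, and `daily_stream_volume("spritzer", 150_000_000)` gives 1,500,000 tweets, which is what a 1% sample of a 150-million-tweet day amounts to.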
20. General Terms of Service (Nov 2010): Under “Your Rights”: “... You grant us a worldwide, non-exclusive, royalty-free license (with the right to sublicense) to use, copy, reproduce, process, adapt, modify, publish, transmit, display and distribute such Content in any and all media or distribution methods (now known or later developed).”
21. TOS tips: “This license is you authorizing us to make your Tweets available to the rest of the world and to let others do the same. But what’s yours is yours – you own your content.” “Twitter has an evolving set of rules for how API developers can interact with your content. These rules exist to enable an open ecosystem with your rights in mind.”
22. API TOS (Feb 2011): Access to Twitter Content: “You will not attempt or encourage others to: sell, rent, lease, sublicense, redistribute, or syndicate the Twitter API or Twitter Content to any third party for such party to develop additional products or services without prior written approval from Twitter.” Content = “All use of the Twitter API and content, documentation, code, and related materials made available to you on or through Twitter.”
23. Authorized resyndication = Gnip: first authorized reseller of Twitter data, Nov 2010. Offerings: Halfhose (50%, $30k / mo); Decahose (10%, $5k / mo); Power Track ($0.10 per 1,000 Tweets); Link Stream ($50k / mo); User Mention Stream ($20k / mo); Keyword search.
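To see what these prices mean for a research budget, the arithmetic can be sketched as below. This is a hedged illustration using only the prices on this slide; the function names are hypothetical, and the monthly volume assumes a 30-day month and the slide 13 figure of about 150 million tweets per day. Power Track is a filtered stream, so comparing it with the Decahose compares price per volume only, not equivalent coverage.

```python
from decimal import Decimal

# Prices as listed on the slide (Gnip, Nov 2010 / early 2011).
POWER_TRACK_USD_PER_1000 = Decimal("0.10")
FLAT_MONTHLY_USD = {"halfhose": 30_000, "decahose": 5_000}


def power_track_cost(tweets: int) -> Decimal:
    """Cost in USD of receiving `tweets` tweets through Power Track."""
    return POWER_TRACK_USD_PER_1000 * tweets / 1000


def break_even_tweets(plan: str) -> int:
    """Monthly tweet count at which Power Track costs the same as a flat-rate plan."""
    return int(Decimal(FLAT_MONTHLY_USD[plan]) / POWER_TRACK_USD_PER_1000 * 1000)


def sampled_tweets_per_month(percent: int, tweets_per_day: int, days: int = 30) -> int:
    """Tweets a sampled stream delivers in a month, assuming a constant daily volume."""
    return tweets_per_day * percent // 100 * days
```

On these assumptions, Power Track is cheaper than the Decahose only below 50 million tweets a month (`break_even_tweets("decahose")`), while a Decahose-sized volume of 450 million tweets (`sampled_tweets_per_month(10, 150_000_000)`) would cost $45,000 at the Power Track rate against a $5,000 flat fee.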
24. Potential consequences: obstructs peer review of datasets; prevents researchers from getting access to data (in a timely way, if at all); stifles innovation (most of it comes from the user community & 3rd-party developers!); users become more cautious about using social media; Twitter becomes less useful (protest, reporting, ...); Twitter services become hacking targets (unreliable, unstable, slow, ...); social science researchers twiddle their thumbs.
26. Talking points: Is there a problem here? Does Twitter have any obligation to users, developers & researchers? Is it worthwhile (or even ethical) to violate Twitter’s TOS to get access to researchable data? Should users’ content even be available to researchers?