Talk on the potential of Twitter data for linguistic research, held at the Freiburg Institute for Advanced Study (FRIAS) at the invitation of Christian Mair. Thanks for having me!
Twitter as a data source for (socio)linguistic research
1. Twitter as a data source for (socio)linguistic research
Cornelius Puschmann
Berlin School of Library and Information Science /
Humboldt Institute for Internet and Society
Universität Freiburg, 29 November
2. This talk
1. Framing the issue: Big Data in the Humanities and Social Sciences
2. A very brief introduction to Twitter
3. Technical requirements
4. Legal and ethical issues
5. Sample data: a corpus of Nigerian pidgin
3. "Big Data"
The proliferation of social media makes large
volumes of data available to researchers, leading
to new approaches:
• digital methods (Rogers, 2009)
• cultural analytics (Manovich, 2007)
• computational social science (Lazer et al., 2009)
4. Examples of "Big Data"-style research
• artistic trends in online art communities
(Manovich, 2011)
• cooperation and collaboration in Wikipedia
edit wars (Yasseri, 2012)
• tracing the geographical spread of
neologisms via Twitter (Eisenstein et al.,
2012; 44 million tweets, 500k users)
7. Features of Twitter
• messages restricted to 140 characters
• semi-synchronous
• mostly public
• content presented as stream
• used to spread news, have (semi-)public
conversations
• native features: retweeting, hashtags, @
messages, picture linking
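The native features listed above mostly leave visible traces in the tweet text itself, so they can be picked out with simple pattern matching. The following is a minimal sketch, not Twitter's own entity parser: the regular expressions and the function name `tweet_features` are assumptions made for illustration, and "RT @user" is treated as the marker of a manual retweet.

```python
import re

# Illustrative patterns (assumptions, not an official Twitter grammar).
MENTION = re.compile(r"@(\w+)")
HASHTAG = re.compile(r"#(\w+)")
RETWEET = re.compile(r"\bRT @\w+")

def tweet_features(text):
    """Describe the Twitter conventions visible in one tweet's text."""
    return {
        "length": len(text),
        "within_limit": len(text) <= 140,       # 140-character restriction
        "is_retweet": bool(RETWEET.search(text)),
        "mentions": MENTION.findall(text),      # handles without the "@"
        "hashtags": HASHTAG.findall(text),      # tags without the "#"
    }

# Two tweets taken from the sample slides of this talk.
features = tweet_features(
    "Hmmm okies@USER: Not really @USER: I see u've joined #TeamNoSleep abi @USER"
)
retweet = tweet_features("Abi..RT @USER: @USER its dirty!!!")
```

Picture links and retweets made through the retweet button are not recoverable from the text alone and would need the metadata delivered by the collection tool.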
10. Software
Collection:
• The Archivist (Windows desktop software)
• yourTwapperKeeper (webserver required)
• 140kit.com (web-based platform for researchers)
Analysis:
• Excel, OpenOffice Calc, SPSS, R, Google Docs, ...
Visualization:
• (Excel, OO Calc, R), Gephi, NodeXL
11. Legal and ethical issues
• Consider ethical issues when collecting
(cf. AoIR Ethics Guidelines)
• Anonymize all data
(cf. European Data Protection Directive)
• Don't share raw data
(cf. Twitter Terms of Service)
• Publish only excerpts/summary statistics
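One small step towards the anonymization point above is masking user handles before excerpts are published. The sketch below replaces every @mention with the placeholder @USER, optionally sparing a list of handles. The function name `anonymize` and the `keep` parameter are hypothetical, and the talk does not say how its own @USER placeholders were produced. Masking handles alone does not make a tweet anonymous: names in the running text and searchable wording remain.

```python
import re

# Illustrative pattern for a Twitter handle (an assumption).
MENTION = re.compile(r"@\w+")

def anonymize(text, keep=()):
    """Replace each @mention with @USER, except handles listed in `keep`."""
    kept = {name.lower() for name in keep}

    def mask(match):
        handle = match.group(0)
        # Compare without the leading "@", ignoring case.
        return handle if handle[1:].lower() in kept else "@USER"

    return MENTION.sub(mask, text)

# "@some_handle" is an invented handle used only for this demonstration.
masked_all = anonymize("@some_handle abi, hw is life")
masked_some = anonymize("@some_handle: @wiztalib lolz", keep=["wiztalib"])
```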
12. Example: A Twitter corpus of Nigerian pidgin
• collected data since August 17th
• used tweets from and to three
users based in Abuja, Nigeria
• 8,151 tweets from 357 different users
• corpus contains both language data and social graph
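The social graph mentioned above can be approximated from the tweets themselves by treating each @mention as a directed link from the author to the mentioned user. This is a minimal sketch under that assumption. The input format (pairs of author and text), the function name `mention_edges`, and the toy tweets are all invented for illustration and are not the corpus data.

```python
import re
from collections import Counter

MENTION = re.compile(r"@(\w+)")

def mention_edges(tweets):
    """Count directed (author, mentioned user) pairs.

    `tweets` is an iterable of (author, text) tuples. Handles are
    lower-cased and self-mentions are skipped.
    """
    edges = Counter()
    for author, text in tweets:
        source = author.lower()
        for target in MENTION.findall(text):
            if target.lower() != source:
                edges[(source, target.lower())] += 1
    return edges

# Invented toy input, not taken from the corpus.
toy_tweets = [
    ("user_a", "@user_b abi, hw is life"),
    ("user_b", "@user_a lol... @user_c"),
    ("user_a", "@user_b @User_B @user_a"),
]
edges = mention_edges(toy_tweets)
users = {name for pair in edges for name in pair}
```

The resulting weighted edge list can be exported as CSV and opened in Gephi or NodeXL for visualization.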
13. Sample tweets
wiztalib
@USER: @wiztalib lolz,make I do leave dat town jor..BM no dey exist again u think so,I miss ooooo
RT @USER: Som pple ask me, whr av bin l8ly, de tot I fell off, nobody can save me, de playn in d backgrnd, I don't backdwn, so dnt ...
albertteslim
@USER d same tin we'v bin hearin...'xamz till further notice'...
@USER mai fada nd mai moda
lil_tenuche
LMFAO @USER: @lil_tenuche I wan enter keffi 2day sef... I wan write my exams dia since unibuja dey dull me
RT @USER: Retweet If you are proud of ur language ☺
14. Sample tweets (with abi)
@USER guy u be fool o.dey use my acct tweet rubish ba....ur brain dy shake abi
@USER: @USER abi,hw is lifefyn sweet ow jtown
Abi@USER: Wish her well bruv....RT @USER: A wish?@USER: Watch out ppl! Goin to be performing wit Wizzy tonyt :D
If you don't know won't you shut up?abi are u a learner
@USER loool....abi, when u dey commot that side?
U wey knw@USER: Hmmmm!!! D usual abi?@USER: This night.....
Lol...abi@USER: Take cover!! RT @USER: Watchu gon' do when shit hits the fan?
Lol...just missing u@USER: *raised eyebrow* kabir?? One can now follow twice? Abi what? RT @USER: @USER pls ff bck dear :D
I wish o@USER: Kissed u n neva called?@USER: Do u knw wat he did?@USER: Abi...@USER: Oya frgive jor
Abi...@USER: Oya frgive jor@USER: @USER I'm really angry
Lol...abi@USER: Na Ideba tinz oh@USER: Ileya ti ya o
Lol abi@USER: A 100% is much,at least 50% will do.@USER: Never trust a human being 100%
Hmmm okies@USER: Not really @USER: I see u've joined #TeamNoSleep abi @USER
I see u've joined #TeamNoSleep abi @USER
Loool@USER: Tweetpic my boobs abi? U wee tey for there
Abi@USER: 3 jst in one nyte. Wow dats splendid.RT @USER: @USER yaaaaaaaay...pls ff @USER ...she's our bday mate
Abi...@USER: @USER saying nothing, and wishing you had?
Abi@USER: I can't dull myslf gaskiya
Hmmm dats true o! Lemme see ur hand sef@USER: @USER lol...why dnt u believe me, abi u see ring 4 my hand?
Yes o! :D@USER: U abi?@USER: Okene ben 10
@USER: @USER abi,hw is lifefyn sweet ow jtown
@USER abi,hw is life
Abi..RT @USER: @USER its dirty!!!
U̶̲̥̅̊ dnt knw abi RT @USER: @USER but y?
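A list like the one above can be pulled from a corpus with a keyword search that matches abi as a whole word only, so that a string such as "kabir" is not counted. The following is a minimal keyword-in-context sketch. The function name `kwic` and the context width are assumptions, and the talk does not state which tool produced the list.

```python
import re

# Whole-word, case-insensitive match for the particle "abi".
ABI = re.compile(r"\babi\b", re.IGNORECASE)

def kwic(text, width=20):
    """Return (left context, keyword, right context) for each hit."""
    hits = []
    for m in ABI.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        hits.append((left, m.group(0), right))
    return hits

# Tweets taken from the sample slide.
hits = kwic("@USER loool....abi, when u dey commot that side?")
attached = kwic("Abi@USER: I can't dull myslf gaskiya")
# "kabir" contains the letters "abi" but is not a whole-word match.
no_hits = kwic("*raised eyebrow* kabir?? One can now follow twice?")
```

Because Twitter clients of the time glued quoted tweets together without spaces (as in "Abi@USER:"), the word-boundary match is more reliable here than splitting on whitespace.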
16. Summary
• Twitter can be used to collect language data
from a variety of sources
• Combination of linguistic, demographic and
interactional data enables new forms of
research
• Technical challenges must be overcome
• Legal/ethical issues should be carefully
considered from the outset