Analysing the Usage of Wikipedia on Twitter:
Understanding Inter-Language Links
HICSS 49, January 8th, 2016
Eva Zangerle, ...

Wikipedia is a central source of information as 450
million people consult the online encyclopaedia every
month to satisfy their information needs. Some of these
users also refer to Wikipedia within their tweets. In
this paper, we analyse links within tweets referring to
a Wikipedia of a language different from the tweet’s
language. To this end, we investigate causes for the
usage of such inter-language links by comparing the
tweeted article and its counterpart in the tweet’s
language (if there is any) in terms of article quality.
We find that the main cause for inter-language links is
the non-existence of the article in the tweet’s
language. Furthermore, we observe that the quality of
the tweeted articles is consistently higher in comparison
to their counterparts, suggesting that users choose the
article of higher quality even when tweeting in another
language. Moreover, we find that English is the most
dominant target for inter-language links.
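
As a rough illustration of what counts as an inter-language link here, the sketch below compares a tweet's language code with the language edition encoded in the host name of a Wikipedia URL. The function names are hypothetical and not taken from the paper. The sketch assumes the link has already been expanded from any URL shortener and that tweet and Wikipedia language codes are directly comparable, which will not hold for every language.

```python
from urllib.parse import urlparse


def wikipedia_language(url):
    """Return the language code of a Wikipedia URL, or None if it is not one.

    Accepts <lang>.wikipedia.org and the mobile form <lang>.m.wikipedia.org.
    URLs without a scheme have no parsable host and yield None.
    """
    host = (urlparse(url).hostname or "").lower()
    parts = host.split(".")
    if parts[-2:] != ["wikipedia", "org"] or parts[0] == "www":
        return None
    if len(parts) == 3 or (len(parts) == 4 and parts[1] == "m"):
        return parts[0]
    return None


def is_inter_language_link(tweet_lang, url):
    """True if the URL points to a Wikipedia edition other than tweet_lang."""
    article_lang = wikipedia_language(url)
    return article_lang is not None and article_lang != tweet_lang


# Example: a Spanish tweet linking to the English article.
example = is_inter_language_link(
    "es", "https://en.wikipedia.org/wiki/Black_Monday_(1987)"
)
```

A real pipeline would likely also need a mapping between the language identifiers reported by Twitter and the subdomain codes used by Wikipedia.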


    1. Analysing the Usage of Wikipedia on Twitter: Understanding Inter-Language Links. HICSS 49, January 8th, 2016. Eva Zangerle, Georg Schmidhammer, Günther Specht, University of Innsbruck, Austria
    2. Motivation: why this work matters
       • Wikipedia is a central source of information
       • 450 million users per month, 277 editions
       • Research has focused on intrinsic factors: community, content, quality
    3. Motivation (continued)
       • What about extrinsic factors?
    4. Our Vision: Extrinsic Quality Measures
    5. Our Vision: Extrinsic Quality Measures; Inter-language Link Analysis
    6. Previous Research
       Eva Zangerle, Georg Schmidhammer and Günther Specht. #Wikipedia on Twitter: Analyzing Tweets About Wikipedia. In Proceedings of the 11th International Symposium on Open Collaboration, OpenSym ’15, pages 14:1–14:8, New York, NY, USA, 2015. ACM.
       • Extrinsic view on Wikipedia via Twitter
       • 20% of all tweets lead to a Wikipedia in a language other than the tweet’s (except for English and Japanese)
    7. Research Questions
       • How are inter-language links distributed among the different Wikipedias?
       • What causes users to link to a Wikipedia other than the one in their own language?
    8. Workflow: Crawl Twitter, Extract Links, Clean Data, Crawl Wikipedia, Quality Analyses
    9. Crawling
       • Twitter API
       • Search for the keyword "wikipedia"
       • 2014/10/20 – 2015/04/28
       • 6,415,762 tweets in total
       • Extraction of links from tweets
    10. Cleaning Data
        • Filter out tweets that contain no Wikipedia URL
        • Bots contained in dataset
          • 99th percentile (>130 tweets)
          • BotOrNot Detection Service for 1,083 accounts
          • Bot users and their tweets deleted from dataset
    11. Cleaning Data
        Feature | Raw | Cleaned
        Tweets | 6,415,762 | 2,844,399
        Retweets | 2,040,816 | 855,959
        Distinct Users | 2,287,430 | 1,092,732
        Mentions | 4,673,284 | 2,437,092
        Distinct Hashtags | 213,574 | 127,958
        Hashtag Usages | 2,283,535 | 788,210
        Distinct URLs | 1,976,479 | 1,179,288
        URL Usages | 4,825,230 | 3,130,420
    12. Crawling Wikipedia
        • MediaWiki API
        • Resolution of the revision ID that was current at the time the tweet was sent
        • Crawling of article, headings, wikilinks, references, images
        • Last 500 edits
    13. Quality Measures
        1. Article length
        2. Number of references (absolute)
        3. Number of references (relative)
        4. Diversity
        5. Number of headings (absolute)
        6. Number of headings (relative)
        7. Informativeness
        8. Number of images (relative)
        9. Number of wikilinks (relative)
        10. Currency
        11. HasInfoBox
        12. Complexity (Flesch-Kincaid)
        Reference: Warncke-Wang, M., Cosley, D., and Riedl, J. "Tell Me More: An Actionable Quality Model for Wikipedia", in the proceedings of WikiSym 2013
    14. Results
    15. RQ1: Distribution of (Inter-language) Links
        Top 3 inter-language targets:
        • 62.68% English
        • 6.26% Japanese
        • 5.76% Spanish
    16. RQ2: Causes for Inter-language Links
        • 85% do not have a counterpart in the tweet’s language (out of 691,424 inter-language links)
    17. RQ2: Causes for Inter-language Links
        Remaining 15%: could article quality be an issue?
        • Originally posted: https://en.wikipedia.org/wiki/Black_Monday_(1987)
        • Counterpart: https://es.wikipedia.org/wiki/Lunes_negro_(1987)
    18. (no text on slide)
    19. (no text on slide)
    20. RQ2: Causes for Inter-language Links
        • Remaining 99,776 articles: apply 12 quality measures to all originally posted articles and their counterparts
        • Group articles into language pairs (original and counterpart language)
        • For each article in a language pair, count the number of measures on which the original article performs better than its counterpart and vice versa (result: two vectors)
        • Wilcoxon signed-rank test for each language pair
    21. RQ2: Causes for Inter-language Links
        • For 58% of all language combinations, the article in the tweeted language is of significantly better quality (p < 0.05)
    22. Dominating Languages
        Target | Better than (p < 0.05) | Count
        English | Spanish, Japanese, French, Korean, Italian, German, Arabic, Indonesian, Portuguese, Dutch, Turkish, Swedish, Thai, Polish, Romanian, Finnish, Danish, Norwegian, Farsi, Welsh, Hindi, Bulgarian, Latvian, Bosnian, Slovak, Hungarian, Slovenian, Lithuanian | 28
        French | English, Japanese, Spanish | 3
        Spanish | English, Italian | 2
        Catalan | English, Portuguese | 2
        German | English | 1
        Japanese | German | 1
        Portuguese | Spanish | 1
        Turkish | English | 1
    23. Dominating Languages
        • Most dominating target languages are English, Spanish, Japanese
          • most extensive Wikipedias
          • most active Wikipedias
        → more elaborate, mature articles than in the user’s language
    24. Quality Measures
        • 66% of all articles tweeted feature a significantly higher quality for all twelve quality measures (p < 0.001)
    25. Quality Measures
        • 97% of all articles tweeted feature a significantly higher quality for more than six quality measures (p < 0.001)
    26. Conclusion
        • 85% of all inter-language links: no counterpart available
        • Articles tweeted are of significantly higher quality (with English, Japanese and German dominating)
        • Users deliberately tweet the article of higher quality
    27. Questions? Any coffee break.
        Contact: @eva_zangerle, eva.zangerle@uibk.ac.at, http://www.evazangerle.at, http://dbis-informatik.uibk.ac.at, https://www.facebook.com/dbisibk
    28. Analysing the Usage of Wikipedia on Twitter: Understanding Inter-Language Links. Eva Zangerle, Georg Schmidhammer, Günther Specht, University of Innsbruck, Austria
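
The revision lookup named on the "Crawling Wikipedia" slide might look roughly like the sketch below. It only builds a MediaWiki API request URL asking for the newest revision at or before a given time and sends nothing over the network. The function name is hypothetical, the choice of parameters is an assumption about how such a lookup could be done, and the slides do not show the authors' actual queries.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def revision_query_url(lang, title, tweet_time):
    """Build a MediaWiki API URL for the latest revision at or before tweet_time.

    tweet_time must be a timezone-aware datetime; it is converted to UTC.
    """
    params = {
        "action": "query",
        "format": "json",
        "prop": "revisions",
        "titles": title,
        "rvlimit": 1,          # only the single closest revision
        "rvdir": "older",      # enumerate backwards in time from rvstart
        "rvstart": tweet_time.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "rvprop": "ids|timestamp",
    }
    return f"https://{lang}.wikipedia.org/w/api.php?{urlencode(params)}"


# Example: the English article from slide 17 at an arbitrary, made-up tweet time.
url = revision_query_url(
    "en",
    "Black_Monday_(1987)",
    datetime(2015, 1, 15, 12, 0, 0, tzinfo=timezone.utc),
)
```

The revision ID in the response could then be used to fetch the article text, headings, wikilinks, references and images as they were when the tweet was sent.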

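
The comparison procedure on slide 20 can be sketched as follows: for each article pair, count on how many measures each side scores better, collect the two count vectors per language pair, and run a Wilcoxon signed-rank test on them. This is a minimal pure-Python sketch under several assumptions: the function names are hypothetical, every measure is treated as "higher is better", and the p-value uses the normal approximation without continuity correction, which is only reasonable for larger samples. In practice a library routine such as `scipy.stats.wilcoxon` would be the more robust choice.

```python
import math
from collections import Counter


def count_wins(original, counterpart):
    """Return (wins_original, wins_counterpart) over the measures in `original`.

    Both arguments map a measure name to a score; higher is assumed better.
    """
    wins_original = sum(1 for m in original if original[m] > counterpart[m])
    wins_counterpart = sum(1 for m in original if original[m] < counterpart[m])
    return wins_original, wins_counterpart


def wilcoxon_signed_rank(xs, ys):
    """Two-sided Wilcoxon signed-rank test (normal approximation).

    Returns (W_plus, p_value). Zero differences are dropped and tied
    absolute differences receive their average rank.
    """
    diffs = [x - y for x, y in zip(xs, ys) if x != y]
    n = len(diffs)
    if n == 0:
        raise ValueError("all pairs are tied; the test is undefined")
    magnitudes = sorted(abs(d) for d in diffs)
    rank_of = {}
    i = 0
    while i < n:
        j = i
        while j < n and magnitudes[j] == magnitudes[i]:
            j += 1
        rank_of[magnitudes[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    w_plus = sum(rank_of[abs(d)] for d in diffs if d > 0)
    mean = n * (n + 1) / 4
    tie_correction = sum(t ** 3 - t for t in Counter(magnitudes).values()) / 48
    variance = n * (n + 1) * (2 * n + 1) / 24 - tie_correction
    z = (w_plus - mean) / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return w_plus, p_value


# Made-up win counts for ten article pairs of one language pair (12 measures each).
wins_original = [12, 10, 9, 11, 12, 8, 10, 12, 11, 9]
wins_counterpart = [0, 2, 3, 1, 0, 4, 2, 0, 1, 3]
w_plus, p_value = wilcoxon_signed_rank(wins_original, wins_counterpart)
```

A language pair would count as "significantly better" for the tweeted language when the p-value falls below the chosen threshold (0.05 on slide 21) and the originals collect the larger rank sum.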