http://www.scotthale.net/pubs/?websci2014
This article analyzes one month of edits to Wikipedia in order to examine the role of users editing multiple language editions (referred to as multilingual users). Such multilingual users may serve an important function in diffusing information across different language editions of the encyclopedia, and prior work has suggested this could reduce the level of self-focus bias in each edition. This study finds multilingual users are much more active than their single-edition (monolingual) counterparts. They are found in all language editions, but smaller-sized editions with fewer users have a higher percentage of multilingual users than larger-sized editions. About a quarter of multilingual users always edit the same articles in multiple languages, while just over 40% of multilingual users edit different articles in different languages. When non-English users do edit a second language edition, that edition is most frequently English. Nonetheless, several regional and linguistic cross-editing patterns are also present.
Multilinguals and Wikipedia Editing
1. Multilinguals and Wikipedia Editing
Scott A. Hale
Oxford Internet Institute
http://www.scotthale.net/pubs/?websci2014
25 June 2014
Scott A. Hale, @computermacgyve Multilinguals and Wikipedia Editing
2. Background, Motivations
Wikipedia is a global platform covering hundreds of languages
despite evidence of balkanization (Taneja & Wu, in press)
Past studies generally concentrate on one edition (usually English)
Important variations across languages
Content is diverse across languages (Hecht & Gergle, 2010)
Each edition of Wikipedia shows a self-focus bias with more articles
about regions where the language is spoken (Hecht & Gergle, 2009)
Multilingual users may act as unconscious translators bridging language
divides (Herring et al., 2007; Eleta & Golbeck, 2012)
3. Related work
Why edit Wikipedia in a foreign language?
Increased audience size (Crystal, 2003; Zuckerman, 2013)
In a survey in Uzbekistan, Internet users reported accessing content in
foreign languages even while simultaneously reporting poor foreign
language skills (Wei & Kolko, 2005)
Editors of many editions of Wikipedia come from a wide variety of
timezones suggesting that bilingual editors are present (Yasseri et al.,
2012)
In a survey of editors, half of all editors reported editing in multiple
languages and 72% reported reading more than one language edition of
Wikipedia.†
† https://meta.wikimedia.org/w/index.php?title=Editor_Survey_2011/Location_%26_Language&oldid=8409990
5. Hypotheses
1 Most editors will edit only one language edition
2 Multilingual users will edit different articles than monolingual users
3 When a user edits an article in another language that same user will
usually also edit the corresponding article in his native language
4 Users writing primarily in smaller-sized language editions will be more
likely to cross-language boundaries than users writing primarily in
larger-sized language editions
5 Larger-sized language editions, English chief among them, will be more
likely to have contributions from editors of different languages than
smaller-sized language editions
6. Data
All edits to any of the top 46 language editions (all editions with at
least 100,000 articles)
Recorded via the IRC stream
(code at http://www.scotthale.net/pubs/?websci2014)
32 days (8 July to 9 August 2013)
Edit meta-data
datetime
edition
article title
username
size of edit
flags (minor, bot, etc.)
7. Data cleaning
Non-minor edits by registered, human users to articles
Only edits to main (article) namespace
Removed articles flagged as being created by ‘bots’
Removed anonymous users
Removed undeclared bots and users with only one edit session in the
month
Required at least four edits in total and at least two edits to one edition
Matching users and articles across languages
Look for common usernames across language editions
Check usernames are indeed linked global accounts
WikiData dump to match articles across languages
55,568 users with a total of 3,518,955 edits (excluding the Simple English
edition).
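The filtering steps above can be sketched roughly as follows. This is a minimal illustration, not the author's code (which is linked from the Data slide): the `Edit` record, its field names, and the `clean` function are hypothetical, anonymous edits are assumed to carry an empty username, and the steps for undeclared bots and edit sessions are left out because the slides do not describe how they were detected.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Edit:
    """Hypothetical edit record; field names are illustrative only."""
    edition: str    # language edition code, e.g. "en"
    title: str      # article title
    username: str   # assumed to be "" for anonymous edits
    minor: bool     # minor-edit flag
    bot: bool       # bot flag
    namespace: int  # 0 = main (article) namespace


def clean(edits, min_total=4, min_in_one_edition=2):
    """Keep non-minor edits by registered, non-bot users in the main namespace,
    then keep only users with enough edits overall and in a single edition."""
    kept = [e for e in edits
            if not e.minor and not e.bot and e.username and e.namespace == 0]

    # Count each remaining user's edits per edition.
    per_user = defaultdict(Counter)
    for e in kept:
        per_user[e.username][e.edition] += 1

    active = {user for user, counts in per_user.items()
              if sum(counts.values()) >= min_total
              and max(counts.values()) >= min_in_one_edition}
    return [e for e in kept if e.username in active]
```

The two thresholds mirror the "at least four edits and at least 2 edits to one edition" rule; they are exposed as parameters only to make the sketch easy to experiment with.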
8. Data summary
Language     Edits      Articles  Users   NP users  NP edits
English      1,389,647  518,405   27,476  18%       3%
German       256,495    125,647   5,967   18%       2%
French       250,828    106,027   4,549   25%       3%
Spanish      191,934    66,848    4,338   24%       3%
Russian      239,267    92,326    3,961   16%       1%
Japanese     106,848    56,406    3,551   11%       2%
Italian      160,191    69,534    2,919   25%       2%
Chinese      112,888    42,937    2,309   14%       1%
Portuguese   67,505     32,753    1,730   29%       4%
Dutch        80,535     39,463    1,500   33%       3%
Polish       67,038     37,393    1,454   30%       3%
Top language editions: The Users column includes all users who edited the edition
during the data collection period. A percentage of these users (NP users) are
non-primary users who edited a different language edition more frequently.
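One way the NP users column could be derived is sketched below. It assumes a user's primary edition is simply the one with the most edits in the sample, with ties broken alphabetically; the tie-breaking rule and the function names are assumptions, not taken from the slides.

```python
from collections import Counter, defaultdict


def primary_editions(edits):
    """edits: iterable of (username, edition) pairs, one pair per edit.
    Returns ({user: primary edition}, {user: Counter of edits per edition})."""
    counts = defaultdict(Counter)
    for user, edition in edits:
        counts[user][edition] += 1
    # Most-edited edition wins; ties are broken alphabetically (assumption).
    primary = {user: min(c, key=lambda ed: (-c[ed], ed))
               for user, c in counts.items()}
    return primary, counts


def np_user_share(edits):
    """Share of each edition's users for whom it is not the primary edition."""
    primary, counts = primary_editions(edits)
    users, np_users = Counter(), Counter()
    for user, c in counts.items():
        for edition in c:
            users[edition] += 1
            if primary[user] != edition:
                np_users[edition] += 1
    return {edition: np_users[edition] / users[edition] for edition in users}
```

For example, a user with three English edits and one German edit counts toward the Users total of both editions but is a non-primary user only in German.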
9. Multilinguals vs Monolinguals
15.4% of users (8,544) edited multiple language editions.
Figure: Density plot comparing the number of edits made by monolingual and
multilingual Wikipedia users.
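As a quick check of the headline figure, the share can be recomputed from the counts reported in the slides. The sketch assumes the 8,544 multilingual users are a subset of the 55,568 users left after cleaning; `multilingual_share` is a hypothetical helper showing how the same share would be obtained from per-user edition sets.

```python
def multilingual_share(editions_by_user):
    """editions_by_user: {username: set of editions the user edited}."""
    multilingual = sum(1 for eds in editions_by_user.values() if len(eds) > 1)
    return multilingual / len(editions_by_user)


# Counts reported in the slides.
total_users = 55_568
multilingual_users = 8_544
print(f"{multilingual_users / total_users:.1%}")  # 15.4%
```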
10. Hypotheses
[supported] 1 Most editors will edit only one language edition
2 Multilingual users will edit different articles than monolingual users
3 When a user edits an article in another language that same user will
usually also edit the corresponding article in his native language
4 Users writing primarily in smaller-sized language editions will be more
likely to cross-language boundaries than users writing primarily in
larger-sized language editions
5 Larger-sized language editions, English chief among them, will be more
likely to have contributions from editors of different languages than
smaller-sized language editions
11. What do multilinguals edit?
Only 2.6% of edits are from users writing in their non-primary languages.
44% of the articles edited by multilingual users in their non-primary languages were not edited by any monolingual user.
Figure: 2D density plot of the number of multilingual users editing articles in a non-primary language against the number of monolingual users editing the articles.
12. What do multilinguals edit?
Histogram showing the distribution with which multilingual users edited articles in
other languages that they also edited in their primary languages. The distribution is
bimodal. A large number of users did not edit any of the same articles in their
primary languages, but a large number of users always edited the same articles in
their primary languages.
13. What do multilinguals edit?
Histogram showing the distribution with which multilingual users edited articles in
other languages that they also edited in their primary languages after removing
edits to articles that do not exist in users’ primary languages.
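The per-user quantity behind the two histograms could be computed along these lines. This is a sketch under assumptions: articles are identified by a language-independent id (for instance a Wikidata item id, in line with the matching step on the data cleaning slide), and the function and parameter names are hypothetical.

```python
def overlap_fraction(primary_items, nonprimary_items, existing_in_primary=None):
    """Fraction of the articles a user edited in non-primary languages that
    the same user also edited in the primary language.

    primary_items / nonprimary_items: sets of language-independent article ids.
    existing_in_primary: optional set of ids that have an article in the user's
    primary edition; when given, other articles are dropped first, as in the
    second histogram.
    """
    if existing_in_primary is not None:
        nonprimary_items = nonprimary_items & existing_in_primary
    if not nonprimary_items:
        return None  # undefined for users with nothing left to compare
    return len(nonprimary_items & primary_items) / len(nonprimary_items)
```

A value of 0 means the user never edited the same article in both languages and 1 means the user always did; these are the two peaks of the bimodal distribution.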
14. Hypotheses
[supported] 1 Most editors will edit only one language edition
[supported] 2 Multilingual users will edit different articles than monolingual users
[mixed] 3 When a user edits an article in another language that same user will
usually also edit the corresponding article in his native language
4 Users writing primarily in smaller-sized language editions will be more
likely to cross-language boundaries than users writing primarily in
larger-sized language editions
5 Larger-sized language editions, English chief among them, will be more
likely to have contributions from editors of different languages than
smaller-sized language editions
15. Variations by language
Scatter plot of language size (number of unique users) and percentage of users who
are multilingual (edit more than one language edition). The three editions with fewer
than 10 users in the sample are omitted (Uzbek, Cebuano, and Waray-Waray).
16. Language crossings
Figure nodes: ar, bg, ca, cs, da, de, en, es, fa, fi, fr, he, hu, id, it, ja, ko, nl, no, pl, pt, ro, ru, sv, tr, uk, zh
Co-editing network graph
Nodes represent language
editions
Directed, weighted edges show
the log of the number of users
primarily editing one language
edition who edited another
edition
Only edges with weights over
1.96 standard deviations above
the mean are shown
Colors indicate communities
found by the infomap community
detection algorithm
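The edge weighting and filtering described above might look like the following sketch. Several details are assumptions because the slide does not state them: the natural logarithm is used, the mean and standard deviation are taken over all edges with at least one user, and the population standard deviation is used. The infomap community detection step is not shown.

```python
import math
from statistics import mean, pstdev


def significant_edges(crossings, z=1.96):
    """crossings: {(primary_edition, other_edition): number of users}.
    Returns the edges whose log-weight exceeds mean + z * standard deviation."""
    # Log-transform the user counts; zero-count pairs have no edge.
    weights = {edge: math.log(n) for edge, n in crossings.items() if n > 0}
    cutoff = mean(weights.values()) + z * pstdev(weights.values())
    return {edge: w for edge, w in weights.items() if w > cutoff}
```

Each key is a directed pair, so ("de", "en") counts users who primarily edit German and also edited English, which is distinct from ("en", "de").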
17. Language crossings (English removed)
Figure nodes: ca, cs, de, es, fr, it, ja, nl, pl, pt, ru, sv, uk, zh
Co-editing network graph
Nodes represent language
editions
Directed, weighted edges show
the log of the number of users
primarily editing one language
edition who edited another
edition
Only edges with weights over
1.96 standard deviations above
the mean are shown
Colors indicate communities
found by the infomap community
detection algorithm
18. Hypotheses
[supported] 1 Most editors will edit only one language edition
[supported] 2 Multilingual users will edit different articles than monolingual users
[mixed] 3 When a user edits an article in another language that same user will
usually also edit the corresponding article in his native language
[supported] 4 Users writing primarily in smaller-sized language editions will be more
likely to cross-language boundaries than users writing primarily in
larger-sized language editions
[supported] 5 Larger-sized language editions, English chief among them, will be more
likely to have contributions from editors of different languages than
smaller-sized language editions
19. Simple English
No big changes if Simple English edition is considered
Largest editor overlap with English edition
Dedicated group of editors:
45% of editors editing Simple most frequently do not edit any other
edition (similar to Esperanto)
20. Comparison with Twitter
A similar percentage of users are multilingual (11% on Twitter)
Similar correlation between activity level and multilingualism
Language size not correlated with multilingualism on Twitter;
some language consistencies (Japanese, English) and some variations
Hale, S. A. (2014). Global Connectivity and Multilinguals in the Twitter Network.
http://www.scotthale.net/pubs/?chi2014
21. Implications and future directions
Implications
Multilingual users found in all editions; correlation with activity
Design for multilingual users (the universal language selector and global accounts are already progress in this direction)
Important per-language variations
Inverse correlation between multilingual users and self-focus bias as measured by Hecht & Gergle (2009)
Further work
Move from edit meta-data to edit content itself
What types of edits are users making in non-primary languages?
Variations by topic/theme?
Correlations with link/image overlap?
Viewing vs. editing behavior (survey results show a much higher percentage of users read multiple editions)
22. Multilinguals and Wikipedia Editing
I would like to thank Eric T. Meyer, Taha Yasseri, Jonathan Bright, and Mike Thelwall as
well as the anonymous reviewers who provided helpful comments on previous versions of
this research.
23. Crystal, D. (2003). English as a Global Language (2nd ed.). Cambridge:
Cambridge University Press.
Eleta, I., & Golbeck, J. (2012). Bridging Languages in Social Networks:
How Multilingual Users of Twitter Connect Language Communities.
Proceedings of the American Society for Information Science and
Technology, 49(1), 1–4. Available from
http://dx.doi.org/10.1002/meet.14504901327
Hale, S. A. (2014). Global Connectivity and Multilinguals in the Twitter
Network. In Proceedings of the SIGCHI conference on human factors in
computing systems (pp. 833–842). New York, NY, USA: ACM.
Available from http://doi.acm.org/10.1145/2556288.2557203
Hecht, B., & Gergle, D. (2009). Measuring self-focus bias in
community-maintained knowledge repositories. In Proceedings of the
fourth international conference on communities and technologies (pp.
11–20). New York, NY, USA: ACM. Available from
http://doi.acm.org/10.1145/1556460.1556463
24. Hecht, B., & Gergle, D. (2010). The Tower of Babel meets Web 2.0:
User-generated content and its applications in a multilingual context.
In Proceedings of the 28th international conference on human factors
in computing systems (pp. 291–300). New York, NY, USA: ACM.
Available from http://doi.acm.org/10.1145/1753326.1753370
Herring, S. C., Paolillo, J. C., Ramos-Vielba, I., Kouper, I., Wright, E.,
Stoerger, S., et al. (2007). Language Networks on LiveJournal. In
Proceedings of the 40th annual Hawaii international conference on
system sciences. Washington, DC, USA: IEEE Computer Society.
Available from http://dx.doi.org/10.1109/HICSS.2007.320
Wei, C. Y., & Kolko, B. E. (2005). Resistance to globalization: Language
and Internet diffusion patterns in Uzbekistan. New Review of
Hypermedia and Multimedia, 11(2), 205–220.
Yasseri, T., Sumi, R., & Kertész, J. (2012). Circadian Patterns of Wikipedia
Editorial Activity: A Demographic Analysis. PLoS ONE, 7(1), e30091.
Available from http://dx.doi.org/10.1371/journal.pone.0030091
25. Zuckerman, E. (2013). Rewire: Digital Cosmopolitans in the Age of
Connection. London: W. W. Norton & Company.