Multilinguals and Wikipedia Editing
Scott A. Hale
Oxford Internet Institute
http://www.scotthale.net/pubs/?websci2014
25 June 2014
Scott A. Hale, @computermacgyve Multilinguals and Wikipedia Editing
Background, Motivations
Wikipedia is a global platform covering hundreds of languages
despite evidence of balkanization (Taneja & Wu, in press)
Past studies generally concentrate on one edition (usually English)
Important variations across languages
Content is diverse across languages (Hecht & Gergle, 2010)
Each edition of Wikipedia shows a self-focus bias with more articles
about regions where the language is spoken (Hecht & Gergle, 2009)
Multilingual users may act as unconscious translators bridging language
divides (Herring et al., 2007; Eleta & Golbeck, 2012)
Related work
Why edit Wikipedia in a foreign language?
Increased audience size (Crystal, 2003; Zuckerman, 2013)
In a survey in Uzbekistan, Internet users reported accessing content in
foreign languages even while simultaneously reporting poor foreign
language skills (Wei & Kolko, 2005)
Editors of many editions of Wikipedia come from a wide variety of
timezones suggesting that bilingual editors are present (Yasseri et al.,
2012)
In a survey of editors, half of all editors reported editing in multiple
languages and 72% reported reading more than one language edition of
Wikipedia.†
† https://meta.wikimedia.org/w/index.php?title=Editor_Survey_2011/Location_%26_Language&oldid=8409990
Hypotheses
1 Most editors will edit only one language edition
2 Multilingual users will edit different articles than monolingual users
3 When a user edits an article in another language that same user will
usually also edit the corresponding article in his native language
4 Users writing primarily in smaller-sized language editions will be more
likely to cross language boundaries than users writing primarily in
larger-sized language editions
5 Larger-sized language editions, English chief among them, will be more
likely to have contributions from editors of different languages than
smaller-sized language editions
Data
All edits to any of the top 46 language editions (all editions with at
least 100,000 articles)
Recorded via the IRC stream
(code at http://www.scotthale.net/pubs/?websci2014)
32 days (8 July to 9 August 2013)
Edit meta-data
datetime
edition
article title
username
size of edit
flags (minor, bot, etc.)
Data cleaning
Non-minor edits by registered, human users to articles
Only edits to main (article) namespace
Removed articles flagged as being created by ‘bots’
Removed anonymous users
Removed undeclared bots and users with only one edit session in the
month
Require at least four edits in total and at least two edits to one edition
Matching users and articles across languages
Look for common usernames across language editions
Check usernames are indeed linked global accounts
Wikidata dump used to match articles across languages
55,568 users with a total of 3,518,955 edits (excluding the Simple English
edition).
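The cleaning rules above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code used in the study: the `Edit` record, its field names, and the `clean` function are hypothetical, the thresholds are assumed to apply after the per-edit filters, and the undeclared-bot and edit-session filters are left out because the slides do not define them.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Edit:
    # Hypothetical record mirroring the edit meta-data listed on the Data slide.
    edition: str
    title: str
    username: str
    namespace: int = 0       # 0 is the main (article) namespace
    minor: bool = False
    bot: bool = False
    anonymous: bool = False


def clean(edits, min_total=4, min_in_one_edition=2):
    """Keep non-minor, main-namespace edits by registered, non-bot users,
    then keep only users with enough edits overall and in one edition."""
    kept = [
        e for e in edits
        if e.namespace == 0 and not e.minor and not e.bot and not e.anonymous
    ]
    # Count each remaining user's edits per edition.
    per_user = defaultdict(Counter)
    for e in kept:
        per_user[e.username][e.edition] += 1
    active = {
        user for user, counts in per_user.items()
        if sum(counts.values()) >= min_total
        and max(counts.values()) >= min_in_one_edition
    }
    return [e for e in kept if e.username in active]
```

Calling `clean(edits)` on a list of such records returns only the edits of users who pass both thresholds.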
Data summary
Language     Edits      Articles  Users   NP users  NP edits
English      1,389,647  518,405   27,476  18%       3%
German       256,495    125,647   5,967   18%       2%
French       250,828    106,027   4,549   25%       3%
Spanish      191,934    66,848    4,338   24%       3%
Russian      239,267    92,326    3,961   16%       1%
Japanese     106,848    56,406    3,551   11%       2%
Italian      160,191    69,534    2,919   25%       2%
Chinese      112,888    42,937    2,309   14%       1%
Portuguese   67,505     32,753    1,730   29%       4%
Dutch        80,535     39,463    1,500   33%       3%
Polish       67,038     37,393    1,454   30%       3%
Top language editions: The Users column includes all users who edited the edition
during the data collection period. A percentage of these users (NP users) are
non-primary users who edited a different language edition more frequently.
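The NP users column can be illustrated with a small sketch. It assumes a user's primary edition is the one they edited most often, as the caption describes; the function names, the input shape (pairs of username and edition), and the alphabetical tie-break are assumptions made for this example and are not taken from the slides.

```python
from collections import Counter, defaultdict


def primary_editions(edits):
    """Map each user to the edition they edited most often.

    `edits` is an iterable of (username, edition) pairs.
    """
    counts = defaultdict(Counter)
    for user, edition in edits:
        counts[user][edition] += 1
    # Ties are broken alphabetically here; the slides do not say how ties were handled.
    return {
        user: min(c, key=lambda ed: (-c[ed], ed))
        for user, c in counts.items()
    }


def non_primary_share(edits):
    """For each edition, the fraction of its users whose primary edition is a different one."""
    edits = list(edits)
    primary = primary_editions(edits)
    users_by_edition = defaultdict(set)
    for user, edition in edits:
        users_by_edition[edition].add(user)
    return {
        edition: sum(primary[u] != edition for u in users) / len(users)
        for edition, users in users_by_edition.items()
    }
```

A user who edits both English and German but mostly English counts as a regular user of the English edition and as a non-primary user of the German edition.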
Multilinguals vs Monolinguals
15.4% of users (8,544) edited multiple language editions.
Figure: Density plot comparing the number of edits made by monolingual and
multilingual Wikipedia users.
Hypotheses
1 ✓ Most editors will edit only one language edition
2 Multilingual users will edit different articles than monolingual users
3 When a user edits an article in another language that same user will
usually also edit the corresponding article in his native language
4 Users writing primarily in smaller-sized language editions will be more
likely to cross language boundaries than users writing primarily in
larger-sized language editions
5 Larger-sized language editions, English chief among them, will be more
likely to have contributions from editors of different languages than
smaller-sized language editions
What do multilinguals edit?
Only 2.6% of edits are from users writing in their non-primary languages.
44% of the articles edited by multilingual users in their non-primary languages were not
edited by any monolingual user.
Figure: 2D density plot of the number of multilingual users editing articles in a
non-primary language against the number of monolingual users editing the articles.
What do multilinguals edit?
Histogram showing the distribution with which multilingual users edited articles in
other languages that they also edited in their primary languages. The distribution is
bimodal. A large number of users did not edit any of the same articles in their
primary languages, but a large number of users always edited the same articles in
their primary languages.
What do multilinguals edit?
Histogram showing the distribution with which multilingual users edited articles in
other languages that they also edited in their primary languages after removing
edits to articles that do not exist in users’ primary languages.
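The quantity plotted in the two histograms can be sketched per user as below. This is a minimal illustration, not the study's code: it assumes articles have already been matched across editions to a shared concept identifier (for example a Wikidata item ID), and the function name and parameters are hypothetical.

```python
def overlap_fraction(edits, primary, exists_in_primary=None):
    """Fraction of the concepts a user edited outside their primary edition
    that they also edited in their primary edition.

    edits: iterable of (edition, concept_id) pairs for one user
    primary: the user's primary edition
    exists_in_primary: optional set of concept ids that have an article in
        the primary edition; when given, all other concepts are ignored
    Returns None if there is nothing left to compare.
    """
    edits = list(edits)
    in_primary = {c for ed, c in edits if ed == primary}
    elsewhere = {c for ed, c in edits if ed != primary}
    if exists_in_primary is not None:
        # Second histogram: drop articles with no counterpart in the primary edition.
        elsewhere &= set(exists_in_primary)
    if not elsewhere:
        return None
    return len(elsewhere & in_primary) / len(elsewhere)
```

A value of 0 corresponds to the left-hand mode of the bimodal distribution (no shared articles) and a value of 1 to the right-hand mode (always the same articles).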
Hypotheses
1 ✓ Most editors will edit only one language edition
2 ✓ Multilingual users will edit different articles than monolingual users
3 ↔ When a user edits an article in another language that same user will
usually also edit the corresponding article in his native language
4 Users writing primarily in smaller-sized language editions will be more
likely to cross language boundaries than users writing primarily in
larger-sized language editions
5 Larger-sized language editions, English chief among them, will be more
likely to have contributions from editors of different languages than
smaller-sized language editions
Variations by language
Scatter plot of language size (number of unique users) and percentage of users who
are multilingual (edit more than one language edition). The three editions with fewer
than 10 users in the sample are omitted (Uzbek, Cebuano, and Waray-Waray).
Language crossings
Figure: Co-editing network graph. Node labels: ar, bg, ca, cs, da, de, en, es, fa, fi, fr,
he, hu, id, it, ja, ko, nl, no, pl, pt, ro, ru, sv, tr, uk, zh.
Nodes represent language editions
Directed, weighted edges show the log of the number of users primarily editing one
language edition who edited another edition
Only edges with weights over 1.96 standard deviations above the mean are shown
Colors indicate communities found by the Infomap community detection algorithm
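The edge filter described for this graph can be sketched as follows. The slides do not state the base of the logarithm or whether a sample or population standard deviation was used, so the natural log and population standard deviation below are assumptions, and `significant_edges` is a hypothetical name chosen for this example.

```python
import math
from statistics import mean, pstdev


def significant_edges(crossings, z=1.96):
    """Keep edges whose log-weight exceeds the mean by more than z standard deviations.

    crossings maps (primary_edition, other_edition) to the number of users
    primarily editing the first edition who also edited the second.
    """
    # Edge weight is the log of the user count; zero counts carry no edge.
    weights = {edge: math.log(n) for edge, n in crossings.items() if n > 0}
    if len(weights) < 2:
        return {}
    values = list(weights.values())
    cutoff = mean(values) + z * pstdev(values)
    return {edge: w for edge, w in weights.items() if w > cutoff}
```

Thresholding on the log-weights rather than the raw counts keeps a handful of very large flows, such as those into English, from setting the cutoff on their own.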
Language crossings (English removed)
Figure: Co-editing network graph with the English edition removed. Node labels: ca, cs, de,
es, fr, it, ja, nl, pl, pt, ru, sv, uk, zh.
Nodes, edge weights, the 1.96 standard deviation threshold, and community colors are
defined as in the previous graph.
Hypotheses
1 ✓ Most editors will edit only one language edition
2 ✓ Multilingual users will edit different articles than monolingual users
3 ↔ When a user edits an article in another language that same user will
usually also edit the corresponding article in his native language
4 ✓ Users writing primarily in smaller-sized language editions will be more
likely to cross language boundaries than users writing primarily in
larger-sized language editions
5 ✓ Larger-sized language editions, English chief among them, will be more
likely to have contributions from editors of different languages than
smaller-sized language editions
Simple English
No big changes if Simple English edition is considered
Largest editor overlap with English edition
Dedicated group of editors:
45% of editors editing Simple most frequently do not edit any other
edition (similar to Esperanto)
Comparison with Twitter
Similar percentage of multilingual users (11% on Twitter)
Similar correlation between activity level and multilingualism
Language size not correlated with multilingualism on Twitter;
some language consistencies (Japanese, English) and some variations
Hale, S. A. (2014). Global Connectivity and Multilinguals in the Twitter Network.
http://www.scotthale.net/pubs/?chi2014
Implications and future directions
Implications
Multilingual users found in all editions; correlation with activity
Design for multilingual users (universal language selector and global accounts are already
progress in this direction)
Important per-language variations
Inverse correlation between multilingual users and self-focus bias as measured by Hecht &
Gergle (2009)
Further work
Move from edit meta-data to edit content itself
What type of edits are users making in non-primary languages?
Variations by topic/theme?
Correlations with link/image overlap?
Viewing vs. editing behavior (survey results show a much higher percentage of users read
multiple editions)
I would like to thank Eric T. Meyer, Taha Yasseri, Jonathan Bright, and Mike Thelwall as
well as the anonymous reviewers who provided helpful comments on previous versions of
this research.
Crystal, D. (2003). English as a Global Language (2nd ed.). Cambridge:
Cambridge University Press.
Eleta, I., & Golbeck, J. (2012). Bridging Languages in Social Networks:
How Multilingual Users of Twitter Connect Language Communities.
Proceedings of the American Society for Information Science and
Technology, 49(1), 1–4. Available from
http://dx.doi.org/10.1002/meet.14504901327
Hale, S. A. (2014). Global Connectivity and Multilinguals in the Twitter
Network. In Proceedings of the SIGCHI conference on human factors in
computing systems (pp. 833–842). New York, NY, USA: ACM.
Available from http://doi.acm.org/10.1145/2556288.2557203
Hecht, B., & Gergle, D. (2009). Measuring self-focus bias in
community-maintained knowledge repositories. In Proceedings of the
fourth international conference on communities and technologies (pp.
11–20). New York, NY, USA: ACM. Available from
http://doi.acm.org/10.1145/1556460.1556463
Hecht, B., & Gergle, D. (2010). The Tower of Babel meets Web 2.0:
User-generated content and its applications in a multilingual context.
In Proceedings of the 28th international conference on human factors
in computing systems (pp. 291–300). New York, NY, USA: ACM.
Available from http://doi.acm.org/10.1145/1753326.1753370
Herring, S. C., Paolillo, J. C., Ramos-Vielba, I., Kouper, I., Wright, E.,
Stoerger, S., et al. (2007). Language Networks on LiveJournal. In
Proceedings of the 40th annual Hawaii international conference on
system sciences. Washington, DC, USA: IEEE Computer Society.
Available from http://dx.doi.org/10.1109/HICSS.2007.320
Wei, C. Y., & Kolko, B. E. (2005). Resistance to globalization: Language
and Internet diffusion patterns in Uzbekistan. New Review of
Hypermedia and Multimedia, 11(2), 205–220.
Yasseri, T., Sumi, R., & Kertész, J. (2012). Circadian Patterns of Wikipedia
Editorial Activity: A Demographic Analysis. PLoS ONE, 7(1), e30091.
Available from
http://dx.doi.org/10.1371%2Fjournal.pone.0030091
Zuckerman, E. (2013). Rewire: Digital Cosmopolitans in the Age of
Connection. London: W. W. Norton & Company.
Schema on read is obsolete. Welcome metaprogramming..pdf
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girl
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girlCall Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girl
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girl
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
 

Multilinguals and Wikipedia Editing

  • 1. Multilinguals and Wikipedia Editing. Scott A. Hale (@computermacgyve), Oxford Internet Institute, 25 June 2014. http://www.scotthale.net/pubs/?websci2014
  • 2. Background, Motivations. Wikipedia is a global platform covering hundreds of languages, despite evidence of balkanization (Taneja & Wu, in press). Past studies generally concentrate on one edition (usually English). Important variations across languages: content is diverse across languages (Hecht & Gergle, 2010), and each edition of Wikipedia shows a self-focus bias, with more articles about regions where the language is spoken (Hecht & Gergle, 2009). Multilingual users may act as unconscious translators bridging language divides (Herring et al., 2007; Eleta & Golbeck, 2012).
  • 3. Related work. Why edit Wikipedia in a foreign language? Increased audience size (Crystal, 2003; Zuckerman, 2013). In a survey in Uzbekistan, Internet users reported accessing content in foreign languages even while reporting poor foreign-language skills (Wei & Kolko, 2005).
  • 4. Related work. Why edit Wikipedia in a foreign language? Increased audience size (Crystal, 2003; Zuckerman, 2013). In a survey in Uzbekistan, Internet users reported accessing content in foreign languages even while reporting poor foreign-language skills (Wei & Kolko, 2005). Editors of many editions of Wikipedia come from a wide variety of timezones, suggesting that bilingual editors are present (Yasseri et al., 2012). In a survey of editors, half of all editors reported editing in multiple languages and 72% reported reading more than one language edition of Wikipedia (source: https://meta.wikimedia.org/w/index.php?title=Editor_Survey_2011/Location_%26_Language&oldid=8409990).
  • 5. Hypotheses. (1) Most editors will edit only one language edition. (2) Multilingual users will edit different articles than monolingual users. (3) When a user edits an article in another language, that same user will usually also edit the corresponding article in his native language. (4) Users writing primarily in smaller-sized language editions will be more likely to cross language boundaries than users writing primarily in larger-sized language editions. (5) Larger-sized language editions, English chief among them, will be more likely to have contributions from editors of different languages than smaller-sized language editions.
  • 6. Data. All edits to any of the top 46 language editions (all editions with at least 100,000 articles), recorded via the IRC stream (code at http://www.scotthale.net/pubs/?websci2014) over 32 days (8 July to 9 August 2013). Edit meta-data: datetime, edition, article title, username, size of edit, flags (minor, bot, etc.).
  • 7. Data cleaning. Non-minor edits by registered, human users to articles: only edits to the main (article) namespace; removed articles flagged as being created by 'bots'; removed anonymous users; removed undeclared bots and users with only one edit session in the month; required at least four edits and at least two edits to one edition. Matching users and articles across languages: look for common usernames across language editions; check that usernames are indeed linked global accounts; use a WikiData dump to match articles across languages. Result: 55,568 users with a total of 3,518,955 edits (excluding the Simple English edition).
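The filtering rules on the data-cleaning slide can be sketched roughly as follows. This is a minimal illustration, not the author's code: the `Edit` record and its field names are hypothetical, namespace 0 is taken to be the main (article) namespace, and the "edit session" and undeclared-bot checks are left out because the slide does not say how they were defined.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Edit:
    # Hypothetical record layout; field names are illustrative only.
    edition: str      # language edition code, e.g. "en"
    title: str        # article title
    username: str
    minor: bool       # flagged as a minor edit
    bot: bool         # flagged as a bot edit
    anonymous: bool   # made without a registered account
    namespace: int    # 0 is assumed to be the main (article) namespace


def clean(edits, min_edits=4, min_in_one_edition=2):
    """Keep non-minor article edits by registered, non-bot users
    who meet the activity thresholds named on the slide."""
    kept = [
        e for e in edits
        if not e.minor and not e.bot and not e.anonymous and e.namespace == 0
    ]
    # Count each remaining user's edits per edition.
    per_user = defaultdict(Counter)
    for e in kept:
        per_user[e.username][e.edition] += 1
    # At least four edits overall and at least two edits to one edition.
    active = {
        user for user, counts in per_user.items()
        if sum(counts.values()) >= min_edits
        and max(counts.values()) >= min_in_one_edition
    }
    return [e for e in kept if e.username in active]
```

The two thresholds are keyword arguments so that the effect of a stricter or looser activity cut-off can be explored.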
  • 8. Data summary. Top language editions:

      Language     Edits      Articles  Users   NP users  NP edits
      English      1,389,647  518,405   27,476  18%       3%
      German       256,495    125,647   5,967   18%       2%
      French       250,828    106,027   4,549   25%       3%
      Spanish      191,934    66,848    4,338   24%       3%
      Russian      239,267    92,326    3,961   16%       1%
      Japanese     106,848    56,406    3,551   11%       2%
      Italian      160,191    69,534    2,919   25%       2%
      Chinese      112,888    42,937    2,309   14%       1%
      Portuguese   67,505     32,753    1,730   29%       4%
      Dutch        80,535     39,463    1,500   33%       3%
      Polish       67,038     37,393    1,454   30%       3%

    The Users column includes all users who edited the edition during the data collection period. A percentage of these users (NP users) are non-primary users who edited a different language edition more frequently.
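The "NP users" column rests on assigning each user a primary edition, the one they edited most. A minimal sketch of that bookkeeping is below; the function names are hypothetical, and the alphabetical tie-break is an assumption, since the slide does not say how ties were resolved.

```python
from collections import Counter, defaultdict


def primary_edition(counts):
    """Edition with the most edits; ties broken alphabetically (an assumption)."""
    return min(counts, key=lambda ed: (-counts[ed], ed))


def non_primary_share(user_edits):
    """Map each edition to the fraction of its users whose primary
    edition is a different one.

    user_edits: dict mapping username -> Counter of {edition: edit count}.
    """
    users = defaultdict(int)
    np_users = defaultdict(int)
    for counts in user_edits.values():
        primary = primary_edition(counts)
        for edition in counts:
            users[edition] += 1
            if edition != primary:
                np_users[edition] += 1
    return {ed: np_users[ed] / users[ed] for ed in users}


# Toy data, not taken from the study.
example = {
    "a": Counter(en=5),
    "b": Counter(en=1, de=4),
    "c": Counter(de=2),
}
shares = non_primary_share(example)
```

In the toy data, user "b" counts as a non-primary user of the English edition because most of that user's edits went to German.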
  • 9. Multilinguals vs Monolinguals. 15.4% of users (8,544) edited multiple language editions. Figure: density plot comparing the number of edits made by monolingual and multilingual Wikipedia users.
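As a quick consistency check, the 15.4% figure follows from the user counts reported on the slides, assuming both counts refer to the same cleaned sample of 55,568 users:

```python
total_users = 55_568         # cleaned sample size from the data-cleaning slide
multilingual_users = 8_544   # users who edited more than one edition

share = multilingual_users / total_users
monolingual_users = total_users - multilingual_users

print(f"{share:.1%}")        # prints 15.4%
```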
  • 10. Hypotheses (progress: 1 marked off). (1) Most editors will edit only one language edition. (2) Multilingual users will edit different articles than monolingual users. (3) When a user edits an article in another language, that same user will usually also edit the corresponding article in his native language. (4) Users writing primarily in smaller-sized language editions will be more likely to cross language boundaries than users writing primarily in larger-sized language editions. (5) Larger-sized language editions, English chief among them, will be more likely to have contributions from editors of different languages than smaller-sized language editions.
  • 11. What do multilinguals edit? Only 2.6% of edits are from users writing in their non-primary languages. 44% of the articles edited by multilingual users in their non-primary languages were not edited by any monolingual user. Figure: 2D density plot of the number of multilingual users editing articles in a non-primary language against the number of monolingual users editing the articles.
  • 12. What do multilinguals edit? Figure: histogram showing the distribution with which multilingual users edited articles in other languages that they also edited in their primary languages. The distribution is bimodal: a large number of users did not edit any of the same articles in their primary languages, but a large number of users always edited the same articles in their primary languages.
  • 13. What do multilinguals edit? Figure: histogram showing the distribution with which multilingual users edited articles in other languages that they also edited in their primary languages, after removing edits to articles that do not exist in users' primary languages.
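The per-user quantity behind the two histograms can be sketched as a set overlap. This is an illustration under stated assumptions rather than the author's method: articles are represented by language-independent identifiers (for example Wikidata item IDs), and both function names are hypothetical.

```python
def overlap_fraction(primary_items, non_primary_items):
    """Fraction of the articles a user edited in non-primary languages
    that the same user also edited in the primary language.

    Both arguments are sets of language-independent article identifiers.
    """
    if not non_primary_items:
        raise ValueError("user has no non-primary edits")
    return len(non_primary_items & primary_items) / len(non_primary_items)


def restricted_overlap_fraction(primary_items, non_primary_items, exists_in_primary):
    """Same measure, ignoring articles that do not exist in the user's
    primary edition (the adjustment described for the second histogram)."""
    eligible = non_primary_items & exists_in_primary
    if not eligible:
        return None  # nothing comparable for this user
    return len(eligible & primary_items) / len(eligible)
```

A value of 0 means the user never edited the same article in the primary language, and a value of 1 means the user always did; the bimodal histogram corresponds to many users sitting at those two extremes.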
  • 14. Hypotheses (progress: 1 and 2 marked off; 3 carries a separate mark, extracted as "Ö"). (1) Most editors will edit only one language edition. (2) Multilingual users will edit different articles than monolingual users. (3) When a user edits an article in another language, that same user will usually also edit the corresponding article in his native language. (4) Users writing primarily in smaller-sized language editions will be more likely to cross language boundaries than users writing primarily in larger-sized language editions. (5) Larger-sized language editions, English chief among them, will be more likely to have contributions from editors of different languages than smaller-sized language editions.
  • 15. Variations by language. Figure: scatter plot of language size (number of unique users) against the percentage of users who are multilingual (edit more than one language edition). The three editions with fewer than 10 users in the sample are omitted (Uzbek, Cebuano, and Waray-Waray).
  • 16. Language crossings. Figure: co-editing network graph (node labels: ar, bg, ca, cs, da, de, en, es, fa, fi, fr, he, hu, id, it, ja, ko, nl, no, pl, pt, ro, ru, sv, tr, uk, zh). Nodes represent language editions. Directed, weighted edges show the log of the number of users primarily editing one language edition who also edited another edition. Only edges with weights more than 1.96 standard deviations above the mean are shown. Colors indicate communities found by the Infomap community detection algorithm.
  • 17. Language crossings (English removed). Figure: co-editing network graph (node labels: ca, cs, de, es, fr, it, ja, nl, pl, pt, ru, sv, uk, zh). Nodes represent language editions. Directed, weighted edges show the log of the number of users primarily editing one language edition who also edited another edition. Only edges with weights more than 1.96 standard deviations above the mean are shown. Colors indicate communities found by the Infomap community detection algorithm.
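The edge filter used in both network graphs can be sketched as below. Several details are assumptions, because the slides do not specify them: the natural logarithm is used, the standard deviation is the population form, and the function name and input layout are hypothetical.

```python
import math
import statistics


def significant_edges(crossings, z=1.96):
    """Keep directed edges whose log-weight lies more than z standard
    deviations above the mean log-weight.

    crossings: dict mapping (primary_edition, other_edition) -> number
    of users (must be positive).
    """
    weights = {edge: math.log(n) for edge, n in crossings.items()}
    values = list(weights.values())
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population standard deviation (assumed)
    cutoff = mean + z * sd
    return {edge: w for edge, w in weights.items() if w > cutoff}
```

Taking the log first compresses the very large counts involving English, so the cut-off is less dominated by a handful of heavy edges than it would be on raw user counts.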
  • 18. Hypotheses (progress: 1, 2, 4, and 5 marked off; 3 carries a separate mark, extracted as "Ö"). (1) Most editors will edit only one language edition. (2) Multilingual users will edit different articles than monolingual users. (3) When a user edits an article in another language, that same user will usually also edit the corresponding article in his native language. (4) Users writing primarily in smaller-sized language editions will be more likely to cross language boundaries than users writing primarily in larger-sized language editions. (5) Larger-sized language editions, English chief among them, will be more likely to have contributions from editors of different languages than smaller-sized language editions.
  • 19. Simple English. No big changes if the Simple English edition is considered. Its largest editor overlap is with the English edition. Dedicated group of editors: 45% of editors who edit Simple English most frequently do not edit any other edition (similar to Esperanto).
  • 20. Comparison with Twitter. Similar percentages of multilingual users (11% on Twitter). Similar correlation between activity level and multilingualism. Language size is not correlated with multilingualism on Twitter; some consistencies across languages (Japanese, English) and some variations. Source: Hale, S. A. (2014). Global Connectivity and Multilinguals in the Twitter Network. http://www.scotthale.net/pubs/?chi2014
  • 21. Implications and future directions. Implications: multilingual users are found in all editions, and multilingualism correlates with activity level; design for multilingual users (the universal language selector and global accounts are already progress in this direction); important per-language variations; inverse correlation between multilingual users and self-focus bias as measured by Hecht & Gergle (2009). Further work: move from edit meta-data to the edit content itself. What types of edits are users making in non-primary languages? Are there variations by topic or theme? Correlations with link or image overlap? Viewing vs. editing behavior (survey results show a much higher percentage of users read multiple editions).
  • 22. Multilinguals and Wikipedia Editing. Scott A. Hale, Oxford Internet Institute, 25 June 2014. http://www.scotthale.net/pubs/?websci2014 I would like to thank Eric T. Meyer, Taha Yasseri, Jonathan Bright, and Mike Thelwall, as well as the anonymous reviewers who provided helpful comments on previous versions of this research.
  • 23. References.
      Crystal, D. (2003). English as a Global Language (2nd ed.). Cambridge: Cambridge University Press.
      Eleta, I., & Golbeck, J. (2012). Bridging Languages in Social Networks: How Multilingual Users of Twitter Connect Language Communities. Proceedings of the American Society for Information Science and Technology, 49(1), 1–4. Available from http://dx.doi.org/10.1002/meet.14504901327
      Hale, S. A. (2014). Global Connectivity and Multilinguals in the Twitter Network. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 833–842). New York, NY, USA: ACM. Available from http://doi.acm.org/10.1145/2556288.2557203
      Hecht, B., & Gergle, D. (2009). Measuring Self-Focus Bias in Community-Maintained Knowledge Repositories. In Proceedings of the Fourth International Conference on Communities and Technologies (pp. 11–20). New York, NY, USA: ACM. Available from http://doi.acm.org/10.1145/1556460.1556463
  • 24. References (continued).
      Hecht, B., & Gergle, D. (2010). The Tower of Babel Meets Web 2.0: User-Generated Content and Its Applications in a Multilingual Context. In Proceedings of the 28th International Conference on Human Factors in Computing Systems (pp. 291–300). New York, NY, USA: ACM. Available from http://doi.acm.org/10.1145/1753326.1753370
      Herring, S. C., Paolillo, J. C., Ramos-Vielba, I., Kouper, I., Wright, E., Stoerger, S., et al. (2007). Language Networks on LiveJournal. In Proceedings of the 40th Annual Hawaii International Conference on System Sciences. Washington, DC, USA: IEEE Computer Society. Available from http://dx.doi.org/10.1109/HICSS.2007.320
      Wei, C. Y., & Kolko, B. E. (2005). Resistance to Globalization: Language and Internet Diffusion Patterns in Uzbekistan. New Review of Hypermedia and Multimedia, 11(2), 205–220.
      Yasseri, T., Sumi, R., & Kertész, J. (2012). Circadian Patterns of Wikipedia Editorial Activity: A Demographic Analysis. PLoS ONE, 7(1), e30091. Available from http://dx.doi.org/10.1371%2Fjournal.pone.0030091
  • 25. References (continued).
      Zuckerman, E. (2013). Rewire: Digital Cosmopolitans in the Age of Connection. London: W. W. Norton & Company.