Slides presented at UPF:
https://www.upf.edu/web/mdm-dtic/gender-and-wikipedia_2017
https://www.upf.edu/en/web/guest/home/-/asset_publisher/UI8Z8VAxU47P/content/id/7282941/maximized#.WIjEE2dA-Ba
The presentation focuses on two studies that investigate differences in language used by men and women on Wikipedia talk pages. Automatic message analysis reveals that women participate more in discussions that have a positive tone, and use a language that promotes more relationship and emotional connection compared to men. We also observe a gender difference in the leadership style: while men administrators tend to maintain an impersonal tone compared to other users, women administrators are indistinguishable from other women, and use a markedly emotional and relationship-oriented language. The results suggest the importance of communication style to address gender gap in online collaboration platforms, and to favor more welcoming environments capable of attracting and retaining users.
Gender Gap in Collaborative Platforms: Language and emotions in Wikipedia Discussions
1. Gender Gap in Collaborative Platforms:
Language and emotions in Wikipedia Discussions
David Laniado, Daniela Iosub, Carlos Castillo,
Mayo Fuster Morell and Andreas Kaltenbrunner
david.laniado@eurecat.org
Universitat Pompeu Fabra, January 17, 2017
David Laniado @sdivad Gender Gap in Collaborative Platforms: Language and emotions in Wikipedia Discussions 1 / 58
2. Outline
1 Introduction
Wikipedia content biases
Wikipedia gender gap
Wikipedia discussion spaces
Goal and research questions
2 Framework of analysis
Data acquisition and pre-processing
User gender labelling
Language and sentiment analysis
3 Results
Emotions and status
Emotions and gender
Networked emotions
4 Conclusions
David Laniado @sdivad Gender Gap in Collaborative Platforms: Language and emotions in Wikipedia Discussions 2 / 58
3. Outline
1 Introduction
Wikipedia content biases
Wikipedia gender gap
Wikipedia discussion spaces
Goal and research questions
2 Framework of analysis
Data acquisition and pre-processing
User gender labelling
Language and sentiment analysis
3 Results
Emotions and status
Emotions and gender
Networked emotions
4 Conclusions
David Laniado @sdivad Gender Gap in Collaborative Platforms: Language and emotions in Wikipedia Discussions 3 / 58
4. Wikipedia is a teenager
Happy birthday
Wikipedia!
English Wikipedia is
now 16 years old
Catalan Wikipedia will
be 16 in March (the
second oldest one)
David Laniado @sdivad Gender Gap in Collaborative Platforms: Language and emotions in Wikipedia Discussions 4 / 58
5. The largest human knowledge repository
Fifth most visited web site
Among top results for search queries about almost any topic
Largest collaborative project
Conditions and reflects public opinion... with some bias
David Laniado @sdivad Gender Gap in Collaborative Platforms: Language and emotions in Wikipedia Discussions 5 / 58
6. Wikipedia’s social experiment
"The problem with Wikipedia is that it only works in practice. In theory,
it can never work."
(Wikipedian popular joke)
A crazy idea: anyone can edit
All relevant points of view should be represented
Policies of Notability and Neutral Point of view
Quality assured by editors’ negotiations over content
The more people with different points of view contributing, the
better the quality
→ Biases in the editor community may cause biases in the content
David Laniado @sdivad Gender Gap in Collaborative Platforms: Language and emotions in Wikipedia Discussions 6 / 58
7. Outline
1 Introduction
Wikipedia content biases
Wikipedia gender gap
Wikipedia discussion spaces
Goal and research questions
2 Framework of analysis
Data acquisition and pre-processing
User gender labelling
Language and sentiment analysis
3 Results
Emotions and status
Emotions and gender
Networked emotions
4 Conclusions
David Laniado @sdivad Gender Gap in Collaborative Platforms: Language and emotions in Wikipedia Discussions 7 / 58
8. Bias in the content?
Top global biographies by birth country (Young-Ho Eom et al, 2015)
top central biographies from each of the 24 major Wikipedias
the 100 most central (PageRank) in each version’s hyperlink network
→ striking geographic bias
http://www.quantware.ups-tlse.fr/QWLIB/topwikipeople/geofigs/pagerank24x100.html
David Laniado @sdivad Gender Gap in Collaborative Platforms: Language and emotions in Wikipedia Discussions 8 / 58
9. Top Women Biographies
Rank NA PageRank female figures CC Century LC
1 24 Elizabeth II UK 20 EN
2 17 Mary (mother of Jesus) IL -1 HE
3 12 Queen Victoria UK 19 EN
4 6 Elizabeth I of England UK 16 EN
5 2 Maria Theresa AT 18 DE
6 1 Benazir Bhutto PK 20 HI
7 1 Catherine the Great PL 18 PL
8 1 Anne Frank DE 20 DE
9 1 Indira Gandhi IN 20 HI
10 1 Margrethe II of Denmark DK 20 DA
Top 10 global female historical figures by PageRank for the 24 major
Wikipedia editions (Young-Ho Eom et al, 2015)
NA → number of language editions in which a biography appears in the
top 100 rank
CC → birth country code
LC → language code corresponding to the birth country
David Laniado @sdivad Gender Gap in Collaborative Platforms: Language and emotions in Wikipedia Discussions 9 / 58
10. Top biographies by gender
Number of women among the top global biographies by birth century
Number of women among the 100 most central biographies for each
language edition
David Laniado @sdivad Gender Gap in Collaborative Platforms: Language and emotions in Wikipedia Discussions 10 / 58
11. Outline
1 Introduction
Wikipedia content biases
Wikipedia gender gap
Wikipedia discussion spaces
Goal and research questions
2 Framework of analysis
Data acquisition and pre-processing
User gender labelling
Language and sentiment analysis
3 Results
Emotions and status
Emotions and gender
Networked emotions
4 Conclusions
David Laniado @sdivad Gender Gap in Collaborative Platforms: Language and emotions in Wikipedia Discussions 11 / 58
12. Wikipedia editor gender gap
Estimated women participation
2011 editor survey: 9%
2013 editor survey: 13%
corrected with propensity score estimation: 16% (Mako Hill and
Shaw, 2013)
while in most online social networks women are more active!
David Laniado @sdivad Gender Gap in Collaborative Platforms: Language and emotions in Wikipedia Discussions 12 / 58
13. Why women do not edit Wikipedia?
1 A lack of user-friendliness in the editing interface
2 Not having enough free time
3 A lack of self-confidence
4 Aversion to conflict and an unwillingness to participate in lengthy
edit wars
5 Belief that their contributions are too likely to be reverted or
deleted
6 Some find its overall atmosphere misogynistic
7 Wikipedia culture is sexual in ways they find off-putting
8 Being addressed as male is off-putting to women whose primary
language has grammatical gender
9 Fewer opportunities than other sites for social relationships and a
welcoming tone
(Sue Gardener, 2011)
David Laniado @sdivad Gender Gap in Collaborative Platforms: Language and emotions in Wikipedia Discussions 13 / 58
14. Outline
1 Introduction
Wikipedia content biases
Wikipedia gender gap
Wikipedia discussion spaces
Goal and research questions
2 Framework of analysis
Data acquisition and pre-processing
User gender labelling
Language and sentiment analysis
3 Results
Emotions and status
Emotions and gender
Networked emotions
4 Conclusions
David Laniado @sdivad Gender Gap in Collaborative Platforms: Language and emotions in Wikipedia Discussions 14 / 58
15. Emotional factors and discussions
Importance of
emotional factors
Discussion spaces are
fundamental to the
collaborative process
Discussion triggers
emotions and breeds
particular emotional
environments
David Laniado @sdivad Gender Gap in Collaborative Platforms: Language and emotions in Wikipedia Discussions 15 / 58
16. Wikipedia’s most visible side
David Laniado @sdivad Gender Gap in Collaborative Platforms: Language and emotions in Wikipedia Discussions 16 / 58
17. Article talk pages
David Laniado @sdivad Gender Gap in Collaborative Platforms: Language and emotions in Wikipedia Discussions 17 / 58
18. Discussions in article talk pages
David Laniado @sdivad Gender Gap in Collaborative Platforms: Language and emotions in Wikipedia Discussions 18 / 58
19. Outline
1 Introduction
Wikipedia content biases
Wikipedia gender gap
Wikipedia discussion spaces
Goal and research questions
2 Framework of analysis
Data acquisition and pre-processing
User gender labelling
Language and sentiment analysis
3 Results
Emotions and status
Emotions and gender
Networked emotions
4 Conclusions
David Laniado @sdivad Gender Gap in Collaborative Platforms: Language and emotions in Wikipedia Discussions 19 / 58
20. Studying emotions and language in talk pages
Interactions in Wikipedia
Implicit → editing
Explicit → communication
Article talk pages → discussions about how to improve articles
User talk pages → a kind of public in-boxes
Goal: Shed light on the emotional dimension of the interactions
extensive analysis of emotions in explicit communication
sentiment analysis of comments in article talk and personal talk
pages
David Laniado @sdivad Gender Gap in Collaborative Platforms: Language and emotions in Wikipedia Discussions 20 / 58
21. Research questions
1 How are the emotional and communication styles of editors
affected by their status?
2 How are the emotional and communication styles of editors
affected by their gender?
3 How are the emotional expressions affected by interacting with
others in comment threads (emotional congruence)?
4 How are the emotional styles of editors related to those of the
editors they interact more frequently with (emotional
homophily)?
David Laniado @sdivad Gender Gap in Collaborative Platforms: Language and emotions in Wikipedia Discussions 21 / 58
22. Publications
Results published in:
Laniado, D., Castillo, C., Kaltenbrunner, A., and Fuster Morell, M. F. (2012)
Emotions and dialogue in a peer-production community: the case of Wikipedia.
8th International Symposium on Wikis and Open Collaboration, WikiSym’12
Iosub, D., Laniado, D., Castillo, C., Fuster Morell, M. F., and Kaltenbrunner, A. (2014)
Emotions under Discussion: Gender, Status and Communication in Online Collaboration.
Plos One, 9(8)
David Laniado @sdivad Gender Gap in Collaborative Platforms: Language and emotions in Wikipedia Discussions 22 / 58
23. Outline
1 Introduction
Wikipedia content biases
Wikipedia gender gap
Wikipedia discussion spaces
Goal and research questions
2 Framework of analysis
Data acquisition and pre-processing
User gender labelling
Language and sentiment analysis
3 Results
Emotions and status
Emotions and gender
Networked emotions
4 Conclusions
David Laniado @sdivad Gender Gap in Collaborative Platforms: Language and emotions in Wikipedia Discussions 23 / 58
24. Outline
1 Introduction
Wikipedia content biases
Wikipedia gender gap
Wikipedia discussion spaces
Goal and research questions
2 Framework of analysis
Data acquisition and pre-processing
User gender labelling
Language and sentiment analysis
3 Results
Emotions and status
Emotions and gender
Networked emotions
4 Conclusions
David Laniado @sdivad Gender Gap in Collaborative Platforms: Language and emotions in Wikipedia Discussions 24 / 58
25. Dataset: conversations
Extracting conversations among editors from the English Wikipedia
Articles 3 210 039
Articles with talk page (ATP) 871 485 (27.1%)
Editors who comment articles 350 958
Editors with ≥ 100 comments on ATP 12 231 (3.5%)
Total comments in ATP 11 041 246
Comments containing ANEW words 7 414 411 (67.2%)
Comments by editors with ≥ 100 comments on ATP 5 480 544 (49.6%)
Comments by these editors and with ANEW words 3 649 297 (33.3%)
Table: Data extracted from a complete dump of the English Wikipedia, dated
March 12th, 2010
David Laniado @sdivad Gender Gap in Collaborative Platforms: Language and emotions in Wikipedia Discussions 25 / 58
26. Outline
1 Introduction
Wikipedia content biases
Wikipedia gender gap
Wikipedia discussion spaces
Goal and research questions
2 Framework of analysis
Data acquisition and pre-processing
User gender labelling
Language and sentiment analysis
3 Results
Emotions and status
Emotions and gender
Networked emotions
4 Conclusions
David Laniado @sdivad Gender Gap in Collaborative Platforms: Language and emotions in Wikipedia Discussions 26 / 58
27. User gender labelling
≈ 12 000 users wrote ≥ 100 comments in articles talk pages
Gender identified through Wikipedia API for ≈ 2 000 of them
A sample of 1 385 users for manual labelling through
crowdsourcing (Crowdflower)
Non-admins Admins Total
Men 1 087 1 526 2 613
Women 68 97 165
Unknown 6 850 2 603 9 453
Total 8 005 4 226 12 231
Table: Users with ≥ 100 comments by gender and administrator status.
Gender could be identified only for ≈ 50% of users:
real name or username (50% of those identified)
implicitly stated gender (27% of women, 20% of men)
pronoun (15% of women, 10% of men)
other indicators: userboxes, pictures, links to personal blogs...
David Laniado @sdivad Gender Gap in Collaborative Platforms: Language and emotions in Wikipedia Discussions 27 / 58
28. Outline
1 Introduction
Wikipedia content biases
Wikipedia gender gap
Wikipedia discussion spaces
Goal and research questions
2 Framework of analysis
Data acquisition and pre-processing
User gender labelling
Language and sentiment analysis
3 Results
Emotions and status
Emotions and gender
Networked emotions
4 Conclusions
David Laniado @sdivad Gender Gap in Collaborative Platforms: Language and emotions in Wikipedia Discussions 28 / 58
29. Measuring the Emotional Content of Discussions
Lexicon-based methods
relying on three different instruments:
Affective norms for English words (ANEW)
Linguistic Inquiry and Word Count (LIWC)
SentiStrength
David Laniado @sdivad Gender Gap in Collaborative Platforms: Language and emotions in Wikipedia Discussions 29 / 58
30. Measuring the Emotional Content of Discussions
Method 1: Affective norms for English words (ANEW)
Rates a list of 1060 frequent words on a 9 point scale in three
dimensions:
Valence
Arousal
Dominance
assign emotion scores to each word from the lexicon
Bradley and Lang. (1999).
Affective norms for English words (ANEW) Technical report C-1.
The Center for Research in Psychophysiology, University of Florida, FL.
David Laniado @sdivad Gender Gap in Collaborative Platforms: Language and emotions in Wikipedia Discussions 30 / 58
31. Measuring the Emotional Content of Discussions
Method 2: Linguistic Inquiry and Word Count (LIWC)
Two scores for basic emotion (compared with ANEW valence)
positive emotion
negative emotion
Discrete measures of emotions (anger, anxiety, sadness, affect)
Other classes of words to characterize language (i.e. personal
pronouns, tentative words, fillers...)
→ Count the proportion of words belonging to each class
Pennebaker J, Chung C, Ireland M, Gonzales A, Booth R (2010).
The development and psychometric properties of LIWC2007. Austin, TX.
David Laniado @sdivad Gender Gap in Collaborative Platforms: Language and emotions in Wikipedia Discussions 31 / 58
32. Measuring the Emotional Content of Discussions
Method 2: Linguistic Inquiry and Word Count (LIWC)
Dictionary size Examples
Anger 91 hate, kill, annoyed
Anxiety 84 worried, fearful, nervous
Sadness 101 crying, grief, sad
Tentative 155 maybe, perhaps, guess
Certainty 83 always, never
Fillers 9 blah, you know
Past 155 went, ran, had
Present 169 is, does, hear
Future 48 will, gonna
Social words 455 mate, talk, child
Table: Description of LIWC measures (as per http://www.liwc.net).
David Laniado @sdivad Gender Gap in Collaborative Platforms: Language and emotions in Wikipedia Discussions 32 / 58
33. Measuring Relationship-Orientation with LIWC
Definition
Communication that promotes social
affiliation and emotional connection:
preoccupation with others (use of
personal pronouns, e.g., I, you)
preoccupation with the larger social
domain (e.g., references to friends and
family)
expression of positive emotion
Examples
We are glad to have you. If I can help at all let
me know :)
A-giau has smiled at you. Smiles promote
WikiLove and hopefully this one has made
your day better...Happy editing
Congrats! Thank you for your dedication.
David Laniado @sdivad Gender Gap in Collaborative Platforms: Language and emotions in Wikipedia Discussions 33 / 58
34. Measuring the Emotional Content of Discussions
Method 3: SentiStrength
SentiStrength
Based on LIWC and developed for short web texts
Accounts for modes of textual expression specific to the online
environment, e.g. emoticons and abbreviations
Provides a positive and a negative score for emotional valence
Emotion score is the strongest positive and negative emotion
expressed in a comment
Final scores are averages over comments in a given category
Thelwall M, Buckley K, Paltoglou G, Cai D, Kappas A (2010)
Sentiment strength detection in short informal text.
Journal of the American Society for Information Science and Technology 61: 2544 – 2558.
David Laniado @sdivad Gender Gap in Collaborative Platforms: Language and emotions in Wikipedia Discussions 34 / 58
35. Example: results with three different emotional lexica
Table: Example messages with their corresponding Valence(ANEW) or
positive & negative scores (LIWC, SentiStrength)
ANEW LIWC SentiSt.
Valence + - + -
Sounds like a good challenge - to be proven or disproven. I’m
happy if it can be shown to go further using closed cubic poly-
nomial solutions. The nice thing about these are that they are
pretty easy to test numerically . . .
7.4 12.5 0 3 -2
–in “Exact trigonometric constants”
Seems you have not yet seen female lover after having sex
who do not wish to have sex with the same lover any more :)
Once you’ve seen it, you understand very well what war of Venus
means compared to war of Mars.
5.5 6.8 4.5 4 -3
–in “House (astrology)”
What about the whirlie hazing, the alcohol abuse, the emotional
poverty, the suicide in 1995/6, the biotech plans which were
stopped by pitzer protests . . .
1.6 4 8 1 -4
–in “Harvey Mudd College”
David Laniado @sdivad Gender Gap in Collaborative Platforms: Language and emotions in Wikipedia Discussions 35 / 58
36. Sentiment analysis
Statistical tests
Compute average values with the three lexica for each user
Compare distribution of values for two groups of users (e.g.:
admins vs regulars, women vs men)
Most variables are not normally distributed
⇓
Mann-Whitney U-test
Compare distributions of rankings
David Laniado @sdivad Gender Gap in Collaborative Platforms: Language and emotions in Wikipedia Discussions 36 / 58
37. Outline
1 Introduction
Wikipedia content biases
Wikipedia gender gap
Wikipedia discussion spaces
Goal and research questions
2 Framework of analysis
Data acquisition and pre-processing
User gender labelling
Language and sentiment analysis
3 Results
Emotions and status
Emotions and gender
Networked emotions
4 Conclusions
David Laniado @sdivad Gender Gap in Collaborative Platforms: Language and emotions in Wikipedia Discussions 37 / 58
38. Outline
1 Introduction
Wikipedia content biases
Wikipedia gender gap
Wikipedia discussion spaces
Goal and research questions
2 Framework of analysis
Data acquisition and pre-processing
User gender labelling
Language and sentiment analysis
3 Results
Emotions and status
Emotions and gender
Networked emotions
4 Conclusions
David Laniado @sdivad Gender Gap in Collaborative Platforms: Language and emotions in Wikipedia Discussions 38 / 58
39. Emotions and Status
Table: Emotions and Status: Administrators promote a generally neutral tone
on article talk pages. Regular editors express more negative emotion, and
are more emotional.
(Article Talk) Regular Admin Mann-Whitney U-Test p-value
LIWC
Positive 2.369 2.409 -4.308 p < 0.001
Negative 1.368 1.120 -18.578 p < 0.001
Affect 3.784 3.661 -8.466 p < 0.001
Anxiety 0.180 0.166 -5.834 p < 0.001
Anger 0.554 0.446 -19.217 p < 0.001
Sadness 0.175 0.166 -4.450 p < 0.001
SentiStrength
Positive 1.805 1.774 -14.603 p < 0.001
Negative -2.005 -1.912 -23.046 p < 0.001
When difference is statistically significant (p-value in bold) the larger absolute value is underlined
David Laniado @sdivad Gender Gap in Collaborative Platforms: Language and emotions in Wikipedia Discussions 39 / 58
40. Emotions and Status
Admins:
more positive emotion
(ANEW and LIWC)
generally, emotionally
reserved compared to
regular users (LIWC)
Regular users:
more emotional
more affect, and more
anxiety, anger and
sadness (LIWC)
stronger positive and
negative words than
admins (SentiStrength)
Personal talk pages
In personal talk pages, admins are more emotional compared to
the article talk pages
more positive emotion compared to regular editors, but also more
anxiety and sadness
David Laniado @sdivad Gender Gap in Collaborative Platforms: Language and emotions in Wikipedia Discussions 40 / 58
41. Dialogue and Status
Table: Dialogue and Status: Administrators are more impersonal in article talk
pages. Regular editors are more concerned with others.
(Article Talk) Regular Admin Mann-Whitney U-test p-value
Relationship-orientation
Personal pronouns 5.135 4.815 -13.561 p < 0.001
Use of “I” 2.456 2.429 -1.733 p=0.083
Use of “You” 1.043 0.892 -12.573 p < 0.001
Use of “Shehe” 0.609 0.526 -8.657 p < 0.001
Social words 6.320 5.810 -19.013 p < 0.001
Certainty
Certainty 1.426 1.317 -16.824 p < 0.001
Tentativeness 3.199 3.169 -2.210 p < 0.001
Filler words 0.168 0.155 -6.687 p < 0.001
Temporal Orientation
Past 2.376 2.305 -5.696 p < 0.001
Present 8.011 7.841 -8.060 p < 0.001
Future 1.114 1.166 -9.887 p < 0.001
When difference is statistically significant (p-value in bold) the larger absolute value is underlined
David Laniado @sdivad Gender Gap in Collaborative Platforms: Language and emotions in Wikipedia Discussions 41 / 58
42. Dialogue and Status
Admins
more neutral and
impersonal tone
less relationship oriented
more concerned with the
future
tend to "rule with reason"
Regular users
more relationship-oriented
more personal pronouns
and more social words
more concerned with past
more insecure, but not in
personal spaces
more certainty, tentative
and filler words in article
talk pages
David Laniado @sdivad Gender Gap in Collaborative Platforms: Language and emotions in Wikipedia Discussions 42 / 58
43. Outline
1 Introduction
Wikipedia content biases
Wikipedia gender gap
Wikipedia discussion spaces
Goal and research questions
2 Framework of analysis
Data acquisition and pre-processing
User gender labelling
Language and sentiment analysis
3 Results
Emotions and status
Emotions and gender
Networked emotions
4 Conclusions
David Laniado @sdivad Gender Gap in Collaborative Platforms: Language and emotions in Wikipedia Discussions 43 / 58
44. Emotions and gender
ANEW Words more used by women and men
Size accounts for difference in frequency
David Laniado @sdivad Gender Gap in Collaborative Platforms: Language and emotions in Wikipedia Discussions 44 / 58
45. Emotions and gender
Women use words associated to more positive emotions
Result consistent and significant with the three lexicons
ANEW: Difference is not significant when normalising by article
→ difference might be due to topic selection: women choose to
participate in topics which have more positive discussions
No significant difference in expression of negative emotions
David Laniado @sdivad Gender Gap in Collaborative Platforms: Language and emotions in Wikipedia Discussions 45 / 58
46. Topics, emotions and gender
N≥1 ANEW words; corr=−0.64 (p=0.002)
prop. of male comments
meanvalence
0.82 0.84 0.86 0.88 0.9 0.92 0.94 0.96
4.7
4.8
4.9
5
5.1
5.2
5.3
5.4
5.5
5.6
Computing
Arts
Philosophy
Language
Health
Mathematics
Belief
Sports
Agriculture
Environment
Techn. & app. sci.
Law
Society
Business
Education
Culture
People
Science
Politics
Geography and places
History and events
Figure: Mean valence (ANEW) for discussions of articles in different topic
categories, vs the proportion of comments written by men
David Laniado @sdivad Gender Gap in Collaborative Platforms: Language and emotions in Wikipedia Discussions 46 / 58
47. Dialogue and gender
Table: Dialogue and Gender: Women use a more relationship-oriented
speech style.
(Article Talk) Men Women Mann-Whitney U-test p-value
Relationship-orientation
Personal pronouns 4.964 5.420 -4.375 p < 0.001
Use of “I” 2.488 2.764 -3.945 p < 0.001
Use of “You” 0.936 0.957 -0.926 p=0.355
Use of “Shehe” pronouns 0.541 0.713 -4.657 p < 0.001
Social words 5.960 6.353 -3.487 p < 0.001
Certainty
Certainty 1.346 (1397) 1.300 (1263) -2.078 p = 0.038*
Tentativeness 3.150 3.215 -1.162 p=0.245
Filler words 0.161 0.160 -0.137 p=0.891
Temporal Orientation
Past 2.325 2.543 -4.305 p < 0.001
Present 7.897 8.180 -3.086 p = 0.002
Future 1.168 1.147 -1.008 p=0.314
When difference is statistically significant (p-value in bold) the larger absolute value is
underlined. Cases where the averages are not informative are marked with an asterisk * and
include the mean ranks Mann-Whitney U-test next to the averages in parentheses.
David Laniado @sdivad Gender Gap in Collaborative Platforms: Language and emotions in Wikipedia Discussions 47 / 58
48. Dialogue and Gender
Women write longer messages
Women are more relationship-oriented
more personal pronouns, in particular “I”, more social words
Women are not more insecure
Less certainty words, no significant difference for tentativeness and
filler words
Women admins are more relationship oriented than men admins
Different leadership style
David Laniado @sdivad Gender Gap in Collaborative Platforms: Language and emotions in Wikipedia Discussions 48 / 58
49. Qualitative analysis: Relationship orientation
Manual classification of 100 comments
Three main types of comments high in relationship-orientation:
inviting comments that explain the edit in a friendly tone, and call for
further intervention and collaboration
common perspective-building comments that are focused on
understanding others and solving debates in a constructive manner
appreciative comments that contain positive emotions and
celebrate others’ actions
⇒ This suggests that relationship-orientation may be conducive to
successful collaboration
David Laniado @sdivad Gender Gap in Collaborative Platforms: Language and emotions in Wikipedia Discussions 49 / 58
50. Outline
1 Introduction
Wikipedia content biases
Wikipedia gender gap
Wikipedia discussion spaces
Goal and research questions
2 Framework of analysis
Data acquisition and pre-processing
User gender labelling
Language and sentiment analysis
3 Results
Emotions and status
Emotions and gender
Networked emotions
4 Conclusions
David Laniado @sdivad Gender Gap in Collaborative Platforms: Language and emotions in Wikipedia Discussions 50 / 58
51. Emotional congruence
Comparison of each comment with the comment it replies to
not based only on our set of users, but on all comments (from all
users)
Emotions: editors tend to reply with:
more positive emotion
less negative emotion
less anger, anxiety and sadness
stronger words, both positive and negative (SentiStrength)
Dialogue: editors tend to reply with:
more relationship oriented speech
less tentative and certainty words
David Laniado @sdivad Gender Gap in Collaborative Platforms: Language and emotions in Wikipedia Discussions 51 / 58
52. Emotional homophily
Mixing patterns: do users interact preferentially with similar users?
Disassortativity by activity
users who write more comments tend to reply preferentially to less
active users, and viceversa
Assortativity by gender
Men interact more with other men, and women with other women
Assortativity by emotion and language
Users interact more with others similar in emotional expression and
communication style
also in the network of communication on personal talk pages
David Laniado @sdivad Gender Gap in Collaborative Platforms: Language and emotions in Wikipedia Discussions 52 / 58
53. Emotional homophily
Example: homophily by expression of anger
edges connect users who
have exchanged at least 10
replies
node color represents the
level of anger expressed by a
user, from low to high
node size → proportional to
the number of connections of
a user
David Laniado @sdivad Gender Gap in Collaborative Platforms: Language and emotions in Wikipedia Discussions 53 / 58
54. Outline
1 Introduction
Wikipedia content biases
Wikipedia gender gap
Wikipedia discussion spaces
Goal and research questions
2 Framework of analysis
Data acquisition and pre-processing
User gender labelling
Language and sentiment analysis
3 Results
Emotions and status
Emotions and gender
Networked emotions
4 Conclusions
David Laniado @sdivad Gender Gap in Collaborative Platforms: Language and emotions in Wikipedia Discussions 54 / 58
55. Conclusions
Administrators and experienced users play a pivotal role
they tend to interact especially with less experienced users
they promote a positive but impersonal environment
Men and women have a different communication style
women participate in discussions that have a more positive tone
men interact more with men, and women with women
women use a more emotional and relationship-oriented language
women admins have a relationship-oriented leadership style
⇒ promoting relationship-orientated leadership could lead to a more
positive environment
⇒ giving women more space in the community could result in a more
welcoming envoronment, for both women and men
David Laniado @sdivad Gender Gap in Collaborative Platforms: Language and emotions in Wikipedia Discussions 55 / 58
56. Future work
Longitudinal analysis
how do emotional styles of editors change over time and with
increasing experience?
how do emotions in the discussions affect participation?
Qualitative analysis and human annotation
include non-textual emotional aspects such as emoticons, barn
stars and virtual gifts
deal with sarcasm, measure the extent of condescending or
paternalistic language in comments addressed at women editors
Examine other online spaces
Similar conclusions might hold for other online spaces
especially in discussions involving conflict, decision making and
power dynamics
David Laniado @sdivad Gender Gap in Collaborative Platforms: Language and emotions in Wikipedia Discussions 56 / 58
57. Some references
M. M. Bradley and P. J .Lang.
Affective norms for English words (ANEW) Technical report C-1.
The Center for Research in Psychophysiology, University of Florida, FL, 2012.
B. Collier and J. Bear.
Conflict, criticism, or confidence: an empirical examination of the gender gap in wikipedia contributions.
In Proc. of CSCW, 2012.
Eom, Y.H., Aragón, P., Laniado, D., Kaltenbrunner, A., Vigna, S., Shepelyansky, D.L.: Interactions of cultures and top
people of wikipedia from ranking of 24 language editions.
PLoS ONE 10(3), e0114,825 (2015).
Iosub, D., Laniado, D., Castillo, C., Fuster Morell, M. F., and Kaltenbrunner, A. (2014)
Emotions under Discussion: Gender, Status and Communication in Online Collaboration.
Plos One, 9(8)
O. Kucuktunc, B. B. Cambazoglu, I. Weber, and H. Ferhatosmanoglu.
A large-scale sentiment analysis for Yahoo! answers.
In Proc. of WSDM, 2012.
D. Laniado, R. Tasso, Y. Volkovich, and A. Kaltenbrunner.
When the Wikipedians talk: Network and tree structure of Wikipedia discussion pages.
In Proc. of ICWSM, 2011.
Laniado, D., Castillo, C., Kaltenbrunner, A., and Fuster Morell, M. F. (2012)
Emotions and dialogue in a peer-production community: the case of Wikipedia.
8th International Symposium on Wikis and Open Collaboration, WikiSym’12
H. Zhu, R. Kraut, A. Kittur
Effectiveness of shared leadership in online communities.
In Proc. of CSCW, 2012.
David Laniado @sdivad Gender Gap in Collaborative Platforms: Language and emotions in Wikipedia Discussions 57 / 58