From my CSCW 2012 talk about language, gender, and utility on IMDb. Slides and notes available as PDF:
http://casmlab.org/docs/learning_the_lingo_w_notes.pdf
More info about the project available: http://www.casmlab.org/projects/informationbias/
9. data
• 250 top-rated movies
• 21,012 unique reviewers
• 100 most prolific men, 100 most prolific women from that group
• 199,166 reviews written by those 200 people
10. descriptives
                          M       F
reviews written (median)  1,187   183.5
review length (median)    249     223
most reviews              8,167   2,061
19. The Descendants takes a dramatic look at
the structure of family and the intrinsic
bonds that holds its members together. This
dark and wonderful drama painfully reveals
the fact that many families are broken, often
ugly things, yet still mysteriously hold
together. It provides a solemn example of
how every single piece of a family can be
fragmented yet recreated through
communal obstacles.
20. Best picture nominee? Really? It was
quite boring for me... I was waiting
the end of it almost from the
beginning... If there wasn't Clooney, I
don't know the reason I would even
consider to see it. But he is really
amazing actor! His play was
awesome!
26. gender
statistical significance:    small effect size:
hedges                       hedges
pronouns                     pronouns
first person pronouns        first person pronouns
vocabulary richness          vocabulary richness
sentence complexity          sentence complexity
word complexity              word complexity
utility                      utility
27. time
statistical significance:    small effect size:
hedges                       hedges
pronouns                     pronouns
first person pronouns        first person pronouns
vocabulary richness          vocabulary richness
sentence complexity          sentence complexity
word complexity              word complexity
34. summary:
what we know
- Language convergence can happen, even without direct interaction;
- Women receive lower utility scores, even when they write like the men;
- Lower utility scores effectively bury women’s contributions
35. future work:
what we don’t know
- Who reads, writes, and votes and why
- What’s the relationship between social voting and information bias?
- What else could be driving these results?
- Effects of the objects being reviewed, e.g., movie genre, popularity
36. contact us
Libby Hemphill Jahna Otterbacher
libby.hemphill@iit.edu jahna.otterbacher@iit.edu
Illinois Institute of Technology
Chicago, IL
37. limitations
- gender and sex are complicated; their reporting is also complicated
- don’t know who’s voting
- other features (e.g., genre, release date) may also have effects
38. controlling for movie
• Jahna’s 2010 CIKM paper: predicted gender using content,
style and metadata (including utility) features. Included
movie and movie genre as control variables, but they were
not significantly correlated to review utility. Gender was
by and large the most significant correlate of utility.
• Did a little sanity check by looking up three “chick
flicks” (one mentioned in yesterday’s panel) on IMDb:
Bridges of Madison County, Sixteen Candles and even
Romy and Michele's High School Reunion. If you sort by
gender, you can see that the guys are writing the highly-
ranked stuff.
39. q: What’s your role in this community?
a: I write reviews. Hopefully some people read them. I
primarily write them because I enjoy it. If people want to
read them, that's a bonus. I read other people's reviews
because I'm interested in what they've got to say. I like
the varied responses people come up with to the same
film! And I visit the forums to join in debates or to
answer a question in which a user needs help identifying
a movie or an actor. So my role in that sense would be
"sharing knowledge".
40. q: To what extent do you pay attention to how
many people mark your reviews as useful?
a: None with obscure movies. how often
would anybody even READ the review. (sic)
And people might be like me, just curious if
other people had the same opinion. I don't
really go back and see if people found them
useful. I like if I get a personal note about a
review, but that happens VERY rarely.
41. q: To what extent do you feel that your
reviews are valued by the IMDb community?
a: To be honest I really have no idea how to
answer that, while I get the odd private
message thanking me or criticsing my
reviews I don't get that much feedback so in
all honesty don't know. I like to think people
like them but I can't be sure one way or the
other.
43. example reviews
The next in a long line of "found footage" flicks that have been flooding our cinemas
over the last few years, Chronicle breaks free of the usual constraints within that
sub genre to concoct a truly memorable sci-fi thriller. Retracing the steps of three
teenage friends who are gifted with telekinesis after a chance encounter with
something (intelligently, the movie never stipulates what exactly), the story focuses
on the varying paths they take with their new found talent, but not until they have
had some juvenile fun with it first. This is an amazingly accomplished debut feature
for writer-director Josh Trank (who co-penned the script with Max "son of John"
Landis); his technical veracity is utterly mind-blowing – especially when you consider
the shoestring funds he had to work with – and his narrative pacing is impeccable.
The icing on the already yummy cake is the marvellous CGI that allows our
protagonists to fly, crush cars and stop baseballs in mid air – all seamlessly and
photo-realistically. Chronicle is a tremendous achievement in low-budget, big-
concept filmmaking.
Editor's notes
Between studies like those mentioned during yesterday’s panel on gender and the popular press, the question of how women’s online participation differs from men’s is a nagging one. As I mentioned during the Q&A part of the panel, I’m troubled by the conflation of genders and sexes and genders and behaviors, but we’ve used those broad categories in this research as well. I’ll happily talk about the differences between sex and gender during the Q&A. Issues of sex and gender are messy, especially online, and our project is an attempt to use the signals available to understand a small part of what might be driving differences we see between the scale of men’s participation and the scale of women’s.
Well, our results suggest that women do contribute, some even profusely, but that their contributions are buried. Using data from the IMDb review site, I’ll show you that women on IMDb are adopting the majority voice in their language use, without interacting directly with other reviewers, and that their contributions, even when indistinguishable from men’s, are shoved out of view because they lose the zero-sum game of “making the first page of results”.
And all it takes is a few answers to this question: Was the above review useful to you?
This kind of question, this form of social voting, is increasingly popular. We see it on Facebook, Amazon, The Hairpin, Guitar Center, even Buzzy. Designers of many open contribution systems ask users to provide feedback about the contributions and then use that feedback to sort the long lists of contributions. We wondered about the relationship between the votes people give and some features of the contributions.
Specifically, gender and language.
So we set off to IMDb to find out. IMDb is a huge resource about a variety of visual media, and we focused on film. IMDb’s content comes from a variety of sources, and our study focuses on the reviews provided by registered IMDb users.
IMDb user reviews look like so. On IMDb, “utility” means the proportion of people who found the review useful out of all of those who voted. So here, 3 out of 3 voters found this particular review of Elvis, the made-for-TV movie, useful.
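The utility definition above can be written as a one-line ratio. This is a minimal sketch, not IMDb's code: the function name `utility` and the choice to return `None` when nobody has voted are assumptions made for illustration.

```python
from typing import Optional


def utility(useful_votes: int, total_votes: int) -> Optional[float]:
    """Proportion of voters who marked a review useful.

    Returns None when no one has voted (an assumption; the talk does not
    say how unvoted reviews are treated).
    """
    if total_votes == 0:
        return None
    return useful_votes / total_votes


# The Elvis review from the slide: 3 of 3 voters found it useful.
print(utility(3, 3))  # 1.0
```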
The data we used for the study come from 200 prolific reviewers. First, we found the 250 top-rated movies, according to IMDb users. That yielded 21,012 reviews. From those reviews, we identified all the authors and selected the 100 most prolific men and 100 most prolific women. Then, we gathered all the reviews by those reviewers, and those 199,166 reviews comprise our dataset.
In those nearly 200,000 reviews, men dominated all measures of activity: reviews written and review length. The gap between the most prolific reviewer in each group is four-fold: 8,167 reviews to 2,061. We focused on the language used in those reviews, specifically six linguistic features and one measure of utility.
We used regression analysis to determine the impact of gender and time on a review’s utility and various language features. We chose these features of language because previous research has suggested or found significant differences between men and women on each measure. We include utility because we were curious whether there were differences between men and women.
- Hedges: words such as “more or less” and “rather”; number of hedges, normalized by review length (words)
- Pronoun rate: number of pronouns / number of words
- First person pronouns: proportion of pronouns that are first person
- Vocabulary richness: diversity of words used; number of unique words / total words
- Word complexity: character-to-word ratio
- Sentence complexity: word-to-sentence ratio
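The six ratios listed above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the study's pipeline: the hedge lexicon and pronoun lists below are small hypothetical samples (the talk names only "more or less" and "rather", and its Pulp Fiction example uses "perhaps" and "may"), and the regex tokenizer and sentence splitter are deliberately crude.

```python
import re

# Hypothetical, deliberately tiny lexicons for illustration only.
HEDGES = ("more or less", "rather", "perhaps", "may")
FIRST_PERSON = {"i", "me", "my", "mine", "myself",
                "we", "us", "our", "ours", "ourselves"}
PRONOUNS = FIRST_PERSON | {
    "you", "your", "yours", "yourself",
    "he", "him", "his", "himself", "she", "her", "hers", "herself",
    "it", "its", "itself", "they", "them", "their", "theirs", "themselves",
}


def language_features(text):
    """Compute the six language ratios for one review (rough sketch)."""
    lowered = text.lower()
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", lowered)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words:
        raise ValueError("review contains no words")
    n_words = len(words)
    pronouns = [w for w in words if w in PRONOUNS]
    hedge_count = sum(
        len(re.findall(r"\b" + re.escape(h) + r"\b", lowered)) for h in HEDGES
    )
    first_person = sum(1 for w in pronouns if w in FIRST_PERSON)
    return {
        "hedge_rate": hedge_count / n_words,
        "pronoun_rate": len(pronouns) / n_words,
        "first_person_share": first_person / len(pronouns) if pronouns else 0.0,
        "vocabulary_richness": len(set(words)) / n_words,
        "word_complexity": sum(len(w) for w in words) / n_words,
        "sentence_complexity": n_words / len(sentences),
    }


features = language_features("Perhaps I may like it. We rather enjoyed it!")
print(features["sentence_complexity"])  # 4.5 (9 words across 2 sentences)
```

A real analysis would need a vetted hedge lexicon and a proper tokenizer; a word like "may" is also a month, so simple string matching overcounts.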
Our first hypothesis, based on a bunch of literature you can read about in our paper, was that over time, women would write more like men. We expected to see women adapt their writing to the majority voice.
Our first measure about language use was “hedging”. Hedges qualify the writer’s commitment to their statement. Some say they are subtle means to avoid responsibility or to obscure the facts. Some say they show politeness or lack of confidence. Either way, existing research on gender and language suggests that women use hedging words much more often than men do. Here, and in other slides where I show users’ content, I’ve left the submissions unedited (except for trimming content before and after). This line from a review of Pulp Fiction is a good example of a hedge-infested review. “Perhaps ‘Pulp Fiction’ may remain tarantino’s opus, perhaps not.” Without hedges, the line is “Pulp Fiction will remain Tarantino’s opus.”
Surprisingly, we found that both males and females use about the same (see, I hedged) number of hedges when they first start writing, but then, the women decrease their hedge use and men increase theirs. We definitely did not expect to see men increase their hedge use over time. In this and all the other graphs I’ll show, time is on the x-axis, and the language feature or utility is on the y-axis. Shorter red lines represent women, and longer blue lines represent men.
Something else we didn’t expect was an increase in pronouns. When looking at all pronouns - he, she, our, their - we see the gap between women’s use and men’s use that we expect here at the beginning, but then they both increase their pronoun use.
Not all pronouns, though. First person pronouns, as a ratio of first person to all pronouns, show marked decreases for women and slight increases for men.
Women show a similar drop in their vocabulary richness. Men’s vocabulary richness also decreased, and the two ended up about the same by the time authors wrote many reviews. Keep in mind that “many” here is a couple thousand for women and nearly 9K for men.
Sentence complexity showed similar convergence. Here I’ll illustrate sentence complexity with a couple of extremes, first a review with complex sentences and then one that’s less complex.
Read - see only three sentences here, but it takes up my whole slide.
The less complex review - READ - has 7 sentences in nearly the same amount of space.
Word complexity actually decreased in both groups of reviews. Remember word complexity is a measure of the character-to-word ratio, so longer words are more complex.
In summary, we saw females decrease their hedges, increase their pronouns, decrease first person pronouns, decrease vocabulary richness, increase sentence complexity, and decrease word complexity. Nearly all of these changes were expected since we thought they would adopt “more male” language use patterns. What we didn’t expect were the changes in language use we observed among males. An increase in hedges, especially, was surprising.
Overall, H1 was supported. Women did write more like men over time.
- convergence except for hedging
- something interesting is happening in pronouns
- hedging: surprising increase from men
Our second question was about whether those changes, that adaptation to the dominant voice, would be accompanied by a rise in utility awarded by readers. We expected to see women’s utility scores rise over time as they adopted the majority voice.
Clearly, they did. Again, women are red and men blue in this graph. What’s troubling, though, is that even though women showed marked increases in utility, they never catch up. I’ll get to why that matters, but first, a quick stats discussion.
Now you’ve seen all my graphs, so I can summarize. When we regress these measures, we see statistically significant main effects for all measures. Women use less rich vocabulary, less complex wording, less complex sentences, and receive lower utility scores for their trouble. In all cases except hedges and vocabulary richness, those differences also show meaningful effect sizes. Our N of nearly 200K is large enough that we were likely to see statistically significant differences, so we ran effect size calculations on each model to assess the meaning of those differences. In all cases where I report effect size, it’s a small one.
By now, you may be wondering about the role of time, and I can touch on that briefly.
We did include the number of reviews written in the regression model, and again, we saw significance for all measures. We use “number of reviews written” as a proxy for time. So, rather than measuring time in minutes or weeks, we measure it in increments of reviews. So, 1 review written, 2 reviews written, and so on until over 8000.

However, only 3 measures showed measurable effect sizes: hedges, vocabulary richness, and word complexity. Hedges increased, likely because men went hedge-crazy, and both word complexity and vocabulary richness decreased.

I like to think of it this way: as they write more reviews, both genders get less stuffy in their writing. It’s like being in intro to film where the first day of class everyone feels immense pressure to say something profound, but by the end of the semester, they become comfortable just speaking up.
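The “number of reviews written” proxy for time amounts to numbering each author's reviews in the order they were posted. A minimal sketch follows; the record layout (dicts with `author` and `date` keys, dates as ISO strings) and the function name `add_review_number` are assumptions for illustration, not the study's data format.

```python
from collections import defaultdict


def add_review_number(reviews):
    """Return reviews in date order, each tagged with its per-author ordinal.

    `reviews` is a list of dicts with hypothetical keys 'author' and
    'date' (ISO 8601 strings, which sort chronologically as text).
    """
    counts = defaultdict(int)
    numbered = []
    for review in sorted(reviews, key=lambda r: r["date"]):
        counts[review["author"]] += 1
        numbered.append({**review, "review_number": counts[review["author"]]})
    return numbered


sample = [
    {"author": "a", "date": "2010-05-01"},
    {"author": "b", "date": "2009-01-15"},
    {"author": "a", "date": "2008-03-02"},
]
for row in add_review_number(sample):
    print(row["author"], row["date"], row["review_number"])
```

The resulting `review_number` column is what would stand in for time on the x-axis of the graphs and as a predictor in a regression.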
So, H2 is supported, but... Women do increase their utility over time, and at a faster rate than men, but they don’t catch up. You remember I mentioned effect size results when I summarized the regressions, and we didn’t see a meaningful effect for utility for either predictor: gender or number of reviews. Normally we’d say then, “oh, well, then the statistical significance doesn’t really matter in the world.” But, yes, there’s the real but... The difference does matter in the world because, on IMDb and other sites that use utility to sort their information, the relative utility is all that matters, not the size of the difference.
When users arrive at a reviews page on IMDb, they see a screen like this one. Notice the Filter there. It says “best”. IMDb has 10 reviews per page, so that means that the first 10 reviews, the 10 “best” reviews, are most likely written by men. We already know that very few people click to the second page of any result set, and just like our tendency to stick with the first page of results buries the John Smiths who aren’t great at SEO, it buries women’s contributions to IMDb.
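Why a small utility gap is enough to bury reviews can be seen in a toy ranking. The sketch below sorts by utility and keeps the first page of 10; the utility values (0.71 and 0.70) are invented for illustration and are not results from the study, and `first_page` is a hypothetical helper, not IMDb's sorting code.

```python
def first_page(reviews, page_size=10):
    """Sort reviews by utility, highest first, and keep one page."""
    ranked = sorted(reviews, key=lambda r: r["utility"], reverse=True)
    return ranked[:page_size]


# Invented data: two groups whose utility differs by a single point.
reviews = [{"author": f"m{i}", "gender": "M", "utility": 0.71} for i in range(10)]
reviews += [{"author": f"f{i}", "gender": "F", "utility": 0.70} for i in range(10)]

page = first_page(reviews)
print({r["gender"] for r in page})  # {'M'}
```

Ranking only looks at order, so a difference of 0.01 fills the page just as completely as a difference of 0.5 would.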
When I started this talk, I showed you a Room for Debate about women’s contributions to Wikipedia. Surveys suggested very few Wikipedians are women, and the Room for Debate was trying to make sense of those findings. Joseph Reagle argued, among other things, that we may be rationalizing women’s absence as a lack of interest on their part. Anna North wonders if solitarily editing a contribution is antisocial, and that’s why women avoid it. They, and the other debaters, including everyone who participated in yesterday’s panel, may be on to something.

But what our study shows is that even when women muster enough interest and brave enough antisocial solo editing to contribute to IMDb, the community doesn’t value their contributions, at least not as much as it values men’s. The lower utility scores their reviews receive push their contributions further and further from the top, further and further from eyes that might read them. Notice the passive voice here though. Women’s contributions are buried. Something must be doing the burying for that to be true.
IMDb does offer alternative methods for sorting reviews. I find it interesting that they use the label “Filter” for their drop down. Really, they’re asking you what criteria to use to determine the order of results, but it’s as if the label knows that by sorting, the page is effectively filtering what you’ll see. We can’t see it all; we must necessarily filter.

IMDb changes the options in this drop down often. This screen shot is from last Wednesday, and the options may actually be different already. So what happens when you choose the Male/Female filter?
Some sort of crazy coloring happens. The backgrounds of the DIVs that hold review content get these faint pastel colors, most of which are laudably gender-neutral, to indicate the gender of their author. It makes for colorful reading, but users have to do some extra work to get here.
And we already know that users don’t often do that extra work. So, the design of the system (its sorting mechanism, its default display) effectively buries women’s contributions. We don’t need a systematic rejection of women’s content or overtly sexist moderation for women’s contributions to go unnoticed. A simple “Was the above review useful to you?” will do.
So what do we know now that this phase of the study is complete? We know that language convergence can happen even without direct interaction. We know that women receive lower utility scores, even when they write like men. And, we know that lower utility scores effectively bury women’s contributions. These results matter because they may help explain how small changes in the design of a system could produce large effects on the information accessed. And this burying effect likely plagues lots of kinds of minority voices.
As so often happens in research, our results imply more questions than they answer. Some of the questions we’re interested in answering are about the people involved in the system - who does the reading, writing, and voting? Why do they do it? What other kinds of information bias do we produce when using collaborative filtering and social voting mechanisms? And of course, what else is driving these results? As the paper points out, the total variance we’re able to explain is small, so there’s clearly more to the story. We’re especially curious about the effects of the objects being reviewed. So, we have questions about the people, the technology, and the objects.
And I’m interested to hear your questions as well.
Controlling for movie is a good idea, one we’ve thought about and that our reviewers mentioned as well. We didn’t include it here because Jahna’s earlier work showed gender was a more significant correlate of utility. And, just to double-check, we looked at the reviews from some “chick flicks” last night and saw that guys wrote the highly-ranked reviews there as well.
Male reviewer
Report style
10 pronouns
2 first person pronouns