RecSys2018 Paper Reading Group: Materials

Exploring Author Gender in Book Rating and Recommendation
M. D. Ekstrand, M. Tian, M. R. I. Kazi, H. Mehrpouyan, and D. Kluver
https://doi.org/10.1145/3240323.3240373


RecSys2018 Paper Reading Group (2018-11-17) https://atnd.org/events/101334


  1. Exploring Author Gender in Book Rating and Recommendation. M. D. Ekstrand et al.
  4. RecSys ’18, October 2–7, 2018, Vancouver, BC, Canada
     [Figure: model diagram; recoverable symbols: n_u, μ, b_a, s_a, with index sets u ∈ U and a ∈ A]
  5. Profile model, for each user u ∈ U:
     $n_u \sim \mathrm{NegBinomial}(\nu, \gamma)$
     $y_u \sim \mathrm{Binomial}(n_u, \theta_u)$
     $\operatorname{logit}(\theta_u) \sim \mathrm{Normal}(\mu, \sigma)$
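The profile model on this slide can be read generatively: each user has a latent tendency θ_u whose log-odds are normally distributed, and the number of female-authored books in a profile of size n_u is a binomial draw. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: profile sizes are passed in rather than drawn from the NegBinomial prior, and the function name, the seed, and any values chosen for μ and σ are illustrative.

```python
import math
import random


def expit(x):
    """Inverse of the logit function."""
    return 1.0 / (1.0 + math.exp(-x))


def simulate_profiles(profile_sizes, mu, sigma, seed=0):
    """Draw (n_u, y_u, theta_u) for each given profile size n_u.

    Assumption: n_u is supplied by the caller; the NegBinomial prior on
    n_u is not simulated here.
    """
    rng = random.Random(seed)
    profiles = []
    for n_u in profile_sizes:
        # logit(theta_u) ~ Normal(mu, sigma)
        theta_u = expit(rng.gauss(mu, sigma))
        # y_u ~ Binomial(n_u, theta_u), drawn as n_u Bernoulli trials
        y_u = sum(rng.random() < theta_u for _ in range(n_u))
        profiles.append((n_u, y_u, theta_u))
    return profiles
```

Comparing y_u / n_u with theta_u for small n_u shows why observed proportions are coarser than the latent tendency they are drawn from.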
  6. Extended model, for each user u and each a ∈ A:
     $\operatorname{logit}(\bar\theta_{ua}) \sim \mathrm{Normal}(b_a + s_a \operatorname{logit}(\theta_u), \sigma_a^2)$
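To make the regression on this slide concrete, here is a small Python sketch. It assumes the relationship is linear in log-odds space, with an intercept b_a and a slope s_a applied to the user's log-odds tendency. The function and parameter names are illustrative, and the noise term is ignored, so the result is the median rather than the mean of the modelled proportion.

```python
import math


def logit(p):
    """Log-odds of a proportion strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def expit(x):
    """Inverse of the logit function."""
    return 1.0 / (1.0 + math.exp(-x))


def median_rec_proportion(theta_u, b_a, s_a):
    """Median female-author proportion in a recommendation list.

    Assumption: logit(list proportion) is Normal around
    b_a + s_a * logit(theta_u); the noise is dropped here.
    """
    return expit(b_a + s_a * logit(theta_u))
```

Under this reading, s_a = 1 with b_a = 0 reproduces the user's own tendency, s_a = 0 gives every user the same proportion expit(b_a), and a negative b_a shifts all users toward male authors.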
  7. btain author information from (VIAF)3, a directory of authority records from the Library of und the world. Author gender s for many records. mployed by the VIAF is flexible ender identities, supporting an es for the validity of an identity. se flexibility - all its assertions This is a significant limitation on 5.1. book data with rating data by ve data linking coverage, and works instead of individual edi- m a bipartite graph of ISBNs and “edition” records, and OpenLi- e) and consider each connected ess than 1% of ratings) this caus- or a book; we resolve multiple ir ratings. VIAF do not share linking iden- hority records by author name. ontain multiple name entries, izations of the author’s name. arry multiple known forms of ng names to improve matching ng both “Last, First” and “First e all VIAF records containing a d names for the first author of n a book’s cluster. If all records hor’s gender agree, we take that ontradicting gender statements, as “ambiguous”. ure good coverage while main-

     Table 2: Summary of rating data
                            BookCrossing      Amazon
     Ratings                   1,149,780  22,507,155
     Users                       105,283   8,026,324
     Rated ISBNs/ASINs           340,554   2,330,066
     Rated ‘Books’               295,935   2,286,656
     Matched Books               240,255   1,083,066
     Known-Gender Books          166,928     616,317
     Female-Author Books          66,524     181,850
     Male-Author Books           100,404     434,467
     % Female Books                39.9%       29.5%
     % Female Ratings              45.3%       36.2%

     Figure 1: Results of data linking and gender resolution. LOC is the set of books with Library of Congress records; other panes are the results of linking rating data.
     [Panels: BXA, BXE, LOC, AZ; x-axis: Linking Result (female, male, ambiguous, unknown, unlinked); y-axis: Coverage Percent; Scope: Books, Ratings]
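The gender-resolution rule quoted on this slide (agreeing records give a gender, contradicting records give "ambiguous") can be sketched in a few lines of Python. This is an illustration, not the authors' code. The function name and input format are hypothetical, and the meanings given to "unknown" (records matched but none asserts a gender) and "unlinked" (no record matched) are assumptions based on the category labels in Figure 1.

```python
def resolve_gender(record_genders):
    """Resolve one author-gender label for a book cluster.

    record_genders: list with one entry per matched authority record,
    each "female", "male", or None when the record asserts no gender.
    Pass None or an empty list when no record matched at all.
    """
    if not record_genders:
        return "unlinked"  # assumption: no authority record matched
    asserted = {g for g in record_genders if g in ("female", "male")}
    if not asserted:
        return "unknown"  # assumption: matched, but no gender asserted
    if len(asserted) == 1:
        return asserted.pop()  # all gender assertions agree
    return "ambiguous"  # contradicting gender statements
```

For example, `resolve_gender(["female", None, "female"])` yields "female", while `resolve_gender(["male", "female"])` yields "ambiguous".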
  8. dependent TAN 2.17.3 each per- We report arameters h existing acterizing nalyze the Tables 1– sample of nders are in our cat- has a more ookCross- wn-gender oportions

     (est. sd log odds)   1.03   1.11   1.77
     Posterior Mean       0.42   0.40   0.37
     Std. Dev.            0.23   0.23   0.28

     Figure 4: Distribution of user author-gender tendencies. Histogram shows observed proportions; lines show kernel densities of estimated tendencies (θ) along with observed and predicted proportions.
     [Panels: AZ, BXA, BXE; x-axis: Proportion of Female Authors; y-axis: Density; Method: Estimated θ, Observed y/n, Predicted y/n]

     and Figure 4 shows the distribution of observed author gender
  9. Coverage statistics (the headers of the four column groups are missing from the extraction):
                 Users  Dist. Items  % Dist. | Users  Dist. Items  % Dist. | Users  Dist. Items  % Dist. | Users  Dist. Items  % Dist.
     Profile     1,000       35,187     66.5 | 1,000       24,913     73.6 | 1,000       27,525     88.2 | 1,000       27,525     88.2
     UserUser    1,000        6,007     12.0 |   988        6,235     12.7 | 1,000       15,343     30.7 |   939       25,853     55.1
     ItemItem    1,000       21,282     42.6 |   997       10,174     20.4 |   999       33,363     67.7 |   999       22,360     45.6
     MF          1,000          140      0.3 | 1,000          264      0.5 | 1,000          164      0.3 | 1,000          651      1.3
     PF          1,000        1,506      3.0 | 1,000        4,105      8.2 | 1,000        2,746      5.4 | 1,000        3,538      7.0

     Figure 5: Posterior densities of recommender biases from integrated regression model.
     [Panels: AZ (Explicit), AZ (Implicit), BXA, BXE, by algorithm: UserUser, ItemItem, MF, PF; x-axis: Proportion of Books by Female Authors; y-axis: Density; Mean: Algorithm, Popular, Profile; Method: Observed, Predicted]

     proportions. The ripples in predicted and observed proportions are due to the commonality of 5-item user profiles, for which there are only 6 possible proportions; estimated tendency (θ) smooths them out. This smoothing, along with avoiding estimated extreme biases based on limited data, are why we find it useful to estimate tendency instead of directly computing statistics on observed proportions. To support direct comparison of the densities of observations and predictions, we resampled observed proportions with replacement to yield 10,000 observations. We observe a population tendency to rate male authors more frequently than female authors in all data sets (μ < 0), but to rate female authors more frequently than they would be rated were users drawing books uniformly at random from the available set. The average user author-gender tendency is slightly closer to an even balance than the set of rated books. We also found a large diversity amongst users about their estimated tendencies (s.d. of

     Table 6: Mean / SD of rec. list female author proportions.
                 BXA             BXE             AZ (Implicit)   AZ (Explicit)
     Popular     0.458           0.500           0.364           0.364
     Rating      -               0.383           -               0.222
     UserUser    0.399 / 0.180   0.435 / 0.190   0.315 / 0.186   0.367 / 0.278
     ItemItem    0.465 / 0.200   0.348 / 0.124   0.351 / 0.245   0.389 / 0.336
     MF          0.134 / 0.027   0.334 / 0.039   0.468 / 0.079   0.418 / 0.124
     PF          0.372 / 0.208   0.429 / 0.177   0.374 / 0.144   0.394 / 0.177

     basic coverage statistics of these algorithms along with corresponding user profile statistics. Users for which an algorithm could not produce recommendations are rare. We also computed the extent to which algorithms recommend different items to different users; “% Dist.” is the percentage of all recommendations that were distinct items. Algorithms that repeatedly recommend the same items will
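Two quantities from this slide are simple enough to compute directly: the set of proportions a profile of a given size can take (six values for a 5-item profile), and "% Dist.", the share of all recommendations that are distinct items. The Python sketch below is illustrative only; the function names and the input format (one list of item identifiers per user) are assumptions.

```python
from fractions import Fraction


def possible_proportions(n_items):
    """All female-author proportions a profile of n_items books can take."""
    return [Fraction(k, n_items) for k in range(n_items + 1)]


def percent_distinct(rec_lists):
    """Percentage of all recommendations that were distinct items.

    rec_lists: one list of recommended item identifiers per user.
    """
    total = sum(len(recs) for recs in rec_lists)
    if total == 0:
        raise ValueError("no recommendations to summarize")
    distinct = len({item for recs in rec_lists for item in recs})
    return 100.0 * distinct / total
```

A 5-item profile can only yield 0, 1/5, 2/5, 3/5, 4/5, or 1, which is the source of the ripples described in the excerpt. A low `percent_distinct` value means the algorithm gives many users the same items.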
  10. Regression estimates with intervals (column headers and the first data row are missing from the extraction; each row has four groups of three values):
     BXE            -0.139         0.162        0.906       | -0.573         0.129        0.531       | -0.652         0.002         0.161       | -0.166         0.298        0.772
                    (-0.20,-0.08)  (0.10,0.22)  (0.87,0.95) | (-0.61,-0.54)  (0.09,0.16)  (0.51,0.56) | (-0.66,-0.64)  (-0.01,0.01)  (0.15,0.17) | (-0.22,-0.11)  (0.25,0.35)  (0.74,0.81)
     AZ (Implicit)  -0.127         0.688        0.715       | 0.094          0.863        0.895       | -0.244         0.011         0.364       | -0.224         0.287        0.537
                    (-0.19,-0.06)  (0.65,0.73)  (0.68,0.76) | (0.02,0.17)    (0.81,0.92)  (0.84,0.95) | (-0.27,-0.22)  (-0.00,0.02)  (0.35,0.38) | (-0.26,-0.18)  (0.26,0.31)  (0.51,0.56)
     AZ (Explicit)  -0.580         0.322        0.681       | -0.380         0.438        0.852       | -0.117         0.006         0.273       | -0.403         0.141        0.525
                    (-0.63,-0.53)  (0.29,0.35)  (0.65,0.71) | (-0.44,-0.32)  (0.40,0.48)  (0.81,0.89) | (-0.14,-0.10)  (-0.00,0.02)  (0.26,0.29) | (-0.44,-0.37)  (0.12,0.16)  (0.50,0.55)

     Figure 6: Scatter plots and regression curves for recommender response to individual users.
     [Panels: AZ (Explicit), AZ (Implicit), BXA, BXE, by algorithm: UserUser, ItemItem, MF, PF; x-axis: Profile Proportion of Female Authors; y-axis: Recommender Proportion of Female Authors]

     more concentrated. In the BookCrossing data, it tends to favor male authors more than the underlying data would support; in implicit feedback mode, it is highly biased towards male authors with respect even to the baseline distributions.

     4.4 From Profiles to Recommendations
     Our extended Bayesian model (Section 3.4.2) allows us to address RQ4: the extent to which our algorithms propagate individual users’ tendencies into their recommendations. Figure 5 shows the posterior predictive and observed densities of recommender author-gender tendencies, and Figure 6 shows scatter plots of observed recommendation proportions against user profile proportions with regression curves (regression lines in log- place. Visual inspection of the scatter plot suggests that there is a strong component with consistent tendencies, but the regression may accurately model the remaining users. Future work will use a model that can better account for some global consistency.

     4.5 Summary
     RQ1 (Baseline Gender Distribution): Known books are significantly more likely to be written by men than by women; representation among rated books is more balanced.
     RQ2 (User Input Gender Distributions): Users are diffuse in their rating tendencies, with an overall trend favoring male authors but less strongly than the baseline distribution.
     RQ3 (Recommender Output Distributions): Different CF
