The talk I gave at Communities and Technologies 2009 on using a hyperlingual methodology to identify cultural diversity in the knowledge representations in Wikipedia. Paper at http://www.brenthecht.com/papers/bhecht_CommAndTech2009.pdf
21. Introduction
• self-focus bias
• effect of community-held opinions and interests on the world knowledge in Wikipedia
• if it exists, both positive and negative
23. Introduction
terms and concepts
subset of the English Wikipedia Article Graph (WAG)
25. Introduction
terms and concepts
• “Barack Obama” has 2 inlinks
• “Barack Obama” has an indegree of 2
subset of the English Wikipedia Article Graph (WAG)
29. Introduction
terms and concepts
• indegree → what people are writing about
• indegree → relatedness to sum of world knowledge in each Wikipedia
[Figure: subset of the English Wikipedia Article Graph (WAG) showing "The United States", "Joe Biden", and "Barack Obama"]
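A minimal sketch of the indegree idea from these slides, assuming the article graph is available as a list of directed (source, target) links. The edge list is a hypothetical reading of the figure: the two links into "Barack Obama" match the stated indegree of 2, while the outgoing link is added only for illustration. The `indegree` helper is not part of any tool mentioned in the talk.

```python
from collections import Counter

# Directed links (source article -> target article).
# Assumed reading of the figure: both neighbours link to "Barack Obama".
links = [
    ("The United States", "Barack Obama"),
    ("Joe Biden", "Barack Obama"),
    ("Barack Obama", "The United States"),  # hypothetical outgoing link
]

def indegree(links):
    """Count inlinks per article, ignoring duplicate links and self-links."""
    unique = {(src, dst) for src, dst in links if src != dst}
    return Counter(dst for _, dst in unique)

deg = indegree(links)
print(deg["Barack Obama"])  # 2
```

Articles that nobody links to simply report an indegree of 0, because `Counter` returns 0 for missing keys.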
33. Study 1
methods
definition of focus
• focus = indegree in Wikipedia Article Graph (WAG)
• greater indegree = greater focus
• compare across 15 Wikipedias
34. Study 1
methods
definition of focus
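[Figure: the Penn State University article in two language editions. English Wikipedia: "Jonathan Frakes", "Interstate 99", and "Pennsylvania" shown around "Penn State University"; indegree = 3. French Wikipedia: "Jonathan Frakes" and "Pennsylvania" shown around "Université d'État de Pennsylvanie"; indegree = 1.]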
36. Study 1
methods
definition of focus
[Figure: the Poutine article in two language editions, with "Chez Ashton", "French Fries", and "Cheddar Cheese" shown around "Poutine" in both. English Wikipedia: indegree = 0. French Wikipedia: indegree = 3.]
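The two comparisons above can be sketched as follows, assuming each language edition's graph is an edge list and that articles on the same concept have already been aligned under one shared label (English labels are used here for readability). Which French article links to the Penn State article is not recoverable from the slide, so the single French inlink below is a hypothetical choice; `focus` and `compare_focus` are illustrative names.

```python
# Per-language edge lists (source -> target), keyed by language code.
wags = {
    "en": [
        ("Jonathan Frakes", "Penn State University"),
        ("Interstate 99", "Penn State University"),
        ("Pennsylvania", "Penn State University"),
    ],
    "fr": [
        ("Pennsylvania", "Penn State University"),  # assumed source of the one inlink
        ("Chez Ashton", "Poutine"),
        ("French Fries", "Poutine"),
        ("Cheddar Cheese", "Poutine"),
    ],
}

def focus(edges, concept):
    """Focus = indegree: number of distinct articles linking to the concept."""
    return len({src for src, dst in edges if dst == concept and src != concept})

def compare_focus(wags, concept):
    """Return the focus on one concept in every language edition."""
    return {lang: focus(edges, concept) for lang, edges in wags.items()}

print(compare_focus(wags, "Penn State University"))  # {'en': 3, 'fr': 1}
print(compare_focus(wags, "Poutine"))                # {'en': 0, 'fr': 3}
```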
42. Study 1
methods
sample and statistic
• statistic = spatial indegree sums
46. Study 1
methods
sample and statistic
[Figure: link graph around the geographic articles "Finland", "Helsinki", and "Rovaniemi", together with the articles "Flying Finn", "Airline", "Sub-arctic Climate", "Linus Torvalds", and "Finno-Ugric Languages"]
• Finland has an indegree sum = 4
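A rough sketch of a spatial indegree sum, assuming every geographic article has already been assigned to a country and that the statistic adds up the inlinks of all geographic articles in that country. The exact links in the figure cannot be recovered from the slide, so the edge list below is hypothetical and chosen only so that it reproduces the stated sum of 4 for Finland; `article_country` and `indegree_sums` are illustrative names.

```python
from collections import defaultdict

# Hypothetical mapping from geographic article to the country containing it.
article_country = {
    "Finland": "Finland",
    "Helsinki": "Finland",
    "Rovaniemi": "Finland",
}

# Hypothetical links (source -> target); not a transcription of the figure.
links = [
    ("Flying Finn", "Finland"),
    ("Finno-Ugric Languages", "Finland"),
    ("Linus Torvalds", "Helsinki"),
    ("Sub-arctic Climate", "Rovaniemi"),
    ("Airline", "Flying Finn"),  # target is not geographic, so it is not counted
]

def indegree_sums(links, article_country):
    """Sum the inlinks of all geographic articles, grouped by country."""
    sums = defaultdict(int)
    for src, dst in set(links):  # set() drops duplicate links
        country = article_country.get(dst)
        if country is not None and src != dst:
            sums[country] += 1
    return dict(sums)

sums = indegree_sums(links, article_country)
print(sums["Finland"])  # 4
```

Whether links between two geographic articles in the same country should count is not stated on the slides; this sketch counts every inlink to a geographic article.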
50. Study 1
null hypothesis
H0: Indegree sums will have roughly the same distribution in every Wikipedia
All Wikipedias agree on focus distribution
Self-focus bias does not exist
54. Study 1
self-focus hypothesis
H1: Each language’s Wikipedia will have higher indegree sums in countries where the language is prominent
Each Wikipedia will demonstrate greater focus on its language’s culture hearth
Self-focus bias exists
58. Study I
results
German Wikipedia
Country          Indegree Sum
Germany          718,668
United States    114,720
France           110,554
Switzerland      103,387
Austria          95,986
Italy            93,116
59. Study I
results
Finnish Wikipedia
Country          Indegree Sum
Finland          55,331
United States    25,664
Germany          11,972
Russia           10,076
United Kingdom   9,402
Italy            7,948
61. Study I
results
Japanese Wikipedia
Country          Indegree Sum
Japan            453,048
Italy            70,922
United States    60,384
China            37,208
Germany          25,276
United Kingdom   20,690
66. Study I
results
English Wikipedia
Country          Indegree Sum
United States    1,366,261
United Kingdom   439,582
France           189,698
Germany          151,503
Canada           146,191
Italy            129,133
74. Study I
results
English Wikipedia (Y/N marks whether English is prominent in the country)
Prominent?   Country          Indegree Sum
Y            United States    1,366,261
Y            United Kingdom   439,582
N            France           189,698
N            Germany          151,503
Y            Canada           146,191
N            Italy            129,133
78. Study I
results
English Wikipedia (Num and Den mark the numerator and denominator of the self-focus ratio)
Mark   Country          Indegree Sum
Num    United States    1,366,261
Y      United Kingdom   439,582
Den    France           189,698
N      Germany          151,503
Y      Canada           146,191
N      Italy            129,133
79. Study I
results
$$\mathrm{SFR}(W_{\text{English}}) = \frac{\text{USA}}{\text{France}} = \frac{1{,}366{,}261}{189{,}698} = 7.2$$
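The ratio on this slide can be reproduced from the English Wikipedia indegree sums. The sketch below assumes, based on where the Num and Den markers were placed, that the numerator is the largest indegree sum among countries marked Y and the denominator is the largest among countries marked N. The general rule and the names `home` and `self_focus_ratio` are assumptions, not definitions taken from the slides.

```python
# Indegree sums for the English Wikipedia, as listed in the results table.
indegree_sums = {
    "United States": 1_366_261,
    "United Kingdom": 439_582,
    "France": 189_698,
    "Germany": 151_503,
    "Canada": 146_191,
    "Italy": 129_133,
}

# Countries marked "Y" on the slides.
home = {"United States", "United Kingdom", "Canada"}

def self_focus_ratio(indegree_sums, home):
    """Assumed rule: top 'Y' country's sum divided by top 'N' country's sum."""
    num = max(v for c, v in indegree_sums.items() if c in home)
    den = max(v for c, v in indegree_sums.items() if c not in home)
    return num / den

sfr = self_focus_ratio(indegree_sums, home)
print(round(sfr, 1))  # 7.2
```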
80. Study I
results
Language Self-focus Ratio
English 7.2
Japanese 6.4
German 6.3
French 4.2
Italian 3.6
Catalan 2.9
Spanish 2.4
Finnish 2.2
Polish 1.7
Norwegian 1.4
Chinese 1.2
Dutch 0.7
Swedish 0.6
Portuguese 0.3
81. Study II
methods
sample and statistic
• sample = geographic articles
• statistic = spatial indegree sums
91. Discussion
hyperlingual approach
• 15 Wikipedias (22)
• over 8 million articles
• over 270 million links
• English is less than 1/4 of the data
• it was “easy” with WikAPIdia software
98. Discussion
hyperlingual approach
• general benefits
• similarities → more robust findings
• differences → cultural diversity
• mine cultural diversity
• “culturally-aware applications”
• very rarely used in the literature
103. Conclusion
Cliffs Notes
1. self-focus is a systemic bias in Wikipedia
• people reorient world knowledge around themselves
• many implications for technologies
2. hyperlingual approach proved very useful
104. Acknowledgements
Nada Petrović
Colleagues at the Collabolab
NSF #0705901
Microsoft Research
Contact Info
brent@u.northwestern.edu
www.brenthecht.com