Loops of humans and bots in Wikidata

Invited talk at the HumL workshop at The Web Conference, April 2018

LOOPS OF HUMANS AND BOTS IN WIKIDATA
Elena Simperl, University of Southampton, UK
@esimperl

OVERVIEW
Wikidata is a critical AI asset in many domains
A recent Wikimedia project (2012), edited collaboratively
Our research assesses the quality of Wikidata and the link between community processes and quality

WHAT IS WIKIDATA

BASIC FACTS
Collaborative knowledge graph
100k registered users, 46M items
Open licence
RDF exports, connected to the Linked Open Data Cloud
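
Since Wikidata publishes RDF exports and a public SPARQL endpoint, the graph can be queried programmatically. A minimal sketch in Python (the endpoint and the Q/P identifiers are real, the query itself is just an illustration):

```python
import requests

ENDPOINT = "https://query.wikidata.org/sparql"

# Who is the head of government (P6) of London (Q84)?
QUERY = """
SELECT ?headLabel WHERE {
  wd:Q84 wdt:P6 ?head .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""

response = requests.get(ENDPOINT, params={"query": QUERY, "format": "json"})
for row in response.json()["results"]["bindings"]:
    print(row["headLabel"]["value"])  # e.g. "Sadiq Khan"
```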

THE KNOWLEDGE GRAPH
STATEMENTS, ITEMS, PROPERTIES
Item identifiers start with a Q, property identifiers start with a P
Example: Q84 (London), P6 (head of government), Q334155 (Sadiq Khan)

THE KNOWLEDGE GRAPH
ITEMS CAN BE CLASSES, ENTITIES, VALUES
Examples: Q7259 (Ada Lovelace), Q84 (London), Q334155 (Sadiq Khan), P6 (head of government), Q727 (Amsterdam), Q515 (city), Q6581097 (male), Q59360 (Labour Party), Q145 (United Kingdom)

THE KNOWLEDGE GRAPH
ADDING CONTEXT TO STATEMENTS
Statements may include context
• Qualifiers (optional)
• References (required)
Two types of references
• Internal, linking to another item
• External, linking to a webpage
Example: the statement Q84 (London) P6 (head of government) Q334155 (Sadiq Khan), qualified with "9 May 2016" and referenced by https://www.london.gov.uk/...
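
As a rough illustration, the statement above could be held in a simple Python structure; P580 (start time) and P854 (reference URL) are the usual Wikidata properties for this kind of qualifier and reference, though the real item may be modelled differently:

```python
# A simplified, hypothetical view of one Wikidata statement with its context.
statement = {
    "subject": "Q84",            # London
    "property": "P6",            # head of government
    "value": "Q334155",          # Sadiq Khan
    "qualifiers": {
        "P580": "2016-05-09",    # start time (optional qualifier)
    },
    "references": [
        {"P854": "https://www.london.gov.uk/..."},  # external reference URL (truncated in the slide)
    ],
}
```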

THE KNOWLEDGE GRAPH
CO-EDITED BY BOTS AND HUMANS
Human editors can register or work anonymously
Bots are created by the community for routine tasks
18k active human users, 200+ bots

OUR WORK
Effects of editing behaviour and community make-up on the knowledge graph
Content quality as a function of its provenance
Tools to improve content diversity

THE RIGHT MIX OF USERS
Piscopo, A., Phethean, C., & Simperl, E. (2017). What Makes a Good Collaborative Knowledge Graph: Group Composition and Quality in Wikidata. International Conference on Social Informatics, 305-322, Springer.

BACKGROUND
Wikidata editors have varied tenure and interests
Group composition impacts outcomes
• Diversity can have multiple effects
• Moderate tenure diversity increases outcome quality
• Interest diversity leads to increased group productivity
Chen, J., Ren, Y., & Riedl, J. (2010). The effects of diversity on group productivity and member withdrawal in online volunteer groups. Proceedings of the 28th International Conference on Human Factors in Computing Systems (CHI '10), p. 821, ACM Press, New York, USA.

OUR STUDY
Analysed the edit history of items
Corpus of 5k items, whose quality has been manually assessed (5 levels)*
Edit history focused on community make-up
The community is defined as the set of editors of an item
Considered features from the group diversity literature and Wikidata-specific aspects
*https://www.wikidata.org/wiki/Wikidata:Item_quality

RESEARCH HYPOTHESES
     Activity               Outcome
H1   Bot edits              Item quality
H2   Bot-human interaction  Item quality
H3   Anonymous edits        Item quality
H4   Tenure diversity       Item quality
H5   Interest diversity     Item quality

DATA AND METHODS
• Ordinal regression analysis; four models were trained
• Dependent variable: quality label of the 5k labelled Wikidata items
• Independent variables
  • Proportion of bot edits
  • Bot-human edit proportion
  • Proportion of anonymous edits
  • Tenure diversity: coefficient of variation
  • Interest diversity: user editing matrix
• Control variables: group size, item age
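
A minimal sketch of this kind of ordinal regression, assuming the edit-history features have been assembled into a pandas DataFrame; the column names and input file are made up for illustration and the paper's exact model specification may differ:

```python
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

# Hypothetical per-item features for the 5k manually labelled items.
items = pd.read_csv("item_features.csv")

# 5-level quality label as an ordered categorical (E lowest ... A highest).
quality = pd.Categorical(items["quality"],
                         categories=["E", "D", "C", "B", "A"], ordered=True)

predictors = items[[
    "prop_bot_edits",         # H1: proportion of bot edits
    "prop_bot_human_edits",   # H2: bot-human interaction
    "prop_anonymous_edits",   # H3: proportion of anonymous edits
    "tenure_diversity_cv",    # H4: coefficient of variation of tenure
    "interest_diversity",     # H5: derived from the user editing matrix
    "group_size", "item_age", # controls
]]

model = OrderedModel(quality, predictors, distr="logit")
result = model.fit(method="bfgs", disp=False)
print(result.summary())
```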

RESULTS
ALL HYPOTHESES SUPPORTED (H1-H5)

LESSONS LEARNED
1. The more is not always the merrier
2. Bot edits are key for quality, but bots and humans together are better
3. Diversity matters

IMPLICATIONS
1. Encourage registration
2. Identify further areas for bot editing
3. Design effective human-bot workflows
4. Suggest items to edit based on tenure and interests

LIMITATIONS AND FUTURE WORK
• Did not consider the evolution of quality over time
• Sample vs Wikidata (most items C or lower)
• Other group features (e.g., coordination) not considered
• No distinction between editing activities (e.g., schema vs instances, topics, etc.)
• Different metrics of interest (topics, type of activity)

THE CONTENT IS AS GOOD AS ITS REFERENCES
Piscopo, A., Kaffee, L. A., Phethean, C., & Simperl, E. (2017). Provenance Information in a Collaborative Knowledge Graph: an Evaluation of Wikidata External References. International Semantic Web Conference, 542-558, Springer.

PROVENANCE IN WIKIDATA
Statements may include context
• Qualifiers (optional)
• References (required)
Two types of references
• Internal, linking to another item
• External, linking to a webpage
Example: Q84 (London) P6 (head of government) Q334155 (Sadiq Khan), qualified with "9 May 2016" and referenced by https://www.london.gov.uk/...

THE ROLE OF PROVENANCE
Wikidata aims to become a hub of references
Provenance increases trust in Wikidata
Lack of provenance hinders content reuse
The quality of references is as yet unknown
Hartig, O. (2009). Provenance Information in the Web of Data. LDOW, 538.

OUR STUDY
Approach to evaluate the quality of external references in Wikidata
Quality is defined by the Wikidata verifiability policy
• Relevant: support the statement they are attached to
• Authoritative: trustworthy, up-to-date, and free of bias for supporting a particular statement
Large-scale (the whole of Wikidata)
Bot- vs human-contributed references

RESEARCH QUESTIONS
RQ1 Are Wikidata external references relevant?
RQ2 Are Wikidata external references authoritative, i.e., do they match the author and publisher types from the Wikidata policy?
RQ3 Can we automatically detect non-relevant and non-authoritative references?

METHODS
TWO-STAGE MIXED APPROACH
1. Microtask crowdsourcing (RQ1, RQ2): evaluate relevance and authoritativeness of a reference sample; create a training set for the machine learning model
2. Machine learning (RQ3): large-scale reference quality prediction

STAGE 1: MICROTASK CROWDSOURCING (RQ1, RQ2)
3 tasks on CrowdFlower
5 workers per task, majority voting
Test questions to select workers

Feature             Microtask  Description
Relevance           T1         Does the reference support the statement?
Authoritativeness   T2         Choose author type from list
                    T3.A       Choose publisher type from list
                    T3.B       Verify publisher type, then choose sub-type from list
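
A small sketch of how the five answers per microtask could be aggregated by majority vote (the field values are illustrative, not taken from the actual task design):

```python
from collections import Counter

def majority_vote(answers):
    """Return the most common answer and its share of the votes."""
    counts = Counter(answers)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(answers)

# Five workers judging whether a reference supports its statement (T1).
worker_answers = ["yes", "yes", "no", "yes", "yes"]
label, agreement = majority_vote(worker_answers)
print(label, agreement)  # yes 0.8
```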

STAGE 2: MACHINE LEARNING (RQ3)
Compared three algorithms
• Naïve Bayes, Random Forest, SVM
Features based on [Lehmann et al., 2012; Potthast et al., 2008]
Baseline: item label matching (relevance); deprecated domains list (authoritativeness)
Features: URL the reference uses, source HTTP code, statement item vector, statement object vector, subject parent class, property parent class, object parent class, author type, author activity, author activity on references
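
A minimal sketch of comparing the three classifiers; the placeholder data stands in for the extracted feature matrix and crowdsourced labels, which are not reproduced here:

```python
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

# Placeholder for the real reference features (X) and relevance labels (y).
X, y = make_classification(n_samples=2586, n_features=10, random_state=0)

models = {
    "Naive Bayes": GaussianNB(),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "SVM": SVC(kernel="rbf"),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=10, scoring="f1")
    print(f"{name}: mean F1 = {scores.mean():.2f}")
```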

DATA
1.6M external references (6% of all references)
• 1.4M from two sources (protein knowledge bases)
83,215 English-language references
• Sample of 2,586 (99% confidence, 2.5% margin of error)
• 885 assessed automatically, e.g. non-working links or CSV files
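
For context, a sample of this size follows from Cochran's formula with a finite population correction; the sketch below lands close to the reported 2,586, with the exact figure depending on rounding and the z-value used:

```python
import math

N = 83_215   # English-language references (population size)
z = 2.576    # z-score for 99% confidence
e = 0.025    # 2.5% margin of error
p = 0.5      # most conservative proportion

n0 = (z ** 2 * p * (1 - p)) / e ** 2   # infinite-population sample size, about 2655
n = n0 / (1 + (n0 - 1) / N)            # finite population correction, about 2573
print(math.ceil(n0), math.ceil(n))
```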

RESULTS: CROWDSOURCING
CROWDSOURCING WORKS
Trusted workers: >80% accuracy
95% of responses from T3.A confirmed in T3.B

Task  No. of microtasks  Total workers  Trusted workers  Workers' accuracy  Fleiss' k
T1    1701 references    457            218              75%                0.335
T2    1178 links         749            322              75%                0.534
T3.A  335 web domains    322            60               66%                0.435
T3.B  335 web domains    239            116              68%                0.391
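
Fleiss' kappa, used above as the agreement measure, can be computed from the raw worker judgements with statsmodels; the response matrix below is made up purely to show the shape of the calculation:

```python
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# Hypothetical answers: one row per microtask, one column per worker (0 = "no", 1 = "yes").
raw = np.array([
    [1, 1, 1, 0, 1],
    [0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],
    [1, 0, 1, 1, 0],
])

# aggregate_raters turns raw answers into per-item category counts, the input fleiss_kappa expects.
counts, _ = aggregate_raters(raw)
print(fleiss_kappa(counts))
```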

RESULTS: CROWDSOURCING
MAJORITY OF REFERENCES ARE HIGH QUALITY (RQ1, RQ2)
2,586 references evaluated
Found 1,674 valid references from 345 domains
Broken URLs were deemed not relevant and not authoritative

RESULTS: CROWDSOURCING
HUMANS ARE BETTER AT EDITING REFERENCES (RQ1, RQ2)

RESULTS: CROWDSOURCING
DATA FROM GOVERNMENT AND ACADEMIC SOURCES (RQ2)
Most common author type (T2)
• Organisation (78%)
Most common publisher types (T3)
• Governmental agencies (37%)
• Academic organisations (24%)

RESULTS: MACHINE LEARNING
RANDOM FORESTS PERFORM BEST (RQ3)

                     F1     MCC
Relevance
  Baseline           0.84   0.68
  Naïve Bayes        0.90   0.86
  Random Forest      0.92   0.89
  SVM                0.91   0.87
Authoritativeness
  Baseline           0.53   0.16
  Naïve Bayes        0.86   0.78
  Random Forest      0.89   0.83
  SVM                0.89   0.79
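
Both scores in the table are standard scikit-learn metrics; a toy example of computing them on held-out predictions (the labels below are invented):

```python
from sklearn.metrics import f1_score, matthews_corrcoef

# Hypothetical gold labels and model predictions for the relevance task.
y_true = [1, 1, 0, 1, 0, 1, 1, 0]
y_pred = [1, 1, 0, 1, 1, 1, 0, 0]

print("F1 :", round(f1_score(y_true, y_pred), 2))
print("MCC:", round(matthews_corrcoef(y_true, y_pred), 2))
```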

LESSONS LEARNED
Crowdsourcing + ML works!
Many external sources are high quality
Bad references are mainly non-working links; continuous monitoring is required
Lack of diversity in bot-added sources
Humans and bots are good at different things

LIMITATIONS AND FUTURE WORK
Studies with non-English sources
Did not consider internal references
Deployment in Wikidata, including changes in editing behaviour

FROM NEURAL NETWORKS TO A MULTILINGUAL WIKIPEDIA
Kaffee, L., Elsahar, H., Vougiouklis, P., Gravier, C., Laforest, F., Hare, J., & Simperl, E. (2018). Mind the (Language) Gap: Generation of Multilingual Wikipedia Summaries from Wikidata for ArticlePlaceholders. European Semantic Web Conference, to appear, Springer.

BACKGROUND
Wikipedia is available in 287 languages, but content is unevenly distributed
Wikidata is cross-lingual
ArticlePlaceholders display Wikidata triples as stubs for articles in underserved Wikipedias
Currently deployed in 11 Wikipedias

OUR STUDY
Enrich ArticlePlaceholders with textual summaries generated from Wikidata triples
Train a neural network to generate one-sentence summaries resembling the opening paragraph of a Wikipedia article
Test the approach on two languages, Esperanto and Arabic, with readers and editors of those Wikipedias

RESEARCH QUESTIONS
RQ1 Can we automatically generate summaries that match the quality and feel of Wikipedia in different languages?
RQ2 Are summaries useful for the communities editing underserved Wikipedias?

APPROACH
NEURAL NETWORK TRAINED ON WIKIDATA/WIKIPEDIA
A feed-forward architecture encodes the triples from the ArticlePlaceholder into a vector of fixed dimensionality
An RNN-based decoder generates text summaries, one token at a time
Optimisations for different entity verbalisations, rare entities, etc.
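
A rough PyTorch sketch of this kind of architecture, not the authors' actual model: a feed-forward encoder turns the embedded triples into one fixed-size vector, which initialises a GRU decoder that emits the summary one token at a time. All names and dimensions are illustrative.

```python
import torch
import torch.nn as nn

class TripleEncoder(nn.Module):
    """Feed-forward encoder: embeds (subject, property, object) IDs and averages over triples."""
    def __init__(self, vocab_size, dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.ff = nn.Linear(3 * dim, dim)

    def forward(self, triples):                  # triples: (batch, n_triples, 3)
        e = self.embed(triples)                  # (batch, n_triples, 3, dim)
        e = e.flatten(2)                         # concatenate s, p, o embeddings
        return torch.relu(self.ff(e)).mean(1)    # (batch, dim): fixed-size item vector

class SummaryDecoder(nn.Module):
    """GRU decoder that generates the summary token by token."""
    def __init__(self, vocab_size, dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.gru = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, vocab_size)

    def forward(self, tokens, item_vector):        # tokens: (batch, seq_len)
        h0 = item_vector.unsqueeze(0)               # encoder output as the initial hidden state
        output, _ = self.gru(self.embed(tokens), h0)
        return self.out(output)                     # logits over the target vocabulary

# Toy usage with made-up vocabulary sizes.
enc, dec = TripleEncoder(vocab_size=1000), SummaryDecoder(vocab_size=5000)
triples = torch.randint(0, 1000, (2, 4, 3))         # 2 items, 4 triples each
logits = dec(torch.randint(0, 5000, (2, 10)), enc(triples))
print(logits.shape)                                 # torch.Size([2, 10, 5000])
```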

EVALUATION
AUTOMATIC EVALUATION (RQ1)
Trained on a corpus of Wikipedia sentences and corresponding Wikidata triples (205k Arabic; 102k Esperanto)
Tested against three baselines: machine translation (MT) and template retrieval (TR, TRext)
Using standard metrics: BLEU, METEOR, ROUGE-L
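
As an illustration, sentence-level BLEU can be computed with NLTK as below; the study's evaluation uses corpus-level scores plus METEOR and ROUGE-L, which need their own tooling, and the tokens here are invented:

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = ["ada", "lovelace", "was", "an", "english", "mathematician"]  # gold summary (toy)
candidate = ["ada", "lovelace", "was", "a", "mathematician"]              # generated summary (toy)

smooth = SmoothingFunction().method1  # smoothing avoids zero scores on short sentences
print(sentence_bleu([reference], candidate, smoothing_function=smooth))
```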

EVALUATION
USER STUDIES (RQ1, RQ2)
Two 15-day online surveys with readers and editors of the Arabic and Esperanto Wikipedias
• Readers survey
  • 60 articles (30 ours, 15 news items, 15 Wikipedia summaries from the training corpus)
  • Fluency: Is the text understandable and grammatically correct?
  • Appropriateness: Does the summary 'feel' like a Wikipedia article?
• Editors survey
  • 30 automatically generated summaries
  • Editors were asked to edit the article starting from our summary (2-3 sentences)
  • Measured the extent to which the summary was reused (Greedy String Tiling, GST, metric)
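
A simplified sketch of Greedy String Tiling over token lists, to illustrate how reuse of the generated summary inside the edited article might be quantified; this is a textbook-style simplification, not the exact GST implementation used in the study:

```python
def gst_coverage(summary, edited, min_match=3):
    """Share of summary tokens covered by maximal common tiles with the edited text."""
    marked_s = [False] * len(summary)
    marked_e = [False] * len(edited)
    covered = 0
    while True:
        best, best_len = [], 0                    # longest unmarked common runs this round
        for i in range(len(summary)):
            for j in range(len(edited)):
                k = 0
                while (i + k < len(summary) and j + k < len(edited)
                       and summary[i + k] == edited[j + k]
                       and not marked_s[i + k] and not marked_e[j + k]):
                    k += 1
                if k > best_len:
                    best, best_len = [(i, j)], k
                elif k == best_len and k > 0:
                    best.append((i, j))
        if best_len < min_match:
            break
        for i, j in best:
            # Skip candidates overlapping a tile already marked in this round.
            if any(marked_s[i + t] or marked_e[j + t] for t in range(best_len)):
                continue
            for t in range(best_len):
                marked_s[i + t] = marked_e[j + t] = True
            covered += best_len
    return covered / len(summary) if summary else 0.0

summary = "sadiq khan is the mayor of london".split()
edited = "sadiq khan has been the mayor of london since 2016".split()
print(gst_coverage(summary, edited))  # 4 of 7 summary tokens reused
```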

RESULTS: AUTOMATIC EVALUATION
APPROACH OUTPERFORMS BASELINES

RESULTS: USER STUDIES
SUMMARIES ARE USEFUL FOR THE COMMUNITY
[Figures: readers study, editors study]

LIMITATIONS AND FUTURE WORK
No easy way to test whether summaries would indeed lead to more participation on underserved Wikipedias
Wikidata itself needs more multilingual labels
Ongoing Wikipedia study: opportunistically ask editors of Wikipedia articles to add missing labels of relevant Wikidata items and properties

CONCLUSIONS

SUMMARY OF FINDINGS
Collaboration between humans and bots is important
Tools are needed to identify tasks for bots and to continuously study their effects on outcomes and the community
Quality is a complex concept; we studied only a subset of aspects
References are high quality, though biases exist in the choice of sources
Automatically created content is useful to editors of underserved Wikipedias
