Measuring the Quality of Web Content using Factual Information
- 1. 16 April 2012
www.know-center.at
Measuring the Quality of Web Content using Factual Information
WebQuality 2012 workshop at WWW 2012
Elisabeth Lex, Michael Voelske, Marcelo Errecalde, Edgardo Ferretti, Leticia Cagnina, Christopher Horn, Benno Stein and Michael Granitzer
© Know-Center 2012, funded by the Competence Centers Programme
- 3. Motivation
People's decisions are often based on Web content
Web content lacks quality control and verification
Inaccurate, incorrect information
No fact checking
Measures are needed to capture credibility and quality aspects
With respect to facts!
- 4. Approach
Measure information quality based on factual information
3 Approaches:
Use simple statistics about the facts obtained from text
Exploit relational information contained in facts
Use semantic relationships like meronymy and hypernymy
First approach:
Use simple statistical features about facts in a document
Indicates how informative a document is
Derive facts from Web content using Open Information Extraction
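A minimal sketch of such a statistical feature, assuming facts arrive as (argument, relation, argument) triples from an Open Information Extraction tool and that density is the number of facts divided by the number of words. The normalization by word count and the name `factual_density` are assumptions for illustration; the slides do not give the exact formula.

```python
from typing import List, Tuple

# A fact as an Open IE system might return it: (argument 1, relation, argument 2).
Fact = Tuple[str, str, str]


def factual_density(facts: List[Fact], text: str) -> float:
    """Return extracted facts per word (assumed normalization); 0.0 for empty text."""
    n_words = len(text.split())  # whitespace tokenization, deliberately simple
    if n_words == 0:
        return 0.0
    return len(facts) / n_words


if __name__ == "__main__":
    text = "Graz is a city in Austria. It has a university."
    facts = [("Graz", "is a city in", "Austria"), ("It", "has", "a university")]
    print(factual_density(facts, text))  # 2 facts over 10 words
```

Normalizing by length is what makes documents of different sizes comparable; the raw fact count alone would mostly track document length.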
- 6. Experiments
Wikipedia: 1000 Featured and Good articles versus 1000 Non-Featured articles (randomly selected)
Featured: comprehensive coverage of the major facts in the context of the article's subject
Baseline: Word Count [Blumenstock 2008]
Featured articles longer than non-featured
Bias: longer docs contain more facts
Evaluation: 2 Datasets
Unbalanced: articles differ in length
Balanced: articles similar in length
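One possible way to build a length-balanced dataset is to pair each featured article with a non-featured article of similar word count. The sketch below is an assumption, not the procedure used in the experiments: the function name `match_by_length`, the greedy strategy, and the 10% tolerance are all hypothetical.

```python
from typing import Dict, List, Tuple


def match_by_length(featured: Dict[str, int],
                    non_featured: Dict[str, int],
                    tolerance: float = 0.1) -> List[Tuple[str, str]]:
    """Greedily pair articles whose word counts differ by at most `tolerance` (relative).

    Both inputs map article title -> word count. Each non-featured article
    is used at most once; featured articles without a match are dropped.
    """
    remaining = dict(non_featured)
    pairs: List[Tuple[str, str]] = []
    for title, n_words in sorted(featured.items(), key=lambda kv: kv[1]):
        best = None
        for cand, cand_words in remaining.items():
            diff = abs(cand_words - n_words)
            if diff <= tolerance * n_words:
                if best is None or diff < abs(remaining[best] - n_words):
                    best = cand  # closest candidate seen so far
        if best is not None:
            pairs.append((title, best))
            del remaining[best]
    return pairs
```

With pairs matched this way, word count carries little signal, so any remaining separation between the two classes has to come from other features.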
- 10. Experiments – Relational Features
Approach 2: exploiting relational information contained in facts
Extract relational features from articles
Use relations from ReVerb: binary relations (e1, relation, e2)
Use them to train a classifier to discriminate between
featured/good and non-featured
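The slides do not say how the relations are turned into classifier features. A minimal sketch of one option, a bag-of-relations encoding, assuming each article comes with a list of (e1, relation, e2) triples; the name `relation_features` and the lowercase normalization are hypothetical choices.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Binary relation in the (e1, relation, e2) shape described above.
Triple = Tuple[str, str, str]


def relation_features(triples: Iterable[Triple]) -> Dict[str, float]:
    """Map each normalized relation phrase to its relative frequency in one article."""
    counts = Counter(rel.strip().lower() for _, rel, _ in triples)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {rel: count / total for rel, count in counts.items()}


if __name__ == "__main__":
    article_triples = [
        ("Graz", "is located in", "Austria"),
        ("Vienna", "Is located in ", "Austria"),
        ("Mozart", "was born in", "Salzburg"),
    ]
    print(relation_features(article_triples))
```

One such dictionary per article, labeled featured/good or non-featured, could then be passed to any standard text classifier.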
- 12. Summary
Simple fact related measure: Factual Density
Based on Factual Density, featured/good articles can be separated from non-featured ones if article length is similar
If articles differ in length, use word count. Future work: a combination of both
Plan to incorporate edit history: more editors, higher factual density
Preliminary experiments with relational features
Promising results, more work in this direction
Goal here is to bring semantics into the field of Information Quality
We expect this to unlock several IQ dimensions, e.g. generality vs. specificity
- 13. Thank you for your attention!
Elisabeth Lex
elex@know-center.at