16 April 2012
                                                                                               www.know-center.at




                        Measuring the Quality of Web
                        Content using Factual
                        Information


                        WebQuality 2012 workshop
                        at WWW 2012




                     Elisabeth Lex, Michael Voelske, Marcelo Errecalde, Edgardo Ferretti, Leticia
                     Cagnina, Christopher Horn, Benno Stein and Michael Granitzer
© Know-Center 2012                                            funded by the competence centers programme (Kompetenzzentrenprogramm)
Agenda


Motivation
Approach
Results
Summary and Outlook




Motivation


People's decisions are often based on Web content
  that lacks quality control and verification

           Inaccurate or incorrect information
           No fact checking

Measures are needed to capture credibility and quality aspects
  with respect to facts




Approach

Measure information quality based on factual information
3 Approaches:
  Use simple statistics about the facts obtained from text
  Exploit relational information contained in facts
  Use semantic relationships like meronymy and hypernymy
First approach:
  Use simple statistical features about the facts in a document
  These indicate how informative a document is
  Derive facts from Web content using Open Information Extraction
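To make "derive facts using Open Information Extraction" concrete, here is a toy extractor. It is a simplified sketch loosely modeled on the part-of-speech pattern used by ReVerb (a relation phrase matching V | V P | V W* P, with noun-phrase arguments on either side). It is not ReVerb: it skips ReVerb's lexical constraint and confidence scoring, only accepts arguments immediately adjacent to the relation, and assumes the input is already tokenized and tagged with Penn Treebank POS tags. All function names are hypothetical.

```python
import re
from typing import List, Tuple

Token = Tuple[str, str]      # (word, Penn Treebank POS tag), tagging assumed done upstream
Fact = Tuple[str, str, str]  # (e1, relation, e2)


def _pos_class(tag: str) -> str:
    """Map a POS tag to the symbol classes of the ReVerb-style pattern."""
    if tag.startswith("VB"):
        return "V"  # verb
    if tag in ("IN", "RP", "TO"):
        return "P"  # preposition, particle, infinitive marker
    if tag.startswith(("NN", "JJ", "RB", "PRP")) or tag == "DT":
        return "W"  # noun, adjective, adverb, pronoun, determiner
    return "O"      # anything else (e.g. punctuation) cannot be part of a relation


# V+ approximates ReVerb's merging of adjacent verb matches ("was born");
# the optional (W*P) adds the "V W* P" tail ("is a city in").
RELATION = re.compile(r"V+(?:W*P)?")
NP_TAGS = ("DT", "JJ", "NN", "PRP", "CD")  # simplified noun-phrase tokens


def _is_np_token(tag: str) -> bool:
    return tag.startswith(NP_TAGS)


def _has_head(tokens: List[Token]) -> bool:
    """Require a noun, pronoun or number so bare adjectives are not arguments."""
    return any(tag.startswith(("NN", "PRP", "CD")) for _, tag in tokens)


def extract_facts(sentence: List[Token]) -> List[Fact]:
    classes = "".join(_pos_class(tag) for _, tag in sentence)
    facts: List[Fact] = []
    for match in RELATION.finditer(classes):
        start, end = match.span()
        # Left argument: noun-phrase tokens ending right before the relation.
        i = start
        while i > 0 and _is_np_token(sentence[i - 1][1]):
            i -= 1
        # Right argument: noun-phrase tokens starting right after the relation.
        j = end
        while j < len(sentence) and _is_np_token(sentence[j][1]):
            j += 1
        left, right = sentence[i:start], sentence[end:j]
        if _has_head(left) and _has_head(right):
            facts.append((
                " ".join(w for w, _ in left),
                " ".join(w for w, _ in sentence[start:end]),
                " ".join(w for w, _ in right),
            ))
    return facts


if __name__ == "__main__":
    sentence = [("Graz", "NNP"), ("is", "VBZ"), ("a", "DT"), ("city", "NN"),
                ("in", "IN"), ("Austria", "NNP"), (".", ".")]
    print(extract_facts(sentence))  # [('Graz', 'is a city in', 'Austria')]
```

A real pipeline would use an actual Open IE system such as ReVerb; the toy version only shows the shape of its output, binary (e1, relation, e2) triples.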


                                                               4

                                                               © Know-Center 2012
Definition of Factual Density


Fact Count




Factual Density
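The formulas on this slide did not survive extraction. A minimal sketch, assuming the fact count fc(d) is the number of (e1, relation, e2) triples extracted from a document d and the factual density is fd(d) = fc(d) / size(d), with size(d) taken here as the word count (the length measure is an assumption, not stated on the slide). Function names are hypothetical.

```python
from typing import Sequence, Tuple

Fact = Tuple[str, str, str]  # (e1, relation, e2), e.g. from an Open IE system


def fact_count(facts: Sequence[Fact]) -> int:
    """Fact count: number of facts extracted from one document."""
    return len(facts)


def word_count(text: str) -> int:
    """Document length as whitespace-separated tokens (assumed length measure)."""
    return len(text.split())


def factual_density(facts: Sequence[Fact], text: str) -> float:
    """Fact count normalized by document length; 0.0 for an empty document."""
    n = word_count(text)
    return fact_count(facts) / n if n else 0.0


if __name__ == "__main__":
    text = "Graz is a city in Austria. It is the capital of Styria."
    facts = [("Graz", "is a city in", "Austria"), ("It", "is the capital of", "Styria")]
    print(fact_count(facts), word_count(text), factual_density(facts, text))
```

Normalizing by length is what separates density from the raw count: a document twice as long with the same two facts gets half the density, so longer articles are not rewarded merely for being longer.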




Experiments


Wikipedia: 1000 Featured and Good articles versus 1000 non-featured
articles (randomly selected)
  Featured criterion: comprehensive coverage of the major facts in
   the context of the article's subject
Baseline: word count [Blumenstock 2008]
  Featured articles are longer than non-featured ones
  Bias: longer documents contain more facts
Evaluation on two datasets:
  Unbalanced: articles differ in length
  Balanced: articles are similar in length
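The slides do not say how the balanced dataset was constructed. One hypothetical way to obtain length-similar articles is to pair each featured article with the unused non-featured article closest to it in word count and keep only pairs within a relative tolerance. The function name and the 10% default tolerance are illustrative assumptions.

```python
from typing import Dict, List, Tuple


def length_matched_pairs(
    featured: Dict[str, int],      # article id -> word count
    non_featured: Dict[str, int],  # article id -> word count
    tolerance: float = 0.1,        # hypothetical: max relative length difference
) -> List[Tuple[str, str]]:
    """Greedily pair featured articles (shortest first) with the closest-length
    unused non-featured article; drop pairs whose lengths differ too much."""
    pool = dict(non_featured)
    pairs: List[Tuple[str, str]] = []
    for fid, flen in sorted(featured.items(), key=lambda kv: kv[1]):
        if not pool:
            break
        nid = min(pool, key=lambda k: abs(pool[k] - flen))
        if abs(pool[nid] - flen) <= tolerance * flen:
            pairs.append((fid, nid))
            del pool[nid]  # each non-featured article is used at most once
    return pairs


if __name__ == "__main__":
    featured = {"A": 1000, "B": 5000}
    non_featured = {"x": 950, "y": 300, "z": 4800}
    print(length_matched_pairs(featured, non_featured))  # [('A', 'x'), ('B', 'z')]
```

On such a subset word count carries little signal by construction, which is why it serves as the harder test for a length-normalized measure.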

Distribution of documents in both datasets with
respect to word count




Precision/Recall curves of Factual Density
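A precision/recall curve of this kind can be traced by ranking articles by a score (factual density, or word count for the baseline) and measuring precision and recall at every cutoff. A minimal sketch, assuming label 1 marks featured/good articles and 0 non-featured ones; ties in the score keep their input order. The function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def precision_recall_curve(
    scores: Sequence[float], labels: Sequence[int]
) -> List[Tuple[float, float]]:
    """Rank documents by descending score and return (precision, recall)
    after each cutoff k = 1..n."""
    total_pos = sum(labels)
    # sorted() is stable, so equal scores keep their original order.
    ranked = sorted(zip(scores, labels), key=lambda sl: sl[0], reverse=True)
    points: List[Tuple[float, float]] = []
    tp = 0
    for k, (_, label) in enumerate(ranked, start=1):
        tp += label
        recall = tp / total_pos if total_pos else 0.0
        points.append((tp / k, recall))
    return points


if __name__ == "__main__":
    # Hypothetical factual-density scores for four articles.
    print(precision_recall_curve([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
```

Plotting the returned pairs (recall on the x axis, precision on the y axis) gives one curve per scoring function, so factual density and word count can be compared on the same axes.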




Results
Factual Density on balanced corpus




Experiments – Relational Features


Approach 2: exploiting relational information contained in facts
Extract relational features from articles
  Use relations from ReVerb: binary relations (e1, relation, e2)
Use them to train a classifier that discriminates between
featured/good and non-featured articles
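The slides name neither the exact relational features nor the classifier. As a hypothetical illustration, each article can be represented by the relative frequencies of its lower-cased relation phrases, and a simple nearest-centroid rule under cosine similarity can stand in for the classifier. Both choices, and all names below, are assumptions rather than the method used in the study.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Fact = Tuple[str, str, str]  # (e1, relation, e2), as produced by ReVerb-style extraction


def relation_features(facts: Sequence[Fact]) -> Dict[str, float]:
    """Bag of lower-cased relation phrases, as relative frequencies."""
    counts = Counter(rel.lower() for _, rel, _ in facts)
    total = sum(counts.values())
    return {rel: c / total for rel, c in counts.items()} if total else {}


def _cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class NearestCentroid:
    """Stand-in classifier: predicts the class whose mean feature vector
    is most cosine-similar to the document's features."""

    def fit(self, X: List[Dict[str, float]], y: List[str]) -> "NearestCentroid":
        sums: Dict[str, Counter] = {}
        sizes = Counter(y)
        for x, label in zip(X, y):
            sums.setdefault(label, Counter()).update(x)  # add feature values per class
        self.centroids = {
            label: {k: v / sizes[label] for k, v in s.items()}
            for label, s in sums.items()
        }
        return self

    def predict(self, x: Dict[str, float]) -> str:
        return max(self.centroids, key=lambda label: _cosine(x, self.centroids[label]))


if __name__ == "__main__":
    featured_doc = [("Graz", "is a city in", "Austria"), ("Graz", "is the capital of", "Styria")]
    other_doc = [("I", "like", "Graz"), ("you", "like", "it")]
    clf = NearestCentroid().fit(
        [relation_features(featured_doc), relation_features(other_doc)],
        ["featured", "non-featured"],
    )
    print(clf.predict(relation_features([("Vienna", "is the capital of", "Austria")])))
```

Any standard text classifier could replace the nearest-centroid rule; the point is only that relation phrases, rather than raw words, become the feature space.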




Summary

A simple fact-related measure: Factual Density
Based on Factual Density, featured/good articles can be separated
from non-featured ones if the articles are similar in length
If articles differ in length, word count is the stronger feature; future work:
combine both
Plan to incorporate edit history (expected: more editors, higher factual density)
Preliminary experiments with relational features
  Promising results; more work planned in this direction
  Goal: bring semantics into the field of information quality (IQ)
  We expect this to unlock several IQ dimensions, e.g., generality
  vs. specificity
Thank you for your attention!

          Elisabeth Lex
       elex@know-center.at



