Getting the Most Out of Social Annotations for Web Page Classification
                       DocEng 2009


       Arkaitz Zubiaga, Raquel Martínez, Víctor Fresno

                    NLP & IR Group @ UNED


                   September 16th, 2009
Introduction


Index


1   Introduction

2   Dataset

3   Experiments

4   Conclusions

5   Future Work




Zubiaga, Martínez, Fresno (UNED)   Social Annotations for WPC   September 16th, 2009   2 / 25


What is Web Page Classification?


      We have a set of documents:

            $D = \{d_1, \ldots, d_{|D|}\}$

      And a set of predefined categories:

            $C = \{c_1, \ldots, c_{|C|}\}$

      Web page classification is the task of assigning a Boolean value to each pair:

            $\langle d_j, c_i \rangle \in D \times C$
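The definition can be made concrete as a function over all pairs in $D \times C$. Below is a minimal Python sketch. It assumes the single-label setting, where each page receives exactly one category, as with a multiclass classifier. The page identifiers, the category names, and the `boolean_assignment` helper are all hypothetical.

```python
from itertools import product
from typing import Dict, List, Tuple


def boolean_assignment(docs: List[str], cats: List[str],
                       label_of: Dict[str, str]) -> Dict[Tuple[str, str], bool]:
    """Map every pair (d_j, c_i) in D x C to True iff c_i is the label of d_j."""
    return {(d, c): label_of[d] == c for d, c in product(docs, cats)}


# Hypothetical toy data: three pages and three ODP-style first-level categories.
D = ["d1", "d2", "d3"]
C = ["Arts", "Computers", "Science"]
labels = {"d1": "Computers", "d2": "Arts", "d3": "Computers"}

phi = boolean_assignment(D, C, labels)

# Single-label setting (an assumption): exactly one True value per document.
assert all(sum(phi[(d, c)] for c in C) == 1 for d in D)
```

A multi-label variant would drop the final check and allow several True values per page.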






What are Social Bookmarking Sites? (I)



        Web sites that let users save links to web pages and attach metadata to them.
              E.g., Delicious [1].




   [1] http://delicious.com


What are Social Bookmarking Sites? (II)






Social Annotations



      Tags: Keywords. E.g., photography, web2.0, images.
      Notes: Free-text descriptions of a web page. E.g., "Flickr is a website for
      photo sharing and online photo management."
      Highlights: Relevant passages selected from a page.
      Reviews: Free-text, subjective descriptions. E.g., "Interesting
      web page with photos."
      Ratings: Grades on a fixed scale. E.g., 1 to 5.
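These annotation types can be pictured as a per-URL record. The sketch below is a hypothetical data structure. It is not the schema of Delicious or of any other site. The field names and the 1 to 5 rating range are assumptions taken from the examples above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SocialAnnotations:
    """Hypothetical record of the social annotations collected for one URL."""
    url: str
    tags: List[str] = field(default_factory=list)        # keywords, e.g. "photography"
    notes: List[str] = field(default_factory=list)       # free-text descriptions
    highlights: List[str] = field(default_factory=list)  # passages selected from the page
    reviews: List[str] = field(default_factory=list)     # subjective free text
    ratings: List[int] = field(default_factory=list)     # grades, assumed to lie in 1..5

    def __post_init__(self) -> None:
        # Reject ratings outside the assumed 1-5 scale.
        for r in self.ratings:
            if not 1 <= r <= 5:
                raise ValueError(f"rating out of range 1-5: {r}")

    def mean_rating(self) -> Optional[float]:
        """Average rating, or None when nobody rated the page."""
        return sum(self.ratings) / len(self.ratings) if self.ratings else None


flickr = SocialAnnotations(
    url="http://www.flickr.com",
    tags=["photography", "web2.0", "images"],
    notes=["Flickr is a website for photo sharing and online photo management."],
    reviews=["Interesting web page with photos."],
    ratings=[4, 5],
)
```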






Motivation




      Classical web page classification methods rely on the content of the pages.
      Motivation: could social annotations help improve the results?






Related Work




      Previous work (Bao et al., 2007; Heymann et al., 2008) shows the
      usefulness of tags for information retrieval.
      Ramage et al. (2009) show that tags can improve clustering.
      Noll and Meinel (2008) study tags and conclude that they could be
      useful for web page classification.




Dataset




Dataset
       December 2008 to January 2009: we monitored Delicious' recent feed,
       keeping URLs annotated by more than 100 users.
              87,096 URLs.
       Their categories in the Open Directory Project (ODP) [2]:
              12,616 URLs matched.
              17 first-level categories.
              Unbalanced class distribution.
       Annotations retrieved:
              Number of users who annotated each URL [3].
              Top 10 list of tags [3].
              Full Tag Activity (FTA) [3].
              Notes [3].
              Reviews [4].
              Highlights [5].
   [2] http://www.dmoz.org
   [3] Delicious
   [4] StumbleUpon - http://www.stumbleupon.com
   [5] Diigo - http://diigo.com
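The selection steps above can be sketched as a filter followed by a join. The filter applies the popularity threshold, and the join matches the remaining URLs against the ODP. This minimal illustration assumes the user counts and the ODP labels are already available as dictionaries. The URLs and helper names are hypothetical, and the counts are toy values rather than the dataset's.

```python
from collections import Counter
from typing import Dict


def build_dataset(user_counts: Dict[str, int],
                  odp_category: Dict[str, str],
                  min_users: int = 100) -> Dict[str, str]:
    """Keep URLs annotated by more than `min_users` users that also appear in the ODP."""
    return {
        url: odp_category[url]
        for url, n_users in user_counts.items()
        if n_users > min_users and url in odp_category
    }


def class_distribution(dataset: Dict[str, str]) -> Counter:
    """Number of URLs per category; a skewed result means an unbalanced dataset."""
    return Counter(dataset.values())


# Hypothetical inputs: user counts seen in the feed and first-level ODP labels.
user_counts = {"http://a.example": 150, "http://b.example": 99,
               "http://c.example": 300, "http://d.example": 101}
odp = {"http://a.example": "Arts", "http://b.example": "Science",
       "http://c.example": "Computers"}

dataset = build_dataset(user_counts, odp)
```

Here `http://b.example` falls below the threshold. `http://d.example` passes the threshold but has no ODP entry, so both are dropped.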
Experiments




Configuration




        Support Vector Machines (SVM).
              SVMmulticlass [6].
        Evaluation: accuracy.
        Several training sets.
        6 runs for each training set.




   [6] http://svmlight.joachims.org
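SVMmulticlass reads training data in the SVMlight text format. Each line holds one example: an integer class label from 1 to k, followed by `feature:value` pairs with feature ids in increasing order. The sketch below writes such lines and averages accuracy over repeated runs. It assumes each of the 6 runs draws a fresh random train/test split. `run_once` is a hypothetical callable standing in for training and testing the classifier.

```python
import random
from statistics import mean
from typing import Callable, Dict, Sequence


def to_svm_line(label: int, features: Dict[int, float]) -> str:
    """Format one example as '<label> <id>:<value> ...' with ids in increasing order."""
    if label < 1 or any(fid < 1 for fid in features):
        raise ValueError("labels and feature ids must be positive integers")
    # Zero-valued features are omitted (sparse format).
    pairs = " ".join(f"{fid}:{val:g}" for fid, val in sorted(features.items()) if val != 0)
    return f"{label} {pairs}".rstrip()


def accuracy(gold: Sequence[int], predicted: Sequence[int]) -> float:
    """Fraction of test pages whose predicted category equals the gold one."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def mean_accuracy_over_runs(ids: Sequence[str], train_size: int,
                            run_once: Callable[[list, list], float],
                            runs: int = 6, seed: int = 0) -> float:
    """Average the accuracy returned by `run_once` over `runs` random splits (assumed protocol)."""
    rng = random.Random(seed)
    scores = []
    for _ in range(runs):
        shuffled = list(ids)
        rng.shuffle(shuffled)
        scores.append(run_once(shuffled[:train_size], shuffled[train_size:]))
    return mean(scores)
```

For example, `to_svm_line(3, {7: 0.5, 2: 1.0})` yields the line `3 2:1 7:0.5`.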


Classifying with Tags (I)




      Unweighted tags.
      Ranked tags.
      Tag fractions.
      Weighted tags (Top 10).
      Weighted tags (FTA).
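The slide only names the five tag representations. The sketch below shows one plausible reading of each, and every weighting scheme in it is an assumption:

* Unweighted tags: binary presence of each tag.
* Ranked tags: a linear score by position in the top 10 list.
* Tag fractions: the share of bookmarking users who chose each tag.
* Weighted tags: raw user counts, taken from the top 10 list or from the FTA.

```python
from typing import Dict, List, Tuple

# (tag, number of users who assigned it), most popular first.
TagCounts = List[Tuple[str, int]]


def unweighted(top10: TagCounts) -> Dict[str, float]:
    """Binary presence of each tag (assumed reading)."""
    return {tag: 1.0 for tag, _ in top10}


def ranked(top10: TagCounts) -> Dict[str, float]:
    """Assumed scheme: the first tag gets len(top10), the last gets 1."""
    n = len(top10)
    return {tag: float(n - i) for i, (tag, _) in enumerate(top10)}


def fractions(top10: TagCounts, n_users: int) -> Dict[str, float]:
    """Assumed scheme: share of the URL's bookmarking users who chose each tag."""
    return {tag: count / n_users for tag, count in top10}


def weighted(tag_counts: TagCounts) -> Dict[str, float]:
    """Raw user counts; pass either the top 10 list or the full tag activity."""
    return {tag: float(count) for tag, count in tag_counts}


# Hypothetical counts for one URL bookmarked by 100 users.
top10 = [("photography", 80), ("photos", 50), ("web2.0", 20)]
fta = top10 + [("flickr", 5)]
```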






Classifying with Tags (II)






Classifying with Comments (I)




      Only notes.
      Both notes and reviews.
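Notes and reviews are free text, so a common way to feed them to an SVM is as a bag-of-words vector. This minimal sketch assumes lowercased alphanumeric tokens with no stemming or stopword removal. The slide does not state these choices. The two settings above correspond to `use_reviews=False` and `use_reviews=True`.

```python
import re
from collections import Counter
from typing import Iterable, Sequence

TOKEN = re.compile(r"[a-z0-9]+")  # assumed tokenizer: lowercase alphanumeric runs


def bag_of_words(texts: Iterable[str]) -> Counter:
    """Term frequencies over all tokens of all given texts."""
    counts: Counter = Counter()
    for text in texts:
        counts.update(TOKEN.findall(text.lower()))
    return counts


def comment_features(notes: Sequence[str], reviews: Sequence[str] = (),
                     use_reviews: bool = False) -> Counter:
    """Only notes, or notes and reviews pooled into one bag of words."""
    texts = list(notes) + (list(reviews) if use_reviews else [])
    return bag_of_words(texts)


notes = ["Flickr is a website for photo sharing and online photo management."]
reviews = ["Interesting web page with photos."]
```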






Classifying with Comments (II)






Comparison with the Baseline (Content) (I)




      Content.
      Comments.
      Tags.






Comparison with the Baseline (Content) (II)






Combining Classifiers (I)




      Tags + content.
      Tags + comments.
      Comments + content.
      Tags + comments + content.
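The slide does not say how the classifiers are combined. As a swapped-in illustration only, the sketch below uses simple late fusion. Each classifier (tags, comments, content) produces one score per category. The scores are summed, and the highest-scoring category wins. All score values are made up.

```python
from typing import Dict, List

Scores = Dict[str, float]  # category -> decision score from one classifier


def combine(score_sets: List[Scores]) -> str:
    """Sum per-category scores across classifiers and return the top category."""
    if not score_sets:
        raise ValueError("need at least one classifier's scores")
    total: Dict[str, float] = {}
    for scores in score_sets:
        for category, s in scores.items():
            total[category] = total.get(category, 0.0) + s
    return max(total, key=total.get)


# Made-up scores for one page from three hypothetical classifiers.
tags = {"Arts": 0.2, "Computers": 0.9, "Science": -0.4}
comments = {"Arts": 0.7, "Computers": 0.1, "Science": 0.0}
content = {"Arts": 0.5, "Computers": -0.3, "Science": 0.6}
```

In this toy case, tags + comments picks Computers, while adding content tips the sum to Arts. Summing raw scores assumes they are on comparable scales. With SVM outputs from different feature spaces, it may matter to normalize or weight each classifier's scores first.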






Combining Classifiers (II)




Conclusions




Conclusions



      We analyzed and evaluated the use of social annotations for web page
      classification.
      Some annotation types are not used widely enough to be helpful.
              Tags and comments are widely used.
      Both tags and comments outperform classification based on page content.
      Combining the three sources (tags, comments, content) performs even better.
      We corroborate the conclusions of Noll and Meinel (2008), showing
      quantitatively that social annotations are useful for web page
      classification.




Future Work




Future Work




      Classifying at lower levels of the ODP hierarchy.
      Filtering tags and comments (misbehavior detection).






Thank You



   Achiu, Arigato, Danke, Dhannvaad, Dua Netjer en ek, Efcharisto, Gracias,
   Gràcies, Gratia, Grazie, Guishepeli, Hvala, Kiitos, Köszönöm, Mercé, Merci,
   Mila esker, Obrigado, Shukran, Tack, Tak, Takk, Shukriya, Tänan,
   Tapadh leat, Tesekkür ederim, Thank you, Toda



