Getting the Most Out of Social Annotations for Web Page Classification
DocEng 2009
Arkaitz Zubiaga, Raquel Martínez, Víctor Fresno
NLP & IR Group @ UNED
September 16th, 2009
2. Introduction
Index
1 Introduction
2 Dataset
3 Experiments
4 Conclusions
5 Future Work
Zubiaga, Martínez, Fresno (UNED) Social Annotations for WPC September 16th, 2009 2 / 25
3. Introduction
What is Web Page Classification?
We have a set of documents:
$D = \{d_1, \ldots, d_{|D|}\}$
And a set of predefined categories:
$C = \{c_1, \ldots, c_{|C|}\}$
Web page classification is the task of assigning a Boolean value to each pair:
$\langle d_j, c_i \rangle \in D \times C$
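As a minimal sketch of this definition, the snippet below enumerates every pair in D × C and records a Boolean decision for each one. The `assign` function and the sample documents and categories are hypothetical; they stand in for whatever classifier produces the decisions, and the sketch assumes each document gets exactly one category.

```python
from itertools import product


def decision_matrix(documents, categories, assign):
    """Return {(d, c): bool} for every pair in D x C.

    `assign` is a hypothetical stand-in for a trained classifier:
    it maps a document to the single category it predicts.
    """
    predicted = {d: assign(d) for d in documents}
    return {(d, c): predicted[d] == c for d, c in product(documents, categories)}


# Toy data, invented for illustration only.
D = ["flickr.com", "python.org"]
C = ["Arts", "Computers"]
toy_assign = {"flickr.com": "Arts", "python.org": "Computers"}.get

matrix = decision_matrix(D, C, toy_assign)
```

With single-label classification, exactly one pair per document is marked true; a multi-label setting would let `assign` return a set of categories instead.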
4. Introduction
What are Social Bookmarking Sites? (I)
Websites that let users save web links and attach metadata to them.
Example: Delicious (http://delicious.com)
5. Introduction
What are Social Bookmarking Sites? (II)
6. Introduction
Social Annotations
Tags: Keywords. E.g., photography, web2.0, images.
Notes: Free text describing a web page. E.g., Flickr is a website for photo sharing and photo online management.
Highlights: Selected relevant parts of a page.
Reviews: Free text with subjective descriptions. E.g., Interesting web page with photos.
Ratings: Numeric grades. E.g., 1 to 5.
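One way to hold these five annotation types for a single URL is sketched below. The class name, field names, and the per-tag user counts are assumptions made for illustration; only the example note and review texts come from the list above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SocialAnnotations:
    """Annotations collected for one URL (field names are illustrative)."""
    url: str
    tags: Dict[str, int] = field(default_factory=dict)    # tag -> number of users
    notes: List[str] = field(default_factory=list)
    highlights: List[str] = field(default_factory=list)
    reviews: List[str] = field(default_factory=list)
    ratings: List[int] = field(default_factory=list)      # e.g. 1 to 5

    def mean_rating(self) -> Optional[float]:
        """Average rating, or None when nobody rated the page."""
        return sum(self.ratings) / len(self.ratings) if self.ratings else None


# Tag counts and ratings below are invented for the example.
example = SocialAnnotations(
    url="http://www.flickr.com",
    tags={"photography": 3, "web2.0": 2, "images": 1},
    notes=["Flickr is a website for photo sharing and photo online management."],
    reviews=["Interesting web page with photos."],
    ratings=[4, 5],
)
```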
7. Introduction
Motivation
Classical web page classification methods rely on the content of web pages.
Motivation: Could social annotations help improve the results?
8. Introduction
Related Work
Some works (Bao et al., 2007; Heymann et al., 2008) show the usefulness of tags for information retrieval.
(Ramage et al., 2009) show that tags can improve clustering.
(Noll and Meinel, 2008) study tags and conclude that they could be useful for web page classification.
10. Dataset
Dataset
December 2008 - January 2009: monitored Delicious' recent feed for URLs annotated by more than 100 users.
  87,096 URLs.
Looked up their classification on the Open Directory Project (ODP, http://www.dmoz.org).
  12,616 URLs matching.
  17 first-level categories.
  Unbalanced category distribution.
Annotations retrieved:
  Number of users annotating each URL (Delicious).
  Top 10 list of tags (Delicious).
  Full Tag Activity (FTA) (Delicious).
  Notes (Delicious).
  Reviews (StumbleUpon, http://www.stumbleupon.com).
  Highlights (Diigo, http://diigo.com).
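The two selection steps (popularity threshold, then ODP match) can be sketched as below. This is an illustration under stated assumptions, not the authors' pipeline: the function name, the two input dictionaries, and the toy URLs are all hypothetical.

```python
def select_urls(user_counts, odp_categories, min_users=100):
    """Keep URLs bookmarked by more than `min_users` users that also
    have an ODP category.

    Both inputs are hypothetical dictionaries:
    url -> user count, and url -> first-level ODP category.
    """
    popular = {u for u, n in user_counts.items() if n > min_users}
    return {u: odp_categories[u] for u in sorted(popular) if u in odp_categories}


# Invented toy data.
user_counts = {"a.example": 250, "b.example": 80, "c.example": 101, "d.example": 100}
odp_categories = {"a.example": "Computers", "b.example": "Arts", "d.example": "Science"}

labeled = select_urls(user_counts, odp_categories)
```

Here `b.example` and `d.example` fail the "more than 100 users" test, and `c.example` has no ODP entry, so only `a.example` remains.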
12. Experiments
Configuration
Support Vector Machines (SVM).
SVMmulticlass (http://svmlight.joachims.org).
Evaluation: Accuracy.
Several training sets.
6 runs for each set.
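The evaluation protocol (accuracy, averaged over 6 runs per training set) can be sketched as follows. The slide does not say how the training sets were drawn, so random sampling is an assumption here, and a majority-class baseline stands in for the SVM so that the example needs only the standard library.

```python
import random
from collections import Counter
from statistics import mean


def accuracy(predicted, gold):
    """Fraction of documents whose predicted category matches the gold one."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def majority_baseline(train_labels):
    """Stand-in classifier (NOT an SVM): always predicts the most frequent label."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return lambda _doc: most_common


def evaluate(docs, labels, train_size, runs=6, seed=0):
    """Average accuracy over `runs` random train/test splits of a given size."""
    rng = random.Random(seed)
    scores = []
    for _ in range(runs):
        order = list(range(len(docs)))
        rng.shuffle(order)
        train, test = order[:train_size], order[train_size:]
        classify = majority_baseline([labels[i] for i in train])
        predicted = [classify(docs[i]) for i in test]
        scores.append(accuracy(predicted, [labels[i] for i in test]))
    return mean(scores)


# Invented toy data: 20 documents, unbalanced labels.
docs = list(range(20))
labels = ["A"] * 15 + ["B"] * 5
score = evaluate(docs, labels, train_size=10)
```

Averaging over several runs reduces the effect of a lucky or unlucky split, which matters when the category distribution is unbalanced.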
13. Experiments
Classifying with Tags (I)
Unweighted tags.
Ranked tags.
Tag fractions.
Weighted tags (Top 10).
Weighted tags (FTA).
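The slide only names the five tag representations, so the weighting schemes below are one plausible reading of those names and should be treated as assumptions. Each function takes a list of `(tag, user_count)` pairs sorted by decreasing count; the function names and the toy data are hypothetical.

```python
def unweighted(top10):
    """1 for every tag in the top-10 list, regardless of rank or count."""
    return {tag: 1.0 for tag, _count in top10}


def ranked(top10):
    """Assumed scheme: 1.0 for the first tag, 0.9 for the second, and so on."""
    return {tag: (10 - i) / 10 for i, (tag, _count) in enumerate(top10[:10])}


def fractions(top10, n_users):
    """Share of the page's users who assigned each tag."""
    return {tag: count / n_users for tag, count in top10}


def weighted(tag_counts):
    """Raw user counts as weights; pass the top-10 list or the full tag activity."""
    return {tag: float(count) for tag, count in tag_counts}


# Invented example: a page bookmarked by 120 users.
top10 = [("photography", 60), ("images", 30), ("web2.0", 15)]
fta = top10 + [("flickr", 4), ("sharing", 1)]

vectors = {
    "unweighted": unweighted(top10),
    "ranked": ranked(top10),
    "fractions": fractions(top10, n_users=120),
    "weighted_top10": weighted(top10),
    "weighted_fta": weighted(fta),
}
```

The only difference between the last two entries is the input: the top-10 list truncates rare tags, while the full tag activity keeps them.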
14. Experiments
Classifying with Tags (II)
15. Experiments
Classifying with Comments (I)
Only notes.
Both notes and reviews.
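The two comment settings differ only in which texts feed the document representation. A minimal sketch, assuming a plain bag-of-words model with a simple lowercase tokenizer (the slide specifies neither), and reusing the example note and review from the annotation types above:

```python
import re
from collections import Counter


def bag_of_words(texts):
    """Lowercased term frequencies over a list of free-text comments."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z0-9]+", text.lower()))
    return counts


notes = ["Flickr is a website for photo sharing and photo online management."]
reviews = ["Interesting web page with photos."]

only_notes = bag_of_words(notes)
notes_and_reviews = bag_of_words(notes + reviews)
```

No stop-word removal or term weighting is applied here; a real setup would likely add both.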
22. Conclusions
Conclusions
We analyzed and evaluated the use of social annotations for web page classification.
Some of the annotations are not popular enough.
Tags and comments are popular.
Both tags and comments outperform content-based classification.
Combining the three data inputs (content, tags, and comments) performs even better.
We corroborate the conclusions of (Noll and Meinel, 2008), showing quantitatively that social annotations are useful for web page classification.
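This section does not state how the three inputs were combined. One simple option, shown here purely as an assumption, is to train one classifier per input and sum their per-category scores; the function name and the score values are invented for illustration.

```python
def combine_scores(*score_dicts):
    """Sum per-category scores from several classifiers and pick the best.

    Each argument maps category -> score (for an SVM, e.g. a margin).
    Summing is one simple combination rule, assumed here for illustration.
    """
    totals = {}
    for scores in score_dicts:
        for category, score in scores.items():
            totals[category] = totals.get(category, 0.0) + score
    return max(totals, key=totals.get), totals


# Invented scores for a single page from three separate classifiers.
content = {"Arts": 0.2, "Computers": 0.5}
tags = {"Arts": 0.9, "Computers": 0.1}
comments = {"Arts": 0.4, "Computers": 0.3}

best, totals = combine_scores(content, tags, comments)
```

In this toy case the content classifier alone would pick "Computers", but the summed evidence from tags and comments tips the decision to "Arts".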
24. Future Work
Future Work
Classifying at lower levels of the category hierarchy.
Filtering tags and comments (misbehavior detection).
25. Future Work
Thank You
Achiu Arigato Danke Dhannvaad Dua Netjer en ek Efcharisto
Gracias Gràcies Gratia Grazie Guishepeli
Hvala Kiitos Köszönöm Mercé Merci Mila esker
Obrigado Shukran Tack Tak Takk Shukriya
Tänan Tapadh leat Tesekkür ederim Thank you Toda