4. Introduction
Context
✓ Internet usage is doubling every year
✓ The Web is a network with a large amount of
resources
✓ Our prototype is built on top of resource
usage
5. Introduction
Motivation
✓ Discovery of patterns & trends
✓ Text Mining = Data Mining for unstructured
text
✓ Can be of use for analyzing existing Web
usage
6. Introduction
Research Question
“Is it possible to group textual resources by users’
profiles, and thus improve the clustering techniques
used in recommendation applications, without
additional tagging mechanisms?”
8. Introduction
Our goal
Our goal is to find an alternative method to
classify new items as relevant or not, given all
historical choices and at the same time use
similar users’ choices to identify potentially
relevant items.
9. Background and Related Work
Text Mining ≠ Data Mining
✓ Built on top of unstructured data
✓ Requires additional computing for natural
language processing
10. Background and Related Work
Clustering
✓ Finds groups of similar objects
✓ Does not require training sets
✓ Object classification may be done
afterwards
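
As an illustration of grouping similar objects without a training set, here is a minimal k-means sketch. It is not the Plan2See implementation: the function name `kmeans`, the sample points and the choice of seeding centroids with the first k points are assumptions made only for this example.

```python
from math import dist

def kmeans(points, k, iterations=20):
    """Plain k-means on tuples of floats (illustrative only).

    Assumption: the first k points are used as initial centroids,
    which keeps the example deterministic.
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(axis) / len(members)
                                     for axis in zip(*members))
    return labels, centroids

# Hypothetical 2-D objects forming two visible groups.
points = [(0.0, 0.1), (5.0, 5.1), (0.2, 0.0), (5.2, 4.9), (0.1, 0.3)]
labels, centroids = kmeans(points, k=2)
```

No labelled examples are needed: the groups emerge from distances alone, and a label for each group can be assigned afterwards.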
11. Background and Related Work
Text Mining
✓ Documents are represented as points in a
vector space
✓ Words are categorized and represented by
frequencies on a dictionary
✓ It’s possible to apply classification or
association techniques on those frequencies,
as if they were plain numerical data
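
The representation described above can be sketched as follows. This is a minimal bag-of-words example, not the prototype's code: the helper names (`tokenize`, `build_dictionary`, `to_vector`, `cosine`) and the sample event titles are hypothetical.

```python
from collections import Counter
from math import sqrt
import re

def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def build_dictionary(documents):
    """Sorted list of every distinct word found in the documents."""
    return sorted({word for doc in documents for word in tokenize(doc)})

def to_vector(text, dictionary):
    """Word frequencies of one document, ordered by the dictionary."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in dictionary]

def cosine(u, v):
    """Cosine similarity between two frequency vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Hypothetical event titles standing in for textual resources.
docs = ["Jazz concert in Lisbon",
        "Rock concert in Porto",
        "Cooking workshop in Lisbon"]
dictionary = build_dictionary(docs)
vectors = [to_vector(doc, dictionary) for doc in docs]
similarity = cosine(vectors[0], vectors[1])
```

Once each document is a vector of frequencies, any numerical technique (distance, clustering, classification) can be applied to it directly.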
12. Background and Related Work
Building profiles (1)
✓ Tagging resources enables communities to
search related resources without additional
computation
✓ Requires someone to describe resources for
the tag to be effective
13. Background and Related Work
Building profiles (2)
✓ Experiments in building profiles include
analysis of zones of interest or page links
✓ They are mostly based on users’ actions taken
individually
14. Background and Related Work
Building profiles (3)
✓ Recommendations based on classification
techniques require training with initial profiles
✓ Some authors recognize the subjectivity of
user-based profiles
15. Plan2See Method
Recommendation
✓ Presents similar textual resources based on
users’ selections
✓ Is based on the organisation of clusters built on
top of users’ choices, thus dividing or grouping
resources
Resources
✓ Event announcements with title, description,
date and location
26. Plan2See Method
Testing Equal Means
✓ We used Hotelling’s two-sample T-squared test
to decide whether the null hypothesis of equal
means should be rejected
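
For reference, the standard form of the two-sample statistic is given below. The notation is an assumption introduced here, not taken from the slides: two groups of sizes $n_1$ and $n_2$, each observation a vector of $p$ word frequencies, with sample means $\bar{x}_1, \bar{x}_2$ and sample covariance matrices $S_1, S_2$.

```latex
\begin{align}
S_p &= \frac{(n_1 - 1)\,S_1 + (n_2 - 1)\,S_2}{n_1 + n_2 - 2} \\
T^2 &= \frac{n_1 n_2}{n_1 + n_2}\,
       (\bar{x}_1 - \bar{x}_2)^{\top} S_p^{-1} (\bar{x}_1 - \bar{x}_2) \\
F   &= \frac{n_1 + n_2 - p - 1}{p\,(n_1 + n_2 - 2)}\, T^2
       \;\sim\; F_{p,\; n_1 + n_2 - p - 1} \quad \text{under } H_0 : \mu_1 = \mu_2
\end{align}
```

The null hypothesis of equal means is rejected when $F$ exceeds the critical value of the $F$ distribution at the chosen significance level.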
Assumptions
✓ Only the words in the top 5% by frequency
are used
✓ Dividing is done for clusters with fewer than
60% of their events selected
✓ Grouping is done on clusters with at least
25% of their events selected
✓ Clustering is scheduled so clusters are
stable and are not being modified for
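
The thresholds listed above could be applied as in the sketch below. This is only one possible reading: the function names, the constant names and the rounding rule for the top 5% are hypothetical. The slides do not say how a cluster that satisfies both conditions is handled, so the sketch simply reports every operation for which the cluster qualifies.

```python
# Threshold values come from the slide; the names are hypothetical.
TOP_WORD_FRACTION = 0.05   # share of most frequent words kept
DIVIDE_BELOW = 0.60        # divide clusters with fewer selected events
GROUP_AT_LEAST = 0.25      # group clusters with at least this share

def top_words(word_counts, fraction=TOP_WORD_FRACTION):
    """Return the most frequent words (assumption: at least one is kept)."""
    keep = max(1, int(len(word_counts) * fraction))
    # Sort by descending count, then alphabetically to break ties.
    ranked = sorted(word_counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:keep]]

def candidate_operations(selected, total):
    """List the operations a cluster qualifies for, given its selections."""
    if total <= 0:
        raise ValueError("a cluster must contain at least one event")
    ratio = selected / total
    operations = []
    if ratio < DIVIDE_BELOW:
        operations.append("divide")
    if ratio >= GROUP_AT_LEAST:
        operations.append("group")
    return operations

# Hypothetical usage: 40 distinct words, so the top 5% is 2 words.
counts = {f"w{i:02d}": i for i in range(40)}
kept = top_words(counts)
ops_low = candidate_operations(1, 10)
ops_mid = candidate_operations(4, 10)
ops_high = candidate_operations(7, 10)
```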
27. Plan2See Setup
✓ Resources were gathered by crawling
✓ Data has been filtered to build the
application dictionary
✓ We’ve tested 10 initial clusters from
KMeans and decided to use only one
initial cluster
✓ We’ve tested the basic operations for
the algorithm with success
28. Conclusions
✓ A new method is proposed
➢ Clustering for users’ profiles
➢ Does not need any additional
tagging mechanisms
➢ Clusters seem to be stable even if
changes occur periodically
29. Conclusions
✓ Lacks tests with real users’ preferences
➢ Lacks testing of recommendations for
users’ items and for the dynamic
groups
➢ Lacks verification that this profiling
is effective, i.e., that users are choosing
similar contents in groups or
communities