Profiling Users' Preferences with Text Mining '14

Profiling Users’ Preferences
with Text Mining
Pedro Costa
ISCTE-IUL 2014
Lisbon, 3 July

Agenda
Introduction
Background and Related Work
Plan2See Method
Plan2See Setup
Conclusions and Future Work

Introduction
Context
✓ Internet usage is doubling every year
✓ The Web is a network with large amounts of
resources
✓ Our prototype is built on top of resource
usage

Introduction
Motivation
✓ Discovery of patterns & trends
✓ Text Mining = Data Mining for unstructured
text
✓ Can be of use for analyzing existing Web
usage

Introduction
Research Question
“Is it possible to group textual resources to users’
profiles and thus improve clustering techniques
used in recommendation applications, without
additional tagging mechanisms?”

Introduction
Assumptions
1) personal information will always be
insufficient;
2) tagging resources relies on human knowledge
and judgment to be accurate.

Introduction
Our goal
Our goal is to find an alternative method to
classify new items as relevant or not, given all
historical choices and at the same time use
similar users’ choices to identify potentially
relevant items.

Text Mining <> Data Mining
✓ Built on top of unstructured data
✓ Requires additional computing for natural
language processing

Clustering
✓ Finds groups of similar objects
✓ Does not require training sets
✓ Object classification can be done
afterwards
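
As a rough illustration of clustering without a training set, the sketch below runs a plain k-means loop in Python. The deck only names KMeans later, in the setup; the function names, the sample points, and the choice of the first k points as initial centroids are assumptions made for this example, not the Plan2See implementation.

```python
import math

def nearest(point, centroids):
    """Index of the centroid closest to the point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def mean(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(points) for col in zip(*points)]

def kmeans(points, k, iterations=10):
    """Plain k-means; the first k points serve as initial centroids (an assumption)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster goes empty
                centroids[c] = mean(members)
    return labels, centroids

# Hypothetical 2-D points forming two visibly separate groups.
points = [[0, 0], [10, 10], [0, 1], [9, 10], [1, 0], [10, 9]]
labels, centroids = kmeans(points, k=2)
```

No labels are supplied up front: the groups emerge from distances alone, and a label for each object is only available after the loop has run.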

Text Mining
✓ Documents are represented as points in a
space map
✓ Words are categorized and represented by
frequencies on a dictionary
✓ It’s possible to apply classification or
association techniques on those frequencies,
as if they were plain numerical data
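
The representation described above can be sketched in a few lines of Python. This is only an illustrative bag-of-words example: the tokenizer, the function names, and the two sample documents are assumptions, and the deck does not say how Plan2See tokenizes or weights words.

```python
from collections import Counter
import re

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def build_dictionary(documents):
    """Sorted list of every distinct word seen in the collection."""
    return sorted({word for doc in documents for word in tokenize(doc)})

def to_frequency_vector(text, dictionary):
    """Represent one document as word counts aligned with the dictionary."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in dictionary]

# Hypothetical event titles standing in for real resources.
docs = [
    "Jazz concert in Lisbon",
    "Rock concert and jazz session",
]
dictionary = build_dictionary(docs)
vectors = [to_frequency_vector(d, dictionary) for d in docs]
```

Each document becomes a point whose coordinates are word frequencies, so numerical techniques such as clustering can be applied to the vectors directly.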

Building profiles (1)
✓ Tagging resources enables communities to
search related resources without additional
computation
✓ Requires someone to describe resources for
the tag to be effective

✓ Experiments in building profiles include
analysis of zones of interest or page links
✓ They are mostly based on users’ actions taken
individually

✓ Recommendation based on classification
techniques requires training with initial profiles
✓ Some authors recognize the subjectivity of
user-based profiles

Plan2See Method
Recommendation
✓ Presents similar textual resources based on
users’ selections
✓ Is based on the organisation of clusters built on
top of users’ choices, thus dividing or grouping
resources
Resources
✓ Event announcements with title, description,
date and location
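
One way to picture the recommendation step is the Python sketch below: events carry the four fields listed above, and unselected events are suggested when they share a cluster with something the user selected. The `Event` class, the `recommend` function, and the sample events are hypothetical; the deck does not describe the actual data model or ranking.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    """An event announcement with the four fields named on the slide."""
    title: str
    description: str
    date: str
    location: str

def recommend(clusters, selected):
    """Return unselected events that share a cluster with a selected event."""
    recommendations = []
    for cluster in clusters:
        if any(event in selected for event in cluster):
            recommendations.extend(e for e in cluster if e not in selected)
    return recommendations

# Hypothetical events and a hand-made clustering, for illustration only.
jazz = Event("Jazz night", "Live jazz quartet", "2014-07-03", "Lisboa")
fado = Event("Fado evening", "Traditional fado singers", "2014-07-04", "Lisboa")
expo = Event("Tech expo", "Hardware and software fair", "2014-07-05", "Porto")
clusters = [[jazz, fado], [expo]]
suggested = recommend(clusters, selected={jazz})
```

Here the user selected only the jazz event, so the other member of its cluster is suggested and the unrelated cluster is ignored.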

Plan2See Method
Grouping
T-Test result shows related
content

Plan2See Method
Dividing
T-Test result shows unrelated
content

Plan2See Method
Recommendation

Plan2See Method
Testing Equal Means
✓ We used Hotelling’s two-sample T-squared test
to test whether the null hypothesis of equal
means should be rejected
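
For reference, the standard pooled-covariance form of the two-sample Hotelling statistic can be sketched in plain Python as follows. This is a textbook version written for illustration, not the deck's code: the helper names and sample data are assumptions, and the returned F statistic still has to be compared against an F(p, n1 + n2 - p - 1) critical value to decide on rejection.

```python
def mean_vector(rows):
    """Component-wise mean of a list of equal-length observations."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def scatter_matrix(rows, mu):
    """Sum of outer products of deviations from the mean."""
    p = len(mu)
    s = [[0.0] * p for _ in range(p)]
    for row in rows:
        d = [x - m for x, m in zip(row, mu)]
        for i in range(p):
            for j in range(p):
                s[i][j] += d[i] * d[j]
    return s

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def hotelling_t2(sample1, sample2):
    """Return (T-squared, F statistic) for two samples of p-dimensional rows."""
    n1, n2, p = len(sample1), len(sample2), len(sample1[0])
    mu1, mu2 = mean_vector(sample1), mean_vector(sample2)
    s1, s2 = scatter_matrix(sample1, mu1), scatter_matrix(sample2, mu2)
    # Pooled covariance: combined scatter divided by n1 + n2 - 2.
    pooled = [[(s1[i][j] + s2[i][j]) / (n1 + n2 - 2) for j in range(p)]
              for i in range(p)]
    diff = [a - b for a, b in zip(mu1, mu2)]
    w = solve(pooled, diff)  # pooled^-1 * diff without an explicit inverse
    t2 = (n1 * n2) / (n1 + n2) * sum(d * wi for d, wi in zip(diff, w))
    f_stat = (n1 + n2 - p - 1) / (p * (n1 + n2 - 2)) * t2
    return t2, f_stat

# Hypothetical 2-D samples; the second is the first shifted by (5, 5).
a = [[1, 2], [2, 1], [3, 3], [2, 2]]
b = [[6, 7], [7, 6], [8, 8], [7, 7]]
t2, f_stat = hotelling_t2(a, b)
```

A large statistic argues against equal means, while identical samples give a statistic of zero. The sketch assumes the pooled covariance matrix is non-singular, which may not hold for sparse word-frequency data.
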
Assumptions
✓ Only the words in the top 5% by frequency
are used
✓ Dividing is done for clusters with less than
60% of their events selected
✓ Grouping is done on clusters with at least
25% of their events selected
✓ Clustering is scheduled so clusters are
stable and are not being modified for
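
The thresholds in this list can be read as simple checks, sketched below in Python. The reading of "5%" as the top 5% of the vocabulary by frequency, the function names, and the sample cluster are all assumptions; the deck does not state how the dividing and grouping conditions interact when both hold.

```python
from collections import Counter
import math

def top_frequency_words(word_counts, fraction=0.05):
    """Keep the most frequent words, up to `fraction` of the vocabulary (at least one)."""
    keep = max(1, math.ceil(len(word_counts) * fraction))
    return [word for word, _ in Counter(word_counts).most_common(keep)]

def selected_share(cluster_events, selected_events):
    """Fraction of a cluster's events that the user has selected."""
    if not cluster_events:
        return 0.0
    hits = sum(1 for event in cluster_events if event in selected_events)
    return hits / len(cluster_events)

def is_divide_candidate(share, threshold=0.60):
    """Dividing applies to clusters with less than 60% of events selected."""
    return share < threshold

def is_group_candidate(share, threshold=0.25):
    """Grouping applies to clusters with at least 25% of events selected."""
    return share >= threshold

# Hypothetical cluster of four events, one of which the user selected.
cluster = ["e1", "e2", "e3", "e4"]
share = selected_share(cluster, {"e1"})
```

With one of four events selected, the share is 0.25, which satisfies both checks in this sketch; which operation is then attempted is presumably settled by the equal-means test.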

Plan2See Setup
✓ Resources were gathered by crawling
✓ Data has been filtered to build the
application dictionary
✓ We’ve tested 10 initial clusters from
KMeans and decided to use only one
initial cluster
✓ We’ve tested the basic operations for
the algorithm with success

Conclusions
✓ A new method is proposed
➢ Clustering for users’ profiles
➢ Does not need any additional
tagging mechanisms
➢ Clusters seem to be stable even if
changes occur periodically

Conclusions
✓ Lacks tests with real users’ preferences
➢ Lacks testing recommendation for
users’ items and for the dynamic
groups
➢ Lacks verification that this profiling
is effective, i.e., that users are choosing
similar content in groups or
communities

Thank you!
Pedro Costa
ISCTE-IUL
pedro.bonifacio.costa@gmail.com
