Profiling Users’ Preferences
with Text Mining
Pedro Costa
ISCTE-IUL 2014
Lisbon, 3 July
Agenda
Introduction
Background and Related Work
Plan2See Method
Plan2See Setup
Conclusions and Future Work
Introduction
Context
✓ Internet usage is doubling every year
✓ The Web is a network with large amounts of
resources
✓ Our prototype is built on top of resource
usage
Motivation
✓ Discovery of patterns & trends
✓ Text Mining = Data Mining for unstructured
text
✓ Can be useful for analyzing existing Web
usage
Research Question
“Is it possible to group textual resources to users’
profiles and thus improve clustering techniques
used in recommendation applications, without
additional tagging mechanisms?”
Assumptions
1) personal information will always be
insufficient;
2) tagging resources relies on human knowledge
and judgment to be accurate.
Our goal
Our goal is to find an alternative method to
classify new items as relevant or not, given all
historical choices and at the same time use
similar users’ choices to identify potentially
relevant items.
Background and Related Work
Text Mining vs. Data Mining
✓ Built on top of unstructured data
✓ Requires additional computing for natural
language processing
Clustering
✓ Finds groups of similar objects
✓ Does not require training sets
✓ Object classification may be done
afterwards
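The clustering idea above can be illustrated with a minimal k-means sketch. This is not the Plan2See implementation: the function names, the toy 2-D points, and the deterministic farthest-point initialisation are all assumptions made for illustration.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iterations=20):
    """Group points into k clusters; no labelled training set is needed."""
    # Assumption of this sketch: deterministic farthest-point initialisation.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids

# Toy data: two visibly separate groups of points.
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
labels, centroids = kmeans(points, k=2)
print(labels)
```

The resulting labels can be used afterwards to classify objects, for example by assigning a new point to its nearest centroid.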
Text Mining
✓ Documents are represented as points in a
vector space
✓ Words are categorized and represented by
their frequencies in a dictionary
✓ Classification or association techniques can
be applied to those frequencies, as if they
were plain numerical data
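As a rough illustration of the representation described above, the sketch below turns short texts into word-frequency vectors over a shared dictionary. The tokenizer, the function names, and the sample event titles are hypothetical, not taken from the talk.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and keep alphabetic words only (a simplification)."""
    return re.findall(r"[a-z]+", text.lower())

def build_dictionary(documents):
    """Sorted vocabulary of every word seen in the documents."""
    return sorted({word for doc in documents for word in tokenize(doc)})

def to_vector(text, dictionary):
    """Represent a text as word frequencies, one position per dictionary word."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in dictionary]

# Hypothetical event titles used only for illustration.
docs = ["Jazz concert in Lisbon", "Rock concert tonight", "Jazz jazz festival"]
dictionary = build_dictionary(docs)
vectors = [to_vector(d, dictionary) for d in docs]

print(dictionary)
for vector in vectors:
    print(vector)
```

Each document becomes a point whose coordinates are plain numbers, so numerical techniques such as clustering can be applied to them directly.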
Building profiles (1)
✓ Tagging resources enables communities to
search related resources without additional
computation
✓ Requires someone to describe resources for
the tag to be effective
Building profiles (2)
✓ Experiments in building profiles include
analysis of zones of interest or page links
✓ They are mostly based on users’ actions taken
individually
Building profiles (3)
✓ Recommendation based on classification
techniques requires training with initial profiles
✓ Some authors recognize the subjectivity of
user-based profiles
Plan2See Method
Recommendation
✓ Presents similar textual resources based on
users’ selections
✓ Is based on the organisation of clusters built on
top of users’ choices, thus dividing or grouping
resources
Resources
✓ Event announcements with title, description,
date and location
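One possible reading of the recommendation step described above is sketched here: once events are assigned to clusters, a user is shown the not-yet-selected events from the clusters that contain their selections. The function name, the event identifiers, and the cluster assignments are hypothetical, and the slides do not confirm that Plan2See works exactly this way.

```python
def recommend(cluster_of, selected):
    """Suggest unselected events that share a cluster with the user's selections.

    cluster_of: dict mapping event id -> cluster id (assumed to come from clustering)
    selected:   set of event ids the user has already chosen
    """
    liked_clusters = {cluster_of[e] for e in selected if e in cluster_of}
    return sorted(
        event
        for event, cluster in cluster_of.items()
        if cluster in liked_clusters and event not in selected
    )

# Hypothetical cluster assignments for five event announcements.
cluster_of = {
    "jazz_night": 0,
    "blues_jam": 0,
    "food_fair": 1,
    "wine_tasting": 1,
    "tech_meetup": 2,
}
suggestions = recommend(cluster_of, selected={"jazz_night"})
print(suggestions)
```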
Grouping
[Figures: grouping; the T-Test result shows related content]
Dividing
[Figures: dividing; the T-Test result shows unrelated content]
Recommendation
[Figure: recommendation]
Testing Equal Means
✓ We used Hotelling’s two-sample T-squared test
to decide whether the null hypothesis of equal
means should be rejected
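For readers unfamiliar with the test named above, here is a minimal pure-Python sketch of the standard two-sample Hotelling T-squared statistic with pooled covariance. It is not the authors' code: the helper names and the toy samples are assumptions, and the p-value lookup is left out because it needs the F distribution from a statistics library.

```python
def mean_vector(rows):
    """Column-wise mean of a list of equal-length rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def scatter_matrix(rows, mean):
    """Sum of outer products of deviations from the mean."""
    p = len(mean)
    s = [[0.0] * p for _ in range(p)]
    for row in rows:
        d = [x - m for x, m in zip(row, mean)]
        for i in range(p):
            for j in range(p):
                s[i][j] += d[i] * d[j]
    return s

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def hotelling_t2(sample1, sample2):
    """Return (T-squared, F statistic, (df1, df2)) for two multivariate samples."""
    n1, n2, p = len(sample1), len(sample2), len(sample1[0])
    m1, m2 = mean_vector(sample1), mean_vector(sample2)
    s1, s2 = scatter_matrix(sample1, m1), scatter_matrix(sample2, m2)
    # Pooled covariance: combined scatter divided by n1 + n2 - 2.
    pooled = [[(s1[i][j] + s2[i][j]) / (n1 + n2 - 2) for j in range(p)]
              for i in range(p)]
    diff = [a - b for a, b in zip(m1, m2)]
    t2 = (n1 * n2) / (n1 + n2) * sum(d * w for d, w in zip(diff, solve(pooled, diff)))
    f_stat = (n1 + n2 - p - 1) / (p * (n1 + n2 - 2)) * t2
    return t2, f_stat, (p, n1 + n2 - p - 1)

# Toy samples: two groups of 2-D frequency vectors with clearly different means.
group_a = [[2.0, 1.0], [3.0, 2.0], [4.0, 1.0], [3.0, 0.0]]
group_b = [[6.0, 5.0], [7.0, 6.0], [8.0, 5.0], [7.0, 4.0]]
t2, f_stat, dof = hotelling_t2(group_a, group_b)
print(t2, f_stat, dof)
```

The null hypothesis of equal means is rejected when the F statistic exceeds the critical value of the F distribution with the returned degrees of freedom at the chosen significance level.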
Assumptions
✓ Only the top 5% most frequent words are
used
✓ Dividing is done for clusters with less than
60% of their events selected
✓ Grouping is done on clusters with at least
25% of their events selected
✓ Clustering is scheduled so clusters are
stable and are not being modified for
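The thresholds listed above could be encoded roughly as follows. This is a sketch under assumptions: the slides do not say how "top 5%" is computed (here it is 5% of the distinct words, ranked by count) or how the two percentage ranges interact (here a cluster can be a candidate for both operations, with the T-squared test presumably deciding whether an operation is carried out). All names are hypothetical.

```python
from collections import Counter

TOP_WORD_FRACTION = 0.05  # only the most frequent 5% of words are kept
DIVIDE_BELOW = 0.60       # divide clusters with less than 60% of events selected
GROUP_AT_LEAST = 0.25     # group clusters with at least 25% of events selected

def top_frequency_words(word_counts, fraction=TOP_WORD_FRACTION):
    """Keep the given fraction of distinct words, ranked by frequency (assumed reading)."""
    keep = max(1, int(len(word_counts) * fraction))
    return [word for word, _ in Counter(word_counts).most_common(keep)]

def candidate_operations(n_selected, n_events):
    """Return which operations a cluster qualifies for, given its share of selected events."""
    share = n_selected / n_events
    operations = set()
    if share < DIVIDE_BELOW:
        operations.add("divide")
    if share >= GROUP_AT_LEAST:
        operations.add("group")
    return operations

# Hypothetical dictionary of 40 words with distinct counts 1..40.
word_counts = {f"w{i}": i for i in range(1, 41)}
print(top_frequency_words(word_counts))
print(candidate_operations(n_selected=3, n_events=10))
```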
Plan2See Setup
✓ Resources were gathered by crawling
✓ Data has been filtered to build the
application dictionary
✓ We’ve tested 10 initial clusters from
KMeans and decided to use only one
initial cluster
✓ We’ve tested the basic operations for
the algorithm with success
Conclusions
✓ A new method is proposed
➢ Clustering for users’ profiles
➢ Does not need any additional
tagging mechanisms
➢ Clusters seem to be stable even if
changes occur periodically
✓ Lacks tests with real users’ preferences
➢ Lacks testing recommendation for
users’ items and for the dynamic
groups
➢ Lacks verification that this profiling
is effective, i.e., that users are choosing
similar content in groups or
communities
Thank you!
Pedro Costa
ISCTE-IUL
pedro.bonifacio.costa@gmail.com
