Web Usage Mining with Semantic Analysis
Laura Hollink, VU University Amsterdam
Peter Mika, Yahoo! Labs Barcelona
Roi Blanco, Yahoo! Labs Barcelona
Analysis of web user behavior
What are typical use cases? Are these carried out in a particular order?
Which use cases are not satisfied? And to which other sites do users
go?
[Figure: raw search sessions around the movie Moneyball, mixing queries (e.g. "moneyball trailer", "brad pitt moneyball oscar", "money ball movie", "map of africa") with clicked sites (e.g. movies.yahoo.com, www.imdb.com, en.wikipedia.org, www.africaguide.com)]
Transaction logs: sessions of queries and clicks
Are these use cases typical for all movies? Recent movies? Only for
Moneyball?
Why are these questions difficult to answer?
Sparsity of the event space
‣ 64% of queries are unique within a year
‣ even the most frequent patterns have extremely low support
To illustrate: top 12 most frequent sessions observed in our data:
Tasks
Question 1: what are typical use cases?
‣ Task 1: find sequences of events in the data whose frequency (support) is higher than a threshold.
Question 2: what use cases are not satisfied?
‣ Task 2: learn to predict website abandonment from queries and clicks.
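Task 1 can be illustrated with a minimal sketch. It assumes that support is the fraction of sessions containing a contiguous subsequence at least once; the slides do not fix this definition. The function name `frequent_sequences`, the `max_len` cut-off, and the toy sessions are hypothetical.

```python
from collections import Counter

def frequent_sequences(sessions, min_support, max_len=3):
    """Return contiguous event subsequences with support >= min_support.

    Support is taken to be the fraction of sessions that contain the
    subsequence at least once (an assumption for this sketch).
    """
    if not sessions:
        return {}
    counts = Counter()
    for session in sessions:
        seen = set()
        for n in range(1, max_len + 1):
            for i in range(len(session) - n + 1):
                seen.add(tuple(session[i:i + n]))
        counts.update(seen)  # a session counts each pattern at most once
    total = len(sessions)
    return {seq: c / total for seq, c in counts.items() if c / total >= min_support}

# Hypothetical, hand-made sessions of already generalized events.
sessions = [
    ["film", "film+trailer", "click:movies"],
    ["film", "click:movies"],
    ["actor", "film", "click:movies"],
    ["location", "click:maps"],
]
patterns = frequent_sequences(sessions, min_support=0.5)
```

On raw query strings almost nothing would pass the threshold; generalized events like the ones in this toy input are what make patterns repeat.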
Approach
Connect queries to entities in the linked open data cloud and use
properties of these entities to generalize and categorize queries.
Applied to the movie domain.
Example session: oakland as bradd pitt movie | moneyball | movies.yahoo.com | oakland as | wikipedia.org
Data processing and linking steps
1. link queries to entities
2. select types of entities (classes)
3. detect modifier words (download, trailer, cast, date, etc.)
4. identify navigational queries
5. identify ‘losing’ queries.
1. Linking queries to entities in the LOD cloud
• We link one entity to each query.
• The intent of about 40% of unique Web queries is to find a particular entity
[Pound, WWW2008].
• We link to Freebase (which has a lot of movie-related information) and DBpedia (because Wikipedia is widely used).
2. Select one type per entity
• We use the Freebase API to get the semantic “types” of each query URI.
• The Freebase ‘Notable types API’ is not official and not documented.
• For repeatability and transparency, we have created our own heuristics to select one type for each entity:
1. no internal or administrative types;
2. prefer established domains (‘Commons’) over user-defined schemas (‘Bases’);
3. aggregate specific types into more general types:
a) subtypes of location -> location
b) subtypes of award winners and nominees -> award_winner_nominee
4. prefer movie-related types over other types: film, actor, artist, tv_program, tv_actor and location (in order of decreasing preference).
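The heuristics above could be sketched roughly as follows. This is not the authors' code: the prefix lists used to recognize administrative types and ‘Bases’, the award type identifiers, and the rule that anything under `/location/` is a subtype of location are assumptions made for illustration. Preference is matched on the last path segment of a type identifier.

```python
# Assumed identifier conventions; adjust to the actual type inventory.
INTERNAL_PREFIXES = ("/common/", "/type/", "/freebase/")  # administrative
BASE_PREFIXES = ("/base/", "/user/")                      # user-defined schemas
PREFERRED = ["film", "actor", "artist", "tv_program", "tv_actor", "location"]

def aggregate(type_id):
    """Map specific types onto the more general type named on the slide."""
    if type_id.startswith("/location/"):
        return "/location"
    if type_id in ("/award/award_winner", "/award/award_nominee"):
        return "/award/award_winner_nominee"
    return type_id

def select_type(type_ids):
    """Pick a single type for an entity, or None if nothing usable remains."""
    # Drop internal or administrative types.
    candidates = [t for t in type_ids if not t.startswith(INTERNAL_PREFIXES)]
    # Prefer established domains over user-defined schemas when both exist.
    commons = [t for t in candidates if not t.startswith(BASE_PREFIXES)]
    if commons:
        candidates = commons
    # Aggregate specific types into general ones.
    candidates = [aggregate(t) for t in candidates]
    if not candidates:
        return None

    # Prefer movie-related types, in order of decreasing preference.
    def rank(type_id):
        name = type_id.rsplit("/", 1)[-1]
        return PREFERRED.index(name) if name in PREFERRED else len(PREFERRED)

    return min(candidates, key=rank)

# Hypothetical type lists for two entities.
actor_type = select_type(["/common/topic", "/people/person", "/film/actor", "/base/example/thing"])
city_type = select_type(["/common/topic", "/location/citytown", "/base/example/place"])
```

Ties are broken by input order, since `min` returns the first candidate with the lowest rank.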
3. Detect modifier words in queries
Top 100 most frequent words that appear in the query log before or after
entity names [Mika ISWC2009, Pantel WWW2012].
movie, movies, theater, cast, quotes, free, theaters, watch, 2011, new, tv,
show, dvd, online, sex, video, cinema, trailer, list, theatre . . .
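A minimal sketch of how such a modifier list could be collected, assuming entity names are already known and matched as exact token sequences. The function `modifier_counts` and the sample queries are hypothetical; the real pipeline links queries to entities first.

```python
from collections import Counter

def modifier_counts(queries, entity_names):
    """Count words that occur before or after a known entity name in a query."""
    # Longest names first, so that the longest entity mention wins.
    names = sorted((n.lower().split() for n in entity_names), key=len, reverse=True)
    counts = Counter()
    for query in queries:
        tokens = query.lower().split()
        for name in names:
            n = len(name)
            start = next(
                (i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == name),
                None,
            )
            if start is not None:
                # Everything outside the entity mention is a candidate modifier.
                counts.update(tokens[:start] + tokens[start + n:])
                break  # one entity per query
    return counts

# Hypothetical query sample.
queries = [
    "moneyball trailer",
    "watch moneyball online",
    "brad pitt movies",
    "captain america trailer",
    "map of africa",
]
counts = modifier_counts(queries, ["Moneyball", "Brad Pitt", "Captain America"])
top = counts.most_common(1)
```

Calling `counts.most_common(100)` on a full query log would give a top-100 list comparable to the one on the slide.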
4. Identifying navigational queries
• A navigational query is a query entered with the intention of navigating to a
particular website.
• A common heuristic treats a query as navigational when it matches the domain name of a clicked result.
• The “official homepage” is the value of dbpedia:homepage, dbpedia:url, and foaf:homepage.
Examples (query, clicked site):
netflix login, www.netflix.com
banana, www.bananas.org
European Parliament, europarl.europa.eu
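The common string-matching heuristic can be written down in a few lines. This is a strict variant chosen for illustration: the query, stripped of spaces and punctuation, must equal one label of the clicked host name. Dropping only the last label as the top-level domain is a simplification, and `is_navigational` is a hypothetical name.

```python
import re

def is_navigational(query, clicked_host):
    """Strict heuristic: the normalized query equals a label of the host name."""
    normalized = re.sub(r"[^a-z0-9]", "", query.lower())
    labels = clicked_host.lower().split(".")
    if labels and labels[0] == "www":
        labels = labels[1:]
    labels = labels[:-1]  # naive: treat the last label as the top-level domain
    return bool(normalized) and normalized in labels

exact = is_navigational("netflix", "www.netflix.com")
with_modifier = is_navigational("netflix login", "www.netflix.com")
```

Under this strict variant none of the three example pairs above counts as navigational, which shows how narrow a purely string-based test can be.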
5. Identifying ‘losing’ queries
• A ‘losing’ query is the query that leads a user to abandon a service in favor of another service.
• Common definition: a user repeats the same query and clicks on another result in the list.
• Our broader, semantic definition:
Evaluation
1. Linking to entities and types
2. Detection of frequent usage patterns
3. Prediction of website abandonment
Applied to the movie domain
• Sample of server logs of Yahoo! Search in the US from June 2011, split into sessions.
• Only sessions that contain at least one visit to any of 16 popular movie sites.
• 1.7 million sessions, containing over 5.8 million queries and over 6.8 million clicks.
Evaluation of links to entities and types
• Compare manually created <query, entity> and <entity, type> pairs to
automatically created links.
• 2 samples: the 50 most frequent queries and 50 random queries.
Examples:
• Ambiguous query: “Green Lantern” - the movie or the fictional character?
• Wrong type: Oil peak is a serious game subject?
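One simple way to score such a comparison is the share of manually judged items for which the automatic link agrees. The slides do not say which measure was used, so this is only an assumed agreement score; `link_precision` and the sample pairs are hypothetical and do not reproduce the evaluation data.

```python
def link_precision(manual, automatic):
    """Fraction of manually judged keys whose automatic link is identical."""
    judged = [key for key in manual if key in automatic]
    if not judged:
        return 0.0
    correct = sum(1 for key in judged if automatic[key] == manual[key])
    return correct / len(judged)

# Hypothetical <query, entity> pairs; identifiers are made up for the example.
manual = {
    "green lantern": "Green_Lantern_(film)",
    "moneyball": "Moneyball_(film)",
    "brad pitt": "Brad_Pitt",
    "oil peak": "Peak_oil",
}
automatic = {
    "green lantern": "Green_Lantern",
    "moneyball": "Moneyball_(film)",
    "brad pitt": "Brad_Pitt",
    "oil peak": "Peak_oil",
}
score = link_precision(manual, automatic)
```

The same function works for <entity, type> pairs by passing dictionaries keyed on entity instead of query.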
[Figure: frequency of occurrence of queries, entities, and types (three panels)]
Frequent usage patterns I
• Freebase:release_date property of entities.
[Figure panels: Recent movies, Older movies]
Frequent usage patterns II
• Sequences of consecutive query types.
Frequent usage patterns III
• A comparison of websites.
• Most frequent query types that lead to a click on a website.
[Figure: three panels, one per website; y-axis: proportion of queries that lead to a click on the website (0.0 to 0.6); x-axis: query types]
Website 1: /film, /film/actor, /tv_program, /people/person, /book/book, …ional_universe/fictional_character, /music/artist, /tv/tv_actor, /location, /film/film_series
Website 2: /film, /location, /book/book, /film/actor, /business/employer, /fictional_universe/work_of_fiction, …ional_universe/fictional_character, /tv_program, /architecture/building_function, /film/film_series
Website 3: /location, /business/employer, /film, /film/actor, /organization/organization, /architecture/building_function, /people/person, /tv_program, /tv/tv_network, /internet/website_category
[Figure: proportion of queries, Website A and Website B]
Predicting website abandonment
• 3 classification tasks:
Given (part of) a session in which a user is lost/gained, predict
1. whether a user will be gained for a given website;
2. given that the session includes a given website, whether this website is in the losing or gaining position;
3. given that the session includes two given websites, which one is in the gaining position.
• Gradient Boosted Decision Trees.
Discussion and future work
• Mining patterns of entire queries runs into problems with data sparsity.
• We interpret the structure and semantics of the queries, using openly available, up-to-date information on the Web, to:
‣ give a “semantic” definition of navigational and ‘losing’ queries
‣ find patterns of user behavior
‣ predict website abandonment
• This is the beginning:
‣ Use more properties of entities, more features.
‣ Detect more complex patterns.
‣ Explore other linked open datasets.
Thank you!
Questions?

WWW2013: Web Usage Mining with Semantic Analysis

  • 1. Web Usage Mining with Semantic Analysis Laura Hollink, VU University Amsterdam Peter Mika, Yahoo! Labs Barcelona Roi Blanco, Yahoo! Labs Barcelona
  • 2. Analysis of web user behavior What are typical use cases? Are these carried out in a particular order? Which use cases are not satisfied? And to which other sites do users go?
  • 3. Analysis of web user behavior What are typical use cases? Are these carried out in a particular order? Which use cases are not satisfied? And to which other sites do users go? Transaction logs: sessions of queries and clicks. [Figure: sample of raw session logs in which movie-related queries (e.g. "moneyball trailer", "brad pitt moneyball") alternate with clicked sites such as movies.yahoo.com, www.imdb.com and en.wikipedia.org]
  • 4. Analysis of web user behavior Transaction logs: sessions of queries and clicks. [Figure: the same sample of raw session logs] Are these use cases typical for all movies? Recent movies? Only for Moneyball?
  • 5. Why are these questions difficult to answer? Sparsity of the event space ‣ 64% of queries are unique within a year ‣ even the most frequent patterns have extremely low support To illustrate: top 12 most frequent sessions observed in our data:
  • 6. Tasks Question 1: what are typical use cases? ‣ Task 1: find sequences of events in the data that are more frequent (have a higher support) than a threshold. Question 2: what use cases are not satisfied? ‣ Task 2: learn to predict website abandonment from queries and clicks.
  • 7. Approach Connect queries to entities in the linked open data cloud and use properties of these entities to generalize and categorize queries. Applied to the movie domain.
  • 8. Data processing and linking steps 1. link queries to entities 2. select types of entities (classes) 3. detect modifier words (download, trailer, cast, date, etc.) 4. identify navigational queries 5. identify ‘losing’ queries.
  • 9. 1. Linking queries to entities in the LOD cloud • We link one entity to each query. • The intent of about 40% of unique Web queries is to find a particular entity [Pound, WWW2008]. • We link to Freebase (has a lot of movie related info) and DBpedia (Wikipedia is widely used)
  • 10. 2. Select one type per entity • We use the Freebase API to get the semantic “types” of each query URI. • The Freebase ‘Notable types API’ is not official and not documented. • For repeatability and transparency, we have created our own heuristics to select one type for each entity: 1. no internal or administrative types, 2. prefer established domains (‘Commons’) over user-defined schemas (‘Bases’), 3. aggregate specific types into more general types: a) subtypes of location -> location, b) subtypes of award winners and nominees -> award_winner_nominee, c) prefer movie-related types over other types: film, actor, artist, tv_program, tv_actor and location (in order of decreasing preference).
  • 11. 3. Detect modifier words in queries Top 100 most frequent words that appear in the query log before or after entity names [Mika ISWC2009, Pantel WWW2012]. movie, movies, theater, cast, quotes, free, theaters, watch, 2011, new, tv, show, dvd, online, sex, video, cinema, trailer, list, theatre . . .
  • 12. 4. Identifying navigational queries • A navigational query is a query entered with the intention of navigating to a particular website. • A common heuristic is to consider a query navigational when it matches the domain name of a clicked result. • The “official homepage” of an entity is the value of dbpedia:homepage, dbpedia:url, and foaf:homepage. Examples (query → clicked site): netflix login → www.netflix.com; banana → www.bananas.org; European Parliament → europarl.europa.eu
  • 13. 5. Identify ‘losing’ queries • A ‘losing’ query is a query that leads a user to abandon a service in favor of another service. • Common definition: a user repeats the same query and clicks on another result in the list. • Our broader, semantic definition:
  • 14. Evaluation 1. Linking to entities and types 2. Detection of frequent usage patterns 3. Prediction of website abandonment Applied to the movie domain • Sample of server logs of Yahoo! Search in the US from June 2011, split into sessions. • Only sessions that contain at least one visit to any of 16 popular movie sites. • 1.7 million sessions, containing over 5.8 million queries and over 6.8 million clicks.
  • 15. Evaluation of links to entities and types • Compare manually created <query, entity> and <entity, type> pairs to automatically created links. • 2 samples: the 50 most frequent queries and 50 random queries. Examples: • Ambiguous query: “Green Lantern” - the movie or the fictional character? • Wrong type: Oil peak is a serious game subject?
  • 16. Evaluation of links to entities and types [Figure: three panels (Queries, Entities, Types); y-axis: frequency of occurrence]
  • 17. Frequent usage patterns I • Freebase:release_date property of entities. [Figure: two panels, Recent movies and Older movies]
  • 18. Frequent usage patterns II • Sequences of consecutive query types.
  • 19. Frequent usage patterns III • A comparison of websites. • Most frequent query types that lead to a click on a website. [Figure: one bar chart per website; axis: proportion of queries that lead to a click on the website, 0.0 to 0.6; further panels labeled Website A and Website B] Website 1: /film, /film/actor, /tv_program, /people/person, /book/book, ional_universe/fictional_character, /music/artist, /tv/tv_actor, /location, /film/film_series. Website 2: /film, /location, /book/book, /film/actor, /business/employer, /fictional_universe/work_of_fiction, ional_universe/fictional_character, /tv_program, /architecture/building_function, /film/film_series. Website 3: /location, /business/employer, /film, /film/actor, /organization/organization, /architecture/building_function, /people/person, /tv_program, /tv/tv_network, /internet/website_category.
  • 20. Predicting website abandonment • 3 classification tasks: given (part of) a session in which a user is lost/gained, predict 1. whether a user will be gained for a given website; 2. given that the session includes a given website, whether this website is in the losing or gaining position; 3. given that the session includes two given websites, which one is in the gaining position. • Gradient Boosted Decision Trees.
  • 21. Discussion and future work • Mining patterns over entire queries runs into data sparsity. • We interpret the structure and semantics of the queries, using openly available, up-to-date information on the Web. • Give a “semantic” definition of navigational and ‘losing’ queries. • Find patterns of user behavior. • Predict website abandonment. • This is the beginning: • Use more properties of entities, more features. • Detect more complex patterns. • Explore other linked open datasets.
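
Task 1 on slide 6 asks for sequences of events whose support exceeds a threshold, and slide 18 looks at sequences of consecutive query types. The following is a minimal sketch of that idea, not the authors' implementation. It assumes each session has already been reduced to a list of type labels, and it defines support as the fraction of sessions that contain a sequence at least once. The function name `frequent_sequences` and the sample sessions are hypothetical.

```python
from collections import Counter


def frequent_sequences(sessions, length, min_support):
    """Return contiguous type sequences of the given length whose support
    (fraction of sessions containing them at least once) is >= min_support."""
    counts = Counter()
    total = 0
    for session in sessions:
        total += 1
        # Use a set so a sequence is counted once per session.
        seen = {
            tuple(session[i:i + length])
            for i in range(len(session) - length + 1)
        }
        counts.update(seen)
    if total == 0:
        return {}
    return {
        seq: n / total
        for seq, n in counts.items()
        if n / total >= min_support
    }


# Hypothetical sessions, already mapped from queries to entity types.
example_sessions = [
    ["/film", "/film/actor", "/film"],
    ["/film", "/film/actor"],
    ["/location", "/film"],
    ["/film/actor"],
]
patterns = frequent_sequences(example_sessions, length=2, min_support=0.5)
```

Counting over types rather than raw query strings is what addresses the sparsity described on slide 5: many distinct queries collapse onto the same short type sequence.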
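
The common string-matching heuristic for navigational queries on slide 12 can be sketched as follows. This is an illustrative simplification under stated assumptions: a query counts as a match when one of its words, or the whole query with spaces removed, equals a label of the clicked host name, ignoring `www` and the last label. The function names are hypothetical, and the sketch does not implement the semantic definition based on official homepages.

```python
import re
from urllib.parse import urlparse


def host_labels(url):
    """Return the host name labels of a URL, without 'www' and the last label."""
    # urlparse only fills in the host when the URL has a '//' prefix.
    parsed = urlparse(url if "//" in url else "//" + url)
    labels = (parsed.hostname or "").split(".")
    labels = [label for label in labels if label and label != "www"]
    return labels[:-1]  # drop the top-level label such as 'com' or 'org'


def is_navigational(query, clicked_url):
    """String heuristic: the query matches a label of the clicked host."""
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    candidates = set(tokens)
    candidates.add("".join(tokens))
    return any(label in candidates for label in host_labels(clicked_url))


matches_netflix = is_navigational("netflix login", "www.netflix.com")
matches_europarl = is_navigational("European Parliament", "europarl.europa.eu")
```

Under this sketch the first example pair from slide 12 matches and the third does not, which is the kind of case where looking up an entity's homepage property could help.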
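
The common definition of a ‘loosing’ query on slide 13 (a user repeats the same query and clicks on another result) can be sketched as below. The session format, a list of `("query", text)` and `("click", site)` pairs, and the function names are assumptions made for illustration; the broader semantic definition from the slides is not covered.

```python
def split_segments(session):
    """Group a session into (normalized query, [clicked sites]) segments."""
    segments = []
    for kind, value in session:
        if kind == "query":
            segments.append((value.strip().lower(), []))
        elif kind == "click" and segments:
            segments[-1][1].append(value)
    return segments


def find_lost_clicks(session):
    """Return (query, earlier_site, new_site) triples where a repeated query
    is followed by a click on a site not clicked after its previous issue."""
    found = []
    previous_clicks = {}  # query -> clicks after its most recent earlier issue
    for query, clicks in split_segments(session):
        earlier = previous_clicks.get(query)
        if earlier:
            for site in clicks:
                if site not in earlier:
                    found.append((query, earlier[-1], site))
        if clicks:
            previous_clicks[query] = clicks
    return found


# Hypothetical session: the same query is issued twice with different clicks.
example_session = [
    ("query", "moneyball trailer"),
    ("click", "movies.yahoo.com"),
    ("query", "moneyball trailer"),
    ("click", "www.imdb.com"),
]
lost = find_lost_clicks(example_session)
```

In each returned triple, the earlier site is in the losing position and the newly clicked site is in the gaining position, matching the vocabulary of the classification tasks on slide 20.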