User Interests Identification From Twitter using Hierarchical Knowledge Base
Pavan Kapanipathi*, Prateek Jain^, Chitra Venkataramani^, Amit Sheth*
*Kno.e.sis Center, Wright State University
^IBM TJ Watson Research Center
#eswc2014Kapanipathi
• Motivation
• Background
• Approach
• Evaluation
• Conclusion & Future Work
Motivation
• Tapping into social networks to identify interests is not new (2006+), and it works
◦ Google, Bing, Samsung TV, etc.
• Twitter content
◦ 500M+ users generating 500M+ tweets per day
◦ Public and useful for research
• Interests with little or no semantics
◦ Bag of words [1]
◦ Bag of concepts
• Some semantics
◦ Bag of linked entities, with the intention of using knowledge bases [2, 3]
[1] Alan Mislove, Bimal Viswanath, Krishna P. Gummadi, and Peter Druschel. You Are Who You Know: Inferring User Profiles in Online Social Networks. WSDM ’10.
[2] Fabian Abel, Qi Gao, Geert-Jan Houben, and Ke Tao. Analyzing User Modeling on Twitter for Personalized News Recommendations. UMAP ’11.
[3] Fabrizio Orlandi, John Breslin, and Alexandre Passant. Aggregated, Interoperable and Multi-domain User Profiles for the Social Web. I-SEMANTICS ’12.
• How can semantics and knowledge bases be used to infer interests?
◦ Extensive use of knowledge bases to infer user interests from tweets has yet to be explored.
• As a first step, we use hierarchical relationships.
[Figure: example category hierarchy with nodes Internet, Semantic Search, Linked Data, Metadata, Technology, World Wide Web, Semantic Web, Entities and Structured Information]
• Addressing the data sparsity problem
◦ Infer more of a user's interests from less data
• Flexibility for recommendations
◦ Recommend about Sports or about Football
– The knowledge base knows that Football is a subcategory of Sports
◦ Resource Description Framework (RDF) and Semantic Web
– Less content about RDF is available online to recommend, so the broader Semantic Web interest can be used
Approach
[Figure: approach overview, from Tweets to an Interest Hierarchy]
• Selecting an ontology
◦ Available: Wikipedia, Dmoz, OpenCyc, Freebase
◦ Our framework can adapt to any ontology
• Wikipedia
◦ Diverse domains and broad coverage
◦ Resembles a taxonomy
◦ Structured extraction of Wikipedia available: DBpedia
◦ Existing entity recognition techniques (explained later)
• 4.2 million articles
• 0.8 million Wikipedia categories
• 2.0 million category-subcategory relationships
• Challenges
◦ Crowd-sourced, and therefore noisy
◦ Not a hierarchy/taxonomy
– It is a graph
– It has cycles
• Clean-up: removed Wikipedia administrative categories
• The Hierarchical Interest Graph needs a base hierarchy
◦ Shortest path from the root node
– Root node: Category:Main Topic Classifications
– Assumption: the number of hops to the root node determines the level of abstraction of a category
[Figure: example base hierarchy rooted at Main topic classifications (Level 1), with nodes including Science, Sports, Health, Agriculture Science, Science Education, Scientists, Health Care and Health Economics arranged over Levels 1 to 3]
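The base-hierarchy step can be sketched as a breadth-first search from the root: BFS reaches each category first along a shortest path, which yields exactly the hop count used as the level of abstraction. This is a minimal sketch, not the authors' code; it assumes the category graph is held in memory as a dict from each category to its subcategories, the edges are a hypothetical toy fragment, and the root is given level 1 to match the level listing in these slides.

```python
from collections import deque

def assign_levels(subcats, root):
    """Return {category: level}: the root is level 1 and every other
    category is 1 + its shortest hop count from the root.

    `subcats` maps each category to the list of its subcategories.
    BFS reaches every node first along a shortest path and visits it
    only once, so cycles in the category graph cannot loop forever.
    """
    levels = {root: 1}
    queue = deque([root])
    while queue:
        parent = queue.popleft()
        for child in subcats.get(parent, []):
            if child not in levels:  # first visit in BFS = shortest path
                levels[child] = levels[parent] + 1
                queue.append(child)
    return levels

# Hypothetical toy fragment of the category graph (not real Wikipedia data).
subcats = {
    "Main topic classifications": ["Science", "Sports", "Health"],
    "Science": ["Scientists", "Science education"],
    "Health": ["Health care", "Health economics"],
    "Health economics": ["Health"],  # a cycle, as in the real graph
}
levels = assign_levels(subcats, "Main topic classifications")
print(levels)
```

Categories unreachable from the root receive no level, and the visited check is what keeps the cycles of the Wikipedia category graph harmless.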
• Removing links that do not conform to a hierarchy
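The slides do not spell out which links fail to conform. One plausible rule, assumed here, keeps a parent-to-subcategory link only when the subcategory sits exactly one level below its parent in the base hierarchy; every kept link then strictly increases the level, so no cycle can survive. The function name, the toy levels and the edges are hypothetical.

```python
def hierarchical_edges(subcats, levels):
    """Keep only parent -> child edges that descend exactly one level.

    With BFS levels, a child is never more than one level below a parent
    that links to it, so the dropped edges are those pointing to a category
    at the same or a more abstract level (children without a level are
    dropped as well). Every kept edge increases the level by one, so the
    pruned graph has no cycles.
    """
    pruned = {}
    for parent, children in subcats.items():
        if parent not in levels:
            continue  # not reachable from the root
        pruned[parent] = [child for child in children
                          if levels.get(child) == levels[parent] + 1]
    return pruned

# Hypothetical levels and edges.
levels = {"Main topic classifications": 1, "Science": 2, "Health": 2,
          "Health care": 3, "Health economics": 3}
subcats = {
    "Main topic classifications": ["Science", "Health"],
    "Health": ["Health care", "Health economics", "Science"],  # Health -> Science stays at level 2
    "Health economics": ["Health"],  # points back up a level, closing a cycle
}
pruned = hierarchical_edges(subcats, levels)
print(pruned)
```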
• Extracting Wikipedia concepts from tweets
• Interest scoring
Example Wikipedia concepts: http://en.wikipedia.org/wiki/Semantic_search, http://en.wikipedia.org/wiki/Ontology
• Entity extraction
◦ Issues such as stop-word removal, URLs and disambiguation are handled by the web services
Entity extraction web services on tweets*
Service      Precision  Recall  F-measure  Usability    Rate limit / License
Text Razor   64.6       26.9    38.0       Web Service  500/day
Zemanta      57.7       31.8    41.0       Web Service  10000/day
*L. Derczynski, D. Maynard, N. Aswani, and K. Bontcheva. Microblog-genre noise and impact on semantic annotation accuracy. In Proceedings of the 24th ACM Conference on Hypertext and Social Media, HT ’13.
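The F-measure column is the harmonic mean (F1) of the precision and recall columns, so it can be recomputed from them. The check below also shows why Zemanta's higher recall outweighs its lower precision on this metric.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1), in the units of the inputs."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

for name, p, r in [("Text Razor", 64.6, 26.9), ("Zemanta", 57.7, 31.8)]:
    print(f"{name}: F = {f_measure(p, r):.1f}")
# Text Razor: F = 38.0
# Zemanta: F = 41.0
```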
• Scoring Wikipedia concepts
[Figure: scores for interests (0.8, 0.2, 0.6) assigned to a user's interest entities, shown over the example category graph with nodes Internet, Semantic Search, Linked Data, Metadata, Technology, World Wide Web, Semantic Web and Structured Information]
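The slides do not give the exact scoring formula. A minimal sketch, assuming an entity's interest score is the number of the user's tweets that mention it, normalized by the most frequent entity; the function name and the extraction output are hypothetical.

```python
from collections import Counter

def interest_scores(tweet_entities):
    """Score each Wikipedia entity by the number of tweets that mention it,
    scaled so the most frequent entity scores 1.0 (assumed normalization).

    `tweet_entities` holds one list of extracted entities per tweet; an
    entity repeated within a single tweet is counted once.
    """
    counts = Counter(e for entities in tweet_entities for e in set(entities))
    if not counts:
        return {}
    top = max(counts.values())
    return {entity: n / top for entity, n in counts.items()}

# Hypothetical extraction output for four tweets of one user.
tweets = [
    ["Semantic_search", "Ontology"],
    ["Semantic_search"],
    ["Semantic_search", "Linked_data", "Semantic_search"],
    ["Linked_data"],
]
scores = interest_scores(tweets)
print(scores)
```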
• Result (challenges)
◦ Infers many categories without context
◦ All categories get equal weight regardless of the interest score
◦ Cannot rank a user's categories of interest
◦ We use spreading activation
[Figure: entities M S Dhoni, Virat Kohli and Sachin Tendulkar linked to categories including Indian Cricketers, Indian Cricket, Cricket and Sports, as well as Honorary Members of the Order of Australia, Order of Australia, Awards and Culture]
• Spreading activation: a graph algorithm to find contextual nodes, used in
◦ Cognitive sciences
◦ Neural networks
◦ Information retrieval
• Associative and semantic networks
◦ Semantic Web
• Context generation
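[Figure: spreading activation example over M S Dhoni, Virat Kohli, Sachin Tendulkar, Indian Cricketers, Indian Cricket, Cricket and Sports, with entity scores (0.8, 0.2, 0.6) and propagated activation values (0.5, 0.4, 0.25, 0.1)]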
• Activation function: determines the extent of spreading
• No decay, no edge weights
◦ Result: the most generic categories are ranked highest
• Activation decays over the hops
◦ Decay values: 0.4, 0.6, 0.8
◦ Result: same as above
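Spreading activation with a per-hop decay can be sketched as pushing each entity's interest score up through the category graph, multiplying by the decay factor at every hop and summing whatever reaches each category. This is an illustration under assumptions, not the authors' implementation: the `spread` function, the multiplicative decay used as activation function, and the toy graph and scores are all hypothetical.

```python
from collections import defaultdict

def spread(parents, seeds, decay=1.0, max_hops=10):
    """Spread activation from seed entities upward to ancestor categories.

    parents: node (entity or category) -> list of parent categories
    seeds:   initial activation per entity, e.g. its interest score
    decay:   fraction of activation passed on at each hop (1.0 = no decay)
    Activation reaching a category along several paths is summed.
    """
    activation = defaultdict(float)
    frontier = dict(seeds)
    for _ in range(max_hops):
        next_frontier = defaultdict(float)
        for node, value in frontier.items():
            for parent in parents.get(node, []):
                passed = value * decay
                activation[parent] += passed
                next_frontier[parent] += passed
        if not next_frontier:
            break  # reached the top of the hierarchy
        frontier = next_frontier
    return dict(activation)

# Hypothetical, simplified graph: entities -> categories -> broader categories.
parents = {
    "MS_Dhoni": ["Indian_cricketers"],
    "Virat_Kohli": ["Indian_cricketers"],
    "Sachin_Tendulkar": ["Indian_cricketers"],
    "Michael_Clarke": ["Australian_cricketers"],
    "Roger_Federer": ["Tennis"],
    "Indian_cricketers": ["Cricket"],
    "Australian_cricketers": ["Cricket"],
    "Cricket": ["Sports"],
    "Tennis": ["Sports"],
}
seeds = {"MS_Dhoni": 0.8, "Virat_Kohli": 0.2, "Sachin_Tendulkar": 0.6,
         "Michael_Clarke": 0.4, "Roger_Federer": 0.3}

no_decay = spread(parents, seeds)
with_decay = spread(parents, seeds, decay=0.6)
ranking_no_decay = sorted(no_decay, key=no_decay.get, reverse=True)
print(ranking_no_decay)
```

With no decay, Sports collects activation from every branch and ranks first, which is the behaviour reported above. In this five-entity toy graph a decay of 0.6 is enough to demote Sports below the more specific categories; per the slides, on the full Wikipedia graph the decay values tried still left the most generic categories on top, presumably because those nodes collect activation from far more entities.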
[Figure: example base hierarchy rooted at Main topic classifications (Level 1)]
Category – hierarchical level:
Main Topic Classification – 1
Technology – 2
Science – 2
Sports – 2
Business – 2
…
…
Technology Companies – 3
Scientists – 3
• Uneven distribution of nodes in the hierarchy
• Many-to-many category-subcategory relationships
[Figure: number of nodes per hierarchical level, levels 1 to 16, y-axis from 0 to 300,000 nodes]
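The slides do not give the correction used for the uneven node distribution. One plausible form, assumed here and not taken from the paper, weights each category's activation by how populated its level is relative to the most populated level, which damps sparse levels such as the very generic top ones. Level assignments and activation values are hypothetical.

```python
from collections import Counter

def level_weights(levels):
    """Weight each level by its node count relative to the most populated level."""
    sizes = Counter(levels.values())
    largest = max(sizes.values())
    return {level: n / largest for level, n in sizes.items()}

def normalize_by_level(activation, levels):
    """Scale each category's activation by its level's weight (assumed rule),
    damping sparsely populated levels such as the very generic top ones."""
    weights = level_weights(levels)
    return {cat: value * weights[levels[cat]] for cat, value in activation.items()}

# Hypothetical levels: few generic nodes at the top, more specific ones below.
levels = {
    "Main topic classifications": 1,
    "Sports": 2, "Science": 2,
    "Cricket": 3, "Tennis": 3, "Physics": 3, "Chemistry": 3,
    "Indian_cricketers": 4, "Australian_cricketers": 4, "Physicists": 4, "Chemists": 4,
}
activation = {"Sports": 2.3, "Cricket": 2.0, "Indian_cricketers": 1.6}
normalized = normalize_by_level(activation, levels)
ranking = sorted(normalized, key=normalized.get, reverse=True)
print(ranking)
```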
• Nodes at the intersection of domains/subcategories, activated by diverse entities
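[Figure: intersection example in which M S Dhoni, Virat Kohli and Sachin Tendulkar lead to Indian Cricketers and Indian Cricket, Michael Clarke and Shane Watson lead to Australian Cricketers and Australian Cricket, and both branches meet at Cricket and Sports; node annotations 3, 3, 5, 5, 2, 2]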
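The annotations in the intersection figure (3 for the Indian categories, 2 for the Australian ones, 5 for Cricket and Sports) appear to match the number of distinct user entities that reach each node. The sketch below computes such counts; how Priority Intersect combines them with activation is not spelled out on the slides and is left out here. Edges and function names are hypothetical.

```python
def ancestors(parents, node):
    """All categories reachable upward from `node` (iterative DFS, cycle-safe)."""
    seen, stack = set(), list(parents.get(node, []))
    while stack:
        category = stack.pop()
        if category not in seen:
            seen.add(category)
            stack.extend(parents.get(category, []))
    return seen

def intersect_counts(parents, entities):
    """For each category, count how many distinct user entities reach it."""
    counts = {}
    for entity in set(entities):
        for category in ancestors(parents, entity):
            counts[category] = counts.get(category, 0) + 1
    return counts

# Hypothetical edges mirroring the entities and categories in the figure.
parents = {
    "MS_Dhoni": ["Indian_cricketers"],
    "Virat_Kohli": ["Indian_cricketers"],
    "Sachin_Tendulkar": ["Indian_cricketers"],
    "Michael_Clarke": ["Australian_cricketers"],
    "Shane_Watson": ["Australian_cricketers"],
    "Indian_cricketers": ["Indian_cricket"],
    "Australian_cricketers": ["Australian_cricket"],
    "Indian_cricket": ["Cricket"],
    "Australian_cricket": ["Cricket"],
    "Cricket": ["Sports"],
}
counts = intersect_counts(parents, ["MS_Dhoni", "Virat_Kohli", "Sachin_Tendulkar",
                                    "Michael_Clarke", "Shane_Watson"])
print(counts)
```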
Evaluation
• User study data
◦ 37 users
◦ 31,927 tweets
• Hierarchical Interest Graph
◦ 111,535 category interests
◦ ~3,000 categories per user
◦ Ranking evaluation: top-50 categories
• How many relevant/irrelevant hierarchical interests are retrieved at the top-k ranks?
◦ Graded Precision
• How well are the retrieved relevant hierarchical interests ranked within the top-k?
◦ Mean Average Precision
• How early in the ranked hierarchical interests can we find a relevant result?
◦ Mean Reciprocal Rank
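The three metrics can be sketched as follows. Assumptions: user judgments are yes/maybe/no (the user study used a "May-be" label), graded precision maps them to the hypothetical grades 1, 0.5 and 0, only "yes" counts as relevant for MAP and MRR, and average precision divides by the number of relevant results in the top k (a common variant; others divide by all relevant items).

```python
GRADES = {"yes": 1.0, "maybe": 0.5, "no": 0.0}  # assumed grade values

def graded_precision(judgments, k):
    """Mean relevance grade of the top-k results."""
    return sum(GRADES[j] for j in judgments[:k]) / k

def average_precision(relevant, k):
    """Average of precision@i over the ranks i <= k that hold a relevant result."""
    hits, total = 0, 0.0
    for i, rel in enumerate(relevant[:k], start=1):
        if rel:
            hits += 1
            total += hits / i
    return total / hits if hits else 0.0

def reciprocal_rank(relevant):
    """1 / rank of the first relevant result (0 if there is none)."""
    for i, rel in enumerate(relevant, start=1):
        if rel:
            return 1.0 / i
    return 0.0

# Hypothetical judgments for two users' top-4 ranked categories.
users = [["yes", "no", "yes", "maybe"], ["no", "yes", "maybe", "yes"]]
binary = [[j == "yes" for j in u] for u in users]  # only "yes" counts as relevant

gp_at_4 = [graded_precision(u, 4) for u in users]
map_at_4 = sum(average_precision(b, 4) for b in binary) / len(binary)
mrr = sum(reciprocal_rank(b) for b in binary) / len(binary)
print(gp_at_4, map_at_4, mrr)
```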
Priority Intersect works best, with
• 76% Mean Average Precision
• 98% Mean Reciprocal Rank
• How many of the categories inferred by the system were not explicitly mentioned by the user in tweets? (e.g., Semantic Web vs. Category:Semantic Web)
Priority Intersect at top-10:
• 52% of categories were not mentioned in the user's tweets
◦ 65% of these were marked relevant
◦ 10% were marked maybe
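The parenthetical example suggests that a category counts as mentioned when an entity with the same title appears in the user's tweets. A minimal sketch under that assumption (exact, case-insensitive title match; names and data are hypothetical):

```python
def category_title(category):
    """Strip the 'Category:' prefix so a category can be compared with entity names."""
    prefix = "Category:"
    return category[len(prefix):] if category.startswith(prefix) else category

def unmentioned(top_categories, mentioned_entities):
    """Return the top categories whose titles match no entity mentioned in the tweets."""
    mentioned = {e.lower() for e in mentioned_entities}
    return [c for c in top_categories if category_title(c).lower() not in mentioned]

top = ["Category:Semantic Web", "Category:Linked data",
       "Category:World Wide Web", "Category:Technology"]
missing = unmentioned(top, ["Semantic Web", "Linked data", "Ontology"])
share = len(missing) / len(top)
print(missing, share)
```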
• Mapped Wikipedia categories to Dmoz by string match
◦ ~141K categories mapped
• Compared all category-subcategory relationships among the mapped categories in our hierarchy with the manually created Dmoz hierarchy
◦ 87% precision (87% of the relationships in our hierarchy were also found in Dmoz)
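The Dmoz comparison can be sketched as follows, assuming exact matching after light normalization and counting a relationship as correct only if the same direct parent-child edge exists in Dmoz (matching Dmoz ancestors instead would be another reading). Function names and edges are hypothetical.

```python
def normalize(name):
    """Crude string normalization for the (assumed) exact-match mapping."""
    return name.replace("_", " ").strip().lower()

def edge_precision(our_edges, dmoz_edges):
    """Share of our category -> subcategory edges, among those whose two ends
    both map to Dmoz by string match, that also appear as a Dmoz edge."""
    dmoz = {(normalize(p), normalize(c)) for p, c in dmoz_edges}
    dmoz_nodes = {node for edge in dmoz for node in edge}
    mapped = [(normalize(p), normalize(c)) for p, c in our_edges
              if normalize(p) in dmoz_nodes and normalize(c) in dmoz_nodes]
    if not mapped:
        return 0.0
    return sum(edge in dmoz for edge in mapped) / len(mapped)

# Hypothetical edges (names and structure are illustrative only).
ours = [("Sports", "Cricket"), ("Sports", "Tennis"), ("Science", "Physics"),
        ("Cricket", "Science"), ("Sports", "Indian_cricketers")]
dmoz = [("Sports", "Cricket"), ("Sports", "Tennis"), ("Science", "Physics")]
print(edge_precision(ours, dmoz))
```

Here the unmapped edge to Indian_cricketers is excluded, and three of the four mapped edges are confirmed, giving a precision of 0.75.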
Conclusion & Future Work
• Hierarchical Interest Graph (a hierarchical representation of user interests)
◦ Each interest carries its hierarchical level, giving flexibility to personalize and recommend based on how abstract it is
• We semantically enhanced Twitter user interest profiles using knowledge bases
◦ Inferred abstract/hierarchical interests of Twitter users using Wikipedia
◦ This can help reduce the data sparsity problem by inferring relevant interests
• The top-1 hierarchical interest generated by the system was correct for 36 of the 37 user-study participants
◦ Mean Average Precision at top-10 is 0.76
• Measuring the impact of Hierarchical Interest Graphs on movie/music recommendation
◦ Datasets
– MovieLens
– Last.fm
• Tuning the system to use the hierarchical levels of interests for personalization and recommendation
◦ Sports (most abstract interest)
◦ Baseball (specific interest)
Contact: Pavan Kapanipathi
Twitter: @pavankaps
Email: pavan@knoesis.org
More info: Knoesis Wiki – Hierarchical Interest Graph

Weitere ähnliche Inhalte

Ähnlich wie User Interests Identification From Twitter using Hierarchical Knowledge Base

Recommender Systems @ Scale - PyData 2019
Recommender Systems @ Scale - PyData 2019Recommender Systems @ Scale - PyData 2019
Recommender Systems @ Scale - PyData 2019Sonya Liberman
 
The Personal Networks of Novice Librarian Researchers
The Personal Networks of Novice Librarian ResearchersThe Personal Networks of Novice Librarian Researchers
The Personal Networks of Novice Librarian ResearchersIRDL
 
Deploying Viva Topics
Deploying Viva TopicsDeploying Viva Topics
Deploying Viva TopicsDrew Madelung
 
Information Architecture Workshop
Information Architecture WorkshopInformation Architecture Workshop
Information Architecture WorkshopPeter Morville
 
NCME Big Data in Education
NCME Big Data  in EducationNCME Big Data  in Education
NCME Big Data in EducationPhilip Piety
 
IRJET- Sentimental Analysis from Tweets to Find Positive, Negative,Neutra...
IRJET-  	  Sentimental Analysis from Tweets to Find Positive, Negative,Neutra...IRJET-  	  Sentimental Analysis from Tweets to Find Positive, Negative,Neutra...
IRJET- Sentimental Analysis from Tweets to Find Positive, Negative,Neutra...IRJET Journal
 
Twitter Sentiment Analysis
Twitter Sentiment AnalysisTwitter Sentiment Analysis
Twitter Sentiment Analysisijtsrd
 
Management and analysis of social media data
Management and analysis of social media dataManagement and analysis of social media data
Management and analysis of social media dataWeining Qian
 
How you and your gateway can benefit from the services of the Science Gateway...
How you and your gateway can benefit from the services of the Science Gateway...How you and your gateway can benefit from the services of the Science Gateway...
How you and your gateway can benefit from the services of the Science Gateway...Katherine Lawrence
 
4C13 J.15 Larson "Twitter based discourse community"
4C13 J.15 Larson "Twitter based discourse community"4C13 J.15 Larson "Twitter based discourse community"
4C13 J.15 Larson "Twitter based discourse community"rhetoricked
 
Supporting Research Communities with XSEDE
Supporting Research Communities with XSEDESupporting Research Communities with XSEDE
Supporting Research Communities with XSEDEJohn Towns
 
Large scale social recommender systems and their evaluation
Large scale social recommender systems and their evaluationLarge scale social recommender systems and their evaluation
Large scale social recommender systems and their evaluationMitul Tiwari
 
The open academic: Why and how business academics should use social media to ...
The open academic: Why and how business academics should use social media to ...The open academic: Why and how business academics should use social media to ...
The open academic: Why and how business academics should use social media to ...Ian McCarthy
 
Practical applications for altmetrics in a changing metrics landscape
Practical applications for altmetrics in a changing metrics landscapePractical applications for altmetrics in a changing metrics landscape
Practical applications for altmetrics in a changing metrics landscapeDigital Science
 

Ähnlich wie User Interests Identification From Twitter using Hierarchical Knowledge Base (20)

Saner17 sharma
Saner17 sharmaSaner17 sharma
Saner17 sharma
 
Recommender Systems @ Scale - PyData 2019
Recommender Systems @ Scale - PyData 2019Recommender Systems @ Scale - PyData 2019
Recommender Systems @ Scale - PyData 2019
 
The Personal Networks of Novice Librarian Researchers
The Personal Networks of Novice Librarian ResearchersThe Personal Networks of Novice Librarian Researchers
The Personal Networks of Novice Librarian Researchers
 
Intro to UOSM2012
Intro to UOSM2012Intro to UOSM2012
Intro to UOSM2012
 
Deploying Viva Topics
Deploying Viva TopicsDeploying Viva Topics
Deploying Viva Topics
 
Information Architecture Workshop
Information Architecture WorkshopInformation Architecture Workshop
Information Architecture Workshop
 
NCME Big Data in Education
NCME Big Data  in EducationNCME Big Data  in Education
NCME Big Data in Education
 
Data-X-Sparse-v2
Data-X-Sparse-v2Data-X-Sparse-v2
Data-X-Sparse-v2
 
Data-X-v3.1
Data-X-v3.1Data-X-v3.1
Data-X-v3.1
 
Lagace - Copyright Clearance Center April 2, 2015
Lagace - Copyright Clearance Center April 2, 2015Lagace - Copyright Clearance Center April 2, 2015
Lagace - Copyright Clearance Center April 2, 2015
 
IRJET- Sentimental Analysis from Tweets to Find Positive, Negative,Neutra...
IRJET-  	  Sentimental Analysis from Tweets to Find Positive, Negative,Neutra...IRJET-  	  Sentimental Analysis from Tweets to Find Positive, Negative,Neutra...
IRJET- Sentimental Analysis from Tweets to Find Positive, Negative,Neutra...
 
Twitter Sentiment Analysis
Twitter Sentiment AnalysisTwitter Sentiment Analysis
Twitter Sentiment Analysis
 
Management and analysis of social media data
Management and analysis of social media dataManagement and analysis of social media data
Management and analysis of social media data
 
How you and your gateway can benefit from the services of the Science Gateway...
How you and your gateway can benefit from the services of the Science Gateway...How you and your gateway can benefit from the services of the Science Gateway...
How you and your gateway can benefit from the services of the Science Gateway...
 
4C13 J.15 Larson "Twitter based discourse community"
4C13 J.15 Larson "Twitter based discourse community"4C13 J.15 Larson "Twitter based discourse community"
4C13 J.15 Larson "Twitter based discourse community"
 
Supporting Research Communities with XSEDE
Supporting Research Communities with XSEDESupporting Research Communities with XSEDE
Supporting Research Communities with XSEDE
 
Large scale social recommender systems and their evaluation
Large scale social recommender systems and their evaluationLarge scale social recommender systems and their evaluation
Large scale social recommender systems and their evaluation
 
The open academic: Why and how business academics should use social media to ...
The open academic: Why and how business academics should use social media to ...The open academic: Why and how business academics should use social media to ...
The open academic: Why and how business academics should use social media to ...
 
Strengthening Network Practice Through Evaluation
Strengthening Network Practice Through EvaluationStrengthening Network Practice Through Evaluation
Strengthening Network Practice Through Evaluation
 
Practical applications for altmetrics in a changing metrics landscape
Practical applications for altmetrics in a changing metrics landscapePractical applications for altmetrics in a changing metrics landscape
Practical applications for altmetrics in a changing metrics landscape
 

Mehr von Pavan Kapanipathi

Improving Natural Language Inference Using External Knowledge in the Science ...
Improving Natural Language Inference Using External Knowledge in the Science ...Improving Natural Language Inference Using External Knowledge in the Science ...
Improving Natural Language Inference Using External Knowledge in the Science ...Pavan Kapanipathi
 
Knoesis-Semantic filtering-Tutorials
Knoesis-Semantic filtering-TutorialsKnoesis-Semantic filtering-Tutorials
Knoesis-Semantic filtering-TutorialsPavan Kapanipathi
 
Knowledge base enabled Information Filtering on Social Web -- EMC
Knowledge base enabled Information Filtering on Social Web -- EMCKnowledge base enabled Information Filtering on Social Web -- EMC
Knowledge base enabled Information Filtering on Social Web -- EMCPavan Kapanipathi
 
Adressing Volume and Velocity Challenge on the Social Web using Crowd Sourced...
Adressing Volume and Velocity Challenge on the Social Web using Crowd Sourced...Adressing Volume and Velocity Challenge on the Social Web using Crowd Sourced...
Adressing Volume and Velocity Challenge on the Social Web using Crowd Sourced...Pavan Kapanipathi
 
Hierarchical Interest Graphs from Twitter
Hierarchical Interest Graphs from TwitterHierarchical Interest Graphs from Twitter
Hierarchical Interest Graphs from TwitterPavan Kapanipathi
 
Privacy Aware Semantic Dissemination
Privacy Aware Semantic DisseminationPrivacy Aware Semantic Dissemination
Privacy Aware Semantic DisseminationPavan Kapanipathi
 
Personalized Filtering of Twitter Stream
Personalized Filtering of Twitter StreamPersonalized Filtering of Twitter Stream
Personalized Filtering of Twitter StreamPavan Kapanipathi
 

Mehr von Pavan Kapanipathi (9)

Improving Natural Language Inference Using External Knowledge in the Science ...
Improving Natural Language Inference Using External Knowledge in the Science ...Improving Natural Language Inference Using External Knowledge in the Science ...
Improving Natural Language Inference Using External Knowledge in the Science ...
 
Knoesis-Semantic filtering-Tutorials
Knoesis-Semantic filtering-TutorialsKnoesis-Semantic filtering-Tutorials
Knoesis-Semantic filtering-Tutorials
 
Knowledge base enabled Information Filtering on Social Web -- EMC
Knowledge base enabled Information Filtering on Social Web -- EMCKnowledge base enabled Information Filtering on Social Web -- EMC
Knowledge base enabled Information Filtering on Social Web -- EMC
 
Adressing Volume and Velocity Challenge on the Social Web using Crowd Sourced...
Adressing Volume and Velocity Challenge on the Social Web using Crowd Sourced...Adressing Volume and Velocity Challenge on the Social Web using Crowd Sourced...
Adressing Volume and Velocity Challenge on the Social Web using Crowd Sourced...
 
Hierarchical Interest Graphs from Twitter
Hierarchical Interest Graphs from TwitterHierarchical Interest Graphs from Twitter
Hierarchical Interest Graphs from Twitter
 
Random walk on Graphs
Random walk on GraphsRandom walk on Graphs
Random walk on Graphs
 
SemPuSH: ISWC 2011 Poster
SemPuSH: ISWC 2011 PosterSemPuSH: ISWC 2011 Poster
SemPuSH: ISWC 2011 Poster
 
Privacy Aware Semantic Dissemination
Privacy Aware Semantic DisseminationPrivacy Aware Semantic Dissemination
Privacy Aware Semantic Dissemination
 
Personalized Filtering of Twitter Stream
Personalized Filtering of Twitter StreamPersonalized Filtering of Twitter Stream
Personalized Filtering of Twitter Stream
 

Kürzlich hochgeladen

Understanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM ArchitectureUnderstanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM Architecturerahul_net
 
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4jGraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4jNeo4j
 
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...Bert Jan Schrijver
 
Effectively Troubleshoot 9 Types of OutOfMemoryError
Effectively Troubleshoot 9 Types of OutOfMemoryErrorEffectively Troubleshoot 9 Types of OutOfMemoryError
Effectively Troubleshoot 9 Types of OutOfMemoryErrorTier1 app
 
Osi security architecture in network.pptx
Osi security architecture in network.pptxOsi security architecture in network.pptx
Osi security architecture in network.pptxVinzoCenzo
 
2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shards2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shardsChristopher Curtin
 
VictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News UpdateVictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News UpdateVictoriaMetrics
 
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdfEnhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdfRTS corp
 
Ronisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited CatalogueRonisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited Catalogueitservices996
 
Copilot para Microsoft 365 y Power Platform Copilot
Copilot para Microsoft 365 y Power Platform CopilotCopilot para Microsoft 365 y Power Platform Copilot
Copilot para Microsoft 365 y Power Platform CopilotEdgard Alejos
 
Zer0con 2024 final share short version.pdf
Zer0con 2024 final share short version.pdfZer0con 2024 final share short version.pdf
Zer0con 2024 final share short version.pdfmaor17
 
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...OnePlan Solutions
 
2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf
2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf
2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdfAndrey Devyatkin
 
Introduction to Firebase Workshop Slides
Introduction to Firebase Workshop SlidesIntroduction to Firebase Workshop Slides
Introduction to Firebase Workshop Slidesvaideheekore1
 
SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?Alexandre Beguel
 
Large Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLarge Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLionel Briand
 
Strategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero resultsStrategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero resultsJean Silva
 
Pros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdf
Pros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdfPros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdf
Pros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdfkalichargn70th171
 
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full RecordingOpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full RecordingShane Coughlan
 
Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...Rob Geurden
 

Kürzlich hochgeladen (20)

Understanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM ArchitectureUnderstanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM Architecture
 
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4jGraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
 
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
 
Effectively Troubleshoot 9 Types of OutOfMemoryError
Effectively Troubleshoot 9 Types of OutOfMemoryErrorEffectively Troubleshoot 9 Types of OutOfMemoryError
Effectively Troubleshoot 9 Types of OutOfMemoryError
 
Osi security architecture in network.pptx
Osi security architecture in network.pptxOsi security architecture in network.pptx
Osi security architecture in network.pptx
 
2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shards2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shards
 
VictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News UpdateVictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News Update
 
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdfEnhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
 
Ronisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited CatalogueRonisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited Catalogue
 
Copilot para Microsoft 365 y Power Platform Copilot
Copilot para Microsoft 365 y Power Platform CopilotCopilot para Microsoft 365 y Power Platform Copilot
Copilot para Microsoft 365 y Power Platform Copilot
 
Zer0con 2024 final share short version.pdf
Zer0con 2024 final share short version.pdfZer0con 2024 final share short version.pdf
Zer0con 2024 final share short version.pdf
 
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
 
2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf
2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf
2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf
 
Introduction to Firebase Workshop Slides
Introduction to Firebase Workshop SlidesIntroduction to Firebase Workshop Slides
Introduction to Firebase Workshop Slides
 
SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?
 
Large Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLarge Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and Repair
 
Strategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero resultsStrategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero results
 
Pros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdf
Pros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdfPros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdf
Pros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdf
 
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full RecordingOpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
 
Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...
 

User Interests Identification From Twitter using Hierarchical Knowledge Base

  • 1. Pavan Kapanipathi*, Prateek Jain^, Chitra Venkataramani^, Amit Sheth* *Kno.e.sis Center, Wright State University ^IBM TJ Watson Research Center 1 #eswc2014Kapanipathi
  • 2.  Motivation  Background  Approach  Evaluation  Conclusion & Future Work 2
  • 4.  Tapping into Social Networks to identify interests is not new (2006+). It works!! ◦ Google, Bing, Samsung TV etc.  Twitter Content ◦ 500M+ Users generating 500M+ tweets per day. ◦ Public and useful for research 4
  • 5.  Interests with lesser or no semantics ◦ Bag of Words [1] ◦ Bag of Concepts  Some Semantics ◦ Bag of Linked Entities with intentions of using Knowledge Bases. [2, 3] 5 1. Alan Mislove, Bimal Viswanath, Krishna P. Gummadi, and Peter Druschel. You Are Who You Know: Inferring User Profiles in Online Social Networks. WSDM ’10. 2. Fabian Abel, Qi Gao, Geert-Jan Houben, and Ke Tao. Analyzing User Modeling on Twitter for Personalized News Recommendations. UMAP ’11 3. Fabrizio Orlandi, John Breslin, and Alexandre Passant. Aggregated, Interoperable and Multi-domain User Profiles for the Social Web. I-SEMANTICS ’12.
  • 6. 6
  • 7.  How can Semantics/Knowledge Bases be utilized to infer interests? ◦ Extensive use of Knowledge Bases to infer user interests from Tweets is yet to be explored.  First we started with utilizing Hierarchical Relationships 7
  • 9.  Addressing Data Sparcity Problem ◦ Infer more interests of the users with lesser data.  Flexibility for Recommendations ◦ Recommend about Sports or Football  KB knows that Football is a sub-category of Sports ◦ Resource Description Framework and Semantic Web  RDF has lesser data online to recommend. 9
  • 10.  Motivation Approach  Evaluation  Conclusion & Future Work 10
  • 13.  Selecting an Ontology ◦ Available: Wikipedia, Dmoz, OpenCyc, Freebase ◦ Our framework can adapt to any ontology  Wikipedia ◦ Diverse Domains & Coverage ◦ Resemblance to a Taxonomy ◦ Extracted Structured Wikipedia – Dbpedia ◦ Existing entity recognition techniques (Explained further) 13
  • 14.  4.2 Million Articles  0.8 Million Wikipedia Categories  2.0 Million Category-Subcategory relationships  Challenges ◦ Since crowd-sourced – Noisy ◦ Not a hierarchy/taxonomy  It is a graph  It has cycles 14
  • 15.  Clean up -- Removed Wiki Admin Categories  Hierarchical Interest Graph needs a Base Hierarchy ◦ Shortest Path from the root node  Root Node: Category:Main Topic Classifications  Assumption – Hops to the root node determines the level of abstraction of the category. 15
  • 16. 16 Agriculture Science Science Education Scientists Main topic classifications Sports Health Health Care Health Economics Level: 1 Level: 2 Level: 3
  • 17.  Removing Links that does not concur to a hierarchy 17
  • 19.  Extracting Wikipedia concepts from Tweets  Interests Scoring 19 http://en.wikipedia.org/wiki/Semantic_search http://en.wikipedia.org/wiki/Ontology
  • 20. ◦ Issues relevant to entity extraction are handled by the web services  Stop words removal, URLs, Disambiguation etc. 20 Precision Recall F-measure Usability Rate Limit License Text Razor 64.6 26.9 38.0 Web Service 500/day Zemanta 57.7 31.8 41.0 Web Service 10000/day *L. Derczynski, D. Maynard, N. Aswani, and K. Bontcheva. Microblog-genre noise and impact on semantic annotation accuracy. In Proceedings of the 24th ACM Conference on Hypertext and Social Media, HT ’13.
  • 21.  Scoring Wikipedia concepts 21
  • 24.  Result (Challenges) ◦ Infer more categories without context ◦ Equal weights regardless Interest Score ◦ Cannot rank categories of Interest for a user ◦ We use Spreading Activation 24 Cricket M S Dhoni Virat Kohli Sachin Tendulkar Sports Indian Cricket Indian Cricketers Honorary Members of the Order of Australia Order of Australia Awards Culture
  • 25.
    ▪ Graph algorithm to find contextual nodes, used in:
      ◦ Cognitive sciences
      ◦ Neural networks
      ◦ Information retrieval
          ▫ Associative and semantic networks
      ◦ Semantic Web
          ▫ Context generation
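  • 26. [Figure: spreading activation over the cricket example, with entities M S Dhoni, Virat Kohli and Sachin Tendulkar and categories Indian Cricketers, Indian Cricket, Cricket and Sports, annotated with activation values 0.8, 0.2, 0.6, 0.5, 0.4, 0.25 and 0.1.] The activation function determines the extent of spreading.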
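A minimal sketch of upward spreading activation, assuming for illustration that each entity starts with its interest score, that every category sums what its activated children pass on, and that the activation function is a plug-in `activation_fn(a)` mapping a child's accumulated activation to the amount its parent receives. Nodes are processed in topological order so a node fires only after all its activated children have, which presumes an acyclic category graph. The function names and toy graph are hypothetical; this is not the authors' implementation.

```python
from collections import defaultdict, deque


def spread_activation(parents, initial, activation_fn=lambda a: a):
    """Propagate activation from entities up through an acyclic category graph.

    parents: node -> list of broader categories (entity -> categories too).
    initial: entity -> starting activation (e.g. its interest score).
    """
    # Collect every node reachable upward from the activated entities.
    nodes, stack = set(initial), list(initial)
    while stack:
        node = stack.pop()
        for p in parents.get(node, []):
            if p not in nodes:
                nodes.add(p)
                stack.append(p)

    # pending[n] = number of activated children n still waits for.
    pending = {n: 0 for n in nodes}
    for n in nodes:
        for p in parents.get(n, []):
            pending[p] += 1

    activation = defaultdict(float, initial)
    queue = deque(n for n in nodes if pending[n] == 0)
    while queue:
        node = queue.popleft()
        for p in parents.get(node, []):
            activation[p] += activation_fn(activation[node])
            pending[p] -= 1
            if pending[p] == 0:          # all children have fired
                queue.append(p)
    return dict(activation)


PARENTS = {
    "M S Dhoni": ["Indian Cricketers"],
    "Virat Kohli": ["Indian Cricketers"],
    "Sachin Tendulkar": ["Indian Cricketers"],
    "Indian Cricketers": ["Indian Cricket"],
    "Indian Cricket": ["Cricket"],
    "Cricket": ["Sports"],
}
SCORES = {"M S Dhoni": 1.0, "Virat Kohli": 0.5, "Sachin Tendulkar": 0.5}
print(spread_activation(PARENTS, SCORES)["Sports"])                    # 2.0
print(spread_activation(PARENTS, SCORES, lambda a: 0.5 * a)["Sports"]) # 0.125
```

With the identity function, Sports receives as much activation as the most specific category, which illustrates why the choice of activation function matters.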
  • 28.
    ▪ No decay, no edge weights
      ◦ Result: the most generic categories are ranked highest
    ▪ Activation decays with each hop (decay factors 0.4, 0.6, 0.8)
      ◦ Result: same as above
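A small worked example shows why decay alone does not demote generic categories. It assumes that a category's activation is the sum of its children's activations, each multiplied by a decay factor d (d = 1 means no decay):

```latex
A_j = \sum_{i \in \mathrm{children}(j)} d \, A_i ,
\qquad d \in \{0.4,\ 0.6,\ 0.8\}

% e.g. d = 0.4, ten activated children, each with A_i = 1:
A_j = 10 \times 0.4 \times 1 = 4 \;>\; A_i = 1
```

Decay shrinks activation geometrically with depth, but a generic category collects it from many children. Once the fan-in exceeds 1/d (here more than 2.5 children), the generic category still outranks the specific ones.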
  • 29. [Figure: the same base hierarchy as slide 16, with "Main topic classifications" at Level 1.] Example levels: Main topic classifications: 1; Technology, Science, Sports, Business: 2; …; Technology Companies, Scientists: 3.
  • 30.
    ▪ Uneven distribution of nodes across the hierarchy levels
    ▪ Many-to-many category-subcategory relationships
  • 31. [Chart: number of nodes (0 to 300,000) at each hierarchical level (1 to 16).]
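The uneven distribution of nodes per level suggests normalizing activation by level. A minimal sketch, assuming a hypothetical weight equal to the number of nodes at a level divided by the number at the most populated level. Under this assumption, the sparsely populated, most abstract levels receive small weights. The exact normalization used in the presented system is not given on these slides.

```python
from collections import Counter


def level_weights(level):
    """Map each hierarchy level to (#nodes at that level) / (#nodes at the fullest level).

    Hypothetical normalization: levels with few nodes (the abstract top levels)
    get small weights, damping the activation that generic categories collect.
    """
    counts = Counter(level.values())
    peak = max(counts.values())
    return {lvl: n / peak for lvl, n in counts.items()}


# Toy levels (hypothetical data, not taken from Wikipedia).
LEVEL = {"Main topic classifications": 1, "Science": 2, "Sports": 2,
         "Scientists": 3, "Science education": 3, "Cricket": 3, "Indian cricket": 3}
WEIGHTS = level_weights(LEVEL)
print(WEIGHTS)  # {1: 0.25, 2: 0.5, 3: 1.0}
```

In a spreading step, a parent at level l would then receive `WEIGHTS[l] * a` from a child with activation `a`, instead of the full amount.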
  • 36. Nodes at the intersection of domains/subcategories are activated by diverse entities
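  • 37. [Figure: Indian cricketers (M S Dhoni, Virat Kohli, Sachin Tendulkar) activate Indian Cricketers and Indian Cricket; Australian cricketers (Michael Clarke, Shane Watson) activate Australian Cricketers and Australian Cricket; both branches meet at Cricket and Sports. Node annotations: 3, 3, 5, 5, 2, 2.]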
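One way to find nodes that sit at the intersection of several branches is to count, for each category, how many distinct entities reach it. The sketch below does this for the cricket example and then applies a hypothetical boost that multiplies a category's activation by that count. The boost rule and function names are assumptions, not the published activation function. On this toy graph the counts are 3 for the Indian categories, 2 for the Australian ones, and 5 for Cricket and Sports.

```python
def activating_entities(parents, entities):
    """Map each category to the set of distinct entities whose activation reaches it."""
    reached = {}
    for entity in entities:
        stack, seen = [entity], {entity}
        while stack:                        # depth-first walk up from one entity
            node = stack.pop()
            for parent in parents.get(node, []):
                if parent not in seen:      # visit each ancestor once per entity
                    seen.add(parent)
                    stack.append(parent)
                    reached.setdefault(parent, set()).add(entity)
    return reached


def intersect_boost(activation, reached):
    """Hypothetical boost: scale a category's activation by how many entities reach it."""
    return {node: a * len(reached[node])
            for node, a in activation.items() if node in reached}


PARENTS = {
    "M S Dhoni": ["Indian Cricketers"], "Virat Kohli": ["Indian Cricketers"],
    "Sachin Tendulkar": ["Indian Cricketers"],
    "Michael Clarke": ["Australian Cricketers"], "Shane Watson": ["Australian Cricketers"],
    "Indian Cricketers": ["Indian Cricket"], "Australian Cricketers": ["Australian Cricket"],
    "Indian Cricket": ["Cricket"], "Australian Cricket": ["Cricket"],
    "Cricket": ["Sports"],
}
ENTITIES = ["M S Dhoni", "Virat Kohli", "Sachin Tendulkar", "Michael Clarke", "Shane Watson"]
REACHED = activating_entities(PARENTS, ENTITIES)
print({category: len(ents) for category, ents in REACHED.items()})
```

Counting distinct entities, rather than summing activation, rewards categories supported by diverse evidence instead of one heavily mentioned entity.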
  • 40. Motivation ▪ Approach ▪ Evaluation ▪ Conclusion & Future Work
  • 41.
    ▪ User study data
      ◦ 37 users
      ◦ 31,927 tweets
    ▪ Hierarchical Interest Graph
      ◦ 111,535 category interests
      ◦ ~3,000 categories per user
      ◦ Ranking evaluation: top-50 categories
  • 42.
    ▪ How many relevant/irrelevant hierarchical interests are retrieved at the top-k ranks?
      ◦ Graded Precision
    ▪ How well are the retrieved relevant hierarchical interests ranked within the top k?
      ◦ Mean Average Precision (MAP)
    ▪ How early in the ranked hierarchical interests do we find a relevant result?
      ◦ Mean Reciprocal Rank (MRR)
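The three metrics can be sketched from per-user relevance judgments listed in rank order. The numeric grades (1 for relevant, 0.5 for "maybe", 0 for irrelevant) are a hypothetical scheme. So is treating only "relevant" as a hit for MAP and MRR and normalizing average precision by the relevant items retrieved in the top k. The slides do not give these details.

```python
# Hypothetical grading scheme: the slides mention "relevant" and "May-be"
# judgments; the numeric grades below are assumptions.
GRADE = {"yes": 1.0, "maybe": 0.5, "no": 0.0}


def graded_precision_at_k(judgments, k):
    """Average grade of the top-k ranked categories (missing ranks count as 0)."""
    return sum(GRADE[j] for j in judgments[:k]) / k


def average_precision_at_k(judgments, k):
    """Mean of precision@i over the ranks i <= k that hold a relevant category.

    Only "yes" counts as relevant here, and the denominator is the number of
    relevant categories retrieved in the top k (the full set is unknown).
    """
    hits, total = 0, 0.0
    for i, j in enumerate(judgments[:k], start=1):
        if j == "yes":
            hits += 1
            total += hits / i
    return total / hits if hits else 0.0


def reciprocal_rank(judgments):
    """1 / rank of the first relevant category, or 0 if there is none."""
    for i, j in enumerate(judgments, start=1):
        if j == "yes":
            return 1.0 / i
    return 0.0


def mean(values):
    return sum(values) / len(values)


# One list of judgments per user, in ranked order (toy data).
USERS = [["no", "yes", "maybe", "yes"], ["yes", "no", "no", "no"]]
print(mean([average_precision_at_k(u, 4) for u in USERS]))  # MAP@4 = 0.75
print(mean([reciprocal_rank(u) for u in USERS]))            # MRR = 0.75
```

MAP and MRR are the per-user scores averaged over all participants. The same functions would be applied at k = 10 or k = 50 for the user study.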
  • 43. Priority Intersect performs best, with
      ◦ 76% Mean Average Precision
      ◦ 98% Mean Reciprocal Rank
  • 44.
    ▪ How many of the categories inferred by the system were not explicitly mentioned by the user in tweets? (For example, a tweeted entity Semantic Web counts as an explicit mention of Category:Semantic Web.)
    ▪ Priority Intersect at top-10:
      ◦ 52% of the categories were not mentioned in the user's tweets
      ◦ 65% of these were marked relevant
      ◦ 10% were marked "maybe"
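Deciding whether a category was "explicitly mentioned" can be sketched as a name comparison between the category and the entities extracted from the user's tweets. This assumes that dropping the "Category:" prefix and ignoring case and underscores is enough to match names, as in the Semantic Web example. The helper names and toy data are hypothetical.

```python
def normalize(title):
    """Treat 'Category:Semantic Web', 'Semantic_Web' and 'Semantic Web' as one name."""
    if title.startswith("Category:"):
        title = title[len("Category:"):]
    return title.replace("_", " ").strip().lower()


def unmentioned(top_categories, mentioned_entities):
    """Return the fraction and list of top categories with no same-named entity."""
    mentioned = {normalize(e) for e in mentioned_entities}
    novel = [c for c in top_categories if normalize(c) not in mentioned]
    return len(novel) / len(top_categories), novel


TOP = ["Category:Semantic Web", "Category:World Wide Web",
       "Category:Knowledge representation", "Category:Ontology"]
MENTIONED = ["Semantic_Web", "Ontology"]
print(unmentioned(TOP, MENTIONED))
# (0.5, ['Category:World Wide Web', 'Category:Knowledge representation'])
```

The returned list is what participants would then judge as relevant, irrelevant or "maybe".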
  • 45.
    ▪ Mapped Wikipedia categories to Dmoz by string matching
      ◦ ~141K categories mapped
    ▪ Compared all category-subcategory relationships between mapped categories in our hierarchy with the manually created Dmoz hierarchy
      ◦ 87% precision (87% of these relationships in our hierarchy were also found in Dmoz)
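The precision check can be sketched as follows. The sketch assumes exact matching on normalized names, and it assumes a relationship counts as "found in Dmoz" only when Dmoz has the same direct parent-child pair. An ancestor-descendant match would be a looser alternative. The function names and toy edges are hypothetical.

```python
def norm(name):
    """Case- and underscore-insensitive name used for string matching."""
    return name.replace("_", " ").strip().lower()


def relationship_precision(wiki_edges, dmoz_edges):
    """Share of Wikipedia (category, subcategory) edges between Dmoz-mapped
    categories that also appear as (parent, child) edges in Dmoz."""
    dmoz_names = {norm(n) for edge in dmoz_edges for n in edge}
    dmoz_pairs = {(norm(p), norm(c)) for p, c in dmoz_edges}
    # Only edges whose both endpoints map to a Dmoz category are evaluated.
    mapped = [(norm(p), norm(c)) for p, c in wiki_edges
              if norm(p) in dmoz_names and norm(c) in dmoz_names]
    if not mapped:
        return 0.0
    return sum(pair in dmoz_pairs for pair in mapped) / len(mapped)


WIKI = [("Sports", "Cricket"), ("Cricket", "Indian_cricket"), ("Science", "Sports")]
DMOZ = [("Sports", "Cricket"), ("Science", "Biology")]
print(relationship_precision(WIKI, DMOZ))  # 0.5
```

The edge to "Indian cricket" is skipped because that category has no Dmoz counterpart, so it affects neither the numerator nor the denominator.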
  • 46. Motivation ▪ Approach ▪ Evaluation ▪ Conclusion & Future Work
  • 47.
    ▪ Hierarchical Interest Graph: a hierarchical representation of user interests
      ◦ Records the hierarchical level of each interest, giving flexibility to personalize and recommend based on its abstractness
    ▪ We semantically enhanced users' interest profiles from Twitter using knowledge bases
      ◦ Inferred abstract/hierarchical interests of Twitter users using Wikipedia
      ◦ This can help reduce the data sparsity problem by inferring relevant interests
    ▪ The top-1 hierarchical interest generated by the system was correct for 36 of the 37 user-study participants
      ◦ Mean Average Precision at top-10 is 0.76
  • 48.
    ▪ Measuring the impact of Hierarchical Interest Graphs on movie/music recommendation
      ◦ Datasets: MovieLens, Last.fm
    ▪ Tuning the system to use the hierarchical levels of interests for personalization and recommendation
      ◦ Sports (most abstract interest)
      ◦ Baseball (specific interest)
  • 49. Contact: Pavan Kapanipathi
      Twitter: @pavankaps
      Email: pavan@knoesis.org
      More info: Knoesis Wiki, Hierarchical Interest Graph