Addressing Volume and Velocity Challenge on the Social Web using Crowd-Sourced Knowledge Bases
Pavan Kapanipathi
Kno.e.sis Center, Wright State University, Dayton, OH, USA
Volume Challenge 
Wikipedia: 
•Collaborative encyclopedia with more than 4M articles. 
•Prominent source of an evolving knowledge base. 
•Structured representation of Wikipedia is available as DBpedia.
•The Wikipedia hyperlink structure is a powerful resource for finding semantic relatedness between text and entities.
Twitter and Wikipedia 
This work focuses on the volume and velocity challenges on the Social Web, specifically Twitter. To address these challenges, we use Wikipedia as the knowledge base.
Volume: Hierarchical Interest Graphs
•Generate Hierarchical Interest Graphs from users’ tweets.
•The Hierarchical Interest Graphs are later used for filtering and recommendations.
Velocity: Tracking Dynamic Events on Twitter
•Events change their topics (sub-events) dynamically.
•Tracking events on Twitter is challenging.
•We utilize the evolving Wikipedia structure to track dynamic events on Twitter.
Overview 
Evaluation 
•User study with 37 participants.
•Evaluated the top-30 categories for three different experiments.
•The best of the three had a MAP of 76% at top-5 with an MRR of 98%.
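As an illustration of the two ranking metrics reported above, the following is a minimal sketch of MAP at top-k and MRR over per-user relevance judgments. The function names and the toy judgments are hypothetical, and the average-precision normalization shown (dividing by the number of relevant items found in the top k) is one common variant, not necessarily the one used in the study.

```python
def average_precision_at_k(relevant_flags, k):
    """Average precision over the top-k ranked items.

    relevant_flags: list of booleans in ranked order (True = judged relevant).
    Normalizes by the number of relevant items found in the top k (assumption).
    """
    hits = 0
    total = 0.0
    for rank, is_relevant in enumerate(relevant_flags[:k], start=1):
        if is_relevant:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / hits if hits else 0.0


def reciprocal_rank(relevant_flags):
    """1 / rank of the first relevant item, or 0.0 if none is relevant."""
    for rank, is_relevant in enumerate(relevant_flags, start=1):
        if is_relevant:
            return 1.0 / rank
    return 0.0


def mean(values):
    values = list(values)
    return sum(values) / len(values) if values else 0.0


# Toy judgments for two hypothetical participants (not data from the study).
judgments = [
    [True, False, True, False, False],
    [False, True, False, False, False],
]
map_at_5 = mean(average_precision_at_k(flags, 5) for flags in judgments)
mrr = mean(reciprocal_rank(flags) for flags in judgments)
print(f"MAP@5 = {map_at_5:.3f}, MRR = {mrr:.3f}")
```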
Evaluation 
•Dynamic events on Twitter are challenging to follow, whether for information or for real-time analysis.
•During dynamic events, Wikipedia evolves due to its collaborative nature.
•This work leverages Wikipedia’s dynamic nature and hashtag co-occurrence on Twitter to track event tweets.
•Created a gold standard for 3 events (75 hashtags, 15,000 tweets).
•Evaluated the tweets tagged with the top hashtags.
•NDCG of 92% for the top 5 hashtags.
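For readers unfamiliar with NDCG, the sketch below shows one standard formulation (gain divided by log2 of rank + 1, normalized by the ideal ordering). The graded relevance values are made-up toy numbers, and the exact gain and discount functions used in the evaluation are not stated on the poster, so this is an assumption.

```python
import math


def dcg_at_k(gains, k):
    """Discounted cumulative gain over the top-k items of a ranked list."""
    return sum(
        gain / math.log2(rank + 1)
        for rank, gain in enumerate(gains[:k], start=1)
    )


def ndcg_at_k(gains, k):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k(gains, k) / ideal if ideal > 0 else 0.0


# Toy graded relevance of five ranked hashtags (hypothetical values).
ranked_gains = [3, 2, 3, 0, 1]
print(f"NDCG@5 = {ndcg_at_k(ranked_gains, 5):.3f}")
```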
•Generates entities of interest from users’ tweets.
•Maps the entities to those on Wikipedia and infers the appropriate categories from the Wikipedia hierarchy.
•The Spreading Activation function is a function of:
1. Prominence of the category for its sub-category (handling multiple categories).
2. Importance of the node in the user’s interest hierarchy.
3. Normalization based on the distribution of categories in the hierarchy.
•Handles information overload by utilizing user profiles of interest. Also addresses cold-start and data sparsity problems.
•Hierarchical representation of interests by inferring the hierarchy from knowledge bases.
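The general idea of spreading activation over a category hierarchy can be sketched as follows. This is a generic, simplified version and not the activation function described above: the constant `decay`, the even split of a node's score across its parents, and the `max_hops` bound are all assumptions made for illustration, and the entity and category names are toy values.

```python
from collections import defaultdict


def spread_activation(entity_scores, parents, decay=0.5, max_hops=3):
    """Propagate interest scores from entities up a category hierarchy.

    entity_scores: dict mapping an entity to its initial interest score.
    parents: dict mapping a node to the list of its parent categories.
    decay and the even split across parents are illustrative assumptions.
    """
    activation = defaultdict(float)
    frontier = dict(entity_scores)
    for _ in range(max_hops):  # bounded, so cycles in the graph cannot loop forever
        next_frontier = defaultdict(float)
        for node, score in frontier.items():
            node_parents = parents.get(node, [])
            for parent in node_parents:
                share = decay * score / len(node_parents)
                activation[parent] += share
                next_frontier[parent] += share
        frontier = next_frontier
    return dict(activation)


# Toy hierarchy (hypothetical, not taken from the poster).
parents = {
    "Lionel Messi": ["Football players"],
    "FC Barcelona": ["Football clubs"],
    "Football players": ["Football"],
    "Football clubs": ["Football"],
    "Football": ["Sports"],
}
scores = spread_activation({"Lionel Messi": 2.0, "FC Barcelona": 1.0}, parents)
for category, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{category}: {score:.3f}")
```

Categories reached from several entities accumulate activation, which is why a shared ancestor can outrank a category supported by a single entity.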
•Our hashtag co-occurrence analysis shows that:
1. A very small percentage of event-related hashtags is sufficient to retrieve most of the event-related tweets.
2. These popular hashtags frequently co-occur.
•Starting with an initial event-relevant hashtag, we check the relevancy of co-occurring hashtags with the Wikipedia event page.
•Relevancy is measured by representing each hashtag by its co-occurring entities and the Wikipedia event page by its linked entities.
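The comparison described above can be sketched as a set-overlap score between a hashtag's co-occurring entities and the entities linked from the event page. The poster does not name the similarity measure, so Jaccard similarity and the `threshold` value here are assumptions, and the hashtags and entities are placeholders.

```python
def jaccard(a, b):
    """Jaccard similarity between two collections, treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def relevant_hashtags(cooccurring_entities, page_entities, threshold=0.2):
    """Keep hashtags whose entity sets overlap enough with the event page.

    cooccurring_entities: dict mapping a hashtag to the entities seen with it.
    page_entities: entities linked from the Wikipedia event page.
    threshold is a hypothetical cut-off chosen for illustration.
    """
    scores = {
        tag: jaccard(entities, page_entities)
        for tag, entities in cooccurring_entities.items()
    }
    return {tag: score for tag, score in scores.items() if score >= threshold}


# Placeholder data (hypothetical).
page_entities = {"A", "B", "C", "D"}
cooccurring_entities = {
    "#event": {"A", "B", "X"},
    "#unrelated": {"Y", "Z"},
}
selected = relevant_hashtags(cooccurring_entities, page_entities)
print(selected)
```

Because the event page keeps changing during the event, `page_entities` would be refreshed periodically in such a setup so that newly relevant hashtags can be picked up.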
Velocity Challenge 
Publications 
Pavan Kapanipathi, Prateek Jain, Chitra Venkataramani, and Amit Sheth. User Interests Identification on Twitter Using a Hierarchical Knowledge Base. The Semantic Web: Trends and Challenges, ESWC 2014.
Pavan Kapanipathi, Prateek Jain, Chitra Venkataramani, and Amit Sheth. 2014. Hierarchical interest graph from tweets. In Proceedings of the companion publication of the 23rd international conference on World wide web companion (WWW Companion '14).
Pavan Kapanipathi, Krishnaprasad Thirunarayan, Amit Sheth, and Pascal Hitzler. A Real-time Approach for Continuous Crawling of Events on Twitter by Leveraging Wikipedia. Technical report, 2013.
Twitter: 
•Unidirectional paradigm and open to research. 
•Twitter users generate around 433k tweets, around 12TB/min.
•Being explored for understanding user behavior, disaster management, following trending topics, and news.
