Entity Linking in Social Media
Project Number: 10
Group Number: 51
- Abhishek Mittal, 201101192
- Mohit Aggarwal, 201101164
- Vishrut Mehta, 201102128
- Himanshu Ghadiya, 201305620
Overview
● The main aim of our project is to link entities in social media,
i.e. to extract the context and meaning of a tweet and link it to
the relevant Wikipedia page for a better understanding.
● Semantic understanding of sentences is increasingly important.
Much of our conversation now happens through social media, and
understanding the meaning of those conversations requires linking
named entities to their context. We have therefore used tweets as
the base for evaluating and testing our entity-linking method,
which extracts context from the tweets.
Approach
● We extract named entities from the tweets using the CMU-ARK
tagger.
● The named entities are then mapped to relevant news feeds within
a given time interval.
● We then extract the named entities from these news feeds and
obtain a final collection of related entities that should contain
sufficient information about the tweet.
● For each entity, we find the corresponding Wikipedia pages.
● We then find the label of each Wikipedia page to determine its
context, and finally map the tweet to that context. The
classification is done using an SVM.
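Steps two and three above (mapping a tweet's entities to news in a time interval, then collecting the entities of the matching news) can be sketched as follows. This is a minimal illustration, not the project's code: the `NewsItem` record, the `expand_entities` helper, the ±12-hour default window and the rule that a news item is relevant if it shares at least one entity with the tweet are all assumptions, and entity extraction itself (CMU-ARK for tweets, the Stanford parser for news) is assumed to have been done already.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class NewsItem:
    """Hypothetical news record: publication time and its extracted entities."""
    published: datetime
    entities: set = field(default_factory=set)


def expand_entities(tweet_time, tweet_entities, news, window=timedelta(hours=12)):
    """Return the tweet's entities plus the entities of related news items.

    Assumption of this sketch: a news item counts as related if it was
    published within `window` of the tweet and shares at least one entity
    with the tweet, compared case-insensitively.
    """
    seeds = {e.lower() for e in tweet_entities}
    related = set(seeds)
    for item in news:
        if abs(item.published - tweet_time) > window:
            continue  # outside the time interval around the tweet
        item_entities = {e.lower() for e in item.entities}
        if item_entities & seeds:
            related |= item_entities  # add entities that co-occur in the news
    return related


if __name__ == "__main__":
    # Toy, made-up records purely for illustration.
    news = [
        NewsItem(datetime(2014, 1, 10, 9), {"Acme Corp", "Springfield", "Jane Doe"}),
        NewsItem(datetime(2014, 1, 10, 20), {"City Council"}),
        NewsItem(datetime(2014, 1, 12, 9), {"Acme Corp", "Shelbyville"}),
    ]
    print(sorted(expand_entities(datetime(2014, 1, 10, 12), {"acme corp"}, news)))
    # ['acme corp', 'jane doe', 'springfield']
```

Matching only against the tweet's own entities (not transitively through newly added ones) keeps the expansion from drifting away from the tweet's topic; a real system would likely also weight entities by how often they co-occur.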
Design
Datasets
We have used the following datasets for the project -
● A dataset of tweets.
● A dataset of news feeds from different news websites; we used the
CBS News dataset.
● A 40 GB Wikipedia dump as the training set for the SVM. So far, we
have trained the SVM on only 5 GB of this data.
● A predefined set of about 15 labels to which the tweets are mapped.
Tools
We have used the following tools for the project -
● CMU-ARK tagger - To find named entities in tweets using mention
detection.
● Stanford parser - To find named entities using mention detection on news
feeds (as news text is more structured).
● Wikipedia Search API - To find Wikipedia pages for a keyword.
● SVM (LIBSVM) - To find the context of a Wikipedia page.
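For the lookup step, the public MediaWiki Action API exposes full-text search through `action=query&list=search`. A minimal stdlib sketch of that query, assuming English Wikipedia and JSON output; the helper names, the placeholder `User-Agent` string and the default `srlimit` of 5 are illustrative choices, not taken from the project. The URL builder and the response parser are kept separate so they can be checked without network access.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API = "https://en.wikipedia.org/w/api.php"
# Wikimedia asks clients to send a descriptive User-Agent; this one is a placeholder.
USER_AGENT = "entity-linking-sketch/0.1 (hypothetical contact)"


def search_url(keyword, limit=5):
    """Build a full-text search request for the MediaWiki Action API."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": keyword,
        "srlimit": limit,
        "format": "json",
    }
    return API + "?" + urlencode(params)


def titles_from_response(payload):
    """Pull page titles out of a decoded list=search JSON response."""
    return [hit["title"] for hit in payload.get("query", {}).get("search", [])]


def search_titles(keyword, limit=5):
    """Query Wikipedia over the network and return candidate page titles."""
    req = Request(search_url(keyword, limit), headers={"User-Agent": USER_AGENT})
    with urlopen(req, timeout=10) as resp:
        return titles_from_response(json.load(resp))
```

Calling `search_titles("CBS News")` returns a ranked list of candidate titles; picking the top hit is the simplest policy, but ambiguous keywords would still need disambiguation against the tweet's context.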
Results
● We evaluated our system on a small dataset: about 200 tweets dated
10 January 2014, together with news feeds covering all 24 hours of that
day.
● We then ran our algorithm to find the context of each tweet. Compared
with the labels we had assigned manually, the accuracy was around
37 percent.
● We attribute the low accuracy mostly to the small training and testing
datasets used for classification. Once the SVM is trained on the full
40 GB Wikipedia dataset, we expect the accuracy to improve considerably.
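Accuracy here is the share of tweets whose predicted label matches the manually assigned one, so around 37 percent of about 200 tweets corresponds to roughly 74 correct labels. With about 15 labels, a per-label breakdown can also show where the errors concentrate. A minimal sketch with hypothetical helper names:

```python
from collections import Counter


def accuracy(predicted, gold):
    """Fraction of tweets whose predicted label equals the manual label."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("need two non-empty label lists of equal length")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


def per_label_recall(predicted, gold):
    """For each manual label, the fraction of its tweets labelled correctly."""
    totals = Counter(gold)
    hits = Counter(g for p, g in zip(predicted, gold) if p == g)
    return {label: hits[label] / n for label, n in totals.items()}


if __name__ == "__main__":
    gold = ["sports", "politics", "sports", "music"]
    predicted = ["sports", "sports", "politics", "music"]
    print(accuracy(predicted, gold))  # 0.5
    print(per_label_recall(predicted, gold))  # {'sports': 0.5, 'politics': 0.0, 'music': 1.0}
```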
Challenges and Issues
● Feature selection for the SVM was a major challenge: we had to choose
feature vectors that maximize accuracy during classification.
● Training the SVM on the full 40 GB Wikipedia dump is a major challenge.
● So far, we have used only 15 labels for classifying the tweets.
Increasing the number of labels would make the algorithm more
computationally intensive, and scaling the system to bigger datasets and
more contexts would require further optimization.
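To make the classification step and the two challenges above concrete: LIBSVM trains kernel SVMs with an SMO-type solver and handles multi-class problems one-against-one, so 15 labels already means 15 × 14 / 2 = 105 binary classifiers, and a 16th label would add 15 more. The stdlib-only sketch below swaps in a different, simpler technique: a linear SVM trained with the Pegasos stochastic subgradient method in a one-vs-rest scheme (one binary classifier per label), over bag-of-words features restricted to the most document-frequent terms as a crude feature-selection rule. All names and hyperparameters (`lam`, `epochs`, `max_features`) are assumptions for illustration, not the project's settings.

```python
import re
from collections import Counter


def tokens(text):
    """Lower-cased alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def select_vocab(docs, max_features=1000):
    """Simplified feature selection: keep the most document-frequent terms."""
    df = Counter()
    for doc in docs:
        df.update(set(tokens(doc)))
    kept = [term for term, _ in df.most_common(max_features)]
    return {term: i for i, term in enumerate(kept)}


def featurize(text, vocab):
    """Sparse term-frequency vector {feature_index: count}; unknown terms dropped."""
    vec = Counter()
    for term in tokens(text):
        if term in vocab:
            vec[vocab[term]] += 1
    return vec


def score(w, x):
    """Dot product of a dense weight list with a sparse feature vector."""
    return sum(w[i] * v for i, v in x.items())


def train_pegasos(xs, ys, dim, lam=0.01, epochs=50):
    """Binary linear SVM (hinge loss + L2, no bias) via Pegasos; ys are +1/-1."""
    w = [0.0] * dim
    t = 0
    for _ in range(epochs):
        for x, y in zip(xs, ys):  # fixed pass order, so results are deterministic
            t += 1
            eta = 1.0 / (lam * t)
            margin = y * score(w, x)
            w = [wi * (1.0 - eta * lam) for wi in w]  # regulariser step
            if margin < 1.0:  # hinge-loss subgradient step on a violated margin
                for i, v in x.items():
                    w[i] += eta * y * v
    return w


class OneVsRestSVM:
    """One binary SVM per label; predict the label with the highest score."""

    def fit(self, texts, labels, max_features=1000):
        self.vocab = select_vocab(texts, max_features)
        xs = [featurize(t, self.vocab) for t in texts]
        self.weights = {
            label: train_pegasos(xs, [1 if y == label else -1 for y in labels], len(self.vocab))
            for label in sorted(set(labels))
        }
        return self

    def predict(self, text):
        x = featurize(text, self.vocab)
        return max(self.weights, key=lambda label: score(self.weights[label], x))


if __name__ == "__main__":
    texts = [
        "goal match striker football",
        "match referee football league",
        "election vote parliament minister",
        "minister policy vote government",
    ]
    labels = ["sports", "sports", "politics", "politics"]
    clf = OneVsRestSVM().fit(texts, labels)
    print(clf.predict("football match tonight"))  # sports
    print(clf.predict("vote for the minister"))  # politics
```

One-vs-rest keeps the number of binary problems equal to the number of labels (15 rather than 105), at the cost of every binary problem seeing the full training set; which trade-off is cheaper depends on the data size and the solver.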
Thank You
