Topic Detection From Events
Abstract
We live in a world of information overload, and manually
annotating text with topics is no longer feasible. In this
paper, we work with tweets. First, we recognize the
topics/entities they mention. We then cluster the tweets
based on the recognized entities and verbs to obtain a
hierarchy of clusters. Each cluster is then labelled with
its most frequent entities.
Tools Used
● Twitter NLP
● Wiki Semantic Distance
● VerbNet
Approach
To get the first level of clusters:
1. Tokenize the tweets.
2. Apply POS tagging.
3. Assign an IOB tag to each token using extracted features.
4. Extract entities by applying rules to the IOB-tagged tweet.
5. For each identified entity, find the nearest Wikipedia entity by string edit distance.
6. Create an inverted index based on the identified entities.
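Steps 4 to 6 above can be sketched roughly as follows. This is a minimal illustration, not the team's implementation: the slides do not state the extraction rules, so the rule used here (a `B` tag opens an entity, `I` tags extend it, `O` closes it) is an assumption, as are the function names, the tiny list of Wikipedia titles, and the pre-tagged sample tweets.

```python
from collections import defaultdict


def extract_entities(tokens, iob_tags):
    """Group tokens into entities: B opens one, I extends it, anything else closes it."""
    entities, current = [], []
    for token, tag in zip(tokens, iob_tags):
        if tag == "B":
            if current:
                entities.append(" ".join(current))
            current = [token]
        elif tag == "I" and current:
            current.append(token)
        else:
            if current:
                entities.append(" ".join(current))
            current = []
    if current:
        entities.append(" ".join(current))
    return entities


def edit_distance(a, b):
    """Levenshtein distance, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def nearest_title(entity, titles):
    """Return the title with the smallest case-insensitive edit distance."""
    return min(titles, key=lambda t: edit_distance(entity.lower(), t.lower()))


def build_inverted_index(tagged_tweets, titles):
    """Map each normalized entity to the set of tweet ids that mention it."""
    index = defaultdict(set)
    for tweet_id, (tokens, tags) in enumerate(tagged_tweets):
        for entity in extract_entities(tokens, tags):
            index[nearest_title(entity, titles)].add(tweet_id)
    return dict(index)


# Hypothetical data: a stand-in title list and three already-tagged tweets.
TITLES = ["Lok Sabha", "Election Commission of India", "Electronic voting machine"]
TAGGED = [
    (["lok", "sabha", "results", "today"], ["B", "I", "O", "O"]),
    (["electon", "commission", "of", "india", "announces", "dates"],
     ["B", "I", "I", "I", "O", "O"]),
    (["lok", "sabha", "debate"], ["B", "I", "O"]),
]
INDEX = build_inverted_index(TAGGED, TITLES)
```

In this toy run the misspelled "electon commission of india" still maps to "Election Commission of India", which is the point of the edit-distance step: spelling variants of the same entity end up under one key in the index. A linear scan over titles is only workable for a small list; a real Wikipedia title set would need an index or candidate filtering.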
Approach contd...
Then:
1. Run k-means clustering at each level, with Jaccard similarity as the
similarity metric.
2. Take the most frequent tags in each cluster and use them to label that
cluster.
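The two steps above can be sketched for a single level as shown below. This is an illustration under stated assumptions rather than the authors' code: a mean is not defined for sets of tags, so the sketch swaps in a medoid (the member most similar to the rest of its cluster) as each cluster's center, and it seeds the centers with the first `k` tweets. The function names and the sample tag sets are hypothetical.

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster(tag_sets, k, max_iter=20):
    """k-means-style loop over sets, with medoids standing in for means."""
    centers = [tag_sets[i] for i in range(k)]  # assumption: seed with the first k items
    assignment = []
    for _ in range(max_iter):
        # Assign every tweet to the center it is most similar to.
        new_assignment = [
            max(range(k), key=lambda c: jaccard(tags, centers[c]))
            for tags in tag_sets
        ]
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Move each center to the member with the highest total similarity.
        for c in range(k):
            members = [t for t, a in zip(tag_sets, assignment) if a == c]
            if members:
                centers[c] = max(
                    members, key=lambda m: sum(jaccard(m, o) for o in members)
                )
    return assignment


def label_clusters(tag_sets, assignment, top_n=1):
    """Label each cluster with its top_n most frequent tags."""
    counts = {}
    for tags, c in zip(tag_sets, assignment):
        counts.setdefault(c, Counter()).update(tags)
    return {c: [tag for tag, _ in cnt.most_common(top_n)] for c, cnt in counts.items()}


# Hypothetical tag sets (entities plus verbs), one per tweet.
TWEET_TAGS = [
    {"Lok Sabha", "win"},
    {"Election Commission", "announce"},
    {"Lok Sabha", "win", "Delhi"},
    {"Election Commission", "Delhi"},
    {"Lok Sabha"},
]
ASSIGNMENT = cluster(TWEET_TAGS, k=2)
LABELS = label_clusters(TWEET_TAGS, ASSIGNMENT)
```

Running `cluster` again on the members of each resulting cluster would give the next level of the hierarchy. Labels with tied counts come out in arbitrary order because set iteration order is not fixed, so the example keeps only the single most frequent tag.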
Architecture
[Figure: architecture diagram]
Results
[Figures: result slides]
Conclusion
Our method clusters tweets into a hierarchy of semantically related groups.
The dataset we used was constrained to a single domain, namely elections.
Wiki semantic distance might be more useful on a more diverse dataset, so
future work can experiment with different datasets to find out when wiki
semantic distance begins to significantly outperform Jaccard similarity.
Thanks!
Team 15
Garima Ahuja
Harish Kolli
Ashwin Venkatram

Topic Detection From Tweets Using Entity Recognition & Clustering
