Topic Detection From Tweets Using Entity Recognition & Clustering

Garima Ahuja

Abstract
We live in a world of information overload, and manually
annotating text with topics is no longer feasible. In this
paper, we work with tweets. First, we recognize the
topics/entities they mention. We then cluster the tweets
based on the recognized entities and verbs to obtain a
hierarchy of clusters. Finally, each cluster is labelled
with its most frequent entities.

Tools Used
● Twitter NLP
● Wiki Semantic Distance
● VerbNet

Approach
To get the first level of clusters:
1. Tokenize the tweets.
2. Apply POS tagging.
3. Apply IOB tagging to each token using feature extraction.
4. Extract entities by applying rules to the IOB-tagged tokens of each tweet.
5. For each identified entity, find the nearest Wikipedia entity using string
edit distance.
6. Create an inverted index based on the identified entities.
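
Steps 4 to 6 can be sketched roughly as follows. This is a minimal illustration, not the team's actual implementation: the function names, the simple B/I/O tag scheme, the rule used to group tokens, and the tiny list of Wikipedia titles are all assumptions made for the example.

```python
from collections import defaultdict


def extract_entities(tagged_tokens):
    """Group B/I-tagged tokens into entity strings (assumed rule: B starts an
    entity, I continues it, anything else closes it)."""
    entities, current = [], []
    for token, tag in tagged_tokens:
        if tag == "B":
            if current:
                entities.append(" ".join(current))
            current = [token]
        elif tag == "I" and current:
            current.append(token)
        else:
            if current:
                entities.append(" ".join(current))
            current = []
    if current:
        entities.append(" ".join(current))
    return entities


def edit_distance(a, b):
    """Levenshtein distance, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def nearest_title(entity, titles):
    """Return the title with the smallest case-insensitive edit distance."""
    return min(titles, key=lambda t: edit_distance(entity.lower(), t.lower()))


def build_inverted_index(tweet_entities):
    """Map each entity to the set of tweet ids that mention it."""
    index = defaultdict(set)
    for tweet_id, entities in tweet_entities.items():
        for entity in entities:
            index[entity].add(tweet_id)
    return dict(index)


# Hypothetical data: a hand-tagged tweet and a toy list of Wikipedia titles.
titles = ["Obama", "Romney", "Election"]
tagged = [("Obamma", "B"), ("wins", "O"), ("the", "O"), ("election", "B")]
linked = [nearest_title(e, titles) for e in extract_entities(tagged)]
index = build_inverted_index({1: linked, 2: ["Romney", "Election"]})
```

In practice the candidate titles would come from a Wikipedia dump or API, and a distance threshold would probably be needed so that unrelated strings are not forced onto the closest title.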

Approach contd...
Then:
1. We apply k-means clustering at each level, using Jaccard similarity as the
similarity metric.
2. We take the most frequent tags in each cluster and use them to label that
cluster.
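
A single clustering level and its labelling might look like the sketch below. Since the mean of a collection of entity sets is not well defined, this example substitutes a medoid (the member most similar to the rest of its cluster) for the k-means centroid; that substitution, the initialization from the first k tweets, and all names here are assumptions for illustration, not necessarily what the team did.

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster(tweets, k, iterations=10):
    """Assign each tweet (a set of tags) to one of k clusters.

    Medoid-based variant of k-means: each cluster is represented by one of
    its own tweets. Initial medoids are the first k tweets (an assumption).
    """
    medoids = list(range(k))
    assignment = []
    for _ in range(iterations):
        # Assignment step: pick the most similar medoid for every tweet.
        assignment = [
            max(range(k), key=lambda c: jaccard(t, tweets[medoids[c]]))
            for t in tweets
        ]
        # Update step: the new medoid is the member closest to all others.
        new_medoids = []
        for c in range(k):
            members = [i for i, a in enumerate(assignment) if a == c]
            if not members:
                new_medoids.append(medoids[c])
                continue
            best = max(
                members,
                key=lambda i: sum(jaccard(tweets[i], tweets[j]) for j in members),
            )
            new_medoids.append(best)
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return assignment


def label_clusters(tweets, assignment, top_n=2):
    """Label each cluster with its top_n most frequent tags."""
    counts = {}
    for tags, c in zip(tweets, assignment):
        counts.setdefault(c, Counter()).update(tags)
    return {c: [tag for tag, _ in cnt.most_common(top_n)] for c, cnt in counts.items()}


# Hypothetical entity/verb tags for four tweets.
tweets = [
    {"obama", "vote"},
    {"romney", "debate"},
    {"obama", "vote", "poll"},
    {"romney", "debate", "tax"},
]
assignment = cluster(tweets, k=2)  # [0, 1, 0, 1]
labels = label_clusters(tweets, assignment)
```

To build a hierarchy, the same two functions could be applied again to the tweets inside each cluster, for example with verbs added to the tag sets at the lower levels.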

Conclusion
Our methods successfully cluster tweets into a semantically related hierarchy.
Our dataset was constrained to a specific domain, i.e. elections. Wiki semantic
distance might be more useful for a more diverse dataset. Future work can
experiment with different datasets to find out when wiki semantic distance
begins to significantly outperform Jaccard similarity.

Thanks!
Team 15
Garima Ahuja
Harish Kolli
Ashwin Venkatram
