SUB EVENT DETECTION
ON SOCIAL MEDIA
Kshitij Kansal	

Maaz Anwar Nomani	

Ahmed Ali Durga
Information Retrieval and Extraction
INTRODUCTION
1. Motivation	

• Social media carries a large volume of information.	

• Information is often shared on social media well before it appears on
news websites.	

• Posts capture minute details that news websites might overlook.	

• This leaves considerable scope for early news detection with
finer-grained details.	

2. Objective	

• We aim to propose an automatic method for extracting sub-events
from given social media feeds.
SUB EVENT
• What is a Sub Event?	

    • Any piece of information small enough to be conveyed as part of
      the whole event,	

    • yet large enough to affect an appreciably large community of readers.	

    • Includes the aftermath of an event, real-time notifications, responses,
      public sentiment and reports.	

• Why Sub Event?	

    • Closely related to a particular community.	

    • Can be used to enhance knowledge of an event.	

    • Can measure public sentiment over the whole course of
      the event.
OUR EXPERIMENT
• Detecting the "sub-events" in the Twitter stream related to the
US Presidential Elections.
• Main Event: US Presidential Elections and the victory of
Barack Obama.
• Sub Events: victories or defeats of some famous candidates,
public sentiment over the course of the elections, changes in the
stock market as the trends start to pour in, etc.
• The approach is not specific to this dataset. It can be applied
to any dataset in the form of a Twitter stream.
APPROACH
We followed an organised approach, dividing the whole process
into the following three parts, which were handled separately
and later integrated.	

• Tweet filtering and Noise Reduction	

• Sub Event Detection	

• Sub Event Summarization
TWEET FILTERING AND NOISE REDUCTION
Aim: To eliminate uninformative tweets that convey little
information about the event.
• The provided tweet stream is cleaned using a self-defined filter.
• The filter takes into account linguistic aspects of the language and
context filtering.
• Remove diacritic marks
• Consider only ASCII characters
• Ignore repetitions
• Ignore multiple punctuation marks
• Consider only tweets starting with a capital letter
• Remove extremely short and extremely long tweets
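The filter rules above could be sketched roughly as follows. This is a minimal illustration, not the authors' filter: the length bounds `MIN_LEN` and `MAX_LEN`, the reading of "repetitions" as repeated characters, and the function names are all assumptions made for the example.

```python
import re
import unicodedata

MIN_LEN, MAX_LEN = 20, 140  # assumed length bounds, in characters


def strip_diacritics(text):
    """Decompose accented characters and drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def clean_tweet(text):
    """Return the cleaned tweet, or None if the filter rejects it."""
    text = strip_diacritics(text)
    # Keep only ASCII characters.
    text = text.encode("ascii", "ignore").decode("ascii")
    # Collapse runs of three or more identical letters ("Sooooo" -> "Soo").
    text = re.sub(r"([A-Za-z])\1{2,}", r"\1\1", text)
    # Collapse repeated punctuation ("!!!" -> "!").
    text = re.sub(r"([!?.,;:])\1+", r"\1", text)
    text = re.sub(r"\s+", " ", text).strip()
    # Keep only tweets that start with a capital letter.
    if not text or not text[0].isupper():
        return None
    # Drop extremely short and extremely long tweets.
    if not MIN_LEN <= len(text) <= MAX_LEN:
        return None
    return text
```

A tweet either comes back cleaned or is rejected with `None`, so the filtered stream is simply every non-`None` result.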
SUB EVENT EXTRACTION
Aim: To extract tweets that express defining moments in the
event.
• Applied to the filtered stream produced by the noise
reduction module.
• Build a dictionary of the tweets' words and generate a vector for each tweet.
• Find the distance between the tweets.
• Group similar tweets together.
• Chunks of related tweets form the sub-events.
• Hash the tweet stream to increase the speed of the system.
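The slides do not say which grouping procedure was used. As one possible reading of "group similar tweets together", the sketch below uses a greedy single-pass grouping: each tweet joins the first group whose representative it resembles, otherwise it starts a new group. The names `group_tweets` and `shares_half`, and the representation of a tweet as a set of words, are assumptions for the example.

```python
def shares_half(a, b):
    """Assumed similarity test: shared words exceed half of the shorter tweet."""
    return bool(a and b) and len(a & b) > 0.5 * min(len(a), len(b))


def group_tweets(word_sets, is_similar=shares_half):
    """Greedy single-pass grouping; returns lists of tweet indices."""
    clusters = []  # each entry: (representative word set, [member indices])
    for idx, words in enumerate(word_sets):
        for representative, members in clusters:
            if is_similar(representative, words):
                members.append(idx)
                break
        else:
            # No existing group matched, so this tweet starts a new one.
            clusters.append((words, [idx]))
    return [members for _, members in clusters]
```

Each returned list of indices is a candidate sub-event. The first tweet of a group acts as its representative, which keeps the sketch short but makes the result depend on stream order.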
EXTRACTION ...
Dictionary Creation and Vector Generation
• Dictionary Creation:
    • Bag-of-words representation.
    • Stop word removal.
    • Assign a unique ID to each word.
• Vector Generation:
    • Create an n-dimensional vector for each tweet,
      where n is the number of words in the dictionary.
    • Vector value = 1 if the word is present
    • Vector value = 0 if the word is not present
    • Store the vector sparsely to save space.
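A small sketch of the dictionary and vector steps, assuming whitespace tokenization and a tiny illustrative stop-word list (neither is specified in the slides). The sparse vector is stored as the set of word IDs whose value is 1.

```python
# Tiny illustrative stop-word list; a real list would be much longer.
STOP_WORDS = {"a", "an", "the", "is", "in", "of", "to", "and", "for", "on"}


def tokenize(tweet):
    """Lowercase, split on whitespace, strip punctuation, drop stop words."""
    words = (w.strip(".,!?;:\"'()").lower() for w in tweet.split())
    return [w for w in words if w and w not in STOP_WORDS]


def build_dictionary(tweets):
    """Assign a unique integer ID to every distinct word."""
    dictionary = {}
    for tweet in tweets:
        for word in tokenize(tweet):
            dictionary.setdefault(word, len(dictionary))
    return dictionary


def to_sparse_vector(tweet, dictionary):
    """Binary vector stored sparsely as the set of IDs whose value is 1."""
    return {dictionary[w] for w in tokenize(tweet) if w in dictionary}


def to_dense_vector(sparse, n):
    """Expand a sparse vector to the full n-dimensional 0/1 list."""
    return [1 if i in sparse else 0 for i in range(n)]
```

Because a tweet contains only a handful of the n dictionary words, the set form holds a few integers per tweet instead of n mostly-zero entries.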
EXTRACTION ...
Distance and Similarity Measures
• Euclidean Distance:
    • Straight-line distance between the tweet vectors.
    • Equivalent to finding the distance between points in n-dimensional space,
      n being the size of the tweet dictionary.
• Similarity Measure:
    • Count the number of words the two tweets share.
    • If the count is greater than a threshold, treat the tweets as similar.
    • Threshold (in our case): 50% of the length of the shorter tweet.
    • Tying the threshold to tweet length acts as normalization.
• Cosine Similarity:
    • Similar to the method above.
    • Also takes tweet length into account, i.e. normalization.
    • Works by finding the angle between the two tweet vectors.
    • Tweets are treated as points in n-dimensional space.
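The three measures can be written compactly when each tweet is a binary vector stored as a set of word IDs. This is a sketch under that assumption; the function names are hypothetical, and "length" here means the number of distinct dictionary words in the tweet.

```python
import math


def euclidean_distance(a, b):
    """For 0/1 vectors the squared distance is the number of differing positions."""
    return math.sqrt(len(a ^ b))


def overlap_similar(a, b, ratio=0.5):
    """True if the shared words exceed `ratio` of the shorter tweet's length."""
    if not a or not b:
        return False
    return len(a & b) > ratio * min(len(a), len(b))


def cosine_similarity(a, b):
    """Cosine of the angle between two 0/1 vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))
```

For binary vectors the dot product is the number of shared words and each norm is the square root of the tweet's word count, which is why the cosine reduces to the expression above.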
EXTRACTION ...
Hashing
• Increases the speed of the retrieval module
• Locality Sensitive Hashing
    • Dimensionality reduction of high-dimensional data
    • Maximizes the probability that similar tweets collide.
• PyLucene
    • Python extension for using Java Lucene
    • Apache Lucene is a free, open-source
      information retrieval software library
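The slides do not name the hash family used for Locality Sensitive Hashing. One common choice that pairs with cosine similarity is random-hyperplane hashing, sketched below as an assumption rather than the authors' implementation. Each tweet vector is reduced to a short bit signature, and tweets with the same signature land in the same bucket, so only tweets within a bucket need to be compared.

```python
import random
from collections import defaultdict


def make_hyperplanes(n_dims, n_bits, seed=0):
    """Draw n_bits random hyperplanes in n_dims-dimensional space."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(n_dims)] for _ in range(n_bits)]


def signature(sparse_vec, hyperplanes):
    """One bit per hyperplane: which side of it the tweet vector falls on."""
    return tuple(
        1 if sum(plane[i] for i in sparse_vec) >= 0 else 0
        for plane in hyperplanes
    )


def bucket_tweets(sparse_vecs, hyperplanes):
    """Map each signature to the indices of the tweets that hash to it."""
    buckets = defaultdict(list)
    for idx, vec in enumerate(sparse_vecs):
        buckets[signature(vec, hyperplanes)].append(idx)
    return dict(buckets)
```

Vectors separated by a small angle agree on each bit with high probability, so similar tweets tend to collide. The number of bits trades bucket purity against the chance of splitting a true match, and would need tuning on real data.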
SUMMARIZATION
• Related tweets are extracted and stored in separate files.
• The sub-event still needs to be extracted from these related tweets.
• Some kind of summarization of the collected tweets is required.
• The summary needs to be in human-readable form.
• It should be able to convey what happened in the sub-event.
• If possible, crawl data from the URLs in the tweets and use it for
summarization.
• Image support will increase its attractiveness and user
acceptability.
SUMMARIZATION ...
• Important for end-user evaluation.
• Thus, summarization forms the crux of the content defined by a
sub-event.
• Two approaches to automatic summarization:
    • Extraction: works by selecting a subset of existing words,
      phrases, or sentences in the original text to form the summary.
    • Abstraction: builds an internal semantic representation and
      then uses natural language generation techniques to create a
      summary that is closer to what a human might generate.
SUMMARIZATION ...
• The Spanning Phrase approach is used.
• Takes into account the most frequent words in the
cluster of tweets and clubs them together.
• Choose two to be the maximum frequency of a word
'w' occurring in all the tweets.
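The slide gives too little detail to reconstruct the Spanning Phrase procedure itself. As a simpler stand-in built on the same idea of using the most frequent words in a cluster, the sketch below picks the tweet that covers the most of those words as an extractive summary. The function names and the default `k` are assumptions, and stop words are not removed here.

```python
from collections import Counter


def top_words(tweets, k=5):
    """The k most frequent words across the cluster."""
    counts = Counter(w for t in tweets for w in t.lower().split())
    return [w for w, _ in counts.most_common(k)]


def pick_summary(tweets, k=5):
    """Return the tweet that contains the most of the cluster's top-k words."""
    frequent = set(top_words(tweets, k))
    return max(tweets, key=lambda t: len(frequent & set(t.lower().split())))
```

Selecting an existing tweet keeps the summary human-readable without any text generation, which corresponds to the extraction approach rather than the abstraction approach.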

Group-13 Project 15 Sub event detection on social media
