SocialLda


saurabh_IIITH


Topic Modelling in Social Media
Group 28 Project 06

Group Members
● Prateek Mehta (201203006)
prateek.mehta@students.iiit.ac.in
● Saurabh Kanaujia (201305551)
saurabh.kanaujia@students.iiit.ac.in
● Nikita Nataraj (201101079)
nikita.nataraj@students.iiit.ac.in

Aim
● Apply topic-modelling techniques to social
media.
● Main focus: reduce the cost of computing LDA
models in social networks, using a technique
that scales.
● Efficient representation and computation of
topics for the whole network.

Introduction
Topical categorization of blogs, documents, or other objects that can be
tagged with text improves the experience for end users. When the set of
documents is very large and varies significantly from user to user,
calculating a single global topic model, or an individual topic model for
every user, can become very expensive in large-scale internet settings. To
implement topic modelling, we used LDA. Latent Dirichlet allocation (LDA) is
an unsupervised, probabilistic text-clustering algorithm. It defines a
generative model of how documents are produced, given a set of topics and
the words in each topic. We chose LDA because it is well suited to modelling
a more human-like corpus, in other words social media.
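The generative story that LDA assumes can be sketched in a few lines. This is only an illustration of the model described above, not the project's code: the vocabulary, the two hand-written topics, the Dirichlet parameter, and the helper names sample_dirichlet and generate_document are all hypothetical.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalising independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(topic_word, alpha, vocab, length, rng):
    """Generate one document by following the LDA generative process."""
    # 1. Draw the document's topic mixture theta ~ Dirichlet(alpha).
    theta = sample_dirichlet(alpha, rng)
    words = []
    for _ in range(length):
        # 2. Choose a topic for this word position according to theta.
        topic = rng.choices(range(len(theta)), weights=theta)[0]
        # 3. Choose a word from that topic's word distribution.
        words.append(rng.choices(vocab, weights=topic_word[topic])[0])
    return theta, words


# Hypothetical toy vocabulary and topics, for illustration only.
VOCAB = ["goal", "match", "team", "vote", "party", "election"]
TOPIC_WORD = [
    [0.4, 0.3, 0.3, 0.0, 0.0, 0.0],  # a sports-like topic
    [0.0, 0.0, 0.0, 0.4, 0.3, 0.3],  # a politics-like topic
]

rng = random.Random(0)
theta, doc = generate_document(TOPIC_WORD, [0.5, 0.5], VOCAB, 8, rng)
print(theta, doc)
```

Fitting LDA runs this story in reverse: given only the observed words, it infers the topic-word distributions and each document's mixture theta.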

Possible Approaches
1. Build an LDA model for each user in the network (very costly).
2. Find the top K influential users and build an LDA model
for each of them.
3. Detect communities and apply the LDA model
across communities.
We tried to implement Approaches 2 and 3.

Approach No. 3 Drawbacks
● The community detection is based on a bi-directional
follower-followee relationship. Only 22-23% of Twitter
users have such a relationship, in which they follow
each other.
● Finding communities based on a uni-directional
follower-followee relationship was not feasible in a
scalable way.
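To make the first drawback concrete, the share of users with at least one mutual follow can be measured directly from the follower graph. The sketch below assumes edges are stored as (follower, followee) pairs; the function name and the toy edge set are hypothetical, and the source does not say how its 22-23% figure was computed.

```python
def reciprocal_user_fraction(follows):
    """Return the share of users who have at least one mutual follow.

    `follows` is a set of (follower, followee) pairs.
    """
    users = {user for edge in follows for user in edge}
    # A user counts as reciprocal if some edge they start is also reversed.
    mutual = {a for (a, b) in follows if a != b and (b, a) in follows}
    return len(mutual) / len(users) if users else 0.0


# Hypothetical toy graph: only "a" and "b" follow each other.
EDGES = {("a", "b"), ("b", "a"), ("c", "a"), ("d", "a"), ("d", "c")}
fraction = reciprocal_user_fraction(EDGES)
print(fraction)
```

Community detection restricted to mutual edges would see only the users counted here and leave the rest unassigned.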

Approach No. 2
Phase 1: Finding Influential Users
● Top-k users found with the PageRank algorithm via the
GraphChi API.
● Fetched their tweets and the URLs embedded in them.
Metadata, tags, and IDs are also fetched.
● Crawled the URLs and summarized them.
● Tweet documents + URL summaries used as training data.
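The ranking step in Phase 1 can be illustrated with a plain power-iteration PageRank. This is a minimal in-memory sketch, not the GraphChi API the project used: the damping factor, iteration count, function names, and toy follower graph are assumptions chosen for illustration.

```python
def pagerank(follows, damping=0.85, iterations=50):
    """Power-iteration PageRank over (follower, followee) edges."""
    nodes = sorted({user for edge in follows for user in edge})
    out_links = {n: [] for n in nodes}
    for follower, followee in follows:
        out_links[follower].append(followee)

    count = len(nodes)
    rank = {n: 1.0 / count for n in nodes}
    for _ in range(iterations):
        # Rank held by users who follow nobody is spread over all users.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        base = (1.0 - damping) / count + damping * dangling / count
        new_rank = {n: base for n in nodes}
        for n in nodes:
            for target in out_links[n]:
                # Each follow passes an equal share of the follower's rank.
                new_rank[target] += damping * rank[n] / len(out_links[n])
        rank = new_rank
    return rank


def top_k_users(follows, k):
    """Return the k highest-ranked users, ties broken by name."""
    rank = pagerank(follows)
    return sorted(rank, key=lambda n: (-rank[n], n))[:k]


# Hypothetical toy follower graph.
FOLLOWS = [
    ("bob", "alice"), ("carol", "alice"), ("dave", "alice"),
    ("alice", "bob"), ("dave", "carol"),
]
top = top_k_users(FOLLOWS, 2)
print(top)
```

An edge points from follower to followee, so rank flows towards users who are followed by many (or by highly ranked) accounts. A disk-based engine such as GraphChi is what makes the same computation practical on a full Twitter-scale graph.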

Approach No. 2
Phase 2: User Similarity
● Tweets and URLs are fetched. Each URL is summarised to
15-20 sentences.
● The Jaccard index is calculated to match the user with
one of the top users.
● The maximum Jaccard index determines the match: the user
adopts the topic distribution of the corresponding top user.
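The matching step can be sketched as follows. The Jaccard index of two sets A and B is |A ∩ B| / |A ∪ B|. The slides do not state which sets are compared, so this sketch assumes word sets built from each user's tweets and URL summaries; the function names and sample data are hypothetical.

```python
def jaccard(a, b):
    """Jaccard index |A & B| / |A | B| of two collections of tokens."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def closest_top_user(user_tokens, top_user_tokens):
    """Return (name, score) of the top user with the highest Jaccard index."""
    scored = (
        (name, jaccard(user_tokens, tokens))
        for name, tokens in top_user_tokens.items()
    )
    return max(scored, key=lambda pair: pair[1])


# Hypothetical word sets for two top users and one ordinary user.
TOP_USERS = {
    "sports_fan": {"goal", "match", "team", "league"},
    "politics_fan": {"vote", "party", "election", "policy"},
}
match, score = closest_top_user({"team", "goal", "ticket"}, TOP_USERS)
print(match, score)
```

The ordinary user then reuses the LDA topic distribution already computed for the matched top user, which is what avoids training one model per user.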

Conclusion
Of the three proposed approaches, we settled on
the second, in which we define the top 100 users
and create an LDA model for each.



8. Approach No.2 Phase 1: Diagram


10. Approach No. 2 Phase 2: Diagram
