SlideShare ist ein Scribd-Unternehmen logo
1 von 46
Behavior
Analytics
SOCIAL
MEDIA
MINING
2Social Media Mining Measures and Metrics 2Social Media Mining Behavior Analyticshttp://socialmediamining.info/
Dear instructors/users of these slides:
Please feel free to include these slides in your own
material, or modify them as you see fit. If you decide
to incorporate these slides into your presentations,
please include the following note:
R. Zafarani, M. A. Abbasi, and H. Liu, Social Media Mining:
An Introduction, Cambridge University Press, 2014.
Free book and slides at http://socialmediamining.info/
or include a link to the website:
http://socialmediamining.info/
3Social Media Mining Measures and Metrics 3Social Media Mining Behavior Analyticshttp://socialmediamining.info/
Examples of Behavior Analytics
• What motivates users to
join an online group?
• When users abandon
social media sites, where
do they migrate to?
• Can we predict box
office revenues for
movies from tweets?
4Social Media Mining Measures and Metrics 4Social Media Mining Behavior Analyticshttp://socialmediamining.info/
Behavior Analysis
• To answers these questions we need to analyze or
predict behaviors on social media.
• Users exhibit different behaviors on social media:
– As individuals, or
– As part of a broader collective behavior.
• When discussing individual behavior,
– Our focus is on one individual.
• Collective behavior emerges when a population of
individuals behave in a similar way with or
without coordination or planning.
5Social Media Mining Measures and Metrics 5Social Media Mining Behavior Analyticshttp://socialmediamining.info/
Our Goal
To analyze, model, and
predict individual and
collective behavior
6Social Media Mining Measures and Metrics 6Social Media Mining Behavior Analyticshttp://socialmediamining.info/
Individual Behavior
7Social Media Mining Measures and Metrics 7Social Media Mining Behavior Analyticshttp://socialmediamining.info/
Types of Individual Behavior
• User-User
(link generation)
– befriending, sending a
message, playing games,
following, or inviting
• User-Community
– joining or leaving a
community, participating
in community discussions
• User-Entity
(content generation)
– writing a post
– posting a photo
8Social Media Mining Measures and Metrics 8Social Media Mining Behavior Analyticshttp://socialmediamining.info/
I. Individual Behavior
Analysis
9Social Media Mining Measures and Metrics 9Social Media Mining Behavior Analyticshttp://socialmediamining.info/
Example: Community Membership in Social Media
• Why do users join communities?
– Communities can be implicit:
• Individuals buying a product as a community, and
• People buying the product for the first time as individuals joining the community.
– What factors affect the community-joining behavior of individuals?
• We can observe users who join communities
– Determine factors that are common among them
• To observe users, we require
– A population of users,
– A community 𝐶, and
– Community membership info (users who are members of 𝐶)
• To distinguish between users who have already joined the community
and those who are now joining it,
– We need community memberships at two times 𝑡1 and 𝑡2, with 𝑡2 > 𝑡1
– At 𝑡2, we find users who are members of the community, but were not members at 𝑡1
• These new users form the subpopulation that is analyzed for community-joining behavior.
10Social Media Mining Measures and Metrics 10Social Media Mining Behavior Analyticshttp://socialmediamining.info/
Community Membership in Social Media
Hypothesis:
– individuals are inclined toward
an activity when their friends
are engaged in the same
activity.
• A factor that plays a role in
users joining a community is
the number of their friends
who are already members of
the community.
• In data mining terms,
– number of friends of an
individual in a community
– A feature to predict whether
the individual joins the
community (i.e., class
attribute).
Number of Friends
vs
Probability of Joining
a Community
Backstrom, L., Huttenlocher, D., Kleinberg, J., & Lan, X. (2006, August). Group formation in large social
networks: membership, growth, and evolution. In Proceedings of the 12th ACM SIGKDD international
conference on Knowledge discovery and data mining (pp. 44-54). ACM.
11Social Media Mining Measures and Metrics 11Social Media Mining Behavior Analyticshttp://socialmediamining.info/
Even More Features
12Social Media Mining Measures and Metrics 12Social Media Mining Behavior Analyticshttp://socialmediamining.info/
Feature Importance Analysis
Which feature can help best determine whether
individuals will join or not?
I. We can use any feature selection algorithm, or
II. We can use a classification algorithm, such as
decision tree learning
– Most important Features are ranked higher
13Social Media Mining Measures and Metrics 13Social Media Mining Behavior Analyticshttp://socialmediamining.info/
Decision Tree for Joining a Community
Are these features well-designed?
– We can evaluate using classification performance metrics
14Social Media Mining Measures and Metrics 14Social Media Mining Behavior Analyticshttp://socialmediamining.info/
Behavior Analysis Methodology
• An observable behavior
– The behavior needs to be observable
– E.g., accurately observing the joining of individuals (and possibly their
joining times)
• Features:
– Finding data features (covariates) that may or may not affect (or be
affected by) the behavior
– We need a domain expert for this step
• Feature-Behavior Association:
– Find the relationship between features and behavior
– E.g., use decision tree learning
• Evaluation:
– The findings are due to the features and not to externalities.
– E.g., we can use
• classification accuracy
• randomization tests (discussed later!)
• or causality testing algorithms
15Social Media Mining Measures and Metrics 15Social Media Mining Behavior Analyticshttp://socialmediamining.info/
Granger Causality
Consider a linear regression model
• We can predict 𝒀𝒕 + 𝟏 by using either 𝒀 𝟏, 𝒀 𝟐 … 𝒀𝒕 or a
combination of 𝑿 𝟏, 𝑿 𝟐 … 𝑿𝒕 and 𝒀 𝟏, 𝒀 𝟐 … 𝒀𝒕
• If 𝜺 𝟐 < 𝜺 𝟏then 𝑿 Granger Causes 𝒀
• Why is this not causality?
16Social Media Mining Measures and Metrics 16Social Media Mining Behavior Analyticshttp://socialmediamining.info/
II. Individual Behavior
Modeling
17Social Media Mining Measures and Metrics 17Social Media Mining Behavior Analyticshttp://socialmediamining.info/
Individual Behavior Modeling
• Models in
– Economics, Game Theory, and Network Science
We can use:
1. Threshold Models: we need to learn thresholds and weights
• 𝑊𝑖𝑗 can be defined as the fraction of times user 𝑖 buys a product and
user 𝑗 buys the same product soon after that
– When is soon?
– Similarly, thresholds can be estimated by taking into account
the average number of friends who need to buy a product
before user 𝑖 decides to buy it.
– What if friends don’t buy the same products?
• We can find the most similar individuals or items (similar to
collaborative filtering methods)
2. Cascade Models
18Social Media Mining Measures and Metrics 18Social Media Mining Behavior Analyticshttp://socialmediamining.info/
III. Individual Behavior
Prediction
19Social Media Mining Measures and Metrics 19Social Media Mining Behavior Analyticshttp://socialmediamining.info/
Individual Behavior Prediction
• Most behaviors result in newly formed links in
social media.
– It can be a link to a user, as in befriending behavior;
– A link to an entity, as in buying behavior; or
– A link to a community, as in joining behavior.
• We can formulate many of these behaviors as a
link prediction problem.
20Social Media Mining Measures and Metrics 20Social Media Mining Behavior Analyticshttp://socialmediamining.info/
Link Prediction - Setup
• Given a graph 𝐺(𝑉, 𝐸), let 𝑒(𝑢, 𝑣) denote edge between nodes 𝑢 and 𝑣
– 𝑡(𝑒) denotes the time that the edge was formed
• Let 𝐺[𝑡1, 𝑡2] represent the subgraph of 𝐺 such that all edges are
created between 𝑡1 and 𝑡2
– i.e., for all edges 𝑒 in this subgraph, 𝑡1 < 𝑡(𝑒) < 𝑡2.
• Given four time stamps 𝑡11 < 𝑡12 < 𝑡21 < 𝑡22 a link prediction
algorithm is given
– The subgraph G(𝑡11, 𝑡12) (training interval) and
– Is expected to predict edges in G(𝑡21, 𝑡22) (testing interval).
• We can only predict edges for nodes that exist in the training period
• Let G(𝑉𝑡𝑟𝑎𝑖𝑛, 𝐸𝑡𝑟𝑎𝑖𝑛) be our training graph. Then, a link prediction
algorithm generates a sorted list of most probable edges in
21Social Media Mining Measures and Metrics 21Social Media Mining Behavior Analyticshttp://socialmediamining.info/
Link Prediction - Algorithms
• Assign 𝜎(𝑥, 𝑦) to every edge 𝑒(𝑥, 𝑦)
• Edges sorted by this value in decreasing order
will form our ranked list of predictions
• Any similarity measure between two nodes
can be used for link prediction;
– Network measures (Chapter 3) are useful here.
• We will review some well-known methods
– Node Neighborhood-Based Methods
– Path-Based Methods
22Social Media Mining Measures and Metrics 22Social Media Mining Behavior Analyticshttp://socialmediamining.info/
Node Neighborhood-Based
Methods
23Social Media Mining Measures and Metrics 23Social Media Mining Behavior Analyticshttp://socialmediamining.info/
Node Neighborhood-Based Methods
• Common Neighbors:
– The more common neighbors that two nodes
share, the more similar they are
• Jaccard Similarity:
– The likelihood of a node that is a neighbor of either
𝑥 or 𝑦 to be a common neighbor
24Social Media Mining Measures and Metrics 24Social Media Mining Behavior Analyticshttp://socialmediamining.info/
Node Neighborhood-Based Methods
• Adamic-Adar:
– If two individuals share a neighbor
– and that neighbor is a rare neighbor,
– it should have a higher impact on their similarity.
• Preferential Attachment:
– Nodes of higher degree have a higher chance of
getting connected to incoming nodes
Rareness
25Social Media Mining Measures and Metrics 25Social Media Mining Behavior Analyticshttp://socialmediamining.info/
Example
Compute the edge score for edge (5,7)
26Social Media Mining Measures and Metrics 26Social Media Mining Behavior Analyticshttp://socialmediamining.info/
Path-Based Methods
27Social Media Mining Measures and Metrics 27Social Media Mining Behavior Analyticshttp://socialmediamining.info/
Path-Based Measures
• Katz Measure:
– |𝑝𝑎𝑡ℎ𝑠 𝑥,𝑦
<𝑙>|denotes the number of paths of length 𝑙
between 𝑥 and 𝑦
– 𝛽 is a constant that exponentially damps longer paths
• When 𝛽 is small, Katz measure reduces to common neighbor
– It can be reformulated in closed form as
28Social Media Mining Measures and Metrics 28Social Media Mining Behavior Analyticshttp://socialmediamining.info/
Path-Based Measures
• Hitting Time and Commute Time:
– Consider a random walk that starts at node 𝑥 and
moves to adjacent nodes uniformly.
– Hitting time is the expected number of random
walk steps needed to reach 𝑦 starting from 𝑥.
– A smaller hitting time implies a higher similarity
• A negation can turn it into a similarity measure
– If 𝒚 is highly connected random walks are more
likely to visit 𝒚
• We can normalize it using the stationary probability
29Social Media Mining Measures and Metrics 29Social Media Mining Behavior Analyticshttp://socialmediamining.info/
Path-Based Measures
– Hitting time is not symmetric, we can use commute
time instead, or its normalized version
• Rooted PageRank: the stationary probability of
𝑦, when at each random walk run you can jump
to 𝑥 with probability 𝑃 and to a random node
with 1 − 𝑃
• SimRank: Recursive definition of similarity
30Social Media Mining Measures and Metrics 30Social Media Mining Behavior Analyticshttp://socialmediamining.info/
Link Prediction - Evaluation
• After one of the aforementioned measures is selected, a
list of the top most similar pairs of nodes are selected.
• These pairs of nodes denote edges predicted to be the
most likely to soon appear in the network.
• Performance (precision, recall, or accuracy) can be
evaluated using the testing graph and by comparing the
number of the testing graph’s edges that the link
prediction algorithm successfully reveals.
• Performance is usually very low, since many edges are
created due to reasons not solely available in a social
network graph.
– Solution: a common baseline is to compare the performance
with random edge predictors and report the factor
improvements over random prediction.
31Social Media Mining Measures and Metrics 31Social Media Mining Behavior Analyticshttp://socialmediamining.info/
Collective Behavior
32Social Media Mining Measures and Metrics 32Social Media Mining Behavior Analyticshttp://socialmediamining.info/
Collective Behavior
• First Defined by sociologist Robert Park
• Collective Behavior: A group of
individuals behaving in a similar way
• It can be planned and coordinated, but
often is spontaneous and unplanned
Examples
• Individuals standing in line for a new product
release
• Posting messages online to support a cause
or to show support for an individual
33Social Media Mining Measures and Metrics 33Social Media Mining Behavior Analyticshttp://socialmediamining.info/
I. Collective Behavior
Analysis
34Social Media Mining Measures and Metrics 34Social Media Mining Behavior Analyticshttp://socialmediamining.info/
Collective Behavior Analysis
• We can analyze collective behavior by
analyzing individuals performing the behavior
• We can then put together the results of these
analyses
• The result would be the expected behavior for a
large population
• OR, we can analyze the population as a whole
• Not very popular for analysis, as individuals are
ignored
• Popular for Prediction purposes
35Social Media Mining Measures and Metrics 35Social Media Mining Behavior Analyticshttp://socialmediamining.info/
Example – Analyzing User Migrations
Users migrate in social media due to their limited time and
resources
• Sites are interested in keeping their users, because they are
valuable assets that help contribute to their growth and generate
revenue by increasing traffic
Two types of migrations:
• Site migration: For any user who is a member of two sites
𝑆1 and 𝑆2 at time 𝑡𝑖, and is only a member of 𝑆2 at time 𝑡𝑗 >
𝑡𝑖, then the user is said to have migrated from site 𝑆1 to 𝑆2.
• Attention Migration: For any user who is a member of
two sites 𝑆1 and 𝑆2 and is active at both at time 𝑡𝑖, if the
user becomes inactive on 𝑆1 and remains active on 𝑆2 at
time 𝑡𝑗 > 𝑡𝑖, then the user’s attention is said to have
migrated away from site 𝑆1 and toward site 𝑆2.
36Social Media Mining Measures and Metrics 36Social Media Mining Behavior Analyticshttp://socialmediamining.info/
Collective Behavior Analysis - Example
• Activity (or inactivity) of a user can be determined by
observing the user’s actions performed on the site.
• We can consider a user active in [𝑡, 𝑡 + 𝑋 ], if the user has
performed at least one action on the site during this period
– Otherwise, the user is considered inactive.
• The interval could be measured at different granularity levels
– E.g., days, weeks, months, and years.
– It is common to set 𝑋 = 1 month.
• We can analyze migrations of individuals and then measure
the rate at which the populations are migrating across sites.
– We can use the methodology for individual behavior analysis
37Social Media Mining Measures and Metrics 37Social Media Mining Behavior Analyticshttp://socialmediamining.info/
The Observable Behavior
• Sites migration is rarely
observed
• Attention migration is
clearly observable
• We need to take
multiple steps to
observe it:
• Users are required to be
identified on multiple
networks (challenging!)
• Some ideas:
John.Smith1 on
Facebook is JohnSmith
on Twitter
38Social Media Mining Measures and Metrics 38Social Media Mining Behavior Analyticshttp://socialmediamining.info/
Features
• User Activity: more active users are less likely to
migrate
• e.g., number of tweets, posts, or photos
• User Network Size: a user with more social ties
(i.e., friends) in a social network is less likely to move
• e.g., number of friends
• User Rank: a user with high status in a network is
less likely to move to a new one where he or she
must spend more time getting established.
• e.g., centrality scores
• External rank: your citations, how many have referred
to your article, ...
39Social Media Mining Measures and Metrics 39Social Media Mining Behavior Analyticshttp://socialmediamining.info/
• Given two snapshots of a network, we know if users
migrated or not.
• Let vector Y ∈ ℝ 𝑛 indicate whether any of our 𝑛 users
have migrated or not.
• Let 𝑋𝑡 ∈ ℝ3×𝑛 be the features collected (activity, friends,
rank) for any one of these users at time stamp 𝑡.
• The correlation between features 𝑋 and labels 𝑌 can be
computed via logistic regression.
• How can we verify that this correlation is not random?
Feature-Behavior Association
40Social Media Mining Measures and Metrics 40Social Media Mining Behavior Analyticshttp://socialmediamining.info/
To verify if the correlation between features and
the migration behavior is not random
• We can construct a random set of migrating users
• compute 𝑋 𝑅𝑎𝑛𝑑𝑜𝑚 and 𝑌𝑅𝑎𝑛𝑑𝑜𝑚for them
• Find the correlation between these random
variables (e.g., regression coefficients) and it
should be significantly different from what we
obtained using real-world observations
We can use 𝜒2
(Chi-square) test for significance
testing
Evaluation Strategy
From Original Dataset
From Random Dataset
41Social Media Mining Measures and Metrics 41Social Media Mining Behavior Analyticshttp://socialmediamining.info/
II. Collective Behavior
Modeling
42Social Media Mining Measures and Metrics 42Social Media Mining Behavior Analyticshttp://socialmediamining.info/
• Collective behavior can be conveniently
modeled using some of the techniques
discussed in Chapter 4 - Network Models.
• We want models that can mimic
characteristics observable in the population.
• In network models, node properties rarely
play a role
– Reasonable for modeling collective behavior.
Collective Behavior Modeling
43Social Media Mining Measures and Metrics 43Social Media Mining Behavior Analyticshttp://socialmediamining.info/
III. Collective Behavior
Prediction
44Social Media Mining Measures and Metrics 44Social Media Mining Behavior Analyticshttp://socialmediamining.info/
Collective Behavior Prediction
• From previous chapters, we could use
• Linear Influence Model (LIM)
• Epidemic Models
• Collective behavior can be analyzed either in terms of
1. individuals performing the collective behavior or
2. based on the population as a whole. (More Common)
• When predicting collective behavior,
• We are interested in predicting the intensity of a phenomenon, which is due to the
collective behavior of the population
• e.g., how many of them will vote?
• We can utilize a data mining approach where features that describe the
population well are used to predict a response variable
• i.e., the intensity of the phenomenon
• A training-testing framework or correlation analysis is used to
determine the generalization and the accuracy of the predictions.
45Social Media Mining Measures and Metrics 45Social Media Mining Behavior Analyticshttp://socialmediamining.info/
1. Set the target variable that is being predicted
– In our example: the revenue that a movie produces.
– The revenue is the direct result of the collective behavior of going to the
theater to watch the movie.
2. Identify features in the population that may affect the target
variable
– the average hourly number of tweets related to the movie for each of the
seven days prior to the movie opening (seven features)
– The number of opening theaters for the movie (one feature).
3. Predict the target variable using a supervised learning approach,
utilizing the features determined in step 2.
4. Measure performance using supervised learning evaluation.
The predictions using this approach are closer to reality than that of the
Hollywood Stock Exchange (HSX), which is the gold standard for predicting
revenues for movies
Predicting Box Office Revenue for Movies
46Social Media Mining Measures and Metrics 46Social Media Mining Behavior Analyticshttp://socialmediamining.info/
• Target variable 𝒚
• Some feature 𝐴 that quantifies the attention
• Some feature 𝑃 that quantifies the publicity
• Train a regression model
Generalizing the Idea

Weitere ähnliche Inhalte

Was ist angesagt?

Social Media Mining - Chapter 5 (Data Mining Essentials)
Social Media Mining - Chapter 5 (Data Mining Essentials)Social Media Mining - Chapter 5 (Data Mining Essentials)
Social Media Mining - Chapter 5 (Data Mining Essentials)SocialMediaMining
 
Social Network Analysis Introduction including Data Structure Graph overview.
Social Network Analysis Introduction including Data Structure Graph overview. Social Network Analysis Introduction including Data Structure Graph overview.
Social Network Analysis Introduction including Data Structure Graph overview. Doug Needham
 
CS6010 Social Network Analysis Unit V
CS6010 Social Network Analysis Unit VCS6010 Social Network Analysis Unit V
CS6010 Social Network Analysis Unit Vpkaviya
 
Network centrality measures and their effectiveness
Network centrality measures and their effectivenessNetwork centrality measures and their effectiveness
Network centrality measures and their effectivenessemapesce
 
Community Detection in Social Media
Community Detection in Social MediaCommunity Detection in Social Media
Community Detection in Social MediaSymeon Papadopoulos
 
Community detection algorithms
Community detection algorithmsCommunity detection algorithms
Community detection algorithmsAlireza Andalib
 
Social Recommender Systems
Social Recommender SystemsSocial Recommender Systems
Social Recommender Systemsguest77b0cd12
 
Social network analysis intro part I
Social network analysis intro part ISocial network analysis intro part I
Social network analysis intro part ITHomas Plotkowiak
 
Data Mining: Graph mining and social network analysis
Data Mining: Graph mining and social network analysisData Mining: Graph mining and social network analysis
Data Mining: Graph mining and social network analysisDataminingTools Inc
 
Social network analysis course 2010 - 2011
Social network analysis course 2010 - 2011Social network analysis course 2010 - 2011
Social network analysis course 2010 - 2011guillaume ereteo
 
Social network analysis (SNA) - Big data and social data - Telecommunications...
Social network analysis (SNA) - Big data and social data - Telecommunications...Social network analysis (SNA) - Big data and social data - Telecommunications...
Social network analysis (SNA) - Big data and social data - Telecommunications...Wael Elrifai
 
Social Network Analysis
Social Network AnalysisSocial Network Analysis
Social Network AnalysisFred Stutzman
 
CS6010 Social Network Analysis Unit III
CS6010 Social Network Analysis   Unit IIICS6010 Social Network Analysis   Unit III
CS6010 Social Network Analysis Unit IIIpkaviya
 
Community Detection
Community Detection Community Detection
Community Detection Kanika Kanwal
 
Network measures used in social network analysis
Network measures used in social network analysis Network measures used in social network analysis
Network measures used in social network analysis Dragan Gasevic
 
Data mining in social network
Data mining in social networkData mining in social network
Data mining in social networkakash_mishra
 
Social Recommender Systems Tutorial - WWW 2011
Social Recommender Systems Tutorial - WWW 2011Social Recommender Systems Tutorial - WWW 2011
Social Recommender Systems Tutorial - WWW 2011idoguy
 
Building a Recommendation Engine - An example of a product recommendation engine
Building a Recommendation Engine - An example of a product recommendation engineBuilding a Recommendation Engine - An example of a product recommendation engine
Building a Recommendation Engine - An example of a product recommendation engineNYC Predictive Analytics
 

Was ist angesagt? (20)

Social Media Mining - Chapter 5 (Data Mining Essentials)
Social Media Mining - Chapter 5 (Data Mining Essentials)Social Media Mining - Chapter 5 (Data Mining Essentials)
Social Media Mining - Chapter 5 (Data Mining Essentials)
 
Social Network Analysis Introduction including Data Structure Graph overview.
Social Network Analysis Introduction including Data Structure Graph overview. Social Network Analysis Introduction including Data Structure Graph overview.
Social Network Analysis Introduction including Data Structure Graph overview.
 
CS6010 Social Network Analysis Unit V
CS6010 Social Network Analysis Unit VCS6010 Social Network Analysis Unit V
CS6010 Social Network Analysis Unit V
 
Network centrality measures and their effectiveness
Network centrality measures and their effectivenessNetwork centrality measures and their effectiveness
Network centrality measures and their effectiveness
 
Community Detection in Social Media
Community Detection in Social MediaCommunity Detection in Social Media
Community Detection in Social Media
 
Community detection algorithms
Community detection algorithmsCommunity detection algorithms
Community detection algorithms
 
Social Recommender Systems
Social Recommender SystemsSocial Recommender Systems
Social Recommender Systems
 
Social network analysis intro part I
Social network analysis intro part ISocial network analysis intro part I
Social network analysis intro part I
 
Data Mining: Graph mining and social network analysis
Data Mining: Graph mining and social network analysisData Mining: Graph mining and social network analysis
Data Mining: Graph mining and social network analysis
 
Social network analysis course 2010 - 2011
Social network analysis course 2010 - 2011Social network analysis course 2010 - 2011
Social network analysis course 2010 - 2011
 
Social network analysis (SNA) - Big data and social data - Telecommunications...
Social network analysis (SNA) - Big data and social data - Telecommunications...Social network analysis (SNA) - Big data and social data - Telecommunications...
Social network analysis (SNA) - Big data and social data - Telecommunications...
 
Social Network Analysis
Social Network AnalysisSocial Network Analysis
Social Network Analysis
 
CS6010 Social Network Analysis Unit III
CS6010 Social Network Analysis   Unit IIICS6010 Social Network Analysis   Unit III
CS6010 Social Network Analysis Unit III
 
Social Network Analysis (SNA)
Social Network Analysis (SNA)Social Network Analysis (SNA)
Social Network Analysis (SNA)
 
Social Network Analysis
Social Network AnalysisSocial Network Analysis
Social Network Analysis
 
Community Detection
Community Detection Community Detection
Community Detection
 
Network measures used in social network analysis
Network measures used in social network analysis Network measures used in social network analysis
Network measures used in social network analysis
 
Data mining in social network
Data mining in social networkData mining in social network
Data mining in social network
 
Social Recommender Systems Tutorial - WWW 2011
Social Recommender Systems Tutorial - WWW 2011Social Recommender Systems Tutorial - WWW 2011
Social Recommender Systems Tutorial - WWW 2011
 
Building a Recommendation Engine - An example of a product recommendation engine
Building a Recommendation Engine - An example of a product recommendation engineBuilding a Recommendation Engine - An example of a product recommendation engine
Building a Recommendation Engine - An example of a product recommendation engine
 

Ähnlich wie Social Media Mining - Chapter 10 (Behavior Analytics)

Practical Applications for Social Network Analysis in Public Sector Marketing...
Practical Applications for Social Network Analysis in Public Sector Marketing...Practical Applications for Social Network Analysis in Public Sector Marketing...
Practical Applications for Social Network Analysis in Public Sector Marketing...Mike Kujawski
 
User behavior model & recommendation on basis of social networks
User behavior model & recommendation on basis of social networks User behavior model & recommendation on basis of social networks
User behavior model & recommendation on basis of social networks Shah Alam Sabuj
 
Knowledge Extraction from Social Media
Knowledge Extraction from Social MediaKnowledge Extraction from Social Media
Knowledge Extraction from Social MediaSeth Grimes
 
Recommender System _Module 1_Introduction to Recommender System.pptx
Recommender System _Module 1_Introduction to Recommender System.pptxRecommender System _Module 1_Introduction to Recommender System.pptx
Recommender System _Module 1_Introduction to Recommender System.pptxSatyam Sharma
 
The power of social media anlaytics
The power of social media anlayticsThe power of social media anlaytics
The power of social media anlayticsAjay Ram
 
How to use Big Data to drive product strategy and adoption
How to use Big Data to drive product strategy and adoptionHow to use Big Data to drive product strategy and adoption
How to use Big Data to drive product strategy and adoptionUXPA International
 
Social Network Analysis based on MOOC's (Massive Open Online Classes)
Social Network Analysis based on MOOC's (Massive Open Online Classes)Social Network Analysis based on MOOC's (Massive Open Online Classes)
Social Network Analysis based on MOOC's (Massive Open Online Classes)ShankarPrasaadRajama
 
A Survey on Multi-party Privacy Disputes in Social Networks
A Survey on Multi-party Privacy Disputes in Social NetworksA Survey on Multi-party Privacy Disputes in Social Networks
A Survey on Multi-party Privacy Disputes in Social NetworksIRJET Journal
 
Social media data analysis
Social media data analysisSocial media data analysis
Social media data analysisShweta Patnaik
 
Defrag 2010 Collaborative Analytics
Defrag 2010 Collaborative AnalyticsDefrag 2010 Collaborative Analytics
Defrag 2010 Collaborative Analyticsehuddleston
 
Group Chapter Presentation
Group Chapter PresentationGroup Chapter Presentation
Group Chapter Presentationfeinal
 
Social Media & Web Mining for Public Services of Smart Cities - SSA Talk
Social Media & Web Mining for Public Services of Smart Cities - SSA TalkSocial Media & Web Mining for Public Services of Smart Cities - SSA Talk
Social Media & Web Mining for Public Services of Smart Cities - SSA TalkHemant Purohit
 
Social media presentation - fixed
Social media presentation - fixedSocial media presentation - fixed
Social media presentation - fixedBobbyS95
 
Social media presentation
Social media presentation Social media presentation
Social media presentation mhabibian
 
IBM Social Media Analytics
IBM Social Media AnalyticsIBM Social Media Analytics
IBM Social Media AnalyticsIan Balina
 
IRJET- Analysis on Existing Methodologies of User Service Rating Prediction S...
IRJET- Analysis on Existing Methodologies of User Service Rating Prediction S...IRJET- Analysis on Existing Methodologies of User Service Rating Prediction S...
IRJET- Analysis on Existing Methodologies of User Service Rating Prediction S...IRJET Journal
 
Advanced Techniques in Social Media Data Analysis.docx
Advanced Techniques in Social Media Data Analysis.docxAdvanced Techniques in Social Media Data Analysis.docx
Advanced Techniques in Social Media Data Analysis.docxQuick Metrix
 
Software Development Analytics Intro. Twitter OSS workshop
Software Development Analytics Intro. Twitter OSS workshopSoftware Development Analytics Intro. Twitter OSS workshop
Software Development Analytics Intro. Twitter OSS workshopManrique Lopez
 
Social media presentation
Social media presentationSocial media presentation
Social media presentationmhabibian
 

Ähnlich wie Social Media Mining - Chapter 10 (Behavior Analytics) (20)

Practical Applications for Social Network Analysis in Public Sector Marketing...
Practical Applications for Social Network Analysis in Public Sector Marketing...Practical Applications for Social Network Analysis in Public Sector Marketing...
Practical Applications for Social Network Analysis in Public Sector Marketing...
 
SM&WA_S1-2.pptx
SM&WA_S1-2.pptxSM&WA_S1-2.pptx
SM&WA_S1-2.pptx
 
User behavior model & recommendation on basis of social networks
User behavior model & recommendation on basis of social networks User behavior model & recommendation on basis of social networks
User behavior model & recommendation on basis of social networks
 
Knowledge Extraction from Social Media
Knowledge Extraction from Social MediaKnowledge Extraction from Social Media
Knowledge Extraction from Social Media
 
Recommender System _Module 1_Introduction to Recommender System.pptx
Recommender System _Module 1_Introduction to Recommender System.pptxRecommender System _Module 1_Introduction to Recommender System.pptx
Recommender System _Module 1_Introduction to Recommender System.pptx
 
The power of social media anlaytics
The power of social media anlayticsThe power of social media anlaytics
The power of social media anlaytics
 
How to use Big Data to drive product strategy and adoption
How to use Big Data to drive product strategy and adoptionHow to use Big Data to drive product strategy and adoption
How to use Big Data to drive product strategy and adoption
 
Social Network Analysis based on MOOC's (Massive Open Online Classes)
Social Network Analysis based on MOOC's (Massive Open Online Classes)Social Network Analysis based on MOOC's (Massive Open Online Classes)
Social Network Analysis based on MOOC's (Massive Open Online Classes)
 
A Survey on Multi-party Privacy Disputes in Social Networks
A Survey on Multi-party Privacy Disputes in Social NetworksA Survey on Multi-party Privacy Disputes in Social Networks
A Survey on Multi-party Privacy Disputes in Social Networks
 
Social media data analysis
Social media data analysisSocial media data analysis
Social media data analysis
 
Defrag 2010 Collaborative Analytics
Defrag 2010 Collaborative AnalyticsDefrag 2010 Collaborative Analytics
Defrag 2010 Collaborative Analytics
 
Group Chapter Presentation
Group Chapter PresentationGroup Chapter Presentation
Group Chapter Presentation
 
Social Media & Web Mining for Public Services of Smart Cities - SSA Talk
Social Media & Web Mining for Public Services of Smart Cities - SSA TalkSocial Media & Web Mining for Public Services of Smart Cities - SSA Talk
Social Media & Web Mining for Public Services of Smart Cities - SSA Talk
 
Social media presentation - fixed
Social media presentation - fixedSocial media presentation - fixed
Social media presentation - fixed
 
Social media presentation
Social media presentation Social media presentation
Social media presentation
 
IBM Social Media Analytics
IBM Social Media AnalyticsIBM Social Media Analytics
IBM Social Media Analytics
 
IRJET- Analysis on Existing Methodologies of User Service Rating Prediction S...
IRJET- Analysis on Existing Methodologies of User Service Rating Prediction S...IRJET- Analysis on Existing Methodologies of User Service Rating Prediction S...
IRJET- Analysis on Existing Methodologies of User Service Rating Prediction S...
 
Advanced Techniques in Social Media Data Analysis.docx
Advanced Techniques in Social Media Data Analysis.docxAdvanced Techniques in Social Media Data Analysis.docx
Advanced Techniques in Social Media Data Analysis.docx
 
Software Development Analytics Intro. Twitter OSS workshop
Software Development Analytics Intro. Twitter OSS workshopSoftware Development Analytics Intro. Twitter OSS workshop
Software Development Analytics Intro. Twitter OSS workshop
 
Social media presentation
Social media presentationSocial media presentation
Social media presentation
 

Kürzlich hochgeladen

Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...DhatriParmar
 
Textual Evidence in Reading and Writing of SHS
Textual Evidence in Reading and Writing of SHSTextual Evidence in Reading and Writing of SHS
Textual Evidence in Reading and Writing of SHSMae Pangan
 
ARTERIAL BLOOD GAS ANALYSIS........pptx
ARTERIAL BLOOD  GAS ANALYSIS........pptxARTERIAL BLOOD  GAS ANALYSIS........pptx
ARTERIAL BLOOD GAS ANALYSIS........pptxAneriPatwari
 
How to Manage Buy 3 Get 1 Free in Odoo 17
How to Manage Buy 3 Get 1 Free in Odoo 17How to Manage Buy 3 Get 1 Free in Odoo 17
How to Manage Buy 3 Get 1 Free in Odoo 17Celine George
 
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...Association for Project Management
 
Q-Factor General Quiz-7th April 2024, Quiz Club NITW
Q-Factor General Quiz-7th April 2024, Quiz Club NITWQ-Factor General Quiz-7th April 2024, Quiz Club NITW
Q-Factor General Quiz-7th April 2024, Quiz Club NITWQuiz Club NITW
 
Grade Three -ELLNA-REVIEWER-ENGLISH.pptx
Grade Three -ELLNA-REVIEWER-ENGLISH.pptxGrade Three -ELLNA-REVIEWER-ENGLISH.pptx
Grade Three -ELLNA-REVIEWER-ENGLISH.pptxkarenfajardo43
 
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfGrade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfJemuel Francisco
 
ClimART Action | eTwinning Project
ClimART Action    |    eTwinning ProjectClimART Action    |    eTwinning Project
ClimART Action | eTwinning Projectjordimapav
 
How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17Celine George
 
Expanded definition: technical and operational
Expanded definition: technical and operationalExpanded definition: technical and operational
Expanded definition: technical and operationalssuser3e220a
 
How to Make a Duplicate of Your Odoo 17 Database
How to Make a Duplicate of Your Odoo 17 DatabaseHow to Make a Duplicate of Your Odoo 17 Database
How to Make a Duplicate of Your Odoo 17 DatabaseCeline George
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management SystemChristalin Nelson
 
Using Grammatical Signals Suitable to Patterns of Idea Development
Using Grammatical Signals Suitable to Patterns of Idea DevelopmentUsing Grammatical Signals Suitable to Patterns of Idea Development
Using Grammatical Signals Suitable to Patterns of Idea Developmentchesterberbo7
 
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...DhatriParmar
 
ICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfVanessa Camilleri
 

Kürzlich hochgeladen (20)

Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
 
Textual Evidence in Reading and Writing of SHS
Textual Evidence in Reading and Writing of SHSTextual Evidence in Reading and Writing of SHS
Textual Evidence in Reading and Writing of SHS
 
Mattingly "AI & Prompt Design: Large Language Models"
Mattingly "AI & Prompt Design: Large Language Models"Mattingly "AI & Prompt Design: Large Language Models"
Mattingly "AI & Prompt Design: Large Language Models"
 
ARTERIAL BLOOD GAS ANALYSIS........pptx
ARTERIAL BLOOD  GAS ANALYSIS........pptxARTERIAL BLOOD  GAS ANALYSIS........pptx
ARTERIAL BLOOD GAS ANALYSIS........pptx
 
How to Manage Buy 3 Get 1 Free in Odoo 17
How to Manage Buy 3 Get 1 Free in Odoo 17How to Manage Buy 3 Get 1 Free in Odoo 17
How to Manage Buy 3 Get 1 Free in Odoo 17
 
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
 
Q-Factor General Quiz-7th April 2024, Quiz Club NITW
Q-Factor General Quiz-7th April 2024, Quiz Club NITWQ-Factor General Quiz-7th April 2024, Quiz Club NITW
Q-Factor General Quiz-7th April 2024, Quiz Club NITW
 
prashanth updated resume 2024 for Teaching Profession
prashanth updated resume 2024 for Teaching Professionprashanth updated resume 2024 for Teaching Profession
prashanth updated resume 2024 for Teaching Profession
 
Paradigm shift in nursing research by RS MEHTA
Paradigm shift in nursing research by RS MEHTAParadigm shift in nursing research by RS MEHTA
Paradigm shift in nursing research by RS MEHTA
 
Grade Three -ELLNA-REVIEWER-ENGLISH.pptx
Grade Three -ELLNA-REVIEWER-ENGLISH.pptxGrade Three -ELLNA-REVIEWER-ENGLISH.pptx
Grade Three -ELLNA-REVIEWER-ENGLISH.pptx
 
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfGrade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
 
ClimART Action | eTwinning Project
ClimART Action    |    eTwinning ProjectClimART Action    |    eTwinning Project
ClimART Action | eTwinning Project
 
How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17
 
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptxINCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
 
Expanded definition: technical and operational
Expanded definition: technical and operationalExpanded definition: technical and operational
Expanded definition: technical and operational
 
How to Make a Duplicate of Your Odoo 17 Database
How to Make a Duplicate of Your Odoo 17 DatabaseHow to Make a Duplicate of Your Odoo 17 Database
How to Make a Duplicate of Your Odoo 17 Database
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management System
 
Using Grammatical Signals Suitable to Patterns of Idea Development
Using Grammatical Signals Suitable to Patterns of Idea DevelopmentUsing Grammatical Signals Suitable to Patterns of Idea Development
Using Grammatical Signals Suitable to Patterns of Idea Development
 
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
 
ICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdf
 

Social Media Mining - Chapter 10 (Behavior Analytics)

  • 2. 2Social Media Mining Measures and Metrics 2Social Media Mining Behavior Analyticshttp://socialmediamining.info/ Dear instructors/users of these slides: Please feel free to include these slides in your own material, or modify them as you see fit. If you decide to incorporate these slides into your presentations, please include the following note: R. Zafarani, M. A. Abbasi, and H. Liu, Social Media Mining: An Introduction, Cambridge University Press, 2014. Free book and slides at http://socialmediamining.info/ or include a link to the website: http://socialmediamining.info/
  • 3. 3Social Media Mining Measures and Metrics 3Social Media Mining Behavior Analyticshttp://socialmediamining.info/ Examples of Behavior Analytics • What motivates users to join an online group? • When users abandon social media sites, where do they migrate to? • Can we predict box office revenues for movies from tweets?
  • 4. 4Social Media Mining Measures and Metrics 4Social Media Mining Behavior Analyticshttp://socialmediamining.info/ Behavior Analysis • To answers these questions we need to analyze or predict behaviors on social media. • Users exhibit different behaviors on social media: – As individuals, or – As part of a broader collective behavior. • When discussing individual behavior, – Our focus is on one individual. • Collective behavior emerges when a population of individuals behave in a similar way with or without coordination or planning.
  • 5. 5Social Media Mining Measures and Metrics 5Social Media Mining Behavior Analyticshttp://socialmediamining.info/ Our Goal To analyze, model, and predict individual and collective behavior
  • 6. 6Social Media Mining Measures and Metrics 6Social Media Mining Behavior Analyticshttp://socialmediamining.info/ Individual Behavior
  • 7. 7Social Media Mining Measures and Metrics 7Social Media Mining Behavior Analyticshttp://socialmediamining.info/ Types of Individual Behavior • User-User (link generation) – befriending, sending a message, playing games, following, or inviting • User-Community – joining or leaving a community, participating in community discussions • User-Entity (content generation) – writing a post – posting a photo
  • 8. 8Social Media Mining Measures and Metrics 8Social Media Mining Behavior Analyticshttp://socialmediamining.info/ I. Individual Behavior Analysis
  • 9. 9Social Media Mining Measures and Metrics 9Social Media Mining Behavior Analyticshttp://socialmediamining.info/ Example: Community Membership in Social Media • Why do users join communities? – Communities can be implicit: • Individuals buying a product as a community, and • People buying the product for the first time as individuals joining the community. – What factors affect the community-joining behavior of individuals? • We can observe users who join communities – Determine factors that are common among them • To observe users, we require – A population of users, – A community 𝐶, and – Community membership info (users who are members of 𝐶) • To distinguish between users who have already joined the community and those who are now joining it, – We need community memberships at two times 𝑡1 and 𝑡2, with 𝑡2 > 𝑡1 – At 𝑡2, we find users who are members of the community, but were not members at 𝑡1 • These new users form the subpopulation that is analyzed for community-joining behavior.
  • 10. 10Social Media Mining Measures and Metrics 10Social Media Mining Behavior Analyticshttp://socialmediamining.info/ Community Membership in Social Media Hypothesis: – individuals are inclined toward an activity when their friends are engaged in the same activity. • A factor that plays a role in users joining a community is the number of their friends who are already members of the community. • In data mining terms, – number of friends of an individual in a community – A feature to predict whether the individual joins the community (i.e., class attribute). Number of Friends vs Probability of Joining a Community Backstrom, L., Huttenlocher, D., Kleinberg, J., & Lan, X. (2006, August). Group formation in large social networks: membership, growth, and evolution. In Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 44-54). ACM.
  • 11. 11Social Media Mining Measures and Metrics 11Social Media Mining Behavior Analyticshttp://socialmediamining.info/ Even More Features
  • 12. 12Social Media Mining Measures and Metrics 12Social Media Mining Behavior Analyticshttp://socialmediamining.info/ Feature Importance Analysis Which feature can help best determine whether individuals will join or not? I. We can use any feature selection algorithm, or II. We can use a classification algorithm, such as decision tree learning – Most important Features are ranked higher
  • 13. 13Social Media Mining Measures and Metrics 13Social Media Mining Behavior Analyticshttp://socialmediamining.info/ Decision Tree for Joining a Community Are these features well-designed? – We can evaluate using classification performance metrics
  • 14. 14Social Media Mining Measures and Metrics 14Social Media Mining Behavior Analyticshttp://socialmediamining.info/ Behavior Analysis Methodology • An observable behavior – The behavior needs to be observable – E.g., accurately observing the joining of individuals (and possibly their joining times) • Features: – Finding data features (covariates) that may or may not affect (or be affected by) the behavior – We need a domain expert for this step • Feature-Behavior Association: – Find the relationship between features and behavior – E.g., use decision tree learning • Evaluation: – The findings are due to the features and not to externalities. – E.g., we can use • classification accuracy • randomization tests (discussed later!) • or causality testing algorithms
  • 15. 15Social Media Mining Measures and Metrics 15Social Media Mining Behavior Analyticshttp://socialmediamining.info/ Granger Causality Consider a linear regression model • We can predict 𝒀𝒕 + 𝟏 by using either 𝒀 𝟏, 𝒀 𝟐 … 𝒀𝒕 or a combination of 𝑿 𝟏, 𝑿 𝟐 … 𝑿𝒕 and 𝒀 𝟏, 𝒀 𝟐 … 𝒀𝒕 • If 𝜺 𝟐 < 𝜺 𝟏then 𝑿 Granger Causes 𝒀 • Why is this not causality?
  • 16. 16Social Media Mining Measures and Metrics 16Social Media Mining Behavior Analyticshttp://socialmediamining.info/ II. Individual Behavior Modeling
  • 17. 17Social Media Mining Measures and Metrics 17Social Media Mining Behavior Analyticshttp://socialmediamining.info/ Individual Behavior Modeling • Models in – Economics, Game Theory, and Network Science We can use: 1. Threshold Models: we need to learn thresholds and weights • 𝑊𝑖𝑗 can be defined as the fraction of times user 𝑖 buys a product and user 𝑗 buys the same product soon after that – When is soon? – Similarly, thresholds can be estimated by taking into account the average number of friends who need to buy a product before user 𝑖 decides to buy it. – What if friends don’t buy the same products? • We can find the most similar individuals or items (similar to collaborative filtering methods) 2. Cascade Models
  • 18. 18Social Media Mining Measures and Metrics 18Social Media Mining Behavior Analyticshttp://socialmediamining.info/ III. Individual Behavior Prediction
  • 19. 19Social Media Mining Measures and Metrics 19Social Media Mining Behavior Analyticshttp://socialmediamining.info/ Individual Behavior Prediction • Most behaviors result in newly formed links in social media. – It can be a link to a user, as in befriending behavior; – A link to an entity, as in buying behavior; or – A link to a community, as in joining behavior. • We can formulate many of these behaviors as a link prediction problem.
  • 20. 20Social Media Mining Measures and Metrics 20Social Media Mining Behavior Analyticshttp://socialmediamining.info/ Link Prediction - Setup • Given a graph 𝐺(𝑉, 𝐸), let 𝑒(𝑢, 𝑣) denote edge between nodes 𝑢 and 𝑣 – 𝑡(𝑒) denotes the time that the edge was formed • Let 𝐺[𝑡1, 𝑡2] represent the subgraph of 𝐺 such that all edges are created between 𝑡1 and 𝑡2 – i.e., for all edges 𝑒 in this subgraph, 𝑡1 < 𝑡(𝑒) < 𝑡2. • Given four time stamps 𝑡11 < 𝑡12 < 𝑡21 < 𝑡22 a link prediction algorithm is given – The subgraph G(𝑡11, 𝑡12) (training interval) and – Is expected to predict edges in G(𝑡21, 𝑡22) (testing interval). • We can only predict edges for nodes that exist in the training period • Let G(𝑉𝑡𝑟𝑎𝑖𝑛, 𝐸𝑡𝑟𝑎𝑖𝑛) be our training graph. Then, a link prediction algorithm generates a sorted list of most probable edges in
  • 21. 21Social Media Mining Measures and Metrics 21Social Media Mining Behavior Analyticshttp://socialmediamining.info/ Link Prediction - Algorithms • Assign 𝜎(𝑥, 𝑦) to every edge 𝑒(𝑥, 𝑦) • Edges sorted by this value in decreasing order will form our ranked list of predictions • Any similarity measure between two nodes can be used for link prediction; – Network measures (Chapter 3) are useful here. • We will review some well-known methods – Node Neighborhood-Based Methods – Path-Based Methods
  • 22. 22Social Media Mining Measures and Metrics 22Social Media Mining Behavior Analyticshttp://socialmediamining.info/ Node Neighborhood-Based Methods
  • 23. 23Social Media Mining Measures and Metrics 23Social Media Mining Behavior Analyticshttp://socialmediamining.info/ Node Neighborhood-Based Methods • Common Neighbors: – The more common neighbors that two nodes share, the more similar they are • Jaccard Similarity: – The likelihood of a node that is a neighbor of either 𝑥 or 𝑦 to be a common neighbor
  • 24. 24Social Media Mining Measures and Metrics 24Social Media Mining Behavior Analyticshttp://socialmediamining.info/ Node Neighborhood-Based Methods • Adamic-Adar: – If two individuals share a neighbor – and that neighbor is a rare neighbor, – it should have a higher impact on their similarity. • Preferential Attachment: – Nodes of higher degree have a higher chance of getting connected to incoming nodes Rareness
  • 25. 25Social Media Mining Measures and Metrics 25Social Media Mining Behavior Analyticshttp://socialmediamining.info/ Example Compute the edge score for edge (5,7)
  • 26. 26Social Media Mining Measures and Metrics 26Social Media Mining Behavior Analyticshttp://socialmediamining.info/ Path-Based Methods
  • 27. 27Social Media Mining Measures and Metrics 27Social Media Mining Behavior Analyticshttp://socialmediamining.info/ Path-Based Measures • Katz Measure: – |𝑝𝑎𝑡ℎ𝑠 𝑥,𝑦 <𝑙>|denotes the number of paths of length 𝑙 between 𝑥 and 𝑦 – 𝛽 is a constant that exponentially damps longer paths • When 𝛽 is small, Katz measure reduces to common neighbor – It can be reformulated in closed form as
  • 28. 28Social Media Mining Measures and Metrics 28Social Media Mining Behavior Analyticshttp://socialmediamining.info/ Path-Based Measures • Hitting Time and Commute Time: – Consider a random walk that starts at node 𝑥 and moves to adjacent nodes uniformly. – Hitting time is the expected number of random walk steps needed to reach 𝑦 starting from 𝑥. – A smaller hitting time implies a higher similarity • A negation can turn it into a similarity measure – If 𝒚 is highly connected random walks are more likely to visit 𝒚 • We can normalize it using the stationary probability
  • 29. 29Social Media Mining Measures and Metrics 29Social Media Mining Behavior Analyticshttp://socialmediamining.info/ Path-Based Measures – Hitting time is not symmetric, we can use commute time instead, or its normalized version • Rooted PageRank: the stationary probability of 𝑦, when at each random walk run you can jump to 𝑥 with probability 𝑃 and to a random node with 1 − 𝑃 • SimRank: Recursive definition of similarity
  • 30. 30Social Media Mining Measures and Metrics 30Social Media Mining Behavior Analyticshttp://socialmediamining.info/ Link Prediction - Evaluation • After one of the aforementioned measures is selected, a list of the top most similar pairs of nodes are selected. • These pairs of nodes denote edges predicted to be the most likely to soon appear in the network. • Performance (precision, recall, or accuracy) can be evaluated using the testing graph and by comparing the number of the testing graph’s edges that the link prediction algorithm successfully reveals. • Performance is usually very low, since many edges are created due to reasons not solely available in a social network graph. – Solution: a common baseline is to compare the performance with random edge predictors and report the factor improvements over random prediction.
  • 31. 31Social Media Mining Measures and Metrics 31Social Media Mining Behavior Analyticshttp://socialmediamining.info/ Collective Behavior
  • 32. 32Social Media Mining Measures and Metrics 32Social Media Mining Behavior Analyticshttp://socialmediamining.info/ Collective Behavior • First Defined by sociologist Robert Park • Collective Behavior: A group of individuals behaving in a similar way • It can be planned and coordinated, but often is spontaneous and unplanned Examples • Individuals standing in line for a new product release • Posting messages online to support a cause or to show support for an individual
  • 33. 33Social Media Mining Measures and Metrics 33Social Media Mining Behavior Analyticshttp://socialmediamining.info/ I. Collective Behavior Analysis
  • 34. 34Social Media Mining Measures and Metrics 34Social Media Mining Behavior Analyticshttp://socialmediamining.info/ Collective Behavior Analysis • We can analyze collective behavior by analyzing individuals performing the behavior • We can then put together the results of these analyses • The result would be the expected behavior for a large population • OR, we can analyze the population as a whole • Not very popular for analysis, as individuals are ignored • Popular for Prediction purposes
  • 35. 35Social Media Mining Measures and Metrics 35Social Media Mining Behavior Analyticshttp://socialmediamining.info/ Example – Analyzing User Migrations Users migrate in social media due to their limited time and resources • Sites are interested in keeping their users, because they are valuable assets that help contribute to their growth and generate revenue by increasing traffic Two types of migrations: • Site migration: For any user who is a member of two sites 𝑆1 and 𝑆2 at time 𝑡𝑖, and is only a member of 𝑆2 at time 𝑡𝑗 > 𝑡𝑖, then the user is said to have migrated from site 𝑆1 to 𝑆2. • Attention Migration: For any user who is a member of two sites 𝑆1 and 𝑆2 and is active at both at time 𝑡𝑖, if the user becomes inactive on 𝑆1 and remains active on 𝑆2 at time 𝑡𝑗 > 𝑡𝑖, then the user’s attention is said to have migrated away from site 𝑆1 and toward site 𝑆2.
  • 36. 36Social Media Mining Measures and Metrics 36Social Media Mining Behavior Analyticshttp://socialmediamining.info/ Collective Behavior Analysis - Example • Activity (or inactivity) of a user can be determined by observing the user’s actions performed on the site. • We can consider a user active in [𝑡, 𝑡 + 𝑋 ], if the user has performed at least one action on the site during this period – Otherwise, the user is considered inactive. • The interval could be measured at different granularity levels – E.g., days, weeks, months, and years. – It is common to set 𝑋 = 1 month. • We can analyze migrations of individuals and then measure the rate at which the populations are migrating across sites. – We can use the methodology for individual behavior analysis
  • 37. 37Social Media Mining Measures and Metrics 37Social Media Mining Behavior Analyticshttp://socialmediamining.info/ The Observable Behavior • Sites migration is rarely observed • Attention migration is clearly observable • We need to take multiple steps to observe it: • Users are required to be identified on multiple networks (challenging!) • Some ideas: John.Smith1 on Facebook is JohnSmith on Twitter
  • 38. 38Social Media Mining Measures and Metrics 38Social Media Mining Behavior Analyticshttp://socialmediamining.info/ Features • User Activity: more active users are less likely to migrate • e.g., number of tweets, posts, or photos • User Network Size: a user with more social ties (i.e., friends) in a social network is less likely to move • e.g., number of friends • User Rank: a user with high status in a network is less likely to move to a new one where he or she must spend more time getting established. • e.g., centrality scores • External rank: your citations, how many have referred to your article, ...
  • 39. 39Social Media Mining Measures and Metrics 39Social Media Mining Behavior Analyticshttp://socialmediamining.info/ • Given two snapshots of a network, we know if users migrated or not. • Let vector Y ∈ ℝ 𝑛 indicate whether any of our 𝑛 users have migrated or not. • Let 𝑋𝑡 ∈ ℝ3×𝑛 be the features collected (activity, friends, rank) for any one of these users at time stamp 𝑡. • The correlation between features 𝑋 and labels 𝑌 can be computed via logistic regression. • How can we verify that this correlation is not random? Feature-Behavior Association
  • 40. 40Social Media Mining Measures and Metrics 40Social Media Mining Behavior Analyticshttp://socialmediamining.info/ To verify if the correlation between features and the migration behavior is not random • We can construct a random set of migrating users • compute 𝑋 𝑅𝑎𝑛𝑑𝑜𝑚 and 𝑌𝑅𝑎𝑛𝑑𝑜𝑚for them • Find the correlation between these random variables (e.g., regression coefficients) and it should be significantly different from what we obtained using real-world observations We can use 𝜒2 (Chi-square) test for significance testing Evaluation Strategy From Original Dataset From Random Dataset
  • 41. 41Social Media Mining Measures and Metrics 41Social Media Mining Behavior Analyticshttp://socialmediamining.info/ II. Collective Behavior Modeling
  • 42. 42Social Media Mining Measures and Metrics 42Social Media Mining Behavior Analyticshttp://socialmediamining.info/ • Collective behavior can be conveniently modeled using some of the techniques discussed in Chapter 4 - Network Models. • We want models that can mimic characteristics observable in the population. • In network models, node properties rarely play a role – Reasonable for modeling collective behavior. Collective Behavior Modeling
  • 43. 43Social Media Mining Measures and Metrics 43Social Media Mining Behavior Analyticshttp://socialmediamining.info/ III. Collective Behavior Prediction
  • 44. 44Social Media Mining Measures and Metrics 44Social Media Mining Behavior Analyticshttp://socialmediamining.info/ Collective Behavior Prediction • From previous chapters, we could use • Linear Influence Model (LIM) • Epidemic Models • Collective behavior can be analyzed either in terms of 1. individuals performing the collective behavior or 2. based on the population as a whole. (More Common) • When predicting collective behavior, • We are interested in predicting the intensity of a phenomenon, which is due to the collective behavior of the population • e.g., how many of them will vote? • We can utilize a data mining approach where features that describe the population well are used to predict a response variable • i.e., the intensity of the phenomenon • A training-testing framework or correlation analysis is used to determine the generalization and the accuracy of the predictions.
  • 45. 45Social Media Mining Measures and Metrics 45Social Media Mining Behavior Analyticshttp://socialmediamining.info/ 1. Set the target variable that is being predicted – In our example: the revenue that a movie produces. – The revenue is the direct result of the collective behavior of going to the theater to watch the movie. 2. Identify features in the population that may affect the target variable – the average hourly number of tweets related to the movie for each of the seven days prior to the movie opening (seven features) – The number of opening theaters for the movie (one feature). 3. Predict the target variable using a supervised learning approach, utilizing the features determined in step 2. 4. Measure performance using supervised learning evaluation. The predictions using this approach are closer to reality than that of the Hollywood Stock Exchange (HSX), which is the gold standard for predicting revenues for movies Predicting Box Office Revenue for Movies
  • 46. 46Social Media Mining Measures and Metrics 46Social Media Mining Behavior Analyticshttp://socialmediamining.info/ • Target variable 𝒚 • Some feature 𝐴 that quantifies the attention • Some feature 𝑃 that quantifies the publicity • Train a regression model Generalizing the Idea