4. Quick Review
• Social Influence: Our friendships and behavior
are affected by Social Influence (we tend to
conform to our neighbors' values).
• Selection: We have a tendency to be friends
with people who are like us.
• Homophily: A widely observed social
phenomenon which states that “we tend to be
similar to our friends”.
5. Quick Note before we start…
We will refer to Selection as Homophily
(Reason: Authors assume that if Homophily
effects are present, we tend to select individuals
with similar values)
7. Selection vs. Social Influence: Why do we care?
• If Social Influence is a significant factor, then
targeting key individuals and trying to modify
undesirable behavior can be effective since we
are then viewing such behavior as a process of
influence spread.
• Otherwise, focusing on a few individuals will
at best change the behavior of a few
individuals.
8. REAL WORLD SCENARIO
• A firm selling products to consumers in a
social network.
• The firm knows that friends in the network
often make similar purchases.
• What is the reason behind this similarity?
• Is it because they have similar tastes, since,
after all, they are friends?
• Is it because one influences the other’s
decision, as they communicate frequently?
Credits: (Homophily or Influence? – Analysis of Purchase
Decisions in a Social Network Context Liye Ma, Alan Montgomery and Ramayya Krishnan )
9. How can the firm take advantage?
• If it is the taste similarity that drives the
similar decisions, the firm should directly
target friends of that customer by offering
discounts to them.
• If it is social influence that drives the
similarity, the firm should incentivize that
customer to promote the product or service to
her friends.
Credits: (Homophily or Influence? – Analysis of Purchase
Decisions in a Social Network Context Liye Ma, Alan Montgomery and Ramayya Krishnan )
11. EXISTING WORK
• A lot of research has gone into understanding
“Homophily” and “Social Influence” in social
networks.
• We quickly mention studies that directly
address “Identifying and measuring
Homophily and social influence effects”.
• This problem area remains one of the biggest
open-ended challenges for Social Scientists.
(It will make a good class project as well :D)
13. RELATED WORK - 1
• “Homophily or Influence? – Analysis of
Purchase Decisions in a Social Network
Context”
http://people.stern.nyu.edu/bakos/wise/papers/wise2009-5b2_paper.pdf
14. QUICK LOOK AT THE STUDY
• Phone call history dataset (3.7 Million) from
an Indian telecom company over a 6-month
period, with purchase records of monthly Caller
Ring Back Tone (CRBT) subscriptions.
• Social Influence & Homophily are studied.
• Study builds a “Hierarchical Bayesian model”
which simultaneously accounts for both
Homophily and social influence effect in
consumers’ decision process.
15. RELATED WORK - 2
• “Social selection and peer influence in an
online social network.”
http://www.irle.berkeley.edu/culture/conf2012/lewis_soc12.pdf
16. QUICK LOOK AT THE STUDY
• Employs Facebook activity of college students.
• Coevolution of friendship and tastes in music,
movies and books over a 4 year time period is
analyzed.
• A “Stochastic actor-based” model is
employed to analyze the individual effects of
Social Influence & Homophily.
17. RELATED WORK - 3
• “Distinguishing influence-based contagion
from Homophily driven diffusion in dynamic
networks.”
http://www.pnas.org/content106/51/21544.full.pdf
18. QUICK LOOK AT THE STUDY
• Studies a longitudinal dataset
that combines the global network of daily
instant messaging (IM) traffic among 27.4
million users of Yahoo with day-by-day
adoption of a mobile service application
(Yahoo! Go).
• A framework based on “Matched sample
estimation” is developed to distinguish
influence from Homophily.
19. ANALYSIS OF EXISTING APPROACHES
• Empirical Investigations
(Focuses on demonstrating the presence of
Homophily and Influence in real world data sets)
• Significance Tests for Relational and Social
network data
(Focuses mostly on static networks)
• Modeling Techniques for distinguishing
Homophily & Influence.
(Accuracy is impacted by suitability of model)
21. INTRODUCTION
• In a Social Network, connected instances are
likely to have autocorrelated attribute values.
• “Two friends are more likely to share a
common political belief than two random
strangers.”
• This work presents a Randomization technique for
temporal network data for measuring
individual contribution of Homophily and
Social Influence (details coming soon!).
22. THE EXPERIMENT / SUPPORT
• A subset of data from a Facebook group at
Purdue.
• Time step from 2008 (t) to 2009 (t+1)
• Hypothesis tested on:
1. Semi-synthetic data with no Homophily & Social Influence.
2. Semi-synthetic data with strong Homophily or Influence
effect.
3. Actual experiment on real dataset.
• Efficacy of the approach was demonstrated for all
conditions.
23. PROBLEM DEFINITION
• Relational data represented as an undirected,
attributed graph G=(V,E)
• Each node v in V has a number of
attributes (X1, …, Xm)
• For a time step ‘t’, the attributes and
relationships can change.
• Significant Influence : Attributes in t+1 depend
on link structure at t.
• Significant Homophily : Link structure in t+1 will
depend on attributes at t.
(Keep them in mind! We will come back to them)
24. BACKGROUND
• In Statistics, an association is a relationship
between two statistically dependent quantities.
• ‘Relational Autocorrelation’: Statistical
dependency between values of the same
variable on related objects. (Abundant in our
dataset) Why?
• In this work we use the Chi-Square statistic.
26. CHI-SQUARE STATISTICS
• How likely is an observed distribution due to
chance?
• Observe 100 students to see “whether attending
class influences how students perform on exam?”
• Four categories:
– Students who attend class and pass.
– Students who attend class and do not pass.
– Students who do not attend class and pass.
– Students who do not attend class and do not pass.
• Null Hypothesis : There is no difference based on
attending classes.
27. CHI-SQUARE Continued….
• The test compares the observed data to a model that
distributes the data according to the expectation that
the variables are independent. Wherever the observed
data doesn't fit the model, the evidence that the
variables are dependent becomes stronger, which
counts against the null hypothesis.
• Degrees of freedom: Values in the final calculation that
are free to vary.
• Calculate the Chi-Square value. (How?)
• Calculate the more interesting ‘p’ value (the probability
of seeing data at least this extreme if the null hypothesis
were true)
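As a rough illustration of the "How?" above, the sketch below computes the Chi-Square value for the class attendance example. The counts in `observed` are invented for illustration only, and `chi_square` is a hypothetical helper name. The p-value shortcut via `math.erfc` is valid only for one degree of freedom, which is the case for a 2x2 table.

```python
import math

def chi_square(observed):
    """Pearson chi-square statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Count expected in this cell if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:
                stat += (obs - expected) ** 2 / expected
    return stat

# Hypothetical counts for 100 students (made up, not from any study).
# Rows: attends class / does not attend. Columns: passes / does not pass.
observed = [[40, 10],
            [20, 30]]

stat = chi_square(observed)
dof = (len(observed) - 1) * (len(observed[0]) - 1)

# For exactly 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(stat / 2))
print(stat, dof, p_value)
```

With these made-up counts the expected cells are 30, 20, 30 and 20, so the statistic is 50/3 (about 16.67) with 1 degree of freedom. A p-value this small would lead us to reject the null hypothesis of no difference.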
31. HOMOPHILY Continued…
If a Homophily effect is present in the data, the
autocorrelation will increase when we consider
the link changes from time t to time t+1:
C(X_t, G_{t+1}) – C(X_t, G_t)
(The Chi-Square value is a single number that adds up all the
differences between our actual data and the data expected.)
33. SOCIAL INFLUENCE Continued…
If an influence effect is present in the data, the
autocorrelation will increase when we consider
the attribute changes from time t to time t+1:
C(X_{t+1}, G_t) – C(X_t, G_t)
(The Chi-Square value is a single number that adds up all the
differences between our actual data and the data expected.)
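The two gains can be sketched in a few lines. This is a minimal sketch under assumptions: the function name `autocorrelation`, the toy six-node network and the single attribute with values "A" and "B" are all hypothetical, and C(X, G) is taken to be the Chi-Square value of the table of attribute values found at the two ends of each link.

```python
from collections import Counter

def autocorrelation(attrs, edges):
    """C(X, G): chi-square over attribute values at the two ends of each link."""
    pairs = Counter()
    for u, v in edges:
        # Count each undirected link in both directions (symmetric table).
        pairs[(attrs[u], attrs[v])] += 1
        pairs[(attrs[v], attrs[u])] += 1
    values = sorted(set(attrs.values()))
    table = [[pairs[(a, b)] for b in values] for a in values]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:
                stat += (obs - expected) ** 2 / expected
    return stat

# Toy data (hypothetical): two triangles joined by the link (3, 4).
edges_t = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
attrs_t = {1: "A", 2: "A", 3: "B", 4: "B", 5: "B", 6: "B"}

# At t+1, node 3 either rewires toward similar nodes (links change) ...
edges_t1 = [(1, 2), (4, 5), (5, 6), (4, 6), (3, 4), (3, 5)]
# ... or adopts the value of its friends 1 and 2 (attributes change).
attrs_t1 = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}

base = autocorrelation(attrs_t, edges_t)
homophily_gain = autocorrelation(attrs_t, edges_t1) - base
influence_gain = autocorrelation(attrs_t1, edges_t) - base
print(base, homophily_gain, influence_gain)
```

Both gains are positive on this toy data. Note that the Chi-Square value is unsigned, so on its own it does not say whether linked nodes are more alike or less alike than chance.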
35. RANDOMIZATION TESTS
• Provide a robust statistical technique for
hypothesis testing.
• Generates several Pseudosamples (permutations
of original data sets).
• Correlation gain is calculated for each
Pseudosample.
• Value of observed gain is then compared to
distribution of scores.
• A large deviation of the observed gain from the
distribution is deemed significant.
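The steps above can be sketched as a generic randomization test. Several simplifications here are assumptions for illustration: `edge_agreement` (the fraction of links whose endpoints share a value) stands in for the Chi-Square autocorrelation, the pseudosamples simply shuffle the t+1 attribute values among nodes rather than using the choice-based scheme, and the toy network is hypothetical.

```python
import random

def edge_agreement(attrs, edges):
    """Stand-in for C(X, G): fraction of links whose endpoints share a value."""
    return sum(attrs[u] == attrs[v] for u, v in edges) / len(edges)

def randomization_test(observed_gain, pseudosample_gain, n_samples=1000, seed=0):
    """Compare the observed gain to the gains of random pseudosamples."""
    rng = random.Random(seed)
    gains = [pseudosample_gain(rng) for _ in range(n_samples)]
    # One-sided empirical p-value: share of pseudosamples at least as extreme.
    # The +1 terms count the observed sample itself, so p is never exactly 0.
    p_value = (1 + sum(g >= observed_gain for g in gains)) / (n_samples + 1)
    return p_value, gains

# Toy data (hypothetical): node 3 switches from "B" to "A" between t and t+1.
edges_t = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
attrs_t = {1: "A", 2: "A", 3: "B", 4: "B", 5: "B", 6: "B"}
attrs_t1 = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}

baseline = edge_agreement(attrs_t, edges_t)
observed_gain = edge_agreement(attrs_t1, edges_t) - baseline

def shuffled_gain(rng):
    # Pseudosample: the same t+1 values, handed to nodes at random.
    values = list(attrs_t1.values())
    rng.shuffle(values)
    pseudo = dict(zip(attrs_t1, values))
    return edge_agreement(pseudo, edges_t) - baseline

p_value, gains = randomization_test(observed_gain, shuffled_gain)
print(observed_gain, p_value)
```

On a network this small the test has little power: only 2 of the 20 ways to place three "A" values match or beat the observed gain, so the p-value comes out near 0.1.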
36. ANALYSIS OF KEY ISSUES AND ASSUMPTIONS
(For Randomization Tests)
• Formulate an appropriate NULL Hypothesis.
• The data must be permuted in a way that accurately
reflects the null hypothesis.
37. SELF ANALYSIS
The Approach is quite relevant and appropriate
as there are no assumptions on the underlying
model.
Also, both the attribute values and the links change
over time, which makes it possible to assess both
Influence and Homophily.
38. NULL HYPOTHESIS
• H0H : Link changes are random and are not due
to attribute values in t.
• H0I : Attribute changes are random and are not
due to friends in t.
• H0F : Both attribute and link changes are
random.
40. CHOICE BASED RANDOMIZATION
• For H0H we can maintain the edge additions in t+1
but randomize the choice of target node, so that
each node keeps the same number of additions and
deletions.
• For H0I we can randomize the choice of attribute
value to replace in t+1, so that any similarity of
the value is destroyed.
• This is popularly referred to as “choice-based”
randomization, as we are randomizing the result
of choices (attribute/link changes).
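A minimal sketch of the two randomizations, under stated assumptions: each node carries a single attribute value, only the source side of each added link is preserved (target-side counts are not), and deleted links simply stay deleted. The function names and the toy data are hypothetical.

```python
import random

def randomize_attribute_choices(attrs_t, attrs_t1, values, rng):
    """H0I pseudosample: keep WHO changed value, randomize WHAT they chose."""
    pseudo = dict(attrs_t1)
    for node, old in attrs_t.items():
        if attrs_t1[node] != old:
            options = [v for v in values if v != old]
            pseudo[node] = rng.choice(options)
    return pseudo

def randomize_edge_additions(edges_t, edges_t1, nodes, rng):
    """H0H pseudosample: keep WHO added a link, randomize the target node."""
    old = {frozenset(e) for e in edges_t}
    new = {frozenset(e) for e in edges_t1}
    pseudo = old & new  # links that survive from t to t+1
    for u, v in edges_t1:
        if frozenset((u, v)) in old:
            continue  # not an addition
        # Candidate targets: nodes u was not linked to at t or so far.
        options = [w for w in nodes
                   if w != u
                   and frozenset((u, w)) not in old
                   and frozenset((u, w)) not in pseudo]
        target = rng.choice(options) if options else v
        pseudo.add(frozenset((u, target)))
    return pseudo

rng = random.Random(0)
nodes = [1, 2, 3, 4, 5, 6]
values = ["A", "B", "C"]
edges_t = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
edges_t1 = [(1, 2), (4, 5), (5, 6), (4, 6), (3, 4), (3, 5)]
attrs_t = {1: "A", 2: "A", 3: "B", 4: "B", 5: "B", 6: "B"}
attrs_t1 = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}

pseudo_attrs = randomize_attribute_choices(attrs_t, attrs_t1, values, rng)
pseudo_edges = randomize_edge_additions(edges_t, edges_t1, nodes, rng)
print(pseudo_attrs, sorted(tuple(sorted(e)) for e in pseudo_edges))
```

With only two possible attribute values the replacement value would be forced, which is why the toy example allows a third value "C".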
41. CALCULATING CHOICE BASED RANDOMIZATION
• Non-trivial problem.
• A greedy assignment is involved.
• Collect all the changes (edges & attributes).
• Sort the nodes and attributes from those with
the fewest random options to those with
the most options.
• Prevents violating the underlying NULL
hypothesis.
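The slide does not give the details of the greedy assignment, so the following is only one plausible reading: handle the changes with the fewest random options first, so that the constrained ones are not left without a valid choice. The names `greedy_assign`, `options` and `capacity` are hypothetical.

```python
import random

def greedy_assign(options, capacity, rng):
    """Assign each change a random target, most constrained changes first.

    options:  change -> set of allowed targets
    capacity: target -> how many changes it may still receive
    Returns a dict change -> target, or None if the greedy pass gets stuck.
    """
    remaining = dict(capacity)
    assignment = {}
    # Sort from the fewest options to the most options.
    for change in sorted(options, key=lambda c: len(options[c])):
        available = [t for t in sorted(options[change])
                     if remaining.get(t, 0) > 0]
        if not available:
            return None  # the caller may retry with a fresh random state
        target = rng.choice(available)
        assignment[change] = target
        remaining[target] -= 1
    return assignment

# Toy example: three changes compete for three targets, one use each.
options = {
    "change3": {"x", "y", "z"},
    "change2": {"x", "y"},
    "change1": {"x"},
}
capacity = {"x": 1, "y": 1, "z": 1}

assignment = greedy_assign(options, capacity, random.Random(0))
print(assignment)
```

Here the ordering matters: if "change3" were handled first it could take "x" and leave "change1" with no option at all. Handling "change1" first forces the only complete assignment.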
44. SELF ANALYSIS
Where to go from here?
• Changing the granularity of time step to
investigate deeper.
• Investigating why certain groups had more of
Homophily or Social Influence?
• Apart from friendship, considering other
influential effects.
45. SUMMARY
• Successfully employed a Randomization Technique
for distinguishing Homophily and Social Influence.
• Tested the hypothesis on semi-synthetic and real
world data sets.
• Influence and Homophily varied to different
degrees across groups, depending on group
properties.