A network pruning based approach for subset specific influential detection

A Network Pruning Based Approach for
Subset-Specific Influential Detection
Praphul Chandra, Arun Kalyanasundaram
Hewlett Packard Labs, Bangalore, India
ACM Web Science 2012

Can they really influence our decisions?

Who else can influence our decisions?

How do we exploit this spread of influence?

Viral Marketing
Influential Detection
• Identify a set of nodes (or individuals) to seed with some
information so as to maximize the spread of the seeded
information in the network. [Domingos, et al. 2001][1]

Influential Detection - Other Applications
Water Distribution Networks [Leskovec, et al. 2007][2]
Preventing the spread of diseases [Christakis, Fowler 2007][3]

Influential Detection - A simple heuristic
[Figure: example network of nodes a to g, with the most influential node highlighted]
Finding the most influential node using the highest degree heuristic.
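
A minimal sketch of the highest degree heuristic, assuming an undirected graph given as an edge list. The toy edge list and the function name `highest_degree_node` are illustrative assumptions, not the network drawn on the slide.

```python
from collections import defaultdict

def highest_degree_node(edges):
    """Return the node with the most incident edges (ties broken alphabetically)."""
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # Sort key: highest degree first, then smallest node label.
    return min(degree, key=lambda n: (-degree[n], n))

# Hypothetical undirected toy graph on nodes a to g.
EDGES = [("a", "b"), ("a", "c"), ("a", "d"), ("d", "e"), ("e", "f"), ("e", "g")]
print(highest_degree_node(EDGES))
```

Here a and e both have degree 3, so the alphabetical tie-break decides; any other deterministic tie-break would serve equally well.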

Our Problem - Subset Specific Influential Detection
• Aim : Maximize the spread of influence on a subset of nodes in
the network instead of the whole network.

Subset Specific Influential Detection - Examples
• Small Businesses - Locality based marketing
• Political Campaign [Focus on Supporters / Detractors]
• Targeted advertisements - Demographics [Nationality, Age, Gender, etc.]

Subset Specific Influential Detection - Our Motivation
• Increase in size / density of networks.
• Opportunity to improve the efficiency of traditional approaches.
• Current state of the art “adapts” existing algorithms on influential
detection to the subset specific version. [Kempe, et al. 2003][4]
[Aggarwal, et al. 2011][5]
• We address the subset specific top-k influential detection problem
standalone.

Subset Specific Influential Detection - A simple heuristic
[Figure: example network of nodes a to g, marking the subset of nodes on which to maximize influence spread and the subset specific most influential node]
Finding the subset specific influential using the highest relevant degree heuristic.
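
A hedged sketch of the relevant degree heuristic. The slide does not define "relevant degree"; the code below assumes it means the number of a node's neighbours that lie in the target subset. The edge list, the target set and the function name are hypothetical.

```python
def highest_relevant_degree_node(edges, targets):
    """Node with the most neighbours inside `targets` (ties broken alphabetically)."""
    relevant = {}
    for u, v in edges:
        relevant.setdefault(u, 0)
        relevant.setdefault(v, 0)
        if v in targets:
            relevant[u] += 1  # v is a target neighbour of u
        if u in targets:
            relevant[v] += 1  # u is a target neighbour of v
    return min(relevant, key=lambda n: (-relevant[n], n))

# Hypothetical undirected toy graph and target subset.
EDGES = [("a", "b"), ("a", "c"), ("a", "d"), ("d", "e"), ("e", "f"), ("e", "g")]
TARGETS = {"f", "g"}
print(highest_relevant_degree_node(EDGES, TARGETS))
```

With the targets f and g, node e wins even though a has the same plain degree, which is the point of restricting the count to the subset.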

Our Contribution - A Summary
• An efficient algorithm for subset specific top-k influential detection.
• Performance vs. efficiency trade-off using a tunable parameter - γ.
• Analytical framework: For an iteratively pruned network.
• A lower bound to evaluate the influence spread.
• Proof of sub-modularity of the influence spread function.

Background - Models of Information Diffusion
• Aim: Capture the dynamics of diffusion in social networks.
[Granovetter, Mark 1978][6]
• For Example : Independent Cascade Model (ICM) [Goldenberg,
et al. 2001][7]
• Node u activates its neighbor v with an independent probability, puv.
• Stochastic.
• In general puv = p, the propagation probability.
Activation of a node v by a node u can be seen as the outcome of a coin flip with bias puv.
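
The coin-flip view of the ICM can be sketched as a single simulated cascade. This is an illustrative sketch, not the authors' implementation: the graph format (a dict mapping u to {v: puv}), the function name `simulate_icm` and the toy probabilities are assumptions.

```python
import random

def simulate_icm(graph, seeds, rng):
    """Run one Independent Cascade and return the set of active nodes."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, p_uv in graph.get(u, {}).items():
                # A newly active u gets exactly one biased coin flip per inactive neighbour v.
                if v not in active and rng.random() < p_uv:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active

# Hypothetical graph with probabilities 0 and 1 so the outcome is deterministic.
GRAPH = {"a": {"b": 1.0, "c": 0.0}, "b": {"d": 1.0}}
print(sorted(simulate_icm(GRAPH, ["a"], random.Random(0))))
```

With realistic probabilities the result varies from run to run, which is why the spread is evaluated as an expectation over many runs.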

Independent Cascade Model - Activation Graphs
[Figure: weighted network of nodes a to g (edge weights 0.1, 0.2, 0.01, 0.1, 0.05, 0.3, 0.15) and two activation graphs sampled from it, Activation Graph 1 and Activation Graph 2]
• Activation Graph
• Generated by sampling edges based on puv (edge weight).
• Allows us to evaluate the expected influence spread [Kempe, et al. 2003].
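
Sampling an activation graph amounts to keeping each edge independently with probability puv. A small sketch follows; the edge weights reuse the values shown on the slide, but the topology is an assumption because the drawing is not recoverable, and `sample_activation_graph` is a hypothetical name.

```python
import random

def sample_activation_graph(weighted_edges, rng):
    """Keep each directed edge (u, v) independently with probability p_uv."""
    return [(u, v) for u, v, p_uv in weighted_edges if rng.random() < p_uv]

# Assumed topology; only the weights come from the slide.
WEIGHTED_EDGES = [
    ("a", "b", 0.1), ("a", "c", 0.2), ("b", "d", 0.01), ("c", "d", 0.1),
    ("d", "e", 0.05), ("e", "f", 0.3), ("e", "g", 0.15),
]

rng = random.Random(42)
samples = [sample_activation_graph(WEIGHTED_EDGES, rng) for _ in range(1000)]
kept = sum(("e", "f") in g for g in samples) / len(samples)
print(f"edge e->f kept in {kept:.1%} of samples")
```

The observed frequency for each edge should approach its weight as the number of samples grows.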

Evaluating Influence Spread In ICM [Kempe, et al. 2003]
• Expected influence spread due to a node u :
• Mean number of nodes reachable from u in N activation graphs.
[Figure: the weighted network of nodes a to g and activation graphs 1 to N sampled from it (N outcomes), with the most influential node highlighted. Reachability counts shown:
Activation graph 1: Ra = 3, Rb = 3, Rc = 3, Rd = 3, Re = 0, Rf = 1, Rg = 1
Activation graph N: Ra = 2, Rb = 2, Rc = 0, Rd = 2, Re = 2, Rf = 2, Rg = 2]
Ru : Number of nodes reachable from u, not including u.
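
The estimate above can be sketched as a Monte Carlo average of Ru over sampled activation graphs. This is a sketch under assumptions: directed edges given as (u, v, puv) triples, and hypothetical function names.

```python
import random

def reachable_count(adjacency, source):
    """R_u: number of nodes reachable from `source`, not counting `source`."""
    seen, stack = {source}, [source]
    while stack:
        node = stack.pop()
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen) - 1

def expected_spread(weighted_edges, source, n_samples, rng):
    """Mean of R_u over `n_samples` freshly sampled activation graphs."""
    total = 0
    for _ in range(n_samples):
        adjacency = {}
        for u, v, p_uv in weighted_edges:
            if rng.random() < p_uv:  # edge survives in this activation graph
                adjacency.setdefault(u, []).append(v)
        total += reachable_count(adjacency, source)
    return total / n_samples

# Hypothetical edges with probabilities 0 and 1, so every sample is identical.
EDGES = [("a", "b", 1.0), ("b", "c", 1.0), ("c", "d", 0.0)]
print(expected_spread(EDGES, "a", 100, random.Random(0)))  # 2.0
```

In practice N has to be large enough for the mean to stabilise, and that cost is paid for every candidate node.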

Previous Work - Greedy Algorithm [Kempe, et al. 2003]
• σ(A): Influence spread due to a seed set A.
• δu: Marginal contribution of u, which is σ(A ∪ {u}) − σ(A).
• Approach : Iteratively choose a node u with the highest δu.
• Performance guarantee : 63% of the optimal solution.
• Running time scales poorly with network size: every iteration re-estimates the marginal contribution of each candidate node.
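
A minimal sketch of the greedy selection loop. To keep it deterministic, σ(A) is replaced by a stand-in coverage function over a made-up `REACH` table; in the actual setting σ would be a Monte Carlo estimate. The 63% figure corresponds to the factor 1 − 1/e ≈ 0.632 known for greedy maximization of monotone sub-modular functions.

```python
def greedy_top_k(nodes, spread, k):
    """Pick k seeds, each time adding the node with the largest marginal gain."""
    seeds = []
    for _ in range(k):
        base = spread(seeds)
        best = max(
            (n for n in nodes if n not in seeds),
            key=lambda n: (spread(seeds + [n]) - base, n),
        )
        seeds.append(best)
    return seeds

# Hypothetical stand-in for sigma(A): each node "covers" a fixed set of items.
REACH = {"u": {1, 2, 3}, "v": {1, 2, 3, 4}, "a": {5, 6}, "b": {6}}

def coverage_spread(seeds):
    covered = set()
    for s in seeds:
        covered |= REACH[s]
    return len(covered)

print(greedy_top_k(list(REACH), coverage_spread, 2))
```

In this toy instance v is picked first; the marginal gain of u then drops to zero because v already covers everything u covers, so a is picked second.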

Greedy Algorithm - Pictorial Representation
[Figure: Iteration 1 on an example network, with the top-k influential nodes highlighted]
Node v is chosen as the most influential node, since δv > δu > δa > ...
[Figure: Iteration 2 on the same network]
After Iteration 1, δu drops below δa. Hence a is chosen next.

Problem Statement
Given a graph G(V, E) and a destination set D0 ⊆ V, find the top-k nodes in V which maximize the spread of influence on D0.
[Figure: two example networks of nodes a to g with the destination set (D0) marked. In one, the subset specific most influential does NOT lie in D0; in the other, it does lie in D0.]
Salient features:
• Top-k nodes may or may not be in D0.
• When D0 = V , it reduces to the general form.

Trivial Extension - Subset Adapted Greedy
• Expected influence spread on D0 due to a node u :
• Mean number of nodes in D0 reachable from u in N activation graphs.
[Figure: the weighted network of nodes a to g and activation graphs 1 to N sampled from it (N outcomes), with the subset specific most influential node highlighted. Reachability counts shown:
Activation graph 1: Ra = 2, Rb = 2, Rc = 2, Rd = 3, Re = 0, Rf = 0, Rg = 0
Activation graph N: Ra = 1, Rb = 1, Rc = 0, Rd = 2, Re = 0, Rf = 0, Rg = 0]
Ru : Number of nodes in D0 reachable from u, not including u
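
The only change relative to the general case is that reachability is counted inside D0. A short sketch on a single, already sampled activation graph; the adjacency dict, the set `D0` and the function name are hypothetical.

```python
def reachable_targets(adjacency, source, targets):
    """R_u restricted to D0: targets reachable from `source`, excluding `source`."""
    seen, stack = {source}, [source]
    while stack:
        node = stack.pop()
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    seen.discard(source)
    return len(seen & set(targets))

# Hypothetical activation graph and destination set.
ACTIVATION_GRAPH = {"a": ["b", "c"], "c": ["d"], "e": ["f"]}
D0 = {"a", "d", "f"}
for node in "ace":
    print(node, reachable_targets(ACTIVATION_GRAPH, node, D0))
```

Node a reaches b, c and d, but only d lies in D0, so its count is 1 rather than 3.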

Iterative Pruning Approach - Central Idea
Central Idea
• Identify a set of nodes, ψ, which are considered “influenced”.
• De-prioritize the spread of influence to all nodes in ψ.

When to consider a node as influenced

• Based on a node’s susceptibility to influence.

For Example : [S. Aral, D. Walker 2011][8]
• In our approach, we introduce a threshold parameter γu to model the susceptibility of a node u.

Iterative Pruning Approach - In Three Steps
1. Compute Lu(A) ∈ [0, 1] : Likelihood that a node u would be influenced due to a seed set, A.
• Lu(A) is the expectation that a node u will be active due to A.
Expected Influence Spread: σ(A) = Σ_{u∈V} Lu(A)
2. Set a threshold γu : Add a node u to ψ, when Lu ≥ γu.
• Sociological perspective of γ : Susceptibility or Ease of Influencing.
• Incorporates potential influence that can reach from all over the network.
[Figure: the weighted network of nodes a to g with the subset specific influential and the influenced set (ψ) marked. Example values: La = 0.05, γa = 0.05; Lb = 0.25, γb = 0.2; Lc = 0.15, γc = 0.2]
3. Pruning Process : Remove all paths that lead ONLY to nodes in ψ.
• Significantly improves the efficiency. Details to follow.
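
Steps 1 and 2 can be sketched as follows. The slides do not say how Lu(A) is computed; this sketch assumes a Monte Carlo estimate over sampled activation graphs, and the function names are hypothetical. The threshold example reuses the L and γ values shown in the figure.

```python
import random

def estimate_likelihoods(weighted_edges, seeds, n_samples, rng):
    """Estimate L_u(A) as the share of sampled activation graphs in which u is active."""
    hits = {}
    for _ in range(n_samples):
        adjacency = {}
        for u, v, p_uv in weighted_edges:
            if rng.random() < p_uv:
                adjacency.setdefault(u, []).append(v)
        active, stack = set(seeds), list(seeds)
        while stack:
            for nxt in adjacency.get(stack.pop(), ()):
                if nxt not in active:
                    active.add(nxt)
                    stack.append(nxt)
        for node in active:
            hits[node] = hits.get(node, 0) + 1
    # Nodes that were never activated are absent, i.e. their estimate is 0.
    return {node: count / n_samples for node, count in hits.items()}

def influenced_set(likelihoods, gamma):
    """psi: nodes whose likelihood reaches their own threshold gamma_u."""
    return {u for u, l_u in likelihoods.items() if l_u >= gamma[u]}

# Values taken from the figure: a and b reach their thresholds, c does not.
L = {"a": 0.05, "b": 0.25, "c": 0.15}
GAMMA = {"a": 0.05, "b": 0.2, "c": 0.2}
print(sorted(influenced_set(L, GAMMA)))  # ['a', 'b']
```

Summing the returned likelihoods over all nodes gives the corresponding estimate of σ(A), matching the expression in step 1.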

Iterative Pruning Approach - Pruning Process
• ψ : The set of nodes considered influenced.
• Two level pruning process:
1. For each node in ψ, remove all its adjacent edges.
2. Recursively remove all paths that do NOT lead to any node in D0 \ ψ.
• How does pruning help?
• Improves efficiency by reducing the density of the underlying graph.
[Figure: the network of nodes a to g before pruning, after Level 1 pruning, and after Level 2 pruning]
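
One possible reading of the two pruning levels, sketched on a directed edge list. Level 2 is interpreted here as keeping only edges whose head can still reach (or is) a node of D0 that is not yet in ψ; that interpretation, the function name `prune` and the toy data are assumptions rather than the authors' implementation.

```python
def prune(edges, psi, d0):
    """Two-level pruning sketch over a list of directed edges (u, v)."""
    # Level 1: drop every edge adjacent to an already-influenced node.
    level1 = [(u, v) for u, v in edges if u not in psi and v not in psi]

    # Level 2: find nodes from which some remaining target is still reachable.
    useful = set(d0) - set(psi)
    changed = True
    while changed:
        changed = False
        for u, v in level1:
            if v in useful and u not in useful:
                useful.add(u)
                changed = True

    # Keep only edges that lead towards a remaining target.
    return [(u, v) for u, v in level1 if v in useful]

# Hypothetical example: f is already influenced, so the branch a -> e -> f is dead.
EDGES = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "e"), ("e", "f")]
PSI = {"f"}
D0 = {"c", "d", "f"}
print(prune(EDGES, PSI, D0))
```

Level 1 removes e -> f, and Level 2 then removes a -> e because e no longer leads to any remaining target, leaving a sparser graph for later iterations.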

Experiments
• Datasets: Two real world co-authorship networks
1. High Energy Physics - Theory (HEPT) section of e-print arXiv
Dense network: 15233 nodes / 58891 edges
2. Conference on Software Maintenance and Re-engineering (SMRE)
Sparse network : 1336 nodes / 2200 edges
• Comparison with state of the art:
• Subset Adapted Greedy
• Subset Adapted CELF (Cost Effective Lazy Forward) [Leskovec, et al. 2007]
• System parameter - γ : {p^4, p^3, p^2, p, 2p, 4p}, where p is the propagation probability in ICM.

Results: [ Dataset 1 ] Dense Network
• Iterative Pruning (γ = p^4) vs. Subset Adapted Greedy:
• 96% improvement in efficiency.
• 10% drop in performance (influence spread).
• Iterative Pruning with CELF (γ = p^4) vs. Subset Adapted CELF:
• 10% drop in performance.

Results: [ Dataset 2 ] Sparse Network
• Iterative Pruning (γ = p^4) vs. Subset Adapted Greedy:
• 21% drop in performance (influence spread).
• Iterative Pruning with CELF (γ = p^4) vs. Subset Adapted CELF:
• 21% drop in performance.

Key Inferences
• Low values of γ are highly efficient, but at the cost of performance loss.
• Choose a low value of γ for dense networks and a high value of γ for sparse networks, in order to achieve a desirable performance.
• The relatively low efficiency gains with CELF arise because the pruning operation causes a simultaneous reduction in the marginal contribution of several nodes.

Analytical Framework
• Known: The influence spread function σ(A) is sub-modular when the underlying graph G(V, E) is static across iterations. [Kempe, et al. 2003]
• Is σ<Gi>(A) sub-modular when the underlying graph Gi(V, Ei) is iteratively pruned, where Gi is the graph after the i-th iteration?
Yes. Details in our paper.
• Can we estimate σ(A) from σ<Gi>(A)?
No, but we derive the following lower bound:
σ(A) ≥ σ<Gi>(A) + Σ_{j=1}^{i−1} Σ_{u ∈ ψj \ ψj+1} Lu(A)
where ψj is the set of influenced nodes after the j-th iteration.
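
For reference, the standard definition of sub-modularity (diminishing marginal returns) and the greedy guarantee it yields for a monotone sub-modular σ can be written as below. These are the textbook statements, not formulas taken from the slides; A_greedy and A* denote the greedy and the optimal seed sets of size k.

```latex
% Sub-modularity: adding v to a smaller set helps at least as much as adding it to a larger one.
\sigma(S \cup \{v\}) - \sigma(S) \;\ge\; \sigma(T \cup \{v\}) - \sigma(T)
\qquad \text{for all } S \subseteq T \subseteq V,\; v \in V \setminus T

% Greedy guarantee for monotone sub-modular \sigma (the "63%" quoted earlier).
\sigma(A_{\mathrm{greedy}}) \;\ge\; \left(1 - \tfrac{1}{e}\right) \sigma(A^{*}) \;\approx\; 0.63\, \sigma(A^{*})
```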

Summary
• Iterative network pruning algorithm for subset specific top-k influential
detection.
• Evaluation of our algorithm on two real world datasets showed significant
efficiency gains with an acceptable drop in performance.
• A tunable parameter γ for performance vs. efficiency trade-off.
• Analytical framework to show the sub-modularity of influence spread
function when the underlying graph is iteratively pruned thus enabling
the evaluation of performance guarantees.

Scope for Future Work
• Design of more efficient algorithms.
• Evaluation with real world distributions of γ (susceptibility).
• Extension to non-progressive models of diffusion.

References
[1] P. Domingos and M. Richardson, “Mining the network value of customers,” in Proceedings of the seventh
ACM SIGKDD international conference on Knowledge discovery and data mining, ser. KDD ’01. ACM, 2001,
pp. 57–66. [Online]. Available: http://doi.acm.org/10.1145/502512.502525
[2] J. Leskovec, A. Krause, C. Guestrin, C. Faloutsos, J. VanBriesen, and N. Glance, “Cost-effective outbreak
detection in networks,” in Proceedings of the thirteenth ACM SIGKDD international conference on Knowledge
discovery and data mining, ser. KDD ’07. ACM, 2007, pp. 420–429. [Online]. Available:
http://doi.acm.org/10.1145/1281192.1281239
[3] N. A. Christakis and J. H. Fowler, “The spread of obesity in a large social network over 32 years,” The New
England Journal of Medicine, vol. 357, no. 4, pp. 370–379, July 2007. [Online]. Available:
http://health-equity.pitt.edu/767/
[4] D. Kempe, J. Kleinberg, and E. Tardos, “Maximizing the spread of influence through a social network,” in
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, ser.
KDD ’03. ACM, 2003, pp. 137–146. [Online]. Available: http://doi.acm.org/10.1145/956750.956769
[5] C. C. Aggarwal, A. Khan, and X. Yan, “On flow authority discovery in social networks,” in Proceedings of the
eleventh SIAM international conference on Data mining, ser. SDM ’11. SIAM / Omnipress, 2011, pp.
522–533.
[6] M. Granovetter, “Threshold Models of Collective Behavior,” American Journal of Sociology, vol. 83, no. 6, pp.
1420–1443, 1978. [Online]. Available: http://dx.doi.org/10.2307/2778111
[7] J. Goldenberg, B. Libai, and E. Muller, “Talk of the Network: A Complex Systems Look at the Underlying
Process of Word-of-Mouth,” Marketing Letters, vol. 12, no. 3, pp. 211–223, Aug. 2001. [Online]. Available:
http://www.ingentaconnect.com/content/klu/mark/2001/00000012/00000003/00350022
[8] S. Aral and D. Walker, “Creating Social Contagion Through Viral Product Design: A Randomized Trial of

Questions?
P. Chandra and A. Kalyanasundaram, “A Network Pruning Based Approach
for Subset Specific Influential Detection”, in 4th Annual ACM conference on
Web Science (WebSci 2012), Evanston, Illinois, USA, Jun. 2012.