The network inference problem consists of reconstructing the edge set of a network given traces representing the chronology of infection times as epidemics spread through the network. This problem is a paradigmatic representative of prediction tasks in machine learning that require deducing a latent structure from observed patterns of activity in a network, which often requires an unrealistically large amount of resources (e.g., available data or computational time). A fundamental question is to understand which properties we can predict with a reasonable degree of accuracy given the available resources, and which we cannot. We define the trace complexity as the number of distinct traces required to achieve high fidelity in reconstructing the topology of the unobserved network or, more generally, some of its properties. We give algorithms that are competitive with, while being simpler and more efficient than, existing network inference approaches. Moreover, we show that our algorithms are nearly optimal, by proving an information-theoretic lower bound on the number of traces that an optimal inference algorithm requires for performing this task in the general case. Given these strong lower bounds, we turn our attention to special cases, such as trees and bounded-degree graphs, and to property-recovery tasks, such as reconstructing the degree distribution without inferring the network. We show that these problems require a much smaller (and more realistic) number of traces, making them potentially solvable in practice. By Bruno Abrahao, Flavio Chierichetti, Robert Kleinberg, and Alessandro Panconesi.
Trace Complexity of Network Inference
1. Trace Complexity of Network Inference
Bruno Abrahao (Cornell)
Flavio Chierichetti (Sapienza)
Robert Kleinberg (Cornell)
Alessandro Panconesi (Sapienza)
Wednesday, August 14, 13
3. Influence and diffusion on networks
• Network inference: find influencers, improve marketing, prevent disease outbreaks, and forecast crimes
5. The Network Inference Problem
• Learning each edge independently
  - [Adar, Adamic '05]
• MLE-inspired approaches
  - [Gomez-Rodriguez, Leskovec, Krause '10]
  - [Gomez-Rodriguez, Balduzzi, Schölkopf '11]
  - [Myers, Leskovec '11]
  - [Du et al. '12]
• Information-theoretic ← Our work
  - [Netrapalli, Sanghavi '12]
  - [Gripon, Rabbat '13]
6. The Network Inference Problem
• The relationship between the amount of data and the performance of inference algorithms is not well understood.
What can be inferred? What amount of resources is required? How hard is the inference task?
7. Our goal
• Provide a rigorous foundation for network inference:
1. develop a measure that relates the amount of data to the performance of algorithms
2. give information-theoretic performance guarantees
3. develop more efficient algorithms
8. We assume an underlying cascade model
(Figure: an example network on nodes s, a, b, c, d, e.)
• A cascade starts at a source node s at time t = 0.0.
• When a node becomes infected, each of its incident edges transmits the infection independently with probability p (Pr{H} = p, a coin flip per edge).
• A transmitting edge delivers the infection after an incubation time drawn from Exp(λ).
• The resulting trace records each infected node together with its infection time, e.g.:
  Node s, t=0.0 → Node c, t=0.345 → Node a, t=1.236 → Node b, t=1.705
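The cascade model above is easy to simulate in order to generate synthetic traces. A minimal sketch, assuming an adjacency-list representation and per-edge-attempt coin flips (the function and variable names are illustrative, not from the paper):

```python
import heapq
import random

def simulate_trace(adj, p, lam, source):
    """Generate one trace from the cascade model: the infection starts at
    `source` at time 0.0; each infection attempt along an edge succeeds
    independently with probability p and, if it succeeds, arrives after an
    Exp(lam) incubation time.  Returns (node, time) pairs sorted by time."""
    infected = {source: 0.0}
    heap = []  # (arrival_time, node) events
    for v in adj[source]:
        if random.random() < p:
            heapq.heappush(heap, (random.expovariate(lam), v))
    while heap:
        t, u = heapq.heappop(heap)
        if u in infected:
            continue  # u was already reached by an earlier event
        infected[u] = t
        for v in adj[u]:
            if v not in infected and random.random() < p:
                heapq.heappush(heap, (t + random.expovariate(lam), v))
    return sorted(infected.items(), key=lambda kv: kv[1])
```

With p = 1 on a path s–a–b, the infection must proceed in order s, a, b, matching the trace format shown on the slide.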
17. Traces and cascades
• Each cascade generates one trace
• Random cascade: starts at a node chosen uniformly at random (an assumption in some of our models)
• Traces do not directly reflect the underlying network over which the cascade propagates
Trace: Node s, t=0.0 → Node c, t=0.345 → Node a, t=1.236 → Node b, t=1.705
How much structural information is contained in a trace?
18. Our Research Question I
How many traces do we need to reconstruct the underlying network?
We call this measure the trace complexity of the problem.
19. Our Research Question II
How does trace length play a role in inference? As we keep scanning the trace, it becomes less and less informative.
Trace: Node s, t=0.0 → Node c, t=0.345 → Node a, t=1.236 → Node b, t=1.705
The first edge (s, c) is unambiguous, but a may have been infected by s or by c, and b by any of s, c, or a.
23. The head of the trace
• First-Edge Algorithm
  - Infers the edge corresponding to the first two nodes in each trace (and ignores the rest of the trace)
  - E.g., from the trace Node s, t=0.0 → Node c, t=0.345 → Node a, t=1.236 → ... it keeps only the edge (s, c)
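The First-Edge rule described above can be sketched in a few lines, assuming each trace is a time-sorted list of (node, time) pairs:

```python
def first_edge(traces):
    """First-Edge: from each trace, infer only the edge between the first
    two infected nodes, ignoring the rest of the trace."""
    edges = set()
    for trace in traces:
        if len(trace) >= 2:
            u, v = trace[0][0], trace[1][0]
            edges.add(frozenset((u, v)))  # undirected edge {u, v}
    return edges
```

The rule is sound because the second node in a trace can only have been infected by the source, so every reported edge is a true positive.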
25. Contributions
1. The head of traces
   • First-Edge is close to the best we can do for exact reconstruction: Ω(nΔ^(1−ε)) traces are necessary
2. The tail of traces
   • We give algorithms using exponentially fewer traces
     - trees: O(log n)
     - bounded-degree graphs: O(poly(Δ) log n)
3. Inferring properties without reconstructing the network itself
   - degree distribution: O(n)
26. How many traces do we need for exact reconstruction of general graphs?
27. Lower bound for exact reconstruction of general graphs
Consider G0 = Kn and G1 = Kn − {a, b}, i.e., the complete graph and the complete graph with the single edge {a, b} removed.
1. We choose the unknown graph from {G0, G1}
2. Run random cascades on the chosen graph
28. Lower bound for exact reconstruction of general graphs
Given a set of ℓ random traces T1, ..., Tℓ, Bayes' rule can tell us which of the two alternatives G0 or G1 is the most likely.
29. Lower bound for exact reconstruction of general graphs
Lemma. Let ℓ < n^(2−ε), for any small positive constant ε. Then, with probability 1 − o(1) over the random traces T1, ..., Tℓ, the posterior Pr{G0 | T1, ..., Tℓ} lies in [1/2 − o(1), 1/2 + o(1)].
30. Lower bound for exact reconstruction of general graphs
Corollary. Let Δ be the largest degree of a node in the network. If ℓ < nΔ^(1−ε), any algorithm will fail to reconstruct the graph with high probability: Ω(nΔ^(1−ε)) traces are necessary.
31. The head of the trace
First-Edge reconstructs the graph with O(nΔ log n) traces.
First-Edge: O(nΔ log n) vs. lower bound: Ω(nΔ^(1−ε))
First-Edge is close to the best we can do for exact reconstruction!
32. Can we reconstruct special families of graphs using fewer traces?
33. The tail of the trace
• Contains useful information for reconstructing special families of graphs
• We give algorithms for inference using exponentially fewer traces:
  - trees: O(log n)
  - bounded-degree graphs: O(poly(Δ) log n)
34. Maximum Likelihood Tree Estimation
We can perfectly reconstruct trees with high probability using O(log n) traces.
Take ℓ traces.
1. Set c(u, v) as the median of the observations |t(u) − t(v)| over all traces.
• If (u, v) ∈ E, then {u, v} is the only route of infection between u and v, so the incubation time between u and v is a sample of Exp(λ): c(u, v) < 1/λ with probability approaching 1 exponentially fast in ℓ.
• Otherwise*, c(u, v) > 1/λ with probability approaching 1 exponentially fast in ℓ. (*Step 3 omitted.)
The probability that all these events happen is at least 1 − 1/n^c using ℓ ≥ c · log n traces.
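Step 1 of the tree estimator can be sketched as follows — a simplified version that thresholds the median inter-infection gap at 1/λ and omits the slide's step 3; the function name and trace format are illustrative assumptions:

```python
import statistics

def infer_tree_edges(traces, lam):
    """Median-based tree test: for each pair (u, v) seen together in
    traces, set c(u, v) to the median of |t(u) - t(v)|; keep the pairs
    with c(u, v) < 1/lam as inferred edges.  For a tree edge this gap is
    a single Exp(lam) sample (median ln(2)/lam < 1/lam); for a non-edge
    it is a sum of at least two incubation times, so its median is larger."""
    gaps = {}  # (u, v) with u < v  ->  list of observed |t(u) - t(v)|
    for trace in traces:
        t = dict(trace)
        nodes = sorted(t)
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                gaps.setdefault((u, v), []).append(abs(t[u] - t[v]))
    return {pair for pair, diffs in gaps.items()
            if statistics.median(diffs) < 1.0 / lam}
```

On synthetic traces from a path s–a–b with λ = 1, the medians concentrate on the correct side of the threshold already for a few hundred traces, in line with the exponential concentration claimed on the slide.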
43. Local MLE for inferring bounded-degree graphs
• Think of the potential neighbor sets of u as "forecasters" predicting the infection time of u, given their own infection times
• Identify the most accurate one using a proper scoring rule
Trace complexity: O(poly(Δ) log n)
44. Can we recover properties of a network without paying the full price of network reconstruction?
45. Obtaining network properties more cheaply
• Useful for reasoning about the behavior of processes that take place in the network:
  • Robustness [Cohen et al. '00]
  • Network evolution [Leskovec, Kleinberg, Faloutsos '05]
  • ...
We can infer the degree distribution with high probability with O(n) traces, versus the Ω(nΔ^(1−ε)) lower bound for reconstructing the whole network.
52. Reconstructing the degree distribution
Let d be the degree of the source s, and let t1, t2, ..., tℓ be the times of the first infection in ℓ traces started at s. Each ti is the minimum of d independent Exp(λ) clocks, i.e., a sample of Exp(dλ).
• T = Σ_{i=1}^{ℓ} ti is Erlang(ℓ, dλ)
• Output: d̂ = ℓ / (λT)
Using the Poisson tail bound Pr{Erlang(n, λ) < z} = Pr{Pois(z·λ) ≥ n}, we achieve a (1 + ε)-approximation with probability 1 − δ using Ω(ε⁻² ln(1/δ)) traces.
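The estimator itself is a one-liner. A minimal sketch, assuming we observe, in each trace started at s, the waiting time until s's first neighbor is infected (names are illustrative):

```python
def estimate_degree(first_infection_times, lam):
    """Estimate the degree d of a known source s.  Each waiting time until
    the first infection is the minimum of d Exp(lam) clocks, i.e. a sample
    of Exp(d*lam); the sum T of ell such samples is Erlang(ell, d*lam),
    so the maximum-likelihood estimate is d_hat = ell / (lam * T)."""
    ell = len(first_infection_times)
    T = sum(first_infection_times)
    return ell / (lam * T)
```

For a source of true degree 5 with λ = 1, a couple of thousand simulated waiting times put the estimate within a few percent of 5, consistent with the ε⁻² dependence above.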
54. Reconstructing the degree distribution: experiments
• Using 10n traces on three networks: Barabási–Albert (1024 nodes), Facebook-Rice Undergraduate (1220 nodes), and Facebook-Rice Graduate (503 nodes)
55. Building on the First-Edge algorithm
• First-Edge is close to optimal, but
  • naive and too conservative: it ignores most of the trace information
  • predictable performance: at most as many true-positive edges as the number of traces (and no false positives)
56. Could we discover more true positives if we are willing to take more (calculated) risks?
58. First-Edge+
• Idea: 1. Reconstruct the degree distribution
        2. Guess edges by exploiting the memoryless property
Suppose the source s, with degree ds, is infected at time t0, and node u, with degree du, at time t1. At time t1 there are (ds − 1) + (du − 1) edges waiting, and by memorylessness any of these is equally likely to be the first to finish. So when a third node v becomes infected at time t2:
• s infected v with probability p(s,v) = (ds − 1) / (ds + du − 2)
• u infected v with probability p(u,v) = (du − 1) / (ds + du − 2)
Infer (x, y) if p(x,y) ≥ 0.5.
Given a larger trace prefix u1, ..., uk (u1 is the source): p(ui, u_{k+1}) ≈ d_{ui} / Σ_j d_{uj}
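The attribution rule above can be sketched directly from the two formulas (function names are illustrative, not from the paper):

```python
def p_infector(ds, du):
    """Probability that the source s (degree ds), rather than u (degree du),
    infected the third node: of the (ds-1) + (du-1) exponential clocks
    racing at time t1, each is equally likely to finish first."""
    return (ds - 1) / (ds + du - 2)

def guess_edge(s, u, v, ds, du):
    """First-Edge+ rule for the two-node case: attribute the new node v
    to whichever of s, u has attribution probability >= 0.5."""
    return (s, v) if p_infector(ds, du) >= 0.5 else (u, v)

def attribution_probs(degrees):
    """Approximate rule for a longer infected prefix u1..uk:
    p(ui, u_{k+1}) ~ d_ui / sum_j d_uj."""
    total = sum(degrees)
    return [d / total for d in degrees]
```

Note the trade-off the slides describe: unlike First-Edge, these guesses can be wrong, but they let the algorithm claim many more edges per trace.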
72. Conclusions
• Our results have direct implications for the design of network inference algorithms
• We provide a rigorous analysis of the relationship between the amount of data and the performance of algorithms
• We give algorithms that are competitive with, while being simpler and more efficient than, existing approaches
73. Open questions and challenges
• Performance guarantees for approximate reconstruction
• Trace complexity under other distributions of incubation times
• Bounded-degree network inference has trace complexity polynomial in Δ, but running time exponential in Δ
  - Can we optimize the algorithm?
• Other network properties that can be recovered without reconstructing the network
74. Trace Complexity of Network Inference
Bruno Abrahao (Cornell)
Flavio Chierichetti (Sapienza)
Robert Kleinberg (Cornell)
Alessandro Panconesi (Sapienza)
Complete version including all proofs:
www.arxiv.org/abs/1308.2954
or
http://www.cs.cornell.edu/~abrahao