Trace Complexity of
Network Inference
Bruno Abrahao (Cornell)
Flavio Chierichetti (Sapienza)
Robert Kleinberg (Cornell)
Alessandro Panconesi (Sapienza) Cornell University
1
Sapienza University
Wednesday, August 14, 13
Influence and diffusion on networks
• Network Inference: Find influencers, improve marketing,
prevent disease outbreaks, and forecast crimes
2
The Network Inference Problem
• Learning each edge independently
- [Adar,Adamic‘2005]
• MLE-inspired approaches
- [Gomez-Rodriguez, Leskovec, Krause’2010]
- [Gomez-Rodriguez, Balduzzi, Scholkopf’2011]
- [Myers, Leskovec‘2011]
- [Du et al.‘2012]
• Information theoretic
- [Netrapalli, Sanghavi‘2012]
- [Grippon, Rabbat‘2013]
3
Our work
The Network Inference Problem
• The relationship between the amount of data and the
performance of inference algorithms is not well understood
4
What can be inferred? What amounts of resources are
required? How hard is the inference task?
Our goal
• Provide a rigorous foundation for network inference
1. develop a measure that relates the amount of data to the
performance of algorithms
2. give information-theoretic performance guarantees
3. develop more efficient algorithms
5
We assume an underlying cascade model
6
[Figure: graph on nodes s, a, b, c, d, e; the cascade starts at the source s at time t = 0.0]
Each edge transmits independently with probability Pr{H} = p; a transmitting edge passes the infection on after an incubation time drawn from Exp(λ).
The observed trace records only nodes and infection times:
Node s, t=0.0
Node c, t=0.345
Node a, t=1.236
Node b, t=1.705
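The model above can be sketched as a small simulation (the graph, p, and λ are illustrative; note this sketch flips an independent coin per transmission attempt rather than one per edge):

```python
import heapq
import random

def simulate_cascade(adj, source, p=0.7, lam=1.0, seed=None):
    """Simulate one cascade: each transmission attempt succeeds with
    probability p and, if it succeeds, delivers the infection after an
    Exp(lam) incubation time. Returns the trace: (node, infection time)
    pairs sorted by time."""
    rng = random.Random(seed)
    times = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        t, u = heapq.heappop(heap)
        if t > times.get(u, float("inf")):
            continue  # stale heap entry: u was infected earlier
        for v in adj[u]:
            if rng.random() < p:  # coin flip: does the edge transmit?
                tv = t + rng.expovariate(lam)  # incubation time
                if tv < times.get(v, float("inf")):
                    times[v] = tv
                    heapq.heappush(heap, (tv, v))
    return sorted(times.items(), key=lambda kv: kv[1])

adj = {"s": ["a", "c"], "a": ["s", "b", "c"], "b": ["a", "d"],
       "c": ["s", "a", "e"], "d": ["b"], "e": ["c"]}
trace = simulate_cascade(adj, "s", seed=0)
```

Running it yields a trace like the one on the slide: the source at time 0.0, followed by the other reached nodes in order of infection time.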
Traces and cascades
• Each cascade generates one trace
• Random cascade: starts at a node chosen uniformly at
random (assumption in some of our models)
• Traces do not directly reflect the underlying network over
which the cascade propagates
7
Node s
t=0.0
Node c
t=0.345
Node a
t=1.236
Node b
t=1.705
How much structural information is contained in a trace?
Our Research Question I
How many traces do we need to reconstruct the
underlying network?
We call this measure the trace complexity of the problem.
8
Our Research Question II
How does trace length play a role for inference?
As we keep scanning the trace, it becomes less and less
informative.
9
Node s
t=0.0
Node c
t=0.345
Node a
t=1.236
Node b
t=1.705
[Figure: partial inference from the trace — the edges s–c and c–a can be read off the head, but the edges attaching later nodes such as b are ambiguous]
The head of the trace
10
• First-Edge Algorithm
- Infers the edge corresponding to the first two nodes in
each trace (and ignores the rest of the trace)
Full trace: Node s, t=0.0; Node c, t=0.345; Node a, t=1.236; Node b, t=1.705; Node d, t=1.725
First-Edge keeps only the head: Node s, t=0.0; Node c, t=0.345
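First-Edge itself is a one-liner over each trace. A minimal sketch (traces here are lists of (node, time) pairs; the exact trace format is an illustrative assumption):

```python
def first_edge(traces):
    """Infer one edge per trace: the edge between the first two infected
    nodes. The rest of each trace is ignored, so there are at most as many
    inferred edges as traces, and no false positives."""
    edges = set()
    for trace in traces:
        if len(trace) >= 2:
            (u, _), (v, _) = trace[0], trace[1]
            edges.add(frozenset((u, v)))  # undirected edge {u, v}
    return edges

traces = [[("s", 0.0), ("c", 0.345), ("a", 1.236)],
          [("a", 0.0), ("b", 0.51)]]
inferred = first_edge(traces)
```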
Contributions
11
1. The head of traces
• First-Edge is close to the best we can do for exact reconstruction: Ω(nΔ^(1−ε)) traces are necessary
2. The tail of traces
• We give algorithms using exponentially fewer traces
- trees: O(log n)
- bounded-degree graphs: O(poly(Δ) log n)
3. Infer properties without reconstructing the network itself
- degree distribution: O(n)
How many traces do we need for exact
reconstruction of general graphs?
12
Lower bound for exact reconstruction of general graphs
13
[Figure: the complete graph on nodes a, b, c, d, e, f, ...]
G0 = Kn
G1 = Kn − {a, b} (Kn with the edge {a, b} removed)
1. We choose the unknown graph in {G0, G1}
2. Run random cascades on the chosen graph
14
Lower bound for exact reconstruction of general graphs
[Figure: the unknown graph G?]
Given a set of ℓ random traces T1, ..., Tℓ, Bayes’ rule can tell us which of the two alternatives, G0 or G1, is the more likely.
15
Lower bound for exact reconstruction of general graphs
Lemma
Let ℓ < n^(2−ε), for any small positive constant ε. Then, with probability 1 − o(1) over the random traces T1, ..., Tℓ, the posterior Pr{G0 | T1, ..., Tℓ} lies in [1/2 − o(1), 1/2 + o(1)].
16
Lower bound for exact reconstruction of general graphs
Corollary
If ℓ < nΔ^(1−ε), any algorithm will fail to reconstruct the graph with high probability.
Let Δ be the largest degree of a node in the network: Ω(nΔ^(1−ε)) traces are necessary.
The head of the trace
17
First-Edge reconstructs the graph with O(nΔ log n) traces.
First-Edge: O(nΔ log n)
Lower bound: Ω(nΔ^(1−ε))
First-Edge is close to the best we can do for exact reconstruction!
Can we reconstruct special families
of graphs using fewer traces?
18
The tail of the trace
19
• Useful information for reconstructing special graphs
• We give algorithms for inference using exponentially fewer traces:
- trees: O(log n)
- bounded-degree graphs: O(poly(Δ) log n)
Maximum Likelihood Tree Estimation
20
We can perfectly reconstruct trees with high probability using O(log n) traces.
Take ℓ traces.
1. Set c(u, v) as the median of the observations |t(u) − t(v)| over all traces.
• If (u, v) ∈ E, then {u, v} is the only route of infection between u and v, so the incubation time between u and v is a sample of Exp(λ), and c(u, v) < 1/λ with probability approaching 1 exponentially in ℓ.
• Otherwise*, c(u, v) > 1/λ with probability approaching 1 exponentially in ℓ. (*Step 3 omitted)
The probability that all of these events happen is at least 1 − 1/n^c using ℓ ≥ c · log n traces.
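Step 1 above can be sketched directly; for a concrete step 2 this sketch keeps the cheapest pairs that connect all nodes (a Kruskal-style stand-in for the slide's thresholding, which, like the omitted step 3, is not reproduced here):

```python
from itertools import combinations
from statistics import median

def estimate_tree(traces, nodes):
    """Step 1: c(u, v) = median over traces of |t(u) - t(v)|.
    Then keep the cheapest pairs that connect all nodes (Kruskal-style),
    a simple stand-in for thresholding c against the edge/non-edge gap.
    Each trace is a dict node -> infection time."""
    c = {}
    for u, v in combinations(sorted(nodes), 2):
        diffs = [abs(tr[u] - tr[v]) for tr in traces if u in tr and v in tr]
        if diffs:
            c[(u, v)] = median(diffs)
    parent = {x: x for x in nodes}
    def find(x):  # union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    tree = set()
    for (u, v) in sorted(c, key=c.get):  # cheapest pairs first
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            tree.add((u, v))
    return tree

# True tree: s - a - b. Adjacent pairs have the smallest median gaps.
traces = [{"s": 0.0, "a": 0.5, "b": 1.2},
          {"b": 0.0, "a": 0.6, "s": 1.0},
          {"a": 0.0, "s": 0.4, "b": 0.7}]
tree = estimate_tree(traces, {"s", "a", "b"})
```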
Local MLE for inferring bounded-degree graphs
• Think of the potential neighbor sets of u as “forecasters” predicting the infection time of u, given their own infection times
21
• Identify the most accurate using a proper scoring rule
Trace complexity: O(poly(Δ) log n)
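One way to make the forecaster idea concrete is the logarithmic scoring rule applied to the local likelihood of u's infection time (a sketch under the Exp(λ) model; the paper's exact scoring rule and candidate enumeration are not reproduced):

```python
import math

def log_score(neighbor_times, t_u, lam=1.0):
    """Log-likelihood of u being infected at t_u if its neighbor set had
    the given infection times: u's infection time is the minimum over
    already-infected neighbors v of t_v + Exp(lam)."""
    alive = [t for t in neighbor_times if t < t_u]
    if not alive:
        return float("-inf")  # no candidate neighbor infected before u
    # density of the min of shifted exponentials, evaluated at t_u
    return math.log(lam * len(alive)) - lam * sum(t_u - t for t in alive)

def best_neighbor_set(candidates, traces, u, lam=1.0):
    """Pick the candidate neighbor set ('forecaster') with the highest
    total log score over all traces in which u appears."""
    def total(S):
        return sum(log_score([tr[v] for v in S if v in tr], tr[u], lam)
                   for tr in traces if u in tr)
    return max(candidates, key=total)

# u is infected shortly after a in both traces, long before b could act.
traces = [{"a": 1.0, "b": 5.0, "u": 1.3},
          {"a": 0.2, "b": 4.0, "u": 0.9}]
best = best_neighbor_set([("a",), ("b",)], traces, "u")
```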
Can we recover properties of a network without
paying the full price of network reconstruction?
22
• Useful to reason about the behavior of processes that take
place in the network
• Robustness [Cohen et al.’00]
• Network evolution [Leskovec, Kleinberg, Faloutsos’05]
• ...
Obtaining network properties more cheaply
23
We can infer the degree distribution with high probability using O(n) traces.
Lower bound for reconstructing the whole network: Ω(nΔ^(1−ε))
Reconstructing the degree distribution
24
[Figure: star around the source s; trace i starts at s and records tᵢ, the waiting time until s's first neighbor is infected]
Trace 1: t1
Trace 2: t2
Trace 3: t3
...
Trace ℓ: tℓ
Let d be the degree of s.
• T = Σᵢ₌₁..ℓ tᵢ is Erlang(ℓ, dλ)
Output: d̂ = ℓ / (λT)
Using the Poisson tail bound Pr{Erlang(ℓ, λ) < z} = Pr{Pois(z · λ) ≥ ℓ}, we achieve a (1 + ε)-approximation with probability 1 − δ using Ω(ln(1/δ) / ε²) traces.
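The estimator above is a one-liner once the waiting times are extracted; the waiting time to the source's first transmission is the minimum of d Exp(λ) variables, i.e. Exp(dλ). A sketch (λ is assumed known; the synthetic waits below just illustrate the concentration):

```python
import random

def estimate_degree(waits, lam=1.0):
    """d-hat = l / (lam * T): T is the sum of l waiting times, each
    Exp(d * lam), so T is Erlang(l, d * lam) with mean l / (d * lam)."""
    T = sum(waits)
    return len(waits) / (lam * T)

# Simulate l traces from a source of true degree d = 5.
rng = random.Random(1)
d, lam, l = 5, 1.0, 4000
waits = [rng.expovariate(d * lam) for _ in range(l)]
d_hat = estimate_degree(waits, lam)
```

With ℓ = 4000 traces the estimate concentrates tightly around the true degree, matching the (1 + ε)-approximation guarantee.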
Reconstructing degree distribution
• Using 10n traces
25
Barabasi-Albert
1024 nodes
Facebook-Rice Undergraduate
1220 nodes
Facebook-Rice Graduate
503 nodes
Building on the First-Edge algorithm
• First-Edge is close to optimal, but
• Naive and too conservative: ignores most of the trace information
• Predictable performance: at most as many true-positive edges as the number of traces (and no false positives)
26
Could we discover more true positives
if we are willing to take more (calculated) risks?
27
First-Edge+
28
• Idea: 1. Reconstruct the degree distribution
2. Guess edges by exploiting the memoryless property
[Figure: source s, with N(s) = ds neighbors, infected at t0; u, with N(u) = du neighbors, infected at t1 via the edge (s, u); a new node v infected at t2]
At time t1 there are ds − 1 + du − 1 edges waiting, and by memorylessness any of them is equally likely to be the first to finish. So when v is infected:
s infected v with probability p(s,v) = (ds − 1) / (ds + du − 2)
u infected v with probability p(u,v) = (du − 1) / (ds + du − 2)
Infer (x, y) if p(x,y) ≥ 0.5
Given a larger trace prefix u1, ..., uk (u1 is the source):
p(uᵢ, uₖ₊₁) ≈ d(uᵢ) / Σⱼ d(uⱼ)
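The general prefix rule can be sketched as follows (degrees would come from the degree-distribution step; note the two-node case uses the exact (ds − 1)/(ds + du − 2) rather than this approximation):

```python
def attribute_next_infection(prefix_degrees):
    """Given the degrees of the already-infected prefix u1..uk, return for
    each node the approximate probability that it infects the next node:
    p(ui, u_{k+1}) ~= d(ui) / sum_j d(uj)."""
    total = sum(prefix_degrees.values())
    return {u: d / total for u, d in prefix_degrees.items()}

def guess_parent(prefix_degrees, threshold=0.5):
    """Infer the edge (x, u_{k+1}) only when some p(x, .) clears the
    threshold; otherwise abstain (this is what makes the rule preemptive)."""
    probs = attribute_next_infection(prefix_degrees)
    best = max(probs, key=probs.get)
    return best if probs[best] >= threshold else None

probs = attribute_next_infection({"s": 4, "u": 2})
parent = guess_parent({"s": 4, "u": 2})
```

Here s, with twice the degree, is credited with probability 2/3 and clears the 0.5 threshold, so the edge to the newly infected node is attributed to s.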
Experimental Inference Results
29
Datasets: Barabasi-Albert (1024 nodes, Δ = 174); Power-law tree (1024 nodes, Δ = 94); Facebook (1220 nodes, Δ = 287). Δ = max. degree.
Algorithms compared: NetInf [Gomez-Rodriguez, Leskovec, Krause’2010], First-Edge, First-Edge+
• First-Edge+ exhibits competitive performance
• NetInf’s performance flattens
• Our algorithm perfectly reconstructs trees with ~30 traces
• First-Edge+: competitive performance, extremely simple to implement, computationally efficient, preemptive
Conclusions
• Our results have direct implications for the design of network inference algorithms
• We provide rigorous analysis of the relationship between the
amount of data and the performance of algorithms
• We give algorithms that are competitive with, while being
simpler and more efficient than, existing approaches
30
Open questions and challenges
• Performance guarantees for approximate reconstruction
• Trace complexity under other distributions of incubation
times
• Bounded degree network inference has trace complexity
polynomial in Δ, but running time exponential in Δ
- Can we optimize the algorithm?
• Other network properties that can be recovered without
reconstructing the network
31
Trace complexity of
Network Inference
Bruno Abrahao (Cornell)
Flavio Chierichetti (Sapienza)
Robert Kleinberg (Cornell)
Alessandro Panconesi (Sapienza) Cornell University
32
Sapienza University
Complete version including all proofs
www.arxiv.org/abs/1308.2954
or
http://www.cs.cornell.edu/~abrahao
Wednesday, August 14, 13

Weitere ähnliche Inhalte

Ähnlich wie Trace Complexity of Network Inference

Markov Chain Monitoring - Application to demand prediction in bike sharing sy...
Markov Chain Monitoring - Application to demand prediction in bike sharing sy...Markov Chain Monitoring - Application to demand prediction in bike sharing sy...
Markov Chain Monitoring - Application to demand prediction in bike sharing sy...Harshal Chaudhari
 
Graph Sample and Hold: A Framework for Big Graph Analytics
Graph Sample and Hold: A Framework for Big Graph AnalyticsGraph Sample and Hold: A Framework for Big Graph Analytics
Graph Sample and Hold: A Framework for Big Graph AnalyticsNesreen K. Ahmed
 
Applications and Derivation of Linear Predictive Coding
Applications and Derivation of Linear Predictive CodingApplications and Derivation of Linear Predictive Coding
Applications and Derivation of Linear Predictive CodingEric Larson
 
Improved deterministic algorithms for decremental transitive closure and stro...
Improved deterministic algorithms for decremental transitive closure and stro...Improved deterministic algorithms for decremental transitive closure and stro...
Improved deterministic algorithms for decremental transitive closure and stro...katzelad1
 
Design and Analysis of Algorithm -Shortest paths problem
Design and Analysis of Algorithm -Shortest paths problemDesign and Analysis of Algorithm -Shortest paths problem
Design and Analysis of Algorithm -Shortest paths problempooja saini
 
20131001 lab meeting
20131001 lab meeting20131001 lab meeting
20131001 lab meetingChihua Wu
 
Section 4 3_the_scattering_matrix_package
Section 4 3_the_scattering_matrix_packageSection 4 3_the_scattering_matrix_package
Section 4 3_the_scattering_matrix_packageJamal Kazazi
 
Bring the Noise
Bring the NoiseBring the Noise
Bring the NoiseJon Cowie
 
A Short Course in Data Stream Mining
A Short Course in Data Stream MiningA Short Course in Data Stream Mining
A Short Course in Data Stream MiningAlbert Bifet
 
Overfit10
Overfit10Overfit10
Overfit10okeee
 
Time complexity of union find
Time complexity of union findTime complexity of union find
Time complexity of union findWei (Terence) Li
 
Internet of Things Data Science
Internet of Things Data ScienceInternet of Things Data Science
Internet of Things Data ScienceAlbert Bifet
 
Link Prediction in the Real World
Link Prediction in the Real WorldLink Prediction in the Real World
Link Prediction in the Real WorldBalaji Ganesan
 
Machine Learning for Scientific Applications
Machine Learning for Scientific ApplicationsMachine Learning for Scientific Applications
Machine Learning for Scientific ApplicationsDavid Lary
 
Proof of O(log *n) time complexity of Union find (Presentation by Wei Li, Zeh...
Proof of O(log *n) time complexity of Union find (Presentation by Wei Li, Zeh...Proof of O(log *n) time complexity of Union find (Presentation by Wei Li, Zeh...
Proof of O(log *n) time complexity of Union find (Presentation by Wei Li, Zeh...Amrinder Arora
 

Ähnlich wie Trace Complexity of Network Inference (20)

Markov Chain Monitoring - Application to demand prediction in bike sharing sy...
Markov Chain Monitoring - Application to demand prediction in bike sharing sy...Markov Chain Monitoring - Application to demand prediction in bike sharing sy...
Markov Chain Monitoring - Application to demand prediction in bike sharing sy...
 
Graph Sample and Hold: A Framework for Big Graph Analytics
Graph Sample and Hold: A Framework for Big Graph AnalyticsGraph Sample and Hold: A Framework for Big Graph Analytics
Graph Sample and Hold: A Framework for Big Graph Analytics
 
Applications and Derivation of Linear Predictive Coding
Applications and Derivation of Linear Predictive CodingApplications and Derivation of Linear Predictive Coding
Applications and Derivation of Linear Predictive Coding
 
Network Design Assignment Help
Network Design Assignment HelpNetwork Design Assignment Help
Network Design Assignment Help
 
Improved deterministic algorithms for decremental transitive closure and stro...
Improved deterministic algorithms for decremental transitive closure and stro...Improved deterministic algorithms for decremental transitive closure and stro...
Improved deterministic algorithms for decremental transitive closure and stro...
 
Design and Analysis of Algorithm -Shortest paths problem
Design and Analysis of Algorithm -Shortest paths problemDesign and Analysis of Algorithm -Shortest paths problem
Design and Analysis of Algorithm -Shortest paths problem
 
04 greedyalgorithmsii
04 greedyalgorithmsii04 greedyalgorithmsii
04 greedyalgorithmsii
 
20131001 lab meeting
20131001 lab meeting20131001 lab meeting
20131001 lab meeting
 
lecture7.ppt
lecture7.pptlecture7.ppt
lecture7.ppt
 
Section 4 3_the_scattering_matrix_package
Section 4 3_the_scattering_matrix_packageSection 4 3_the_scattering_matrix_package
Section 4 3_the_scattering_matrix_package
 
Bring the Noise
Bring the NoiseBring the Noise
Bring the Noise
 
A Short Course in Data Stream Mining
A Short Course in Data Stream MiningA Short Course in Data Stream Mining
A Short Course in Data Stream Mining
 
Overfit10
Overfit10Overfit10
Overfit10
 
Time complexity of union find
Time complexity of union findTime complexity of union find
Time complexity of union find
 
UCSD NANO106 - 01 - Introduction to Crystallography
UCSD NANO106 - 01 - Introduction to CrystallographyUCSD NANO106 - 01 - Introduction to Crystallography
UCSD NANO106 - 01 - Introduction to Crystallography
 
Internet of Things Data Science
Internet of Things Data ScienceInternet of Things Data Science
Internet of Things Data Science
 
Link Prediction in the Real World
Link Prediction in the Real WorldLink Prediction in the Real World
Link Prediction in the Real World
 
Machine Learning for Scientific Applications
Machine Learning for Scientific ApplicationsMachine Learning for Scientific Applications
Machine Learning for Scientific Applications
 
Proof of O(log *n) time complexity of Union find (Presentation by Wei Li, Zeh...
Proof of O(log *n) time complexity of Union find (Presentation by Wei Li, Zeh...Proof of O(log *n) time complexity of Union find (Presentation by Wei Li, Zeh...
Proof of O(log *n) time complexity of Union find (Presentation by Wei Li, Zeh...
 
Richard Everitt's slides
Richard Everitt's slidesRichard Everitt's slides
Richard Everitt's slides
 

Kürzlich hochgeladen

PROCESS RECORDING FORMAT.docx
PROCESS      RECORDING        FORMAT.docxPROCESS      RECORDING        FORMAT.docx
PROCESS RECORDING FORMAT.docxPoojaSen20
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfagholdier
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...christianmathematics
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxVishalSingh1417
 
Making and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfMaking and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfChris Hunter
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeThiyagu K
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdfQucHHunhnh
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfAdmir Softic
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsTechSoup
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhikauryashika82
 
An Overview of Mutual Funds Bcom Project.pdf
An Overview of Mutual Funds Bcom Project.pdfAn Overview of Mutual Funds Bcom Project.pdf
An Overview of Mutual Funds Bcom Project.pdfSanaAli374401
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfciinovamais
 
fourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingfourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingTeacherCyreneCayanan
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxnegromaestrong
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.christianmathematics
 
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...Shubhangi Sonawane
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Disha Kariya
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityGeoBlogs
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin ClassesCeline George
 

Kürzlich hochgeladen (20)

PROCESS RECORDING FORMAT.docx
PROCESS      RECORDING        FORMAT.docxPROCESS      RECORDING        FORMAT.docx
PROCESS RECORDING FORMAT.docx
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 
Making and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfMaking and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdf
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
 
An Overview of Mutual Funds Bcom Project.pdf
An Overview of Mutual Funds Bcom Project.pdfAn Overview of Mutual Funds Bcom Project.pdf
An Overview of Mutual Funds Bcom Project.pdf
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
fourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingfourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writing
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptx
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
 

Trace Complexity of Network Inference

  • 1. Trace Complexity of Network Inference Bruno Abrahao (Cornell) Flavio Chierichetti (Sapienza) Robert Kleinberg (Cornell) Alessandro Panconesi (Sapienza) Cornell University 1 Text Sapienza University Wednesday, August 14, 13
  • 2. Influence and diffusion on networks 2
  • 3. Influence and diffusion on networks • Network Inference: Find influencers, improve marketing, prevent disease outbreaks, and forecast crimes 2
  • 4. The Network Inference Problem • Learning each edge independently - [Adar, Adamic ‘2005] • MLE-inspired approaches - [Gomez-Rodriguez, Leskovec, Krause ’2010] - [Gomez-Rodriguez, Balduzzi, Scholkopf ’2011] - [Myers, Leskovec ‘2011] - [Du et al. ‘2012] • Information theoretic - [Netrapalli, Sanghavi ‘2012] - [Grippon, Rabbat ‘2013] 3
  • 5. The Network Inference Problem • Learning each edge independently - [Adar, Adamic ‘2005] • MLE-inspired approaches - [Gomez-Rodriguez, Leskovec, Krause ’2010] - [Gomez-Rodriguez, Balduzzi, Scholkopf ’2011] - [Myers, Leskovec ‘2011] - [Du et al. ‘2012] • Information theoretic - [Netrapalli, Sanghavi ‘2012] - [Grippon, Rabbat ‘2013] 3 Our work
  • 6. The Network Inference Problem • The relationship between the amount of data and the performance of inference algorithms is not well understood 4 What can be inferred? What amounts of resources are required? How hard is the inference task?
  • 7. Our goal • Provide a rigorous foundation for network inference 1. develop a measure that relates the amount of data to the performance of algorithms 2. give information-theoretic performance guarantees 3. develop more efficient algorithms 5
  • 8. We assume an underlying cascade model 6 b d e a c s t = 0.0
  • 9. We assume an underlying cascade model 6 b d e a c s Pr{H} = p
  • 10. We assume an underlying cascade model 6 b d e a c s Pr{H} = p
  • 11. We assume an underlying cascade model 6 b d e a c s Exp(λ)
  • 12. We assume an underlying cascade model 6 b d e a c s
  • 13. We assume an underlying cascade model 6 b d e a c s c
  • 14. We assume an underlying cascade model 6 b d e a c s c a b
  • 15. We assume an underlying cascade model 6 b d e a c s c a b Node s t=0.0 Node c t=0.345 Node a t=1.236 Node b t=1.705 Trace
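The cascade model built up in the slides above (once a node is infected, each incident edge transmits after an independent Exp(λ) incubation time; a trace is the list of nodes with their infection times) can be sketched as a small simulator. This is an illustrative sketch, not the authors' code: the adjacency-list encoding, the name `simulate_cascade`, and the default λ = 1 are assumptions.

```python
import heapq
import random

def simulate_cascade(adj, source, lam=1.0, rng=None):
    """Simulate one cascade under the exponential incubation model.

    Each edge out of an infected node fires after an independent
    Exp(lam) delay; a node's infection time is the earliest
    transmission that reaches it (a Dijkstra-style sweep over
    randomly weighted edges).  Returns the trace: (node, time)
    pairs sorted by infection time.
    """
    rng = rng or random.Random()
    times = {source: 0.0}
    heap = [(0.0, source)]
    settled = set()
    while heap:
        t, u = heapq.heappop(heap)
        if u in settled:
            continue
        settled.add(u)
        for v in adj[u]:
            if v in settled:
                continue
            cand = t + rng.expovariate(lam)
            if cand < times.get(v, float("inf")):
                times[v] = cand
                heapq.heappush(heap, (cand, v))
    return sorted(times.items(), key=lambda p: p[1])
```

Repeated calls with random sources produce exactly the kind of trace set the rest of the talk analyzes.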
  • 16. Traces and cascades • Each cascade generates one trace • Random cascade: starts at a node chosen uniformly at random (assumption in some of our models) • Traces do not directly reflect the underlying network over which the cascade propagates 7 Node s t=0.0 Node c t=0.345 Node a t=1.236 Node b t=1.705
  • 17. Traces and cascades • Each cascade generates one trace • Random cascade: starts at a node chosen uniformly at random (assumption in some of our models) • Traces do not directly reflect the underlying network over which the cascade propagates 7 Node s t=0.0 Node c t=0.345 Node a t=1.236 Node b t=1.705 How much structural information is contained in a trace?
  • 18. Our Research Question I How many traces do we need to reconstruct the underlying network? We call this measure the trace complexity of the problem. 8
  • 19. Our Research Question II How does trace length play a role for inference? As we keep scanning the trace, it becomes less and less informative. 9 Node s t=0.0 Node c t=0.345 Node a t=1.236 Node b t=1.705
  • 20. Our Research Question II How does trace length play a role for inference? As we keep scanning the trace, it becomes less and less informative. 9 Node s t=0.0 Node c t=0.345 Node a t=1.236 Node b t=1.705 s c
  • 21. Our Research Question II How does trace length play a role for inference? As we keep scanning the trace, it becomes less and less informative. 9 Node s t=0.0 Node c t=0.345 Node a t=1.236 Node b t=1.705 s c a ? ?
  • 22. Our Research Question II How does trace length play a role for inference? As we keep scanning the trace, it becomes less and less informative. 9 Node s t=0.0 Node c t=0.345 Node a t=1.236 Node b t=1.705 s c a ? ? b ? ? ?
  • 23. • First-Edge Algorithm - Infers the edge corresponding to the first two nodes in each trace (and ignores the rest of the trace) The head of the trace 10 Node s t=0.0 Node c t=0.345 Node a t=1.236 Node b t=1.705 Node d t=1.725
  • 24. • First-Edge Algorithm - Infers the edge corresponding to the first two nodes in each trace (and ignores the rest of the trace) The head of the trace 10 Node s t=0.0 Node c t=0.345
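First-Edge, as described on the slide, is a few lines of code. A minimal sketch (the trace format — lists of (node, time) pairs in infection order — matches the simulator convention above and is an assumption):

```python
def first_edge(traces):
    """First-Edge: from each trace, infer only the edge joining the
    first two infected nodes; the rest of the trace is ignored."""
    edges = set()
    for trace in traces:
        if len(trace) >= 2:
            edges.add(frozenset((trace[0][0], trace[1][0])))
    return edges
```

By construction it outputs at most one edge per trace and, under the model, never a false positive: the second infected node was necessarily infected by the source.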
  • 25. Contributions 11 1. The head of traces • First-Edge is close to the best we can do for exact reconstruction: Ω(nΔ^(1−ε)) traces are necessary 2. The tail of traces • We give algorithms using exponentially fewer traces - trees: O(log n) - bounded degree graphs: O(poly(Δ) log n) 3. Infer properties without reconstructing the network itself - degree distribution: O(n) traces
  • 26. How many traces do we need for exact reconstruction of general graphs? 12
  • 27. Lower bound for exact reconstruction of general graphs 13 a b c... d f e G0 = Kn a b c... d f e G1 = Kn − {a, b} 1. We choose the unknown graph in {G0, G1} 2. Run random cascades on the chosen graph
  • 28. Lower bound for exact reconstruction of general graphs 14 a b c... d f e G? Given a set of ℓ random traces T1, . . . , Tℓ, Bayes’ rule can tell us which of the two alternatives G0 or G1 is the most likely.
  • 29. Lower bound for exact reconstruction of general graphs 15 a b c... d f e G? Lemma: Let ℓ < n^(2−ε) for any small positive constant ε. Then, with prob. 1 − o(1) over the random traces T1, ..., Tℓ, the posterior Pr{G0 | T1, ..., Tℓ} lies in [1/2 − o(1), 1/2 + o(1)]
  • 30. Lower bound for exact reconstruction of general graphs 16 Let Δ be the largest degree of a node in the network. Corollary: If ℓ < n · Δ^(1−ε), any algorithm will fail to reconstruct the graph with high probability, i.e., Ω(nΔ^(1−ε)) traces are necessary
  • 31. The head of the trace 17 First-Edge reconstructs the graph with O(nΔ log n) traces. First-Edge: O(nΔ log n) Lower bound: Ω(nΔ^(1−ε)) First-Edge is close to the best we can do for exact reconstruction!
  • 32. Can we reconstruct special families of graphs using fewer traces? 18
  • 33. The tail of the trace 19 • Useful information to reconstruct special graphs • We give algorithms for inference using exponentially fewer traces. - trees: O(log n) - bounded degree graphs: O(poly(Δ) log n)
  • 34. Maximum Likelihood Tree Estimation 20 We can perfectly reconstruct trees with high probability using O(log n) traces.
  • 35. Maximum Likelihood Tree Estimation 20 We can perfectly reconstruct trees with high probability using O(log n) traces. Take ℓ traces
  • 36. Maximum Likelihood Tree Estimation 20 We can perfectly reconstruct trees with high probability using O(log n) traces. Take ℓ traces 1. Set c(u, v) as the median of observations |t(u) − t(v)| over all traces
  • 37. Maximum Likelihood Tree Estimation 20 We can perfectly reconstruct trees with high probability using O(log n) traces. Take ℓ traces u v 1. Set c(u, v) as the median of observations |t(u) − t(v)| over all traces
  • 38. Maximum Likelihood Tree Estimation 20 We can perfectly reconstruct trees with high probability using O(log n) traces. Take ℓ traces u v {u,v} is the only route of infection between u and v 1. Set c(u, v) as the median of observations |t(u) − t(v)| over all traces
  • 39. Maximum Likelihood Tree Estimation 20 We can perfectly reconstruct trees with high probability using O(log n) traces. Take ℓ traces u v {u,v} is the only route of infection between u and v Incubation time between u and v is a sample of Exp(λ) 1. Set c(u, v) as the median of observations |t(u) − t(v)| over all traces
  • 40. Maximum Likelihood Tree Estimation 20 We can perfectly reconstruct trees with high probability using O(log n) traces. Take ℓ traces u v {u,v} is the only route of infection between u and v Incubation time between u and v is a sample of Exp(λ) 1. Set c(u, v) as the median of observations |t(u) − t(v)| over all traces If (u, v) ∈ E, c(u, v) < 1 with prob. approaching 1 exponentially in ℓ
  • 41. Maximum Likelihood Tree Estimation 20 We can perfectly reconstruct trees with high probability using O(log n) traces. Take ℓ traces u v 1. Set c(u, v) as the median of observations |t(u) − t(v)| over all traces (*Step 3 omitted) Otherwise*, c(u, v) > 1 with prob. approaching 1 exponentially in ℓ If (u, v) ∈ E, c(u, v) < 1 with prob. approaching 1 exponentially in ℓ
  • 42. Maximum Likelihood Tree Estimation 20 We can perfectly reconstruct trees with high probability using O(log n) traces. Take ℓ traces u v 1. Set c(u, v) as the median of observations |t(u) − t(v)| over all traces (*Step 3 omitted) Otherwise*, c(u, v) > 1 with prob. approaching 1 exponentially in ℓ If (u, v) ∈ E, c(u, v) < 1 with prob. approaching 1 exponentially in ℓ Prob. that all these events happen: ≥ 1 − 1/n^c using ℓ ≥ c · log n traces.
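The median test in the slide sequence above can be sketched as follows. Assumptions not on the slide: traces are represented as dicts from node to infection time, λ = 1 (so an edge's median incubation gap concentrates near ln 2), and the unit threshold separating adjacent from non-adjacent pairs is read directly off the slide; step 3, omitted there, is omitted here too.

```python
from itertools import combinations
from statistics import median

def infer_tree_edges(traces, nodes, threshold=1.0):
    """Keep a pair {u, v} iff the median of |t(u) - t(v)| over the
    traces falls below the threshold (steps 1-2 of the slide's tree
    estimator, under lambda = 1)."""
    edges = set()
    for u, v in combinations(nodes, 2):
        gaps = [abs(tr[u] - tr[v]) for tr in traces if u in tr and v in tr]
        if gaps and median(gaps) < threshold:
            edges.add(frozenset((u, v)))
    return edges
```

On a tree, the gap between adjacent nodes is a single Exp(λ) sample per trace, while non-adjacent pairs accumulate several, which is why a fixed threshold on the median separates the two cases with probability approaching 1 exponentially in ℓ.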
  • 43. Local MLE for inferring bounded degree graphs • Think of the potential neighbor sets of u as “forecasters” predicting the infection time of u, given their own infection times 21 • Identify the most accurate using a proper scoring rule Trace complexity: O(poly(Δ) log n)
  • 44. Can we recover properties of a network without paying the full price of network reconstruction? 22
  • 45. Obtaining network properties cheaper 23 • Useful to reason about the behavior of processes that take place in the network • Robustness [Cohen et al. ’00] • Network evolution [Leskovec, Kleinberg, Faloutsos ’05] • ... We can infer the degree distribution with high probability with O(n) traces. Lower bound for reconstructing the whole network: Ω(nΔ^(1−ε))
  • 47. Reconstructing degree distribution 24 s t1 Trace 1 t1
  • 48. Reconstructing degree distribution 24 s t2 Trace 1 t1 Trace 2 t2
  • 49. Reconstructing degree distribution 24 s t3 Trace 1 t1 Trace 2 t2 Trace 3 t3
  • 50. Reconstructing degree distribution 24 s Trace 1 t1 Trace 2 t2 Trace 3 t3 . . . Trace ℓ tℓ
  • 51. Reconstructing degree distribution 24 s Trace 1 t1 Trace 2 t2 Trace 3 t3 . . . Trace ℓ tℓ Let d be the degree of s • T = Σ_{i=1}^{ℓ} t_i is Erlang(ℓ, dλ)
  • 52. Reconstructing degree distribution 24 s Pr{Erlang(n, λ) < z} = Pr{Pois(z · λ) ≥ n} Trace 1 t1 Trace 2 t2 Trace 3 t3 . . . Trace ℓ tℓ Let d be the degree of s • T = Σ_{i=1}^{ℓ} t_i is Erlang(ℓ, dλ)
  • 53. Reconstructing degree distribution 24 s Pr{Erlang(n, λ) < z} = Pr{Pois(z · λ) ≥ n} Trace 1 t1 Trace 2 t2 Trace 3 t3 . . . Trace ℓ tℓ Let d be the degree of s • T = Σ_{i=1}^{ℓ} t_i is Erlang(ℓ, dλ) Output: d̂ = ℓ / (λT) Using the Poisson tail bound, we achieve a (1 + ε)-approximation with probability 1 − δ using ℓ = Ω(ln(1/δ) / ε²) traces.
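The estimator on the slide is one line of arithmetic once the waiting times are in hand. A sketch under stated assumptions (function name assumed; λ known): each t_i, the wait until the first infection in a trace rooted at s, is the minimum of d independent Exp(λ) clocks, hence Exp(dλ); their sum T is Erlang(ℓ, dλ), so d̂ = ℓ/(λT).

```python
import random

def estimate_degree(first_waits, lam=1.0):
    """Estimate deg(s) from the waiting times t_i until the first
    infection in each of ell traces started at s.  Each t_i is
    Exp(d * lam), so T = sum(t_i) ~ Erlang(ell, d * lam) and
    ell / (lam * T) is the natural estimator d_hat of the slide."""
    T = sum(first_waits)
    return len(first_waits) / (lam * T)
```

A quick sanity check: drawing waits from Exp(dλ) with d = 7 and a few thousand traces, the estimate concentrates around 7, consistent with the (1 + ε)-approximation guarantee.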
  • 54. Reconstructing degree distribution • Using 10n traces 25 Barabasi-Albert 1024 nodes Facebook-Rice Undergraduate 1220 nodes Facebook-Rice Graduate 503 nodes
  • 55. Building on the First-Edge algorithm • First-Edge is close to optimal, but • Naive and too conservative: ignores most of the trace information • Predictable performance: at most as many true-positive edges as the number of traces (and no false positives) 26
  • 56. Could we discover more true positives if we are willing to take more (calculated) risks? 27
  • 58. • Idea: 1. Reconstruct degree distribution 2. Guess edges by exploiting the memoryless property. First-Edge+ 28
  • 59. • Idea: 1. Reconstruct degree distribution 2. Guess edges by exploiting the memoryless property. First-Edge+ 28 s t0 N(s) = ds
  • 60. • Idea: 1. Reconstruct degree distribution 2. Guess edges by exploiting the memoryless property. First-Edge+ 28 s t0 N(s) = ds u t1 N(u) = du
  • 61. • Idea: 1. Reconstruct degree distribution 2. Guess edges by exploiting the memoryless property. First-Edge+ 28 (ds − 1) + (du − 1) edges waiting at time t1 s t0 N(s) = ds u t1 N(u) = du
  • 62. • Idea: 1. Reconstruct degree distribution 2. Guess edges by exploiting the memoryless property. First-Edge+ 28 (ds − 1) + (du − 1) edges waiting at time t1 s t0 N(s) = ds u t1 N(u) = du Any of these are equally likely to be the first to finish
  • 63. • Idea: 1. Reconstruct degree distribution 2. Guess edges by exploiting the memoryless property. First-Edge+ 28 (ds − 1) + (du − 1) edges waiting at time t1 v ? t2 s t0 N(s) = ds u t1 N(u) = du
  • 64. • Idea: 1. Reconstruct degree distribution 2. Guess edges by exploiting the memoryless property. First-Edge+ 28 (ds − 1) + (du − 1) edges waiting at time t1 v ? t2 s t0 N(s) = ds u t1 N(u) = du s infected v with probability p(s,v) = (ds − 1) / (ds + du − 2) u infected v with probability p(u,v) = (du − 1) / (ds + du − 2)
  • 65. • Idea: 1. Reconstruct degree distribution 2. Guess edges by exploiting the memoryless property. First-Edge+ 28 (ds − 1) + (du − 1) edges waiting at time t1 Infer (x, y) if p(x,y) ≥ 0.5 v ? t2 s t0 N(s) = ds u t1 N(u) = du s infected v with probability p(s,v) = (ds − 1) / (ds + du − 2) u infected v with probability p(u,v) = (du − 1) / (ds + du − 2)
  • 66. • Idea: 1. Reconstruct degree distribution 2. Guess edges by exploiting the memoryless property. First-Edge+ 28 (ds − 1) + (du − 1) edges waiting at time t1 Infer (x, y) if p(x,y) ≥ 0.5 Given a larger trace prefix u1, · · · , uk (u1 is the source): p(ui, uk+1) ≈ dui / Σj duj v ? t2 s t0 N(s) = ds u t1 N(u) = du s infected v with probability p(s,v) = (ds − 1) / (ds + du − 2) u infected v with probability p(u,v) = (du − 1) / (ds + du − 2)
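The guessing rule built up above can be sketched in a few lines (function names assumed). By memorylessness, each already-infected node u_i in the prefix is the parent of the newest infection with probability roughly proportional to its (estimated) degree, and First-Edge+ commits to an edge only when the best candidate clears 0.5:

```python
def parent_probs(prefix_degrees):
    """Approximate parent probabilities for the next infected node,
    given the (estimated) degrees of the infected prefix u_1..u_k:
    p(u_i) ~ d_{u_i} / sum_j d_{u_j}, the slide's approximation of
    the exact still-waiting-edge counts."""
    total = sum(prefix_degrees)
    return [d / total for d in prefix_degrees]

def first_edge_plus_guess(prefix_degrees, cutoff=0.5):
    """Return the index of the most likely parent, or None when no
    candidate's probability reaches the cutoff (First-Edge+ then
    abstains rather than risk a false positive)."""
    probs = parent_probs(prefix_degrees)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best if probs[best] >= cutoff else None
```

For a two-node prefix this reduces (up to the d vs. d − 1 approximation) to the p(s,v), p(u,v) formulas on the slide.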
  • 67. Experimental Inference Results 29 Barabasi-Albert 1024 nodes Δ = 174 Power-law tree 1024 nodes Δ = 94 Facebook 1220 nodes Δ = 287 Δ = max. degree - Netinf [Gomez-Rodriguez, Leskovec, Krause ’2010] - First-Edge - First-Edge+
  • 68. Experimental Inference Results 29 Barabasi-Albert 1024 nodes Δ = 174 Power-law tree 1024 nodes Δ = 94 Facebook 1220 nodes Δ = 287 Δ = max. degree First-Edge+ exhibits competitive performance - Netinf [Gomez-Rodriguez, Leskovec, Krause ’2010] - First-Edge - First-Edge+
  • 69. Experimental Inference Results 29 Barabasi-Albert 1024 nodes Δ = 174 Power-law tree 1024 nodes Δ = 94 Facebook 1220 nodes Δ = 287 Δ = max. degree NetInf’s performance flattens - Netinf [Gomez-Rodriguez, Leskovec, Krause ’2010] - First-Edge - First-Edge+
  • 70. Experimental Inference Results 29 Barabasi-Albert 1024 nodes Δ = 174 Power-law tree 1024 nodes Δ = 94 Facebook 1220 nodes Δ = 287 Δ = max. degree Our algorithm perfectly reconstructs trees with ~30 traces - Netinf [Gomez-Rodriguez, Leskovec, Krause ’2010] - First-Edge - First-Edge+
  • 71. Experimental Inference Results 29 Barabasi-Albert 1024 nodes Δ = 174 Power-law tree 1024 nodes Δ = 94 Facebook 1220 nodes Δ = 287 Δ = max. degree • First-Edge+: competitive performance, extremely simple to implement, computationally efficient, preemptive. - Netinf [Gomez-Rodriguez, Leskovec, Krause ’2010] - First-Edge - First-Edge+
  • 72. Conclusions • Our results have direct implications for the design of network inference algorithms • We provide a rigorous analysis of the relationship between the amount of data and the performance of algorithms • We give algorithms that are competitive with, while being simpler and more efficient than, existing approaches 30
  • 73. Open questions and challenges • Performance guarantees for approximate reconstruction • Trace complexity under other distributions of incubation times • Bounded-degree network inference has trace complexity polynomial in Δ, but running time exponential in Δ - Can we optimize the algorithm? • Other network properties that can be recovered without reconstructing the network 31
  • 74. Trace complexity of Network Inference Bruno Abrahao (Cornell) Flavio Chierichetti (Sapienza) Robert Kleinberg (Cornell) Alessandro Panconesi (Sapienza) Cornell University 32 Sapienza University Complete version including all proofs www.arxiv.org/abs/1308.2954 or http://www.cs.cornell.edu/~abrahao