ABRA: Approximating Betweenness Centrality in Static and Dynamic Graphs with Rademacher Averages

Matteo Riondato and Eli Upfal
22nd ACM SIGKDD Conference, August 2016
Murata Lab - Paper reading seminar
Presented by: Kaushalya Madhawa
(25th November 2016)

OUTLINE
1. INTRODUCTION
2. RANDOM SAMPLING FOR APPROXIMATIONS
3. STATISTICAL LEARNING THEORY
‣ representativeness of a sample
‣ Rademacher averages
4. EXPERIMENTS AND RESULTS

BETWEENNESS CENTRALITY (BC)
▸ unweighted graph G = (V, E)
▸ n = |V|, m = |E|
$b(w) = \frac{1}{|V|(|V|-1)} \sum_{(u,v) \in V \times V,\, u \neq v} \frac{\sigma_{uv}(w)}{\sigma_{uv}}$
▸ σ_uv: number of shortest paths from u to v
▸ σ_uv(w): number of shortest paths from u to v passing through w
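The definition can be evaluated directly on small graphs. The Python sketch below is a brute-force illustration, not an algorithm from the paper. It assumes an unweighted graph given as a hypothetical adjacency dict `{node: [neighbors]}`, counts w only when it is an internal node of a path (w ∉ {u, v}), and lets unreachable pairs contribute 0. It relies on the fact that a shortest u-v path through w is a shortest u-w path followed by a shortest w-v path, so σ_uv(w) = σ_uw · σ_wv whenever d(u,w) + d(w,v) = d(u,v), and 0 otherwise.

```python
from collections import deque
from itertools import permutations


def bfs_counts(adj, source):
    """Return (dist, sigma): BFS distances and shortest-path counts from source."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:  # first time y is reached
                dist[y] = dist[x] + 1
                sigma[y] = 0
                queue.append(y)
            if dist[y] == dist[x] + 1:  # x precedes y on a shortest path
                sigma[y] += sigma[x]
    return dist, sigma


def betweenness_by_definition(adj):
    """b(w) = 1/(n(n-1)) * sum over ordered pairs u != v of sigma_uv(w) / sigma_uv."""
    nodes = list(adj)
    n = len(nodes)
    info = {s: bfs_counts(adj, s) for s in nodes}
    b = {w: 0.0 for w in nodes}
    for u, v in permutations(nodes, 2):
        dist_u, sigma_u = info[u]
        if v not in dist_u:
            continue  # v unreachable from u: the pair contributes 0
        for w in nodes:
            if w in (u, v) or w not in dist_u:
                continue  # only internal nodes reachable from u are counted
            dist_w, sigma_w = info[w]
            if v in dist_w and dist_u[w] + dist_w[v] == dist_u[v]:
                # w lies on a shortest u-v path: sigma_uv(w) = sigma_uw * sigma_wv
                b[w] += sigma_u[w] * sigma_w[v] / sigma_u[v]
    return {w: x / (n * (n - 1)) for w, x in b.items()}
```

On the path a-b-c, only b is internal to a shortest path, for the ordered pairs (a, c) and (c, a), so b(b) = 2/(3·2) = 1/3.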

BETWEENNESS CENTRALITY (BC)
▸ the fastest exact algorithm for computing BC [Brandes 2001] runs in O(nm) time
▸ it requires O(n+m) space
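For comparison, here is a sketch of Brandes' accumulation for unweighted graphs in Python: one BFS per source counts shortest paths and records predecessors, then dependencies are propagated back in order of non-increasing distance. Each source costs O(n+m) time and space, giving O(nm) time overall. Assumptions: a hypothetical adjacency dict `{node: [neighbors]}` for an undirected graph stored with symmetric adjacency lists, and a final division by n(n−1) so the values use the same normalization as b(w) on these slides; the function name is illustrative.

```python
from collections import deque


def brandes_betweenness(adj):
    """Exact betweenness for an unweighted graph via Brandes' accumulation,
    normalized by n(n-1) to match b(w)."""
    nodes = list(adj)
    n = len(nodes)
    b = {w: 0.0 for w in nodes}
    for s in nodes:
        # Phase 1: BFS from s, counting shortest paths and recording predecessors.
        stack = []
        preds = {w: [] for w in nodes}
        sigma = {w: 0 for w in nodes}
        dist = {w: -1 for w in nodes}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:  # w discovered for the first time
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # edge (v, w) lies on a shortest path
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Phase 2: pop nodes farthest-first and back-propagate dependencies.
        delta = {w: 0.0 for w in nodes}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                b[w] += delta[w]
    return {w: x / (n * (n - 1)) for w, x in b.items()}
```

On the 4-cycle 0-1-2-3-0, each node gets b(w) = 1/12: each of the 4 ordered pairs of opposite nodes has two shortest paths, one through each of the other two nodes.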

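APPROXIMATE BC FOR LARGE NETWORKS
▸ exact computation does not scale to large networks; approximation methods use random sampling to estimate betweenness centrality with acceptable accuracy
▸ problem definition
▸ given ε, δ ∈ (0, 1), an (ε, δ)-approximation to B = {b(w), w ∈ V} is a collection B̃ = {b̃(w), w ∈ V} such that
$\Pr\left(\forall w \in V:\ |\tilde{b}(w) - b(w)| \le \varepsilon\right) \ge 1 - \delta$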
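For intuition about what an (ε, δ)-approximation costs, here is a standard data-independent baseline, not ABRA's bound. If each b̃(w) is an average of m i.i.d. values in [0, 1], Hoeffding's inequality plus a union bound over the |V| nodes gives Pr(∃w: |b̃(w) − b(w)| > ε) ≤ 2|V|·exp(−2mε²), so m ≥ ln(2|V|/δ)/(2ε²) suffices. The Python helper name is hypothetical.

```python
import math


def union_bound_sample_size(num_nodes: int, eps: float, delta: float) -> int:
    """Smallest m with 2 * num_nodes * exp(-2 * m * eps**2) <= delta.

    Baseline only: Hoeffding's inequality for one node, union bound over all nodes.
    """
    if not (0 < eps < 1 and 0 < delta < 1):
        raise ValueError("eps and delta must lie in (0, 1)")
    return math.ceil(math.log(2 * num_nodes / delta) / (2 * eps ** 2))
```

For |V| = 10^6, ε = 0.01 and δ = 0.1 this gives 84,057 samples, and the number grows with |V|, a global property of the graph.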

CONTRIBUTIONS OF THIS PAPER
▸ a progressive-sampling-based algorithm that approximates BC within an additive error ε
▸ the first BC approximation algorithm that does not depend on any global property of the graph
▸ e.g., the RK algorithm [Riondato and Kornaropoulos 2016] depends on the vertex diameter of the graph

RANDOM SAMPLING TO APPROXIMATE BETWEENNESS

PROGRESSIVE SAMPLING
▸ What is a good stopping condition?
▸ guarantees that the computed approximation fulfills the
desired quality properties
▸ can be evaluated efficiently
▸ is tight (satisfied at small sample sizes)
▸ Determining the sampling schedule
▸ minimize the number of iterations needed before the stopping condition is satisfied
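The loop structure of progressive sampling can be sketched generically in Python. The names `draw`, `stop` and `schedule` are hypothetical placeholders for drawing one sample point, evaluating the stopping condition, and choosing the next sample size; earlier points are kept and only new ones are drawn. The `max_size` cap is an added safeguard, not something stated on the slide.

```python
import random
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def progressive_sampling(
    draw: Callable[[random.Random], T],
    stop: Callable[[Sequence[T]], bool],
    schedule: Callable[[int, Sequence[T]], int],
    first_size: int,
    max_size: int,
    seed: Optional[int] = None,
) -> List[T]:
    """Grow the sample until `stop` holds (or max_size is reached) and return it."""
    rng = random.Random(seed)
    sample: List[T] = []
    target = first_size
    while True:
        # Extend the existing sample up to the current target size.
        while len(sample) < min(target, max_size):
            sample.append(draw(rng))
        if stop(sample) or len(sample) >= max_size:
            return sample
        # Ask the schedule for the next size; always make progress.
        target = max(schedule(len(sample), sample), len(sample) + 1)
```

With a doubling schedule starting at 100 and a condition that needs 600 points, the loop stops at 800: a coarse schedule overshoots, which is why choosing the schedule matters.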

RECAP OF STATISTICAL LEARNING THEORY
▸ A training set S = (z_1, ..., z_m) is called ε-representative (w.r.t. domain Z, hypothesis class H, loss function ℓ, and distribution D) if
$\sup_{h \in H} |L_D(h) - L_S(h)| \le \varepsilon$
▸ given f ∈ F, its true error and empirical error are
$L_D(f) = \mathbb{E}_{z \sim D}[f(z)], \qquad L_S(f) = \frac{1}{m} \sum_{i=1}^{m} f(z_i)$
▸ the representativeness of sample S with respect to F is defined as the largest gap between the true error of a function f and its empirical error:
$\mathrm{Rep}_D(F, S) = \sup_{f \in F} \left(L_D(f) - L_S(f)\right)$
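Rep_D(F, S) cannot be computed in practice because D is unknown, but on a toy example with a known finite distribution it can be evaluated exactly. In this Python sketch the setup is hypothetical: z is uniform on {0, 1, 2, 3} and F contains the threshold indicators f_t(z) = 1[z < t] for t = 0, ..., 4. Exact fractions avoid rounding, and since F is finite the sup is a max.

```python
from fractions import Fraction


def true_error(f, dist):
    """L_D(f) = E_{z~D}[f(z)] for a finite distribution given as {z: probability}."""
    return sum(p * f(z) for z, p in dist.items())


def empirical_error(f, sample):
    """L_S(f) = (1/m) * sum_i f(z_i)."""
    return Fraction(sum(f(z) for z in sample), len(sample))


def representativeness(functions, dist, sample):
    """Rep_D(F, S) = sup_{f in F} (L_D(f) - L_S(f)); the sup is a max for finite F."""
    return max(true_error(f, dist) - empirical_error(f, sample) for f in functions)


# Hypothetical toy setup: z uniform on {0, 1, 2, 3}, F = {1[z < t] : t = 0..4}.
DIST = {z: Fraction(1, 4) for z in range(4)}
THRESHOLDS = [lambda z, t=t: int(z < t) for t in range(5)]
```

For S = (2, 3, 3, 1), the functions f_1, f_2 and f_3 all have true error exceeding empirical error by 1/4, so Rep_D(F, S) = 1/4.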

REPRESENTATIVENESS OF A SAMPLE
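▸ how can the representativeness of S be estimated using a single sample?
▸ split S into two halves, S = S_1 ∪ S_2, each of size m/2, and estimate
$\mathrm{Rep}_D(F, S) \approx \sup_{f \in F} \left(L_{S_1}(f) - L_{S_2}(f)\right) = \frac{2}{m} \sup_{f \in F} \sum_{i=1}^{m} \sigma_i f(z_i)$
where $\sigma = (\sigma_1, \ldots, \sigma_m) \in \{\pm 1\}^m$, with σ_i = 1 if z_i ∈ S_1 and σ_i = −1 if z_i ∈ S_2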
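The identity between the two-halves estimate and the signed sum can be checked numerically. This Python sketch uses a hypothetical toy class F of threshold indicators 1[z < t], takes S_1 as the first half of the sample and S_2 as the second half, and computes both sides with exact fractions.

```python
from fractions import Fraction


def split_estimate(functions, sample):
    """sup_f (L_{S1}(f) - L_{S2}(f)) with S1 = first half, S2 = second half."""
    m = len(sample)
    if m % 2:
        raise ValueError("the sample size m must be even")
    s1, s2 = sample[: m // 2], sample[m // 2:]

    def mean(f, part):
        return Fraction(sum(f(z) for z in part), len(part))

    return max(mean(f, s1) - mean(f, s2) for f in functions)


def signed_estimate(functions, sample):
    """(2/m) sup_f sum_i sigma_i f(z_i) with sigma_i = +1 on S1 and -1 on S2."""
    m = len(sample)
    if m % 2:
        raise ValueError("the sample size m must be even")
    sigma = [1] * (m // 2) + [-1] * (m // 2)
    return max(Fraction(2 * sum(s * f(z) for s, z in zip(sigma, sample)), m)
               for f in functions)


# Hypothetical toy class F = {1[z < t] : t = 0..4}.
THRESHOLDS = [lambda z, t=t: int(z < t) for t in range(5)]
```

For the sample (0, 0, 1, 3, 3, 2), the threshold t = 2 separates the halves completely, so both estimates equal 1.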

RADEMACHER AVERAGE
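‣ the Rademacher complexity measure captures this idea by taking the expectation of the above with respect to a random choice of σ
$F \circ S = \{(f(z_1), \ldots, f(z_m)) : f \in F\}$
$R(F \circ S) = \frac{1}{m} \mathbb{E}_{\sigma \sim \{\pm 1\}^m}\left[\sup_{f \in F} \sum_{i=1}^{m} \sigma_i f(z_i)\right]$
where the σ_i are distributed i.i.d. according to P[σ_i = 1] = P[σ_i = −1] = 0.5
‣ if |f(z)| ≤ c for all f ∈ F and z ∈ Z, then with probability at least 1 − δ, for all f ∈ F:
$L_D(f) - L_S(f) \le 2\,\mathbb{E}_{S' \sim D^m} R(F \circ S') + c\sqrt{\frac{2\ln(2/\delta)}{m}}$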
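For a finite set of vectors F∘S, R(F∘S) can be computed exactly for small m by enumerating all 2^m sign vectors σ, or estimated by sampling σ uniformly from {±1}^m. A Python sketch of both; the function names, trial count and seed are illustrative choices.

```python
import itertools
import random


def rademacher_exact(vectors):
    """R(F∘S) = (1/m) E_sigma[max_v sum_i sigma_i v_i], enumerating all 2^m sigmas."""
    m = len(vectors[0])
    total = 0.0
    for sigma in itertools.product((-1, 1), repeat=m):
        total += max(sum(s * x for s, x in zip(sigma, v)) for v in vectors)
    return total / (2 ** m) / m


def rademacher_monte_carlo(vectors, trials=10_000, seed=0):
    """Monte Carlo estimate of R(F∘S) with sigma_i i.i.d. uniform on {-1, +1}."""
    rng = random.Random(seed)
    m = len(vectors[0])
    total = 0.0
    for _ in range(trials):
        sigma = [rng.choice((-1, 1)) for _ in range(m)]
        total += max(sum(s * x for s, x in zip(sigma, v)) for v in vectors)
    return total / trials / m
```

For F∘S = {(1, 1), (−1, −1)}, the inner sup equals |σ_1 + σ_2|, which is 2 or 0 with equal probability, so R(F∘S) = (1/2)·1 = 0.5.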

BACK TO BC
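‣ for each node w, f_w(u,v) = σ_uv(w)/σ_uv is the fraction of shortest paths from u to v going through w
‣ with D = {(u,v) ∈ V×V : u ≠ v} and pairs drawn uniformly from D, the expected value of f_w is exactly the betweenness of w:
$L_D(f_w) = \frac{1}{|D|} \sum_{(u,v) \in V \times V,\, u \neq v} \frac{\sigma_{uv}(w)}{\sigma_{uv}} = b(w)$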
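With this formulation, b(w) can be estimated by the empirical error L_S(f_w) over m sampled pairs. A Python sketch under these assumptions: an undirected unweighted graph as a hypothetical adjacency dict; pairs drawn uniformly from D with `random.Random.sample`; f_w(u, v) computed from two BFS runs (from u and from v) using σ_uv(w) = σ_uw · σ_wv for internal nodes w on a shortest u-v path. This is a straightforward illustration, not ABRA's own per-sample routine.

```python
import random
from collections import deque


def bfs_counts(adj, source):
    """Return (dist, sigma): BFS distances and shortest-path counts from source."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y], sigma[y] = dist[x] + 1, 0
                queue.append(y)
            if dist[y] == dist[x] + 1:
                sigma[y] += sigma[x]
    return dist, sigma


def sample_fw(adj, u, v):
    """Return {w: f_w(u, v)} for internal nodes w with f_w(u, v) > 0."""
    dist_u, sigma_u = bfs_counts(adj, u)
    if v not in dist_u:
        return {}  # no path: f_w(u, v) = 0 for every w
    dist_v, sigma_v = bfs_counts(adj, v)  # undirected: paths v->w equal paths w->v
    d = dist_u[v]
    return {w: sigma_u[w] * sigma_v[w] / sigma_u[v]
            for w in dist_u
            if w not in (u, v) and w in dist_v and dist_u[w] + dist_v[w] == d}


def estimate_betweenness(adj, m, seed=0):
    """b~(w) = L_S(f_w): average of f_w over m uniform ordered pairs u != v."""
    rng = random.Random(seed)
    nodes = list(adj)
    est = {w: 0.0 for w in nodes}
    for _ in range(m):
        u, v = rng.sample(nodes, 2)  # uniform ordered pair of distinct nodes
        for w, x in sample_fw(adj, u, v).items():
            est[w] += x
    return {w: x / m for w, x in est.items()}
```

On a star with center 0 and three leaves, b(0) = 0.5, and the estimate concentrates around that value as m grows.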

RADEMACHER AVERAGE: HOW TO CALCULATE?
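▸ computing R(F∘S) exactly is not straightforward and can be time consuming
▸ an upper bound to the Rademacher average is used in place of R(F∘S):
$R(F \circ S) = \frac{1}{m} \mathbb{E}_{\sigma \sim \{\pm 1\}^m}\left[\sup_{f \in F} \sum_{i=1}^{m} \sigma_i f(z_i)\right] \le \omega^* = \min_{s \in \mathbb{R}^+} \omega(s)$
$\omega(s) = \frac{1}{s} \ln \sum_{\mathbf{v} \in \mathcal{V}_S} \exp\left(\frac{s^2 \|\mathbf{v}\|^2}{2m^2}\right)$
$\mathbf{v}_w = (f_w(u_1,v_1), \ldots, f_w(u_m,v_m)), \qquad \mathcal{V}_S = \{\mathbf{v}_w : w \in V\}, \quad |\mathcal{V}_S| \le |V|$
where S = ((u_1,v_1), ..., (u_m,v_m)) is the sample of m node pairs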
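ω(s) depends on the vectors in 𝒱_S only through their squared norms, so ω* can be approximated from those norms alone. This Python sketch evaluates ω(s) with a numerically stable log-sum-exp and minimizes it with a log-spaced grid search followed by golden-section refinement. The search bracket and iteration counts are hypothetical defaults, and the caller must pass one squared norm per distinct vector, since 𝒱_S is a set.

```python
import math


def omega(s, sq_norms, m):
    """omega(s) = (1/s) * ln sum_v exp(s^2 ||v||^2 / (2 m^2)), via log-sum-exp."""
    exps = [s * s * q / (2 * m * m) for q in sq_norms]
    top = max(exps)
    return (top + math.log(sum(math.exp(e - top) for e in exps))) / s


def omega_star(sq_norms, m, lo=1e-6, hi=1e9, grid=200, iters=100):
    """Approximate min_{s>0} omega(s): log-spaced grid, then golden-section search."""
    ratio = (hi / lo) ** (1 / (grid - 1))
    points = [lo * ratio ** k for k in range(grid)]
    values = [omega(s, sq_norms, m) for s in points]
    k = min(range(grid), key=values.__getitem__)
    # Refine inside the bracket around the best grid point.
    a, b = points[max(k - 1, 0)], points[min(k + 1, grid - 1)]
    inv_phi = (math.sqrt(5) - 1) / 2
    for _ in range(iters):
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if omega(c, sq_norms, m) < omega(d, sq_norms, m):
            b = d
        else:
            a = c
    return omega((a + b) / 2, sq_norms, m)
```

As a sanity check, if all k vectors have the same squared norm q, then ω(s) = ln(k)/s + s·q/(2m²), whose minimum is √(2q ln k)/m.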

STOPPING CONDITION OF BC CALCULATION
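▸ a tighter upper bound to the maximum deviation sup_{f∈F} |L_S(f) − L_D(f)| is computed following [Oneto et al. 2013] (m = |S| is the sample size):
$\Delta_S = \frac{\omega^*}{1-\alpha} + \frac{\ln(2/\delta)}{2m\alpha(1-\alpha)} + \sqrt{\frac{\ln(2/\delta)}{2m}}$
$\alpha = \frac{\ln(2/\delta)}{\ln(2/\delta) + \sqrt{(2m R(F \circ S) + \ln(2/\delta))\ln(2/\delta)}}$, with R(F∘S) replaced by its upper bound ω*
‣ stopping condition: $\Delta_S \le \varepsilon$; when it holds, the collection $\tilde{B} = \{\tilde{b}(w) = L_S(f_w),\ w \in V\}$ is returned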
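The stopping check is cheap once ω* is known. This Python sketch implements Δ_S and α as written on this slide, with R(F∘S) replaced by ω*; the function names are hypothetical.

```python
import math


def alpha(omega_star, m, delta):
    """alpha from the slide, using omega* in place of R(F∘S)."""
    ln_term = math.log(2 / delta)
    return ln_term / (ln_term + math.sqrt((2 * m * omega_star + ln_term) * ln_term))


def deviation_bound(omega_star, m, delta):
    """Delta_S = omega*/(1-a) + ln(2/delta)/(2 m a (1-a)) + sqrt(ln(2/delta)/(2m))."""
    ln_term = math.log(2 / delta)
    a = alpha(omega_star, m, delta)
    return (omega_star / (1 - a)
            + ln_term / (2 * m * a * (1 - a))
            + math.sqrt(ln_term / (2 * m)))


def should_stop(omega_star, m, eps, delta):
    """Stopping condition: Delta_S <= eps."""
    return deviation_bound(omega_star, m, delta) <= eps
```

When ω* = 0, α = 1/2 and Δ_S reduces to 2 ln(2/δ)/m + √(ln(2/δ)/(2m)); for m = 1000 and δ = 0.1 that is about 0.0447, so ε = 0.05 is met but ε = 0.04 is not.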

SAMPLING SCHEDULE
▸ the initial sample size |S_1| is computed from ε and δ alone, before any sample is drawn
▸ the next sample size |S_{i+1}| is calculated assuming that ω*_i, which is an upper bound to R(F∘S_i), is also an upper bound to R(F∘S_{i+1})
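Under that assumption, the next sample size can be taken as the smallest size at which the stopping condition would hold with ω* unchanged. This Python sketch finds it by binary search. It assumes Δ_S decreases as the sample grows; the cap `m_max` and all names are hypothetical, and Δ_S is restated so the block runs on its own.

```python
import math


def deviation_bound(omega_star, m, delta):
    """Delta_S from the stopping-condition slide, with omega* in place of R(F∘S)."""
    ln_term = math.log(2 / delta)
    a = ln_term / (ln_term + math.sqrt((2 * m * omega_star + ln_term) * ln_term))
    return (omega_star / (1 - a)
            + ln_term / (2 * m * a * (1 - a))
            + math.sqrt(ln_term / (2 * m)))


def next_sample_size(omega_star, m_current, eps, delta, m_max=10**9):
    """Smallest m in (m_current, m_max] with Delta_S <= eps, keeping omega* fixed.

    Assumes the bound decreases in m; returns None if even m_max is not enough.
    """
    if deviation_bound(omega_star, m_max, delta) > eps:
        return None
    lo, hi = m_current, m_max  # invariant: hi satisfies the bound
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if deviation_bound(omega_star, mid, delta) <= eps:
            hi = mid
        else:
            lo = mid
    return hi
```

Because Δ_S never drops below ω*, no sample size helps once ω* ≥ ε; the sketch returns None in that case.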

DYNAMIC GRAPH BC APPROXIMATION (ABRA-D)
▸ vertex and edge insertions and deletions allowed
▸ uses two data structures introduced by Hayashi et al. (2015)
▸ hypergraph sketch: a weighted-hyperedge representation of shortest paths
▸ two-ball index: efficiently detects the parts of the hypergraph sketch that need to be modified after an update

EXPERIMENTAL EVALUATION
▸ performance measured using
▸ runtime
▸ sample size
▸ accuracy
▸ algorithms compared
▸ BA [Brandes 2001] - exact algorithm
▸ RK [Riondato and Kornaropoulos 2016]

EXPERIMENTAL RESULTS
▸ δ is fixed to 0.1
▸ given the logarithmic dependence of the sample size on δ, its impact on the results is limited
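A quick worked check of how weak this dependence is: the bounds on these slides contain δ only through ln(2/δ). The Python sketch below (helper name hypothetical) computes how much that factor grows when δ shrinks.

```python
import math


def log_factor_ratio(delta_from: float, delta_to: float) -> float:
    """Factor by which ln(2/delta) grows when delta goes from delta_from to delta_to."""
    return math.log(2 / delta_to) / math.log(2 / delta_from)
```

Making δ ten times smaller, from 0.1 to 0.01, multiplies ln(2/δ) by only about 1.77.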

REFERENCES
[1] U. Brandes. A faster algorithm for betweenness centrality. J. Math. Sociol.,
25(2):163–177, 2001. doi: 10.1080/0022250X.2001.9990249.
[2] M. Riondato and E. M. Kornaropoulos. Fast approximation of betweenness
centrality through sampling. Data Mining and Knowledge Discovery, 30(2):438–
475, 2016. doi: 10.1007/s10618-015-0423-0.
[3] T. Hayashi, T. Akiba, and Y. Yoshida. Fully dynamic betweenness centrality
maintenance on massive networks. Proceedings of the VLDB Endowment, 9(2),
2015.
[4] L. Oneto, A. Ghio, D. Anguita, and S. Ridella. An improved analysis of the
Rademacher data-dependent bound using its self bounding property. Neural
Networks, 44:107–111, 2013.
