Paper presentation on ABRA: a random sampling based approximate betweenness centrality estimation method
Please find the PDF here https://drive.google.com/open?id=0BwND8Ws5fgEeTlVBejFGLW9QaFU
ABRA: Approximating Betweenness Centrality in Static and Dynamic Graphs with Rademacher Averages
1. ABRA: APPROXIMATING BETWEENNESS CENTRALITY
IN STATIC AND DYNAMIC GRAPHS WITH
RADEMACHER AVERAGES
Matteo Riondato and Eli Upfal
22nd ACM SIGKDD Conference, August 2016
Murata Lab - Paper reading seminar
Presented by: Kaushalya Madhawa
(25th November 2016)
2. OUTLINE
1. INTRODUCTION
2. RANDOM SAMPLING FOR APPROXIMATIONS
3. STATISTICAL LEARNING THEORY
‣ representativeness of a sample
‣ Rademacher averages
4. EXPERIMENTS AND RESULTS
3. BETWEENNESS CENTRALITY (BC)
▸ unweighted graph G = (V, E)
▸ n = |V|, m = |E|
$$ b(w) = \frac{1}{|V|(|V|-1)} \sum_{(u,v) \in V \times V} \frac{\sigma_{uv}(w)}{\sigma_{uv}} $$
▸ σ_uv(w): number of shortest paths from u to v passing through w
▸ σ_uv: total number of shortest paths from u to v
4. BETWEENNESS CENTRALITY (BC)
▸ unweighted graph G = (V, E)
▸ n = |V|, m = |E|
▸ fastest exact betweenness calculation
algorithm runs in O(nm) [Brandes 2001]
▸ requires O(n+m) space
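To make the definition of b(w) and the O(nm) exact computation concrete, here is a minimal Python sketch in the style of Brandes' accumulation. It assumes the graph is given as an adjacency dictionary and that endpoints do not count as lying "through" a path (w ≠ u, v). The function name `betweenness` is illustrative and does not come from the paper.

```python
from collections import deque

def betweenness(adj):
    """Exact normalized betweenness b(w) of every node of an unweighted graph.

    adj maps each node to an iterable of its neighbours (assumed input format).
    """
    nodes = list(adj)
    n = len(nodes)
    bc = {w: 0.0 for w in nodes}
    for s in nodes:
        # BFS from s: count shortest paths (sigma) and record predecessors.
        sigma = {v: 0 for v in nodes}
        dist = {v: -1 for v in nodes}
        preds = {v: [] for v in nodes}
        sigma[s], dist[s] = 1, 0
        order = []
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if dist[v] < 0:
                    dist[v] = dist[u] + 1
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        # Accumulate pair dependencies in order of decreasing distance from s.
        delta = {v: 0.0 for v in nodes}
        for v in reversed(order):
            for u in preds[v]:
                delta[u] += sigma[u] / sigma[v] * (1.0 + delta[v])
            if v != s:
                bc[v] += delta[v]
    # Normalize by the number of ordered node pairs, |V|(|V| - 1).
    return {w: bc[w] / (n * (n - 1)) for w in nodes}
```

On the path graph a - b - c, only b lies between another pair of nodes, so it gets b(b) = 2/6 and the two endpoints get 0.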
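5. APPROXIMATE BC FOR LARGE NETWORKS
▸ approximation methods use random sampling to estimate
betweenness centrality with an acceptable accuracy
▸ problem definition
▸ given ε, δ ∈ (0, 1), an (ε, δ)-approximation to B = {b(w), w ∈ V} is a
collection B̃ = {b̃(w), w ∈ V} such that
$$ \Pr\left( \forall w \in V : |\tilde{b}(w) - b(w)| \le \varepsilon \right) \ge 1 - \delta $$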
6. CONTRIBUTIONS OF THIS PAPER
▸ progressive sampling based BC approximation within ε
additive factor
▸ first BC approximation algorithm to estimate BC without
depending on any global property of the graph
▸ e.g., the RK algorithm [Riondato and Kornaropoulos 2016]
depends on the vertex diameter of the graph
9. PROGRESSIVE SAMPLING
▸ What is a good stopping condition?
▸ guarantees that the computed approximation fulfills the
desired quality properties
▸ can be evaluated efficiently
▸ is tight (satisfied at small sample sizes)
▸ Determining sampling schedule
▸ minimize the number of iterations that are needed
before the stopping condition is satisfied
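The interplay between stopping condition and sampling schedule can be sketched generically in Python. This is not ABRA's stopping rule: as a stand-in it uses a plain Hoeffding bound for values in [0, 1], a geometric schedule, and a δ/2^i split across iterations so that the guarantees of all checks hold jointly. The names `hoeffding_bound` and `progressive_mean` and the default parameters are assumptions made for illustration.

```python
import math
import random

def hoeffding_bound(m, delta):
    """Deviation bound for the mean of m i.i.d. values in [0, 1], at confidence 1 - delta."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * m))

def progressive_mean(draw, eps, delta, initial_size=100, growth=2.0):
    """Grow the sample until the stopping condition certifies an eps-accurate mean.

    draw() returns one sample value in [0, 1]. Returns (estimate, sample size, iterations).
    """
    sample = []
    target = initial_size
    iteration = 0
    while True:
        iteration += 1
        # Sampling schedule: extend the existing sample up to the current target size.
        while len(sample) < target:
            sample.append(draw())
        # Stopping condition: cheap to evaluate; delta / 2^i keeps the total failure
        # probability over all iterations below delta.
        if hoeffding_bound(len(sample), delta / 2 ** iteration) <= eps:
            return sum(sample) / len(sample), len(sample), iteration
        target = math.ceil(growth * target)

rng = random.Random(0)
estimate, size, iterations = progressive_mean(
    lambda: float(rng.random() < 0.3), eps=0.05, delta=0.1)
```

Because this stand-in bound ignores the data, the final sample size is fixed by ε, δ and the schedule alone. A data-dependent condition, such as the Rademacher-based one discussed later, can stop earlier on easy inputs.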
10. RECAP OF STATISTICAL LEARNING THEORY
▸ A training set S is called ε-representative (w.r.t. domain Z,
hypothesis class H, loss function l, and distribution D) if
$$ \sup_{h \in H} |L_D(h) - L_S(h)| \le \varepsilon $$
▸ given f ∈ F, its true error and its empirical error on S = (z_1, ..., z_m) are
$$ L_D(f) = \mathbb{E}_{z \sim D}[f(z)], \qquad L_S(f) = \frac{1}{m} \sum_{i=1}^{m} f(z_i) $$
▸ representativeness of sample S with respect to F is
defined as the largest gap between the true error of a
function f and its empirical error
$$ \mathrm{Rep}_D(F, S) = \sup_{f \in F} \left( L_D(f) - L_S(f) \right) $$
11. REPRESENTATIVENESS OF A SAMPLE
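▸ how to estimate the representativeness of S using a single sample?
▸ split S into two disjoint halves, S = S_1 ∪ S_2, and compare the empirical errors on the two halves
$$ \sup_{f \in F} \left( L_{S_1}(f) - L_{S_2}(f) \right) = \frac{2}{m} \sup_{f \in F} \sum_{i=1}^{m} \sigma_i f(z_i) $$
▸ σ = (σ_1, ..., σ_m) ∈ {±1}^m, with σ_i = 1 if z_i ∈ S_1 and σ_i = −1 if z_i ∈ S_2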
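12. RADEMACHER AVERAGE
‣ the Rademacher complexity measure
captures this idea by considering
the expectation of the above with
respect to a random choice of σ
$$ F \circ S = \{ (f(z_1), \dots, f(z_m)) : f \in F \} $$
$$ R(F \circ S) = \frac{1}{m} \mathbb{E}_{\sigma \sim \{\pm 1\}^m} \left[ \sup_{f \in F} \sum_{i=1}^{m} \sigma_i f(z_i) \right] $$
‣ the σ_i are distributed i.i.d. according to P[σ_i = 1] = P[σ_i = −1] = 0.5
‣ with probability at least 1 − δ over the choice of S, for all f ∈ F with |f(z)| ≤ c
$$ L_D(f) - L_S(f) \le 2 \mathbb{E}_{S' \sim D^m} R(F \circ S') + c \sqrt{\frac{2 \ln(2/\delta)}{m}} $$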
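For a tiny finite set F∘S, the Rademacher average can be computed exactly by enumerating all 2^m sign vectors. The Python sketch below does this. It is only an illustration of the definition, since the enumeration is exponential in m, and the function name `rademacher_average` is an assumption.

```python
from itertools import product

def rademacher_average(vectors):
    """Exact R(F∘S) for a finite set of vectors (f(z_1), ..., f(z_m)), one per f in F."""
    m = len(vectors[0])
    total = 0.0
    # Average, over every sign vector sigma in {-1, +1}^m, of the supremum
    # over F of the signed sum  sum_i sigma_i * f(z_i).
    for sigma in product((-1, 1), repeat=m):
        total += max(sum(s * x for s, x in zip(sigma, v)) for v in vectors)
    return total / (2 ** m) / m

# Two functions evaluated on m = 2 sample points.
print(rademacher_average([(1.0, 0.0), (0.0, 1.0)]))  # 0.25
```

With a single function in F the supremum disappears and the average is 0. Richer function families can fit more sign patterns and therefore get larger values.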
13. BACK TO BC
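‣ for each node w, f_w(u,v) is the fraction of shortest paths from u
to v going through w
‣ D is the set of ordered node pairs (u, v) with u ≠ v, so |D| = |V|(|V| − 1)
$$ L_D(f_w) = \frac{1}{|D|} \sum_{(u,v) \in V \times V,\, u \ne v} \frac{\sigma_{uv}(w)}{\sigma_{uv}} = b(w) $$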
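Since b(w) is the true mean of f_w over node pairs, its empirical counterpart L_S(f_w) is just the average of f_w over sampled pairs. The Python sketch below computes it for an undirected graph, using one BFS from each endpoint of a sampled pair. That is simpler than the procedure in the paper. The adjacency-dictionary input and the names `bfs_counts`, `pair_fractions` and `estimate_bc` are assumptions made for illustration.

```python
import random
from collections import deque

def bfs_counts(adj, s):
    """Distances from s and number of shortest paths from s to each reachable node."""
    dist, sigma = {s: 0}, {s: 1}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]
    return dist, sigma

def pair_fractions(adj, u, v):
    """f_w(u, v) for every inner node w with a non-zero value (undirected graph)."""
    du, su = bfs_counts(adj, u)
    dv, sv = bfs_counts(adj, v)
    if v not in du:
        return {}
    # w lies on a shortest u-v path exactly when d(u, w) + d(w, v) = d(u, v).
    return {w: su[w] * sv[w] / su[v]
            for w in adj
            if w not in (u, v) and w in du and w in dv and du[w] + dv[w] == du[v]}

def estimate_bc(adj, m, rng):
    """Empirical average of f_w over m uniformly sampled ordered pairs u != v."""
    nodes = list(adj)
    estimate = {w: 0.0 for w in nodes}
    for _ in range(m):
        u, v = rng.sample(nodes, 2)
        for w, fraction in pair_fractions(adj, u, v).items():
            estimate[w] += fraction / m
    return estimate

# Diamond graph: two equally short routes from a to d, through b or through c.
diamond = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a", "d"], "d": ["b", "c"]}
print(pair_fractions(diamond, "a", "d"))  # {'b': 0.5, 'c': 0.5}
```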
14. RADEMACHER AVERAGE: HOW TO CALCULATE?
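▸ calculation is not straightforward and can be time
consuming
$$ R(F \circ S) = \frac{1}{m} \mathbb{E}_{\sigma \sim \{\pm 1\}^m} \left[ \sup_{f \in F} \sum_{i=1}^{m} \sigma_i f(z_i) \right] $$
▸ an upper bound to the Rademacher average is used in
place of R(F∘S)
$$ R(F \circ S) \le \min_{s \in \mathbb{R}^+} \omega(s) $$
$$ \omega(s) = \frac{1}{s} \ln \sum_{\mathbf{v} \in \mathcal{V}_S} \exp\left( \frac{s^2 \|\mathbf{v}\|^2}{2 m^2} \right) $$
$$ \mathbf{v}_w = (f_w(u_1, v_1), \dots, f_w(u_m, v_m)) $$
$$ \mathcal{V}_S = \{ \mathbf{v}_w, w \in V \} \quad (|\mathcal{V}_S| \le |V|) $$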
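The upper bound ω(s) is cheap to evaluate once the vectors v_w are known. The Python sketch below evaluates it with a log-sum-exp guard against overflow and approximates the minimum over s by scanning a geometric grid. The grid search, its range, and the names `omega` and `omega_star` are assumptions. Any one-dimensional minimizer could be substituted.

```python
import math

def omega(s, vectors, m):
    """omega(s) = (1/s) * ln( sum over v of exp(s^2 * ||v||^2 / (2 m^2)) )."""
    exponents = [s * s * sum(x * x for x in v) / (2.0 * m * m) for v in vectors]
    top = max(exponents)
    # log-sum-exp: factor out the largest exponent so exp() cannot overflow.
    log_sum = top + math.log(sum(math.exp(a - top) for a in exponents))
    return log_sum / s

def omega_star(vectors, m, lo=1e-3, hi=1e3, steps=2000):
    """Approximate min over s > 0 of omega(s) on a geometric grid (assumed search range)."""
    ratio = (hi / lo) ** (1.0 / (steps - 1))
    return min(omega(lo * ratio ** k, vectors, m) for k in range(steps))

# Two unit vectors with m = 2: omega(s) = ln(2)/s + s/8, minimized at sqrt(ln(2)/2).
print(omega_star([(1.0, 0.0), (0.0, 1.0)], 2))
```

For this toy input the exact Rademacher average is 0.25, so the bound of roughly 0.589 is valid but not tight, as expected for such a small set.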
15. STOPPING CONDITION OF BC CALCULATION
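▸ a tighter upper bound on the maximum deviation is
calculated [Oneto et al. 2013]
$$ \Delta_S = \frac{\omega^*}{1 - \alpha} + \frac{\ln(2/\delta)}{2 m \alpha (1 - \alpha)} + \sqrt{\frac{\ln(2/\delta)}{2 m}} $$
$$ \alpha = \frac{\ln(2/\delta)}{\ln(2/\delta) + \sqrt{(2 m R(F \circ S) + \ln(2/\delta)) \ln(2/\delta)}} $$
‣ ω* is the minimum of the upper bound ω(s), and m is the sample size
‣ the stopping condition is
$$ \Delta_S \le \varepsilon $$
‣ when this holds, the collection of estimates
is returned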
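The stopping check itself is a few lines of arithmetic. The Python sketch below evaluates Δ_S, assuming the first and last terms of the bound carry the roles shown on the slide and that the lost radicals are square roots. Since R(F∘S) is not computed directly, it substitutes ω* for it when choosing α, which is an assumption of this sketch. The names `deviation_bound` and `should_stop` are illustrative.

```python
import math

def deviation_bound(omega_star, m, delta):
    """Delta_S for sample size m, confidence parameter delta, and Rademacher bound omega_star."""
    log_term = math.log(2.0 / delta)
    # alpha in (0, 1); omega_star stands in for R(F∘S) here (assumption).
    alpha = log_term / (
        log_term + math.sqrt((2.0 * m * omega_star + log_term) * log_term))
    return (omega_star / (1.0 - alpha)
            + log_term / (2.0 * m * alpha * (1.0 - alpha))
            + math.sqrt(log_term / (2.0 * m)))

def should_stop(omega_star, m, delta, eps):
    """Stopping condition: the bound on the maximum deviation is at most eps."""
    return deviation_bound(omega_star, m, delta) <= eps

# With 1000 samples and delta = 0.1, a vanishing omega_star already certifies eps = 0.05.
print(should_stop(0.0, 1000, 0.1, 0.05))  # True
```

When ω* = 0 the formula gives α = 1/2 and the bound reduces to 2 ln(2/δ)/m + sqrt(ln(2/δ)/(2m)), which makes a convenient sanity check.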
16. SAMPLING SCHEDULE
▸ initial sample size determined by
▸ the next sample size |S_{i+1}| is calculated assuming that the upper bound
to R(F∘S_i) is also an upper bound to R(F∘S_{i+1})
17. DYNAMIC GRAPH BC APPROXIMATION (ABRA-D)
▸ vertex and edge insertions and deletions allowed
▸ two data structures introduced by Hayashi et al. (2015)
are used
▸ Hypergraph sketch: weighted hyper edge
representation of shortest paths
▸ Two-ball index: to efficiently detect the parts of the
Hypergraph sketch that need to be modified
19. EXPERIMENTAL RESULTS
▸ δ is fixed to 0.1
▸ given the logarithmic dependence of the sample size on
δ, the impact of this choice on the results is limited
20. REFERENCES
[1] U. Brandes. A faster algorithm for betweenness centrality. J. Math. Sociol.,
25(2):163–177, 2001. doi: 10.1080/0022250X.2001.9990249.
[2] M. Riondato and E. M. Kornaropoulos. Fast approximation of betweenness
centrality through sampling. Data Mining and Knowledge Discovery,
30(2):438–475, 2015. ISSN 1573-756X. doi: 10.1007/s10618-015-0423-0.
[3] T. Hayashi, T. Akiba, and Y. Yoshida. Fully dynamic betweenness centrality
maintenance on massive networks. Proceedings of the VLDB Endowment, 9(2),
2015.
[4] L. Oneto, A. Ghio, D. Anguita, and S. Ridella. An improved analysis of the
Rademacher data-dependent bound using its self bounding property. Neural
Networks, 44:107–111, 2013.