Presentation of our work "Local discrepancy maximization on graphs" at the International IEEE Conference on Data Engineering (ICDE) 2015 in Seoul, Korea.
Bump Hunting in the Dark - ICDE15 presentation
1. Bump Hunting in the Dark: Local Discrepancy Maximization on Graphs
Aristides Gionis, Michael Mathioudakis, Antti Ukkonen
April 16th, 2015, Seoul, Korea
ICDE 2015
2. Find the Bump! Here it is!
undirected graph: computer network, online social network, etc.
regular nodes
special nodes: failure reported / virus detected! posted message about topic X
3. graph, special & non-special nodes
find connected subgraph with max linear discrepancy
score = α x #special - #non-special
for α = 1:
score = 18 - 3 = 15
score = 1 - 0 = 1
score = 0 - 1 = -1
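The score on this slide can be computed directly from a node set. The following is a minimal sketch; the function name `discrepancy` and its arguments are illustrative and do not come from the slides.

```python
def discrepancy(nodes, special, alpha=1.0):
    """Linear discrepancy of a node set: alpha * #special - #non-special."""
    nodes = set(nodes)
    n_special = len(nodes & set(special))
    n_regular = len(nodes) - n_special
    return alpha * n_special - n_regular


# A subgraph with 18 special and 3 non-special nodes, as in the first example.
subgraph = range(21)
special_nodes = range(18)
print(discrepancy(subgraph, special_nodes, alpha=1))
```

Connectivity of the node set is not checked here; the function only evaluates the objective.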
4. graph, special & non-special nodes
find connected subgraph with max linear discrepancy
score = α x #special - #non-special
fixed fraction of special nodes: subgraph size ↑ score ↑
fixed subgraph size: fraction of special nodes ↑ score ↑
5. local access
special nodes are provided as input
build graph via get-neighbors function
how much of the graph do we need? (infinite graph?)
6. Approach
first, retrieve a minimal part of the graph (necessary when we have only local access):
use get-neighbors function to expand around the special nodes
then, solve problem on retrieved subgraph
7. In What Follows
• Unrestricted Access
– Complexity
– Connection to Steiner Trees
– Special Case: Graph is Tree
– Algorithms
• Local Access
– Algorithms to retrieve (part of) the graph
• Experiments
• Future Work
9. Unrestricted Access
the problem is… a special case of PRIZE-COLLECTING STEINER TREE (PCST)
input: graph, positive weight on terminal nodes, non-negative weight on edges
output: tree, min Σ(missed terminal node weights) + Σ(edge weights)
terminal nodes = special nodes: weight = α+1
edges: weight = 1
max discrepancy is instance of min PCST weight
Goemans-Williamson algorithm, O(|V||E|) & O(|V|^2 log|V|), (2 - 1/(n - 1))-approximation
min-approximation does not translate to our max-problem
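The correspondence between the two objectives can be written out as a short derivation. The notation is an assumption made for this sketch: Q is the set of special nodes, T is a non-empty tree with node set S, p_S is the number of special nodes in S, and n_S is the number of non-special nodes in S.

```latex
% A tree on |S| = p_S + n_S nodes has |S| - 1 unit-weight edges.
% Each missed special node costs \alpha + 1.
\begin{align*}
\mathrm{cost}(T)
  &= (\alpha + 1)\,(|Q| - p_S) + (p_S + n_S - 1) \\
  &= (\alpha + 1)\,|Q| - 1 - (\alpha\, p_S - n_S) \\
  &= \underbrace{(\alpha + 1)\,|Q| - 1}_{\text{constant}} - \mathrm{score}(S).
\end{align*}
```

Minimizing the PCST cost is therefore the same as maximizing the discrepancy score. The constant term can be much larger than the optimal score, which is why a multiplicative guarantee on the cost gives no comparable guarantee on the score.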
11. Special Case
graph is a tree
optimal linear recursive algorithm
optimal in graph: optimal with root, or optimal without root (in subtree)
optimal with root: built from optimal with root of subtree
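The recursion on this slide can be sketched as a bottom-up pass over a rooted tree. This is one possible linear-time formulation, not necessarily the authors' exact algorithm. The function name, the edge-list input, and the requirement of at least one edge are assumptions made for the example.

```python
from collections import defaultdict


def max_discrepancy_tree(edges, special, alpha=1.0, root=None):
    """Return (score, nodes) of a max-discrepancy connected subgraph of a tree.

    Assumes `edges` describes a tree with at least one edge.
    """
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    special = set(special)
    if root is None:
        root = next(iter(adj))

    # Iterative DFS: parent map plus an order in which parents precede children.
    parent = {root: None}
    order, stack = [], [root]
    while stack:
        u = stack.pop()
        order.append(u)
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                stack.append(v)

    # with_root[u]: best score of a connected subgraph of u's subtree containing u.
    with_root = {}
    for u in reversed(order):
        score = alpha if u in special else -1.0
        for v in adj[u]:
            if parent[v] == u and with_root[v] > 0:
                score += with_root[v]  # keep only child subtrees that help
        with_root[u] = score

    # The optimum either contains the root or lies inside some subtree,
    # so it is the best with_root value over all nodes.
    best = max(order, key=lambda u: with_root[u])
    nodes, stack = set(), [best]
    while stack:
        u = stack.pop()
        nodes.add(u)
        stack.extend(v for v in adj[u] if parent[v] == u and with_root[v] > 0)
    return with_root[best], nodes


path = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
print(max_discrepancy_tree(path, {"a", "c"}, alpha=2))
```

Each node and edge is visited a constant number of times, so the running time is linear in the size of the tree.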
12. Algorithms
idea for general case (heuristics):
generate spanning tree for graph
solve problem on spanning tree
BFS-trees from each special node
min weight spanning tree:
random weights w(u,v) in [0,1]
‘smart’ weights w(u,v) = 2 - 1{u is special} - 1{v is special}
GW for PCST
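The 'smart' weights make edges between special nodes the cheapest, so a minimum-weight spanning tree tends to keep special nodes adjacent. A minimal sketch using Kruskal's algorithm follows. The choice of Kruskal and the helper names `smart_weight` and `smart_spanning_forest` are assumptions, since the slides do not say which spanning tree algorithm was used.

```python
def smart_weight(u, v, special):
    """w(u,v) = 2 - 1{u is special} - 1{v is special}."""
    return 2 - (u in special) - (v in special)


def smart_spanning_forest(edges, special):
    """Kruskal's algorithm under the 'smart' weights; returns the tree edges."""
    special = set(special)
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    by_weight = sorted(edges, key=lambda e: smart_weight(e[0], e[1], special))
    for u, v in by_weight:
        ru, rv = find(u), find(v)
        if ru != rv:  # the edge joins two components, so keep it
            parent[ru] = rv
            tree.append((u, v))
    return tree


triangle = [(1, 2), (2, 3), (1, 3)]
print(smart_spanning_forest(triangle, special={1, 3}))
```

The resulting tree can then be handed to a tree solver to obtain a heuristic solution for the original graph.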
14. Local Access
start with special nodes
call get-neighbors(node) to expand
retrieve entire graph?
expansion strategies:
full expansion: return a subgraph that contains the optimal solution
oblivious expansion, adaptive expansion
15. Full Expansion
retrieve graph that contains optimal solution with calls to get-neighbors()
maintain frontier to expand in each iteration
frontier condition: unexpanded nodes for which min distance from reachable special node is less than (α+1) x #reachable_special_nodes
initially:
1. frontier = special nodes
loop:
2. call get-neighbors() on frontier nodes
3. update frontier
4. go to 2. if frontier not empty
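The loop above can be sketched as follows. This is one reading of the frontier condition, in which "reachable" means reachable within the graph retrieved so far. That reading, the function name `full_expansion`, and the `get_neighbors` callback signature are assumptions made for the example.

```python
from collections import deque


def full_expansion(special, get_neighbors, alpha=1):
    """Retrieve a subgraph via repeated get_neighbors calls (sketch)."""
    special = set(special)
    adj = {q: set() for q in special}  # retrieved graph as adjacency sets
    expanded = set()
    frontier = set(special)            # 1. frontier = special nodes
    while frontier:                    # 4. repeat while the frontier is not empty
        for u in frontier:             # 2. call get_neighbors on frontier nodes
            for v in get_neighbors(u):
                adj[u].add(v)
                adj.setdefault(v, set()).add(u)
            expanded.add(u)
        # Hop distance of every retrieved node to its nearest special node.
        dist = {q: 0 for q in special}
        queue = deque(special)
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        # Number of special nodes in each connected component.
        n_special = {}
        for q in special:
            if q in n_special:
                continue
            comp, stack = {q}, [q]
            while stack:
                u = stack.pop()
                for v in adj[u]:
                    if v not in comp:
                        comp.add(v)
                        stack.append(v)
            count = len(comp & special)
            for u in comp:
                n_special[u] = count
        # 3. update the frontier using the frontier condition.
        frontier = {u for u in adj
                    if u not in expanded
                    and dist[u] < (alpha + 1) * n_special[u]}
    return adj


# Toy graph: a path 0-1-...-9 with special nodes 0 and 3.
path_neighbors = lambda u: [v for v in (u - 1, u + 1) if 0 <= v <= 9]
print(sorted(full_expansion({0, 3}, path_neighbors, alpha=1)))
```

Distances and component counts are recomputed from scratch in every iteration for clarity; an incremental update would avoid the repeated traversals.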
16. Full Expansion
expensive in practice
we look for heuristics
17. Oblivious Expansion
expand (α+1) times around each special node
not guaranteed to cover optimal solution
[Figure, panels a and b, α = 1. Legend: special node; retrieved by ObliviousExpansion; additionally retrieved by Full Expansion; solution; optimal solution]
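Oblivious expansion can be sketched as α+1 rounds of neighbor retrieval starting from the special nodes. The function name `oblivious_expansion` and the `get_neighbors` callback are assumptions made for the example.

```python
def oblivious_expansion(special, get_neighbors, alpha=1):
    """Expand alpha + 1 rounds around the special nodes (sketch)."""
    adj = {q: set() for q in special}  # retrieved graph as adjacency sets
    expanded = set()
    frontier = set(special)
    for _ in range(alpha + 1):
        next_frontier = set()
        for u in frontier:
            for v in get_neighbors(u):
                adj[u].add(v)
                adj.setdefault(v, set()).add(u)
                next_frontier.add(v)
            expanded.add(u)
        # Only nodes that have not been expanded yet go into the next round.
        frontier = next_frontier - expanded
    return adj


# Toy graph: a path 0-1-...-9 with special nodes 0 and 3.
path_neighbors = lambda u: [v for v in (u - 1, u + 1) if 0 <= v <= 9]
print(sorted(oblivious_expansion({0, 3}, path_neighbors, alpha=1)))
```

The number of rounds is fixed in advance and does not depend on how many special nodes end up connected, which is why the retrieved graph may miss part of the optimal solution.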
18. Adaptive Expansion
idea: while expanding, estimate heuristically how close we are to optimal
for each component, solve problem on spanning trees rooted at frontier nodes
obtain solutions with and w/o roots
maintain heuristic upper bound of optimal solution in entire graph
and lower bound of optimal solution from retrieved graph
terminate when upper bound ≅ lower bound
[Figure. Labels: connected components of retrieved graph; tree of positive discrepancy; tree of negative discrepancy]
20. Datasets
Table I. DATASET STATISTICS (NUMBERS ARE ROUNDED).
Dataset        |V|           |E|            type
Geo            1 · 10^6      4 · 10^6       synthetic
BA             1 · 10^6      10 · 10^6      synthetic
Grid           4 · 10^6      8 · 10^6       synthetic
Livejournal    4.3 · 10^6    69 · 10^6      real
Patents        2 · 10^6      16.5 · 10^6    real
Pokec          1.4 · 10^6    30.6 · 10^6    real
All graphs used in the experiments are undirected and their sizes are reported in Table I.
21. Input
graphs from previous datasets
special nodes: planted spheres
k spheres
radius ρ
s special nodes inside spheres
z special nodes outside spheres
23. Expansion
Table II. EXPANSION TABLE (AVERAGES OF 20 RUNS)
                         ObliviousExp.        AdaptiveExp.
dataset        s    k    cost    size         cost     size
Grid           20   2    302     888          2783     7950
Grid           60   1    261     784          534      1604
Geo            20   2    452     2578         4833     30883
Geo            60   1    418     2452         578      3991
BA             20   2    3943    243227       114      6032
BA             60   1    4477    271870       135      7407
Patents        20   2    605     3076         13436    25544
Patents        60   1    620     3126         5907     13009
Pokec          20   2    3884    217592       161      7249
Pokec          60   1    4343    240544       116      5146
Livejournal    20   2    3703    348933       234      13540
Livejournal    60   1    4667    394023       129      7087
… generate Q, and in interest of presentation, here we report what we consider to be representative results.
Table II shows the cost (number of API calls) as well as the size (number of edges) of the retrieved graph G_X.
24. Optimization
[Figure 6. x-axis: expansion size (#edges); y-axis: running time (in sec); both logarithmic. Series: BFS-Tree, Random-ST, PCST-Tree, Smart-ST]
Figure 6. Running times of the different algorithms as a function of expansion size (number of edges). We can see that in comparison to PCST-Tree, Smart-ST scales to inputs that are up to two orders of magnitude larger.
27. Future Work
Approximation Guarantee / Tighter expansions
Distributed Setting
Unknown Scale
Application on Real Data
(TKDE Extension)