SlideShare ist ein Scribd-Unternehmen logo
1 von 28
Downloaden Sie, um offline zu lesen
Bump	
  Hun(ng	
  in	
  the	
  Dark	
  
Local	
  Discrepancy	
  Maximiza(on	
  on	
  Graphs	
  
Aris(des	
  Gionis	
  
Michael	
  Mathioudakis	
  
An>	
  Ukkonen	
  
April	
  16th,	
  2015	
  
Seoul,	
  Korea	
  
ICDE	
  2015	
  
Find the Bump!
Here it
is!
regular	
  nodes	
  
2	
  
undirected	
  graph	
  
computer	
  network	
  
online	
  social	
  network	
  
etc.	
  
special	
  nodes	
  
failure	
  reported	
  /	
  virus	
  detected!	
  
posted	
  message	
  about	
  topic	
  X	
  	
  
for	
  α	
  =	
  1	
  
score	
  =	
  18	
  –	
  3	
  =	
  15	
  
score	
  =	
  1	
  –	
  0	
  =	
  1	
  
score	
  =	
  0	
  –	
  1	
  =	
  -­‐1	
  
graph,	
  special	
  &	
  non-­‐special	
  nodes	
  
	
  
find	
  connected	
  subgraph	
  with	
  
	
  
max	
  linear	
  discrepancy	
  
score	
  	
  =	
  	
  α	
  x	
  #special	
  -­‐	
  #non-­‐special	
  
3	
  
fixed	
  frac(on	
  of	
  special	
  nodes	
  
subgraph	
  size	
  ì	
  	
  score	
  ì
fixed	
  subgraph	
  size	
  
frac(on	
  of	
  special	
  nodes	
  ì	
  	
  score	
  ì	
  
graph,	
  special	
  &	
  non-­‐special	
  nodes	
  
	
  
find	
  connected	
  subgraph	
  with	
  
	
  
max	
  linear	
  discrepancy	
  
score	
  	
  =	
  	
  α	
  x	
  #special	
  -­‐	
  #non-­‐special	
  
4	
  
local	
  access	
  
special	
  nodes	
  are	
  provided	
  as	
  input	
  
build	
  graph	
  via	
  get-­‐neighbors	
  func(on	
  
	
  
how	
  much	
  of	
  the	
  graph	
  do	
  we	
  need?	
  
(infinite	
  graph?)	
  
5	
  
Approach	
  
	
  
first	
  
retrieve	
  a	
  minimal	
  part	
  of	
  the	
  graph	
  
necessary	
  when	
  we	
  have	
  only	
  local	
  access	
  
use	
  get-neighbors func(on	
  to	
  
expand	
  around	
  the	
  special	
  nodes	
  
	
  
then	
  
solve	
  problem	
  on	
  retrieved	
  subgraph	
  
6	
  
In	
  What	
  Follows	
  	
  
•  Unrestricted	
  Access	
  
–  Complexity	
  	
  
–  Connec(on	
  to	
  Steiner	
  Trees	
  
–  Special	
  Case:	
  Graph	
  is	
  Tree	
  
–  Algorithms	
  
•  Local	
  Access	
  
–  Algorithms	
  to	
  retrieve	
  (part	
  of)	
  the	
  graph	
  
•  Experiments	
  
•  Future	
  Work	
  
7	
  
Unrestricted	
  Access	
  
the	
  problem	
  is…	
  
NP-­‐hard	
  
(reduc(on	
  from	
  SETCOVER)	
  
8	
  
Unrestricted	
  Access	
  
the	
  problem	
  is…	
  
a	
  special	
  case	
  of	
  PRIZECOLLECTINGSTEINERTREE	
  
	
  
input:	
  graph 	
  posi(ve	
  weight	
  on	
  terminal	
  nodes	
  
	
   	
   	
   	
  non-­‐nega(ve	
  weight	
  on	
  edges	
  
output:	
  tree 	
  min	
  Σ(missed	
  terminal	
  node	
  weights)	
  +	
  Σ	
  (edge	
  weights)	
  
9	
  
terminal	
  nodes	
   special	
  nodes:	
  weight	
  =	
  α+1	
  
edges: 	
   	
  weight	
  =	
  1	
  
max	
  discrepancy	
  	
  	
  	
  	
  is	
  instance	
  of	
  	
  	
  	
  	
  min	
  PCST	
  weight	
  
GoemansWilliamson	
  algorithm,	
  O(|V||E|)	
  &	
  O(|V|2log|V|)	
  
(2	
  -­‐	
  1/(n	
  -­‐	
  1))-­‐approxima(on	
  
min-­‐approxima(on	
  does	
  not	
  translate	
  to	
  our	
  max-­‐problem	
  
In	
  What	
  Follows	
  	
  
•  Unrestricted	
  Access	
  
–  Complexity	
  	
  
–  Connec(on	
  to	
  Steiner	
  Trees	
  
–  Special	
  Case:	
  Graph	
  is	
  Tree	
  
–  Algorithms	
  
•  Local	
  Access	
  
–  Algorithms	
  to	
  retrieve	
  (part	
  of)	
  the	
  graph	
  
•  Experiments	
  
•  Future	
  Work	
  
10	
  
Special	
  Case	
  
graph	
  is	
  a	
  tree	
  
op(mal	
  linear	
  recursive	
  algorithm	
  
op(mal	
  in	
  graph	
  
op(mal	
  with	
  root	
  
op(mal	
  with	
  root	
  
of	
  subtree	
  
or	
  
11	
  
op(mal	
  without	
  root	
  
(in	
  subtree)	
  
Algorithms	
  
idea	
  for	
  general	
  case	
  heuris(cs	
  
generate	
  spanning	
  tree	
  for	
  graph	
  
solve	
  problem	
  on	
  spanning	
  tree	
  
	
  
BFS-­‐trees	
  from	
  each	
  special	
  node	
  
min	
  weight	
  spanning	
  tree	
  
random	
  weights	
  	
  
w(u,v) in [0,1]
‘smart’	
  weights	
  
w(u,v) = 2 - 1{u is special} - 1{v is special}
GW	
  for	
  PCST	
   12	
  
In	
  What	
  Follows	
  	
  
•  Unrestricted	
  Access	
  
–  Complexity	
  	
  
–  Connec(on	
  to	
  Steiner	
  Trees	
  
–  Special	
  Case:	
  Graph	
  is	
  Tree	
  
–  Algorithms	
  
•  Local	
  Access	
  
–  Algorithms	
  to	
  retrieve	
  (part	
  of)	
  the	
  graph	
  
•  Experiments	
  
•  Future	
  Work	
  
13	
  
Local	
  Access	
  
start	
  with	
  special	
  nodes	
  
call	
  get-neighbors(node)to	
  expand	
  
retrieve	
  en(re	
  graph?	
  
	
  
expansion	
  strategies	
  
full	
  expansion	
  
return	
  a	
  subgraph	
  that	
  contains	
  the	
  op(mal	
  solu(on	
  
oblivious	
  expansion,	
  adap(ve	
  expansion	
  
	
   14	
  
Full	
  Expansion	
  
retrieve	
  graph	
  that	
  contains	
  op(mal	
  solu(on	
  
with	
  calls	
  to	
  	
  get-neighbors()	
  
maintain	
  fron(er	
  to	
  expand	
  in	
  each	
  itera(on	
  
	
  
fron(er	
  condi(on	
  
unexpanded	
  nodes	
  for	
  which	
  
min distance from reachable special node
less than (α+1) x #reachable_special_nodes
1.  fron(er	
  =	
  special	
  nodes	
  
2.  call	
  get-­‐neighbors()	
  on	
  fron(er	
  nodes	
  
3.  update	
  fron(er	
  
4.  go	
  to	
  .2.	
  if	
  fron(er	
  not	
  empty	
  
ini(ally:	
  
loop:	
  
15	
  
Full	
  Expansion	
  
	
  
	
  
expensive	
  in	
  prac(ce	
  
we	
  look	
  for	
  heuris(cs	
  
16	
  
Oblivious	
  Expansion	
  
expand	
  (α+1)	
  (mes	
  around	
  each	
  special	
  node	
  
not	
  guaranteed	
  to	
  cover	
  op(mal	
  solu(on	
  
	
  
	
  a	
  
b
17	
  
retrieved	
  by	
  ObliviousExpansion	
  
addi(onally	
  retrieved	
  by	
  Full	
  Expansion	
  
α	
  =	
  1	
  
special	
  node	
  
solution
solution
op(mal	
  
Adap(ve	
  Expansion	
  	
  
idea	
  :	
  while	
  expanding,	
  es(mate	
  heurisHcally	
  how	
  close	
  we	
  are	
  to	
  op(mal	
  
for	
  each	
  component,	
  solve	
  problem	
  on	
  spanning	
  trees	
  rooted	
  at	
  fron(er	
  nodes	
  
obtain	
  solu(ons	
  with	
  and	
  w/o	
  roots	
  
maintain	
  heuris(c	
  
upper	
  bound	
  of	
  op(mal	
  solu(on	
  in	
  en(re	
  graph	
  
lower	
  bound	
  of	
  op(mal	
  solu(on	
  from	
  retrieved	
  graph	
  
terminate	
  when	
  upper	
  bound	
  	
  ≅	
  	
  lower	
  bound	
  	
   18	
  
connected	
  components	
  
of	
  retrieved	
  graph	
   tree	
  of	
  
posi(ve	
  
discrepancy	
  
tree	
  of	
  
nega(ve	
  
discrepancy	
  
In	
  What	
  Follows	
  	
  
•  Unrestricted	
  Access	
  
–  Complexity	
  	
  
–  Connec(on	
  to	
  Steiner	
  Trees	
  
–  Special	
  Case:	
  Graph	
  is	
  Tree	
  
–  Algorithms	
  
•  Local	
  Access	
  
–  Algorithms	
  to	
  retrieve	
  (part	
  of)	
  the	
  graph	
  
•  Experiments	
  
•  Future	
  Work	
  
19	
  
Datasets	
  
Table I
DATASET STATISTICS (NUMBERS ARE ROUNDED).
Dataset |V | |E|
Geo 1 · 106 4 · 106
BA 1 · 106 10 · 106
Grid 4 · 106 8 · 106
Livejournal 4.3 · 106 69 · 106
Patents 2 · 106 16.5 · 106
Pokec 1.4 · 106 30.6 · 106
All graphs used in the experiments are undirected and their
sizes are reported in Table I.
N
s
q
s
g
g
i
F
synthe(c	
  
real	
  
20	
  
Input	
  
graphs	
  from	
  previous	
  datasets	
  
	
  
special	
  nodes:	
  planted	
  spheres	
  
k	
  spheres	
  
radius	
  ρ	
  
s	
  special	
  nodes	
  inside	
  spheres	
  
z	
  special	
  nodes	
  outside	
  spheres	
  
	
  
21	
  
Algorithms	
  
	
  
expansion	
  
Full	
  (...not!)	
  
Oblivious	
  
Adap(ve	
  
	
  
op(miza(on	
  
BFS	
  
random-­‐ST	
  
smart-­‐ST	
  
PCST	
  
	
  
measure	
  
cost	
  (get-neighbors	
  calls)	
  
graph	
  size	
  
	
  
	
  
	
  
	
  
running	
  (me	
  vs	
  graph	
  size	
  
accuracy	
  (Jaccard	
  coeff)	
  
discrepancy	
  
22	
  
Expansion	
  
Table II
EXPANSION TABLE (AVERAGES OF 20 RUNS)
ObliviousExp. AdaptiveExp.
dataset s k cost size cost size
Grid 20 2 302 888 2783 7950
Grid 60 1 261 784 534 1604
Geo 20 2 452 2578 4833 30883
Geo 60 1 418 2452 578 3991
BA 20 2 3943 243227 114 6032
BA 60 1 4477 271870 135 7407
Patents 20 2 605 3076 13436 25544
Patents 60 1 620 3126 5907 13009
Pokec 20 2 3884 217592 161 7249
Pokec 60 1 4343 240544 116 5146
Livejournal 20 2 3703 348933 234 13540
Livejournal 60 1 4667 394023 129 7087
generate Q, and in interest of presentation, here we report
what we consider to be representative results.
Table II shows the cost (number of API calls) as well
as the size (number of edges) of the retrieved graph G .
1e+01
1e-02
1e-01
1e+00
1e+01
1e+02
Figure 6. Running times of
size (number of edges). We
	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  
	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  
23	
  
Op(miza(on	
  
10
ES OF 20 RUNS)
Exp. AdaptiveExp.
size cost size
888 2783 7950
784 534 1604
2578 4833 30883
2452 578 3991
3227 114 6032
1870 135 7407
3076 13436 25544
3126 5907 13009
7592 161 7249
0544 116 5146
8933 234 13540
4023 129 7087
sentation, here we report
ve results.
r of API calls) as well
the retrieved graph GX
.
s that for Grid, Geo,
on results in fewer API
Running time (in sec)
1e+01 1e+03 1e+05
1e-02
1e-01
1e+00
1e+01
1e+02
expansion size (#edges)
BFS-Tree
Random-ST
PCST-Tree
Smart-ST
Figure 6. Running times of the different algorithms as a function of expansion
size (number of edges). We can see that in comparison to PCST-Tree Smart-
ST scales to inputs that are up to two orders of magnitude larger.
24	
  
Op(miza(on	
   11
Table III
ACCURACY, AVERAGES OF 20 RUNS
ObliviousExpansion AdaptiveExpansion
dataset s k BFS-Tree Random-ST PCST-Tree Smart-ST BFS-Tree Random-ST PCST-Tree Smart-ST
Grid 20 2 0.88 (0) 0.81 (0) 0.93 (0) 0.93 (0) 0.88 (0) 0.85 (0) 0.93 (0) 0.93 (0)
Grid 60 1 1.00 (0) 0.94 (0) 1.00 (0) 1.00 (0) 0.99 (0) 0.98 (0) 1.00 (0) 1.00 (0)
Geo 20 2 1.00 (0) 0.95 (0) 1.00 (0) 1.00 (0) 1.00 (0) 0.98 (0) 1.00 (0) 1.00 (0)
Geo 60 1 1.00 (0) 0.96 (0) 1.00 (0) 1.00 (0) 0.99 (0) 0.98 (0) 0.99 (0) 0.99 (0)
BA 20 2 0.47 (12) 0.18 (12) NaN (20) 0.46 (0) 0.46 (0) 0.44 (0) 0.46 (0) 0.45 (0)
BA 60 1 NaN (20) NaN (20) NaN (20) 0.77 (0) 0.76 (0) 0.76 (0) 0.77 (3) 0.76 (0)
Patents 20 2 0.92 (0) 0.86 (0) 0.91 (0) 0.90 (0) 0.72 (0) 0.74 (0) 0.77 (3) 0.74 (0)
Patents 60 1 0.89 (0) 0.76 (0) 0.89 (0) 0.89 (0) 0.74 (0) 0.73 (0) 0.74 (0) 0.74 (0)
Pokec 20 2 0.53 (2) 0.13 (3) NaN (20) 0.46 (0) 0.43 (0) 0.41 (0) 0.42 (2) 0.40 (0)
Pokec 60 1 0.74 (6) 0.09 (6) NaN (20) 0.61 (0) 0.48 (0) 0.46 (0) 0.45 (1) 0.45 (0)
Livejournal 20 2 0.62 (5) 0.19 (5) NaN (20) 0.54 (0) 0.56 (0) 0.53 (0) 0.58 (5) 0.56 (0)
Livejournal 60 1 0.88 (12) 0.26 (9) NaN (20) 0.68 (0) 0.65 (0) 0.62 (0) 0.62 (1) 0.62 (0)
Table IV
DISCREPANCY, AVERAGES OF 20 RUNS
ObliviousExpansion AdaptiveExpansion
dataset s k BFS-Tree Random-ST PCST-Tree Smart-ST BFS-Tree Random-ST PCST-Tree Smart-ST
Grid 20 2 14.5 (0) 11.8 (0) 16.8 (0) 16.7 (0) 14.8 (0) 13.8 (0) 16.4 (0) 16.3 (0)
Grid 60 1 41.0 (0) 36.9 (0) 41.0 (0) 41.0 (0) 40.5 (0) 38.9 (0) 40.9 (0) 40.9 (0)
Geo 20 2 19.9 (0) 18.4 (0) 20.0 (0) 20.0 (0) 19.9 (0) 19.2 (0) 20.0 (0) 20.0 (0)
Geo 60 1 22.0 (0) 20.6 (0) 22.0 (0) 22.0 (0) 21.8 (0) 21.6 (0) 21.8 (0) 21.8 (0)
BA 20 2 15.0 (12) 2.8 (12) NaN (20) 15.2 (0) 15.6 (0) 14.4 (0) 14.4 (0) 15.0 (0)
BA 60 1 NaN (20) NaN (20) NaN (20) 36.1 (0) 37.4 (0) 35.3 (0) 35.9 (3) 35.5 (0)
Patents 20 2 17.4 (0) 15.8 (0) 17.7 (0) 17.6 (0) 14.9 (0) 13.8 (0) 15.8 (3) 14.8 (0)
Patents 60 1 40.0 (0) 31.1 (0) 40.8 (0) 40.6 (0) 33.0 (0) 32.2 (0) 33.2 (0) 33.3 (0)
Pokec 20 2 11.6 (2) 2.6 (3) NaN (20) 11.8 (0) 8.6 (0) 8.0 (0) 8.2 (2) 8.0 (0)
Pokec 60 1 36.6 (6) 4.7 (6) NaN (20) 28.6 (0) 20.9 (0) 17.4 (0) 18.3 (1) 18.5 (0)
Livejournal 20 2 14.3 (5) 3.5 (5) NaN (20) 13.8 (0) 11.8 (0) 9.8 (0) 10.8 (5) 10.2 (0)
Livejournal 60 1 45.6 (12) 12.0 (9) NaN (20) 31.1 (0) 29.8 (0) 25.6 (0) 26.8 (1) 27.4 (0)
ObliviousExpansion for denser graphs. We also note that once a subgraph has been discovered in the
25	
  
In	
  What	
  Follows	
  	
  
•  Unrestricted	
  Access	
  
–  Complexity	
  	
  
–  Connec(on	
  to	
  Steiner	
  Trees	
  
–  Special	
  Case:	
  Graph	
  is	
  Tree	
  
–  Algorithms	
  
•  Local	
  Access	
  
–  Algorithms	
  to	
  retrieve	
  (part	
  of)	
  the	
  graph	
  
•  Experiments	
  
•  Future	
  Work	
  
26	
  
Future	
  Work	
  
	
  
Approxima(on	
  Guarantee	
  /	
  Tighter	
  expansions	
  
Distributed	
  Se>ng	
  
Unknown	
  Scale	
  
Applica(on	
  on	
  Real	
  Data	
  
(TKDE	
  Extension)	
  
27	
  
The End
28	
  

Weitere ähnliche Inhalte

Was ist angesagt?

Computer Graphics Unit 1
Computer Graphics Unit 1Computer Graphics Unit 1
Computer Graphics Unit 1aravindangc
 
Murphy: Machine learning A probabilistic perspective: Ch.9
Murphy: Machine learning A probabilistic perspective: Ch.9Murphy: Machine learning A probabilistic perspective: Ch.9
Murphy: Machine learning A probabilistic perspective: Ch.9Daisuke Yoneoka
 
Elliptic Curve Cryptography and Zero Knowledge Proof
Elliptic Curve Cryptography and Zero Knowledge ProofElliptic Curve Cryptography and Zero Knowledge Proof
Elliptic Curve Cryptography and Zero Knowledge ProofArunanand Ta
 
Distributed Coordinate Descent for Logistic Regression with Regularization
Distributed Coordinate Descent for Logistic Regression with RegularizationDistributed Coordinate Descent for Logistic Regression with Regularization
Distributed Coordinate Descent for Logistic Regression with RegularizationИлья Трофимов
 
Optimal interval clustering: Application to Bregman clustering and statistica...
Optimal interval clustering: Application to Bregman clustering and statistica...Optimal interval clustering: Application to Bregman clustering and statistica...
Optimal interval clustering: Application to Bregman clustering and statistica...Frank Nielsen
 
H2O World - Generalized Low Rank Models - Madeleine Udell
H2O World - Generalized Low Rank Models - Madeleine UdellH2O World - Generalized Low Rank Models - Madeleine Udell
H2O World - Generalized Low Rank Models - Madeleine UdellSri Ambati
 
Line drawing algorithm and antialiasing techniques
Line drawing algorithm and antialiasing techniquesLine drawing algorithm and antialiasing techniques
Line drawing algorithm and antialiasing techniquesAnkit Garg
 
Slides: Perspective click-and-drag area selections in pictures
Slides: Perspective click-and-drag area selections in picturesSlides: Perspective click-and-drag area selections in pictures
Slides: Perspective click-and-drag area selections in picturesFrank Nielsen
 
Possible applications of low-rank tensors in statistics and UQ (my talk in Bo...
Possible applications of low-rank tensors in statistics and UQ (my talk in Bo...Possible applications of low-rank tensors in statistics and UQ (my talk in Bo...
Possible applications of low-rank tensors in statistics and UQ (my talk in Bo...Alexander Litvinenko
 
K10692 control theory
K10692 control theoryK10692 control theory
K10692 control theorysaagar264
 
Habilitation à diriger des recherches
Habilitation à diriger des recherchesHabilitation à diriger des recherches
Habilitation à diriger des recherchesPierre Pudlo
 
Tensor Train data format for uncertainty quantification
Tensor Train data format for uncertainty quantificationTensor Train data format for uncertainty quantification
Tensor Train data format for uncertainty quantificationAlexander Litvinenko
 

Was ist angesagt? (20)

Computer Graphics Unit 1
Computer Graphics Unit 1Computer Graphics Unit 1
Computer Graphics Unit 1
 
Murphy: Machine learning A probabilistic perspective: Ch.9
Murphy: Machine learning A probabilistic perspective: Ch.9Murphy: Machine learning A probabilistic perspective: Ch.9
Murphy: Machine learning A probabilistic perspective: Ch.9
 
Hat04 0205
Hat04 0205Hat04 0205
Hat04 0205
 
10_rnn.pdf
10_rnn.pdf10_rnn.pdf
10_rnn.pdf
 
15_representation.pdf
15_representation.pdf15_representation.pdf
15_representation.pdf
 
Elliptic Curve Cryptography and Zero Knowledge Proof
Elliptic Curve Cryptography and Zero Knowledge ProofElliptic Curve Cryptography and Zero Knowledge Proof
Elliptic Curve Cryptography and Zero Knowledge Proof
 
parent functons
parent functonsparent functons
parent functons
 
MUMS Opening Workshop - An Overview of Reduced-Order Models and Emulators (ED...
MUMS Opening Workshop - An Overview of Reduced-Order Models and Emulators (ED...MUMS Opening Workshop - An Overview of Reduced-Order Models and Emulators (ED...
MUMS Opening Workshop - An Overview of Reduced-Order Models and Emulators (ED...
 
Ch8
Ch8Ch8
Ch8
 
14_autoencoders.pdf
14_autoencoders.pdf14_autoencoders.pdf
14_autoencoders.pdf
 
Distributed Coordinate Descent for Logistic Regression with Regularization
Distributed Coordinate Descent for Logistic Regression with RegularizationDistributed Coordinate Descent for Logistic Regression with Regularization
Distributed Coordinate Descent for Logistic Regression with Regularization
 
Optimal interval clustering: Application to Bregman clustering and statistica...
Optimal interval clustering: Application to Bregman clustering and statistica...Optimal interval clustering: Application to Bregman clustering and statistica...
Optimal interval clustering: Application to Bregman clustering and statistica...
 
karnaugh maps
karnaugh mapskarnaugh maps
karnaugh maps
 
H2O World - Generalized Low Rank Models - Madeleine Udell
H2O World - Generalized Low Rank Models - Madeleine UdellH2O World - Generalized Low Rank Models - Madeleine Udell
H2O World - Generalized Low Rank Models - Madeleine Udell
 
Line drawing algorithm and antialiasing techniques
Line drawing algorithm and antialiasing techniquesLine drawing algorithm and antialiasing techniques
Line drawing algorithm and antialiasing techniques
 
Slides: Perspective click-and-drag area selections in pictures
Slides: Perspective click-and-drag area selections in picturesSlides: Perspective click-and-drag area selections in pictures
Slides: Perspective click-and-drag area selections in pictures
 
Possible applications of low-rank tensors in statistics and UQ (my talk in Bo...
Possible applications of low-rank tensors in statistics and UQ (my talk in Bo...Possible applications of low-rank tensors in statistics and UQ (my talk in Bo...
Possible applications of low-rank tensors in statistics and UQ (my talk in Bo...
 
K10692 control theory
K10692 control theoryK10692 control theory
K10692 control theory
 
Habilitation à diriger des recherches
Habilitation à diriger des recherchesHabilitation à diriger des recherches
Habilitation à diriger des recherches
 
Tensor Train data format for uncertainty quantification
Tensor Train data format for uncertainty quantificationTensor Train data format for uncertainty quantification
Tensor Train data format for uncertainty quantification
 

Ähnlich wie Bump Hunting in the Dark - ICDE15 presentation

Hardware Acceleration for Machine Learning
Hardware Acceleration for Machine LearningHardware Acceleration for Machine Learning
Hardware Acceleration for Machine LearningCastLabKAIST
 
Graphical Model Selection for Big Data
Graphical Model Selection for Big DataGraphical Model Selection for Big Data
Graphical Model Selection for Big DataAlexander Jung
 
dynamic programming Rod cutting class
dynamic programming Rod cutting classdynamic programming Rod cutting class
dynamic programming Rod cutting classgiridaroori
 
2012 mdsp pr13 support vector machine
2012 mdsp pr13 support vector machine2012 mdsp pr13 support vector machine
2012 mdsp pr13 support vector machinenozomuhamada
 
Introduction to Neural Networks and Deep Learning from Scratch
Introduction to Neural Networks and Deep Learning from ScratchIntroduction to Neural Networks and Deep Learning from Scratch
Introduction to Neural Networks and Deep Learning from ScratchAhmed BESBES
 
Svm map reduce_slides
Svm map reduce_slidesSvm map reduce_slides
Svm map reduce_slidesSara Asher
 
DAOR - Bridging the Gap between Community and Node Representations: Graph Emb...
DAOR - Bridging the Gap between Community and Node Representations: Graph Emb...DAOR - Bridging the Gap between Community and Node Representations: Graph Emb...
DAOR - Bridging the Gap between Community and Node Representations: Graph Emb...Artem Lutov
 
VJAI Paper Reading#3-KDD2019-ClusterGCN
VJAI Paper Reading#3-KDD2019-ClusterGCNVJAI Paper Reading#3-KDD2019-ClusterGCN
VJAI Paper Reading#3-KDD2019-ClusterGCNDat Nguyen
 
Efficient Volume and Edge-Skeleton Computation for Polytopes Given by Oracles
Efficient Volume and Edge-Skeleton Computation for Polytopes Given by OraclesEfficient Volume and Edge-Skeleton Computation for Polytopes Given by Oracles
Efficient Volume and Edge-Skeleton Computation for Polytopes Given by OraclesVissarion Fisikopoulos
 
designanalysisalgorithm_unit-v-part2.pptx
designanalysisalgorithm_unit-v-part2.pptxdesignanalysisalgorithm_unit-v-part2.pptx
designanalysisalgorithm_unit-v-part2.pptxarifimad15
 
The world of loss function
The world of loss functionThe world of loss function
The world of loss function홍배 김
 
Distributed computation and reconfiguration in actively dynamic networks
Distributed computation and reconfiguration in actively dynamic networksDistributed computation and reconfiguration in actively dynamic networks
Distributed computation and reconfiguration in actively dynamic networksPeter Kos
 
Parallel Computing 2007: Bring your own parallel application
Parallel Computing 2007: Bring your own parallel applicationParallel Computing 2007: Bring your own parallel application
Parallel Computing 2007: Bring your own parallel applicationGeoffrey Fox
 
Parallel algorithms
Parallel algorithmsParallel algorithms
Parallel algorithmsguest084d20
 
Parallel algorithms
Parallel algorithmsParallel algorithms
Parallel algorithmsguest084d20
 
Clipping & Rasterization
Clipping & RasterizationClipping & Rasterization
Clipping & RasterizationAhmed Daoud
 

Ähnlich wie Bump Hunting in the Dark - ICDE15 presentation (20)

MATLABgraphPlotting.pptx
MATLABgraphPlotting.pptxMATLABgraphPlotting.pptx
MATLABgraphPlotting.pptx
 
Hardware Acceleration for Machine Learning
Hardware Acceleration for Machine LearningHardware Acceleration for Machine Learning
Hardware Acceleration for Machine Learning
 
Graphical Model Selection for Big Data
Graphical Model Selection for Big DataGraphical Model Selection for Big Data
Graphical Model Selection for Big Data
 
dynamic programming Rod cutting class
dynamic programming Rod cutting classdynamic programming Rod cutting class
dynamic programming Rod cutting class
 
2012 mdsp pr13 support vector machine
2012 mdsp pr13 support vector machine2012 mdsp pr13 support vector machine
2012 mdsp pr13 support vector machine
 
Introduction to Neural Networks and Deep Learning from Scratch
Introduction to Neural Networks and Deep Learning from ScratchIntroduction to Neural Networks and Deep Learning from Scratch
Introduction to Neural Networks and Deep Learning from Scratch
 
raster algorithm.pdf
raster algorithm.pdfraster algorithm.pdf
raster algorithm.pdf
 
Svm map reduce_slides
Svm map reduce_slidesSvm map reduce_slides
Svm map reduce_slides
 
Optimisation random graph presentation
Optimisation random graph presentationOptimisation random graph presentation
Optimisation random graph presentation
 
DAOR - Bridging the Gap between Community and Node Representations: Graph Emb...
DAOR - Bridging the Gap between Community and Node Representations: Graph Emb...DAOR - Bridging the Gap between Community and Node Representations: Graph Emb...
DAOR - Bridging the Gap between Community and Node Representations: Graph Emb...
 
VJAI Paper Reading#3-KDD2019-ClusterGCN
VJAI Paper Reading#3-KDD2019-ClusterGCNVJAI Paper Reading#3-KDD2019-ClusterGCN
VJAI Paper Reading#3-KDD2019-ClusterGCN
 
Efficient Volume and Edge-Skeleton Computation for Polytopes Given by Oracles
Efficient Volume and Edge-Skeleton Computation for Polytopes Given by OraclesEfficient Volume and Edge-Skeleton Computation for Polytopes Given by Oracles
Efficient Volume and Edge-Skeleton Computation for Polytopes Given by Oracles
 
designanalysisalgorithm_unit-v-part2.pptx
designanalysisalgorithm_unit-v-part2.pptxdesignanalysisalgorithm_unit-v-part2.pptx
designanalysisalgorithm_unit-v-part2.pptx
 
The world of loss function
The world of loss functionThe world of loss function
The world of loss function
 
Distributed computation and reconfiguration in actively dynamic networks
Distributed computation and reconfiguration in actively dynamic networksDistributed computation and reconfiguration in actively dynamic networks
Distributed computation and reconfiguration in actively dynamic networks
 
Parallel Computing 2007: Bring your own parallel application
Parallel Computing 2007: Bring your own parallel applicationParallel Computing 2007: Bring your own parallel application
Parallel Computing 2007: Bring your own parallel application
 
Lecture12
Lecture12Lecture12
Lecture12
 
Parallel algorithms
Parallel algorithmsParallel algorithms
Parallel algorithms
 
Parallel algorithms
Parallel algorithmsParallel algorithms
Parallel algorithms
 
Clipping & Rasterization
Clipping & RasterizationClipping & Rasterization
Clipping & Rasterization
 

Mehr von Michael Mathioudakis

Measuring polarization on social media
Measuring polarization on social mediaMeasuring polarization on social media
Measuring polarization on social mediaMichael Mathioudakis
 
Lecture 07 - CS-5040 - modern database systems
Lecture 07 -  CS-5040 - modern database systemsLecture 07 -  CS-5040 - modern database systems
Lecture 07 - CS-5040 - modern database systemsMichael Mathioudakis
 
Lecture 06 - CS-5040 - modern database systems
Lecture 06  - CS-5040 - modern database systemsLecture 06  - CS-5040 - modern database systems
Lecture 06 - CS-5040 - modern database systemsMichael Mathioudakis
 
Modern Database Systems - Lecture 02
Modern Database Systems - Lecture 02Modern Database Systems - Lecture 02
Modern Database Systems - Lecture 02Michael Mathioudakis
 
Modern Database Systems - Lecture 01
Modern Database Systems - Lecture 01Modern Database Systems - Lecture 01
Modern Database Systems - Lecture 01Michael Mathioudakis
 
Modern Database Systems - Lecture 00
Modern Database Systems - Lecture 00Modern Database Systems - Lecture 00
Modern Database Systems - Lecture 00Michael Mathioudakis
 
Mining the Social Web - Lecture 3 - T61.6020
Mining the Social Web - Lecture 3 - T61.6020Mining the Social Web - Lecture 3 - T61.6020
Mining the Social Web - Lecture 3 - T61.6020Michael Mathioudakis
 
Mining the Social Web - Lecture 2 - T61.6020
Mining the Social Web - Lecture 2 - T61.6020Mining the Social Web - Lecture 2 - T61.6020
Mining the Social Web - Lecture 2 - T61.6020Michael Mathioudakis
 
Mining the Social Web - Lecture 1 - T61.6020 lecture-01-slides
Mining the Social Web - Lecture 1 - T61.6020 lecture-01-slidesMining the Social Web - Lecture 1 - T61.6020 lecture-01-slides
Mining the Social Web - Lecture 1 - T61.6020 lecture-01-slidesMichael Mathioudakis
 

Mehr von Michael Mathioudakis (10)

Measuring polarization on social media
Measuring polarization on social mediaMeasuring polarization on social media
Measuring polarization on social media
 
Lecture 07 - CS-5040 - modern database systems
Lecture 07 -  CS-5040 - modern database systemsLecture 07 -  CS-5040 - modern database systems
Lecture 07 - CS-5040 - modern database systems
 
Lecture 06 - CS-5040 - modern database systems
Lecture 06  - CS-5040 - modern database systemsLecture 06  - CS-5040 - modern database systems
Lecture 06 - CS-5040 - modern database systems
 
Modern Database Systems - Lecture 02
Modern Database Systems - Lecture 02Modern Database Systems - Lecture 02
Modern Database Systems - Lecture 02
 
Modern Database Systems - Lecture 01
Modern Database Systems - Lecture 01Modern Database Systems - Lecture 01
Modern Database Systems - Lecture 01
 
Modern Database Systems - Lecture 00
Modern Database Systems - Lecture 00Modern Database Systems - Lecture 00
Modern Database Systems - Lecture 00
 
Mining the Social Web - Lecture 3 - T61.6020
Mining the Social Web - Lecture 3 - T61.6020Mining the Social Web - Lecture 3 - T61.6020
Mining the Social Web - Lecture 3 - T61.6020
 
Mining the Social Web - Lecture 2 - T61.6020
Mining the Social Web - Lecture 2 - T61.6020Mining the Social Web - Lecture 2 - T61.6020
Mining the Social Web - Lecture 2 - T61.6020
 
Mining the Social Web - Lecture 1 - T61.6020 lecture-01-slides
Mining the Social Web - Lecture 1 - T61.6020 lecture-01-slidesMining the Social Web - Lecture 1 - T61.6020 lecture-01-slides
Mining the Social Web - Lecture 1 - T61.6020 lecture-01-slides
 
Absorbing Random Walk Centrality
Absorbing Random Walk CentralityAbsorbing Random Walk Centrality
Absorbing Random Walk Centrality
 

Kürzlich hochgeladen

Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfSumit Kumar yadav
 
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencyHire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencySheetal Arora
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Lokesh Kothari
 
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts ServiceJustdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Servicemonikaservice1
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfSumit Kumar yadav
 
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...Lokesh Kothari
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000Sapana Sha
 
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryFAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryAlex Henderson
 
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRLKochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRLkantirani197
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)Areesha Ahmad
 
American Type Culture Collection (ATCC).pptx
American Type Culture Collection (ATCC).pptxAmerican Type Culture Collection (ATCC).pptx
American Type Culture Collection (ATCC).pptxabhishekdhamu51
 
GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)Areesha Ahmad
 
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Servicenishacall1
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksSérgio Sacani
 
Bacterial Identification and Classifications
Bacterial Identification and ClassificationsBacterial Identification and Classifications
Bacterial Identification and ClassificationsAreesha Ahmad
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxgindu3009
 
Conjugation, transduction and transformation
Conjugation, transduction and transformationConjugation, transduction and transformation
Conjugation, transduction and transformationAreesha Ahmad
 
module for grade 9 for distance learning
module for grade 9 for distance learningmodule for grade 9 for distance learning
module for grade 9 for distance learninglevieagacer
 
Seismic Method Estimate velocity from seismic data.pptx
Seismic Method Estimate velocity from seismic  data.pptxSeismic Method Estimate velocity from seismic  data.pptx
Seismic Method Estimate velocity from seismic data.pptxAlMamun560346
 

Kürzlich hochgeladen (20)

Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdf
 
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencyHire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
 
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts ServiceJustdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdf
 
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
 
Site Acceptance Test .
Site Acceptance Test                    .Site Acceptance Test                    .
Site Acceptance Test .
 
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryFAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
 
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRLKochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)
 
American Type Culture Collection (ATCC).pptx
American Type Culture Collection (ATCC).pptxAmerican Type Culture Collection (ATCC).pptx
American Type Culture Collection (ATCC).pptx
 
GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)
 
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
 
Bacterial Identification and Classifications
Bacterial Identification and ClassificationsBacterial Identification and Classifications
Bacterial Identification and Classifications
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
 
Conjugation, transduction and transformation
Conjugation, transduction and transformationConjugation, transduction and transformation
Conjugation, transduction and transformation
 
module for grade 9 for distance learning
module for grade 9 for distance learningmodule for grade 9 for distance learning
module for grade 9 for distance learning
 
Seismic Method Estimate velocity from seismic data.pptx
Seismic Method Estimate velocity from seismic  data.pptxSeismic Method Estimate velocity from seismic  data.pptx
Seismic Method Estimate velocity from seismic data.pptx
 

Bump Hunting in the Dark - ICDE15 presentation

  • 1. Bump  Hun(ng  in  the  Dark   Local  Discrepancy  Maximiza(on  on  Graphs   Aris(des  Gionis   Michael  Mathioudakis   An>  Ukkonen   April  16th,  2015   Seoul,  Korea   ICDE  2015  
  • 2. Find the Bump! Here it is! regular  nodes   2   undirected  graph   computer  network   online  social  network   etc.   special  nodes   failure  reported  /  virus  detected!   posted  message  about  topic  X    
  • 3. for  α  =  1   score  =  18  –  3  =  15   score  =  1  –  0  =  1   score  =  0  –  1  =  -­‐1   graph,  special  &  non-­‐special  nodes     find  connected  subgraph  with     max  linear  discrepancy   score    =    α  x  #special  -­‐  #non-­‐special   3  
  • 4. fixed  frac(on  of  special  nodes   subgraph  size  ì    score  ì fixed  subgraph  size   frac(on  of  special  nodes  ì    score  ì   graph,  special  &  non-­‐special  nodes     find  connected  subgraph  with     max  linear  discrepancy   score    =    α  x  #special  -­‐  #non-­‐special   4  
  • 5. local  access   special  nodes  are  provided  as  input   build  graph  via  get-­‐neighbors  func(on     how  much  of  the  graph  do  we  need?   (infinite  graph?)   5  
  • 6. Approach     first   retrieve  a  minimal  part  of  the  graph   necessary  when  we  have  only  local  access   use  get-neighbors func(on  to   expand  around  the  special  nodes     then   solve  problem  on  retrieved  subgraph   6  
  • 7. In  What  Follows     •  Unrestricted  Access   –  Complexity     –  Connec(on  to  Steiner  Trees   –  Special  Case:  Graph  is  Tree   –  Algorithms   •  Local  Access   –  Algorithms  to  retrieve  (part  of)  the  graph   •  Experiments   •  Future  Work   7  
  • 8. Unrestricted  Access   the  problem  is…   NP-­‐hard   (reduc(on  from  SETCOVER)   8  
  • 9. Unrestricted  Access   the  problem  is…   a  special  case  of  PRIZECOLLECTINGSTEINERTREE     input:  graph  posi(ve  weight  on  terminal  nodes          non-­‐nega(ve  weight  on  edges   output:  tree  min  Σ(missed  terminal  node  weights)  +  Σ  (edge  weights)   9   terminal  nodes   special  nodes:  weight  =  α+1   edges:    weight  =  1   max  discrepancy          is  instance  of          min  PCST  weight   GoemansWilliamson  algorithm,  O(|V||E|)  &  O(|V|2log|V|)   (2  -­‐  1/(n  -­‐  1))-­‐approxima(on   min-­‐approxima(on  does  not  translate  to  our  max-­‐problem  
  • 10. In  What  Follows     •  Unrestricted  Access   –  Complexity     –  Connec(on  to  Steiner  Trees   –  Special  Case:  Graph  is  Tree   –  Algorithms   •  Local  Access   –  Algorithms  to  retrieve  (part  of)  the  graph   •  Experiments   •  Future  Work   10  
  • 11. Special  Case   graph  is  a  tree   op(mal  linear  recursive  algorithm   op(mal  in  graph   op(mal  with  root   op(mal  with  root   of  subtree   or   11   op(mal  without  root   (in  subtree)  
  • 12. Algorithms   idea  for  general  case  heuris(cs   generate  spanning  tree  for  graph   solve  problem  on  spanning  tree     BFS-­‐trees  from  each  special  node   min  weight  spanning  tree   random  weights     w(u,v) in [0,1] ‘smart’  weights   w(u,v) = 2 - 1{u is special} - 1{v is special} GW  for  PCST   12  
  • 13. In  What  Follows     •  Unrestricted  Access   –  Complexity     –  Connec(on  to  Steiner  Trees   –  Special  Case:  Graph  is  Tree   –  Algorithms   •  Local  Access   –  Algorithms  to  retrieve  (part  of)  the  graph   •  Experiments   •  Future  Work   13  
  • 14. Local  Access   start  with  special  nodes   call  get-neighbors(node)to  expand   retrieve  en(re  graph?     expansion  strategies   full  expansion   return  a  subgraph  that  contains  the  op(mal  solu(on   oblivious  expansion,  adap(ve  expansion     14  
  • 15. Full  Expansion   retrieve  graph  that  contains  op(mal  solu(on   with  calls  to    get-neighbors()   maintain  fron(er  to  expand  in  each  itera(on     fron(er  condi(on   unexpanded  nodes  for  which   min distance from reachable special node less than (α+1) x #reachable_special_nodes 1.  fron(er  =  special  nodes   2.  call  get-­‐neighbors()  on  fron(er  nodes   3.  update  fron(er   4.  go  to  .2.  if  fron(er  not  empty   ini(ally:   loop:   15  
  • 16. Full  Expansion       expensive  in  prac(ce   we  look  for  heuris(cs   16  
  • 17. Oblivious  Expansion   expand  (α+1)  (mes  around  each  special  node   not  guaranteed  to  cover  op(mal  solu(on      a   b 17   retrieved  by  ObliviousExpansion   addi(onally  retrieved  by  Full  Expansion   α  =  1   special  node   solution solution op(mal  
  • 18. Adap(ve  Expansion     idea  :  while  expanding,  es(mate  heurisHcally  how  close  we  are  to  op(mal   for  each  component,  solve  problem  on  spanning  trees  rooted  at  fron(er  nodes   obtain  solu(ons  with  and  w/o  roots   maintain  heuris(c   upper  bound  of  op(mal  solu(on  in  en(re  graph   lower  bound  of  op(mal  solu(on  from  retrieved  graph   terminate  when  upper  bound    ≅    lower  bound     18   connected  components   of  retrieved  graph   tree  of   posi(ve   discrepancy   tree  of   nega(ve   discrepancy  
  • 19. In  What  Follows     •  Unrestricted  Access   –  Complexity     –  Connec(on  to  Steiner  Trees   –  Special  Case:  Graph  is  Tree   –  Algorithms   •  Local  Access   –  Algorithms  to  retrieve  (part  of)  the  graph   •  Experiments   •  Future  Work   19  
  • 20. Datasets   Table I DATASET STATISTICS (NUMBERS ARE ROUNDED). Dataset |V | |E| Geo 1 · 106 4 · 106 BA 1 · 106 10 · 106 Grid 4 · 106 8 · 106 Livejournal 4.3 · 106 69 · 106 Patents 2 · 106 16.5 · 106 Pokec 1.4 · 106 30.6 · 106 All graphs used in the experiments are undirected and their sizes are reported in Table I. N s q s g g i F synthe(c   real   20  
  • 21. Input   graphs  from  previous  datasets     special  nodes:  planted  spheres   k  spheres   radius  ρ   s  special  nodes  inside  spheres   z  special  nodes  outside  spheres     21  
  • 22. Algorithms     expansion   Full  (...not!)   Oblivious   Adap(ve     op(miza(on   BFS   random-­‐ST   smart-­‐ST   PCST     measure   cost  (get-neighbors  calls)   graph  size           running  (me  vs  graph  size   accuracy  (Jaccard  coeff)   discrepancy   22  
  • 23. Expansion   Table II EXPANSION TABLE (AVERAGES OF 20 RUNS) ObliviousExp. AdaptiveExp. dataset s k cost size cost size Grid 20 2 302 888 2783 7950 Grid 60 1 261 784 534 1604 Geo 20 2 452 2578 4833 30883 Geo 60 1 418 2452 578 3991 BA 20 2 3943 243227 114 6032 BA 60 1 4477 271870 135 7407 Patents 20 2 605 3076 13436 25544 Patents 60 1 620 3126 5907 13009 Pokec 20 2 3884 217592 161 7249 Pokec 60 1 4343 240544 116 5146 Livejournal 20 2 3703 348933 234 13540 Livejournal 60 1 4667 394023 129 7087 generate Q, and in interest of presentation, here we report what we consider to be representative results. Table II shows the cost (number of API calls) as well as the size (number of edges) of the retrieved graph G . 1e+01 1e-02 1e-01 1e+00 1e+01 1e+02 Figure 6. Running times of size (number of edges). We                                                                                                                                                                               23  
  • 24. Op(miza(on   10 ES OF 20 RUNS) Exp. AdaptiveExp. size cost size 888 2783 7950 784 534 1604 2578 4833 30883 2452 578 3991 3227 114 6032 1870 135 7407 3076 13436 25544 3126 5907 13009 7592 161 7249 0544 116 5146 8933 234 13540 4023 129 7087 sentation, here we report ve results. r of API calls) as well the retrieved graph GX . s that for Grid, Geo, on results in fewer API Running time (in sec) 1e+01 1e+03 1e+05 1e-02 1e-01 1e+00 1e+01 1e+02 expansion size (#edges) BFS-Tree Random-ST PCST-Tree Smart-ST Figure 6. Running times of the different algorithms as a function of expansion size (number of edges). We can see that in comparison to PCST-Tree Smart- ST scales to inputs that are up to two orders of magnitude larger. 24  
  • 25. Op(miza(on   11 Table III ACCURACY, AVERAGES OF 20 RUNS ObliviousExpansion AdaptiveExpansion dataset s k BFS-Tree Random-ST PCST-Tree Smart-ST BFS-Tree Random-ST PCST-Tree Smart-ST Grid 20 2 0.88 (0) 0.81 (0) 0.93 (0) 0.93 (0) 0.88 (0) 0.85 (0) 0.93 (0) 0.93 (0) Grid 60 1 1.00 (0) 0.94 (0) 1.00 (0) 1.00 (0) 0.99 (0) 0.98 (0) 1.00 (0) 1.00 (0) Geo 20 2 1.00 (0) 0.95 (0) 1.00 (0) 1.00 (0) 1.00 (0) 0.98 (0) 1.00 (0) 1.00 (0) Geo 60 1 1.00 (0) 0.96 (0) 1.00 (0) 1.00 (0) 0.99 (0) 0.98 (0) 0.99 (0) 0.99 (0) BA 20 2 0.47 (12) 0.18 (12) NaN (20) 0.46 (0) 0.46 (0) 0.44 (0) 0.46 (0) 0.45 (0) BA 60 1 NaN (20) NaN (20) NaN (20) 0.77 (0) 0.76 (0) 0.76 (0) 0.77 (3) 0.76 (0) Patents 20 2 0.92 (0) 0.86 (0) 0.91 (0) 0.90 (0) 0.72 (0) 0.74 (0) 0.77 (3) 0.74 (0) Patents 60 1 0.89 (0) 0.76 (0) 0.89 (0) 0.89 (0) 0.74 (0) 0.73 (0) 0.74 (0) 0.74 (0) Pokec 20 2 0.53 (2) 0.13 (3) NaN (20) 0.46 (0) 0.43 (0) 0.41 (0) 0.42 (2) 0.40 (0) Pokec 60 1 0.74 (6) 0.09 (6) NaN (20) 0.61 (0) 0.48 (0) 0.46 (0) 0.45 (1) 0.45 (0) Livejournal 20 2 0.62 (5) 0.19 (5) NaN (20) 0.54 (0) 0.56 (0) 0.53 (0) 0.58 (5) 0.56 (0) Livejournal 60 1 0.88 (12) 0.26 (9) NaN (20) 0.68 (0) 0.65 (0) 0.62 (0) 0.62 (1) 0.62 (0) Table IV DISCREPANCY, AVERAGES OF 20 RUNS ObliviousExpansion AdaptiveExpansion dataset s k BFS-Tree Random-ST PCST-Tree Smart-ST BFS-Tree Random-ST PCST-Tree Smart-ST Grid 20 2 14.5 (0) 11.8 (0) 16.8 (0) 16.7 (0) 14.8 (0) 13.8 (0) 16.4 (0) 16.3 (0) Grid 60 1 41.0 (0) 36.9 (0) 41.0 (0) 41.0 (0) 40.5 (0) 38.9 (0) 40.9 (0) 40.9 (0) Geo 20 2 19.9 (0) 18.4 (0) 20.0 (0) 20.0 (0) 19.9 (0) 19.2 (0) 20.0 (0) 20.0 (0) Geo 60 1 22.0 (0) 20.6 (0) 22.0 (0) 22.0 (0) 21.8 (0) 21.6 (0) 21.8 (0) 21.8 (0) BA 20 2 15.0 (12) 2.8 (12) NaN (20) 15.2 (0) 15.6 (0) 14.4 (0) 14.4 (0) 15.0 (0) BA 60 1 NaN (20) NaN (20) NaN (20) 36.1 (0) 37.4 (0) 35.3 (0) 35.9 (3) 35.5 (0) Patents 20 2 17.4 (0) 15.8 (0) 17.7 (0) 17.6 (0) 14.9 (0) 13.8 (0) 15.8 (3) 14.8 (0) Patents 60 1 40.0 (0) 31.1 (0) 40.8 (0) 40.6 (0) 33.0 (0) 32.2 (0) 33.2 (0) 33.3 (0) Pokec 20 2 11.6 (2) 2.6 (3) NaN (20) 11.8 (0) 8.6 (0) 8.0 (0) 8.2 (2) 8.0 (0) Pokec 60 1 36.6 (6) 4.7 (6) NaN (20) 28.6 (0) 20.9 (0) 17.4 (0) 18.3 (1) 18.5 (0) Livejournal 20 2 14.3 (5) 3.5 (5) NaN (20) 13.8 (0) 11.8 (0) 9.8 (0) 10.8 (5) 10.2 (0) Livejournal 60 1 45.6 (12) 12.0 (9) NaN (20) 31.1 (0) 29.8 (0) 25.6 (0) 26.8 (1) 27.4 (0) ObliviousExpansion for denser graphs. We also note that once a subgraph has been discovered in the 25  
  • 26. In  What  Follows     •  Unrestricted  Access   –  Complexity     –  Connec(on  to  Steiner  Trees   –  Special  Case:  Graph  is  Tree   –  Algorithms   •  Local  Access   –  Algorithms  to  retrieve  (part  of)  the  graph   •  Experiments   •  Future  Work   26  
  • 27. Future  Work     Approxima(on  Guarantee  /  Tighter  expansions   Distributed  Se>ng   Unknown  Scale   Applica(on  on  Real  Data   (TKDE  Extension)   27