COMPUTING LOCAL AND GLOBAL CENTRALITY
DAVID F. GLEICH (AND MANY OTHERS)
DATA MINING, NETWORKS AND DYNAMICS
2011 NOVEMBER 7




LOCAL   GLOBAL

Pooya Esfandiar
Francesco Bonchi
Reid Andersen
Chen Greif
Vahab Mirrokni
Laks V.S. Lakshmanan
Byung-Won On
Graph centrality

Global
How important is a
node? 

Local
How important is a
node with respect
to another one?




Graph centrality

Koschützki et al.
must respect
isomorphism

higher is better

Examples
node-degree
1/shortest-path




Graph centrality
This talk

Path summation
    \sum_{\ell} f(\text{paths of length } \ell)

local Katz score
    \sum_{\ell} \alpha^{\ell} \cdot (\text{number of paths of length } \ell \text{ between } i \text{ and } j)
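A minimal sketch of the path summation behind the local Katz score, on a hypothetical 4-node path graph that is not from the talk. It assumes the sum starts at length 0 and counts walks (powers of A), so that the truncated series can be compared with the closed form (I - \alpha A)^{-1}. The function name `katz_by_path_sum` and the value of `alpha` are illustrative choices.

```python
import numpy as np

# Hypothetical undirected path graph 0-1-2-3 (an assumption for illustration).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

def katz_by_path_sum(A, alpha, max_len):
    """Sum alpha^l * A^l for l = 0..max_len; (A^l)[i, j] counts length-l walks."""
    total = np.zeros_like(A)
    power = np.eye(len(A))              # A^0
    for l in range(max_len + 1):
        total += alpha**l * power
        power = power @ A               # next power of A
    return total

alpha = 0.25                            # below 1/||A||_2 (about 0.618), so the series converges
K_series = katz_by_path_sum(A, alpha, max_len=200)
K_closed = np.linalg.inv(np.eye(len(A)) - alpha * A)   # A is symmetric, so A^T = A
print(np.max(np.abs(K_series - K_closed)))
```

The entry `K_series[i, j]` is the pairwise score between nodes i and j; the closed form is what the linear-algebra methods later in the talk target.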




A – adjacency matrix
L – Laplacian matrix
P – random walk transition matrix

Katz score
    K_{i,j} = [(I - \alpha A^T)^{-1}]_{i,j}

Commute time
    C_{i,j} = \mathrm{vol}(G) (L^+_{i,i} + L^+_{j,j} - 2 L^+_{i,j})

PageRank
    (I - \alpha P^T) x = (1 - \alpha) e / n
    X_{i,j} = (1 - \alpha) [(I - \alpha P^T)^{-1}]_{i,j}
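As a small illustration of the commute time and PageRank definitions above, the sketch below evaluates them with dense linear algebra on a hypothetical 4-node path graph. The graph, the value `alpha = 0.85`, and the helper name `commute_time` are assumptions for illustration; dense inverses are only sensible at this toy scale.

```python
import numpy as np

# Hypothetical undirected path graph 0-1-2-3 (an assumption for illustration).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
n = len(A)
deg = A.sum(axis=1)
vol = deg.sum()                       # vol(G) = sum of degrees

# Commute time from the pseudoinverse of the Laplacian L = D - A.
L_pinv = np.linalg.pinv(np.diag(deg) - A)

def commute_time(i, j):
    return vol * (L_pinv[i, i] + L_pinv[j, j] - 2 * L_pinv[i, j])

# Global PageRank: (I - alpha P^T) x = (1 - alpha) e / n, with P = D^{-1} A.
alpha = 0.85
P = A / deg[:, None]
x = np.linalg.solve(np.eye(n) - alpha * P.T, (1 - alpha) * np.ones(n) / n)

# Pairwise (personalized) PageRank scores X_{i,j}.
X = (1 - alpha) * np.linalg.inv(np.eye(n) - alpha * P.T)

print(commute_time(0, 3), x)
```

On this graph the commute time between the two end nodes is vol(G) times the effective resistance of three unit edges in series, and averaging the columns of `X` recovers the global vector `x`.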




USES FOR CENTRALITY

Ranking features for web-search/classification
    Najork, M. A.; Zaragoza, H. & Taylor, M. J.
    HITS on the web: How does it compare?
    Becchetti, L.; Castillo, C.; Donato, D.; Baeza-Yates, R.
    & Leonardi, S. Link analysis for Web spam detection

Interesting nodes
    GeneRank, ProteinRank, TwitterRank, IsoRank,
    FutureRank, HostRank, DiffusionRank, ItemRank,
    SocialPageRank, SimRank




USES FOR CENTRALITY

Ranking networks of comparisons.
    Chartier, T. P.; Kreutzer, E.; Langville, A. N. & Pedings,
    K. E. Sensitivity and Stability of Ranking Vectors 

Clustering or community detection
    Andersen, R.; Chung, F. & Lang, K.
    Local Graph Partitioning using PageRank Vectors

Link prediction
    Savas et al. Hold on about 90 minutes 




THESE GET USED
  A LOT. THEY
 MUST BE FAST.


MATRICES, MOMENTS, QUADRATURE

Estimate a quadratic form

    l \le x^T f(Z) x \le u

Commute
    (e_i - e_j)^T L^+ (e_i - e_j)

Katz
    \frac{1}{4} (e_i + e_j)^T (I - \alpha P^T)^{-1} (e_i + e_j) - \frac{1}{4} (e_i - e_j)^T (I - \alpha P^T)^{-1} (e_i - e_j)

Also used by Benzi and Boito (LAA) for Katz
scores and the matrix exponential
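The pairwise score is written as a difference of two quadratic forms because the quadrature machinery only bounds quantities of the form x^T f(Z) x. A minimal check of that identity, assuming a symmetric matrix M (here a random symmetric positive definite matrix, which is an illustrative stand-in and not data from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
M = B @ B.T + 5 * np.eye(5)          # symmetric stand-in for the inverse in the slide

def bilinear_via_quadratic_forms(M, i, j):
    """Recover M[i, j] from two quadratic forms (valid when M is symmetric)."""
    n = len(M)
    e_i, e_j = np.eye(n)[i], np.eye(n)[j]
    plus, minus = e_i + e_j, e_i - e_j
    return 0.25 * (plus @ M @ plus) - 0.25 * (minus @ M @ minus)

print(bilinear_via_quadratic_forms(M, 1, 3), M[1, 3])
```

The helper name `bilinear_via_quadratic_forms` is hypothetical. The identity relies on symmetry: for a non-symmetric M the same expression returns the average of M[i, j] and M[j, i].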




MMQ - THE BIG IDEA

Quadratic form              Think
Weighted sum                A is s.p.d., use EVD
Stieltjes integral          "A tautology"
Quadrature approximation
Matrix equation             Lanczos

David F. Gleich (Purdue)       Univ. Chicago SSCS Seminar
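The formulas on this slide did not survive extraction. As a hedged reconstruction, the standard chain from the matrices, moments and quadrature literature that matches the five labels is sketched below. The symbols are assumptions: A = Q \Lambda Q^T is the eigenvalue decomposition with eigenvectors q_i, \omega is the piecewise-constant measure with jumps (q_i^T x)^2 at the eigenvalues \lambda_i, and \theta_j, w_j are the nodes and weights of a k-point Gauss rule.

```latex
\begin{align*}
x^T f(A)\, x
  &= \sum_{i=1}^{n} f(\lambda_i)\,(q_i^T x)^2
     && \text{(weighted sum, from } A = Q \Lambda Q^T\text{)} \\
  &= \int f(\lambda)\, d\omega(\lambda)
     && \text{(Stieltjes integral)} \\
  &\approx \sum_{j=1}^{k} w_j\, f(\theta_j)
     && \text{(quadrature approximation)} \\
  &= \|x\|_2^2\; e_1^T f(T_k)\, e_1
     && \text{(matrix equation, } T_k \text{ from } k \text{ Lanczos steps started at } x / \|x\|_2\text{)}
\end{align*}
```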




MMQ PROCEDURE
Goal
Given

1. Run k steps of Lanczos on       starting with
2. Compute       ,       with an additional eigenvalue at       , set
   (corresponds to a Gauss-Radau rule, with u as a prescribed node)
3. Compute       ,       with an additional eigenvalue at       , set
   (corresponds to a Gauss-Radau rule, with l as a prescribed node)
4. Output       as lower and upper bounds on
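The first step of the procedure can be sketched as follows. This is a simplified illustration, not the procedure on the slide: it uses the plain Gauss rule (no prescribed Gauss-Radau node), fixes f(\lambda) = 1/\lambda, and therefore yields only a lower bound on x^T A^{-1} x for a symmetric positive definite A. The function name `lanczos_inverse_quadratic_form` and the test matrix are assumptions.

```python
import numpy as np

def lanczos_inverse_quadratic_form(A, x, k):
    """Gauss-rule estimate of x^T A^{-1} x from k Lanczos steps (A must be s.p.d.)."""
    norm_x = np.linalg.norm(x)
    q_prev = np.zeros(len(x))
    q = x / norm_x
    alphas, betas = [], []
    beta = 0.0
    for _ in range(k):
        w = A @ q - beta * q_prev       # one matrix-vector product per step
        alpha = q @ w
        w = w - alpha * q
        alphas.append(alpha)
        beta = np.linalg.norm(w)
        if beta < 1e-12:                # invariant subspace found, stop early
            break
        betas.append(beta)
        q_prev, q = q, w / beta
    m = len(alphas)
    off = betas[:m - 1]
    T = np.diag(alphas) + np.diag(off, 1) + np.diag(off, -1)   # tridiagonal T_m
    e1 = np.zeros(m)
    e1[0] = 1.0
    # ||x||^2 * e_1^T T^{-1} e_1 is the Gauss quadrature value for f(t) = 1/t.
    return norm_x**2 * np.linalg.solve(T, e1)[0]

# Illustrative use on a small random s.p.d. matrix (not data from the talk).
rng = np.random.default_rng(0)
B = rng.standard_normal((6, 6))
A_demo = B @ B.T + 6 * np.eye(6)
x_demo = rng.standard_normal(6)
exact = x_demo @ np.linalg.solve(A_demo, x_demo)
for steps in (1, 2, 4, 6):
    print(steps, lanczos_inverse_quadratic_form(A_demo, x_demo, steps), exact)
```

Each Lanczos step costs one matrix-vector product, which is the unit used on the horizontal axis of the convergence plots in the talk.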




How well does it work?
[Figure: arxiv, Katz, hard alpha, with \alpha = 1/(\|A\|_2 + 1). Left panel: bounds versus matrix-vector products. Right panel: error (log scale) versus matrix-vector products.]
MY COMPLAINTS


Matvecs are expensive.

Takes many iterations.

Just one score comes out!






KATZ SCORES ARE LOCALIZED

Katz scores, the solutions k of (I - \alpha A^T) k = e_i, are highly localized.

Up to 50 neighbors account for 99.65% of the total mass.
HOW CAN WE
EXPLOIT THIS?


TOP-K ALGORITHM FOR KATZ

Approximate the solution of (I - \alpha A^T) k = e_i,
where the right-hand side e_i is sparse.

Keep the solution sparse too.
Ideally, don’t “touch” all of A.

This is possible for personalized PageRank!
Richardson, Ax = b
    x^{(k+1)} = x^{(k)} + r^{(k)}
    r^{(k+1)} = b - A x^{(k+1)}
For A = A^T, A \succeq 0, this is equivalent to gradient descent on
    \min_x x^T A x - 2 x^T b

What about coordinate descent?

Gauss-Southwell, Ax = b
    x^{(k+1)} = x^{(k)} + r_j^{(k)} e_j
    r^{(k+1)} = r^{(k)} - r_j^{(k)} A e_j
How to pick j?

Frequently “rediscovered” for PageRank:
McSherry (WWW2005), Berkhin (JIM 2007), Andersen-Chung-Lang (FOCS 2006)
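A minimal dense sketch of the Gauss-Southwell iteration, assuming j is chosen as the entry with the largest residual magnitude and that the system matrix has a unit diagonal (as I - \alpha A does for a graph without self-loops). With r = b - Mx, moving x_j by r_j changes the residual by -r_j M e_j. The function name, tolerance, and the 4-node test graph are illustrative assumptions; a practical top-k implementation would use sparse storage and a priority queue instead of a full argmax.

```python
import numpy as np

def gauss_southwell(M, b, tol=1e-10, max_steps=100_000):
    """Solve M x = b by relaxing the largest residual entry (unit diagonal assumed)."""
    x = np.zeros(len(b))
    r = b.astype(float)                 # residual b - M x for x = 0 (astype copies)
    for _ in range(max_steps):
        j = int(np.argmax(np.abs(r)))   # pick j: largest residual
        if abs(r[j]) < tol:
            break
        step = r[j]
        x[j] += step                    # x <- x + r_j e_j
        r -= step * M[:, j]             # r <- r - r_j M e_j (touches one column only)
    return x, r

# Katz-style system on a hypothetical path graph 0-1-2-3.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
alpha = 0.3                             # below 1/||A||_2 (about 0.618)
M = np.eye(4) - alpha * A               # A is symmetric, so A^T = A
b = np.array([1.0, 0.0, 0.0, 0.0])      # e_0: scores with respect to node 0
x_gs, r_gs = gauss_southwell(M, b)
print(x_gs, np.linalg.solve(M, b))
```

Because each step reads a single column of M, only the neighbors of the chosen node are touched, which is what keeps the work local when the solution is localized.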
DEMO!




NEW CONVERGENCE THEORY

Katz and PageRank are equivalent if
\alpha < 1 / \|A\|_1

Gauss-Southwell converges when \alpha < 1 / \|A\|_2
(Luo and Tseng 1992) if j is picked as the largest
residual

Read all about it
Fast matrix computations for pair-wise and column-wise commute times and
Katz scores. Bonchi, Esfandiar, Gleich, Greif, Lakshmanan, J. Internet
Mathematics (to appear)




1,000,000 nodes, 100,000,000 edges
[Figure: hollywood, Katz, hard alpha. Precision@k for exact top-k sets versus equivalent matrix-vector products (log scale); curves for k=10, k=100, k=1000, k=25, and cg k=25.]
OPEN QUESTIONS

I can’t find any existing derivation of this method
in the non-symmetric case (prior to the
PageRank literature). Any thoughts?

How can we show that the method converges for a
non-symmetric matrix when (I - \alpha P^T) is not
diagonally dominant?






OVERLAPPING
CLUSTERS FOR
DISTRIBUTED
CENTRALITY


LARGE GRAPHS, IN PRACTICE
[Figure: three groups, each showing Copy 1 and Copy 2 of a list of src -> dst edges.]

Edge lists may be tied together by a common host, stored redundantly on
many hard drives.
UTILIZE SOME
REDUNDANCY?
   To compute global PageRank?




Overlapping Clusters

Use the redundancy to reduce communication when solving a PageRank problem.

Overlapping clusters for distributed computation.
Andersen, Gleich, Mirrokni, WSDM2012 (to appear).
Communication
avoiding
algorithms

Communication is the limiting
factor in most computations
these days. Flops are,
relatively speaking, free.




KEY POINTS

Utilize personalized PageRank vectors to find
the clusters with “good” conductance scores.

Define “core” vertices for each cluster. Find a
good way to cover the graph with these
clusters.

Use restricted additive Schwarz to solve
(thanks Prof. Szyld and Frommer!)




All nodes solve locally using the coordinate descent method.

A core vertex for the gray cluster.

Red sends residuals to white.
White sends residuals to red.

White then uses the coordinate descent method to adjust its solution.
This will cause communication to red/blue.
It works!
[Figure: relative work versus volume ratio (how much more of the graph we need to store). Curves: Swapping Probability (usroads), PageRank Communication (usroads), Swapping Probability (web-Google), PageRank Communication (web-Google). Annotation: Metis Partitioner.]
PERSONALIZED PAGERANK CLUSTERS

Solve (I - \alpha P^T) x = (1 - \alpha) e_i
to a large degree-weighted tolerance

Sweep over the vertices in order of their degree-
normalized rank. Find the best conductance set. 

A Cheeger-like inequality. (Not a heuristic.) 
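The solve-then-sweep recipe above can be sketched on a toy graph. This is an illustration under stated assumptions, not the authors' implementation: the personalized PageRank vector is computed with an exact dense solve instead of an approximate solve to a loose tolerance, the graph (two triangles joined by one edge) is hypothetical, and the names `personalized_pagerank` and `sweep_cut` are made up for this example.

```python
import numpy as np

# Hypothetical graph (not from the talk): triangles {0,1,2} and {3,4,5} joined by edge 2-3.
A = np.zeros((6, 6))
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1.0

def personalized_pagerank(A, seed, alpha=0.85):
    """Solve (I - alpha P^T) x = (1 - alpha) e_seed exactly (dense, toy scale only)."""
    n = len(A)
    deg = A.sum(axis=1)
    P_T = A / deg                       # column j scaled by 1/deg[j], i.e. A D^{-1} = P^T
    rhs = np.zeros(n)
    rhs[seed] = 1 - alpha
    return np.linalg.solve(np.eye(n) - alpha * P_T, rhs)

def sweep_cut(A, x):
    """Scan prefixes of the degree-normalized ordering; return the best-conductance set."""
    deg = A.sum(axis=1)
    total_vol = deg.sum()
    order = np.argsort(-x / deg)        # decreasing degree-normalized score
    in_set = np.zeros(len(x), dtype=bool)
    vol = cut = 0.0
    best_phi, best_size = np.inf, 0
    for size, v in enumerate(order[:-1], start=1):
        inside = A[v, in_set].sum()     # edges from v into the current set
        cut += deg[v] - 2 * inside      # those stop being cut, the rest become cut
        vol += deg[v]
        in_set[v] = True
        phi = cut / min(vol, total_vol - vol)
        if phi < best_phi:
            best_phi, best_size = phi, size
    return set(order[:best_size].tolist()), best_phi

x = personalized_pagerank(A, seed=0)
best_set, best_phi = sweep_cut(A, x)
print(best_set, best_phi)
```

Seeded at node 0, the sweep recovers the seed's triangle, which is separated from the rest of the graph by a single edge.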




CORE VERTICES

Compute the expected “leavetime” for each
vertex in a cluster. 

Keep increasing the threshold for a “good”
vertex until every vertex is core in some cluster.

Then approximate a set-cover problem to cover
the graph with clusters, and use a heuristic to
pack vertices until 




MY QUESTIONS
and future directions

REVERSE ORDER




GRAPH SPECTRA

Some work by Banerjee and Jost.

WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 

Kürzlich hochgeladen (20)

From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 

Computing Local and Global Centrality

  • 1. COMPUTING LOCAL AND GLOBAL CENTRALITY. DAVID F. GLEICH (AND MANY OTHERS). DATA MINING, NETWORKS AND DYNAMICS, 2011 NOVEMBER 7.
  • 2. LOCAL, GLOBAL. Pooya Esfandiar, Reid Andersen, Francesco Bonchi, Chen Greif, Vahab Mirrokni, Laks V.S. Lakshmanan, Byung-Won On.
  • 3. Graph centrality. Global: How important is a node? Local: How important is a node with respect to another one?
  • 4. Graph centrality (Koschützki et al.): must respect isomorphism; higher is better. Examples: node degree, 1/shortest-path.
  • 5. Graph centrality. This talk: path summation, $\sum_{\ell} f(\text{paths of length } \ell)$. Local Katz score: $\sum_{\ell} \alpha^{\ell} \cdot (\text{number of paths of length } \ell \text{ between } i \text{ and } j)$.
  • 6. A – adjacency matrix, L – Laplacian matrix, P – random walk transition matrix. Katz score: $K_{i,j} = [(I - \alpha A^T)^{-1}]_{i,j}$. Commute time: $C_{i,j} = \mathrm{vol}(G)\,(L^+_{i,i} + L^+_{j,j} - 2L^+_{i,j})$. PageRank: $(I - \alpha P^T)x = (1 - \alpha)e/n$, $X_{i,j} = (1 - \alpha)[(I - \alpha P^T)^{-1}]_{i,j}$.
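The three definitions on slide 6 can be checked numerically on a toy graph. The following is a minimal dense-matrix sketch, not the method of the talk: the 4-node graph, the variable names, and the parameter choices (`alpha_katz`, `alpha_pr`) are assumptions made for illustration, and dense inverses are only sensible at this size.

```python
import numpy as np

# Toy undirected graph (assumed for illustration): a 4-cycle plus the chord 1-3.
A = np.array([
    [0, 1, 0, 1],
    [1, 0, 1, 1],
    [0, 1, 0, 1],
    [1, 1, 1, 0],
], dtype=float)
n = A.shape[0]
I = np.eye(n)
deg = A.sum(axis=1)

# Katz: K = (I - alpha A^T)^{-1}. The inverse equals the path sum over
# lengths l >= 0 when alpha < 1 / ||A||_2; this alpha satisfies that.
alpha_katz = 1.0 / (np.linalg.norm(A, 2) + 1.0)
K = np.linalg.inv(I - alpha_katz * A.T)

# Commute time: C_ij = vol(G) (L+_ii + L+_jj - 2 L+_ij),
# where L+ is the pseudoinverse of the Laplacian L = D - A.
L = np.diag(deg) - A
Lp = np.linalg.pinv(L)
vol = deg.sum()
d = np.diag(Lp)
C = vol * (d[:, None] + d[None, :] - 2.0 * Lp)

# PageRank: X = (1 - alpha)(I - alpha P^T)^{-1}, with P = D^{-1} A.
alpha_pr = 0.85  # assumed teleportation parameter
P = A / deg[:, None]
X = (1.0 - alpha_pr) * np.linalg.inv(I - alpha_pr * P.T)

# Global PageRank solves (I - alpha P^T) x = (1 - alpha) e / n.
x_global = X @ (np.ones(n) / n)
```

Column j of `X` is the personalized PageRank vector seeded at node j, which is why the global vector is just the average of the columns.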
  • 7. USES FOR CENTRALITY. Ranking features for web-search/classification: Najork, M. A.; Zaragoza, H. & Taylor, M. J., HITS on the web: How does it compare?; Becchetti, L.; Castillo, C.; Donato, D.; Baeza-Yates, R. & Leonardi, S., Link analysis for Web spam detection. Interesting nodes: GeneRank, ProteinRank, TwitterRank, IsoRank, FutureRank, HostRank, DiffusionRank, ItemRank, SocialPageRank, SimRank.
  • 8. USES FOR CENTRALITY. Ranking networks of comparisons: Chartier, T. P.; Kreutzer, E.; Langville, A. N. & Pedings, K. E., Sensitivity and Stability of Ranking Vectors. Clustering or community detection: Andersen, R.; Chung, F. & Lang, K., Local Graph Partitioning using PageRank Vectors. Link prediction: Savas et al. (hold on about 90 minutes).
  • 9. THESE GET USED A LOT. THEY MUST BE FAST.
  • 10. MATRICES, MOMENTS, QUADRATURE. Estimate a quadratic form: $l \le x^T f(Z)\,x \le u$. Commute: $(e_i - e_j)^T L^+ (e_i - e_j)$. Katz: $\tfrac{1}{4}(e_i + e_j)^T (I - \alpha A)^{-1} (e_i + e_j) - \tfrac{1}{4}(e_i - e_j)^T (I - \alpha A)^{-1} (e_i - e_j)$. Also used by Benzi and Boito (LAA) for Katz scores and the matrix exponential.
  • 11. MMQ - THE BIG IDEA. Quadratic form […]. Think […]. Weighted sum […] (A is s.p.d., use EVD). Stieltjes integral […] (“a tautology”). Quadrature approximation […]. Matrix equation […] (Lanczos). David F. Gleich (Purdue), Univ. Chicago SSCS Seminar.
  • 12. MMQ PROCEDURE. Goal: […]. Given: […]. 1. Run k steps of Lanczos on […] starting with […]. 2. Compute […], […] with an additional eigenvalue at u, set […]. Corresponds to a Gauss-Radau rule, with u as a prescribed node. 3. Compute […], […] with an additional eigenvalue at l, set […]. Corresponds to a Gauss-Radau rule, with l as a prescribed node. 4. Output […] as lower and upper bounds on […].
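The Lanczos-plus-quadrature idea can be sketched in a simplified form. The code below is an assumption-laden illustration, not the procedure on the slide: it uses the plain Gauss rule, $x^T f(Z)x \approx \|x\|^2 [f(T_k)]_{1,1}$, and omits the Gauss-Radau modifications that supply the two-sided bounds. The function names (`lanczos`, `gauss_estimate`, `katz_pair`) are hypothetical. For $f(t) = 1/(1 - \alpha t)$ with $\alpha < 1/\|A\|_2$, the plain Gauss rule is expected to approach the true value from below.

```python
import numpy as np

def lanczos(Z, x, k):
    """Run up to k Lanczos steps on symmetric Z from x; return tridiagonal T."""
    alphas, betas = [], []
    q_prev = np.zeros(len(x))
    q = x / np.linalg.norm(x)
    beta = 0.0
    for _ in range(k):
        w = Z @ q - beta * q_prev
        a = q @ w
        w = w - a * q
        alphas.append(a)
        beta = np.linalg.norm(w)
        if beta < 1e-12:  # Krylov space exhausted; stop early
            break
        betas.append(beta)
        q_prev, q = q, w / beta
    m = len(alphas)
    off = betas[:m - 1]
    return np.diag(alphas) + np.diag(off, 1) + np.diag(off, -1)

def gauss_estimate(Z, x, f, k):
    """Gauss-rule estimate of x^T f(Z) x using k Lanczos steps."""
    T = lanczos(Z, x, k)
    theta, S = np.linalg.eigh(T)
    # [f(T)]_{1,1} = sum_i S[0, i]^2 f(theta_i)
    return (x @ x) * np.sum(S[0, :] ** 2 * f(theta))

def katz_pair(A, alpha, i, j, k):
    """Estimate K_ij (i != j) for symmetric A via the polarization identity."""
    n = A.shape[0]
    f = lambda t: 1.0 / (1.0 - alpha * t)
    ei = np.zeros(n); ei[i] = 1.0
    ej = np.zeros(n); ej[j] = 1.0
    plus = gauss_estimate(A, ei + ej, f, k)
    minus = gauss_estimate(A, ei - ej, f, k)
    return 0.25 * (plus - minus)
```

Each call costs k matrix-vector products and yields a single pairwise score.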
  • 13. How well does it work? [Figure: two panels, “Bounds” and “Error”, each for arxiv, Katz, hard alpha; x-axis: matrix-vector products.] $\alpha = 1/(\|A\|_2 + 1)$.
  • 14. MY COMPLAINTS. Matvecs are expensive. Takes many iterations. Just one score comes out!
  • 15. KATZ SCORES ARE LOCALIZED. Katz scores, $(I - \alpha A^T)k = e_i$, are highly localized. Up to 50 neighbors is 99.65% of the total mass.
  • 16. HOW CAN WE EXPLOIT THIS?
  • 17. TOP-K ALGORITHM FOR KATZ. Approximate […] where […] is sparse. Keep […] sparse too. Ideally, don’t “touch” all of […].
  • 18. TOP-K ALGORITHM FOR KATZ. Approximate […] where […] is sparse. Keep […] sparse too. Ideally, don’t “touch” all of […]. This is possible for personalized PageRank!
  • 19. Richardson for $Ax = b$: $x^{(k+1)} = x^{(k)} + r^{(k)}$, $r^{(k+1)} = b - Ax^{(k+1)}$. For $A = A^T$, $A \succeq 0$, this is gradient descent, equivalent to $\min\; x^T A x - 2x^T b$. What about coordinate descent? Gauss-Southwell for $Ax = b$: $x^{(k+1)} = x^{(k)} + r_j^{(k)} e_j$, $r^{(k+1)} = r^{(k)} - r_j^{(k)} A e_j$. How to pick j? Frequently “rediscovered” for PageRank: McSherry (WWW2005), Berkhin (JIM 2007), Andersen-Chung-Lang (FOCS 2006).
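A minimal sketch of the Gauss-Southwell update applied to the Katz system $(I - \alpha A^T)k = e_i$, picking j as the largest residual. Everything here is assumed for illustration: the function name `gauss_southwell_katz`, the adjacency-list format, the toy graph, and the linear-time `max` scan (a priority queue would be the natural choice on a large graph). Only nodes that receive residual mass are ever stored, so both the solution and the residual stay sparse.

```python
import numpy as np

def gauss_southwell_katz(adj, alpha, i, tol=1e-8, max_steps=100000):
    """Approximately solve (I - alpha A^T) k = e_i.

    adj[u] lists the out-neighbours of u (unweighted, no self-loops assumed).
    Returns sparse dicts for the solution x and the final residual r.
    """
    x = {}
    r = {i: 1.0}
    for _ in range(max_steps):
        # Pick the coordinate with the largest residual.
        j, rj = max(r.items(), key=lambda kv: abs(kv[1]))
        if abs(rj) < tol:
            break
        x[j] = x.get(j, 0.0) + rj
        # r <- r - rj (I - alpha A^T) e_j: the identity part zeroes r[j],
        # and column j of A^T spreads alpha * rj to the out-neighbours of j.
        r[j] = 0.0
        for v in adj[j]:
            r[v] = r.get(v, 0.0) + alpha * rj
    return x, r

# Toy usage on an assumed 4-node undirected graph.
adj = {0: [1, 3], 1: [0, 2, 3], 2: [1, 3], 3: [0, 1, 2]}
A = np.zeros((4, 4))
for u, nbrs in adj.items():
    A[u, nbrs] = 1.0
alpha = 1.0 / (np.linalg.norm(A, 2) + 1.0)
x, r = gauss_southwell_katz(adj, alpha, i=0, tol=1e-10)
k_approx = np.array([x.get(v, 0.0) for v in range(4)])
k_exact = np.linalg.solve(np.eye(4) - alpha * A.T, np.eye(4)[:, 0])
```

Each step touches only one node and its neighbours, in contrast to a full matrix-vector product.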
  • 20. DEMO!
  • 21. NEW CONVERGENCE THEORY. Katz and PageRank are equivalent if $\alpha < 1/\|A\|_1$. Gauss-Southwell converges when $\alpha < 1/\|A\|_2$ (Luo and Tseng 1992) if j is picked as the largest residual. Read all about it: Fast matrix computations for pair-wise and column-wise commute times and Katz scores. Bonchi, Esfandiar, Gleich, Greif, Lakshmanan, J. Internet Mathematics (to appear).
  • 22. 1,000,000 nodes, 100,000,000 edges. [Figure: hollywood, Katz, hard alpha. y-axis: Precision@k for exact top-k sets; x-axis: equivalent matrix-vector products (log scale); curves: k=10, k=25, k=100, k=1000, and cg k=25.]
  • 23. OPEN QUESTIONS. I can’t find any existing derivation of this method in the non-symmetric case (prior to the PageRank literature). Any thoughts? How to show that the method converges for a non-symmetric matrix when $(I - \alpha P^T)$ is not diagonally dominant?
  • 25. LARGE GRAPHS, IN PRACTICE. [Diagram: groups of edge lists (src -> dst), each stored as Copy 1 and Copy 2.] Edge lists may be tied together by a common host, stored redundantly on many hard drives.
  • 26. UTILIZE SOME REDUNDANCY? To compute global PageRank?
  • 27. Overlapping Clusters. Use the redundancy to reduce communication when solving a PageRank problem. Overlapping clusters for distributed computation. Andersen, Gleich, Mirrokni, WSDM2012 (to appear).
  • 28. Communication avoiding algorithms. Communication is the limiting factor in most computations these days. Flops are, relatively speaking, free.
  • 29. KEY POINTS. Utilize personalized PageRank vectors to find the clusters with “good” conductance scores. Define “core” vertices for each cluster. Find a good way to cover the graph with these clusters. Use restricted additive Schwarz to solve (thanks Prof. Szyld and Frommer!).
  • 30. All nodes solve locally using the coordinate descent method.
  • 31. All nodes solve locally using the coordinate descent method. A core vertex for the gray cluster.
  • 32. All nodes solve locally using the coordinate descent method. Red sends residuals to white. White sends residuals to red.
  • 33. White then uses the coordinate descent method to adjust its solution. Will cause communication to red/blue.
  • 34. It works! [Figure: relative work (y-axis) vs. volume ratio (x-axis), i.e. how much more of the graph we need to store. Curves: Swapping Probability (usroads), PageRank Communication (usroads), Swapping Probability (web-Google), PageRank Communication (web-Google). Reference: Metis Partitioner.]
  • 35. PERSONALIZED PAGERANK CLUSTERS. Solve $(I - \alpha P^T)x = (1 - \alpha)e_i$ to a large degree-weighted tolerance. Sweep over the vertices in order of their degree-normalized rank. Find the best conductance set. A Cheeger-like inequality. (Not a heuristic.)
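The solve-then-sweep recipe can be sketched on a toy graph. This is an illustration under stated assumptions rather than the talk's implementation: it solves the personalized PageRank system exactly with a dense solver instead of to a loose degree-weighted tolerance, and the function names (`conductance`, `ppr_sweep`), the value of `alpha`, and the two-triangle example graph are all hypothetical.

```python
import numpy as np

def conductance(A, S):
    """cut(S, complement) / min(vol(S), vol(complement)) for symmetric A."""
    mask = np.zeros(A.shape[0], dtype=bool)
    mask[S] = True
    cut = A[mask][:, ~mask].sum()
    vol_S = A[mask].sum()
    vol_rest = A.sum() - vol_S
    return cut / min(vol_S, vol_rest)

def ppr_sweep(A, seed, alpha=0.85):
    """Personalized PageRank from `seed`, then a sweep cut over x / degree."""
    n = A.shape[0]
    deg = A.sum(axis=1)
    P = A / deg[:, None]
    e = np.zeros(n)
    e[seed] = 1.0
    # Exact solve of (I - alpha P^T) x = (1 - alpha) e_seed (assumption:
    # a loose-tolerance local solver would be used on a large graph).
    x = np.linalg.solve(np.eye(n) - alpha * P.T, (1.0 - alpha) * e)
    order = np.argsort(-x / deg)  # degree-normalized rank, largest first
    best_set, best_phi = None, np.inf
    for size in range(1, n):      # every proper prefix of the ordering
        S = order[:size]
        phi = conductance(A, S)
        if phi < best_phi:
            best_phi, best_set = phi, S.copy()
    return sorted(best_set.tolist()), best_phi

# Toy usage: two triangles joined by the single edge 2-3.
A = np.zeros((6, 6))
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1.0
cluster, phi = ppr_sweep(A, seed=0)
```

On this graph the sweep from seed 0 should recover the seed's own triangle, since that prefix cuts only the bridging edge.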
  • 36. CORE VERTICES. Compute the expected “leavetime” for each vertex in a cluster. Keep increasing the threshold for a “good” vertex until every vertex is core in some cluster. Then approximate a set-cover problem to cover the graph with clusters, and use a heuristic to pack vertices until […]
  • 37. MY QUESTIONS and future directions. REVERSE ORDER.
  • 38. GRAPH SPECTRA. Some work by Banerjee and Jost.