Recent Advances in Kernel-Based Graph Classification

We review our recent progress in the development of graph kernels. We discuss the hash graph kernel framework, which makes the computation of kernels for graphs with vertices and edges annotated with real-valued information feasible for large data sets. Moreover, we summarize our general investigation of the benefits of explicit graph feature maps in comparison to using the kernel trick. Our experimental studies on real-world data sets suggest that explicit feature maps often provide sufficient classification accuracy while being computed more efficiently. Finally, we describe how to construct valid kernels from optimal assignments to obtain new expressive graph kernels. These make use of the kernel trick to establish one-to-one correspondences. We conclude with a discussion of our results and their implications for the future development of graph kernels.

Published in: Science

  1. Recent Advances in Kernel-Based Graph Classification
     ECML PKDD 2017, Nectar Track
     Nils Kriege, Christopher Morris
     June 20, 2017
     TU Dortmund University, Algorithm Engineering Group
  2. Motivation
     Question: How similar are two graphs?
     [Figure: (a) Sildenafil, (b) Vardenafil]
  3. High-level View: Supervised Graph Classification
     [Figure: graphs are mapped by a feature map $\varphi$ into a Hilbert space $\mathcal{H}$]
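  4. Primer on Graph Kernels
     Question: How similar are two graphs?
     Definition (Graph Kernel): Let $\mathcal{G}$ be a non-empty set of graphs and let $k\colon \mathcal{G} \times \mathcal{G} \to \mathbb{R}$. Then $k$ is a graph kernel if there is a real Hilbert space $\mathcal{H}$ and a feature map $\varphi\colon \mathcal{G} \to \mathcal{H}$ such that $k(G, H) = \langle \varphi(G), \varphi(H) \rangle$.
     Explicit vs. Implicit:
     [Diagram: the explicit route (EX) maps $G$ and $H$ to $\varphi(G)$ and $\varphi(H)$ and takes their inner product; the implicit route (IM) evaluates a PSD function $k(G, H)$ directly]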
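The difference between the explicit and the implicit route on slide 4 can be illustrated with a deliberately simple kernel. The sketch below assumes each graph is reduced to its list of vertex labels; the function names and the toy label lists are illustrative and not taken from the talk. The implicit variant sums a Dirac base kernel over all vertex pairs, while the explicit variant builds a label histogram as $\varphi(G)$ and takes the inner product.

```python
from collections import Counter

def implicit_kernel(labels_g, labels_h):
    """Sum a Dirac kernel over all vertex pairs: O(|V(G)| * |V(H)|)."""
    return sum(1 for a in labels_g for b in labels_h if a == b)

def feature_map(labels):
    """Explicit feature map: sparse histogram of vertex labels."""
    return Counter(labels)

def explicit_kernel(labels_g, labels_h):
    """Inner product of the two sparse label histograms."""
    phi_g, phi_h = feature_map(labels_g), feature_map(labels_h)
    # Counter returns 0 for labels that do not occur in the other graph.
    return sum(count * phi_h[label] for label, count in phi_g.items())

# Toy vertex label lists (assumed for illustration only).
g = ["C", "C", "N", "O"]
h = ["C", "N", "N", "S"]

k_im = implicit_kernel(g, h)
k_ex = explicit_kernel(g, h)
print(k_im, k_ex)
```

Both routes return the same value. The explicit one touches each vertex once plus one pass over the distinct labels, which is why label diversity matters for its running time.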
  5. Talk Structure
     1. Explicit vs. Implicit Graph Kernels, IEEE ICDM 2014
     2. Fast Kernels for Graphs with Continuous Labels, IEEE ICDM 2016
     3. Graph Kernels Based on Optimal Assignments, NIPS 2016
     4. Outlook/What’s next?
  6. Part I: Explicit vs. Implicit Graph Kernels
     Challenge: Investigate the benefits of explicit and implicit graph kernels.
     • N. Kriege, M. Neumann, K. Kersting, and P. Mutzel. “Explicit versus Implicit Graph Feature Maps: A Computational Phase Transition for Walk Kernels”. In: IEEE International Conference on Data Mining. 2014, pp. 881–886.
     • N. M. Kriege, M. Neumann, C. Morris, K. Kersting, and P. Mutzel. “A Unifying View of Explicit and Implicit Feature Maps for Structured Data: Systematic Studies of Graph Kernels”. In: CoRR abs/1703.00676 (2017). URL: http://arxiv.org/abs/1703.00676
  7. Part I: Explicit vs. Implicit Graph Kernels
     Kernel                                         $\varphi$   Cont. labels   Run time
     Random Walk [Gärtner et al., 2003]             IM                         $\mathcal{O}(n^{2\omega})$
     Shortest-Path [Borgwardt et al., 2005]         IM                         $\mathcal{O}(n^4)$
     Subgraph Matching [Kriege, Mutzel, 2012]       IM                         $\mathcal{O}(k n^{2k+2})$
     GraphHopper [Feragen et al., 2013]             IM                         $\mathcal{O}(n^2 m)$
     Graphlet [Shervashidze et al., 2009]           EX
     NSPDK [Costa et al., 2010]                     EX
     Weisfeiler-Lehman [Shervashidze et al., 2011]  EX                         $\mathcal{O}(hm)$
     Propagation [Neumann et al., 2016]             EX
     Implicit vs. Explicit:
     • Implicit kernels: do not scale, extendable to continuous labels
     • Explicit kernels: do scale, only discrete labels
  12. Part I: Explicit vs. Implicit Graph Kernels
      Challenge: Investigate the benefits of explicit and implicit graph kernels.
      Contribution:
      • Conditions under which the computation of a finite-dimensional explicit mapping is possible
      • Explicit feature maps for convolution kernels
      • Weighted vertex kernels: derived approximate finite-dimensional explicit feature maps
      • Validated theoretical results in experimental study
  14. Part I: Explicit vs. Implicit Graph Kernels
      [Figure: runtime [s] of implicit vs. explicit computation over data set size and label diversity]
      Experimental Results:
      • Discrete labels: explicit feature maps outperform implicit kernels (for most kernels and benchmark data sets)
      • Continuous labels: approximation by explicit feature maps is not competitive for complex kernels
  15. Part II: Hash Graph Kernel Framework
      Challenge: Design fast, explicit graph kernels that can handle continuous labels.
      • C. Morris, N. M. Kriege, K. Kersting, and P. Mutzel. “Faster Kernel for Graphs with Continuous Attributes via Hashing”. In: IEEE International Conference on Data Mining. 2016, pp. 1095–1100.
      [Figure: graph whose vertices carry real-valued attribute vectors [1.2, 0.3], [9.1, 0.9], [1.6, 0.7], [5.2, 1.0], [5.1, 0.2], [1.0, 0.2]]
  16. Part II: Hash Graph Kernel Framework
      Challenge: Design fast, explicit graph kernels that can handle continuous labels.
      Kernel                                         $\varphi$   Cont. labels   Run time
      Random Walk [Gärtner et al., 2003]             IM                         $\mathcal{O}(n^{2\omega})$
      Shortest-Path [Borgwardt et al., 2005]         IM                         $\mathcal{O}(n^4)$
      Subgraph Matching [Kriege, Mutzel, 2012]       IM                         $\mathcal{O}(k n^{2k+2})$
      GraphHopper [Feragen et al., 2013]             IM                         $\mathcal{O}(n^2 m)$
      Graphlet [Shervashidze et al., 2009]           EX
      NSPDK [Costa et al., 2010]                     EX
      Weisfeiler-Lehman [Shervashidze et al., 2011]  EX                         $\mathcal{O}(hm)$
      Propagation [Neumann et al., 2016]             EX
      HGK Framework [Morris et al., 2016]            EX                         Linear in BK
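  17. Part II: Hash Graph Kernel Framework
      [Diagram: the attributed graph $(G, a)$ is hashed into discretely labeled graphs $(G, l_1), (G, l_2), \dots, (G, l_I)$; each yields a feature vector $\varphi(G, l_i)$; the feature vectors are combined into $\tfrac{1}{I}[\varphi(G, l_1), \dots, \varphi(G, l_I)]$]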
  22. Part II: Hash Graph Kernel Framework
      Contribution: Hash Graph Kernel Framework
      • Use explicit instead of implicit kernels, i.e., avoid the kernel trick!
      • Applicable to a wide range of graph kernel functions (Weisfeiler-Lehman, Shortest-Path, Graphlet, ...)
      • Theoretical approximation bounds
      • State-of-the-art classification accuracies but orders of magnitude faster than implicit kernels
      Question: Is there no benefit from employing the kernel trick at all on the graph domain?
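The hashing idea of Part II can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the hash family (random projections with bucket width `width`), the base feature map (a plain vertex label histogram that ignores edges), the split of the slide's attribute vectors into two graphs, and the $1/\sqrt{I}$ scaling (chosen here so that the induced kernel is the average over the $I$ hash rounds) are all choices made for this example. The framework in the talk plugs in structure-aware kernels such as Weisfeiler-Lehman instead.

```python
import math
import random
from collections import Counter

def make_hash(dim, width, rng):
    """One random-projection hash: attribute vector -> integer label."""
    w = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    b = rng.uniform(0.0, width)

    def h(x):
        projection = sum(wi * xi for wi, xi in zip(w, x))
        return math.floor((projection + b) / width)

    return h

def hashed_feature_map(attributes, hashes):
    """Concatenate label histograms of all hash rounds (sparse representation)."""
    scale = 1.0 / math.sqrt(len(hashes))  # assumed normalization, see lead-in
    phi = Counter()
    for i, h in enumerate(hashes):
        for x in attributes:
            phi[(i, h(x))] += scale  # key = (hash round, discrete label)
    return phi

def dot(phi_g, phi_h):
    """Inner product of two sparse feature vectors."""
    return sum(value * phi_h[key] for key, value in phi_g.items())

rng = random.Random(0)
hashes = [make_hash(dim=2, width=2.0, rng=rng) for _ in range(20)]

# Vertex attributes of two hypothetical graphs (values borrowed from slide 15).
g_attrs = [(1.2, 0.3), (9.1, 0.9), (1.6, 0.7)]
h_attrs = [(5.2, 1.0), (5.1, 0.2), (1.0, 0.2)]

phi_g = hashed_feature_map(g_attrs, hashes)
phi_h = hashed_feature_map(h_attrs, hashes)
print(dot(phi_g, phi_h), dot(phi_g, phi_g))
```

Because every round produces an ordinary discretely labeled graph, any explicit kernel for discrete labels can replace the histogram without changing the rest of the pipeline.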
  23. Part III: Positive-semidefinite Optimal Assignments
      Challenge: Design a valid graph kernel that is based on optimal assignments.
      [Figure: assignment between two sets X and Y with element labels a a a b c a b b c c]
      • N. M. Kriege, P.-L. Giscard, and R. C. Wilson. “On Valid Optimal Assignment Kernels and Applications to Graph Classification”. In: Advances in Neural Information Processing Systems. 2016, pp. 1615–1623.
  24. Part III: Positive-semidefinite Optimal Assignments
      Intuition: Optimal assignments are a “natural” measure of similarity.
      Definition (Optimal Assignment Kernel): Let $\mathcal{B}(X, Y)$ be the set of bijections between $X, Y \in [\mathcal{S}]^n$. The optimal assignment kernel on $[\mathcal{S}]^n$ is defined as
      $K_{\mathcal{B}}^{k}(X, Y) = \max_{B \in \mathcal{B}(X, Y)} W(B)$, where $W(B) = \sum_{(x, y) \in B} k(x, y)$
      and $k$ is a base kernel on $\mathcal{S}$.
  25. Part III: Positive-semidefinite Optimal Assignments
      Previous Work:
      • Optimal assignment kernels for attributed molecular graphs [Fröhlich, Wegner, Sieker, Zell, 2005], ICML
      • The optimal assignment kernel is not positive definite [Vert, 2008], CoRR, abs/0801.4061
      Problem: Optimal assignments yield indefinite functions.
  32. Part III: Positive-semidefinite Optimal Assignments
      Definition (Strong Kernel): A function $k\colon \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ is a strong kernel if $k(x, y) \geq \min\{k(x, z), k(z, y)\}$ for all $x, y, z \in \mathcal{X}$.
      Example:
           a  b  c
        a  4  3  1
        b  3  5  1
        c  1  1  2
      • every object is most similar to itself
      • strong kernels are indeed PSD
      • strong kernels give rise to hierarchies
      Contribution:
      • Strong base kernels that guarantee PSD optimal assignment kernels
      • Linear time computation of optimal assignment kernels
      • Weisfeiler-Lehman optimal assignment kernels
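The strong-kernel condition and its link to hierarchies can be checked on the example matrix over {a, b, c}. The sketch below makes several assumptions that are not on the slides: the two multisets `X` and `Y` are one possible reading of the figure in Part III, the weighted hierarchy was reconstructed by hand so that it reproduces the matrix, and all function names are illustrative. It compares a brute-force optimal assignment over all bijections with a histogram intersection over the hierarchy nodes.

```python
from itertools import permutations, product

# Base kernel taken from the example matrix over {a, b, c}.
K = {
    ("a", "a"): 4, ("a", "b"): 3, ("a", "c"): 1,
    ("b", "a"): 3, ("b", "b"): 5, ("b", "c"): 1,
    ("c", "a"): 1, ("c", "b"): 1, ("c", "c"): 2,
}
ITEMS = ["a", "b", "c"]

def is_strong(k, items):
    """Check k(x, y) >= min(k(x, z), k(z, y)) for all triples x, y, z."""
    return all(
        k[x, y] >= min(k[x, z], k[z, y])
        for x, y, z in product(items, repeat=3)
    )

def optimal_assignment_bruteforce(xs, ys, k):
    """Maximize the summed base kernel over all bijections (exponential time)."""
    assert len(xs) == len(ys)
    return max(
        sum(k[x, y] for x, y in zip(xs, perm))
        for perm in permutations(ys)
    )

# Hand-built hierarchy (assumption): each node is (items below it, weight).
# k(x, y) equals the summed weight of all nodes that contain both x and y.
HIERARCHY = [
    ({"a", "b", "c"}, 1),
    ({"a", "b"}, 2),
    ({"a"}, 1),
    ({"b"}, 2),
    ({"c"}, 1),
]

def kernel_from_hierarchy(x, y, hierarchy):
    """Recover the base kernel value from the weighted hierarchy."""
    return sum(w for members, w in hierarchy if x in members and y in members)

def optimal_assignment_histogram(xs, ys, hierarchy):
    """Weighted histogram intersection over hierarchy nodes.

    The counting here is naive; it is written for clarity, not for speed.
    """
    total = 0
    for members, weight in hierarchy:
        count_x = sum(1 for x in xs if x in members)
        count_y = sum(1 for y in ys if y in members)
        total += weight * min(count_x, count_y)
    return total

X = ["a", "a", "a", "b", "c"]
Y = ["a", "b", "b", "c", "c"]

strong = is_strong(K, ITEMS)
oa_brute = optimal_assignment_bruteforce(X, Y, K)
oa_hist = optimal_assignment_histogram(X, Y, HIERARCHY)
print(strong, oa_brute, oa_hist)
```

On this input both computations agree, which is the property that lets the optimal assignment be computed without searching over bijections when the base kernel is strong.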
  35. Outlook/What’s next?
      Classical Graph Kernels
      • C. Morris, K. Kersting, and P. Mutzel. “Glocalized Weisfeiler-Lehman Graph Kernels: Global-Local Feature Maps of Graphs”. In: IEEE International Conference on Data Mining. 2017.
      Optimization-Based Graph Feature Maps
      • Classical: feature engineering (Phase I), then classifier (Phase II)
      • End-to-end: feature engineering + classifier (Phase I + Phase II), optimize parameters
  36. Conclusion
      1. Explicit vs. Implicit Kernels
      2. Hash Graph Kernel Framework
      3. Valid Kernels from Optimal Assignments
      Collection of Graph Classification Benchmarks: graphkernels.cs.tu-dortmund.de
  37. References I
      • Kriege, N. M., P.-L. Giscard, and R. C. Wilson. “On Valid Optimal Assignment Kernels and Applications to Graph Classification”. In: Advances in Neural Information Processing Systems. 2016, pp. 1615–1623.
      • Kriege, N. M. et al. “A Unifying View of Explicit and Implicit Feature Maps for Structured Data: Systematic Studies of Graph Kernels”. In: CoRR abs/1703.00676 (2017). URL: http://arxiv.org/abs/1703.00676.
      • Kriege, N. et al. “Explicit versus Implicit Graph Feature Maps: A Computational Phase Transition for Walk Kernels”. In: IEEE International Conference on Data Mining. 2014, pp. 881–886.
      • Morris, C., K. Kersting, and P. Mutzel. “Glocalized Weisfeiler-Lehman Graph Kernels: Global-Local Feature Maps of Graphs”. In: IEEE International Conference on Data Mining. 2017.
  38. References II
      • Morris, C. et al. “Faster Kernel for Graphs with Continuous Attributes via Hashing”. In: IEEE International Conference on Data Mining. 2016, pp. 1095–1100.