Diese Präsentation wurde erfolgreich gemeldet.
Wir verwenden Ihre LinkedIn Profilangaben und Informationen zu Ihren Aktivitäten, um Anzeigen zu personalisieren und Ihnen relevantere Inhalte anzuzeigen. Sie können Ihre Anzeigeneinstellungen jederzeit ändern.
Nächste SlideShare
×

von

Nächste SlideShare
Weiter
Herunterladen, um offline zu lesen und im Vollbildmodus anzuzeigen.

1 Gefällt mir

Teilen

Recent Advances in Kernel-Based Graph Classification

We review our recent progress in the development of graph kernels. We discuss the hash graph kernel framework, which makes the computation of kernels for graphs with vertices and edges annotated with real-valued information feasible for large data sets. Moreover, we summarize our general investigation of the benefits of explicit graph feature maps in comparison to using the kernel trick. Our experimental studies on real-world data sets suggest that explicit feature maps often provide sufficient classification accuracy while being computed more efficiently. Finally, we describe how to construct valid kernels from optimal assignments to obtain new expressive graph kernels. These make use of the kernel trick to establish one-to-one correspondences. We conclude by a discussion of our results and their implication for the future development of graph kernels.

Alle anzeigen

Alle anzeigen

Recent Advances in Kernel-Based Graph Classification

1. 1. Recent Advances in Kernel-Based Graph Classiﬁcation ECML PKDD 2017, Nectar Track Nils Kriege, Christopher Morris June 20, 2017 TU Dortmund University, Algorithm Engineering Group
2. 2. Motivation Question How similar are two graphs? (a) Sildenaﬁl (b) Vardenaﬁl 1
3. 3. High-level View: Supervised Graph Classiﬁcation ⊆ H φ 2
4. 4. Primer on Graph Kernels Question How similar are two graphs? Deﬁnition (Graph Kernel) Let 𝒢 be a non-empty set of graphs and let k: 𝒢 × 𝒢 → R. Then k is a graph kernel if there is a real Hilbert space ℋ and a feature map 𝜑: 𝒢 → ℋ such that k(G, H) = ⟨𝜑(G), 𝜑(H)⟩. Explicit vs. Implicit Exp. (EX) Imp. (IM) G H Inner Product PSD function 𝜑(G) 𝜑(H) k(G, H) 3
5. 5. Talk Structure 1 Explict vs. Implicit Graph Kernels, IEEE ICDM 2014 2 Fast Kernels for Graphs with Continuous Labels, IEEE ICDM 2016 3 Graph Kernels Based on Optimal Assignments, NIPS 2016 4 Outlook/What’s next? 4
6. 6. Part I: Explicit vs. Implicit Graph Kernels Challenge Investigate the beneﬁts of explicit and implicit graph kernels. N. Kriege, M. Neumann, K. Kersting, and P. Mutzel. “Explicit versus Implicit Graph Feature Maps: A Computational Phase Transition for Walk Kernels”. In: IEEE International Conference on Data Mining. 2014, pp. 881–886 N. M. Kriege, M. Neumann, C. Morris, K. Kersting, and P. Mutzel. “A Unifying View of Explicit and Implicit Feature Maps for Structured Data: Systematic Studies of Graph Kernels”. In: CoRR abs/1703.00676 (2017). url: http://arxiv.org/abs/1703.00676 5
7. 7. Part I: Explicit vs. Implicit Graph Kernels 𝜑 Cont. Labels Run time Random Walk [Gärtner et al., 2003] IM 𝒪(n2𝜔 ) Shortest-Path [Borgwardt et al., 2005] IM 𝒪(n4 ) Subgraph Matching [Kriege, Mutzel, 2012] IM 𝒪(kn2k+2 ) GraphHopper [Feragen et al., 2013] IM 𝒪(n2 m) Graphlet [Shervashidze et al., 2009] EX NSPDK [Costa et al., 2010] EX Weisfeiler-Lehman [Shervashidze et al., 2011] EX 𝒪(hm) Propagation [Neumann et al., 2016] EX Implicit vs. Explicit • Implicit Kernels: do not scale, extendable to continuous labels • Explicit Kernels: do scale, only discrete labels 6
8. 8. Part I: Explicit vs. Implicit Graph Kernels Challenge Investigate the beneﬁts of explicit and implicit graph kernels. 7
9. 9. Part I: Explicit vs. Implicit Graph Kernels Challenge Investigate the beneﬁts of explicit and implicit graph kernels. Contribution • Conditions under which the computation of a ﬁnite-dimensional explicit mapping is possible 7
10. 10. Part I: Explicit vs. Implicit Graph Kernels Challenge Investigate the beneﬁts of explicit and implicit graph kernels. Contribution • Conditions under which the computation of a ﬁnite-dimensional explicit mapping is possible • Explicit feature maps for convolution kernels 7
11. 11. Part I: Explicit vs. Implicit Graph Kernels Challenge Investigate the beneﬁts of explicit and implicit graph kernels. Contribution • Conditions under which the computation of a ﬁnite-dimensional explicit mapping is possible • Explicit feature maps for convolution kernels • Weighted vertex kernels: derived approximate ﬁnite-dimensional explicit feature maps 7
12. 12. Part I: Explicit vs. Implicit Graph Kernels Challenge Investigate the beneﬁts of explicit and implicit graph kernels. Contribution • Conditions under which the computation of a ﬁnite-dimensional explicit mapping is possible • Explicit feature maps for convolution kernels • Weighted vertex kernels: derived approximate ﬁnite-dimensional explicit feature maps • Validated theoretical results in experimental study 7
13. 13. Part I: Explicit vs. Implicit Graph Kernels implicit explicit 100 150 200 250 300 Data set size 0 1020 30 40 50 60 Label diversity 0 2 4 6 8 10 Runtime [s] Experimental Results Discrete Labels: explicit feature maps outperform implicit kernels (for most kernels and benchmark data sets) 8
14. 14. Part I: Explicit vs. Implicit Graph Kernels implicit explicit 100 150 200 250 300 Data set size 0 1020 30 40 50 60 Label diversity 0 2 4 6 8 10 Runtime [s] Experimental Results Discrete Labels: explicit feature maps outperform implicit kernels (for most kernels and benchmark data sets) Continuous Labels: approximation by explicit feature maps not competitive for complex kernels 8
15. 15. Part II: Hash Graph Kernel Framework Challenge Design fast, explicit graph kernels that can handle continuous labels. C. Morris, N. M. Kriege, K. Kersting, and P. Mutzel. “Faster Kernel for Graphs with Continuous Attributes via Hashing”. In: IEEE International Conference on Data Mining. 2016, pp. 1095–1100 [ 1.2 0.3 ] [ 9.1 0.9 ] [ 1.6 0.7 ] [ 5.2 1.0 ] [ 5.1 0.2 ] [ 1.0 0.2 ] 9
16. 16. Part II: Hash Graph Kernel Framework Challenge Design fast, explicit graph kernels that can handle continuous labels. 𝜑 Cont. Labels Run time Random Walk [Gärtner et al., 2003] IM 𝒪(n2𝜔 ) Shortest-Path [Borgwardt et al., 2005] IM 𝒪(n4 ) Subgraph Matching [Kriege, Mutzel, 2012] IM 𝒪(kn2k+2 ) GraphHopper [Feragen et al., 2013] IM 𝒪(n2 m) Graphlet [Shervashidze et al., 2009] EX NSPDK [Costa et al., 2010] EX Weisfeiler-Lehman [Shervashidze et al., 2011] EX 𝒪(hm) Propagation [Neumann et al., 2016] EX HGK Framework [Morris et al., 2016] EX Linear in BK 10
17. 17. Part II: Hash Graph Kernel Framework (G, a) (G, l1) (G, l2) Hash φ(G, l1) φ(G, l2) 1/I[φ(G,l1),...,φ(G,lI)] Feat. Vectors (G, lI) φ(G, lI) 11
18. 18. Part II: Hash Graph Kernel Framework Contribution: Hash Graph Kernel Framework • Use explicit instead of implicit kernels, i.e., avoid kernel trick! 12
19. 19. Part II: Hash Graph Kernel Framework Contribution: Hash Graph Kernel Framework • Use explicit instead of implicit kernels, i.e., avoid kernel trick! • Applicable to a wide range of graph kernel functions (Weisfeiler-Lehman, Shortest-Path, Graphlet, ...) 12
20. 20. Part II: Hash Graph Kernel Framework Contribution: Hash Graph Kernel Framework • Use explicit instead of implicit kernels, i.e., avoid kernel trick! • Applicable to a wide range of graph kernel functions (Weisfeiler-Lehman, Shortest-Path, Graphlet, ...) • Theoretical approximation bounds 12
21. 21. Part II: Hash Graph Kernel Framework Contribution: Hash Graph Kernel Framework • Use explicit instead of implicit kernels, i.e., avoid kernel trick! • Applicable to a wide range of graph kernel functions (Weisfeiler-Lehman, Shortest-Path, Graphlet, ...) • Theoretical approximation bounds • State-of-the-art classiﬁcation accuracies but orders of magnitude faster than implicit kernels 12
22. 22. Part II: Hash Graph Kernel Framework Contribution: Hash Graph Kernel Framework • Use explicit instead of implicit kernels, i.e., avoid kernel trick! • Applicable to a wide range of graph kernel functions (Weisfeiler-Lehman, Shortest-Path, Graphlet, ...) • Theoretical approximation bounds • State-of-the-art classiﬁcation accuracies but orders of magnitude faster than implicit kernels Question Is there no beneﬁt from employing the kernel trick at all on the graph domain? 12
23. 23. Part III: Positive-semideﬁnite Optimal Assignments Challenge Design valid graph kernel that is based on optimal assignments. X Y a a a b c a b b c c N. M. Kriege, Giscard. P.-L., and R. C. Wilson. “On Valid Optimal Assignment Kernels and Applications to Graph Classiﬁcation”. In: Advances in Neural Information Processing Systems. 2016, pp. 1615–1623 13
24. 24. Part III: Positive-semideﬁnite Optimal Assignments Intuition Optimal Assignments are a “natural” measure of similarity. Deﬁnition (Optimal Assignment Kernel) Let ℬ(X, Y) be the bijections between X, Y in [𝒮]n , the optimal assignment kernel on [𝒮]n is deﬁned as Kk ℬ(X, Y) = max B∈ℬ(X,Y) W(B), where W(B) = ∑︁ (x,y)∈B k(x, y) and k is a base kernel on 𝒮. 14
25. 25. Part III: Positive-semideﬁnite Optimal Assignments Previous Work: • Optimal assignment kernels for attributed molecular graphs [Fröhlich, Wegner, Sieker, Zell, 2005], ICML • The optimal assignment kernel is not positive deﬁnite [Vert, 2008], CoRR, abs/0801.4061 Problem Optimal assignments yield indeﬁnite functions. 15
26. 26. Part III: Positive-semideﬁnite Optimal Assignments Deﬁnition (Strong Kernel) A function k : 𝒳 × 𝒳 → R is a strong kernel if for all x, y, z ∈ 𝒳 k(x, y) ≥ min{k(x, z), k(z, y)}. 16
27. 27. Part III: Positive-semideﬁnite Optimal Assignments Deﬁnition (Strong Kernel) A function k : 𝒳 × 𝒳 → R is a strong kernel if for all x, y, z ∈ 𝒳 k(x, y) ≥ min{k(x, z), k(z, y)}. a b c a 4 3 1 b 3 5 1 c 1 1 2 • every object is most similar to itself 16
28. 28. Part III: Positive-semideﬁnite Optimal Assignments Deﬁnition (Strong Kernel) A function k : 𝒳 × 𝒳 → R is a strong kernel if for all x, y, z ∈ 𝒳 k(x, y) ≥ min{k(x, z), k(z, y)}. a b c a 4 3 1 b 3 5 1 c 1 1 2 • every object is most similar to itself • strong kernels are indeed PSD 16
29. 29. Part III: Positive-semideﬁnite Optimal Assignments Deﬁnition (Strong Kernel) A function k : 𝒳 × 𝒳 → R is a strong kernel if for all x, y, z ∈ 𝒳 k(x, y) ≥ min{k(x, z), k(z, y)}. a b c a 4 3 1 b 3 5 1 c 1 1 2 • every object is most similar to itself • strong kernels are indeed PSD • strong kernels give rise to hierarchies 16
30. 30. Part III: Positive-semideﬁnite Optimal Assignments Deﬁnition (Strong Kernel) A function k : 𝒳 × 𝒳 → R is a strong kernel if for all x, y, z ∈ 𝒳 k(x, y) ≥ min{k(x, z), k(z, y)}. a b c a 4 3 1 b 3 5 1 c 1 1 2 • every object is most similar to itself • strong kernels are indeed PSD • strong kernels give rise to hierarchies Contribution • Strong base kernels that guarantee PSD optimal assignment kernels 16
31. 31. Part III: Positive-semideﬁnite Optimal Assignments Deﬁnition (Strong Kernel) A function k : 𝒳 × 𝒳 → R is a strong kernel if for all x, y, z ∈ 𝒳 k(x, y) ≥ min{k(x, z), k(z, y)}. a b c a 4 3 1 b 3 5 1 c 1 1 2 • every object is most similar to itself • strong kernels are indeed PSD • strong kernels give rise to hierarchies Contribution • Strong base kernels that guarantee PSD optimal assignment kernels • Linear time computation of optimal assignment kernels 16
32. 32. Part III: Positive-semideﬁnite Optimal Assignments Deﬁnition (Strong Kernel) A function k : 𝒳 × 𝒳 → R is a strong kernel if for all x, y, z ∈ 𝒳 k(x, y) ≥ min{k(x, z), k(z, y)}. a b c a 4 3 1 b 3 5 1 c 1 1 2 • every object is most similar to itself • strong kernels are indeed PSD • strong kernels give rise to hierarchies Contribution • Strong base kernels that guarantee PSD optimal assignment kernels • Linear time computation of optimal assignment kernels • Weisfeiler-Lehman optimal assignment kernels 16
33. 33. Outlook/What’s next? Classical Graph Kernels C. Morris, K. Kersting, and M. Mutzel. “Glocalized Weisfeiler-Lehman Graph Kernels: Global-Local Feature Maps of Graphs”. In: IEEE International Conference on Data Mining. 2017 17
34. 34. Outlook/What’s next? Classical Graph Kernels C. Morris, K. Kersting, and M. Mutzel. “Glocalized Weisfeiler-Lehman Graph Kernels: Global-Local Feature Maps of Graphs”. In: IEEE International Conference on Data Mining. 2017 Optimization Based Graph Feature Maps Classical: Feature Engineering Phase I Classiﬁer Phase II 17
35. 35. Outlook/What’s next? Classical Graph Kernels C. Morris, K. Kersting, and M. Mutzel. “Glocalized Weisfeiler-Lehman Graph Kernels: Global-Local Feature Maps of Graphs”. In: IEEE International Conference on Data Mining. 2017 Optimization Based Graph Feature Maps Classical: Feature Engineering Phase I Classiﬁer Phase II End-to-End: Feature Engineering + Classiﬁer Phase I + Phase II Optimize Parameters 17
36. 36. Conclusion 1 Explicit vs. Implicit Kernels 2 Hash Graph Kernel Framework 3 Valid Kernels from Optimal Assignments Collection of Graph Classiﬁcation Benchmarks graphkernels.cs.tu-dortmund.de 18
37. 37. References I Kriege, N. M., Giscard. P.-L., and R. C. Wilson. “On Valid Optimal Assignment Kernels and Applications to Graph Classiﬁcation”. In: Advances in Neural Information Processing Systems. 2016, pp. 1615–1623. Kriege, N. M. et al. “A Unifying View of Explicit and Implicit Feature Maps for Structured Data: Systematic Studies of Graph Kernels”. In: CoRR abs/1703.00676 (2017). url: http://arxiv.org/abs/1703.00676. Kriege, N. et al. “Explicit versus Implicit Graph Feature Maps: A Computational Phase Transition for Walk Kernels”. In: IEEE International Conference on Data Mining. 2014, pp. 881–886. Morris, C., K. Kersting, and M. Mutzel. “Glocalized Weisfeiler-Lehman Graph Kernels: Global-Local Feature Maps of Graphs”. In: IEEE International Conference on Data Mining. 2017. 19
38. 38. References II Morris, C. et al. “Faster Kernel for Graphs with Continuous Attributes via Hashing”. In: IEEE International Conference on Data Mining. 2016, pp. 1095–1100. 20
• hamedAnwar2

Mar. 20, 2020

We review our recent progress in the development of graph kernels. We discuss the hash graph kernel framework, which makes the computation of kernels for graphs with vertices and edges annotated with real-valued information feasible for large data sets. Moreover, we summarize our general investigation of the benefits of explicit graph feature maps in comparison to using the kernel trick. Our experimental studies on real-world data sets suggest that explicit feature maps often provide sufficient classification accuracy while being computed more efficiently. Finally, we describe how to construct valid kernels from optimal assignments to obtain new expressive graph kernels. These make use of the kernel trick to establish one-to-one correspondences. We conclude by a discussion of our results and their implication for the future development of graph kernels.

Aufrufe

Aufrufe insgesamt

429

Auf Slideshare

0

Aus Einbettungen

0

Anzahl der Einbettungen

0

9

Geteilt

0

Kommentare

0

Likes

1