Weisfeiler and Leman Go Neural: Higher-order Graph Neural Networks
Christopher Morris, Martin Ritzert, Matthias Fey, William L. Hamilton, Jan Eric
Lenssen, Gaurav Rattan, Martin Grohe
November 12, 2018
TU Dortmund University,
RWTH Aachen University,
McGill University
6. Talk Structure
1 State-of-the-art methods for graph classification
2 Relationship between the 1-WL kernel and Graph Neural Networks
3 Higher-order graph properties
4 Experimental results
8. Supervised Graph Classification: The State-of-the-Art
Kernel Methods
Find predefined substructures and count them, e.g.:
• Shortest paths or random walks
• Motifs
• h-neighborhoods around vertices
• Spectral approaches
Neural Methods
Parameterized neighborhood aggregation function
f_v^(t) = σ(W1 f_v^(t−1) + W2 ∑_{w ∈ N(v)} f_w^(t−1)),
and learn the parameters W1 and W2 together with the parameters of the classifier
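The aggregation rule above can be sketched in a few lines. This is an illustrative sketch, not the authors' implementation: ReLU stands in for σ, and dense matrices are used for simplicity.

```python
import numpy as np

def gnn_layer(A, F, W1, W2):
    """One neighborhood-aggregation layer:
    f_v^(t) = sigma(W1 f_v^(t-1) + W2 * sum of neighbor features).
    A: (n, n) adjacency matrix, F: (n, d) vertex features.
    ReLU is used as an example nonlinearity sigma."""
    return np.maximum(0.0, F @ W1.T + (A @ F) @ W2.T)
```

Row v of `A @ F` is exactly the sum of the features of v's neighbors, so a single matrix product implements the neighborhood sum for all vertices at once.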
12. Example: Weisfeiler-Lehman Subtree Kernel
Example (Weisfeiler-Lehman Subtree Kernel)
Graph kernel based on a heuristic for graph isomorphism testing
Iteration: Two vertices get identical colors iff their colored neighborhoods are identical
(a) G1: 𝜑(G1) = (2, 2, 2, 2, 2, 2, 0, 0)
(b) G2: 𝜑(G2) = (1, 1, 3, 2, 0, 1, 1, 1)
N. Shervashidze, P. Schweitzer, E. J. van Leeuwen, K. Mehlhorn, and K. M. Borgwardt. “Weisfeiler-Lehman Graph Kernels”. In: JMLR 12 (2011), pp. 2539–2561
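The iteration just described can be sketched as follows. This is a minimal illustration, not the kernel's reference implementation; Python's built-in `hash` stands in for an injective relabeling function.

```python
from collections import Counter

def wl_subtree_features(adj, labels, iterations=2):
    """1-WL color refinement: a vertex's new color hashes its old
    color together with the sorted colors of its neighbors, so two
    vertices get identical colors iff their colored neighborhoods
    are identical. Returns the color histogram (the feature map phi)."""
    colors = dict(labels)  # vertex -> initial color
    hist = Counter(colors.values())
    for _ in range(iterations):
        colors = {v: hash((colors[v], tuple(sorted(colors[w] for w in adj[v]))))
                  for v in adj}
        hist.update(colors.values())
    return hist
```

Comparing the histograms of two graphs (as in the example above) gives the subtree kernel's feature-vector comparison.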
16. Relationship between 1-WL and GNN
1-WL coloring
c^(t)(v) = hash( c^(t−1)(v), {{ c^(t−1)(w) | w ∈ N(v) }} )
General form of GNNs
h^(t)(v) = f_merge^{W1^(t)}( h^(t−1)(v), f_aggr^{W2^(t)}( {{ h^(t−1)(w) | w ∈ N(v) }} ) )
Both methods aggregate the colors/features of their neighbors
Theorem (Informal)
GNNs cannot be more expressive than 1-WL in terms of distinguishing non-isomorphic graphs.
18. Relationship between 1-WL and GNN
Insight
GNNs are as powerful as 1-WL if f_merge^{W1^(t)} and f_aggr^{W2^(t)} are injective
Theorem (Informal)
There exists a GNN architecture with corresponding weights that computes a coloring equivalent to the one computed by 1-WL.
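Why injectivity matters can be seen on a tiny example. The sketch below (an illustration, assuming one-hot vertex colors) shows that sum aggregation separates two neighbor multisets that mean aggregation collapses, so a mean-based aggregator cannot match 1-WL.

```python
import numpy as np

# One-hot features for two vertex colors "a" and "b".
a, b = np.array([1.0, 0.0]), np.array([0.0, 1.0])

# Two different neighbor multisets: {{a, b}} vs. {{a, a, b, b}}.
m1 = np.stack([a, b])
m2 = np.stack([a, a, b, b])

# Mean aggregation is not injective on multisets: both give [0.5, 0.5].
mean_equal = np.allclose(m1.mean(axis=0), m2.mean(axis=0))

# Sum aggregation keeps the two multisets apart: [1, 1] vs. [2, 2].
sum_equal = np.allclose(m1.sum(axis=0), m2.sum(axis=0))
```

An aggregator that confuses distinct neighbor multisets can never refine colors as finely as the injective hash used by 1-WL.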
22. Relationship between 1-WL and GNN
[Figure: 1-WL and GNN reach equivalent colorings]
Take Away
GNNs have the same power as 1-WL in distinguishing non-isomorphic graphs. The limits of 1-WL are well understood.
V. Arvind, J. Köbler, G. Rattan, and O. Verbitsky. “On the Power of Color Refinement”. In: Symposium on Fundamentals of Computation Theory. 2015, pp. 339–350
24. Limits of GNNs
Observation
GNNs cannot distinguish graphs differing in very basic properties, e.g.:
• Cycle-free vs. cyclic graphs
• Triangle counts
• Regular graphs
Observation
Higher-order graph properties play an important role in the characterization of real-world networks.
27. Higher-order Graph Properties
Challenge
Incorporate higher-order graph properties into Graph Neural Networks.
[Diagram: 1-WL corresponds to GNN and k-WL to k-GNN, each pair reaching equivalent (global) colorings]
Idea: k-WL
Color subgraphs instead of vertices, and define neighborhoods between them.
31. k-dimensional Weisfeiler-Lehman
k-dimensional Weisfeiler-Lehman
• Colors vertex tuples from V^k
• Two tuples v, w are i-neighbors if v_j = w_j for all j ≠ i
[Example graph on vertices v1, …, v6]
Idea of the Algorithm
Initially: Two tuples get the same color if the induced subgraphs are isomorphic
Iteration: Two tuples get the same color iff they have the same colored neighborhood
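The i-neighborhood relation above can be made concrete. A small sketch (the function name is illustrative) enumerating, for a k-tuple v over n vertices, all tuples agreeing with v everywhere except possibly at position i:

```python
def i_neighbors(v, n, i):
    """All tuples w over vertices 0..n-1 with w_j = v_j for all j != i
    (the i-neighbors of v in k-WL; by this definition, v is among
    its own i-neighbors)."""
    return [tuple(x if j == i else v[j] for j in range(len(v)))
            for x in range(n)]
```

Each tuple thus has exactly n i-neighbors per position i, which already hints at the n^k-sized state space the algorithm maintains.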
33. k-GNN
Idea
Derive a k-dimensional Graph Neural Network
f_S^(t) = σ(W1 f_S^(t−1) + W2 ∑_{T ∈ N(S)} f_T^(t−1)),
where S is a subgraph of the input graph of size k.
Challenges
• Scalability
• GPU memory consumption
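The scalability challenge can be quantified by materializing the objects such a layer operates on. A rough sketch, under the assumption that the network's nodes are k-element vertex subsets and two subsets are neighbors when they share exactly k−1 vertices:

```python
from itertools import combinations

def subset_neighborhoods(n, k):
    """Nodes: all k-element vertex subsets of an n-vertex graph;
    neighbors: subsets sharing exactly k-1 vertices. The number of
    nodes grows as C(n, k), which drives memory consumption."""
    nodes = [frozenset(c) for c in combinations(range(n), k)]
    return {S: [T for T in nodes if len(S & T) == k - 1] for S in nodes}
```

Even for moderate n, C(n, k) subsets each with k(n − k) neighbors quickly exhausts GPU memory, which motivates the hierarchical variant below.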
35. Hierarchical k-GNN
Idea
Learn features for subgraphs in a hierarchical way
[Architecture: 1-GNN → Pool → 2-GNN → Pool → 3-GNN → Pool → MLP]
Learning higher-order graph properties
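The pooling step between levels can be sketched as follows. An illustrative sketch (the function name and sum-pooling choice are assumptions, not the authors' exact scheme) in which each k-element vertex set starts from the pooled features its member vertices learned at the previous level:

```python
import numpy as np
from itertools import combinations

def lift_features(F, k):
    """Pool step between levels of a hierarchical k-GNN: the initial
    feature of each k-element vertex set is the sum of the features
    its member vertices learned at the previous level.
    F: (n, d) vertex features. Returns (C(n, k), d) set features."""
    n = F.shape[0]
    sets = list(combinations(range(n), k))
    return np.stack([F[list(S)].sum(axis=0) for S in sets]), sets
```

Initializing higher levels from learned lower-level features, instead of running each level from scratch, is what makes the hierarchy end-to-end trainable.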
37. Experimental Results
[Figure 4: Regression on the QM9 data set (targets U0, ZPVE, H): gain of 1-k-GNN over the 1-GNN baseline, compared with MPNN and DTNN; error normalized to 1-GNN, lower is better]
39. Conclusion
1 Relationship between the 1-WL kernel and Graph Neural Networks
• GNNs are a differentiable version of 1-WL
• GNNs and 1-WL are equally powerful
2 Higher-order graph embeddings
• k-dimensional GNNs
• Hierarchical variant
3 Experimental results
• Good results for large datasets with continuous node labels
Collection of graph classification benchmarks: graphkernels.cs.tu-dortmund.de