Weisfeiler and Leman Go Neural: Higher-order Graph Neural Networks
Christopher Morris, Martin Ritzert, Matthias Fey, William L. Hamilton, Jan Eric
Lenssen, Gaurav Rattan, Martin Grohe
November 12, 2018
TU Dortmund University,
RWTH Aachen University,
McGill University
6. Talk Structure
1 State-of-the-art methods for graph classification
2 Relationship between the 1-WL kernel and Graph Neural Networks
3 Higher-order graph properties
4 Experimental results
8. Supervised Graph Classification: The State-of-the-Art
Kernel Methods
Find predefined substructures and count them, e.g.:
• Shortest paths or random walks
• Motifs
• h-neighborhoods around vertices
• Spectral approaches
Neural Methods
Parameterized neighborhood aggregation function
f_v^(t) = σ(W1 f_v^(t−1) + W2 ∑_{w ∈ N(v)} f_w^(t−1)),
and learn the parameters W1 and W2 together with the parameters of the classifier
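The aggregation rule above can be sketched in a few lines. This is an illustrative sketch, not the authors' implementation: ReLU stands in for σ, and dense matrices are used for simplicity.

```python
import numpy as np

def gnn_layer(A, F, W1, W2):
    """One neighborhood-aggregation layer:
    f_v^(t) = sigma(W1 f_v^(t-1) + W2 * sum of neighbor features).
    A: (n, n) adjacency matrix, F: (n, d) vertex features.
    ReLU is used as an example nonlinearity sigma."""
    return np.maximum(0.0, F @ W1.T + (A @ F) @ W2.T)
```

Row v of `A @ F` is exactly the sum of the features of v's neighbors, so a single matrix product implements the neighborhood sum for all vertices at once.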
12. Example: Weisfeiler-Lehman Subtree Kernel
Example (Weisfeiler-Lehman Subtree Kernel)
Graph kernel based on a heuristic for graph isomorphism testing
Iteration: Two vertices get identical colors iff their colored neighborhoods are identical
(a) G1: 𝜑(G1) = (2, 2, 2, 2, 2, 2, 0, 0)
(b) G2: 𝜑(G2) = (1, 1, 3, 2, 0, 1, 1, 1)
N. Shervashidze, P. Schweitzer, E. J. van Leeuwen, K. Mehlhorn, and K. M. Borgwardt. “Weisfeiler-Lehman Graph Kernels”. In: JMLR 12 (2011), pp. 2539–2561
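The iteration just described can be sketched as follows. This is a minimal illustration, not the kernel's reference implementation; Python's built-in `hash` stands in for an injective relabeling function.

```python
from collections import Counter

def wl_subtree_features(adj, labels, iterations=2):
    """1-WL color refinement: a vertex's new color hashes its old
    color together with the sorted colors of its neighbors, so two
    vertices get identical colors iff their colored neighborhoods
    are identical. Returns the color histogram (the feature map phi)."""
    colors = dict(labels)  # vertex -> initial color
    hist = Counter(colors.values())
    for _ in range(iterations):
        colors = {v: hash((colors[v], tuple(sorted(colors[w] for w in adj[v]))))
                  for v in adj}
        hist.update(colors.values())
    return hist
```

Comparing the histograms of two graphs (as in the example above) gives the subtree kernel's feature-vector comparison.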
16. Relationship between 1-WL and GNN
1-WL coloring
c^(t)(v) = hash( c^(t−1)(v), {{ c^(t−1)(w) | w ∈ N(v) }} )
General form of GNNs
h^(t)(v) = f_merge^{W1^(t)}( h^(t−1)(v), f_aggr^{W2^(t)}( {{ h^(t−1)(w) | w ∈ N(v) }} ) )
Both methods aggregate the colors/features of their neighbors
Theorem (Informal)
GNNs cannot be more expressive than 1-WL in terms of distinguishing non-isomorphic graphs.
18. Relationship between 1-WL and GNN
Insight
GNNs are as powerful as 1-WL if f_merge^{W1^(t)} and f_aggr^{W2^(t)} are injective
Theorem (Informal)
There exists a GNN architecture with corresponding weights that computes a coloring equivalent to the one computed by 1-WL.
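Why injectivity matters can be seen on a tiny example. The sketch below (an illustration, assuming one-hot vertex colors) shows that sum aggregation separates two neighbor multisets that mean aggregation collapses, so a mean-based aggregator cannot match 1-WL.

```python
import numpy as np

# One-hot features for two vertex colors "a" and "b".
a, b = np.array([1.0, 0.0]), np.array([0.0, 1.0])

# Two different neighbor multisets: {{a, b}} vs. {{a, a, b, b}}.
m1 = np.stack([a, b])
m2 = np.stack([a, a, b, b])

# Mean aggregation is not injective on multisets: both give [0.5, 0.5].
mean_equal = np.allclose(m1.mean(axis=0), m2.mean(axis=0))

# Sum aggregation keeps the two multisets apart: [1, 1] vs. [2, 2].
sum_equal = np.allclose(m1.sum(axis=0), m2.sum(axis=0))
```

An aggregator that confuses distinct neighbor multisets can never refine colors as finely as the injective hash used by 1-WL.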
22. Relationship between 1-WL and GNN
[Figure: 1-WL and GNN reach equivalent colorings]
Take Away
GNNs have the same power as 1-WL in distinguishing non-isomorphic graphs. The limits of 1-WL are well understood.
V. Arvind, J. Köbler, G. Rattan, and O. Verbitsky. “On the Power of Color Refinement”. In: Symposium on Fundamentals of Computation Theory. 2015, pp. 339–350
24. Limits of GNNs
Observation
GNNs cannot distinguish graphs differing in very basic properties, e.g.:
• Cycle-free vs. cyclic graphs
• Triangle counts
• Regular graphs
Observation
Higher-order graph properties play an important role in the characterization of real-world networks.
27. Higher-order Graph Properties
Challenge
Incorporate higher-order graph properties into Graph Neural Networks.
[Diagram: 1-WL corresponds to GNN and k-WL to k-GNN, each pair reaching equivalent (global) colorings]
Idea: k-WL
Color subgraphs instead of vertices, and define neighborhoods between them.
31. k-dimensional Weisfeiler-Lehman
k-dimensional Weisfeiler-Lehman
• Colors vertex tuples from V^k
• Two tuples v, w are i-neighbors if v_j = w_j for all j ≠ i
[Example graph on vertices v1, …, v6]
Idea of the Algorithm
Initially: Two tuples get the same color if the induced subgraphs are isomorphic
Iteration: Two tuples get the same color iff they have the same colored neighborhood
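The i-neighborhood relation above can be made concrete. A small sketch (the function name is illustrative) enumerating, for a k-tuple v over n vertices, all tuples agreeing with v everywhere except possibly at position i:

```python
def i_neighbors(v, n, i):
    """All tuples w over vertices 0..n-1 with w_j = v_j for all j != i
    (the i-neighbors of v in k-WL; by this definition, v is among
    its own i-neighbors)."""
    return [tuple(x if j == i else v[j] for j in range(len(v)))
            for x in range(n)]
```

Each tuple thus has exactly n i-neighbors per position i, which already hints at the n^k-sized state space the algorithm maintains.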
33. k-GNN
Idea
Derive a k-dimensional Graph Neural Network
f_S^(t) = σ(W1 f_S^(t−1) + W2 ∑_{T ∈ N(S)} f_T^(t−1)),
where S is a subgraph of the input graph of size k.
Challenges
• Scalability
• GPU memory consumption
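The scalability challenge can be quantified by materializing the objects such a layer operates on. A rough sketch, under the assumption that the network's nodes are k-element vertex subsets and two subsets are neighbors when they share exactly k−1 vertices:

```python
from itertools import combinations

def subset_neighborhoods(n, k):
    """Nodes: all k-element vertex subsets of an n-vertex graph;
    neighbors: subsets sharing exactly k-1 vertices. The number of
    nodes grows as C(n, k), which drives memory consumption."""
    nodes = [frozenset(c) for c in combinations(range(n), k)]
    return {S: [T for T in nodes if len(S & T) == k - 1] for S in nodes}
```

Even for moderate n, C(n, k) subsets each with k(n − k) neighbors quickly exhausts GPU memory, which motivates the hierarchical variant below.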
35. Hierarchical k-GNN
Idea
Learn features for subgraphs in a hierarchical way
[Architecture: 1-GNN → Pool → 2-GNN → Pool → 3-GNN → Pool → MLP]
Learning higher-order graph properties
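The pooling step between levels can be sketched as follows. An illustrative sketch (the function name and sum-pooling choice are assumptions, not the authors' exact scheme) in which each k-element vertex set starts from the pooled features its member vertices learned at the previous level:

```python
import numpy as np
from itertools import combinations

def lift_features(F, k):
    """Pool step between levels of a hierarchical k-GNN: the initial
    feature of each k-element vertex set is the sum of the features
    its member vertices learned at the previous level.
    F: (n, d) vertex features. Returns (C(n, k), d) set features."""
    n = F.shape[0]
    sets = list(combinations(range(n), k))
    return np.stack([F[list(S)].sum(axis=0) for S in sets]), sets
```

Initializing higher levels from learned lower-level features, instead of running each level from scratch, is what makes the hierarchy end-to-end trainable.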
37. Experimental Results
[Figure 4: Regression on the QM9 data set (targets U0, ZPVE, H): gain of 1-k-GNN over the 1-GNN baseline, compared with MPNN and DTNN; error normalized to 1-GNN, lower is better]
39. Conclusion
1 Relationship between the 1-WL kernel and Graph Neural Networks
• GNNs are a differentiable version of 1-WL
• GNNs and 1-WL are equally powerful
2 Higher-order graph embeddings
• k-dimensional GNNs
• Hierarchical variant
3 Experimental results
• Good results for large datasets with continuous node labels
Collection of graph classification benchmarks: graphkernels.cs.tu-dortmund.de