AI-CUK Joint Journal Club: V.T.Hoang, Review on "Global self-attention as a replacement for graph convolution," KDD 2022, Jan 3rd, 2023

  1. KDD ’22, August 14–18, 2022, Washington, DC, USA. Presented by Thuy Hoang Van, PhD student, Network Science Lab, The Catholic University of Korea. https://nslab.catholic.ac.kr/
  2. Differences: words and images vs. graphs • Words in sentences: – Tokenization & PE (absolute and relative PE) – 1-dimensional vectors • Images: – Grid structure, 2-dimensional vectors • Graphs: – Node positions (no inherent order) – Edge connections, as important as nodes – Global and local connections – Node centrality → These properties make it challenging to apply a graph transformer based on node self-attention
  3. Key notes* • Proposes a simple extension: – edge channels added to the transformer framework (a sketch follows below) • Proposes a global attention mechanism that takes the structural information of the graph into account • Performs better than convolutional GNNs
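One way to picture the edge-channel extension is the hypothetical PyTorch sketch below: pairwise edge embeddings add a bias to the attention logits and gate the attention weights, so every global attention step sees graph structure. Class and parameter names (EdgeAugmentedAttention, edge_bias, edge_gate) are illustrative and not the authors' implementation.

```python
# Hypothetical sketch (not the authors' code): global self-attention in which
# pairwise edge channels bias and gate the attention, so structural information
# enters every attention step.
import torch
import torch.nn as nn


class EdgeAugmentedAttention(nn.Module):
    def __init__(self, node_dim, edge_dim, num_heads):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = node_dim // num_heads
        self.qkv = nn.Linear(node_dim, 3 * node_dim)
        self.edge_bias = nn.Linear(edge_dim, num_heads)  # additive bias per head
        self.edge_gate = nn.Linear(edge_dim, num_heads)  # multiplicative gate per head
        self.out = nn.Linear(node_dim, node_dim)

    def forward(self, h, e):
        # h: (N, node_dim) node embeddings, e: (N, N, edge_dim) edge channels
        n = h.size(0)
        q, k, v = self.qkv(h).chunk(3, dim=-1)
        q = q.view(n, self.num_heads, self.head_dim)
        k = k.view(n, self.num_heads, self.head_dim)
        v = v.view(n, self.num_heads, self.head_dim)
        # global attention: logits between every pair of nodes, biased by edges
        logits = torch.einsum('ihd,jhd->ijh', q, k) / self.head_dim ** 0.5
        logits = logits + self.edge_bias(e)
        attn = torch.softmax(logits, dim=1) * torch.sigmoid(self.edge_gate(e))
        out = torch.einsum('ijh,jhd->ihd', attn, v).reshape(n, -1)
        return self.out(out)


h = torch.randn(6, 64)      # 6 nodes with 64-dim embeddings
e = torch.randn(6, 6, 16)   # pairwise edge channels
print(EdgeAugmentedAttention(64, 16, 4)(h, e).shape)  # torch.Size([6, 64])
```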
  4. Message passing vs. self-attention • Left: it takes three stages of convolution for node 0 to aggregate information from node 6 • Right: with global self-attention, the model can learn to do so in a single step (all steps in one); see the toy example below – What is the meaning/role of the target node vs. the other nodes? – How far should attention reach? – How much attention should be paid?
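The hop-count argument can be checked on a toy graph (not the figure from the paper): with message passing, node 0 only receives information from node 6 after as many aggregation rounds as their shortest-path distance, while a dense attention matrix links the two nodes immediately.

```python
# Toy check (not the paper's figure): node 0 only "sees" node 6 after three
# rounds of neighborhood aggregation, because their shortest-path distance is 3;
# a dense attention matrix connects the two nodes in a single step.
import numpy as np

edges = [(0, 1), (0, 2), (1, 3), (2, 4), (3, 5), (4, 6)]  # hypothetical toy graph
A = np.zeros((7, 7))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0

reach = np.eye(7)
for step in range(1, 4):
    reach = reach @ A + reach  # nodes reachable within `step` hops
    print(f"after {step} message-passing steps, node 0 has seen node 6:",
          bool(reach[0, 6] > 0))

attn = np.full((7, 7), 1 / 7)  # any dense attention matrix, e.g. uniform weights
print("global attention weight from node 0 to node 6:", attn[0, 6])
```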
  5. Architecture
  6. SVD-based Positional Encodings • The SVD factors of the graph (singular vectors weighted by the singular values Σ) can be used as PE • Added to the node embeddings (see the sketch below)
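A minimal sketch of this idea, assuming the PE comes from a rank-r SVD of the adjacency matrix with singular vectors scaled by the square roots of the singular values; the exact factorization and the way it is combined with node features may differ in the paper.

```python
# Minimal sketch, assuming the PE is built from a rank-r SVD of the adjacency
# matrix, with singular vectors weighted by the square roots of the singular
# values; the paper's exact factorization may differ in detail.
import numpy as np


def svd_positional_encoding(A, rank):
    U, S, Vt = np.linalg.svd(A)                    # A ≈ U diag(S) V^T
    U_r, S_r, V_r = U[:, :rank], S[:rank], Vt[:rank].T
    # concatenate left and right singular vectors, each scaled by sqrt(S)
    return np.concatenate([U_r * np.sqrt(S_r), V_r * np.sqrt(S_r)], axis=1)


A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
pe = svd_positional_encoding(A, rank=2)            # (4 nodes, 2 * rank) encodings
features = np.random.randn(4, 8)                   # hypothetical node features
W = np.random.randn(pe.shape[1], 8)                # project PE to the embedding dim
node_embeddings = features + pe @ W                # "add to node embeddings"
print(node_embeddings.shape)                       # (4, 8)
```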
  7. Centrality Scalers • ~ a virtual node for the graph • Distinguish non-isomorphic sub-graphs (see the sketch below)
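As a purely illustrative sketch of what a centrality scaler can look like, the snippet below multiplies each node's aggregated representation by a log(1 + degree) factor; this assumed form is not necessarily the paper's exact formulation.

```python
# Illustrative sketch only: one plausible centrality scaler multiplies each
# node's aggregated representation by log(1 + degree), so nodes with different
# degrees stay distinguishable even when their neighborhoods look alike.
import numpy as np


def centrality_scale(aggregated, A):
    degree = A.sum(axis=1)                   # node degrees from the adjacency matrix
    return aggregated * np.log1p(degree)[:, None]


A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]], dtype=float)
aggregated = np.ones((3, 4))                 # e.g. the output of an attention layer
print(centrality_scale(aggregated, A))       # node 0 (degree 2) is scaled up the most
```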
  8. EGT Variants • EGT-Simple: – uses global self-attention, but no residual channels • EGT-Constrained • Ungated variant
  9. Hyperparameters used in large-scale experiments
  10. EXPERIMENTS AND RESULTS • Supervised learning tasks: node, edge, and graph classification • Transfer learning performance of EGT • Tasks/datasets for the medium-scale supervised learning setting: – Node classification: • PATTERN (14K synthetic graphs, 44-188 nodes/graph) • CLUSTER (12K synthetic graphs, 41-190 nodes/graph) – Edge classification: • TSP (12K synthetic graphs, 50-500 nodes/graph) – Graph classification: • MNIST (70K superpixel graphs, 40-75 nodes/graph) • CIFAR10 (60K superpixel graphs, 85-150 nodes/graph) • ZINC (12K molecular graphs, 9-37 nodes/graph) • Large-scale setting, transfer learning for link prediction: – Pre-training: PCQM4Mv2 dataset – Fine-tuning: OGB MolPCBA dataset
  11. Tab.1 Medium-scale Experiments
  12. Table 2. Large-scale Performance Results on OGB-LSC PCQM4M and PCQM4Mv2 datasets in terms of Mean Absolute Error (lower is better)
  13. Transfer Learning Performance – Transfer learning for link prediction: • Pre-trained: PCQM4Mv2 dataset • Fine-tuning: OGB molecular datasets
  14. Ablation Study
  15. Ablation study on the PCQM4Mv2 dataset for EGT-Small
  16. Conclusion • SPD on the edge channels is a strong sampling strategy for the self-attention mechanism (see the SPD sketch below) – Compared to other sampling strategies based on adjacency, distance, intimacy, etc. (Graph-BERT, ...) – The aggregation mechanism avoids the over-smoothing problem of message passing • Better than 1-WL • Key: global attention gives a deeper understanding of the meaning of the graph structure: – Edge features – Integrating edges into the graph transformer
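For reference, the shortest-path-distance (SPD) input to the edge channels can be built with a breadth-first search from every node; the function name and output format below are illustrative.

```python
# Sketch of shortest-path-distance (SPD) features for the edge channels:
# a BFS from every node yields a dense matrix of pairwise hop counts.
from collections import deque

import numpy as np


def spd_matrix(adj_list, num_nodes):
    spd = np.full((num_nodes, num_nodes), -1, dtype=int)  # -1 marks "unreachable"
    for src in range(num_nodes):
        spd[src, src] = 0
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj_list[u]:
                if spd[src, v] == -1:
                    spd[src, v] = spd[src, u] + 1
                    queue.append(v)
    return spd


adj_list = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}          # toy graph
print(spd_matrix(adj_list, 4))
```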
  17. Thank you!