Más contenido relacionado

Similar a NS-CUK Seminar: S.T.Nguyen, Review on "DeeperGCN: All You Need to Train Deeper GCNs," arXiv, Mar 6th, 2023(20)

Más de Network Science Lab, The Catholic University of Korea(16)


NS-CUK Seminar: S.T.Nguyen, Review on "DeeperGCN: All You Need to Train Deeper GCNs," arXiv, Mar 6th, 2023

  1. Nguyen Thanh Sang Network Science Lab Dept. of Artificial Intelligence The Catholic University of Korea E-mail:
  2. 1 Problems • When stacking very deep GNN layers, GCNs suffer from vanishing gradient, over-smoothing a nd over-fitting issues when going deeper. => cannot distinguish nodes in graph. • In a largescale graph, it is important to extract more neighborhood features. => limit the representation power of GCNs on largescale graphs.
  3. 2 Contributions • A novel Generalized Aggregation Function which enjoys a permutation invariant property. • Show how its parameters can be tuned to improve the performance of diverse GCN tasks. • Show how its parameters can be learned in an end-to-end fashion. • Enhance the power of GCNs by exploring a modified graph skip connections as well as a novel graph normalization layer. => Apply for a large-scale graph
  4. 3 Graph Representation Learning Node Neighbor Edge Message Construction: Aggregation: Update node feature:
  5. 4 Message Passing
  6. 5 Generalized Message Aggregation Functions • Mean and max aggregators are proven to be less powerful than the WL test, they are found to be effective on the tasks of node classification. • In order to cover the popular mean and max aggregations into the generalized space, the authors define generalized mean-max aggregation for message aggregation. It is easy to include sum aggregation.
  7. 6 Generalized Message Aggregation Functions • The SoftMax function with a temperature has been studied in many machine learning areas, e.g. Energy-Based Learning, Knowledge Distillation and Reinforcement Learning. • Power mean is one member of Quasi-arithmetic mean. It is a generalized mean function that includes harmonic mean, geometric mean, arithmetic mean, and quadratic mean. • p = −1 => the harmonic mean aggregation • p → 0 => the geometric mean aggregation
  8. 7 GENeralized Aggregation Networks (GEN) • The key idea is to keep all the message features to be positive, so that generalized mean- max aggregation functions. • A pre-activation variant of residual connections for GCNs, which follows the ordering: Normalization → ReLU → GraphConv → Addition.  Performs better • A message normalization (MsgNorm) layer, which can significantly boost the performance of networks with under-performing aggregation functions s to be a learnable scalar with an initialized value of 1
  9. 8 Baselines • PlainGCN: stacks GCN layers with {3, 7, 14, 28, 56, 112} depth and without skip connections. Each GCN layer shares the same message passing operator as GEN except the aggregation function is replaced by Sum(·), Mean(·) or Max(·) aggregation. • ResGCN: adding residual connections to PlainGCN following the ordering: GraphGonv → Normalization → ReLU → Addition. • ResGCN+: the pre-activation version of ResGCN: changing the order of residual connections to Normalization → ReLU → GraphGonv → Addition. • DyResGEN: learns parameters β and p dynamically for every layer at every gradient descent step. => avoid the need to painstakingly searching for the best hyper-parameters.
  10. 9 Experiments • Effect of Residual Connections
  11. 10 Experiments • Effect of Generalized Message Aggregators.
  12. 11 Experiments • Learning Dynamic Aggregators
  13. 12 Conclusions • Proposed a differentiable generalized message aggregation function, which defines a family of permutation invariant functions. • Proposed a new variant of residual connections and message normalization layers. • GCNs can go deeper so that it does require more GPUs memory resources and consume more time.
  14. 13