2023. 03. 16
Van Thuy Hoang
Network Science Lab
Dept. of Artificial Intelligence
The Catholic University of Korea
E-mail: hoangvanthuy90@gmail.com
• Problems
• Over-fitting
• Over-smoothing
• DropEdge model
• Experiments
• Conclusions
Over-fitting
 Over-fitting arises when an over-parameterized model is fit to a distribution with limited training data: the model fits the training data very well but generalizes poorly to the test data.
 It occurs when a deep GCN is applied to small graphs.
Over-smoothing problem
 Over-smoothing, at the other extreme, makes training a very deep GCN difficult.
 Graph convolutions essentially mix the representations of adjacent nodes; in the extreme case of infinitely many layers, all nodes' representations converge to a stationary point, becoming unrelated to the input features and leading to vanishing gradients.
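The convergence argument above can be seen in a toy numerical sketch (our illustration, not the paper's code): repeatedly applying the normalized adjacency of a connected graph drives all node representations toward each other.

```python
import numpy as np

# 4-node connected toy graph (adjacency matrix).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
A_hat = A + np.eye(4)                              # add self-loops
P = A_hat / A_hat.sum(axis=1, keepdims=True)       # row-normalized propagation

X = np.random.default_rng(0).normal(size=(4, 3))   # random node features
spread_before = X.std(axis=0).mean()               # how different nodes are
for _ in range(100):                               # 100 "layers" of propagation
    X = P @ X
spread_after = X.std(axis=0).mean()
print(spread_before, spread_after)                 # spread collapses toward 0
```

After many propagation steps every row of `X` is (numerically) the same vector, exactly the stationary point the slide describes.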
DropEdge
 Randomly drops a certain rate of edges of the input graph at each training iteration.
 There are several benefits to applying DropEdge in GCN training:
 DropEdge can be considered a data augmentation technique: we are actually generating different random deformed copies of the original graph.
 DropEdge can also be treated as a message-passing reducer: removing certain edges makes node connections sparser, and hence avoids over-smoothing to some extent when the GCN goes very deep.
 Indeed, as the paper shows theoretically, DropEdge either slows down the convergence of over-smoothing or relieves the information loss caused by it.
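The core operation is simple enough to sketch in a few lines (our illustration, not the authors' implementation): at each training step, a fraction p of the edges is removed uniformly at random.

```python
import numpy as np

def drop_edge(edges, p, rng):
    """Minimal DropEdge sketch: edges has shape (E, 2); a random subset
    of ceil(E * p) edges is removed, the rest are kept."""
    E = len(edges)
    n_keep = E - int(np.ceil(E * p))
    keep = rng.choice(E, size=n_keep, replace=False)  # sample without replacement
    return edges[np.sort(keep)]

rng = np.random.default_rng(0)
edges = np.array([[0, 1], [0, 2], [1, 2], [2, 3], [3, 4]])
dropped = drop_edge(edges, p=0.4, rng=rng)
print(len(dropped))   # 3 of the 5 edges survive
```

Because a fresh subset is sampled every iteration, the model sees a different deformed copy of the graph each time, which is exactly the augmentation effect described above.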
DropEdge: an example
GCN
 The feed-forward propagation in a GCN is conducted recursively as:
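The propagation rule itself appeared as an image on the slide; reconstructed here in the standard GCN notation (which the DropEdge paper follows), where the adjacency is re-normalized with self-loops:

```latex
H^{(l+1)} = \sigma\!\left( \hat{D}^{-1/2} \hat{A} \hat{D}^{-1/2} \, H^{(l)} W^{(l)} \right),
\qquad \hat{A} = A + I,\quad \hat{D}_{ii} = \sum_j \hat{A}_{ij}
```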
METHODOLOGY
 DropEdge drops a certain rate of edges of the input graph at random.
 It randomly forces V·p non-zero elements of the adjacency matrix A to be zero, where V is the total number of edges and p is the dropping rate. Denoting the resulting adjacency matrix as Adrop, its relation with A becomes:
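The relation was shown as an equation image on the slide; as given in the original paper, it is

```latex
A_{\text{drop}} = A - A',
```

where $A'$ is a sparse matrix expanded by a random subset of size $V \cdot p$ of the original edges.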
Preventing over-fitting
 DropEdge produces varying perturbations of the graph connections. As a result, it generates different random deformations of the input data and can be regarded as a data-augmentation technique for graphs.
 DropEdge enables aggregation over a random subset of neighbors instead of the full aggregation during GNN training.
 DropEdge only changes the expectation of the neighbor aggregation up to a constant multiplier when edges are dropped with probability p.
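The expectation claim can be checked with a quick Monte-Carlo sketch. Note one simplifying assumption on our part: we drop each edge independently with probability p (Bernoulli), rather than the paper's fixed-size subset sampling; under this scheme the expected aggregation is the full aggregation scaled by the keep rate (1 − p).

```python
import numpy as np

rng = np.random.default_rng(1)
p = 0.3                                    # drop rate
x = np.array([1.0, 2.0, 3.0, 4.0])         # features of a node's 4 neighbors
full = x.sum()                             # full (no-drop) sum aggregation

trials = 200_000
masks = rng.random((trials, x.size)) > p   # True = edge kept this trial
mean_agg = (masks * x).sum(axis=1).mean()  # Monte-Carlo expected aggregation
print(mean_agg, (1 - p) * full)            # both close to 7.0
```

So the aggregation is unbiased up to a constant factor, which the learned weights can absorb.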
Layer-Wise DropEdge
 The formulation of DropEdge above is one-shot: all layers share the same perturbed adjacency matrix.
 Indeed, DropEdge can instead be performed independently for each layer.
 Over-smoothing is another obstacle to training deep GCNs; the next section details how DropEdge addresses it to some extent.
 For simplicity, the following derivations assume all GCLs share the same perturbed adjacency matrix; the discussion of layer-wise DropEdge is left for future exploration.
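The layer-wise variant is a small change to the one-shot sketch (again our illustration): each layer samples its own random edge subset instead of sharing one perturbed adjacency, bringing more randomness into training.

```python
import numpy as np

def drop_edges_per_layer(edges, p, n_layers, rng):
    """Sample an independent DropEdge mask for each of n_layers layers."""
    E = len(edges)
    n_keep = E - int(np.ceil(E * p))
    return [edges[np.sort(rng.choice(E, size=n_keep, replace=False))]
            for _ in range(n_layers)]

rng = np.random.default_rng(0)
edges = np.arange(20).reshape(10, 2)        # 10 dummy edges
per_layer = drop_edges_per_layer(edges, p=0.2, n_layers=4, rng=rng)
print([len(e) for e in per_layer])          # each layer keeps 8 edges
```

Each element of `per_layer` is the edge set one layer would propagate over; the one-shot scheme corresponds to reusing a single such sample for all layers.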
TOWARDS PREVENTING OVER-SMOOTHING
 By reducing node connections, DropEdge is proved to slow down the convergence of over-smoothing; in other words, the relaxed over-smoothing layer (the depth at which over-smoothing sets in) can only increase when DropEdge is used.
 The gap between the dimensions of the original space and the converging subspace, i.e. N − M, measures the amount of information loss; a larger gap means more severe information loss.
 As shown by the paper's derivations, DropEdge is able to increase the dimension of the converging subspace, and is thus capable of reducing information loss.
DropEdge vs. Dropout
 DropEdge can be regarded as a generalization of Dropout from dropping feature dimensions to dropping edges, which mitigates both over-fitting and over-smoothing.
 In fact, the effects of Dropout and DropEdge are complementary, and their compatibility is shown in the experiments.
DropEdge vs. DropNode
 DropNode samples sub-graphs for mini-batch training; it can also be treated as a specific form of dropping edges, since the edges connected to the dropped nodes are removed as well. However, DropNode's effect on dropping edges is node-oriented and indirect.
 DropEdge is edge-oriented, and it can preserve all node features for training (if they fit into memory at once), exhibiting more flexibility.
DropEdge vs. Graph-Sparsification
 DropEdge needs no optimization objective: it simply removes edges of the input graph at random at each training iteration.
 Graph-Sparsification resorts to a tedious optimization method to determine which edges to delete, and once those edges are discarded the output graph remains unchanged.
EXPERIMENTS
 CAN DROPEDGE GENERALLY IMPROVE THE PERFORMANCE OF DEEP GCNS?
 NS-CUK Seminar: V.T.Hoang, Review on "DropEdge: Towards Deep Graph Convolutional Networks on Node Classification", ICLR 2020
