[Paper Close Reading] Contrastive Graph Neural Network Explanation

Paper link: [2010.13663] Contrastive Graph Neural Network Explanation (arxiv.org)

Paper code: GitHub - lukasjf/contrastive-gnn-explanation

1. Thoughts

2. Section-by-section close reading

2.1. Abstract

        ①The authors argue that occlusion fails for GNNs because removing even a single node or edge can change the graph drastically

        ②They argue that explanations should use only graphs consistent with the training distribution, and call this property Distribution Compliant Explanation (DCE)

        ③They propose a Contrastive GNN Explanation (CoGE) technique that follows this paradigm

2.2. Introduction

        ①⭐Occlusion can be applied to GNN explanation, but it is too drastic: removing a single node can greatly change the structure of a sparse graph

        ②⭐Removing a single edge can disconnect the graph

        ③⭐CoGE looks for the parts of a graph that are similar to graphs with the same label and dissimilar to graphs with a different label

        ④Edge explanation methods:

2.3. Related work

(1)Graph Neural Networks

(2)Explainability Methods for Graphs

(3)Adversarial Graph Attacks

2.4. Method


        ①Consider an undirected graph G=(V,E) with node set V and edge set E

        ②Feature matrix X

(2)Explanations for graph classification

        ①They measure similarity with the Optimal Transport (OT) distance

        ②An example of how to calculate the distance between the left graph and the middle graph:

Each node holds a weight, and the weights within one graph sum to 1. A node's weight is its capacity. The cost of a transport is the weight moved from the source node multiplied by the L2 distance it travels.
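The cost rule can be checked on a toy case. The sketch below is a minimal illustration, not the paper's code: the node positions, the weights, and the helper names `l2` and `transport_cost` are all made up for the example. It scores two hand-written transport plans between two two-node graphs. The OT distance is the cost of the cheapest feasible plan, which in practice a solver would find.

```python
import math

def l2(p, q):
    """Euclidean (L2) distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def transport_cost(src, src_w, dst, dst_w, plan, tol=1e-9):
    """Cost of a transport plan, where plan[i][j] is the mass moved
    from source node i to target node j."""
    # Capacity constraints: every source node ships exactly its weight,
    # and every target node receives exactly its weight.
    for i, w in enumerate(src_w):
        if abs(sum(plan[i]) - w) > tol:
            raise ValueError(f"source node {i} does not ship its full weight")
    for j, w in enumerate(dst_w):
        if abs(sum(row[j] for row in plan) - w) > tol:
            raise ValueError(f"target node {j} does not receive its full weight")
    # Total cost: moved mass times L2 distance, summed over all node pairs.
    return sum(
        plan[i][j] * l2(src[i], dst[j])
        for i in range(len(src))
        for j in range(len(dst))
    )

# Hypothetical toy graphs: two nodes each, weights summing to 1 per graph.
left = [(0.0, 0.0), (1.0, 0.0)]
middle = [(0.0, 1.0), (1.0, 1.0)]
weights = [0.5, 0.5]

straight = [[0.5, 0.0], [0.0, 0.5]]  # each node ships to the node directly above it
crossed = [[0.0, 0.5], [0.5, 0.0]]   # each node ships along the diagonal

cost_straight = transport_cost(left, weights, middle, weights, straight)
cost_crossed = transport_cost(left, weights, middle, weights, crossed)
```

For this toy pair the straight plan costs 1.0 and the crossed plan costs about 1.414. Every feasible plan here is a mixture of the two, so the straight plan is the cheapest and the OT distance is 1.0.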

        ③They aim to find the optimal node weights:

w_{opt}(G)=\arg\min_{w}\mathcal{L}_{w}^{\neq }(G)-\mathcal{L}_{w}^{\approx}(G)+\mathcal{L}_{w}^{=}(G)

where the first term is the average distance to the k most similar graphs with a different label, the second term is the average distance to the k most similar graphs with the same label, and the third term is the distance between the weighted graph G and its uniformly weighted version.

        ④The formal loss:

\begin{aligned}
&\mathcal{L}_{w}^{\neq}(G) =\frac{1}{k}\sum_{H\in\mathbb{G}_{k}^{\neq}}d_{w}(Z_{G},Z_{H}) \\
&\mathcal{L}_{w}^{\approx}(G) =\frac{1}{k}\sum_{H\in\mathbb{G}_{k}^{\approx}}d_{w}(Z_{G},Z_{H}) \\
&\mathcal{L}_{w}^{=}(G) =d_{w}(Z_{G},Z_{G})
\end{aligned}
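To make the three terms concrete, here is a small sketch of how they could be computed. It is an illustration under several assumptions, not the authors' implementation. The OT distance is swapped for a much simpler stand-in: the Euclidean distance between the w-weighted average of G's node embeddings and the plain average of H's, which is the averaged variant these notes list in the loss ablation as performing worse. The k nearest graphs are re-selected under the current weights, which the notes do not specify. The names `avg_distance`, `coge_terms`, and `coge_objective` and all data are hypothetical. The signs in `coge_objective` follow the formula exactly as written in these notes and should be checked against the paper before being relied on.

```python
import math

def weighted_mean(points, weights):
    """Weighted average of a list of equally sized embedding vectors."""
    dim = len(points[0])
    return [sum(w * p[d] for w, p in zip(weights, points)) for d in range(dim)]

def uniform(n):
    """Uniform weights over n nodes."""
    return [1.0 / n] * n

def avg_distance(w, z_g, z_h):
    """Stand-in for d_w(Z_G, Z_H): distance between the w-weighted mean
    of G's node embeddings and the uniform mean of H's."""
    return math.dist(weighted_mean(z_g, w),
                     weighted_mean(z_h, uniform(len(z_h))))

def coge_terms(w, z_g, different, same, k, dist=avg_distance):
    """Return the three loss terms (L_neq, L_approx, L_eq) for weights w."""
    def knn_mean(graphs):
        # Average distance to the k closest graphs in the given pool.
        nearest = sorted(dist(w, z_g, z_h) for z_h in graphs)[:k]
        return sum(nearest) / len(nearest)
    l_neq = knn_mean(different)   # graphs with a different label
    l_approx = knn_mean(same)     # graphs with the same label
    l_eq = dist(w, z_g, z_g)      # weighted G vs. uniformly weighted G
    return l_neq, l_approx, l_eq

def coge_objective(w, z_g, different, same, k, dist=avg_distance):
    """Combine the terms with the signs used in the formula in these notes."""
    l_neq, l_approx, l_eq = coge_terms(w, z_g, different, same, k, dist)
    return l_neq - l_approx + l_eq

# Hypothetical 2-D node embeddings.
z_g = [(0.0, 0.0), (2.0, 0.0)]
same = [[(0.0, 0.0)], [(0.0, 1.0)]]
different = [[(2.0, 0.0)], [(2.0, 3.0)]]

terms_uniform = coge_terms([0.5, 0.5], z_g, different, same, k=1)
terms_first = coge_terms([1.0, 0.0], z_g, different, same, k=1)
```

With uniform weights the terms are (1.0, 1.0, 0.0), and the third term is zero by construction. Putting all weight on the first node gives (2.0, 0.0, 1.0): the same-label term shrinks to 0, the different-label term grows to 2, and the weighted graph moves 1 away from its uniform version.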


2.5. Experiments

2.5.1. CoGE Implementation

        ①Number of compared graphs: k=10

        ②Optimizer: Adam

        ③Learning rate: 0.1 (0.01 for REDDIT-BINARY)

2.5.2. Qualitative Analysis

        ①Graph classification dataset: MUTAG (4337 chemical molecules) and REDDIT-BINARY (2000 Reddit threads)

        ②GNN: GIN

        ③The most important structure in MUTAG:

where the left panel shows the original graph, the middle a similar graph with the same label, and the right a similar graph with a different label

        ④The most important structure in REDDIT-BINARY:

where the numbers denote node degrees

2.5.3. Quantitative Analysis


        ①Graph classification dataset: CYCLIQ

        ②Goal: measure how many of the x most important edges lie in the cycle or clique
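One way to read this metric is as a precision-at-x over edges. The sketch below assumes that an explainer has already produced one importance score per edge and that the ground-truth motif edges are known. The function name `top_x_accuracy`, the toy graph, and the scores are hypothetical, and how the scores are obtained is outside this sketch.

```python
def top_x_accuracy(edge_scores, motif_edges, x):
    """Fraction of the x highest-scoring edges that lie inside the motif.

    edge_scores: dict mapping an undirected edge (u, v) to its importance.
    motif_edges: iterable of ground-truth edges (the cycle or clique).
    """
    def norm(edge):
        # Undirected edges: (u, v) and (v, u) denote the same edge.
        return tuple(sorted(edge))

    truth = {norm(e) for e in motif_edges}
    ranked = sorted(edge_scores, key=edge_scores.get, reverse=True)[:x]
    if not ranked:
        raise ValueError("x must be positive and edge_scores non-empty")
    hits = sum(1 for e in ranked if norm(e) in truth)
    return hits / len(ranked)

# Hypothetical toy graph: a triangle 0-1-2 (the motif) with a tail 2-3-4.
scores = {(0, 1): 0.9, (1, 2): 0.8, (2, 3): 0.7, (0, 2): 0.4, (3, 4): 0.1}
triangle = [(0, 1), (1, 2), (2, 0)]

acc_top2 = top_x_accuracy(scores, triangle, x=2)
acc_top3 = top_x_accuracy(scores, triangle, x=3)
```

Here the two highest-scoring edges both belong to the triangle, so `acc_top2` is 1.0. The third-ranked edge (2, 3) is part of the tail, so `acc_top3` drops to 2/3.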

(2)Experiment Setup

        ①GNN: GCN with 5 layers

        ②Embedding size: 20

        ③Edge features: NONE

        ④Split: 80%/20% train/test


        ①Performance on CYCLIQ dataset:


        ①Loss ablation:

They also tried the Euclidean distance on the weighted average of the node embeddings (L and Average), which gave worse results

2.6. Conclusion

        They plan to extend the method to node classification

3. Reference

Faber, L., Moghaddam, A. K., & Wattenhofer, R. (2020) 'Contrastive Graph Neural Network Explanation', ICML Workshop on Graph Representation Learning and Beyond. doi: https://doi.org/10.48550/arXiv.2010.13663

