[Paper Close Reading] Polarized Graph Neural Networks

Paper link: Polarized Graph Neural Networks | Proceedings of the ACM Web Conference 2022

The English is typed entirely by hand! It is my summarizing and paraphrasing of the original paper. Unavoidable spelling and grammar mistakes may appear; if you spot any, feel free to point them out in the comments. This post leans toward personal notes, so read with discretion.

Table of Contents

1. TL;DR

1.1. Takeaways

2. Section-by-Section Reading

2.1. Abstract

2.2. Introduction

2.3. Related work

2.4. Preliminaries

2.4.1. Problem formulation

2.4.2. Homophily and heterophily in graphs

2.4.3. Anonymous walks

2.5. Method

2.5.1. Measurement of similarities and dissimilarities

2.5.2. Polarization on the hyper-spherical space

2.5.3. Spherical classification loss

2.6. Experiment

2.6.1. Experimental setup

2.6.2. Node classification on different types of networks

2.6.3. Model analysis

2.6.4. Visualization

2.7. Conclusion

3. Reference


1. TL;DR

1.1. Takeaways

(1) A fairly easy-to-follow paper, but it is a bit sad that no code is released

2. Section-by-Section Reading

2.1. Abstract

        ① AGAIN~ message passing only suits homophilic information propagation

        ② They propose the Polarized Graph Neural Network (Polar-GNN)

2.2. Introduction

        ① The difference in edge generation between homophilic and heterophilic graphs:

        ② Heterophilic information also promotes polarization (in that case, wouldn't seeing similar opinions reinforce a person's view, and seeing dissimilar ones reinforce it too, so good and bad are both treated as reinforcement? Honestly hard to judge: some people grow firmer when they see things they dislike, while others gradually come around. Computer models probably shouldn't copy any single psychological theory wholesale; there is no settled answer here)

        ③ Challenges: mining similar and dissimilar unlabeled nodes, and aggregating messages discriminatively

2.3. Related work

       ① Introduces homophilic and heterophilic GNNs, spectral methods, and contrastive learning methods

2.4. Preliminaries

2.4.1. Problem formulation

        ① Graph representation: G=\left \{ V, E \right \}, where v_i \in V is a node and e_{ij}=\left ( v_i,v_j \right ) \in E is an edge

        ②Each node is associated with a feature vector x_{i}\in\mathbb{R}^{d}

        ③ Each node belongs to a class with label y_i

        ④ In this semi-supervised setting, only a few nodes are labeled

2.4.2. Homophily and heterophily in graphs

        ① Definition 1: the edge homophily ratio:

h=\frac{\left|\left\{(v_i,v_j)\mid(v_i,v_j)\in E \wedge y_i=y_j\right\}\right|}{\left| E \right|}

where the numerator counts the intra-class edges and | E | denotes the total number of edges

        ② Homophily ratio: h\rightarrow 1 for strongly homophilic graphs, h\rightarrow 0 for strongly heterophilic graphs
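Definition 1 can be computed directly from an edge list; a minimal sketch (toy graph, not from the paper):

```python
def edge_homophily(edges, labels):
    """Fraction of edges whose two endpoints share the same label (Definition 1)."""
    same = sum(1 for i, j in edges if labels[i] == labels[j])
    return same / len(edges)

# Toy graph: 4 nodes with labels [0, 0, 1, 1]
labels = [0, 0, 1, 1]
edges = [(0, 1), (1, 2), (2, 3)]  # two intra-class edges, one inter-class
print(edge_homophily(edges, labels))  # 2/3
```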

2.4.3. Anonymous walks

        ① An anonymous walk is similar to a random walk but removes the specific identities of nodes:
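A walk is anonymized by relabeling nodes in order of first appearance, e.g. (a, b, a, c, b) becomes (0, 1, 0, 2, 1); a minimal sketch:

```python
def to_anonymous_walk(walk):
    """Relabel nodes by order of first appearance, discarding their identities."""
    first_seen = {}
    anon = []
    for node in walk:
        if node not in first_seen:
            first_seen[node] = len(first_seen)  # next unused position index
        anon.append(first_seen[node])
    return anon

print(to_anonymous_walk(["a", "b", "a", "c", "b"]))  # [0, 1, 0, 2, 1]
```

Two walks over different nodes map to the same anonymous walk whenever they revisit nodes in the same pattern, which is what lets the walk distribution serve as a structural signature.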

2.5. Method

        ①Overall framework:

2.5.1. Measurement of similarities and dissimilarities

        ① Taking GAT as an example:

e_{ij}=\mathrm{Attention}(Wh_i,Wh_j),\qquad \alpha_{ij}=\mathrm{softmax}_{j}(e_{ij})=\frac{\exp(e_{ij})}{\sum_{k\in N_{i}}\exp(e_{ik})}

        ② They introduce negative weights for dissimilar node pairs:

\alpha_{ij}^{h}=f\left ( w_1\odot h_i,w_1\odot h_j \right )

where \odot denotes the Hadamard product and f is a mapping function \mathbb{R}^d\times\mathbb{R}^d\to[-1,1]; they choose cosine similarity

        ③ To capture node structure, they sample anonymous-walk sequences of length l for each node v_i; the distribution p_i of these sequences is regarded as the structure of node v_i

        ④The structural similarity:

\alpha_{ij}^p=f(w_2\odot p_i,w_2\odot p_j)

        ⑤ Combined aggregation weight:

\alpha_{ij}=\lambda\alpha_{ij}^{h}+(1-\lambda)\alpha_{ij}^{p}
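A sketch of computing this combined weight, assuming fixed (rather than learned) masks w_1, w_2 and cosine similarity as f; all vectors here are illustrative:

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity, the paper's choice of f: R^d x R^d -> [-1, 1]."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def aggregation_weight(h_i, h_j, p_i, p_j, w1, w2, lam=0.5):
    """alpha_ij = lam * alpha^h (feature term) + (1 - lam) * alpha^p (structure term)."""
    alpha_h = cosine(w1 * h_i, w1 * h_j)  # masked feature similarity
    alpha_p = cosine(w2 * p_i, w2 * p_j)  # anonymous-walk distribution similarity
    return lam * alpha_h + (1 - lam) * alpha_p
```

With identical features and walk distributions the weight is 1; an opposite feature pair against an identical structure pair averages to 0 at lam = 0.5, so the sign of the weight reflects which signal dominates.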

2.5.2. Polarization on the hyper-spherical space

        ① Aggregation over neighbors:

h_{i}=\sum_{j\in N_{i}}\alpha_{ij}h_{j}=\sum_{j\in N_{i}^{+}}|\alpha_{ij}|h_{j}-\sum_{j\in N_{i}^{-}}|\alpha_{ij}|h_{j}
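A minimal sketch of this signed aggregation (toy numbers, not from the paper): positive weights pull the representation toward similar neighbors, negative weights push it away from dissimilar ones.

```python
import numpy as np

def polarized_aggregate(alphas, neighbor_feats):
    """h_i = sum_j alpha_ij * h_j; positive alphas attract, negative alphas repel."""
    alphas = np.asarray(alphas, dtype=float)[:, None]  # column of signed weights
    H = np.asarray(neighbor_feats, dtype=float)        # one neighbor feature per row
    return (alphas * H).sum(axis=0)

# Two similar neighbors pull toward [1, 0]; one dissimilar neighbor pushes away from [0, 1].
h = polarized_aggregate([0.8, 0.7, -0.5], [[1, 0], [1, 0], [0, 1]])
print(h)  # [1.5, -0.5]
```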

        ② The number of neighbors affects the norm of the aggregated feature

        ③The authors have demonstrated that for a specific node representation, during a gradient descent update, if its norm is relatively large, then the change in its direction will be comparatively small.
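A back-of-envelope sketch of why this holds (my reasoning, not the paper's exact proof): decompose the gradient into components along and orthogonal to the representation, and the angular change shrinks as the norm grows.

```latex
% Gradient step: h' = h - \eta g, with g = g_\parallel + g_\perp split
% along and orthogonal to h. The direction of h rotates by \Delta\theta with
\tan(\Delta\theta) = \frac{\eta\,\lVert g_\perp\rVert}{\lVert h\rVert - \eta\,\lVert g_\parallel\rVert},
% so for a fixed gradient and step size, a larger norm \lVert h\rVert
% yields a smaller angular change \Delta\theta: the direction barely moves.
```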

        ④ They project embeddings onto a hyper-sphere and design a polarized objective function:

\begin{aligned}L(\boldsymbol{h})&=\sum_{i=1}^{N}\left\|\boldsymbol{h}_{i}-\boldsymbol{h}_{i}^{*}\right\|^{2}+\sum_{i=1}^{N}\sum_{j\in N_{i}^{+}}\left|\alpha_{ij}\right|\left\|\boldsymbol{h}_{i}-\boldsymbol{h}_{j}\right\|^{2}\\&-\sum_{i=1}^{N}\sum_{j\in N_{i}^{-}}|\alpha_{ij}|\left\|\boldsymbol{h}_{i}-\boldsymbol{h}_{j}\right\|^{2},\end{aligned}

minimizing the distance between the initial and target states for stability, minimizing the distance between identified similar node pairs, and conversely, maximizing the distance between dissimilar node pairs
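A sketch of evaluating this objective, assuming precomputed signed weights in a dict keyed by node pair (the data structures are hypothetical, not the paper's implementation):

```python
import numpy as np

def polarization_loss(H, H_star, weights, neighbors):
    """Polarized objective: stability term, plus attraction over positive-weight
    pairs, minus repulsion over negative-weight pairs."""
    H, H_star = np.asarray(H, float), np.asarray(H_star, float)
    loss = np.sum((H - H_star) ** 2)          # keep h_i near its initial state
    for i, nbrs in neighbors.items():
        for j in nbrs:
            a = weights[(i, j)]               # signed alpha_ij
            d2 = np.sum((H[i] - H[j]) ** 2)
            # sign(a) * |a| * d2 == a * d2: attracts if a > 0, repels if a < 0
            loss += a * d2
    return float(loss)
```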

        ⑤ Algorithm of Polar-GNN:

2.5.3. Spherical classification loss

        ① Standard cross-entropy loss:

L=-\frac{1}{N}\sum_{i=1}^{N}\log\frac{\exp\left(w_{y_{i}}^{\top}h_{i}+b_{y_{i}}\right)}{\sum_{k=1}^{K}\exp\left(w_{k}^{\top}h_{i}+b_{k}\right)}

        ② By fixing b=0 and constraining \left \| w_k \right \|=1 via L_2 normalization (with the embeddings already on the unit sphere), the cross entropy becomes:

L=-\frac{1}{N}\sum_{i=1}^{N}\log\frac{\exp(\cos\theta_{iy_{i}})}{\sum_{k=1}^{K}\exp(\cos\theta_{ik})}

where \theta _{ik} denotes the angle between h_i and w_k

        ③ They add an additive angular penalty m:

L=-\frac{1}{N}\sum_{i=1}^{N}\log\frac{\exp\left(\cos(\theta_{iy_{i}}+m)\right)}{\exp\left(\cos(\theta_{iy_{i}}+m)\right)+\sum_{k\neq y_{i}}\exp(\cos\theta_{ik})}

        ④The following figure shows the effect of the penalty term on distance approximation:
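A minimal NumPy sketch of this spherical margin loss, assuming row-wise embeddings H and class vectors W that get unit-normalized inside; the margin value is illustrative:

```python
import numpy as np

def spherical_margin_loss(H, W, y, m=0.2):
    """Cross entropy on the unit sphere with additive angular penalty m on the
    target class: the target logit is cos(theta_{i,y_i} + m), others cos(theta_ik)."""
    H = H / np.linalg.norm(H, axis=1, keepdims=True)   # normalize embeddings
    W = W / np.linalg.norm(W, axis=1, keepdims=True)   # normalize class vectors (b = 0)
    cos = H @ W.T                                       # cos(theta_ik), shape (N, K)
    theta = np.arccos(np.clip(cos, -1.0, 1.0))
    rows = np.arange(len(y))
    logits = cos.copy()
    logits[rows, y] = np.cos(theta[rows, y] + m)        # penalize the target angle
    logz = np.log(np.exp(logits).sum(axis=1))           # log partition per node
    return float(np.mean(logz - logits[rows, y]))       # mean negative log-likelihood
```

With m = 0 this reduces to softmax cross entropy over cosines; a positive m shrinks the target logit, forcing a larger angular gap before the loss is satisfied.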

2.6. Experiment

2.6.1. Experimental setup

(1)Dataset

        ①Homophily datasets: Cora, Pubmed, Photo and Facebook

        ②Heterophily datasets: Chameleon, Squirrel, Actor and Twitch

(2)Baselines

        ①GCN and GraphSAGE

        ②GAT and APPNP

        ③Geom-GCN

        ④FAGCN and GPRGNN

(3)Settings

        ①Training: 20 nodes per class for training

        ②Testing: 1000 nodes

        ③Validation: another 500 nodes 

        ④Epoch: 500

2.6.2. Node classification on different types of networks

        ①Model performance:

2.6.3. Model analysis

(1)Ablation studies

        ①Module ablation study on Chameleon, with

- Polar-GNN-Feat: only incorporates node features when calculating the aggregation weights
- Polar-GNN-Struct: only uses topological structure when calculating weights
- Polar-GNN-NoNorm: Polar-GNN variant without normalization to a hyper-sphere

they got

        ②Penalty ablation:

(2)Alleviating over-smoothing

        ①GNN layer ablation:

2.6.4. Visualization

(1)Aggregation weights

        ①Weight aggregation on Chameleon:

(2)Node embeddings-how negative weights help

        ①Node embedding on Chameleon:

2.7. Conclusion

        The polarization is similar to constrained optimization

3. Reference

Fang, Z. et al. (2022) 'Polarized Graph Neural Networks', WWW. doi: https://doi.org/10.1145/3485447.3512187 
