Structure-Aware Transformer for Graph Representation Learning: Brief Notes

This paper introduces a new self-attention mechanism that addresses the Transformer's limited ability to capture structural similarity between nodes: a structure extractor computes a subgraph representation rooted at each node before attention is applied. The resulting structure-aware attention can be dropped in to enhance existing GNN models. The method includes k-subtree and k-subgraph GNN extractors, plus a skip connection that accounts for node degree. Experiments show that structure-aware attention reaches state-of-the-art performance while avoiding over-smoothing and over-squashing.

SAT (ICML 2022)

Motivations

1、Transformers with positional encodings do not necessarily capture structural similarity between nodes (nodes at different positions but with similar local environments and structures should receive similar representations)

2、GNNs suffer from limited expressiveness, over-smoothing, and over-squashing (in practice, ordinary GNNs cannot be made very deep)

3、Proposed remedy: a new self-attention mechanism that extracts a subgraph representation rooted at each node before computing the attention

Contributions

1、reformulate the self-attention mechanism as a kernel smoother (see the formula sketched after this list)

2、automatically generating the subgraph representations

3、making SAT an effortless enhancer of any existing GNN

4、SAT is more interpretable
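
To make the kernel-smoother view concrete (roughly following the paper's formulation, up to notation), vanilla self-attention over node features can be written as a weighted average whose weights come from a non-symmetric exponential kernel on the query/key projections:

$$
\mathrm{Attn}(x_v)=\sum_{u\in V}\frac{\kappa(x_v,x_u)}{\sum_{w\in V}\kappa(x_v,x_w)}\,f(x_u),
\qquad
\kappa(x,x')=\exp\!\left(\frac{\langle W_Q x,\,W_K x'\rangle}{\sqrt{d}}\right),
\qquad
f(x_u)=W_V x_u
$$

i.e. a kernel smoother whose kernel depends only on node features, which is exactly what the structure-aware attention below generalizes.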

Methods

Structure-aware self-attention

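The core change (my paraphrase of the paper's formulation; the exact notation may differ) is that the attention kernel is evaluated on subgraph representations produced by a structure extractor $\varphi(\cdot, G)$ rather than on raw node features:

$$
\kappa_{\mathrm{graph}}(v,u)=\exp\!\left(\frac{\langle W_Q\,\varphi(v,G),\,W_K\,\varphi(u,G)\rangle}{\sqrt{d}}\right)
$$

with the value term and the normalization kept as in standard attention. Two nodes therefore attend to each other strongly only if their local structures, as summarized by $\varphi$, are similar. The structure extractor $\varphi$ is described next.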

Structure extractor

1、k-subtree GNN extractor

run a k-layer GNN on the whole graph and take the output node representation at u as the subgraph representation at u (this effectively encodes the k-hop subtree rooted at u)
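
A minimal PyTorch Geometric style sketch of the k-subtree idea (the class name, the choice of GIN layers, and the hyperparameters are my own illustration, not the paper's exact implementation):

```python
import torch
import torch.nn as nn
from torch_geometric.nn import GINConv  # assumption: GIN as the base GNN


class KSubtreeExtractor(nn.Module):
    """Run k rounds of message passing on the full graph; the output
    embedding of node u then summarizes the k-hop subtree rooted at u."""

    def __init__(self, dim, k=3):
        super().__init__()
        self.layers = nn.ModuleList([
            GINConv(nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim)))
            for _ in range(k)
        ])

    def forward(self, x, edge_index):
        for conv in self.layers:
            x = torch.relu(conv(x, edge_index))
        return x  # row u = subgraph representation of node u
```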

2、k-subgraph GNN extractor

extract the k-hop subgraph centered at u, update the node representations inside it with a GNN, then aggregate the updated representations of all nodes within the k-hop neighborhood using a pooling function
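
A rough sketch of the k-subgraph variant (my own illustration; the use of `k_hop_subgraph` and mean pooling are assumptions about one possible implementation, and a real implementation would batch the per-node loop):

```python
import torch
from torch_geometric.utils import k_hop_subgraph


def k_subgraph_representations(x, edge_index, gnn, k=2):
    """For each node u: extract its k-hop subgraph, run a GNN on it,
    and mean-pool the updated node embeddings into one vector for u."""
    reps = []
    for u in range(x.size(0)):
        nodes, sub_edge_index, _, _ = k_hop_subgraph(
            u, k, edge_index, relabel_nodes=True, num_nodes=x.size(0))
        h = gnn(x[nodes], sub_edge_index)  # update nodes inside the subgraph
        reps.append(h.mean(dim=0))         # pool over the k-hop neighborhood
    return torch.stack(reps)               # [num_nodes, dim]
```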

Structure-aware transformer

1、include the degree factor in the skip-connection, reducing the overwhelming influence of highly connected graph components
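
The notes above only record that a degree factor enters the skip connection; one plausible way to realize this (an assumption for illustration, the paper's exact formula may differ) is to damp the attention update of high-degree nodes:

```python
import torch


def degree_scaled_skip(x, attn_out, deg):
    """Degree-aware skip connection: scale the attention update of each node
    by 1 / sqrt(1 + deg) so that highly connected nodes do not dominate.
    (The scaling factor is an assumption, not necessarily the paper's choice.)"""
    scale = 1.0 / torch.sqrt(1.0 + deg.float()).unsqueeze(-1)  # shape [N, 1]
    return x + scale * attn_out

# deg can be computed with torch_geometric.utils.degree(edge_index[0], num_nodes=N)
```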

Combination with absolute encoding

1、absolute positional encoding is not guaranteed to generate similar node representations even if two nodes have similar local structures

2、subgraph representations used in the structure-aware attention can be tailored to measure the structural similarity between nodes
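
For completeness, combining the model with an absolute encoding simply means injecting a precomputed positional vector into the node features before the structure-aware layers; a small sketch (the random-walk/Laplacian PE choice and the projection layers are my assumptions):

```python
import torch
import torch.nn as nn


class InputWithAbsolutePE(nn.Module):
    """Add a precomputed absolute positional encoding (e.g. a random-walk
    or Laplacian PE) to the node features before the transformer layers."""

    def __init__(self, feat_dim, pe_dim, hidden_dim):
        super().__init__()
        self.feat_proj = nn.Linear(feat_dim, hidden_dim)
        self.pe_proj = nn.Linear(pe_dim, hidden_dim)

    def forward(self, x, pe):
        return self.feat_proj(x) + self.pe_proj(pe)
```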

Conclusions

1、The structure-aware framework achieves SOTA performance

2、k-subtree and k-subgraph SAT improve upon the base GNN

3、incorporating the structure via our structure-aware attention brings a notable improvement

4、a small value of k already leads to good performance, while not suffering from over-smoothing or over-squashing

5、a proper absolute positional encoding and readout method improve performance, but to a much lesser extent than incorporating the structure into the approach

Limitations

SAT suffers from the same drawback as the standard Transformer, namely the quadratic complexity of the self-attention computation
