Paper Reading and Analysis: Graph Attention Networks

KPer_Yang

Category: Machine Learning. Tags: paper reading

Article link: https://blog.csdn.net/KPer_Yang/article/details/128883183

Graph Attention Networks

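Graph attention network (GAT)

By stacking layers in which nodes can attend over the features of their neighborhoods, GAT can (implicitly) assign different weights to different nodes in a neighborhood, without requiring any costly matrix operation (such as inversion) or prior knowledge of the graph structure. This addresses several key challenges of spectral-based graph neural networks at once and makes the model readily applicable to both inductive and transductive problems.

The GAT model achieves or matches state-of-the-art results on four established transductive and inductive graph benchmarks: the Cora, Citeseer, and Pubmed citation network datasets, and a protein-protein interaction dataset (in which the test graphs remain unseen during training).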

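The attention mechanism and its multi-head extension are illustrated in a two-panel figure (image not included here):

Left: the attention mechanism $a(W\vec{h}_i, W\vec{h}_j)$ used by the model, parameterized by a weight vector $\vec{a} \in \mathbb{R}^{2F'}$ and followed by a LeakyReLU activation.

Right: multi-head attention (K = 3 heads) by node 1 on its neighborhood. Different arrow styles and colors denote independent attention computations. The aggregated features from each head are concatenated or averaged to obtain $\vec{h}_1'$.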
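
For reference, the two ways of combining the K heads mentioned in the caption can be written out as follows. This is a restatement in the caption's notation, with $\alpha_{ij}^{k}$ the normalized attention coefficients of head $k$, $W^{k}$ its weight matrix, $\mathcal{N}_i$ the neighborhood of node $i$, and $\sigma$ a nonlinearity. Concatenation yields $KF'$ output features per node and is the usual choice for hidden layers, while averaging keeps $F'$ features and is typically applied in the final prediction layer.

$$
\vec{h}_i' = \Big\Vert_{k=1}^{K} \sigma\Big(\sum_{j \in \mathcal{N}_i} \alpha_{ij}^{k} W^{k} \vec{h}_j\Big)
\qquad \text{(concatenation)}
$$

$$
\vec{h}_i' = \sigma\Big(\frac{1}{K} \sum_{k=1}^{K} \sum_{j \in \mathcal{N}_i} \alpha_{ij}^{k} W^{k} \vec{h}_j\Big)
\qquad \text{(averaging)}
$$
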

t-SNE visualization of the results on the Cora dataset (image not included here).

Implementation: PyTorch Geometric provides an open-source implementation:

torch_geometric.nn — pytorch_geometric documentation (pytorch-geometric.readthedocs.io)

The graph attentional operator from the “Graph Attention Networks” paper:
$\mathbf{x}^{\prime}_i = \alpha_{i,i}\mathbf{\Theta}\mathbf{x}_{i} + \sum_{j \in \mathcal{N}(i)} \alpha_{i,j}\mathbf{\Theta}\mathbf{x}_{j},$
where the attention coefficients $\alpha_{i,j}$ are computed as
$\alpha_{i,j} = \frac{ \exp\left(\mathrm{LeakyReLU}\left(\mathbf{a}^{\top} [\mathbf{\Theta}\mathbf{x}_i \, \Vert \, \mathbf{\Theta}\mathbf{x}_j] \right)\right)} {\sum_{k \in \mathcal{N}(i) \cup \{ i \}} \exp\left(\mathrm{LeakyReLU}\left(\mathbf{a}^{\top} [\mathbf{\Theta}\mathbf{x}_i \, \Vert \, \mathbf{\Theta}\mathbf{x}_k] \right)\right)}.$
If the graph has multi-dimensional edge features $\mathbf{e}_{i,j}$, the attention coefficients $\alpha_{i,j}$ are computed as
$\alpha_{i,j} = \frac{ \exp\left(\mathrm{LeakyReLU}\left(\mathbf{a}^{\top} [\mathbf{\Theta}\mathbf{x}_i \, \Vert \, \mathbf{\Theta}\mathbf{x}_j \, \Vert \, \mathbf{\Theta}_{e} \mathbf{e}_{i,j}]\right)\right)} {\sum_{k \in \mathcal{N}(i) \cup \{ i \}} \exp\left(\mathrm{LeakyReLU}\left(\mathbf{a}^{\top} [\mathbf{\Theta}\mathbf{x}_i \, \Vert \, \mathbf{\Theta}\mathbf{x}_k \, \Vert \, \mathbf{\Theta}_{e} \mathbf{e}_{i,k}]\right)\right)}.$
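
As a rough illustration of the first pair of formulas above (the case without edge features), the following pure-Python sketch computes the attention coefficients and the updated feature vector for a single node with one attention head. The function name `gat_node_update`, the dictionary-based graph representation, and the toy inputs are assumptions made for this example; it is not how PyTorch Geometric implements the layer.

```python
import math


def leaky_relu(value, negative_slope=0.2):
    """LeakyReLU applied to a single scalar."""
    return value if value > 0 else negative_slope * value


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def gat_node_update(i, x, neighbors, theta, a, negative_slope=0.2):
    """Single-head GAT update for node i.

    x maps node id -> input feature list, theta is the shared weight matrix,
    and a is the attention vector of length 2 * (number of rows in theta).
    Returns (alpha, x_i_new), where alpha maps each node in N(i) plus i itself
    to its normalized attention coefficient.
    """
    # Transform the features of node i and of every neighbor: Theta x_j.
    nodes = set(neighbors) | {i}  # the self-loop is always included
    z = {j: matvec(theta, x[j]) for j in nodes}

    # Unnormalized score: LeakyReLU(a^T [Theta x_i || Theta x_j]).
    scores = {}
    for j in nodes:
        concatenated = z[i] + z[j]
        dot = sum(ak * ck for ak, ck in zip(a, concatenated))
        scores[j] = leaky_relu(dot, negative_slope)

    # Softmax over N(i) plus i (shifted by the maximum for numerical stability).
    shift = max(scores.values())
    exp_scores = {j: math.exp(s - shift) for j, s in scores.items()}
    total = sum(exp_scores.values())
    alpha = {j: e / total for j, e in exp_scores.items()}

    # Weighted sum of the transformed features.
    x_i_new = [sum(alpha[j] * z[j][d] for j in nodes) for d in range(len(theta))]
    return alpha, x_i_new


# Toy graph: node 0 with neighbors 1 and 2, identity weight matrix.
features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
identity = [[1.0, 0.0], [0.0, 1.0]]
alpha, updated = gat_node_update(0, features, [1, 2], identity, [0.0, 0.0, 1.0, 0.0])
print(alpha, updated)
```

With an all-zero attention vector every score is equal, so the coefficients become uniform and the update reduces to a plain mean over the node and its neighbors.
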
PARAMETERS

in_channels (int or tuple) – Size of each input sample, or -1 to derive the size from the first input(s) to the forward method. A tuple corresponds to the sizes of source and target dimensionalities.
out_channels (int) – Size of each output sample.
heads (int, optional) – Number of multi-head-attentions. (default: 1)
concat (bool, optional) – If set to False, the multi-head attentions are averaged instead of concatenated. (default: True)
negative_slope (float, optional) – LeakyReLU angle of the negative slope. (default: 0.2)
dropout (float, optional) – Dropout probability of the normalized attention coefficients which exposes each node to a stochastically sampled neighborhood during training. (default: 0)
add_self_loops (bool, optional) – If set to False, will not add self-loops to the input graph. (default: True)
edge_dim (int, optional) – Edge feature dimensionality (in case there are any). (default: None)
fill_value (float or Tensor or str, optional) – The way to generate edge features of self-loops (in case edge_dim != None). If given as float or torch.Tensor, edge features of self-loops will be directly given by fill_value. If given as str, edge features of self-loops are computed by aggregating all features of edges that point to the specific node, according to a reduce operation. ("add", "mean", "min", "max", "mul"). (default: "mean")
bias (bool, optional) – If set to False, the layer will not learn an additive bias. (default: True)

**kwargs (optional) – Additional arguments of conv.MessagePassing.
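
The `fill_value` description is the least obvious entry in the list above, so here is a small pure-Python sketch of what the string options `"mean"` and `"add"` amount to: for every node, the edge features of all incoming edges are reduced to a single feature vector that is then used for that node's self-loop. The helper name `self_loop_edge_features` and the list-based edge representation are hypothetical, and giving zeros to nodes without incoming edges is an assumption of this sketch.

```python
from collections import defaultdict


def self_loop_edge_features(edge_index, edge_attr, num_nodes, reduce="mean"):
    """Compute one self-loop edge feature vector per node.

    edge_index is a pair (sources, targets) of equally long lists, and
    edge_attr holds one feature list per edge, in the same order.
    """
    _sources, targets = edge_index
    dim = len(edge_attr[0])

    # Group edge features by the node each edge points to.
    incoming = defaultdict(list)
    for target, feature in zip(targets, edge_attr):
        incoming[target].append(feature)

    loops = []
    for node in range(num_nodes):
        feats = incoming.get(node)
        if not feats:
            # Assumption: nodes without incoming edges get an all-zero vector.
            loops.append([0.0] * dim)
        elif reduce == "mean":
            loops.append([sum(col) / len(feats) for col in zip(*feats)])
        elif reduce == "add":
            loops.append([sum(col) for col in zip(*feats)])
        else:
            raise ValueError(f"unsupported reduce operation: {reduce!r}")
    return loops


# Edges 0->1, 1->1 and 2->0, each with a one-dimensional edge feature.
loops = self_loop_edge_features(([0, 1, 2], [1, 1, 0]), [[1.0], [3.0], [5.0]], 3)
print(loops)
```
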

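    from typing import Optional, Tuple, Union

    from torch import Tensor
    from torch_geometric.nn import MessagePassing
    from torch_geometric.typing import Adj, OptPairTensor, OptTensor, Size


    class GATConv(MessagePassing):
        # Signature excerpt only; the method bodies are omitted here.
        def __init__(
            self,
            in_channels: Union[int, Tuple[int, int]],
            out_channels: int,
            heads: int = 1,
            concat: bool = True,
            negative_slope: float = 0.2,
            dropout: float = 0.0,
            add_self_loops: bool = True,
            edge_dim: Optional[int] = None,
            fill_value: Union[float, Tensor, str] = 'mean',
            bias: bool = True,
            **kwargs,
        ):
            ...

        def forward(self, x: Union[Tensor, OptPairTensor], edge_index: Adj,
                    edge_attr: OptTensor = None, size: Size = None,
                    return_attention_weights=None):
            ...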
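
To make the effect of `heads` and `concat` on the output size concrete, here is a minimal pure-Python sketch that combines the per-head outputs of one node. With `concat=True` the result has `heads * out_channels` entries, and with `concat=False` the heads are averaged and the result has `out_channels` entries. The function `combine_heads` is a hypothetical helper written for this illustration, not part of the library.

```python
def combine_heads(head_outputs, concat=True):
    """Combine per-head feature vectors of a single node.

    head_outputs is a list with one feature list per attention head,
    all of the same length (out_channels).
    """
    if concat:
        # Concatenate: the output length is heads * out_channels.
        return [value for head in head_outputs for value in head]
    # Average element-wise: the output length stays out_channels.
    num_heads = len(head_outputs)
    return [sum(values) / num_heads for values in zip(*head_outputs)]


# Three heads with out_channels = 2.
per_head = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
concatenated = combine_heads(per_head, concat=True)
averaged = combine_heads(per_head, concat=False)
print(len(concatenated), len(averaged))
```
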