The GCN section covered how the Cora dataset is processed and what the loader returns:
- features: the papers' attribute features, with shape $2708 \times 1433$, row-normalized so that each paper's feature values sum to 1.
- labels: each paper's class id, 0-6.
- adj: the adjacency matrix, with shape $2708 \times 2708$.
- idx_train: 0-139
- idx_val: 200-499
- idx_test: 500-1499
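To make these shapes concrete, here is a minimal sketch with fabricated stand-in tensors (not the repo's actual loader):

import torch

# Hedged sketch: fabricate Cora-shaped tensors with the properties listed above.
N, F_in, C = 2708, 1433, 7
features = torch.rand(N, F_in)
features = features / features.sum(dim=1, keepdim=True)  # row-normalize: each paper's features sum to 1
labels = torch.randint(0, C, (N,))                        # class ids 0-6
adj = torch.eye(N)                                        # stand-in adjacency matrix, 2708 x 2708
idx_train, idx_val, idx_test = torch.arange(0, 140), torch.arange(200, 500), torch.arange(500, 1500)
print(features.shape, adj.shape)  # torch.Size([2708, 1433]) torch.Size([2708, 2708])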
This section walks through the GAT model:
The GAT model
model:
import torch
import torch.nn as nn
import torch.nn.functional as F

class GAT(nn.Module):
    def __init__(self, nfeat, nhid, nclass, dropout, alpha, nheads):
        """Dense version of GAT."""
        super(GAT, self).__init__()
        self.dropout = dropout
        # first layer: nheads independent attention heads
        self.attentions = [GraphAttentionLayer(nfeat, nhid, dropout=dropout, alpha=alpha, concat=True) for _ in range(nheads)]
        for i, attention in enumerate(self.attentions):
            self.add_module('attention_{}'.format(i), attention)
        # second (final) attention layer
        self.out_att = GraphAttentionLayer(nhid * nheads, nclass, dropout=dropout, alpha=alpha, concat=False)

    def forward(self, x, adj):
        x = F.dropout(x, self.dropout, training=self.training)
        x = torch.cat([att(x, adj) for att in self.attentions], dim=1)  # concatenate the heads' outputs
        x = F.dropout(x, self.dropout, training=self.training)
        x = F.elu(self.out_att(x, adj))  # second (output) attention layer
        return F.log_softmax(x, dim=1)
layers:
class GraphAttentionLayer(nn.Module):
    """
    Simple GAT layer, similar to https://arxiv.org/abs/1710.10903
    """
    def __init__(self, in_features, out_features, dropout, alpha, concat=True):
        super(GraphAttentionLayer, self).__init__()
        self.dropout = dropout
        self.in_features = in_features
        self.out_features = out_features
        self.alpha = alpha
        self.concat = concat
        self.W = nn.Parameter(torch.empty(size=(in_features, out_features)))
        nn.init.xavier_uniform_(self.W.data, gain=1.414)
        self.a = nn.Parameter(torch.empty(size=(2*out_features, 1)))  # applied to concat(V, NeigV)
        nn.init.xavier_uniform_(self.a.data, gain=1.414)
        self.leakyrelu = nn.LeakyReLU(self.alpha)

    def forward(self, h, adj):
        Wh = torch.mm(h, self.W)  # h.shape: (N, in_features), Wh.shape: (N, out_features)
        # pair every node with every node: a_input.shape = (N, N, 2*out_features)
        a_input = self._prepare_attentional_mechanism_input(Wh)
        # a_input.shape = (2708, 2708, 16), self.a.shape = (16, 1), so
        # torch.matmul(a_input, self.a).shape = (2708, 2708, 1);
        # squeeze(2) drops the last dimension, giving e.shape = (2708, 2708)
        e = self.leakyrelu(torch.matmul(a_input, self.a).squeeze(2))
        # e scores every node against every node, but only the coefficients of
        # connected nodes are needed, so mask the rest
        zero_vec = -9e15*torch.ones_like(e)
        # where there is no edge (adj == 0), replace the score with -9e15 (~ -inf)
        attention = torch.where(adj > 0, e, zero_vec)
        attention = F.softmax(attention, dim=1)  # row-wise softmax: each row sums to 1
        attention = F.dropout(attention, self.dropout, training=self.training)
        h_prime = torch.matmul(attention, Wh)  # aggregate the neighbors' features
        if self.concat:
            return F.elu(h_prime)  # ELU activation
        else:
            return h_prime

    def _prepare_attentional_mechanism_input(self, Wh):
        N = Wh.size()[0]  # number of nodes
        # Below, two matrices are created that contain embeddings in their rows in different orders.
        # (e stands for embedding)
        # These are the rows of the first matrix (Wh_repeated_in_chunks):
        # e1, e1, ..., e1, e2, e2, ..., e2, ..., eN, eN, ..., eN
        # '-------------' -> N times  '-------------' -> N times  '-------------' -> N times
        #
        # These are the rows of the second matrix (Wh_repeated_alternating):
        # e1, e2, ..., eN, e1, e2, ..., eN, ..., e1, e2, ..., eN
        # '----------------------------------------------------' -> N times
        #
        Wh_repeated_in_chunks = Wh.repeat_interleave(N, dim=0)  # repeat each row N times
        Wh_repeated_alternating = Wh.repeat(N, 1)               # tile the whole matrix N times
        # Wh_repeated_in_chunks.shape == Wh_repeated_alternating.shape == (N * N, out_features)
        # The all_combinations_matrix, created below, will look like this (|| denotes concatenation):
        # e1 || e1
        # e1 || e2
        # e1 || e3
        # ...
        # e1 || eN
        # e2 || e1
        # e2 || e2
        # e2 || e3
        # ...
        # e2 || eN
        # ...
        # eN || e1
        # eN || e2
        # eN || e3
        # ...
        # eN || eN
        all_combinations_matrix = torch.cat([Wh_repeated_in_chunks, Wh_repeated_alternating], dim=1)
        # all_combinations_matrix.shape == (N * N, 2 * out_features)
        return all_combinations_matrix.view(N, N, 2 * self.out_features)

    def __repr__(self):
        return self.__class__.__name__ + ' (' + str(self.in_features) + ' -> ' + str(self.out_features) + ')'
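As a quick sanity check, a single layer can be run standalone; this is a hedged sketch with made-up inputs, using a small node count so the dense N × N × 2F tensor stays cheap:

# Assumes the GraphAttentionLayer class above (and its torch imports) are in scope.
layer = GraphAttentionLayer(in_features=1433, out_features=8, dropout=0.6, alpha=0.2, concat=True)
h = torch.rand(100, 1433)   # 100 toy nodes instead of 2708
adj = torch.eye(100)        # stand-in adjacency: self-loops only
out = layer(h, adj)
print(layer)                # GraphAttentionLayer (1433 -> 8)
print(out.shape)            # torch.Size([100, 8])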
Initializing the model
model = GAT(nfeat=1433,
            nhid=8,
            nclass=7,
            dropout=0.6,
            nheads=8,
            alpha=0.2)
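With the classes above in scope, a forward pass can be sketched on fabricated inputs (the tensors below are made up; only the feature dimension 1433 must match):

# Hedged usage sketch: one forward pass with 100 toy nodes.
x = torch.rand(100, 1433)
adj = torch.eye(100)
out = model(x, adj)
print(out.shape)   # torch.Size([100, 7]) -- log-probabilities over the 7 classes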
Building the attention layers:
self.dropout = 0.6
self.attentions = [GraphAttentionLayer(nfeat, nhid, dropout=dropout, alpha=alpha, concat=True) for _ in range(nheads)]
for i, attention in enumerate(self.attentions):
    self.add_module('attention_{}'.format(i), attention)
self.out_att = GraphAttentionLayer(nhid * nheads, nclass, dropout=dropout, alpha=alpha, concat=False)  # second (final) attention layer
attentions and out_att
First the attentions list is built; it holds 8 GraphAttentionLayer instances, and each GraphAttentionLayer is initialized as follows:
def __init__(self, in_features, out_features, dropout, alpha, concat=True):
    super(GraphAttentionLayer, self).__init__()
    self.dropout = 0.6
    self.in_features = 1433
    self.out_features = 8
    self.alpha = 0.2
    self.concat = True
    self.W = nn.Parameter(torch.empty(size=(1433, 8)))
    nn.init.xavier_uniform_(self.W.data, gain=1.414)  # initialize W
    self.a = nn.Parameter(torch.empty(size=(16, 1)))  # applied to concat(V, NeigV); 2*out_features = 16
    nn.init.xavier_uniform_(self.a.data, gain=1.414)  # initialize a
    self.leakyrelu = nn.LeakyReLU(0.2)
The parameter W has shape $W_{1433 \times 8}$, and the parameter a has shape $a_{16 \times 1}$.
out_att
out_att is similar to the attention heads; the differences are that out_att is a single GraphAttentionLayer and its parameters differ:
def __init__(self, in_features, out_features, dropout, alpha, concat=True):
    super(GraphAttentionLayer, self).__init__()
    self.dropout = 0.6
    self.in_features = 64
    self.out_features = 7
    self.alpha = 0.2
    self.concat = False
    self.W = nn.Parameter(torch.empty(size=(64, 7)))
    nn.init.xavier_uniform_(self.W.data, gain=1.414)  # initialize W
    self.a = nn.Parameter(torch.empty(size=(14, 1)))  # applied to concat(V, NeigV); 2*out_features = 14
    nn.init.xavier_uniform_(self.a.data, gain=1.414)  # initialize a
    self.leakyrelu = nn.LeakyReLU(0.2)
The parameter W has shape $W_{64 \times 7}$, and the parameter a has shape $a_{14 \times 1}$.
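Both sets of shapes can be confirmed by listing the registered parameters of the model instance created above (a sketch, assuming that instance is in scope):

# Print every registered parameter and its shape.
for name, p in model.named_parameters():
    print(name, tuple(p.shape))
# attention_0.W (1433, 8), attention_0.a (16, 1), ... up to attention_7,
# then out_att.W (64, 7) and out_att.a (14, 1)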
Running the model: forward
First the model's forward method is executed:
def forward(self, x, adj):
    x = F.dropout(x, self.dropout, training=self.training)
    x = torch.cat([att(x, adj) for att in self.attentions], dim=1)  # concatenate the heads' outputs
    x = F.dropout(x, self.dropout, training=self.training)
    x = F.elu(self.out_att(x, adj))  # second (output) attention layer
    return F.log_softmax(x, dim=1)
1. F.dropout is applied, dropping the input features with dropout = 0.6.
2. self.attentions is iterated over; each att runs the forward method of GraphAttentionLayer: the call att(x, adj) passes the feature matrix x and the adjacency matrix adj into forward.
3. Inside GraphAttentionLayer's forward:
def forward(self, h, adj):
    Wh = torch.mm(h, self.W)  # h.shape: (N, in_features), Wh.shape: (N, out_features)
    a_input = self._prepare_attentional_mechanism_input(Wh)  # pair every node with every node: (N, N, 2*out_features)
    # a_input.shape = (2708, 2708, 16), self.a.shape = (16, 1);
    # torch.matmul(a_input, self.a).shape = (2708, 2708, 1); squeeze(2) drops the last dimension -> (2708, 2708)
    e = self.leakyrelu(torch.matmul(a_input, self.a).squeeze(2))
    # e scores every node against every node; only connected nodes' coefficients are needed
    zero_vec = -9e15*torch.ones_like(e)
    attention = torch.where(adj > 0, e, zero_vec)  # entries with no edge (adj == 0) become ~ -inf
    attention = F.softmax(attention, dim=1)  # row-wise softmax: each row sums to 1
    attention = F.dropout(attention, self.dropout, training=self.training)
    h_prime = torch.matmul(attention, Wh)  # aggregate the neighbors' features
    if self.concat:
        return F.elu(h_prime)  # ELU activation
    else:
        return h_prime
4. The feature matrix h is multiplied by the first attention head's weight matrix W, giving $w_h = x_{2708 \times 1433} \times W_{1433 \times 8}$; the result $w_h$ has shape $2708 \times 8$. Then self._prepare_attentional_mechanism_input(Wh) is executed.
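In toy form, step 4 is a single matrix product (tensors fabricated to the shapes above):

import torch

h = torch.rand(2708, 1433)
W = torch.rand(1433, 8)
Wh = torch.mm(h, W)   # (N, in_features) @ (in_features, out_features)
print(Wh.shape)       # torch.Size([2708, 8])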
5. Executing self._prepare_attentional_mechanism_input(Wh):
def _prepare_attentional_mechanism_input(self, Wh):
    N = Wh.size()[0]  # number of nodes
    Wh_repeated_in_chunks = Wh.repeat_interleave(N, dim=0)  # repeat each row N times; shape: [2708*2708, 8]
    Wh_repeated_alternating = Wh.repeat(N, 1)               # tile the whole matrix N times
    all_combinations_matrix = torch.cat([Wh_repeated_in_chunks, Wh_repeated_alternating], dim=1)
    return all_combinations_matrix.view(N, N, 2 * self.out_features)
Suppose Wh[0] holds the following:
Wh[0]
tensor([-0.0118, -0.0033, -0.0051, 0.0151, -0.0151, 0.0186, -0.0097, 0.0387],
       grad_fn=<SelectBackward>)
Then after Wh_repeated_in_chunks = Wh.repeat_interleave(N, dim=0), Wh_repeated_in_chunks has shape $2708 * 2708 \times 8$: rows Wh_repeated_in_chunks[0] through Wh_repeated_in_chunks[2707] all equal Wh[0], rows Wh_repeated_in_chunks[2708] through Wh_repeated_in_chunks[2707+2708] all equal Wh[1], and so on:
# e1, e1, ..., e1, e2, e2, ..., e2, ..., eN, eN, ..., eN
# '-------------' -> N times  '-------------' -> N times  '-------------' -> N times
After Wh_repeated_alternating = Wh.repeat(N, 1), Wh_repeated_alternating has shape $2708 * 2708 \times 8$: rows Wh_repeated_alternating[0] through Wh_repeated_alternating[2707] equal Wh[0] through Wh[2707], rows Wh_repeated_alternating[2708] through Wh_repeated_alternating[2707+2708] equal Wh[0] through Wh[2707] again, and so on:
# These are the rows of the second matrix (Wh_repeated_alternating):
# e1, e2, ..., eN, e1, e2, ..., eN, ..., e1, e2, ..., eN
# '----------------------------------------------------' -> N times
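The two row orders are easiest to see on a tiny fabricated example with N = 3 and out_features = 2:

import torch

Wh = torch.tensor([[1., 1.], [2., 2.], [3., 3.]])  # rows e1, e2, e3
print(Wh.repeat_interleave(3, dim=0))  # e1,e1,e1, e2,e2,e2, e3,e3,e3  (chunks)
print(Wh.repeat(3, 1))                 # e1,e2,e3, e1,e2,e3, e1,e2,e3  (alternating)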
all_combinations_matrix = torch.cat([Wh_repeated_in_chunks, Wh_repeated_alternating], dim=1) concatenates Wh_repeated_in_chunks with Wh_repeated_alternating along the feature dimension, giving shape $[2708 * 2708, 16]$, in the following form:
# e1 || e1
# e1 || e2
# e1 || e3
# ...
# e1 || eN
# e2 || e1
# e2 || e2
# e2 || e3
# ...
# e2 || eN
# ...
# eN || e1
# eN || e2
# eN || e3
# ...
# eN || eN
The returned a_input is reshaped to $[2708, 2708, 16]$ and is laid out as follows: a_input[0] holds node 0's features concatenated with every node's features, a_input[1] holds node 1's features concatenated with every node's features, and so on.
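Continuing the N = 3 toy example from above, the concatenation and view produce exactly this indexing:

# a_input[i][j] is e_(i+1) || e_(j+1)
pairs = torch.cat([Wh.repeat_interleave(3, dim=0), Wh.repeat(3, 1)], dim=1)
a_input = pairs.view(3, 3, 4)   # (N, N, 2*out_features)
print(a_input[0, 1])            # tensor([1., 1., 2., 2.]) -> e1 || e2
print(a_input[2, 0])            # tensor([3., 3., 1., 1.]) -> e3 || e1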
6. e = self.leakyrelu(torch.matmul(a_input, self.a).squeeze(2)) computes the attention scores: a_input has shape $[2708, 2708, 16]$ and self.a has shape $[16, 1]$, so their product has shape $[2708, 2708, 1]$, one score for every node paired with every other node; squeeze(2) removes the last dimension, leaving shape $[2708, 2708]$.
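A shape check on the toy tensors (the vector a below is random and merely stands in for self.a):

a = torch.rand(4, 1)                       # plays the role of self.a; 2*out_features = 4
e = torch.matmul(a_input, a).squeeze(2)    # (3, 3, 4) @ (4, 1) -> (3, 3, 1) -> (3, 3)
print(e.shape)                             # torch.Size([3, 3])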
7. zero_vec = -9e15*torch.ones_like(e): the scores computed so far cover every node against every node, but only the coefficients between connected nodes are needed, so a matrix zero_vec with the same shape as e is created, filled with a very large negative value.
8. attention = torch.where(adj > 0, e, zero_vec) keeps e wherever an edge exists and replaces entries with no edge (adj == 0) with the near-negative-infinity value, producing an attention matrix with the same shape as the adjacency matrix; each entry is the score between one node and another.
9. attention = F.softmax(attention, dim=1) applies a row-wise softmax, turning each row into a probability distribution over that node's neighbors; the masked entries become essentially 0.
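Steps 7-9 can be replayed on the toy tensors; the adjacency below is made up, and the point is that masked entries vanish after the softmax:

adj = torch.tensor([[1., 1., 0.],
                    [1., 1., 1.],
                    [0., 1., 1.]])
zero_vec = -9e15 * torch.ones_like(e)
attention = torch.where(adj > 0, e, zero_vec)  # scores without an edge become -9e15
attention = torch.softmax(attention, dim=1)
print(attention)              # zeros exactly where adj == 0
print(attention.sum(dim=1))   # tensor([1., 1., 1.]) -- each row sums to 1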
10. attention = F.dropout(attention, self.dropout, training=self.training) applies dropout to the attention matrix.
11. h_prime = torch.matmul(attention, Wh) multiplies the attention matrix by the transformed node features Wh, aggregating each node's neighbors into a new representation.
12. The result h_prime is passed through the ELU activation; h_prime has shape $2708 \times 8$.
13. This completes one attention head; there are 8 heads in total.
14. This yields 8 h_prime tensors; concatenating them gives x with shape [2708, 64].
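A final hedged check of the multi-head concatenation, with fabricated head outputs:

import torch

heads = [torch.rand(2708, 8) for _ in range(8)]  # eight h_prime tensors, one per head
x = torch.cat(heads, dim=1)
print(x.shape)   # torch.Size([2708, 64])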