RGCN, GCN, GraphSAGE: Paper Reading and Practice

Graph Neural Networks

In this article, graph neural networks are divided into five categories: Graph Convolutional Networks (GCN), Graph Attention Networks, Graph Autoencoders, Graph Generative Networks, and Graph Spatial-temporal Networks.

3. Graph Autoencoders

Graph autoencoders are a family of graph-embedding methods whose goal is to represent the vertices of a graph as low-dimensional vectors using a neural network. A typical solution uses a multilayer perceptron as the encoder to obtain node embeddings, while the decoder reconstructs each node's neighborhood statistics.
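
A minimal sketch of this encoder/decoder idea: an MLP encoder plus an inner-product decoder that reconstructs the adjacency matrix. The class name and all dimensions here are illustrative, not from the surveyed papers.

```python
import torch
import torch.nn as nn

class SimpleGAE(nn.Module):
    """Two-layer MLP encoder + inner-product decoder (illustrative sketch)."""
    def __init__(self, in_dim, hid_dim, emb_dim):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, hid_dim), nn.ReLU(),
            nn.Linear(hid_dim, emb_dim),
        )

    def forward(self, x):
        z = self.encoder(x)                    # node embeddings
        adj_logits = z @ z.t()                 # inner-product decoder
        return z, torch.sigmoid(adj_logits)   # reconstructed adjacency

x = torch.randn(5, 8)          # 5 nodes, 8 features each
model = SimpleGAE(8, 16, 4)
z, adj_hat = model(x)
print(z.shape, adj_hat.shape)  # torch.Size([5, 4]) torch.Size([5, 5])
```

Training would then compare `adj_hat` against the true adjacency matrix with a reconstruction loss such as binary cross-entropy.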

RGCN Paper Reading and Practice

Paper Topic

Relational Graph Convolutional Networks (R-GCNs) are suited to link prediction (recovering missing facts, i.e. subject-predicate-object triples) and to node classification (filling in missing node attributes). R-GCN targets high-dimensional, multi-relational data structures; used as the encoder of a factorization model, it gives a clear improvement over decoder-only baselines. For the diverse relations between P2P lending platforms, for example (multiple meta-paths formed from different attributes), it handles node prediction well.


GCN, GAT, GraphSAGE: Principles and Implementations for Homogeneous Graphs

GCN

$$H^{(l+1)}=\sigma(D^{-1/2}AD^{-1/2}H^{(l)}W^{(l)})$$

where $H^{(l)}$ is the node representation at layer $l$, $D$ the degree matrix, and $A$ the adjacency matrix. The process is analogous to CNN convolution: it is a weighted sum, in which the degree and adjacency matrices determine the weight on each edge and neighbor features are then summed with those weights.
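
Written out in dense PyTorch for a toy graph, one propagation step looks like this. It is a didactic sketch, not an efficient implementation; self-loops are added via $A+I$, following the renormalization trick in the GCN paper.

```python
import torch

def gcn_layer(A, H, W):
    """One GCN step: H' = relu(D^{-1/2} (A+I) D^{-1/2} H W).
    A: (N, N) adjacency, H: (N, F_in) features, W: (F_in, F_out) weights."""
    A_hat = A + torch.eye(A.size(0))          # add self-loops
    deg = A_hat.sum(dim=1)                    # node degrees
    D_inv_sqrt = torch.diag(deg.pow(-0.5))    # D^{-1/2}
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt  # symmetric normalization
    return torch.relu(A_norm @ H @ W)

# toy graph: 3 nodes on a path 0-1-2
A = torch.tensor([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
H = torch.randn(3, 4)
W = torch.randn(4, 2)
out = gcn_layer(A, H, W)
print(out.shape)  # torch.Size([3, 2])
```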

Main drawbacks: 1. the fusion weights on edges are fixed, which is inflexible; 2. poor scalability: convolution and gradient updates run over the full graph, which becomes far too slow when the graph is large; 3. as the number of layers grows, the result over-smooths very easily and every node's features become nearly identical.

GAT addresses problem 1, GraphSAGE addresses problem 2, and DeepGCN and a series of related papers discuss problem 3.

GAT

$$\alpha_{ij}=\frac{\exp\big(\text{LeakyReLU}(\mathbf{a}^{T}[Wh_i \,\|\, Wh_j])\big)}{\sum_{k\in\mathcal{N}_i}\exp\big(\text{LeakyReLU}(\mathbf{a}^{T}[Wh_i \,\|\, Wh_k])\big)}$$

where $h_i$, $h_j$, $h_k$ are node features and $\alpha_{ij}$ is the attention coefficient between node $i$ and node $j$.


The attention-fused feature of node $i$ can then be written as the formula below. It is essentially still a weighted sum of features, except that the fusion weights are learned during model training; a nonlinear activation is applied at the end for the downstream task.

$$h_i' = \sigma\Big(\sum_{j\in\mathcal{N}_i}\alpha_{ij}\,Wh_j\Big)$$

To make the attention mechanism more expressive, multi-head attention is defined, with $K$ attention heads whose outputs can be aggregated in different ways. The attention mechanism in GAT is quite intuitive: each edge gets a model-learnable coefficient $\alpha_{ij}$, node features are fused according to these attention coefficients, and the parameters are adjusted for the task, so the adaptive weighting performs better.

GraphSAGE
  1. Transductive means the data to be predicted is visible to the model during training: the graph built before training is fixed, every node or edge relation you want to predict is already in that graph, and the graph structure is identical at training and prediction time.
  2. Inductive means the data to be predicted does not have to be seen during training, which matches the usual modeling setup: training and prediction data are separate, so the graph structure need not be fixed and new nodes can be added.

GraphSAGE proposes sampling random subgraphs and updating node embeddings from those subgraphs. Because each sampled subgraph differs in structure, the model learns a sample-and-aggregate parameterization rather than fixed per-node embeddings. This effectively handles the unseen-nodes problem, avoids having to update the entire graph's node embeddings together during training, and greatly improves scalability.

  1. Subgraph sampling: during training, for each node, cut out a subgraph by randomly sampling a subset of its neighbors as the feature points to aggregate, e.g. sampling two hops around the center node to form the training subgraph.
  2. Aggregation: after sampling, aggregate the features from the outermost layer inward, in the same way as GCN, to obtain the center node's embedding. There is plenty of room to experiment here, e.g. changing the aggregation function (typically mean, sum, pooling, etc.) or adding edge weights.
  3. Downstream prediction: with node embeddings in hand, attach the downstream task; for node classification, a linear layer plus softmax on top of the embedding is enough.
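
The three steps above can be sketched with a mean aggregator and uniform sampling. Names such as `sage_mean_layer` are made up for illustration; this is one layer of sample-and-aggregate, not a full multi-hop implementation.

```python
import random
import torch

def sample_neighbors(adj_list, node, k):
    """Uniformly sample k neighbors of `node` (with replacement)."""
    nbrs = adj_list[node]
    return random.choices(nbrs, k=k) if nbrs else [node]

def sage_mean_layer(adj_list, H, W_self, W_nbr, k=2):
    """One GraphSAGE step with mean aggregation:
    h_v' = relu(W_self h_v + W_nbr mean({h_u : u sampled from N(v)}))."""
    out = []
    for v in range(len(adj_list)):
        sampled = sample_neighbors(adj_list, v, k)
        nbr_mean = H[sampled].mean(dim=0)            # aggregate sampled neighbors
        out.append(torch.relu(H[v] @ W_self + nbr_mean @ W_nbr))
    return torch.stack(out)

adj_list = {0: [1], 1: [0, 2], 2: [1]}   # path graph 0-1-2
H = torch.randn(3, 4)
W_self = torch.randn(4, 2)
W_nbr = torch.randn(4, 2)
out = sage_mean_layer(adj_list, H, W_self, W_nbr)
print(out.shape)  # torch.Size([3, 2])
```

Because the layer only needs each node's sampled neighborhood, it applies unchanged to nodes that were never seen during training.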


GraphSAGE mainly solves two problems: 1. prediction for unseen nodes (the original GCN needed the graph data of all nodes at training time); 2. the heavy memory use and slow computation of full-graph gradient updates on large graphs.

RGCN

RGCN can be seen as a simple attempt to extend GCN to multi-relational graphs. ***Going from homogeneous to heterogeneous graphs, the one core problem RGCN must solve is how multiple relations interact.*** RGCN uses a general GNN encoder that computes entity embeddings by encoding edges of multiple relation types, with different decoders for different downstream tasks.

As RGCN describes it, under each relation both incoming and outgoing edges contribute neighbors; a self-loop feature is added as well, and these features are fused to update the center node.

$$h_i^{(l+1)}=\sigma\Big(\sum_{r\in\mathcal{R}}\sum_{j\in\mathcal{N}_i^r}\frac{1}{c_{i,r}}W_r^{(l)}h_j^{(l)}+W_0^{(l)}h_i^{(l)}\Big)$$
The double sum iterates over every relation and, within each relation, aggregates the features of each neighbor; the previous layer's center-node feature is then added, and an activation produces the center node's output feature. $W$ is the dimension-transforming matrix, i.e. the model parameters; $\mathcal{R}$ is the set of relations, $\mathcal{N}_i^r$ the neighbors of node $i$ under relation $r$, and $c_{i,r}$ a problem-specific normalization constant. Whereas GCN uses the degree and adjacency matrices to weight the feature fusion, RGCN learns more of this weighting during training.
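
A dense per-relation sketch of this propagation rule, for intuition only (the DGL implementation below is how it is done in practice; here $c_{i,r}$ is taken to be the neighbor count $|\mathcal{N}_i^r|$, one common choice):

```python
import torch

def rgcn_layer(rel_adjs, H, W_rel, W_self):
    """RGCN step: h_i' = relu(sum_r sum_{j in N_i^r} (1/c_{i,r}) W_r h_j + W_0 h_i).
    rel_adjs: list of (N, N) adjacency matrices, one per relation.
    W_rel: (R, F_in, F_out), W_self: (F_in, F_out)."""
    out = H @ W_self                                         # self-loop term W_0 h_i
    for r, A_r in enumerate(rel_adjs):
        deg = A_r.sum(dim=1, keepdim=True).clamp(min=1)      # c_{i,r} = |N_i^r|
        out = out + (A_r / deg) @ H @ W_rel[r]               # normalized neighbor sum
    return torch.relu(out)

# 2 nodes, 2 relation types (toy example)
rel_adjs = [torch.tensor([[0., 1.], [1., 0.]]),
            torch.tensor([[0., 0.], [1., 0.]])]
H = torch.randn(2, 4)
W_rel = torch.randn(2, 4, 3)
W_self = torch.randn(4, 3)
out = rgcn_layer(rel_adjs, H, W_rel, W_self)
print(out.shape)  # torch.Size([2, 3])
```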

At the same time, to control the parameter growth caused by multiple relations, two regularization schemes for the $W$ matrices are defined:

  1. Basis decomposition (shares transformation-matrix parameters across relations):

    $$W_r^{(l)}=\sum_{b=1}^{B}a_{rb}^{(l)}V_b^{(l)}$$

  2. Block-diagonal decomposition (the weight matrix $W$ is assembled from small base matrices, keeping $W$ sparse):

$$W_r^{(l)}=\bigoplus_{b=1}^{B}Q_{br}^{(l)}$$

In (1), $B$ is the number of basis matrices (a constant) and the $V_b$ are the decomposed parameter matrices; each pairs with a coefficient $a_{rb}$ that depends on the relation type $r$, while the $V_b$ themselves are shared across relations.

(2) expresses $W_r$ as a direct (block-diagonal) sum of a set of low-dimensional matrices.
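
The basis decomposition in (1) can be sketched in a few lines; all shapes are illustrative:

```python
import torch

def basis_weights(a, V):
    """Basis decomposition: W_r = sum_b a_{rb} V_b.
    a: (R, B) per-relation coefficients, V: (B, F_in, F_out) shared bases."""
    # sum over the basis dimension; all relations share the same V_b
    return torch.einsum("rb,bio->rio", a, V)

R, B, F_in, F_out = 5, 2, 4, 3
a = torch.randn(R, B)
V = torch.randn(B, F_in, F_out)
W = basis_weights(a, V)
print(W.shape)  # torch.Size([5, 4, 3])
```

Note the parameter count drops from `R * F_in * F_out` to `B * F_in * F_out + R * B`, which is the whole point of the scheme.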

With RGCN as the encoder, once aggregation yields node embeddings, the node-classification task is straightforward: feed the embeddings into a logistic-regression or linear layer and train with cross-entropy.

For link prediction, after the encoder produces node embeddings, a score is computed for each triple $(s, r, o)$, much as in TransE. (The paper uses DistMult, which follows the same idea.)
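
DistMult scores a triple with a per-relation diagonal bilinear form. A minimal sketch, where the embeddings are random stand-ins for the encoder outputs:

```python
import torch

def distmult_score(e_s, r, e_o):
    """DistMult score for a triple (s, r, o): f = e_s^T diag(r) e_o."""
    return (e_s * r * e_o).sum(dim=-1)

e_s = torch.randn(4)   # subject embedding from the RGCN encoder
r = torch.randn(4)     # diagonal relation parameters
e_o = torch.randn(4)   # object embedding
score = distmult_score(e_s, r, e_o)
print(score.shape)  # torch.Size([])
```

Training then pushes scores of observed triples above those of corrupted (negative-sampled) triples.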

import torch
import torch.nn as nn
import torch.nn.functional as F
from dgl import DGLGraph
import dgl.function as fn
from functools import partial

class RGCNLayer(nn.Module):
    def __init__(self, in_feat, out_feat, num_rels, num_bases=-1, bias=None,
                 activation=None, is_input_layer=False):
        super(RGCNLayer, self).__init__()
        self.in_feat = in_feat
        self.out_feat = out_feat
        self.num_rels = num_rels
        self.num_bases = num_bases
        self.bias = bias
        self.activation = activation
        self.is_input_layer = is_input_layer

        # sanity check
        if self.num_bases <= 0 or self.num_bases > self.num_rels:
            self.num_bases = self.num_rels

        # weight bases in equation (3)
        self.weight = nn.Parameter(torch.Tensor(self.num_bases, self.in_feat,
                                                self.out_feat))
        if self.num_bases < self.num_rels:
            # linear combination coefficients in equation (3)
            self.w_comp = nn.Parameter(torch.Tensor(self.num_rels, self.num_bases))

        # add bias
        if self.bias:
            self.bias = nn.Parameter(torch.Tensor(out_feat))

        # init trainable parameters
        nn.init.xavier_uniform_(self.weight,
                                gain=nn.init.calculate_gain('relu'))
        if self.num_bases < self.num_rels:
            nn.init.xavier_uniform_(self.w_comp,
                                    gain=nn.init.calculate_gain('relu'))
        if self.bias:
            nn.init.xavier_uniform_(self.bias,
                                    gain=nn.init.calculate_gain('relu'))

    def forward(self, g):
        if self.num_bases < self.num_rels:
            # generate all weights from bases (equation (3))
            weight = self.weight.view(self.in_feat, self.num_bases, self.out_feat)
            weight = torch.matmul(self.w_comp, weight).view(self.num_rels,
                                                        self.in_feat, self.out_feat)
        else:
            weight = self.weight

        if self.is_input_layer:
            def message_func(edges):
                # for input layer, matrix multiply can be converted to be
                # an embedding lookup using source node id
                embed = weight.view(-1, self.out_feat)
                index = edges.data['rel_type'] * self.in_feat + edges.src['id']
                return {'msg': embed[index] * edges.data['norm']}
        else:
            def message_func(edges):
                w = weight[edges.data['rel_type']]
                msg = torch.bmm(edges.src['h'].unsqueeze(1), w).squeeze()
                msg = msg * edges.data['norm']
                return {'msg': msg}

        def apply_func(nodes):
            h = nodes.data['h']
            if self.bias:
                h = h + self.bias
            if self.activation:
                h = self.activation(h)
            return {'h': h}

        g.update_all(message_func, fn.sum(msg='msg', out='h'), apply_func)
class Model(nn.Module):
    def __init__(self, num_nodes, h_dim, out_dim, num_rels,
                 num_bases=-1, num_hidden_layers=1):
        super(Model, self).__init__()
        self.num_nodes = num_nodes
        self.h_dim = h_dim
        self.out_dim = out_dim
        self.num_rels = num_rels
        self.num_bases = num_bases
        self.num_hidden_layers = num_hidden_layers

        # create rgcn layers
        self.build_model()

        # create initial features
        self.features = self.create_features()

    def build_model(self):
        self.layers = nn.ModuleList()
        # input to hidden
        i2h = self.build_input_layer()
        self.layers.append(i2h)
        # hidden to hidden
        for _ in range(self.num_hidden_layers):
            h2h = self.build_hidden_layer()
            self.layers.append(h2h)
        # hidden to output
        h2o = self.build_output_layer()
        self.layers.append(h2o)

    # initialize feature for each node
    def create_features(self):
        features = torch.arange(self.num_nodes)
        return features

    def build_input_layer(self):
        return RGCNLayer(self.num_nodes, self.h_dim, self.num_rels, self.num_bases,
                         activation=F.relu, is_input_layer=True)

    def build_hidden_layer(self):
        return RGCNLayer(self.h_dim, self.h_dim, self.num_rels, self.num_bases,
                         activation=F.relu)

    def build_output_layer(self):
        return RGCNLayer(self.h_dim, self.out_dim, self.num_rels, self.num_bases,
                         activation=partial(F.softmax, dim=1))

    def forward(self, g):
        if self.features is not None:
            g.ndata['id'] = self.features
        for layer in self.layers:
            layer(g)
        return g.ndata.pop('h')
 # load graph data
from dgl.contrib.data import load_data
data = load_data(dataset='aifb')
num_nodes = data.num_nodes
num_rels = data.num_rels
num_classes = data.num_classes
labels = data.labels
train_idx = data.train_idx
# split training and validation set
val_idx = train_idx[:len(train_idx) // 5]
train_idx = train_idx[len(train_idx) // 5:]

# edge type and normalization factor
edge_type = torch.from_numpy(data.edge_type)
edge_norm = torch.from_numpy(data.edge_norm).unsqueeze(1)

labels = torch.from_numpy(labels).view(-1)
# configurations
n_hidden = 16 # number of hidden units
n_bases = -1 # use number of relations as number of bases
n_hidden_layers = 0 # use 1 input layer, 1 output layer, no hidden layer
n_epochs = 25 # epochs to train
lr = 0.01 # learning rate
l2norm = 0 # L2 norm coefficient

# create graph
g = DGLGraph((data.edge_src, data.edge_dst))
g.edata.update({'rel_type': edge_type, 'norm': edge_norm})

# create model
model = Model(g.num_nodes(),
              n_hidden,
              num_classes,
              num_rels,
              num_bases=n_bases,
              num_hidden_layers=n_hidden_layers)
# optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=l2norm)

print("start training...")
model.train()
for epoch in range(n_epochs):
    optimizer.zero_grad()
    logits = model.forward(g)
    loss = F.cross_entropy(logits[train_idx], labels[train_idx])
    loss.backward()

    optimizer.step()

    train_acc = torch.sum(logits[train_idx].argmax(dim=1) == labels[train_idx])
    train_acc = train_acc.item() / len(train_idx)
    val_loss = F.cross_entropy(logits[val_idx], labels[val_idx])
    val_acc = torch.sum(logits[val_idx].argmax(dim=1) == labels[val_idx])
    val_acc = val_acc.item() / len(val_idx)
    print("Epoch {:05d} | ".format(epoch) +
          "Train Accuracy: {:.4f} | Train Loss: {:.4f} | ".format(
              train_acc, loss.item()) +
          "Validation Accuracy: {:.4f} | Validation loss: {:.4f}".format(
              val_acc, val_loss.item()))

Capsule Networks in Practice

Dynamic Routing Between Capsules

(1) Traditional standard neural networks have too few structural levels: only neurons, layers, and the network. The paper groups the neurons within a layer into capsules, which can carry out substantial internal computation and output a compressed result.

(2) In a traditional neuron, scalar inputs $x_i$ are weighted and summed into $a_j$, then a nonlinear activation (sigmoid, tanh, ReLU, etc.) converts this into the neuron's output, a scalar. In a capsule, the inputs $u_i$ are vectors; the matrix multiplication is a simple affine transform, and the weighted sum over $i$ is taken over vectors rather than scalars, yielding a vector. The squash function is the nonlinear counterpart of the traditional activation, and its output is a vector. (1) The length of the output vector represents the probability that the entity exists, and its direction encodes the entity's attributes. (2) A capsule's output is routed to parent capsules (the upper-layer vectors) according to how well it agrees with them. During training, routing is done iteratively: each iteration adjusts the routing weights between capsules based on the observed agreement, in a manner similar to k-means or competitive learning.
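
The squash nonlinearity described above can be written as follows; the small epsilon is added for numerical stability and is not part of the original formula:

```python
import torch

def squash(s, dim=-1):
    """Capsule squash: v = (|s|^2 / (1 + |s|^2)) * (s / |s|).
    Keeps the direction of s, compresses its length into [0, 1)."""
    sq = (s ** 2).sum(dim=dim, keepdim=True)          # |s|^2
    return (sq / (1.0 + sq)) * s / torch.sqrt(sq + 1e-8)

s = torch.tensor([3.0, 4.0])   # vector of length 5
v = squash(s)
print(v)  # tensor([0.5769, 0.7692]), length 25/26 ~ 0.9615, same direction as s
```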



import matplotlib.pyplot as plt
import numpy as np
import torch as th
import torch.nn as nn
import torch.nn.functional as F

import dgl

def init_graph(in_nodes, out_nodes, f_size):
    u = np.repeat(np.arange(in_nodes), out_nodes)
    v = np.tile(np.arange(in_nodes, in_nodes + out_nodes), in_nodes)
    g = dgl.DGLGraph((u, v))
    # init states
    g.ndata["v"] = th.zeros(in_nodes + out_nodes, f_size)
    g.edata["b"] = th.zeros(in_nodes * out_nodes, 1)
    return g

import dgl.function as fn


class DGLRoutingLayer(nn.Module):
    def __init__(self, in_nodes, out_nodes, f_size):
        super(DGLRoutingLayer, self).__init__()
        self.g = init_graph(in_nodes, out_nodes, f_size)
        self.in_nodes = in_nodes
        self.out_nodes = out_nodes
        self.in_indx = list(range(in_nodes))
        self.out_indx = list(range(in_nodes, in_nodes + out_nodes))

    def forward(self, u_hat, routing_num=1):
        self.g.edata["u_hat"] = u_hat

        for r in range(routing_num):
            # step 1 (line 4): normalize over out edges
            edges_b = self.g.edata["b"].view(self.in_nodes, self.out_nodes)
            self.g.edata["c"] = F.softmax(edges_b, dim=1).view(-1, 1)
            self.g.edata["c u_hat"] = self.g.edata["c"] * self.g.edata["u_hat"]

            # Execute step 1 & 2
            self.g.update_all(fn.copy_e("c u_hat", "m"), fn.sum("m", "s"))

            # step 3 (line 6)
            self.g.nodes[self.out_indx].data["v"] = self.squash(
                self.g.nodes[self.out_indx].data["s"], dim=1
            )

            # step 4 (line 7)
            v = th.cat(
                [self.g.nodes[self.out_indx].data["v"]] * self.in_nodes, dim=0
            )
            self.g.edata["b"] = self.g.edata["b"] + (
                self.g.edata["u_hat"] * v
            ).sum(dim=1, keepdim=True)

    @staticmethod
    def squash(s, dim=1):
        sq = th.sum(s**2, dim=dim, keepdim=True)
        s_norm = th.sqrt(sq)
        s = (sq / (1.0 + sq)) * (s / s_norm)
        return s
# test
in_nodes = 20
out_nodes = 10
f_size = 4
u_hat = th.randn(in_nodes * out_nodes, f_size)
routing = DGLRoutingLayer(in_nodes, out_nodes, f_size)

entropy_list = []
dist_list = []

for i in range(10):
    routing(u_hat)
    dist_matrix = routing.g.edata["c"].view(in_nodes, out_nodes)
    entropy = (-dist_matrix * th.log(dist_matrix)).sum(dim=1)
    entropy_list.append(entropy.data.numpy())
    dist_list.append(dist_matrix.data.numpy())

stds = np.std(entropy_list, axis=1)
means = np.mean(entropy_list, axis=1)
plt.errorbar(np.arange(len(entropy_list)), means, stds, marker="o")
plt.ylabel("Entropy of Weight Distribution")
plt.xlabel("Number of Routing")
plt.xticks(np.arange(len(entropy_list)))
plt.close()


import matplotlib.animation as animation
import seaborn as sns

fig = plt.figure(dpi=150)
fig.clf()
ax = fig.subplots()


def dist_animate(i):
    ax.cla()
    sns.distplot(dist_list[i].reshape(-1), kde=False, ax=ax)
    ax.set_xlabel("Weight Distribution Histogram")
    ax.set_title("Routing: %d" % (i))


ani = animation.FuncAnimation(
    fig, dist_animate, frames=len(entropy_list), interval=500
)
plt.close()

import networkx as nx
from networkx.algorithms import bipartite

g = routing.g.to_networkx()
X, Y = bipartite.sets(g)
height_in = 10
height_out = height_in * 0.8
height_in_y = np.linspace(0, height_in, in_nodes)
height_out_y = np.linspace((height_in - height_out) / 2, height_out, out_nodes)
pos = dict()

fig2 = plt.figure(figsize=(8, 3), dpi=150)
fig2.clf()
ax = fig2.subplots()
pos.update(
    (n, (i, 1)) for i, n in zip(height_in_y, X)
)  # put nodes from X at x=1
pos.update(
    (n, (i, 2)) for i, n in zip(height_out_y, Y)
)  # put nodes from Y at x=2


def weight_animate(i):
    ax.cla()
    ax.axis("off")
    ax.set_title("Routing: %d  " % i)
    dm = dist_list[i]
    nx.draw_networkx_nodes(
        g, pos, nodelist=range(in_nodes), node_color="r", node_size=100, ax=ax
    )
    nx.draw_networkx_nodes(
        g,
        pos,
        nodelist=range(in_nodes, in_nodes + out_nodes),
        node_color="b",
        node_size=100,
        ax=ax,
    )
    for edge in g.edges():
        nx.draw_networkx_edges(
            g,
            pos,
            edgelist=[edge],
            width=dm[edge[0], edge[1] - in_nodes] * 1.5,
            ax=ax,
        )


ani2 = animation.FuncAnimation(
    fig2, weight_animate, frames=len(dist_list), interval=500
)
plt.close()

Tree-LSTM in DGL

Paper Practice

The core idea of the model is to inject syntactic information into language tasks by extending the chain-structured LSTM to a tree-structured LSTM, using dependency trees and constituency trees to obtain "latent trees".

Because different trees generally differ in structure, DGL places these trees in one big graph and carries out message passing along the structure of each tree within it.

Generative Models of Graphs

Paper Practice

Generative models here are used both to train on graphs and to generate them, producing graph structure through a graph generative model. Intuitively, the setup resembles reinforcement learning.

Properties currently used to characterize real graph data include:

  1. Degree distribution

    1.1 The probability that a randomly chosen node has degree k; it can be read off the normalized histogram of node degrees.


  2. Clustering coefficient

    2.1 Measures how tightly a node's neighbors are connected. If node $i$ has degree $k_i$ and there are $e_i$ edges among its neighbors, the clustering coefficient is the fraction of edges that actually exist among its neighbors out of all edges that could exist between them. The graph-level clustering coefficient is the average of the per-node clustering coefficients.
    $$C_i=\frac{2e_i}{k_i(k_i-1)}$$

  3. Connected components

    Connectivity is the size of the largest subgraph in which any two nodes are linked by a path. To find the connected components: run BFS from a randomly chosen node and mark every node it visits; if all nodes get visited, the whole network is connected; otherwise pick an unvisited node and repeat the BFS.
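
The repeated-BFS procedure just described, as a short sketch:

```python
from collections import deque

def connected_components(adj):
    """Find connected components by repeated BFS from unvisited nodes.
    adj: dict mapping node -> list of neighbors."""
    visited, components = set(), []
    for start in adj:
        if start in visited:
            continue
        comp, queue = [], deque([start])
        visited.add(start)
        while queue:
            u = queue.popleft()
            comp.append(u)
            for v in adj[u]:
                if v not in visited:
                    visited.add(v)
                    queue.append(v)
        components.append(comp)
    return components

adj = {0: [1], 1: [0, 2], 2: [1], 3: [4], 4: [3]}   # two components
print(connected_components(adj))  # [[0, 1, 2], [3, 4]]
```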

  4. Path length

    Basic steps of graph generation:

    1. Encode the graph as it evolves.
    2. Sample an action at random.
    3. During training, collect the error signal and optimize the model parameters.

DGMG (Deep Generative Models of Graphs)

At each time step, either 1. add a new node to the graph, or 2. pick two existing nodes and add an edge between them.


Optimization objective

Consistent with language modeling, DGMG assumes the actions arrive as a sequence $a_1, \cdots, a_T$; the model follows these steps, computes the joint probability of the sequence, and minimizes the corresponding MLE loss.
$$p(a_1,\cdots,a_T)=p(a_1)\,p(a_2\mid a_1)\cdots p(a_T\mid a_1,\cdots,a_{T-1}).$$
The objective is to minimize the negative log-likelihood:
$$-\log p(a_1,\cdots,a_T)=-\sum_{t=1}^{T}\log p(a_t\mid a_1,\cdots,a_{t-1}).$$
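
Given the per-step conditional probabilities, the loss is simply a sum of negative logs. A tiny numeric sketch (the probabilities are made up):

```python
import torch

def sequence_nll(step_probs):
    """Negative log-likelihood of an action sequence:
    -log p(a_1,...,a_T) = -sum_t log p(a_t | a_1,...,a_{t-1}).
    step_probs: tensor of conditional probabilities, one per step."""
    return -torch.log(step_probs).sum()

probs = torch.tensor([0.9, 0.8, 0.5])   # hypothetical per-step probabilities
nll = sequence_nll(probs)
print(round(nll.item(), 4))  # -log(0.9 * 0.8 * 0.5) = -log(0.36) ~ 1.0217
```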

def forward_train(self, actions):
    """
    - actions: list
        - Contains a_1, ..., a_T described above
    - self.prepare_for_train()
        - Initializes self.action_step to be 0, which will get
          incremented by 1 every time it is called.
        - Initializes objects recording log p(a_t|a_1,...a_{t-1})

    Returns
    -------
    - self.get_log_prob(): log p(a_1, ..., a_T)
    """
    self.prepare_for_train()

    stop = self.add_node_and_update(a=actions[self.action_step])
    while not stop:
        to_add_edge = self.add_edge_or_not(a=actions[self.action_step])
        while to_add_edge:
            self.choose_dest_and_update(a=actions[self.action_step])
            to_add_edge = self.add_edge_or_not(a=actions[self.action_step])
        stop = self.add_node_and_update(a=actions[self.action_step])

    return self.get_log_prob()

The DGMG skeleton that needs to be implemented:

import torch.nn as nn


class DGMGSkeleton(nn.Module):
    def __init__(self, v_max):
        """
        Parameters
        ----------
        v_max: int
            Max number of nodes considered
        """
        super(DGMGSkeleton, self).__init__()

        # Graph configuration
        self.v_max = v_max

    def add_node_and_update(self, a=None):
        """Decide if to add a new node.
        If a new node should be added, update the graph."""
        raise NotImplementedError

    def add_edge_or_not(self, a=None):
        """Decide if a new edge should be added."""
        raise NotImplementedError

    def choose_dest_and_update(self, a=None):
        """Choose destination and connect it to the latest node.
        Add edges for both directions and update the graph."""
        raise NotImplementedError

    def forward_train(self, actions):
        """Forward at training time. It records the probability
        of generating a ground truth graph following the actions."""
        raise NotImplementedError

    def forward_inference(self):
        """Forward at inference time.
        It generates graphs on the fly."""
        raise NotImplementedError

    def forward(self, actions=None):
        # The graph you will work on
        self.g = dgl.DGLGraph()

        # If there are some features for nodes and edges,
        # zero tensors will be set for those of new nodes and edges.
        self.g.set_n_initializer(dgl.frame.zero_initializer)
        self.g.set_e_initializer(dgl.frame.zero_initializer)

        if self.training:
            return self.forward_train(actions=actions)
        else:
            return self.forward_inference()
Implementing Dynamic Graph Encoding

Since the graphs produced by the actions above are all sampled from a probability distribution, the structured data has to be projected into a Euclidean space. The biggest challenge is that this encoding must be repeated as the graph keeps changing.
$$\mathbf{h}_{G}=\sum_{v\in V}\text{Sigmoid}\big(g_m(\mathbf{h}_{v})\big)\,f_{m}(\mathbf{h}_{v})$$
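
This gated-sum readout can be sketched as follows, where $g_m$ and $f_m$ are small learned networks (here single linear layers; the class name and dimensions are illustrative):

```python
import torch
import torch.nn as nn

class GatedGraphReadout(nn.Module):
    """h_G = sum_v sigmoid(g_m(h_v)) * f_m(h_v): g_m gates how much each
    node contributes, f_m projects node states into graph space."""
    def __init__(self, node_dim, graph_dim):
        super().__init__()
        self.g_m = nn.Linear(node_dim, 1)          # gating score per node
        self.f_m = nn.Linear(node_dim, graph_dim)  # node -> graph-space projection

    def forward(self, h_v):
        gate = torch.sigmoid(self.g_m(h_v))        # (N, 1)
        return (gate * self.f_m(h_v)).sum(dim=0)   # (graph_dim,)

readout = GatedGraphReadout(node_dim=8, graph_dim=16)
h_v = torch.randn(5, 8)   # 5 node states
h_G = readout(h_v)
print(h_G.shape)  # torch.Size([16])
```

Because the sum runs over whatever nodes currently exist, the same module can be reapplied each time the graph grows.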
Future directions:
