Graph-Bert Code Walkthrough ⑤: script_1_preprocess.py

1. WL

node_list = method_obj.data['idx']
link_list = method_obj.data['edges']
def setting_init(self, node_list, link_list):
    for node in node_list:
        self.node_color_dict[node] = 1
        self.node_neighbor_dict[node] = {}

    for pair in link_list:
        u1, u2 = pair
        if u1 not in self.node_neighbor_dict:
            self.node_neighbor_dict[u1] = {}
        if u2 not in self.node_neighbor_dict:
            self.node_neighbor_dict[u2] = {}
        self.node_neighbor_dict[u1][u2] = 1
        self.node_neighbor_dict[u2][u1] = 1


def WL_recursion(self, node_list):
   iteration_count = 1
   while True:
       new_color_dict = {}
       for node in node_list:  # 2708 nodes in total (including duplicates)
           neighbors = self.node_neighbor_dict[node]
           # e.g. {10531: 1, 1129442: 1, 31349: 1, 686532: 1, 31353: 1}
           neighbor_color_list = [self.node_color_dict[neb] for neb in neighbors]
           # e.g. [1, 1, 1, 1, 1]
           color_string_list = [str(self.node_color_dict[node])] + sorted([str(color) for color in neighbor_color_list])
           # i.e. the node's own color (1 in the first iteration) prepended to the sorted neighbor colors
           # e.g. ['1', '1', '1', '1', '1', '1']
           color_string = "_".join(color_string_list)
           # e.g. 1_1_1_1_1_1
           hash_object = hashlib.md5(color_string.encode())
           hashing = hash_object.hexdigest()
           new_color_dict[node] = hashing
           # key: the current node; value: the MD5 hex digest built from its own and its neighbors' colors
           # e.g. {31336: 'ad79ef9e613d8f16268a9227ada05a0e'}
       # map every distinct digest to a small integer color, starting at 1
       color_index_dict = {k: v+1 for v, k in enumerate(sorted(set(new_color_dict.values())))}
       for node in new_color_dict:
           new_color_dict[node] = color_index_dict[new_color_dict[node]]
       if self.node_color_dict == new_color_dict or iteration_count == self.max_iter:
           return
       else:
           self.node_color_dict = new_color_dict
       iteration_count += 1

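(Note that the function returns nothing: the result lives in self.node_color_dict, which is overwritten with new_color_dict at the end of every iteration that does not stop the loop. If the loop stops because the two dicts are equal, nothing is lost; if it stops because iteration_count reached self.max_iter, the new_color_dict computed in that last iteration is discarded. It is still worth reading the paper to understand how WL actually works.)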
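To make the relabeling loop easier to follow, here is a minimal standalone sketch of the same idea on a toy graph. The function name `wl_colors`, its `max_iter` default, and the toy data are assumptions made for illustration, not part of the Graph-Bert code. Unlike the method above, this sketch keeps the colors computed in the final iteration instead of discarding them.

```python
import hashlib


def wl_colors(nodes, edges, max_iter=2):
    """Toy WL relabeling: return {node: integer color}. Illustrative only."""
    # Build an undirected neighbor map, as setting_init does.
    neighbors = {n: set() for n in nodes}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    colors = {n: 1 for n in neighbors}  # every node starts with color 1
    for _ in range(max_iter):
        signatures = {}
        for n in neighbors:
            # Own color first, then the sorted colors of the neighbors.
            parts = [str(colors[n])] + sorted(str(colors[m]) for m in neighbors[n])
            signatures[n] = hashlib.md5("_".join(parts).encode()).hexdigest()
        # Compress the distinct digests into small integers starting at 1.
        index = {sig: i + 1 for i, sig in enumerate(sorted(set(signatures.values())))}
        new_colors = {n: index[signatures[n]] for n in neighbors}
        if new_colors == colors:
            break
        colors = new_colors
    return colors


# Path graph a - b - c: the two endpoints play the same structural role.
toy_colors = wl_colors(["a", "b", "c"], [("a", "b"), ("b", "c")])
```

In this toy run the endpoints `a` and `c` end up sharing a color while the middle node `b` gets a different one. Which of the two integers each group receives depends on the sort order of the MD5 digests.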

2. Intimacy & subgraph batching

Running this step raised an error:

OMP: Error #15: Initializing libiomp5md.dll, but found libiomp5md.dll already initialized.
OMP: Hint This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://www.intel.com/software/products/support/.

Adding the following code works around it (the message itself describes this workaround as unsafe):

import os
os.environ['KMP_DUPLICATE_LIB_OK'] = 'TRUE'

k can be set to any of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 to generate the corresponding data; following the later parts, k=7 is used for the experiment.

In this case we have

data_obj.compute_s = True

so in the data generation stage 'S' is no longer None. It should represent the intimacy matrix, with c=0.15:

eigen_adj = self.c * inv((sp.eye(adj.shape[0]) - (1 - self.c) * self.adj_normalize(adj)).toarray())
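As a rough illustration of what this line computes, the sketch below evaluates the same expression, c * (I - (1 - c) * A_norm)^-1, on a three-node path graph with dense NumPy arrays. The function name `pagerank_intimacy` and the toy matrix are hypothetical, and the column normalization is an assumption standing in for `adj_normalize`, whose body is not shown here.

```python
import numpy as np


def pagerank_intimacy(adj, c=0.15):
    """Dense toy version of c * inv(I - (1 - c) * A_norm). Illustrative only."""
    adj = np.asarray(adj, dtype=float)
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0   # avoid dividing by zero for isolated nodes
    adj_norm = adj / col_sums       # ASSUMPTION: column-normalized adjacency
    n = adj.shape[0]
    return c * np.linalg.inv(np.eye(n) - (1 - c) * adj_norm)


# Path graph 0 - 1 - 2
toy_adj = [[0, 1, 0],
           [1, 0, 1],
           [0, 1, 0]]
toy_S = pagerank_intimacy(toy_adj)
```

Under this normalization each column of `toy_S` sums to 1, and node 0 scores higher with its direct neighbor 1 than with node 2, which is two hops away. A different normalization inside `adj_normalize` would change the numbers.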


for node_index in index_id_dict:
    node_id = index_id_dict[node_index]
    s = S[node_index]
    s[node_index] = -1000.0
    top_k_neighbor_index = s.argsort()[-k:][::-1]
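    # argsort returns the original indices of s ordered from smallest to largest value
    # [-k:] keeps the last k of them (the k largest values); [::-1] reverses them into descending order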
    user_top_k_neighbor_intimacy_dict[node_id] = []
    for neighbor_index in top_k_neighbor_index:
        neighbor_id = index_id_dict[neighbor_index]
        user_top_k_neighbor_intimacy_dict[node_id].append((neighbor_id, s[neighbor_index]))

The output is user_top_k_neighbor_intimacy_dict.
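The indexing trick in the loop is easy to misread, so here is a small check of `argsort()[-k:][::-1]` on made-up scores. The array values and variable names are invented for illustration.

```python
import numpy as np

scores = np.array([0.10, 0.40, 0.05, 0.30, 0.15])
k = 2

ascending = scores.argsort()      # indices ordered from smallest to largest score
top_k = ascending[-k:][::-1]      # indices of the k largest scores, largest first

# Masking one entry with a very low value (as the loop does for the node itself)
# removes it from the top-k result.
masked = scores.copy()
masked[1] = -1000.0
masked_top_k = masked.argsort()[-k:][::-1]
```

Here `ascending` is `[2, 0, 4, 3, 1]`, `top_k` is `[1, 3]`, and after masking index 1 the result becomes `[3, 4]`.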

3. Hop

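The graph is built with networkx from 'idx' and 'edges'.
I tried visualizing it with plt, and the result is very dense.
(This step uses the previously generated user_top_k_neighbor_intimacy_dict, so the hop information depends on the intimacy information. Doesn't that make using both of them redundant?)
It computes the shortest path length from each node to each of its top-k neighbors and stores it in hop_dict.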
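The sketch below shows one way such a hop dictionary could be assembled. It uses a plain breadth-first search from the standard library as a stand-in for the networkx call (on an unweighted graph, BFS depth equals the shortest path length in edges). The helper `hop_distances`, the toy edges, the toy intimacy dict, and the use of `None` for unreachable nodes are all assumptions for illustration.

```python
from collections import deque


def hop_distances(edges, source):
    """Return {node: hop count from source} via BFS on an undirected graph."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in neighbors.get(node, ()):
            if nxt not in dist:           # first visit gives the shortest distance
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


# Toy path graph 1 - 2 - 3 - 4 and a toy top-k dict of (neighbor_id, intimacy) pairs.
toy_edges = [(1, 2), (2, 3), (3, 4)]
toy_top_k = {1: [(3, 0.2), (4, 0.1)]}

toy_hop_dict = {}
for node, pairs in toy_top_k.items():
    dist = hop_distances(toy_edges, node)
    # None marks an unreachable neighbor (an assumption of this sketch).
    toy_hop_dict[node] = {nb: dist.get(nb) for nb, _score in pairs}
```

In this sketch the intimacy dict only decides which node pairs get measured; the hop count itself comes purely from the graph structure. For the toy data, `toy_hop_dict` is `{1: {3: 2, 4: 3}}`.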