Complex Networks Assignment 5, Problem 4: Structural Role (RolX)

Latest recommended article published 2023-03-30 10:31:29

ccutyear

Category: Complex Networks (Assignments). Tags: complex networks

Article link: https://blog.csdn.net/ccutyear/article/details/109270339

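Preface

Rambling ahead, feel free to skip: I had actually finished writing this last night, but the next day I found that the content was wrong, and I have no idea why. In this week without assignments, a lot of news happened. The most striking item was the party secretary of a school in Sichuan who died to prove his integrity. It reminded me of the lines from my old textbook, "How can I let my spotless self be soiled by the filth of things? I would rather throw myself into the Xiang River and be buried in the bellies of the fish. How can my shining purity be covered with the dust of the vulgar world?", which really did make the headlines in real life. Thinking about it, if the secretary had not done this, he could not have shaken society at all; this is probably the backbone this era needs most. Also, precisely because such a noble spirit is not found everywhere in this society (or rather, is extremely rare), it is all the more worthy of the new generation's admiration. Finally, to be honest, if it had been me, I would probably have turned into a loach rolling in the mud. After all, staying alive is far easier than changing one's environment. (I don't think the secretary wanted to escape; he wanted to change the wider environment in this way.) Heh, I suppose this is why most people are ordinary people.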

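I. Problem

The data is the scientist collaboration network, available at http://www-personal.umich.edu/~mejn/netdata/netscience.zip. The network is weighted; treat it as an undirected, unweighted network.
Feature extraction has two steps: first extract the basic local features of each node, then aggregate them to obtain global features. Feature extraction builds a matrix V covering n nodes, each with f features that include both local and global information. RolX extracts the node features from this matrix.
(1) Basic features: for each node v, choose 3 basic features:

the degree of node v, deg(v);
the number of edges in the egonet of v, egonet(v), where egonet(v) is the induced subgraph on node v and its neighbors;
the number of edges connecting the egonet of v to the rest of graph G, i.e. the number of edges entering or leaving the egonet of v.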
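
To make the three basic features concrete, here is a minimal sketch on a small toy graph. The function name `basic_features` and the toy graph are made up for illustration and are not part of the assignment data; the sketch assumes an undirected, unweighted networkx graph.

```python
import networkx as nx

def basic_features(G, v):
    """Return [deg(v), edges inside egonet(v), edges crossing the egonet boundary]."""
    ego_nodes = set(G[v]) | {v}                       # v plus its neighbors
    inside = G.subgraph(ego_nodes).number_of_edges()  # edges of the induced subgraph
    # A crossing edge has exactly one endpoint inside the egonet,
    # so walking outwards from the inside nodes counts each one once.
    boundary = sum(1 for a in ego_nodes for b in G[a] if b not in ego_nodes)
    return [G.degree(v), inside, boundary]

# Hypothetical toy graph: triangle 0-1-2 with a tail 2-3-4.
toy = nx.Graph([(0, 1), (1, 2), (0, 2), (2, 3), (3, 4)])
features = {v: basic_features(toy, v) for v in toy}
print(features[0])
```

For node 0 of the toy graph this gives [2, 3, 1]: degree 2, the three triangle edges inside the egonet, and one edge (2-3) crossing its boundary.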
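We use $\tilde{V}_u$ to denote the basic feature vector of node u. For any pair of nodes u and v, we measure the similarity of their feature vectors x and y with cosine similarity, defined as:

$$\mathrm{Sim}(x, y) = \frac{x \cdot y}{\|x\|_2 \, \|y\|_2} = \frac{\sum_i x_i y_i}{\sqrt{\sum_i x_i^2}\,\sqrt{\sum_i y_i^2}}$$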
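
Cosine similarity itself can be sketched in a few lines of plain Python. The function name `cosine_similarity` is my own, and returning 0 when either vector has zero norm is a convention borrowed from the code later in this post, not something stated in the problem.

```python
import math

def cosine_similarity(x, y):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return sum(a * b for a, b in zip(x, y)) / (norm_x * norm_y)

print(cosine_similarity([1, 2, 2], [2, 4, 4]))  # parallel vectors
print(cosine_similarity([1, 0, 0], [0, 0, 0]))  # zero vector
```

Because only the direction of the vectors matters, a node with features [1, 2, 2] and one with [2, 4, 4] count as identical under this measure.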

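Question: compute the basic feature vector of node 9, and report the 5 nodes most similar to node 9 (excluding node 9 itself). (Note: in this problem, no element of $\tilde{V}_9$ is greater than 10.) (I got this wrong: because of this note I thought the search was restricted to nodes whose id is no greater than 10, but it is supposed to cover all nodes, so the corresponding part of the code has to search all nodes as well. How did I find out it was wrong? The teacher said so... -_-|||)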
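(2) Recursive features
In this step we recursively generate more features, using mean and sum as the aggregation functions. Initially, every node u has a feature vector $\tilde{V}_u \in \mathbb{R}^3$. In the first iteration, we concatenate the mean of the feature vectors of all of u's neighbors to $\tilde{V}_u$, and do the same with their sum, which gives $\tilde{V}_u^{(1)}$:

$$\tilde{V}_u^{(1)} = \left[\tilde{V}_u;\ \frac{1}{|N(u)|}\sum_{v \in N(u)} \tilde{V}_v;\ \sum_{v \in N(u)} \tilde{V}_v\right] \in \mathbb{R}^9$$

Here N(u) is the set of neighbors of node u. If N(u) is empty, both the mean and the sum are set to 0.
After K iterations we obtain the overall feature matrix:

$$V = \left[\tilde{V}_1^{(K)},\ \tilde{V}_2^{(K)},\ \ldots,\ \tilde{V}_n^{(K)}\right]^{\top} \in \mathbb{R}^{n \times 3^{K+1}}$$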
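
One round of the recursion can be sketched with numpy as below. The helper `recursive_step`, the `neighbors` dictionary and the tiny input are hypothetical; to keep the numbers easy to follow, the sketch uses a single basic feature (the degree) rather than the three used in the assignment.

```python
import numpy as np

def recursive_step(features, neighbors):
    """One round: append the mean and the sum of the neighbor rows to each node's row."""
    n, f = features.shape
    out = np.zeros((n, 3 * f))
    for u in range(n):
        out[u, :f] = features[u]
        if neighbors[u]:                      # empty N(u) leaves mean and sum at 0
            nbr_rows = features[neighbors[u]]
            out[u, f:2 * f] = nbr_rows.mean(axis=0)
            out[u, 2 * f:] = nbr_rows.sum(axis=0)
    return out

# Hypothetical toy input: path 0-1-2 plus an isolated node 3.
neighbors = {0: [1], 1: [0, 2], 2: [1], 3: []}
V0 = np.array([[1.0], [2.0], [1.0], [0.0]])   # one basic feature: the degree
V = V0
for _ in range(2):                            # K = 2 rounds
    V = recursive_step(V, neighbors)
print(V.shape)
```

Each round triples the number of columns, so one starting feature grows to 9 columns after two rounds, in the same way that three starting features grow to 27.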

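Question: run 2 iterations here, i.e. K=2. Report the 5 nodes most similar to node 9 (excluding node 9 itself). (Hint: the similarity values between node 9 and its 5 most similar nodes are all greater than 0.9.) Compared with the top 5 nodes from the first question, which nodes are the same and which are different?
(3) Role discovery
In this part we use the recursive feature vectors and the node similarities to draw further conclusions.
Question 1: create a histogram with 20 bins showing the distribution of the cosine similarity between node 9 and every other node (based on their recursive feature vectors). The x-axis is the cosine similarity to node 9 and the y-axis is the number of nodes. Can you identify groups/roles from the histogram? How many? (Hint: look for spikes.)
Question 2: for these groups/roles, pick a node u from each group, examine its feature vector, and draw the subgraph around the node based on that feature vector. You can draw it by hand or use networkx or graphviz. The drawing should use the local features of node u and take into account the aggregated features of its 1-hop neighbors. Features that are hard to use can be ignored, and there is no need to draw nodes more than three hops away from u.
Finally, briefly summarize the structural differences between the roles.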
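
The 20-bin histogram from Question 1 can be sketched with `np.histogram`. The similarity values below are invented for illustration and are not results from the netscience data. One detail worth knowing is that the last bin of `np.histogram` is closed on the right, so a similarity of exactly 1.0 is still counted.

```python
import numpy as np

# Hypothetical similarity values, not results from the netscience data.
sims = np.array([0.11, 0.12, 0.56, 0.57, 0.58, 0.93, 0.97, 1.0])
counts, edges = np.histogram(sims, bins=20, range=(0.0, 1.0))
centers = (edges[:-1] + edges[1:]) / 2        # one x position per bin

# Bins with at least one node are candidate "spikes".
for center, count in zip(centers, counts):
    if count > 0:
        print("bin centered near {:.3f}: {} nodes".format(center, count))
```

In a real run, each clearly separated spike is a candidate group/role, and the nodes falling in that bin are the ones to inspect.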

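II. Functions used (networkx)

1. Create a graph

nx.Graph()

2. Read a graph from a GML file

nx.read_gml(path,label="id") # label="id" keys the nodes by their GML id; examples found online may use the default label="label"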
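
As a small illustration of reading GML and dropping the weights, the sketch below parses an inline GML string with `nx.parse_gml` instead of reading a file. The three-node snippet is invented; it assumes, as in netscience.gml, that the edge weight is stored in an attribute called `value`.

```python
import networkx as nx

# Hypothetical GML snippet; the edge weight sits in the "value" attribute.
gml_text = """graph [
  node [
    id 0
    label "A"
  ]
  node [
    id 1
    label "B"
  ]
  node [
    id 2
    label "C"
  ]
  edge [
    source 0
    target 1
    value 2.5
  ]
  edge [
    source 1
    target 2
    value 0.5
  ]
]"""

weighted = nx.parse_gml(gml_text, label="id")   # nodes are keyed by integer id
G = nx.Graph()
G.add_nodes_from(weighted.nodes())
G.add_edges_from(weighted.edges())              # only the endpoints are copied, not the weights
print(sorted(G.nodes()), sorted(G.edges()))
```

Copying only the node ids and edge endpoints into a fresh `nx.Graph()` is what turns the weighted network into an unweighted one.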

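3. Get the nodes of a graph

G.nodes # G is a networkx.Graph

4. Get the edges of a graph

G.edges # G is a networkx.Graph

5. Add several nodes to a graph at once

G.add_nodes_from(nodes)

6. Add several edges to a graph at once

G.add_edges_from(edges)

7. Get the degree of a node

G.degree(node) # G is a networkx.Graph; node is a node id (an int here)

8. Get the local neighborhood subgraph (egonet) of a node

nx.ego_graph(G,node,radius=2)
G: the graph
node: the id of the center node
radius: the maximum number of hops from the center node for a node to be included (default 1)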
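
The effect of `radius` is easiest to see on a path graph. This is only an illustrative sketch; the path graph is not related to the assignment data.

```python
import networkx as nx

path = nx.path_graph(5)                     # 0-1-2-3-4
ego1 = nx.ego_graph(path, 2)                # default radius=1: node 2 and its neighbors
ego2 = nx.ego_graph(path, 2, radius=2)      # everything within two hops of node 2
print(sorted(ego1.nodes()), ego1.number_of_edges())
print(sorted(ego2.nodes()), ego2.number_of_edges())
```

With radius 1 the result is exactly the egonet used for the basic features; radius 2 also pulls in the nodes that the egonet's outgoing edges lead to.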

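9. Compute a reasonable layout for a graph

pos = nx.spring_layout(G)

10. Draw the graph using the layout

nx.draw_networkx(G,pos) # node labels are drawn by default (with_labels=True)

11. Draw the labels (node ids) using the layout

nx.draw_networkx_labels(G,pos = pos)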

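III. Functions used (matplotlib.pyplot)

1. Why this section is here

Some readers may not be familiar with this module, so I also list the matplotlib calls used below. (I won't cover list or numpy.) I won't go through installing matplotlib either; please search for it yourself.

2. Set the x-axis label

plt.xlabel(name,fontproperties="simsun") # fontproperties sets the font so that Chinese text displays correctly

3. Set the y-axis label

plt.ylabel(name,fontproperties="simsun") # fontproperties sets the font so that Chinese text displays correctly

4. Set the title

plt.title(name,fontproperties="simsun") # fontproperties sets the font so that Chinese text displays correctly

5. Draw a bar chart

plt.bar(x=x_bar,height=y_bar,width=0.1)

6. Show the figure

plt.show()

7. Set the figure size

plt.figure(figsize=(15,15))

8. Clear the figure

plt.clf()

9. Save the figure

plt.savefig(save_path)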
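
Put together, the pyplot calls above look roughly like the sketch below. The counts are made up, the labels are in English so no special font is needed, and the figure is written to an in-memory buffer with the headless `Agg` backend instead of being shown or saved to a real path; all of these are choices made only for this example.

```python
import io
import matplotlib
matplotlib.use("Agg")                 # headless backend, no window needed
import matplotlib.pyplot as plt

x_bar = [0.025 + 0.05 * i for i in range(20)]   # centers of 20 bins on [0, 1]
y_bar = [0] * 20
y_bar[11], y_bar[19] = 3, 2                      # hypothetical counts

plt.figure(figsize=(6, 4))
plt.bar(x=x_bar, height=y_bar, width=0.05)       # bar width equal to the bin width
plt.xlabel("cosine similarity")
plt.ylabel("number of nodes")
plt.title("Similarity histogram")
buffer = io.BytesIO()
plt.savefig(buffer, format="png")                # a file path works here as well
plt.clf()
print(len(buffer.getvalue()) > 0)
```

Using a bar width equal to the bin width keeps neighboring bars from overlapping, which makes spikes easier to tell apart.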

Code

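# -*- coding: utf-8 -*-
import random
import networkx as nx
import numpy as np
import matplotlib.pyplot as plt
# Read the network
path = r"netscience.gml"
science_G = nx.read_gml(path,label="id")
# Convert the original undirected weighted graph into an undirected unweighted graph
nodes = science_G.nodes
edges = science_G.edges
G = nx.Graph()
G.add_nodes_from(nodes)
G.add_edges_from(edges)
# Collect the feature information of every node
node_degree_list = []  # degree of the node
node_egonet_list = []  # number of edges inside the egonet
node_global_list = []  # number of edges between the egonet and the rest of the graph
local_graph_list = []  # one-hop egonets (used again later)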

def calGlobalFeature(G1,G2):
    '''Count the edges of G2 that have exactly one endpoint in G1,
    i.e. the edges entering or leaving the egonet G1.'''
    edges = G2.edges
    cnt_out_edge = 0
    for edge in edges:
        u,v = edge
        if((u in G1 and v not in G1) or (v in G1 and u not in G1)):
            cnt_out_edge+=1
    return cnt_out_edge
# Visit the nodes in increasing id order so that position i of every list
# belongs to node i (this assumes the node ids are 0..n-1, which the
# indexing by node id further down relies on as well)
node_id_list = sorted(G.nodes())
for node in node_id_list:
    node_degree = G.degree(node)
    node_degree_list.append(node_degree)
    ego_G = nx.ego_graph(G,node)
    node_egonet = ego_G.number_of_edges()
    node_egonet_list.append(node_egonet)
    # Every edge leaving the egonet ends at a node two hops away,
    # so the 2-hop ego graph contains all of them
    ego2_G = nx.ego_graph(G,node,radius=2)
    node_global =  calGlobalFeature(ego_G,ego2_G)
    node_global_list.append(node_global)
    local_graph_list.append(ego_G)

node_feature = list(zip(node_id_list,node_degree_list,node_egonet_list,node_global_list)) # [id, degree, egonet edges, boundary edges]
node_feature.sort(key = lambda x:x[0])   # make sure the rows are in increasing node id order
node_feature = np.array(node_feature)    # convert to ndarray for the computations below

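# The basic features are collected and the rows are sorted by node id.
# Next, compute the similarity to node 9.
def calSim(node1,node2):
    node1_np = node1[1:]
    node2_np = node2[1:]
    norm1 = np.sqrt(np.sum(node1_np*node1_np))
    norm2 = np.sqrt(np.sum(node2_np*node2_np))
    if norm1 == 0 or norm2 == 0:   # treat the similarity with a zero vector as 0
        return (node2[0],0)
    return (node2[0],np.sum(node1_np*node2_np)/(norm1*norm2))
# I first read the note in sub-question 1 as "only compare node 9 with
# the first 10 nodes", but the search has to cover all nodes. cal_len
# is kept so that the range can still be restricted if needed.
def calNodeFeatureSim(node_feature_list,node_id,cal_len=99999):
    '''
    :param node_feature_list: feature rows of all nodes, sorted by node id
    :param node_id: the target node
    :param cal_len: only nodes with id <= cal_len are compared
    :return: list of (node id, similarity to the target), most similar first
    '''
    node_feature = node_feature_list[node_id]
    sim = []
    for node in node_feature_list:
        if(node[0] == node_id):
            continue
        if(node[0] > cal_len):
            break
        sim.append(calSim(node_feature,node))
    sim.sort(key = lambda x:x[1],reverse=True)
    return sim

print("Basic feature vector of node 9:", node_feature[9][1:])
sim9 = calNodeFeatureSim(node_feature,9)
print("The 5 nodes most similar to node 9 among all nodes:")
print("node id       similarity")
for i in range(5):
    print("  {}       {}".format(sim9[i][0],sim9[i][1]))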


# Compute the recursive features
def recursionFeature(node_feature,local_feature_list):
    '''
    :param node_feature: feature matrix; column 0 is the node id and row i belongs to node i
    :param local_feature_list: one-hop egonet of every node, indexed by node id
    :return: feature matrix with the neighbor sum and the neighbor mean appended
    '''
    new_node_feature = []
    for node in node_feature:
        node_id = int(node[0])
        local_G = local_feature_list[node_id]
        feature_len = len(node) - 1   # number of features, without the id column
        # Sum of the neighbors' feature vectors (stays all zeros without neighbors)
        feature_sum = np.zeros(shape=feature_len,dtype = np.float64)
        neighbors = [n for n in local_G.nodes() if n != node_id]
        for neig_node in neighbors:
            feature_sum += node_feature[neig_node][1:]
        # Mean of the neighbors' feature vectors (all zeros without neighbors)
        if(len(neighbors) > 0):
            feature_mean = feature_sum/len(neighbors)
        else:
            feature_mean = np.zeros(shape=feature_len,dtype = np.float64)
        new_node_feature.append(np.append(feature_sum,feature_mean))
    new_node_feature = np.array(new_node_feature)
    node_feature = np.hstack((node_feature,new_node_feature))
    return node_feature
for i in range(2):
    node_feature = recursionFeature(node_feature,local_graph_list)
#sim9 = calNodeFeatureSim(node_feature,9,10)  # this followed my original, wrong reading of the problem
sim9 = calNodeFeatureSim(node_feature,9)
print(node_feature.shape)  # each node should now have 27 features, so there should be 28 columns (the first one is the id)
print("After the recursive features")
print("The 5 nodes most similar to node 9 among all nodes:")
print("node id       similarity")
for i in range(5):
    print("  {}       {}".format(sim9[i][0],sim9[i][1]))
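# Role discovery. This part may be wrong; I am not sure my understanding of role discovery is correct.
sim9 = calNodeFeatureSim(node_feature,9) # as I understand it, role discovery should look at all nodes
sim9 = np.array(sim9)   # convert to ndarray for the computations below
# Collect the nodes that fall into each of the 20 bins on [0, 1]
bardata = []
change = 1/20
for i in range(20):
    base_l = i * change
    base_u = (i + 1) * change
    if i == 19:
        in_bin = sim9[:,1] >= base_l   # the last bin also includes similarity 1.0
    else:
        in_bin = (sim9[:,1] >= base_l) & (sim9[:,1] < base_u)
    bardata.append(sim9[in_bin])

x_bar = [1/20*x + 0.025 for x in range(20)]  # bin centers for the x-axis of the bar chart
y_bar = [len(x) for x in bardata ]   # number of nodes per bin for the y-axis

# The labels are in English, so no Chinese font (fontproperties) is needed
plt.xlabel("similarity")
plt.ylabel("number of nodes")
plt.title("Similarity histogram")
plt.bar(x=x_bar,height=y_bar,width=0.05)   # bar width equals the bin width so the bars do not overlap
plt.show()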
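# Find the three bins with the most nodes. argsort returns distinct bin
# indices even when two bins have the same count.
max3_index = [int(i) for i in np.argsort(y_bar)[::-1][:3]]
random_node = []
for index in max3_index:
    if y_bar[index] == 0:
        continue   # nothing to pick from an empty bin
    # randrange excludes the upper bound, so the row index is always valid
    random_node.append(int(bardata[index][random.randrange(y_bar[index])][0]))
print("Randomly chosen nodes from the three groups:",random_node)
plt.figure(figsize=(15,15))  # set the figure size
for node_id in random_node:
    plt.clf()   # three figures are drawn in total, so clear the figure before each one
    act_local_G = nx.ego_graph(G,node_id,radius=3)
    for node in act_local_G:
        act_local_G.add_node(node,local_feature = node_feature[node][1])
    pos=nx.spring_layout(act_local_G) # spring layout: networkx's force-directed (Fruchterman-Reingold) layout
    nx.draw_networkx(act_local_G,pos)  # this already draws the node ids as labels
    plt.savefig(r"filename" + str(node_id) + ".png")  # save the figure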