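I have mentioned the karateclub package several times before. It bundles many community detection and node embedding algorithms, and this post is a walkthrough of how to use it.
Algorithm categories
The package groups its methods into seven categories: the first two are community detection methods, the next four are node embedding methods, and the last one is whole-graph embedding.
I have compiled the methods included in the package; they are listed below.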
# Overlapping community detection
EgoNetSplitter, DANMF, NNSED, MNMF, BigClam, SymmNMF
# Non-overlapping community detection
GEMSEC, EdMot, SCD, LabelPropagation
# Neighbourhood-based node embedding
RandNE, DeepWalk, Node2Vec, Walklets, BoostNE, NodeSketch,
Diff2Vec, GEMSEC, NetMF, GraRep, NMFADMM, LaplacianEigenmaps
# Structural node embedding
GraphWave, Role2Vec
# Attributed node embedding
FeatherNode, MUSAE, SINE, BANE, TENE, TADW, FSCNMF, ASNE
# Meta node embedding
NEU
# Whole graph embedding
LDP, FeatherGraph, IGE, GeoScattering, GL2Vec, NetLSD, SF, FGSD, Graph2Vec
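Note that the node embedding algorithms in karateclub place a requirement on graph connectivity: the input must be a connected graph, otherwise an error is raised.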
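If a graph might not meet that requirement, it can be checked before fitting. The following is a minimal sketch that uses only networkx calls (`is_connected`, `connected_components`, `convert_node_labels_to_integers`). The helper name `prepare_for_embedding` and the choice to keep only the largest connected component are my own assumptions for illustration, not part of karateclub. The relabeling step is there because, as far as I know, karateclub also expects nodes to be indexed by consecutive integers starting from 0.

```python
import networkx as nx


def prepare_for_embedding(G):
    """Return a connected copy of G whose nodes are labeled 0..n-1.

    Hypothetical helper: keeps only the largest connected component.
    """
    if not nx.is_connected(G):
        largest = max(nx.connected_components(G), key=len)
        G = G.subgraph(largest).copy()
    # Relabel nodes as consecutive integers starting from 0.
    return nx.convert_node_labels_to_integers(G, first_label=0)


# Toy graph: a 4-node path plus a separate edge between "x" and "y".
G = nx.path_graph(4)
G.add_edge("x", "y")
H = prepare_for_embedding(G)
print(nx.is_connected(G), nx.is_connected(H), sorted(H.nodes()))
```

Dropping the smaller components loses nodes, so whether this is acceptable depends on the data; embedding each component separately is another option.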
Node embedding
The following shows how to use the algorithms in this package.
DeepWalk
Taking node embedding as the example, we start with DeepWalk, the most classic of these algorithms.
import networkx as nx
import networkx.algorithms.community as nx_comm
from karateclub import DeepWalk
from sklearn.cluster import KMeans

G = nx.karate_club_graph()

model = DeepWalk()  # default hyperparameters
model.fit(G)  # learn the node embedding
G_embedding = model.get_embedding()  # row i is the vector for node i

# Cluster the embedding vectors with KMeans
num_coms = 2  # number of clusters
clusters = KMeans(n_clusters=num_coms).fit_predict(G_embedding)
print(clusters)

# Convert cluster labels into a list of node sets, one per community
communities = [set() for _ in range(num_coms)]
for node, label in enumerate(clusters):
    communities[label].add(node)
print(communities)

mod = nx_comm.modularity(G, communities)
print('The modularity of karate club graph is {:.4f}'.format(mod))
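We first embed the nodes of karate_club_graph, then cluster the embedding vectors with KMeans (two clusters here), convert the cluster labels into a community partition, and finally compute its modularity. The output is shown below.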
[1 1 0 1 1 1 1 1 0 0 1 1 1 0 0 0 1 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0]
[{32, 33, 2, 8, 9, 13, 14, 15, 18, 20, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31}, {0, 1, 3, 4, 5, 6, 7, 10, 11, 12, 16, 17, 19, 21}]
The modularity of karate club graph is 0.3352
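To make the last two steps less of a black box, here is a small pure-Python sketch of turning cluster labels into communities and scoring them with the standard modularity for an unweighted, undirected graph without self-loops: Q is the sum over communities of L_c/m - (d_c/(2m))^2, where m is the number of edges, L_c is the number of edges inside community c, and d_c is the total degree of its nodes. The function names `labels_to_communities` and `modularity` and the toy graph are made up for illustration; in practice `nx_comm.modularity` does this work.

```python
from collections import defaultdict


def labels_to_communities(labels):
    """Group node indices by cluster label (labels[i] is node i's cluster)."""
    groups = defaultdict(set)
    for node, label in enumerate(labels):
        groups[label].add(node)
    return list(groups.values())


def modularity(edges, communities):
    """Modularity of a partition of an unweighted, undirected graph."""
    m = len(edges)
    community_of = {v: c for c, nodes in enumerate(communities) for v in nodes}
    internal = defaultdict(int)    # L_c: edges with both ends in community c
    degree_sum = defaultdict(int)  # d_c: total degree of nodes in community c
    for u, v in edges:
        degree_sum[community_of[u]] += 1
        degree_sum[community_of[v]] += 1
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in range(len(communities)))


# Toy graph: two triangles (0-1-2 and 3-4-5) joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
communities = labels_to_communities([0, 0, 0, 1, 1, 1])
print(communities)
print(round(modularity(edges, communities), 4))
```

Splitting the toy graph into its two triangles gives 2 * (3/7 - 0.25) = 5/14, about 0.3571, while putting every node in a single community gives exactly 0.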
Next, let's visualize the community detection result.
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(10, 6))
pos = nx.spring_layout(G)
nx.draw_networkx(G, pos, with_labels=True)
colors = ['#483D8B', '#00CED1', '#FF4500', '#FFD700']
# Redraw each community's nodes in its own color
for i in range(num_coms):
    nx.draw_networkx_nodes(G, pos, node_size=700, nodelist=list(communities[i]), node_color=colors[i])
nx.draw_networkx_edges(G, pos, alpha=0.5, width=2)
plt.axis("off")
plt.show()
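The result is as follows (figure: the karate club graph with the two communities drawn in different colors).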
Node2Vec
Next we try Node2Vec, which is an improvement on DeepWalk.
All that is needed is to replace DeepWalk with Node2Vec in the code above. This time we set the number of clusters to 3.
import networkx as nx
import networkx.algorithms.community as nx_comm
from karateclub import Node2Vec
from sklearn.cluster import KMeans

G = nx.karate_club_graph()

model = Node2Vec()  # default hyperparameters
model.fit(G)  # learn the node embedding
G_embedding = model.get_embedding()  # row i is the vector for node i

# Cluster the embedding vectors with KMeans
num_coms = 3  # number of clusters
clusters = KMeans(n_clusters=num_coms).fit_predict(G_embedding)
print(clusters)

# Convert cluster labels into a list of node sets, one per community
communities = [set() for _ in range(num_coms)]
for node, label in enumerate(clusters):
    communities[label].add(node)
print(communities)

mod = nx_comm.modularity(G, communities)
print('The modularity of karate club graph is {:.4f}'.format(mod))
This time the output is:
[1 1 0 1 2 2 2 1 0 0 2 1 1 1 0 0 2 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0]
[{32, 33, 2, 8, 9, 14, 15, 18, 20, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31}, {0, 1, 3, 7, 11, 12, 13, 17, 19, 21}, {4, 5, 6, 10, 16}]
The modularity of karate club graph is 0.3744
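As you can see, the modularity is somewhat higher than before (0.3352 earlier), although the number of clusters also changed from 2 to 3, so the two runs are not a like-for-like comparison. Now let's look at the detected communities (figure: the karate club graph colored by the three Node2Vec communities).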
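For intuition about what Node2Vec changes relative to DeepWalk: DeepWalk samples uniform random walks, while Node2Vec biases each step with a return parameter p and an in-out parameter q. The sketch below is a plain-Python illustration of that second-order walk on an adjacency dict. It is not karateclub's implementation, and `biased_walk` and the toy graph are names I made up for the example. With p = q = 1 every neighbor gets the same weight, so the walk reduces to the uniform one.

```python
import random


def biased_walk(adj, start, length, p=1.0, q=1.0, rng=None):
    """Sample one Node2Vec-style walk of up to `length` nodes from `start`.

    adj maps each node to the set of its neighbors (undirected, unweighted).
    """
    rng = rng or random.Random()
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbors = sorted(adj[cur])
        if not neighbors:
            break  # dead end (isolated node)
        if len(walk) == 1:
            # The first step has no previous node, so it is uniform.
            walk.append(rng.choice(neighbors))
            continue
        prev = walk[-2]
        weights = []
        for nxt in neighbors:
            if nxt == prev:
                weights.append(1.0 / p)  # step back to the previous node
            elif nxt in adj[prev]:
                weights.append(1.0)      # stay at distance 1 from prev
            else:
                weights.append(1.0 / q)  # move further away from prev
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


# Toy graph: two triangles (0-1-2 and 3-4-5) joined by the edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
walk = biased_walk(adj, start=0, length=10, p=1.0, q=0.5, rng=random.Random(0))
print(walk)
```

A small q makes the walk more likely to move away from where it came from, while a small p makes it more likely to step back and stay local.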
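Summary
In this post we only used the node embedding algorithms from the karateclub package, all with default parameters, and then applied a clustering algorithm to group the nodes, rather than taking the embedding output directly as the community detection result. In fact, we could set the output dimension of the embedding algorithm to 2, which makes it possible to see the low-dimensional node embedding directly; see my earlier post.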