社区发现FN算法Python实现

​2004年,Newman在GN(Girvan and Newman, 2002)算法的基础上,提出了另外一种快速检测社区的算法,称为FN算法。该算法能得到和GN算法相似的结构,但是时间复杂度更低,GN算法的时间复杂度为 O ( m 2 n ) O(m^2n) O(m2n),FN算法的时间复杂度为 O ( ( m + n ) n ) O((m+n)n) O((m+n)n),其中, m m m是边的数量, n n n是节点的数量。此处给出FN算法的Python实现,并给出对比实验以及社区发现的三种评价指标。

在这里插入图片描述
Newman, M. E. J. ,2004. Fast algorithm for detecting community structure in networks. phys rev e stat nonlin soft matter phys, 69, 066133.

去看原文

算法原理

FN算法是一种层次聚类算法。起初每个节点都是一个类。每次合并让Q值增加(即 Δ Q \Delta{Q} ΔQ)最大的一对节点,重复这个过程,直到所有节点都在一个社区为止。在这个合并的过程中,选择Q值(社区发现评估指标)最大的作为最终划分结果。

Δ Q = 2 ( e i j − a i a j ) \Delta{Q}=2(e_{ij}-a_ia_j) ΔQ=2(eijaiaj)
其中, e i j e_{ij} eij表示连接社区 i i i和社区 j j j的边的比例; a i a_i ai表示连接到社区 i i i的所有末端节点比例, a i = ∑ j e i j a_i=\sum_j{e_{ij}} ai=jeij。以下是一个合并的结构图,从下往上进行合并。
在这里插入图片描述

评价指标

社区发现的评估指标主要有三个:互信息和标准化互信息(Normalized Mutual Information,NMI指数)、调整兰德指数(Adjusted Rand Index,ARI指数)、模块度Q(modularity Q)。

当无法获取真实社区划分结果时,可以采用模块度Q来评价。Modularity用于评判社区划分结果的优劣。模块度越大则表明社区划分效果越好,其范围在 [ − 0.5 , 1 ) [-0.5,1) [0.5,1),论文(Newman, 2003)表示当Q值在0.3~0.7之间时,说明聚类的效果很好。

Q = ∑ i = 1 n ( e i i − a i 2 ) Q=\sum_{i=1}^{n}(e_{ii}-a_i^2) Q=i=1n(eiiai2)
其中 e i j = ∑ v w A v w 2 m e_{ij}=\sum_{vw}\frac{A_{vw}}{2m} eij=vw2mAvw a i = k i 2 m = ∑ j e i j a_i=\frac{k_i}{2m}=\sum_je_{ij} ai=2mki=jeij
m m m表示边的数量, e i j e_{ij} eij表示一个节点在社区 i i i内,另一个节点在社区 j j j内的边的比例。 e i i e_{ii} eii表示在社区 i i i内所有的边与整个网络所有的边的一个比值(一个社区内部的度比上整个网络的度),而 a i a_{i} ai则表示i社区内的节点的度(包含了一点在社区 i i i内一点在社区 i i i外的边的度)占整个网络的度比值。

可将模块度用矩阵形式表示,即
Q = 1 2 m T r ( S T B S ) Q=\frac{1}{2m}Tr(S^TBS) Q=2m1Tr(STBS)
其中, B i j = A i j − k i k j 2 m B_{ij}=A_{ij}-\frac{k_ik_j}{2m} Bij=Aij2mkikj k i k_i ki代表的是节点 i i i的度, A i j A_{ij} Aij为邻接矩阵; S S S为每个节点所属社区的one-hot表示, S i r = 1 S_{ir}=1 Sir=1表示第 i i i个节点属于第 r r r社区。

当已知真实社区划分结果时,可采用NMI指数和ARI指数进行评价。
1.NMI指数
如果结果越相似NMI值应接近1;结果很差则NMI值接近0。
N M I ( X , Y ) = 2 M I ( X , Y ) H ( X ) + H ( Y ) NMI(X,Y)=\frac{2MI(X,Y)}{H(X)+H(Y)} NMI(X,Y)=H(X)+H(Y)2MI(X,Y)
其中, M I ( X , Y ) = ∑ i = 1 ∣ X ∣ ∑ j = 1 ∣ Y ∣ P ( i , j ) l o g ( P ( i , j ) P ( i ) P ′ ( j ) ) MI(X,Y)=\sum_{i=1}^{|X|}\sum_{j=1}^{|Y|}P(i,j)log(\frac{P(i,j)}{P(i)P'(j)}) MI(X,Y)=i=1Xj=1YP(i,j)log(P(i)P(j)P(i,j)) H ( X ) = − ∑ i = 1 ∣ X ∣ P ( i ) l o g ( P ( i ) ) H(X)=-\sum_{i=1}^{|X|}P(i)log(P(i)) H(X)=i=1XP(i)log(P(i)) H ( Y ) = − ∑ j = 1 ∣ Y ∣ P ′ ( j ) l o g ( P ′ ( j ) ) H(Y)=-\sum_{j=1}^{|Y|}P'(j)log(P'(j)) H(Y)=j=1YP(j)log(P(j)) X , Y X,Y XY是划分类别唯一标签和真实类别唯一标签。

以下将用一个例子来介绍如何计算。
输出的划分结果:A=[1 2 1 1 1 1 1 2 2 2 2 3 1 1 3 3 3]
真实的划分结果:B=[1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3]

那么 X = u n i q u e ( A ) = [ 1 , 2 , 3 ] , Y = u n i q u e ( B ) = [ 1 , 2 , 3 ] X=unique(A)=[1,2,3],Y=unique(B)=[1,2,3] X=unique(A)=[1,2,3]Y=unique(B)=[1,2,3]
P ( i , j ) P(i,j) P(i,j)表示同时属于社区 i i i和社区 j j j的节点的联合概率, P ( i , j ) = ∣ X i ⋂ Y j ∣ N P(i,j)=\frac{|X_i\bigcap{Y_j}|}{N} P(i,j)=NXiYj, N N N为节点数, i ∈ X , j ∈ Y i\in{X},j\in{Y} iX,jY
P ( i ) , P ( j ) P(i),P(j) P(i),P(j)分别为类别 i , j i,j i,j的概率分布, P ( i ) = ∣ X i ∣ N , P ′ ( j ) = ∣ Y j ∣ N P(i)=\frac{|X_i|}{N},P'(j)=\frac{|Y_j|}{N} P(i)=NXi,P(j)=NYj
H ( X ) , H ( Y ) H(X),H(Y) H(X),H(Y)分别为 X , Y X,Y X,Y的信息熵。
所以 P ( X ) = [ 8 / 17 , 5 / 17 , 4 / 17 ] P(X)=[8/17,5/17,4/17] P(X)=[8/17,5/17,4/17] P ( Y ) = [ 6 / 17 , 6 / 17 , 5 / 17 ] P(Y)=[6/17,6/17,5/17] P(Y)=[6/17,6/17,5/17]
P ( X , Y ) = [ 5 / 17 1 / 17 2 / 17 1 / 17 4 / 17 0 0 1 / 17 3 / 17 ] P(X,Y)=\begin{bmatrix} 5/17 & 1/17 & 2/17 \\ 1/17 & 4/17 & 0 \\ 0 & 1/17 & 3/17 \\ \end{bmatrix} P(X,Y)=5/171/1701/174/171/172/1703/17
因此, M I ( X , Y ) = s u m ( P ( X , Y ) ∗ l o g ( P ( X , Y ) / ( P ( X ) T P ( Y ) ) ) ) MI(X,Y)=sum(P(X,Y) * log(P(X,Y)/(P(X)^TP(Y)))) MI(X,Y)=sum(P(X,Y)log(P(X,Y)/(P(X)TP(Y)))) H ( X ) = − P ( X ) l o g ( P ( X ) T ) H(X)=-P(X)log(P(X)^T) H(X)=P(X)log(P(X)T) H ( Y ) = − P ( Y ) l o g ( P ( Y ) T ) H(Y)=-P(Y)log(P(Y)^T) H(Y)=P(Y)log(P(Y)T),则 N M I ( X , Y ) = 0.3646 NMI(X,Y)=0.3646 NMI(X,Y)=0.3646

[1] Detecting the overlapping and hierarchical community structure in complex networks

2.ARI指数
兰德指数(RI指数)是两种划分 X , Y X,Y X,Y中顶点对正确分类的数量(顶点对在同一个社团中或者在不同的社团中)与总的顶点对的数量的比值,可以使用下式表示:
R I ( X , Y ) = a 00 + a 11 a 00 + a 01 + a 10 + a 11 = a 00 + a 11 C 2 n RI(X,Y)=\frac{a_{00}+a_{11}}{a_{00}+a_{01}+a_{10}+a_{11}}=\frac{a_{00}+a_{11}}{C_2^n} RI(X,Y)=a00+a01+a10+a11a00+a11=C2na00+a11
其中, a 00 a_{00} a00表示在真实社团划分与实验得到的社团划分里都不属于同一社团的点对数目; a 11 a_{11} a11表示在真实社团划分与实验得到的社团划分里都属于同一社团的点对数目; C 2 n C_2^n C2n指可以组成的总顶点对对数。 R I RI RI取值范围为 [ 0 , 1 ] [0,1] [0,1],值越大意味着两种划分结果越吻合。然而 R I RI RI会存在区分度不高的情况。因此为了提高区分度,提出了ARI指数:
A R I = R I − E ( R I ) m a x ( R I ) − E ( R I ) ARI=\frac{RI-E(RI)}{max(RI)-E(RI)} ARI=max(RI)E(RI)RIE(RI)
A R I ARI ARI取值范围为 [ − 1 , 1 ] [−1,1] [1,1],值越大意味着两种划分结果越吻合。从广义的角度来讲,ARI衡量的是两个数据分布的吻合程度。

结果对比

为了验证FN算法的结果质量和计算效率,采用了3个不同的测试案例来和GN算法进行对比实验。

自定义网络dolphinsfootball
节点数量2262115
边数量37159613

以下是实验结果(football网络有真实划分结果,分割线|前的为GN算法,后为FN算法,即GN | FN):

QNMIARI计算时间(s)
自定义网络0.528|0.528//0.024|0.02
dolphins0.519|0.495//0.67|0.46
football0.60|0.550.88|0.700.78|0.4710.1|8.8

可以看出,FN算法的结果质量稍逊于GN算法,但是计算效率更高。

以下是dolphins网络中两个算法结果的可视化:
FN算法结果(4个社区):
在这里插入图片描述
GN算法结果(5个社区):
在这里插入图片描述

源码

原文有源码,去看原文

一种高效实现

FN算法的一种高效实现
FN算法的一种高效实现
FN算法的一种高效实现

更多内容,请关注地学分析与算法。
在这里插入图片描述

  • 4
    点赞
  • 36
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
以下是双向A星算法Python实现: ```python import heapq def astar(start, goal, neighbors_fn, heuristic_fn): """ A* algorithm implementation. :param start: starting node :param goal: goal node :param neighbors_fn: function that returns the neighbors of a node :param heuristic_fn: function that returns the heuristic value of a node :return: path from start to goal """ # Initialize the open and closed sets open_set = [] closed_set = set() # Initialize the start and goal nodes start_node = Node(start, None, 0, heuristic_fn(start, goal)) goal_node = Node(goal, None, 0, 0) # Add the start node to the open set heapq.heappush(open_set, start_node) # Loop until the open set is empty while open_set: # Get the node with the lowest f value current_node = heapq.heappop(open_set) # Check if we have reached the goal if current_node == goal_node: path = [] while current_node: path.append(current_node.state) current_node = current_node.parent return path[::-1] # Add the current node to the closed set closed_set.add(current_node) # Get the neighbors of the current node for neighbor in neighbors_fn(current_node.state): # Create a new node for the neighbor neighbor_node = Node(neighbor, current_node, current_node.g + 1, heuristic_fn(neighbor, goal)) # Check if the neighbor is already in the closed set if neighbor_node in closed_set: continue # Check if the neighbor is already in the open set if neighbor_node in open_set: # Update the neighbor's g value if we found a better path open_neighbor = open_set[open_set.index(neighbor_node)] if neighbor_node.g < open_neighbor.g: open_neighbor.g = neighbor_node.g open_neighbor.parent = neighbor_node.parent else: # Add the neighbor to the open set heapq.heappush(open_set, neighbor_node) # If we reach this point, there is no path from start to goal return None class Node: """ A node in the search tree. """ def __init__(self, state, parent, g, h): self.state = state self.parent = parent self.g = g self.h = h def __lt__(self, other): return (self.g + self.h) < (other.g + other.h) def __eq__(self, other): return self.state == other.state def bidirectional_astar(start, goal, neighbors_fn, heuristic_fn): """ Bidirectional A* algorithm implementation. :param start: starting node :param goal: goal node :param neighbors_fn: function that returns the neighbors of a node :param heuristic_fn: function that returns the heuristic value of a node :return: path from start to goal """ # Initialize the open and closed sets for both directions open_set_start = [] closed_set_start = set() open_set_goal = [] closed_set_goal = set() # Initialize the start and goal nodes for both directions start_node_start = Node(start, None, 0, heuristic_fn(start, goal)) start_node_goal = Node(goal, None, 0, heuristic_fn(goal, start)) goal_node_start = Node(goal, None, 0, 0) goal_node_goal = Node(start, None, 0, 0) # Add the start and goal nodes to the open sets for both directions heapq.heappush(open_set_start, start_node_start) heapq.heappush(open_set_goal, start_node_goal) # Loop until one of the open sets is empty while open_set_start and open_set_goal: # Get the node with the lowest f value from both directions current_node_start = heapq.heappop(open_set_start) current_node_goal = heapq.heappop(open_set_goal) # Check if we have reached the goal from both directions if current_node_start in closed_set_goal or current_node_goal in closed_set_start: path = [] # Add the path from start to current_node_start while current_node_start: path.append(current_node_start.state) current_node_start = current_node_start.parent # Add the path from goal to current_node_goal while current_node_goal: path.append(current_node_goal.state) current_node_goal = current_node_goal.parent return path[::-1] # Add the current nodes to the closed sets for both directions closed_set_start.add(current_node_start) closed_set_goal.add(current_node_goal) # Get the neighbors of the current nodes for both directions for neighbor_start in neighbors_fn(current_node_start.state): # Create a new node for the neighbor from the start direction neighbor_node_start = Node(neighbor_start, current_node_start, current_node_start.g + 1, heuristic_fn(neighbor_start, goal)) # Check if the neighbor from the start direction is already in the closed set from the goal direction if neighbor_node_start in closed_set_goal: continue # Check if the neighbor from the start direction is already in the open set from the start direction if neighbor_node_start in open_set_start: # Update the neighbor's g value if we found a better path open_neighbor_start = open_set_start[open_set_start.index(neighbor_node_start)] if neighbor_node_start.g < open_neighbor_start.g: open_neighbor_start.g = neighbor_node_start.g open_neighbor_start.parent = neighbor_node_start.parent else: # Add the neighbor from the start direction to the open set from the start direction heapq.heappush(open_set_start, neighbor_node_start) for neighbor_goal in neighbors_fn(current_node_goal.state): # Create a new node for the neighbor from the goal direction neighbor_node_goal = Node(neighbor_goal, current_node_goal, current_node_goal.g + 1, heuristic_fn(neighbor_goal, start)) # Check if the neighbor from the goal direction is already in the closed set from the start direction if neighbor_node_goal in closed_set_start: continue # Check if the neighbor from the goal direction is already in the open set from the goal direction if neighbor_node_goal in open_set_goal: # Update the neighbor's g value if we found a better path open_neighbor_goal = open_set_goal[open_set_goal.index(neighbor_node_goal)] if neighbor_node_goal.g < open_neighbor_goal.g: open_neighbor_goal.g = neighbor_node_goal.g open_neighbor_goal.parent = neighbor_node_goal.parent else: # Add the neighbor from the goal direction to the open set from the goal direction heapq.heappush(open_set_goal, neighbor_node_goal) # If we reach this point, there is no path from start to goal return None ``` 希望这个实现对你有所帮助!

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值