Community Detection: A Python Implementation of the Label Propagation Algorithm

Don’t you wonder sometimes, what might have happened if you tried?

Sometimes it is better not to overthink things: just try it and see what happens.

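LPA: the Label Propagation Algorithm

Main advantages: the time complexity is nearly linear, and the number of communities does not need to be known in advance.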

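Main procedure: first assign every node a unique label. Then iterate over the nodes and update them one by one: for each node, count the labels of its neighbors and adopt the most frequent one. If several labels tie for the highest count, pick one of them at random. Repeat until the labels converge.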
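
The update rule for a single node can be sketched in a few lines of plain Python. This is only an illustration of the majority vote with random tie-breaking described above; the function name choose_label and its rng parameter are made up for this example and do not appear in the implementation further down.

```python
import random
from collections import Counter


def choose_label(neighbor_labels, rng=random):
    """Return the most frequent label; break ties uniformly at random."""
    counts = Counter(neighbor_labels)
    highest = max(counts.values())
    # Every label that reaches the highest count is a candidate
    candidates = [label for label, n in counts.items() if n == highest]
    return rng.choice(candidates)


print(choose_label([3, 3, 7]))  # label 3 occurs most often
print(choose_label([3, 7]))     # tie: either 3 or 7 may be returned
```

Passing a seeded random.Random instance as rng makes the tie-breaking reproducible, which helps when comparing runs.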

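There are two strategies for updating node labels: synchronous and asynchronous.
Synchronous update: iteration t relies only on the labels produced by iteration t-1.
Asynchronous update: iteration t uses the labels already updated during iteration t together with the iteration t-1 labels of the nodes that have not yet been updated in iteration t. The result therefore depends on the order in which nodes are visited, so the nodes are updated in a randomly chosen order.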
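
The difference between the two strategies shows up even on a graph with a single edge. The sketch below is a minimal illustration, assuming the graph is given as a plain adjacency dict; sync_sweep, async_sweep and _majority are hypothetical helper names. With synchronous updates the two endpoints simply swap labels, while an asynchronous sweep lets the second node see the label the first one just adopted.

```python
import random
from collections import Counter


def _majority(labels, rng):
    """Most frequent label in the list, ties broken at random."""
    counts = Counter(labels)
    highest = max(counts.values())
    return rng.choice([l for l, n in counts.items() if n == highest])


def sync_sweep(adj, labels, rng):
    """Synchronous: every node reads only the labels of the previous sweep."""
    previous = dict(labels)
    return {n: _majority([previous[m] for m in adj[n]], rng) for n in adj}


def async_sweep(adj, labels, rng):
    """Asynchronous: nodes are visited in random order and read the
    freshest label of each neighbor."""
    current = dict(labels)
    order = list(adj)
    rng.shuffle(order)
    for n in order:
        current[n] = _majority([current[m] for m in adj[n]], rng)
    return current


adj = {0: [1], 1: [0]}   # two nodes joined by one edge
labels = {0: 0, 1: 1}    # unique initial labels
rng = random.Random(0)

print(sync_sweep(adj, labels, rng))   # the labels are swapped
print(async_sweep(adj, labels, rng))  # both nodes end up with one label
```

Repeating sync_sweep on this graph swaps the labels back and forth without ever settling, which is one reason to prefer the asynchronous variant.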

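LPA is suited to detecting non-overlapping communities. For overlapping communities, researchers proposed COPRA (Community Overlap PRopagation Algorithm). It allows every node to belong to up to V communities at the same time, where V is a global parameter set by hand. The choice of V clearly has a direct effect on the quality of the result and requires sufficient prior knowledge; in real community networks a suitable V is hard to determine.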
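
To make the role of V more concrete, here is a rough sketch of a single COPRA-style node update as I understand the published description of the algorithm: each node carries a set of labels with belonging coefficients, averages the coefficients of its neighbors, drops labels whose coefficient falls below 1/V, and renormalizes. The function name copra_update and the dict-based data layout are assumptions made for this example, and the node is assumed to have at least one neighbor.

```python
import random
from collections import defaultdict


def copra_update(neighbor_memberships, v, rng=random):
    """neighbor_memberships: one dict per neighbor, label -> coefficient.
    Returns the new label -> coefficient dict for the node."""
    totals = defaultdict(float)
    for membership in neighbor_memberships:
        for label, coeff in membership.items():
            totals[label] += coeff
    k = len(neighbor_memberships)
    averaged = {label: s / k for label, s in totals.items()}

    # Keep only labels whose coefficient reaches the threshold 1/V
    threshold = 1.0 / v
    kept = {l: c for l, c in averaged.items() if c >= threshold}
    if not kept:
        # Nothing reaches the threshold: keep one strongest label at random
        best = max(averaged.values())
        winner = rng.choice([l for l, c in averaged.items() if c == best])
        kept = {winner: best}

    # Renormalize so that the coefficients sum to 1
    norm = sum(kept.values())
    return {l: c / norm for l, c in kept.items()}


neighbors = [{'a': 1.0}, {'a': 1.0}, {'b': 1.0}]
print(copra_update(neighbors, v=2))  # only 'a' passes the threshold 1/2
print(copra_update(neighbors, v=4))  # both 'a' and 'b' pass 1/4
```

A larger V lowers the threshold and lets a node keep more labels, which is why the result is so sensitive to this parameter.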

Python implementation

# -*- coding: UTF-8 -*-

"""
Created on 17-11-28

@summary: Implementation of the classic label propagation algorithm (LPA)

@author: dreamhome
"""

# Written for Python 3 and networkx 2.x or later
import random
import networkx as nx
import matplotlib.pyplot as plt


def read_graph_from_file(path):
    """
    Read a graph from an edge-list file.
    :param path: path of the edge-list file
    :return: Graph graph
    """
    # Create the graph
    graph = nx.Graph()
    # Collect the edges in edges_list
    edges_list = []
    # Each valid line holds two integer node ids; any other line
    # (blank lines, comments, headers) is skipped
    with open(path) as fp:
        for line in fp:
            edge = line.split()
            if len(edge) >= 2 and edge[0].isdigit() and edge[1].isdigit():
                edges_list.append((int(edge[0]), int(edge[1])))
    # Add the edges to the graph
    graph.add_edges_from(edges_list)

    # Give every node a unique initial label: its own id
    for node, data in graph.nodes(data=True):
        data['label'] = node

    return graph


def lpa(graph):
    """
    Label propagation algorithm using asynchronous updates.
    :param graph: networkx Graph whose nodes carry a 'label' attribute
    :return: None, the labels are updated in place
    """
    def dominant_labels(node):
        """
        Return the labels that occur most often among the neighbors of node.
        """
        count = {}
        for neighbor in graph.neighbors(node):
            neighbor_label = graph.nodes[neighbor]['label']
            count[neighbor_label] = count.get(neighbor_label, 0) + 1
        if not count:
            # An isolated node keeps its own label
            return [graph.nodes[node]['label']]
        max_count = max(count.values())
        return [k for k, v in count.items() if v == max_count]

    def estimate_stop_condition():
        """
        Stop condition: every node's label is one of the most frequent
        labels among its neighbors.
        :return: True if the condition holds, otherwise False
        """
        for node in graph.nodes():
            if graph.nodes[node]['label'] not in dominant_labels(node):
                return False

        return True

    loop_count = 0

    # Iterate the label propagation process
    while True:
        loop_count += 1
        print('iteration', loop_count)

        for node in graph.nodes():
            # When several labels share the highest count, pick one at random
            graph.nodes[node]['label'] = random.choice(dominant_labels(node))

        # Stop on convergence or after at most 10 iterations
        if estimate_stop_condition() or loop_count >= 10:
            print('complete')
            return


if __name__ == "__main__":

    path = "/home/dreamhome/network-datasets/dolphins/out.dolphins"
    graph = read_graph_from_file(path)
    lpa(graph)

    # Draw the graph, coloring each node by its final label
    node_color = [float(graph.nodes[v]['label']) for v in graph]
    nx.draw_networkx(graph, node_color=node_color)
    plt.show()
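
Besides plotting, it is often handy to turn the final labels into explicit communities. The helper below is a small sketch for that step; communities_from_labels is a hypothetical name and the sample labels are made up. With networkx 2.x the input dict can be obtained through nx.get_node_attributes(graph, 'label') after lpa(graph) has run.

```python
from collections import defaultdict


def communities_from_labels(labels):
    """Group nodes by label; return the groups sorted from largest to smallest."""
    groups = defaultdict(set)
    for node, label in labels.items():
        groups[label].add(node)
    return sorted(groups.values(), key=len, reverse=True)


# Made-up labels: nodes 1-3 share label 5, nodes 4-5 share label 9
labels = {1: 5, 2: 5, 3: 5, 4: 9, 5: 9}
for members in communities_from_labels(labels):
    print(sorted(members))
```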
