Overlapping Community Detection Based on Label Propagation (the COPRA Algorithm)

Touch59

Tags: community detection, COPRA algorithm

Article link: https://blog.csdn.net/u010658028/article/details/80352437

Introduction

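COPRA [1] is a community detection algorithm based on label propagation, proposed by Gregory in 2010. It can be seen as an extension of the RAK algorithm [2]. Its main improvement over RAK is that it can detect overlapping communities, whereas RAK can only detect non-overlapping ones.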

The COPRA Algorithm

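COPRA starts by giving every node in the network a unique community label, usually the node's own ID. Each node then updates its labels according to the communities of its neighbours: put simply, a node adopts the communities that its neighbours favour. The algorithm uses a belonging coefficient to decide which communities a node keeps. Labels whose coefficient falls below the threshold 1/K are discarded, where K is the maximum number of communities a node may belong to. If every candidate label falls below the threshold, the node picks one community at random. (In Gregory's formulation [1], the label with the largest coefficient is kept, and only ties are broken at random.) Finally, a stopping criterion decides when the algorithm terminates. Two criteria are common: the first stops when the number of community labels is the same in two consecutive iterations; the second stops when the number of nodes in each community is unchanged across two consecutive iterations. The pseudocode is as follows: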

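Input: graph(V, E), K (the maximum number of communities per node)
Output: the community information of the nodes, partition
Assign a unique community label to every node
Until the stopping criterion is met, repeat for every node:
    Update the node's belonging coefficient bc for each community found among its neighbours
    If bc < 1/K for a community label:
        Discard that community label
    If bc < 1/K for all community labels:
        Pick one community label at random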
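
To make the thresholding step concrete, here is a minimal sketch of how the belonging coefficients of one node could be computed and filtered. It uses the same simplification as the Java code in this post, where a label's coefficient is the share of neighbours that carry it. The class and method names (`BelongingCoefficientSketch`, `coefficients`, `aboveThreshold`) are hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class BelongingCoefficientSketch {

    /** Share of neighbours carrying each label (a simplified belonging coefficient). */
    static Map<String, Double> coefficients(List<Set<String>> neighbourLabels) {
        Map<String, Integer> freq = new TreeMap<>();
        for (Set<String> labels : neighbourLabels) {
            for (String label : labels) {
                freq.merge(label, 1, Integer::sum);
            }
        }
        Map<String, Double> bc = new TreeMap<>();
        for (Map.Entry<String, Integer> e : freq.entrySet()) {
            bc.put(e.getKey(), e.getValue() / (double) neighbourLabels.size());
        }
        return bc;
    }

    /** Labels whose coefficient reaches the threshold 1/k. */
    static List<String> aboveThreshold(Map<String, Double> bc, int k) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, Double> e : bc.entrySet()) {
            if (e.getValue() >= 1.0 / k) {
                kept.add(e.getKey());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // A node with four neighbours: three carry label "x", one carries "y".
        List<Set<String>> neighbours = Arrays.asList(
                Collections.singleton("x"), Collections.singleton("x"),
                Collections.singleton("x"), Collections.singleton("y"));
        Map<String, Double> bc = coefficients(neighbours);
        System.out.println(bc);                    // {x=0.75, y=0.25}
        System.out.println(aboveThreshold(bc, 2)); // [x]
    }
}
```

With K = 2 the threshold is 0.5, so only label x survives. If all four neighbours carried different labels, every coefficient would be 0.25 and the node would fall into the "pick one at random" branch.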

Java Code

Building the Graph

The graph is stored as an adjacency list.

package util;

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.*;

public class Graph {
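    /**
     * Graph data structure: adjacency list.
     */
    private Map<String, ArrayList<String>> adjT;
    /**
     * Community labels of each node, keyed by node ID: the state after the
     * previous iteration (Past) and the state being built in the current one (New).
     */
    private Map<String, HashSet<String>> nodeCommunityInfoPast = new HashMap<>();
    private Map<String, HashSet<String>> nodeCommunityInfoNew = new HashMap<>();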

    public Graph(){
        this.adjT = new HashMap<>();
    }

    public Graph(String edgePath){
        this.adjT = new HashMap<String, ArrayList<String>>();
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader reader = new BufferedReader(new FileReader(edgePath))){
            String line;
            while ((line = reader.readLine()) != null){
                // Each line of the CSV file holds one edge: "source,target"
                String[] item = line.split(",");
                if (item.length < 2){
                    continue; // skip blank or malformed lines
                }
                String v = item[0].trim();
                String w = item[1].trim();

                // Register both endpoints the first time they appear
                if (! this.adjT.containsKey(v)){
                    this.adjT.put(v, new ArrayList<>());
                    this.nodeCommunityInfoNew.put(v, new HashSet<>());
                }
                if (! this.adjT.containsKey(w)){
                    this.adjT.put(w, new ArrayList<>());
                    this.nodeCommunityInfoNew.put(w, new HashSet<>());
                }

                // Undirected graph: store the edge in both directions, without duplicates
                if (! this.adjT.get(v).contains(w)){
                    this.adjT.get(v).add(w);
                }
                if (! this.adjT.get(w).contains(v)){
                    this.adjT.get(w).add(v);
                }
            }
        } catch (Exception e){
            e.printStackTrace();
        }
    }

    /**
     * Returns whether there is an edge between two nodes.
     */
    public boolean hasEdge(String v, String w){
        ArrayList<String> adjacent = this.adjT.get(v);
        // An unknown node has no edges
        return adjacent != null && adjacent.contains(w);
    }

    /**
     * Returns the neighbours of a node.
     */
    public ArrayList<String> neighbors(String node){
        return this.adjT.get(node);
    }

    /**
     * Returns all nodes in the network.
     */
    public Iterable<String> nodes(){
        return this.adjT.keySet();
    }

    /**
     * Returns the community information of all nodes.
     */
    public Map<String, HashSet<String>> getNodeCommunityInfo() {
        return this.nodeCommunityInfoPast;
    }

    /**
     * Returns the community labels of a single node.
     */
    public HashSet<String> getCommnityLabel(String node){
        return this.nodeCommunityInfoPast.get(node);
    }

    /**
     * Adds a community label to a node for the current iteration.
     */
    public void updateNodeCommunityLabel(String node, String cLabel){
        this.nodeCommunityInfoNew.get(node).add(cLabel);
    }

    /**
     * Once a round of updates is complete, overwrites the previous community
     * information with the new one and resets the new one for the next round.
     */
    public void coverCommunityInfo(){
        this.nodeCommunityInfoPast.clear();
        for (Map.Entry<String, HashSet<String>> entry : this.nodeCommunityInfoNew.entrySet()){
            nodeCommunityInfoPast.put(entry.getKey(), new HashSet<>(entry.getValue()));
            entry.getValue().clear();
        }
    }

}

Community Detection with COPRA

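Because the networks in my project are very large, the stopping criterion is set to "stop when the number of communities does not change between two consecutive iterations". This setting does not guarantee that the algorithm has truly converged. For smaller networks, the criterion can instead be "stop when the size of every community is unchanged between two consecutive iterations", but the iteration then takes much longer.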

package community;

import util.Graph;
import java.util.*;

public class COPRA {

    /**
     * Randomly draws up to {@code count} distinct elements from a list.
     *
     * @param paramList the list to sample from (left unmodified)
     * @param count     the number of elements to draw
     * @return a new list holding the sampled elements, or a copy of the whole
     *         list if it holds no more than {@code count} elements
     */
    public static List<String> getRandomList(List<String> paramList, int count){
        List<String> shuffled = new ArrayList<>(paramList);
        if (shuffled.size() <= count){
            return shuffled;
        }
        // Shuffle a copy and keep the first count elements; unlike retrying on
        // duplicate indices, this always finishes in a bounded number of steps.
        Collections.shuffle(shuffled);
        return new ArrayList<>(shuffled.subList(0, count));
    }

    /**
     * Partitions the graph into (possibly overlapping) communities with COPRA.
     *
     * @param graph         the graph to partition
     * @param v             the maximum number of communities a node may belong to
     * @param maxIterations the maximum number of iterations
     * @return a map from each node to the set of community labels it carries
     */
    public Map<String, HashSet<String>> divide_community(Graph graph, int v, int maxIterations){
        // Initially, give every node a unique community label: its own ID.
        Iterable<String> nodes = graph.nodes();
        for (String id : nodes){
            graph.updateNodeCommunityLabel(id, id);
        }
        graph.coverCommunityInfo();

        Random random = new Random();
        int iterations = 0;
        int communityNumPast = 0; // number of distinct labels after the previous iteration
        while (iterations < maxIterations){
            for (String id : nodes){
                // Count how often each community label occurs among the neighbours.
                Map<String, Integer> labelsFreq = new HashMap<>();
                ArrayList<String> neighbors = graph.neighbors(id);
                for (String n : neighbors){
                    for (String label : graph.getCommnityLabel(n)){
                        labelsFreq.merge(label, 1, Integer::sum);
                    }
                }

                // The belonging coefficient of a label is the share of neighbours
                // carrying it. Keep the labels whose coefficient reaches 1/v.
                List<String> labelList = new ArrayList<>();
                for (Map.Entry<String, Integer> entry : labelsFreq.entrySet()){
                    if (entry.getValue() / (float) neighbors.size() >= 1 / (float) v){
                        labelList.add(entry.getKey());
                    }
                }

                if (!labelList.isEmpty()){
                    // At most v labels qualify: keep them all. More than v qualify
                    // (neighbours may carry several labels each): keep v at random.
                    for (String l : getRandomList(labelList, v)){
                        graph.updateNodeCommunityLabel(id, l);
                    }
                } else if (!labelsFreq.isEmpty()){
                    // Every coefficient is below the threshold: pick one label at random.
                    List<String> candidates = new ArrayList<>(labelsFreq.keySet());
                    graph.updateNodeCommunityLabel(id, candidates.get(random.nextInt(candidates.size())));
                }
            }
            graph.coverCommunityInfo();
            iterations += 1;

            // Stop when the number of communities is the same as after the previous iteration.
            Set<String> communitiesNow = new HashSet<>();
            for (HashSet<String> labels : graph.getNodeCommunityInfo().values()){
                communitiesNow.addAll(labels);
            }
            if (communitiesNow.size() == communityNumPast){
                break;
            }
            communityNumPast = communitiesNow.size();
        }

        return graph.getNodeCommunityInfo();
    }
}
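
The class above implements only the first stopping criterion. As a rough sketch of the second one (the size of every community is unchanged between two iterations), the check could look like the following. The helper names `communitySizes` and `sizesUnchanged` are hypothetical and not part of the original code; the sketch assumes the caller keeps a copy of the previous partition, since `coverCommunityInfo` overwrites it.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class SizeStopSketch {

    /** Counts how many nodes carry each community label. */
    static Map<String, Integer> communitySizes(Map<String, ? extends Set<String>> partition) {
        Map<String, Integer> sizes = new HashMap<>();
        for (Set<String> labels : partition.values()) {
            for (String label : labels) {
                sizes.merge(label, 1, Integer::sum);
            }
        }
        return sizes;
    }

    /** True when every community has the same size in both partitions. */
    static boolean sizesUnchanged(Map<String, ? extends Set<String>> previous,
                                  Map<String, ? extends Set<String>> current) {
        return communitySizes(previous).equals(communitySizes(current));
    }

    public static void main(String[] args) {
        Map<String, Set<String>> before = new HashMap<>();
        before.put("a", new HashSet<>(Arrays.asList("b", "g")));
        before.put("b", new HashSet<>(Arrays.asList("b")));
        before.put("e", new HashSet<>(Arrays.asList("g")));

        Map<String, Set<String>> after = new HashMap<>(before);
        after.put("a", new HashSet<>(Arrays.asList("b"))); // node a drops label g

        System.out.println(sizesUnchanged(before, before)); // true
        System.out.println(sizesUnchanged(before, after));  // false
    }
}
```

In the second comparison the number of communities is still two, so the first criterion would already stop, while the size-based criterion notices that community g shrank and keeps iterating.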

Running Community Detection

The program takes a CSV file as input, with the data organised as an edge list, one edge per line. For example:

a,b
a,d
a,e
a,g
b,d
b,c
c,d
e,g
e,f
f,g

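The graph in test.csv is the example from [1]. Before running the program, K is set to 2, meaning a node can belong to at most two communities, and the maximum number of iterations is set to 10000.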

import community.COPRA;
import util.Graph;
import java.io.IOException;
import java.util.*;


public class Main {
    public static void main(String[] args) throws IOException {
        Graph graph = new Graph("./data/test.csv");
        COPRA copra = new COPRA();
        Map<String, HashSet<String>> partitions = copra.divide_community(graph, 2, 10000);
        System.out.println(partitions);

        /**
         * Count the number of communities.
         */
        Set<String> community = new HashSet<>();
        for (Map.Entry<String, HashSet<String>> entry : partitions.entrySet()){
            community.addAll(entry.getValue());
        }
        System.out.println("Number of communities: " + community.size());
    }
}

Running the program gives:

{a=[b, g], b=[b], c=[b], d=[b], e=[g], f=[g], g=[g]}
Number of communities: 2
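
The result maps each node to its labels, which makes the overlap hard to see at a glance. One possible way to read it, sketched here with a hypothetical helper `membersByCommunity` that is not part of the original program, is to invert the map so that each community lists its members:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class CommunityMembersSketch {

    /** Inverts a node-to-labels map into a label-to-members map (sorted for printing). */
    static Map<String, Set<String>> membersByCommunity(Map<String, ? extends Set<String>> partition) {
        Map<String, Set<String>> members = new TreeMap<>();
        for (Map.Entry<String, ? extends Set<String>> entry : partition.entrySet()) {
            for (String label : entry.getValue()) {
                members.computeIfAbsent(label, k -> new TreeSet<>()).add(entry.getKey());
            }
        }
        return members;
    }

    public static void main(String[] args) {
        // The partition printed above, entered by hand.
        Map<String, Set<String>> partition = new HashMap<>();
        partition.put("a", new HashSet<>(Arrays.asList("b", "g")));
        for (String node : Arrays.asList("b", "c", "d")) {
            partition.put(node, new HashSet<>(Arrays.asList("b")));
        }
        for (String node : Arrays.asList("e", "f", "g")) {
            partition.put(node, new HashSet<>(Arrays.asList("g")));
        }
        System.out.println(membersByCommunity(partition));
        // {b=[a, b, c, d], g=[a, e, f, g]}
    }
}
```

Node a appears in both communities, which is exactly the overlap that a non-overlapping method such as RAK cannot express.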

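The algorithm has an obvious weakness, however: it is highly random. Almost every run produces a different partition. The output above is correct, but it only appeared after several runs, and many of the runs in between produced wrong partitions. I also tried the second stopping criterion, and with it the results were noticeably more reliable, but because the network I need to partition is so large, I had to settle for the first criterion. Of course, this behaviour may also be caused by a bug in my program, so corrections are welcome!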
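
When investigating this kind of run-to-run variation, it can help to make a single run repeatable. The code above draws its random choices from unseeded generators; a small sketch of the alternative, assuming the generator is passed in by the caller, is shown below. The class `SeededChoiceSketch`, the method `pickLabel` and the seed value are hypothetical. This does not make the algorithm any less random, it only makes a given run reproducible.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class SeededChoiceSketch {

    /** Picks one label at random, using the generator supplied by the caller. */
    static String pickLabel(Collection<String> candidates, Random rng) {
        // Sort first so the result does not depend on the iteration order of
        // the collection the candidates came from (for example a HashMap key set).
        List<String> sorted = new ArrayList<>(candidates);
        Collections.sort(sorted);
        return sorted.get(rng.nextInt(sorted.size()));
    }

    public static void main(String[] args) {
        List<String> labels = Arrays.asList("b", "d", "e", "g");
        Random first = new Random(42L);  // 42 is an arbitrary seed
        Random second = new Random(42L);
        for (int i = 0; i < 5; i++) {
            // Generators created with the same seed make the same choices.
            System.out.println(pickLabel(labels, first).equals(pickLabel(labels, second))); // true
        }
    }
}
```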

References

[1] Gregory S. Finding overlapping communities in networks by label propagation[J]. New Journal of Physics, 2010, 12(10): 103018.
[2] Raghavan U N, Albert R, Kumara S. Near linear time algorithm to detect community structures in large-scale networks[J]. Physical Review E, 2007, 76(3): 036106.
