Efficient Closest Community Search over Large Graphs: The Community Search Problem

望月听风

Article link: https://blog.csdn.net/miss_na/article/details/112970915

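Introduction

This article studies the closest community search problem: given a graph G and a set of query vertices Q, find a connected subgraph of G that contains Q. The subgraph should also be highly cohesive, meaning that its vertices are closely related to one another and share similar characteristics.
The computation uses a two-phase approach: (1) compute g0, the maximal connected subgraph of G that contains Q and has the highest cohesiveness, and (2) iteratively remove from g0 the vertex furthest from Q, then also remove any other vertices that now violate the cohesiveness requirement.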

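Algorithms

Phase 1

Baseline algorithm, O(n+m)

Baseline-S1:
First compute the core number of every vertex; a peeling algorithm does this in linear time.
The goal is to find the maximal k-core that contains the query vertices Q, that is, the largest k (denoted kQ) for which all of Q lies in one connected component of the k-core.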

Input: Graph G = (V, E) and a set of query vertices Q ⊂ V
Output: kQ and the (kQ, ∞)-community of Q
Run the peeling algorithm of [1] to compute the core number of every vertex of G;
Initialize a priority queue PQ containing an arbitrary vertex of Q;
kQ ← n;
while not all vertices of Q have been visited do
    u ← pop the vertex with the maximum core number from PQ;
    Mark u as visited;
    if core(u) < kQ then kQ ← core(u);
    for each neighbor v ∈ N(u) do
        if v is not in PQ and has not been visited then push v into PQ;
g0 ← the connected component of the kQ-core of G that contains Q;
return (kQ, g0);
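
The steps of Baseline-S1 can be sketched in Python as follows. This is a minimal illustration and not the paper's implementation: the graph is assumed to be a dict mapping each vertex to a set of neighbours, the function names `core_numbers` and `baseline_s1` are made up for this example, and the peeling step uses a binary heap for brevity, so it runs in O(m log n) rather than the linear time of the bucket-based peeling algorithm.

```python
import heapq
from collections import deque


def core_numbers(adj):
    """Peeling: repeatedly remove a vertex of minimum remaining degree."""
    deg = {v: len(ns) for v, ns in adj.items()}
    heap = [(d, v) for v, d in deg.items()]
    heapq.heapify(heap)
    core, removed, k = {}, set(), 0
    while heap:
        d, v = heapq.heappop(heap)
        if v in removed or d != deg[v]:
            continue  # stale heap entry
        k = max(k, d)  # core numbers never decrease along the peeling order
        core[v] = k
        removed.add(v)
        for w in adj[v]:
            if w not in removed:
                deg[w] -= 1
                heapq.heappush(heap, (deg[w], w))
    return core


def baseline_s1(adj, query):
    """Return (kQ, vertex set of g0) for the query vertices."""
    core = core_numbers(adj)
    query = set(query)
    start = next(iter(query))
    heap = [(-core[start], start)]  # max-heap on core number
    pushed = {start}
    unvisited = set(query)
    k_q = len(adj)
    while unvisited and heap:
        neg_core, u = heapq.heappop(heap)
        k_q = min(k_q, -neg_core)
        unvisited.discard(u)
        for v in adj[u]:
            if v not in pushed:
                pushed.add(v)
                heapq.heappush(heap, (-core[v], v))
    if unvisited:
        raise ValueError("query vertices are not connected in G")
    # g0: connected component of the kQ-core that contains Q
    g0, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in g0 and core[v] >= k_q:
                g0.add(v)
                queue.append(v)
    return k_q, g0


# A 4-clique {0, 1, 2, 3} with a path 3-4-5 attached.
adj = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3},
       3: {0, 1, 2, 4}, 4: {3, 5}, 5: {4}}
k_q, g0 = baseline_s1(adj, {0, 2})  # k_q == 3, g0 == {0, 1, 2, 3}
```

Vertices need to be mutually comparable (for example all integers) because they act as tie-breakers in the heap tuples.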

Improved algorithm, O(n0+m0)

Indexed-S1
Compute kQ based on the index I;
Conduct a pruned breadth-first search on G, starting from an arbitrary vertex of Q and visiting only vertices whose core numbers are at least kQ;
g0 ← the subgraph of G induced by the vertices visited by the search;
return (kQ, g0);
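
The pruned breadth-first search of Indexed-S1 might look like the sketch below. The notes do not describe how the index I yields kQ, so this example simply assumes that the core numbers and kQ are already available; the function name `indexed_s1_search` and the dict-of-sets graph layout are assumptions made for illustration.

```python
from collections import deque


def indexed_s1_search(adj, core, query, k_q):
    """Pruned BFS: only vertices with core number >= k_q are visited."""
    start = next(iter(query))
    visited = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in visited and core[v] >= k_q:
                visited.add(v)
                queue.append(v)
    # g0 as an adjacency dict induced by the visited vertices
    return {u: {v for v in adj[u] if v in visited} for u in visited}


# A 4-clique {0, 1, 2, 3} with a path 3-4-5 attached; core numbers given.
adj = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3},
       3: {0, 1, 2, 4}, 4: {3, 5}, 5: {4}}
core = {0: 3, 1: 3, 2: 3, 3: 3, 4: 1, 5: 1}
g0 = indexed_s1_search(adj, core, {0, 2}, 3)
```

Only vertices of g0 are ever expanded. This sketch still scans every neighbour of a visited vertex, so matching the stated bound exactly would require adjacency lists arranged so that low-core neighbours can be skipped.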

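Phase 2

Baseline algorithm, O(n0*m0)

Baseline-S2:
Starting from the g0 produced by Baseline-S1, delete the vertices with the largest query distance one at a time. After each deletion, keep only the connected component of the kQ-core that contains Q; stop as soon as no such component exists and return the last valid subgraph.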

Input: A set of query vertices Q ⊂ V, an integer kQ, and a graph g0 that contains Q and has minimum vertex degree kQ
Output: Closest community of Q
Compute the query distance for all vertices of g0;
i ← 0;
while true do
    u ← the vertex in g_i with the largest query distance;
    g_{i+1} ← the connected component of the kQ-core of g_i \ {u} that contains Q;
    if g_{i+1} = ∅ then break;
    else i ← i + 1;
return g_i;
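
A small Python sketch of Baseline-S2 is given below. The notes do not define the query distance, so this example assumes it is the largest hop distance from a vertex to any query vertex, measured inside g0. The helper names (`bfs_dist`, `query_distance`, `kcore_component`, `baseline_s2`) are hypothetical, and g0 is assumed to be a connected dict-of-sets graph with minimum degree kQ.

```python
from collections import deque


def bfs_dist(adj, src):
    """Hop distance from src to every reachable vertex."""
    dist, queue = {src: 0}, deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def query_distance(adj, query):
    """Assumed definition: max hop distance to any query vertex."""
    per_q = [bfs_dist(adj, q) for q in query]
    return {v: max(d[v] for d in per_q) for v in adj}


def kcore_component(adj, alive, query, k):
    """Component of the k-core of adj[alive] containing query, else set()."""
    alive = set(alive)
    deg = {v: sum(1 for w in adj[v] if w in alive) for v in alive}
    stack = [v for v in alive if deg[v] < k]
    while stack:  # peel vertices whose degree fell below k
        v = stack.pop()
        if v not in alive:
            continue
        alive.remove(v)
        for w in adj[v]:
            if w in alive:
                deg[w] -= 1
                if deg[w] < k:
                    stack.append(w)
    if not query <= alive:
        return set()
    start = next(iter(query))
    comp, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w in alive and w not in comp:
                comp.add(w)
                queue.append(w)
    return comp if query <= comp else set()


def baseline_s2(adj0, query, k_q):
    """Drop the furthest vertex repeatedly; return the last valid vertex set."""
    query = set(query)
    qdist = query_distance(adj0, query)  # computed once, on g0
    g = set(adj0)
    while True:
        u = max(g, key=lambda v: qdist[v])
        nxt = kcore_component(adj0, g - {u}, query, k_q)
        if not nxt:
            return g
        g = nxt


# Three triangles chained through vertices 2 and 4; kQ = 2.
adj0 = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3, 4}, 3: {2, 4},
        4: {2, 3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
community = baseline_s2(adj0, {0}, 2)  # {0, 1, 2}
```

Each iteration recomputes the k-core and the component from scratch, which is where the O(n0*m0) cost of the baseline comes from.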

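Improved algorithm, O(m0 + n0*log n0)

LinearOrder-S2:
Instead of checking connectivity each time a vertex u is deleted, first build a hierarchical structure and defer the connectivity check.
The structure consists of two sequences, seq and targets.
Vertices are removed in decreasing order of query distance. Each vertex picked for removal by its query distance is appended to both seq and targets (targets stops growing once a query vertex has been removed), while vertices removed only because their degree dropped below kQ are appended to seq alone.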

/* Compute the hierarchical structure for the (kQ, d)-communities */
Compute the query distance for all vertices of g0;
Sort vertices of g0 in decreasing order with respect to their query distances;
seq ← ∅; targets ← ∅;
g′ ← g0; deg(u) ← the degree of u in g′ for all vertices u ∈ g′;
while g′ is not empty do
    u ← the vertex in g′ with the largest query distance;
    if Q ∩ seq = ∅ then append u to targets;
    queue ← {u}; /* queue is a FIFO queue, distinct from the query set Q */
    while queue ≠ ∅ do
        Pop a vertex v from queue, and append v to seq;
        for each neighbor w of v in g′ do
            deg(w) ← deg(w) − 1;
            if deg(w) = kQ − 1 then push w into queue;
        Remove v from g′;
The vertices are then added back and merged using a union-find (disjoint-set) structure:
/* Search for the closest community of Q */
Initialize an empty disjoint-set data structure S;
for each vertex u ∈ targets in the reverse order do
    for each vertex v ∈ seq between u (inclusive) and the next target vertex (exclusive) do
        Add a singleton set for v into S;
        for each neighbor w of v in g0 do
            if w ∈ S then union v and w in S;
    if Q is entirely contained in a single set of S then break;
return all vertices in the set of S that contains Q;
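
The two parts of LinearOrder-S2 can be put together in a short Python sketch. It is an illustration under assumptions rather than the paper's code: g0 is a connected dict-of-sets graph with minimum degree kQ, the query distances are passed in as a precomputed dict `qdist`, `targets` stores positions in `seq` instead of vertices, and the name `linear_order_s2` is made up for this example.

```python
from collections import deque


def linear_order_s2(adj0, query, k_q, qdist):
    query = set(query)
    # Part 1: record the removal order (seq) and the target positions.
    order = sorted(adj0, key=lambda v: qdist[v], reverse=True)
    deg = {v: len(adj0[v]) for v in adj0}
    scheduled = set()        # vertices already queued or removed
    seq, targets = [], []    # targets holds indices into seq
    query_removed = False
    for u in order:
        if u in scheduled:
            continue
        if not query_removed:  # Q ∩ seq is still empty
            targets.append(len(seq))
        queue = deque([u])
        scheduled.add(u)
        while queue:
            v = queue.popleft()
            seq.append(v)
            if v in query:
                query_removed = True
            for w in adj0[v]:
                if w not in scheduled:
                    deg[w] -= 1
                    if deg[w] == k_q - 1:  # w now violates the degree bound
                        scheduled.add(w)
                        queue.append(w)

    # Part 2: add vertices back in reverse, merging with union-find.
    parent = {}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    bounds = targets + [len(seq)]
    for i in range(len(targets) - 1, -1, -1):
        for v in seq[bounds[i]:bounds[i + 1]]:
            parent[v] = v
            for w in adj0[v]:
                if w in parent:
                    rv, rw = find(v), find(w)
                    if rv != rw:
                        parent[rv] = rw
        if all(q in parent for q in query):
            roots = {find(q) for q in query}
            if len(roots) == 1:
                root = roots.pop()
                return {v for v in parent if find(v) == root}
    return set()


# Three triangles chained through vertices 2 and 4; kQ = 2.
adj0 = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3, 4}, 3: {2, 4},
        4: {2, 3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
qdist = {0: 0, 1: 1, 2: 1, 3: 2, 4: 2, 5: 3, 6: 3}  # hop distance to vertex 0
community = linear_order_s2(adj0, {0}, 2, qdist)  # {0, 1, 2}
```

Here each vertex is removed once and added back at most once, so the graph work is linear in the size of g0 plus the union-find overhead; the sort accounts for the logarithmic factor in this sketch.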

CCS algorithm

Notation: n and m are the numbers of vertices and edges of G; n0 and m0 are the numbers of vertices and edges of g0.
Input: Graph G = (V, E), a set of query vertices Q, and an index I
Output: Closest community of Q
Compute kQ based on the index I;
h0 ← the subgraph of G induced by Q;
i ← 0; g ← ∅;
while true do
    g′ ← the connected component of the kQ-core of h_i that contains Q;
    g ← LinearOrder-S2(Q, kQ, g′);
    if g = ∅ then
        i ← i + 1; h_i ← h_{i−1};
        while h_i ≠ G and the size of h_i is less than twice that of h_{i−1} do
            Get the next vertex u that has the smallest query distance;
            Add to h_i the vertex u and its adjacent edges to existing vertices of h_i;
    else break;
return g;