Top-k Structure Holes Detection Algorithm in Social Network

hellohiman

Article link: https://blog.csdn.net/sinat_38616352/article/details/83088872

Abstract

The concept of structure holes was first proposed in relational sociology and later introduced into social network research. Structure holes are the vertices that occupy key positions in a network, so detecting them is of great significance for controlling online public opinion, analyzing influence in social networks, discovering weak points in network security, promoting information rapidly, and so on. To address the structure hole detection problem, this paper proposes a Top-k structure hole discovery algorithm based on the shortest path increment. The algorithm determines the structure hole attribute value of each vertex mainly by analyzing the shortest path increment, the number of sub-connected components, and the variance of the number of vertices in the connected components; the vertices are then sorted by this value to obtain the top-k vertices. The experiments use real networks and LFR synthetic complex networks, and compare the proposed algorithm with other algorithms of the same kind using the NDCG evaluation method and the SIR epidemic diffusion model. The experimental results show that the proposed algorithm achieves a higher NDCG score and that its diffusion effect in the SIR model is clearly better than that of the other methods.

Keywords: social network; shortest path increment; structure holes; information diffusion

1 INTRODUCTION

In recent years, networks have become ever more closely tied to people's production and daily life. Networks of all kinds are becoming more diverse, complex, and massive. Against this background, quickly identifying the key vertices and obtaining effective information is the key to further improving production efficiency and quality of life. For example, key vertices matter for controlling Internet public opinion, analyzing user influence in social networks, discovering weaknesses in network security, and promoting information rapidly.

Fig.1. Structure Holes

The concept of structure holes was first proposed by Burt [1] to explain and analyze the key position of an individual in a group. It holds that, in a social structure, an individual in a key position is able to gain more competitive advantage. Fig.1 shows a simple example network in which the three dashed areas represent three communities. The dark vertices directly restrict the flow of information between communities, and even across the entire network, so they are regarded as structure holes. Real networks, however, are much more complex than this example. As research has deepened, the understanding of structure holes is no longer limited to the key vertices for information flow between communities, and the algorithms for detecting structure holes keep growing in number and improving. Some are based on community detection [2]; these algorithms need to detect the communities in advance, so the calculation becomes complicated and lengthy, and the quality of the structure holes is fully determined by the communities found. Some optimize centrality algorithms and key-vertex ranking algorithms [3], such as PageRank [4], Betweenness Centrality [5,6], and Closeness Centrality [7,8]; these algorithms converge and stabilize quickly, but their accuracy is relatively weak. Machine learning has also been used to integrate multiple kinds of data to rank key vertices [9,10]. Many other algorithms follow different ideas.

Our contributions can be summarized as follows:

· We work mainly from the structure of the network and judge the attribute strength of a vertex by calculating the increment of the shortest path. In principle, the proposed algorithm is similar to an optimization of centrality algorithms, and we put forward a new optimization scheme. As far as we know, this is the first attempt to use VAR and NCC, instead of a maximum value, to describe and handle unreachable shortest paths.

· Compared with centrality algorithms, we inherit their efficiency and improve the accuracy of the result. Compared with algorithms based on community detection, we incorporate the concept of community into the algorithm through simple processing, thus avoiding their complexity.

· The results of SIR diffusion experiments on several networks show that our algorithm is better than other algorithms.

2 PRELIMINARIES AND DEFINITIONS

In this section, we establish key definitions and notational conventions that simplify the exposition in later sections.

2.1 Network Model

A social network can be modeled as an undirected graph G = (V, E), where V is the set of vertices representing the individuals in the social network and E is the set of edges representing the relationships between individuals. Let n = |V| and m = |E|.

To simplify the study and exposition, we do not consider the weight and direction of edges in this paper. As G is an undirected graph, we assume that each edge has weight 1. The distance between two vertices u and v in G is the length of the shortest path between them.

The sum of shortest paths of a vertex v is the total distance d(v, u) from v to every other vertex u in G [11]:

$c(v) = \sum_{u \in V \setminus \{v\}} d(v, u)$    (1)

The sum of shortest paths of the graph G is then [11]:

$c(G) = \sum_{v \in V} c(v)$    (2)
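
As a minimal sketch of these two sums, the Python below computes c(v) with a breadth-first search and c(G) by adding c(v) over all vertices. The adjacency-dict representation and the function names (`bfs_distances`, `c_vertex`, `c_graph`) are illustrative assumptions, not taken from the paper. Summing over every vertex counts each pair twice, which is also an assumption about how the graph total is defined.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from `source` to every vertex reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist


def c_vertex(adj, v):
    """Sum of shortest-path lengths from v to the other vertices (unit edge weights)."""
    return sum(bfs_distances(adj, v).values())


def c_graph(adj):
    """Assumed graph total: c(v) added up over all vertices (ordered pairs)."""
    return sum(c_vertex(adj, v) for v in adj)


# Hypothetical example: the path a - b - c.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(c_vertex(path, "a"), c_vertex(path, "b"), c_graph(path))
```

Each breadth-first search costs O(n + m) on an unweighted graph, so the graph total costs O(n(n + m)) in this sketch.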

Definition 1. (Shortest Path Increment of the Graph, abbreviated SPIG). Any vertex in the graph may lie on the shortest path between some pairs of vertices, so removing such a vertex may force those pairs to detour along a longer path than before. The more shortest paths pass through the vertex, the larger the increment.

Let G(V\v) be the graph obtained by removing vertex v from G, abbreviated G\v if no ambiguity arises. Then SPIG is:

$SPIG(v) = c(G \setminus v) - c(G)$    (3)

For a given network G, c(G) is a constant, which has no effect on the comparison of SPIG values between two vertices. We therefore use SPIG' instead of SPIG for comparison to improve computational efficiency:

$SPIG'(v) = c(G \setminus v)$    (4)
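
The comparison between SPIG and SPIG' can be sketched in Python as follows. This assumes SPIG(v) is c(G\v) minus c(G) and SPIG'(v) is c(G\v), with c taken over ordered pairs. Pairs that become unreachable after the removal are simply skipped here, which is a simplification of this sketch: the paper handles that case with NCC and VAR. All function names and the example graph are hypothetical.

```python
from collections import deque


def sum_of_shortest_paths(adj, removed=None):
    """Sum of hop distances over all ordered reachable pairs, ignoring `removed`."""
    total = 0
    for source in adj:
        if source == removed:
            continue
        dist = {source: 0}
        queue = deque([source])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y != removed and y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
        # Assumption: pairs that became unreachable contribute nothing.
        total += sum(dist.values())
    return total


def spig(adj, v):
    """Assumed SPIG: c(G\\v) - c(G)."""
    return sum_of_shortest_paths(adj, removed=v) - sum_of_shortest_paths(adj)


def spig_prime(adj, v):
    """Assumed SPIG': c(G\\v) only, skipping the constant c(G)."""
    return sum_of_shortest_paths(adj, removed=v)


# Hypothetical example: the 4-cycle a-b-c-d-a with the chord a-c.
graph = {
    "a": ["b", "c", "d"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["a", "c"],
}

for v in graph:
    print(v, spig(graph, v), spig_prime(graph, v))
```

Because the two quantities differ only by the constant c(G), sorting the vertices by either one gives the same order, and SPIG' saves one full pass over the original graph.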

Definition 2. (Number of Connected Components, abbreviated NCC). NCC is an attribute of a vertex that describes the number of connected components in the graph after the vertex is removed.

In general, a given network graph G is connected, which means NCC(G) = 1. However, removing a vertex v from G may split it into several sub-connected components. As shown in Fig.2, NCC(G\v) = 3, abbreviated NCC(v) = 3. Similarly, NCC(u) = 3 and NCC(w) = 1.
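
A small Python sketch of NCC, assuming the graph is stored as an adjacency dict: it counts the components with a depth-first traversal that skips the removed vertex. The function name `ncc` and the example graph are hypothetical; the graph is not the one in Fig.2.

```python
def ncc(adj, removed=None):
    """Number of connected components once `removed` is deleted from the graph."""
    seen = set()
    components = 0
    for start in adj:
        if start == removed or start in seen:
            continue
        components += 1  # `start` opens a component not visited so far
        seen.add(start)
        stack = [start]
        while stack:
            x = stack.pop()
            for y in adj[x]:
                if y != removed and y not in seen:
                    seen.add(y)
                    stack.append(y)
    return components


# Hypothetical graph: v joins three branches, one of which has two vertices.
graph = {
    "v": ["a1", "b1", "c1"],
    "a1": ["v", "a2"],
    "a2": ["a1"],
    "b1": ["v"],
    "c1": ["v"],
}

print(ncc(graph), ncc(graph, "v"), ncc(graph, "a2"))
```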

Fig.2. Definition of NCC and VAR

Definition 3. (The variance of a vertex, abbreviated VAR). VAR is likewise an attribute of a vertex; it describes the variance of the number of vertices in each sub-connected component after the vertex is removed. As shown in Fig.2, VAR(v) = VAR[ |C1|, |C2|, |C3| ] = VAR[ 4,3,4 ] = 0.222, VAR(w) = VAR[ |C1| ] = VAR[ 1 ] = 0.
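
The arithmetic behind VAR[ 4,3,4 ] = 0.222 can be checked with a short Python sketch. It assumes the population variance (dividing by the number of components), since that is the convention that reproduces 0.222. The helper names and the graph are hypothetical; the graph is built only to have components of sizes 4, 3, and 4 around v and is not a copy of Fig.2.

```python
from statistics import pvariance


def build_graph(edges):
    """Adjacency dict of an undirected graph from a list of edges."""
    adj = {}
    for x, y in edges:
        adj.setdefault(x, []).append(y)
        adj.setdefault(y, []).append(x)
    return adj


def component_sizes(adj, removed):
    """Vertex counts of the components left after deleting `removed`."""
    seen = set()
    sizes = []
    for start in adj:
        if start == removed or start in seen:
            continue
        seen.add(start)
        stack = [start]
        size = 0
        while stack:
            x = stack.pop()
            size += 1
            for y in adj[x]:
                if y != removed and y not in seen:
                    seen.add(y)
                    stack.append(y)
        sizes.append(size)
    return sizes


def var(adj, v):
    """Assumed VAR: population variance of the component sizes after removing v."""
    return pvariance(component_sizes(adj, v))


# Hypothetical graph: three chains of 4, 3, and 4 vertices hanging off v.
graph = build_graph([
    ("v", "a1"), ("a1", "a2"), ("a2", "a3"), ("a3", "a4"),
    ("v", "b1"), ("b1", "b2"), ("b2", "b3"),
    ("v", "c1"), ("c1", "c2"), ("c2", "c3"), ("c3", "c4"),
])

print(sorted(component_sizes(graph, "v")), round(var(graph, "v"), 3))
```

With sizes 4, 3, and 4 the mean is 11/3 and the squared deviations sum to 6/9, so the population variance is 2/9. Removing a leaf such as a4 leaves a single component, whose variance is 0.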

Definition 4. (The structure holes attribute of a vertex, abbreviated SH). SH describes how likely a vertex is to be a structure hole: the greater the value, the higher the likelihood. The details are given in the algorithm section.

2.2 Problem Definition

Given a network G = (V, E), the problem of structure holes detection is to find a subset of vertices VS (VS⊂V), such that the removal of the vertices in VS from G will result in the maximum SPIG in the induced sub-g