gn算法java_R语言构建蛋白质网络并实现GN算法

最新推荐文章于 2023-11-11 22:05:52 发布

活着改变世界

最新推荐文章于 2023-11-11 22:05:52 发布

阅读量968

点赞数

文章标签： gn算法java

本文链接：https://blog.csdn.net/weixin_34443284/article/details/114873211

版权

本文介绍了如何使用R语言构建蛋白质网络，并探讨了GN算法在蛋白质网络模块发现中的应用。通过读取蛋白质互作数据，利用igraph包构建网络，并展示了包括GN算法在内的多种模块发现方法的实现和图形展示。

摘要由CSDN通过智能技术生成

R语言构建蛋白质网络并实现GN算法

1.蛋白质网络的构建

我们使用与人类HIV相关的蛋白质互作数据hunam-HIV PPI.csv来构建这个蛋白质互作网络。

在R中，我们可以从存储在R环境外部的文件读取数据。还可以将数据写入由操作系统存储和访问的文件。 R可以读取和写入各种文件格式，如：csv，excel，xml等。

想要读取csv文件，我们需要：

设置工作目录

读取CSV文件

代码如下：

setwd("/Users/.../Documents/...")

data

这样，我们就得到了蛋白质互作数据并存储在了data中。

接下来，我们使用igraph包来构建该网络。(因为数据中只有两列表示两个有连接的顶点，因此我没有构建数据帧用于存放顶点的特征)

edges

graph.data.frame(也可写作graph_from_data_frame)函数有许多参数，具体内容如下:

graph_from_data_frame(edges,direced,vertices)

现在，我们已经建立了图形g，如果你想看看它的样子，可以简单地通过plot(g)来做到。

2.生物网络的模块发现方法

在许多复杂网络中，对于模块(或称为社区)的划分是非常有意义的。模块发现，或称为社群发现主要有五种模型。

社群结构特点：社群内边密度要高于社群间边密度，社群内部连接相对紧密，各个社群之间连接相对稀疏。

社群模型

概念

效果

点连接

某点与某社群有关系就是某社群的

最差，常常是某一大类超级多

随机游走

利用距离相似度，用合并层次聚类方法建立社群

运行时间短，但是效果不是特别好，也会出现某类巨多

自旋玻璃

关系网络看成是随机网络场，利用能量函数来进行层次聚类

耗时长，适用较为复杂的情况

中间中心度

找到中间中心度最弱的删除，并以此分裂至到划分不同的大群落

耗时长，参数设置很重要

标签传播

通过相邻点给自己打标签，相同的标签一个社群

跟特征向量可以组合应用，适用于话题类

其中，中间中心度的模型，为Gievan-Newman(GN)算法的思想相同。其余模型的详细情况不作更多介绍，此处，参考了R语言︱SNA-社会关系网络—igraph包(社群划分、画图)。

下面，我们介绍GN算法的基本思想：

1.计算网络中所有边的中介中心性；

2.去除中介中心性最高的边；

3.重新计算去除边后的网络中所有边的中介中心性；

4.跳至步骤2，重新计算，直至网络中没有边存在。

可以看到，这个算法的思想非常简单。但是，这个算法什么时候终止，才能使得社群划分的结构最优？在Newman and Girvan 2004中，他们提出了Modularity Q(全局模块度)的概念，进一步完善了这个算法。一般认为，Q的取值在0.3～0.7之间最优，但是，也需具体情况具体考虑。

3.模块发现方法实现和图形展示

现在模块划分有非常多的算法，很多都已集成在igrah中。在library("igraph")之后，我们可以调用许多包中已实现的函数对网络g划分模块。

算法

作者

年份

复杂度

Newman & Girvan

2004

CFinder

2005

随机游走方法

Pons & Latapy

2005

自旋玻璃社群发现

Reichardt & Bornholdt

2006

LPA(标签传播算法)

Raghavan et al

2007

O(m)

Fast Unfolding

Vincent D. Blondel

2008

LFM

2009

O(n^2)

EAGLE

2009

O(s*n^2)

GIS

2009

O(n^2)

HANP(Hop Attenuation & Node Preferences)

Lan X.Y. & Leung

2009

O(m)

GCE

2010

O(mh)

COPRA

2010

NMF

2010

Link

2010

SLPA/GANXis(Speaker-listener Label Propagation)

Jierui Xie

2011

BMLPA(Balanced Multi-label Propagation)

武志昊(北交大)

2012

O(n*logn)

1) 基于点连接的模块发现：cluster_fast_greedy该方法通过直接优化模块度来发现模块。

cluster_fast_greedy(graph, merges = TRUE, modularity = TRUE,

membership = TRUE, weights = E(graph)$weight)

graph 待划分模块的图。

merges 是否返回合并后的模型。

modularity 是否将每次合并时的模块度以向量返回。

membership 是否在每次合并时考虑所有可能的模块结构，对应最大的模块度计算成员向量。

weights 如果非空，则是一个边权重的向量。

return 一个communities对象。

一个例子：

cfg

plot(cfg, g)

生成的图形如下所示：

2) GN算法：edge.betweenness.community

该方法通过中间中心度找到网络中相互关联最弱的点，删除它们之间的边，并以此对网络进行逐层划分，就可以得到越来越小的模块。在适当的时候终止这个过程，就可以得到合适的模块划分结果。

member

$weight,directed=F)

有默认的边权重weight，并且默认边是无向的，directed=T时代表有向。

调用这个方法并将其图形展示和保存的代码如下：

#• Community structure in social and biological networks

# M. Girvan and M. E. J. Newman

#• New to version 0.6: FALSE

#• Directed edges: TRUE

#• Weighted edges: TRUE

#• Handles multiple components: TRUE

#• Runtime: |V||E|^2 ~稀疏:O(N^3)

V(g)$size = 1 #我将大部分顶点的大小设置为1

V(g)[degree(g)>=300]$size = 5 #但度很大的顶点更大

png('/Users/.../Documents/.../protein.png',width=1800,height=1800)# 指明接下来要做的图形的格式和长宽

plot(ec,g)

dev.off() # 关闭图形设备

print(modularity(ec))

这样，图片保存为了protein.png，还输出了模块度。

3) 随机游走：walktrap.community

#• Computing communities in large networks using random walks

# Pascal Pons, Matthieu Latapy

#• New to version 0.6: FALSE

#• Directed edges: FALSE

#• Weighted edges: TRUE

#• Handles multiple components: FALSE

#• Runtime: |E||V|^2

system.time(wc

print(modularity(wc))

#membership(wc)

plot(wc , g)

4)Newman快速算法：leading.eigenvector.community

Newman快速算法将每个节点看作是一个社团，每次迭代选择产生最大Q值的两个社团合并，直至整个网络融合成一个社团。整个过程可表示成一个树状图，从中选择Ｑ值最大的层次划分得到最终的社团结构。该算法的总体时间复杂度为Ｏ(ｍ(ｍ＋ｎ))

#• Finding community structure in networks using the eigenvectors of matrices

# MEJ Newman

# Phys Rev E 74:036104 (2006)

#• New to version 0.6: FALSE

#• Directed edges: FALSE

#• Weighted edges: FALSE

#• Handles multiple components: TRUE

#• Runtime: c|V|^2 + |E| ~N(N^2)

system.time(lec

print(modularity(lec))

plot(lec,g)

5) Newman Fast Greedy:fastgreedy.community

#• Finding community structure in very large networks

# Aaron Clauset, M. E. J. Newman, Cristopher Moore

#• Finding Community Structure in Mega-scale Social Networks

# Ken Wakita, Toshiyuki Tsurumi

#• New to version 0.6: FALSE

#• Directed edges: FALSE

#• Weighted edges: TRUE

#• Handles multiple components: TRUE

#• Runtime: |V||E| log |V|

system.time(fc

print(modularity(fc))

plot(fc, g)

6) Fast unfolding算法：multilevel.community

#• Fast unfolding of communities in large networks

# Vincent D. Blondel, Jean-Loup Guillaume, Renaud Lambiotte, Etienne Lefebvre

#• New to version 0.6: TRUE

#• Directed edges: FALSE

#• Weighted edges: TRUE

#• Handles multiple components: TRUE

# Runtime: “linear” when |V| \approx |E| ~ sparse; (a quick glance at the algorithm \

# suggests this would be quadratic for fully-connected graphs)

system.time(mc

print(modularity(mc))

plot(mc, g)

7)标签传播算法：label.propagation.community

#• Near linear time algorithm to detect community structures in large-scale networks.

# Raghavan, U.N. and Albert, R. and Kumara, S.

# Phys Rev E 76, 036106. (2007)

#• New to version 0.6: TRUE

#• Directed edges: FALSE

#• Weighted edges: TRUE

#• Handles multiple components: FALSE

# Runtime: |V| + |E|

system.time(lc

print(modularity(lc))

plot(lc , g)

8)自旋玻璃社群发现：spinglass.community

member

#需要设置参数weights，因为无默认值

9)自实现GN算法

为了更好地理解GN算法，我们当然要尝试自己实现一个GN算法。

4.附录：igraph中常用函数

1)plot 画图函数

plot(g, layout = layout.fruchterman.reingold, vertex.size = V(g)$size+2,vertex.color=V(g)$color,vertex.label=V(g)$label,vertex.label.cex=1,edge.color = grey(0.5), edge.arrow.mode = "-",edge.arrow.size=5)

layout 设置图的布局方式

layout、layout.auto、layout.bipartite、layout.circle、layout.drl、layout.fruchterman.reingold、layout.fruchterman.reingold.grid、layout.graphopt、layout.grid、layout.grid.3d、layout.kamada.kawai、layout.lgl、layout.mds、layout.merge、layout.norm、layout.random、layout.reingold.tilford、layout.sphere、layout.spring、layout.star、layout.sugiyama、layout.svd

vertex.size 设置节点的大小

V(g)$deg

V(g)$size=2

V(g)[deg>=1]$size=4

V(g)[deg>=2]$size=6

V(g)[deg>=3]$size=8

V(g)[deg>=4]$size=10

V(g)[deg>=5]$size=12

V(g)[deg>=6]$size=14

vertex.color 设置节点的颜色

color

col