Notes on NetworkX, the Python graph algorithm library: Affiliation Networks

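In real work, network data often has to be built by hand. Here is an example: a word co-occurrence network. This kind of network helps us understand how words relate to each other across a set of documents. In the graph, each word is a node, and an edge means that two words appear together in the same sentence.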

First, build a stop word list:

stop_words = set([
    'the', 'of', 'and', 'i', 'to', 'my', 'in', 'was', 'that', 'thy',
    'a', 'had', 'my', 'with', 'but', 'he', 'she', 'you', 'your',
    'me', 'not', 'as', 'will', 'from', 'on', 'be', 'it', 'which',
    'for', 'his', 'him', 'chapter', 'at', 'who', 'by', 'have',
    'would', 'is', 'been', 'when', 'they', 'there', 'we', 'are',
    'our', 'if', 'her', 'were', 'than', 'this', 'what', 'so',
    'yet', 'more', 'their', 'them', 'or', 'could', 'an', 'can',
    'said', 'may', 'do', 'these', 'shall', 'how', 'shall', 'asked',
    'before', 'those', 'whom', 'am', 'even', 'its', 'did', 'then',
    'abbey', 'tintern', 'wordsworth', 'letter', 'thee', 'thou', 'oh',
    # The comma after 'yes' matters: without it Python joins 'yes' 'now' into 'yesnow'
    'into', 'any', 'myself', 'nor', 'himself', 'one', 'all', 'no', 'yes',
    'now', 'upon', 'only', 'might', 'every', 'own', 'such', 'towards',
    'again', 'most', 'ever', 'where', 'after', 'up', 'soon', 'many',
    'also', 'like', 'over', 'us', 'thus', 'has', 'about']
    # Also treat the number strings '0' through '23' as stop words
    + [str(x) for x in range(24)])

Now write a function that extracts a graph from text:

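import re
import networkx as nx

def co_occurrence_network(text):
    G = nx.Graph()
    # Treat every period as a sentence boundary
    sentences = text.split('.')
    for s in sentences:
        # Strip punctuation but keep word characters, newlines, and spaces,
        # so that the words stay separated; then lowercase
        clean = re.sub(r'[^\w\n ]+', '', s).lower()
        clean = re.sub(r'_+', '', clean).strip()
        words = re.split(r'\s+', clean)
        # Build relationships between the words of this sentence
        for v in words:
            # Update the word count, adding the node if necessary
            try:
                G.nodes[v]['count'] += 1
            except KeyError:
                G.add_node(v)
                G.nodes[v]['count'] = 1
            for w in words:
                # Skip self-pairs and stop words
                if v == w or v in stop_words or w in stop_words:
                    continue
                # Skip empty strings
                if len(v) == 0 or len(w) == 0:
                    continue
                # Each pair is visited as (v, w) and as (w, v),
                # so every co-occurrence adds 2 to the edge count
                try:
                    G.edges[v, w]['count'] += 1
                except KeyError:
                    G.add_edge(v, w, count=1)
    return G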
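
To make the idea easy to check by hand, here is a minimal, self-contained sketch on a toy text. It is a swapped-in variation, not the function above: it uses itertools.combinations so that each pair of words is counted once per sentence. The names toy_text, toy_stop_words, and toy_co_occurrence are hypothetical and exist only for this illustration.

```python
import re
from itertools import combinations

import networkx as nx

# Hypothetical toy inputs, not taken from the original text
toy_stop_words = {"the", "a", "and"}
toy_text = "The monster saw Victor. Victor and the monster spoke."


def toy_co_occurrence(text, stop_words):
    """Count each unordered pair of distinct non-stop words once per sentence."""
    G = nx.Graph()
    for sentence in text.split("."):
        # Drop punctuation, keep word characters and whitespace, lowercase
        clean = re.sub(r"[^\w\s]+", "", sentence).lower()
        # Deduplicate and sort so each pair is produced exactly once
        words = sorted({w for w in clean.split() if w not in stop_words})
        for v, w in combinations(words, 2):
            if G.has_edge(v, w):
                G.edges[v, w]["count"] += 1
            else:
                G.add_edge(v, w, count=1)
    return G


toy_G = toy_co_occurrence(toy_text, toy_stop_words)
# Print every edge with its co-occurrence count
print(sorted(toy_G.edges(data="count")))
```

Here "monster" and "victor" appear together in both sentences, so their edge ends up with the highest count.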

Call the function to extract the graph, then draw it:

# target is assumed to hold the path to the input text file
with open(target, 'r') as f:
    text = f.read()

G = co_occurrence_network(text)

pairs = sorted(
    # data=True returns each edge together with its attributes, including count
    G.edges(data=True),
    key=lambda e: e[2]['count'],
    reverse=True
)
pairs[0:10]

>>>
[('man', 'old', {'count': 68}),
 ('country', 'native', {'count': 38}),
 ('first', 'now', {'count': 32}),
 ('death', 'life', {'count': 32}),
 ('human', 'being', {'count': 32}),
 ('natural', 'philosophy', {'count': 32}),
 ('eyes', 'tears', {'count': 30}),
 ('first', 'eyes', {'count': 28}),
 ('some', 'time', {'count': 28}),
 ('night', 'during', {'count': 28})]

import matplotlib.pyplot as plt

plt.figure(figsize=(16,6))
nx.draw_networkx(G)

The result is very hard to read. In network analysis, a plot like this is called a hairball. We can use the subgraph() method to extract a subgraph from the graph:

# Count co-occurrences for characters only
characters = [
    'creature', 'monster', 'victor', 'elizabeth',
    'william', 'henry', 'justine']
G_focus = G.subgraph(characters)

# Create list of edge counts
counts = [G_focus.edges[e]['count'] for e in G_focus.edges]

# Create spring layout
pos = nx.spring_layout(G_focus)

# Create figure and draw nodes
plt.figure(figsize=(20,10))
nx.draw_networkx_nodes(G_focus, pos)

# Draw edges: wide edges colored by count, then thin gray edges on top
nx.draw_networkx_edges(
    G_focus, pos, width=8,
    edge_color=counts, edge_cmap=plt.cm.Blues, alpha=0.5)
nx.draw_networkx_edges(G_focus, pos, edge_color="#7f7f7f", alpha=0.5)

# Draw labels
nx.draw_networkx_labels(G_focus, pos)

# Show the finished figure (a second plt.figure() here would only open an empty one)
# plt.tight_layout()
plt.show()
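
The behavior of subgraph() is easier to see on a tiny graph. The sketch below uses hypothetical node names and counts. It shows that only edges between the selected nodes survive, that names missing from the graph are silently ignored, and that the result is a read-only view which has to be copied before it can be edited.

```python
import networkx as nx

# Hypothetical toy graph; names and counts are for illustration only
G_toy = nx.Graph()
G_toy.add_edge("victor", "elizabeth", count=3)
G_toy.add_edge("victor", "monster", count=5)
G_toy.add_edge("monster", "cottage", count=2)

# "ghost" is not a node of G_toy, so subgraph() quietly skips it
focus = G_toy.subgraph(["victor", "elizabeth", "monster", "ghost"])

print(sorted(focus.nodes))
# The monster-cottage edge is gone because "cottage" was not selected
print(focus.number_of_edges())

# subgraph() returns a read-only view; copy() gives an independent, editable graph
focus_copy = focus.copy()
focus_copy.add_edge("elizabeth", "monster", count=1)
```

Editing the copy leaves the original graph untouched.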

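Affiliation networks in NetworkX

First, the concept of an affiliation network. An edge represents a relationship between two nodes, and only two. But some relationships cannot be expressed by a single edge between two nodes, for example several nodes belonging to the same group. A network with this kind of structure is called an affiliation network.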

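Formally, an affiliation network (also called a bipartite network) contains two types of nodes, and nodes of the same type are never directly connected. Affiliation networks are very useful for representing many-to-many relationships. For example, a movie can feature many actors, and an actor can appear in many movies.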
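
The movie example can be sketched in a few lines. The actor and movie names below are hypothetical placeholders. The "bipartite" node attribute is the convention NetworkX uses to mark which of the two node sets a node belongs to.

```python
import networkx as nx
from networkx.algorithms import bipartite

# Hypothetical casting data, for illustration only
casting = [
    ("actor_a", "movie_1"),
    ("actor_a", "movie_2"),
    ("actor_b", "movie_1"),
    ("actor_c", "movie_2"),
]

B_toy = nx.Graph()
# Mark actors as set 0 and movies as set 1
B_toy.add_nodes_from({a for a, _ in casting}, bipartite=0)
B_toy.add_nodes_from({m for _, m in casting}, bipartite=1)
B_toy.add_edges_from(casting)

print(bipartite.is_bipartite(B_toy))

# Every edge joins an actor to a movie, never two nodes of the same type
mixed = all(
    B_toy.nodes[u]["bipartite"] != B_toy.nodes[v]["bipartite"]
    for u, v in B_toy.edges
)
print(mixed)
```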

First, test whether a graph is an affiliation network.

import networkx as nx
from networkx.algorithms import bipartite
from networkx import NetworkXError

G = nx.karate_club_graph()
try:
    # bipartite.sets() raises NetworkXError if the graph is not bipartite
    left, right = bipartite.sets(G)
    print('left nodes\n', left)
    print('\nright nodes\n', right)
except NetworkXError as e:
    print(e)

>>> Graph is not bipartite.

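Clearly this is not a bipartite graph. But we can turn each friendship into a node of its own, which converts the graph into a bipartite one.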

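B = nx.Graph()
# Turn each friendship (v, w) into a node connected to both of its endpoints
B.add_edges_from([(v, (v, w)) for v, w in G.edges])
B.add_edges_from([(w, (v, w)) for v, w in G.edges])
try:
    # find and print node sets
    left, right = bipartite.sets(B)
    print('left node\n', left)
    print('\nright node\n', right)
except NetworkXError as e:
    # Report the failure instead of silently swallowing it
    print(e)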

If we only want to know whether the current graph is bipartite, we can call the is_bipartite function directly:

bipartite.is_bipartite(B)

>>> True
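
Why the conversion works is easiest to see on a triangle, the smallest graph that is not bipartite. The sketch below is a toy illustration rather than part of the karate club example: replacing every edge with a node doubles the length of each cycle, so the odd triangle becomes an even 6-cycle.

```python
import networkx as nx
from networkx.algorithms import bipartite

# A triangle contains an odd cycle, so it cannot be bipartite
T = nx.cycle_graph(3)
print(bipartite.is_bipartite(T))

# Replace every edge (v, w) with a new node joined to both endpoints
B_t = nx.Graph()
for v, w in T.edges:
    B_t.add_edge(v, (v, w))
    B_t.add_edge(w, (v, w))

print(bipartite.is_bipartite(B_t))
# One node per original node plus one node per original edge
print(B_t.number_of_nodes(), B_t.number_of_edges())
```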

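Next, a more complex example: a network of plants and the pollinators that visit them. We can use connected_components() to drop the nodes that are not connected to the rest of the graph.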

# Start from an empty graph instead of reusing the karate club graph B
B = nx.Graph()

# target is assumed to hold the path to the tab-separated interaction data
with open(target, 'r') as f:
    # Skip header row
    next(f)
    for row in f:
        # Break row into cells
        cells = row.strip().split('\t')
        # Get plant species and pollinator species
        plant = cells[4].replace('_', '\n')
        pollinator = cells[8].replace('_', '\n')
        B.add_edge(pollinator, plant)
        # Set node types
        B.nodes[pollinator]["bipartite"] = 0
        B.nodes[plant]["bipartite"] = 1

# Only consider connected species: keep the first component returned
B = B.subgraph(list(nx.connected_components(B))[0])

# Get node sets
pollinators = [v for v in B.nodes if B.nodes[v]["bipartite"] == 0]
plants = [v for v in B.nodes if B.nodes[v]["bipartite"] == 1]
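
The code above keeps whichever component connected_components() happens to return first. If the goal is the largest component, one option is to select it explicitly. This is a minimal sketch under that assumption, with hypothetical species names; it is not the approach used in the example above.

```python
import networkx as nx

# Hypothetical toy data: one three-species component and one isolated pair
B_toy = nx.Graph()
B_toy.add_edges_from([
    ("bee", "clover"), ("bee", "thistle"),  # component with 3 nodes
    ("moth", "orchid"),                     # component with 2 nodes
])

components = list(nx.connected_components(B_toy))
# The components are not ordered by size, so pick the largest explicitly
largest = max(components, key=len)
# copy() turns the read-only subgraph view into an independent graph
B_main = B_toy.subgraph(largest).copy()

print(len(components))
print(sorted(B_main.nodes))
```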

Now that we have the two types of nodes, we can visualize the network:

# Create figure
plt.figure(figsize=(30,30))

# Calculate layout
pos = nx.spring_layout(B, k=0.9)

# Draw using different shapes and colors for plant/pollinators
nx.draw_networkx_edges(B, pos, width=3, alpha=0.2)
nx.draw_networkx_nodes(B, pos, nodelist=plants, node_color="#bfbf7f", node_shape="h", node_size=3000)
nx.draw_networkx_nodes(B, pos, nodelist=pollinators, node_color="#9f9fff", node_size=3000)
nx.draw_networkx_labels(B, pos);

Projections

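A projection turns a bipartite network into a single-mode network: it keeps only one type of node and connects two of them whenever they share a neighbor of the other type.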

G = bipartite.projected_graph(B, plants)
plt.figure(figsize=(24,24))
pos = nx.spring_layout(G, k=0.5)
nx.draw_networkx_edges(G, pos, width=3, alpha=.2)
nx.draw_networkx_nodes(G, pos, node_color='#bfbf7f', node_shape='h', node_size=10000)
nx.draw_networkx_labels(G, pos);

Of course, we can also project onto the pollinators:

G = bipartite.projected_graph(B, pollinators)
plt.figure(figsize=(30,30))
pos = nx.spring_layout(G, k=.5)
nx.draw_networkx_edges(G, pos, width=3, alpha=0.2)
nx.draw_networkx_nodes(G, pos, node_color='#9f9fff', node_size=6000)
nx.draw_networkx_labels(G, pos);
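
On a toy network the effect of a projection can be verified by hand. The species names below are hypothetical. Two plants end up connected in the projection exactly when at least one pollinator visits both.

```python
import networkx as nx
from networkx.algorithms import bipartite

# Hypothetical toy network: pollinators on one side, plants on the other
B_toy = nx.Graph()
B_toy.add_edges_from([
    ("bee", "clover"), ("bee", "thistle"),
    ("fly", "thistle"), ("fly", "daisy"),
])
toy_plants = ["clover", "thistle", "daisy"]

# Project onto the plants: the pollinator nodes disappear
P = bipartite.projected_graph(B_toy, toy_plants)

print(sorted(P.nodes))
# clover and thistle share the bee; clover and daisy share no pollinator
print(P.has_edge("clover", "thistle"), P.has_edge("clover", "daisy"))
```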

The function below builds a weighted projection: every shared neighbor (intermediate node) adds 1 to the weight of the edge between two nodes.

G = bipartite.weighted_projected_graph(B, plants)
list(G.edges(data=True))[0]

>>> ('Urospermum\npicrioides', 'Eryngium\ncampestre', {'weight': 4})
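
A small sketch with hypothetical species shows how the weights come about: the weight of an edge between two plants is the number of pollinators they have in common.

```python
import networkx as nx
from networkx.algorithms import bipartite

# Hypothetical toy network, for illustration only
B_toy = nx.Graph()
B_toy.add_edges_from([
    ("bee", "clover"), ("bee", "thistle"),
    ("fly", "clover"), ("fly", "thistle"), ("fly", "daisy"),
])
toy_plants = ["clover", "thistle", "daisy"]

W = bipartite.weighted_projected_graph(B_toy, toy_plants)

# clover and thistle are both visited by the bee and the fly
print(W.edges["clover", "thistle"]["weight"])
# thistle and daisy share only the fly
print(W.edges["thistle", "daisy"]["weight"])
```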

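We can also weight the edges with the Jaccard index. For two nodes, the Jaccard index is the number of neighbors they share divided by the number of nodes that are neighbors of at least one of them. overlap_weighted_projected_graph() builds a projection whose edges are weighted by the Jaccard index, as in the following code: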

# Create co-affiliation network
G = bipartite.overlap_weighted_projected_graph(B, pollinators)

# Get weights
weight = [G.edges[e]['weight'] for e in G.edges]

# Create figure
plt.figure(figsize=(30,30))

# Calculate layout
pos = nx.spring_layout(G, weight='weight', k=0.5)

# Draw edges, nodes, and labels
nx.draw_networkx_edges(G, pos, edge_color=weight, edge_cmap=plt.cm.Blues, width=6, alpha=0.5)
nx.draw_networkx_nodes(G, pos, node_color="#9f9fff", node_size=6000)
nx.draw_networkx_labels(G, pos);
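
As a cross-check, the Jaccard index J(u, v) = |N(u) ∩ N(v)| / |N(u) ∪ N(v)|, where N(u) is the set of neighbors of u, can be computed by hand and compared with the library result. The helper jaccard() and the toy species below are hypothetical, written only for this sketch.

```python
import networkx as nx
from networkx.algorithms import bipartite

# Hypothetical toy network, for illustration only
B_toy = nx.Graph()
B_toy.add_edges_from([
    ("bee", "clover"), ("bee", "thistle"),
    ("fly", "clover"), ("fly", "thistle"), ("fly", "daisy"),
])
toy_pollinators = ["bee", "fly"]


def jaccard(G, u, v):
    """Shared neighbors divided by all neighbors of either node."""
    nu, nv = set(G[u]), set(G[v])
    return len(nu & nv) / len(nu | nv)


# jaccard=True is the default; it is spelled out here for clarity
J = bipartite.overlap_weighted_projected_graph(B_toy, toy_pollinators, jaccard=True)

by_hand = jaccard(B_toy, "bee", "fly")
from_library = J.edges["bee", "fly"]["weight"]
print(by_hand, from_library)
```

The bee visits two plants and the fly visits three, with two in common, so both values come out as 2/3.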
