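In practice, the networks we work with often have to be built by hand. Here we build a word co-occurrence network, which helps us understand how words relate to one another across a collection of documents. In this network, words are nodes, and an edge links two words that appear together in the same sentence.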
First, build a stop-word list: very common words (plus a few text-specific ones) that should not produce edges.
stop_words = set([
    'the', 'of', 'and', 'i', 'to', 'my', 'in', 'was', 'that', 'thy',
    'a', 'had', 'with', 'but', 'he', 'she', 'you', 'your',
    'me', 'not', 'as', 'will', 'from', 'on', 'be', 'it', 'which',
    'for', 'his', 'him', 'chapter', 'at', 'who', 'by', 'have',
    'would', 'is', 'been', 'when', 'they', 'there', 'we', 'are',
    'our', 'if', 'her', 'were', 'than', 'this', 'what', 'so',
    'yet', 'more', 'their', 'them', 'or', 'could', 'an', 'can',
    'said', 'may', 'do', 'these', 'shall', 'how', 'asked',
    'before', 'those', 'whom', 'am', 'even', 'its', 'did', 'then',
    'abbey', 'tintern', 'wordsworth', 'letter', 'thee', 'thou', 'oh',
    'into', 'any', 'myself', 'nor', 'himself', 'one', 'all', 'no', 'yes',
    'now', 'upon', 'only', 'might', 'every', 'own', 'such', 'towards',
    'again', 'most', 'ever', 'where', 'after', 'up', 'soon', 'many',
    'also', 'like', 'over', 'us', 'thus', 'has', 'about']
    # Also treat the numbers 0 to 23 as stop words
    + [str(x) for x in range(24)])
Now write a function that builds the co-occurrence network from raw text:
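import re

import networkx as nx

def co_occurrence_network(text):
    G = nx.Graph()
    # Treat each period-delimited chunk as a sentence
    sentences = text.split('.')
    for s in sentences:
        # Remove punctuation (keep word characters and whitespace), then lowercase
        clean = re.sub(r'[^\w\s]+', '', s).lower()
        # Remove runs of underscores and trim surrounding whitespace
        clean = re.sub(r'_+', '', clean).strip()
        words = re.split(r'\s+', clean)
        # Build relationships between the words of this sentence
        for v in words:
            # Count occurrences of each word (stop words included)
            try:
                G.nodes[v]['count'] += 1
            except KeyError:
                G.add_node(v)
                G.nodes[v]['count'] = 1
            for w in words:
                # Skip self-pairs and stop words
                if v == w or v in stop_words or w in stop_words:
                    continue
                # Skip empty strings (an empty sentence splits into [''])
                if len(v) == 0 or len(w) == 0:
                    continue
                # Each unordered pair is visited twice, as (v, w) and (w, v),
                # so on this undirected graph every co-occurrence adds 2
                try:
                    G.edges[v, w]['count'] += 1
                except KeyError:
                    G.add_edge(v, w, count=1)
    return G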
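The nested loop above visits every unordered pair twice, as (v, w) and (w, v), so the edge counts on the undirected graph come out doubled. Below is a minimal variant sketch that uses `itertools.combinations` to count each pair of distinct words at most once per sentence. The helper name `co_occurrence_once`, the simplified `[a-z0-9]+` tokenizer, and the toy sentences are assumptions for illustration. Unlike the original, this version leaves stop words out of the nodes and counts a word once per sentence.

```python
import itertools
import re

import networkx as nx


def co_occurrence_once(sentences, stop_words=frozenset()):
    """Hypothetical variant: each unordered pair of distinct words is
    counted at most once per sentence."""
    G = nx.Graph()
    for s in sentences:
        # Lowercase and keep runs of ASCII letters/digits as words
        words = re.findall(r"[a-z0-9]+", s.lower())
        # Distinct non-stop words in this sentence, sorted for determinism
        kept = sorted({w for w in words if w not in stop_words})
        for w in kept:
            if w in G:
                G.nodes[w]["count"] += 1
            else:
                G.add_node(w, count=1)
        # combinations() yields each unordered pair exactly once
        for v, w in itertools.combinations(kept, 2):
            if G.has_edge(v, w):
                G.edges[v, w]["count"] += 1
            else:
                G.add_edge(v, w, count=1)
    return G


text = "The old man slept. An old man woke. The man left."
G = co_occurrence_once(text.split("."), stop_words={"the", "an"})
print(sorted(G.edges(data="count")))
```

Here ('man', 'old') co-occurs in two sentences and gets a count of 2, not 4.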
Call the function on the text, list the ten most frequent word pairs, and draw the graph:
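# target: path to the input text file (defined elsewhere)
with open(target, 'r') as f:
    text = f.read()
G = co_occurrence_network(text)
pairs = sorted(
    # data=True returns each edge together with its attribute dict (including 'count')
    G.edges(data=True), key=lambda e: e[2]['count'],
    reverse=True
)
pairs[0:10]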
>>>
[('man', 'old', {'count': 68}),
('country', 'native', {'count': 38}),
('first', 'now', {'count': 32}),
('death', 'life', {'count': 32}),
('human', 'being', {'count': 32}),
('natural', 'philosophy', {'count': 32}),
('eyes', 'tears', {'count': 30}),
('first', 'eyes', {'count': 28}),
('some', 'time', {'count': 28}),
('night', 'during', {'count': 28})]
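Sorting the whole edge list is fine for a book-sized graph. When only the top k pairs are needed, `heapq.nlargest` does the same job without a full sort. The standard library documents it as equivalent to `sorted(iterable, key=key, reverse=True)[:n]`, so ties come out in the same order. This is a small sketch on a hypothetical toy graph, not the data above.

```python
import heapq

import networkx as nx

# Hypothetical toy graph with made-up co-occurrence counts
G = nx.Graph()
G.add_edge("old", "man", count=3)
G.add_edge("man", "house", count=5)
G.add_edge("house", "door", count=1)
G.add_edge("old", "door", count=5)


def count_of(edge):
    # edge is a (u, v, attribute_dict) triple from G.edges(data=True)
    return edge[2]["count"]


# Top-2 edges by count, without sorting every edge
top2 = heapq.nlargest(2, G.edges(data=True), key=count_of)
print(top2)
```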
import matplotlib.pyplot as plt

plt.figure(figsize=(16,6))
nx.draw_networkx(G)
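The result is very hard to read. In network analysis, a drawing like this is called a "hairball". We can use the subgraph() method to extract a smaller subgraph, here one that contains only the main characters: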
# Count co-occurrences for characters only
characters = [
'creature', 'monster', 'victor', 'elizabeth',
'william', 'henry', 'justine']
G_focus = G.subgraph(characters)
# Create list of edge counts
counts = [G_focus.edges[e]['count'] for e in G_focus.edges]
# Create spring layout
pos = nx.spring_layout(G_focus)
# Create figure and draw nodes
plt.figure(figsize=(20,10))
nx.draw_networkx_nodes(G_focus, pos)
# Draw edges
nx.draw_networkx_edges(
G_focus, pos, width=8,
edge_color=counts, edge_cmap=plt.cm.Blues, alpha=0.5)
nx.draw_networkx_edges(G_focus, pos, edge_color="#7f7f7f",alpha=0.5)
# Draw labels
nx.draw_networkx_labels(G_focus, pos)
plt.show()
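`subgraph()` returns the subgraph induced by the given nodes. The small sketch below, built on a hypothetical toy graph with made-up counts, shows three properties that matter here:

- Only edges whose endpoints are both in the list are kept, along with their attributes.
- Names that are not nodes of the graph are silently ignored.
- The result is a read-only view; `.copy()` gives an independent graph you can modify.

```python
import networkx as nx

# Hypothetical toy co-occurrence graph (made-up counts)
G = nx.Graph()
G.add_edge("victor", "elizabeth", count=5)
G.add_edge("victor", "creature", count=7)
G.add_edge("victor", "geneva", count=2)
G.add_edge("elizabeth", "geneva", count=1)

# "justine" is not a node of G, so subgraph() just skips it
G_focus = G.subgraph(["victor", "elizabeth", "creature", "justine"])
print(sorted(G_focus.nodes), G_focus.number_of_edges())

# The view is frozen: modifying it raises NetworkXError
try:
    G_focus.add_edge("victor", "justine")
    frozen = False
except nx.NetworkXError:
    frozen = True

# An independent, editable copy
H = G_focus.copy()
H.add_edge("justine", "victor", count=1)
```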
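Affiliation networks in NetworkX
First, the concept of an affiliation network. An edge represents a relationship between two nodes, and only two. Some relationships, however, cannot be expressed by a single edge between a pair of nodes; for example, several nodes belonging to the same group. Networks built to capture this kind of structure are called affiliation networks.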
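Formally, an affiliation network (also called a bipartite network) is a network whose nodes fall into two types, with edges only between nodes of different types: nodes of the same type are never directly connected. Affiliation networks are very useful for representing many-to-many relationships. For example, a movie can feature many actors, and an actor can appear in many movies.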
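The movie/actor example can be sketched as a small bipartite graph. The titles and names below are hypothetical. Each node is tagged with a `bipartite` attribute (0 or 1), which is the convention NetworkX's bipartite module uses to mark the two node types. Every edge joins an actor to a movie, so no edge connects two nodes of the same type.

```python
import networkx as nx
from networkx.algorithms import bipartite

# Hypothetical movies and actors
movies = ["Movie A", "Movie B"]
actors = ["Actor 1", "Actor 2", "Actor 3"]

B = nx.Graph()
# Tag node types with the "bipartite" attribute
B.add_nodes_from(movies, bipartite=0)
B.add_nodes_from(actors, bipartite=1)
# Many-to-many: Actor 2 appears in both movies, each movie has two actors
B.add_edges_from([
    ("Actor 1", "Movie A"),
    ("Actor 2", "Movie A"),
    ("Actor 2", "Movie B"),
    ("Actor 3", "Movie B"),
])

# Collect any edge that joins two nodes of the same type (should be none)
same_type = [(u, v) for u, v in B.edges
             if B.nodes[u]["bipartite"] == B.nodes[v]["bipartite"]]
print(bipartite.is_bipartite(B), same_type)
```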
First, test whether a graph is an affiliation network, using the karate club graph:
import networkx as nx
from networkx.algorithms import bipartite
from networkx import NetworkXError

G = nx.karate_club_graph()
try:
    # bipartite.sets() raises NetworkXError when the graph is not bipartite
    left, right = bipartite.sets(G)
    print('left nodes\n', left)
    print('\nright nodes\n', right)
except NetworkXError as e:
    print(e)
>>> Graph is not bipartite.
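Clearly it is not bipartite. We can still turn it into a bipartite graph by representing each friendship (each edge) as a node of its own, connected to the two friends it joins: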
# One node per friendship (edge), linked to both of its endpoints
B = nx.Graph()
B.add_edges_from([(v, (v, w)) for v, w in G.edges])
B.add_edges_from([(w, (v, w)) for v, w in G.edges])
try:
    # Find and print the two node sets
    left, right = bipartite.sets(B)
    print('left nodes\n', left)
    print('\nright nodes\n', right)
except NetworkXError as e:
    print(e)
If all we want to know is whether a graph is bipartite, we can call is_bipartite() directly:
bipartite.is_bipartite(B)
>>> True
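Turning edges into nodes always produces a bipartite graph. Every original node connects only to edge-nodes, and every edge-node connects only to the two original nodes it joins. The sketch below packages the two `add_edges_from` calls above into a hypothetical helper, `edges_as_nodes`, that also tags both node types. It then tests the helper on a triangle: an odd cycle, which can never be bipartite.

```python
import networkx as nx
from networkx.algorithms import bipartite


def edges_as_nodes(G):
    """Hypothetical helper: return a bipartite graph in which every edge
    (v, w) of G becomes a node linked to both v and w."""
    B = nx.Graph()
    B.add_nodes_from(G.nodes, bipartite=0)
    for v, w in G.edges:
        B.add_node((v, w), bipartite=1)
        B.add_edges_from([(v, (v, w)), (w, (v, w))])
    return B


T = nx.cycle_graph(3)  # triangle: contains an odd cycle
B = edges_as_nodes(T)
print(bipartite.is_bipartite(T), bipartite.is_bipartite(B))
```

For a graph with n nodes and m edges the result has n + m nodes and 2m edges. Here the triangle becomes a 6-cycle.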
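Next, a more involved example: a plant-pollinator network, in which pollinator species are linked to the plant species they visit. We use connected_components() to discard species that are not connected to the rest of the network.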
# Start from a fresh graph (B above still holds the karate club data)
B = nx.Graph()
# target: path to the tab-separated plant/pollinator data file (defined elsewhere)
with open(target, 'r') as f:
    # Skip header row
    next(f)
    for row in f:
        # Break row into cells
        cells = row.strip().split('\t')
        # Get plant species and pollinator species;
        # underscores become newlines so long names wrap in the labels
        plant = cells[4].replace('_', '\n')
        pollinator = cells[8].replace('_', '\n')
        B.add_edge(pollinator, plant)
        # Set node types
        B.nodes[pollinator]["bipartite"] = 0
        B.nodes[plant]["bipartite"] = 1
# Only consider connected species: keep the largest connected component
B = B.subgraph(max(nx.connected_components(B), key=len))
# Get node sets
pollinators = [v for v in B.nodes if B.nodes[v]["bipartite"] == 0]
plants = [v for v in B.nodes if B.nodes[v]["bipartite"] == 1]
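`connected_components()` yields components in the order it discovers them while scanning the nodes, not sorted by size. The first component it returns is therefore not necessarily the largest. The sketch below uses hypothetical toy species names to show this, then picks the largest component explicitly with `max(..., key=len)`.

```python
import networkx as nx

# Hypothetical toy data: a small component is inserted first
B = nx.Graph()
B.add_edge("bee 1", "plant 1")                       # component of size 2
B.add_edges_from([("bee 2", "plant 2"), ("bee 2", "plant 3"),
                  ("bee 3", "plant 3")])             # component of size 4

# Components come out in discovery order, not by size
components = list(nx.connected_components(B))
print([len(c) for c in components])

# Select the largest component explicitly; copy() makes it editable
largest = max(nx.connected_components(B), key=len)
B_main = B.subgraph(largest).copy()
print(sorted(B_main.nodes))
```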
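Now that we have the two node sets, we can visualize the network, drawing plants and pollinators with different shapes and colors: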
# Create figure
plt.figure(figsize=(30,30))
# Calculate layout
pos = nx.spring_layout(B, k=0.9)
# Draw using different shapes and colors for plant/pollinators
nx.draw_networkx_edges(B, pos, width=3, alpha=0.2)
nx.draw_networkx_nodes(B, pos, nodelist=plants, node_color="#bfbf7f", node_shape="h", node_size=3000)
nx.draw_networkx_nodes(B, pos, nodelist=pollinators, node_color="#9f9fff", node_size=3000)
nx.draw_networkx_labels(B, pos);
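As an alternative to `spring_layout`, NetworkX's `bipartite_layout` places one node set in a column on the left and the other set on the right. This can make the two-mode structure easier to see in small networks. The sketch below uses a hypothetical toy graph; it only computes positions, which can then be passed as `pos` to the same drawing calls used above. Note that the layout functions depend on NumPy.

```python
import networkx as nx

# Hypothetical toy plant/pollinator graph
plants = ["plant 1", "plant 2"]
pollinators = ["bee 1", "bee 2", "bee 3"]
B = nx.Graph()
B.add_edges_from([("bee 1", "plant 1"), ("bee 2", "plant 1"),
                  ("bee 2", "plant 2"), ("bee 3", "plant 2")])

# Nodes in `plants` go in one column, all remaining nodes in the other
pos = nx.bipartite_layout(B, plants)

# Each node set shares a single x coordinate
plant_x = {round(float(pos[v][0]), 6) for v in plants}
pollinator_x = {round(float(pos[v][0]), 6) for v in pollinators}
print(plant_x, pollinator_x)
```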
Projections
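A projection turns a bipartite (two-mode) network into a single-mode network. It keeps only the nodes of one type and connects two of them whenever they share at least one neighbor of the other type. Projecting onto the plants, for example, links two plants if some pollinator visits both: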
G = bipartite.projected_graph(B,plants)
plt.figure(figsize=(24,24))
pos = nx.spring_layout(G,k=0.5)
nx.draw_networkx_edges(G,pos,width=3,alpha=.2)
nx.draw_networkx_nodes(G,pos,node_color='#bfbf7f',node_shape='h',node_size=10000)
nx.draw_networkx_labels(G,pos);
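To see what `projected_graph` computes, here is a hand-written sketch of an unweighted projection. The helper name `project` and the toy data are hypothetical. It walks from each kept node to its neighbors of the other type and back, linking any two kept nodes found this way. The result is then checked against `bipartite.projected_graph`.

```python
import networkx as nx
from networkx.algorithms import bipartite


def project(B, nodes):
    """Hypothetical helper: link two nodes of `nodes` if they share
    at least one neighbor in the bipartite graph B."""
    nodes = set(nodes)
    G = nx.Graph()
    G.add_nodes_from(nodes)          # keep isolated nodes too
    for u in nodes:
        for mid in B[u]:             # a node of the other type
            for v in B[mid]:         # back to the projected type
                if v != u and v in nodes:
                    G.add_edge(u, v)
    return G


# Hypothetical toy data
B = nx.Graph([("bee 1", "plant 1"), ("bee 2", "plant 1"),
              ("bee 2", "plant 2"), ("bee 3", "plant 3")])
plants = ["plant 1", "plant 2", "plant 3"]

G_manual = project(B, plants)
G_nx = bipartite.projected_graph(B, plants)
print(sorted(map(sorted, G_manual.edges)))
```

"plant 3" stays in the projection as an isolated node, because its only pollinator visits no other plant.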
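We can, of course, project onto the pollinators instead. Two pollinators are then linked if they visit at least one common plant: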
G = bipartite.projected_graph(B,pollinators)
plt.figure(figsize=(30,30))
pos = nx.spring_layout(G,k=.5)
nx.draw_networkx_edges(G, pos, width=3, alpha=0.2)
nx.draw_networkx_nodes(G, pos, node_color='#9f9fff',node_size=6000)
nx.draw_networkx_labels(G,pos);
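weighted_projected_graph() adds weights during projection: each shared intermediate node contributes 1 to the weight. The weight of an edge is therefore the number of neighbors the two nodes have in common.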
G = bipartite.weighted_projected_graph(B,plants)
list(G.edges(data=True))[0]
>>> ('Urospermum\npicrioides', 'Eryngium\ncampestre', {'weight': 4})
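A quick check, on hypothetical toy data, that each weight from `weighted_projected_graph` (with its default `ratio=False`) equals the number of shared neighbors. Here that is the number of pollinators visiting both plants.

```python
import networkx as nx
from networkx.algorithms import bipartite

# Hypothetical toy data
B = nx.Graph([("bee 1", "plant 1"), ("bee 2", "plant 1"),
              ("bee 1", "plant 2"), ("bee 2", "plant 2"),
              ("bee 3", "plant 2"), ("bee 3", "plant 3")])
plants = ["plant 1", "plant 2", "plant 3"]

G = bipartite.weighted_projected_graph(B, plants)
for u, v in G.edges:
    # Pollinators that visit both plants
    shared = set(B[u]) & set(B[v])
    print(u, v, G.edges[u, v]["weight"], sorted(shared))
```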
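Alternatively, the weight between two nodes can be computed with the Jaccard index. It is the number of neighbors the two nodes share, divided by the number of nodes that are neighbors of at least one of them: J(u, v) = |N(u) ∩ N(v)| / |N(u) ∪ N(v)|. overlap_weighted_projected_graph() builds a projection weighted by the Jaccard index (its default, jaccard=True). Here it is applied to the pollinators: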
# Create co-affiliation network
G = bipartite.overlap_weighted_projected_graph(B, pollinators)
# Get weights
weight = [G.edges[e]['weight'] for e in G.edges]
# Create figure
plt.figure(figsize=(30,30))
# Calculate layout
pos = nx.spring_layout(G, weight='weight', k=0.5)
# Draw edges, nodes, and labels
nx.draw_networkx_edges(G, pos, edge_color=weight, edge_cmap=plt.cm.Blues, width=6, alpha=0.5)
nx.draw_networkx_nodes(G, pos, node_color="#9f9fff", node_size=6000)
nx.draw_networkx_labels(G, pos);
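The Jaccard weights can be verified by hand. The sketch below computes J(u, v) directly from neighbor sets, using a hypothetical `jaccard` helper and toy data, and compares it with `overlap_weighted_projected_graph`. With `jaccard=False`, the same function divides by the smaller of the two neighbor sets instead of their union.

```python
import networkx as nx
from networkx.algorithms import bipartite


def jaccard(B, u, v):
    """Hypothetical helper: |N(u) & N(v)| / |N(u) | N(v)| in B."""
    nu, nv = set(B[u]), set(B[v])
    return len(nu & nv) / len(nu | nv)


# Hypothetical toy data
B = nx.Graph([("bee 1", "plant 1"), ("bee 2", "plant 1"),
              ("bee 1", "plant 2"), ("bee 2", "plant 2"),
              ("bee 3", "plant 2"), ("bee 3", "plant 3")])
pollinators = ["bee 1", "bee 2", "bee 3"]

G = bipartite.overlap_weighted_projected_graph(B, pollinators)
for u, v in G.edges:
    print(u, v, G.edges[u, v]["weight"], jaccard(B, u, v))

# Overlap coefficient instead: shared / min(|N(u)|, |N(v)|)
G_min = bipartite.overlap_weighted_projected_graph(B, pollinators, jaccard=False)
```

"bee 1" and "bee 2" visit exactly the same plants, so their Jaccard weight is 1.0. "bee 1" and "bee 3" share one plant out of three in total, giving 1/3.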