Implementing a Co-occurrence Matrix in Python and Visualizing It with networkx

TonyHsuM

Article link: https://blog.csdn.net/SouthWooden/article/details/112648623

Co-occurrence matrix
- Code implementation
networkx visualization
- Code implementation
Problems encountered
References

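Co-occurrence matrix

Co-occurrence matrix: also called a co-word matrix. It records how often two words appear together, which serves as a measure of how strongly they are related.

Suppose we have the two sentences below. After segmenting them with jieba and filtering the tokens against a stopword list (assumed here to contain 的), the first sentence yields the keywords E, B, C and the second yields B, C, D: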

test = ["E的B的C", "B的C的D"]

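Next we build the co-occurrence matrix from these keywords. B and E appear together in one sentence, so their weight is 1; B and C appear together in both sentences, so their weight is 2; and so on. A pair that never shares a sentence, such as D and E, gets weight 0.
The matrix has the following properties:
Cell [0][0] is empty.
The first row and the first column hold the keywords.
The diagonal is all 0.
The matrix is symmetric (not diagonal): the entry for (word1, word2) equals the entry for (word2, word1).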
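
The counting rule and the properties listed above can be sketched with the standard library alone. This is a minimal sketch, not the post's implementation: it assumes the keyword lists have already been produced by segmentation and stopword filtering, and the helper names `cooccurrence_counts` and `to_matrix` are made up for illustration.

```python
from collections import Counter
from itertools import combinations

# Keyword lists, assumed to be the output of segmentation + stopword filtering
sentences = [["E", "B", "C"], ["B", "C", "D"]]


def cooccurrence_counts(keyword_lists):
    """For every unordered word pair, count the sentences that contain both words."""
    counts = Counter()
    for keywords in keyword_lists:
        # set() so that a word repeated inside one sentence is counted once
        for pair in combinations(sorted(set(keywords)), 2):
            counts[pair] += 1
    return counts


def to_matrix(counts, vocab):
    """Expand the pair counts into a dense symmetric matrix with a zero diagonal."""
    index = {word: k for k, word in enumerate(vocab)}
    matrix = [[0] * len(vocab) for _ in vocab]
    for (w1, w2), weight in counts.items():
        matrix[index[w1]][index[w2]] = weight
        matrix[index[w2]][index[w1]] = weight  # mirror the entry: symmetry
    return matrix


vocab = sorted({word for keywords in sentences for word in keywords})
counts = cooccurrence_counts(sentences)
matrix = to_matrix(counts, vocab)

print(vocab)
for word, row in zip(vocab, matrix):
    print(word, row)
```

Unlike the layout described above, this sketch keeps the keywords in a separate `vocab` list instead of storing them in the first row and column, so the matrix holds numbers only.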

Code implementation

# -*- coding: utf-8 -*-
import networkx as nx
import matplotlib.pyplot as plt
import jieba
import numpy as np

test = ["E的B的C", "B的C的D"]

# Load the stopword list (one word per line)
with open('stopwords_unduplicated.txt', encoding='UTF-8') as f:
    stopwords = {line.strip() for line in f}


def extract_keywords(sentence):
    # Segment with jieba, then drop stopwords and tab characters
    return [word for word in jieba.cut(sentence.replace(' ', ''))
            if word not in stopwords and word != '\t']


results1 = extract_keywords(test[0])
print("result1 is :", results1)

results2 = extract_keywords(test[1])
print("result2 is :", results2)

# Merge the keywords of both sentences; sorting keeps the order reproducible
result = sorted(set(results1).union(results2))
print("union result is :", result)
n = len(result)
# Create an (n+1) x (n+1) matrix; row 0 and column 0 hold the keywords
matrix = [[0 for _ in range(n + 1)] for _ in range(n + 1)]
matrix[0][0] = ''  # the top-left cell stays empty
for i in range(n):
    matrix[0][i + 1] = result[i]
    matrix[i + 1][0] = result[i]

# print(np.array(matrix))

for i in range(1, n + 1):  # i runs from 1 to the number of words
    # j runs from 1 to (number of words - i), so i + j runs from i + 1 to the number of words
    for j in range(1, n + 1 - i):
        word1 = result[i - 1]
        word2 = result[i + j - 1]
        print("In %d iteration, for No.%d word pair:" % (i, j), word1, word2)
        # Weight = number of sentences that contain both words:
        # 2 if the pair occurs in both sentences, 1 if in only one, otherwise 0
        common_weight = 0
        for keywords in (results1, results2):
            if word1 in keywords and word2 in keywords:
                common_weight += 1
        matrix[i][i + j] = common_weight    # the matrix is symmetric,
        matrix[i + j][i] = common_weight    # so fill both mirrored cells
        print("For (%s,  %s), the common_weight is : %d" % (word1, word2, common_weight))
print("The co-occurrence matrix is:")
# np.array() prints the 2-D list one row per line
print(np.array(matrix))
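
For more than two sentences, the same weights can also be obtained with a matrix product. The sketch below is an alternative to the loop above, not the post's method: it assumes the keyword lists are already available, and the names `incidence` and `cooc` are illustrative. It builds a 0/1 sentence-by-word matrix and multiplies its transpose by it, so entry (a, b) of the product is the number of sentences that contain both word a and word b.

```python
import numpy as np

vocab = ["B", "C", "D", "E"]
# Keyword lists, assumed to come from segmentation + stopword filtering
sentences = [["E", "B", "C"], ["B", "C", "D"]]

# incidence[s, w] is 1 when word w occurs in sentence s, else 0
incidence = np.array(
    [[1 if word in keywords else 0 for word in vocab] for keywords in sentences]
)

# (incidence.T @ incidence)[a, b] = number of sentences containing both a and b
cooc = incidence.T @ incidence
# The diagonal would hold each word's own sentence count; the post defines it as 0
np.fill_diagonal(cooc, 0)

print(cooc)
```

The result has no header row or column; the labels live in `vocab`, in the same order as the rows and columns of `cooc`.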

networkx visualization

Code implementation

# Continues the script above (uses its nx and plt imports)
# Create an undirected graph (co-occurrence has no direction)
G = nx.Graph()
# Add the four nodes (from a list)
G.add_nodes_from(['B', 'C', 'D', 'E'])
print(G.nodes())
# Add the edges, weighted according to the co-occurrence matrix
G.add_edge('B', 'D', weight=1)
G.add_edge('B', 'C', weight=2)
G.add_edge('B', 'E', weight=1)
G.add_edge('C', 'D', weight=1)
G.add_edge('C', 'E', weight=1)
# D and E never appear in the same sentence (weight 0), so they get no edge
# Unweighted alternative:
# G.add_edges_from([('B', 'C'), ('B', 'D'), ('B', 'E'), ('C', 'D'), ('C', 'E')])
print("The edges for this graph are: ", G.edges())
# Node colors, one per node, in the order the nodes were added
colors = ['red', 'green', 'pink', 'orange']
# Split the edges into heavy and light ones by weight
edge_large = [(u, v) for (u, v, d) in G.edges(data=True) if d['weight'] > 1.5]
edge_small = [(u, v) for (u, v, d) in G.edges(data=True) if d['weight'] <= 1.5]
# Compute positions for all nodes
pos = nx.spring_layout(G)
# Draw the nodes first
nx.draw_networkx_nodes(G, pos, node_size=500, node_color=colors)
# Draw the edges: solid lines for heavy edges, dashed lines for light edges
nx.draw_networkx_edges(G, pos, edgelist=edge_large,
                       width=6)
nx.draw_networkx_edges(G, pos, edgelist=edge_small,
                       width=6, alpha=0.5, edge_color='b', style='dashed')

# Draw the node labels
nx.draw_networkx_labels(G, pos, font_size=20, font_family='sans-serif')

plt.axis('off')
plt.savefig('fig.png', bbox_inches='tight')
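
The edge list above is typed in by hand. A minimal sketch of deriving it from the co-occurrence weights instead is shown below. It assumes the weights for the two example sentences are available as a plain numeric matrix whose rows and columns follow `labels`; `matrix_to_edges` and `build_graph` are hypothetical helpers, not part of networkx. Pairs with weight 0 (here D and E, which never share a sentence) produce no edge.

```python
def matrix_to_edges(labels, weights):
    """Return (u, v, weight) for every pair above the diagonal with a non-zero weight."""
    edges = []
    for i in range(len(labels)):
        # Only look above the diagonal: the matrix is symmetric
        for j in range(i + 1, len(labels)):
            if weights[i][j] > 0:
                edges.append((labels[i], labels[j], weights[i][j]))
    return edges


def build_graph(labels, weights):
    """Build an undirected weighted networkx graph from the matrix."""
    # Imported here so that matrix_to_edges can be used without networkx installed
    import networkx as nx

    graph = nx.Graph()
    graph.add_nodes_from(labels)
    # add_weighted_edges_from stores the third tuple item as the 'weight' attribute
    graph.add_weighted_edges_from(matrix_to_edges(labels, weights))
    return graph


labels = ["B", "C", "D", "E"]
weights = [
    [0, 2, 1, 1],
    [2, 0, 1, 1],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
]
edges = matrix_to_edges(labels, weights)
print(edges)
```

A graph returned by `build_graph(labels, weights)` could then be passed to the same drawing calls used in the post, since they read the `weight` attribute of each edge.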

Problems encountered

Plotting in PyCharm raises: warnings.warn("This figure includes Axes that are not compatible "
The warning appears because plt.tight_layout does not work properly in some cases.
Fix: remove plt.show() and add plt.savefig('fig.png', bbox_inches='tight')
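
The save-instead-of-show approach can be tried in isolation with the sketch below. It is an illustration under assumptions, not a guaranteed fix for the warning: it selects matplotlib's non-interactive Agg backend so that no window is opened, draws a placeholder line rather than the graph, and writes the PNG into an in-memory buffer, whereas the post writes to 'fig.png'. The function name `render_png` is made up for this example.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend: renders to files, opens no window
import matplotlib.pyplot as plt


def render_png():
    """Draw a placeholder figure and return it as PNG bytes, without plt.show()."""
    fig, ax = plt.subplots()
    ax.plot([0, 1], [0, 1])  # stand-in for the network drawing
    ax.axis("off")
    buffer = io.BytesIO()
    # bbox_inches='tight' trims the surrounding whitespace at save time
    fig.savefig(buffer, format="png", bbox_inches="tight")
    plt.close(fig)  # release the figure once it has been saved
    return buffer.getvalue()


png_bytes = render_png()
print(len(png_bytes) > 0)
```

Passing a file name such as 'fig.png' to `savefig` in place of the buffer writes the image to disk, as in the post.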

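References

[Python Big Data Analysis] 5. Scraping People's Daily Online news topics and building a topic knowledge graph with Gephi
Implementing a co-occurrence matrix in Python
Building a co-occurrence matrix in Python
Drawing a graph according to edge weights with Python networkx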
