In the Name of the People: Character Relationship Visualization

世纪殇

Column: Python-Based Data Analysis in Practice. Tags: python, visualization

Article link: https://blog.csdn.net/dasgk/article/details/113644677

Environment

Version: Python 3.7

Dependencies: networkx, matplotlib (pyplot), jieba, codecs (codecs is part of the Python standard library)

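Installing networkx

When I installed it, pip3 install networkx reported success, but the package was placed in the Python 3.9 site-packages directory. In other words, the installation succeeded, yet the program could not use it at run time. I finally got it working by downloading the wheel and installing the .whl file with pip3 install. whl download link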
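A quick way to confirm that the interpreter actually running the script can see networkx is to ask that interpreter directly. The sketch below uses only the standard library (sys and importlib.util.find_spec); the helper name module_origin is hypothetical. Run it with the same interpreter you use for the project: a None next to a module means that interpreter cannot import it.

```python
import sys
import importlib.util


def module_origin(name):
    """Return the path a top-level module would be loaded from, or None if it is not importable."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    # Which interpreter is running this script, and which version it is
    print("interpreter:", sys.executable)
    print("version:", sys.version.split()[0])
    # Where each dependency would be imported from (None = not visible to this interpreter)
    for mod in ("networkx", "jieba", "matplotlib"):
        print(mod, "->", module_origin(mod))
```

If networkx shows None even though pip3 says it is installed, running pip through the target interpreter (for example python3.7 -m pip install networkx) installs into that interpreter's own site-packages and usually avoids the mismatch described above.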

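Workflow

Load a custom dictionary to improve jieba's recognition of character names
Load the list of character names to analyze
Load aliases for later use; for example, in the novel 赵德海 is also called "老赵", and 钟小艾 is also called "小艾"
Read the novel line by line and segment each line with jieba, recording every character name that appears in it without filtering or deduplicating (aliases are replaced with their canonical names at this step)
Derive relationships from the parsed lines. The result is a nested (two-level) dictionary such as relationship['侯亮平']['钟小艾'] = 600 and relationship['侯亮平']['蔡成功'] = 344, which is then written to a file
Read the file back, organize it into each character's relationships, and visualize them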
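The alias replacement and co-occurrence counting steps above can be sketched in isolation. This is a simplified illustration, not the author's code: jieba segmentation is replaced by pre-split lists of tokens, and the names, alias map, and toy lines are hypothetical. The aliases dictionary uses the same layout as combine_words in the code below (canonical name to list of aliases).

```python
from collections import defaultdict


def normalize(token, known_names, aliases):
    """Return the canonical name for token, or None if it is not a tracked character."""
    if token in known_names:
        return token
    for canonical, alts in aliases.items():
        if token in alts:
            return canonical
    return None


def count_cooccurrence(lines, known_names, aliases):
    """Count ordered pairs of names that share a line.

    Names are neither filtered nor deduplicated within a line, so a name
    that appears twice in one line contributes twice.
    """
    relationships = defaultdict(lambda: defaultdict(int))
    for tokens in lines:
        line_names = []
        for token in tokens:
            name = normalize(token, known_names, aliases)
            if name is not None:
                line_names.append(name)
        # every ordered pair of different names in the line gets +1
        for name1 in line_names:
            for name2 in line_names:
                if name1 != name2:
                    relationships[name1][name2] += 1
    return relationships


if __name__ == "__main__":
    # Hypothetical toy input: each inner list stands in for the names jieba found in one line
    known = {"侯亮平", "钟小艾", "蔡成功"}
    alias_map = {"钟小艾": ["小艾"]}
    lines = [["侯亮平", "小艾"], ["侯亮平", "蔡成功", "侯亮平"]]
    result = count_cooccurrence(lines, known, alias_map)
    print(dict(result["侯亮平"]))  # {'钟小艾': 1, '蔡成功': 2}
```

Because every ordered pair is counted, the result is symmetric: relationships[a][b] always equals relationships[b][a].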
Results
Example and data download

Code example

# -*- coding: utf-8 -*-
import networkx as nx
import matplotlib.pyplot as plt
import os.path
import jieba
import codecs
import jieba.posseg as pseg


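names = {}          # character name -> number of occurrences
relationships = {}  # name -> {other name: co-occurrence count}
lineNames = []      # one list of character names per line (paragraph)
combine_words = {}  # canonical name -> list of aliases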


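jieba.load_userdict("book/person.txt")      # custom dictionary: jieba recognizes character names poorly on its own, so this improves name segmentation
# Collect the character names we care about; everything else is ignored later
with codecs.open("book/person.txt", 'r', 'utf8') as f:
    for line in f:
        # each line is presumably in jieba's user-dictionary format, e.g. "侯亮平 100 nr"; the name is the first field
        parts = line.split()
        if parts:
            names[parts[0]] = 0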

# Load aliases: each line is "canonical alias1 alias2 ..."
with codecs.open("book/combine_work.txt", 'r', 'utf8') as f:
    for line in f:
        words = line.split()
        if not words:
            continue    # skip blank lines
        combine_words[words[0]] = words[1:]
# Each line of the novel is treated as one paragraph; the names found in it form one list in lineNames
# Aliases are mapped back to their canonical name
with codecs.open("book/people_noval.txt", 'r', 'utf8') as f:
    for line in f:
        poss = pseg.cut(line)    # segment the line and tag parts of speech
        lineNames.append([])     # new name list for this paragraph
        for w in poss:
            if w.flag != 'nr' or len(w.word) < 2:
                continue    # not a person name (tag is not nr) or shorter than two characters
            if w.word in names:
                name = w.word
            else:
                # not a canonical name: check whether it is a known alias
                name = None
                for k, aliases in combine_words.items():
                    if w.word in aliases and k in names:
                        name = k
                        break
                if name is None:
                    continue    # not a character we track
            names[name] += 1
            relationships.setdefault(name, {})    # create the entry once instead of resetting it on every hit
            lineNames[-1].append(name)            # record the name for the current paragraph

# Analyze relationships: names that appear in the same line form one unit, and every pair in it is counted
# The co-occurrence count is the relationship score; the more often two names appear together, the closer they are
for line in lineNames:                  # for each paragraph
    for name1 in line:
        for name2 in line:              # every pair of names in the paragraph
            if name1 == name2:
                continue
            # start at 0 the first time the pair is seen, then add 1
            relationships[name1][name2] = relationships[name1].get(name2, 0) + 1


# Persist the relationships, keeping only pairs that co-occur more than 20 times ("w" overwrites any old file)
with codecs.open("book/person_edge.txt", "w", "utf-8") as f:
    for name, edges in relationships.items():
        for v, w in edges.items():
            if w > 20:
                f.write(name + " " + v + " " + str(w) + "\n")

# Load the file back into a list of [name1, name2, count] rows for plotting
a = []
with open('book/person_edge.txt', 'r', encoding='utf-8') as f:
    for line in f:
        fields = line.split()    # fields were written separated by spaces
        if len(fields) == 3:
            a.append(fields)


def draw_graph(data):
    # Chinese labels need a CJK-capable font; SimHei is an assumption, substitute any CJK font installed on your system
    plt.rcParams['font.sans-serif'] = ['SimHei']
    G = nx.Graph()
    # add weighted edges
    for edge in data:
        start = edge[0]
        end = edge[1]
        weight = int(edge[2])
        # no count currently exceeds 700, so dividing by 100 keeps weights roughly within 0-7
        weight = weight / 100
        G.add_edge(start, end, weight=weight)
    # split edges into three tiers by weight: heavy, middle, light
    elarge = [(u, v) for (u, v, d) in G.edges(data=True) if d['weight'] > 5]
    emiddle = [(u, v) for (u, v, d) in G.edges(data=True) if 1 < d['weight'] <= 5]
    esmall = [(u, v) for (u, v, d) in G.edges(data=True) if d['weight'] <= 1]
    # node positions
    pos = nx.spring_layout(G)  # positions for all nodes
    # draw the nodes first; adjust node_size to tune the output
    nx.draw_networkx_nodes(G, pos, node_size=300)
    # three tiers: strongest ties in solid black, middle in yellow, weakest in dashed blue
    nx.draw_networkx_edges(G, pos, edgelist=elarge,
                           width=1, edge_color='black')
    nx.draw_networkx_edges(G, pos, edgelist=emiddle,
                           width=1, alpha=0.7, edge_color='yellow')

    nx.draw_networkx_edges(G, pos, edgelist=esmall,
                           width=1, alpha=0.3, edge_color='blue', style='dashed')

    # font_family='sans-serif' resolves to the font list configured above
    nx.draw_networkx_labels(G, pos, font_size=6, font_family='sans-serif')

    plt.show()  # display

draw_graph(a)
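The drawing code only receives pairs that passed the w > 20 filter when person_edge.txt was written; it divides each count by 100 and then picks an edge colour by the scaled weight. The helper below restates those thresholds in terms of raw co-occurrence counts, which makes them easier to reason about and tune. The function name edge_style and its parameters are hypothetical; the thresholds are copied from the code above.

```python
def edge_style(count, min_count=20, scale=100):
    """Return the edge colour draw_graph would use for a raw co-occurrence count.

    Returns None for pairs that are never written to person_edge.txt (count <= min_count).
    """
    if count <= min_count:
        return None
    weight = count / scale
    if weight > 5:
        return "black"   # strongest ties: count > 500
    if weight > 1:
        return "yellow"  # middle tier: 100 < count <= 500
    return "blue"        # weakest tier, drawn dashed: 20 < count <= 100
```

With the example values from the workflow, 侯亮平 and 钟小艾 (600) would be drawn in black, and 侯亮平 and 蔡成功 (344) in yellow.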

Results

Download password for the materials: liei

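Character relationship analysis is a typical, simple application of data analysis. There are many analyses of the characters in "In the Name of the People" online, but many of them simply paste the code without explaining it, what each file is for, or how the analysis works. This post revisits the topic, walks through it again step by step, and provides the complete materials for download. Happy learning!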