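Public Opinion Analysis and Text Mining of 《沉默的真相》 (The Long Night), Part 2: The Original Novel and JD.com Product Listings as Examples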

Latest recommended article published on 2024-01-19 11:06:24

嘤酱丶

Tags: network graph

Article link: https://blog.csdn.net/qq_43648033/article/details/121930258

5. Data analysis results for the original novel 《长夜难明》

5.1. Social network of the main characters

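The web drama 《沉默的真相》 is adapted from the novel 《长夜难明》, and its main characters are largely unchanged. There are still small differences: for example, "乐乐" in the novel is named "小树" in the drama. An alias dictionary is therefore rebuilt (see Table 5.1), and on that basis this article constructs the social network graph of the main characters in the novel. The full text of the novel is segmented with jieba.cut in accurate mode, and only the specified character names are kept. Whether two characters are connected is decided by whether they appear in the same paragraph: each co-occurrence adds 1 to the weight of the edge between them. The result is 28 nodes (character names with their occurrence counts) and 231 weighted relationship edges.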
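The paragraph-level co-occurrence rule described above can be sketched as follows. This is a minimal illustration, not the author's implementation: `build_network`, `ALIASES`, and the sample tokens are hypothetical, and the input is assumed to be already tokenized (for instance with `jieba.cut` after loading the names as a user dictionary). Counting each pair at most once per paragraph is also an assumption; the original code may weight repeated mentions differently.

```python
from collections import Counter
from itertools import combinations

# Hypothetical alias table: variant name -> canonical name.
ALIASES = {"小树": "乐乐"}


def build_network(paragraph_tokens, names, aliases=None):
    """Return (node_counts, edge_weights) for the tracked names.

    paragraph_tokens: iterable of token lists, one list per paragraph.
    names: set of canonical character names to keep.
    aliases: optional dict mapping variant names to canonical names.
    """
    aliases = aliases or {}
    node_counts = Counter()
    edge_weights = Counter()
    for tokens in paragraph_tokens:
        # Normalize aliases first, then keep only the tracked names.
        normalized = [aliases.get(tok, tok) for tok in tokens]
        found = [tok for tok in normalized if tok in names]
        node_counts.update(found)
        # Every unordered pair of distinct names in this paragraph adds 1.
        for a, b in combinations(sorted(set(found)), 2):
            edge_weights[(a, b)] += 1
    return node_counts, edge_weights


# Illustrative input only; real tokens would come from the novel text.
sample_paragraphs = [
    ["江阳", "和", "朱伟", "调查"],
    ["江阳", "见到", "严良", "江阳"],
    ["小树", "和", "江阳"],
]
sample_names = {"江阳", "朱伟", "严良", "乐乐"}
nodes, edges = build_network(sample_paragraphs, sample_names, ALIASES)
```

Edge keys are stored as sorted tuples so that (A, B) and (B, A) accumulate into the same undirected edge.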
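The final character graph is drawn in Gephi, an open-source visualization tool for complex networks that supports exploratory work such as link analysis, social network analysis, and biological network analysis. The data is first processed with Python into CSV files that Gephi can import, and then plotted. The layout used here is Fruchterman Reingold (the FR algorithm), a force-directed method proposed as a further refinement of the spring (elastic) model. The resulting character graph is shown in Figure 5.1.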
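As a rough sketch of the CSV step, the snippet below writes a node table and an edge table using the column headers that Gephi's spreadsheet import recognizes (`Id` and `Label` for nodes; `Source`, `Target`, `Type`, and `Weight` for edges). The function name `write_gephi_csv`, the extra node `Weight` column, and the sample counts are assumptions made for illustration, not the layout of the author's files.

```python
import csv
import io


def write_gephi_csv(node_counts, edge_weights, node_file, edge_file):
    """Write node and edge tables to two open text files.

    node_counts: dict mapping name -> occurrence count.
    edge_weights: dict mapping (name_a, name_b) -> co-occurrence count.
    """
    node_writer = csv.writer(node_file)
    node_writer.writerow(["Id", "Label", "Weight"])
    for name, count in node_counts.items():
        # The name doubles as the node id and its display label.
        node_writer.writerow([name, name, count])

    edge_writer = csv.writer(edge_file)
    edge_writer.writerow(["Source", "Target", "Type", "Weight"])
    for (a, b), weight in edge_weights.items():
        edge_writer.writerow([a, b, "Undirected", weight])


# Illustrative data only; in-memory buffers stand in for real files.
node_buf, edge_buf = io.StringIO(), io.StringIO()
write_gephi_csv({"江阳": 4, "朱伟": 1}, {("朱伟", "江阳"): 1}, node_buf, edge_buf)
```

When writing to disk, opening the files with `open(path, "w", newline="", encoding="utf-8")` avoids blank rows on Windows and keeps the Chinese names intact.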
[Figure 5.1: character relationship graph of the novel]
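For reference, the Fruchterman Reingold layout used for Figure 5.1 is usually summarized by the forces below. They are given here as the algorithm is commonly stated, not as taken from this article, and Gephi's implementation may differ in detail. Here $d$ is the distance between two nodes, $A$ the area of the drawing frame, $|V|$ the number of nodes, and $C$ a tuning constant.

```latex
k = C\sqrt{\frac{A}{|V|}}, \qquad
f_a(d) = \frac{d^{2}}{k}, \qquad
f_r(d) = -\frac{k^{2}}{d}
```

Connected nodes attract each other with $f_a$, every pair of nodes repels with $f_r$, and a gradually decreasing "temperature" limits how far a node may move in each iteration, so the layout settles with strongly linked characters close together.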

import os
import time
import jieba
import codecs
import chardet
import jieba.posseg as pseg

content_path= r"D:\Learning\LPython\bigDataClass_2020Fall\paper_TheLongNight\Character-interaction-visualization-master\长夜难明.txt"
dict_path = r"D:\Learning\LPython\bigDataClass_2020Fall\paper_TheLongNight\Character-interaction-visualization-master\人名.txt"

'''
The core logic lives in self.gephi_node_name and self.gephi_edge, which find and tally the characters.
gephi_config.gephi_node_name counts how often each character appears and builds the _lineNames list that gephi_config.gephi_edge needs.

'''
class gephi_config:
    def __init__(self,content_path,dict_path,name_mode = True):
        start = time.time()

        # Directory that contains this .py file
        self._src = os.path.split(os.path.realpath(__file__))[0]
        
        self.local_file_path()
        # Build the names of the output files
        now = self.get_time()
        dirname = os.path.dirname(self._content_path)   
        content_name = os.path.splitext(os.path.split(self._content_path)[1])[0]
        self._dst_node = os.path.join(dirname,(content_name +'gephi_node-' + now +'.csv'))
        self._dst_edge = os.path.join(dirname,(content_name +'gephi_edge-' + now +'.csv'))

        print('Generated node file: {} \nGenerated edge file: {}'.format(self._dst_node[-19:],self._dst_edge[-19:]))
        
        # Initialize state, lists, and dictionaries
        self._name_mode = name_mode # which mode to use
        self._lineNames= []         # name lookup list
        self._names = {}            # name dictionary
        self._relationships = {}    # relationship dictionary
      
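        # Read the name list from self._dict_path and build the dictionary text jieba needs
        # (one "name 10 nr" entry per line: word, frequency, part-of-speech tag).
        with codecs.open(self._dict_path,'r',self.content_coding(self._dict_path)) as f:
            self._name_list = f.read().split('\r\n')
            # Work on a copy so the suffix added below does not alter self._name_list
            list_name = list(self._name_list)
            list_name[-1] = list_name[-1] + ' 10 nr\r\n'
            self._jieba_dict = ' 10 nr\r\n'.join(list_name)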

        # Run the processing pipeline
        self.do_first()
        # Measure the elapsed time
        end = time.time()
        usetime = end - start