Analyzing character relationships in Jin Yong's novels with Python and Gephi
Reference article: https://blog.csdn.net/weixin_39768541/article/details/84958298
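1. Model construction
Two characters are treated as related when they appear in adjacent paragraphs.
(This approach captures character relationships to some extent, but it also has significant shortcomings that can be addressed later.)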
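The adjacent-paragraph rule can be sketched as follows. This is a minimal illustration, not the code used later in this post. It assumes that "adjacent" means a sliding window of two consecutive paragraphs, and that names are found by plain substring matching rather than by word segmentation. The function name cooccurrence_edges and the sample paragraphs are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(paragraphs, names):
    """Count name pairs that appear within two adjacent paragraphs."""
    # Names found in each paragraph. Plain substring matching is an
    # assumption made here to keep the sketch self-contained.
    found = [{n for n in names if n in p} for p in paragraphs]
    edges = Counter()
    # Slide a window over each pair of consecutive paragraphs.
    for current, following in zip(found, found[1:]):
        window = current | following
        # Sort so that (a, b) and (b, a) map to the same undirected edge.
        for a, b in combinations(sorted(window), 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical sample paragraphs, for illustration only.
paragraphs = [
    "乔峰走进了酒楼。",
    "段誉正在饮酒。",
    "虚竹独自下山。",
]
names = {"乔峰", "段誉", "虚竹"}
edges = cooccurrence_edges(paragraphs, names)
```

Under this windowing, two names that share a single paragraph are counted once for every window containing that paragraph, so the edge weight is a rough measure of closeness and not an exact count of meetings.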
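2. Collecting the characters
Fetch the names of all characters in the novels from the Jin Yong novel site (jinyongwang.com).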
import requests
from bs4 import BeautifulSoup
import re
import jieba
from collections import Counter
import csv
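# Download the character index page
respond = requests.get("http://www.jinyongwang.com/data/renwu/")
html = respond.text
soup = BeautifulSoup(html, 'lxml')
# Select every element with class "datapice"
OringialPath = soup.find_all(class_="datapice")
# Capture the value of each alt="..." attribute, which holds a character name
pattern = re.compile('(?<=alt=").*?(?=")')
name_list = re.findall(pattern, str(OringialPath))
# Remove duplicate names
name_set = set(name_list)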
The names of all characters in Jin Yong's novels have now been collected.
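To see what the lookbehind/lookahead pattern extracts, it can be tried on a small hand-written fragment. The markup below is a hypothetical imitation of the index page, and the real page may be structured differently. Only the regular expression is taken from the code above.

```python
import re

# Hypothetical fragment imitating the character index page; the real
# markup may differ.
sample_html = (
    '<p class="datapice"><a href="/data/renwu/1.htm">'
    '<img src="/a.jpg" alt="乔峰"></a>'
    '<a href="/data/renwu/2.htm"><img src="/b.jpg" alt="段誉"></a>'
    '<a href="/data/renwu/3.htm"><img src="/c.jpg" alt="乔峰"></a></p>'
)

# (?<=alt=") requires the match to start right after alt=",
# .*? takes as few characters as possible,
# (?=") stops the match just before the closing quote.
pattern = re.compile('(?<=alt=").*?(?=")')
name_list = pattern.findall(sample_html)

# A name can be listed more than once, so deduplicate with a set.
name_set = set(name_list)
```

The href and src attributes are ignored because the lookbehind only accepts text that directly follows alt=".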
3. Collecting the text
Use a crawler to fetch each novel, split it by paragraph, and store the paragraphs in a list.
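# Short names of the books, as they appear in the site's URLs
article = ["fei","xue","lian","tian","she","bai","lu","xiao","shu","shen","xia","yi","bi","yuan","yue"]
# This second assignment overrides the full list, so only one book is crawled
# (presumably to keep test runs short)
article = ["xiao"]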
URL_Base = "http://www.jinyongwang.com/"
URL = [URL_Base + name + "/" for name in article]
print(URL)
data = []
for u in URL:
    print(u)
    book = u.split("/")[-2]
    print(book)
    listpath = []
    try:
        respond = reque