Word Cloud of Character Names in Romance of the Three Kingdoms (三国演义)

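Design a program that reads the full text of Romance of the Three Kingdoms from the file “三国演义.txt”, merges the different names used for the same character, lists the 10 to 20 most frequent names, and renders them as a word cloud.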

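import jieba  # third-party library for Chinese word segmentation
import wordcloud
from matplotlib import pyplot

# Mask image that sets the shape of the word cloud
mk = pyplot.imread('caochao.jpg')
# Read the full text; the with block closes the file afterwards
with open('三国演义.txt', 'r', encoding='utf-8') as f:
    txt = f.read()
# Exclude words that are not person names but rank high in frequency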
excludes = {"将军", "却说", "荆州", "二人", "不可", "不能", "如此", "商议", "如何", "主公", "军士", "左右", "军马", "引兵", "次日", "大喜", "天下", "东吴",
            "于是", "今日", "不敢", "魏兵", "陛下", "一人", "都督", "人马", "不知", "汉中", "只见", "众将", "后主", "蜀兵", "上马", "大叫", "太守", "此人",
            "夫人", "先主", "后人", "背后", "城中", "天子", "一面", "何不", "大军", "忽报", "先生", "百姓", "何故", "然后", "先锋", "不如", "赶来", "原来",
            "令人", "江东", "下马", "喊声", "正是", "徐州", "忽然", "因此", "成都", "不见", "未知", "大败", "大事", "之后", "一军", "引军", "起兵", "军中",
            "接应", "进兵", "大惊", "可以", "以为", "大怒", "不得", "心中", "下文", "一声", "追赶", "粮草", "曹兵", "一齐", "分解", "回报", "分付", "只得",
            "出马", "三千", "大将", "许都", "随后", "报知", "前面", "之兵", "且说", "众官", "洛阳", "领兵", "何人", "星夜", "精兵", "城上", "之计", "不肯",
            "相见", "其言", "一日", "而行", "文武", "襄阳", "准备", "若何", "出战", "亲自", "必有", "此事", "军师", "之中", "伏兵", "祁山", "乘势", "忽见",
            "大笑", "樊城", "兄弟", "首级", "立于", "西川", "朝廷", "三军", "大王", "传令", "当先", "五百", "一彪", "坚守", "此时", "之间", "投降", "五千",
            "埋伏", "长安", "三路", "遣使", "英雄","回见","大将军","是夜","小路","望见","无不","有人","马下","必然","将士","甘宁","下寨","杀出","诸葛","中原",
            "屯兵","邓艾","蛮兵","之意","城下","前来","武士","城外","出迎","本部","两路","一阵","连夜","四面","奔走","交锋","冀州","细作","使者","江南","杀来",
            "人报","而出","心腹","何处","皇叔","众人","当日","吴兵","兴兵","何以","如之奈何","先帝","江夏","前进","国家","城门","杀入","两军","来到","厮杀","两个","拜谢",
            "岂可","慌忙","饮酒","为首","性命","进发","谋士","此言"}


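# Precise mode: cut the text into words with no redundant pieces; returns a list
words = jieba.lcut(txt)
# Dictionary mapping each word to the number of times it occurs
counts = {}
# Walk through every element of words
for word in words:
    # Skip single characters, then merge the different names of one character
    if len(word) == 1:
        continue
    elif word == "诸葛亮" or word == "孔明曰":
        rword = "孔明"
    elif word == "关公" or word == "云长":
        rword = "关羽"
    elif word == "玄德" or word == "玄德曰":
        rword = "刘备"
    elif word == "孟德" or word == "丞相":
        # 丞相 (chancellor) is counted as 曹操 here, though the title is also used for 诸葛亮
        rword = "曹操"
    else:
        rword = word
    # Add one to the word's count, starting from 0 if it has not been seen yet
    counts[rword] = counts.get(rword, 0) + 1
# Remove the excluded words; pop with a default avoids a KeyError
# for excluded words that never occur in the text
for word in excludes:
    counts.pop(word, None)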
# Convert the dictionary to a list of (word, count) pairs so it can be sorted
items = list(counts.items())
# Sort by the second element of each pair, the count
# reverse=True sorts from largest to smallest, so items[0] is the most frequent word
items.sort(key=lambda x: x[1], reverse=True)
# Print the 40 most frequent words with their counts
name = []
times = []
for word, count in items[:40]:
    print("{0:<10}{1:>5}".format(word, count))
    name.append(word)
    times.append(count)
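# Word cloud
w = wordcloud.WordCloud(
    font_path='songti.TTF',  # font file; a Chinese font is needed to render the names
    background_color="white",  # background color of the word cloud
    max_words=1000,  # maximum number of words in the cloud
    max_font_size=100,  # largest font size
    random_state=50,  # seed for the random layout and colors, so the result is reproducible
    mask=mk  # shape image; pure white areas are left empty
)
# Size each name by its count
w.generate_from_frequencies(dict(zip(name, times)))
w.to_file("ciyun.png")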

The word cloud may take different shapes; the mask image determines the shape.
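
The chain of `elif` branches that merges aliases can also be written as a lookup table, which is easier to extend when more aliases turn up. The following is a minimal sketch, not the author's method: `ALIASES` and `count_names` are hypothetical names introduced here, and the token list is hand-made to stand in for the output of `jieba.lcut(txt)`.

```python
from collections import Counter

# Hypothetical alias table: each alternative name maps to one canonical name.
# The pairs are the ones merged by the elif branches in the program above.
ALIASES = {
    "诸葛亮": "孔明", "孔明曰": "孔明",
    "关公": "关羽", "云长": "关羽",
    "玄德": "刘备", "玄德曰": "刘备",
    "孟德": "曹操", "丞相": "曹操",
}


def count_names(words, aliases, excludes):
    """Count multi-character words after merging aliases and dropping excluded words."""
    counts = Counter()
    for word in words:
        if len(word) == 1:
            continue  # single characters are skipped, as in the loop above
        name = aliases.get(word, word)  # fall back to the word itself
        if name not in excludes:
            counts[name] += 1
    return counts


# Hand-made tokens for illustration only; real tokens come from jieba.lcut(txt).
sample_words = ["玄德", "曰", "孔明", "诸葛亮", "将军", "云长", "关公", "刘备", "玄德曰"]
sample_counts = count_names(sample_words, ALIASES, {"将军"})
print(sample_counts.most_common())
```

Filtering excluded words inside the loop means no deletion step is needed afterwards, so a word that never occurs in the text cannot raise an error.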

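This is a fairly common NLP task, and we can complete it in the following steps:

1. Read the text file of Romance of the Three Kingdoms.
2. Extract the person names from the text, using regular expressions or another method; the program below uses jieba's part-of-speech tagging.
3. Deduplicate the names, count their frequencies, and find the 10 to 20 most frequent names.
4. Generate a word cloud from the name frequencies.

Below is a Python program that carries out these steps:

```python
from collections import Counter

import jieba.posseg as pseg
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image
from wordcloud import WordCloud, ImageColorGenerator

# Read the text file
with open('三国演义.txt', 'r', encoding='utf-8') as f:
    text = f.read()

# Extract person names: jieba's part-of-speech tagger marks them with flags starting with 'nr'
names = [pair.word for pair in pseg.cut(text)
         if pair.flag.startswith('nr') and len(pair.word) >= 2]

# Deduplicate and count frequencies in one step
freq = Counter(names)

# Find and list the 20 most frequent names
top_names = freq.most_common(20)
for name, count in top_names:
    print(name, count)

# Generate the word cloud; the mask is converted to RGB so it can also supply the colors
mask = np.array(Image.open('cloud.png').convert('RGB'))
wc = WordCloud(font_path='msyh.ttc', background_color='white', max_words=2000, mask=mask)
wc.generate_from_frequencies(freq)
image_colors = ImageColorGenerator(mask)
wc.recolor(color_func=image_colors)
plt.imshow(wc, interpolation='bilinear')
plt.axis('off')
plt.show()
```

Here `cloud.png` is the image that gives the word cloud its shape; replace it as needed. `msyh.ttc` is the font file; if it is not available, any other Chinese font will do.

Running the program generates the name word cloud for Romance of the Three Kingdoms and lists the 20 most frequent names.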
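
Step 3 above, picking the most frequent names and listing them, can be sketched with the standard library alone. This is an illustration under stated assumptions rather than part of the program: `select_top` and `format_table` are hypothetical helpers, and the counts are made up instead of being taken from the novel.

```python
def select_top(counts, n=20):
    """Return the n most frequent (name, count) pairs, most frequent first."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)[:n]


def format_table(pairs):
    """Format (name, count) pairs as aligned lines, one per name."""
    return ["{0:<10}{1:>5}".format(name, count) for name, count in pairs]


# Made-up counts for illustration only; real values come from the novel's text.
sample_freq = {"曹操": 5, "孔明": 4, "刘备": 3, "关羽": 2, "张飞": 1}
top = select_top(sample_freq, n=3)
for line in format_table(top):
    print(line)

# Only the selected names are kept, e.g. as input to WordCloud.generate_from_frequencies.
cloud_freq = dict(top)
```

Passing `cloud_freq` rather than the full dictionary keeps the cloud limited to the names that appear in the printed list.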
