The keywords and high-frequency terms in the report of the 20th CPC National Congress are very helpful for study. Here we try to build a simple word-segmentation and word-cloud system for the report using three libraries:
Word segmentation: jieba
Word cloud: wordcloud
Image processing: imageio
Using online resources, prepare a txt file of the report and a white-background map of China. Then choose the image size, font, background color, and mask, and remove single characters that carry little meaning, such as "的", "了", "是", and "又".
The source code is as follows:
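The stopword step is, at its core, just filtering tokens and counting frequencies. A minimal pure-Python sketch of that idea, using a hypothetical toy token list standing in for jieba's output:

```python
from collections import Counter

# Toy token list standing in for jieba.lcut() output (hypothetical example data)
tokens = ["发展", "的", "发展", "是", "创新", "发展", "了", "创新"]
stopwords = {"的", "是", "了"}  # single characters with little meaning

# Drop stopwords, then count word frequencies
counts = Counter(t for t in tokens if t not in stopwords)
print(counts.most_common(2))  # -> [('发展', 3), ('创新', 2)]
```

WordCloud performs the same kind of filtering internally through its stopwords parameter.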
# Word-frequency statistics and word-cloud generation
import wordcloud as wc
import jieba
import imageio.v2 as img

with open("20d报告.txt", encoding="utf-8") as f:  # read the report text
    s = f.read()
ls = jieba.lcut(s)                 # segment the text into a list of words
text = ' '.join(ls)                # join the words into a space-separated string
mask = img.imread('chinamap.jpg')  # white regions of the mask stay blank
stopwords = {"的", "地", "是", "了", "不", "为", "在", "既", "但", "有",
             "又", "还", "并", "和", "就", "都", "这"}
w = wc.WordCloud(font_path="msyh.ttc",       # a Chinese font is required
                 mask=mask,
                 width=1000,                 # ignored when mask is set;
                 height=700,                 # the mask determines the size
                 background_color='white',
                 max_words=100,
                 stopwords=stopwords).generate(text)
w.to_file('20大.png')              # save the word-cloud image
The generated word cloud: