Key Steps for Basic wordcloud Usage

寻找自由的咸鱼

Categories: Tool usage, python

Article link: https://blog.csdn.net/qq_36206070/article/details/106414365

Required libraries

Only two are really needed: wordcloud itself, and matplotlib.pyplot (imported as plt) for drawing.

from wordcloud import WordCloud
import matplotlib.pyplot as plt

Input text

with open('constitution.txt', encoding='utf-8') as f:
    text = f.read()
# The text should be a collection of words separated by spaces

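Calling the API to generate the word cloud

wordcloud = WordCloud()  # a WordCloud instance with default settings
wordcloud.generate(text)  # generate the word cloud; internally the call proceeds as follows
generate(self, text)
=>
self.generate_from_text(text)
=>
words = self.process_text(text)  # preprocessing and word counting
self.generate_from_frequencies(words)  # build the word cloud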

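Preprocessing details: process_text

Tokenize, keeping word characters and dropping single-character tokens
Remove stopwords
Strip the trailing s (plural normalization)
Remove tokens that are pure numbers
Count unigram and bigram frequencies

The result is a dict(string, int) that maps each token to the number of times it occurs.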
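
The preprocessing steps above can be sketched with the standard library alone. This is a simplified illustration, not wordcloud's actual implementation: simple_process_text and DEMO_STOPWORDS are hypothetical names, everything is lowercased for simplicity, the trailing-s rule only merges a plural into a singular that also occurs in the text, and bigram counting is left out.

```python
import re
from collections import Counter

# Hypothetical, tiny stopword set for illustration only;
# wordcloud ships a much larger STOPWORDS set.
DEMO_STOPWORDS = {"the", "a", "and", "of"}

def simple_process_text(text, stopwords=DEMO_STOPWORDS):
    # Tokenize: one word character followed by at least one more word
    # character or apostrophe, so single-character tokens never match.
    tokens = re.findall(r"\w[\w']+", text)
    # Drop stopwords (case-insensitive) and tokens made only of digits.
    tokens = [t for t in tokens if t.lower() not in stopwords and not t.isdigit()]
    counts = Counter(t.lower() for t in tokens)
    # Simplified plural handling: fold "cats" into "cat" when both occur.
    for word in list(counts):
        singular = word[:-1]
        if word.endswith("s") and not word.endswith("ss") and singular in counts:
            counts[singular] += counts.pop(word)
    return dict(counts)

freqs = simple_process_text("The cats and a cat chased 3 dogs of 2020")
print(freqs)  # {'cat': 2, 'chased': 1, 'dogs': 1}
```

In this toy input "dogs" keeps its s because "dog" never appears on its own, which is the simplification chosen for this sketch.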

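Generating the word cloud: generate_from_frequencies

Sort the word counts and normalize them to the range 0 to 1 (dividing by the largest count) to obtain the word frequencies
Create the image and determine the initial font_size
Assign self.words_, which records the max_words most frequent words together with their normalized frequencies, i.e. dict(token, normalized_frequency)
Draw the grayscale image: the higher the frequency, the larger the font_size. A random number decides whether a word is placed horizontally or vertically: if it is less than self.prefer_horizontal the word is horizontal, otherwise vertical. If there is not enough space, first try rotating the word, then try shrinking the font
Assign self.layout_, which records each word and its frequency, font size, position, orientation, and color, i.e. list(zip(frequencies, font_sizes, positions, orientations, colors))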
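
The normalization and orientation steps described above can be illustrated with a small standard-library sketch. It is not wordcloud's own code: normalize_frequencies and choose_orientation are hypothetical helper names, and the default values for max_words and prefer_horizontal are assumptions chosen for the example. Font sizing and placement on the image are not modeled.

```python
import random

def normalize_frequencies(counts, max_words=200):
    if not counts:
        raise ValueError("at least one word is required")
    # Sort by count, largest first, and keep only the top max_words entries.
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)[:max_words]
    # Divide by the largest count so the most frequent word gets 1.0.
    max_count = ranked[0][1]
    return {word: count / max_count for word, count in ranked}

def choose_orientation(rng, prefer_horizontal=0.9):
    # A random draw below prefer_horizontal means horizontal, otherwise vertical.
    return "horizontal" if rng.random() < prefer_horizontal else "vertical"

words = normalize_frequencies({"cat": 4, "dog": 2, "bird": 1}, max_words=2)
print(words)  # {'cat': 1.0, 'dog': 0.5}

rng = random.Random(30)  # fixed seed, so repeated runs give the same choices
print([choose_orientation(rng) for _ in range(5)])
```

Seeding the generator is what makes the orientation choices repeatable from run to run, which is the role random_state plays when constructing a WordCloud.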

Finally, render the result as an image

wordcloud.to_file(filename)  # save to a file
plt.imshow(wordcloud)  # draw it directly

The complete code is as follows

from wordcloud import WordCloud, STOPWORDS
import matplotlib.pyplot as plt
# Custom stopwords; add your own as needed
words = ["abstr_number", "abstr_image", "abstr_code_section", "https", "github"]
stopwords = set(STOPWORDS)
for word in words:
    stopwords.add(word)  # add each custom stopword
# Replace 'filename' with the path to your own text file
with open('filename', 'r', encoding='utf-8') as f:
    strr = f.read()
wordcloud = WordCloud(
    random_state=30,  # fixed random seed, so the layout and colors are reproducible
    background_color="white",
    width=1500,
    height=960,
    margin=5,
    stopwords=stopwords
).generate(strr)
plt.imshow(wordcloud)
plt.axis("off")
plt.show()

The following two approaches are equivalent (both reuse stopwords and strr from the complete code above)

Method 1:

wordcloud = WordCloud(
#    mask=background_image,  # mask image that shapes the cloud
    random_state=30,  # fixed random seed
    background_color="white",
    width=1500,
    height=960,
    margin=5,
    stopwords=stopwords
#    , colormap='Blues'
).generate(strr)
plt.imshow(wordcloud)

Method 2:

wordcloud = WordCloud(
#    mask=background_image,  # mask image that shapes the cloud
    random_state=30,  # fixed random seed
    background_color="white",
    width=1500,
    height=960,
    margin=5,
    stopwords=stopwords
#    , colormap='Blues'
)
process_word = wordcloud.process_text(strr)  # dict mapping token -> count
ans = wordcloud.generate_from_frequencies(process_word)  # returns the same WordCloud object
plt.imshow(ans)
