Python Word Clouds with wordcloud

Latest recommended article published 2022-05-09 11:03:53

梦寐_

Article link: https://blog.csdn.net/HHG20171226/article/details/103103353

Generating a word cloud quickly

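from wordcloud import WordCloud
import matplotlib.pyplot as plt

# Placeholder path: point this at your own text file
filename = "your_text.txt"

# Read the whole text (assumes the file is UTF-8 encoded)
with open(filename, encoding="utf-8") as fh:
    text = fh.read()

# Build the word cloud from the raw text
wordcloud = WordCloud(background_color="white", width=1000, height=860, margin=2).generate(text)

# Display the image
plt.imshow(wordcloud)
plt.axis("off")
plt.show()

# Export the image to a file
wordcloud.to_file('test.png')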

Parameters:

width, height, and margin set properties of the output image.
background_color sets the background color; the default is black.
font_path selects the font file, for example font_path = r'D:\Fonts\simkai.ttf'.

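Methods:

generate() tokenizes the whole text automatically, but it handles Chinese poorly because Chinese text has no spaces between words.
to_file() exports the image to a file.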
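
To see why generate() struggles with Chinese, it helps to look at the tokenization step on its own. The sketch below assumes the library's default token pattern is r"\w[\w']+" (the value documented for the regexp parameter when it is None) and applies it with the standard re module. The tokenize helper and the two sample sentences are made up for illustration and are not part of wordcloud.

```python
import re

# Assumed default token pattern of wordcloud (used when regexp is None)
DEFAULT_PATTERN = r"\w[\w']+"


def tokenize(text, pattern=DEFAULT_PATTERN):
    """Split text into tokens the way a regex-based tokenizer would."""
    return re.findall(pattern, text)


english = "The quick brown fox"
chinese = "我喜欢用Python生成词云，词云很有趣"

# English words are separated by spaces, so each word becomes its own token
print(tokenize(english))
# Chinese has no spaces, so each run between punctuation marks becomes one "word"
print(tokenize(chinese))
```

Segmenting the Chinese text first and joining the resulting words with spaces gives the same pattern something it can split on.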

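font_path : string  //Path to the font file to use, including the file extension. Choosing a font that contains Chinese glyphs fixes garbled Chinese output, e.g. font_path = '黑体.ttf'

width : int (default=400)  //Width of the output canvas, 400 pixels by default

height : int (default=200)  //Height of the output canvas, 200 pixels by default

prefer_horizontal : float (default=0.90) //Share of words laid out horizontally, 0.9 by default (so roughly 0.1 are laid out vertically)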

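mask : nd-array or None (default=None) //If not None, a binary mask that defines where words may be drawn. When mask is set, width and height are ignored and the shape of the mask is used instead. Pure white (#FFFFFF) areas are masked out and never drawn on; all other areas are available for words. Example: bg_pic = np.array(Image.open('mask.png')). The background of the mask image must be pure white (#FFFFFF), and the shape to fill must be in any non-white color. One way to prepare it is to paste the shape onto a pure white canvas in an image editor such as Photoshop and save the result.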

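scale : float (default=1) //Scales the canvas proportionally; with 1.5, both width and height are 1.5 times those of the original canvas.

min_font_size : int (default=4) //Smallest font size to use

font_step : int (default=1) //Step size for the font. A step greater than 1 speeds up the computation but may give a noticeably worse fit.

max_words : number (default=200) //Maximum number of words to display

stopwords : set of strings or None //Words to exclude. If None, the built-in STOPWORDS set is used.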

background_color : color value (default="black") //Background color, e.g. background_color='white' gives a white background.

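max_font_size : int or None (default=None) //Largest font size to use

mode : string (default="RGB") //When mode is "RGBA" and background_color is None, the background is transparent.

relative_scaling : float (default=.5) //How strongly font size depends on word frequency. With 0 only the rank of a word matters; with 1 a word that is twice as frequent is drawn twice as large. Recent releases use 'auto' as the default.

color_func : callable, default=None //Function that returns a color for each word. If None, colors are drawn from colormap.

regexp : string or None (optional) //Regular expression used to split the input text into tokens. If None, r"\w[\w']+" is used.

collocations : bool, default=True //Whether to include collocations (bigrams) of two words

colormap : string or matplotlib colormap, default="viridis" //Matplotlib colormap from which a color is drawn at random for each word. Ignored if color_func is specified.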

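fit_words(frequencies) //Generate a word cloud from word frequencies

generate(text) //Generate a word cloud from text

generate_from_frequencies(frequencies[, ...]) //Generate a word cloud from word frequencies

generate_from_text(text) //Generate a word cloud from text

process_text(text) //Split a long text into words and remove stopwords. This works for English; Chinese text must be segmented first with another library and then passed in as frequencies via fit_words(frequencies) above.

recolor([random_state, color_func, colormap]) //Recolor the existing output. Recoloring is much faster than regenerating the whole word cloud.

to_array() //Convert to a numpy array

to_file(filename) //Export to an image file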
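
The mask parameter listed above can be tried without an image editor, because a mask is just an array in which 255 (pure white) marks cells that must stay empty. The following is a minimal sketch under stated assumptions: make_circle_mask and render_with_mask are hypothetical helper names, not part of wordcloud, and the output file name is arbitrary. The mask is built with plain Python lists and converted to a NumPy array only at render time.

```python
def make_circle_mask(size, radius):
    """Return a size x size grid: 0 inside the circle (drawable), 255 outside (masked out)."""
    center = (size - 1) / 2
    rows = []
    for y in range(size):
        row = []
        for x in range(size):
            inside = (x - center) ** 2 + (y - center) ** 2 <= radius ** 2
            row.append(0 if inside else 255)
        rows.append(row)
    return rows


def render_with_mask(text, mask_rows, out_path="circle.png"):
    """Draw a word cloud confined to the non-white cells of mask_rows."""
    # Imported here so that the mask helper above works without these packages
    import numpy as np
    from wordcloud import WordCloud

    mask = np.array(mask_rows, dtype=np.uint8)
    wc = WordCloud(background_color="white", mask=mask).generate(text)
    wc.to_file(out_path)
    return wc
```

Calling render_with_mask(text, make_circle_mask(300, 130)) with any English text would write a round word cloud to circle.png; width and height are not passed because the mask shape replaces them.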

Customizing font colors

Generating a word cloud from a background image and setting a stopword set

from PIL import Image
import numpy as np
import matplotlib.pyplot as plt

from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator

# Read the whole text (assumes the file is UTF-8 encoded).
with open('yes-minister.txt', encoding='utf-8') as fh:
    text = fh.read()

# read the mask / color image taken from
# http://jirkavinse.deviantart.com/art/quot-Real-Life-quot-Alice-282261010
alice_coloring = np.array(Image.open('iconfinder_Spider.png'))

# Set up the stopwords
stopwords = set(STOPWORDS)
stopwords.add("said")

# The mask parameter sets the shape of the word cloud
wc = WordCloud(background_color="white",
               max_words=2000,
               mask=alice_coloring,
               stopwords=stopwords,
               max_font_size=40,
               random_state=42)
# Generate the word cloud
wc.generate(text)

# Create a coloring function from the image
image_colors = ImageColorGenerator(alice_coloring)

# Show the result.
# With only mask set, the word cloud takes the shape of the image
plt.imshow(wc, interpolation="bilinear")
plt.axis("off")
plt.figure()
# Recolor the word cloud and show it.
# We could also pass color_func=image_colors directly to the constructor.
# Either way, each word takes its color from the matching region of the image.
plt.imshow(wc.recolor(color_func=image_colors), interpolation="bilinear")
plt.axis("off")
plt.show()
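
ImageColorGenerator is only one possible color_func; any callable that accepts the same arguments can be passed to recolor() or to the constructor. Here is a small sketch in which larger words are drawn in a darker grey. The name size_to_grey and the 20 to 80 lightness range are illustrative choices, not library defaults, and the function returns a color string in the "hsl(...)" form that Pillow understands.

```python
def size_to_grey(word, font_size, position, orientation, random_state=None, **kwargs):
    """Map font size to a grey level: bigger words get a darker shade."""
    # Clamp lightness to 20-80 so no word is pure black or too close to a white background
    lightness = max(20, min(80, 100 - int(font_size)))
    return f"hsl(0, 0%, {lightness}%)"


# Standalone check of the mapping (no word cloud needed)
for size in (10, 40, 90):
    print(size, size_to_grey("example", size, (0, 0), None))
```

With a WordCloud object such as wc from the example above, wc.recolor(color_func=size_to_grey) would apply this coloring without recomputing the layout.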

Supplying word frequencies

from wordcloud import WordCloud
import matplotlib.pyplot as plt

# Map each word to its relative frequency
frequencies = {'知乎': 0.1, '小段同学': 0.4, '曲小花': 0.3, '中文分词': 0.1, '样例': 0.1}
# A font with Chinese glyphs is needed, otherwise the words render as empty boxes
wordcloud = WordCloud(font_path="STSONG.TTF").fit_words(frequencies)

plt.imshow(wordcloud)
plt.axis("off")
plt.show()
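
For Chinese text, the frequencies usually come from a separate segmentation step rather than being typed by hand. A minimal sketch of the counting part follows, assuming the text has already been split into tokens by some segmenter; the token list here is hand-written sample data. build_frequencies is a hypothetical helper, and its result is a plain dict of word to count that can be handed to fit_words() or generate_from_frequencies().

```python
from collections import Counter


def build_frequencies(tokens, stopwords=(), min_length=2):
    """Count tokens, skipping stopwords and tokens shorter than min_length."""
    blocked = set(stopwords)
    counts = Counter(
        tok for tok in tokens
        if len(tok) >= min_length and tok not in blocked
    )
    return dict(counts)


# Hand-written sample tokens standing in for a segmenter's output
tokens = ["词云", "的", "中文分词", "词云", "样例", "的", "词云"]
frequencies = build_frequencies(tokens, stopwords={"样例"})
print(frequencies)
```

Raw counts are fine as input: only the relative sizes of the values matter when the words are scaled.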