Word clouds in Python
Installing the required packages
First, install three packages with pip:
pip install matplotlib
pip install jieba
pip install wordcloud
Installation example (screenshot):
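Data file
With the Python environment installed, we still need data.
The input to word-cloud analysis is text. A word cloud can be drawn from a dictionary or from a .txt file, and a CSV file can also be converted into a dictionary that maps each frequent word to its weight, which is then drawn as a word cloud.
For generating word clouds from different file types, see this blog post: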
https://blog.csdn.net/weixin_43746433/article/details/89856014
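The CSV-to-dictionary route mentioned above can be sketched with the standard library alone. This is a minimal sketch, assuming a CSV with hypothetical columns named word and count; the function name csv_to_frequencies is also hypothetical. The resulting {word: weight} dict is the kind of input WordCloud.generate_from_frequencies takes.

```python
import csv
import io


def csv_to_frequencies(csv_text, word_col="word", weight_col="count"):
    """Turn CSV rows into a {word: weight} dict (hypothetical helper).

    The column names are assumptions; adjust them to your file.
    """
    freqs = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        word = row[word_col].strip()
        if not word:
            continue  # skip rows with an empty word cell
        # Accumulate, in case the same word appears on several rows
        freqs[word] = freqs.get(word, 0.0) + float(row[weight_col])
    return freqs


sample = """word,count
python,12
shell,3
python,2
cool,5
"""
freqs = csv_to_frequencies(sample)
print(freqs)  # {'python': 14.0, 'shell': 3.0, 'cool': 5.0}

# With wordcloud installed, the dict can then be drawn directly:
# from wordcloud import WordCloud
# WordCloud().generate_from_frequencies(freqs)
```

For a real file, read its contents first (or pass the open file object to csv.DictReader instead of io.StringIO).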
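In principle the text can be in any language: English, Chinese, French, Arabic, and so on.
For simplicity, we start with English text. Here we use a short English passage as the example.
The text is shown below; save it in a file named "python.txt".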
python python3 is good well bestbast shell cool
Age has reached the end of the beginning of a word. May be guilty in his seems to passing a lot of different life became the appearance of the same day; May be backto oneself the paranoid weird belief disillusionment, these days, my mind has been very messy, in my mind constantly. Always feel oneself should go to do something, or write something. Twenty years of life trajectory deeply shallow, suddenly feel something, do it.The end of our life, and can meet many things really do?During myhood, think lucky money and new clothes are necessary for New Year, but as the advance of the age, will be more and more found that those things are optional; Junior high school, thought to have a crush on just means that the real growth, but over the past three years later, his writing of alumni in peace, suddenly found that isn’t really grow up, it seems is not so important; Then in high school, think don’t want to give vent to out your inner voice can be in the high school children of the feelings in a period, but was eventually infarction when graduation party in the throat, later again stood on the pitch he has sweat profusely, looked at his thrown a basketball hoops, suddenly found himself has already can’t remember his appearance.
Making a word cloud from English text
Reading the file
filename = "python.txt"
with open(filename, encoding="utf-8") as f:
    mytext = f.read()
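The program opens the text file python.txt, reads its entire contents, and stores them in a variable named mytext.
Next, display the contents of mytext (for example with print(mytext)) to confirm the file was read correctly. Do not skip this check in the later steps either.
The output is shown in the figure below.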
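When a file does not read correctly, the cause is often its encoding. A minimal sketch of a more forgiving reader, assuming the file is either UTF-8 (with or without a BOM) or GBK, a common encoding for Chinese .txt files saved on Windows; the helper name read_text and the encoding list are hypothetical choices, not part of any library.

```python
def read_text(filename, encodings=("utf-8-sig", "gbk")):
    """Read a whole text file, trying each encoding in turn (hypothetical helper).

    "utf-8-sig" decodes UTF-8 with or without a byte-order mark;
    "gbk" is an assumed fallback for Chinese files saved on Windows.
    """
    for enc in encodings:
        try:
            with open(filename, encoding=enc) as f:
                return f.read()
        except UnicodeDecodeError:
            continue  # this encoding did not fit; try the next one
    raise ValueError(f"could not decode {filename!r} with any of {encodings}")
```

Usage: mytext = read_text("python.txt"), then print(mytext) as the check described above.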
Drawing the word cloud
from wordcloud import WordCloud
wordcloud = WordCloud().generate(mytext)
print(wordcloud)
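At this point the word-cloud analysis is already done. Yes, the core of making a word cloud is just the first two lines above, and the first of those only imports the helper from the extension package. However, the program does not display anything yet: print(wordcloud) only prints a description of the object. To see the image, draw it with matplotlib: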
# In Jupyter, run %matplotlib inline first so the figure appears in the notebook
import matplotlib.pyplot as plt
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()
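Advanced usage: drawing the word cloud in a given shape
Setting the shape of the word cloud
Set the mask parameter of WordCloud. Its documentation says: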
mask : nd-array or None (default=None)
If not None, gives a binary mask on where to draw words. If mask is not None, width and height will be ignored and the shape of mask will be used instead. All white (#FF or #FFFFFF) entries will be considered “masked out” while other entries will be free to draw on. [This changed in the most recent version!]
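The rule quoted above (pure white is masked out, everything else is drawable) can be illustrated without any image library. A minimal sketch: the helper drawable_region is hypothetical, and it takes a plain nested list such as np.array(Image.open(...)).tolist() would give. Like the quoted rule, it only treats pure white as masked out; it checks the first three channels, so an alpha channel is ignored, which means transparent black pixels of a PNG count as drawable.

```python
def drawable_region(mask):
    """Return a grid of booleans: True where words may be drawn (hypothetical helper).

    `mask` is a 2-D grid of grayscale values (0-255) or RGB/RGBA pixels.
    """
    def is_white(pixel):
        if isinstance(pixel, (tuple, list)):
            # Compare only R, G, B; a fourth (alpha) channel is ignored
            return all(channel == 255 for channel in pixel[:3])
        return pixel == 255

    return [[not is_white(p) for p in row] for row in mask]


mask = [
    [255, 255, 255],
    [255,   0, 255],
    [255,   0, 255],
]
for row in drawable_region(mask):
    print("".join("#" if ok else "." for ok in row))
# prints:
# ...
# .#.
# .#.
```

In practice this means the shape you want filled should be dark (non-white) on a pure white background.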
First, use numpy to convert the image into an array:
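import matplotlib.pyplot as plt
import numpy as np
from PIL import Image
from wordcloud import WordCloud

# Read the text file
filename = "python.txt"
with open(filename, encoding="utf-8") as f:
    mytext = f.read()

# Convert the mask image to a NumPy array; white pixels are masked out
alice_mask = np.array(Image.open("1.png"))

# Set the WordCloud parameters
wc = WordCloud(
    background_color="white",
    max_words=2000,
    mask=alice_mask)
wordcloud = wc.generate(mytext)

# Show the word cloud
plt.imshow(wc, interpolation='bilinear')
plt.axis("off")
# Show the mask image itself in a second figure for comparison
plt.figure()
plt.imshow(alice_mask, cmap=plt.cm.gray, interpolation='bilinear')
plt.axis("off")
plt.show()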
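If no suitable image is at hand, a mask can also be generated in code. A minimal sketch, assuming a simple circular shape; the helper circle_mask and its default size and radius are hypothetical. It follows the same convention as above: 0 (black) is drawable, 255 (white) is masked out.

```python
def circle_mask(size=300, radius=130):
    """Square grid: 0 (drawable) inside the circle, 255 (masked out) outside."""
    center = size // 2
    return [
        [255 if (x - center) ** 2 + (y - center) ** 2 > radius ** 2 else 0
         for x in range(size)]
        for y in range(size)
    ]


mask = circle_mask()
# To use it, convert it first, e.g. WordCloud(mask=np.array(mask, dtype=np.uint8)),
# in place of alice_mask above.
```

Generating the mask in code keeps the example reproducible without shipping an image file.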
Drawing a word cloud from Chinese text
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from PIL import Image
import numpy as np
import matplotlib.pyplot as plt
import jieba
from wordcloud import WordCloud, STOPWORDS
# Read the whole text.
with open('高等数学.txt', encoding='utf-8') as f:
    content = f.read()
# Segment the Chinese text into words with jieba
default_mode = jieba.cut(content)
# Join the words with spaces so WordCloud can split them like English words
text = " ".join(default_mode)
The resulting text looks like this:
alice_mask = np.array(Image.open("3.jpg"))
wc = WordCloud(
    background_color="white",
    max_words=2000,
    mask=alice_mask)
# generate word cloud
wc.generate(text)
# store to file
wc.to_file("2_result.jpg")
# show
plt.imshow(wc, interpolation='bilinear')
plt.axis("off")
plt.figure()
plt.imshow(alice_mask, cmap=plt.cm.gray, interpolation='bilinear')
plt.axis("off")
plt.show()
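Why the jieba step and the space-joining matter: WordCloud finds words with a regular expression, and Chinese has no spaces between words, so unsegmented text comes out as one giant "word". A minimal sketch with a simple stand-in tokenizer (an assumption, not wordcloud's exact rule); the token list is hypothetical and stands in for the output of jieba.cut.

```python
import re

# Simple stand-in tokenizer: a word is a run of word characters
# (in Python 3, \w also matches Chinese characters)
TOKEN = re.compile(r"\w[\w']*")

raw = "高等数学是一门基础课"
print(TOKEN.findall(raw))  # ['高等数学是一门基础课'], one giant "word"

# Hypothetical segmentation standing in for jieba.cut(raw)
tokens = ["高等数学", "是", "一门", "基础课"]
print(TOKEN.findall(" ".join(tokens)))  # ['高等数学', '是', '一门', '基础课']
```

Once the words are separated by spaces, the same tokenizing step that works for English also works for Chinese.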
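Word cloud does not display Chinese
When the image is rendered, no Chinese appears in the word cloud: the default font that ships with wordcloud has no Chinese glyphs, so the words show up as empty boxes.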
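Setting the font
The fix is to give WordCloud a font file that contains Chinese glyphs through the font_path parameter, e.g. font_path='simfang.ttf'. Compare the code without and with it: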
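# Without font_path: the default font has no Chinese glyphs
wc = WordCloud(
    background_color="white",
    max_words=2000,
    mask=alice_mask)
wc.generate(text)
plt.imshow(wc, interpolation='bilinear')
plt.axis("off")

# With font_path, in a new figure
plt.figure()
wc = WordCloud(
    # Set the font; without it the Chinese text is garbled. The font file has to be downloaded
    font_path='simfang.ttf',
    background_color="white",
    max_words=2000,
    mask=alice_mask)
wc.generate(text)
plt.imshow(wc, interpolation='bilinear')
plt.axis("off")
plt.show()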
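Since the font file must exist on disk, it can help to check for one before building the WordCloud. A minimal sketch: the helpers needs_cjk_font and find_font are hypothetical, and the candidate paths are assumptions about common locations (Windows ships simfang.ttf; the macOS and Linux paths depend on the system and installed packages), so adjust the list for your machine.

```python
import os
import re

# Hypothetical candidate list; none of these paths is guaranteed to exist
CANDIDATE_FONTS = [
    "simfang.ttf",                          # next to the script, as in the article
    r"C:\Windows\Fonts\simfang.ttf",        # Windows FangSong font
    "/System/Library/Fonts/PingFang.ttc",   # macOS (assumption)
    "/usr/share/fonts/opentype/noto/NotoSansCJK-Regular.ttc",  # Linux with Noto CJK (assumption)
]

CJK = re.compile(r"[\u4e00-\u9fff]")  # CJK Unified Ideographs block


def needs_cjk_font(text):
    """True if the text contains Chinese characters."""
    return CJK.search(text) is not None


def find_font(candidates=CANDIDATE_FONTS):
    """Return the first candidate path that exists as a file, or None."""
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None
```

Usage: font = find_font() if needs_cjk_font(text) else None, then WordCloud(font_path=font, ...). If find_font() returns None for Chinese text, download a Chinese font and add its path to the list.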
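Parameters explained
alice_mask: the mask image, loaded as an array
stopwords: words that should not appear in the cloud
WordCloud: sets the properties of the word cloud
generate: generates the word cloud
to_file: saves the image to a file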
Attributes of the WordCloud class
Open wordcloud.py to see the parameters of the WordCloud class.
Among them:
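font_path: path to the font file to use
width and height: width and height of the canvas
prefer_horizontal: the ratio of words placed horizontally versus vertically
mask: the mask, i.e. the region in which words are drawn
scale: scaling factor between computation and drawing
min_font_size: the smallest font size
max_words: the maximum number of words shown
stopwords: words to exclude
background_color: background color of the word cloud
max_font_size: the largest font size
mode: the image color mode; with mode="RGBA" and background_color=None the background is transparent
relative_scaling: how strongly relative word frequency affects font size
regexp: regular expression used to split the text into words
collocations: whether to include two-word collocations (bigrams)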
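relative_scaling is the least obvious of these. The wordcloud documentation describes its two extremes: at 0 only word ranks matter, and at 1 a word twice as frequent gets twice the size; the default 'auto' is documented as 0.5 unless repeat=True. A simplified sketch of that behavior, with a hypothetical function font_sizes and a hypothetical max_size of 100; it is modeled on the documented behavior, not copied from the library, and it leaves out the layout step in which the real library shrinks words that do not fit.

```python
def font_sizes(frequencies, max_size=100, relative_scaling=0.5):
    """Simplified model of relative_scaling (hypothetical helper).

    Words go from most to least frequent; each size is derived from the
    previous word's size and the ratio of the two frequencies.
    """
    ordered = sorted(frequencies.items(), key=lambda kv: kv[1], reverse=True)
    sizes = {}
    size, last_freq = max_size, None
    for word, freq in ordered:
        if last_freq is not None:
            rs = relative_scaling
            # rs = 1: size follows frequency; rs = 0: size stays the same (ranks only)
            size = round((rs * freq / last_freq + (1 - rs)) * size)
        sizes[word] = size
        last_freq = freq
    return sizes


print(font_sizes({"python": 10, "shell": 5}, relative_scaling=1))  # {'python': 100, 'shell': 50}
print(font_sizes({"python": 10, "shell": 5}, relative_scaling=0))  # {'python': 100, 'shell': 100}
print(font_sizes({"python": 10, "shell": 5}))                      # {'python': 100, 'shell': 75}
```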
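Stepping into generate with a debugger shows that it does two things:
words = process_text(text) returns the word frequencies of the text
generate_from_frequencies creates the word cloud from the words and their frequencies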
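To make the first step concrete, here is a simplified stand-in for process_text: text in, {word: count} out. It is a sketch under stated assumptions, not the library's code: the function count_words is hypothetical, it simply lower-cases words, and it skips what the real method also handles (plurals, numbers, two-word collocations, keeping the most common capitalization).

```python
import re
from collections import Counter


def count_words(text, stopwords=frozenset(), min_word_length=0):
    """Hypothetical, simplified stand-in for WordCloud.process_text."""
    tokens = re.findall(r"\w[\w']*", text)
    stop = {s.lower() for s in stopwords}
    counts = Counter(
        t.lower() for t in tokens
        if t.lower() not in stop and len(t) >= min_word_length
    )
    return dict(counts)


freqs = count_words("python python3 is good well shell cool python",
                    stopwords={"is", "well"})
print(freqs)  # {'python': 2, 'python3': 1, 'good': 1, 'shell': 1, 'cool': 1}

# The second step would then be, with wordcloud installed:
# WordCloud().generate_from_frequencies(freqs)
```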
The default values are:
Init signature:
WordCloud(
    font_path=None,
    width=400,
    height=200,
    margin=2,
    ranks_only=None,
    prefer_horizontal=0.9,
    mask=None,
    scale=1,
    color_func=None,
    max_words=200,
    min_font_size=4,
    stopwords=None,
    random_state=None,
    background_color='black',
    max_font_size=None,
    font_step=1,
    mode='RGB',
    relative_scaling='auto',
    regexp=None,
    collocations=True,
    colormap=None,
    normalize_plurals=True,
    contour_width=0,
    contour_color='black',
    repeat=False,
    include_numbers=False,
    min_word_length=0,
    collocation_threshold=30,
)