Introduction to wordcloud
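The term wordcloud (word cloud) may sound unfamiliar, but you will probably recognize it once you see Figure 1. It is commonly used to show at a glance which words occur most and least often in a passage or an article.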
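Implementing wordcloud in Python
Without further ado, here is the code. If you use PyCharm and run into problems installing packages, see the blog post "Pitfalls when importing packages in Python with PyCharm".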
# Word cloud package
from wordcloud import WordCloud
# Plotting package
import matplotlib.pyplot as plt
# jieba word segmentation package
import jieba
# Image loading package
from PIL import Image
# Array handling package
import numpy as np


# Build and display a word cloud
def create_word_cloud(words):
    # Segment the text with jieba (accurate mode, HMM enabled)
    text = " ".join(jieba.cut(words, cut_all=False, HMM=True))
    print(text)
    # Load the image whose outline shapes the word cloud
    mask = np.array(Image.open("yuan.jpg"))
    wc = WordCloud(
        background_color="white",
        max_words=100,
        # width and height are ignored when a mask is given
        width=2000,
        height=2000,
        mask=mask,
        font_path="DomoAregato Normal.ttf"
    )
    wordcloud = wc.generate(text)
    # Write the word cloud to an image file
    wordcloud.to_file("wordcloud.jpg")
    # Display the word cloud
    plt.imshow(wordcloud)
    plt.axis("off")
    plt.show()


if __name__ == '__main__':
    s = """
running running running running running running
swimming swimming swimming
LOL LOL LOL LOL LOL LOL LOL
study study study
trip trip
shopping shopping
reading reading reading
ball ball
Singing Singing Singing
movies movies
dancing
bar
"""
    create_word_cloud(s)
The code breaks down as follows:
mask = np.array(Image.open("yuan.jpg"))
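This line loads the background image (Figure 3) as the outline for the word cloud and passes it to the WordCloud constructor as the mask parameter, so the generated word cloud takes the shape of that image. If the parameter is omitted, the word cloud is a rectangle by default.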
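If you do not have a suitable image at hand, a mask can also be built directly as an array. The following is a minimal sketch, not part of the original code: the helper name make_circle_mask and the size of 400 pixels are assumptions chosen for illustration. It relies on the convention that pure white (255) pixels in the mask are left empty, while other pixels can be drawn on.

```python
import numpy as np


def make_circle_mask(size: int = 400) -> np.ndarray:
    """Return a size x size uint8 mask: 0 inside a centered circle, 255 outside.

    Hypothetical helper for illustration; not part of the wordcloud package.
    """
    # Open grids of row and column indices that broadcast to (size, size)
    y, x = np.ogrid[:size, :size]
    center = (size - 1) / 2
    radius = size / 2 * 0.9
    # True for pixels outside the circle
    outside = (x - center) ** 2 + (y - center) ** 2 > radius ** 2
    # White (255) outside the circle is masked out; 0 inside is drawable
    return (255 * outside).astype(np.uint8)


mask = make_circle_mask(400)
print(mask.shape, mask.dtype)
```

The resulting array could be passed as mask=mask in place of np.array(Image.open("yuan.jpg")), which should give a round word cloud.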
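The other parameters of WordCloud are listed in Figure 3. As for font_path="DomoAregato Normal.ttf", this points to a font file: you can download one from a font download site and place it anywhere in your project. This example uses the DomoAregato Normal.ttf font.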
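A missing font file is a common reason for this script to fail, so it can help to check the path before building the word cloud. The sketch below is only an illustration: resolve_font is a hypothetical helper, not something from the original post or the wordcloud package. It returns None when the file is absent, in which case font_path=None makes WordCloud fall back to its bundled default font. Note that text containing Chinese characters needs a font with those glyphs.

```python
from pathlib import Path
from typing import Optional


def resolve_font(path: str) -> Optional[str]:
    """Return the path if the font file exists, otherwise None.

    Hypothetical helper for illustration only.
    """
    font = Path(path)
    return str(font) if font.is_file() else None


font_path = resolve_font("DomoAregato Normal.ttf")
if font_path is None:
    # With font_path=None, WordCloud uses its bundled default font
    print("Font file not found; WordCloud would use its default font.")
```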
The resulting word cloud
The result is shown in Figure 4. Evidently this student's favorite sport is LOL.
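To check the picture against the raw numbers, the words in the sample text can be counted directly. This is a rough sketch using a plain whitespace split from the standard library. It is not the tokenization WordCloud applies internally, so the sizes in the image may not match these counts exactly.

```python
from collections import Counter

# The same sample text that is passed to create_word_cloud
sample = """
running running running running running running
swimming swimming swimming
LOL LOL LOL LOL LOL LOL LOL
study study study
trip trip
shopping shopping
reading reading reading
ball ball
Singing Singing Singing
movies movies
dancing
bar
"""

# Split on whitespace and count each word
counts = Counter(sample.split())

# Show the two most frequent words with their counts
for word, n in counts.most_common(2):
    print(word, n)
```

LOL appears 7 times and running 6 times, which matches LOL being the largest word in the picture.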