写作本篇文章用时一个小时半,阅读需要十分钟,读完该文章后你将学会如何将任意中文文本生成词云
Python词云 worldcloud 十五分钟入门与进阶
Python中文分词 jieba 十五分钟入门与进阶
代码组成简介
- 代码部分来源于其他人的博客,但是因为bug或者运行效率的原因,我对代码进行了较大的改变
- 代码第一部分,设置代码运行需要的大部分参数,你可以方便的直接使用该代码而不需要进行过多的修改
- 第二部分为jieba的一些设置,当然你也可以利用isCN参数取消中文分词
- 第三部分,wordcloud的设置,包括图片展示与保存
如果你想用该代码生成英文词云,那么你需要将isCN参数设置为0,并且提供英文的停用词表,但是我更推荐你使用Python词云 worldcloud 十五分钟入门与进阶这篇文章中的代码,因为它更简洁,更有利于使用’
关于该程序的使用,你可以直接读注释在数分钟内学会如何使用它
from os import path
from scipy.misc import imread
import matplotlib.pyplot as plt
import jieba
from wordcloud import WordCloud, ImageColorGenerator
d = path.dirname(__file__)
stopwords = {}
isCN = 1
back_coloring_path = "img/lz1.jpg"
text_path = 'txt/lz.txt'
font_path = 'D:\Fonts\simkai.ttf'
stopwords_path = 'stopwords\stopwords1893.txt'
imgname1 = "WordCloudDefautColors.png"
imgname2 = "WordCloudColorsByImg.png"
my_words_list = ['路明非']
back_coloring = imread(path.join(d, back_coloring_path))
wc = WordCloud(font_path=font_path,
background_color="white",
max_words=2000,
mask=back_coloring,
max_font_size=100,
random_state=42,
width=1000, height=860, margin=2,
)
def add_word(list):
for items in list:
jieba.add_word(items)
add_word(my_words_list)
text = open(path.join(d, text_path)).read()
def jiebaclearText(text):
mywordlist = []
seg_list = jieba.cut(text, cut_all=False)
liststr="/ ".join(seg_list)
f_stop = open(stopwords_path)
try:
f_stop_text = f_stop.read( )
f_stop_text=unicode(f_stop_text,'utf-8')
finally:
f_stop.close( )
f_stop_seg_list=f_stop_text.split('\n')
for myword in liststr.split('/'):
if not(myword.strip() in f_stop_seg_list) and len(myword.strip())>1:
mywordlist.append(myword)
return ''.join(mywordlist)
if isCN:
text = jiebaclearText(text)
wc.generate(text)
image_colors = ImageColorGenerator(back_coloring)
plt.figure()
plt.imshow(wc)
plt.axis("off")
plt.show()
wc.to_file(path.join(d, imgname1))
image_colors = ImageColorGenerator(back_coloring)
plt.imshow(wc.recolor(color_func=image_colors))
plt.axis("off")
plt.figure()
plt.imshow(back_coloring, cmap=plt.cm.gray)
plt.axis("off")
plt.show()
wc.to_file(path.join(d, imgname2))
- 1
- 2
- 3
- 4
- 5
- 6
- 7
- 8
- 9
- 10
- 11
- 12
- 13
- 14
- 15
- 16
- 17
- 18
- 19
- 20
- 21
- 22
- 23
- 24
- 25
- 26
- 27
- 28
- 29
- 30
- 31
- 32
- 33
- 34
- 35
- 36
- 37
- 38
- 39
- 40
- 41
- 42
- 43
- 44
- 45
- 46
- 47
- 48
- 49
- 50
- 51
- 52
- 53
- 54
- 55
- 56
- 57
- 58
- 59
- 60
- 61
- 62
- 63
- 64
- 65
- 66
- 67
- 68
- 69
- 70
- 71
- 72
- 73
- 74
- 75
- 76
- 77
- 78
- 79
- 80
- 81
- 82
- 83
- 84
- 85
- 86
- 87
- 88
- 89
- 90
- 91
- 92
- 93
- 94
- 95
- 96
- 97
- 98
- 1
- 2
- 3
- 4
- 5
- 6
- 7
- 8
- 9
- 10
- 11
- 12
- 13
- 14
- 15
- 16
- 17
- 18
- 19
- 20
- 21
- 22
- 23
- 24
- 25
- 26
- 27
- 28
- 29
- 30
- 31
- 32
- 33
- 34
- 35
- 36
- 37
- 38
- 39
- 40
- 41
- 42
- 43
- 44
- 45
- 46
- 47
- 48
- 49
- 50
- 51
- 52
- 53
- 54
- 55
- 56
- 57
- 58
- 59
- 60
- 61
- 62
- 63
- 64
- 65
- 66
- 67
- 68
- 69
- 70
- 71
- 72
- 73
- 74
- 75
- 76
- 77
- 78
- 79
- 80
- 81
- 82
- 83
- 84
- 85
- 86
- 87
- 88
- 89
- 90
- 91
- 92
- 93
- 94
- 95
- 96
- 97
- 98
good luck
本文发表于http://blog.csdn.net/FontThrone/article/details/72782971#comments