Why do all the word-cloud counters I found online split everything into single characters? Luckily I worked around it with a clumsy trick: adding up counts in a dictionary.
Non-programmers who need this for work, take a look; like me, you can find most of it by searching Baidu (I'm using Anaconda).
Trying to write a UI so colleagues can upload and download files themselves; it's hard.
Not sure which category this belongs in...
The script segments a passage from rreason.txt and saves the results to rrcut.txt and rrcut.jpg.
from wordcloud import WordCloud
import jieba
import matplotlib.pyplot as plt
#import numpy as np
#from PIL import Image
import time
datapath = "E:\\临时处理\\"  # double the backslashes so they are not treated as escape sequences
with open(datapath + "rreason.txt", 'r', encoding='utf-8') as f:
    string_data = f.read()
text = " ".join(jieba.cut(string_data,cut_all=True))
llist = text.split()
remove_words = [u'了', u'\n', u' ', u',', u',']
# filter with a comprehension; calling llist.remove() while iterating over llist skips elements
llist = [word for word in llist if word not in remove_words]
# tally each word; a dict comprehension avoids shadowing the built-in dict
word_counts = {item: llist.count(item) for item in set(llist)}
with open(datapath + "rrcut%s.txt" % time.strftime("%m%d %H", time.localtime()), 'w', encoding='utf-8') as f:
    for word, count in word_counts.items():
        f.write(word + "," + str(count) + '\n')
llist = " “.join(llist)
#mask = np.array(Image.open("E:\临时处\4a0e74e7ly1g2muafkv0tj21o02yo7wh.jpg"))
cloud = WordCloud(font_path=".\\fonts\\simhei.ttf",  # \f would otherwise be a form-feed escape
                  collocations=False,
                  background_color='white',
                  width=400,
                  height=300,
                  #mask=mask,
                  max_words=100,
                  max_font_size=50,
                  scale=8)
wcloud = cloud.generate(llist)
wcloud.to_file(datapath + "rrcut%s.jpg" % time.strftime("%m-%d %H", time.localtime()))
plt.imshow(wcloud, interpolation='bilinear')
plt.axis("off")
plt.show()
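The set-then-count loop above rescans the whole list once per unique word. The standard library's collections.Counter does the same tally in a single pass. A minimal sketch with sample tokens (no jieba needed; the tokens are made up for illustration):

```python
from collections import Counter

# sample tokens standing in for jieba's output
tokens = ["原因", "了", "原因", "问题", " ", "问题", "原因"]
remove_words = ["了", "\n", " ", ",", ","]

# same stop-word filtering as in the script, then a one-pass count
filtered = [w for w in tokens if w not in remove_words]
word_counts = Counter(filtered)

print(word_counts.most_common(2))  # [('原因', 3), ('问题', 2)]
```

Counter is a dict subclass, so word_counts.items() can be written to the txt file exactly as above.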
Why does each string in the remove_words list carry a u prefix? In Python 3 the prefix is redundant (all str literals are already Unicode); it's a leftover from Python 2, where it marked Unicode literals.
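You can check this directly: in Python 3 the u prefix changes nothing about the literal.

```python
# In Python 3, str literals are Unicode by default,
# so u'了' is the same type and value as '了'.
print(u'了' == '了')       # True
print(type(u'了') is str)  # True
```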
The commented-out lines are there so you can pick your own background image as a mask.
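With a mask, WordCloud treats pure white (255) pixels as off-limits and places words on the darker areas. You don't strictly need an image file; a sketch that builds a circular mask in memory (assumes numpy is installed, as it already is in Anaconda):

```python
import numpy as np

# 300x300 mask: 255 (white) outside the circle is skipped by WordCloud,
# 0 (dark) inside the circle is where words get placed
h = w = 300
y, x = np.ogrid[:h, :w]
inside = (x - w // 2) ** 2 + (y - h // 2) ** 2 <= (w // 2) ** 2
mask = np.full((h, w), 255, dtype=np.uint8)
mask[inside] = 0

# then pass it in: WordCloud(mask=mask, ...)
print(mask.shape, mask[h // 2, w // 2], mask[0, 0])  # (300, 300) 0 255
```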