Text content: data (a pandas Series holding many pieces of text)
1. Tokenize:
import jieba
# Split each text into a list of words
data_cut = data.apply(jieba.lcut)
2. Remove stop words:
with open(r'D:\数据文件\stoplist.txt', encoding='utf-8') as f:
    txt = f.read()
stop = txt.split()
stop = stop + [' ']  # also treat the space character as a stop word
data_after = data_cut.apply(
    lambda x: [i for i in x if i not in stop]
)
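The filter above checks every word against a list, which gets slow as the stop list grows. A minimal sketch of the same filter using a set, with made-up sample data (the names tokenized, stop_words, stop_set and filtered are hypothetical and not from the original post):

```python
# Hypothetical sample data standing in for the tokenized texts and the stop list.
tokenized = [
    ['我', '喜欢', ' ', '词云'],
    ['词云', '的', '绘制'],
]
stop_words = ['我', '的']

# A set gives constant-time membership checks; the space is added as in the step above.
stop_set = set(stop_words) | {' '}

# Keep only the words that are not in the stop set.
filtered = [[w for w in words if w not in stop_set] for words in tokenized]
print(filtered)  # [['喜欢', '词云'], ['词云', '绘制']]
```

With the pandas version, the same idea would be stop = set(stop) before calling data_cut.apply; the lambda itself does not need to change.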
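3. Count word frequencies:
import pandas as pd
from tkinter import _flatten  # private tkinter helper, used here to flatten nested lists
tmp = pd.Series(_flatten(list(data_after)))  # flatten the 2-D list of words into 1-D
num = tmp.value_counts()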
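Because _flatten is a private helper inside tkinter, it may not be available everywhere. One possible alternative, shown here only as a sketch with made-up data, is to flatten with itertools.chain and count with collections.Counter from the standard library (the names filtered, flat and counts are hypothetical):

```python
from collections import Counter
from itertools import chain

# Hypothetical filtered words standing in for data_after.
filtered = [['喜欢', '词云'], ['词云', '绘制']]

# chain.from_iterable flattens one level of nesting.
flat = list(chain.from_iterable(filtered))

# Counter maps each word to its number of occurrences.
counts = Counter(flat)
print(counts.most_common(1))  # [('词云', 2)]
```

Counter is a dict subclass, so counts should be usable wherever a word-to-frequency dict is expected, for example as the argument to fit_words in place of num.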
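4. Draw the word cloud:
from wordcloud import WordCloud
import matplotlib.pyplot as plt
pic = plt.imread(r'D:\数据文件\aixin.jpg')  # image whose shape is used as the mask
wc = WordCloud(
    background_color='white',
    mask=pic,
    font_path=r'C:/Windows/Fonts/simsun.ttc')  # SimSun, a font that can render Chinese characters
wc2 = wc.fit_words(num)  # build the cloud from the word frequencies
plt.imshow(wc2)
plt.axis('off')
plt.show()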
Source: https://www.cnblogs.com/DDiamondd/p/11183079.html