1. Read the data, drop NaN rows, and tokenize with jieba
import pandas as pd
import jieba

df = pd.read_csv("./data/entertainment_news.csv", encoding='utf-8')
df = df.dropna()
content = df.content.values.tolist()

# jieba.load_userdict(u"data/user_dic.txt")  # optionally load a custom dictionary
segment = []
for line in content:
    try:
        segs = jieba.lcut(line)
        for seg in segs:
            # keep tokens longer than one character and skip line breaks
            if len(seg) > 1 and seg != '\r\n':
                segment.append(seg)
    except Exception:
        print(line)
        continue
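As a quick sanity check, this is what jieba.lcut returns for the example sentence from jieba's own README; single-character tokens like '我' are exactly what the len(seg) > 1 filter above discards:

print(jieba.lcut("我来到北京清华大学"))
# ['我', '来到', '北京', '清华大学']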
2. Remove stopwords
words_df = pd.DataFrame({'segment': segment})
# quoting=3 (csv.QUOTE_NONE) reads the stopword file verbatim, with no quote handling
stopwords = pd.read_csv("data/stopwords.txt", index_col=False, quoting=3, sep="\t", names=['stopword'], encoding='utf-8')
# drop every token that appears in the stopword list
words_df = words_df[~words_df.segment.isin(stopwords.stopword)]
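A tiny self-contained illustration of how the ~isin filter behaves (toy tokens made up here, not from the corpus):

toy = pd.DataFrame({'segment': ['电影', '的', '上映', '了']})
stop = pd.Series(['的', '了'])
toy[~toy.segment.isin(stop)]  # keeps only '电影' and '上映'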
3. Count word frequencies
# the dict form agg({"计数": numpy.size}) was removed in newer pandas; size() gives the same count
words_stat = words_df.groupby('segment').size().reset_index(name='计数')
words_stat = words_stat.sort_values(by='计数', ascending=False)
words_stat.head()
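For comparison, the same frequencies can be computed without groupby using collections.Counter; a minimal sketch:

from collections import Counter
word_counts = Counter(words_df.segment)
print(word_counts.most_common(5))  # top five tokens with their counts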
4. Plot the word cloud
import matplotlib.pyplot as plt
from wordcloud import WordCloud
# simhei.ttf supplies a CJK font; the default font cannot render Chinese characters
wordcloud = WordCloud(font_path="data/simhei.ttf", background_color="white", max_font_size=80)
word_frequency = {x[0]: x[1] for x in words_stat.head(1000).values}
wordcloud = wordcloud.fit_words(word_frequency)
plt.imshow(wordcloud)
plt.axis("off")
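To keep the rendered image on disk, WordCloud provides to_file; the output path below is just an example:

wordcloud.to_file("data/entertainment_wordcloud.png")  # hypothetical output path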
The wordcloud library is what renders the Chinese word cloud here.