Generating a Word Cloud with Python and jieba

Description: generate a word cloud using Python and jieba.

# coding: utf-8
import matplotlib.pyplot as plt
import jieba
from wordcloud import WordCloud


class ciyun():
    def __init__(self):
        pass

    def draw(self):
        # Read the source text (UTF-8 encoding is assumed here)
        with open('test.txt', encoding='utf-8') as f:
            text = f.read()
        # Segment in full mode, then join with spaces so WordCloud can split on whitespace
        wordlist_after_jieba = jieba.cut(text, cut_all=True)
        wl_space_split = " ".join(wordlist_after_jieba)
        # Mask image that defines the shape of the cloud
        color_mask = plt.imread("test.jpg")
        my_wordcloud = WordCloud(
            # font_path='font.ttf',  # Must be set for Chinese text, otherwise characters render as empty boxes; best kept in the same directory as the script
            background_color='white',
            mask=color_mask,
            max_words=2000,
            max_font_size=50000
        )

        word_cloud = my_wordcloud.generate(wl_space_split)
        plt.imshow(word_cloud)
        plt.axis('off')
        plt.show()

t = ciyun()
t.draw()


![Generated word cloud](http://img.blog.csdn.net/20170706004353314?watermark/2/text/aHR0cDovL2Jsb2cuY3Nkbi5uZXQvdTAxNDI1NzE5Mg==/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70/gravity/SouthEast)

1. Install the jieba and wordcloud libraries (the `!` prefix is for running inside a notebook)

```python
!pip install jieba
!pip install wordcloud
```

2. Import the required libraries

```python
import jieba
import wordcloud
from PIL import Image
import numpy as np
import matplotlib.pyplot as plt
```

3. Read the text and segment it with jieba

```python
with open('text.txt', 'r', encoding='utf-8') as f:
    text = f.read()

words = jieba.lcut(text)  # segment the text with jieba
```

4. Generate the word cloud from word frequencies

```python
# Count word frequencies, skipping single-character tokens
word_counts = {}
for word in words:
    if len(word) == 1:
        continue
    if word not in word_counts:
        word_counts[word] = 1
    else:
        word_counts[word] += 1

# Generate the word cloud
wc = wordcloud.WordCloud(
    font_path='msyh.ttc',      # path to the font file
    background_color='white',  # background color
    width=800,                 # width
    height=600,                # height
    max_words=200,             # maximum number of words shown
    max_font_size=100,         # maximum font size
    scale=8,                   # scaling factor
    random_state=42            # random seed
).generate_from_frequencies(word_counts)

# Display the word cloud
plt.imshow(wc)
plt.axis('off')
plt.show()
```

5. Optionally, customize the shape of the word cloud

```python
# Read the mask image
mask = np.array(Image.open('mask.png'))

# Generate the word cloud
wc = wordcloud.WordCloud(
    font_path='msyh.ttc',
    background_color='white',
    width=800,
    height=600,
    max_words=200,
    max_font_size=100,
    mask=mask,                  # use the image's shape as the canvas
    contour_width=2,            # contour line width
    contour_color='steelblue',  # contour line color
    random_state=42
).generate_from_frequencies(word_counts)

# Display the word cloud
plt.imshow(wc, interpolation='bilinear')
plt.axis('off')
plt.show()
```

Here `mask.png` is a custom image of any shape. Pure-white areas of the image are left empty, and when a mask is supplied its dimensions are used in place of `width` and `height`.
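
The frequency loop in step 4 can also be written with `collections.Counter`, and it is a natural place to drop stop words before drawing. The following is a minimal sketch, not part of the original post: the `count_words` helper, the `STOP_WORDS` set, and the sample tokens are all hypothetical, and the tokens are written by hand so the example runs without jieba installed.

```python
from collections import Counter

# Hypothetical stop-word set for illustration; a real project would load a fuller list.
STOP_WORDS = {"我们", "可以", "一个"}


def count_words(tokens, stop_words=STOP_WORDS, min_length=2):
    """Count tokens, skipping stop words and tokens shorter than min_length."""
    counts = Counter()
    for token in tokens:
        token = token.strip()  # whitespace-only tokens become empty and are skipped
        if len(token) < min_length or token in stop_words:
            continue
        counts[token] += 1
    return dict(counts)


# Tokens shaped like the output of jieba.lcut(text), hand-written for this sketch.
sample_tokens = ["我们", "喜欢", "词云", "，", "词云", "可以", "展示", "词频", "的", " "]
word_counts = count_words(sample_tokens)
print(word_counts)
```

The resulting dictionary can be passed to `generate_from_frequencies(word_counts)` exactly as in step 4.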