Python-Based Word Cloud Generation and Visualization: Word Clouds in Python, Simple Enough to Learn at a Glance

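A word cloud is a very attractive way to visualize text; as the saying goes, a picture is worth a thousand words. I have used word clouds a lot in earlier projects, and for me a word cloud might even be a good way to introduce myself, like this one:

[Figure: a rectangular word cloud used as a self-introduction]

Personally, I find this more engaging than a dry textual description.

Today's post is not about using word clouds for self-introduction. It summarizes the word cloud techniques I use most at work, covering three things:

1. A simple rectangular word cloud, like the one above

2. A word cloud shaped by a background image

3. A word cloud with custom font colors, for cases where the default colors shown above are not wanted

The three approaches are implemented and demonstrated below. The test data used throughout is: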

The Zen of Python, by Tim Peters

Beautiful is better than ugly.

Explicit is better than implicit.

Simple is better than complex.

Complex is better than complicated.

Flat is better than nested.

Sparse is better than dense.

Readability counts.

Special cases aren't special enough to break the rules.

Although practicality beats purity.

Errors should never pass silently.

Unless explicitly silenced.

In the face of ambiguity, refuse the temptation to guess.

There should be one-- and preferably only one --obvious way to do it.

Although that way may not be obvious at first unless you're Dutch.

Now is better than never.

Although never is often better than *right* now.

If the implementation is hard to explain, it's a bad idea.

If the implementation is easy to explain, it may be a good idea.

Namespaces are one honking great idea -- let's do more of those!
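The three demos that follow draw from a word-frequency mapping rather than from raw text. As a minimal sketch of how such a mapping could be derived from the sample above, the snippet below counts words with `collections.Counter`. The helper name `count_words` and its regular expression are assumptions for illustration, not part of the original code, and the tokenization only suits English text like this sample.

```python
import re
from collections import Counter


def count_words(text, lowercase=True):
    """Return a {word: count} dict for the words in `text`.

    Hypothetical helper: it keeps letters, digits and apostrophes,
    so "ugly." and "ugly" are counted as the same word.
    """
    if lowercase:
        text = text.lower()
    words = re.findall(r"[A-Za-z0-9']+", text)
    return dict(Counter(words))


sample = "Beautiful is better than ugly.\nExplicit is better than implicit."
freq = count_words(sample)
print(freq["better"])  # 2
```

Stripping punctuation before counting keeps tokens such as `ugly.` or `*right*` from showing up as separate entries in the cloud.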

1. The simple rectangular word cloud is implemented as follows:

def simpleWC1(sep=' ', back='black', freDictpath='data_fre.json', savepath='res.png'):
    '''
    Word cloud visualization demo.
    freDictpath is either a {word: count} dict or the path to a text file
    holding one "word<sep>count" pair per line.
    '''
    if isinstance(freDictpath, dict):
        fre_dict = freDictpath
    else:
        with open(freDictpath, encoding='utf-8') as f:
            data_list = [one.strip().split(sep) for one in f if one.strip()]
        fre_dict = {one_list[0]: int(one_list[1]) for one_list in data_list}
    wc = WordCloud(font_path='font/simhei.ttf',  # font file (SimHei)
                   background_color=back,        # background color
                   max_words=1300,               # maximum number of words shown
                   max_font_size=120,            # largest font size
                   margin=3,                     # margin
                   width=1800,                   # image width
                   height=800,                   # image height
                   random_state=42)
    wc.generate_from_frequencies(fre_dict)  # build the cloud from the frequency dict
    plt.figure()
    plt.imshow(wc)
    plt.axis("off")
    wc.to_file(savepath)

The resulting image is as follows:

[Figure: rectangular word cloud produced by simpleWC1]

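2. A word cloud based on a background image is implemented as follows:

First, the background image:

[Figure: background image used as the mask]

This is a fairly classic image. Now for the implementation: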

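def simpleWC2(sep=' ', back='black', backPic='a.png', freDictpath='data_fre.json', savepath='res.png'):
    '''
    Word cloud visualization demo (shaped by a background image).
    freDictpath is either a {word: count} dict or the path to a text file
    holding one "word<sep>count" pair per line.
    '''
    if isinstance(freDictpath, dict):
        fre_dict = freDictpath
    else:
        with open(freDictpath, encoding='utf-8') as f:
            data_list = [one.strip().split(sep) for one in f if one.strip()]
        fre_dict = {one_list[0]: int(one_list[1]) for one_list in data_list}
    # scipy.misc.imread was removed from recent SciPy releases; load the image with Pillow instead
    back_coloring = np.array(Image.open(backPic))
    wc = WordCloud(font_path='simhei.ttf',  # font file (SimHei)
                   background_color=back, max_words=1300,
                   mask=back_coloring,      # background image used as the mask
                   max_font_size=120,       # largest font size
                   # width and height are ignored when a mask is given
                   margin=3, width=1800, height=800, random_state=42)
    wc.generate_from_frequencies(fre_dict)  # build the cloud from the frequency dict
    wc.to_file(savepath)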

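The resulting image is as follows:

[Figure: word cloud shaped by the background image, produced by simpleWC2]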
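The shape of the cloud comes entirely from the mask array: according to the wordcloud documentation, pure white entries (255) are treated as masked out and every other value is free to draw on. To make that concrete, here is a minimal sketch that builds a circular mask without any image file. The helper `circle_mask` is hypothetical, and it returns plain nested lists so that it runs with the standard library alone; converting the grid with `np.array(grid, dtype=np.uint8)` before passing it as `mask=` is an assumption about how you would wire it in.

```python
def circle_mask(size):
    """Return a size x size grid: 0 inside the circle (drawable), 255 outside (masked out)."""
    center = (size - 1) / 2.0
    radius = size / 2.0
    rows = []
    for y in range(size):
        row = []
        for x in range(size):
            inside = (x - center) ** 2 + (y - center) ** 2 <= radius ** 2
            row.append(0 if inside else 255)
        rows.append(row)
    return rows


grid = circle_mask(400)
# Assumed usage with NumPy and wordcloud installed:
#   mask = np.array(grid, dtype=np.uint8)
#   WordCloud(mask=mask, background_color='black')
```

A background picture works the same way: regions that should stay empty need to be pure white in the image file.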

3. Custom word cloud font colors are implemented as follows:

# custom color list
color_list = ['#CD853F', '#DC143C', '#00FF7F', '#FF6347', '#8B008B', '#00FFFF', '#0000FF', '#8B0000', '#FF8C00',
              '#1E90FF', '#00FF00', '#FFD700', '#008080', '#008B8B', '#8A2BE2', '#228B22', '#FA8072', '#808080']

def simpleWC3(sep=' ', back='black', freDictpath='data_fre.json', savepath='res.png'):
    '''
    Word cloud visualization demo (custom font colors).
    freDictpath is either a {word: count} dict or the path to a text file
    holding one "word<sep>count" pair per line.
    '''
    # build a colormap object from the custom color list
    colormap = colors.ListedColormap(color_list)
    if isinstance(freDictpath, dict):
        fre_dict = freDictpath
    else:
        with open(freDictpath, encoding='utf-8') as f:
            data_list = [one.strip().split(sep) for one in f if one.strip()]
        fre_dict = {one_list[0]: int(one_list[1]) for one_list in data_list}
    wc = WordCloud(font_path='font/simhei.ttf',  # font file (SimHei)
                   background_color=back,        # background color
                   max_words=1300,               # maximum number of words shown
                   max_font_size=120,            # largest font size
                   colormap=colormap,            # custom colormap object
                   margin=2, width=1800, height=800, random_state=42,
                   # try horizontal placement about half the time; a word that does not fit is rotated
                   prefer_horizontal=0.5)
    wc.generate_from_frequencies(fre_dict)
    plt.figure()
    plt.imshow(wc)
    plt.axis("off")
    wc.to_file(savepath)

The resulting image is as follows:

[Figure: word cloud with custom font colors, produced by simpleWC3]
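A colormap only limits which colors can appear; which word gets which color still depends on the random state. If a word should keep the same color from run to run, `WordCloud` also accepts a `color_func` callable, which takes precedence over `colormap`. The sketch below is one possible way to do that under stated assumptions: `make_palette_color_func` is a hypothetical helper, the shortened palette is only for illustration, and the choice of a CRC32 hash to pick the color is mine, not the original author's.

```python
import zlib

palette = ['#CD853F', '#DC143C', '#00FF7F', '#FF6347', '#8B008B', '#00FFFF']


def make_palette_color_func(colors):
    """Build a color function that maps each word to a fixed palette entry."""
    def color_func(word, font_size=None, position=None, orientation=None,
                   random_state=None, **kwargs):
        # crc32 is stable across runs, unlike the built-in hash() for str
        index = zlib.crc32(word.encode('utf-8')) % len(colors)
        return colors[index]
    return color_func


color_func = make_palette_color_func(palette)
print(color_func('python') == color_func('python'))  # True
# Assumed usage: WordCloud(color_func=color_func, ...)
```

The extra keyword arguments are accepted and ignored so that the function can be called with whatever wordcloud passes in, such as the font path.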

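These are the three word cloud visualizations I use most often at work. The complete code follows and can be run directly once the font file and the background image it references are in place: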

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
'''
__Author__: 沂水寒城
Purpose: word cloud visualization module
'''
import numpy as np
from PIL import Image
from matplotlib import colors
import matplotlib.pyplot as plt
from wordcloud import WordCloud

# custom color list
color_list = ['#CD853F', '#DC143C', '#00FF7F', '#FF6347', '#8B008B', '#00FFFF', '#0000FF', '#8B0000', '#FF8C00',
              '#1E90FF', '#00FF00', '#FFD700', '#008080', '#008B8B', '#8A2BE2', '#228B22', '#FA8072', '#808080']

def simpleWC1(sep=' ', back='black', freDictpath='data_fre.json', savepath='res.png'):
    '''
    Word cloud visualization demo.
    freDictpath is either a {word: count} dict or the path to a text file
    holding one "word<sep>count" pair per line.
    '''
    if isinstance(freDictpath, dict):
        fre_dict = freDictpath
    else:
        with open(freDictpath, encoding='utf-8') as f:
            data_list = [one.strip().split(sep) for one in f if one.strip()]
        fre_dict = {one_list[0]: int(one_list[1]) for one_list in data_list}
    wc = WordCloud(font_path='font/simhei.ttf',  # font file (SimHei)
                   background_color=back,        # background color
                   max_words=1300,               # maximum number of words shown
                   max_font_size=120,            # largest font size
                   margin=3,                     # margin
                   width=1800,                   # image width
                   height=800,                   # image height
                   random_state=42)
    wc.generate_from_frequencies(fre_dict)  # build the cloud from the frequency dict
    plt.figure()
    plt.imshow(wc)
    plt.axis("off")
    wc.to_file(savepath)

def simpleWC2(sep=' ', back='black', backPic='a.png', freDictpath='data_fre.json', savepath='res.png'):
    '''
    Word cloud visualization demo (shaped by a background image).
    '''
    if isinstance(freDictpath, dict):
        fre_dict = freDictpath
    else:
        with open(freDictpath, encoding='utf-8') as f:
            data_list = [one.strip().split(sep) for one in f if one.strip()]
        fre_dict = {one_list[0]: int(one_list[1]) for one_list in data_list}
    # scipy.misc.imread was removed from recent SciPy releases; load the image with Pillow instead
    back_coloring = np.array(Image.open(backPic))
    wc = WordCloud(font_path='simhei.ttf',  # font file (SimHei)
                   background_color=back, max_words=1300,
                   mask=back_coloring,      # background image used as the mask
                   max_font_size=120,       # largest font size
                   # width and height are ignored when a mask is given
                   margin=3, width=1800, height=800, random_state=42)
    wc.generate_from_frequencies(fre_dict)  # build the cloud from the frequency dict
    wc.to_file(savepath)

def simpleWC3(sep=' ', back='black', freDictpath='data_fre.json', savepath='res.png'):
    '''
    Word cloud visualization demo (custom font colors).
    '''
    # build a colormap object from the custom color list
    colormap = colors.ListedColormap(color_list)
    if isinstance(freDictpath, dict):
        fre_dict = freDictpath
    else:
        with open(freDictpath, encoding='utf-8') as f:
            data_list = [one.strip().split(sep) for one in f if one.strip()]
        fre_dict = {one_list[0]: int(one_list[1]) for one_list in data_list}
    wc = WordCloud(font_path='font/simhei.ttf',  # font file (SimHei)
                   background_color=back,        # background color
                   max_words=1300,               # maximum number of words shown
                   max_font_size=120,            # largest font size
                   colormap=colormap,            # custom colormap object
                   margin=2, width=1800, height=800, random_state=42,
                   # try horizontal placement about half the time; a word that does not fit is rotated
                   prefer_horizontal=0.5)
    wc.generate_from_frequencies(fre_dict)
    plt.figure()
    plt.imshow(wc)
    plt.axis("off")
    wc.to_file(savepath)

if __name__ == '__main__':
    text = """
The Zen of Python, by Tim Peters
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
"""
    # count how often each whitespace-separated token occurs
    word_list = text.split()
    fre_dict = {}
    for one in word_list:
        if one in fre_dict:
            fre_dict[one] += 1
        else:
            fre_dict[one] = 1
    simpleWC1(sep=' ', back='black', freDictpath=fre_dict, savepath='simpleWC1.png')
    simpleWC2(sep=' ', back='black', backPic='backPic/A.png', freDictpath=fre_dict, savepath='simpleWC2.png')
    simpleWC3(sep=' ', back='black', freDictpath=fre_dict, savepath='simpleWC3.png')
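The main block passes the frequency dict in directly, but the `freDictpath` parameter is also meant to take a file path. Despite the `.json` default name, the loading code in the demos reads plain lines of the form word, separator, count. As a minimal sketch of that file format, the round trip below writes a dict out and reads it back; `save_freq` and `load_freq` are hypothetical helpers, and the file name is only an example.

```python
import os
import tempfile


def save_freq(fre_dict, path, sep=' '):
    """Write one "word<sep>count" pair per line."""
    with open(path, 'w', encoding='utf-8') as f:
        for word, count in fre_dict.items():
            f.write('{}{}{}\n'.format(word, sep, count))


def load_freq(path, sep=' '):
    """Read the file written by save_freq back into a {word: count} dict."""
    fre_dict = {}
    with open(path, encoding='utf-8') as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            # split on the last separator so the count is always the final field
            word, count = line.rsplit(sep, 1)
            fre_dict[word] = int(count)
    return fre_dict


demo = {'better': 8, 'than': 8, 'is': 10}
path = os.path.join(tempfile.mkdtemp(), 'data_fre.txt')
save_freq(demo, path)
print(load_freq(path) == demo)  # True
```

A file written this way could then be handed to any of the three functions through `freDictpath`, with `sep` set to the same separator.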
