Preface
I didn't feel like doing any real work, so... here is a write-up of how to draw a word cloud plot with R or Python.
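Tools
Both R and Python use a tool called wordcloud:
- Python: wordcloud, and its GitHub repository
- R: wordcloud2, wordcloud
R's wordcloud and Python's wordcloud can both save directly to formats such as PNG and PDF. wordcloud2, however, renders through JavaScript, so by default it saves to HTML; webshot can then capture that HTML as a PNG or PDF (or perhaps a pseudo-PDF?).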
Installation
- Python:
pip install wordcloud
or install it with conda;
- R:
install.packages("wordcloud")
or devtools::install_github("ifellows/wordcloud") and devtools::install_github("lchiffon/wordcloud2");
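To confirm that the Python side of the install worked before moving on, a small standard-library sketch can look for the packages without importing them. The helper name `is_installed` is made up for this illustration and is not part of wordcloud.

```python
from importlib.util import find_spec


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found on the current path."""
    # find_spec returns None when no such top-level module exists
    return find_spec(module_name) is not None


# Packages used by the Python example in this post
for name in ("wordcloud", "matplotlib", "pandas"):
    status = "ok" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

Anything reported as missing can be installed with pip or conda as described above.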
Examples
Python wordcloud example
Define the helper functions
from urllib.request import urlopen
from zipfile import ZipFile
from io import BytesIO

import matplotlib.pyplot as plt

# Download a zip file and extract it into a directory
def download_and_unzip(url: str, extract_to: str) -> str:
    http_response = urlopen(url)
    zipfile = ZipFile(BytesIO(http_response.read()))
    zipfile.extractall(extract_to)
    print('Output_path: "{0}"'.format(extract_to))
    return extract_to

# A small plotting helper
def plot_cloud(wordcloud, figsize=(20, 15)):
    # Set figure size
    plt.figure(figsize=figsize)
    # Display image
    plt.imshow(wordcloud)
    # No axis details
    plt.axis("off")
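The download helper wraps the response bytes in BytesIO so that ZipFile can read them without first writing a temporary archive to disk. Here is a minimal sketch of just that step, using a tiny archive built in memory instead of a real download. The function name `unzip_bytes` and the sample file name are assumptions for illustration only.

```python
import tempfile
from io import BytesIO
from pathlib import Path
from zipfile import ZipFile


def unzip_bytes(data: bytes, extract_to: str) -> list:
    """Extract a zip archive held in memory and return its member names."""
    with ZipFile(BytesIO(data)) as archive:
        archive.extractall(extract_to)
        return archive.namelist()


# Build a small archive in memory to stand in for the downloaded bytes
buffer = BytesIO()
with ZipFile(buffer, "w") as archive:
    archive.writestr("sample.csv", "CONTENT\nhello world\n")

# Extract into a temporary directory and read the file back
with tempfile.TemporaryDirectory() as tmp_dir:
    names = unzip_bytes(buffer.getvalue(), tmp_dir)
    extracted_text = (Path(tmp_dir) / "sample.csv").read_text()

print(names)
```

Returning the member names makes it easy to see which CSV files are available before calling pandas on one of them.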
Download and read the data
# Download the data and extract it into the current directory
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/00380/YouTube-Spam-Collection-v1.zip'
download_and_unzip(url, '.')
from wordcloud import WordCloud, STOPWORDS
import matplotlib.pyplot as plt
import pandas as pd
# Read the data (named df so that it does not shadow the pandas module)
df = pd.read_csv(r'Youtube04-Eminem.csv', encoding='latin1')
df.head(5)
Process the data
# Process the data
comment_words = ''
stopwords = set(STOPWORDS)
for val in df.CONTENT:
    # Make sure every value is a string
    val = str(val)
    # Lowercase every token (uppercase would work just as well)
    tokens = [i.lower() for i in val.split()]
    comment_words += " ".join(tokens) + " "
wc = WordCloud(
    width=800,
    height=800,
    background_color="white",
    stopwords=stopwords,
    min_font_size=10,
).generate(comment_words)
Plot it: plot_cloud(wc)
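Word size in the cloud reflects how often each word appears once stopwords are removed. To get a feel for which words are likely to dominate before plotting, the same lowercase-and-split processing can be paired with a frequency count. This is only a sketch: the comments are made up, and the stopword set is a tiny hand-written one, not the STOPWORDS set shipped with wordcloud.

```python
from collections import Counter

# Hypothetical sample comments standing in for the CONTENT column
comments = ["Love this song", "this SONG is great", "Check out my channel"]
# Tiny illustrative stopword set (wordcloud's STOPWORDS is much larger)
stopwords = {"this", "is", "my", "out"}

# Same processing as in the post: cast to str, split, lowercase
tokens = []
for comment in comments:
    tokens.extend(word.lower() for word in str(comment).split())

# Count every token that is not a stopword
frequencies = Counter(t for t in tokens if t not in stopwords)
print(frequencies.most_common(3))
```

A mapping like this can also be passed to WordCloud.generate_from_frequencies instead of passing raw text to generate. wordcloud's own tokenization differs in some details, so treat these counts as a rough guide.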
You can also do this!
Use an image as the mask that the words fill in. The code:
from urllib.request import urlretrieve
import numpy as np
from PIL import Image
# Download the mask image
img_url = "https://amueller.github.io/word_cloud/_images/sphx_glr_masked_002.png"
urlretrieve(img_url, "thumbs-up.png")
# Import image to np.array
mask = np.array(Image.open("thumbs-up.png"))
# Generate wordcloud (when a mask is given, its shape is used and
# height/width are ignored)
wc = WordCloud(
    height=800,
    width=600,
    background_color="white",
    mask=mask,
    stopwords=stopwords,
    contour_width=3,
    contour_color="steelblue",
).generate(comment_words)
# Plot
plot_cloud(wc, figsize=(10, 10))
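A mask does not have to come from an image file. In wordcloud, pure white (255) mask entries are left empty and words are drawn everywhere else, so a NumPy array can also be built by hand. The sketch below creates a circular mask; the size and radius are arbitrary values chosen for illustration.

```python
import numpy as np

size, radius = 300, 130
center = size // 2

# Open grids of row and column indices that broadcast to (size, size)
rows, cols = np.ogrid[:size, :size]
outside_circle = (rows - center) ** 2 + (cols - center) ** 2 > radius ** 2

# 255 (white) marks the area to leave empty; 0 marks where words may go
mask = 255 * outside_circle.astype(np.uint8)
print(mask.shape, mask.dtype)
```

Passing mask=mask to WordCloud(...) should then work the same way as with the image-based mask above.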
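R wordcloud
Examples
I found an R wordcloud example on GitHub: https://gist.github.com/emres/3424557. It was written ten years ago! Ten years ago I had not even heard of R or Python (ten years ago I did not know who I was either, and I still don't). If you are interested, follow the link to learn more.
Here I will just cover wordcloud2, going straight to the code: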
# Load the library
library(wordcloud2)
# Take a look at the sample data
# head(demoFreq)
# Basic plot
wordcloud2(data = demoFreq, size = 1.6)
# Letter-shaped clouds
letterCloud(demoFreq, word = "R", color = "random-light", backgroundColor = "black")
letterCloud(demoFreq, word = "PEACE", color = "white", backgroundColor = "pink")
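Plotting is easy: as long as your data mimics the content and format of the sample data, it just works. The main issue is saving. The result cannot be saved directly to an ordinary image format, so it has to be saved as HTML first: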
# Load wordcloud2 and webshot
library(wordcloud2)
library(webshot)
# Install PhantomJS (only needed once)
webshot::install_phantomjs()
my_graph <- wordcloud2(demoFreq, size = 1.5)
# Save as HTML
library("htmlwidgets")
saveWidget(my_graph, "tmp.html", selfcontained = FALSE)
# Capture the HTML as PNG or PDF
webshot("tmp.html", "fig_1.pdf", delay = 5, vwidth = 480, vheight = 480)
References
https://github.com/Lchiffon/wordcloud2
https://amueller.github.io/word_cloud/
https://towardsdatascience.com/create-a-word-cloud-with-r-bde3e7422e8a