A word cloud is an effective way to visualize text. From a pool of text, you can see at a glance which words dominate, so just by looking at the visualization you get a sense of the overall tone of the text. Word clouds are also fun and engaging visuals. In this article, I am going to explain how to generate a word cloud using a Python module called WordCloud. It is simple and easy. I will start with a simple word cloud and then show some cool custom shapes.
Setup
For this tutorial, I will use a dataset from Kaggle. Please feel free to download the dataset and follow along:
To use the WordCloud module, you need to install it. That can be done by using the pip install command:
pip install wordcloud
Anaconda users can install it with:
conda install -c conda-forge wordcloud
The tools to be used:
Jupyter Notebook environment
Please make sure that you have them installed.
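If you want to confirm the setup before going further, a quick check along these lines may help. This is only a sketch: the helper name `is_installed` is hypothetical, and the list of package names is an assumption based on the imports used later in this tutorial.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found in the current environment."""
    return importlib.util.find_spec(module_name) is not None


# Assumed list of packages, taken from the imports used in this tutorial
for name in ("wordcloud", "numpy", "pandas", "matplotlib", "PIL"):
    status = "installed" if is_installed(name) else "MISSING"
    print(f"{name}: {status}")
```

Any package reported as missing can be installed with pip or conda in the same way as wordcloud.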
Simple Word Cloud
The simplest version is very easy to build. First, import the necessary packages and load the dataset.
import numpy as np
import pandas as pd
from os import path
from PIL import Image
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator
import matplotlib.pyplot as plt
%matplotlib inline
df = pd.read_csv("winemag-data-130k-v2.csv", index_col=0)
This dataset contains descriptions of wines from different countries, along with some other information. For this tutorial, I will focus only on the description column because it contains a good amount of text. I will join all the descriptions into one large text.
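Joining the descriptions could look roughly like the sketch below. The helper name `join_descriptions` and the sample strings are made up for illustration. The helper skips missing values on the assumption that some rows may have no description.

```python
def join_descriptions(descriptions):
    """Join text entries into one string, skipping missing or non-text values."""
    # Keep only real strings (drops None and NaN), and trim surrounding whitespace
    cleaned = (d.strip() for d in descriptions if isinstance(d, str))
    # Drop entries that are empty after trimming, then join with single spaces
    return " ".join(d for d in cleaned if d)


# Hypothetical sample standing in for the description column
sample = ["Aromas of ripe fruit.", None, float("nan"), "  Crisp and dry finish. "]
text = join_descriptions(sample)
print(text)
```

With the dataset loaded as df, the same helper can be called as text = join_descriptions(df["description"]), since iterating over a pandas column yields its values one by one.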