Word clouds are often dismissed as clunky and old-fashioned. In reality, they can be an elegant and creative way to communicate text, both in exploratory analysis and in presentation. They’re also very easy to create in Python, so let’s get into it!
Let’s copy the content of this COVID-19 article and paste it into a text file named `covid_article.txt`. The content of this text file will then be stored in a variable named `content`.
with open("covid_article.txt") as f:  # the with-block closes the file automatically
    content = f.read()
To make sure that variants of the same word are counted as one word, we need to remove punctuation and capitalization, so that ‘hello’ is the same as ‘Hello’, which is the same as ‘hello!’. We also need to make sure that all characters are alphabetic; we can accomplish this with a list comprehension (or, alternatively, with regular expressions).
import string
for punc_char in string.punctuation:
    content = content.replace(punc_char, '')  # remove punctuation
content = content.lower()  # make lowercase
# keep only lowercase letters and spaces
content = ''.join([char for char in content if char in ' abcdefghijklmnopqrstuvwxyz'])
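The regular-expression alternative mentioned above might look roughly like the sketch below. The helper name `normalize` is hypothetical and not part of the original article. Note one deliberate difference from the loop above: runs of non-letters are replaced with a single space rather than deleted, which is an assumption on my part about the desired behavior, made so that words separated only by a line break or a hyphen are not glued together.

```python
import re

def normalize(text):
    """Lowercase the text and keep only the letters a-z, separated by single spaces.

    Hypothetical helper for illustration; not taken from the original article.
    """
    text = text.lower()
    # Replace every run of non-letter characters (punctuation, digits,
    # newlines, repeated spaces) with one space.
    text = re.sub(r"[^a-z]+", " ", text)
    # Drop any leading or trailing space left over from the substitution.
    return text.strip()

print(normalize("Hello, hello! HELLO"))
```

With this approach, `content = normalize(content)` would stand in for the punctuation loop, the lowercasing, and the list comprehension. A side effect to be aware of is that contractions are split, so "don't" becomes "don t".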
![Image for post](https://i-blog.csdnimg.cn/blog_migrate/815f7b5352ebc52020474c20c4099b09.png)
A few small things clearly still need fixing, but the result is essentially a string of words, so we’ll move on for now. We’ll need to import the `wordcloud` module (install it with `pip install wordcloud`) and the `matplotlib` library to display the image.
from wordcloud import WordCloud
import matplotlib.pyplot as plt