An easy way to make word clouds for data scientists

Original article: https://www.freecodecamp.org/news/word-cloud-for-data-scientists-76b8a907e04e/

by Kavita Ganesan

About a year ago, I looked high and low for a Python word cloud library that I could use from within my Jupyter notebook. I needed it to be flexible enough to weight words by counts or by TF-IDF when needed, or to simply accept a set of words and their corresponding weights.

I was a bit surprised that something like that did not already exist within libraries like plotly. All I wanted to do was to get a quick understanding of my text data and word vectors. I thought that was probably not too much to ask…

Here I am a year later, using my own word_cloud visualization library. It's not the prettiest or the most sophisticated, but it works for most cases. I decided to share it so that others could use it as well. After installation, here are a few ways you can use it.

Generate word clouds with a single text document

This example shows how you can generate a word cloud from just one document. The colors can be randomized, but in this example they are based on the default color settings.
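
The library's own API is not reproduced here. As an illustration of the idea only, the sketch below builds a word cloud from one document using just the Python standard library: it counts the words, scales the font size of the top words linearly between a minimum and a maximum, and colors them from a fixed default palette or at random. The function name `word_cloud_html`, the tiny stopword list, the palette and the pixel range are all hypothetical choices.

```python
import html
import random
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real projects use a fuller list).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "for", "on", "with"}

# Hypothetical default palette, cycled in rank order when colors are not randomized.
DEFAULT_COLORS = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd"]


def tokenize(text):
    """Lowercase the text, split it into word tokens and drop stopwords."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def word_cloud_html(text, topn=20, min_px=12, max_px=48, random_color=False, seed=None):
    """Render the topn most frequent words of one document as HTML spans.

    Font size grows linearly with the word count; colors come from the
    default palette unless random_color=True.
    """
    counts = Counter(tokenize(text)).most_common(topn)
    if not counts:
        return "<div></div>"
    rng = random.Random(seed)
    hi, lo = counts[0][1], counts[-1][1]
    spans = []
    for rank, (word, count) in enumerate(counts):
        # Map the count linearly onto [min_px, max_px]; if all counts are equal, use max_px.
        scale = (count - lo) / (hi - lo) if hi > lo else 1.0
        size = round(min_px + scale * (max_px - min_px))
        if random_color:
            color = "#{:06x}".format(rng.randrange(0x1000000))
        else:
            color = DEFAULT_COLORS[rank % len(DEFAULT_COLORS)]
        spans.append(f'<span style="font-size:{size}px;color:{color}">{html.escape(word)}</span>')
    return "<div>" + " ".join(spans) + "</div>"
```

In a Jupyter notebook, the returned string can be rendered with `IPython.display.HTML(word_cloud_html(text))`.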

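By default, words are weighted by their counts unless you explicitly ask for TF-IDF weighting. TF-IDF weighting only makes sense when you start with many documents: its IDF part boosts words that appear in few documents of the collection, so with a single document there is nothing to compare against.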
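
To see why TF-IDF needs a collection, here is a small plain-Python sketch (not the library's code) comparing count weighting with one textbook TF-IDF variant: a term's weight is its count in each document times `log(N / df)`, summed over the documents, where `N` is the number of documents and `df` is how many of them contain the term. The helpers `count_weights` and `tfidf_weights` are hypothetical names. With a single document, every term has `df = N = 1`, so every IDF is `log(1) = 0`. Libraries such as scikit-learn use smoothed IDF variants that avoid the zeros, but with one document they still give every term the same IDF, so the ranking is just the count ranking.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into word tokens (no stopword removal here)."""
    return re.findall(r"[a-z']+", text.lower())


def count_weights(docs):
    """Default weighting: raw term counts summed over all documents."""
    total = Counter()
    for doc in docs:
        total.update(tokenize(doc))
    return dict(total)


def tfidf_weights(docs):
    """TF-IDF weighting: sum over documents of count(t, d) * log(N / df(t)).

    df(t) is the number of documents containing t, so a term that occurs in
    every document gets idf = log(1) = 0 and drops out of the cloud.
    """
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))  # each document counts at most once per term
    weights = Counter()
    for tokens in tokenized:
        for term, tf in Counter(tokens).items():
            weights[term] += tf * math.log(n_docs / df[term])
    return dict(weights)
```

On three short documents, "the" appears everywhere and gets weight 0 under TF-IDF, while the rarer "dog" outranks "cat"; on a single document, every TF-IDF weight is 0.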

Generate word clouds from multiple documents

Let’s say you have 100 documents from one news category, and you just want to see what the common mentions are.
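
One simple way to surface common mentions across many documents, sketched here as an assumption rather than the library's method, is to rank words by how many documents mention them (document frequency) instead of by total count, so that a single long article cannot dominate the cloud. The helper `common_mentions` and its stopword list are hypothetical.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not a complete list).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "on", "for", "was"}


def common_mentions(docs, topn=10):
    """Return the topn (word, n_docs) pairs, ranked by how many documents mention each word.

    Each word is counted at most once per document; ties are broken
    alphabetically so the output is deterministic.
    """
    doc_freq = Counter()
    for doc in docs:
        words = set(re.findall(r"[a-z']+", doc.lower())) - STOPWORDS
        doc_freq.update(words)
    ranked = sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:topn]
```

The resulting (word, document count) pairs can then be normalized and mapped to font sizes like any other weights.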

Generate word clouds from existing weights

Let’s say you have a set of words with corresponding weights, and you just want to visualize them. All you need to do is make sure the weights are normalized to the range [0, 1].
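
If your weights are not already in [0, 1], min-max scaling is one straightforward way to get them there. The sketch below uses a hypothetical helper, `normalize_weights`, written in plain Python.

```python
def normalize_weights(weights):
    """Min-max scale a {word: weight} mapping into the range [0, 1].

    The largest weight maps to 1.0 and the smallest to 0.0; if every weight
    is equal, all words get 1.0 so none of them disappears.
    """
    if not weights:
        return {}
    lo = min(weights.values())
    hi = max(weights.values())
    if hi == lo:
        return {word: 1.0 for word in weights}
    return {word: (w - lo) / (hi - lo) for word, w in weights.items()}
```

Min-max scaling maps the smallest weight to 0, which can make that word vanish from the cloud. For non-negative weights, dividing every weight by the maximum also lands in [0, 1] while preserving the ratios between words.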

Hope you find this useful! Please feel free to propose changes that prettify the output: just open a pull request with your changes.
