WordCloud, a Word Cloud Generation Library, Explained (Part 1): Overview and the ImageColorGenerator Class

mighty13

Category: Matplotlib. Tags: matplotlib, wordcloud, word cloud, ImageColor, color

Article link: https://blog.csdn.net/mighty13/article/details/116854437

Current wordcloud version: 1.8.1
Project repository: https://github.com/amueller/word_cloud
API reference and examples: https://amueller.github.io/word_cloud/

Introduction to `wordcloud`

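wordcloud is a lightweight Python library for generating word clouds, and it is widely used in Python data analysis.
Note: out of the box, wordcloud handles English text automatically. To build a word cloud from Chinese text, you need a Chinese word segmenter (such as jieba) and a Chinese font.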

The main third-party dependencies of wordcloud are numpy, PIL (Pillow), and matplotlib.

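Where possible, install wordcloud with a package manager such as pip or Anaconda (conda) rather than building it from source, because a source build may run into compilation problems.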
pip install wordcloud

Package structure of `wordcloud`

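The wordcloud package has a simple layout and contains only the following 10 files.

__init__.py: Defines the public namespace of the package.
_version.py: Version information.
color_from_image.py: Generates colors from an image.
wordcloud.py: The main word cloud generation interface.
__main__.py: Supports the python -m wordcloud command for the CLI.
wordcloud_cli.py: The wordcloud command-line interface.
DroidSansMono.ttf: The built-in default font.
stopwords: The built-in English stop word list.
tokenization.py: The built-in English tokenizer.
query_integral_image.cp37-win_amd64.pyd: A compiled extension module (this file name is the build for CPython 3.7 on 64-bit Windows).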

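Reading the source code

`__init__.py`

This module imports WordCloud, STOPWORDS, random_color_func, get_single_color_func, and ImageColorGenerator from wordcloud.py and color_from_image.py. It also reads the version information from _version.py and assigns it to the variable __version__.

The public interface of the package is defined as: WordCloud, STOPWORDS, random_color_func, get_single_color_func, ImageColorGenerator, and __version__.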

from .wordcloud import (WordCloud, STOPWORDS, random_color_func,
                        get_single_color_func)
from .color_from_image import ImageColorGenerator

__all__ = ['WordCloud', 'STOPWORDS', 'random_color_func',
           'get_single_color_func', 'ImageColorGenerator',
           '__version__']

from ._version import get_versions
__version__ = get_versions()['version']
del get_versions

`_version.py`

The version information is stored in the JSON string version_json and converted to a Python object with the json library.

import json

version_json = '''
{
 "date": "2020-11-11T13:35:51-0800",
 "dirty": false,
 "error": null,
 "full-revisionid": "b6f48e108224f84b0b1659cea8558c86ccfc9898",
 "version": "1.8.1"
}
'''  # END VERSION_JSON


def get_versions():
    return json.loads(version_json)
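
As a minimal sketch of how this metadata might be consumed, the snippet below parses the same JSON string and turns the version into a tuple of integers so it can be compared. The helper version_tuple is a hypothetical name introduced here for illustration; it is not part of wordcloud, and it assumes a purely numeric, dot-separated version string.

```python
import json

# Same JSON payload as in _version.py.
version_json = '''
{
 "date": "2020-11-11T13:35:51-0800",
 "dirty": false,
 "error": null,
 "full-revisionid": "b6f48e108224f84b0b1659cea8558c86ccfc9898",
 "version": "1.8.1"
}
'''


def get_versions():
    return json.loads(version_json)


def version_tuple(version):
    # Hypothetical helper: "1.8.1" -> (1, 8, 1).
    # Assumes every dot-separated part is an integer.
    return tuple(int(part) for part in version.split("."))


info = get_versions()
print(info["version"])                               # 1.8.1
print(info["dirty"], info["error"])                  # False None
print(version_tuple(info["version"]) >= (1, 8, 0))   # True
```

JSON false and null become Python False and None, which is why the dirty and error fields can be tested directly.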

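`color_from_image.py`

This module contains only the ImageColorGenerator class.

Given information about a word and the image data (a numpy array), ImageColorGenerator computes the average color of the region of the source image that the word occupies.

Note: instances of ImageColorGenerator are callable objects.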
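
Because instances are callable, they can be used wherever a plain color function is expected; in wordcloud that is typically the color_func argument of WordCloud or of WordCloud.recolor. The sketch below illustrates the idea with a hypothetical FixedColor class (not part of wordcloud) that accepts the same call arguments and returns a color string in the same format.

```python
class FixedColor:
    """Hypothetical stand-in for ImageColorGenerator: always returns one color."""

    def __init__(self, rgb):
        self.rgb = rgb

    def __call__(self, word, font_size, font_path, position, orientation, **kwargs):
        # Same return format as ImageColorGenerator: an "rgb(r, g, b)" string.
        return "rgb(%d, %d, %d)" % tuple(self.rgb)


color_func = FixedColor((16, 209, 30))
print(callable(color_func))                         # True
print(color_func("ab", 10, None, (50, 50), None))   # rgb(16, 209, 30)
```

The **kwargs parameter lets the object tolerate extra keyword arguments that a caller may pass along.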

The class signature is class wordcloud.ImageColorGenerator(image, default_color=None).

The class defines only two methods: the constructor __init__ and the call method __call__.

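Constructor `__init__`

The constructor signature is def __init__(self, image, default_color=None).
Parameters:

image: The image data. An ndarray. Required.
default_color: The fallback color returned when the wordcloud canvas is larger than image. An RGB 3-tuple or None. The default is None, in which case an exception is raised instead of returning a fallback color when the canvas is larger than the image.

The constructor checks the ndim and shape attributes of image. As in matplotlib, only the following three layouts are accepted.

A shape of (M, N): a grayscale image with M rows and N columns.
A shape of (M, N, 3): an RGB image with M rows and N columns.
A shape of (M, N, 4): an RGBA image with M rows and N columns.

In __call__, the alpha channel of an RGBA image is dropped, so the color data is always processed with 3 channels. Note that, in the source listed at the end of this article, a grayscale image passes the constructor check but __call__ raises NotImplementedError for it.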
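
The shape rules can be tried out without wordcloud installed. The following is a minimal sketch that mirrors the two constructor checks on plain numpy arrays; check_image_shape is a hypothetical name used only for this illustration, and its error messages are shortened.

```python
import numpy as np


def check_image_shape(image):
    # Mirrors the two checks in ImageColorGenerator.__init__.
    if image.ndim not in (2, 3):
        raise ValueError("need ndim 2 or 3, got %d" % image.ndim)
    if image.ndim == 3 and image.shape[2] not in (3, 4):
        raise ValueError("need 3 or 4 channels, got %d" % image.shape[2])
    return image.shape


print(check_image_shape(np.zeros((4, 5))))      # (4, 5)      grayscale
print(check_image_shape(np.zeros((4, 5, 3))))   # (4, 5, 3)   RGB
print(check_image_shape(np.zeros((4, 5, 4))))   # (4, 5, 4)   RGBA

try:
    check_image_shape(np.zeros((4, 5, 2)))
except ValueError as exc:
    print("rejected:", exc)   # rejected: need 3 or 4 channels, got 2
```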

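Call method `__call__`

The call signature is def __call__(self, word, font_size, font_path, position, orientation, **kwargs).

Parameters:

word: The word. A string.
font_size: The font size. An integer.
font_path: The path to the font file. A string or a Path object.
position: The position of the word, in pixels. A 2-tuple of integers.
orientation: The orientation of the word. It is passed through to ImageFont.TransposedFont as its orientation argument (a string in the example below).

word, font_size, font_path, and orientation determine the size of the word's bounding box. Combined with position, they determine the exact region that the word occupies in the image.

According to the source, __call__ first measures the word's bounding box and cuts the corresponding patch out of the image. It then checks the number of dimensions of the patch: a 3-D patch is trimmed to its first three channels (dropping any alpha channel), while a 2-D (grayscale) patch raises NotImplementedError. Finally it checks whether the word falls outside the image, that is, whether the patch is empty. If so, it returns default_color, or raises ValueError when default_color is None. Otherwise it returns the mean color of the patch, formatted as a string such as rgb(16, 209, 30).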
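
The averaging step can be sketched with numpy alone. The snippet below is an illustration under stated assumptions, not the library's implementation: average_patch_color is a hypothetical helper, and the box size is passed in directly instead of being measured from a font, so no font file is needed.

```python
import numpy as np


def average_patch_color(image, position, box_size, default_color=None):
    # Hypothetical helper: box_size replaces the font measurement step.
    x, y = position
    # Cut out the patch under the word box (same indexing as the source).
    patch = image[x:x + box_size[0], y:y + box_size[1]]
    if patch.ndim == 2:
        raise NotImplementedError("grayscale images are not handled")
    # Drop the alpha channel if present and flatten to (K, 3).
    pixels = patch[:, :, :3].reshape(-1, 3)
    if pixels.shape[0] == 0:
        # The box lies outside the image, so the patch is empty.
        if default_color is None:
            raise ValueError("image is smaller than the canvas")
        return "rgb(%d, %d, %d)" % tuple(default_color)
    return "rgb(%d, %d, %d)" % tuple(pixels.mean(axis=0))


# A 100x100 RGBA test image: first 50 rows pure red, last 50 rows pure blue.
image = np.zeros((100, 100, 4), dtype=np.uint8)
image[:50, :, 0] = 255
image[50:, :, 2] = 255
image[:, :, 3] = 255

print(average_patch_color(image, (10, 10), (20, 30)))   # rgb(255, 0, 0)
print(average_patch_color(image, (40, 0), (20, 10)))    # rgb(127, 0, 127)
print(average_patch_color(image, (200, 200), (20, 30),
                          default_color=(0, 0, 0)))     # rgb(0, 0, 0)
```

The second call straddles the red and blue halves equally, so each channel averages to 127.5 and the %d format truncates it to 127.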

Example: using the `ImageColorGenerator` class

The input image is wx.jpg (image not reproduced here).

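import matplotlib.pyplot as plt
import numpy as np
from PIL import ImageColor
from wordcloud import ImageColorGenerator

# Load the image as a numpy array
data = plt.imread('wx.jpg')
# Build the ImageColorGenerator object
image_colors = ImageColorGenerator(data)
# Call the object; the return value is a string of the form "rgb(r, g, b)".
# font_path='simhei' assumes the SimHei font is installed where Pillow can find it.
color = image_colors(word="ab", font_size=10, font_path='simhei',
                     position=(50, 50), orientation="horizontal")
# Print the returned color string
print(color)
# Convert the color string into an RGB 3-tuple
color_rgb = ImageColor.getrgb(color)
# Show the final color as the background of an empty axes
plt.subplot(facecolor=np.array(color_rgb)/255)
plt.show()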

Output:

rgb(16, 209, 30)

(Figure: an empty axes filled with the color above; image not reproduced here.)

Source code of the `ImageColorGenerator` class

class ImageColorGenerator(object):
    # returns the average color of the image in that region
    def __init__(self, image, default_color=None):
        if image.ndim not in [2, 3]:
            raise ValueError("ImageColorGenerator needs an image with ndim 2 or"
                             " 3, got %d" % image.ndim)
        if image.ndim == 3 and image.shape[2] not in [3, 4]:
            raise ValueError("A color image needs to have 3 or 4 channels, got %d"
                             % image.shape[2])
        self.image = image
        self.default_color = default_color

    def __call__(self, word, font_size, font_path, position, orientation, **kwargs):
        """Generate a color for a given word using a fixed image."""
        # get the font to get the box size
        font = ImageFont.truetype(font_path, font_size)
        transposed_font = ImageFont.TransposedFont(font,
                                                   orientation=orientation)
        # get size of resulting text
        box_size = transposed_font.getsize(word)
        x = position[0]
        y = position[1]
        # cut out patch under word box
        patch = self.image[x:x + box_size[0], y:y + box_size[1]]
        if patch.ndim == 3:
            # drop alpha channel if any
            patch = patch[:, :, :3]
        if patch.ndim == 2:
            raise NotImplementedError("Gray-scale images TODO")
        # check if the text is within the bounds of the image
        reshape = patch.reshape(-1, 3)
        if not np.all(reshape.shape):
            if self.default_color is None:
                raise ValueError('ImageColorGenerator is smaller than the canvas')
            return "rgb(%d, %d, %d)" % tuple(self.default_color)
        color = np.mean(reshape, axis=0)
        return "rgb(%d, %d, %d)" % tuple(color)
