A Complete Tutorial on the Python `wordcloud` Library

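This is a complete wordcloud tutorial covering the basics, advanced techniques, and practical applications. Each code snippet runs on its own, provided the input files and fonts it references are available.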



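1. Introduction and Installation

1.1 About wordcloud

wordcloud is a Python library for generating word clouds, which are useful for text analysis and data visualization.

1.2 What can wordcloud do?

  • Visualize the most frequent words in a text
  • Support different shapes, colors, and styles
  • Work with matplotlib and PIL (Pillow) for display and styling

1.3 Installing wordcloud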

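pip install wordcloud matplotlib numpy pillow jieba pandas

numpy and pillow are used for the mask examples (sections 4 and 6). jieba (Chinese word segmentation) and pandas are only needed for the examples in section 7.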

2. Generating a Basic Word Cloud

2.1 Creating a simple word cloud

from wordcloud import WordCloud
import matplotlib.pyplot as plt

text = "Python WordCloud Example Generate Word Cloud Text Visualization"
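# generate() splits the text into words, counts them, and lays them out by frequency
wordcloud = WordCloud(width=800, height=400, background_color="white").generate(text)

plt.figure(figsize=(10, 5))
plt.imshow(wordcloud, interpolation="bilinear")  # bilinear interpolation smooths the image
plt.axis("off")  # hide the axes and ticks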
plt.show()

Output: a word cloud on a white background. Words such as Python and WordCloud are sized according to their frequency.


2.2 Generating a word cloud from a text file

from wordcloud import WordCloud
import matplotlib.pyplot as plt

with open("sample.txt", "r", encoding="utf-8") as file:
    text = file.read()

wordcloud = WordCloud(width=800, height=400, background_color="white").generate(text)

plt.figure(figsize=(10, 5))
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.show()

Output: reads the text from sample.txt (a UTF-8 file in the working directory) and generates a word cloud.


2.3 Fixing garbled Chinese characters

from wordcloud import WordCloud
import matplotlib.pyplot as plt

text = "中文 词云 示例 生成 可视化 词云 词云"

# The default bundled font has no Chinese glyphs, so font_path must name a Chinese font file
wordcloud = WordCloud(font_path="simhei.ttf", width=800, height=400, background_color="white").generate(text)

plt.figure(figsize=(10, 5))
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.show()

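Output: the Chinese words render in the SimHei font from simhei.ttf instead of appearing as empty boxes. If the font file cannot be found, pass the full path to a Chinese font installed on your system.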
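`font_path` must point to a font file that actually exists, and font locations differ between operating systems. The sketch below returns the first font file it finds, or None. The candidate paths are assumptions based on common default install locations and are not guaranteed to exist on every machine.

```python
import os

# Hypothetical candidate list of common CJK font locations; edit for your machine
CANDIDATE_FONTS = [
    "simhei.ttf",                                              # font file next to the script
    "C:/Windows/Fonts/simhei.ttf",                             # Windows: SimHei
    "C:/Windows/Fonts/msyh.ttc",                               # Windows: Microsoft YaHei
    "/System/Library/Fonts/PingFang.ttc",                      # macOS: PingFang
    "/usr/share/fonts/opentype/noto/NotoSansCJK-Regular.ttc",  # Linux: Noto Sans CJK
    "/usr/share/fonts/truetype/wqy/wqy-microhei.ttc",          # Linux: WenQuanYi Micro Hei
]


def find_cjk_font(candidates=CANDIDATE_FONTS):
    """Return the first path in candidates that is an existing file, else None."""
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


font = find_cjk_font()
if font is None:
    print("No CJK font found; set font_path to a Chinese font manually.")
else:
    print("Using font:", font)
```

When a path is found, it can be passed as `WordCloud(font_path=font, ...)`. If the result is None, stop with an error instead of passing None, because wordcloud would then fall back to its default font and the boxes would come back.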


3. Customizing Word Cloud Style

3.1 Setting the background color and color scheme

from wordcloud import WordCloud
import matplotlib.pyplot as plt

text = "Python Data Science Visualization AI Machine Learning"

wordcloud = WordCloud(width=800, height=400, background_color="black", colormap="cool").generate(text)

plt.figure(figsize=(10, 5))
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.show()

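Output: a black background, with words colored by the matplotlib colormap "cool". The colormap parameter accepts any matplotlib colormap name.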
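For full control over colors, `WordCloud` also accepts a `color_func`. wordcloud calls it once per word with `(word, font_size, position, orientation, random_state=None, **kwargs)` and expects a color string that PIL understands. The grey-scale function below is an illustrative sketch. Its name and lightness range are arbitrary choices.

```python
import random


def grey_color_func(word, font_size, position, orientation, random_state=None, **kwargs):
    """Return a random light grey as an HSL string; wordcloud passes a random.Random as random_state."""
    rng = random_state if random_state is not None else random.Random()
    lightness = rng.randint(60, 100)
    return "hsl(0, 0%%, %d%%)" % lightness


# Calling it directly shows the kind of value wordcloud receives for each word
print(grey_color_func("Python", 40, (0, 0), None, random.Random(0)))
```

To use it, pass it in when creating the cloud: `WordCloud(background_color="black", color_func=grey_color_func)`. You can also recolor an existing cloud without recomputing the layout: `wordcloud.recolor(color_func=grey_color_func)`.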


3.2 Controlling font size and spacing

from wordcloud import WordCloud
import matplotlib.pyplot as plt

text = "Python WordCloud Example Visualization Font Size"

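# English text needs no extra font; margin controls the spacing between words
wordcloud = WordCloud(width=800, height=400, max_font_size=100, min_font_size=10, margin=10).generate(text)

plt.figure(figsize=(10, 5))
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.show()

Output: font sizes are kept between min_font_size=10 and max_font_size=100 so that frequent words stand out. margin=10 (the default is 2) leaves more space between words.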
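`max_font_size` and `min_font_size` only bound the size range. How sizes fall within that range is controlled by `relative_scaling`, which resolves to 0.5 by default in recent versions. The sketch below is a simplified re-implementation of that size rule, for illustration only. It assumes frequencies are normalized so the top word is 1.0, and it ignores the step-by-step shrinking the real layout applies when a word does not fit.

```python
def scaled_sizes(frequencies, max_font_size=100, min_font_size=10, relative_scaling=0.5):
    """Simplified sketch of wordcloud's font-size rule (ignores layout-driven shrinking).

    Each word's size is the previous word's size times
    relative_scaling * (freq / previous_freq) + (1 - relative_scaling).
    """
    items = sorted(frequencies.items(), key=lambda kv: kv[1], reverse=True)
    top = float(items[0][1])
    sizes = {}
    font_size = max_font_size
    last_freq = 1.0
    for word, count in items:
        freq = count / top  # normalize so the most frequent word is 1.0
        if relative_scaling != 0:
            font_size = int(round((relative_scaling * (freq / last_freq) + (1 - relative_scaling)) * font_size))
        if font_size < min_font_size:
            break  # stop once words would be smaller than min_font_size
        sizes[word] = font_size
        last_freq = freq
    return sizes


print(scaled_sizes({"Python": 50, "数据分析": 40, "机器学习": 30, "AI": 25}))
```

With `relative_scaling=0`, only the rank order matters, and sizes shrink only when words run out of space. With `relative_scaling=1`, a word twice as frequent gets roughly twice the size.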


4. Shapes and Masks

4.1 Word cloud with a custom shape

import numpy as np
from PIL import Image
from wordcloud import WordCloud
import matplotlib.pyplot as plt

# Pure white pixels (255) in the mask are excluded; words are drawn only on non-white areas
mask = np.array(Image.open("cloud_shape.png"))

text = "Python Data Science Visualization AI Machine Learning"

wordcloud = WordCloud(mask=mask, background_color="white").generate(text)

plt.figure(figsize=(10, 5))
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.show()

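Output: the word cloud fills the shape of cloud_shape.png. Pure white pixels are treated as background and stay empty. In a PNG with a transparent background, the transparent pixels are often stored as black. Flatten such an image onto white first, or those areas will be filled with words too.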
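If you do not have a shape image at hand, you can build a mask with NumPy. In this sketch, pixels outside a circle are set to 255 (blocked) and pixels inside to 0 (drawable). The size and margin values are arbitrary choices.

```python
import numpy as np


def circle_mask(size=400, margin=10):
    """Return a size x size uint8 mask: 0 inside a circle (drawable), 255 outside (blocked)."""
    radius = size / 2 - margin
    center = (size - 1) / 2
    y, x = np.ogrid[:size, :size]
    outside = (x - center) ** 2 + (y - center) ** 2 > radius ** 2
    mask = np.zeros((size, size), dtype=np.uint8)
    mask[outside] = 255
    return mask


mask = circle_mask()
print(mask.shape, mask[200, 200], mask[0, 0])
```

Pass the result as `WordCloud(mask=circle_mask(), background_color="white")`. When a mask is given, wordcloud takes the canvas size from the mask and ignores `width` and `height`.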


5. Processing Text Data

5.1 Filtering stopwords

from wordcloud import WordCloud
import matplotlib.pyplot as plt

text = "词云 可视化 示例 词云 生成 代码 Python"

stopwords = {"词云", "示例"}  # words to exclude; this replaces wordcloud's default English stopword list

wordcloud = WordCloud(stopwords=stopwords, font_path="simhei.ttf", width=800, height=400).generate(text)

plt.figure(figsize=(10, 5))
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.show()

Output: "词云" and "示例" no longer appear in the word cloud.


5.2 Word frequency counts and custom weights

from wordcloud import WordCloud
import matplotlib.pyplot as plt

word_freq = {"Python": 50, "数据分析": 40, "机器学习": 30, "AI": 25}

wordcloud = WordCloud(font_path="simhei.ttf", width=800, height=400).generate_from_frequencies(word_freq)

plt.figure(figsize=(10, 5))
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.show()

Output: word sizes follow the weights in word_freq, so Python is the largest.


6. Advanced Techniques

6.1 Coloring words from an image

import numpy as np
from PIL import Image
from wordcloud import WordCloud, ImageColorGenerator
import matplotlib.pyplot as plt

mask = np.array(Image.open("cloud_shape.png"))
# ImageColorGenerator colors each word with the average color of the image area beneath it
image_colors = ImageColorGenerator(mask)

text = "Python Data Science Visualization AI Machine Learning"

wordcloud = WordCloud(mask=mask, background_color="white").generate(text)

plt.figure(figsize=(10, 5))
plt.imshow(wordcloud.recolor(color_func=image_colors), interpolation="bilinear")
plt.axis("off")
plt.show()

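Output: the words take their colors from cloud_shape.png. The image therefore needs to be a color (RGB or RGBA) picture; grayscale images are not supported.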


7. Practical Examples

7.1 Word cloud from news text

import jieba
from wordcloud import WordCloud
import matplotlib.pyplot as plt

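# Read the whole file (closing it afterwards), then split the Chinese text into space-separated words
with open("news.txt", encoding="utf-8") as f:
    text = f.read()
text = " ".join(jieba.cut(text))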

wordcloud = WordCloud(font_path="simhei.ttf", width=800, height=400).generate(text)

plt.figure(figsize=(10, 5))
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.show()

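Output: the news text is segmented with jieba before the word cloud is generated. Segmentation is needed because Chinese has no spaces between words, and wordcloud splits text only at non-word characters such as spaces and punctuation.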
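Raw jieba output includes punctuation, whitespace, and very common function words such as 的 and 了, which tend to dominate the cloud. The sketch below filters tokens before they are joined. The stopword set is a tiny illustrative sample; real projects usually load a fuller list from a file. The function accepts any iterable of tokens, so `jieba.cut(text)` can be passed in directly.

```python
# Tiny illustrative stopword sample; load a full Chinese stopword list in practice
CN_STOPWORDS = {"的", "了", "和", "是", "在", "我们"}


def filter_tokens(tokens, stopwords=CN_STOPWORDS, min_len=2):
    """Keep tokens that are not stopwords, not punctuation-only, and at least min_len chars long."""
    kept = []
    for tok in tokens:
        tok = tok.strip()
        if len(tok) < min_len or tok in stopwords:
            continue
        if not any(ch.isalnum() for ch in tok):
            continue  # skip punctuation-only tokens such as "……"
        kept.append(tok)
    return " ".join(kept)


# Hand-made tokens standing in for list(jieba.cut(news_text))
tokens = ["我们", "在", "研究", "人工智能", "的", "应用", "。", " ", "人工智能", "很", "重要"]
print(filter_tokens(tokens))
```

Setting `min_len=2` also drops some meaningful single-character words. Lower it to 1 if those matter for your text.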


7.2 Processing social media data (Weibo/Twitter)

import pandas as pd
from wordcloud import WordCloud
import matplotlib.pyplot as plt

df = pd.read_csv("tweets.csv")
text = " ".join(df["content"].dropna().astype(str))  # astype(str) guards against non-text cells

wordcloud = WordCloud(font_path="simhei.ttf", width=800, height=400).generate(text)

plt.figure(figsize=(10, 5))
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.show()

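Output: joins the content column of tweets.csv into one text and generates a word cloud. Chinese posts, such as Weibo content, should first be segmented with jieba, as in 7.1.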
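Social media posts carry links, @mentions, hashtag markers, and retweet prefixes, which would otherwise show up as large words. Below is a regex-based cleanup sketch to run on each post before joining. The patterns are simplified assumptions; for example, a URL is taken to be `http(s)://` followed by non-space characters.

```python
import re

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@[\w-]+")  # \w also matches Chinese characters in Python 3
RETWEET_RE = re.compile(r"\bRT\b")


def clean_post(text):
    """Remove URLs, @mentions and RT markers; keep hashtag words but drop the '#' signs."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = RETWEET_RE.sub(" ", text)
    text = text.replace("#", " ")  # Twitter uses #tag, Weibo uses #topic#
    return " ".join(text.split())  # collapse runs of whitespace


posts = ["RT @user check https://t.co/abc #Python rocks", "#人工智能#真有趣 @小明"]
print([clean_post(p) for p in posts])
```

Applied to the DataFrame: `text = " ".join(clean_post(p) for p in df["content"].dropna().astype(str))`.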


7.3 Generating a word cloud from Excel/CSV data

import pandas as pd
from wordcloud import WordCloud
import matplotlib.pyplot as plt

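df = pd.read_csv("data.csv")  # for Excel use pd.read_excel("data.xlsx"), which needs openpyxl
text = " ".join(df["keywords"].dropna().astype(str))  # astype(str) guards against non-text cells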

wordcloud = WordCloud(font_path="simhei.ttf", width=800, height=400).generate(text)

plt.figure(figsize=(10, 5))
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.show()

Output: reads the keywords column of data.csv and generates a word cloud from it.
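If each cell holds a comma-separated list of keywords, counting whole phrases keeps multi-word keywords together. The sketch below assumes that cell format, and the column contents are illustrative. It uses an in-memory DataFrame in place of the file.

```python
from collections import Counter

import pandas as pd


def keyword_frequencies(series, sep=","):
    """Split each cell on sep, strip whitespace, and count every keyword phrase."""
    counts = Counter()
    for cell in series.dropna().astype(str):
        for kw in cell.split(sep):
            kw = kw.strip()
            if kw:
                counts[kw] += 1
    return dict(counts)


# In-memory stand-in for pd.read_csv("data.csv") with a comma-separated "keywords" column
df = pd.DataFrame({"keywords": ["machine learning, Python", "Python, data analysis", None]})
freqs = keyword_frequencies(df["keywords"])
print(freqs)
```

Passing `freqs` to `WordCloud(...).generate_from_frequencies(freqs)` keeps phrases such as "machine learning" intact. By contrast, `generate()` on the joined string tokenizes phrases into individual words.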


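Every example in this tutorial is complete code that you can copy and run directly, provided the input files it reads (sample.txt, cloud_shape.png, news.txt, tweets.csv, data.csv) and the font simhei.ttf are available. If you have questions or need additional features, feel free to get in touch!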
