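Below is a complete wordcloud tutorial covering the basics, more advanced techniques, and practical applications. Each code snippet is written to run on its own.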
A Complete Tutorial for the Python wordcloud Library
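1. Introduction and Installation
1.1 About the wordcloud library
wordcloud is a Python library for generating word clouds, which are useful in text analysis and data visualization.
1.2 What can wordcloud do?
- Visualize the most frequent words in a text
- Render clouds in different shapes, colors, and styles
- Work with matplotlib and PIL for display and styling
1.3 Installing wordcloud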
pip install wordcloud matplotlib numpy pillow jieba pandas
2. Generating a Basic Word Cloud
2.1 Creating a simple word cloud
from wordcloud import WordCloud
import matplotlib.pyplot as plt
text = "Python WordCloud Example Generate Word Cloud Text Visualization"
wordcloud = WordCloud(width=800, height=400, background_color="white").generate(text)
plt.figure(figsize=(10, 5))
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.show()
Output: a word cloud on a white background, with words such as Python and WordCloud sized according to their frequency.
2.2 Reading text from a file
from wordcloud import WordCloud
import matplotlib.pyplot as plt
with open("sample.txt", "r", encoding="utf-8") as file:
    text = file.read()
wordcloud = WordCloud(width=800, height=400, background_color="white").generate(text)
plt.figure(figsize=(10, 5))
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.show()
Output: a word cloud generated from the text read from sample.txt.
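A missing file, or a file with no usable words, makes the snippet above fail inside open() or generate(). The sketch below checks for both cases before any rendering happens. The helper name load_text and its error messages are hypothetical and are not part of wordcloud.

```python
from pathlib import Path


def load_text(path, encoding="utf-8"):
    """Read a text file and make sure it has something to plot.

    Hypothetical helper for this tutorial; not part of wordcloud.
    """
    file_path = Path(path)
    if not file_path.is_file():
        raise FileNotFoundError(f"Text file not found: {file_path}")
    text = file_path.read_text(encoding=encoding)
    if not text.strip():
        # WordCloud.generate() needs at least one word to draw.
        raise ValueError(f"{file_path} contains no text")
    return text
```

With this in place, text = load_text("sample.txt") can replace the open() block, and the rest of the snippet stays the same.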
2.3 Fixing garbled Chinese text
from wordcloud import WordCloud
import matplotlib.pyplot as plt
text = "中文 词云 示例 生成 可视化 词云 词云"
wordcloud = WordCloud(font_path="simhei.ttf", width=800, height=400, background_color="white").generate(text)
plt.figure(figsize=(10, 5))
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.show()
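Output: the Chinese words render correctly because font_path points to simhei.ttf (SimHei), a font that contains Chinese glyphs. The library's default font has none, so without this setting Chinese characters show up as empty boxes.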
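The snippet above only works if simhei.ttf can actually be found, which is often not the case outside Windows. One way to make this more portable is to probe a few likely font locations and use the first one that exists. The paths below are assumptions about typical installations, and find_font is a hypothetical helper name; adjust both for your machine.

```python
from pathlib import Path

# Assumed locations of fonts that contain Chinese glyphs.
# These vary between systems and OS versions; edit as needed.
CANDIDATE_FONTS = [
    "simhei.ttf",                                      # current directory
    "C:/Windows/Fonts/simhei.ttf",                     # Windows (SimHei)
    "/System/Library/Fonts/PingFang.ttc",              # macOS (PingFang)
    "/usr/share/fonts/truetype/wqy/wqy-microhei.ttc",  # Linux (WenQuanYi)
]


def find_font(candidates=CANDIDATE_FONTS):
    """Return the first candidate path that exists, or None."""
    for candidate in candidates:
        if Path(candidate).is_file():
            return str(candidate)
    return None
```

A typical use would be font_path = find_font(), followed by a check that the result is not None before passing it to WordCloud(font_path=font_path, ...).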
3. Customizing the Word Cloud Style
3.1 Setting the background color and color scheme
from wordcloud import WordCloud
import matplotlib.pyplot as plt
text = "Python Data Science Visualization AI Machine Learning"
wordcloud = WordCloud(width=800, height=400, background_color="black", colormap="cool").generate(text)
plt.figure(figsize=(10, 5))
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.show()
Output: a word cloud on a black background, colored with the cool colormap.
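The colormap parameter picks colors from a matplotlib colormap. For direct control over each word's color, WordCloud also accepts a color_func callable, and colormap is ignored when one is supplied. The sketch below colors larger words a darker blue. The function name, the hue, and the assumed font-size range of 10 to 100 are illustrative choices, not library defaults.

```python
def blue_by_size(word, font_size, position, orientation,
                 random_state=None, **kwargs):
    """Return a darker blue for larger words (illustrative choice)."""
    # Map font sizes of roughly 10..100 onto lightness 70%..30%.
    clamped = max(10, min(100, font_size))
    lightness = 70 - int((clamped - 10) / 90 * 40)
    return f"hsl(210, 80%, {lightness}%)"
```

It can be passed at construction time, as in WordCloud(background_color="white", color_func=blue_by_size), or applied to an existing cloud with recolor(color_func=blue_by_size).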
3.2 Controlling font size and spacing
from wordcloud import WordCloud
import matplotlib.pyplot as plt
text = "Python WordCloud Example Visualization Font Size"
wordcloud = WordCloud(font_path="simhei.ttf", width=800, height=400, max_font_size=100, min_font_size=10).generate(text)
plt.figure(figsize=(10, 5))
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.show()
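Output: font sizes are kept between min_font_size and max_font_size, which makes high-frequency words stand out. Spacing between words is controlled by the margin parameter, which this snippet leaves at its default.
4. Shapes and Masks
4.1 A word cloud with a custom shape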
import numpy as np
from PIL import Image
from wordcloud import WordCloud
import matplotlib.pyplot as plt
mask = np.array(Image.open("cloud_shape.png"))
text = "Python Data Science Visualization AI Machine Learning"
wordcloud = WordCloud(mask=mask, background_color="white").generate(text)
plt.figure(figsize=(10, 5))
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.show()
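Output: a word cloud shaped like cloud_shape.png. Pure white areas of the mask image are left empty, and words are drawn everywhere else.
5. Processing Text Data
5.1 Filtering stop words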
from wordcloud import WordCloud
import matplotlib.pyplot as plt
text = "词云 可视化 示例 词云 生成 代码 Python"
stopwords = {"词云", "示例"}  # words to exclude from the cloud
wordcloud = WordCloud(stopwords=stopwords, font_path="simhei.ttf", width=800, height=400).generate(text)
plt.figure(figsize=(10, 5))
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.show()
Output: a word cloud with "词云" and "示例" removed.
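Real stop word lists are usually too long to write inline. The sketch below loads them from a text file instead. The file format (one word per line, with blank lines and lines starting with # skipped) and the helper name load_stopwords are assumptions made for this example.

```python
def load_stopwords(path, encoding="utf-8"):
    """Read one stop word per line, skipping blanks and '#' comments."""
    stopwords = set()
    with open(path, encoding=encoding) as handle:
        for line in handle:
            word = line.strip()
            if word and not word.startswith("#"):
                stopwords.add(word)
    return stopwords
```

The returned set can be passed as stopwords=..., optionally combined with the library's built-in English list via load_stopwords("stopwords.txt") | set(STOPWORDS), where STOPWORDS is imported from wordcloud. Note that the stopwords parameter only applies to generate(); it is ignored by generate_from_frequencies().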
5.2 Word frequencies and custom weights
from wordcloud import WordCloud
import matplotlib.pyplot as plt
word_freq = {"Python": 50, "数据分析": 40, "机器学习": 30, "AI": 25}
wordcloud = WordCloud(font_path="simhei.ttf", width=800, height=400).generate_from_frequencies(word_freq)
plt.figure(figsize=(10, 5))
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.show()
Output: word sizes follow the weights in word_freq, so Python is the largest.
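The dictionary above is written by hand. When the weights should come from data, one simple option is to count tokens with collections.Counter and keep the most common ones. This is a minimal sketch; the helper name top_frequencies and the sample token list are made up for illustration.

```python
from collections import Counter


def top_frequencies(tokens, limit=100):
    """Count tokens and keep the `limit` most common as a dict."""
    counts = Counter(token for token in tokens if token.strip())
    return dict(counts.most_common(limit))


tokens = ["Python", "AI", "Python", "数据分析", "AI", "Python"]
word_freq = top_frequencies(tokens, limit=2)
print(word_freq)  # {'Python': 3, 'AI': 2}
```

The resulting dictionary has the same shape as word_freq above and can be passed straight to generate_from_frequencies().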
6. Advanced Techniques
6.1 Matching colors to an image
import numpy as np
from PIL import Image
from wordcloud import WordCloud, ImageColorGenerator
import matplotlib.pyplot as plt
mask = np.array(Image.open("cloud_shape.png"))
image_colors = ImageColorGenerator(mask)
text = "Python Data Science Visualization AI Machine Learning"
wordcloud = WordCloud(mask=mask, background_color="white").generate(text)
plt.figure(figsize=(10, 5))
plt.imshow(wordcloud.recolor(color_func=image_colors), interpolation="bilinear")
plt.axis("off")
plt.show()
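Output: the word colors are taken from the pixels of cloud_shape.png, so the effect is only visible when that image is in color.
7. Practical Examples
7.1 A word cloud from news text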
import jieba
from wordcloud import WordCloud
import matplotlib.pyplot as plt
with open("news.txt", encoding="utf-8") as file:
    text = file.read()
# jieba splits Chinese text into words; join them with spaces for WordCloud
text = " ".join(jieba.cut(text))
wordcloud = WordCloud(font_path="simhei.ttf", width=800, height=400).generate(text)
plt.figure(figsize=(10, 5))
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.show()
Output: a word cloud generated from the news text after word segmentation.
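jieba returns punctuation marks and single-character function words as tokens too, and these can clutter the result. The sketch below filters them out before the tokens are joined. The helper name clean_tokens and the minimum length of 2 are assumptions for this example, and the sample token list stands in for real jieba output.

```python
import unicodedata


def clean_tokens(tokens, min_length=2):
    """Drop whitespace, punctuation, and very short tokens."""
    cleaned = []
    for token in tokens:
        token = token.strip()
        if len(token) < min_length:
            continue
        # Unicode categories starting with "P" are punctuation.
        if all(unicodedata.category(ch).startswith("P") for ch in token):
            continue
        cleaned.append(token)
    return cleaned


tokens = ["今天", "的", "新闻", ",", "经济", "增长", "。", "?!"]
print(clean_tokens(tokens))  # ['今天', '新闻', '经济', '增长']
```

In the snippet above, this would be used as text = " ".join(clean_tokens(jieba.lcut(text))), where jieba.lcut returns the tokens as a list.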
7.2 Processing social media data (Weibo/Twitter)
import pandas as pd
from wordcloud import WordCloud
import matplotlib.pyplot as plt
df = pd.read_csv("tweets.csv")
text = " ".join(df["content"].dropna().astype(str))  # astype(str) guards against non-text cells
wordcloud = WordCloud(font_path="simhei.ttf", width=800, height=400).generate(text)
plt.figure(figsize=(10, 5))
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.show()
Output: a word cloud built from the content column of tweets.csv.
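Social media posts usually contain links and @mentions, and fragments of these tend to show up as large, meaningless words in the cloud. The sketch below removes them first. The helper name clean_post and the two regular expressions are illustrative and deliberately simple; they are not a complete parser for either platform.

```python
import re

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")


def clean_post(text):
    """Strip URLs and @mentions, keep hashtag words, collapse spaces."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = text.replace("#", " ")
    return " ".join(text.split())


print(clean_post("Loving #Python! @alice see https://example.com/x"))
# Loving Python! see
```

Applied to the DataFrame above, this could look like text = " ".join(df["content"].dropna().astype(str).map(clean_post)). For Chinese posts, the cleaned text would still need jieba segmentation as in section 7.1.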
7.3 Generating a word cloud from Excel/CSV data
import pandas as pd
from wordcloud import WordCloud
import matplotlib.pyplot as plt
df = pd.read_csv("data.csv")
text = " ".join(df["keywords"].dropna().astype(str))  # astype(str) guards against non-text cells
wordcloud = WordCloud(font_path="simhei.ttf", width=800, height=400).generate(text)
plt.figure(figsize=(10, 5))
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.show()
Output: a word cloud built from the keywords column of data.csv.
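Keyword columns often hold several keywords per cell, and joining the cells into one string breaks multi-word keywords such as "machine learning" into separate words. An alternative is to split each cell on its separators and count whole keywords. This is a minimal sketch: the helper name keyword_frequencies, the separator set, and the sample cells are all assumptions about what such a column might look like.

```python
import re
from collections import Counter


def keyword_frequencies(cells, separators=r"[,;,;、]"):
    """Split each cell on separators and count the resulting keywords."""
    counts = Counter()
    for cell in cells:
        for keyword in re.split(separators, str(cell)):
            keyword = keyword.strip()
            if keyword:
                counts[keyword] += 1
    return dict(counts)


cells = ["machine learning, python", "python; data analysis", "机器学习、python"]
print(keyword_frequencies(cells))
# {'machine learning': 1, 'python': 3, 'data analysis': 1, '机器学习': 1}
```

The result can be fed to generate_from_frequencies(), for example with keyword_frequencies(df["keywords"].dropna()). For an Excel file, pd.read_excel can load the sheet into a DataFrame in the same way as pd.read_csv, provided an Excel engine such as openpyxl is installed.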
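The code in this tutorial can be copied and run as-is, provided the input files and the font it references exist on your machine. Questions and requests for additional features are welcome!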