Freelance Job Diary (3): Text Processing, Word Cloud Generation

Article link: https://blog.csdn.net/qq_62789540/article/details/130597973


This post is a lab report, so it follows the usual lab-report format.

I. Objectives

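Become familiar with installing and using the third-party Python libraries python-docx, wordcloud, and jieba
Become familiar with using pathlib to locate files
Practice encapsulating logic in Python functions
Become familiar with concatenating strings with the join method
Understand the UTF-8 encoding of strings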
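
As a quick illustration of the pathlib, join, and UTF-8 objectives, the sketch below builds a path, joins a few words with str.join, and encodes the result as UTF-8. The sample words are made up for the illustration and are not taken from the report's data.

```python
from pathlib import Path

# pathlib splits a file name into its parts without manual string slicing.
doc_path = Path("词库.docx")
print(doc_path.stem, doc_path.suffix)  # 词库 .docx

# Hypothetical sample tokens, not taken from the report's data.
words = ["词云", "分词", "统计"]

# str.join concatenates the items and places the separator between them.
joined = " ".join(words)
print(joined)  # 词云 分词 统计

# Each of these Chinese characters occupies three bytes in UTF-8,
# while the ASCII space occupies one.
encoded = joined.encode("utf-8")
print(len(joined), len(encoded))  # 8 20

# Decoding with the same codec restores the original string.
decoded = encoded.decode("utf-8")
```

The difference between the character count and the byte count is the reason the files in this report are opened with an explicit encoding="utf-8".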

II. Task

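Write a program that extracts all the text from the word bank (词库.docx), segments it into words, counts word frequencies, filters out stop words, and finally writes the word cloud to result.jpg.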
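
Segmentation itself relies on jieba, but the counting and stop-word cleaning steps can be sketched with the standard library alone. The example below assumes the text has already been segmented into a list of tokens; the function name count_words and the sample data are hypothetical and only serve the illustration.

```python
from collections import Counter


def count_words(tokens, stop_words):
    """Count tokens, skipping stop words and whitespace-only tokens."""
    return Counter(
        t for t in tokens
        if t.strip() and t not in stop_words
    )


# Hypothetical, already-segmented tokens and stop words.
tokens = ["我", "喜欢", "词云", "，", "词云", "的", "效果", "很", "好"]
stop_words = {"我", "的", "很", "，"}

freq = count_words(tokens, stop_words)
print(freq.most_common(1))  # [('词云', 2)]
```

A mapping like freq could be handed to WordCloud.generate_from_frequencies when explicit control over the counts is wanted; the program in this report instead lets WordCloud count the words itself.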

III. Program and Results

1. Running the Program

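from pathlib import Path

import jieba
from docx import Document
from wordcloud import WordCloud

# Input files: a Chinese-capable font (Windows path), the source document,
# and the stop-word list.
font = Path(r"C:\Windows\Fonts\simfang.ttf")
word_dataset = Path("词库.docx")
stop_word = Path("stoplist.txt")


def get_stop_list(stop_word):
    """Read the stop-word file and return its whitespace-separated entries as a set."""
    with open(stop_word, "r", encoding="utf-8") as f:
        return set(f.read().split())


def handle_word_dataset(word_dataset):
    """Extract every paragraph of the .docx file and segment the text with jieba."""
    # Join paragraphs with a newline so that the last word of one paragraph
    # is not merged with the first word of the next during segmentation.
    text = "\n".join(p.text for p in Document(str(word_dataset)).paragraphs)
    return jieba.lcut(text)


if __name__ == "__main__":
    # WordCloud tokenizes the space-joined words, drops the stop words,
    # and counts frequencies internally before rendering the image.
    wc = WordCloud(
        font_path=str(font),
        stopwords=get_stop_list(stop_word),
        width=1920,
        height=1080,
        background_color="white",
        max_words=1000,
    ).generate(" ".join(handle_word_dataset(word_dataset)))
    wc.to_file("result.jpg")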
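
Because get_stop_list splits the file content on any whitespace, the stop-word file may hold one word per line or several words per line. The sketch below checks this on a temporary file. The helper name load_stop_words and the sample words are assumptions made for the demonstration, and Path.read_text stands in for the open call used in the program.

```python
import tempfile
from pathlib import Path


def load_stop_words(path):
    """Return the whitespace-separated entries of a UTF-8 text file as a set."""
    return set(Path(path).read_text(encoding="utf-8").split())


# Write a small hypothetical stop list: two lines, three words in total.
with tempfile.TemporaryDirectory() as tmp:
    stop_file = Path(tmp) / "stoplist.txt"
    stop_file.write_text("的 了\n是\n", encoding="utf-8")
    stop_words = load_stop_words(stop_file)

print(len(stop_words))  # 3
```

Returning a set keeps membership checks fast and removes duplicate entries from the list.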