Tokenizing Multiple Files and Counting Word Frequencies

This post shows how to use the jieba library to tokenize a batch of text files, remove stopwords with a stopword list, and count word frequencies for each file. The workflow reads each file's contents, tokenizes the text, builds a word-frequency dictionary, and finally writes the results to a CSV file; a sketch of the full pipeline follows the dataset listing below.

import os
import jieba

The dataset is as follows:

# List the monthly review files in the dataset folder
folder_path = r"C:\Users\Machine Learning\comments"
os.listdir(folder_path)

['201603枕.txt',
 '201603锅.txt',
 '201604枕.txt',
 '201604锅.txt',
 '201605枕.txt',
 '201605锅.txt',
 '201606枕.txt',
 '201606锅.txt',
 '201607枕.txt',
 '201607锅.txt',
 '201608枕.txt',
 '201608锅.txt',
 '201609枕.txt',
 '201609锅.txt',
 '201610枕.txt',
 '201610锅.txt',
 '201611枕.txt',
 '201611锅.txt',
 '201612枕.txt',
 '201612锅.txt',
 '201701枕.txt',
 '201701锅.txt',
 '201702枕.txt',
 '201702锅.txt',
 '201703枕.txt',
 '201703锅.txt',
 '201704枕.txt',
 '201704锅.txt',
 '201705枕.txt',
 '201705锅.txt',
 '201706枕.txt',
 '201706锅.txt',
 '201707枕.txt',
 '201707锅.txt',
 '201708枕.txt',
 '201708锅.txt',
 '201709枕.txt',
 '201709锅.txt',
 '201710枕.txt',
 '201710锅.txt',
 '201711枕.txt',
 '201711锅.txt',
 '201712枕.txt',
 '201712锅.txt',
 '201801枕.txt',
 '201801锅.txt',
 '201802枕.txt',
 '201802锅.txt',
 '201803锅.txt',
 '201804枕.txt',
 '201804锅.txt',
 '201805枕.txt',
 '201805锅.txt',
 '201806枕.txt',
 '201806锅.txt',
 '201807枕.txt',
 '201807锅.txt',
 '201808枕.txt',
 '201808锅.txt',
 '201809枕.txt',
 '201809锅.txt',
 '201810枕.txt',
 '201810锅.txt',
 '201811枕.txt',
 '201811锅.txt',
 '201812枕.txt',
 '201812锅.txt',
 '201901枕.txt',
 '201901锅.txt',
 '201902枕.tx
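
The original post is truncated at this point, so the remaining code is missing. Based on the summary above, a minimal sketch of the described pipeline might look like the following. The stopword file name stopwords.txt, the output file name word_frequencies.csv, and the UTF-8 encodings are assumptions for illustration, not taken from the original post.

import csv
import os
from collections import Counter

import jieba

folder_path = r"C:\Users\Machine Learning\comments"
stopwords_path = "stopwords.txt"          # hypothetical stopword list, one word per line
output_path = "word_frequencies.csv"      # hypothetical output file name

# Load the stopword list into a set for O(1) membership tests.
with open(stopwords_path, encoding="utf-8") as f:
    stopwords = {line.strip() for line in f if line.strip()}

rows = []
for filename in os.listdir(folder_path):
    if not filename.endswith(".txt"):
        continue
    with open(os.path.join(folder_path, filename), encoding="utf-8") as f:
        text = f.read()
    # Tokenize with jieba, then drop stopwords and whitespace-only tokens.
    words = [w for w in jieba.lcut(text) if w.strip() and w not in stopwords]
    # Build the per-file word-frequency dictionary.
    freq = Counter(words)
    for word, count in freq.most_common():
        rows.append((filename, word, count))

# Write one (file, word, count) row per entry to a CSV file.
with open(output_path, "w", newline="", encoding="utf-8-sig") as f:
    writer = csv.writer(f)
    writer.writerow(["file", "word", "count"])
    writer.writerows(rows)

Writing with the utf-8-sig encoding adds a BOM so that Excel on Windows displays the Chinese words correctly; plain utf-8 also works if the CSV is consumed programmatically.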