Using Python and NLTK to find the most frequent words across all text files in a folder

骉码

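Copyright notice: This is an original article by the blogger, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.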

Original link: https://blog.csdn.net/BigData_ming/article/details/80917132

Life is tooooo short, you need Python.

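Over the past couple of days I've been looking into the NLTK (Natural Language Toolkit) package, and while getting a feel for how powerful it is, an idea came to me: since it is a tool, can I actually put it to real use? A while ago I saw some practice problems online about counting words in text files, and this seemed like a perfect match.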

Here is a small example:

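```python
import glob

# sent_tokenize needs the Punkt models: nltk.download("punkt")
# (newer NLTK releases use nltk.download("punkt_tab") instead)
from nltk.tokenize import sent_tokenize

# Collect every .txt file in the folder
files = glob.glob("/Users/thunder/Desktop/show-me-the-code/0006/*.txt")
for file in files:
    print(file)
    with open(file, "r", encoding="GB18030") as f:
        text = f.read()                   # the whole file as one string
        sents_list = sent_tokenize(text)  # split the text into a list of sentences
        # print(sents_list)
```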
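The code above stops at splitting each file into sentences. Below is a minimal sketch of the remaining step, counting the most frequent words across every `.txt` file in a folder; it is not the author's own continuation. It assumes words are runs of ASCII letters (optionally with one apostrophe, so Chinese text is ignored), lowercases everything, and uses a regular expression plus `collections.Counter` so that it runs without any NLTK data downloads. `top_words` and `WORD_RE` are hypothetical names.

```python
import glob
import os
import re
from collections import Counter

# Assumed word pattern: ASCII letters, optionally with one apostrophe ("don't").
# A stand-in for nltk.word_tokenize, which needs the Punkt data download.
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def top_words(folder, n=10, encoding="GB18030"):
    """Return the n most frequent words across all .txt files in folder."""
    counts = Counter()
    for path in glob.glob(os.path.join(folder, "*.txt")):
        with open(path, "r", encoding=encoding) as f:
            # Lowercase so "The" and "the" count as the same word
            counts.update(WORD_RE.findall(f.read().lower()))
    return counts.most_common(n)
```

For example, `top_words("/Users/thunder/Desktop/show-me-the-code/0006", 5)` returns the five most common words with their counts. If NLTK is installed, `nltk.FreqDist` can replace `Counter` here, since `FreqDist` subclasses `Counter` and supports the same `update` and `most_common` calls.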

Latest comment

抓个屁给你闻: After using glob to read the files in the folder, `with open` doesn't seem to give me the contents of all the text files. What's going on?

```python
import nltk
import glob
from nltk.tokenize import sent_tokenize
from nltk.tokenize import PunktSentenceTokenizer
from nltk.corpus import webtext
from nltk.tokenize import sent_tokenize
from nltk.corpus import stopwords

# Import the files, turn them into NLTK-ready text, and split them into sentences with NLTK
files = glob.glob('D:\Python\my_corpus\*.txt')  # the path must not contain spaces
for file in files:
    print(file)
    with open(file, 'r', encoding='utf-8') as f:
        text = f.read()
corpus_root = r'D:\Python\my corpus'
sent_tokenizer = PunktSentenceTokenizer(corpus_root, ['*'])
sents = sent_tokenizer.tokenize(text)
# sents[1]
# print(sents)

# Look up the words and print the matching sentences
A = input("Enter a word: ")
B = input("Enter a word: ")
for lines in sents:
    if A in lines:
        if B in lines:
            print("Sentences containing the words:", lines)
```
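In the comment's code, if the lines from `corpus_root = ...` onward sit after the `for` loop rather than inside it, `text` is overwritten on every iteration, so only the last file's contents ever reach the tokenizer. A minimal sketch of one fix, under these assumptions: the files are UTF-8 as in the comment, each file's text is kept in a list, and sentences are split with a simple regular expression rather than NLTK so that no Punkt download is needed. `read_all_texts` and `sentences_with` are hypothetical helper names. Separately, the first argument of `PunktSentenceTokenizer` is training text, so passing it `corpus_root` trains it on the path string rather than on the corpus.

```python
import glob
import os
import re


def read_all_texts(folder, encoding="utf-8"):
    """Read every .txt file in folder; return one string per file."""
    texts = []
    # sorted() makes the file order predictable; glob order is arbitrary
    for path in sorted(glob.glob(os.path.join(folder, "*.txt"))):
        with open(path, "r", encoding=encoding) as f:
            texts.append(f.read())  # keep every file, not only the last one
    return texts


def sentences_with(texts, word_a, word_b):
    """Return the sentences, across all texts, that contain both words."""
    hits = []
    for text in texts:
        # Naive split after ".", "!" or "?" plus whitespace: a stand-in for
        # nltk.sent_tokenize, which needs the Punkt data download.
        for sent in re.split(r"(?<=[.!?])\s+", text.strip()):
            # Substring test, as in the comment: "cat" also matches "category"
            if word_a in sent and word_b in sent:
                hits.append(sent)
    return hits
```

For example, `sentences_with(read_all_texts(r"D:\Python\my_corpus"), A, B)` returns the matching sentences from every file in the folder, not just the last one read.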
