Pitfall 2: 1.1.1 Splitting Text into Sentences

badapplecn

Category: Mastering Natural Language Processing with Python

Article link: https://blog.csdn.net/badapplecn/article/details/78790597

Problem: Running the following statements raises an error.

import nltk
text="Welcome readers. I hope you find it interesting. Please do reply."
from nltk.tokenize import sent_tokenize
sent_tokenize(text)

The error is as follows:

Traceback (most recent call last):
File "<pyshell#3>", line 1, in <module>
sent_tokenize(text)
File "D:\Program Files\python\lib\site-packages\nltk\tokenize\__init__.py", line 94, in sent_tokenize
tokenizer = load('tokenizers/punkt/{0}.pickle'.format(language))
File "D:\Program Files\python\lib\site-packages\nltk\data.py", line 834, in load
opened_resource = _open(resource_url)
File "D:\Program Files\python\lib\site-packages\nltk\data.py", line 952, in _open
return find(path_, path + ['']).open()
File "D:\Program Files\python\lib\site-packages\nltk\data.py", line 673, in find
raise LookupError(resource_not_found)
LookupError:
**********************************************************************
Resource punkt not found.
Please use the NLTK Downloader to obtain the resource:

>>> import nltk
>>> nltk.download('punkt')

Searched in:
- 'C:\\Users\\Mao/nltk_data'
- 'C:\\nltk_data'
- 'D:\\nltk_data'
- 'E:\\nltk_data'
- 'D:\\Program Files\\python\\nltk_data'
- 'D:\\Program Files\\python\\lib\\nltk_data'
- 'C:\\Users\\Mao\\AppData\\Roaming\\nltk_data'
- ''
**********************************************************************

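Cause: Judging from the error message, the 'punkt' resource files are missing. As the traceback shows, sent_tokenize loads its tokenizer model from tokenizers/punkt/{language}.pickle, and this data is not installed together with the nltk package, so it has to be downloaded separately.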

Solution: Follow the hint in the error message and run nltk.download('punkt').
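To avoid hitting the same LookupError again, the download can be made conditional: check for the resource first and fetch it only when it is missing. The following is a minimal sketch, not part of the book or the NLTK documentation. The helper names ensure_resource and ensure_punkt are my own (hypothetical); nltk.data.find and nltk.download are the real NLTK calls, and they are passed in as arguments so the helper itself can be tried without NLTK or a network connection. Newer NLTK releases may ask for a differently named resource (reportedly punkt_tab); whichever name the LookupError prints is the one to pass.

```python
def ensure_resource(resource_path, package, find, download):
    """Download `package` only if `resource_path` cannot be found locally.

    `find` and `download` are injected; in real use they would be
    nltk.data.find and nltk.download (see ensure_punkt below).
    Returns True if a download was triggered, False otherwise.
    """
    try:
        find(resource_path)  # raises LookupError when the resource is missing
        return False
    except LookupError:
        download(package)
        return True


def ensure_punkt():
    """Hypothetical convenience wrapper for the Punkt sentence tokenizer data."""
    import nltk  # imported lazily so ensure_resource has no hard dependency

    return ensure_resource("tokenizers/punkt", "punkt",
                           nltk.data.find, nltk.download)
```

Calling ensure_punkt() once before sent_tokenize(text) should be enough; on later runs the data is already in one of the directories listed under "Searched in:" and nothing is downloaded.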

Note: At first I thought the command in the book was outdated, so I checked http://www.nltk.org/. The sample code it provides looks like this:

>>> import nltk
>>> sentence = """At eight o'clock on Thursday morning
... Arthur didn't feel very good."""
>>> tokens = nltk.word_tokenize(sentence)
>>> tokens

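Following this example still produced an error. That is expected: nltk.word_tokenize first splits the text into sentences with sent_tokenize, so it needs the same 'punkt' data and fails in the same way until the resource has been downloaded.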
