Problem: running the following statements raises an error.
import nltk
text="Welcome readers. I hope you find it interesting. Please do reply."
from nltk.tokenize import sent_tokenize
sent_tokenize(text)
The error is:
Traceback (most recent call last):
  File "<pyshell#3>", line 1, in <module>
    sent_tokenize(text)
  File "D:\Program Files\python\lib\site-packages\nltk\tokenize\__init__.py", line 94, in sent_tokenize
    tokenizer = load('tokenizers/punkt/{0}.pickle'.format(language))
  File "D:\Program Files\python\lib\site-packages\nltk\data.py", line 834, in load
    opened_resource = _open(resource_url)
  File "D:\Program Files\python\lib\site-packages\nltk\data.py", line 952, in _open
    return find(path_, path + ['']).open()
  File "D:\Program Files\python\lib\site-packages\nltk\data.py", line 673, in find
    raise LookupError(resource_not_found)
LookupError:
**********************************************************************
Resource punkt not found.
Please use the NLTK Downloader to obtain the resource:
>>> import nltk
>>> nltk.download('punkt')
Searched in:
- 'C:\\Users\\Mao/nltk_data'
- 'C:\\nltk_data'
- 'D:\\nltk_data'
- 'E:\\nltk_data'
- 'D:\\Program Files\\python\\nltk_data'
- 'D:\\Program Files\\python\\lib\\nltk_data'
- 'C:\\Users\\Mao\\AppData\\Roaming\\nltk_data'
- ''
**********************************************************************
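Cause: I wasn't sure at first, but the error message says the 'punkt' resource was not found. This is the pretrained Punkt sentence-tokenizer model that sent_tokenize loads from 'tokenizers/punkt/{language}.pickle' (see the traceback), and it is not in any of the directories listed under "Searched in".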
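To see why the lookup fails, here is a minimal sketch of the search behind the "Searched in:" list: the resource name from the traceback ('tokenizers/punkt/english.pickle' for the default language='english') is joined to each directory in turn, and the first path that exists wins. The helper names locate_nltk_resource and find_or_explain are hypothetical, and the sketch is a simplification: it only checks plain files and directories, while the real nltk.data.find can also read resources out of .zip archives.

```python
import os
from typing import Iterable, List, Optional


def locate_nltk_resource(resource: str, search_dirs: Iterable[str]) -> Optional[str]:
    """Return the first existing path for `resource`, or None (simplified model)."""
    # NLTK resource names always use '/', so split and rejoin with the OS separator.
    parts = resource.split("/")
    for base in search_dirs:
        candidate = os.path.join(base, *parts)
        if os.path.exists(candidate):
            return candidate  # first match wins, in the order the directories are given
    return None


def find_or_explain(resource: str, search_dirs: Iterable[str]) -> str:
    """Like locate_nltk_resource, but raise LookupError listing every directory tried."""
    dirs: List[str] = list(search_dirs)
    hit = locate_nltk_resource(resource, dirs)
    if hit is None:
        listing = "\n".join(f"  - {d!r}" for d in dirs)
        raise LookupError(f"Resource {resource!r} not found.\nSearched in:\n{listing}")
    return hit
```

In a real session the directory list would be nltk.data.path; on a machine that shows the error above, none of those directories should contain tokenizers/punkt yet.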
Fix: as the error message suggests, run nltk.download('punkt') once, then run sent_tokenize(text) again.
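Downloading once by hand is enough, but a script that has to work on a fresh machine can do the same check itself: look the resource up, and only if that raises LookupError, download the package and look again. A minimal sketch, assuming the lookup and download functions are passed in so the logic runs without NLTK; the name ensure_resource is hypothetical, and with NLTK the two functions would be nltk.data.find and nltk.download.

```python
from typing import Callable


def ensure_resource(
    resource: str,
    package: str,
    find: Callable[[str], object],
    download: Callable[[str], object],
) -> object:
    """Look up `resource`; if it is missing, download `package` once and look again.

    `find` must raise LookupError for a missing resource (as nltk.data.find does);
    `download` fetches a package by id (as nltk.download does).
    """
    try:
        return find(resource)
    except LookupError:
        download(package)
        # Second and final lookup: if the download failed, this raises
        # LookupError again instead of retrying forever.
        return find(resource)
```

With NLTK installed, calling ensure_resource('tokenizers/punkt', 'punkt', nltk.data.find, nltk.download) before sent_tokenize(text) downloads punkt only when it is missing. As far as I know, NLTK 3.9 and later look for a resource named punkt_tab instead, so on those versions the names would be 'tokenizers/punkt_tab' and 'punkt_tab'.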
Note: at first I thought the command in the book was outdated, so I checked http://www.nltk.org/, which gives this example code:
>>> import nltk
>>> sentence = """At eight o'clock on Thursday morning
... Arthur didn't feel very good."""
>>> tokens = nltk.word_tokenize(sentence)
>>> tokens
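I still got the same error with this example. That is expected: nltk.word_tokenize first splits the text into sentences with sent_tokenize, so it depends on the same punkt resource. The book's command was not outdated; the punkt data simply had not been downloaded yet.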