Pitfall 2: 1.1.1 Tokenizing text into sentences

Problem: running the following statements raises an error.

import nltk
text="Welcome readers. I hope you find it interesting. Please do reply."
from nltk.tokenize import sent_tokenize
sent_tokenize(text)

The error is as follows:

Traceback (most recent call last):
  File "<pyshell#3>", line 1, in <module>
    sent_tokenize(text)
  File "D:\Program Files\python\lib\site-packages\nltk\tokenize\__init__.py", line 94, in sent_tokenize
    tokenizer = load('tokenizers/punkt/{0}.pickle'.format(language))
  File "D:\Program Files\python\lib\site-packages\nltk\data.py", line 834, in load
    opened_resource = _open(resource_url)
  File "D:\Program Files\python\lib\site-packages\nltk\data.py", line 952, in _open
    return find(path_, path + ['']).open()
  File "D:\Program Files\python\lib\site-packages\nltk\data.py", line 673, in find
    raise LookupError(resource_not_found)
LookupError: 
**********************************************************************
  Resource punkt not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('punkt')
  Searched in:
    - 'C:\\Users\\Mao/nltk_data'
    - 'C:\\nltk_data'
    - 'D:\\nltk_data'
    - 'E:\\nltk_data'
    - 'D:\\Program Files\\python\\nltk_data'
    - 'D:\\Program Files\\python\\lib\\nltk_data'
    - 'C:\\Users\\Mao\\AppData\\Roaming\\nltk_data'
    - ''
**********************************************************************

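Cause: judging from the error message, the 'punkt' resource is missing. The traceback shows sent_tokenize trying to load tokenizers/punkt/{language}.pickle, and none of the nltk_data directories listed under "Searched in" contains it.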

Solution: follow the hint in the error message and run nltk.download('punkt') directly.
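
If the script is run more than once, it may be nicer to download only when the resource is actually missing. Below is a minimal sketch of that idea, not an official NLTK recipe. The helper name ensure_nltk_resource and its find/download parameters are my own for illustration; by default they fall back to nltk.data.find and nltk.download, which are the calls that appear in the traceback and in the error hint. Newer NLTK releases may ask for a differently named resource, so use whatever name the error message prints.

```python
def ensure_nltk_resource(resource_path, package, find=None, download=None):
    """Download `package` only if `resource_path` cannot be found locally.

    `find` and `download` are hypothetical hooks added for illustration;
    they default to nltk.data.find and nltk.download.
    Returns True if a download was triggered, False if the resource
    was already present.
    """
    if find is None or download is None:
        import nltk  # imported lazily so the logic can be tried with stand-ins
        find = find or nltk.data.find
        download = download or nltk.download
    try:
        # nltk.data.find raises LookupError when the resource is absent,
        # which is exactly the error shown in the traceback above.
        find(resource_path)
        return False
    except LookupError:
        download(package)
        return True
```

Calling ensure_nltk_resource("tokenizers/punkt", "punkt") once before sent_tokenize(text) should then avoid the LookupError, assuming the machine can reach the NLTK download server.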


Note: at first I thought the command in the book was outdated, so I checked http://www.nltk.org/. The sample code it provides looks like this:

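import nltk
sentence = """At eight o'clock on Thursday morning
Arthur didn't feel very good."""
tokens = nltk.word_tokenize(sentence)
print(tokens)

Following this example, I still got an error. By default word_tokenize first splits the text into sentences with the same punkt model, so it fails for the same reason until the resource is downloaded.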
