Pitfalls encountered while installing NLTK

Latest recommended article published on 2023-02-01 17:21:22

槐夏廿七

Article link: https://blog.csdn.net/qq_42452095/article/details/112236655

Cannot download

Following the tutorials found online, running the code below fails with a "Connection refused" error:
```
import nltk
nltk.download()
```
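Solution: download the data files manually from https://github.com/nltk/nltk_data/tree/gh-pages (about 620 MB when I downloaded it on 2021-01-05). After downloading, unzip the archive, rename the packages folder to "nltk_data", and move it to the Download Directory path shown in the NLTK downloader window.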
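The unzip, rename, and move steps above can also be scripted. The following is only a minimal sketch using the standard library. It assumes the downloaded archive is a zip file that contains a single `packages` folder, possibly nested under a top-level folder. The function name `install_nltk_data` and its parameters are hypothetical and are not part of NLTK.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def install_nltk_data(archive: Path, target_parent: Path) -> Path:
    """Extract the archive and move its packages folder to target_parent/nltk_data."""
    target = target_parent / "nltk_data"
    if target.exists():
        # Refuse to overwrite an existing data folder.
        raise FileExistsError(f"{target} already exists; remove or merge it manually")

    with tempfile.TemporaryDirectory() as tmp:
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(tmp)
        # Assumption: the archive holds a folder named "packages",
        # possibly nested under one top-level folder.
        candidates = [p for p in Path(tmp).rglob("packages") if p.is_dir()]
        if not candidates:
            raise FileNotFoundError("no 'packages' folder found in the archive")
        # Pick the shallowest match in case a deeper folder shares the name.
        packages = min(candidates, key=lambda p: len(p.parts))
        target_parent.mkdir(parents=True, exist_ok=True)
        # Moving the folder to .../nltk_data performs the rename in one step.
        shutil.move(str(packages), str(target))
    return target
```

For example, `install_nltk_data(Path("nltk_data-gh-pages.zip"), Path.home())` (the archive file name here is an assumption) would create `nltk_data` in the home directory, which is one of the locations NLTK searches by default. The directories NLTK actually searches on a given machine are listed in `nltk.data.path`.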

Then run the following code:
```
import nltk
from nltk.book import *
```
If the following message is printed, the installation succeeded:

*** Introductory Examples for the NLTK Book ***
Loading text1, ..., text9 and sent1, ..., sent9
Type the name of the text or sentence to view it.
Type: 'texts()' or 'sents()' to list the materials.
text1: Moby Dick by Herman Melville 1851
text2: Sense and Sensibility by Jane Austen 1811
text3: The Book of Genesis
text4: Inaugural Address Corpus
text5: Chat Corpus
text6: Monty Python and the Holy Grail
text7: Wall Street Journal
text8: Personals Corpus
text9: The Man Who Was Thursday by G . K . Chesterton 1908

Tokenization does not work

When I used word_tokenize for tokenization, I still got an error saying that "punkt" was missing. The fix is as follows.

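Download the data file from https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/tokenizers/punkt.zip, then unzip it into the tokenizers folder under the nltk_data directory described above. After that, tokenization works normally.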
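The unzip step can be sketched with the standard library as follows. This assumes punkt.zip has already been downloaded and that it unpacks into a single top-level `punkt` folder. The helper name `install_punkt` and its parameters are hypothetical and are not part of NLTK.

```python
import zipfile
from pathlib import Path


def install_punkt(punkt_zip: Path, nltk_data_dir: Path) -> Path:
    """Unzip punkt.zip into <nltk_data_dir>/tokenizers and return the punkt folder."""
    tokenizers = nltk_data_dir / "tokenizers"
    # Create the tokenizers folder if the manual install did not include it.
    tokenizers.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(punkt_zip) as zf:
        zf.extractall(tokenizers)
    # Assumption: the archive unpacks to a folder named "punkt".
    punkt = tokenizers / "punkt"
    if not punkt.is_dir():
        raise FileNotFoundError(f"expected {punkt} after extraction")
    return punkt
```

If `nltk_data_dir` is one of the directories in `nltk.data.path`, then `nltk.data.find("tokenizers/punkt")` should no longer raise `LookupError`, and `word_tokenize` should be able to load the model.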
