Pitfalls encountered while installing NLTK
- Data download fails
Following the tutorials found online, running the code below fails with "Connection refused":
import nltk
nltk.download()
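Before switching to a manual download, it can help to confirm that the failure really is a network problem. The sketch below only tests whether a TCP connection can be opened. `can_reach` is a hypothetical helper, not part of NLTK, and the choice of host to probe is an assumption: raw.githubusercontent.com is the host serving the punkt.zip link in this post, and as far as I know the NLTK downloader fetches its index from the same host.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "Connection refused", timeouts and DNS failures alike.
        return False
```

Calling `can_reach("raw.githubusercontent.com")` and getting `False` suggests the problem is connectivity rather than the NLTK installation itself.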
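Workaround: download the data files manually from https://github.com/nltk/nltk_data/tree/gh-pages (roughly 620 MB when I downloaded it on 2021-01-05). Unzip the archive, rename the packages folder to "nltk_data", and move it to the Download Directory path shown in the NLTK downloader window.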
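The manual steps can also be scripted. This is a minimal sketch using only the standard library. `install_nltk_data` is a hypothetical helper, and it assumes the download is a ZIP of the gh-pages branch that contains a `packages` folder somewhere inside. `target` is the final location of the renamed folder, which has to be a directory NLTK actually searches (for example the Download Directory mentioned above).

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def install_nltk_data(archive: str, target: str) -> Path:
    """Unpack the archive and move its `packages` folder to `target`."""
    dest = Path(target)
    if dest.exists():
        raise FileExistsError(f"{dest} already exists")
    with tempfile.TemporaryDirectory() as tmp:
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(tmp)
        # Locate the `packages` folder wherever it sits in the archive.
        candidates = [p for p in Path(tmp).rglob("packages") if p.is_dir()]
        if not candidates:
            raise FileNotFoundError("no 'packages' folder in archive")
        # Prefer the match closest to the archive root.
        source = min(candidates, key=lambda p: len(p.parts))
        dest.parent.mkdir(parents=True, exist_ok=True)
        # Moving to a new name performs the rename to "nltk_data".
        shutil.move(str(source), str(dest))
    return dest
```

The function refuses to overwrite an existing folder, so a partially populated nltk_data directory has to be removed or merged by hand first.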
Then run:
import nltk
from nltk.book import *
If the following messages are printed, the installation succeeded:
*** Introductory Examples for the NLTK Book ***
Loading text1, ..., text9 and sent1, ..., sent9
Type the name of the text or sentence to view it.
Type: 'texts()' or 'sents()' to list the materials.
text1: Moby Dick by Herman Melville 1851
text2: Sense and Sensibility by Jane Austen 1811
text3: The Book of Genesis
text4: Inaugural Address Corpus
text5: Chat Corpus
text6: Monty Python and the Holy Grail
text7: Wall Street Journal
text8: Personals Corpus
text9: The Man Who Was Thursday by G . K . Chesterton 1908
- Tokenization does not work
Calling word_tokenize to tokenize text still raised an error, this time saying that "punkt" was missing. The fix is as follows.
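Download the data file from https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/tokenizers/punkt.zip, unzip it, and put the result in the tokenizers folder under the nltk_data directory set up above. Tokenization then works normally.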
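The same step as a short sketch, assuming punkt.zip has already been downloaded from the link above. `install_punkt` is a hypothetical helper and both paths are placeholders. It also assumes the archive unpacks to a single `punkt` folder, so the result is `tokenizers/punkt`.

```python
import zipfile
from pathlib import Path


def install_punkt(punkt_zip: str, nltk_data_dir: str) -> Path:
    """Extract punkt.zip into <nltk_data_dir>/tokenizers and return that folder."""
    tokenizers = Path(nltk_data_dir) / "tokenizers"
    # Create the folder if the manual nltk_data copy did not include it.
    tokenizers.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(punkt_zip) as zf:
        zf.extractall(tokenizers)
    return tokenizers
```

After this, a call such as `nltk.word_tokenize("Hello world.")` should no longer complain about punkt. Newer NLTK releases may ask for `punkt_tab` rather than `punkt`; in that case the resource name in the error message is the one to fetch.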