NotImplementedError: To remove HTML markup, use BeautifulSoup's get_text() function
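After some digging, it appears that the corresponding nltk method (clean_html()) no longer works; switching to BeautifulSoup's equivalent method fixes it. The code is as follows: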
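import urllib.request
from bs4 import BeautifulSoup

response = urllib.request.urlopen('your url')  # replace with the page to fetch
html = response.read()  # raw HTML as bytes
# Name the parser explicitly to avoid bs4's "no parser specified" warning
clean = BeautifulSoup(html, 'html.parser').get_text()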
No module named 'bs4'
pip install beautifulsoup4
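If installing the package is not an option, one possible workaround is to guard the import and fall back to the standard library. The sketch below is only an illustration under that assumption: html_to_text and _TextCollector are hypothetical names, not part of nltk or bs4, and the fallback uses html.parser.HTMLParser rather than BeautifulSoup, so its output may differ on more complex pages.

```python
from html.parser import HTMLParser

try:
    from bs4 import BeautifulSoup  # third-party package, installed via pip
except ImportError:
    BeautifulSoup = None  # not installed: use the standard-library fallback


class _TextCollector(HTMLParser):
    """Hypothetical helper: keeps text nodes and ignores the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Called for each run of text found between tags
        self.parts.append(data)


def html_to_text(html):
    """Hypothetical helper: return the text content of an HTML string."""
    if BeautifulSoup is not None:
        return BeautifulSoup(html, "html.parser").get_text()
    collector = _TextCollector()
    collector.feed(html)
    collector.close()  # flush any text still buffered by the parser
    return "".join(collector.parts)


print(html_to_text("<p>Hello <b>world</b></p>"))
```

For simple markup like the sample string, both paths should return the same text; installing the package remains the simpler fix when it is possible.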