While learning NLTK, I ran into this error: To remove HTML markup, use BeautifulSoup's get_text() function
The code that triggered it was:
html = response.read()
clean = nltk.clean_html(html)
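After looking into it, it seems that NLTK's clean_html method no longer works. Switching to the equivalent BeautifulSoup method fixes the problem. The code is as follows: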
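import urllib.request
from bs4 import BeautifulSoup

response = urllib.request.urlopen('your url')  # replace with the page to fetch
html = response.read()
# Name the parser explicitly so bs4 does not warn that none was specified
clean = BeautifulSoup(html, 'html.parser').get_text()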
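
To try the replacement without fetching a page, here is a minimal sketch that runs on an inline HTML string. The helper name html_to_text and the sample markup are made up for illustration and are not part of NLTK or BeautifulSoup. Depending on the bs4 version, get_text() may or may not include the contents of script and style tags, so the sketch removes them explicitly, which is harmless either way.

```python
from bs4 import BeautifulSoup


def html_to_text(html):
    """Return the readable text of an HTML document (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    # Some bs4 versions include <script>/<style> contents in get_text(),
    # so drop those elements before extracting text.
    for tag in soup(["script", "style"]):
        tag.decompose()
    # Strip each text fragment and join the fragments with single spaces.
    return soup.get_text(separator=" ", strip=True)


# Made-up sample markup, used only for this illustration.
sample = (
    "<html><head><style>p {color: red}</style></head>"
    "<body><h1>Title</h1><p>Hello <b>world</b>.</p>"
    "<script>var x = 1;</script></body></html>"
)

if __name__ == "__main__":
    print(html_to_text(sample))
```

The same helper can take the bytes returned by response.read() in place of the sample string, since BeautifulSoup accepts both str and bytes input.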