Errors when parsing HTML with HTMLParser on Python 3.6:
No module named 'htmlentitydefs' or No module named 'markupbase'
Code first:
from HTMLParser import HTMLParser  # Python 2 module name; this import is what fails on Python 3
import urllib.request


class myhtml(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self)
        self.flag = 0
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Collect the href value of every <a> tag
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)


if __name__ == "__main__":
    parser = myhtml()
    myurl = "https://www.cnblogs.com/pinpin"
    html = urllib.request.urlopen(myurl)
    html_connect = html.read()
    # Decode the response bytes to str (UTF-8 by default) before feeding the parser
    html_connect = bytes.decode(html_connect)
    parser.feed(html_connect)
    print(parser.links)
The error is as follows:
TypeError: No module named 'htmlentitydefs'
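In short, this is an import error. The obvious fix would be to install the missing module, but this one cannot be installed, so I kept looking.
What I found by searching on Baidu: 'htmlentitydefs' no longer exists under that name in Python 3. It was renamed to html.entities; likewise markupbase became _markupbase and HTMLParser became html.parser.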
So what now? After some digging, I found a very simple fix.
Inspired by:
http://stackoverflow.max-everyday.com/2018/06/python3-importerror-no-module-named-htmlparser/
from HTMLParser import HTMLParser   # Python 2 only
from html.parser import HTMLParser  # Python 3: use this form; after switching to it, the problem was solved
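For a script that has to run on both versions, the two imports can be combined with a try/except. The sketch below is one possible way to do it, not the only one. The class name LinkCollector and the inline sample_html string are assumptions made for illustration; the inline string replaces the network request so the example runs offline.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser   # Python 2 fallback


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)


# Hypothetical sample input standing in for a downloaded page.
sample_html = (
    '<p><a href="https://www.cnblogs.com/pinpin">blog</a> '
    '<a name="x">no href</a> '
    '<a href="/about">about</a></p>'
)

collector = LinkCollector()
collector.feed(sample_html)
print(collector.links)  # ['https://www.cnblogs.com/pinpin', '/about']
```

Trying the Python 3 import first means the standard library module is used even if a Python 2 package named HTMLParser happens to be installed.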