- import urllib
-
- def getHtml(url):
-     # Python 2 style call; under Python 3 this line raises the error below
-     page = urllib.urlopen(url)
-     html = page.read()
-     return html
-
- url = "http://tieba.baidu.com/p/4040087257/"
- html = getHtml(url)
-
- print(html)
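Error:
“AttributeError: 'module' object has no attribute 'urlopen'”
The cause is that the urllib module was reorganized in Python 3: urlopen now lives in urllib.request, so every use of urllib here should be changed to urllib.request.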
- import urllib.request
-
- def getHtml(url):
-     page = urllib.request.urlopen(url)
-     html = page.read()
-     return html
-
- url = "http://tieba.baidu.com/p/4040087257/"
- html = getHtml(url)
-
- print(html)
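It runs successfully! The next step is a function that pulls the image URLs out of the page with a regular expression: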
- import re
-
- def getImg(html):
-     reg = r'src="(.+?\.jpg)" pic_ext'
-     imgre = re.compile(reg)
-     # html is still bytes at this point, so this call fails in Python 3
-     imglist = re.findall(imgre, html)
-     return imglist
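Error:
TypeError: can't use a string pattern on a bytes-like object
The cause is that in Python 3 page.read() returns bytes, while the regular expression here is a str pattern, and findall cannot apply a str pattern to bytes. The fix is to add html = html.decode('utf-8') before the regular expression is used.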
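Here is a minimal sketch of getImg with that fix applied. It assumes the page is UTF-8 encoded, and the sample fragment and its example.com URLs are made up for illustration so the function can be tried without a network request.

```python
import re

def getImg(html):
    # urlopen(...).read() returns bytes in Python 3; decode to str so a
    # str pattern can be applied. UTF-8 is an assumption about the page.
    if isinstance(html, bytes):
        html = html.decode('utf-8')
    reg = r'src="(.+?\.jpg)" pic_ext'
    imgre = re.compile(reg)
    return imgre.findall(html)

# Hypothetical fragment standing in for the downloaded page
sample = (b'<img src="http://example.com/a.jpg" pic_ext="jpeg">'
          b'<img src="http://example.com/b.jpg" pic_ext="jpeg">')
imglist = getImg(sample)
print(imglist)
```

The isinstance check lets the same function accept either the raw bytes returned by getHtml or text that has already been decoded.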