Reference: "http://blog.sina.com.cn/s/blog_5cf74e410102uxsg.html"
A very short piece of code:
#!/usr/bin/python
# -*- coding: UTF-8 -*-
import urllib

def getHtml(url):
    # Python 2 style call; under Python 3 this line raises AttributeError
    page = urllib.urlopen(url)
    html = page.read()
    return html

url = "http://tieba.baidu.com/p/4040087257/"
html = getHtml(url)
print(html)
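Error:
“AttributeError: 'module' object has no attribute 'urlopen'”
The cause is that the urllib module was reorganized in Python 3: urlopen now lives in the urllib.request submodule, so urllib.urlopen here should be changed to urllib.request.urlopen (and the import to import urllib.request).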
#!/usr/bin/python
# -*- coding: UTF-8 -*-
import urllib.request

def getHtml(url):
    page = urllib.request.urlopen(url)
    html = page.read()
    return html

url = "http://tieba.baidu.com/p/4040087257/"
html = getHtml(url)
print(html)
It runs successfully!
import re

def getImg(html):
    # html is still the bytes object returned by getHtml()
    reg = r'src="(.+?\.jpg)" pic_ext'
    imgre = re.compile(reg)
    imglist = re.findall(imgre, html)
    return imglist
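Error:
TypeError: can't use a string pattern on a bytes-like object
The cause is that in Python 3 page.read() returns bytes, while the regular expression is a str pattern, and re.findall cannot apply a str pattern to a bytes object. Decode the page first by adding html = html.decode('utf-8') before the regular expression is used.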
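The fix can be sketched as follows. This is a minimal example rather than the original author's final code: the name get_img, the encoding parameter, the errors="replace" setting, and the sample bytes are all assumptions made for illustration. The sample stands in for what page.read() returns, so the snippet runs without network access.

```python
import re

# Same pattern as in the post: captures a .jpg URL followed by a pic_ext attribute.
IMG_PATTERN = re.compile(r'src="(.+?\.jpg)" pic_ext')

def get_img(html_bytes, encoding="utf-8"):
    """Return the .jpg URLs found in a page given as raw bytes."""
    # Decode first so that the str pattern is matched against a str.
    # errors="replace" is an assumption: it keeps one invalid byte
    # from aborting the whole run.
    html = html_bytes.decode(encoding, errors="replace")
    return IMG_PATTERN.findall(html)

# Hypothetical sample standing in for the bytes returned by page.read().
sample = (b'<img src="http://example.com/a.jpg" pic_ext="jpeg">'
          b'<img src="http://example.com/b.jpg" pic_ext="jpeg">')
print(get_img(sample))
```

An alternative would be to leave the page as bytes and compile a bytes pattern instead (rb'...'); the matches would then be bytes objects as well.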