The source code for section 5.2 in Chapter 5 of 《python爬虫与项目实战》 raises an error.
The source code is as follows:
# -*- coding:utf-8 -*-
import urllib
from lxml import etree
import requests


def Schedule(blocknum, blocksize, totalsize):
    '''
    :param blocknum: number of data blocks downloaded so far
    :param blocksize: size of each data block
    :param totalsize: size of the remote file
    :return:
    '''
    per = 100.0*blocknum*blocksize/totalsize
    if per > 100:
        per = 100
    # The % formatting must happen inside the print() call
    print('Current download progress: %d' % per)


user_agent = 'Mozilla/4.0 (compatible; MSIE 5.0; Windows NT)'
headers = {'User-Agent':user_agent}
response = requests.get('http://www.ivsky.com/tupian/ziranfengguang/', headers=headers)
# Parse the page with lxml
html = etree.HTML(response.text)
# First find the src of every img element
img_urls = html.xpath('.//img/@src')
i = 0
for img_url in img_urls:
    urllib.urlretrieve(img_url, 'img'+str(i)+'.jpg', Schedule)
    i += 1
Running it under Python 3 fails with:
AttributeError: module 'urllib' has no attribute 'urlretrieve'
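Solution:
In Python 3, urlretrieve was moved from the top-level urllib module into the urllib.request submodule. Change the second-to-last line to the following, and import urllib.request explicitly, since a plain "import urllib" does not load the submodule by itself: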
urllib.request.urlretrieve(img_url, 'img'+str(i)+'.jpg', Schedule)
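
For reference, here is a minimal Python 3 sketch of the progress callback together with the corrected urlretrieve call. The helper names progress_percent, schedule and download are made up for this illustration and do not come from the book. The guard for an unknown file size is also an addition, on the assumption that a server may not send a Content-Length header.

```python
import urllib.request  # Python 3 location of urlretrieve


def progress_percent(blocknum, blocksize, totalsize):
    """Return the download progress as a percentage, capped at 100."""
    if totalsize <= 0:
        # Size unknown (urlretrieve passes -1) or empty file: avoid dividing by zero.
        return 0.0
    return min(100.0, 100.0 * blocknum * blocksize / totalsize)


def schedule(blocknum, blocksize, totalsize):
    """reporthook for urlretrieve: print the current progress."""
    percent = progress_percent(blocknum, blocksize, totalsize)
    print('Current download progress: %d' % percent)


def download(url, filename):
    """Save url to filename, reporting progress through schedule()."""
    return urllib.request.urlretrieve(url, filename, schedule)
```

In the crawler loop above, the call would then become download(img_url, 'img'+str(i)+'.jpg').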