Quick notes
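- When calling urllib.urlretrieve(url, filename) (urllib.request.urlretrieve in Python 3), the download often stops partway and raises urllib.ContentTooShortError (urllib.error.ContentTooShortError in Python 3). The error means fewer bytes arrived than the Content-Length header announced, so the file is incomplete.
- urllib.urlretrieve(url, filename) can also wait far too long on a slow or stalled connection. No socket timeout is set by default, so the program looks stuck in an infinite loop or frozen.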
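import os
import socket
import urllib.request

import requests
# Assumption: html.fromstring(...) and .xpath(...) below come from lxml.html
from lxml import html

# Set the global socket timeout to 30 s
socket.setdefaulttimeout(30)
# Handle incomplete downloads and avoid getting stuck forever
# ... code omitted ...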
for page_2 in range(2, int(pageEle) + 1):
    try:
        url = imgUrl.replace('.html', '_%s.html' % str(page_2))
        response = requests.get(url).text
        selector = html.fromstring(response)
        imgEle = selector.xpath('//a[@class="down-btn"]/@href')[0]
        print(imgEle)
        imgName = '%s_%s_%s.jpg' % (page, str(index + 1), page_2)
        coverPath = '%s/%s/%s' % (os.getcwd(), ss, imgName)
        # coverPath = '%s/meizi1/%s' % (os.getcwd(), imgName)
        # print("zoujunbo")
        # Raises ValueError "unknown url type: ''" if imgEle is an empty string
        urllib.request.urlretrieve(imgEle, coverPath)
    except socket.timeout:
        # Timed out: skip this image and move on
        print("Failed to download a single image")
        # urllib.request.urlretrieve(imgEle, coverPath)
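The loop above only skips an image on socket.timeout, so ContentTooShortError still escapes. A minimal sketch of a bounded retry wrapper that handles both cases might look like the following. The names download_with_retry, max_attempts and retrieve are hypothetical and not part of the original script; the retrieve parameter exists only so the helper can be exercised without a network.

```python
import socket
import urllib.error
import urllib.request


def download_with_retry(url, path, max_attempts=3,
                        retrieve=urllib.request.urlretrieve):
    """Download url to path; return True on success, False otherwise.

    Hypothetical helper for illustration, not from the original script.
    """
    if not url:
        # An empty href would make urlretrieve raise
        # ValueError("unknown url type: ''"), so skip it up front.
        return False
    for attempt in range(1, max_attempts + 1):
        try:
            retrieve(url, path)
            return True
        except urllib.error.ContentTooShortError:
            # Fewer bytes arrived than Content-Length announced; try again.
            # Must be caught before URLError, which is its parent class.
            print("incomplete download, attempt %d of %d"
                  % (attempt, max_attempts))
        except (socket.timeout, urllib.error.URLError):
            # Read timeouts surface as socket.timeout; connection failures
            # (including connect timeouts) and HTTP errors as URLError.
            print("timeout or network error, attempt %d of %d"
                  % (attempt, max_attempts))
    # Bounded attempts: give up instead of looping forever.
    return False
```

Inside the loop, the urlretrieve call could then become `if not download_with_retry(imgEle, coverPath): print("Failed to download a single image")`, with socket.setdefaulttimeout(30) still set once at the top so each attempt is time-limited.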
Reference 1: https://blog.csdn.net/jclian91/article/details/77513289
Reference 2: https://blog.csdn.net/Innovation_Z/article/details/51106601