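1. Common errors
HTTPConnectionPool(host:XX) Max retries exceeded with url:
To close the connection as soon as a request finishes and release its slot in the connection pool, send this header: headers={'Connection': 'close'}
To route the request through a proxy IP: requests.get(url=url, headers=headers, proxies={'https': '134.209.13.16:8080'}).text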
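The two remedies above can be combined into one small helper. The following is only a minimal sketch under stated assumptions: the names PROXY_POOL, build_request_kwargs and fetch are hypothetical (they do not appear in the original notes), the second proxy address is a documentation placeholder, and the 10-second timeout is an arbitrary choice. One detail worth noting: the keys of the proxies dictionary are matched against the scheme of the target URL, so a dictionary that contains only an 'https' entry is not applied to an http:// URL. The sketch therefore registers the proxy under both schemes.

```python
import random

# Placeholder proxy addresses (assumptions for illustration only;
# the first is the one quoted in the notes, the second is a documentation address).
PROXY_POOL = ["134.209.13.16:8080", "203.0.113.10:3128"]


def build_request_kwargs(proxy_pool=None, timeout=10):
    """Build keyword arguments for requests.get().

    Always sends 'Connection: close' so the server closes the connection
    after the response instead of keeping it alive in the pool.
    """
    kwargs = {
        "headers": {"Connection": "close"},
        "timeout": timeout,
    }
    if proxy_pool:
        # Pick one proxy at random and register it for both URL schemes.
        proxy = "http://" + random.choice(proxy_pool)
        kwargs["proxies"] = {"http": proxy, "https": proxy}
    return kwargs


def fetch(url, proxy_pool=None):
    """Fetch a page and return its text, raising on HTTP error statuses."""
    # Third-party dependency, imported here so that build_request_kwargs()
    # stays usable (and testable) without network access or requests installed.
    import requests

    response = requests.get(url, **build_request_kwargs(proxy_pool))
    response.raise_for_status()
    return response.text
```

A call such as fetch('http://sc.chinaz.com/jianli/free.html', PROXY_POOL) would then send the request through a randomly chosen proxy. Whether that succeeds depends on the proxy still being alive, which free proxies frequently are not.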
2. Scraping resume templates from 站长素材 (sc.chinaz.com)
# Scrape the free resume templates from 站长素材 (sc.chinaz.com)
import requests
from lxml import etree
import random
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36',
    'Connection': 'close'
}
url_page_one = 'http://sc.chinaz.com/jianli/free.html'
# Generic URL template; %d is replaced by the page number
url_demo = 'http://sc.chinaz.com/jianli/free_%d.html'
start_page = int(input('enter a start page num: '))
end_page = int(input('enter an end page num: '))
for pageNum in ran