拉钩 (Lagou) - Crawler Notes
map()
map() distributes tasks automatically, applying the worker function to each URL in the list in turn, which makes multi-process crawling straightforward.
from multiprocessing import Pool
import urllib.request
import urllib.error

def scrape(url):
    try:
        urllib.request.urlopen(url)
        print(f'URL {url} Scraped')
    except (urllib.error.HTTPError, urllib.error.URLError):
        print(f'URL {url} not Scraped')

if __name__ == '__main__':
    # A pool of 3 worker processes; map() hands each URL to an idle worker.
    pool = Pool(processes=3)
    urls = [
        'https://www.baidu.com',
        'http://www.meituan.com/',
        'http://blog.csdn.net/',
        'http://xxxyxxx.net'
    ]
    pool.map(scrape, urls)
    pool.close()
    pool.join()
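pool.map() also collects each call's return value and gives back a list in the same order as the input URLs, which is useful when you want the result rather than just printed output. A minimal sketch, where fetch_status is a hypothetical variant of scrape() that returns the HTTP status code:

from multiprocessing import Pool
import urllib.request
import urllib.error

# Hypothetical variant of scrape() that returns the status code instead of printing.
def fetch_status(url):
    try:
        return url, urllib.request.urlopen(url).getcode()
    except (urllib.error.HTTPError, urllib.error.URLError):
        return url, None

if __name__ == '__main__':
    urls = ['https://www.baidu.com', 'http://blog.csdn.net/']
    with Pool(processes=3) as pool:
        results = pool.map(fetch_status, urls)   # blocks until every URL is done
    print(results)   # list of (url, status) pairs, in the same order as urls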
A few uses of requests
When fetching an image, just write the response out in the matching file format. Note that r.content is the raw (byte) body of the response, while r.text returns the body decoded to Unicode text.
import requests

# CONST.RESOURCES[0] and headers come from the surrounding project code
r = requests.get(CONST.RESOURCES[0], headers=headers)
# print(r.text)  # decoded text; not suitable for binary data
with open("picTest.png", 'wb') as pic:
    pic.write(r.content)   # write the raw bytes to disk
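To make the bytes-vs-text distinction concrete, a small sketch (the URL is just a placeholder):

import requests

r = requests.get('https://www.baidu.com')
print(type(r.content))   # <class 'bytes'>  - raw response body
print(type(r.text))      # <class 'str'>    - decoded using r.encoding
print(r.encoding)        # encoding inferred from the response headers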
r.cookies can be used to read the cookies set by a response; to send cookies, you can set a Cookie field in the headers:
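A minimal sketch of both directions, using a placeholder URL and a made-up cookie value:

import requests

# Read cookies that the server set on the response.
r = requests.get('https://www.baidu.com')
for name, value in r.cookies.items():
    print(name, '=', value)

# Send a cookie by putting it into the Cookie header (value is a made-up example).
headers = {
    'Cookie': 'sessionid=xxxx; token=yyyy',
    'User-Agent': 'Mozilla/5.0',
}
r = requests.get('https://www.baidu.com', headers=headers)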