1.2.4 Saving the data (multithreading)
- Import the threading module
import threading

# Multithreaded download: one thread per image
def download_imgs(imgList, limit):
    threads = [
        threading.Thread(target=download, args=(url, i))
        for i, url in enumerate(imgList[:limit + 1])  # note: the slice takes limit + 1 URLs
    ]
    for t in threads:
        t.start()
    return threads
- Write the download function
import requests

def download(img_url, name):
    try:
        resp = requests.get(img_url)
        with open(f'./images/{name}.jpg', 'wb') as f:
            f.write(resp.content)
    except Exception as e:
        print(f"Download failed: {name} {img_url} -> {e}")
    else:
        print(f"Download finished: {name} {img_url}")
Because the threads run concurrently, the completion messages print in an essentially random order.
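Tying the two functions together, the pattern can be sketched as below. To keep the sketch runnable offline, the `requests`-based body of `download` is stubbed out to write raw bytes instead of fetching a URL; the thread start/join structure is the same. Calling `join()` on the returned threads is what lets the caller wait for all downloads to finish.

```python
import os
import threading

def download(content, name):
    # Stand-in for the requests-based download(): write bytes to ./images/<name>.jpg
    with open(f'./images/{name}.jpg', 'wb') as f:
        f.write(content)

def download_imgs(imgList, limit):
    # Same structure as in the text: one thread per item, slice takes limit + 1 entries
    threads = [
        threading.Thread(target=download, args=(data, i))
        for i, data in enumerate(imgList[:limit + 1])
    ]
    for t in threads:
        t.start()
    return threads

os.makedirs('./images', exist_ok=True)       # make sure the target folder exists
fake_images = [b'img0', b'img1', b'img2', b'img3', b'img4']
threads = download_imgs(fake_images, limit=2)  # limit + 1 = 3 files are written
for t in threads:
    t.join()  # block until every download thread has finished
```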
Experiment 2
2.1 Task
Reproduce Assignment ① using the Scrapy framework.
2.2 Approach
2.2.1 settings.py
- Disable robots.txt enforcement
ROBOTSTXT_OBEY = False
- Set the directory where images are saved
IMAGES_STORE = r'.\images'  # path for saved files
- Enable the item pipeline
ITEM_PIPELINES = {
    'weatherSpider.pipelines.WeatherspiderPipeline': 300,
}
- Set the default request headers
DEFAULT_REQUEST_HEADERS = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.16 Safari/537.36',
}
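The pipeline registered under `ITEM_PIPELINES` lives in `pipelines.py`, which is not shown here. As a hedged sketch of what such a pipeline might do, the class below appends each scraped item to a CSV file; the field names (`number`, `city`, `weather`) are placeholders, not necessarily the fields the actual `items.py` defines.

```python
import csv

class WeatherspiderPipeline:
    # Illustrative sketch of a Scrapy item pipeline: Scrapy calls open_spider
    # once at startup, process_item for every item, and close_spider at the end.
    def open_spider(self, spider):
        self.file = open('weather.csv', 'w', newline='', encoding='utf-8')
        self.writer = csv.writer(self.file)
        self.writer.writerow(['number', 'city', 'weather'])  # placeholder field names

    def process_item(self, item, spider):
        self.writer.writerow([item.get('number'), item.get('city'), item.get('weather')])
        return item  # pass the item on to any lower-priority pipeline

    def close_spider(self, spider):
        self.file.close()
```

The value 300 in `ITEM_PIPELINES` is this pipeline's priority: pipelines with smaller numbers run first, so several pipelines can be chained in a defined order.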
2.2.2 items.py
- Define the fields to scrape
class WeatherspiderItem(scrapy.Item):
numbe