import requests
from lxml import etree
import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor
from multiprocessing import Pool
import os
import threading
import psutil
# Spoof a browser User-Agent so requests look like they come from a real browser
header = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36",
}
def downimg(img_src):
    start_time = time.time()
    # File name without its extension, e.g. '/images/1.png' -> '1'
    name = img_src.split('/')[-1].split('.')[0]
    img_url = "http://127.0.0.1:8080" + img_src
    img = requests.get(img_url)
    dir_path = 'step1/images'
    # exist_ok avoids a race when several threads or processes create the directory at once
    os.makedirs(dir_path, exist_ok=True)
    img_path = 'step1/images'
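Educoder Level 1: Multithreaded and Multiprocess Crawlers
First published on 2021-06-18 23:43:12
This article covers the first-level challenge on the Educoder platform, which focuses on implementing multithreaded and multiprocess web crawlers in Python. Through hands-on practice, it looks at how concurrency can improve crawler throughput and how Python's threading and process modules are used.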
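The concurrency idea described in the summary can be sketched roughly as follows. This is a minimal illustration and not the article's own solution: `run_pool` and `fake_download` are hypothetical names, and `fake_download` only sleeps to mimic network latency instead of fetching anything from the exercise server.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_pool(executor_cls, task, items, max_workers=4):
    """Apply task to every item using the given executor class.

    Returns (results, elapsed_seconds); results keep the order of items.
    """
    start = time.time()
    with executor_cls(max_workers=max_workers) as pool:
        results = list(pool.map(task, items))
    return results, time.time() - start


def fake_download(img_src):
    # Hypothetical stand-in for a real download: sleep to mimic network
    # latency, then return the file name without its extension.
    time.sleep(0.05)
    return img_src.split('/')[-1].split('.')[0]


if __name__ == "__main__":
    srcs = ["/images/%d.png" % i for i in range(8)]  # hypothetical paths
    names, elapsed = run_pool(ThreadPoolExecutor, fake_download, srcs)
    print(names, "%.2fs" % elapsed)
```

Passing `ProcessPoolExecutor` (also from `concurrent.futures`) as `executor_cls` should work the same way, provided the task is a picklable top-level function and the call sits under the `if __name__ == "__main__":` guard. For I/O-bound work such as image downloads, threads are usually sufficient, while processes tend to help more when the work is CPU-bound.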