Why use multithreading when writing a crawler?
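For IO-bound code (file processing, web crawling, etc.), multithreading can noticeably improve efficiency. A single thread sits idle during every IO wait, such as waiting for a server to respond, which wastes time. With multiple threads, while thread A is waiting on IO, execution switches to thread B, so the CPU is not left idle and the program finishes sooner. (In CPython the GIL is released during blocking IO, which is why threads help with IO-bound work but not with CPU-bound work.)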
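A minimal sketch of the effect described above, assuming `time.sleep` as a stand-in for a blocking network request (`fake_io`, `run_sequential` and `run_threaded` are hypothetical names for illustration). With 10 calls of 0.1 s each, the sequential version should take roughly 1 s, while the threaded one should finish in about 0.1 s plus thread overhead; exact timings vary by machine.

```python
import threading
import time


def fake_io(delay: float) -> None:
    """Hypothetical stand-in for a blocking IO call such as an HTTP request."""
    time.sleep(delay)  # the thread blocks here, like waiting for a server


def run_sequential(n: int, delay: float) -> float:
    """Run n blocking calls one after another; return elapsed seconds."""
    start = time.perf_counter()
    for _ in range(n):
        fake_io(delay)
    return time.perf_counter() - start


def run_threaded(n: int, delay: float) -> float:
    """Run n blocking calls, each in its own thread; return elapsed seconds."""
    start = time.perf_counter()
    threads = [threading.Thread(target=fake_io, args=(delay,)) for _ in range(n)]
    for t in threads:
        t.start()  # all threads start waiting at roughly the same time
    for t in threads:
        t.join()   # wait until every thread has finished
    return time.perf_counter() - start


if __name__ == "__main__":
    seq = run_sequential(10, 0.1)
    thr = run_threaded(10, 0.1)
    print(f"sequential: {seq:.2f}s, threaded: {thr:.2f}s")
```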
Using a thread pool:
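import requests
import threadpool  # third-party package: pip install threadpool


def main(url):
    # Download one page and print its HTML
    html = requests.get(url, timeout=10).text
    print(html)


if __name__ == '__main__':
    # Build the list of pages to crawl (placeholder URLs)
    url_list = []
    for i in range(1, 100):
        url = 'http://hahahaahahaha' + str(i)
        url_list.append(url)

    pool = threadpool.ThreadPool(30)                # 30 worker threads
    reqs = threadpool.makeRequests(main, url_list)  # one work request per URL
    for req in reqs:
        pool.putRequest(req)                        # queue each request
    pool.wait()                                     # block until all are done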
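The `threadpool` package is an older third-party library. As an alternative, here is a sketch using the standard library's `concurrent.futures.ThreadPoolExecutor`; `fetch` and `crawl` are hypothetical helper names, and downloading with `urllib.request` instead of `requests` is an assumption made to keep the example dependency-free. Each URL gets its own future, so one failing page is recorded as an error instead of aborting the whole crawl.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def fetch(url: str, timeout: float = 10.0) -> str:
    """Hypothetical helper: download one page with the stdlib and return its text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(urls: List[str],
          fetcher: Callable[[str], str] = fetch,
          max_workers: int = 30) -> Dict[str, str]:
    """Fetch all URLs on a pool of max_workers threads.

    Returns a dict mapping each URL (in input order) to its page text,
    or to an "ERROR: ..." string if that download raised.
    """
    results: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit every URL up front; the pool runs at most max_workers at once
        futures = {url: pool.submit(fetcher, url) for url in urls}
        for url, fut in futures.items():
            try:
                results[url] = fut.result()  # blocks until this task is done
            except Exception as exc:  # one failed page should not stop the crawl
                results[url] = f"ERROR: {exc}"
    return results
```

Usage: `crawl(url_list)`, with `url_list` built as in the example above, returns a dict of URL to page text. Passing a different `fetcher` (for example one that returns `requests.get(url, timeout=10).text`) changes how pages are downloaded without touching the pooling logic.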