Synchronous code:
import requests
import time
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36'
}
# Flask server code (run it as a separate script, because app.run() blocks):
from flask import Flask
from time import sleep

app = Flask(__name__)

@app.route('/bobo')
def index1():
    sleep(2)  # simulate a slow response
    return 'hello bobo!'

@app.route('/jay')
def index2():
    sleep(2)
    return 'hello jay!'

@app.route('/tom')
def index3():
    sleep(2)
    return 'hello tom!'

app.run()
start = time.time()
urls = [
    'http://127.0.0.1:5000/bobo',
    'http://127.0.0.1:5000/jay',
    'http://127.0.0.1:5000/tom',
]
for url in urls:
    page_text = requests.get(url, headers=headers).text
    print(page_text)
print(time.time() - start)
hello bobo!
hello jay!
hello tom!
6.016878366470337
Asynchronous code
Asynchronous crawling with a thread pool
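from multiprocessing.dummy import Pool  # thread pool module

# The function passed to map must take exactly one parameter
def my_requests(url):
    return requests.get(url=url, headers=headers).text

start = time.time()
urls = [
    'http://127.0.0.1:5000/bobo',
    'http://127.0.0.1:5000/jay',
    'http://127.0.0.1:5000/tom',
]
pool = Pool(3)
# map takes two arguments:
#   argument 1: a custom function that takes exactly one parameter
#   argument 2: an iterable such as a list (a dict would be iterated by its keys)
# map runs the function on the elements of the iterable concurrently in the
# pool's threads, waits for all calls to finish, and returns the results in input order
page_texts = pool.map(my_requests, urls)
print(page_texts)
print(time.time() - start)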
['hello bobo!', 'hello jay!', 'hello tom!']
2.0126171112060547
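The timing difference can also be reproduced without the Flask server running. The following is only a sketch under stated assumptions: `fake_request` is a hypothetical stand-in that sleeps for `DELAY` seconds instead of calling `requests.get`, and `DELAY`, `run_sync`, and `run_pool` are names made up for this illustration. It uses the same `multiprocessing.dummy.Pool` as the code above.

```python
import time
from multiprocessing.dummy import Pool  # thread-based pool, same module as above

DELAY = 0.2  # hypothetical per-request latency, standing in for the server's sleep(2)

def fake_request(url):
    # Hypothetical stand-in for requests.get(url).text: block, then return a string.
    time.sleep(DELAY)
    return 'hello ' + url.rsplit('/', 1)[-1] + '!'

urls = [
    'http://127.0.0.1:5000/bobo',
    'http://127.0.0.1:5000/jay',
    'http://127.0.0.1:5000/tom',
]

def run_sync():
    # One request after another: total time is roughly len(urls) * DELAY.
    start = time.time()
    results = [fake_request(url) for url in urls]
    return results, time.time() - start

def run_pool():
    # Three worker threads wait at the same time: total time is roughly DELAY.
    start = time.time()
    with Pool(3) as pool:
        results = pool.map(fake_request, urls)
    return results, time.time() - start

if __name__ == '__main__':
    print(run_sync())
    print(run_pool())
```

The pool helps here because each thread spends its time waiting, not computing. With fewer worker threads than URLs, the remaining URLs wait for a free thread, so the total time grows accordingly.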
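- asyncio
- How do you create a coroutine object?
- What is a task object?
- What is the difference between a task object and a coroutine object?
- How do you bind a callback to a task object?
- What is the event loop?
- aiohttp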
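The asyncio items in the list above can be illustrated with the standard library alone. This is a minimal sketch, not a definitive implementation: `fake_request`, `on_done`, and `main` are hypothetical names, and `asyncio.sleep` stands in for a real non-blocking HTTP call (with aiohttp, the request would take its place). It assumes Python 3.7 or later for `asyncio.run` and `asyncio.create_task`.

```python
import asyncio
import time

async def fake_request(url):
    # Hypothetical stand-in for a non-blocking HTTP request.
    # await hands control back to the event loop while this coroutine waits.
    await asyncio.sleep(0.2)
    return 'hello ' + url.rsplit('/', 1)[-1] + '!'

def on_done(task):
    # Callback: receives the finished task object.
    # task.result() is the value returned by the wrapped coroutine.
    print('callback:', task.result())

async def main():
    urls = [
        'http://127.0.0.1:5000/bobo',
        'http://127.0.0.1:5000/jay',
        'http://127.0.0.1:5000/tom',
    ]
    tasks = []
    for url in urls:
        coro = fake_request(url)          # calling an async function only creates a coroutine object
        task = asyncio.create_task(coro)  # a task wraps the coroutine and schedules it on the loop
        task.add_done_callback(on_done)   # bind a callback that runs when the task finishes
        tasks.append(task)
    # gather waits for all tasks and returns their results in the order given.
    return await asyncio.gather(*tasks)

if __name__ == '__main__':
    start = time.time()
    results = asyncio.run(main())  # asyncio.run creates the event loop and runs main() on it
    print(results)
    print(time.time() - start)
```

A coroutine object does nothing until it is awaited or wrapped in a task, while a task is scheduled on the event loop as soon as it is created and can carry callbacks. Replacing `asyncio.sleep` with a blocking call such as `time.sleep` or `requests.get` would stall the event loop and make the three waits run one after another.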