Python 3.6 Coroutines

xu7065


Article link: https://blog.csdn.net/xu7065/article/details/102720334

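Shocking! A middle-aged "chive" (Chinese slang for a small retail investor who keeps getting harvested by the market) is actually planning to turn his fortunes around with Python? Last week's script, which requests data from a stock API, has been running for days: each run sends nearly 3,000 requests and then saves the data to the database one record at a time, which takes far too long.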

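To save what little time I have, I decided to optimize the script by adding coroutines so that both crawling and storing the data happen asynchronously.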

1. Introduction to coroutines

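To make crawling faster we can use multithreading, multiprocessing, or coroutines. Multiprocessing and multithreading are already covered extensively online, so I'll skip them here. A coroutine is a function that can pause at an await point and let a single-threaded event loop run other coroutines while it waits on I/O, which suits a crawler that spends most of its time waiting for network responses.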
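
A key point that often surprises newcomers: calling an async function does not run it. It only creates a coroutine object, and the body runs once an event loop drives it. The minimal sketch below illustrates this; `greet` is a hypothetical example function, and it uses a private loop from `asyncio.new_event_loop()` rather than the global default loop.

```python
import asyncio

async def greet(name):
    # The body does not start until the event loop drives this coroutine
    await asyncio.sleep(0)  # suspension point: control returns to the event loop
    return 'hello, ' + name

coro = greet('asyncio')           # calling greet() only creates a coroutine object
print(type(coro).__name__)        # coroutine

loop = asyncio.new_event_loop()   # a private loop, so no global default loop is needed
try:
    result = loop.run_until_complete(coro)  # now the body actually runs
finally:
    loop.close()
print(result)                     # hello, asyncio
```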

2. Key module

asyncio

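asyncio is a standard-library module introduced in Python 3.4. It has built-in support for asynchronous I/O and is used to define coroutine functions and run them.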

Python 3.5 introduced the new async and await syntax.

async and await are used as follows:

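import asyncio

async def hello():
    print("Hello world!")
    await asyncio.sleep(1)  # suspend this coroutine for 1 s without blocking the loop
    print("Hello again!")

asyncio.get_event_loop().run_until_complete(hello())  # Python 3.7+: asyncio.run(hello())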

# Output
Hello world!
(1-second pause)
Hello again!
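
With only one coroutine, the pause above looks no different from time.sleep. The benefit shows up when several coroutines wait at the same time. In this minimal sketch (the names `worker` and `main` are illustrative, not from the original script), each await hands control back to the event loop, so worker B finishes before worker A even though A started first, and the whole run takes about 0.2 s rather than 0.3 s.

```python
import asyncio

log = []  # records the order in which the steps actually happen

async def worker(name, delay):
    log.append(name + ' start')
    await asyncio.sleep(delay)  # suspends this worker; the loop runs the other one meanwhile
    log.append(name + ' end')

async def main():
    # Tasks are scheduled in creation order, so A starts before B
    ta = asyncio.ensure_future(worker('A', 0.2))
    tb = asyncio.ensure_future(worker('B', 0.1))
    await asyncio.wait([ta, tb])  # wait until both tasks are done

loop = asyncio.new_event_loop()
try:
    loop.run_until_complete(main())
finally:
    loop.close()
print(log)  # ['A start', 'B start', 'B end', 'A end']
```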

Typical usage pattern:

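import asyncio

# Define a coroutine function
async def func(*args):
    await asyncio.sleep(1)  # placeholder: await whatever awaitable I/O the job needs
    return args

# Task list: every task in tasks is scheduled to run concurrently
tasks = [
    asyncio.ensure_future(func('args1')),
    asyncio.ensure_future(func('args2')),
    # ... one ensure_future(...) per job
]
# Start the coroutines:
# get the event loop that will drive the tasks
loop = asyncio.get_event_loop()
# run the loop until every task in tasks has finished
loop.run_until_complete(asyncio.wait(tasks))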
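
asyncio.wait returns a pair of sets, (done, pending), and every finished task keeps its coroutine's return value, which is how a crawler can collect the fetched data afterwards. A minimal sketch, assuming a toy `func` that returns a square instead of doing real I/O; it creates the tasks on a fresh loop with `loop.create_task` so it does not depend on the global default loop.

```python
import asyncio

async def func(n):
    await asyncio.sleep(0.01 * n)  # stand-in for real network I/O
    return n * n

loop = asyncio.new_event_loop()
try:
    tasks = [loop.create_task(func(n)) for n in range(1, 4)]
    done, pending = loop.run_until_complete(asyncio.wait(tasks))
finally:
    loop.close()

# done is an unordered set, so read results from the original list to keep input order
results = [t.result() for t in tasks]
print(results)       # [1, 4, 9]
print(len(pending))  # 0
```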

3. Implementation

First, let's see how it performs without coroutines:

import time

now = time.time  # shorthand for the current timestamp

codes = ['600006', '600007', '600008', '600009', '600010', '600011', '600012']

def do_some_work(url, count):
    t = now() - start  # seconds since the script started, taken when this request begins
    msg = 'URL #{count} is {url}, started at {t:.2f}s'.format(count=count, url=url, t=t)
    # Wait 2 s to simulate a request; time.sleep blocks the whole program
    time.sleep(2)
    print(msg)

start = now()

count = 0

for code in codes:
    url = 'http://nuff.eastmoney.com/EM_Finance2015TradeInterface/JS.ashx?id={code}1'.format(code=code)
    count += 1
    do_some_work(url, count)

print('Time:', now() - start)

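Output: the seven simulated requests run one after another at 2 s each, so the total time should be roughly 14 s.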

Now let's implement the version that uses coroutines:

import time
import asyncio

now = time.time  # shorthand for the current timestamp

codes = ['600006', '600007', '600008', '600009', '600010', '600011', '600012']

# Define the coroutine function
async def do_some_work(url, count):
    t = now() - start  # seconds since the script started, taken when this task begins
    msg = 'URL #{count} is {url}, started at {t:.2f}s'.format(count=count, url=url, t=t)
    # Wait 2 s; await asyncio.sleep suspends this task so the others run meanwhile
    await asyncio.sleep(2)
    print(msg)

start = now()

tasks = []
count = 0

for code in codes:
    url = 'http://nuff.eastmoney.com/EM_Finance2015TradeInterface/JS.ashx?id={code}1'.format(code=code)
    count += 1
    # Wrap each coroutine in a Task so it runs asynchronously
    tasks.append(asyncio.ensure_future(do_some_work(url, count)))

loop = asyncio.get_event_loop()
# Run the event loop until every task in tasks has finished
loop.run_until_complete(asyncio.wait(tasks))

print('Time:', now() - start)

Output: all seven tasks sleep concurrently, so the total time should be roughly 2 s instead of 14 s.
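
The simulation above uses asyncio.sleep, which cooperates with the event loop. A genuinely blocking call, such as a synchronous HTTP request or the one-record-at-a-time database insert mentioned at the start, would stall the loop just like time.sleep. One common workaround, sketched here under the assumption that the blocking work is safe to run in threads, is to hand it to a thread pool with `loop.run_in_executor`. `blocking_save` and `save` are hypothetical helpers that only sleep; they are not part of the original script.

```python
import asyncio
import time

def blocking_save(code):
    # Hypothetical stand-in for a blocking call (e.g. a synchronous DB insert)
    time.sleep(0.3)
    return code

async def save(loop, code):
    # None selects the loop's default ThreadPoolExecutor; this coroutine awaits the result
    return await loop.run_in_executor(None, blocking_save, code)

async def main(loop, codes):
    # gather returns the results in the same order as its arguments
    return await asyncio.gather(*(save(loop, c) for c in codes))

codes = ['600006', '600007', '600008', '600009', '600010']

loop = asyncio.new_event_loop()
start = time.time()
try:
    results = loop.run_until_complete(main(loop, codes))
finally:
    loop.close()
elapsed = time.time() - start

print(results)                 # the codes, in input order
print('Time: %.2f' % elapsed)  # about 0.3 s when the pool has at least 5 free threads
```

The threads only overlap the waiting; CPU-bound work would still be limited by the GIL.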