Python asyncio concurrent programming: asyncio concurrent programming (middle part)

Latest recommended article published on 2022-06-11 17:33:13

郑丢丢

Article tags: python asyncio concurrent programming

Article link: https://blog.csdn.net/weixin_32673065/article/details/114358974

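Handling blocking I/O requests with ThreadPoolExecutor and asyncio

In this section we look at how to combine a thread pool with asyncio.

Even in coroutine code we sometimes still need multithreading. When exactly?

Blocking I/O must not be called directly inside a coroutine, because it would stall the whole event loop. Sometimes, though, a blocking I/O operation is unavoidable, and that is when threads come in: to integrate blocking I/O into coroutine code, we hand the blocking call to a thread.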

import asyncio
import socket
import time
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlparse


def get_url(url):
    # Request the HTML over a raw socket
    url = urlparse(url)
    host = url.netloc
    path = url.path
    if path == "":
        path = "/"

    # Open the socket connection
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect((host, 80))  # blocks, but does not consume CPU while waiting
    # Without blocking we would have to poll in a while loop to check whether
    # the connection is ready, doing computation or opening other connections
    # in the meantime.
    client.send("GET {} HTTP/1.1\r\nHost:{}\r\nConnection:close\r\n\r\n".format(path, host).encode("utf8"))

    data = b""
    while True:
        d = client.recv(1024)
        if d:
            data += d
        else:
            break

    data = data.decode("utf8")
    # Split only once so that blank lines inside the body are preserved
    html_data = data.split("\r\n\r\n", 1)[1]
    print(html_data)
    client.close()


if __name__ == "__main__":
    start_time = time.time()
    # Create the loop explicitly; newer Python versions deprecate relying on
    # get_event_loop() to create one implicitly.
    loop = asyncio.new_event_loop()
    # Get a thread-pool executor
    executor = ThreadPoolExecutor()
    # The pool's concurrency can also be capped, for example:
    # executor = ThreadPoolExecutor(max_workers=3)

    # 20 concurrent requests
    tasks = []
    for url in range(20):
        url = "http://shop.projectsedu.com/goods/{}/".format(url)
        # Run the blocking function in the thread pool; this returns an asyncio future
        task = loop.run_in_executor(executor, get_url, url)
        tasks.append(task)

    loop.run_until_complete(asyncio.wait(tasks))
    print("last time:{}".format(time.time() - start_time))

# Output:
# last time:2.110485076904297

The code above creates a thread pool and runs the blocking code inside it.

Let's look at the source code of run_in_executor:
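The same hand-off also works from inside a coroutine, where the returned future can simply be awaited. The following is a minimal sketch, not the author's code: `blocking_io` is a hypothetical stand-in that sleeps instead of touching the network, and the pool size of 4 is an arbitrary choice.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_io(seconds):
    # Hypothetical stand-in for a blocking call such as socket.recv().
    time.sleep(seconds)
    return seconds


async def main():
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=4) as executor:
        # Each call returns an awaitable asyncio future right away;
        # the blocking work itself runs on a pool thread.
        futures = [loop.run_in_executor(executor, blocking_io, 0.2) for _ in range(4)]
        return await asyncio.gather(*futures)


start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start
print(results, "elapsed: {:.2f}s".format(elapsed))
```

With four worker threads the four sleeps overlap, so the elapsed time should be close to one sleep rather than four.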

# Method of the event loop (BaseEventLoop)
def run_in_executor(self, executor, func, *args):
    self._check_closed()
    if self._debug:
        self._check_callback(func, 'run_in_executor')
    if executor is None:
        executor = self._default_executor
        # Even if we did not create an executor, one is created for us
        if executor is None:
            executor = concurrent.futures.ThreadPoolExecutor()
            self._default_executor = executor
    # Finally, submit the blocking code to the thread pool and return a future object
    return futures.wrap_future(executor.submit(func, *args), loop=self)


# Module-level function in asyncio.futures
def wrap_future(future, *, loop=None):
    """Wrap concurrent.futures.Future object."""
    if isfuture(future):
        return future
    assert isinstance(future, concurrent.futures.Future), \
        'concurrent.futures.Future is expected, got {!r}'.format(future)
    if loop is None:
        loop = events.get_event_loop()
    new_future = loop.create_future()
    _chain_future(future, new_future)
    return new_future
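To see what `wrap_future` does in isolation, here is a small sketch that submits work to a thread pool by hand and then wraps the resulting `concurrent.futures.Future` using the public `asyncio.wrap_future`. The `square` function is only an illustrative placeholder.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


def square(x):
    # Placeholder for real blocking work.
    return x * x


async def main():
    with ThreadPoolExecutor(max_workers=1) as executor:
        thread_future = executor.submit(square, 7)       # concurrent.futures.Future
        aio_future = asyncio.wrap_future(thread_future)  # awaitable asyncio.Future
        is_asyncio_future = isinstance(aio_future, asyncio.Future)
        value = await aio_future
    return is_asyncio_future, value


is_asyncio_future, value = asyncio.run(main())
print(is_asyncio_future, value)
```

The thread-pool future cannot be awaited directly; the wrapper is what lets the event loop resume the coroutine once the worker thread has finished.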

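Whenever a coroutine needs to call blocking I/O, the call can be handed to a thread pool in this way.

Simulating an HTTP request with asyncio

In asyncio, wherever something happens asynchronously, a future is created behind the scenes.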

import asyncio
import time
from urllib.parse import urlparse


async def get_url(url):
    url = urlparse(url)
    host = url.netloc
    path = url.path
    if path == "":
        path = "/"

    # Open the socket connection as a coroutine; it returns a (reader, writer) pair
    reader, writer = await asyncio.open_connection(host, 80)
    writer.write("GET {} HTTP/1.1\r\nHost:{}\r\nConnection:close\r\n\r\n".format(path, host).encode("utf8"))

    all_lines = []
    async for raw_line in reader:
        data = raw_line.decode("utf8")
        all_lines.append(data)
    writer.close()

    # Each line already ends with its own newline
    html = "".join(all_lines)
    return html


async def main():
    tasks = []
    for url in range(20):
        url = "http://shop.projectsedu.com/goods/{}/".format(url)
        # Add the future objects to the list
        tasks.append(asyncio.ensure_future(get_url(url)))

    # Print each result as soon as it is ready; as_completed yields awaitables
    for task in asyncio.as_completed(tasks):
        result = await task
        print(result)


if __name__ == "__main__":
    start_time = time.time()
    # Create the loop explicitly; newer Python versions deprecate relying on
    # get_event_loop() to create one implicitly.
    loop = asyncio.new_event_loop()
    loop.run_until_complete(main())
    print('last time:{}'.format(time.time() - start_time))

# Alternative entry point that waits for all tasks with asyncio.wait
if __name__ == "__main__":
    start_time = time.time()
    loop = asyncio.new_event_loop()
    tasks = []
    for url in range(20):
        url = "http://shop.projectsedu.com/goods/{}/".format(url)
        # asyncio.wait no longer accepts bare coroutines, so wrap each one in a task
        tasks.append(loop.create_task(get_url(url)))
    loop.run_until_complete(asyncio.wait(tasks))
    print('last time:{}'.format(time.time() - start_time))

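The whole flow is exactly the same as what we implemented earlier.

Future and Task

A future is a result container: the result is stored in the future, and once it has been set the registered callbacks run, much like the Future of a thread pool. Task is a subclass of Future.

Let's look at one particular method: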
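The behaviour of `asyncio.as_completed` is easier to see without the network. In this sketch, `fetch` is a hypothetical replacement for `get_url` that only sleeps, and the names and delays are made up so that completion order differs from submission order.

```python
import asyncio


async def fetch(name, delay):
    # Hypothetical stand-in for get_url(): sleeps instead of doing network I/O.
    await asyncio.sleep(delay)
    return name


async def main():
    tasks = [
        asyncio.ensure_future(fetch("slow", 0.3)),
        asyncio.ensure_future(fetch("fast", 0.1)),
        asyncio.ensure_future(fetch("medium", 0.2)),
    ]
    finished = []
    # Results arrive in completion order, not in the order of the list.
    for next_done in asyncio.as_completed(tasks):
        finished.append(await next_done)
    return finished


order = asyncio.run(main())
print(order)
```

By contrast, `asyncio.wait` returns only once the tasks are done (by default, all of them), which is why the second entry point prints nothing per task.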
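A short sketch of both statements, using only public asyncio calls: a bare future that is filled in later and triggers a callback, and a task that passes an `isinstance` check against `asyncio.Future`. The `"payload"` value and the 0.05 second delay are arbitrary.

```python
import asyncio


async def main():
    loop = asyncio.get_running_loop()

    # A bare future: an empty container until someone calls set_result().
    future = loop.create_future()
    seen = []
    future.add_done_callback(lambda fut: seen.append(fut.result()))
    loop.call_later(0.05, future.set_result, "payload")
    value = await future  # suspends until the result has been stored

    # A Task wraps a coroutine and is itself a Future.
    task = asyncio.ensure_future(asyncio.sleep(0, result="done"))
    task_is_future = isinstance(task, asyncio.Future)
    task_value = await task
    return value, seen, task_is_future, task_value


value, seen, task_is_future, task_value = asyncio.run(main())
print(value, seen, task_is_future, task_value)
```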

class Future:
    """This class is *almost* compatible with concurrent.futures.Future.

    Differences:

    - result() and exception() do not take a timeout argument and
      raise an exception when the future isn't done yet.

    - Callbacks registered with add_done_callback() are always called
      via the event loop's call_soon_threadsafe().

    - This class is not compatible with the wait() and as_completed()
      methods in the concurrent.futures package.

    (In Python 3.4 or later we may be able to unify the implementations.)
    """

    def set_result(self, result):
        """Mark the future done and set its result.

        If the future is already done when this method is called, raises
        InvalidStateError.
        """
        if self._state != _PENDING:
            raise InvalidStateError('{}: {!r}'.format(self._state, self))
        self._result = result
        self._state = _FINISHED
        # Once the result has been stored, schedule the callbacks
        self._schedule_callbacks()

    def _schedule_callbacks(self):
        """Internal: Ask the event loop to call all callbacks.

        The callbacks are scheduled to be called as soon as possible. Also
        clears the callback list.
        """
        callbacks = self._callbacks[:]
        if not callbacks:
            return

        self._callbacks[:] = []
        # Everything runs in a single thread, so call_soon puts each callback
        # on the loop's queue and the loop later takes it off the queue and runs it.
        # The rest is similar to the thread-pool Future.
        for callback in callbacks:
            self._loop.call_soon(callback, self)
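Two consequences of this source can be observed from user code: `set_result` only schedules the callbacks rather than calling them on the spot, and a second `set_result` on a finished future raises `InvalidStateError`. A minimal sketch (the event labels are made up for illustration):

```python
import asyncio


async def main():
    loop = asyncio.get_running_loop()
    future = loop.create_future()
    events = []
    future.add_done_callback(lambda fut: events.append("callback"))

    future.set_result(1)
    # The callback has only been queued with call_soon at this point.
    events.append("after set_result")

    try:
        future.set_result(2)  # the future is no longer pending
    except asyncio.InvalidStateError:
        events.append("InvalidStateError")

    await asyncio.sleep(0)  # yield to the loop so it can run the queued callback
    return events


events = asyncio.run(main())
print(events)
```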

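Why do we need a Task object?

A task is in fact an important bridge between coroutines and futures.

Let's look at the concrete code.

We know that after defining a (generator-based) coroutine, and before it can be driven, we must call next() or send(None) on it once so that it actually starts running.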
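The priming step can be shown with a plain generator used as a coroutine. The `accumulator` function below is a made-up example, not something from asyncio.

```python
def accumulator():
    # Classic generator-based coroutine: receives values through send().
    total = 0
    while True:
        value = yield total
        total += value


gen = accumulator()

# Sending a real value before the generator has started is an error.
try:
    gen.send(10)
    error = None
except TypeError as exc:
    error = str(exc)

first = next(gen)      # priming: runs up to the first yield
second = gen.send(10)  # now values can be sent in
third = gen.send(5)
print(error, first, second, third)
```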

[Figure: image.png (source-code screenshot, not preserved)]

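From the source we can see that a Task arranges for its _step method to run when it is initialized (via the loop's call_soon), and this method does two essential things.

The first is starting the coroutine:

A coroutine differs from a thread in that it must go through an explicit start-up step. A thread does not, because threads are scheduled by the operating system. Coroutines are scheduled by the programmer, so the problem of starting them has to be solved somewhere. The Task object was abstracted out for exactly this purpose: it starts the coroutine when it is initialized.

The second is storing the coroutine's return value in result:

When running the coroutine raises StopIteration, set_result is called to save the coroutine's return value as the result. Threads have no StopIteration exception.

To keep the coroutine interface consistent with the thread interface, the Task object was created to handle the places where coroutines differ from threads.

Let's look at the diagram from the previous part, which visualizes the code above.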
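Both jobs can be imitated by hand in a few lines. The sketch below is a deliberately simplified stand-in for `Task._step`, not the real implementation: the real task schedules each step through the event loop and stores the value with `set_result`, whereas the hypothetical `drive` helper here just loops. `pause` and `compute` are made-up names.

```python
import types


@types.coroutine
def pause():
    # Hypothetical awaitable that just hands control back to the driver.
    yield "paused"


async def compute():
    await pause()
    return 42


def drive(coro):
    """Start the coroutine, keep stepping it, and capture the return
    value that arrives inside StopIteration."""
    yielded = []
    while True:
        try:
            # The first send(None) starts the coroutine; later ones resume it.
            yielded.append(coro.send(None))
        except StopIteration as stop:
            return yielded, stop.value


yielded, result = drive(compute())
print(yielded, result)
```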

[Figure: image.png (diagram from the previous part, not preserved)]
