Python asyncio exception handling: Python asyncio does not show any errors

I'm trying to get some data from thousands of urls by using asyncio.

Here is a brief overview of the design:

Fill up a Queue in one go with a bunch of urls using a single Producer

Spawn a bunch of Consumers

Each Consumer keeps asynchronously extracting urls from the Queue and sending GET requests

Do some postprocessing on the result

Combine all processed results and return

Problems: asyncio almost never shows when something is wrong; it just hangs silently with no errors. I put print statements everywhere to detect problems myself, but that didn't help much.

Depending on the number of input urls, the number of consumers, or the limits, I might get these errors:

Task was destroyed but it is pending!

task exception was never retrieved future:

aiohttp.client_exceptions.ServerDisconnectedError

aiohttp.client_exceptions.ClientOSError: [WinError 10053] An established connection was aborted by the software in your host machine

Questions: how to detect and handle exceptions in asyncio? how to retry without disrupting the Queue?

Below is my code, which I compiled by looking at various examples of async code. Currently, there's an intentional error at the end of the get_video_title function. When run, nothing shows up.

import asyncio
import aiohttp
import json
import re
import nest_asyncio

nest_asyncio.apply() # jupyter notebook throws errors without this

user_agent = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36"

def get_video_title(data):
    match = re.search(r'window\[["\']ytInitialPlayerResponse["\']\]\s*=\s*(.*)', data)
    string = match[1].strip()[:-1]
    result = json.loads(string)
    return result['videoDetails']['TEST_ERROR'] #

async def fetch(session, url, c):
    async with session.get(url, headers={"user-agent": user_agent}, raise_for_status=True, timeout=60) as r:
        print('---------Fetching', c)
        if r.status != 200:
            r.raise_for_status()
        return await r.text()

async def consumer(queue, session, responses):
    while True:
        try:
            i, url = await queue.get()
            print("Fetching from a queue", i)
            html_page = await fetch(session, url, i)
            print('+++Processing', i)
            result = get_video_title(html_page) # should raise an error here!
            responses.append(result)
            queue.task_done()
            print('+++Task Done', i)
        except (aiohttp.http_exceptions.HttpProcessingError, asyncio.TimeoutError) as e:
            print('>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>Error', i, type(e))
            await asyncio.sleep(1)
            queue.task_done()

async def produce(queue, urls):
    for i, url in enumerate(urls):
        print('Putting in a queue', i)
        await queue.put((i, url))

async def run(session, urls, consumer_num):
    queue, responses = asyncio.Queue(maxsize=2000), []
    print('[Making Consumers]')
    consumers = [asyncio.ensure_future(
        consumer(queue, session, responses))
                 for _ in range(consumer_num)]
    print('[Making Producer]')
    producer = await produce(queue=queue, urls=urls)
    print('[Joining queue]')
    await queue.join()
    print('[Cancelling]')
    for consumer_future in consumers:
        consumer_future.cancel()
    print('[Returning results]')
    return responses

async def main(loop, urls):
    print('Starting a Session')
    async with aiohttp.ClientSession(loop=loop, connector=aiohttp.TCPConnector(limit=300)) as session:
        print('Calling main function')
        posts = await run(session, urls, 100)
        print('Done')
        return posts

if __name__ == '__main__':
    urls = ['https://www.youtube.com/watch?v=dNQs_Bef_V8'] * 100
    loop = asyncio.get_event_loop()
    results = loop.run_until_complete(main(loop, urls))

Solution

The problem is that your consumer catches only two very specific exceptions, and in their case marks the task as done. If any other exception happens, such as a network-related exception, it will terminate the consumer. However, this is not detected by run, which is awaiting queue.join() with the consumer (effectively) running in the background. This is why your program hangs - queued items are never accounted for, and the queue is never fully processed.
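
The hang can be reproduced without any network access. The following is a minimal sketch using only the standard library; `fragile_consumer` and `demo_hang` are hypothetical names introduced for illustration, and the `KeyError` stands in for whatever unanticipated exception the real consumer hits. A timeout is added around `queue.join()` only so that the demonstration terminates.

```python
import asyncio

async def fragile_consumer(queue):
    # Stand-in for the question's consumer: only TimeoutError is handled.
    while True:
        item = await queue.get()
        try:
            if item == 2:
                raise KeyError("unexpected")  # not caught by the except below
            queue.task_done()
        except asyncio.TimeoutError:
            queue.task_done()

async def demo_hang(timeout=0.5):
    queue = asyncio.Queue()
    for item in range(5):
        queue.put_nowait(item)
    task = asyncio.ensure_future(fragile_consumer(queue))
    try:
        # Without the timeout this await would never return.
        await asyncio.wait_for(queue.join(), timeout)
        hung = False
    except asyncio.TimeoutError:
        hung = True
    # The consumer already finished with an exception that nobody looked at.
    died_with = task.exception() if task.done() else None
    if not task.done():
        task.cancel()
    return hung, type(died_with).__name__
```

Running `asyncio.run(demo_hang())` reports that the join timed out and that the consumer task ended with a `KeyError`, which is the situation described above: items 2 to 4 are never marked done, so the queue is never considered fully processed.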

There are two ways to fix this, depending on what you want your program to do when it encounters an unanticipated exception. If you want it to keep running, you can add a catch-all except clause to the consumer, e.g.:

        except Exception as e:
            print('other error', e)
            queue.task_done()

The alternative is for an unhandled consumer exception to propagate to run. This must be arranged explicitly, but has the advantage of never allowing exceptions to pass silently. (See this article for a detailed treatment of the subject.) One way to achieve it is to wait for queue.join() and the consumers at the same time; since consumers are in an infinite loop, they will complete only in case of an exception.

    print('[Joining queue]')
    # wait for either `queue.join()` to complete or a consumer to raise
    # (recent Python versions require tasks/futures here, not bare coroutines)
    join_task = asyncio.ensure_future(queue.join())
    done, _ = await asyncio.wait([join_task, *consumers],
                                 return_when=asyncio.FIRST_COMPLETED)
    consumers_raised = set(done) & set(consumers)
    if consumers_raised:
        await consumers_raised.pop() # propagate the exception
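
For reference, here is one way the same idea might look as a complete, self-contained `run` function, with cleanup added so that no task is left pending when an exception propagates. This is a sketch under assumptions, not the asker's code: the consumer doubles integers instead of fetching pages, a negative item plays the role of the unanticipated failure, and the queue is unbounded.

```python
import asyncio

async def consumer(queue, results):
    while True:
        item = await queue.get()
        if item < 0:
            raise ValueError(f"bad item {item}")  # unanticipated failure
        results.append(item * 2)
        queue.task_done()

async def run(items, consumer_num=3):
    queue, results = asyncio.Queue(), []
    for item in items:
        queue.put_nowait(item)
    consumers = [asyncio.ensure_future(consumer(queue, results))
                 for _ in range(consumer_num)]
    join_task = asyncio.ensure_future(queue.join())
    try:
        done, _ = await asyncio.wait([join_task, *consumers],
                                     return_when=asyncio.FIRST_COMPLETED)
        for task in done:
            task.result()  # re-raises a consumer's exception; None for join_task
    finally:
        # Whatever happened, stop everything that is still running.
        for task in [join_task, *consumers]:
            task.cancel()
        await asyncio.gather(join_task, *consumers, return_exceptions=True)
    return results
```

With valid input, `asyncio.run(run([1, 2, 3]))` returns the doubled values; with a negative item in the list, the `ValueError` raised inside the consumer surfaces from `run` instead of causing a hang.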

Questions: how to detect and handle exceptions in asyncio?

Exceptions are propagated through await and normally detected and handled like in any other code. The special handling is only needed to catch exceptions that leak from a "background" task like the consumer.
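
The difference between the two cases can be illustrated with a small standard-library sketch. The names `failing`, `report`, and `demo_propagation` are hypothetical. The done-callback shown here is one possible way to notice a failure in a background task; it is an addition for illustration, not something the answer above prescribes.

```python
import asyncio

async def failing():
    raise ValueError("boom")

def report(task, errors):
    # Called when the task finishes. Reading task.exception() also marks the
    # exception as retrieved, so asyncio does not log it as never retrieved.
    if not task.cancelled() and task.exception() is not None:
        errors.append(task.exception())

async def demo_propagation():
    errors = []
    # 1. Awaited directly: the exception surfaces at the await, as in any code.
    try:
        await failing()
    except ValueError as e:
        errors.append(e)
    # 2. Background task: nothing is raised here; the callback records it.
    task = asyncio.ensure_future(failing())
    task.add_done_callback(lambda t: report(t, errors))
    await asyncio.wait([task])  # waits for completion without re-raising
    return [str(e) for e in errors]
```

Both failures end up in the returned list, but only the first one was visible through an ordinary `try`/`except` around `await`.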

how to retry without disrupting the Queue?

You can call await queue.put((i, url)) in the except block. The item will be added to the back of the queue, to be picked up by a consumer. In that case you only need the first snippet, and don't want to bother with trying to propagate the exception in consumer to run.
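
A sketch of that retry approach follows, again using only the standard library. Several things here are assumptions added for illustration: `flaky_fetch` stands in for the real HTTP request, the attempt counter and `max_attempts` limit are not part of the original answer (they keep a permanently failing url from being retried forever), and the queue is unbounded. With a bounded queue such as the question's `maxsize=2000`, `await queue.put(...)` inside a consumer could block if the queue is full.

```python
import asyncio

async def flaky_fetch(url, attempt):
    # Stand-in for the real request: "bad" urls fail on the first two attempts.
    await asyncio.sleep(0)
    if url.startswith("bad") and attempt < 2:
        raise ConnectionError(url)
    return url.upper()

async def consumer(queue, responses, failures, max_attempts):
    while True:
        i, url, attempt = await queue.get()
        try:
            responses.append((i, await flaky_fetch(url, attempt)))
        except Exception as e:
            if attempt + 1 < max_attempts:
                # Re-queue before task_done() below, so join() cannot
                # complete while the retry is still outstanding.
                queue.put_nowait((i, url, attempt + 1))
            else:
                failures.append((i, repr(e)))
        finally:
            queue.task_done()

async def run(urls, consumer_num=3, max_attempts=3):
    queue, responses, failures = asyncio.Queue(), [], []
    for i, url in enumerate(urls):
        queue.put_nowait((i, url, 0))
    consumers = [asyncio.ensure_future(
                     consumer(queue, responses, failures, max_attempts))
                 for _ in range(consumer_num)]
    await queue.join()
    for c in consumers:
        c.cancel()
    await asyncio.gather(*consumers, return_exceptions=True)
    return sorted(responses), failures
```

Because every `get()` is matched by exactly one `task_done()` and each retry is a fresh `put`, the queue's bookkeeping stays consistent and `queue.join()` returns once every item has either succeeded or exhausted its attempts.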
