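Reproducing the problem
When Celery uses RabbitMQ as its broker, after running for some time the heartbeat check between Celery and RabbitMQ decides that the connection is unhealthy, and the following error is reported first: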
Too many heartbeats missed
Some time later, because the heartbeat is considered to have failed, the TCP connection is closed and the following error is reported:
ConnectionResetError: [Errno 104] Connection reset by peer
project_name | [2021-10-27 12:19:00,125: WARNING/MainProcess] Traceback (most recent call last):
project_name | [2021-10-27 12:19:00,125: WARNING/MainProcess] File "src/gevent/greenlet.py", line 766, in gevent._greenlet.Greenlet.run
project_name | [2021-10-27 12:19:00,125: WARNING/MainProcess] File "/usr/local/lib/python3.6/site-packages/kombu/asynchronous/timer.py", line 68, in __call__
project_name | return self.fun(*self.args, **self.kwargs)
project_name | [2021-10-27 12:19:00,125: WARNING/MainProcess] File "/usr/local/lib/python3.6/site-packages/kombu/asynchronous/timer.py", line 130, in _reschedules
project_name | return fun(*args, **kwargs)
project_name | [2021-10-27 12:19:00,125: WARNING/MainProcess] File "/usr/local/lib/python3.6/site-packages/kombu/connection.py", line 313, in heartbeat_check
project_name | return self.transport.heartbeat_check(self.connection, rate=rate)
project_name | [2021-10-27 12:19:00,125: WARNING/MainProcess] File "/usr/local/lib/python3.6/site-packages/kombu/transport/pyamqp.py", line 149, in heartbeat_check
project_name | return connection.heartbeat_tick(rate=rate)
project_name | [2021-10-27 12:19:00,125: WARNING/MainProcess] File "/usr/local/lib/python3.6/site-packages/amqp/connection.py", line 744, in heartbeat_tick
project_name | self.send_heartbeat()
project_name | [2021-10-27 12:19:00,125: WARNING/MainProcess] File "/usr/local/lib/python3.6/site-packages/amqp/connection.py", line 695, in send_heartbeat
project_name | self.frame_writer(8, 0, None, None, None)
project_name | [2021-10-27 12:19:00,125: WARNING/MainProcess] File "/usr/local/lib/python3.6/site-packages/amqp/method_framing.py", line 189, in write_frame
project_name | write(view[:offset])
project_name | [2021-10-27 12:19:00,125: WARNING/MainProcess] File "/usr/local/lib/python3.6/site-packages/amqp/transport.py", line 305, in write
project_name | self._write(s)
project_name | [2021-10-27 12:19:00,126: WARNING/MainProcess] File "/usr/local/lib/python3.6/site-packages/gevent/_socket3.py", line 458, in sendall
project_name | return _socketcommon._sendall(self, data_memory, flags)
project_name | [2021-10-27 12:19:00,126: WARNING/MainProcess] File "/usr/local/lib/python3.6/site-packages/gevent/_socketcommon.py", line 374, in _sendall
project_name | timeleft = __send_chunk(socket, chunk, flags, timeleft, end)
project_name | [2021-10-27 12:19:00,126: WARNING/MainProcess] File "/usr/local/lib/python3.6/site-packages/gevent/_socketcommon.py", line 314, in __send_chunk
project_name | data_sent += socket.send(chunk, flags, timeout=timeleft)
project_name | [2021-10-27 12:19:00,126: WARNING/MainProcess] File "/usr/local/lib/python3.6/site-packages/gevent/_socket3.py", line 439, in send
project_name | return _socket.socket.send(self._sock, data, flags)
project_name | [2021-10-27 12:19:00,126: WARNING/MainProcess] ConnectionResetError: [Errno 104] Connection reset by peer
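At this point the TCP connection is gone, but the Celery worker keeps processing its tasks. When a task finishes, the worker has to send an ack back to RabbitMQ to report that the task is done. Because the TCP connection has already been closed, the following error is reported (writing to a socket that has already been closed produces exactly this Broken pipe error):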
Couldn't ack 19, reason:BrokenPipeError(32, 'Broken pipe')
project_name | [2021-10-27 04:28:34,232: CRITICAL/MainProcess] Couldn't ack 19, reason:BrokenPipeError(32, 'Broken pipe')
project_name | Traceback (most recent call last):
project_name | File "/usr/local/lib/python3.6/site-packages/kombu/message.py", line 131, in ack_log_error
project_name | self.ack(multiple=multiple)
project_name | File "/usr/local/lib/python3.6/site-packages/kombu/message.py", line 126, in ack
project_name | self.channel.basic_ack(self.delivery_tag, multiple=multiple)
project_name | File "/usr/local/lib/python3.6/site-packages/amqp/channel.py", line 1394, in basic_ack
project_name | spec.Basic.Ack, argsig, (delivery_tag, multiple),
project_name | File "/usr/local/lib/python3.6/site-packages/amqp/abstract_channel.py", line 59, in send_method
project_name | conn.frame_writer(1, self.channel_id, sig, args, content)
project_name | File "/usr/local/lib/python3.6/site-packages/amqp/method_framing.py", line 189, in write_frame
project_name | write(view[:offset])
project_name | File "/usr/local/lib/python3.6/site-packages/amqp/transport.py", line 305, in write
project_name | self._write(s)
project_name | File "/usr/local/lib/python3.6/site-packages/gevent/_socket3.py", line 458, in sendall
project_name | return _socketcommon._sendall(self, data_memory, flags)
project_name | File "/usr/local/lib/python3.6/site-packages/gevent/_socketcommon.py", line 374, in _sendall
project_name | timeleft = __send_chunk(socket, chunk, flags, timeleft, end)
project_name | File "/usr/local/lib/python3.6/site-packages/gevent/_socketcommon.py", line 303, in __send_chunk
project_name | data_sent += socket.send(chunk, flags)
project_name | File "/usr/local/lib/python3.6/site-packages/gevent/_socket3.py", line 439, in send
project_name | return _socket.socket.send(self._sock, data, flags)
project_name | BrokenPipeError: [Errno 32] Broken pipe
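Analysis
Judging from the order in which the errors occur, the root cause is the heartbeat check between Celery and RabbitMQ; the later errors are a chain reaction from it.
Source analysis of Celery's heartbeat mechanism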
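As a rough orientation for the call chain in the traceback (kombu's heartbeat_check calling py-amqp's heartbeat_tick), the timing can be modelled in a few lines. This is a simplified sketch, not the library's code: the function names are hypothetical, the check interval follows the Celery documentation for broker_heartbeat and broker_heartbeat_checkrate, and the "two heartbeat intervals" threshold is an assumption about how py-amqp decides to raise "Too many heartbeats missed".

```python
def check_interval(broker_heartbeat: float, checkrate: float = 2.0) -> float:
    """Seconds between two heartbeat checks performed by the worker.

    Per the Celery docs, the interval is broker_heartbeat divided by
    broker_heartbeat_checkrate (default 2.0).
    """
    return broker_heartbeat / checkrate


def too_many_missed(last_received: float, heartbeat: float, now: float) -> bool:
    """Simplified model (assumption): the connection is treated as dead when
    nothing has been received for more than two heartbeat intervals."""
    return last_received + 2 * heartbeat < now


# With a 10 s heartbeat and the default rate, the check runs every 5 s.
print(check_interval(10.0))
# Last frame received at t=100 s, heartbeat 60 s, checked at t=221 s.
print(too_many_missed(last_received=100.0, heartbeat=60.0, now=221.0))
```

In this model, anything that keeps the worker from reading the broker's heartbeat frames for longer than two intervals is enough to trigger the first error.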
Solution
- Since the heartbeat mechanism is what misbehaves, disable it: set broker_heartbeat to 0 in the Celery configuration.
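A minimal sketch of that setting, assuming the configuration lives in a module named celeryconfig.py (a hypothetical name) and that the broker URL and credentials are placeholders:

```python
# celeryconfig.py (hypothetical module name; adjust to your project layout)

# Placeholder broker URL; the pyamqp:// scheme selects the py-amqp transport.
broker_url = "pyamqp://guest:guest@localhost:5672//"

# 0 disables the AMQP heartbeat between the worker and RabbitMQ.
broker_heartbeat = 0
```

The module can then be loaded with app.config_from_object("celeryconfig"); setting app.conf.broker_heartbeat = 0 directly on the app should have the same effect.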
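When this fix applies
- The broker Celery connects to must be RabbitMQ (not Redis or another broker).
- Celery's transport must be pyamqp rather than librabbitmq. pyamqp is normally the default as long as the librabbitmq library is not installed.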
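The two conditions above can be checked with a small helper. This is only a sketch: describe_transport is a hypothetical name, and the rule that an amqp:// URL picks librabbitmq when that package is importable and falls back to pyamqp otherwise is an assumption based on kombu's documented behaviour, so verify it against the versions you run.

```python
from importlib.util import find_spec
from urllib.parse import urlparse


def describe_transport(broker_url: str) -> str:
    """Guess which transport a broker URL will end up using (hypothetical helper)."""
    scheme = urlparse(broker_url).scheme.lower()
    if scheme == "pyamqp":
        # The transport is named explicitly in the URL.
        return "pyamqp"
    if scheme == "amqp":
        # Assumption: librabbitmq is preferred when it is installed.
        return "librabbitmq" if find_spec("librabbitmq") else "pyamqp"
    # Any other scheme (redis, sqs, ...) is not RabbitMQ over AMQP.
    return scheme


print(describe_transport("pyamqp://guest:guest@localhost:5672//"))
print(describe_transport("redis://localhost:6379/0"))
```

If the helper returns anything other than pyamqp, the broker_heartbeat workaround described here is outside the scenario this note covers.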
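Other notes
- This bug is still not fixed even in Celery 5.1.2, the latest version at the time of writing; the only workaround is to disable the heartbeat through this setting.
- After one week of testing with the heartbeat disabled, the Celery worker consumed tasks normally with no problems.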
Related links:
Celery 4.2.1 documentation, Configuration and defaults (broker_heartbeat): https://docs.celeryproject.org/en/v4.2.1/userguide/configuration.html#broker-heartbeat
Rabbitmq 心跳检测 (RabbitMQ heartbeat detection), 博客园 (cnblogs): https://www.cnblogs.com/-wenli/p/13603712.html