airflow scheduler stuck

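Project scenario:

After running for some time, the Airflow scheduler suddenly stops working or spins idle: the background process still exists, but it does no work.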


Problem description

Checking the log shows that the scheduler has no heartbeat:

================================================================================
[2022-06-14 14:51:06,301] {manager.py:1065} INFO - Finding 'running' jobs without a recent heartbeat
[2022-06-14 14:51:06,304] {manager.py:1069} INFO - Failing jobs without heartbeat after 2022-06-14 06:46:06.304241+00:00
[2022-06-14 14:51:16,495] {manager.py:1065} INFO - Finding 'running' jobs without a recent heartbeat
[2022-06-14 14:51:16,497] {manager.py:1069} INFO - Failing jobs without heartbeat after 2022-06-14 06:46:16.497371+00:00
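The cutoff in these lines can be checked with a little arithmetic. Assuming the log timestamps are local time in UTC+8 (an assumption; the post does not state the server's timezone), each cutoff lies almost exactly 300 seconds before its log record, which would correspond to a 300-second heartbeat threshold (Airflow's `[scheduler] scheduler_zombie_task_threshold` defaults to 300). The helper name below is hypothetical:

```python
from datetime import datetime, timedelta, timezone

# Assumption: the log timestamps are local time in UTC+8.
LOCAL_TZ = timezone(timedelta(hours=8))


def heartbeat_threshold_seconds(log_time: datetime, cutoff: datetime) -> float:
    """Seconds between a log record and the 'without heartbeat after' cutoff it prints."""
    return (log_time - cutoff).total_seconds()


# Values copied from the second log line above.
log_time = datetime(2022, 6, 14, 14, 51, 6, 304000, tzinfo=LOCAL_TZ)
cutoff = datetime(2022, 6, 14, 6, 46, 6, 304241, tzinfo=timezone.utc)
print(round(heartbeat_threshold_seconds(log_time, cutoff)))  # 300
```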


The scheduler error log shows:

Traceback (most recent call last):
  File "/home/airflow/env/lib/python3.9/site-packages/sqlalchemy/pool/base.py", line 697, in _finalize_fairy
    fairy._reset(pool)
  File "/home/airflow/env/lib/python3.9/site-packages/sqlalchemy/pool/base.py", line 893, in _reset
    pool._dialect.do_rollback(self)
  File "/home/airflow/env/lib/python3.9/site-packages/sqlalchemy/dialects/mysql/base.py", line 2513, in do_rollback
    dbapi_connection.rollback()
MySQLdb._exceptions.OperationalError: (2013, 'Lost connection to MySQL server during query')
(env) *@:/home/>tail airflow/conf/airflow-scheduler.err 
Traceback (most recent call last):
  File "/home/airflow/env/lib/python3.9/threading.py", line 950, in _bootstrap_inner
    self.run()
  File "/home/airflow/env/lib/python3.9/concurrent/futures/process.py", line 317, in run
    result_item, is_broken, cause = self.wait_result_broken_or_wakeup()
  File "/home/airflow/env/lib/python3.9/concurrent/futures/process.py", line 376, in wait_result_broken_or_wakeup
    worker_sentinels = [p.sentinel for p in self.processes.values()]
  File "/home/airflow/env/lib/python3.9/concurrent/futures/process.py", line 376, in <listcomp>
    worker_sentinels = [p.sentinel for p in self.processes.values()]
RuntimeError: dictionary changed size during iteration
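The `RuntimeError` in the second traceback is the generic error CPython raises when a dict changes size while it is being iterated. Here it comes from the executor's manager thread iterating `self.processes.values()` while, presumably, another thread adds or removes worker processes. The single-threaded sketch below reproduces the same message deterministically and shows the usual defensive pattern of iterating over a snapshot; the function name is hypothetical, and this is not claimed to be the exact upstream patch:

```python
def iterate_while_mutating(snapshot: bool) -> str:
    """Grow a dict while looping over it; return 'ok' or the RuntimeError message."""
    processes = {1: "worker-1", 2: "worker-2"}
    try:
        # Iterating the live view fails once the dict grows; a list copy does not.
        source = list(processes.values()) if snapshot else processes.values()
        for _ in source:
            processes[len(processes) + 1] = "new-worker"  # mutate during iteration
        return "ok"
    except RuntimeError as exc:
        return str(exc)


print(iterate_while_mutating(snapshot=False))  # dictionary changed size during iteration
print(iterate_while_mutating(snapshot=True))   # ok
```

In the real bug the mutation happens in a different thread, so it is timing-dependent and only shows up occasionally, which fits the scheduler failing only after running for a while.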

Cause analysis:

Dumping the thread stacks of the scheduler process shows:

Process 13749: python3.9 /home/airflow/env/bin/airflow scheduler -D
Python v3.9.0 (/home/airflow/env/bin/python3.9)

Thread 13749 (idle): "MainThread"
    wait (threading.py:312)
    result (concurrent/futures/_base.py:435)
    result_iterator (concurrent/futures/_base.py:600)
    _chain_from_iterable_of_lists (concurrent/futures/process.py:559)
    _send_tasks_to_celery (airflow/executors/celery_executor.py:325)
    _process_tasks (airflow/executors/celery_executor.py:277)
    trigger_tasks (airflow/executors/celery_executor.py:268)
    heartbeat (airflow/executors/base_executor.py:158)
    _run_scheduler_loop (airflow/jobs/scheduler_job.py:734)
    _execute (airflow/jobs/scheduler_job.py:651)
    run (airflow/jobs/base_job.py:246)
    _run_scheduler_job (airflow/cli/commands/scheduler_command.py:46)
    scheduler (airflow/cli/commands/scheduler_command.py:70)
    wrapper (airflow/utils/cli.py:92)
    command (airflow/cli/cli_parser.py:48)
    main (airflow/__main__.py:48)
    <module> (airflow:8)
Thread 2111 (idle): "QueueFeederThread"
    wait (threading.py:312)
    _feed (multiprocessing/queues.py:233)
    run (threading.py:888)
    _bootstrap_inner (threading.py:950)
    _bootstrap (threading.py:908)
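If attaching an external stack-dumping tool to the process is not an option, a rough in-process equivalent can be built from the standard library. This is a hypothetical helper, not something Airflow provides; it would have to be wired into the scheduler yourself (for example from a signal handler), and it prints frames outermost-first rather than in the order used above:

```python
import sys
import threading
import traceback


def dump_all_thread_stacks() -> str:
    """Return a text dump of every thread's current Python stack."""
    names = {t.ident: t.name for t in threading.enumerate()}
    chunks = []
    for ident, frame in sys._current_frames().items():
        chunks.append(f'Thread {ident} "{names.get(ident, "?")}"')
        # format_stack lists frames from the outermost call to the current one.
        chunks.extend(line.rstrip() for line in traceback.format_stack(frame))
    return "\n".join(chunks)


print(dump_all_thread_stacks())
```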


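The main thread is idling in Future.result() while sending tasks to Celery: it waits on the process pool for results that never arrive, so the scheduler process blocks and stays starved indefinitely. On inspection, this turns out to be a bug in Python 3.9.
See Python issue 43498 for details.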
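Once the manager thread dies with that `RuntimeError`, nothing completes the pending futures, so the main thread, sitting in `Future.result()` according to the dump, presumably waits forever. The sketch below illustrates that mechanism with a bare `Future` that is never completed (creating a `Future` directly is intended only for testing) and shows that only a timeout lets the caller escape. The helper name is hypothetical and this is an illustration, not a patch for Airflow:

```python
import concurrent.futures


def result_or_timeout(future: concurrent.futures.Future, timeout: float):
    """Return the future's result, or 'timed out' if it is not done in time."""
    try:
        return future.result(timeout=timeout)
    except concurrent.futures.TimeoutError:
        return "timed out"


# A Future nobody will ever complete, like those left behind by a dead manager thread.
orphan = concurrent.futures.Future()
print(result_or_timeout(orphan, timeout=0.1))  # timed out

done = concurrent.futures.Future()
done.set_result(42)
print(result_or_timeout(done, timeout=0.1))  # 42
```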

Solution:

Temporary workaround

Monitor the scheduler's status from the web UI or from the command line:
airflow jobs check --job-type SchedulerJob --allow-multiple --limit 100
and restart the scheduler if it has no heartbeat.
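The check-and-restart step can be automated, for example from cron. A minimal sketch, assuming `airflow jobs check` exits with a non-zero status when no alive scheduler job is found, and assuming the scheduler runs under a systemd unit named `airflow-scheduler` (a hypothetical name; adapt the restart command to however your scheduler is supervised, e.g. also stopping the stuck `airflow scheduler -D` process). The runner is injectable so the logic can be tested without Airflow installed:

```python
import subprocess
from typing import Callable, Sequence

# Health check command taken from the workaround above.
CHECK_CMD = ["airflow", "jobs", "check", "--job-type", "SchedulerJob",
             "--allow-multiple", "--limit", "100"]
# Hypothetical restart command: adjust to your deployment.
RESTART_CMD = ["systemctl", "restart", "airflow-scheduler"]


def check_and_restart(run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
                      check_cmd: Sequence[str] = CHECK_CMD,
                      restart_cmd: Sequence[str] = RESTART_CMD) -> bool:
    """Run the health check and restart the scheduler if it fails.

    Returns True if a restart was issued, False if the scheduler looked alive.
    """
    result = run(list(check_cmd), capture_output=True, text=True)
    if result.returncode == 0:
        return False  # an alive SchedulerJob was found
    run(list(restart_cmd), check=True)  # raises CalledProcessError if the restart fails
    return True


if __name__ == "__main__":
    print("restarted" if check_and_restart() else "scheduler alive")
```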

Permanent fix

Upgrade Python:

Python >= 3.9.10 on the 3.9 line, or >= 3.10.1 on the 3.10 line
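To check whether an interpreter meets those minimums, a small sketch like the following can be run with the same Python that runs the scheduler. The function name is hypothetical, and the handling of versions outside 3.9/3.10 is an assumption (later versions are assumed to include the fix; earlier ones are not covered here and are reported as not verified):

```python
import sys
from typing import Tuple


def has_fix_for_issue_43498(version: Tuple[int, int, int]) -> bool:
    """True if version meets the minimums above: 3.9.10+ on 3.9, 3.10.1+ on 3.10."""
    major_minor = version[:2]
    if major_minor == (3, 9):
        return version >= (3, 9, 10)
    if major_minor == (3, 10):
        return version >= (3, 10, 1)
    # Assumption: newer minor versions include the fix; older ones are treated
    # as "not verified" because this post does not cover them.
    return major_minor > (3, 10)


print(has_fix_for_issue_43498(tuple(sys.version_info[:3])))
```

The scheduler in the stack dump above reports Python v3.9.0, which this check flags as affected.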
