Airflow run schedule settings: schedule_interval

Airflow scheduling period issues

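I recently started using Airflow in earnest. A few things about schedule_interval and the "last run" shown in the UI had never been quite clear to me, and when I set up a task to run weekly I finally hit a problem: the task did not run when I expected it to.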

After a round of Googling, I found that although Airflow's schedule_interval accepts cron expressions, it still behaves differently from crontab in some ways.

About backfill

The backfill command is used to backfill data, that is, to run a task for earlier dates.

When the task runs daily, adding a start date is enough, for example:

airflow backfill CKD_ALL_REPORT -s 2018-09-04

However, when the task runs once every several days, this no longer works, and the following message appears:

No run dates were found for the given dates and dag interval.

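This is because Airflow works in terms of schedule windows:
Airflow sets execution_date based on the left bound of the schedule period it is covering, not based on when it fires (which would be the right bound of the period)
This is the most reasonable explanation I found on Stack Overflow. In other words, Airflow takes the first point in time after start_date that matches schedule_interval and records it as the execution_date, but the run does not actually start until the next such point arrives. Because of this window, last run lags one period behind.
So if you inspect execution_date through Jinja, the problem becomes apparent.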
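
The window behavior described above can be illustrated with a small standard-library sketch. It does not use Airflow itself: the function name schedule_windows, the fixed timedelta interval, and the start date are all assumptions made for illustration, and it further assumes that start_date already falls on a schedule point.

```python
from datetime import datetime, timedelta


def schedule_windows(start_date, interval, count):
    """Return (execution_date, earliest_run_time) pairs.

    Hypothetical helper, not part of Airflow. Each window is labelled
    with its left bound (execution_date), while the run can only fire
    once the right bound of the window has been reached.
    """
    windows = []
    execution_date = start_date
    for _ in range(count):
        earliest_run_time = execution_date + interval
        windows.append((execution_date, earliest_run_time))
        execution_date = earliest_run_time
    return windows


# A hypothetical weekly schedule starting on 2018-09-03.
for execution_date, fires_at in schedule_windows(
    datetime(2018, 9, 3), timedelta(weeks=1), 3
):
    print(f"execution_date={execution_date:%Y-%m-%d}  "
          f"fires no earlier than {fires_at:%Y-%m-%d}")
```

Under these assumptions, the run labelled 2018-09-03 does not fire before 2018-09-10, which matches the one-period lag seen in last run.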

Jinja templates

Jinja expression              Meaning
{{ ds }}                      the execution date as YYYY-MM-DD
{{ ds_nodash }}               the execution date as YYYYMMDD
{{ yesterday_ds }}            yesterday's date as YYYY-MM-DD
{{ yesterday_ds_nodash }}     yesterday's date as YYYYMMDD
{{ tomorrow_ds }}             tomorrow's date as YYYY-MM-DD
{{ tomorrow_ds_nodash }}      tomorrow's date as YYYYMMDD
{{ ts }}                      same as execution_date.isoformat()
{{ ts_nodash }}               same as ts without - and :
{{ execution_date }}          the execution_date (datetime.datetime)
{{ prev_execution_date }}     the previous execution date (if available) (datetime.datetime)
{{ next_execution_date }}     the next execution date (datetime.datetime)
{{ dag }}                     the DAG object
{{ task }}                    the Task object
{{ macros }}                  a reference to the macros package, described below
{{ task_instance }}           the task_instance object
{{ end_date }}                same as {{ ds }}
{{ latest_date }}             same as {{ ds }}
{{ ti }}                      same as {{ task_instance }}
{{ params }}                  a reference to the user-defined params dictionary
{{ var.value.my_var }}        global defined variables represented as a dictionary
{{ var.json.my_var.path }}    global defined variables represented as a dictionary with deserialized JSON object, append the path to the key within the JSON object
{{ task_instance_key_str }}   a unique, human-readable key to the task instance formatted {dag_id}{task_id}{ds}
{{ conf }}                    the full configuration object located at airflow.configuration.conf which represents
{{ run_id }}                  the run_id of the current DAG run
{{ dag_run }}                 a reference to the DagRun object
{{ test_mode }}               whether the task instance was called using the CLI's test subcommand
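
To make the date-related entries in the table concrete, here is a minimal standard-library sketch that derives the same strings from a given execution_date. It only imitates the descriptions in the table and is not Airflow's own code; the function name date_macros is hypothetical, and in a real DAG these values are rendered by Airflow inside templated fields.

```python
from datetime import datetime, timedelta


def date_macros(execution_date):
    """Imitate the date strings from the table for one execution_date.

    Hypothetical helper for illustration only, not an Airflow API.
    """
    day = timedelta(days=1)
    ds = execution_date.strftime("%Y-%m-%d")
    yesterday_ds = (execution_date - day).strftime("%Y-%m-%d")
    tomorrow_ds = (execution_date + day).strftime("%Y-%m-%d")
    ts = execution_date.isoformat()
    return {
        "ds": ds,
        "ds_nodash": ds.replace("-", ""),
        "yesterday_ds": yesterday_ds,
        "yesterday_ds_nodash": yesterday_ds.replace("-", ""),
        "tomorrow_ds": tomorrow_ds,
        "tomorrow_ds_nodash": tomorrow_ds.replace("-", ""),
        "ts": ts,
        # "same as ts without - and :", as the table puts it
        "ts_nodash": ts.replace("-", "").replace(":", ""),
    }


values = date_macros(datetime(2018, 9, 4))
for name, value in values.items():
    print(f"{{{{ {name} }}}} -> {value}")
```

Note that "yesterday" and "tomorrow" are relative to execution_date, not to the wall-clock time at which the task actually runs, so for a weekly task they can sit well in the past.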

References:
https://stackoverflow.com/questions/39612488/airflow-trigger-dag-execution-date-is-the-next-day-why/39620901#39620901
