Apache Airflow (12): PythonOperator

This article covers how to call Python functions in Apache Airflow with the PythonOperator, including the op_args and op_kwargs parameters and two concrete scheduling examples that show how to pass positional and keyword arguments to a Python function.


The PythonOperator calls a Python function. Since Python can handle essentially any kind of task, if you cannot find a suitable Operator, wrap the work in a Python function and run it with a PythonOperator.

The commonly used PythonOperator parameters are listed below; for the full list see the official docs: airflow.operators.python — Airflow Documentation

python_callable (python callable): the Python function to call.

op_kwargs (dict): keyword arguments for the function, passed as a dict; they arrive in the function's **kwargs parameter. See the example below.

op_args (list): positional arguments for the function, passed as a list; they arrive in the function's *args parameter, packed into a tuple. See the example below.
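A plain-Python sketch of how these two parameters reach the callable: Airflow effectively invokes python_callable(*op_args, **op_kwargs), so op_args lands in the function's *args tuple and op_kwargs in its **kwargs dict (the function name and values here are illustrative):

```python
# Airflow effectively calls python_callable(*op_args, **op_kwargs);
# this mimics that call in plain Python.
def demo(*args, **kwargs):
    # positional values arrive as a tuple, named values as a dict
    return args, kwargs

op_args = [1, 2, 3]                    # list, like the op_args parameter
op_kwargs = {"name": "zs", "age": 18}  # dict, like the op_kwargs parameter

a, b = demo(*op_args, **op_kwargs)
print(a)  # (1, 2, 3)
print(b)  # {'name': 'zs', 'age': 18}
```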

PythonOperator Scheduling Example

```python
import random
from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.python import PythonOperator

# In Python, *args collects any number of positional arguments into a tuple.
# **kwargs collects any number of keyword arguments into a dict.
def print__hello1(*a, **b):
    print(a)
    print(b)
    print("hello airflow1")
    # The return value is written to the task log (and pushed to XCom by default)
    return {"sss1": "xxx1"}

def print__hello2(random_base):
    print(random_base)
    print("hello airflow2")
    # The return value is written to the task log (and pushed to XCom by default)
    return {"sss2": "xxx2"}

default_args = {
    'owner': 'maliu',
    'start_date': datetime(2021, 10, 1),
    'retries': 1,  # number of retries on failure
    'retry_delay': timedelta(minutes=5)  # delay between retries
}

dag = DAG(
    dag_id='execute_pythoncode',
    default_args=default_args,
    schedule_interval=timedelta(minutes=1)
)

first = PythonOperator(
    task_id='first',
    # Pass the function itself: print__hello1, not print__hello1()
    python_callable=print__hello1,
    # op_args fills the *a parameter of print__hello1
    op_args=[1, 2, 3, "hello", "world"],
    # op_kwargs fills the **b parameter of print__hello1
    op_kwargs={"id": "1", "name": "zs", "age": 18},
    dag=dag
)

second = PythonOperator(
    task_id='second',
    # Pass the function itself: print__hello2, not print__hello2()
    python_callable=print__hello2,
    # op_kwargs maps to the "random_base" parameter of print__hello2
    op_kwargs={"random_base": random.randint(0, 9)},
    dag=dag
)

first >> second
```
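The callables can be sanity-checked outside Airflow by invoking them the same way the operator does, python_callable(*op_args, **op_kwargs), with the same arguments as the DAG above (a local sketch, no scheduler involved):

```python
import random

def print__hello1(*a, **b):
    print(a)
    print(b)
    print("hello airflow1")
    return {"sss1": "xxx1"}

def print__hello2(random_base):
    print(random_base)
    print("hello airflow2")
    return {"sss2": "xxx2"}

# Same op_args/op_kwargs as the "first" and "second" tasks above
r1 = print__hello1(*[1, 2, 3, "hello", "world"], **{"id": "1", "name": "zs", "age": 18})
r2 = print__hello2(**{"random_base": random.randint(0, 9)})
print(r1)  # {'sss1': 'xxx1'}
print(r2)  # {'sss2': 'xxx2'}
```

In a real run these print statements and return values appear in each task's log; in Airflow 2 the returned dict is also pushed to XCom by default, so downstream tasks can pull it.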
