How Airflow parses DAGs

Start with the _execute method of the SchedulerJob class in jobs.py:

    def _execute(self):
        self.log.info("Starting the scheduler")

        # DAGs can be pickled for easier remote execution by some executors
        pickle_dags = False
        if self.do_pickle and self.executor.__class__ not in \
                (executors.LocalExecutor, executors.SequentialExecutor):
            pickle_dags = True

        self.log.info("Running execute loop for %s seconds", self.run_duration)
        self.log.info("Processing each file at most %s times", self.num_runs)

        # Build up a list of Python files that could contain DAGs
        self.log.info("Searching for files in %s", self.subdir)
        known_file_paths = list_py_file_paths(self.subdir)
        self.log.info("There are %s files in %s", len(known_file_paths), self.subdir)
        
        # The factory below returns the DagFileProcessor that actually parses DAG files
        def processor_factory(file_path, zombies):
            return DagFileProcessor(file_path,
                                    pickle_dags,
                                    self.dag_ids,
                                    zombies)

        # When using sqlite, we do not use async_mode
        # so the scheduler job and DAG parser don't access the DB at the same time.
        async_mode = not self.using_sqlite

        # DagFileProcessorAgent below is the entry-point class for DAG parsing
        self.processor_agent = DagFileProcessorAgent(self.subdir,
                                                     known_file_paths,
                                                     self.num_runs,
                                                     processor_factory,
                                                     async_mode)

        try:
            self._execute_helper()
        finally:
            self.processor_agent.end()
            self.log.info("Exited execute loop")

Next, the start() method of the DagFileProcessorAgent class:

"""
        Launch DagFileProcessorManager processor and start DAG parsing loop in manager.
        """
        self._process = self._launch_process(self._dag_directory,
                                             self._file_paths,
                                             self._max_runs,
                                             self._processor_factory,
                                             self._child_signal_conn,
                                             self._stat_queue,
                                             self._result_queue,
                                             self._async_mode)
        self.log.info("Launched DagFileProcessorManager with pid: {}"
                      .format(self._process.pid))

Then the _launch_process method of the DagFileProcessorAgent class:

    @staticmethod
    def _launch_process(dag_directory,
                        file_paths,
                        max_runs,
                        processor_factory,
                        signal_conn,
                        _stat_queue,
                        result_queue,
                        async_mode):
        def helper():
            # Reload configurations and settings to avoid collision with parent process.
            # Because this process may need custom configurations that cannot be shared,
            # e.g. RotatingFileHandler. And it can cause connection corruption if we
            # do not recreate the SQLA connection pool.
            os.environ['CONFIG_PROCESSOR_MANAGER_LOGGER'] = 'True'
            # Replicating the behavior of how logging module was loaded
            # in logging_config.py
            reload_module(import_module(logging_class_path.rsplit('.', 1)[0]))
            reload_module(airflow.settings)
            del os.environ['CONFIG_PROCESSOR_MANAGER_LOGGER']
            processor_manager = DagFileProcessorManager(dag_directory,
                                                        file_paths,
                                                        max_runs,
                                                        processor_factory,
                                                        signal_conn,
                                                        _stat_queue,
                                                        result_queue,
                                                        async_mode)

            processor_manager.start()

        p = multiprocessing.Process(target=helper,
                                    args=(),
                                    name="DagFileProcessorManager")
        p.start()
        return p
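
Stripped of the logging/settings reload, what _launch_process does is start a child process and share a result queue with it. A minimal sketch of that pattern, assuming nothing Airflow-specific (manager_main and the other names below are illustrative only):

    # Illustrative sketch only -- manager_main and the names below are not Airflow APIs.
    import multiprocessing

    def manager_main(file_paths, result_queue):
        # Runs in the child process, like DagFileProcessorManager does after start().
        for path in file_paths:
            result_queue.put("parsed %s" % path)

    if __name__ == "__main__":
        result_queue = multiprocessing.Queue()
        p = multiprocessing.Process(target=manager_main,
                                    args=(["dags/a.py", "dags/b.py"], result_queue),
                                    name="ToyDagFileProcessorManager")
        p.start()
        print("launched %s with pid: %s" % (p.name, p.pid))
        for _ in range(2):
            print(result_queue.get())
        p.join()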

Finally, the start_in_async method of the DagFileProcessorManager class (processor_manager.start() dispatches here when async_mode is enabled, and to start_in_sync() otherwise):

    def start_in_async(self):
        """
        Parse DAG files repeatedly in a standalone loop.
        """
        while True:
            loop_start_time = time.time()

            if self._signal_conn.poll():
                agent_signal = self._signal_conn.recv()
                if agent_signal == DagParsingSignal.TERMINATE_MANAGER:
                    self.terminate()
                    break
                elif agent_signal == DagParsingSignal.END_MANAGER:
                    self.end()
                    sys.exit(os.EX_OK)

            # ... the rest of the loop refreshes the DAG folder, heartbeats the
            # file processors and pushes parsing results onto the result queue
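
The same poll()-based control loop can be reproduced with plain multiprocessing primitives. A minimal sketch, assuming a toy "terminate" message in place of DagParsingSignal (toy_manager is illustrative only, not Airflow code):

    # Illustrative sketch only -- toy_manager and the "terminate" message are not Airflow APIs.
    import multiprocessing
    import time

    def toy_manager(signal_conn):
        # Child side: loop forever, but check the control pipe on every pass
        # and stop when a terminate message arrives.
        while True:
            if signal_conn.poll():
                if signal_conn.recv() == "terminate":
                    break
            time.sleep(0.1)  # stand-in for one parsing pass

    if __name__ == "__main__":
        parent_conn, child_conn = multiprocessing.Pipe()
        p = multiprocessing.Process(target=toy_manager, args=(child_conn,))
        p.start()
        time.sleep(0.3)                # let a few "parsing" passes run
        parent_conn.send("terminate")  # roughly what the agent's end() triggers
        p.join()
        print("manager exited cleanly")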
