Installing Airflow via docker-compose.yaml

Preface

Airflow has many components with strong interdependencies, especially when using the Celery executor, and on top of that Airflow iterates quickly, so a native install on the host is hard to maintain. I therefore recommend installing via docker-compose.yaml, so the whole stack can be started and stopped in one step.

Basic configuration

Create the main airflow directory

This directory is the root on the host that Docker maps into the containers; later it is set as $AIRFLOW_HOME.

mkdir /home/airflow;

Create the base subdirectories

cd /home/airflow;
mkdir -p ./dags ./logs ./plugins ./config;

This is the official explanation:
./dags - you can put your DAG files here.
./logs - contains logs from task execution and scheduler.
./config - you can add custom log parser or add airflow_local_settings.py to configure cluster policy.
./plugins - you can put your custom plugins here.

  • For my own project needs, a src folder and a run.py file also go in here (they're mounted as volumes later).
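
For orientation, a file in ./dags is just a Python module that defines a DAG. A minimal sketch (the dag_id and schedule are made up):

# dags/hello_dag.py
from datetime import datetime
from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(dag_id="hello", start_date=datetime(2024, 1, 1), schedule="@daily", catchup=False) as dag:
    BashOperator(task_id="say_hello", bash_command="echo hello")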

Create the .env file

This file is mainly used for permission management inside the Docker environment.

echo -e "AIRFLOW_UID=$(id -u)" > .env;
  • The official explanation of the .env file:

On Linux, the quick-start needs to know your host user id and needs to have group id set to 0. Otherwise the files created in dags, logs and plugins will be created with root user ownership. You have to make sure to configure them for the docker-compose;

UID of the user to run Airflow containers as. Override if you want to use non-default Airflow UID (for example when you map folders from host, it should be set to result of id -u call. When it is changed, a user with the UID is created with default name inside the container and home of the user is set to /airflow/home/ in order to share Python libraries installed there. This is in order to achieve the OpenShift compatibility. See more in the Arbitrary Docker User

My reading of the official explanation: it grants the right ownership on the folders created above, so that calls from inside the container don't fail with permission errors.

  • .env has a second use: you can set official Airflow parameters in it, for example:
AIRFLOW_UID=0
AIRFLOW__OPERATORS__DEFAULT_QUEUE=worker

The AIRFLOW__OPERATORS__DEFAULT_QUEUE entry sets the default queue for the executor. When Docker starts, entries in .env are loaded into the containers' environment, and Airflow reads configuration from environment variables named AIRFLOW__{SECTION}__{KEY} in preference to airflow.cfg, so .env doubles as a place for Airflow settings.
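
Tasks can also override that default per operator via the queue argument. A minimal sketch, assuming the CeleryExecutor and a made-up dag_id:

from datetime import datetime
from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(dag_id="queue_demo", start_date=datetime(2024, 1, 1), schedule=None) as dag:
    # No queue set: routed to the default queue from AIRFLOW__OPERATORS__DEFAULT_QUEUE
    on_default = BashOperator(task_id="on_default", bash_command="echo default")
    # An explicit queue overrides the default for this task only ("gpu" is hypothetical)
    on_gpu = BashOperator(task_id="on_gpu", bash_command="echo gpu", queue="gpu")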

Distinguishing test and production environments

  • Business development inevitably needs separate test and production environments, each running against its own database or system configuration. For that, .env can be used to modify the environment variables inside the container.

Set the Linux system environment variable

I want to distinguish production from test by host, so the value should stay consistent with the host's global system environment variables. We therefore set /etc/environment to define a global variable:

vim /etc/environment

Fill in your environment variable, for example:

RUN_ENV=official

Save with :wq, then reload the variables:

source /etc/environment

Test that it prints the value you set; if nothing changed, reconnect your shell:

echo $RUN_ENV
official

Add the system environment variable to .env so it is carried into the Docker containers

Open the .env file and add one line, as follows:

RUN_ENV=${RUN_ENV:-test}
  • To explain: this defines the RUN_ENV environment variable for the container, taking its value from the host's RUN_ENV Linux environment variable, with test as the default if the host doesn't set it.

Rebuild the containers

docker compose build

Restart the containers

docker compose up

Verify the variable is synced into the container

Enter the container first. Business code usually runs in the worker container, so that's the one to check:

docker exec -u airflow -it airflow-airflow-worker-1 bash
echo $RUN_ENV

Finally, read the environment variable in your code to decide which configuration to run with.
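
A minimal sketch of that switch (the hostnames are placeholders, not real configuration):

import os

# Falls back to "test", matching the .env default above
RUN_ENV = os.environ.get("RUN_ENV", "test")

DB_CONFIGS = {
    "official": {"host": "prod-db.internal", "dbname": "app"},   # hypothetical
    "test": {"host": "test-db.internal", "dbname": "app_test"},  # hypothetical
}
DB_CONFIG = DB_CONFIGS.get(RUN_ENV, DB_CONFIGS["test"])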

Configure docker-compose.yaml

Fetch the docker-compose.yaml file

Mind the version number in the URL and match it to the latest Airflow release. For example, the official docs currently show 2.8.0, but as of this writing Airflow 2.8.1 is out and published on Docker Hub, in which case you would change the 2.8.0 in the URL to 2.8.1. I leave it unchanged below; adjust it yourself.

curl -LfO 'https://airflow.apache.org/docs/apache-airflow/2.8.0/docker-compose.yaml'

Switch to the image matching your Python version

The Python version your own project uses may differ from Airflow's default, which leads to plenty of odd failures, so you need an Airflow build that matches your project's Python version.
Note: the default image is built on Python 3.8. If you want a Python 3.11 Docker image, look up the corresponding image name on Docker Hub:

  • Docker Hub project link
  • Make sure it's this exact project; plenty of other images are also named airflow, so don't be misled.
  • Find the matching tag and take the image name from the pull command: docker pull apache/airflow:2.8.1-python3.11
  • I also recommend the apache/airflow:latest-python3.11 image, which always tracks the latest Airflow release built for Python 3.11.

Open the yaml file and change the following line:
image: ${AIRFLOW_IMAGE_NAME:-apache/airflow:2.8.1}

to

image: ${AIRFLOW_IMAGE_NAME:-apache/airflow:latest-python3.11}

With that change, the Airflow installed by this docker-compose.yaml will be the latest version running on a Python 3.11 environment.

Set up your own database configuration

If you run your own distributed setup, it's best not to use the bundled database configuration. I created my own PostgreSQL and Redis in Docker and changed the following settings:

AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://airflow:airflow@postgres/airflow
AIRFLOW__CELERY__BROKER_URL: redis://:@redis:6379/0
If you're not running distributed, you can skip this step.

Port settings

The port mappings in the original yaml file use several fairly common ports; I changed some of them for my own needs.

  • This is all per your own needs. In my case 5555 and 8080 were already taken by other containers on the server, so I remapped them. If Airflow is the only thing on the machine and there's no port conflict, skip this step.
  • Web server: the published port is changed from 8080 to 8100:
  airflow-webserver:
    <<: *airflow-common
    command: webserver
    ports:
      - "8100:8080"
  • Flower (Celery monitoring): the published port is changed from 5555 to 8101:
  flower:
    <<: *airflow-common
    command: celery flower
    profiles:
      - flower
    ports:
      - "8101:5555"
  • Expose PostgreSQL externally. Not required; I only use it to inspect Airflow's data. I'm on an internal network so it's safe for me, but mind your own setup; opening this is generally not recommended:
  postgres:
    image: postgres:13
    environment:
      POSTGRES_USER: airflow
      POSTGRES_PASSWORD: airflow
      POSTGRES_DB: airflow
    ports:
      - "5432:5432"
    volumes:
      - postgres-db-volume:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD", "pg_isready", "-U", "airflow"]
      interval: 10s
      retries: 5
      start_period: 5s
    restart: always
  • Expose Redis externally. Same caveats as above: not required, and not recommended unless you're on a private network:
  redis:
    image: redis:latest
    expose:
      - 6379
    ports:
      - "6380:6379"
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 30s
      retries: 50
      start_period: 30s
    restart: always

Wire in the permissions file

The .env file created above now has to be referenced in the docker-compose.yaml file. Remember to add it just above the volumes: entry:

  env_file:
      - .env

Add two extra volume mounts

For project development, dags holds your scheduling DAG files, but the DAGs still need your actual script code when they run, so a src folder is added to hold that code.
Also, as a personal design choice, every script's entry point goes through a single run.py for unified dispatch, so a run.py file is added as well (a sketch of such a run.py follows the caveat below).

  volumes:
    - ${AIRFLOW_PROJ_DIR:-.}/dags:/opt/airflow/dags
    - ${AIRFLOW_PROJ_DIR:-.}/logs:/opt/airflow/logs
    - ${AIRFLOW_PROJ_DIR:-.}/config:/opt/airflow/config
    - ${AIRFLOW_PROJ_DIR:-.}/plugins:/opt/airflow/plugins

After adding the two volumes:

  volumes:
    - ${AIRFLOW_PROJ_DIR:-.}/dags:/opt/airflow/dags
    - ${AIRFLOW_PROJ_DIR:-.}/logs:/opt/airflow/logs
    - ${AIRFLOW_PROJ_DIR:-.}/config:/opt/airflow/config
    - ${AIRFLOW_PROJ_DIR:-.}/plugins:/opt/airflow/plugins
    - ${AIRFLOW_PROJ_DIR:-.}/src:/opt/airflow/src
    - ${AIRFLOW_PROJ_DIR:-.}/run.py:/opt/airflow/run.py

One important caveat: put a run.py file in place on the host first (e.g. touch /home/airflow/run.py). If the source path of a bind mount doesn't exist, Docker creates it as a directory by default, so you'd end up with a directory named run.py and things would break.
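
For reference, a minimal sketch of what such a unified run.py could look like (the src module layout and the run() convention are assumptions of mine, not anything Airflow requires):

import argparse
import importlib

def main() -> None:
    parser = argparse.ArgumentParser(description="Unified task entry point")
    parser.add_argument("task", help="dotted module path under src, e.g. clean.clean_product")
    args = parser.parse_args()
    # Import src.<task> and call its run() function (a project convention, assumed here)
    module = importlib.import_module(f"src.{args.task}")
    module.run()

if __name__ == "__main__":
    main()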

Initialize the database

Initialize the metadata database first with the following command:

docker compose up airflow-init

Output after execution:

[+] Running 43/9
 ✔ airflow-init 22 layers [⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿]      0B/0B      Pulled                                                                                                            116.1s 
 ✔ redis 6 layers [⣿⣿⣿⣿⣿⣿]      0B/0B      Pulled                                                                                                                                     26.3s 
 ✔ postgres 12 layers [⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿]      0B/0B      Pulled                                                                                                                           32.1s                                                                                                                                                                             
[+] Running 4/3
 ✔ Network airflow_default           Created                                                                                                                                           0.2s 
 ✔ Container airflow-postgres-1      Created                                                                                                                                           0.6s 
 ✔ Container airflow-redis-1         Created                                                                                                                                           0.6s 
 ✔ Container airflow-airflow-init-1  Created                                                                                                                                           0.0s 
Attaching to airflow-airflow-init-1
airflow-airflow-init-1  | The container is run as root user. For security, consider using a regular user account.
airflow-airflow-init-1  | 
airflow-airflow-init-1  | DB: postgresql+psycopg2://airflow:***@postgres/airflow
airflow-airflow-init-1  | Performing upgrade to the metadata database postgresql+psycopg2://airflow:***@postgres/airflow
airflow-airflow-init-1  | [2024-01-25T03:10:12.140+0000] {migration.py:216} INFO - Context impl PostgresqlImpl.
airflow-airflow-init-1  | [2024-01-25T03:10:12.141+0000] {migration.py:219} INFO - Will assume transactional DDL.
airflow-airflow-init-1  | [2024-01-25T03:10:12.143+0000] {migration.py:216} INFO - Context impl PostgresqlImpl.
airflow-airflow-init-1  | [2024-01-25T03:10:12.144+0000] {migration.py:219} INFO - Will assume transactional DDL.
airflow-airflow-init-1  | INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
airflow-airflow-init-1  | INFO  [alembic.runtime.migration] Will assume transactional DDL.
airflow-airflow-init-1  | INFO  [alembic.runtime.migration] Running stamp_revision  -> 88344c1d9134
airflow-airflow-init-1  | Database migrating done!
airflow-airflow-init-1  | /home/airflow/.local/lib/python3.11/site-packages/flask_limiter/extension.py:336 UserWarning: Using the in-memory storage for tracking rate limits as no storage was explicitly specified. This is not recommended for production use. See: https://flask-limiter.readthedocs.io#configuring-a-storage-backend for documentation about configuring the storage backend.
airflow-airflow-init-1  | [2024-01-25T03:10:15.202+0000] {override.py:1369} INFO - Inserted Role: Admin
airflow-airflow-init-1  | [2024-01-25T03:10:15.206+0000] {override.py:1369} INFO - Inserted Role: Public
airflow-airflow-init-1  | [2024-01-25T03:10:15.437+0000] {override.py:868} WARNING - No user yet created, use flask fab command to do it.
airflow-airflow-init-1  | [2024-01-25T03:10:15.474+0000] {override.py:1769} INFO - Created Permission View: can edit on Passwords
airflow-airflow-init-1  | [2024-01-25T03:10:15.479+0000] {override.py:1820} INFO - Added Permission can edit on Passwords to role Admin
airflow-airflow-init-1  | [2024-01-25T03:10:15.484+0000] {override.py:1769} INFO - Created Permission View: can read on Passwords
airflow-airflow-init-1  | [2024-01-25T03:10:15.487+0000] {override.py:1820} INFO - Added Permission can read on Passwords to role Admin
airflow-airflow-init-1  | [2024-01-25T03:10:15.498+0000] {override.py:1769} INFO - Created Permission View: can edit on My Password
airflow-airflow-init-1  | [2024-01-25T03:10:15.502+0000] {override.py:1820} INFO - Added Permission can edit on My Password to role Admin
airflow-airflow-init-1  | [2024-01-25T03:10:15.507+0000] {override.py:1769} INFO - Created Permission View: can read on My Password
airflow-airflow-init-1  | [2024-01-25T03:10:15.510+0000] {override.py:1820} INFO - Added Permission can read on My Password to role Admin
airflow-airflow-init-1  | [2024-01-25T03:10:15.519+0000] {override.py:1769} INFO - Created Permission View: can edit on My Profile
airflow-airflow-init-1  | [2024-01-25T03:10:15.522+0000] {override.py:1820} INFO - Added Permission can edit on My Profile to role Admin
airflow-airflow-init-1  | [2024-01-25T03:10:15.527+0000] {override.py:1769} INFO - Created Permission View: can read on My Profile
airflow-airflow-init-1  | [2024-01-25T03:10:15.531+0000] {override.py:1820} INFO - Added Permission can read on My Profile to role Admin
airflow-airflow-init-1  | [2024-01-25T03:10:15.556+0000] {override.py:1769} INFO - Created Permission View: can create on Users
airflow-airflow-init-1  | [2024-01-25T03:10:15.560+0000] {override.py:1820} INFO - Added Permission can create on Users to role Admin
airflow-airflow-init-1  | [2024-01-25T03:10:15.565+0000] {override.py:1769} INFO - Created Permission View: can read on Users
airflow-airflow-init-1  | [2024-01-25T03:10:15.568+0000] {override.py:1820} INFO - Added Permission can read on Users to role Admin
airflow-airflow-init-1  | [2024-01-25T03:10:15.573+0000] {override.py:1769} INFO - Created Permission View: can edit on Users
airflow-airflow-init-1  | [2024-01-25T03:10:15.576+0000] {override.py:1820} INFO - Added Permission can edit on Users to role Admin
airflow-airflow-init-1  | [2024-01-25T03:10:15.581+0000] {override.py:1769} INFO - Created Permission View: can delete on Users
airflow-airflow-init-1  | [2024-01-25T03:10:15.584+0000] {override.py:1820} INFO - Added Permission can delete on Users to role Admin
airflow-airflow-init-1  | [2024-01-25T03:10:15.591+0000] {override.py:1769} INFO - Created Permission View: menu access on List Users
airflow-airflow-init-1  | [2024-01-25T03:10:15.594+0000] {override.py:1820} INFO - Added Permission menu access on List Users to role Admin
airflow-airflow-init-1  | [2024-01-25T03:10:15.602+0000] {override.py:1769} INFO - Created Permission View: menu access on Security
airflow-airflow-init-1  | [2024-01-25T03:10:15.606+0000] {override.py:1820} INFO - Added Permission menu access on Security to role Admin
airflow-airflow-init-1  | [2024-01-25T03:10:15.624+0000] {override.py:1769} INFO - Created Permission View: can create on Roles
airflow-airflow-init-1  | [2024-01-25T03:10:15.628+0000] {override.py:1820} INFO - Added Permission can create on Roles to role Admin
airflow-airflow-init-1  | [2024-01-25T03:10:15.633+0000] {override.py:1769} INFO - Created Permission View: can read on Roles
airflow-airflow-init-1  | [2024-01-25T03:10:15.636+0000] {override.py:1820} INFO - Added Permission can read on Roles to role Admin
airflow-airflow-init-1  | [2024-01-25T03:10:15.642+0000] {override.py:1769} INFO - Created Permission View: can edit on Roles
airflow-airflow-init-1  | [2024-01-25T03:10:15.645+0000] {override.py:1820} INFO - Added Permission can edit on Roles to role Admin
airflow-airflow-init-1  | [2024-01-25T03:10:15.650+0000] {override.py:1769} INFO - Created Permission View: can delete on Roles
airflow-airflow-init-1  | [2024-01-25T03:10:15.654+0000] {override.py:1820} INFO - Added Permission can delete on Roles to role Admin
airflow-airflow-init-1  | [2024-01-25T03:10:15.663+0000] {override.py:1769} INFO - Created Permission View: menu access on List Roles
airflow-airflow-init-1  | [2024-01-25T03:10:15.667+0000] {override.py:1820} INFO - Added Permission menu access on List Roles to role Admin
airflow-airflow-init-1  | [2024-01-25T03:10:15.683+0000] {override.py:1769} INFO - Created Permission View: can read on User Stats Chart
airflow-airflow-init-1  | [2024-01-25T03:10:15.687+0000] {override.py:1820} INFO - Added Permission can read on User Stats Chart to role Admin
airflow-airflow-init-1  | [2024-01-25T03:10:15.696+0000] {override.py:1769} INFO - Created Permission View: menu access on User's Statistics
airflow-airflow-init-1  | [2024-01-25T03:10:15.700+0000] {override.py:1820} INFO - Added Permission menu access on User's Statistics to role Admin
airflow-airflow-init-1  | [2024-01-25T03:10:15.722+0000] {override.py:1769} INFO - Created Permission View: can read on Permissions
airflow-airflow-init-1  | [2024-01-25T03:10:15.726+0000] {override.py:1820} INFO - Added Permission can read on Permissions to role Admin
airflow-airflow-init-1  | [2024-01-25T03:10:15.735+0000] {override.py:1769} INFO - Created Permission View: menu access on Actions
airflow-airflow-init-1  | [2024-01-25T03:10:15.738+0000] {override.py:1820} INFO - Added Permission menu access on Actions to role Admin
airflow-airflow-init-1  | [2024-01-25T03:10:15.758+0000] {override.py:1769} INFO - Created Permission View: can read on View Menus
airflow-airflow-init-1  | [2024-01-25T03:10:15.762+0000] {override.py:1820} INFO - Added Permission can read on View Menus to role Admin
airflow-airflow-init-1  | [2024-01-25T03:10:15.772+0000] {override.py:1769} INFO - Created Permission View: menu access on Resources
airflow-airflow-init-1  | [2024-01-25T03:10:15.777+0000] {override.py:1820} INFO - Added Permission menu access on Resources to role Admin
airflow-airflow-init-1  | [2024-01-25T03:10:15.801+0000] {override.py:1769} INFO - Created Permission View: can read on Permission Views
airflow-airflow-init-1  | [2024-01-25T03:10:15.805+0000] {override.py:1820} INFO - Added Permission can read on Permission Views to role Admin
airflow-airflow-init-1  | [2024-01-25T03:10:15.813+0000] {override.py:1769} INFO - Created Permission View: menu access on Permission Pairs
airflow-airflow-init-1  | [2024-01-25T03:10:15.817+0000] {override.py:1820} INFO - Added Permission menu access on Permission Pairs to role Admin
airflow-airflow-init-1  | [2024-01-25T03:10:16.637+0000] {override.py:1458} INFO - Added user airflow
airflow-airflow-init-1  | User "airflow" created with role "Admin"
airflow-airflow-init-1  | 2.8.1
  • By default the postgres username and password are both airflow, and redis has no password. If your Airflow is exposed to the public internet, remember to change these.

After it finishes, docker images shows three new images:

[root@localhost ~]# docker images
REPOSITORY           TAG                 IMAGE ID       CREATED        SIZE
apache/airflow       latest-python3.11   6982e639a403   5 days ago     1.34GB
postgres             13                  0896a8e0282d   2 years ago    371MB
redis                latest              7614ae9453d1   2 years ago    113MB

And docker ps shows two new containers:

[root@localhost ~]# docker ps
CONTAINER ID   IMAGE          COMMAND                 CREATED           STATUS                    PORTS       NAMES
596112365a38   redis:latest  "docker-entrypoint.s…"   7 minutes ago    Up 7 minutes (healthy)   6379/tcp    airflow-redis-1
afa0e8f05ac4   postgres:13   "docker-entrypoint.s…"   7 minutes ago    Up 7 minutes (healthy)   5432/tcp    airflow-postgres-1

The database environment is now set up.

Starting Airflow

Start in the foreground

Useful for checking whether Airflow runs normally:

docker compose up

Start in the background

docker compose up -d

Start together with flower

docker-compose --profile flower up -d

Start a single service only

docker-compose up airflow-worker

Common Docker commands for Airflow

Shut down

docker-compose down

Restart

docker-compose restart

Enter a container as root

docker exec -it airflow-airflow-worker-1 /bin/bash

Enter a container as the airflow user

docker exec -it -u airflow airflow-airflow-worker-1 /bin/bash

Open the Airflow web UI

If you changed the port mapping, use your mapped port; the default is 8080.

http://127.0.0.1:8080
You'll be asked to log in; the default username and password are both airflow.

Change the password

  • Click the avatar in the top-right corner, then Your Profile
  • Reset my password

Installing Python packages for your business code

Quick and dirty

  • Tasks run in the airflow-airflow-worker-1 container, so just enter it and pip install what you need. You must enter as the airflow user to install:
docker exec -it -u airflow airflow-airflow-worker-1 /bin/bash
  • Important: if you enter as root and run pip, it will still insist that you use the airflow user:
root@2b4151bb054d:/opt/airflow# pip list
You are running pip as root. Please use 'airflow' user to run pip!
See: https://airflow.apache.org/docs/docker-stack/build.html#adding-a-new-pypi-package
  • To install requirements.txt inside the image with a mirror via -i, you must also trust the host with --trusted-host mirrors.aliyun.com, like this:
pip install -r requirements.txt -i http://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com

The drawback of installing this way: if you run docker compose build again or upgrade Airflow, all previously installed packages are lost. So the method below is the recommended one.

The elegant way

  • Put the requirements.txt you need into the airflow directory
  • Create a Dockerfile with the following content, changing FROM to your image name:
FROM apache/airflow:latest-python3.11
USER airflow
COPY requirements.txt /
RUN pip install -r /requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple --trusted-host pypi.tuna.tsinghua.edu.cn
  • Then, in the yaml file, comment out the image: ${AIRFLOW_IMAGE_NAME:-apache/airflow:latest-python3.11} line, uncomment build: ., and save.
  • At this point the airflow directory should contain docker-compose.yaml, .env, Dockerfile and requirements.txt, plus the dags, logs, plugins, config and src folders and run.py; compare yours before building.
  • Then rebuild the image with the following command:
docker compose build
  • To verify the installation, enter the worker container and check with pip list

Running containers after installation

Check with docker ps:

[root@localhost airflow] docker ps
CONTAINER ID   IMAGE                                              COMMAND                        CREATED             STATUS                               PORTS                     NAMES
853c717d4fdb   redis:latest                      "docker-entrypoint.s…"   5 minutes ago    Up 5 minutes (healthy)     0.0.0.0:6380->6379/tcp, :::6380->6379/tcp                 airflow-redis-1
33d6899c0aa8   postgres:13                       "docker-entrypoint.s…"   5 minutes ago    Up 5 minutes (healthy)     0.0.0.0:5432->5432/tcp, :::5432->5432/tcp                 airflow-postgres-1
bb26e290270a   airflow-airflow-webserver         "/usr/bin/dumb-init …"   5 minutes ago    Up 5 minutes (healthy)     0.0.0.0:8100->8080/tcp, :::8100->8080/tcp                 airflow-airflow-webserver-1
a6fc286cd79b   airflow-airflow-triggerer         "/usr/bin/dumb-init …"   5 minutes ago    Up 5 minutes (unhealthy)   8080/tcp                                                  airflow-airflow-triggerer-1
6b60e663f16d   airflow-airflow-scheduler         "/usr/bin/dumb-init …"   5 minutes ago    Up 5 minutes (healthy)     8080/tcp                                                  airflow-airflow-scheduler-1
e97d9eca40dd   airflow-airflow-worker            "/usr/bin/dumb-init …"   5 minutes ago    Up 5 minutes (unhealthy)   8080/tcp                                                  airflow-airflow-worker-1
8dac2db6105b   apache/airflow:latest-python3.11  "/usr/bin/dumb-init …"   52 minutes ago   Up 52 minutes (healthy)    8080/tcp, 0.0.0.0:8101->5555/tcp, :::8101->5555/tcp       airflow-flower-1

This is the full Airflow stack under normal conditions; flower only appears if you start it separately.

Notes and tips

No errors anywhere, but your DAG just doesn't show up

If nothing reports an error but the DAG still doesn't appear, check the Docker logs, the scheduler's above all:

docker logs airflow-airflow-scheduler-1

How to load DAGs immediately

Any file change gets the DAG loaded right away; simply "saving" the file is enough, for example.

How to force-stop a running DAG task

Set its state to failed.

Quirks of PythonVirtualenvOperator

  • PythonVirtualenvOperator creates its own virtualenv and copies your code out into a standalone script, rather than executing your code in place:
[2023-12-13, 07:50:43 UTC] {process_utils.py:182} INFO - Executing cmd: /tmp/venv3gux3z4y/bin/python /tmp/venv3gux3z4y/script.py /tmp/venv3gux3z4y/script.in /tmp/venv3gux3z4y/script.out /tmp/venv3gux3z4y/string_args.txt /tmp/venv3gux3z4y/termination.log
[2023-12-13, 07:50:43 UTC] {process_utils.py:186} INFO - Output:
[2023-12-13, 07:50:44 UTC] {process_utils.py:190} INFO - Traceback (most recent call last):
[2023-12-13, 07:50:44 UTC] {process_utils.py:190} INFO -   File "/tmp/venv3gux3z4y/script.py", line 30, in <module>
[2023-12-13, 07:50:44 UTC] {process_utils.py:190} INFO -     res = import_package(*arg_dict["args"], **arg_dict["kwargs"])
[2023-12-13, 07:50:44 UTC] {process_utils.py:190} INFO -           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2023-12-13, 07:50:44 UTC] {process_utils.py:190} INFO -   File "/tmp/venv3gux3z4y/script.py", line 26, in import_package
[2023-12-13, 07:50:44 UTC] {process_utils.py:190} INFO -     from src.clean.clean_product.AsinDetail2Recent30Day import AsinDetail2Recent30Day
[2023-12-13, 07:50:44 UTC] {process_utils.py:190} INFO - ModuleNotFoundError: No module named 'src'

File "/tmp/venv3gux3z4y/script.py", line 30 makes it plain: the operator creates a randomly named folder such as venv3gux3z4y under /tmp and runs your function from a generated script there, which is why the import of src fails in the log above.
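
For reference, a minimal sketch of the operator (the dag_id and requirements pin are made up). Note that any import the callable needs must live inside the function body, because only the function itself is copied into the generated script:

from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonVirtualenvOperator

def task_fn():
    # Imported here, not at module level: the function body is serialized into
    # /tmp/venvXXXX/script.py and executed by the venv's own interpreter
    import sys
    print(sys.executable)

with DAG(dag_id="venv_demo", start_date=datetime(2024, 1, 1), schedule=None) as dag:
    run_in_venv = PythonVirtualenvOperator(
        task_id="run_in_venv",
        python_callable=task_fn,
        requirements=["requests==2.31.0"],  # hypothetical pinned dependency
        system_site_packages=False,
    )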

Getting print output into Airflow's logs

In your code, add sys.stdout as a sink on your logger (the add() call below is loguru's API):

logger.add(sink=sys.stdout)

Python interpreter differences between BashOperator and PythonOperator

  • BashOperator uses the python interpreter that the worker's airflow user resolves to
  • PythonOperator runs on the Python interpreter of the environment it executes in (for PythonVirtualenvOperator, the venv's own interpreter); see the sketch below
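
One way to see the difference for yourself, as a throwaway sketch with a made-up dag_id:

from datetime import datetime
from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator

def show_interpreter():
    import sys
    print(sys.executable)  # the interpreter the task actually runs on

with DAG(dag_id="interpreter_demo", start_date=datetime(2024, 1, 1), schedule=None) as dag:
    # Resolves python from the airflow user's PATH inside the worker container
    bash_python = BashOperator(task_id="bash_python", bash_command="which python && python -V")
    py_python = PythonOperator(task_id="py_python", python_callable=show_interpreter)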

Cleaning up

If you want to completely remove the Airflow installation, the official advice is:

  • The docker-compose environment we have prepared is a “quick-start” one. It was not designed to be used in production and it has a number of caveats - one of them being that the best way to recover from any problem is to clean it up and restart from scratch.
    The best way to do this is to:
  • Run docker compose down --volumes --remove-orphans command in the directory you downloaded the docker-compose.yaml file
  • Remove the entire directory where you downloaded the docker-compose.yaml file rm -rf ''
  • Run through this guide from the very beginning, starting by re-downloading the docker-compose.yaml file
