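Airflow Deployment Guide
Project Overview
1. This project deploys its nodes as Docker containers and installs Airflow 2.6.3 from source. The node layout is shown below.
Node IP | Node name | Node role | Node services |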
---|---|---|---|
172.20.0.21 | airflow_node1 | master | webserver,scheduler |
172.20.0.22 | airflow_node2 | worker1 | worker |
172.20.0.23 | airflow_node3 | worker2 | worker |
Installation Steps
1. docker-ce
(omitted)
2. Download the source archive and the whl file from Apache.
Source archive URL:
https://archive.apache.org/dist/airflow/2.6.3/apache-airflow-2.6.3-source.tar.gz
whl file URL:
https://archive.apache.org/dist/airflow/2.6.3/apache_airflow-2.6.3-py3-none-any.whl
sdist file URL:
https://archive.apache.org/dist/airflow/2.6.3/apache-airflow-2.6.3.tar.gz
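Before installing, it can be worth checking that the downloaded archives are intact. The Python sketch below computes a file's SHA-512 digest so it can be compared with a published checksum. It assumes a checksum is available for each archive, which this guide does not state. The names `sha512_of` and `matches_checksum` are illustrative, and `expected_hex` is expected to hold only the hex digest, not a file name.

```python
import hashlib
from pathlib import Path


def sha512_of(path, chunk_size=1 << 20):
    """Return the hex SHA-512 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha512()
    with Path(path).open("rb") as handle:
        # Read fixed-size chunks so large archives are never held in memory whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_hex):
    """Compare a file's digest with an expected hex string, ignoring case and whitespace."""
    normalized = "".join(expected_hex.split()).lower()
    return sha512_of(path) == normalized
```

For example, `matches_checksum("/home/apache-airflow-2.6.3-source.tar.gz", expected)` would return `True` only when the local file matches the digest held in `expected`.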
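3. Start a Docker container and install the required packages in it. The python3.8-venv and PostgreSQL 12 steps below match the ubuntu:20.04 image.
[root@node06 ~]: docker run -it -p 13131:8080 --name airflow_node1 ubuntu:20.04
– Install the required packages in the container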
[root@airflow_node1 ~]: apt install build-essential python3-dev libsqlite3-dev openssl \
sqlite default-libmysqlclient-dev libmysqlclient-dev postgresql
[root@04bd0bfd277b ~]: pg_ctlcluster 12 main start
4. Install the Python virtual environment package.
[root@04bd0bfd277b ~]: apt install python3.8-venv
5. Create the Python virtual environment airflow_venv and activate it.
[root@04bd0bfd277b ~]: python3 -m venv /home/airflow_venv
[root@04bd0bfd277b ~]: . /home/airflow_venv/bin/activate
6. Install the dependencies and npm.
(airflow_venv) [root@04bd0bfd277b ~]: cd /home
(airflow_venv) [root@04bd0bfd277b home]: tar -xvf apache-airflow-2.6.3-source.tar.gz
(airflow_venv) [root@04bd0bfd277b home]: mv apache-airflow-2.6.3 airflow
(airflow_venv) [root@04bd0bfd277b home]: cd airflow
(airflow_venv) [root@04bd0bfd277b airflow]: pip install /home/apache_airflow-2.6.3-py3-none-any.whl
(airflow_venv) [root@04bd0bfd277b airflow]: python setup.py install
(airflow_venv) [root@04bd0bfd277b airflow]: pip install .
– Download Node.js 20.11.1, extract it, and then add the environment variables to the virtual environment's activate script
(airflow_venv) [root@airflow_node1 home]: tar -xvf node-v20.11.1-linux-x64.tar.xz
(airflow_venv) [root@airflow_node1 home]: mv node-v20.11.1-linux-x64 node
(airflow_venv) [root@airflow_node1 home]: vi /home/airflow_venv/bin/activate
export AIRFLOW_HOME="/home/airflow"
export PATH="/home/node/bin:$PATH"
(airflow_venv) [root@airflow_node1 home]: . /home/airflow_venv/bin/activate
7. Build the front-end assets
(airflow_venv) [root@airflow_node1 home]: npm -g install yarn
(airflow_venv) [root@airflow_node1 home]: cd /home/airflow
(airflow_venv) [root@airflow_node1 airflow]: python setup.py compile_assets
8. Set up the database and Redis nodes
– Start the MySQL container and edit its my.cnf file
[root@node06 ~]: docker run -dit --name=mysql_airflow -p 14141:3306 -e MYSQL_ROOT_PASSWORD=123456 mysql:8.0
[root@mysql_airflow ~]: vi /etc/mysql/my.cnf
[mysqld]
explicit_defaults_for_timestamp=1
– Restart the container from the host so the change takes effect
[root@node06 ~]: docker restart mysql_airflow
– Check that the setting is active
mysql> show variables like 'explicit_defaults_for_timestamp';
+---------------------------------+-------+
| Variable_name | Value |
+---------------------------------+-------+
| explicit_defaults_for_timestamp | ON |
+---------------------------------+-------+
1 row in set (0.01 sec)
– Create the database and user
mysql> CREATE DATABASE airflow CHARACTER SET utf8;
mysql> create user 'airflow'@'%' identified by '123456';
mysql> grant all privileges on airflow.* to 'airflow'@'%';
mysql> flush privileges;
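If the MySQL database created above is used as Airflow's metadata database, the connection string has to carry the user name and password safely. The following Python sketch builds a SQLAlchemy URL in the `mysql+mysqldb://` form and percent-encodes the credentials. The helper name `mysql_url` is hypothetical, and using this database as the metadata store is an assumption, since the configuration later in this guide shows a PostgreSQL URL with placeholders.

```python
from urllib.parse import quote_plus


def mysql_url(user, password, host, port, database):
    """Build a mysql+mysqldb SQLAlchemy URL, percent-encoding user and password."""
    # Characters such as '@' or ':' in a password would otherwise be read
    # as URL delimiters, so both credentials are encoded.
    return (
        f"mysql+mysqldb://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )
```

Inside the Docker network the MySQL port is 3306; the 14141 mapping applies only when connecting through the host.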
– Set up the Redis node
[root@node06 ~]: docker run -d --name airflow_redis redis
9. Configure the network for the Docker cluster
[root@node06 ~]: docker network create --subnet=172.20.0.0/16 airflow_net
[root@node06 ~]: docker network connect --ip 172.20.0.21 airflow_net airflow_node1
[root@node06 ~]: docker network connect --ip 172.20.0.22 airflow_net airflow_node2
[root@node06 ~]: docker network connect --ip 172.20.0.23 airflow_net airflow_node3
[root@node06 ~]: docker network connect --ip 172.20.0.24 airflow_net airflow_redis
[root@node06 ~]: docker network connect --ip 172.20.0.25 airflow_net mysql_airflow
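A mistyped address in the commands above only shows up later as a connection failure, so a quick offline check can help. The Python sketch below uses the standard `ipaddress` module to confirm that every static IP lies inside the subnet and that none is assigned twice. The `check_assignments` helper is illustrative; the subnet and addresses are copied from the commands above.

```python
import ipaddress

SUBNET = ipaddress.ip_network("172.20.0.0/16")

# Container name -> static IP, copied from the `docker network connect` commands.
ASSIGNMENTS = {
    "airflow_node1": "172.20.0.21",
    "airflow_node2": "172.20.0.22",
    "airflow_node3": "172.20.0.23",
    "airflow_redis": "172.20.0.24",
    "mysql_airflow": "172.20.0.25",
}


def check_assignments(subnet, assignments):
    """Return a list of problems: addresses outside the subnet or used twice."""
    problems = []
    seen = {}
    for name, raw in assignments.items():
        address = ipaddress.ip_address(raw)
        if address not in subnet:
            problems.append(f"{name}: {raw} is outside {subnet}")
        if address in seen:
            problems.append(f"{name}: {raw} is already used by {seen[address]}")
        # Remember the first container that claimed each address.
        seen.setdefault(address, name)
    return problems
```

Calling `check_assignments(SUBNET, ASSIGNMENTS)` returns an empty list when the layout is consistent.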
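10. Edit airflow.cfg; node1, node2, and node3 all use the same configuration file
(airflow_venv) [root@airflow_node1 home]: vi /home/airflow/airflow.cfg
# Do not copy this verbatim; find each matching entry in your file and fill it in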
[core]
dags_folder = /home/airflow/dags
default_timezone = Asia/Shanghai
executor = CeleryExecutor
[database]
sql_alchemy_conn = postgresql+psycopg2://username:password@ip:5432/airflow_new
[webserver]
default_ui_timezone = Asia/Shanghai
[celery]
broker_url = redis://airflow_redis:6379/0
result_backend = db+mysql://root:password@ip:3306/airflow_new?use_unicode=true&charset=utf8
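Because all three nodes must carry the same settings, a small check run on each node can catch a file that was only partly edited. The Python sketch below parses the configuration text with the standard `configparser` module and reports which of the Celery-related entries from the listing above are missing or empty. The `REQUIRED` list and the function name `missing_settings` are illustrative choices, and the sketch assumes the file is plain INI that `configparser` can read.

```python
import configparser

# (section, key) pairs taken from the listing above that every node needs.
REQUIRED = [
    ("core", "executor"),
    ("celery", "broker_url"),
    ("celery", "result_backend"),
]


def missing_settings(cfg_text, required=REQUIRED):
    """Return the (section, key) pairs that are absent or empty in cfg_text."""
    # interpolation=None keeps literal '%' characters in values from being
    # treated as configparser interpolation markers.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    return [
        (section, key)
        for section, key in required
        if not parser.get(section, key, fallback="").strip()
    ]
```

A typical call would read the file first, for example `missing_settings(open("/home/airflow/airflow.cfg").read())`.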
11. Initialize the database
[root@airflow_node1 ~]: airflow db init
12. Create a user
[root@airflow_node1 ~]: airflow users create \
--username airflow \
--firstname airflow \
--lastname airflow \
--role Admin \
--email xx@qq.com
13. Start the Airflow cluster.
– Install the modules needed to run the cluster
(airflow_venv) [root@airflow_node1 airflow]: pip install celery==4.4.7 flower==0.9.7 redis==3.5.3
(airflow_venv) [root@airflow_node2 airflow]: pip install celery==4.4.7 flower==0.9.7 redis==3.5.3
(airflow_venv) [root@airflow_node3 airflow]: pip install celery==4.4.7 flower==0.9.7 redis==3.5.3
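– Start the cluster (run the webserver and the scheduler in separate terminals on node1, since each stays in the foreground)
(airflow_venv) [root@airflow_node1 airflow]: airflow webserver
(airflow_venv) [root@airflow_node1 airflow]: airflow scheduler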
(airflow_venv) [root@airflow_node2 airflow]: airflow celery worker
(airflow_venv) [root@airflow_node3 airflow]: airflow celery worker
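Once the services are running, the webserver's `/health` endpoint offers a quick way to confirm that the metadata database and the scheduler are reachable. The Python sketch below fetches that JSON document and lists the components that are not reported as healthy. It assumes the webserver is reached from the host through port 13131, as mapped when the container was started; the function names are illustrative.

```python
import json
import urllib.request

# Assumption: the webserver is reached through the host port mapped in step 3.
HEALTH_URL = "http://localhost:13131/health"


def unhealthy_components(payload, components=("metadatabase", "scheduler")):
    """Return the names of components whose status is not 'healthy'."""
    return [
        name
        for name in components
        # A missing or null entry counts as unhealthy.
        if (payload.get(name) or {}).get("status") != "healthy"
    ]


def fetch_health(url=HEALTH_URL, timeout=5):
    """Fetch and decode the JSON document served by the /health endpoint."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Running `unhealthy_components(fetch_health())` on the host should give an empty list when both components report as healthy.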
14. Installation complete
Troubleshooting
module pendulum not found
(airflow_venv) [root@airflow_node1 airflow]: pip install pendulum==2.1.2
No module named 'MySQLdb'
(airflow_venv) [root@airflow_node1 airflow]: pip install mysqlclient==2.1.1
ModuleNotFoundError: No module named 'connexion.decorators.validation'
(airflow_venv) [root@airflow_node1 airflow]: pip install connexion==2.14.2
TypeError: SqlAlchemySessionInterface.__init__() missing 6 required positional arguments: 'sequence', 'schema', 'bind_key', 'use_signer', 'permanent', and 'sid_length'
(airflow_venv) [root@airflow_node1 airflow]: pip install Flask-Session==0.5.0
Exception: Can not find valid pkg-config name. Specify MYSQLCLIENT_CFLAGS and MYSQLCLIENT_LDFLAGS env vars manually
(airflow_venv) [root@airflow_node1 airflow]: pip install mysqlclient==2.1.1
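Since each fix above pins a specific version, it can help to confirm on every node that the pins are actually in place. The Python sketch below uses the standard `importlib.metadata` module to compare installed versions with the versions named in these notes. The `PINS` table is copied from the commands above; the helper names are illustrative.

```python
from importlib import metadata

# Versions pinned in the troubleshooting notes above.
PINS = {
    "pendulum": "2.1.2",
    "mysqlclient": "2.1.1",
    "connexion": "2.14.2",
    "Flask-Session": "0.5.0",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def pin_mismatches(pins, lookup=installed_version):
    """Map each package whose installed version differs from its pin to (wanted, found)."""
    return {
        name: (wanted, lookup(name))
        for name, wanted in pins.items()
        if lookup(name) != wanted
    }
```

Run inside the activated airflow_venv, `pin_mismatches(PINS)` returns an empty dictionary when every pinned package is installed at the expected version.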