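Airflow is a platform open-sourced by Airbnb and implemented in Python for managing, scheduling, and monitoring workflows. Because its scheduling relies on crontab, Airflow currently supports installation only on Linux. Airflow can be used to monitor data pipelines (ETL), message queues, and similar workloads.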
MySQL database installation (used as the metadata database):
- # yum install gcc libffi-devel python-devel openssl-devel
- # Create the database and the account
- mysql> create database airflow default charset utf8 collate utf8_general_ci;
- mysql> create user airflow@'localhost' identified by 'airflow';
- mysql> grant all on airflow.* to airflow@'localhost';
- mysql> flush privileges;
Airflow installation and configuration:
- # Set up the airflow home directory
- > mkdir -p /usr/local/airflow/{dags,logs,plugins}
- # echo "export AIRFLOW_HOME=/usr/local/airflow" >> /etc/profile
- > source /etc/profile
- # Install airflow (note: from release 1.8.1 onward the PyPI package is named apache-airflow)
- > pip install airflow
- # Configure the metadata database
- > vi /usr/local/airflow/airflow.cfg
- # dialect+driver://username:password@host:port/database
- sql_alchemy_conn = mysql://airflow:airflow@localhost:3306/airflow
- # Note: the MySQL socket path in use is socket=/var/lib/mysql/mysql.sock
- # Initialize the metadata database (SQLite by default)
- > airflow initdb
- # Start the web server (the port defaults to 8080 when not specified)
- > airflow webserver -p 8080
- # Add a firewall rule or stop the firewall
- > systemctl stop firewalld.service
- # Open the admin UI from a remote machine
- http://192.168.40.10:8080/admin/
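The `sql_alchemy_conn` value configured above follows the pattern `dialect+driver://username:password@host:port/database`. As a minimal sketch, the URL could be assembled in Python so that special characters in the credentials are percent-encoded. The helper name `build_conn_url` is hypothetical and is not part of Airflow.

```python
from urllib.parse import quote_plus


def build_conn_url(user, password, host, port, database, dialect="mysql"):
    """Assemble a SQLAlchemy-style URL (hypothetical helper, not an Airflow API)."""
    # Percent-encode the credentials so characters such as '@' or ':'
    # cannot be mistaken for URL delimiters.
    return "{}://{}:{}@{}:{}/{}".format(
        dialect, quote_plus(user), quote_plus(password), host, port, database
    )


if __name__ == "__main__":
    # Same values as the airflow.cfg entry in this walkthrough.
    print(build_conn_url("airflow", "airflow", "localhost", 3306, "airflow"))
```

With the values used in this walkthrough the result matches the `sql_alchemy_conn` line in airflow.cfg; the encoding only matters once the password contains reserved characters.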
Airflow service management:
- # Install the process manager Supervisord to manage the airflow processes
- > easy_install supervisor
- > echo_supervisord_conf > /etc/supervisord.conf
- # Edit supervisord.conf and add the start commands
- > vi /etc/supervisord.conf
- [program:airflow_web]
- command=/usr/bin/airflow webserver -p 8080
- [program:airflow_scheduler]
- command=/usr/bin/airflow scheduler
- # Start the supervisord service
- /usr/bin/supervisord -c /etc/supervisord.conf
- # The airflow services can now be managed with supervisorctl
- supervisorctl start airflow_web
- supervisorctl stop airflow_web
- supervisorctl restart airflow_web
- supervisorctl stop all
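The `[program:...]` sections added to supervisord.conf above could also be generated rather than typed by hand. The following is a rough sketch using only the Python standard library; `render_programs` is a hypothetical helper name, and it emits nothing beyond the `command` key used in this walkthrough.

```python
import configparser
import io


def render_programs(programs):
    """Render [program:<name>] sections from a name -> command mapping."""
    parser = configparser.RawConfigParser()
    for name, command in programs.items():
        section = "program:" + name
        parser.add_section(section)
        parser.set(section, "command", command)
    buffer = io.StringIO()
    # Write "key=value" without spaces, matching the style used in this guide.
    parser.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()


if __name__ == "__main__":
    print(render_programs({
        "airflow_web": "/usr/bin/airflow webserver -p 8080",
        "airflow_scheduler": "/usr/bin/airflow scheduler",
    }))
```

The returned text can be appended to /etc/supervisord.conf before starting supervisord.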
Authentication:
- # Install the password module
- > pip install airflow[password]
- # Enable access authentication
- > vim /usr/local/airflow/airflow.cfg
- [webserver]
- authenticate = true
- auth_backend = airflow.contrib.auth.backends.password_auth
- # Add an account from the Python interpreter:
- import airflow
- from airflow import models, settings
- from airflow.contrib.auth.backends.password_auth import PasswordUser
- user = PasswordUser(models.User())
- user.username = 'afuser'
- user.email = 'afuser@example.com'
- user.password = 'afuser'
- session = settings.Session()
- session.add(user)
- session.commit()
- session.close()
- exit()
- # Restart the airflow_web service
- > supervisorctl restart airflow_web
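The account created above uses a password identical to the username, which is convenient for a test setup only. As one possible alternative, a random password could be generated with the standard library and assigned to `user.password`. The helper `generate_password` is hypothetical and not part of Airflow.

```python
import secrets
import string


def generate_password(length=16):
    """Return a random password of letters and digits (hypothetical helper)."""
    alphabet = string.ascii_letters + string.digits
    # secrets.choice draws from a cryptographically strong random source.
    return "".join(secrets.choice(alphabet) for _ in range(length))


if __name__ == "__main__":
    print(generate_password())
```

In the interpreter session above, this would be used as `user.password = generate_password()`, with the generated value noted down before the session is committed.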