Airflow Setup

1. Local Deployment

1. Dependencies

2. Upgrade to Python 3.7

Run as the root user.

#!/bin/bash
# File: upgrade_python37.sh
# User: root
# Os: CentOS 7.9

# 1. Install required package
yum install -y gcc gcc-c++ python-devel openssl-devel zlib-devel readline-devel libffi-devel wget

# 2. Install Python-3.7
#wget https://www.python.org/ftp/python/3.7.10/Python-3.7.10.tar.xz
wget https://zhengyansheng.oss-cn-beijing.aliyuncs.com/Python-3.7.10.tar.xz
tar xf Python-3.7.10.tar.xz
cd Python-3.7.10
#./configure --prefix=/usr/local/python37 --enable-optimizations --with-ssl --enable-loadable-sqlite-extensions
./configure --prefix=/usr/local/python37 --enable-optimizations --with-ssl
make -j 4
make install

# 3. Link python3.7 to python
unlink /usr/bin/python
ln -sv /usr/local/python37/bin/python3.7 /usr/bin/python

# 4. Add pip.conf file
#cat > /etc/pip.conf << EOF
#[global]
#trusted-host = pypi.douban.com
#index-url = http://pypi.douban.com/simple

#[list]
#format=columns
#EOF

cat > /etc/pip.conf << EOF
[global]
trusted-host = mirrors.aliyun.com
index-url = http://mirrors.aliyun.com/pypi/simple/

[list]
format=columns
EOF

# 5. Add local env
echo 'export PATH="/usr/local/python37/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc
pip3.7 install --upgrade pip==20.2.4 # fix https://github.com/apache/airflow/issues/12838

# 6. View version
python --version
pip3.7 --version 

Notes:

Because the OS default Python was repointed to Python 3.7, the system command-line tools written against Python 2 (yum / urlgrabber-ext-down / yum-config-manager) stop working until their interpreter lines are changed back to python2.7:

"""
1. vi /usr/bin/yum
2. vi /usr/libexec/urlgrabber-ext-down
3. vi /usr/bin/yum-config-manager
"""

3. 部署MySQL 5.7数据库

1. 安装

#!/bin/bash
# File: install_mysql57.sh
# User: root
# Os: CentOS 7.9
# Reference: https://tecadmin.net/install-mysql-5-7-centos-rhel/

# 1. Install yum source
yum localinstall -y https://dev.mysql.com/get/mysql57-community-release-el7-9.noarch.rpm

# 2. Install mysql
#yum install -y mysql-community-server
#yum localinstall *.rpm # yum install --downloadonly --downloaddir=./ mysql-community-server
wget https://zhengyansheng.oss-cn-beijing.aliyuncs.com/mysql-yum-57.tar.gz
tar xf mysql-yum-57.tar.gz
cd mysql-yum-57
yum localinstall -y *.rpm

2. Start the database

# 1. Start mysql
systemctl start mysqld.service

# 2. view mysql login password
grep 'A temporary password' /var/log/mysqld.log |tail -1

# 3. set secure option
/usr/bin/mysql_secure_installation

# 4. view version
mysql -V

"""
echo explicit_defaults_for_timestamp=1 >> /etc/my.cnf
systemctl restart mysqld.service
"""

3. Create the database

> mysql -uroot -p<xxx>
set global validate_password_policy=LOW;
set global validate_password_length=6;
alter user user() identified by "123456";
CREATE DATABASE `airflow` /*!40100 DEFAULT CHARACTER SET utf8 */;
CREATE USER 'airflow_user'@'localhost' IDENTIFIED BY 'airflow12345678';
GRANT ALL ON airflow.* TO 'airflow_user'@'localhost';
FLUSH PRIVILEGES;
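To confirm the new account and database are usable before pointing Airflow at them, a quick check (using the credentials created above):

# should list the airflow database
mysql -uairflow_user -pairflow12345678 -e 'SHOW DATABASES;'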

4. Deploy Redis 6.x

1. Install

# 1. Install remi yum repo
yum install -y epel-release yum-utils
yum install -y http://rpms.remirepo.net/enterprise/remi-release-7.rpm
yum-config-manager --enable remi

# 2. Install redis latest version
yum install -y redis

2. Configure

# vi /etc/redis.conf
bind 0.0.0.0

3. Start

# 1. Start redis
systemctl start redis && systemctl enable redis
systemctl status redis

# 2. View redis
ps -ef |grep redis

# 3. Test
redis-cli ping

# 4. View version
redis-cli --version

5. Deploy Airflow

1. Install

# 1. Set env
export AIRFLOW_HOME=~/airflow

# 2. Install apache-airflow 2.1.0
AIRFLOW_VERSION=2.1.0
PYTHON_VERSION="$(python --version | cut -d " " -f 2 | cut -d "." -f 1-2)"
CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"
# For example: https://raw.githubusercontent.com/apache/airflow/constraints-2.1.0/constraints-3.6.txt # may be unreachable on some networks; see the Notes section at the end
pip3.7 install "apache-airflow==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"

2. Initialize the database

# 1. Set up database
## https://airflow.apache.org/docs/apache-airflow/2.1.0/howto/set-up-database.html#
pip3.7 install pymysql
airflow config get-value core sql_alchemy_conn  # this errors out, but it creates the ~/airflow directory

# 2. Initialize the database
"""
# vi ~/airflow/airflow.cfg
[core]
sql_alchemy_conn = mysql+pymysql://airflow_user:airflow12345678@localhost:3306/airflow
"""

airflow db init

"""
...

Initialization done
"""

3. Create a user

# Create superuser
airflow users create \
    --username admin \
    --firstname zheng \
    --lastname yansheng \
    --role Admin \
    --email zhengyansheng@gmail.com
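# Note: without --password the command prompts for the password interactively; add --password <pwd> to script it.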

4. Start the services

# start the web server; the default port is 8080 (add -D to run it as a daemon)
airflow webserver --port 8080

# start the scheduler
# open a new terminal, or run the webserver with the -D option to run it as a daemon
airflow scheduler

# visit localhost:8080 in the browser and use the admin account you just
# created to login. Enable the example_bash_operator dag in the home page
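If you prefer not to keep two terminals open, both services accept the -D flag mentioned above; a minimal sketch (pid and log files land under $AIRFLOW_HOME by default):

# run the webserver and scheduler in the background as daemons
airflow webserver --port 8080 -D
airflow scheduler -D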

5. Admin UI

Web admin

 Dashboard

6. Distributed Deployment

1. Install

pip install 'apache-airflow[celery]'
pip install 'celery[redis]'

2. Configure the executor

[core]
# The executor class that airflow should use. Choices include
# ``SequentialExecutor``, ``LocalExecutor``, ``CeleryExecutor``, ``DaskExecutor``,
# ``KubernetesExecutor``, ``CeleryKubernetesExecutor`` or the
# full import path to the class when using a custom executor.
# executor = SequentialExecutor
executor = CeleryExecutor


[celery]
# broker_url = redis://redis:6379/0
broker_url = redis://localhost:6379/0

# result_backend = db+postgresql://postgres:airflow@postgres/airflow
result_backend = redis://localhost:6379/0

3. Start

# 1. Start webserver
airflow webserver -p 8000

# 2. Start scheduler
airflow scheduler

# 3. Start celery worker
airflow celery worker

# 4. Start celery flower
airflow celery flower

 

4. Admin pages

Webserver

flower

7. Demo

Start a DAG

 

5. HA Airflow Installation

Example end-to-end installation script for 1 master + 1 worker

##### Airflow HA setup (Airflow version 2.4.0)
##### Prerequisite: a clean CentOS 7 system
# User: root
# Os: CentOS 7.9
# Workdir: /server/

# System dependencies required by Airflow
#yum install -y  mysql-devel   gcc gcc-c++  gcc-devel  python-devel  openssl-devel zlib-devel readline-devel libffi-devel wget  cyrus-sasl-lib   python3-devel    cyrus-sasl cyrus-sasl-devel   libffi-devel   yum-utils

# Step 1: Upgrade Python to Python 3

# 1. Install required package
yum install -y  mysql-devel   gcc gcc-c++  gcc-devel  python-devel  openssl-devel zlib-devel readline-devel libffi-devel wget  cyrus-sasl-lib   python3-devel    cyrus-sasl cyrus-sasl-devel   libffi-devel  vim  yum-utils

# 2. Install Python-3.7
wget https://zhengyansheng.oss-cn-beijing.aliyuncs.com/Python-3.7.10.tar.xz
tar xf Python-3.7.10.tar.xz
cd Python-3.7.10
./configure --prefix=/usr/local/python37 --enable-optimizations --with-ssl
make -j 4
make install

# 3. Link python3.7 to python
unlink /usr/bin/python
ln -sv /usr/local/python37/bin/python3.7 /usr/bin/python

# 4. Add pip.conf file
cat > /etc/pip.conf << EOF
[global]
trusted-host = mirrors.aliyun.com
index-url = http://mirrors.aliyun.com/pypi/simple/
[list]
format=columns
EOF

# 5. Add local env
echo 'export PATH="/usr/local/python37/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc
pip3.7 install --upgrade pip==20.2.4 # fix https://github.com/apache/airflow/issues/12838

# 6. View version
python --version
pip3.7 --version

Notes:

Because the OS default Python was repointed to Python 3.7, the system command-line tools written against Python 2 (yum / urlgrabber-ext-down / yum-config-manager) stop working until their interpreter lines are changed from python back to python2.7 (see the sed sketch in the local-deployment notes above):
"""
1. vi /usr/bin/yum
2. vi /usr/libexec/urlgrabber-ext-down
3. vi /usr/bin/yum-config-manager
"""

## Step 2: Deploy MySQL 5.7
# 1. Install yum source
yum localinstall -y https://dev.mysql.com/get/mysql57-community-release-el7-9.noarch.rpm

# 2. Install mysql
#yum install -y mysql-community-server
#yum localinstall *.rpm # yum install --downloadonly --downloaddir=./ mysql-community-server
wget https://zhengyansheng.oss-cn-beijing.aliyuncs.com/mysql-yum-57.tar.gz
tar xf mysql-yum-57.tar.gz
cd mysql-yum-57
yum localinstall -y *.rpm

# 3. Install mysql-server
sudo yum install mysql-server --nogpgcheck

# 4. Start MySQL
    # 1. Start mysql
    systemctl start mysqld.service

    # 2. view mysql login password
    grep 'A temporary password' /var/log/mysqld.log |tail -1

    # 3. set secure option
    /usr/bin/mysql_secure_installation

    # 4. view version
    mysql -V

    echo explicit_defaults_for_timestamp=1 >> /etc/my.cnf
    systemctl restart mysqld.service

# 5. Create the database
> mysql -uroot -p<xxx>
set global validate_password_policy=LOW;
set global validate_password_length=6;
alter user user() identified by "123456";
CREATE DATABASE `airflow` /*!40100 DEFAULT CHARACTER SET utf8 */;
CREATE USER 'airflow_user'@'localhost' IDENTIFIED BY 'airflow12345678';
GRANT ALL ON airflow.* TO 'airflow_user'@'localhost';
# allow other worker nodes to connect to the database
CREATE USER 'airflow_user'@'%' IDENTIFIED BY 'airflow12345678';
GRANT ALL ON airflow.* TO 'airflow_user'@'%';
FLUSH PRIVILEGES;

# Step 3: Deploy Redis 6.x
# 1. Install remi yum repo
yum install -y epel-release yum-utils
yum install -y http://rpms.remirepo.net/enterprise/remi-release-7.rpm
yum-config-manager --enable remi

# 2. Install redis latest version
yum install -y redis

# 3. Configure
# vi /etc/redis.conf
    bind 0.0.0.0
    # disable Redis protected mode (it is enabled by default)
    # protected-mode yes
    protected-mode no
# 4. Start
    # 1. Start redis
    systemctl start redis && systemctl enable redis
    systemctl status redis

    # 2. View redis
    ps -ef |grep redis

    # 3. Test
    redis-cli ping

    # 4. View version
    redis-cli --version

# Step 4: Install Airflow 2.4.0
# Airflow needs a home. `~/airflow` is the default, but you can put it
# somewhere else if you prefer (optional)
export AIRFLOW_HOME=~/airflow

# Install Airflow using the constraints file
AIRFLOW_VERSION=2.4.0
PYTHON_VERSION="$(python --version | cut -d " " -f 2 | cut -d "." -f 1-2)"
# For example: 3.7
CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"
# For example: https://raw.githubusercontent.com/apache/airflow/constraints-2.4.0/constraints-3.7.txt
pip install "apache-airflow==${AIRFLOW_VERSION}"

# Step 5: Initialize the database
# 1. Set up database
## https://airflow.apache.org/docs/apache-airflow/2.1.0/howto/set-up-database.html#
pip3.7 install pymysql
airflow config get-value core sql_alchemy_conn  # this errors out, but it creates the ~/airflow directory

# 2. Initialize the database
"""
# vi ~/airflow/airflow.cfg
[core]
executor = CeleryExecutor

sql_alchemy_conn = mysql+pymysql://airflow_user:airflow12345678@localhost:3306/airflow

[celery]
# broker_url = redis://redis:6379/0
broker_url = redis://localhost:6379/0

# result_backend = db+postgresql://postgres:airflow@postgres/airflow
result_backend = redis://localhost:6379/0

"""

airflow db init

"""
...
Initialization done
"""
# Step 5.1: HA deployment requires Celery (install on both the master and worker nodes)
pip3 install 'apache-airflow[mysql]'
pip3 install 'apache-airflow[celery]'
pip3 install 'apache-airflow[redis]'
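The three extras can also be installed in a single step; a sketch equivalent to the three commands above:

pip3 install 'apache-airflow[mysql,celery,redis]'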

# Step 6: Create a user
# Create superuser
airflow users create \
    --username admin \
    --firstname admin \
    --lastname admin \
    --role Admin \
    --email admin@admin.com

# Step 7: Start services (master)
    # 1. Start webserver
    #airflow webserver -p 8000   # run on port 8000 instead of the default 8080
    airflow webserver

    # 2. Start scheduler
    airflow scheduler

    # 3. Start celery worker
    airflow celery worker

    # 4. Start celery flower
    airflow celery flower

# Step 8: Configure airflow.cfg and start services (worker node; at least 5 GB of RAM)
## The worker needs the same Airflow version and dependencies installed as the master
## In airflow.cfg, $MASTER_IP is the master's IP (MySQL and Redis run on the master here; if they run on other machines, use those IPs instead)
    """
    # vi ~/airflow/airflow.cfg
    [core]
    executor = CeleryExecutor

    sql_alchemy_conn = mysql+pymysql://airflow_user:airflow12345678@$MASTER_IP:3306/airflow

    [celery]
    # broker_url = redis://redis:6379/0
    broker_url = redis://$MASTER_IP:6379/0

    # result_backend = db+postgresql://postgres:airflow@postgres/airflow
    result_backend = redis://$MASTER_IP:6379/0

    """

#Start celery worker
airflow celery worker
5.1 Check that the master and worker processes are running
ps -aux |grep airflow 
ps -axu | grep celeryd 
ps -axu | grep webserver
ps -axu | grep scheduler
ps -axu | grep flower

Master node (expected processes):

airflow
celeryd
webserver
scheduler
flower

Worker node:

5.2 Kill the Airflow processes
ps -axu | grep celeryd | awk '{print $2}' | xargs kill -9

ps -axu | grep webserver | awk '{print $2}' | xargs kill -9

ps -axu | grep scheduler | awk '{print $2}' | xargs kill -9

ps -axu | grep flower | awk '{print $2}' | xargs kill -9

Notes

1. Local deployment

pip version

# https://github.com/apache/airflow/issues/12838
# newer pip versions have compatibility problems with Airflow 2.x; downgrade pip, otherwise installation via pip fails
pip3.7 install --upgrade pip==20.2.4

pip.conf

For pip, use the Aliyun mirror (mirrors.aliyun.com); the Douban mirror (pypi.douban.com) causes dependency-resolution problems during installation.

https://raw.githubusercontent.com/ is unreachable

# For example: https://raw.githubusercontent.com/apache/airflow/constraints-2.1.0/constraints-3.6.txt # may be unreachable on some networks
pip3.7 install "apache-airflow==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"

# if the previous command cannot reach raw.githubusercontent.com, use the following command instead
pip3.7 install "apache-airflow==${AIRFLOW_VERSION}" --constraint https://zhengyansheng.oss-cn-beijing.aliyuncs.com/constraints-3.7.txt

2. Troubleshooting

2.1 airflow db init fails with "Global variable explicit_defaults_for_timestamp needs to be on (1) for mysql"

 File "/usr/python3.7/lib/python3.7/site-packages/airflow/migrations/versions/0e2a74e0fc9f_add_time_zone_awareness.py", line 44, in upgrade
    raise Exception("Global variable explicit_defaults_for_timestamp needs to be on (1) for mysql")
Exception: Global variable explicit_defaults_for_timestamp needs to be on (1) for mysql
Solution:
Connect to MySQL and enable the global variable explicit_defaults_for_timestamp (to make it persistent across restarts, also add explicit_defaults_for_timestamp=1 to /etc/my.cnf, as in the MySQL deployment step):

SHOW GLOBAL VARIABLES LIKE '%timestamp%';
SET GLOBAL explicit_defaults_for_timestamp =1;
mysql> SHOW GLOBAL VARIABLES LIKE '%timestamp%';
+---------------------------------+-------+
| Variable_name                   | Value |
+---------------------------------+-------+
| explicit_defaults_for_timestamp | OFF   |
| log_timestamps                  | UTC   |
+---------------------------------+-------+
2 rows in set (0.00 sec)

mysql> SET GLOBAL explicit_defaults_for_timestamp =1;
Query OK, 0 rows affected (0.00 sec)

mysql> SHOW GLOBAL VARIABLES LIKE '%timestamp%';
+---------------------------------+-------+
| Variable_name                   | Value |
+---------------------------------+-------+
| explicit_defaults_for_timestamp | ON    |
| log_timestamps                  | UTC   |
+---------------------------------+-------+
2 rows in set (0.00 sec)

2.2 Airflow commands fail with "error: sqlite C library version too old (< {min_sqlite_version})"

Full traceback:

Traceback (most recent call last):
  File "/usr/python3.7/bin/airflow", line 5, in <module>
    from airflow.__main__ import main
  File "/usr/python3.7/lib/python3.7/site-packages/airflow/__init__.py", line 34, in <module>
    from airflow import settings
  File "/usr/python3.7/lib/python3.7/site-packages/airflow/settings.py", line 35, in <module>
    from airflow.configuration import AIRFLOW_HOME, WEBSERVER_CONFIG, conf  # NOQA F401
  File "/usr/python3.7/lib/python3.7/site-packages/airflow/configuration.py", line 1114, in <module>
    conf.validate()
  File "/usr/python3.7/lib/python3.7/site-packages/airflow/configuration.py", line 202, in validate
    self._validate_config_dependencies()
  File "/usr/python3.7/lib/python3.7/site-packages/airflow/configuration.py", line 243, in _validate_config_dependencies
    f"error: sqlite C library version too old (< {min_sqlite_version}). "
airflow.exceptions.AirflowConfigException: error: sqlite C library version too old (< 3.15.0). See https://airflow.apache.org/docs/apache-airflow/2.1.1/howto/set-up-database.rst#setting-up-a-sqlite-database

Cause: Airflow defaults to SQLite as the metastore; since we use MySQL, SQLite is not actually needed.

Solution: edit {AIRFLOW_HOME}/airflow.cfg and change the metadata database connection string sql_alchemy_conn to:

sql_alchemy_conn = mysql+pymysql://airflow:yourpassword@hostname:3306/airflow

2.3 Error installing Airflow packages

The error "xxx setup command: use_2to3 is invalid." occurs because setuptools dropped support for use_2to3 during builds; the breaking change landed in setuptools 58.0.0.

To resolve it, pin setuptools to a version below 58.0.0 (for example 57.5.0) before installing the package.

Open a terminal and run one of the following commands:

pip install "setuptools<58.0"
pip3 install "setuptools<58.0"

python -m pip install "setuptools<58.0"
python3 -m pip install "setuptools<58.0"
py -m pip install "setuptools<58.0"

Then reinstall the package and it will succeed.

2.4 error: sqlite C library version too old (fix by building a newer SQLite)

The SQLite shipped with CentOS 7 is too old even after "yum update" / "yum install sqlite", so download the source from https://sqlite.org/, build it locally, and install it:
1) Download the source
[root@stg-airflow001 ~]$ wget https://www.sqlite.org/2019/sqlite-autoconf-3290000.tar.gz

2) Build and install
[root@stg-airflow001 ~]$ tar zxvf sqlite-autoconf-3290000.tar.gz 
[root@stg-airflow001 ~]$ cd sqlite-autoconf-3290000/
[root@stg-airflow001 ~/sqlite-autoconf-3290000]$ ./configure --prefix=/usr/local
[root@stg-airflow001 ~/sqlite-autoconf-3290000]$ make && make install

3) Replace the system's old sqlite3
[root@stg-airflow001 ~/sqlite-autoconf-3290000]$ cd 
[root@stg-airflow001 ~]$ mv /usr/bin/sqlite3  /usr/bin/sqlite3_old
[root@stg-airflow001 ~]$ ln -s /usr/local/bin/sqlite3   /usr/bin/sqlite3
[root@stg-airflow001 ~]$ echo "/usr/local/lib" > /etc/ld.so.conf.d/sqlite3.conf
[root@stg-airflow001 ~]$ ldconfig
[root@stg-airflow001 ~]$ sqlite3 -version
3.29.0 2019-07-10 17:32:03 fc82b73eaac8b36950e527f12c4b5dc1e147e6f4ad2217ae43ad82882a88bfa6


2.5 Worker task logs cannot be fetched

Failed to fetch log file from worker. [Errno -2] Name or service not known

Add every machine's IP/hostname mapping to the /etc/hosts file on all nodes.
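A minimal example of the /etc/hosts entries (the hostnames and IPs below are placeholders; use the real ones on every node):

192.168.1.10    airflow-master
192.168.1.11    airflow-worker01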

2.6 Web UI stops working after the disk fills with logs

After running for a while, the web UI stopped coming up because the disk had filled with logs. The scheduler log directory alone had grown to 44 GB; clearing it fixed the problem. To keep this from happening again, lower the logging level.

vi airflow.cfg
Set the logging level (see the level ordering below), then restart the services.

[core]
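# note: in Airflow 2.x this option lives in the [logging] section; the [core] key is deprecated but still read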
#logging_level = INFO
logging_level = WARNING

NOTSET < DEBUG < INFO < WARNING < ERROR < CRITICAL

If the level is set to INFO, messages below INFO are suppressed and messages at or above INFO are emitted; the higher the level, the less detailed the logs (Python's logging module defaults to WARNING, while Airflow's own logging_level defaults to INFO).

Note: raising logging_level to WARNING or above affects not only the log files but also command-line output, which will likewise only show messages at or above the chosen level. So if command-line output looks incomplete and there is no error log, the logging level is probably set too high.
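Besides lowering the logging level, it also helps to clear old logs periodically; a minimal sketch, assuming the default log directory $AIRFLOW_HOME/logs and a 7-day retention (run it from cron, e.g. daily):

# delete Airflow log files older than 7 days
find "${AIRFLOW_HOME:-$HOME/airflow}/logs" -type f -name '*.log' -mtime +7 -delete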
