Installing Apache Airflow

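Copyright notice: This is an original article by the author, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.
Article link: https://blog.csdn.net/zpf336/article/details/86482036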

Environment:

Ubuntu 14.04

root: the root user

darren: a regular user (with sudo privileges)

Installation:

Step 1: Install Python

sudo apt-get update
sudo apt-get install python3

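If python3 is already installed, skip this step.

Step 2: Create a symbolic link

sudo ln -s /usr/bin/python3.4 /usr/bin/python

The default repositories install Python 3.4 as python3, so the command above links python to it. If /usr/bin/python already exists, ln -s fails with "File exists"; the link is only a convenience, because the remaining steps call pip3 directly. Check the version with python -V: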

darren@ubuntu:~$ python -V
Python 3.4.3

Step 3: Install pip

sudo apt-get install python3-pip

This creates the pip3 executable in /usr/bin. Check its version with pip3 -V:

darren@ubuntu:~$ /usr/bin/pip3 -V
pip 1.5.4 from /usr/lib/python3/dist-packages (python 3.4)

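This version is too old and causes many problems when installing Airflow, so pip needs to be upgraded.

Step 4: Upgrade pip

sudo /usr/bin/pip3 install --upgrade pip

After the upgrade finishes, you will find: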

darren@ubuntu:~$ /usr/bin/pip3 -V
pip 1.5.4 from /usr/lib/python3/dist-packages (python 3.4)

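The version has not changed, even though the upgrade reported success. Why? The upgraded pip3 was written to /usr/local/bin, and the apt-installed copy in /usr/bin was left untouched: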

darren@ubuntu:~$ /usr/local/bin/pip3 -V
pip 18.1 from /usr/local/lib/python3.4/dist-packages/pip (python 3.4)

The version is now 18.1, so Airflow can be installed.
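The two pip3 -V results above differ only in which file is executed. As a minimal bash sketch (list_on_path is a hypothetical helper name, not part of pip or Ubuntu), the following function lists every pip3 found on PATH in lookup order, so you can see which copy a bare pip3 would run:

```shell
# Hypothetical helper: print every executable called NAME found on PATH,
# in the order the shell searches. The first line is what a bare NAME runs.
list_on_path() {
    local name="$1" dir
    local IFS=':'
    for dir in $PATH; do
        # An empty PATH entry means the current directory; skip it here.
        [ -n "$dir" ] || continue
        if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
        fi
    done
}

list_on_path pip3
```

If the first line printed is not /usr/local/bin/pip3, keep calling the upgraded copy by its full path, as the rest of this article does. In bash, hash -r clears the shell's cached command locations after a new copy has been installed.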

Step 5: Install Airflow

# switch to the root user
su root
# airflow needs a home, ~/airflow is the default,
# but you can lay foundation somewhere else if you prefer
# (optional)
export AIRFLOW_HOME=/home/darren/airflow
# use text-unidecode rather than the GPL-licensed unidecode dependency
export SLUGIFY_USES_TEXT_UNIDECODE=yes
# install from pypi using pip
/usr/local/bin/pip3 install apache-airflow

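The following error may appear during installation:

Unknown distribution option: 'python_requires'

If it does, upgrade setuptools:

/usr/local/bin/pip3 install --upgrade setuptools

Once the upgrade succeeds, run the following command again to continue the installation: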

/usr/local/bin/pip3 install apache-airflow

A new problem may then appear:

thrift 0.11.0 has requirement six>=1.7.2, but you'll have six 1.5.2 which is incompatible.
tenacity 4.8.0 has requirement six>=1.9.0, but you'll have six 1.5.2 which is incompatible.
html5lib 1.0.1 has requirement six>=1.9, but you'll have six 1.5.2 which is incompatible.

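This means the installed version of six is too old. That is easy to fix by upgrading six:

/usr/local/bin/pip3 install --upgrade --ignore-installed six

Retrying the installation may then report another error: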

Cannot uninstall 'colorama'. It is a distutils installed project and thus we cannot accurately determine 
which files belong to it which would lead to only a partial uninstall.

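The fix is as follows:

find / -name 'colorama*'
# the search returned these paths on my machine:
#/usr/lib/python3/dist-packages/colorama
#/usr/lib/python3/dist-packages/colorama-0.2.5.egg-info
# the first is a directory and the second is a file; delete both
rm -r /usr/lib/python3/dist-packages/colorama
rm /usr/lib/python3/dist-packages/colorama-0.2.5.egg-info

This resolves the problem. Retry the installation, and if a similar error appears for another package, fix it the same way.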
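The same clean-up applies to any package that triggers the "distutils installed project" error. A minimal bash sketch, assuming the package sits directly under one dist-packages directory as colorama did here (find_dist_files is a hypothetical helper name), lists what would be deleted before you run rm:

```shell
# Hypothetical helper: list the package directory and egg-info entries for
# PKG directly inside DIST_DIR. It only prints paths; it deletes nothing.
find_dist_files() {
    local pkg="$1" dist_dir="$2"
    find "$dist_dir" -maxdepth 1 \
        \( -name "$pkg" -o -name "$pkg-*.egg-info" \) | sort
}

# Directory taken from the search result above; skipped if it does not exist.
if [ -d /usr/lib/python3/dist-packages ]; then
    find_dist_files colorama /usr/lib/python3/dist-packages
fi
```

Review the printed paths first, then remove them with rm -r, which handles both files and directories.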

After installation, once an airflow command has been run, the AIRFLOW_HOME directory contains the following:

-rw-rw-r-- 1 darren darren  20738 Jan 14 09:34 airflow.cfg
-rw-r--r-- 1 darren darren 105472 Jan 14 15:05 airflow.db
drwxrwxr-x 5 darren darren   4096 Jan 14 10:04 logs/
-rw-rw-r-- 1 darren darren   2304 Jan 14 09:34 unittests.cfg

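From top to bottom: the configuration file, the database file, the log directory, and the unit test configuration file.

Step 6: Initialize the database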

# initialize the database
airflow initdb

Airflow uses an SQLite database by default. The next step covers how to use MySQL instead.

Step 7: Install MySQL

sudo apt-get install mysql-server mysql-client

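Configure the user and create the database:

-- create the user
CREATE USER airflow;
-- create the database
CREATE DATABASE airflow;
-- grant privileges
GRANT ALL PRIVILEGES ON airflow.* TO 'airflow'@'%' IDENTIFIED BY 'airflow';
GRANT ALL PRIVILEGES ON airflow.* TO 'airflow'@'localhost' IDENTIFIED BY 'airflow';
GRANT ALL PRIVILEGES ON airflow.* TO 'airflow'@'127.0.0.1' IDENTIFIED BY 'airflow';
-- reload the privilege tables
FLUSH PRIVILEGES;

Because my system is Ubuntu 14.04, the bundled MySQL version is 5.5.62.

Edit airflow.cfg, which uses SQLite by default, so that sql_alchemy_conn points at the MySQL database: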

#sql_alchemy_conn = sqlite:////home/darren/airflow/airflow.db
sql_alchemy_conn = mysql://airflow:airflow@192.168.137.1:3306/airflow
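The connection string follows the pattern driver://user:password@host:port/database. As a sketch, and assuming the installed Airflow release honours AIRFLOW__{SECTION}__{KEY} environment overrides (check the configuration documentation for your version), the same value can be supplied without editing airflow.cfg. The db_* variable names are illustrative only; the values are the ones used in this article:

```shell
# Illustrative variables; the values match the airflow.cfg line above.
db_user=airflow
db_pass=airflow
db_host=192.168.137.1
db_port=3306
db_name=airflow

# Assumption: this Airflow version reads AIRFLOW__CORE__SQL_ALCHEMY_CONN
# in preference to sql_alchemy_conn in the [core] section of airflow.cfg.
export AIRFLOW__CORE__SQL_ALCHEMY_CONN="mysql://${db_user}:${db_pass}@${db_host}:${db_port}/${db_name}"

printf '%s\n' "$AIRFLOW__CORE__SQL_ALCHEMY_CONN"
```

The variable has to be exported in the same shell session that later runs the airflow commands.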

Then run the initialization again:

# initialize the database
airflow initdb

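The following error may occur:

ModuleNotFoundError: No module named 'MySQLdb'

This happens because only the MySQL server was installed, not a Python driver for connecting to it. The MySQLdb module is provided by the mysqlclient package, so install the following as well: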

sudo /usr/local/bin/pip3 install pymysql
sudo apt-get install libmysqlclient-dev
sudo /usr/local/bin/pip3 install mysqlclient
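The mysql:// prefix in sql_alchemy_conn makes SQLAlchemy load the MySQLdb module by default. A small bash sketch for checking which driver modules a given interpreter can import (check_driver is a hypothetical helper name):

```shell
# Hypothetical helper: report whether interpreter PY can import each
# MySQL driver module mentioned in this step.
check_driver() {
    local py="$1" mod
    for mod in MySQLdb pymysql; do
        if "$py" -c "import $mod" >/dev/null 2>&1; then
            printf '%s: available\n' "$mod"
        else
            printf '%s: missing\n' "$mod"
        fi
    done
}

# Only run the check when python3 is actually present.
if command -v python3 >/dev/null 2>&1; then
    check_driver python3
fi
```

If only pymysql turns out to be importable, SQLAlchemy also accepts a mysql+pymysql:// prefix, which selects that driver instead of MySQLdb.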

Running the database initialization again may produce the following:

sqlalchemy.exc.ProgrammingError: (MySQLdb._exceptions.ProgrammingError) (1064, "You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '(6) NULL' at line 1") [SQL: 'ALTER TABLE dag MODIFY last_scheduler_run DATETIME(6) NULL'] (Background on this error at: http://sqlalche.me/e/f405)

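This one is more troublesome. According to what I found, the MySQL version is too old and needs to be upgraded to 5.7 or later; see https://blog.csdn.net/u013525058/article/details/81188175

For that reason I recommend installing on Ubuntu 16.04 or Ubuntu 18.04, which saves a good deal of time.

There are several ways forward. Upgrading MySQL in place works but takes considerable time. Installing MySQL on the local machine is workable, and so is setting up a new virtual machine with a newer release and installing MySQL there. Choose whichever suits you.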
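The failing statement uses DATETIME(6), a fractional-seconds column type that MySQL 5.5 rejects. A minimal bash sketch for comparing a server version against the 5.7 minimum cited above, assuming GNU sort with the -V option is available (version_ge is a hypothetical helper name, and 5.7.24 is just an illustrative newer version):

```shell
# Hypothetical helper: succeed when version $1 is at least version $2.
# Relies on GNU sort -V (version sort), as shipped with Ubuntu coreutils.
version_ge() {
    local have="${1%%-*}" need="$2"   # drop suffixes like -0ubuntu0.14.04.1
    [ "$(printf '%s\n%s\n' "$need" "$have" | sort -V | head -n 1)" = "$need" ]
}

for v in 5.5.62 5.7.24; do
    if version_ge "$v" 5.7; then
        echo "$v: new enough"
    else
        echo "$v: too old"
    fi
done
```

The server's own version string can be read by running SELECT VERSION(); in the mysql client.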

After resolving the version problem, try the initialization again. The following problem may occur:

    run_migrations_online()
  File "/home/darren/program/python3.6.5/lib/python3.6/site-packages/airflow/migrations/env.py", line 86, in run_migrations_online
    context.run_migrations()
  File "<string>", line 8, in run_migrations
  File "/home/darren/program/python3.6.5/lib/python3.6/site-packages/alembic/runtime/environment.py", line 807, in run_migrations
    self.get_context().run_migrations(**kw)
  File "/home/darren/program/python3.6.5/lib/python3.6/site-packages/alembic/runtime/migration.py", line 321, in run_migrations
    step.migration_fn(**kw)
  File "/home/darren/program/python3.6.5/lib/python3.6/site-packages/airflow/migrations/versions/0e2a74e0fc9f_add_time_zone_awareness.py", line 46, in upgrade
    raise Exception("Global variable explicit_defaults_for_timestamp needs to be on (1) for mysql")
Exception: Global variable explicit_defaults_for_timestamp needs to be on (1) for mysql

Add the following to the MySQL configuration file, then restart MySQL so that the setting takes effect:

[mysqld]
explicit_defaults_for_timestamp = 1

See https://blog.csdn.net/qq_29719097/article/details/83577021
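A rough bash sketch for confirming that the setting is present in a configuration file (has_explicit_defaults is a hypothetical helper name, and /etc/mysql/my.cnf is an assumed location that varies by distribution and MySQL version). It only inspects the file text; it does not check that the line sits under [mysqld] or that the server has been restarted:

```shell
# Hypothetical helper: succeed if FILE has an uncommented line that turns
# explicit_defaults_for_timestamp on.
has_explicit_defaults() {
    grep -Eq '^[[:space:]]*explicit_defaults_for_timestamp[[:space:]]*=[[:space:]]*(1|ON|on)[[:space:]]*$' "$1"
}

cnf=/etc/mysql/my.cnf   # assumed path; adjust for your system
if [ -r "$cnf" ]; then
    if has_explicit_defaults "$cnf"; then
        echo "setting found in $cnf"
    else
        echo "setting not found in $cnf"
    fi
fi
```

On a running server, SHOW GLOBAL VARIABLES LIKE 'explicit_defaults_for_timestamp'; reports the value that is actually in effect.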

Run the initialization once more, and it finally succeeds.

Step 8: Start the service

# start the web server, default port is 8080
airflow webserver -p 8080

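Open a browser and go to http://host:8080, where host is the address of the machine on which Airflow is installed.

When the Airflow web interface appears, the installation is complete.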
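If the page does not load, it helps to confirm that something is accepting connections on the port. A minimal bash-only sketch (port_open is a hypothetical helper name; /dev/tcp is a bash redirection feature rather than a real device file, and 127.0.0.1 assumes the check runs on the Airflow host itself):

```shell
# Hypothetical helper: succeed if a TCP connection to HOST PORT can be opened.
# The subshell keeps the temporary file descriptor from leaking.
port_open() {
    (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}

if port_open 127.0.0.1 8080; then
    echo "port 8080 is accepting connections"
else
    echo "could not connect to port 8080"
fi
```

A successful connection only shows that a process is listening; the browser is still the real test that the Airflow UI is being served.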

