[AirFlow] AirFlow User Guide, Part 1: Installation and Startup

1. Installation

Install with pip:

xiaosi@yoona:~$ pip install airflow

If the download is slow, you can install from the following mirror instead:

xiaosi@yoona:~$ pip install -i https://pypi.tuna.tsinghua.edu.cn/simple airflow

If you see output like the following, Airflow was installed successfully:

Successfully installed airflow alembic croniter dill flask flask-admin flask-cache flask-login flask-swagger flask-wtf funcsigs future gitpython gunicorn jinja2 lxml markdown pandas psutil pygments python-daemon python-dateutil python-nvd3 requests setproctitle sqlalchemy tabulate thrift zope.deprecation Mako python-editor click itsdangerous Werkzeug wtforms PyYAML ordereddict gitdb2 MarkupSafe pytz numpy docutils setuptools lockfile six python-slugify idna urllib3 certifi chardet smmap2 Unidecode
Cleaning up...

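After installation, the airflow executable on my machine was placed in ~/.local/bin by default.

2. Configuration

If you do not change the path, Airflow's home directory defaults to ~/airflow.

To change it permanently, set the AIRFLOW_HOME environment variable: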

echo "export AIRFLOW_HOME=/home/xiaosi/opt/airflow" >> /etc/profile
source /etc/profile
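
The rule described above (use AIRFLOW_HOME if it is set, otherwise fall back to ~/airflow) can be sketched in a few lines of Python 3. This is only an illustration of the lookup order, not Airflow's own code; the function name resolve_airflow_home is hypothetical.

```python
import os


def resolve_airflow_home(environ=None):
    """Return the home directory implied by AIRFLOW_HOME, else ~/airflow."""
    env = os.environ if environ is None else environ
    # Assumption: an unset or empty AIRFLOW_HOME means "use the default".
    raw = env.get("AIRFLOW_HOME") or "~/airflow"
    return os.path.abspath(os.path.expanduser(raw))


print(resolve_airflow_home())
```

Running it in a new shell after sourcing /etc/profile is a quick way to confirm that the variable was picked up.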

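For convenience, add the install directory to PATH as well. Single quotes keep $PATH from being expanded at the moment the line is written, and appending to /etc/profile requires root privileges:

echo 'export PATH=/home/xiaosi/.local/bin:$PATH' >> /etc/profile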
source /etc/profile
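
To check that the PATH change took effect, something like the following Python 3 sketch may help. It only uses the standard library; the helper names dir_on_path and find_airflow are made up for this example.

```python
import os
import shutil


def dir_on_path(directory, path_value):
    """Return True if `directory` is one of the entries of a PATH-style string."""
    target = os.path.normpath(os.path.expanduser(directory))
    entries = [os.path.normpath(p) for p in path_value.split(os.pathsep) if p]
    return target in entries


def find_airflow(path_value=None):
    """Return the full path of the airflow executable, or None if it is not found."""
    # With path_value=None, shutil.which searches the current PATH.
    return shutil.which("airflow", path=path_value)


print("~/.local/bin on PATH:", dir_on_path("~/.local/bin", os.environ.get("PATH", "")))
print("airflow executable:", find_airflow())
```

If the second line prints None, the shell that launched Python has probably not re-read /etc/profile yet.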

3. Initialization

Initialize the database:

xiaosi@yoona:~$ airflow initdb
[2017-08-02 16:39:22,319] {__init__.py:57} INFO - Using executor SequentialExecutor
[2017-08-02 16:39:22,432] {driver.py:120} INFO - Generating grammar tables from /usr/lib/python2.7/lib2to3/Grammar.txt
[2017-08-02 16:39:22,451] {driver.py:120} INFO - Generating grammar tables from /usr/lib/python2.7/lib2to3/PatternGrammar.txt
DB: sqlite:////home/xiaosi/opt/airflow/airflow.db
[2017-08-02 16:39:22,708] {db.py:287} INFO - Creating tables
INFO  [alembic.runtime.migration] Context impl SQLiteImpl.
INFO  [alembic.runtime.migration] Will assume non-transactional DDL.
INFO  [alembic.runtime.migration] Running upgrade  -> e3a246e0dc1, current schema
INFO  [alembic.runtime.migration] Running upgrade e3a246e0dc1 -> 1507a7289a2f, create is_encrypted
/home/xiaosi/.local/lib/python2.7/site-packages/alembic/util/messaging.py:69: UserWarning: Skipping unsupported ALTER for creation of implicit constraint
  warnings.warn(msg)
INFO  [alembic.runtime.migration] Running upgrade 1507a7289a2f -> 13eb55f81627, maintain history for compatibility with earlier migrations
INFO  [alembic.runtime.migration] Running upgrade 13eb55f81627 -> 338e90f54d61, More logging into task_isntance
INFO  [alembic.runtime.migration] Running upgrade 338e90f54d61 -> 52d714495f0, job_id indices
INFO  [alembic.runtime.migration] Running upgrade 52d714495f0 -> 502898887f84, Adding extra to Log
INFO  [alembic.runtime.migration] Running upgrade 502898887f84 -> 1b38cef5b76e, add dagrun
INFO  [alembic.runtime.migration] Running upgrade 1b38cef5b76e -> 2e541a1dcfed, task_duration
INFO  [alembic.runtime.migration] Running upgrade 2e541a1dcfed -> 40e67319e3a9, dagrun_config
INFO  [alembic.runtime.migration] Running upgrade 40e67319e3a9 -> 561833c1c74b, add password column to user
INFO  [alembic.runtime.migration] Running upgrade 561833c1c74b -> 4446e08588, dagrun start end
INFO  [alembic.runtime.migration] Running upgrade 4446e08588 -> bbc73705a13e, Add notification_sent column to sla_miss
INFO  [alembic.runtime.migration] Running upgrade bbc73705a13e -> bba5a7cfc896, Add a column to track the encryption state of the 'Extra' field in connection
INFO  [alembic.runtime.migration] Running upgrade bba5a7cfc896 -> 1968acfc09e3, add is_encrypted column to variable table
INFO  [alembic.runtime.migration] Running upgrade 1968acfc09e3 -> 2e82aab8ef20, rename user table
INFO  [alembic.runtime.migration] Running upgrade 2e82aab8ef20 -> 211e584da130, add TI state index
INFO  [alembic.runtime.migration] Running upgrade 211e584da130 -> 64de9cddf6c9, add task fails journal table
INFO  [alembic.runtime.migration] Running upgrade 64de9cddf6c9 -> f2ca10b85618, add dag_stats table
INFO  [alembic.runtime.migration] Running upgrade f2ca10b85618 -> 4addfa1236f1, Add fractional seconds to mysql tables
INFO  [alembic.runtime.migration] Running upgrade 4addfa1236f1 -> 8504051e801b, xcom dag task indices
INFO  [alembic.runtime.migration] Running upgrade 8504051e801b -> 5e7d17757c7a, add pid field to TaskInstance
INFO  [alembic.runtime.migration] Running upgrade 5e7d17757c7a -> 127d2bf2dfa7, Add dag_id/state index on dag_run table
Done.

After running the command above, the following files are generated in the $AIRFLOW_HOME directory:

xiaosi@yoona:~/opt/airflow$ ll
total 88
drwxrwxr-x  2 xiaosi xiaosi  4096 Aug  2 16:39 ./
drwxrwxr-x 26 xiaosi xiaosi  4096 Jul 31 13:56 ../
-rw-rw-r--  1 xiaosi xiaosi 11424 Aug  2 16:38 airflow.cfg
-rw-r--r--  1 xiaosi xiaosi 58368 Aug  2 16:39 airflow.db
-rw-rw-r--  1 xiaosi xiaosi  1554 Aug  2 16:38 unittests.cfg

4. Changing the Default Database

Open the $AIRFLOW_HOME/airflow.cfg configuration file and make the following change:

sql_alchemy_conn = mysql://root:root@localhost:3306/airflow

Note

The database username and password are both root, and the database Airflow uses is named airflow. Create that database with the following command:

mysql> create database airflow;
Query OK, 1 row affected (0.00 sec)
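
The connection string set earlier follows the pattern dialect://user:password@host:port/database. As a rough illustration, the Python 3 sketch below reads the setting from config text and splits it into those parts. It assumes sql_alchemy_conn sits in the [core] section of airflow.cfg; the function names are hypothetical, and the sample text stands in for the real file.

```python
import configparser
from urllib.parse import urlsplit


def read_sql_alchemy_conn(cfg_text):
    """Return the sql_alchemy_conn value from airflow.cfg-style text."""
    # interpolation=None keeps any '%' characters in the file literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    # Assumption: the setting lives in the [core] section.
    return parser.get("core", "sql_alchemy_conn")


def describe_connection(url):
    """Split a dialect://user:password@host:port/database URL into its parts."""
    parts = urlsplit(url)
    return {
        "dialect": parts.scheme,
        "username": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }


sample_cfg = "[core]\nsql_alchemy_conn = mysql://root:root@localhost:3306/airflow\n"
info = describe_connection(read_sql_alchemy_conn(sample_cfg))
# Print everything except the password.
print({key: value for key, value in info.items() if key != "password"})
```

The database name in the last field has to match the one created with create database.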

Re-initialize the database:

xiaosi@yoona:~$ airflow initdb

The following error occurred:

xiaosi@yoona:~$ airflow initdb
Traceback (most recent call last):
  File "/home/xiaosi/.local/bin/airflow", line 17, in <module>
    from airflow import configuration
  File "/home/xiaosi/.local/lib/python2.7/site-packages/airflow/__init__.py", line 30, in <module>
    from airflow import settings
  File "/home/xiaosi/.local/lib/python2.7/site-packages/airflow/settings.py", line 159, in <module>
    configure_orm()
  File "/home/xiaosi/.local/lib/python2.7/site-packages/airflow/settings.py", line 147, in configure_orm
    engine = create_engine(SQL_ALCHEMY_CONN, **engine_args)
  File "/home/xiaosi/.local/lib/python2.7/site-packages/sqlalchemy/engine/__init__.py", line 387, in create_engine
    return strategy.create(*args, **kwargs)
  File "/home/xiaosi/.local/lib/python2.7/site-packages/sqlalchemy/engine/strategies.py", line 80, in create
    dbapi = dialect_cls.dbapi(**dbapi_args)
  File "/home/xiaosi/.local/lib/python2.7/site-packages/sqlalchemy/dialects/mysql/mysqldb.py", line 110, in dbapi
    return __import__('MySQLdb')
ImportError: No module named MySQLdb

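Solution:

MySQL is one of the most popular open-source databases, but the Python standard library does not include a MySQL interface. MySQLdb is a third-party package that must be downloaded and installed separately.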

sudo apt-get install python-mysqldb
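
Before rerunning initdb, it can be handy to confirm that the interpreter can actually import the driver. The snippet below is a small sketch of such a check. The helper name module_available is made up for this example. Note that the check really imports the module, so it should be run with the same Python that runs airflow.

```python
import importlib


def module_available(name):
    """Return True if the module `name` can be imported by this interpreter."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


if module_available("MySQLdb"):
    print("MySQLdb is importable")
else:
    print("MySQLdb is missing; install a MySQL driver before running airflow initdb")
```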

Initialize again:

xiaosi@yoona:~$ airflow initdb
[2017-08-02 17:22:21,169] {__init__.py:57} INFO - Using executor SequentialExecutor
[2017-08-02 17:22:21,282] {driver.py:120} INFO - Generating grammar tables from /usr/lib/python2.7/lib2to3/Grammar.txt
[2017-08-02 17:22:21,302] {driver.py:120} INFO - Generating grammar tables from /usr/lib/python2.7/lib2to3/PatternGrammar.txt
DB: mysql://root:***@localhost:3306/airflow
[2017-08-02 17:22:21,553] {db.py:287} INFO - Creating tables
INFO  [alembic.runtime.migration] Context impl MySQLImpl.
INFO  [alembic.runtime.migration] Will assume non-transactional DDL.
INFO  [alembic.runtime.migration] Running upgrade  -> e3a246e0dc1, current schema
INFO  [alembic.runtime.migration] Running upgrade e3a246e0dc1 -> 1507a7289a2f, create is_encrypted
INFO  [alembic.runtime.migration] Running upgrade 1507a7289a2f -> 13eb55f81627, maintain history for compatibility with earlier migrations
INFO  [alembic.runtime.migration] Running upgrade 13eb55f81627 -> 338e90f54d61, More logging into task_isntance
INFO  [alembic.runtime.migration] Running upgrade 338e90f54d61 -> 52d714495f0, job_id indices
INFO  [alembic.runtime.migration] Running upgrade 52d714495f0 -> 502898887f84, Adding extra to Log
INFO  [alembic.runtime.migration] Running upgrade 502898887f84 -> 1b38cef5b76e, add dagrun
INFO  [alembic.runtime.migration] Running upgrade 1b38cef5b76e -> 2e541a1dcfed, task_duration
INFO  [alembic.runtime.migration] Running upgrade 2e541a1dcfed -> 40e67319e3a9, dagrun_config
INFO  [alembic.runtime.migration] Running upgrade 40e67319e3a9 -> 561833c1c74b, add password column to user
INFO  [alembic.runtime.migration] Running upgrade 561833c1c74b -> 4446e08588, dagrun start end
INFO  [alembic.runtime.migration] Running upgrade 4446e08588 -> bbc73705a13e, Add notification_sent column to sla_miss
INFO  [alembic.runtime.migration] Running upgrade bbc73705a13e -> bba5a7cfc896, Add a column to track the encryption state of the 'Extra' field in connection
INFO  [alembic.runtime.migration] Running upgrade bba5a7cfc896 -> 1968acfc09e3, add is_encrypted column to variable table
INFO  [alembic.runtime.migration] Running upgrade 1968acfc09e3 -> 2e82aab8ef20, rename user table
INFO  [alembic.runtime.migration] Running upgrade 2e82aab8ef20 -> 211e584da130, add TI state index
INFO  [alembic.runtime.migration] Running upgrade 211e584da130 -> 64de9cddf6c9, add task fails journal table
INFO  [alembic.runtime.migration] Running upgrade 64de9cddf6c9 -> f2ca10b85618, add dag_stats table
INFO  [alembic.runtime.migration] Running upgrade f2ca10b85618 -> 4addfa1236f1, Add fractional seconds to mysql tables
INFO  [alembic.runtime.migration] Running upgrade 4addfa1236f1 -> 8504051e801b, xcom dag task indices
INFO  [alembic.runtime.migration] Running upgrade 8504051e801b -> 5e7d17757c7a, add pid field to TaskInstance
INFO  [alembic.runtime.migration] Running upgrade 5e7d17757c7a -> 127d2bf2dfa7, Add dag_id/state index on dag_run table
Done.

Take a look at what initdb did in the airflow database:

mysql> use airflow;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> show tables;
+-------------------+
| Tables_in_airflow |
+-------------------+
| alembic_version   |
| chart             |
| connection        |
| dag               |
| dag_pickle        |
| dag_run           |
| dag_stats         |
| import_error      |
| job               |
| known_event       |
| known_event_type  |
| log               |
| sla_miss          |
| slot_pool         |
| task_fail         |
| task_instance     |
| users             |
| variable          |
| xcom              |
+-------------------+
19 rows in set (0.00 sec)

5. Startup

The following command starts the web admin interface, which is served at localhost:8080 by default:

xiaosi@yoona:~$ airflow webserver
[2017-08-02 17:25:31,961] {__init__.py:57} INFO - Using executor SequentialExecutor
[2017-08-02 17:25:32,075] {driver.py:120} INFO - Generating grammar tables from /usr/lib/python2.7/lib2to3/Grammar.txt
[2017-08-02 17:25:32,095] {driver.py:120} INFO - Generating grammar tables from /usr/lib/python2.7/lib2to3/PatternGrammar.txt
  ____________       _____________
 ____    |__( )_________  __/__  /________      __
____  /| |_  /__  ___/_  /_ __  /_  __ \_ | /| / /
___  ___ |  / _  /   _  __/ _  / / /_/ /_ |/ |/ /
 _/_/  |_/_/  /_/    /_/    /_/  \____/____/|__/

/home/xiaosi/.local/lib/python2.7/site-packages/flask/exthook.py:71: ExtDeprecationWarning: Importing flask.ext.cache is deprecated, use flask_cache instead.
  .format(x=modname), ExtDeprecationWarning
[2017-08-02 17:25:32,469] [9703] {models.py:167} INFO - Filling up the DagBag from /home/xiaosi/opt/airflow/dags
Running the Gunicorn Server with:
Workers: 4 sync
Host: 0.0.0.0:8080
Timeout: 120
Logfiles: - -
=================================================================            
[2017-08-02 17:25:33,052] {__init__.py:57} INFO - Using executor SequentialExecutor
[2017-08-02 17:25:33,156] {driver.py:120} INFO - Generating grammar tables from /usr/lib/python2.7/lib2to3/Grammar.txt
[2017-08-02 17:25:33,179] {driver.py:120} INFO - Generating grammar tables from /usr/lib/python2.7/lib2to3/PatternGrammar.txt
[2017-08-02 17:25:33 +0000] [9706] [INFO] Starting gunicorn 19.3.0
[2017-08-02 17:25:33 +0000] [9706] [INFO] Listening at: http://0.0.0.0:8080 (9706)
[2017-08-02 17:25:33 +0000] [9706] [INFO] Using worker: sync
...

The main interface looks like this: [image]
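
If the page does not load, a first step is to check whether anything is listening on the port at all. The Python 3 sketch below tries a plain TCP connection. The host and port defaults mirror the localhost:8080 address mentioned earlier, and port_is_open is a hypothetical helper name.

```python
import socket


def port_is_open(host="localhost", port=8080, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


print("webserver reachable:", port_is_open())
```

A True result only shows that some process accepted the connection; it does not prove that the process is Airflow.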
