Speed up your Airflow development with Docker and tests

If you develop in the cloud or on a shared server, you might have experienced the beauty of working with your teammates on the same codebase and the same database! In practice, when working in the same cloud environment as other people, certain resources (files, tables…) end up being shared between several teammates. This means you can sometimes find yourself blocked because someone else is working on the same resource, and it gets frustrating when a file or table you needed is overwritten, moved or deleted…

TL;DR

If you want to do most of your Airflow work without relying on a shared workspace, and without its latency, use an Airflow Docker container and run your syntax and unit tests in it.

The dream

To have the cloud on your computer!

Or at least a local environment in which you can check that everything is OK with your Python/Airflow syntax, so the only thing you need to do on the shared environment is test your code's behavior.

When you have this local environment, you avoid a lot of the handling needed to deploy your code or copy it to your environment, then waiting 30+ seconds for Airflow to notice you updated a file and refresh it. With your local environment, you'll see all errors faster than through the web interface or Airflow's logs (which require switching to another application, like Stackdriver, to read them).

Achieve your dream

To run Airflow on your computer and in your CI, we are going to use… Docker. It's a bit of configuration at the beginning, but in the end you'll be able to reproduce your cloud configuration and share it with your teammates. You can save your Docker image in a private Docker registry to avoid rebuilding it. From my point of view, it's easier to run an already configured Airflow with Docker than to install it in a virtual environment.

  1. Add Python packages to your Docker image

You can use an Airflow Docker image from Docker Hub (puckel/docker-airflow:1.10.9) or add Python packages to an existing image by creating a Dockerfile:

FROM puckel/docker-airflow:1.10.9
USER root
# requirements file with packages you need
COPY requirements.txt .
RUN pip install -r requirements.txt

USER airflow

If you customize your image, you need to save the Dockerfile and run this command in the Dockerfile directory (you might need to add sudo before docker):

$ docker build -t my-airflow-img:1.0 .

You can then see it in your local images:

$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
my-airflow-img 1.0 96696eea2641 5 minutes ago 1.89GB

2. Allow Airflow’s variable loading

If you use Airflow variables in your code, you'll need to configure a Fernet key (a cryptography key for Airflow):

# replace <img> with my-airflow-img or puckel/docker-airflow:1.10.9
$ docker run <img> python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"
R9mdx6wCwIb0h_GChn1-Fbcth3H_gTBAjgvf87JLgSU=

And to pass this key through an environment variable:

# replace <fkey> (by the one you just produced) and <img>
# your Airflow's variables file is expected to be in the dags directory
# (on your local/host machine) with the name vars.json
docker run -d -e FERNET_KEY=<fkey> -e DAGS_DIR='dags' \
--name airflow-web \
-v `pwd`/dags/:/usr/local/airflow/dags \
-p 8080:8080 <img>

Now, you can load your variables:

docker exec -it airflow-web airflow variables -i ./dags/vars.json
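
For reference, your dag code can then read the imported variables through Airflow's Variable model (the variable name below is just an illustrative placeholder):

from airflow.models import Variable

# read a value imported from vars.json; deserialize_json parses JSON values
my_config = Variable.get("my_config", deserialize_json=True)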

So now you have a workspace to run tests locally and save a lot of time. It's not exactly a cloud: no scaling up, no links to other applications; but it's good enough to work with and to understand whether your code will work. The remaining tests within your cloud environment and with your data will be quicker.

You can make your life much easier with a pinch of Bash, especially to run your tests in a CI/CD pipeline.

Fulfill your dream

You have Airflow running locally, and you can see your dags loading in your browser at localhost:8080. If there are syntax errors, they will be displayed on the dags page, and you can easily access the logs with docker logs -f airflow-web.

Now that you have this environment, if you add a few more lines you will be able to test the syntax of your code (think of it as a compilation step). We will run Python tests to check Python and Airflow syntax (e.g. do tasks have a task_id?) and you will know within seconds whether it's going to work.

First, we must create a file dags/test_syntax.py and import Python and Airflow packages:

import os
import unittest  # useful to launch tests and imports
from importlib import util  # useful to import a file
from airflow import models as af_models

# dags directory; assumed to be the folder holding this test file (dags/test_syntax.py)
DAG_DIR = os.path.dirname(os.path.abspath(__file__))

We create a Python unittest class (DagsTest) with a test method (test_load_dags_integrity) that will try to load the listed dag files using the import_dags method:

class DagsTest(unittest.TestCase):
    """Validate dags syntax."""

    def test_load_dags_integrity(self):
        """Import ml dag files and check for DAG syntax."""
        root_dag_dir = os.path.join(DAG_DIR, 'load')
        files = ['load_messages.py', 'load_users.py']
        self.import_dags(root_dag_dir, files)

You could easily change the files list definition to fill it with a function (taking a directory as input) that decides which files to import from that directory based on a naming convention, so you don't have to update the test file each time your team adds a new dag!

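For example, a small helper in dags/test_syntax.py along these lines (the load_ prefix is just an illustrative convention, and it relies on the os import already at the top of the file) could replace the hard-coded list:

def list_dag_files(dag_dir, prefix='load_'):
    """Return every Python file in dag_dir whose name follows the naming convention."""
    return sorted(
        f for f in os.listdir(dag_dir)
        if f.startswith(prefix) and f.endswith('.py'))

The test could then build its list with files = list_dag_files(root_dag_dir).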

The import_dags method tries to import all the files defined in list_dag_files, thanks to _load, and checks whether it can find a dag within the loaded module with _check_if_there_is_a_dag_in (this function's content was found in the first circle of Data Testing Hell with Airflow):

    def import_dags(self, dag_dir, list_dag_files):
        """
        For each file in list_dag_files, we:
        - try to load it to check syntax and
        - check if there is a dag defined in the file.
        """
        for filename in list_dag_files:
            module, module_path = self._load(dag_dir, filename)
            self._check_if_there_is_a_dag_in(module, module_path)

    @staticmethod
    def _load(dag_dir, filename):
        module_path = os.path.join(dag_dir, filename)
        module_name, _ = os.path.splitext(filename)
        mod_spec = util.spec_from_file_location(module_name, module_path)
        module = util.module_from_spec(mod_spec)
        mod_spec.loader.exec_module(module)
        return module, module_path

    @staticmethod
    def _check_if_there_is_a_dag_in(module, module_path):
        """Look if there is at least one dag defined in the module."""
        assert any(
            isinstance(var, af_models.DAG)
            for var in vars(module).values())

You can now launch your syntax tests with:

docker exec -it airflow-web python -m unittest dags/test_syntax.py

Beyond your dream

Now that we have integrity tests, we can easily add unit tests.

First, we must create a file dags/test_unit.py and import some packages (no need for Airflow here, thanks to mock/patch!):

import unittest
from unittest.mock import MagicMock, patch

from Training import Training

Next, we create a new unittest class for a Training class (in dags/Training.py) with a setUp method (triggered before each test) to define a fake dag instance and to instantiate Training (so we don't instantiate it in each test):

class TrainingUnitTest(unittest.TestCase):
    """
    Unit test for training functions.
    """

    def setUp(self):
        self.dag = MagicMock(return_value="it would be a dag obj")
        self.tr = Training(self.dag)

We add a new test to TrainingUnitTest to check that the launch method of the Training class behaves as defined and keeps working this way:

    def test_launch(self):
        with patch('airflow.operators.python_operator.PythonOperator') \
                as mock_op:
            self.tr.launch(usecase='topics_in_messages')
            mock_op.assert_called_with(
                task_id='training_for_topics_in_messages',
                python_callable=self.tr._train_model,
                op_kwargs={
                    'usecase': 'topics_in_messages',
                    'number_of_clusters': 7},
                dag=self.dag
            )

We use patch('...') to catch any call to PythonOperator, so we don't need Airflow to test it; it's quicker and (more importantly) we do not want to test Airflow here, we just need to test our own code. We could test our code in interaction with Airflow, but it would be less accurate and slower (around 2 times slower by my measurements); we'd rather have specific integration tests for that (planned in a future post).

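For context, here is a rough sketch of what the Training class could look like so the mocked call above makes sense; the real implementation is not shown in this post, and everything except the names taken from the test (launch, _train_model, the task_id format, the op_kwargs keys) is an assumption:

# dags/Training.py -- hypothetical sketch, not the actual implementation
from airflow.operators import python_operator


class Training:
    def __init__(self, dag):
        self.dag = dag

    def launch(self, usecase):
        # the operator is looked up through its module, so that
        # patch('airflow.operators.python_operator.PythonOperator') intercepts this call
        return python_operator.PythonOperator(
            task_id='training_for_{}'.format(usecase),
            python_callable=self._train_model,
            op_kwargs={'usecase': usecase, 'number_of_clusters': 7},
            dag=self.dag)

    def _train_model(self, usecase, number_of_clusters):
        pass  # the model training logic would live here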

You can now launch your unit tests with:

docker exec -it airflow-web python -m unittest dags/test_unit.py

Above the dream

(Yes… until the end with the dream….)

So you can work faster by checking whether there are syntax errors in your code without depending on other people (if you share the same cloud), and you can manage your codebase in a safer way with unit tests.

Some interesting resources found along the way:

  • Data’s inferno : 7 Circles of Data Testing Hell with Airflow (a must read)

  • Airflow’s Best Practices on tests : it could be interessting to look at implementing the dagbag solution for dag loader test (instead of the one from Data’s inferno)

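As a rough sketch of that DagBag approach (assuming the dags live in a dags directory relative to where the tests run), the whole loader test could collapse to something like:

import unittest

from airflow.models import DagBag


class DagBagLoaderTest(unittest.TestCase):
    """Dag loader test based on Airflow's DagBag, as an alternative to DagsTest above."""

    def test_no_import_errors(self):
        # DagBag parses every file under dag_folder and records any import error
        dag_bag = DagBag(dag_folder='dags', include_examples=False)
        self.assertEqual(
            0, len(dag_bag.import_errors),
            'DAG import failures: {}'.format(dag_bag.import_errors))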

Thanks to Tiffany Souterre for reading it over!

Thank you for reading!

Translated from: https://medium.com/@ulyssehum/speed-up-your-airflow-development-with-docker-and-tests-a8449d443174
