An unsettling yet likely familiar situation: you deployed Airflow successfully, but find yourself constantly refreshing the webserver UI to make sure everything is running smoothly.
You rely on certain alerting tasks to execute upon upstream failures, but if the queue is full and tasks are stalling, how will you be notified?
One solution: deploying Grafana, an open source reporting service, on top of Airflow.
The Proposed Architecture
![Image for post](https://i-blog.csdnimg.cn/blog_migrate/957147b5a964ae6744f8ab3d56f9177f.png)
To start, I’ll assume basic understanding of Airflow functionality and containerization using Docker and Docker Compose. More resources can be found here for Airflow, here for Docker, and here for Docker Compose.
Reference the code to follow along: https://github.com/sarahmk125/airflow-docker-metrics
Now, the fun stuff.
Used Services
To get Airflow metrics into a visually appealing dashboard that supports alerting, the following services are spun up in Docker containers declared in the docker-compose.yml file:
Airflow: Airflow runs tasks within DAGs, defined in Python files stored in the ./dags/ folder. One sample DAG declaration file is already there. Multiple containers are run, with particular nuances accounting for using the official apache/airflow image. More on that later.

StatsD-Exporter: The StatsD-Exporter container converts Airflow's metrics in StatsD format to Prometheus format, the datasource for the reporting layer (Grafana). More information on StatsD-Exporter can be found here. The container definition includes the command to be executed upon startup, defining how the exposed ports are used.
```yaml
statsd-exporter:
  image: prom/statsd-exporter
  container_name: airflow-statsd-exporter
  command: "--statsd.listen-udp=:8125 --web.listen-address=:9102"
  ports:
    - 9123:9102
    - 8125:8125/udp
```
Prometheus: Prometheus is a service commonly used for time-series data reporting. It is particularly convenient when using Grafana as a reporting UI since Prometheus is a supported datasource. More information on Prometheus found here. The volumes mounted in the container definition indicate how the data flows to/from Prometheus.
```yaml
prometheus:
  image: prom/prometheus
  container_name: airflow-prometheus
  user: "0"
  ports:
    - 9090:9090
  volumes:
    - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
    - ./prometheus/volume:/prometheus
```
Grafana: Grafana is a reporting UI service that is often used to connect to non-relational databases. In the code described, Grafana uses Prometheus as a datasource for dashboards. The container definition includes an admin user for the portal, as well as the volumes defining datasources and dashboards that are already pre-configured.
```yaml
grafana:
  image: grafana/grafana:7.1.5
  container_name: airflow-grafana
  environment:
    GF_SECURITY_ADMIN_USER: admin
    GF_SECURITY_ADMIN_PASSWORD: password
    GF_PATHS_PROVISIONING: /grafana/provisioning
  ports:
    - 3000:3000
  volumes:
    - ./grafana/volume/data:/grafana
    - ./grafana/volume/datasources:/grafana/datasources
    - ./grafana/volume/dashboards:/grafana/dashboards
    - ./grafana/volume/provisioning:/grafana/provisioning
```
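For a sense of what flows into that statsd-exporter container: StatsD is just plain-text packets sent over UDP in the form `<metric>:<value>|<type>`. A minimal sketch of that wire format in Python (the metric name is illustrative of the shape Airflow emits, not taken from its actual metric list):

```python
import socket

def format_statsd(metric: str, value: int, metric_type: str = "c") -> bytes:
    """Build a plain-text StatsD packet: <metric>:<value>|<type>."""
    return f"{metric}:{value}|{metric_type}".encode()

def send_statsd(payload: bytes, host: str = "localhost", port: int = 8125) -> None:
    """Fire-and-forget UDP send; this is the entire transport StatsD uses."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))

# A counter increment, similar in shape to what Airflow emits
# when AIRFLOW__SCHEDULER__STATSD_PREFIX=airflow is set:
packet = format_statsd("airflow.scheduler_heartbeat", 1)
send_statsd(packet)  # received by the statsd-exporter container on 8125/udp
print(packet)  # b'airflow.scheduler_heartbeat:1|c'
```

Because UDP is connectionless, Airflow never blocks on (or even notices) whether the exporter is up, which is exactly what you want from a metrics side channel.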
Make It Go
To start everything up, the following tools are required: Docker, docker-compose, Python3, Git.
Steps (to be run in a terminal):
1. Clone the repository: `git clone https://github.com/sarahmk125/airflow-docker-metrics.git`
2. Navigate to the cloned folder: `cd airflow-docker-metrics`
3. Start up the containers: `docker-compose -f docker-compose.yml up -d` (Note: they can be stopped or removed by running the same command with `stop` or `down` at the end, respectively)
The result:
Airflow webserver UI: http://localhost:8080
![Image for post](https://i-blog.csdnimg.cn/blog_migrate/fe8fb16e68d47d6dbee98fda4ef82f24.png)
StatsD metrics list: http://localhost:9123/metrics
![Image for post](https://i-blog.csdnimg.cn/blog_migrate/91806a9d2fcb6eef7fa2ec862d90c3c8.png)
Prometheus: http://localhost:9090
![Image for post](https://i-blog.csdnimg.cn/blog_migrate/6c89d0dc231965d8d0bad1968ddc478c.png)
Grafana: http://localhost:3000 (login: username=admin, password=password)
The repository includes an Airflow Metrics dashboard, which can be set up with alerts, showing the number of running and queued tasks over time:
![Image for post](https://i-blog.csdnimg.cn/blog_migrate/8ef3b7c4d2fc6a685be7510184958b79.png)
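The panels in that dashboard are backed by ordinary PromQL expressions against the scraped metrics. The exact metric names depend on what statsd-exporter has received (the live list is at http://localhost:9123/metrics), but the queries involved are of roughly this shape (names here are assumptions derived from Airflow's dotted StatsD names, with dots mapped to underscores):

```
# gauges derived from Airflow's executor stats
airflow_executor_running_tasks
airflow_executor_queued_tasks

# per-second rate of scheduler heartbeats over a 5-minute window
rate(airflow_scheduler_heartbeat[5m])
```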
Steps Explained
How does Prometheus actually get the metrics?
Prometheus is configured upon startup in the ./prometheus/prometheus.yml file, which is mounted as a volume:
```yaml
global:
  scrape_interval: 30s
  evaluation_interval: 30s
  scrape_timeout: 10s
  external_labels:
    monitor: 'codelab-monitor'

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['airflow-prometheus:9090']
  - job_name: 'statsd-exporter'
    static_configs:
      - targets: ['airflow-statsd-exporter:9102']
    tls_config:
      insecure_skip_verify: true
```
In particular, the scrape_configs section declares a destination (the airflow-prometheus container) and a source (the airflow-statsd-exporter container) to scrape.
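Once Prometheus is scraping, metrics can be pulled programmatically as well as through the UI, via the Prometheus HTTP API's instant-query endpoint (`/api/v1/query`). A minimal sketch; the metric name queried is an assumption, so check http://localhost:9123/metrics for the real names:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

def instant_query_url(base: str, promql: str) -> str:
    """Build a Prometheus HTTP API instant-query URL for a PromQL expression."""
    return f"{base}/api/v1/query?{urlencode({'query': promql})}"

def instant_query(base: str, promql: str) -> dict:
    """Run the query and return Prometheus's JSON response as a dict."""
    with urlopen(instant_query_url(base, promql)) as resp:
        return json.load(resp)

url = instant_query_url("http://localhost:9090", "airflow_scheduler_heartbeat")
print(url)  # http://localhost:9090/api/v1/query?query=airflow_scheduler_heartbeat

# With the stack from docker-compose running, this would return live data:
# result = instant_query("http://localhost:9090", "airflow_scheduler_heartbeat")
```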
How are dashboards and alerts created in Grafana?
Provisioning is your friend!
Provisioning in Grafana means using code to define datasources, dashboards, and alerts that exist upon startup. When starting the containers, there is a Prometheus datasource already configured in localhost:3000/datasources and an Airflow Metrics dashboard listed in localhost:3000/dashboards.
How to provision:
- All the relevant data is mounted as volumes onto the grafana container defined in the docker-compose.yml file (described above).
- The ./grafana/volume/provisioning/datasources/default.yaml file contains a definition of all data sources:
```yaml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
```
- The ./grafana/volume/provisioning/dashboards/default.yaml file contains information on where to mount dashboards in the container:
```yaml
apiVersion: 1
providers:
  - name: dashboards
    folder: General
    type: file
    editable: true
    updateIntervalSeconds: 10
    allowUiUpdates: true
    options:
      path: /grafana/dashboards
      foldersFromFilesStructure: true
```
- The ./grafana/volume/dashboards/ folder contains .json files, each representing a dashboard. The airflow_metrics.json file results in the dashboard shown above.
The JSON can be retrieved from the Grafana UI by following these instructions.
Alerts in the UI can be set up as described here; there is also an excellent Medium article here on setting up Grafana alerting with Slack. Alerts can be provisioned in the same way as dashboards and datasources.
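Notification channels can be provisioned from files too. A hedged sketch for Grafana 7's legacy alerting, assuming a hypothetical ./grafana/volume/provisioning/notifiers/default.yaml (which would land under the mounted GF_PATHS_PROVISIONING directory); the Slack webhook URL is a placeholder:

```yaml
apiVersion: 1
notifiers:
  - name: slack-alerts
    type: slack
    uid: slack-alerts
    org_id: 1
    is_default: true
    settings:
      url: https://hooks.slack.com/services/XXX/YYY/ZZZ
```

Dashboard panels whose alerts reference this channel will then notify Slack without any clicking around in the UI.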
Bonus Topic: The Official Airflow Image
Before there was an official Docker image, Matthieu "Puckel_" Roisil released Docker support for Airflow. Starting with Airflow version 1.10.10, the Apache Software Foundation released an official image on DockerHub, which is the only current and continuously updated image. However, many still rely on the legacy, unofficial docker-airflow repository.
Why is this a problem? Well, relying on the legacy repository means capping Airflow at version 1.10.9. Airflow 1.10.10 began supporting some cool features such as running tasks on Kubernetes. The official repository will also be where the upcoming (and highly anticipated) Airflow 2.0 will be released.
The new docker-compose declaration for the webserver, found in the described repository, looks something like this:
```yaml
webserver:
  container_name: airflow-webserver
  image: apache/airflow:1.10.12-python3.7
  restart: always
  depends_on:
    - postgres
    - redis
    - statsd-exporter
  environment:
    - LOAD_EX=n
    - EXECUTOR=Local
    - POSTGRES_USER=airflow
    - POSTGRES_PASSWORD=airflow
    - POSTGRES_DB=airflow
    - AIRFLOW__SCHEDULER__STATSD_ON=True
    - AIRFLOW__SCHEDULER__STATSD_HOST=statsd-exporter
    - AIRFLOW__SCHEDULER__STATSD_PORT=8125
    - AIRFLOW__SCHEDULER__STATSD_PREFIX=airflow
    - AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgresql+psycopg2://airflow:airflow@postgres:5432/airflow
    - AIRFLOW__CORE__FERNET_KEY=pMrhjIcqUNHMYRk_ZOBmMptWR6o1DahCXCKn5lEMpzM=
    - AIRFLOW__CORE__EXECUTOR=LocalExecutor
    - AIRFLOW__CORE__AIRFLOW_HOME=/opt/airflow/
    - AIRFLOW__CORE__LOAD_EXAMPLES=False
    - AIRFLOW__CORE__LOAD_DEFAULT_CONNECTIONS=False
    - AIRFLOW__WEBSERVER__WORKERS=2
    - AIRFLOW__WEBSERVER__WORKER_REFRESH_INTERVAL=1800
  volumes:
    - ./dags:/opt/airflow/dags
  ports:
    - "8080:8080"
  command: bash -c "airflow initdb && airflow webserver"
  healthcheck:
    test: ["CMD-SHELL", "[ -f /opt/airflow/airflow-webserver.pid ]"]
    interval: 30s
    timeout: 30s
    retries: 3
```
A few changes from the puckel/docker-airflow configuration to highlight:
- Custom parameters such as AIRFLOW__CORE__SQL_ALCHEMY_CONN that were previously found in the airflow.cfg file are now declared as environment variables in the docker-compose file.
- The airflow initdb command to initialize the backend database is now declared as a command in the docker-compose file, as opposed to an entrypoint script.
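Those environment variables follow Airflow's override convention: any airflow.cfg option can be expressed as AIRFLOW__{SECTION}__{KEY}, upper-cased, with double underscores as separators. A tiny sketch of the mapping (the helper function is mine, not part of Airflow):

```python
def airflow_env_var(section: str, key: str) -> str:
    """Map an airflow.cfg [section] option to Airflow's env-var override
    name: AIRFLOW__{SECTION}__{KEY}, upper-cased, double-underscore-separated."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"

# [core] sql_alchemy_conn in airflow.cfg becomes:
print(airflow_env_var("core", "sql_alchemy_conn"))
# AIRFLOW__CORE__SQL_ALCHEMY_CONN
```

This is why no airflow.cfg needs to be baked into the image: everything configurable lives in the compose file.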
Voila!
There you have it. No more worrying if your tasks are infinitely queued and not running. Airflow running in Docker, with dashboards and alerting available in Grafana at your fingertips. The same architecture can be run on an instance deployed in GCP or AWS for 24/7 monitoring just like it was run locally.
The finished product can be found here: https://github.com/sarahmk125/airflow-docker-metrics
It’s important to note, there’s always room for improvement:
- This monitoring setup does not capture container or instance failures; a separate or extended solution is needed to monitor at the container or instance level.
- The current code runs using the LocalExecutor, which is less than ideal for large workloads. Further testing with the CeleryExecutor can be done.
- There are many more metrics available in StatsD that were not highlighted (such as DAG or task duration, counts of task failures, etc.). More dashboards can be built and provisioned in Grafana to leverage all the relevant metrics.
- Lastly, this article focuses on a self-hosted (or highly configurable cloud) deployment for Airflow, but this is not the only option for deploying Airflow.
Questions? Comments?
Thanks for reading! I love talking data stacks. Shoot me a message.
Translated from: https://towardsdatascience.com/airflow-in-docker-metrics-reporting-83ad017a24eb