Airflow in Docker Metrics Reporting

An unsettling yet likely familiar situation: you deployed Airflow successfully, but find yourself constantly refreshing the webserver UI to make sure everything is running smoothly.

You rely on certain alerting tasks to execute upon upstream failures, but if the queue is full and tasks are stalling, how will you be notified?

One solution: deploying Grafana, an open source reporting service, on top of Airflow.

The Proposed Architecture

[Architecture diagram: Image by Author]

To start, I’ll assume basic understanding of Airflow functionality and containerization using Docker and Docker Compose. More resources can be found here for Airflow, here for Docker, and here for Docker Compose.

Reference the code to follow along: https://github.com/sarahmk125/airflow-docker-metrics

Now, the fun stuff.

Used Services

To get Airflow metrics into a visually appealing dashboard that supports alerting, the following services are spun up in Docker containers declared in the docker-compose.yml file:

  • Airflow: Airflow runs tasks within DAGs, defined in Python files stored in the ./dags/ folder. One sample DAG declaration file is already there; a rough sketch of what such a file looks like follows below. Multiple containers are run, with a few nuances that come from using the official apache/airflow image. More on that later.

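    Each DAG is just a Python file dropped into ./dags/. As a minimal sketch of what such a file can look like (written against the Airflow 1.10.x API used by the apache/airflow image in this setup; the dag_id, schedule, and task are illustrative, not the repository's actual sample DAG):

from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

default_args = {
    "owner": "airflow",
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
}

# A hypothetical DAG that runs a single task every 10 minutes,
# which also keeps a steady trickle of metrics flowing to StatsD.
with DAG(
    dag_id="hello_metrics",
    default_args=default_args,
    start_date=datetime(2020, 1, 1),
    schedule_interval="*/10 * * * *",
    catchup=False,
) as dag:
    say_hello = BashOperator(
        task_id="say_hello",
        bash_command="echo 'hello from airflow'",
    )
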
  • StatsD-Exporter: The StatsD-Exporter container converts Airflow’s StatsD-format metrics into Prometheus format, which feeds the datasource for the reporting layer (Grafana). More information on StatsD-Exporter can be found here. The container definition includes the command to be executed upon startup, which defines how the exposed ports are used.

statsd-exporter:
  image: prom/statsd-exporter
  container_name: airflow-statsd-exporter
  command: "--statsd.listen-udp=:8125 --web.listen-address=:9102"
  ports:
    - 9123:9102
    - 8125:8125/udp
  • Prometheus: Prometheus is a service commonly used for time-series data reporting. It is particularly convenient when using Grafana as the reporting UI, since Prometheus is a supported datasource. More information on Prometheus can be found here. The volumes mounted in the container definition indicate how data flows to and from Prometheus.

prometheus:
  image: prom/prometheus
  container_name: airflow-prometheus
  user: "0"
  ports:
    - 9090:9090
  volumes:
    - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
    - ./prometheus/volume:/prometheus
  • Grafana: Grafana is a reporting UI service, often used to connect to non-relational databases. In the code described here, Grafana uses Prometheus as the datasource for its dashboards. The container definition includes an admin user for the portal, as well as the volumes that define the pre-configured datasources and dashboards.

grafana:
  image: grafana/grafana:7.1.5
  container_name: airflow-grafana
  environment:
    GF_SECURITY_ADMIN_USER: admin
    GF_SECURITY_ADMIN_PASSWORD: password
    GF_PATHS_PROVISIONING: /grafana/provisioning
  ports:
    - 3000:3000
  volumes:
    - ./grafana/volume/data:/grafana
    - ./grafana/volume/datasources:/grafana/datasources
    - ./grafana/volume/dashboards:/grafana/dashboards
    - ./grafana/volume/provisioning:/grafana/provisioning

Make It Go

To start everything up, the following tools are required: Docker, docker-compose, Python3, Git.

Steps (to be run in a terminal):

  • Clone the repository: git clone https://github.com/sarahmk125/airflow-docker-metrics.git

  • Navigate to the cloned folder: cd airflow-docker-metrics

  • Start up the containers: docker-compose -f docker-compose.yml up -d (Note: they can be stopped or removed by running the same command with stop or down, respectively, in place of up -d)

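Before opening any UIs, it is worth confirming that the containers are healthy and that metrics are actually flowing. A quick check using the port mappings above (the grep only returns results once the Airflow scheduler has started emitting StatsD metrics):

# list the containers started from this compose file
docker-compose -f docker-compose.yml ps

# the statsd-exporter web port (9102) is published on host port 9123;
# converted Airflow metrics should appear here once the scheduler is running
curl -s localhost:9123/metrics | grep airflow_
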
The result:

[Screenshots: Images by Author]
  • Grafana: http://localhost:3000 (login: username=admin, password=password)

    The repository includes an Airflow Metrics dashboard, which can be set up with alerts, showing the number of running and queued tasks over time:

[Airflow Metrics dashboard: Image by Author]

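The panels on that dashboard are ordinary Prometheus queries run against the metrics relayed by the StatsD exporter. As a rough guide (the exact metric names depend on the Airflow version and on the airflow StatsD prefix set in docker-compose.yml, so verify them at localhost:9090/graph first), the running and queued task counts correspond to queries along the lines of:

airflow_executor_running_tasks
airflow_executor_queued_tasks
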
Steps Explained

How does Prometheus actually get the metrics?

Prometheus is configured upon startup in the ./prometheus/prometheus.yml file which is mounted as a volume:

global:
  scrape_interval: 30s
  evaluation_interval: 30s
  scrape_timeout: 10s
  external_labels:
    monitor: 'codelab-monitor'

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['airflow-prometheus:9090']

  - job_name: 'statsd-exporter'
    static_configs:
      - targets: ['airflow-statsd-exporter:9102']
    tls_config:
      insecure_skip_verify: true

In particular, the scrape_configs section declares the targets Prometheus scrapes: itself (the airflow-prometheus container) and, more importantly, the airflow-statsd-exporter container, which is where the Airflow metrics are exposed.

How are dashboards and alerts created in Grafana?

Provisioning is your friend!

Provisioning in Grafana means using code to define datasources, dashboards, and alerts to exist upon startup. When starting the containers, there is a Prometheus datasource already configured in localhost:3000/datasources and an Airflow Metrics dashboard listed in localhost:3000/dashboards.

How to provision:

  • All the relevant data is mounted as volumes onto the grafana container defined in the docker-compose.yml file (described above)

  • The ./grafana/volume/provisioning/datasources/default.yaml file contains a definition of all data sources:

apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
  • The ./grafana/volume/provisioning/dashboards/default.yaml file contains information on where to mount dashboards in the container:

apiVersion: 1
providers:
  - name: dashboards
    folder: General
    type: file
    editable: true
    updateIntervalSeconds: 10
    allowUiUpdates: true
    options:
      path: /grafana/dashboards
      foldersFromFilesStructure: true
  • The ./grafana/volume/dashboards/ folder contains .json files, each representing a dashboard. The airflow_metrics.json file results in the dashboard shown above.

The JSON can be retrieved from the Grafana UI by following these instructions.

Alerts in the UI can be set up as described here; there is also an excellent Medium article here on setting up Grafana alerting with Slack. Alerts can be provisioned in the same way as dashboards and datasources; a sketch of a provisioned notification channel follows below.

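As one example of what provisioned alerting can look like with the Grafana 7.x legacy alerting used here, a Slack notification channel can be declared in a YAML file under the mounted provisioning folder. This is a sketch only: the file path, channel name, and webhook URL are placeholders, not files that exist in the repository:

# hypothetical file: ./grafana/volume/provisioning/notifiers/default.yaml
apiVersion: 1
notifiers:
  - name: slack-alerts             # placeholder channel name
    type: slack
    uid: slack-alerts
    org_id: 1
    is_default: true
    settings:
      url: https://hooks.slack.com/services/XXX/YYY/ZZZ   # placeholder webhook URL

The alert rules themselves are attached to dashboard panels, so they travel with the provisioned dashboard JSON.
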
Bonus Topic: The Official Airflow Image

Before there was an official Docker image, Matthieu “Puckel_” Roisil released Docker support for Airflow. Starting with Airflow version 1.10.10, the Apache Software Foundation released an official image on DockerHub which is the only current and continuously updated image. However, many still rely on the legacy and unofficial docker-airflow repository.

Why is this a problem? Well, relying on the legacy repository means capping Airflow at version 1.10.9. Airflow 1.10.10 began supporting some cool features, such as running tasks on Kubernetes. The official repository will also be where the upcoming (and highly anticipated) Airflow 2.0 is released.

The new docker-compose declaration for the webserver, found in the repository described above, looks something like this:

webserver:
  container_name: airflow-webserver
  image: apache/airflow:1.10.12-python3.7
  restart: always
  depends_on:
    - postgres
    - redis
    - statsd-exporter
  environment:
    - LOAD_EX=n
    - EXECUTOR=Local
    - POSTGRES_USER=airflow
    - POSTGRES_PASSWORD=airflow
    - POSTGRES_DB=airflow
    - AIRFLOW__SCHEDULER__STATSD_ON=True
    - AIRFLOW__SCHEDULER__STATSD_HOST=statsd-exporter
    - AIRFLOW__SCHEDULER__STATSD_PORT=8125
    - AIRFLOW__SCHEDULER__STATSD_PREFIX=airflow
    - AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgresql+psycopg2://airflow:airflow@postgres:5432/airflow
    - AIRFLOW__CORE__FERNET_KEY=pMrhjIcqUNHMYRk_ZOBmMptWR6o1DahCXCKn5lEMpzM=
    - AIRFLOW__CORE__EXECUTOR=LocalExecutor
    - AIRFLOW__CORE__AIRFLOW_HOME=/opt/airflow/
    - AIRFLOW__CORE__LOAD_EXAMPLES=False
    - AIRFLOW__CORE__LOAD_DEFAULT_CONNECTIONS=False
    - AIRFLOW__WEBSERVER__WORKERS=2
    - AIRFLOW__WEBSERVER__WORKER_REFRESH_INTERVAL=1800
  volumes:
    - ./dags:/opt/airflow/dags
  ports:
    - "8080:8080"
  command: bash -c "airflow initdb && airflow webserver"
  healthcheck:
    test: ["CMD-SHELL", "[ -f /opt/airflow/airflow-webserver.pid ]"]
    interval: 30s
    timeout: 30s
    retries: 3

A few changes from the puckel/docker-airflow configuration to highlight:

  • Custom parameters, such as AIRFLOW__CORE__SQL_ALCHEMY_CONN, that were previously set in the airflow.cfg file are now declared as environment variables in the docker-compose file, following Airflow’s AIRFLOW__{SECTION}__{KEY} naming convention.

  • The airflow initdb command to initialize the backend database is now declared as a command in the docker-compose file, as opposed to an entrypoint script.

Voila!

There you have it. No more worrying if your tasks are infinitely queued and not running. Airflow running in Docker, with dashboards and alerting available in Grafana at your fingertips. The same architecture can be run on an instance deployed in GCP or AWS for 24/7 monitoring just like it was run locally.

The finished product can be found here: https://github.com/sarahmk125/airflow-docker-metrics

It’s important to note that there’s always room for improvement:

  • This monitoring setup does not capture container or instance failures; a separate or extended solution is needed to monitor at the container or instance level.

  • The current code runs using the LocalExecutor, which is less than ideal for large workloads. Further testing with the CeleryExecutor can be done.

  • There are many more metrics available in StatsD that were not highlighted (such as DAG or task duration, counts of task failures, etc.). More dashboards can be built and provisioned in Grafana to leverage all the relevant metrics.

  • Lastly, this article focuses on a self-hosted (or highly configurable cloud) deployment for Airflow, but this is not the only option for deploying Airflow.


Questions? Comments?

Thanks for reading! I love talking data stacks. Shoot me a message.

Translated from: https://towardsdatascience.com/airflow-in-docker-metrics-reporting-83ad017a24eb
