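What is Docker Compose?
Docker Compose is a tool for starting multiple containers on a single host at the same time, in other words container orchestration on one machine. For each container it defines which configuration to load, which image to use, which ports to publish, whether to mount volumes, and similar parameters.
What does cAdvisor do?
cAdvisor (short for container Advisor) analyzes and exposes resource usage and performance data from running containers.
It can report resource usage for both the host and its containers.
I. Monitoring containers with Prometheus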
[root@k8snode1 sc]# pwd
/sc
1. Edit prometheus.yml
[root@k8snode1 sc]# cat prometheus.yml
scrape_configs:
  - job_name: cadvisor
    scrape_interval: 5s
    static_configs:
      - targets:
          - cadvisor:8080
2. Edit docker-compose.yml
[root@k8snode1 sc]# cat docker-compose.yml
version: '3.2'
services:
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    ports:
      - 9090:9090
    command:
      - --config.file=/etc/prometheus/prometheus.yml
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
    depends_on:
      - cadvisor
  cadvisor:
    image: gcr.io/cadvisor/cadvisor:latest
    container_name: cadvisor
    ports:
      - 8080:8080
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:rw
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
    depends_on:
      - redis
  redis:
    image: redis:latest
    container_name: redis
    ports:
      - 6379:6379
3. Download cAdvisor and load the cadvisor image.
[root@k8snode1 prom]# ls
cadvisor.tar
[root@k8snode1 prom]# docker load -i cadvisor.tar
[root@k8snode1 sc]# docker images
REPOSITORY                 TAG       IMAGE ID       CREATED         SIZE
redis                      latest    7614ae9453d1   18 months ago   113MB
prom/prometheus            latest    a3d385fc29f9   18 months ago   201MB
gcr.io/cadvisor/cadvisor   latest    68c29634fe49   2 years ago     163MB
4. Start the containers with docker compose
[root@k8snode1 sc]# docker compose up
[+] Running 4/2
⠿ Network sc_default Created 0.1s
⠿ Container redis Created 0.1s
⠿ Container cadvisor Created 0.0s
⠿ Container prometheus Created 0.0s
Attaching to cadvisor, prometheus, redis
redis | 1:C 15 Jun 2023 10:22:08.193 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
redis | 1:C 15 Jun 2023 10:22:08.193 # Redis version=6.2.6, bits=64, commit=00000000, modified=0, pid=1, just started
redis | 1:C 15 Jun 2023 10:22:08.193 # Warning: no config file specified, using the default config. In order to specify a config file use redis-server /path/to/redis.conf
redis | 1:M 15 Jun 2023 10:22:08.194 * monotonic clock: POSIX clock_gettime
redis | 1:M 15 Jun 2023 10:22:08.194 * Running mode=standalone, port=6379.
redis | 1:M 15 Jun 2023 10:22:08.195 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
redis | 1:M 15 Jun 2023 10:22:08.195 # Server initialized
redis | 1:M 15 Jun 2023 10:22:08.195 * Ready to accept connections
cadvisor | W0615 10:22:08.704129 1 manager.go:288] Could not configure a source for OOM detection, disabling OOM events: open /dev/kmsg: no such file or directory
prometheus | ts=2023-06-15T10:22:09.372Z caller=main.go:478 level=info msg="No time or size retention was set so using the default time retention" duration=15d
prometheus | ts=2023-06-15T10:22:09.372Z caller=main.go:515 level=info msg="Starting Prometheus" version="(version=2.32.1, branch=HEAD, revision=41f1a8125e664985dd30674e5bdf6b683eff5d32)"
prometheus | ts=2023-06-15T10:22:09.372Z caller=main.go:520 level=info build_context="(go=go1.17.5, user=root@54b6dbd48b97, date=20211217-22:08:06)"
prometheus | ts=2023-06-15T10:22:09.372Z caller=main.go:521 level=info host_details="(Linux 3.10.0-1160.88.1.el7.x86_64 #1 SMP Tue Mar 7 15:41:52 UTC 2023 x86_64 1bc7affdad3c (none))"
prometheus | ts=2023-06-15T10:22:09.372Z caller=main.go:522 level=info fd_limits="(soft=1048576, hard=1048576)"
prometheus | ts=2023-06-15T10:22:09.372Z caller=main.go:523 level=info vm_limits="(soft=unlimited, hard=unlimited)"
prometheus | ts=2023-06-15T10:22:09.376Z caller=web.go:570 level=info component=web msg="Start listening for connections" address=0.0.0.0:9090
prometheus | ts=2023-06-15T10:22:09.378Z caller=main.go:924 level=info msg="Starting TSDB ..."
prometheus | ts=2023-06-15T10:22:09.386Z caller=tls_config.go:195 level=info component=web msg="TLS is disabled." http2=false
prometheus | ts=2023-06-15T10:22:09.388Z caller=head.go:488 level=info component=tsdb msg="Replaying on-disk memory mappable chunks if any"
prometheus | ts=2023-06-15T10:22:09.389Z caller=head.go:522 level=info component=tsdb msg="On-disk memory mappable chunks replay completed" duration=164.444µs
prometheus | ts=2023-06-15T10:22:09.389Z caller=head.go:528 level=info component=tsdb msg="Replaying WAL, this may take a while"
prometheus | ts=2023-06-15T10:22:09.391Z caller=head.go:599 level=info component=tsdb msg="WAL segment loaded" segment=0 maxSegment=0
prometheus | ts=2023-06-15T10:22:09.391Z caller=head.go:605 level=info component=tsdb msg="WAL replay completed" checkpoint_replay_duration=94.48µs wal_replay_duration=1.637257ms total_replay_duration=1.946212ms
prometheus | ts=2023-06-15T10:22:09.392Z caller=main.go:945 level=info fs_type=XFS_SUPER_MAGIC
prometheus | ts=2023-06-15T10:22:09.392Z caller=main.go:948 level=info msg="TSDB started"
prometheus | ts=2023-06-15T10:22:09.392Z caller=main.go:1129 level=info msg="Loading configuration file" filename=/etc/prometheus/prometheus.yml
prometheus | ts=2023-06-15T10:22:09.395Z caller=main.go:1166 level=info msg="Completed loading of configuration file" filename=/etc/prometheus/prometheus.yml totalDuration=2.855148ms db_storage=1.181µs remote_storage=27.878µs web_handler=595ns query_engine=1.107µs scrape=1.47533ms scrape_sd=69.063µs notify=1.869µs notify_sd=2.79µs rules=323.126µs
prometheus | ts=2023-06-15T10:22:09.395Z caller=main.go:897 level=info msg="Server is ready to receive web requests."
^CGracefully stopping... (press Ctrl+C again to force) # press Ctrl+C to exit
Aborting on container exit...
[+] Running 3/3
⠿ Container prometheus Stopped 0.1s
⠿ Container cadvisor Stopped 0.1s
⠿ Container redis Stopped 0.2s
canceled
Start the containers with docker compose and run them in the background
[root@k8snode1 sc]# docker compose up -d
[+] Running 3/3
⠿ Container redis Started 0.5s
⠿ Container cadvisor Started 0.9s
⠿ Container prometheus Started 1.5s
[root@k8snode1 sc]# docker compose ps # list the running containers
NAME IMAGE COMMAND SERVICE CREATED STATUS PORTS
cadvisor gcr.io/cadvisor/cadvisor:latest "/usr/bin/cadvisor -…" cadvisor 4 minutes ago Up 30 seconds (healthy) 0.0.0.0:8080->8080/tcp, :::8080->8080/tcp
prometheus prom/prometheus:latest "/bin/prometheus --c…" prometheus 4 minutes ago Up 30 seconds 0.0.0.0:9090->9090/tcp, :::9090->9090/tcp
redis redis:latest "docker-entrypoint.s…" redis 4 minutes ago Up 30 seconds 0.0.0.0:6379->6379/tcp, :::6379->6379/tcp
5. Access cAdvisor and Prometheus
http://192.168.102.137:8080/ (cAdvisor)
http://192.168.102.137:9090/graph (Prometheus)
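Both services can also be probed from the shell instead of a browser. The sketch below assumes `curl` is installed; `check_url` is a hypothetical helper name, not part of either tool. Prometheus serves `/-/healthy` and `/-/ready`, and cAdvisor serves `/healthz`.

```shell
#!/usr/bin/env bash
# check_url is a hypothetical helper: it prints OK or FAIL for one URL.
# -f makes curl fail on HTTP error codes, --max-time bounds the wait.
check_url() {
  local url="$1"
  if curl -fsS --max-time 5 -o /dev/null "$url"; then
    echo "OK   $url"
  else
    echo "FAIL $url"
    return 1
  fi
}
```

For the hosts in this walkthrough that would be `check_url http://192.168.102.137:9090/-/ready` and `check_url http://192.168.102.137:8080/healthz`.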
6. Stop docker compose
[root@k8snode1 sc]# docker compose stop # stop the containers
[+] Running 3/3
⠿ Container prometheus Stopped 0.1s
⠿ Container cadvisor Stopped 0.1s
⠿ Container redis Stopped 0.1s
[root@k8snode1 sc]# docker compose down # stop and remove the containers and networks
II. Monitoring a node server with Prometheus, using an exporter (node_exporter)
1. Upload the node_exporter package and extract it.
[root@slave ~]# mkdir /node_exporter
[root@slave ~]# cd /node_exporter/
[root@slave node_exporter]# ls
node_exporter-1.4.0-rc.0.linux-amd64.tar.gz
[root@slave node_exporter]# tar xf node_exporter-1.4.0-rc.0.linux-amd64.tar.gz
[root@slave node_exporter]# ls
node_exporter-1.4.0-rc.0.linux-amd64.tar.gz
node_exporter-1.4.0-rc.0.linux-amd64
2. Start the node_exporter agent
[root@slave node_exporter-1.4.0-rc.0.linux-amd64]# PATH=/node_exporter/node_exporter-1.4.0-rc.0.linux-amd64:$PATH
[root@slave node_exporter-1.4.0-rc.0.linux-amd64]# which node_exporter
/node_exporter/node_exporter-1.4.0-rc.0.linux-amd64/node_exporter
[root@slave node_exporter-1.4.0-rc.0.linux-amd64]# node_exporter --help # show the usage help
[root@slave node_exporter-1.4.0-rc.0.linux-amd64]# nohup node_exporter --web.listen-address='0.0.0.0:9100' &
This starts node_exporter listening on port 9100.
[root@slave node_exporter-1.4.0-rc.0.linux-amd64]# ps aux|grep node
root 17227 0.0 0.7 716544 13112 pts/1 Sl 20:19 0:00 node_exporter --web.listen-address=0.0.0.0:9100
root 17236 0.0 0.0 112824 988 pts/1 S+ 20:20 0:00 grep --color=auto node
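Started with `nohup ... &`, the exporter will not come back after a reboot. One common alternative is a systemd service. The sketch below only prints a unit file that reuses the binary path and listen address from the transcript; the function name `node_exporter_unit` and the unit name `node_exporter.service` are assumptions, and the service runs as root here only because the transcript does.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a systemd unit for node_exporter to stdout.
node_exporter_unit() {
  cat <<'EOF'
[Unit]
Description=Prometheus node_exporter
After=network-online.target

[Service]
ExecStart=/node_exporter/node_exporter-1.4.0-rc.0.linux-amd64/node_exporter --web.listen-address=0.0.0.0:9100
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}
```

To use it, redirect the output to `/etc/systemd/system/node_exporter.service`, run `systemctl daemon-reload`, then `systemctl enable node_exporter` and `systemctl start node_exporter`. Stop the `nohup` instance first so port 9100 is free.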
Verify the installation by opening:
http://192.168.102.136:9100/metrics
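The `/metrics` page is plain text in the Prometheus exposition format: comment lines start with `#`, and each sample is a metric name followed by a value. As a rough sketch, assuming `awk` is available, a single unlabelled metric can be pulled out on the command line; `metric_value` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read exposition-format text on stdin and print the
# value of the sample whose first field equals the metric name in $1.
# Comment lines (first field "#") and labelled samples do not match.
metric_value() {
  awk -v name="$1" '$1 == name { print $2 }'
}
```

For example, `curl -s http://192.168.102.136:9100/metrics | metric_value node_load1` would print the 1-minute load average that node_exporter reports for the monitored host.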
3. Add the monitored host to the Prometheus server
Run the following on the server.
[root@k8snode1 sc]# pwd
/sc
[root@k8snode1 sc]# cat prometheus.yml
scrape_configs:
  - job_name: cadvisor
    scrape_interval: 5s
    static_configs:
      - targets:
          - cadvisor:8080
  # add the server to be monitored
  - job_name: slave
    scrape_interval: 5s
    static_configs:
      - targets:
          - 192.168.102.136:9100
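Each new monitored host adds one more job stanza of the same shape. As an illustration only, a small shell function can print that stanza so it does not have to be typed by hand; `scrape_job` is a hypothetical helper, and the two-space indentation is an assumption that must match the indentation already used in your prometheus.yml.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one static scrape job for prometheus.yml.
# Usage: scrape_job <job_name> <host:port> [scrape_interval]
scrape_job() {
  local name="$1" target="$2" interval="${3:-5s}"
  printf '  - job_name: %s\n' "$name"
  printf '    scrape_interval: %s\n' "$interval"
  printf '    static_configs:\n'
  printf '      - targets:\n'
  printf '          - %s\n' "$target"
}
```

Running `scrape_job slave 192.168.102.136:9100 >> prometheus.yml` appends the job. That only works when `scrape_configs` is the last top-level key in the file, as it is in this walkthrough.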
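4. Restart the Prometheus service. There is no dedicated restart script, so this has to be done by hand.
Because Prometheus was started as a container, we restart the compose stack.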
[root@k8snode1 sc]# docker compose stop
[+] Running 3/3
⠿ Container prometheus Stopped 0.1s
⠿ Container cadvisor Stopped 0.1s
⠿ Container redis Stopped
[root@k8snode1 sc]# docker compose up -d
[+] Running 3/3
⠿ Container redis Started 0.3s
⠿ Container cadvisor Started 0.7s
⠿ Container prometheus Started 1.1s
[root@k8snode1 sc]#
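Restarting the whole stack also restarts cAdvisor and redis, which is more than a config change needs. Prometheus re-reads its configuration file when it receives SIGHUP, so a lighter option may be to signal only that service. The sketch below is an alternative under stated assumptions, not the method used in this walkthrough: `reload_prometheus` and the `DOCKER` override variable are hypothetical names, and the function should be run from the directory that contains docker-compose.yml.

```shell
#!/usr/bin/env bash
# DOCKER is a hypothetical override so the command can be dry-run (DOCKER=echo).
DOCKER="${DOCKER:-docker}"

# Send SIGHUP to the prometheus service so it reloads prometheus.yml.
reload_prometheus() {
  "$DOCKER" compose kill -s SIGHUP prometheus
}
```

If the targets page still shows the old jobs afterwards, fall back to `docker compose restart prometheus`. A single-file bind mount can keep pointing at the old file when an editor replaces the file instead of writing it in place.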
5. On the Prometheus server, check that the new target appears
http://192.168.102.137:9090/targets
III. Monitoring MySQL with Prometheus
1. Install mysqld on a server using a script or yum, then install mysqld_exporter, then add that mysqld server to Prometheus as a monitored target.
[root@slave node_exporter]# mkdir /mysqld_exporter
[root@slave node_exporter]# cd /mysqld_exporter/
[root@slave mysqld_exporter]# ps aux |grep mysqld
root 6652 0.0 0.0 115536 1716 ? S 09:36 0:00 /bin/sh /usr/local/mysql/bin/mysqld_safe --datadir=/data/mysql --pid-file=/data/mysql/slave.pid
mysql 6991 0.1 12.5 1677892 233308 ? Sl 09:36 0:05 /usr/local/mysql/bin/mysqld --basedir=/usr/local/mysql --datadir=/data/mysql --plugin-dir=/usr/local/mysql/lib/plugin --user=mysql --log-error=slave.err --open-files-limit=8192 --pid-file=/data/mysql/slave.pid --socket=/data/mysql/mysql.sock --port=3306
root 16954 0.0 0.0 112824 988 pts/0 S+ 10:33 0:00 grep --color=auto mysqld
2. Download mysqld_exporter and upload it to the Linux host with Xftp.
[root@slave mysqld_exporter]# ls
mysqld_exporter-0.14.0.linux-amd64.tar.gz
[root@slave mysqld_exporter]# tar xf mysqld_exporter-0.14.0.linux-amd64.tar.gz
[root@slave mysqld_exporter]# cd mysqld_exporter-0.14.0.linux-amd64
[root@slave mysqld_exporter-0.14.0.linux-amd64]# ls
LICENSE mysqld_exporter NOTICE
[root@slave mysqld_exporter-0.14.0.linux-amd64]# PATH=/mysqld_exporter/mysqld_exporter-0.14.0.linux-amd64:$PATH
[root@slave mysqld_exporter-0.14.0.linux-amd64]# which mysqld_exporter
/mysqld_exporter/mysqld_exporter-0.14.0.linux-amd64/mysqld_exporter
3. Create a MySQL user for the exporter and grant it privileges.
[root@slave mysqld_exporter-0.14.0.linux-amd64]# mysql -uroot -pSanchuang1234#
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 4
Server version: 5.7.37 MySQL Community Server (GPL)
Copyright (c) 2000, 2022, Oracle and/or its affiliates.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
root@(none) 21:00 mysql>grant select , replication client,process on *.* to 'prom'@'localhost' identified by 'sc123456';
Query OK, 0 rows affected, 1 warning (0.01 sec)
root@(none) 21:00 mysql>exit
Bye
4. Configure the MySQL account information
[root@slave mysqld_exporter-0.14.0.linux-amd64]# vim my.cnf
[client]
user=prom
password=sc123456
5. Start mysqld_exporter
[root@slave mysqld_exporter-0.14.0.linux-amd64]# pwd
/mysqld_exporter/mysqld_exporter-0.14.0.linux-amd64
[root@slave mysqld_exporter-0.14.0.linux-amd64]# nohup mysqld_exporter --config.my-cnf=/mysqld_exporter/mysqld_exporter-0.14.0.linux-amd64/my.cnf &
[2] 17291
[root@slave mysqld_exporter-0.14.0.linux-amd64]# nohup: ignoring input and appending output to 'nohup.out'
[root@slave mysqld_exporter-0.14.0.linux-amd64]# ls
LICENSE my.cnf mysqld_exporter nohup.out NOTICE
[root@slave mysqld_exporter-0.14.0.linux-amd64]# ps aux |grep mysqld
root 6642 0.0 0.0 115536 1716 ? S 18:37 0:00 /bin/sh /usr/local/mysql/bin/mysqld_safe --datadir=/data/mysql --pid-file=/data/mysql/slave.pid
mysql 6982 0.1 12.1 1743692 227048 ? Sl 18:37 0:10 /usr/local/mysql/bin/mysqld --basedir=/usr/local/mysql --datadir=/data/mysql --plugin-dir=/usr/local/mysql/lib/plugin --user=mysql --log-error=slave.err --open-files-limit=8192 --pid-file=/data/mysql/slave.pid --socket=/data/mysql/mysql.sock --port=3306
root 17291 0.0 0.6 712340 11936 pts/1 Sl 21:05 0:00 mysqld_exporter --config.my-cnf=/mysqld_exporter/mysqld_exporter-0.14.0.linux-amd64/my.cnf
root 17301 0.0 0.0 112824 984 pts/1 S+ 21:08 0:00 grep --color=auto mysqld
[root@slave mysqld_exporter-0.14.0.linux-amd64]# netstat -anpult|grep mysqld
tcp6 0 0 :::3306 :::* LISTEN 6982/mysqld
tcp6 0 0 :::9104 :::* LISTEN 17291/mysqld_export
6. Run the following on the server
[root@k8snode1 sc]# cat prometheus.yml
scrape_configs:
  - job_name: cadvisor
    scrape_interval: 5s
    static_configs:
      - targets:
          - cadvisor:8080
  - job_name: slave
    scrape_interval: 5s
    static_configs:
      - targets:
          - 192.168.102.136:9100
  - job_name: mysqld_exporter
    scrape_interval: 5s
    static_configs:
      - targets:
          - 192.168.102.136:9104
[root@k8snode1 sc]# docker compose stop
[+] Running 3/3
⠿ Container prometheus Stopped 0.2s
⠿ Container cadvisor Stopped 0.1s
⠿ Container redis Stopped 0.3s
[root@k8snode1 sc]# docker compose up -d
[+] Running 3/3
⠿ Container redis Started 0.4s
⠿ Container cadvisor Started 0.9s
⠿ Container prometheus Started 1.4s
[root@k8snode1 sc]# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
1bc7affdad3c prom/prometheus:latest "/bin/prometheus --c…" 16 hours ago Up 14 seconds 0.0.0.0:9090->9090/tcp, :::9090->9090/tcp prometheus
8838f2e63e1c gcr.io/cadvisor/cadvisor:latest "/usr/bin/cadvisor -…" 16 hours ago Up 15 seconds (health: starting) 0.0.0.0:8080->8080/tcp, :::8080->8080/tcp cadvisor
c9aeedb12daa redis:latest "docker-entrypoint.s…" 16 hours ago Up 15 seconds 0.0.0.0:6379->6379/tcp, :::6379->6379/tcp redis
7. On the Prometheus server, check that the new target appears
http://192.168.102.137:9090/targets