I. Log Monitoring
Components of the PLG logging stack
- Promtail: collects logs and ships them to Loki
- Loki: the main server; stores the logs and handles queries
- Grafana: used to query and display the logs
1. Preparation
- Create the working directory
mkdir -p /usr/dz/monitor/
cd /usr/dz/monitor/
- Pull the images
docker pull grafana/grafana
docker pull grafana/promtail
docker pull grafana/loki:2.0.1
- Fetch the Promtail and Loki config files
wget --no-check-certificate https://raw.githubusercontent.com/grafana/loki/master/cmd/loki/loki-local-config.yaml
wget --no-check-certificate https://raw.githubusercontent.com/grafana/loki/master/clients/cmd/promtail/promtail-local-config.yaml
2. Start Loki
1. Edit the loki-local-config.yaml file
vim /usr/dz/monitor/loki-local-config.yaml
address: set this to the address of the server Loki runs on
Note: the following lines must be removed, otherwise Loki will fail to start:
wal:
enabled: true
dir: /tmp/wal
recover: true
2. Start the Loki container
docker run -d \
--name loki \
--privileged=true \
-v /usr/dz/monitor:/mnt/config \
-p 3100:3100 \
-p 9096:9096 \
grafana/loki:2.0.1 -config.file=/mnt/config/loki-local-config.yaml
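Once the container is running, Loki's readiness endpoint is a quick sanity check (a minimal sketch; it should answer "ready" once startup has finished):
curl http://192.168.15.144:3100/ready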
3. Start Promtail
1. Edit the promtail-local-config.yaml file
vim /usr/dz/monitor/promtail-local-config.yaml
Set the url to the address of the server where Loki runs:
clients:
- url: http://192.168.15.144:3100/loki/api/v1/push
2. Start the Promtail container
docker run -d \
--name promtail \
--privileged=true \
-v /usr/dz/monitor:/mnt/config \
-v /usr/dz/logs:/usr/dz/logs \
grafana/promtail:latest -config.file=/mnt/config/promtail-local-config.yaml
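To confirm Promtail is actually shipping logs, you can ask Loki which label names it has received so far (a sketch against Loki's HTTP API):
curl http://192.168.15.144:3100/loki/api/v1/labels
An empty result usually means the __path__ globs in promtail-local-config.yaml do not match any files inside the container.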
Startup failures usually come down to one of these (pitfalls I have hit):
docker -v  volume mount, host directory:container directory ----> (wrong directory mount)
docker -p  port mapping, host port:container port ----> (wrong port mapping)
-config.file ----> (wrong config file path)
You can use netstat -nap | grep <port> to check whether the port is listening
You can use docker logs -f --tail=100 <container ID/name> to inspect the container logs
4. Start Grafana
1. Start the Grafana container
docker run -d \
--name grafana \
-p 3000:3000 \
grafana/grafana:latest
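The command above keeps all Grafana data inside the container. If dashboards and data sources should survive re-creating the container, a named volume can be mounted (a sketch, assuming a volume called grafana-storage):
docker run -d \
--name grafana \
-p 3000:3000 \
-v grafana-storage:/var/lib/grafana \
grafana/grafana:latest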
2. Access Grafana
http://192.168.15.144:3000 (default username and password: admin)
5. Add Loki as a data source in Grafana
url: http://192.168.15.144:3100 (the address of the server where Loki runs)
6. Query logs
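In Grafana, open Explore, select the Loki data source, and enter a LogQL expression. Using the job labels from the Promtail config in the appendix, for example:
{job="user"}
{job="user"} |= "error"
The first returns everything scraped from /usr/dz/logs/logback-user.log; the second keeps only the lines containing "error".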
II. Host Monitoring
Components of host monitoring
- Prometheus: mainly responsible for scraping, storing, aggregating, and querying the metrics
- Node_exporter: collects metrics from the physical host and middleware
- Grafana: queries the collected data and presents it visually
1. Pull the images
docker pull prom/prometheus
docker pull prom/node-exporter
2. Copy prometheus.yml from the container to the host
docker cp prometheus:/etc/prometheus/prometheus.yml /usr/dz/monitor/
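Note that docker cp needs a container named prometheus to already exist. One way to get the default config (a sketch) is to start a throwaway container, copy the file out, and remove it again before the real start in step 5:
docker run -d --name prometheus prom/prometheus
docker cp prometheus:/etc/prometheus/prometheus.yml /usr/dz/monitor/
docker rm -f prometheus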
3. Edit prometheus.yml (the full file used here is in the appendix)
vim /usr/dz/monitor/prometheus.yml
4. Start node-exporter (run on every host to be monitored)
docker run -itd --name=node-exporter \
--restart=always \
-p 9100:9100 \
-v "/proc:/host/proc:ro" \
-v "/sys:/host/sys:ro" \
-v "/:/rootfs:ro" \
prom/node-exporter
Visit 192.168.15.144:9100/metrics; if metrics are returned, node-exporter is working.
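The same check from the command line (a sketch; node_cpu_seconds_total is one of the standard node_exporter metrics):
curl -s http://192.168.15.144:9100/metrics | grep node_cpu_seconds_total | head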
5. Start Prometheus
docker run -d --name prometheus \
-p 9090:9090 \
-v /usr/dz/monitor/prometheus.yml:/etc/prometheus/prometheus.yml \
prom/prometheus
Visit 192.168.15.144:9090; if the Prometheus UI comes up, it is working.
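Besides the web UI (Status -> Targets), the HTTP API shows whether Prometheus can reach each node_exporter (a sketch):
curl -s http://192.168.15.144:9090/api/v1/targets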
6. Configure the Prometheus data source in Grafana
url: http://192.168.15.144:9090 (the address of the server where Prometheus runs)
7. Import a dashboard
8. View the host monitoring data
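A typical PromQL expression behind such a panel, e.g. CPU usage per host built from standard node_exporter metrics (a sketch you can paste into the Prometheus UI or Grafana):
100 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100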
III. Distributed Tracing
1. Pull the images
docker pull apache/skywalking-oap-server:8.8.1
docker pull apache/skywalking-ui:8.6.0
2. Start the OAP server
docker run -d --name oap \
--restart=always \
-e TZ=Asia/Shanghai \
-p 12800:12800 \
-p 11800:11800 \
apache/skywalking-oap-server:8.8.1
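By default the OAP image uses in-memory H2 storage, so trace data is lost whenever the container restarts. For anything beyond a demo, an external store is usually selected through the image's environment variables; a sketch assuming an Elasticsearch instance at 192.168.15.144:9200 (hypothetical address; check the image's application.yml for the exact variable names):
docker run -d --name oap \
--restart=always \
-e TZ=Asia/Shanghai \
-e SW_STORAGE=elasticsearch \
-e SW_STORAGE_ES_CLUSTER_NODES=192.168.15.144:9200 \
-p 12800:12800 \
-p 11800:11800 \
apache/skywalking-oap-server:8.8.1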
3. Start the UI
docker run -d --name skywalking-ui \
--restart=always \
-p 8099:8080 \
--link oap:oap \
-e SW_OAP_ADDRESS=oap:12800 \
apache/skywalking-ui:8.6.0
To avoid a port conflict, container port 8080 is mapped to host port 8099.
4. SkyWalking agent
Every server that hosts an application to be monitored needs the agent package:
apache-skywalking-apm-bin/agent
For example, to have SkyWalking monitor the user service on server 192.168.15.145, add the following to the user service's startup command:
-javaagent:/root/java/apache-skywalking-apm-bin/agent/skywalking-agent.jar
-Dskywalking.agent.service_name=user
-Dskywalking.collector.backend_service=192.168.15.144:11800
These are three options: the first is the absolute path of the agent jar (it can also be set via an environment variable), the second is the service name shown in SkyWalking, and the third is the address of the SkyWalking OAP service.
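Putting the three options together, a full startup command could look like this (a sketch; user.jar is a hypothetical jar name for the user service):
java -javaagent:/root/java/apache-skywalking-apm-bin/agent/skywalking-agent.jar \
-Dskywalking.agent.service_name=user \
-Dskywalking.collector.backend_service=192.168.15.144:11800 \
-jar user.jar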
5. Access the SkyWalking UI
192.168.15.144:8099
Appendix
1.loki-local-config.yaml
auth_enabled: false
server:
  http_listen_port: 3100
  grpc_listen_port: 9096
  grpc_server_max_recv_msg_size: 1073741824  # max gRPC receive message size; the default is 4 MB
  grpc_server_max_send_msg_size: 1073741824  # max gRPC send message size; the default is 4 MB
ingester:
  lifecycler:
    address: 192.168.15.144
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1
    final_sleep: 0s
  chunk_idle_period: 5m
  chunk_retain_period: 30s
  max_transfer_retries: 0
  max_chunk_age: 20m  # maximum time a timeseries chunk stays in memory; beyond this the current chunk is flushed to storage and a new one is created
schema_config:
  configs:
    - from: 2021-11-01
      store: boltdb
      object_store: filesystem
      schema: v11
      index:
        prefix: index_
        period: 168h
storage_config:
  boltdb:
    directory: /tmp/loki/index  # where the index is stored
  filesystem:
    directory: /tmp/loki/chunks
limits_config:
  enforce_metric_name: false
  reject_old_samples: true
  reject_old_samples_max_age: 168h
  ingestion_rate_mb: 30  # per-user ingestion rate limit in MB per second; the default is 4
  ingestion_burst_size_mb: 15  # per-user ingestion burst size in MB; the default is 6
chunk_store_config:
  #max_look_back_period: 168h  # maximum time to look back for log lines; only applies to instant queries
  max_look_back_period: 0s
table_manager:
  retention_deletes_enabled: false  # whether retention deletion is enabled; the default is false
  retention_period: 0s  # log retention period
2.promtail-local-config.yaml
server:
  http_listen_port: 9080
  grpc_listen_port: 0
positions:
  filename: /tmp/positions.yaml
clients:
  - url: http://192.168.15.144:3100/loki/api/v1/push
scrape_configs:
  - job_name: system
    static_configs:
      - targets:
          - localhost
        labels:
          job: varlogs
          __path__: /var/log/*log
  - job_name: file
    static_configs:
      - targets:
          - localhost
        labels:
          job: file
          __path__: /usr/dz/logs/logback-file.log
  - job_name: user
    static_configs:
      - targets:
          - localhost
        labels:
          job: user
          __path__: /usr/dz/logs/logback-user.log
3.prometheus.yml
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  # Scrape Prometheus itself
  - job_name: "prometheus"
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ["localhost:9090"]
  # targets can point Prometheus at other nodes running node_exporter; not needed for a single node
  # IP address and port of each monitored host (separate multiple hosts with commas)
  - job_name: node-exporter
    static_configs:
      - targets: ['192.168.15.144:9100','192.168.15.145:9100']
        labels:
          instance: 192.168.15.144
  - job_name: node-exporter2
    static_configs:
      - targets: ['192.168.8.140:9100']
        labels:
          instance: 192.168.8.140
4. Common Docker commands
List local images:
docker images
List running containers:
docker ps
List all containers (running and stopped):
docker ps -a
Stop a container:
docker stop <container ID/name>
Restart a container:
docker restart <container ID/name>
Start a container:
docker start <container ID/name>
Enter a container:
docker exec -it <container ID/name> /bin/bash
Kill a container:
docker kill <container ID/name>
Remove a container:
docker rm -f <container ID/name>
View container logs:
docker logs -f --tail=100 <container ID/name>