System resource usage
CPU, memory, disk: the system maintains corresponding tables for these
Network is more complicated
How Tendermint exposes metrics at runtime
http://localhost:26660/metrics
How the core configuration file is generated:
config.toml is generated by the tendermint command
Source files:
tendermint/cmd/tendermint/main.go
tendermint/cmd/tendermint/commands/root.go
tendermint/config/toml.go
Call chain:
main() -> RootCmd -> ParseConfig() -> EnsureRoot() -> writeDefaultConfigFile() -> DefaultConfig()
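As a rough illustration of that chain, a small program can drive the same config package directly. This is only a sketch: it assumes the v0.34-style tendermint/config API, and the home directory path is made up.

    package main

    import (
        cfg "github.com/tendermint/tendermint/config"
    )

    func main() {
        rootDir := "/tmp/tmhome" // hypothetical home directory; normally $TMHOME or ~/.tendermint

        // EnsureRoot creates the directory layout and, if config.toml does not
        // exist yet, writes the defaults (writeDefaultConfigFile -> DefaultConfig).
        cfg.EnsureRoot(rootDir)

        // DefaultConfig is the struct that backs the generated file.
        conf := cfg.DefaultConfig()
        conf.SetRoot(rootDir)
        _ = conf
    }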
How Tendermint metrics are named (as seen on the :26660/metrics endpoint)
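The names follow the pattern <namespace>_<subsystem>_<name>, with the namespace defaulting to tendermint, so the endpoint exposes series such as tendermint_consensus_height or tendermint_p2p_peers (and, after the changes below, tendermint_node_cpuUsage).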
Using Docker to monitor container CPU and memory resources:
CPU, memory, disk, network
Viewing each container's resource usage with docker:
//Display a live stream of container(s) resource usage statistics
Usage: docker stats [OPTIONS] [CONTAINER...]
docker stats
--no-stream Disable streaming stats and only pull the first result
Getting the resource usage of a single container
nhl@harry:~$ docker stats node1 --no-stream
CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
77ad10f26e1e node1 5.54% 104.7MiB / 7.596GiB 1.35% 999kB / 2.35MB 0B / 0B 18
Push Gateway
It is mainly used for short-lived jobs. Because such jobs may exit before Prometheus gets a chance to pull them, they instead push their metrics to the Pushgateway, which Prometheus then scrapes.
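A minimal sketch of that push model in Go, assuming a Pushgateway reachable at localhost:9091; the address, job name, and metric are illustrative, not part of these notes.

    package main

    import (
        "log"

        "github.com/prometheus/client_golang/prometheus"
        "github.com/prometheus/client_golang/prometheus/push"
    )

    func main() {
        completionTime := prometheus.NewGauge(prometheus.GaugeOpts{
            Name: "batch_job_last_completion_timestamp_seconds",
            Help: "Unix timestamp of the last completion of the batch job.",
        })
        completionTime.SetToCurrentTime()

        // Push the metric once and exit; Prometheus later scrapes the Pushgateway.
        if err := push.New("http://localhost:9091", "demo_batch_job").
            Collector(completionTime).
            Push(); err != nil {
            log.Fatal(err)
        }
    }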
docker logs --help
Usage: docker logs [OPTIONS] CONTAINER
Flow: read the container's resource usage -> display the resource usage
docker stats is implemented through the Docker Engine API: when the command runs, it calls the engine's /containers/{id}/stats endpoint.
Changing Docker's default remote-access port (2375) on Linux
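Assuming the daemon has been configured to also listen on tcp://0.0.0.0:2375 (for example via the hosts entry in /etc/docker/daemon.json or a -H flag), the same stats endpoint can be queried from Go. A minimal sketch, with the container name node1 taken from the example above:

    package main

    import (
        "fmt"
        "io"
        "net/http"
    )

    func main() {
        // One snapshot of the CPU/memory/network/blkio stats that back the
        // columns of docker stats.
        resp, err := http.Get("http://localhost:2375/containers/node1/stats?stream=false")
        if err != nil {
            panic(err)
        }
        defer resp.Body.Close()

        body, _ := io.ReadAll(resp.Body)
        fmt.Println(string(body))
    }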
Tendermint local single node: exposing statically set resource-usage values
1. Define the metrics of the node subsystem
Create a new file node/metrics.go
package node

import (
    "github.com/go-kit/kit/metrics"
    "github.com/go-kit/kit/metrics/discard"
    "github.com/go-kit/kit/metrics/prometheus"
    stdprometheus "github.com/prometheus/client_golang/prometheus"
)

const (
    // MetricsSubsystem is a subsystem shared by all metrics exposed by this
    // package.
    MetricsSubsystem = "node"
)

// Metrics contains metrics exposed by this package.
type Metrics struct {
    // CPU usage
    CpuUsage metrics.Gauge
    // Memory usage
    MemoryUsage metrics.Gauge
    // Disk I/O
    DiskIO metrics.Gauge
    // Network I/O
    NetWorkIO metrics.Gauge
}

// PrometheusMetrics returns Metrics built using the Prometheus client library.
// Optionally, labels can be provided along with their values ("foo",
// "fooValue").
func PrometheusMetrics(namespace string, labelsAndValues ...string) *Metrics {
    labels := []string{}
    for i := 0; i < len(labelsAndValues); i += 2 {
        labels = append(labels, labelsAndValues[i])
    }
    return &Metrics{
        CpuUsage: prometheus.NewGaugeFrom(stdprometheus.GaugeOpts{
            Namespace: namespace,
            Subsystem: MetricsSubsystem,
            Name:      "cpuUsage",
            Help:      "Usage of CPU.",
        }, labels).With(labelsAndValues...),
        MemoryUsage: prometheus.NewGaugeFrom(stdprometheus.GaugeOpts{
            Namespace: namespace,
            Subsystem: MetricsSubsystem,
            Name:      "memoryUsage",
            Help:      "Usage of memory.",
        }, labels).With(labelsAndValues...),
        DiskIO: prometheus.NewGaugeFrom(stdprometheus.GaugeOpts{
            Namespace: namespace,
            Subsystem: MetricsSubsystem,
            Name:      "diskIO",
            Help:      "IO of disk.",
        }, labels).With(labelsAndValues...),
        NetWorkIO: prometheus.NewGaugeFrom(stdprometheus.GaugeOpts{
            Namespace: namespace,
            Subsystem: MetricsSubsystem,
            Name:      "networkIO",
            Help:      "IO of network.",
        }, labels).With(labelsAndValues...),
    }
}

// NopMetrics returns no-op Metrics.
func NopMetrics() *Metrics {
    return &Metrics{
        CpuUsage:    discard.NewGauge(),
        MemoryUsage: discard.NewGauge(),
        DiskIO:      discard.NewGauge(),
        NetWorkIO:   discard.NewGauge(),
    }
}
2. Modify the MetricsProvider type in node/node.go
// MetricsProvider returns consensus, p2p, mempool, state and node Metrics.
type MetricsProvider func(chainID string) (*cs.Metrics, *p2p.Metrics, *mempl.Metrics, *sm.Metrics, *Metrics)
3. Modify the DefaultMetricsProvider() function in node/node.go
func DefaultMetricsProvider(config *cfg.InstrumentationConfig) MetricsProvider {
    return func(chainID string) (*cs.Metrics, *p2p.Metrics, *mempl.Metrics, *sm.Metrics, *Metrics) {
        if config.Prometheus {
            return cs.PrometheusMetrics(config.Namespace, "chain_id", chainID),
                p2p.PrometheusMetrics(config.Namespace, "chain_id", chainID),
                mempl.PrometheusMetrics(config.Namespace, "chain_id", chainID),
                sm.PrometheusMetrics(config.Namespace, "chain_id", chainID),
                PrometheusMetrics(config.Namespace, "chain_id", chainID)
        }
        return cs.NopMetrics(), p2p.NopMetrics(), mempl.NopMetrics(), sm.NopMetrics(), NopMetrics()
    }
}
4. Modify the call site of metricsProvider() in node/node.go
csMetrics, p2pMetrics, memplMetrics, smMetrics, nodeMetrics := metricsProvider(genDoc.ChainID)

// Set static values on nodeMetrics
nodeMetrics.CpuUsage.Set(1)
nodeMetrics.MemoryUsage.Set(6)
nodeMetrics.DiskIO.Set(2)
nodeMetrics.NetWorkIO.Set(3)
5. Build the tendermint binary
cd ~/workspace/tendermint
make build
make install
tendermint init
6. Edit the config file ~/.tendermint/config/config.toml
prometheus = true            # in the [instrumentation] section
create_empty_blocks = false  # in the [consensus] section
7. Edit the prometheus.yml file
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']

  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'tendermint'
    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s
    honor_labels: true
    static_configs:
      - targets: ['localhost:26660']
8. Run Prometheus
cd ~/workspace/prometheus-2.44.0.linux-amd64/
./prometheus --config.file=prometheus.yml
9. View the metrics at localhost:9090
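If everything is wired up, http://localhost:26660/metrics should now also contain the four static node gauges, roughly like the lines below; the chain_id value is a placeholder, yours will match whatever tendermint init generated.

    tendermint_node_cpuUsage{chain_id="test-chain-XXXX"} 1
    tendermint_node_memoryUsage{chain_id="test-chain-XXXX"} 6
    tendermint_node_diskIO{chain_id="test-chain-XXXX"} 2
    tendermint_node_networkIO{chain_id="test-chain-XXXX"} 3

The same series can then be queried and graphed from the Prometheus UI at localhost:9090.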
Background knowledge
nhl@harry:~$ pidstat -p 20353
Linux 5.15.90.1-microsoft-standard-WSL2 (harry) 06/02/23 _x86_64_ (16 CPU)
21:13:20 UID PID %usr %system %guest %wait %CPU CPU Command
21:13:20 1000 20353 0.49 1.85 0.00 0.00 2.34 8 tendermint
If you want to view CPU usage graphically, you can use the gnome-system-monitor command.
gnome-system-monitor
Using the ps command
ps -o %cpu -C tendermint --no-headers
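The same ps call can be issued from Go, which is one way the approaches below can sample CPU usage. A minimal sketch; it assumes exactly one tendermint process is running.

    package main

    import (
        "fmt"
        "os/exec"
        "strconv"
        "strings"
    )

    func main() {
        // Equivalent to: ps -o %cpu -C tendermint --no-headers
        out, err := exec.Command("ps", "-o", "%cpu", "-C", "tendermint", "--no-headers").Output()
        if err != nil {
            panic(err) // ps exits non-zero when no process matches
        }

        fields := strings.Fields(string(out))
        if len(fields) == 0 {
            fmt.Println("no tendermint process found")
            return
        }

        cpu, err := strconv.ParseFloat(fields[0], 64)
        if err != nil {
            panic(err)
        }
        fmt.Printf("tendermint CPU usage: %.2f%%\n", cpu)
    }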
Tendermint local single node: fetching resource usage dynamically
Registering a piece of code as an RPC service:
Approach:
1. Get the node's resource usage
curl http://localhost:2375/containers/<container>/stats?stream=0
Getting a node's CPU usage at a given point in time:
As an API URL:
curl http://localhost:2375/containers/node0/stats/?stream=0
As a shell command:
docker stats --no-stream node0
2. Parse the returned data and extract the required metrics
3. Expose the metrics through the Prometheus service (see the sketch after this list)
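A minimal sketch of these three steps for one node, assuming the Docker API on tcp://localhost:2375 and a container named node0; the struct only declares the handful of stats fields used here, and the field names follow the Docker Engine API stats response.

    package main

    import (
        "encoding/json"
        "fmt"
        "net/http"
    )

    type containerStats struct {
        CPUStats struct {
            CPUUsage struct {
                TotalUsage uint64 `json:"total_usage"`
            } `json:"cpu_usage"`
            SystemCPUUsage uint64 `json:"system_cpu_usage"`
            OnlineCPUs     uint64 `json:"online_cpus"`
        } `json:"cpu_stats"`
        PreCPUStats struct {
            CPUUsage struct {
                TotalUsage uint64 `json:"total_usage"`
            } `json:"cpu_usage"`
            SystemCPUUsage uint64 `json:"system_cpu_usage"`
        } `json:"precpu_stats"`
        MemoryStats struct {
            Usage uint64 `json:"usage"`
            Limit uint64 `json:"limit"`
        } `json:"memory_stats"`
    }

    func main() {
        // Step 1: fetch one stats snapshot for the container.
        resp, err := http.Get("http://localhost:2375/containers/node0/stats?stream=false")
        if err != nil {
            panic(err)
        }
        defer resp.Body.Close()

        // Step 2: parse the JSON and compute the metrics we care about.
        var s containerStats
        if err := json.NewDecoder(resp.Body).Decode(&s); err != nil {
            panic(err)
        }

        // Same formula docker stats uses for the CPU % column.
        cpuDelta := float64(s.CPUStats.CPUUsage.TotalUsage - s.PreCPUStats.CPUUsage.TotalUsage)
        sysDelta := float64(s.CPUStats.SystemCPUUsage - s.PreCPUStats.SystemCPUUsage)
        cpuPercent := 0.0
        if sysDelta > 0 {
            cpuPercent = cpuDelta / sysDelta * float64(s.CPUStats.OnlineCPUs) * 100
        }
        memPercent := float64(s.MemoryStats.Usage) / float64(s.MemoryStats.Limit) * 100

        fmt.Printf("cpu %%: %.2f, mem %%: %.2f\n", cpuPercent, memPercent)
        // Step 3: feed these values into the node Metrics gauges,
        // e.g. nodeMetrics.CpuUsage.Set(cpuPercent).
    }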
Docker-deployed Tendermint multi-node network: fetching resource usage dynamically
Approach:
1. Get each node's resource usage through the Docker remote API (one stats call per container; see the sketch after this list)
curl http://localhost:2375/containers/<container>/stats?stream=0
2. Parse the returned data and extract the required metrics
3. Expose the metrics through the Prometheus service
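A minimal sketch of the per-container loop, assuming the containers are named node0..node3 as in Tendermint's localnet docker-compose setup; parsing each response would follow the single-node sketch above.

    package main

    import (
        "fmt"
        "io"
        "net/http"
    )

    func main() {
        for _, name := range []string{"node0", "node1", "node2", "node3"} {
            url := fmt.Sprintf("http://localhost:2375/containers/%s/stats?stream=false", name)
            resp, err := http.Get(url)
            if err != nil {
                fmt.Println(name, "error:", err)
                continue
            }
            body, _ := io.ReadAll(resp.Body)
            resp.Body.Close()
            fmt.Printf("%s: %d bytes of stats JSON\n", name, len(body))
        }
    }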
Question: how can Tendermint's CPU usage be monitored in real time without affecting the main process?
(source code)
How each subsystem's Metrics are wired into Prometheus
Approach 1:
Build a long-running service into tendermint itself that monitors the resource usage of the tendermint process
Approach 2:
Write a separate program that collects tendermint's resource usage and then exposes the collected data to Prometheus (see the sketch below)
The Prometheus service running in a Docker container then scrapes this locally exposed service
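A minimal sketch of approach 2: a separate collector that samples the tendermint process with ps (as in the background section) and exposes the result for Prometheus to scrape. The listen port :8090 and the 5-second sampling interval are assumptions.

    package main

    import (
        "log"
        "net/http"
        "os/exec"
        "strconv"
        "strings"
        "time"

        "github.com/prometheus/client_golang/prometheus"
        "github.com/prometheus/client_golang/prometheus/promhttp"
    )

    func main() {
        cpuUsage := prometheus.NewGauge(prometheus.GaugeOpts{
            Namespace: "tendermint",
            Subsystem: "node",
            Name:      "cpu_usage",
            Help:      "CPU usage of the tendermint process, sampled via ps.",
        })
        prometheus.MustRegister(cpuUsage)

        // Sample in the background so scrapes are never blocked and the
        // tendermint process itself is untouched.
        go func() {
            for {
                out, err := exec.Command("ps", "-o", "%cpu", "-C", "tendermint", "--no-headers").Output()
                if err == nil {
                    fields := strings.Fields(string(out))
                    if len(fields) > 0 {
                        if v, err := strconv.ParseFloat(fields[0], 64); err == nil {
                            cpuUsage.Set(v)
                        }
                    }
                }
                time.Sleep(5 * time.Second)
            }
        }()

        http.Handle("/metrics", promhttp.Handler())
        log.Fatal(http.ListenAndServe(":8090", nil))
    }

A Prometheus instance running in a Docker container can then scrape this collector by adding the host's address and port as a target in prometheus.yml.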