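etcd connection error: grpc: Conn.transportMonitor exits due to: grpc: timed out trying to connect

Problem description

On a Debian 11 virtual machine, I used a script that calls etcdctl to continuously update the value of a specific key, in order to test the stability of a data-exchange program. After about 3 hours of this, the program started reporting the following errors when it accessed etcd: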

2022/10/18 14:19:13 transport: http2Client.notifyError got notified that the client transport was broken EOF.
2022/10/18 14:19:14 grpc: Conn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 127.0.0.1:2379: connect: connection refused"; Reconnecting to "127.0.0.1:2379"
2022/10/18 14:19:16 grpc: Conn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 127.0.0.1:2379: connect: connection refused"; Reconnecting to "127.0.0.1:2379"
2022/10/18 14:19:16 grpc: Conn.transportMonitor exits due to: grpc: timed out trying to connect
2022/10/18 14:19:17 grpc: Conn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 127.0.0.1:2379: connect: connection refused"; Reconnecting to "127.0.0.1:2379"
2022/10/18 14:19:19 grpc: Conn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 127.0.0.1:2379: connect: connection refused"; Reconnecting to "127.0.0.1:2379"
2022/10/18 14:19:20 grpc: Conn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 127.0.0.1:2379: connect: connection refused"; Reconnecting to "127.0.0.1:2379"
2022/10/18 14:19:21 grpc: Conn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 127.0.0.1:2379: connect: connection refused"; Reconnecting to "127.0.0.1:2379"

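The errors show that the connection to etcd failed ("connection refused" on 127.0.0.1:2379), but the actual cause needs further investigation.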
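A "connection refused" on a local address generally means that nothing is accepting connections on that port, which already hints that the etcd process itself may be down. A quick probe can confirm this before looking at the service. This is a minimal sketch that assumes bash and the coreutils timeout command are available; the function name port_open is only illustrative.

```shell
#!/usr/bin/env bash
# Return 0 if something accepts TCP connections on HOST:PORT within 2 seconds.
port_open() {
    local host=$1 port=$2
    # /dev/tcp is a bash feature, so the probe runs in an explicit bash child.
    timeout 2 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# 127.0.0.1:2379 is the endpoint that appears in the error log.
if port_open 127.0.0.1 2379; then
    echo "etcd client port is accepting connections"
else
    echo "cannot connect to 127.0.0.1:2379"
fi
```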

Troubleshooting

Checking the etcd service status

Running systemctl status etcd shows the following service status and log output:

● etcd.service - etcd - highly-available key value store
     Loaded: loaded (/lib/systemd/system/etcd.service; enabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Tue 2022-10-18 14:19:13 EDT; 7h ago
       Docs: https://etcd.io/docs
             man:etcd
    Process: 612 ExecStart=/usr/bin/etcd $DAEMON_ARGS (code=exited, status=1/FAILURE)
   Main PID: 612 (code=exited, status=1/FAILURE)
        CPU: 50min 14.795s

Oct 18 14:18:51 debian etcd[612]: segmented wal file /var/lib/etcd/default/member/wal/0000000000000009-00000000000ef088.wal is crea>
Oct 18 14:19:01 debian etcd[612]: purged file /var/lib/etcd/default/member/wal/0000000000000004-000000000006a724.wal successfully
Oct 18 14:19:06 debian etcd[612]: read-only range request "key:\"/dynamic_response/>
Oct 18 14:19:06 debian etcd[612]: WARNING: 2022/10/18 14:19:06 grpc: Server.processUnaryRPC failed to write status: connection erro>
Oct 18 14:19:11 debian etcd[612]: read-only range request "key:\"/register>
Oct 18 14:19:11 debian etcd[612]: WARNING: 2022/10/18 14:19:11 grpc: Server.processUnaryRPC failed to write status: connection erro>
Oct 18 14:19:13 debian etcd[612]: cannot commit tx (write /var/lib/etcd/default/member/snap/db: no space left on device)
Oct 18 14:19:13 debian systemd[1]: etcd.service: Main process exited, code=exited, status=1/FAILURE
Oct 18 14:19:13 debian systemd[1]: etcd.service: Failed with result 'exit-code'.
Oct 18 14:19:13 debian systemd[1]: etcd.service: Consumed 50min 14.795s CPU time.

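The log line "Oct 18 14:19:13 debian etcd[612]: cannot commit tx (write /var/lib/etcd/default/member/snap/db: no space left on device)" shows that the disk is out of space and etcd can no longer write to its database file. Checking disk usage confirms that the partition holding /var/ is completely full.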
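The disk check described above can be scripted so that the problem is caught before etcd crashes. The following is a minimal sketch: the function names usage_percent and check_disk are illustrative, and the 90% threshold is an arbitrary assumption rather than an etcd default.

```shell
#!/bin/sh
# Print the usage percentage (digits only) of the filesystem holding the given path.
usage_percent() {
    # df -P prints one POSIX-format line per filesystem; field 5 is the capacity, e.g. "87%".
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Warn when the filesystem holding the given directory is nearly full.
# The 90% threshold is an arbitrary assumption.
check_disk() {
    dir=$1
    pct=$(usage_percent "$dir")
    [ -n "$pct" ] || return 2
    if [ "$pct" -ge 90 ]; then
        echo "WARNING: filesystem holding $dir is ${pct}% full"
        return 1
    fi
    echo "OK: filesystem holding $dir is ${pct}% full"
}
```

For the setup in this post, check_disk /var/lib/etcd would report on the affected partition, and du -sh /var/lib/etcd/default/member shows how much of that space etcd itself is using.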

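Ways to free up disk space

  1. Expand the disk, then restart the etcd service
  2. Delete the local etcd database, then restart the etcd service

Expanding the disk is somewhat involved, so I chose to delete the local etcd database to restore service. Be aware that this discards everything stored in this etcd instance. Example commands: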

root@debian:/lib/systemd# rm -rf /var/lib/etcd/default/member/
root@debian:/lib/systemd# systemctl start etcd

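Why did this happen?

The immediate cause is that the local disk is small. The etcd database kept growing until it filled the disk; etcd could no longer persist data and exited abnormally, so it stopped serving requests and clients saw connection failures and timeouts.

The root cause is that the keys used for message passing were never deleted and were updated continuously. etcd records every revision of a key in its database, so sustained high-frequency traffic made the database grow steadily until the disk filled up and etcd exited.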
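The growth described above can be reproduced on a test instance. The sketch below assumes the etcd v3 API and the default local endpoint; the function names and the value-N payload are hypothetical placeholders, not the original test script.

```shell
#!/bin/sh
# Assumption: the v3 API is in use. etcdctl from etcd 3.3 defaults to the v2 API
# unless this variable is set.
export ETCDCTL_API=3

# Overwrite one key COUNT times. Each put adds a new revision to etcd's history.
put_repeatedly() {
    key=$1
    count=$2
    i=0
    while [ "$i" -lt "$count" ]; do
        etcdctl put "$key" "value-$i" >/dev/null || return 1
        i=$((i + 1))
    done
}

# Print endpoint status (including the DB SIZE column) before and after the writes.
show_growth() {
    etcdctl endpoint status --write-out=table
    put_repeatedly "$1" "$2" || return 1
    etcdctl endpoint status --write-out=table
}
```

Calling show_growth with a throwaway key and a large count should show DB SIZE increasing even though only a single key exists.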

Permanent fix

Delete each etcd key used for communication as soon as the exchange completes, so that the etcd database does not keep growing.
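A cleanup step along these lines might look like the sketch below. It assumes the etcd v3 API, where a delete is itself recorded as a new revision and older revisions remain in the database file until they are compacted, so the sketch pairs the delete with a compaction and defragmentation helper. The function names are hypothetical, and the revision extraction follows the pattern shown in the etcd maintenance documentation.

```shell
#!/bin/sh
# Assumption: the v3 API is in use.
export ETCDCTL_API=3

# Read a message key, delete it, and print the value that was read.
consume_and_delete() {
    key=$1
    value=$(etcdctl get "$key" --print-value-only) || return 1
    etcdctl del "$key" >/dev/null || return 1
    printf '%s\n' "$value"
}

# Extract the current store revision from "endpoint status" JSON read on stdin.
extract_revision() {
    grep -Eo '"revision":[0-9]+' | grep -Eo '[0-9]+' | head -n 1
}

# Discard history older than the current revision, then defragment the backend
# so the freed space is returned to the filesystem.
compact_and_defrag() {
    rev=$(etcdctl endpoint status --write-out=json | extract_revision)
    [ -n "$rev" ] || return 1
    etcdctl compact "$rev" || return 1
    etcdctl defrag
}
```

A consumer could call consume_and_delete for each message key and run compact_and_defrag periodically. Defragmentation blocks the member while it runs, so it is better scheduled during quiet periods; the etcd server flag --auto-compaction-retention is another option for the compaction part.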
