etcdkeeper couldn't connect to etcd; the error was: context deadline exceeded.
So I checked the logs and found these errors:
etcd | {"level":"warn","ts":"2023-03-17T05:16:03.412Z","caller":"rafthttp/probing_status.go:68","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_SNAPSHOT","remote-peer-id":"3ac7ac36ea1af652","rtt":"0s","error":"dial tcp 10.10.239.33:2380: i/o timeout"}
etcd | {"level":"warn","ts":"2023-03-17T05:16:03.412Z","caller":"rafthttp/probing_status.go:68","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_RAFT_MESSAGE","remote-peer-id":"3ac7ac36ea1af652","rtt":"0s","error":"dial tcp 10.10.239.33:2380: i/o timeout"}
etcd | {"level":"warn","ts":"2023-03-17T05:16:03.414Z","caller":"rafthttp/probing_status.go:68","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_SNAPSHOT","remote-peer-id":"f3379f772165359d","rtt":"0s","error":"dial tcp 10.10.239.32:2380: i/o timeout"}
etcd | {"level":"warn","ts":"2023-03-17T05:16:03.414Z","caller":"rafthttp/probing_status.go:68","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_RAFT_MESSAGE","remote-peer-id":"f3379f772165359d","rtt":"0s","error":"dial tcp 10.10.239.32:2380: i/o timeout"}
The question: the cluster-related settings are already commented out in the docker-compose file, so why is etcd still trying to dial other peers?
The compose file I was using:
etcd:
  # image: bitnami/etcd:latest
  image: bitnami/etcd:3.5.5
  container_name: etcd
  restart: always
  ports:
    - "2379:2379"
    - "2380:2380"
  environment:
    - ALLOW_NONE_AUTHENTICATION=yes # no password for this example
    # - ETCD_NAME=etcd1 # this node's own name
    - ETCD_ADVERTISE_CLIENT_URLS=http://etcd:2379 # client URL advertised to the cluster (the node's IP)
    # - ETCD_LISTEN_CLIENT_URLS=http://0.0.0.0:2379 # URLs to listen on for client traffic
    # - ETCD_INITIAL_ADVERTISE_PEER_URLS=http://10.10.239.31:2380 # peer URL advertised to the cluster
    # - ETCD_LISTEN_PEER_URLS=http://0.0.0.0:2380 # URLs to listen on for peer traffic
    # - ETCD_INITIAL_CLUSTER_TOKEN=etcd-cluster # initial cluster token
    # - ETCD_INITIAL_CLUSTER=etcd1=http://10.10.239.31:2380,etcd2=http://10.10.239.32:2380,etcd3=http://10.10.239.33:2380 # cluster members
    # - ETCD_INITIAL_CLUSTER=etcd1=http://10.10.239.31:2380, # cluster members
    # - ETCD_INITIAL_CLUSTER_STATE=new # initial cluster state
  volumes:
    - ./etcd/data:/bitnami/etcd
  networks:
    compose_network:
      ipv4_address: ${COMPOSE_NETWORK_PREFIX}.18
etcdkeeper:
  image: evildecay/etcdkeeper:latest
  container_name: etcdkeeper
  restart: always
  ports:
    - "7080:8080"
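For reference, a single-node bitnami etcd service only needs this much; a minimal sketch, keeping the same service name, image, and volume mount as the compose file above, with ETCD_ADVERTISE_CLIENT_URLS as the only topology-related setting:

```yaml
etcd:
  image: bitnami/etcd:3.5.5
  container_name: etcd
  restart: always
  ports:
    - "2379:2379"
  environment:
    - ALLOW_NONE_AUTHENTICATION=yes
    - ETCD_ADVERTISE_CLIENT_URLS=http://etcd:2379
  volumes:
    - ./etcd/data:/bitnami/etcd
```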
An etcd installed with brew install etcd ran fine, so why not the Docker one? I searched around a lot, swapped images, and even went into the container to look at the config files - nowhere was a URL like http://10.10.239.32:2380 configured. I had also downed and upped the container many times; each time it was a brand-new container, so why did it keep hitting the same error?
Then it dawned on me: the locally mounted volume must be carrying old data into each new container, which is why every new container kept failing the same way. And if a container started from a single-node etcd config still goes looking for peers, the only explanation is that I had once started it before commenting out those cluster-related environment variables, and the cluster state ended up in the data directory on the locally mounted volume.
Following that hunch, I looked at the data in the local volume - and sure enough:
hanpeng@hanpeng member % ls
snap wal
hanpeng@hanpeng member % pwd
/Users/hanpeng/dev/etcd/data/data/member
hanpeng@hanpeng member % tree
.
├── snap
│ └── db
└── wal
├── 0.tmp
└── 0000000000000000-0000000000000000.wal
cat db
��
����������ec�X��U��
���������[��Uv��%
� �D� 6 I ^u� ��alarmauth
authRevisionauthRolesauthUsersclusterkeyleasemembers0Q�R�R3ac7ac36ea1af652{"id":4235543326421087826,"peerURLs":["http://10.10.239.33:2380"],"name":"etcd3"}e7b147006e212ca5{"id":16695203360812379301,"peerURLs":["http://10.10.239.31:2380"],"name":"etcd1"}f3379f772165359d{"id":17525651808945780125,"peerURLs":["http://10.10.239.32:2380"],"name":"etcd2"}members_removedmeta0 ]confState{"voters":[4235543326421087826,16695203360812379301,17525651808945780125],"auto_leave":false}consistent_indexterm
(the same member records repeat twice more in the dump)
hanpeng@hanpeng wal % cat 0000000000000000-0000000000000000.wal
��ل�����ч�����s߯��"�������:"Q{"id":4235543326421087826,"peerURLs":["http://10.10.239.33:2380"],"name":"etcd3"}u���"�ل�����"R{"id":16695203360812379301,"peerURLs":["http://10.10.239.31:2380"],"name":"etcd1"}u���"�딋����"R{"id":17525651808945780125,"peerURLs":["http://10.10.239.32:2380"],"name":"etcd2"}�ȫ%
Both the db file and the 0000000000000000-0000000000000000.wal file still contain the old peers' URLs - no wonder etcd kept dialing them even though the config was commented out.
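Instead of cat-ing the raw binary files, grep -a (treat binary as text) pulls out just the embedded member records. A sketch using a stand-in file, since the real file here lives at ./etcd/data/data/member/snap/db:

```shell
# Build a stand-in for the binary snap/db file with one embedded member record
printf 'xx\x00{"id":1,"peerURLs":["http://10.10.239.33:2380"],"name":"etcd3"}\x00yy' > /tmp/db
# -a: treat the binary file as text; -o: print only the matching part
grep -ao 'peerURLs[^]]*]' /tmp/db
# → peerURLs":["http://10.10.239.33:2380"]
```

Running the same grep against the real snap/db would list every stale peer URL without the surrounding binary noise.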
After deleting (or emptying) the data directory and bringing the stack up again, everything ran fine and etcdkeeper could connect.
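The cleanup step can be sketched as a small guard function; the function name is mine, and the path matches the ./etcd/data volume mount above, under which the container creates data/member:

```shell
# Hypothetical helper: remove an etcd data dir that still holds raft state
# (the member/ directory with snap and wal) left over from an earlier run.
wipe_stale_etcd_data() {
    dir=$1
    if [ -d "$dir/member" ]; then
        echo "removing stale member state in $dir"
        rm -rf "$dir"
    fi
}

# With the compose file above, the mounted state lives here:
# wipe_stale_etcd_data ./etcd/data/data
```

Run it between `docker compose down` and `docker compose up -d` whenever the cluster topology in the environment variables changes.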
One more note: the etcd address entered in etcdkeeper must be the host's LAN IP, e.g. 192.168.3.35:2379, not 127.0.0.1:2379.