etcd Cluster Scale-Out and Failed-Node Replacement

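This article describes how to scale out an etcd cluster: provisioning new nodes, setting the etcdctl API version, backing up data, adding the new member, and updating the cluster startup parameters. It also covers replacing a failed node: removing the failed member, adding a new one, checking cluster status, and updating the remaining nodes so the cluster keeps running normally.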

Cluster information:

isLeader  Hostname      Host IP
false     k8s-master01  10.211.55.3
false     k8s-master02  10.211.55.4
true      k8s-master03  10.211.55.5

1. Cluster Scale-Out

1.1 Provision two machines or containers with the same configuration as the existing etcd nodes, k8s-master04 (10.211.55.6) and k8s-master05 (10.211.55.7), to host the new members.

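1.2 Since version 3.x, etcdctl can talk to the cluster through either the v2 or the v3 API. My etcdctl is 3.3.12 and the etcd cluster is also 3.0 or later, so the v3 API is used here. Note that a TLS-enabled cluster requires client certificates under both APIs; only the flag names differ (--ca-file, --cert-file and --key-file for v2; --cacert, --cert and --key for v3). The commands in this article are shown without those flags, so add them if your endpoints require client certificates.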

[root@k8s-master01 ~]# etcdctl -v
etcdctl version: 3.3.12
API version: 2

Set the API version used by etcdctl:

[root@k8s-master01 ~]# export ETCDCTL_API=3
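
If the cluster's client endpoints require TLS, etcdctl under the v3 API can read its connection settings from ETCDCTL_-prefixed environment variables instead of repeating flags on every command. A minimal sketch follows; the certificate paths are hypothetical placeholders, not taken from this cluster, and should be replaced with the real file locations.

```shell
# Use the v3 API for every etcdctl call in this shell session.
export ETCDCTL_API=3
# Client endpoints of the three existing members (client port 2379).
export ETCDCTL_ENDPOINTS="https://10.211.55.3:2379,https://10.211.55.4:2379,https://10.211.55.5:2379"
# Hypothetical certificate locations; adjust to your deployment.
export ETCDCTL_CACERT="/etc/etcd/ssl/ca.pem"
export ETCDCTL_CERT="/etc/etcd/ssl/etcd.pem"
export ETCDCTL_KEY="/etc/etcd/ssl/etcd-key.pem"
```

With these exported, the etcdctl commands in the rest of the article can be run unchanged.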

List the cluster members:

[root@k8s-master01 ~]# etcdctl --write-out=table  member list
+------------------+---------+--------+--------------------------+--------------------------+
|        ID        | STATUS  |  NAME  |        PEER ADDRS        |       CLIENT ADDRS       |
+------------------+---------+--------+--------------------------+--------------------------+
| 78a30be28647bb39 | started | infra2 | https://10.211.55.4:2380 | https://10.211.55.4:2379 |
| caf3e8da9311ea76 | started | infra3 | https://10.211.55.5:2380 | https://10.211.55.5:2379 |
| fcf8f1c47e6d256c | started | infra1 | https://10.211.55.3:2380 | https://10.211.55.3:2379 |
+------------------+---------+--------+--------------------------+--------------------------+

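1.3 Log in to one of the cluster nodes and back up the data. Because ETCDCTL_API=3 is set, use the v3 snapshot save subcommand; the v2 backup subcommand is not available under the v3 API.

[root@k8s-master01 ~]# etcdctl snapshot save /tmp/etcd_backup.db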
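
With the v3 API, a backup can be wrapped in a small helper that creates the target directory and timestamps each snapshot. This is only a sketch: the function name backup_etcd and the ETCDCTL override variable are assumptions introduced here for illustration, not part of etcd.

```shell
# Save a timestamped v3 snapshot into the given directory.
# ETCDCTL may be overridden (for example for a dry run); it defaults to etcdctl.
backup_etcd() {
    backup_dir="$1"
    # Create the target directory if it does not exist yet.
    mkdir -p "$backup_dir" || return 1
    snapshot="$backup_dir/snapshot-$(date +%Y%m%d%H%M%S).db"
    # Force the v3 API for this one call, regardless of the session setting.
    ETCDCTL_API=3 "${ETCDCTL:-etcdctl}" snapshot save "$snapshot"
}
```

For example, backup_etcd /tmp/etcd_backup writes a file such as /tmp/etcd_backup/snapshot-<timestamp>.db, which can then be inspected with etcdctl snapshot status.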

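1.4 Add the first new node from k8s-master01 using the etcdctl member add subcommand. Running etcdctl member add --help prints the output below. <memberName> is the name given to the new node; it usually just continues the NAME sequence shown by etcdctl --write-out=table member list. [options] is --peer-urls="", whose value is the peer URL (IP and port) of the node being added.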

[root@k8s-master01 ~]# etcdctl member add --help
NAME:
	member add - Adds a member into the cluster

USAGE:
	etcdctl member add <memberName> [options]

OPTIONS:
      --peer-urls=""	comma separated peer URLs for the new member.

GLOBAL OPTIONS:
      --cacert=""				verify certificates of TLS-enabled secure servers using this CA bundle
      --cert=""					identify secure client using this TLS certificate file
      --command-timeout=5s			timeout for short running command (excluding dial timeout)
      --debug[=false]				enable client-side debug logging
      --dial-timeout=2s				dial timeout for client connections
  -d, --discovery-srv=""			domain name to query for SRV records describing cluster endpoints
      --endpoints=[127.0.0.1:2379]		gRPC endpoints
      --hex[=false]				print byte strings as hex encoded strings
      --insecure-discovery[=true]		accept insecure SRV records describing cluster endpoints
      --insecure-skip-tls-verify[=false]	skip server certificate verification
      --insecure-transport[=true]		disable transport security for client connections
      --keepalive-time=2s			keepalive time for client connections
      --keepalive-timeout=6s			keepalive timeout for client connections
      --key=""					identify secure client using this TLS key file
      --user=""					username[:password] for authentication (prompt if password is not supplied)
  -w, --write-out="simple"			set the output format (fields, json, protobuf, simple, table)

The actual command and its output:

[root@k8s-master01 ~]# etcdctl member add infra4 --peer-urls="https://10.211.55.6:2380"
Member  29c2088642a8819 added to cluster 20d679faaae7becc

ETCD_NAME="infra4"
ETCD_INITIAL_CLUSTER="infra4=https://10.211.55.6:2380,infra2=https://10.211.55.4:2380,infra3=https://10.211.55.5:2380,infra1=https://10.211.55.3:2380"
ETCD_INITIAL_ADVERTISE_PEER_URLS="https://10.211.55.6:2380"
ETCD_INITIAL_CLUSTER_STATE="existing"
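
The ETCD_* lines printed by member add are the settings the new node needs at startup, so it can be convenient to capture them rather than retype them. A minimal sketch, assuming the output format shown above; the helper name extract_etcd_env and the file name infra4.env are made up for this example.

```shell
# Read the output of "etcdctl member add" on stdin and keep only the
# ETCD_* assignments, dropping the "Member ... added" line and blanks.
extract_etcd_env() {
    grep '^ETCD_'
}

# Possible usage (not run here):
#   etcdctl member add infra4 --peer-urls="https://10.211.55.6:2380" \
#       | extract_etcd_env > infra4.env
```

The resulting file can then be copied to the new node and merged into its etcd configuration.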

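Check the cluster status after the add. Because the new node has not been started yet, its status is unstarted. The order is the key point: first register the node with the cluster, then start it, and update its etcd configuration before starting it.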

[root@k8s-master01 ~]# etcdctl --write-out=table  member list
+------------------+-----------+--------+--------------------------+--------------------------+
|        ID        |  STATUS   |  NAME  |        PEER ADDRS        |       CLIENT ADDRS       |
+------------------+-----------+--------+--------------------------+--------------------------+
|  29c2088642a8819 | unstarted |        | https://10.211.55.6:2380 |                          |
| 78a30be28647bb39 |   started | infra2 | https://10.211.55.4:2380 | https://10.211.55.4:2379 |
| caf3e8da9311ea76 |   started | infra3 | https://10.211.55.5:2380 | https://10.211.55.5:2379 |
| fcf8f1c47e6d256c |   started | infra1 | https://10.211.55.3:2380 | https://10.211.55.3:2379 |
+------------------+-----------+--------+--------------------------+--------------------------+
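
When this check is scripted, the table can be filtered for members that are not yet started. The following is a rough sketch that parses the table layout shown above; the helper name unstarted_members is an assumption for illustration, and the parsing depends on ID and STATUS being the first two columns.

```shell
# Read "etcdctl --write-out=table member list" output on stdin and print
# the ID of every member whose STATUS column is not "started".
unstarted_members() {
    awk -F'|' '
        NF >= 6 {                       # data and header rows contain "|"
            id = $2; status = $3
            gsub(/ /, "", id); gsub(/ /, "", status)
            # Skip the header row; report anything not yet started.
            if (id != "ID" && status != "started") print id
        }'
}

# Possible usage (not run here):
#   etcdctl --write-out=table member list | unstarted_members
```

An empty result means every member reports started.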

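1.5 Update the etcd startup parameters on the new node. Starting from the existing settings, extend the member list with the new node's entry:
--initial-cluster infra1=https://10.211.55.3:2380,infra2=https://10.211.55.4:2380,infra3=https://10.211.55.5:2380,infra4=https://10.211.55.6:2380
Then change --initial-cluster-state new to --initial-cluster-state existing in the startup parameters.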

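1.6 Start the etcd instance on the new node, then confirm that the cluster is healthy and that the new member has joined successfully. Repeat steps 1.4 through 1.6 for the second new node, k8s-master05.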
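
One way to confirm the result is to query the health of every member, including the new one, through the v3 endpoint health subcommand. The sketch below only builds the endpoint list; the helper name endpoints_for is hypothetical, and it assumes every member serves clients over https on port 2379, as in the member list above.

```shell
# Build a comma-separated list of client endpoints (port 2379) from IPs.
endpoints_for() {
    list=""
    for ip in "$@"; do
        # Prepend a comma only when the list already has an entry.
        list="${list:+$list,}https://$ip:2379"
    done
    printf '%s\n' "$list"
}

# Possible usage (not run here):
#   etcdctl --endpoints="$(endpoints_for 10.211.55.3 10.211.55.4 10.211.55.5 10.211.55.6)" endpoint health
```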

2. Replacing a Failed Node

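Now suppose the etcd instance on k8s-master02 has died and cannot be recovered. The steps to replace this node are as follows.
2.1 Remove the failed member k8s-master02 from the cluster with etcdctl member remove, passing its member ID.
This step is the key one. If a new node is added without removing the failed one first, the cluster would have two unavailable members (the failed node, plus the new node, which is not yet started when it is added). With two of four members unavailable, only two remain healthy, fewer than the quorum of three, so etcd will normally reject the add request.
2.2 After removing the old member, add the new node by following the scale-out steps above. Once it has been added and started, be sure to run etcdctl --write-out=table member list and confirm that the cluster has three members and that every status is started.
2.3 Finally, update the configuration of the two remaining old nodes: replace the failed node's entry with the new node's, using the name given to the new member, then restart the old nodes one at a time. This completes the replacement.
2.4 Use etcdctl to test read and write operations, for example by writing a key with etcdctl put and reading it back with etcdctl get.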
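
The reasoning behind removing the failed member first comes down to quorum arithmetic, which can be sketched as follows. This is a simplified model of the calculation, not etcd's actual implementation; the function names quorum and can_add_member are made up for this example.

```shell
# Quorum for a cluster of n members: floor(n/2) + 1.
quorum() {
    echo $(( $1 / 2 + 1 ))
}

# Succeeds if adding one member keeps quorum, given the current member
# count and how many of those members are healthy (started).
# The new member counts toward the size but is not yet healthy.
can_add_member() {
    members="$1"
    healthy="$2"
    [ "$healthy" -ge "$(quorum $(( members + 1 )))" ]
}

# Failed node still registered: 3 members, 2 healthy; adding needs 3 healthy.
can_add_member 3 2 || echo "add rejected: remove the failed member first"
# Failed node removed: 2 members, 2 healthy; adding needs only 2 healthy.
can_add_member 2 2 && echo "add allowed"
```

In the first case the four-member cluster would need three healthy members but has only two; after the removal, the resulting three-member cluster needs two and has two.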
