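etcd, a core Kubernetes component: installation, usage, snapshots, and etcdctl commands in detail (single node, multiple nodes, and adding a new node)

Article link: https://blog.csdn.net/cuichongxin/article/details/118701969

Contents

Overview
Basic inspection commands
Using etcd on a single node (create, read, update, delete)
Multi-node configuration
etcd snapshots (snap)
etcd inside Kubernetes containers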

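Overview

The etcd concepts and command-line parameters are already covered in detail in another post:

etcd, a core Kubernetes component: features in detail (with explanations of all etcd parameters)

Note: use an odd number of etcd nodes, and at least 3.
etcd relies on Raft, in which every node is a Follower, a Candidate, or the Leader, and a write is only committed once a majority of the members agree. A 3-node cluster survives 1 failure, a 4-node cluster still survives only 1, and a cluster of 1 or 2 nodes survives none, so too few nodes or an even number can get etcd into trouble.
I use three machines for testing: etcd1 (192.168.59.156), etcd2 (192.168.59.157), and etcd3 (192.168.59.158).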
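
Raft needs a majority of the members to agree before a write is committed. As a rough illustration of why odd cluster sizes are preferred, the sketch below computes the majority size and the number of failures a cluster can survive; the function names are my own and not part of etcd.

```shell
#!/bin/sh
# quorum N: smallest majority of an N-member cluster
quorum() { echo $(( $1 / 2 + 1 )); }

# tolerance N: members that may fail while a majority is still reachable
tolerance() { echo $(( ($1 - 1) / 2 )); }

for n in 1 2 3 4 5; do
  echo "members=$n quorum=$(quorum "$n") tolerated_failures=$(tolerance "$n")"
done
```

Going from 3 to 4 members raises the quorum without raising the tolerated failures, which is the reason for the odd-number advice.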

Basic inspection commands

Checking the version

[root@etcd1 ~]# curl -L http://192.168.59.156:2379/version
{"etcdserver":"3.3.11","etcdcluster":"3.3.0"}
[root@etcd1 ~]#
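
For scripts that need only the server version, the JSON reply can be trimmed with sed. This is a minimal sketch that assumes the reply has the same shape as the one shown above; parse_etcd_version is a name I made up.

```shell
#!/bin/sh
# parse_etcd_version: read the /version JSON on stdin, print the etcdserver field
parse_etcd_version() {
  sed -n 's/.*"etcdserver":"\([^"]*\)".*/\1/p'
}

# Sample reply copied from the session above
echo '{"etcdserver":"3.3.11","etcdcluster":"3.3.0"}' | parse_etcd_version

# Against a live node it would be used like this:
# curl -sL http://192.168.59.156:2379/version | parse_etcd_version
```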

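Viewing the Prometheus metrics exposed by etcd

Prometheus scrapes this endpoint when it monitors etcd.
Note: when the client port is secured with TLS, the metrics on it are scraped over HTTPS; this test node has no certificates, so plain HTTP is used below.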

[root@etcd1 ~]# curl -L http://192.168.59.156:2379/metrics
...
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 2.3547904e+07
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1.62616692549e+09
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 1.158846464e+10
...
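
To pick a single value out of the metrics text, a small awk filter is enough. The sketch below assumes the plain "name value" line format shown above (metrics with labels would need a different match); metric_value is a hypothetical helper name.

```shell
#!/bin/sh
# metric_value NAME: read Prometheus text format on stdin, print the value of NAME
metric_value() {
  awk -v name="$1" '$1 == name { print $2 }'
}

# Sample lines copied from the output above
printf '%s\n' \
  '# TYPE process_resident_memory_bytes gauge' \
  'process_resident_memory_bytes 2.3547904e+07' |
  metric_value process_resident_memory_bytes

# Against a live node:
# curl -sL http://192.168.59.156:2379/metrics | metric_value process_resident_memory_bytes
```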

Checking node health

Command: etcdctl cluster-health

[root@etcd1 ~]# etcdctl cluster-health
member 8e9e05c52164694d is healthy: got healthy result from http://localhost:2379
cluster is healthy
[root@etcd1 ~]#

If the last line reads "cluster is healthy", everything is fine.
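
In a script, the same check can be done by looking for that line in the output. A minimal sketch, assuming the version 2 output format shown above; cluster_is_healthy is my own helper name.

```shell
#!/bin/sh
# cluster_is_healthy: succeed if stdin contains the line "cluster is healthy"
cluster_is_healthy() {
  grep -q '^cluster is healthy$'
}

# Sample output copied from the session above
printf '%s\n' \
  'member 8e9e05c52164694d is healthy: got healthy result from http://localhost:2379' \
  'cluster is healthy' |
  cluster_is_healthy && echo "healthy"

# Against a live node:
# etcdctl cluster-health | cluster_is_healthy
```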

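Listing the current members

Command: etcdctl member list
Note: the member command also has subcommands for adding and removing a node.
It prints information about every member (I have not added any other node yet, so only this one is listed).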

[root@etcd1 ~]# etcdctl member list
8e9e05c52164694d: name=default peerURLs=http://localhost:2380 clientURLs=http://localhost:2379 isLeader=true
[root@etcd1 ~]#

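Field descriptions:
- peerURLs: the addresses this node exposes to the other members for peer communication
- clientURLs: the addresses this node serves clients on
- isLeader: whether this node is currently the leader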
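
The fields are easy to pick apart in a script. The sketch below prints the name of the current leader from version 2 member list output; it assumes the line format shown above, and leader_name is a name I chose.

```shell
#!/bin/sh
# leader_name: read "etcdctl member list" (v2 format) on stdin, print the leader's name
leader_name() {
  awk '/isLeader=true/ {
    for (i = 1; i <= NF; i++)
      if ($i ~ /^name=/) { sub(/^name=/, "", $i); print $i }
  }'
}

# Sample line copied from the output above
echo '8e9e05c52164694d: name=default peerURLs=http://localhost:2380 clientURLs=http://localhost:2379 isLeader=true' |
  leader_name

# Against a live node:
# etcdctl member list | leader_name
```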

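Adding a member

See "Adding a new node to the cluster" in the multi-node configuration section below for a detailed walkthrough.

Removing a member

Command: etcdctl member remove ID (get the ID from etcdctl member list; it is the hexadecimal string at the start of each line)
For example, I remove member 158: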

[root@etcd1 ~]# etcdctl member list | grep 158
240bc0d12da09d72: name=etcd-158 peerURLs=http://192.168.59.158:2380 clientURLs=http://192.168.59.158:2379,http://localhost:2379 isLeader=false
[root@etcd1 ~]#
[root@etcd1 ~]# etcdctl member remove 240bc0d12da09d72
Removed member 240bc0d12da09d72 from cluster
[root@etcd1 ~]#
[root@etcd1 ~]# etcdctl member list | grep 158
[root@etcd1 ~]#
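
Grepping for 158 works here, but it could also match part of an ID or another address. A slightly stricter sketch looks the ID up by the exact member name; member_id_by_name is a hypothetical helper and assumes the version 2 output format shown above.

```shell
#!/bin/sh
# member_id_by_name NAME: read "etcdctl member list" (v2) on stdin, print NAME's member ID
member_id_by_name() {
  awk -v want="name=$1" '$2 == want { sub(/:$/, "", $1); print $1 }'
}

# Sample line copied from the session above
echo '240bc0d12da09d72: name=etcd-158 peerURLs=http://192.168.59.158:2380 clientURLs=http://192.168.59.158:2379,http://localhost:2379 isLeader=false' |
  member_id_by_name etcd-158

# Against a live cluster, the result could feed the remove command:
# etcdctl member remove "$(etcdctl member list | member_id_by_name etcd-158)"
```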

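Backing up the whole etcd data directory

The snapshot restore described later clears the data directory first, so a full data-directory backup and a snapshot restore do not go together; this part only shows how etcdctl backup is used and where etcd keeps its data by default.
The default data directory is under /var/lib/etcd/ (the log line in the output below complains about a missing member/snap, most likely because with the configuration in this post the data actually sits in /var/lib/etcd/default.etcd).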

[root@etcd1 ~]# etcdctl backup --data-dir /var/lib/etcd --backup-dir /tmp/etcd
2021-07-14 10:40:01.869919 I | open /var/lib/etcd/member/snap: no such file or directory
[root@etcd1 ~]# cd /tmp/etcd
[root@etcd1 etcd]# ls
member
[root@etcd1 etcd]#

Parameters:
- --data-dir: location of the data directory to back up
- --backup-dir: location to write the backup to

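Using etcd on a single node (create, read, update, delete)

Installing and configuring a single etcd node

Installing the etcd package

Install the etcd package: yum -y install etcd

Editing the configuration file

Of the 3 virtual machines, only one is used here (this is the single-node setup).
After installation the configuration file is here: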

[root@etcd1 etcd]# cd /etc/etcd/
[root@etcd1 etcd]# ls
etcd.conf
[root@etcd1 etcd]#

Back up the original file before editing:

[root@etcd1 etcd]# cp /etc/etcd/etcd.conf /

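Every parameter in the configuration file is explained in the other post, so compare against it yourself; here I only explain the lines I change.
Blog link: ......
The changes are as follows: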

[root@etcd1 etcd]# vim /etc/etcd/etcd.conf
#[Member]
# Storage location; it can be customized (I keep the default)
ETCD_DATA_DIR="/var/lib/etcd/default.etcd"
# Uncomment the 2380 line below, and add this host's own ip:port to both the 2380 and the 2379 line (the default is the loopback address only)
ETCD_LISTEN_PEER_URLS="http://localhost:2380,http://192.168.59.156:2380"
ETCD_LISTEN_CLIENT_URLS="http://localhost:2379,http://192.168.59.156:2379"
#
# The etcd member name; it can be customized (I keep the default)
ETCD_NAME="default"
#[Clustering]
# Add this host's IP below so that the address is advertised to clients
ETCD_ADVERTISE_CLIENT_URLS="http://localhost:2379,http://192.168.59.156:2379"
#ETCD_DISCOVERY=""
# The other parameters can stay as they are; save and quit
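
The same edits can be made without opening an editor. The sketch below is one possible way to set a KEY="value" line with sed, uncommenting it if needed; set_conf is my own helper, and it assumes the value contains no "|" or "&" characters (true for the URLs used here). The demo works on a temporary file, not on /etc/etcd/etcd.conf.

```shell
#!/bin/sh
# set_conf KEY VALUE FILE: set KEY="VALUE" in an etcd.conf-style file,
# uncommenting the line if it was commented out.
set_conf() {
  key=$1 value=$2 file=$3
  tmp=$(mktemp) || return 1
  sed "s|^#*[[:space:]]*${key}=.*|${key}=\"${value}\"|" "$file" > "$tmp" &&
    cat "$tmp" > "$file"   # overwrite in place so owner and mode are kept
  rm -f "$tmp"
}

# Demo on a scratch copy with one commented-out line
conf=$(mktemp)
printf '%s\n' '#ETCD_LISTEN_PEER_URLS="http://localhost:2380"' 'ETCD_NAME="default"' > "$conf"
set_conf ETCD_LISTEN_PEER_URLS "http://localhost:2380,http://192.168.59.156:2380" "$conf"
cat "$conf"
rm -f "$conf"
```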

Enabling the service at boot

This is not the first time I run it, so nothing is printed.

[root@etcd1 ~]# systemctl enable etcd --now
[root@etcd1 ~]#
[root@etcd1 ~]# systemctl is-active etcd
active
[root@etcd1 ~]#

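Getting help

The built-in help explains the usage; it is worth reading.
- Option 1: etcdctl --help
- Option 2: man etcdctl
etcdctl can talk to etcd through two API versions, version 2 and version 3; the etcdctl of etcd 3.3 used here defaults to version 2, while etcd 3.4 and later default to version 3.
Note: version 2 and version 3 write data in different ways and keep separate data, so they cannot be mixed.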

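Version 2

Version 2 is the default here; if you have switched to version 3, use the command below to switch back to the version 2 environment.
Command: export ETCDCTL_API=2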

Listing all commands

Run: etcdctl --help
Scroll down to the COMMANDS section, which lists every version 2 command (they resemble Linux file commands).

COMMANDS:
     backup          backup an etcd directory
     cluster-health  check the health of the etcd cluster
     mk              make a new key with a given value
     mkdir           make a new directory
     rm              remove a key or a directory
     rmdir           removes the key if it is an empty directory or a key-value pair
     get             retrieve the value of a key
     ls              retrieve a directory
     set             set the value of a key
     setdir          create a new directory or update an existing directory TTL
     update          update an existing key with a given value
     updatedir       update an existing directory
     watch           watch a key for changes
     exec-watch      watch a key for changes and exec an executable
     member          member add, remove and list subcommands
     user            user add, grant and revoke subcommands
     role            role add, grant and revoke subcommands
     auth            overall auth controls
     help, h         Shows a list of commands or help for one command

Example: create, list, delete

[root@etcd1 ~]# etcdctl ls /
[root@etcd1 ~]# 
[root@etcd1 ~]# etcdctl mkdir /aa
[root@etcd1 ~]# 
[root@etcd1 ~]# etcdctl ls /
/aa
[root@etcd1 ~]# 
[root@etcd1 ~]# etcdctl rmdir /aa
[root@etcd1 ~]# 
[root@etcd1 ~]# etcdctl ls /
[root@etcd1 ~]#

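Accessing this etcd from another node

Strictly speaking, this is just running etcdctl commands against a remote etcd (it is remote access only; the calling host does not become part of the cluster).
Command: etcdctl --endpoints http://IP:2379 followed by the command to run (the version 2 and version 3 commands differ)
Below, I view the data of the etcd on 156 from 157: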

[root@etcd2 ~]# etcdctl --endpoints http://192.168.59.156:2379 ls /
/aa
[root@etcd2 ~]# 
[root@etcd2 ~]# ip a | grep 59
    inet 192.168.59.157/24 brd 192.168.59.255 scope global ens32
[root@etcd2 ~]#

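Version 3

Set one environment variable and etcdctl uses version 3 from then on.
Command: export ETCDCTL_API=3

Listing all commands

After setting the version 3 environment variable, I run: etcdctl --help
Scroll down to the COMMANDS section, which lists every version 3 command (this one feels more like a database); there are many more than in version 2, and the usage has changed as well.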

COMMANDS:
        get                     Gets the key or a range of keys
        put                     Puts the given key into the store
        del                     Removes the specified key or range of keys [key, range_end)
        txn                     Txn processes all the requests in one transaction
        compaction              Compacts the event history in etcd
        alarm disarm            Disarms all alarms
        alarm list              Lists all alarms
        defrag                  Defragments the storage of the etcd members with given endpoints
        endpoint health         Checks the healthiness of endpoints specified in `--endpoints` flag
        endpoint status         Prints out the status of endpoints specified in `--endpoints` flag
        endpoint hashkv         Prints the KV history hash for each endpoint in --endpoints
        move-leader             Transfers leadership to another etcd cluster member.
        watch                   Watches events stream on keys or prefixes
        version                 Prints the version of etcdctl
        lease grant             Creates leases
        lease revoke            Revokes leases
        lease timetolive        Get lease information
        lease list              List all active leases
        lease keep-alive        Keeps leases alive (renew)
        member add              Adds a member into the cluster
        member remove           Removes a member from the cluster
        member update           Updates a member in the cluster
        member list             Lists all members in the cluster
        snapshot save           Stores an etcd node backend snapshot to a given file
        snapshot restore        Restores an etcd member snapshot to an etcd directory
        snapshot status         Gets backend snapshot status of a given file
        make-mirror             Makes a mirror at the destination etcd cluster
        migrate                 Migrates keys in a v2 store to a mvcc store
        lock                    Acquires a named lock
        elect                   Observes and participates in leader election
        auth enable             Enables authentication
        auth disable            Disables authentication
        user add                Adds a new user
        user delete             Deletes a user
        user get                Gets detailed information of a user
        user list               Lists all users
        user passwd             Changes password of user
        user grant-role         Grants a role to a user
        user revoke-role        Revokes a role from a user
        role add                Adds a new role
        role delete             Deletes a role
        role get                Gets detailed information of a role
        role list               Lists all roles
        role grant-permission   Grants a key to a role
        role revoke-permission  Revokes a key from a role
        check perf              Check the performance of the etcd cluster
        help

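Example: create, read, delete

Version 3 stores data as flat key-value pairs; a path such as /ccx/date1 is simply the name of a key.

[root@etcd1 ~]# # /ccx/date1 is the key, "ccx is superhero" is the value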
[root@etcd1 ~]# etcdctl put /ccx/date1 "ccx is superhero"
OK
[root@etcd1 ~]# etcdctl get /ccx/date1
/ccx/date1
ccx is superhero
[root@etcd1 ~]# 
[root@etcd1 ~]# etcdctl del /ccx/date1
1
[root@etcd1 ~]# 
[root@etcd1 ~]# etcdctl get /ccx/date1
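
As the get output above shows, version 3 prints the key on one line and the value on the next. When the output is consumed by a script, it can be folded into key=value lines; the sketch below assumes single-line values, and pair_lines is a name I made up.

```shell
#!/bin/sh
# pair_lines: turn alternating key / value lines on stdin into key=value lines
pair_lines() {
  awk 'NR % 2 == 1 { key = $0; next } { print key "=" $0 }'
}

# Sample output copied from the session above
printf '%s\n' '/ccx/date1' 'ccx is superhero' | pair_lines

# Against a live node (version 3 environment):
# etcdctl get /ccx/date1 | pair_lines
```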

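Accessing this etcd from another node

Strictly speaking, this is just running etcdctl commands against a remote etcd (it is remote access only; the calling host does not become part of the cluster).
Command: etcdctl --endpoints http://IP:2379 followed by the command to run (the version 2 and version 3 commands differ)
Below, I view the data of the etcd on 156 from 157:

[root@etcd2 ~]# # to read version 3 data, the local etcdctl must be switched to version 3 as well, otherwise the command is not found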
[root@etcd2 ~]# export ETCDCTL_API=3
[root@etcd2 ~]# etcdctl --endpoints http://192.168.59.156:2379 get /ccx/date1
/ccx/date1
ccx is superhero
[root@etcd2 ~]# 
[root@etcd2 ~]# 
[root@etcd2 ~]# ip a | grep 59
    inet 192.168.59.157/24 brd 192.168.59.255 scope global ens32
[root@etcd2 ~]#

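Unsetting the version (back to the default)

For example, I set version 3 earlier and now want to go back to the default (version 2 for this etcdctl).
Note: Kubernetes 1.6 and later write to etcd with version 3 by default.
Command: unset ETCDCTL_API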

[root@etcd1 ~]# export ETCDCTL_API=3
[root@etcd1 ~]# 
[root@etcd1 ~]# unset ETCDCTL_API
[root@etcd1 ~]#
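
If only a single command needs a different API version, exporting and unsetting can be avoided by setting the variable for that one command. A small sketch; with_api is my own wrapper name, and the demo uses sh instead of etcdctl so that it runs anywhere.

```shell
#!/bin/sh
# with_api VERSION CMD...: run CMD with ETCDCTL_API=VERSION, leaving the current shell untouched
with_api() {
  version=$1
  shift
  env ETCDCTL_API="$version" "$@"
}

# Demo: the variable is visible inside the command only
with_api 3 sh -c 'echo "inside: ETCDCTL_API=$ETCDCTL_API"'

# With etcdctl it would be used like this:
# with_api 3 etcdctl get /ccx/date1
# with_api 2 etcdctl ls /
```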

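Multi-node configuration

As mentioned above, 3 machines are best; I join 2 of them through the configuration file and add the third with the member add command.

Configuring the first node (the "main" node)

Note: etcd has no fixed main node; this is simply the first node I configure, and I call it the "main node" to make the rest easier to describe.
First stop the etcd service and clear the existing data (I am reusing the single-node setup above, where etcd is already configured and running).
Note: the old data must be cleared, otherwise etcd reports an error.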

[root@etcd1 ~]# systemctl stop etcd
[root@etcd1 ~]# rm -rf /var/lib/etcd/*
[root@etcd1 ~]# 
[root@etcd1 ~]# ls /var/lib/etcd

Editing the configuration file

File: /etc/etcd/etcd.conf
As you can see, only these few lines are active in the configuration file right now:

[root@etcd1 ~]# grep -o '^[^#].*' /etc/etcd/etcd.conf 
ETCD_DATA_DIR="/var/lib/etcd/default.etcd"
ETCD_LISTEN_PEER_URLS="http://localhost:2380,http://192.168.59.156:2380"
ETCD_LISTEN_CLIENT_URLS="http://localhost:2379,http://192.168.59.156:2379"
ETCD_NAME="default"
ETCD_ADVERTISE_CLIENT_URLS="http://localhost:2379,http://192.168.59.156:2379"
[root@etcd1 ~]#

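Open the file with vim /etc/etcd/etcd.conf, delete everything in it, paste the content below, and change the IPs; leave the rest as is.
- 192.168.59.156 is the IP of the current node; 192.168.59.157/158 are the nodes I am going to add to the cluster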

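[root@etcd1 ~]# vim /etc/etcd/etcd.conf 
# Data storage location (customizable)
ETCD_DATA_DIR="/var/lib/etcd/cluster.etcd" 
# Addresses to listen on for peer (server-to-server) traffic
ETCD_LISTEN_PEER_URLS="http://192.168.59.156:2380,http://localhost:2380"
# Addresses to listen on for client traffic
ETCD_LISTEN_CLIENT_URLS="http://192.168.59.156:2379,http://localhost:2379"
# Name of this member (customizable)
ETCD_NAME="etcd-156"
# Peer address advertised to the other members
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://192.168.59.156:2380"
# Client addresses advertised to clients
ETCD_ADVERTISE_CLIENT_URLS="http://localhost:2379,http://192.168.59.156:2379"
# List every member of the initial cluster (this node and the ones joining it); the name before each "=" must be that member's ETCD_NAME and cannot be arbitrary
ETCD_INITIAL_CLUSTER="etcd-156=http://192.168.59.156:2380,etcd-157=http://192.168.59.157:2380"
# Works like a shared secret (must be identical on every node of this cluster)
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster" 
# Must be "new" for a new cluster
ETCD_INITIAL_CLUSTER_STATE="new"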

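Parameter details:
- ETCD_DATA_DIR: path where the service stores its runtime data
- ETCD_NAME: name of this member; the default is "default"
- ETCD_LISTEN_PEER_URLS: addresses to listen on for peer traffic, e.g. http://ip:2380; separate multiple addresses with commas. Every other member must be able to reach it, so localhost alone is not enough!
- ETCD_LISTEN_CLIENT_URLS: addresses to listen on for client traffic
- ETCD_ADVERTISE_CLIENT_URLS: the client addresses of this member that are announced to the rest of the cluster
- ETCD_INITIAL_ADVERTISE_PEER_URLS: the peer addresses of this member that are announced to the rest of the cluster
- ETCD_INITIAL_CLUSTER: all members of the initial cluster, as name=peer-url pairs
- ETCD_INITIAL_CLUSTER_STATE: new when creating a cluster; existing when joining a cluster that already exists
- ETCD_INITIAL_CLUSTER_TOKEN: identifier of the cluster; when several clusters are run, each one must use a unique token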
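
Typing the ETCD_INITIAL_CLUSTER value by hand is error-prone, so it can be assembled from name=ip pairs instead. This is only a sketch: initial_cluster is a hypothetical helper, and it assumes plain HTTP and peer port 2380 as in this post.

```shell
#!/bin/sh
# initial_cluster NAME=IP...: print the ETCD_INITIAL_CLUSTER value for the given members
initial_cluster() {
  out=""
  for pair in "$@"; do
    name=${pair%%=*}   # text before the first "="
    ip=${pair#*=}      # text after the first "="
    out="${out:+$out,}${name}=http://${ip}:2380"
  done
  printf '%s\n' "$out"
}

initial_cluster etcd-156=192.168.59.156 etcd-157=192.168.59.157
```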

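Copying the configuration file to the other node

On the main node, copy the configuration file to the other node (install the etcd package on that node first).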

[root@etcd1 ~]# scp /etc/etcd/etcd.conf 192.168.59.157:/etc/etcd/
root@192.168.59.157's password: 
etcd.conf                                        100%  567   984.1KB/s   00:00    
[root@etcd1 ~]#

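Configuring the other node (the "standby" node)

Note: this node is configured exactly like the main node; I only call it the "other node" to make the description easier.

Installing the etcd package

Install the etcd package: yum -y install etcd

Editing the configuration file

The configuration file was already copied over from the main node, so only the IPs and the name need changing (everything else stays).
- Change the IP: in vim, type :1,6s/156/157/g to replace the IP on lines 1-6 only, so that ETCD_INITIAL_CLUSTER is left alone
- Change ETCD_NAME= to the name this node has in ETCD_INITIAL_CLUSTER (etcd-157 here)
For example, this is the configuration file on 157 after my changes: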

[root@etcd2 ~]# cat /etc/etcd/etcd.conf
ETCD_DATA_DIR="/var/lib/etcd/cluster.etcd" 
ETCD_LISTEN_PEER_URLS="http://192.168.59.157:2380,http://localhost:2380" 
ETCD_LISTEN_CLIENT_URLS="http://192.168.59.157:2379,http://localhost:2379" 
ETCD_NAME="etcd-157" 
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://192.168.59.157:2380" 
ETCD_ADVERTISE_CLIENT_URLS="http://localhost:2379,http://192.168.59.157:2379" 
ETCD_INITIAL_CLUSTER="etcd-156=http://192.168.59.156:2380,etcd-157=http://192.168.59.157:2380" 
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster" 
ETCD_INITIAL_CLUSTER_STATE="new"

[root@etcd2 ~]#
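
Since the per-node files differ only in the name, the IP, and possibly the cluster list and state, they can also be generated instead of edited. A minimal sketch that mirrors the layout of the file above; render_conf is my own function name, and the output would still have to be written to /etc/etcd/etcd.conf on the right host.

```shell
#!/bin/sh
# render_conf NAME IP INITIAL_CLUSTER STATE: print an etcd.conf for one node
render_conf() {
  name=$1 ip=$2 cluster=$3 state=$4
  printf 'ETCD_DATA_DIR="/var/lib/etcd/cluster.etcd"\n'
  printf 'ETCD_LISTEN_PEER_URLS="http://%s:2380,http://localhost:2380"\n' "$ip"
  printf 'ETCD_LISTEN_CLIENT_URLS="http://%s:2379,http://localhost:2379"\n' "$ip"
  printf 'ETCD_NAME="%s"\n' "$name"
  printf 'ETCD_INITIAL_ADVERTISE_PEER_URLS="http://%s:2380"\n' "$ip"
  printf 'ETCD_ADVERTISE_CLIENT_URLS="http://localhost:2379,http://%s:2379"\n' "$ip"
  printf 'ETCD_INITIAL_CLUSTER="%s"\n' "$cluster"
  printf 'ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster"\n'
  printf 'ETCD_INITIAL_CLUSTER_STATE="%s"\n' "$state"
}

render_conf etcd-157 192.168.59.157 \
  "etcd-156=http://192.168.59.156:2380,etcd-157=http://192.168.59.157:2380" new
```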

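Starting the etcd service

Start order matters here; start the service on the main node first.
Command: systemctl start etcd
The command hangs at this point (this is normal: the node only finishes starting once the etcd service on the other node is up).
Then start the etcd service on the other node.
Command: systemctl start etcd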

[root@etcd2 ~]# systemctl start etcd
[root@etcd2 ~]#

Only now does the etcd service on the main node finish starting:

[root@etcd1 ~]# systemctl start etcd
[root@etcd1 ~]#

Once the service is up on all nodes, run the enable-at-boot command on every node:
systemctl enable etcd

# Watch the hostname in each prompt
[root@etcd1 ~]# systemctl enable etcd
[root@etcd1 ~]# 

[root@etcd2 ~]# systemctl enable etcd
Created symlink from /etc/systemd/system/multi-user.target.wants/etcd.service to /usr/lib/systemd/system/etcd.service.
[root@etcd2 ~]#

That completes the main/standby configuration.
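
Because the first node only becomes ready after the second one is started, a script driving this setup has to wait rather than check once. The sketch below is a generic retry loop that could wrap a health check; retry is my own helper, and the demo uses true so that it runs without etcd.

```shell
#!/bin/sh
# retry ATTEMPTS DELAY CMD...: run CMD until it succeeds, at most ATTEMPTS times,
# sleeping DELAY seconds between attempts
retry() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Demo with a command that succeeds immediately
retry 3 0 true && echo "ready"

# With etcd (version 2 environment) it could be used like this:
# retry 30 2 etcdctl cluster-health
```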

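Adding a new node to the cluster

Adding a new node means taking a fresh host and joining it to a cluster that already exists.
The commands below use the version 2 etcdctl syntax, so switch back to version 2 first; version 3 also has member add, but its syntax differs (the peer URL is passed with --peer-urls), so the command below fails there.
Command: export ETCDCTL_API=2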

Installing the etcd package

Install the etcd package: yum -y install etcd

Checking which existing member is the leader

[root@etcd1 ~]# etcdctl member list 
220b656d1029422: name=etcd-157 peerURLs=http://192.168.59.157:2380 clientURLs=http://192.168.59.157:2379,http://localhost:2379 isLeader=false
aaaca50ef34fc86: name=etcd-156 peerURLs=http://192.168.59.156:2380 clientURLs=http://192.168.59.156:2379,http://localhost:2379 isLeader=true
[root@etcd1 ~]#

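As shown above, the member with isLeader=true is the leader; I run the following steps on it.

Adding the member (the command prints configuration values)

Run: etcdctl member add custom-name http://ip:2380 (I run it on the leader, although any existing member accepts it; this step is mandatory)
For example, I am adding etcd3:
After the command runs, it prints 3 lines starting with ETCD; they are meant to go into the new node's /etc/etcd/etcd.conf. Those 3 lines alone are not enough, though, so it is easier to copy the main node's configuration file over and edit it.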

[root@etcd1 ~]# etcdctl member add etcd-158 http://192.168.59.158:2380
Added member named etcd-158 with ID 9b5d28a80771cff9 to cluster

ETCD_NAME="etcd-158"
ETCD_INITIAL_CLUSTER="etcd-157=http://192.168.59.157:2380,etcd-156=http://192.168.59.156:2380,etcd-158=http://192.168.59.158:2380"
ETCD_INITIAL_CLUSTER_STATE="existing"
[root@etcd1 ~]#
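
If the three printed lines are wanted as a file fragment, they can be filtered out of the command output. A small sketch, assuming the output format shown above; member_add_env is a hypothetical name.

```shell
#!/bin/sh
# member_add_env: keep only the ETCD_... lines from "etcdctl member add" output on stdin
member_add_env() {
  grep '^ETCD_'
}

# Sample output copied from the session above
printf '%s\n' \
  'Added member named etcd-158 with ID 9b5d28a80771cff9 to cluster' \
  '' \
  'ETCD_NAME="etcd-158"' \
  'ETCD_INITIAL_CLUSTER="etcd-157=http://192.168.59.157:2380,etcd-156=http://192.168.59.156:2380,etcd-158=http://192.168.59.158:2380"' \
  'ETCD_INITIAL_CLUSTER_STATE="existing"' |
  member_add_env
```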

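More importantly, after running the command above the new member already shows up in the member list, although incomplete and marked unstarted (because it has not been configured and started yet).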

[root@etcd1 ~]# etcdctl member list
220b656d1029422: name=etcd-157 peerURLs=http://192.168.59.157:2380 clientURLs=http://192.168.59.157:2379,http://localhost:2379 isLeader=false
aaaca50ef34fc86: name=etcd-156 peerURLs=http://192.168.59.156:2380 clientURLs=http://192.168.59.156:2379,http://localhost:2379 isLeader=true
9b5d28a80771cff9[unstarted]: peerURLs=http://192.168.59.158:2380
[root@etcd1 ~]#
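
Members that were added but never started are easy to spot by the [unstarted] marker. The sketch below lists their IDs; it assumes the version 2 output format shown above, and unstarted_members is a name I chose.

```shell
#!/bin/sh
# unstarted_members: read "etcdctl member list" (v2) on stdin, print IDs marked [unstarted]
unstarted_members() {
  sed -n 's/^\([0-9a-f]*\)\[unstarted\]:.*/\1/p'
}

# Sample lines copied from the output above
printf '%s\n' \
  'aaaca50ef34fc86: name=etcd-156 peerURLs=http://192.168.59.156:2380 clientURLs=http://192.168.59.156:2379,http://localhost:2379 isLeader=true' \
  '9b5d28a80771cff9[unstarted]: peerURLs=http://192.168.59.158:2380' |
  unstarted_members
```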

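The configuration file is covered next.

Editing the configuration file

As mentioned above, copying the main node's configuration file is easier, so copy it first (any existing node will do):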

[root@etcd1 ~]# scp /etc/etcd/etcd.conf 192.168.59.158:/etc/etcd/
root@192.168.59.158's password: 
etcd.conf                                        100%  567   984.1KB/s   00:00    
[root@etcd1 ~]#

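Change the IP: in vim, type :1,6s/156/158/g to replace the IP on lines 1-6 only (same approach as above).
Two more changes are needed: add etcd-158 to ETCD_INITIAL_CLUSTER and set ETCD_INITIAL_CLUSTER_STATE to existing.
The file after the changes is shown below (note: only this node needs editing; the main and standby nodes stay untouched).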

[root@etcd3 ~]# cat /etc/etcd/etcd.conf
ETCD_DATA_DIR="/var/lib/etcd/cluster.etcd" 
ETCD_LISTEN_PEER_URLS="http://192.168.59.158:2380,http://localhost:2380" 
ETCD_LISTEN_CLIENT_URLS="http://192.168.59.158:2379,http://localhost:2379" 
ETCD_NAME="etcd-158" 
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://192.168.59.158:2380" 
ETCD_ADVERTISE_CLIENT_URLS="http://localhost:2379,http://192.168.59.158:2379" 
ETCD_INITIAL_CLUSTER="etcd-156=http://192.168.59.156:2380,etcd-157=http://192.168.59.157:2380,etcd-158=http://192.168.59.158:2380" 
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster" 
ETCD_INITIAL_CLUSTER_STATE="existing"

[root@etcd3 ~]#

Starting the etcd service

Just start the service, and make sure the state is active; if it is not, see the troubleshooting section further down.

[root@etcd3 ~]# systemctl start etcd
[root@etcd3 ~]# systemctl is-active etcd
active
[root@etcd3 ~]#

Then enable it at boot:

[root@etcd3 ~]# systemctl enable etcd
Created symlink from /etc/systemd/system/multi-user.target.wants/etcd.service to /usr/lib/systemd/system/etcd.service.
[root@etcd3 ~]#

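Troubleshooting: etcd starts without errors but the state is inactive

The node I am adding here had been configured as an etcd member before, so when I joined it as a new node the service would not stay up at the end; it took me a while to realize that the earlier configuration was the cause.
The symptom looks like this:
starting the service reports no error, but the state never becomes active.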

[root@etcd3 ~]# systemctl restart etcd
[root@etcd3 ~]# systemctl status etcd
● etcd.service - Etcd Server
   Loaded: loaded (/usr/lib/systemd/system/etcd.service; enabled; vendor preset: disabled)
   Active: inactive (dead) since 三 2021-07-14 17:44:36 CST; 1s ago
  Process: 2530 ExecStart=/bin/bash -c GOMAXPROCS=$(nproc) /usr/bin/etcd --name="${ETCD_NAME}" --data-dir="${ETCD_DATA_DIR}" --listen-client-urls="${ETCD_LISTEN_CLIENT_URLS}" (code=exited, status=0/SUCCESS)
 Main PID: 2530 (code=exited, status=0/SUCCESS)

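Cause: the node was configured before and still holds old data, which triggers this behavior.
Fix:
Run rm -rf /var/lib/etcd/* to delete the old data, then start the service again and it works.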

[root@etcd3 ~]# rm -rf /var/lib/etcd/*
[root@etcd3 ~]# 
[root@etcd3 ~]# systemctl start etcd
[root@etcd3 ~]# systemctl status etcd
● etcd.service - Etcd Server
   Loaded: loaded (/usr/lib/systemd/system/etcd.service; enabled; vendor preset: disabled)
   Active: active (running) since 三 2021-07-14 17:46:23 CST; 2s ago
 Main PID: 2595 (etcd)
   CGroup: /system.slice/etcd.service
           └─2595 /usr/bin/etcd --name=etcd-158 --data-dir=/var/lib/etcd/cluster.etcd --listen-client-urls=http://192.168.59.158:2379,http://localhost:2379
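
Deleting the data outright works on a test machine; a more cautious variant moves the stale data directory aside so that it can still be inspected later. This is my own sketch, not what I ran above: stash_data_dir is a hypothetical helper, and the demo uses a temporary directory instead of /var/lib/etcd/cluster.etcd.

```shell
#!/bin/sh
# stash_data_dir DIR: rename DIR to DIR.bak.<timestamp> and print the new name
stash_data_dir() {
  dir=$1
  backup="${dir}.bak.$(date +%Y%m%d-%H%M%S)"
  mv "$dir" "$backup" && echo "$backup"
}

# Demo on a scratch directory
scratch=$(mktemp -d)
mkdir "$scratch/cluster.etcd"
touch "$scratch/cluster.etcd/old-data"
stash_data_dir "$scratch/cluster.etcd"
ls "$scratch"
rm -rf "$scratch"

# On the node itself (with etcd stopped) it would be:
# stash_data_dir /var/lib/etcd/cluster.etcd
```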

Configuration of all 3 nodes at a glance

The command shows that the leader is etcd-156 (hostname etcd1):

[root@etcd3 ~]# etcdctl member list
220b656d1029422: name=etcd-157 peerURLs=http://192.168.59.157:2380 clientURLs=http://192.168.59.157:2379,http://localhost:2379 isLeader=false
aaaca50ef34fc86: name=etcd-156 peerURLs=http://192.168.59.156:2380 clientURLs=http://192.168.59.156:2379,http://localhost:2379 isLeader=true
9b5d28a80771cff9: name=etcd-158 peerURLs=http://192.168.59.158:2380 clientURLs=http://192.168.59.158:2379,http://localhost:2379 isLeader=false
[root@etcd3 ~]#

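etcd1 and etcd2 were set up as main and standby through the configuration file; etcd3 was added with the member add command.
The files are separated by blank lines; watch the hostname in each prompt.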

[root@etcd1 ~]# cat /etc/etcd/etcd.conf 
ETCD_DATA_DIR="/var/lib/etcd/cluster.etcd" 
ETCD_LISTEN_PEER_URLS="http://192.168.59.156:2380,http://localhost:2380" 
ETCD_LISTEN_CLIENT_URLS="http://192.168.59.156:2379,http://localhost:2379" 
ETCD_NAME="etcd-156" 
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://192.168.59.156:2380" 
ETCD_ADVERTISE_CLIENT_URLS="http://localhost:2379,http://192.168.59.156:2379" 
ETCD_INITIAL_CLUSTER="etcd-156=http://192.168.59.156:2380,etcd-157=http://192.168.59.157:2380" 
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster" 
ETCD_INITIAL_CLUSTER_STATE="new"

[root@etcd1 ~]#

[root@etcd2 ~]# cat /etc/etcd/etcd.conf 
ETCD_DATA_DIR="/var/lib/etcd/cluster.etcd" 
ETCD_LISTEN_PEER_URLS="http://192.168.59.157:2380,http://localhost:2380" 
ETCD_LISTEN_CLIENT_URLS="http://192.168.59.157:2379,http://localhost:2379" 
ETCD_NAME="etcd-157" 
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://192.168.59.157:2380" 
ETCD_ADVERTISE_CLIENT_URLS="http://localhost:2379,http://192.168.59.157:2379" 
ETCD_INITIAL_CLUSTER="etcd-156=http://192.168.59.156:2380,etcd-157=http://192.168.59.157:2380" 
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster" 
ETCD_INITIAL_CLUSTER_STATE="new"

[root@etcd2 ~]# 

[root@etcd3 ~]# cat /etc/etcd/etcd.conf 
ETCD_DATA_DIR="/var/lib/etcd/cluster.etcd" 
ETCD_LISTEN_PEER_URLS="http://192.168.59.158:2380,http://localhost:2380" 
ETCD_LISTEN_CLIENT_URLS="http://192.168.59.158:2379,http://localhost:2379" 
ETCD_NAME="etcd-158" 
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://192.168.59.158:2380" 
ETCD_ADVERTISE_CLIENT_URLS="http://localhost:2379,http://192.168.59.158:2379" 
ETCD_INITIAL_CLUSTER="etcd-156=http://192.168.59.156:2380,etcd-157=http://192.168.59.157:2380,etcd-158=http://192.168.59.158:2380" 
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster" 
ETCD_INITIAL_CLUSTER_STATE="existing"

[root@etcd3 ~]#

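Testing

For example, I create a directory on node 1 and another one on node 2, then look from node 3; if both show up, replication works.
Watch the hostnames (separated by blank lines).
The following uses the default, version 2: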

[root@etcd1 ~]# etcdctl mkdir /etcd1
[root@etcd1 ~]# 

[root@etcd2 ~]# etcdctl mkdir /etcd2
[root@etcd2 ~]# 

[root@etcd3 ~]# etcdctl ls /
/etcd1
/etcd2
[root@etcd3 ~]#

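Version 3 replicates in the same way, but note that writing to the same key overwrites its value.
Below, I write hero1 on node 1 and then hero2 to the same key on node 2; the value read back is hero2. (In production, never let different writers put to the same key like this, because the earlier value is no longer what get returns.)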

[root@etcd1 ~]# export ETCDCTL_API=3
[root@etcd1 ~]# etcdctl put /ccx hero1
OK
[root@etcd1 ~]# 

[root@etcd2 ~]# etcdctl put /ccx hero2
OK
[root@etcd2 ~]# 
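# Unlike the two version 2 directories above, both puts here target the same key /ccx, so the second value replaces the first and the two cannot coexist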
[root@etcd3 ~]# etcdctl get /ccx
/ccx
hero2
[root@etcd3 ~]#

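etcd snapshots (snap)

Snapshots are taken with version 3 of etcdctl.
First set the version 3 environment variable on every machine: export ETCDCTL_API=3
Now create 2 key-value pairs to test the snapshot restore later.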

[root@etcd1 ~]# etcdctl put date1 "hello word"
OK
[root@etcd1 ~]# etcdctl put date2 "hello word_new"
OK
[root@etcd1 ~]# 

[root@etcd3 ~]# etcdctl get date1
date1
hello word
[root@etcd3 ~]# etcdctl get date2
date2
hello word_new
[root@etcd3 ~]#

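Getting help

Command: etcdctl snap --help (as the sessions below show, this etcdctl accepts snap as an abbreviation of snapshot)
It contains plenty of detail.
I only care about usage here: scroll down to the COMMANDS section, which lists the subcommands.
For example, save takes a snapshot and restore restores one.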

[root@etcd3 ~]# etcdctl snap --help | grep -A 5 COMMANDS:
COMMANDS:
        save    Stores an etcd node backend snapshot to a given file
        restore Restores an etcd member snapshot to an etcd directory
        status  Gets backend snapshot status of a given file

GLOBAL OPTIONS:
[root@etcd3 ~]#

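Creating and restoring a snapshot without certificates

Creating a snapshot (no certificates)

Command: etcdctl snap save custom-name
Note: the snapshot file is written to the current working directory.
For example, I save my current data as snap1.date: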

[root@etcd3 /]# etcdctl snap save snap1.date
Snapshot saved at snap1.date
[root@etcd3 /]# 
[root@etcd3 /]# ls | grep sna
snap1.date
[root@etcd3 /]#
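
For regular backups a fixed name like snap1.date gets overwritten each time, so a timestamped name is handy. The sketch below only builds such a path; snapshot_path, the /backup directory, and the .db suffix are assumptions of mine.

```shell
#!/bin/sh
# snapshot_path DIR: print a timestamped snapshot file name under DIR
snapshot_path() {
  printf '%s/etcd-snap-%s.db\n' "$1" "$(date +%Y%m%d-%H%M%S)"
}

snapshot_path /backup

# With etcd (version 3 environment) it could be used like this:
# etcdctl snapshot save "$(snapshot_path /backup)"
```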

Restoring a snapshot (no certificates)

Before restoring, I delete the data that the snapshot captured:

[root@etcd3 /]# etcdctl del date1
1
[root@etcd3 /]# 
[root@etcd3 /]# etcdctl del date2
1
[root@etcd3 /]#

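Copying the snapshot to the other nodes

Then copy the snapshot file to the other nodes (I took the snapshot on etcd3, so I copy it to 1 and 2).
In other words, although a write on one node replicates to the others automatically, a restore has to be carried out on every node; doing it on only some of them is not enough!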

[root@etcd3 /]# ls | grep sna
snap1.date
[root@etcd3 /]# 
[root@etcd3 /]# scp snap1.date 192.168.59.156:/
The authenticity of host '192.168.59.156 (192.168.59.156)' can't be established.
ECDSA key fingerprint is SHA256:zRtVBoNePoRXh9aA8eppKwwduS9Rjjr/kT5a7zijzjE.
ECDSA key fingerprint is MD5:b8:53:cc:da:86:2a:97:dc:bd:64:6b:b1:d0:f3:02:ce.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.168.59.156' (ECDSA) to the list of known hosts.
root@192.168.59.156's password: 
Permission denied, please try again.
root@192.168.59.156's password: 
snap1.date                                       100%   20KB  10.9MB/s   00:00    
[root@etcd3 /]# scp snap1.date 192.168.59.157:/
The authenticity of host '192.168.59.157 (192.168.59.157)' can't be established.
ECDSA key fingerprint is SHA256:zRtVBoNePoRXh9aA8eppKwwduS9Rjjr/kT5a7zijzjE.
ECDSA key fingerprint is MD5:b8:53:cc:da:86:2a:97:dc:bd:64:6b:b1:d0:f3:02:ce.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.168.59.157' (ECDSA) to the list of known hosts.
root@192.168.59.157's password: 
snap1.date                                       100%   20KB  11.4MB/s   00:00    
[root@etcd3 /]# 
[root@etcd3 /]#

Clearing the etcd data (do this on all nodes)

First stop the etcd service on every node:

[root@etcd3 /]# systemctl stop etcd

[root@etcd2 /]# systemctl stop etcd

[root@etcd1 /]# systemctl stop etcd

Delete all existing data:

[root@etcd1 ~]# rm -rf /var/lib/etcd/*

[root@etcd2 ~]# rm -rf /var/lib/etcd/*

[root@etcd3 ~]# rm -rf /var/lib/etcd/*

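Restoring the data (do this on all nodes)

Give the snapshot file the etcd user and group so that no permission error comes up.
(Note: the etcd user exists by default; in the directory holding the snapshot, run chown etcd.etcd snapshot-file.)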

[root@etcd3 /]# cat /etc/passwd| grep etcd
etcd:x:997:995:etcd user:/var/lib/etcd:/sbin/nologin
[root@etcd3 /]# 
[root@etcd3 /]# ls | grep snap
snap1.date
[root@etcd3 /]# 
[root@etcd3 /]# chown etcd.etcd snap1.date 
[root@etcd3 /]# 
[root@etcd3 /]#

[root@etcd2 /]# chown etcd.etcd snap1.date 

[root@etcd1 /]# chown etcd.etcd snap1.date

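etcdctl snapshot restore SNAPSHOT_FILE --name [from the configuration file; differs per node] --initial-cluster [from the configuration file; identical on all nodes, so copy it once] --initial-advertise-peer-urls [from the configuration file; differs per node] --data-dir /var/lib/etcd/cluster.etcd

Configuration file path: /etc/etcd/etcd.conf
To make this easier to follow, here are the parameters one by one; only the following 3 need adjusting:
- --name corresponds to ETCD_NAME in the configuration file
- --initial-cluster corresponds to ETCD_INITIAL_CLUSTER in the configuration file (identical on all 3 nodes; note that every member must be listed)
- --initial-advertise-peer-urls corresponds to ETCD_INITIAL_ADVERTISE_PEER_URLS in the configuration file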
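
Since the per-node values all come from /etc/etcd/etcd.conf, the restore command can be assembled from that file. The sketch below prints the command instead of running it; restore_cmd is my own helper, and the full cluster list is passed separately because the files on etcd1 and etcd2 list only two members. The demo uses a temporary file in place of the real configuration.

```shell
#!/bin/sh
# restore_cmd SNAPSHOT CONF CLUSTER: print (not run) the restore command for one node
restore_cmd() {
  snap=$1 conf=$2 cluster=$3
  (
    # etcd.conf consists of KEY="value" lines, so it can be sourced in a subshell
    . "$conf"
    echo "etcdctl snapshot restore $snap --name $ETCD_NAME" \
         "--initial-cluster $cluster" \
         "--initial-advertise-peer-urls $ETCD_INITIAL_ADVERTISE_PEER_URLS" \
         "--data-dir $ETCD_DATA_DIR"
  )
}

# Demo with a scratch copy of the relevant lines from etcd1
conf=$(mktemp)
printf '%s\n' \
  'ETCD_DATA_DIR="/var/lib/etcd/cluster.etcd"' \
  'ETCD_NAME="etcd-156"' \
  'ETCD_INITIAL_ADVERTISE_PEER_URLS="http://192.168.59.156:2380"' > "$conf"
restore_cmd snap1.date "$conf" \
  "etcd-156=http://192.168.59.156:2380,etcd-157=http://192.168.59.157:2380,etcd-158=http://192.168.59.158:2380"
rm -f "$conf"
```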

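Restoring etcd1

The values to change are described above; each flag simply takes the value of the matching line in the configuration file.
Configuration file content: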

[root@etcd1 /]# 
[root@etcd1 /]# cat /etc/etcd/etcd.conf 
ETCD_DATA_DIR="/var/lib/etcd/cluster.etcd" 
ETCD_LISTEN_PEER_URLS="http://192.168.59.156:2380,http://localhost:2380" 
ETCD_LISTEN_CLIENT_URLS="http://192.168.59.156:2379,http://localhost:2379" 
ETCD_NAME="etcd-156" 
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://192.168.59.156:2380" 
ETCD_ADVERTISE_CLIENT_URLS="http://localhost:2379,http://192.168.59.156:2379" 
ETCD_INITIAL_CLUSTER="etcd-156=http://192.168.59.156:2380,etcd-157=http://192.168.59.157:2380" 
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster" 
ETCD_INITIAL_CLUSTER_STATE="new"

[root@etcd1 /]#

The restore command and its success output:

[root@etcd1 /]# etcdctl snapshot restore snap1.date --name etcd-156 --initial-cluster etcd-156=http://192.168.59.156:2380,etcd-157=http://192.168.59.157:2380,etcd-158=http://192.168.59.158:2380 --initial-advertise-peer-urls http://192.168.59.156:2380 --data-dir /var/lib/etcd/cluster.etcd
2021-07-15 12:34:52.751689 I | etcdserver/membership: added member 220b656d1029422 [http://192.168.59.157:2380] to cluster bf1393a8380b1115
2021-07-15 12:34:52.751800 I | etcdserver/membership: added member aaaca50ef34fc86 [http://192.168.59.156:2380] to cluster bf1393a8380b1115
2021-07-15 12:34:52.751823 I | etcdserver/membership: added member 240bc0d12da09d72 [http://192.168.59.158:2380] to cluster bf1393a8380b1115
[root@etcd1 /]#

Restoring etcd2

Copy the restore command from etcd1 and change only --name and --initial-advertise-peer-urls; everything else stays.
Configuration file content:

[root@etcd2 /]# cat /etc/etcd/etcd.conf 
ETCD_DATA_DIR="/var/lib/etcd/cluster.etcd" 
ETCD_LISTEN_PEER_URLS="http://192.168.59.157:2380,http://localhost:2380" 
ETCD_LISTEN_CLIENT_URLS="http://192.168.59.157:2379,http://localhost:2379" 
ETCD_NAME="etcd-157" 
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://192.168.59.157:2380" 
ETCD_ADVERTISE_CLIENT_URLS="http://localhost:2379,http://192.168.59.157:2379" 
ETCD_INITIAL_CLUSTER="etcd-156=http://192.168.59.156:2380,etcd-157=http://192.168.59.157:2380" 
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster" 
ETCD_INITIAL_CLUSTER_STATE="new"

[root@etcd2 /]#

The restore command and its success output:

[root@etcd2 /]# etcdctl snapshot restore snap1.date --name etcd-157 --initial-cluster etcd-156=http://192.168.59.156:2380,etcd-157=http://192.168.59.157:2380,etcd-158=http://192.168.59.158:2380 --initial-advertise-peer-urls http://192.168.59.157:2380 --data-dir /var/lib/etcd/cluster.etcd
2021-07-15 12:53:30.860174 I | etcdserver/membership: added member 220b656d1029422 [http://192.168.59.157:2380] to cluster bf1393a8380b1115
2021-07-15 12:53:30.860269 I | etcdserver/membership: added member aaaca50ef34fc86 [http://192.168.59.156:2380] to cluster bf1393a8380b1115
2021-07-15 12:53:30.860290 I | etcdserver/membership: added member 240bc0d12da09d72 [http://192.168.59.158:2380] to cluster bf1393a8380b1115
[root@etcd2 /]#

Restoring etcd3

Copy the restore command from etcd1 and change only --name and --initial-advertise-peer-urls; everything else stays.
Configuration file content:

[root@etcd3 /]# cat /etc/etcd/etcd.conf 
ETCD_DATA_DIR="/var/lib/etcd/cluster.etcd" 
ETCD_LISTEN_PEER_URLS="http://192.168.59.158:2380,http://localhost:2380" 
ETCD_LISTEN_CLIENT_URLS="http://192.168.59.158:2379,http://localhost:2379" 
ETCD_NAME="etcd-158" 
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://192.168.59.158:2380" 
ETCD_ADVERTISE_CLIENT_URLS="http://localhost:2379,http://192.168.59.158:2379" 
ETCD_INITIAL_CLUSTER="etcd-156=http://192.168.59.156:2380,etcd-157=http://192.168.59.157:2380,etcd-158=http://192.168.59.158:2380" 
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster" 
ETCD_INITIAL_CLUSTER_STATE="existing"

[root@etcd3 /]#

The restore command and its success output:

[root@etcd3 /]# etcdctl snapshot restore snap1.date --name etcd-158 --initial-cluster etcd-156=http://192.168.59.156:2380,etcd-157=http://192.168.59.157:2380,etcd-158=http://192.168.59.158:2380 --initial-advertise-peer-urls http://192.168.59.158:2380 --data-dir /var/lib/etcd/cluster.etcd
2021-07-15 12:54:24.757913 I | etcdserver/membership: added member 220b656d1029422 [http://192.168.59.157:2380] to cluster bf1393a8380b1115
2021-07-15 12:54:24.758030 I | etcdserver/membership: added member aaaca50ef34fc86 [http://192.168.59.156:2380] to cluster bf1393a8380b1115
2021-07-15 12:54:24.758055 I | etcdserver/membership: added member 240bc0d12da09d72 [http://192.168.59.158:2380] to cluster bf1393a8380b1115
[root@etcd3 /]#

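Setting ownership of the data files (on all nodes)

Finally, give the data files to the etcd user and group so that the service does not run into permission errors.
Command: chown -R etcd.etcd /var/lib/etcd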

[root@etcd1 /]# chown -R etcd.etcd /var/lib/etcd/

[root@etcd2 /]# chown -R etcd.etcd /var/lib/etcd/

[root@etcd3 /]# chown -R etcd.etcd /var/lib/etcd/

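Starting the etcd service (on all nodes)

Command: systemctl start etcd
It is normal for the first node to hang while starting; it only finishes once at least 2 nodes are up.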

[root@etcd3 /]# systemctl start etcd

[root@etcd2 /]# systemctl start etcd

[root@etcd1 /]# systemctl start etcd

Testing

Run the get commands; if the earlier data comes back, the snapshot restore succeeded.

[root@etcd3 /]# etcdctl get date1
date1
hello word
[root@etcd3 /]# 
[root@etcd3 /]# etcdctl get date2
date2
hello word_new
[root@etcd3 /]# 

[root@etcd2 /]# etcdctl get date1
date1
hello word
[root@etcd2 /]#


[root@etcd1 /]# etcdctl get date2
date2
hello word_new
[root@etcd1 /]#

Summary

The whole restore procedure is somewhat involved, but every step above can be scripted for a one-command restore.
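
As a starting point for such a script, the sketch below walks through the per-node steps in the order used above. It is a dry run by default and only prints the commands; restore_node and run are names I made up, it removes only the data directory itself rather than everything under /var/lib/etcd, and starting etcd is left out on purpose because that should happen after every node has been restored.

```shell
#!/bin/sh
# run CMD...: print the command when DRY_RUN=1 (the default here), execute it otherwise
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# restore_node SNAPSHOT NAME PEER_URL CLUSTER DATA_DIR: stop, clear, restore, fix ownership
restore_node() {
  snap=$1 name=$2 peer_url=$3 cluster=$4 data_dir=$5
  run systemctl stop etcd
  run rm -rf "$data_dir"
  run env ETCDCTL_API=3 etcdctl snapshot restore "$snap" \
    --name "$name" --initial-cluster "$cluster" \
    --initial-advertise-peer-urls "$peer_url" --data-dir "$data_dir"
  run chown -R etcd:etcd "$data_dir"
}

restore_node snap1.date etcd-156 http://192.168.59.156:2380 \
  "etcd-156=http://192.168.59.156:2380,etcd-157=http://192.168.59.157:2380,etcd-158=http://192.168.59.158:2380" \
  /var/lib/etcd/cluster.etcd
```

Running it with DRY_RUN=0 would execute the commands for real, so check the printed plan on each node first.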

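Creating and restoring a snapshot with certificates

Creating a snapshot (with certificates)

The certificate flags are documented under OPTIONS in --help; the ones below are generally required.
The /path values are placeholders, so use wherever the files are actually stored; replace the https address with your service address, and put the snapshot file name after save.

etcdctl snap save SNAPSHOT_FILE --cacert="/path/cacert" --cert="/path/cert" --key="/path/key" --endpoints=https://127.0.0.1:2379

My environment has no certificates, so I cannot demonstrate it.

Restoring a snapshot (with certificates)

I have no certificate environment to test in, but in theory restoring works the same way as without certificates, so refer to the procedure above.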

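etcd inside Kubernetes containers

How it runs

In Kubernetes, etcd runs as a container, but the ports are the same as described above, and data is written with version 3.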

[root@master ~]# kubectl get pods -n kube-system  | grep etcd
etcd-master                                1/1     Running   4          3d5h
[root@master ~]#

The etcd configuration in Kubernetes

In Kubernetes the etcd configuration is a YAML manifest: /etc/kubernetes/manifests/etcd.yaml

[root@master ~]# cat /etc/kubernetes/manifests/etcd.yaml  |grep -A 20 command:
  - command:
    - etcd
    - --advertise-client-urls=https://192.168.59.142:2379
    - --cert-file=/etc/kubernetes/pki/etcd/server.crt
    - --client-cert-auth=true
    - --data-dir=/var/lib/etcd
    - --initial-advertise-peer-urls=https://192.168.59.142:2380
    - --initial-cluster=master=https://192.168.59.142:2380
    - --key-file=/etc/kubernetes/pki/etcd/server.key
    - --listen-client-urls=https://127.0.0.1:2379,https://192.168.59.142:2379
    - --listen-metrics-urls=http://127.0.0.1:2381
    - --listen-peer-urls=https://192.168.59.142:2380
    - --name=master
    - --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
    - --peer-client-cert-auth=true
    - --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
    - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    - --snapshot-count=10000
    - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    image: registry.aliyuncs.com/google_containers/etcd:3.4.13-0
    imagePullPolicy: IfNotPresent
[root@master ~]#
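
The certificate paths in this manifest are the ones an etcdctl call against this etcd would need. The sketch below only builds the three TLS flags from the directory seen above; etcd_tls_flags is my own helper, and it assumes that the server certificate is also accepted for client authentication, which I have not verified here.

```shell
#!/bin/sh
# etcd_tls_flags [PKI_DIR]: print the etcdctl (version 3) TLS flags for the given certificate directory
etcd_tls_flags() {
  pki=${1:-/etc/kubernetes/pki/etcd}
  printf '%s\n' "--cacert=$pki/ca.crt --cert=$pki/server.crt --key=$pki/server.key"
}

etcd_tls_flags

# On the master it could be combined with a snapshot like this
# (the unquoted expansion is intentional so that the flags split into words):
# ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 $(etcd_tls_flags) snapshot save snap1.date
```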

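Where etcd stores its files in Kubernetes

The manifest above already shows it; inside the container, etcd stores its data at:
- --data-dir=/var/lib/etcd
In addition, there is a volume mount:
- mountPath: /var/lib/etcd
That is, /var/lib/etcd inside the container is mapped to /var/lib/etcd on the host.
So to look at the files etcd stores in Kubernetes, just look at /var/lib/etcd on the host: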

[root@master ~]# ls /var/lib/etcd/
member
[root@master ~]# 
[root@master etcd]# cd member/
[root@master member]# ls
snap  wal
[root@master member]# cd snap/
[root@master snap]# ls
0000000000000006-00000000002255f6.snap  0000000000000006-000000000022cb29.snap
0000000000000006-0000000000227d07.snap  0000000000000006-000000000022f23a.snap
0000000000000006-000000000022a418.snap  db
[root@master snap]# cd ..
[root@master member]# cd wal/
[root@master wal]# ls
000000000000002c-00000000001cbe27.wal  000000000000002f-000000000021568e.wal
000000000000002d-00000000001e469a.wal  0000000000000030-000000000022dde6.wal
000000000000002e-00000000001fcf1c.wal  1.tmp
[root@master wal]#