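1 etcd Overview
etcd is a strongly consistent, distributed key-value store that provides a reliable way to store data that needs to be accessed by a distributed system or a cluster of machines. Applications of any complexity, from simple programs to Kubernetes, can read data from and write data to etcd.
etcd is written in Go, which gives it excellent cross-platform support and small binaries, and it is backed by a strong community. Communication between etcd machines is handled by the Raft consensus algorithm.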
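1.1 External etcd architecture for a Kubernetes cluster
In this topology the etcd distributed data store runs on nodes that are separate from the Kubernetes control plane nodes. The etcd members run on different hosts, and each etcd host communicates with the kube-apiserver of every control plane node. This decouples the control plane from the etcd members.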
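1.2 How etcd works
- HTTP Server: handles API requests sent by users, as well as synchronization and heartbeat requests from other etcd nodes.
- Store: handles the transactions behind the features etcd supports, including data indexing, node state changes, watching and notification, and event handling and execution.
- Raft: the implementation of the Raft strong-consistency algorithm, and the core of etcd.
- WAL: the Write Ahead Log is how etcd persists data. Within it, a Snapshot is a state snapshot taken to keep the amount of data from growing too large, and an Entry is the content of an individual log record.
A user request is forwarded by the HTTP Server to the Store for transaction processing. If the request involves modifying a node, it is handed to Raft, which performs the state change and records it in the log, then replicates it to the other etcd nodes so they can confirm the commit. Finally the data is committed and the result is synchronized to the other nodes once more.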
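The "log first, then apply" idea behind the WAL can be illustrated with a toy example. This is only a sketch in plain shell and not how etcd is implemented: the log file, the one-line-per-entry format and the helper names `wal_put`, `wal_del` and `wal_replay` are all invented for illustration, and there is no Raft replication.

```shell
#!/bin/sh
# Toy write-ahead log: every change is appended to a log file first,
# and the current state can always be rebuilt by replaying that log.
# WAL_FILE, wal_put, wal_del and wal_replay are hypothetical names.
WAL_FILE=$(mktemp)

wal_put() {  # wal_put KEY VALUE: append a "put" entry to the log
    printf 'put %s %s\n' "$1" "$2" >> "$WAL_FILE"
}

wal_del() {  # wal_del KEY: append a "del" entry to the log
    printf 'del %s\n' "$1" >> "$WAL_FILE"
}

wal_replay() {  # rebuild the state from the log and print "key value" lines
    awk '$1 == "put" { v = $0; sub(/^put [^ ]+ /, "", v); state[$2] = v }
         $1 == "del" { delete state[$2] }
         END { for (k in state) print k, state[k] }' "$WAL_FILE" | sort
}

wal_put foo bar
wal_put k1 value1
wal_del k1
wal_put foo "Hello World"
wal_replay   # prints: foo Hello World
```

Replaying the same log always yields the same state, which is why a node can recover after a restart as long as its log survived.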
2 Installing and starting an etcd cluster
2.1 Installing etcd (CentOS 7.6)
2.1.1 Method 1: prebuilt binaries
Download the compressed archive for your platform from https://github.com/etcd-io/etcd/releases/. Save the following content to a file:
ETCD_VER=v3.5.4
# choose either URL
GOOGLE_URL=https://storage.googleapis.com/etcd
GITHUB_URL=https://github.com/etcd-io/etcd/releases/download
DOWNLOAD_URL=${GOOGLE_URL}
rm -f /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz
rm -rf /tmp/etcd-download-test && mkdir -p /tmp/etcd-download-test
curl -L ${DOWNLOAD_URL}/${ETCD_VER}/etcd-${ETCD_VER}-linux-amd64.tar.gz -o /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz
tar xzvf /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz -C /tmp/etcd-download-test --strip-components=1
rm -f /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz
/tmp/etcd-download-test/etcd --version
/tmp/etcd-download-test/etcdctl version
/tmp/etcd-download-test/etcdutl version
Run the script and check the installation result.
Start a local etcd service:
/tmp/etcd-download-test/etcd
Verify etcd by writing and reading a key:
[root@localhost k8s]# /tmp/etcd-download-test/etcdctl --endpoints=localhost:2379 put foo bar
OK
[root@localhost k8s]# /tmp/etcd-download-test/etcdctl --endpoints=localhost:2379 get foo
foo
bar
2.1.2 Method 2: building from source
The build machine needs a Go environment and access to the external network.
2.2 Deploying an etcd cluster
2.2.1 Specify the cluster members on every node
TOKEN=token-01
CLUSTER_STATE=new
NAME_1=machine-1
NAME_2=machine-2
NAME_3=machine-3
HOST_1=192.168.22.154
HOST_2=192.168.22.155
HOST_3=192.168.22.157
CLUSTER=${NAME_1}=http://${HOST_1}:2380,${NAME_2}=http://${HOST_2}:2380,${NAME_3}=http://${HOST_3}:2380
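For clusters with more members, the `--initial-cluster` value can be assembled in a loop instead of by hand. A minimal sketch, assuming the same member names and addresses as above; the `MEMBERS` variable and the `build_cluster` function are illustrative helpers and not part of etcd.

```shell
#!/bin/sh
# Build the --initial-cluster value from "name=ip" pairs.
# MEMBERS and build_cluster are hypothetical helpers for illustration.
MEMBERS="machine-1=192.168.22.154 machine-2=192.168.22.155 machine-3=192.168.22.157"

build_cluster() {
    cluster=""
    for member in $MEMBERS; do
        name=${member%%=*}   # text before the first '='
        ip=${member#*=}      # text after the first '='
        # Prepend a comma only when the string is not empty yet.
        cluster="${cluster:+${cluster},}${name}=http://${ip}:2380"
    done
    printf '%s\n' "$cluster"
}

CLUSTER=$(build_cluster)
echo "$CLUSTER"
```

The resulting string is the same as the hand-written `CLUSTER` value, so it can be passed to `--initial-cluster` unchanged.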
2.2.2 Run the following commands on each machine
# For machine 1
THIS_NAME=${NAME_1}
THIS_IP=${HOST_1}
etcd --data-dir=data.etcd --name ${THIS_NAME} \
--initial-advertise-peer-urls http://${THIS_IP}:2380 --listen-peer-urls http://${THIS_IP}:2380 \
--advertise-client-urls http://${THIS_IP}:2379 --listen-client-urls http://${THIS_IP}:2379 \
--initial-cluster ${CLUSTER} \
--initial-cluster-state ${CLUSTER_STATE} --initial-cluster-token ${TOKEN}
# For machine 2
THIS_NAME=${NAME_2}
THIS_IP=${HOST_2}
etcd --data-dir=data.etcd --name ${THIS_NAME} \
--initial-advertise-peer-urls http://${THIS_IP}:2380 --listen-peer-urls http://${THIS_IP}:2380 \
--advertise-client-urls http://${THIS_IP}:2379 --listen-client-urls http://${THIS_IP}:2379 \
--initial-cluster ${CLUSTER} \
--initial-cluster-state ${CLUSTER_STATE} --initial-cluster-token ${TOKEN}
# For machine 3
THIS_NAME=${NAME_3}
THIS_IP=${HOST_3}
etcd --data-dir=data.etcd --name ${THIS_NAME} \
--initial-advertise-peer-urls http://${THIS_IP}:2380 --listen-peer-urls http://${THIS_IP}:2380 \
--advertise-client-urls http://${THIS_IP}:2379 --listen-client-urls http://${THIS_IP}:2379 \
--initial-cluster ${CLUSTER} \
--initial-cluster-state ${CLUSTER_STATE} --initial-cluster-token ${TOKEN}
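The three blocks above differ only in the member name and IP address. As a sketch, they can be folded into one function that prints the command for a given member so it can be reviewed before it is run. `etcd_cmd` is a hypothetical helper; the flags are exactly the ones used above.

```shell
#!/bin/sh
# Print (do not run) the etcd start command for one member.
# etcd_cmd is a hypothetical helper; the flags mirror the commands above.
TOKEN=token-01
CLUSTER_STATE=new
CLUSTER="machine-1=http://192.168.22.154:2380,machine-2=http://192.168.22.155:2380,machine-3=http://192.168.22.157:2380"

etcd_cmd() {  # etcd_cmd NAME IP
    # printf repeats the '%s ' format for every argument that follows.
    printf '%s ' \
        etcd --data-dir=data.etcd --name "$1" \
        --initial-advertise-peer-urls "http://$2:2380" \
        --listen-peer-urls "http://$2:2380" \
        --advertise-client-urls "http://$2:2379" \
        --listen-client-urls "http://$2:2379" \
        --initial-cluster "$CLUSTER" \
        --initial-cluster-state "$CLUSTER_STATE" \
        --initial-cluster-token "$TOKEN"
    printf '\n'
}

etcd_cmd machine-2 192.168.22.155
```

Printing rather than executing keeps the sketch safe to run anywhere; on a real member the printed line would be run as-is.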
2.2.3 Checking the cluster members
export ETCDCTL_API=3
HOST_1=192.168.22.154
HOST_2=192.168.22.155
HOST_3=192.168.22.157
ENDPOINTS=$HOST_1:2379,$HOST_2:2379,$HOST_3:2379
etcdctl --endpoints=$ENDPOINTS member list
3 Common etcd operations
3.1 Adding a key
[root@localhost ~]# etcdctl --endpoints=$ENDPOINTS put foo "Hello World"
OK
[root@localhost ~]# etcdctl --endpoints=$ENDPOINTS get foo
foo
Hello World
[root@localhost ~]# etcdctl --endpoints=$ENDPOINTS --write-out="json" get foo
3.2 Deleting a key
[root@localhost ~]# etcdctl --endpoints=$ENDPOINTS del foo
1
[root@localhost ~]# etcdctl --endpoints=$ENDPOINTS get foo
Deleting keys by prefix:
[root@localhost ~]# etcdctl --endpoints=$ENDPOINTS put k1 value1
OK
[root@localhost ~]# etcdctl --endpoints=$ENDPOINTS put k2 value2
OK
[root@localhost ~]# etcdctl --endpoints=$ENDPOINTS get k --prefix
k1
value1
k2
value2
[root@localhost ~]# etcdctl --endpoints=$ENDPOINTS del k --prefix
2
[root@localhost ~]# etcdctl --endpoints=$ENDPOINTS get k --prefix
3.3 Getting keys by prefix
[root@localhost ~]# etcdctl --endpoints=$ENDPOINTS put web1 value1
OK
[root@localhost ~]# etcdctl --endpoints=$ENDPOINTS put web2 value2
OK
[root@localhost ~]# etcdctl --endpoints=$ENDPOINTS put web3 value3
OK
[root@localhost ~]# etcdctl --endpoints=$ENDPOINTS get web --prefix
web1
value1
web2
value2
web3
value3
3.4 Checking the cluster status
[root@localhost ~]# etcdctl --write-out=table --endpoints=$ENDPOINTS endpoint status
[root@localhost ~]# etcdctl --endpoints=$ENDPOINTS endpoint health
3.5 Backing up etcd data
[root@localhost ~]# ENDPOINTS=$HOST_1:2379
[root@localhost ~]# etcdctl --endpoints=$ENDPOINTS snapshot save my.db
[root@localhost ~]# etcdctl --write-out=table --endpoints=$ENDPOINTS snapshot status my.db
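Snapshots are usually taken on a schedule, with older files pruned. A rough sketch under stated assumptions: the backup directory, the `etcd-YYYYmmdd-HHMMSS.db` naming scheme and the helpers `snapshot_name` and `prune_snapshots` are invented here. The `etcdctl snapshot save` call is the same command as above and is left commented out so the sketch runs without a cluster.

```shell
#!/bin/sh
# Timestamped snapshot names plus simple retention.
# BACKUP_DIR, snapshot_name and prune_snapshots are hypothetical.
BACKUP_DIR=${BACKUP_DIR:-/tmp/etcd-backup}

snapshot_name() {  # e.g. /tmp/etcd-backup/etcd-20220701-120000.db
    printf '%s/etcd-%s.db\n' "$BACKUP_DIR" "$(date +%Y%m%d-%H%M%S)"
}

prune_snapshots() {  # prune_snapshots KEEP: delete all but the newest KEEP files
    # The timestamp format makes names sort in chronological order,
    # so a reverse sort lists the newest files first.
    ls "$BACKUP_DIR"/etcd-*.db 2>/dev/null | sort -r | tail -n +"$(( $1 + 1 ))" |
    while IFS= read -r old; do
        rm -f -- "$old"
    done
}

# Intended use on a machine that can reach the cluster:
# mkdir -p "$BACKUP_DIR"
# etcdctl --endpoints=$ENDPOINTS snapshot save "$(snapshot_name)"
# prune_snapshots 7
```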
4 Monitoring an etcd cluster with Prometheus + Grafana
4.1 Installing Prometheus
[root@localhost ~]# PROMETHEUS_VERSION="2.0.0"
[root@localhost ~]# wget https://github.com/prometheus/prometheus/releases/download/v$PROMETHEUS_VERSION/prometheus-$PROMETHEUS_VERSION.linux-amd64.tar.gz -O /tmp/prometheus-$PROMETHEUS_VERSION.linux-amd64.tar.gz
[root@localhost ~]# tar -xvzf /tmp/prometheus-$PROMETHEUS_VERSION.linux-amd64.tar.gz --directory /tmp/ --strip-components=1
[root@localhost k8s]# /tmp/prometheus --version
4.2 Configuring the etcd cluster endpoints
[root@localhost k8s]# cat > /tmp/test-etcd.yaml <<EOF
> global:
>   scrape_interval: 10s
> scrape_configs:
>   - job_name: test-etcd
>     static_configs:
>     - targets: ['192.168.22.154:2379','192.168.22.155:2379','192.168.22.157:2379']
> EOF
[root@localhost k8s]# cat /tmp/test-etcd.yaml
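When the member list changes, the scrape configuration can be regenerated rather than edited by hand. A sketch, assuming the same three hosts, job name and scrape interval as in the file written above; `gen_prom_config` is a hypothetical helper.

```shell
#!/bin/sh
# Generate a Prometheus scrape config for a list of etcd hosts.
# gen_prom_config is a hypothetical helper for illustration.
gen_prom_config() {  # gen_prom_config HOST...
    targets=""
    for host in "$@"; do
        # Quote each target and join the entries with commas.
        targets="${targets:+${targets},}'${host}:2379'"
    done
    cat <<EOF
global:
  scrape_interval: 10s
scrape_configs:
  - job_name: test-etcd
    static_configs:
    - targets: [${targets}]
EOF
}

gen_prom_config 192.168.22.154 192.168.22.155 192.168.22.157
```

Redirecting the output to `/tmp/test-etcd.yaml` would produce the file used in the next step.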
4.3 Starting Prometheus
[root@localhost k8s]# /tmp/prometheus --config.file /tmp/test-etcd.yaml --web.listen-address ":9090" >> /tmp/test-etcd.log
4.4 Accessing Prometheus
4.5 Running a query
4.6 Viewing the metrics in Grafana
4.6.1 Deploying Grafana
[root@localhost k8s]# wget https://dl.grafana.com/enterprise/release/grafana-enterprise-9.0.1-1.x86_64.rpm
[root@localhost k8s]# sudo yum install grafana-enterprise-9.0.1-1.x86_64.rpm
[root@localhost k8s]# systemctl restart grafana-server.service
4.6.2 Logging in to Grafana
http://192.168.22.154:3000/login
Username and password: admin/admin
4.6.3 Adding the Prometheus data source
4.6.4 Importing the etcd dashboard template
https://grafana.com/grafana/dashboards/3070
4.6.5 Viewing the dashboard
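5 Summary
Because etcd writes data to disk, its performance depends heavily on disk performance, so SSDs are strongly recommended. An etcd cluster needs a majority of its members (a quorum) to agree on updates to the cluster state. For a cluster with n members, the quorum is floor(n/2)+1. An etcd cluster should probably have no more than seven nodes. A five-member etcd cluster can tolerate the failure of two members, which is enough in most cases. Although larger clusters provide better fault tolerance, write performance suffers because data must be replicated across more machines.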
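The quorum arithmetic is easy to check. A small sketch using shell integer arithmetic, where division truncates, so `n / 2 + 1` gives the majority for both odd and even n; the function names `quorum` and `tolerance` are chosen here for illustration.

```shell
#!/bin/sh
# Quorum and failure tolerance for an n-member cluster.
quorum()    { echo $(( $1 / 2 + 1 )); }          # members needed to agree
tolerance() { echo $(( $1 - ($1 / 2 + 1) )); }   # members that may fail

for n in 1 3 5 7; do
    echo "members=$n quorum=$(quorum "$n") tolerance=$(tolerance "$n")"
done
# prints:
# members=1 quorum=1 tolerance=0
# members=3 quorum=2 tolerance=1
# members=5 quorum=3 tolerance=2
# members=7 quorum=4 tolerance=3
```

An even member count adds no tolerance over the next smaller odd count (4 members still tolerate only 1 failure), which is why odd cluster sizes are the usual choice.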
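Author: Ding Yunguan, Sangfor Certified Cloud Computing Expert (SCCE-C), senior lecturer at the Industry Education Center and certified cloud computing architect. Previously a senior operations engineer and senior cloud computing lecturer at Alibaba Cloud and Hongfu Group, and many times a specially invited lecturer for large enterprises such as China Telecom and China Mobile, providing training courses and technical consulting. Focused on research into frontier technologies such as Docker, Kubernetes and OpenStack, with extensive front-line cloud computing experience as well as experience in building and delivering course resources.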