etcd Distributed Storage from Scratch: Practice and Observability

Article link: https://blog.csdn.net/sangfor_edu/article/details/125906543

1 etcd Overview

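etcd is a strongly consistent, distributed key-value store that provides a reliable way to store data that needs to be accessed by a distributed system or a cluster of machines. Applications of any complexity, from simple programs to Kubernetes, can read data from and write data to etcd.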

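etcd is written in Go, which gives it excellent cross-platform support, small binaries, and a strong community. Communication between etcd machines is handled by the Raft consensus algorithm.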

1.1 External etcd Topology for a Kubernetes Cluster

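In this topology, the etcd distributed data store cluster runs on nodes that are separate from the Kubernetes control plane nodes. The etcd members run on different hosts, and each etcd host communicates with the kube-apiserver of every control plane node. This topology decouples the control plane from the etcd members.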

1.2 How etcd Works

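• HTTP Server: handles API requests sent by users, as well as synchronization and heartbeat requests from other etcd nodes.

• Store: handles the transactions behind the features etcd supports, including data indexing, node state changes, watch and notification, and event handling and execution.

• Raft: the concrete implementation of the Raft strong-consistency algorithm, and the core of etcd.

• WAL: Write-Ahead Log, the way etcd persists its data. Within it, a Snapshot is a state snapshot taken to keep the amount of logged data from growing too large, and an Entry is the actual log content being stored.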

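A user request is forwarded by the HTTP Server to the Store for transaction processing. If the request modifies node data, it is handed to Raft, which performs the state change and records the log entry, then replicates the entry to the other etcd nodes so they can acknowledge it. Once acknowledged, the data is committed, and the commit is synchronized to the other nodes again.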

2 Installing and Starting an etcd Cluster

2.1 Installing etcd (CentOS 7.6)

2.1.1 Method 1: Prebuilt Binaries

Download the archive for your platform from https://github.com/etcd-io/etcd/releases/. To do this with a script, save the following content to a file:

ETCD_VER=v3.5.4

# choose either URL

GOOGLE_URL=https://storage.googleapis.com/etcd

GITHUB_URL=https://github.com/etcd-io/etcd/releases/download

DOWNLOAD_URL=${GOOGLE_URL}

rm -f /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz

rm -rf /tmp/etcd-download-test && mkdir -p /tmp/etcd-download-test

curl -L ${DOWNLOAD_URL}/${ETCD_VER}/etcd-${ETCD_VER}-linux-amd64.tar.gz -o /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz

tar xzvf /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz -C /tmp/etcd-download-test --strip-components=1

rm -f /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz

/tmp/etcd-download-test/etcd --version

/tmp/etcd-download-test/etcdctl version

/tmp/etcd-download-test/etcdutl version

Run the script and check the installation result.

Start a local etcd server:

/tmp/etcd-download-test/etcd

Verify etcd with a write followed by a read:

[root@localhost k8s]# /tmp/etcd-download-test/etcdctl --endpoints=localhost:2379 put foo bar

[root@localhost k8s]# /tmp/etcd-download-test/etcdctl --endpoints=localhost:2379 get foo

foo

bar
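
The write-then-read check above can be wrapped in a small function so it is easy to repeat. This is a minimal sketch, not part of the original article: the function name `etcd_roundtrip` and the `ETCDCTL` and `ENDPOINT` variables are illustrative assumptions, with defaults taken from the paths used in this section. It relies on the `--print-value-only` option of `etcdctl get` so that only the value is compared.

```shell
#!/bin/sh
# Assumed defaults: the binary location and endpoint used earlier in this section.
ETCDCTL="${ETCDCTL:-/tmp/etcd-download-test/etcdctl}"
ENDPOINT="${ENDPOINT:-localhost:2379}"

# etcd_roundtrip KEY VALUE
# Writes KEY=VALUE, reads KEY back, and returns 0 only if the values match.
etcd_roundtrip() {
    key=$1
    value=$2
    # Write the key; the "OK" printed by etcdctl is not needed here.
    "$ETCDCTL" --endpoints="$ENDPOINT" put "$key" "$value" >/dev/null || return 1
    # Read back only the value (no key line) so it can be compared directly.
    got=$("$ETCDCTL" --endpoints="$ENDPOINT" get "$key" --print-value-only) || return 1
    [ "$got" = "$value" ]
}

# Example (requires the local etcd server started above):
#   etcd_roundtrip foo bar && echo "round trip OK"
```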

2.1.2 Method 2: Build from Source

The build machine needs a Go environment and must be able to reach the external network.

2.2 Deploying an etcd Cluster

2.2.1 Specify the Cluster Members on Every Node

TOKEN=token-01

CLUSTER_STATE=new

NAME_1=machine-1

NAME_2=machine-2

NAME_3=machine-3

HOST_1=192.168.22.154

HOST_2=192.168.22.155

HOST_3=192.168.22.157

CLUSTER=${NAME_1}=http://${HOST_1}:2380,${NAME_2}=http://${HOST_2}:2380,${NAME_3}=http://${HOST_3}:2380
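
The `CLUSTER` string is easy to mistype by hand as the member list grows. The sketch below builds the same comma-separated `name=http://host:port` list from `NAME=HOST` pairs. The helper name `join_members` is a hypothetical illustration, not an etcd tool, and the names and addresses are the ones used above.

```shell
#!/bin/sh
# join_members PORT NAME=HOST [NAME=HOST ...]
# Prints "name1=http://host1:PORT,name2=http://host2:PORT,..."
join_members() {
    port=$1
    shift
    out=""
    for pair in "$@"; do
        name=${pair%%=*}   # text before the first "="
        host=${pair#*=}    # text after the first "="
        # Prepend a comma only when out already holds a member.
        out="${out:+$out,}${name}=http://${host}:${port}"
    done
    printf '%s\n' "$out"
}

CLUSTER=$(join_members 2380 \
    machine-1=192.168.22.154 \
    machine-2=192.168.22.155 \
    machine-3=192.168.22.157)
echo "$CLUSTER"
```

The resulting value can be passed to `--initial-cluster` exactly like the hand-written one.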

2.2.2 Run the Following Commands on Each Machine

# For machine 1

THIS_NAME=${NAME_1}

THIS_IP=${HOST_1}

etcd --data-dir=data.etcd --name ${THIS_NAME} \

--initial-advertise-peer-urls http://${THIS_IP}:2380 --listen-peer-urls http://${THIS_IP}:2380 \

--advertise-client-urls http://${THIS_IP}:2379 --listen-client-urls http://${THIS_IP}:2379 \

--initial-cluster ${CLUSTER} \

--initial-cluster-state ${CLUSTER_STATE} --initial-cluster-token ${TOKEN}

# For machine 2

THIS_NAME=${NAME_2}

THIS_IP=${HOST_2}

etcd --data-dir=data.etcd --name ${THIS_NAME} \

--initial-advertise-peer-urls http://${THIS_IP}:2380 --listen-peer-urls http://${THIS_IP}:2380 \

--advertise-client-urls http://${THIS_IP}:2379 --listen-client-urls http://${THIS_IP}:2379 \

--initial-cluster ${CLUSTER} \

--initial-cluster-state ${CLUSTER_STATE} --initial-cluster-token ${TOKEN}

# For machine 3

THIS_NAME=${NAME_3}

THIS_IP=${HOST_3}

etcd --data-dir=data.etcd --name ${THIS_NAME} \

--initial-advertise-peer-urls http://${THIS_IP}:2380 --listen-peer-urls http://${THIS_IP}:2380 \

--advertise-client-urls http://${THIS_IP}:2379 --listen-client-urls http://${THIS_IP}:2379 \

--initial-cluster ${CLUSTER} \

--initial-cluster-state ${CLUSTER_STATE} --initial-cluster-token ${TOKEN}
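
The three commands above differ only in the member name and IP address. As a hedged sketch, the shared flags can be generated by one helper; `etcd_member_args` is a hypothetical name introduced here for illustration, and it uses the `--flag=value` spelling of the same flags shown above.

```shell
#!/bin/sh
# etcd_member_args NAME IP CLUSTER STATE TOKEN
# Prints the etcd flags for one member, one flag per line.
etcd_member_args() {
    cat <<EOF
--data-dir=data.etcd
--name=$1
--initial-advertise-peer-urls=http://$2:2380
--listen-peer-urls=http://$2:2380
--advertise-client-urls=http://$2:2379
--listen-client-urls=http://$2:2379
--initial-cluster=$3
--initial-cluster-state=$4
--initial-cluster-token=$5
EOF
}

# Example for machine 1 (requires etcd on PATH and the variables from 2.2.1;
# the command substitution is left unquoted on purpose so each line becomes one argument):
#   etcd $(etcd_member_args "$NAME_1" "$HOST_1" "$CLUSTER" "$CLUSTER_STATE" "$TOKEN")
```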

2.2.3 Check the Cluster Status

export ETCDCTL_API=3

HOST_1=192.168.22.154

HOST_2=192.168.22.155

HOST_3=192.168.22.157

ENDPOINTS=$HOST_1:2379,$HOST_2:2379,$HOST_3:2379

etcdctl --endpoints=$ENDPOINTS member list

3 Common etcd Operations

3.1 Adding a Key

[root@localhost ~]# etcdctl --endpoints=$ENDPOINTS put foo "Hello World"

[root@localhost ~]# etcdctl --endpoints=$ENDPOINTS get foo

foo

Hello World

[root@localhost ~]# etcdctl --endpoints=$ENDPOINTS --write-out="json" get foo

3.2 Deleting a Key

[root@localhost ~]# etcdctl --endpoints=$ENDPOINTS del foo

[root@localhost ~]# etcdctl --endpoints=$ENDPOINTS get foo

Delete keys by prefix:

[root@localhost ~]# etcdctl --endpoints=$ENDPOINTS put k1 value1

[root@localhost ~]# etcdctl --endpoints=$ENDPOINTS put k2 value2

[root@localhost ~]# etcdctl --endpoints=$ENDPOINTS get k --prefix

k1

value1

k2

value2

[root@localhost ~]# etcdctl --endpoints=$ENDPOINTS del k --prefix

[root@localhost ~]# etcdctl --endpoints=$ENDPOINTS get k --prefix

3.3 Getting Keys by Prefix

[root@localhost ~]# etcdctl --endpoints=$ENDPOINTS put web1 value1

[root@localhost ~]# etcdctl --endpoints=$ENDPOINTS put web2 value2

[root@localhost ~]# etcdctl --endpoints=$ENDPOINTS put web3 value3

[root@localhost ~]# etcdctl --endpoints=$ENDPOINTS get web --prefix

web1

value1

web2

value2

web3

value3

3.4 Checking the Cluster Status

[root@localhost ~]# etcdctl --write-out=table --endpoints=$ENDPOINTS endpoint status

[root@localhost ~]# etcdctl --endpoints=$ENDPOINTS endpoint health

3.5 Backing Up etcd Data

[root@localhost ~]# ENDPOINTS=$HOST_1:2379

[root@localhost ~]# etcdctl --endpoints=$ENDPOINTS snapshot save my.db

[root@localhost ~]# etcdctl --write-out=table --endpoints=$ENDPOINTS snapshot status my.db
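
Note that the commands above narrow `ENDPOINTS` to a single host, because `snapshot save` must be requested from one member. For repeated backups, a timestamped file name avoids overwriting `my.db`. The following is a minimal sketch under that assumption: the helper names `snapshot_path` and `backup_etcd` and the file naming scheme are illustrative, not part of etcd.

```shell
#!/bin/sh
# snapshot_path DIR
# Prints DIR/etcd-YYYYmmdd-HHMMSS.db (a trailing slash on DIR is dropped).
snapshot_path() {
    printf '%s/etcd-%s.db\n' "${1%/}" "$(date +%Y%m%d-%H%M%S)"
}

# backup_etcd ENDPOINT DIR
# Saves a snapshot from a single endpoint and prints the file that was written.
backup_etcd() {
    file=$(snapshot_path "$2")
    # Send etcdctl's own messages to stderr so stdout carries only the path.
    etcdctl --endpoints="$1" snapshot save "$file" >&2 || return 1
    # Treat a missing or empty snapshot file as a failed backup.
    [ -s "$file" ] || return 1
    printf '%s\n' "$file"
}

# Example (requires a reachable etcd member and etcdctl on PATH):
#   backup_etcd "$HOST_1:2379" /var/backups
```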

4 Monitoring an etcd Cluster with Prometheus + Grafana

4.1 Installing Prometheus

[root@localhost ~]# PROMETHEUS_VERSION="2.0.0"

[root@localhost ~]# wget https://github.com/prometheus/prometheus/releases/download/v$PROMETHEUS_VERSION/prometheus-$PROMETHEUS_VERSION.linux-amd64.tar.gz -O /tmp/prometheus-$PROMETHEUS_VERSION.linux-amd64.tar.gz

[root@localhost ~]# tar -xvzf /tmp/prometheus-$PROMETHEUS_VERSION.linux-amd64.tar.gz --directory /tmp/ --strip-components=1

[root@localhost k8s]# /tmp/prometheus --version

4.2 Configuring the etcd Cluster Endpoints

[root@localhost k8s]# cat > /tmp/test-etcd.yaml <<EOF

> global:

>   scrape_interval: 10s

> scrape_configs:

>   - job_name: test-etcd

>     static_configs:

>       - targets: ['192.168.22.154:2379','192.168.22.155:2379','192.168.22.157:2379']

> EOF

[root@localhost k8s]# cat /tmp/test-etcd.yaml

4.3 Starting Prometheus

[root@localhost k8s]# /tmp/prometheus --config.file /tmp/test-etcd.yaml --web.listen-address ":9090" >> /tmp/test-etcd.log

4.4 Accessing Prometheus

4.5 Running a Query
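
Besides the web UI, a query can be issued from the command line through the Prometheus HTTP API. This is a sketch under stated assumptions: Prometheus is listening on `localhost:9090` as started in 4.3, and `prom_query` and `PROM_URL` are hypothetical names introduced here. `up` is a series Prometheus records for every scrape target, and `etcd_server_has_leader` is one of the metrics etcd exposes.

```shell
#!/bin/sh
# Assumed address: the listen address passed to Prometheus in 4.3.
PROM_URL="${PROM_URL:-http://localhost:9090}"

# prom_query EXPR
# Runs an instant query against the Prometheus HTTP API and prints the JSON reply.
prom_query() {
    # -G sends the URL-encoded expression as a query-string parameter.
    curl -s -G "$PROM_URL/api/v1/query" --data-urlencode "query=$1"
}

# Examples (require the running Prometheus instance):
#   prom_query 'up{job="test-etcd"}'      # 1 for each etcd target that was scraped successfully
#   prom_query 'etcd_server_has_leader'   # 1 for each member that currently sees a leader
```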

4.6 Viewing the Metrics in Grafana

4.6.1 Deploying Grafana

[root@localhost k8s]# wget https://dl.grafana.com/enterprise/release/grafana-enterprise-9.0.1-1.x86_64.rpm

[root@localhost k8s]# sudo yum install grafana-enterprise-9.0.1-1.x86_64.rpm

[root@localhost k8s]# systemctl restart grafana-server.service

4.6.2 Logging In to Grafana

http://192.168.22.154:3000/login

Username and password: admin/admin

4.6.3 Adding Prometheus

4.6.4 Importing the etcd Dashboard Template

https://grafana.com/grafana/dashboards/3070

4.6.5 Viewing the Dashboard

5 Summary

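Because etcd writes data to disk, its performance depends heavily on disk performance, so SSDs are strongly recommended. An etcd cluster needs a majority of its nodes (a quorum) to agree on updates to the cluster state. For a cluster with n members, the quorum is (n/2)+1, with n/2 rounded down. An etcd cluster should probably have no more than seven nodes. A 5-member etcd cluster can tolerate the failure of two members, which is enough in most cases. Although larger clusters provide better fault tolerance, write performance suffers because data must be replicated across more machines.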
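
The quorum arithmetic is simple enough to check in the shell. The sketch below uses integer division, which rounds down, and the function names `quorum` and `fault_tolerance` are illustrative. It also shows why even-sized clusters add no fault tolerance: four members tolerate one failure, the same as three.

```shell
#!/bin/sh
# quorum N: number of members that must agree (a majority of N)
quorum() {
    echo $(( $1 / 2 + 1 ))
}

# fault_tolerance N: members that can fail while a quorum still remains
fault_tolerance() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 1 3 4 5 7; do
    echo "members=$n quorum=$(quorum "$n") tolerates=$(fault_tolerance "$n")"
done
# members=1 quorum=1 tolerates=0
# members=3 quorum=2 tolerates=1
# members=4 quorum=3 tolerates=1
# members=5 quorum=3 tolerates=2
# members=7 quorum=4 tolerates=3
```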

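Author: Ding Yunguan (丁运管), Sangfor Certified Cloud Computing Expert (SCCE-C), senior lecturer at the Industry Education Center, and certified cloud computing architect. Previously a senior operations engineer and senior cloud computing lecturer at Alibaba Cloud and Hongfu Group, and a frequent guest lecturer for large enterprises such as China Telecom and China Mobile, providing training courses and technical consulting. Focused on research into technologies such as Docker, Kubernetes, and OpenStack, with extensive hands-on cloud computing experience as well as experience in building and delivering course materials.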