K8s: An Overview of the etcd Component


Preface

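K8s, short for Kubernetes, is the cloud management platform of the container era and is widely used for containerization and microservices. Like OpenStack, K8s is built from several cooperating components: Deployment describes how a workload is rolled out; kubelet receives the instructions sent by the Master; apiserver exposes the external interface; scheduler handles scheduling; and etcd is responsible for monitoring data and for keeping cluster information in sync. This article focuses on what the etcd module does and on the code behind it.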


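I. What etcd does in K8s

etcd is a highly available key-value store that uses the Raft consensus algorithm to guarantee strong consistency. In K8s it holds the cluster's state behind the apiserver, and it is mainly used for node monitoring, shared configuration, and service discovery.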
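To make "shared configuration and service discovery" concrete, here is a minimal sketch of a key-value store with prefix watches. It is a toy in-memory model, not etcd's client API: `ToyKV`, `Event`, and the key names are all hypothetical, and there is no replication or persistence.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// Event describes one change to a key. Hypothetical type for this sketch.
type Event struct {
	Key      string
	Value    string
	Revision int64
}

// ToyKV is an in-memory stand-in for a watchable key-value store.
// It only models the put/get/watch idea and is NOT etcd's API.
type ToyKV struct {
	mu       sync.Mutex
	data     map[string]string
	revision int64
	watchers map[string][]chan Event // keyed by watched prefix
}

func NewToyKV() *ToyKV {
	return &ToyKV{data: map[string]string{}, watchers: map[string][]chan Event{}}
}

// Put stores a value, bumps the store-wide revision, and notifies every
// watcher whose prefix matches the key.
func (kv *ToyKV) Put(key, value string) int64 {
	kv.mu.Lock()
	defer kv.mu.Unlock()
	kv.revision++
	kv.data[key] = value
	ev := Event{Key: key, Value: value, Revision: kv.revision}
	for prefix, chans := range kv.watchers {
		if !strings.HasPrefix(key, prefix) {
			continue
		}
		for _, ch := range chans {
			select {
			case ch <- ev:
			default: // this toy drops the event if the watcher is too slow
			}
		}
	}
	return kv.revision
}

// Get returns the current value for a key.
func (kv *ToyKV) Get(key string) (string, bool) {
	kv.mu.Lock()
	defer kv.mu.Unlock()
	v, ok := kv.data[key]
	return v, ok
}

// Watch returns a buffered channel that receives changes under prefix.
func (kv *ToyKV) Watch(prefix string) <-chan Event {
	kv.mu.Lock()
	defer kv.mu.Unlock()
	ch := make(chan Event, 16)
	kv.watchers[prefix] = append(kv.watchers[prefix], ch)
	return ch
}

func main() {
	kv := NewToyKV()
	events := kv.Watch("/services/") // service discovery: watch a prefix

	kv.Put("/config/log-level", "debug")       // shared configuration
	kv.Put("/services/web/10.0.0.1", "alive") // a service registers itself

	ev := <-events
	fmt.Println(ev.Key, ev.Value, ev.Revision) // /services/web/10.0.0.1 alive 2

	v, ok := kv.Get("/config/log-level")
	fmt.Println(v, ok) // debug true
}
```

The watcher sees only the key under its prefix, and the revision it carries is store-wide, which is why it is 2 rather than 1.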

II. Main components of etcd

[Figure: etcd component overview]
As the figure above shows, etcd has four main components: HTTP Server, Store, Raft, and WAL.

1. HTTP Server

Handles HTTP requests.

// NewClientHandler generates a muxed http.Handler with the given parameters to serve etcd client requests.
func NewClientHandler(lg *zap.Logger, server etcdserver.ServerPeer, timeout time.Duration) http.Handler {
	if lg == nil {
		lg = zap.NewNop()
	}
	mux := http.NewServeMux()
	etcdhttp.HandleBasic(lg, mux, server)
	etcdhttp.HandleMetricsHealth(lg, mux, server)
	return requestLogger(lg, mux)
}
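The pattern in the excerpt above (build a mux, register routes, wrap everything in a logging handler) can be sketched with the standard library alone. This is an illustration under assumptions: the `/health` route, its response body, and the helpers `newHandler` and `get` are hypothetical, and `requestLogger` here is a simplified stand-in, not etcd's implementation.

```go
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"net/http/httptest"
	"os"
)

// requestLogger wraps a handler and logs each request's method and path.
// Simplified stand-in for the wrapper named in the excerpt above.
func requestLogger(lg *log.Logger, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		lg.Printf("%s %s", r.Method, r.URL.Path)
		next.ServeHTTP(w, r)
	})
}

// newHandler builds a muxed handler with one hypothetical /health route.
func newHandler(lg *log.Logger) http.Handler {
	if lg == nil {
		lg = log.New(io.Discard, "", 0) // no-op logger, like zap.NewNop()
	}
	mux := http.NewServeMux()
	mux.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, `{"health":"true"}`)
	})
	return requestLogger(lg, mux)
}

// get issues an in-process GET and returns the status code and body.
func get(h http.Handler, path string) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code, rec.Body.String()
}

func main() {
	h := newHandler(log.New(os.Stdout, "http: ", 0))
	code, body := get(h, "/health")
	fmt.Println(code, body)
}
```

Wrapping the mux rather than each route keeps logging in one place, so routes registered later are covered automatically.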

2. Store

Handles the transactions behind each feature etcd supports; it is the concrete implementation of the API's functionality.

type Storage interface {
	// Save function saves ents and state to the underlying stable storage.
	// Save MUST block until st and ents are on stable storage.
	Save(st raftpb.HardState, ents []raftpb.Entry) error
	// SaveSnap function saves snapshot to the underlying stable storage.
	SaveSnap(snap raftpb.Snapshot) error
	// Close closes the Storage and performs finalization.
	Close() error
	// Release releases the locked wal files older than the provided snapshot.
	Release(snap raftpb.Snapshot) error
	// Sync WAL
	Sync() error
	// MinimalEtcdVersion returns minimal etcd storage able to interpret WAL log.
	MinimalEtcdVersion() *semver.Version
}

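The interface above covers the Store's main operations: persisting state and log entries (Save), saving snapshots (SaveSnap), and syncing the WAL (Sync).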
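The comment "Save MUST block until st and ents are on stable storage" is the key contract here. A minimal sketch of what honoring it could look like, assuming a single append-only JSON file: `HardState`, `Entry`, `fileStorage`, and the file name are hypothetical simplifications, not the raftpb types or etcd's storage code.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"os"
	"path/filepath"
)

// HardState and Entry are simplified, hypothetical stand-ins for
// raftpb.HardState and raftpb.Entry.
type HardState struct{ Term, Vote, Commit uint64 }

type Entry struct {
	Term, Index uint64
	Data        []byte
}

// record is one JSON line in the log file.
type record struct {
	State   HardState
	Entries []Entry
}

// fileStorage appends records to a single file.
type fileStorage struct {
	f   *os.File
	enc *json.Encoder
}

func openFileStorage(path string) (*fileStorage, error) {
	f, err := os.OpenFile(path, os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0o600)
	if err != nil {
		return nil, err
	}
	return &fileStorage{f: f, enc: json.NewEncoder(f)}, nil
}

// Save writes the record and returns only after fsync has completed.
func (s *fileStorage) Save(st HardState, ents []Entry) error {
	if err := s.enc.Encode(record{State: st, Entries: ents}); err != nil {
		return err
	}
	return s.Sync()
}

// Sync flushes the file to stable storage.
func (s *fileStorage) Sync() error { return s.f.Sync() }

// Close closes the underlying file.
func (s *fileStorage) Close() error { return s.f.Close() }

// countRecords re-reads the file and counts the decodable records.
func countRecords(path string) (int, error) {
	f, err := os.Open(path)
	if err != nil {
		return 0, err
	}
	defer f.Close()
	dec := json.NewDecoder(f)
	n := 0
	for {
		var r record
		if err := dec.Decode(&r); err == io.EOF {
			return n, nil
		} else if err != nil {
			return n, err
		}
		n++
	}
}

// demo saves two records in a temporary directory and reads them back.
func demo() (int, error) {
	dir, err := os.MkdirTemp("", "toystore")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "log.jsonl")
	s, err := openFileStorage(path)
	if err != nil {
		return 0, err
	}
	for i := uint64(1); i <= 2; i++ {
		ents := []Entry{{Term: 1, Index: i, Data: []byte("x")}}
		if err := s.Save(HardState{Term: 1, Commit: i}, ents); err != nil {
			return 0, err
		}
	}
	if err := s.Close(); err != nil {
		return 0, err
	}
	return countRecords(path)
}

func main() {
	n, err := demo()
	fmt.Println(n, err) // 2 <nil>
}
```

Calling fsync on every Save is the simplest way to meet the contract; it trades throughput for the guarantee that an acknowledged write survives a crash.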


3. Raft

The concrete implementation of the strong-consistency algorithm and the core of etcd. The main interface of a node is as follows:

type Node interface {
   // Tick increments the internal logical clock for the Node by a single tick. Election
   // timeouts and heartbeat timeouts are in units of ticks.
   Tick()
   // Campaign causes the Node to transition to candidate state and start campaigning to become leader.
   Campaign(ctx context.Context) error
   // Propose proposes that data be appended to the log. Note that proposals can be lost without
   // notice, therefore it is user's job to ensure proposal retries.
   Propose(ctx context.Context, data []byte) error
   // ProposeConfChange proposes a configuration change. Like any proposal, the
   // configuration change may be dropped with or without an error being
   // returned. In particular, configuration changes are dropped unless the
   // leader has certainty that there is no prior unapplied configuration
   // change in its log.
   //
   // The method accepts either a pb.ConfChange (deprecated) or pb.ConfChangeV2
   // message. The latter allows arbitrary configuration changes via joint
   // consensus, notably including replacing a voter. Passing a ConfChangeV2
   // message is only allowed if all Nodes participating in the cluster run a
   // version of this library aware of the V2 API. See pb.ConfChangeV2 for
   // usage details and semantics.
   ProposeConfChange(ctx context.Context, cc pb.ConfChangeI) error

   // Step advances the state machine using the given message. ctx.Err() will be returned, if any.
   Step(ctx context.Context, msg pb.Message) error

   // Ready returns a channel that returns the current point-in-time state.
   // Users of the Node must call Advance after retrieving the state returned by Ready.
   //
   // NOTE: No committed entries from the next Ready may be applied until all committed entries
   // and snapshots from the previous one have finished.
   Ready() <-chan Ready

   // Advance notifies the Node that the application has saved progress up to the last Ready.
   // It prepares the node to return the next available Ready.
   //
   // The application should generally call Advance after it applies the entries in last Ready.
   //
   // However, as an optimization, the application may call Advance while it is applying the
   // commands. For example. when the last Ready contains a snapshot, the application might take
   // a long time to apply the snapshot data. To continue receiving Ready without blocking raft
   // progress, it can call Advance before finishing applying the last ready.
   Advance()
   // ApplyConfChange applies a config change (previously passed to
   // ProposeConfChange) to the node. This must be called whenever a config
   // change is observed in Ready.CommittedEntries, except when the app decides
   // to reject the configuration change (i.e. treats it as a noop instead), in
   // which case it must not be called.
   //
   // Returns an opaque non-nil ConfState protobuf which must be recorded in
   // snapshots.
   ApplyConfChange(cc pb.ConfChangeI) *pb.ConfState

   // TransferLeadership attempts to transfer leadership to the given transferee.
   TransferLeadership(ctx context.Context, lead, transferee uint64)

   // ReadIndex request a read state. The read state will be set in the ready.
   // Read state has a read index. Once the application advances further than the read
   // index, any linearizable read requests issued before the read request can be
   // processed safely. The read state will have the same rctx attached.
   // Note that request can be lost without notice, therefore it is user's job
   // to ensure read index retries.
   ReadIndex(ctx context.Context, rctx []byte) error

   // Status returns the current status of the raft state machine.
   Status() Status
   // ReportUnreachable reports the given node is not reachable for the last send.
   ReportUnreachable(id uint64)
   // ReportSnapshot reports the status of the sent snapshot. The id is the raft ID of the follower
   // who is meant to receive the snapshot, and the status is SnapshotFinish or SnapshotFailure.
   // Calling ReportSnapshot with SnapshotFinish is a no-op. But, any failure in applying a
   // snapshot (for e.g., while streaming it from leader to follower), should be reported to the
   // leader with SnapshotFailure. When leader sends a snapshot to a follower, it pauses any raft
   // log probes until the follower can apply the snapshot and advance its state. If the follower
   // can't do that, for e.g., due to a crash, it could end up in a limbo, never getting any
   // updates from the leader. Therefore, it is crucial that the application ensures that any
   // failure in snapshot sending is caught and reported back to the leader; so it can resume raft
   // log probing in the follower.
   ReportSnapshot(id uint64, status SnapshotStatus)
   // Stop performs any necessary termination of the Node.
   Stop()
}
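The Ready/Advance contract described in the comments above is easiest to see as a loop: receive a Ready, persist its entries, apply the committed ones, then call Advance. The sketch below simulates that loop with a toy single-member node. `toyNode`, `Entry`, `Ready`, and `runDemo` are hypothetical simplifications; there are no elections, terms, or peers, and every proposal is treated as committed immediately.

```go
package main

import "fmt"

// Entry and Ready are simplified, hypothetical stand-ins for the raft types.
type Entry struct {
	Index uint64
	Data  string
}

type Ready struct {
	Entries          []Entry // must be persisted first
	CommittedEntries []Entry // safe to apply to the state machine
}

// toyNode models a one-member cluster, so each proposal commits at once.
type toyNode struct {
	proposeC chan string
	readyC   chan Ready
	advanceC chan struct{}
	stopC    chan struct{}
}

func newToyNode() *toyNode {
	n := &toyNode{
		proposeC: make(chan string, 8),
		readyC:   make(chan Ready),
		advanceC: make(chan struct{}),
		stopC:    make(chan struct{}),
	}
	go n.run()
	return n
}

// run hands out one Ready per proposal and waits for Advance before the next.
func (n *toyNode) run() {
	next := uint64(1)
	for {
		select {
		case data := <-n.proposeC:
			e := Entry{Index: next, Data: data}
			next++
			rd := Ready{Entries: []Entry{e}, CommittedEntries: []Entry{e}}
			select {
			case n.readyC <- rd:
			case <-n.stopC:
				return
			}
			select {
			case <-n.advanceC:
			case <-n.stopC:
				return
			}
		case <-n.stopC:
			return
		}
	}
}

func (n *toyNode) Propose(data string) { n.proposeC <- data }
func (n *toyNode) Ready() <-chan Ready { return n.readyC }
func (n *toyNode) Advance()            { n.advanceC <- struct{}{} }
func (n *toyNode) Stop()               { close(n.stopC) }

// runDemo proposes the inputs (at most 8, the channel's buffer size) and
// drives the loop: persist, apply, advance.
func runDemo(inputs []string) (persisted []Entry, applied []string) {
	n := newToyNode()
	defer n.Stop()
	for _, in := range inputs {
		n.Propose(in)
	}
	for range inputs {
		rd := <-n.Ready()
		persisted = append(persisted, rd.Entries...) // 1. persist entries
		for _, e := range rd.CommittedEntries {      // 2. apply committed entries
			applied = append(applied, e.Data)
		}
		n.Advance() // 3. signal readiness for the next Ready
	}
	return persisted, applied
}

func main() {
	persisted, applied := runDemo([]string{"a", "b", "c"})
	fmt.Println(len(persisted), applied) // 3 [a b c]
}
```

The ordering inside the loop is the point: entries are persisted before anything is applied, and the node does not produce another Ready until Advance is called.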

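4. WAL

WAL stands for Write Ahead Log, which is how etcd stores data. etcd persists through the WAL: every change is recorded in the log before it is committed. A Snapshot is a point-in-time capture of state, taken so that the log does not grow without bound; an Entry is one stored log record. The interface below shows how the Raft layer reads state, log entries, and snapshots back from storage: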

type Storage interface {
	// TODO(tbg): split this into two interfaces, LogStorage and StateStorage.

	// InitialState returns the saved HardState and ConfState information.
	InitialState() (pb.HardState, pb.ConfState, error)
	// Entries returns a slice of log entries in the range [lo,hi).
	// MaxSize limits the total size of the log entries returned, but
	// Entries returns at least one entry if any.
	Entries(lo, hi, maxSize uint64) ([]pb.Entry, error)
	// Term returns the term of entry i, which must be in the range
	// [FirstIndex()-1, LastIndex()]. The term of the entry before
	// FirstIndex is retained for matching purposes even though the
	// rest of that entry may not be available.
	Term(i uint64) (uint64, error)
	// LastIndex returns the index of the last entry in the log.
	LastIndex() (uint64, error)
	// FirstIndex returns the index of the first log entry that is
	// possibly available via Entries (older entries have been incorporated
	// into the latest Snapshot; if storage only contains the dummy entry the
	// first log entry is not available).
	FirstIndex() (uint64, error)
	// Snapshot returns the most recent snapshot.
	// If snapshot is temporarily unavailable, it should return ErrSnapshotTemporarilyUnavailable,
	// so raft state machine could know that Storage needs some time to prepare
	// snapshot and call Snapshot later.
	Snapshot() (pb.Snapshot, error)
}
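The write-ahead idea itself, append a record durably and replay the log on restart, can be sketched in a few lines. The record layout used here (a 4-byte length, a 4-byte CRC32, then the payload) is an assumption made for illustration and is not etcd's on-disk WAL format; `appendRecord`, `replay`, and `demoWAL` are hypothetical names.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
	"hash/crc32"
	"io"
)

var errCorrupt = errors.New("toywal: checksum mismatch")

// appendRecord writes [length uint32][crc32 uint32][payload] to w.
func appendRecord(w io.Writer, payload []byte) error {
	var hdr [8]byte
	binary.LittleEndian.PutUint32(hdr[0:4], uint32(len(payload)))
	binary.LittleEndian.PutUint32(hdr[4:8], crc32.ChecksumIEEE(payload))
	if _, err := w.Write(hdr[:]); err != nil {
		return err
	}
	_, err := w.Write(payload)
	return err
}

// replay reads records until a clean EOF and returns their payloads.
// It stops at the first torn or corrupt record and returns what it
// recovered so far together with the error.
func replay(r io.Reader) ([][]byte, error) {
	var out [][]byte
	for {
		var hdr [8]byte
		if _, err := io.ReadFull(r, hdr[:]); err != nil {
			if err == io.EOF {
				return out, nil // clean end of log
			}
			return out, err // torn header
		}
		payload := make([]byte, binary.LittleEndian.Uint32(hdr[0:4]))
		if _, err := io.ReadFull(r, payload); err != nil {
			return out, err // torn payload
		}
		if crc32.ChecksumIEEE(payload) != binary.LittleEndian.Uint32(hdr[4:8]) {
			return out, errCorrupt
		}
		out = append(out, payload)
	}
}

// demoWAL appends two records to an in-memory log, optionally flips the
// last byte, and reports how many records replay recovers.
func demoWAL(corrupt bool) (int, error) {
	var buf bytes.Buffer
	for _, p := range []string{"put a=1", "put b=2"} {
		if err := appendRecord(&buf, []byte(p)); err != nil {
			return 0, err
		}
	}
	raw := buf.Bytes()
	if corrupt {
		raw[len(raw)-1] ^= 0xFF // damage the last payload byte
	}
	recs, err := replay(bytes.NewReader(raw))
	return len(recs), err
}

func main() {
	fmt.Println(demoWAL(false)) // 2 <nil>
	fmt.Println(demoWAL(true))  // 1 toywal: checksum mismatch
}
```

The checksum lets replay tell a damaged tail from valid data, so recovery can keep everything up to the last intact record.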

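Summary

etcd is the component Kubernetes relies on for monitoring data and for keeping cluster information in sync, so understanding it goes a long way toward understanding how a K8s cluster works internally.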
