etcd Backend Storage Source Code Analysis: Low-Level Read and Write Operations

Latest recommended article published on 2024-05-08 14:15:02

ppingfann

Category: Databases. Tags: etcd

Article link: https://blog.csdn.net/hty46565/article/details/109556191

Background

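I have been looking for well-written open-source projects in Go to learn from. etcd, a widely used, highly available, strongly consistent key-value store for service discovery, is well worth studying.
This article walks through the source code of etcd's backend storage, in the hope of learning something from it.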

Differences between the major etcd versions

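The two major etcd versions in common use are v2 and v3. They differ mainly in the following ways:

1. The v2 store keeps its key space in memory only, with no on-disk key-value database (durability comes from the write-ahead log and snapshots). v3 persists keys to an on-disk database and also uses in-memory buffering to speed up queries.
2. The externally exposed API changed between v2 and v3. On the command line, the ETCDCTL_API environment variable selects which API version etcdctl uses.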

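This article covers the backend storage implementation in v3.
It is limited to the low-level read and write interfaces and does not cover the higher-level read and write steps (such as choosing the revision of a key).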

The etcd backend storage interface

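Approach:

1. Look at the backend storage interface that etcd defines.
2. Look at the struct that implements that interface.
3. Look at that struct's constructor.
4. Look at the default values used to initialize the struct.
5. Walk through what the constructor actually does.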

First, let's look at the backend storage interface that etcd defines:
Path: https://github.com/etcd-io/etcd/blob/master/mvcc/backend/backend.go

type Backend interface {
    // ReadTx returns a read transaction. It is replaced by ConcurrentReadTx in the main data path, see #10523.
    ReadTx() ReadTx
    BatchTx() BatchTx
    // ConcurrentReadTx returns a non-blocking read transaction.
    ConcurrentReadTx() ReadTx

    Snapshot() Snapshot
    Hash(ignores map[IgnoreKey]struct{}) (uint32, error)
    // Size returns the current size of the backend physically allocated.
    // The backend can hold DB space that is not utilized at the moment,
    // since it can conduct pre-allocation or spare unused space for recycling.
    // Use SizeInUse() instead for the actual DB size.
    Size() int64
    // SizeInUse returns the current size of the backend logically in use.
    // Since the backend can manage free space in a non-byte unit such as
    // number of pages, the returned value can be not exactly accurate in bytes.
    SizeInUse() int64
    // OpenReadTxN returns the number of currently open read transactions in the backend.
    OpenReadTxN() int64
    Defrag() error
    ForceCommit()
    Close() error
}

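The Backend interface wraps everything the etcd backend offers. The two most important methods are
ReadTx(), which returns a read-only transaction, and BatchTx(), which returns a read-write transaction.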
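The split between a read-only transaction and a read-write transaction can be sketched with a toy in-memory backend. Everything below (`memBackend`, `readTx`, `batchTx`, and their methods) is hypothetical and written for illustration; it is not etcd's code. In particular, the visibility rule used here (readers see a write only after `Commit`) is a simplification chosen for the toy and does not describe how etcd buffers writes.

```go
package main

import (
	"fmt"
	"sync"
)

// readTx is a hypothetical read-only view: it can look keys up but not change them.
type readTx interface {
	Get(key string) (string, bool)
}

// batchTx is a hypothetical read-write transaction: writes are buffered
// until Commit publishes them to read-only views.
type batchTx interface {
	readTx
	Put(key, value string)
	Commit()
}

// memBackend is a toy stand-in for a storage backend.
type memBackend struct {
	mu        sync.RWMutex
	committed map[string]string
	pending   map[string]string
}

func newMemBackend() *memBackend {
	return &memBackend{committed: map[string]string{}, pending: map[string]string{}}
}

func (b *memBackend) ReadTx() readTx   { return memReadTx{b} }
func (b *memBackend) BatchTx() batchTx { return memBatchTx{b} }

type memReadTx struct{ b *memBackend }

// Get only consults committed data.
func (t memReadTx) Get(key string) (string, bool) {
	t.b.mu.RLock()
	defer t.b.mu.RUnlock()
	v, ok := t.b.committed[key]
	return v, ok
}

type memBatchTx struct{ b *memBackend }

// Get sees the transaction's own pending writes first, then committed data.
func (t memBatchTx) Get(key string) (string, bool) {
	t.b.mu.RLock()
	defer t.b.mu.RUnlock()
	if v, ok := t.b.pending[key]; ok {
		return v, true
	}
	v, ok := t.b.committed[key]
	return v, ok
}

// Put buffers a write without publishing it.
func (t memBatchTx) Put(key, value string) {
	t.b.mu.Lock()
	defer t.b.mu.Unlock()
	t.b.pending[key] = value
}

// Commit publishes all buffered writes and clears the buffer.
func (t memBatchTx) Commit() {
	t.b.mu.Lock()
	defer t.b.mu.Unlock()
	for k, v := range t.b.pending {
		t.b.committed[k] = v
	}
	t.b.pending = map[string]string{}
}

func main() {
	b := newMemBackend()
	w := b.BatchTx()
	w.Put("foo", "bar")
	_, ok := b.ReadTx().Get("foo")
	fmt.Println("visible before commit:", ok)
	w.Commit()
	v, ok := b.ReadTx().Get("foo")
	fmt.Println("visible after commit:", ok, v)
}
```

Keeping the read-only type free of `Put` and `Commit` means a caller that only holds a `readTx` cannot modify the store by accident, which is the main point of exposing two separate methods.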
Backend is the interface the backend package exposes, and the backend struct implements it.
Path: https://github.com/etcd-io/etcd/blob/master/mvcc/backend/backend.go

type backend struct {
	// size and commits are used with atomic operations so they must be
	// 64-bit aligned, otherwise 32-bit tests will crash

	// size is the number of bytes allocated in the backend
	size int64
	// sizeInUse is the number of bytes actually used in the backend
	sizeInUse int64
	// commits counts number of commits since start
	commits int64
	// openReadTxN is the number of currently open read transactions in the backend
	openReadTxN int64

	// mu is a read-write mutex
	mu sync.RWMutex
	// db is the BoltDB instance; etcd uses Bolt as its underlying storage database
	db *bolt.DB

	// fields used for read-write (batch) transactions
	batchInterval time.Duration
	batchLimit    int
	batchTx       *batchTxBuffered

	// readTx is used for read-only operations (Tx stands for transaction)
	readTx *readTx

	stopc chan struct{}
	donec chan struct{}

	// lg is the logger
	lg *zap.Logger
}
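The comment at the top of the struct explains why the int64 counters come first: the Go `sync/atomic` documentation states that on some 32-bit platforms the caller must arrange 64-bit alignment for 64-bit values accessed atomically, and that the first word of an allocated struct can be relied on to be 64-bit aligned. A minimal sketch of the same layout idea follows; the `stats` type and its methods are hypothetical and only illustrate the pattern.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// stats is a hypothetical struct that follows the same layout rule:
// int64 fields accessed through sync/atomic are placed first so that
// they stay 64-bit aligned even on 32-bit platforms.
type stats struct {
	commits     int64
	openReadTxN int64

	name string // non-atomic fields follow the counters
}

// commit records one commit.
func (s *stats) commit() { atomic.AddInt64(&s.commits, 1) }

// Commits returns the number of commits recorded so far.
func (s *stats) Commits() int64 { return atomic.LoadInt64(&s.commits) }

// openRead and closeRead track the number of open read transactions.
func (s *stats) openRead()  { atomic.AddInt64(&s.openReadTxN, 1) }
func (s *stats) closeRead() { atomic.AddInt64(&s.openReadTxN, -1) }

// OpenReads returns the number of currently open read transactions.
func (s *stats) OpenReads() int64 { return atomic.LoadInt64(&s.openReadTxN) }

func main() {
	s := &stats{name: "demo"}
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			s.commit() // safe to call concurrently without holding a mutex
		}()
	}
	wg.Wait()
	fmt.Println(s.name, "commits:", s.Commits())
}
```

Atomic counters let these statistics be updated and read without taking the struct's mutex, which keeps them cheap on hot paths.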

The field db *bolt.DB shows that etcd's underlying storage database is BoltDB.
Next, let's see how the backend struct is initialized.
In the same file we find the New function:

// New creates a new backend instance.
func New(bcfg BackendConfig) Backend {
    return newBackend(bcfg)
}

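The function takes a parameter bcfg of type BackendConfig, which holds the configuration of the backend storage.
Let's first see what this configuration contains.
The BackendConfig struct is defined in the same file: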

type BackendConfig struct {
    // Path is the file path to the backend file.
    Path string
    // BatchInterval is the maximum time before flushing the BatchTx.
    BatchInterval time.Duration
    // BatchLimit is the maximum puts before flushing the BatchTx.
    BatchLimit int
    // BackendFreelistType is the backend boltdb's freelist type.
    BackendFreelistType bolt.FreelistType
    // MmapSize is the number of bytes to mmap for the backend.
    MmapSize uint64
    // Logger logs backend-side operations.
    Logger *zap.Logger
    // UnsafeNoFsync disables all uses of fsync.
    UnsafeNoFsync bool `json:"unsafe-no-fsync"`
}

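As you can see, this struct carries most of the information needed to initialize the backend.
Where there is configuration there is usually a default configuration as well,
so let's see which values etcd's storage layer gets by default.
The DefaultBackendConfig function is in the same file: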

func DefaultBackendConfig() BackendConfig {
    return BackendConfig{
        BatchInterval: defaultBatchInterval,
        BatchLimit:    defaultBatchLimit,
        MmapSize:      initialMmapSize,
    }
}
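A default-config function like this is typically used by starting from the defaults and overriding only the fields that differ. The sketch below illustrates that pattern with a hypothetical, trimmed-down `config` type rather than etcd's real BackendConfig; the default values are the ones defined in the var block of backend.go, and the file path is made up for the example.

```go
package main

import (
	"fmt"
	"time"
)

// config is a hypothetical, trimmed-down mirror of BackendConfig.
type config struct {
	Path          string
	BatchInterval time.Duration
	BatchLimit    int
	MmapSize      uint64
}

// defaultConfig prefills the tunables and leaves Path empty,
// because the file location has no sensible default.
func defaultConfig() config {
	return config{
		BatchInterval: 100 * time.Millisecond,
		BatchLimit:    10000,
		MmapSize:      10 * 1024 * 1024 * 1024, // 10 GiB
	}
}

func main() {
	cfg := defaultConfig()
	cfg.Path = "/tmp/demo.db" // hypothetical path, for illustration only
	cfg.BatchLimit = 500      // override one tunable, keep the rest
	fmt.Printf("%+v\n", cfg)
}
```

Because the config is returned by value, each caller gets its own copy and overrides never leak into later calls to the default function.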

Look up any of these package-level variables, for example defaultBatchInterval, to see the default values:

var (
    defaultBatchLimit = 10000
    defaultBatchInterval = 100 * time.Millisecond
    defragLimit = 10000
    // initialMmapSize is the initial size of the mmapped region. Setting this larger than
    // the potential max db size can prevent writer from blocking reader.
    // This only works for linux.
    initialMmapSize = uint64(10 * 1024 * 1024 * 1024)
    // minSnapshotWarningTimeout is the minimum threshold to trigger a long running snapshot warning.
    minSnapshotWarningTimeout = 30 * time.Second
)

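Taking defaultBatchInterval as an example: by default, etcd automatically commits the batch transaction at most 100 milliseconds after the previous commit.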
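How a batch interval and a batch limit can work together is sketched below: writes accumulate, a commit happens as soon as the number of pending writes reaches the limit, and a periodic timer commits whatever is still pending. The `batcher` type and all of its methods are hypothetical and written for this illustration under those assumptions; this is not etcd's implementation.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// batcher is a hypothetical stand-in for a buffered batch transaction.
type batcher struct {
	mu      sync.Mutex
	limit   int // commit as soon as this many writes are pending
	pending int
	commits int
}

// put records one write and commits early once the limit is reached.
func (b *batcher) put() {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.pending++
	if b.pending >= b.limit {
		b.commitLocked()
	}
}

// tick is what the periodic timer calls: commit only if something is pending.
func (b *batcher) tick() {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.pending > 0 {
		b.commitLocked()
	}
}

// commitLocked flushes pending writes; the caller must hold b.mu.
func (b *batcher) commitLocked() {
	b.pending = 0
	b.commits++
}

// stats returns the pending write count and the number of commits so far.
func (b *batcher) stats() (pending, commits int) {
	b.mu.Lock()
	defer b.mu.Unlock()
	return b.pending, b.commits
}

// run calls tick on a fixed interval until stop is closed.
func (b *batcher) run(interval time.Duration, stop <-chan struct{}) {
	t := time.NewTicker(interval)
	defer t.Stop()
	for {
		select {
		case <-t.C:
			b.tick()
		case <-stop:
			return
		}
	}
}

func main() {
	b := &batcher{limit: 3}
	stop := make(chan struct{})
	done := make(chan struct{})
	go func() {
		b.run(100*time.Millisecond, stop)
		close(done)
	}()
	for i := 0; i < 4; i++ {
		b.put() // the third put hits the limit; the fourth waits for the timer
	}
	time.Sleep(250 * time.Millisecond)
	close(stop)
	<-done
	p, c := b.stats()
	fmt.Println("pending:", p, "commits:", c)
}
```

The limit bounds how much work a single commit has to flush, while the interval bounds how long a write can sit uncommitted when traffic is light.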
That covers the default values of the backend storage, so let's return to how the struct is initialized.
The New function calls newBackend,
so let's see what newBackend does:

func newBackend(bcfg BackendConfig) *backend {
    if bcfg.Logger == nil {
        bcfg.Logger = zap.NewNop()
    }

    // load the bolt options
    bopts := &bolt.Options{}
    if boltOpenOptions != nil {
        *bopts = *boltOpenOptions
    }
    bopts
