Ethereum Source Code Analysis: The ethash Consensus Algorithm

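A blockchain is built as a distributed system. Because it does not rely on a central authority, its scattered nodes must agree on whether each transaction is valid, and the mechanism for reaching that agreement is the consensus algorithm.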

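Ethereum's algorithm at the time of writing is ethash, a PoW-style algorithm. Like Bitcoin's, it consumes computing resources, but it is also designed to resist specialized mining hardware, which makes mining fairer.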

The general idea behind PoW algorithms

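PoW stands for proof of work: the result of the work proves that the work was done. The intent is that anyone can take part with a low barrier (verification is easy), while producing a valid result is hard (it takes a lot of computation). A one-way hash function whose output only has to match a partial pattern fits this requirement well.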

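The miner keeps changing a random number, hashes the data, and checks whether the hash is no greater than a given value derived from the difficulty; a hash that passes is a valid result. The random number usually comes from the nonce field of the block header. Because the target block time is fixed while the total hash power is not, the difficulty is adjusted over time based on how long blocks actually take to produce.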

The idea behind ethash

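ethash follows the same PoW pattern. Like Bitcoin, its inputs come from the header and the nonce, but it also uses a dataset of its own. The core code, trimmed down, looks like this: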

func (ethash *Ethash) mine(block *types.Block, id int, seed uint64, abort chan struct{}, found chan *types.Block) {
	var (
		header  = block.Header()
		hash    = ethash.SealHash(header).Bytes()
		target  = new(big.Int).Div(two256, header.Difficulty)
		number  = header.Number.Uint64()
		dataset = ethash.dataset(number, false)
	)
	var (
		attempts = int64(0)
		nonce    = seed
	)
	for {
		attempts++
		digest, result := hashimotoFull(dataset.dataset, hash, nonce)
		if new(big.Int).SetBytes(result).Cmp(target) <= 0 {
			// Correct nonce found, create a new header with it
			header = types.CopyHeader(header)
			header.Nonce = types.EncodeNonce(nonce)
			header.MixDigest = common.BytesToHash(digest)
			...
		}
		...
	}
	...
}
Each mining thread is started with its own random initial nonce:

	for i := 0; i < threads; i++ {
		pend.Add(1)
		go func(id int, nonce uint64) {
			defer pend.Done()
			ethash.mine(block, id, nonce, abort, locals)
		}(i, uint64(ethash.rand.Int63()))
	}

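In the mine method, hashimotoFull returns result; if result <= target, a suitable nonce has been found (the block is mined, and the miner's effort has finally paid off). target is 2^256 divided by the difficulty, and result is derived from the nonce, the hash of the block header, and the dataset. The other return value, digest, is stored in the block header (as MixDigest) so that it can be checked against the digest recomputed during verification.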

With the overall picture in place, the rest of this article looks at dataset and header.Difficulty in detail.

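The dataset is the central part of ethash, and its main purpose is to resist ASIC miners. Because the generated dataset is large (1 GB to begin with), the bottleneck of the algorithm is memory read speed rather than CPU speed. Large memory is expensive, and the memory in an ordinary computer is already enough to run the algorithm, so bounding the work by memory takes away the computational advantage of specialized hardware.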

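ethash is a variant of the Dagger-Hashimoto algorithm. Because ethash changes the characteristics of the original algorithm substantially, the original's principles are not covered here. The discussion instead follows the current ethash source code and is split into two parts: Dagger (generating the dataset) and Hashimoto (using the dataset).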

Dagger

Dagger is used to generate the dataset.

As shown in the figure:

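For every block, a seed can be computed by scanning the block headers up to that block. In practice the seed depends only on the block's epoch, so it changes once every 30000 blocks.
The seed is used to produce a 16 MB pseudo-random cache, which light clients store.
A 1 GB dataset is then generated from the cache. Every element of the dataset depends on only a few elements of the cache, so with the cache alone any element of the dataset can be computed quickly on demand. Miners store the dataset, which grows linearly over time.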

The cache-generation part of the code

As shown in the figure:

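The numbered markers denote call entry points. Markers 3 and 4 denote MakeCache and MakeDataset, which geth exposes as commands for generating the cache and the dataset (generating the dataset requires generating the cache first).
The focus is on markers 1 and 2, which denote mining (ethash.mine) and verification (ethash.verifySeal). mine takes the dataset-generation path, which is covered later. In full mode, verifySeal does the same as mine. In light mode, it does not generate the full dataset; it generates only the cache, and the work is ultimately handed to generateCache in the algorithm module.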

verifySeal: full verification

	if fulldag {
		dataset := ethash.dataset(number, true)
		if dataset.generated() {
			digest, result = hashimotoFull(dataset.dataset, ethash.SealHash(header).Bytes(), header.Nonce.Uint64())

			// Datasets are unmapped in a finalizer. Ensure that the dataset stays alive
			// until after the call to hashimotoFull so it's not unmapped while being used.
			runtime.KeepAlive(dataset)
		} else {
			// Dataset not yet generated, don't hang, use a cache instead
			fulldag = false
		}
	}

verifySeal: light verification

	if !fulldag {
		cache := ethash.cache(number)

		size := datasetSize(number)
		if ethash.config.PowMode == ModeTest {
			size = 32 * 1024
		}
		digest, result = hashimotoLight(size, cache.cache, ethash.SealHash(header).Bytes(), header.Nonce.Uint64())

		// Caches are unmapped in a finalizer. Ensure that the cache stays alive
		// until after the call to hashimotoLight so it's not unmapped while being used.
		runtime.KeepAlive(cache)
	}
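The two verification paths differ only in where a dataset item comes from: the full path reads it from the precomputed dataset, while the light path derives it from the cache when it is needed. The toy below illustrates why both give the same answer. Everything in it is an assumption made for illustration (32-bit items, the deriveItem mixing, the names toyHashimoto, fullLookup, and lightLookup); only the multiply-then-XOR step with the FNV prime 0x01000193 mirrors the kind of mixing ethash uses.

```go
package main

import "fmt"

// lookup returns the dataset item stored at index.
type lookup func(index uint32) uint32

// deriveItem is a toy stand-in for per-item generation: each dataset
// item depends on only a few cache entries.
func deriveItem(cache []uint32, index uint32) uint32 {
	n := uint32(len(cache))
	mix := cache[index%n] ^ index
	for r := uint32(0); r < 4; r++ {
		mix = mix*0x01000193 ^ cache[(mix+r)%n] // FNV-style mixing
	}
	return mix
}

// fullLookup reads items from a dataset that was generated up front.
func fullLookup(dataset []uint32) lookup {
	return func(i uint32) uint32 { return dataset[i] }
}

// lightLookup recomputes each item from the cache on demand.
func lightLookup(cache []uint32) lookup {
	return func(i uint32) uint32 { return deriveItem(cache, i) }
}

// toyHashimoto mixes a few pseudo-randomly selected dataset items.
func toyHashimoto(seed, size uint32, get lookup) uint32 {
	mix := seed
	for r := uint32(0); r < 8; r++ {
		mix = mix*0x01000193 ^ get((mix^r)%size)
	}
	return mix
}

func main() {
	cache := make([]uint32, 16)
	for i := range cache {
		cache[i] = uint32(i)*2654435761 + 1
	}
	const size = 64
	dataset := make([]uint32, size)
	for i := range dataset {
		dataset[i] = deriveItem(cache, uint32(i))
	}
	full := toyHashimoto(42, size, fullLookup(dataset))
	light := toyHashimoto(42, size, lightLookup(cache))
	fmt.Println(full == light) // true: both paths see identical items
}
```

The trade-off is memory against time: the full path pays once to build and hold the dataset, whereas the light path keeps only the small cache and pays a little extra computation on every lookup.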

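generateCache builds the cache. The rough idea: given the seed array seed[], run a series of operations over a fixed-size buffer (the cache) so that its contents become random-looking, with no discernible pattern.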