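The code here is based on version 1.2.5.
1. Overall flow
When nodeos starts, it loads producer_plugin. This plugin's startup function (plugin_startup) calls schedule_production_loop() to start the loop: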
```cpp
void producer_plugin::plugin_startup()
{ try {
   // …
   // Start the block production loop here; the loop keeps itself going
   // by re-invoking schedule_production_loop as a callback
   my->schedule_production_loop();

   ilog("producer plugin: plugin_startup() end");
} FC_CAPTURE_AND_RETHROW() }
```
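schedule_production_loop() calls producer_plugin_impl::start_block(bool &last_block) to try to start a block. Depending on the result, it either executes maybe_produce_block() or waits.
start_block() first determines whether the node should produce a block. If it is the node's turn, it calls chain.start_block(block_time, blocks_to_confirm), which ultimately calls the start_block function in chain/controller.cpp. Note that the code contains several different functions named start_block.
The start_block function in chain/controller.cpp mainly sets up _pending_block_state and then executes the on_block transaction. on_block is a very basic contract action, system_contract::onblock in eosio.system, which mainly adds block rewards and re-elects the block producers. Note that eosforce implements its own, different onblock contract.
Once the block is ready, producer_plugin_impl::start_block executes the transactions (trx) that need to be applied.
After the block has been assembled, control returns to schedule_production_loop. If start_block succeeded, it calls maybe_produce_block, which ultimately calls void producer_plugin_impl::produce_block(). That function computes the signature for the pending block, broadcasts it, and finally calls commit_block to trigger the processing of the new block.
The rest of this article analyzes the flow above in detail.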
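To make the branching after start_block concrete, here is a minimal sketch of the decision schedule_production_loop makes. It is a simplified model, not the producer_plugin code: the enums start_result and next_action and the function decide_next_action are hypothetical names, and the real plugin distinguishes more outcomes and arms timers rather than returning a value.

```cpp
#include <cassert>

// Hypothetical, simplified outcomes of trying to start a block.
enum class start_result { succeeded, failed, waiting };

// Hypothetical next steps for the production loop.
enum class next_action { produce_block, retry_soon, wait_for_next_slot };

// producing_mode: true when this node is the one that should sign the block,
// false when it is only speculatively applying transactions.
next_action decide_next_action(start_result r, bool producing_mode) {
   if (r == start_result::failed) {
      // Starting the pending block failed: try again a little later.
      return next_action::retry_soon;
   }
   if (r == start_result::succeeded && producing_mode) {
      // The pending block is ready and it is our turn: go on to maybe_produce_block.
      return next_action::produce_block;
   }
   // Either we are only speculating or there is nothing to do yet: wait.
   return next_action::wait_for_next_slot;
}
```

The producing_mode flag is included because a successfully started block is only signed when the node is actually scheduled to produce; otherwise the loop just waits for the next opportunity.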
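Whether it is "the node's turn" comes down to which producer the active schedule assigns to the current block slot. The sketch below models that check under stated assumptions: 500 ms block slots and 12 consecutive blocks per producer (assumed here to be the EOSIO defaults), a schedule reduced to an ordered list of names, and hypothetical helper names scheduled_producer and is_my_turn. It is an illustration, not the controller's implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Assumption: each producer signs 12 consecutive blocks (one per 500 ms slot).
constexpr uint64_t producer_repetitions = 12;

// Simplified stand-in for the active producer schedule: names in rotation order.
using schedule_t = std::vector<std::string>;

// Returns the producer scheduled for a given block slot.
const std::string& scheduled_producer(const schedule_t& schedule, uint32_t slot) {
   assert(!schedule.empty());
   const uint64_t round_length = schedule.size() * producer_repetitions;
   const uint64_t index = (slot % round_length) / producer_repetitions;
   return schedule[index];
}

// The "is it my turn?" test: is the scheduled producer one of ours?
bool is_my_turn(const schedule_t& schedule, uint32_t slot,
                const std::vector<std::string>& my_producers) {
   const std::string& who = scheduled_producer(schedule, slot);
   for (const auto& name : my_producers) {
      if (name == who) return true;
   }
   return false;
}
```

With a schedule of three producers, slots 0 to 11 belong to the first, slots 12 to 23 to the second, and slot 36 wraps around to the first again.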
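As a rough picture of "executing the pending transactions", the toy sketch below fills a block from a queue until the next transaction no longer fits. The CPU-cost budget and the names fill_result and apply_pending are hypothetical simplifications, not part of producer_plugin; the point is only that filling stops once the block runs out of room.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result of filling a block.
struct fill_result {
   std::size_t applied;  // number of transactions included
   bool exhausted;       // true if we stopped because the block ran out of room
};

// trx_cpu_us: assumed estimated CPU cost (microseconds) of each queued transaction.
// block_cpu_budget_us: assumed CPU budget left in the block under construction.
fill_result apply_pending(const std::vector<int>& trx_cpu_us, int block_cpu_budget_us) {
   assert(block_cpu_budget_us >= 0);
   fill_result r{0, false};
   for (int cost : trx_cpu_us) {
      if (cost > block_cpu_budget_us) {
         // This transaction would not fit: treat the block as full and stop.
         r.exhausted = true;
         break;
      }
      block_cpu_budget_us -= cost;
      ++r.applied;
   }
   return r;
}
```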
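2. A few issues
Before analyzing the code, let's first look at a few issues in nodeos.
When implementing the block production logic, the EOS developers wrote many if checks on _pending_block_mode. Let's first sort out what the different values of _pending_block_mode mean at different points in time, so that the later code does not cause confusion.
2.1 _pending_block_mode
_pending_block_mode was added in response to https://github.com/EOSIO/eos/issues/3161; the complete change is in https://github.com/EOSIO/eos/pull/3170 .
The change is substantial and restructured the previous block production code.
It was made to fix the following bug: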
Upon receiving a transaction that throws a soft-fail exception, which indicates the transaction may be good but the block is full, the producer plugin assumes it is the next active producer and attempts to sign and send off a block, expects success and then reapplies the transaction.
If any of the following assumptions fail this results in a relatively tight loop of creating a block which is exhausted, attempting to play a transaction which soft fails, dropping that block on the floor and trying again until the transaction hard fails due to its own expiration.
This is obviously a bug
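Earlier versions had a bug where, when executing a transaction threw a soft-fail exception, the whole producer plugin kept calling the loop over and over. In the original implementation, after a transaction soft-failed, the producer would assume the block still had room and rerun the loop to try to add the unapplied transaction to the block. However, if the error was caused by the block already being full of transactions, the system kept running the loop, because the block was just as full on the second attempt. Only when some transactions were dropped because they expired could a block be produced normally.
While fixing this bug, the EOS team refactored part of the code, replacing some previously scattered error handling with a recorded _pending_block_mode state. Originally, block_production_loop() had these return values: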
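The cost of that bug can be shown with a toy model. In the sketch below, a rebuilt block is exactly as full as the one that was dropped, so the old-style retry burns an attempt every time until the transaction expires, whereas treating the soft fail as "this block is exhausted" stops immediately. Everything here (toy_block, old_retry_loop, new_behaviour, block_state, and counting attempts until expiry) is a hypothetical simplification of the behaviour described above, not code from either version of the plugin.

```cpp
#include <cassert>

// Toy block: `used` slots already taken out of `capacity`.
struct toy_block { int used; int capacity; };

// The retried transaction soft-fails whenever the block is already full.
bool soft_fails(const toy_block& b) { return b.used >= b.capacity; }

// Old behaviour (simplified): on a soft fail, drop the block, rebuild it and retry,
// until the transaction expires. Returns the number of wasted attempts.
int old_retry_loop(int capacity, int already_included, int attempts_until_expiry) {
   int wasted = 0;
   for (int attempt = 0; attempt < attempts_until_expiry; ++attempt) {
      toy_block rebuilt{already_included, capacity};  // just as full as before
      if (!soft_fails(rebuilt)) return wasted;
      ++wasted;
   }
   return wasted;  // the transaction finally expired (hard fail)
}

// New behaviour (simplified): a soft fail marks the pending block as exhausted,
// so no further rebuild-and-retry cycles happen for this block.
enum class block_state { accepting, exhausted };

block_state new_behaviour(int capacity, int already_included) {
   toy_block b{already_included, capacity};
   return soft_fails(b) ? block_state::exhausted : block_state::accepting;
}
```

For example, with a full 10-slot block and a transaction that expires after 50 attempts, old_retry_loop(10, 10, 50) wastes all 50 attempts, while new_behaviour(10, 10) reports the block as exhausted at once.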
```cpp
namespace block_production_condition {
   enum block_production_condition_enum {
      produced = 0,
      not_synced = 1,
      not_my_turn = 2,
      not_time_yet = 3,
      no_private_key = 4,
      low_participation = 5,
      lag = 6,
      exception_producing_block = 7,
      fork_below_watermark = 8,
   };
}
```
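These return values were used to select the logic that is now selected by checking _pending_block_mode. Much of the logic that used to be checked in maybe_produce_block (later renamed; it is now produce_block) was moved into start_block, which determines the current state: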
```cpp
// …
// Default is producing, i.e. normal block production
_pending_block_mode = pending_block_mode::producing;

// If the next block production opportunity is in the present or future, we're synced.
if( !_production_enabled ) {
   // Corresponds to not_synced: the node has not finished syncing
   _pending_block_mode = pending_block_mode::speculating;
}

// Not our turn
const auto& scheduled_producer = hbs->get_scheduled_producer(block_time);
if( _producers.find(scheduled_producer.producer_name) == _producers.end()) {
   // Corresponds to not_my_turn: it is not this node's turn to produce
   _pending_block_mode = pending_block_mode::speculating;
}

auto private_key_itr = _private_keys.find( scheduled_producer.block_signing_key );
if( private_key_itr == _private_keys.end() ) {
   ilog("Not producing block because I don't have the private key for …", …);
   _pending_block_mode = pending_block_mode::speculating;
}

// determine if …
//    … "${producer}" signed a BFT confirmation OR block at a higher block number (${watermark …
```
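The checks above can be condensed into one pure function, which makes the mapping from the old return codes to the two modes easy to see. This is a hedged sketch: the struct production_inputs and the function decide_mode are hypothetical, and the final watermark test is an assumption based on the BFT-watermark message in the excerpt (never produce at a block number this producer has already signed or confirmed), since the original code is cut off at that point. Like the excerpt, every check runs independently and any failing check downgrades the mode to speculating.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

enum class pending_block_mode { producing, speculating };

// Hypothetical inputs gathered before starting a block.
struct production_inputs {
   bool production_enabled;             // false while the node is still syncing
   bool scheduled_producer_is_mine;     // scheduled producer found in our producer set?
   bool have_signing_key;               // do we hold its block signing key?
   std::optional<uint32_t> watermark;   // assumed: highest block number already signed/confirmed
   uint32_t head_block_num;             // current fork's head block number
};

pending_block_mode decide_mode(const production_inputs& in) {
   pending_block_mode mode = pending_block_mode::producing;  // default: produce normally
   if (!in.production_enabled)         mode = pending_block_mode::speculating;  // was not_synced
   if (!in.scheduled_producer_is_mine) mode = pending_block_mode::speculating;  // was not_my_turn
   if (!in.have_signing_key)           mode = pending_block_mode::speculating;  // was no_private_key
   // Assumed equivalent of fork_below_watermark: the next block would not be
   // above the highest block number this producer already signed.
   if (in.watermark && *in.watermark >= in.head_block_num + 1)
      mode = pending_block_mode::speculating;
   return mode;
}
```

For instance, with the head at block 100, a watermark of 100 still allows producing block 101, while a watermark of 101 forces speculating.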