What exactly are batching and pipelining in Raft?

Raft supports batching and pipelining of log entries, and both are important for best performance. Many of the costs of request processing are amortized when multiple requests are collected into a batch. For example, it is much faster to send two entries over the network in one packet than in two separate packets, or to write two entries to disk at once. Thus, large batches optimize throughput and are useful when the system is under heavy load. Pipelining, on the other hand, optimizes latency under moderate load by allowing one entry to start being processed while another is still in progress. For example, while a follower is writing the previous entry to disk, pipelining allows the leader to replicate the next entry over the network to that follower. Even at high load, some amount of pipelining can increase throughput by utilizing resources more efficiently. For example, a follower needs to receive entries over the network before it can write them to disk; no amount of batching can use both of these resources at once, but pipelining can. Pipelining also works against batching to some degree. For example, it might be faster overall to delay requests and send one big batch to followers, rather than pipelining multiple small requests.
Batching is very natural to implement in Raft, since AppendEntries supports sending multiple consecutive entries in one RPC. Leaders in LogCabin send as many entries as are available between the follower's next index and the end of the log, up to one megabyte in size. The one-megabyte limit is arbitrary, but it is enough to use the network and disk efficiently while still providing frequent heartbeats to followers (if one RPC grew too large, the follower might suspect the leader of failure and start an election). The follower then writes all the new entries from a single AppendEntries request to disk at once, making efficient use of the disk.
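
A minimal sketch of this batching rule, in Go rather than LogCabin's C++: gather consecutive entries starting at the follower's next index until a one-megabyte cap is reached. The Entry type, batchFor function, and maxBatchBytes constant are illustrative assumptions, not LogCabin's actual API.

```go
package main

import "fmt"

type Entry struct {
	Term uint64
	Data []byte
}

// Hypothetical 1 MB cap on one AppendEntries batch, as described above.
const maxBatchBytes = 1 << 20

// batchFor returns the consecutive log entries to send in one AppendEntries
// RPC, starting at nextIndex (1-based) and stopping at the byte limit.
// At least one entry is always included so progress is guaranteed.
func batchFor(log []Entry, nextIndex uint64) []Entry {
	var batch []Entry
	size := 0
	for i := nextIndex - 1; i < uint64(len(log)); i++ {
		size += len(log[i].Data)
		if len(batch) > 0 && size > maxBatchBytes {
			break
		}
		batch = append(batch, log[i])
	}
	return batch
}

func main() {
	log := []Entry{{1, []byte("a")}, {1, []byte("b")}, {2, []byte("c")}}
	fmt.Println(len(batchFor(log, 2))) // entries 2 and 3 -> prints 2
}
```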

Pipelining is also well supported by Raft. The AppendEntries consistency check guarantees that pipelining is safe; in fact, the leader can safely send entries in any order. To support pipelining, the leader treats the next index for each follower optimistically: it updates the next index to send immediately after sending the previous entry, rather than waiting for the previous entry's acknowledgment. This allows another RPC to pipeline the next entry behind the previous one. Bookkeeping is a bit more involved if RPCs fail. If an RPC times out, the leader must decrement the next index back to its original value to retry. If the AppendEntries consistency check fails, the leader may decrement the next index even further to retry sending the prior entry, or it may wait for that prior entry to be acknowledged and then try again. Even with this change, LogCabin's original threading architecture still prevented pipelining, since it supported only one outstanding RPC per follower; thus, we changed it to spawn multiple threads per peer instead of just one.
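
The next-index bookkeeping described above can be sketched as follows (a simplified, synchronous illustration in Go; LogCabin itself is C++ and issues these RPCs concurrently). The Peer type, AppendResult values, and the send callback are assumed names for illustration, not real LogCabin identifiers.

```go
package main

import "fmt"

type Peer struct {
	nextIndex uint64 // next log index to send to this follower
}

type AppendResult int

const (
	OK AppendResult = iota
	Timeout
	ConsistencyCheckFailed
)

// replicate sends entries [p.nextIndex, lastIndex] and advances nextIndex
// optimistically, so another RPC can be pipelined behind this one.
func replicate(p *Peer, lastIndex uint64, send func(from, to uint64) AppendResult) {
	from := p.nextIndex
	p.nextIndex = lastIndex + 1 // optimistic: assume the RPC will succeed

	switch send(from, lastIndex) {
	case OK:
		// nothing to do; nextIndex already points past the sent entries
	case Timeout:
		p.nextIndex = from // roll back and retry the same range later
	case ConsistencyCheckFailed:
		// back up to resend the prior entry (the leader could instead wait
		// for that entry's acknowledgment and then retry, as described above)
		if from > 1 {
			p.nextIndex = from - 1
		} else {
			p.nextIndex = 1
		}
	}
}

func main() {
	p := &Peer{nextIndex: 5}
	replicate(p, 7, func(from, to uint64) AppendResult { return Timeout })
	fmt.Println(p.nextIndex) // rolled back to 5
}
```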

This approach to pipelining works best if messages are expected to be delivered in order in the common case, since reordering may lead to inefficient retransmissions. Fortunately, most environments will not reorder messages often. For example, a leader in LogCabin uses a single TCP connection to each follower, and it only switches to a new connection if it suspects a failure. Since a single TCP connection masks network-level reordering from the application, it is rare for LogCabin followers to receive AppendEntries requests out of order. If the network were to commonly reorder requests, the application could benefit from buffering out-of-order requests temporarily until they could be appended to the log in order.
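
If such buffering were desired, a follower could hold requests that arrive ahead of its log and apply them once the gap is filled. The sketch below is hypothetical (the Request and Follower types are invented for illustration) and omits the term checks and conflict truncation that a real AppendEntries handler performs.

```go
package main

import "fmt"

type Request struct {
	PrevIndex uint64 // index the entries follow (AppendEntries prevLogIndex)
	Entries   []string
}

type Follower struct {
	lastIndex uint64             // index of the last entry in the local log
	pending   map[uint64]Request // out-of-order requests, keyed by PrevIndex
	log       []string
}

// handle appends in-order requests immediately and buffers the rest until
// the missing prefix arrives.
func (f *Follower) handle(r Request) {
	f.pending[r.PrevIndex] = r
	for {
		next, ok := f.pending[f.lastIndex]
		if !ok {
			return
		}
		delete(f.pending, f.lastIndex)
		f.log = append(f.log, next.Entries...)
		f.lastIndex += uint64(len(next.Entries))
	}
}

func main() {
	f := &Follower{pending: map[uint64]Request{}}
	f.handle(Request{PrevIndex: 1, Entries: []string{"b"}}) // out of order, buffered
	f.handle(Request{PrevIndex: 0, Entries: []string{"a"}}) // fills the gap
	fmt.Println(f.log)                                      // [a b]
}
```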

The overall performance of a Raft system depends greatly on how batches and pipelines are scheduled. If not enough requests are accumulated in one batch under high load, overall processing will be inefficient, leading to low throughput and high latency. On the other hand, if too many requests are accumulated in one batch, latency will be needlessly high, as early requests wait for later requests to arrive.

While we are still investigating the best policy, our goal is to minimize the average delay for requests under dynamic workloads. Before we had implemented pipelining in LogCabin, it used a simple double-buffering technique. The leader would keep one outstanding RPC to each follower. When that RPC returned, it would send another one with any log entries that had accumulated in the meantime, and if no more entries were available, the next RPC would be sent out as soon as the next entry was appended. This approach is appealing because it dynamically adjusts to load. As soon as load increases, entries will accumulate, and the next batch will be larger, improving efficiency. Once load decreases, batches will shrink in size, lowering latency. We would like to retain this behavior for pipelining. Intuitively, in a two-level pipeline, we would like the second batch to be started halfway through the processing time for the first batch, thus halving the average delay. However, guessing when a batch is halfway done requires estimating the round-trip time; we are still investigating the best policy to use in LogCabin.
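
One way to picture this double-buffering scheme is the Go sketch below, with an assumed Replicator type and a send callback standing in for the AppendEntries RPC: it keeps a single RPC in flight per follower and ships whatever entries accumulated while that RPC was outstanding as the next batch, so batch size adapts to load automatically.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

type Replicator struct {
	mu      sync.Mutex
	pending []string // entries appended while an RPC is in flight
	busy    bool     // true while one RPC is outstanding
	send    func(batch []string)
}

// Append is called whenever a new entry is ready to replicate to this follower.
func (r *Replicator) Append(entry string) {
	r.mu.Lock()
	r.pending = append(r.pending, entry)
	if r.busy {
		r.mu.Unlock()
		return // the in-flight RPC will pick this entry up when it returns
	}
	r.busy = true
	r.mu.Unlock()
	go r.flush()
}

// flush sends batches one at a time until no accumulated entries remain.
func (r *Replicator) flush() {
	for {
		r.mu.Lock()
		if len(r.pending) == 0 {
			r.busy = false
			r.mu.Unlock()
			return
		}
		batch := r.pending
		r.pending = nil
		r.mu.Unlock()
		r.send(batch) // batch size grows under load, shrinks when load drops
	}
}

func main() {
	r := &Replicator{send: func(b []string) { fmt.Println("sent batch:", b) }}
	r.Append("x") // starts the single in-flight RPC
	r.Append("y") // may be batched with "z" if "x" is still in flight
	r.Append("z")
	time.Sleep(50 * time.Millisecond) // demo only: let the background flush drain
}
```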

Reposted from: https://my.oschina.net/fileoptions/blog/1834092
