VPP Scheduling Analysis

VPP

./src/{vlib, vlibapi, vlibmemory}

US Patent 7,961,636: https://patents.justia.com/patent/7961636

The invention provides an intermediate network node configured to forward multiple packets concurrently.

VLIB

https://s3-docs.fd.io/vpp/23.02/developer/corearchitecture/vlib.html

Initialization-order dependencies are declared statically, at the point where VLIB_INIT_FUNCTION is used, as follows:

VLIB_INIT_FUNCTION(my_init_function) =
{
   .runs_before = VLIB_INITS("we_run_before_function_1",
                             "we_run_before_function_2"),
   .runs_after = VLIB_INITS("we_run_after_function_1",
                            "we_run_after_function_2"),
};
 
// It’s also easy to specify bulk ordering constraints of the form “a then b then c then d”:
VLIB_INIT_FUNCTION(my_init_function) =
{
   .init_order = VLIB_INITS("a", "b", "c", "d"),
};

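Graph nodes are declared statically, but nodes can also be added dynamically at run time. Deleting nodes is not supported.

Dynamically added nodes are of type VLIB_NODE_TYPE_INTERNAL: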

VLIB_NODE_TYPE_INTERNAL - only when explicitly made runnable by adding pending frames for processing

graph-node scheduling

./src/vlib/main.c:vlib_main_loop dispatches (schedules) the graph nodes.

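Ask1: How does scheduling proceed?
Answer: Input nodes produce a vector of work (packets) to process. The dispatcher pushes that work vector through the directed graph, subdividing it as needed, until it has been completely processed. Then the cycle starts over.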

The basic vector processing algorithm is diabolically simple, but may not be obvious from even a long stare at the code. Here’s how it works: some input node, or set of input nodes, produce a vector of work to process. The graph node dispatcher pushes the work vector through the directed graph, subdividing it as needed, until the original work vector has been completely processed. At that point, the process recurs.
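
The loop described above can be sketched as a toy model. This is only an illustration under stated assumptions, not VPP's dispatcher: every toy_* name is hypothetical, each node keeps a single pending frame, and a node picks the next node per packet (returning TOY_DONE when the packet leaves the graph).

```c
#include <assert.h>
#include <stdint.h>

#define TOY_FRAME_SIZE 256
#define TOY_DONE (-1)

/* Hypothetical stand-in for a frame: a count plus buffer indices. */
typedef struct {
  uint32_t n_vectors;
  uint32_t buffers[TOY_FRAME_SIZE];
} toy_frame_t;

/* A node chooses the next node index for each packet, or TOY_DONE. */
typedef int (*toy_next_fn_t) (uint32_t buffer);

typedef struct {
  toy_next_fn_t next;
  toy_frame_t pending;   /* work waiting for this node */
  uint32_t packets_seen;
} toy_node_t;

/* Example nodes: node 0 splits traffic, nodes 1 and 2 consume it. */
static int toy_classify (uint32_t buffer) { return (buffer % 2 == 0) ? 1 : 2; }
static int toy_sink (uint32_t buffer) { (void) buffer; return TOY_DONE; }

/* Push one input frame through the graph until no work is left.
   Returns the number of packets that completed processing. */
static uint32_t
toy_run_graph (toy_node_t *nodes, int n_nodes, const toy_frame_t *input)
{
  uint32_t completed = 0;
  int work_left = 1;

  nodes[0].pending = *input;   /* the "input node" produced this work */
  while (work_left)
    {
      work_left = 0;
      for (int i = 0; i < n_nodes; i++)
        {
          /* Take the frame away from the node before processing it. */
          toy_frame_t frame = nodes[i].pending;
          nodes[i].pending.n_vectors = 0;
          for (uint32_t k = 0; k < frame.n_vectors; k++)
            {
              int next = nodes[i].next (frame.buffers[k]);
              nodes[i].packets_seen++;
              if (next == TOY_DONE)
                {
                  completed++;
                  continue;
                }
              assert (next >= 0 && next < n_nodes);
              /* Subdivide: append the packet to the chosen node's frame. */
              toy_frame_t *nf = &nodes[next].pending;
              assert (nf->n_vectors < TOY_FRAME_SIZE);
              nf->buffers[nf->n_vectors++] = frame.buffers[k];
              work_left = 1;
            }
        }
    }
  return completed;
}
```

With node 0 set to toy_classify and nodes 1 and 2 set to toy_sink, a 10-packet input frame is split into two 5-packet frames, and toy_run_graph returns once both have been consumed.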

By construction, this scheme yields a stable equilibrium in frame size.

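Ask2: What is the performance benefit of VPP's vectorized processing?
Answer: A frame carries many packets' worth of work. The larger the frame, the lower the average per-packet cost once the icache is warm.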
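
The amortization argument can be made concrete with a simple cost model. This is an assumed model for illustration only: a fixed per-frame cost (for example, warming the icache for a node) is paid once and shared by all packets in the frame, on top of a constant per-packet cost. The function name and the cycle counts are hypothetical, not measurements.

```c
#include <assert.h>
#include <stdint.h>

/* Average cost per packet when a fixed per-frame cost is shared by
   n_packets packets. Units are arbitrary (e.g. CPU cycles). */
static double
toy_per_packet_cost (double fixed_per_frame, double per_packet,
                     uint32_t n_packets)
{
  assert (n_packets > 0);
  return fixed_per_frame / n_packets + per_packet;
}
```

Under this model, with a hypothetical fixed cost of 1000 and a per-packet cost of 50, a 1-packet frame costs 1050 per packet while a 250-packet frame costs 54 per packet.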

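Ask3: How is the trade-off handled when there is little vector work and batching would cost more than it gains?
Answer: If the prevailing frame size is low, the graph node dispatcher arranges to wait for work in a timed epoll wait, so it does not burn CPU cycles. The graph dispatcher supports both interrupt and polling modes.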

Under light load, it is a crazy waste of CPU cycles to run the graph node dispatcher flat-out. So, the graph node dispatcher arranges to wait for work by sitting in a timed epoll wait if the prevailing frame size is low. The scheme has a certain amount of hysteresis to avoid constantly toggling back and forth between interrupt and polling mode. Although the graph dispatcher supports interrupt and polling modes, our current default device drivers do not.
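
The hysteresis idea can be sketched with two thresholds. This is a minimal sketch, not VPP's actual logic: the toy_* names and the low/high water marks are assumptions chosen for illustration. The point is only that the threshold for leaving a mode differs from the threshold for entering it, so a frame size hovering around one value does not cause constant toggling.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { TOY_MODE_POLLING, TOY_MODE_INTERRUPT } toy_mode_t;

typedef struct {
  toy_mode_t mode;
  uint32_t low_water;    /* below this, polling -> interrupt (wait for work) */
  uint32_t high_water;   /* above this, interrupt -> polling */
} toy_sched_t;

/* Update the mode from the prevailing frame size and return the new mode.
   Between the two water marks the current mode is kept (hysteresis). */
static toy_mode_t
toy_update_mode (toy_sched_t *s, uint32_t prevailing_frame_size)
{
  assert (s->low_water < s->high_water);
  if (s->mode == TOY_MODE_POLLING && prevailing_frame_size < s->low_water)
    s->mode = TOY_MODE_INTERRUPT;
  else if (s->mode == TOY_MODE_INTERRUPT
           && prevailing_frame_size > s->high_water)
    s->mode = TOY_MODE_POLLING;
  return s->mode;
}
```

With water marks of 4 and 16, a frame size of 10 keeps whichever mode is already active; only a drop below 4 or a rise above 16 causes a switch.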

The graph node dispatcher uses a hierarchical timer wheel to reschedule process nodes when their timers expire.

In vpp applications, vectors typically use a 4-byte vector element size.

In the implementation, a work vector is represented as an instance of vlib_frame_t.

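Ask4: How is a work vector associated with the node that processes it?
Answer: Through the vlib_pending_frame_t type. Its node_runtime_index field identifies the node that will process the frame. vlib_next_frame_t is the last of the key graph dispatcher data structures.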

The pending frame node_runtime_index associates the frame with the node which will process it.

struct vlib_node_main_t {
  // ...
  /* Vector of next frames. */
  vlib_next_frame_t *next_frames;

  /* Vector of internal node's frames waiting to be called. */
  vlib_pending_frame_t *pending_frames;
  
  /* Pool of pending process frames. */
  vlib_pending_frame_t *suspended_process_frames;
};

/* A frame pending dispatch by main loop. */
typedef struct
{
  /* Frame index (in the heap). */
  vlib_frame_t *frame;

  /* Node and runtime for this frame. */
  u32 node_runtime_index;

  /* Start of next frames for this node. */
  u32 next_frame_index;

  /* Special value for next_frame_index when there is no next frame. */
#define VLIB_PENDING_FRAME_NO_NEXT_FRAME ((u32) ~0)
} vlib_pending_frame_t;

From suspended_process_frames we can see that process nodes can be suspended and later resumed.

src/vlib/main.c:vlib_main_or_worker_loop()

  /*
   * Input nodes may have added work to the pending vector.
   * Process pending vector until there is nothing left.
   * All pending vectors will be processed from input -> output.
   */
  for (i = 0; i < _vec_len (nm->pending_frames); i++)
    cpu_time_now = dispatch_pending_node (vm, i, cpu_time_now);
  /* Reset pending vector for next iteration. */
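
The shape of this loop matters: the vector length is re-read on every iteration, so frames that a node appends while it is being dispatched are picked up in the same pass. The sketch below models that with a fixed-size array. It is an illustration under assumptions, not VPP code: all toy_* names are hypothetical, and a plain packet count stands in for the vlib_frame_t pointer.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MAX_PENDING 16
#define TOY_N_NODES 2

typedef struct toy_main toy_main_t;
typedef void (*toy_node_fn_t) (toy_main_t *tm, uint32_t n_vectors);

/* Toy pending frame: node_runtime_index ties the work to its node. */
typedef struct {
  uint32_t n_vectors;            /* stand-in for the frame itself */
  uint32_t node_runtime_index;   /* node that will process the frame */
} toy_pending_frame_t;

struct toy_main {
  toy_node_fn_t nodes[TOY_N_NODES];
  toy_pending_frame_t pending[TOY_MAX_PENDING];
  uint32_t n_pending;
  uint32_t delivered;
};

/* Append a pending frame for the given node. */
static void
toy_enqueue (toy_main_t *tm, uint32_t node_index, uint32_t n_vectors)
{
  assert (tm->n_pending < TOY_MAX_PENDING);
  assert (node_index < TOY_N_NODES);
  tm->pending[tm->n_pending].n_vectors = n_vectors;
  tm->pending[tm->n_pending].node_runtime_index = node_index;
  tm->n_pending++;
}

/* Node 0 hands everything to node 1; node 1 is the final consumer. */
static void toy_node_forward (toy_main_t *tm, uint32_t n) { toy_enqueue (tm, 1, n); }
static void toy_node_sink (toy_main_t *tm, uint32_t n) { tm->delivered += n; }

static void
toy_dispatch_pending (toy_main_t *tm)
{
  /* n_pending is re-read each iteration: entries appended during
     dispatch are processed before the loop ends. */
  for (uint32_t i = 0; i < tm->n_pending; i++)
    {
      toy_pending_frame_t p = tm->pending[i];
      tm->nodes[p.node_runtime_index] (tm, p.n_vectors);
    }
  /* Reset pending vector for next iteration. */
  tm->n_pending = 0;
}
```

Enqueueing two frames (32 and 8 packets) for node 0 and calling toy_dispatch_pending once delivers all 40 packets, because the two frames that node 0 appends for node 1 are dispatched in the same pass.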

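One point deserves particular attention. The main dispatch loop calls the nodes that have pending frames, and vpp uses a directed graph, not a directed acyclic graph, so a graph node may enqueue packets to itself. In that case the graph dispatcher must force allocation of a new frame.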

Force allocation of new frame while current frame is being dispatched.
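
Why a new frame is needed can be shown with a toy model. This is a sketch under assumptions, not the VPP implementation: the toy_* names, the small static pool, and the "hop count" payload are all hypothetical. The dispatcher detaches the frame being processed from the node's enqueue slot, so a self-enqueue lands in a fresh frame instead of overwriting the one still being read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_POOL_SIZE 4
#define TOY_FRAME_SIZE 8

typedef struct {
  uint32_t n_vectors;
  uint32_t buffers[TOY_FRAME_SIZE];
} toy_frame_t;

typedef struct {
  toy_frame_t pool[TOY_POOL_SIZE];   /* tiny allocator, never freed */
  uint32_t n_allocated;
  toy_frame_t *next_frame;           /* frame that enqueues append to */
} toy_node_t;

/* Return the frame to append to, allocating one if the slot is empty. */
static toy_frame_t *
toy_get_next_frame (toy_node_t *n)
{
  if (n->next_frame == NULL)
    {
      assert (n->n_allocated < TOY_POOL_SIZE);
      n->next_frame = &n->pool[n->n_allocated++];
      n->next_frame->n_vectors = 0;
    }
  return n->next_frame;
}

static void
toy_enqueue (toy_node_t *n, uint32_t buffer)
{
  toy_frame_t *f = toy_get_next_frame (n);
  assert (f->n_vectors < TOY_FRAME_SIZE);
  f->buffers[f->n_vectors++] = buffer;
}

/* Dispatch the node's current frame. The node re-enqueues to itself every
   buffer whose value (a toy hop count) is still positive. Returns the
   frame that was dispatched. */
static toy_frame_t *
toy_dispatch_self_loop (toy_node_t *n)
{
  toy_frame_t *current = n->next_frame;
  assert (current != NULL);
  /* Detach the frame: any enqueue during dispatch must get a new one. */
  n->next_frame = NULL;
  for (uint32_t i = 0; i < current->n_vectors; i++)
    if (current->buffers[i] > 0)
      toy_enqueue (n, current->buffers[i] - 1);
  return current;
}
```

After enqueueing the values 2 and 0 and dispatching once, the dispatched frame still holds its two original entries, and the value 1 sits in a different, newly allocated frame.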

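Ask5: What interface does vlib's vectorized framework expose to the layers above it?

Ask6: Does vlib's vectorized processing framework place special requirements on memory layout? What does its memory library look like?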

ultra-lightweight cooperative multi-tasking threads

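Ask7: Does vlib's vectorized processing framework support multiple threads running different graph nodes, or can multiple threads also process the same work vector (different chunks of it) within one node?

Ask8: Given the multithreading features, what new scheduling algorithms does vlib support, and how much do they improve performance?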

Node type VLIB_NODE_TYPE_PROCESS

VLIB_NODE_TYPE_PROCESS - only when explicitly made runnable. “Process” nodes are actually cooperative multi-tasking threads. They must explicitly suspend after a reasonably short period of time.
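
The "must explicitly suspend" rule can be illustrated with a toy cooperative task. This sketch swaps in a simpler technique than real process nodes use: instead of a thread with its own stack that resumes where it left off, the task keeps its progress in an explicit struct and returns to the caller after a bounded amount of work. All toy_* names and the budget parameter are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { TOY_PROC_SUSPENDED, TOY_PROC_DONE } toy_proc_status_t;

/* Progress of a long-running job, kept across suspensions. */
typedef struct {
  uint32_t next_item;
  uint32_t n_items;
  uint32_t processed;
} toy_process_t;

/* Do at most `budget` items of work, then yield back to the scheduler.
   Returns TOY_PROC_DONE once every item has been handled. */
static toy_proc_status_t
toy_process_run (toy_process_t *p, uint32_t budget)
{
  assert (budget > 0);
  uint32_t end = p->next_item + budget;
  if (end > p->n_items)
    end = p->n_items;
  while (p->next_item < end)
    {
      p->processed++;   /* stand-in for real per-item work */
      p->next_item++;
    }
  return p->next_item == p->n_items ? TOY_PROC_DONE : TOY_PROC_SUSPENDED;
}
```

A 10-item job run with a budget of 4 suspends twice and completes on the third call, leaving the scheduler free to dispatch other work in between.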

reliable multicast support

Not covered for now.
