Original: https://www.rdmamojo.com/2013/01/26/ibv_post_send/
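ibv_post_send() posts a linked list of Work Requests (WRs) to the Send Queue of a Queue Pair (QP). ibv_post_send() walks the linked list one entry at a time, checks that each entry is valid, generates a hardware-specific Send Request from it, and adds it to the tail of the QP's Send Queue, without performing any context switch. The RDMA device will process it (later) asynchronously. If posting one of the WRs fails, because the Send Queue is full or because one of the attributes in the WR is invalid, ibv_post_send() stops immediately and returns a pointer to that WR (through its bad_wr argument). The QP handles the Work Requests in its Send Queue according to the following rules:
- If the QP is in the RESET, INIT or RTR state, an error should be returned immediately. However, some low-level drivers may not follow this rule (to eliminate an extra check in the data path and so provide better performance), and posting Send Requests in one or all of these states may be silently ignored.
- If the QP is in the RTS state, Send Requests can be posted and they will be processed.
- If the QP is in the SQE or ERROR state, Send Requests can be posted and they will be completed with error.
- If the QP is in the SQD state, Send Requests can be posted, but they will not be processed.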
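The posting rules above can be sketched as a small toy model in plain C. This is not libibverbs code: every name here (`toy_qp`, `toy_wr`, `toy_post_send`, the `TOY_*` constants) is hypothetical, and the choice of `EINVAL` and `ENOMEM` as error values is an assumption of the model. It only illustrates two behaviours described in the text: the per-state outcome, and stopping at the first WR that cannot be posted while reporting it through `bad_wr`.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical toy model; none of these names are libibverbs types. */
enum toy_qp_state { TOY_RESET, TOY_INIT, TOY_RTR, TOY_RTS, TOY_SQD, TOY_SQE, TOY_ERROR };

enum toy_outcome {
    TOY_REJECTED,           /* error returned at post time (a real driver may silently ignore) */
    TOY_PROCESSED,          /* posted and processed */
    TOY_QUEUED_ONLY,        /* posted but not processed */
    TOY_COMPLETED_IN_ERROR  /* posted, will be completed with error */
};

/* Map a QP state to what happens to a posted Send Request. */
static enum toy_outcome toy_outcome_for_state(enum toy_qp_state s)
{
    switch (s) {
    case TOY_RESET:
    case TOY_INIT:
    case TOY_RTR:   return TOY_REJECTED;
    case TOY_RTS:   return TOY_PROCESSED;
    case TOY_SQD:   return TOY_QUEUED_ONLY;
    case TOY_SQE:
    case TOY_ERROR: return TOY_COMPLETED_IN_ERROR;
    }
    return TOY_REJECTED;
}

struct toy_wr {
    struct toy_wr *next;  /* NULL marks the last WR in the list */
    int num_sge;
};

struct toy_qp {
    enum toy_qp_state state;
    int max_send_wr;   /* Send Queue capacity */
    int max_send_sge;  /* scatter/gather entries allowed per WR */
    int outstanding;   /* WRs currently in the Send Queue */
};

/* Walk the list, stop at the first WR that cannot be posted, and
 * report it through *bad_wr. Returns 0 on success. */
static int toy_post_send(struct toy_qp *qp, struct toy_wr *wr, struct toy_wr **bad_wr)
{
    *bad_wr = NULL;
    if (toy_outcome_for_state(qp->state) == TOY_REJECTED) {
        *bad_wr = wr;
        return EINVAL;
    }
    for (; wr != NULL; wr = wr->next) {
        if (qp->outstanding >= qp->max_send_wr) {   /* Send Queue is full */
            *bad_wr = wr;
            return ENOMEM;
        }
        if (wr->num_sge < 0 || wr->num_sge > qp->max_send_sge) {  /* bad attribute */
            *bad_wr = wr;
            return EINVAL;
        }
        qp->outstanding++;  /* WRs before the failing one stay posted */
    }
    return 0;
}
```

In this model, posting three WRs to a queue with room for two leaves the first two posted and sets `bad_wr` to the third, which is the same recovery information the real call gives the caller.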
The struct ibv_send_wr describes a Work Request to the Send Queue of the QP, i.e. a Send Request (SR).
```c
struct ibv_send_wr {
	uint64_t		wr_id;
	struct ibv_send_wr     *next;
	struct ibv_sge	       *sg_list;
	int			num_sge;
	enum ibv_wr_opcode	opcode;
	int			send_flags;
	uint32_t		imm_data;
	union {
		struct {
			uint64_t	remote_addr;
			uint32_t	rkey;
		} rdma;
		struct {
			uint64_t	remote_addr;
			uint64_t	compare_add;
			uint64_t	swap;
			uint32_t	rkey;
		} atomic;
		struct {
			struct ibv_ah  *ah;
			uint32_t	remote_qpn;
			uint32_t	remote_qkey;
		} ud;
	} wr;
};
```
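As a rough illustration of how these fields fit together, the sketch below fills in a chain of two Send Requests: an RDMA Write followed by a Send. The helper name `build_write_then_send` is hypothetical. So that the snippet compiles on a machine without libibverbs, it falls back to minimal stand-in definitions copied from the struct above; the enum values in that fallback and the `IBV_SEND_SIGNALED` flag come from libibverbs rather than from this text, so treat them as assumptions and prefer the real header.

```c
#include <stdint.h>
#include <string.h>

#if defined(__has_include)
#  if __has_include(<infiniband/verbs.h>)
#    include <infiniband/verbs.h>
#    define HAVE_VERBS_H 1
#  endif
#endif

#ifndef HAVE_VERBS_H
/* Fallback stand-ins so the sketch compiles without libibverbs.
 * Assumption: they mirror only what this example needs. */
struct ibv_sge { uint64_t addr; uint32_t length; uint32_t lkey; };
enum ibv_wr_opcode { IBV_WR_RDMA_WRITE = 0, IBV_WR_SEND = 2 };
enum ibv_send_flags { IBV_SEND_SIGNALED = 1 << 1 };
struct ibv_ah;
struct ibv_send_wr {
    uint64_t wr_id;
    struct ibv_send_wr *next;
    struct ibv_sge *sg_list;
    int num_sge;
    enum ibv_wr_opcode opcode;
    int send_flags;
    uint32_t imm_data;
    union {
        struct { uint64_t remote_addr; uint32_t rkey; } rdma;
        struct { uint64_t remote_addr; uint64_t compare_add;
                 uint64_t swap; uint32_t rkey; } atomic;
        struct { struct ibv_ah *ah; uint32_t remote_qpn;
                 uint32_t remote_qkey; } ud;
    } wr;
};
#endif

/* Fill wrs[0] (RDMA Write) and wrs[1] (Send) and link them into one list. */
static void build_write_then_send(struct ibv_send_wr wrs[2],
                                  struct ibv_sge *write_sge,
                                  struct ibv_sge *send_sge,
                                  uint64_t remote_addr, uint32_t rkey)
{
    memset(wrs, 0, 2 * sizeof(wrs[0]));

    wrs[0].wr_id = 1;                       /* echoed back in the Work Completion */
    wrs[0].next = &wrs[1];                  /* chain to the second WR */
    wrs[0].sg_list = write_sge;
    wrs[0].num_sge = 1;
    wrs[0].opcode = IBV_WR_RDMA_WRITE;
    wrs[0].wr.rdma.remote_addr = remote_addr;
    wrs[0].wr.rdma.rkey = rkey;

    wrs[1].wr_id = 2;
    wrs[1].next = NULL;                     /* last WR in the list */
    wrs[1].sg_list = send_sge;
    wrs[1].num_sge = 1;
    wrs[1].opcode = IBV_WR_SEND;
    wrs[1].send_flags = IBV_SEND_SIGNALED;  /* request a Work Completion */
}
```

With a real, connected QP the chain would then be handed over with `ibv_post_send(qp, &wrs[0], &bad_wr)`, where `bad_wr` is a `struct ibv_send_wr *`. A nonzero return means posting stopped, and `bad_wr` points at the first WR that was not posted.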
The following is the full description of struct ibv_send_wr:
(For more background, see "[RDMA] Technical Details (3): Understanding the RDMA Scatter Gather List" on bandaoyu's note, CSDN blog.)
wr_id | A 64-bit value associated with this WR. If a Work Completion is generated when this Work Request ends, it will contain this value |
next | Pointer to the next WR in the linked list. NULL indicates that this is the last WR |
sg_list | Scatter/gather array, as described in the table below. It specifies the buffers that will be read from, or the buffers that data will be written to, depending on the opcode used. The entries in the list can specify memory blocks that were registered by different Memory Regions. The message size is the sum of the lengths of all the memory buffers in the scatter/gather list |
num_sge | Size of the sg_list array. This number must be less than or equal to the number of scatter/gather entries that the Queue Pair was created to support in the Send Queue (qp_init_attr.cap.max_send_sge). If this size is 0, the message size is 0 |
opcode | The operation that this WR will perform. This value controls the way the data will be sent, the direction of the data flow and which attributes of the WR are used. The value can be one of the following enumerated values:
- IBV_WR_SEND: The content of the local memory buffers specified in sg_list is sent to the remote QP. The sender doesn't know where the data will be written in the remote node. A Receive Request will be consumed from the head of the remote QP's Receive Queue, and the sent data will be written to the memory buffers specified in that Receive Request. The message size can be [0, 2^31] for RC and UC QPs and [0, path MTU] for UD QPs
- IBV_WR_RDMA_WRITE: The content of the local memory buffers specified in sg_list is sent and written to a contiguous block of memory in the remote QP's virtual address space. This doesn't necessarily mean that the remote memory is physically contiguous. No Receive Request will be consumed in the remote QP. The message size can be [0, 2^31] |
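The size rules in the rows above (the message size is the sum of the scatter/gather lengths, bounded by 2^31 for RC and UC or by the path MTU for UD) can be written as a short check. This is only a sketch: `sge_entry`, `message_size` and `send_size_ok` are hypothetical names, `sge_entry` is a local stand-in rather than the library's own type, and the path MTU is assumed to be passed in as a byte count.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for one scatter/gather entry. */
struct sge_entry {
    uint64_t addr;    /* start of the local buffer */
    uint32_t length;  /* buffer length in bytes */
    uint32_t lkey;    /* local key of the Memory Region */
};

/* Message size = sum of the lengths of all entries; 0 entries means 0 bytes. */
static uint64_t message_size(const struct sge_entry *sg_list, int num_sge)
{
    uint64_t total = 0;
    for (int i = 0; i < num_sge; i++)
        total += sg_list[i].length;
    return total;
}

/* Upper bounds as quoted in the text: 2^31 bytes for RC and UC QPs,
 * the path MTU (given here in bytes) for UD QPs. */
static bool send_size_ok(uint64_t size, bool is_ud, uint32_t path_mtu_bytes)
{
    uint64_t limit = is_ud ? (uint64_t)path_mtu_bytes : (UINT64_C(1) << 31);
    return size <= limit;
}
```

A 64-bit accumulator is used because the individual lengths are 32-bit values and their sum can exceed that range before it is compared with the limit.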