【verbs】ibv_get_device_name()|ibv_get_device_list()|verbs api

This article describes how to use ibv_get_device_name() to query RDMA device names, lists the naming conventions of devices from different vendors, and provides sample code. It also analyzes why /sys/class/infiniband can become inaccessible due to systemd's PrivateDevices setting, and gives recovery steps for a failed ibv_devinfo command.


Contents

Prerequisites

ibv_get_device_name

Errors caused by mismatched driver/firmware versions on the two ends

ibv_get_device_list

ibv_open_device()

Error log

Recovering from a failed ibv_devinfo command

About this task

Procedure

libibverbs can't find running IB devices

Driver installed but not working properly


Author: bandaoyu  Source: https://blog.csdn.net/bandaoyu/article/details/116539866

Prerequisites

ibv_fork_init() should be called before any other function in libibverbs.

ibv_get_device_name

The function ibv_get_device_list() returns the array of currently available RDMA devices.

const char *ibv_get_device_name(struct ibv_device *device);

Description

This function returns the name associated with an RDMA device.

Notes

  • The name is unique on a given machine (the same name is never assigned to another device on it);
  • The name is not unique across the InfiniBand fabric;
  • When a machine has more than one RDMA device, changing a device's location in the machine (for example, its position on the bus) may change the associated name;
  • To reliably distinguish devices, it is recommended to use the device GUID, which ibv_get_device_guid() returns.

Parameter (struct ibv_device *device)

One entry from the array returned by ibv_get_device_list(). (ibv_get_device_list() returns the array of currently available RDMA devices.)

Return value (const char *)

Returns a pointer to the device name, or NULL on error.

Name composition

  • prefix - identifies the RDMA device vendor and family
    • cxgb3 - Chelsio Communications, T3 RDMA family
    • cxgb4 - Chelsio Communications, T4 RDMA family
    • ehca - IBM, eHCA family
    • ipathverbs - QLogic
    • mlx4 - Mellanox Technologies, ConnectX family
    • mthca - Mellanox Technologies, InfiniHost family
    • nes - Intel, Intel-NE family
  • index - a number that distinguishes devices from the same vendor within one machine

Example

Examples of RDMA device names that ibv_get_device_name() may return:

  • mlx4_0
  • mlx4_1
  • mthca0
#include <stdio.h>
#include <infiniband/verbs.h>

int main(void)
{
    struct ibv_device **device_list;
    int num_devices;
    int i;

    device_list = ibv_get_device_list(&num_devices);
    if (!device_list) {
        fprintf(stderr, "Error, ibv_get_device_list() failed\n");
        return -1;
    }

    printf("%d RDMA device(s) found:\n\n", num_devices);

    for (i = 0; i < num_devices; ++ i)
        printf("RDMA device[%d]: name=%s\n", i,
               ibv_get_device_name(device_list[i]));

    ibv_free_device_list(device_list);

    return 0;
}

A script to get the device name, ib_port, GID index, and MTU for Intel or Mellanox NICs

Usage: get-rdma-device-info eth0

#!/bin/bash

g_vendor=""
g_hexip=""
g_nic_name=""


MY_UUID=$(cat /proc/sys/kernel/random/uuid)
MY_TMP_FILE_PATH=/tmp/${MY_UUID}.txt
MY_LOG_FILE_PATH=/var/ceph_osd_get_rdma_info.log


# Mellanox or Intel
function  set_vendor()
{

 vendor_id=`ibv_devinfo|grep "vendor_id"|awk 'NR==1 {print $2}'`


if [[ "0x8086" = ${vendor_id} ]]; then

     g_vendor="INTEL"

elif [[ "0x02c9" = ${vendor_id} ]]; then

     g_vendor="MELLONX"
else

echo "unknown rdma hca vendor." >>${MY_LOG_FILE_PATH}

exit 1

fi

}


#ip4 to hex ip
function ip4_to_hex()
{
tmpifs=${IFS}

IFS="."
num=0

for str in $1
do
ip[num]=${str}
((num++))
done

g_hexip=`printf "%02x%02x:%02x%02x" ${ip[0]} ${ip[1]} ${ip[2]} ${ip[3]}`

IFS=${tmpifs}
}


#intel hca process
#reference:https://downloadmirror.intel.com/30368/eng/README_irdma_1.4.22.txt
function  intel_hca()
{
echo "vendor is intel">>${MY_LOG_FILE_PATH}


devices=`ibv_devices|awk 'NR > 2 {print $1}'`

for dev in $devices
do
ifname=`ls /sys/class/infiniband/${dev}/device/net`
if [[ ${g_nic_name} = ${ifname} ]];then
device_name=${dev}
fi
done

ethip=`ip route|grep 'link src'|grep ${g_nic_name}|awk '{print $9}'`

ip4_to_hex ${ethip}


if [ "X${device_name}" != "X" ];then
  echo "device_name"=${device_name} >>${MY_TMP_FILE_PATH}
else
  echo "get device_name failed">>${MY_LOG_FILE_PATH}
  exit 1
fi


for port in `ls /sys/class/infiniband/${device_name}/ports/`
do
    for gidx in `ls /sys/class/infiniband/${device_name}/ports/${port}/gids`
    do
        hca_hex_ip=`cat /sys/class/infiniband/${device_name}/ports/${port}/gids/${gidx}`

        if [[ ${hca_hex_ip} =~ ${g_hexip} ]];then
            gid_index=${gidx}
            ib_port=${port}
        fi
    done
done


if [ "X${gid_index}" != "X" ];then
  echo "gid_index"=${gid_index} >>${MY_TMP_FILE_PATH}
else
  echo "get gid_index failed">>${MY_LOG_FILE_PATH}
  exit 1
fi


if [ "X${ib_port}" != "X" ];then
  echo "ib_port"=${ib_port} >>${MY_TMP_FILE_PATH}
else
  echo "get ib_port failed">>${MY_LOG_FILE_PATH}
  exit 1
fi

mtu_index=`ibv_devinfo|grep -A 17 ${device_name} |grep active_mtu|awk '{print $3}'|awk -F "[()]" '{print $2}'`

if [ "X${mtu_index}" != "X" ];then
  echo "mtu_index"=${mtu_index} >>${MY_TMP_FILE_PATH}
else
  echo "get mtu_index failed">>${MY_LOG_FILE_PATH}
  exit 1
fi

}



# Mellanox HCA process
# uses ibdev2netdev and show_gids
function  mellonx_hca()
{

echo "vendor is mellonx">>${MY_LOG_FILE_PATH}

device_name=`ibdev2netdev | grep -w ${g_nic_name} | awk -F ' ' '{print $1}'`

if [ "X$device_name" != "X" ];then
  echo "device_name"=${device_name} >>${MY_TMP_FILE_PATH}
else
  echo "get device_name failed">>${MY_LOG_FILE_PATH}
  exit 1
fi


gid_index=`show_gids | grep -w ${device_name} |grep -w "v2"| awk -F ' ' '$5 !="" {print $3}'|head -n 1`

if [ "X${gid_index}" != "X" ];then
  echo "gid_index"=${gid_index} >>${MY_TMP_FILE_PATH}
else
  echo "get gid_index failed">>${MY_LOG_FILE_PATH}
  exit 1
fi

ib_port=`show_gids | grep -w ${device_name} |grep -w "v2"| awk -F ' ' '$5 !="" {print $2}'|head -n 1`

if [ "X${ib_port}" != "X" ];then
  echo "ib_port"=${ib_port} >>${MY_TMP_FILE_PATH}
else
  echo "get ib_port failed">>${MY_LOG_FILE_PATH}
  exit 1
fi

mtu_index=`ibv_devinfo|grep -A 17 ${device_name} |grep active_mtu|awk '{print $3}'|awk -F "[()]" '{print $2}'`

if [ "X${mtu_index}" != "X" ];then
  echo "mtu_index"=${mtu_index} >>${MY_TMP_FILE_PATH}
else
  echo "get mtu_index failed">>${MY_LOG_FILE_PATH}
  exit 1
fi

}





#====================================================================
#start shell
#====================================================================


echo "input interface name is:$1">${MY_LOG_FILE_PATH}

if [ "X$1" == "X" ]; then
  echo "interface is not specified, example: $0 eth0">>${MY_LOG_FILE_PATH}
  exit 1
fi

g_nic_name=$1

is_virtual=`ls -l /sys/class/net/ | grep " $g_nic_name " | grep "\/virtual\/net\/" | wc -l`
if [ $is_virtual -ne 0 ]; then
  g_nic_name=`echo $g_nic_name | awk -F "." 'OFS="." {$NF="" ;print $0}' | sed 's/.$//'`
fi


set_vendor


if [[ "INTEL" = ${g_vendor} ]]; then

	intel_hca

elif [[ "MELLONX" = ${g_vendor} ]]; then

	mellonx_hca
else

echo "Unable to determine the vendor. exit 1">>${MY_LOG_FILE_PATH}
exit 1   

fi

cat ${MY_TMP_FILE_PATH}

rm -f ${MY_TMP_FILE_PATH}

Errors caused by mismatched driver/firmware versions on the two ends

Mismatched driver or firmware versions on the two ends can cause strange, hard-to-diagnose errors. If you cannot find the root cause of a problem, check the firmware and driver versions of the NICs on both sides.

ibv_get_device_list

Prerequisites

ibv_fork_init() should be called before any other function in libibverbs.

Description
ibv_get_device_list() returns a NULL-terminated array of the RDMA devices currently available. The array should be released with ibv_free_device_list().


Array entries should not be accessed directly. Instead, operate on them with the following service verbs: ibv_get_device_name(), ibv_get_device_guid(), and ibv_open_device().

Parameters

Name          Direction   Description
num_devices   out         (optional) If non-NULL, set to the number of devices returned in the array


Return value
ibv_get_device_list() returns the array of available RDMA devices on success; if the request fails, it returns NULL and sets errno.

If no devices are found, num_devices is set to 0 and a non-NULL (empty) array is returned.


Possible errno values are:
EPERM - permission denied (errno 1)
ENOMEM - insufficient memory to complete the operation (errno 12)
ENOSYS - the kernel does not support RDMA / the relevant functions are not implemented (errno 38)

Example


Get the device list without the num_devices argument:

struct ibv_device **dev_list;
 
dev_list = ibv_get_device_list(NULL);
if (!dev_list)
        exit(1);

Get the device list with the num_devices argument:

struct ibv_device **dev_list;
int num_devices;
 
dev_list = ibv_get_device_list(&num_devices);
if (!dev_list)
        exit(1);


FAQ


I called ibv_get_device_list() and it returned NULL. What does this mean?
This is a verb that should not fail; check whether the ib_uverbs kernel module is loaded (lsmod command).

I called ibv_get_device_list() and it did not find any RDMA devices at all (an empty list). What does this mean?

The driver could not find any RDMA devices.
- Check with lspci whether there are any RDMA devices in your machine
- Check with lsmod whether the low-level driver for your RDMA device is loaded
- Check dmesg / /var/log/messages for errors
 

Translated from: https://www.rdmamojo.com/2012/05/31/ibv_get_device_list/

Further reading: Device Operations - RDMA Aware Programming User Manual v1.7 - NVIDIA Networking Docs

struct ibv_device
{
    struct ibv_device_ops   ops;
    enum ibv_node_type  node_type;
    enum ibv_transport_type transport_type;
    char    name[IBV_SYSFS_NAME_MAX];
    char    dev_name[IBV_SYSFS_NAME_MAX];
    char    dev_path[IBV_SYSFS_PATH_MAX];
    char    ibdev_path[IBV_SYSFS_PATH_MAX];
};
 
ops             pointers to alloc and free functions
node_type       IBV_NODE_UNKNOWN
                IBV_NODE_CA
                IBV_NODE_SWITCH
                IBV_NODE_ROUTER
                IBV_NODE_RNIC
transport_type  IBV_TRANSPORT_UNKNOWN
                IBV_TRANSPORT_IB
                IBV_TRANSPORT_IWARP
name            kernel device name, e.g. "mthca0"
dev_name        uverbs device name, e.g. "uverbs0"
dev_path        path to the infiniband_verbs class device in sysfs
ibdev_path      path to the infiniband class device in sysfs

ibv_open_device()

struct ibv_context *ibv_open_device(struct ibv_device *device);

Description

This function creates a context associated with the RDMA device; the device can be closed with ibv_close_device().

What the context is used for

  • Querying the RDMA device's resources
  • Creating resources

Notes:

  • The function does not do what its name suggests: it does not actually open the device;
  • The device is actually opened by the low-level driver in the kernel;
  • The device may be in use by other user-space or kernel-space code;
  • This verb merely opens a context for the user-space application to use.

Parameter (struct ibv_device *device)

One entry from the array of available RDMA devices returned by ibv_get_device_list().

Return value (struct ibv_context *)

struct ibv_context contains the following fields:

  • async_fd: a file descriptor for reading asynchronous events; use it if you want to read asynchronous events in non-blocking mode
  • num_comp_vectors: the number of completion vectors available for this RDMA device

Example

#include <stdio.h>
#include <infiniband/verbs.h>

int main(void)
{
    struct ibv_device **device_list;
    int num_devices;
    int i;
    int rc;

    device_list = ibv_get_device_list(&num_devices);
    if (!device_list) {
        fprintf(stderr, "Error, ibv_get_device_list() failed\n");
        return -1;
    }

    printf("%d RDMA device(s) found:\n\n", num_devices);

    for (i = 0; i < num_devices; ++ i) {
        struct ibv_context *ctx;

        ctx = ibv_open_device(device_list[i]);
        if (!ctx) {
            fprintf(stderr, "Error, failed to open the device '%s'\n",
                ibv_get_device_name(device_list[i]));
            rc = -1;
            goto out;
        }

        printf("The device '%s' was opened\n", ibv_get_device_name(ctx->device));

        rc = ibv_close_device(ctx);
        if (rc) {
            fprintf(stderr, "Error, failed to close the device '%s'\n",
                ibv_get_device_name(ctx->device));
            rc = -1;
            goto out;
        }
    }
        
    ibv_free_device_list(device_list);

    return 0;

out:
    ibv_free_device_list(device_list);
    return rc;
}

Error log

ibv_get_device_list() did not find any RDMA devices (num = 0) and reported the error: No such file or directory

Root cause:

ceph manages its daemons with systemd, and

/lib/systemd/system/ceph-mds@.service sets PrivateDevices=yes. As a result, the process runs in a private file system namespace in which /dev is replaced by a minimal version containing only non-physical device nodes:

When PrivateDevices=yes is set in the [Service] section of a systemd service unit file, the processes run for the service will run in a private file system namespace where /dev is replaced by a minimal version that only includes the device nodes /dev/null, /dev/zero, /dev/full, /dev/urandom, /dev/random, /dev/tty as well as the submounts /dev/shm, /dev/pts, /dev/mqueue, /dev/hugepages, and the /dev/stdout, /dev/stderr, /dev/stdin symlinks. No device nodes for physical devices will be included however.

Changes/PrivateDevicesAndPrivateNetwork:https://fedoraproject.org/wiki/Changes/PrivateDevicesAndPrivateNetwork

According to the ibv_get_device_list() source code, it needs to read the device list from /sys/class/infiniband:

[root@a1 ceph]# ls /sys/class/infiniband
mlx5_0  mlx5_1

Because the mds process's private file system namespace does not include /sys/, the call fails with: No such file or directory.

[Unit]
Description=Ceph metadata server daemon
After=network-online.target local-fs.target time-sync.target
Wants=network-online.target local-fs.target time-sync.target
PartOf=ceph-mds.target

[Service]
LimitCORE=infinity
LimitNOFILE=1048576
LimitNPROC=1048576
EnvironmentFile=-/etc/sysconfig/ceph
Environment=CLUSTER=ceph
ExecStart=/opt/h3c/bin/ceph-mds -f --cluster ${CLUSTER} --id %i --setuser ceph --setgroup ceph
ExecReload=/bin/kill -HUP $MAINPID
PrivateDevices=yes
ProtectHome=true
ProtectSystem=full
PrivateTmp=true
TasksMax=infinity
Restart=on-failure
StartLimitInterval=30min
StartLimitBurst=50
RestartSec=10s

[Install]
WantedBy=ceph-mds.target
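If the service genuinely needs RDMA access, one possible fix (a sketch, to be validated against your own unit files) is a systemd drop-in that turns PrivateDevices back off for this service:

```ini
# /etc/systemd/system/ceph-mds@.service.d/override.conf
# created with: systemctl edit ceph-mds@.service
[Service]
PrivateDevices=no
```

Then run systemctl daemon-reload and restart the service.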


Also note: precautions when using the ibverbs API | https://blog.csdn.net/bandaoyu/article/details/124327417?spm=1001.2014.3001.5501

Recovering from a failed ibv_devinfo command: IBM Docs

Recovering from a failed ibv_devinfo command

Last Updated: 2021-03-01

The ibv_devinfo command can fail when modules or hardware drivers fail to load or when libraries are missing.

About this task

The ibv_devinfo command generally fails with one of two common errors. The recovery steps for each of those two errors, and one less common error, are given below.

Procedure

  1. Error: Failed to get IB devices list: Function not implemented.

    One of the common causes of this failure is that the ib_uverbs module might not be loaded or it might not be enabled at the correct run levels. To recover from this error, complete the following steps:

    1. To verify the ib_uverbs module is loaded, run the following command and look for similar output:
      lsmod | grep ib_uverbs
      
      ib_uverbs              44238  0
    2. To verify that the RDMA run level is set to on for levels 3 and 5, run the following command and look for similar output:
      chkconfig --list | grep rdma
      
      0:off 1:off 2:off 3:on 4:off 5:on 6:off 
      If RDMA is off, run the following commands to activate RDMA on levels 3 and 5:
      chkconfig --level 3 rdma on 
      	chkconfig --level 5 rdma on
      Run one of the following commands to restart RDMA:
      openibd restart (or: rdma restart)
    3. If there is a missing library, you will see an error similar to the following:
      libibverbs: Warning: couldn't load driver 'mlx4': libmlx4-rdmav2.so: cannot open shared object file: No such file or directory 
      	libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0 
      	No IB devices found. 

      If you receive this error, install the libmlx4 user level library.

  2. Error: No IB devices found.

    If no IB devices are found, complete the following steps:

    1. Check to see if the relevant hardware driver is loaded. If a hardware driver is missing, then run the following command:
      modprobe <hardware driver>
    2. Verify that the hardware driver is loaded by default by editing the configuration file.
    3. Run one of the following commands to restart RDMA:
      openibd restart (or: rdma restart)
  3. Error: On Red Hat Enterprise Linux 5.x on ppc64, the wrong libraries are installed.

    Red Hat Enterprise Linux 5.x on ppc64 requires 32-bit user level libraries like libmlx4. However, by default, the 64-bit libraries are installed. Make sure that you have the correct 32-bit libraries installed.

libibverbs can't find running IB devices 

749816 – libibverbs can't find running IB devices

Driver installed but not working properly

We installed the driver following Intel's instructions, but it did not work. It turned out that the kernel-mode RDMA driver shipped with the system kernel was Mellanox's, while the rdma-core we installed is the user-mode driver; the two do not match.

The modinfo ib_core command shows that
the ib_core in the H3C-distributed image is /lib/modules/5.10.38-21.01.el7.x86_64/extra/mlnx-ofa_kernel/drivers/infiniband/core/ib_core.ko, i.e. the kernel-mode RDMA core provided by Mellanox,

rather than the native distribution's /lib/modules/5.10.38-21.01.el7.x86_64/kernel/drivers/infiniband/core/ib_core.ko.xz. In other words, the H3C distribution replaced the native ib_core.ko.xz with Mellanox's ib_core.ko.

Judging from the "mlnx" in the name, this is a Mellanox driver, which is not suitable for Intel RDMA NICs, and that is what caused the malfunction. The ice driver should be in the same situation.

Delete /lib/modules/5.10.38-21.01.el7.x86_64/extra outright and reboot, so the system loads the native distribution's /lib/modules/5.10.38-21.01.el7.x86_64/kernel/drivers/infiniband/core/ib_core.ko.xz.

Then reinstall the Intel driver. If anything conflicts, use yum remove to uninstall the Mellanox drivers.

3.1.Initialization
3.1.1.ibv_fork_init
        Template:
        int ibv_fork_init(void)
        Input Parameters:
        None
        Output Parameters:
        None
        Return Value:
        0 on success, -1 on error. If the call fails, errno will be set to indicate the reason for the failure

Description:
        ibv_fork_init initializes libibverbs' data structures to handle the fork() function safely and avoid data corruption, whether fork() is called explicitly or implicitly, such as in system() calls. It is not necessary to call ibv_fork_init if all parent process threads are always blocked until all child processes end or change address space via an exec() operation.

        This function works on Linux kernels supporting the MADV_DONTFORK flag for madvise() (2.6.17 and higher).
        Setting the environment variable RDMAV_FORK_SAFE or IBV_FORK_SAFE to any value has the same effect as calling ibv_fork_init().
        Setting the environment variable RDMAV_HUGEPAGES_SAFE to any value tells the library to check the underlying page size used by the kernel for memory regions. This is required if an application uses huge pages either directly or indirectly via a library such as libhugetlbfs.

        Calling ibv_fork_init() will reduce performance due to an extra system call for every memory registration, and the additional memory allocated to track memory regions. The precise performance impact depends on the workload and usually will not be significant.
Setting RDMAV_HUGEPAGES_SAFE adds further overhead to all memory registrations. 

3.2.Device Operations
3.2.1.ibv_get_device_list
        Template:
        struct ibv_device **ibv_get_device_list(int *num_devices)
        Input Parameters:
        none
        Output Parameters:
        num_devices (optional) If non-null, the number of devices returned in the array will be stored here
        Return Value:
        NULL terminated array of VPI devices or NULL on failure.

Description:
        ibv_get_device_list returns a list of VPI devices available on the system. Each entry on the list is a pointer to a struct ibv_device.
        struct ibv_device is defined as:

        struct ibv_device
        {
                struct ibv_device_ops ops;
                enum ibv_node_type node_type;
                enum ibv_transport_type transport_type;
                char name[IBV_SYSFS_NAME_MAX];
                char dev_name[IBV_SYSFS_NAME_MAX];
                char dev_path[IBV_SYSFS_PATH_MAX];
                char ibdev_path[IBV_SYSFS_PATH_MAX];
        };

        ops pointers to alloc and free functions

        node_type

                IBV_NODE_UNKNOWN
                IBV_NODE_CA
                IBV_NODE_SWITCH
                IBV_NODE_ROUTER
                IBV_NODE_RNIC

        transport_type

                IBV_TRANSPORT_UNKNOWN
                IBV_TRANSPORT_IB
                IBV_TRANSPORT_IWARP

        name kernel device name eg “mthca0”

        dev_name uverbs device name eg “uverbs0”

        dev_path path to infiniband_verbs class device in sysfs

        ibdev_path path to infiniband class device in sysfs

        The list of ibv_device structs shall remain valid until the list is freed. After calling ibv_get_device_list, the user should open any desired devices and promptly free the list via the ibv_free_device_list command. 

3.2.2.ibv_free_device_list
        Template:
        void ibv_free_device_list(struct ibv_device **list)
        Input Parameters:
        list list of devices provided from ibv_get_device_list command
        Output Parameters:
        none
        Return Value:
        none

Description:
        ibv_free_device_list frees the list of ibv_device structs provided by ibv_get_device_list. Any desired devices should be opened prior to calling this command. Once the list is freed, all ibv_device structs that were on the list are invalid and can no longer be used. 

3.2.3.ibv_get_device_name
        Template:
        const char *ibv_get_device_name(struct ibv_device *device)
        Input Parameters:
        device struct ibv_device for desired device
        Output Parameters:
        none
        Return Value:
        Pointer to device name char string or NULL on failure.

Description:
        ibv_get_device_name returns a pointer to the device name contained within the ibv_device struct.

3.2.4.ibv_get_device_guid
        Template:
        uint64_t ibv_get_device_guid(struct ibv_device *device)
        Input Parameters:
        device struct ibv_device for desired device
        Output Parameters:
        none
        Return Value:
        64 bit GUID

Description:
        ibv_get_device_guid returns the device's 64-bit Global Unique Identifier (GUID) in network byte order.

3.2.5.ibv_open_device
        Template:
        struct ibv_context *ibv_open_device(struct ibv_device *device)
        Input Parameters:
        device struct ibv_device for desired device
        Output Parameters:
        none
        Return Value:
        A verbs context that can be used for future operations on the device or NULL on failure.

Description:
        ibv_open_device provides the user with a verbs context which is the object that will be used for all other verb operations. 

3.2.6.ibv_close_device
        Template:
        int ibv_close_device(struct ibv_context *context)        
        Input Parameters:
        context struct ibv_context from ibv_open_device
        Output Parameters:
        none
        Return Value:
        0 on success, -1 on error. If the call fails, errno will be set to indicate the reason for the failure.

Description:
        ibv_close_device closes the verb context previously opened with ibv_open_device. This operation does not free any other objects associated with the context. To avoid memory leaks, all other objects must be independently freed prior to calling this command. 

3.2.7.ibv_node_type_str
        Template:
        const char *ibv_node_type_str (enum ibv_node_type node_type)
        Input Parameters:
        node_type ibv_node_type enum value which may be an HCA, Switch, Router, RNIC or Unknown
        Output Parameters:
        none
        Return Value:
        A constant string which describes the enum value node_type

Description:
        ibv_node_type_str returns a string describing the node type enum value, node_type. This value can be an InfiniBand HCA, Switch, Router, an RDMA enabled NIC or unknown
enum ibv_node_type {
        IBV_NODE_UNKNOWN = -1,
        IBV_NODE_CA = 1,
        IBV_NODE_SWITCH,
        IBV_NODE_ROUTER,
        IBV_NODE_RNIC
};

3.2.8.ibv_port_state_str
        Template:
        const char *ibv_port_state_str (enum ibv_port_state port_state)
        Input Parameters:
        port_state The enumerated value of the port state
        Output Parameters:
        None
        Return Value:
        A constant string which describes the enum value port_state

Description:
        ibv_port_state_str returns a string describing the port state enum value, port_state.

enum ibv_port_state {
        IBV_PORT_NOP = 0,
        IBV_PORT_DOWN = 1,
        IBV_PORT_INIT = 2,
        IBV_PORT_ARMED = 3,
        IBV_PORT_ACTIVE = 4,
        IBV_PORT_ACTIVE_DEFER = 5
};

3.3.Verb Context Operations
        The following commands are used once a device has been opened. These commands allow you to get more specific information about a device or one of its ports, create completion queues (CQ), completion channels (CC), and protection domains (PD) which can be used for further operations. 

3.3.1.ibv_query_device
        Template:
        int ibv_query_device(struct ibv_context *context, struct ibv_device_attr *device_attr)
        Input Parameters:
        context struct ibv_context from ibv_open_device
        Output Parameters:
        device_attr struct ibv_device_attr containing device attributes
        Return Value:
        0 on success, -1 on error. If the call fails, errno will be set to indicate the reason for the failure.

Description:
        ibv_query_device retrieves the various attributes associated with a device. The user should malloc a struct ibv_device_attr, pass it to the command, and it will be filled in upon successful return. The user is responsible to free this struct.
        struct ibv_device_attr is defined as follows:

        struct ibv_device_attr
        {
                char fw_ver[64];
                uint64_t node_guid;
                uint64_t sys_image_guid;
                uint64_t max_mr_size;
                uint64_t page_size_cap;
                uint32_t vendor_id;
                uint32_t vendor_part_id;

                uint32_t hw_ver;
                int max_qp;
                int max_qp_wr;
                int device_cap_flags;
                int max_sge;
                int max_sge_rd;

                int max_cq;
                int max_cqe;
                int max_mr;
                int max_pd;
                int max_qp_rd_atom;
                int max_ee_rd_atom;
                int max_res_rd_atom;

                int max_qp_init_rd_atom;
                int max_ee_init_rd_atom;
                enum ibv_atomic_cap atomic_cap;

                int max_ee;
                int max_rdd;
                int max_mw;
                int max_raw_ipv6_qp;
                int max_raw_ethy_qp;
                int max_mcast_grp;
                int max_mcast_qp_attach;
                int max_total_mcast_qp_attach;

                int max_ah;
                int max_fmr;
                int max_map_per_fmr;
                int max_srq;
                int max_srq_wr;
                int max_srq_sge;
                uint16_t max_pkeys;
                uint8_t local_ca_ack_delay;
                uint8_t phys_port_cnt;
        };

        fw_ver

                Firmware version
        node_guid

                Node global unique identifier (GUID)
        sys_image_guid

                System image GUID
        max_mr_size

                Largest contiguous block that can be registered
        page_size_cap

                Supported page sizes
        vendor_id

                Vendor ID, per IEEE
        vendor_part_id

                Vendor supplied part ID        
        hw_ver

                Hardware version

        max_qp

                Maximum number of Queue Pairs (QP)
        max_qp_wr

                Maximum outstanding work requests (WR) on any queue
        device_cap_flags

                IBV_DEVICE_RESIZE_MAX_WR
                IBV_DEVICE_BAD_PKEY_CNTR
                IBV_DEVICE_BAD_QKEY_CNTR
                IBV_DEVICE_RAW_MULTI
                IBV_DEVICE_AUTO_PATH_MIG
                IBV_DEVICE_CHANGE_PHY_PORT
                IBV_DEVICE_UD_AV_PORT_ENFORCE
                IBV_DEVICE_CURR_QP_STATE_MOD
                IBV_DEVICE_SHUTDOWN_PORT
                IBV_DEVICE_INIT_TYPE
                IBV_DEVICE_PORT_ACTIVE_EVENT
                IBV_DEVICE_SYS_IMAGE_GUID
                IBV_DEVICE_RC_RNR_NAK_GEN
                IBV_DEVICE_SRQ_RESIZE
                IBV_DEVICE_N_NOTIFY_CQ
                IBV_DEVICE_XRC

        max_sge

                Maximum scatter/gather entries (SGE) per WR for non-RD QPs
        max_sge_rd

                Maximum SGEs per WR for RD QPs
        max_cq

                Maximum supported completion queues (CQ)
        max_cqe

                Maximum completion queue entries (CQE) per CQ
        max_mr

                Maximum supported memory regions (MR)

        max_pd

                Maximum supported protection domains (PD)
        max_qp_rd_atom

                Maximum outstanding RDMA read and atomic operations per QP
        max_ee_rd_atom

                Maximum outstanding RDMA read and atomic operations per End to End (EE) context (RD connections)
        max_res_rd_atom

                Maximum resources used for incoming RDMA read and atomic operations
        max_qp_init_rd_atom

                Maximum RDMA read and atomic operations that may be initiated per QP
        max_ee_init_rd_atom

                Maximum RDMA read and atomic operations that may be initiated per EE
        atomic_cap

                IBV_ATOMIC_NONE - no atomic guarantees
                IBV_ATOMIC_HCA - atomic guarantees within this device
                IBV_ATOMIC_GLOB - global atomic guarantees

        max_ee

                Maximum supported EE contexts
        max_rdd

                Maximum supported RD domains
        max_mw

                Maximum supported memory windows (MW)
        max_raw_ipv6_qp

                Maximum supported raw IPv6 datagram QPs
        max_raw_ethy_qp

                Maximum supported ethertype datagram QPs
        max_mcast_grp

                Maximum supported multicast groups
        max_mcast_qp_attach

                Maximum QPs per multicast group that can be attached
        max_total_mcast_qp_attach

                Maximum total QPs that can be attached to multicast groups
        max_ah

                Maximum supported address handles (AH)
        max_fmr

                Maximum supported fast memory regions (FMR)
        max_map_per_fmr

                Maximum number of remaps per FMR before an unmap operation is required
        max_srq

                Maximum supported shared receive queues (SRQ)
        max_srq_wr

                Maximum work requests (WR) per SRQ
        max_srq_sge

                Maximum SGEs per SRQ
        max_pkeys

                Maximum number of partitions
        local_ca_ack_delay

                Local CA ack delay
        phys_port_cnt

                Number of physical ports 

3.3.2.ibv_query_port
        Template:
        int ibv_query_port(struct ibv_context *context, uint8_t port_num, struct ibv_port_attr *port_attr)
        Input Parameters:
        context

                struct ibv_context from ibv_open_device

        port_num

                physical port number (1 is first port)
        Output Parameters:
        port_attr

                struct ibv_port_attr containing port attributes
        Return Value:
        0 on success, -1 on error. If the call fails, errno will be set to indicate the reason for the failure. 

Description:
        ibv_query_port retrieves the various attributes associated with a port. The user should allocate a struct ibv_port_attr, pass it to the command, and it will be filled in upon successful return. The user is responsible to free this struct.

        struct ibv_port_attr is defined as follows:
        struct ibv_port_attr
        {
                enum ibv_port_state state;
                enum ibv_mtu max_mtu;
                enum ibv_mtu active_mtu;
                int gid_tbl_len;
                uint32_t port_cap_flags;
                uint32_t max_msg_sz;
                uint32_t bad_pkey_cntr;
                uint32_t qkey_viol_cntr;
                uint16_t pkey_tbl_len;
                uint16_t lid;
                uint16_t sm_lid;
                uint8_t lmc;
                uint8_t max_vl_num;
                uint8_t sm_sl;
                uint8_t subnet_timeout;
                uint8_t init_type_reply;
                uint8_t active_width;
                uint8_t active_speed;
                uint8_t phys_state;
        };

        state

                IBV_PORT_NOP
                IBV_PORT_DOWN
                IBV_PORT_INIT
                IBV_PORT_ARMED
                IBV_PORT_ACTIVE
                IBV_PORT_ACTIVE_DEFER

        max_mtu

                Maximum Transmission Unit (MTU) supported by port. Can be:
                IBV_MTU_256
                IBV_MTU_512
                IBV_MTU_1024
                IBV_MTU_2048
                IBV_MTU_4096

        active_mtu

                Actual MTU in use

        gid_tbl_len

                Length of source global ID (GID) table
        port_cap_flags

                Supported capabilities of this port. There are currently no enumerations/defines declared in verbs.h
        max_msg_sz

                Maximum message size
        bad_pkey_cntr

                Bad P_Key counter
        qkey_viol_cntr

                Q_Key violation counter
        pkey_tbl_len

                Length of partition table
        lid

                First local identifier (LID) assigned to this port
        sm_lid

                LID of subnet manager (SM)
        lmc

                LID mask control (used when multiple LIDs are assigned to the port)
        max_vl_num

                Maximum virtual lanes (VL)
        sm_sl

                SM service level (SL)

        subnet_timeout

                Subnet propagation delay
        init_type_reply

                Type of initialization performed by SM
        active_width

                Currently active link width
        active_speed

                Currently active link speed
        phys_state

                Physical port state 

3.3.3.ibv_query_gid
        Template:
        int ibv_query_gid(struct ibv_context *context, uint8_t port_num, int index, union ibv_gid *gid)
        Input Parameters:
        context

                struct ibv_context from ibv_open_device
        port_num

                physical port number (1 is first port)
        index

                which entry in the GID table to return (0 is first)
        Output Parameters:
        gid

                union ibv_gid containing gid information
        Return Value:
        0 on success, -1 on error. If the call fails, errno will be set to indicate the reason for the failure. 

Description:
        ibv_query_gid retrieves an entry in the port’s global identifier (GID) table. Each port is assigned at least one GID by the subnet manager (SM). The GID is a valid IPv6 address composed of the globally unique identifier (GUID) and a prefix assigned by the SM. GID[0] is unique and contains the port's GUID.
        The user should allocate a union ibv_gid, pass it to the command, and it will be filled in upon successful return. The user is responsible for freeing this union.
        union ibv_gid is defined as follows:
        union ibv_gid
        {
                uint8_t raw[16];
                struct
                {
                        uint64_t subnet_prefix;
                        uint64_t interface_id;
                 } global;
        }; 
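As an illustration, a minimal sketch of querying GID[0] of port 1 on the first device found and printing its raw bytes (assumes a device is present; port number 1 and index 0 are the conventional first entries):

```c
#include <stdio.h>
#include <infiniband/verbs.h>

int main(void)
{
    struct ibv_device **list = ibv_get_device_list(NULL);
    struct ibv_context *ctx = (list && list[0]) ? ibv_open_device(list[0]) : NULL;
    if (!ctx) {
        fprintf(stderr, "no usable RDMA device\n");
        return 1;
    }

    union ibv_gid gid;
    if (ibv_query_gid(ctx, 1, 0, &gid)) {   /* port 1, GID table index 0 */
        perror("ibv_query_gid");
        return 1;
    }

    /* raw[] holds the 128-bit GID in network byte order */
    for (int i = 0; i < 16; ++i)
        printf("%02x%s", gid.raw[i], (i % 2 && i != 15) ? ":" : "");
    printf("\n");

    ibv_close_device(ctx);
    ibv_free_device_list(list);
    return 0;
}
```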

3.3.4.ibv_query_pkey
        Template:
        int ibv_query_pkey(struct ibv_context *context, uint8_t port_num, int index, uint16_t *pkey)
        Input Parameters:
        context

                struct ibv_context from ibv_open_device
        port_num

                physical port number (1 is first port)
        index

                which entry in the pkey table to return (0 is first)
        Output Parameters:
        pkey

                desired pkey
        Return Value:
        0 on success, -1 on error. If the call fails, errno will be set to indicate the reason for the failure. 

Description:
        ibv_query_pkey retrieves an entry in the port’s partition key (pkey) table. Each port is assigned at least one pkey by the subnet manager (SM). The pkey identifies a partition that the port belongs to. A pkey is roughly analogous to a VLAN ID in Ethernet networking.
        The user passes in a pointer to a uint16_t that will be filled in with the requested pkey. The user is responsible for freeing this uint16_t.
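A sketch of reading P_Key[0] of port 1 on the first device (assumes a device is present; on most fabrics index 0 holds the default full-membership pkey 0xffff):

```c
#include <stdio.h>
#include <infiniband/verbs.h>

int main(void)
{
    struct ibv_device **list = ibv_get_device_list(NULL);
    struct ibv_context *ctx = (list && list[0]) ? ibv_open_device(list[0]) : NULL;
    if (!ctx)
        return 1;

    uint16_t pkey;
    if (ibv_query_pkey(ctx, 1, 0, &pkey) == 0)   /* port 1, pkey table index 0 */
        printf("port 1, pkey[0] = 0x%04x\n", pkey);

    ibv_close_device(ctx);
    ibv_free_device_list(list);
    return 0;
}
```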

3.3.5.ibv_alloc_pd
        Template:
        struct ibv_pd *ibv_alloc_pd(struct ibv_context *context)
        Input Parameters:
        context

                struct ibv_context from ibv_open_device
        Output Parameters:
        none
        Return Value:
        Pointer to created protection domain or NULL on failure. 

Description:
        ibv_alloc_pd creates a protection domain (PD). PDs limit which memory regions can be accessed by which queue pairs (QP) providing a degree of protection from unauthorized access. The user must create at least one PD to use VPI verbs. 
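A short sketch of the PD lifecycle, assuming ctx was obtained from ibv_open_device as in the earlier sections:

```c
/* ctx: struct ibv_context * from ibv_open_device */
struct ibv_pd *pd = ibv_alloc_pd(ctx);
if (!pd) {
    fprintf(stderr, "ibv_alloc_pd failed\n");
    return 1;
}

/* ... register MRs / create QPs and SRQs against pd ... */

/* fails with nonzero while any MR/QP/SRQ still references pd */
if (ibv_dealloc_pd(pd))
    perror("ibv_dealloc_pd");
```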

 3.3.6.ibv_dealloc_pd
        Template:
        int ibv_dealloc_pd(struct ibv_pd *pd)
        Input Parameters:
        pd

                struct ibv_pd from ibv_alloc_pd
        Output Parameters:
        none
        Return Value:
        0 on success, -1 on error. If the call fails, errno will be set to indicate the reason for the failure.

Description:
        ibv_dealloc_pd frees a protection domain (PD). This command will fail if any other objects are currently associated with the indicated PD. 

3.3.7.ibv_create_cq
        Template:
        struct ibv_cq *ibv_create_cq(struct ibv_context *context, int cqe, void *cq_context, struct ibv_comp_channel *channel, int comp_vector)
        Input Parameters:
        context

                struct ibv_context from ibv_open_device
        cqe

                Minimum number of entries CQ will support
        cq_context (Optional)

                User defined value returned with completion events
        channel (Optional)

                Completion channel
        comp_vector (Optional)

                Completion vector
        Output Parameters:
        none
        Return Value:
        pointer to created CQ or NULL on failure. 

Description:
        ibv_create_cq creates a completion queue (CQ). A completion queue holds completion queue entries (CQE). Each Queue Pair (QP) has an associated send and receive CQ. A single CQ can be shared for sending and receiving as well as be shared across multiple QPs.
        The parameter cqe defines the minimum size of the queue. The actual size of the queue may be larger than the specified value.
        The parameter cq_context is a user defined value. If specified during CQ creation, this value will be returned as a parameter in ibv_get_cq_event when using a completion channel (CC).
        The parameter channel is used to specify a CC. A CQ is merely a queue that has no built-in notification mechanism. When using a polling paradigm for CQ processing, a CC is unnecessary: the user simply polls the CQ at regular intervals. If, however, you wish to use a pend (event-driven) paradigm, a CC is required. The CC is the mechanism that allows the user to be notified that a new CQE is on the CQ.
        The parameter comp_vector is used to specify the completion vector used to signal completion events. It must be >=0 and < context->num_comp_vectors. 
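A sketch creating a CQ together with an optional completion channel (assumes ctx from ibv_open_device; the requested depth of 128 is illustrative):

```c
#include <stdio.h>
#include <infiniband/verbs.h>

/* ctx: struct ibv_context * from ibv_open_device */
int make_cq(struct ibv_context *ctx)
{
    /* Optional completion channel for the pend (event-driven) paradigm;
     * pass NULL instead when polling. */
    struct ibv_comp_channel *cc = ibv_create_comp_channel(ctx);

    /* Ask for at least 128 CQEs; the driver may round up.
     * comp_vector must satisfy 0 <= comp_vector < ctx->num_comp_vectors. */
    struct ibv_cq *cq = ibv_create_cq(ctx, 128, NULL, cc, 0);
    if (!cq) {
        fprintf(stderr, "ibv_create_cq failed\n");
        return -1;
    }

    printf("CQ created with %d entries\n", cq->cqe);  /* actual size */

    ibv_destroy_cq(cq);              /* must precede destroying the channel */
    if (cc)
        ibv_destroy_comp_channel(cc);
    return 0;
}
```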

3.3.8.ibv_resize_cq
        Template:
        int ibv_resize_cq(struct ibv_cq *cq, int cqe)
        Input Parameters:
        cq

                CQ to resize
        cqe

                Minimum number of entries CQ will support
        Output Parameters:
        none
        Return Value:
        0 on success, -1 on error. If the call fails, errno will be set to indicate the reason for the failure. 

Description:
        ibv_resize_cq resizes a completion queue (CQ).
        The parameter cqe must be at least the number of outstanding entries on the queue. The actual size of the queue may be larger than the specified value. The CQ may contain completions while it is being resized, so the CQ can be resized during use.

3.3.9.ibv_destroy_cq
        Template:
        int ibv_destroy_cq(struct ibv_cq *cq)
        Input Parameters:
        cq

                CQ to destroy
        Output Parameters:
        none
        Return Value:
        0 on success, -1 on error. If the call fails, errno will be set to indicate the reason for the failure. 

Description:
        ibv_destroy_cq frees a completion queue (CQ). This command will fail if there is any queue pair (QP) that still has the specified CQ associated with it. 

3.3.10.ibv_create_comp_channel
        Template:
        struct ibv_comp_channel *ibv_create_comp_channel(struct ibv_context *context)
        Input Parameters:
        context

                struct ibv_context from ibv_open_device
        Output Parameters:
        none
        Return Value:
        pointer to created CC or NULL on failure. 

Description:
        ibv_create_comp_channel creates a completion channel. A completion channel is a mechanism for the user to receive notifications when a new completion queue entry (CQE) has been placed on a completion queue (CQ). 

3.3.11.ibv_destroy_comp_channel
        Template:
        int ibv_destroy_comp_channel(struct ibv_comp_channel *channel)
        Input Parameters:
        channel

                struct ibv_comp_channel from ibv_create_comp_channel
        Output Parameters:
        none
        Return Value:
        0 on success, -1 on error. If the call fails, errno will be set to indicate the reason for the failure. 

Description:
        ibv_destroy_comp_channel frees a completion channel. This command will fail if there are any completion queues (CQ) still associated with this completion channel. 

3.4.Protection Domain Operations
        Once you have established a protection domain (PD), you may create objects within that domain. This section describes the operations available on a PD, including registering memory regions (MR), creating queue pairs (QP) or shared receive queues (SRQ), and creating address handles (AH). 

3.4.1.ibv_reg_mr
        Template:
        struct ibv_mr *ibv_reg_mr(struct ibv_pd *pd, void *addr, size_t length, enum ibv_access_flags access)
        Input Parameters:
        pd

                protection domain, struct ibv_pd from ibv_alloc_pd
        addr

                memory base address
        length

                length of memory region in bytes
        access

                access flags
        Output Parameters:
        none
        Return Value:
        pointer to created memory region (MR) or NULL on failure.

Description:
        ibv_reg_mr registers a memory region (MR), associates it with a protection domain (PD), and assigns it local and remote keys (lkey, rkey). All VPI commands that use memory require the memory to be registered via this command. The same physical memory may be mapped to different MRs even allowing different permissions or PDs to be assigned to the same memory, depending on user requirements.
        Access flags may be the bitwise OR of one or more of the following enumerations:
        IBV_ACCESS_LOCAL_WRITE

                Allow local host write access
        IBV_ACCESS_REMOTE_WRITE

                Allow remote hosts write access
        IBV_ACCESS_REMOTE_READ

                Allow remote hosts read access
        IBV_ACCESS_REMOTE_ATOMIC

                Allow remote hosts atomic access
        IBV_ACCESS_MW_BIND

                Allow memory windows on this MR
        Local read access is implied and automatic.
        Any VPI operation that violates the access permissions of the given memory operation will fail.
        Note that the queue pair (QP) attributes must also have the correct permissions or the operation will fail.
        If IBV_ACCESS_REMOTE_WRITE or IBV_ACCESS_REMOTE_ATOMIC is set, then IBV_ACCESS_LOCAL_WRITE must be set as well.
        struct ibv_mr is defined as follows:
        struct ibv_mr
        {
                struct ibv_context *context;
                struct ibv_pd *pd;
                void *addr;
                size_t length;
                uint32_t handle;
                uint32_t lkey;
                uint32_t rkey;
        };
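A sketch registering a 4 KB buffer for local and remote write access (assumes ctx and pd from the earlier sections; the buffer size is illustrative):

```c
#include <stdio.h>
#include <stdlib.h>
#include <infiniband/verbs.h>

/* pd: struct ibv_pd * from ibv_alloc_pd */
int register_buffer(struct ibv_pd *pd)
{
    size_t len = 4096;
    void *buf = malloc(len);
    if (!buf)
        return -1;

    /* REMOTE_WRITE requires LOCAL_WRITE to be set as well */
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, len,
                                   IBV_ACCESS_LOCAL_WRITE |
                                   IBV_ACCESS_REMOTE_WRITE);
    if (!mr) {
        perror("ibv_reg_mr");
        free(buf);
        return -1;
    }

    /* lkey goes in local SGEs; rkey is handed to the remote peer */
    printf("lkey=0x%x rkey=0x%x\n", mr->lkey, mr->rkey);

    ibv_dereg_mr(mr);
    free(buf);    /* only after deregistration */
    return 0;
}
```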

3.4.2.ibv_dereg_mr
        Template:
        int ibv_dereg_mr(struct ibv_mr *mr)
        Input Parameters:
        mr

                struct ibv_mr from ibv_reg_mr
        Output Parameters:
        none
        Return Value:
        0 on success, -1 on error. If the call fails, errno will be set to indicate the reason for the failure. 

Description:
        ibv_dereg_mr frees a memory region (MR). The operation will fail if any memory windows (MW) are still bound to the MR.

3.4.3.ibv_create_qp
        Template:
        struct ibv_qp *ibv_create_qp(struct ibv_pd *pd, struct ibv_qp_init_attr *qp_init_attr)
        Input Parameters:
        pd

                struct ibv_pd from ibv_alloc_pd
        qp_init_attr

                initial attributes of queue pair
        Output Parameters:
        qp_init_attr

                actual values are filled in
        Return Value:
        pointer to created queue pair (QP) or NULL on failure. 

Description:
        ibv_create_qp creates a QP. When a QP is created, it is put into the RESET state.
        struct qp_init_attr is defined as follows:
        struct ibv_qp_init_attr
        {
                void *qp_context;
                struct ibv_cq *send_cq;
                struct ibv_cq *recv_cq;
                struct ibv_srq *srq;
                struct ibv_qp_cap cap;
                enum ibv_qp_type qp_type;
                int sq_sig_all;
                struct ibv_xrc_domain *xrc_domain;
        };
        qp_context (optional)

                user defined value associated with QP.
        send_cq

                send CQ. This must be created by the user prior to calling ibv_create_qp.
        recv_cq

                receive CQ. This must be created by the user prior to calling ibv_create_qp. It may be the same as send_cq.
        srq (optional)

                shared receive queue. Only used for SRQ QP’s.
        cap

                defined below.
        qp_type

                must be one of the following:
                IBV_QPT_RC = 2,
                IBV_QPT_UC,
                IBV_QPT_UD,
                IBV_QPT_XRC,
                IBV_QPT_RAW_PACKET = 8,
                IBV_QPT_RAW_ETH = 8
        sq_sig_all

                If this value is set to 1, all send requests (WR) will generate completion queue events (CQE). If this value is set to 0, only WRs that are flagged will generate CQE’s (see ibv_post_send).
        xrc_domain (Optional)

                Only used for XRC operations.
        struct ibv_qp_cap is defined as follows:
        struct ibv_qp_cap
        {
                uint32_t max_send_wr;
                uint32_t max_recv_wr;
                uint32_t max_send_sge;
                uint32_t max_recv_sge;
                uint32_t max_inline_data;
        };
        max_send_wr

                Maximum number of outstanding send requests in the send queue.
        max_recv_wr

                Maximum number of outstanding receive requests (buffers) in the receive queue.
        max_send_sge

                Maximum number of scatter/gather elements (SGE) in a WR on the send queue.
        max_recv_sge

                Maximum number of SGEs in a WR on the receive queue.
        max_inline_data

                Maximum size in bytes of inline data on the send queue. 
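A sketch creating an RC QP with modest capacities (assumes pd and cq were created as in the earlier sections; the cap values are illustrative):

```c
/* pd from ibv_alloc_pd, cq from ibv_create_cq */
struct ibv_qp_init_attr init_attr = {
    .send_cq    = cq,
    .recv_cq    = cq,            /* send and receive may share one CQ */
    .qp_type    = IBV_QPT_RC,
    .sq_sig_all = 1,             /* every send WR generates a CQE */
    .cap = {
        .max_send_wr  = 16,
        .max_recv_wr  = 16,
        .max_send_sge = 1,
        .max_recv_sge = 1,
    },
};

struct ibv_qp *qp = ibv_create_qp(pd, &init_attr);
if (!qp)
    perror("ibv_create_qp");
/* on success the QP is in the RESET state and init_attr.cap now holds
 * the actual (possibly larger) values granted by the driver */
```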

3.4.4.ibv_destroy_qp
        Template:
        int ibv_destroy_qp(struct ibv_qp *qp)
        Input Parameters:
        qp

                struct ibv_qp from ibv_create_qp
        Output Parameters:
        none
        Return Value:
        0 on success, -1 on error. If the call fails, errno will be set to indicate the reason for the failure. 

Description:
        ibv_destroy_qp frees a queue pair (QP). 

3.4.5.ibv_create_srq
        Template:
        struct ibv_srq *ibv_create_srq(struct ibv_pd *pd, struct ibv_srq_init_attr *srq_init_attr)
        Input Parameters:
        pd

                The protection domain associated with the shared receive queue (SRQ)
        srq_init_attr

                A list of initial attributes required to create the SRQ
        Output Parameters:
        ibv_srq_attr

                Actual values of the struct are set
        Return Value:
        A pointer to the created SRQ or NULL on failure 

Description:
        ibv_create_srq creates a shared receive queue (SRQ). srq_attr->max_wr and srq_attr->max_sge are read to determine the requested size of the SRQ, and set to the actual values allocated on return. If ibv_create_srq succeeds, then max_wr and max_sge will be at least as large as the requested values.
        struct ibv_srq is defined as follows:
        struct ibv_srq

        {
                struct ibv_context *context; // struct ibv_context from ibv_open_device
                void *srq_context;
                struct ibv_pd *pd; // Protection domain
                uint32_t handle;
                pthread_mutex_t mutex;
                pthread_cond_t cond;
                uint32_t events_completed;
        };
        struct ibv_srq_init_attr is defined as follows:
        struct ibv_srq_init_attr
        {
                void *srq_context;
                struct ibv_srq_attr attr;
        };
        srq_context

                struct ibv_context from ibv_open_device
        attr

                 An ibv_srq_attr struct defined as follows:
        struct ibv_srq_attr is defined as follows:
        struct ibv_srq_attr
        {
                uint32_t max_wr;
                uint32_t max_sge;
                uint32_t srq_limit;
         };
        max_wr

                Requested maximum number of outstanding WRs in the SRQ
        max_sge

                Requested number of scatter elements per WR
        srq_limit

                The limit value of the SRQ (irrelevant for ibv_create_srq) 
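A sketch creating an SRQ (assumes pd from ibv_alloc_pd; the requested depth is illustrative):

```c
/* pd from ibv_alloc_pd */
struct ibv_srq_init_attr srq_init = {
    .attr = {
        .max_wr  = 256,   /* requested depth */
        .max_sge = 1,
        /* srq_limit is irrelevant at creation time */
    },
};

struct ibv_srq *srq = ibv_create_srq(pd, &srq_init);
if (!srq)
    perror("ibv_create_srq");
/* on success, srq_init.attr.max_wr and .max_sge hold the
 * actual values, at least as large as those requested */
```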

3.4.6.ibv_modify_srq
        Template:
        int ibv_modify_srq (struct ibv_srq *srq, struct ibv_srq_attr *srq_attr, int srq_attr_mask)
        Input Parameters:
        srq

                The SRQ to modify
        srq_attr

                Specifies the SRQ to modify (input)/the current values of the selected SRQ attributes are returned (output)
        srq_attr_mask

                A bit-mask used to specify which SRQ attributes are being modified
        Output Parameters:
        srq_attr

                The struct ibv_srq_attr is returned with the updated values
        Return Value:
        0 on success, -1 on error. If the call fails, errno will be set to indicate the reason for the failure. 

Description:
        ibv_modify_srq modifies the attributes of the SRQ srq using the attribute values in srq_attr based on the mask srq_attr_mask. srq_attr is an ibv_srq_attr struct as defined above under the verb ibv_create_srq. The argument srq_attr_mask specifies the SRQ attributes to be modified. It is either 0 or the bitwise OR of one or more of the flags:
        IBV_SRQ_MAX_WR

                Resize the SRQ
        IBV_SRQ_LIMIT

                Set the SRQ limit
        If any of the attributes to be modified is invalid, none of the attributes will be modified. Also, not all devices support resizing SRQs. To check if a device supports resizing, check if the IBV_DEVICE_SRQ_RESIZE bit is set in the device capabilities flags.
        Modifying the SRQ limit arms the SRQ to produce an IBV_EVENT_SRQ_LIMIT_REACHED 'low watermark' async event once the number of WRs in the SRQ drops below the SRQ limit. 
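A sketch arming the SRQ limit described above (assumes srq from ibv_create_srq; the threshold of 16 is illustrative):

```c
/* fire IBV_EVENT_SRQ_LIMIT_REACHED once fewer than 16 WRs remain posted */
struct ibv_srq_attr attr = { .srq_limit = 16 };

if (ibv_modify_srq(srq, &attr, IBV_SRQ_LIMIT))
    perror("ibv_modify_srq");
/* the event arrives on the async event channel (ibv_get_async_event);
 * the application typically reposts receive buffers in response */
```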

3.4.7.ibv_destroy_srq
        Template:
        int ibv_destroy_srq(struct ibv_srq *srq)
        Input Parameters:
        srq

                The SRQ to destroy
        Output Parameters:
        none
        Return Value:
        0 on success, -1 on error. If the call fails, errno will be set to indicate the reason for the failure. 

Description:
        ibv_destroy_srq destroys the specified SRQ. It will fail if any queue pair is still associated with this SRQ. 

3.4.8.ibv_open_xrc_domain
        Template:
        struct ibv_xrc_domain *ibv_open_xrc_domain(struct ibv_context *context, int fd, int oflag)
        Input Parameters:
        context

                struct ibv_context from ibv_open_device
        fd

                The file descriptor to be associated with the XRC domain
        oflag

                The desired file creation attributes
        Output Parameters:
        A file descriptor associated with the opened XRC domain
        Return Value:
        A reference to an opened XRC domain or NULL 

Description:
        ibv_open_xrc_domain opens an eXtended Reliable Connection (XRC) domain for the RDMA device context. The desired file creation attributes oflag can either be 0 or the bitwise OR of O_CREAT and O_EXCL. If a domain belonging to the device named by the context is already associated with the inode, then the O_CREAT flag has no effect. If both O_CREAT and O_EXCL are set, open will fail if a domain associated with the inode already exists. Otherwise a new XRC domain will be created and associated with the inode specified by fd.

        Please note that the check for the existence of the domain and creation of the domain if it does not exist is atomic with respect to other processes executing open with fd naming the same inode. If fd equals -1, then no inode is associated with the domain, and the only valid value for oflag is O_CREAT.
        Since each ibv_open_xrc_domain call increments the xrc_domain object's reference count, each such call must have a corresponding ibv_close_xrc_domain call to decrement the xrc_domain object's reference count. 

3.4.9.ibv_create_xrc_srq
        Template:
        struct ibv_srq *ibv_create_xrc_srq(struct ibv_pd *pd, struct ibv_xrc_domain *xrc_domain,
struct ibv_cq *xrc_cq, struct ibv_srq_init_attr *srq_init_attr)
        Input Parameters:
        pd

                The protection domain associated with the shared receive queue
        xrc_domain

                The XRC domain
        xrc_cq

                The CQ which will hold the XRC completion
        srq_init_attr

                A list of initial attributes required to create the SRQ (described above)
        Output Parameters:
        ibv_srq_attr

                Actual values of the struct are set
        Return Value:
        A pointer to the created SRQ or NULL on failure 

Description:
        ibv_create_xrc_srq creates an XRC shared receive queue (SRQ) associated with the protection domain pd, the XRC domain xrc_domain, and the CQ xrc_cq which will hold the completions.
        struct ibv_xrc_domain is defined as follows:
        struct ibv_xrc_domain
        {
                struct ibv_context *context;
                uint64_t handle;
        };

3.4.10.ibv_close_xrc_domain
        Template:
        int ibv_close_xrc_domain(struct ibv_xrc_domain *d)
        Input Parameters:
        d

                A pointer to the XRC domain the user wishes to close
        Output Parameters:
        none
        Return Value:
        0 on success, -1 on error. If the call fails, errno will be set to indicate the reason for the failure. 

Description:
        ibv_close_xrc_domain closes the XRC domain, d. If this happens to be the last reference, then the XRC domain will be destroyed. This function decrements a reference count and may fail if any QP or SRQ are still associated with the XRC domain being closed.

3.4.11.ibv_create_xrc_rcv_qp
        Template:
        int ibv_create_xrc_rcv_qp(struct ibv_qp_init_attr *init_attr, uint32_t *xrc_rcv_qpn)
        Input Parameters:
        init_attr

                The structure to be populated with QP information
        xrc_rcv_qpn

                The QP number associated with the receive QP to be created
        Output Parameters:
        init_attr

                Populated with the XRC domain information the QP will be associated with
        xrc_rcv_qpn

                The QP number associated with the receive QP being created
        Return Value:
        0 on success, -1 on error. If the call fails, errno will be set to indicate the reason for the failure. 

Description:
        ibv_create_xrc_rcv_qp creates an XRC queue pair (QP) to serve as a receive side only QP and returns the QP number through xrc_rcv_qpn. This number must be passed to the remote (sender) node. The remote node will use xrc_rcv_qpn in ibv_post_send when it sends messages to an XRC SRQ on this host in the same xrc domain as the XRC receive QP.

        The QP with number xrc_rcv_qpn is created in kernel space and persists until the last process registered for the QP has called ibv_unreg_xrc_rcv_qp, at which point the QP is destroyed. The process which creates this QP is automatically registered for it and should also call ibv_unreg_xrc_rcv_qp at some point to unregister.
        Any process which wishes to receive on an XRC SRQ via this QP must call ibv_reg_xrc_rcv_qp for this QP to ensure that the QP will not be destroyed while they are still using it.

        Please note that because the QP xrc_rcv_qpn is a receive only QP, the send queue in the init_attr struct is ignored.

3.4.12.ibv_modify_xrc_rcv_qp
        Template:
        int ibv_modify_xrc_rcv_qp(struct ibv_xrc_domain *xrc_domain, uint32_t xrc_qp_num, struct
ibv_qp_attr *attr, int attr_mask)
        Input Parameters:
        xrc_domain

                The XRC domain associated with this QP
        xrc_qp_num

                The queue pair number to identify this QP
        attr

                The attributes to use to modify the XRC receive QP
        attr_mask

                The mask to use for modifying the QP attributes
        Output Parameters:
        None
        Return Value:
        0 on success, -1 on error. If the call fails, errno will be set to indicate the reason for the failure. 

Description:
        ibv_modify_xrc_rcv_qp modifies the attributes of an XRC receive QP with the number
xrc_qp_num which is associated with the attributes in the struct attr according to the mask
attr_mask. It then moves the QP through the following transitions: Reset->Init->RTR
At least the following masks must be set (the user may add optional attributes as needed)

        Next State    Required attributes
        Init          IBV_QP_STATE, IBV_QP_PKEY_INDEX, IBV_QP_PORT, IBV_QP_ACCESS_FLAGS
        RTR           IBV_QP_STATE, IBV_QP_AV, IBV_QP_PATH_MTU, IBV_QP_DEST_QPN, IBV_QP_RQ_PSN, IBV_QP_MAX_DEST_RD_ATOMIC, IBV_QP_MIN_RNR_TIMER

        Please note that if any attribute to modify is invalid, or if the mask has invalid values, then none of the attributes will be modified, including the QP state.

3.4.13.ibv_reg_xrc_rcv_qp
        Template:
        int ibv_reg_xrc_rcv_qp(struct ibv_xrc_domain *xrc_domain, uint32_t xrc_qp_num)
        Input Parameters:
        xrc_domain

                The XRC domain associated with the receive QP
        xrc_qp_num

                The number associated with the created QP to which the user process is to be registered
        Output Parameters:
        None
        Return Value:
        0 on success, -1 on error. If the call fails, errno will be set to indicate the reason for the failure.

Description:
        ibv_reg_xrc_rcv_qp registers a user process with the XRC receive QP whose number is xrc_qp_num associated with the XRC domain xrc_domain.
        This function may fail if the number xrc_qp_num is not the number of a valid XRC receive QP (for example if the QP is not allocated or it is the number of a non-XRC QP), or the XRC receive QP was created with an XRC domain other than xrc_domain.

 3.4.14.ibv_unreg_xrc_rcv_qp
        Template:
        int ibv_unreg_xrc_rcv_qp(struct ibv_xrc_domain *xrc_domain, uint32_t xrc_qp_num)
        Input Parameters:
        xrc_domain

                The XRC domain associated with the XRC receive QP from which the user wishes to unregister
        xrc_qp_num

                The QP number from which the user process is to be unregistered
        Output Parameters:
        None
        Return Value:
        0 on success, -1 on error. If the call fails, errno will be set to indicate the reason for the failure.

Description:
        ibv_unreg_xrc_rcv_qp unregisters a user process from the XRC receive QP number xrc_qp_num which is associated with the XRC domain xrc_domain. When the number of user processes registered with this XRC receive QP drops to zero, the QP is destroyed.

 3.4.15.ibv_create_ah
        Template:
        struct ibv_ah *ibv_create_ah(struct ibv_pd *pd, struct ibv_ah_attr *attr)
        Input Parameters:
        pd

                struct ibv_pd from ibv_alloc_pd
        attr

                attributes of address
        Output Parameters:
        none
        Return Value:
        pointer to created address handle (AH) or NULL on failure.

Description:
        ibv_create_ah creates an AH. An AH contains all of the necessary data to reach a remote destination.
        In connected transport modes (RC, UC) the AH is associated with a queue pair (QP). In the datagram transport modes (UD), the AH is associated with a work request (WR).
struct ibv_ah_attr is defined as follows:
        struct ibv_ah_attr
        {
                struct ibv_global_route grh;
                uint16_t dlid;
                uint8_t sl;
                uint8_t src_path_bits;
                uint8_t static_rate;
                uint8_t is_global;
                uint8_t port_num;
        };
        grh

                defined below
        dlid

                destination lid
        sl

                service level
        src_path_bits

                source path bits
        static_rate

                static rate
        is_global

                this is a global address, use grh.
        port_num

                physical port number to use to reach this destination
        struct ibv_global_route is defined as follows:
        struct ibv_global_route
        {
                union ibv_gid dgid;
                uint32_t flow_label;
                uint8_t sgid_index;
                uint8_t hop_limit;
                uint8_t traffic_class;
         };
        dgid

                destination GID (see ibv_query_gid for definition)
        flow_label

                flow label
        sgid_index

                index of source GID (see ibv_query_gid)
        hop_limit

                hop limit
        traffic_class

                traffic class
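A sketch creating a LID-routed AH for a UD QP (assumes pd from ibv_alloc_pd; the dlid value is purely illustrative and would normally come from the peer's port attributes):

```c
/* pd from ibv_alloc_pd; 0x14 is a hypothetical peer LID */
struct ibv_ah_attr ah_attr = {
    .dlid          = 0x14,
    .sl            = 0,
    .src_path_bits = 0,
    .is_global     = 0,   /* LID-routed; set to 1 and fill .grh to cross subnets */
    .port_num      = 1,
};

struct ibv_ah *ah = ibv_create_ah(pd, &ah_attr);
if (!ah)
    perror("ibv_create_ah");
/* for UD sends, pass ah in each work request's wr.ud.ah field */
```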

3.4.16.ibv_destroy_ah
        Template:
        int ibv_destroy_ah(struct ibv_ah *ah)
        Input Parameters:
        ah

                struct ibv_ah from ibv_create_ah
        Output Parameters:
        none
        Return Value:
        0 on success, -1 on error. If the call fails, errno will be set to indicate the reason for the failure. 

Description:
        ibv_destroy_ah frees an address handle (AH). Once an AH is destroyed, it can no longer be used by UD QPs.

3.5.Queue Pair Bringup (ibv_modify_qp)
        Queue pairs (QP) must be transitioned through an incremental sequence of states prior to being able to be used for communication.
        QP States:
        RESET

                Newly created, queues empty.
        INIT

                Basic information set. Ready for posting to receive queue.
        RTR

                Ready to Receive. Remote address info set for connected QPs, QP may now receive packets.
        RTS

                Ready to Send. Timeout and retry parameters set, QP may now send packets.
        These transitions are accomplished through the use of the ibv_modify_qp command. 

3.5.1.ibv_modify_qp
        Template:
        int ibv_modify_qp(struct ibv_qp *qp, struct ibv_qp_attr *attr, enum ibv_qp_attr_mask attr_mask)
        Input Parameters:
        qp

                struct ibv_qp from ibv_create_qp
        attr

                QP attributes
        attr_mask

                bit mask that defines which attributes within attr have been set for this call
        Output Parameters:
        none
        Return Value:
        0 on success, -1 on error. If the call fails, errno will be set to indicate the reason for the failure.

Description:
        ibv_modify_qp changes QP attributes, and one of those attributes may be the QP state. Its name is a bit of a misnomer, since you cannot use this command to modify QP attributes at will. There is a very strict set of attributes that may be modified during each transition, and transitions must occur in the proper order. The following subsections describe each transition in more detail.
        struct ibv_qp_attr is defined as follows:

        struct ibv_qp_attr
        {
                enum ibv_qp_state qp_state;
                enum ibv_qp_state cur_qp_state;
                enum ibv_mtu path_mtu;
                enum ibv_mig_state path_mig_state;
                uint32_t qkey;
                uint32_t rq_psn;
                uint32_t sq_psn;
                uint32_t dest_qp_num;
                int qp_access_flags;
                struct ibv_qp_cap cap;
                struct ibv_ah_attr ah_attr;
                struct ibv_ah_attr alt_ah_attr;
                uint16_t pkey_index;
                uint16_t alt_pkey_index;
                uint8_t en_sqd_async_notify;
                uint8_t sq_draining;
                uint8_t max_rd_atomic;
                uint8_t max_dest_rd_atomic;
                uint8_t min_rnr_timer;
                uint8_t port_num;
                uint8_t timeout;
                uint8_t retry_cnt;
                uint8_t rnr_retry;
                uint8_t alt_port_num;
                uint8_t alt_timeout;
        };

        The following values select one of the above attributes and should be OR’d into the attr_mask field:

        IBV_QP_STATE
        IBV_QP_CUR_STATE
        IBV_QP_EN_SQD_ASYNC_NOTIFY
        IBV_QP_ACCESS_FLAGS
        IBV_QP_PKEY_INDEX
        IBV_QP_PORT
        IBV_QP_QKEY
        IBV_QP_AV
        IBV_QP_PATH_MTU
        IBV_QP_TIMEOUT
        IBV_QP_RETRY_CNT
        IBV_QP_RNR_RETRY
        IBV_QP_RQ_PSN
        IBV_QP_MAX_QP_RD_ATOMIC
        IBV_QP_ALT_PATH
        IBV_QP_MIN_RNR_TIMER
        IBV_QP_SQ_PSN
        IBV_QP_MAX_DEST_RD_ATOMIC
        IBV_QP_PATH_MIG_STATE
        IBV_QP_CAP
        IBV_QP_DEST_QPN

3.5.2.RESET to INIT
        When a queue pair (QP) is newly created, it is in the RESET state. The first state transition that needs to happen is to bring the QP in the INIT state.
        Required Attributes:
        *** All QPs ***
        qp_state / IBV_QP_STATE                 IBV_QPS_INIT
        pkey_index / IBV_QP_PKEY_INDEX          pkey index, normally 0
        port_num / IBV_QP_PORT                  physical port number (1...n)
        qp_access_flags / IBV_QP_ACCESS_FLAGS   access flags (see ibv_reg_mr)
        *** Unconnected QPs only ***
        qkey / IBV_QP_QKEY                      qkey (see ibv_post_send)
        Optional Attributes:
        none
        Effect of transition:
        Once the QP is transitioned into the INIT state, the user may begin to post receive buffers to the receive queue via the ibv_post_recv command. At least one receive buffer should be posted
before the QP can be transitioned to the RTR state. 
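As a minimal sketch of this transition (assuming `qp` came from ibv_create_qp and that physical port 1 with local-write plus remote read/write access is appropriate for the application — both are illustrative choices, not requirements):

```c
#include <string.h>
#include <infiniband/verbs.h>

/* Sketch: bring a freshly created QP from RESET to INIT.
 * port_num is the physical port number (1..n); the access flags
 * shown are illustrative and should match the application's needs. */
static int qp_to_init(struct ibv_qp *qp, uint8_t port_num)
{
        struct ibv_qp_attr attr;

        memset(&attr, 0, sizeof(attr));
        attr.qp_state        = IBV_QPS_INIT;
        attr.pkey_index      = 0;          /* normally 0 */
        attr.port_num        = port_num;
        attr.qp_access_flags = IBV_ACCESS_LOCAL_WRITE |
                               IBV_ACCESS_REMOTE_READ |
                               IBV_ACCESS_REMOTE_WRITE;

        /* attr_mask names exactly the attributes set above */
        return ibv_modify_qp(qp, &attr,
                             IBV_QP_STATE | IBV_QP_PKEY_INDEX |
                             IBV_QP_PORT | IBV_QP_ACCESS_FLAGS);
}
```

A return value of 0 indicates the QP is now in INIT and receive buffers may be posted.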

3.5.3.INIT to RTR
        Once a queue pair (QP) has receive buffers posted to it, it is now possible to transition the QP into the ready to receive (RTR) state.
        Required Attributes:
        *** All QPs ***
        qp_state / IBV_QP_STATE                 IBV_QPS_RTR
        path_mtu / IBV_QP_PATH_MTU              IBV_MTU_256
                                                IBV_MTU_512 (recommended value)
                                                IBV_MTU_1024
                                                IBV_MTU_2048
                                                IBV_MTU_4096
        *** Connected QPs only ***
        ah_attr / IBV_QP_AV                     an address handle (AH) needs to be created and filled in as appropriate. Minimally, ah_attr.dlid needs to be filled in.
        dest_qp_num / IBV_QP_DEST_QPN           QP number of remote QP.
        rq_psn / IBV_QP_RQ_PSN                  starting receive packet sequence number (should match remote QP's sq_psn)
        max_dest_rd_atomic / IBV_QP_MAX_DEST_RD_ATOMIC  maximum number of resources for incoming RDMA requests
        min_rnr_timer / IBV_QP_MIN_RNR_TIMER    minimum RNR NAK timer (recommended value: 12)
        Optional Attributes:
        *** All QPs ***
        qp_access_flags / IBV_QP_ACCESS_FLAGS   access flags (see ibv_reg_mr)
        pkey_index / IBV_QP_PKEY_INDEX          pkey index, normally 0
        *** Connected QPs only ***
        alt_ah_attr / IBV_QP_ALT_PATH           AH with alternate path info filled in
        *** Unconnected QPs only ***
        qkey / IBV_QP_QKEY qkey (see ibv_post_send)
        Effect of transition:
        Once the QP is transitioned into the RTR state, the QP begins receive processing.
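For a connected (RC) QP, this transition requires information about the peer. A hedged sketch, assuming the remote QP number, remote starting PSN and destination LID have already been exchanged out of band (for example over a TCP socket — the parameter names here are placeholders for that exchange):

```c
#include <string.h>
#include <infiniband/verbs.h>

/* Sketch: INIT -> RTR for a connected (RC) QP.
 * remote_qpn, remote_psn and dlid must come from the peer;
 * MTU, rd_atomic and min_rnr_timer use the recommended values. */
static int qp_to_rtr(struct ibv_qp *qp, uint32_t remote_qpn,
                     uint32_t remote_psn, uint16_t dlid, uint8_t port_num)
{
        struct ibv_qp_attr attr;

        memset(&attr, 0, sizeof(attr));
        attr.qp_state           = IBV_QPS_RTR;
        attr.path_mtu           = IBV_MTU_512;  /* recommended value */
        attr.dest_qp_num        = remote_qpn;
        attr.rq_psn             = remote_psn;   /* must match remote sq_psn */
        attr.max_dest_rd_atomic = 1;
        attr.min_rnr_timer      = 12;           /* recommended value */
        attr.ah_attr.dlid       = dlid;         /* minimal AH: dlid only */
        attr.ah_attr.port_num   = port_num;

        return ibv_modify_qp(qp, &attr,
                             IBV_QP_STATE | IBV_QP_PATH_MTU | IBV_QP_DEST_QPN |
                             IBV_QP_RQ_PSN | IBV_QP_MAX_DEST_RD_ATOMIC |
                             IBV_QP_MIN_RNR_TIMER | IBV_QP_AV);
}
```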

3.5.4.RTR to RTS
        Once a queue pair (QP) has reached ready to receive (RTR) state, it may then be transitioned to the ready to send (RTS) state.
        Required Attributes:
        *** All QPs ***
        qp_state / IBV_QP_STATE                 IBV_QPS_RTS
        *** Connected QPs only ***
        timeout / IBV_QP_TIMEOUT                local ack timeout (recommended value: 14)
        retry_cnt / IBV_QP_RETRY_CNT            retry count (recommended value: 7)
        rnr_retry / IBV_QP_RNR_RETRY            RNR retry count (recommended value: 7)
        sq_psn / IBV_QP_SQ_PSN                  send queue starting packet sequence number (should match remote QP's rq_psn)
        max_rd_atomic / IBV_QP_MAX_QP_RD_ATOMIC number of outstanding RDMA reads and atomic operations allowed.
        Optional Attributes:
        *** All QPs ***
        qp_access_flags / IBV_QP_ACCESS_FLAGS   access flags (see ibv_reg_mr)
        *** Connected QPs only ***
        alt_ah_attr / IBV_QP_ALT_PATH           AH with alternate path info filled in
        min_rnr_timer / IBV_QP_MIN_RNR_TIMER    minimum RNR NAK timer
        *** Unconnected QPs only ***
        qkey / IBV_QP_QKEY qkey (see ibv_post_send)
        Effect of transition:
        Once the QP is transitioned into the RTS state, the QP begins send processing and is fully operational. The user may now post send requests with the ibv_post_send command.
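The final transition can be sketched as follows (assuming a connected QP; `my_psn` stands in for the local starting PSN, which must have been communicated to the peer as its rq_psn):

```c
#include <string.h>
#include <infiniband/verbs.h>

/* Sketch: RTR -> RTS for a connected (RC) QP.
 * Timeout/retry fields use the recommended values from the text. */
static int qp_to_rts(struct ibv_qp *qp, uint32_t my_psn)
{
        struct ibv_qp_attr attr;

        memset(&attr, 0, sizeof(attr));
        attr.qp_state      = IBV_QPS_RTS;
        attr.timeout       = 14;      /* recommended value */
        attr.retry_cnt     = 7;       /* recommended value */
        attr.rnr_retry     = 7;       /* recommended value */
        attr.sq_psn        = my_psn;  /* must match remote rq_psn */
        attr.max_rd_atomic = 1;

        return ibv_modify_qp(qp, &attr,
                             IBV_QP_STATE | IBV_QP_TIMEOUT | IBV_QP_RETRY_CNT |
                             IBV_QP_RNR_RETRY | IBV_QP_SQ_PSN |
                             IBV_QP_MAX_QP_RD_ATOMIC);
}
```

After this call returns 0 the QP is fully operational and ibv_post_send may be used.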

3.6.Active Queue Pair Operations
        A QP can be queried starting at the point it is created; once a queue pair is fully operational, you may query it, be notified of events, and conduct send and receive operations on it. This section describes the operations available to perform these actions.

3.6.1.ibv_query_qp
        Template:
        int ibv_query_qp(struct ibv_qp *qp, struct ibv_qp_attr *attr, enum ibv_qp_attr_mask attr_mask, struct ibv_qp_init_attr *init_attr)
        Input Parameters:
        qp

                struct ibv_qp from ibv_create_qp
        attr_mask

                bitmask of items to query (see ibv_modify_qp)
        Output Parameters:
        attr

                struct ibv_qp_attr to be filled in with requested attributes
        init_attr

                struct ibv_qp_init_attr to be filled in with initial attributes
        Return Value:
        0 on success, -1 on error. If the call fails, errno will be set to indicate the reason for the failure.

Description:
        ibv_query_qp retrieves the various attributes of a queue pair (QP) as previously set through ibv_create_qp and ibv_modify_qp.
        The user should allocate a struct ibv_qp_attr and a struct ibv_qp_init_attr and pass them to the command. These structs will be filled in upon successful return. The user is responsible to free these structs. struct ibv_qp_init_attr is described in ibv_create_qp and struct ibv_qp_attr is described in ibv_modify_qp.
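A brief usage sketch (assuming `qp` came from ibv_create_qp; the attributes queried here are only examples):

```c
#include <stdio.h>
#include <infiniband/verbs.h>

/* Sketch: read back the current state and capacities of a QP. */
static void print_qp_state(struct ibv_qp *qp)
{
        struct ibv_qp_attr attr;
        struct ibv_qp_init_attr init_attr;

        if (ibv_query_qp(qp, &attr, IBV_QP_STATE | IBV_QP_CAP, &init_attr))
                return;  /* errno indicates the reason */

        printf("QP state=%d, max_send_wr=%u, max_recv_wr=%u\n",
               attr.qp_state,
               init_attr.cap.max_send_wr,
               init_attr.cap.max_recv_wr);
}
```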

3.6.2.ibv_query_srq 
        Template:
        int ibv_query_srq(struct ibv_srq *srq, struct ibv_srq_attr *srq_attr)
        Input Parameters:
        srq

                The SRQ to query
        srq_attr

                The attributes of the specified SRQ
        Output Parameters:
        srq_attr

                The struct ibv_srq_attr is returned with the attributes of the specified SRQ
        Return Value:
        0 on success, -1 on error. If the call fails, errno will be set to indicate the reason for the failure.

Description:
        ibv_query_srq returns the attributes list and current values of the specified SRQ. It returns the attributes through the pointer srq_attr which is an ibv_srq_attr struct described above under ibv_create_srq. If the value of srq_limit in srq_attr is 0, then the SRQ limit reached ('low watermark') event is not or is no longer armed. No asynchronous events will be generated until the event is re-armed.

3.6.3.ibv_query_xrc_rcv_qp
        Template:
        int ibv_query_xrc_rcv_qp(struct ibv_xrc_domain *xrc_domain, uint32_t xrc_qp_num,
struct ibv_qp_attr *attr, int attr_mask, struct ibv_qp_init_attr *init_attr)
        Input Parameters:
        xrc_domain

                The XRC domain associated with this QP
        xrc_qp_num

                The queue pair number to identify this QP

        attr

                The ibv_qp_attr struct in which to return the attributes
        attr_mask

                A mask specifying the minimum list of attributes to retrieve
        init_attr

                The ibv_qp_init_attr struct to return the initial attributes
        Output Parameters:
        attr

                A pointer to the struct containing the QP attributes of interest
        init_attr

                A pointer to the struct containing the initial attributes
        Return Value:
        0 on success, -1 on error. If the call fails, errno will be set to indicate the reason for the failure. 

Description:
        ibv_query_xrc_rcv_qp retrieves the attributes specified in attr_mask for the XRC receive QP with the number xrc_qp_num and domain xrc_domain. It returns them through the pointers attr and init_attr.
        The attr_mask specifies a minimal list to retrieve. Some RDMA devices may return extra attributes not requested. Attributes are valid if they have been set using the ibv_modify_xrc_rcv_qp. The exact list of valid attributes depends on the QP state. Multiple ibv_query_xrc_rcv_qp calls may yield different returned values for these attributes: qp_state, path_mig_state, sq_draining, ah_attr (if automatic path migration (APM) is enabled).

3.6.4.ibv_post_recv
        Template:
        int ibv_post_recv(struct ibv_qp *qp, struct ibv_recv_wr *wr, struct ibv_recv_wr **bad_wr)
        Input Parameters:
        qp

                struct ibv_qp from ibv_create_qp
        wr

                first work request (WR) containing receive buffers
        Output Parameters:
        bad_wr

                pointer to first rejected WR
        Return Value:
        0 on success, -1 on error. If the call fails, errno will be set to indicate the reason for the failure. 

Description:
        ibv_post_recv posts a linked list of WRs to a queue pair’s (QP) receive queue. At least one receive buffer should be posted to the receive queue to transition the QP to RTR. Receive buffers are consumed as the remote peer executes Send, Send with Immediate and RDMA Write with Immediate operations. Receive buffers are NOT used for other RDMA operations. Processing of the WR list is stopped on the first error and a pointer to the offending WR is returned in bad_wr.
        struct ibv_recv_wr is defined as follows:
        struct ibv_recv_wr
        {
                uint64_t wr_id;
                struct ibv_recv_wr *next;
                struct ibv_sge *sg_list;
                int num_sge;
        };
        wr_id

                user assigned work request ID
        next

                pointer to next WR, NULL if last one.
        sg_list

                scatter array for this WR
        num_sge

                number of entries in sg_list
        struct ibv_sge is defined as follows:
        struct ibv_sge
        {
                uint64_t addr;
                uint32_t length;
                uint32_t lkey;
        };
        addr address of buffer
        length length of buffer
        lkey local key (lkey) of buffer from ibv_reg_mr
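Putting the two structs together, posting a single receive buffer might look like this (a sketch assuming `buf`/`len` describe a region already registered with ibv_reg_mr, whose handle is `mr`):

```c
#include <stdint.h>
#include <infiniband/verbs.h>

/* Sketch: post one receive buffer to a QP's receive queue.
 * buf/len must lie inside the region registered as mr. */
static int post_one_recv(struct ibv_qp *qp, void *buf, uint32_t len,
                         struct ibv_mr *mr)
{
        struct ibv_sge sge = {
                .addr   = (uintptr_t)buf,
                .length = len,
                .lkey   = mr->lkey,
        };
        struct ibv_recv_wr wr = {
                .wr_id   = (uintptr_t)buf, /* echoed back in the CQE */
                .next    = NULL,           /* single-entry list */
                .sg_list = &sge,
                .num_sge = 1,
        };
        struct ibv_recv_wr *bad_wr;

        return ibv_post_recv(qp, &wr, &bad_wr);
}
```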

3.6.5.ibv_post_send
        Template:
        int ibv_post_send(struct ibv_qp *qp, struct ibv_send_wr *wr, struct ibv_send_wr **bad_wr)
        Input Parameters:
        qp

                struct ibv_qp from ibv_create_qp
        wr

                first work request (WR)
        Output Parameters:
        bad_wr

                pointer to first rejected WR
        Return Value:
        0 on success, -1 on error. If the call fails, errno will be set to indicate the reason for the failure. 

Description:
        ibv_post_send posts a linked list of WRs to a queue pair’s (QP) send queue. This operation is used to initiate all communication, including RDMA operations. Processing of the WR list is stopped on the first error and a pointer to the offending WR is returned in bad_wr.
        The user should not alter or destroy AHs associated with WRs until the request has been fully executed and a completion queue entry (CQE) has been retrieved from the corresponding completion queue (CQ) to avoid unexpected behaviour.

        The buffers used by a WR can only be safely reused after the WR has been fully executed and a WCE has been retrieved from the corresponding CQ. However, if the IBV_SEND_INLINE flag was set, the buffer can be reused immediately after the call returns. struct ibv_send_wr is defined as follows:

        struct ibv_send_wr
        {
                uint64_t wr_id;
                struct ibv_send_wr *next;
                struct ibv_sge *sg_list;
                int num_sge;
                enum ibv_wr_opcode opcode;
                enum ibv_send_flags send_flags;
                uint32_t imm_data;/* network byte order */
                union
                {
                        struct
                        {
                                uint64_t remote_addr;
                                uint32_t rkey;
                        } rdma;
                        struct
                        {
                                uint64_t remote_addr;
                                uint64_t compare_add;
                                uint64_t swap;
                                uint32_t rkey;
                        } atomic;
                        struct
                        {
                                struct ibv_ah *ah;
                                uint32_t remote_qpn;
                                uint32_t remote_qkey;
                        } ud;
                } wr;
                uint32_t xrc_remote_srq_num;
        };

        wr_id

                user assigned work request ID
        next

                pointer to next WR, NULL if last one.
        sg_list

                scatter/gather array for this WR
        num_sge

                number of entries in sg_list

        opcode

                IBV_WR_RDMA_WRITE
        IBV_WR_RDMA_WRITE_WITH_IMM
        IBV_WR_SEND
        IBV_WR_SEND_WITH_IMM
        IBV_WR_RDMA_READ
        IBV_WR_ATOMIC_CMP_AND_SWP
        IBV_WR_ATOMIC_FETCH_AND_ADD
        send_flags (optional)

                - this is a bitwise OR of the flags. See the details below.

        imm_data

                immediate data to send in network byte order
        remote_addr

                remote virtual address for RDMA/atomic operations
        rkey

                remote key (from ibv_reg_mr on remote) for RDMA/atomic operations
        compare_add

                compare value for compare and swap operation
        swap

                swap value
        ah

                address handle (AH) for datagram operations
        remote_qpn

                remote QP number for datagram operations
        remote_qkey

                Qkey for datagram operations
        xrc_remote_srq_num

                shared receive queue (SRQ) number for the destination extended reliable connection (XRC). Only used for XRC operations.

        send flags:
        IBV_SEND_FENCE

                set fence indicator
        IBV_SEND_SIGNALED

                send completion event for this WR. Only meaningful for QPs that had the sq_sig_all set to 0
        IBV_SEND_SOLICITED

                set solicited event indicator
        IBV_SEND_INLINE

                send data in sge_list as inline data.
        struct ibv_sge is defined in ibv_post_recv.
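A minimal send sketch, mirroring the receive example (again assuming a registered region `mr` covering `buf`/`len`; IBV_SEND_SIGNALED is used so that a CQE is generated even if sq_sig_all was 0):

```c
#include <stdint.h>
#include <infiniband/verbs.h>

/* Sketch: post one signaled SEND of a registered buffer. */
static int post_one_send(struct ibv_qp *qp, void *buf, uint32_t len,
                         struct ibv_mr *mr)
{
        struct ibv_sge sge = {
                .addr   = (uintptr_t)buf,
                .length = len,
                .lkey   = mr->lkey,
        };
        struct ibv_send_wr wr = {
                .wr_id      = (uintptr_t)buf,
                .next       = NULL,
                .sg_list    = &sge,
                .num_sge    = 1,
                .opcode     = IBV_WR_SEND,
                .send_flags = IBV_SEND_SIGNALED, /* request a CQE */
        };
        struct ibv_send_wr *bad_wr;

        return ibv_post_send(qp, &wr, &bad_wr);
}
```

For an RDMA write, the opcode would become IBV_WR_RDMA_WRITE and wr.wr.rdma.remote_addr / wr.wr.rdma.rkey would be filled in from the peer's registration.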

3.6.6.ibv_post_srq_recv
        Template:
        int ibv_post_srq_recv(struct ibv_srq *srq, struct ibv_recv_wr *recv_wr, struct ibv_recv_wr **bad_recv_wr)
        Input Parameters:
        srq

                The SRQ to post the work request to
        recv_wr

                A list of work requests to post on the receive queue
        Output Parameters:
        bad_recv_wr

                pointer to first rejected WR
        Return Value:
        0 on success, -1 on error. If the call fails, errno will be set to indicate the reason for the failure. 

Description:
        ibv_post_srq_recv posts a list of work requests to the specified SRQ. It stops processing the WRs from this list at the first failure (which can be detected immediately while requests are being posted), and returns this failing WR through the bad_recv_wr parameter. The buffers used by a WR can only be safely reused after the WR is fully executed and a work completion has been retrieved from the corresponding completion queue (CQ). If a WR is being posted to a UD QP, the Global Routing Header (GRH) of the incoming message will be placed in the first 40 bytes of the buffer(s) in the scatter list. If no GRH is present in the incoming message, then the first 40 bytes will be undefined. This means that in all cases for UD QPs, the actual data of the incoming message will start at an offset of 40 bytes into the buffer(s) in the scatter list.

3.6.7.ibv_req_notify_cq
        Template:
        int ibv_req_notify_cq(struct ibv_cq *cq, int solicited_only)
        Input Parameters:
        cq

                struct ibv_cq from ibv_create_cq
        solicited_only

                only notify if WR is flagged as solicited
        Output Parameters:
        none
        Return Value:
        0 on success, -1 on error. If the call fails, errno will be set to indicate the reason for the failure. 

Description:
        ibv_req_notify_cq arms the notification mechanism for the indicated completion queue (CQ). When the next completion queue entry (CQE) is placed on the CQ, a completion event will be sent to the completion channel (CC) associated with the CQ. Note that CQEs already in the CQ at the time of the call do not generate an event; only a CQE added after arming triggers the notification. If the solicited_only flag is set, then only CQEs for WRs that had the solicited flag set will trigger the notification.
        The user should use the ibv_get_cq_event operation to receive the notification.
The notification mechanism will only be armed for one notification. Once a notification is sent, the mechanism must be re-armed with a new call to ibv_req_notify_cq.

3.6.8.ibv_get_cq_event
        Template:
        int ibv_get_cq_event(struct ibv_comp_channel *channel, struct ibv_cq **cq, void **cq_context)
        Input Parameters:
        channel

                struct ibv_comp_channel from ibv_create_comp_channel
        Output Parameters:
        cq

                pointer to completion queue (CQ) associated with event
        cq_context

                user supplied context set in ibv_create_cq        
        Return Value:
        0 on success, -1 on error. If the call fails, errno will be set to indicate the reason for the failure. 

Description:
        ibv_get_cq_event waits for a notification to be sent on the indicated completion channel (CC). Note that this is a blocking operation. The user should allocate pointers to a struct ibv_cq and a void to be passed into the function. They will be filled in with the appropriate values upon return. It is the user’s responsibility to free these pointers.

        Each notification sent MUST be acknowledged with the ibv_ack_cq_events operation. Since the ibv_destroy_cq operation waits for all events to be acknowledged, it will hang if any events are not properly acknowledged.
        Once a notification for a completion queue (CQ) is sent on a CC, that CQ is now “disarmed” and will not send any more notifications to the CC until it is rearmed again with a new call to the ibv_req_notify_cq operation.
        This operation only informs the user that a CQ has completion queue entries (CQE) to be processed, it does not actually process the CQEs. The user should use the ibv_poll_cq operation to process the CQEs. 
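The arm/wait/ack/re-arm/poll cycle described above can be sketched as one event loop (assuming `cq` and `channel` came from ibv_create_cq and ibv_create_comp_channel; draining the CQ after re-arming avoids missing CQEs that arrive between the poll and the arm):

```c
#include <stdio.h>
#include <infiniband/verbs.h>

/* Sketch: event-driven completion loop combining ibv_req_notify_cq,
 * ibv_get_cq_event, ibv_ack_cq_events and ibv_poll_cq. */
static int completion_loop(struct ibv_cq *cq, struct ibv_comp_channel *channel)
{
        struct ibv_cq *ev_cq;
        void *ev_ctx;
        struct ibv_wc wc;

        if (ibv_req_notify_cq(cq, 0))          /* arm before waiting */
                return -1;
        for (;;) {
                if (ibv_get_cq_event(channel, &ev_cq, &ev_ctx)) /* blocks */
                        return -1;
                ibv_ack_cq_events(ev_cq, 1);   /* must be acknowledged */
                if (ibv_req_notify_cq(ev_cq, 0)) /* re-arm before polling */
                        return -1;
                while (ibv_poll_cq(ev_cq, 1, &wc) > 0) {
                        if (wc.status != IBV_WC_SUCCESS)
                                fprintf(stderr, "bad WC: %s\n",
                                        ibv_wc_status_str(wc.status));
                        /* ...process completion by wc.wr_id... */
                }
        }
}
```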

 3.6.9.ibv_ack_cq_events
        Template:
        void ibv_ack_cq_events (struct ibv_cq *cq, unsigned int nevents)
        Input Parameters:
        cq
                struct ibv_cq from ibv_create_cq
        nevents
                number of events to acknowledge (1...n)
        Output Parameters:
        None
        Return Value:
        None
Description:
        ibv_ack_cq_events acknowledges events received from ibv_get_cq_event. Although each notification received from ibv_get_cq_event counts as only one event, the user may acknowledge multiple events through a single call to ibv_ack_cq_events. The number of events to acknowledge is passed in nevents and should be at least 1. Since this operation takes a mutex, it is somewhat expensive, and acknowledging multiple events in one call may provide better performance.
        See ibv_get_cq_event for additional details.
3.6.10.ibv_poll_cq
        Template:
        int ibv_poll_cq (struct ibv_cq *cq, int num_entries, struct ibv_wc *wc)
        Input Parameters:
        cq
                struct ibv_cq from ibv_create_cq
        num_entries
                maximum number of completion queue entries (CQE) to return
        Output Parameters:
        wc
                CQE array
        Return Value:
        Number of CQEs in array wc or -1 on error
 Description:

        ibv_poll_cq retrieves CQEs from a completion queue (CQ). The user should allocate an array of struct ibv_wc and pass it to the call in wc. The number of entries available in wc should be passed in num_entries. It is the user's responsibility to free this memory.
        The number of CQEs actually retrieved is given as the return value.
        CQs must be polled regularly to prevent an overrun. In the event of an overrun, the CQ will be shut down and an async event IBV_EVENT_CQ_ERR will be sent.
        struct ibv_wc is defined as follows:
        struct ibv_wc
        {
                uint64_t wr_id;
                enum ibv_wc_status status;
                enum ibv_wc_opcode opcode;
                uint32_t vendor_err;
                uint32_t byte_len;
                uint32_t imm_data;/* network byte order */
                uint32_t qp_num;
                uint32_t src_qp;
                enum ibv_wc_flags wc_flags;
                uint16_t pkey_index;
                uint16_t slid;
                uint8_t sl;
                uint8_t dlid_path_bits;
        };
        wr_id
                user specified work request id as given in ibv_post_send or   ibv_post_recv
        status
                IBV_WC_SUCCESS
                IBV_WC_LOC_LEN_ERR
                IBV_WC_LOC_QP_OP_ERR
                IBV_WC_LOC_EEC_OP_ERR
                IBV_WC_LOC_PROT_ERR
                IBV_WC_WR_FLUSH_ERR
                IBV_WC_MW_BIND_ERR
                IBV_WC_BAD_RESP_ERR
                IBV_WC_LOC_ACCESS_ERR
                IBV_WC_REM_INV_REQ_ERR
                IBV_WC_REM_ACCESS_ERR
                IBV_WC_REM_OP_ERR
                IBV_WC_RETRY_EXC_ERR
                IBV_WC_RNR_RETRY_EXC_ERR
                IBV_WC_LOC_RDD_VIOL_ERR
                IBV_WC_REM_INV_RD_REQ_ERR
                IBV_WC_REM_ABORT_ERR
                IBV_WC_INV_EECN_ERR
                IBV_WC_INV_EEC_STATE_ERR
                IBV_WC_FATAL_ERR
                IBV_WC_RESP_TIMEOUT_ERR
                IBV_WC_GENERAL_ERR
        opcode
                IBV_WC_SEND,
                IBV_WC_RDMA_WRITE,
                IBV_WC_RDMA_READ,
                IBV_WC_COMP_SWAP,
                IBV_WC_FETCH_ADD,
                IBV_WC_BIND_MW,
                IBV_WC_RECV = 1 << 7,
                IBV_WC_RECV_RDMA_WITH_IMM
        vendor_err
                vendor specific error
        byte_len
                number of bytes transferred
        imm_data
                immediate data
        qp_num
                local queue pair (QP) number
        src_qp
                remote QP number
        wc_flags
                see below
        pkey_index
                index of pkey (valid only for GSI QPs)
        slid
                source local identifier (LID)
        sl
                service level (SL)
        dlid_path_bits
                destination LID path bits
        flags:
                IBV_WC_GRH
                        global route header (GRH) is present in UD packet
                IBV_WC_WITH_IMM
                        immediate data value is valid
3.6.11.ibv_init_ah_from_wc
        Template:
        int ibv_init_ah_from_wc (struct ibv_context *context, uint8_t port_num,  struct ibv_wc *wc, struct ibv_grh *grh,  struct ibv_ah_attr *ah_attr)
        Input Parameters:
        context
                struct ibv_context from ibv_open_device. This should be the  device the completion queue entry (CQE) was received on.
        port_num
                physical port number (1..n) that CQE was received on 
        wc
                received CQE from ibv_poll_cq
        grh
                global route header (GRH) from packet (see description)
        
        Output Parameters:
        ah_attr
                address handle (AH) attributes
        Return Value:
        0 on success, -1 on error. If the call fails, errno will be set to indicate  the reason for the failure.
 Description:
        ibv_init_ah_from_wc initializes an AH with the necessary attributes to generate a response to a received datagram. The user should allocate a struct ibv_ah_attr and pass this in. If appropriate, the GRH from the received packet should be passed in as well. On UD connections the first 40 bytes of the received packet may contain a GRH. Whether or not this header is present is indicated by the IBV_WC_GRH flag of the CQE. If the GRH is not present on a packet on a UD connection, the first 40 bytes of a packet are undefined.
        When ibv_init_ah_from_wc completes, ah_attr will be filled in and may then be used in the ibv_create_ah function. The user is responsible for freeing ah_attr. Alternatively, ibv_create_ah_from_wc may be used instead of this operation.
 3.6.12.ibv_create_ah_from_wc
        Template:
        struct ibv_ah *ibv_create_ah_from_wc (struct ibv_pd *pd, struct ibv_wc *wc, struct ibv_grh *grh, uint8_t port_num)
        Input Parameters:
        pd
                protection domain (PD) from ibv_alloc_pd
        wc
                completion queue entry (CQE) from ibv_poll_cq
        grh
                global route header (GRH) from packet
        port_num
                physical port number (1..n) that CQE was received on
        Output Parameters:
        none
        Return Value:
        Created address handle (AH) on success or -1 on error
Description:
        ibv_create_ah_from_wc combines the operations ibv_init_ah_from_wc and ibv_create_ah.
        See the description of those operations for details.
 3.7.Event Handling Operations
  3.7.1.ibv_get_async_event
        Template:

        int ibv_get_async_event(struct ibv_context *context, struct ibv_async_event *event)

        Input Parameters:
        context
                struct ibv_context from ibv_open_device
        event
                A pointer to use to return the async event
        Output Parameters:
        event
                A pointer to the async event being sought
        Return Value:
        0 on success, -1 on error. If the call fails, errno will be set to indicate  the reason for the failure.
Description:
        ibv_get_async_event gets the next asynchronous event of the RDMA device context 'context' and returns it through the pointer 'event', which is an ibv_async_event struct. All async events returned by ibv_get_async_event must eventually be acknowledged with ibv_ack_async_event.
        ibv_get_async_event() is a blocking function. If multiple threads call this function simultaneously, then when an async event occurs, only one thread will receive it, and it is not possible to predict which thread will receive it.
        struct ibv_async_event is defined as follows:
        struct ibv_async_event {
                union {
                        struct ibv_cq *cq; //The CQ that got the event
                        struct ibv_qp *qp; //The QP that got the event
                        struct ibv_srq *srq; //The SRQ that got the event
                        int port_num; //The port number that got the event
                } element;
                enum ibv_event_type event_type; //Type of event
        };
        One member of the element union will be valid, depending on the event_type member of the structure. event_type will be one of the following events:
        QP events:
        IBV_EVENT_QP_FATAL
                Error occurred on a QP and it transitioned to error state
        IBV_EVENT_QP_REQ_ERR
                Invalid Request Local Work Queue Error
        IBV_EVENT_QP_ACCESS_ERR
                Local access violation error
        IBV_EVENT_COMM_EST
                Communication was established on a QP
        IBV_EVENT_SQ_DRAINED
                Send Queue was drained of outstanding messages in progress
        IBV_EVENT_PATH_MIG
                A connection has migrated to the alternate path
        IBV_EVENT_PATH_MIG_ERR
                A connection failed to migrate to the alternate path
        IBV_EVENT_QP_LAST_WQE_REACHED
                Last WQE Reached on a QP associated with an SRQ
        
        CQ events:
        IBV_EVENT_CQ_ERR
                CQ is in error (CQ overrun)
        SRQ events:
        IBV_EVENT_SRQ_ERR
                Error occurred on an SRQ
        IBV_EVENT_SRQ_LIMIT_REACHED
                SRQ limit was reached
        Port events:
        IBV_EVENT_PORT_ACTIVE
                Link became active on a port
        IBV_EVENT_PORT_ERR
                Link became unavailable on a port
        IBV_EVENT_LID_CHANGE
                LID was changed on a port
        IBV_EVENT_PKEY_CHANGE
                P_Key table was changed on a port
        IBV_EVENT_SM_CHANGE
                SM was changed on a port
        IBV_EVENT_CLIENT_REREGISTER
                SM sent a CLIENT_REREGISTER request to a port
        IBV_EVENT_GID_CHANGE
                GID table was changed on a port
        
        CA events:
        IBV_EVENT_DEVICE_FATAL         
                CA is in FATAL state
 

3.7.2.ibv_ack_async_event     
        Template:
        void ibv_ack_async_event( struct ibv_async_event *event)
        Input Parameters:
        event A pointer to the event to be acknowledged
        Output Parameters:
        None
        Return Value:
        None
Description:
        All async events that ibv_get_async_event() returns must be acknowledged using ibv_ack_async_event(). To avoid races, destroying an object (CQ, SRQ or QP) will wait for all affiliated events for the object to be acknowledged; this avoids an application retrieving an affiliated event after the corresponding object has already been destroyed.
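Because ibv_get_async_event blocks, applications commonly drain async events from a dedicated thread. A hedged sketch (the thread-function shape is an illustrative choice, not a requirement of the API):

```c
#include <stdio.h>
#include <infiniband/verbs.h>

/* Sketch: dedicated thread draining async events from a device context.
 * `arg` is the struct ibv_context * from ibv_open_device. */
static void *async_event_loop(void *arg)
{
        struct ibv_context *ctx = arg;
        struct ibv_async_event event;

        while (!ibv_get_async_event(ctx, &event)) {  /* blocks */
                fprintf(stderr, "async event: %s\n",
                        ibv_event_type_str(event.event_type));
                /* every event must be acknowledged, or destroying the
                 * affiliated CQ/QP/SRQ later will hang */
                ibv_ack_async_event(&event);
        }
        return NULL;
}
```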
 3.7.3.ibv_event_type_str    
        Template:
        const char *ibv_event_type_str (enum ibv_event_type event_type)
        Input Parameters:
        event_type
                ibv_event_type enum value
        Output Parameters:
        None
        Return Value:
        
        A constant string which describes the enum value event_type
Description:
        ibv_event_type_str returns a string describing the event type enum value, event_type. event_type may be any one of the 19 different enum values describing different IB events.
        ibv_event_type {
                IBV_EVENT_CQ_ERR,
                IBV_EVENT_QP_FATAL,
                IBV_EVENT_QP_REQ_ERR,
                IBV_EVENT_QP_ACCESS_ERR,
                IBV_EVENT_COMM_EST,
                IBV_EVENT_SQ_DRAINED,
                IBV_EVENT_PATH_MIG,
                IBV_EVENT_PATH_MIG_ERR,
                IBV_EVENT_DEVICE_FATAL,
                IBV_EVENT_PORT_ACTIVE,
                IBV_EVENT_PORT_ERR,
                IBV_EVENT_LID_CHANGE,
                IBV_EVENT_PKEY_CHANGE,
                IBV_EVENT_SM_CHANGE,
                IBV_EVENT_SRQ_ERR,
                IBV_EVENT_SRQ_LIMIT_REACHED,
                IBV_EVENT_QP_LAST_WQE_REACHED,
                IBV_EVENT_CLIENT_REREGISTER,
                IBV_EVENT_GID_CHANGE,
        };
————————————————
Copyright notice: This is an original article by CSDN blogger "raindayinrain", licensed under the CC 4.0 BY-SA license. Please include the original source link and this notice when reposting.
Original link: https://blog.csdn.net/x13262608581/article/details/125054912
