RDMA is used in many places, mainly because of the high performance it makes possible. In this post, I provide tips and tricks for optimizing RDMA code in several respects.
General tips
Avoid using control operations in the data path
Data operations stay in the context they were called from (i.e. they don't perform a context switch) and are written in an optimized way. Control operations (all create/destroy/query/modify operations), in contrast, are very expensive because:
- Most of the time, they perform a context switch
- Sometimes they allocate or free dynamic memory
- Sometimes they involve accessing the RDMA device
As a general rule of thumb, one should avoid calling control operations in the data path, or at least reduce their use there.
The following verbs are considered data operations:
- ibv_post_send()
- ibv_post_recv()
- ibv_post_srq_recv()
- ibv_poll_cq()
- ibv_req_notify_cq()
When posting multiple WRs, post them in a list in one call
When posting several Work Requests with one of the ibv_post_*() verbs, post them as a linked list in a single call rather than making several calls with one Work Request each. This gives better performance because it allows the low-level driver to perform optimizations.
When using Work Completion events, acknowledge several events in one call
When handling Work Completions using events, acknowledge several events in one call rather than making one call per event. This gives better performance because fewer mutex lock operations are performed.
Avoid using many scatter/gather entries
Using several scatter/gather entries in a Work Request (either Send Request or Receive Request) means that the RDMA device will read th