Mbufs (Memory Buffers) and Output Processing

A fundamental concept in the design of the Berkeley networking code is the memory buffer, called an mbuf, used throughout the networking code to hold various pieces of information.

Mbuf Containing Socket Address Structure
In the call to sendto, the fifth argument points to an Internet socket address structure (named serv):

if (sendto(sockfd, buff, BUFFSIZE, 0,
           (struct sockaddr *) &serv, sizeof(serv)) != BUFFSIZE)
    perror("sendto error");

and the sixth argument specifies its length (which we’ll see later is 16 bytes). One of the first things done by the socket layer for this system call is to verify that these arguments are valid (i.e., the pointer points to a piece of memory in the address space of the process) and then copy the socket address structure into an mbuf. The figure below shows the resulting mbuf.
[Figure 1.6: mbuf containing a socket address structure]
The first 20 bytes of the mbuf are a header containing information about the mbuf. This 20-byte header contains four 4-byte fields and two 2-byte fields. The total size of the mbuf is 128 bytes.
Mbufs can be linked together using the m_next and m_nextpkt members, as we’ll see shortly. Both are null pointers in this example, which is a stand-alone mbuf.
The m_data member points to the data in the mbuf and the m_len member specifies its length. For this example, m_data points to the first byte of data in the mbuf. The final 92 bytes of the mbuf data area are unused.
The m_type member specifies the type of data contained in the mbuf, which for this example is MT_SONAME (socket name). The final member in the header, m_flags, is zero in this example.
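The layout described above can be checked with a small sketch. This is not the kernel's definition: the struct name, the use of fixed-width integers in place of pointers (to model a 32-bit system), the treatment of m_data as an offset from the start of the mbuf, and the numeric value chosen for MT_SONAME are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MSIZE        128   /* total mbuf size given in the text */
#define MBUF_HDR_LEN 20    /* four 4-byte fields + two 2-byte fields */
#define MT_SONAME    8     /* illustrative value for this sketch */

/* Hypothetical model of the mbuf layout on a 32-bit system. */
struct mbuf_layout {
    uint32_t m_next;     /* next mbuf in the chain (a pointer in the kernel) */
    uint32_t m_nextpkt;  /* next chain on a queue (a pointer in the kernel) */
    int32_t  m_len;      /* number of data bytes in this mbuf */
    uint32_t m_data;     /* here: offset of the first data byte */
    int16_t  m_type;     /* kind of data held, e.g. MT_SONAME */
    int16_t  m_flags;    /* zero for a plain stand-alone mbuf */
    uint8_t  m_dat[MSIZE - MBUF_HDR_LEN];  /* 108-byte data area */
};

/* Copy a socket address into the start of the data area of a fresh mbuf. */
void mbuf_set_soname(struct mbuf_layout *m, const void *addr, int32_t len)
{
    assert(len >= 0 && (size_t)len <= sizeof(m->m_dat));
    memset(m, 0, sizeof(*m));        /* m_next, m_nextpkt, m_flags all zero */
    m->m_type = MT_SONAME;
    m->m_data = MBUF_HDR_LEN;        /* data starts right after the header */
    m->m_len  = len;
    memcpy(m->m_dat, addr, (size_t)len);
}

/* Bytes of the data area not covered by m_len. */
size_t mbuf_unused_bytes(const struct mbuf_layout *m)
{
    return sizeof(m->m_dat) - (size_t)m->m_len;
}
```

Storing a 16-byte address with mbuf_set_soname leaves 108 - 16 = 92 bytes of the data area unused, which matches the figure in the text.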
Mbuf Containing Data
Continuing our example, the socket layer copies the data buffer specified in the call to sendto into one or more mbufs. The second argument to sendto specifies the start of the data buffer (buff), and the third argument is its size in bytes (150). The figure below shows how two mbufs hold the 150 bytes of data.
[Figure 1.7: two mbufs holding 150 bytes of data]
This arrangement is called an mbuf chain. The m_next member in each mbuf links together all the mbufs in a chain.
The next change we see is the addition of two members, m_pkthdr.len and m_pkthdr.rcvif, to the mbuf header in the first mbuf of the chain. These two members comprise the packet header and are used only in the first mbuf of a chain. The m_flags member contains the value M_PKTHDR to indicate that this mbuf contains a packet header. The len member of the packet header structure contains the total length of the mbuf chain (150). The next member, rcvif, as we’ll see later, contains a pointer to the receiving interface structure for received packets.
One reason for maintaining a packet header with the total length in the first mbuf on the chain is to avoid having to go through all the mbufs on the chain to sum their m_len members when the total length is needed.
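The saving can be illustrated with a minimal sketch. The struct below is a simplified stand-in rather than the kernel's mbuf: the names chain_mbuf and pkthdr_len and both helper functions are hypothetical, and the value given to M_PKTHDR is only for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define M_PKTHDR 0x0002   /* flag value assumed for this sketch */

/* Simplified stand-in for an mbuf: only the fields needed here. */
struct chain_mbuf {
    struct chain_mbuf *m_next;  /* next mbuf in this chain, or NULL */
    int m_len;                  /* data bytes in this mbuf */
    int m_flags;                /* M_PKTHDR set on the first mbuf */
    int pkthdr_len;             /* stands in for m_pkthdr.len */
};

/* Slow path: walk every mbuf in the chain and sum m_len. */
int chain_length_walk(const struct chain_mbuf *m)
{
    int total = 0;
    for (; m != NULL; m = m->m_next)
        total += m->m_len;
    return total;
}

/* Fast path: read the total from the packet header when one is present. */
int chain_length(const struct chain_mbuf *m)
{
    assert(m != NULL);
    if (m->m_flags & M_PKTHDR)
        return m->pkthdr_len;
    return chain_length_walk(m);
}
```

For the two-mbuf chain in the example (100 bytes followed by 50 bytes), both functions return 150, but chain_length touches only the first mbuf.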

Prepending IP and UDP Headers
After the socket layer copies the destination socket address structure into an mbuf (Figure 1.6) and the data into an mbuf chain (Figure 1.7), the protocol layer corresponding to the socket descriptor (a UDP socket) is called. Specifically, the UDP output routine is called and pointers to the mbufs that we’ve examined are passed as arguments.
This routine needs to prepend an IP header and a UDP header in front of the 150 bytes of data, fill in the headers, and pass the mbufs to the IP output routine.
The way that data is prepended to the mbuf chain in Figure 1.7 is to allocate another mbuf, make it the front of the chain, and copy the packet header from the mbuf with 100 bytes of data into the new mbuf. This gives us the three mbufs shown in Figure 1.8.
[Figure 1.8: mbuf chain after a new mbuf for the IP and UDP headers is prepended]
The IP header and UDP header are stored at the end of the new mbuf that becomes the head of the chain. This allows any lower-layer protocols (e.g., the interface layer) to prepend their headers in front of the IP header if necessary, without having to copy the IP and UDP headers. The m_data pointer in the first mbuf points to the start of these two headers, and m_len is 28. Future headers that fit in the 72 bytes of unused space between the packet header and the IP header can be prepended before the IP header by adjusting the m_data pointer and m_len accordingly. Shortly we’ll see that the Ethernet header is built here in this fashion.
Notice that the packet header has been moved from the mbuf with 100 bytes of data into the new mbuf. The packet header must always be in the first mbuf on the chain. To accommodate this movement of the packet header, the M_PKTHDR flag is set in the first mbuf and cleared in the second mbuf. The space previously occupied by the packet header in the second mbuf is now unused. Finally, the length member in the packet header is incremented by 28 bytes to become 178.
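The pointer arithmetic behind this prepending can be sketched as follows. The struct and function names are hypothetical, the 100-byte data area is taken from the 28 + 72 bytes mentioned in the text, and pkt_len stands in for the length member of the packet header.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MHLEN 100   /* data area of an mbuf carrying a packet header (28 + 72) */

/* Hypothetical model of the first mbuf in the chain. */
struct hdr_mbuf {
    unsigned char  m_dat[MHLEN];  /* data area */
    unsigned char *m_data;        /* first valid byte inside m_dat */
    int            m_len;         /* valid bytes starting at m_data */
    int            pkt_len;       /* stands in for m_pkthdr.len */
};

/* Place len header bytes at the END of the data area, leaving room in front. */
void hdr_mbuf_init_at_end(struct hdr_mbuf *m, int len, int payload_len)
{
    assert(len >= 0 && len <= MHLEN);
    memset(m->m_dat, 0, sizeof(m->m_dat));
    m->m_data  = m->m_dat + (MHLEN - len);
    m->m_len   = len;
    m->pkt_len = payload_len + len;
}

/* Unused bytes between the start of the data area and m_data. */
int hdr_mbuf_leading_space(const struct hdr_mbuf *m)
{
    return (int)(m->m_data - m->m_dat);
}

/* Prepend len bytes by moving m_data backward; no existing data is copied.
 * Returns the new start of data, or NULL if the header does not fit. */
unsigned char *hdr_mbuf_prepend(struct hdr_mbuf *m, int len)
{
    if (len < 0 || len > hdr_mbuf_leading_space(m))
        return NULL;
    m->m_data  -= len;
    m->m_len   += len;
    m->pkt_len += len;
    return m->m_data;
}
```

Initializing with 28 header bytes and a 150-byte payload leaves 72 bytes of leading space and a packet length of 178. Prepending a 14-byte header afterwards reduces the leading space to 58 and raises m_len to 42, with the IP and UDP headers left where they are.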
The UDP checksum is calculated and stored in the UDP header. Notice that this requires a complete pass of the 150 bytes of data stored in the mbuf chain.
So far the kernel has made two complete passes of the 150 bytes of user data: once to copy the data from the user’s buffer into the kernel’s mbufs, and now to calculate the UDP checksum. Extra passes over the data can degrade the protocol’s performance, and in later chapters we describe alternative implementation techniques that avoid unnecessary passes.
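To see why the checksum costs a full pass, here is a minimal sketch of the standard Internet checksum (the 16-bit one's-complement sum described in RFC 1071) over a contiguous buffer. It is a simplification: a real UDP checksum also covers a pseudo-header and must walk the mbuf chain rather than a flat array, and the function name is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One's-complement checksum of len bytes, treated as big-endian 16-bit words.
 * Every byte of the buffer is read exactly once. */
uint16_t internet_checksum(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;

    assert(buf != NULL || len == 0);
    assert(len <= 65535);            /* keeps the 32-bit accumulator from overflowing */

    while (len > 1) {                /* add one 16-bit word per iteration */
        sum += (uint32_t)((buf[0] << 8) | buf[1]);
        buf += 2;
        len -= 2;
    }
    if (len == 1)                    /* odd trailing byte is padded with zero */
        sum += (uint32_t)(buf[0] << 8);

    while (sum >> 16)                /* fold the carries back into the low 16 bits */
        sum = (sum & 0xffffu) + (sum >> 16);

    return (uint16_t)~sum;
}
```

The loop has no way to skip data: the result depends on every byte, which is why the checksum counts as a second pass over the 150 bytes.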

At this point the UDP output routine calls the IP output routine, passing a pointer to the mbuf chain for IP to output.

IP Output
The IP output routine fills in the remaining fields in the IP header, including the IP checksum, determines the outgoing interface to which the datagram should be given (this is the IP routing function), fragments the IP datagram if necessary, and calls the interface output function.
Assuming the outgoing interface is an Ethernet, a general-purpose Ethernet output function is called, again with a pointer to the mbuf chain as an argument.
Ethernet Output
The first task of the Ethernet output function is to convert the 32-bit IP address into its corresponding 48-bit Ethernet address. This is done using ARP (Address Resolution Protocol) and may involve sending an ARP request on the Ethernet and waiting for an ARP reply. While this takes place, the mbuf chain to be output is held, waiting for the reply.
The Ethernet output routine then prepends a 14-byte Ethernet header to the first mbuf in the chain, immediately before the IP header. This contains the 6-byte Ethernet destination address, 6-byte Ethernet source address, and 2-byte Ethernet frame type.
The mbuf chain is then added to the end of the output queue for the interface. If the interface is not currently busy, the interface’s start output routine is called directly. If the interface is busy, its output routine will process the new mbuf on its queue when it is finished with the buffers already on its output queue.
When the interface processes an mbuf that’s on its output queue, it copies the data to its transmit buffer and initiates the output. In our example, 192 bytes are copied to the transmit buffer: the 14-byte Ethernet header, 20-byte IP header, 8-byte UDP header, and 150 bytes of user data. This is the third complete pass of the data by the kernel. Once the data is copied from the mbuf chain into the device’s transmit buffer, the mbuf chain is released by the Ethernet device driver. The three mbufs are put back into the kernel’s pool of free mbufs.
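The copy into the transmit buffer can be sketched as a walk over the chain. This is an illustration under stated assumptions, not driver code: the struct tx_mbuf and the function chain_copy_to_buffer are hypothetical names, and the transmit buffer is modeled as an ordinary byte array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for an mbuf on the interface output queue. */
struct tx_mbuf {
    struct tx_mbuf      *m_next;  /* next mbuf in the chain, or NULL */
    const unsigned char *m_data;  /* start of valid data */
    size_t               m_len;   /* number of valid bytes */
};

/* Copy every mbuf's data, in order, into one contiguous buffer.
 * Returns the number of bytes copied, or 0 if the frame does not fit. */
size_t chain_copy_to_buffer(const struct tx_mbuf *m,
                            unsigned char *dst, size_t dst_size)
{
    size_t copied = 0;

    assert(dst != NULL);
    for (; m != NULL; m = m->m_next) {
        if (m->m_len > dst_size - copied)
            return 0;                          /* would overrun the buffer */
        memcpy(dst + copied, m->m_data, m->m_len);
        copied += m->m_len;
    }
    return copied;
}
```

For the chain in the example, the first mbuf holds 42 bytes (14 + 20 + 8 bytes of headers) and the other two hold 100 and 50 bytes of data, so the function returns 192.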
Summary of UDP Output
The figure below gives an overview of the processing that takes place when a process calls sendto to transmit a single UDP datagram. The relationship of the processing that we have described to the three layers of kernel code is also shown.
[Figure: overview of the processing for UDP output]
Function calls pass control from the socket layer to the UDP output routine, then to the IP output routine, and finally to the Ethernet output routine. Each function call passes a pointer to the mbuf chain to be output. At the lowest layer, the device driver, the mbuf chain is placed on the device’s output queue and the device is started, if necessary. The function calls return in reverse order of their call, and eventually the system call returns to the process. Notice that there is no queueing of the UDP data until it arrives at the device driver. The higher layers just prepend their header and pass the mbuf to the next lower layer.
At this point our program calls recvfrom to read the server’s reply. Since the input queue for the specified socket is empty (assuming the reply has not been received yet), the process is put to sleep.
