Notes on "Improving Linux kernel networking performance"

Teddei

Category: DPDK. Tags: linux, kernel, networking, optimization

Article link: https://blog.csdn.net/wzcprince/article/details/78810261

Original article:
https://lwn.net/Articles/629155/
By Jonathan Corbet
January 13, 2015

Time budgets

Only 67.2ns per packet

The smallest Ethernet frame that can be sent is 84 bytes; on a 10G adapter, Jesper said, there are 67.2ns between minimally-sized packets.
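
The 67.2ns figure follows directly from the frame size and the link rate. Below is a minimal sketch of that arithmetic. The breakdown of the 84 bytes (64-byte minimum frame, 8 bytes of preamble and start delimiter, 12 bytes of inter-frame gap) is the standard Ethernet accounting and is an assumption of this sketch, not something stated in the notes; the function names are made up for illustration.

```python
# On-wire size of the smallest Ethernet frame (assumed breakdown, see lead-in):
# 64-byte minimum frame + 8 bytes preamble/SFD + 12 bytes inter-frame gap.
MIN_FRAME_BYTES = 64 + 8 + 12
LINK_RATE_BPS = 10_000_000_000  # 10 Gbit/s adapter


def packet_interval_ns(frame_bytes: int, link_rate_bps: int) -> float:
    """Time between back-to-back frames of the given size, in nanoseconds."""
    return frame_bytes * 8 * 1e9 / link_rate_bps


def packets_per_second(frame_bytes: int, link_rate_bps: int) -> float:
    """Maximum frame rate the link can carry at the given frame size."""
    return link_rate_bps / (frame_bytes * 8)


interval = packet_interval_ns(MIN_FRAME_BYTES, LINK_RATE_BPS)
rate = packets_per_second(MIN_FRAME_BYTES, LINK_RATE_BPS)
print(f"{interval:.1f} ns between minimum-size packets")
print(f"{rate / 1e6:.2f} million packets per second")
```

In other words, the per-packet budget is simply the inverse of the worst-case packet rate at line speed.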

Feasibility analysis

A cache miss on Jesper’s 3GHz processor takes about 32ns to resolve.
It thus takes only two misses to wipe out the entire time budget for processing a packet.
A socket buffer (“SKB”) occupies four cache lines on a 64-bit system, and much of the SKB is written during packet processing, so four misses (about 128ns) would already be far over budget.
The x86 LOCK prefix for atomic operations takes about 8.25ns, so the shortest spinlock lock/unlock cycle takes a little over 16ns. There is not room for much locking within the time budget.
Performing a system call costs about 75ns, which exceeds the budget on its own.
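
To see how tight this is, the costs above can be set against the 67.2ns budget. The following is a rough sketch using only the numbers quoted in these notes; treating a spinlock cycle as exactly two LOCK-prefixed operations is an assumption, and the helper name `how_many_fit` is hypothetical.

```python
BUDGET_NS = 67.2  # time between minimum-size packets at 10 Gbit/s

# Per-operation costs quoted in the notes (nanoseconds).
COSTS_NS = {
    "cache miss": 32.0,
    # Assumption: one lock plus one unlock, each a LOCK-prefixed operation.
    "spinlock lock/unlock": 2 * 8.25,
    "system call": 75.0,
}

SKB_CACHE_LINES = 4  # cache lines occupied by an SKB on a 64-bit system


def how_many_fit(budget_ns: float, cost_ns: float) -> int:
    """Whole number of operations of the given cost that fit in the budget."""
    return int(budget_ns // cost_ns)


for name, cost in COSTS_NS.items():
    print(f"{name}: {cost:.2f} ns each, {how_many_fit(BUDGET_NS, cost)} fit per packet")

# Worst case for the SKB alone: every one of its cache lines misses.
skb_worst_case_ns = SKB_CACHE_LINES * COSTS_NS["cache miss"]
print(f"SKB with all lines cold: {skb_worst_case_ns:.0f} ns vs budget {BUDGET_NS} ns")
```

Each of these costs is counted in isolation here; real packet processing pays several of them at once, which is why none can be paid per packet.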

Feasible approaches

Batch operations

Avoid locks

Reduce system calls

Cache optimization

The key appears to be batching of operations, along with preallocation and prefetching of resources. These solutions keep work CPU-local and avoid locking. It is also important to shrink packet metadata and reduce the number of system calls. Faster, cache-optimal data structures also help. Of all of these techniques, batching of operations is the most important. A cost that is intolerable on a per-packet basis is easier to absorb if it is incurred once per dozens of packets. 16ns of locking per packet hurts; if sixteen packets are processed at once, that overhead drops to 1ns per packet.
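
The amortization argument in the last two sentences is plain division. A minimal sketch, with a hypothetical helper name, showing how a fixed 16ns locking cost shrinks per packet as the batch grows:

```python
def per_packet_overhead_ns(fixed_cost_ns: float, batch_size: int) -> float:
    """Share of a once-per-batch cost that each packet in the batch carries."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return fixed_cost_ns / batch_size


LOCK_CYCLE_NS = 16.0  # approximate spinlock lock/unlock cost from the notes

for batch in (1, 2, 4, 8, 16, 32, 64):
    overhead = per_packet_overhead_ns(LOCK_CYCLE_NS, batch)
    print(f"batch of {batch:2d}: {overhead:.2f} ns of locking per packet")
```

The same reasoning applies to any cost that can be paid once per batch instead of once per packet, such as a system call.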

Stay within two cache misses per packet, and avoid spinlocks.

Improving batching: the latency/throughput trade-off

The tricky part, he said, is adding batching APIs to the networking stack without increasing the latency of the system. Latency and throughput must often be traded off against each other; here the objective is to optimize both.

TCP bulk transmission work:
Bulk network packet transmission [LWN.net]
https://lwn.net/Articles/615238/
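
One way to batch without adding latency is to batch only what is already waiting and never hold a packet back in the hope that more will arrive. The sketch below illustrates that idea in user-space Python. It is an assumption about how the goal can be met, not the kernel's API; `drain_batch`, `transmit_all`, and `send_batch` are hypothetical names.

```python
from collections import deque


def drain_batch(queue: deque, max_batch: int) -> list:
    """Take whatever is already queued, up to max_batch, without waiting."""
    batch = []
    while queue and len(batch) < max_batch:
        batch.append(queue.popleft())
    return batch


def transmit_all(queue: deque, max_batch: int, send_batch) -> int:
    """Empty the queue in batches; return how many times send_batch was called.

    send_batch stands in for the expensive once-per-batch work
    (taking a lock, notifying the hardware, and so on).
    """
    calls = 0
    while queue:
        send_batch(drain_batch(queue, max_batch))
        calls += 1
    return calls


sent = []
# A lone packet goes out immediately as a batch of one: no added delay.
print(transmit_all(deque(["p0"]), 16, sent.append), [len(b) for b in sent])

sent.clear()
# A backlog of 40 packets is sent in three batches instead of 40 single sends.
backlog = deque(f"p{i}" for i in range(40))
print(transmit_all(backlog, 16, sent.append), [len(b) for b in sent])
```

Under light load this behaves like per-packet transmission; under heavy load, when a queue has formed anyway, the per-batch cost is shared across the queued packets.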

Memory management (needs to be bypassed to improve performance)

Jesper implemented a subsystem called qmempool; it performs bulk allocation and free operations in a lockless manner.
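
The benefit of bulk allocation can be shown with a toy model: a local cache that refills from a shared pool in bulk touches the shared pool far less often than one that fetches a single element at a time. This is only a model of the bulk-refill idea. It counts accesses to the shared pool and does not implement a lock-free queue or qmempool itself; all class and function names are hypothetical.

```python
class SharedPool:
    """Shared store of free elements; every call models one synchronized access."""

    def __init__(self, size: int):
        self.free = list(range(size))  # integers stand in for buffers
        self.shared_ops = 0

    def take_bulk(self, n: int) -> list:
        self.shared_ops += 1
        taken, self.free = self.free[:n], self.free[n:]
        return taken

    def give_bulk(self, elems: list) -> None:
        self.shared_ops += 1
        self.free.extend(elems)


class LocalCache:
    """Per-consumer cache that talks to the shared pool only in bulk."""

    def __init__(self, pool: SharedPool, bulk: int):
        self.pool = pool
        self.bulk = bulk
        self.local = []

    def alloc(self) -> int:
        if not self.local:
            self.local = self.pool.take_bulk(self.bulk)
            if not self.local:
                raise MemoryError("pool exhausted")
        return self.local.pop()

    def free(self, elem: int) -> None:
        self.local.append(elem)
        if len(self.local) >= 2 * self.bulk:
            # Local cache is overfull: hand one bulk back in a single access.
            self.pool.give_bulk(self.local[:self.bulk])
            del self.local[:self.bulk]


def shared_ops_for(n_allocs: int, bulk: int) -> int:
    """Shared-pool accesses needed to serve n_allocs allocations."""
    pool = SharedPool(n_allocs)
    cache = LocalCache(pool, bulk)
    for _ in range(n_allocs):
        cache.alloc()
    return pool.shared_ops


for bulk in (1, 4, 16):
    print(f"bulk={bulk:2d}: {shared_ops_for(64, bulk)} shared-pool accesses for 64 allocations")
```

Most allocations and frees are served from the local list, which is the CPU-local, lock-avoiding behavior the notes describe; only the occasional bulk transfer touches shared state.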

[RFC PATCH 0/3] Faster than SLAB caching of SKBs with qmempool (backed by alf_queue) [LWN.net]
https://lwn.net/Articles/625427/
