C: A Thread-Based Solution for Exclusive CPU Use

sif_666

Article link: https://blog.csdn.net/weixin_43708622/article/details/108541726

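In real-world engineering, multithreaded programs that need high performance sometimes "bind" each thread to a specific CPU core to reduce context-switch overhead. Binding, however, is not the same as exclusive use. If thread 1 is bound to core 1, that only guarantees that thread 1 will not run on any other core; other threads can still be scheduled onto core 1, so core 1 is clearly not exclusive to thread 1. This article explains how to implement both binding and exclusive use.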

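Soft affinity

Also called "core binding": a thread is bound to a specified CPU core and runs only there. (Other sources often call this kind of explicit binding "hard affinity"; the terms below follow this article's own usage.)

Hard affinity

Isolate the specified CPU cores so that they take no part in SMP load balancing or general scheduling.

Checking the number of CPUs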

$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                24
On-line CPU(s) list:   0-23
Thread(s) per core:    2
Core(s) per socket:    6
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 63
Model name:            Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
Stepping:              2
CPU MHz:               1229.437
CPU max MHz:           3200.0000
CPU min MHz:           1200.0000
BogoMIPS:              4801.50
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              15360K
NUMA node0 CPU(s):     0-5,12-17
NUMA node1 CPU(s):     6-11,18-23
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm epb invpcid_single ssbd ibrs ibpb stibp kaiser tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm xsaveopt cqm_llc cqm_occup_llc dtherm ida arat pln pts md_clear flush_l1d

As shown, my server has 24 logical CPUs (2 sockets × 6 cores per socket × 2 threads per core).
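The same count can be queried from C. The sketch below assumes glibc on Linux, where `sysconf(_SC_NPROCESSORS_ONLN)` reports the number of online logical CPUs; the function names `online_cpu_count` and `print_cpu_count` are made up for illustration.

```c
#include <stdio.h>
#include <unistd.h>

/* Returns the number of logical CPUs currently online, or -1 on error. */
long online_cpu_count(void)
{
    return sysconf(_SC_NPROCESSORS_ONLN);
}

/* Prints the count in the same style as the test logs later in this article. */
void print_cpu_count(void)
{
    printf("system has %ld processor(s)\n", online_cpu_count());
}
```

CPUs isolated with isolcpus remain online, so this count should not change after the hard affinity configuration described next.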

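Hard affinity configuration

$ vi /etc/default/grub

# Add isolcpus=19-23 to isolate CPUs 19-23.
# Afterwards, regenerate the GRUB configuration (sudo update-grub on Ubuntu) and reboot for the change to take effect.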
GRUB_CMDLINE_LINUX="default_hugepagesz=1GB hugepagesz=1G hugepages=32 isolcpus=19-23"
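Once the machine has booted with the new kernel command line, it is worth confirming that the isolation took effect. The sketch below assumes a kernel that exposes the isolated set at /sys/devices/system/cpu/isolated (older kernels may not provide this file); the helper name `read_isolated_cpus` is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/*
 * Copies the kernel's isolated-CPU list (for example "19-23") into buf.
 * Returns 0 on success, -1 if the arguments are invalid or the sysfs file
 * cannot be opened. An empty string means no CPUs are isolated.
 */
int read_isolated_cpus(char *buf, size_t len)
{
    FILE *fp;

    if (buf == NULL || len == 0)
        return -1;

    fp = fopen("/sys/devices/system/cpu/isolated", "r");
    if (fp == NULL)
        return -1;

    if (fgets(buf, (int)len, fp) == NULL)
        buf[0] = '\0';                 /* empty file: nothing isolated */
    fclose(fp);

    buf[strcspn(buf, "\n")] = '\0';    /* strip the trailing newline */
    return 0;
}
```

With the configuration above, the string returned would be expected to read 19-23.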

isolcpus is a kernel boot parameter (not covered in depth here). Its purpose and description, quoted from the kernel documentation, are as follows:

isolcpus

isolcpus= [KNL,SMP] Isolate CPUs from the general scheduler.
The argument is a cpu list, as described below.

This option can be used to specify one or more CPUs
to isolate from the general SMP balancing and scheduling
algorithms. You can move a process onto or off an
“isolated” CPU via the CPU affinity syscalls or cpuset.
<cpu number> begins at 0 and the maximum value is
“number of CPUs in system - 1”.

This option is the preferred way to isolate CPUs. The
alternative – manually setting the CPU mask of all
tasks in the system – can cause problems and
suboptimal load balancer performance.

cpu lists:

Some kernel parameters take a list of CPUs as a value, e.g. isolcpus,
nohz_full, irqaffinity, rcu_nocbs. The format of this list is:
<cpu number>,...,<cpu number>
or
<cpu number>-<cpu number>
(must be a positive range in ascending order)
or a mixture

<cpu number>,...,<cpu number>-<cpu number>

Note that for the special case of a range one can split the range into equal
sized groups and for each group use some amount from the beginning of that
group:
<cpu number>-<cpu number>:<used size>/<group size>
For example one can add to the command line following parameter:
isolcpus=1,2,10-20,100-2000:2/25
where the final item represents CPUs 100,101,125,126,150,151,…
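To make the list format concrete, here is a minimal parser sketch for the three forms quoted above (single CPU, range, and grouped range). It is an illustration, not the kernel's implementation: `parse_cpu_list` and `MAX_CPUS` are hypothetical names, and the result is stored in a plain byte array rather than a kernel cpumask.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_CPUS 4096   /* illustrative upper bound, not a kernel constant */

/*
 * Parses a cpu list such as "1,2,10-20,100-2000:2/25" and sets mask[i] = 1
 * for every selected CPU i. Returns 0 on success, -1 on malformed input.
 */
int parse_cpu_list(const char *list, unsigned char mask[MAX_CPUS])
{
    const char *p = list;

    memset(mask, 0, MAX_CPUS);
    while (*p != '\0') {
        char *end;
        long first, last, used = 1, group = 1;

        first = strtol(p, &end, 10);
        if (end == p || first < 0)
            return -1;
        last = first;
        p = end;

        if (*p == '-') {                    /* range: first-last */
            p++;
            last = strtol(p, &end, 10);
            if (end == p || last < first)
                return -1;
            p = end;

            if (*p == ':') {                /* grouped range: :used/group */
                p++;
                used = strtol(p, &end, 10);
                if (end == p || *end != '/')
                    return -1;
                p = end + 1;
                group = strtol(p, &end, 10);
                if (end == p || used < 1 || group < used)
                    return -1;
                p = end;
            }
        }
        if (last >= MAX_CPUS)
            return -1;

        /* Within each block of `group` CPUs, keep only the first `used`. */
        for (long cpu = first; cpu <= last; cpu++)
            if ((cpu - first) % group < used)
                mask[cpu] = 1;

        if (*p == ',')
            p++;
        else if (*p != '\0')
            return -1;
    }
    return 0;
}
```

For the documentation's example, the final item 100-2000:2/25 selects CPUs 100, 101, 125, 126, 150, 151 and so on, because only the first 2 CPUs of every group of 25 are kept.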

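Soft affinity

At the code level, a thread's soft affinity can be set through the Linux (glibc) API.

// Main APIs (declared in <pthread.h> and <sched.h>; define _GNU_SOURCE before including them)

// Bind a thread to a set of CPUs
int pthread_setaffinity_np(pthread_t thread, size_t cpusetsize, const cpu_set_t *cpuset);
// Get the set of CPUs a thread is bound to
int pthread_getaffinity_np(pthread_t thread, size_t cpusetsize, cpu_set_t *cpuset);

// Macros for manipulating CPU sets
void CPU_ZERO(cpu_set_t *set);
void CPU_SET(int cpu, cpu_set_t *set);
void CPU_CLR(int cpu, cpu_set_t *set);
int  CPU_ISSET(int cpu, cpu_set_t *set);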
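A minimal sketch of how these calls fit together is shown below, assuming Linux with glibc. The helper names `pin_self_to_cpu` and `first_allowed_cpu` are invented for this example and are not taken from the article's source code.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* needed for the *_np calls and the CPU_* macros */
#endif
#include <pthread.h>
#include <sched.h>

/*
 * Restricts the calling thread to a single CPU.
 * Returns 0 on success or an errno-style code (for example EINVAL
 * if the CPU does not exist or is not permitted).
 */
int pin_self_to_cpu(int cpu)
{
    cpu_set_t set;

    CPU_ZERO(&set);          /* start from an empty set */
    CPU_SET(cpu, &set);      /* allow exactly one CPU */
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

/*
 * Returns the lowest-numbered CPU the calling thread is allowed to run on,
 * or -1 on failure.
 */
int first_allowed_cpu(void)
{
    cpu_set_t set;

    if (pthread_getaffinity_np(pthread_self(), sizeof(set), &set) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &set))
            return cpu;
    return -1;
}
```

After a successful `pin_self_to_cpu(23)`, for instance, `first_allowed_cpu()` would return 23, since the thread's affinity mask then contains only that CPU.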

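Suppose programA has four child threads (pthread1, pthread2, pthread3, pthread4).

Calling pthread_setaffinity_np() binds pthread1, pthread2, pthread3 and pthread4 to CPU20, CPU21, CPU22 and CPU23 respectively. Combined with the hard affinity configuration above, each of these threads then has its CPU to itself, apart from any task that is explicitly bound to those CPUs.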
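The scenario above can be sketched as follows. This is an illustration under stated assumptions rather than the article's actual program: `struct worker`, `worker_main`, `run_pinned_workers` and `MAX_WORKERS` are hypothetical names, and each worker pins itself before doing any work.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* needed for pthread_setaffinity_np and sched_getcpu */
#endif
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

#define MAX_WORKERS 16   /* illustrative limit */

struct worker {
    pthread_t tid;
    int index;        /* logical thread index */
    int target_cpu;   /* CPU this worker should be bound to */
    int pinned;       /* 1 if the affinity call succeeded */
};

static void *worker_main(void *arg)
{
    struct worker *w = arg;
    cpu_set_t set;

    CPU_ZERO(&set);
    CPU_SET(w->target_cpu, &set);
    if (pthread_setaffinity_np(pthread_self(), sizeof(set), &set) == 0) {
        w->pinned = 1;
        printf("thread [%d] is running in processor %d\n",
               w->index, sched_getcpu());
    }
    /* The real workload (cal() in this article) would run here. */
    return NULL;
}

/*
 * Starts up to n workers, binds worker i to cpus[i], and waits for them.
 * Returns the number of workers whose binding succeeded.
 */
int run_pinned_workers(const int *cpus, int n)
{
    struct worker workers[MAX_WORKERS];
    int started = 0, pinned = 0;

    if (n > MAX_WORKERS)
        n = MAX_WORKERS;
    for (int i = 0; i < n; i++) {
        workers[i].index = i;
        workers[i].target_cpu = cpus[i];
        workers[i].pinned = 0;
        if (pthread_create(&workers[i].tid, NULL, worker_main, &workers[i]) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++) {
        pthread_join(workers[i].tid, NULL);
        pinned += workers[i].pinned;
    }
    return pinned;
}
```

For the four threads in this article, the call would be `run_pinned_workers((int[]){20, 21, 22, 23}, 4)`. Having each thread bind itself keeps the example short; setting the affinity in the thread attributes before creation is another option.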

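Tests

Test environment: Ubuntu

Time cost of cal()

Single-threaded test: each cal() call takes about 4.4 s. In the logs, each time cost appears to be printed as seconds.milliseconds.microseconds, with the last two fields not padded to three digits, so 04.384.06 would mean 4.384006 s.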

$ time -p ./a.out 1
sizeof unsigned long int :8
sizeof cpu_set_t:128
system has 24 processor(s)
enter thread [0]
thread 82291700[0] is running in processor 23
thread 82291700[0], cal [0] times, time cost:04.611.309 s
thread 82291700[0], cal [1] times, time cost:04.387.202 s
thread 82291700[0], cal [2] times, time cost:04.383.626 s
thread 82291700[0], cal [3] times, time cost:04.384.935 s
thread 82291700[0], cal [4] times, time cost:04.384.374 s
thread 82291700[0], cal [5] times, time cost:04.384.06 s
thread 82291700[0], cal [6] times, time cost:04.384.11 s
thread 82291700[0], cal [7] times, time cost:04.386.454 s
thread 82291700[0], cal [8] times, time cost:04.385.883 s
thread 82291700[0], cal [9] times, time cost:04.384.986 s
real 44.07
user 61.82
sys 0.00
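Per-call times like those above could be collected along the following lines. This is a sketch only: the article's real cal() and timing code live in the linked repository and are not shown here, so `busy_work` is a hypothetical stand-in for cal(), and `time_call_usec` and `format_elapsed` are invented helper names.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for clock_gettime() */
#endif
#include <stdio.h>
#include <time.h>

/*
 * Formats a duration given in microseconds as "SS.mmm.uuu"
 * (seconds, milliseconds, microseconds), zero-padding every field.
 */
void format_elapsed(long long usec, char *buf, size_t len)
{
    snprintf(buf, len, "%02lld.%03lld.%03lld",
             usec / 1000000, (usec / 1000) % 1000, usec % 1000);
}

/* Runs fn() once and returns the elapsed wall-clock time in microseconds. */
long long time_call_usec(void (*fn)(void))
{
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    fn();
    clock_gettime(CLOCK_MONOTONIC, &end);
    return (end.tv_sec - start.tv_sec) * 1000000LL
         + (end.tv_nsec - start.tv_nsec) / 1000;
}

/* Hypothetical stand-in for cal(): a fixed amount of CPU-bound work. */
void busy_work(void)
{
    volatile unsigned long sum = 0;

    for (unsigned long i = 0; i < 10000000UL; i++)
        sum += i;
}
```

Padding the millisecond and microsecond fields to three digits removes the ambiguity in the log format: 8 s 66 ms 928 µs is printed as 08.066.928.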

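Plain multithreading with default scheduling

Five child threads, each calling cal() 10 times. Each cal() call takes roughly 7.3 to 8.8 s, and the whole program takes 79.52 s.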

$ time -p ./a.out 5
sizeof unsigned long int :8
sizeof cpu_set_t:128
system has 24 processor(s)
enter thread [0]
enter thread [1]
enter thread [3]
enter thread [4]
enter thread [2]
thread 979a0700[1], cal [0] times, time cost:07.833.855 s
thread 9699e700[3], cal [0] times, time cost:07.855.874 s
thread 981a1700[0], cal [0] times, time cost:07.885.449 s
thread 9619d700[4], cal [0] times, time cost:07.899.541 s
thread 9719f700[2], cal [0] times, time cost:07.935.300 s
thread 979a0700[1], cal [1] times, time cost:07.602.604 s
thread 9699e700[3], cal [1] times, time cost:07.603.702 s
thread 981a1700[0], cal [1] times, time cost:07.731.507 s
thread 9619d700[4], cal [1] times, time cost:07.730.282 s
thread 9719f700[2], cal [1] times, time cost:07.739.334 s
thread 979a0700[1], cal [2] times, time cost:07.655.725 s
thread 9699e700[3], cal [2] times, time cost:07.674.184 s
thread 9619d700[4], cal [2] times, time cost:07.601.438 s
thread 981a1700[0], cal [2] times, time cost:07.731.829 s
thread 9719f700[2], cal [2] times, time cost:07.737.896 s
thread 979a0700[1], cal [3] times, time cost:07.591.769 s
thread 9619d700[4], cal [3] times, time cost:07.586.129 s
thread 9699e700[3], cal [3] times, time cost:07.716.120 s
thread 981a1700[0], cal [3] times, time cost:07.706.884 s
thread 9719f700[2], cal [3] times, time cost:07.760.966 s
thread 979a0700[1], cal [4] times, time cost:07.690.961 s
thread 981a1700[0], cal [4] times, time cost:07.378.672 s
thread 9619d700[4], cal [4] times, time cost:07.670.166 s
thread 9699e700[3], cal [4] times, time cost:07.641.645 s
thread 9719f700[2], cal [4] times, time cost:08.753.642 s
thread 981a1700[0], cal [5] times, time cost:07.390.977 s
thread 979a0700[1], cal [5] times, time cost:07.604.881 s
thread 9619d700[4], cal [5] times, time cost:07.523.953 s
thread 9699e700[3], cal [5] times, time cost:08.193.336 s
thread 9719f700[2], cal [5] times, time cost:08.122.191 s
thread 981a1700[0], cal [6] times, time cost:07.429.311 s
thread 9699e700[3], cal [6] times, time cost:07.368.249 s
thread 9619d700[4], cal [6] times, time cost:08.66.928 s
thread 979a0700[1], cal [6] times, time cost:08.225.484 s
thread 9719f700[2], cal [6] times, time cost:08.190.563 s
thread 981a1700[0], cal [7] times, time cost:07.273.300 s
thread 9699e700[3], cal [7] times, time cost:07.485.483 s
thread 9619d700[4], cal [7] times, time cost:07.838.146 s
thread 979a0700[1], cal [7] times, time cost:07.946.420 s
thread 9719f700[2], cal [7] times, time cost:08.77.908 s
thread 9699e700[3], cal [8] times, time cost:07.340.102 s
thread 981a1700[0], cal [8] times, time cost:08.503.182 s
thread 9619d700[4], cal [8] times, time cost:07.530.351 s
thread 979a0700[1], cal [8] times, time cost:08.647.891 s
thread 9719f700[2], cal [8] times, time cost:07.710.382 s
thread 9699e700[3], cal [9] times, time cost:07.449.603 s
thread 981a1700[0], cal [9] times, time cost:07.995.195 s
thread 9619d700[4], cal [9] times, time cost:07.736.26 s
thread 979a0700[1], cal [9] times, time cost:07.942.921 s
thread 9719f700[2], cal [9] times, time cost:07.490.94 s
real 79.52
user 419.32
sys 0.00

Multithreaded soft affinity test

Five child threads, each calling cal() 10 times. Each cal() call takes about 8.2 s, and the whole program takes 81.75 s.

$ time -p ./iso_cpu 5
sizeof unsigned long int :8
sizeof cpu_set_t:128
system has 24 processor(s)
enter thread [0]
enter thread [1]
enter thread [2]
enter thread [3]
enter thread [4]
thread 53ab2700[0], cal [0] times, time cost:08.279.687 s
thread 52ab0700[2], cal [0] times, time cost:08.279.830 s
thread 51aae700[4], cal [0] times, time cost:08.280.788 s
thread 522af700[3], cal [0] times, time cost:08.280.963 s
thread 532b1700[1], cal [0] times, time cost:08.304.684 s
thread 53ab2700[0], cal [1] times, time cost:08.159.58 s
thread 52ab0700[2], cal [1] times, time cost:08.159.644 s
thread 51aae700[4], cal [1] times, time cost:08.160.558 s
thread 522af700[3], cal [1] times, time cost:08.160.594 s
thread 532b1700[1], cal [1] times, time cost:08.174.232 s
thread 53ab2700[0], cal [2] times, time cost:08.158.333 s
thread 52ab0700[2], cal [2] times, time cost:08.158.777 s
thread 51aae700[4], cal [2] times, time cost:08.159.808 s
thread 522af700[3], cal [2] times, time cost:08.159.866 s
thread 532b1700[1], cal [2] times, time cost:08.158.436 s
thread 53ab2700[0], cal [3] times, time cost:08.160.631 s
thread 52ab0700[2], cal [3] times, time cost:08.161.204 s
thread 51aae700[4], cal [3] times, time cost:08.162.52 s
thread 522af700[3], cal [3] times, time cost:08.170.87 s
thread 532b1700[1], cal [3] times, time cost:08.161.15 s
thread 53ab2700[0], cal [4] times, time cost:08.159.763 s
thread 52ab0700[2], cal [4] times, time cost:08.159.850 s
thread 51aae700[4], cal [4] times, time cost:08.161.152 s
thread 522af700[3], cal [4] times, time cost:08.160.997 s
thread 532b1700[1], cal [4] times, time cost:08.160.79 s
thread 53ab2700[0], cal [5] times, time cost:08.157.40 s
thread 52ab0700[2], cal [5] times, time cost:08.157.723 s
thread 51aae700[4], cal [5] times, time cost:08.158.532 s
thread 522af700[3], cal [5] times, time cost:08.158.376 s
thread 532b1700[1], cal [5] times, time cost:08.157.374 s
thread 53ab2700[0], cal [6] times, time cost:08.157.965 s
thread 52ab0700[2], cal [6] times, time cost:08.158.620 s
thread 51aae700[4], cal [6] times, time cost:08.159.765 s
thread 522af700[3], cal [6] times, time cost:08.159.232 s
thread 532b1700[1], cal [6] times, time cost:08.158.245 s
thread 53ab2700[0], cal [7] times, time cost:08.161.805 s
thread 52ab0700[2], cal [7] times, time cost:08.161.102 s
thread 51aae700[4], cal [7] times, time cost:08.161.789 s
thread 522af700[3], cal [7] times, time cost:08.161.885 s
thread 532b1700[1], cal [7] times, time cost:08.162.372 s
thread 53ab2700[0], cal [8] times, time cost:08.157.749 s
thread 52ab0700[2], cal [8] times, time cost:08.158.384 s
thread 51aae700[4], cal [8] times, time cost:08.160.384 s
thread 522af700[3], cal [8] times, time cost:08.159.00 s
thread 532b1700[1], cal [8] times, time cost:08.158.111 s
thread 53ab2700[0], cal [9] times, time cost:08.158.62 s
thread 52ab0700[2], cal [9] times, time cost:08.157.861 s
thread 51aae700[4], cal [9] times, time cost:08.158.566 s
thread 522af700[3], cal [9] times, time cost:08.157.624 s
thread 532b1700[1], cal [9] times, time cost:08.153.392 s
real 81.75
user 436.61
sys 0.00

Multithreaded exclusive-CPU test

Five child threads, each calling cal() 10 times. Each cal() call takes about 5.2 s, and the whole program takes 52.12 s.

$ time -p ./iso_cpu 5
sizeof unsigned long int :8
sizeof cpu_set_t:128
system has 24 processor(s)
enter thread [0]
thread 32e38700[0] is running in processor 23
enter thread [1]
enter thread [2]
thread 31e36700[2] is running in processor 21
thread 32637700[1] is running in processor 22
enter thread [3]
enter thread [4]
thread 30e34700[4] is running in processor 19
thread 31635700[3] is running in processor 20
thread 31e36700[2], cal [0] times, time cost:05.318.382 s
thread 32637700[1], cal [0] times, time cost:05.323.415 s
thread 31635700[3], cal [0] times, time cost:05.324.331 s
thread 32e38700[0], cal [0] times, time cost:05.324.730 s
thread 30e34700[4], cal [0] times, time cost:05.324.986 s
thread 31e36700[2], cal [1] times, time cost:05.201.362 s
thread 32637700[1], cal [1] times, time cost:05.200.34 s
thread 31635700[3], cal [1] times, time cost:05.200.823 s
thread 32e38700[0], cal [1] times, time cost:05.200.585 s
thread 30e34700[4], cal [1] times, time cost:05.201.601 s
thread 31e36700[2], cal [2] times, time cost:05.196.756 s
thread 32637700[1], cal [2] times, time cost:05.196.720 s
thread 31635700[3], cal [2] times, time cost:05.196.575 s
thread 32e38700[0], cal [2] times, time cost:05.196.560 s
thread 30e34700[4], cal [2] times, time cost:05.197.496 s
thread 31e36700[2], cal [3] times, time cost:05.197.537 s
thread 32637700[1], cal [3] times, time cost:05.197.319 s
thread 32e38700[0], cal [3] times, time cost:05.196.953 s
thread 31635700[3], cal [3] times, time cost:05.197.177 s
thread 30e34700[4], cal [3] times, time cost:05.197.845 s
thread 31e36700[2], cal [4] times, time cost:05.196.190 s
thread 32637700[1], cal [4] times, time cost:05.196.109 s
thread 32e38700[0], cal [4] times, time cost:05.196.340 s
thread 31635700[3], cal [4] times, time cost:05.196.190 s
thread 30e34700[4], cal [4] times, time cost:05.196.424 s
thread 31e36700[2], cal [5] times, time cost:05.202.587 s
thread 32637700[1], cal [5] times, time cost:05.202.529 s
thread 32e38700[0], cal [5] times, time cost:05.202.83 s
thread 31635700[3], cal [5] times, time cost:05.202.627 s
thread 30e34700[4], cal [5] times, time cost:05.203.633 s
thread 31e36700[2], cal [6] times, time cost:05.194.706 s
thread 32e38700[0], cal [6] times, time cost:05.194.811 s
thread 32637700[1], cal [6] times, time cost:05.195.952 s
thread 31635700[3], cal [6] times, time cost:05.194.509 s
thread 30e34700[4], cal [6] times, time cost:05.195.113 s
thread 31e36700[2], cal [7] times, time cost:05.196.723 s
thread 32e38700[0], cal [7] times, time cost:05.196.557 s
thread 31635700[3], cal [7] times, time cost:05.196.843 s
thread 32637700[1], cal [7] times, time cost:05.197.960 s
thread 30e34700[4], cal [7] times, time cost:05.197.17 s
thread 31e36700[2], cal [8] times, time cost:05.194.982 s
thread 32e38700[0], cal [8] times, time cost:05.194.665 s
thread 31635700[3], cal [8] times, time cost:05.194.845 s
thread 32637700[1], cal [8] times, time cost:05.195.998 s
thread 30e34700[4], cal [8] times, time cost:05.195.87 s
thread 31e36700[2], cal [9] times, time cost:05.195.67 s
thread 32e38700[0], cal [9] times, time cost:05.195.24 s
thread 31635700[3], cal [9] times, time cost:05.194.995 s
thread 32637700[1], cal [9] times, time cost:05.195.860 s
thread 30e34700[4], cal [9] times, time cost:05.194.334 s
real 52.12
user 288.24
sys 0.00

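Conclusion

Plain program with 5 child threads: 79.52 s in total

Soft affinity with 5 child threads: 81.75 s in total

Exclusive CPUs with 5 child threads: 52.12 s in total

In this test, soft affinity alone did not improve the multithreaded program's performance (it was about 3% slower than default scheduling), whereas exclusive CPUs cut the total run time by about 34%.
So when the goal is to push the CPU to its limit, exclusive CPU use is one option.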

Source code: https://gitee.com/yellow_mamba/c_tutorial/tree/master/isolation_cpu
Please credit the source when reposting or reusing this article.
