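Optimizing Real-Time Performance

This post records how we reduced scheduling latency by setting kernel boot parameters. It covers the relevant kernel parameters, the test tools, and the results.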

Contents

Linux Kernel

General-Purpose Kernel

Real-Time Kernel

RT-Preempt patch

Kernel Parameters

nosoftlockup

softlockup_panic=0

nowatchdog

nmi_watchdog=0

mce=ignore_ce

selinux=0

enforcing=0

intel_idle.max_cstate=0

cpuidle.off=1

processor.max_cstate=1

intel_pstate=disable

isolcpus=cpulist

nohz

nohz_full

irqaffinity

rcu_nocbs

Clock Settings

skew_tick=1

Tuned

Test Tools

Experimental Results


Linux Kernel

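General-Purpose Kernel

A general-purpose kernel cannot be preempted while code holds a spinlock or runs in IRQ context, so even after a high-priority task is woken up, the time until it actually runs is not fully deterministic. It is precisely this uncertainty in execution time that makes a general-purpose kernel unsuitable for workloads with strict timing requirements.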

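Real-Time Kernel

Real-time systems are divided into hard real-time and soft real-time. The main difference is that soft real-time is real-time in a statistical sense: it guarantees that a certain percentage of tasks complete within the specified time. Hard real-time guarantees that every task gets to run within the specified time.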

RT-Preempt patch

To be continued.

Kernel Parameters

For explanations of the kernel parameters, see: Linux 内核引导参数简介 (An introduction to Linux kernel boot parameters), 宽简厚重—博约, CSDN blog.
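Before tuning, it helps to check which of the parameters below the running kernel was actually booted with. The following is a minimal Python sketch, assuming a Linux host that exposes /proc/cmdline; `parse_cmdline`, `rt_params_in`, `read_cmdline` and the `RT_PARAMS` list are hypothetical helpers written for illustration, not an existing tool.

```python
import shlex

# Boot parameters discussed in this post (names only; values vary per system).
RT_PARAMS = [
    "nosoftlockup", "softlockup_panic", "nowatchdog", "nmi_watchdog", "mce",
    "selinux", "enforcing", "intel_idle.max_cstate", "cpuidle.off",
    "processor.max_cstate", "intel_pstate", "isolcpus", "nohz", "nohz_full",
    "irqaffinity", "rcu_nocbs", "rcu_nocb_poll", "clocksource", "tsc", "hpet",
    "skew_tick",
]


def parse_cmdline(text: str) -> dict:
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    # shlex.split() honours double quotes, e.g. foo="a b" becomes 'foo=a b'.
    for token in shlex.split(text):
        if token == "--":  # everything after "--" is passed to init, not the kernel
            break
        name, sep, value = token.partition("=")
        params[name] = value if sep else None  # a repeated name keeps its last value
    return params


def rt_params_in(params: dict) -> dict:
    """Keep only the parameters from RT_PARAMS that are present."""
    return {name: params[name] for name in RT_PARAMS if name in params}


def read_cmdline(path: str = "/proc/cmdline") -> dict:
    """Parse the command line of the running kernel."""
    with open(path) as f:
        return parse_cmdline(f.read())
```

For example, `print(rt_params_in(read_cmdline()))` prints only the real-time related parameters that are present.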

nosoftlockup

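First, what a soft lockup is: if the system detects that a CPU has not scheduled for 20 s (by default), it considers a soft lockup to have occurred; this can be caused by preemption being disabled for too long or by an infinite loop. When the kernel is built with CONFIG_LOCKUP_DETECTOR, a watchdog is started for each CPU to detect soft lockups.

See: linux watchdog softlockup 检测原理 (How Linux watchdog soft-lockup detection works), Zhihu.

nosoftlockup disables the kernel's soft-lockup detection.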
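The 20 s figure comes from the kernel.watchdog_thresh sysctl: the kernel's sysctl documentation defines the soft-lockup threshold as twice watchdog_thresh, whose default is 10 s. A small sketch of that arithmetic, assuming /proc/sys/kernel/watchdog_thresh exists (it does only when the lockup detector is built in); the function names are hypothetical.

```python
from pathlib import Path


def lockup_thresholds(watchdog_thresh: int = 10) -> tuple:
    """Return (soft_lockup_s, hard_lockup_s) for a given kernel.watchdog_thresh.

    Soft-lockup threshold = 2 * watchdog_thresh (kernel sysctl docs);
    the hard-lockup (NMI) detector works on a period of watchdog_thresh.
    """
    return 2 * watchdog_thresh, watchdog_thresh


def current_thresholds(path: str = "/proc/sys/kernel/watchdog_thresh") -> tuple:
    """Read watchdog_thresh from procfs and derive both thresholds."""
    return lockup_thresholds(int(Path(path).read_text()))  # int() strips the newline
```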

softlockup_panic=0

Whether the kernel should panic when it detects a soft lockup. The default value is set by CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC_VALUE.

0: only print a warning with a stack trace

1: panic the system

nowatchdog

Disables both lockup detectors: soft-lockup detection and hard-lockup detection (the NMI watchdog).

nmi_watchdog=0

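Configures nmi_watchdog (the non-maskable-interrupt watchdog, i.e. the hard-lockup detector):

0: turns the watchdog off;
panic: triggers a kernel panic when the watchdog times out (it has not been petted for too long); usually combined with "panic=" so that the system reboots automatically when it locks up.
nopanic: the opposite; no kernel panic is triggered even when the watchdog times out.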
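Most of these watchdog settings also have runtime counterparts under /proc/sys/kernel (watchdog, soft_watchdog, nmi_watchdog, softlockup_panic), which is handy for experimenting before editing the boot command line. The sketch below only plans and performs those writes; the mapping table and the function names are assumptions made for illustration, and the writes need root.

```python
from pathlib import Path

# Approximate runtime (kernel.* sysctl) counterparts of the boot parameters above.
# Unlike the boot parameters, these writes only take effect once the system is up.
RUNTIME_EQUIVALENTS = {
    "nosoftlockup": {"soft_watchdog": "0"},           # soft-lockup detector off
    "nmi_watchdog=0": {"nmi_watchdog": "0"},          # hard-lockup detector off
    "nowatchdog": {"watchdog": "0"},                  # both detectors off
    "softlockup_panic=0": {"softlockup_panic": "0"},  # warn only, no panic
}


def plan_writes(boot_params, root: str = "/proc/sys/kernel") -> list:
    """Return the (path, value) pairs that mirror `boot_params` at runtime."""
    writes = []
    for param in boot_params:
        for knob, value in RUNTIME_EQUIVALENTS.get(param, {}).items():
            writes.append((str(Path(root) / knob), value))
    return writes


def apply_writes(writes) -> None:
    """Perform the writes (needs root; raises OSError if a knob does not exist)."""
    for path, value in writes:
        Path(path).write_text(value)
```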

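mce=ignore_ce

This disables the features for corrected errors (the polling timer and CMCI), removing a source of periodic interruptions. To minimize interference as much as possible, the value can be set to off instead.

Machine Check Exception options (from the kernel documentation):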

   mce=off
        Disable machine check
   mce=no_cmci
        Disable CMCI(Corrected Machine Check Interrupt) that
        Intel processor supports.  Usually this disablement is
        not recommended, but it might be handy if your hardware
        is misbehaving.
        Note that you'll get more problems without CMCI than with
        due to the shared banks, i.e. you might get duplicated
        error logs.
   mce=dont_log_ce
        Don't make logs for corrected errors.  All events reported
        as corrected are silently cleared by OS.
        This option will be useful if you have no interest in any
        of corrected errors.
   mce=ignore_ce
        Disable features for corrected errors, e.g. polling timer
        and CMCI.  All events reported as corrected are not cleared
        by OS and remained in its error banks.
        Usually this disablement is not recommended, however if
        there is an agent checking/clearing corrected errors
        (e.g. BIOS or hardware monitoring applications), conflicting
        with OS's error handling, and you cannot deactivate the agent,
        then this option will be a help.
   mce=bootlog
        Enable logging of machine checks left over from booting.
        Disabled by default on AMD because some BIOS leave bogus ones.
        If your BIOS doesn't do that it's a good idea to enable though
        to make sure you log even machine check events that result
        in a reboot. On Intel systems it is enabled by default.
   mce=nobootlog
        Disable boot machine check logging.
   mce=tolerancelevel[,monarchtimeout] (number,number)
        tolerance levels:
        0: always panic on uncorrected errors, log corrected errors
        1: panic or SIGBUS on uncorrected errors, log corrected errors
        2: SIGBUS or log uncorrected errors, log corrected errors
        3: never panic or SIGBUS, log all errors (for testing only)
        Default is 1
        Can be also set using sysfs which is preferable.
        monarchtimeout:
        Sets the time in us to wait for other CPUs on machine checks. 0
        to disable.

selinux=0

Whether SELinux is enabled at boot (CONFIG_SECURITY_SELINUX_BOOTPARAM): "0" disables it, "1" enables it.

enforcing=0

Whether SELinux policy is enforced at boot.
"0" (the default) only logs policy violations without actually denying them;
"1" actually denies violations and logs them.

intel_idle.max_cstate=0

Sets the deepest C-state the intel_idle driver (CONFIG_INTEL_IDLE) is allowed to use. "0" disables the intel_idle driver altogether, so the generic acpi_idle driver (CONFIG_CPU_IDLE) is used instead.

cpuidle.off=1

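Disables the cpuidle subsystem, so no cpuidle driver (intel_idle or acpi_idle) is used and the kernel falls back to the architecture's default idle routine (on x86 typically HLT, i.e. C1), avoiding deeper C-states to maximize performance. The CPU still idles; keeping it polling requires idle=poll.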

processor.max_cstate=1

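Ignores the value reported by the ACPI tables and forces the deepest C-state the CPU may use (0|1|2|3|4|5|6|7|8|9): C0 is the normal operating state, and the others are power-saving modes (the larger the number, the deeper the CPU sleeps and the more power it saves). "9" means ignoring all DMI blacklist limits.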
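To see which idle states a CPU can actually enter, and which driver provides them, the cpuidle sysfs interface can be inspected. A minimal sketch, assuming the standard layout /sys/devices/system/cpu/cpuN/cpuidle/stateK/{name,latency,disable}; the helper names are hypothetical, and disabling states through the `disable` files is offered as a runtime alternative to the boot parameters above.

```python
from pathlib import Path

CPU_ROOT = "/sys/devices/system/cpu"


def _read(path: Path) -> str:
    return path.read_text().strip()


def _index(state_dir: Path) -> int:
    return int(state_dir.name[len("state"):])  # "state2" -> 2


def cpuidle_driver(root: str = CPU_ROOT):
    """Name of the active cpuidle driver (e.g. intel_idle, acpi_idle), or None."""
    path = Path(root) / "cpuidle" / "current_driver"
    return _read(path) if path.exists() else None


def cpuidle_states(cpu: int = 0, root: str = CPU_ROOT) -> list:
    """Return [(dir, name, exit_latency_us, disabled)] for one CPU's idle states."""
    base = Path(root) / f"cpu{cpu}" / "cpuidle"
    if not base.is_dir():  # no cpuidle driver registered for this CPU
        return []
    return [
        (d.name, _read(d / "name"), int(_read(d / "latency")), _read(d / "disable") == "1")
        for d in sorted(base.glob("state[0-9]*"), key=_index)
    ]


def disable_deeper_than(cpu: int, max_index: int, root: str = CPU_ROOT) -> None:
    """Disable stateN for N > max_index (needs root).

    Note: sysfs state indices are not C-state numbers (state0 is often POLL).
    """
    for d in Path(root, f"cpu{cpu}", "cpuidle").glob("state[0-9]*"):
        if _index(d) > max_index:
            (d / "disable").write_text("1")
```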

intel_pstate=disable

Disables the Intel P-state driver (CONFIG_X86_INTEL_PSTATE), i.e. the Intel-specific CPU frequency-scaling driver.
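Whether intel_pstate is really disabled can be checked from the cpufreq sysfs files: scaling_driver reads intel_pstate (or intel_cpufreq in passive mode) while it is active, and typically acpi-cpufreq once it is disabled. The sketch below reads the driver and governor and can switch all CPUs to the performance governor; choosing performance is an assumption for low-latency use, and the function names are hypothetical.

```python
from pathlib import Path

CPU_ROOT = "/sys/devices/system/cpu"


def cpufreq_info(cpu: int = 0, root: str = CPU_ROOT) -> dict:
    """Read the frequency-scaling driver and governor of one CPU (empty if no cpufreq)."""
    base = Path(root) / f"cpu{cpu}" / "cpufreq"
    info = {}
    for name in ("scaling_driver", "scaling_governor"):
        path = base / name
        if path.exists():
            info[name] = path.read_text().strip()
    return info


def set_governor(governor: str = "performance", root: str = CPU_ROOT) -> int:
    """Write `governor` to every CPU's scaling_governor (needs root); return the count."""
    count = 0
    for path in Path(root).glob("cpu[0-9]*/cpufreq/scaling_governor"):
        path.write_text(governor)
        count += 1
    return count
```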

isolcpus=cpulist

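Removes the listed CPUs from the kernel's SMP load balancing and scheduling algorithms.
Removal does not mean those CPUs can never be used again: tasks can still be explicitly bound to a particular CPU (for example with taskset).
The main purpose of this parameter is to dedicate specific CPUs to specific processes.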
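isolcpus, nohz_full, rcu_nocbs and irqaffinity all take the kernel's cpulist syntax, and isolated CPUs are then used by explicitly pinning tasks to them. The sketch below parses the simple forms of that syntax (single CPUs and ranges, plus the optional isolcpus flags nohz, domain and managed_irq) and pins the current process with os.sched_setaffinity, the system call that taskset also relies on. The stride form (e.g. 0-15:2/4) is deliberately not handled, and the helper names are hypothetical.

```python
import os

ISOLCPUS_FLAGS = {"nohz", "domain", "managed_irq"}


def parse_cpulist(spec: str) -> set:
    """Parse a kernel cpulist like "2-5,7" into {2, 3, 4, 5, 7} (no stride syntax)."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        lo, _, hi = part.partition("-")  # "7" -> ("7", "", "")
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus


def isolcpus_list(value: str) -> set:
    """Drop optional isolcpus= flags (e.g. "nohz,domain,2-7") before parsing."""
    return parse_cpulist(",".join(t for t in value.split(",") if t not in ISOLCPUS_FLAGS))


def run_on(cpus: set) -> set:
    """Pin the calling process to `cpus` (like `taskset -pc`) and return the new mask."""
    os.sched_setaffinity(0, cpus)
    return os.sched_getaffinity(0)
```

For example, a latency-sensitive process started on a machine booted with isolcpus=2-7 could call `run_on({2})` early in its startup.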

nohz

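Enables/disables the kernel's dynamic ticks feature (nohz=on|off). By default the system periodically triggers a timer interrupt (the periodic scheduler tick), which updates task statistics and checks, among other things, whether the running task has used up its time slice (ideal_runtime in CFS). With dynamic ticks, when the target CPU's runqueue has no runnable entities, the CPU enters the idle state and the tick is stopped.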

nohz_full

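nohz_full specifies which CPUs run in tickless (adaptive-tick) mode: when only one task, or a real-time task, is running on such a CPU, that CPU's periodic tick is turned off. The CPUs listed here must also be listed in "rcu_nocbs=..." (recent kernel documentation states that nohz_full CPUs get their RCU callbacks offloaded as if they were listed in rcu_nocbs=, but listing them explicitly does no harm).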
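Because nohz_full, rcu_nocbs and isolcpus are usually written by hand, a quick consistency check helps catch typos. A minimal sketch, assuming the parameters are available as a {name: value} dict (for instance parsed from /proc/cmdline); requiring nohz_full to be inside isolcpus is common practice rather than a kernel rule, and the function names are hypothetical.

```python
def parse_cpulist(spec: str) -> set:
    """Parse "2-5,7" into {2, 3, 4, 5, 7}; isolcpus flags such as "domain" are skipped."""
    cpus = set()
    for part in filter(None, (p.strip() for p in spec.split(","))):
        if not part[0].isdigit():  # nohz, domain, managed_irq
            continue
        lo, _, hi = part.partition("-")
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus


def check_nohz_full(params: dict, ncpus: int) -> list:
    """Return human-readable problems with the nohz_full configuration."""
    nohz_full = parse_cpulist(params.get("nohz_full") or "")
    problems = []
    not_nocb = nohz_full - parse_cpulist(params.get("rcu_nocbs") or "")
    if not_nocb:
        problems.append(f"nohz_full CPUs missing from rcu_nocbs: {sorted(not_nocb)}")
    not_isolated = nohz_full - parse_cpulist(params.get("isolcpus") or "")
    if not_isolated:  # common practice, not a kernel requirement
        problems.append(f"nohz_full CPUs not in isolcpus: {sorted(not_isolated)}")
    if nohz_full >= set(range(ncpus)):  # every CPU is tickless
        problems.append("no housekeeping CPU left outside nohz_full")
    return problems
```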

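Problem encountered: after setting nohz and nohz_full, periodic interrupts (ticks) still occurred even when the CPU was idle or running a real-time task. The reason is that when a CPU exits the idle state, the kernel unconditionally restarts the tick. If the runqueue then holds only one task, or that task has the highest priority, the tick fires only once, but that single tick still disturbs the running process.

The fix (Huawei has real talent hidden in its ranks, impressive work): [tip: timers/nohz] tick/nohz: Conditionally restart tick on idle exit - tip-bot2 for Yunfeng Ye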
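One way to observe the residual ticks described above is to sample the local timer interrupt counters twice and compare. A sketch, assuming an x86 machine where /proc/interrupts has a `LOC:` (Local timer interrupts) row; other architectures name their timer interrupt differently, and the function names are hypothetical. On an isolated nohz_full CPU running a single task, the rate should drop well below the CONFIG_HZ rate seen on the other CPUs.

```python
import time


def local_timer_counts(text: str) -> list:
    """Per-CPU totals from the "LOC:" (Local timer interrupts) row of /proc/interrupts."""
    lines = text.splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    for line in lines[1:]:
        fields = line.split()
        if fields and fields[0] == "LOC:":
            return [int(n) for n in fields[1:1 + ncpus]]
    raise ValueError("no LOC: row found (this row is x86-specific)")


def ticks_per_second(interval: float = 1.0, path: str = "/proc/interrupts") -> list:
    """Sample twice and return the local-timer interrupt rate of each online CPU."""
    def sample():
        with open(path) as f:
            return local_timer_counts(f.read())

    before = sample()
    time.sleep(interval)
    after = sample()
    return [(a - b) / interval for a, b in zip(after, before)]
```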

irqaffinity

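Setting interrupt affinity so that interrupts are handled by non-isolated CPUs reduces jitter on the isolated CPUs; the irqaffinity=cpulist boot parameter sets the default IRQ affinity. At the same time, disable irqbalance: systemctl stop irqbalance.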
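Besides the irqaffinity= boot parameter, existing IRQs can be moved at runtime through /proc/irq/N/smp_affinity_list. A minimal sketch, assuming root privileges and a housekeeping cpulist such as "0-1" (a placeholder value); some interrupts, such as per-CPU or kernel-managed ones, refuse to move, so failures are collected rather than treated as fatal. The function name is hypothetical.

```python
from pathlib import Path


def move_irqs(housekeeping: str, root: str = "/proc/irq") -> dict:
    """Write `housekeeping` (a cpulist like "0-1") to every /proc/irq/N/smp_affinity_list.

    Returns {irq: error message} for IRQs whose affinity could not be changed.
    """
    failed = {}
    for irq_dir in Path(root).iterdir():
        if not irq_dir.name.isdigit():  # skip default_smp_affinity and similar files
            continue
        try:
            (irq_dir / "smp_affinity_list").write_text(housekeeping)
        except OSError as e:  # e.g. EIO for interrupts that cannot be moved
            failed[int(irq_dir.name)] = e.strerror or str(e)
    return failed
```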

rcu_nocbs

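First, a brief introduction to RCU (Read-Copy Update). It is a high-performance synchronization mechanism for read-mostly workloads: readers of RCU-protected shared data can access it without taking a lock (fast). A writer first makes a copy, modifies the copy, and then switches the pointer to the new version; the old version is freed later through a callback, once all readers that might still be using it have finished (a grace period). Writes are therefore comparatively slow.

rcu_nocbs=cpulist marks the listed CPUs as no-callback (nocb) CPUs: their RCU callback work is offloaded to kernel threads that can run on other CPUs, reducing the load on the nocb CPUs. With rcu_nocb_poll, those offload kthreads poll for callbacks themselves, so the offloaded CPUs no longer have to wake them up for callback work.

RCU callbacks are normally processed per CPU (in the RCU softirq, or in per-CPU rcuc kthreads on PREEMPT_RT kernels); for nocb CPUs they are handled by rcuo kthreads (for example rcuop/N), which can be moved to housekeeping CPUs.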
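With rcu_nocbs, the callback work of the listed CPUs ends up in rcuo kthreads, which can be moved onto housekeeping CPUs. A sketch that finds kthreads by name prefix via /proc/PID/comm and pins them with os.sched_setaffinity; it assumes root, and the prefix "rcuo" and the helper names are assumptions for illustration. Per-CPU kthreads such as rcuc/N are bound to their CPU and cannot be moved this way.

```python
import os
from pathlib import Path


def find_kthreads(prefix: str, proc: str = "/proc") -> dict:
    """Map pid -> comm for tasks whose name starts with `prefix` (e.g. "rcuo")."""
    found = {}
    for d in Path(proc).iterdir():
        if not d.name.isdigit():
            continue
        try:
            comm = (d / "comm").read_text().strip()
        except OSError:  # the task exited while we were scanning
            continue
        if comm.startswith(prefix):
            found[int(d.name)] = comm
    return found


def pin(pids, cpus: set) -> None:
    """Restrict each pid to `cpus`, e.g. the housekeeping CPUs (needs root for kthreads)."""
    for pid in pids:
        os.sched_setaffinity(pid, cpus)
```

For example, `pin(find_kthreads("rcuo"), {0, 1})` would move the offload threads to CPUs 0 and 1 on a system where those are the housekeeping CPUs.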

Clock Settings

clocksource=tsc tsc=reliable hpet=disable

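After setting tsc=reliable there is no need to also set tsc=nowatchdog, because marking the TSC as reliable already stops the clocksource watchdog from verifying it. See the code: tsc.c - arch/x86/kernel/tsc.c - Linux source code (v5.14.3) - Bootlin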
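After changing these parameters, the active clocksource can be confirmed in sysfs. Since tsc=reliable tells the kernel to trust the TSC, it is also worth checking first that the CPU advertises an invariant TSC. A sketch, assuming the usual clocksource0 sysfs directory and treating the constant_tsc and nonstop_tsc flags in /proc/cpuinfo as the invariant-TSC indicator (an assumption, not a complete reliability test); the helper names are hypothetical.

```python
from pathlib import Path

CS_ROOT = "/sys/devices/system/clocksource/clocksource0"


def clocksource_info(root: str = CS_ROOT) -> tuple:
    """Return (current, available) clocksources as reported by sysfs."""
    base = Path(root)
    current = (base / "current_clocksource").read_text().strip()
    available = (base / "available_clocksource").read_text().split()
    return current, available


def invariant_tsc_flags(cpuinfo: str) -> set:
    """Which of constant_tsc / nonstop_tsc appear in the first "flags" line of cpuinfo."""
    for line in cpuinfo.splitlines():
        if line.startswith("flags"):
            return {"constant_tsc", "nonstop_tsc"} & set(line.split(":", 1)[1].split())
    return set()
```

For example, `invariant_tsc_flags(open("/proc/cpuinfo").read())` should return both flags before tsc=reliable is added.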

skew_tick=1

Ensures that the per-CPU ticks do not occur simultaneously, which decreases the potential for lock contention.

Tuned

tuned is a set of system tuning tools. It supports the following profiles:

Available profiles:
- accelerator-performance     - Throughput performance based tuning with disabled higher latency STOP states
- atomic-guest                - Optimize virtual guests based on the Atomic variant
- atomic-host                 - Optimize bare metal systems running the Atomic variant
- balanced                    - General non-specialized tuned profile
- cpu-partitioning            - Optimize for CPU partitioning
- default                     - Legacy default tuned profile
- desktop                     - Optimize for the desktop use-case
- desktop-powersave           - Optmize for the desktop use-case with power saving
- enterprise-storage          - Legacy profile for RHEL6, for RHEL7, please use throughput-performance profile
- hpc-compute                 - Optimize for HPC compute workloads
- intel-sst                   - Configure for Intel Speed Select Base Frequency
- laptop-ac-powersave         - Optimize for laptop with power savings
- laptop-battery-powersave    - Optimize laptop profile with more aggressive power saving
- latency-performance         - Optimize for deterministic performance at the cost of increased power consumption
- mssql                       - Optimize for MS SQL Server
- network-latency             - Optimize for deterministic performance at the cost of increased power consumption, focused on low latency network performance
- network-throughput          - Optimize for streaming network throughput, generally only necessary on older CPUs or 40G+ networks
- optimize-serial-console     - Optimize for serial console use.
- oracle                      - Optimize for Oracle RDBMS
- postgresql                  - Optimize for PostgreSQL server
- powersave                   - Optimize for low power consumption
- realtime                    - Optimize for realtime workloads
- realtime-virtual-guest      - Optimize for realtime workloads running within a KVM guest
- realtime-virtual-host       - Optimize for KVM guests running realtime workloads
- sap-hana                    - Optimize for SAP HANA
- sap-netweaver               - Optimize for SAP NetWeaver
- server-powersave            - Optimize for server power savings
- spectrumscale-ece           - Optimized for Spectrum Scale Erasure Code Edition Servers
- spindown-disk               - Optimize for power saving by spinning-down rotational disks
- throughput-performance      - Broadly applicable tuning that provides excellent performance across a variety of common server workloads
- virtual-guest               - Optimize for running inside a virtual guest
- virtual-host                - Optimize for running KVM guests

Switch profile: tuned-adm profile <name>

Show the current profile: tuned-adm active

In practice we used the realtime profile.

Test Tools

We mainly used cyclictest and oslat.

实时响应测试工具之Cyclictest (Real-time response test tools: Cyclictest), 白杨谷's blog, CSDN
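To illustrate what the two tools measure: cyclictest repeatedly sleeps until an absolute deadline and records how late the thread wakes up, while oslat busy-loops reading the clock and records the largest gap, i.e. time taken away from the loop by the OS. The sketch below mimics both ideas in Python under the assumption of a Linux host. It is a rough illustration, not a substitute for the real tools: interpreter overhead dominates, and it uses relative sleeps rather than the absolute clock_nanosleep timers cyclictest uses by default. try_sched_fifo and the other names are hypothetical.

```python
import os
import time


def try_sched_fifo(priority: int = 80) -> bool:
    """Switch this process to SCHED_FIFO, as cyclictest -p does; needs root/CAP_SYS_NICE."""
    try:
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(priority))
        return True
    except (AttributeError, OSError):  # non-Linux, or not permitted
        return False


def cyclic_latency(interval_us: int = 1000, loops: int = 1000) -> list:
    """cyclictest-style loop: wake at fixed absolute deadlines, record lateness in ns."""
    interval_ns = interval_us * 1000
    deadline = time.monotonic_ns() + interval_ns
    latencies = []
    for _ in range(loops):
        remaining = deadline - time.monotonic_ns()
        if remaining > 0:
            time.sleep(remaining / 1e9)  # relative sleep towards the absolute deadline
        latencies.append(time.monotonic_ns() - deadline)
        deadline += interval_ns  # next deadline does not depend on how late we woke
    return latencies


def busy_loop_max_gap(duration_s: float = 1.0) -> int:
    """oslat-style: spin reading the clock and return the largest gap (ns) between reads."""
    now = time.monotonic_ns()
    end = now + int(duration_s * 1e9)
    worst = 0
    while now < end:
        prev, now = now, time.monotonic_ns()
        worst = max(worst, now - prev)
    return worst
```

For example, after calling try_sched_fifo() on an isolated CPU, the min, average and max of cyclic_latency(1000, 10000) can be compared with and without the boot parameters.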

Experimental Results

[Figure: normal VM without additional boot params]

[Figure: with additional boot params]
