How can a function be run reliably every 5 ms in C/C++?

Latest recommended article published on 2024-04-27 01:33:13

土豆西瓜大芝麻

Tags: C, microcontroller, programming language

Original link: https://www.zhihu.com/question/536739862

Link: https://www.zhihu.com/question/536739862/answer/2524011929

1. The multimedia timer on Windows

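Absolutely accurate timing at the 5 ms level is impossible, but it is achievable if some deviation is tolerated. The way to do it, however, is not Sleep or the C++ library but the now rarely used multimedia timer in the Win32 API, that is, the timeBeginPeriod / timeSetEvent family of calls.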

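The multimedia timer lets you specify a timer resolution, with a minimum of 1 ms. When the requested resolution is finer than the period of the hardware timer the operating system is currently using, the hardware timer is reprogrammed to deliver the requested resolution. At the same time, parameters inside the kernel are adjusted so that everything else, such as the length of the thread scheduling time slice, stays unaffected.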

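Because multimedia timer events are generated by independent hardware, they have essentially no accumulated error. The callback is entered in user mode after only light processing in the interrupt service routine, so the latency from the interrupt to the entry of the user-mode callback is short. However, the user-mode code in the callback can still be preempted by a higher-priority thread, so even a short and simple callback has no absolute guarantee of finishing before the next event arrives. Note also that only a very limited set of tasks may be performed inside a multimedia timer callback.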
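As a rough illustration of the API family described above, the following sketch sets up a periodic 5 ms multimedia timer. It is a minimal example under assumptions, not a reference implementation: clamp_resolution, on_tick, and run_multimedia_timer_demo are names made up here, the Windows-only part is guarded by _WIN32, and on Windows the program has to be linked against winmm.lib.

```cpp
#include <algorithm>

// Hypothetical helper: keep a requested resolution inside the range the
// device reports (on Windows, the values filled in by timeGetDevCaps).
unsigned clamp_resolution(unsigned requested_ms, unsigned min_ms,
                          unsigned max_ms) {
  return std::min(std::max(requested_ms, min_ms), max_ms);
}

#ifdef _WIN32
#define NOMINMAX
#include <windows.h>
#include <mmsystem.h>
#include <atomic>

static std::atomic<long> g_ticks{0};

// Runs on a system-owned thread; keep the work here very small.
static void CALLBACK on_tick(UINT, UINT, DWORD_PTR, DWORD_PTR, DWORD_PTR) {
  g_ticks.fetch_add(1, std::memory_order_relaxed);
}

// Runs a periodic 5 ms multimedia timer for about one second.
// Returns the number of callbacks seen, or -1 on failure.
long run_multimedia_timer_demo() {
  TIMECAPS caps;
  if (timeGetDevCaps(&caps, sizeof(caps)) != TIMERR_NOERROR) return -1;
  const UINT res = clamp_resolution(1, caps.wPeriodMin, caps.wPeriodMax);
  if (timeBeginPeriod(res) != TIMERR_NOERROR) return -1;

  const MMRESULT id = timeSetEvent(5, res, on_tick, 0,
                                   TIME_PERIODIC | TIME_CALLBACK_FUNCTION);
  if (id == 0) {  // timeSetEvent returns 0 when it fails
    timeEndPeriod(res);
    return -1;
  }
  Sleep(1000);
  timeKillEvent(id);
  timeEndPeriod(res);  // must match the earlier timeBeginPeriod call
  return g_ticks.load();
}
#endif
```

With a 5 ms period and a one-second run, a count near 200 would be the expected ballpark, although the exact number depends on the machine and its load.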

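To be clear, changing the timer resolution through the multimedia timer does not affect the length of the Windows thread scheduling time slice; these are two separate things. For example, suppose the hardware timer originally interrupts every 15 ms, and on each interrupt the system checks whether another thread is waiting to be switched in. If the multimedia timer API is now asked for one event per millisecond, the hardware timer is reprogrammed to interrupt every 1 ms. The operating system services every interrupt and calls the multimedia timer event handler every millisecond, but the code responsible for thread time slices counts 15 interrupts before it considers the current thread's time slice used up and performs thread scheduling.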
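The bookkeeping described above can be illustrated with a toy model. This is only a simulation of the counting argument, not Windows code; TickStats and simulate are hypothetical names, and the 15-interrupt quantum is the figure from the example in the text.

```cpp
struct TickStats {
  int timer_callbacks;   // multimedia timer events delivered
  int quantum_expiries;  // times the scheduler declared the time slice used up
};

// Toy model: the hardware timer interrupts every `interrupt_ms`, each
// interrupt delivers one timer event, and the scheduler only ends a time
// slice after `interrupts_per_quantum` interrupts have been counted.
TickStats simulate(int total_ms, int interrupt_ms, int interrupts_per_quantum) {
  TickStats stats{0, 0};
  int since_last_switch = 0;
  for (int t = interrupt_ms; t <= total_ms; t += interrupt_ms) {
    ++stats.timer_callbacks;
    if (++since_last_switch == interrupts_per_quantum) {
      ++stats.quantum_expiries;
      since_last_switch = 0;
    }
  }
  return stats;
}
```

Over 150 ms, simulate(150, 15, 1) gives 10 callbacks and 10 quantum expiries, while simulate(150, 1, 15) gives 150 callbacks but still 10 quantum expiries: the timer fires fifteen times as often and the time slice keeps its length.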

2. Sleep tests on Linux

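A system that is not hard real-time indeed cannot provide absolutely precise and reliable timing. In engineering practice, however, the years of work that have gone into operating systems and compilers should not be underestimated. A hand-rolled timer built on a hand-rolled thread pool was tested with (number of cores × 100) timers at a 20 ms interval: a single thread sleeps, works out which timers have expired, and dispatches the callbacks to the thread pool. The error measured inside the callbacks does not exceed 2% (1.6% in practice), and the setup stays stable in a NUMA environment. So at least on Linux, as long as system load is not too high, sleep-based timing is basically reliable.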

The timer test, written in C++ with the standard library:

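static constexpr auto kInterval = 5ms;  // needs: using namespace std::chrono_literals;
static constexpr auto kDuration = 1min;
auto now = std::chrono::high_resolution_clock::now();
auto tick = now;
auto final = now + kDuration;
while (now < final) {
  tick += kInterval;  // advance an absolute deadline so errors do not accumulate
  std::this_thread::sleep_until(tick);
  now = std::chrono::high_resolution_clock::now();  // refresh; otherwise the loop never ends
}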

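Not only does this use sleep and so incur context switches, it also goes through the standard library API rather than the system API. Guess how it performs?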

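It was built with a plain CMake Release configuration, without tricks such as -march=native, and run in user mode with no core pinning, priority changes, or real-time scheduling.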

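The error is computed as follows: after each tick wakes up, take the difference between the current time and that tick's theoretical time and append it to an array. The statistics are computed and printed only after the full duration has run, so that printing does not disturb the measurement.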

The test environments were:

OS: WSL2 Archlinux, kernel: 5.10.102.1-microsoft-standard-WSL2;
OS: Windows 11 Pro 22H2, kernel: 25145.1000;
OS: Ubuntu 18.04.5 LTS, kernel: 4.15.0-180-generic;
OS: Ubuntu 18.04.5 LTS, kernel: 4.14.193-rt92-tegra preempt-rt

CPU	Total system load	OS	Compiler	Std error (us)	Max error (us)	95% error (us)	99% error (us)
i9-9900k	45%	WSL2 Archlinux	gcc 11.2.0	11.648	986.400	208.000	253.600
i9-9900k	45%	WSL2 Archlinux	clang 13.0.1	11.408	1421.300	177.600	237.300
i9-9900k	45%	Windows 11 22H2	MSVC 2019	7228.230	16648.700	14278.500	15335.900
i9-9900k	45%	Windows 11 22H2	clang 14.0.0	8871.749	16475.500	14771.200	15519.200
Xeon 5217x2	5%	Ubuntu 18.04 LTS	gcc 7.5.0	57.609	73.106	58.512	59.794
ARMv8	0.5%	Ubuntu 18.04 preempt-rt	gcc 7.5.0	89.910	128.747	97.195	101.769

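Windows with MSVC does poorly: the timeline is off for the entire run, because the default time slice on Windows is 15.625 ms. The game-server code mentioned in another answer sets the time slice to 1 ms through timeBeginPeriod, but at least on my Windows 11 Pro this has no effect: after the call, the time slice reported by GetSystemTimeAdjustment is still 15.625 ms. Windows Server may be required.

The biggest surprise is that WSL2 does quite well, even though it is essentially a Hyper-V virtual machine. The 99% error is around 5% of the interval, so this precision is basically usable.

Linux does very well: the 99% error stays within 1.2% of the interval, so this precision is entirely usable.

The Linux preempt-rt soft real-time kernel actually does worse. I am not sure whether that is because the ARM CPU is somewhat slower, but sleep should not be bound by CPU performance.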
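The percentages quoted above are consistent with dividing the 99% error from the table by the 5 ms interval. The author does not spell out the formula, so this is an interpretation:

```latex
\[
\text{relative error} = \frac{e_{99\%}}{T}, \qquad T = 5\,\text{ms} = 5000\,\mu\text{s}
\]
\[
\text{WSL2 (gcc)}: \frac{253.6}{5000} \approx 5.1\%, \qquad
\text{Ubuntu (Xeon)}: \frac{59.794}{5000} \approx 1.2\%
\]
```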

Now, what if we add 100 us of spinning to each tick, that is, sleep for 4900 us and then spin until the deadline?

static constexpr auto kInterval = 5ms;
static constexpr auto kSpin = 100us;
static constexpr auto kDuration = 1min;
auto now = std::chrono::high_resolution_clock::now();
auto tick = now;
auto final = now + kDuration;
while (now < final) {
  tick += kInterval;
  std::this_thread::sleep_until(tick - kSpin);  // sleep for most of the interval
  now = std::chrono::high_resolution_clock::now();
  while (now < tick) {  // spin for the last 100 us
    std::this_thread::yield();
    now = std::chrono::high_resolution_clock::now();
  }
}

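Author: 诸葛不亮
Link: https://www.zhihu.com/question/536739862/answer/2543502653
Source: Zhihu
Copyright belongs to the author. For commercial reprints, contact the author for authorization; for non-commercial reprints, cite the source.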

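One issue here is that yield may suspend the thread and cause a context switch. However, after commenting that line out, the results on Linux and Windows did not change significantly, and the same holds for the timeBeginPeriod version added later.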

CPU	Total system load	OS	Compiler	Std error (us)	Max error (us)	95% error (us)	99% error (us)
i9-9900k	45%	WSL2 Archlinux	gcc 11.2.0	48.701	570.700	96.300	151.500
i9-9900k	45%	WSL2 Archlinux	clang 13.0.1	50.495	576.400	106.000	154.600
i9-9900k	45%	Windows 11 22H2	MSVC 2019	8859.809	18281.700	14557.600	15351.600
i9-9900k	45%	Windows 11 22H2	clang 14.0.0	8933.111	16552.800	14747.900	15478.800
Xeon 5217x2	5%	Ubuntu 18.04 LTS	gcc 7.5.0	0.362	18.427	0.558	0.690
ARMv8	8%	Ubuntu 18.04 preempt-rt	gcc 7.5.0	0.738	14.916	0.815	0.893

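Remarkably, the error on Linux drops straight to about one part in ten thousand of the interval, which is as good as no error.

Windows is still limited by the 15.625 ms time slice and cannot guarantee precision.

On WSL2 the average error rises slightly, but the error fluctuation and the maximum error drop markedly and the run is smoother overall: the error stays within about 3% in 99% of cases. If an occasional 1% of outliers is acceptable, this is also entirely usable.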

The complete code is as follows:

#include <algorithm>
#include <chrono>
#include <cmath>
#include <cstdlib>
#include <iomanip>
#include <iostream>
#include <thread>
#include <vector>

using namespace std::literals::chrono_literals;

static const constexpr auto kInterval = 5ms;
static const constexpr auto kSpin = 100us;

int main(int argc, char* argv[]) {
  std::chrono::nanoseconds duration = 1min;
  if (argc > 1) {
    duration = std::chrono::seconds(std::atoi(argv[1]));
  }
  int ticks = duration / kInterval;
  std::vector<double> errors;
  errors.reserve(ticks);

  int digits = 0;
  do {
    ticks /= 10;
    ++digits;
  } while (ticks > 0);

  auto now = std::chrono::high_resolution_clock::now();
  auto begin = now;
  auto prev = now;
  auto tick = now;
  auto final = now + duration + kInterval;
  begin += kInterval;  // run one extra tick as warm-up
  int i = 0;
  while (now < final) {
    prev = now;
    tick += kInterval;
    std::this_thread::sleep_until(tick - kSpin);
    now = std::chrono::high_resolution_clock::now();
    while (now < tick) {
      std::this_thread::yield();
      now = std::chrono::high_resolution_clock::now();
    }
    auto error =
        std::chrono::duration_cast<std::chrono::duration<double, std::micro>>(
            now - tick);
    errors.push_back(error.count());
    ++i;
  }
  errors.erase(errors.begin());  // drop the first (warm-up) tick from the statistics

  std::ios::sync_with_stdio(false);
  std::cout.tie(nullptr);
  std::cout.precision(3);
  std::cout.setf(std::ios::fixed);
  double std_err = 0.0;
  double max_err = 0.0;
  for (size_t i = 0; i < errors.size(); ++i) {
    std_err += std::pow(errors.at(i), 2);
    max_err = std::max(std::abs(max_err), std::abs(errors.at(i)));
    std::cout << "tick " << std::setw(digits) << (i + 1)
              << " error: " << std::setw(8) << errors.at(i) << "us"
              << std::endl;
  }
  std_err = std::sqrt(std_err / errors.size());
  for (double& error : errors) {
    error = std::abs(error);
  }
  std::sort(errors.begin(), errors.end());
  double most_95 = errors.at(errors.size() * 0.95);
  double most_99 = errors.at(errors.size() * 0.99);
  auto elapsed =
      std::chrono::duration_cast<std::chrono::duration<double>>(now - begin);
  std::cout << "total " << errors.size() << " ticks in " << elapsed.count()
            << "s" << std::endl;
  std::cout << "std error: " << std_err << "us" << std::endl;
  std::cout << "max error: " << max_err << "us" << std::endl;
  std::cout << "95% error: " << most_95 << "us" << std::endl;
  std::cout << "99% error: " << most_99 << "us" << std::endl;

  return 0;
}

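After checking MSDN: with the time slice set to 1 ms via timeBeginPeriod and a system-level timer created with timeSetEvent, reasonably good performance was achieved, with the error of most ticks reduced to under 1 ms.

(So GetSystemTimeAdjustment still returning 156250, which is 15.625 ms in 100 ns units, was just messing with me?)

Strangely, though, between roughly tick 2000 and tick 10500 the interval went back to 15.625 ms; only about the first and last 10 s were normal.

The sleep version behaves much the same after calling timeBeginPeriod: it runs stably for only about 10 s, and the statistics are basically the same.

Because the run was so unstable, the error calculation was changed to the time difference between two consecutive ticks minus the 5 ms interval. The statistics for a ten-second run are: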

total 2000 ticks in 9.999s
std error: 436.408us
max error: 1361.100us
95% error: 855.100us
99% error: 960.000us

The average error is around 10%, and at least no tick was lost: exactly 2000 ticks fired in 10 s. So it is barely usable...
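The modified error metric (gap between consecutive ticks minus the nominal interval) could be computed along these lines. This is a sketch under assumptions, not the author's code: interval_errors and Micros are hypothetical names, and the timestamps are assumed to have been collected once per tick.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

using Micros = std::chrono::duration<double, std::micro>;

// Error of each gap between consecutive ticks relative to the nominal
// period, in microseconds. Positive means the gap was longer than the period.
template <class TimePoint>
std::vector<double> interval_errors(const std::vector<TimePoint>& stamps,
                                    std::chrono::nanoseconds period) {
  std::vector<double> errors;
  for (std::size_t i = 1; i < stamps.size(); ++i) {
    const auto gap = stamps[i] - stamps[i - 1];
    errors.push_back(Micros(gap - period).count());
  }
  return errors;
}
```

Unlike the error against an absolute schedule, this metric does not grow when one late tick shifts every later tick, which suits a run whose timeline is unstable.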

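Update, 2023-03-03

The hand-rolled simple thread pool and timer have been through several rounds of optimization. I recently found time to run a benchmark, so here is the updated data.

Timer scheduling still uses the first approach from the beginning: sleep directly until the nearest tick, then dispatch to the thread pool. In practice, as long as nothing else on the system is fighting for the CPU, precision at the 5 ms level does not need hard real-time at all.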
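The author's thread pool and timer are not shown, so the following is only a guess at the general shape of "sleep until the nearest tick, then dispatch". Timer, Later, Dispatch, and run_timers are hypothetical names; the dispatcher is a plain function object that a real implementation would back with a thread pool.

```cpp
#include <chrono>
#include <functional>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

using Clock = std::chrono::steady_clock;

struct Timer {
  Clock::time_point deadline;      // next time this timer should fire
  Clock::duration interval;        // period between firings
  int remaining;                   // how many more times to fire (>= 1)
  std::function<void()> callback;
};

struct Later {  // orders the queue so the earliest deadline is on top
  bool operator()(const Timer& a, const Timer& b) const {
    return a.deadline > b.deadline;
  }
};

// Hypothetical dispatcher: a real one would post the task to a thread pool;
// a function that simply calls the task runs it on the scheduling thread.
using Dispatch = std::function<void(std::function<void()>)>;

// Runs on one scheduling thread until every timer has fired `remaining` times.
void run_timers(std::vector<Timer> timers, const Dispatch& dispatch) {
  std::priority_queue<Timer, std::vector<Timer>, Later> queue(
      Later{}, std::move(timers));
  while (!queue.empty()) {
    Timer next = queue.top();
    queue.pop();
    std::this_thread::sleep_until(next.deadline);  // sleep to the nearest tick
    dispatch(next.callback);
    if (--next.remaining > 0) {
      next.deadline += next.interval;  // absolute schedule, no accumulated drift
      queue.push(std::move(next));
    }
  }
}
```

Rescheduling from the previous deadline rather than from the wake-up time keeps a late wake-up from pushing back every later tick.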

1. Desktop

CPU: Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz 8C16T
System: Arch Linux
Kernel-release: Linux 6.1.12-zen1-1-zen
Kernel-version: #1 ZEN SMP PREEMPT_DYNAMIC Tue, 14 Feb 2023 22:08:11 +0000

timers	interval	ticks	standard error
Single Shot
16	20ms	1	0.064172%
160	20ms	1	0.048121%
1600	20ms	1	0.052030%
------	--------	-----	--------------
Sync Single Shot
32	20ms	1	0.015647%
320	20ms	1	0.018143%
3200	20ms	1	0.019471%
------	--------	-----	--------------
Normal Timer
16	20ms	100	0.094364%
160	20ms	100	0.049935%
1600	20ms	100	0.047018%
------	--------	-----	--------------
Sync Normal Timer
16	20ms	100	0.009787%
160	20ms	100	0.024087%
1600	20ms	100	0.037850%
------	--------	-----	--------------
Light Timer
16	20ms	100	0.079792%
160	20ms	100	0.047847%
1600	20ms	100	0.066415%
------	--------	-----	--------------
Sync Light Timer
16	20ms	100	0.046064%
160	20ms	100	0.018828%
1600	20ms	100	0.018925%

2. Desktop with NUMA

CPU: Intel(R) Xeon(R) Gold 5217 CPU @ 3.00GHz 8C16Tx2
System: Ubuntu 18.04.5 LTS
Kernel-release: Linux 4.15.0-201-generic
Kernel-version: #212-Ubuntu SMP Mon Nov 28 11:29:59 UTC 2022

timers	interval	ticks	standard error
Single Shot
32	20ms	1	0.083189%
320	20ms	1	0.068409%
3200	20ms	1	0.065791%
------	--------	-----	--------------
Sync Single Shot
32	20ms	1	0.013066%
320	20ms	1	0.013754%
3200	20ms	1	0.018617%
------	--------	-----	--------------
Normal Timer
32	20ms	100	0.011256%
320	20ms	100	0.027207%
3200	20ms	100	0.822226%
------	--------	-----	--------------
Sync Normal Timer
32	20ms	100	0.008387%
320	20ms	100	0.089677%
3200	20ms	100	0.050699%
------	--------	-----	--------------
Light Timer
32	20ms	100	0.013912%
320	20ms	100	0.012924%
3200	20ms	100	0.066572%
------	--------	-----	--------------
Sync Light Timer
32	20ms	100	0.008739%
320	20ms	100	0.009238%
3200	20ms	100	0.094552%

3. Embedded Industrial PC

CPU: ARMv8 Processor rev 0 (v8l) 7C7T
System: Ubuntu 18.04.5 LTS
Kernel-release: Linux 4.14.193-rt92-tegra
Kernel-version: #1 SMP PREEMPT RT Mon Apr 19 02:11:35 PDT 2021

timers	interval	ticks	standard error
Single Shot
7	20ms	1	0.070154%
70	20ms	1	0.136009%
700	20ms	1	0.248653%
------	--------	-----	--------------
Sync Single Shot
7	20ms	1	0.071237%
70	20ms	1	0.068406%
700	20ms	1	0.087798%
------	--------	-----	--------------
Normal Timer
7	20ms	100	0.077910%
70	20ms	100	0.078375%
700	20ms	100	0.082158%
------	--------	-----	--------------
Sync Normal Timer
7	20ms	100	0.086753%
70	20ms	100	0.078405%
700	20ms	100	0.087344%
------	--------	-----	--------------
Light Timer
7	20ms	100	0.070703%
70	20ms	100	0.064910%
700	20ms	100	0.084942%
------	--------	-----	--------------
Sync Light Timer
7	20ms	100	0.086785%
70	20ms	100	0.060398%
700	20ms	100	0.085978%

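Light Timer is a lightweight timer, similar to QObject::startTimer: there is only an id and no timer object to manage.

Sync Timer uses synchronous execution: the callback runs in place on the scheduling thread instead of being dispatched to the thread pool.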

3. A game-programming approach

Here is an approach used in game programming. The rough code is below (for Windows; on other systems, replace time_begin_period and sleep with the corresponding functions):

time_begin_period(1); // wrapper around timeBeginPeriod from winmm.dll on Windows

while(true) {
    auto start_ticks = get_current_ticks(); // time before execution

    execute(); // target function
    
    auto end_ticks = get_current_ticks(); // time after execution
    auto cost_ticks = end_ticks - start_ticks; // time spent on this frame
    auto wait_ticks = milliseconds_to_ticks(5) - cost_ticks; // time left to wait (assumes a signed tick type)
    if(wait_ticks <= 0) {
        continue; // the frame used up its budget or overran; start the next frame at once
    }
    if(wait_ticks > milliseconds_to_ticks(1)) {
        sleep(ticks_to_milliseconds(wait_ticks) - 1); // more than 1 ms left: sleep, keeping 1 ms for the spin below
    }
    auto next_ticks = start_ticks + milliseconds_to_ticks(5); // expected start of the next frame
    while (get_current_ticks() < next_ticks) {
        // empty: busy-wait
    }
}

This sacrifices a small amount of performance, but under low load it keeps the rate basically stable at 200 frames per second.
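For systems without the Windows wrappers, the same sleep-then-spin idea can be sketched with std::chrono alone. This is a portable variant under assumptions, not the code above: run_frames is a hypothetical name, the 1 ms spin margin is copied from the loop above, and deadlines are absolute rather than measured from each frame's start.

```cpp
#include <chrono>
#include <functional>
#include <thread>

using Clock = std::chrono::steady_clock;

// Calls `execute` `frames` times, one call per `interval`, and returns the
// total elapsed time. Sleeps for most of each wait, then spins for the rest.
Clock::duration run_frames(int frames, Clock::duration interval,
                           const std::function<void()>& execute) {
  const auto spin = std::chrono::milliseconds(1);  // assumed spin margin
  const auto start = Clock::now();
  auto next = start;
  for (int i = 0; i < frames; ++i) {
    execute();
    next += interval;  // absolute deadline, so overshoot does not accumulate
    if (next - Clock::now() > spin) {
      std::this_thread::sleep_until(next - spin);
    }
    while (Clock::now() < next) {
      // busy-wait for the last stretch
    }
  }
  return Clock::now() - start;
}
```

Calling run_frames(200, std::chrono::milliseconds(5), work) would aim for roughly one second of 200 frames. If a frame overruns its budget, the loop skips the wait and catches up on the following frames.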