Performance Benchmarking: Comparing the Efficiency of C++ Stack Operations and Macro Definitions

Latest recommended article published 2024-09-26 12:00:11

Author: 橘色的喵

Categories: Performance Optimization, Feature Optimization; C++. Tags: c++, macro performance benchmarking, MACRO, inline

Article link: https://blog.csdn.net/stallion5632/article/details/139138092

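This article uses a code example to compare the performance of three different ways of performing the same operation. The benchmark covers: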

A function that operates on stack temporaries, test_stack
A function declared with the always_inline attribute, test_stack_always_inline
A code block simulated with the macro TEST_MACRO

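Each of the three operations is executed in a loop kSize (1,000,000,000) times, and the elapsed times are compared. Before each test, clearCpuMem is called, which is intended to reduce interference from CPU cache state left over from the previous test. Note that it only zeroes three 1 KB stack buffers, so it does not actually flush the cache, and an optimizing compiler may remove those writes entirely.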

The full code is as follows:

#include <chrono>
#include <cstdint>  // uint64_t
#include <cstring>
#include <iostream>

using namespace std::chrono;

static const uint64_t kSize = 1000000000;  // number of loop iterations

#define TEST_MACRO \
  a = 1;           \
  b = 2;           \
  c = 3;           \
  d = 4;           \
  e = 5;           \
  f = 6;           \
  g = 7;           \
  h = 8;           \
  i = 1;           \
  j = 2;           \
  k = 33;          \
  l = 22;          \
  m = 66;          \
  n = 77;          \
  o = 22;          \
  p = 32;          \
  q = 21;          \
  r = 55;          \
  s = 43;          \
  t = 231

void __attribute__((noinline)) test_stack(int &a, int &b, int &c, int &d, int &e, int &f, int &g, int &h,
                int &i, int &j, int &k, int &l, int &m, int &n, int &o, int &p,
                int &q, int &r, int &s, int &t) {
  TEST_MACRO;
}

void __attribute__((always_inline)) test_stack_always_inline(int &a, int &b, int &c, int &d, int &e, int &f, int &g, int &h,
                int &i, int &j, int &k, int &l, int &m, int &n, int &o, int &p,
                int &q, int &r, int &s, int &t) {
  TEST_MACRO;
}

// Pointer-based variant; defined here but not exercised by main().
void test_static(int *pa_s1, int *pb_s1, int *pc_s1, int *pd_s1, int *pe_s1,
                 int *pf_s1, int *pg_s1, int *ph_s1, int *pi_s1, int *pj_s1,
                 int *pk_s1, int *pl_s1, int *pm_s1, int *pn_s1, int *po_s1,
                 int *pp_s1, int *pq_s1, int *pr_s1, int *ps_s1, int *pt_s1) {
  *pa_s1 = 1;
  *pb_s1 = 2;
  *pc_s1 = 3;
  *pd_s1 = 4;
  *pe_s1 = 5;
  *pf_s1 = 6;
  *pg_s1 = 7;
  *ph_s1 = 8;
  *pi_s1 = 1;
  *pj_s1 = 2;
  *pk_s1 = 33;
  *pl_s1 = 22;
  *pm_s1 = 66;
  *pn_s1 = 77;
  *po_s1 = 22;
  *pp_s1 = 32;
  *pq_s1 = 21;
  *pr_s1 = 55;
  *ps_s1 = 43;
  *pt_s1 = 231;
}

// Run some arbitrary instructions in an attempt to disturb the CPU cache
int clearCpuMem() {
  char strTmp[1024];
  std::memset(strTmp, 0, sizeof(strTmp));
  char strTmp2[1024];
  std::memset(strTmp2, 0, sizeof(strTmp2));
  char strTmp3[1024];
  std::memset(strTmp3, 0, sizeof(strTmp3));
  return 0;
}

// Test harness template: times kSize calls of func
template <typename Func>
void run_test(const char *test_name, Func func) {
  clearCpuMem();
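  auto start = steady_clock::now();  // monotonic clock, suited to interval timing
  // Only t is initialized here; func assigns every variable and none is read first.
  int a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t = 0;
  for (uint64_t index = 0; index < kSize; ++index) {
    func(a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t);
  }
  auto end = steady_clock::now();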
  auto duration = duration_cast<microseconds>(end - start);
  std::cout << test_name << " cost "
            << double(duration.count()) * microseconds::period::num / microseconds::period::den
            << " seconds" << std::endl;
}

int main() {
  run_test("test_stack", test_stack);
  run_test("test_stack_always_inline", test_stack_always_inline);
  run_test("mock use macro", [](int &a, int &b, int &c, int &d, int &e, int &f, int &g, int &h,
                                int &i, int &j, int &k, int &l, int &m, int &n, int &o, int &p,
                                int &q, int &r, int &s, int &t) {
    TEST_MACRO;
  });

  std::cout << "Hello World!\n";
}

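A macro is expanded during preprocessing, which amounts to text substitution. The macro-based block therefore expands directly into a series of assignment statements. In this test the macro sits inside a lambda; because every lambda has its own unique type, run_test is instantiated specifically for it and the compiler can inline the body into the loop.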

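Next come the two functions test_stack and test_stack_always_inline. They have identical bodies and differ only in their attributes: test_stack is marked noinline, so every call is a real function call, while test_stack_always_inline is marked always_inline. always_inline is a GCC/Clang attribute rather than a C++ keyword; it asks the compiler to inline the function at its call sites regardless of the usual inlining heuristics, which removes the call overhead. Note that run_test receives both functions as function pointers of the same type, so whether the always_inline body actually ends up inside the loop also depends on the compiler inlining or specializing run_test.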

Test results

test_stack cost 24.5237 seconds
test_stack_always_inline cost 21.0347 seconds
mock use macro cost 20.9262 seconds
Hello World!

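From this run we get the following ranking:

The macro-expanded block is the fastest (20.9262 s)
The always_inline function is a very close second (21.0347 s)
The noinline function test_stack is the slowest (24.5237 s)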

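The difference comes mainly from the overhead of the function call itself. A call to a non-inlined function has to set up its arguments, save and restore registers, and push and pop the stack. With 20 reference parameters, most of the arguments do not fit in registers and must be passed on the stack (on x86-64 System V, for example, only the first six integer or pointer arguments are passed in registers). The inlined function and the macro-expanded block avoid this cost, which is why they are faster and nearly identical to each other.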

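Keep in mind that this benchmark is for reference only. Real-world performance depends on many factors, such as code complexity and compiler optimization options. Still, the example gives a first impression of how function calls, inlining, and macros affect performance.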
