RDTSCP inaccuracy on Intel i7

mzguanglin

Article link: https://blog.csdn.net/mzguanglin/article/details/38445833

In 2010, Gabriele Paoloni of Intel wrote the white paper "How to Benchmark Code Execution Times on Intel® IA-32 and IA-64 Instruction Set Architectures", which describes a precise method for measuring the clock cycles needed to execute a specific piece of C code in a Linux environment using RDTSC/RDTSCP. The paper addresses three problems that can hurt measurement reliability: the instruction cache, CPU preemption (task scheduling, interrupts, and so on), and out-of-order execution. He resolves them with a combination of kernel functions (disabling preemption and interrupts around the measured code) and CPU instructions (serializing with CPUID around RDTSC and RDTSCP), and finally demonstrates an extremely reliable measurement overhead (i.e. the cost of measuring no instructions at all): a minimum of 44 cycles with a variance of only 2 to 3. That's awesome!
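The heart of the method is the fence placed around the measured region: CPUID then RDTSC at the start, RDTSCP then CPUID at the end. Below is a minimal user-space sketch of that sequence, assuming x86-64 and GCC/Clang extended inline asm. The helper names (`tsc_begin`, `tsc_end`, `min_empty_overhead`) are hypothetical, and the kernel-only preemption and interrupt disabling that the paper relies on is deliberately left out, so a user-space run like this one should be expected to be noisier than the paper's kernel module.

```c
#include <stdint.h>

/* Hypothetical helper: read the TSC at the start of a measured region.
   Assumes x86-64 and GCC/Clang extended inline asm. */
static inline uint64_t tsc_begin(void)
{
    uint32_t hi, lo;
    /* CPUID (leaf 0) serializes execution, so earlier instructions
       complete before RDTSC reads the counter. */
    __asm__ volatile("xor %%eax, %%eax\n\t"
                     "cpuid\n\t"
                     "rdtsc\n\t"
                     "mov %%edx, %0\n\t"
                     "mov %%eax, %1\n\t"
                     : "=r"(hi), "=r"(lo)
                     :
                     : "rax", "rbx", "rcx", "rdx", "memory");
    return ((uint64_t)hi << 32) | lo;
}

/* Hypothetical helper: read the TSC at the end of a measured region. */
static inline uint64_t tsc_end(void)
{
    uint32_t hi, lo;
    /* RDTSCP waits for all earlier instructions to execute before reading
       the counter; the trailing CPUID stops later instructions from
       starting before the read. RDTSCP also writes ECX (TSC_AUX). */
    __asm__ volatile("rdtscp\n\t"
                     "mov %%edx, %0\n\t"
                     "mov %%eax, %1\n\t"
                     "xor %%eax, %%eax\n\t"
                     "cpuid\n\t"
                     : "=r"(hi), "=r"(lo)
                     :
                     : "rax", "rbx", "rcx", "rdx", "memory");
    return ((uint64_t)hi << 32) | lo;
}

/* Hypothetical helper: measure an empty region `rounds` times and return
   the smallest overhead seen. No preempt_disable()/raw_local_irq_save()
   here; those are only available inside the kernel. */
static uint64_t min_empty_overhead(int rounds)
{
    uint64_t best = UINT64_MAX;
    for (int i = 0; i < rounds; i++) {
        uint64_t start = tsc_begin();
        /* code under test would go here (empty, as in the experiment) */
        uint64_t end = tsc_end();
        if (end > start && end - start < best)
            best = end - start;
    }
    return best;
}
```

Taking the minimum over many rounds also discounts the first, cache-cold iterations, which is one simple way to approximate the warm-up the paper performs.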

Since Gabriele published his source code in the paper, replicating his experiment should have been straightforward, but I couldn't get comparable results. The experiment consists of two nested loops: the inner loop measures an empty code region (no instructions) 100K times, and the outer loop repeats the inner loop 1K times. Yet even with identical source code, I couldn't get a stable result set on my Intel i7 workstation. Within a single trial, the minimum cycle count varied from 40 to 122, and the variance ranged from 138 to 15326.
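In the paper's analysis, each pass of the inner loop (an "ensemble") yields one minimum and one variance, so the outer loop produces 1K such pairs whose spread is what gets compared. A minimal sketch of that bookkeeping follows, assuming the cycle counts are stored in a row-major `outer × inner` array. The type and function names are hypothetical, and the variance here is the population variance computed with Welford's method, which may differ in detail from the paper's own routine.

```c
#include <stddef.h>
#include <stdint.h>

/* Statistics of one ensemble (one full pass of the inner loop). */
typedef struct {
    uint64_t min;    /* smallest measured cycle count */
    double variance; /* population variance of the samples */
} ensemble_stats;

/* Spread of the per-ensemble statistics across the outer loop. */
typedef struct {
    uint64_t min_of_mins, max_of_mins;
    double min_variance, max_variance;
} run_summary;

/* Min and variance of n > 0 samples, using Welford's online algorithm
   to avoid precision loss from summing squares of large values. */
static ensemble_stats compute_ensemble_stats(const uint64_t *samples, size_t n)
{
    ensemble_stats s;
    double mean = 0.0, m2 = 0.0;
    s.min = samples[0];
    for (size_t i = 0; i < n; i++) {
        double x = (double)samples[i];
        double delta = x - mean;
        mean += delta / (double)(i + 1);
        m2 += delta * (x - mean);
        if (samples[i] < s.min)
            s.min = samples[i];
    }
    s.variance = m2 / (double)n;
    return s;
}

/* times holds `outer` rows of `inner` samples each (outer, inner > 0). */
static run_summary summarize_run(const uint64_t *times, size_t outer, size_t inner)
{
    ensemble_stats first = compute_ensemble_stats(times, inner);
    run_summary r = { first.min, first.min, first.variance, first.variance };
    for (size_t j = 1; j < outer; j++) {
        ensemble_stats s = compute_ensemble_stats(times + j * inner, inner);
        if (s.min < r.min_of_mins) r.min_of_mins = s.min;
        if (s.min > r.max_of_mins) r.max_of_mins = s.min;
        if (s.variance < r.min_variance) r.min_variance = s.variance;
        if (s.variance > r.max_variance) r.max_variance = s.variance;
    }
    return r;
}
```

With `outer = 1000` and `inner = 100000`, the `min_of_mins`/`max_of_mins` and `min_variance`/`max_variance` fields correspond to the kind of ranges quoted above. Note that storing 1K × 100K 64-bit samples takes about 800 MB, so a real harness may prefer to compute each ensemble's statistics as soon as its inner loop finishes.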

The CPU on my platform is an Intel Core i7-4779K CPU @ 3.50GHz. Could some feature introduced with this i7 be hurting RDTSCP accuracy? Does anyone have any ideas?