RDTSCP inaccuracy on Intel i7

In 2010, Gabriele Paoloni of Intel wrote the white paper "How to Benchmark Code Execution Times on Intel® IA-32 and IA-64 Instruction Set Architectures", which describes precise methods for measuring the clock cycles required to execute specific C code in a Linux environment using RDTSC/RDTSCP. In the paper, he addresses three problems that can harm measurement reliability: the first is the instruction cache, the second is CPU preemption (task scheduling, interrupts, and so on), and the third is out-of-order execution. He resolves these problems with a combination of kernel functions and CPU instructions, and finally demonstrates an extremely reliable measurement overhead (i.e., measuring no instructions at all) with a minimum time of 44 cycles and a variance of 2~3. That's awesome!
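For readers who have not seen the paper, the instruction ordering it relies on can be sketched roughly as follows. This is a minimal user-space sketch, assuming x86-64 and GCC or Clang inline assembly; the helper names `tsc_begin` and `tsc_end` are made up for illustration. The paper itself runs inside a kernel module with preemption and interrupts disabled, which user-space code cannot do, so this version covers only the out-of-order execution part of the problem.

```c
#include <stdint.h>

/* Start of the measured region: CPUID acts as a serializing barrier so
 * earlier instructions cannot drift past RDTSC, then RDTSC reads the
 * time-stamp counter into EDX:EAX. */
static inline uint64_t tsc_begin(void)
{
    uint32_t hi, lo;
    __asm__ volatile(
        "cpuid\n\t"
        "rdtsc\n\t"
        "mov %%edx, %0\n\t"
        "mov %%eax, %1\n\t"
        : "=r"(hi), "=r"(lo)
        :
        : "%rax", "%rbx", "%rcx", "%rdx");
    return ((uint64_t)hi << 32) | lo;
}

/* End of the measured region: RDTSCP waits for preceding instructions to
 * execute before reading the counter; the trailing CPUID keeps later
 * instructions from being hoisted above the read. */
static inline uint64_t tsc_end(void)
{
    uint32_t hi, lo;
    __asm__ volatile(
        "rdtscp\n\t"
        "mov %%edx, %0\n\t"
        "mov %%eax, %1\n\t"
        "cpuid\n\t"
        : "=r"(hi), "=r"(lo)
        :
        : "%rax", "%rbx", "%rcx", "%rdx");
    return ((uint64_t)hi << 32) | lo;
}
```

Calling `tsc_begin()` immediately followed by `tsc_end()` and subtracting the two values gives the overhead of the measurement itself, which is the "measuring no instruction" case described above.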


Since Gabriele publishes his source code in the paper, replicating his experiment is straightforward, but I couldn't get comparable results. The experiment consists of two loops: the inner loop measures an empty code section (no instructions) 100K times, and the outer loop repeats the inner loop 1K times. However, even with identical source code, I couldn't get a stable result set on my Intel i7 workstation. In one trial, the minimum cycle count varied from 40 to 122, and the variance ranged from 138 to 15326.
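The two-loop structure and the per-loop statistics can be sketched like this. It is only an illustration under stated assumptions: `summarize`, `run_experiment`, `struct loop_stats`, and the `measure` callback are hypothetical names, and the variance here is the plain population variance, which may not match the exact statistics computed in the paper.

```c
#include <stddef.h>
#include <stdint.h>

/* Summary of one inner loop: smallest sample and population variance. */
struct loop_stats {
    uint64_t min;
    double variance;
};

/* Reduce n cycle counts (n > 0) to their minimum and population variance. */
struct loop_stats summarize(const uint64_t *samples, size_t n)
{
    struct loop_stats s = { samples[0], 0.0 };
    double mean = 0.0;

    for (size_t i = 0; i < n; i++) {
        if (samples[i] < s.min)
            s.min = samples[i];
        mean += (double)samples[i];
    }
    mean /= (double)n;

    for (size_t i = 0; i < n; i++) {
        double d = (double)samples[i] - mean;
        s.variance += d * d;
    }
    s.variance /= (double)n;
    return s;
}

/* Outer loop: run `outer` inner loops of `inner` measurements each.
 * `measure` is a hypothetical callback returning one cycle count;
 * `scratch` must hold `inner` samples and `out` must hold `outer` results. */
void run_experiment(uint64_t (*measure)(void), size_t outer, size_t inner,
                    uint64_t *scratch, struct loop_stats *out)
{
    for (size_t j = 0; j < outer; j++) {
        for (size_t i = 0; i < inner; i++)
            scratch[i] = measure();
        out[j] = summarize(scratch, inner);
    }
}
```

With `outer` set to 1000 and `inner` set to 100000, comparing `out[j].min` and `out[j].variance` across the outer iterations shows how stable the measurement is from one run of the inner loop to the next.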


The CPU in my machine is an Intel Core i7-4779K CPU @ 3.50GHz. Maybe the i7 introduces new features that affect RDTSCP? Can anybody offer some ideas?

