Measuring Time Accurately in Programs
It is quite common to measure time in programs using APIs like clock() and gettimeofday(). For some purposes we also want to measure time “accurately”, for example when measuring the execution time of a small piece of code for performance analysis, or when keeping time in time-sensitive game software. Measuring time very accurately is hard, but we can usually measure it at a granularity that is acceptable for our purpose. Let’s look at the possible methods.
gettimeofday and clock_gettime
gettimeofday and clock_gettime are POSIX APIs for getting the time. gettimeofday is easy to use, but it neither specifies nor reports the resolution of the system clock. For clock_gettime, clock_getres can be used to find out the resolution of a clock.
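A minimal C sketch of both calls, assuming a POSIX system such as Linux with glibc (glibc older than 2.17 also needs -lrt at link time). The helper names timespec_to_ns, clock_resolution_ns and monotonic_now_ns are hypothetical. It reads CLOCK_MONOTONIC, which is not affected by jumps in the wall-clock time, and asks clock_getres for that clock's resolution:

```c
#define _POSIX_C_SOURCE 200809L /* expose clock_gettime/clock_getres */
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: convert a struct timespec to nanoseconds. */
static int64_t timespec_to_ns(const struct timespec *ts)
{
    return (int64_t)ts->tv_sec * 1000000000LL + (int64_t)ts->tv_nsec;
}

/* Resolution of clock `id` in nanoseconds, or -1 if the query fails. */
int64_t clock_resolution_ns(clockid_t id)
{
    struct timespec res;
    if (clock_getres(id, &res) != 0)
        return -1;
    return timespec_to_ns(&res);
}

/* Current CLOCK_MONOTONIC reading in ns (the starting point is arbitrary,
   so only differences between two readings are meaningful). */
int64_t monotonic_now_ns(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc; /* silence unused-variable warnings when NDEBUG is set */
    return timespec_to_ns(&ts);
}
```

Usage: take int64_t t0 = monotonic_now_ns(); before the code under test and subtract t0 from a second reading afterwards. A difference smaller than clock_resolution_ns(CLOCK_MONOTONIC) carries no real information.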
On the other hand, calling gettimeofday and clock_gettime has a cost of its own. Assuming they read the time from the same source, one important factor in accuracy is the cost (time) of calling these APIs. How expensive are these calls? Is gettimeofday very slow?
A benchmark and its results by David Terei give a rough picture. I quote part of the results here, including time and ftime even though they only provide a granularity of seconds and milliseconds, respectively:
time (s) => 4ns
ftime (ms) => 39ns
gettimeofday (us) => 30ns
clock_gettime (ns) => 26ns (CLOCK_REALTIME)
clock_gettime (ns) => 8ns (CLOCK_REALTIME_COARSE)
clock_gettime (ns) => 26ns (CLOCK_MONOTONIC)
clock_gettime (ns) => 9ns (CLOCK_MONOTONIC_COARSE)
clock_gettime (ns) => 170ns (CLOCK_PROCESS_CPUTIME_ID)
clock_gettime (ns) => 154ns (CLOCK_THREAD_CPUTIME_ID)
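Per-call numbers like these are typically obtained by calling a clock API in a tight loop and dividing the total elapsed time by the number of calls. The following is a minimal C sketch of that idea, assuming a POSIX system; it is not David Terei's benchmark code, and the name avg_call_cost_ns is hypothetical. Results depend heavily on the CPU, kernel and clock source, so expect different numbers on your machine:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

static int64_t timespec_to_ns(const struct timespec *ts)
{
    return (int64_t)ts->tv_sec * 1000000000LL + (int64_t)ts->tv_nsec;
}

/* Average time, in ns, of one clock_gettime(id, ...) call measured over
   `iters` back-to-back calls. The outer timing uses CLOCK_MONOTONIC, so
   the loop overhead is included in the average. */
double avg_call_cost_ns(clockid_t id, long iters)
{
    struct timespec start, end, sink;
    assert(iters > 0);

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 0; i < iters; i++)
        clock_gettime(id, &sink); /* the call being measured */
    clock_gettime(CLOCK_MONOTONIC, &end);

    return (double)(timespec_to_ns(&end) - timespec_to_ns(&start)) / (double)iters;
}
```

For example, avg_call_cost_ns(CLOCK_PROCESS_CPUTIME_ID, 1000000) can be compared with avg_call_cost_ns(CLOCK_MONOTONIC, 1000000) to see the gap between the CPU-time clocks and the monotonic clock on a given system.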
The cost of gettimeofday is in the tens of nanoseconds. This cost, together with the fact that the actual resolution is unknown, may be acceptable for many programs. On modern Linux, these APIs are implemented in the vDSO, which avoids a system call into the kernel (see a discussion here). If a program needs a lower cost (around 10ns) and a known resolution, clock_gettime with CLOCK_MONOTONIC_COARSE or CLOCK_REALTIME_COARSE may be a good choice, at the price of a coarser resolution (typically one kernel tick), which clock_getres reports.
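A minimal sketch of using the cheaper coarse clock, assuming Linux with glibc: CLOCK_MONOTONIC_COARSE is Linux-specific, hence _GNU_SOURCE and the #ifdef fallback to CLOCK_MONOTONIC elsewhere. The names CHEAP_CLOCK, coarse_now_ns and coarse_resolution_ns are hypothetical. Because a coarse clock only advances once per kernel tick, clock_getres typically reports milliseconds rather than nanoseconds for it:

```c
#define _GNU_SOURCE /* glibc: expose Linux-specific clock IDs */
#include <assert.h>
#include <stdint.h>
#include <time.h>

#ifdef CLOCK_MONOTONIC_COARSE
#define CHEAP_CLOCK CLOCK_MONOTONIC_COARSE
#else
#define CHEAP_CLOCK CLOCK_MONOTONIC /* fallback: finer but slower */
#endif

static int64_t timespec_to_ns(const struct timespec *ts)
{
    return (int64_t)ts->tv_sec * 1000000000LL + (int64_t)ts->tv_nsec;
}

/* Cheap timestamp in ns. With a *_COARSE clock, reads taken within the
   same kernel tick return the same value. */
int64_t coarse_now_ns(void)
{
    struct timespec ts;
    int rc = clock_gettime(CHEAP_CLOCK, &ts);
    assert(rc == 0);
    (void)rc;
    return timespec_to_ns(&ts);
}

/* Granularity of CHEAP_CLOCK in ns, or -1 if the query fails. */
int64_t coarse_resolution_ns(void)
{
    struct timespec res;
    if (clock_getres(CHEAP_CLOCK, &res) != 0)
        return -1;
    return timespec_to_ns(&res);
}
```

This trade-off suits code that reads the time very often but only needs millisecond-level precision, such as frame pacing or timeouts.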
For even higher resolution, rdtsc is worth considering.
rdtsc and rdtscp
rdtsc is an instruction, supported since Pentium-class CPUs, that reads the current time stamp counter (TSC), which is incremented on every CPU tick (1/CPU_HZ). The TSC is a 64-bit register on x86 processors. PowerPC provides a similar capability. The TSC and rdtsc allow time to be measured accurately.
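A minimal sketch of reading the TSC from C, assuming GCC or Clang on x86 or x86-64 (the inline-assembly syntax is compiler-specific). rdtsc returns the 64-bit counter split across EDX (high half) and EAX (low half), which the function recombines. The name read_tsc is hypothetical, and on other architectures the sketch falls back to clock_gettime nanoseconds, which are not cycle counts:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Read the time stamp counter on x86; elsewhere, a nanosecond clock. */
static inline uint64_t read_tsc(void)
{
#if defined(__x86_64__) || defined(__i386__)
    uint32_t lo, hi;
    /* rdtsc loads the low 32 bits into EAX and the high 32 bits into EDX. */
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
#else
    /* Fallback for non-x86 targets: nanoseconds, NOT cycles. */
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
#endif
}
```

Usage: uint64_t c0 = read_tsc(); before the code under test and read_tsc() - c0 afterwards gives an elapsed tick count, subject to the caveats discussed next.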
There are a couple of good implementations of rdtsc in C/asm on the Web that you can check: Time-stamp counter, cycle.h and Pentium Time Stamp Counter.
Everything has two sides. If you use rdtsc in your program, pay special attention to its drawbacks.
First, because of out-of-order execution, rdtsc instructions may not be performed in the order in which they appear in the executable. This can make an rdtsc execute earlier or later than intended relative to the code being measured, producing a misleading cycle count. Here is an example from Using the RDTSC Instruction for Performance Monitoring:
rdtsc ; read time stamp
mov time, eax ; move counter into variable
fdiv ; floating-point divide
rdtsc ; read time stamp
sub eax, time ; find the difference
This code tries to measure the time it takes to perform a floating-point division with fdiv. The fdiv takes a long time to complete, so the second rdtsc instruction could actually execute before the fdiv finishes. If this happens, the cycle count will not be the expected one.
Inserting a serializing instruction such as cpuid, which forces every preceding instruction to complete before the program is allowed to continue, keeps the rdtsc instructions from being performed out of order. The code using cpuid for the above example is as follows:
cpuid ; force all previous instructions to complete
rdtsc ; read time stamp counter
mov time, eax ; move counter into variable
fdiv ; floating-point divide
cpuid ; wait for FDIV to complete before RDTSC
rdtsc ; read time stamp counter
sub eax, time ; find the difference
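The same cpuid-then-rdtsc pattern can be written in C. This is a minimal sketch, assuming GCC or Clang on x86-64 (32-bit x86 and other targets take a nanosecond fallback path); rdtsc_serialized and cycles_for_divide are hypothetical names. It runs cpuid with EAX = 0 so that its cost does not vary with whatever leaf happens to be in EAX, declares RBX and RCX as clobbered because cpuid overwrites them, and adds a "memory" clobber so the compiler keeps the volatile loads and store of the division between the two reads. On x86-64 the C division is typically compiled to an SSE divsd rather than the x87 fdiv, and the count includes the overhead of cpuid and rdtsc themselves:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* cpuid (leaf 0) to serialize, then rdtsc; returns EDX:EAX as one value. */
static inline uint64_t rdtsc_serialized(void)
{
#if defined(__x86_64__)
    uint32_t lo, hi;
    __asm__ __volatile__(
        "xorl %%eax, %%eax\n\t" /* select CPUID leaf 0 */
        "cpuid\n\t"             /* wait for all earlier instructions */
        "rdtsc"                 /* then read the time stamp counter */
        : "=a"(lo), "=d"(hi)
        :
        : "rbx", "rcx", "memory");
    return ((uint64_t)hi << 32) | lo;
#else
    /* Fallback for non-x86-64 targets: nanoseconds, NOT cycles. */
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
#endif
}

/* Operands live in volatile globals so that, together with the "memory"
   clobber, the compiler keeps the loads and the store between the reads. */
static volatile double g_num, g_den, g_quot;

/* Ticks spent around one floating-point division, like the fdiv example. */
uint64_t cycles_for_divide(double a, double b, double *quotient)
{
    g_num = a;
    g_den = b;
    uint64_t start = rdtsc_serialized();
    g_quot = g_num / g_den; /* the operation under test */
    uint64_t end = rdtsc_serialized();
    *quotient = g_quot;
    return end - start;
}
```

A single such measurement is noisy; repeating it many times and keeping the minimum is a common way to estimate the cost of the operation itself.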
An alternative is rdtscp, which waits until all previous instructions have executed before reading the counter (later instructions may still begin executing before the read is performed). However, rdtscp is not supported on all CPU models. Support is indicated by CPUID leaf 80000001H, EDX bit 27: if the bit is set to 1, rdtscp is present on the processor. For more details, check x86-64 ISA / Assembly Programming References.
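A minimal sketch of checking that bit before using rdtscp, assuming GCC or Clang on x86 or x86-64, whose <cpuid.h> provides __get_cpuid and whose <x86intrin.h> provides the __rdtscp intrinsic. has_rdtscp and read_tscp are hypothetical names, and the non-x86 branch is only a stub so the sketch compiles. rdtscp also stores the IA32_TSC_AUX register in ECX (returned here through aux); Linux sets it to encode the CPU (and NUMA node) number, which lets a caller notice that two readings came from different cores:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#include <x86intrin.h>

/* CPUID leaf 80000001H, EDX bit 27: RDTSCP is supported. */
bool has_rdtscp(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(0x80000001u, &eax, &ebx, &ecx, &edx))
        return false; /* extended leaf not available */
    return (edx >> 27) & 1u;
}

/* Waits for earlier instructions, then reads the TSC; *aux receives
   IA32_TSC_AUX. Call only when has_rdtscp() returned true. */
uint64_t read_tscp(unsigned int *aux)
{
    return __rdtscp(aux);
}
#else
/* Non-x86 stubs: there is no TSC on these targets. */
bool has_rdtscp(void) { return false; }
uint64_t read_tscp(unsigned int *aux) { *aux = 0; return 0; }
#endif
```

Checking the CPUID bit once at startup and caching the result avoids paying for cpuid on every measurement.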
There are other drawbacks to using rdtsc. Here is a list of these concerns, combined from Game Timing and Multicore Processors and Time Stamp Counter, which together summarize the possible problems quite well.
Discontinuous values. Multiprocessor and dual-core systems do not guarantee that their cycle counters are synchronized between cores. This is made worse by modern power management technologies that idle and restore various cores at different times, which typically leaves the cores out of synchronization. For an application, this generally results in glitches or even crashes, as the thread jumps between processors and gets timing values that produce large deltas, negative deltas, or halted timing.
Variability of the CPU’s frequency. Technology that changes the CPU frequency is used in many high-end desktop PCs. Recent Intel processors include a constant-rate TSC. While this makes timekeeping more consistent, it can skew benchmarks, because the TSC then counts at a fixed rate rather than in actual core cycles: a certain amount of spin-up time is spent at a lower clock rate before the OS switches the processor to the higher rate.
Portability. Reliance on the time stamp counter also reduces portability, as other processors may not have a similar feature.
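Some of these problems can at least be detected at run time. The following minimal sketch assumes GCC or Clang on x86 or x86-64 and a CPU with rdtscp. It checks the invariant-TSC flag (CPUID leaf 80000007H, EDX bit 8), which indicates that the TSC runs at a constant rate across power states, and discards a measurement when rdtscp reports different IA32_TSC_AUX values at start and end, which on Linux means the thread moved to another CPU. The names has_invariant_tsc and tsc_delta_same_cpu are hypothetical, and the non-x86 stubs simply report failure, reflecting the portability concern:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#include <x86intrin.h>

/* CPUID leaf 80000007H, EDX bit 8: invariant TSC. */
bool has_invariant_tsc(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(0x80000007u, &eax, &ebx, &ecx, &edx))
        return false;
    return (edx >> 8) & 1u;
}

/* Runs fn (may be NULL) between two rdtscp reads. Returns false, leaving
   *delta untouched, if the reads came from different CPUs (TSC_AUX differs).
   Assumes rdtscp is supported (CPUID 80000001H, EDX bit 27). */
bool tsc_delta_same_cpu(void (*fn)(void), uint64_t *delta)
{
    unsigned int aux_start, aux_end;
    uint64_t start = __rdtscp(&aux_start);
    if (fn != NULL)
        fn();
    uint64_t end = __rdtscp(&aux_end);
    if (aux_start != aux_end)
        return false; /* migrated: counters may not be comparable */
    *delta = end - start;
    return true;
}
#else
/* Non-x86 stubs: no TSC, so nothing can be measured this way. */
bool has_invariant_tsc(void) { return false; }
bool tsc_delta_same_cpu(void (*fn)(void), uint64_t *delta)
{
    (void)fn;
    (void)delta;
    return false;
}
#endif
```

A caller would typically retry when tsc_delta_same_cpu returns false, and fall back to clock_gettime when has_invariant_tsc() is false.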
Source: https://www.systutorials.com/measuring-time-accurately-in-programs/