Measuring Time Accurately in Programs

cuma2369

Tags: java, operating systems, programming languages, linux, cpu

Original article: https://www.systutorials.com/measuring-time-accurately-in-programs/

It is quite common to measure the time in programs using APIs like clock() and gettimeofday(). We may also want to measure the time “accurately” for certain purposes, such as measuring a small piece of code’s execution time for performance analysis, or measuring the time in time-sensitive game software. It is hard to measure the time very accurately. But we surely can measure the time to the granularity that we can accept for our purpose. Let’s look at possible methods.

gettimeofday and clock_gettime

gettimeofday and clock_gettime are POSIX APIs to get the time. gettimeofday is easy to use, but does not specify or tell the resolution of the system clock. For clock_gettime, clock_getres can be used to find out the resolution of a clock.
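
A minimal sketch of the two calls, assuming a POSIX system such as Linux with glibc. The helper names clock_resolution_ns and clock_now_ns are illustrative and not part of any API; they wrap clock_getres and clock_gettime and return nanoseconds.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Resolution of a clock in nanoseconds, or -1 if the clock is unsupported.
   (Hypothetical helper, not a library function.) */
static int64_t clock_resolution_ns(clockid_t id)
{
    struct timespec res;
    if (clock_getres(id, &res) != 0)
        return -1;
    return (int64_t)res.tv_sec * 1000000000LL + res.tv_nsec;
}

/* Current reading of a clock in nanoseconds, or -1 on error.
   (Hypothetical helper, not a library function.) */
static int64_t clock_now_ns(clockid_t id)
{
    struct timespec ts;
    if (clock_gettime(id, &ts) != 0)
        return -1;
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}
```

To time a region, read clock_now_ns(CLOCK_MONOTONIC) before and after it and subtract the two values. CLOCK_MONOTONIC is chosen here because it is not affected by adjustments to the wall-clock time.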

On the other hand, calling gettimeofday and clock_gettime has a cost of its own. Assuming they read the time from the same source, one important factor for accuracy is the cost (in time) of calling these APIs. How much do these calls cost? Is gettimeofday very slow?

A benchmark by David Terei gives a rough picture. I quote part of the results here, including time and ftime even though they only provide a granularity of seconds and milliseconds respectively:

time (s) => 4ns
ftime (ms) => 39ns
gettimeofday (us) => 30ns
clock_gettime (ns) => 26ns (CLOCK_REALTIME)
clock_gettime (ns) => 8ns (CLOCK_REALTIME_COARSE)
clock_gettime (ns) => 26ns (CLOCK_MONOTONIC)
clock_gettime (ns) => 9ns (CLOCK_MONOTONIC_COARSE)
clock_gettime (ns) => 170ns (CLOCK_PROCESS_CPUTIME_ID)
clock_gettime (ns) => 154ns (CLOCK_THREAD_CPUTIME_ID)
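
Per-call figures like these can be approximated on one's own machine with a simple loop. The following is a rough sketch and not the benchmark quoted above: the function names and the iteration count are assumptions, and the loop overhead is included in the result.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <sys/time.h>
#include <time.h>

/* Nanoseconds between two timespec readings. */
static double span_ns(const struct timespec *a, const struct timespec *b)
{
    return (double)(b->tv_sec - a->tv_sec) * 1e9
         + (double)(b->tv_nsec - a->tv_nsec);
}

/* Average cost in ns of one clock_gettime(id) call over `iters` calls. */
static double clock_gettime_cost_ns(clockid_t id, long iters)
{
    struct timespec start, end, scratch;
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 0; i < iters; i++)
        clock_gettime(id, &scratch);
    clock_gettime(CLOCK_MONOTONIC, &end);
    return span_ns(&start, &end) / (double)iters;
}

/* Average cost in ns of one gettimeofday call over `iters` calls. */
static double gettimeofday_cost_ns(long iters)
{
    struct timespec start, end;
    struct timeval scratch;
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 0; i < iters; i++)
        gettimeofday(&scratch, NULL);
    clock_gettime(CLOCK_MONOTONIC, &end);
    return span_ns(&start, &end) / (double)iters;
}
```

For example, clock_gettime_cost_ns(CLOCK_MONOTONIC, 1000000) averages one million calls. The numbers depend on the CPU, the kernel and the clock source, so they should not be expected to match the table exactly.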

The cost of gettimeofday is in the tens of nanoseconds. This cost, and the fact that the actual resolution is unknown, may be acceptable for many programs. On modern Linux these APIs are implemented through the vDSO and avoid calling into the kernel (see the discussion linked from the original article). If the program requires a lower cost (around 10ns) and a known resolution, clock_gettime with CLOCK_MONOTONIC_COARSE or CLOCK_REALTIME_COARSE may be a good choice, at the price of a coarser resolution, which clock_getres reports.

For even higher resolution, rdtsc may be put on the table.

rdtsc and rdtscp

rdtsc is an instruction, supported since Pentium-class CPUs, that reads the current time stamp counter (TSC), which is incremented every CPU tick (1/CPU_HZ). The TSC is a 64-bit register on x86 processors. PowerPC provides a similar capability. TSC/rdtsc allows time to be measured accurately.
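
Since the TSC counts ticks rather than seconds, a tick count has to be scaled by the tick rate. One rough way to estimate that rate is to count ticks across an interval timed with clock_gettime. The sketch below assumes GCC or Clang on x86 Linux (the __rdtsc intrinsic from x86intrin.h); the names estimate_tsc_hz and tsc_ticks_to_ns are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>
#include <x86intrin.h>  /* __rdtsc(), GCC/Clang on x86 */

/* Estimate TSC ticks per second by counting ticks across a sleep of
   roughly `millis` milliseconds timed with CLOCK_MONOTONIC. */
static double estimate_tsc_hz(long millis)
{
    struct timespec t0, t1;
    struct timespec pause = { millis / 1000, (millis % 1000) * 1000000L };

    clock_gettime(CLOCK_MONOTONIC, &t0);
    uint64_t c0 = __rdtsc();
    nanosleep(&pause, NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    uint64_t c1 = __rdtsc();

    double secs = (double)(t1.tv_sec - t0.tv_sec)
                + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    return (double)(c1 - c0) / secs;
}

/* Convert a tick count to nanoseconds given an estimated tick rate. */
static double tsc_ticks_to_ns(uint64_t ticks, double tsc_hz)
{
    return (double)ticks * 1e9 / tsc_hz;
}
```

A longer calibration interval gives a steadier estimate. The result is only meaningful if the counter runs at a steady rate on the machine in question.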

There are a couple of good implementations using rdtsc in C/asm on the Web that you can check: Time-stamp counter, cycle.h and Pentium Time Stamp Counter.

Everything has two sides. You need to pay special attention to the drawbacks of rdtsc if you use it in your program.

First, the rdtsc instructions may not be performed in the order that they appear in the executable because of out-of-order execution. This can make one rdtsc executed later than expected and produce a misleading cycle count. Here is an example from Using the RDTSC Instruction for Performance Monitoring:

 rdtsc         ; read time stamp
 mov time, eax ; move counter into variable
 fdiv          ; floating-point divide
 rdtsc         ; read time stamp
 sub eax, time ; find the difference

This code tries to measure the time it takes to perform a floating-point division with fdiv. The fdiv takes a long time to complete and, potentially, the second rdtsc instruction could actually execute before the fdiv has finished. If this happens, the cycle count will not be the one expected.

Inserting a serializing instruction such as cpuid, which forces every preceding instruction in the code to complete before allowing the program to continue, keeps the rdtsc instructions from being executed out of order. The code using cpuid for the above example is as follows.

 cpuid         ; force all previous instructions to complete
 rdtsc         ; read time stamp counter
 mov time, eax ; move counter into variable
 fdiv          ; floating-point divide
 cpuid         ; wait for FDIV to complete before RDTSC
 rdtsc         ; read time stamp counter
 sub eax, time ; find the difference
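
The same sequence can be wrapped for use from C. The sketch below assumes GCC or Clang on x86-64 and uses extended inline assembly; rdtsc_serialized is a hypothetical helper name. Unlike the listing above, it keeps all 64 bits of the counter (EDX:EAX) rather than only the low half.

```c
#include <stdint.h>

/* Execute cpuid (serializing) followed by rdtsc and return the full
   64-bit time stamp counter. Assumes x86-64 with GCC or Clang. */
static inline uint64_t rdtsc_serialized(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__(
        "cpuid\n\t"   /* wait for all earlier instructions to complete */
        "rdtsc\n\t"   /* EDX:EAX = time stamp counter */
        : "=a"(lo), "=d"(hi)
        : "a"(0)                  /* cpuid leaf 0 */
        : "rbx", "rcx", "memory"  /* cpuid also overwrites EBX and ECX */
    );
    return ((uint64_t)hi << 32) | lo;
}
```

Calling rdtsc_serialized() before and after the code under test and subtracting gives the elapsed ticks. The cost of the cpuid instructions is included in that figure, so it is worth measuring an empty region first and subtracting that baseline.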

An alternative is to use rdtscp, which waits until all previous instructions have been executed before reading the counter. However, rdtscp is not supported on all CPU models. Support is indicated by CPUID leaf 80000001H, EDX bit 27: if the bit is set to 1, rdtscp is present on the processor. For more details, check x86-64 ISA / Assembly Programming References.
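
The CPUID check described above might look as follows with GCC or Clang, which provide __get_cpuid in cpuid.h and the __rdtscp intrinsic in x86intrin.h. The wrapper names cpu_has_rdtscp and read_tscp are illustrative, and the sketch assumes an x86 target.

```c
#include <cpuid.h>      /* __get_cpuid(), GCC/Clang */
#include <stdint.h>
#include <x86intrin.h>  /* __rdtscp() */

/* Returns 1 if CPUID leaf 80000001H reports rdtscp (EDX bit 27), else 0. */
static int cpu_has_rdtscp(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(0x80000001u, &eax, &ebx, &ecx, &edx))
        return 0;  /* extended leaf not available */
    return (int)((edx >> 27) & 1u);
}

/* Read the TSC with rdtscp; *aux receives the IA32_TSC_AUX value.
   Call only when cpu_has_rdtscp() returned 1. */
static uint64_t read_tscp(unsigned int *aux)
{
    return __rdtscp(aux);
}
```

A program would typically call cpu_has_rdtscp() once at startup and fall back to the cpuid plus rdtsc sequence when it returns 0.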

rdtsc has other drawbacks. Here is a list of these concerns, combined from Game Timing and Multicore Processors and Time Stamp Counter, which together summarize the possible problems quite well.

Discontinuous values. Multiprocessor and dual-core systems do not guarantee synchronization of their cycle counters between cores. This is exacerbated when combined with modern power management technologies that idle and restore various cores at different times, which results in the cores typically being out of synchronization. For an application, this generally results in glitches or in potential crashes as the thread jumps between the processors and gets timing values that result in large deltas, negative deltas, or halted timing.

Variability of the CPU’s frequency. Technology that changes the frequency of the CPU is in use in many high-end desktop PCs. Recent Intel processors include a constant rate TSC. While this makes time keeping more consistent, it can skew benchmarks, where a certain amount of spin-up time is spent at a lower clock rate before the OS switches the processor to the higher rate.

Portability. Reliance on the time stamp counter also reduces portability, as other processors may not have a similar feature.
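
One way to hedge against these three concerns, sketched here under the assumption of GCC or Clang on Linux, is to use the TSC only when the CPU reports both rdtscp and an invariant TSC (CPUID leaf 80000007H, EDX bit 8), to discard samples whose two rdtscp reads return different IA32_TSC_AUX values (Linux stores the CPU number there, so a change suggests the thread migrated), and to fall back to clock_gettime everywhere else. This is not taken from the articles cited above, and all names below (tsc_usable, monotonic_ns, timing_sample, time_call, busy_work) are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>      /* __get_cpuid(), GCC/Clang */
#include <x86intrin.h>  /* __rdtscp() */
#define HAVE_X86_TSC 1
#else
#define HAVE_X86_TSC 0
#endif

/* 1 if rdtscp exists (80000001H EDX[27]) and the TSC is invariant
   (80000007H EDX[8]); 0 otherwise, including on non-x86 targets. */
static int tsc_usable(void)
{
#if HAVE_X86_TSC
    unsigned int a, b, c, d;
    if (!__get_cpuid(0x80000001u, &a, &b, &c, &d) || !((d >> 27) & 1u))
        return 0;
    if (!__get_cpuid(0x80000007u, &a, &b, &c, &d) || !((d >> 8) & 1u))
        return 0;
    return 1;
#else
    return 0;
#endif
}

/* Portable fallback: monotonic time from the OS in nanoseconds. */
static uint64_t monotonic_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

typedef struct {
    uint64_t delta;  /* TSC ticks if uses_tsc is 1, else nanoseconds */
    int uses_tsc;
    int valid;       /* 0 means the sample should be discarded */
} timing_sample;

/* Time one call of fn, preferring the TSC when it looks trustworthy. */
static timing_sample time_call(void (*fn)(void))
{
    timing_sample s = { 0, 0, 0 };
#if HAVE_X86_TSC
    if (tsc_usable()) {
        unsigned int cpu0, cpu1;
        uint64_t c0 = __rdtscp(&cpu0);
        fn();
        uint64_t c1 = __rdtscp(&cpu1);
        s.uses_tsc = 1;
        /* Reject samples that changed cores or ran backwards. */
        s.valid = (cpu0 == cpu1) && (c1 >= c0);
        s.delta = s.valid ? c1 - c0 : 0;
        return s;
    }
#endif
    uint64_t t0 = monotonic_ns();
    fn();
    s.delta = monotonic_ns() - t0;
    s.valid = 1;
    return s;
}

/* Example workload, for illustration only. */
static void busy_work(void)
{
    volatile unsigned int x = 0;
    for (unsigned int i = 0; i < 100000; i++)
        x += i;
}
```

A caller would run time_call(busy_work) several times and keep only the samples with valid set to 1. Comparing the IA32_TSC_AUX values cannot catch a thread that migrates away and back between the two reads, so pinning the thread to one core remains the more robust option where that is acceptable.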
