Software Prefetching
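With software prefetching, the programmer or compiler inserts prefetch instructions into the program. These instructions start loading a cache line into the cache but do not stall waiting for the data to arrive.
The key issue with a prefetch instruction is when it is issued. If the prefetch is issued too close to the instructions that use the data, the line will not yet be in the cache when it is needed. The computation still stalls, and the prefetch achieves little.
If the prefetch is issued too far ahead of the computation, the prefetched cache line may already have been evicted by the time it is used. Again the prefetch has no effect.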
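A minimal sketch of the timing trade-off, assuming GCC or Clang. `__builtin_prefetch` is a builtin of those compilers. It takes an address, a read/write flag and a locality hint from 0 to 3. `sum_prefetched` and `PREFETCH_DIST` are hypothetical names, and the distance of 64 elements is purely illustrative. If the distance is too small, the line has not arrived when `a[i]` is read. If it is too large, the line may be evicted before use, so the value has to be tuned by measurement.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical prefetch distance in elements (64 doubles = 512 bytes).
 * The right value depends on memory latency and on how much work each
 * iteration does; it is an assumption here, not a recommendation. */
#define PREFETCH_DIST 64

/* Sum an array while prefetching PREFETCH_DIST elements ahead of use.
 * __builtin_prefetch(addr, rw, locality): rw = 0 means "will be read",
 * locality 3 asks to keep the line in all cache levels. */
double sum_prefetched(const double *a, size_t n)
{
    assert(a != NULL || n == 0);
    double s = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DIST < n)      /* only prefetch inside the array */
            __builtin_prefetch(&a[i + PREFETCH_DIST], 0, 3);
        s += a[i];                      /* the actual use of the data */
    }
    return s;
}
```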
Processors that have multiple levels of caches often have different prefetch instructions for prefetching data into different cache levels. This can be used, for example, to prefetch data from main memory to the L2 cache far ahead of the use with an L2 prefetch instruction, and then prefetch data from the L2 cache to the L1 cache just before the use with an L1 prefetch instruction.
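In short: with multi-level caches, issue an L2 prefetch well ahead of the use to bring the line from main memory into L2. Then issue an L1 prefetch shortly before the use to move it from L2 into L1.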
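A sketch of two-level prefetching, assuming GCC or Clang on x86. The mapping from the locality hint to a cache level depends on the target. On x86, GCC usually emits `prefetcht1` for hint 2 and `prefetcht0` for hint 3. Intel documents `prefetcht1` as targeting L2 and outer levels, and `prefetcht0` as targeting all levels including L1. Treat that mapping as an assumption. `FAR_DIST`, `NEAR_DIST` and `sum_two_level` are hypothetical, and both distances are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical distances in elements: far for the L2 prefetch,
 * near for the L1 prefetch. Real values must be measured. */
#define FAR_DIST  256
#define NEAR_DIST 16

long sum_two_level(const int *a, size_t n)
{
    assert(a != NULL || n == 0);
    long s = 0;
    for (size_t i = 0; i < n; i++) {
        /* Far ahead, locality 2: on x86 GCC this is usually prefetcht1,
         * which brings the line into L2 rather than L1. */
        if (i + FAR_DIST < n)
            __builtin_prefetch(&a[i + FAR_DIST], 0, 2);
        /* Just before use, locality 3: usually prefetcht0 (into L1). */
        if (i + NEAR_DIST < n)
            __builtin_prefetch(&a[i + NEAR_DIST], 0, 3);
        s += a[i];
    }
    return s;
}
```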
There is a cost for executing a prefetch instruction. The instruction has to be decoded and it uses some execution resources. A prefetch instruction that always prefetches cache lines that are already in the cache will consume execution resources without providing any benefit. It is therefore important to verify that prefetch instructions really prefetch data that is not already in the cache.
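In short: a prefetch instruction has a cost, because it must be decoded and executed and it consumes execution resources. If the cache line it loads is already in the cache, that cost buys nothing. So make sure the prefetched data is really not in the cache already.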
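One common way to avoid redundant prefetches is to issue one prefetch per cache line instead of one per element. With 64-byte lines and 4-byte floats, a per-element prefetch would hit the same line 16 times, and 15 of those prefetches would find the line already cached. The sketch below assumes 64-byte lines and a line-aligned array. If the array is not aligned, each chunk straddles two lines. `sum_one_prefetch_per_line`, `LINE_BYTES` and `DIST_LINES` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define LINE_BYTES 64                          /* assumed cache line size */
#define PER_LINE   (LINE_BYTES / sizeof(float)) /* 16 floats per line */
#define DIST_LINES 8                           /* hypothetical distance */

/* Walk the array one cache line at a time and prefetch once per line,
 * so no prefetch is wasted on a line the previous ones already fetched. */
float sum_one_prefetch_per_line(const float *a, size_t n)
{
    assert(a != NULL || n == 0);
    float s = 0.0f;
    for (size_t base = 0; base < n; base += PER_LINE) {
        size_t ahead = base + DIST_LINES * PER_LINE;
        if (ahead < n)
            __builtin_prefetch(&a[ahead], 0, 3);
        size_t end = (base + PER_LINE < n) ? base + PER_LINE : n;
        for (size_t i = base; i < end; i++)   /* elements of this line */
            s += a[i];
    }
    return s;
}
```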
The cache miss ratio needed by a prefetch instruction to be useful depends on its purpose. A prefetch instruction that fetches data from main memory only needs a very low miss ratio to be useful because of the high main memory access latency. A prefetch instruction that fetches cache lines from a cache further from the processor to a cache closer to the processor may need a miss ratio of a few percent to do any good.
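In short: whether a software prefetch pays off depends on how often it actually misses. A prefetch from main memory hides such a long latency that a very low miss ratio is enough. A prefetch between cache levels hides much less latency and needs a higher miss ratio.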
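A rough break-even model makes the dependence on latency explicit. It ignores bandwidth and cache pollution, so treat it as a simplification. Let $c$ be the cost of issuing one prefetch in cycles, $L$ the latency it hides when the line really is missing, and $m$ the fraction of prefetches that miss. The cycle counts in the second line are illustrative assumptions, not measurements.

```latex
m \, L > c \;\iff\; m > \frac{c}{L}
\qquad
c = 1:\quad L = 200 \Rightarrow m > 0.5\,\% \quad (\text{main memory}),
\qquad L = 20 \Rightarrow m > 5\,\% \quad (\text{cache to cache})
```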
It is common for software prefetching to fetch slightly more data than is actually used. For example, when iterating over a large array, it is common to prefetch data some distance ahead of the loop, such as 1 kilobyte. When the loop approaches the end of the array, the software prefetching should ideally stop. However, it is often cheaper to keep prefetching beyond the end of the array than to insert extra code to check when the end has been reached. This means that 1 kilobyte of unneeded data beyond the end of the array is fetched.
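A sketch of the "just keep prefetching" approach, assuming GCC or Clang. The GCC documentation for `__builtin_prefetch` states that a data prefetch does not fault on an invalid address, as long as evaluating the address expression itself is valid. The address is computed through `uintptr_t` so that no out-of-bounds pointer arithmetic is done on the array. `AHEAD_BYTES` and `sum_unchecked_prefetch` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Prefetch 1 KiB ahead, as in the text (value is an assumption). */
#define AHEAD_BYTES 1024

/* No bounds check on the prefetch: near the end of the array the hint
 * points past the last element. That is harmless (prefetches do not
 * fault) but fetches up to AHEAD_BYTES of data that is never used. */
double sum_unchecked_prefetch(const double *a, size_t n)
{
    assert(a != NULL || n == 0);
    double s = 0.0;
    for (size_t i = 0; i < n; i++) {
        uintptr_t next = (uintptr_t)&a[i] + AHEAD_BYTES;
        __builtin_prefetch((const void *)next, 0, 3);
        s += a[i];
    }
    return s;
}
```

With 64-byte lines, the last iterations touch about 16 lines past the end of the array. That is the overshoot the text describes, traded for a loop body without an extra comparison.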
Hardware Prefetching
Many modern processors implement hardware prefetching. This means that the processor monitors the memory access pattern of the running program, tries to predict what data the program will access next, and prefetches that data. There are a few different variants of how this can be done.
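In short: most modern processors implement hardware prefetching. The processor monitors the memory access pattern of the running program, predicts which data will be accessed next, and prefetches that data.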
A stream prefetcher looks for streams where a sequence of consecutive cache lines is accessed by the program. When such a stream is found, the processor starts prefetching the cache lines ahead of the program's accesses.
Stream prefetcher: prefetches runs of consecutively accessed cache lines into the cache.
A stride prefetcher looks for instructions that make accesses with regular strides, which do not necessarily have to be to consecutive cache lines. When such an instruction is detected, the processor tries to prefetch the cache lines it will access ahead of it.
Stride prefetcher: targets instructions that access memory with a fixed stride.
An adjacent cache line prefetcher automatically fetches adjacent cache lines to ones being accessed by the program. This can be used to mimic the behaviour of a larger cache line size in a cache level without actually having to increase the line size.
Adjacent cache line prefetcher: automatically prefetches the cache lines next to the ones being accessed.
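A sketch of one possible adjacent-line scheme. It assumes lines are grouped into aligned pairs: with 64-byte lines, a miss on one line of a 128-byte aligned block also fetches the other line. The pair then behaves roughly like a single 128-byte line. The pairing rule and the name `adjacent_line` are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Address of the other line in the same aligned pair of lines.
 * line_bytes must be a power of two (e.g. 64). Flipping the lowest
 * bit of the line number selects the partner line in the pair. */
uint64_t adjacent_line(uint64_t addr, uint64_t line_bytes)
{
    assert(line_bytes != 0 && (line_bytes & (line_bytes - 1)) == 0);
    uint64_t line = addr / line_bytes;
    return (line ^ 1u) * line_bytes;
}
```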
Hardware prefetchers can generally only handle very regular access patterns. The cost of prefetching data that isn't used can be high, so processor designers have to be conservative.
An advantage of hardware prefetching compared to software prefetching is that no extra instructions that use execution resources are needed in the program. If you know that an application is going to be run on processors with hardware prefetching, a combination of hardware and software prefetching can be used. The hardware prefetcher can be trusted to prefetch highly regular accesses, while software prefetching can be used for irregular accesses that the hardware prefetcher cannot handle.
Hardware prefetching: regular access patterns.
Software prefetching: irregular access patterns.
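A sketch of splitting the work between the two mechanisms on an indirect (gather) access, assuming GCC or Clang. The index array `idx` is read sequentially, so a hardware stream prefetcher can be expected to cover it. The accesses `data[idx[i]]` jump around and are the irregular part, so only they get a software prefetch. `gather_sum` and `IDX_DIST` are hypothetical, and the distance is illustrative.

```c
#include <assert.h>
#include <stddef.h>

#define IDX_DIST 16   /* hypothetical prefetch distance in iterations */

/* Sum data[idx[i]] for i in [0, n). Sequential idx[] is left to the
 * hardware prefetcher; the irregular data[] accesses are prefetched in
 * software IDX_DIST iterations ahead. */
double gather_sum(const double *data, const size_t *idx, size_t n)
{
    assert((data != NULL && idx != NULL) || n == 0);
    double s = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (i + IDX_DIST < n)   /* idx[i + IDX_DIST] is a real, in-bounds load */
            __builtin_prefetch(&data[idx[i + IDX_DIST]], 0, 3);
        s += data[idx[i]];
    }
    return s;
}
```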