Cache Data Prefetching (Cache Prefetching Overview)

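This article discusses why cache prefetching matters and covers its two forms, software prefetching and hardware prefetching. Software prefetching uses prefetch instructions to load data ahead of time, but useless prefetches must be avoided. Hardware prefetching automatically prefetches data based on the program's memory access pattern and suits regular access patterns. Hardware prefetching saves execution resources, and combining it with software prefetching covers more kinds of accesses.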

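This article is a partial translation (a paraphrase based on my understanding rather than a literal rendering). Its aim is to explain how cache prefetching works so that caches can be used more efficiently.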

Even programs with good data locality will now and then have to access a cache line that is not in the cache, and will then stall until the data has been fetched from main memory. It would of course be better if there was a way to load the data into the cache before it is needed so the stall could be avoided. This is called prefetching and there are two ways to achieve it, software prefetching and hardware prefetching.
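Summary: Even a program with good data locality will at some point need data that is not in the cache, and the CPU must then wait until that data has been loaded into the cache before computation can continue. Prefetching is a good optimization for this, and it comes in two forms: software prefetching and hardware prefetching.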

Software Prefetching

With software prefetching the programmer or compiler inserts prefetch instructions into the program. These are instructions that initiate a load of a cache line into the cache, but do not stall waiting for the data to arrive.
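Summary: Prefetch instructions are inserted either by the programmer or by the compiler, and they start loading data into the cache without blocking.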
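
As a minimal sketch of what such an instruction looks like in C, the loop below uses `__builtin_prefetch`, a builtin provided by GCC and Clang (the toolchain is an assumption; the text does not name a compiler). The function name `sum_with_prefetch` and the constant `PREFETCH_AHEAD` are hypothetical and only serve the illustration.

```c
#include <stddef.h>

/* Hypothetical tuning value: how many elements ahead to prefetch. */
#define PREFETCH_AHEAD 16

/* Sums an array while asking the cache to start loading the element
 * that will be needed PREFETCH_AHEAD iterations later. */
long sum_with_prefetch(const long *a, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_AHEAD < n) {
            /* Second argument 0 = prefetch for reading.
             * Third argument 3 = high temporal locality.
             * The call only starts the load; it does not wait for it. */
            __builtin_prefetch(&a[i + PREFETCH_AHEAD], 0, 3);
        }
        sum += a[i];
    }
    return sum;
}
```

The result is the same as a plain loop; the prefetch only changes when the cache lines are requested.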

A critical property of prefetch instructions is the time from when the prefetch is executed to when the data is used. If the prefetch is too close to the instruction using the prefetched data, the cache line will not have had time to arrive from main memory or the next cache level and the instruction will stall. This reduces the effectiveness of the prefetch.
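Summary: The key issue with a prefetch instruction is its timing. If it sits too close to the instruction that uses the data, the load into the cache will not have finished and the computation still stalls, so the prefetch does not deliver its full benefit.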

If the prefetch is too far ahead of the instruction using the prefetched data, the prefetched cache line will instead already have been evicted again before the data is actually used. The instruction using the data will then cause another fetch of the cache line and have to stall. This not only eliminates the benefit of the prefetch instruction, but introduces additional costs since the cache line is now fetched twice from main memory or the next cache level. This increases the memory bandwidth requirement of the program.
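Summary: If the prefetch is issued too far ahead of the instruction that uses the data, the prefetched cache line may already have been evicted by the time it is needed. The prefetch then brings no benefit, and the line is fetched twice, which costs extra memory bandwidth.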
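
A common rule of thumb for the lower bound of this window is to prefetch enough loop iterations ahead to cover the miss latency. The sketch below computes that distance; the function name `prefetch_distance` and both parameters are hypothetical, and the cycle counts are assumed to be measured or estimated by the caller. It says nothing about the upper bound, which depends on how quickly lines are evicted.

```c
#include <stddef.h>

/* Smallest number of iterations to prefetch ahead so that a load with
 * the given latency can finish before the data is used.
 * Both inputs are in cycles. */
size_t prefetch_distance(size_t miss_latency_cycles,
                         size_t cycles_per_iteration)
{
    if (cycles_per_iteration == 0) {
        return 0; /* no meaningful answer; avoid dividing by zero */
    }
    /* Ceiling division: round up to a whole number of iterations. */
    return (miss_latency_cycles + cycles_per_iteration - 1)
           / cycles_per_iteration;
}
```

For example, with an assumed 200-cycle miss latency and 10 cycles of work per iteration, the prefetch should run at least 20 iterations ahead.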

Processors that have multiple levels of caches often have different prefetch instructions for prefetching data into different cache levels. This can be used, for example, to prefetch data from main memory to the L2 cache far ahead of the use with an L2 prefetch instruction, and then prefetch data from the L2 cache to the L1 cache just before the use with an L1 prefetch instruction.
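Summary: Processors with multiple cache levels usually provide different prefetch instructions for different levels. For example, an L2 prefetch instruction can bring data from main memory into the L2 cache well before it is used, and an L1 prefetch instruction can then move it from L2 into L1 shortly before use.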
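
The two-stage idea can be sketched with the temporal-locality hint of GCC and Clang's `__builtin_prefetch`, whose third argument ranges from 0 to 3. How each value maps onto cache levels depends on the target, so reading 2 as "outer level" and 3 as "L1" is an assumption here, as are the names `sum_two_level`, `FAR_AHEAD` and `NEAR_AHEAD` and their values.

```c
#include <stddef.h>

#define FAR_AHEAD  256 /* hypothetical: elements ahead for the outer-level prefetch */
#define NEAR_AHEAD 16  /* hypothetical: elements ahead for the L1 prefetch */

/* Sums an array using a far prefetch followed by a near prefetch. */
double sum_two_level(const double *a, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (i + FAR_AHEAD < n) {
            /* Far ahead, lower locality hint: assumed to target an
             * outer cache level on this machine. */
            __builtin_prefetch(&a[i + FAR_AHEAD], 0, 2);
        }
        if (i + NEAR_AHEAD < n) {
            /* Shortly before use, highest locality hint: assumed to
             * bring the line into the cache closest to the core. */
            __builtin_prefetch(&a[i + NEAR_AHEAD], 0, 3);
        }
        sum += a[i];
    }
    return sum;
}
```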

There is a cost for executing a prefetch instruction. The instruction has to be decoded and it uses some execution resources. A prefetch instruction that always prefetches cache lines that are already in the cache will consume execution resources without providing any benefit. It is therefore important to verify that prefetch instructions really prefetch data that is not already in the cache.
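Summary: A prefetch instruction has a cost: it must be decoded and executed, and it uses the corresponding resources. If the cache line it targets is already in the cache, that cost buys nothing, so verify that the prefetched data is really not in the cache.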

The cache miss ratio needed by a prefetch instruction to be useful depends on its purpose. A prefetch instruction that fetches data from main memory only needs a very low miss ratio to be useful because of the high main memory access latency. A prefetch instruction that fetches cache lines from a cache further from the processor to a cache closer to the processor may need a miss ratio of a few percent to do any good.
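Summary: How high the miss ratio must be for a prefetch to pay off depends on what the prefetch replaces. A prefetch from main memory pays off even at a very low miss ratio because main memory latency is so high, whereas a prefetch from an outer cache level to an inner one may need a miss ratio of a few percent.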
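
The trade-off can be made concrete with a simple break-even model: a prefetch pays off when the miss ratio times the stall it avoids exceeds the cost of issuing it. The sketch below is only that model; the function name `breakeven_miss_ratio` and the cycle counts in the example are assumptions, not measurements.

```c
/* Miss ratio above which a prefetch saves more cycles than it costs.
 * overhead_cycles: cost of issuing one prefetch instruction.
 * saved_cycles:    stall avoided each time the prefetch covers a miss. */
double breakeven_miss_ratio(double overhead_cycles, double saved_cycles)
{
    if (saved_cycles <= 0.0) {
        return 1.0; /* nothing to gain, so the prefetch never pays off */
    }
    double ratio = overhead_cycles / saved_cycles;
    return ratio > 1.0 ? 1.0 : ratio; /* a ratio cannot exceed 100% */
}
```

With an assumed overhead of 1 cycle, a prefetch that hides a 200-cycle main memory access breaks even at a miss ratio of about 0.5%, while one that hides a 20-cycle access to the next cache level needs about 5%.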

It is common that software prefetching fetches slightly more data than is actually used. For example, when iterating over a large array it is common to prefetch data some distance ahead of the loop, for example, 1 kilobyte ahead of the loop. When the loop is approaching the end of the array the software prefetching should ideally stop. However, it is often cheaper to continue to prefetch data beyond the end of the array than to insert additional code to check when the end of the array is reached. This means that 1 kilobyte of data beyond the end of the array that isn't needed is fetched.
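
A minimal sketch of the unchecked variant follows, assuming GCC or Clang, whose documentation states that a data prefetch does not fault on an invalid address as long as the address expression itself is valid. The function name `sum_prefetch_past_end` is hypothetical, and the target address is computed with integer arithmetic so that no out-of-bounds pointer arithmetic is performed in C.

```c
#include <stddef.h>
#include <stdint.h>

#define AHEAD_BYTES 1024 /* the 1 kilobyte distance used in the text */

/* Sums an array and prefetches 1 KB ahead on every iteration,
 * without checking whether the target lies past the end. */
long sum_prefetch_past_end(const long *a, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        /* Near the end of the array this address lies outside it.
         * The prefetch is only a hint, so the extra lines are
         * fetched but never read. */
        uintptr_t target = (uintptr_t)&a[i] + AHEAD_BYTES;
        __builtin_prefetch((const void *)target, 0, 3);
        sum += a[i];
    }
    return sum;
}
```

The loop body has no branch for the prefetch, which is the saving the text describes; the price is up to 1 kilobyte of unneeded traffic per array.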

Hardware Prefetching

Many modern processors implement hardware prefetching. This means that the processor monitors the memory access pattern of the running program and tries to predict what data the program will access next and prefetches that data. There are a few different variants of how this can be done.
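Summary: Most modern processors implement hardware prefetching. The processor monitors the running program's memory access pattern, predicts which data will be accessed next, and prefetches it.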

A stream prefetcher looks for streams where a sequence of consecutive cache lines are accessed by the program. When such a stream is found the processor starts prefetching the cache lines ahead of the program's accesses.
Summary: A stream prefetcher prefetches consecutively accessed cache lines into the cache.

A stride prefetcher looks for instructions that make accesses with regular strides, that do not necessarily have to be to consecutive cache lines. When such an instruction is detected the processor tries to prefetch the cache lines it will access ahead of it.
Summary: A stride prefetcher handles access patterns with a fixed stride.

An adjacent cache line prefetcher automatically fetches adjacent cache lines to ones being accessed by the program. This can be used to mimic behaviour of a larger cache line size in a cache level without actually having to increase the line size.
Summary: An adjacent cache line prefetcher automatically prefetches the cache line next to the one being accessed.
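
To illustrate the kind of pattern detection described above, here is a toy software model of a stride detector. It is not how any particular processor works: the struct `stride_detector`, the function `stride_observe` and the rule "predict after the same stride repeats once" are all assumptions made for the sketch. A stream of consecutive cache lines is simply the special case where the stride equals the line size.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of a stride detector; real hardware is more elaborate. */
struct stride_detector {
    uint64_t last_addr;   /* address of the previous access */
    int64_t  last_stride; /* difference between the last two addresses */
    int      confidence;  /* times in a row last_stride was repeated */
    bool     has_last;    /* false until the first access is recorded */
};

/* Records one access. Returns true and writes the predicted next
 * address once the same non-zero stride has been seen twice in a row. */
bool stride_observe(struct stride_detector *d, uint64_t addr,
                    uint64_t *predicted)
{
    if (d->has_last) {
        int64_t stride = (int64_t)(addr - d->last_addr);
        if (stride != 0 && stride == d->last_stride) {
            d->confidence++;
        } else {
            d->confidence = 0;       /* pattern broken: start over */
            d->last_stride = stride;
        }
    }
    d->last_addr = addr;
    d->has_last = true;

    if (d->confidence >= 1) {
        *predicted = addr + (uint64_t)d->last_stride;
        return true;
    }
    return false;
}
```

Starting from a zero-initialized detector, the accesses 1000, 1256, 1512 establish a stride of 256, so the third call predicts 1768. An access that breaks the stride resets the detector, which mirrors why irregular patterns defeat this kind of prefetcher.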

Hardware prefetchers can generally only handle very regular access patterns. The cost of prefetching data that isn't used can be high, so processor designers have to be conservative.

An advantage of hardware prefetching compared to software prefetching is that no extra instructions that use execution resources are needed in the program. If you know that an application is going to be run on processors with hardware prefetching, a combination of hardware and software prefetching can be used. The hardware prefetcher can be trusted to prefetch highly regular accesses, while software prefetching can be used for irregular accesses that the hardware prefetcher cannot handle.
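
An indirect (gather) access is a typical case for this split. In the sketch below, the index array is read sequentially, which a hardware prefetcher can be expected to follow, while the data elements it points to are prefetched in software. The names `gather_sum` and `GATHER_AHEAD` are hypothetical, `__builtin_prefetch` assumes GCC or Clang, and every index is assumed to be valid for `data`.

```c
#include <stddef.h>

#define GATHER_AHEAD 8 /* hypothetical distance, in iterations */

/* Sums data[idx[i]] for i in [0, n).
 * Every idx[i] is assumed to be a valid index into data. */
double gather_sum(const double *data, const size_t *idx, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (i + GATHER_AHEAD < n) {
            /* idx is walked sequentially (regular), but the element
             * of data it selects follows no fixed stride, so request
             * it explicitly a few iterations early. */
            __builtin_prefetch(&data[idx[i + GATHER_AHEAD]], 0, 3);
        }
        sum += data[idx[i]];
    }
    return sum;
}
```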

Hardware prefetching: regular access patterns

Software prefetching: irregular access patterns