Writing Cache-friendly Code

Original post, December 2, 2013, 10:49:40
In the previous essay, Exhibiting Good Locality in Your Programs, we presented two functions, sumarrayrows and sumarraycols. We saw that sumarrayrows has a stride-1 reference pattern (it visits each element of the array sequentially), whereas sumarraycols has a stride-N reference pattern (it visits every Nth element of the contiguous array). In this essay, we show how to quantify the idea of locality in terms of cache hits and cache misses.
      In general, if a cache has a block size of B bytes, then a stride-k reference pattern (where k is expressed in words) results in an average of min(1, (wordsize × k) / B) misses per loop iteration. This is minimized for k = 1.
      Take sumarrayrows, for example:
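The formula above can be evaluated directly. The following is a minimal sketch, not code from the original essay; the function name misses_per_iteration and its parameter names are made up for illustration.

```c
#include <assert.h>

/* Average misses per loop iteration for a stride-k scan, following the
 * formula min(1, (wordsize * k) / B) from the text.
 *   word_bytes:  size of one word in bytes
 *   stride:      k, the stride measured in words
 *   block_bytes: B, the cache block size in bytes
 * All names here are illustrative assumptions. */
double misses_per_iteration(unsigned word_bytes, unsigned stride,
                            unsigned block_bytes)
{
    assert(word_bytes > 0 && stride > 0 && block_bytes > 0);
    double ratio = (double)word_bytes * (double)stride / (double)block_bytes;
    return ratio < 1.0 ? ratio : 1.0;   /* a reference cannot miss more than once */
}
```

With 4-byte words and 16-byte blocks, a stride of 1 gives 0.25 misses per iteration, a stride of 2 gives 0.5, and any stride of 4 or more gives 1.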
int sumarrayrows(int a[M][N])
{
    int sum = 0;
    for (int i=0; i!=M; ++i)
        for (int j=0; j!=N; ++j)
            sum += a[i][j];
    return sum;
}
Since C stores arrays in row-major order, the inner loop of this function has a desirable stride-1 access pattern. Suppose that a is block aligned, words are 4 bytes, cache blocks are 4 words, and the cache is initially empty (a cold cache). Then the references to the array a produce the regular pattern of hits and misses described below.

      In this example, the reference to a[0][0] misses, and the corresponding block, which contains a[0][0] through a[0][3], is loaded into the cache from memory. Thus, the next three references are all hits. The reference to a[0][4] causes another miss as a new block is loaded into the cache, the next three references are hits, and so on. In general, three out of four references will hit, which is the best we can do in this case with a cold cache.
      But consider what happens if we make the seemingly innocuous change of permuting the loops, as in sumarraycols:
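The one-miss-in-four pattern can be checked by counting how often a sequential scan enters a new block. This is a simplified sketch under the same assumptions as the text (block-aligned array, cold cache); the function name stride1_cold_misses is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Count cold-cache misses for a stride-1 scan over n_words words.
 * Assumes the array is block aligned, so word w lives in block
 * w / words_per_block. A reference misses exactly when it touches a
 * block different from the one loaded by the previous reference. */
size_t stride1_cold_misses(size_t n_words, size_t words_per_block)
{
    assert(words_per_block > 0);
    size_t misses = 0;
    size_t current_block = 0;
    int have_block = 0;               /* cold cache: nothing loaded yet */

    for (size_t w = 0; w < n_words; ++w) {
        size_t block = w / words_per_block;
        if (!have_block || block != current_block) {
            ++misses;                 /* first touch of this block */
            current_block = block;
            have_block = 1;
        }
    }
    return misses;
}
```

For a row of 8 words with 4 words per block, the function counts 2 misses (at a[0][0] and a[0][4]) and therefore 6 hits, matching the description above.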
int sumarraycols(int a[M][N])
{
    int sum = 0;
    for (int j=0; j!=N; ++j)
        for (int i=0; i!=M; ++i)
            sum += a[i][j];
    return sum;
}
In this case, we are scanning the array column by column instead of row by row. If we are lucky and the entire array fits in the cache, then we will enjoy the same miss rate of 1/4. However, if the array is larger than the cache (the more likely case), then each and every access of a[i][j] will miss!
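Both cases can be reproduced with a toy cache model. The sketch below assumes a direct-mapped cache with 4-word blocks; the original essay does not specify a cache organization, so this choice, along with the name count_misses and its parameters, is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

enum { WORDS_PER_BLOCK = 4 };   /* 4-byte words, 16-byte blocks, as in the text */

/* Simulate a direct-mapped cache holding n_sets blocks and return the
 * number of misses for one full scan of a block-aligned M x N array of
 * words. by_column == 0 scans row by row (like sumarrayrows);
 * by_column != 0 scans column by column (like sumarraycols). */
size_t count_misses(size_t M, size_t N, size_t n_sets, int by_column)
{
    assert(n_sets > 0);
    size_t *tags = malloc(n_sets * sizeof *tags);
    assert(tags != NULL);
    for (size_t s = 0; s < n_sets; ++s)
        tags[s] = (size_t)-1;                 /* marks an empty cache line */

    size_t misses = 0;
    size_t outer = by_column ? N : M;
    size_t inner = by_column ? M : N;
    for (size_t x = 0; x < outer; ++x) {
        for (size_t y = 0; y < inner; ++y) {
            size_t i = by_column ? y : x;     /* row index */
            size_t j = by_column ? x : y;     /* column index */
            size_t block = (i * N + j) / WORDS_PER_BLOCK;
            size_t set = block % n_sets;      /* direct-mapped placement */
            if (tags[set] != block) {         /* block not resident: miss */
                tags[set] = block;
                ++misses;
            }
        }
    }
    free(tags);
    return misses;
}
```

For a 64 x 64 array (4096 words, 1024 blocks), a cache of 1024 blocks holds the whole array and both scan orders miss 1024 times, a rate of 1/4. With only 16 blocks, the row-wise scan still misses 1024 times, while the column-wise scan misses on all 4096 references.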

      Higher miss rates can have a significant impact on running time. For example, on our desktop machine, sumarrayrows runs twice as fast as sumarraycols. To summarize, the two functions illustrate two important points about writing cache-friendly code:
   1. Repeated references to local variables are good because the compiler can cache them in the register file (temporal locality).
   2. Stride-1 reference patterns are good because caches at all levels of the memory hierarchy store data as contiguous blocks (spatial locality).
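To observe the running-time difference on your own machine, a small timing harness along the following lines may help. It is only a sketch: the array size, the names grid, fill_grid, sum_rows, sum_cols and time_call are hypothetical, and the measured ratio depends on the hardware, the array size and the compiler (an optimizing compiler may even interchange the loops itself).

```c
#include <assert.h>
#include <time.h>

enum { ROWS = 1024, COLS = 1024 };   /* hypothetical size, 4 MB of int */
static int grid[ROWS][COLS];

/* Fill the array with small, predictable values. */
void fill_grid(void)
{
    for (int i = 0; i != ROWS; ++i)
        for (int j = 0; j != COLS; ++j)
            grid[i][j] = (i + j) % 7;
}

/* Stride-1 scan; sum is a local variable the compiler can keep in a register. */
long sum_rows(void)
{
    long sum = 0;
    for (int i = 0; i != ROWS; ++i)
        for (int j = 0; j != COLS; ++j)
            sum += grid[i][j];
    return sum;
}

/* Stride-COLS scan over the same data. */
long sum_cols(void)
{
    long sum = 0;
    for (int j = 0; j != COLS; ++j)
        for (int i = 0; i != ROWS; ++i)
            sum += grid[i][j];
    return sum;
}

/* Return the CPU seconds spent in one call of f and store its result in *out. */
double time_call(long (*f)(void), long *out)
{
    assert(f != 0 && out != 0);
    clock_t start = clock();
    *out = f();
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Call fill_grid() once, then time_call(sum_rows, &r) and time_call(sum_cols, &c). The two sums must be equal, and only the elapsed times should differ. Repeating each measurement several times gives more stable numbers.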
