Rearranging Loops to Increase Spatial Locality

Original post, 2013-12-04 09:38:59

Consider the problem of multiplying a pair of n×n matrices: C = AB. For example, if n = 2, then

    c11 = a11*b11 + a12*b21
    c12 = a11*b12 + a12*b22
    c21 = a21*b11 + a22*b21
    c22 = a21*b12 + a22*b22

Matrix multiply is usually implemented using three nested loops, which are identified by their indexes i, j, and k. The following two versions, ijk and jik, take the same number of cycles per inner-loop iteration:

// Version ijk                      // Version jik
for (int i=0; i!=n; ++i)            for (int j=0; j!=n; ++j)
    for (int j=0; j!=n; ++j) {          for (int i=0; i!=n; ++i) {
        double sum = 0.0;                   double sum = 0.0;
        for (int k=0; k!=n; ++k)            for (int k=0; k!=n; ++k)
            sum += A[i][k]*B[k][j];             sum += A[i][k]*B[k][j];
        C[i][j] += sum;                     C[i][j] += sum;
    }                                   }

The inner loops of both routines scan a row of array A with a stride of 1 and a column of array B with a stride of n. Suppose a block holds four words and n is so large that a single matrix row does not fit in the L1 cache. Then A misses once per four-word block, or 0.25 misses per iteration, while every access to B misses, for a total of 1.25 misses per iteration.
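
The 1.25 figure can be checked with a toy model. The sketch below is not a real cache simulation: it assumes row-major storage, four words per block, rows that start on a block boundary, and that an access hits only when it falls in the same block as the previous access to the same array. The names `BLOCK_WORDS`, `count_misses`, and `misses_per_iter_ijk` are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Words per cache block, as assumed in the text. */
#define BLOCK_WORDS 4u

/*
 * Count misses for one array over an inner loop of n iterations in which
 * the word address advances by `stride` each iteration.
 * Simplifying assumption: an access hits only if it lands in the same
 * block as the previous access to this array.
 */
static size_t count_misses(size_t start, size_t stride, size_t n)
{
    size_t misses = 0;
    size_t last_block = (size_t)-1;   /* sentinel: no block touched yet */
    for (size_t t = 0; t != n; ++t) {
        size_t block = (start + t * stride) / BLOCK_WORDS;
        if (block != last_block) {
            ++misses;
            last_block = block;
        }
    }
    return misses;
}

/*
 * Misses per inner-loop iteration for versions ijk and jik:
 * A[i][k] is scanned with stride 1, B[k][j] with stride n.
 */
static double misses_per_iter_ijk(size_t n)
{
    assert(n >= BLOCK_WORDS && n % BLOCK_WORDS == 0);
    size_t a = count_misses(0, 1, n);   /* one row of A */
    size_t b = count_misses(0, n, n);   /* one column of B */
    return (double)(a + b) / (double)n;
}
```

Under these assumptions, calling `misses_per_iter_ijk(1024)` counts 256 misses on A and 1024 on B, which gives 1.25 misses per iteration.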

To increase spatial locality, the loops can be rearranged as follows:

// Version kij                      // Version ikj
for (int k=0; k!=n; ++k)            for (int i=0; i!=n; ++i)
    for (int i=0; i!=n; ++i) {          for (int k=0; k!=n; ++k) {
        double r = A[i][k];                 double r = A[i][k];
        for (int j=0; j!=n; ++j)            for (int j=0; j!=n; ++j)
            C[i][j] += r*B[k][j];               C[i][j] += r*B[k][j];
    }                                   }

These routines present an interesting trade-off. With two loads and a store in the inner loop, they require one more memory operation per iteration than versions ijk and jik. On the other hand, since the inner loop scans both B and C row-wise with a stride-1 access pattern, the miss rate on each array is only 0.25 misses per iteration, for a total of 0.50 misses per iteration for both kij and ikj.
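
Reordering the loops changes the memory access pattern but not the result. As a minimal sketch of that point, the code below implements versions ijk and ikj on flat row-major arrays (an assumption made here; the listings above use `A[i][k]` indexing) and compares their output on a small integer-valued example. The function names are hypothetical, and C is assumed to start out zeroed.

```c
#include <assert.h>
#include <stddef.h>

/* Matrices are n-by-n, row-major, in flat arrays: X[i][j] is x[i*n + j]. */

/* Version ijk: two loads per inner iteration (A and B); sum stays in a local. */
static void matmul_ijk(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i != n; ++i)
        for (size_t j = 0; j != n; ++j) {
            double sum = 0.0;
            for (size_t k = 0; k != n; ++k)
                sum += a[i*n + k] * b[k*n + j];
            c[i*n + j] += sum;
        }
}

/* Version ikj: two loads (B and C) and one store (C) per inner iteration. */
static void matmul_ikj(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i != n; ++i)
        for (size_t k = 0; k != n; ++k) {
            double r = a[i*n + k];
            for (size_t j = 0; j != n; ++j)
                c[i*n + j] += r * b[k*n + j];
        }
}

/* Returns 1 if both loop orders produce the same product on a 3x3 example. */
static int orders_agree(void)
{
    enum { N = 3 };
    const double a[N*N] = {1, 2, 3, 4, 5, 6, 7, 8, 9};
    const double b[N*N] = {9, 8, 7, 6, 5, 4, 3, 2, 1};
    double c1[N*N] = {0};
    double c2[N*N] = {0};

    matmul_ijk(N, a, b, c1);
    matmul_ikj(N, a, b, c2);
    for (size_t t = 0; t != N*N; ++t)
        if (c1[t] != c2[t])
            return 0;
    return 1;
}
```

The example uses small integers so that every intermediate value is exactly representable and the comparison does not depend on rounding.
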
Three conclusions follow. First, pairs of versions with the same number of memory references and misses per iteration have almost identical measured performance (cycles per inner-loop iteration). Second, miss rate is, in this case, a better predictor of performance than the total number of memory accesses. Third, for large values of n, the performance of the faster pair of versions (kij and ikj) is constant.
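
To try the comparison on a given machine, a rough timing harness along the following lines could be used. It is only a sketch under stated assumptions: it reports CPU time from the standard `clock()` function in nanoseconds per inner-loop iteration rather than cycles, it stores matrices as flat row-major arrays, and the names `matmul_fn`, `matmul_jik`, `matmul_kij`, and `ns_per_iteration` are invented for this example. Results will vary with compiler flags, cache sizes, and n.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

typedef void (*matmul_fn)(size_t n, const double *a, const double *b, double *c);

/* Version jik: inner loop walks a row of A (stride 1) and a column of B (stride n). */
static void matmul_jik(size_t n, const double *a, const double *b, double *c)
{
    for (size_t j = 0; j != n; ++j)
        for (size_t i = 0; i != n; ++i) {
            double sum = 0.0;
            for (size_t k = 0; k != n; ++k)
                sum += a[i*n + k] * b[k*n + j];
            c[i*n + j] += sum;
        }
}

/* Version kij: inner loop walks rows of B and C, both with stride 1. */
static void matmul_kij(size_t n, const double *a, const double *b, double *c)
{
    for (size_t k = 0; k != n; ++k)
        for (size_t i = 0; i != n; ++i) {
            double r = a[i*n + k];
            for (size_t j = 0; j != n; ++j)
                c[i*n + j] += r * b[k*n + j];
        }
}

/*
 * Time one n-by-n multiply with f and return nanoseconds of CPU time per
 * inner-loop iteration (n*n*n iterations in total), or -1.0 if allocation
 * fails. The sum of all entries of C is written to *checksum so the work
 * is observable and the result can be sanity-checked.
 */
static double ns_per_iteration(matmul_fn f, size_t n, double *checksum)
{
    assert(f != NULL && n > 0 && checksum != NULL);
    size_t count = n * n;
    double *a = malloc(count * sizeof *a);
    double *b = malloc(count * sizeof *b);
    double *c = malloc(count * sizeof *c);
    if (a == NULL || b == NULL || c == NULL) {
        free(a); free(b); free(c);
        return -1.0;
    }
    for (size_t t = 0; t != count; ++t) {
        a[t] = 1.0;   /* all-ones inputs: every entry of C should equal n */
        b[t] = 1.0;
        c[t] = 0.0;
    }

    clock_t t0 = clock();
    f(n, a, b, c);
    clock_t t1 = clock();

    double sum = 0.0;
    for (size_t t = 0; t != count; ++t)
        sum += c[t];
    *checksum = sum;

    free(a); free(b); free(c);
    double seconds = (double)(t1 - t0) / CLOCKS_PER_SEC;
    return 1e9 * seconds / ((double)n * (double)n * (double)n);
}
```

Calling `ns_per_iteration(matmul_jik, n, &sum)` and `ns_per_iteration(matmul_kij, n, &sum)` for a range of n gives one data point per version and size. Because `clock()` has coarse resolution, small n may report zero elapsed time, so larger sizes or repeated runs are advisable.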
