Using Blocking to Increase Temporal Locality

Original post, December 4, 2013, 10:27:53

In the last essay, Rearranging Loops to Increase Spatial Locality, we saw how some simple rearrangements of the loops could increase spatial locality. But observe that even with good loop nestings, the time per loop iteration increases with increasing array size. As the array size increases, the temporal locality decreases, and the cache experiences an increasing number of capacity misses. To fix this, we can use a general technique called blocking.

      The general idea of blocking is to organize the data structures in a program into large chunks called blocks. (In this context, the term “block” refers to an application-level chunk of data, not a cache block.) The program is structured so that it loads a chunk into the L1 cache, does all the reads and writes that it needs to on that chunk, then discards the chunk, loads in the next chunk, and so on.

      Blocking a matrix multiply routine works by partitioning the matrices into submatrices and then exploiting the mathematical fact that these submatrices can be manipulated just like scalars. For example, if n = 8, then we could partition each matrix into four 4×4 submatrices:
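Written out for the product C = AB, the partition can be sketched as follows. This is the standard block-matrix identity; the symbols A_{ij}, B_{ij}, and C_{ij} are introduced here to denote the 4×4 submatrices.

```latex
\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}
\begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix}
=
\begin{bmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{bmatrix},
\qquad
\begin{aligned}
C_{11} &= A_{11}B_{11} + A_{12}B_{21}, & C_{12} &= A_{11}B_{12} + A_{12}B_{22},\\
C_{21} &= A_{21}B_{11} + A_{22}B_{21}, & C_{22} &= A_{21}B_{12} + A_{22}B_{22}.
\end{aligned}
```

Each product such as A_{11}B_{11} is itself an ordinary 4×4 matrix multiplication, which is what lets the submatrices be treated like scalars.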


The blocked version of matrix multiplication, which we call the bijk version, is presented below. The basic idea behind this code is to partition A and C into 1×bsize row slivers and to partition B into bsize×bsize blocks. The innermost (j, k) loop pair multiplies a sliver of A by a block of B and accumulates the result into a sliver of C. The i loop iterates through n row slivers of A and C, using the same block in B.

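/* array is assumed to be a square 2-D array of doubles defined elsewhere */
void bijk(array A, array B, array C, int n, int bsize)
{
    double sum = 0.0;
    /* Amount that fits evenly into blocks; the result is complete
       only when n is a multiple of bsize */
    int en = bsize*(n/bsize);

    /* Clear C so partial products can be accumulated into it */
    for (int i=0; i<n; ++i)
        for (int j=0; j<n; ++j)
            C[i][j] = 0.0;

    for (int kk=0; kk<en; kk+=bsize) {          /* block row of B */
        for (int jj=0; jj<en; jj+=bsize) {      /* block column of B */
            for (int i=0; i<n; ++i) {           /* row sliver of A and C */
                for (int j=jj; j<jj+bsize; ++j) {
                    sum = C[i][j];
                    for (int k=kk; k<kk+bsize; ++k)
                        sum += A[i][k]*B[k][j];
                    C[i][j] = sum;
                }
            }
        }
    }
}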

      The key idea is that bijk loads a block of B into the cache, uses it up, and then discards it. References to A enjoy good spatial locality because each sliver is accessed with a stride of 1. There is also good temporal locality because the entire sliver is referenced bsize times in succession. References to B enjoy good temporal locality because the entire bsize×bsize block is accessed n times in succession. Finally, the references to C have good spatial locality because each element of the sliver is written in succession. Notice that references to C do not have good temporal locality because each sliver is only accessed one time for each block of B.
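This analysis suggests a rough way to pick bsize: choose the largest value for which one bsize×bsize block of B plus one 1×bsize sliver each of A and C fits in the cache. The sketch below is only a back-of-the-envelope model under that assumption. The function name max_bsize and its parameters are hypothetical, and the model ignores conflict misses, associativity, and any other data competing for the cache.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: largest bsize whose modeled working set fits in
 * cache_bytes. The model (an assumption, not a measurement) counts one
 * bsize x bsize block of B plus one 1 x bsize sliver each of A and C,
 * i.e. elem_bytes * (bsize*bsize + 2*bsize) bytes. */
size_t max_bsize(size_t cache_bytes, size_t elem_bytes)
{
    assert(elem_bytes > 0);   /* a zero element size would never terminate */

    size_t b = 0;
    /* Grow b while the working set for b + 1 still fits. */
    while (elem_bytes * ((b + 1) * (b + 1) + 2 * (b + 1)) <= cache_bytes)
        ++b;
    return b;
}
```

For example, assuming a 32 KiB L1 data cache and 8-byte doubles, max_bsize(32768, 8) returns 63. In practice the best bsize is usually found by timing a few candidates at or below such an estimate.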

      Blocking can make code harder to read, but it can also pay big performance dividends. Blocking improves the running time by a factor of two over the best non-blocked version, from about 20 cycles per iteration down to about 10 cycles per iteration.
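The bijk routine only covers the part of the matrices that divides evenly into blocks. One possible way to include the leftover rows and columns is to clamp each block's upper bound to n. The following is a minimal sketch of that idea, not the original routine: it assumes flat row-major arrays instead of the array type, and the names min_size, matmul_naive, and matmul_blocked are hypothetical. The naive version is included only as a reference for checking results.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: the smaller of two sizes. */
static size_t min_size(size_t a, size_t b) { return a < b ? a : b; }

/* Reference triple loop (ijk order), used only to check the blocked version.
 * A, B, and C are n x n matrices stored row-major in flat arrays. */
void matmul_naive(const double *A, const double *B, double *C, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (size_t k = 0; k < n; ++k)
                sum += A[i*n + k] * B[k*n + j];
            C[i*n + j] = sum;
        }
    }
}

/* Blocked multiply with the same kk/jj/i/j/k loop order as bijk.
 * Block edges are clamped to n, so n need not be a multiple of bsize. */
void matmul_blocked(const double *A, const double *B, double *C,
                    size_t n, size_t bsize)
{
    assert(bsize > 0);

    /* Clear C so partial products can be accumulated into it. */
    for (size_t i = 0; i < n * n; ++i)
        C[i] = 0.0;

    for (size_t kk = 0; kk < n; kk += bsize) {
        size_t kend = min_size(kk + bsize, n);      /* clamp last block */
        for (size_t jj = 0; jj < n; jj += bsize) {
            size_t jend = min_size(jj + bsize, n);  /* clamp last block */
            for (size_t i = 0; i < n; ++i) {
                for (size_t j = jj; j < jend; ++j) {
                    double sum = C[i*n + j];
                    for (size_t k = kk; k < kend; ++k)
                        sum += A[i*n + k] * B[k*n + j];
                    C[i*n + j] = sum;
                }
            }
        }
    }
}
```

Calling matmul_blocked(A, B, C, 5, 2) and matmul_naive(A, B, C_ref, 5) on the same inputs should give matching results. With general floating-point data the two may differ in the last bits because the additions happen in a different order, so a comparison with a small tolerance is safer than exact equality.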
