Code Optimization: Using Memory Effectively

tofro

Category: Algorithms and Optimization. Tags: optimization, data structures, compiler, cache, algorithm, up

Article link: https://blog.csdn.net/tofro/article/details/7169629

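"Code Optimization: Effective Memory Usage"
Some of the ideas and methods in this book are worth borrowing, but the specific techniques depend on the processor architecture and cannot be applied across the board. The book (both the original and the translated edition) also contains some bugs.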
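I. Memory optimization
1. Loop unrolling
Unrolling a loop reduces the number of branches (loop iterations). It is usually done by repeating the instructions of the loop body.
Example: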
// Original loop
for (a = 0; a < 666; a++)
   x += p[a];

// Unrolled loop
for (a = 0; a < 664; a += 4)
{
   // The iteration count (666) is rounded down
   // to the nearest multiple of four (664).

   x += p[a];          // The body
   x += p[a + 1];      // of the loop
   x += p[a + 2];      // is duplicated
   x += p[a + 3];      // four times.
}
x += p[a];             // The two remaining iterations
x += p[a + 1];         // are added to the end.

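If the iteration count is a variable, the loop can be rewritten as:
for (a = 0; a < (N & ~3); a += 4)
{
   x += p[a];          // body repeated four times
   x += p[a + 1];
   x += p[a + 2];
   x += p[a + 3];
}
for (a = (N & ~3); a < N; a++)
   x += p[a];          // the remaining 0 to 3 iterations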
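
The fragment above can be packaged as a complete function. This is only a minimal sketch: the name `sum_unrolled4`, the `long` accumulator and the `size_t` count are assumptions made for illustration, not something taken from the book.

```c
#include <assert.h>
#include <stddef.h>

/* Sums n ints from p, with the loop body unrolled four times. */
long sum_unrolled4(const int *p, size_t n)
{
    assert(p != NULL || n == 0);

    long x = 0;
    size_t main_part = n & ~(size_t)3;   /* n rounded down to a multiple of 4 */
    size_t a;

    for (a = 0; a < main_part; a += 4) {
        x += p[a];
        x += p[a + 1];
        x += p[a + 2];
        x += p[a + 3];
    }
    for (; a < n; a++)                   /* 0 to 3 leftover iterations */
        x += p[a];

    return x;
}
```

Whether manual unrolling pays off depends on the compiler and the processor; many compilers can unroll such a loop on their own, so it is worth measuring before keeping the longer form.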

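2. Eliminating data dependencies
If the requested RAM cells have an address-data dependency, the CPU cannot process them in parallel. Before it can issue the next request it has to wait for the address, that is, for the previous value to be read and the address to be computed from it.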
/*-----------------------------------------------------------------------
 * Loop for reading dependent data
 * (nonoptimized version)
 -----------------------------------------------------------------------*/
for (a = 0; a < BLOCK_SIZE; a += 32)
// The loop is unrolled to speed up the processing.
{
   x = *(int *)((char *)p1 + a + 0);
   // The cell is read.

   a += x;
   // The address of the next cell is calculated using the value of
   // the previous cell. Therefore, the processor cannot send
   // the next request to the chipset until it receives this cell.
   // The code proceeds in a similar manner...
   y = *(int *)((char *)p1 + a + 4);
   a += y;

   x = *(int *)((char *)p1 + a + 8);
   a += x;

   y = *(int *)((char *)p1 + a + 12);
   a += y;

   x = *(int *)((char *)p1 + a + 16);
   a += x;

   y = *(int *)((char *)p1 + a + 20);
   a += y;

   x = *(int *)((char *)p1 + a + 24);
   a += x;

   y = *(int *)((char *)p1 + a + 28);
   a += y;
}

/*-----------------------------------------------------------------------
 * Loop for reading independent data
 * (optimized version)
 -----------------------------------------------------------------------*/
for (a = 0; a < BLOCK_SIZE; a += 32)
{
   x += *(int *)((char *)p1 + a + 0);
   y += *(int *)((char *)p1 + a + 4);
   x += *(int *)((char *)p1 + a + 8);
   y += *(int *)((char *)p1 + a + 12);
   x += *(int *)((char *)p1 + a + 16);
   y += *(int *)((char *)p1 + a + 20);
   x += *(int *)((char *)p1 + a + 24);
   y += *(int *)((char *)p1 + a + 28);
   // The processor could send the next request to the chipset
   // without waiting for the previous request to be completed,
   // because the cell address is not related to the data being processed.
}

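Note: the book's code looks questionable. The loop variable a is modified inside the loop body, which is normally something to avoid, whereas the optimized version does not modify a inside the loop.
The modification does appear to be deliberate, since it is what makes each address depend on the previously read value, but the two loops then visit the same cells only if the block contains zeros.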
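
The same contrast can be shown without modifying a loop counter. The sketch below is an illustration under assumptions, not the book's code: `struct cell`, `sum_chained` and `sum_indexed` are hypothetical names. In the first function each address comes out of the previous load; in the second every address follows from the loop counter alone.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cell type: each cell stores the index of the next cell. */
struct cell {
    size_t next;   /* index of the next cell to visit */
    int value;
};

/* Dependent reads: the address of each load comes from the previous load. */
long sum_chained(const struct cell *cells, size_t start, size_t count)
{
    assert(cells != NULL || count == 0);

    long sum = 0;
    size_t i = start;
    while (count-- > 0) {
        sum += cells[i].value;
        i = cells[i].next;     /* must arrive before the next load can start */
    }
    return sum;
}

/* Independent reads: every address is known from the loop counter alone. */
long sum_indexed(const struct cell *cells, size_t count)
{
    assert(cells != NULL || count == 0);

    long x = 0, y = 0;         /* two accumulators, as in the listing above */
    size_t i;
    for (i = 0; i + 1 < count; i += 2) {
        x += cells[i].value;
        y += cells[i + 1].value;
    }
    if (i < count)             /* odd element left over */
        x += cells[i].value;
    return x + y;
}
```

Both functions return the same total when the chain visits every cell exactly once; only the order in which the addresses become known differs.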

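On MIPS, data dependencies need the same attention. Avoid potential data dependencies.
for(i=0;i<100;i++)
{
b[i]=a[0]+a[1];// reads and writes are interleaved, so the compiler assumes memory may have changed and reloads a[0]
b[i+1]=a[0]+a[2];
...
}
Temporary variables can be used to split the work into separate steps: read memory, compute, write memory.
This also lets the registers do their job.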
for(i=0;i<100;i++)
{
int tmp, b1, b2;
tmp=a[0];
b1=tmp+a[1];
b2=tmp+a[2];
b[i]=b1;
b[i+1]=b2;
}
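
Taken one step further, the reads can be moved out of the loop entirely. The sketch below assumes that a and b never overlap (otherwise hoisting the reads would change the result), and it steps by two so that the stores do not overwrite each other; the function name `fill_pairs` and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Writes a[0]+a[1] and a[0]+a[2] into consecutive pairs of b.
   n is the number of ints in b; an odd trailing element is left untouched.
   Assumption: a and b do not overlap. */
void fill_pairs(int *b, const int *a, size_t n)
{
    assert(a != NULL && (b != NULL || n == 0));

    /* Read and compute phase: a[0..2] are loaded once into locals,
       which the compiler is free to keep in registers. */
    const int b1 = a[0] + a[1];
    const int b2 = a[0] + a[2];

    /* Write phase: the loop body contains only stores. */
    for (size_t i = 0; i + 1 < n; i += 2) {
        b[i] = b1;
        b[i + 1] = b2;
    }
}
```

In C99 and later, declaring the pointers with `restrict` is another way to tell the compiler that the stores through b cannot change a.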

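3. Processing data in parallel
Reading independent data linearly (in byte order) does not guarantee that it is processed in parallel. Read it with a stride of 32 bytes instead (the size of a cache line?). This can also be combined with memory prefetching. That is:
(1) Access memory with a stride of 32 bytes.
(2) Then loop over the contents of each 32-byte block.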
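
One possible reading of these two steps is sketched below. It is an interpretation, not the book's listing: the 32-byte line size is taken from the text, while `sum_by_lines`, the `block` parameter and the `volatile` sink (used only so the compiler keeps the stride reads) are assumptions.

```c
#include <assert.h>
#include <stddef.h>

#define LINE_BYTES 32                          /* assumed cache-line size */
#define INTS_PER_LINE (LINE_BYTES / sizeof(int))

/* Sums n ints. Each block of `block` ints is first touched once per
   32-byte line, then read in full. */
long sum_by_lines(const int *p, size_t n, size_t block)
{
    assert(block > 0 && (p != NULL || n == 0));

    long sum = 0;
    volatile int sink = 0;     /* keeps the stride reads from being removed */

    for (size_t base = 0; base < n; base += block) {
        size_t end = (n - base < block) ? n : base + block;

        /* (1) Stride pass: one read per 32-byte line, so the requests
           for different lines do not wait on each other. */
        for (size_t i = base; i < end; i += INTS_PER_LINE)
            sink = p[i];

        /* (2) Full pass over the contents of every line in the block. */
        for (size_t i = base; i < end; i++)
            sum += p[i];
    }
    (void)sink;
    return sum;
}
```

The block size would normally be chosen so that one block fits in the cache. On processors with hardware prefetchers the stride pass may bring no benefit at all, so this is something to measure rather than assume.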

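4. Optimizing referenced data structures
Data in memory should be packed together as tightly as possible.
(1) Pay attention to byte alignment and reduce the size of data structures.
(2) Reduce the size of reference (index) arrays by using smaller data types.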
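(3) Mind the effect of the memory page size (32K) and the cache size (4K). These figures depend on the processor; on many x86 systems, for instance, the page size is 4 KB.
Keep the core code and data within these limits as far as possible.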
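
Points (1) and (2) can be illustrated with a small sketch. The struct and type names are hypothetical, and the actual sizes depend on the ABI: with 8-byte alignment for double, the first layout typically takes 24 bytes and the second 16.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Members in an order that forces padding after each small field. */
struct record_loose {
    char   tag;
    double weight;
    char   flag;
    int    count;
};

/* Same members, largest first, so the small ones sit together at the end. */
struct record_tight {
    double weight;
    int    count;
    char   tag;
    char   flag;
};

/* An index table over at most 65536 records can use 16-bit entries
   instead of pointers or size_t values. */
typedef uint16_t record_index;

/* Bytes saved per record by the reordered layout on this platform. */
size_t bytes_saved_per_record(void)
{
    assert(sizeof(struct record_tight) <= sizeof(struct record_loose));
    return sizeof(struct record_loose) - sizeof(struct record_tight);
}
```

Smaller records mean that more of them fit into one cache line, which is the point of keeping data tightly packed.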

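5. Combining memory access with computation
Perform computation while memory is being loaded, so that as much work as possible proceeds in parallel.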
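
A minimal sketch of the idea, assuming a simple sum of squares (the function name `sum_of_squares` is hypothetical): the load for the next element is issued before the arithmetic on the current one, so the two can overlap.

```c
#include <assert.h>
#include <stddef.h>

/* Sum of squares; the load for iteration i+1 is issued before the
   arithmetic for iteration i. */
long sum_of_squares(const int *p, size_t n)
{
    assert(p != NULL || n == 0);
    if (n == 0)
        return 0;

    long sum = 0;
    int cur = p[0];
    for (size_t i = 1; i < n; i++) {
        int next = p[i];          /* start the next load */
        sum += (long)cur * cur;   /* compute on the value already loaded */
        cur = next;
    }
    sum += (long)cur * cur;       /* last element */
    return sum;
}
```

Out-of-order processors and optimizing compilers often perform this reordering themselves, so the hand-written form matters mostly on simpler in-order cores.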

6. Combining reads and writes
As a rule, do not bunch all memory reads together or all memory writes together.

7. Access memory only when necessary
This depends on the specific algorithm and on how the data flows through it.
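
One common case, given here only as an illustration with hypothetical names: a result that is written through a pointer can be accumulated in a local variable and stored once, instead of being updated in memory on every iteration.

```c
#include <assert.h>
#include <stddef.h>

/* Counts the nonzero elements of p and stores the result in *out. */
void count_nonzero(const int *p, size_t n, size_t *out)
{
    assert(out != NULL && (p != NULL || n == 0));

    size_t count = 0;              /* can live in a register during the loop */
    for (size_t i = 0; i < n; i++)
        if (p[i] != 0)
            count++;

    *out = count;                  /* single store instead of one per hit */
}
```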
