Computer Architecture Lab (6): Cache Mapping Strategies

zyw2002

Article link: https://blog.csdn.net/zyw2002/article/details/127342321

Contents

Chapter 6: Lab Cache Mapping Strategies and Performance Analysis of Data Access Streams
1. Key Parameters
2. Data Access Stream
3. Cache Simulation
4. Cache Performance
- 4.1 Direct Mapping
- 4.2 Two-Way Set-Associative Mapping
5. Reflections

Chapter 6: Lab Cache Mapping Strategies and Performance Analysis of Data Access Streams

Based on the following C program, which is run on a real computer:

    int i, j, c, stride, array[256];
    ...
    for (i = 0; i < 10000; i++)
        for (j = 0; j < 256; j = j + stride)
            c = array[j] + 5;

If we consider only the cache activity generated by references to the array and we assume that integers are words, what is the expected miss rate when the cache is direct-mapped and stride=132? How about if stride=131? Would either of these change if the cache were two-way set associative?

(1) Extract the data access streams of this program at strides of 132 and 131, respectively.

(2) Run the cache simulator configured with four-word (16-byte) blocks and a total size of 256 bytes. Access the cache using the data access streams obtained in (1), once with a direct-mapped and once with a two-way set-associative configuration. Analyze the miss rates.

(3) Analyze the cache performance of the above code segment by using the data cache simulation tool in the MARS simulator.

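1. Key Parameters

Parameter	Meaning
choice	Mapping scheme: 1 = direct-mapped, 2 = set-associative, 3 = fully associative
catchsize	Cache size, in bytes
blocksize	Block size, in words
assoc	Associativity (n for an n-way set-associative cache)
accesscount	Number of accesses; equals the number of values in project.txt
hitcount	Number of hits (accesses found in the cache)
hitrate	Hit rate; hitrate = hitcount/accesscount
misscount	Number of misses (accesses not found in the cache)
missrate	Miss rate; missrate = misscount/accesscount
c1c, c2c, c3c	Miss counts for the different miss types
blockinbyte	Block size in bytes; blockinbyte = blocksize*4
NOofblock	Number of blocks; NOofblock = catchsize/blockinbyte
NOofset	Number of sets; NOofset = NOofblock/assoc
bytearray[]	Byte addresses of the data to access; equal to the values in project.txt
wordaddress[]	Word addresses of the data to access; wordaddress[] = bytearray[]/4
blockaddress[]	Block addresses of the data; blockaddress[] = wordaddress[]/blocksize
index	Index bits (set number); index = blockaddress[] % NOofset
tag	Tag bits; tag = blockaddress[]/NOofset
*valid	Valid bit; 1 = valid, 0 = invalid
lru[index][z]	LRU counter: how long the block has gone unused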
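
The formulas in the table above can be chained into a single address decomposition. The following is a minimal sketch, not the simulator's actual code: the `CacheSlot` struct and the `decompose` function are hypothetical names introduced here for illustration, and a word is assumed to be 4 bytes.

```c
#include <assert.h>

/* Hypothetical helper type, not part of the original simulator. */
typedef struct {
    unsigned block_address; /* wordaddress / blocksize */
    unsigned index;         /* blockaddress % NOofset  */
    unsigned tag;           /* blockaddress / NOofset  */
} CacheSlot;

/* Split a byte address into block address, set index, and tag.
 * cachesize is in bytes, blocksize is in 4-byte words, and assoc is the
 * number of ways (1 means direct-mapped). */
CacheSlot decompose(unsigned byte_address, unsigned cachesize,
                    unsigned blocksize, unsigned assoc)
{
    assert(blocksize > 0 && assoc > 0);

    unsigned blockinbyte = blocksize * 4;           /* bytes per block  */
    unsigned no_of_block = cachesize / blockinbyte; /* blocks in cache  */
    unsigned no_of_set   = no_of_block / assoc;     /* sets in cache    */
    unsigned word_address = byte_address / 4;

    assert(no_of_set > 0);

    CacheSlot s;
    s.block_address = word_address / blocksize;
    s.index = s.block_address % no_of_set;
    s.tag   = s.block_address / no_of_set;
    return s;
}
```

For example, `decompose(131 * 4, 256, 4, 1)` describes array[131] in the direct-mapped configuration used later in this lab, and passing `assoc = 2` gives the two-way set-associative mapping.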

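2. Data Access Stream

When stride=132 or stride=131, the inner loop only visits j=0 and j=stride, because the next value (264 or 262) already exceeds 255. The program therefore alternates between array[0] and array[132] (or array[131]). With i outer iterations this gives 2i memory accesses in total, all of them loads.

(To keep memory use small, the simulator is tested here with a stream of only 100 accesses, i.e. 50 outer iterations.)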

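    int tempnum = 1;
    for (i = 0; i < 100; i++)   // generate the alternating byte-address stream 0, stride*4, 0, ...
    {
        if (tempnum > 0) {
            bytearray[i] = 0;            // array[0]
        } else {
            bytearray[i] = 132 * 4;      // array[132], stride = 132
            // bytearray[i] = 131 * 4;   // array[131], stride = 131
        }
        tempnum = -tempnum;              // switch to the other address next time
    }

The code above produces an access stream of 100 references.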
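
The same stream can also be obtained by replaying the original loop nest and recording each address, which avoids hard-coding the two offsets. This is a sketch under two assumptions: the function name `extract_stream` is hypothetical, and the array is taken to start at byte offset 0.

```c
#include <assert.h>
#include <stddef.h>

/* Record the byte addresses touched by
 *     for (j = 0; j < 256; j += stride) c = array[j] + 5;
 * repeated `iterations` times. Writes at most max_out addresses to out
 * and returns how many were written. */
size_t extract_stream(unsigned stride, unsigned iterations,
                      unsigned *out, size_t max_out)
{
    assert(stride > 0);  /* a zero stride would never terminate */

    size_t n = 0;
    for (unsigned i = 0; i < iterations; i++) {
        for (unsigned j = 0; j < 256; j += stride) {
            if (n == max_out)
                return n;
            out[n++] = j * 4;  /* assumed: array base at byte offset 0 */
        }
    }
    return n;
}
```

Calling `extract_stream(132, 50, buf, 100)` on a 100-element buffer fills it with the alternating addresses 0 and 528, matching the hand-written generator above.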

3. Cache Simulation

Run the cache simulator configured with four-word (16-byte) blocks and a total size of 256 bytes. Access the cache using the data access streams obtained in (1), once with a direct-mapped and once with a two-way set-associative configuration. Analyze the miss rates.

The cache size is 256 bytes and the block size is 16 bytes, so the cache holds $256/16 = 16$ blocks.

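3.1 Direct Mapping

3.1.1 stride=131

array[0] has block address 0 and maps to cache block 0 mod 16 = 0.

array[131] has block address 131/4 = 32 (integer division) and maps to cache block 32 mod 16 = 0.

On the first access the cache is empty, so a compulsory miss occurs.

After that, the two memory blocks keep evicting each other from cache block 0, so every access is a conflict miss.

The miss rate is therefore 100%.

[Figure: simulator output, direct-mapped cache, stride = 131]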

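3.1.2 stride=132

array[0] has block address 0 and maps to cache block 0 mod 16 = 0.

array[132] has block address 132/4 = 33 and maps to cache block 33 mod 16 = 1.

On the first access to each of the two blocks the cache does not yet hold the data, so a compulsory miss occurs.

After that, both blocks stay in the cache and no further misses occur.

There are therefore 2 misses in 100 accesses, and the miss rate is $2/100 = 0.02$.

The experimental result is shown in the figure below:

[Figure: simulator output, direct-mapped cache, stride = 132]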
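
The direct-mapped reasoning above can be checked with a few lines of code. The following is a minimal sketch rather than the simulator used in the lab: `direct_mapped_misses` is a hypothetical name, and the cache geometry (16 blocks of 16 bytes) is fixed to the configuration of this exercise.

```c
#include <assert.h>
#include <stddef.h>

#define DM_BLOCKS      16  /* 256-byte cache / 16-byte blocks */
#define DM_BLOCK_BYTES 16  /* four 4-byte words per block     */

/* Count the misses of a direct-mapped cache over n byte addresses. */
size_t direct_mapped_misses(const unsigned *addr, size_t n)
{
    int      valid[DM_BLOCKS] = {0};
    unsigned tag[DM_BLOCKS]   = {0};
    size_t   misses = 0;

    assert(addr != NULL || n == 0);

    for (size_t k = 0; k < n; k++) {
        unsigned block = addr[k] / DM_BLOCK_BYTES; /* block address   */
        unsigned index = block % DM_BLOCKS;        /* cache block no. */
        unsigned t     = block / DM_BLOCKS;        /* tag             */

        if (!valid[index] || tag[index] != t) {
            misses++;            /* compulsory or conflict miss */
            valid[index] = 1;    /* load the block, evicting any old one */
            tag[index]   = t;
        }
    }
    return misses;
}
```

Feeding it the 100-access stream for stride=131 (addresses 0 and 524) and for stride=132 (addresses 0 and 528) gives the miss counts to compare against the hand analysis in this section.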

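3.2 Two-Way Set-Associative Mapping

With two-way set associativity, the number of sets is 16/2 = 8.

3.2.1 stride=131

array[0] has block address 0 and maps to cache set 0 mod 8 = 0.

array[131] has block address 131/4 = 32 and maps to cache set 32 mod 8 = 0.

The first access to each block is a compulsory miss. Because the cache is two-way set associative, both blocks then fit into set 0 at the same time and no further misses occur. There are therefore 2 misses in 100 accesses, and the miss rate is $2/100 = 0.02$.

[Figure: simulator output, two-way set-associative cache, stride = 131]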

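3.2.2 stride=132

array[0] has block address 0 and maps to cache set 0 mod 8 = 0.

array[132] has block address 132/4 = 33 and maps to cache set 33 mod 8 = 1.

The first access to each block is a compulsory miss. After that, the two blocks reside in set 0 and set 1 respectively, and no further misses occur. There are therefore 2 misses in 100 accesses, and the miss rate is $2/100 = 0.02$.

[Figure: simulator output, two-way set-associative cache, stride = 132]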
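
The two-way set-associative case can be sketched in the same way. This is an illustrative model, not the lab simulator: `two_way_lru_misses` is a hypothetical name, the geometry is fixed to 8 sets of 2 ways with 16-byte blocks, and LRU is tracked with a single "next victim" index per set, which is only sufficient because there are exactly two ways.

```c
#include <assert.h>
#include <stddef.h>

#define SA_SETS        8   /* 16 blocks / 2 ways */
#define SA_WAYS        2
#define SA_BLOCK_BYTES 16

/* Count the misses of a two-way set-associative LRU cache
 * over n byte addresses. */
size_t two_way_lru_misses(const unsigned *addr, size_t n)
{
    int      valid[SA_SETS][SA_WAYS] = {{0}};
    unsigned tag[SA_SETS][SA_WAYS]   = {{0}};
    int      lru_way[SA_SETS]        = {0}; /* way to evict next, per set */
    size_t   misses = 0;

    assert(addr != NULL || n == 0);

    for (size_t k = 0; k < n; k++) {
        unsigned block = addr[k] / SA_BLOCK_BYTES;
        unsigned set   = block % SA_SETS;
        unsigned t     = block / SA_SETS;
        int way = -1;

        for (int w = 0; w < SA_WAYS; w++)
            if (valid[set][w] && tag[set][w] == t)
                way = w;                 /* hit in way w */

        if (way < 0) {                   /* miss: fill the LRU way */
            misses++;
            way = lru_way[set];
            valid[set][way] = 1;
            tag[set][way]   = t;
        }
        lru_way[set] = 1 - way;          /* the other way is now LRU */
    }
    return misses;
}
```

A stream that cycles through three blocks mapping to the same set (for example byte addresses 0, 128, 256) still misses on every access under LRU, which shows that two ways remove the conflict only because this program touches just two blocks.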

4. Cache Performance

The following MARS code implements the same functionality:

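       addi $t2,$zero,0      # t2 = 0 (outer loop counter i)
       addi $t3,$zero,0      # t3 = 0 (byte offset j*4)
       addi $t4,$zero,0      # t4 = 0 (c)
$oLoop:
       sub  $t3,$t3,$t3      # t3 = 0, restart the inner loop
       addi $t2,$t2,1        # t2 = t2 + 1
       slti $t6,$t2,10001    # t6 = 1 if t2 < 10001
       bnez $t6,$iLoop       # t6 != 0: run the inner loop
       j    $finish          # otherwise leave the outer loop
$iLoop:
       slti $t6,$t3,1024     # t6 = 1 if t3 < 1024 (1024 = 256*4)
       beqz $t6,$oLoop       # t6 == 0: inner loop done, back to the outer loop
       lw   $t5,0x10010000($t3)   # t5 = array[j]; 0x10010000 is the base of the data segment
       addi $t4,$t5,5        # t4 = t5 + 5, i.e. c = array[j] + 5
       addi $t3,$t3,524      # t3 = t3 + stride*4; 524 = 131*4, use 528 = 132*4 for stride=132
       j    $iLoop           # continue the inner loop
$finish:   nop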

Open MARS and select Data Cache Simulator from the Tools menu.

[Figure: MARS Data Cache Simulator window]

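4.1 Direct Mapping

Select Direct Mapping as the placement policy and LRU as the replacement policy, set Number of blocks to 16 and Cache block size (words) to 4. The total cache size is then 256 bytes.

stride=131

Set the immediate in addi $t3,$t3,524 to 524 = 131*4. The outer loop runs 10000 times, which amounts to 20000 accesses. Every access misses, so the hit rate is 0.

[Figure: MARS result, direct-mapped cache, stride = 131]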

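stride=132

Change the immediate so that the instruction reads addi $t3,$t3,528, where 528 = 132*4. The outer loop runs 10000 times, which amounts to 20000 accesses. Only the first 2 accesses are compulsory misses, so the hit rate is 19998/20000 = 99.99%, i.e. approximately 100%.

[Figure: MARS result, direct-mapped cache, stride = 132]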

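4.2 Two-Way Set-Associative Mapping

Select N-way Set Associative as the placement policy, set Set size to 2, choose LRU as the replacement policy, and set Number of blocks to 16 and Cache block size (words) to 4. The total cache size is again 256 bytes.

stride=131

Set the immediate in addi $t3,$t3,524 to 524 = 131*4. The outer loop runs 10000 times, which amounts to 20000 accesses. Only the first 2 accesses are compulsory misses, so the hit rate is 19998/20000 = 99.99%, i.e. approximately 100%.

stride=132

Change the immediate so that the instruction reads addi $t3,$t3,528, where 528 = 132*4. The outer loop runs 10000 times, which amounts to 20000 accesses. Only the first 2 accesses are compulsory misses, so the hit rate is 19998/20000 = 99.99%, i.e. approximately 100%.

[Figure: MARS result, two-way set-associative cache]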

In summary, the results measured in the MARS simulator agree with the analysis in Section 3.
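
The miss rate of 0.02 in Section 3 and the hit rate of about 100% here describe the same behaviour; they differ only in stream length. As a short worked check, assume $N$ outer iterations (so $2N$ accesses) and exactly 2 compulsory misses, which holds for every configuration except direct mapping with stride=131:

$$
\text{miss rate} = \frac{2}{2N} = \frac{1}{N}, \qquad
N = 50:\ \frac{2}{100} = 0.02, \qquad
N = 10000:\ \frac{2}{20000} = 0.0001
$$

For direct mapping with stride=131 every access misses, so the miss rate is $2N/2N = 1$ regardless of $N$.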

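5. Reflections

This lab deepened my understanding of how caches work and made the difference between direct mapping and set-associative mapping much more concrete. It also showed how much the placement of data in memory affects access efficiency: in this experiment, two-way set-associative mapping removed the conflict misses that direct mapping suffers at stride=131 and performed no worse at stride=132. More generally, the data access stream of a program can have a large impact on memory access efficiency.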
