[Computer Systems: A Programmer's Perspective] CSAPP Lab 5: CacheLab (unfinished)


热爱学习的贾克斯


Category: Notes. Tags: c++, operating systems

Article link: https://blog.csdn.net/qq_42234461/article/details/108790705


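Preface

This lab helps build an understanding of how the cache affects program performance. It has two parts:

Part A: write two to three hundred lines of code that simulate a cache.

Part B: optimize a matrix transpose so that it incurs as few cache misses as possible.

I did the lab on Windows 10 with WSL 2 and Ubuntu 18.04.

Click to view all of my code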

The CMU website, however, has not opened for two days.

So I downloaded the lab materials from GitHub.

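References

CSAPP Labs materials (with PDFs)

CSAPP Labs materials (with source files)

Computer Systems: A Programmer's Perspective, cachelab

CSAPP cachelab: notes on the solution approach

Installing valgrind on Linux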

PART A

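Task: implement the cache simulator in csim.c. Its output must be identical to that of the reference simulator csim-ref. The cache uses the LRU replacement policy.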

linux> ./csim-ref -v -s 4 -E 1 -b 4 -t traces/yi.trace
L 10,1 miss
M 20,1 miss hit
L 22,1 hit
S 18,1 hit
L 110,1 miss eviction
L 210,1 miss eviction
M 12,1 miss eviction hit
hits:4 misses:5 evictions:3

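Command-line arguments

Usage: ./csim-ref [-hv] -s <s> -E <E> -b <b> -t <tracefile>

The arguments mean the following:

① -h: print the help message;

② -v: print verbose information about each access;

③ -s: number of set index bits (so the number of sets is S = 2^s);

④ -E: number of lines per set;

⑤ -b: number of block offset bits (so the block size is B = 2^b bytes);

⑥ -t: path of the input trace file (the accesses to simulate are read from it).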

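Trace operations

Operation	Meaning	Description
I	instruction load	an instruction load; ignored by the simulator
L	data load	a data load; one cache access
S	data store	a data store; one cache access
M	data modify	a data modify (i.e., a data load followed by a data store); two cache accesses

There is never a space before an "I". There is always a space before each "M", "L", and "S".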
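The spacing rule above matters when reading a trace line. As a minimal sketch, the helper below parses one line with sscanf; the names trace_record, parse_trace_line and is_data_access are hypothetical and are not part of the lab's starter code. A leading blank in the format string skips optional whitespace, so lines with and without the leading space go through the same path.

```c
#include <assert.h>
#include <stdio.h>

/* One parsed trace record (hypothetical helper type for illustration). */
typedef struct {
    char op;                 /* 'I', 'L', 'S' or 'M' */
    unsigned long address;   /* hexadecimal address from the trace */
    int size;                /* number of bytes accessed */
} trace_record;

/* Parses one line such as " L 10,1" or "I 0400d7d4,8".
   Returns 1 on success and 0 if the line is malformed. */
int parse_trace_line(const char *line, trace_record *out) {
    assert(line != NULL && out != NULL);
    /* The leading space in the format skips any whitespace before the
       operation letter, so both line shapes are accepted. */
    return sscanf(line, " %c %lx,%d", &out->op, &out->address, &out->size) == 3;
}

/* Only L, S and M touch the data cache; I is skipped by the simulator. */
int is_data_access(const trace_record *r) {
    assert(r != NULL);
    return r->op == 'L' || r->op == 'S' || r->op == 'M';
}
```

Checking that sscanf converted all three fields rejects truncated or malformed lines instead of reusing stale values.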

Here is how traces/yi.trace executes. (S, E, B, m) = (2^4, 1, 2^4, 64).

That is, there are 16 sets with a single line each (the cache degenerates to a direct-mapped cache).

b = 4, B = 16

s = 4, S = 16

t = m - b - s = …

yi.trace

Format: operation, address, size

 L 10,1
 M 20,1
 L 22,1
 S 18,1
 L 110,1
 L 210,1
 M 12,1

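① Access to address 0x10:

0x10 = 0000…00010000. The lowest four bits are the block offset and the next four are the set index, so set index = 1;

The result is a miss;

② Two consecutive accesses to address 0x20 (an M operation):

0x20 = 000…00100000, set index = 2;

The first access is a miss and the second is a hit;

③ Access to address 0x22:

0x22 = 000…00100010, set index = 2;

Operation ② already brought this block into the cache, so the result is a hit;

④ Access to address 0x18:

0x18 = 000…00011000, set index = 1;

Operation ① already brought this block into the cache, so the result is a hit;

⑤ Access to address 0x110:

0x110 = 0…000100010000, set index = 1;

Operation ① made the only line of set 1 valid, but that line holds tag 0 and the tag here is 1,

so the result is a miss followed by an eviction;

⑥ Access to address 0x210:

0x210 = 0…001000010000, set index = 1;

Same as operation ⑤: the tag here is 2, which does not match the stored tag 1,

so the result is a miss followed by an eviction;

⑦ Two consecutive accesses to address 0x12 (an M operation):

0x12 = 000…00010010, set index = 1;

The tag here is 0, which does not match the stored tag 2, so the first access is a miss with an eviction.

The second access is of course a hit.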

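We can conclude:

Once a block is in the cache, any access to that same block (same set index and same tag) hits, whatever its block offset.
For an M operation (two accesses), the second access is always a hit.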

Each data load (L) or store (S) operation can cause at most one cache miss. The data modify operation(M) is treated as a load followed by a store to the same address. Thus, an M operation can result in two cache hits, or a miss and a hit plus a possible eviction.

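Approach

Part A is just a simple simulation of a set-associative cache. There are no time or space complexity requirements, so conceptually it is not difficult. The difficulty is only in the coding itself.

The program clearly runs in this order:

Parse the command line, then initialize the cache parameters from it.
Read the trace file and process it line by line.
Print the final result.

Implementation

Invocation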

sudo ./csim -v -s 4 -E 1 -b 4 -t traces/dave.trace

sudo ./csim-ref -v -s 4 -E 1 -b 4 -t traces/dave.trace

Input

 L 10,4 
 S 18,4
 L 20,4
 S 28,4
 S 50,4

Step 1: parsing the command line

    // Command-line parsing
    int verbose = 0;
    int s = 0, E = 0, b = 0;
    char* t = NULL;
    int ch;
    while ((ch = getopt(argc, argv, "hvs:E:b:t:")) != -1){
         switch (ch) {
            case 's':
                s = atoi(optarg);
                break;
            case 'E':
                E = atoi(optarg);
                break;
            case 'b':
                b = atoi(optarg);
                break;
            case 't':
                t = optarg;
                break;
            case 'v':
                verbose = 1;
                break;
            default:    // -h or an unrecognized option
                exit(-1);
         }
    }
    if (t == NULL) {    // -t is required; printing a NULL string is undefined
        exit(-1);
    }

    printf("Parsed arguments: s:%d E:%d b:%d verbose:%d t:%s\n",s,E,b,verbose,t);
    int S = 1<<s;
    int B = 1<<b;

Output:

Parsed arguments: s:4 E:1 b:4 verbose:1 t:traces/dave.trace

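Step 2: reading the trace file

    // Open the trace file
    FILE* fp = fopen(t, "r");
    if(fp == NULL)
    {
      printf("%s: No such file or directory\n", t);
      exit(1);
    }
    /*
        Define the data structures; read the input; simulate
    */
    // The variable types must match the conversions: %c, %lx and %d
    char type;
    unsigned long address;
    int size;
    // Loop while all three fields were read; this also stops on a malformed line
    while(fscanf(fp, " %c %lx,%d", &type, &address, &size) == 3)
    {
      if(type == 'I'){
          continue;   // instruction loads are ignored
      }
      else{
          // hand the access to the simulator
      }
    }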

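Step 3: simulation

Each line holds one valid bit, t tag bits, and a block of B = 2^b bytes, plus a timestamp to support LRU. A struct can represent this; the block bytes themselves need not be stored, because the simulator only counts hits, misses and evictions.
There are S sets with E lines each, so a two-dimensional array can represent the cache.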

See the code for the details. The comments are thorough.

/*
    NAME: Jaxchan   
    GITHUB: Jaxchan25
*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <getopt.h>
#include "cachelab.h"

int hits = 0 ,misses = 0 ,evictions = 0; 
int DEBUG = 0;
// Logical clock for LRU, incremented on every access. clock() is too coarse:
// consecutive accesses can get the same value, which breaks the LRU order.
unsigned long tick = 0;

typedef  struct
{
  int valid; // valid bit
  unsigned long tag; // tag bits
  // The block data is not stored: the simulator only counts hits, misses and evictions
  unsigned long time_stamp; // value of tick at the most recent access

}cache_line;

/*
  @brief: Allocate the two-dimensional cache array dynamically and zero it
  @param:S number of sets
  @param:E number of lines per set
*/
cache_line** initiate(int S,int E){
  cache_line** cache;
  // S row pointers, so the element size is sizeof(cache_line*)
  cache = (cache_line** ) malloc(sizeof(cache_line*) * S); 
  if(cache == NULL){
    exit(1);
  }
  for(int i=0;i<S;i++)
  {
    size_t size = sizeof(cache_line) * E;
    cache[i] = (cache_line* ) malloc(size);
    if(cache[i] == NULL){
      exit(1);
    }
    memset(cache[i], 0, size); // every valid bit starts at 0
  }
  return cache;
}

/*
  Free the two-dimensional array
*/
int clean(cache_line** c, int S){
  int i;
  for(i=0;i<S;i++)
    {
      free(c[i]);
    }
  free(c);
  return 0;
}


int isHit
(
  cache_line* cache_line_group,
  int E,
  unsigned long t
){

  //DEBUG
  if(DEBUG){
    printf("\n DEBUG IN isHit \t");
    for(int i =0;i<E;i++){
        printf("\n INDEX: %d ",i);
        printf(" FOUND USING TAG: %lx \t",cache_line_group[i].tag);
    }
    printf("\nEND DEBUG IN isHit\n");
  }

  for(int i =0;i<E;i++){
    // hit: the valid bit is 1 and the tag matches
    if (cache_line_group[i].valid==1&&cache_line_group[i].tag == t){
      cache_line_group[i].time_stamp = ++tick;
      hits+=1;
      if(DEBUG){
        printf(" FOUND USING TAG: %lx \t",t);
      }
      return 1;
    }
  }
  return 0;
}

int putInCache(
  cache_line* cache_line_group,
  int E,
  unsigned long t
){

  // An empty line can be used directly.
  for(int i =0;i<E;i++){
    // empty line: the valid bit is 0
    if (cache_line_group[i].valid==0){
      cache_line_group[i].valid = 1;
      cache_line_group[i].tag = t;
      cache_line_group[i].time_stamp = ++tick;
      if(DEBUG){
        printf(" PLACE IN BLANK USING TAG: %lx \t",t);
        printf(" UPDATE TAG: %lx\t",cache_line_group[i].tag);
      }
      return 0;
    }
  }

  // No empty line, so one line must be evicted.
  // Search for the least recently used line.
  int LRU_index = 0;
  unsigned long LRU_time_stamp = cache_line_group[0].time_stamp; // older accesses have smaller values
  for(int i =0;i<E;i++){
    if (cache_line_group[i].time_stamp<LRU_time_stamp){
      LRU_index = i;
      LRU_time_stamp = cache_line_group[i].time_stamp;
    }
  }
  // Replace it
  cache_line_group[LRU_index].tag = t;
  cache_line_group[LRU_index].time_stamp = ++tick;
  evictions+=1;
  if(DEBUG){
    printf(" PLACE BY EVICTIONS USING TAG: %lx\t",t);
    printf(" UPDATE TAG: %lx\t UPDATE INDEX: %d\t",
      cache_line_group[LRU_index].tag,LRU_index);
  }
  return 1;
}

void print_verbose(char* pre, char type, int hit_miss, int eviction){
  /* Detailed per-access output, printed when -v is given on the command line */
  char* h = hit_miss?" hit":" miss";
  char* e = eviction?" eviction":"";
  // An M access is a load followed by a store, and the store always hits
  char* m = (type == 'M')?" hit":"";
  // Print with a literal format string. Appending the format to pre with
  // strcat could overflow the caller's buffer.
  printf("%s%s%s%s\n", pre, h, e, m);
}


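/*
  @brief: Core of the simulation. Handles one access and decides hit / miss / eviction
  @param: cache  the cache, represented as a two-dimensional array
  @param: type  operation type: L, S or M
  @param: address  address accessed, 64 bits wide, hexadecimal in the trace
  @param: size  number of bytes accessed. It does not affect the result and is only echoed.
  @param: b_mask s_mask t_mask  bit masks derived from the cache parameters, precomputed for convenience
  @param: E,b  cache parameters
  Hits, misses and evictions are accumulated in the global counters.
*/
void handleCore(
  cache_line** cache,
  char type,
  unsigned long address,
  int size,
  unsigned long b_mask,
  unsigned long s_mask,
  unsigned long t_mask,
  int E,
  int b,
  int verbose
){
  // Split the address. The tag is kept unshifted; comparing masked values is enough.
  unsigned long t = address&t_mask; 
  unsigned long s =  address&s_mask;

  // Select the set
  unsigned long s_id = s>>b;
  cache_line* cache_line_group = cache[s_id];
  if(DEBUG){
    printf("set: %lu ,tag: %lx    \t",s_id,t);
  }

  // Check for a hit. On a hit, isHit refreshes the timestamp.
  int flag_hit = isHit(cache_line_group,E,t);

  // On a miss, place the block in an empty line or evict a line
  int flag_evict = 0;
  if(flag_hit==0){
    misses+=1;
    flag_evict = putInCache(cache_line_group,E,t);
  }

  // An M operation adds one extra hit for its store
  if(type=='M'){
    hits+=1;
  }

  // Verbose output
  if(verbose)
  { char pre[64]; // large enough for a 64-bit hex address and the size
    snprintf(pre, sizeof(pre), "%c %lx,%d", type, address, size);
    print_verbose(pre, type, flag_hit, flag_evict);
  }


}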




int main(int argc, char * argv[])
{      
  // Command-line parsing
  int verbose = 0;
  int s = -1, E = 0, b = -1; // sentinel values mark arguments that were not given
  char* t = NULL;
  int ch;
  while ((ch = getopt(argc, argv, "hvs:E:b:t:")) != -1){
        switch (ch) {
          case 's':
              s = atoi(optarg);
              break;
          case 'E':
              E = atoi(optarg);
              break;
          case 'b':
              b = atoi(optarg);
              break;
          case 't':
              t = optarg;
              break;
          case 'v':
              verbose = 1;
              break;
          default: // -h or an unrecognized option
              printf("Usage: %s [-hv] -s <s> -E <E> -b <b> -t <tracefile>\n", argv[0]);
              exit(-1);
        }
  }
  if(s < 0 || E <= 0 || b < 0 || t == NULL){
    printf("Usage: %s [-hv] -s <s> -E <E> -b <b> -t <tracefile>\n", argv[0]);
    exit(-1);
  }
  if(DEBUG){
    printf("Parsed arguments: s:%d E:%d b:%d verbose:%d t:%s\n",s,E,b,verbose,t);
  }
  int S = 1<<s;
  /*
    Allocate the two-dimensional cache array
  */
  cache_line** cache;
  cache = initiate(S,E);


  // Open the trace file
  FILE* fp = fopen(t, "r");
  if(fp == NULL)
  {
    printf("%s: No such file or directory\n", t);
    exit(1);
  }

  // Build the masks with 1UL so the shifts are done in unsigned long, not int
  unsigned long b_mask = (1UL<<b) - 1;                 //0000...0 00001111
  unsigned long s_mask = ((1UL<<(s+b)) - 1) ^ b_mask;  //0000...0 11110000
  unsigned long t_mask = ~((1UL<<(s+b)) - 1);          //1111...1 00000000

  if(DEBUG){
    printf("b_mask: %lx ",b_mask);
    printf("s_mask: %lx ",s_mask);
    printf("t_mask: %lx \n",t_mask);
  }

  char type;
  int size;
  unsigned long address;
  // Loop while all three fields were read; stops at end of file or on a malformed line
  while(fscanf(fp, " %c %lx,%d", &type, &address, &size) == 3)
  {
    if(type == 'I'){
        continue; // instruction loads are ignored
    }
    else{
        // hand the access to the simulator
        handleCore(cache,type,address,size,b_mask,s_mask,t_mask,E,b,verbose);

    }
  }

  printSummary(hits, misses, evictions);
  clean(cache,S);
  fclose(fp);
  return 0;
}

Testing

test-case1:
sudo ./csim -v -s 4 -E 1 -b 4 -t traces/dave.trace
sudo ./csim-ref -v -s 4 -E 1 -b 4 -t traces/dave.trace

test-case2:
sudo ./csim -v -s 4 -E 2 -b 4 -t traces/yi.trace
sudo ./csim-ref -v -s 4 -E 2 -b 4 -t traces/yi.trace

make
sudo ./test-csim

This gives 27 points, which is the full score for Part A.

PART B

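Task

Implement the matrix transpose function transpose_submit in trans.c, with as few misses as possible.

Restrictions:

Use at most 12 local variables of type int. No tricks to get around this limit.
No recursion.
If you use helper functions, the stack may hold no more than 12 local variables in total across the helpers and the top-level function.
Array A may not be modified.
No arrays and no malloc.

Evaluation

Scoring, where m is the number of misses: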

• 32 × 32: 8 points if m < 300, 0 points if m > 600
• 64 × 64: 8 points if m < 1300, 0 points if m > 2000
• 61 × 67: 10 points if m < 2000, 0 points if m > 3000

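Approach

Cache parameters: (s = 5, E = 1, b = 5, m = 64).

That is 32 sets, one line per set, and 32 bytes per line, which is 8 ints.

So the cache holds 32 * 8 = 256 ints in total.

The key point:

Every load brings in one block of contiguous memory, and the block size is the cache's B.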
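These parameters determine which matrix elements compete for the same cache line. The sketch below computes the set index of element [i][j] of a row-major int matrix. The function name set_of_element is hypothetical, and the base address 0 used in the examples is an assumption, since the real addresses depend on where the test driver places the arrays. It also assumes 4-byte ints.

```c
#include <assert.h>

#define S_BITS 5   /* s = 5: 32 sets */
#define B_BITS 5   /* b = 5: 32-byte blocks, i.e. 8 ints of 4 bytes */

/* Set index of element [i][j] of a row-major int matrix with `cols`
   columns. `base` is the byte address of element [0][0] (an assumed
   value in the examples, not something the lab guarantees). */
unsigned long set_of_element(unsigned long base, int cols, int i, int j) {
    unsigned long addr;
    assert(sizeof(int) == 4);
    assert(cols > 0 && i >= 0 && j >= 0 && j < cols);
    addr = base + ((unsigned long)i * cols + j) * sizeof(int);
    return (addr >> B_BITS) & ((1UL << S_BITS) - 1);
}
```

Under these assumptions, a row of a 32 x 32 matrix spans 4 blocks, so 8 rows fill all 32 sets and rows i and i + 8 land in the same sets: set_of_element(0, 32, 0, 0) and set_of_element(0, 32, 8, 0) are both 0. For a 64 x 64 matrix a row spans 8 blocks, so the conflict already appears between rows i and i + 4. If the second array starts a multiple of 1024 bytes (the cache size) after the first, the same [i][j] position in both arrays also maps to the same set.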

Testing

sudo make
sudo ./test-trans -M 4 -N 4
sudo ./test-trans -M 32 -N 32
sudo ./test-trans -M 64 -N 64
sudo ./test-trans -M 61 -N 67
sudo ./csim-ref -v -s 5 -E 1 -b 5 -t trace.f0

References

Good write-ups:

Reference 1

Reference 2

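Approach

Start small: 4 x 4

Yesterday I went carefully through the book and its exercises again. Part B is in fact exercise 6.17 from the book.

So let's use the tool we wrote ourselves in Part A for testing. The cache parameters are the ones Part B requires, still (s = 5, E = 1, b = 5, m = 64). The input matrix is 4 x 4.

We can print whatever debug information we like.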

sudo make
sudo ./test-trans -M 4 -N 4
sudo ./csim -v -s 5 -E 1 -b 5 -t trace.f0

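Together with the analysis in Reference 1, this tells us:

There are too many misses because the two arrays are accessed at nearby indices from the start and both map to the same cache line, so they keep evicting each other and the cache thrashes.
So the smallest unit of work should be at least a block: after reading one block from src, load all of it into registers (local variables) in one go, and only then write it to dst.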
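The block-at-a-time idea can be sketched as follows. This is an illustration of the technique described above, not the author's finished solution, and its miss count was not measured here. The function name transpose_8x8_blocks is hypothetical; the parameter list follows the transpose_submit signature from the lab. It assumes both dimensions are multiples of 8, which holds for 32 x 32 but not for 61 x 67.

```c
#include <assert.h>

/* Transposes the N x M matrix A into the M x N matrix B in 8 x 8 tiles.
   Each row of a tile is exactly one cache block (8 ints), which is read
   completely into local variables before anything is written to B. */
void transpose_8x8_blocks(int M, int N, int A[N][M], int B[M][N]) {
    int i, j, k;                          /* 3 loop counters */
    int t0, t1, t2, t3, t4, t5, t6, t7;   /* 8 temporaries: 11 locals in total */

    assert(M % 8 == 0 && N % 8 == 0);
    for (i = 0; i < N; i += 8) {          /* first row of the tile in A */
        for (j = 0; j < M; j += 8) {      /* first column of the tile in A */
            for (k = i; k < i + 8; k++) {
                /* Read one full block of row k of A into locals. */
                t0 = A[k][j];     t1 = A[k][j + 1];
                t2 = A[k][j + 2]; t3 = A[k][j + 3];
                t4 = A[k][j + 4]; t5 = A[k][j + 5];
                t6 = A[k][j + 6]; t7 = A[k][j + 7];
                /* Then write the eight values down column k of B. */
                B[j][k]     = t0; B[j + 1][k] = t1;
                B[j + 2][k] = t2; B[j + 3][k] = t3;
                B[j + 4][k] = t4; B[j + 5][k] = t5;
                B[j + 6][k] = t6; B[j + 7][k] = t7;
            }
        }
    }
}
```

Reading the whole block first means that a later write to B, which may evict the line of A that was just read, no longer forces that line to be reloaded for the remaining elements. The function uses 11 int locals and no extra arrays, so it stays within the stated restrictions.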

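For the detailed solution beyond this point I read the experts' write-ups. They are deep enough that I cannot comment on them yet.

So I set this lab aside again for now.

Afterword

This lab is hard. Part A means writing bug-free code from scratch, which really tests your fundamentals.

Part B is demanding performance analysis that requires real familiarity with the material. If I later run into something that needs it, I will come back and work through this lab.

Since my attempt is so rough, please don't bother leaving comments.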
