CS61C (su20) lab07

cs61c_lab7

Exercise 1:

Scenario 1:
1. Because (Step Size) * 4 in bytes (32 bytes) is exactly equal to Cache Size in bytes.
2. hit rate will always be zero. because the index portion is always 00, and the address we access is always mapped to the first set.However the tag portion is always different, so the first cache line is always evicted so that the hit rate will always be zero.
3. Change Step Size to 1

Scenario 2
1. There are two memory accesses.
2. One miss followed by three hit. Because every four accesses will increase index portion by one which lead one miss, and the next there accesses will not change the flag and index portion.
3. The hit rate is 75%.
4. When Rep Count goes to infinity, the hit rate will be very closed to 100%. Because all data in array has been cached.
5. We can access part of the array which size is equal to the size of cache size at a time and apply all of the function to the part of the array so we can be completely done with it before moving on, thereby keeping that the data of portion hot in the cache and not having to circle back to it later on.

Scenario 3
1. The hit rate of our L1 cache is 50% and the hit rate of L2 cache is 0%. Overall hit rate is 50%.
2. We have 32 accesses to L1 cache total. 16 of them misses.
3. We have 16 accesses to L2 cache total. They are the accesses which missed in L1 cache.
4. We can increase Rep Count to increase our L2 hit rate.Because the first iteration cache all the data to L2 cache, so L2 hit rate has increased. However the tag portion changed in L1 cache and the data are not cached, so our L1 hit rate will not change.
5. When we increase the number of blocks in L1, the hit rates for L1 and L2 will not change. When We increase the L1 block size, the hit rate for L1 will increase and for L2 will not change.

Exercise 2:
	ijk:    n = 1000, 1.702 Gflop/s
	ikj:    n = 1000, 0.160 Gflop/s
	jik:    n = 1000, 1.647 Gflop/s
	jki:    n = 1000, 9.107 Gflop/s
	kij:    n = 1000, 0.155 Gflop/s
	kji:    n = 1000, 8.787 Gflop/s
  1. jki loop order performs best. Because in the innermost iteration, it moves through B with stride 0 and moves through A and C with stride 1.
  2. kij loop order performs worst. Because in the innermost iteration, it moves through B and C with stride n.
  3. We stride through the matrices with respect to the innermost loop continuous can reduce the time to multiply the matrices.
Exercise 3:

代码实现如下:

void transpose_blocking(int n, int blocksize, int *dst, int *src) {
    // YOUR CODE HERE
    for (int bx = 0; bx + blocksize <= n; bx += blocksize) {
        for (int by = 0; by + blocksize <= n; by += blocksize) {
            for (int x = 0; x < blocksize; x ++) {
                for (int y = 0; y < blocksize; y ++) {
                    dst[by + y + (bx + x) * n] = src[bx + x + (by + y) * n];
                }
            }
        }
    }
    if (n % blocksize == 0) return;
    int ex = (n / blocksize) * blocksize;
    int ey = ex;
    for (int x = ex; x < n; x ++) {
        for (int y = 0; y < ey; y ++) {
            dst[y + x * n] = src[x + y * n];
        }
    }
    for (int x = 0; x < ex; x ++) {
        for (int y = ey; y < n; y ++) {
            dst[y + x * n] = src[x + y * n];
        }
    }
    for (int x = ex; x < n; x ++) {
        for (int y = ey; y < n; y ++) {
            dst[y + x * n] = src[x + y * n];
        }
    }
}

Part 1

		blocksize = 20, n = 100:    0.004 milliseconds, 0.023 milliseconds
		blocksize = 20, n = 1000:   0.898 milliseconds, 1.816 milliseconds
		blocksize = 20, n = 2000:   28.081 milliseconds, 4.874 milliseconds
		blocksize = 20, n = 5000:   207.785 milliseconds, 30.871 milliseconds
		blocksize = 20, n = 10000:  1103.16 milliseconds, 157.027 milliseconds
  1. N is big.
  2. Because there are more operations in the blocked version.

Part 2

		blocksize = 50, n = 10000:      1405.15 milliseconds, 161.008 milliseconds
		blocksize = 100, n = 10000:     888.638 milliseconds, 147.19 milliseconds
		blocksize = 500, n = 10000:     951.018 milliseconds, 131.87 milliseconds
		blocksize = 1000, n = 10000:    974.174 milliseconds, 174.938 milliseconds
		blocksize = 5000, n = 10000:    938.069 milliseconds, 891.538 milliseconds
  1. First it gets better, then it gets worse.
  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 1
    评论
评论 1
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值