Computer Organization and Design -- Computer Organization Homework Exercises (6)


 

 ---------------------- Personal homework. If a later student is assigned the same exercises, feel free to use this for reference and discussion, but please do not copy it directly.

 

 

 Problem 2. Cache Associativity (8 points)

 

For this question, you will simulate different configurations of an 8-block cache using the following memory accesses (8-bit addresses):

     load      100

     store    102

     store    88

     load     120

     load     90

 

a) Simple Questions

     i) If the block size is 4 bytes, how large must the cache be?

----------32 bytes ;

     ii) How many bits will be used for the offset?

----------2 bits will be used for the offset ;
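
A quick arithmetic check of part (a), sketched in Python (my own snippet, not part of the assignment):

    # Sanity check for part (a): 8 blocks of 4 bytes each, byte-addressable.
    num_blocks = 8
    block_size = 4                               # bytes per block

    cache_size = num_blocks * block_size         # 8 * 4 = 32 bytes
    offset_bits = block_size.bit_length() - 1    # log2(4) = 2 bits (block size is a power of two)

    print(cache_size, offset_bits)               # -> 32 2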

b) Fully-associative with a 4-byte block size

    i) How many bits should be used for the tag?

----------6 bits should be used for the tag ;

    ii) Complete the table by simulating the above memory references

         Show each entry, and simply cross out blocks as they are overwritten

     

 

 

Block#   Tag   Dirty
0        25    0->1
1        22    1
2        30    -
3        -     -
4        -     -
5        -     -
6        -     -
7        -     -

 
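To double-check the fully-associative table above, here is a minimal Python sketch (helper names are my own) that tracks each block's tag and dirty bit; no replacement policy is needed here because only three blocks are ever used:

    # Fully-associative simulation for part (b): 8 blocks, 4-byte blocks,
    # 8-bit addresses, so the tag is simply address // 4 (6 bits).
    accesses = [("load", 100), ("store", 102), ("store", 88), ("load", 120), ("load", 90)]

    blocks = []                        # list of [tag, dirty] in allocation order
    for op, addr in accesses:
        tag = addr // 4                # drop the 2 offset bits
        entry = next((b for b in blocks if b[0] == tag), None)
        if entry is None:              # miss: allocate the next free block
            entry = [tag, False]
            blocks.append(entry)
        if op == "store":
            entry[1] = True            # write-back cache: mark the block dirty

    for i, (tag, dirty) in enumerate(blocks):
        print(f"block {i}: tag {tag}, dirty {int(dirty)}")
    # -> block 0: tag 25, dirty 1 / block 1: tag 22, dirty 1 / block 2: tag 30, dirty 0
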

c) Direct-mapped with a 4-byte block size

    i) How many bits should be used for the tag?

                 ------------3 bits should be used for the tag ;

    ii) Complete the table by simulating the above memory references

         Show each entry, and simply cross out blocks as they are overwritten

 

 

Block/Set#   Tag       Dirty
0            -         -
1            3         0->1
2            -         -
3            -         -
4            -         -
5            -         -
6            2->3->2   1->0->0
7            -         -

 
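The direct-mapped behavior above follows directly from how each 8-bit address splits into tag / set index / offset; a small sketch of that breakdown (assuming the usual low-order-bit indexing):

    # Direct-mapped address breakdown: 2 offset bits, 3 index bits, 3 tag bits.
    for addr in (100, 102, 88, 120, 90):
        offset = addr & 0b11
        index = (addr >> 2) & 0b111
        tag = addr >> 5
        print(f"{addr:3d} -> tag {tag}, set {index}, offset {offset}")
    # 100 and 102 map to set 1 (tag 3); 88, 120 and 90 all collide in set 6,
    # which is why its tag goes 2 -> 3 -> 2 in the table above.
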

 

 

 

 

d) 2-way Set-associative with a 4-byte block size

    i) How many bits should be used for the tag?

                 ------------4 bits should be used for the tag ;

    ii) Complete the table by simulating the above memory references

         Show each entry, and simply cross out blocks as they are overwritten

 

Set#   Block#   Tag   Dirty
0      0        -     -
0      1        -     -
1      2        6     0->1
1      3        -     -
2      4        5     1
2      5        7     0
3      6        -     -
3      7        -     -

 

e) Matching 

     Note: Some may have multiple answers and some answers may not be used. 

__bd__  fully-associative cache           a) Fewest connections to memory are needed
__ae__  direct-mapped cache               b) Provides optimal cache utilization
__f___  set-associative cache             c) Allows a larger block size to be used
                                          d) Largest Tag overhead
                                          e) Smallest Tag overhead
                                          f) May prevent a conflict when two blocks map to the same set
                                             (N/A to fully-associative caches)
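
A short check of why the tag-overhead answers come out this way: for the same 32-byte, 4-byte-block cache, the tag absorbs whatever address bits are not used for the set index and offset. This is my own formula check, not part of the assignment:

    # Tag width for each organization of the same cache (8-bit addresses):
    # more ways -> fewer sets -> fewer index bits -> wider tags.
    import math

    address_bits, num_blocks, block_size = 8, 8, 4
    offset_bits = int(math.log2(block_size))

    for ways in (num_blocks, 1, 2):                  # fully assoc., direct mapped, 2-way
        num_sets = num_blocks // ways
        index_bits = int(math.log2(num_sets))
        tag_bits = address_bits - index_bits - offset_bits
        print(f"{ways}-way: {tag_bits} tag bits")    # 8-way: 6, 1-way: 3, 2-way: 4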

Problem 3. Cache Comparison (8 Points)

EZ-Cache Company has hired you to design their next generation cache for the LC2k. They want to use a 32-byte direct-mapped cache with a 4-byte block size. 

 

a) Using the following sequence of memory accesses, compute the number of cache hits:

     load  200

     store 204

     store 236

     load  201

     load  208

     store 234

     load  204

     load  239

     store 201

 

Block/Set#   Tag          Dirty
0            -            -
1            -            -
2            6->7->6      -
3            6->7->6->7   -
4            6            -
5            -            -
6            -            -
7            -            -

Number of Hits: ________1________

 

b) EZ-Cache believes they can improve the hit rate by either increasing the block size or increasing the associativity.

 

Simulate the above memory accesses for a 32-byte direct-mapped cache with an 8-byte block size

 

Block/Set#   Tag
0            -
1            6->7->6->7->6->7->6
2            6
3            -

Number of Hits: _________1________

 

Simulate the above memory accesses for a 32-byte 2-way set-associative cache with a 4-byte block size

 

 

 

Set#   Block#   Tag
0      0        13
0      1        -
1      2        -
1      3        -
2      4        12
2      5        14
3      6        12
3      7        14

Number of Hits: ________4__________

 

 

Which one of the configurations is best for the memory access sequence?

 

-----------the 2-way set-associative cache with 4-byte blocks ;
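
To re-check the three hit counts above, here is a minimal parameterized LRU simulator sketch (count_hits is my own helper; it assumes set index = block number mod number of sets):

    # Minimal LRU set-associative simulator covering all three configurations.
    def count_hits(addresses, cache_bytes, block_bytes, ways):
        num_sets = cache_bytes // block_bytes // ways
        sets = [[] for _ in range(num_sets)]      # each set: list of tags, MRU last
        hits = 0
        for addr in addresses:
            block = addr // block_bytes
            index, tag = block % num_sets, block // num_sets
            s = sets[index]
            if tag in s:
                hits += 1
                s.remove(tag)                     # refresh the LRU position
            elif len(s) == ways:
                s.pop(0)                          # evict the least recently used tag
            s.append(tag)
        return hits

    refs = [200, 204, 236, 201, 208, 234, 204, 239, 201]
    print(count_hits(refs, 32, 4, 1))   # direct mapped, 4-byte blocks -> 1
    print(count_hits(refs, 32, 8, 1))   # direct mapped, 8-byte blocks -> 1
    print(count_hits(refs, 32, 4, 2))   # 2-way,         4-byte blocks -> 4

The 2-way configuration wins because the blocks around 200/201 and 234/236, which keep evicting each other in the direct-mapped caches, can coexist in the same set.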

 

 

 

Problem 4 (12 points)

 

You have been given the following two caches, which are both byte-addressable and use 16-bit memory addresses.

 

 

Cache                 Cache A             Cache B
Total size (Bytes)    16                  16
Block size (Bytes)    4                   4
Organization          Fully Associative   Direct Mapped
Replacement policy    LRU                 -
Write policy          Allocate on write   Allocate on write


a) The following addresses are referenced in the given order; put an H for each hit and an M for each miss for both caches. Also calculate the hit rate for each cache. An extra column for an infinite-size fully-associative cache (also with a 4-byte block size) is given to make the calculation for part (b) easier. [8 pts]

 

Address(hex)   Address(binary)       Infinite   Cache A   Cache B
0x0000         0000 0000 0000 0000   M          M         M
0x0007         0000 0000 0000 0111   M          M         M
0x0003         0000 0000 0000 0011   H          H         H
0x0009         0000 0000 0000 1001   M          M         M
0x0016         0000 0000 0001 0110   M          M         M
0x0005         0000 0000 0000 0101   H          H         M
0x000D         0000 0000 0000 1101   M          M         M
0x0001         0000 0000 0000 0001   H          M         H
Hit Rate                             3/8        2/8       2/8

 

-----------------(2/8+2/8)*2*8=8 ;
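
A small sketch that re-derives the three hit/miss columns above; Cache A is modeled as a 4-entry fully-associative LRU list and Cache B as a 4-entry direct-mapped array (variable names are mine):

    refs = [0x0000, 0x0007, 0x0003, 0x0009, 0x0016, 0x0005, 0x000D, 0x0001]
    blocks = [a // 4 for a in refs]          # 4-byte blocks -> block numbers 0,1,0,2,5,1,3,0

    infinite = set()                         # infinite fully-associative cache
    cache_a = []                             # Cache A: 4 entries, fully associative, MRU last
    cache_b = [None] * 4                     # Cache B: 4 entries, direct mapped

    for b in blocks:
        inf = "H" if b in infinite else "M"
        infinite.add(b)

        if b in cache_a:                     # Cache A lookup
            a = "H"
            cache_a.remove(b)                # refresh the LRU position
        else:
            a = "M"
            if len(cache_a) == 4:
                cache_a.pop(0)               # evict the least recently used block
        cache_a.append(b)

        idx = b % 4                          # Cache B index = low 2 bits of the block number
        bb = "H" if cache_b[idx] == b else "M"
        cache_b[idx] = b

        print(inf, a, bb)
    # Columns: Infinite M M H M M H M H / Cache A M M H M M H M M / Cache B M M H M M M M H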

 

b) For each reference in the previous sequence, classify it using one of the four possible labels: HIT (if the access is a hit) or COMPULSORY / CAPACITY / CONFLICT (depending on the type of miss). [4 pts]

 

Address(hex)   Cache A      Cache B
0x0000         COMPULSORY   COMPULSORY
0x0007         COMPULSORY   COMPULSORY
0x0003         HIT          HIT
0x0009         COMPULSORY   COMPULSORY
0x0016         COMPULSORY   COMPULSORY
0x0005         HIT          CONFLICT
0x000D         COMPULSORY   COMPULSORY
0x0001         CAPACITY     HIT
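
The labels above follow the standard "3C" classification rule, sketched here over the hit/miss strings from part (a) (my own encoding of the table):

    # A miss is COMPULSORY if even an infinite cache misses, CAPACITY if a same-size
    # fully-associative cache (Cache A) also misses, and CONFLICT otherwise.
    infinite = "MMHMMHMH"
    cache_a  = "MMHMMHMM"   # fully associative: its misses can never be conflict misses
    cache_b  = "MMHMMMMH"   # direct mapped

    for inf, a, b in zip(infinite, cache_a, cache_b):
        label_a = "HIT" if a == "H" else "COMPULSORY" if inf == "M" else "CAPACITY"
        label_b = ("HIT" if b == "H" else
                   "COMPULSORY" if inf == "M" else
                   "CAPACITY" if a == "M" else "CONFLICT")
        print(label_a.ljust(12), label_b)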


------------0.25*8*2=4 ;

Problem 5 (10 points)

The picojoule microprocessor has a byte-addressable ISA and only 64 bytes of memory. It has a 16-byte, 2-way set-associative, write-back, write-allocate cache with a block size of 2 bytes. Each load/store instruction accesses a single byte. The OB0 and OB1 fields in the cache hold the 2 data bytes of a block (1 byte each). Given the following sequence of instructions, update the cache after each instruction. When both ways in a set are invalid and a block has to be allocated, the cache logic gives higher priority to way 0. Use decimal values for the OB0 and OB1 fields and binary for the rest. If a cache block is invalid you don't have to fill in anything; if you don't know the value of a certain field for a valid block, put an X there. The initial empty state of the cache is given. The contents of the following memory locations are known:

 

 

M[4]=7

M[14]=11

M[15]=13

M[37]=17

M[45]=19

 

 

The instructions (LD is a load and ST is a store) follow:

 

 

1: LD R1 ← M[4]

2: LD R2 ← M[37]

3: ST R1 → M[36]

4: ST R2 → M[5]

5: LD R1 ← M[15]

6: LD R2 ← M[14]

7: LD R1 ← M[45]

8: ST R2 → M[44]

9: HALT
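
Before filling in the tables, it helps to see how each 6-bit address splits into tag (3 bits), set index (2 bits), and byte offset (1 bit); a small sketch of that breakdown:

    # Field breakdown for Problem 5: 64-byte memory -> 6-bit addresses, 2-byte
    # blocks -> 1 offset bit, 4 sets -> 2 index bits, leaving 3 tag bits.
    for addr in (4, 37, 36, 5, 15, 14, 45, 44):
        offset = addr & 0b1
        index = (addr >> 1) & 0b11
        tag = addr >> 3
        print(f"M[{addr:2d}] -> tag {tag:03b}, set {index}, offset {offset}")
    # M[4], M[5], M[36], M[37], M[44], M[45] all map to set 2; M[14], M[15] map to set 3.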

 

 

 

Part (a) [8 points]

 

 

 

Initial

         Way 0                         Way 1
         V  D  lru  Tag  OB0  OB1      V  D  lru  Tag  OB0  OB1
Set 0    0  -  -    -    -    -        0  -  -    -    -    -
Set 1    0  -  -    -    -    -        0  -  -    -    -    -
Set 2    0  -  -    -    -    -        0  -  -    -    -    -
Set 3    0  -  -    -    -    -        0  -  -    -    -    -

 

 

 

After instruction 1: LD R1 ← M[4]

         Way 0                         Way 1
         V  D  lru  Tag  OB0  OB1      V  D  lru  Tag  OB0  OB1
Set 0    0  -  -    -    -    -        0  -  -    -    -    -
Set 1    0  -  -    -    -    -        0  -  -    -    -    -
Set 2    1  0  -    000  7    X        0  -  -    -    -    -
Set 3    0  -  -    -    -    -        0  -  -    -    -    -

 

After instruction 2: LD R2 ← M[37]

         Way 0                         Way 1
         V  D  lru  Tag  OB0  OB1      V  D  lru  Tag  OB0  OB1
Set 0    0  -  -    -    -    -        0  -  -    -    -    -
Set 1    0  -  -    -    -    -        0  -  -    -    -    -
Set 2    1  0  LRU  000  7    X        1  0  -    100  X    17
Set 3    0  -  -    -    -    -        0  -  -    -    -    -

 

After instruction 3: ST R1 → M[36]

         Way 0                         Way 1
         V  D  lru  Tag  OB0  OB1      V  D  lru  Tag  OB0  OB1
Set 0    0  -  -    -    -    -        0  -  -    -    -    -
Set 1    0  -  -    -    -    -        0  -  -    -    -    -
Set 2    1  0  LRU  000  7    X        1  1  -    100  7    17
Set 3    0  -  -    -    -    -        0  -  -    -    -    -

 

After instruction 4: ST R2 → M[5]

         Way 0                         Way 1
         V  D  lru  Tag  OB0  OB1      V  D  lru  Tag  OB0  OB1
Set 0    0  -  -    -    -    -        0  -  -    -    -    -
Set 1    0  -  -    -    -    -        0  -  -    -    -    -
Set 2    1  1  -    000  7    17       1  1  LRU  100  7    17
Set 3    0  -  -    -    -    -        0  -  -    -    -    -

 

After instruction 5: LD R1 ← M[15]

         Way 0                         Way 1
         V  D  lru  Tag  OB0  OB1      V  D  lru  Tag  OB0  OB1
Set 0    0  -  -    -    -    -        0  -  -    -    -    -
Set 1    0  -  -    -    -    -        0  -  -    -    -    -
Set 2    1  1  -    000  7    17       1  1  LRU  100  7    17
Set 3    1  0  -    001  11   13       0  -  -    -    -    -

 

After instruction 6: LD R2 ← M[14]

         Way 0                         Way 1
         V  D  lru  Tag  OB0  OB1      V  D  lru  Tag  OB0  OB1
Set 0    0  -  -    -    -    -        0  -  -    -    -    -
Set 1    0  -  -    -    -    -        0  -  -    -    -    -
Set 2    1  1  -    000  7    17       1  1  LRU  100  7    17
Set 3    1  0  -    001  11   13       0  -  -    -    -    -

 

After instruction 7: LD R1 ← M[45]

         Way 0                         Way 1
         V  D  lru  Tag  OB0  OB1      V  D  lru  Tag  OB0  OB1
Set 0    0  -  -    -    -    -        0  -  -    -    -    -
Set 1    0  -  -    -    -    -        0  -  -    -    -    -
Set 2    1  1  LRU  000  7    17       1  0  -    101  X    19
Set 3    1  0  -    001  11   13       0  -  -    -    -    -

 

After instruction 8: ST R2 → M[44]

         Way 0                         Way 1
         V  D  lru  Tag  OB0  OB1      V  D  lru  Tag  OB0  OB1
Set 0    0  -  -    -    -    -        0  -  -    -    -    -
Set 1    0  -  -    -    -    -        0  -  -    -    -    -
Set 2    1  1  LRU  000  7    17       1  1  -    101  11   19
Set 3    1  0  -    001  11   13       0  -  -    -    -    -

 

 

 
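As a cross-check of the tables above and of Part (b) below, here is a compact simulator sketch of this 2-way, write-back, write-allocate cache (the structure and names are my own; it also counts write-back traffic):

    # 2-way, write-back, write-allocate cache with 2-byte blocks over a 64-byte
    # memory (1 offset bit, 2 index bits, 3 tag bits).
    memory = {4: 7, 14: 11, 15: 13, 37: 17, 45: 19}   # unknown bytes stay None ("X")

    # Each way holds valid, dirty, tag and the 2 data bytes.  lru[s] = way to evict next.
    cache = [[{"v": False, "d": False, "tag": None, "data": [None, None]} for _ in range(2)]
             for _ in range(4)]
    lru = [0, 0, 0, 0]
    bytes_written = 0

    def access(addr, store_value=None):
        global bytes_written
        offset, index, tag = addr & 1, (addr >> 1) & 0b11, addr >> 3
        ways = cache[index]
        way = next((w for w in range(2) if ways[w]["v"] and ways[w]["tag"] == tag), None)
        if way is None:                                   # miss: allocate (way 0 has priority)
            way = 0 if not ways[0]["v"] else 1 if not ways[1]["v"] else lru[index]
            victim = ways[way]
            if victim["v"] and victim["d"]:               # write back the dirty victim block
                bytes_written += 2
            base = addr & ~1
            victim.update(v=True, d=False, tag=tag,
                          data=[memory.get(base), memory.get(base + 1)])
        block = ways[way]
        if store_value is None:                           # load
            value = block["data"][offset]
        else:                                             # store: write-allocate, write-back
            block["data"][offset] = store_value
            block["d"] = True
            value = store_value
        lru[index] = 1 - way                              # the other way becomes LRU
        return value

    r1 = access(4)            # 1: LD R1 <- M[4]
    r2 = access(37)           # 2: LD R2 <- M[37]
    access(36, r1)            # 3: ST R1 -> M[36]
    access(5, r2)             # 4: ST R2 -> M[5]
    r1 = access(15)           # 5: LD R1 <- M[15]
    r2 = access(14)           # 6: LD R2 <- M[14]
    r1 = access(45)           # 7: LD R1 <- M[45]  (evicts the dirty M[36..37] block)
    access(44, r2)            # 8: ST R2 -> M[44]

    print("written during instructions 1-8:", bytes_written)                              # -> 2
    print("flushed after HALT:", sum(2 for s in cache for w in s if w["v"] and w["d"]))   # -> 4
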

Part (b):

In total, how many bytes are written to memory while executing instructions 1 to 8 (including instruction 8)? How many more bytes will have to be written to memory after HALT is executed? [2 points]

 

------------In total, 2 bytes are written to memory while executing instructions 1 to 8: instruction 7 evicts the dirty block holding M[36] and M[37] ;
 2 × 2 = 4 more bytes will have to be written to memory after HALT is executed, since two dirty blocks (2 bytes each) remain in the cache.

 

Problem 6 (8 points)

A workload with the following instruction mix is run on two processor designs, both of which have an I-Cache and a D-Cache.

 

ADD 10%

NAND 20%

BEQ 25%

SW 15%

LW 30%

 

Additionally, it is known that the I-Cache hit rate is 90%, the D-Cache hit rate is 98%, 45% of branches are not taken, and 25% of LW instructions are followed by a dependent instruction. Memory takes 75 nanoseconds to access.

 

a) Assuming the above code is run on a standard LC-2K 5-stage pipelined processor with forwarding, with branches predicted not taken, and clocked at 200 MHz, what is the CPI? Show your work. [3 points]

 

Clock period: 1 / 200 MHz = 5 ns
Cache miss penalty: 75 ns / 5 ns = 15 cycles
CPI = 1 + 1×0.10×15 (I-cache misses) + (0.30+0.15)×0.02×15 (D-cache misses) + 0.30×0.25×1 (load-use stalls) + 0.25×0.55×3 (taken branches) = 3.1225
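
The same sum, term by term, as a quick Python check:

    # Re-computing the 5-stage CPI (miss penalty = 75 ns / 5 ns = 15 cycles).
    cpi = (1                              # base CPI
           + 1.00 * 0.10 * 15             # I-cache misses: every instruction, 10% miss rate
           + (0.30 + 0.15) * 0.02 * 15    # D-cache misses: LW + SW, 2% miss rate
           + 0.30 * 0.25 * 1              # load-use stall: 25% of LWs, 1 bubble
           + 0.25 * 0.55 * 3)             # taken branches (55%) squash 3 instructions
    print(round(cpi, 4))                  # -> 3.1225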

 

 

b) This five-stage pipeline is extended to a similar 15-stage pipeline with no additional hazards introduced. The number of stall cycles needed for an lw followed by a dependent instruction does not change. The new frequency is 400 MHz. Now the same code is run on the 15-stage pipeline, where branches are resolved in the 11th stage. 

I. What is the new CPI? Show your work. [4 pts]

II. Does this new design result in better performance for this workload? [1 pt]

 

I:
BEQ misprediction penalty: 11 - 1 = 10 cycles
Clock period: 1 / 400 MHz = 2.5 ns
Cache miss penalty: 75 ns / 2.5 ns = 30 cycles
CPI = 1 + 1×0.10×30 + (0.30+0.15)×0.02×30 + 0.30×0.25×1 + 0.25×0.55×10 = 5.72

II:
(a): 5 ns × 3.1225 = 15.6125 ns per instruction ;
(b): 2.5 ns × 5.72 = 14.3 ns per instruction ;
Yes, the new design results in better performance for this workload.
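
And the corresponding check for part (b), including the time-per-instruction comparison:

    # 15-stage design: miss penalty 30 cycles, branch penalty 10 cycles.
    cpi_5  = 1 + 0.10 * 15 + 0.45 * 0.02 * 15 + 0.30 * 0.25 * 1 + 0.25 * 0.55 * 3
    cpi_15 = 1 + 0.10 * 30 + 0.45 * 0.02 * 30 + 0.30 * 0.25 * 1 + 0.25 * 0.55 * 10
    print(round(cpi_15, 4))               # -> 5.72
    print(round(5.0 * cpi_5, 4), "ns")    # 5-stage:  15.6125 ns per instruction
    print(round(2.5 * cpi_15, 4), "ns")   # 15-stage: 14.3 ns per instruction -> faster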

 

Reposted from: https://www.cnblogs.com/nanashi/p/6662279.html
