Computer Composition and Design Homework 4
2.26
Consider the following MIPS loop:
LOOP: slt $t2, $0, $t1 # t2 = 1 if 0 < t1, else 0
beq $t2, $0, DONE # if t2 == 0, branch to DONE
subi $t1, $t1, 1 # t1 = t1 - 1
addi $s2, $s2, 2 # s2 = s2 + 2
j LOOP # jump back to LOOP
DONE:
2.26.1 [5] <§2.7> Assume that the register $t1 is initialized to the value 10. What is the value in register $s2 assuming $s2 is initially zero?
s2 = 20 (the loop body executes 10 times and adds 2 to $s2 on each pass)
2.26.2 [5] <§2.7> For each of the loops above, write the equivalent C code routine. Assume that the registers $s1, $s2, $t1, and $t2 are integers A, B, i, and temp, respectively
The equivalent C code is:
int i = 10, B = 0;
while(i > 0)
{
i--;
B+=2;
}
For a one-to-one correspondence with the registers:
int B = 0, temp;
for(int i = 10; (temp = i > 0); i--)
{
//the loop condition first assigns i > 0 to temp ($t2), then tests temp
B += 2;
}
2.26.3 [5] <§2.7> For the loops written in MIPS assembly above, assume that the register $t1 is initialized to the value N. How many MIPS instructions are executed
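Each iteration that decrements N executes all 5 instructions. Once N reaches 0, only the first two instructions (slt and beq) execute before the branch to DONE is taken.
For N ≥ 0 the total is therefore
$5N + 2$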
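The count can be checked by stepping through the loop one instruction at a time. The following is a minimal sketch in C; `simulate_loop` and `struct loop_result` are hypothetical names introduced here for illustration, and $s2 is assumed to start at zero as in 2.26.1.

```c
/* Final value of $s2 and the number of MIPS instructions executed. */
struct loop_result {
    int s2;
    long executed;
};

/* Hypothetical helper: walks the 2.26 loop instruction by instruction,
   starting with $t1 = n and $s2 = 0. */
struct loop_result simulate_loop(int n)
{
    struct loop_result r = {0, 0};
    int t1 = n;

    for (;;) {
        int t2 = (0 < t1);   /* slt  $t2, $0, $t1  */
        r.executed++;
        r.executed++;        /* beq  $t2, $0, DONE */
        if (t2 == 0)
            break;
        t1 -= 1;             /* subi $t1, $t1, 1   */
        r.executed++;
        r.s2 += 2;           /* addi $s2, $s2, 2   */
        r.executed++;
        r.executed++;        /* j    LOOP          */
    }
    return r;
}
```

Calling `simulate_loop(10)` gives s2 = 20 and 52 executed instructions, which agrees with 5 × 10 + 2.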
2.27
[5] <§2.7>
Translate the following C code to MIPS assembly code. Use a minimum number of instructions. Assume that the values of a, b, i, and j are in registers $s0, $s1, $t0, and $t1, respectively. Also, assume that register $s2 holds the base address of the array D.
for(i=0; i<a; i++)
for(j=0; j<b; j++)
D[4*j] = i + j;
a -- $s0
b -- $s1
i -- $t0
j -- $t1
Assembly code:
addi $t0, $0, 0 # i = 0
LOOP1: slt $t2, $t0, $s0 # t2 = (i < a)
beq $t2, $0, EXIT1 # leave the outer loop when i >= a
addi $t1, $0, 0 # j = 0
LOOP2: slt $t2, $t1, $s1 # t2 = (j < b)
beq $t2, $0, EXIT2 # leave the inner loop when j >= b
sll $t2, $t1, 4 # byte offset = (4*j) elements * 4 bytes = 16*j
add $t3, $s2, $t2 # address of D[4*j]
add $t2, $t0, $t1 # i + j
sw $t2, 0($t3) # D[4*j] = i + j
addi $t1, $t1, 1 # j++
j LOOP2
EXIT2: addi $t0, $t0, 1 # i++
j LOOP1
EXIT1:
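The address arithmetic is the step most easily gotten wrong: D[4*j] lies 4*j elements, that is 16*j bytes, past the base address, which is why the shift amount is 4. A minimal C sketch that mirrors the byte-offset computation follows; `fill_d` is a hypothetical name, and D is assumed to hold 32-bit integers.

```c
#include <stdint.h>

/* Hypothetical reference routine: addresses D by byte offset, the way the
   assembly does, instead of by array index. Assumes 4-byte elements. */
void fill_d(int32_t *D, int a, int b)
{
    unsigned char *base = (unsigned char *)D;           /* $s2 */

    for (int i = 0; i < a; i++) {
        for (int j = 0; j < b; j++) {
            uint32_t offset = (uint32_t)j << 4;          /* sll $t2, $t1, 4   */
            int32_t *slot = (int32_t *)(base + offset);  /* add $t3, $s2, $t2 */
            *slot = i + j;                               /* sw  $t2, 0($t3)   */
        }
    }
}
```

With a = 2 and b = 3, the routine leaves D[0] = 1, D[4] = 2, and D[8] = 3, the same slots that D[4*j] = i + j would touch on the last outer iteration.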
2.29
[5] <§2.7>
Translate the following loop into C. Assume that the C-level integer i is held in register $t1, $s2 holds the C-level integer called result, and $s0 holds the base address of the integer MemArray.
addi $t1, $0, 0
# i = t1 = 0 + 0
LOOP: lw $s1, 0($s0)
# $s1 = *MemArray
add $s2, $s2, $s1
# result += *MemArray
addi $s0, $s0, 4
# MemArray + 1 (advance one word)
addi $t1, $t1, 1
# i++
slti $t2, $t1, 100
# t2 = (i < 100)
bne $t2, $0, LOOP
# if ($t2 != 0) go to LOOP
# the second operand here should presumably be $0 rather than $s0; otherwise the loop never terminates
// the loop condition is tested only at the end, so use do-while
int i = 0;
do
{
result += *MemArray;
MemArray += 1; // advance by one int (four bytes)
i++;
}
while (i < 100);
2.30
[5] <§2.7>
Rewrite the loop from Exercise 2.29 to reduce the number of MIPS instructions executed.
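First version: drop the slti and branch on the counter directly.
addi $t1, $0, 0
addi $t2, $0, 100 # loop bound in a register; bne cannot take an immediate operand
LOOP: lw $s1, 0($s0)
add $s2, $s2, $s1
addi $s0, $s0, 4
addi $t1, $t1, 1
bne $t1, $t2, LOOP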
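Second version: the increments of $s0 and $t1 can be merged by using $t1 as an end pointer instead of a counter.
addi $t1, $s0, 400 # t1 = address one word past MemArray[99]
LOOP: lw $s1, -4($t1) # load the word just below t1, so elements 99 down to 0 are read
add $s2, $s2, $s1
subi $t1, $t1, 4
bne $t1, $s0, LOOP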
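The merged-counter idea can be sketched in C as well: a pointer that starts one element past the end replaces the counter i. The load has to use index -1 (a byte offset of -4) so that exactly elements 99 down to 0 are read. `sum_mem_array` is a hypothetical name, and the array is assumed to hold 100 integers as in Exercise 2.29.

```c
/* Hypothetical helper: adds the 100 words of MemArray to result,
   walking backwards from an end pointer instead of counting to 100. */
int sum_mem_array(const int *MemArray, int result)
{
    const int *p = MemArray + 100;   /* addi $t1, $s0, 400          */

    do {
        result += p[-1];             /* lw $s1, -4($t1); add $s2... */
        p -= 1;                      /* subi $t1, $t1, 4            */
    } while (p != MemArray);         /* bne $t1, $s0, LOOP          */

    return result;
}
```

The loop body now corresponds to four instructions per iteration instead of the original six.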
2.34
2.34 Translate function f into MIPS assembly language. If you need to use
registers $t0 through $t7, use the lower-numbered registers first. Assume the
function declaration for func is “int f(int a, int b);”. The code for function
f is as follows:
int f(int a, int b, int c, int d){
return func(func(a,b),c+d);
}
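f: addi $sp, $sp, -12 # grow the stack by three words
sw $ra, 8($sp) # save the return address
sw $s1, 4($sp) # save $s1 so it can be reused
sw $s0, 0($sp) # save $s0 likewise
move $s1, $a2 # keep c across the first call
move $s0, $a3 # keep d across the first call
jal func # first call: $a0 and $a1 still hold a and b
move $a0, $v0 # first argument = func(a, b)
add $a1, $s0, $s1 # second argument = c + d
jal func # second call; its result stays in $v0
lw $ra, 8($sp) # restore the return address
lw $s1, 4($sp) # restore $s1
lw $s0, 0($sp) # restore $s0
addi $sp, $sp, 12 # pop the stack frame
jr $ra # return to the caller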
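Writing f with its two calls as separate steps shows why c and d have to survive the first call: the inner call must finish before c + d is passed to the outer one. This is only a sketch; the body of `func` below is a hypothetical stand-in, since the exercise gives nothing but its declaration.

```c
/* Hypothetical stand-in for func. A non-commutative body makes it easy
   to check that the arguments arrive in the right order. */
int func(int a, int b)
{
    return a - b;
}

int f(int a, int b, int c, int d)
{
    int inner = func(a, b);    /* first jal func: $a0, $a1 hold a, b       */
    int sum = c + d;           /* add $a1, $s0, $s1: c and d were preserved */
    return func(inner, sum);   /* second jal func; result is left in $v0   */
}
```

With this stand-in, f(10, 3, 2, 1) evaluates func(7, 3) and returns 4.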
2.35
2.35 [5] <§2.8> Can we use the tail-call optimization in this function? If no,
explain why not. If yes, what is the difference in the number of executed instructions
in f with and without the optimization?
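Yes. Tail-call optimization can be applied to the second call to func, but $ra, $s0, $s1, and $sp must be restored before that call.
The optimization saves only one executed instruction: the final jr $ra.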
2.36
2.36 [5] <§2.8> Right before your function f from Exercise 2.34 returns, what do we know about contents of registers $t5, $s3, $ra, and $sp? Keep in mind that we know what the entire function f looks like, but for function func we only know its declaration.
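$ra holds the return address into the function that called f.
$sp and $s3 have the same values they had when f was called; $t5 may hold any value, for the following reason:
although f itself does not modify $t5, func is allowed to modify it, so the contents of $t5 cannot be determined once func has been called.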
2.46
2.46 Assume for a given processor the CPI of arithmetic instructions is 1, the CPI of load/store instructions is 10, and the CPI of branch instructions is 3. Assume a program has the following instruction breakdowns: 500 million arithmetic instructions, 300 million load/store instructions, 100 million branch instructions.
2.46.1[5] <§2.19> Suppose that new, more powerful arithmetic instructions are added to the instruction set. On average, through the use of these more powerful arithmetic instructions, we can reduce the number of arithmetic instructions needed to execute a program by 25%, and the cost of increasing the clock cycle time by only 10%. Is this a good design choice? Why?
With instruction counts in units of 100 million, the new CPU time (measured in original clock cycles) is
$(5\times1\times75\%+3\times10+1\times3)\times1.1=40.425$
The original CPU time is $5\times1+3\times10+1\times3=38$.
Since 40.425 > 38, this is not a good design choice.
2.46.2[5] <§2.19> Suppose that we find a way to double the performance of arithmetic instructions. What is the overall speedup of our machine? What if we find a way to improve the performance of arithmetic instructions by 10 times?
Doubling the performance of arithmetic instructions:
$\frac{5\times1+3\times10+1\times3}{5\times1\times50\%+3\times10+1\times3}=38/35.5\approx1.07$
Improving the performance of arithmetic instructions by 10 times:
$\frac{5\times1+3\times10+1\times3}{5\times1\times10\%+3\times10+1\times3}=38/33.5\approx1.13$
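The arithmetic for 2.46 can be reproduced with a few lines of C. This is a sketch under the same convention as the answers above: instruction counts are in units of 100 million (5 arithmetic, 3 load/store, 1 branch), and the function names `cycles`, `new_isa_time`, and `speedup` are hypothetical.

```c
/* Weighted cycle count, in units of 10^8 cycles, for a given CPI of the
   arithmetic instructions. Load/store CPI is 10 and branch CPI is 3. */
double cycles(double arith_cpi)
{
    return 5.0 * arith_cpi + 3.0 * 10.0 + 1.0 * 3.0;
}

/* 2.46.1: 25% fewer arithmetic instructions, clock cycle 10% longer.
   The result is expressed in original clock cycles. */
double new_isa_time(void)
{
    return (5.0 * 0.75 * 1.0 + 3.0 * 10.0 + 1.0 * 3.0) * 1.1;
}

/* 2.46.2: overall speedup when arithmetic instructions become
   `factor` times faster (their effective CPI drops to 1 / factor). */
double speedup(double factor)
{
    return cycles(1.0) / cycles(1.0 / factor);
}
```

Here `cycles(1.0)` is the baseline of 38, `new_isa_time()` is about 40.425, and `speedup(2.0)` and `speedup(10.0)` come out at roughly 1.07 and 1.13. Because load/store instructions dominate the cycle count, even a tenfold improvement in arithmetic yields only a modest overall gain.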