------------------------------------------------------------------------------------------------------------------
@梁廷振
------------------------------------------------------------------------------------------------------------------
School of Computer Science and Technology, Fudan University
Computer Principles, Final Examination
Paper A
13 pages in total
Problem 1.
Number Conversion: a slightly modified version of the IEEE 754 single-precision (32-bit) floating-point representation is defined below.
Normalized: (-1)^sign * (1.fraction) * 2^(exponent-127) (exp = 1 to 254)
Denormalized: (-1)^sign * (0.fraction) * 2^(-126) (when exp = 0, fraction ≠ 0)
Zero: all 0's in all 3 fields
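For checking answers, the three rules above can be sketched in C as follows. This is a minimal sketch, not part of the exam: the names `two_to_the`, `decode` and `float_bits` are hypothetical, `exp = 255` is rejected because the rules above do not define it, and `float_bits` assumes the host `float` is IEEE 754 binary32, which matches the rules above for normalized values.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 2^e as a double, built by repeated exact multiplication. */
static double two_to_the(int e) {
    double r = 1.0;
    for (; e > 0; e--) r *= 2.0;
    for (; e < 0; e++) r *= 0.5;
    return r;
}

/* Decode a 32-bit pattern with the Normalized / Denormalized / Zero rules.
   Assumption: exp == 255 is not described by those rules, so it is rejected. */
double decode(uint32_t bits) {
    int sign = (int)(bits >> 31);
    int exp = (int)((bits >> 23) & 0xFF);
    double frac = (double)(bits & 0x7FFFFF) / 8388608.0; /* 0.fraction, 8388608 = 2^23 */
    double mag;
    assert(exp != 255);
    if (exp == 0)
        mag = frac * two_to_the(-126);          /* denormalized; zero when frac == 0 */
    else
        mag = (1.0 + frac) * two_to_the(exp - 127);
    return sign ? -mag : mag;
}

/* Bit pattern of a host float.
   Assumption: the host float is IEEE 754 binary32. */
uint32_t float_bits(float f) {
    uint32_t bits;
    assert(sizeof f == sizeof bits);
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

Calling `float_bits` on a decimal constant gives its hex encoding, and `decode` on a bit pattern gives its value, so each function can be used to check the other.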
1) Convert the number -35.390625 into this modified IEEE 754 single-precision representation (in hex).
(-35.390625)_10 = (0x________)_16
2) Under this modified 32-bit float representation, what decimal value does each of the following bit patterns represent?
A: (0011 1111 0001 0000 0000 0000 0000 0000)_2
B: (0000 0000 0011 0101 0000 0000 0000 0000)_2
3) Calculate the sum of (35.390625)_10 and (0011 1111 0001 0000 0000 0000 0000 0000)_2 using the modified 32-bit float representation. Then round the sum, written as a binary value (e.g. (1000.11)_2 for (8.75)_10), to 4 bits to the right of the binary point, using both round-up and round-down. Show your steps in detail.
4) Given three numbers f1, f2 and f3 in this modified 32-bit representation, none of which equals +∞, -∞ or NaN, and x, a signed 32-bit two's-complement integer, state whether each of the two C expressions below is always true. If yes, explain the reason in detail; if no, give a counterexample and show the detailed computation steps that yield false.
A: x == (int)(float) x;
B: (f1 > f2) == ((f1 + f3) > (f2 + f3))
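Candidate counterexamples for the two expressions can be tried out with a small C sketch like the one below. It is only a test harness under stated assumptions: the function names are hypothetical, `int` is assumed to be 32 bits, and the host `float` is assumed to be IEEE 754 binary32. The `volatile` temporaries force each intermediate result to be rounded to single precision.

```c
#include <assert.h>

/* Returns 1 if x survives the int -> float -> int round trip unchanged. */
int roundtrip_ok(int x) {
    volatile float f = (float)x;   /* rounded to 24 significant bits */
    assert(f < 2147483648.0f);     /* casting 2^31 back to int would be undefined */
    return x == (int)f;
}

/* Returns 1 if adding f3 to both sides leaves the comparison unchanged. */
int order_preserved(float f1, float f2, float f3) {
    volatile float a = f1 + f3;    /* each sum is rounded separately */
    volatile float b = f2 + f3;
    return (f1 > f2) == (a > b);
}
```

Values of x that need more than 24 significant bits, and an f3 much larger in magnitude than f1 and f2, are natural inputs to try.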
Problem 2.
Please read the following C code and assembly code, then fill in the blanks.
#include <stdio.h>
int cal(int a, int b);
int p[5][4] =
    {{43,56,78,69},{-7,89,7,23},{24,36,88,67},
     {12,56,78,90},{62,93,-78,9}};
int main(void){
    int result = cal(5,4);
    printf("The result is %d\n", result);
}
int cal(int a, int b){
    int i;
    int result=0;
    for ( i = 0; i < b; i++) {
        if (i < a-1)
            result -= i*p[i+1][i];
    }
    return result;
}
The assembly code:
cal:
    pushl %ebp
    movl %esp, %ebp
    subl $8, %esp
    movl $0, -8(%ebp)
    movl $0, -4(%ebp)
.L3:
    movl -4(%ebp), %eax
    cmpl 12(%ebp), %eax
    jl .L6
    jmp .L4
.L6:
    movl 8(%ebp), %eax
    decl %eax
    cmpl %eax, -4(%ebp)
    jge .L5
    movl -4(%ebp), %eax
    sall ________, %eax
    movl %eax, %edx
    addl -4(%ebp), %edx
    movl -4(%ebp), %eax
    imull ________(,%edx,4), %eax
    movl %eax, %edx
    leal -8(%ebp), %eax
    subl %edx, (%eax)
.L5:
    leal -4(%ebp), %eax
    incl (%eax)
    jmp .L3
.L4:
    movl -8(%ebp), %eax
    leave
    ret
Please explain how you filled in the blanks in the assembly code, and give the output of the printf in the main function.
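The address arithmetic behind an access such as p[i+1][i] can be checked with a short C sketch. The two helper functions are hypothetical names introduced here, and the closed form assumes sizeof(int) == 4 and row-major layout with 4 ints per row, as in the declaration of p.

```c
#include <assert.h>
#include <stddef.h>

int p[5][4] =
    {{43,56,78,69},{-7,89,7,23},{24,36,88,67},
     {12,56,78,90},{62,93,-78,9}};

/* Byte offset of p[i+1][i] from the start of p, as laid out by the compiler. */
ptrdiff_t measured_offset(int i) {
    assert(i >= 0 && i < 4);       /* keeps p[i+1][i] inside the array */
    return (char *)&p[i + 1][i] - (char *)p;
}

/* Same offset from the row-major formula: sizeof(int) * (4*(i+1) + i).
   With 4-byte ints this is a constant part plus a part scaled by i. */
ptrdiff_t formula_offset(int i) {
    return (ptrdiff_t)sizeof(int) * (4 * (i + 1) + i);
}
```

Comparing the two functions for i = 0..3 shows how the index expression splits into a fixed displacement and a scaled index register, which is the form the indexed addressing mode in the assembly uses.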
Problem 3.
The following figure illustrates a five-stage pipelined processor similar to the one in your textbook (Figure 4.53, page 334 of the English edition). Note that this architecture differs from the one in the book in three ways.
The first difference is that the function units in Stage E are multi-cycle function units. Stage E now contains three function units: two Adders and one Subtracter. The Subtracter handles only subtraction operations and takes 3 cycles to complete. An Adder handles all calculations except subtraction and takes 2 cycles to complete. Note that an Adder or Subtracter can handle only one instruction at a time; other instructions must wait in Stage D until the required Adder or Subtracter is free.
The second difference is that the memory unit takes a varying number of cycles to complete a memory access. A cache hit takes 1 cycle; a cache miss takes 6 cycles; a non-memory instruction takes 1 cycle to pass through Stage M. Only one instruction can occupy the memory unit at a time: if two memory instructions enter Stage M at the same time, the second waits for the first to complete before retrieving its own data.
The third difference is that this architecture is a 2-issue in-order pipelined processor. The fetch stage can fetch at most two instructions at a time, and the state registers between stages can hold the state of at most two instructions. The fetch unit fetches as many instructions as possible to fill the state registers between Stage F and Stage D.
1) This problem is based on the code in the figure, which is executed on the processor described above. Assume that no instruction is executed in cycle 0 and that the first two instructions are fetched in cycle 1. Fill in the blanks. (Each entry is a stage F, D, E, M or W, or "finished" or "not fetched"; mark an undecidable value with “--”.) (10')
2) The cache miss in the code above causes the CPU to waste several cycles, because the next instruction depends on the value fetched by the instruction that misses. A technique called “code motion” changes the order of instructions so that the time spent loading data from memory is hidden behind instructions that neither access memory nor depend on the memory instruction. Use this technique to optimize the original code in the figure for the shortest execution time, and write your code down.
3) How many cycles are saved by this optimization? ___[1]___
4) The last instruction in the optimized instruction sequence will exit its write-back stage in Cycle ___[2]___
Problem 4.
Implement the following 2 functions so that their behavior matches the description. The coding rules are the same as in Lab1. (If you think a blank needs nothing, just fill it with “--”.)
1) bitOr (10' + 2')
int
bitOr (int x, int y)
{
return ret;
}
2) logicalShift (2' * 6)
int
logicalShift (int x, int n)
{
int mask= 0x7f <
| 0xff <
return ((x >> __[4]__) __[5]__ (mask >> (__[6]__)));
}
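For checking a solution, the two functions can be compared against reference versions such as the C sketch below. This is not the intended answer to the blanks: `logicalShiftRef` is a hypothetical name, the reference shift uses an unsigned type, which Lab1-style rules do not permit, and the code assumes a 32-bit int with arithmetic right shift on negative values.

```c
#include <assert.h>

/* x | y using only ~ and &, by De Morgan's law. */
int bitOr(int x, int y) {
    return ~(~x & ~y);
}

/* Reference logical right shift: arithmetic shift, then clear the n
   sign-extended high bits with a mask that has n leading zeros.
   Assumptions: 32-bit int, 0 <= n <= 31, >> on a negative int is arithmetic. */
int logicalShiftRef(int x, int n) {
    unsigned mask;
    assert(n >= 0 && n <= 31);
    mask = ~0u >> n;
    return (int)((unsigned)(x >> n) & mask);
}
```

A submitted logicalShift can be compared with `logicalShiftRef` over a range of x and every n from 0 to 31; the boundary cases n = 0 and n = 31 are the ones most likely to expose a wrong mask.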
Problem 2.
A computer has a main memory (MM) of 256K words and an 8K-word set-associative cache. Each cache set holds 4 pages of 64 words each. The cache is initially empty. The CPU reads data from MM at addresses 0, 1, 2, …, 8447. (On a read miss, the full page containing the requested data is first written into the cache and then read from the cache.) This read sequence is repeated 20 times. The cache access time is 10% of the MM access time, and the cache uses the LRU replacement policy. How many times faster are the total read operations with the cache than without it? You must write out complete steps with the necessary explanations.
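A hand count of the misses can be cross-checked with a small LRU simulation. The C sketch below is a minimal model under stated assumptions: `count_misses` is a hypothetical name, the cache is taken as 32 sets of 4 pages of 64 words (8K words in total), and a page maps to set (page number mod 32). It counts misses only; how a miss is charged in time is left to the written solution.

```c
#include <assert.h>

#define SETS 32          /* 8K words / (4 pages * 64 words) */
#define WAYS 4
#define PAGE_WORDS 64

/* Count read misses for `passes` sequential sweeps over addresses
   0 .. n_words-1 in a 4-way set-associative cache with LRU replacement. */
long count_misses(long n_words, int passes) {
    long tag[SETS][WAYS];
    long last_used[SETS][WAYS];   /* 0 means the way is still empty */
    long now = 0, misses = 0, addr;
    int s, w, pass;
    assert(n_words > 0 && passes > 0);
    for (s = 0; s < SETS; s++)
        for (w = 0; w < WAYS; w++) { tag[s][w] = -1; last_used[s][w] = 0; }
    for (pass = 0; pass < passes; pass++) {
        for (addr = 0; addr < n_words; addr++) {
            long page = addr / PAGE_WORDS;
            int set = (int)(page % SETS);
            long t = page / SETS;
            int hit = -1, victim = 0;
            for (w = 0; w < WAYS; w++) {
                if (tag[set][w] == t) hit = w;
                if (last_used[set][w] < last_used[set][victim]) victim = w;
            }
            if (hit < 0) {            /* miss: replace the least recently used way */
                misses++;
                tag[set][victim] = t;
                hit = victim;
            }
            last_used[set][hit] = ++now;
        }
    }
    return misses;
}
```

Calling `count_misses(8448, 20)` models the access pattern in the problem; calling it with 8192 words instead shows what changes when the data fits in the cache exactly.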
Problem 5.
You are given the following definitions:
struct pc {
int x;
int y;
int z;
};
struct pc sq[16][16];
int i,j;
Assume the following:
- sizeof(int) == 4.
- sq begins at memory address 0.
- The machine has a 2048-byte direct-mapped data cache with 32-byte blocks.
- The cache is initially empty.
- The only memory accesses are to the entries of the array sq. Variables i and j are stored in registers.
Determine the cache performance of the following code:
for (i = 0; i < 16; i ++) {
    for (j = 0; j < 16; j ++) {
        sq[i][j].x = 0;
        sq[i][j].y = 0;
        sq[i][j].z = 1;
    }
}
1) What is the total number of writes?
2) What is the total number of writes that miss in the cache?
Given the above assumptions, determine the cache performance of the following code:
for (i = 0; i < 16; i ++) {
    for (j = 0; j < 16; j ++) {
        sq[j][i].x = 0;
        sq[j][i].y = 0;
        sq[j][i].z = 1;
    }
}
3) What is the total number of writes?
4) What is the total number of writes that miss in the cache?
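Both loop orders can be cross-checked by simulating the direct-mapped cache over the addresses the writes touch. The C sketch below is one way to do that under stated assumptions: `count_write_misses` is a hypothetical name, the cache is assumed to allocate a block on a write miss, and sq is assumed to start at address 0 with 12-byte elements as in the problem.

```c
#include <assert.h>

#define N 16
#define BLOCK_BYTES 32
#define NUM_LINES (2048 / BLOCK_BYTES)   /* 64 lines, direct-mapped */

struct pc { int x; int y; int z; };

/* Count write misses for the x, y, z stores of every element of sq[16][16].
   column_major == 0 walks sq[i][j]; column_major != 0 walks sq[j][i].
   Assumption: write-allocate, so a write miss loads the block. */
long count_write_misses(int column_major) {
    long line_block[NUM_LINES];   /* block held by each line, -1 if empty */
    long misses = 0;
    int i, j, k, field;
    assert(sizeof(struct pc) == 12);
    for (k = 0; k < NUM_LINES; k++) line_block[k] = -1;
    for (i = 0; i < N; i++) {
        for (j = 0; j < N; j++) {
            int row = column_major ? j : i;
            int col = column_major ? i : j;
            long base = (long)(row * N + col) * (long)sizeof(struct pc);
            for (field = 0; field < 3; field++) {   /* x, y, z in order */
                long block = (base + 4 * field) / BLOCK_BYTES;
                int line = (int)(block % NUM_LINES);
                if (line_block[line] != block) {
                    misses++;
                    line_block[line] = block;
                }
            }
        }
    }
    return misses;
}
```

`count_write_misses(0)` corresponds to the first loop and `count_write_misses(1)` to the second; the total number of writes is the same in both and can be counted directly from the loop bounds.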
Problem 6.
Virtual Address Translation