Fudan University computer proficiency test questions, Fudan University Principles of Computers final exam questions_梁廷振's study notes...


------------------------------------------------------------------------------------------------------------------

@梁廷振

------------------------------------------------------------------------------------------------------------------

Fudan University, School of Computer Science and Technology

Principles of Computers: Final Examination

Paper A

13 pages in total

Problem

Number Conversion: the IEEE 754 single-precision 32-bit float standard representation, with a small change, is illustrated below.

Normalized: (-1)^sign * (1.fraction) * 2^(exponent-127) (exp = 1 to 254)

Denormalized: (-1)^sign * (0.fraction) * 2^(-126) (when exp = 0, fraction ≠ 0)

Zero: all 0's in all 3 fields

Convert the number -35.390625 into this changed IEEE 754 FP single-precision representation (in hex).

(-35.390625)_10 = (0x________)_16
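One way to check a hand conversion is to look at the bit pattern the host machine produces. The sketch below assumes that the changed format keeps the usual 1-bit sign, 8-bit exponent, 23-bit fraction layout (the excerpt gives only the value formulas, so the layout is an assumption) and that the host float is IEEE 754 binary32. The helper names are illustrative and not part of the exam.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the bytes of a float as a 32-bit pattern.
   Assumption: the host float is IEEE 754 binary32. */
static uint32_t float_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Field extractors for the assumed 1-8-23 layout. */
static uint32_t sign_field(uint32_t u)     { return u >> 31; }
static uint32_t exponent_field(uint32_t u) { return (u >> 23) & 0xFFu; }
static uint32_t fraction_field(uint32_t u) { return u & 0x7FFFFFu; }
```

By hand, 35.390625 = (100011.011001)_2 = 1.00011011001 * 2^5, so the exponent field is 5 + 127 = 132. Under the assumptions above, float_bits(-35.390625f) returns 0xC20D9000.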

With the changed 32-bit float representation, what is the equivalent value of each pattern as a decimal number?

A: (0011 1111 0001 0000 0000 0000 0000 0000)_2

B: (0000 0000 0011 0101 0000 0000 0000 0000)_2
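The value formulas given for this format translate directly into a decoder. This is a minimal sketch under an assumed 1-bit sign, 8-bit exponent, 23-bit fraction layout; decode_changed_float and scale_pow2 are hypothetical helper names, and exp = 255 is not defined in the excerpt, so the sketch returns NaN for it.

```c
#include <math.h>    /* only for the NAN macro */
#include <stdint.h>

/* Multiply v by 2^e using exact doubling/halving steps (no libm call). */
static double scale_pow2(double v, int e) {
    for (; e > 0; e--) v *= 2.0;
    for (; e < 0; e++) v *= 0.5;
    return v;
}

/* Decode a 32-bit pattern with the formulas stated in the problem.
   Assumption: fields are laid out as sign(1) | exp(8) | fraction(23). */
static double decode_changed_float(uint32_t bits) {
    uint32_t sign = bits >> 31;
    uint32_t exp  = (bits >> 23) & 0xFFu;
    uint32_t frac = bits & 0x7FFFFFu;
    double f = (double)frac / 8388608.0;      /* fraction / 2^23 */
    double magnitude;

    if (exp == 255)
        return NAN;                           /* not covered by the excerpt */
    if (exp == 0)
        magnitude = scale_pow2(f, -126);      /* denormalized; also yields zero */
    else
        magnitude = scale_pow2(1.0 + f, (int)exp - 127);
    return sign ? -magnitude : magnitude;
}
```

Pattern A is 0x3F100000 (exp = 126, fraction = 0.125), which decodes as 1.125 * 2^-1 = 0.5625. Pattern B is 0x00350000 (exp = 0), which decodes as 0.4140625 * 2^-126.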

Calculate the sum of (35.390625)_10 and (0011 1111 0001 0000 0000 0000 0000 0000)_2 using the changed 32-bit float representation, and then round the sum (as a binary value, e.g. (1000.11)_2 for (8.75)_10) to 4 bits to the right of the binary point by both round-up and round-down. Show your steps in detail.
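The rounding step can be checked with a short sketch. It assumes that "round-down" means rounding toward minus infinity and "round-up" means rounding toward plus infinity, and it handles only non-negative inputs; the function names are hypothetical.

```c
/* Round a non-negative value down to `bits` binary digits after the point. */
static double round_down_bits(double x, int bits) {
    double scale = (double)(1L << bits);
    long long q = (long long)(x * scale);   /* truncation equals floor for x >= 0 */
    return (double)q / scale;
}

/* Round a non-negative value up to `bits` binary digits after the point. */
static double round_up_bits(double x, int bits) {
    double scale = (double)(1L << bits);
    double scaled = x * scale;
    long long q = (long long)scaled;
    if ((double)q < scaled)
        q += 1;                              /* bump only if bits were discarded */
    return (double)q / scale;
}
```

Both operands and their sum fit in a 24-bit significand, so the addition is exact: 35.390625 + 0.5625 = 35.953125 = (100011.111101)_2. Rounding down to 4 fractional bits gives (100011.1111)_2 = 35.9375, and rounding up gives (100100.0000)_2 = 36.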

Given three numbers f1, f2 and f3 in this changed 32-bit representation, none of which equals +∞, -∞ or NaN, and x in signed 32-bit two's-complement representation, state whether each of the two C expressions below is always true. If yes, give the reason in detail; if no, give a counterexample and show the detailed computation steps that yield false.

A: x == (int)(float) x

B: (f1 > f2) == ((f1 + f3) > (f2 + f3))
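Candidate counterexamples can be tried on a real machine. The sketch below assumes the host float behaves like the format in the problem for these values (IEEE 754 binary32, round-to-nearest-even); expr_a and expr_b are hypothetical wrappers, and the volatile temporaries are there to force each intermediate result to single precision.

```c
/* Expression A: x == (int)(float)x.
   Note: (int)f is undefined in C when f is outside the int range,
   so callers should avoid values that round up to 2^31. */
static int expr_a(int x) {
    volatile float f = (float)x;   /* rounds to a 24-bit significand */
    return x == (int)f;
}

/* Expression B: (f1 > f2) == ((f1 + f3) > (f2 + f3)). */
static int expr_b(float f1, float f2, float f3) {
    volatile float s1 = f1 + f3;
    volatile float s2 = f2 + f3;
    return (f1 > f2) == (s1 > s2);
}
```

For A, x = 16777217 (2^24 + 1) needs 25 significant bits, so the conversion to float rounds it to 16777216 and the comparison is false. For B, f1 = 2.0, f2 = 1.0, f3 = 1e30 is one counterexample: both sums round to 1e30, so the right-hand comparison is false while f1 > f2 is true.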

Problem

Please read the following C code and assembly code, then fill in the blanks.

#include <stdio.h>

int cal(int a, int b);   /* declared before use so that main can call it */

int p[5][4] = {{43,56,78,69},{-7,89,7,23},{24,36,88,67},
               {12,56,78,90},{62,93,-78,9}};

int main(void){
    int result = cal(5,4);
    printf("The result is %d\n", result);
}

int cal(int a, int b){
    int i;
    int result=0;
    for (i = 0; i < b; i++) {
        if (i < a-1)
            result -= i*p[i+1][i];
    }
    return result;
}

The assembly code:

cal:
    pushl %ebp
    movl %esp, %ebp
    subl $8, %esp
    movl $0, -8(%ebp)
    movl $0, -4(%ebp)
.L3:
    movl -4(%ebp), %eax
    cmpl 12(%ebp), %eax
    jl .L6
    jmp .L4
.L6:
    movl 8(%ebp), %eax
    decl %eax
    cmpl %eax, -4(%ebp)
    jge .L5
    movl -4(%ebp), %eax
    sall ________, %eax
    movl %eax, %edx
    addl -4(%ebp), %edx
    movl -4(%ebp), %eax
    imull ________(,%edx,4), %eax
    movl %eax, %edx
    leal -8(%ebp), %eax
    subl %edx, (%eax)
.L5:
    leal -4(%ebp), %eax
    incl (%eax)
    jmp .L3
.L4:
    movl -8(%ebp), %eax
    leave
    ret

Please explain your reasoning for the blanks you filled in the assembly code, and give the result printed by the printf in the main function.
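The address arithmetic behind the two blanks can be cross-checked in C. The sketch below is one reading of the assembly, assuming sizeof(int) == 4: the shift produces 4*i, adding i gives 5*i, and the scaled index plus a constant displacement reaches p[i+1][i] at byte offset 16*(i+1) + 4*i = 16 + 4*(5*i). The helper names offset_like_asm and offset_from_c are hypothetical.

```c
#include <stddef.h>

static const int p[5][4] = {{43,56,78,69},{-7,89,7,23},{24,36,88,67},
                            {12,56,78,90},{62,93,-78,9}};

/* Same loop as the exam's cal function. */
static int cal(int a, int b) {
    int result = 0;
    for (int i = 0; i < b; i++) {
        if (i < a - 1)
            result -= i * p[i + 1][i];
    }
    return result;
}

/* Byte offset of p[i+1][i], built the way the assembly appears to build it:
   (i << 2) + i is 5*i, the index is scaled by 4, and 16 is the displacement. */
static size_t offset_like_asm(int i) {
    int five_i = (i << 2) + i;
    return 16 + 4 * (size_t)five_i;
}

/* The same offset as the C compiler computes it (assumes 4-byte int). */
static size_t offset_from_c(int i) {
    return (size_t)((const char *)&p[i + 1][i] - (const char *)p);
}
```

With a = 5 and b = 4 the loop subtracts 0*p[1][0], 1*p[2][1] = 36, 2*p[3][2] = 156 and 3*p[4][3] = 27, so cal(5, 4) returns -219.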

Problem

The following figure illustrates a five-stage pipelined processor similar to the one in your textbook (Figure 4.53, page 334 of the English edition). Note that this architecture differs from the one in your book in three ways.

First difference: the function units in Stage E become multi-cycle function units. Stage E now contains three function units: two Adders and one Subtracter. The Subtracter handles only subtraction operations and takes 3 cycles to complete. An Adder handles all calculations except subtraction and takes 2 cycles to complete. Note that an Adder or Subtracter can handle only one instruction at a time; that is, other instructions must wait in Stage D until the Adder or Subtracter they need is free.

Second difference: the memory unit takes a varying number of cycles to complete a memory access. A cache hit takes 1 cycle; a cache miss takes 6 cycles; a non-memory instruction takes 1 cycle to pass through Stage M. Only one instruction can occupy the memory unit at a time; that is, if two memory instructions enter Stage M at the same time, the second instruction waits for the first to complete before retrieving its own memory data.

Third difference: this architecture is a 2-issue in-order pipelined processor. This means the fetch stage can fetch at most two instructions, and each state register between stages can hold the state of at most two instructions. The fetch unit fetches as many instructions as possible to fill the state register between Stage F and Stage D.

This problem is based on the code in the figure, which will be executed on the processor described above. Assume that in cycle 0 no instruction is executed, and that in cycle 1 the first two instructions are fetched. Fill in the blanks. (Stage: F, D, E, M, W; finished or not fetched; mark an undecidable value with "--") (10')

The cache miss in the above code causes the CPU to waste several cycles, because the next instruction depends on the value fetched by the instruction that misses in the cache. A technique called "code motion" changes the order of instructions so that the time used to load data from memory is hidden behind instructions that neither use memory nor depend on the memory instruction. Use this technique to optimize the original code in the figure for the least execution time, and write your code down.

How many cycles are saved by this optimization? ___[1]___

The last instruction in this optimized instruction flow will exit its write-back stage in Cycle ___[2]___

Problem

Implement the following 2 functions so that their functionality matches the description. The coding rules are the same as in Lab1. (If you think there is nothing to fill in, just fill the blank with "--")

bitOr (10' + 2')

int bitOr (int x, int y)
{
    return ret;
}

logicalShift (2' *

int logicalShift (int x, int n)
{
    int mask= 0x7f <
    | 0xff <
    return ((x >> __[4]__) __[5]__ ( mask >> (__[6]__)));
}
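For reference, here is a sketch of the two functions. It is not a reconstruction of the exam's blanks (parts of the template are missing from this copy). It assumes a 32-bit int, two's complement, and an arithmetic right shift for signed values, as on gcc and clang; the unsigned casts would not be allowed under Lab1-style rules and are used only to keep the sketch free of undefined behaviour in ISO C.

```c
#include <limits.h>

/* bitOr via De Morgan's law: x | y == ~(~x & ~y), using only ~ and &. */
static int bitOr(int x, int y) {
    return ~(~x & ~y);
}

/* Logical right shift of x by n, for 0 <= n <= 31.
   x >> n copies the sign bit into the top n positions; the mask clears them. */
static int logicalShift(int x, int n) {
    /* INT_MIN >> n sets the top n+1 bits; shifting left by one (as unsigned)
       leaves exactly the top n bits set. */
    unsigned sign_copies = (unsigned)(INT_MIN >> n) << 1;
    return (int)((unsigned)(x >> n) & ~sign_copies);
}
```

In Lab1 style the same idea is usually written as (x >> n) & ~(((1 << 31) >> n) << 1). For example, shifting 0x87654321 right by 4 gives 0x08765432 instead of the sign-extended 0xF8765432.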

Problem

A computer has a main memory (MM) of 256K words and an 8K-word set-associative cache. Each set of the cache has 4 pages, with 64 words in each page. Suppose the cache is initially empty. The CPU reads data from MM at addresses 0, 1, 2, …, 8447 (when a read misses in the cache, the full page containing the requested data is first written into the cache and then read from the cache). This reading process repeats 20 times. The cache access time is 10% of that of MM, and the LRU replacement policy is used on the cache. How many times faster are the total read operations with the cache than without it? You must write out complete steps with the necessary descriptions.
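The miss count can be confirmed with a small LRU simulation. The sketch assumes the usual mapping (set index = page number mod number of sets, which gives 8K / (64 * 4) = 32 sets). The timing model in speedup is also an assumption: a hit costs one cache access, and a miss costs one MM access plus one cache access. Other cost models give slightly different ratios. All names are hypothetical.

```c
#define WORDS_PER_PAGE 64
#define WAYS 4
#define NUM_SETS 32   /* 8K words / (64 words per page * 4 pages per set) */

/* Count word reads that miss over `passes` sweeps of addresses 0..last_addr. */
static long count_misses(long last_addr, int passes) {
    long tag[NUM_SETS][WAYS];
    long last_used[NUM_SETS][WAYS];
    long clock = 0, misses = 0;

    for (int s = 0; s < NUM_SETS; s++)
        for (int w = 0; w < WAYS; w++) {
            tag[s][w] = -1;        /* empty way */
            last_used[s][w] = 0;   /* empty ways are the oldest, so they fill first */
        }

    for (int pass = 0; pass < passes; pass++) {
        for (long addr = 0; addr <= last_addr; addr++) {
            long page = addr / WORDS_PER_PAGE;
            int set = (int)(page % NUM_SETS);
            int hit_way = -1, victim = 0;
            for (int w = 0; w < WAYS; w++) {
                if (tag[set][w] == page) hit_way = w;
                if (last_used[set][w] < last_used[set][victim]) victim = w;
            }
            clock++;
            if (hit_way < 0) {     /* miss: replace the least recently used way */
                misses++;
                tag[set][victim] = page;
                hit_way = victim;
            }
            last_used[set][hit_way] = clock;
        }
    }
    return misses;
}

/* Speedup under the assumed cost model (cache = 1 time unit, MM = 10). */
static double speedup(long total_reads, long misses) {
    double t_cache = 1.0, t_mm = 10.0;
    double with_cache = (double)(total_reads - misses) * t_cache
                      + (double)misses * (t_mm + t_cache);
    return (double)total_reads * t_mm / with_cache;
}
```

Addresses 0 to 8447 cover 132 pages, so sets 0 to 3 each receive 5 pages and thrash under LRU on every pass, while the other 28 sets miss only on the first pass. That gives 132 + 19 * 20 = 512 misses out of 20 * 8448 = 168960 reads, which count_misses(8447, 20) reproduces.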

Problem

You are given the following definitions:

struct pc {
    int x;
    int y;
    int z;
};

struct pc sq[16][16];
int i, j;

Assume the following:

sizeof(int) == 4.
sq begins at memory address 0.
The machine has a 2048-byte direct-mapped data cache with 32-byte blocks.
The cache is initially empty.
The only memory accesses are to the entries of the array sq. Variables i and j are stored in registers.

Determine the cache performance of the following code:

for (i = 0; i < 16; i++) {
    for (j = 0; j < 16; j++) {
        sq[i][j].x = 0;
        sq[i][j].y = 0;
        sq[i][j].z = 1;
    }
}

What is the total number of writes?

What is the total number of writes that miss the cache?

Given the above assumptions, determine the cache performance of the following code:

for (i = 0; i < 16; i++) {
    for (j = 0; j < 16; j++) {
        sq[j][i].x = 0;
        sq[j][i].y = 0;
        sq[j][i].z = 1;
    }
}

What is the total number of writes?

What is the total number of writes that miss the cache?
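Both loop orders can be checked with a direct-mapped cache simulation. The sketch assumes a write-allocate cache (a write miss loads the block), 12-byte elements with fields at offsets 0, 4 and 8, and sq starting at address 0, as the problem states. The names simulate, touch and cache_stats are hypothetical.

```c
#include <stddef.h>

#define N 16
#define ELEM_SIZE 12                    /* three 4-byte ints */
#define BLOCK_SIZE 32
#define NUM_LINES (2048 / BLOCK_SIZE)   /* 64 lines */

struct cache_stats { int writes; int misses; };

/* Simulate one 4-byte write at byte address addr; returns 1 on a miss. */
static int touch(long line_tag[NUM_LINES], size_t addr) {
    long block = (long)(addr / BLOCK_SIZE);
    int line = (int)(block % NUM_LINES);
    if (line_tag[line] == block)
        return 0;
    line_tag[line] = block;             /* write-allocate: load the block */
    return 1;
}

/* column_major == 0 simulates sq[i][j]; nonzero simulates sq[j][i]. */
static struct cache_stats simulate(int column_major) {
    long line_tag[NUM_LINES];
    struct cache_stats st = {0, 0};

    for (int k = 0; k < NUM_LINES; k++)
        line_tag[k] = -1;               /* cache starts empty */

    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            int row = column_major ? j : i;
            int col = column_major ? i : j;
            size_t base = ((size_t)row * N + (size_t)col) * ELEM_SIZE;
            for (size_t field = 0; field < ELEM_SIZE; field += 4) {  /* x, y, z */
                st.writes++;
                st.misses += touch(line_tag, base + field);
            }
        }
    }
    return st;
}
```

Each loop performs 16 * 16 * 3 = 768 writes. The array occupies 3072 bytes, or 96 blocks, and the row-major loop misses once per block. Under these assumptions the column-major loop also misses 96 times: each row spans 6 blocks, and rows whose blocks share a cache line are never in use in the same columns.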

Problem

Virtual Address Translation
