Computer Systems: A Programmer's Perspective (English edition, 3rd edition): Chapter 3 reading notes (in English)

Article link: https://blog.csdn.net/qq_36992525/article/details/116235604

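This article describes in detail how computer programs are represented at the machine level, including virtual addresses, the memory model, data encodings, data formats, and ways of accessing data. It explains the unique-decoding property of the instruction format and how data is represented in registers and memory, including the definitions of byte, word, double word, and quad word. It also discusses how jump instructions are implemented in the x86-64 architecture, covering direct and indirect jumps, and looks at implementation strategies for conditional branches and loops.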


Computer Systems: A Programmer's Perspective

Chapter 3 Machine-Level Representation of Programs

3.2 Program Encodings

3.2.1 Machine-Level Code

The memory addresses used by a machine-level program are virtual addresses, providing a memory model that appears to be a very large byte array.

A byte array is simply an area of memory containing a group of contiguous (side by side) bytes, such that it makes sense to talk about them in order: the first byte, the second byte etc…

The crucial thing about a byte array is that it gives indexed (fast), precise, raw access to each 8-bit value being stored in that part of memory, and you can operate on those bytes to control every single bit.

Parts of the processor state

The integer register file contains 16 named locations storing 64-bit values. These registers can hold addresses (corresponding to C pointers) or integer data.
The condition code registers hold status information about the most recently executed arithmetic or logical instruction.
A set of vector registers can each hold one or more integer or floating-point values.

The program memory is addressed using virtual addresses. At any given time, only limited subranges of virtual addresses are considered valid. For example, x86-64 virtual addresses are represented by 64-bit words. In current implementations of these machines, the upper 16 bits must be set to zero, and so an address can potentially specify a byte over a range of $2^{48}$, or 256 terabytes.

Code examples

A key lesson to learn from this is that the program executed by the machine is simply a sequence of bytes encoding a series of instructions. The machine has very little information about the source code from which these instructions were generated.

A small feature about machine code
The instruction format is designed in such a way that from a given starting position, there is a unique decoding of the bytes into machine instructions.

3.3 Data Formats

Due to its origins as a 16-bit architecture that expanded into a 32-bit one, Intel uses the term “word” to refer to a 16-bit data type. Based on this, they refer to 32-bit quantities as “double words”, and 64-bit quantities as “quad words”.

Summary:

A byte --> 8 bits --> 2 hexadecimal digits
A word --> 2 bytes --> 16 bits --> 4 hexadecimal digits
A double word --> 4 bytes --> 32 bits --> 8 hexadecimal digits
A quad word --> 8 bytes --> 64 bits --> 16 hexadecimal digits
Recall that for a virtual address, the upper 16 bits must be set to zero, so only 64-16=48 bits are actually usable, which gives $2^{48}$ addressable bytes.

3.4 Accessing Information

Two conventions arise for what happens to the remaining bytes in the register for instructions that generate less than 8 bytes:

Those that generate 1- or 2-byte quantities leave the remaining bytes unchanged.
Those that generate 4-byte quantities set the upper 4 bytes of the register to zero.

3.4.1 Operand Specifiers

Most instructions have one or more operands specifying the source values to use in performing an operation and the destination location into which to place the result.

There are three types:

Immediate, for constant values.
Register, which denotes the contents of a register. We use the notation $r_a$ to denote an arbitrary register $a$ and indicate its value with the reference $R[r_a]$, viewing the set of registers as an array $R$ indexed by register identifiers.
Memory reference, in which we access some memory location according to a computed address, often called the effective address. We use the notation $M_b[Addr]$ to denote a reference to the $b$-byte value stored in memory starting at address $Addr$.

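Note: In the most general memory-reference form $Imm(r_b, r_i, s)$, the effective address is computed as $Imm + R[r_b] + R[r_i] \cdot s$. The scaling factor $s$ must be either 1, 2, 4, or 8.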

3.4.2 Data Movement Instructions

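x86-64 imposes the restriction that a move instruction cannot have both operands refer to memory locations. Copying a value from one memory location to another therefore takes two instructions: one to load the source value into a register, and one to write that register to the destination.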

Register operands for these instructions can be the labeled portions of any of the 16 registers, where the size of the register must match the size designated by the last character of the instruction (‘b’, ‘w’, ‘l’ or ‘q’).

Note: The only exception is that when movl has a register as the destination, it will also set the high-order 4 bytes of the register to 0.

Practice Problem 3.2: One important feature is that memory references in x86-64 are always given with quad word registers, such as %rax, even if the operand is a byte, single word, or double word.

3.6 Control

…

3.6.3 Jump Instructions

In generating the object-code file, the assembler (assembly code --> machine code) determines the addresses of all labeled instructions and encodes the jump targets (the addresses of the destination instructions) as part of the jump instructions.

The jmp instruction jumps unconditionally. It can be either a direct jump or an indirect jump. Direct jumps are written in assembly code by giving a label as the jump target, for example, jmp .L1

Indirect jumps:

jmp *%rax --> uses the value in register %rax as the jump target.
jmp *(%rax) --> reads the jump target from memory, using the value in %rax as the read address.

3.6.4 Jump Instruction Encodings

The most commonly used encoding method is PC-relative. That is, $address_{encoding} = address_{target} - address_{following}$, where $address_{following}$ is the address of the instruction immediately following the jump.

So, $address_{target} = address_{encoding} + address_{following}$ (in machine code, the bytes that follow the jump opcode hold $address_{encoding}$).
These offsets can be encoded using 1, 2, or 4 bytes.

A second encoding method is to give an “absolute” address,using 4 bytes to directly specify the target.

After linking, the instructions have been relocated to different addresses, but the encodings of the jump targets remain unchanged. By using a PC-relative encoding of the jump targets, the instructions can be compactly encoded (requiring just 2 bytes for a jump with a 1-byte offset), and the object code can be shifted to different positions in memory without alteration.

3.6.5 Implementing Conditional Branches with Conditional Control

The general form of an if-else statement in C is given by the following template:

    if(test-expr)
        then-statement
    else
        else-statement

For this general form, the assembly implementation typically adheres to the following form:

    t = test-expr;
    if(!t)
        goto false; 
    then-statement
    goto done;
false:
    else-statement
done:

3.6.6 Implementing Conditional Branches with Conditional Moves

Conditional moves can be described by the following abstract code.

    v = then-expr;
    ve = else-expr;
    t = test-expr;
    if(!t) v = ve;

Our experiments with GCC indicate that it only uses conditional moves when the two expressions can be computed very easily, for example, with single add instructions. In our experience, GCC uses conditional control transfers even in many cases where the cost of branch misprediction would exceed even more complex computations.

3.6.7 Loops

Do-While Loops

loop:
    body-statement
    t = test-expr;
    if(t)
        goto loop;

While Loops

The first translation method, which we refer to as jump to middle, performs the initial test by performing an unconditional jump to the test at the end of the loop.

    goto test;
loop:
    body-statement
test:
    t = test-expr;
    if(t)
        goto loop;

The second translation method, which we refer to as guarded do, first transforms the code into a do-while loop by using a conditional branch to skip over the loop if the initial test fails. GCC follows this strategy when compiling with higher levels of optimization, for example, with command-line option -O1.

    t = test-expr;
    if(!t)
        goto done;
loop:
    body-statement
    t = test-expr;
    if(t)
        goto loop;
done:

For Loops
The general form of a for loop is as follows:

for(init-expr;test-expr;update-expr)
    body-statement

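The C language standard states that the behavior of such a loop is identical to the following code using a while loop (with one exception: if the body contains a continue statement, the for loop still executes update-expr, whereas this while version would skip it):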

init-expr;
while(test-expr){
    body-statement
    update-expr;
}

The code generated by GCC for a for loop then follows one of our two translation strategies for while loops, depending on the optimization level.

Jump-to-middle strategy:

    init-expr;
    goto test;
loop:
    body-statement
    update-expr;
test:
    t = test-expr;
    if(t) goto loop;

Guarded-do strategy:

    init-expr;
    t = test-expr;
    if(!t) goto done;
loop:
    body-statement
    update-expr;
    t = test-expr;
    if(t)
        goto loop;
done: