Computer Organization and Design: The Hardware/Software Interface, Reading Notes 2

乘螺舟而至

Article link: https://blog.csdn.net/qq_26371477/article/details/109686203

Terminology of Logic design

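asserted signal: a signal that is logically true (Boolean 1)
deasserted signal: a signal that is logically false (Boolean 0)
Blocks without memory are called combinational: their outputs depend only on the current inputs. This contrasts with sequential logic, whose outputs also depend on stored state.
Truth table: a table that lists the output values for every combination of input values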
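DeMorgan's theorems: $\overline{A + B} = \overline{A} \cdot \overline{B}$ and $\overline{A \cdot B} = \overline{A} + \overline{B}$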
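Because the inputs are Boolean, DeMorgan's theorems can be checked exhaustively. The C sketch below tries all four input combinations. `demorgan_holds` is an illustrative name, not something from the book.

```c
#include <assert.h>
#include <stdbool.h>

/* Exhaustively checks both forms of DeMorgan's theorems
   over all four combinations of two Boolean inputs. */
bool demorgan_holds(void)
{
    for (int a = 0; a <= 1; a++) {
        for (int b = 0; b <= 1; b++) {
            /* NOT(A OR B) == (NOT A) AND (NOT B) */
            bool first = !(a || b) == (!a && !b);
            /* NOT(A AND B) == (NOT A) OR (NOT B) */
            bool second = !(a && b) == (!a || !b);
            if (!first || !second)
                return false;
        }
    }
    return true;
}
```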

The sum-of-products representation corresponds to a common structured-logic implementation called a programmable logic array (PLA)
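As a minimal sketch of the sum-of-products idea, the C function below evaluates any 3-input function from its truth table. Each row whose output is 1 becomes one product (AND) term, and the terms are ORed together. This is the structure a PLA implements with its AND plane and OR plane. Packing the truth table into an 8-bit mask and the name `sop_eval` are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Evaluates a 3-input function written in sum-of-products form.
   Bit i of `truth` is the output for input row i = (a<<2)|(b<<1)|c.
   Inputs a, b, c must be 0 or 1. Each row whose output is 1
   contributes one product term (minterm); the terms are ORed. */
int sop_eval(uint8_t truth, int a, int b, int c)
{
    int sum = 0;
    for (int row = 0; row < 8; row++) {
        if (!((truth >> row) & 1))
            continue;                       /* no product term for this row */
        int ra = (row >> 2) & 1, rb = (row >> 1) & 1, rc = row & 1;
        /* Product term: each literal is the input or its complement. */
        int product = (ra ? a : !a) && (rb ? b : !b) && (rc ? c : !c);
        sum = sum || product;               /* OR the product terms together */
    }
    return sum;
}
```

For example, the 3-input majority function (1 when at least two inputs are 1) has 1s in rows 3, 5, 6 and 7. Its mask is therefore 0xE8, and `sop_eval(0xE8, 1, 1, 0)` returns 1.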

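don't cares: input or output values that do not matter for a given function, which gives the designer freedom to simplify the logic
A bus is a collection of data lines that is treated together as a single logical signal. In a 64-bit datapath such as RV64, most buses are 64 bits wide.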
A behavioral specification describes how a digital system functionally operates. A structural specification describes the detailed organization of a digital system, usually using a hierarchical description.
sensitivity list: The list of signals that specifies when an always block should be re-evaluated.
blocking assignment: In Verilog, an assignment that completes before the next statement executes. A nonblocking assignment, by contrast, evaluates its right-hand side immediately but updates the left-hand side only after all right-hand sides in the block have been evaluated.
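The difference between the two assignment kinds is easiest to see with a swap. The C sketch below models what `a = b; b = a;` (blocking) and `a <= b; b <= a;` (nonblocking) do inside one always block. The function names are hypothetical, and this is a simplified software model: it has no simulation time steps or sensitivity lists.

```c
#include <assert.h>

/* Blocking: each assignment takes effect before the next one runs.
   Models:  a = b;  b = a;                                            */
void blocking_step(int *a, int *b)
{
    *a = *b;      /* a now holds the old b */
    *b = *a;      /* reads the NEW a, so b keeps its old value */
}

/* Nonblocking: evaluate every right-hand side first, then update.
   Models:  a <= b;  b <= a;                                          */
void nonblocking_step(int *a, int *b)
{
    int rhs_a = *b;   /* right-hand side of a <= b */
    int rhs_b = *a;   /* right-hand side of b <= a */
    *a = rhs_a;
    *b = rhs_b;       /* the values end up swapped */
}
```

With a = 1 and b = 2, the blocking version leaves both at 2. The nonblocking version swaps them to a = 2, b = 1. This is why nonblocking assignments are the commonly recommended choice for registers updated on a clock edge.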
edge-triggered clocking: A clocking scheme in which all state changes occur on a clock edge.
A register file consists of a set of registers that can be read and written by supplying a register number to be accessed.
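A register file can be sketched as an array indexed by register number. The C model below assumes the RISC-V integer register file: 32 registers of 64 bits, with x0 hardwired to zero. The type and function names are hypothetical, and the `reg_write` flag stands in for the RegWrite control signal.

```c
#include <assert.h>
#include <stdint.h>

/* Minimal model of a RISC-V integer register file:
   32 registers of 64 bits, selected by a 5-bit register number. */
typedef struct {
    uint64_t x[32];
} RegFile;

/* Read port: x0 always reads as zero. */
uint64_t rf_read(const RegFile *rf, unsigned reg)
{
    unsigned r = reg & 31;            /* only 5 bits select a register */
    return r == 0 ? 0 : rf->x[r];
}

/* Write port: writes happen only when reg_write is asserted,
   and writes to x0 are ignored. */
void rf_write(RegFile *rf, unsigned reg, uint64_t value, int reg_write)
{
    unsigned r = reg & 31;
    if (reg_write && r != 0)
        rf->x[r] = value;
}
```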
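wired AND: tying several outputs to a single line so that the line behaves as the AND of those outputs
tristate buffer: a buffer whose output can be in one of three states: asserted, deasserted, or high impedance
finite-state machine (FSM): a sequential logic function consisting of a set of inputs and outputs, a next-state function, and an output function
A digital circuit is built from truth tables (describing combinational logic) and finite-state machines (describing sequential logic).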
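An FSM splits into combinational next-state logic plus a state register, and that split can be sketched in C. The two-state traffic-light controller below is a hypothetical example. The state names, inputs and functions are made up for illustration.

```c
#include <assert.h>

/* A hypothetical two-state traffic-light controller written as an FSM.
   The next-state function is pure combinational logic; the variable `s`
   in run_clocks plays the role of the state register. */
typedef enum { NS_GREEN, EW_GREEN } State;

/* Next-state function: switch only when a car waits on the other road. */
State next_state(State s, int ns_car, int ew_car)
{
    switch (s) {
    case NS_GREEN: return ew_car ? EW_GREEN : NS_GREEN;
    case EW_GREEN: return ns_car ? NS_GREEN : EW_GREEN;
    }
    return s;
}

/* Output function (Moore style): depends only on the current state. */
int ns_light_green(State s) { return s == NS_GREEN; }

/* Simulates n clock edges with fixed inputs. */
State run_clocks(State s, int n, int ns_car, int ew_car)
{
    for (int i = 0; i < n; i++)
        s = next_state(s, ns_car, ew_car);   /* one clock edge */
    return s;
}
```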

MSB (most significant bit): the leftmost bit in a RISC-V doubleword (bit 63)

LSB (least significant bit): the rightmost bit in a RISC-V doubleword (bit 0)

RISC-V divides the 32 bits of an instruction into "fields" because:
a regular field layout $\Rightarrow$ simpler hardware
immediates: constant operands encoded directly in the instruction's fields, so using them requires no extra memory access

Six instruction formats

1. R-format: instructions with three register operands (destination rd, sources rs1 and rs2)
- add, xor, mul: arithmetic/logical ops
2. I-format: instructions with immediates, and loads
- addi, lw, jalr, slli
3. S-format: store instructions
- sw, sb
4. SB-format: branch instructions
- beq, bge
5. U-format: instructions with upper immediates
- lui, auipc; the upper immediate is 20 bits wide
6. UJ-format: jump instructions
- jal
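To make the R-format field layout concrete, the C sketch below packs and unpacks the fields. The helper names `encode_r` and `field` are made up for this example. Encoding `add x9, x20, x21` (opcode 0110011, funct3 000, funct7 0000000) gives 0000000 10101 10100 000 01001 0110011, that is, 0x015A04B3.

```c
#include <assert.h>
#include <stdint.h>

/* Packs the six R-format fields into a 32-bit RISC-V instruction:
   funct7[31:25] rs2[24:20] rs1[19:15] funct3[14:12] rd[11:7] opcode[6:0] */
uint32_t encode_r(uint32_t funct7, uint32_t rs2, uint32_t rs1,
                  uint32_t funct3, uint32_t rd, uint32_t opcode)
{
    return (funct7 & 0x7F) << 25 | (rs2 & 0x1F) << 20 | (rs1 & 0x1F) << 15 |
           (funct3 & 0x7) << 12 | (rd & 0x1F) << 7 | (opcode & 0x7F);
}

/* Extracts bits [hi:lo] of an instruction word (field width must be < 32). */
uint32_t field(uint32_t insn, int hi, int lo)
{
    return (insn >> lo) & ((1u << (hi - lo + 1)) - 1);
}
```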


Using multiple levels of control can reduce the size of the main control unit. Using several smaller control units may also potentially reduce the latency of the control unit.
In RISC-V, a word is 32 bits, and a group of 64 bits is called a doubleword.
The RISC-V architecture limits the number of general-purpose registers to 32, following the design principle that smaller is faster.
The desire to keep all instructions the same size conflicts with the desire to have as many registers as possible. Any increase in the number of registers uses up at least one more bit in every register field of the instruction format. Given these constraints and the design principle that smaller is faster, most instruction sets today have 16 or 32 general-purpose registers.

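The sign and magnitude representation was soon abandoned because:
1. It was not obvious where to put the sign bit (to the left or to the right).
2. Adders may need an extra step to set the sign, because the correct sign cannot be known in advance.
3. It has both a positive and a negative zero.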

The data transfer instruction that copies data from memory to a register is traditionally called load.
The addresses of sequential doublewords differ by 8.
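Since RISC-V memory is byte-addressed, the offset of a doubleword element is 8 times its index. The C sketch below measures that stride with pointer arithmetic, and the function names are illustrative. For A[8], the offset is 64. This matches a load such as `ld x9, 64(x22)` when x22 holds the base address of A (the register choice is only an example).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte distance between consecutive elements of a doubleword array. */
ptrdiff_t doubleword_stride(void)
{
    int64_t a[2] = {0, 0};
    return (const char *)&a[1] - (const char *)&a[0];
}

/* Byte offset of element i, as used in a load/store displacement. */
long doubleword_offset(long i)
{
    return 8 * i;
}
```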
Two's complement representation: to negate a number, invert every bit and then add 1.
Two's complement representation has the advantage that all negative numbers have a 1 in the most significant bit. Thus, hardware needs to test only this bit to see whether a number is positive or negative (0 is treated as positive). This bit is often called the sign bit.
Another option is to represent the most negative value by 00…000 and the most positive value by 11…11, with 0 typically having the value 10…00. This representation is called biased notation.
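The three rules above can be tried out on 8-bit values, which keeps the bit patterns short. This is an assumption for readability: the same rules apply to 64-bit doublewords. The C sketch below negates by "invert and add 1", reads the sign bit, and converts to and from biased notation with a bias of 2^(8-1) = 128. All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Two's complement negation: invert every bit, then add 1.
   Computed on unsigned types so the wrap-around is well defined.
   Note: negating -128 gives -128 again, since +128 does not fit. */
int8_t twos_negate(int8_t x)
{
    return (int8_t)(uint8_t)(~(uint8_t)x + 1u);
}

/* The sign bit is the most significant bit: 1 means negative. */
int sign_bit(int8_t x)
{
    return ((uint8_t)x >> 7) & 1;
}

/* 8-bit biased notation: stored = value + 128, so
   -128 -> 00000000, 0 -> 10000000, +127 -> 11111111. */
uint8_t to_biased(int8_t value) { return (uint8_t)(value + 128); }
int8_t from_biased(uint8_t stored) { return (int8_t)(stored - 128); }
```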

Representing Instructions in the Computer

We have a conflict between:
1. Desire to keep all instructions the same length.
2. Desire to have a single instruction format.

Design Principle: Good design demands good compromises.
The compromise chosen by the RISC-V designers is to keep all instructions the same length, thereby requiring distinct instruction formats for different kinds of instructions.
S-type instructions

The 12-bit immediate in the S-type format is split into two fields, which supply the lower 5 bits and upper 7 bits. The RISC-V architects chose this design because it keeps the rs1 and rs2 fields in the same place in all instruction formats.
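The split immediate can be shown with a small encoder and decoder in C. This is a sketch with hypothetical function names. For `sd x9, 64(x22)` (opcode 0100011, funct3 011), the 12-bit immediate 64 is split into imm[11:5] = 0000010 and imm[4:0] = 00000. The decoder glues the two pieces back together and sign-extends bit 11.

```c
#include <assert.h>
#include <stdint.h>

/* Encodes an S-type store: imm[11:5] rs2 rs1 funct3 imm[4:0] opcode.
   rs1 and rs2 stay in the same bit positions as in the R-format. */
uint32_t encode_s(int32_t imm, uint32_t rs2, uint32_t rs1,
                  uint32_t funct3, uint32_t opcode)
{
    uint32_t u = (uint32_t)imm & 0xFFF;          /* keep the low 12 bits */
    return (u >> 5) << 25 | (rs2 & 0x1F) << 20 | (rs1 & 0x1F) << 15 |
           (funct3 & 0x7) << 12 | (u & 0x1F) << 7 | (opcode & 0x7F);
}

/* Reassembles imm[11:5] and imm[4:0], then sign-extends bit 11. */
int32_t decode_s_imm(uint32_t insn)
{
    uint32_t u = ((insn >> 25) & 0x7F) << 5 | ((insn >> 7) & 0x1F);
    return (u & 0x800) ? (int32_t)u - 4096 : (int32_t)u;
}
```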

Each format is assigned a distinct set of values in the opcode field (bits 6:0), so that the hardware knows how to treat the rest of the instruction.
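A decoder can tell the formats apart by looking only at the opcode field. The C sketch below covers a subset of the RV64I and RV64M base opcodes, taken from the RISC-V base ISA encoding. The enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { FMT_R, FMT_I, FMT_S, FMT_SB, FMT_U, FMT_UJ, FMT_UNKNOWN } Format;

/* Maps the 7-bit opcode field (bits 6:0) to an instruction format.
   Only a subset of the RV64I/RV64M opcodes is covered. */
Format format_of(uint32_t insn)
{
    switch (insn & 0x7F) {
    case 0x33: case 0x3B:       return FMT_R;   /* add, sub, mul, addw ... */
    case 0x13: case 0x1B:
    case 0x03: case 0x67:       return FMT_I;   /* addi, slli, loads, jalr */
    case 0x23:                  return FMT_S;   /* sd, sw, sb */
    case 0x63:                  return FMT_SB;  /* beq, bne, blt, bge */
    case 0x37: case 0x17:       return FMT_U;   /* lui, auipc */
    case 0x6F:                  return FMT_UJ;  /* jal */
    default:                    return FMT_UNKNOWN;
    }
}
```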

the stored-program concept: Programs are stored in memory to be read or written, just like data.

A case/switch statement can be implemented as a chain of if-then-else tests. It is often encoded more efficiently as a table of addresses of alternative instruction sequences, called a branch address table or branch table. The program then only needs to index into the table and branch to the appropriate sequence.
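The branch-table idea can be imitated in C with an array of function pointers. This is only an analogy: the functions stand in for the instruction sequences, and their pointers stand in for the addresses a compiler would place in the table. The case bodies and all names below are hypothetical.

```c
#include <assert.h>

/* Each table entry points to the code for one case of a switch on k. */
typedef long (*CaseFn)(long g, long h);

static long case0(long g, long h) { return g + h; }
static long case1(long g, long h) { return g - h; }
static long case2(long g, long h) { return g * h; }
static long case3(long g, long h) { return g & h; }

static const CaseFn branch_table[4] = { case0, case1, case2, case3 };

long dispatch(long k, long g, long h)
{
    if (k < 0 || k > 3)          /* bounds check before indexing the table */
        return 0;                /* acts as the "default" case */
    return branch_table[k](g, h);   /* index, then jump to the selected code */
}
```

Compilers commonly generate a table like this for a switch with dense case values. For sparse values, a chain of compare-and-branch instructions may be cheaper.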
jump-and-link instruction: An instruction that branches to an address and simultaneously saves the address of the following instruction in a register (usually x1 in RISC-V).

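For example:

	jal x1, ProcedureAddress    # jump to ProcedureAddress and save the return address in x1

The address of the instruction following jal, called the ==return address==, is saved in register x1, so the procedure can later return with jalr x0, 0(x1).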

caller: The program that instigates a procedure and provides the necessary parameter values.
callee: A procedure that executes a series of stored instructions based on parameters provided by the caller and then returns control to the caller.
Why use a stack:

By RISC-V convention, x10-x17 are the eight parameter registers used to pass parameters or return values. Suppose a compiler needs more registers for a procedure than these eight argument registers. Since we must cover our tracks after our mission is complete, any registers needed by the caller must be restored to the values they contained before the procedure was invoked. This is an example of a situation in which we need to spill registers to memory.
RISC-V uses register x2 (sp) as the stack pointer.
placing data onto the stack is called a push, and removing data from the stack is called a pop.
By historical precedent, stacks “grow” from higher addresses to lower addresses. This convention means that you push values onto the stack by subtracting from the stack pointer. Adding to the stack pointer shrinks the stack, thereby popping values off the stack.
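Here is a minimal C sketch of this convention. It assumes a 256-byte region and uses byte offsets in place of real addresses, and the `Stack` type and the `push`/`pop` helpers are hypothetical. Push decrements sp by 8 before storing a doubleword, and pop loads the doubleword and then increments sp by 8. The sketch has no overflow or underflow checks.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* A toy downward-growing stack; sp is a byte offset into mem. */
typedef struct {
    uint8_t  mem[256];
    uint32_t sp;                 /* byte offset of the top of the stack */
} Stack;

/* The stack starts at the high end of the region. */
void stack_init(Stack *s) { s->sp = sizeof s->mem; }

/* Like: addi sp, sp, -8 ; sd value, 0(sp) */
void push(Stack *s, int64_t value)
{
    s->sp -= 8;                                  /* grow toward lower addresses */
    memcpy(&s->mem[s->sp], &value, sizeof value);
}

/* Like: ld value, 0(sp) ; addi sp, sp, 8 */
int64_t pop(Stack *s)
{
    int64_t value;
    memcpy(&value, &s->mem[s->sp], sizeof value);
    s->sp += 8;                                  /* shrink toward higher addresses */
    return value;
}
```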

The key role of the stack is to avoid register conflicts across procedure calls:
The caller pushes any argument registers (x10-x17) or temporary registers (x5-x7 and x28-x31) that it still needs after the call. The callee pushes the return address register x1 and any saved registers (x8-x9 and x18-x27) that it uses. The stack pointer sp is adjusted to account for the number of registers placed on the stack.

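To avoid saving and restoring a register whose value is never used, which might happen with a temporary register, RISC-V software separates 19 of the registers into two groups:
- x5-x7 and x28-x31: temporary registers that are not preserved by the callee on a procedure call
- x8-x9 and x18-x27: saved registers that must be preserved on a procedure call (if the callee uses them, it saves and restores them)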

Analysis of the factorial assembly program

C:
The argument n is passed in argument register x10.

```c
long long int fact(long long int n)
{
	if (n < 1) return 1;
	else return n * fact(n - 1);
}
```

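Assembly:

```asm
fact:
	addi sp, sp, -16      # adjust the stack for 2 items
	sd   x1, 8(sp)        # save the return address (x1)
	sd   x10, 0(sp)       # save the argument n (x10)

	addi x5, x10, -1      # x5 = n - 1
	bge  x5, x0, L1       # if (n - 1) >= 0, i.e. n >= 1, go to L1

	addi x10, x0, 1       # base case: return 1
	addi sp, sp, 16       # pop 2 items (nothing needs to be restored)
	jalr x0, 0(x1)        # return to the caller

L1:
	addi x10, x10, -1     # argument for the recursive call: n - 1
	jal  x1, fact         # call fact(n - 1); x1 = address of the next instruction

	addi x6, x10, 0       # x6 = fact(n - 1)
	ld   x10, 0(sp)       # restore the argument n
	ld   x1, 8(sp)        # restore the return address
	addi sp, sp, 16       # pop 2 items off the stack

	mul  x10, x10, x6     # return n * fact(n - 1)
	jalr x0, 0(x1)        # return to the caller
```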

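Example with n = 3

Let ret be the return address of the original caller. Let A be the address of `addi x6, x10, 0`, the instruction right after `jal x1, fact`; every recursive call returns to this same address A. A stack frame is written as [saved x1, saved n].

1. First call, fact(3): x10 = 3 and x1 = ret. Push the frame [ret, 3].
2. x5 = 3 - 1 = 2. Since 2 >= 0, branch to L1.
3. x10 = 2. `jal x1, fact` sets x1 = A and jumps to fact.
4. Second call, fact(2): push the frame [A, 2]. x5 = 1 >= 0, so branch to L1, set x10 = 1, and call fact again with x1 = A.
5. Third call, fact(1): push the frame [A, 1]. x5 = 0 >= 0, so branch to L1, set x10 = 0, and call fact again with x1 = A.
6. Fourth call, fact(0): push the frame [A, 0]. x5 = -1 < 0, so the branch to L1 is not taken.
7. Base case: x10 = 1. Then sp += 16 pops only this call's frame without reloading it, and `jalr x0, 0(x1)` returns to A. The stack now holds three frames: [A, 1] on top, then [A, 2], then [ret, 3].
8. Back in the third call: x6 = 1. Reload x10 = 1 and x1 = A, pop the frame, compute x10 = 1 * 1 = 1, and return to A.
9. Back in the second call: x6 = 1. Reload x10 = 2 and x1 = A, pop the frame, compute x10 = 2 * 1 = 2, and return to A.
10. Back in the first call: x6 = 2. Reload x10 = 3 and x1 = ret, pop the frame, compute x10 = 3 * 2 = 6, and return to ret.

The result is x10 = 6, and sp is back at its original value. At the deepest point, four 16-byte frames (64 bytes) are on the stack.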
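To double-check the trace above, the C sketch below mirrors the assembly with an explicit stack array. It is a hypothetical model, not compiler output. `fact_sim`, `stack_mem`, `max_frames` and the return-address tags are invented names, and return addresses are modeled as small integer tags rather than real instruction addresses.

```c
#include <assert.h>
#include <stdint.h>

/* Explicit stack of 64-bit slots that grows toward index 0.
   Each call pushes a two-slot frame: slot sp+1 = saved x1, slot sp+0 = saved n.
   RA_CALLER tags the original caller; RA_AFTER_JAL tags the
   instruction following "jal x1, fact". */
enum { RA_CALLER = 0, RA_AFTER_JAL = 1 };
#define STACK_SLOTS 64

static int64_t stack_mem[STACK_SLOTS];
static int sp = STACK_SLOTS;       /* empty stack */
static int min_sp = STACK_SLOTS;   /* deepest point reached */

int64_t fact_sim(int64_t n, int64_t ra)
{
    sp -= 2;                               /* addi sp, sp, -16 */
    if (sp < min_sp)
        min_sp = sp;
    stack_mem[sp + 1] = ra;                /* sd x1, 8(sp)  */
    stack_mem[sp + 0] = n;                 /* sd x10, 0(sp) */

    if (n - 1 < 0) {                       /* bge x5, x0, L1 not taken */
        sp += 2;                           /* pop without reloading */
        return 1;                          /* addi x10, x0, 1 */
    }
    int64_t x6 = fact_sim(n - 1, RA_AFTER_JAL);  /* jal x1, fact; addi x6, x10, 0 */
    int64_t saved_n = stack_mem[sp + 0];         /* ld x10, 0(sp) */
    int64_t saved_ra = stack_mem[sp + 1];        /* ld x1, 8(sp)  */
    assert(saved_ra == ra);                /* x1 holds this call's return address again */
    sp += 2;                               /* addi sp, sp, 16 */
    return saved_n * x6;                   /* mul x10, x10, x6 */
}

/* Number of frames that were live at the deepest point. */
int max_frames(void) { return (STACK_SLOTS - min_sp) / 2; }
```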

A variable in C generally corresponds to a location in memory.

C has two storage classes: automatic and static. Automatic variables are local to a procedure and are discarded when the procedure exits. Static variables exist across exits from and entries to procedures. C variables declared outside all procedures are considered static, as are any variables declared using the keyword static. The rest are automatic. To simplify access to static data, some RISC-V compilers reserve register x3 for use as the global pointer, or gp.
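The two storage classes can be contrasted in a few lines of C. `count_calls` and `global_calls` are hypothetical names. The automatic variable is recreated on every call, while the two static variables keep their values between calls.

```c
#include <assert.h>

/* Declared outside all procedures, so it has static storage
   even without the keyword. */
int global_calls = 0;

int count_calls(void)
{
    int automatic = 0;          /* automatic: recreated on every entry */
    static int persistent = 0;  /* static: keeps its value across calls */
    automatic++;
    persistent++;
    global_calls++;
    return persistent * 10 + automatic;   /* automatic is always 1 here */
}
```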

The stack starts at the high end of the user address space.
The stack is also used to store variables that are local to the procedure but do not fit in registers, such as local arrays or structures.

procedure frame or activation record:
The segment of the stack containing a procedure’s saved registers and local variables
frame pointer: A value denoting the location of the saved registers and local variables for a given procedure.

frame pointer VS. stack pointer:

The stack pointer operates on the stack. The frame pointer operates on the frame. Very often, the frame is located in the stack, but it ain’t necessarily so.

Graphics and Computing GPUs

The major driving force for improving graphics processing was the computer game industry. Many programmers of scientific and multimedia applications today are pondering whether to use GPUs or CPUs.
heterogeneous systems: A system combining different processor types.
Trends:
- GPUs and their associated drivers implement the OpenGL and DirectX models of graphics processing. OpenGL is an open standard for 3D graphics programming available for most computers. DirectX is a series of Microsoft multimedia programming interfaces. These APIs (application programming interfaces) have well-defined behavior.
- Visual computing, a mix of graphics processing and computing.
- The GPU is evolving into a scalable parallel processor. In the GeForce 8-series generation of GPUs, geometry, vertex, and pixel processing all run on the same type of processor. This unification allows for dramatic scalability. $\Downarrow$
- This uniform, scalable array of processors calls for a new model of programming for the GPU.
Compute Unified Device Architecture(CUDA) is a scalable parallel programming model and software platform for the GPU and other parallel processors that allows the programmer to bypass the graphics API and graphics interfaces of the GPU and simply program in C or C++.
The CUDA programming model has an SPMD (single-program, multiple-data) software style, in which a programmer writes a program for one thread that is instanced and executed by many threads in parallel on the multiple processors of the GPU.
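The SPMD idea can be imitated sequentially in C: write the body for one thread, then run it once per thread index. This is an emulation, not CUDA. On a GPU the instances would run in parallel, and real CUDA code would use a `__global__` kernel and built-in variables such as `threadIdx` instead. The names `ThreadBody`, `saxpy_thread` and `launch` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Signature of the per-thread program. */
typedef void (*ThreadBody)(size_t tid, size_t n, float a,
                           const float *x, float *y);

/* Program for ONE thread: y[i] = a * x[i] + y[i] (SAXPY). */
static void saxpy_thread(size_t tid, size_t n, float a,
                         const float *x, float *y)
{
    if (tid < n)                     /* guard: extra threads do nothing */
        y[tid] = a * x[tid] + y[tid];
}

/* Hypothetical launcher: instantiates the body once per thread index.
   A loop stands in for the GPU's parallel hardware. */
void launch(ThreadBody body, size_t num_threads, size_t n, float a,
            const float *x, float *y)
{
    for (size_t tid = 0; tid < num_threads; tid++)
        body(tid, n, a, x, y);
}
```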
With CUDA and GPU computing, it is now possible to use the GPU as both a graphics processor and a computing processor at the same time.