Computer Organization and Design: The Hardware/Software Interface, Reading Notes 2

Terminology of Logic design

asserted signal: Boolean 1
deasserted signal: Boolean 0
Blocks without memory are called combinational, in contrast to sequential logic, whose blocks contain memory so that their outputs depend on both the inputs and the stored state.
Truth table
DeMorgan’s theorems
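De Morgan's theorems state that NOT(A + B) = (NOT A)·(NOT B) and NOT(A·B) = (NOT A) + (NOT B). As a minimal sketch, assuming 0/1 C ints stand in for Boolean signals, the hypothetical function `check_demorgan` asserts both identities on every row of the two-input truth table:

```c
#include <assert.h>

/* Assert both De Morgan identities on every row of the two-input
   truth table (A, B in {0, 1}); return the number of rows checked. */
static int check_demorgan(void)
{
    int rows = 0;
    for (int a = 0; a <= 1; a++) {
        for (int b = 0; b <= 1; b++) {
            assert((!(a || b)) == (!a && !b));   /* NOT(A + B) == NOT A . NOT B */
            assert((!(a && b)) == (!a || !b));   /* NOT(A . B) == NOT A + NOT B */
            rows++;
        }
    }
    return rows;
}
```

Exhaustive checking works here because a two-input function has only four rows; the same loop style is how a truth table is enumerated by hand.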

  • The sum-of-products representation corresponds to a common structured-logic implementation called a programmable logic array (PLA).
    don't cares: inputs or outputs whose values do not matter, so they can be chosen freely to simplify the logic.
    A bus is a collection of data lines that is treated together as a single logical signal. In the book's 64-bit RISC-V datapath, most buses are 64 bits wide.

  • A behavioral specification describes how a digital system functionally operates. A structural specification describes the detailed organization of a digital system, usually using a hierarchical description.

  • sensitivity list: The list of signals that specifies when an always block should be re-evaluated.

  • blocking assignment: In Verilog, an assignment that completes before the next statement executes. A nonblocking assignment, by contrast, evaluates its right-hand side but assigns the left-hand side only after all right-hand sides in the block have been evaluated.

  • edge-triggered clocking: A clocking scheme in which all state changes occur on a clock edge.

  • A register file consists of a set of registers that can be read and written by supplying a register number to be
    accessed.

  • wired AND: several outputs connected to a single wire so that the wire carries the logical AND of those outputs.

  • tristate buffer: a buffer whose output can be in one of three states: asserted, deasserted, or high impedance.

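  • finite-state machine (FSM): a sequential logic function consisting of a set of inputs and outputs, a next-state function that maps the current state and inputs to a new state, and an output function.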

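  • In essence, a digital circuit is built from truth tables (expressing combinational logic) and finite-state machines (expressing sequential logic).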
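C statements execute one after another, so plain C assignment behaves like the Verilog blocking assignment described in the list above. The sketch below is only an analogy, not a Verilog simulator: the hypothetical `swap_blocking` mirrors `a = b; b = a;`, while `swap_nonblocking` mirrors `a <= b; b <= a;` by evaluating every right-hand side before updating any left-hand side.

```c
#include <assert.h>

typedef struct { int a, b; } Regs;

/* Blocking style (Verilog: a = b; b = a;): each statement completes
   before the next one, so the second line reads the already-updated a. */
static Regs swap_blocking(Regs r)
{
    r.a = r.b;
    r.b = r.a;              /* reads the NEW a, so both end up as the old b */
    return r;
}

/* Nonblocking style (Verilog: a <= b; b <= a;): evaluate all right-hand
   sides first, then update all left-hand sides together. */
static Regs swap_nonblocking(Regs r)
{
    int next_a = r.b;       /* evaluate every right-hand side ... */
    int next_b = r.a;
    r.a = next_a;           /* ... then assign every left-hand side */
    r.b = next_b;
    return r;
}

/* Usage: starting from a = 1, b = 2. */
static void demo_swaps(void)
{
    Regs start = { 1, 2 };
    Regs blk = swap_blocking(start);
    Regs nb  = swap_nonblocking(start);
    assert(blk.a == 2 && blk.b == 2);   /* the old value of a was lost */
    assert(nb.a == 2 && nb.b == 1);     /* a true swap */
}
```

This is why nonblocking assignments are the usual choice for registers that all update on the same clock edge: every register sees the values from before the edge.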

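Two FSM architectures
Moore machine
Outputs depend only on the current state.
Mealy machine
Outputs depend on both the current state and the inputs.
Smaller: it usually needs fewer states than an equivalent Moore machine.
Faster to respond: its outputs can change as soon as the inputs change, without waiting for a state transition.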
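To make the Moore/Mealy difference concrete, here is a sketch of a hypothetical detector that outputs 1 after seeing two consecutive 1 inputs, written both ways in C. The Moore version needs three states because its output can only come from the state; the Mealy version gets by with two because it can also look at the current input. The state names and the `run_fsms` helper are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Moore machine: 3 states, output is a function of the state only. */
typedef enum { MOORE_NONE, MOORE_ONE, MOORE_TWO } MooreState;

static int moore_output(MooreState s) { return s == MOORE_TWO; }

static MooreState moore_next(MooreState s, int in)
{
    if (!in)
        return MOORE_NONE;                       /* a 0 resets the count */
    return (s == MOORE_NONE) ? MOORE_ONE : MOORE_TWO;
}

/* Mealy machine: 2 states, output depends on the state AND the input. */
typedef enum { MEALY_LAST0, MEALY_LAST1 } MealyState;

static int mealy_output(MealyState s, int in) { return s == MEALY_LAST1 && in; }

static MealyState mealy_next(int in) { return in ? MEALY_LAST1 : MEALY_LAST0; }

/* out_mealy[i]: Mealy output while in[i] is applied (before the clock edge).
   out_moore[i]: Moore output after the clock edge that consumes in[i]. */
static void run_fsms(const int *in, size_t n, int *out_moore, int *out_mealy)
{
    MooreState ms = MOORE_NONE;
    MealyState ls = MEALY_LAST0;
    for (size_t i = 0; i < n; i++) {
        assert(in[i] == 0 || in[i] == 1);
        out_mealy[i] = mealy_output(ls, in[i]);  /* reacts to the input directly */
        ls = mealy_next(in[i]);
        ms = moore_next(ms, in[i]);
        out_moore[i] = moore_output(ms);         /* visible once the state updates */
    }
}
```

For the input 1, 1, 0, 1, 1, 1 both machines report 0, 1, 0, 0, 1, 1. The Mealy output for step i is available while input i is still applied, whereas the Moore output only appears after the clock edge moves it into its third state, at the cost of one extra state.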

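MSB (Most Significant Bit): the leftmost bit, which carries the highest weight (bit 63 of a 64-bit doubleword).

LSB (Least Significant Bit): the rightmost bit, which carries the lowest weight (bit 0).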
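A minimal sketch of reading the MSB and LSB of a 64-bit value in C, assuming bits are numbered from 0 (LSB) to 63 (MSB); the helper names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Return bit i of x, where bit 0 is the LSB and bit 63 is the MSB. */
static unsigned bit64(uint64_t x, unsigned i)
{
    assert(i < 64);                    /* shifting by 64 or more is undefined in C */
    return (unsigned)((x >> i) & 1u);
}

static unsigned msb64(uint64_t x) { return bit64(x, 63); }
static unsigned lsb64(uint64_t x) { return bit64(x, 0); }
```

For example, 5 (binary ...0101) has LSB 1 and MSB 0, while 0x8000000000000000 has MSB 1 and LSB 0.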

  • RISC-V divides the 32 bits of an instruction into "fields", because:
    regular field sizes ⇒ simpler hardware
    immediates: constants encoded directly in the instruction, so they do not need to be loaded from memory

Six instruction formats

1. R-format: instructions using 3 register operands
   - arithmetic/logical ops: add, xor, mul
2. I-format: instructions with immediates, and loads
   - addi, lw, jalr, slli
3. S-format: store instructions: sw, sb
4. SB-format: branch instructions: beq, bge
5. U-format: instructions with upper immediates (e.g., lui); the upper immediate is 20 bits wide
6. UJ-format: jump instructions: jal
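To show how fixed field positions keep decoding simple, here is a sketch that extracts the fields of an R-format word. The layout (funct7, rs2, rs1, funct3, rd, opcode from bit 31 down to bit 0) is the standard RISC-V R-format; the `RFields` struct and helper names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* R-format fields, from bit 31 down to bit 0:
   funct7[31:25] rs2[24:20] rs1[19:15] funct3[14:12] rd[11:7] opcode[6:0] */
typedef struct {
    unsigned funct7, rs2, rs1, funct3, rd, opcode;
} RFields;

/* Extract bits [hi:lo] of a 32-bit instruction word. */
static unsigned bits(uint32_t word, unsigned hi, unsigned lo)
{
    assert(hi < 32 && lo <= hi);
    uint64_t mask = (UINT64_C(1) << (hi - lo + 1)) - 1;   /* 64-bit to allow width 32 */
    return (unsigned)((word >> lo) & mask);
}

static RFields decode_r(uint32_t inst)
{
    RFields f;
    f.funct7 = bits(inst, 31, 25);
    f.rs2    = bits(inst, 24, 20);
    f.rs1    = bits(inst, 19, 15);
    f.funct3 = bits(inst, 14, 12);
    f.rd     = bits(inst, 11, 7);
    f.opcode = bits(inst, 6, 0);
    return f;
}
```

For example, add x9, x20, x21 (funct7 = 0, funct3 = 0, opcode 0x33) encodes to 0x015A04B3, and decode_r recovers rd = 9, rs1 = 20, rs2 = 21 from it. Because rs1 and rs2 sit at the same bit positions in every format, the hardware can start reading registers before it knows the format.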

Using multiple levels of control can reduce the size of the main control unit. Using several smaller control units may also potentially reduce the latency of the control unit.
  • In RISC-V, a word is 32 bits, and a group of 64 bits is called a doubleword.
The RISC-V architecture limits the number of registers to 32, following the design principle "smaller is faster".
The desire to keep all instructions the same size conflicts with the desire to have as many registers as possible. Any increase in the number of registers uses up at least one more bit in every register field of the instruction format. Given these constraints and the design principle that smaller is faster, most instruction sets today have 16 or 32 general-purpose registers.

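The sign and magnitude representation was soon abandoned because:
1. It is not obvious where to put the sign bit (on the left or on the right).
2. Adders may need an extra step to set the sign, because the proper sign cannot be known in advance.
3. It has both a positive and a negative zero.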

  • The data transfer instruction that copies data from memory to a register is traditionally called load.
  • The addresses of sequential doublewords differ by 8.
  • Two's complement representation of negative numbers: to negate a number, invert every bit and then add 1.
  • Two's complement representation has the advantage that all negative numbers have a 1 in the most significant bit. Thus, hardware needs to test only this bit to see whether a number is positive or negative (with 0 considered positive). This bit is often called the sign bit.
  • Biased notation represents the most negative value by 00…000 and the most positive value by 11…11, with 0 typically having the value 10…00. In other words, the stored bit pattern equals the value plus a fixed bias.
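The rules above can be checked with a short C sketch. Negation uses unsigned 64-bit arithmetic so that wraparound is well defined; the biased part assumes 8-bit values with a bias of 128, chosen here for illustration. All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Two's complement negation: invert every bit, then add 1.
   Unsigned arithmetic wraps modulo 2^64, avoiding signed-overflow UB. */
static uint64_t negate_twos(uint64_t x) { return ~x + 1u; }

/* The sign bit (MSB) is 1 exactly for negative values; 0 counts as positive. */
static int is_negative(uint64_t x) { return (int)(x >> 63); }

/* Biased notation, 8 bits, assumed bias 128: pattern = value + bias, so
   0x00 is -128 (most negative), 0xFF is +127 (most positive), 0x80 is 0. */
#define BIAS8 128

static uint8_t to_biased8(int value)
{
    assert(value >= -BIAS8 && value <= 255 - BIAS8);
    return (uint8_t)(value + BIAS8);
}

static int from_biased8(uint8_t pattern) { return (int)pattern - BIAS8; }
```

For example, negating 1 yields a pattern with the sign bit set (all ones, which is -1), and negating twice returns the original value.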

Representing Instructions in the Computer

  • We have a conflict between:
    1. Desire to keep all instructions the same length.
    2. Desire to have a single instruction format.

Design Principle: Good design demands good compromises.
The compromise chosen by the RISC-V designers is to keep all instructions the same length, thereby requiring distinct instruction formats for different kinds of instructions.
S-type instructions

  • The 12-bit immediate in the S-type format is split into two fields, which supply the lower 5 bits and upper 7 bits. The RISC-V architects chose this design because it keeps the rs1 and rs2 fields in the same place in all instruction formats.
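A sketch of how the split immediate is packed and unpacked, assuming the standard S-format layout (imm[11:5] in bits 31:25, imm[4:0] in bits 11:7, store opcode 0x23, funct3 = 3 for sd); `encode_s` and `decode_s_imm` are hypothetical helpers:

```c
#include <assert.h>
#include <stdint.h>

/* Encode an S-format instruction. The 12-bit signed immediate is split so
   that rs1 and rs2 stay in the same bit positions as in the R-format. */
static uint32_t encode_s(int32_t imm, uint32_t rs2, uint32_t rs1,
                         uint32_t funct3, uint32_t opcode)
{
    assert(imm >= -2048 && imm <= 2047);                  /* must fit in 12 bits */
    assert(rs2 < 32 && rs1 < 32 && funct3 < 8 && opcode < 128);
    uint32_t u = (uint32_t)imm & 0xFFFu;                  /* 12-bit two's complement */
    return ((u >> 5) << 25)                               /* imm[11:5] -> bits 31:25 */
         | (rs2 << 20) | (rs1 << 15) | (funct3 << 12)
         | ((u & 0x1Fu) << 7)                             /* imm[4:0]  -> bits 11:7  */
         | opcode;
}

/* Reassemble imm[11:0] from its two fields and sign-extend it. */
static int32_t decode_s_imm(uint32_t inst)
{
    uint32_t u = ((inst >> 25) << 5) | ((inst >> 7) & 0x1Fu);
    return (u & 0x800u) ? (int32_t)u - 4096 : (int32_t)u;
}
```

With these helpers, sd x1, 8(x2) encodes to 0x00113423, and a negative offset such as -8 survives the round trip because decode_s_imm sign-extends bit 11.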

Each format is assigned a distinct set of opcode values in the first field (opcode) so that the hardware knows how to treat the rest of the instruction.

The stored-program concept: programs are stored in memory to be read or written, just like data.

  • A case/switch statement can be implemented as a chain of conditional tests, but it is often encoded more efficiently as a table of addresses of alternative instruction sequences, called a branch address table or branch table. The program then only needs to index into the table and jump to the appropriate sequence; RISC-V provides the indirect jump instruction jalr, which jumps to an address held in a register, for this purpose.
    jump-and-link instruction: An instruction that branches to an address and simultaneously saves the address of the following instruction in a register (usually x1 in RISC-V).
# For example:
	jal x1, ProcedureAddress    # jump to ProcedureAddress and save the return address in x1
# The return address (the address of the instruction after jal) is stored in register x1.
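The branch table idea from the switch discussion above can be sketched in C with an array of function pointers: indexing the array replaces a chain of comparisons, and the indirect call plays roughly the role of jalr. The case bodies and names are hypothetical; a real compiler may instead lay out a jump table of code addresses inside a single function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical case bodies, one per alternative of the switch. */
static long case0(long x) { return x + 1; }
static long case1(long x) { return x - 1; }
static long case2(long x) { return x * 2; }
static long case3(long x) { return x / 2; }

/* Branch table: entry k holds the address of the code for case k. */
static long (*const branch_table[])(long) = { case0, case1, case2, case3 };

/* switch (k): bounds-check k, index the table, jump indirectly. */
static long dispatch(size_t k, long x)
{
    size_t n = sizeof branch_table / sizeof branch_table[0];
    assert(k < n);          /* a compiler would branch to the default case instead */
    return branch_table[k](x);
}
```

The cost of a dispatch stays the same no matter how many cases there are, which is the advantage over a long if-then-else chain.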

caller: The program that instigates a procedure and provides the necessary parameter values.
callee: A procedure that executes a series of stored instructions based on parameters provided by the caller and then returns control to the caller.
Why use a stack:

  • By RISC-V convention, x10–x17 are eight parameter registers used to pass parameters or return values. Suppose a compiler needs more registers for a procedure than the eight argument registers. Since we must cover our tracks after our mission is complete, any registers needed by the caller must be restored to the values they contained before the procedure was invoked. This situation is an example in which we need to spill registers to memory.
  • RISC-V reserves register x2 (sp) as the stack pointer.
  • Placing data onto the stack is called a push, and removing data from the stack is called a pop.
  • By historical precedent, stacks “grow” from higher addresses to lower addresses. This convention means that you push values onto the stack by subtracting from the stack pointer. Adding to the stack pointer shrinks the stack, thereby popping values off the stack.
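A minimal model of a stack that grows toward lower addresses, assuming one int64_t slot per doubleword (so consecutive slots are 8 bytes apart). `DStack`, `push`, and `pop` are illustrative names, not part of any RISC-V toolchain.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_DWORDS 16

typedef struct {
    int64_t mem[STACK_DWORDS];
    int sp;             /* index of the top slot; STACK_DWORDS means empty */
} DStack;

static void init_stack(DStack *s) { s->sp = STACK_DWORDS; }

/* push: subtract from sp (addi sp, sp, -8), then store (sd x, 0(sp)) */
static void push(DStack *s, int64_t v)
{
    assert(s->sp > 0);                  /* overflow: no lower slot left */
    s->sp -= 1;
    s->mem[s->sp] = v;
}

/* pop: load (ld x, 0(sp)), then add to sp (addi sp, sp, 8) */
static int64_t pop(DStack *s)
{
    assert(s->sp < STACK_DWORDS);       /* underflow: stack is empty */
    int64_t v = s->mem[s->sp];
    s->sp += 1;
    return v;
}
```

After pushing 1 and then 2, sp has moved down by two slots; the pops return 2 and then 1 and restore sp to its starting value.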

The key role of the stack is to prevent register conflicts between the caller and the callee during a procedure call:
The caller pushes any argument registers (x10–x17) or temporary registers (x5–x7 and x28–x31) that it still needs after the call. The callee pushes the return address register x1 and any saved registers (x8–x9 and x18–x27) that it uses. The stack pointer sp is adjusted to account for the number of registers placed on the stack.

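To avoid saving and restoring a register whose value is never used, which might happen with a temporary register, RISC-V software separates 19 of the registers into two groups:
x5–x7 and x28–x31: temporary registers that are not preserved by the callee on a procedure call
x8–x9 and x18–x27: saved registers that must be preserved on a procedure call (if the callee uses them, it saves and restores them)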

Analysis of the factorial assembly program

C:
The argument n is passed in the parameter register x10.

long long int fact(long long int n)
{
	if (n < 1) return 1;
	else return n * fact(n - 1);
}

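Assembly:

fact:
	addi sp, sp, -16       # adjust the stack for 2 items (push)
	sd   x1, 8(sp)         # save the return address held in x1
	sd   x10, 0(sp)        # save the argument n held in x10

	addi x5, x10, -1       # x5 = n - 1
	bge  x5, x0, L1        # if n - 1 >= 0 (i.e., n >= 1), go to L1

	addi x10, x0, 1        # base case: return 1
	addi sp, sp, 16        # pop 2 items off the stack (no need to restore them)
	jalr x0, 0(x1)         # return to the caller

L1:
	addi x10, x10, -1      # n >= 1: the argument becomes n - 1
	jal  x1, fact          # call fact(n - 1); x1 = address of the next instruction

	addi x6, x10, 0        # move the result of fact(n - 1) to x6
	ld   x10, 0(sp)        # restore the argument n
	ld   x1, 8(sp)         # restore the return address
	addi sp, sp, 16        # pop 2 items off the stack

	mul  x10, x10, x6      # return n * fact(n - 1)
	jalr x0, 0(x1)         # return to the caller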

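Walking through the code with n = 3 as an example

Here f1 is the return address of the original call (back into the caller of fact), and f2, f3, f4 are the return addresses saved by the three recursive calls. f2, f3, and f4 are actually the same address (the instruction right after jal x1, fact); they are numbered separately only to show which call each belongs to. Each call pushes a two-doubleword frame {return address, n}: the return address at 8(sp) and n at 0(sp). Stack contents are listed from the oldest frame to the newest.

1. First call: n = 3, x1 = f1. Push frame {f1, 3}.
2. x5 = 3 - 1 = 2; since 2 >= 0, branch to L1.
3. x10 = 2; jal sets x1 = f2 and jumps to fact.
4. Second call: n = 2. Push {f2, 2}. Stack: {f1, 3}, {f2, 2}.
5. x5 = 1; since 1 >= 0, branch to L1.
6. x10 = 1; jal sets x1 = f3 and jumps to fact.
7. Third call: n = 1. Push {f3, 1}. Stack: {f1, 3}, {f2, 2}, {f3, 1}.
8. x5 = 0; since 0 >= 0, branch to L1.
9. x10 = 0; jal sets x1 = f4 and jumps to fact.
10. Fourth call: n = 0. Push {f4, 0}. Stack: {f1, 3}, {f2, 2}, {f3, 1}, {f4, 0}.
11. x5 = -1 < 0, so there is no branch (base case).
12. x10 = 1; pop {f4, 0} without restoring it; jump to the address in x1, f4. Stack: {f1, 3}, {f2, 2}, {f3, 1}.
13. At f4 (inside the third call): x6 = x10 = 1; restore x10 = 1 and x1 = f3 from {f3, 1}, then pop it.
14. x10 = 1 * 1 = 1; jump to f3. Stack: {f1, 3}, {f2, 2}.
15. At f3 (inside the second call): x6 = 1; restore x10 = 2 and x1 = f2, then pop.
16. x10 = 2 * 1 = 2; jump to f2. Stack: {f1, 3}.
17. At f2 (inside the first call): x6 = 2; restore x10 = 3 and x1 = f1, then pop. The stack is now empty.
18. x10 = 3 * 2 = 6; jump to f1, returning fact(3) = 6 in x10.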
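The trace above can be cross-checked with a C model that keeps an explicit stack growing toward lower addresses, one {return address, n} frame per call, as the assembly does. This is a hypothetical illustration: the descent loop and unwind loop stand in for the recursive calls and returns, and a frame number k stands in for the return addresses f1 to f4.

```c
#include <assert.h>
#include <stdint.h>

/* Each call pushes a 16-byte frame: saved x1 at 8(sp), saved n at 0(sp).
   mem[] is indexed in doublewords and the stack grows toward index 0. */
#define STACK_DWORDS 64

typedef struct {
    int64_t mem[STACK_DWORDS];
    int sp;                               /* index of the current top doubleword */
} Stack;

static void push_frame(Stack *s, int64_t ra, int64_t n)
{
    assert(s->sp >= 2);                   /* no room for another frame */
    s->sp -= 2;                           /* addi sp, sp, -16 */
    s->mem[s->sp + 1] = ra;               /* sd x1, 8(sp)  */
    s->mem[s->sp] = n;                    /* sd x10, 0(sp) */
}

/* Returns fact(n) and stores the number of frames pushed (maximum depth).
   Valid for n <= 20; larger results overflow int64_t. */
static int64_t fact_model(int64_t n, int *max_frames)
{
    Stack s = { .sp = STACK_DWORDS };
    int frames = 0;

    /* Descent: every call saves {return address, n}; recurse while n - 1 >= 0. */
    for (;;) {
        push_frame(&s, frames + 1, n);    /* frame number k stands in for fk */
        frames++;
        if (n - 1 < 0)                    /* bge x5, x0, L1 not taken */
            break;
        n = n - 1;                        /* addi x10, x10, -1; jal x1, fact */
    }
    *max_frames = frames;

    /* Base case: x10 = 1, and the frame is popped without restoring it. */
    int64_t result = 1;
    s.sp += 2;

    /* Unwind: restore the saved n from each frame, pop it, and multiply. */
    while (s.sp < STACK_DWORDS) {
        int64_t saved_n = s.mem[s.sp];    /* ld x10, 0(sp) */
        s.sp += 2;                        /* addi sp, sp, 16 */
        result = saved_n * result;        /* mul x10, x10, x6 */
    }
    return result;
}
```

fact_model(3, &frames) returns 6 with frames == 4, matching the four frames on the stack at the deepest point of the trace.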

A variable in C generally corresponds to a location in memory.

  • C has two storage classes: automatic and static. Automatic variables are local to a procedure and are discarded when the procedure exits. Static variables exist across exits from and entries to procedures. C variables declared outside all procedures are considered static, as are any variables declared using the keyword static. The rest are automatic. To simplify access to static data, some RISC-V compilers reserve a register x3 for use as the global pointer, or gp.
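A small C sketch of the two storage classes; the function and variable names are hypothetical:

```c
#include <assert.h>

static int total_calls = 0;       /* declared outside all procedures: static storage */

/* Each call gets a fresh automatic variable, but the static local
   keeps its value between calls. */
static int count_calls(void)
{
    int automatic = 0;            /* automatic: re-created on every call */
    static int persistent = 0;    /* static: initialized once, survives across calls */
    automatic++;
    persistent++;
    total_calls++;
    assert(automatic == 1);       /* never carries over from a previous call */
    return persistent;
}
```

The first call returns 1 and the second returns 2, because persistent survives between calls while automatic starts from 0 each time. On RISC-V, automatic variables typically live in registers or the stack frame, whereas static data is reached through addresses such as those relative to gp.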

The stack starts at the high end of the user address space.
The stack is also used to store variables that are local to the procedure but do not fit in registers, such as local arrays or structures.

procedure frame or activation record:
The segment of the stack containing a procedure’s saved registers and local variables
frame pointer: A value denoting the location of the saved registers and local variables for a given procedure.

frame pointer VS. stack pointer:
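The stack pointer marks the current top of the stack and can move while a procedure runs (for example, as values are pushed and popped). The frame pointer points to a fixed location in the current procedure's frame, so saved registers and local variables keep stable offsets from it. Very often the frame is located on the stack, but it need not be.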

Graphics and Computing GPUs

  • The major driving force for improving graphics processing was the computer game industry. Many programmers of scientific and multimedia applications today are pondering whether to use GPUs or CPUs.
  • heterogeneous systems: A system combining different processor types.
    Trends:
    GPUs and their associated drivers implement the OpenGL and DirectX models of graphics processing.
  • OpenGL is an open standard for 3D graphics programming available for most computers.
  • DirectX is a series of Microsoft multimedia programming interfaces.
    These APIs (application programming interfaces) have well-defined behavior.
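  • Visual computing: a mix of graphics processing and computing that lets you visually interact with computed objects via graphics, images, and video.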
  • GPUs are evolving into scalable parallel processors.
  • In the GeForce 8-series generation of GPUs, the geometry, vertex, and pixel processing all run on the same type of processor. This unification allows for dramatic scalability, and it calls for a new programming model for the GPU:
    Compute Unified Device Architecture(CUDA) is a scalable parallel programming model and software platform for the GPU and other parallel processors that allows the programmer to bypass the graphics API and graphics interfaces of the GPU and simply program in C or C++.
  • The CUDA programming model has an SPMD (single-program multiple-data) software style, in which a programmer writes a program for one thread that is instanced and executed by many threads in parallel on the multiple processors of the GPU.
    With CUDA and GPU computing, it is now possible to use the GPU as both a graphics processor and a computing processor at the same time.
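CUDA itself extends C/C++, but the SPMD idea can be sketched in plain C: write the work of a single thread as one function, then run it once per thread index. In the hypothetical sketch below, `saxpy_thread` is the per-thread program (y = a·x + y) and `launch` is a sequential stand-in for a GPU launch; it does not use the real CUDA API, and on a GPU the iterations would run in parallel.

```c
#include <assert.h>
#include <stddef.h>

/* Per-thread program: thread i computes y[i] = a * x[i] + y[i]. */
static void saxpy_thread(size_t i, size_t n, float a, const float *x, float *y)
{
    if (i < n)                     /* threads past the end of the data do nothing */
        y[i] = a * x[i] + y[i];
}

/* Sequential stand-in for a launch of num_threads threads, each running
   the same program on a different element. */
static void launch(size_t num_threads, size_t n, float a, const float *x, float *y)
{
    assert(x != NULL && y != NULL);
    for (size_t tid = 0; tid < num_threads; tid++)
        saxpy_thread(tid, n, a, x, y);
}
```

Launching more threads than data elements is common when the thread count is rounded up to a block size, which is why the per-thread program guards with i < n.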