Computer Organization and Design: The Hardware/Software Interface, Reading Notes 3

Latest recommended article published on 2023-11-12 06:19:59

乘螺舟而至


Category: RISCV

Article link: https://blog.csdn.net/qq_26371477/article/details/109705452


The processor

In large loops, avoid data-dependent branching as much as possible.
There are two kinds of branching in conditional execution:

Unconditional jump: This is performed by the JMP instruction. The JMP instruction provides a label name where the flow of control is transferred immediately. The syntax of the JMP instruction is −
Conditional jump: If some specified condition is satisfied in conditional jump, the control flow is transferred to a target instruction. There are numerous conditional jump instructions depending upon the condition and data.

Modern bimodal branch predictors reach a prediction accuracy of up to 93.5%.

simplicity favors regularity

Build a Single Cycle Datapath

R-type instructions:

Basic element 1: program counter (PC)

A register that holds the address of the current instruction.

Basic element 2: instruction adder
We need an adder to increment the PC to the address of the next instruction.

Basic element 3: instruction memory
We also need to fetch the instruction from memory. The instruction memory must:

Store the instructions of a program.
Supply instructions given an address.

Basic datapath:
[Figure: basic datapath]

SB-type instructions:

To implement a branch instruction, we must compute the branch target address relative to the address of the branch instruction, by adding the sign-extended offset field of the instruction to the PC.

Load and store instructions compute the memory address by adding the base register to the sign-extended 12-bit offset field of the instruction.
Basic element 1: register file
Basic element 2: ALU

Basic element 3: immediate generation unit
We need a unit to sign-extend the 12-bit offset field in the instruction to a 64-bit signed value.
Basic element 4: data memory unit
We need a data memory unit to read from and write to.

The immediate generation unit takes the 32-bit instruction as input, selects the 12-bit field used by load, store, and branch-if-equal, and sign-extends it into the 64-bit result that appears on its output.
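The field selection and sign extension described above can be sketched in Python. This is only an illustrative model, not the hardware: the function names `sign_extend` and `imm_gen` are made up for this note, the opcode values are the standard RISC-V ones for load, store, and branch, and Python's unbounded integers stand in for the 64-bit output.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign_bit = 1 << (bits - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)


def imm_gen(instr):
    """Return the sign-extended 12-bit immediate field of a 32-bit instruction.

    For branches this is the field before the shift-left-by-1 step.
    """
    opcode = instr & 0x7F
    if opcode == 0x03:    # load (I-type): imm[11:0] = instr[31:20]
        field = (instr >> 20) & 0xFFF
    elif opcode == 0x23:  # store (S-type): imm[11:5] = instr[31:25], imm[4:0] = instr[11:7]
        field = (((instr >> 25) & 0x7F) << 5) | ((instr >> 7) & 0x1F)
    elif opcode == 0x63:  # branch (SB-type): imm[12|10:5] = instr[31:25], imm[4:1|11] = instr[11:7]
        field = ((((instr >> 31) & 0x1) << 11)     # imm[12]
                 | (((instr >> 7) & 0x1) << 10)    # imm[11]
                 | (((instr >> 25) & 0x3F) << 4)   # imm[10:5]
                 | ((instr >> 8) & 0xF))           # imm[4:1]
    else:
        raise ValueError("opcode not handled in this sketch")
    return sign_extend(field, 12)
```

The three cases differ only in where the 12 bits sit inside the instruction; the sign extension is shared.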

The basic datapath, which can execute the basic instructions (load/store register, ALU operations, and branches):
[Figure: basic datapath for loads, stores, ALU operations, and branches]
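Three questions:

1. Why should the branch offset generated by ImmGen be shifted left by 1?

A: Branches use the RISC-V instruction format called SB-type. Its 12-bit immediate field does not store the lowest bit of the byte offset, which is always 0, so the hardware shifts the field left by 1 to recover the offset. This format can therefore represent branch offsets from −4096 to 4094, in multiples of 2.
[Figure: SB-type instruction format]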
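The range quoted in the answer follows directly from a 12-bit signed field shifted left by one bit. A small Python check, with names (`FIELD_BITS`, `branch_target`) chosen only for this note:

```python
FIELD_BITS = 12  # width of the SB-type immediate field

# Smallest and largest values of a 12-bit two's-complement field.
min_field = -(1 << (FIELD_BITS - 1))
max_field = (1 << (FIELD_BITS - 1)) - 1


def branch_target(pc, imm_field):
    """Branch target = PC + (sign-extended immediate field shifted left by 1)."""
    return pc + (imm_field << 1)


# Byte offsets reachable by a conditional branch.
min_offset = min_field << 1
max_offset = max_field << 1
```

Here `min_offset` and `max_offset` evaluate to −4096 and 4094, and every reachable offset is even because bit 0 is always 0 after the shift.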

2. Since the immediate field holds only 12 bits, how does the processor handle an immediate that needs more than 12 bits?

A: It uses another instruction format, U-type, which carries a 20-bit immediate. Either the compiler or the assembler must break large constants into pieces and then reassemble them into a register. As you might expect, the size restriction of the immediate field may be a problem for memory addresses in loads and stores as well as for constants in immediate instructions.
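As a rough illustration of "breaking a constant into pieces", the sketch below splits a 32-bit constant into a 20-bit upper part (what a U-type instruction such as lui would carry) and a signed 12-bit lower part (what a following add-immediate would carry). The function names are hypothetical, and the sketch assumes a 32-bit constant; it ignores the extra sign extension that happens on a 64-bit machine.

```python
def split_constant(value):
    """Split a 32-bit constant into (upper20, lower12) with lower12 signed."""
    value &= 0xFFFFFFFF
    lower = value & 0xFFF
    if lower >= 0x800:
        # The 12-bit immediate is sign-extended when it is added,
        # so treat it as negative and compensate in the upper part.
        lower -= 0x1000
    upper = ((value - lower) >> 12) & 0xFFFFF
    return upper, lower


def reassemble(upper, lower):
    """Model of loading the upper 20 bits, then adding the signed lower 12 bits."""
    return ((upper << 12) + lower) & 0xFFFFFFFF
```

The compensation step matters: for 0x12345FFF the lower part is −1, so the upper part must be 0x12346 rather than 0x12345.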

3. Why use PC-relative addressing?

PC-relative addressing: An addressing regime in which the address is the sum of the program counter (PC) and a constant in the instruction.
A: Even the 20-bit field of the U-type format is far too small to hold every address of today's programs.
$\Rightarrow$ Most branches go to a nearby instruction, so the alternative is to specify a register that is always added to the branch offset.
$\Rightarrow$ The program counter (PC) contains the address of the current instruction.
$\Rightarrow$ The PC is therefore an ideal base address to which instructions add their branch offset.
$\Rightarrow$ Occasionally a branch goes farther than the offset field can reach.
$\Rightarrow$ In that case the assembler inserts an unconditional branch to the branch target and inverts the condition, so that the conditional branch decides whether to skip the unconditional branch.

Four RISCV addressing modes:

[Figure: the four RISC-V addressing modes: immediate, register, base, and PC-relative addressing]

Build the Single Cycle Control Unit

ALU control

output: the ALU control signal
[Figure: ALU control signal encodings]
input:
funct7 field; funct3 field; 2-bit control field (ALUOp)
ALUOp:

00: add, for load and store
01: subtract and test if zero, for beq
10: arithmetic operation determined by the funct7 and funct3 fields

Once the truth table has been constructed, it can be optimized and then turned into gates.
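The truth table can be sketched as a small Python function. The 4-bit ALU control encodings and the funct7/funct3 patterns below follow the textbook's table for add, sub, AND, and OR as I recall it; treat them as assumptions, since the figure with the encodings is not reproduced in these notes. The function name `alu_control` is made up for this sketch.

```python
# Assumed 4-bit ALU control encodings (textbook convention).
ALU_AND, ALU_OR, ALU_ADD, ALU_SUB = 0b0000, 0b0001, 0b0010, 0b0110

# (funct7, funct3) -> ALU operation for R-type instructions.
R_TYPE_TABLE = {
    (0b0000000, 0b000): ALU_ADD,  # add
    (0b0100000, 0b000): ALU_SUB,  # sub
    (0b0000000, 0b111): ALU_AND,  # and
    (0b0000000, 0b110): ALU_OR,   # or
}


def alu_control(alu_op, funct7=0, funct3=0):
    """Map the 2-bit ALUOp plus the funct fields to an ALU control value."""
    if alu_op == 0b00:   # load/store: add base register and offset
        return ALU_ADD
    if alu_op == 0b01:   # beq: subtract and test for zero
        return ALU_SUB
    if alu_op == 0b10:   # R-type: decided by funct7 and funct3
        return R_TYPE_TABLE[(funct7, funct3)]
    raise ValueError("unsupported ALUOp in this sketch")
```

For ALUOp 00 and 01 the funct fields are don't-cares, which is what lets the real truth table be minimized before it is turned into gates.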

Datapath with control unit:
[Figure: datapath with control unit]

Implementing Pipelining

The five-stage RISC-V instruction pipeline:
[Figure: five-stage RISC-V pipeline]
RISC-V pipeline design advantages:

All RISC-V instructions have the same length.
RISC-V has only a few instruction formats.
Memory operands appear only in loads and stores in RISC-V, so we can use the execute stage to calculate the memory address and then access memory in the following stage.

Three types of pipeline hazards:

Structural hazard: When a planned instruction cannot execute in the proper clock cycle because the hardware does not support the combination of instructions that are set to execute.
The RISC-V instruction set was designed to be pipelined, making it fairly easy for designers to avoid structural hazards when designing a pipeline. With a single memory instead of two, however, a structural hazard occurs when the first instruction accesses data in memory in the same cycle in which the fourth instruction is being fetched from that same memory.
Data hazard: When a planned instruction cannot execute in the proper clock cycle because data that are needed to execute the instruction are not yet available.

Adding extra hardware to retrieve the missing item early from the internal resources is called forwarding or bypassing. This is a method of resolving a data hazard by retrieving the missing data element from internal buffers rather than waiting for it to arrive from programmer visible registers or memory.
Typically, forwarding is needed when two arithmetic instructions are adjacent and the second one uses the result of the first.

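Load-use data hazard: A specific form of data hazard in which the data being loaded by a load instruction have not yet become available when they are needed by another instruction.

Even with forwarding, we would have to stall one stage for a load-use data hazard.

A pipeline stall (bubble):

When the preceding instruction is a load, its value only becomes available in the fourth stage (MEM). If the next instruction, for example sub, needs that value in its third stage (EX), forwarding alone is not enough and a stall is also required.
[Figure: pipeline stall for a load-use data hazard]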
Each RISC-V instruction writes at most one result and does this in the last stage of the pipeline. Forwarding is harder if there are multiple results to forward per instruction or if there is a need to write a result early on in instruction execution.

Control hazard: When the proper instruction cannot execute in the proper pipeline clock cycle because the instruction that was fetched is not the one that is needed; that is, the flow of instruction addresses is not what the pipeline expected (a branch misprediction).
This is where branch prediction comes in.

Longer pipelines exacerbate the problem, in this case by raising the cost of misprediction.

latency: Time to complete an individual instruction.

Instruction sets can either make life harder or simpler for pipeline designers, who must already cope with structural, control, and data hazards.

Pipelined Datapath and Control

The pipelined datapath:
[Figure: pipelined datapath]

Instruction fetch
1 The instruction is read from the instruction memory using the address in the PC.
2 The instruction is placed in the IF/ID pipeline register.
3 The PC is also saved in the IF/ID register in case it is needed later, for example by beq.
Instruction decode and register file read
1 Decode the instruction.
2 Store the contents of the source registers in the ID/EX pipeline register.
3 Store the sign-extended immediate in the ID/EX register.
4 Store the PC in the ID/EX register.
Execute and address calculation
1 Use the ALU to compute the result and store it in the EX/MEM pipeline register.
Memory access
1 Use the address to read the data from memory and load it into the MEM/WB pipeline register.
Write-back
1 Read the data from the MEM/WB register and write it into the register file.
2 Note that for the ld instruction, the destination register number is passed down the pipeline until it reaches the MEM/WB register, where it selects the register to be written back.

Control signals are then used in the appropriate pipeline stage as the instruction moves down the pipeline, just as the destination register number for loads moves down the pipeline.
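The way an instruction moves through the five stages, one stage per clock cycle, can be sketched as follows. This is a simplified model that assumes no stalls or flushes; `STAGES`, `stage_of`, and `total_cycles` are names invented for this note.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def stage_of(instr_index, cycle):
    """Stage occupied by instruction `instr_index` (0-based) in `cycle` (1-based).

    Returns None if the instruction is not in the pipeline during that cycle.
    """
    position = cycle - 1 - instr_index
    if 0 <= position < len(STAGES):
        return STAGES[position]
    return None


def total_cycles(num_instructions):
    """Cycles needed to drain the pipeline when nothing stalls."""
    return num_instructions + len(STAGES) - 1
```

For three instructions, instruction 0 is in WB during cycle 5 while instruction 2 is in EX, and the last instruction leaves the pipeline in cycle 7.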

Hardware performed forwarding

hazard type 1: a register is read and written in the same clock cycle

This can be resolved by the design of the register file hardware. Assume that the write happens in the first half of the clock cycle and the read in the second half, so the read delivers what is written. As is the case for many implementations of register files, we have no data hazard in this case.
In other words, the register file itself "forwards" a value that is read and written during the same clock cycle: the read gets the value of the write in that cycle, and the value comes from the register file instead of a pipeline register.
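A minimal Python model of such a register file, assuming the write-then-read ordering within one clock cycle described above. The class and method names are hypothetical, and x0 is kept at zero as in RISC-V.

```python
class RegisterFile:
    """32 registers; the write happens before the reads within one clock cycle."""

    def __init__(self):
        self.regs = [0] * 32

    def clock(self, reg_write, rd, write_data, rs1, rs2):
        # First half of the cycle: perform the write (x0 stays hardwired to 0).
        if reg_write and rd != 0:
            self.regs[rd] = write_data
        # Second half of the cycle: the reads see the value just written.
        return self.regs[rs1], self.regs[rs2]
```

Calling `clock(True, 2, 99, 2, 0)` on a fresh register file returns `(99, 0)`: the read of x2 already sees the value written in the same cycle.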

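hazard type 2: the second instruction needs a result that the first instruction has not yet written back

This can be resolved by forwarding.

Two pairs of hazard conditions:

1a. EX/MEM.RegisterRd = ID/EX.RegisterRs1
1b. EX/MEM.RegisterRd = ID/EX.RegisterRs2
2a. MEM/WB.RegisterRd = ID/EX.RegisterRs1
2b. MEM/WB.RegisterRd = ID/EX.RegisterRs2

The first half of this notation names a pipeline register, and the second half names a specific field of the instruction held in it.
Rd: The register destination operand
Rs1: The first register operand
Rs2: The second register operand
For example, consider the following hazard:

sub x2, x1, x3
and x12, x2, x5      //1st operand x2 set by sub

This sub/and pair is a type 1a hazard: EX/MEM.RegisterRd = ID/EX.RegisterRs1 = x2.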

If we can take the inputs to the ALU from any pipeline register rather than just ID/EX, then we can forward the correct data. By adding multiplexors to the input of the ALU, and with the proper controls, we can run the pipeline at full speed in the presence of these data hazards.

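Some instructions do not write registers, so when detecting hazards we also check that the RegWrite signal is active. Likewise, x0 is hardwired to zero, so a result destined for x0 must not be forwarded; this is the RegisterRd != 0 test.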

So the EX hazard is detected as follows:

if (EX/MEM.RegWrite
and (EX/MEM.RegisterRd != 0)
and (EX/MEM.RegisterRd = ID/EX.RegisterRs1)) ForwardA = 10

if (EX/MEM.RegWrite
and (EX/MEM.RegisterRd != 0)
and (EX/MEM.RegisterRd = ID/EX.RegisterRs2)) ForwardB = 10

The MEM hazard is detected as follows:

if (MEM/WB.RegWrite
and (MEM/WB.RegisterRd != 0)
and not(EX/MEM.RegWrite and (EX/MEM.RegisterRd != 0)
		and (EX/MEM.RegisterRd = ID/EX.RegisterRs1))
and (MEM/WB.RegisterRd = ID/EX.RegisterRs1)) ForwardA = 01

if (MEM/WB.RegWrite
and (MEM/WB.RegisterRd != 0)
and not(EX/MEM.RegWrite and (EX/MEM.RegisterRd != 0)
		and (EX/MEM.RegisterRd = ID/EX.RegisterRs2))
and (MEM/WB.RegisterRd = ID/EX.RegisterRs2)) ForwardB = 01
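The EX and MEM hazard conditions above can be combined into one small Python function. This is a sketch of the logic only: the function and parameter names are invented here, and the "not EX hazard" clause of the MEM condition is expressed by checking the EX/MEM case first, so the most recent result wins.

```python
def forwarding_unit(ex_mem_reg_write, ex_mem_rd,
                    mem_wb_reg_write, mem_wb_rd,
                    id_ex_rs1, id_ex_rs2):
    """Return (ForwardA, ForwardB) as 2-bit mux selects.

    0b00: operand comes from the register file (via ID/EX)
    0b10: operand is forwarded from EX/MEM (EX hazard)
    0b01: operand is forwarded from MEM/WB (MEM hazard)
    """
    def select(rs):
        # EX hazard has priority: EX/MEM holds the more recent result.
        if ex_mem_reg_write and ex_mem_rd != 0 and ex_mem_rd == rs:
            return 0b10
        # MEM hazard applies only when no EX hazard exists for this operand.
        if mem_wb_reg_write and mem_wb_rd != 0 and mem_wb_rd == rs:
            return 0b01
        return 0b00

    return select(id_ex_rs1), select(id_ex_rs2)
```

For the sub/and example, with sub in EX/MEM (Rd = x2) and and in ID/EX (Rs1 = x2, Rs2 = x5), the function returns ForwardA = 10 and ForwardB = 00.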

The datapath modified to resolve hazards via forwarding:
[Figure: datapath with forwarding]

Hardware performed stalling

if (ID/EX.MemRead and         //test if the instruction in the EX stage is a load
	//check if the destination register of the load in the EX stage
	//matches either source register of the instruction in the ID stage
	((ID/EX.RegisterRd = IF/ID.RegisterRs1) or
	(ID/EX.RegisterRd = IF/ID.RegisterRs2)))
	stall the pipeline

If the condition holds, the instruction stalls one clock cycle. After this one-cycle stall, the forwarding logic can handle the dependence and execution proceeds.

If the instruction in the ID stage is stalled, then the instruction in the IF stage must also be stalled; otherwise, we would lose the fetched instruction. Preventing these two instructions from making progress is accomplished simply by preventing the PC register and the IF/ID pipeline register from changing. Meanwhile, the back half of the pipeline, starting with the EX stage, executes nops.
nop: An instruction that does no operation to change state.
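The load-use check and its effect on the PC and the IF/ID register can be sketched like this. The function name and the keys of the returned dictionary are illustrative assumptions, not signal names taken from these notes.

```python
def hazard_detection_unit(id_ex_mem_read, id_ex_rd, if_id_rs1, if_id_rs2):
    """Detect a load-use hazard and derive the stall controls.

    Stall when the instruction in EX is a load whose destination register
    matches either source register of the instruction in ID.
    """
    stall = bool(id_ex_mem_read) and id_ex_rd in (if_id_rs1, if_id_rs2)
    return {
        "stall": stall,
        "pc_write": not stall,      # hold the PC so the same instruction is refetched
        "if_id_write": not stall,   # hold IF/ID so the instruction in ID is decoded again
        "insert_bubble": stall,     # zero the control signals entering EX (a nop)
    }
```

A load into x2 followed by an instruction that reads x2 gives `stall` = True for exactly one cycle; in the next cycle the load has moved on to MEM, the condition no longer holds, and forwarding supplies the value.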

Pipelined control overview, showing the two multiplexors for forwarding, the hazard detection unit, and the forwarding unit:
[Figure: pipelined control overview]

Control hazards

Control hazard (or branch hazard): Delay in determining the proper instruction to fetch.
Issue: An instruction must be fetched at every clock cycle to sustain the pipeline, yet in our design the decision about whether to branch doesn’t occur until the MEM pipeline stage.

flush: To discard instructions in a pipeline, usually due to an unexpected event.
$\Rightarrow$ If the conditional branch is taken, the instructions that are being fetched and decoded must be discarded. Execution continues at the branch target.
$\Rightarrow$ To discard instructions, we change their control values to 0 and let them percolate through the pipeline. With the branch decided in the MEM stage, this means we must be able to flush the instructions in the IF, ID, and EX stages.
$\Rightarrow$ We want to reduce the delay of taken branches.
$\Rightarrow$ Move the branch decision from the MEM stage to an earlier stage.
$\Rightarrow$ Two actions must occur earlier:

computing the branch target address
$\Rightarrow$ Just move the branch adder from the EX stage to the ID stage.
evaluating the branch decision
$\Rightarrow$ Two complicating factors:
- Deciding whether to take the branch and setting the PC to the branch target address during ID requires new forwarding logic.
- If an operand of the branch comparison is produced by an earlier instruction that has not finished, a stall is needed: one stall cycle if it comes from an immediately preceding ALU instruction, and two if it comes from an immediately preceding load.

To flush instructions in the IF stage, we add a control line, called IF.Flush, that zeros the instruction field of the IF/ID pipeline register.

Dynamic branch prediction

For the simple five-stage pipeline, a static prediction scheme, possibly coupled with compiler-based prediction, is probably adequate.
But for deeper pipelines, the branch penalty increases in terms of instructions lost, which costs too much performance.

Dynamic branch prediction: Look up the address of the instruction to see if the conditional branch was taken the last time this instruction was executed and, if so, begin fetching new instructions from the same place as the last time.
Branch prediction buffer: Also called branch history table. A small memory that is indexed by the lower portion of the address of the branch instruction and that contains one or more bits indicating whether the branch was recently taken or not.

Two-bit prediction schemes are often used: a prediction must be wrong twice before it is changed.
Branch target buffer: A structure that caches the destination PC or destination instruction for a branch. It is usually organized as a cache with tags, making it more costly than a simple prediction buffer.
Correlating predictor. State machine of the bimodal (two-bit) branch predictor:
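The two-bit scheme and the branch prediction buffer can be sketched together as a table of saturating counters indexed by the low bits of the branch address. This models only the simple two-bit predictor, not a correlating predictor; the class name, the table size, the initial counter value, and dropping the two lowest address bits are all assumptions made for this sketch.

```python
class TwoBitPredictor:
    """Branch prediction buffer of 2-bit saturating counters.

    Counter values 0 and 1 predict not taken; 2 and 3 predict taken.
    """

    def __init__(self, index_bits=4):
        self.mask = (1 << index_bits) - 1
        self.counters = [0] * (1 << index_bits)  # assumed start: strongly not taken

    def _index(self, pc):
        # Index with the lower portion of the branch address
        # (the two lowest bits are dropped, assuming 4-byte instructions).
        return (pc >> 2) & self.mask

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
```

Once a counter has saturated at 3, a single not-taken outcome moves it to 2 and the prediction stays "taken"; only a second misprediction flips it, which is the "wrong twice" property. Because the table has no tags, two branches whose addresses share the same low bits share one counter.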