Digital Design and Computer Architecture, Second Edition
by David M. Harris, Sarah L. Harris

7.4.1 Multicycle Datapath

Again, we begin our design with the memory and architectural state of the MIPS processor, shown in Figure 7.16. In the single-cycle design, we used separate instruction and data memories because we needed to read the instruction memory and read or write the data memory all in one cycle. Now, we choose to use a combined memory for both instructions and data. This is more realistic, and it is feasible because we can read the instruction in one cycle, then read or write the data in a separate cycle. The PC and register file remain unchanged. We gradually build the datapath by adding components to handle each step of each instruction. The new connections are emphasized in black (or blue, for new control signals), whereas the hardware that has already been studied is shown in gray.
Figure 7.16

The PC contains the address of the instruction to execute. The first step is to read this instruction from instruction memory. Figure 7.17 shows that the PC is simply connected to the address input of the instruction memory. The instruction is read and stored in a new nonarchitectural Instruction Register so that it is available for future cycles. The Instruction Register receives an enable signal, called IRWrite, that is asserted when it should be updated with a new instruction.



As we did with the single-cycle processor, we will work out the datapath connections for the lw instruction. Then we will enhance the datapath to handle the other instructions.
For a lw instruction, the next step is to read the source register containing the base address. This register is specified in the rs field of the instruction, Instr[25:21]. These bits of the instruction are connected to one of the address inputs, A1, of the register file, as shown in Figure 7.18. The register file reads the register onto RD1. This value is stored in another nonarchitectural register, A.
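As a rough illustration of the field positions involved, the following Python sketch slices a 32-bit instruction word into the fields that lw uses. The function name `decode_fields`, the dictionary keys, and the hand-assembled example word are illustrative assumptions, not names taken from the text.

```python
def decode_fields(instr: int) -> dict:
    """Slice a 32-bit MIPS instruction word into its bit fields."""
    return {
        "opcode": (instr >> 26) & 0x3F,  # Instr[31:26]
        "rs": (instr >> 21) & 0x1F,      # Instr[25:21], base register for lw
        "rt": (instr >> 16) & 0x1F,      # Instr[20:16]
        "imm": instr & 0xFFFF,           # Instr[15:0]
    }

# Hand-assembled lw $t0, 8($s1): opcode 35, rs = 17 ($s1), rt = 8 ($t0), imm = 8
LW_EXAMPLE = 0x8E280008
fields = decode_fields(LW_EXAMPLE)
```

In the hardware these fields are simply wires tapped off the Instruction Register; the shifts and masks here only mimic that wiring.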


The lw instruction also requires an offset. The offset is stored in the immediate field of the instruction, Instr[15:0], and must be sign-extended to 32 bits, as shown in Figure 7.19. The 32-bit sign-extended value is called SignImm. To be consistent, we might store SignImm in another nonarchitectural register. However, SignImm is a combinational function of Instr and will not change while the current instruction is being processed, so there is no need to dedicate a register to hold the constant value.
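Sign extension can be sketched in a few lines of Python. This is a minimal model, assuming 32-bit values are represented as unsigned Python integers; the name `sign_extend16` is a hypothetical helper.

```python
def sign_extend16(imm16: int) -> int:
    """Sign-extend a 16-bit immediate to a 32-bit value (unsigned representation)."""
    imm16 &= 0xFFFF
    # Copy bit 15 into bits 31:16 when the immediate is negative.
    return (imm16 | 0xFFFF0000) if (imm16 & 0x8000) else imm16

positive = sign_extend16(0x0008)  # offset +8
negative = sign_extend16(0xFFFC)  # offset -4
```

Because the function depends only on its input, it behaves like the combinational block in the text: the same Instr always yields the same SignImm, so no register is needed to hold it.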

The address of the load is the sum of the base address and offset. We use an ALU to compute this sum, as shown in Figure 7.20. ALUControl should be set to 010 to perform an addition. ALUResult is stored in a nonarchitectural register called ALUOut.


The next step is to load the data from the calculated address in the memory. We add a multiplexer in front of the memory to choose the memory address, Adr, from either the PC or ALUOut, as shown in Figure 7.21. The multiplexer select signal is called IorD, to indicate either an instruction or data address. The data read from the memory is stored in another nonarchitectural register, called Data. Notice that the address multiplexer permits us to reuse the memory during the lw instruction. On the first step, the address is taken from the PC to fetch the instruction. On a later step, the address is taken from ALUOut to load the data. Hence, IorD must have different values on different steps. In Section 7.4.2, we develop the FSM controller that generates these sequences of control signals.
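The address calculation and the address multiplexer can be modeled as follows. This is a sketch under the assumption that 32-bit values are unsigned Python integers; the function names are illustrative, and the IorD encoding (0 for PC, 1 for ALUOut) follows the values used in the controller states of Section 7.4.2.

```python
MASK32 = 0xFFFFFFFF

def effective_address(a: int, sign_imm: int) -> int:
    """ALU add (ALUControl = 010): base address plus offset, wrapped to 32 bits."""
    return (a + sign_imm) & MASK32

def select_adr(iord: int, pc: int, alu_out: int) -> int:
    """Address multiplexer: IorD = 0 selects the PC, IorD = 1 selects ALUOut."""
    return alu_out if iord else pc

pc = 0x00400000
alu_out = effective_address(0x10010000, 0xFFFFFFFC)  # base + (-4)
fetch_adr = select_adr(0, pc, alu_out)  # first step: instruction address
load_adr = select_adr(1, pc, alu_out)   # later step: data address
```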


Finally, the data is written back to the register file, as shown in Figure 7.22. The destination register is specified by the rt field of the instruction, Instr[20:16].


While all this is happening, the processor must update the program counter by adding 4 to the old PC. In the single-cycle processor, a separate adder was needed. In the multicycle processor, we can use the existing ALU on one of the steps when it is not busy. To do so, we must insert source multiplexers to choose the PC and the constant 4 as ALU inputs, as shown in Figure 7.23. A two-input multiplexer controlled by ALUSrcA chooses either the PC or register A as SrcA. A four-input multiplexer controlled by ALUSrcB chooses either 4 or SignImm as SrcB. We use the other two multiplexer inputs later when we extend the datapath to handle other instructions. (The numbering of inputs to the multiplexer is arbitrary.) To update the PC, the ALU adds SrcA (PC) to SrcB (4), and the result is written into the program counter register. The PCWrite control signal enables the PC register to be written only on certain cycles.
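The two ALU source multiplexers can be sketched as below. The input numbering is an assumption taken from the control values used later in the section (ALUSrcB = 00 for register B, 01 for the constant 4, 10 for SignImm, 11 for SignImm shifted left by 2); the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def select_src_a(alusrca: int, pc: int, a: int) -> int:
    """Two-input mux: ALUSrcA = 0 selects the PC, 1 selects register A."""
    return a if alusrca else pc

def select_src_b(alusrcb: int, b: int, sign_imm: int) -> int:
    """Four-input mux: 00 = B, 01 = constant 4, 10 = SignImm, 11 = SignImm << 2."""
    inputs = (b, 4, sign_imm, (sign_imm << 2) & MASK32)
    return inputs[alusrcb]

def next_pc(pc: int) -> int:
    """PC update: the ALU adds SrcA (PC) and SrcB (4)."""
    src_a = select_src_a(0, pc, a=0)
    src_b = select_src_b(0b01, b=0, sign_imm=0)
    return (src_a + src_b) & MASK32

incremented = next_pc(0x00400000)
```

Reusing the ALU this way trades the dedicated adder of the single-cycle design for two multiplexers and an extra control step.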



This completes the datapath for the lw instruction. Next, let us extend the datapath to also handle the sw instruction. Like the lw instruction, the sw instruction reads a base address from port 1 of the register file and sign-extends the immediate. The ALU adds the base address to the immediate to find the memory address. All of these functions are already supported by existing hardware in the datapath.

The only new feature of sw is that we must read a second register from the register file and write it into the memory, as shown in Figure 7.24. The register is specified in the rt field of the instruction, Instr[20:16], which is connected to the second port of the register file. When the register is read, it is stored in a nonarchitectural register, B. On the next step, it is sent to the write data port (WD) of the memory to be written. The memory receives an additional MemWrite control signal to indicate that the write should occur.



For R-type instructions, the instruction is again fetched, and the two source registers are read from the register file. ALUSrcB[1:0], the control input of the SrcB multiplexer, is used to choose register B as the second source register for the ALU, as shown in Figure 7.25. The ALU performs the appropriate operation and stores the result in ALUOut. On the next step, ALUOut is written back to the register specified by the rd field of the instruction, Instr[15:11]. This requires two new multiplexers. The MemtoReg multiplexer selects whether WD3 comes from ALUOut (for R-type instructions) or from Data (for lw). The RegDst multiplexer selects whether the destination register is specified in the rt or rd field of the instruction.
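The write-back path with its two multiplexers can be modeled as follows. This is an illustrative sketch: the register file is a plain Python list, the helper names are hypothetical, and the select encodings (RegDst = 1 for rd, MemtoReg = 1 for Data) are the ones used in the controller states of Section 7.4.2.

```python
def select_write_register(regdst: int, rt: int, rd: int) -> int:
    """RegDst mux: 0 selects the rt field, 1 selects the rd field."""
    return rd if regdst else rt

def select_write_data(memtoreg: int, alu_out: int, data: int) -> int:
    """MemtoReg mux: 0 selects ALUOut, 1 selects the Data register."""
    return data if memtoreg else alu_out

def write_back(regs, regwrite, regdst, memtoreg, rt, rd, alu_out, data):
    """Write WD3 into the selected register when RegWrite is asserted."""
    if regwrite:
        dest = select_write_register(regdst, rt, rd)
        regs[dest] = select_write_data(memtoreg, alu_out, data)

regs = [0] * 32
# R-type: destination is rd, value comes from ALUOut.
write_back(regs, 1, regdst=1, memtoreg=0, rt=9, rd=16, alu_out=42, data=99)
# lw: destination is rt, value comes from Data.
write_back(regs, 1, regdst=0, memtoreg=1, rt=9, rd=16, alu_out=42, data=99)
```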


For the beq instruction, the instruction is again fetched, and the two source registers are read from the register file. To determine whether the registers are equal, the ALU subtracts the registers and, upon a zero result, sets the Zero flag. Meanwhile, the datapath must compute the next value of the PC if the branch is taken: PC′ = PC + 4 + SignImm × 4. In the single-cycle processor, yet another adder was needed to compute the branch address. In the multicycle processor, the ALU can be reused again to save hardware. On one step, the ALU computes PC + 4 and writes it back to the program counter, as was done for other instructions. On another step, the ALU uses this updated PC value to compute PC + SignImm × 4. SignImm is left-shifted by 2 to multiply it by 4, as shown in Figure 7.26. The SrcB multiplexer chooses this value and adds it to the PC. This sum represents the destination of the branch and is stored in ALUOut. A new multiplexer, controlled by PCSrc, chooses what signal should be sent to PC′. The program counter should be written either when PCWrite is asserted or when a branch is taken. A new control signal, Branch, indicates that the beq instruction is being executed. The branch is taken if Zero is also asserted. Hence, the datapath computes a new PC write enable, called PCEn, which is TRUE either when PCWrite is asserted or when both Branch and Zero are asserted.
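The PC write-enable logic and the branch target arithmetic can be checked with a short Python sketch. The function names are illustrative, and 32-bit values are assumed to be unsigned Python integers.

```python
MASK32 = 0xFFFFFFFF

def pc_en(pcwrite: int, branch: int, zero: int) -> bool:
    """PCEn = PCWrite OR (Branch AND Zero)."""
    return bool(pcwrite or (branch and zero))

def branch_target(pc_plus4: int, sign_imm: int) -> int:
    """Branch destination: the already incremented PC plus SignImm shifted left by 2."""
    return (pc_plus4 + (sign_imm << 2)) & MASK32

def zero_flag(a: int, b: int) -> int:
    """The ALU subtracts the two registers; Zero is set when the result is 0."""
    return int(((a - b) & MASK32) == 0)

# beq four instructions backward from PC + 4 (SignImm = -4)
target = branch_target(0x00400004, 0xFFFFFFFC)
taken = pc_en(pcwrite=0, branch=1, zero=zero_flag(5, 5))
not_taken = pc_en(pcwrite=0, branch=1, zero=zero_flag(5, 6))
```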


This completes the design of the multicycle MIPS processor datapath. The design process is much like that of the single-cycle processor in that hardware is systematically connected between the state elements to handle each instruction. The main difference is that the instruction is executed in several steps. Nonarchitectural registers are inserted to hold the results of each step. In this way, the ALU can be reused several times, saving the cost of extra adders. Similarly, the instructions and data can be stored in one shared memory. In the next section, we develop an FSM controller to deliver the appropriate sequence of control signals to the datapath on each step of each instruction.

7.4.2 Multicycle Control


As in the single-cycle processor, the control unit computes the control signals based on the opcode and funct fields of the instruction, Instr[31:26] and Instr[5:0]. Figure 7.27 shows the entire multicycle MIPS processor with the control unit attached to the datapath. The datapath is shown in black, and the control unit is shown in blue.


As in the single-cycle processor, the control unit is partitioned into a main controller and an ALU decoder, as shown in Figure 7.28. The ALU decoder is unchanged and follows the truth table of Table 7.2. Now, however, the main controller is an FSM that applies the proper control signals on the proper cycles or steps. The sequence of control signals depends on the instruction being executed. In the remainder of this section, we will develop the FSM state transition diagram for the main controller.
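For reference, the ALU decoder can be sketched as a small lookup. Table 7.2 is not reproduced in this section, so the encodings below are an assumption based on the usual MIPS funct codes and the 3-bit ALUControl values used elsewhere in the chapter (010 for add, 110 for subtract); the function name is hypothetical.

```python
# Assumed funct-to-ALUControl mapping for R-type instructions (ALUOp = 10).
FUNCT_TO_ALUCONTROL = {
    0b100000: 0b010,  # add
    0b100010: 0b110,  # sub
    0b100100: 0b000,  # and
    0b100101: 0b001,  # or
    0b101010: 0b111,  # slt
}

def alu_decoder(aluop: int, funct: int = 0) -> int:
    """Return ALUControl for the given ALUOp and, for R-type, the funct field."""
    if aluop == 0b00:
        return 0b010  # add (fetch, address calculation)
    if aluop == 0b01:
        return 0b110  # subtract (beq comparison)
    return FUNCT_TO_ALUCONTROL[funct]  # ALUOp = 10: look at funct

fetch_control = alu_decoder(0b00)
sub_control = alu_decoder(0b10, 0b100010)
```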

The main controller produces multiplexer select and register enable signals for the datapath. The select signals are MemtoReg, RegDst, IorD, PCSrc, ALUSrcB, and ALUSrcA. The enable signals are IRWrite, MemWrite, PCWrite, Branch, and RegWrite.


To keep the following state transition diagrams readable, only the relevant control signals are listed. Select signals are listed only when their value matters; otherwise, they are don’t cares. Enable signals are listed only when they are asserted; otherwise, they are 0.


The first step for any instruction is to fetch the instruction from memory at the address held in the PC. The FSM enters this state on reset. To read memory, IorD = 0, so the address is taken from the PC. IRWrite is asserted to write the instruction into the instruction register, IR. Meanwhile, the PC should be incremented by 4 to point to the next instruction. Because the ALU is not being used for anything else, the processor can use it to compute PC + 4 at the same time that it fetches the instruction. ALUSrcA = 0, so SrcA comes from the PC. ALUSrcB = 01, so SrcB is the constant 4. ALUOp = 00, so the ALU decoder produces ALUControl = 010 to make the ALU add. To update the PC with this new value, PCSrc = 0, and PCWrite is asserted. These control signals are shown in Figure 7.29. The data flow on this step is shown in Figure 7.30, with the instruction fetch shown using the dashed blue line and the PC increment shown using the dashed gray line.
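The fetch state can be written down as a set of control values together with a tiny behavioral model. This is a sketch only: the memory is assumed to be a Python dict keyed by byte address, and the names `S0_FETCH` and `fetch_step` are illustrative.

```python
# Control signals driven in the fetch state, as listed in the text.
S0_FETCH = {
    "IorD": 0,        # memory address comes from the PC
    "ALUSrcA": 0,     # SrcA = PC
    "ALUSrcB": 0b01,  # SrcB = 4
    "ALUOp": 0b00,    # ALU adds
    "PCSrc": 0,       # PC' = ALUResult
    "IRWrite": 1,     # latch the instruction
    "PCWrite": 1,     # latch PC + 4
}

def fetch_step(memory: dict, pc: int):
    """Model one fetch cycle: return (new IR value, new PC value)."""
    instr = memory[pc]              # memory read with Adr = PC
    new_pc = (pc + 4) & 0xFFFFFFFF  # ALU computes PC + 4 in the same cycle
    return instr, new_pc

memory = {0x00400000: 0x8E280008}
ir, pc = fetch_step(memory, 0x00400000)
```

Both results are captured at the end of the same cycle, which is why the following steps see the already incremented PC.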


The next step is to read the register file and decode the instruction. The register file always reads the two sources specified by the rs and rt fields of the instruction. Meanwhile, the immediate is sign-extended. Decoding involves examining the opcode of the instruction to determine what to do next. No control signals are necessary to decode the instruction, but the FSM must wait 1 cycle for the reading and decoding to complete, as shown in Figure 7.31. The new state is highlighted in blue. The data flow is shown in Figure 7.32.

lw and sw


Now the FSM proceeds to one of several possible states, depending on the opcode. If the instruction is a memory load or store (lw or sw), the multicycle processor computes the address by adding the base address to the sign-extended immediate. This requires ALUSrcA = 1 to select register A and ALUSrcB = 10 to select SignImm. ALUOp = 00, so the ALU adds. The effective address is stored in the ALUOut register for use on the next step. This FSM step is shown in Figure 7.33, and the data flow is shown in Figure 7.34.

If the instruction is lw, the multicycle processor must next read data from memory and write it to the register file. These two steps are shown in Figure 7.35. To read from memory, IorD = 1 to select the memory address that was just computed and saved in ALUOut. This address in memory is read and saved in the Data register during step S3. On the next step, S4, Data is written to the register file. MemtoReg = 1 to select Data, and RegDst = 0 to pull the destination register from the rt field of the instruction. RegWrite is asserted to perform the write, completing the lw instruction. Finally, the FSM returns to the initial state, S0, to fetch the next instruction. For these and subsequent steps, try to visualize the data flow on your own.


From state S2, if the instruction is sw, the data read from the second port of the register file is simply written to memory. In state S5, IorD = 1 to select the address computed in S2 and saved in ALUOut. MemWrite is asserted to write the memory. Again, the FSM returns to S0 to fetch the next instruction. The added step is shown in Figure 7.36.

R-Type Instructions


If the opcode indicates an R-type instruction, the multicycle processor must calculate the result using the ALU and write that result to the register file. Figure 7.37 shows these two steps. In S6, the instruction is executed by selecting the A and B registers (ALUSrcA = 1, ALUSrcB = 00) and performing the ALU operation indicated by the funct field of the instruction. ALUOp = 10 for all R-type instructions. The ALUResult is stored in ALUOut. In S7, ALUOut is written to the register file. RegDst = 1 because the destination register is specified in the rd field of the instruction. MemtoReg = 0 because the write data, WD3, comes from ALUOut. RegWrite is asserted to write the register file.

beq Instruction


For a beq instruction, the processor must calculate the destination address and compare the two source registers to determine whether the branch should be taken. This requires two uses of the ALU and hence might seem to demand two new states. Notice, however, that the ALU was not used during S1 when the registers were being read. The processor might as well use the ALU at that time to compute the destination address by adding the incremented PC, PC + 4, to SignImm × 4, as shown in Figure 7.38. ALUSrcA = 0 to select the incremented PC, ALUSrcB = 11 to select SignImm × 4, and ALUOp = 00 to add. The destination address is stored in ALUOut. If the instruction is not beq, the computed address will not be used in subsequent cycles, but its computation was harmless. In S8, the processor compares the two registers by subtracting them and checking to determine whether the result is 0. If it is, the processor branches to the address that was just computed. ALUSrcA = 1 to select register A, ALUSrcB = 00 to select register B, ALUOp = 01 to subtract, PCSrc = 1 to take the destination address from ALUOut, and Branch = 1 to update the PC with this address if the ALU result is 0.[2]


[2] Now we see why the PCSrc multiplexer is necessary to choose PC′ from either ALUResult (in S0) or ALUOut (in S8).


Putting these steps together, Figure 7.39 shows the complete main controller state transition diagram for the multicycle processor. Converting it to hardware is a straightforward but tedious task using the techniques of Chapter 3. Better yet, the FSM can be coded in an HDL and synthesized using the techniques of Chapter 4.
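Before committing the controller to an HDL, the state transitions described in this section can be prototyped in Python. The sketch below is a behavioral model only, not the book's HDL; the opcode constants are the standard MIPS encodings, and the function names are hypothetical.

```python
# Standard MIPS opcodes for the instructions covered so far.
LW, SW, RTYPE, BEQ = 0x23, 0x2B, 0x00, 0x04

def next_state(state: int, opcode: int) -> int:
    """Next-state logic for states S0..S8 of the main controller."""
    if state == 0:                # S0 Fetch -> S1 Decode
        return 1
    if state == 1:                # S1 Decode: branch on opcode
        if opcode in (LW, SW):
            return 2              # S2 MemAdr
        if opcode == RTYPE:
            return 6              # S6 Execute
        if opcode == BEQ:
            return 8              # S8 Branch
        raise ValueError(f"unsupported opcode {opcode:#x}")
    if state == 2:                # S2 -> S3 (lw read) or S5 (sw write)
        return 3 if opcode == LW else 5
    if state == 3:                # S3 MemRead -> S4 MemWriteback
        return 4
    if state == 6:                # S6 Execute -> S7 ALUWriteback
        return 7
    if state in (4, 5, 7, 8):     # final step of each instruction
        return 0
    raise ValueError(f"unknown state {state}")

def trace(opcode: int) -> list:
    """List the states visited by one instruction, starting at S0."""
    states = [0]
    while True:
        nxt = next_state(states[-1], opcode)
        if nxt == 0:
            return states
        states.append(nxt)

lw_states = trace(LW)
```

The length of each trace is the instruction's cycle count: five for lw, four for sw and R-type, and three for beq.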

7.4.3 More Instructions


As we did in Section 7.3.3 for the single-cycle processor, let us now extend the multicycle processor to support the addi and j instructions. The next two examples illustrate the general design process to support new instructions.

addi Instruction

Example 7.5 addi INSTRUCTION
Modify the multicycle processor to support addi.

Solution: The datapath is already capable of adding registers to immediates, so all we need to do is add new states to the main controller FSM for addi, as shown in Figure 7.40. The states are similar to those for R-type instructions. In S9, register A is added to SignImm (ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00) and the result, ALUResult, is stored in ALUOut. In S10, ALUOut is written to the register specified by the rt field of the instruction (RegDst = 0, MemtoReg = 0, RegWrite asserted). The astute reader may notice that S2 and S9 are identical and could be merged into a single state.
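The observation that S2 and S9 are identical can be checked by writing the state outputs as data. In this sketch the dictionary names and the `addi` helper are illustrative, and the register file is modeled as a plain Python list.

```python
# Control outputs of the relevant states, taken from the text.
S2_MEMADR = {"ALUSrcA": 1, "ALUSrcB": 0b10, "ALUOp": 0b00}
S9_ADDI_EXECUTE = {"ALUSrcA": 1, "ALUSrcB": 0b10, "ALUOp": 0b00}
S10_ADDI_WRITEBACK = {"RegDst": 0, "MemtoReg": 0, "RegWrite": 1}

def addi(regs: list, rs: int, rt: int, sign_imm: int) -> None:
    """Behavioral model of the two addi states."""
    alu_out = (regs[rs] + sign_imm) & 0xFFFFFFFF  # S9: A + SignImm -> ALUOut
    regs[rt] = alu_out                            # S10: ALUOut -> register rt

regs = [0] * 32
regs[17] = 10
addi(regs, rs=17, rt=8, sign_imm=0xFFFFFFFD)  # $t0 = $s1 + (-3)
states_match = S2_MEMADR == S9_ADDI_EXECUTE
```

If the two states were merged, the shared state would need opcode-dependent transitions, going to S3 or S5 for lw and sw and to S10 for addi.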

j Instruction

Example 7.6 j INSTRUCTION
Modify the multicycle processor to support j.

Solution: First, we must modify the datapath to compute the next PC value in the case of a j instruction. Then we add a state to the main controller to handle the instruction.
Figure 7.41 shows the enhanced datapath. The jump destination address is formed by left-shifting the 26-bit addr field of the instruction by two bits, then prepending the four most significant bits of the already incremented PC. The PCSrc multiplexer is extended to take this address as a third input.

Figure 7.42 shows the enhanced main controller. The new state, S11, simply selects PC′ as the PCJump value (PCSrc = 10) and writes the PC. Note that the PCSrc select signal is extended to two bits in S0 and S8 as well.
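The jump address formation and the extended PCSrc multiplexer can be sketched as follows. The function names are hypothetical, and the example instruction word is hand-assembled for illustration (opcode 2 with a target of 0x00400020).

```python
def jump_target(pc_plus4: int, instr: int) -> int:
    """PCJump = {(PC + 4)[31:28], addr field (Instr[25:0]), 00}."""
    addr = instr & 0x03FFFFFF                     # 26-bit addr field
    return (pc_plus4 & 0xF0000000) | (addr << 2)  # prepend the upper four PC bits

def select_pc_next(pcsrc: int, alu_result: int, alu_out: int, pc_jump: int) -> int:
    """Extended PCSrc mux: 00 = ALUResult, 01 = ALUOut, 10 = PCJump."""
    return (alu_result, alu_out, pc_jump)[pcsrc]

J_EXAMPLE = 0x08100008  # j 0x00400020
pc_jump = jump_target(0x00400004, J_EXAMPLE)
pc_next = select_pc_next(0b10, alu_result=0x00400008, alu_out=0, pc_jump=pc_jump)
```

Because the upper four bits come from the incremented PC, a jump can only reach targets within the same 256 MB region as the instruction that follows it.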





