Digital Design and Computer Architecture, Section 7.4: The Multicycle CPU (1)

CSDN_Ady

Textbook study and translation: implementing a simple multicycle processor
Source text:
Digital Design and Computer Architecture, Second Edition
by David M. Harris, Sarah L. Harris
7.4 MULTICYCLE PROCESSOR

7.4.1 Multicycle Datapath

Again, we begin our design with the memory and architectural state of the MIPS processor, shown in Figure 7.16. In the single-cycle design, we used separate instruction and data memories because we needed to read the instruction memory and read or write the data memory all in one cycle. Now, we choose to use a combined memory for both instructions and data. This is more realistic, and it is feasible because we can read the instruction in one cycle, then read or write the data in a separate cycle. The PC and register file remain unchanged. We gradually build the datapath by adding components to handle each step of each instruction. The new connections are emphasized in black (or blue, for new control signals), whereas the hardware that has already been studied is shown in gray.

The PC contains the address of the instruction to execute. The first step is to read this instruction from instruction memory. Figure 7.17 shows that the PC is simply connected to the address input of the instruction memory. The instruction is read and stored in a new nonarchitectural Instruction Register so that it is available for future cycles. The Instruction Register receives an enable signal, called IRWrite, that is asserted when it should be updated with a new instruction.

Datapath for the lw instruction

As we did with the single-cycle processor, we will work out the datapath connections for the lw instruction. Then we will enhance the datapath to handle the other instructions.
For a lw instruction, the next step is to read the source register containing the base address. This register is specified in the rs field of the instruction, Instr[25:21]. These bits of the instruction are connected to one of the address inputs, A1, of the register file, as shown in Figure 7.18. The register file reads the register onto RD1. This value is stored in another nonarchitectural register, A.
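The field positions used throughout this section can be sketched in Python as plain shifts and masks. This is only an illustration: the function name `decode_fields` and the dictionary layout are assumptions made here, not part of the book's design.

```python
def decode_fields(instr: int) -> dict:
    """Split a 32-bit MIPS instruction word into the fields named in the text."""
    return {
        "opcode": (instr >> 26) & 0x3F,  # Instr[31:26]
        "rs": (instr >> 21) & 0x1F,      # Instr[25:21], base register for lw/sw
        "rt": (instr >> 16) & 0x1F,      # Instr[20:16]
        "rd": (instr >> 11) & 0x1F,      # Instr[15:11], R-type destination
        "imm": instr & 0xFFFF,           # Instr[15:0], 16-bit immediate
        "funct": instr & 0x3F,           # Instr[5:0], R-type function code
    }


# lw $t0, 8($s1) encodes as 0x8E280008: opcode 35, rs = 17, rt = 8, imm = 8
fields = decode_fields(0x8E280008)
```

In hardware these are just wire selections from the Instruction Register, so no extra logic or cycle is needed to obtain them.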

The lw instruction also requires an offset. The offset is stored in the immediate field of the instruction, Instr[15:0], and must be sign-extended to 32 bits, as shown in Figure 7.19. The 32-bit sign-extended value is called SignImm. To be consistent, we might store SignImm in another nonarchitectural register. However, SignImm is a combinational function of Instr and will not change while the current instruction is being processed, so there is no need to dedicate a register to hold the constant value.
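As a small illustration of why SignImm needs no register, sign extension can be written as a pure function of the immediate field. The name `sign_extend16` is chosen here for the sketch; the result is represented as an unsigned 32-bit value.

```python
def sign_extend16(imm16: int) -> int:
    """Sign-extend a 16-bit immediate to a 32-bit value (kept as an unsigned int)."""
    imm16 &= 0xFFFF
    if imm16 & 0x8000:              # bit 15 set: negative offset
        return imm16 | 0xFFFF0000   # replicate the sign bit into bits 31:16
    return imm16


positive = sign_extend16(0x0008)    # offset +8
negative = sign_extend16(0xFFFC)    # offset -4
```

Because the function has no state, its output follows the Instruction Register directly for as long as the instruction is held there.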

The address of the load is the sum of the base address and offset. We use an ALU to compute this sum, as shown in Figure 7.20. ALUControl should be set to 010 to perform an addition. ALUResult is stored in a nonarchitectural register called ALUOut.

The next step is to load the data from the calculated address in the memory. We add a multiplexer in front of the memory to choose the memory address, Adr, from either the PC or ALUOut, as shown in Figure 7.21. The multiplexer select signal is called IorD, to indicate either an instruction or data address. The data read from the memory is stored in another nonarchitectural register, called Data. Notice that the address multiplexer permits us to reuse the memory during the lw instruction. On the first step, the address is taken from the PC to fetch the instruction. On a later step, the address is taken from ALUOut to load the data. Hence, IorD must have different values on different steps. In Section 7.4.2, we develop the FSM controller that generates these sequences of control signals.

Finally, the data is written back to the register file, as shown in Figure 7.22. The destination register is specified by the rt field of the instruction, Instr[20:16].
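The lw steps described so far can be traced with a short behavioral sketch in Python. It models the memory as a dictionary from byte address to 32-bit word and uses local variables for the nonarchitectural registers (Instr, A, ALUOut, Data); the function name `run_lw` and this memory model are assumptions made for illustration, and the PC update is left out.

```python
def run_lw(memory: dict, regs: list, pc: int) -> list:
    """Trace one lw instruction through the multicycle datapath, one step per line."""
    instr = memory[pc]                       # fetch: Adr = PC (IorD = 0), latched in Instr
    rs = (instr >> 21) & 0x1F                # Instr[25:21]
    rt = (instr >> 16) & 0x1F                # Instr[20:16]
    imm = instr & 0xFFFF                     # Instr[15:0]
    sign_imm = imm - 0x10000 if imm & 0x8000 else imm
    a = regs[rs]                             # register file RD1 stored in A
    alu_out = (a + sign_imm) & 0xFFFFFFFF    # ALU adds base and offset into ALUOut
    data = memory[alu_out]                   # Adr = ALUOut (IorD = 1), stored in Data
    regs[rt] = data                          # write back to the rt register
    return regs


# lw $t0, 8($s1) at address 0, with $s1 = 0x100 and the word 1234 at 0x108
memory = {0x000: 0x8E280008, 0x108: 1234}
regs = [0] * 32
regs[17] = 0x100
run_lw(memory, regs, pc=0)
```

The same memory object is read twice, once with the PC and once with ALUOut, which is exactly what the IorD multiplexer makes possible.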

While all this is happening, the processor must update the program counter by adding 4 to the old PC. In the single-cycle processor, a separate adder was needed. In the multicycle processor, we can use the existing ALU on one of the steps when it is not busy. To do so, we must insert source multiplexers to choose the PC and the constant 4 as ALU inputs, as shown in Figure 7.23. A two-input multiplexer controlled by ALUSrcA chooses either the PC or register A as SrcA. A four-input multiplexer controlled by ALUSrcB chooses either 4 or SignImm as SrcB. We use the other two multiplexer inputs later when we extend the datapath to handle other instructions. (The numbering of inputs to the multiplexer is arbitrary.) To update the PC, the ALU adds SrcA (PC) to SrcB (4), and the result is written into the program counter register. The PCWrite control signal enables the PC register to be written only on certain cycles.
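A minimal Python sketch of the two source multiplexers feeding the shared ALU. The function names are illustrative only. The ALUSrcB input numbering (00 = register B, 01 = the constant 4, 10 = SignImm, 11 = SignImm shifted left by 2) follows the control values quoted in Section 7.4.2, and only the add operation (ALUControl = 010) is modeled.

```python
MASK32 = 0xFFFFFFFF


def select_src_a(alu_src_a: int, pc: int, a: int) -> int:
    """Two-input mux: 0 selects the PC, 1 selects register A."""
    return a if alu_src_a else pc


def select_src_b(alu_src_b: int, b: int, sign_imm: int) -> int:
    """Four-input mux for the second ALU operand."""
    inputs = {
        0b00: b,                          # register B
        0b01: 4,                          # constant 4 for PC + 4
        0b10: sign_imm,                   # sign-extended immediate
        0b11: (sign_imm << 2) & MASK32,   # SignImm * 4 for branch targets
    }
    return inputs[alu_src_b]


def alu(src_a: int, src_b: int, alu_control: int) -> int:
    """Only the add operation (ALUControl = 010) is modeled in this sketch."""
    if alu_control != 0b010:
        raise NotImplementedError("sketch models addition only")
    return (src_a + src_b) & MASK32


# Same ALU, two different steps of one lw instruction:
pc_plus4 = alu(select_src_a(0, pc=0x00400000, a=0), select_src_b(0b01, 0, 0), 0b010)
address = alu(select_src_a(1, pc=0, a=0x10010000), select_src_b(0b10, 0, 8), 0b010)
```

The two calls show the hardware saving: one adder computes PC + 4 on one step and the load address on another, with only the multiplexer selects changing.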

Datapath for the sw instruction

This completes the datapath for the lw instruction. Next, let us extend the datapath to also handle the sw instruction. Like the lw instruction, the sw instruction reads a base address from port 1 of the register file and sign-extends the immediate. The ALU adds the base address to the immediate to find the memory address. All of these functions are already supported by existing hardware in the datapath.

The only new feature of sw is that we must read a second register from the register file and write it into the memory, as shown in Figure 7.24. The register is specified in the rt field of the instruction, Instr[20:16], which is connected to the second port of the register file. When the register is read, it is stored in a nonarchitectural register, B. On the next step, it is sent to the write data port (WD) of the data memory to be written. The memory receives an additional MemWrite control signal to indicate that the write should occur.

Datapath for R-type instructions

For R-type instructions, the instruction is again fetched, and the two source registers are read from the register file. ALUSrcB[1:0], the control input of the SrcB multiplexer, is used to choose register B as the second source register for the ALU, as shown in Figure 7.25. The ALU performs the appropriate operation and stores the result in ALUOut. On the next step, ALUOut is written back to the register specified by the rd field of the instruction, Instr[15:11]. This requires two new multiplexers. The MemtoReg multiplexer selects whether WD3 comes from ALUOut (for R-type instructions) or from Data (for lw). The RegDst multiplexer selects whether the destination register is specified in the rt or rd field of the instruction.

Datapath for the beq instruction

For the beq instruction, the instruction is again fetched, and the two source registers are read from the register file. To determine whether the registers are equal, the ALU subtracts the registers and, upon a zero result, sets the Zero flag. Meanwhile, the datapath must compute the next value of the PC if the branch is taken: PC′ = PC + 4 + SignImm × 4. In the single-cycle processor, yet another adder was needed to compute the branch address. In the multicycle processor, the ALU can be reused again to save hardware. On one step, the ALU computes PC + 4 and writes it back to the program counter, as was done for other instructions. On another step, the ALU uses this updated PC value to compute PC + SignImm × 4. SignImm is left-shifted by 2 to multiply it by 4, as shown in Figure 7.26. The SrcB multiplexer chooses this value and adds it to the PC. This sum represents the destination of the branch and is stored in ALUOut. A new multiplexer, controlled by PCSrc, chooses what signal should be sent to PC′. The program counter should be written either when PCWrite is asserted or when a branch is taken. A new control signal, Branch, indicates that the beq instruction is being executed. The branch is taken if Zero is also asserted. Hence, the datapath computes a new PC write enable, called PCEn, which is TRUE either when PCWrite is asserted or when both Branch and Zero are asserted.
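The PC write-enable logic and the branch target arithmetic can be sketched in a few lines of Python. The function names are illustrative assumptions; the logic is the one stated above, PCEn = PCWrite OR (Branch AND Zero), and the target is the already incremented PC plus SignImm shifted left by 2.

```python
MASK32 = 0xFFFFFFFF


def pc_enable(pc_write: int, branch: int, zero: int) -> bool:
    """PCEn is true on an unconditional PC write or on a taken branch."""
    return bool(pc_write or (branch and zero))


def branch_target(pc_plus4: int, sign_imm: int) -> int:
    """Branch destination: incremented PC plus the word offset times 4."""
    return (pc_plus4 + ((sign_imm << 2) & MASK32)) & MASK32


# beq with offset -4 words, evaluated after the PC has already been incremented
target = branch_target(0x00400004, 0xFFFFFFFC)
taken = pc_enable(pc_write=0, branch=1, zero=1)
not_taken = pc_enable(pc_write=0, branch=1, zero=0)
```

When the branch is not taken, PCEn stays low and the PC simply keeps the PC + 4 value written during fetch.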

This completes the design of the multicycle MIPS processor datapath. The design process is much like that of the single-cycle processor in that hardware is systematically connected between the state elements to handle each instruction. The main difference is that the instruction is executed in several steps. Nonarchitectural registers are inserted to hold the results of each step. In this way, the ALU can be reused several times, saving the cost of extra adders. Similarly, the instructions and data can be stored in one shared memory. In the next section, we develop an FSM controller to deliver the appropriate sequence of control signals to the datapath on each step of each instruction.

7.4.2 Multicycle Control

As in the single-cycle processor, the control unit computes the control signals based on the opcode and funct fields of the instruction, Instr[31:26] and Instr[5:0]. Figure 7.27 shows the entire multicycle MIPS processor with the control unit attached to the datapath. The datapath is shown in black, and the control unit is shown in blue.

As in the single-cycle processor, the control unit is partitioned into a main controller and an ALU decoder, as shown in Figure 7.28. The ALU decoder is unchanged and follows the truth table of Table 7.2. Now, however, the main controller is an FSM that applies the proper control signals on the proper cycles or steps. The sequence of control signals depends on the instruction being executed. In the remainder of this section, we will develop the FSM state transition diagram for the main controller.
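Table 7.2 is not reproduced in these notes, so the following Python sketch of the ALU decoder rests on assumptions: the ALUControl codes (000 AND, 001 OR, 010 add, 110 subtract, 111 set-less-than) and the funct codes are the standard MIPS values as recalled from the book and should be checked against the table itself. The name `alu_decoder` is illustrative.

```python
# funct field (Instr[5:0]) -> ALUControl, assumed from the book's ALU decoder table
FUNCT_TO_CONTROL = {
    0b100000: 0b010,  # add
    0b100010: 0b110,  # sub
    0b100100: 0b000,  # and
    0b100101: 0b001,  # or
    0b101010: 0b111,  # slt
}


def alu_decoder(alu_op: int, funct: int) -> int:
    """Combine the main controller's ALUOp with funct to produce ALUControl."""
    if alu_op == 0b00:                 # lw, sw, addi, PC + 4: always add
        return 0b010
    if alu_op == 0b01:                 # beq: always subtract
        return 0b110
    return FUNCT_TO_CONTROL[funct]     # R-type (ALUOp = 10): decided by funct


fetch_control = alu_decoder(0b00, 0)          # funct is ignored when ALUOp = 00
rtype_sub = alu_decoder(0b10, 0b100010)
```

Keeping this decoder separate means the FSM only has to emit the 2-bit ALUOp in each state, not the full ALU operation.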

The main controller produces multiplexer select and register enable signals for the datapath. The select signals are MemtoReg, RegDst, IorD, PCSrc, ALUSrcB, and ALUSrcA. The enable signals are IRWrite, MemWrite, PCWrite, Branch, and RegWrite.

To keep the following state transition diagrams readable, only the relevant control signals are listed. Select signals are listed only when their value matters; otherwise, they are don’t cares. Enable signals are listed only when they are asserted; otherwise, they are 0.

Fetch and decode

The first step for any instruction is to fetch the instruction from memory at the address held in the PC. The FSM enters this state on reset. To read memory, IorD = 0, so the address is taken from the PC. IRWrite is asserted to write the instruction into the instruction register, IR. Meanwhile, the PC should be incremented by 4 to point to the next instruction. Because the ALU is not being used for anything else, the processor can use it to compute PC + 4 at the same time that it fetches the instruction. ALUSrcA = 0, so SrcA comes from the PC. ALUSrcB = 01, so SrcB is the constant 4. ALUOp = 00, so the ALU decoder produces ALUControl = 010 to make the ALU add. To update the PC with this new value, PCSrc = 0, and PCWrite is asserted. These control signals are shown in Figure 7.29. The data flow on this step is shown in Figure 7.30, with the instruction fetch shown using the dashed blue line and the PC increment shown using the dashed gray line.

The next step is to read the register file and decode the instruction. The register file always reads the two sources specified by the rs and rt fields of the instruction. Meanwhile, the immediate is sign-extended. Decoding involves examining the opcode of the instruction to determine what to do next. No control signals are necessary to decode the instruction, but the FSM must wait 1 cycle for the reading and decoding to complete, as shown in Figure 7.31. The new state is highlighted in blue. The data flow is shown in Figure 7.32.

lw and sw

Now the FSM proceeds to one of several possible states, depending on the opcode. If the instruction is a memory load or store (lw or sw), the multicycle processor computes the address by adding the base address to the sign-extended immediate. This requires ALUSrcA = 1 to select register A and ALUSrcB = 10 to select SignImm. ALUOp = 00, so the ALU adds. The effective address is stored in the ALUOut register for use on the next step. This FSM step is shown in Figure 7.33, and the data flow is shown in Figure 7.34.

If the instruction is lw, the multicycle processor must next read data from memory and write it to the register file. These two steps are shown in Figure 7.35. To read from memory, IorD = 1 to select the memory address that was just computed and saved in ALUOut. This address in memory is read and saved in the Data register during step S3. On the next step, S4, Data is written to the register file. MemtoReg = 1 to select Data, and RegDst = 0 to pull the destination register from the rt field of the instruction. RegWrite is asserted to perform the write, completing the lw instruction. Finally, the FSM returns to the initial state, S0, to fetch the next instruction. For these and subsequent steps, try to visualize the data flow on your own.

From state S2, if the instruction is sw, the data read from the second port of the register file is simply written to memory. In state S5, IorD = 1 to select the address computed in S2 and saved in ALUOut. MemWrite is asserted to write the memory. Again, the FSM returns to S0 to fetch the next instruction. The added step is shown in Figure 7.36.

R-type instructions

If the opcode indicates an R-type instruction, the multicycle processor must calculate the result using the ALU and write that result to the register file. Figure 7.37 shows these two steps. In S6, the instruction is executed by selecting the A and B registers (ALUSrcA = 1, ALUSrcB = 00) and performing the ALU operation indicated by the funct field of the instruction. ALUOp = 10 for all R-type instructions. The ALUResult is stored in ALUOut. In S7, ALUOut is written to the register file. RegDst = 1, because the destination register is specified in the rd field of the instruction. MemtoReg = 0 because the write data, WD3, comes from ALUOut. RegWrite is asserted to write the register file.

beq instruction

For a beq instruction, the processor must calculate the destination address and compare the two source registers to determine whether the branch should be taken. This requires two uses of the ALU and hence might seem to demand two new states. Notice, however, that the ALU was not used during S1 when the registers were being read. The processor might as well use the ALU at that time to compute the destination address by adding the incremented PC, PC + 4, to SignImm × 4, as shown in Figure 7.38. ALUSrcA = 0 to select the incremented PC, ALUSrcB = 11 to select SignImm × 4, and ALUOp = 00 to add. The destination address is stored in ALUOut. If the instruction is not beq, the computed address will not be used in subsequent cycles, but its computation was harmless. In S8, the processor compares the two registers by subtracting them and checking to determine whether the result is 0. If it is, the processor branches to the address that was just computed. ALUSrcA = 1 to select register A; ALUSrcB = 00 to select register B; ALUOp = 01 to subtract; PCSrc = 1 to take the destination address from ALUOut, and Branch = 1 to update the PC with this address if the ALU result is 0.[2]

[2] Now we see why the PCSrc multiplexer is necessary to choose PC′ from either ALUResult (in S0) or ALUOut (in S8).

Putting these steps together, Figure 7.39 shows the complete main controller state transition diagram for the multicycle processor. Converting it to hardware is a straightforward but tedious task using the techniques of Chapter 3 (sequential logic design). Better yet, the FSM can be coded in an HDL and synthesized using the techniques of Chapter 4.
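The state sequences described in this section can be collected into a small behavioral model. The sketch below is written in Python rather than an HDL and is only an illustration: the state names S0 to S8 and the control values come from the text above, while the opcode constants are the standard MIPS encodings and the function and table names are assumptions made here.

```python
LW, SW, RTYPE, BEQ = 0b100011, 0b101011, 0b000000, 0b000100

# Control signals per state; unlisted selects are don't cares, unlisted enables are 0.
OUTPUTS = {
    "S0": dict(IorD=0, ALUSrcA=0, ALUSrcB=0b01, ALUOp=0b00, PCSrc=0, IRWrite=1, PCWrite=1),
    "S1": dict(ALUSrcA=0, ALUSrcB=0b11, ALUOp=0b00),
    "S2": dict(ALUSrcA=1, ALUSrcB=0b10, ALUOp=0b00),
    "S3": dict(IorD=1),
    "S4": dict(RegDst=0, MemtoReg=1, RegWrite=1),
    "S5": dict(IorD=1, MemWrite=1),
    "S6": dict(ALUSrcA=1, ALUSrcB=0b00, ALUOp=0b10),
    "S7": dict(RegDst=1, MemtoReg=0, RegWrite=1),
    "S8": dict(ALUSrcA=1, ALUSrcB=0b00, ALUOp=0b01, PCSrc=1, Branch=1),
}

AFTER_DECODE = {LW: "S2", SW: "S2", RTYPE: "S6", BEQ: "S8"}
FIXED_NEXT = {"S0": "S1", "S3": "S4", "S4": "S0", "S5": "S0",
              "S6": "S7", "S7": "S0", "S8": "S0"}


def next_state(state: str, opcode: int) -> str:
    """Next-state logic: only S1 and S2 depend on the opcode."""
    if state == "S1":
        return AFTER_DECODE[opcode]
    if state == "S2":
        return "S3" if opcode == LW else "S5"
    return FIXED_NEXT[state]


def state_path(opcode: int) -> list:
    """States visited by one instruction, from fetch until the FSM returns to S0."""
    path = ["S0"]
    while True:
        following = next_state(path[-1], opcode)
        if following == "S0":
            return path
        path.append(following)


lw_path = state_path(LW)
beq_path = state_path(BEQ)
```

The length of each path is the instruction's cycle count in this design: five for lw, four for sw and R-type instructions, and three for beq.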

7.4.3 More Instructions

As we did in Section 7.3.3 for the single-cycle processor, let us now extend the multicycle processor to support the addi and j instructions. The next two examples illustrate the general design process to support new instructions.

addi instruction

Example 7.5 addi INSTRUCTION
Modify the multicycle processor to support addi.

Solution: The datapath is already capable of adding registers to immediates, so all we need to do is add new states to the main controller FSM for addi, as shown in Figure 7.40. The states are similar to those for R-type instructions. In S9, register A is added to SignImm (ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00) and the result, ALUResult, is stored in ALUOut. In S10, ALUOut is written to the register specified by the rt field of the instruction (RegDst = 0, MemtoReg = 0, RegWrite asserted). The astute reader may notice that S2 and S9 are identical and could be merged into a single state.
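A short Python sketch of the two addi states, under the assumption that fetch and decode have already happened. The control values are those given in the solution; the function name `execute_addi` and the register-list model are illustrative.

```python
S2 = dict(ALUSrcA=1, ALUSrcB=0b10, ALUOp=0b00)   # address computation for lw/sw
S9 = dict(ALUSrcA=1, ALUSrcB=0b10, ALUOp=0b00)   # addi execute
S10 = dict(RegDst=0, MemtoReg=0, RegWrite=1)     # addi writeback


def execute_addi(regs: list, instr: int) -> list:
    """Apply states S9 and S10 to a register list for one addi instruction."""
    rs = (instr >> 21) & 0x1F
    rt = (instr >> 16) & 0x1F
    imm = instr & 0xFFFF
    sign_imm = imm - 0x10000 if imm & 0x8000 else imm
    alu_out = (regs[rs] + sign_imm) & 0xFFFFFFFF   # S9: A + SignImm into ALUOut
    regs[rt] = alu_out                             # S10: RegDst = 0 selects rt
    return regs


# addi $t0, $s1, -4 encodes as 0x2228FFFC
regs = [0] * 32
regs[17] = 100
execute_addi(regs, 0x2228FFFC)
same_controls = (S2 == S9)
```

The comparison at the end reflects the remark in the solution: S9 drives the same control values as S2, so the two states could share hardware.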

j instruction

Example 7.6 j INSTRUCTION
Modify the multicycle processor to support j.

Solution: First, we must modify the datapath to compute the next PC value in the case of a j instruction. Then we add a state to the main controller to handle the instruction.
Figure 7.41 shows the enhanced datapath. The jump destination address is formed by left-shifting the 26-bit addr field of the instruction by two bits, then prepending the four most significant bits of the already incremented PC. The PCSrc multiplexer is extended to take this address as a third input.
Figure 7.42 shows the enhanced main controller. The new state, S11, simply selects PC′ as the PCJump value (PCSrc = 10) and writes the PC. Note that the PCSrc select signal is extended to two bits in S0 and S8 as well.
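The jump address formation can be checked with a few lines of Python. The function name `jump_target` is an illustrative assumption; the bit manipulation follows the description above (26-bit addr field shifted left by two, with the top four bits of the incremented PC prepended).

```python
def jump_target(pc_plus4: int, instr: int) -> int:
    """Form PCJump from the incremented PC and the 26-bit addr field."""
    addr = instr & 0x03FFFFFF                 # Instr[25:0]
    return (pc_plus4 & 0xF0000000) | (addr << 2)


# j with addr field 0x0100000, fetched from 0x00400004 (so PC + 4 = 0x00400008)
pc_jump = jump_target(0x00400008, 0x08100000)
```

Because the upper four bits come from the PC, a j instruction can only reach targets within the same 256 MB region as the instruction that follows it.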