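Preface
This article describes a Verilog implementation of a custom functional processor based on MIPS.
A total of 45 MIPS instructions need to be implemented; for the full list, see the earlier post 计算机组成原理(实验二):简单功能型处理器设计(simple_cpu) ("Principles of Computer Organization, Lab 2: Simple Functional Processor Design"). This article shares my thinking during the lab. I am also a beginner, so the code may contain undiscovered problems, and corrections are welcome. It is for reference only; please do not reuse it!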
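1. Interface definition
Signal | I/O | Description |
---|---|---|
rst | Input | Active-high reset, synchronous to the processor clock |
clk | Input | Processor clock |
PC[31:0] | Output | Program counter; its value after reset is 32'd0 |
Inst_Req_Valid | Output | Handshake signal of the instruction request channel; high means the request issued by the sender is valid |
Inst_Req_Ready | Input | Handshake signal of the instruction request channel; high means the receiver can accept the sender's request |
Instruction[31:0] | Input | Instruction read from memory into the processor |
Inst_Valid | Input | Handshake signal of the instruction response channel; high means the response issued by the sender is valid |
Inst_Ready | Output | Handshake signal of the instruction response channel; high means the receiver can accept the sender's response |
Address[31:0] | Output | Memory address used by data memory-access instructions |
MemWrite | Output | Write enable for memory access (active high) |
Write_data[31:0] | Output | Data for a memory write |
Write_strb[3:0] | Output | Byte strobes for a memory write (supports 32/16/8-bit writes); Write_strb[i] == 1 means bits Write_data[8 × (i + 1) - 1 : 8 × i] are written to the corresponding memory address |
MemRead | Output | Read enable for memory access (active high) |
Read_data[31:0] | Input | Data read from memory |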
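To make the Write_strb semantics concrete, here is a minimal C sketch of how one 32-bit memory word would change under a given strobe. The function name apply_write_strb and the software model itself are illustrative assumptions; they are not part of the lab framework.

```c
#include <stdint.h>

/* Hypothetical software model of a strobed write: byte lane i of `data`
 * replaces byte lane i of `old_word` only when strobe bit i is set. */
uint32_t apply_write_strb(uint32_t old_word, uint32_t data, unsigned strb)
{
    uint32_t mask = 0;
    for (int i = 0; i < 4; i++) {
        if (strb & (1u << i))
            mask |= (uint32_t)0xFFu << (8 * i);  /* select bits 8i+7 .. 8i */
    }
    return (old_word & ~mask) | (data & mask);
}
```

For example, a strobe of 4'b0001 only replaces the lowest byte, which is what a byte store to an address ending in 2'b00 needs.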
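2. Improving the custom MIPS functional processor
In Lab 2 we implemented a simple functional processor, but it used ideal memory. It now has to be changed to support real memory, that is, to build a real memory access path. Accessing real memory has to follow a memory access protocol, and going through the memory controller costs extra cycles.
1) Reworking the memory access path and its interface
The previous memory access path looked like this:
Channel | Description |
---|---|
Instruction request channel | The program counter (PC) serves as the memory read address |
Instruction response channel | The instruction (Instruction) is read in from memory |
Data request channel | Made up of the data access address, the read/write control signals, and the write data |
Data response channel | Data (Read_data) is read in from outside |
What we need to do is add a Valid-Ready handshake to every channel: a channel transfers if and only if its Valid and Ready are both high. Valid high means the request or response issued by the sender is valid, and Ready high means the receiver can accept it, as already noted in the interface definition. To keep the rework simple, MemWrite and MemRead stand in for the Valid of the data request channel.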
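2) Three-block state machine
While waiting for the receiver to raise Ready, the sender must hold Valid high. At the first rising clock edge after the receiver has raised Ready, it releases all control signals of that channel, Valid included; at that moment the content on the channel is valid, that is, the handshake succeeds. The receiver, in turn, raises Ready once it is able to accept data, waits for the sender to present valid content by raising Valid, and releases its signal after the handshake succeeds. A state machine is a convenient way to drive control signals of this kind, because we can change their values simply by defining different states. The three-block style (state register, next-state logic, output logic) is a fairly standard way of writing such a state machine.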
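The handshake rule described above can be sketched in C as a cycle-by-cycle model. This is only an illustration under my own assumptions: the names handshake_fires and cycles_until_handshake, and the idea of feeding the receiver's Ready as an array, are hypothetical and do not come from the lab code.

```c
#include <stdbool.h>
#include <stddef.h>

/* A transfer happens in a cycle only when Valid and Ready are both high. */
bool handshake_fires(bool valid, bool ready)
{
    return valid && ready;
}

/* Hypothetical sender model: ready_trace[i] is the receiver's Ready in
 * cycle i. The sender holds Valid high from cycle 0 onward. Returns the
 * index of the first cycle in which the handshake fires, or -1 if Ready
 * never rises within the trace. */
int cycles_until_handshake(const bool *ready_trace, size_t n)
{
    bool valid = true;  /* the sender keeps Valid asserted while waiting */
    for (size_t i = 0; i < n; i++) {
        if (handshake_fires(valid, ready_trace[i]))
            return (int)i;
    }
    return -1;
}
```

In the processor, the waiting cycles of this loop correspond to staying in the same state (for example IF) until the matching Ready arrives.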
State and signal definitions
localparam INIT = 9'b000000001,
IF = 9'b000000010,
IW = 9'b000000100,
ID = 9'b000001000,
EX = 9'b000010000,
ST = 9'b000100000,
WB = 9'b001000000,
LD = 9'b010000000,
RDW = 9'b100000000;
reg [8:0] current_state;
reg [8:0] next_state;
reg [31:0] current_PC;
reg [31:0] Valid_Instruction;
reg [31:0] Valid_Read_data;
The state transition diagram is as follows:
[Figure: state machine]
always@(posedge clk) begin
if (rst)
current_state <= INIT;
else
current_state <= next_state;
end
always@(*) begin
case (current_state)
INIT: next_state = IF;//unconditional
IF: begin
if (Inst_Req_Ready) next_state = IW;//Inst_Req_Ready
else next_state = IF;
end
IW: begin
if (Inst_Valid) next_state = ID;//Inst_Valid
else next_state = IW;
end
ID: begin
if (Valid_Instruction != 32'b0) next_state = EX;//not a NOP
else next_state = IF;
end
EX: begin
if (opcode == `REGIMM || opcode[5:2] == 4'b0001 || opcode == 6'b000010)
next_state = IF;//REGIMM / I-Type branches / J
else if (opcode == `SPECIAL || opcode[5:3] == 3'b001 || opcode == 6'b000011)
next_state = WB;//R-Type / I-Type arithmetic / JAL
else if (opcode[5] && ~opcode[3]) next_state = LD;//loads
else if (opcode[5] && opcode[3]) next_state = ST;//stores
else next_state = EX;
end
LD: begin
if (Mem_Req_Ready) next_state = RDW;//Mem_Req_Ready
else next_state = LD;
end
ST: begin
if (Mem_Req_Ready) next_state = IF;//Mem_Req_Ready
else next_state = ST;
end
WB: next_state = IF;//unconditional
RDW: begin
if (Read_data_Valid) next_state = WB;//Read_data_Valid
else next_state = RDW;
end
default:
next_state = current_state;
endcase
end
assign PC_4 = PC + 4;
always@(posedge clk) begin
if (rst) begin
PC <= 32'b0;
end
else if (current_state == EX) begin
PC <= Jump ? Jump_addr : (Branch ? Branch_addr : PC_4);
end
else if (Instruction == 32'b0 && current_state == IW && Inst_Ready && Inst_Valid) begin
PC <= PC_4;
end
else begin
PC <= PC;
end
end
always @(posedge clk) begin
current_PC <= (current_state == IF) ? PC : current_PC;
end
assign Inst_Req_Valid = (current_state == IF) ? 1 : 0;
assign Inst_Ready = (current_state == INIT || current_state == IW) ? 1 : 0;
always@(posedge clk) begin
Valid_Instruction <= (Inst_Ready && Inst_Valid) ? Instruction : Valid_Instruction;
end
assign Read_data_Ready = (current_state == INIT || current_state == RDW) ? 1 : 0;
always@(posedge clk) begin
Valid_Read_data <= (Read_data_Ready && Read_data_Valid) ? Read_data : Valid_Read_data;
end
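As a cross-check of the transitions, the next-state logic can be mirrored in a small C model and stepped by hand. This is a sketch under my own assumptions: the enum names, the Kind classification (which stands in for the opcode tests in the EX state), and the Inputs struct are all hypothetical helpers, not part of the Verilog design.

```c
#include <stdbool.h>

typedef enum { S_INIT, S_IF, S_IW, S_ID, S_EX, S_ST, S_WB, S_LD, S_RDW } State;

/* Instruction classes as the ID and EX states distinguish them
 * (illustrative names, replacing the opcode comparisons). */
typedef enum {
    K_NOP,           /* all-zero instruction */
    K_NO_WRITEBACK,  /* REGIMM / I-Type branches / J */
    K_WRITEBACK,     /* R-Type / I-Type arithmetic / JAL */
    K_LOAD,
    K_STORE
} Kind;

/* Handshake inputs sampled in the current cycle. */
typedef struct {
    bool inst_req_ready, inst_valid, mem_req_ready, read_data_valid;
} Inputs;

State next_state(State s, Kind k, Inputs in)
{
    switch (s) {
    case S_INIT: return S_IF;
    case S_IF:   return in.inst_req_ready ? S_IW : S_IF;
    case S_IW:   return in.inst_valid ? S_ID : S_IW;
    case S_ID:   return (k == K_NOP) ? S_IF : S_EX;
    case S_EX:
        switch (k) {
        case K_NO_WRITEBACK: return S_IF;
        case K_WRITEBACK:    return S_WB;
        case K_LOAD:         return S_LD;
        case K_STORE:        return S_ST;
        default:             return S_EX;
        }
    case S_LD:   return in.mem_req_ready ? S_RDW : S_LD;
    case S_ST:   return in.mem_req_ready ? S_IF : S_ST;
    case S_RDW:  return in.read_data_valid ? S_WB : S_RDW;
    case S_WB:   return S_IF;
    }
    return s;  /* unreachable for valid states; mirrors the default branch */
}
```

Stepping the model with every handshake granted shows that a load passes through IF, IW, ID, EX, LD, RDW and WB before returning to IF, while a store returns to IF directly from ST.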
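3. Accessing the UART controller and implementing printing
The puts function has to send the string s to the UART controller rather than print s itself. It first checks whether the UART transmit FIFO is full by reading the STATUS register, located at the base address plus the UART_STATUS offset (four bytes), and waits until the FIFO is no longer full; it then writes s[i] to the location at the base address plus the UART_TX_FIFO offset (eight bytes). To test whether the FIFO is full, take the word at the four-byte offset and check whether its fourth-lowest bit (bit 3) is zero. The four-byte offset can be reached either by adding one directly to the unsigned int pointer, or by casting to char *, adding four, and casting back to an unsigned int pointer. The volatile qualifier tells the compiler that the value may change at any time, so accesses to it must not be optimized away and every use has to read it from memory again. Without volatile, the compiler could read the STATUS register once and reuse the stale value, so the polling loop might never see the FIFO state change.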
int
puts(const char *s)
{
//TODO: Add your driver code here
int i = 0;
while (s[i] != '\0') {
while ((*(volatile unsigned int *)((char *)uart + UART_STATUS)) & UART_TX_FIFO_FULL);
*(volatile char *)((char *)uart + UART_TX_FIFO) = s[i++];
}
return i;
}
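The two ways of reaching the status register mentioned above (adding one to a word pointer, or adding four to a byte pointer) can be checked with a small C sketch. The register layout here is an assumption taken from the description in this section, and the DEMO_ macros and function names are hypothetical; the real offsets come from the lab's UART_STATUS and UART_TX_FIFO_FULL definitions.

```c
#include <stdint.h>

/* Assumed layout, for illustration only: a status word at byte offset 4
 * whose bit 3 means "TX FIFO full". */
#define DEMO_STATUS_OFFSET 4u
#define DEMO_TX_FIFO_FULL  (1u << 3)

/* Reach the status register by stepping a 32-bit word pointer by one. */
volatile uint32_t *status_by_word(volatile uint32_t *base)
{
    return base + 1;  /* +1 element is +4 bytes for a 32-bit type */
}

/* Reach the same register by stepping a byte pointer by four. */
volatile uint32_t *status_by_byte(volatile uint32_t *base)
{
    return (volatile uint32_t *)((volatile char *)base + DEMO_STATUS_OFFSET);
}

/* Returns 1 while the (simulated) TX FIFO reports full, else 0. */
int tx_fifo_full(volatile uint32_t *base)
{
    return (*status_by_byte(base) & DEMO_TX_FIFO_FULL) != 0;
}
```

On a host machine the base pointer can simply point at a small array standing in for the device registers, which makes the polling condition easy to try out before running on the board.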
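4. Implementing the performance counters
The choice of performance counters is fairly open. In my lab I implemented six metrics: processor cycles, instructions executed, memory-access instructions, memory-access latency, and the numbers of taken and not-taken branches.
//Processor cycle count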
reg [31:0] Cycle_cnt;
always@(posedge clk) begin
if (rst == 1'b1)
Cycle_cnt <= 32'b0;
else
Cycle_cnt <= Cycle_cnt + 32'b1;
end
assign cpu_perf_cnt_0 = Cycle_cnt;
//Number of executed instructions
reg [31:0] Inst_cnt;
always@(posedge clk) begin
if (rst == 1'b1)
Inst_cnt <= 32'b0;
else if (current_state == EX)
Inst_cnt <= Inst_cnt + 32'b1;
else
Inst_cnt <= Inst_cnt;
end
assign cpu_perf_cnt_1 = Inst_cnt;
//Number of memory-access instructions
reg [31:0] MemVisit_cnt;
always@(posedge clk) begin
if (rst == 1'b1)
MemVisit_cnt <= 32'b0;
else if ((current_state == LD || current_state == ST) && Mem_Req_Ready)
MemVisit_cnt <= MemVisit_cnt + 32'b1;
else
MemVisit_cnt <= MemVisit_cnt;
end
assign cpu_perf_cnt_2 = MemVisit_cnt;
//Memory-access latency (cycles spent waiting on a handshake)
reg [31:0] MemDelay_cnt;
always@(posedge clk) begin
if (rst == 1'b1)
MemDelay_cnt <= 32'b0;
else if (((current_state == ST || current_state == LD) && !Mem_Req_Ready) || (current_state == RDW && !Read_data_Valid))
MemDelay_cnt <= MemDelay_cnt + 32'b1;
else
MemDelay_cnt <= MemDelay_cnt;
end
assign cpu_perf_cnt_3 = MemDelay_cnt;
//Number of taken branches
reg [31:0] Branch_cnt;
always@(posedge clk) begin
if (rst == 1'b1)
Branch_cnt <= 32'b0;
else if (current_state == EX && Branch)
Branch_cnt <= Branch_cnt + 32'b1;
else
Branch_cnt <= Branch_cnt;
end
assign cpu_perf_cnt_4 = Branch_cnt;
//Number of not-taken branches (branch instructions only)
reg [31:0] NotBra_cnt;
always@(posedge clk) begin
if (rst == 1'b1)
NotBra_cnt <= 32'b0;
else if (current_state == EX && (opcode == `REGIMM || opcode[5:2] == 4'b0001) && !Branch)
NotBra_cnt <= NotBra_cnt + 32'b1;
else
NotBra_cnt <= NotBra_cnt;
end
assign cpu_perf_cnt_5 = NotBra_cnt;
Beyond that, the software side also has to change. First define the six interfaces we use in the header file:
#define cpu_perf_cnt_0 0x60010000
#define cpu_perf_cnt_1 0x60010008
#define cpu_perf_cnt_2 0x60011000
#define cpu_perf_cnt_3 0x60011008
#define cpu_perf_cnt_4 0x60012000
#define cpu_perf_cnt_5 0x60012008
typedef struct Result {
int pass;
unsigned long msec;
unsigned long Inst;
unsigned long MemVisit;
unsigned long MemDelay;
unsigned long Branch;
unsigned long NotBra;
} Result;
Then modify the functions in perf_cnt.c:
unsigned long _uptime() {
// TODO [COD]
// You can use this function to access performance counter related with time or cycle.
volatile unsigned long *Cycle_cnt = (unsigned long *)cpu_perf_cnt_0;
return *Cycle_cnt;
}
unsigned long _upInst() {
// TODO [COD]
// Reads the performance counter of executed instructions.
volatile unsigned long *Inst_cnt = (unsigned long *)cpu_perf_cnt_1;
return *Inst_cnt;
}
unsigned long _upMemVisit() {
// TODO [COD]
// Reads the performance counter of memory-access instructions.
volatile unsigned long *MemVisit_cnt = (unsigned long *)cpu_perf_cnt_2;
return *MemVisit_cnt;
}
unsigned long _upMemDelay() {
// TODO [COD]
// Reads the performance counter of memory-access latency cycles.
volatile unsigned long *MemDelay_cnt = (unsigned long *)cpu_perf_cnt_3;
return *MemDelay_cnt;
}
unsigned long _upBranch() {
// TODO [COD]
// Reads the performance counter of taken branches.
volatile unsigned long *Branch_cnt = (unsigned long *)cpu_perf_cnt_4;
return *Branch_cnt;
}
unsigned long _upNotBra() {
// TODO [COD]
// Reads the performance counter of not-taken branches.
volatile unsigned long *NotBra_cnt = (unsigned long *)cpu_perf_cnt_5;
return *NotBra_cnt;
}
void bench_prepare(Result *res) {
// TODO [COD]
// Add preprocess code, record performance counters' initial states.
// You can communicate between bench_prepare() and bench_done() through
// static variables or add additional fields in `struct Result`
res->msec = _uptime();
res->Inst = _upInst();
res->MemVisit = _upMemVisit();
res->MemDelay = _upMemDelay();
res->Branch = _upBranch();
res->NotBra = _upNotBra();
}
void bench_done(Result *res) {
// TODO [COD]
// Add postprocess code, record performance counters' current states.
res->msec = _uptime() - res->msec;
res->Inst = _upInst() - res->Inst;
res->MemVisit = _upMemVisit() - res->MemVisit;
res->MemDelay = _upMemDelay() - res->MemDelay;
res->Branch = _upBranch() - res->Branch;
res->NotBra = _upNotBra() - res->NotBra;
}
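bench_prepare and bench_done rely on subtracting two samples of a free-running counter. The C sketch below shows why that subtraction stays correct even if the 32-bit hardware counter wraps once between the samples. The helper name counter_delta and the use of uint32_t (to match the 32-bit counter registers) are my own assumptions for illustration.

```c
#include <stdint.h>

/* Elapsed count between two samples of a free-running 32-bit counter.
 * Unsigned arithmetic is modulo 2^32, so a single wrap between the two
 * samples still yields the right difference. */
uint32_t counter_delta(uint32_t start, uint32_t end)
{
    return (uint32_t)(end - start);
}
```

This only holds for at most one wrap: a benchmark that runs longer than 2^32 counted events would need a wider counter or an overflow count.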
bench.c also has to be modified so that the performance counter values are printed at run time:
int main() {
int pass = 1;
_Static_assert(ARR_SIZE(benchmarks) > 0, "non benchmark");
for (int i = 0; i < ARR_SIZE(benchmarks); i ++) {
Benchmark *bench = &benchmarks[i];
current = bench;
setting = &bench->settings[SETTING];
const char *msg = bench_check(bench);
printk("[%s] %s: ", bench->name, bench->desc);
if (msg != NULL) {
printk("Ignored %s\n", msg);
} else {
unsigned long msec = ULONG_MAX;
unsigned long Inst = ULONG_MAX;
unsigned long MemVisit = ULONG_MAX;
unsigned long MemDelay = ULONG_MAX;
unsigned long Branch = ULONG_MAX;
unsigned long NotBra = ULONG_MAX;
int succ = 1;
for (int i = 0; i < REPEAT; i ++) {
Result res;
run_once(bench, &res);
printk(res.pass ? "*" : "X");
succ &= res.pass;
if (res.msec < msec) msec = res.msec;
if (res.Inst < Inst) Inst = res.Inst;
if (res.MemVisit < MemVisit) MemVisit = res.MemVisit;
if (res.MemDelay < MemDelay) MemDelay = res.MemDelay;
if (res.Branch < Branch) Branch = res.Branch;
if (res.NotBra < NotBra) NotBra = res.NotBra;
}
if (succ) printk(" Passed.\n");
else printk(" Failed.\n");
pass &= succ;
// TODO [COD]
// A benchmark is finished here, you can use printk to output some information.
// `msec' is intended to indicate the time (or cycle),
// you can ignore it according to your performance counters' semantics.
printk("Time Cycle: %u\n", msec);
printk("Instruction Number: %u\n", Inst);
printk("Memory Visit: %u\n", MemVisit);
printk("Memory Delay: %u\n", MemDelay);
printk("Branch Number: %u\n", Branch);
printk("Not Branch Number: %u\n", NotBra);
}
}
printk("benchmark finished\n");
if(pass)
hit_good_trap();
else
nemu_assert(0);
return 0;
}
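Once the raw counters are printed, a few derived figures are often easier to read. The sketch below computes cycles per instruction and the taken-branch ratio with integer arithmetic only, so the results can be printed with an integer format. The function names are hypothetical, and taken_percent assumes that the two branch counters cover branch instructions only.

```c
#include <stdint.h>

/* Cycles per instruction, scaled by 100 so it can be printed as an
 * integer (e.g. 437 means 4.37). Returns 0 if no instruction was counted. */
uint32_t cpi_x100(uint32_t cycles, uint32_t insts)
{
    if (insts == 0)
        return 0;
    return (uint32_t)(((uint64_t)cycles * 100u) / insts);
}

/* Share of taken branches among all counted branches, in percent,
 * rounded down. Returns 0 if no branch was counted. */
uint32_t taken_percent(uint32_t taken, uint32_t not_taken)
{
    uint64_t total = (uint64_t)taken + not_taken;
    if (total == 0)
        return 0;
    return (uint32_t)(((uint64_t)taken * 100u) / total);
}
```

The 64-bit intermediate avoids overflow when a large cycle count is multiplied by 100.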
5. Appendix: custom_cpu.v
`timescale 10ns / 1ns
`define DATA_WIDTH 32
`define ADDR_WIDTH 5
// OPCODE: 6-bit
`define SPECIAL 6'b000000
`define REGIMM 6'b000001
`define ADDIU 6'b001001
`define LUI 6'b001111
`define LB 6'b100000
`define LH 6'b100001
`define LBU 6'b100100
`define LHU 6'b100101
`define LWL 6'b100010
`define LWR 6'b100110
`define SB 6'b101000
`define SH 6'b101001
`define SW 6'b101011
`define SWL 6'b101010
`define SWR 6'b101110
`define J 6'b000010
`define JAL 6'b000011
// FUNC: 6-bit
`define JR 6'b001000
`define JALR 6'b001001
`define MOVZ 6'b001010
`define MOVN 6'b001011
module custom_cpu(
input clk,
input rst,
//Instruction request channel
output reg [31:0] PC,
output Inst_Req_Valid,
input Inst_Req_Ready,
//Instruction response channel
input [31:0] Instruction,
input Inst_Valid,
output Inst_Ready,
//Memory request channel
output [31:0] Address,
output MemWrite,
output [31:0] Write_data,
output [ 3:0] Write_strb,
output MemRead,
input Mem_Req_Ready,
//Memory data response channel
input [31:0] Read_data,
input Read_data_Valid,
output Read_data_Ready,
input intr,
output [31:0] cpu_perf_cnt_0,
output [31:0] cpu_perf_cnt_1,
output [31:0] cpu_perf_cnt_2,
output [31:0] cpu_perf_cnt_3,
output [31:0] cpu_perf_cnt_4,
output [31:0] cpu_perf_cnt_5,
output [31:0] cpu_perf_cnt_6,
output [31:0] cpu_perf_cnt_7,
output [31:0] cpu_perf_cnt_8,
output [31:0] cpu_perf_cnt_9,
output [31:0] cpu_perf_cnt_10,
output [31:0] cpu_perf_cnt_11,
output [31:0] cpu_perf_cnt_12,
output [31:0] cpu_perf_cnt_13,
output [31:0] cpu_perf_cnt_14,
output [31:0] cpu_perf_cnt_15,
output [69:0] inst_retire
);
/* The following signal is leveraged for behavioral simulation,
* which is delivered to testbench.
*
* STUDENTS MUST CONTROL LOGICAL BEHAVIORS of THIS SIGNAL.
*
* inst_retired (70-bit): detailed information of the retired instruction,
* mainly including (in order)
* {
* reg_file write-back enable (69:69, 1-bit),
* reg_file write-back address (68:64, 5-bit),
* reg_file write-back data (63:32, 32-bit),
* retired PC (31: 0, 32-bit)
* }
*
*/
wire [69:0] inst_retire;
// TODO: Please add your custom CPU code here
localparam INIT = 9'b000000001,
IF = 9'b000000010,
IW = 9'b000000100,
ID = 9'b000001000,
EX = 9'b000010000,
ST = 9'b000100000,
WB = 9'b001000000,
LD = 9'b010000000,
RDW = 9'b100000000;
reg [8:0] current_state;
reg [8:0] next_state;
reg [31:0] current_PC;
reg [31:0] Valid_Instruction;
reg [31:0] Valid_Read_data;
wire RF_wen;
wire [`ADDR_WIDTH - 1:0] RF_waddr;
wire [`DATA_WIDTH - 1:0] RF_wdata;
wire [`DATA_WIDTH - 1:0] RF_rdata1;
wire [`DATA_WIDTH - 1:0] RF_rdata2;
wire [5:0] opcode;
wire [`ADDR_WIDTH - 1:0] rs;
wire [`ADDR_WIDTH - 1:0] rt;
wire [`ADDR_WIDTH - 1:0] rd;
wire [4:0] sa;
wire [5:0] func;
wire [`DATA_WIDTH - 1:0] zero_extend;
wire [`DATA_WIDTH - 1:0] signed_extend;
wire [`DATA_WIDTH - 1:0] shift_signed_extend;
wire [2:0] ALU_control;
wire [`DATA_WIDTH - 1:0] ALU_result;
wire [`DATA_WIDTH - 1:0] ALU_num1;
wire [`DATA_WIDTH - 1:0] ALU_num2;
wire Zero;
wire [4:0] Shift_num;
wire [1:0] Shift_op;
wire [`DATA_WIDTH - 1:0] Shift_result;
wire Jump;
wire [`DATA_WIDTH - 1:0] Jump_addr;
wire Branch;
wire [`DATA_WIDTH - 1:0] Branch_addr;
wire [`DATA_WIDTH - 1:0] load_data;
wire [7:0] byte_data;
wire [15:0] half_data;
wire [`DATA_WIDTH - 1:0] lwl_data;
wire [`DATA_WIDTH - 1:0] lwr_data;
wire [`DATA_WIDTH - 1:0] PC_4;
assign opcode = Valid_Instruction[31:26];
assign rs = Valid_Instruction[25:21];
assign rt = Valid_Instruction[20:16];
assign rd = Valid_Instruction[15:11];
assign sa = Valid_Instruction[10:6];
assign func = Valid_Instruction[5:0];
assign zero_extend = {16'b0, Valid_Instruction[15:0]};
assign signed_extend = Valid_Instruction[15] ? {{16{1'b1}}, Valid_Instruction[15:0]} : {{16{1'b0}}, Valid_Instruction[15:0]};
assign shift_signed_extend = Valid_Instruction[15] ? {{14{1'b1}}, Valid_Instruction[15:0], 2'b00} : {{14{1'b0}}, Valid_Instruction[15:0], 2'b00};
assign ALU_control = (opcode == `SPECIAL && func[3:2] == 2'b00) ? {func[1], 2'b10}//ADD/SUB: R-Type arithmetic ADDU/SUBU
: (opcode == `SPECIAL && func[3:2] == 2'b01) ? {func[1], 1'b0, func[0]}//AND/OR/XOR/NOR: R-Type logic AND/OR/XOR/NOR
: (opcode == `SPECIAL && func[3:2] == 2'b10) ? {~func[0], 2'b11}//SLT/SLTU: R-Type compare SLT/SLTU
: (opcode == `REGIMM || opcode[5:1] == 5'b00011) ? 3'b111//SLT: REGIMM instructions / I-Type branches BLEZ/BGTZ
: (opcode[5:1] == 5'b00010) ? 3'b110//SUB: I-Type branches BEQ/BNE
: (opcode[5:3] == 3'b001 && opcode[2:1] == 2'b00) ? {opcode[1], 2'b10}//ADD: I-Type arithmetic ADDI/ADDIU
: (opcode[5:3] == 3'b001 && opcode[2] == 1'b1 && opcode[1:0] != 2'b11) ? {opcode[1], 1'b0, opcode[0]}//AND/OR/XOR: I-Type logic ANDI/ORI/XORI
: (opcode[5:3] == 3'b001 && opcode[2:1] == 2'b01) ? {~opcode[0], 2'b11}//SLT/SLTU: I-Type compare SLTI/SLTIU
: (opcode[5]) ? 3'b010//ADD: I-Type memory-access instructions
: 3'bXXX;//NOPE
assign ALU_num1 = (opcode[5:1] == 5'b00011) ? 0 : RF_rdata1;//I-Type branches BLEZ/BGTZ : all other instructions
assign ALU_num2 = (opcode == `REGIMM) ? 32'b0//REGIMM instructions
: (opcode[5:1] == 5'b00011) ? RF_rdata1//I-Type branches BLEZ/BGTZ
: (opcode[5:3] == 3'b001 && opcode[2]) ? zero_extend//I-Type logic ANDI/ORI/XORI (zero-extended immediate)
: (opcode[5] == 1 || opcode[5:3] == 3'b001) ? signed_extend//I-Type memory access / ADDIU/SLTI/SLTIU (sign-extended immediate)
: RF_rdata2;//all other instructions
assign Shift_num = (func[2] == 0) ? sa : RF_rdata1[4:0];
assign Shift_op = (opcode == `SPECIAL && func[5:3] == 3'b000) ? func[1:0] : 2'bXX;
assign Jump = ((opcode == `SPECIAL && {func[5:3], func[1]} == 4'b0010) || opcode[5:1] == 5'b00001) ? 1//R-Type jumps / J-Type instructions
: 0;
assign Jump_addr = (opcode == `SPECIAL && {func[5:3], func[1]} == 4'b0010) ? {RF_rdata1}//R-Type jumps
: {PC_4[31:28], Valid_Instruction[25:0], 2'b00};//J-Type instructions
assign Branch = ((opcode == `REGIMM && (rt[0]^ALU_result[0])) || (opcode[5:2] == 4'b0001 && (opcode[0] ^ Zero))) ? 1 : 0;//REGIMM instructions / I-Type branches
assign Branch_addr = shift_signed_extend + PC_4;
assign RF_wen = (opcode == `REGIMM || opcode[5:2] == 4'b0001 || (opcode[5] && opcode[3])) ? 0//REGIMM instructions / I-Type branches / I-Type stores
: (opcode == `SPECIAL && {func[5:3], func[1]} == 4'b0011) ? func[0]^(RF_rdata2 == 32'b0)//R-Type: MOVZ/MOVN
: (opcode == `SPECIAL && func == `JR) ? 0//R-Type jump JR
: (opcode == `J) ? 0//J-Type: J
: (opcode == `SPECIAL && func == `JALR && current_state == EX) ? 1//R-Type jump JALR while state = EX
: (opcode == `JAL && current_state == EX) ? 1//J-Type: JAL while state = EX
: (current_state == WB) ? 1//state = WB
: 0;
assign RF_waddr = (opcode[5:3] == 3'b001 || opcode[5] & ((~opcode[3]))) ? rt//I-Type arithmetic / I-Type loads
: (opcode[5:1] == 5'b00001 || (opcode == `SPECIAL && func == `JALR && rd == 0)) ? 31//J-Type instructions / R-Type jump JALR (rd not specified)
: rd;
assign RF_wdata = (opcode == `SPECIAL && ((func == `MOVZ && RF_rdata2 == 32'b0) || (func == `MOVN && RF_rdata2 != 32'b0))) ? RF_rdata1//R-Type: MOVZ/MOVN
: (opcode == `LUI) ? {Valid_Instruction[15:0], 16'b0}//I-Type: LUI
: ((opcode == `SPECIAL && func[5] == 1'b1) || (opcode[5:3] == 3'b001)) ? ALU_result//R-Type arithmetic / I-Type arithmetic
: (opcode == `SPECIAL && func[5:3] == 3'b000) ? Shift_result//R-Type shifts
: ((opcode == `SPECIAL && {func[5:3], func[1]} == 4'b0010) || opcode[5:1] == 5'b00001) ? (current_PC + 8)//R-Type jumps / J-Type instructions
: (opcode[5] && (~opcode[3])) ? load_data//I-Type loads
: 32'bx;
assign MemRead = (current_state == LD) ? 1 : 0;//I-Type loads
assign load_data = (opcode == `LB) ? (byte_data[7] ? {{24{1'b1}}, byte_data} : {{24{1'b0}}, byte_data})//LB
: (opcode == `LH) ? (half_data[15] ? {{16{1'b1}}, half_data} : {{16{1'b0}}, half_data})//LH
: (opcode == `LBU) ? {{24{1'b0}}, byte_data}//LBU
: (opcode == `LHU) ? {{16{1'b0}}, half_data}//LHU
: (opcode == `LWL) ? lwl_data//LWL
: (opcode == `LWR) ? lwr_data//LWR
: Valid_Read_data;//LW
assign byte_data = (ALU_result[1] & ALU_result[0]) ? Valid_Read_data[31:24]
: (ALU_result[1] & ~ALU_result[0]) ? Valid_Read_data[23:16]
: (~ALU_result[1] & ALU_result[0]) ? Valid_Read_data[15:8]
: Valid_Read_data[7:0];
assign half_data = (~ALU_result[1] & ~ALU_result[0]) ? Valid_Read_data[15:0] : Valid_Read_data[31:16];
assign lwl_data = (ALU_result[1] & ALU_result[0]) ? Valid_Read_data[31:0]
: (ALU_result[1] & ~ALU_result[0]) ? {Valid_Read_data[23:0], RF_rdata2[7:0]}
: (~ALU_result[1] & ALU_result[0]) ? {Valid_Read_data[15:0], RF_rdata2[15:0]}
: {Valid_Read_data[7:0], RF_rdata2[23:0]};
assign lwr_data = (ALU_result[1] & ALU_result[0]) ? {RF_rdata2[31:8], Valid_Read_data[31:24]}
: (ALU_result[1] & ~ALU_result[0]) ? {RF_rdata2[31:16], Valid_Read_data[31:16]}
: (~ALU_result[1] & ALU_result[0]) ? {RF_rdata2[31:24], Valid_Read_data[31:8]}
: Valid_Read_data[31:0];
assign Address = {ALU_result[31:2], 2'b00};
assign MemWrite = (current_state == ST) ? 1 : 0;//I-Type stores
assign Write_data = (opcode == `SB) ? (Write_strb[3] ? {RF_rdata2[7:0], 24'b0}
: Write_strb[2] ? {8'b0, RF_rdata2[7:0], 16'b0}
: Write_strb[1] ? {16'b0, RF_rdata2[7:0], 8'b0}
: {24'b0, RF_rdata2[7:0]})//SB
: (opcode == `SH) ? ((Write_strb[3] && Write_strb[2]) ? {RF_rdata2[15:0], 16'b0}
: {16'b0, RF_rdata2[15:0]})//SH
: (opcode == `SWL) ? (Write_strb[3] ? RF_rdata2
: Write_strb[2] ? {8'b0, RF_rdata2[31:8]}
: Write_strb[1] ? {16'b0, RF_rdata2[31:16]}
: {24'b0, RF_rdata2[31:24]})
: (opcode == `SWR) ? (Write_strb[0] ? RF_rdata2
: Write_strb[1] ? {RF_rdata2[23:0], 8'b0}
: Write_strb[2] ? {RF_rdata2[15:0], 16'b0}
: {RF_rdata2[7:0], 24'b0})
: RF_rdata2;
assign Write_strb = (opcode[1:0] == 2'b00) ? (4'b1000 >> (~ALU_result[1:0]))//SB
: (opcode[1:0] == 2'b01) ? {{2{ALU_result[1]}}, {2{~ALU_result[1]}}}//SH
: (opcode[1:0] == 2'b11) ? 4'b1111//SW
: (opcode[2:0] == 3'b010) ? {ALU_result[1]&ALU_result[0], ALU_result[1], ALU_result[1]|ALU_result[0], 1'b1}//SWL
: {1'b1, (~ALU_result[1]) | (~ALU_result[0]), (~ALU_result[1]), (~ALU_result[1]) & (~ALU_result[0])};//SWR
reg_file reg_file_module(
.clk(clk),
.waddr(RF_waddr),
.raddr1(rs),
.raddr2(rt),
.wen(RF_wen),
.wdata(RF_wdata),
.rdata1(RF_rdata1),
.rdata2(RF_rdata2)
);
alu alu_module(
.A(ALU_num1),
.B(ALU_num2),
.ALUop(ALU_control),
.Result(ALU_result),
.Overflow(),
.CarryOut(),
.Zero(Zero)
);
shifter shifter_module(
.A(RF_rdata2),
.B(Shift_num),
.Shiftop(Shift_op),
.Result(Shift_result)
);
always@(posedge clk) begin
if (rst)
current_state <= INIT;
else
current_state <= next_state;
end
always@(*) begin
case (current_state)
INIT: next_state = IF;//unconditional
IF: begin
if (Inst_Req_Ready) next_state = IW;//Inst_Req_Ready
else next_state = IF;
end
IW: begin
if (Inst_Valid) next_state = ID;//Inst_Valid
else next_state = IW;
end
ID: begin
if (Valid_Instruction != 32'b0) next_state = EX;//not a NOP
else next_state = IF;
end
EX: begin
if (opcode == `REGIMM || opcode[5:2] == 4'b0001 || opcode == 6'b000010)
next_state = IF;//REGIMM / I-Type branches / J
else if (opcode == `SPECIAL || opcode[5:3] == 3'b001 || opcode == 6'b000011)
next_state = WB;//R-Type / I-Type arithmetic / JAL
else if (opcode[5] && ~opcode[3]) next_state = LD;//loads
else if (opcode[5] && opcode[3]) next_state = ST;//stores
else next_state = EX;
end
LD: begin
if (Mem_Req_Ready) next_state = RDW;//Mem_Req_Ready
else next_state = LD;
end
ST: begin
if (Mem_Req_Ready) next_state = IF;//Mem_Req_Ready
else next_state = ST;
end
WB: next_state = IF;//unconditional
RDW: begin
if (Read_data_Valid) next_state = WB;//Read_data_Valid
else next_state = RDW;
end
default:
next_state = current_state;
endcase
end
assign PC_4 = PC + 4;
always@(posedge clk) begin
if (rst) begin
PC <= 32'b0;
end
else if (current_state == EX) begin
PC <= Jump ? Jump_addr : (Branch ? Branch_addr : PC_4);
end
else if (Instruction == 32'b0 && current_state == IW && Inst_Ready && Inst_Valid) begin
PC <= PC_4;
end
else begin
PC <= PC;
end
end
always @(posedge clk) begin
current_PC <= (current_state == IF) ? PC : current_PC;
end
assign Inst_Req_Valid = (current_state == IF) ? 1 : 0;
assign Inst_Ready = (current_state == INIT || current_state == IW) ? 1 : 0;
always@(posedge clk) begin
Valid_Instruction <= (Inst_Ready && Inst_Valid) ? Instruction : Valid_Instruction;
end
assign Read_data_Ready = (current_state == INIT || current_state == RDW) ? 1 : 0;
always@(posedge clk) begin
Valid_Read_data <= (Read_data_Ready && Read_data_Valid) ? Read_data : Valid_Read_data;
end
//Processor cycle count
reg [31:0] Cycle_cnt;
always@(posedge clk) begin
if (rst == 1'b1)
Cycle_cnt <= 32'b0;
else
Cycle_cnt <= Cycle_cnt + 32'b1;
end
assign cpu_perf_cnt_0 = Cycle_cnt;
//Number of executed instructions
reg [31:0] Inst_cnt;
always@(posedge clk) begin
if (rst == 1'b1)
Inst_cnt <= 32'b0;
else if (current_state == EX)
Inst_cnt <= Inst_cnt + 32'b1;
else
Inst_cnt <= Inst_cnt;
end
assign cpu_perf_cnt_1 = Inst_cnt;
//Number of memory-access instructions
reg [31:0] MemVisit_cnt;
always@(posedge clk) begin
if (rst == 1'b1)
MemVisit_cnt <= 32'b0;
else if ((current_state == LD || current_state == ST) && Mem_Req_Ready)
MemVisit_cnt <= MemVisit_cnt + 32'b1;
else
MemVisit_cnt <= MemVisit_cnt;
end
assign cpu_perf_cnt_2 = MemVisit_cnt;
//Memory-access latency (cycles spent waiting on a handshake)
reg [31:0] MemDelay_cnt;
always@(posedge clk) begin
if (rst == 1'b1)
MemDelay_cnt <= 32'b0;
else if (((current_state == ST || current_state == LD) && !Mem_Req_Ready) || (current_state == RDW && !Read_data_Valid))
MemDelay_cnt <= MemDelay_cnt + 32'b1;
else
MemDelay_cnt <= MemDelay_cnt;
end
assign cpu_perf_cnt_3 = MemDelay_cnt;
//Number of taken branches
reg [31:0] Branch_cnt;
always@(posedge clk) begin
if (rst == 1'b1)
Branch_cnt <= 32'b0;
else if (current_state == EX && Branch)
Branch_cnt <= Branch_cnt + 32'b1;
else
Branch_cnt <= Branch_cnt;
end
assign cpu_perf_cnt_4 = Branch_cnt;
//Number of not-taken branches (branch instructions only)
reg [31:0] NotBra_cnt;
always@(posedge clk) begin
if (rst == 1'b1)
NotBra_cnt <= 32'b0;
else if (current_state == EX && (opcode == `REGIMM || opcode[5:2] == 4'b0001) && !Branch)
NotBra_cnt <= NotBra_cnt + 32'b1;
else
NotBra_cnt <= NotBra_cnt;
end
assign cpu_perf_cnt_5 = NotBra_cnt;
endmodule