HDL4SE: Verilog for Software Engineers (Part 15)

15 Implementing the RISC-V CPU on an FPGA

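In the previous installments we implemented a RISC-V CPU in software. Although it was modeled with HDL4SE, it was still not RTL and could not run directly on hardware; at best it was a CModel of a RISC-V CPU. This time we implement a RISC-V CPU written in Verilog that runs on an FPGA. The focus, however, is on the development path from the software CModel to the FPGA. As that path shows, this approach works better than writing Verilog directly: many design iterations are carried out in the software model, and once the software model reaches RTL level, re-implementing it in a hardware language follows naturally. Conversely, looking only at the final Verilog code, it is hard to imagine designing it that way from scratch.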

15.1 Goals

15.1.1 FPGA Development Board

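FPGA application development requires a fair amount of background knowledge: besides the Verilog language, there is hardware-related material to learn. We chose the DE1-SoC, a development board with relatively low hardware demands. It uses an Altera (Intel) Cyclone V SoC FPGA, which contains an ARM processor, but we do not use the ARM this time and use only the FPGA fabric.
The DE1-SoC board provides CLK (50 MHz external input), GPIO (two groups, 70 pins in total, inout), KEY (4 inputs; pushbuttons that return automatically after being released), SW (10 inputs; two-state slide switches), seven-segment displays (common anode, 6 digits), LEDs (10), 64 MB SDRAM, video input, video output (VGA DAC), as well as PS/2, IR and audio interfaces. The ARM side can also connect to Ethernet, USB, an SD card and so on. On the software side there is a tool that creates the project automatically, which saves a lot of FPGA pin assignment work. This makes the board suitable for software engineers getting started, and a reasonable choice for CPU development.
Our goal this time is to get the RISC-V CPU running on this board and to implement a software counter on it, showing the count on the seven-segment LED displays and controlling the counter with KEY or SW.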
[Figure: external interfaces of the board (from the Terasic website)]

We use the SystemBuilder tool that ships with the DE1-SoC to generate the FPGA top-level module and the Altera project files.
The generated project files already have the FPGA pins configured. The top-level module file is as follows:


//=======================================================
//  This code is generated by Terasic System Builder
//=======================================================

module de1_riscv(

	//////////// ADC //////////
	output		          		ADC_CONVST,
	output		          		ADC_DIN,
	input 		          		ADC_DOUT,
	output		          		ADC_SCLK,

	//////////// Audio //////////
	input 		          		AUD_ADCDAT,
	inout 		          		AUD_ADCLRCK,
	inout 		          		AUD_BCLK,
	output		          		AUD_DACDAT,
	inout 		          		AUD_DACLRCK,
	output		          		AUD_XCK,

	//////////// CLOCK //////////
	input 		          		CLOCK2_50,
	input 		          		CLOCK3_50,
	input 		          		CLOCK4_50,
	input 		          		CLOCK_50,

	//////////// SDRAM //////////
	output		    [12:0]		DRAM_ADDR,
	output		     [1:0]		DRAM_BA,
	output		          		DRAM_CAS_N,
	output		          		DRAM_CKE,
	output		          		DRAM_CLK,
	output		          		DRAM_CS_N,
	inout 		    [15:0]		DRAM_DQ,
	output		          		DRAM_LDQM,
	output		          		DRAM_RAS_N,
	output		          		DRAM_UDQM,
	output		          		DRAM_WE_N,

	//////////// I2C for Audio and Video-In //////////
	output		          		FPGA_I2C_SCLK,
	inout 		          		FPGA_I2C_SDAT,

	//////////// SEG7 //////////
	output		     [6:0]		HEX0,
	output		     [6:0]		HEX1,
	output		     [6:0]		HEX2,
	output		     [6:0]		HEX3,
	output		     [6:0]		HEX4,
	output		     [6:0]		HEX5,

	//////////// IR //////////
	input 		          		IRDA_RXD,
	output		          		IRDA_TXD,

	//////////// KEY //////////
	input 		     [3:0]		KEY,

	//////////// LED //////////
	output		     [9:0]		LEDR,

	//////////// PS2 //////////
	inout 		          		PS2_CLK,
	inout 		          		PS2_CLK2,
	inout 		          		PS2_DAT,
	inout 		          		PS2_DAT2,

	//////////// SW //////////
	input 		     [9:0]		SW,

	//////////// Video-In //////////
	input 		          		TD_CLK27,
	input 		     [7:0]		TD_DATA,
	input 		          		TD_HS,
	output		          		TD_RESET_N,
	input 		          		TD_VS,

	//////////// VGA //////////
	output		          		VGA_BLANK_N,
	output		     [7:0]		VGA_B,
	output		          		VGA_CLK,
	output		     [7:0]		VGA_G,
	output		          		VGA_HS,
	output		     [7:0]		VGA_R,
	output		          		VGA_SYNC_N,
	output		          		VGA_VS,

	//////////// GPIO_0, GPIO_0 connect to GPIO Default //////////
	inout 		    [35:0]		GPIO
);
endmodule

We first modify it to verify that the board runs correctly:

module de1_riscv(
/*
  port list omitted ...
*/
);

	wire wClk = CLOCK_50;
	wire nwReset = KEY[3];
	reg [6:0] led0;
	reg [6:0] led1;
	reg [6:0] led2;
	reg [6:0] led3;
	reg [6:0] led4;
	reg [6:0] led5;
	assign HEX0 = ~led0;
	assign HEX1 = ~led1;
	assign HEX2 = ~led2;
	assign HEX3 = ~led3;
	assign HEX4 = ~led4;
	assign HEX5 = ~led5;

	always @(posedge wClk) begin
		if (!nwReset) begin
			led0 <= 8'h3f;
			led1 <= 8'h3f;
			led2 <= 8'h3f;
			led3 <= 8'h3f;
			led4 <= 8'h3f;
			led5 <= 8'h3f;
		end else begin
			if (SW[8]) begin
				led0 <= 8'h06;
				led1 <= 8'h06;
				led2 <= 8'h06;
				led3 <= 8'h07;
				led4 <= 8'h07;
				led5 <= 8'h07;				
			end
			else if (SW[9]) begin
				led0 <= 8'h3f;
				led1 <= 8'h06;
				led2 <= 8'h5b;
				led3 <= 8'h4f;
				led4 <= 8'h66;
				led5 <= 8'h6d;				
			end
		end
	end
	
endmodule

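After synthesizing this code with Quartus II and downloading it to the FPGA board (these steps are not described in detail), the seven-segment displays show digits. Pressing KEY[3] shows 000000. After releasing it, turning on switch SW[9] or SW[8] shows 543210 or 777111 respectively. This indicates that the FPGA board is running properly.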
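
The constants in the test design are seven-segment codes, and the `~` in the `assign HEX0 = ~led0;` lines accounts for the common-anode wiring. A minimal sketch in C of that relationship follows. It assumes the usual bit order (bit 0 = segment a, bit 6 = segment g); the function name `hex_pins_for_digit` is made up for this illustration and is not part of the project.

```c
#include <assert.h>
#include <stdint.h>

/* Active-high segment codes, assuming bit 0 = segment a ... bit 6 = segment g.
   Same values as in the test design (0x3f shows "0", 0x06 shows "1", ...). */
static const uint8_t seg_code[10] = {
    0x3f, 0x06, 0x5b, 0x4f, 0x66, 0x6d, 0x7d, 0x07, 0x7f, 0x6f
};

/* Value to drive on a 7-bit HEXn port of a common-anode display:
   a segment lights when its pin is 0, so the code is inverted. */
uint8_t hex_pins_for_digit(unsigned digit)
{
    assert(digit < 10);
    return (uint8_t)(~seg_code[digit] & 0x7f);
}
```

For example, the digit 8 lights all seven segments, so every pin is driven low.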

15.1.2 Design Goals

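Our goal this time is to rewrite the earlier RISC-V CPU core in Verilog, run it on the FPGA development board, and execute the earlier counter software on it. The software shows the count on the seven-segment displays and reads the key state to control the counting (clear, pause, resume). The seven-segment displays are attached as a hardware device to the CPU's external read/write port: writing to addresses 0xF0000010 and 0xF0000014 controls the displays, and reading 0xF0000000 returns the input state. The counter software is as follows: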

const unsigned int segcode[10] =
{
	0x3F,
	0x06,
	0x5B,//	8'b01011011, 
	0x4F,// 8'b01001111, 
	0x66,// 8'b01100110, 
	0x6d,// 8'b01101101, 
	0x7d,// 8'b01111101, 
	0x07,// 8'b00000111, 
	0x7f,// 8'b01111111, 
	0x6f,// 8'b01101111, 
};

unsigned int num2seg(unsigned int num)
{
	return segcode[num % 10];
}

int main(int argc, char* argv[])
{
	unsigned long long count, ctemp;
	int countit = 1;
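	/* volatile: these are memory-mapped device registers, so every access must really happen */
	volatile unsigned int* ledkey = (volatile unsigned int*)0xF0000000;
	volatile unsigned int* leddata = (volatile unsigned int*)0xf0000010;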
	count = 0;
	leddata[0] = 0x6f7f077d;
	leddata[1] = 0x6d664f5b;
	do {
		unsigned int key;
		key = *ledkey;
		if (key & 1) {
			count = 0;
		}
		else if (key & 2) {
			countit = 0;
		}
		else if (key & 4) {
			countit = 1;
		}
		if (countit)
			count++;

		ctemp = count;
		leddata[0] = num2seg(ctemp) |
			((num2seg(ctemp / 10ll)) << 8) |
			((num2seg(ctemp / 100ll)) << 16) |
			((num2seg(ctemp / 1000ll)) << 24);
		ctemp /= 10000ll;
		leddata[1] = num2seg(ctemp) |
			((num2seg(ctemp / 10ll)) << 8) |
			((num2seg(ctemp / 100ll)) << 16) |
			((num2seg(ctemp / 1000ll)) << 24);
		ctemp /= 10000ll;
		leddata[2] = num2seg(ctemp) |
			((num2seg(ctemp / 10ll)) << 8);
	} while (1);
	return 1;
}
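
The digit packing used for `leddata[0]` and `leddata[1]` can be checked on a host PC before touching the board. The sketch below repeats the segment table and wraps the packing expression in a helper; the name `pack4` is introduced here only for illustration.

```c
#include <stdint.h>

static const uint32_t segcode[10] = {
    0x3f, 0x06, 0x5b, 0x4f, 0x66, 0x6d, 0x7d, 0x07, 0x7f, 0x6f
};

static uint32_t num2seg(uint64_t num)
{
    return segcode[num % 10];
}

/* Pack the four lowest decimal digits of value into one 32-bit display word,
   least significant digit in bits 7..0. */
uint32_t pack4(uint64_t value)
{
    return num2seg(value)
         | (num2seg(value / 10)   << 8)
         | (num2seg(value / 100)  << 16)
         | (num2seg(value / 1000) << 24);
}
```

With this helper, `pack4(9876)` gives 0x6f7f077d, the constant the program writes to `leddata[0]` at startup. In the main loop the second word corresponds to `pack4(count / 10000)`.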

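The counter program is compiled and linked with the RISC-V toolchain prepared earlier, producing an ELF file. The toolchain's objcopy then converts it into a format the FPGA flow can read. (The ihex format is supported in principle, but for some reason Altera ModelSim never read it correctly during simulation, so we use the verilog format and convert it to a MIF file when the software simulation starts.) We plan to keep both code and data in FPGA RAM: the FPGA RAM IP can be initialized from a data file, so the data file generated from the ELF file goes into one FPGA RAM.
This also raises the question of how the memory image is laid out when the ELF file is generated. By default the toolchain links the program so that the entry point is at 0x00010074, leaving the first 64 KB empty. For this application we want to use as few FPGA resources as possible. For example, 8 KB of RAM is enough to run it: 4 KB for code and read-only data, and 4 KB for the program's data area and stack. (This application does not call dynamic memory APIs such as malloc, so no heap is implemented.)
For this reason we modified the default linker script so that the program starts at 0x00000000 and runs entirely within an 8 KB address space. See the git repository for the modified linker script.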
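
The conversion from objcopy's verilog format to a MIF file can be sketched as follows. This is not the converter used in the project. It assumes the byte-wide text that `objcopy -O verilog` produces (`@address` markers followed by hex byte values) and the 2048-word RAM from this article; the function names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define RAM_WORDS 2048  /* 8 KB of 32-bit words */

/* Parse "@address" markers and hex byte values into a little-endian word array.
   Returns the number of bytes stored, or -1 on malformed input or an address
   outside the RAM. Assumes byte addresses and byte-wide data. */
long load_verilog_hex(const char *text, uint32_t *ram)
{
    unsigned long addr = 0;
    long count = 0;
    const char *p = text;

    assert(text != NULL && ram != NULL);
    memset(ram, 0, RAM_WORDS * sizeof(uint32_t));
    while (*p) {
        char *end;
        unsigned long byte;

        if (isspace((unsigned char)*p)) { p++; continue; }
        if (*p == '@') {                      /* new byte address */
            addr = strtoul(p + 1, &end, 16);
            if (end == p + 1) return -1;
            p = end;
            continue;
        }
        byte = strtoul(p, &end, 16);
        if (end == p) return -1;              /* not a hex number */
        if (addr >= RAM_WORDS * 4ul) return -1;
        ram[addr / 4] |= (uint32_t)(byte & 0xff) << (8 * (addr % 4));
        addr++;
        count++;
        p = end;
    }
    return count;
}

/* Write the word array as a MIF file with 32-bit words and hex radix. */
void write_mif(FILE *out, const uint32_t *ram)
{
    int i;
    fprintf(out, "WIDTH=32;\nDEPTH=%d;\n", RAM_WORDS);
    fprintf(out, "ADDRESS_RADIX=HEX;\nDATA_RADIX=HEX;\nCONTENT BEGIN\n");
    for (i = 0; i < RAM_WORDS; i++)
        fprintf(out, "  %X : %08X;\n", i, (unsigned)ram[i]);
    fprintf(out, "END;\n");
}
```

Because the linker script places the program at 0x00000000, byte addresses in the image map directly to word index `addr / 4` in the RAM.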

15.2 From CModel to RTL

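Like SystemC, HDL4SE is used to model digital circuits. Its biggest advantage is that it can draw on C/C++ resources, which makes modeling much more convenient: a model of a digital circuit can be built quickly and simulated, and it can be used to validate the software toolchain and the implemented algorithms. When doing this, we assume that computing and storage resources are unlimited, and we may describe algorithms with C/C++ constructs. A model built this way is usually not RTL and can only serve as a CModel of the digital circuit. Moving from CModel to RTL essentially means gradually replacing the C/C++ constructs used during modeling with HDL4SE modeling constructs. For this RISC-V model, we do it in several steps.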

15.2.1 Memory Implementation

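In the earlier RISC-V implementation, memory was implemented with a C pointer. This is the model declaration from the previous installment: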

MODULE_DECLARE(riscv_core)
    unsigned int *ram;
    unsigned int regs[32];
    unsigned int ramsize;
    unsigned int dstreg;
    unsigned int dstvalue;
    unsigned int ramdstaddr;
    unsigned int ramdstvalue;
    unsigned int ramdstwidth; /* 1, 2, 4 */
END_MODULE_DECLARE(riscv_core)

......

MODULE_INIT(riscv_core)
    int i;
    pobj->ramsize = RAMSIZE * 4;
    pobj->ram = malloc(pobj->ramsize);
    loadExecImage(pobj->ram, pobj->ramsize);
......

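As you can see, ram is implemented with a C pointer, which does not map onto hardware. So our first step is to move the RAM outside the model and access it through the model's read/write interface signals as well.
Specifically, we use Altera's IP generation tool to generate a RAM with a 32-bit word width and 2048 words in total (8 KB). The generated RAM has a single read/write port, and its interface is as follows: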

module ram8kb (
    address,
    byteena,
    clock,
    data,
    wren,
    q);

    input	[10:0]  address;
    input	[3:0]  byteena;
    input	  clock;
    input	[31:0]  data;
    input	  wren;
    output	[31:0]  q;
endmodule

See the Altera documentation for usage details. To simulate it in the HDL4SE system, we build an HDL4SE model with the same interface:

#define riscv_ram_MODULE_VERSION_STRING "0.4.0-20210825.0610 RISCV RAM cell"
#define riscv_ram_MODULE_CLSID CLSID_HDL4SE_RISCV_RAM

#define M_ID(id) riscv_ram##id

IDLIST
    VID(address),
    VID(byteena),
    VID(clock),
    VID(data),
    VID(wren),
    VID(q),
    VID(lastaddr),
END_IDLIST

MODULE_DECLARE(riscv_ram)
    unsigned int* ram;
    unsigned int ramaddr;
    unsigned int ramwrdata;
    unsigned int ramwren;
    unsigned int rambyteena;
END_MODULE_DECLARE(riscv_ram)

DEFINE_FUNC(riscv_ram_gen_q, "address, byteena, data, wren, lastaddr") {
    unsigned int lastaddr;
    lastaddr = vget(lastaddr);
    if (lastaddr < RAMSIZE)
        vput(q, pobj->ram[vget(lastaddr)]);
    else
        vput(q, 0xdeadbeef);
} END_DEFINE_FUNC


DEFINE_FUNC(riscv_ram_clktick, "") {
    pobj->ramwren = vget(wren);
    pobj->ramwrdata = vget(data);
    pobj->rambyteena = vget(byteena);
    pobj->ramaddr = vget(address);
    vput(lastaddr, vget(address));
} END_DEFINE_FUNC

DEFINE_FUNC(riscv_ram_deinit, "") {
    if (pobj->ram != NULL)
        free(pobj->ram);
} END_DEFINE_FUNC

DEFINE_FUNC(riscv_ram_setup, "") {
    if (pobj->ramwren) {
        unsigned int mask =
              (pobj->rambyteena & 1 ? 0x000000ff : 0)
            | (pobj->rambyteena & 2 ? 0x0000ff00 : 0)
            | (pobj->rambyteena & 4 ? 0x00ff0000 : 0)
            | (pobj->rambyteena & 8 ? 0xff000000 : 0);
        pobj->ram[pobj->ramaddr] =  (pobj->ram[pobj->ramaddr] & (~mask))
                                    | (pobj->ramwrdata & mask);
    }
    pobj->ramwren = 0;
} END_DEFINE_FUNC

static int loadExecImage(unsigned char* data, int maxlen)
{
....
}

MODULE_INIT(riscv_ram)
    pobj->ram = malloc(RAMSIZE * 4);
    loadExecImage(pobj->ram, RAMSIZE * 4);
    pobj->ramwren = 0;
    PORT_IN(clock, 1);
    PORT_IN(wren, 1);
    PORT_IN(address, 11);
    PORT_IN(data, 32);
    PORT_IN(byteena, 4);
    GPORT_OUT(q, 32, riscv_ram_gen_q);
    REG(lastaddr, 11);
    CLKTICK_FUNC(riscv_ram_clktick);
    SETUP_FUNC(riscv_ram_setup);
    DEINIT_FUNC(riscv_ram_deinit);
END_MODULE_INIT(riscv_ram)

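This model of course also makes heavy use of C/C++ constructs. However, since it stands for an FPGA IP that Altera generates in the real FPGA design, the description here is used only for HDL4SE simulation, so that does not matter.
The problem we face is that reads from this memory have a one-cycle latency, and a read and a write cannot happen at the same time. The earlier model, which used C/C++ to read and write in the same cycle, therefore has to change. We describe this together with the register file changes below.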
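
The two properties that matter for the CPU design, byte enables and the one-cycle read latency, can be illustrated with a small stand-alone C model. This is a simplified sketch and not the HDL4SE model above. The names `sram_t`, `byteena_to_mask` and `sram_clock` are made up for this example, and it assumes that a read of an address written in the same cycle returns the old contents (the real behavior depends on how the RAM IP is configured).

```c
#include <assert.h>
#include <stdint.h>

#define WORDS 2048

typedef struct {
    uint32_t mem[WORDS];
    uint32_t q;           /* registered read output */
} sram_t;

/* Expand a 4-bit byte enable into a 32-bit bit mask. */
uint32_t byteena_to_mask(unsigned byteena)
{
    return ((byteena & 1) ? 0x000000ffu : 0)
         | ((byteena & 2) ? 0x0000ff00u : 0)
         | ((byteena & 4) ? 0x00ff0000u : 0)
         | ((byteena & 8) ? 0xff000000u : 0);
}

/* One rising clock edge. q is loaded from the address presented in this
   cycle, so the CPU can only use the data in the following cycle.
   A write merges the new data into the word under the byte mask. */
void sram_clock(sram_t *r, unsigned addr, unsigned byteena,
                uint32_t data, int wren)
{
    assert(addr < WORDS);
    r->q = r->mem[addr];  /* assumption: old data on read-during-write */
    if (wren) {
        uint32_t mask = byteena_to_mask(byteena);
        r->mem[addr] = (r->mem[addr] & ~mask) | (data & mask);
    }
}
```

Since there is only one address input, each call either reads or writes one location. This is why the CPU below needs separate cycles for instruction fetch, load and store.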

15.2.2 Register File

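RISC-V has 32 general-purpose 32-bit registers, x0 to x31 (x0 always reads as zero), plus a separate PC. These registers could of course be implemented with HDL4SE registers, but reading and writing a register by its number really requires a multiplexer circuit. To simplify the circuit, we implement the general-purpose registers with a single-port RAM as well, placed outside the CPU (the PC stays inside the core as a register). We therefore add a corresponding interface for register access. The register file is simply a RAM; the interface generated by the Altera IP tool is as follows: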

module regfile (
    address,
    byteena,
    clock,
    data,
    wren,
    q);

    input	[4:0]  address;
    input	[3:0]  byteena;
    input	  clock;
    input	[31:0]  data;
    input	  wren;
    output	[31:0]  q;
endmodule

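To run it in HDL4SE simulation, we again model it with the HDL4SE modeling language as follows. We deliberately added an access signal for each register, so that each register's value can be recorded in the VCD file during simulation: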

#define riscv_regfile_MODULE_VERSION_STRING "0.4.0-20210825.1540 RISCV REGFILE cell"
#define riscv_regfile_MODULE_CLSID CLSID_HDL4SE_RISCV_REGFILE

#define M_ID(id) riscv_regfile##id
IDLIST
    VID(address),
    VID(byteena),
    VID(clock),
    VID(data),
    VID(wren),
    VID(q),
    VID(lastaddr),
    VID(x1),
    ......
    VID(x31),
END_IDLIST

#define REGCOUNT 32

MODULE_DECLARE(riscv_regfile)
    unsigned int ram[REGCOUNT];
    unsigned int ramaddr;
    unsigned int ramwrdata;
    unsigned int ramwren;
    unsigned int rambyteena;
END_MODULE_DECLARE(riscv_regfile)

DEFINE_FUNC(riscv_regfile_gen_q, "address, byteena, data, wren, lastaddr") {
    unsigned int lastaddr;
    lastaddr = vget(lastaddr);
    if (lastaddr == 0)
        vput(q, 0);
    else
    if (lastaddr < REGCOUNT)
        vput(q, pobj->ram[vget(lastaddr)]);
    else {
        printf("We have %d registers only, but you want to read %d\n", REGCOUNT, lastaddr);
    }
} END_DEFINE_FUNC


DEFINE_FUNC(riscv_regfile_clktick, "") {
    pobj->ramwren = vget(wren);
    pobj->ramwrdata = vget(data);
    pobj->rambyteena = vget(byteena);
    pobj->ramaddr = vget(address);
    vput(lastaddr, vget(address));
} END_DEFINE_FUNC

DEFINE_FUNC(riscv_regfile_setup, "") {
    if (pobj->ramwren) {
        unsigned int mask =
              (pobj->rambyteena & 1 ? 0x000000ff : 0)
            | (pobj->rambyteena & 2 ? 0x0000ff00 : 0)
            | (pobj->rambyteena & 4 ? 0x00ff0000 : 0)
            | (pobj->rambyteena & 8 ? 0xff000000 : 0);
        pobj->ram[pobj->ramaddr] =  (pobj->ram[pobj->ramaddr] & (~mask))
                                    | (pobj->ramwrdata & mask);
    }
    pobj->ramwren = 0;
} END_DEFINE_FUNC

DEFINE_FUNC(riscv_regfile_register, "wren, data, byteena, address") {
    int i;
    for (i = 1; i < 32; i++)
        vput_idx(VID(x1) + i - 1, pobj->ram[i]);
} END_DEFINE_FUNC


MODULE_INIT(riscv_regfile)
    pobj->ramwren = 0;
    PORT_IN(clock, 1);
    PORT_IN(wren, 1);
    PORT_IN(address, 5);
    PORT_IN(data, 32);
    PORT_IN(byteena, 4);
    GPORT_OUT(q, 32, riscv_regfile_gen_q);
    REG(lastaddr, 5);
    GWIRE(x1, 32, riscv_regfile_register);
	......
    GWIRE(x31, 32, riscv_regfile_register);

    CLKTICK_FUNC(riscv_regfile_clktick);
    SETUP_FUNC(riscv_regfile_setup);
END_MODULE_INIT(riscv_regfile)

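Because this register file has only one read/write port, and reads and writes cannot happen at the same time, register access in the CPU is restricted: it is no longer possible to read two source registers and write one destination register in a single cycle. As a result, the interface of the RISC-V CPU model changes somewhat, and its implementation changes considerably. Below is the model interface, with the added register read/write signals: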

(* 
  HDL4SE="LCOM", 
  CLSID="638E8BC3-B0E0-41DC-9EDD-D35A39FD8051", 
  softmodule="hdl4se" 
*) 
module riscv_core(
    input wClk, nwReset,
    output          wWrite,
    output [31:0]   bWriteAddr,
    output [31:0]   bWriteData,
    output [3:0]    bWriteMask,
    output reg         wRead,
    output reg [31:0]   bReadAddr,
    input [31:0]    bReadData,
    output reg [4:0]    regno,
    output reg [3:0]    regena,
    output reg [31:0]   regwrdata, 
    output reg          regwren,
    input [31:0]        regrddata
    );

With this, the FPGA top-level module can be written as:

`define USECLOCK50_1
module de1_riscv(
/* port signals omitted */
);
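/* 
We use an FPGA PLL to provide the system clock. By configuring the PLL
parameters we can output different frequencies, so the RISC-V CPU can
run at different clock rates.
*/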
`ifdef USECLOCK50
	wire wClk = CLOCK_50;
`else
	wire clk100MHz, clk75MHz, clklocked;
	clk100M clk100(.refclk(CLOCK_50),
	               .rst(~KEY[3]),
				   .outclk_0(clk100MHz), 
				   .outclk_1(clk75MHz),
				   .locked(clklocked));
				   
	wire wClk = clk100MHz;
`endif
/* The reset signal comes from KEY[3]; pressing it resets the whole system */
	wire nwReset = KEY[3];

    wire wWrite, wRead;
    wire [31:0] bWriteAddr, bWriteData, bReadAddr, bReadData, bReadDataRam, bReadDataKey;
    wire [3:0]  bWriteMask;

	assign bReadDataKey = {18'b0, KEY, SW};

	reg readcmd;
	reg [31:0] readaddr;

	wire wRead_out = readcmd;
    wire [31:0] bReadAddr_out = readaddr;
    /* Register the read strobe and read address for one cycle; they select the read source */
	always @(posedge wClk) begin
		if (!nwReset) begin
			readcmd <= 1'b0;
			readaddr <= 32'b0;
		end else begin
			readcmd <= wRead;
			readaddr <= bReadAddr;
		end
	end
    
    /* 
    The value read by the CPU comes from the keys or from the RAM, depending on the address.
    Other peripherals can be attached here in the future.
    */
    assign bReadData = 
            ((bReadAddr_out & 32'hffffff00) == 32'hf0000000) ? bReadDataKey : (
            ((bReadAddr_out & 32'hffffc000) == 32'h00000000) ? bReadDataRam : (0)
            );

	/* The RAM is single-ported and writes take priority: a pending write is performed, otherwise we read */
    wire [10:0] ramaddr;
    assign ramaddr = wWrite?bWriteAddr[12:2]:bReadAddr[12:2];

    wire [4:0]  regno;
    wire [3:0]  regena;
    wire [31:0] regwrdata;
    wire        regwren;
    wire [31:0] regrddata;

    regfile    regs(regno, regena, wClk, regwrdata, regwren, regrddata);
    ram8kb     ram(ramaddr, ~bWriteMask, wClk, bWriteData, 
                   ((bWriteAddr & 32'hffffc000) == 0)?wWrite:1'b0, bReadDataRam);
	riscv_core core(wClk, nwReset, wWrite, bWriteAddr, bWriteData, bWriteMask, 
	                wRead, bReadAddr, bReadData,
                    regno, regena, regwrdata, regwren, regrddata);

	reg [6:0] led0;
	reg [6:0] led1;
	reg [6:0] led2;
	reg [6:0] led3;
	reg [6:0] led4;
	reg [6:0] led5;
	assign HEX0 = ~led0;
	assign HEX1 = ~led1;
	assign HEX2 = ~led2;
	assign HEX3 = ~led3;
	assign HEX4 = ~led4;
	assign HEX5 = ~led5;

	always @(posedge wClk) begin
		if (!nwReset) begin
			led0 <= 8'h3f;
			led1 <= 8'h3f;
			led2 <= 8'h3f;
			led3 <= 8'h3f;
			led4 <= 8'h3f;
			led5 <= 8'h3f;
		end else begin
			if (SW[8]) begin
				led0 <= 8'h06;
				led1 <= 8'h06;
				led2 <= 8'h06;
				led3 <= 8'h07;
				led4 <= 8'h07;
				led5 <= 8'h07;				
			end
			else if (SW[9]) begin
				led0 <= 8'h3f;
				led1 <= 8'h06;
				led2 <= 8'h5b;
				led3 <= 8'h4f;
				led4 <= 8'h66;
				led5 <= 8'h6d;				
			end
			else if (wWrite && ((bWriteAddr & 32'hffffff00) == 32'hf0000000)) begin
				if (bWriteAddr[7:0] == 8'h10) begin
					led0 <= bWriteData[6:0];
					led1 <= bWriteData[14:8];
					led2 <= bWriteData[22:16];
					led3 <= bWriteData[30:24];
				end else if (bWriteAddr[7:0] == 8'h14) begin
					led4 <= bWriteData[6:0];
					led5 <= bWriteData[14:8];
				end
			end
		end
	end
	
endmodule
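
The address decoding in this top-level module can be restated in C, which is convenient when checking which addresses the software may use. The sketch mirrors the masks in the Verilog above; the enum and function names are introduced here for illustration only.

```c
#include <stdint.h>

typedef enum { DEV_NONE, DEV_RAM, DEV_KEYS, DEV_LED_LOW, DEV_LED_HIGH } device_t;

/* Read decoding, mirroring the bReadData multiplexer. */
device_t decode_read(uint32_t addr)
{
    if ((addr & 0xffffff00u) == 0xf0000000u) return DEV_KEYS;
    if ((addr & 0xffffc000u) == 0x00000000u) return DEV_RAM;
    return DEV_NONE;
}

/* Write decoding, mirroring the RAM write enable and the LED register logic. */
device_t decode_write(uint32_t addr)
{
    if ((addr & 0xffffff00u) == 0xf0000000u) {
        if ((addr & 0xffu) == 0x10) return DEV_LED_LOW;   /* HEX0..HEX3 */
        if ((addr & 0xffu) == 0x14) return DEV_LED_HIGH;  /* HEX4..HEX5 */
        return DEV_NONE;
    }
    if ((addr & 0xffffc000u) == 0x00000000u) return DEV_RAM;
    return DEV_NONE;
}

/* Word index presented to the 2048-word RAM: address bits 12..2. */
unsigned ram_word_index(uint32_t addr)
{
    return (addr >> 2) & 0x7ffu;
}
```

Two details follow directly from the masks. A write to 0xF0000018 (`leddata[2]` in the counter program) is not decoded by any device. The RAM select mask 0xffffc000 covers 16 KB while the RAM holds 8 KB, so addresses 0x2000 to 0x3fff alias onto the same words.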

15.2.3 State Machine inside the RISC-V CPU

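To cope with moving the memory and the registers outside the core, we change the RISC-V CPU implementation from one instruction per cycle to several cycles per instruction, driven by an internal state machine. Since instructions differ in nature, we allow different instructions to take a different number of cycles and to pass through different states. The internal states of the RISC-V CPU are: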

enum riscv_core_state {
    RISCVSTATE_INIT_REGX1,
    RISCVSTATE_INIT_REGX2,
    RISCVSTATE_READ_INST,
    RISCVSTATE_READ_RS1,
    RISCVSTATE_READ_RS2,
    RISCVSTATE_STORE_RS2,
    RISCVSTATE_EXEC_INST,
    RISCVSTATE_WRITE_RD,
    RISCVSTATE_WAIT_LD,
    RISCVSTATE_WAIT_ST,
    RISCVSTATE_WAIT_DIV,
};

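To eliminate the part that was implemented in C, we added two register initialization states, which initialize x1 (entry address) and x2 (memory size).
Each state is described below: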

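  1. RISCVSTATE_INIT_REGX1: the state after system reset. It sets register x1 to the program entry address (0x8c) through the register write interface, then moves to RISCVSTATE_INIT_REGX2.
  2. RISCVSTATE_INIT_REGX2: sets register x2 (the stack pointer) to the memory size (2048 * 4 - 16) through the register write interface, then moves to RISCVSTATE_READ_INST.
  3. RISCVSTATE_READ_INST: sends the PC value to the RAM read interface, starting a RAM read cycle that returns the instruction, then moves to RISCVSTATE_READ_RS1.
  4. RISCVSTATE_READ_RS1: stores the fetched instruction in the instr register and at the same time decodes the rs1 number and sends it to the register read port, then moves to RISCVSTATE_READ_RS2.
  5. RISCVSTATE_READ_RS2: stores the register value that was read in the rs1 register, then decodes the rs2 number from instr and sends it to the register read port, then moves to RISCVSTATE_STORE_RS2.
  6. RISCVSTATE_STORE_RS2: stores the register value that was read in the rs2 register, then moves to RISCVSTATE_EXEC_INST.
  7. RISCVSTATE_EXEC_INST: executes the instruction and transitions according to its type. A jump or branch instruction sets the new PC. An alu/alui instruction sets the write-back register and value and moves to RISCVSTATE_WRITE_RD. The more complex DIV/MOD instructions set the number of wait cycles and move to RISCVSTATE_WAIT_DIV. A LOAD instruction issues a memory read request and moves to RISCVSTATE_WAIT_LD. A STORE instruction asserts the RAM write signal and moves to RISCVSTATE_WAIT_ST.
  8. RISCVSTATE_WRITE_RD: starts a register write cycle with the write-back register number and value set earlier, then moves to RISCVSTATE_READ_INST.
  9. RISCVSTATE_WAIT_LD: places the value returned by the RAM read into the write-back register, then moves to RISCVSTATE_WRITE_RD.
  10. RISCVSTATE_WAIT_ST: moves to RISCVSTATE_READ_INST.
  11. RISCVSTATE_WAIT_DIV: decrements the counter that waits for the DIV result. When the counter reaches zero, the result is placed in the write-back register and the state moves to RISCVSTATE_WRITE_RD.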

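Several states in this state machine are somewhat redundant. With this implementation an instruction takes at least 6 cycles to execute, which is wasteful. But this description is simple to implement, easy to read, and easy for beginners to follow. To use this core in earnest, the state machine would need to be optimized.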
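
The cycle counts implied by the state list can be tallied with a few lines of C. The instruction classes and the function name are made up for this sketch. The initial wait counter value 11 is taken from the state transition code in this article.

```c
typedef enum { INSN_ALU, INSN_BRANCH, INSN_LOAD, INSN_STORE, INSN_DIV } insn_class_t;

#define DIV_WAIT_START 11  /* initial value of the DIV wait counter */

/* Clock cycles one instruction of the given class spends in the state machine. */
unsigned cycles_for(insn_class_t c)
{
    /* READ_INST, READ_RS1, READ_RS2, STORE_RS2, EXEC_INST */
    unsigned cycles = 5;

    switch (c) {
    case INSN_LOAD:
        cycles += 2;                    /* WAIT_LD, WRITE_RD */
        break;
    case INSN_STORE:
        cycles += 1;                    /* WAIT_ST */
        break;
    case INSN_DIV:
        cycles += DIV_WAIT_START + 1;   /* WAIT_DIV while the counter runs 11..0 */
        cycles += 1;                    /* WRITE_RD */
        break;
    default:
        cycles += 1;                    /* WRITE_RD (also taken by jumps and branches) */
        break;
    }
    return cycles;
}
```

Under these assumptions an ALU instruction or a store takes 6 cycles, a load 7, and a DIV/MOD instruction 18.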

15.2.4 Reworking the Model Functions

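When we implemented the RISC-V CPU in the previous installment, we used calls between C functions, which is not easy to implement in RTL either. So we modify each function and split the functionality of the clktick and setup functions into register update functions, so that each register is bound to its own update function and there are no calls between functions. The reworked functions are fairly small and each corresponds to a wire or a register. As a rule, wires and registers are not generated in the same function. This meets the requirements of RTL. Here are a few typical functions.
The state transition function: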

DEFINE_FUNC(riscv_core_gen_state, "state, instr, nwReset") {
    if (vget(nwReset) == 0) {
        vput(state, RISCVSTATE_INIT_REGX1);
    }
    else {
        int state = vget(state);
        switch (state) {
        case RISCVSTATE_INIT_REGX1: {
            vput(state, RISCVSTATE_INIT_REGX2);
        }break;
        case RISCVSTATE_INIT_REGX2: {
            vput(state, RISCVSTATE_READ_INST);
        }break;
        case RISCVSTATE_READ_INST: {
            vput(state, RISCVSTATE_READ_RS1);
        }break;
        case RISCVSTATE_READ_RS1: {
            vput(state, RISCVSTATE_READ_RS2);
        }break;
        case RISCVSTATE_READ_RS2: {
            vput(state, RISCVSTATE_STORE_RS2);
        }break;
        case RISCVSTATE_STORE_RS2: {
            vput(state, RISCVSTATE_EXEC_INST);
        }break;
        case RISCVSTATE_WRITE_RD: {
            vput(state, RISCVSTATE_READ_INST);
        }break;
        case RISCVSTATE_EXEC_INST: {
            unsigned int instr = vget(instr);
            unsigned int opcode = instr & 0x7f;
            unsigned int func3 = (instr >> 12) & 0x7;
            opcode >>= 2;
            if (opcode == 0x00)
                vput(state, RISCVSTATE_WAIT_LD);//ld
            else if (opcode == 0x08) 
                vput(state, RISCVSTATE_WAIT_ST);//st
            else if (opcode == 0x0c && (instr & (1 << 25)) && (func3 & 4)) {
                vput(state, RISCVSTATE_WAIT_DIV);
                vput(divclk, 11);
            }
            else 
                vput(state, RISCVSTATE_WRITE_RD);
        }break;
        case RISCVSTATE_WAIT_LD: {
            vput(state, RISCVSTATE_WRITE_RD);
        }break;
        case RISCVSTATE_WAIT_ST: {
            vput(state, RISCVSTATE_READ_INST);
        }break;
        case RISCVSTATE_WAIT_DIV: {
            if (vget(divclk) == 0)
                vput(state, RISCVSTATE_WRITE_RD);
            else
                vput(divclk, vget(divclk) - 1);
        }break;
        }
    }
} END_DEFINE_FUNC

We decode the imm field of instr in the READ_RS2 cycle:

DEFINE_FUNC(riscv_core_gen_imm, "instr, state") {
    /* Generate imm in the RISCVSTATE_READ_RS2 cycle */
    if (vget(state) == RISCVSTATE_READ_RS2) {
        unsigned int instr;
        unsigned int opcode;
        instr = vget(instr);
        opcode = instr & 0x7f;
        opcode >>= 2;
        switch (opcode) {
        case 0x0d: {
            vput(imm, instr & 0xfffff000); 
        }break;
        case 0x05: {
            vput(imm, instr & 0xfffff000);
        }break;
        case 0x1b: {
            unsigned int imm;
            imm = (instr & (1 << 20)) ? (1 << 11) : 0;
            imm |= (instr >> 20) & 0x7fe;
            imm |= instr & 0xff000;
            imm |= instr & (1 << 31) ? 0x100000 : 0;
            imm = sign_expand(imm, 20);
            vput(imm, imm);
        }break;
        case 0x19: {
            unsigned int imm;
            imm = instr >> 20;
            imm = sign_expand(imm, 11);
            vput(imm, imm);
        }break;
        case 0x18: {
            unsigned int imm;
            unsigned int immh;
            unsigned int immd;

            immh = instr >> 25;
            immd = (instr >> 7) & 0x1f;
            imm = immd & 0x1e;
            imm |= (immh & 0x3f) << 5;
            imm |= (immd & 1) << 11;
            imm |= (immh & 0x40) ? (1 << 12) : 0;
            imm = sign_expand(imm, 12);
            vput(imm, imm);
        }break;
        case 0x00: {
            unsigned int imm;
            imm = instr >> 20;
            imm = sign_expand(imm, 11);
            vput(imm, imm);
        }break;
        case 0x08: {
            unsigned int imm;
            imm = ((instr >> 20) & 0xfe0) | ((instr >> 7) & 0x1f);
            imm = sign_expand(imm, 11);
            vput(imm, imm);
        }break;
        case 0x04: {
            unsigned int imm;
            imm = instr >> 20;
            imm = sign_expand(imm, 11);
            vput(imm, imm);
        }break;
        }
    }
} END_DEFINE_FUNC

After this rework, the HDL4SE model contains few C-style constructs and meets the requirements of RTL, so converting it to Verilog follows naturally.
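
The immediate decoding in riscv_core_gen_imm can be checked on a host PC against known instruction encodings. The sketch below restates the decoding as a plain C function. The body of `sign_expand` is not shown in this article, so the version here is an assumption about what it does (replicate the given sign bit upward), and `decode_imm` is a name introduced for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed behavior of sign_expand: replicate bit `signbit` of v into all higher bits. */
uint32_t sign_expand(uint32_t v, unsigned signbit)
{
    uint32_t sign;
    assert(signbit < 31);
    sign = 1u << signbit;
    v &= (sign << 1) - 1u;     /* keep bits signbit..0 */
    return (v ^ sign) - sign;  /* wraps around for negative values */
}

/* Decode the immediate of a 32-bit RISC-V instruction, by opcode bits 6..2. */
uint32_t decode_imm(uint32_t instr)
{
    switch ((instr & 0x7f) >> 2) {
    case 0x0d: /* LUI */
    case 0x05: /* AUIPC */
        return instr & 0xfffff000u;
    case 0x1b: { /* JAL */
        uint32_t imm = (instr >> 20) & 0x7fe;            /* imm[10:1]  */
        imm |= (instr & (1u << 20)) ? (1u << 11) : 0;    /* imm[11]    */
        imm |= instr & 0xff000u;                         /* imm[19:12] */
        imm |= (instr & (1u << 31)) ? (1u << 20) : 0;    /* imm[20]    */
        return sign_expand(imm, 20);
    }
    case 0x18: { /* branches */
        uint32_t imm = (instr >> 7) & 0x1e;              /* imm[4:1]   */
        imm |= (instr >> 20) & 0x7e0;                    /* imm[10:5]  */
        imm |= (instr & (1u << 7)) ? (1u << 11) : 0;     /* imm[11]    */
        imm |= (instr & (1u << 31)) ? (1u << 12) : 0;    /* imm[12]    */
        return sign_expand(imm, 12);
    }
    case 0x08: /* stores */
        return sign_expand(((instr >> 20) & 0xfe0) | ((instr >> 7) & 0x1f), 11);
    case 0x19: /* JALR */
    case 0x00: /* loads */
    case 0x04: /* ALU with immediate */
        return sign_expand(instr >> 20, 11);
    default:
        return 0;
    }
}
```

For example, 0xfff00093 (`addi x1, x0, -1`) decodes to 0xffffffff and 0xffdff06f (`jal x0, -4`) decodes to 0xfffffffc.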

15.3 From the HDL4SE Model to Verilog

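Rewriting the interfaces and functions of the RTL-level HDL4SE model one by one in Verilog is fairly straightforward. For convenience, we use Altera IP for multiplication and division in the ALU instructions. Multiplication uses the DSP blocks of the Altera FPGA. Division is generated directly with the IP generator. Because division is complex, a single-cycle configuration reaches only 10 MHz, so we chose a multi-cycle pipeline: 3 stages reach 25 MHz and 4 stages reach 34 MHz. Since division is used rarely and can afford to be slow, we simply set it to a 12-stage pipeline (at the cost of more registers), so that the divider is no longer the bottleneck for the clock frequency. According to the synthesis results the whole design reaches 85 MHz in the worst case, and the counter application still runs correctly at 100 MHz at room temperature. The complete RISC-V core in Verilog is listed below: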


/*
** HDL4SE: software Verilog synthesis and simulation platform
** Copyright (C) 2021-2021, raoxianhong<raoxianhong@163.net>
** LCOM: lightweight component object model
** Copyright (C) 2021-2021, raoxianhong<raoxianhong@163.net>
** All rights reserved.
**
** Redistribution and use in source and binary forms, with or without
** modification, are permitted provided that the following conditions are met:
**
** * Redistributions of source code must retain the above copyright notice,
**   this list of conditions and the following disclaimer.
** * Redistributions in binary form must reproduce the above copyright notice,
**   this list of conditions and the following disclaimer in the documentation
**   and/or other materials provided with the distribution.
** * The name of the author may be used to endorse or promote products
**   derived from this software without specific prior written permission.
**
** THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
** AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
** IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
** ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
** LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
** CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
** SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
** INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
** CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
** ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
** THE POSSIBILITY OF SUCH DAMAGE.
*/
/* riscv_core.v */

`define RISCVSTATE_INIT_REGX1   0
`define RISCVSTATE_INIT_REGX2   1
`define RISCVSTATE_READ_INST    2
`define RISCVSTATE_READ_RS1     3
`define RISCVSTATE_READ_RS2     4
`define RISCVSTATE_STORE_RS2    5
`define RISCVSTATE_WRITE_RD     6
`define RISCVSTATE_EXEC_INST    7
`define RISCVSTATE_WAIT_LD      8
`define RISCVSTATE_WAIT_ST      9
`define RISCVSTATE_WAIT_DIV    10

`define RAMSIZE  2048

(* 
  HDL4SE="LCOM", 
  CLSID="638E8BC3-B0E0-41DC-9EDD-D35A39FD8051", 
  softmodule="hdl4se" 
*) 
module riscv_core(
    input wClk, nwReset,
    output          wWrite,
    output [31:0]   bWriteAddr,
    output [31:0]   bWriteData,
    output [3:0]    bWriteMask,
    output reg         wRead,
    output reg [31:0]   bReadAddr,
    input [31:0]    bReadData,
    output reg [4:0]    regno,
    output reg [3:0]    regena,
    output reg [31:0]   regwrdata, 
    output reg          regwren,
    input [31:0]        regrddata
    );

    reg [31:0]  pc; //GREG(pc, 32, riscv_core_reg_gen_pc);
    reg [31:0]  instr; //GREG(instr, 32, riscv_core_reg_gen_instr);
    reg [31:0]  rs1; //GREG(rs1, 32, riscv_core_reg_gen_rs1);
    reg [31:0]  rs2; //GREG(rs2, 32, riscv_core_reg_gen_rs2);
    reg         write; //GREG(write, 1, riscv_core_gen_write);
    reg [31:0]  writeaddr; //GREG(writeaddr, 32, riscv_core_gen_write);
    reg [31:0]  writedata; //GREG(writedata, 32, riscv_core_gen_write);
    reg [3:0]   writemask; //GREG(writemask, 4, riscv_core_gen_write);
    reg [4:0]   readreg; //GREG(readreg, 5, riscv_core_reg_gen_readreg);
    reg [3:0]   state; //GREG(state, 4, riscv_core_gen_state);
    reg [31:0]  imm; //GREG(imm, 32, riscv_core_gen_imm);
    reg [4:0]   dstreg; //GREG(dstreg, 5, riscv_core_gen_dstreg);
    reg [31:0]  dstvalue; //GREG(dstvalue, 32, riscv_core_gen_dstreg);
    reg [1:0]   ldaddr; //GREG(ldaddr, 2, riscv_core_gen_ldaddr);
    reg [4:0]   divclk;

    assign wWrite = write;
    assign bWriteAddr = writeaddr;
    assign bWriteData = writedata;
    assign bWriteMask = writemask;

    wire [4:0]  opcode = instr[6:2];
    wire [4:0]  rd = instr[11:7];
    wire [2:0]  func3 = instr[14:12];
    reg cond;
    wire signed [31:0] rs1_s = rs1;
    wire signed [31:0] rs2_s = rs2;
    wire signed [31:0] imm_s = imm;
    wire [31:0] add_result;
    wire [31:0] sub_result;
    wire [63:0] mul_result;
    wire [63:0] muls_result;
    wire [71:0] mulsu_result;
    wire [31:0] div_result_r, mod_result_r, divs_result_r, mods_result_r;
    wire [31:0] div_result, mod_result, divs_result, mods_result;
    adder add(rs1, rs2, add_result);
    suber sub(rs1, rs2, sub_result);
    mult mul(rs1, rs2, mul_result);
    mult_s mul_s(rs1, rs2, muls_result);
    mulsu mul_su(rs1, {8'b0, rs2}, mulsu_result);
    div div(wClk, rs2, rs1, div_result_r, mod_result_r);
    div_s divs(wClk, rs2, rs1, divs_result_r, mods_result_r);

    assign div_result = (rs2 == 0) ? 32'hffffffff : div_result_r;
    assign divs_result = (rs2 == 0) ? 32'hffffffff : divs_result_r;
    assign mod_result = (rs2 == 0) ? rs1 : mod_result_r;
    assign mods_result = (rs2 == 0) ? rs1 : mods_result_r;

    /* cond */
    always @(rs1 or rs2 or rs1_s or rs2_s or func3) 
    case(func3)
     0:/*beq*/ cond = rs1 == rs2;
     1:/*bne*/ cond = rs1 != rs2;
     4:/*blt*/ cond = rs1_s < rs2_s;
     5:/*bge*/ cond = rs1_s >= rs2_s;
     6:/*bltu*/cond = rs1 < rs2; 
     7:/*bgeu*/cond = rs1 >= rs2; 
     default: cond = 1'b0;
     endcase
    

    //DEFINE_FUNC(riscv_core_reg_gen_pc, "nwReset, state, instr, pc, rs1, imm, regrddata") {
    always @(posedge wClk)
    if (!nwReset) begin
        pc <= 32'h00000074;
    end else begin
        if (state == `RISCVSTATE_EXEC_INST) begin
            case (opcode) 
            5'h1b: pc <= pc + imm;
            5'h19: pc <= rs1 + imm;
            5'h18: pc <= cond ? pc + imm : pc + 4;
            default: pc <= pc + 4;
            endcase
        end
    end

    //DEFINE_FUNC(riscv_core_reg_gen_instr, "state, bReadData") {
    always @(posedge wClk)
    if (state == `RISCVSTATE_READ_RS1)
        instr <= bReadData;

    //DEFINE_FUNC(riscv_core_reg_gen_readreg, "state, instr") {
    always @(posedge wClk)
    if (state == `RISCVSTATE_EXEC_INST)
        if (opcode == 5'h00)
           readreg <= rd;

    //DEFINE_FUNC(riscv_core_reg_gen_rs1, "state, regrddata") {
    always @(posedge wClk)
    if (state == `RISCVSTATE_READ_RS2)
        rs1 <= regrddata;

    //DEFINE_FUNC(riscv_core_reg_gen_rs2, "state, regrddata") {
    always @(posedge wClk)
    if (state == `RISCVSTATE_STORE_RS2)
        rs2 <= regrddata;

    //DEFINE_FUNC(riscv_core_gen_write, "nwReset, state, pc, instr, rs1, regrddata, imm") {
    always @(posedge wClk)
    if (!nwReset) begin
        write <= 0;
    end else if (state == `RISCVSTATE_EXEC_INST) begin
        write <= 0;
        if (opcode == 5'h08) begin
            /* RISC-V allows misaligned accesses, but we assume the write falls within one 32-bit word */
            writeaddr <= rs1 + imm;
            writemask <= 4'h0;
            writedata <= rs2;
            write <= 1'b1;
            case (func3)
            0:/*sb*/ begin
                case ((rs1 + imm) & 32'h3) /* byte offset of the new store address, not the old writeaddr */
                0: begin
                    writemask <= 4'he; 
                    writedata <= rs2; 
                end
                1: begin
                    writemask <= 4'hd; 
                    writedata <= {rs2[23:0], 8'b0}; 
                end
                2: begin
                    writemask <= 4'hb; 
                    writedata <= {rs2[15:0], 16'b0};
                end
                3: begin
                    writemask <= 4'h7; 
                    writedata <= {rs2[7:0], 24'b0}; 
                end
                endcase
            end
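            1:/*sh*/ begin
                case ((rs1 + imm) & 32'h3) /* byte offset of the new store address */
                0: begin
                    writemask <= 4'hc; 
                    writedata <= rs2;
                end
                1: begin
                    writemask <= 4'h9; /* active-low mask: bytes 1 and 2 */
                    writedata <= {rs2[23:0], 8'b0}; 
                end
                2: begin
                    writemask <= 4'h3; /* active-low mask: bytes 2 and 3 */
                    writedata <= {rs2[15:0], 16'b0};
                end
                endcase
            end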
            endcase
        end
    end else begin
        write <= 0;
    end

    //DEFINE_FUNC(riscv_core_gen_state, "state, instr, nwReset") {
    always @(posedge wClk)
    if (!nwReset) begin
        state <= `RISCVSTATE_INIT_REGX1;
    end else begin
        case (state)
        `RISCVSTATE_INIT_REGX1: state <= `RISCVSTATE_INIT_REGX2;
        `RISCVSTATE_INIT_REGX2: state <= `RISCVSTATE_READ_INST;
        `RISCVSTATE_READ_INST:  state <= `RISCVSTATE_READ_RS1;
        `RISCVSTATE_READ_RS1:   state <= `RISCVSTATE_READ_RS2;
        `RISCVSTATE_READ_RS2:   state <= `RISCVSTATE_STORE_RS2;
        `RISCVSTATE_STORE_RS2:   state <= `RISCVSTATE_EXEC_INST;
        `RISCVSTATE_WRITE_RD:   state <= `RISCVSTATE_READ_INST;
        `RISCVSTATE_EXEC_INST: begin
            if (opcode == 5'h00)
                state <= `RISCVSTATE_WAIT_LD;
            else if (opcode == 5'h08) 
                state <= `RISCVSTATE_WAIT_ST;
            else if (opcode == 5'h0c && instr[25] && func3[2]) begin
                state <= `RISCVSTATE_WAIT_DIV;
                divclk <= 11;
            end else
                state <= `RISCVSTATE_WRITE_RD;
        end
        `RISCVSTATE_WAIT_LD:    state <= `RISCVSTATE_WRITE_RD;
        `RISCVSTATE_WAIT_ST:    state <= `RISCVSTATE_READ_INST;
        `RISCVSTATE_WAIT_DIV:  begin
            if (divclk == 0)
                state <= `RISCVSTATE_WRITE_RD;
            else
                divclk <= divclk - 1;
        end
        endcase
    end

    //DEFINE_FUNC(riscv_core_gen_imm, "instr, state") {
    /* Generate imm in the RISCVSTATE_READ_RS2 cycle */
    always @(posedge wClk)
    if (state == `RISCVSTATE_READ_RS2) begin
        case (opcode)
        5'h0d: imm <= {instr[31:12], 12'b0}; 
        5'h05: imm <= {instr[31:12], 12'b0};
        5'h1b: imm <= {{12{instr[31]}}, instr[19:12], instr[20], instr[30:21], 1'b0};
        5'h19: imm <= {{20{instr[31]}}, instr[31:20]};
        5'h18: imm <= {{20{instr[31]}}, instr[7], instr[30:25], instr[11:8], 1'b0};
        5'h00: imm <= {{20{instr[31]}}, instr[31:20]};
        5'h08: imm <= {{20{instr[31]}}, instr[31:25], instr[11:7]};
        5'h04: imm <= {{20{instr[31]}}, instr[31:20]};
        endcase
    end

    //DEFINE_FUNC(riscv_core_reg_wr_sig, "state, dstreg, dstvalue, bReadData, instr, regrddata, pc") {
    always @(state or dstreg or dstvalue or bReadData or instr or regrddata or pc)
    case (state)
    `RISCVSTATE_READ_RS1: begin
        regno = bReadData[19:15]; /* instr */
        regwren = 0;
        regena = 0;
        regwrdata = 0;
    end
    `RISCVSTATE_READ_RS2: begin
        regno = instr[24:20]; 
        regwren = 0;
        regena = 0;
        regwrdata = 0;
    end
    `RISCVSTATE_WRITE_RD: begin
        regwren = (dstreg != 0) ? 1 : 0;
        regno = dstreg;
        regena = 4'hf;
        regwrdata = dstvalue;
    end
    `RISCVSTATE_INIT_REGX1: begin
        regwren = 1;
        regno = 1;
        regena = 4'hf;
        regwrdata = 32'h8c;
    end
    `RISCVSTATE_INIT_REGX2: begin
        regwren = 1;
        regno = 2;
        regena = 4'hf;
        regwrdata = `RAMSIZE * 4 - 16;
    end
    default: begin
        regwren = 0;
        regno = 0;
        regena = 0;
        regwrdata = 0;
    end
    endcase

    //DEFINE_FUNC(riscv_core_gen_ldaddr, "state, pc, instr, rs1") {
    always @(posedge wClk)
    if (state == `RISCVSTATE_READ_INST) begin
        ldaddr <= pc;
    end else if (state == `RISCVSTATE_EXEC_INST) begin
        if (opcode == 5'h00) begin
            /* ld inst */
            ldaddr <= rs1 + imm;
        end
    end

    //DEFINE_FUNC(riscv_core_gen_dstreg, "state, instr, ldaddr, readreg, bReadData, pc, rs1, regrddata, imm") {
    always @(posedge wClk)
    case (state)
    `RISCVSTATE_WAIT_LD: begin
        dstreg <= readreg;
        case (func3)
        0: begin
           case (ldaddr[1:0]) /* byte lane within the word */
           0: dstvalue <= {{24{bReadData[7]}}, bReadData[7:0]};
           1: dstvalue <= {{24{bReadData[15]}}, bReadData[15:8]};
           2: dstvalue <= {{24{bReadData[23]}}, bReadData[23:16]};
           3: dstvalue <= {{24{bReadData[31]}}, bReadData[31:24]};
           endcase
        end
        1: begin
           case (ldaddr[1:0]) /* halfword offset within the word */
           0: dstvalue <= {{16{bReadData[15]}}, bReadData[15:0]};
           1: dstvalue <= {{16{bReadData[23]}}, bReadData[23:8]};
           2: dstvalue <= {{16{bReadData[31]}}, bReadData[31:16]};
           3: dstvalue <= 32'hdeadbeef;
           endcase
        end
        2: dstvalue <= bReadData;
        4: begin
           case (ldaddr[1:0]) /* byte lane within the word */
           0: dstvalue <= {24'b0, bReadData[7:0]};
           1: dstvalue <= {24'b0, bReadData[15:8]};
           2: dstvalue <= {24'b0, bReadData[23:16]};
           3: dstvalue <= {24'b0, bReadData[31:24]};
           endcase
        end
        5: begin
           case (ldaddr[1:0]) /* halfword offset within the word */
           0: dstvalue <= {16'b0, bReadData[15:0]};
           1: dstvalue <= {16'b0, bReadData[23:8]};
           2: dstvalue <= {16'b0, bReadData[31:16]};
           3: dstvalue <= 32'hdeadbeef;
           endcase
        end
        endcase
    end
    `RISCVSTATE_WAIT_DIV: if (divclk == 0) begin
        dstreg <= 0;
        case (func3[1:0])
        0: begin //div
            dstreg <= rd;
            if (rs2 == 0)
                dstvalue <= 32'hffffffff;
            else
                dstvalue <= divs_result;
        end
        1: begin //divu
            dstreg <= rd;
            if (rs2 == 0)
                dstvalue <= 32'hffffffff;
            else
                dstvalue <= div_result;
        end
        2: begin //rem
            dstreg <= rd;
            if (rs2 == 0)
                dstvalue <= rs1;
            else
                dstvalue <= mods_result;
        end
        3: begin //remu
            dstreg <= rd;
            if (rs2 == 0)
                dstvalue <= rs1;
            else
                dstvalue <= mod_result;
        end
        endcase
    end 
    `RISCVSTATE_EXEC_INST: begin
        dstreg <= rd;
        case (opcode)
        5'h0d: begin
            dstvalue <= imm;
        end
        5'h05: begin
            dstvalue <= imm + pc;
        end
        5'h1b: begin
            dstvalue <= pc + 4;
        end 
        5'h19: begin
            dstvalue <= pc + 4;
        end 
        5'h04: begin /* alui */
            case (func3)
            0:/*addi*/ dstvalue <= rs1 + imm;
            1:/*slli*/ dstvalue <= rs1 << imm[4:0];
            2:/*slti*/ dstvalue <= (rs1_s < imm_s) ? 1 : 0;
            3:/*sltiu*/dstvalue <= (rs1 < imm) ? 1 : 0;
            4:/*xori*/ dstvalue <= rs1 ^ imm;
            5:/*srli/srai: srai needs >>> on a signed operand (rs1_s is assumed to be declared signed)*/
                       if (instr[30])
                           dstvalue <= rs1_s >>> imm[4:0];
                       else
                           dstvalue <= rs1 >> imm[4:0];
            6:/*ori*/  dstvalue <= rs1 | imm;
            7:/*andi*/ dstvalue <= rs1 & imm;
            default: begin dstreg <= 0; dstvalue<=0; end
            endcase
        end
        5'h0c: begin /*alu*/
            if (instr[25]) begin /* is MUL/DIV instr*/
                case (func3)
                0: begin //mul 
                    dstvalue <= muls_result[31:0];
                end
                1: begin //mulh 
                    dstvalue <= muls_result[63:32];
                end
                2: begin //mulhsu 
                    dstvalue <= mulsu_result[63:32]; 
                end
                3: begin //mulhu 
                   dstvalue <= mul_result[63:32];
                end
                
                default: begin //div
                    dstreg <= 0;
                    dstvalue <= 0;//divs_result;
                end
                endcase
            end else begin
                case (func3) 
                0: begin
                    if (instr[30])
                         dstvalue <= sub_result;
                    else
                         dstvalue <= add_result;
                end
                1: begin //sll 
                    dstvalue <= rs1 << rs2[4:0];
                end
                2: begin //slt 
                    dstvalue <= (rs1_s < rs2_s) ? 1 : 0;
                end
                3: begin //sltu 
                    dstvalue <= (rs1 < rs2) ? 1 : 0;
                end
                4: begin //xor
                    dstvalue <= rs1 ^ rs2;
                end
                5: begin //srl/sra: instr[30] selects the arithmetic shift
                    if (instr[30])
                        dstvalue <= rs1_s >>> rs2[4:0];
                    else
                        dstvalue <= rs1 >> rs2[4:0];
                end
                6: begin //or
                    dstvalue <= rs1 | rs2;
                end
                7: begin //and
                    dstvalue <= rs1 & rs2;
                end
                endcase
            end
        end
        default: begin
            dstreg <= 0;
            dstvalue <= 0;
        end 
        endcase
    end
    endcase

    //DEFINE_FUNC(riscv_core_read_sig, "state, pc, instr, bReadData, rs1") {
    always @(state or pc or opcode or imm or rs1) begin
        wRead = 0;
        bReadAddr = 0;
        if (state == `RISCVSTATE_READ_INST) begin
            wRead = 1;
            bReadAddr = pc;
        end else if (state == `RISCVSTATE_EXEC_INST) begin
            if (opcode == 5'h00) begin
                /* ld inst */
                bReadAddr = rs1 + imm;
                wRead = 1;
            end
        end
    end

endmodule

Altogether this is only a little over 500 lines of Verilog, so it should be fairly easy to read.
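
One way to gain confidence in the immediate-generation block is to compare it against a few lines of C, in the spirit of the CModel from the earlier chapters. The sketch below is not taken from the HDL4SE sources: `sext` and `decode_imm` are hypothetical names, and it assumes that `opcode` is `instr[6:2]`, which matches the 5-bit constants used in the Verilog. Unlike the Verilog, which leaves `imm` unchanged for other opcodes, it returns 0 for them.

```c
#include <stdint.h>

/* Sign-extend the low `bits` bits of v to 32 bits (v must have no higher bits set). */
static uint32_t sext(uint32_t v, int bits)
{
    uint32_t m = 1u << (bits - 1);
    return (v ^ m) - m;
}

/* Hypothetical helper: the immediate the gen_imm block would latch for
   a 32-bit instruction word. Assumes opcode == instr[6:2]. */
static uint32_t decode_imm(uint32_t instr)
{
    uint32_t opcode = (instr >> 2) & 0x1f;
    switch (opcode) {
    case 0x0d: /* LUI */
    case 0x05: /* AUIPC */
        return instr & 0xfffff000u;
    case 0x1b: /* JAL: imm[20|10:1|11|19:12] */
        return sext((((instr >> 31) & 1u) << 20) |
                    (((instr >> 12) & 0xffu) << 12) |
                    (((instr >> 20) & 1u) << 11) |
                    (((instr >> 21) & 0x3ffu) << 1), 21);
    case 0x18: /* branch: imm[12|10:5] and imm[4:1|11] */
        return sext((((instr >> 31) & 1u) << 12) |
                    (((instr >> 7) & 1u) << 11) |
                    (((instr >> 25) & 0x3fu) << 5) |
                    (((instr >> 8) & 0xfu) << 1), 13);
    case 0x08: /* store: imm[11:5] and imm[4:0] */
        return sext((((instr >> 25) & 0x7fu) << 5) |
                    ((instr >> 7) & 0x1fu), 12);
    case 0x19: /* JALR */
    case 0x00: /* load */
    case 0x04: /* ALU with immediate */
        return sext(instr >> 20, 12);
    default:   /* the Verilog keeps the old imm here; this sketch returns 0 */
        return 0;
    }
}
```

For example, `decode_imm(0xFFF00513)` (`addi a0, x0, -1`) gives `0xFFFFFFFF`, and `decode_imm(0xFFDFF06F)` (`jal x0, -4`) gives `0xFFFFFFFC`.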

15.4 Conclusion

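Synthesizing the RISC-V CPU core together with the DE1-SOC top-level module gives the following result:
[Figure: synthesis summary]
Register usage is somewhat high, mainly because the two dividers (signed and unsigned) are implemented as 12-stage pipelines. Logic utilization is 8% in total. The design uses 10 DSP blocks, and the memory bits are mostly the 64 Kb main memory and the 1 Kb register file. This is acceptable, and with some modifications the core could be put to practical use. Below is the timing report: in the worst case the design reaches 85 MHz. It was nevertheless synthesized with a 100 MHz clock and downloaded to the FPGA board, where it appears to run without problems.
[Figure: timing report]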
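
The state machine in the core also fixes how many clock cycles each kind of instruction takes, which helps when reading the frequency numbers. The following C sketch replays the transitions of the `state` register and counts cycles. It is an illustration only: the enum names and `cycles_per_instr` are made up here, and the two register-initialization states after reset are left out.

```c
/* States mirror the `RISCVSTATE_* macros; the numeric values are arbitrary here. */
enum core_state { ST_READ_INST, ST_READ_RS1, ST_READ_RS2, ST_STORE_RS2,
                  ST_EXEC_INST, ST_WAIT_LD, ST_WAIT_ST, ST_WAIT_DIV,
                  ST_WRITE_RD };

/* Instruction classes that take different paths out of EXEC_INST. */
enum instr_kind { KIND_ALU, KIND_LOAD, KIND_STORE, KIND_DIV };

/* Count clock edges from entering READ_INST until the FSM is back in READ_INST. */
static int cycles_per_instr(enum instr_kind k)
{
    enum core_state s = ST_READ_INST;
    int divclk = 0;
    int cycles = 0;
    do {
        switch (s) {
        case ST_READ_INST: s = ST_READ_RS1;  break;
        case ST_READ_RS1:  s = ST_READ_RS2;  break;
        case ST_READ_RS2:  s = ST_STORE_RS2; break;
        case ST_STORE_RS2: s = ST_EXEC_INST; break;
        case ST_EXEC_INST:
            if (k == KIND_LOAD)       s = ST_WAIT_LD;
            else if (k == KIND_STORE) s = ST_WAIT_ST;
            else if (k == KIND_DIV) { s = ST_WAIT_DIV; divclk = 11; }
            else                      s = ST_WRITE_RD;
            break;
        case ST_WAIT_LD:   s = ST_WRITE_RD;  break;
        case ST_WAIT_ST:   s = ST_READ_INST; break;
        case ST_WAIT_DIV:
            /* Stay here until the divider pipeline has drained. */
            if (divclk == 0) s = ST_WRITE_RD;
            else             divclk--;
            break;
        case ST_WRITE_RD:  s = ST_READ_INST; break;
        default:           s = ST_READ_INST; break;
        }
        cycles++;
    } while (s != ST_READ_INST);
    return cycles;
}
```

With these transitions an ALU instruction or a store takes 6 cycles, a load 7, and a divide 18 (12 of them spent in the divide wait state), so at the reported 85 MHz the core would complete roughly 14 million ALU instructions per second.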
In later chapters we will rework the design to make it more practical.
