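Overview
So far I have designed the datapath of a five-stage pipeline that supports only the ADD instruction. With a single instruction there is arguably no need for a separate control path, because the control signals are constant and can be fixed in advance. The Modules are written to follow the five pipeline stages: fetch, decode, execute, memory access, and write-back. Since a pipeline passes data on once per clock cycle, every Module defines registers.
Overall architecture diagram:
(I'll redraw this once my MatePad 11 arrives… for now here is one I didn't draw myself; it is the same as MIPS.)
The solid lines are the datapath we need.
The code is explained as we go.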
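Module code
//I have not cleaned up the comments written along the way; that can wait until the code is tidied up
//In the end, to keep things simple, the register write enable went into the datapath
First come the interface definitions. I did not have these when I started. Following chisel-book, I expected the <> operator to connect same-named ports of two Modules (Input to Output between Modules at the same level; Input to Input and Output to Output between parent and child). Newer Chisel versions dropped that behaviour, so now two IO ports connect only if all of their fields have identical names, which is why I defined the interfaces as Bundles.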
class dectoexeIO extends Bundle{
val exe_op1_data = Output(UInt(64.W))
val exe_op2_data = Output(UInt(64.W))
//which need transmit to wb stage
val exe_wbaddr = Output(UInt(5.W))
val exe_ctrl_rf_wen = Output(Bool())
}
class wbtodecIO extends Bundle{
val wb_rd_data = Output(UInt(64.W))
val wb_rd_addr = Output(UInt(5.W))
val wb_rd_en = Output(Bool())
}
class regfiledebugIO extends Bundle{
val debug_addr = Input(UInt(5.W))
val debug_rdata = Output(UInt(64.W))
val debug_wdata =Input(UInt(64.W))
val debug_en =Input(Bool())
}
class exetomemIO extends Bundle{
val mem_alu_out = Output(UInt(64.W))
val mem_wbaddr = Output(UInt(5.W))
val mem_ctrl_rf_wen = Output(Bool())
}
class memtowbIO extends Bundle{
val wb_rd_data = Output(UInt(64.W))
val wb_rd_addr = Output(UInt(5.W))
val wb_ctrl_rf_wen = Output(Bool())
}
The Fetch stage. There are no branch or jump instructions yet, so it just mechanically computes pc = pc + 4.
class FetchDataPath extends Module
{
val io = IO(new Bundle{
val ice = Output(UInt(1.W))
val iaddr = Output(UInt(32.W))
val pc = Output(UInt(32.W))
})
//Instruction Fetch State
//reg
val if_reg_pc = RegInit(0.U(32.W))//RegInit(0x00000000.U)
val ice_reg = RegInit(0.U(1.W))
//temp wire
val if_pc_next = Wire(UInt(32.W))
//function
val if_pc_plus4 = (if_reg_pc + 4.U(32.W))//one condition
//according to the control signal
if_pc_next := Mux(ice_reg === 1.U,if_pc_plus4,0.U)//maybe have a problem when control add
//according to the stop signal
//assign reg
ice_reg := 1.U
if_reg_pc := if_pc_next
// assign Output
io.ice := ice_reg
io.pc := if_reg_pc
io.iaddr := if_reg_pc //for now there is no address transformation for IRam
}
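To make the register timing of the Fetch stage easier to follow, here is a minimal software sketch in plain Scala (not Chisel). The names FetchModel, State, step and trace are hypothetical and exist only for this illustration; the sketch assumes both registers reset to 0 as in the listing above.

```scala
// Software model of the two registers in FetchDataPath (illustrative only).
object FetchModel {
  // Register state: the program counter and the instruction-RAM enable flag.
  final case class State(pc: Long, ice: Boolean)

  // Values both registers hold right after reset.
  val reset: State = State(pc = 0L, ice = false)

  // One rising clock edge: both registers update from the OLD state.
  def step(s: State): State = {
    // Mirrors Mux(ice_reg === 1.U, if_pc_plus4, 0.U), wrapping at 32 bits.
    val pcNext = if (s.ice) (s.pc + 4) & 0xFFFFFFFFL else 0L
    State(pc = pcNext, ice = true)
  }

  // Register states after reset followed by n clock edges.
  def trace(n: Int): List[State] = List.iterate(reset, n + 1)(step)
}
```

FetchModel.trace(3) walks through reset plus three clock edges. The point to notice is that pc stays at 0 for one extra cycle, because ice is still 0 on the first edge after reset.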
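The Decode stage reads the source operands named by the instruction out of the register file (for the destination register only the address is taken) and sends them to the Execute stage. It also writes the data coming back from the WriteBack stage into the register file. With only one instruction there is no operand source to choose: the register file is the only source. Once there are more instructions, a control path will be needed to select the operand type and the operand source. This stage involves the register file code. The memory created by Mem() is synchronous-write, asynchronous-read: a write takes one clock cycle to land in the register, while a read returns its data immediately. The code contains the interface definition and the implementation.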
class RFileIo(implicit val conf: Int = 64) extends Bundle()
{
val rs1_addr = Input(UInt(5.W))
val rs1_data = Output(UInt(conf.W))
val rs2_addr = Input(UInt(5.W))
val rs2_data = Output(UInt(conf.W))
val dm_addr = Input(UInt(5.W))
val dm_rdata = Output(UInt(conf.W))
val dm_wdata = Input(UInt(conf.W))
val dm_en = Input(Bool())
val waddr = Input(UInt(5.W))
val wdata = Input(UInt(conf.W))
val wen = Input(Bool())
}
//Mem is synchronous write and asynchronous read
class RegisterFile(implicit val conf: Int = 64) extends Module
{
val io = IO(new RFileIo())
val regfile = Mem(32, UInt(conf.W))
when (io.wen && (io.waddr =/= 0.U))
{
regfile(io.waddr) := io.wdata
}
when (io.dm_en && (io.dm_addr =/= 0.U))
{
regfile(io.dm_addr) := io.dm_wdata
}
io.rs1_data := Mux((io.rs1_addr =/= 0.U), regfile(io.rs1_addr), 0.U)
io.rs2_data := Mux((io.rs2_addr =/= 0.U), regfile(io.rs2_addr), 0.U)
io.dm_rdata := Mux((io.dm_addr =/= 0.U), regfile(io.dm_addr), 0.U)
}
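The synchronous-write, asynchronous-read behaviour described above can be sketched as a small software model in plain Scala. This is only an illustration under simplifying assumptions: RegFileModel and its methods are hypothetical names, there is a single write port, and a write is presented for exactly one clock edge.

```scala
// Software model of a 32-entry register file with synchronous write and
// combinational read (illustrative only, not the Chisel RegisterFile).
class RegFileModel {
  private val regs = Array.fill(32)(BigInt(0))
  private var pending: Option[(Int, BigInt)] = None

  // Combinational read: returns the stored value at once; x0 always reads 0.
  def read(addr: Int): BigInt = if (addr == 0) BigInt(0) else regs(addr)

  // Present a write on the port; it only takes effect at the next clock edge.
  // Writes to x0 are dropped, like the waddr =/= 0.U guard in the listing.
  def write(addr: Int, data: BigInt): Unit =
    pending = if (addr != 0) Some((addr, data)) else None

  // Rising clock edge: commit the pending write, if there is one.
  def step(): Unit = {
    pending.foreach { case (a, d) => regs(a) = d }
    pending = None
  }
}
```

A read issued between write and step still sees the old value, which is why the tests step the clock before checking a register that was just written.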
class DecodeDataPath extends Module
{
val io = IO(new Bundle{
//data from IRAM
val dec_inst = Input(UInt(32.W))
//data from the regfile//the regfile is instantiated inside this module
// val rd1 = Input(UInt(64.W))
// val rd2 = Input(UInt(64.W))
//data from Fetch and it will be transmit
//val pc = Input(UInt(32.W))
//choose source data to execute
// val exe_op1_data = Output(UInt(64.W))
// val exe_op2_data = Output(UInt(64.W))
// //which need transmit to wb stage
// val exe_wbaddr = Output(UInt(5.W))
val dectoexeIO = new dectoexeIO()
//val exe_wen = Output(Uint(1.W)
//which needs to be written in the control path
//data from write back stage to write regfile
// val wb_rd_addr = Input(UInt(5.W))
// val wb_rd_data = Input(UInt(64.W))
// val wb_rd_en = Input(Bool())
val wbtodecIO = Flipped(new wbtodecIO())
// val debug_addr = Input(UInt(5.W))
// val debug_rdata = Output(UInt(64.W))
// val debug_wdata =Input(UInt(64.W))
// val debug_en =Input(Bool())
val debugIO = new regfiledebugIO()
//debug control, but it is also a real control signal
//val dec_wen = Input(Bool())
})
//the instruction RAM is an SRAM whose read takes one cycle, so Decode needs no instruction register (other stages may need one)
//init data reg which is used in execute stage
val exe_reg_op1_data = RegInit(0.U(64.W))
val exe_reg_op2_data = RegInit(0.U(64.W))
val exe_reg_wbaddr = RegInit(0.U(5.W))
val exe_reg_ctrl_rf_wen = RegInit(false.B)
val dec_rs1_addr = io.dec_inst(19,15)
val dec_rs2_addr = io.dec_inst(24,20)
val dec_wbaddr = io.dec_inst(11,7)
//RegFile connect
val regfile = Module(new RegisterFile())
regfile.io.rs1_addr := dec_rs1_addr
regfile.io.rs2_addr := dec_rs2_addr
//regfile.io.wen :=io.dec_wen
//debug io connect
regfile.io.dm_addr := io.debugIO.debug_addr
regfile.io.dm_wdata :=io.debugIO.debug_wdata
regfile.io.dm_en :=io.debugIO.debug_en
io.debugIO.debug_rdata:=regfile.io.dm_rdata
val rf_rs1_data = regfile.io.rs1_data
val rf_rs2_data = regfile.io.rs2_data
regfile.io.waddr := io.wbtodecIO.wb_rd_addr
regfile.io.wdata := io.wbtodecIO.wb_rd_data
regfile.io.wen := io.wbtodecIO.wb_rd_en
//transmit to execute stage
//R type ALU instruction doesn't need MuxCase to choose op
exe_reg_op1_data := rf_rs1_data
exe_reg_op2_data := rf_rs2_data
exe_reg_wbaddr := dec_wbaddr
//exe_reg_ctrl_rf_wen := io.ctl.rf_wen
exe_reg_ctrl_rf_wen := Mux(dec_wbaddr =/= 0.U,true.B,false.B)
//output
io.dectoexeIO.exe_op1_data := exe_reg_op1_data
io.dectoexeIO.exe_op2_data := exe_reg_op2_data
io.dectoexeIO.exe_wbaddr := exe_reg_wbaddr
io.dectoexeIO.exe_ctrl_rf_wen := exe_reg_ctrl_rf_wen
}
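The bit slices used in the Decode stage, inst(19,15), inst(24,20) and inst(11,7), follow the RISC-V R-type layout. As a rough sketch of the same extraction in plain Scala (RTypeFields, Fields, bits and decode are hypothetical names used only here):

```scala
// Field extraction for a 32-bit RISC-V R-type instruction word (illustrative).
object RTypeFields {
  final case class Fields(opcode: Int, rd: Int, funct3: Int,
                          rs1: Int, rs2: Int, funct7: Int)

  // Bits hi..lo (inclusive) of inst, like inst(hi, lo) in Chisel.
  // Only meant for fields narrower than 32 bits.
  def bits(inst: Int, hi: Int, lo: Int): Int =
    (inst >>> lo) & ((1 << (hi - lo + 1)) - 1)

  def decode(inst: Int): Fields = Fields(
    opcode = bits(inst, 6, 0),
    rd     = bits(inst, 11, 7),
    funct3 = bits(inst, 14, 12),
    rs1    = bits(inst, 19, 15),
    rs2    = bits(inst, 24, 20),
    funct7 = bits(inst, 31, 25)
  )
}
```

For example, 0x002081B3 is the encoding of add x3, x1, x2, so RTypeFields.decode(0x002081B3) yields opcode 0x33, rd 3, rs1 1 and rs2 2, with funct3 and funct7 both 0.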
In the Execute stage, all we need to do for now is add the two operands received from the Decode stage and send the sum on to the Memory stage.
class ExecuteDataPath extends Module
{
val io = IO(new Bundle{
val dectoexeIO = Flipped(new dectoexeIO())
val exetomemIO = new exetomemIO()
//control signal
})
//reg init
val mem_reg_alu_out = RegInit(0.U(64.W))
val mem_reg_wbaddr = RegInit(0.U(5.W))
val mem_reg_ctrl_rf_wen = RegInit(false.B)
val exe_alu_op1 = io.dectoexeIO.exe_op1_data.asUInt()//asUInt maybe a let val have a type
val exe_alu_op2 = io.dectoexeIO.exe_op2_data.asUInt()
val exe_adder_out = (exe_alu_op1 + exe_alu_op2)(64-1,0)
val exe_alu_out = Wire(UInt(64.W))
//depends on the control signal
exe_alu_out := exe_adder_out//only add instruction
//reg assign
mem_reg_alu_out := exe_alu_out
mem_reg_wbaddr := io.dectoexeIO.exe_wbaddr
mem_reg_ctrl_rf_wen := io.dectoexeIO.exe_ctrl_rf_wen
//output
io.exetomemIO.mem_alu_out := mem_reg_alu_out
io.exetomemIO.mem_wbaddr := mem_reg_wbaddr
io.exetomemIO.mem_ctrl_rf_wen := mem_reg_ctrl_rf_wen
}
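The adder keeps only the low 64 bits of the sum, so an overflow wraps around. A minimal reference model in plain Scala, handy for computing expected values in a test, might look like this (Add64 and add are hypothetical names; operands are assumed to be non-negative and below 2^64):

```scala
// 64-bit wrap-around addition, mirroring (op1 + op2)(64-1, 0) in the listing.
object Add64 {
  private val Mask64: BigInt = (BigInt(1) << 64) - 1

  // Sum of two unsigned 64-bit values, truncated to the low 64 bits.
  def add(a: BigInt, b: BigInt): BigInt = (a + b) & Mask64
}
```

Adding 1 to the largest 64-bit value gives 0 in this model, which is the value the hardware register would hold as well.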
The Memory stage does nothing for the current instruction; it just passes the values on to the WriteBack stage.
class MemoryDataPath extends Module
{
val io = IO(new Bundle{
val exetomemIO = Flipped(new exetomemIO())
val wbtodecIO = new wbtodecIO()
})
//reg init
val wb_reg_rd_data = RegInit(0.U(64.W))
val wb_reg_rd_addr = RegInit(0.U(5.W))
val wb_reg_ctrl_rf_wen = RegInit(false.B)
//reg assign
wb_reg_rd_data := io.exetomemIO.mem_alu_out
wb_reg_rd_addr := io.exetomemIO.mem_wbaddr
wb_reg_ctrl_rf_wen := io.exetomemIO.mem_ctrl_rf_wen
//output
io.wbtodecIO.wb_rd_data := wb_reg_rd_data
io.wbtodecIO.wb_rd_addr := wb_reg_rd_addr
io.wbtodecIO.wb_rd_en := wb_reg_ctrl_rf_wen
}
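As for the WriteBack stage, the architecture diagram shows that it connects straight back to the register file in the Decode stage, so this wiring has to be done at the Top level.
The Top level exposes the signals that need to be driven for debugging, such as dec_inst and debugIO, and then wires the stages together. Since there is no instruction memory (IROM) yet, Fetch is not fully wired in for now and instructions are fed in by hand. In the normal, simple case an IROM read takes one cycle, so no interlock is needed to keep the timing correct.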
package mypack
import chisel3._
import chisel3.util._
class TopDataPath extends Module{
val io = IO(new Bundle{
val dec_inst = Input(UInt(32.W))
val debugIO = new regfiledebugIO()
//val wb_rd_en = Input(Bool())
val dec_wen = Input(Bool())
})
val fetchdatapath = Module( new FetchDataPath())
val decodedatapath = Module( new DecodeDataPath())
val executedatapath = Module( new ExecuteDataPath())
val memorydatapath = Module( new MemoryDataPath())
io.dec_inst<>decodedatapath.io.dec_inst
io.debugIO<>decodedatapath.io.debugIO
decodedatapath.io.dectoexeIO<>executedatapath.io.dectoexeIO
executedatapath.io.exetomemIO<> memorydatapath.io.exetomemIO
memorydatapath.io.wbtodecIO<>decodedatapath.io.wbtodecIO
// io.dec_inst <> decodedatapath.io.dec_inst
// io.debug_addr<>decodedatapath.io.debug_addr
// io.debug_rdata<>decodedatapath.io.debug_rdata
// io.debug_wdata<>decodedatapath.io.debug_wdata
// io.debug_en<>decodedatapath.io.debug_en
// io.dec_wen<>decodedatapath.io.dec_wen
// io.wb_rd_en<>decodedatapath.io.wb_rd_en
// decodedatapath.io.exe_op1_data<>executedatapath.io.exe_op1_data
// decodedatapath.io.exe_op2_data<>executedatapath.io.exe_op2_data
// decodedatapath.io.exe_wbaddr<>executedatapath.io.exe_wbaddr
// executedatapath.io.mem_alu_out <> memorydatapath.io.mem_alu_out
// executedatapath.io.mem_wbaddr <> memorydatapath.io.mem_wbaddr
// memorydatapath.io.wb_rd_addr<>decodedatapath.io.wb_rd_addr
// memorydatapath.io.wb_rd_data<>decodedatapath.io.wb_rd_data
//behind the times
// executedatapath.io <> memorydatapath.io
// memorydatapath.io <> decodedatapath.io
// decodedatapath.io <> executedatapath.io
}
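One consequence of this wiring is worth spelling out: a result passes through three pipeline registers before it is written to the register file, and the listings above contain no forwarding. The plain-Scala sketch below models that latency. It is my own simplified model, not the Chisel code: PipelineModel, Add, Nop and run are hypothetical names, and one instruction is assumed to enter Decode per cycle.

```scala
// Cycle-level software model of the ADD-only pipeline (illustrative only).
object PipelineModel {
  // rd := rs1 + rs2. Add(0, 0, 0) acts as a NOP because x0 is never written.
  final case class Add(rd: Int, rs1: Int, rs2: Int)
  val Nop: Add = Add(0, 0, 0)
  private val Mask64: BigInt = (BigInt(1) << 64) - 1

  // Feed one instruction per cycle, then four NOPs to drain the pipeline.
  // Returns the final contents of all 32 registers.
  def run(init: Map[Int, BigInt], program: Seq[Add]): Vector[BigInt] = {
    val regs = Array.fill(32)(BigInt(0))
    init.foreach { case (r, v) => if (r != 0) regs(r) = v & Mask64 }
    var decExe: (BigInt, BigInt, Int) = (BigInt(0), BigInt(0), 0) // op1, op2, rd
    var exeMem: (BigInt, Int) = (BigInt(0), 0)                    // sum, rd
    var memWb: (BigInt, Int) = (BigInt(0), 0)                     // sum, rd
    for (inst <- program ++ Seq.fill(4)(Nop)) {
      // All updates share one clock edge, so sample the old state first.
      val read = (regs(inst.rs1), regs(inst.rs2), inst.rd)
      val (wbData, wbRd) = memWb
      memWb = exeMem
      exeMem = ((decExe._1 + decExe._2) & Mask64, decExe._3)
      decExe = read
      if (wbRd != 0) regs(wbRd) = wbData // register file write, x0 excluded
    }
    regs.toVector
  }
}
```

In this model an ADD that reads a register written by one of the three preceding instructions still sees the old value. That matters when reading the waveform of a test that feeds random instructions, since dependent instructions issued back to back will not produce the architecturally expected sums.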
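Test code
The tests are, well, definitely not complete; this is just a quick first check.
Register file test: write all 32 registers one by one, then read them back one by one.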
class RegfileSpec extends FreeSpec with ChiselScalatestTester with Matchers{
"RegFile should be OK " in {
test(new RegisterFile).withAnnotations(Seq(WriteVcdAnnotation)) { c =>
c.io.wen.poke(true.B)
var x=0;
for ( x <- 0 to 31){
c.io.waddr.poke(x.U)
c.io.wdata.poke((x+1).U)
c.clock.step(10)
}
c.io.rs1_addr.poke(0.U)
c.io.rs1_data.expect(0.U)
c.clock.step(1)
for ( x <- 1 to 31){
c.io.rs1_addr.poke(x.U)
c.io.rs1_data.expect((x+1).U)
c.clock.step(10)
}
}
}
}
Fetch stage test. Here I played with how to reset in the chisel tester framework and what the state is after reset.
class FetchSpec extends FreeSpec with ChiselScalatestTester with Matchers{
"Fetch should be OK " in {
test(new FetchDataPath) { c =>
//test reset function
c.io.ice.expect(false.B)
c.reset.poke(true.B)
c.clock.step()
c.io.ice.expect(false.B)
c.reset.poke(false.B)
c.clock.step()
c.io.ice.expect(true.B)
c.clock.step()
c.reset.poke(true.B)
c.io.ice.expect(true.B)
c.clock.step()
c.reset.poke(false.B)
c.clock.step()
//reset finish
var x=0
for(x <- 0 to 100){
c.io.ice.expect(true.B)
c.io.pc.expect((4*x).U)
c.io.iaddr.expect((4*x).U)
c.clock.step()
}
}
}
}
Decode stage test. Here I tried using a function to generate the instruction encoding, since a function is also a kind of object in Scala.
class DecodeSpec extends FreeSpec with ChiselScalatestTester with Matchers{
def instruction_synthetic (rs1:Int ,rs2:Int ,rd:Int) = (0|51|(rd<<7)|(0<<12)|(rs1<<15)|(rs2<<20))
"Decode should be OK " in {//scala.util.Random.nextInt(32)
test(new DecodeDataPath).withAnnotations(Seq(WriteVcdAnnotation)) { c =>
//use debug io to write reg
c.io.debugIO.debug_wdata.poke(1.U)
c.io.debugIO.debug_addr.poke(1.U)
c.io.debugIO.debug_en.poke(true.B)
c.clock.step()
c.io.debugIO.debug_wdata.poke(2.U)
c.io.debugIO.debug_addr.poke(2.U)
c.io.debugIO.debug_en.poke(true.B)
c.clock.step()
//test read
c.io.dec_inst.poke(instruction_synthetic(1,2,3).U)
c.clock.step()
// c.io.debug_addr.poke(3.U)
// c.io.debug_rdata.expect(3.U)
c.io.dectoexeIO.exe_op1_data.expect(1.U)
c.io.dectoexeIO.exe_op2_data.expect(2.U)
c.io.dectoexeIO.exe_wbaddr.expect(3.U)
c.clock.step()
//test write
c.io.wbtodecIO.wb_rd_addr.poke(3.U)
c.io.wbtodecIO.wb_rd_en.poke(true.B)
c.io.wbtodecIO.wb_rd_data.poke(3.U)
c.clock.step()
c.io.debugIO.debug_addr.poke(3.U)
c.io.debugIO.debug_rdata.expect(3.U)
}
}
}
Execute stage test. One problem here is that the Random.nextInt generator cannot produce 64-bit integers. I use it as is for now and will find a solution later.
class ExecuteSpec extends FreeSpec with ChiselScalatestTester with Matchers{
"Execute should be OK " in {//scala.util.Random.nextInt(32)
test(new ExecuteDataPath).withAnnotations(Seq(WriteVcdAnnotation)) { c =>
// val debug_rdata = Output(UInt(64.W))
// val debug_wdata =Input(UInt(64.W))
// val debug_en =Input(Bool())
var x=0
for (x <- 0 to 100)
{
val op1 = Random.nextInt(429496729)
val op2 = Random.nextInt(429496729)
val result = op1 + op2
val addr = Random.nextInt(32)
c.io.dectoexeIO.exe_wbaddr.poke(addr.U)
c.io.dectoexeIO.exe_op1_data.poke(op1.U)
c.io.dectoexeIO.exe_op2_data.poke(op2.U)
c.clock.step()
c.io.exetomemIO.mem_alu_out.expect(result.U)
c.io.exetomemIO.mem_wbaddr.expect(addr.U)
// println(c.io.mem_alu_out.peek())
c.io.exetomemIO.mem_alu_out.peek()
}
}
}
}
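One possible way around the 64-bit limitation mentioned above is the standard library constructor BigInt(numbits, rnd), which draws a uniformly distributed non-negative value of up to numbits bits. The sketch below is only a suggestion, not what the test above does; Rand64, nextUInt64 and nextAddCase are hypothetical names.

```scala
import scala.util.Random

// Random 64-bit test vectors built on BigInt (illustrative only).
object Rand64 {
  private val Mask64: BigInt = (BigInt(1) << 64) - 1

  // Uniformly distributed over 0 .. 2^64 - 1.
  def nextUInt64(rnd: Random): BigInt = BigInt(64, rnd)

  // One test vector: two operands and their sum wrapped to 64 bits.
  def nextAddCase(rnd: Random): (BigInt, BigInt, BigInt) = {
    val a = nextUInt64(rnd)
    val b = nextUInt64(rnd)
    (a, b, (a + b) & Mask64)
  }
}
```

As far as I know, chisel3 also provides the .U literal conversion for BigInt, so a value produced this way should be usable in poke and expect in the same way as the Int values above, but I have not run that against this test.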
Memory stage test. This only tests the registers, since the stage is essentially just registers sitting between two sets of signals.
class MemorySpec extends FreeSpec with ChiselScalatestTester with Matchers{
"Memory should be OK " in {//scala.util.Random.nextInt(32)
test(new MemoryDataPath) { c =>
var x = 0
for (x <- 0 to 100)
{
val alu_out = Random.nextInt(429496729)
val addr = Random.nextInt(32)
c.io.exetomemIO.mem_alu_out.poke(alu_out.U)
c.io.exetomemIO.mem_wbaddr.poke(addr.U)
//c.io.mem_alu_out.peek()
c.clock.step()
c.io.wbtodecIO.wb_rd_data.expect(alu_out.U)
c.io.wbtodecIO.wb_rd_addr.expect(addr.U)
}
}
}
}
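Top-level test. This test has no expect checks; it generates a VCD waveform and the result is checked by eye. The test mainly uses a hand-driven wb_rd_en to control the register file write enable.
PS: the following is one way to generate a VCD waveform. Once generated, the waveform can be viewed with GTKWave.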
import chiseltest.experimental.TestOptionBuilder._
import chiseltest.internal.WriteVcdAnnotation
test(new TopDataPath).withAnnotations(Seq(WriteVcdAnnotation))
class TopSpec extends FreeSpec with ChiselScalatestTester with Matchers{
"Top should be OK " in {//scala.util.Random.nextInt(32)
test(new TopDataPath).withAnnotations(Seq(WriteVcdAnnotation)) { c =>
def instruction_synthetic (rs1:Int ,rs2:Int ,rd:Int) = (0|51|(rd<<7)|(0<<12)|(rs1<<15)|(rs2<<20))
var x = 0
c.io.dec_wen.poke(true.B)
c.clock.step()
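for (x <- 0 to 31 )
{
c.io.debugIO.debug_wdata.poke(x.U)
c.io.debugIO.debug_addr.poke(x.U)
c.io.debugIO.debug_en.poke(true.B)
c.clock.step()
}
//stop driving the debug write port so it cannot overwrite pipeline results
c.io.debugIO.debug_en.poke(false.B)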
for (x <- 0 to 100)
{
val rs1_addr = Random.nextInt(32)
val rs2_addr = Random.nextInt(32)
val rd_addr = Random.nextInt(32)
c.io.dec_inst.poke(instruction_synthetic(rs1_addr,rs2_addr,rd_addr).U)
c.clock.step()
}
for (x <- 0 to 31)
{
c.io.debugIO.debug_addr.poke(x.U)
c.clock.step()
}
}
}
}
I had a rough look at the VCD waveform… there should be no major problems.