CV32E40P Processor Source Code Walkthrough (Part 3): EX_Stage

li_xuan_li_xuan

Last modified 2024-01-23 15:55:41

Tags: risc-v, embedded hardware

First published 2024-01-23 15:53:17

Article link: https://blog.csdn.net/li_xuan_li_xuan/article/details/135773812

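This article walks through the EX_Stage module of the CV32E40P processor: what the alu and mult modules do, and how addition/subtraction, shifting, comparison, bit manipulation, multiplication (32-bit and 16-bit), and division are implemented.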

Original article: CV32E40P Processor Source Code Walkthrough (Part 3): EX_Stage - Zhihu (zhihu.com)

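cv32e40p_ex_stage implements instruction execution. Internally it consists of two modules, cv32e40p_alu and cv32e40p_mult. cv32e40p_alu handles addition, subtraction, AND, OR, shifts, and similar operations, while cv32e40p_mult handles multiplication. Division is implemented inside cv32e40p_alu.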

Figure: internal structure of cv32e40p_ex_stage

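1. Execution stage analysis

Besides instantiating cv32e40p_alu and cv32e40p_mult, cv32e40p_ex_stage also implements:

Assignment of the ALU-result register write-back signals {regfile_alu_wdata_fw_o, regfile_alu_waddr_fw_o, regfile_alu_we_fw_o};
Assignment of the memory-access-result register write-back signals {regfile_wdata_wb_o, regfile_waddr_wb_o, regfile_we_wb_o};
Output of the conditional branch result: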

  assign branch_decision_o = alu_cmp_result;
  assign jump_target_o     = alu_operand_c_i;

Control signal assignment: ex_ready_o applies back-pressure to id_stage, and ex_valid_o is fed back to id_stage:

  assign ex_ready_o = (~apu_stall & alu_ready & mult_ready & lsu_ready_ex_i
                       & wb_ready_i & ~wb_contention) | (branch_in_ex_i);
  assign ex_valid_o = (apu_valid | alu_en_i | mult_en_i | csr_access_i | lsu_en_i)
                      & (alu_ready & mult_ready & lsu_ready_ex_i & wb_ready_i);
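The two assigns above can be read as plain boolean functions. The following Python sketch mirrors them one-to-one so the stall behavior can be tried out by hand; the function names and the default argument values are assumptions made for illustration, and only the signal names come from the source.

```python
def ex_ready(apu_stall=False, alu_ready=True, mult_ready=True,
             lsu_ready_ex=True, wb_ready=True, wb_contention=False,
             branch_in_ex=False):
    """Model of ex_ready_o: every unit must be ready, unless a branch is in EX."""
    all_ready = (not apu_stall and alu_ready and mult_ready
                 and lsu_ready_ex and wb_ready and not wb_contention)
    return all_ready or branch_in_ex


def ex_valid(apu_valid=False, alu_en=False, mult_en=False, csr_access=False,
             lsu_en=False, alu_ready=True, mult_ready=True,
             lsu_ready_ex=True, wb_ready=True):
    """Model of ex_valid_o: some unit is active and nothing downstream stalls."""
    active = apu_valid or alu_en or mult_en or csr_access or lsu_en
    return active and alu_ready and mult_ready and lsu_ready_ex and wb_ready


# A multi-cycle multiply (mult_ready low) stalls the ID stage ...
print(ex_ready(mult_ready=False))                     # False
# ... but a branch in EX forces ex_ready_o high regardless.
print(ex_ready(mult_ready=False, branch_in_ex=True))  # True
```

Note that, as written in the source, wb_contention and apu_stall gate ex_ready_o but do not appear in ex_valid_o.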

2. Module internals

2.1 cv32e40p_alu

The logic inside cv32e40p_alu can be roughly divided into five parts:

1) Addition and subtraction

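First, adder_in_a/b are the adder inputs with extra bits inserted to control carry/borrow propagation. Because cv32e40p supports vector processing, in which 32b data is split into four 8b elements, the adder is likewise split into four segments, and 4 extra bits are inserted with values that depend on whether the operation is 8b, 16b, or 32b.

Second, the addition or subtraction is performed. For subtraction, operand b is inverted and then added to operand a, with the +1 of the two's complement supplied as a carry-in. ALU_ADDR / ALU_ADDUR / ALU_SUBR / ALU_SUBUR implement the rounded, normalized form (op_a ± op_b + 2^(bmask_b_i-1)) >> bmask_b_i.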
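The segmented adder is easier to follow with a small software model. The sketch below is a behavioral approximation in Python, not the RTL itself: it assumes one control bit sits directly below each byte (a 36-bit expanded operand), and the helper names expand, collapse, partitioned_add, and add_round are made up for this illustration.

```python
MASK32 = 0xFFFFFFFF


def expand(op, ctrl):
    """Place control bit ctrl[i] directly below byte i (32b -> 36b)."""
    out = 0
    for i in range(4):
        out |= ctrl[i] << (9 * i)
        out |= ((op >> (8 * i)) & 0xFF) << (9 * i + 1)
    return out


def collapse(x):
    """Drop the four control bits again (36b -> 32b)."""
    out = 0
    for i in range(4):
        out |= ((x >> (9 * i + 1)) & 0xFF) << (8 * i)
    return out


def partitioned_add(a, b, lanes=1, subtract=False):
    """Add or subtract as 1x32b, 2x16b or 4x8b elements."""
    starts = {1: (0,), 2: (0, 2), 4: (0, 1, 2, 3)}[lanes]
    if subtract:
        b = ~b & MASK32  # invert b; the +1 is injected at each element start
    a_ctrl, b_ctrl = [], []
    for i in range(4):
        if i in starts:
            # Element boundary: 1+1 always carries into the element (subtract),
            # 0+0 swallows any carry coming from the byte below (add).
            a_ctrl.append(1 if subtract else 0)
            b_ctrl.append(1 if subtract else 0)
        else:
            # Inside an element: 1+0 passes the incoming carry through.
            a_ctrl.append(1)
            b_ctrl.append(0)
    return collapse(expand(a, a_ctrl) + expand(b, b_ctrl))


def add_round(a, b, shift, subtract=False):
    """(a +/- b + 2^(shift-1)) >> shift on signed Python integers."""
    total = a - b if subtract else a + b
    rnd = (1 << (shift - 1)) if shift > 0 else 0
    return (total + rnd) >> shift


print(hex(partitioned_add(0x0000FFFF, 0x00000001, lanes=1)))  # 0x10000
print(hex(partitioned_add(0x0000FFFF, 0x00000001, lanes=4)))  # 0xff00
print(add_round(5, 6, 2))                                     # 3
```

A single wide addition serves all three element widths; only the values of the inserted bits change, which is the point of the adder_in_a/b preparation step.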

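2) Shift operations

Shift operations fall into three classes: a) shift_amt_norm, for ADD/SUB and the custom instructions that extend ADD/SUB; b) shift_left, for left shift, counting leading 0s/1s, find-last-1, division, and remainder; c) the remaining operation types, such as logical right shift, arithmetic right shift, AND, OR, and so on. A left shift is implemented by bit-reversing the operand, shifting it right, and bit-reversing the result.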

  assign shift_left = (operator_i == ALU_SLL) || (operator_i == ALU_BINS) ||
                      (operator_i == ALU_FL1) || (operator_i == ALU_CLB) ||
                      (operator_i == ALU_DIV) || (operator_i == ALU_DIVU) ||
                      (operator_i == ALU_REM) || (operator_i == ALU_REMU) ||
                      (operator_i == ALU_BREV);

  assign shift_use_round = (operator_i == ALU_ADD) || (operator_i == ALU_SUB) ||
                           (operator_i == ALU_ADDR) || (operator_i == ALU_SUBR) ||
                           (operator_i == ALU_ADDU) || (operator_i == ALU_SUBU) ||
                           (operator_i == ALU_ADDUR) || (operator_i == ALU_SUBUR);

  assign shift_arithmetic = (operator_i == ALU_SRA) || (operator_i == ALU_BEXT) ||
                            (operator_i == ALU_ADD) || (operator_i == ALU_SUB) ||
                            (operator_i == ALU_ADDR) || (operator_i == ALU_SUBR);

  // choose the bit reversed or the normal input for shift operand a
  assign shift_op_a = shift_left ? operand_a_rev :
                      (shift_use_round ? adder_round_result : operand_a_i);
  assign shift_amt_int = shift_use_round ? shift_amt_norm :
                         (shift_left ? shift_amt_left : shift_amt);

  assign shift_amt_norm = is_clpx_i ? {clpx_shift_ex, clpx_shift_ex} : {4{3'b000, bmask_b_i}};
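The reverse, shift right, reverse trick lets one right shifter serve both directions. A minimal Python sketch of the idea for a 32-bit scalar shift follows; the function names are assumptions for this example, and the vector modes are not modeled.

```python
MASK32 = 0xFFFFFFFF


def bit_reverse32(x):
    """Reverse the bit order of a 32-bit value (bit i moves to bit 31-i)."""
    return int(f"{x & MASK32:032b}"[::-1], 2)


def shift_left_via_right(a, amt):
    """Left shift built from a right shifter plus two bit reversals."""
    reversed_a = bit_reverse32(a)      # corresponds to operand_a_rev
    shifted = reversed_a >> amt        # the only real shifter is a right shifter
    return bit_reverse32(shifted)      # undo the reversal on the result


for value, amount in [(0x00000001, 4), (0x80000001, 1), (0x12345678, 12)]:
    assert shift_left_via_right(value, amount) == (value << amount) & MASK32
print(hex(shift_left_via_right(0x12345678, 12)))  # 0x45678000
```

Bits that a left shift would push past bit 31 end up below bit 0 in the reversed domain and are dropped by the right shift, so no extra masking is needed.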

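3) Comparison logic

First, the width of the data being compared (32b, 16b, or 8b) is set up, marked by the 4-bit cmp_signed signal;
Second, the logic determines whether operand a equals operand b and whether operand a is greater than operand b, recording the results in is_equal_vec and is_greater_vec respectively;
Third, the result is selected according to the instruction type, and its most significant bit is output as the branch decision;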

  always_comb begin
    cmp_result = is_equal;
    unique case (operator_i)
      ALU_EQ:                                   cmp_result = is_equal;
      ALU_NE:                                   cmp_result = ~is_equal;
      ALU_GTS, ALU_GTU:                         cmp_result = is_greater;
      ALU_GES, ALU_GEU:                         cmp_result = is_greater | is_equal;
      ALU_LTS, ALU_SLTS, ALU_LTU, ALU_SLTU:     cmp_result = ~(is_greater | is_equal);
      ALU_SLETS, ALU_SLETU, ALU_LES, ALU_LEU:   cmp_result = ~is_greater;
      default: ;
    endcase
  end

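Fourth, based on the comparison result, the larger or smaller operand is selected, which implements max/min. For ABS, operand a is compared against operand b (0): if a is greater than b, a is selected; otherwise the negation of a (-a, that is ~a + 1) is selected.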
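All of the comparison operators are derived from just two flags, is_equal and is_greater. The Python sketch below models the scalar 32-bit case only, under the assumption that operators ending in S compare as signed and those ending in U as unsigned; the helper names are invented for the example.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x):
    """Interpret a 32-bit pattern as a two's complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def compare(operator, a, b):
    """Derive the comparison result from is_equal / is_greater only."""
    if operator.endswith("S"):
        a, b = to_signed32(a), to_signed32(b)
    else:
        a, b = a & MASK32, b & MASK32
    is_equal, is_greater = a == b, a > b
    if operator == "ALU_EQ":
        return is_equal
    if operator == "ALU_NE":
        return not is_equal
    if operator in ("ALU_GTS", "ALU_GTU"):
        return is_greater
    if operator in ("ALU_GES", "ALU_GEU"):
        return is_greater or is_equal
    if operator in ("ALU_LTS", "ALU_SLTS", "ALU_LTU", "ALU_SLTU"):
        return not (is_greater or is_equal)
    if operator in ("ALU_LES", "ALU_SLETS", "ALU_LEU", "ALU_SLETU"):
        return not is_greater
    return is_equal  # same default as the RTL case statement


def alu_max(a, b):
    """Signed max: keep a when a > b, otherwise b."""
    return a & MASK32 if compare("ALU_GTS", a, b) else b & MASK32


def alu_abs(a):
    """ABS: compare a against 0; keep a if greater, otherwise take -a."""
    return a & MASK32 if compare("ALU_GTS", a, 0) else (-to_signed32(a)) & MASK32


print(compare("ALU_LTS", 0xFFFFFFFF, 1))  # True  (-1 < 1 when signed)
print(compare("ALU_LTU", 0xFFFFFFFF, 1))  # False (unsigned)
print(alu_abs(0xFFFFFFFB))                # 5
```

The same bit pattern gives opposite answers for ALU_LTS and ALU_LTU, which is why the signedness has to be decided before the is_greater flag is computed.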

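4) Bit manipulation

Bit manipulation can be roughly divided into the following categories:

Find-first-1 / find-last-1 / leading-zero count logic: cv32e40p_ff_one uses a binary (tree) search;
Counting the number of 1s: uses layered pairwise accumulation;
Bit reversal, which comes in three kinds: reversal in 1b groups, e.g. 0010_0001 -> 1000_0100; reversal in 2b groups, e.g. 0010_0001 -> 0100_1000; and reversal in 3b groups, e.g. 0010_0001 -> 1000_0001;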
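The three ideas in the list above can be sketched in Python as follows. This is a software illustration of the binary search, pairwise accumulation, and grouped reversal, not a transcription of cv32e40p_ff_one or the ALU; all function names are assumptions. The grouped reversal is shown only for group sizes that divide the width evenly, so the 3b case on an 8-bit example is left out.

```python
def find_first_one(x, width=32):
    """Binary search for the index of the lowest set bit (None if x == 0).
    width is assumed to be a power of two."""
    if x == 0:
        return None
    low, size = 0, width
    while size > 1:
        half = size // 2
        lower_half = ((1 << half) - 1) << low
        if x & lower_half == 0:
            low += half          # nothing in the lower half: search the upper one
        size = half
    return low


def popcount32(x):
    """Count set bits by summing neighbors in layers: 1b, 2b, 4b, 8b, 16b."""
    x = (x & 0x55555555) + ((x >> 1) & 0x55555555)
    x = (x & 0x33333333) + ((x >> 2) & 0x33333333)
    x = (x & 0x0F0F0F0F) + ((x >> 4) & 0x0F0F0F0F)
    x = (x & 0x00FF00FF) + ((x >> 8) & 0x00FF00FF)
    x = (x & 0x0000FFFF) + ((x >> 16) & 0x0000FFFF)
    return x


def reverse_groups(x, width, group):
    """Reverse the order of group-bit chunks; width must be a multiple of group."""
    assert width % group == 0
    count = width // group
    mask = (1 << group) - 1
    out = 0
    for i in range(count):
        chunk = (x >> (i * group)) & mask
        out |= chunk << ((count - 1 - i) * group)
    return out


print(find_first_one(0b1000))                       # 3
print(popcount32(0xF0F0))                           # 8
print(f"{reverse_groups(0b00100001, 8, 1):08b}")    # 10000100
print(f"{reverse_groups(0b00100001, 8, 2):08b}")    # 01001000
```

Both the search and the count take log2(32) = 5 layers, which is what makes them attractive as combinational logic.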

5) Division is implemented by the cv32e40p_alu_div module; its implementation details are not covered here.

2.2 cv32e40p_mult

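The multiplier has two modes: a*b+c, and dot product (a[3:0]*b[3:0]).

1) a*b+c

This mode covers two classes of multiply instructions: 32-bit multiplication and 16-bit multiplication.

32-bit multiplication covers MUL_MSU32 and MUL_MAC32. MUL_MSU32 computes rd = rd - rs1*rs2 = rd + (two's complement of rs1)*rs2 = rd + (~rs1)*rs2 + rs2, and MUL_MAC32 computes rd = rd + rs1*rs2, so the two can share one datapath. Note that the standard multiply instruction mul is a special case of MUL_MAC32 with op_c_i = 0.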

  assign int_is_msu   = (operator_i == MUL_MSU32);
  assign int_op_a_msu = op_a_i ^ {32{int_is_msu}};
  assign int_op_b_msu = op_b_i & {32{int_is_msu}};
  assign int_result   = $signed(op_c_i) + $signed(int_op_b_msu) +
                        $signed(int_op_a_msu) * $signed(op_b_i);
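To check that this shared datapath really yields both rd + rs1*rs2 and rd - rs1*rs2, here is a small Python model of the four assigns. It assumes a 32-bit wrap-around result, and the function names are made up for the example. The XOR inverts op_a only for MSU, and the AND adds op_b back in only for MSU, which together give (~a)*b + b = -a*b.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x):
    """Interpret a 32-bit pattern as a two's complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def int_mult(op_a, op_b, op_c, is_msu):
    """Model of int_result for MUL_MAC32 (is_msu=False) and MUL_MSU32 (True)."""
    select = MASK32 if is_msu else 0
    op_a_msu = (op_a & MASK32) ^ select   # ~op_a for MSU, op_a for MAC
    op_b_msu = (op_b & MASK32) & select   # op_b for MSU, 0 for MAC
    result = (to_signed32(op_c) + to_signed32(op_b_msu)
              + to_signed32(op_a_msu) * to_signed32(op_b))
    return result & MASK32


print(int_mult(7, 3, 100, is_msu=False))  # 121  (100 + 7*3)
print(int_mult(7, 3, 100, is_msu=True))   # 79   (100 - 7*3)
print(int_mult(7, 3, 0, is_msu=False))    # 21   (plain mul: op_c = 0)
```

For the MSU case with a = 7 and b = 3: ~7 is -8, so the sum is 100 + 3 + (-8 * 3) = 79, which matches 100 - 21.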