Summary of Hard HDLBits Problems

goooooom

Last modified: 2022-05-19 17:01:10

Tags: FPGA development

First published: 2022-05-14 19:27:56

Article link: https://blog.csdn.net/m0_52489858/article/details/124725032

Mux256to1v

Problem statement:
Create a 4-bit wide, 256-to-1 multiplexer. The 256 4-bit inputs are all packed into a single 1024-bit input vector. sel=0 should select bits in[3:0], sel=1 selects bits in[7:4], sel=2 selects bits in[11:8], etc.

My first attempt (it does not compile):

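module top_module( 
    input [1023:0] in,
    input [7:0] sel,
    output [3:0] out );
    wire [9:0] a1, a2;          // bit positions up to 1023 need 10 bits
    assign a1 = sel*4 + 3;
    assign a2 = sel*4;
    assign out = in[a1:a2];     // illegal: both part-select bounds must be constants

endmodule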

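Some research showed that for a vector A, both bounds a and b of a part-select A[a:b] must be constants, whereas a single-bit select A[a] may use a variable index.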

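So I switched to an indexed part-select. in[base +: width] selects width bits starting at base, and base may be a variable as long as width is a constant: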

module top_module( 
    input [1023:0] in,
    input [7:0] sel,
    output [3:0] out );
    assign out = in[sel*4+:4];
endmodule

Or pick the four bits one by one, since a variable single-bit index is legal:

module top_module( 
    input [1023:0] in,
    input [7:0] sel,
    output [3:0] out );
    assign out = {in[sel*4+3],in[sel*4+2],in[sel*4+1],in[sel*4]};
endmodule
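To see which bits in[sel*4 +: 4] selects, here is a minimal C model. It is an illustration, not HDLBits code. It assumes a hypothetical packing of the 1024-bit vector into 32 uint32_t words, where bit k sits in word k / 32 at position k % 32. The function name mux256to1v is also made up.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical C model of Mux256to1v. The 1024-bit vector is stored as
   32 uint32_t words: bit k of `in` is bit (k % 32) of word k / 32. */
static unsigned mux256to1v(const uint32_t in[32], uint8_t sel)
{
    unsigned lsb = (unsigned)sel * 4u;   /* start bit, like sel*4 in in[sel*4 +: 4] */
    /* lsb % 32 is a multiple of 4 no larger than 28, so the 4-bit field
       never straddles two words. */
    return (in[lsb / 32u] >> (lsb % 32u)) & 0xFu;
}
```

The width (4) is fixed and only the start position moves, so the result is well defined for every sel. That is the same reason Verilog accepts +: with a variable base but rejects in[a1:a2].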

Overflow in two's-complement addition (Exams/ece241 2014 q1c)

HDLBits problem Exams/ece241 2014 q1c is a good illustration of carries and overflow in two's-complement arithmetic. First, the problem:

Assume that you have two 8-bit 2’s complement numbers, a[7:0] and b[7:0]. These numbers are added to produce s[7:0]. Also compute whether a (signed) overflow has occurred.

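First we need the idea of two's complement. Plain binary (sign-magnitude) cannot be used directly for arithmetic once negative numbers appear. To tell positive and negative numbers apart, a sign bit is added in front of the most significant bit (0 for positive, 1 for negative). That alone is not enough, though. Try computing -3 + 5 in 4-bit sign-magnitude: 1011 + 0101 = (1)0000, which gives 0 instead of 2. So for a negative number, all bits except the sign bit are inverted, giving the ones' complement, and then 1 is added. Positive numbers stay unchanged. The result is the two's complement of the original number, and with it signed binary numbers can be added and subtracted directly.
However, two's-complement addition can overflow into the sign bit. For example, in 4 bits, -7 + (-6) is 1001 + 1010 = (1)0011, i.e. +3. That is clearly wrong, because the true result -13 lies outside the 4-bit range -8 to 7. So besides the plain addition, this problem also has to detect overflow. Here is the code: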

module top_module (
    input [7:0] a,
    input [7:0] b,
    output [7:0] s,
    output overflow
); 
    assign s=a + b;
    assign overflow=a[7] & b[7] & ~s[7] | ~a[7] & ~b[7] & s[7];   // same-sign inputs, sum has the other sign

endmodule

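The error comes from the carry into the sign bit. It combines with the operands' sign bits and overwrites the correct sign of the result. There are exactly two cases:
1. Both inputs are positive, and a carry into the sign bit turns the result's sign bit from 0 into 1 (e.g. 4-bit 0100 + 0100 = 1000, i.e. 4 + 4 gives -8).
2. Both inputs are negative, and their sign bits add up to 0 (e.g. 4-bit 1001 + 1001 = (1)0010, i.e. -7 + (-7) gives +2).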
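The two cases can be checked exhaustively with a small C model of the circuit. The sketch below is an illustration with made-up helper names. It applies the Verilog overflow expression to every pair of 8-bit inputs and compares it with a reference that computes the exact signed sum.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* C model of the q1c circuit: an 8-bit adder that wraps modulo 256, plus the
   overflow expression a7&b7&~s7 | ~a7&~b7&s7 from the Verilog above. */
static bool add8_overflow(uint8_t a, uint8_t b, uint8_t *s)
{
    *s = (uint8_t)(a + b);                       /* keep only 8 bits, like s[7:0] */
    unsigned a7 = a >> 7, b7 = b >> 7, s7 = *s >> 7;
    return (a7 & b7 & !s7) | (!a7 & !b7 & s7);
}

/* Reference: interpret the bytes as signed values and check whether the exact
   sum leaves the 8-bit range [-128, 127]. (x ^ 0x80) - 0x80 maps 0..255 to
   -128..127 without implementation-defined casts. */
static bool add8_overflow_ref(uint8_t a, uint8_t b)
{
    int sum = ((a ^ 0x80) - 0x80) + ((b ^ 0x80) - 0x80);
    return sum < -128 || sum > 127;
}
```

For example, add8_overflow(100, 100, &s) reports overflow with s == 200, which reads as -56 in two's complement.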

To make this situation easy to detect, there is a variant of two's complement called modified two's complement:

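Modified two's complement, also called "mod-4 two's complement", uses two bits for the sign and is otherwise the same as ordinary two's complement. "00" means positive and "11" means negative. When adding or subtracting in this form, a sign field of "01" or "10" in the result indicates overflow. The first (most significant) sign bit always gives the correct sign: "00" and "01" mean positive and positive overflow, and "11" and "10" mean negative and negative overflow. (Source: Baidu Baike)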
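The double-sign-bit check can be sketched in C as follows. This is an illustration with hypothetical helper names. Each 8-bit operand is sign-extended to 9 bits, so bits 8 and 7 form the two sign bits. The 9-bit sum is then formed, and a sign field of 01 or 10 flags overflow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sign-extend an 8-bit two's-complement value to 9 bits, so that
   bits 8 and 7 form the double sign field (00 = positive, 11 = negative). */
static unsigned to_dual_sign(uint8_t x)
{
    return (x & 0x80u) ? (x | 0x100u) : x;   /* copy bit 7 into bit 8 */
}

/* Two sign bits of the 9-bit sum, as a number: 0 = "00", 1 = "01", 2 = "10", 3 = "11". */
static unsigned dual_sign_bits(uint8_t a, uint8_t b)
{
    unsigned sum = (to_dual_sign(a) + to_dual_sign(b)) & 0x1FFu;  /* keep 9 bits */
    return (sum >> 7) & 0x3u;
}

/* "01" = positive overflow, "10" = negative overflow. */
static bool dual_sign_overflow(uint8_t a, uint8_t b)
{
    unsigned s = dual_sign_bits(a, b);
    return s == 1u || s == 2u;
}
```

For example, 100 + 100 gives the sign field "01" (positive overflow), and -128 + (-1) gives "10" (negative overflow).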

Kmap3

Implement the circuit described by the Karnaugh map below.

Try to simplify the k-map before coding it. Try both product-of-sums and sum-of-products forms. We can't check whether you have the optimal simplification of the k-map. But we can check if your reduction is equivalent, and we can check whether you can translate a k-map into a circuit.

module top_module(
    input a,
    input b,
    input c,
    input d,
    output out  ); 
	assign out = a | ( !b & c );
endmodule

One thing to note: I usually write ~, but here the negation is written with !. The difference is:

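"!" is logical negation; "~" is bitwise negation.
For a 1-bit operand, both operators do the same thing.
For a wider operand:
"~" inverts each bit in turn, e.g. a[3:0] = {1,0,1,1} gives ~a = {0,1,0,0};
"!" treats the whole vector as a single value, where any nonzero value counts as true: a[3:0] = {1,0,1,1} means a = 11, so !a = 0; a[3:0] = {0,0,0,0} means a = 0, so !a = 1.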
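C has the same two operators, so the difference is easy to try there. In this sketch the helper names are illustrative, and the mask keeps only 4 bits to mimic a[3:0].

```c
#include <assert.h>

/* Hypothetical helpers mirroring the Verilog operators on a 4-bit value. */
static unsigned bitwise_not4(unsigned a) { return ~a & 0xFu; }  /* like ~a: invert each bit */
static unsigned logical_not(unsigned a)  { return !a; }         /* like !a: 1 only if a == 0 */
```

bitwise_not4(0xB) yields 0x4 (1011 becomes 0100), while logical_not(0xB) yields 0 because 11 is nonzero.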

Synchronous reset

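What is a synchronous reset? When a rising clock edge arrives and the synchronous reset input is active, the flip-flop output becomes 0. In this problem the reset is active high, i.e. reset = 1. The previous output and the current input do not matter; whether they are 0 or 1, the output goes to 0.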

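A reset circuit is necessary in designs that frequently need to return to their initial state, and resetting is much faster than power-cycling and reloading the program. Some circuits, however, do not need a reset. I have heard of such designs, but in my own work I still add a reset.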

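module top_module (
    input clk,
    input reset,              // synchronous, active-high
    input [7:0] d,
    output reg [7:0] q
);
    always @(posedge clk) begin
        if (reset)            // only checked at a rising clock edge
            q <= 8'b0;
        else
            q <= d;
    end
endmodule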
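A tick-based C model makes the "only at a rising edge" behaviour concrete. In this sketch the Tick struct and sim_sync_reset are hypothetical names, and the power-up value 0xFF is an arbitrary assumption. Each tick supplies the clock level, the reset level and d. The state only changes on a 0 to 1 clock transition, so a reset pulse that ends before the next rising edge is never seen.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tick format: clock level, reset level and data input. */
typedef struct { int clk; int reset; uint8_t d; } Tick;

/* Synchronous-reset DFF: q changes only on a rising clock edge, and reset
   is sampled there like any other input. q_trace[i] is q after tick i. */
static void sim_sync_reset(const Tick *t, size_t n, uint8_t *q_trace)
{
    assert(n > 0);
    uint8_t q = 0xFF;                    /* arbitrary power-up value (assumption) */
    int prev_clk = t[0].clk;
    for (size_t i = 0; i < n; i++) {
        if (!prev_clk && t[i].clk)               /* rising edge */
            q = t[i].reset ? 0 : t[i].d;         /* reset wins over d */
        prev_clk = t[i].clk;
        q_trace[i] = q;
    }
}
```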

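Asynchronous reset

The flip-flops in this problem have an asynchronous reset: whenever the asynchronous reset input is active, the outputs are reset to 0.

The synchronous reset introduced two problems earlier has a drawback.

Imagine a system clocked at 1 Hz, such as your bedside alarm clock, and think of pressing the ringing alarm as pressing its synchronous reset button. At 6:30:00 the alarm goes off, you don't want to get up, and you slap it. In the worst case it keeps ringing for almost another second, because you just missed a rising clock edge. How awful!

With a synchronous reset, the reset is only acted on at the next rising clock edge after it is asserted, so the response is relatively slow.

An asynchronous reset, by contrast, responds immediately: the reset takes effect the moment it is asserted, like popping a balloon.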

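module top_module (
    input clk,
    input areset,             // asynchronous, active-high
    input [7:0] d,
    output reg [7:0] q
);
    always @(posedge clk or posedge areset) begin
        if (areset)           // acts as soon as areset rises
            q <= 8'b0;
        else
            q <= d;
    end
endmodule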
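The same kind of tick-based C model can show the asynchronous case. The names are hypothetical, and the power-up value 0xFF is an arbitrary assumption. Here areset is checked on every tick rather than only at clock edges, so q drops to 0 as soon as areset is high. It then stays 0 until a rising edge with areset low loads d again.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tick format: clock level, asynchronous reset level, data input. */
typedef struct { int clk; int areset; uint8_t d; } ATick;

/* Asynchronous-reset DFF: areset overrides everything on every tick;
   otherwise q captures d on a rising clock edge. q_trace[i] is q after tick i. */
static void sim_async_reset(const ATick *t, size_t n, uint8_t *q_trace)
{
    assert(n > 0);
    uint8_t q = 0xFF;                    /* arbitrary power-up value (assumption) */
    int prev_clk = t[0].clk;
    for (size_t i = 0; i < n; i++) {
        if (t[i].areset)
            q = 0;                                   /* takes effect immediately */
        else if (!prev_clk && t[i].clk)
            q = t[i].d;                              /* normal rising-edge capture */
        prev_clk = t[i].clk;
        q_trace[i] = q;
    }
}
```

Feeding it the same kind of reset pulse between two rising edges clears q at once, whereas the synchronous model ignores that pulse.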

Edgedetect

For each bit in an 8-bit vector, detect when the input signal changes from 0 in one clock cycle to 1 the next (similar to positive edge detection). The output bit should be set the cycle after a 0 to 1 transition occurs.

Here are some examples. For clarity, in[1] and pedge[1] are shown separately.

module top_module (
    input clk,
    input [7:0] in,
    output reg [7:0] pedge      // driven from an always block, so it must be a reg
);
    reg [7:0] in_reg;           // in, delayed by one clock
    always @(posedge clk) begin
        in_reg <= in;
    end
  
    always @(posedge clk) begin
        pedge <= in & ~in_reg;  // 1 where the bit was 0 last cycle and is 1 now
    end

    /*
    //second way
    integer i;
    always @(posedge clk) begin
        for (i = 0; i <= 7; i = i + 1) begin
            if (in[i] & ~in_reg[i]) begin
                pedge[i] <= 1'b1;
            end
            else begin
                pedge[i] <= 1'b0;
            end
        end
    end
    */

endmodule

This is a rising-edge detector. Take it seriously: it comes up often in job interviews.
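A cycle-based C sketch of the same detector follows; the function name is hypothetical. It assumes in_reg starts at 0, whereas in hardware it is unknown until the first edge. As with the nonblocking assignments, pedge is computed from the in_reg value before the edge, and only then is in_reg updated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* in_seq[k] is `in` at rising edge k; pedge_seq[k] is pedge just after it. */
static void sim_pedge(const uint8_t *in_seq, size_t n, uint8_t *pedge_seq)
{
    uint8_t in_reg = 0;                                 /* assumed reset value */
    for (size_t k = 0; k < n; k++) {
        pedge_seq[k] = (uint8_t)(in_seq[k] & ~in_reg);  /* was 0, is 1 now */
        in_reg = in_seq[k];                             /* in_reg <= in */
    }
}
```

A bit that stays at 1 produces a single one-cycle pulse, not a level.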

3.2.1.16 Detect both edges (Edgedetect2)

For each bit in an 8-bit vector, detect when the input signal changes from one clock cycle to the next (detect any edge). The output bit should be set the cycle after a 0 to 1 transition occurs.

Here are some examples. For clarity, in[1] and anyedge[1] are shown separately

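module top_module (
    input clk,
    input [7:0] in,
    output reg [7:0] anyedge    // driven from an always block, so it must be a reg
);
    
    reg [7:0] in_reg;
    always @(posedge clk) begin
        in_reg <= in;
    end
    
    always @(posedge clk) begin
        anyedge <= in ^ in_reg; // nonblocking, like every other clocked assignment
    end
 
endmodule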

This is a dual-edge (any-edge) detector.
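The only change from the rising-edge detector is the XOR, which flags a change in either direction. A minimal C sketch follows, again assuming in_reg starts at 0; the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* anyedge <= in ^ in_reg at each rising edge: a bit is 1 for one cycle
   after that input bit changed, whether 0->1 or 1->0. */
static void sim_anyedge(const uint8_t *in_seq, size_t n, uint8_t *anyedge_seq)
{
    uint8_t in_reg = 0;                                  /* assumed reset value */
    for (size_t k = 0; k < n; k++) {
        anyedge_seq[k] = (uint8_t)(in_seq[k] ^ in_reg);  /* uses the old in_reg */
        in_reg = in_seq[k];
    }
}
```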

Dualedge

You're familiar with flip-flops that are triggered on the positive edge of the clock, or negative edge of the clock. A dual-edge triggered flip-flop is triggered on both edges of the clock. However, FPGAs don't have dual-edge triggered flip-flops, and always @(posedge clk or negedge clk) is not accepted as a legal sensitivity list.

Build a circuit that functionally behaves like a dual-edge triggered flip-flop:


module top_module (
    input clk,
    input d,
    output q
);
    reg q_d1;
    reg q_d2;
    
    always@(posedge clk)begin
        q_d1 <= d ^ q_d2;
    end
    always@(negedge clk)begin
        q_d2 <= d ^ q_d1;
    end
    
    assign q = q_d1 ^ q_d2;
    
endmodule

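This problem is important. To build a dual-edge triggered flip-flop you must not write always@(posedge clk or negedge clk). As the problem statement says, FPGAs have no dual-edge flip-flops and this sensitivity list is not accepted, so it cannot be synthesized.

Why does the XOR version work?

First, at a rising edge q_d1 becomes d ^ q_d2, so q = q_d1 ^ q_d2 = d ^ q_d2 ^ q_d2 = d.

Next, at a falling edge q_d2 becomes d ^ q_d1, so q = q_d1 ^ q_d2 = q_d1 ^ d ^ q_d1 = d.

q_d1 and q_d2 are updated alternately on rising and falling edges, and only one of them changes at each edge, so q always shows the most recently sampled d.

This achieves sampling on both clock edges. It is a really neat trick.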
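The two derivation steps can be checked with a small C model. It is a sketch with a hypothetical function name. It assumes clock edges alternate, with even-numbered events as rising edges and odd-numbered events as falling edges. After every edge, q = q_d1 ^ q_d2 equals the d sampled at that edge, whatever the starting values of the two flops.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Event k is a clock edge with input d_seq[k] (0 or 1): even k = rising,
   odd k = falling (assumption of this model). q_seq[k] is q right after it. */
static void sim_dual_edge_xor(const uint8_t *d_seq, size_t n, uint8_t *q_seq)
{
    uint8_t q_d1 = 0, q_d2 = 0;          /* starting values do not matter */
    for (size_t k = 0; k < n; k++) {
        if (k % 2 == 0)
            q_d1 = (uint8_t)((d_seq[k] ^ q_d2) & 1u);   /* posedge: q_d1 <= d ^ q_d2 */
        else
            q_d2 = (uint8_t)((d_seq[k] ^ q_d1) & 1u);   /* negedge: q_d2 <= d ^ q_d1 */
        q_seq[k] = (uint8_t)(q_d1 ^ q_d2);              /* assign q = q_d1 ^ q_d2 */
    }
}
```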

module top_module (
    input clk,
    input d,
    output q
);
 
    reg q_d1;
    reg q_d2;
 
    always@(posedge clk)begin
        q_d1 <= d;
    end
    
    always@(negedge clk)begin
        q_d2 <= d;
    end
    
    assign q = clk ? q_d1 : q_d2;
    
endmodule

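This version is easier to understand. d is sampled on both the rising and the falling edge, and the current clock level selects which sample is output. It was also the first answer I came up with. For passing the problem it is perfectly fine, since glitches do not affect the "Success" verdict, but in a real circuit it produces glitches.

The reason is that right after an edge, q_d1 or q_d2 changes, and at the same moment the clock level that drives the mux select also changes. The delays from clk and from q_d1/q_d2 to the mux are different, so the mux inputs do not change at the same time. The clock is effectively being used as a combinational logic signal. As with any combinational logic whose inputs arrive at different times through paths of different delay, glitches appear on the output.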
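The glitch can be reproduced with a toy unit-delay model in C. This is an illustration under an assumption, not a timing simulation. In the model, clk reaches the mux with zero delay, while a flop's new value appears at its output one tick after the edge that captured it (a stand-in for clock-to-Q delay). The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of q = clk ? q_d1 : q_d2. clk[t] and d[t] are 0/1 levels at tick t;
   q[t] is the mux output at tick t. */
static void sim_mux_dual_edge(const int *clk, const int *d, size_t n, int *q)
{
    assert(n > 0);
    int q_d1 = 0, q_d2 = 0;        /* flop outputs currently seen by the mux */
    int cap1 = -1, cap2 = -1;      /* value captured at the previous tick's edge, -1 = none */
    q[0] = clk[0] ? q_d1 : q_d2;
    for (size_t t = 1; t < n; t++) {
        if (cap1 >= 0) { q_d1 = cap1; cap1 = -1; }   /* clock-to-Q delay has elapsed */
        if (cap2 >= 0) { q_d2 = cap2; cap2 = -1; }
        if (!clk[t - 1] && clk[t]) cap1 = d[t];      /* posedge: q_d1 <= d */
        if (clk[t - 1] && !clk[t]) cap2 = d[t];      /* negedge: q_d2 <= d */
        q[t] = clk[t] ? q_d1 : q_d2;                 /* select switches with no delay */
    }
}
```

With clk = 0,1,1,0,0,1,1,0 and d = 1,1,1,0,0,0,0,0, the input has been 0 since tick 3. Even so, q pulses back to 1 at tick 5, because the select already points at q_d1 while q_d1 still holds the 1 from the previous rising edge.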

3.2.3.3 Left/right arithmetic shift by 1 or 8 (Shift18)

module top_module(
    input clk,
    input load,
    input ena,
    input [1:0] amount,
    input [63:0] data,
    output reg [63:0] q); 
    
    always@(posedge clk)begin
        if(load)begin
            q <= data;
        end
        else if(ena)begin
            case(amount)
                2'b00:begin
                    q <= {q[62:0], 1'b0};
                end
                2'b01:begin
                    q <= {q[55:0], 8'd0};
                end
                2'b10:begin
                    q <= {q[63], q[63:1]};
                end
                2'b11:begin
                    q <= { {8{q[63]}}, q[63:8] };
                end 
            endcase
        end
    end
 
endmodule

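Note that this is an arithmetic shift, which takes the sign bit into account: the MSB is 0 for a positive number and 1 for a negative number. A left shift needs no special handling of the sign bit, because zeros are shifted in at the LSB. A right shift, however, must copy the sign bit into the vacated high bits.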
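The same four cases on a 64-bit value, as a C sketch with a hypothetical function name. The amount encoding follows the case statement above. The right shifts replicate bit 63 explicitly rather than relying on >> of a negative signed value, which is implementation-defined in C.

```c
#include <assert.h>
#include <stdint.h>

static uint64_t shift18(uint64_t q, unsigned amount)
{
    assert(amount < 4);                      /* 2-bit input in the hardware */
    uint64_t sign = q >> 63;                 /* 0 or 1 */
    switch (amount) {
    case 0:  return q << 1;                                          /* left by 1 */
    case 1:  return q << 8;                                          /* left by 8 */
    case 2:  return (q >> 1) | (sign << 63);                         /* arithmetic right by 1 */
    default: return (q >> 8) | (sign ? 0xFF00000000000000ull : 0);   /* arithmetic right by 8 */
    }
}
```

For a negative value such as 0x8000000000000000, shifting right by 8 gives 0xFF80000000000000: the sign bit has been copied into all eight vacated positions.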

3.2.3.9 3-input LUT (Exams/ece241 2013 q12)

module top_module (
    input clk,
    input enable,
    input S,
    input A, B, C,
    output Z ); 
    
    reg [7:0]	Q;
    
    always@(posedge clk)begin
        if(enable)begin
            Q <= {Q[6:0], S};
        end
    	else begin
            Q <= Q;
        end
    end
    
    assign Z = Q[{A, B, C}];
 
    /*
    //second way
    always@(*)begin
        case({A, B, C})
            3'b000:begin
                Z = Q[0];
            end
            3'b001:begin
                Z = Q[1];
            end
            3'b010:begin
                Z = Q[2];
            end
            3'b011:begin
                Z = Q[3];
            end
            3'b100:begin
                Z = Q[4];
            end
            3'b101:begin
                Z = Q[5];
            end
            3'b110:begin
                Z = Q[6];
            end
            3'b111:begin
                Z = Q[7];
            end
        endcase
    end
    */
 
endmodule

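This problem is a lookup table. My first method is much more concise than the second. Its trick is worth studying: the concatenation {A, B, C} is used directly as the index into Q. FPGAs are in fact built from lookup tables and registers. After synthesis, Quartus and Vivado report resource usage, and the LUT count in that report is exactly this kind of lookup-table logic.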
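A C sketch of the same structure follows; the Lut3 type and function names are hypothetical. An 8-bit shift register is loaded serially from S, and {A, B, C} is used as a 3-bit index into it. Shifting in 0xE8 MSB first programs the table as a 3-input majority function.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint8_t Q; } Lut3;

/* One clock edge: Q <= {Q[6:0], S} when enable is high. */
static void lut3_clock(Lut3 *l, int enable, int S)
{
    if (enable)
        l->Q = (uint8_t)((l->Q << 1) | (S & 1));
}

/* Z = Q[{A, B, C}]: the three inputs form the bit index, A being the MSB. */
static int lut3_read(const Lut3 *l, int A, int B, int C)
{
    unsigned idx = ((unsigned)(A & 1) << 2) | ((unsigned)(B & 1) << 1) | (unsigned)(C & 1);
    return (l->Q >> idx) & 1;
}
```

Changing the eight stored bits changes the function without touching the read logic. That is what makes a LUT a general-purpose logic element.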

Rule90

For an initial state of q[511:0] = 1, the first few iterations are:

This forms half of a Sierpiński triangle.

My code:

module top_module(
    input clk,
    input load,
    input [511:0] data,
    output [511:0] q ); 
    reg [511:0] q_reg;    
    integer i;
    assign q = q_reg;

    always @(posedge clk) begin
        if (load)
            q_reg <= data;
        else begin
            for (i = 0; i < 512; i++) begin
                if (i == 0)
                    q_reg[0] <= q_reg[1] ^ 1'b0;        // cell beyond q[0] is 0
                else if (i == 511)
                    q_reg[511] <= q_reg[510] ^ 1'b0;    // cell beyond q[511] is 0
                else
                    q_reg[i] <= q_reg[i-1] ^ q_reg[i+1];
            end
        end
    end
endmodule

The official solution:

module top_module(
	input clk,
	input load,
	input [511:0] data,
	output reg [511:0] q);
	
	always @(posedge clk) begin
		if (load)
			q <= data;	// Load the DFFs with a value.
		else begin
			// At each clock, the DFF storing each bit position becomes the XOR of its left neighbour
			// and its right neighbour. Since the operation is the same for every
			// bit position, it can be written as a single operation on vectors.
			// The shifts are accomplished using part select and concatenation operators.
			
			//     left           right
			//  neighbour       neighbour
			q <= q[511:1] ^ {q[510:0], 1'b0} ;
		end
	end
endmodule
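The official one-line vector form can be mirrored in C. The sketch below stores the 512 cells in 8 uint64_t words; the word layout and names are assumptions for illustration. It computes next = (q >> 1) ^ (q << 1) across the whole vector, passing bits between words and shifting in zeros at both ends.

```c
#include <assert.h>
#include <stdint.h>

#define R90_WORDS 8   /* 8 x 64 = 512 cells; word 0 holds q[63:0] */

static void rule90_step(uint64_t q[R90_WORDS])
{
    uint64_t right[R90_WORDS], left[R90_WORDS];      /* q >> 1 and q << 1 */
    for (int w = 0; w < R90_WORDS; w++) {
        uint64_t hi_in = (w + 1 < R90_WORDS) ? (q[w + 1] & 1u) : 0;  /* bit moving down */
        uint64_t lo_in = (w > 0) ? (q[w - 1] >> 63) : 0;            /* bit moving up */
        right[w] = (q[w] >> 1) | (hi_in << 63);
        left[w]  = (q[w] << 1) | lo_in;
    }
    for (int w = 0; w < R90_WORDS; w++)
        q[w] = right[w] ^ left[w];                   /* left neighbour ^ right neighbour */
}
```

Starting from q = 1, successive steps give 0b10, 0b101 and 0b1000 in the low bits: the first rows of the Sierpiński pattern.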

Conwaylife

Conway's Game of Life is a two-dimensional cellular automaton.

The "game" is played on a two-dimensional grid of cells, where each cell is either 1 (alive) or 0 (dead). At each time step, each cell changes state depending on how many neighbours it has:

0-1 neighbour: Cell becomes 0.
2 neighbours: Cell state does not change.
3 neighbours: Cell becomes 1.
4+ neighbours: Cell becomes 0.

The game is formulated for an infinite grid. In this circuit, we will use a 16x16 grid. To make things more interesting, we will use a 16x16 toroid, where the sides wrap around to the other side of the grid. For example, the corner cell (0,0) has 8 neighbours: (15,1), (15,0), (15,15), (0,1), (0,15), (1,1), (1,0), and (1,15). The 16x16 grid is represented by a length 256 vector, where each row of 16 cells is represented by a sub-vector: q[15:0] is row 0, q[31:16] is row 1, etc. (This tool accepts SystemVerilog, so you may use 2D vectors if you wish.)

load: Loads data into q at the next clock edge, for loading initial state.
q: The 16x16 current state of the game, updated every clock cycle.

The game state should advance by one timestep every clock cycle.

John Conway, mathematician and creator of the Game of Life cellular automaton, passed away from COVID-19 on April 11, 2020.


After some trial and error I wrote the code below, though it certainly still has a lot of room for improvement.

module top_module(
    input clk,
    input load,
    input [255:0] data,
    output [255:0] q );
    reg [255:0] q_reg;
    reg [3:0] count;        // neighbour count (0..8), a temporary inside the loop
       
    assign q = q_reg;
    
    // count uses blocking "=" because it is consumed in the same iteration;
    // q_reg uses nonblocking "<=", so every count reads the OLD state through q.
    always @(posedge clk)
        if (load)
            q_reg <= data;
        else
            for (int i = 0; i < 256; i++) begin
                if (i == 0) begin
                    count = q[255]+q[1]+q[17]+q[15]+q[16]+q[240]+q[241]+q[31];
                end
                else if (i == 15) begin
                    count = q[255]+q[254]+q[240]+q[14]+q[0]+q[31]+q[30]+q[16];
                end
                else if (i == 240) begin
                    count = q[255]+q[224]+q[239]+q[225]+q[15]+q[0]+q[241]+q[1];
                end
                else if (i == 255) begin
                    count = q[240]+q[254]+q[239]+q[238]+q[224]+q[0]+q[15]+q[14];
                end
                else if (i == 16) begin
                    count = q[17]+q[31]+q[15]+q[0]+q[1]+q[33]+q[32]+q[47];
                end
                else if (i == 31) begin
                    count = q[i+1]+q[0]+q[i+15]+q[i+16]+q[i-15]+q[i-1]+q[i-17]+q[i-16];
                end
                else if (i > 0 && i < 15) begin                     // top row
                    count = q[i+1]+q[i+17]+q[i+15]+q[i+16]+q[i+241]+q[i-1]+q[i+239]+q[i+240];
                end
                else if (i > 240 && i < 255) begin                  // bottom row
                    count = q[i+1]+q[i+17-256]+q[i+15-256]+q[i+16-256]+q[i-15]+q[i-1]+q[i-17]+q[i-16];
                end
                else if (i % 16 == 0 && i > 17 && i < 240) begin    // left column
                    count = q[i+1]+q[i+17]+q[i+16]+q[i-15]+q[i-16]+q[i+31]+q[i+15]+q[i-1];
                end
                else if (i % 16 == 15 && i > 31 && i < 240) begin   // right column
                    count = q[i+15]+q[i+16]+q[i-1]+q[i-17]+q[i-16]+q[i-15]+q[i+1]+q[i-31];
                end
                else begin                                          // interior cell
                    count = q[i+1]+q[i+17]+q[i+15]+q[i+16]+q[i-15]+q[i-1]+q[i-17]+q[i-16];
                end
                case (count)
                    4'd0, 4'd1: q_reg[i] <= 1'b0;       // 0-1 neighbours: becomes 0
                    4'd2:       q_reg[i] <= q_reg[i];   // 2 neighbours: unchanged
                    4'd3:       q_reg[i] <= 1'b1;       // 3 neighbours: becomes 1
                    default:    q_reg[i] <= 1'b0;       // 4+ neighbours: becomes 0
                endcase
            end
    
endmodule
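Most of the special cases above exist only to handle the wrap-around. With modular row and column arithmetic they collapse into a single rule. This can be sketched in C, with the state stored as 16 uint16_t rows, which is an assumption mirroring q[16r+15:16r] for row r. The +16 offset keeps every index non-negative before masking with & 15. In Verilog the same idea would need the row and column extracted from i; this C version only illustrates the logic.

```c
#include <assert.h>
#include <stdint.h>

/* Cell (r, c) of the 16x16 torus; indices are wrapped with & 15. */
static int cell(const uint16_t g[16], int r, int c)
{
    return (g[r & 15] >> (c & 15)) & 1;
}

static void life_step(uint16_t g[16])
{
    uint16_t next[16];
    for (int r = 0; r < 16; r++) {
        next[r] = 0;
        for (int c = 0; c < 16; c++) {
            int n = 0;
            for (int dr = -1; dr <= 1; dr++)
                for (int dc = -1; dc <= 1; dc++)
                    if (dr || dc)
                        n += cell(g, r + dr + 16, c + dc + 16);  /* +16 keeps the index non-negative */
            int alive = (n == 3) || (n == 2 && cell(g, r, c));   /* 2: keep, 3: 1, otherwise 0 */
            next[r] |= (uint16_t)(alive << c);
        }
    }
    for (int r = 0; r < 16; r++)
        g[r] = next[r];
}
```

A horizontal "blinker" placed across the wrap point (columns 15, 0 and 1 of row 0) turns into a vertical one in column 0 (rows 15, 0 and 1) and back again. This exercises exactly the corner and edge cases that the Verilog branches handle one by one.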

Other people's approaches

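After solving this problem I gave this part some thought. It does not feel like a problem meant for Verilog; simulating the cellular automaton in software seems much easier. That naturally suggests designing hardware in a high-level language. As it happens, I did a similar project before, on an Ultra96 V2 board. I wrote the forward inference of a convolutional neural network in C for the PL side and transferred data over the AXI bus. The PS side handled scheduling and output the predictions, classifying the dataset. The convolution function I wrote is as follows: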

#include <stdint.h>   /* uint8_t */
#include <stdio.h>    /* printf */
/* DTYPE, STRIDE and the C1_* dimensions are defined in the project's headers. */

void convolution_c1(
			  DTYPE X[C1_X_DMNIN][C1_X_DMNIN][C1_N_CHAN],
		const DTYPE W[C1_W_DMNIN][C1_W_DMNIN][C1_N_CHAN][C1_N_FILTERS],
			  DTYPE out[C1_OUT_DMNIN][C1_OUT_DMNIN][C1_N_FILTERS],
		const DTYPE bias[C1_N_FILTERS])
{
	/* Initialise every output pixel with the bias of its filter. */
	convolution_c1_label9:for(uint8_t f = 0 ; f < C1_N_FILTERS; f++)
	{
		convolution_c1_label10:for (uint8_t r = 0; r < C1_OUT_DMNIN ; r++)
		{
			convolution_c1_label11:for (uint8_t c = 0; c < C1_OUT_DMNIN ; c++)
			{
				out[r][c][f] = bias[f];
			}
		}
	}
	printf("the conv1 output is :");
	/* Multiply-accumulate. r and c index the output; the matching input
	   window starts at row r*STRIDE, column c*STRIDE. */
	for(uint8_t f = 0 ; f < C1_N_FILTERS; f++)
	{
		for (uint8_t r = 0; r < C1_OUT_DMNIN ; r++)
		{
			for (uint8_t c = 0; c < C1_OUT_DMNIN ; c++)
			{
				for(uint8_t ch = 0 ; ch < C1_N_CHAN; ch++)
				{
					for (uint8_t i = 0; i < C1_W_DMNIN ; i++)
					{
						for (uint8_t j = 0; j < C1_W_DMNIN ; j++)
						{
							out[r][c][f] = out[r][c][f] + W[i][j][ch][f] * X[r*STRIDE+i][c*STRIDE+j][ch];
						}
					}
				}
				printf("%d,",out[r][c][f]);
			}
		}
	}
	printf("\n");
}

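Because convolution runs over a two-dimensional image, it too is built from many nested for loops. These multiply-accumulate operations are quick and simple to write in C: six nested for loops implement one convolution.
After writing the remaining functions, I packaged them as an IP and instantiated it in Vivado. I successfully deployed the network and obtained its output.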
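Here is a self-contained miniature of the same loop nest that can actually be compiled and run. It is a sketch with illustrative sizes and names, not the project's code. It uses one input channel and one filter, a 4x4 input, a 2x2 kernel and stride 2, which gives a 2x2 output. It shows how the output index (r, c) maps to the input window starting at (r*stride, c*stride).

```c
#include <assert.h>

#define IN_DIM      4
#define K_DIM       2
#define CONV_STRIDE 2
#define OUT_DIM     ((IN_DIM - K_DIM) / CONV_STRIDE + 1)   /* = 2 */

static void conv2d_small(int X[IN_DIM][IN_DIM], int W[K_DIM][K_DIM],
                         int bias, int out[OUT_DIM][OUT_DIM])
{
    for (int r = 0; r < OUT_DIM; r++)
        for (int c = 0; c < OUT_DIM; c++) {
            int acc = bias;                                   /* start from the bias */
            for (int i = 0; i < K_DIM; i++)
                for (int j = 0; j < K_DIM; j++)
                    acc += W[i][j] * X[r * CONV_STRIDE + i][c * CONV_STRIDE + j];  /* multiply-accumulate */
            out[r][c] = acc;
        }
}
```

With X filled row by row with 1..16, an all-ones kernel and zero bias, each output is the sum of one non-overlapping 2x2 block: 14, 22, 46 and 54.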