Clifford E. Cummings Papers Explained (3): RTL Coding Styles That Yield Simulation and Synthesis Mismatches

G2突破手259

Last modified 2025-01-15 11:51:56

First published 2025-01-12 16:16:30

Article link: https://blog.csdn.net/s1_mple/article/details/144914776

RTL Coding Styles That Yield Simulation and Synthesis Mismatches

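1.0 Introduction
2.0 Sensitivity Lists
- 2.1 Incomplete Sensitivity Lists
- 2.2 Complete Sensitivity List with Mis-ordered Assignments
3.0 Functions
4.0 Case Statements
5.0 Initialization
- 5.1 Assigning 'X'
- 5.2 Initializing Models with translate_off / translate_on
6.0 General Use of translate_off / translate_on
7.0 Timing Delays
8.0 Conclusions
References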

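1.0 Introduction

ASIC and FPGA design is, at its core, the engineering task of turning an idea or a specification into a physical design. A top-down design methodology requires translating abstract ideas into a physical form that can be implemented and built. Developing concise, accurate designs requires knowing how RTL coding styles synthesize and which styles cause problems. This article discusses several HDL coding styles that cause mismatches between RTL and gate-level simulation. The basic premise is that any coding style that gives the HDL simulator information about the design that cannot be passed to the synthesis tool is a bad coding style. Likewise, any synthesis switch that gives the synthesis tool information that is not available to the simulator is bad.

If these guidelines are violated, pre-synthesis RTL simulation will not match post-synthesis gate-level simulation. These mismatches can be very hard to detect unless every possible logic combination is tested, and if they go undetected they are usually fatal to a production ASIC. Moreover, once a design reaches millions of gates, exhaustive testing becomes impractical. The solution is to understand which coding styles and synthesis switches cause RTL-to-gate-level mismatches and to avoid those constructs.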

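2.0 Sensitivity Lists

Synthesis tools infer combinational or latch logic from always blocks whose sensitivity lists do not contain the Verilog keywords posedge or negedge. For a combinational always block, the inferred logic is derived from the equations in the block and is independent of the sensitivity list. The synthesis tool reads the sensitivity list and compares it against the equations in the always block only to report coding omissions that could cause a mismatch between pre- and post-synthesis simulation.

Signals that appear in the sensitivity list but are not used in the always block make no functional difference to either pre- or post-synthesis simulation. The only effect of such extraneous signals is that pre-synthesis simulation runs more slowly, because the always block is entered and evaluated more often than necessary.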

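2.1 Incomplete Sensitivity Lists

The synthesizable logic described by the equations in an always block is always implemented as if the sensitivity list were complete. The pre-synthesis simulation behavior of the same always block, however, can be quite different. In module code1a the sensitivity list is complete, so both pre- and post-synthesis simulation model a 2-input AND gate. In module code1b the sensitivity list contains only a. Post-synthesis simulation models a 2-input AND gate, but in pre-synthesis simulation the always block executes only when a changes. Any change on b that does not coincide with a change on a is not observed on the output, so this behavior does not match the 2-input AND gate of the post-synthesis simulation. Finally, module code1c has no sensitivity list at all. During pre-synthesis simulation this always block locks the simulator into an infinite loop, yet the post-synthesis simulation is again a 2-input AND gate.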

Example 2.1 – Incomplete sensitivity lists

module code1a (o, a, b);
	output o;
	input a, b;
	reg o;
	always @(a or b)
		o = a & b;
endmodule

module code1b (o, a, b);
	output o;
	input a, b;
	reg o;
	always @(a)
		o = a & b;
endmodule

module code1c (o, a, b);
	output o;
	input a, b;
	reg o;
	always
		o = a & b;
endmodule

// Warning: Variable 'b' is being read in routine code1b line 15 in file 'code1.v',
// 			but does not occur in the timing control of the block which begins there.	
// Warning: Variable 'a' is being read in routine code1c line 24 in file 'code1.v',
// 			but does not occur in the timing control of the block which begins there.
// Warning: Variable 'b' is being read in routine code1c line 24 in file 'code1.v',
// 			but does not occur in the timing control of the block which begins there.
// NOTE: All three modules infer a 2-input and gate
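
The code1b mismatch can be made visible with a small test bench. The sketch below is an illustration and not part of the original paper: it repeats code1b under the hypothetical name code1b_dut so that the block compiles on its own, and the test bench name and delay values are assumptions as well.

```verilog
// Copy of code1b (renamed for illustration): the sensitivity list omits b
module code1b_dut (o, a, b);
	output o;
	input a, b;
	reg o;
	always @(a)
		o = a & b;
endmodule

module tb_code1b;
	reg a, b;
	wire o;

	code1b_dut dut (o, a, b);

	initial begin
		a = 1'b0; b = 1'b0;
		#10 a = 1'b1;	// a changes: the block runs, o = 1 & 0 = 0
		#10 b = 1'b1;	// only b changes: the block is not re-evaluated
		#1 if (o !== (a & b))
			$display("RTL mismatch: a=%b b=%b o=%b, AND gate gives %b", a, b, o, a & b);
		$finish;
	end
endmodule
```

In RTL simulation o should still be 0 at the check while a & b is 1, so the message is printed. Running the same test bench against a synthesized AND gate would not show the difference.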

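2.2 Complete Sensitivity List with Mis-ordered Assignments

Within an always block, pre-synthesis assignments execute sequentially. This becomes a problem when a local temporary variable is used inside the always block. The temp variable may be used in the condition of an if statement, in a case expression, or on the right-hand side of an assignment. If temp is read before it is assigned, the assignments are mis-ordered: until the assignment to temp executes, temp still holds the value it was given on the previous pass through the always block.

In module code2a below, temp is read before it is assigned. The value assigned to temp on the previous pass through the block determines the assignment to o. On the next line, temp is given a new value that corresponds to the current pass through the always block. During pre-synthesis simulation, temp therefore behaves like a latch: its value is held for use on the next pass through the always block. The same code synthesizes as if the assignments were listed in the correct order, which causes a mismatch between pre- and post-synthesis simulation. Module code2b shows the correct ordering, for which pre- and post-synthesis simulation match.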

Example 2.2 - Complete sensitivity list with mis-ordered assignments

module code2a (o, a, b, c, d);
	output o;
	input a, b, c, d;
	reg o, temp;
	always @(a or b or c or d) begin
		o = a & b | temp;
		temp = c & d;
	end
endmodule

module code2b (o, a, b, c, d);
	output o;
	input a, b, c, d;
	reg o, temp;
	always @(a or b or c or d) begin
		temp = c & d;
		o = a & b | temp;
	end
endmodule

// Warning: Variable 'temp' is being read in routine code2a line 6 in file 'code2.v',
// 			but does not occur in the timing control of the block which begins there.

// Both designs infer an and-or gate (two 2-input and gates driving one 2-input or gate)
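
The stale value of temp in code2a can be observed with a short test bench. This is a sketch for illustration only: code2a is repeated under the hypothetical name code2a_dut, and the stimulus sequence and delays are assumptions chosen to expose the one-pass lag.

```verilog
// Copy of code2a (renamed for illustration)
module code2a_dut (o, a, b, c, d);
	output o;
	input a, b, c, d;
	reg o, temp;
	always @(a or b or c or d) begin
		o = a & b | temp;	// reads temp from the previous pass
		temp = c & d;
	end
endmodule

module tb_code2a;
	reg a, b, c, d;
	wire o;

	code2a_dut dut (o, a, b, c, d);

	initial begin
		#1 a = 1'b0; b = 1'b0; c = 1'b0; d = 1'b0;
		#10 c = 1'b1;	// this pass: temp = 1 & 0 = 0
		#10 d = 1'b1;	// this pass: o uses the old temp (0), then temp becomes 1
		#1 $display("c&d=%b but o=%b", c & d, o);
		#10 a = 1'b1;	// unrelated change on a: o finally picks up temp = 1
		#1 $display("after a changes, o=%b", o);
		$finish;
	end
endmodule
```

The and-or gate that synthesis builds would drive o high as soon as c and d are both 1. In the RTL model, o should only follow on the next pass through the block.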

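3.0 Functions

Functions always synthesize to combinational logic. For this reason, some engineers choose to code all combinational logic using functions. As long as the coded function simulates like combinational logic, using functions is not a problem. The problem arises when an engineer makes a mistake in the combinational function code and creates code that simulates like a latch. Because synthesis tools issue no warning when function code simulates latch behavior, modeling synthesizable combinational logic with functions is a risky practice.

In the example below, module code3a shows a typical way to code a latch. When the same if statement is used inside a function, as in module code3b, the result is a 3-input AND gate. If the code in a function is written in a way that would otherwise infer a latch, pre-synthesis simulation models the latch while post-synthesis simulation models combinational logic, so the two results do not match.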

Example 3.0 – Latch code in a function

module code3a (o, a, nrst, en);
	output o;
	input a, nrst, en;
	reg o;
	always @(a or nrst or en)
		if (!nrst) o = 1'b0;
		else if (en) o = a;
endmodule

// Infers a latch with asynchronous low-true nrst and transparent high latch enable "en"

module code3b (o, a, nrst, en);
	output o;
	input a, nrst, en;
	reg o;
	always @(a or nrst or en)
		o = latch(a, nrst, en);
	function latch;
		input a, nrst, en;
		if (!nrst) latch = 1'b0;
		else if (en) latch = a;
	endfunction
endmodule

// Infers a 3-input and gate
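
If the 3-input AND gate is what was actually intended, one way to keep simulation and synthesis in agreement is to give the function result a value on every call. The module below is a minimal sketch under that assumption; the names code3c_sketch and and3 are hypothetical and do not appear in the original paper.

```verilog
module code3c_sketch (o, a, nrst, en);
	output o;
	input a, nrst, en;
	reg o;
	always @(a or nrst or en)
		o = and3(a, nrst, en);

	function and3;
		input a, nrst, en;
		begin
			and3 = 1'b0;			// default: every call assigns the result
			if (nrst && en) and3 = a;	// same conditions as the if / else-if in code3b
		end
	endfunction
endmodule
```

Because no path through the function leaves the result unassigned, nothing is held from the previous call, and the RTL should simulate as plain combinational logic.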

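4.0 Case Statements

4.1 Full Case

The synthesis directive //synopsys full_case gives the synthesis tool more information about the design than is given to the simulator. This directive tells the synthesis tool that the case statement is fully defined and that the output assignments for all unused cases are "don't cares". With this directive, the functionality of the pre- and post-synthesis designs may or may not remain the same. In addition, although the directive tells the synthesis tool to treat unused states as "don't cares", it sometimes makes a design larger and slower than the same design without full_case.

In module code4a, the case statement is written without any synthesis directive. The resulting design is a decoder built from 3-input AND gates and inverters, and pre- and post-synthesis simulation match. Module code4b uses the same case statement with the full_case directive. Because of the directive, the en input is optimized away during synthesis and left as a dangling input. The pre-synthesis simulation results of modules code4a and code4b match the post-synthesis simulation results of code4a, but do not match the post-synthesis simulation results of code4b.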

Example 4.1 – Full Case

// no full_case
// Decoder built from four 3-input and gates and two inverters

module code4a (y, a, en);
	output [3:0] y;
	input [1:0] a;
	input en;
	reg [3:0] y;
	
	always @(a or en) begin
		y = 4'h0;
		case ({en,a})
			3'b1_00: y[a] = 1'b1;
			3'b1_01: y[a] = 1'b1;
			3'b1_10: y[a] = 1'b1;
			3'b1_11: y[a] = 1'b1;
		endcase
	end
endmodule

// full_case example
// Decoder built from four 2-input nor gates and two inverters
// The enable input is dangling (has been optimized away)

module code4b (y, a, en);
	output [3:0] y;
	input [1:0] a;
	input en;
	reg [3:0] y;
	always @(a or en) begin
		y = 4'h0;
		case ({en,a}) // synopsys full_case
			3'b1_00: y[a] = 1'b1;
			3'b1_01: y[a] = 1'b1;
			3'b1_10: y[a] = 1'b1;
			3'b1_11: y[a] = 1'b1;
		endcase
	end
endmodule
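
To see what a dangling en means in practice, the RTL decoder can be compared against a behavioral stand-in for the code4b netlist. Everything in the sketch below is an assumption made for illustration: dec_fullcase_model is not a real synthesis result, only a model that ignores en in the way the text describes, and the module and test bench names are hypothetical.

```verilog
// RTL behavior shared by code4a and code4b before synthesis
module dec_rtl (y, a, en);
	output [3:0] y;
	input [1:0] a;
	input en;
	reg [3:0] y;
	always @(a or en) begin
		y = 4'h0;
		case ({en,a})
			3'b1_00: y[a] = 1'b1;
			3'b1_01: y[a] = 1'b1;
			3'b1_10: y[a] = 1'b1;
			3'b1_11: y[a] = 1'b1;
		endcase
	end
endmodule

// Hypothetical stand-in for the code4b netlist: en drives no logic
module dec_fullcase_model (y, a, en);
	output [3:0] y;
	input [1:0] a;
	input en;
	assign y = 4'b0001 << a;
endmodule

module tb_full_case;
	reg [1:0] a;
	reg en;
	wire [3:0] y_rtl, y_model;
	integer i;

	dec_rtl u_rtl (y_rtl, a, en);
	dec_fullcase_model u_model (y_model, a, en);

	initial begin
		#1 en = 1'b0;	// decoder disabled
		for (i = 0; i < 4; i = i + 1) begin
			a = i;
			#1 if (y_rtl !== y_model)
				$display("en=%b a=%b rtl=%b model=%b", en, a, y_rtl, y_model);
		end
		$finish;
	end
endmodule
```

With en held low, the RTL keeps y at 4'b0000 while the stand-in still decodes a, so a difference should be reported for every value of a.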

4.2 Parallel Case

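The synthesis directive //synopsys parallel_case gives the synthesis tool more information about the design than is given to the simulator. This directive tells the synthesis tool that all case items should be tested in parallel, even when there are overlapping case items that would normally infer a priority encoder. When a design does have overlapping case items, the functionality of the pre- and post-synthesis designs will differ. In some cases, using this directive makes the design larger and slower.

One consultant described the experience of adding parallel_case to an RTL design to improve optimized area and speed. The RTL model, which behaved like a priority encoder, passed the test bench, but testing missed the flaw when simulating the gate-level model, which had been implemented as non-priority parallel logic. The result: the design was wrong, the flaw was not found until ASIC prototypes were delivered, and the ASIC had to be redesigned at significant cost in both money and schedule.

Pre-synthesis simulation of modules code5a and code5b below, as well as the post-synthesis structure of code5a, yields priority-encoder functionality. The post-synthesis structure of code5b, however, is two AND gates. The //synopsys parallel_case directive causes the priority-encoder case statement to be synthesized as parallel logic, so pre- and post-synthesis simulation do not match.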

Example 4.2 – Parallel Case

// no parallel_case
// Priority encoder - 2-input nand gate driving an inverter (z-output) and also driving a 3-input and gate (y-output)

module code5a (y, z, a, b, c, d);
	output y, z;
	input a, b, c, d;
	reg y, z;
	always @(a or b or c or d) begin
		{y, z} = 2'b0;
		casez ({a, b, c, d})
			4'b11??: z = 1;
			4'b??11: y = 1;
		endcase
	end
endmodule

// parallel_case
// two parallel 2-input and gates

module code5b (y, z, a, b, c, d);
	output y, z;
	input a, b, c, d;
	reg y, z;
	always @(a or b or c or d) begin
		{y, z} = 2'b0;
		casez ({a, b, c, d}) // synopsys parallel_case
			4'b11??: z = 1;
			4'b??11: y = 1;
		endcase
	end
endmodule
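
If two independent AND gates are what the designer really wants, that intent can be written directly in the RTL, so that no directive is needed and the simulator sees the same behavior as the synthesis tool. The module below is a sketch of that idea; the name code5c_sketch is hypothetical and the coding style is a suggestion, not something taken from the original paper.

```verilog
module code5c_sketch (y, z, a, b, c, d);
	output y, z;
	input a, b, c, d;
	reg y, z;
	always @(a or b or c or d) begin
		{y, z} = 2'b0;
		if (a & b) z = 1'b1;	// tested independently of the c/d check
		if (c & d) y = 1'b1;	// no priority between the two conditions
	end
endmodule
```

For the input 4'b1111, this version sets both y and z in simulation, whereas the casez in code5a and code5b sets only z before synthesis.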

4.3 casex

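Using the casex statement can cause design problems. A casex treats an 'X' as a "don't care" whether it appears in the case expression or in a case item. The problem with casex arises when an input tested by the casex expression is initialized to an unknown state. Pre-synthesis simulation treats the unknown input as a "don't care" when the casex statement is evaluated, whereas the equivalent post-synthesis simulation propagates the 'X' through the gate-level model if that condition is tested.

One company described its experience with casex in a design. After reset was released, the design entered a state in which one of the inputs to a casex statement was unknown. Because pre-synthesis RTL simulation treated the unknown input as a "don't care", the casex statement erroneously initialized the design to a working state. The gate-level simulations were not sophisticated or detailed enough to catch the error, and the first batch of ASICs came back with a serious flaw.

Module code6 below is a simple address decoder with an enable. Sometimes a design error in an external interface causes the enable to glitch to an unknown state after initialization, before it settles to a valid state. While the enable is unknown, the case selector erroneously matches one of the case items based on the value of addr. In the pre-synthesis design this can mask a reset-initialization problem that becomes visible only in post-synthesis simulation. A similar situation can arise if the MSB of the address bus goes unknown while en is asserted: either memce0 or memce1 is then asserted whenever the chip select (cs) signal should have been asserted.

Guideline: do not use casex in RTL code. It is too easy to match a stray unknown signal. Prefer the casez statement, as shown in the next section.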

Example 4.3 - Casex Address Decoder

module code6 (memce0, memce1, cs, en, addr);
	output memce0, memce1, cs;
	input en;
	input [31:30] addr;
	reg memce0, memce1, cs;
	always @(addr or en) begin
		{memce0, memce1, cs} = 3'b0;
		casex ({addr, en})
			3'b101: memce0 = 1'b1;
			3'b111: memce1 = 1'b1;
			3'b0?1: cs = 1'b1;
		endcase
	end
endmodule
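
The stray match on an unknown enable can be reproduced with a few lines of stimulus. The sketch below repeats code6 under the hypothetical name code6_dut so that it compiles on its own; the test bench name and timing are assumptions for illustration.

```verilog
// Copy of code6 (renamed for illustration)
module code6_dut (memce0, memce1, cs, en, addr);
	output memce0, memce1, cs;
	input en;
	input [31:30] addr;
	reg memce0, memce1, cs;
	always @(addr or en) begin
		{memce0, memce1, cs} = 3'b0;
		casex ({addr, en})
			3'b101: memce0 = 1'b1;
			3'b111: memce1 = 1'b1;
			3'b0?1: cs = 1'b1;
		endcase
	end
endmodule

module tb_casex;
	reg en;
	reg [31:30] addr;
	wire memce0, memce1, cs;

	code6_dut dut (memce0, memce1, cs, en, addr);

	initial begin
		#1 addr = 2'b10; en = 1'bx;	// enable has not settled to a valid value
		#1 $display("en=%b addr=%b -> memce0=%b memce1=%b cs=%b",
			en, addr, memce0, memce1, cs);
		$finish;
	end
endmodule
```

Because casex treats the unknown en as a wildcard, the first item 3'b101 matches and memce0 should be driven to 1 even though the enable was never asserted.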

4.4 casez

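The casez statement can cause the same design problems as casex, but those problems are less likely to be missed during verification. With casez, the problem occurs when an input is initialized to a high-impedance state. Nevertheless, casez is a short, concise, tabular way to code certain useful structures such as priority encoders, interrupt handlers, and address decoders, so it should not be dismissed entirely from the design engineer's repertoire of useful HDL coding structures.

Module code7 is the same simple address decoder with enable as module code6 above, except that it uses casez instead of casex. The problem described in Section 4.3 occurs when one of the inputs goes to a high-impedance state rather than an unknown state. Again, an erroneous case match occurs, depending on the state of the other inputs to the case statement. A stray match is less likely with casez (a floating input or a tri-state driven signal) than with casex (a signal that goes unknown briefly), but the potential problem does exist. Note that casez is useful for modeling address decoders and priority encoders.

Guideline: use casez sparingly and cautiously in RTL code, since it is possible to match a stray tri-state signal.

Example 4.4 - Casez Address Decoder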

module code7 (memce0, memce1, cs, en, addr);
	output memce0, memce1, cs;
	input en;
	input [31:30] addr;
	reg memce0, memce1, cs;
	always @(addr or en) begin
		{memce0, memce1, cs} = 3'b0;
		casez ({addr, en})
			3'b101: memce0 = 1'b1;
			3'b111: memce1 = 1'b1;
			3'b0?1: cs = 1'b1;
		endcase
	end
endmodule
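
The difference between a floating and an unknown enable under casez can be checked the same way. As before, this is only a sketch: code7 is repeated under the hypothetical name code7_dut, and the test bench name and timing are assumptions.

```verilog
// Copy of code7 (renamed for illustration)
module code7_dut (memce0, memce1, cs, en, addr);
	output memce0, memce1, cs;
	input en;
	input [31:30] addr;
	reg memce0, memce1, cs;
	always @(addr or en) begin
		{memce0, memce1, cs} = 3'b0;
		casez ({addr, en})
			3'b101: memce0 = 1'b1;
			3'b111: memce1 = 1'b1;
			3'b0?1: cs = 1'b1;
		endcase
	end
endmodule

module tb_casez;
	reg en;
	reg [31:30] addr;
	wire memce0, memce1, cs;

	code7_dut dut (memce0, memce1, cs, en, addr);

	initial begin
		#1 addr = 2'b10; en = 1'bz;	// floating enable: z acts as a wildcard
		#1 $display("en=%b -> memce0=%b", en, memce0);
		en = 1'bx;			// unknown enable: x is not a wildcard in casez
		#1 $display("en=%b -> memce0=%b", en, memce0);
		$finish;
	end
endmodule
```

With en floating, memce0 should be asserted by the stray match. With en unknown, no item matches and all outputs keep their default of 0, which is why casez hides fewer initialization problems than casex.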

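5.0 Initialization

5.1 Assigning 'X'

When making assignments in RTL code, it is sometimes tempting to assign 'X'. A Verilog simulator interprets an 'X' assignment as unknown (except in casex, as discussed earlier), but a synthesis tool interprets it as a "don't care". Assigning 'X' can cause mismatches between pre- and post-synthesis simulation; however, it can also be a useful trick. In an FSM design with unused states, assigning 'X' to the state variable helps debug bogus state transitions. This is done by defaulting the next-state register to 'X' before entering the case statement, so that any incorrect state transition produces 'X'. Keep in mind that the synthesis tool interprets the unused 'X' state transitions as "don't cares" for better synthesis optimization.

Modules code8a and code8b are simple Verilog models of a 3-to-1 multiplexer. The coding style in code8a causes a simulation mismatch if the select s has the value 2'b11; the coding style in code8b has no such mismatch. The mismatch can be valuable if the 2'b11 combination of s is never expected: if it does occur, it becomes obvious during simulation because the y output is driven to an unexpected 'X' (which can help debugging). If, however, the design routinely and harmlessly passes through the 2'b11 select state, the first coding style causes a simulation mismatch.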

Example 5.1 – Initializing with 'X'

// Note: the second example synthesizes to a smaller and faster implementation than the first example.

module code8a (y, a, b, c, s);
	output y;
	input a, b, c;
	input [1:0] s;
	reg y;
	always @(a or b or c or s) begin
		y = 1'bx;
		case (s)
			2'b00: y = a;
			2'b01: y = b;
			2'b10: y = c;
		endcase
	end
endmodule

module code8b (y, a, b, c, s);
	output y;
	input a, b, c;
	input [1:0] s;
	reg y;
	always @(a or b or c or s)
		case (s)
			2'b00: y = a;
			2'b01: y = b;
			2'b10, 2'b11: y = c;
		endcase
endmodule
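
A test bench can use the 'X' default in code8a as a tripwire for the unexpected select value. The following sketch repeats code8a under the hypothetical name code8a_dut; the check with the case-equality operator === and the stimulus values are assumptions added for illustration.

```verilog
// Copy of code8a (renamed for illustration)
module code8a_dut (y, a, b, c, s);
	output y;
	input a, b, c;
	input [1:0] s;
	reg y;
	always @(a or b or c or s) begin
		y = 1'bx;
		case (s)
			2'b00: y = a;
			2'b01: y = b;
			2'b10: y = c;
		endcase
	end
endmodule

module tb_code8a;
	reg a, b, c;
	reg [1:0] s;
	wire y;

	code8a_dut dut (y, a, b, c, s);

	initial begin
		#1 a = 1'b0; b = 1'b1; c = 1'b1; s = 2'b10;
		#1 $display("s=%b y=%b", s, y);	// legal select: y follows c
		s = 2'b11;			// unused select value
		#1 if (y === 1'bx)
			$display("s=%b drives y to X in RTL simulation", s);
		$finish;
	end
endmodule
```

The gate-level netlist will drive some definite 0 or 1 for s = 2'b11, so this check is only meaningful in pre-synthesis simulation.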

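5.2 Initializing Models with translate_off / translate_on

This point may seem too obvious to mention. However, one engineer had the following experience. He was using a state machine tool that generated FSM code containing initialized variables, and the initialization was hidden from synthesis by translate_off / translate_on directives. Pre-synthesis simulation ran fine, but the first production ASIC did not initialize correctly, which required an ASIC redesign. Module code9 shows the incorrect use of translate_off/translate_on to initialize part of a design. This is very likely to cause a mismatch between pre- and post-synthesis simulation.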

Example 5.2 - initialization using translate_off / translate_on

module code9 (y1, go, clk, nrst);
	output y1;
	input go, clk, nrst;
	reg y1;
	parameter IDLE = 1'd0,
			  BUSY = 1'd1;
	reg [0:0] state, next;
	// Hiding the initialization of variables from the synthesis tool is a very dangerous practice!!
	// synopsys translate_off
	initial y1 = 1'b1;
	// synopsys translate_on
	always @(posedge clk or negedge nrst)
		if (!nrst) state <= IDLE;
		else state <= next;
	
	always @(state or go) begin
		next = 1'bx;
		y1 = 1'b0;
		case (state)
			IDLE: if (go) next = BUSY;
			BUSY: begin
				if (!go) next = IDLE;
				y1 = 1'b1;
			end
		endcase
	end
endmodule
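
When a signal genuinely needs a defined starting value, one option is to take that value from a reset that both the simulator and the synthesis tool can see, rather than from a hidden initial block. The module below is a minimal sketch of that idea under the assumption that the signal may be registered; y1_reg_sketch and y1_next are hypothetical names and are not part of code9.

```verilog
module y1_reg_sketch (y1, y1_next, clk, nrst);
	output y1;
	input y1_next, clk, nrst;
	reg y1;
	always @(posedge clk or negedge nrst)
		if (!nrst) y1 <= 1'b1;	// start-up value comes from reset, not from an initial block
		else y1 <= y1_next;
endmodule
```

Since nothing is hidden behind translate_off/translate_on, the reset value is part of the synthesized logic and of the RTL model alike.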

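6.0 General Use of translate_off / translate_on

In general, the translate_off/translate_on synthesis directives should be used with caution. They are fine when used to display information about a design, but dangerous when used to model functionality. One exception is the D flip-flop with both asynchronous reset and asynchronous set. Its typical coding style synthesizes to the correct logic 100% of the time, but in pre-synthesis simulation it models the correct functionality only 99% of the time. This exception needs a non-synthesizable construct to make pre-synthesis simulation accurate and consistent with post-synthesis simulation. The failing condition is created as follows: assert reset, assert set, remove reset, and leave set asserted. In this case the D flip-flop model needs some help in pre-synthesis simulation to model the set condition correctly, because the always block is entered only on the active edge of set or reset. Both inputs are asynchronous, so set should take effect as soon as reset is removed, but it does not, because nothing triggers the always block. The fix is to model the flip-flop using translate_off/translate_on directives and force the output to the correct value for this one condition. The best advice is to avoid, wherever possible, situations that require a flip-flop with both asynchronous set and reset.

Module code10a simulates correctly 99% of the time (pre-synthesis); it has the flaw described above. If negedge is removed from rstn and setn in the sensitivity list, as in module code10b, the design will not simulate correctly either pre- or post-synthesis, nor will it synthesize correctly. Finally, module code10c shows the fix, which simulates correctly 100% of the time and matches between pre- and post-synthesis simulation. This code uses the translate_off/translate_on directives to force the correct output for the exception condition described above.

Example 6.0 - translate_off / translate_on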

// Generally good DFF with asynchronous set and reset
module code10a (q, d, clk, rstn, setn);
	output q;
	input d, clk, rstn, setn;
	reg q;
	always @(posedge clk or negedge rstn or negedge setn)
		if (!rstn) q <= 0; // asynchronous reset
		else if (!setn) q <= 1; // asynchronous set
		else q <= d;
endmodule

// Bad DFF with asynchronous set and reset. This design will not compile from Synopsys, and the design will not simulate correctly.
// synopsys translate_off

module code10b (q, d, clk, rstn, setn);
	output q;
	input d, clk, rstn, setn;
	reg q;
	always @(posedge clk or rstn or setn)
		if (!rstn) q <= 0; // asynchronous reset
		else if (!setn) q <= 1; // asynchronous set
		else q <= d;
endmodule

// synopsys translate_on

// Good DFF with asynchronous set and reset and self-correcting set-reset assignment
module code10c (q, d, clk, rstn, setn);
	output q;
	input d, clk, rstn, setn;
	reg q;
	always @(posedge clk or negedge rstn or negedge setn)
		if (!rstn) q <= 0; // asynchronous reset
		else if (!setn) q <= 1; // asynchronous set
		else q <= d;
		
// synopsys translate_off
	always @(rstn or setn)
		if (rstn && !setn) force q = 1;
		else release q;
// synopsys translate_on
endmodule
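
The failing sequence for code10a (assert reset, assert set, remove reset) is easy to drive from a test bench. The sketch below repeats code10a under the hypothetical name code10a_dut; the test bench name and delays are assumptions for illustration.

```verilog
// Copy of code10a (renamed for illustration)
module code10a_dut (q, d, clk, rstn, setn);
	output q;
	input d, clk, rstn, setn;
	reg q;
	always @(posedge clk or negedge rstn or negedge setn)
		if (!rstn) q <= 0; // asynchronous reset
		else if (!setn) q <= 1; // asynchronous set
		else q <= d;
endmodule

module tb_code10a;
	reg d, clk, rstn, setn;
	wire q;

	code10a_dut dut (q, d, clk, rstn, setn);

	initial begin
		#1 clk = 1'b0; d = 1'b0; rstn = 1'b1; setn = 1'b1;
		#10 rstn = 1'b0;	// assert reset: q <= 0
		#10 setn = 1'b0;	// assert set while reset is active: q stays 0
		#10 rstn = 1'b1;	// remove reset: no negedge, so the block is not entered
		#1 $display("rstn=%b setn=%b q=%b", rstn, setn, q);
		$finish;
	end
endmodule
```

A real flip-flop with set still asserted would drive q to 1 once reset is released. In this RTL simulation q should remain 0 until the next clock edge, which is the gap that the force/release block in code10c closes.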

7.0 Timing Delays

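Always blocks that do not schedule their events in zero time can miss events triggered by RTL or behavioral models. As module code11 shows, placing delays on the left-hand side of assignments (in front of the statements) makes pre-synthesis simulation differ from post-synthesis simulation. First, once the always block has been entered because of a change on the sensitivity-list variable in, later changes on in do not cause re-entry until the block exits 65 time units later. Second, after a 25-time-unit delay, the current value of in is read, inverted, and assigned to out1. After another 40 time units, in is read again, inverted, and assigned to out2. All other events on in during the delays are ignored. If in changes more often than once every 65 time units, the outputs are not updated on every input change. The post-synthesis gate-level model simulates two inverters, whereas the pre-synthesis RTL code misses multiple input transitions. Placing delays on the left-hand side of always-block assignments does not accurately model either RTL or behavioral models.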

Example 7.0 - Timing Delays

module code11 (out1, out2, in);
	output out1, out2;
	input in;
	reg out1, out2;
	always @(in) begin
		#25 out1 = ~in;
		#40 out2 = ~in;
	end
endmodule
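
One commonly used alternative, which is not taken from the original paper, is to move the delay to the right-hand side of a nonblocking assignment. The always block then completes in zero time and is ready for the next change on in, while each update is scheduled for a later time. The module name code11_sketch and the choice of 25 and 65 time units (to mirror the cumulative delays in code11) are assumptions.

```verilog
module code11_sketch (out1, out2, in);
	output out1, out2;
	input in;
	reg out1, out2;
	always @(in) begin
		out1 <= #25 ~in;	// sampled now, applied 25 time units after this change of in
		out2 <= #65 ~in;	// sampled now, applied 65 time units after this change of in
	end
endmodule
```

Every input transition is sampled, so the outputs follow in as delayed, inverted copies. Synthesis tools typically ignore the delay values, so the synthesized logic is still two inverters.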

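8.0 Conclusions

Knowing which coding styles cause mismatches between pre- and post-synthesis simulation, understanding why the mismatches occur, and avoiding error-prone coding styles greatly reduces RTL design flaws and the debug time needed to fix mismatches. As ASICs keep growing in size, running 100%-coverage regression tests after every synthesis run becomes less and less practical. Designers must use every method available early in the design cycle to reduce risk.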