java仿真陷入机制,VHDL仿真陷入循环

I'm doing simulation testing for some VHDL I wrote and when I run it in ModelSim it gets stuck. When I hit 'break' it has an arrow pointing to the For loop in the following function:

function MOD_3 (a, b, c : UNSIGNED (1023 downto 0)) return UNSIGNED is

VARIABLE x : UNSIGNED (1023 downto 0) := TO_UNSIGNED(1, 1024);

VARIABLE y : UNSIGNED (1023 downto 0) := a;

VARIABLE b_temp : UNSIGNED (1023 downto 0) := b;

begin

for I in 0 to 1024 loop

if b_temp > 0 then

if b_temp MOD 2 = 1 then

x := (x * y) MOD c;

end if;

y := (y * y) MOD c;

b_temp := b_temp / 2;

else

exit;

end if;

end loop;

return x MOD c;

end function;

I originally had this as a while loop which I realize is not good for synthesizing. So I converted it to a for loop with the condition that b_temp is greater than 0. b_temp is a 1024-bit unsigned and so if it is the largest number that could be represented by 1024 bits and divided in half (which I do in each iteration) 1024 times, shouldn't it definitely be 0?

I have a feeling my problem lies in the large multiplications...if I comment out x := (x * y) MOD c and y := (y * y) MOD c then it exits the loop. So the only thing I can think of is it takes too long to carry out these 1024-bit multiplications? If this is the case, is there any built-in way I can optimize this to make it faster, or is my only option to implement something like Karatsuba multiplication, etc...?

解决方案

I'm of the opinion implementing a Montgomery multiplier in numeric_std function calls may not improve simulation as much as you'd like (while giving synthesis eligibility).

The issue is the number of dynamically elaborated subprogram calls vs. their operand sizes vs. fitting in your CPU-running-Modelsim's L1/L2/L3 caches.

It does do wonders for targeting synthesis in an FPGA or a SIMD GPU implementation.

See Subversion Repositories BasicRSA file modmult.vhd (which has a generic size). I successfully converted this to using numeric_std[_unsigned].

If I recall correctly this appears inspired by

a Masters thesis (Efficient Hardware Architectures for Modular Multiplication) by David Narh Amanor in 2005 outlining a Java and a VHDL implementation in various word sizes.

I found the OpenCores implementation mentioned in a Stackoverflow question (Montgomery multiplication VHDL Implementation) and found the generic sized version in the SVN repository (the downloadable version is 16 bit) and the mention of the thesis in A 1024 – Bit Implementation of the Faster Montgomery Multiplier Using VHDL (by David Narh Anamor, the original link having expired). Note the quoted FPGA implementation performance under 42 usec.

Notice the length 1024 version specified by a generic would still be performing dynamically elaborated function calls with length 1024 operands (although not the "*"s, the "mod"s or the "/"s. You'd still be doing millions of function calls with dynamically elaborated (passed on an expression stack) 1024 bit parameters. We're simply changing how many millions of large parameter subroutine calls and how long they can take.

And that also brings up the possibility of an integer vector implementation (bignum equivalent) in VHDL, which would potential increase simulation performance even more (and you're likely in uncharted territory here).

A subprogram based version of the OpenCores model using variable parameters would be telling. (Whether or not you can impress anyone showing them a simulation model executing, or whether there's this looong pause interrupted by everyone taking furtive glances at the wall clock and looking bored).

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值