java仿真陷入机制,VHDL仿真陷入循环

最新推荐文章于 2023-11-09 12:55:17 发布

weixin_39911916

最新推荐文章于 2023-11-09 12:55:17 发布

阅读量494

点赞数

文章标签： java仿真陷入机制

I'm doing simulation testing for some VHDL I wrote and when I run it in ModelSim it gets stuck. When I hit 'break' it has an arrow pointing to the For loop in the following function:

function MOD_3 (a, b, c : UNSIGNED (1023 downto 0)) return UNSIGNED is

VARIABLE x : UNSIGNED (1023 downto 0) := TO_UNSIGNED(1, 1024);

VARIABLE y : UNSIGNED (1023 downto 0) := a;

VARIABLE b_temp : UNSIGNED (1023 downto 0) := b;

begin

for I in 0 to 1024 loop

if b_temp > 0 then

if b_temp MOD 2 = 1 then

x := (x * y) MOD c;

end if;

y := (y * y) MOD c;

b_temp := b_temp / 2;

else

exit;

end if;

end loop;

return x MOD c;

end function;

I originally had this as a while loop which I realize is not good for synthesizing. So I converted it to a for loop with the condition that b_temp is greater than 0. b_temp is a 1024-bit unsigned and so if it is the largest number that could be represented by 1024 bits and divided in half (which I do in each iteration) 1024 times, shouldn't it definitely be 0?

I have a feeling my problem lies in the large multiplications...if I comment out x := (x * y) MOD c and y := (y * y) MOD c then it exits the loop. So the only thing I can think of is it takes too long to carry out these 1024-bit multiplications? If this is the case, is there any built-in way I can optimize this to make it faster, or is my only option to implement something like Karatsuba multiplication, etc...?

解决方案

I'm of the opinion implementing a Montgomery multiplier in numeric_std function calls may not improve simulation as much as you'd like (while giving synthesis eligibility).

The issue is the number of dynamically elaborated subprogram calls vs. their operand sizes vs. fitting in your CPU-running-Modelsim's L1/L2/L3 caches.

It does do wonders for targeting synthesis in an FPGA or a SIMD GPU implementation.

See Subversion Repositories BasicRSA file modmult.vhd (which has a generic size). I successfully converted this to using numeric_std[_unsigned].

If I recall correctly this appears inspired by

a Masters thesis (Efficient Hardware Architectures for Modular Multiplication) by David Narh Amanor in 2005 outlining a Java and a VHDL implementation in various word sizes.

I found the OpenCores implementation mentioned in a Stackoverflow question (Montgomery multiplication VHDL Implementation) and found the generic sized version in the SVN repository (the downloadable version is 16 bit) and the mention of the thesis in A 1024 – Bit Implementation of the Faster Montgomery Multiplier Using VHDL (by David Narh Anamor, the original link having expired). Note the quoted FPGA implementation performance under 42 usec.

Notice the length 1024 version specified by a generic would still be performing dynamically elaborated function calls with length 1024 operands (although not the "*"s, the "mod"s or the "/"s. You'd still be doing millions of function calls with dynamically elaborated (passed on an expression stack) 1024 bit parameters. We're simply changing how many millions of large parameter subroutine calls and how long they can take.

And that also brings up the possibility of an integer vector implementation (bignum equivalent) in VHDL, which would potential increase simulation performance even more (and you're likely in uncharted territory here).

A subprogram based version of the OpenCores model using variable parameters would be telling. (Whether or not you can impress anyone showing them a simulation model executing, or whether there's this looong pause interrupted by everyone taking furtive glances at the wall clock and looking bored).

weixin_39911916

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
java仿真陷入机制,VHDL仿真陷入循环

I'm doing simulation testing for some VHDL I wrote and when I run it in ModelSim it gets stuck. When I hit 'break' it has an arrow pointing to the For loop in the following function:function MOD_3 (a...
复制链接

扫一扫