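A friend asked me about instantiating and using Xilinx's built-in IP cores, so I wrote up what we found and am posting it here.
The question was this: the floating-point divider IP lets you set its latency anywhere from 0 to 28 cycles, so does a latency of 28 give better timing, since it amounts to a 28-stage pipeline? The trade-off would be a larger area. That was the conclusion my friend and I first reached. But once we instantiated the floating-point divider IP, the results ISE reported did not match it. We later realized that we had instantiated the IP but forgotten to place registers before and after it, and that is why the ISE numbers misled us. Others may have run into the same problem, so I am writing it up here in the hope that it is useful.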
Contents
1. Overall analysis
2. Using the divider IP on the 3E Starter Kit
3. Using the divider IP on the XUPV5-LX110T board
1. Analysis
Overall, the results agree with our earlier discussion; it was our first ISE run that was flawed.
―――――――――――――――――――――――
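Let us look at the 3E board first, i.e. section 2 below.
The result in 1) is what we did originally: a design that supposedly reaches 758 MHz. In fact the design cannot run at that speed, and the 758 MHz figure is not a real clock period in any meaningful sense. The design has a register only on the output side, whereas the clock period must be at least the largest delay between two adjacent register stages. With a single register stage there is no pair of stages across the divider from which to compute a clock period, so the reported figure is misleading. Looking at the critical path after place and route shows that the tool only timed the delay into that one register stage.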
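The point about which paths count toward the clock period can be illustrated with a small Python sketch. It is a toy model, not how ISE works internally: the path lists, the 1.0 ns pin-to-register delay and the assumption that the divider's combinational delay in case 1) equals the 126.555 ns reported for case 2) are all illustrative choices.

```python
# Toy timing model: each path is (start_kind, end_kind, delay_ns).
# "ff" is a flip-flop, "pad" is a device pin. All path lists are illustrative.

def min_period_ns(paths):
    """Return the largest register-to-register delay, or None if there is none."""
    reg_to_reg = [delay for start, end, delay in paths
                  if start == "ff" and end == "ff"]
    return max(reg_to_reg) if reg_to_reg else None

# Assumption: the divider logic is as slow in case 1) as reported for case 2).
DIVIDER_DELAY_NS = 126.555

# Case 1): no input register, so the divider sits on a pad-to-ff path
# that the clock-period figure does not cover.
case1 = [("pad", "ff", DIVIDER_DELAY_NS), ("ff", "ff", 1.319)]

# Case 2): an input register turns the divider into an ff-to-ff path.
# The 1.0 ns pad-to-ff delay is a made-up placeholder.
case2 = [("pad", "ff", 1.0), ("ff", "ff", DIVIDER_DELAY_NS), ("ff", "ff", 1.319)]

print(min_period_ns(case1))  # 1.319
print(min_period_ns(case2))  # 126.555
```

In this model the slow divider path is present in both cases; only in the second one does it land between two registers and therefore show up in the minimum period.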
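The result in 2) is the correct one. With a register stage added at the input, the design can only run at about 8 MHz (7.902 MHz). Only now do we see the true delay of the divider IP configured with a latency of one: about 126.6 ns. Structurally, the critical path now runs from the input register stage to the output register stage, i.e. the path between two flip-flop (FD) stages, which is simply the delay through one divider.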
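Finally, in 3) the divider IP is configured with an internal latency of 28 clock cycles. Resource usage goes up (flip-flops rise from 97 to 1420, LUTs from 744 to 800, and slices from 426 to 805), but the clock gets much faster: the design can run at about 180 MHz.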
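To make the trade-off between 2) and 3) concrete, here is a rough Python sketch that converts the reported minimum periods into frequencies and compares end-to-end latency with throughput. It assumes the 28-cycle core accepts a new operand pair on every clock and counts only the IP's own cycles, not the extra input and output registers; neither assumption is stated in the reports.

```python
def fmax_mhz(period_ns):
    """Maximum clock frequency in MHz for a given minimum period in ns."""
    return 1000.0 / period_ns

def total_latency_ns(period_ns, cycles):
    """Time from operands in to result out, counting only the IP's cycles."""
    return period_ns * cycles

# Minimum periods taken from the 3E synthesis reports for 2) and 3).
single_cycle = {"period_ns": 126.555, "cycles": 1}
pipelined = {"period_ns": 5.568, "cycles": 28}

for name, cfg in (("latency 1", single_cycle), ("latency 28", pipelined)):
    print(f"{name}: {fmax_mhz(cfg['period_ns']):.1f} MHz, "
          f"{total_latency_ns(**cfg):.1f} ns from input to result")

# Assumption: one result per clock once the pipeline is full.
throughput_gain = single_cycle["period_ns"] / pipelined["period_ns"]
print(f"throughput gain: about {throughput_gain:.1f}x")
```

Under these assumptions a single division takes slightly longer through the pipelined core (about 156 ns against about 127 ns), while the rate of results goes up by a factor of more than twenty. The frequency computed from the rounded 5.568 ns period differs slightly from the 179.610 MHz printed by ISE.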
―――――――――――――――――――――――
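Now the V5 board, i.e. section 3.
Cases A), B) and C) behave much like their 3E counterparts; only the performance is higher. For example, 2) on the 3E reaches only about 8 MHz, while B) on the V5 reaches about 18 MHz. The factor of roughly two should not be taken as exact, but it is clear that the same circuit runs faster on the V5 than on the 3E.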
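The "roughly two" can be checked against the synthesis figures reported in sections 2 and 3. This is a small Python sketch for that arithmetic only; the dictionary names are mine, and the numbers are synthesis estimates rather than post-route results.

```python
def speedup(slow_period_ns, fast_period_ns):
    """How many times faster the second design's clock is than the first."""
    return slow_period_ns / fast_period_ns

# Minimum periods (ns) from the synthesis reports of each board.
board_3e = {"latency 1": 126.555, "latency 28": 5.568}
board_v5 = {"latency 1": 55.397, "latency 28": 2.808}

for config in board_3e:
    ratio = speedup(board_3e[config], board_v5[config])
    print(f"{config}: V5 is about {ratio:.2f}x faster than 3E")
```

Both configurations come out close to a factor of two, which is consistent with the estimate in the text.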
2. 3E board
1) IP latency of one, no input register, registered output.
Timing:
Minimum period: 1.319ns (Maximum Frequency: 758.150MHz)
Resources:
Device utilization summary:
---------------------------
Selected Device : 3s500efg320-4
Number of Slices: 392 out of 4656 8%
Number of Slice Flip Flops: 33 out of 9312 0%
Number of 4 input LUTs: 744 out of 9312 7%
2) IP latency of one, registers on both input and output.
Timing:
Timing Summary:
---------------
Speed Grade: -4
Minimum period: 126.555ns (Maximum Frequency: 7.902MHz)
=========================================================================
Resources:
Device utilization summary:
---------------------------
Selected Device : 3s500efg320-4
Number of Slices: 426 out of 4656 9%
Number of Slice Flip Flops: 97 out of 9312 1%
Number of 4 input LUTs: 744 out of 9312 7%
Place and route fails with an error: the critical path is far too long to meet the timing constraint (I set the period to 20 ns).
ERROR:Par:228 - At least one timing constraint is impossible to meet because
component delays alone exceed the constraint. A timing constraint summary
below shows the failing constraints (preceded with an Asterisk (*)). Please
use the Timing Analyzer (GUI) or TRCE (command line) with the Mapped NCD and
PCF files to identify which constraints and paths are failing because of the
component delays alone. If the failing path(s) is mapped to Xilinx components
as expected, consider relaxing the constraint. If it is not mapped to
components as expected, re-evaluate your HDL and how synthesis is optimizing
the path. To allow the tools to bypass this error, set the environment
variable XIL_TIMING_ALLOW_IMPOSSIBLE to 1.
3) Latency of 28 clock cycles, registers on both input and output.
Timing:
Timing Summary:
---------------
Speed Grade: -4
Minimum period: 5.568ns (Maximum Frequency: 179.610MHz)
Resources:
Device utilization summary:
---------------------------
Selected Device : 3s500efg320-4
Number of Slices: 805 out of 4656 17%
Number of Slice Flip Flops: 1420 out of 9312 15%
Number of 4 input LUTs: 800 out of 9312 8%
Timing after place and route:
Timing constraint: TS_clk = PERIOD TIMEGRP "clk" 20 ns HIGH 50%;
31903 paths analyzed, 3966 endpoints analyzed, 0 failing endpoints
0 timing errors detected. (0 setup errors, 0 hold errors)
Minimum period is 6.739ns.
--------------------------------------------------------------------------------
Slack: 13.261ns (requirement - (data path - clock path skew + uncertainty))
Source: a_temp_0 (FF)
Destination: uut1/blk00000003/blk000000e3 (FF)
Requirement: 20.000ns
Data Path Delay: 6.717ns (Levels of Logic = 12)
Clock Path Skew: -0.022ns (0.116 - 0.138)
Source Clock: clk_BUFGP rising at 0.000ns
Destination Clock: clk_BUFGP rising at 20.000ns
Clock Uncertainty: 0.000ns
Maximum Data Path: a_temp_0 to uut1/blk00000003/blk000000e3
Delay type Delay(ns) Logical Resource(s)
---------------------------- -------------------
Tcko 0.587 a_temp_0
net (fanout=2) 2.786 a_temp<0>
Topcyf 1.162 uut1/blk00000003/blk00000c59
uut1/blk00000003/blk00000116
uut1/blk00000003/blk00000114
net (fanout=1) 0.000 uut1/blk00000003/sig00000203
Tbyp 0.118 uut1/blk00000003/blk00000112
uut1/blk00000003/blk00000110
net (fanout=1) 0.000 uut1/blk00000003/sig000001ff
Tbyp 0.118 uut1/blk00000003/blk0000010e
uut1/blk00000003/blk0000010c
net (fanout=1) 0.000 uut1/blk00000003/sig000001fb
Tbyp 0.118 uut1/blk00000003/blk0000010a
uut1/blk00000003/blk00000108
net (fanout=1) 0.000 uut1/blk00000003/sig000001f7
Tbyp 0.118 uut1/blk00000003/blk00000106
uut1/blk00000003/blk00000104
net (fanout=1) 0.000 uut1/blk00000003/sig000001f3
Tbyp 0.118 uut1/blk00000003/blk00000102
uut1/blk00000003/blk00000100
net (fanout=1) 0.000 uut1/blk00000003/sig000001ef
Tbyp 0.118 uut1/blk00000003/blk000000fe
uut1/blk00000003/blk000000fc
net (fanout=1) 0.000 uut1/blk00000003/sig000001eb
Tbyp 0.118 uut1/blk00000003/blk000000fa
uut1/blk00000003/blk000000f8
net (fanout=1) 0.000 uut1/blk00000003/sig000001e7
Tbyp 0.118 uut1/blk00000003/blk000000f6
uut1/blk00000003/blk000000f4
net (fanout=1) 0.000 uut1/blk00000003/sig000001e3
Tbyp 0.118 uut1/blk00000003/blk000000f2
uut1/blk00000003/blk000000f0
net (fanout=1) 0.000 uut1/blk00000003/sig000001df
Tbyp 0.118 uut1/blk00000003/blk000000ee
uut1/blk00000003/blk000000ec
net (fanout=1) 0.000 uut1/blk00000003/sig000001db
Tcinck 1.002 uut1/blk00000003/blk000000ea
uut1/blk00000003/blk000000e7
uut1/blk00000003/blk000000e3
---------------------------- ---------------------------
Total 6.717ns (3.931ns logic, 2.786ns route)
(58.5% logic, 41.5% route)
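The figures in this report can be re-derived by hand, which helps when reading such reports. The Python sketch below recomputes the data path delay, the slack and the minimum period from the individual entries above, using the slack formula printed in the report itself. The variable names are mine; the values are copied from the report.

```python
# Values copied from the post-route timing report for case 3) on the 3E.
requirement_ns = 20.000
clock_skew_ns = -0.022      # 0.116 - 0.138
uncertainty_ns = 0.000

# Logic delays: Tcko, Topcyf, ten Tbyp carry-chain hops, Tcinck.
logic_ns = [0.587, 1.162] + [0.118] * 10 + [1.002]
# Routing delays: only the a_temp<0> net is non-zero.
route_ns = [2.786]

data_path_ns = sum(logic_ns) + sum(route_ns)

# slack = requirement - (data path - clock path skew + uncertainty)
min_period_ns = data_path_ns - clock_skew_ns + uncertainty_ns
slack_ns = requirement_ns - min_period_ns

print(f"data path : {data_path_ns:.3f} ns")
print(f"min period: {min_period_ns:.3f} ns")
print(f"slack     : {slack_ns:.3f} ns")
print(f"logic     : {100 * sum(logic_ns) / data_path_ns:.1f}%")
```

The twelve levels of logic in the report correspond to the Topcyf entry, the ten Tbyp entries and the Tcinck entry, and a single long net accounts for all of the routing delay.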
3. XUPV5-LX110T board
A) IP latency of one, no input register, registered output.
Timing:
Timing Summary:
---------------
Speed Grade: -1
Minimum period: 0.807ns (Maximum Frequency: 1239.157MHz)
Resources:
Device utilization summary:
---------------------------
Selected Device : 5vlx110tff1136-1
Slice Logic Utilization:
Number of Slice Registers: 33 out of 69120 0%
Number of Slice LUTs: 724 out of 69120 1%
Number used as Logic: 724 out of 69120 1%
Timing after place and route:
Maximum Data Path: uut1/blk00000003/blk00000010 to result_out
Delay type Delay(ns) Logical Resource(s)
---------------------------- -------------------
Tcko 0.450 uut1/blk00000003/blk00000010
net (fanout=1) 1.506 result
Tdick 0.002 result_out
---------------------------- ---------------------------
Total 1.958ns (0.452ns logic, 1.506ns route)
(23.1% logic, 76.9% route)
B) IP latency of one, registers on both input and output; same code as above.
Timing Summary:
---------------
Minimum period: 55.397ns (Maximum Frequency: 18.052MHz)
Device utilization summary:
---------------------------
Selected Device : 5vlx110tff1136-1
Slice Logic Utilization:
Number of Slice Registers: 97 out of 69120 0%
Number of Slice LUTs: 724 out of 69120 1%
Number used as Logic: 724 out of 69120 1%
Place and route again fails:
ERROR:Pack:1653 - At least one timing constraint is impossible to meet because
component delays alone exceed the constraint. A timing constraint summary
below shows the failing constraints (preceded with an Asterisk (*)). Please
use the Timing Analyzer (GUI) or TRCE (command line) with the Mapped NCD and
PCF files to identify which constraints and paths are failing because of the
component delays alone. If the failing path(s) is mapped to Xilinx components
as expected, consider relaxing the constraint. If it is not mapped to
components as expected, re-evaluate your HDL and how synthesis is optimizing
the path. To allow the tools to bypass this error, set the environment
variable XIL_TIMING_ALLOW_IMPOSSIBLE to 1.
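This is because we constrained the V5 clock to 100 MHz, while this design can only reach about 18 MHz.
C) IP latency of 28, registers on both input and output.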
Timing Summary:
---------------
Minimum period: 2.808ns (Maximum Frequency: 356.125MHz)
Minimum input arrival time before clock: 1.154ns
Device utilization summary:
--------------------------
Slice Logic Utilization:
Number of Slice Registers: 1417 out of 69120 2%
Number of Slice LUTs: 758 out of 69120 1%
Number used as Logic: 721 out of 69120 1%
Number used as Memory: 37 out of 17920 0%
Number used as SRL: 37
Timing after place and route:
Maximum Data Path: uut1/blk00000003/blk0000081e to uut1/blk00000003/blk00000097
Delay type Delay(ns) Logical Resource(s)
---------------------------- -------------------
Tcko 0.450 uut1/blk00000003/blk0000081e
net (fanout=1) 2.154 uut1/blk00000003/sig00000b4e
Tas 0.300 uut1/blk00000003/blk00000d47
uut1/blk00000003/blk00000099
uut1/blk00000003/blk00000097
---------------------------- ---------------------------
Total 2.904ns (0.750ns logic, 2.154ns route)
(25.8% logic, 74.2% route)
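As a summary, the following Python sketch compares each registered configuration with the clock constraint used on its board (20 ns on the 3E, 10 ns for the 100 MHz V5 clock). It is only a bookkeeping aid. For the failing cases the achieved value is the synthesis estimate, and for V5 C) it is the post-route data path delay alone, since clock skew and uncertainty are not shown in the excerpt.

```python
def meets(constraint_ns, achieved_ns):
    """True if the achieved period fits within the clock constraint."""
    return achieved_ns <= constraint_ns

# (label, constraint in ns, achieved period or path delay in ns)
runs = [
    ("3E 2) latency 1", 20.0, 126.555),   # synthesis estimate
    ("3E 3) latency 28", 20.0, 6.739),    # post-route minimum period
    ("V5 B) latency 1", 10.0, 55.397),    # synthesis estimate
    ("V5 C) latency 28", 10.0, 2.904),    # post-route data path only
]

for label, constraint, achieved in runs:
    status = "meets" if meets(constraint, achieved) else "fails"
    print(f"{label}: {achieved} ns against {constraint} ns -> {status}")
```

Only the two 28-cycle configurations fit their constraints, which matches the place-and-route errors reported for 2) and B).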