Lecture 1 2 Pipelining of Non-Recursive System| HW/SW Codesign for DSP

2 Pipelining of Non-Recursive System

2.1 Basic Principles

1. Flow graphs basically consist of two types of elements:

  • operators (i.e. computational or logic functions)

  • delay elements (i.e. registers, flip-flops, memories, etc.)

The operator is time-invariant: if the inputs to the operator are time-shifted, the outputs keep the same values and are time-shifted by the same amount as the inputs.
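The time-invariance property can be checked numerically. The following is a minimal sketch, assuming finite sample lists and zero padding for the shifted-in samples; the helper names `shift`, `adder` and `ramp_gain` are illustrative and do not come from the lecture.

```python
def shift(seq, n, fill=0):
    """Delay a finite sequence by n samples, padding the front with `fill`."""
    return [fill] * n + list(seq[:len(seq) - n])

def adder(a, b):
    """Time-invariant operator: sample-wise sum of two input streams."""
    return [x + y for x, y in zip(a, b)]

def ramp_gain(a):
    """Time-varying operator: the gain depends on the sample index k."""
    return [k * x for k, x in enumerate(a)]

a = [1, 2, 3, 4, 5]
b = [10, 20, 30, 40, 50]

# Delaying the inputs equals delaying the output for the adder.
adder_is_invariant = adder(shift(a, 1), shift(b, 1)) == shift(adder(a, b), 1)
# The same test fails for the index-dependent gain.
ramp_is_invariant = ramp_gain(shift(a, 1)) == shift(ramp_gain(a), 1)
```

Only operators that pass this kind of test allow registers to be moved across them without changing the computed values.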

[Figure]
[Figure]

  1. Speed-up: k+1 (the small red square in the figure below)
  2. Delay: k-1 (the red vertical bar)

Graphical interpretation of basic operator and delay

[Figure]

Registers can be moved either from the outputs of an operator to its inputs or from the inputs to the outputs.
[Figure] [Figure]

2.2 Cut-Sets

Assume:

  • Time clocked signal-flow graph (SFG)
  • Time invariant node operators
  • Discrete time delays (usually a delay is given as multiple of a clock cycle)

Def: A cut of a graph is a partitioning of the graph nodes into two disjoint subsets.
Def: The cut-set of a cut is the set of graph edges whose endpoints belong to different partitions.
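The two definitions translate directly into a few lines of code. This is a small sketch, assuming the graph is given as a list of directed (source, target) pairs; the function name `cut_set` and the example graph are hypothetical.

```python
def cut_set(edges, part_a):
    """Return the edges whose endpoints lie in different partitions.

    edges  : iterable of (source, target) node pairs
    part_a : set of nodes on one side of the cut; all others form the other side
    """
    return [(u, v) for u, v in edges if (u in part_a) != (v in part_a)]

# Hypothetical four-node flow graph: x -> m -> s -> y, plus a bypass x -> s.
edges = [("x", "m"), ("m", "s"), ("x", "s"), ("s", "y")]
crossing = cut_set(edges, part_a={"x", "m"})
```

Here the cut {x, m} versus {s, y} is crossed by the edges m -> s and x -> s, which together form the cut-set.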

Def: Cut-set rule:

  1. Construct closed cut …
  2. Move discrete delays from the branches entering the cut to the branches leaving it, or vice versa, by inserting delays (registers) on one side and corresponding negative delays (speed-ups) on the other side
  3. Combine complementary delay elements (register/speed-up) on cut-set edges, resulting in zero delay

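[Figure]

Figure 4 shows that adding a delay on one side of the cut-set and a speed-up on the other side leaves the input/output behaviour unchanged, and vice versa.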

Be aware that speed-up elements are only theoretical (auxiliary) constructs that cannot be realized in hardware. Thus, all speed-up elements must be removed (at the cost of increased latency) before the graph can be implemented as a real circuit. In the example of figure 4, we could remove the speed-up from the output y_k and consequently provide the delayed version y_{k-1} at this output port.
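The effect of dropping a speed-up can be illustrated with a clocked simulation. The sketch below is not the circuit of figure 4; it assumes a hypothetical two-operator chain `g(f(x))` with one register inserted on the edge between the operators and a register reset value of 0.

```python
def f(x):
    return 3 * x      # first operator (hypothetical)

def g(u):
    return u + 1      # second operator (hypothetical)

def original(xs):
    """Combinational chain: y_k = g(f(x_k))."""
    return [g(f(x)) for x in xs]

def pipelined(xs, reset=0):
    """Same chain with one register on the edge f -> g.

    The matching speed-up on the output edge is dropped, so the
    circuit delivers y_{k-1} in clock cycle k.
    """
    reg = reset            # register contents before the first clock edge
    ys = []
    for x in xs:
        ys.append(g(reg))  # g sees the value stored in the previous cycle
        reg = f(x)         # register captures f(x_k) at the clock edge
    return ys

xs = [1, 2, 3, 4]
y_ref = original(xs)
y_pipe = pipelined(xs)
```

Apart from the first sample, which depends on the assumed reset value, `y_pipe` is `y_ref` delayed by one clock cycle.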

[Figure]

Here the cut-set is first constructed around the node. Two realizations of retiming schemes are depicted.

2.3 Critical Path

Def: The Critical Path (CP) is the path with the longest delay/latency among all paths in the SFG; it determines the maximum system clock frequency.

  1. The CP limits the execution time of the SFG.

The CP determines the maximum clock frequency of the system, because the maximum clock frequency is inversely proportional to the delay of the CP (the longer the critical path, the lower the maximum frequency; conversely, the higher the frequency, the lower the delay).

  2. A speedup of the system can be achieved by
  • spending additional (computational) resources, i.e. adding computational units -> spatial parallelism (parallel processing)
  • increasing the clock frequency -> temporal parallelism (pipelining)
  3. Pipelining reduces the CP by inserting additional delay elements (registers) along the CP. However, the transfer function of the system must be preserved after pipelining.
    Two methods:
  • Retiming: a technique for structurally relocating delay elements within the SFG such that the SFG function is preserved.
  • Loop transformation: the graph structure is changed without affecting the transfer function, resulting in an increased number of delay elements and operators. Newly created delay elements can be moved into the CP by retiming.
  4. In the example in Fig. 6, all nodes have the same delay T. Thus, the CP passes through 4 nodes, resulting in a total delay of 4 · T. To reduce the CP, pipelining using the cut-set rule is applied along the CP, shortening it by 1/2.

[Figure]
  5. Full pipelining corresponds to a graph structure in which any two nodes are separated by a delay element. In Fig. 7, three cut-sets are constructed.
Fig. 7

  6. The cut-set rule can be seen as an application of the distributive law
  • Before: x = a·y1 + a·y2. [Figure]
  • After: x = a·(y1 + y2)
    [Figure]
  7. The length of the critical path in this example is 5 operations. Thus, 4 pipelining stages should be sufficient to fully pipeline the circuit.

[Figure]

There are 4 cut-sets in total. A delay is added at the entry of each cut-set and, correspondingly, four speed-ups are added at the exit. Since speed-ups cannot be realized in hardware, the actual output is y_{k-4}: no speed-up is available to advance it again.
The critical path is cut once between every two operations, inserting registers on the input edges of the cut-set. The output edges of the cut-set, where the speed-ups are inserted, are also the output edges of the circuit. This allows an easy removal of the speed-ups later on. The latency of the fully pipelined circuit is 4 clock cycles: y_{k-4}.
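The numbers in this example can be reproduced with a short critical-path calculation. This is a sketch under stated assumptions: the circuit is modelled as a plain chain of 5 operations with unit delay, the critical path is taken as the longest path that crosses no register, and the names `critical_path`, `ops` and `chain` are illustrative.

```python
from functools import lru_cache

def critical_path(node_delay, edges):
    """Longest register-free path delay in an acyclic flow graph.

    node_delay : dict mapping operator name -> computation time
    edges      : dict mapping (source, target) -> number of registers on the edge
    """
    succ = {n: [] for n in node_delay}
    for (u, v), regs in edges.items():
        if regs == 0:                 # only register-free edges extend a path
            succ[u].append(v)

    @lru_cache(maxsize=None)
    def longest_from(n):
        return node_delay[n] + max((longest_from(v) for v in succ[n]), default=0)

    return max(longest_from(n) for n in node_delay)

T = 1  # assumed unit delay per operation
ops = {f"op{i}": T for i in range(1, 6)}             # chain of 5 operations
chain = [(f"op{i}", f"op{i + 1}") for i in range(1, 5)]

cp_before = critical_path(ops, {e: 0 for e in chain})  # no pipeline registers
cp_after = critical_path(ops, {e: 1 for e in chain})   # one register per cut
latency = len(chain)                                    # registers between input and output
```

With one register on each of the 4 internal edges, the critical path drops from 5 operations to 1, while the output appears 4 clock cycles later.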

2.4 Pure Pipelining

In this case, cut-sets are constructed such that they contain only physically realizable register elements (positive delay).
(No speed-ups are used, because speed-ups cannot be realized in practice.)

  • An example of pure pipelining for a system
    [Figure]

Only registers are placed at the crossings of cuts and edges, resulting in the same structure as in Fig. 7. (Delays are added only at the red lines, i.e. at the exits of the three cut-sets. The three cut-sets share the same entry, namely the input "x". Because a delay was added at the output of each of the three cut-sets, a speed-up would have to be added at the corresponding entry; since speed-ups cannot be realized in hardware, this is equivalent to k+3 at the input.)

The idea behind this is that the cuts cross input edges or output edges only. In this case, we can insert a register on every cut edge and close the cut via all inputs or all outputs of the circuit, respectively, where the speed-ups are inserted. The speed-ups at the circuit inputs and outputs can simply be dropped after pipelining, so that only the internally inserted registers remain.
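One way to read this condition in code: a cut can be pipelined with registers alone when every edge crosses it in the same direction. The sketch below assumes the graph is stored as a dictionary of register counts per edge; `pipeline_cut` and the example graph are hypothetical and not taken from the lecture figures.

```python
def pipeline_cut(edges, left):
    """Insert one register on every edge crossing a feed-forward cut.

    edges : dict mapping (source, target) -> current register count
    left  : set of nodes on the input side of the cut
    Raises ValueError if any edge crosses the cut from right to left,
    since that edge would need a (non-realizable) speed-up.
    """
    result = dict(edges)
    for (u, v) in edges:
        if u not in left and v in left:
            raise ValueError(f"edge {u}->{v} crosses the cut backwards")
        if u in left and v not in left:
            result[(u, v)] += 1
    return result

# Hypothetical graph: x feeds two multipliers whose products are summed.
edges = {("x", "m1"): 0, ("x", "m2"): 0,
         ("m1", "add"): 0, ("m2", "add"): 0, ("add", "y"): 0}
pipelined = pipeline_cut(edges, left={"x", "m1", "m2"})
```

The cut between the multipliers and the adder receives one register on each of its two edges, and no speed-up remains inside the circuit.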

2.5 Example: FIR Filter

  1. FIR filter:
    [Figure]

  2. Pure pipelining for the FIR filter:
    [Figure]

  • 8 additional registers are inserted (in addition to the 3 already existing registers)
  • After pipelining, the CP contains only a single adder or constant multiplier, but the latency of the circuit has increased to 3 clock cycles. (One delay can offset one operator in the CP.)
  3. Better approach: apply the associative law of addition and flip the structure to the left (transposed form of the FIR filter)

    [Figure]

This version uses a total of 7 registers and has a latency of only 1 clock cycle.
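That the transposed form computes the same filter as the direct form can be checked with a small simulation. This is a sketch assuming four taps a0..a3, zero initial register contents and at least two coefficients; it models the textbook structures only, without the additional pipeline registers shown in the figures, so the extra latency is not reproduced.

```python
def fir_direct(x, a):
    """Direct form: y_k = sum_i a[i] * x_{k-i}, using a tapped delay line."""
    taps = [0] * len(a)              # taps[i] holds x_{k-i}
    y = []
    for xk in x:
        taps = [xk] + taps[:-1]      # shift the delay line by one sample
        y.append(sum(ai * ti for ai, ti in zip(a, taps)))
    return y

def fir_transposed(x, a):
    """Transposed form: the registers sit between the adders."""
    regs = [0] * (len(a) - 1)        # regs[0] feeds the output adder
    y = []
    for xk in x:
        prods = [ai * xk for ai in a]          # all multipliers see x_k
        y.append(prods[0] + regs[0])
        # Each register stores its product plus the next register's old value.
        regs = [prods[i + 1] + (regs[i + 1] if i + 1 < len(regs) else 0)
                for i in range(len(regs))]
    return y

a = [1, 2, 3, 4]       # assumed coefficients a0..a3
x = [1, 0, 0, 0, 2, 1]
y_direct = fir_direct(x, a)
y_transposed = fir_transposed(x, a)
```

Both functions return the same output sequence; the difference between the two structures lies in where the registers sit, and therefore in the critical path, not in the transfer function.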

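  4. Two forms of the FIR block diagram
    a. Transposed form (fully pipelined)
    [Figure]

Because there is a delay between every pair of operators, the structure is fully pipelined.
The CP cannot pass through "a3": on that path the number of operators equals the number of delays, so there is no latency, whereas on the other paths the numbers of operators and delays differ, giving a latency of 1. By the definition of the CP, it must be the input-to-output path with the largest delay.

b. Direct form (fully pipelined)

[Figure]

[Figure]

  5. Compare the critical paths of both FIR filter versions
    [Figure]

The CP of the direct form must pass through a0.
The CP of the transposed form does not pass through a(k-1).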
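For orientation, the critical paths of the two textbook structures can be compared with a rough delay model. This sketch makes several assumptions that are not from the figures: no extra pipeline registers are inserted, the direct form sums its products in a linear adder chain, and the operator delays `t_mul` and `t_add` are arbitrary example values.

```python
def cp_direct_form(n_taps, t_mul, t_add):
    """Register-free path: one multiplier followed by a chain of n_taps - 1 adders."""
    return t_mul + (n_taps - 1) * t_add

def cp_transposed_form(n_taps, t_mul, t_add):
    """Register-free path: one multiplier followed by a single adder."""
    return t_mul + t_add

t_mul, t_add = 2.0, 1.0      # assumed operator delays in nanoseconds
for n in (4, 8, 16):
    print(n, cp_direct_form(n, t_mul, t_add), cp_transposed_form(n, t_mul, t_add))
```

Under these assumptions the direct-form critical path grows with the number of taps, while the transposed-form critical path stays constant.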
