FPGA architecture overview

programming technology:

  • static memory (SRAM): the mainstream approach. It is volatile, so the configuration is lost when power is removed. SRAM cells are used for:
    1. programming the routing interconnect, which is usually steered by small multiplexers
    2. programming the CLBs that implement logic functions
  • flash (EEPROM):
    1. non-volatile
    2. more area-efficient than SRAM
    3. can be reprogrammed, but only a limited number of times
    4. requires a non-standard CMOS process
  • anti-fuse:
    1. smaller area, lower on-resistance, and lower parasitic capacitance than the other two
    2. non-volatile
    3. requires a non-standard CMOS process
    4. one-time programmable; cannot be reprogrammed

CLB (configurable logic block):

  • provides basic logic plus storage capability
  • commercial vendors use LUT-based CLBs rather than CLBs built from NAND gates, PALs, etc.
  • a CLB comprises a cluster of basic logic elements (BLEs)
  • BLE = LUT + flip-flop. A LUT-k is a k-input lookup table that can implement any Boolean function of k inputs. Example: a LUT-4 uses 2^4 = 16 SRAM bits to implement any 4-input Boolean function.

  • the BLEs within a cluster communicate through a local routing network. A modern FPGA typically contains 4 to 10 BLEs in a single cluster.
  • besides CLBs (BLEs), a modern FPGA contains a heterogeneous mixture of blocks, some of which serve a specific purpose. These are referred to as hard blocks and include memories, multipliers, adders, and DSP blocks.
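
To make the "LUT-4 uses 16 SRAM bits" point concrete, here is a minimal sketch of a LUT as a truth table addressed by its inputs, plus a BLE that optionally registers the LUT output. The names `make_lut`, `lut_read`, and `BLE` are hypothetical and chosen only for illustration; real devices differ in details such as bit ordering and the output multiplexer.

```python
from itertools import product


def make_lut(func, k):
    """Return the 2**k configuration bits that make a LUT-k behave like func."""
    return [int(bool(func(*bits))) for bits in product((0, 1), repeat=k)]


def lut_read(config_bits, inputs):
    """Evaluate a LUT: the inputs form an address into the configuration bits.

    The first input is treated as the most significant address bit, which
    matches the enumeration order used by make_lut.
    """
    address = 0
    for bit in inputs:
        address = (address << 1) | bit
    return config_bits[address]


class BLE:
    """A LUT followed by a flip-flop; use_ff selects the registered output."""

    def __init__(self, config_bits, use_ff):
        self.config_bits = config_bits
        self.use_ff = use_ff
        self.q = 0  # flip-flop state, assumed to reset to 0

    def clock(self, inputs):
        """Rising clock edge: capture the LUT output in the flip-flop."""
        self.q = lut_read(self.config_bits, inputs)

    def output(self, inputs):
        """Registered output if use_ff is set, otherwise the combinational one."""
        return self.q if self.use_ff else lut_read(self.config_bits, inputs)


# Example: program a LUT-4 with f(a, b, c, d) = (a AND b) XOR (c OR d).
f = lambda a, b, c, d: (a & b) ^ (c | d)
config = make_lut(f, 4)
print(len(config))                      # number of SRAM bits needed
print(lut_read(config, (1, 1, 0, 0)))   # same value as f(1, 1, 0, 0)
```

Changing the function only changes the 16 stored bits, not the hardware, which is why a LUT can implement any function of its k inputs.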

FPGA routing architecture:

  • programmable routing network: provides connections among logic blocks and IO blocks. It consists of wires and programmable switches that form the required connections and are configured using the programming technology.
  • most designs exhibit locality and therefore need abundant short wires, but some connections still span long distances. The arrangement of routing resources relative to the arrangement of logic blocks plays a very important role in overall efficiency. This arrangement is termed the global routing architecture (hierarchical or island-style), whereas the microscopic details of the switching topology of the different switch blocks are termed the detailed routing architecture.

FPGA routing architecture: island style (mesh-based)

  • the routing network can take 80-90% of the chip area, leaving the rest for logic; this large share is what makes the FPGA so flexible. The flexibility of a connection box (Fc) is the number of routing tracks in the adjacent channel that connect to a pin of a block. The connectivity of logic-block input pins to the adjacent routing channel is called Fc(in); the connectivity of logic-block output pins to the adjacent routing channel is called Fc(out). Fc is often expressed as a fraction of the channel width: an Fc(in) of 1.0 means that every track of the adjacent routing channel is connected to the input pin of the logic block. The flexibility of a switch box (Fs) is the number of other tracks to which each track entering the switch box can connect.

  • in mesh-based routing, wires of several different lengths are provided to reduce delay.

  • example: Altera Stratix II: https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/hb/stx2/stx2_sii51002.pdf
    • the logic structure consists of LABs (logic array blocks), memory blocks, and DSP blocks. LABs are used to implement general-purpose logic and are symmetrically distributed in rows and columns throughout the device fabric. The DSP blocks are custom-designed to implement full-precision multipliers of different granularities and are grouped into columns. Input/output elements (IOEs) form the external interface of the device and are located along its periphery. Each LAB consists of 8 ALMs (Adaptive Logic Modules), and each ALM consists of 2 adaptive LUTs (ALUTs) with 8 inputs altogether, 2 programmable registers, 2 dedicated full adders, a carry chain, and a register chain (to implement arithmetic operations or shift registers).
    • Interconnections between LABs, RAM blocks, DSP blocks and the IOEs are established using the Multi-track interconnect structure. This interconnect structure consists of wire segments of different lengths and speeds. The interconnect wire-segments span fixed distances, and run in the horizontal (row interconnects) and vertical (column interconnects) directions. The row interconnects can be used to route signals between LABs, DSP blocks, and memory blocks in the same row.
    • Row interconnect resources are of the following types: direct connections between LABs and adjacent blocks; R4 resources that span 4 blocks to the left or right; R24 resources that provide high-speed access across 24 columns. Multiple R4 resources can be connected to each other to establish longer connections within the same row. R4 interconnects can also drive C4 and C16 column interconnects and R24 high-speed row resources.
    • the column interconnect structure is similar to the row interconnect structure and includes: carry chain interconnects within a LAB and from LAB to LAB in the same column; register chain interconnects; C4 and C16 resources. Carry chain and register chain interconnects are separate from the local interconnect in a LAB. Each LAB has its own set of C4 interconnects that drive up or down. C4 interconnects can also be driven by the LABs immediately adjacent to the primary LAB. Multiple C4 resources can be connected to each other to form longer connections within a column, and C4 interconnects can also drive row interconnects to establish column-to-column connections. C16 interconnects are high-speed vertical resources that span 16 LABs. A C16 interconnect can drive row and column interconnects at every fourth LAB. A LAB local interconnect structure cannot be driven directly by a C16 interconnect; only C4 and R4 interconnects can drive it.
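
Returning to the Fc and Fs definitions at the start of this section, the sketch below shows one way to turn them into concrete connections. It assumes Fc is given as a fraction of the channel width, spreads each pin's connections evenly across the channel, and uses the "disjoint" switch-box pattern (track i connects only to track i on the other three sides, giving Fs = 3). These choices and the function names are illustrative assumptions, not taken from the text above or from any specific device.

```python
SIDES = ("left", "top", "right", "bottom")


def connection_box_tracks(channel_width, fc, pin_index=0):
    """Tracks of the adjacent channel that one block pin connects to.

    fc is a fraction of the channel width (1.0 means every track).
    Connections are spread evenly and staggered by pin_index so that
    different pins do not all land on the same tracks.
    """
    if not 0 < fc <= 1:
        raise ValueError("fc must be in (0, 1]")
    count = max(1, round(fc * channel_width))
    step = channel_width / count
    return sorted({(int(i * step) + pin_index) % channel_width
                   for i in range(count)})


def disjoint_switch_box(channel_width):
    """Map each (side, track) to the (side, track) pairs it can reach."""
    return {
        (side, track): [(other, track) for other in SIDES if other != side]
        for side in SIDES
        for track in range(channel_width)
    }


def switch_box_flexibility(switch_box):
    """Fs: how many other tracks each incoming track can connect to."""
    return max(len(targets) for targets in switch_box.values())


print(connection_box_tracks(8, 1.0))   # every track: Fc(in) = 1.0
print(connection_box_tracks(8, 0.5))   # half of the tracks
print(switch_box_flexibility(disjoint_switch_box(8)))
```

Lower Fc and Fs values mean fewer programmable switches and less area, at the cost of making some nets harder to route.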

FPGA routing architecture: hierarchical style (tree-based)

Most logic designs exhibit locality of connections, which implies a hierarchy in the placement and routing of connections between logic blocks. Hierarchical routing architectures exploit this locality by dividing the FPGA logic blocks into separate groups (clusters). These clusters are recursively connected to form a hierarchical structure. In a hierarchical architecture (also termed a tree-based architecture), connections between logic blocks within the same cluster are made by wire segments at the lowest level of the hierarchy. A connection between blocks residing in different groups, however, requires traversing one or more levels of the hierarchy. In a hierarchical architecture, the signal bandwidth varies as we move away from the bottom level, and it is generally widest at the top level of the hierarchy. The hierarchical routing architecture has been used in a number of commercial FPGA families, including the Altera Flex10K, Apex, and ApexII architectures.
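
As a rough illustration of "traversing one or more levels of hierarchy", the sketch below computes how many levels a connection must climb in an idealized tree. It assumes every cluster groups the same number of children (`arity`) and that logic blocks are numbered left to right along the leaves; both are simplifying assumptions for illustration, and `levels_to_traverse` is a hypothetical helper.

```python
def levels_to_traverse(block_a, block_b, arity):
    """Lowest hierarchy level whose cluster contains both logic blocks.

    Returns 0 when the two indices name the same block, 1 when the blocks
    share a lowest-level cluster, 2 when they first meet one level up, etc.
    """
    if arity < 2:
        raise ValueError("arity must be at least 2")
    level = 0
    while block_a != block_b:
        block_a //= arity  # index of the enclosing cluster one level up
        block_b //= arity
        level += 1
    return level


# With clusters of 4: blocks 0 and 3 share a cluster, 0 and 4 do not.
print(levels_to_traverse(0, 3, 4))
print(levels_to_traverse(0, 4, 4))
print(levels_to_traverse(0, 16, 4))
```

A placement that keeps tightly connected blocks under the same low-level cluster keeps most connections short and leaves the upper levels for the few long-distance nets.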

design flow:

  • high level HDL --> logic synthesis --> technology mapping --> packing --> placement --> routing --> bitstream generation --> download and program
  • logic synthesis: converts the HDL description into Boolean gates, flip-flops, and the wiring connections between these elements.
  • technology mapping: given a library of cells, the technology mapping problem is to find a network of cells that implements the Boolean network. In FPGA technology mapping, the library of cells is composed of k-input LUTs and flip-flops, so the task is to transform the Boolean network into k-bounded cells. Each cell can then be implemented as an independent k-LUT. The FlowMap algorithm is the most widely used academic tool for FPGA technology mapping. The technology mapping step produces a network of k-bounded LUTs and flip-flops.

  • clustering/packing: The logic elements in a mesh-based FPGA are typically arranged in two levels of hierarchy. The first level consists of logic blocks (LBs), which are pairs of a k-input LUT and a flip-flop. The second level groups N LBs together to form logic block clusters. The clustering phase of the FPGA CAD flow is the process of forming these groups of N LBs using top-down, depth-optimal, or bottom-up methods.
  • placement: Placement algorithms determine which logic block within the FPGA should implement each logic block (instance) required by the circuit. The optimization goals are to place connected logic blocks close together to minimize the required wiring (wire-length-driven placement), and sometimes to balance the wiring density across the FPGA (routability-driven placement) or to maximize circuit speed (timing-driven placement).
  • routing: The FPGA routing problem consists of assigning nets to routing resources such that no routing resource is shared by more than one net. PathFinder is one such algorithm.
  • timing analysis: determines the speed of circuits that have been completely placed and routed, and estimates the slack of each source-sink connection during routing (and during placement and other parts of the CAD flow) in order to decide which connections must use fast paths to avoid slowing down the circuit.
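
The slack estimate mentioned under timing analysis can be sketched as follows. The circuit is modelled as a directed acyclic graph of (source, sink, delay) connections; arrival times are propagated forward, required times backward, and the slack of a connection is how much extra delay it could absorb before the clock period is violated. The function name, the graph encoding, and the assumptions that all inputs arrive at time 0 and all endpoints are required at the clock period are illustrative simplifications, not a description of any particular tool.

```python
from collections import defaultdict


def connection_slacks(edges, clock_period):
    """Return {(source, sink): slack} for a DAG of (source, sink, delay)."""
    succ = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for src, dst, delay in edges:
        succ[src].append((dst, delay))
        indegree[dst] += 1
        nodes.update((src, dst))

    # Topological order (Kahn's algorithm).
    ready = [n for n in nodes if indegree[n] == 0]
    order = []
    while ready:
        node = ready.pop()
        order.append(node)
        for dst, _ in succ[node]:
            indegree[dst] -= 1
            if indegree[dst] == 0:
                ready.append(dst)
    if len(order) != len(nodes):
        raise ValueError("the timing graph contains a cycle")

    # Forward pass: latest arrival time at each node.
    arrival = {n: 0.0 for n in nodes}
    for node in order:
        for dst, delay in succ[node]:
            arrival[dst] = max(arrival[dst], arrival[node] + delay)

    # Backward pass: latest time each node may switch without violating timing.
    required = {n: float(clock_period) for n in nodes}
    for node in reversed(order):
        for dst, delay in succ[node]:
            required[node] = min(required[node], required[dst] - delay)

    return {(s, d): required[d] - arrival[s] - delay for s, d, delay in edges}


edges = [("in", "a", 2.0), ("in", "b", 1.0), ("a", "out", 3.0), ("b", "out", 1.0)]
for connection, slack in connection_slacks(edges, clock_period=6.0).items():
    print(connection, slack)
```

Connections with the smallest slack lie on the critical path, so a timing-driven placer or router would give them the fastest resources first.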