Flexible Architecture for Simulation and Testing (FAST)

We are at the point in the area of multithreaded microprocessor architectures where further progress will require the development of a hardware prototype. This prototype should support more than two parallel threads and thread-level speculation (TLS). Currently, no commercial microprocessor has these multithreading capabilities and this prevents the serious OS, compiler and application development that is required to take full advantage of multiple threads and TLS. Without the resulting optimized software, it will be difficult to understand the true benefits of these capabilities or make the appropriate hardware/software design tradeoffs to achieve the best performance. However, the problem with building a microprocessor is that it requires VLSI chip design, a resource-intensive process. The immense task of chip design verification before tape out, in particular, can make microprocessor design a difficult undertaking in an academic environment.  This is the primary reason that all prior multithreading and TLS research has relied on software simulators.  

 

FAST is a flexible platform that runs chip multiprocessor (CMP) and multithreaded architecture experiments on real hardware rather than in a software simulator, supporting complex system designs while executing millions of instructions per second.  The memory hierarchy and other key components can be reconfigured.  FPGAs provide the interfaces both between the four processor tiles and within each tile.  Figure 1 illustrates the FAST PCB at a high level.  The yellow tiles are processor tiles, and the blue tile serves as the internal and external system interconnect.  The initial implementation of FAST leverages existing Hydra architecture components, but other CMP designs can be realized by changing the FPGA configuration.

 

Figure 1: Generic FAST implementation on a PCB.

 

Figure 2 shows an expanded view of a processor tile.  Each tile consists of an FPU, a CPU, L1 memory, and FPGAs.  This configuration lets the tile run both floating-point and integer applications, while the FPGAs provide the flexibility to modify the L1 memory configuration and to add other components, such as multithreading support and profiling metrics.

 

Figure 2: Expanded view of the FAST processor tile.

 

We believe it is possible to build a flexible research prototype using fixed-function processors together with FPGAs, without doing any VLSI design. We intend to demonstrate this by building FAST, a flexible CMP prototyping environment made from existing chips. The key idea is that by combining ten-year-old microprocessor chips with state-of-the-art FPGA chips, it is possible to build a single-board multiprocessor prototyping environment that provides support for TLS and operates at hardware speeds, yet has latency and bandwidth characteristics equivalent to those of a modern CMP architecture.  Figure 3 is a simplified scaled drawing of the FAST PCB.  A processor tile occupies each corner of the PCB.  The L2 memory, L2 memory controller, Read/Write controller, and internal and external glue components occupy the center.

 

Figure 3: Simplified component layout for the FAST PCB.

 

Figure 3 uses the same color coding as Figure 2 to differentiate the processor tile components: CPU (red), FPU (green), FPGAs (purple) and L1 memory (yellow).  In the center of the PCB are the L2 memory, in light blue, and all the internal and external glue components, in dark blue.  At the top center of the PCB is the embedded Ethernet board, which provides the external communication interface.  Next to it is the CPLD, which provides additional glue logic to augment the capabilities of the microcontroller on the embedded Ethernet board. Below these components are the two XC2V6000 FPGAs, which control the L2 memory and external memory interfaces.  The power connector and DC-to-DC voltage regulators are shown in orange. Each voltage regulator is labeled with its output voltage; these regulators supply the FPGA core voltages.
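The board composition described above can be summarized as a small data model. This is only an illustrative sketch: the class names, field names and corner labels are hypothetical, and the number of FPGAs per tile is left unspecified because the text does not state it.

```python
from dataclasses import dataclass, field

# Hypothetical corner labels; the text only says one tile sits in each corner.
CORNERS = ("top-left", "top-right", "bottom-left", "bottom-right")


@dataclass(frozen=True)
class ProcessorTile:
    """One processor tile, with the components named in the text."""
    corner: str
    components: tuple = ("CPU", "FPU", "L1 memory", "FPGAs")


@dataclass
class FastBoard:
    """Component inventory of the FAST PCB as described in the text."""
    tiles: list = field(
        default_factory=lambda: [ProcessorTile(c) for c in CORNERS]
    )
    # Center of the board: L2 memory plus glue and interface components.
    center: tuple = (
        "L2 memory",
        "XC2V6000 FPGA",
        "XC2V6000 FPGA",
        "embedded Ethernet board",
        "CPLD",
    )

    def count(self, name: str) -> int:
        """Count how many center components carry the given name."""
        return sum(1 for part in self.center if part == name)


board = FastBoard()
print(len(board.tiles), "processor tiles")
print(board.count("XC2V6000 FPGA"), "XC2V6000 FPGAs in the center")
```

A model like this is convenient mainly as a checklist when comparing alternative FPGA configurations; it carries no timing or connectivity information.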

 


Figure 4: Completed FAST PCB with component labels.


FAST PCB Specification Files 
FAST Complete Software Archive 

The overall goal of the FAST project is to develop a hardware prototype system that allows hardware and software experimentation with fine-grain and speculative multithreading.  More specifically:

 

Hardware Goals:

  • Explore the spectrum of variation in multithreading architectures by exploiting the flexibility of the FPGAs.
  • Explore alternative ways to use multithreaded hardware, e.g., to support fault tolerance.

 

Software Goals:

  • Explore the design of operating systems, programming environments, programming paradigms and applications.
  • Determine the full potential for speedup provided by a TLS architecture for general-purpose applications.

 

Education Goals:

  • Provide a project development environment for advanced digital design and computer architecture classes.

 

Grants & Donations

This project is supported by NSF grant # CCR-0220138, as well as donations from Xilinx, Inc.

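In asynchronous FIFO design, simulation and synthesis are used to verify and optimize the design's functionality and performance.

Simulation uses dedicated simulation tools to model the behavior and interactions of the asynchronous FIFO design. By building a simulation environment for the whole design and applying the required input signals and clock cycles, the design's correctness can be verified by observing its outputs. Simulation can detect design errors such as potential timing problems, deadlock, and data loss. Simulating different workloads and data flows makes it possible to evaluate the performance and throughput of the design. This process helps engineers understand design defects and correct them.

Synthesis is the process of converting a high-level description (such as HDL code) into a low-level, gate-level representation that can be implemented in a specific target technology. In asynchronous FIFO design, the synthesizer converts the HDL code into a gate-level netlist containing concrete implementations of devices such as D flip-flops and multiplexers. The goal of synthesis is to optimize the design's performance, resource usage, and power consumption while meeting the design constraints. Synthesis can be steered toward a particular design target, such as minimum area, maximum performance, or minimum power. With synthesis, engineers obtain the low-level physical implementation of the design, evaluate its performance and power consumption, and make any necessary optimizations.

Synthesis and simulation are indispensable parts of the asynchronous FIFO design process, helping engineers verify and optimize the design's functionality, performance, and power consumption. They play an important role in the design flow and are usually combined with other verification techniques, such as formal verification and timing verification, to ensure the design's correctness and reliability.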
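As a rough illustration of the simulation step, the sketch below models a FIFO whose writer and reader tick on unrelated clock periods, then checks for data loss and reordering. It is a behavioral model under simplifying assumptions, not an HDL testbench: it does not model synchronizers or metastability, and the function name, parameters and integer time base are all hypothetical.

```python
from collections import deque


def simulate_fifo(depth, write_period, read_period, n_items, horizon=10_000):
    """Behavioral FIFO model with independent write and read clock periods.

    Time advances in integer ticks. A write edge occurs when
    t % write_period == 0 and a read edge when t % read_period == 0.
    """
    fifo = deque()
    sent = 0             # next item value to write
    received = []        # items in the order the reader saw them
    stalled_writes = 0   # write edges blocked because the FIFO was full

    for t in range(horizon):
        # Write clock edge: push the next item unless the FIFO is full.
        if t % write_period == 0 and sent < n_items:
            if len(fifo) < depth:
                fifo.append(sent)
                sent += 1
            else:
                stalled_writes += 1
        # Read clock edge: pop one item if any is available.
        if t % read_period == 0 and fifo:
            received.append(fifo.popleft())
        if len(received) == n_items:
            return {
                "ticks": t + 1,
                "received": received,
                "stalled_writes": stalled_writes,
            }
    raise RuntimeError("simulation did not finish within the horizon")


# Workload with a writer faster than the reader, so the FIFO fills up.
result = simulate_fifo(depth=4, write_period=2, read_period=3, n_items=20)
print("items per tick:", len(result["received"]) / result["ticks"])
print("stalled writes:", result["stalled_writes"])
```

Varying `depth` and the two periods gives a quick view of how throughput and back-pressure change with workload, which is the kind of exploration the simulation paragraph describes.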