Is it time to start using high-level synthesis?

http://blogs.cancom.com/elogic_920000692/2010/04/30/is-it-time-to-start-using-high-level-synthesis/

 

One big question people have about high-level synthesis (HLS) is whether or not it is ready for mainstream use. In other words, does it really work (yet)? HLS has a long history, starting with products like Synopsys’s Behavioral Compiler and Cadence’s Visual Architect, which never achieved any serious adoption. Then there was a next generation from companies like Synfora and Forte, along with Mentor’s Catapult. More recently still, there are AutoESL and Cadence’s CtoSilicon.

I met Atul, CEO of AutoESL, last week and he gave me a copy of an interesting report that they had commissioned from Berkeley Design Technology (BDT) who set out to answer the question “does it work?” at least for the AutoESL product, AutoPilot. Since HLS is a competitive market, and the companies in the space are constantly benchmarking at customers and all are making some sales, I think it is reasonable to take this report as a proxy for all the products in the space. Yes, I’m sure each product has its own strengths and weaknesses and different products have different input languages (for instance, Forte only accepts SystemC, Synfora only accepts C and AutoESL accepts C, C++ and SystemC).

BDT ran two benchmarks. One was a video motion analysis algorithm and the other was a DQPSK (think wireless router) receiver. Both were synthesized using AutoPilot and then Xilinx’s tool-chain to create a functional FPGA implementation.

The video algorithm was examined in two ways: first, with a fixed workload at 60 frames per second, how “small” a design could be achieved. Second, given the limitations of the FPGA, how high a frame rate could be achieved. The wireless receiver had a spec of 18.75 megasamples/second, and was synthesized to see what the minimal resources required were to meet the required throughput.

For comparison, they implemented the video algorithm on a Texas Instruments TMS320 DSP processor. This is a chip that costs roughly the same as the FPGA they were using, Xilinx’s XC3SD3400A: both are in the mid $20s.

The video algorithm used 39% of the FPGA, but achieving the same result using the DSPs required at least 12 of them working in parallel, obviously a much more costly and power-hungry solution. When they looked at how high a frame rate could be achieved, the AutoPilot/FPGA solution could achieve 183 frames per second, versus 5 frames per second for the DSP. The implementation effort for the two solutions was roughly the same. This is quite a big design, using ¾ of the FPGA. AutoPilot read 1600 lines of C and turned it into 38,000 lines of Verilog in 30 seconds.
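The numbers in that paragraph hang together, and a few lines of arithmetic show how. This is only a back-of-the-envelope sketch in Python using the figures quoted above. The variable names are mine, and it assumes (my assumption, not something from the report) that frames can be split evenly across DSPs with no coordination overhead.

```python
import math

# Figures as quoted in the paragraph above.
TARGET_FPS = 60         # fixed video workload
DSP_MAX_FPS = 5         # best frame rate reported for the DSP
FPGA_MAX_FPS = 183      # best frame rate reported for AutoPilot/FPGA
C_LINES = 1600          # lines of C read by AutoPilot
VERILOG_LINES = 38_000  # lines of Verilog it generated

# Assumption: work divides evenly across DSPs with no overhead,
# so this is a lower bound on the number of chips.
dsps_needed = math.ceil(TARGET_FPS / DSP_MAX_FPS)

# Ratio of the best achievable frame rates.
fpga_speedup = FPGA_MAX_FPS / DSP_MAX_FPS

# How much the generated RTL expands relative to the C source.
code_expansion = VERILOG_LINES / C_LINES

print(f"DSPs needed for {TARGET_FPS} fps: at least {dsps_needed}")
print(f"FPGA vs. DSP peak frame rate: {fpga_speedup:.1f}x")
print(f"Verilog lines per line of C: {code_expansion:.2f}")
```

The lower bound of 12 DSPs matches the "at least 12" above, and with the two chips priced about the same, that is also roughly the cost ratio for the fixed 60 frames per second workload.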

For the wireless receiver they also had a hand-written RTL implementation for comparison. AutoPilot managed to get the design into 5.6% of the FPGA, and the hand-written implementation achieved 5.9%. I don’t think the difference is significant, and I think it is fair to say that AutoPilot is on a par with hand-coded RTL (at least for this example, ymmv). Using HLS also reduces the development effort by at least 50%.

BDT’s conclusion is that they “were impressed with the quality of results that AutoPilot was able to produce given that this has been a historic weakness for HLS tools in general.” The only real negative is that the tool chain is more expensive (since AutoESL doesn’t come bundled with your FPGA or your DSP).

It would, of course, be interesting to see the same reference designs put through other HLS tools, to know whether these results generalize. But it does look as if HLS is able to achieve results comparable with hand-written RTL, at least for this sort of DSP algorithm. To be fair to hand-coders, though, DSP algorithms of this sort, where throughput is more important than latency, are something of a sweet spot for HLS.

If you want to read the whole report, it’s here.

ISBN-13: 978-1461417903 ISBN-10: 1461417902 Edition: 1st ed. 2013. Corr. 2nd printing 2014 Buy New Price: $265.05 (the Amazon selling price). The value of this book needs no further comment.

High-Performance Computing using FPGA covers the area of high-performance reconfigurable computing (HPRC). The book provides an overview of architectures, tools and applications for HPRC. FPGAs offer very high I/O bandwidth and fine-grained, custom and flexible parallelism. Interest in them is driven by ever-increasing computational needs coupled with the frequency/power wall, the increasing maturity and capabilities of FPGAs, and the advent of multicore processors, which has led to the acceptance of parallel computational models.

The part on architectures will introduce different FPGA-based HPC platforms: attached co-processor HPRC architectures such as CHREC’s Novo-G and EPCC’s Maxwell systems; tightly coupled HPRC architectures, e.g. the Convey hybrid-core computer; reconfigurably networked HPRC architectures, e.g. the QPACE system; and standalone HPRC architectures such as EPFL’s CONFETTI system.

The part on tools will focus on high-level programming approaches for HPRC, with chapters on C-to-gate tools (such as Impulse-C, AutoESL, Handel-C, MORA-C++); graphical tools (MATLAB-Simulink, NI LabVIEW); domain-specific languages; and languages for heterogeneous computing (for example OpenCL, and Microsoft’s Kiwi and Alchemy projects).

The part on applications will present cases from several application domains where HPRC has been used successfully, such as bioinformatics and computational biology; financial computing; stencil computations; information retrieval; lattice QCD; astrophysics simulations; and weather and climate modeling.