tpdn_opt H.264 Notes


H.264 paper download link

Aim:

The paper mainly shares the authors' experience with code conversion for synthesizability, various HLS optimizations, HLS limitations when dealing with complex input code, and general design insights, drawn from implementing an H.264 video decoder.

Why H.264 is needed:

Benchmark suites for HLS are small and simple: the CHStone benchmarks have between 200 and 1400 lines of code (LoC) and fewer than 20 functions. Although widely used to study HLS, these benchmarks are small, simple block designs that are not representative of the large, complex applications used with modern tools.
In short, current HLS benchmark cases are too small to reflect real-world use.

What is H.264: H.264 Video Decoding

The H.264 standard is a family of specifications covering a variety of encoding features, resolutions, and frame rates.

Why choose H.264

Reasons for choosing H.264: 6000 lines of code, more than 100 functions, and complex data structures and control flow.

H.264 is a complex, widely used real-world application for studying HLS. The Main Profile of H.264 alone has 6000 lines of code and more than 100 functions. H.264 contains complex data structures and control flow, which test the capabilities of HLS beyond its conventional strength in block-level design.

Introduction to High-Level Synthesis

A brief introduction to the HLS flow, focusing on the drawbacks of using Vivado HLS and explaining why a top-down design approach works poorly. Reference software:
(1) May have significant command-line interpretation, GUI, and user-interface code that needs to be partitioned from the code to be synthesized.
(2) May describe a variety of features and use cases, of which only a subset is desired for implementation.
(3) May have a function hierarchy that is efficient and elegant for software reuse, but introduces variable-bound computational loops in hardware.
(4) May include substantial hierarchy that is efficient in software, but incurs call/return overheads in HLS-based hardware.
(5) May have a code organization that does not correspond to the desired hardware block diagram, with computation blocks and communication merged and partitioned among many functions.
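Item (3) is easiest to see in code. The following is a hypothetical illustration, not code from the H.264 reference software: the loop in sum_variable has a trip count known only at run time, so HLS has no static bound to schedule against, while sum_fixed always iterates up to an assumed compile-time maximum (MAX_BLOCK_SIZE, chosen here for illustration) and uses a guard to skip the unused iterations.

```c
#include <stdint.h>

/* Hypothetical compile-time upper bound on the runtime trip count. */
#define MAX_BLOCK_SIZE 16

/* Software style: the trip count n is only known at run time,
 * so HLS cannot derive a static loop bound. */
int32_t sum_variable(const int16_t coeff[MAX_BLOCK_SIZE], int n)
{
    int32_t acc = 0;
    for (int i = 0; i < n; i++)
        acc += coeff[i];
    return acc;
}

/* HLS-friendly style: the loop always runs MAX_BLOCK_SIZE times,
 * and a guard masks out the iterations at or beyond n. */
int32_t sum_fixed(const int16_t coeff[MAX_BLOCK_SIZE], int n)
{
    int32_t acc = 0;
    for (int i = 0; i < MAX_BLOCK_SIZE; i++) {
        if (i < n)
            acc += coeff[i];
    }
    return acc;
}
```

Both functions return the same result for 0 <= n <= MAX_BLOCK_SIZE; if n ever exceeds the bound, sum_variable would read past the array, whereas sum_fixed simply stops at MAX_BLOCK_SIZE.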

METHODOLOGY

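1. Reference Code Selection
Some requirements and restrictions that the reference code must satisfy.
2. Algorithmic Code Conversion: the following cases must be converted because HLS does not support them as written.
2.1: Reference Software Partitioning
Read the video data stream into a buffer;
Define the NALU data structure;
When an H.264 bitstream is transmitted over a network, it is actually carried as NALUs (Network Abstraction Layer units);
The NALU data format in the video stream: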

typedef struct
{
  int startcodeprefix_len;      //! 4 for parameter sets and first slice in picture, 3 for everything else (suggested)
  unsigned len;                 //! Length of the NAL unit (Excluding the start code, which does not belong to the NALU)
  int nal_unit_type;            //! NALU_TYPE_xxxx
  int nal_reference_idc;        //! NALU_PRIORITY_xxxx
  int forbidden_bit;            //! should be always FALSE
  unsigned long int bit_offset;
  unsigned long int bit_length;
  unsigned char buf[MAXNALBUFFERSIZE];        //! contains the first byte followed by the EBSP
} NALU_t;
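The first byte of every NAL unit holds the three header fields kept in NALU_t. Per the H.264 standard, forbidden_zero_bit is bit 7, nal_ref_idc is bits 6-5, and nal_unit_type is bits 4-0, and in an Annex B byte stream each NALU is preceded by a 3-byte (0x000001) or 4-byte (0x00000001) start code. The helper below is a hypothetical sketch, not the tpdn_opt implementation; it uses a cut-down struct (NaluHeader) so that it compiles on its own.

```c
#include <stddef.h>

/* Cut-down stand-in for NALU_t (hypothetical, for illustration only). */
typedef struct {
    int startcodeprefix_len; /* 3 or 4 */
    int forbidden_bit;       /* must be 0 in a valid stream */
    int nal_reference_idc;   /* 0..3 */
    int nal_unit_type;       /* 0..31 */
} NaluHeader;

/* Split the one-byte NAL header into its three fields. */
static void parse_header(unsigned char b, NaluHeader *h)
{
    h->forbidden_bit     = (b >> 7) & 0x01;
    h->nal_reference_idc = (b >> 5) & 0x03;
    h->nal_unit_type     = b & 0x1F;
}

/* Scan buf from pos for the next start code. On success, fill *h and
 * return the offset of the NAL header byte; return -1 if none is found. */
long find_nalu(const unsigned char *buf, size_t len, size_t pos, NaluHeader *h)
{
    for (size_t i = pos; i + 3 < len; i++) {
        if (buf[i] != 0 || buf[i + 1] != 0)
            continue;
        if (buf[i + 2] == 1) {                      /* 00 00 01 */
            h->startcodeprefix_len = 3;
            parse_header(buf[i + 3], h);
            return (long)(i + 3);
        }
        if (buf[i + 2] == 0 && i + 4 < len && buf[i + 3] == 1) { /* 00 00 00 01 */
            h->startcodeprefix_len = 4;
            parse_header(buf[i + 4], h);
            return (long)(i + 4);
        }
    }
    return -1;
}
```

For example, a stream beginning 00 00 00 01 67 yields a 4-byte prefix with nal_reference_idc 3 and nal_unit_type 7 (a sequence parameter set). This sketch does not remove emulation prevention bytes (0x03), which a real decoder must strip when converting the EBSP in buf to an RBSP.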

Reference software may have significant command-line interpretation, GUI, and user-interface code that needs to be partitioned from the code to be synthesized.

2.2: Reference Software Partitioning

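3. Non-synthesizable Constructs
3.1: Dynamic Memory Allocation
Replace dynamic memory allocation with static allocation: each dynamically allocated buffer becomes a statically allocated buffer sized for the maximum potential requirement.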
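A minimal sketch of this conversion, with hypothetical names and limits (MAX_WIDTH, MAX_HEIGHT, init_frame, put_luma, and get_luma are illustrative, not tpdn_opt identifiers): instead of calling malloc with a size computed from the stream, the luma plane is declared statically at the largest picture size the design is assumed to support, and each runtime size is checked against that bound.

```c
#include <stdint.h>

/* Assumed worst-case picture size the hardware must support (hypothetical). */
#define MAX_WIDTH  1920
#define MAX_HEIGHT 1088

/* Software version (not synthesizable):
 *     uint8_t *luma = malloc((size_t)width * height);
 * HLS version: one statically allocated worst-case buffer whose row
 * stride is always MAX_WIDTH. */
static uint8_t luma_frame[MAX_HEIGHT][MAX_WIDTH];
static int cur_width, cur_height;

/* Accept a picture size only if it fits the static buffer.
 * Returns 0 on success, -1 if the size is unsupported. */
int init_frame(int width, int height)
{
    if (width <= 0 || height <= 0 || width > MAX_WIDTH || height > MAX_HEIGHT)
        return -1;
    cur_width = width;
    cur_height = height;
    return 0;
}

/* Write one luma sample; returns -1 if (x, y) is outside the current picture. */
int put_luma(int x, int y, uint8_t v)
{
    if (x < 0 || y < 0 || x >= cur_width || y >= cur_height)
        return -1;
    luma_frame[y][x] = v;
    return 0;
}

/* Read one luma sample; returns -1 if (x, y) is outside the current picture. */
int get_luma(int x, int y)
{
    if (x < 0 || y < 0 || x >= cur_width || y >= cur_height)
        return -1;
    return luma_frame[y][x];
}
```

Keeping the stride fixed at MAX_WIDTH means address arithmetic never depends on a runtime allocation size, at the cost of reserving worst-case memory even for small pictures.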
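3.2: File I/O
The video file format is a structured data stream containing parameter sets, decoding settings, and compressed data. The reference software uses file I/O at various points to read the relevant variables, rather than building a single data structure to hold the data. Based on the Network Abstraction Layer (NAL) data structure used to carry H.264 data over a communication channel, we created a structure that stores one frame of encoded data together with its associated parameters.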
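The sketch below illustrates the idea with hypothetical names throughout (EncodedFrame, MAX_FRAME_BYTES, load_frame, and decode_top are not the tpdn_opt identifiers). The software side does all of the I/O and packs one frame's NAL data into a fixed-size structure; the synthesizable function reads only from that structure. To keep the example self-contained, load_frame copies from a memory array where a real testbench would call fread, and decode_top merely counts start codes as a stand-in for actual decoding.

```c
#include <stddef.h>
#include <string.h>

#define MAX_FRAME_BYTES 4096u  /* hypothetical worst-case size of one coded frame */

/* Everything the hardware needs for one frame: no FILE handles, no pointers. */
typedef struct {
    unsigned int  len;                   /* number of valid bytes in data[] */
    int           width_mb, height_mb;   /* example parameters from the parameter sets */
    unsigned char data[MAX_FRAME_BYTES]; /* NAL units of one frame */
} EncodedFrame;

/* Software side (testbench). Returns 0 on success, -1 if the frame is too big. */
int load_frame(const unsigned char *src, size_t len,
               int width_mb, int height_mb, EncodedFrame *fr)
{
    if (len > MAX_FRAME_BYTES)
        return -1;
    memcpy(fr->data, src, len);
    fr->len = (unsigned int)len;
    fr->width_mb = width_mb;
    fr->height_mb = height_mb;
    return 0;
}

/* Hardware side (synthesizable top): counts NAL start codes. A 4-byte code
 * 00 00 00 01 ends with the 3-byte pattern, so each NALU is counted once.
 * The loop has a fixed bound, with a guard for the valid length. */
int decode_top(const EncodedFrame *fr)
{
    int count = 0;
    for (unsigned int i = 0; i + 2 < MAX_FRAME_BYTES; i++) {
        if (i + 2 < fr->len && fr->data[i] == 0 &&
            fr->data[i + 1] == 0 && fr->data[i + 2] == 1)
            count++;
    }
    return count;
}
```

Because decode_top receives everything through one argument, the same structure can be mapped to a single memory interface on the hardware side while file parsing stays in software.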
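3.3: Pointer-based Data Structures
Convert pointer-based structures into statically allocated buffers, using indices to track the buffer entries that pointers used to point to.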
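A minimal sketch of the index-based approach, assuming a singly linked list such as a list of reference pictures (PicNode, pool, MAX_PICS, and the list_* functions are hypothetical names): nodes live in a statically allocated pool, next stores an array index instead of a pointer, and NIL (-1) plays the role of NULL. List walks are also given a static bound, in line with item (3) above.

```c
#define MAX_PICS 16   /* hypothetical capacity of the node pool */
#define NIL (-1)      /* index used in place of a NULL pointer */

typedef struct {
    int frame_num;  /* payload */
    int next;       /* index of the next node in pool[], or NIL */
} PicNode;

static PicNode pool[MAX_PICS];
static int free_head = NIL;  /* head of the free-node list */
static int list_head = NIL;  /* head of the in-use list */

/* Link every pool entry into the free list and empty the in-use list. */
void list_init(void)
{
    for (int i = 0; i < MAX_PICS; i++)
        pool[i].next = (i + 1 < MAX_PICS) ? i + 1 : NIL;
    free_head = 0;
    list_head = NIL;
}

/* Insert frame_num at the front; returns the node index, or NIL if the pool is full. */
int list_push_front(int frame_num)
{
    int n = free_head;
    if (n == NIL)
        return NIL;
    free_head = pool[n].next;
    pool[n].frame_num = frame_num;
    pool[n].next = list_head;
    list_head = n;
    return n;
}

/* Remove the first node holding frame_num; returns 1 if removed, 0 otherwise. */
int list_remove(int frame_num)
{
    int prev = NIL, cur = list_head;
    /* The list holds at most MAX_PICS nodes, so the walk has a static bound. */
    for (int step = 0; step < MAX_PICS && cur != NIL; step++) {
        if (pool[cur].frame_num == frame_num) {
            if (prev == NIL)
                list_head = pool[cur].next;
            else
                pool[prev].next = pool[cur].next;
            pool[cur].next = free_head;  /* return the node to the free list */
            free_head = cur;
            return 1;
        }
        prev = cur;
        cur = pool[cur].next;
    }
    return 0;
}

/* Number of nodes in the in-use list. */
int list_length(void)
{
    int len = 0;
    for (int cur = list_head, step = 0; step < MAX_PICS && cur != NIL;
         step++, cur = pool[cur].next)
        len++;
    return len;
}
```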
3.4: Recursive Functions
Replace all recursive functions with appropriate non-recursive algorithms.
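A generic sketch of one common replacement, not necessarily the algorithm used in tpdn_opt: a recursive tree traversal rewritten with an explicit stack of fixed size. The tree layout (Tree, MAX_NODES, index-based child links) is a hypothetical example. Since every node is pushed at most once, a stack of MAX_NODES entries and MAX_NODES loop iterations are enough.

```c
#define MAX_NODES 32   /* hypothetical tree capacity */
#define NIL (-1)       /* "no child" marker */

/* Binary tree stored in arrays: child links are indices, not pointers. */
typedef struct {
    int value[MAX_NODES];
    int left[MAX_NODES];
    int right[MAX_NODES];
} Tree;

/* Recursive version: natural in software, but not synthesizable by HLS. */
int tree_sum_rec(const Tree *t, int n)
{
    if (n == NIL)
        return 0;
    return t->value[n] + tree_sum_rec(t, t->left[n]) + tree_sum_rec(t, t->right[n]);
}

/* Non-recursive version: a bounded explicit stack replaces the call stack. */
int tree_sum_iter(const Tree *t, int root)
{
    int stack[MAX_NODES];
    int sp = 0, sum = 0;
    if (root != NIL)
        stack[sp++] = root;
    /* Each node is pushed at most once, so MAX_NODES iterations suffice. */
    for (int step = 0; step < MAX_NODES && sp > 0; step++) {
        int n = stack[--sp];
        sum += t->value[n];
        if (t->left[n] != NIL)
            stack[sp++] = t->left[n];
        if (t->right[n] != NIL)
            stack[sp++] = t->right[n];
    }
    return sum;
}
```

The key requirement is a known worst-case depth or node count: without it the explicit stack cannot be sized statically, which is the same reason HLS rejects the recursive form.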

Notes on building the H.264 case

Official code and usage instructions
1. script.tcl is the build script; run it directly with:
vivado_hls script.tcl

HLS Testing

Run Vivado HLS (tested with Vivado HLS 2014.4)

vivado_hls script.tcl
It goes through three steps: C verification, synthesis, and co-simulation (using Verilog with xsim as the simulator). Simulation with SystemC is recommended because it is faster and gives meaningful error messages if something goes wrong. To simulate with SystemC, replace "verilog" with "systemc" in the last line of script.tcl.

The solution results are in tpdn_hls/solution1.

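2. The tpdn_opt folder contains the source code.
The main program is in ldecod.c, while the HLS top-level function is in decode.c: ldecod.c reads data from the video file and passes it to decode_main, and only decode_main goes through high-level synthesis.

3. To check that the program runs, first compile it with g++: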

g++ -g -o cpan ldecod.c nalu.c decode.c cavlc.c interpred.c residual.c framealloc.c mathfunc.c intrapred.c parset.c slice.c vlc.c

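You may get errors related to extern declarations; deleting some of them fixes the problem.

4. Next, try running it with vivado_hls; finally, use esl to run the .bc file generated by vivado_hls and check that the .bc simulation results are correct.

The data under g++DATA runs successfully with g++. The earlier version, in which the first data set was extracted and ldecod.c and decode.c were modified accordingly, is stored in tpdn_opt_cpan.
The run only succeeds after the following lines in script.tcl are commented out:

#csim_design -clean -argv {test.264 test_dec.yuv}
#cosim_design -argv {test.264 test_dec.yuv} -trace_level none -rtl systemc -tool xsim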
