tpdn_opt H264 Notes

春江花月夜晨

Category: llvm

Article link: https://blog.csdn.net/pc153262603/article/details/106524115

Summary: High Level Synthesis of Complex Applications: An H.264 Video Decoder

Aim:
Why need H.264:
- what is H.264: H.264 Video Decoding
- the reason to choose H.264
Introduce High Level Synthesis
METHODOLOGY
H264 case: build and setup notes

H.264 paper download link

Aim:

The article mainly shares the authors' experience with code conversion for synthesizability, various HLS optimizations, HLS limitations when dealing with complex input code, and general design insights, using an H.264 video decoder as the case study.

Why need H.264:

Benchmark suites for HLS are small and simple; CHStone benchmarks have between 200 and 1400 lines of code (LoC) and less than 20 functions. Although widely used to study HLS, these benchmarks are small and simple block designs that are not representative of the large, complex applications used with modern tools.
In short, current HLS benchmark cases are too small to reflect real-world designs.

what is H.264: H.264 Video Decoding

The H.264 standard is a family of specifications covering a variety of encoding features, resolutions, and frame rates.

the reason to choose H.264

Reasons for choosing H.264: about 6000 lines of code, more than 100 functions, and complex data structures and control flow.

H.264: a complex, widely popular real-world application to study HLS. The Main Profile of H.264 alone has 6000 lines of code and more than 100 functions. H.264 contains complex data structures and control flow, which tests the capabilities of HLS beyond conventional strengths in block-level design.

Introduce High Level Synthesis

This part briefly introduces the HLS flow, focuses on the drawbacks of using Vivado HLS, and explains why a top-down design approach works poorly. Software reference code:
(1) May have significant command-line interpretation, GUI, and user interface code that needs to be partitioned from the code to be synthesized.
(2) May describe a variety of features and use cases, of which only a subset are desired for implementation.
(3) May have a function hierarchy that is efficient and elegant for software reuse, but introduces computational loops with variable bounds for hardware.
(4) May include substantial hierarchy that is efficient in software, but incurs call/return overheads in HLS-based hardware.
(5) May be organized in a way that does not correspond to the desired hardware block diagram, with computation blocks and communications merged and partitioned among the many functions.
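Point (3) can be illustrated with a small sketch. The names `MAX_LEN` and `sum_bounded` are hypothetical and do not come from the paper or the tpdn_opt sources; the rewrite shown (a fixed worst-case trip count with a guard inside the loop body) is one commonly used way to remove a variable loop bound, not necessarily the one the authors applied.

```c
#include <assert.h>

#define MAX_LEN 16   /* assumed worst-case length, chosen for illustration */

/* Software style would loop "for (i = 0; i < n; i++)", so the trip count
 * depends on run-time data. Here the loop always runs MAX_LEN times and
 * the guard "i < n" selects the active iterations, so the bound is a
 * compile-time constant. */
static int sum_bounded(const int data[MAX_LEN], int n)
{
    int sum = 0;
    assert(n >= 0 && n <= MAX_LEN);   /* host-side sanity check */
    for (int i = 0; i < MAX_LEN; i++) {
        if (i < n)
            sum += data[i];
    }
    return sum;
}
```

The cost of this form is that the loop always takes the worst-case number of iterations, which is usually acceptable only when `MAX_LEN` is close to the typical value of `n`.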

METHODOLOGY

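1. Reference Code Selection
Some requirements and restrictions on the reference code.
2. Algorithmic Code Conversion. The following cases must be converted because HLS does not support them.
2.1: Reference Software Partitioning
Read the video data stream and store it in a buffer.
Define the NALU data structure.
When an H.264 bitstream is transmitted over a network, it is actually carried in the form of NALUs (NAL units).
Introduction to the NALU data format in video: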

typedef struct
{
  int startcodeprefix_len;      //! 4 for parameter sets and first slice in picture, 3 for everything else (suggested)
  unsigned len;                 //! Length of the NAL unit (Excluding the start code, which does not belong to the NALU)
  int nal_unit_type;            //! NALU_TYPE_xxxx
  int nal_reference_idc;        //! NALU_PRIORITY_xxxx
  int forbidden_bit;            //! should be always FALSE
  unsigned long int bit_offset;
  unsigned long int bit_length;
  unsigned char buf[MAXNALBUFFERSIZE];        //! contains the first byte followed by the EBSP
} NALU_t;
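As a rough illustration of how the `forbidden_bit`, `nal_reference_idc`, `nal_unit_type`, and `startcodeprefix_len` fields get their values, the sketch below decodes the one-byte NAL unit header and detects a 3- or 4-byte start code. The type `nalu_header_t` and the functions `parse_nalu_header` and `start_code_len` are hypothetical names for this example and are not taken from the reference decoder.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced view of the header-related fields of NALU_t. */
typedef struct {
    int forbidden_bit;      /* bit 7 of the header byte, expected to be 0 */
    int nal_reference_idc;  /* bits 6..5 */
    int nal_unit_type;      /* bits 4..0 */
} nalu_header_t;

/* Decode the one-byte NAL unit header that follows the start code. */
static nalu_header_t parse_nalu_header(unsigned char first_byte)
{
    nalu_header_t h;
    h.forbidden_bit     = (first_byte >> 7) & 0x1;
    h.nal_reference_idc = (first_byte >> 5) & 0x3;
    h.nal_unit_type     = first_byte & 0x1f;
    return h;
}

/* Return the length (3 or 4) of the start code at buf, or 0 if there is none. */
static int start_code_len(const unsigned char *buf, size_t n)
{
    assert(buf != NULL);
    if (n >= 3 && buf[0] == 0 && buf[1] == 0 && buf[2] == 1)
        return 3;
    if (n >= 4 && buf[0] == 0 && buf[1] == 0 && buf[2] == 0 && buf[3] == 1)
        return 4;
    return 0;
}
```

For example, a header byte of 0x67 decodes to `nal_reference_idc` 3 and `nal_unit_type` 7, which is the type value used for a sequence parameter set.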

The reference software may have significant command-line interpretation, GUI, and user interface code that needs to be partitioned from the code to be synthesized.

2.2: Reference Software Partitioning

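3. Non-synthesizable Constructs
3.1: Dynamic Memory Allocation
Dynamic memory allocation is changed to static allocation: each dynamically allocated buffer is converted to a statically allocated buffer of the maximum potential size.
3.2: File I/O
The video file format is a structured data stream containing parameter sets, decoding settings, and compressed data. The reference software uses file I/O wherever it needs to read the relevant variables, rather than creating a single data structure to hold the data. Based on the Network Abstraction Layer (NAL) data structure that H.264 data uses on communication channels, the authors created a structure that stores one frame of encoded data and its associated parameters.
3.3: Pointer-based Data Structures
Pointer operations are converted to statically allocated buffers, with indices used to track the buffer locations that the pointers referred to.
3.4: Recursive Functions
Replace all recursive functions with suitable non-recursive algorithms.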
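A minimal sketch of the dynamic-to-static conversion, assuming made-up limits `MAX_WIDTH` and `MAX_HEIGHT` (the real decoder's limits are not given in these notes). The names `frame_buf`, `fill_frame`, and `get_pixel` are hypothetical.

```c
#include <assert.h>

#define MAX_WIDTH  64   /* assumed upper bounds, not from the paper */
#define MAX_HEIGHT 64

/* Software style would call malloc(width * height) per frame.
 * Here the buffer is sized once for the largest frame that may occur. */
static unsigned char frame_buf[MAX_HEIGHT][MAX_WIDTH];

/* Fill the active width x height region with a value.
 * Returns the number of pixels written, or -1 if the frame does not fit. */
static int fill_frame(int width, int height, unsigned char value)
{
    if (width < 0 || height < 0 || width > MAX_WIDTH || height > MAX_HEIGHT)
        return -1;
    for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++)
            frame_buf[y][x] = value;
    return width * height;
}

/* Read one pixel; the assert is a host-side bounds check. */
static unsigned char get_pixel(int x, int y)
{
    assert(x >= 0 && x < MAX_WIDTH && y >= 0 && y < MAX_HEIGHT);
    return frame_buf[y][x];
}
```

The trade-off is memory: the buffer always occupies the worst-case size, even when the actual frame is smaller.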
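The idea can be sketched with a linked list whose `next` field is an array index instead of a pointer. Everything here (`POOL_SIZE`, `NIL`, `node_t`, `list_push`, `list_sum`) is a hypothetical example, not code from the decoder.

```c
#include <assert.h>

#define POOL_SIZE 8     /* assumed maximum number of nodes */
#define NIL (-1)        /* index that plays the role of a NULL pointer */

typedef struct {
    int value;
    int next;           /* index of the next node in pool[], or NIL */
} node_t;

static node_t pool[POOL_SIZE];  /* statically allocated node storage */
static int pool_used = 0;       /* number of nodes handed out so far */

/* Push a value at the head of the list and return the new head index. */
static int list_push(int head, int value)
{
    assert(pool_used < POOL_SIZE);  /* the pool must not overflow */
    int idx = pool_used++;
    pool[idx].value = value;
    pool[idx].next = head;
    return idx;
}

/* Sum all values reachable from head. The loop is bounded by POOL_SIZE
 * because at most POOL_SIZE nodes can ever be linked. */
static int list_sum(int head)
{
    int sum = 0;
    int cur = head;
    for (int i = 0; i < POOL_SIZE && cur != NIL; i++) {
        sum += pool[cur].value;
        cur = pool[cur].next;
    }
    return sum;
}
```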
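One generic way to remove recursion is an explicit, fixed-size stack. The sketch below sums the values of a binary tree stored in an array; the tree layout and the names `tree_node_t`, `MAX_NODES`, and `tree_sum` are assumptions for illustration, and the notes do not say which recursive functions of the decoder were rewritten or how.

```c
#include <assert.h>

#define MAX_NODES 16    /* assumed maximum tree size */
#define NIL (-1)        /* marks a missing child */

typedef struct {
    int value;
    int left;           /* index of the left child, or NIL */
    int right;          /* index of the right child, or NIL */
} tree_node_t;

/* Recursive form, for comparison:
 *   sum(i) = (i == NIL) ? 0 : value[i] + sum(left[i]) + sum(right[i])
 * Iterative form: pending node indices are kept on an explicit stack. */
static int tree_sum(const tree_node_t tree[MAX_NODES], int root)
{
    int stack[MAX_NODES];   /* each node is pushed at most once */
    int top = 0;
    int sum = 0;

    if (root != NIL)
        stack[top++] = root;

    while (top > 0) {
        int i = stack[--top];
        sum += tree[i].value;
        if (tree[i].right != NIL) {
            assert(top < MAX_NODES);
            stack[top++] = tree[i].right;
        }
        if (tree[i].left != NIL) {
            assert(top < MAX_NODES);
            stack[top++] = tree[i].left;
        }
    }
    return sum;
}
```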

H264 case: build and setup notes

Official code and usage instructions
1. script.tcl is the script; run it directly with:
vivado_hls script.tcl

HLS Testing

Run Vivado HLS (tested with Vivado HLS 2014.4)

vivado_hls script.tcl
It will go through 3 steps: C verification, Synthesis, and Co-simulation (using Verilog with xsim as the simulator). Simulation with SystemC is recommended since it is faster and gives meaningful error messages if something goes wrong. To simulate with SystemC, replace "verilog" with "systemc" on the last line of the file script.tcl.

The solution result is in tpdn_hls/solution1

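2. The tpdn_opt folder holds the source code.
The main program is in ldecod.c, while the HLS top-level function is in decode.c. ldecod.c therefore reads the data from the video file and passes it to decode_main, and only decode_main goes through high-level synthesis.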
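The split between a host-side file reader and a synthesizable top function can be sketched as follows. This is not the actual `decode_main` interface, which these notes do not show; `decode_top_sketch`, `run_from_file`, and `MAX_STREAM_BYTES` are hypothetical, and the "decoding" is reduced to counting start codes so the example stays short.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_STREAM_BYTES 4096   /* assumed buffer size for illustration */

/* Stand-in for the synthesized top-level function: it sees only a
 * fixed-size buffer and a length, and performs no file I/O.
 * It counts 00 00 01 start codes (a 4-byte start code is counted once). */
static int decode_top_sketch(const unsigned char stream[MAX_STREAM_BYTES], int len)
{
    int count = 0;
    assert(len >= 0 && len <= MAX_STREAM_BYTES);
    for (int i = 0; i + 2 < len; i++) {
        if (stream[i] == 0 && stream[i + 1] == 0 && stream[i + 2] == 1)
            count++;
    }
    return count;
}

/* Host-side wrapper in the role of ldecod.c: all file I/O happens here,
 * then the filled buffer is handed to the top function.
 * Returns -1 if the file cannot be opened. */
static int run_from_file(const char *path)
{
    static unsigned char stream[MAX_STREAM_BYTES];
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;
    int len = (int)fread(stream, 1, sizeof stream, fp);
    fclose(fp);
    return decode_top_sketch(stream, len);
}
```

With this arrangement only `decode_top_sketch` would be named as the top function for synthesis, while `run_from_file` stays in the test bench.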

3. To check that the program runs, first test compilation with gcc:

g++ -g -o cpan ldecod.c nalu.c decode.c cavlc.c interpred.c residual.c framealloc.c mathfunc.c intrapred.c parset.c slice.c vlc.c

This may produce errors related to extern declarations; deleting the offending parts fixes them.

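4. Next, try a run with vivado_hls. Finally, use esl to run the bc file generated by vivado_hls and check that the bc simulation result is correct.

The data under g++DATA runs successfully with g++. The earlier version, in which the first data set was extracted and the data in ldecod.c and decode.c was modified, is kept in tpdn_opt_cpan.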
The run only completes when the following lines in script.tcl are written as shown here (note the leading #):

#csim_design -clean -argv {test.264 test_dec.yuv}
#cosim_design -argv {test.264 test_dec.yuv} -trace_level none -rtl systemc -tool xsim
