Allwinner H616: hardware-accelerated video encoding and decoding via Cedrus and the v4l2_request API (Orange Pi Zero 2)

ANIMZLS

Last modified 2024-07-14 20:54:51

First published 2024-05-29 13:17:53

Article link: https://blog.csdn.net/weixin_45178274/article/details/139293091

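Build and install (or load) the cedrus driver module, and load v4l2-mem2mem

Sunxi-Cedrus aims to provide hardware-accelerated video decoding and encoding for Allwinner SoCs and to bring that support into the mainline Linux kernel. It also provides additional userspace components that interface with the kernel driver on typical GNU/Linux systems.

Sunxi-Cedrus - linux-sunxi.org

If your kernel does not include the sunxi-cedrus driver and v4l2-mem2mem, you may need to build the modules from source.

For the Orange Pi Zero 2, the sunxi-cedrus video codec driver lives in drivers/staging/media/sunxi/cedrus in the Linux kernel source tree, and the related V4L2 support is in drivers/media/v4l2-core/. Building the source provided by Orange Pi shows that sunxi-cedrus is already compiled as a module but is not loaded, so we have to load it manually.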

sudo modprobe sunxi-cedrus
sudo modprobe v4l2_mem2mem

lsmod | grep cedrus
dmesg | grep cedrus
lsmod | grep v4l2_mem2mem

echo "sunxi-cedrus" | sudo tee -a /etc/modules  # load the sunxi-cedrus module automatically at boot

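The output shows that cedrus registered successfully as /dev/video0. The kernel also warns that the driver comes from the "staging" directory of the Linux kernel, which means the module may not yet be fully stable or thoroughly tested.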
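The module checks above can also be scripted. The following is a minimal sketch, not part of the original setup: module_loaded is a hypothetical helper name, and it assumes the usual /proc/modules layout in which the module name is the first field. It normalises hyphens to underscores, because modprobe accepts sunxi-cedrus while the kernel lists the module as sunxi_cedrus.

```shell
#!/bin/sh
# module_loaded NAME [FILE]   (hypothetical helper, for illustration only)
# Succeeds if NAME appears as a loaded module in FILE (default: /proc/modules).
module_loaded() {
    # The kernel lists module names with underscores, so normalise hyphens.
    name=$(printf '%s' "$1" | sed 's/-/_/g')
    file=${2:-/proc/modules}
    # The module name is the first whitespace-separated field on each line.
    awk -v m="$name" '$1 == m { found = 1 } END { exit !found }' "$file"
}
```

On the board, `module_loaded sunxi-cedrus && echo loaded` should print "loaded" once the modprobe commands have succeeded.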

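Support for Sunxi-Cedrus is implemented through several components in kernel space and userspace:

Cedrus V4L2 M2M kernel driver
v4l2-request VAAPI backend

Additional userspace components are provided for development purposes:

v4l2-request-test, a tool for testing the Cedrus VPU driver
libdrm-sun4i, which supports allocating buffers in the MB32 tiled NV12 format used by the VPU (deprecated)
libva-dump, a VAAPI backend that dumps metadata and slices from a video

Video players that support libVA should be compatible with the v4l2-request libVA backend. However, some implementation details may make it incompatible with certain players.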

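The following table shows the support status of specific codecs in the v4l2-request libVA backend:

(Table provided as an image in the original post.)

The following table shows which SoCs are supported by the V4L2 M2M kernel driver:

(Table provided as an image in the original post.)

The following players were tested with the v4l2-request libVA backend:

(Table provided as an image in the original post.)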

Testing hardware encoding and decoding with GStreamer

Install GStreamer and the related plugins, then load the required modules

sudo apt update
sudo apt install gstreamer1.0-tools gstreamer1.0-plugins-base gstreamer1.0-plugins-good gstreamer1.0-plugins-bad gstreamer1.0-plugins-ugly gstreamer1.0-libav
sudo apt install v4l-utils
gst-launch-1.0 --version

sudo modprobe videobuf2-core
sudo modprobe videobuf2-memops
sudo modprobe videobuf2-vmalloc
sudo modprobe videobuf2-v4l2
sudo modprobe v4l2-mem2mem
sudo modprobe sunxi-cedrus

Check which H.264 encoders and decoders are available

gst-inspect-1.0 | grep 264

root@orangepizero2:~# gst-inspect-1.0 | grep 264
libav:  avdec_h264: libav H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10 decoder
libav:  avenc_h264_omx: libav OpenMAX IL H.264 video encoder encoder
libav:  avmux_ipod: libav iPod H.264 MP4 (MPEG-4 Part 14) muxer
openh264:  openh264dec: OpenH264 video decoder
openh264:  openh264enc: OpenH264 video encoder
rtp:  rtph264depay: RTP H264 depayloader
rtp:  rtph264pay: RTP H264 payloader
typefindfunctions: video/x-h264: h264, x264, 264
uvch264:  uvch264deviceprovider (GstDeviceProviderFactory)
uvch264:  uvch264mjpgdemux: UVC H264 MJPG Demuxer
uvch264:  uvch264src: UVC H264 Source
v4l2codecs:  v4l2slh264dec: V4L2 Stateless H.264 Video Decoder  
# v4l2slh264dec is the V4L2 stateless H.264 video decoder, used for hardware-accelerated decoding.
videoparsersbad:  h264parse: H.264 parser
x264:  x264enc: x264 H.264 Encoder

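We can therefore start by testing hardware decoding with a local MP4 file and saving the decoded data directly as a raw YUV file (for example NV12 or I420). v4l2-ctl lists the capture formats supported by /dev/video0 on the Orange Pi; on a mem2mem decoder, the capture queue is the side that delivers decoded frames: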

root@orangepizero2:~# v4l2-ctl --list-formats-ext -d /dev/video0
ioctl: VIDIOC_ENUM_FMT
        Type: Video Capture

        [0]: 'ST12' (Y/UV 4:2:0 (32x32 Linear))
        [1]: 'NV12' (Y/UV 4:2:0)

Decode the local file with GStreamer:

gst-launch-1.0 -v filesrc location=input.mp4 ! qtdemux ! h264parse ! v4l2slh264dec ! videoconvert ! video/x-raw,format=I420 ! filesink location=output.yuv

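The command fails. The dmesg log shows that the cedrus video codec failed while trying to allocate DMA memory, which usually means the system is short of memory or the DMA allocation itself failed. Decoding H.264/H.265 video can need a large amount of CMA memory, so it is advisable to set up a large CMA pool, for example through a kernel command-line parameter. 256 MiB should be enough to decode 1080p H.264 video.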
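It can help to check how large the CMA pool currently is. The sketch below assumes a kernel built with CMA support, in which case /proc/meminfo contains a CmaTotal line in kB. The function names are illustrative only.

```shell
#!/bin/sh
# cma_total_mib [FILE]: print CmaTotal in MiB, read from FILE
# (default: /proc/meminfo). Prints 0 when the field is absent.
cma_total_mib() {
    awk '/^CmaTotal:/ { kb = $2 } END { print int(kb / 1024) }' "${1:-/proc/meminfo}"
}

# cma_at_least MIB [FILE]: succeed if the CMA pool is at least MIB MiB.
cma_at_least() {
    [ "$(cma_total_mib "$2")" -ge "$1" ]
}
```

Running `cma_at_least 256 || echo "CMA pool smaller than 256 MiB"` on the board shows whether the pool already meets the size suggested for 1080p H.264.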

Add or modify the following line in /boot/orangepiEnv.txt, then reboot so that the new kernel command line takes effect:

extraargs=cma=256M
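Once the decode pipeline runs, one way to sanity-check output.yuv is to compare its size with the expected size of raw I420 frames, which take width × height × 3/2 bytes each. This is only a sketch: it assumes even frame dimensions and no row padding in the file, and the helper names are hypothetical.

```shell
#!/bin/sh
# i420_frame_bytes WIDTH HEIGHT: bytes in one I420 frame
# (a full-size Y plane plus quarter-size U and V planes).
i420_frame_bytes() {
    echo $(( $1 * $2 * 3 / 2 ))
}

# frames_in_file FILE WIDTH HEIGHT: number of whole frames stored in FILE.
frames_in_file() {
    size=$(wc -c < "$1")
    echo $(( size / $(i420_frame_bytes "$2" "$3") ))
}
```

For a 1920x1080 clip, `frames_in_file output.yuv 1920 1080` should roughly match the frame count of the source video if every frame was decoded.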

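Next we tried encoding and streaming, but found that the v4l2 elements in GStreamer offer no usable hardware-accelerated encoder on this board.

So we start with software encoding and install the RTSP server libraries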

sudo apt-get install gir1.2-gst-rtsp-server-1.0 libgstrtspserver-1.0-0 libgstrtspserver-1.0-dev

Write the streaming server code and save it as rtsp-server.c

#include <gst/gst.h>
#include <gst/rtsp-server/rtsp-server.h>

int main(int argc, char *argv[]) {
  gst_init(&argc, &argv);

  GstRTSPServer *server = gst_rtsp_server_new();
  gst_rtsp_server_set_service(server, "8554");

  GstRTSPMountPoints *mounts = gst_rtsp_server_get_mount_points(server);
  GstRTSPMediaFactory *factory = gst_rtsp_media_factory_new();

  // Capture video from the camera with v4l2src and encode it in software with x264enc
  gst_rtsp_media_factory_set_launch(factory,
    "( v4l2src device=/dev/video1 ! video/x-raw,width=640,height=480 ! videoconvert ! video/x-raw,format=I420 ! x264enc tune=zerolatency profile=main ! rtph264pay name=pay0 pt=96 )");

  gst_rtsp_mount_points_add_factory(mounts, "/test", factory);

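  g_object_unref(mounts);

  // attach() returns 0 when the server could not be attached to the main context
  if (gst_rtsp_server_attach(server, NULL) == 0) {
    g_printerr("failed to attach the RTSP server\n");
    g_object_unref(server);
    return 1;
  }

  // 192.168.137.189 is the board's address in this setup; substitute your own
  g_print("stream ready at rtsp://192.168.137.189:8554/test\n");
  GMainLoop *loop = g_main_loop_new(NULL, FALSE);
  g_main_loop_run(loop);

  g_object_unref(server);
  g_main_loop_unref(loop);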

  return 0;
}

Compile the code with GCC and link it against the GStreamer libraries.

gcc rtsp-server.c -o rtsp-server `pkg-config --cflags --libs gstreamer-1.0 gstreamer-rtsp-server-1.0`
./rtsp-server

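This starts the RTSP server and begins streaming. You can watch the stream with VLC or any other media player that supports RTSP. For example, in VLC choose Open Network Stream and enter the RTSP address (for example rtsp://192.168.137.189:8554/test). In this test the latency was about 2 seconds and CPU usage was about 80%.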

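Trying hardware encoding and decoding with ffmpeg

The first step in installing Cedrus support is to build a Linux kernel with the latest patch series for the driver. Orange Pi has already done this for us.
The main userspace component supporting the Cedrus VPU driver is the libva-v4l2-request VAAPI backend.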

git clone https://github.com/bootlin/libva-v4l2-request -b release-2019.03
cd libva-v4l2-request
./autogen.sh && make && sudo make install

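The build fails because some definitions are missing on the system. This is probably because my V4L2 (Video4Linux) headers are a version that does not support the HEVC (High Efficiency Video Coding) features. As a workaround, the HEVC paths were patched out in the following files.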
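Searching the installed headers for the symbol the build complains about is a quick way to confirm this diagnosis. The helper below is a hypothetical sketch. Where the V4L2 headers live (for example /usr/include/linux) depends on the distribution, so the directory is passed as an argument.

```shell
#!/bin/sh
# header_defines SYMBOL DIR: succeed if any file under DIR mentions SYMBOL
# as a whole word (hypothetical helper; a mention is not proof of a usable
# definition, but its absence explains an "undeclared" compile error).
header_defines() {
    grep -rqsw -- "$1" "$2"
}
```

For instance, `header_defines V4L2_PIX_FMT_HEVC_SLICE /usr/include/linux || echo "HEVC slice format not in these headers"` reports whether the pixel format used in config.c is known to the installed headers.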

config.c

/*
	found = v4l2_find_format(driver_data->video_fd,
				 V4L2_BUF_TYPE_VIDEO_OUTPUT,
				 V4L2_PIX_FMT_HEVC_SLICE);
	if (found && index < (V4L2_REQUEST_MAX_CONFIG_ATTRIBUTES - 1))
		profiles[index++] = VAProfileHEVCMain;
*/

context.c

	case VAProfileHEVCMain:
		//pixelformat = V4L2_PIX_FMT_HEVC_SLICE;
		pixelformat = V4L2_PIX_FMT_H264_SLICE;
		break;

h264-ctrls.h

/* Comment out the duplicated struct definitions
struct v4l2_ctrl_h264_pps {
	__u8 pic_parameter_set_id;
	__u8 seq_parameter_set_id;
	__u8 num_slice_groups_minus1;
	__u8 num_ref_idx_l0_default_active_minus1;
	__u8 num_ref_idx_l1_default_active_minus1;
	__u8 weighted_bipred_idc;
	__s8 pic_init_qp_minus26;
	__s8 pic_init_qs_minus26;
	__s8 chroma_qp_index_offset;
	__s8 second_chroma_qp_index_offset;
	__u16 flags;
};

struct v4l2_ctrl_h264_scaling_matrix {
	__u8 scaling_list_4x4[6][16];
	__u8 scaling_list_8x8[6][64];
};

struct v4l2_h264_weight_factors {
	__s8 luma_weight[32];
	__s8 luma_offset[32];
	__s8 chroma_weight[32][2];
	__s8 chroma_offset[32][2];
};
*/
// Modified: v4l2_h264_dpb_entry renamed to v4l2_h264_dpb_re_entry
struct v4l2_h264_dpb_re_entry {
	__u64 timestamp;
	__u16 frame_num;
	__u16 pic_num;
	/* Note that field is indicated by v4l2_buffer.field */
	__s32 top_field_order_cnt;
	__s32 bottom_field_order_cnt;
	__u32 flags; /* V4L2_H264_DPB_ENTRY_FLAG_* */
};

struct v4l2_ctrl_h264_decode_param {
	__u32 num_slices;
	__u16 idr_pic_flag;
	__u16 nal_ref_idc;
	__u8 ref_pic_list_p0[32];
	__u8 ref_pic_list_b0[32];
	__u8 ref_pic_list_b1[32];
	__s32 top_field_order_cnt;
	__s32 bottom_field_order_cnt;
	struct v4l2_h264_dpb_re_entry dpb[16];
};

h264.c

static void h264_fill_dpb(struct request_data *data,
			  struct object_context *context,
			  struct v4l2_ctrl_h264_decode_param *decode)
{
	int i;

	for (i = 0; i < H264_DPB_SIZE; i++) {
		struct v4l2_h264_dpb_re_entry *dpb = &decode->dpb[i];
		struct h264_dpb_entry *entry = &context->dpb.entries[i];
		struct object_surface *surface =
			SURFACE(data, entry->pic.picture_id);
		uint64_t timestamp;

		if (!entry->valid)
			continue;

		if (surface) {
			timestamp = v4l2_timeval_to_ns(&surface->timestamp);
			dpb->timestamp = timestamp;
		}

		dpb->frame_num = entry->pic.frame_idx;
		dpb->top_field_order_cnt = entry->pic.TopFieldOrderCnt;
		dpb->bottom_field_order_cnt = entry->pic.BottomFieldOrderCnt;

		dpb->flags = V4L2_H264_DPB_RE_ENTRY_FLAG_VALID;

		if (entry->used)
			dpb->flags |= V4L2_H264_DPB_RE_ENTRY_FLAG_ACTIVE;

		if (entry->pic.flags & VA_PICTURE_H264_LONG_TERM_REFERENCE)
			dpb->flags |= V4L2_H264_DPB_RE_ENTRY_FLAG_LONG_TERM;
	}
}

At link time no definition of the tiled_to_planar function can be found, so we tried compiling and linking it by hand

sudo apt-get install --reinstall libtool-bin

root@orangepizero2:~/libva-v4l2-request# libtool --mode=compile gcc -c src/tiled_yuv.S -o src/tiled_yuv.lo
libtool: compile:  gcc -c src/tiled_yuv.S  -fPIC -DPIC -o src/.libs/tiled_yuv.o
libtool: compile:  gcc -c src/tiled_yuv.S -o src/tiled_yuv.o >/dev/null 2>&1
root@orangepizero2:~/libva-v4l2-request#

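make still fails. A look at the tiled_yuv.S assembly source shows that it is written for the ARMv7 architecture, so it would have to be rewritten as assembly suitable for aarch64.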
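A quick architecture check before building would have flagged this mismatch early. The sketch below is an assumption-laden illustration: arch_is_armv7 is a hypothetical name, and it relies on `uname -m` reporting a string beginning with armv7 on 32-bit ARMv7 systems and aarch64 on 64-bit ones.

```shell
#!/bin/sh
# arch_is_armv7 [ARCH]: succeed if ARCH (default: output of `uname -m`)
# names a 32-bit ARMv7 machine, the target tiled_yuv.S was written for.
arch_is_armv7() {
    case "${1:-$(uname -m)}" in
        armv7*) return 0 ;;
        *)      return 1 ;;
    esac
}
```

On the Orange Pi Zero 2 running a 64-bit system, `arch_is_armv7 || echo "ARMv7 assembly will not build here"` prints the warning.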

Android seems to be the more convenient route...