1. Article Source
This post covers the paper "Reinforcement Learning for HEVC/H.265 Intra-Frame Rate Control". Paper link: original link. Since the original page loads slowly and is prone to errors, a resource download mirror is also provided: share link.
2. Main Content
The paper proposes a reinforcement-learning-based rate control algorithm for HEVC intra-frame coding: it analyzes and models the encoder-side decision process for intra-frame prediction, and ultimately solves the resulting decision problem with reinforcement learning.
1. Key Concepts
① episode
Each video frame is coded independently to meet a target bit budget, with the encoding of each frame considered an episodic task.
In the paper, each episode corresponds to the CTU-level QP decision process for a single frame.
② agent, state and action
- Agent is our CTU-adaptive rate control algorithm.
- Actions a’s are the delta QP values ranging from -3 to +3 relative to a frame-level initial QP.
- State s on which the action taking is based includes the hand-crafted features detailed in Table.
The state summarizes not only the texture variation (components 1-4) within a coding CTU but also the spatial characteristics (components 5-8) of the remaining CTUs, the bit balance, and the frame-level initial QP, which is decided by the frame-level rate control of HM-16.15.
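The episodic structure above can be sketched as a simple loop: for each CTU in a frame, build a state, pick a delta QP in {-3, ..., +3}, and encode. This is a minimal illustrative sketch; `extract_state`, `encode_ctu`, and `greedy_policy` below are hypothetical stand-ins, not code from the paper or from HM-16.15.

```python
# Action space: delta QP relative to the frame-level initial QP.
DELTA_QPS = list(range(-3, 4))  # {-3, -2, -1, 0, +1, +2, +3}

def extract_state(ctu, bits_used, frame_initial_qp):
    # Stand-in for the hand-crafted feature vector (texture variation,
    # spatial features of remaining CTUs, bit balance, initial QP).
    return (sum(ctu) % 7, bits_used, frame_initial_qp)

def encode_ctu(ctu, qp):
    # Stand-in encoder model: a higher QP spends fewer bits.
    return max(1, len(ctu) * (52 - qp))

def greedy_policy(state):
    # Stand-in policy: always choose delta QP = 0 (action index 3).
    return 3

def run_episode(ctus, frame_initial_qp, policy):
    """One episode = one frame: choose a delta QP per CTU and encode."""
    total_bits = 0
    for ctu in ctus:
        state = extract_state(ctu, total_bits, frame_initial_qp)
        action = policy(state)                      # index into DELTA_QPS
        qp = frame_initial_qp + DELTA_QPS[action]   # QP clipping omitted
        total_bits += encode_ctu(ctu, qp)
    return total_bits
```

In the actual algorithm the policy would be a trained network and `encode_ctu` the HM encoder; this skeleton only shows how the per-CTU decisions chain into one episode per frame.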
③ reward
To achieve the goal in Eq. (2), an immediate reward r is given to the agent for each action it takes. This reward is computed as the negative mean squared error of the CTU induced by encoding it with the chosen QP value.
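The reward defined above is straightforward to express. A minimal sketch, assuming the original and reconstructed CTU pixels are available as equal-length sequences (`ctu_reward` is an illustrative name, not from the paper):

```python
def ctu_reward(original, reconstructed):
    """Negative mean squared error between a source CTU and its
    reconstruction after encoding with the chosen QP."""
    diffs = [(o - r) ** 2 for o, r in zip(original, reconstructed)]
    return -sum(diffs) / len(diffs)
```

A perfect reconstruction yields the maximum reward of 0; larger distortion drives the reward more negative, so maximizing the return pushes the agent toward QP choices that trade bits for lower distortion.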