1. Article Source
This post covers the paper "Reinforcement Learning for HEVC/H.265 Intra-Frame Rate Control". Paper link: original link. Since the original page loads slowly and is prone to errors, a resource download mirror is also provided: share link.
2. Main Content
The paper proposes a reinforcement-learning-based rate control algorithm for HEVC intra-frame coding: it analyzes and models the encoder-side decision process for intra-frame prediction, and ultimately solves the resulting decision problem with reinforcement learning.
1. Key Concepts
① episode
Each video frame is coded independently to meet a target bit budget, with the encoding of each frame considered an episodic task.
In the paper, each episode corresponds to the CTU-level QP decision process for a single frame.
② agent, state and action
- Agent is our CTU-adaptive rate control algorithm.
- Actions a’s are the delta QP values ranging from -3 to +3 relative to a frame-level initial QP.
- State s on which the action taking is based includes the hand-crafted features detailed in Table.
The state summarizes not only the texture variation (components 1-4) within a coding CTU but also the spatial characteristics (components 5-8) of the remaining CTUs, the bit balance, and the frame-level initial QP, which is decided by the frame-level rate control of HM-16.15.
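The episodic structure above can be sketched as a simple loop: for each CTU in a frame, build a state, pick a delta QP in {-3, ..., +3}, and encode. This is a minimal illustrative sketch; `extract_state`, `encode_ctu`, and `greedy_policy` below are hypothetical stand-ins, not code from the paper or from HM-16.15.

```python
# Action space: delta QP relative to the frame-level initial QP.
DELTA_QPS = list(range(-3, 4))  # {-3, -2, -1, 0, +1, +2, +3}

def extract_state(ctu, bits_used, frame_initial_qp):
    # Stand-in for the hand-crafted feature vector (texture variation,
    # spatial features of remaining CTUs, bit balance, initial QP).
    return (sum(ctu) % 7, bits_used, frame_initial_qp)

def encode_ctu(ctu, qp):
    # Stand-in encoder model: a higher QP spends fewer bits.
    return max(1, len(ctu) * (52 - qp))

def greedy_policy(state):
    # Stand-in policy: always choose delta QP = 0 (action index 3).
    return 3

def run_episode(ctus, frame_initial_qp, policy):
    """One episode = one frame: choose a delta QP per CTU and encode."""
    total_bits = 0
    for ctu in ctus:
        state = extract_state(ctu, total_bits, frame_initial_qp)
        action = policy(state)                      # index into DELTA_QPS
        qp = frame_initial_qp + DELTA_QPS[action]   # QP clipping omitted
        total_bits += encode_ctu(ctu, qp)
    return total_bits
```

In the actual algorithm the policy would be a trained network and `encode_ctu` the HM encoder; this skeleton only shows how the per-CTU decisions chain into one episode per frame.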
③ reward
To achieve the goal in Eq. (2), an immediate reward r is given to the agent for each action it takes. This reward is computed as the negative mean squared error of the CTU induced by encoding it with the chosen QP value.
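The reward defined above is straightforward to express. A minimal sketch, assuming the original and reconstructed CTU pixels are available as equal-length sequences (`ctu_reward` is an illustrative name, not from the paper):

```python
def ctu_reward(original, reconstructed):
    """Negative mean squared error between a source CTU and its
    reconstruction after encoding with the chosen QP."""
    diffs = [(o - r) ** 2 for o, r in zip(original, reconstructed)]
    return -sum(diffs) / len(diffs)
```

A perfect reconstruction yields the maximum reward of 0; larger distortion drives the reward more negative, so maximizing the return pushes the agent toward QP choices that trade bits for lower distortion.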