Paper Reading: A Reinforcement-Learning-Based Rate Control Method for Dynamic Video Sequences in HEVC

I. Source

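The paper is titled "Rate Control Method Based on Deep Reinforcement Learning for Dynamic Video Sequences in HEVC". Article link: original link. The page loads slowly and often fails, so a shared download copy is also available: share link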

II. Main Content

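The paper proposes an HEVC rate control algorithm based on reinforcement learning (RL): the encoder-side QP decision process, at both frame and CTU level, is analyzed and modeled, and the resulting decision problem is then solved with RL.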

1. Key Concepts

① frame-level and CTU-level

In addition, our method includes the frame and CTU-level rate control strategy, whose two tasks can be formulated and solved independently. The frame level rate control strategy is determined first; and then the CTU-level rate control strategy is determined.

The method therefore has two parts, frame-level and CTU-level, which are modeled and solved independently: the frame-level rate control strategy is decided first, and the CTU-level strategy afterwards.

② episode

At the frame level, each GOP is regarded as one episode of a task.
At the CTU level, each frame is treated as one episode of a task.
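The nesting of the two episode definitions, with the frame-level decision made before the CTU-level ones, can be sketched as follows. This is a minimal illustration rather than the paper's implementation: `Frame`, `GOP`, and `run_episodes` are hypothetical names, the RL agents are replaced by trace entries, and each frame is assumed to carry a known CTU count.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Frame:
    poc: int        # picture order count (hypothetical field)
    num_ctus: int   # number of CTUs in this frame (hypothetical field)


@dataclass
class GOP:
    frames: List[Frame] = field(default_factory=list)


def run_episodes(gops: List[GOP]) -> List[str]:
    """Trace episode boundaries: one frame-level episode per GOP,
    one CTU-level episode per frame. Agent calls are omitted."""
    trace: List[str] = []
    for g, gop in enumerate(gops):
        trace.append(f"frame-episode-start gop={g}")
        for frame in gop.frames:
            # Frame-level step first: the frame agent would pick QP / bit budget here.
            trace.append(f"frame-step poc={frame.poc}")
            # Then a full CTU-level episode inside this frame.
            trace.append(f"ctu-episode-start poc={frame.poc}")
            for c in range(frame.num_ctus):
                # The CTU agent would pick the QP of CTU c here.
                trace.append(f"ctu-step poc={frame.poc} ctu={c}")
            trace.append(f"ctu-episode-end poc={frame.poc}")
        trace.append(f"frame-episode-end gop={g}")
    return trace
```

For a single GOP of two frames with two CTUs each, `run_episodes([GOP([Frame(0, 2), Frame(1, 2)])])` yields one frame-level episode that contains two complete CTU-level episodes.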

③ state

[Image: Table I]

Because I frames and P/B frames have many different characteristics in the encoding process, we use different sets of features for I frames and P/B frames.
For an inter-frame, we choose features 3-4 and 6-8 in Table I.
For an intra-frame, we select features 1-7 in Table I to describe the environment.

  • Inter-frames (P/B): features 3-4 and 6-8 of Table I form the state.
  • Intra-frames (I): features 1-7 of Table I form the state.

[Image: Table II]

For an inter-frame, we choose features 9-12 in Table II.
For an intra-frame, we select features 1-11 in Table II to describe the environment.

  • Inter-frames (P/B): features 9-12 of Table II form the state.
  • Intra-frames (I): features 1-11 of Table II form the state.
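The feature selection above amounts to a lookup on the 1-based feature numbers of Tables I and II, keyed by decision level and frame type. A minimal sketch, assuming all features of a table have already been computed into a list ordered by their table number (so Table I needs at least 8 entries and Table II at least 12); `FEATURES` and `select_state` are hypothetical names, and what each feature measures is left to the tables themselves.

```python
from typing import Dict, List, Sequence

# 1-based feature numbers, as listed in the paper's tables.
FEATURES: Dict[str, Dict[str, List[int]]] = {
    "frame": {                          # Table I
        "inter": [3, 4, 6, 7, 8],
        "intra": [1, 2, 3, 4, 5, 6, 7],
    },
    "ctu": {                            # Table II
        "inter": [9, 10, 11, 12],
        "intra": list(range(1, 12)),    # features 1-11
    },
}


def select_state(all_features: Sequence[float], frame_type: str, level: str) -> List[float]:
    """Return the state vector for one decision.

    Assumes all_features[k - 1] holds feature k of the relevant table.
    frame_type is "I", "P" or "B"; level is "frame" or "ctu".
    """
    if level not in FEATURES:
        raise ValueError(f"unknown level: {level!r}")
    if frame_type not in ("I", "P", "B"):
        raise ValueError(f"unknown frame type: {frame_type!r}")
    kind = "intra" if frame_type == "I" else "inter"  # P and B frames share one set
    return [all_features[k - 1] for k in FEATURES[level][kind]]
```

Keeping the indices in one table makes it easy to compare the intra and inter feature sets at a glance.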
④ action

In our proposed framework, the RL agent determines the QP for each frame and CTU. The possible actions are the QP values, which range from 0 to 51.

At the frame level, the actions include all possible QPs and bit budgets for a frame.
At the CTU level, the available actions include all possible QPs that can be used to encode a CTU.

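In short, an action means choosing a suitable QP in the range 0-51 (at the frame level, together with a bit budget for the frame).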

Note that to ensure stable quality, at the frame level, the QP must satisfy $QP_{PicAvg}-2 \le QP_{currframe} \le QP_{PicAvg}+2$, where $QP_{PicAvg}$ is the average QP of the previous frames.
At the CTU level, the QP must satisfy $QP_{currframe}-2 \le QP_{currCTU} \le QP_{currframe}+2$.

In other words, to keep quality stable, the chosen QPs are constrained by both $QP_{PicAvg}-2 \le QP_{currframe} \le QP_{PicAvg}+2$ and $QP_{currframe}-2 \le QP_{currCTU} \le QP_{currframe}+2$.
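The two constraints restrict the discrete action set to a window of ±2 around a reference QP, inside the 0-51 range. A minimal sketch, assuming the bounds are inclusive and that a fractional $QP_{PicAvg}$ is handled by keeping the integer QPs inside the real interval; `qp_window`, `admissible_qps`, and `clip_qp` are hypothetical helpers, not part of the paper.

```python
import math
from typing import List, Tuple

QP_MIN, QP_MAX = 0, 51  # QP range stated in the paper


def qp_window(ref_qp: float, delta: int = 2) -> Tuple[int, int]:
    """Integer bounds [lo, hi] with ref_qp - delta <= QP <= ref_qp + delta, inside [0, 51]."""
    lo = max(QP_MIN, math.ceil(ref_qp - delta))
    hi = min(QP_MAX, math.floor(ref_qp + delta))
    return lo, hi


def admissible_qps(ref_qp: float, delta: int = 2) -> List[int]:
    """Discrete QP actions the agent may choose from."""
    lo, hi = qp_window(ref_qp, delta)
    return list(range(lo, hi + 1))


def clip_qp(qp: int, ref_qp: float, delta: int = 2) -> int:
    """Project a QP proposed by the agent onto the admissible window."""
    lo, hi = qp_window(ref_qp, delta)
    return min(max(qp, lo), hi)


# Frame level: the reference is the average QP of previously coded frames.
prev_frame_qps = [30, 31]
qp_pic_avg = sum(prev_frame_qps) / len(prev_frame_qps)  # 30.5
frame_qp = clip_qp(36, qp_pic_avg)                       # window [29, 32], so 32
# CTU level: the reference is the QP of the current frame.
ctu_actions = admissible_qps(frame_qp)                   # [30, 31, 32, 33, 34]
```

Clipping is only one way to enforce the window; masking out inadmissible actions in the agent's output before selection is an equally simple alternative.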

⑤ reward
  • frame-level:
    [Image: frame-level reward formula]

$D_{cur\_frame}$ and $D_{prev\_frame}$ are the distortions of the current and previous frames, respectively, for which we use mean-square error (MSE) as the quality evaluation standard;
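The frame distortion is plain MSE. A minimal sketch over 2-D sample arrays, assuming $D_{cur\_frame}$ compares the original and reconstructed samples of a single plane (the text does not say which planes are used); `mse` is a hypothetical helper name.

```python
from typing import Sequence


def mse(orig: Sequence[Sequence[int]], recon: Sequence[Sequence[int]]) -> float:
    """Mean-square error between two equally sized 2-D sample arrays."""
    if len(orig) != len(recon) or any(len(a) != len(b) for a, b in zip(orig, recon)):
        raise ValueError("arrays must have the same shape")
    sq_sum = sum((o - r) ** 2
                 for row_o, row_r in zip(orig, recon)
                 for o, r in zip(row_o, row_r))
    count = sum(len(row) for row in orig)
    return sq_sum / count
```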

$V = T_{avg\_frame} \cdot N_{coded\_frame} - \sum_{i=1}^{N_{coded\_frame}} R_i$ is the current buffer status, where $T_{avg\_frame}$ is the target average number of bits per frame, $N_{coded\_frame}$ is the number of encoded frames, $R_i$ is the actual number of bits of the $i$-th frame, and $\epsilon$ is a small value to avoid division by 0.
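The buffer status $V$ follows directly from its definition. A minimal sketch; deriving $T_{avg\_frame}$ as target bitrate divided by frame rate is an assumption (the text only calls it the target average number of bits per frame), and `target_bits_per_frame` and `buffer_status` are hypothetical names.

```python
from typing import Sequence


def target_bits_per_frame(bitrate_bps: float, fps: float) -> float:
    """Assumed definition of T_avg_frame: target bitrate spread evenly over frames."""
    return bitrate_bps / fps


def buffer_status(t_avg_frame: float, actual_bits: Sequence[int]) -> float:
    """V = T_avg_frame * N_coded_frame - sum_{i=1}^{N_coded_frame} R_i.

    actual_bits holds R_1..R_N for the frames encoded so far,
    so N_coded_frame = len(actual_bits).
    V > 0: fewer bits spent than targeted so far; V < 0: overspent.
    """
    n_coded_frame = len(actual_bits)
    return t_avg_frame * n_coded_frame - sum(actual_bits)


t_avg = target_bits_per_frame(1_000_000, 25)        # 40000.0 bits per frame
v = buffer_status(t_avg, [42_000, 38_000, 45_000])  # 120000 - 125000 = -5000.0
```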

  • CTU-level:
    [Image: CTU-level reward formula]

$D_{cur\_CTU}$ and $D_{prev\_CTU}$ are the distortions of the current and previous CTUs, respectively.

Here, the distortion is measured as the sum of absolute transformed differences (SATD) of the CTU for an intra-frame, and as the mean absolute difference (MAD) for an inter-frame.
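The two CTU-level distortion measures can be sketched as below. Assumptions: SATD is computed with a 4x4 Hadamard transform of the residual and left unnormalized (real encoders such as HM typically use their own block sizes and scaling), MAD is the mean absolute difference between two co-sized sample blocks, and which signals the paper compares (original vs. prediction or reconstruction) is not specified here. All function names are hypothetical.

```python
from typing import List, Sequence

Block = List[List[int]]

# 4x4 Hadamard matrix; it is symmetric, so H4^T == H4.
H4: Block = [
    [1, 1, 1, 1],
    [1, -1, 1, -1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
]


def _matmul(a: Sequence[Sequence[int]], b: Sequence[Sequence[int]]) -> Block:
    """Plain integer matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def satd4x4(orig: Block, pred: Block) -> int:
    """Unnormalized SATD of one 4x4 block: sum of |H4 * (orig - pred) * H4^T|."""
    diff = [[o - p for o, p in zip(ro, rp)] for ro, rp in zip(orig, pred)]
    coeffs = _matmul(_matmul(H4, diff), H4)
    return sum(abs(c) for row in coeffs for c in row)


def satd(orig: Block, pred: Block) -> int:
    """SATD of a larger block (e.g. a CTU), summed over its 4x4 sub-blocks."""
    h, w = len(orig), len(orig[0])
    if h % 4 or w % 4 or len(pred) != h or len(pred[0]) != w:
        raise ValueError("blocks must match and have dimensions that are multiples of 4")
    total = 0
    for y in range(0, h, 4):
        for x in range(0, w, 4):
            sub_o = [row[x:x + 4] for row in orig[y:y + 4]]
            sub_p = [row[x:x + 4] for row in pred[y:y + 4]]
            total += satd4x4(sub_o, sub_p)
    return total


def mad(a: Block, b: Block) -> float:
    """Mean absolute difference between two equally sized blocks."""
    n = len(a) * len(a[0])
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / n


orig = [[5] * 4 for _ in range(4)]
pred = [[3] * 4 for _ in range(4)]
print(satd(orig, pred), mad(orig, pred))  # constant residual of 2: prints 32 2.0
```

A constant residual puts all its energy into the DC coefficient, which is why SATD and MAD here differ only by the transform's gain; for textured residuals the two measures diverge.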

2. Algorithm Procedure

[Image: algorithm procedure of the proposed method]

III. Algorithm Performance

[Image: experimental results]
