HEVC HM: the compressCtu function

Author: 华函数

Published 2022-12-01 19:20:47

Article link: https://blog.csdn.net/qq_46613874/article/details/128104439

m_pcCuEncoder->compressCtu( pCtu )

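This is only an entry function. Some initialization has already been done before it is called, for example computing the target bits and the QP of this LCU.

compressCtu decides the CU split, as well as the PU and TU modes. The actual encoding happens later, in the encodeCtu function.

The function is called on the m_pcCuEncoder object, which is an instance of the TEncCu class. HM describes this class as: CU encoder class.

The TEncCu class

Let us look at the TEncCu class first. This class handles the encoding of one LCU. TComDataCU holds the coding information of CUs of different sizes, and the quadtree is split recursively to select the partition with the best RD cost.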

class TEncCu
{
private:
  // TComDataCU stores the information of every 4x4 block inside the CU
  TComDataCU**            m_ppcBestCU;      ///< Best CUs in each depth
  TComDataCU**            m_ppcTempCU;      ///< Temporary CUs in each depth
  // tempCU is the working copy used while encoding: after each candidate is coded,
  // tempCU is compared with bestCU, and bestCU is updated if needed
  UChar                   m_uhTotalDepth;   ///< Total depth

  TComYuv**               m_ppcPredYuvBest; ///< Best Prediction Yuv for each depth
  TComYuv**               m_ppcResiYuvBest; ///< Best Residual Yuv for each depth
  TComYuv**               m_ppcRecoYuvBest; ///< Best Reconstruction Yuv for each depth
  TComYuv**               m_ppcPredYuvTemp; ///< Temporary Prediction Yuv for each depth
  TComYuv**               m_ppcResiYuvTemp; ///< Temporary Residual Yuv for each depth
  TComYuv**               m_ppcRecoYuvTemp; ///< Temporary Reconstruction Yuv for each depth
  TComYuv**               m_ppcOrigYuv;     ///< Original Yuv for each depth; copied before encoding from the TComPicYuv held by TComPic

  //  Data : encoder control
  Bool                    m_bEncodeDQP;
  Bool                    m_bFastDeltaQP;
  Bool                    m_stillToCodeChromaQpOffsetFlag; //indicates whether chroma QP offset flag needs to coded at this particular CU granularity.
  Int                     m_cuChromaQpOffsetIdxPlus1; // if 0, then cu_chroma_qp_offset_flag will be 0, otherwise cu_chroma_qp_offset_flag will be 1.
};

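The two most important members are m_ppcBestCU and m_ppcTempCU.

m_ppcBestCU: the best CU at each depth, used to store the best result so far. Note that TComDataCU only holds the CU's coding information, not the sample data (that lives in the TComYuv buffers).
m_ppcTempCU: the temporary CU at each depth, used as the working copy. After each candidate is evaluated it is compared with the best CU, and the two are swapped if the candidate is better.

The create function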

Void TEncCu::create(UChar uhTotalDepth, UInt uiMaxWidth, UInt uiMaxHeight, ChromaFormat chromaFormat)
{
  Int i;

  m_uhTotalDepth   = uhTotalDepth + 1;
  m_ppcBestCU      = new TComDataCU*[m_uhTotalDepth-1];
  m_ppcTempCU      = new TComDataCU*[m_uhTotalDepth-1];

  m_ppcPredYuvBest = new TComYuv*[m_uhTotalDepth-1];
  m_ppcResiYuvBest = new TComYuv*[m_uhTotalDepth-1];
  m_ppcRecoYuvBest = new TComYuv*[m_uhTotalDepth-1];
  m_ppcPredYuvTemp = new TComYuv*[m_uhTotalDepth-1];
  m_ppcResiYuvTemp = new TComYuv*[m_uhTotalDepth-1];
  m_ppcRecoYuvTemp = new TComYuv*[m_uhTotalDepth-1];
  m_ppcOrigYuv     = new TComYuv*[m_uhTotalDepth-1];

  UInt uiNumPartitions;
  for( i=0 ; i<m_uhTotalDepth-1 ; i++)
  {
    uiNumPartitions = 1<<( ( m_uhTotalDepth - i - 1 )<<1 );
    UInt uiWidth  = uiMaxWidth  >> i;
    UInt uiHeight = uiMaxHeight >> i;

    m_ppcBestCU[i] = new TComDataCU; m_ppcBestCU[i]->create( chromaFormat, uiNumPartitions, uiWidth, uiHeight, false, uiMaxWidth >> (m_uhTotalDepth - 1) );
    m_ppcTempCU[i] = new TComDataCU; m_ppcTempCU[i]->create( chromaFormat, uiNumPartitions, uiWidth, uiHeight, false, uiMaxWidth >> (m_uhTotalDepth - 1) );

    m_ppcPredYuvBest[i] = new TComYuv; m_ppcPredYuvBest[i]->create(uiWidth, uiHeight, chromaFormat);
    m_ppcResiYuvBest[i] = new TComYuv; m_ppcResiYuvBest[i]->create(uiWidth, uiHeight, chromaFormat);
    m_ppcRecoYuvBest[i] = new TComYuv; m_ppcRecoYuvBest[i]->create(uiWidth, uiHeight, chromaFormat);

    m_ppcPredYuvTemp[i] = new TComYuv; m_ppcPredYuvTemp[i]->create(uiWidth, uiHeight, chromaFormat);
    m_ppcResiYuvTemp[i] = new TComYuv; m_ppcResiYuvTemp[i]->create(uiWidth, uiHeight, chromaFormat);
    m_ppcRecoYuvTemp[i] = new TComYuv; m_ppcRecoYuvTemp[i]->create(uiWidth, uiHeight, chromaFormat);

    m_ppcOrigYuv    [i] = new TComYuv; m_ppcOrigYuv    [i]->create(uiWidth, uiHeight, chromaFormat);
  }
}

This is initialization that allocates the buffers. With uhTotalDepth = 4, m_uhTotalDepth = 4 + 1 = 5, so the loop runs for i = 0..3:

i=0, uiNumPartitions=256, uiWidth=uiHeight=64

i=1, uiNumPartitions=64, uiWidth=uiHeight=32

i=2, uiNumPartitions=16, uiWidth=uiHeight=16

i=3, uiNumPartitions=4, uiWidth=uiHeight=8
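The per-depth values above can be reproduced with a small standalone sketch. It only mirrors the arithmetic of the loop in TEncCu::create; the DepthLayout struct and the layoutPerDepth function are hypothetical names introduced here for illustration and are not part of HM.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper type for this sketch, not an HM class.
struct DepthLayout {
  unsigned numPartitions; // value passed to TComDataCU::create at this depth
  unsigned width;         // CU width at this depth
  unsigned height;        // CU height at this depth
};

// Repeats the size arithmetic of the loop in TEncCu::create.
// totalDepth corresponds to the uhTotalDepth argument (4 in the example above).
std::vector<DepthLayout> layoutPerDepth(unsigned totalDepth,
                                        unsigned maxWidth,
                                        unsigned maxHeight) {
  assert(totalDepth > 0);
  const unsigned stored = totalDepth + 1; // plays the role of m_uhTotalDepth
  std::vector<DepthLayout> out;
  for (unsigned i = 0; i < stored - 1; i++) {
    DepthLayout d;
    d.numPartitions = 1u << ((stored - i - 1) << 1);
    d.width  = maxWidth  >> i;
    d.height = maxHeight >> i;
    out.push_back(d);
  }
  return out;
}
```

Calling layoutPerDepth(4, 64, 64) yields four entries, one per depth, matching the table above.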

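After entering compressCtu, m_ppcBestCU[0] and m_ppcTempCU[0] are initialized first. Depth 0 corresponds to the LCU, so these hold the LCU's best CU information and its temporary (candidate) information.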

m_ppcBestCU[0]->initCtu( pCtu->getPic(), pCtu->getCtuRsAddr() );
  
m_ppcTempCU[0]->initCtu( pCtu->getPic(), pCtu->getCtuRsAddr() );

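For details of the init function, see: https://blog.csdn.net/qq_46613874/article/details/128081782?csdn_share_tail=%7B%22type%22%3A%22blog%22%2C%22rType%22%3A%22article%22%2C%22rId%22%3A%22128081782%22%2C%22source%22%3A%22qq_46613874%22%7D

Next comes the main topic, the xCompressCU function.

The xCompressCU function

xCompressCU recurses to perform the block partitioning. The starting depth is 0, i.e. the first call operates on the LCU.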

xCompressCU( m_ppcBestCU[0], m_ppcTempCU[0], 0);

// xCompressCU mainly performs the block partitioning and determines the best prediction mode. Its work can be divided into:
// 1. Inter prediction: xCheckRDCostInter and xCheckRDCostMerge2Nx2N (skip mode is a special case of merge mode)
// 2. Intra prediction: xCheckRDCostIntra
// 3. PCM mode: xCheckIntraPCM

For the detailed flow, see:

HM study notes, the xCompressCU function: https://blog.csdn.net/m0_51412823/article/details/116737247?spm=1001.2101.3001.6650.4&utm_medium=distribute.pc_relevant.none-task-blog-2~default~BlogCommendFromBaidu~Rate-4-116737247-blog-72401400.pc_relevant_3mothn_strategy_recovery&depth_1-utm_source=distribute.pc_relevant.none-task-blog-2~default~BlogCommendFromBaidu~Rate-4-116737247-blog-72401400.pc_relevant_3mothn_strategy_recovery&utm_relevant_index=5
HEVC-HM16.9 source code study (1), TEncCu::xCompressCU: https://blog.csdn.net/HazelNuto/article/details/86648413?ops_request_misc=%257B%2522request%255Fid%2522%253A%2522166972449316800215026670%2522%252C%2522scm%2522%253A%252220140713.130102334..%2522%257D&request_id=166972449316800215026670&biz_id=0&utm_medium=distribute.pc_search_result.none-task-blog-2~all~sobaiduend~default-2-86648413-null-null.142^v67^wechat,201^v3^control_2,213^v2^t3_esquery_v3&utm_term=TEncCu&spm=1018.2226.3001.4187

Taken as a whole, xCompressCU evaluates every possible CU partitioning of an LCU and every PU (or TU) mode, including the inter, intra and PCM prediction modes of the PUs. All of these decisions are made through rate-distortion optimization, which selects the best partitioning and the best mode.

  TComPic* pcPic = rpcBestCU->getPic(); // first get the current picture object
  DEBUG_STRING_NEW(sDebug)
  const TComPPS &pps=*(rpcTempCU->getSlice()->getPPS());
  const TComSPS &sps=*(rpcTempCU->getSlice()->getSPS());
  
  // These are only used if getFastDeltaQp() is true
  const UInt fastDeltaQPCuMaxSize    = Clip3(sps.getMaxCUHeight()>>sps.getLog2DiffMaxMinCodingBlockSize(), sps.getMaxCUHeight(), 32u);

  // get Original YUV data from picture: copy from the TComPicYuv held by TComPic into a TComYuv
  m_ppcOrigYuv[uiDepth]->copyFromPicYuv( pcPic->getPicYuvOrg(), rpcBestCU->getCtuRsAddr(), rpcBestCU->getZorderIdxInCtu() );

  // variable for Cbf fast mode PU decision
  Bool    doNotBlockPu = true;
  Bool    earlyDetectionSkipMode = false;

  const UInt uiLPelX   = rpcBestCU->getCUPelX();  // x position of the CU's left edge
  const UInt uiRPelX   = uiLPelX + rpcBestCU->getWidth(0)  - 1; // x position of the CU's right edge (at depth 0 the CU is the whole LCU)
  const UInt uiTPelY   = rpcBestCU->getCUPelY(); // y position of the CU's top edge
  const UInt uiBPelY   = uiTPelY + rpcBestCU->getHeight(0) - 1; // y position of the CU's bottom edge (at depth 0 the CU is the whole LCU)
  const UInt uiWidth   = rpcBestCU->getWidth(0); // width of this CU

  Int iBaseQP = xComputeQP( rpcBestCU, uiDepth );  // base QP
  Int iMinQP;
  Int iMaxQP;
  Bool isAddLowestQP = false; 

  // constrain the minimum and maximum QP
  const UInt numberValidComponents = rpcBestCU->getPic()->getNumberValidComponents();

  if( uiDepth <= pps.getMaxCuDQPDepth() )
  {
    Int idQP = m_pcEncCfg->getMaxDeltaQP();
    iMinQP = Clip3( -sps.getQpBDOffset(CHANNEL_TYPE_LUMA), MAX_QP, iBaseQP-idQP );
    iMaxQP = Clip3( -sps.getQpBDOffset(CHANNEL_TYPE_LUMA), MAX_QP, iBaseQP+idQP );
  }
  else
  {
    iMinQP = rpcTempCU->getQP(0);
    iMaxQP = rpcTempCU->getQP(0);
  }
  // when rate control is enabled
  if ( m_pcEncCfg->getUseRateCtrl() )
  {
    iMinQP = m_pcRateCtrl->getRCQP(); // the QP that was set via setPicEstQP after compressGOP computed this picture's QP
    iMaxQP = m_pcRateCtrl->getRCQP();
  }
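The QP range selection in the excerpt above can be summarized in a standalone sketch. The QpInputs and QpRange structs and the qpRange function are hypothetical names used only here; the logic follows the three branches shown in the excerpt, and the value 51 for MAX_QP is the HEVC maximum.

```cpp
#include <algorithm>
#include <cassert>

// Same argument order as HM's Clip3: (lower bound, upper bound, value).
int clip3(int lo, int hi, int v) {
  assert(lo <= hi);
  return std::min(std::max(v, lo), hi);
}

// Hypothetical containers for this sketch, not HM types.
struct QpInputs {
  int  baseQP;         // result of xComputeQP
  int  maxDeltaQP;     // encoder setting read via getMaxDeltaQP()
  int  qpBdOffset;     // luma QP bit-depth offset (0 for 8-bit video)
  bool withinDqpDepth; // uiDepth <= pps.getMaxCuDQPDepth()
  int  inheritedQP;    // rpcTempCU->getQP(0), used below the dQP depth
  bool useRateCtrl;    // getUseRateCtrl()
  int  rcQP;           // QP supplied by rate control
};

struct QpRange { int minQP; int maxQP; };

QpRange qpRange(const QpInputs& in) {
  const int kMaxQP = 51; // MAX_QP
  QpRange r;
  if (in.withinDqpDepth) {
    // Search window of +/- maxDeltaQP around the base QP, clipped to the legal range.
    r.minQP = clip3(-in.qpBdOffset, kMaxQP, in.baseQP - in.maxDeltaQP);
    r.maxQP = clip3(-in.qpBdOffset, kMaxQP, in.baseQP + in.maxDeltaQP);
  } else {
    // Deeper than the dQP depth: reuse the QP already stored in the CU.
    r.minQP = in.inheritedQP;
    r.maxQP = in.inheritedQP;
  }
  if (in.useRateCtrl) {
    // Rate control overrides the window with a single QP.
    r.minQP = in.rcQP;
    r.maxQP = in.rcQP;
  }
  return r;
}
```

With rate control enabled the range collapses to one value, so the later loop over iQP from iMinQP to iMaxQP runs exactly once.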

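The overall flow is:

Step 1: CU size 64x64. Search for the best PU partition to obtain the best prediction mode, then perform the TU partitioning.

Step 2: CU size 32x32. Do the same for the first CU (in z-scan order).

Step 3: CU size 16x16. Do the same for the first CU.

Step 4: CU size 8x8. Process the first, second, third and fourth CU in turn, choosing the PU and TU partitions and the best mode for each. After each CU is finished, its RD cost is added to the running sum.

Step 5: Return to the 16x16 CU and compare its RD cost with the summed RD cost of the four 8x8 CUs recorded in step 4. This decides the best partitioning of this 16x16 CU, together with the PU and TU partitions of the resulting CUs.

Step 6: CU size 16x16, second CU. Repeat steps 4 and 5 to obtain the best partitioning and the PU/TU modes of the second 16x16 CU, and add its best RD cost to the RD cost of the first 16x16 CU from step 5.

Step 7: In the same way, complete the best partitioning and mode selection of the third and fourth 16x16 CUs and accumulate their RD costs. This gives the best RD cost for the split into 16x16 CUs.

Step 8: Return to step 2 and compare the RD cost of the first 32x32 CU with the RD cost of splitting it into four 16x16 CUs. This gives the split information and best mode of the first 32x32 CU.

Step 9: Likewise, complete the best partitioning and mode selection of the second, third and fourth 32x32 CUs. The accumulated RD cost of the four 32x32 CUs is then compared with the RD cost of the 64x64 CU. The result is the final CU partitioning and, for each CU, the best PU partition, PU prediction mode and TU partition.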
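The bottom-up comparison described in the steps above can be sketched as a short recursion. This is a simplified illustration, not HM's implementation: the cost of coding a block without splitting is supplied by a caller-provided function (a hypothetical stand-in for the PU/TU mode search), and the bits spent on split flags are ignored.

```cpp
#include <cassert>
#include <functional>

// Hypothetical callback: RD cost of coding the block at (x, y) with the given
// size as one unsplit CU, after its best PU/TU modes have been chosen.
using CostFn = std::function<double(int x, int y, int size)>;

// Returns the best RD cost for the block: either the unsplit cost or the sum
// of the best costs of its four sub-blocks, whichever is smaller.
double bestCost(int x, int y, int size, int minSize, const CostFn& unsplitCost) {
  assert(size >= minSize);
  const double whole = unsplitCost(x, y, size);
  if (size == minSize) {
    return whole; // smallest CU: no further split is possible
  }
  const int half = size / 2;
  double splitSum = 0.0;
  for (int idx = 0; idx < 4; idx++) {
    // Sub-blocks are visited in z-scan order: top-left, top-right,
    // bottom-left, bottom-right.
    splitSum += bestCost(x + (idx & 1) * half, y + (idx >> 1) * half,
                         half, minSize, unsplitCost);
  }
  return splitSum < whole ? splitSum : whole;
}
```

For a 64x64 LCU with 8x8 as the smallest CU, bestCost(0, 0, 64, 8, f) visits the blocks in the same order as steps 1 to 9: it descends to the 8x8 level first and resolves each comparison on the way back up.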

PU partitions are tried in the following order:

Inter:

xCheckRDCostInter( rpcBestCU, rpcTempCU,SIZE_2Nx2N, bFMD ) 2Nx2N, checked first for early skip detection

xCheckRDCostMerge2Nx2N( rpcBestCU,rpcTempCU, &earlyDetectionSkipMode ); Merge 2Nx2N (skip is a special case of merge)

xCheckRDCostInter( rpcBestCU, rpcTempCU,SIZE_2Nx2N, bFMD ); 2Nx2N

xCheckRDCostInter( rpcBestCU, rpcTempCU,SIZE_NxN, bFMD ); NxN, split into 4 PUs

xCheckRDCostInter( rpcBestCU, rpcTempCU,SIZE_Nx2N, bFMD ); Nx2N, split into 2 PUs

xCheckRDCostInter ( rpcBestCU, rpcTempCU, SIZE_2NxN,bFMD ); 2NxN, split into 2 PUs

The asymmetric partitions follow (there are 4 of them in total; three of the calls are shown here):

xCheckRDCostInter( rpcBestCU, rpcTempCU,SIZE_2NxnU, bFMD );

xCheckRDCostInter( rpcBestCU, rpcTempCU,SIZE_2NxnD, bFMD );

xCheckRDCostInter( rpcBestCU, rpcTempCU,SIZE_nLx2N, bFMD );

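Whenever one of these RD-cost functions is called, rpcBestCU always holds the CU structure with the best PU mode and partition information found so far for the current CU.

Intra:

xCheckRDCostIntra( rpcBestCU, rpcTempCU,SIZE_2Nx2N ); 2Nx2N partition: the PU has the size of the CU.

xCheckRDCostIntra( rpcBestCU, rpcTempCU,SIZE_NxN ); when the CU is the smallest CU, a split into 4 PUs is also tried.

rpcTempCU holds the mode currently being tested. xCheckBestMode checks whether it beats rpcBestCU and, if so, swaps the two.

At the end of each RD-cost function there is a call like this: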

xCheckBestMode(rpcBestCU, rpcTempCU, uiDepth);
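The compare-and-swap idea behind this call can be shown in isolation. This is a minimal sketch under stated assumptions: CandidateCU is a hypothetical stand-in for TComDataCU that carries only a cost and a mode id, and checkBestMode is an illustrative name. The real function also swaps the associated prediction and reconstruction YUV buffers.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for TComDataCU: only what the comparison needs.
struct CandidateCU {
  double totalCost; // RD cost of the candidate
  int    mode;      // illustrative mode id
};

// If the candidate in temp is cheaper than the current best, swap the two
// pointers so that best always refers to the cheapest candidate seen so far.
// Returns true when a swap happened.
bool checkBestMode(CandidateCU*& best, CandidateCU*& temp) {
  assert(best != nullptr && temp != nullptr);
  if (temp->totalCost < best->totalCost) {
    std::swap(best, temp);
    return true;
  }
  return false;
}
```

Swapping pointers instead of copying data means the buffer that used to hold the old best result becomes the working buffer for the next candidate.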

Next the depth is increased by 1: the CU is split into four blocks and xCompressCU is called recursively on each.

 if( bSubBranch && uiDepth < sps.getLog2DiffMaxMinCodingBlockSize() && (!getFastDeltaQp() || uiWidth > fastDeltaQPCuMaxSize || bBoundary))
  {
    // further split
    for (Int iQP=iMinQP; iQP<=iMaxQP; iQP++)
    {
      const Bool bIsLosslessMode = false; // False at this level. Next level down may set it to true.

      rpcTempCU->initEstData( uiDepth, iQP, bIsLosslessMode );
      // depth + 1
      UChar       uhNextDepth         = uiDepth+1;
      // best and temporary CUs of the next depth
      TComDataCU* pcSubBestPartCU     = m_ppcBestCU[uhNextDepth];
      TComDataCU* pcSubTempPartCU     = m_ppcTempCU[uhNextDepth];
      DEBUG_STRING_NEW(sTempDebug)
      // further split: the current CU is divided into 4 sub-CUs
      for ( UInt uiPartUnitIdx = 0; uiPartUnitIdx < 4; uiPartUnitIdx++ )
      {
        pcSubBestPartCU->initSubCU( rpcTempCU, uiPartUnitIdx, uhNextDepth, iQP );           // clear sub partition datas or init.
        pcSubTempPartCU->initSubCU( rpcTempCU, uiPartUnitIdx, uhNextDepth, iQP );           // clear sub partition datas or init.

        if( ( pcSubBestPartCU->getCUPelX() < sps.getPicWidthInLumaSamples() ) && ( pcSubBestPartCU->getCUPelY() < sps.getPicHeightInLumaSamples() ) )
        {
          if ( 0 == uiPartUnitIdx) //initialize RD with previous depth buffer
          {
            m_pppcRDSbacCoder[uhNextDepth][CI_CURR_BEST]->load(m_pppcRDSbacCoder[uiDepth][CI_CURR_BEST]);
          }
          else
          {
            m_pppcRDSbacCoder[uhNextDepth][CI_CURR_BEST]->load(m_pppcRDSbacCoder[uhNextDepth][CI_NEXT_BEST]);
          }
  }
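      // end of the recursive call on the sub-partitions

Note that rpcTempCU is used here to initialize pcSubBestPartCU and pcSubTempPartCU. Up to this point in the function, rpcBestCU still holds the best PU information for the current 64x64 CU.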
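To make the four-way split concrete, the sketch below computes where each sub-CU lands and whether it passes the picture boundary check seen in the loop above. SubCuPos and subCuPosition are hypothetical names for illustration; the position rule (bit 0 of the index selects the column, bit 1 selects the row) is the usual z-scan ordering of the four quadrants.

```cpp
#include <cassert>

// Hypothetical result type for this sketch, not an HM class.
struct SubCuPos {
  unsigned x;             // left edge of the sub-CU in luma samples
  unsigned y;             // top edge of the sub-CU in luma samples
  bool     insidePicture; // mirrors the getCUPelX/getCUPelY boundary test
};

// partIdx runs 0..3 in z-scan order: top-left, top-right, bottom-left, bottom-right.
SubCuPos subCuPosition(unsigned parentX, unsigned parentY, unsigned parentSize,
                       unsigned partIdx, unsigned picWidth, unsigned picHeight) {
  assert(partIdx < 4);
  const unsigned half = parentSize >> 1;
  SubCuPos p;
  p.x = parentX + (partIdx & 1) * half;
  p.y = parentY + (partIdx >> 1) * half;
  // A sub-CU is processed only if its top-left sample lies inside the picture.
  p.insidePicture = (p.x < picWidth) && (p.y < picHeight);
  return p;
}
```

For example, in a 416x240 picture the LCU at (384, 192) extends past the right and bottom edges, so only some of its sub-CUs pass the check and get compressed.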