目录
x265相关:
【x265】x265编码器参数配置
【x265】预测模块的简单分析—帧内预测
【x265】预测模块的简单分析—帧间预测
【x265】码率控制模块的简单分析—块级码控工具(AQ和cuTree)
【x265】码率控制模块的简单分析—帧级码控模式(CQP、CRF和ABR)
1. Inter Prediction Overview
1.1 Coding block structure
Inter prediction is one of the encoder's most effective tools for reducing bitrate: by referencing temporally neighboring pictures it can cut the coded bitrate substantially and thus save network bandwidth. In x265, inter prediction (Inter Prediction, hereafter Inter mode) is implemented and operated at the PU level, so a CU can be divided into several sub-regions that are each predicted separately. Unlike intra prediction (Intra Prediction, hereafter Intra mode), Inter mode can split a CU into non-square PU shapes; there are 8 partition types in total, as shown below.
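These 8 shapes correspond to the PartSize enumeration used throughout the x265 code quoted later in this post. Roughly (the comments here are mine; SIZE_NxN is only allowed at the minimum CU size, and the last four are the asymmetric AMP shapes):

// The 8 inter PU partition types (PartSize in x265).
enum PartSize
{
    SIZE_2Nx2N, // one PU covering the whole CU
    SIZE_2NxN,  // two horizontal halves
    SIZE_Nx2N,  // two vertical halves
    SIZE_NxN,   // four quadrants (only at the smallest CU size)
    SIZE_2NxnU, // top 1/4 + bottom 3/4
    SIZE_2NxnD, // top 3/4 + bottom 1/4
    SIZE_nLx2N, // left 1/4 + right 3/4
    SIZE_nRx2N  // left 3/4 + right 1/4
};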
1.2 Motion estimation
During playback, consecutive frames are usually strongly correlated. A natural idea is therefore to find two very similar blocks in neighboring pictures and describe the positional difference between them with a motion offset: the earlier frame codes the image block, and the later frame only codes this position offset, which saves bits. The process of finding this motion offset is called motion estimation (Motion Estimation, ME). To find two such similar blocks, two questions need to be answered:
- How to measure the degree of difference between the two blocks
- How to find the two blocks efficiently
1.2.1 Motion estimation criterion
In Inter mode, the difference between the reference block (hereafter refBlock) and the current block (hereafter curBlock) is measured mainly with SAD or SATD, to which the bit cost of the corresponding MV is added, i.e. the rate-distortion cost J = D + lambda * R.
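As a minimal illustration of this cost (not x265's implementation: x265 uses its optimized SAD/SATD primitives and a precomputed MV-cost table; the 8x8 block size, the crude bit estimate and the lambda argument below are all assumptions for the sketch):

#include <cstdint>
#include <cstdlib>

// Illustrative 8x8 SAD; x265 uses optimized assembly primitives instead.
static uint32_t sad8x8(const uint8_t* cur, intptr_t curStride,
                       const uint8_t* ref, intptr_t refStride)
{
    uint32_t sad = 0;
    for (int y = 0; y < 8; y++)
        for (int x = 0; x < 8; x++)
            sad += std::abs(cur[y * curStride + x] - ref[y * refStride + x]);
    return sad;
}

// Very rough bit estimate for coding the MV against its predictor; real
// encoders use a precomputed cost table for the MVD.
static uint32_t mvBits(int mvx, int mvy, int predx, int predy)
{
    return (uint32_t)(std::abs(mvx - predx) + std::abs(mvy - predy)) + 1;
}

// J = D + lambda * R for one integer-pel candidate MV.
static uint64_t meCost(const uint8_t* cur, intptr_t curStride,
                       const uint8_t* ref, intptr_t refStride,
                       int mvx, int mvy, int predx, int predy, uint32_t lambda)
{
    const uint8_t* refBlock = ref + mvy * refStride + mvx;
    uint64_t dist = sad8x8(cur, curStride, refBlock, refStride);
    return dist + (uint64_t)lambda * mvBits(mvx, mvy, predx, predy);
}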
1.2.2 Motion search
The motion search (Motion Search, MS) used in x265 consists of several steps:
- Integer-pel search
(1) Diamond search (X265_DIA_SEARCH)
(2) Hexagon search (X265_HEX_SEARCH)
These two search patterns are similar to the ones in x264; see Lei Xiaohua's article "x264源代码简单分析:宏块分析(Analysis)部分-帧间宏块(Inter)". The difference is that with HEX search, x265 runs a few additional iterations of a fast half-HEX search to enlarge the search range, because CUs in x265 are larger.
- Half-pel (1/2) search
- Quarter-pel (1/4) search
PS: the integer-pel search measures distortion with SAD, while the 1/2-pel and 1/4-pel searches use SATD. 1/8-pel search is not used because the gain it would bring is not significant.
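For intuition, a bare-bones integer-pel diamond search is sketched below. It is not x265's implementation (the real search, MotionEstimate::motionEstimate in common/motion.cpp, adds early termination, larger patterns and assembly kernels); the cost callback is assumed to return the SAD of the block predicted at the given MV:

#include <cstdint>
#include <cstdlib>

struct IntMv { int x, y; };

// cost(mv) is assumed to return the SAD of the block predicted at `mv`.
template <typename CostFn>
static IntMv diamondSearch(IntMv start, int searchRange, CostFn cost)
{
    static const int dx[4] = { 0, 0, -1, 1 };
    static const int dy[4] = { -1, 1, 0, 0 };
    IntMv best = start;
    uint32_t bestCost = cost(best);
    bool improved = true;
    while (improved)
    {
        improved = false;
        for (int i = 0; i < 4; i++)          // the 4 points of the small diamond
        {
            IntMv cand = { best.x + dx[i], best.y + dy[i] };
            if (std::abs(cand.x) > searchRange || std::abs(cand.y) > searchRange)
                continue;                    // stay inside the search window
            uint32_t c = cost(cand);
            if (c < bestCost) { bestCost = c; best = cand; improved = true; }
        }
    }
    return best;                             // local SAD minimum on the integer grid
}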
1.3 MV prediction techniques
Inter mode uses two techniques, Merge and AMVP, to help achieve better Inter coding. Merge can be seen as a coding mode: x265 has dedicated macros for it, and merge-related information is written into the bitstream during actual encoding (e.g. m_entropyCoder.codeMergeIndex(cu, 0)); there is no MVD (MV Difference). AMVP, on the other hand, is an MV prediction technique: the encoder only codes the difference between the actual MV and the predicted MV, so an MVD does exist.
1.3.1 Merge mode
Merge mode builds an MV candidate list for the current PU; the list holds 5 candidate MVs. The list is traversed and the best of the 5 candidates is selected as the MV of Merge mode; this merge MV also provides strong guidance in the subsequent inter-prediction flow.
The merge list is built from two parts, spatial candidates and temporal candidates:
- Building the spatial candidate list
The spatial candidates are scanned in the order { A1, B1, B0, A0, B2 }, filling the list from left to right; the spatial part contributes at most 4 candidate MVs.
For PU 2 in the rectangular partitions shown below, the candidates need extra handling. In case (a), PU 2's list must not contain A1's motion information: if PU 2 used the information of A1 (i.e. PU 1), PU 1 and PU 2 would end up with the same MV, which would be no different from a 2Nx2N partition. Likewise, in case (b), PU 2's list must not contain B1's motion information.
- Building the temporal candidate list
The temporal MV candidate reuses the motion information of the PU at the corresponding position (the co-located PU) in a neighboring already-coded picture, but not directly: it is scaled according to the relative picture distances. In the figure, cur_PU is the current PU, col_PU is the co-located PU in the neighboring coded frame, cur_ref is the reference frame of the current frame, and col_ref is the reference frame of the neighboring coded frame.
The temporal candidate MV of the current PU is computed as
curMV = (td / tb) * colMV
where td is the POC distance between the current frame and cur_ref, and tb is the POC distance between the co-located frame and col_ref.
The co-located block for the temporal candidate is the block H at the bottom-right corner; if H is not available, the center block C3 is used instead. The temporal part provides at most 1 candidate MV.
PS: if, after the two steps above, the merge candidate list still has fewer than 5 entries, it is padded with (0, 0).
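A minimal sketch of this scaling (the HEVC/x265 implementation uses a clipped fixed-point distScaleFactor rather than floating point; the POC arguments and the rounding here are only illustrative):

// Sketch of temporal MV scaling: curMV = (td / tb) * colMV, with
// td = POC distance between current picture and its reference, and
// tb = POC distance between the co-located picture and its reference.
struct Mv { int x, y; };

static Mv scaleTemporalMv(Mv colMV, int curPoc, int curRefPoc,
                          int colPoc, int colRefPoc)
{
    int td = curPoc - curRefPoc;   // current picture -> its reference
    int tb = colPoc - colRefPoc;   // co-located picture -> its reference
    if (tb == 0 || td == tb)
        return colMV;              // same distance: use the MV as-is
    double scale = (double)td / (double)tb;
    double sx = scale * colMV.x;
    double sy = scale * colMV.y;
    Mv out;
    out.x = (int)(sx >= 0 ? sx + 0.5 : sx - 0.5); // round half away from zero
    out.y = (int)(sy >= 0 ? sy + 0.5 : sy - 0.5);
    return out;
}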
1.3.2 AMVP
AMVP is similar to merge in that it also exploits the correlation of motion vectors in the spatial and temporal domains.
- Building the spatial candidate list
Using the same neighboring-block labels as merge mode, the AMVP spatial list derives one candidate predictor from the left side and one from above. The left side is checked in the order { A0, A1, scaled A0, scaled A1 }, and the above side in the order { B0, B1, B2, (scaled B0, scaled B1, scaled B2) }; "scaled" here uses the same scaling as the co-located-block computation in merge mode. For the above side, MV scaling is only performed when the two left PUs are both unavailable or both Intra-coded. Also, a neighboring MV can only be used directly when it points to the same reference frame as the current PU; otherwise it has to be scaled first. Note that the AMVP spatial list contains at most 2 candidate MVs (merge mode allows up to 4).
- Building the temporal candidate list
Built in the same way as in merge mode.
PS: if the AMVP list has fewer than 2 candidate MVs after the two steps above, it is padded with (0, 0). In addition, AMVP codes the MV differentially at encode time, i.e. only the MVD is coded.
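In other words, an AMVP-coded PU signals which predictor was chosen plus the difference to the actual MV. A minimal sketch of that decision (the struct and function names are illustrative, not taken from x265):

#include <cstdlib>

struct MvSimple { int x, y; };

struct AmvpSignal
{
    int      mvpIdx;  // which of the (at most 2) AMVP candidates was chosen
    MvSimple mvd;     // actual MV minus the chosen predictor
};

static AmvpSignal buildAmvpSignal(MvSimple actualMv, const MvSimple cand[2])
{
    // Pick the predictor whose MVD is cheaper (here: smaller |dx| + |dy|).
    AmvpSignal best = { 0, { actualMv.x - cand[0].x, actualMv.y - cand[0].y } };
    MvSimple d1 = { actualMv.x - cand[1].x, actualMv.y - cand[1].y };
    if (std::abs(d1.x) + std::abs(d1.y) <
        std::abs(best.mvd.x) + std::abs(best.mvd.y))
    {
        best.mvpIdx = 1;
        best.mvd = d1;
    }
    return best; // decoder side: mv = cand[mvpIdx] + mvd
}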
2. Inter prediction entry function (compressInterCU_rd0_4)
Of x265's inter-prediction entry functions, only compressInterCU_rd0_4() is briefly analyzed here. The 0 and 4 in the name mean that this function is used when rdLevel lies between 0 and 4; since the default configuration has rdLevel = 3, this is the function used for inter prediction by default.
The function is defined in encoder\analysis.cpp, and its main workflow is:
(1) Evaluate the cost of merge and skip modes (checkMerge2Nx2N_rd0_4)
(2) Evaluate the cost of splitting into 4 sub-blocks (recursive call of compressInterCU_rd0_4)
(3) Evaluate the cost of the various partition modes at the current depth and of Intra mode (checkInter_rd0_4, checkIntraInInter)
SplitData Analysis::compressInterCU_rd0_4(const CUData& parentCTU, const CUGeom& cuGeom, int32_t qp)
{
if (parentCTU.m_vbvAffected && calculateQpforCuSize(parentCTU, cuGeom, 1))
return compressInterCU_rd5_6(parentCTU, cuGeom, qp);
uint32_t depth = cuGeom.depth;
uint32_t cuAddr = parentCTU.m_cuAddr;
ModeDepth& md = m_modeDepth[depth];
// searchMethod默认为X265_HEX_SEARCH
if (m_param->searchMethod == X265_SEA)
{
int numPredDir = m_slice->isInterP() ? 1 : 2;
int offset = (int)(m_frame->m_reconPic->m_cuOffsetY[parentCTU.m_cuAddr] + m_frame->m_reconPic->m_buOffsetY[cuGeom.absPartIdx]);
for (int list = 0; list < numPredDir; list++)
for (int i = 0; i < m_frame->m_encData->m_slice->m_numRefIdx[list]; i++)
for (int planes = 0; planes < INTEGRAL_PLANE_NUM; planes++)
m_modeDepth[depth].fencYuv.m_integral[list][i][planes] = m_frame->m_encData->m_slice->m_refFrameList[list][i]->m_encData->m_meIntegral[planes] + offset;
}
PicYuv& reconPic = *m_frame->m_reconPic;
SplitData splitCUData;
// 是否进行hevc的分析(x265似乎对AVC做了兼容)
bool bHEVCBlockAnalysis = (m_param->bAnalysisType == AVC_INFO && cuGeom.numPartitions > 16);
// 是否进行avc分析的refine
bool bRefineAVCAnalysis = (m_param->analysisLoadReuseLevel == 7 && (m_modeFlag[0] || m_modeFlag[1]));
// no-off loading,如果为true,表示不会将CPU当中的任务移动到其他处理器(如GPU等)上面进行
bool bNooffloading = !(m_param->bAnalysisType == AVC_INFO);
if (bHEVCBlockAnalysis || bRefineAVCAnalysis || bNooffloading)
{
md.bestMode = NULL;
bool mightSplit = !(cuGeom.flags & CUGeom::LEAF);
bool mightNotSplit = !(cuGeom.flags & CUGeom::SPLIT_MANDATORY);
uint32_t minDepth = topSkipMinDepth(parentCTU, cuGeom);
bool bDecidedDepth = parentCTU.m_cuDepth[cuGeom.absPartIdx] == depth;
bool skipModes = false; /* Skip any remaining mode analyses at current depth */
bool skipRecursion = false; /* Skip recursion */
bool splitIntra = true;
bool skipRectAmp = false;
bool chooseMerge = false;
bool bCtuInfoCheck = false;
int sameContentRef = 0;
if (m_evaluateInter)
{
if (m_refineLevel == 2)
{
if (parentCTU.m_predMode[cuGeom.absPartIdx] == MODE_SKIP)
skipModes = true;
if (parentCTU.m_partSize[cuGeom.absPartIdx] == SIZE_2Nx2N)
skipRectAmp = true;
}
mightSplit &= false;
minDepth = depth;
}
if ((m_limitTU & X265_TU_LIMIT_NEIGH) && cuGeom.log2CUSize >= 4)
m_maxTUDepth = loadTUDepth(cuGeom, parentCTU);
SplitData splitData[4];
splitData[0].initSplitCUData();
splitData[1].initSplitCUData();
splitData[2].initSplitCUData();
splitData[3].initSplitCUData();
// avoid uninitialize value in below reference
if (m_param->limitModes)
{
md.pred[PRED_2Nx2N].bestME[0][0].mvCost = 0; // L0
md.pred[PRED_2Nx2N].bestME[0][1].mvCost = 0; // L1
md.pred[PRED_2Nx2N].sa8dCost = 0;
}
if (m_param->bCTUInfo && depth <= parentCTU.m_cuDepth[cuGeom.absPartIdx])
{
if (bDecidedDepth && m_additionalCtuInfo[cuGeom.absPartIdx])
sameContentRef = findSameContentRefCount(parentCTU, cuGeom);
if (depth < parentCTU.m_cuDepth[cuGeom.absPartIdx])
{
mightNotSplit &= bDecidedDepth;
bCtuInfoCheck = skipRecursion = false;
skipModes = true;
}
else if (mightNotSplit && bDecidedDepth)
{
if (m_additionalCtuInfo[cuGeom.absPartIdx])
{
bCtuInfoCheck = skipRecursion = true;
md.pred[PRED_MERGE].cu.initSubCU(parentCTU, cuGeom, qp);
md.pred[PRED_SKIP].cu.initSubCU(parentCTU, cuGeom, qp);
checkMerge2Nx2N_rd0_4(md.pred[PRED_SKIP], md.pred[PRED_MERGE], cuGeom);
if (!sameContentRef)
{
if ((m_param->bCTUInfo & 2) && (m_slice->m_pps->bUseDQP && depth <= m_slice->m_pps->maxCuDQPDepth))
{
qp -= int32_t(0.04 * qp);
setLambdaFromQP(parentCTU, qp);
}
if (m_param->bCTUInfo & 4)
skipModes = false;
}
if (sameContentRef || (!sameContentRef && !(m_param->bCTUInfo & 4)))
{
if (m_param->rdLevel)
skipModes = m_param->bEnableEarlySkip && md.bestMode && md.bestMode->cu.isSkipped(0);
if ((m_param->bCTUInfo & 4) && sameContentRef)
skipModes = md.bestMode && true;
}
}
else
{
md.pred[PRED_MERGE].cu.initSubCU(parentCTU, cuGeom, qp);
md.pred[PRED_SKIP].cu.initSubCU(parentCTU, cuGeom, qp);
checkMerge2Nx2N_rd0_4(md.pred[PRED_SKIP], md.pred[PRED_MERGE], cuGeom);
if (m_param->rdLevel)
skipModes = m_param->bEnableEarlySkip && md.bestMode && md.bestMode->cu.isSkipped(0);
}
mightSplit &= !bDecidedDepth;
}
}
if ((m_param->analysisLoadReuseLevel > 1 && m_param->analysisLoadReuseLevel != 10))
{
if (mightNotSplit && depth == m_reuseDepth[cuGeom.absPartIdx])
{
if (m_reuseModes[cuGeom.absPartIdx] == MODE_SKIP)
{
md.pred[PRED_MERGE].cu.initSubCU(parentCTU, cuGeom, qp);
md.pred[PRED_SKIP].cu.initSubCU(parentCTU, cuGeom, qp);
checkMerge2Nx2N_rd0_4(md.pred[PRED_SKIP], md.pred[PRED_MERGE], cuGeom);
skipRecursion = !!m_param->recursionSkipMode && md.bestMode;
if (m_param->rdLevel)
skipModes = m_param->bEnableEarlySkip && md.bestMode;
}
if (m_param->analysisLoadReuseLevel > 4 && m_reusePartSize[cuGeom.absPartIdx] == SIZE_2Nx2N)
{
if (m_reuseModes[cuGeom.absPartIdx] != MODE_INTRA && m_reuseModes[cuGeom.absPartIdx] != 4)
{
skipRectAmp = true && !!md.bestMode;
chooseMerge = !!m_reuseMergeFlag[cuGeom.absPartIdx] && !!md.bestMode;
}
}
}
}
if (m_param->analysisMultiPassRefine && m_param->rc.bStatRead && m_reuseInterDataCTU)
{
if (mightNotSplit && depth == m_reuseDepth[cuGeom.absPartIdx])
{
if (m_reuseModes[cuGeom.absPartIdx] == MODE_SKIP)
{
md.pred[PRED_MERGE].cu.initSubCU(parentCTU, cuGeom, qp);
md.pred[PRED_SKIP].cu.initSubCU(parentCTU, cuGeom, qp);
checkMerge2Nx2N_rd0_4(md.pred[PRED_SKIP], md.pred[PRED_MERGE], cuGeom);
skipRecursion = !!m_param->recursionSkipMode && md.bestMode;
if (m_param->rdLevel)
skipModes = m_param->bEnableEarlySkip && md.bestMode;
}
}
}
/* Step 1. Evaluate Merge/Skip candidates for likely early-outs, if skip mode was not set above */
// 1. 对Merge、Skip候选模式进行评估以确定是否可以提前终止某些计算过程(如果skip模式在前面没有配置)
if ((mightNotSplit && depth >= minDepth && !md.bestMode && !bCtuInfoCheck) || (m_param->bAnalysisType == AVC_INFO && m_param->analysisLoadReuseLevel == 7 && (m_modeFlag[0] || m_modeFlag[1])))
/* TODO: Re-evaluate if analysis load/save still works */
{
/* Compute Merge Cost */
// 初始化merge和skip模式的CU
md.pred[PRED_MERGE].cu.initSubCU(parentCTU, cuGeom, qp);
md.pred[PRED_SKIP].cu.initSubCU(parentCTU, cuGeom, qp);
// 进行merge模式和skip模式的帧间预测
checkMerge2Nx2N_rd0_4(md.pred[PRED_SKIP], md.pred[PRED_MERGE], cuGeom);
if (m_param->rdLevel)
skipModes = (m_param->bEnableEarlySkip || m_refineLevel == 2)
&& md.bestMode && md.bestMode->cu.isSkipped(0); // TODO: sa8d threshold per depth
}
if (md.bestMode && m_param->recursionSkipMode && !bCtuInfoCheck && !(m_param->bAnalysisType == AVC_INFO && m_param->analysisLoadReuseLevel == 7 && (m_modeFlag[0] || m_modeFlag[1])))
{
skipRecursion = md.bestMode->cu.isSkipped(0);
if (mightSplit && !skipRecursion)
{
if (depth >= minDepth && m_param->recursionSkipMode == RDCOST_BASED_RSKIP)
{
if (depth)
skipRecursion = recursionDepthCheck(parentCTU, cuGeom, *md.bestMode);
if (m_bHD && !skipRecursion && m_param->rdLevel == 2 && md.fencYuv.m_size != MAX_CU_SIZE)
skipRecursion = complexityCheckCU(*md.bestMode);
}
else if (cuGeom.log2CUSize >= MAX_LOG2_CU_SIZE - 1 && m_param->recursionSkipMode == EDGE_BASED_RSKIP)
{
skipRecursion = complexityCheckCU(*md.bestMode);
}
}
}
// 检查是否需要跳过递归划分
if (m_param->bAnalysisType == AVC_INFO && md.bestMode && cuGeom.numPartitions <= 16 && m_param->analysisLoadReuseLevel == 7)
skipRecursion = true;
/* Step 2. Evaluate each of the 4 split sub-blocks in series */
// 评估4个子块的Inter模式
if (mightSplit && !skipRecursion)
{
if (bCtuInfoCheck && m_param->bCTUInfo & 2)
qp = int((1 / 0.96) * qp + 0.5);
Mode* splitPred = &md.pred[PRED_SPLIT];
splitPred->initCosts();
CUData* splitCU = &splitPred->cu;
splitCU->initSubCU(parentCTU, cuGeom, qp);
uint32_t nextDepth = depth + 1;
ModeDepth& nd = m_modeDepth[nextDepth];
invalidateContexts(nextDepth);
Entropy* nextContext = &m_rqt[depth].cur;
int nextQP = qp;
splitIntra = false;
for (uint32_t subPartIdx = 0; subPartIdx < 4; subPartIdx++)
{
const CUGeom& childGeom = *(&cuGeom + cuGeom.childOffset + subPartIdx);
if (childGeom.flags & CUGeom::PRESENT)
{
m_modeDepth[0].fencYuv.copyPartToYuv(nd.fencYuv, childGeom.absPartIdx);
m_rqt[nextDepth].cur.load(*nextContext);
if (m_slice->m_pps->bUseDQP && nextDepth <= m_slice->m_pps->maxCuDQPDepth)
nextQP = setLambdaFromQP(parentCTU, calculateQpforCuSize(parentCTU, childGeom));
// 进行4个子块的帧间预测
splitData[subPartIdx] = compressInterCU_rd0_4(parentCTU, childGeom, nextQP);
// Save best CU and pred data for this sub CU
splitIntra |= nd.bestMode->cu.isIntra(0);
splitCU->copyPartFrom(nd.bestMode->cu, childGeom, subPartIdx);
splitPred->addSubCosts(*nd.bestMode);
if (m_param->rdLevel)
nd.bestMode->reconYuv.copyToPartYuv(splitPred->reconYuv, childGeom.numPartitions * subPartIdx);
else
nd.bestMode->predYuv.copyToPartYuv(splitPred->predYuv, childGeom.numPartitions * subPartIdx);
if (m_param->rdLevel > 1)
nextContext = &nd.bestMode->contexts;
}
else
splitCU->setEmptyPart(childGeom, subPartIdx);
}
nextContext->store(splitPred->contexts);
if (mightNotSplit)
addSplitFlagCost(*splitPred, cuGeom.depth);
else if (m_param->rdLevel > 1)
updateModeCost(*splitPred);
else
splitPred->sa8dCost = m_rdCost.calcRdSADCost((uint32_t)splitPred->distortion, splitPred->sa8dBits);
}
/* If analysis mode is simple do not Evaluate other modes */
if (m_param->bAnalysisType == AVC_INFO && m_param->analysisLoadReuseLevel == 7)
{
if (m_slice->m_sliceType == P_SLICE)
{
if (m_checkMergeAndSkipOnly[0])
skipModes = true;
}
else
{
if (m_checkMergeAndSkipOnly[0] && m_checkMergeAndSkipOnly[1])
skipModes = true;
}
}
/* Split CUs
* 0 1
* 2 3 */
uint32_t allSplitRefs = splitData[0].splitRefs | splitData[1].splitRefs | splitData[2].splitRefs | splitData[3].splitRefs;
/* Step 3. Evaluate ME (2Nx2N, rect, amp) and intra modes at current depth */
// 评估当前深度的ME和intra模式
if (mightNotSplit && (depth >= minDepth || (m_param->bCTUInfo && !md.bestMode)))
{
if (m_slice->m_pps->bUseDQP && depth <= m_slice->m_pps->maxCuDQPDepth && m_slice->m_pps->maxCuDQPDepth != 0)
setLambdaFromQP(parentCTU, qp);
//
/*
检查是否是skip模式
(1)如果是skip模式,跳过当前深度的inter prediction
(2)如果不是skip模式,进入下面的inter prediction,会按照顺序去检查各种划分方式
(a)2Nx2N
(b)矩形划分
(i) 2NxN, Nx2N
(ii) 2NxnD, 2NxnU
(iii)nRx2N, nLx2N
*/
if (!skipModes)
{
uint32_t refMasks[2];
refMasks[0] = allSplitRefs;
md.pred[PRED_2Nx2N].cu.initSubCU(parentCTU, cuGeom, qp);
// 2Nx2N的帧间预测
checkInter_rd0_4(md.pred[PRED_2Nx2N], cuGeom, SIZE_2Nx2N, refMasks);
if (m_param->limitReferences & X265_REF_LIMIT_CU)
{
CUData& cu = md.pred[PRED_2Nx2N].cu;
uint32_t refMask = cu.getBestRefIdx(0);
allSplitRefs = splitData[0].splitRefs = splitData[1].splitRefs = splitData[2].splitRefs = splitData[3].splitRefs = refMask;
}
// B帧的2Nx2N帧间预测(没有研究)
if (m_slice->m_sliceType == B_SLICE)
{
md.pred[PRED_BIDIR].cu.initSubCU(parentCTU, cuGeom, qp);
checkBidir2Nx2N(md.pred[PRED_2Nx2N], md.pred[PRED_BIDIR], cuGeom);
}
Mode *bestInter = &md.pred[PRED_2Nx2N];
// 检查是否进行rect模式预测,即矩形划分方式
if (!skipRectAmp)
{
/*
2NxN划分 Nx2N划分
+---+---+ +---+---+
| | | | |
+---+---+ + + +
| | | | |
+---+---+ +---+---+
*/
// 检查是否允许进行矩形分割(非正方形)
if (m_param->bEnableRectInter)
{
// 计算划分成为4个子块的总损失
uint64_t splitCost = splitData[0].sa8dCost + splitData[1].sa8dCost + splitData[2].sa8dCost + splitData[3].sa8dCost;
uint32_t threshold_2NxN, threshold_Nx2N;
/*
(1)如果是P帧,取出前向cost
(2)如果是B帧,求前后向的平均cost
*/
if (m_slice->m_sliceType == P_SLICE)
{
threshold_2NxN = splitData[0].mvCost[0] + splitData[1].mvCost[0];
threshold_Nx2N = splitData[0].mvCost[0] + splitData[2].mvCost[0];
}
else
{
threshold_2NxN = (splitData[0].mvCost[0] + splitData[1].mvCost[0]
+ splitData[0].mvCost[1] + splitData[1].mvCost[1] + 1) >> 1;
threshold_Nx2N = (splitData[0].mvCost[0] + splitData[2].mvCost[0]
+ splitData[0].mvCost[1] + splitData[2].mvCost[1] + 1) >> 1;
}
/*
下面代码的逻辑
(1)如果try_2NxN_first = true,则按照检查顺序的1和2执行
(2)如果try_Nx2N_first = true, 则按照检查顺序的2和3执行
*/
int try_2NxN_first = threshold_2NxN < threshold_Nx2N;
/*
检查顺序1
splitCost:划分成为4个子块的损失
md.pred[PRED_2Nx2N].sa8dCost:按照2Nx2N模式进行预测的损失
threshold_2NxN:划分成2NxN的阈值
如果满足下面的不等式关系,表示使用2NxN有可能损失更小
*/
if (try_2NxN_first && splitCost < md.pred[PRED_2Nx2N].sa8dCost + threshold_2NxN)
{
// 上半部分
refMasks[0] = splitData[0].splitRefs | splitData[1].splitRefs; /* top */
// 下半部分
refMasks[1] = splitData[2].splitRefs | splitData[3].splitRefs; /* bot */
md.pred[PRED_2NxN].cu.initSubCU(parentCTU, cuGeom, qp);
// 检查2NxN帧间预测损失
checkInter_rd0_4(md.pred[PRED_2NxN], cuGeom, SIZE_2NxN, refMasks);
if (md.pred[PRED_2NxN].sa8dCost < bestInter->sa8dCost)
bestInter = &md.pred[PRED_2NxN];
}
/*
检查顺序2
splitCost:划分成为4个子块的损失
md.pred[PRED_2Nx2N].sa8dCost:按照2Nx2N模式进行预测的损失
threshold_Nx2N:划分成Nx2N的阈值
如果满足下面的不等式关系,表示使用Nx2N有可能损失更小
*/
if (splitCost < md.pred[PRED_2Nx2N].sa8dCost + threshold_Nx2N)
{
refMasks[0] = splitData[0].splitRefs | splitData[2].splitRefs; /* left */
refMasks[1] = splitData[1].splitRefs | splitData[3].splitRefs; /* right */
md.pred[PRED_Nx2N].cu.initSubCU(parentCTU, cuGeom, qp);
checkInter_rd0_4(md.pred[PRED_Nx2N], cuGeom, SIZE_Nx2N, refMasks);
if (md.pred[PRED_Nx2N].sa8dCost < bestInter->sa8dCost)
bestInter = &md.pred[PRED_Nx2N];
}
// 检查顺序3
if (!try_2NxN_first && splitCost < md.pred[PRED_2Nx2N].sa8dCost + threshold_2NxN)
{
refMasks[0] = splitData[0].splitRefs | splitData[1].splitRefs; /* top */
refMasks[1] = splitData[2].splitRefs | splitData[3].splitRefs; /* bot */
md.pred[PRED_2NxN].cu.initSubCU(parentCTU, cuGeom, qp);
checkInter_rd0_4(md.pred[PRED_2NxN], cuGeom, SIZE_2NxN, refMasks);
if (md.pred[PRED_2NxN].sa8dCost < bestInter->sa8dCost)
bestInter = &md.pred[PRED_2NxN];
}
}
// 检查(SIZE_2NxnU, SIZE_2NxnD, SIZE_nLx2N, SIZE_nRx2N)
if (m_slice->m_sps->maxAMPDepth > depth)
{
uint64_t splitCost = splitData[0].sa8dCost + splitData[1].sa8dCost + splitData[2].sa8dCost + splitData[3].sa8dCost;
uint32_t threshold_2NxnU, threshold_2NxnD, threshold_nLx2N, threshold_nRx2N;
// 根据帧类型获取threshold
if (m_slice->m_sliceType == P_SLICE)
{
threshold_2NxnU = splitData[0].mvCost[0] + splitData[1].mvCost[0];
threshold_2NxnD = splitData[2].mvCost[0] + splitData[3].mvCost[0];
threshold_nLx2N = splitData[0].mvCost[0] + splitData[2].mvCost[0];
threshold_nRx2N = splitData[1].mvCost[0] + splitData[3].mvCost[0];
}
else
{
threshold_2NxnU = (splitData[0].mvCost[0] + splitData[1].mvCost[0]
+ splitData[0].mvCost[1] + splitData[1].mvCost[1] + 1) >> 1;
threshold_2NxnD = (splitData[2].mvCost[0] + splitData[3].mvCost[0]
+ splitData[2].mvCost[1] + splitData[3].mvCost[1] + 1) >> 1;
threshold_nLx2N = (splitData[0].mvCost[0] + splitData[2].mvCost[0]
+ splitData[0].mvCost[1] + splitData[2].mvCost[1] + 1) >> 1;
threshold_nRx2N = (splitData[1].mvCost[0] + splitData[3].mvCost[0]
+ splitData[1].mvCost[1] + splitData[3].mvCost[1] + 1) >> 1;
}
/*
检查是否进行水平或者垂直的划分
(1)如果partSize = 2Nx2N,则进行水平划分尝试
(2)如果partSize = Nx2N,则进行垂直划分尝试
(3)如果partSize = 2Nx2N,并且四叉树根节点有非零系数,则同时采用水平和垂直划分尝试
*/
bool bHor = false, bVer = false;
if (bestInter->cu.m_partSize[0] == SIZE_2NxN)
bHor = true;
else if (bestInter->cu.m_partSize[0] == SIZE_Nx2N)
bVer = true;
else if (bestInter->cu.m_partSize[0] == SIZE_2Nx2N &&
md.bestMode && md.bestMode->cu.getQtRootCbf(0))
{
bHor = true;
bVer = true;
}
// 尝试水平划分
if (bHor)
{
// 检查2NxnD是否优先,确定检查顺序
/*
2NxnD 2NxnU
+--+--+--+--+ +--+--+--+--+
| | | | 25% top
+ + +--+--+--+--+
| | 75% top | |
+ + + +
| | | | 75% bottom
+--+--+--+--+ + +
| | 25% bottom | |
+--+--+--+--+ +--+--+--+--+
*/
int try_2NxnD_first = threshold_2NxnD < threshold_2NxnU;
if (try_2NxnD_first && splitCost < md.pred[PRED_2Nx2N].sa8dCost + threshold_2NxnD)
{
refMasks[0] = allSplitRefs; /* 75% top */
refMasks[1] = splitData[2].splitRefs | splitData[3].splitRefs; /* 25% bot */
md.pred[PRED_2NxnD].cu.initSubCU(parentCTU, cuGeom, qp);
// 检查2NxnD
checkInter_rd0_4(md.pred[PRED_2NxnD], cuGeom, SIZE_2NxnD, refMasks);
if (md.pred[PRED_2NxnD].sa8dCost < bestInter->sa8dCost)
bestInter = &md.pred[PRED_2NxnD];
}
if (splitCost < md.pred[PRED_2Nx2N].sa8dCost + threshold_2NxnU)
{
refMasks[0] = splitData[0].splitRefs | splitData[1].splitRefs; /* 25% top */
refMasks[1] = allSplitRefs; /* 75% bot */
md.pred[PRED_2NxnU].cu.initSubCU(parentCTU, cuGeom, qp);
// 检查2NxnU
checkInter_rd0_4(md.pred[PRED_2NxnU], cuGeom, SIZE_2NxnU, refMasks);
if (md.pred[PRED_2NxnU].sa8dCost < bestInter->sa8dCost)
bestInter = &md.pred[PRED_2NxnU];
}
if (!try_2NxnD_first && splitCost < md.pred[PRED_2Nx2N].sa8dCost + threshold_2NxnD)
{
refMasks[0] = allSplitRefs; /* 75% top */
refMasks[1] = splitData[2].splitRefs | splitData[3].splitRefs; /* 25% bot */
md.pred[PRED_2NxnD].cu.initSubCU(parentCTU, cuGeom, qp);
checkInter_rd0_4(md.pred[PRED_2NxnD], cuGeom, SIZE_2NxnD, refMasks);
if (md.pred[PRED_2NxnD].sa8dCost < bestInter->sa8dCost)
bestInter = &md.pred[PRED_2NxnD];
}
}
// 尝试垂直划分
if (bVer)
{
/*
nRx2N
75% left 25% left
+--+--+--+--+ +--+--+--+--+
| | | | | |
+ + + + + +
| | | | | |
+ + + + + +
| | | | | |
+ + + + + +
| | | | | |
+--+--+--+--+ +--+--+--+--+
25% right 75% right
*/
int try_nRx2N_first = threshold_nRx2N < threshold_nLx2N;
if (try_nRx2N_first && splitCost < md.pred[PRED_2Nx2N].sa8dCost + threshold_nRx2N)
{
refMasks[0] = allSplitRefs; /* 75% left */
refMasks[1] = splitData[1].splitRefs | splitData[3].splitRefs; /* 25% right */
md.pred[PRED_nRx2N].cu.initSubCU(parentCTU, cuGeom, qp);
checkInter_rd0_4(md.pred[PRED_nRx2N], cuGeom, SIZE_nRx2N, refMasks);
if (md.pred[PRED_nRx2N].sa8dCost < bestInter->sa8dCost)
bestInter = &md.pred[PRED_nRx2N];
}
if (splitCost < md.pred[PRED_2Nx2N].sa8dCost + threshold_nLx2N)
{
refMasks[0] = splitData[0].splitRefs | splitData[2].splitRefs; /* 25% left */
refMasks[1] = allSplitRefs; /* 75% right */
md.pred[PRED_nLx2N].cu.initSubCU(parentCTU, cuGeom, qp);
checkInter_rd0_4(md.pred[PRED_nLx2N], cuGeom, SIZE_nLx2N, refMasks);
if (md.pred[PRED_nLx2N].sa8dCost < bestInter->sa8dCost)
bestInter = &md.pred[PRED_nLx2N];
}
if (!try_nRx2N_first && splitCost < md.pred[PRED_2Nx2N].sa8dCost + threshold_nRx2N)
{
refMasks[0] = allSplitRefs; /* 75% left */
refMasks[1] = splitData[1].splitRefs | splitData[3].splitRefs; /* 25% right */
md.pred[PRED_nRx2N].cu.initSubCU(parentCTU, cuGeom, qp);
checkInter_rd0_4(md.pred[PRED_nRx2N], cuGeom, SIZE_nRx2N, refMasks);
if (md.pred[PRED_nRx2N].sa8dCost < bestInter->sa8dCost)
bestInter = &md.pred[PRED_nRx2N];
}
}
}
}
/*
检查是否需要进行intra模式的尝试,需要满足的条件为
(1)sliceType不为B帧,或者允许B帧中使用intra模式
(2)CUSize不能为64
(3)bCTUInfo第三位不能为1(没研究过,但bCTUInfo默认为0)
(4)bCtuInfoCheck表示是否启用基于CTU内容信息的编码策略调整
*/
bool bTryIntra = (m_slice->m_sliceType != B_SLICE || m_param->bIntraInBFrames) && cuGeom.log2CUSize != MAX_LOG2_CU_SIZE && !((m_param->bCTUInfo & 4) && bCtuInfoCheck);
// rdLevel默认为3
if (m_param->rdLevel >= 3)
{
/* Calculate RD cost of best inter option */
if ((!m_bChromaSa8d && (m_csp != X265_CSP_I400)) || (m_frame->m_fencPic->m_picCsp == X265_CSP_I400 && m_csp != X265_CSP_I400)) /* When m_bChromaSa8d is enabled, chroma MC has already been done */
{
uint32_t numPU = bestInter->cu.getNumPartInter(0);
for (uint32_t puIdx = 0; puIdx < numPU; puIdx++)
{
PredictionUnit pu(bestInter->cu, cuGeom, puIdx);
motionCompensation(bestInter->cu, pu, bestInter->predYuv, false, true);
}
}
// 不使用merge模式
if (!chooseMerge)
{
// 将前面确定的模式进行编码并计算RdCost
encodeResAndCalcRdInterCU(*bestInter, cuGeom);
checkBestMode(*bestInter, depth);
/* If BIDIR is available and within 17/16 of best inter option, choose by RDO */
// 如果BIDIR的损失小于等于最佳模式的17/16倍(应该是经验性参数)
if (m_slice->m_sliceType == B_SLICE && md.pred[PRED_BIDIR].sa8dCost != MAX_INT64 &&
md.pred[PRED_BIDIR].sa8dCost * 16 <= bestInter->sa8dCost * 17)
{
uint32_t numPU = md.pred[PRED_BIDIR].cu.getNumPartInter(0);
if (m_frame->m_fencPic->m_picCsp == X265_CSP_I400 && m_csp != X265_CSP_I400)
for (uint32_t puIdx = 0; puIdx < numPU; puIdx++)
{
PredictionUnit pu(md.pred[PRED_BIDIR].cu, cuGeom, puIdx);
// BIDIR模式的运动补偿
motionCompensation(md.pred[PRED_BIDIR].cu, pu, md.pred[PRED_BIDIR].predYuv, true, true);
}
// 计算BIDIR模式的损失
encodeResAndCalcRdInterCU(md.pred[PRED_BIDIR], cuGeom);
checkBestMode(md.pred[PRED_BIDIR], depth);
}
}
// 尝试intra模式
if ((bTryIntra && md.bestMode->cu.getQtRootCbf(0)) ||
md.bestMode->sa8dCost == MAX_INT64)
{
if (!m_param->limitReferences || splitIntra)
{
ProfileCounter(parentCTU, totalIntraCU[cuGeom.depth]);
md.pred[PRED_INTRA].cu.initSubCU(parentCTU, cuGeom, qp);
checkIntraInInter(md.pred[PRED_INTRA], cuGeom);
encodeIntraInInter(md.pred[PRED_INTRA], cuGeom);
checkBestMode(md.pred[PRED_INTRA], depth);
}
else
{
ProfileCounter(parentCTU, skippedIntraCU[cuGeom.depth]);
}
}
}
else
{
/* SA8D choice between merge/skip, inter, bidir, and intra */
if (!md.bestMode || bestInter->sa8dCost < md.bestMode->sa8dCost)
md.bestMode = bestInter;
if (m_slice->m_sliceType == B_SLICE &&
md.pred[PRED_BIDIR].sa8dCost < md.bestMode->sa8dCost)
md.bestMode = &md.pred[PRED_BIDIR];
if (bTryIntra || md.bestMode->sa8dCost == MAX_INT64)
{
if (!m_param->limitReferences || splitIntra)
{
ProfileCounter(parentCTU, totalIntraCU[cuGeom.depth]);
md.pred[PRED_INTRA].cu.initSubCU(parentCTU, cuGeom, qp);
checkIntraInInter(md.pred[PRED_INTRA], cuGeom);
if (md.pred[PRED_INTRA].sa8dCost < md.bestMode->sa8dCost)
md.bestMode = &md.pred[PRED_INTRA];
}
else
{
ProfileCounter(parentCTU, skippedIntraCU[cuGeom.depth]);
}
}
/* finally code the best mode selected by SA8D costs:
* RD level 2 - fully encode the best mode
* RD level 1 - generate recon pixels
* RD level 0 - generate chroma prediction */
if (md.bestMode->cu.m_mergeFlag[0] && md.bestMode->cu.m_partSize[0] == SIZE_2Nx2N)
{
/* prediction already generated for this CU, and if rd level
* is not 0, it is already fully encoded */
}
else if (md.bestMode->cu.isInter(0))
{
uint32_t numPU = md.bestMode->cu.getNumPartInter(0);
if (m_csp != X265_CSP_I400)
{
for (uint32_t puIdx = 0; puIdx < numPU; puIdx++)
{
PredictionUnit pu(md.bestMode->cu, cuGeom, puIdx);
motionCompensation(md.bestMode->cu, pu, md.bestMode->predYuv, false, true);
}
}
if (m_param->rdLevel == 2)
encodeResAndCalcRdInterCU(*md.bestMode, cuGeom);
else if (m_param->rdLevel == 1)
{
/* generate recon pixels with no rate distortion considerations */
CUData& cu = md.bestMode->cu;
uint32_t tuDepthRange[2];
cu.getInterTUQtDepthRange(tuDepthRange, 0);
m_rqt[cuGeom.depth].tmpResiYuv.subtract(*md.bestMode->fencYuv, md.bestMode->predYuv, cuGeom.log2CUSize, m_frame->m_fencPic->m_picCsp);
residualTransformQuantInter(*md.bestMode, cuGeom, 0, 0, tuDepthRange);
if (cu.getQtRootCbf(0))
md.bestMode->reconYuv.addClip(md.bestMode->predYuv, m_rqt[cuGeom.depth].tmpResiYuv, cu.m_log2CUSize[0], m_frame->m_fencPic->m_picCsp);
else
{
md.bestMode->reconYuv.copyFromYuv(md.bestMode->predYuv);
if (cu.m_mergeFlag[0] && cu.m_partSize[0] == SIZE_2Nx2N)
cu.setPredModeSubParts(MODE_SKIP);
}
}
}
else
{
if (m_param->rdLevel == 2)
encodeIntraInInter(*md.bestMode, cuGeom);
else if (m_param->rdLevel == 1)
{
/* generate recon pixels with no rate distortion considerations */
CUData& cu = md.bestMode->cu;
uint32_t tuDepthRange[2];
cu.getIntraTUQtDepthRange(tuDepthRange, 0);
residualTransformQuantIntra(*md.bestMode, cuGeom, 0, 0, tuDepthRange);
if (m_csp != X265_CSP_I400)
{
getBestIntraModeChroma(*md.bestMode, cuGeom);
residualQTIntraChroma(*md.bestMode, cuGeom, 0, 0);
}
md.bestMode->reconYuv.copyFromPicYuv(reconPic, cu.m_cuAddr, cuGeom.absPartIdx); // TODO:
}
}
}
} // !earlyskip
if (m_bTryLossless)
tryLossless(cuGeom);
if (mightSplit)
addSplitFlagCost(*md.bestMode, cuGeom.depth);
}
if (mightSplit && !skipRecursion)
{
Mode* splitPred = &md.pred[PRED_SPLIT];
if (!md.bestMode)
md.bestMode = splitPred;
else if (m_param->rdLevel > 1)
checkBestMode(*splitPred, cuGeom.depth);
else if (splitPred->sa8dCost < md.bestMode->sa8dCost)
md.bestMode = splitPred;
checkDQPForSplitPred(*md.bestMode, cuGeom);
}
/* determine which motion references the parent CU should search */
splitCUData.initSplitCUData();
if (m_param->limitReferences & X265_REF_LIMIT_DEPTH)
{
if (md.bestMode == &md.pred[PRED_SPLIT])
splitCUData.splitRefs = allSplitRefs;
else
{
/* use best merge/inter mode, in case of intra use 2Nx2N inter references */
CUData& cu = md.bestMode->cu.isIntra(0) ? md.pred[PRED_2Nx2N].cu : md.bestMode->cu;
uint32_t numPU = cu.getNumPartInter(0);
for (uint32_t puIdx = 0, subPartIdx = 0; puIdx < numPU; puIdx++, subPartIdx += cu.getPUOffset(puIdx, 0))
splitCUData.splitRefs |= cu.getBestRefIdx(subPartIdx);
}
}
if (m_param->limitModes)
{
splitCUData.mvCost[0] = md.pred[PRED_2Nx2N].bestME[0][0].mvCost; // L0
splitCUData.mvCost[1] = md.pred[PRED_2Nx2N].bestME[0][1].mvCost; // L1
splitCUData.sa8dCost = md.pred[PRED_2Nx2N].sa8dCost;
}
// 最佳模式是skip模式,更新cu统计信息
if (mightNotSplit && md.bestMode->cu.isSkipped(0))
{
FrameData& curEncData = *m_frame->m_encData;
FrameData::RCStatCU& cuStat = curEncData.m_cuStat[parentCTU.m_cuAddr];
uint64_t temp = cuStat.avgCost[depth] * cuStat.count[depth];
cuStat.count[depth] += 1;
cuStat.avgCost[depth] = (temp + md.bestMode->rdCost) / cuStat.count[depth];
}
/* Copy best data to encData CTU and recon */
// 拷贝最新的data到recon缓冲区中
md.bestMode->cu.copyToPic(depth);
if (m_param->rdLevel)
md.bestMode->reconYuv.copyToPicYuv(reconPic, cuAddr, cuGeom.absPartIdx);
if ((m_limitTU & X265_TU_LIMIT_NEIGH) && cuGeom.log2CUSize >= 4)
{
if (mightNotSplit)
{
CUData* ctu = md.bestMode->cu.m_encData->getPicCTU(parentCTU.m_cuAddr);
int8_t maxTUDepth = -1;
for (uint32_t i = 0; i < cuGeom.numPartitions; i++)
maxTUDepth = X265_MAX(maxTUDepth, md.bestMode->cu.m_tuDepth[i]);
ctu->m_refTuDepth[cuGeom.geomRecurId] = maxTUDepth;
}
}
}
else
{
// ...
}
return splitCUData;
}
2.1 Checking Merge/Skip modes (checkMerge2Nx2N_rd0_4)
The main purpose of this function is to evaluate the costs of merge mode and skip mode. Its main workflow is:
(1) Build the merge candidate list (getInterMergeCandidates)
(2) Traverse the merge candidate list and determine the best merge candidate (using motion compensation, motionCompensation; the cost here is SA8D-based)
(3) Based on the best merge candidate, compute the cost of not coding the residual (encodeResAndCalcRdSkipCU, SSE-based)
(4) Based on the best merge candidate, compute the cost of coding the residual (encodeResAndCalcRdInterCU, SSE-based)
PS: note that the Skip mode discussed here means taking the best Merge candidate and not coding its residual.
/* sets md.bestMode if a valid merge candidate is found, else leaves it NULL */
void Analysis::checkMerge2Nx2N_rd0_4(Mode& skip, Mode& merge, const CUGeom& cuGeom)
{
uint32_t depth = cuGeom.depth;
ModeDepth& md = m_modeDepth[depth];
Yuv *fencYuv = &md.fencYuv;
/* Note that these two Mode instances are named MERGE and SKIP but they may
* hold the reverse when the function returns. We toggle between the two modes */
Mode* tempPred = &merge;
Mode* bestPred = &skip;
X265_CHECK(m_slice->m_sliceType != I_SLICE, "Evaluating merge in I slice\n");
tempPred->initCosts();
tempPred->cu.setPartSizeSubParts(SIZE_2Nx2N);
tempPred->cu.setPredModeSubParts(MODE_INTER);
tempPred->cu.m_mergeFlag[0] = true;
bestPred->initCosts();
bestPred->cu.setPartSizeSubParts(SIZE_2Nx2N);
bestPred->cu.setPredModeSubParts(MODE_INTER);
bestPred->cu.m_mergeFlag[0] = true;
MVField candMvField[MRG_MAX_NUM_CANDS][2]; // double length for mv of both lists,存储MV列表
uint8_t candDir[MRG_MAX_NUM_CANDS]; // 存储前后向
// 1. 获取merge候选列表,MRG_MAX_NUM_CANDS = 5,实际使用时可能为3,与参数配置有关系
uint32_t numMergeCand = tempPred->cu.getInterMergeCandidates(0, 0, candMvField, candDir);
PredictionUnit pu(merge.cu, cuGeom, 0);
bestPred->sa8dCost = MAX_INT64;
int bestSadCand = -1;
int sizeIdx = cuGeom.log2CUSize - 2;
int safeX, maxSafeMv;
if (m_param->bIntraRefresh && m_slice->m_sliceType == P_SLICE)
{
safeX = m_slice->m_refFrameList[0][0]->m_encData->m_pir.pirEndCol * m_param->maxCUSize - 3;
maxSafeMv = (safeX - tempPred->cu.m_cuPelX) * 4;
}
// 2. 检查merge候选模式
for (uint32_t i = 0; i < numMergeCand; ++i)
{
// 是否启用帧级并行处理
if (m_bFrameParallel)
{
// Parallel slices bound check
if (m_param->maxSlices > 1)
{
// NOTE: First row in slice can't negative
if (X265_MIN(candMvField[i][0].mv.y, candMvField[i][1].mv.y) < m_sliceMinY)
continue;
// Last row in slice can't reference beyond bound since it is another slice area
// TODO: we may beyond bound in future since these area have a chance to finish because we use parallel slices. Necessary prepare research on load balance
if (X265_MAX(candMvField[i][0].mv.y, candMvField[i][1].mv.y) > m_sliceMaxY)
continue;
}
if (candMvField[i][0].mv.y >= (m_param->searchRange + 1) * 4 ||
candMvField[i][1].mv.y >= (m_param->searchRange + 1) * 4)
continue;
}
if (m_param->bIntraRefresh && m_slice->m_sliceType == P_SLICE &&
tempPred->cu.m_cuPelX / m_param->maxCUSize < m_frame->m_encData->m_pir.pirEndCol &&
candMvField[i][0].mv.x > maxSafeMv)
// skip merge candidates which reference beyond safe reference area
continue;
// merge候选模式存储在L0中
tempPred->cu.m_mvpIdx[0][0] = (uint8_t)i; // merge candidate ID is stored in L0 MVP idx
X265_CHECK(m_slice->m_sliceType == B_SLICE || !(candDir[i] & 0x10), " invalid merge for P slice\n");
tempPred->cu.m_interDir[0] = candDir[i]; // 候选列表信息
tempPred->cu.m_mv[0][0] = candMvField[i][0].mv; // 前向mv(第二个维度0表示前向,1表示后向)
tempPred->cu.m_mv[1][0] = candMvField[i][1].mv; // 后向mv
tempPred->cu.m_refIdx[0][0] = (int8_t)candMvField[i][0].refIdx; // 前向参考帧索引
tempPred->cu.m_refIdx[1][0] = (int8_t)candMvField[i][1].refIdx; // 后向参考帧索引
// 运动补偿(根据MV来获取预测块)
motionCompensation(tempPred->cu, pu, tempPred->predYuv, true, m_bChromaSa8d && (m_csp != X265_CSP_I400 && m_frame->m_fencPic->m_picCsp != X265_CSP_I400));
tempPred->sa8dBits = getTUBits(i, numMergeCand);
// 根据运动补偿MC获取的预测块,计算sa8d(基于Hadamard变换的SATD)
tempPred->distortion = primitives.cu[sizeIdx].sa8d(fencYuv->m_buf[0], fencYuv->m_size, tempPred->predYuv.m_buf[0], tempPred->predYuv.m_size);
if (m_bChromaSa8d && (m_csp != X265_CSP_I400 && m_frame->m_fencPic->m_picCsp != X265_CSP_I400))
{
tempPred->distortion += primitives.chroma[m_csp].cu[sizeIdx].sa8d(fencYuv->m_buf[1], fencYuv->m_csize, tempPred->predYuv.m_buf[1], tempPred->predYuv.m_csize);
tempPred->distortion += primitives.chroma[m_csp].cu[sizeIdx].sa8d(fencYuv->m_buf[2], fencYuv->m_csize, tempPred->predYuv.m_buf[2], tempPred->predYuv.m_csize);
}
// 计算rdCost
tempPred->sa8dCost = m_rdCost.calcRdSADCost((uint32_t)tempPred->distortion, tempPred->sa8dBits);
// 检查当前模式的rdCost是否是最佳的
if (tempPred->sa8dCost < bestPred->sa8dCost)
{
bestSadCand = i;
std::swap(tempPred, bestPred);
}
}
/* force mode decision to take inter or intra */
if (bestSadCand < 0)
return;
/* calculate the motion compensation for chroma for the best mode selected */
// 检查chroma分量
if ((!m_bChromaSa8d && (m_csp != X265_CSP_I400)) || (m_frame->m_fencPic->m_picCsp == X265_CSP_I400 && m_csp != X265_CSP_I400)) /* Chroma MC was done above */
motionCompensation(bestPred->cu, pu, bestPred->predYuv, false, true);
if (m_param->rdLevel)
{
if (m_param->bLossless)
bestPred->rdCost = MAX_INT64;
else // 3. 基于最佳merge模式,计算直接skip的损失(基于SSE),skip模式不会实际编码残差
encodeResAndCalcRdSkipCU(*bestPred);
/* Encode with residual */
tempPred->cu.m_mvpIdx[0][0] = (uint8_t)bestSadCand;
tempPred->cu.setPUInterDir(candDir[bestSadCand], 0, 0);
tempPred->cu.setPUMv(0, candMvField[bestSadCand][0].mv, 0, 0);
tempPred->cu.setPUMv(1, candMvField[bestSadCand][1].mv, 0, 0);
tempPred->cu.setPURefIdx(0, (int8_t)candMvField[bestSadCand][0].refIdx, 0, 0);
tempPred->cu.setPURefIdx(1, (int8_t)candMvField[bestSadCand][1].refIdx, 0, 0);
tempPred->sa8dCost = bestPred->sa8dCost;
tempPred->sa8dBits = bestPred->sa8dBits;
tempPred->predYuv.copyFromYuv(bestPred->predYuv);
// 4. 将bestSadCand使用SSE再计算一遍,获取基于SSE的损失,会实际编码残差
encodeResAndCalcRdInterCU(*tempPred, cuGeom);
/*
从两者中取出最佳的模式
(1)bestPred指向的是skip模式
(2)tempPred指向的是从merge候选列表中得到bestSadCand模式(基于SSE重新计算之后)
*/
md.bestMode = tempPred->rdCost < bestPred->rdCost ? tempPred : bestPred;
}
else
md.bestMode = bestPred;
/* broadcast sets of MV field data */
// 存储最佳模式
md.bestMode->cu.setPUInterDir(candDir[bestSadCand], 0, 0);
md.bestMode->cu.setPUMv(0, candMvField[bestSadCand][0].mv, 0, 0);
md.bestMode->cu.setPUMv(1, candMvField[bestSadCand][1].mv, 0, 0);
md.bestMode->cu.setPURefIdx(0, (int8_t)candMvField[bestSadCand][0].refIdx, 0, 0);
md.bestMode->cu.setPURefIdx(1, (int8_t)candMvField[bestSadCand][1].refIdx, 0, 0);
checkDQP(*md.bestMode, cuGeom);
}
2.1.1 Building the merge candidate list (getInterMergeCandidates)
/* Construct list of merging candidates, returns count */
uint32_t CUData::getInterMergeCandidates(uint32_t absPartIdx, uint32_t puIdx, MVField(*candMvField)[2], uint8_t* candDir) const
{
uint32_t absPartAddr = m_absIdxInCTU + absPartIdx;
const bool isInterB = m_slice->isInterB();
const uint32_t maxNumMergeCand = m_slice->m_maxNumMergeCand;
for (uint32_t i = 0; i < maxNumMergeCand; ++i)
{
candMvField[i][0].mv = 0;
candMvField[i][1].mv = 0;
candMvField[i][0].refIdx = REF_NOT_VALID;
candMvField[i][1].refIdx = REF_NOT_VALID;
}
/* calculate the location of upper-left corner pixel and size of the current PU */
int xP, yP, nPSW, nPSH;
int cuSize = 1 << m_log2CUSize[0];
int partMode = m_partSize[0];
/*
// Partition table.
总共有3个维度:
(1)第1维表示划分的方式,例如SIZE_2Nx2N,长度为9
(2)第2维表示划分之后的索引号,即第几个块,长度为4
(3)第3维长度为2,其中第一个表示划分的尺寸,第二个表示划分偏移量
举例如下
(1)partTable[0][0][0] = 0x44,表示这是一个4x4的块,partTable[0][0][1] = 0x00,表示不存在偏移量
(2)partTable[3][0][0] = 0x22,表示这是一个划分成为4个子块的情况,并且标识的是第一个2x2的块;partTable[3][1][1] = 0x20,
表示第二个2x2块,水平偏移量为2,垂直偏移量为0
const uint32_t partTable[8][4][2] =
{
// XY
{ { 0x44, 0x00 }, { 0x00, 0x00 }, { 0x00, 0x00 }, { 0x00, 0x00 } }, // SIZE_2Nx2N.
{ { 0x42, 0x00 }, { 0x42, 0x02 }, { 0x00, 0x00 }, { 0x00, 0x00 } }, // SIZE_2NxN.
{ { 0x24, 0x00 }, { 0x24, 0x20 }, { 0x00, 0x00 }, { 0x00, 0x00 } }, // SIZE_Nx2N.
{ { 0x22, 0x00 }, { 0x22, 0x20 }, { 0x22, 0x02 }, { 0x22, 0x22 } }, // SIZE_NxN.
{ { 0x41, 0x00 }, { 0x43, 0x01 }, { 0x00, 0x00 }, { 0x00, 0x00 } }, // SIZE_2NxnU.
{ { 0x43, 0x00 }, { 0x41, 0x03 }, { 0x00, 0x00 }, { 0x00, 0x00 } }, // SIZE_2NxnD.
{ { 0x14, 0x00 }, { 0x34, 0x10 }, { 0x00, 0x00 }, { 0x00, 0x00 } }, // SIZE_nLx2N.
{ { 0x34, 0x00 }, { 0x14, 0x30 }, { 0x00, 0x00 }, { 0x00, 0x00 } } // SIZE_nRx2N.
};
*/
int tmp = partTable[partMode][puIdx][0]; // 尺寸
nPSW = ((tmp >> 4) * cuSize) >> 2; // 宽
nPSH = ((tmp & 0xF) * cuSize) >> 2; // 高
tmp = partTable[partMode][puIdx][1]; // 偏移量
xP = ((tmp >> 4) * cuSize) >> 2; // x偏移量(或者说相对位置)
yP = ((tmp & 0xF) * cuSize) >> 2; // y偏移量
uint32_t count = 0;
// 根据partSize计算left-bottom位置的idx
uint32_t partIdxLT, partIdxRT, partIdxLB = deriveLeftBottomIdx(puIdx);
PartSize curPS = (PartSize)m_partSize[absPartIdx];
/*
Merge候选列表的建立(下图中PU尺寸不固定,只表明相对位置),merge列表最多5个
(1)空域候选列表 = { A1, B1, B0, A0, B2 },按照顺序至多选择4个
+--+ +--+--+
|B2| |B1|B0|
+--+--+--+--+--+--+--+
| |
+ +
| Current |
+ PU +
| |
+--+ +
|A1| |
+--+--+--+--+--+
|A0|
+--+
需要特殊处理的情况,针对下面的PU 2
情况1: 不能存在A1的运动信息 情况2: 不能存在B1的运动信息
+--+ +--+--+ +--+--+--+--+
|B2| |B1|B0| | |
+--+--+--+--+--+--+--+--+--+ +--+ PU 1 +--+
| | | |B2| |B0|
+ + + +--+--+--+--+--+--+
| | | | |
+ PU 1 + PU 2 + +--+ PU 2 +
| | | |A1| |
+ + + +--+--+--+--+--+
| | | |A0|
+--+--+--+--+--+--+--+--+ +--+
|A0|
+--+
(2)时域候选列表 = { H or C3(if H not exist) }
Current PU
+----+----+----+----+
| | |
+ + +
| | |
+----+----+----+----+
| | C3 | |
+ +----+ +
| | |
+----+----+----+----+----+
| H |
+----+
*/
// left
uint32_t leftPartIdx = 0;
const CUData* cuLeft = getPULeft(leftPartIdx, partIdxLB);
// 检查A1是否存在
bool isAvailableA1 = cuLeft &&
/*
isDiffMER()用于检查当前PU和空域需要参考的PU是否位于同一merge区域
(1)相邻块x = xP - 1,相邻块y = yP + nPSH - 1
(2)当前块x = xP,当前块y = yP
我理解这里相邻块x = xP - 1不是表示相邻块左上角的x,而是相邻区域(不然似乎对应不上)
*/
cuLeft->isDiffMER(xP - 1, yP + nPSH - 1, xP, yP) &&
// 检查是否为情况1,如果是情况1,则不存在A1信息
!(puIdx == 1 && (curPS == SIZE_Nx2N || curPS == SIZE_nLx2N || curPS == SIZE_nRx2N)) &&
cuLeft->isInter(leftPartIdx);
// 如果A1块存在,则取出dir和mv
if (isAvailableA1)
{
// get Inter Dir
candDir[count] = cuLeft->m_interDir[leftPartIdx];
// get Mv from Left
cuLeft->getMvField(cuLeft, leftPartIdx, 0, candMvField[count][0]);
if (isInterB)
cuLeft->getMvField(cuLeft, leftPartIdx, 1, candMvField[count][1]);
if (++count == maxNumMergeCand)
return maxNumMergeCand;
}
// 更新partIdxLT和partIdxRT
deriveLeftRightTopIdx(puIdx, partIdxLT, partIdxRT);
// above
uint32_t abovePartIdx = 0;
const CUData* cuAbove = getPUAbove(abovePartIdx, partIdxRT);
// 检查B1是否存在
bool isAvailableB1 = cuAbove &&
/*
检查当前PU和空域需要参考的PU是否位于同一merge区域
(1)相邻块x = xP + nPSW - 1,相邻块y = yP - 1
(2)当前块x = xP,当前块y = yP
*/
cuAbove->isDiffMER(xP + nPSW - 1, yP - 1, xP, yP) &&
!(puIdx == 1 && (curPS == SIZE_2NxN || curPS == SIZE_2NxnU || curPS == SIZE_2NxnD)) &&
cuAbove->isInter(abovePartIdx);
if (isAvailableB1 && (!isAvailableA1 || !cuLeft->hasEqualMotion(leftPartIdx, *cuAbove, abovePartIdx)))
{
// get Inter Dir
candDir[count] = cuAbove->m_interDir[abovePartIdx];
// get Mv from Left
cuAbove->getMvField(cuAbove, abovePartIdx, 0, candMvField[count][0]);
if (isInterB)
cuAbove->getMvField(cuAbove, abovePartIdx, 1, candMvField[count][1]);
if (++count == maxNumMergeCand)
return maxNumMergeCand;
}
// above right
uint32_t aboveRightPartIdx = 0;
const CUData* cuAboveRight = getPUAboveRight(aboveRightPartIdx, partIdxRT);
// 检查B0是否存在
bool isAvailableB0 = cuAboveRight &&
/*
检查当前PU和空域需要参考的PU是否位于同一merge区域
(1)相邻块x = xP + nPSW,相邻块y = yP - 1
(2)当前块x = xP,当前块y = yP
*/
cuAboveRight->isDiffMER(xP + nPSW, yP - 1, xP, yP) &&
cuAboveRight->isInter(aboveRightPartIdx);
if (isAvailableB0 && (!isAvailableB1 || !cuAbove->hasEqualMotion(abovePartIdx, *cuAboveRight, aboveRightPartIdx)))
{
// get Inter Dir
candDir[count] = cuAboveRight->m_interDir[aboveRightPartIdx];
// get Mv from Left
cuAboveRight->getMvField(cuAboveRight, aboveRightPartIdx, 0, candMvField[count][0]);
if (isInterB)
cuAboveRight->getMvField(cuAboveRight, aboveRightPartIdx, 1, candMvField[count][1]);
if (++count == maxNumMergeCand)
return maxNumMergeCand;
}
// left bottom
uint32_t leftBottomPartIdx = 0;
const CUData* cuLeftBottom = this->getPUBelowLeft(leftBottomPartIdx, partIdxLB);
// 检查A0是否存在
bool isAvailableA0 = cuLeftBottom &&
/*
检查当前PU和空域需要参考的PU是否位于同一merge区域
(1)相邻块x = xP - 1,相邻块y = yP + nPSH
(2)当前块x = xP,当前块y = yP
*/
cuLeftBottom->isDiffMER(xP - 1, yP + nPSH, xP, yP) &&
cuLeftBottom->isInter(leftBottomPartIdx);
if (isAvailableA0 && (!isAvailableA1 || !cuLeft->hasEqualMotion(leftPartIdx, *cuLeftBottom, leftBottomPartIdx)))
{
// get Inter Dir
candDir[count] = cuLeftBottom->m_interDir[leftBottomPartIdx];
// get Mv from Left
cuLeftBottom->getMvField(cuLeftBottom, leftBottomPartIdx, 0, candMvField[count][0]);
if (isInterB)
cuLeftBottom->getMvField(cuLeftBottom, leftBottomPartIdx, 1, candMvField[count][1]);
if (++count == maxNumMergeCand)
return maxNumMergeCand;
}
// above left
// 如果前面获取的merge cand小于4个,还会检查左上角的块,即B2
if (count < 4)
{
uint32_t aboveLeftPartIdx = 0;
const CUData* cuAboveLeft = getPUAboveLeft(aboveLeftPartIdx, absPartAddr);
// 检查B2是否可用
bool isAvailableB2 = cuAboveLeft &&
cuAboveLeft->isDiffMER(xP - 1, yP - 1, xP, yP) &&
cuAboveLeft->isInter(aboveLeftPartIdx);
if (isAvailableB2 && (!isAvailableA1 || !cuLeft->hasEqualMotion(leftPartIdx, *cuAboveLeft, aboveLeftPartIdx))
&& (!isAvailableB1 || !cuAbove->hasEqualMotion(abovePartIdx, *cuAboveLeft, aboveLeftPartIdx)))
{
// get Inter Dir
candDir[count] = cuAboveLeft->m_interDir[aboveLeftPartIdx];
// get Mv from Left
cuAboveLeft->getMvField(cuAboveLeft, aboveLeftPartIdx, 0, candMvField[count][0]);
if (isInterB)
cuAboveLeft->getMvField(cuAboveLeft, aboveLeftPartIdx, 1, candMvField[count][1]);
if (++count == maxNumMergeCand)
return maxNumMergeCand;
}
}
/*
检查TemporalMVP是否可用,如果可用则去获取时域上的参考列表
*/
if (m_slice->m_sps->bTemporalMVPEnabled)
{
// 获取右下角pu idx
uint32_t partIdxRB = deriveRightBottomIdx(puIdx);
MV colmv;
int ctuIdx = -1;
// image boundary check
if (m_encData->getPicCTU(m_cuAddr)->m_cuPelX + g_zscanToPelX[partIdxRB] + UNIT_SIZE < m_slice->m_sps->picWidthInLumaSamples &&
m_encData->getPicCTU(m_cuAddr)->m_cuPelY + g_zscanToPelY[partIdxRB] + UNIT_SIZE < m_slice->m_sps->picHeightInLumaSamples)
{
uint32_t absPartIdxRB = g_zscanToRaster[partIdxRB];
uint32_t numUnits = s_numPartInCUSize;
// 检查absPartIdxRB是否是最后一列或者最后一行
bool bNotLastCol = lessThanCol(absPartIdxRB, numUnits - 1); // is not at the last column of CTU
bool bNotLastRow = lessThanRow(absPartIdxRB, numUnits - 1); // is not at the last row of CTU
// 确定时域候选列表同位PU的位置
if (bNotLastCol && bNotLastRow)
{
absPartAddr = g_rasterToZscan[absPartIdxRB + RASTER_SIZE + 1];
ctuIdx = m_cuAddr;
}
else if (bNotLastCol)
absPartAddr = g_rasterToZscan[(absPartIdxRB + 1) & (numUnits - 1)];
else if (bNotLastRow)
{
absPartAddr = g_rasterToZscan[absPartIdxRB + RASTER_SIZE - numUnits + 1];
ctuIdx = m_cuAddr + 1;
}
else // is the right bottom corner of CTU
absPartAddr = 0;
}
// B帧具有两个时域候选模式,P帧只有一个
int maxList = isInterB ? 2 : 1;
int dir = 0, refIdx = 0;
for (int list = 0; list < maxList; list++)
{
// 获取colocated-mv
bool bExistMV = ctuIdx >= 0 && getColMVP(colmv, refIdx, list, ctuIdx, absPartAddr);
if (!bExistMV)
{
// 如果右下角的PU没有可用MV,则从C3位置获取mv,作为可用的mv
uint32_t partIdxCenter = deriveCenterIdx(puIdx);
bExistMV = getColMVP(colmv, refIdx, list, m_cuAddr, partIdxCenter);
}
// 如果找到可用MV,则加入到队列中
if (bExistMV)
{
dir |= (1 << list);
candMvField[count][list].mv = colmv;
candMvField[count][list].refIdx = refIdx;
}
}
if (dir != 0)
{
candDir[count] = (uint8_t)dir;
if (++count == maxNumMergeCand)
return maxNumMergeCand;
}
}
// B帧处理组合列表(没研究过)
if (isInterB)
{
const uint32_t cutoff = count * (count - 1);
uint32_t priorityList0 = 0xEDC984; // { 0, 1, 0, 2, 1, 2, 0, 3, 1, 3, 2, 3 }
uint32_t priorityList1 = 0xB73621; // { 1, 0, 2, 0, 2, 1, 3, 0, 3, 1, 3, 2 }
for (uint32_t idx = 0; idx < cutoff; idx++, priorityList0 >>= 2, priorityList1 >>= 2)
{
int i = priorityList0 & 3;
int j = priorityList1 & 3;
if ((candDir[i] & 0x1) && (candDir[j] & 0x2))
{
// get Mv from cand[i] and cand[j]
int refIdxL0 = candMvField[i][0].refIdx;
int refIdxL1 = candMvField[j][1].refIdx;
int refPOCL0 = m_slice->m_refPOCList[0][refIdxL0];
int refPOCL1 = m_slice->m_refPOCList[1][refIdxL1];
if (!(refPOCL0 == refPOCL1 && candMvField[i][0].mv == candMvField[j][1].mv))
{
candMvField[count][0].mv = candMvField[i][0].mv;
candMvField[count][0].refIdx = refIdxL0;
candMvField[count][1].mv = candMvField[j][1].mv;
candMvField[count][1].refIdx = refIdxL1;
candDir[count] = 3;
if (++count == maxNumMergeCand)
return maxNumMergeCand;
}
}
}
}
int numRefIdx = (isInterB) ? X265_MIN(m_slice->m_numRefIdx[0], m_slice->m_numRefIdx[1]) : m_slice->m_numRefIdx[0];
int r = 0;
int refcnt = 0;
// 如果当前MV候选列表长度不足5个,需要填充(0,0)
while (count < maxNumMergeCand)
{
candDir[count] = 1;
candMvField[count][0].mv.word = 0;
candMvField[count][0].refIdx = r;
if (isInterB)
{
candDir[count] = 3;
candMvField[count][1].mv.word = 0;
candMvField[count][1].refIdx = r;
}
count++;
if (refcnt == numRefIdx - 1)
r = 0;
else
{
++r;
++refcnt;
}
}
return count;
}
2.1.2 Motion compensation (motionCompensation)
Motion compensation uses the MV obtained above to fetch the prediction block from the reference frame. In x265 the luma inter motion compensation mainly goes through predInterLumaPixel().
void Predict::motionCompensation(const CUData& cu, const PredictionUnit& pu, Yuv& predYuv, bool bLuma, bool bChroma)
{
int refIdx0 = cu.m_refIdx[0][pu.puAbsPartIdx];
int refIdx1 = cu.m_refIdx[1][pu.puAbsPartIdx];
// 是否是P帧
if (cu.m_slice->isInterP())
{
/* P Slice */
WeightValues wv0[3];
X265_CHECK(refIdx0 >= 0, "invalid P refidx\n");
X265_CHECK(refIdx0 < cu.m_slice->m_numRefIdx[0], "P refidx out of range\n");
const WeightParam *wp0 = cu.m_slice->m_weightPredTable[0][refIdx0]; // 加权预测相关,没有研究过
MV mv0 = cu.m_mv[0][pu.puAbsPartIdx];
cu.clipMv(mv0);
if (cu.m_slice->m_pps->bUseWeightPred && wp0->wtPresent)
{
for (int plane = 0; plane < (bChroma ? 3 : 1); plane++)
{
wv0[plane].w = wp0[plane].inputWeight;
wv0[plane].offset = wp0[plane].inputOffset * (1 << (X265_DEPTH - 8));
wv0[plane].shift = wp0[plane].log2WeightDenom;
wv0[plane].round = wp0[plane].log2WeightDenom >= 1 ? 1 << (wp0[plane].log2WeightDenom - 1) : 0;
}
ShortYuv& shortYuv = m_predShortYuv[0];
if (bLuma)
predInterLumaShort(pu, shortYuv, *cu.m_slice->m_refReconPicList[0][refIdx0], mv0);
if (bChroma)
predInterChromaShort(pu, shortYuv, *cu.m_slice->m_refReconPicList[0][refIdx0], mv0);
addWeightUni(pu, predYuv, shortYuv, wv0, bLuma, bChroma);
}
else
{
// 亮度模式运动补偿
if (bLuma)
predInterLumaPixel(pu, predYuv, *cu.m_slice->m_refReconPicList[0][refIdx0], mv0);
// 色度模式运动补偿
if (bChroma)
predInterChromaPixel(pu, predYuv, *cu.m_slice->m_refReconPicList[0][refIdx0], mv0);
}
}
else // B帧(没有研究)
{
// ...
}
}
2.1.2.1 Fetching the prediction block (predInterLumaPixel)
Fetch the corresponding reference block from the reference frame.
void Predict::predInterLumaPixel(const PredictionUnit& pu, Yuv& dstYuv, const PicYuv& refPic, const MV& mv) const
{
pixel* dst = dstYuv.getLumaAddr(pu.puAbsPartIdx);
intptr_t dstStride = dstYuv.m_size;
intptr_t srcStride = refPic.m_stride;
intptr_t srcOffset = (mv.x >> 2) + (mv.y >> 2) * srcStride;
int partEnum = partitionFromSizes(pu.width, pu.height);
const pixel* src = refPic.getLumaAddr(pu.ctuAddr, pu.cuAbsPartIdx + pu.puAbsPartIdx) + srcOffset;
int xFrac = mv.x & 3; // 水平方向偏移量
int yFrac = mv.y & 3; // 垂直方向偏移量
/*
下面根据mv的值确定偏移量
(1)如果x和y的偏移量都为0,直接copy,使用copy_pp()
如果有偏移量,还会进行像素插值,用于后续的亚像素搜索,下面的8tap表示8抽头
(2)如果有x方向的偏移量,使用luma_hpp()进行水平方向的亚像素插值
(3)如果有y方向的偏移量,使用luma_vpp()进行垂直方向的亚像素插值
(4)如果x和y方向的偏移量都不为0,使用luma_hvpp()进行两个方向的亚像素插值
*/
if (!(yFrac | xFrac))
/*
调试过程中发现会使用到的copy函数(非正方形也有对应的处理函数,例如blockcopy_pp_32x16_avx)
p.pu[LUMA_64x64].copy_pp = PFX(blockcopy_pp_64x64_avx);
p.pu[LUMA_32x32].copy_pp = PFX(blockcopy_pp_32x32_avx);
p.pu[LUMA_16x16].copy_pp = x265_blockcopy_pp_16x16_sse2;
p.pu[LUMA_8x8].copy_pp = x265_blockcopy_pp_8x8_sse2;
*/
primitives.pu[partEnum].copy_pp(dst, dstStride, src, srcStride);
else if (!yFrac)
/*
调试过程中发现会使用到的hpp函数
p.pu[LUMA_8x8].luma_hpp = PFX(interp_8tap_horiz_pp_8x8_avx2);
p.pu[LUMA_16x16].luma_hpp = PFX(interp_8tap_horiz_pp_16x16_avx2);
p.pu[LUMA_32x32].luma_hpp = PFX(interp_8tap_horiz_pp_32x32_avx2);
*/
primitives.pu[partEnum].luma_hpp(src, srcStride, dst, dstStride, xFrac);
else if (!xFrac)
/*
调试过程中发现会使用到的vpp函数
p.pu[LUMA_8x8].luma_vpp = PFX(interp_8tap_vert_pp_8x8_avx2);
p.pu[LUMA_16x16].luma_vpp = PFX(interp_8tap_vert_pp_16x16_avx2);
p.pu[LUMA_32x32].luma_vpp = PFX(interp_8tap_vert_pp_32x32_avx2);
*/
primitives.pu[partEnum].luma_vpp(src, srcStride, dst, dstStride, yFrac);
else
/*
调试过程中发现可以使用的vpp函数
ALL_LUMA_PU_T(luma_hvpp, interp_8tap_hv_pp_cpu);
interp_8tap_hv_pp_cpu<size>是一个模板函数,模板变量为size
当size = 1时,表示对8x8块进行处理
当size = 2时,表示对16x16块进行处理
当size = 3时,表示对32x32块进行处理
*/
primitives.pu[partEnum].luma_hvpp(src, srcStride, dst, dstStride, xFrac, yFrac);
}
2.1.3 Computing the cost of not coding the residual (encodeResAndCalcRdSkipCU)
The best merge candidate has already been selected above based on SA8D; this function now computes the cost of skipping the residual entirely, i.e. coding the CU as skip, to check whether skip can be used directly. Note that the distortion here is computed with SSE rather than SAD/SATD.
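For reference, the SSE metric used from here on differs from the SAD/SATD of the search stages only in how the per-pixel error is accumulated; a reference-style sketch (not the optimized x265 primitive sse_pp):

#include <cstdint>

// Minimal sum-of-squared-errors distortion for an NxN block.
static uint64_t sseBlock(const uint8_t* fenc, intptr_t fencStride,
                         const uint8_t* recon, intptr_t reconStride, int size)
{
    uint64_t sse = 0;
    for (int y = 0; y < size; y++)
        for (int x = 0; x < size; x++)
        {
            int diff = fenc[y * fencStride + x] - recon[y * reconStride + x];
            sse += (uint64_t)(diff * diff);
        }
    return sse;
}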
/* Note: this function overwrites the RD cost variables of interMode, but leaves the sa8d cost unharmed */
// 该函数会覆盖interMode中的RDCost,但不会改动sa8d开销
void Search::encodeResAndCalcRdSkipCU(Mode& interMode)
{
CUData& cu = interMode.cu;
Yuv* reconYuv = &interMode.reconYuv;
const Yuv* fencYuv = interMode.fencYuv;
Yuv* predYuv = &interMode.predYuv;
X265_CHECK(!cu.isIntra(0), "intra CU not expected\n");
uint32_t depth = cu.m_cuDepth[0];
// No residual coding : SKIP mode
// skip模式不去编码残差
cu.setPredModeSubParts(MODE_SKIP);
cu.clearCbf();
cu.setTUDepthSubParts(0, 0, depth);
reconYuv->copyFromYuv(interMode.predYuv);
// 计算基于SSE的Rdcost
// Luma
int part = partitionFromLog2Size(cu.m_log2CUSize[0]);
// 计算sse损失,需要注意的是计算的双方是orig block和recon block
interMode.lumaDistortion = primitives.cu[part].sse_pp(fencYuv->m_buf[0], fencYuv->m_size, reconYuv->m_buf[0], reconYuv->m_size);
interMode.distortion = interMode.lumaDistortion;
// Chroma
if (m_csp != X265_CSP_I400 && m_frame->m_fencPic->m_picCsp != X265_CSP_I400)
{
interMode.chromaDistortion = m_rdCost.scaleChromaDist(1, primitives.chroma[m_csp].cu[part].sse_pp(fencYuv->m_buf[1], fencYuv->m_csize, reconYuv->m_buf[1], reconYuv->m_csize));
interMode.chromaDistortion += m_rdCost.scaleChromaDist(2, primitives.chroma[m_csp].cu[part].sse_pp(fencYuv->m_buf[2], fencYuv->m_csize, reconYuv->m_buf[2], reconYuv->m_csize));
interMode.distortion += interMode.chromaDistortion;
}
cu.m_distortion[0] = interMode.distortion;
m_entropyCoder.load(m_rqt[depth].cur); // 将当前CU的信息输入到熵编码器中,为后续的编码做准备
m_entropyCoder.resetBits(); // 重置比特缓冲区
if (m_slice->m_pps->bTransquantBypassEnabled)
m_entropyCoder.codeCUTransquantBypassFlag(cu.m_tqBypass[0]);
m_entropyCoder.codeSkipFlag(cu, 0); // 编码skip flag
int skipFlagBits = m_entropyCoder.getNumberOfWrittenBits();
m_entropyCoder.codeMergeIndex(cu, 0); // 编码merge idx
interMode.mvBits = m_entropyCoder.getNumberOfWrittenBits() - skipFlagBits;
interMode.coeffBits = 0;
interMode.totalBits = interMode.mvBits + skipFlagBits;
if (m_rdCost.m_psyRd)
interMode.psyEnergy = m_rdCost.psyCost(part, fencYuv->m_buf[0], fencYuv->m_size, reconYuv->m_buf[0], reconYuv->m_size);
else if(m_rdCost.m_ssimRd)
interMode.ssimEnergy = m_quant.ssimDistortion(cu, fencYuv->m_buf[0], fencYuv->m_size, reconYuv->m_buf[0], reconYuv->m_size, cu.m_log2CUSize[0], TEXT_LUMA, 0);
interMode.resEnergy = primitives.cu[part].sse_pp(fencYuv->m_buf[0], fencYuv->m_size, predYuv->m_buf[0], predYuv->m_size);
// 更新该模式的损失
updateModeCost(interMode);
// 存储已编码信息
m_entropyCoder.store(interMode.contexts);
}
2.1.4 Computing the cost of coding the residual (encodeResAndCalcRdInterCU)
Continuing with the best merge candidate above, this function computes the SSE-based distortion, using estimateResidualQT() to estimate the cost of coding the residual. It also considers the cbf = 0 case, i.e. neither coding nor transmitting the residual, and evaluates whether that option is worth using.
/* encode residual and calculate rate-distortion for a CU block.
* Note: this function overwrites the RD cost variables of interMode, but leaves the sa8d cost unharmed */
void Search::encodeResAndCalcRdInterCU(Mode& interMode, const CUGeom& cuGeom)
{
ProfileCUScope(interMode.cu, interRDOElapsedTime[cuGeom.depth], countInterRDO[cuGeom.depth]);
CUData& cu = interMode.cu;
Yuv* reconYuv = &interMode.reconYuv;
Yuv* predYuv = &interMode.predYuv;
uint32_t depth = cuGeom.depth;
ShortYuv* resiYuv = &m_rqt[depth].tmpResiYuv;
const Yuv* fencYuv = interMode.fencYuv;
X265_CHECK(!cu.isIntra(0), "intra CU not expected\n");
uint32_t log2CUSize = cuGeom.log2CUSize;
int sizeIdx = log2CUSize - 2;
// 将预测块pred和编码块enc做差值,获得残差
resiYuv->subtract(*fencYuv, *predYuv, log2CUSize, m_frame->m_fencPic->m_picCsp);
uint32_t tuDepthRange[2];
cu.getInterTUQtDepthRange(tuDepthRange, 0);
m_entropyCoder.load(m_rqt[depth].cur);
if ((m_limitTU & X265_TU_LIMIT_DFS) && !(m_limitTU & X265_TU_LIMIT_NEIGH))
m_maxTUDepth = -1;
else if (m_limitTU & X265_TU_LIMIT_BFS)
memset(&m_cacheTU, 0, sizeof(TUInfoCache));
Cost costs;
if (m_limitTU & X265_TU_LIMIT_NEIGH)
{
/* Save and reload maxTUDepth to avoid changing of maxTUDepth between modes */
int32_t tempDepth = m_maxTUDepth;
if (m_maxTUDepth != -1)
{
uint32_t splitFlag = interMode.cu.m_partSize[0] != SIZE_2Nx2N;
uint32_t minSize = tuDepthRange[0];
uint32_t maxSize = tuDepthRange[1];
maxSize = X265_MIN(maxSize, cuGeom.log2CUSize - splitFlag);
m_maxTUDepth = x265_clip3(cuGeom.log2CUSize - maxSize, cuGeom.log2CUSize - minSize, (uint32_t)m_maxTUDepth);
}
estimateResidualQT(interMode, cuGeom, 0, 0, *resiYuv, costs, tuDepthRange);
m_maxTUDepth = tempDepth;
}
else // 估计编码残差,并计算对应的rdcost
estimateResidualQT(interMode, cuGeom, 0, 0, *resiYuv, costs, tuDepthRange);
/*
检查是否使用bypass(旁路)模式进行编码
(1)对于那些概率接近均匀分布的符号,使用bypass编码可以减少编码开销
(2)这些符号的概率大致相同,不适合使用普通的上下文自适应二进制算术编码
*/
uint32_t tqBypass = cu.m_tqBypass[0];
if (!tqBypass)
{
// 计算Cbf为0情况下的损失,随后与当前模式的costs进行对比,Cbf为0表示不编码残差,也不传输残差
sse_t cbf0Dist = primitives.cu[sizeIdx].sse_pp(fencYuv->m_buf[0], fencYuv->m_size, predYuv->m_buf[0], predYuv->m_size);
if (m_csp != X265_CSP_I400 && m_frame->m_fencPic->m_picCsp != X265_CSP_I400)
{
cbf0Dist += m_rdCost.scaleChromaDist(1, primitives.chroma[m_csp].cu[sizeIdx].sse_pp(fencYuv->m_buf[1], predYuv->m_csize, predYuv->m_buf[1], predYuv->m_csize));
cbf0Dist += m_rdCost.scaleChromaDist(2, primitives.chroma[m_csp].cu[sizeIdx].sse_pp(fencYuv->m_buf[2], predYuv->m_csize, predYuv->m_buf[2], predYuv->m_csize));
}
/* Consider the RD cost of not signaling any residual */
m_entropyCoder.load(m_rqt[depth].cur);
m_entropyCoder.resetBits();
m_entropyCoder.codeQtRootCbfZero();
uint32_t cbf0Bits = m_entropyCoder.getNumberOfWrittenBits();
uint32_t cbf0Energy; uint64_t cbf0Cost;
if (m_rdCost.m_psyRd)
{
cbf0Energy = m_rdCost.psyCost(log2CUSize - 2, fencYuv->m_buf[0], fencYuv->m_size, predYuv->m_buf[0], predYuv->m_size);
cbf0Cost = m_rdCost.calcPsyRdCost(cbf0Dist, cbf0Bits, cbf0Energy);
}
else if(m_rdCost.m_ssimRd)
{
cbf0Energy = m_quant.ssimDistortion(cu, fencYuv->m_buf[0], fencYuv->m_size, predYuv->m_buf[0], predYuv->m_size, log2CUSize, TEXT_LUMA, 0);
cbf0Cost = m_rdCost.calcSsimRdCost(cbf0Dist, cbf0Bits, cbf0Energy);
}
else
cbf0Cost = m_rdCost.calcRdCost(cbf0Dist, cbf0Bits);
// 对比cbf为0的cost和当前模式的cost
if (cbf0Cost < costs.rdcost) //
{
cu.clearCbf();
cu.setTUDepthSubParts(0, 0, depth);
}
}
if (cu.getQtRootCbf(0))
saveResidualQTData(cu, *resiYuv, 0, 0);
/* calculate signal bits for inter/merge/skip coded CU */
m_entropyCoder.load(m_rqt[depth].cur);
m_entropyCoder.resetBits();
if (m_slice->m_pps->bTransquantBypassEnabled)
m_entropyCoder.codeCUTransquantBypassFlag(tqBypass);
uint32_t coeffBits, bits, mvBits;
// 启用merge && size = 2Nx2N && 根节点Cbf为0
if (cu.m_mergeFlag[0] && cu.m_partSize[0] == SIZE_2Nx2N && !cu.getQtRootCbf(0))
{
// 根节点的Cbf为0,说明子块不再需要继续预测,直接skip
cu.setPredModeSubParts(MODE_SKIP);
/* Merge/Skip */
coeffBits = mvBits = 0;
m_entropyCoder.codeSkipFlag(cu, 0); // 编码skip Flag
int skipFlagBits = m_entropyCoder.getNumberOfWrittenBits();
m_entropyCoder.codeMergeIndex(cu, 0); // 编码merge idx
mvBits = m_entropyCoder.getNumberOfWrittenBits() - skipFlagBits;
bits = mvBits + skipFlagBits;
}
else
{
m_entropyCoder.codeSkipFlag(cu, 0);
int skipFlagBits = m_entropyCoder.getNumberOfWrittenBits();
m_entropyCoder.codePredMode(cu.m_predMode[0]);
m_entropyCoder.codePartSize(cu, 0, cuGeom.depth);
m_entropyCoder.codePredInfo(cu, 0);
mvBits = m_entropyCoder.getNumberOfWrittenBits() - skipFlagBits;
bool bCodeDQP = m_slice->m_pps->bUseDQP;
m_entropyCoder.codeCoeff(cu, 0, bCodeDQP, tuDepthRange);
bits = m_entropyCoder.getNumberOfWrittenBits();
coeffBits = bits - mvBits - skipFlagBits;
}
m_entropyCoder.store(interMode.contexts);
if (cu.getQtRootCbf(0))
reconYuv->addClip(*predYuv, *resiYuv, log2CUSize, m_frame->m_fencPic->m_picCsp);
else
reconYuv->copyFromYuv(*predYuv);
// update with clipped distortion and cost (qp estimation loop uses unclipped values)
// 计算最佳的SSE
sse_t bestLumaDist = primitives.cu[sizeIdx].sse_pp(fencYuv->m_buf[0], fencYuv->m_size, reconYuv->m_buf[0], reconYuv->m_size);
interMode.distortion = bestLumaDist;
// 计算chroma分量
if (m_csp != X265_CSP_I400 && m_frame->m_fencPic->m_picCsp != X265_CSP_I400)
{
sse_t bestChromaDist = m_rdCost.scaleChromaDist(1, primitives.chroma[m_csp].cu[sizeIdx].sse_pp(fencYuv->m_buf[1], fencYuv->m_csize, reconYuv->m_buf[1], reconYuv->m_csize));
bestChromaDist += m_rdCost.scaleChromaDist(2, primitives.chroma[m_csp].cu[sizeIdx].sse_pp(fencYuv->m_buf[2], fencYuv->m_csize, reconYuv->m_buf[2], reconYuv->m_csize));
interMode.chromaDistortion = bestChromaDist;
interMode.distortion += bestChromaDist;
}
if (m_rdCost.m_psyRd) // compute the psycho-visual energy for the psy-RD cost
interMode.psyEnergy = m_rdCost.psyCost(sizeIdx, fencYuv->m_buf[0], fencYuv->m_size, reconYuv->m_buf[0], reconYuv->m_size);
else if(m_rdCost.m_ssimRd)
interMode.ssimEnergy = m_quant.ssimDistortion(cu, fencYuv->m_buf[0], fencYuv->m_size, reconYuv->m_buf[0], reconYuv->m_size, cu.m_log2CUSize[0], TEXT_LUMA, 0);
interMode.resEnergy = primitives.cu[sizeIdx].sse_pp(fencYuv->m_buf[0], fencYuv->m_size, predYuv->m_buf[0], predYuv->m_size);
interMode.totalBits = bits;
interMode.lumaDistortion = bestLumaDist;
interMode.coeffBits = coeffBits;
interMode.mvBits = mvBits;
cu.m_distortion[0] = interMode.distortion;
// update the mode cost
updateModeCost(interMode);
checkDQP(interMode, cuGeom);
}
2.2 Regular inter prediction (checkInter_rd0_4)
This function performs regular inter prediction. It mainly calls predInterSearch() to run the motion search, and then measures the distortion of the resulting prediction with SA8D (an 8x8 Hadamard-transformed SAD), which is combined with the estimated bits into an SAD-style RD cost.
void Analysis::checkInter_rd0_4(Mode& interMode, const CUGeom& cuGeom, PartSize partSize, uint32_t refMask[2])
{
interMode.initCosts();
interMode.cu.setPartSizeSubParts(partSize);
interMode.cu.setPredModeSubParts(MODE_INTER);
int numPredDir = m_slice->isInterP() ? 1 : 2;
// whether to reuse previously saved analysis data
if (m_param->analysisLoadReuseLevel > 1 && m_param->analysisLoadReuseLevel != 10 && m_reuseInterDataCTU)
{
int refOffset = cuGeom.geomRecurId * 16 * numPredDir + partSize * numPredDir * 2;
int index = 0;
uint32_t numPU = interMode.cu.getNumPartInter(0);
for (uint32_t part = 0; part < numPU; part++)
{
MotionData* bestME = interMode.bestME[part];
for (int32_t i = 0; i < numPredDir; i++)
bestME[i].ref = m_reuseRef[refOffset + index++];
}
}
// multi-pass refinement
if (m_param->analysisMultiPassRefine && m_param->rc.bStatRead && m_reuseInterDataCTU)
{
uint32_t numPU = interMode.cu.getNumPartInter(0);
for (uint32_t part = 0; part < numPU; part++)
{
MotionData* bestME = interMode.bestME[part];
for (int32_t i = 0; i < numPredDir; i++)
{
int* ref = &m_reuseRef[i * m_frame->m_analysisData.numPartitions * m_frame->m_analysisData.numCUsInFrame];
bestME[i].ref = ref[cuGeom.absPartIdx];
bestME[i].mv = m_reuseMv[i][cuGeom.absPartIdx].word;
bestME[i].mvpIdx = m_reuseMvpIdx[i][cuGeom.absPartIdx];
}
}
}
// perform the inter (motion) search
predInterSearch(interMode, cuGeom, m_bChromaSa8d && (m_csp != X265_CSP_I400 && m_frame->m_fencPic->m_picCsp != X265_CSP_I400), refMask);
/* predInterSearch sets interMode.sa8dBits */
// the inter search pre-selection measures the best mode with SA8D, not full RD
const Yuv& fencYuv = *interMode.fencYuv;
Yuv& predYuv = interMode.predYuv;
int part = partitionFromLog2Size(cuGeom.log2CUSize);
// compute the SA8D distortion of the luma prediction
interMode.distortion = primitives.cu[part].sa8d(fencYuv.m_buf[0], fencYuv.m_size, predYuv.m_buf[0], predYuv.m_size);
if (m_bChromaSa8d && (m_csp != X265_CSP_I400 && m_frame->m_fencPic->m_picCsp != X265_CSP_I400))
{
interMode.distortion += primitives.chroma[m_csp].cu[part].sa8d(fencYuv.m_buf[1], fencYuv.m_csize, predYuv.m_buf[1], predYuv.m_csize);
interMode.distortion += primitives.chroma[m_csp].cu[part].sa8d(fencYuv.m_buf[2], fencYuv.m_csize, predYuv.m_buf[2], predYuv.m_csize);
}
interMode.sa8dCost = m_rdCost.calcRdSADCost((uint32_t)interMode.distortion, interMode.sa8dBits);
if (m_param->analysisSaveReuseLevel > 1 && m_reuseInterDataCTU)
{
int refOffset = cuGeom.geomRecurId * 16 * numPredDir + partSize * numPredDir * 2;
int index = 0;
uint32_t numPU = interMode.cu.getNumPartInter(0);
for (uint32_t puIdx = 0; puIdx < numPU; puIdx++)
{
MotionData* bestME = interMode.bestME[puIdx];
for (int32_t i = 0; i < numPredDir; i++)
m_reuseRef[refOffset + index++] = bestME[i].ref;
}
}
}
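The sa8d primitive used above computes a Hadamard-transformed SAD over 8x8 sub-blocks, which tracks the energy remaining after a transform better than plain SAD. As a self-contained illustration of the idea only, here is a minimal 4x4 SATD sketch; x265's real primitives are SIMD-optimised and their exact normalisation may differ.
#include <cstdint>
#include <cstdlib>
// 4x4 Hadamard-transformed SAD: transform the pixel differences with a 4-point
// Hadamard butterfly in both directions, then sum the absolute coefficients.
static int satd4x4Sketch(const uint8_t* pix1, int stride1, const uint8_t* pix2, int stride2)
{
    int diff[16], m[16];
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            diff[y * 4 + x] = pix1[y * stride1 + x] - pix2[y * stride2 + x];
    for (int y = 0; y < 4; y++) // horizontal 4-point Hadamard
    {
        int a = diff[y * 4 + 0] + diff[y * 4 + 1], b = diff[y * 4 + 0] - diff[y * 4 + 1];
        int c = diff[y * 4 + 2] + diff[y * 4 + 3], d = diff[y * 4 + 2] - diff[y * 4 + 3];
        m[y * 4 + 0] = a + c; m[y * 4 + 1] = b + d; m[y * 4 + 2] = a - c; m[y * 4 + 3] = b - d;
    }
    int sum = 0;
    for (int x = 0; x < 4; x++) // vertical 4-point Hadamard plus absolute sum
    {
        int a = m[0 * 4 + x] + m[1 * 4 + x], b = m[0 * 4 + x] - m[1 * 4 + x];
        int c = m[2 * 4 + x] + m[3 * 4 + x], d = m[2 * 4 + x] - m[3 * 4 + x];
        sum += std::abs(a + c) + std::abs(b + d) + std::abs(a - c) + std::abs(b - d);
    }
    return sum >> 1; // common normalisation; the exact scaling varies between implementations
}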
2.2.1 Inter prediction search (predInterSearch)
This function finds the best inter mode for each PU of the current CU. Before the motion search runs, the MVs that can serve as predictors have to be determined:
(1) If the current PU is a sub-partition (i.e. partSize is not 2Nx2N), the merge mode is evaluated (mergeEstimation)
(2) The AMVP list is built and the best candidate is selected from it (getPMV, selectMVP)
(3) Motion estimation is performed (motionEstimate)
/* find the best inter prediction for each PU of specified mode */
void Search::predInterSearch(Mode& interMode, const CUGeom& cuGeom, bool bChromaMC, uint32_t refMasks[2])
{
ProfileCUScope(interMode.cu, motionEstimationElapsedTime, countMotionEstimate);
CUData& cu = interMode.cu;
Yuv* predYuv = &interMode.predYuv;
// 12 mv candidates including lowresMV
MV mvc[(MD_ABOVE_LEFT + 1) * 2 + 2]; // motion vector candidates
const Slice *slice = m_slice;
int numPart = cu.getNumPartInter(0);
int numPredDir = slice->isInterP() ? 1 : 2;
const int* numRefIdx = slice->m_numRefIdx;
uint32_t lastMode = 0;
int totalmebits = 0;
MV mvzero(0, 0);
Yuv& tmpPredYuv = m_rqt[cuGeom.depth].tmpPredYuv;
MergeData merge;
memset(&merge, 0, sizeof(merge));
bool useAsMVP = false;
// run inter prediction separately for each PU of the CU
for (int puIdx = 0; puIdx < numPart; puIdx++)
{
MotionData* bestME = interMode.bestME[puIdx];
PredictionUnit pu(cu, cuGeom, puIdx);
// set up the primitive cost functions and source-PU state for the motion estimator
m_me.setSourcePU(*interMode.fencYuv, pu.ctuAddr, pu.cuAbsPartIdx, pu.puAbsPartIdx, pu.width, pu.height, m_param->searchMethod, m_param->subpelRefine, bChromaMC);
useAsMVP = false;
x265_analysis_inter_data* interDataCTU = NULL;
int cuIdx;
cuIdx = (interMode.cu.m_cuAddr * m_param->num4x4Partitions) + cuGeom.absPartIdx;
if (m_param->analysisLoadReuseLevel == 10 && m_param->interRefine > 1)
{
interDataCTU = m_frame->m_analysisData.interData;
if ((cu.m_predMode[pu.puAbsPartIdx] == interDataCTU->modes[cuIdx + pu.puAbsPartIdx])
&& (cu.m_partSize[pu.puAbsPartIdx] == interDataCTU->partSize[cuIdx + pu.puAbsPartIdx])
&& !(interDataCTU->mergeFlag[cuIdx + puIdx])
&& (cu.m_cuDepth[0] == interDataCTU->depth[cuIdx]))
useAsMVP = true;
}
/* find best cost merge candidate. note: 2Nx2N merge and bidir are handled as separate modes */
// 1. checkMerge2Nx2N_rd0_4 already evaluated merge for 2Nx2N blocks; here merge is evaluated for non-2Nx2N PUs
uint32_t mrgCost = numPart == 1 ? MAX_UINT : mergeEstimation(cu, cuGeom, pu, puIdx, merge);
bestME[0].cost = MAX_UINT;
bestME[1].cost = MAX_UINT;
// derive the fixed bit overhead of the current partition from its block information
getBlkBits((PartSize)cu.m_partSize[0], slice->isInterP(), puIdx, lastMode, m_listSelBits);
bool bDoUnidir = true;
// gather the neighbouring MVs in preparation for building the AMVP list
cu.getNeighbourMV(puIdx, pu.puAbsPartIdx, interMode.interNeighbours);
/* Uni-directional prediction */
if ((m_param->analysisLoadReuseLevel > 1 && m_param->analysisLoadReuseLevel != 10)
|| (m_param->analysisMultiPassRefine && m_param->rc.bStatRead) || (m_param->bAnalysisType == AVC_INFO) || (useAsMVP))
{
// analysis-reuse / multi-pass path, not examined here
// ...
}
else if (m_param->bDistributeMotionEstimation) // distributed motion estimation, related to multi-threading (not examined here)
{
// ...
}
if (bDoUnidir) // uni-directional prediction
{
interMode.bestME[puIdx][0].ref = interMode.bestME[puIdx][1].ref = -1;
uint32_t refMask = refMasks[puIdx] ? refMasks[puIdx] : (uint32_t)-1;
for (int list = 0; list < numPredDir; list++)
{
for (int ref = 0; ref < numRefIdx[list]; ref++)
{
ProfileCounter(interMode.cu, totalMotionReferences[cuGeom.depth]);
if (!(refMask & (1 << ref)))
{
ProfileCounter(interMode.cu, skippedMotionReferences[cuGeom.depth]);
continue;
}
uint32_t bits = m_listSelBits[list] + MVP_IDX_BITS;
bits += getTUBits(ref, numRefIdx[list]);
// 2. build the AMVP list (length 2) from interNeighbours
int numMvc = cu.getPMV(interMode.interNeighbours, list, ref, interMode.amvpCand[list][ref], mvc);
const MV* amvp = interMode.amvpCand[list][ref];
// pick the better of the two AMVP candidates
int mvpIdx = selectMVP(cu, pu, amvp, list, ref);
MV mvmin, mvmax, outmv, mvp = amvp[mvpIdx], mvp_lowres;
bool bLowresMVP = false;
if (!m_param->analysisSave && !m_param->analysisLoad) /* Prevents load/save outputs from diverging when lowresMV is not available */
{
// fetch the MV from the lowres (lookahead) frame
MV lmv = getLowresMV(cu, pu, list, ref);
if (lmv.notZero())
mvc[numMvc++] = lmv;
if (m_param->bEnableHME)
mvp_lowres = lmv;
}
if (m_param->searchMethod == X265_SEA)
{
int puX = puIdx & 1;
int puY = puIdx >> 1;
for (int planes = 0; planes < INTEGRAL_PLANE_NUM; planes++)
m_me.integral[planes] = interMode.fencYuv->m_integral[list][ref][planes] + puX * pu.width + puY * pu.height * m_slice->m_refFrameList[list][ref]->m_reconPic->m_stride;
}
// set the search window (searchRange defaults to 57)
setSearchRange(cu, mvp, m_param->searchRange, mvmin, mvmax);
// 3. perform motion estimation
int satdCost = m_me.motionEstimate(&slice->m_mref[list][ref], mvmin, mvmax, mvp, numMvc, mvc, m_param->searchRange, outmv, m_param->maxSlices,
m_param->bSourceReferenceEstimation ? m_slice->m_refFrameList[list][ref]->m_fencPic->getLumaAddr(0) : 0);
// HME is disabled by default
if (m_param->bEnableHME && mvp_lowres.notZero() && mvp_lowres != mvp)
{
MV outmv_lowres;
setSearchRange(cu, mvp_lowres, m_param->searchRange, mvmin, mvmax);
int lowresMvCost = m_me.motionEstimate(&slice->m_mref[list][ref], mvmin, mvmax, mvp_lowres, numMvc, mvc, m_param->searchRange, outmv_lowres, m_param->maxSlices,
m_param->bSourceReferenceEstimation ? m_slice->m_refFrameList[list][ref]->m_fencPic->getLumaAddr(0) : 0);
if (lowresMvCost < satdCost)
{
outmv = outmv_lowres;
satdCost = lowresMvCost;
bLowresMVP = true;
}
}
/* Get total cost of partition, but only include MV bit cost once */
// add the MV bit cost and form the total cost of this candidate
bits += m_me.bitcost(outmv);
uint32_t mvCost = m_me.mvcost(outmv);
uint32_t cost = (satdCost - mvCost) + m_rdCost.getCost(bits);
/* Update LowresMVP to best AMVP cand*/
if (bLowresMVP)
updateMVP(amvp[mvpIdx], outmv, bits, cost, mvp_lowres);
/* Refine MVP selection, updates: mvpIdx, bits, cost */
mvp = checkBestMVP(amvp, outmv, mvpIdx, bits, cost);
// keep the best cost for this reference list
if (cost < bestME[list].cost)
{
bestME[list].mv = outmv;
bestME[list].mvp = mvp;
bestME[list].mvpIdx = mvpIdx;
bestME[list].ref = ref;
bestME[list].cost = cost;
bestME[list].bits = bits;
bestME[list].mvCost = mvCost;
}
}
/* the second list ref bits start at bit 16 */
refMask >>= 16;
}
}
/* Bi-directional prediction */
MotionData bidir[2];
uint32_t bidirCost = MAX_UINT;
int bidirBits = 0;
if (slice->isInterB() && !cu.isBipredRestriction() && /* biprediction is possible for this PU */
cu.m_partSize[pu.puAbsPartIdx] != SIZE_2Nx2N && /* 2Nx2N biprediction is handled elsewhere */
bestME[0].cost != MAX_UINT && bestME[1].cost != MAX_UINT)
{
// B帧双向预测(没研究过)
// ...
}
/* select best option and store into CU */
// choose the best mode among merge, bidir, list-0 and list-1
if (mrgCost < bidirCost && mrgCost < bestME[0].cost && mrgCost < bestME[1].cost)
{
cu.m_mergeFlag[pu.puAbsPartIdx] = true;
cu.m_mvpIdx[0][pu.puAbsPartIdx] = merge.index; /* merge candidate ID is stored in L0 MVP idx */
cu.setPUInterDir(merge.dir, pu.puAbsPartIdx, puIdx);
cu.setPUMv(0, merge.mvField[0].mv, pu.puAbsPartIdx, puIdx);
cu.setPURefIdx(0, merge.mvField[0].refIdx, pu.puAbsPartIdx, puIdx);
cu.setPUMv(1, merge.mvField[1].mv, pu.puAbsPartIdx, puIdx);
cu.setPURefIdx(1, merge.mvField[1].refIdx, pu.puAbsPartIdx, puIdx);
totalmebits += merge.bits;
}
else if (bidirCost < bestME[0].cost && bidirCost < bestME[1].cost)
{
lastMode = 2;
cu.m_mergeFlag[pu.puAbsPartIdx] = false;
cu.setPUInterDir(3, pu.puAbsPartIdx, puIdx);
cu.setPUMv(0, bidir[0].mv, pu.puAbsPartIdx, puIdx);
cu.setPURefIdx(0, bestME[0].ref, pu.puAbsPartIdx, puIdx);
cu.m_mvd[0][pu.puAbsPartIdx] = bidir[0].mv - bidir[0].mvp;
cu.m_mvpIdx[0][pu.puAbsPartIdx] = bidir[0].mvpIdx;
cu.setPUMv(1, bidir[1].mv, pu.puAbsPartIdx, puIdx);
cu.setPURefIdx(1, bestME[1].ref, pu.puAbsPartIdx, puIdx);
cu.m_mvd[1][pu.puAbsPartIdx] = bidir[1].mv - bidir[1].mvp;
cu.m_mvpIdx[1][pu.puAbsPartIdx] = bidir[1].mvpIdx;
totalmebits += bidirBits;
}
else if (bestME[0].cost <= bestME[1].cost)
{
lastMode = 0;
cu.m_mergeFlag[pu.puAbsPartIdx] = false;
cu.setPUInterDir(1, pu.puAbsPartIdx, puIdx);
cu.setPUMv(0, bestME[0].mv, pu.puAbsPartIdx, puIdx);
cu.setPURefIdx(0, bestME[0].ref, pu.puAbsPartIdx, puIdx);
cu.m_mvd[0][pu.puAbsPartIdx] = bestME[0].mv - bestME[0].mvp;
cu.m_mvpIdx[0][pu.puAbsPartIdx] = bestME[0].mvpIdx;
cu.setPURefIdx(1, REF_NOT_VALID, pu.puAbsPartIdx, puIdx);
cu.setPUMv(1, mvzero, pu.puAbsPartIdx, puIdx);
totalmebits += bestME[0].bits;
}
else
{ // store the information of the best mode (list 1 uni-directional)
lastMode = 1;
cu.m_mergeFlag[pu.puAbsPartIdx] = false;
cu.setPUInterDir(2, pu.puAbsPartIdx, puIdx);
cu.setPUMv(1, bestME[1].mv, pu.puAbsPartIdx, puIdx);
cu.setPURefIdx(1, bestME[1].ref, pu.puAbsPartIdx, puIdx);
cu.m_mvd[1][pu.puAbsPartIdx] = bestME[1].mv - bestME[1].mvp;
cu.m_mvpIdx[1][pu.puAbsPartIdx] = bestME[1].mvpIdx;
cu.setPURefIdx(0, REF_NOT_VALID, pu.puAbsPartIdx, puIdx);
cu.setPUMv(0, mvzero, pu.puAbsPartIdx, puIdx);
totalmebits += bestME[1].bits;
}
// run motion compensation with the best mode so that the prediction is available for the later encoding stages
motionCompensation(cu, pu, *predYuv, true, bChromaMC);
}
interMode.sa8dBits += totalmebits;
}
2.2.1.1 Merge estimation for sub-PUs (mergeEstimation)
Not much here needs extra annotation; the main point to note is that the distortion of each merge candidate is measured with SATD.
/* estimation of best merge coding of an inter PU (2Nx2N merge PUs are evaluated as their own mode) */
uint32_t Search::mergeEstimation(CUData& cu, const CUGeom& cuGeom, const PredictionUnit& pu, int puIdx, MergeData& m)
{
// 2Nx2N blocks never use this function
X265_CHECK(cu.m_partSize[0] != SIZE_2Nx2N, "mergeEstimation() called for 2Nx2N\n");
MVField candMvField[MRG_MAX_NUM_CANDS][2];
uint8_t candDir[MRG_MAX_NUM_CANDS];
uint32_t numMergeCand = cu.getInterMergeCandidates(pu.puAbsPartIdx, puIdx, candMvField, candDir);
if (cu.isBipredRestriction())
{
/* do not allow bidir merge candidates if PU is smaller than 8x8, drop L1 reference */
for (uint32_t mergeCand = 0; mergeCand < numMergeCand; ++mergeCand)
{
if (candDir[mergeCand] == 3)
{
candDir[mergeCand] = 1;
candMvField[mergeCand][1].refIdx = REF_NOT_VALID;
}
}
}
Yuv& tempYuv = m_rqt[cuGeom.depth].tmpPredYuv;
uint32_t outCost = MAX_UINT;
// iterate over the merge candidate list and pick the best candidate
for (uint32_t mergeCand = 0; mergeCand < numMergeCand; ++mergeCand)
{
/* Prevent TMVP candidates from using unavailable reference pixels */
if (m_bFrameParallel) // frame-level parallelism enabled
{
// Parallel slices bound check
if (m_param->maxSlices > 1)
{
if (cu.m_bFirstRowInSlice &
((candMvField[mergeCand][0].mv.y < (2 * 4)) | (candMvField[mergeCand][1].mv.y < (2 * 4))))
continue;
// Last row in slice can't reference beyond bound since it is another slice area
// TODO: we may beyond bound in future since these area have a chance to finish because we use parallel slices. Necessary prepare research on load balance
if (cu.m_bLastRowInSlice &&
((candMvField[mergeCand][0].mv.y > -3 * 4) | (candMvField[mergeCand][1].mv.y > -3 * 4)))
continue;
}
if (candMvField[mergeCand][0].mv.y >= (m_param->searchRange + 1) * 4 ||
candMvField[mergeCand][1].mv.y >= (m_param->searchRange + 1) * 4)
continue;
}
cu.m_mv[0][pu.puAbsPartIdx] = candMvField[mergeCand][0].mv;
cu.m_refIdx[0][pu.puAbsPartIdx] = (int8_t)candMvField[mergeCand][0].refIdx;
cu.m_mv[1][pu.puAbsPartIdx] = candMvField[mergeCand][1].mv;
cu.m_refIdx[1][pu.puAbsPartIdx] = (int8_t)candMvField[mergeCand][1].refIdx;
// motion compensation to produce the prediction block
motionCompensation(cu, pu, tempYuv, true, m_me.bChromaSATD);
// the distortion here is SATD
uint32_t costCand = m_me.bufSATD(tempYuv.getLumaAddr(pu.puAbsPartIdx), tempYuv.m_size);
if (m_me.bChromaSATD)
costCand += m_me.bufChromaSATD(tempYuv, pu.puAbsPartIdx);
uint32_t bitsCand = getTUBits(mergeCand, numMergeCand);
costCand = costCand + m_rdCost.getCost(bitsCand);
if (costCand < outCost)
{
outCost = costCand;
m.bits = bitsCand;
m.index = mergeCand;
}
}
m.mvField[0] = candMvField[m.index][0];
m.mvField[1] = candMvField[m.index][1];
m.dir = candDir[m.index];
return outCost;
}
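A small aside on getTUBits() above: the merge index is charged with a truncated-unary code, so candidates further down the list cost more bits and are slightly penalised. A conceptual sketch of that idea, with an illustrative helper name rather than the actual x265 implementation:
#include <cstdint>
// Truncated unary: index i among numCand candidates is coded as i ones followed by a
// terminating zero, except for the last index, which omits the terminator.
static inline uint32_t truncatedUnaryBits(uint32_t idx, uint32_t numCand)
{
    if (numCand <= 1)
        return 0;                                  // a single candidate needs no signalling
    return (idx + 1 == numCand) ? idx : idx + 1;   // the last symbol drops the terminating zero
}
For example, with 5 merge candidates the indices 0..4 would cost 1, 2, 3, 4 and 4 bins respectively.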
2.2.1.2 Implementation of AMVP
Similar to merge mode, AMVP derives its MVs from the available spatial neighbouring blocks and the temporal co-located block. The rough steps are:
(1) Collect the available neighbouring MVs (getNeighbourMV)
(2) Build the AMVP list (getPMV)
(3) Select the best candidate from the AMVP list (selectMVP)
/* Constructs a list of candidates for AMVP, and a larger list of motion candidates */
void CUData::getNeighbourMV(uint32_t puIdx, uint32_t absPartIdx, InterNeighbourMV* neighbours) const
{
// Set the temporal neighbour to unavailable by default.
neighbours[MD_COLLOCATED].unifiedRef = -1;
uint32_t partIdxLT, partIdxRT, partIdxLB = deriveLeftBottomIdx(puIdx);
deriveLeftRightTopIdx(puIdx, partIdxLT, partIdxRT);
// Load the spatial MVs.
// load the MVs of the available spatial neighbours
getInterNeighbourMV(neighbours + MD_BELOW_LEFT, partIdxLB, MD_BELOW_LEFT);
getInterNeighbourMV(neighbours + MD_LEFT, partIdxLB, MD_LEFT);
getInterNeighbourMV(neighbours + MD_ABOVE_RIGHT,partIdxRT, MD_ABOVE_RIGHT);
getInterNeighbourMV(neighbours + MD_ABOVE, partIdxRT, MD_ABOVE);
getInterNeighbourMV(neighbours + MD_ABOVE_LEFT, partIdxLT, MD_ABOVE_LEFT);
// look up the MV of the temporal (co-located) block
if (m_slice->m_sps->bTemporalMVPEnabled)
{
uint32_t absPartAddr = m_absIdxInCTU + absPartIdx;
uint32_t partIdxRB = deriveRightBottomIdx(puIdx);
// co-located RightBottom temporal predictor (H)
int ctuIdx = -1;
// image boundary check
if (m_encData->getPicCTU(m_cuAddr)->m_cuPelX + g_zscanToPelX[partIdxRB] + UNIT_SIZE < m_slice->m_sps->picWidthInLumaSamples &&
m_encData->getPicCTU(m_cuAddr)->m_cuPelY + g_zscanToPelY[partIdxRB] + UNIT_SIZE < m_slice->m_sps->picHeightInLumaSamples)
{
uint32_t absPartIdxRB = g_zscanToRaster[partIdxRB];
uint32_t numUnits = s_numPartInCUSize;
bool bNotLastCol = lessThanCol(absPartIdxRB, numUnits - 1); // is not at the last column of CTU
bool bNotLastRow = lessThanRow(absPartIdxRB, numUnits - 1); // is not at the last row of CTU
if (bNotLastCol && bNotLastRow)
{
absPartAddr = g_rasterToZscan[absPartIdxRB + RASTER_SIZE + 1];
ctuIdx = m_cuAddr;
}
else if (bNotLastCol)
absPartAddr = g_rasterToZscan[(absPartIdxRB + 1) & (numUnits - 1)];
else if (bNotLastRow)
{
absPartAddr = g_rasterToZscan[absPartIdxRB + RASTER_SIZE - numUnits + 1];
ctuIdx = m_cuAddr + 1;
}
else // is the right bottom corner of CTU
absPartAddr = 0;
}
if (!(ctuIdx >= 0 && getCollocatedMV(ctuIdx, absPartAddr, neighbours + MD_COLLOCATED)))
{
uint32_t partIdxCenter = deriveCenterIdx(puIdx);
uint32_t curCTUIdx = m_cuAddr;
// fetch the co-located MV at the centre position
getCollocatedMV(curCTUIdx, partIdxCenter, neighbours + MD_COLLOCATED);
}
}
}
The AMVP list is built as follows. In short, the spatial candidates are filled in first, then the temporal candidate, and finally, if the list still holds fewer than 2 entries, zero MVs are used as padding.
// Create the PMV list. Called for each reference index.
int CUData::getPMV(InterNeighbourMV *neighbours, uint32_t picList, uint32_t refIdx, MV* amvpCand, MV* pmv) const
{
// A direct MVP is a candidate MV taken straight from a neighbouring block's motion information
MV directMV[MD_ABOVE_LEFT + 1];
// An indirect MVP is a scaled MV: the neighbour's MV points to a different reference frame than the current PU, so it must be scaled
MV indirectMV[MD_ABOVE_LEFT + 1];
bool validDirect[MD_ABOVE_LEFT + 1];
bool validIndirect[MD_ABOVE_LEFT + 1];
// Left candidate.
validDirect[MD_BELOW_LEFT] = getDirectPMV(directMV[MD_BELOW_LEFT], neighbours + MD_BELOW_LEFT, picList, refIdx);
validDirect[MD_LEFT] = getDirectPMV(directMV[MD_LEFT], neighbours + MD_LEFT, picList, refIdx);
// Top candidate.
validDirect[MD_ABOVE_RIGHT] = getDirectPMV(directMV[MD_ABOVE_RIGHT], neighbours + MD_ABOVE_RIGHT, picList, refIdx);
validDirect[MD_ABOVE] = getDirectPMV(directMV[MD_ABOVE], neighbours + MD_ABOVE, picList, refIdx);
validDirect[MD_ABOVE_LEFT] = getDirectPMV(directMV[MD_ABOVE_LEFT], neighbours + MD_ABOVE_LEFT, picList, refIdx);
// Left candidate.
validIndirect[MD_BELOW_LEFT] = getIndirectPMV(indirectMV[MD_BELOW_LEFT], neighbours + MD_BELOW_LEFT, picList, refIdx);
validIndirect[MD_LEFT] = getIndirectPMV(indirectMV[MD_LEFT], neighbours + MD_LEFT, picList, refIdx);
// Top candidate.
validIndirect[MD_ABOVE_RIGHT] = getIndirectPMV(indirectMV[MD_ABOVE_RIGHT], neighbours + MD_ABOVE_RIGHT, picList, refIdx);
validIndirect[MD_ABOVE] = getIndirectPMV(indirectMV[MD_ABOVE], neighbours + MD_ABOVE, picList, refIdx);
validIndirect[MD_ABOVE_LEFT] = getIndirectPMV(indirectMV[MD_ABOVE_LEFT], neighbours + MD_ABOVE_LEFT, picList, refIdx);
/*
1. Fill in the MVs of the available spatial neighbours; the reading order is A0 -> A1 -> B0 -> B1 -> B2
+--+ +--+--+
|B2| |B1|B0|
+--+--+--+--+--+--+--+
| |
+ +
| Current |
+ PU +
| |
+--+ +
|A1| |
+--+--+--+--+--+
|A0|
+--+
*/
int num = 0;
// Left predictor search
if (validDirect[MD_BELOW_LEFT])
amvpCand[num++] = directMV[MD_BELOW_LEFT];
else if (validDirect[MD_LEFT])
amvpCand[num++] = directMV[MD_LEFT];
else if (validIndirect[MD_BELOW_LEFT])
amvpCand[num++] = indirectMV[MD_BELOW_LEFT];
else if (validIndirect[MD_LEFT])
amvpCand[num++] = indirectMV[MD_LEFT];
bool bAddedSmvp = num > 0;
// Above predictor search
if (validDirect[MD_ABOVE_RIGHT])
amvpCand[num++] = directMV[MD_ABOVE_RIGHT];
else if (validDirect[MD_ABOVE])
amvpCand[num++] = directMV[MD_ABOVE];
else if (validDirect[MD_ABOVE_LEFT])
amvpCand[num++] = directMV[MD_ABOVE_LEFT];
if (!bAddedSmvp)
{
if (validIndirect[MD_ABOVE_RIGHT])
amvpCand[num++] = indirectMV[MD_ABOVE_RIGHT];
else if (validIndirect[MD_ABOVE])
amvpCand[num++] = indirectMV[MD_ABOVE];
else if (validIndirect[MD_ABOVE_LEFT])
amvpCand[num++] = indirectMV[MD_ABOVE_LEFT];
}
int numMvc = 0;
for (int dir = MD_LEFT; dir <= MD_ABOVE_LEFT; dir++)
{
if (validDirect[dir] && directMV[dir].notZero())
pmv[numMvc++] = directMV[dir];
if (validIndirect[dir] && indirectMV[dir].notZero())
pmv[numMvc++] = indirectMV[dir];
}
if (num == 2)
num -= amvpCand[0] == amvpCand[1];
// Get the collocated candidate. At this step, either the first candidate
// was found or its value is 0.
// 2. fill in the MV of the temporal co-located block
if (m_slice->m_sps->bTemporalMVPEnabled && num < 2)
{
int tempRefIdx = neighbours[MD_COLLOCATED].refIdx[picList];
if (tempRefIdx != -1)
{
uint32_t cuAddr = neighbours[MD_COLLOCATED].cuAddr[picList];
const Frame* colPic = m_slice->m_refFrameList[m_slice->isInterB() && !m_slice->m_colFromL0Flag][m_slice->m_colRefIdx];
const CUData* colCU = colPic->m_encData->getPicCTU(cuAddr);
// Scale the vector
int colRefPOC = colCU->m_slice->m_refPOCList[tempRefIdx >> 4][tempRefIdx & 0xf];
int colPOC = colCU->m_slice->m_poc;
int curRefPOC = m_slice->m_refPOCList[picList][refIdx];
int curPOC = m_slice->m_poc;
pmv[numMvc++] = amvpCand[num++] = scaleMvByPOCDist(neighbours[MD_COLLOCATED].mv[picList], curPOC, curRefPOC, colPOC, colRefPOC);
}
}
// 3. if there are still fewer than 2 candidates, pad with zero MVs
while (num < AMVP_NUM_CANDS)
amvpCand[num++] = 0;
return numMvc;
}
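The scaleMvByPOCDist() call above stretches or shrinks the co-located MV according to the ratio of POC distances. Below is a simplified conceptual sketch of that scaling; the real x265/HEVC code works in fixed point with clipping, and the struct and function names here are illustrative stand-ins only.
#include <cstdint>
struct SimpleMV { int x, y; }; // illustrative stand-in for x265's MV type
static SimpleMV scaleMvByPocDistSketch(SimpleMV colMV, int curPOC, int curRefPOC,
                                       int colPOC, int colRefPOC)
{
    int tb = curPOC - curRefPOC; // POC distance: current picture -> its reference
    int td = colPOC - colRefPOC; // POC distance: co-located picture -> its reference
    if (td == 0 || tb == td)
        return colMV;            // identical distances: reuse the co-located MV as-is
    SimpleMV out;
    out.x = (int)(((int64_t)colMV.x * tb) / td);
    out.y = (int)(((int64_t)colMV.y * tb) / td);
    return out;
}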
From the AMVP list obtained above, the better of the two candidates is selected: predInterLumaPixel generates the luma prediction (motion compensation) for each candidate MV, and the SAD of that prediction serves as the candidate's cost.
/* Pick between the two AMVP candidates which is the best one to use as
* MVP for the motion search, based on SAD cost */
int Search::selectMVP(const CUData& cu, const PredictionUnit& pu, const MV amvp[AMVP_NUM_CANDS], int list, int ref)
{
if (amvp[0] == amvp[1])
return 0;
Yuv& tmpPredYuv = m_rqt[cu.m_cuDepth[0]].tmpPredYuv;
uint32_t costs[AMVP_NUM_CANDS];
for (int i = 0; i < AMVP_NUM_CANDS; i++)
{
MV mvCand = amvp[i];
// NOTE: skip mvCand if Y is > merange and -FN>1
if (m_bFrameParallel)
{
costs[i] = m_me.COST_MAX;
if (mvCand.y >= (m_param->searchRange + 1) * 4)
continue;
if ((m_param->maxSlices > 1) &
((mvCand.y < m_sliceMinY)
| (mvCand.y > m_sliceMaxY)))
continue;
}
cu.clipMv(mvCand);
// motion-compensate the luma with this candidate MV; its SAD is measured just below
predInterLumaPixel(pu, tmpPredYuv, *m_slice->m_refReconPicList[list][ref], mvCand);
costs[i] = m_me.bufSAD(tmpPredYuv.getLumaAddr(pu.puAbsPartIdx), tmpPredYuv.m_size);
}
return (costs[0] <= costs[1]) ? 0 : 1;
}
2.2.1.3 Motion estimation (motionEstimate)
Motion estimation (ME below) is the core of inter prediction; it determines the best MV and its best cost. The main steps are:
(1) For each of the neighbouring candidate MVs gathered earlier, compute its cost, measured as a sub-pel SAD (subpelCompare)
(2) Run the integer-pel ME
The ME mainly uses diamond or hexagon search. After the best full-pel MV is found, it is compared against the best neighbour MV: if the searched MV has the lower cost, sub-pel ME continues from it; otherwise the neighbour MV is used directly
(3) Run the sub-pel ME
The sub-pel ME uses SATD: a 1/2-pel refinement first, followed by a 1/4-pel refinement
PS: note that during integer-pel ME the distortion is computed directly against fenc, while sub-pel ME compares against interpolated reference samples; a small sketch of the cost criterion follows below
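A minimal sketch of the cost criterion described above (and restated in the comment block inside motionEstimate): distortion plus lambda-weighted MVD bits. The helper name and the 8.8 fixed-point lambda convention are assumptions made for illustration, not x265's actual API.
#include <cstdint>
// Integer-pel stage:  J = SAD(fenc, ref at mv)  + lambda * R(MVD)
// Sub-pel stages:     J = SATD(fenc, ref at mv) + lambda * R(MVD)
static inline uint32_t meCandidateCost(uint32_t distortion,  // SAD or SATD of the candidate
                                       uint32_t mvdBits,     // bits needed to code the MVD
                                       uint32_t lambdaX256)  // lambda in 8.8 fixed point (assumed)
{
    return distortion + ((mvdBits * lambdaX256 + 128) >> 8);
}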
int MotionEstimate::motionEstimate(ReferencePlanes *ref,
const MV & mvmin,
const MV & mvmax,
const MV & qmvp,
int numCandidates,
const MV * mvc,
int merange,
MV & outQMv,
uint32_t maxSlices,
pixel * srcReferencePlane)
{
ALIGN_VAR_16(int, costs[16]);
bool hme = srcReferencePlane && srcReferencePlane == ref->fpelLowerResPlane[0];
if (ctuAddr >= 0)
blockOffset = ref->reconPic->getLumaAddr(ctuAddr, absPartIdx) - ref->reconPic->getLumaAddr(0);
intptr_t stride = hme ? ref->lumaStride / 2 : ref->lumaStride;
pixel* fenc = fencPUYuv.m_buf[0];
pixel* fref = srcReferencePlane == 0 ? ref->fpelPlane[0] + blockOffset : srcReferencePlane + blockOffset;
// qmvp is the best AMVP candidate chosen earlier; it becomes the initial MV, i.e. the starting point of the motion search
setMVP(qmvp);
MV qmvmin = mvmin.toQPel(); // convert to quarter-pel units
MV qmvmax = mvmax.toQPel();
/* The term cost used here means satd/sad values for that particular search.
* The costs used in ME integer search only includes the SAD cost of motion
* residual and sqrtLambda times MVD bits. The subpel refine steps use SATD
* cost of residual and sqrtLambda * MVD bits. Mode decision will be based
* on video distortion cost (SSE/PSNR) plus lambda times all signaling bits
* (mode + MVD bits). */
// measure SAD cost at clipped QPEL MVP
// clip to the min/max range
MV pmv = qmvp.clipped(qmvmin, qmvmax);
MV bestpre = pmv;
int bprecost;
if (ref->isLowres)
bprecost = ref->lowresQPelCost(fenc, blockOffset, pmv, sad, hme);
else // measure the clipped AMVP candidate at sub-pel precision to obtain the initial cost
bprecost = subpelCompare(ref, pmv, sad);
/* re-measure full pel rounded MVP with SAD as search start point */
MV bmv = pmv.roundToFPel();
int bcost = bprecost;
if (pmv.isSubpel())
bcost = sad(fenc, FENC_STRIDE, fref + bmv.x + bmv.y * stride, stride) + mvcost(bmv << 2);
// measure SAD cost at MV(0) if MVP is not zero
if (pmv.notZero())
{
// if the MVP is not zero, also measure the zero vector and compare it against the cost of the current pmv
int cost = sad(fenc, FENC_STRIDE, fref, stride) + mvcost(MV(0, 0));
if (cost < bcost)
{
bcost = cost;
bmv = 0;
bmv.y = X265_MAX(X265_MIN(0, mvmax.y), mvmin.y);
}
}
X265_CHECK(!(ref->isLowres && numCandidates), "lowres motion candidates not allowed\n")
// measure SAD cost at each QPEL motion vector candidate
// 1. iterate over the MV candidate list (mvc) and compute the cost of each candidate
for (int i = 0; i < numCandidates; i++)
{
MV m = mvc[i].clipped(qmvmin, qmvmax);
if (m.notZero() & (m != pmv ? 1 : 0) & (m != bestpre ? 1 : 0)) // check already measured
{
// mvcost returns the bit cost of the MVD, already multiplied by lambda
int cost = subpelCompare(ref, m, sad) + mvcost(m);
if (cost < bprecost)
{
bprecost = cost;
bestpre = m;
}
}
}
pmv = pmv.roundToFPel();
MV omv = bmv; // current search origin or starting point
// 2. run the integer-pel motion search
int search = ref->isHMELowres ? (hme ? searchMethodL0 : searchMethodL1) : searchMethod;
switch (search)
{
case X265_DIA_SEARCH:
{
/* diamond search, radius 1 */
/*
Diamond search with radius 1. The search pattern is shown below, with 0 as the starting point:
1
3 0 4
2
*/
bcost <<= 4;
int i = merange;
do
{
/*
The definition of COST_MV_X4_DIR is:
#define COST_MV_X4_DIR(m0x, m0y, m1x, m1y, m2x, m2y, m3x, m3y, costs) \
{ \
pixel *pix_base = fref + bmv.x + bmv.y * stride; \
sad_x4(fenc, \
pix_base + (m0x) + (m0y) * stride, \
pix_base + (m1x) + (m1y) * stride, \
pix_base + (m2x) + (m2y) * stride, \
pix_base + (m3x) + (m3y) * stride, \
stride, costs); \
(costs)[0] += mvcost((bmv + MV(m0x, m0y)) << 2); \ // up,    MV(0, -1), lambda * R0
(costs)[1] += mvcost((bmv + MV(m1x, m1y)) << 2); \ // down,  MV(0, 1),  lambda * R1
(costs)[2] += mvcost((bmv + MV(m2x, m2y)) << 2); \ // left,  MV(-1, 0), lambda * R2
(costs)[3] += mvcost((bmv + MV(m3x, m3y)) << 2); \ // right, MV(1, 0),  lambda * R3
}
The cost computed above is SAD + lambda * R, where mvcost returns the bit cost of the current MVD already multiplied by lambda
*/
COST_MV_X4_DIR(0, -1, 0, 1, -1, 0, 1, 0, costs);
/*
#define COPY1_IF_LT(x, y) {if ((y) < (x)) (x) = (y);}
The code below stores the best cost together with the position that produced it:
(1) the cost is first shifted left by 4 bits, freeing the low 4 bits to store the best search position
(2) since bcost was already shifted left by 4, a plain comparison is enough to find the best cost
(3) once the best cost is known, the corresponding MV is derived from the low bits
COPY1_IF_LT(bcost, (costs[0] << 4) + 1); // 1 -> 0001, up
COPY1_IF_LT(bcost, (costs[1] << 4) + 3); // 3 -> 0011, down
COPY1_IF_LT(bcost, (costs[2] << 4) + 4); // 4 -> 0100, left
COPY1_IF_LT(bcost, (costs[3] << 4) + 12); // 12 -> 1100, right
*/
if ((bmv.y - 1 >= mvmin.y) & (bmv.y - 1 <= mvmax.y)) // check the upper boundary
COPY1_IF_LT(bcost, (costs[0] << 4) + 1);
if ((bmv.y + 1 >= mvmin.y) & (bmv.y + 1 <= mvmax.y)) // check the lower boundary
COPY1_IF_LT(bcost, (costs[1] << 4) + 3);
COPY1_IF_LT(bcost, (costs[2] << 4) + 4);
COPY1_IF_LT(bcost, (costs[3] << 4) + 12);
// check the low 4 bits: if they are 0, none of the four neighbours beat the current centre point, so the search can stop
if (!(bcost & 15))
break;
/*
For example, if the best block is the one to the right, i.e. MV(1, 0), the low 4 bits are 1100:
(bcost << 28) >> 30 = 11, so bmv.x -= -1: move one unit to the right horizontally (binary 11 is -1 as a 2-bit signed value)
(bcost << 30) >> 30 = 00, so bmv.y -= 0: no vertical movement
*/
bmv.x -= (bcost << 28) >> 30; // bcost is a 32-bit int: shifting left by 28 keeps the lowest 4 bits, then the arithmetic shift right by 30 extracts the x offset
bmv.y -= (bcost << 30) >> 30; // shifting left by 30 keeps the lowest 2 bits, then shifting right by 30 extracts the y offset
// clear the low 4 bits to keep the best cost (to recover the raw cost it must still be shifted right by 4)
bcost &= ~15;
}
while (--i && bmv.checkRange(mvmin, mvmax)); // stop when the iteration budget runs out or bmv leaves the allowed MV range
bcost >>= 4; // shift right by 4 to recover the true best cost
break;
}
case X265_HEX_SEARCH: // hexagon search, radius 2
{
me_hex2:
/* hexagon search, radius 2 */
/*
The hexagon search pattern is shown below, with 0 as the starting position:
2 3
1 0 4
5 6
*/
#if 0
for (int i = 0; i < merange / 2; i++)
{
omv = bmv;
COST_MV(omv.x - 2, omv.y);
COST_MV(omv.x - 1, omv.y + 2);
COST_MV(omv.x + 1, omv.y + 2);
COST_MV(omv.x + 2, omv.y);
COST_MV(omv.x + 1, omv.y - 2);
COST_MV(omv.x - 1, omv.y - 2);
if (omv == bmv)
break;
if (!bmv.checkRange(mvmin, mvmax))
break;
}
#else // if 0
/* equivalent to the above, but eliminates duplicate candidates */
/*
The definition of COST_MV_X3_DIR is shown below; it is very similar to the previous macro, except that it evaluates 3 points at a time
#define COST_MV_X3_DIR(m0x, m0y, m1x, m1y, m2x, m2y, costs) \
{ \
pixel *pix_base = fref + bmv.x + bmv.y * stride; \
sad_x3(fenc, \
pix_base + (m0x) + (m0y) * stride, \
pix_base + (m1x) + (m1y) * stride, \
pix_base + (m2x) + (m2y) * stride, \
stride, costs); \
(costs)[0] += mvcost((bmv + MV(m0x, m0y)) << 2); \
(costs)[1] += mvcost((bmv + MV(m1x, m1y)) << 2); \
(costs)[2] += mvcost((bmv + MV(m2x, m2y)) << 2); \
}
*/
COST_MV_X3_DIR(-2, 0, -1, 2, 1, 2, costs);
bcost <<= 3;
if ((bmv.y >= mvmin.y) & (bmv.y <= mvmax.y))
COPY1_IF_LT(bcost, (costs[0] << 3) + 2); // position 1
if ((bmv.y + 2 >= mvmin.y) & (bmv.y + 2 <= mvmax.y))
{
COPY1_IF_LT(bcost, (costs[1] << 3) + 3); // position 2
COPY1_IF_LT(bcost, (costs[2] << 3) + 4); // position 3
}
COST_MV_X3_DIR(2, 0, 1, -2, -1, -2, costs);
if ((bmv.y >= mvmin.y) & (bmv.y <= mvmax.y))
COPY1_IF_LT(bcost, (costs[0] << 3) + 5); // position 4
if ((bmv.y - 2 >= mvmin.y) & (bmv.y - 2 <= mvmax.y))
{
COPY1_IF_LT(bcost, (costs[1] << 3) + 6); // position 5
COPY1_IF_LT(bcost, (costs[2] << 3) + 7); // position 6
}
// did one of the 6 positions above improve on the current best cost?
if (bcost & 7)
{
int dir = (bcost & 7) - 2; // record the best position
// const MV hex2[8] = { MV(-1, -2), MV(-2, 0), MV(-1, 2), MV(1, 2), MV(2, 0), MV(1, -2), MV(-1, -2), MV(-2, 0) };
if ((bmv.y + hex2[dir + 1].y >= mvmin.y) & (bmv.y + hex2[dir + 1].y <= mvmax.y))
{
bmv += hex2[dir + 1]; // move bmv to the best position
/* half hexagon, not overlapping the previous iteration */
// continue with a half-hexagon search centred on the best direction dir found above
for (int i = (merange >> 1) - 1; i > 0 && bmv.checkRange(mvmin, mvmax); i--)
{
/*
Suppose the best position recorded above was position 1, i.e. dir = 0; then
(1) dir + 0 => position 5
(2) dir + 1 => position 1
(3) dir + 2 => position 2
*/
COST_MV_X3_DIR(hex2[dir + 0].x, hex2[dir + 0].y,
hex2[dir + 1].x, hex2[dir + 1].y,
hex2[dir + 2].x, hex2[dir + 2].y,
costs);
bcost &= ~7;
if ((bmv.y + hex2[dir + 0].y >= mvmin.y) & (bmv.y + hex2[dir + 0].y <= mvmax.y))
COPY1_IF_LT(bcost, (costs[0] << 3) + 1);
if ((bmv.y + hex2[dir + 1].y >= mvmin.y) & (bmv.y + hex2[dir + 1].y <= mvmax.y))
COPY1_IF_LT(bcost, (costs[1] << 3) + 2);
if ((bmv.y + hex2[dir + 2].y >= mvmin.y) & (bmv.y + hex2[dir + 2].y <= mvmax.y))
COPY1_IF_LT(bcost, (costs[2] << 3) + 3);
if (!(bcost & 7))
break;
dir += (bcost & 7) - 2;
dir = mod6m1[dir + 1];
bmv += hex2[dir + 1];
}
} // if ((bmv.y + hex2[dir + 1].y >= mvmin.y) & (bmv.y + hex2[dir + 1].y <= mvmax.y))
}
bcost >>= 3; // recover the true best cost
#endif // if 0
/* square refine */
// square refinement to obtain a more precise MV
/*
The square search pattern is:
6 2 7
3 0 4
5 1 8
*/
int dir = 0;
COST_MV_X4_DIR(0, -1, 0, 1, -1, 0, 1, 0, costs);
if ((bmv.y - 1 >= mvmin.y) & (bmv.y - 1 <= mvmax.y))
COPY2_IF_LT(bcost, costs[0], dir, 1);
if ((bmv.y + 1 >= mvmin.y) & (bmv.y + 1 <= mvmax.y))
COPY2_IF_LT(bcost, costs[1], dir, 2);
COPY2_IF_LT(bcost, costs[2], dir, 3);
COPY2_IF_LT(bcost, costs[3], dir, 4);
COST_MV_X4_DIR(-1, -1, -1, 1, 1, -1, 1, 1, costs);
if ((bmv.y - 1 >= mvmin.y) & (bmv.y - 1 <= mvmax.y))
COPY2_IF_LT(bcost, costs[0], dir, 5);
if ((bmv.y + 1 >= mvmin.y) & (bmv.y + 1 <= mvmax.y))
COPY2_IF_LT(bcost, costs[1], dir, 6);
if ((bmv.y - 1 >= mvmin.y) & (bmv.y - 1 <= mvmax.y))
COPY2_IF_LT(bcost, costs[2], dir, 7);
if ((bmv.y + 1 >= mvmin.y) & (bmv.y + 1 <= mvmax.y))
COPY2_IF_LT(bcost, costs[3], dir, 8);
// const MV square1[9] = { MV(0, 0), MV(0, -1), MV(0, 1), MV(-1, 0), MV(1, 0), MV(-1, -1), MV(-1, 1), MV(1, -1), MV(1, 1) };
bmv += square1[dir];
break;
}
case X265_UMH_SEARCH: // uneven multi-hexagon search (fairly complex, not examined here)
{
// ...
}
case X265_STAR_SEARCH: // Adapted from HM ME
{ // star search (used by the slow preset and slower, not examined here)
// ...
}
case X265_SEA: // Successive Elimination Algorithm
{
// ...
}
case X265_FULL_SEARCH: // exhaustive full search
{
// ...
}
default:
X265_CHECK(0, "invalid motion estimate mode\n");
break;
}
/*
3. Sub-pel search
Compare the best cost among the neighbour MVs with the best cost found by the motion search:
(1) if a neighbour MV performs better, i.e. bprecost < bcost, discard the searched MV and use the neighbour MV instead
(2) otherwise, continue the sub-pel search from the MV found by the search
*/
if (bprecost < bcost)
{
bmv = bestpre;
bcost = bprecost;
}
else
bmv = bmv.toQPel(); // promote search bmv to qpel
const SubpelWorkload& wl = workload[this->subpelRefine];
// check mv range for slice bound
// check whether the MV crosses a slice boundary; with the usual one-slice-per-frame configuration this should rarely happen
if ((maxSlices > 1) & ((bmv.y < qmvmin.y) | (bmv.y > qmvmax.y)))
{
bmv.y = x265_min(x265_max(bmv.y, qmvmin.y), qmvmax.y);
bcost = subpelCompare(ref, bmv, satd) + mvcost(bmv);
}
if (!bcost) // zero residual: skip the sub-pel refinement; the returned cost then only contains the MV bit cost
{
/* if there was zero residual at the clipped MVP, we can skip subpel
* refine, but we do need to include the mvcost in the returned cost */
bcost = mvcost(bmv);
}
else if (ref->isLowres) // low-resolution (lookahead) picture
{
// ..
}
else
{
pixelcmp_t hpelcomp;
// choose whether SATD or SAD measures the half-pel distortion (SATD by default)
if (wl.hpel_satd)
{
bcost = subpelCompare(ref, bmv, satd) + mvcost(bmv);
hpelcomp = satd;
}
else
hpelcomp = sad;
// half-pel motion search
for (int iter = 0; iter < wl.hpel_iters; iter++)
{
int bdir = 0;
for (int i = 1; i <= wl.hpel_dirs; i++)
{
// search following the square pattern
MV qmv = bmv + square1[i] * 2;
// check mv range for slice bound
if ((qmv.y < qmvmin.y) | (qmv.y > qmvmax.y))
continue;
// compute the cost and keep track of the best MV
int cost = subpelCompare(ref, qmv, hpelcomp) + mvcost(qmv);
COPY2_IF_LT(bcost, cost, bdir, i);
}
if (bdir)
bmv += square1[bdir] * 2;
else
break;
}
/* if HPEL search used SAD, remeasure with SATD before QPEL */
// if the half-pel search used SAD, the cost must be re-measured with SATD before the quarter-pel stage, because quarter-pel search uses SATD
if (!wl.hpel_satd)
bcost = subpelCompare(ref, bmv, satd) + mvcost(bmv);
// quarter-pel motion search
for (int iter = 0; iter < wl.qpel_iters; iter++)
{
int bdir = 0;
for (int i = 1; i <= wl.qpel_dirs; i++)
{
MV qmv = bmv + square1[i];
// check mv range for slice bound
if ((qmv.y < qmvmin.y) | (qmv.y > qmvmax.y))
continue;
int cost = subpelCompare(ref, qmv, satd) + mvcost(qmv);
COPY2_IF_LT(bcost, cost, bdir, i);
}
if (bdir)
bmv += square1[bdir];
else
break;
}
}
// check mv range for slice bound
X265_CHECK(((bmv.y >= qmvmin.y) & (bmv.y <= qmvmax.y)), "mv beyond range!");
x265_emms();
outQMv = bmv;
return bcost;
}
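Finally, the cost/direction bit-packing used by the diamond and hexagon searches above deserves a standalone look. The tiny demo below packs a cost and a 4-bit position code into one integer, compares, then decodes the step, mirroring the (bcost << 28) >> 30 trick in motionEstimate; it relies on two's-complement arithmetic right shifts exactly as the x265 code does, and all variable names are illustrative.
#include <cstdio>
int main()
{
    // Pack: cost in the upper bits, a 4-bit position code in the low 4 bits.
    int bcost = 1000 << 4;     // current best cost with position code 0 (the centre point)
    int costRight = 900;       // a cheaper candidate found at the "right" neighbour
    int posRight = 12;         // 12 -> binary 1100 encodes the step MV(+1, 0) in the diamond search
    if (((costRight << 4) + posRight) < bcost)
        bcost = (costRight << 4) + posRight;   // one integer compare tracks cost and position together
    // Decode the packed position back into a signed (x, y) step, as motionEstimate does.
    int dx = -((bcost << 28) >> 30);  // bits [3:2] read as a 2-bit signed value -> +1 (right)
    int dy = -((bcost << 30) >> 30);  // bits [1:0] read as a 2-bit signed value -> 0
    int cost = bcost >> 4;            // strip the position code to recover the raw cost
    std::printf("step = (%d, %d), cost = %d\n", dx, dy, cost); // prints: step = (1, 0), cost = 900
    return 0;
}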
2.3 Intra mode within P frames (checkIntraInInter)
If the inter modes produce a relatively large distortion, some blocks in a P frame may still choose intra mode; the overall flow is essentially the same as regular intra prediction.
/* Note that this function does not save the best intra prediction, it must
* be generated later. It records the best mode in the cu */
void Search::checkIntraInInter(Mode& intraMode, const CUGeom& cuGeom)
{
ProfileCUScope(intraMode.cu, intraAnalysisElapsedTime, countIntraAnalysis);
CUData& cu = intraMode.cu;
uint32_t depth = cuGeom.depth;
cu.setPartSizeSubParts(SIZE_2Nx2N);
cu.setPredModeSubParts(MODE_INTRA);
const uint32_t initTuDepth = 0;
uint32_t log2TrSize = cuGeom.log2CUSize - initTuDepth;
uint32_t tuSize = 1 << log2TrSize;
const uint32_t absPartIdx = 0;
// Reference sample smoothing
IntraNeighbors intraNeighbors;
initIntraNeighbors(cu, absPartIdx, initTuDepth, true, &intraNeighbors);
initAdiPattern(cu, cuGeom, absPartIdx, intraNeighbors, ALL_IDX);
const pixel* fenc = intraMode.fencYuv->m_buf[0];
uint32_t stride = intraMode.fencYuv->m_size;
int sad, bsad;
uint32_t bits, bbits, mode, bmode;
uint64_t cost, bcost;
// 33 Angle modes once
int scaleTuSize = tuSize;
int scaleStride = stride;
int costShift = 0;
int sizeIdx = log2TrSize - 2;
if (tuSize > 32) // is the CU 64x64?
{
// CU is 64x64, we scale to 32x32 and adjust required parameters
primitives.scale2D_64to32(m_fencScaled, fenc, stride);
fenc = m_fencScaled;
pixel nScale[129];
intraNeighbourBuf[1][0] = intraNeighbourBuf[0][0];
primitives.scale1D_128to64[NONALIGNED](nScale + 1, intraNeighbourBuf[0] + 1);
// we do not estimate filtering for downscaled samples
memcpy(&intraNeighbourBuf[0][1], &nScale[1], 2 * 64 * sizeof(pixel)); // Top & Left pixels
memcpy(&intraNeighbourBuf[1][1], &nScale[1], 2 * 64 * sizeof(pixel));
scaleTuSize = 32;
scaleStride = 32;
costShift = 2;
sizeIdx = 5 - 2; // log2(scaleTuSize) - 2
}
pixelcmp_t sa8d = primitives.cu[sizeIdx].sa8d;
int predsize = scaleTuSize * scaleTuSize;
m_entropyCoder.loadIntraDirModeLuma(m_rqt[depth].cur);
/* there are three cost tiers for intra modes:
* pred[0] - mode probable, least cost
* pred[1], pred[2] - less probable, slightly more cost
* non-mpm modes - all cost the same (rbits) */
// initialise the MPM (most probable mode) list
uint64_t mpms;
uint32_t mpmModes[3];
uint32_t rbits = getIntraRemModeBits(cu, absPartIdx, mpmModes, mpms);
// DC
// evaluate the DC prediction mode
primitives.cu[sizeIdx].intra_pred[DC_IDX](m_intraPredAngs, scaleStride, intraNeighbourBuf[0], 0, (scaleTuSize <= 16));
bsad = sa8d(fenc, scaleStride, m_intraPredAngs, scaleStride) << costShift;
bmode = mode = DC_IDX;
bbits = (mpms & ((uint64_t)1 << mode)) ? m_entropyCoder.bitsIntraModeMPM(mpmModes, mode) : rbits;
bcost = m_rdCost.calcRdSADCost(bsad, bbits);
// PLANAR
// evaluate the Planar prediction mode
pixel* planar = intraNeighbourBuf[0];
if (tuSize & (8 | 16 | 32))
planar = intraNeighbourBuf[1];
primitives.cu[sizeIdx].intra_pred[PLANAR_IDX](m_intraPredAngs, scaleStride, planar, 0, 0);
sad = sa8d(fenc, scaleStride, m_intraPredAngs, scaleStride) << costShift;
mode = PLANAR_IDX;
bits = (mpms & ((uint64_t)1 << mode)) ? m_entropyCoder.bitsIntraModeMPM(mpmModes, mode) : rbits;
cost = m_rdCost.calcRdSADCost(sad, bits);
COPY4_IF_LT(bcost, cost, bmode, mode, bsad, sad, bbits, bits);
bool allangs = true;
if (primitives.cu[sizeIdx].intra_pred_allangs)
{
primitives.cu[sizeIdx].transpose(m_fencTransposed, fenc, scaleStride);
primitives.cu[sizeIdx].intra_pred_allangs(m_intraPredAngs, intraNeighbourBuf[0], intraNeighbourBuf[1], (scaleTuSize <= 16));
}
else
allangs = false;
// macro that evaluates a single angular mode
#define TRY_ANGLE(angle) \
if (allangs) { \
if (angle < 18) \
sad = sa8d(m_fencTransposed, scaleTuSize, &m_intraPredAngs[(angle - 2) * predsize], scaleTuSize) << costShift; \
else \
sad = sa8d(fenc, scaleStride, &m_intraPredAngs[(angle - 2) * predsize], scaleTuSize) << costShift; \
bits = (mpms & ((uint64_t)1 << angle)) ? m_entropyCoder.bitsIntraModeMPM(mpmModes, angle) : rbits; \
cost = m_rdCost.calcRdSADCost(sad, bits); \
} else { \
int filter = !!(g_intraFilterFlags[angle] & scaleTuSize); \
primitives.cu[sizeIdx].intra_pred[angle](m_intraPredAngs, scaleTuSize, intraNeighbourBuf[filter], angle, scaleTuSize <= 16); \
sad = sa8d(fenc, scaleStride, m_intraPredAngs, scaleTuSize) << costShift; \
bits = (mpms & ((uint64_t)1 << angle)) ? m_entropyCoder.bitsIntraModeMPM(mpmModes, angle) : rbits; \
cost = m_rdCost.calcRdSADCost(sad, bits); \
}
// whether fast intra search is enabled
if (m_param->bEnableFastIntra)
{
int asad = 0;
uint32_t lowmode, highmode, amode = 5, abits = 0;
uint64_t acost = MAX_INT64;
/* pick the best angle, sampling at distance of 5 */
for (mode = 5; mode < 35; mode += 5)
{
TRY_ANGLE(mode);
COPY4_IF_LT(acost, cost, amode, mode, asad, sad, abits, bits);
}
/* refine best angle at distance 2, then distance 1 */
for (uint32_t dist = 2; dist >= 1; dist--)
{
lowmode = amode - dist;
highmode = amode + dist;
X265_CHECK(lowmode >= 2 && lowmode <= 34, "low intra mode out of range\n");
TRY_ANGLE(lowmode);
COPY4_IF_LT(acost, cost, amode, lowmode, asad, sad, abits, bits);
X265_CHECK(highmode >= 2 && highmode <= 34, "high intra mode out of range\n");
TRY_ANGLE(highmode);
COPY4_IF_LT(acost, cost, amode, highmode, asad, sad, abits, bits);
}
if (amode == 33)
{
TRY_ANGLE(34);
COPY4_IF_LT(acost, cost, amode, 34, asad, sad, abits, bits);
}
COPY4_IF_LT(bcost, acost, bmode, amode, bsad, asad, bbits, abits);
}
else // calculate and search all intra prediction angles for lowest cost
{
// iterate over all 33 angular modes (DC and Planar were evaluated above)
for (mode = 2; mode < 35; mode++)
{
TRY_ANGLE(mode);
COPY4_IF_LT(bcost, cost, bmode, mode, bsad, sad, bbits, bits);
}
}
cu.setLumaIntraDirSubParts((uint8_t)bmode, absPartIdx, depth + initTuDepth);
intraMode.initCosts();
intraMode.totalBits = bbits;
intraMode.distortion = bsad;
intraMode.sa8dCost = bcost;
intraMode.sa8dBits = bbits;
}
This concludes the brief analysis of inter prediction in x265.