Series index
【x264 encoder】Chapter 1 — x264 encoding flow and an encoder demo based on x264
【x264 encoder】Chapter 2 — Analysis of the x264 lookahead flow
【x265 encoder】Chapter 2 — Encoding flow and an encoder demo based on x265
Contents
1. Adding a picture to the lookahead module: Lookahead::addPicture()
2. Checking the lookahead queue: Lookahead::checkLookaheadQueue()
3. Fetching a decided picture: Lookahead::getDecidedPicture()
4. Finding and running work: Lookahead::findJob()
5. Slice-type decision: Lookahead::slicetypeDecide()
6. Task dispatch: PreLookaheadGroup::processTasks
7. Lowres intra estimation: LookaheadTLD::lowresIntraEstimate()
8. Slice-type analysis: Lookahead::slicetypeAnalyse
9. Lowres frame cost estimation: CostEstimateGroup::estimateFrameCost
10. Lowres per-CU inter estimation: CostEstimateGroup::estimateCUCost
11. VBV rate and buffer: Lookahead::vbvLookahead
13. Frame-structure path cost: Lookahead::slicetypePathCost
14. Building and processing the CU tree: Lookahead::cuTree
Preface
The complete x265 pipeline is shown below:
I. Module functionality
In x265, lookahead is a technique for improving encoding efficiency and quality: it analyses future frames so that better decisions can be made while encoding the current frame. Like the main encode, it performs both intra and inter prediction, with two differences:
1. It works at quarter resolution (half the original width and height); as in x264, the lookahead CU size is 8x8, and both intra and inter prediction are performed;
2. During inter prediction, CUs are traversed bottom-to-top and right-to-left; see estimateFrameCost() for details.
The module has four main responsibilities: scene-cut detection, frame-structure decision, CU tree, and VBV.
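To make the lowres geometry concrete, here is a small sketch of the sizing described above (the rounding is illustrative; `lowresDims` is a helper written for this post, not x265 API):

```cpp
#include <cassert>

// Illustrative only: derive the lowres picture size (half width, half height,
// i.e. one quarter of the pixels) and the resulting grid of 8x8 lowres CUs.
struct LowresDims { int width, height, widthInCU, heightInCU; };

LowresDims lowresDims(int srcWidth, int srcHeight)
{
    const int cuSize = 8;                 // lowres CU size used by lookahead
    LowresDims d;
    d.width  = srcWidth  / 2;             // half resolution in each dimension
    d.height = srcHeight / 2;
    d.widthInCU  = (d.width  + cuSize - 1) / cuSize;  // round up to whole CUs
    d.heightInCU = (d.height + cuSize - 1) / cuSize;
    return d;
}
```

For a 1920x1080 source this gives a 960x540 lowres plane and a 120x68 grid of 8x8 CUs.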
1. Scene-cut detection
The basic flow is similar to x264's scenecut, with differences in the details: the first-pass search and filtering match the x264 scheme, while the subsequent handling differs. The overall flow of x265 scene-cut detection is shown below; see Lookahead::scenecut for the detailed code analysis:
The intra and inter cost computations used during scene-cut detection are shown below (intra on the left, inter on the right):
2. Frame-structure decision
There are currently three frame-structure schemes: X265_B_ADAPT_NONE, X265_B_ADAPT_FAST and X265_B_ADAPT_TRELLIS. Their overall flows are:
X265_B_ADAPT_NONE: identical to the x264 scheme; frames are laid out in a fixed IBBBPBBBP pattern. See Lookahead::slicetypeAnalyse.
X265_B_ADAPT_FAST: broadly the same as the x264 scheme, with some differences. The first step computes the cost of the BP and PP frame structures and picks the cheaper type; the following steps check whether every frame in the range (i+2, bframes) should be a B. If they are all B frames, a P is appended at the end and becomes the start of the next round, and the expansion repeats forward. See Lookahead::slicetypeAnalyse.
X265_B_ADAPT_TRELLIS: same as the x264 scheme. It keeps the best frame structure found for each previous length; by inserting 0 to bframes B frames it obtains the optimal scheme for the current length, then builds the optimal scheme for length + 1 on top of it, iterating until the whole GOP is processed. See Lookahead::slicetypePath.
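The trellis idea can be sketched as a small dynamic program. Here the hypothetical `segCost` values stand in for the SATD-based path costs that Lookahead::slicetypePath actually evaluates; the function is an illustration, not x265 code:

```cpp
#include <cassert>
#include <vector>
#include <algorithm>
#include <climits>

// Illustrative DP in the spirit of B_ADAPT_TRELLIS.
// best[i] = cheapest way to structure the first i frames, where each segment
// is b consecutive B-frames followed by one P-frame (segment length b + 1).
// segCost[b] stands in for the SATD-based cost of such a segment.
long long bestPathCost(int numFrames, int maxB, const std::vector<long long>& segCost)
{
    std::vector<long long> best(numFrames + 1, LLONG_MAX);
    best[0] = 0;
    for (int i = 1; i <= numFrames; i++)
        for (int b = 0; b <= maxB && b + 1 <= i; b++)
            if (best[i - b - 1] != LLONG_MAX)
                best[i] = std::min(best[i], best[i - b - 1] + segCost[b]);
    return best[numFrames];
}
```

With segment costs {P: 10, BP: 7, BBP: 9}, four frames are cheapest as BP + BP (cost 14): each length reuses the best solutions of shorter lengths, exactly the iteration described above.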
3. CU tree
The CU tree is essentially the same as x264's MB tree. A simple way to describe its purpose: frames reference each other, so if a referenced frame has higher quality, then improving that one frame improves a whole chain of frames. The CU tree therefore measures a frame's importance by how heavily it is referenced, i.e. how much information it propagates to other frames.
Because propagated information accumulates, the CU tree traverses frames in reverse order when computing it. For example, if b references p0, the information propagated to p0 follows this formula:
propagate = (propagate_in + intra_cost * inv_qscale * fps_factor) * (1 - inter_cost / intra_cost) * dist_scale_factor
Propagated information must be positive, so only CUs with inter_cost < intra_cost (i.e. CUs that chose inter prediction) contribute; the smaller inter_cost is relative to intra_cost, the more information is propagated. CUs coded in intra mode propagate 0.
propagate_in: information that frame b, acting as a reference for other frames, has itself received; it is added as a correction when computing p0's propagated information;
dist_scale_factor: a distance-based scale factor correcting the formula for reference distance;
inv_qscale: inverse quantization scale; heavily quantized (blurrier) blocks carry less information, so a correction factor is needed. Quantization divides coefficients by QStep;
fps_factor: for variable frame rates, frames occupy different amounts of display time; a frame shown for longer matters more, so this factor corrects for duration;
The corresponding code is below, called from the Lookahead::estimateCUPropagate path:
/* Estimate the total amount of influence on future quality that could be had if we
* were to improve the reference samples used to inter predict any given CU. */
static void estimateCUPropagateCost(int* dst, const uint16_t* propagateIn, const int32_t* intraCosts, const uint16_t* interCosts,
const int32_t* invQscales, const double* fpsFactor, int len)
{
double fps = *fpsFactor / 256; // range[0.01, 1.00]
for (int i = 0; i < len; i++)
{
int intraCost = intraCosts[i];
int interCost = X265_MIN(intraCosts[i], interCosts[i] & LOWRES_COST_MASK);
double propagateIntra = intraCost * invQscales[i]; // Q16 x Q8.8 = Q24.8
double propagateAmount = (double)propagateIn[i] + propagateIntra * fps; // Q16.0 + Q24.8 x Q0.x = Q25.0
double propagateNum = (double)(intraCost - interCost); // Q32 - Q32 = Q33.0
double propagateDenom = (double)intraCost; // Q32
dst[i] = (int)(propagateAmount * propagateNum / propagateDenom + 0.5);
}
}
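To make the fixed-point bookkeeping concrete, here is a one-CU walk through the same arithmetic with made-up inputs (256 represents 1.0 in the Q8.8 invQscale and fpsFactor domains; `propagateCostOneCU` is a helper written for this post, not an x265 function):

```cpp
#include <cassert>
#include <cstdint>
#include <algorithm>

// One-CU version of the loop above, with illustrative inputs.
int propagateCostOneCU(uint16_t propagateIn, int32_t intraCost,
                       int32_t interCost, int32_t invQscale, double fpsFactor)
{
    double fps = fpsFactor / 256;                        // Q8.8 -> real
    int32_t inter = std::min(intraCost, interCost);      // inter never exceeds intra
    double propagateIntra  = (double)intraCost * invQscale;          // information the CU carries
    double propagateAmount = (double)propagateIn + propagateIntra * fps;
    double num   = (double)(intraCost - inter);          // share that is inherited
    double denom = (double)intraCost;
    return (int)(propagateAmount * num / denom + 0.5);
}
```

With propagateIn = 50, intraCost = 100, interCost = 60, invQscale = 256 and fpsFactor = 256, the CU propagates (50 + 25600) * 0.4 = 10260; an intra-coded CU (interCost >= intraCost) propagates 0, matching the discussion above.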
The QP is then adjusted according to the propagated information. Reconstructing the formula from the code below:
qpCuTreeOffset = qpAqOffset - cuTreeStrength * (log2(intra + propagate) - log2(intra))
where propagate is the information passed on to later frames and intra is the frame's own information. The external qcompress parameter controls the adjustment strength: qcompress = 0 means pure ABR with QP adjusted freely, while qcompress = 1 means a constant quantizer (fixed QP). fpsFactor again handles variable frame rates: the longer the current frame stays on screen, the more it matters.
Corresponding code:
void Lookahead::cuTreeFinish(Lowres *frame, double averageDuration, int ref0Distance)
{ // surrounding code omitted
for (int cuIndex = 0; cuIndex < m_cuCount; cuIndex++)
{ // the CU's intra cost (the information the block itself carries)
int intracost = (frame->intraCost[cuIndex] * frame->invQscaleFactor[cuIndex] + 128) >> 8;
if (intracost)
{ // propagateCost (the information passed on to later frames)
int propagateCost = (frame->propagateCost[cuIndex] * fpsFactor + 128) >> 8;
double log2_ratio = X265_LOG2(intracost + propagateCost) - X265_LOG2(intracost) + weightdelta;
frame->qpCuTreeOffset[cuIndex] = frame->qpAqOffset[cuIndex] - m_cuTreeStrength * log2_ratio;
}
}
}
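A quick numeric check of the adjustment (`qpCuTreeOffset` below is a standalone helper with illustrative inputs; the strength value used in the example is an assumption, not an x265 default):

```cpp
#include <cassert>
#include <cmath>

// Per-CU mirror of the adjustment in cuTreeFinish, on illustrative numbers.
// 256 plays the role of 1.0 in the Q8.8 invQscale / fpsFactor domains.
double qpCuTreeOffset(int intraCost, int invQscale, int propagateCostRaw,
                      int fpsFactor, double qpAqOffset, double strength)
{
    int intra = (intraCost * invQscale + 128) >> 8;            // CU's own information
    if (!intra)
        return qpAqOffset;
    int propagate = (propagateCostRaw * fpsFactor + 128) >> 8; // inherited information
    double log2Ratio = std::log2((double)(intra + propagate)) - std::log2((double)intra);
    return qpAqOffset - strength * log2Ratio;                  // heavily-referenced CUs get lower QP
}
```

With intra information 100 and propagated information 5120, the log2 ratio is about 5.7, so a strength of 2.0 lowers the QP offset by roughly 11.4: CUs that feed many later frames are encoded at higher quality.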
4. VBV
Basically the same as x264: Lookahead::vbvLookahead computes lowres plannedSatd values, which feed the VBV rate control during actual encoding.
5. Overall lookahead flow
Basically the same as x264.
II. Lookahead module analysis
1. Call flow
The lookahead flow is shown in figure 1 below; its relationship to the overall x265 pipeline is shown in figure 2 (the yellow part):
The complete x265 encoding flow:
2. Code analysis
1. Adding a picture to the lookahead module: Lookahead::addPicture()
addPicture(): called by the API thread to add a picture to the lookahead module.
void Lookahead::addPicture(Frame& curFrame, int sliceType)
{ // if analysisLoad is set and the lookahead is disabled (bDisableLookahead), push the picture straight onto the output queue and bump the m_inputCount counter
if (m_param->analysisLoad && m_param->bDisableLookahead)
{
if (!m_filled)
m_filled = true;
m_outputLock.acquire();
m_outputQueue.pushBack(curFrame);
m_outputLock.release();
m_inputCount++;
}
//otherwise call checkLookaheadQueue() to check the input-queue state, then add the picture to the lookahead module
else
{
checkLookaheadQueue(m_inputCount);
curFrame.m_lowres.sliceType = sliceType;
addPicture(curFrame);
}
}
2. Checking the lookahead queue: Lookahead::checkLookaheadQueue()
Checks the state of the lookahead queue. The code is explained below:
void Lookahead::checkLookaheadQueue(int &frameCnt)
{
/* determine if the lookahead is (over) filled enough for frames to begin to
* be consumed by frame encoders */
//if m_filled is false (the lookahead queue is not yet full):
if (!m_filled)
{ // if both bframes and lookaheadDepth are zero we are in zero-latency mode; set m_filled to true (the lookahead queue counts as full)
if (!m_param->bframes & !m_param->lookaheadDepth)
m_filled = true; /* zero-latency */
//otherwise, once the number of frames received (frameCnt) reaches lookaheadDepth + 2 + bframes, set m_filled to true (full capacity plus the mini-GOP lag)
else if (frameCnt >= m_param->lookaheadDepth + 2 + m_param->bframes)
m_filled = true; /* full capacity plus mini-gop lag */
}
m_inputLock.acquire();
//if a thread pool exists (m_pool) and the input queue (m_inputQueue) has reached m_fullQueueSize, try to wake one worker
if (m_pool && m_inputQueue.size() >= m_fullQueueSize)
tryWakeOne();
m_inputLock.release();
}
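The fill condition can be checked in isolation (`lookaheadFilled` is a standalone predicate mirroring the logic above, not x265 API; the parameter values in the example are hypothetical):

```cpp
#include <cassert>

// Illustrative predicate matching checkLookaheadQueue's fill logic.
bool lookaheadFilled(int frameCnt, int bframes, int lookaheadDepth)
{
    if (!bframes && !lookaheadDepth)
        return true;                                   // zero-latency mode
    return frameCnt >= lookaheadDepth + 2 + bframes;   // full capacity + mini-GOP lag
}
```

With lookaheadDepth = 20 and bframes = 4, the lookahead reports itself full only once 26 frames have been queued.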
3. Fetching a decided picture: Lookahead::getDecidedPicture()
This is the part of the lookahead module that fetches a decided picture from the output queue. The method removes a picture from the output queue and blocks only when no picture is available. It starts removing pictures only once m_filled is true, and m_filled is set only after more pictures than the lookahead depth have been input, so slicetypeDecide() should already be running before output pictures are consumed. The very first slicetypeDecide() obviously still has to block, but subsequent calls stay ahead of the encoder (each picture removed from the output queue adds one to the input queue) and decide slice types before the encoder needs them. The code is explained below:
Frame* Lookahead::getDecidedPicture()
{ //check whether m_filled is true, i.e. whether enough pictures have been fed for output to begin
if (m_filled)//pictures can now be taken from the output queue
{ //take the output lock (m_outputLock) to access the output queue safely
m_outputLock.acquire();
//pop a picture from the output queue with popFront() and store it in the pointer out
Frame *out = m_outputQueue.popFront();
//release the output lock
m_outputLock.release();
//if a picture was obtained (out is non-null), decrement m_inputCount and return it
if (out)
{
m_inputCount--;
return out;
}
//if no picture was obtained (out is null), decide from analysisLoad and bDisableLookahead whether slicetypeDecide() needs to run
if (m_param->analysisLoad && m_param->bDisableLookahead)
return NULL;
findJob(-1); /* run slicetypeDecide() if necessary */
m_inputLock.acquire();
//use the busy flag of slicetypeDecide() (m_sliceTypeBusy) to decide whether to wait for the output signal
bool wait = m_outputSignalRequired = m_sliceTypeBusy;
m_inputLock.release();
//if a wait is needed, block on wait() until the signal arrives
if (wait)
m_outputSignal.wait();
//pop a picture from the output queue again with popFront() and store it in out
out = m_outputQueue.popFront();
//if a picture was obtained (out is non-null), decrement m_inputCount
if (out)
m_inputCount--;
return out;
}
else//not enough pictures have been queued yet; return a null pointer
return NULL;
}
4. Finding and running work: Lookahead::findJob()
Finds and executes work. The method polls the occupancy of the input queue; when the queue is full it runs slicetypeDecide() and emits a batch of frames (a mini-GOP) to the output queue. Once flush() has been called (meaning no more pictures will arrive), the input queue is considered full as long as even one picture remains in it.
void Lookahead::findJob(int /*workerThreadID*/)
{
bool doDecide;
//take the input lock (m_inputLock) to access the shared state safely
m_inputLock.acquire();
//if the input queue size (m_inputQueue.size()) has reached m_fullQueueSize, no slice-type job is running (!m_sliceTypeBusy) and the lookahead is active (m_isActive), set both doDecide and m_sliceTypeBusy to true
if (m_inputQueue.size() >= m_fullQueueSize && !m_sliceTypeBusy && m_isActive)
doDecide = m_sliceTypeBusy = true;
else//otherwise set doDecide to false and clear m_helpWanted
doDecide = m_helpWanted = false;
m_inputLock.release();//release the input lock
if (!doDecide)
return;
//record timing and counters for the slice-type decision
ProfileLookaheadTime(m_slicetypeDecideElapsedTime, m_countSlicetypeDecide);
ProfileScopeEvent(slicetypeDecideEV);
//run the actual slice-type decision (slicetypeDecide())
slicetypeDecide();
//take the input lock
m_inputLock.acquire();
//if an output signal is pending (m_outputSignalRequired), trigger it (m_outputSignal.trigger()) and clear the flag
if (m_outputSignalRequired)
{
m_outputSignal.trigger();
m_outputSignalRequired = false;
}
m_sliceTypeBusy = false;
m_inputLock.release();//release the input lock
}
5. Slice-type decision: Lookahead::slicetypeDecide()
The code is explained below:
void Lookahead::slicetypeDecide()
{ //create a PreLookaheadGroup instance pre, passing a reference to this Lookahead object
PreLookaheadGroup pre(*this);
//declare the Lowres pointer array frames and the Frame pointer array list, and zero both
Lowres* frames[X265_LOOKAHEAD_MAX + X265_BFRAME_MAX + 4];
Frame* list[X265_BFRAME_MAX + 4];
memset(frames, 0, sizeof(frames));
memset(list, 0, sizeof(list));
//compute the maximum search range maxSearch as the minimum of m_param->lookaheadDepth and X265_LOOKAHEAD_MAX, clamped to at least 1
int maxSearch = X265_MIN(m_param->lookaheadDepth, X265_LOOKAHEAD_MAX);
maxSearch = X265_MAX(1, maxSearch);
{ //take exclusive access to the input lock m_inputLock
ScopedLock lock(m_inputLock);
//get the current frame curFrame at the head of the input queue, and declare the integer j
Frame *curFrame = m_inputQueue.first();
int j;
if (m_param->bResetZoneConfig)
{ //iterate over every zone configuration in m_param->rc.zones
for (int i = 0; i < m_param->rc.zonefileCount; i++)
{ //if the current frame's m_poc equals a zone's startFrame, switch m_param to that zone's zoneParam
if (m_param->rc.zones[i].startFrame == curFrame->m_poc)
m_param = m_param->rc.zones[i].zoneParam;
}
}
//loop m_param->bframes + 2 times, appending curFrame to list and advancing curFrame to the next frame
for (j = 0; j < m_param->bframes + 2; j++)
{
if (!curFrame) break;
list[j] = curFrame;
curFrame = curFrame->m_next;
}
//reset curFrame to the head of the input queue, and set frames[0] to m_lastNonB
curFrame = m_inputQueue.first();
frames[0] = m_lastNonB;
//loop up to maxSearch times, storing each frame's lowres (curFrame->m_lowres) at the corresponding slot of frames
for (j = 0; j < maxSearch; j++)
{
if (!curFrame) break;
frames[j + 1] = &curFrame->m_lowres;
//if a frame's lowres is not yet initialized, add the frame to pre.m_preframes and bump pre.m_jobTotal
if (!curFrame->m_lowresInit)
pre.m_preframes[pre.m_jobTotal++] = curFrame;
curFrame = curFrame->m_next;
}
//update maxSearch to the number of frames actually traversed
maxSearch = j;
//end of the input-lock scope
}
//if any frames need pre-analysis (pre.m_jobTotal > 0), do the following
/* perform pre-analysis on frames which need it, using a bonded task group */
if (pre.m_jobTotal)
{ //if the thread pool m_pool exists, try to bond worker threads to the pre-analysis tasks
if (m_pool)
pre.tryBondPeers(*m_pool, pre.m_jobTotal);
//run the pre-analysis tasks via pre.processTasks(-1)
pre.processTasks(-1);
//wait for all tasks to finish
pre.waitForExit();
}
//handle the encoder's frame list according to the fade-in detection setting
if(m_param->bEnableFades)
{ //initialize endIndex, length and the m_frameVariance array
int j, endIndex = 0, length = X265_BFRAME_MAX + 4;
for (j = 0; j < length; j++)
m_frameVariance[j] = -1;
//walk the frame list, storing each frame's lowres variance (frameVariance) at the corresponding slot of m_frameVariance
for (j = 0; list[j] != NULL; j++)
m_frameVariance[list[j]->m_poc % length] = list[j]->m_lowres.frameVariance;
//use the m_frameVariance values to decide whether a fade-in region exists; iterate index k over m_frameVariance and:
for (int k = list[0]->m_poc % length; k <= list[j - 1]->m_poc % length; k++)
{ //if m_frameVariance[k] is -1, break out of the loop
if (m_frameVariance[k] == -1)
break;
//if k > 0 and m_frameVariance[k] >= the previous entry, or k == 0 and m_frameVariance[k] >= m_frameVariance[length - 1] (the last entry of the array), we have entered a fade-in region
if((k > 0 && m_frameVariance[k] >= m_frameVariance[k - 1]) ||
(k == 0 && m_frameVariance[k] >= m_frameVariance[length - 1]))
{
m_isFadeIn = true;
//if m_fadeCount and m_fadeStart still hold their initial values (0 and -1), derive m_fadeStart from the POC (Picture Order Count) values of the frames in the list
if (m_fadeCount == 0 && m_fadeStart == -1)
{
for(int temp = list[0]->m_poc; temp <= list[j - 1]->m_poc; temp++)
if (k == temp % length) {
m_fadeStart = temp ? temp - 1 : 0;
break;
}
}
//update m_fadeCount to list[endIndex]->m_poc - m_fadeStart, where endIndex indexes the current frame list
m_fadeCount = list[endIndex]->m_poc > m_fadeStart ? list[endIndex]->m_poc - m_fadeStart : 0;
endIndex++;
}
else
{ //otherwise, if we were inside a fade-in and m_fadeCount >= m_param->fpsNum / m_param->fpsDenom (frames per second), the fade-in has ended; set m_lowres.bIsFadeEnd to true on the frame that closes it
if (m_isFadeIn && m_fadeCount >= m_param->fpsNum / m_param->fpsDenom)
{
for (int temp = 0; list[temp] != NULL; temp++)
{
if (list[temp]->m_poc == m_fadeStart + (int)m_fadeCount)
{
list[temp]->m_lowres.bIsFadeEnd = true;
break;
}
}
}
m_isFadeIn = false;
m_fadeCount = 0;
m_fadeStart = -1;
}
//if k is the last index (length - 1), reset k to -1 so it wraps to 0 on the next iteration
if (k == length - 1)
k = -1;
}
}
//frame analysis and rate-control work, performed when the conditions below hold
/* first, the code checks the following conditions: */
if (m_lastNonB &&
((m_param->bFrameAdaptive && m_param->bframes) ||
m_param->rc.cuTree || m_param->scenecutThreshold || m_param->bHistBasedSceneCut ||
(m_param->lookaheadDepth && m_param->rc.vbvBufferSize)))
{ //if m_param->rc.bStatRead is false, call slicetypeAnalyse to analyse the frames
if (!m_param->rc.bStatRead)
slicetypeAnalyse(frames, false);
//decide from several conditions whether VBV (Video Buffering Verifier) lookahead is needed
bool bIsVbv = m_param->rc.vbvBufferSize > 0 && m_param->rc.vbvMaxBitrate > 0;
if ((m_param->analysisLoad && m_param->scaleFactor && bIsVbv) || m_param->bliveVBV2pass)
{
int numFrames;
//walk the frames list until maxSearch is reached or a null frame is found, incrementing numFrames each time.
for (numFrames = 0; numFrames < maxSearch; numFrames++)
{
Lowres *fenc = frames[numFrames + 1];
if (!fenc)
break;
}
//call vbvLookahead with frames, numFrames and false to run the VBV lookahead
vbvLookahead(frames, numFrames, false);
}
}
int bframes, brefs;
if (!m_param->analysisLoad || m_param->bAnalysisType == HEVC_INFO)
{
bool isClosedGopRadl = m_param->radl && (m_param->keyframeMax != m_param->keyframeMin);
for (bframes = 0, brefs = 0;; bframes++)
{
Lowres& frm = list[bframes]->m_lowres;
if (frm.sliceType == X265_TYPE_BREF && !m_param->bBPyramid && brefs == m_param->bBPyramid)
{
frm.sliceType = X265_TYPE_B;
x265_log(m_param, X265_LOG_WARNING, "B-ref at frame %d incompatible with B-pyramid\n",
frm.frameNum);
}
/* pyramid with multiple B-refs needs a big enough dpb that the preceding P-frame stays available.
* smaller dpb could be supported by smart enough use of mmco, but it's easier just to forbid it. */
else if (frm.sliceType == X265_TYPE_BREF && m_param->bBPyramid && brefs &&
m_param->maxNumReferences <= (brefs + 3))
{
frm.sliceType = X265_TYPE_B;
x265_log(m_param, X265_LOG_WARNING, "B-ref at frame %d incompatible with B-pyramid and %d reference frames\n",
frm.sliceType, m_param->maxNumReferences);
}//check whether the distance between frm.frameNum and the last keyframe satisfies m_param->keyframeMax and m_extendGopBoundary; depending on the conditions, change the slice type to X265_TYPE_I or X265_TYPE_IDR
if (((!m_param->bIntraRefresh || frm.frameNum == 0) && frm.frameNum - m_lastKeyframe >= m_param->keyframeMax &&
(!m_extendGopBoundary || frm.frameNum - m_lastKeyframe >= m_param->keyframeMax + m_param->gopLookahead)) ||
(frm.frameNum == (m_param->chunkStart - 1)) || (frm.frameNum == m_param->chunkEnd))
{
if (frm.sliceType == X265_TYPE_AUTO || frm.sliceType == X265_TYPE_I)
frm.sliceType = m_param->bOpenGOP && m_lastKeyframe >= 0 ? X265_TYPE_I : X265_TYPE_IDR;
bool warn = frm.sliceType != X265_TYPE_IDR;
if (warn && m_param->bOpenGOP)
warn &= frm.sliceType != X265_TYPE_I;
if (warn)
{
x265_log(m_param, X265_LOG_WARNING, "specified frame type (%d) at %d is not compatible with keyframe interval\n",
frm.sliceType, frm.frameNum);
frm.sliceType = m_param->bOpenGOP && m_lastKeyframe >= 0 ? X265_TYPE_I : X265_TYPE_IDR;
}
}
if (frm.bIsFadeEnd){
frm.sliceType = m_param->bOpenGOP && m_lastKeyframe >= 0 ? X265_TYPE_I : X265_TYPE_IDR;
}
if (m_param->bResetZoneConfig)
{
for (int i = 0; i < m_param->rc.zonefileCount; i++)
{
int curZoneStart = m_param->rc.zones[i].startFrame;
curZoneStart += curZoneStart ? m_param->rc.zones[i].zoneParam->radl : 0;
if (curZoneStart == frm.frameNum)
frm.sliceType = X265_TYPE_IDR;
}
}
if ((frm.sliceType == X265_TYPE_I && frm.frameNum - m_lastKeyframe >= m_param->keyframeMin) || (frm.frameNum == (m_param->chunkStart - 1)) || (frm.frameNum == m_param->chunkEnd))
{
if (m_param->bOpenGOP)
{
m_lastKeyframe = frm.frameNum;
frm.bKeyframe = true;
}
else
frm.sliceType = X265_TYPE_IDR;
}
if (frm.sliceType == X265_TYPE_IDR && frm.bScenecut && isClosedGopRadl)
{
for (int i = bframes; i < bframes + m_param->radl; i++)
list[i]->m_lowres.sliceType = X265_TYPE_B;
list[(bframes + m_param->radl)]->m_lowres.sliceType = X265_TYPE_IDR;
}
if (frm.sliceType == X265_TYPE_IDR)
{
/* Closed GOP */
m_lastKeyframe = frm.frameNum;
frm.bKeyframe = true;
int zoneRadl = 0;
if (m_param->bResetZoneConfig)
{
for (int i = 0; i < m_param->rc.zonefileCount; i++)
{
int zoneStart = m_param->rc.zones[i].startFrame;
zoneStart += zoneStart ? m_param->rc.zones[i].zoneParam->radl : 0;
if (zoneStart == frm.frameNum)
{
zoneRadl = m_param->rc.zones[i].zoneParam->radl;
m_param->radl = 0;
m_param->rc.zones->zoneParam->radl = i < m_param->rc.zonefileCount - 1 ? m_param->rc.zones[i + 1].zoneParam->radl : 0;
break;
}
}
}
if (bframes > 0 && !m_param->radl && !zoneRadl)
{
list[bframes - 1]->m_lowres.sliceType = X265_TYPE_P;
bframes--;
}
}
if (bframes == m_param->bframes || !list[bframes + 1])
{
if (IS_X265_TYPE_B(frm.sliceType))
x265_log(m_param, X265_LOG_WARNING, "specified frame type is not compatible with max B-frames\n");
if (frm.sliceType == X265_TYPE_AUTO || IS_X265_TYPE_B(frm.sliceType))
frm.sliceType = X265_TYPE_P;
}
if (frm.sliceType == X265_TYPE_BREF)
brefs++;
if (frm.sliceType == X265_TYPE_AUTO)
frm.sliceType = X265_TYPE_B;
else if (!IS_X265_TYPE_B(frm.sliceType))
break;
}
}
else
{
for (bframes = 0, brefs = 0;; bframes++)
{
Lowres& frm = list[bframes]->m_lowres;
if (frm.sliceType == X265_TYPE_BREF)
brefs++;
if ((IS_X265_TYPE_I(frm.sliceType) && frm.frameNum - m_lastKeyframe >= m_param->keyframeMin)
|| (frm.frameNum == (m_param->chunkStart - 1)) || (frm.frameNum == m_param->chunkEnd))
{
m_lastKeyframe = frm.frameNum;
frm.bKeyframe = true;
}
if (!IS_X265_TYPE_B(frm.sliceType))
break;
}
}
if (m_param->bEnableTemporalSubLayers > 2)
{
//Split the partial mini GOP into sub mini GOPs when temporal sub layers are enabled
if (bframes < m_param->bframes)
{
int leftOver = bframes + 1;
int8_t gopId = m_gopId - 1;
int gopLen = x265_gop_ra_length[gopId];
int listReset = 0;
m_outputLock.acquire();
while ((gopId >= 0) && (leftOver > 3))
{
if (leftOver < gopLen)
{
gopId = gopId - 1;
gopLen = x265_gop_ra_length[gopId];
continue;
}
else
{
int newbFrames = listReset + gopLen - 1;
//Re-assign GOP
list[newbFrames]->m_lowres.sliceType = IS_X265_TYPE_I(list[newbFrames]->m_lowres.sliceType) ? list[newbFrames]->m_lowres.sliceType : X265_TYPE_P;
if (newbFrames)
list[newbFrames - 1]->m_lowres.bLastMiniGopBFrame = true;
list[newbFrames]->m_lowres.leadingBframes = newbFrames;
m_lastNonB = &list[newbFrames]->m_lowres;
/* insert a bref into the sequence */
if (m_param->bBPyramid && newbFrames)
{
placeBref(list, listReset, newbFrames, newbFrames + 1, &brefs);
}
if (m_param->rc.rateControlMode != X265_RC_CQP)
{
int p0, p1, b;
/* For zero latency tuning, calculate frame cost to be used later in RC */
if (!maxSearch)
{
for (int i = listReset; i <= newbFrames; i++)
frames[i + 1] = &list[listReset + i]->m_lowres;
}
/* estimate new non-B cost */
p1 = b = newbFrames + 1;
p0 = (IS_X265_TYPE_I(frames[newbFrames + 1]->sliceType)) ? b : listReset;
CostEstimateGroup estGroup(*this, frames);
estGroup.singleCost(p0, p1, b);
if (newbFrames)
compCostBref(frames, listReset, newbFrames, newbFrames + 1);
}
m_inputLock.acquire();
/* dequeue all frames from inputQueue that are about to be enqueued
* in the output queue. The order is important because Frame can
* only be in one list at a time */
int64_t pts[X265_BFRAME_MAX + 1];
for (int i = 0; i < gopLen; i++)
{
Frame *curFrame;
curFrame = m_inputQueue.popFront();
pts[i] = curFrame->m_pts;
maxSearch--;
}
m_inputLock.release();
int idx = 0;
/* add non-B to output queue */
list[newbFrames]->m_reorderedPts = pts[idx++];
list[newbFrames]->m_gopOffset = 0;
list[newbFrames]->m_gopId = gopId;
list[newbFrames]->m_tempLayer = x265_gop_ra[gopId][0].layer;
m_outputQueue.pushBack(*list[newbFrames]);
/* add B frames to output queue */
int i = 1, j = 1;
while (i < gopLen)
{
int offset = listReset + (x265_gop_ra[gopId][j].poc_offset - 1);
if (!list[offset] || offset == newbFrames)
continue;
// Assign gop offset and temporal layer of frames
list[offset]->m_gopOffset = j;
list[bframes]->m_gopId = gopId;
list[offset]->m_tempLayer = x265_gop_ra[gopId][j++].layer;
list[offset]->m_reorderedPts = pts[idx++];
m_outputQueue.pushBack(*list[offset]);
i++;
}
listReset += gopLen;
leftOver = leftOver - gopLen;
gopId -= 1;
gopLen = (gopId >= 0) ? x265_gop_ra_length[gopId] : 0;
}
}
if (leftOver > 0 && leftOver < 4)
{
int64_t pts[X265_BFRAME_MAX + 1];
int idx = 0;
int newbFrames = listReset + leftOver - 1;
list[newbFrames]->m_lowres.sliceType = IS_X265_TYPE_I(list[newbFrames]->m_lowres.sliceType) ? list[newbFrames]->m_lowres.sliceType : X265_TYPE_P;
if (newbFrames)
list[newbFrames - 1]->m_lowres.bLastMiniGopBFrame = true;
list[newbFrames]->m_lowres.leadingBframes = newbFrames;
m_lastNonB = &list[newbFrames]->m_lowres;
/* insert a bref into the sequence */
if (m_param->bBPyramid && (newbFrames- listReset) > 1)
placeBref(list, listReset, newbFrames, newbFrames + 1, &brefs);
if (m_param->rc.rateControlMode != X265_RC_CQP)
{
int p0, p1, b;
/* For zero latency tuning, calculate frame cost to be used later in RC */
if (!maxSearch)
{
for (int i = listReset; i <= newbFrames; i++)
frames[i + 1] = &list[listReset + i]->m_lowres;
}
/* estimate new non-B cost */
p1 = b = newbFrames + 1;
p0 = (IS_X265_TYPE_I(frames[newbFrames + 1]->sliceType)) ? b : listReset;
CostEstimateGroup estGroup(*this, frames);
estGroup.singleCost(p0, p1, b);
if (newbFrames)
compCostBref(frames, listReset, newbFrames, newbFrames + 1);
}
m_inputLock.acquire();
/* dequeue all frames from inputQueue that are about to be enqueued
* in the output queue. The order is important because Frame can
* only be in one list at a time */
for (int i = 0; i < leftOver; i++)
{
Frame *curFrame;
curFrame = m_inputQueue.popFront();
pts[i] = curFrame->m_pts;
maxSearch--;
}
m_inputLock.release();
m_lastNonB = &list[newbFrames]->m_lowres;
list[newbFrames]->m_reorderedPts = pts[idx++];
list[newbFrames]->m_gopOffset = 0;
list[newbFrames]->m_gopId = -1;
list[newbFrames]->m_tempLayer = 0;
m_outputQueue.pushBack(*list[newbFrames]);
if (brefs)
{
for (int i = listReset; i < newbFrames; i++)
{
if (list[i]->m_lowres.sliceType == X265_TYPE_BREF)
{
list[i]->m_reorderedPts = pts[idx++];
list[i]->m_gopOffset = 0;
list[i]->m_gopId = -1;
list[i]->m_tempLayer = 0;
m_outputQueue.pushBack(*list[i]);
}
}
}
/* add B frames to output queue */
for (int i = listReset; i < newbFrames; i++)
{
/* push all the B frames into output queue except B-ref, which already pushed into output queue */
if (list[i]->m_lowres.sliceType != X265_TYPE_BREF)
{
list[i]->m_reorderedPts = pts[idx++];
list[i]->m_gopOffset = 0;
list[i]->m_gopId = -1;
list[i]->m_tempLayer = 1;
m_outputQueue.pushBack(*list[i]);
}
}
}
}
else
// Fill the complete mini GOP when temporal sub layers are enabled
{
list[bframes - 1]->m_lowres.bLastMiniGopBFrame = true;
list[bframes]->m_lowres.leadingBframes = bframes;
m_lastNonB = &list[bframes]->m_lowres;
/* insert a bref into the sequence */
if (m_param->bBPyramid && !brefs)
{
placeBref(list, 0, bframes, bframes + 1, &brefs);
}
/* calculate the frame costs ahead of time for estimateFrameCost while we still have lowres */
if (m_param->rc.rateControlMode != X265_RC_CQP)
{
int p0, p1, b;
/* For zero latency tuning, calculate frame cost to be used later in RC */
if (!maxSearch)
{
for (int i = 0; i <= bframes; i++)
frames[i + 1] = &list[i]->m_lowres;
}
/* estimate new non-B cost */
p1 = b = bframes + 1;
p0 = (IS_X265_TYPE_I(frames[bframes + 1]->sliceType)) ? b : 0;
CostEstimateGroup estGroup(*this, frames);
estGroup.singleCost(p0, p1, b);
compCostBref(frames, 0, bframes, bframes + 1);
}
m_inputLock.acquire();
/* dequeue all frames from inputQueue that are about to be enqueued
* in the output queue. The order is important because Frame can
* only be in one list at a time */
int64_t pts[X265_BFRAME_MAX + 1];
for (int i = 0; i <= bframes; i++)
{
Frame *curFrame;
curFrame = m_inputQueue.popFront();
pts[i] = curFrame->m_pts;
maxSearch--;
}
m_inputLock.release();
m_outputLock.acquire();
int idx = 0;
/* add non-B to output queue */
list[bframes]->m_reorderedPts = pts[idx++];
list[bframes]->m_gopOffset = 0;
list[bframes]->m_gopId = m_gopId;
list[bframes]->m_tempLayer = x265_gop_ra[m_gopId][0].layer;
m_outputQueue.pushBack(*list[bframes]);
int i = 1, j = 1;
while (i <= bframes)
{
int offset = x265_gop_ra[m_gopId][j].poc_offset - 1;
if (!list[offset] || offset == bframes)
continue;
// Assign gop offset and temporal layer of frames
list[offset]->m_gopOffset = j;
list[offset]->m_gopId = m_gopId;
list[offset]->m_tempLayer = x265_gop_ra[m_gopId][j++].layer;
/* add B frames to output queue */
list[offset]->m_reorderedPts = pts[idx++];
m_outputQueue.pushBack(*list[offset]);
i++;
}
}
bool isKeyFrameAnalyse = (m_param->rc.cuTree || (m_param->rc.vbvBufferSize && m_param->lookaheadDepth));
if (isKeyFrameAnalyse && IS_X265_TYPE_I(m_lastNonB->sliceType))
{
m_inputLock.acquire();
Frame *curFrame = m_inputQueue.first();
frames[0] = m_lastNonB;
int j;
for (j = 0; j < maxSearch; j++)
{
frames[j + 1] = &curFrame->m_lowres;
curFrame = curFrame->m_next;
}
m_inputLock.release();
frames[j + 1] = NULL;
if (!m_param->rc.bStatRead)
slicetypeAnalyse(frames, true);
bool bIsVbv = m_param->rc.vbvBufferSize > 0 && m_param->rc.vbvMaxBitrate > 0;
if ((m_param->analysisLoad && m_param->scaleFactor && bIsVbv) || m_param->bliveVBV2pass)
{
int numFrames;
for (numFrames = 0; numFrames < maxSearch; numFrames++)
{
Lowres *fenc = frames[numFrames + 1];
if (!fenc)
break;
}
vbvLookahead(frames, numFrames, true);
}
}
m_outputLock.release();
}
else
{
if (bframes)
list[bframes - 1]->m_lowres.bLastMiniGopBFrame = true;
list[bframes]->m_lowres.leadingBframes = bframes;
m_lastNonB = &list[bframes]->m_lowres;
//the next block inserts a B reference frame: if m_param->bBPyramid is true, bframes > 1 and brefs is 0, placeBref inserts a B-ref into the sequence
/* insert a bref into the sequence */
if (m_param->bBPyramid && bframes > 1 && !brefs)
{
placeBref(list, 0, bframes, bframes + 1, &brefs);
}
/* calculate the frame costs ahead of time for estimateFrameCost while we still have lowres */
if (m_param->rc.rateControlMode != X265_RC_CQP)
{
int p0, p1, b;
/* For zero latency tuning, calculate frame cost to be used later in RC */
if (!maxSearch)
{
for (int i = 0; i <= bframes; i++)
frames[i + 1] = &list[i]->m_lowres;
}
/* estimate new non-B cost */
p1 = b = bframes + 1;
p0 = (IS_X265_TYPE_I(frames[bframes + 1]->sliceType)) ? b : 0;
CostEstimateGroup estGroup(*this, frames);
estGroup.singleCost(p0, p1, b);
if (m_param->bEnableTemporalSubLayers > 1 && bframes)
{
compCostBref(frames, 0, bframes, bframes + 1);
}
else
{
if (bframes)
{
p0 = 0; // last nonb
bool isp0available = frames[bframes + 1]->sliceType == X265_TYPE_IDR ? false : true;
for (b = 1; b <= bframes; b++)
{
if (!isp0available)
p0 = b;
if (frames[b]->sliceType == X265_TYPE_B)
for (p1 = b; frames[p1]->sliceType == X265_TYPE_B; p1++)
; // find new nonb or bref
else
p1 = bframes + 1;
estGroup.singleCost(p0, p1, b);
if (frames[b]->sliceType == X265_TYPE_BREF)
{
p0 = b;
isp0available = true;
}
}
}
}
}
//lock m_inputLock to ensure thread safety
m_inputLock.acquire();
/* dequeue all frames from inputQueue that are about to be enqueued
* in the output queue. The order is important because Frame can
* only be in one list at a time */
int64_t pts[X265_BFRAME_MAX + 1];
for (int i = 0; i <= bframes; i++)
{
Frame *curFrame;
curFrame = m_inputQueue.popFront();
pts[i] = curFrame->m_pts;
maxSearch--;
}
m_inputLock.release();
m_outputLock.acquire();
/* add non-B to output queue */
int idx = 0;
list[bframes]->m_reorderedPts = pts[idx++];
m_outputQueue.pushBack(*list[bframes]);
//if there are B reference frames (brefs is nonzero), walk list, find the frames of type X265_TYPE_BREF and push them to m_outputQueue, taking their timestamps from the pts array
/* Add B-ref frame next to P frame in output queue, the B-ref encode before non B-ref frame */
if (brefs)
{
for (int i = 0; i < bframes; i++)
{
if (list[i]->m_lowres.sliceType == X265_TYPE_BREF)
{
list[i]->m_reorderedPts = pts[idx++];
m_outputQueue.pushBack(*list[i]);
}
}
}
//walk the B frames (excluding B-refs), push them to m_outputQueue and take their timestamps from the pts array
/* add B frames to output queue */
for (int i = 0; i < bframes; i++)
{
/* push all the B frames into output queue except B-ref, which already pushed into output queue */
if (list[i]->m_lowres.sliceType != X265_TYPE_BREF)
{
list[i]->m_reorderedPts = pts[idx++];
m_outputQueue.pushBack(*list[i]);
}
}
//if isKeyFrameAnalyse is true and the last non-B frame is an I frame, enter the keyframe analysis logic
bool isKeyFrameAnalyse = (m_param->rc.cuTree || (m_param->rc.vbvBufferSize && m_param->lookaheadDepth));
if (isKeyFrameAnalyse && IS_X265_TYPE_I(m_lastNonB->sliceType))
{
m_inputLock.acquire();
Frame *curFrame = m_inputQueue.first();
frames[0] = m_lastNonB;
int j;
for (j = 0; j < maxSearch; j++)
{
frames[j + 1] = &curFrame->m_lowres;
curFrame = curFrame->m_next;
}
m_inputLock.release();
frames[j + 1] = NULL;
if (!m_param->rc.bStatRead)
slicetypeAnalyse(frames, true);
bool bIsVbv = m_param->rc.vbvBufferSize > 0 && m_param->rc.vbvMaxBitrate > 0;
if ((m_param->analysisLoad && m_param->scaleFactor && bIsVbv) || m_param->bliveVBV2pass)
{
int numFrames;
for (numFrames = 0; numFrames < maxSearch; numFrames++)
{
Lowres *fenc = frames[numFrames + 1];
if (!fenc)
break;
}
vbvLookahead(frames, numFrames, true);
}
}
m_outputLock.release();
}
}
6. Task dispatch: PreLookaheadGroup::processTasks
This is the processTasks function of the PreLookaheadGroup class. The code is explained below:
void PreLookaheadGroup::processTasks(int workerThreadID)
{
//if workerThreadID is negative, set it to the number of workers in m_lookahead's thread pool, or 0 if there is no pool
if (workerThreadID < 0)
workerThreadID = m_lookahead.m_pool ? m_lookahead.m_pool->m_numWorkers : 0;
//get the LookaheadTLD reference tld for this worker thread, i.e. the thread-local data used by pre-analysis
LookaheadTLD& tld = m_lookahead.m_tld[workerThreadID];
//take exclusive access to the lock m_lock
m_lock.acquire();
//loop while the number of acquired jobs m_jobAcquired is below the total m_jobTotal
while (m_jobAcquired < m_jobTotal)
{ //take the next pre-analysis frame preFrame and increment m_jobAcquired
Frame* preFrame = m_preframes[m_jobAcquired++];
//profile the start of the pre-analysis task
ProfileLookaheadTime(m_lookahead.m_preLookaheadElapsedTime, m_lookahead.m_countPreLookahead);
ProfileScopeEvent(prelookahead);
//release the lock m_lock
m_lock.release();
//initialize the frame's lowres preFrame->m_lowres from preFrame->m_fencPic and preFrame->m_poc
preFrame->m_lowres.init(preFrame->m_fencPic, preFrame->m_poc);
//if adaptive quantization is enabled (m_lookahead.m_bAdaptiveQuant), call tld.calcAdaptiveQuantFrame to compute the AQ frame
if (m_lookahead.m_bAdaptiveQuant)
tld.calcAdaptiveQuantFrame(preFrame, m_lookahead.m_param);
//if histogram-based scene-cut detection is enabled (m_lookahead.m_param->bHistBasedSceneCut), call tld.collectPictureStatistics to gather picture statistics
if (m_lookahead.m_param->bHistBasedSceneCut)
tld.collectPictureStatistics(preFrame);
//call tld.lowresIntraEstimate to run intra estimation on the lowres frame
tld.lowresIntraEstimate(preFrame->m_lowres, m_lookahead.m_param->rc.qgSize);
preFrame->m_lowresInit = true;
//re-acquire the lock m_lock
m_lock.acquire();
}
//release the lock m_lock
m_lock.release();
}
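The acquire-under-lock, work-outside-lock pattern used here generalizes well; a minimal sketch with standard C++ primitives (this illustrates the pattern only and is not x265's ThreadPool or BondedTaskGroup API):

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Minimal bonded-task-group style loop: several workers claim job indices
// under a lock, then do the actual work with the lock released.
struct JobGroup {
    std::mutex lock;
    int jobAcquired = 0;
    const int jobTotal;
    std::vector<int> done;                 // one slot per job, written once
    explicit JobGroup(int n) : jobTotal(n), done(n, 0) {}

    void processTasks()
    {
        std::unique_lock<std::mutex> lk(lock);
        while (jobAcquired < jobTotal) {
            int job = jobAcquired++;       // claim a job under the lock
            lk.unlock();                   // heavy work happens outside the lock
            done[job] = 1;                 // stand-in for lowres init / intra estimation
            lk.lock();                     // re-acquire before claiming the next job
        }
    }
};
```

Every job is processed exactly once no matter how many workers call processTasks() concurrently, because claiming the index and advancing the counter happen atomically under the lock.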
7. Lowres intra estimation: LookaheadTLD::lowresIntraEstimate()
This method performs intra estimation on a lowres frame. The code is explained below:
void LookaheadTLD::lowresIntraEstimate(Lowres& fenc, uint32_t qgSize)
{ //declare locals and constants: the pixel arrays prediction, fencIntra and neighbours, plus the pointers samples and filtered into the two halves of neighbours
ALIGN_VAR_32(pixel, prediction[X265_LOWRES_CU_SIZE * X265_LOWRES_CU_SIZE]);
pixel fencIntra[X265_LOWRES_CU_SIZE * X265_LOWRES_CU_SIZE];
pixel neighbours[2][X265_LOWRES_CU_SIZE * 4 + 1];
pixel* samples = neighbours[0], *filtered = neighbours[1];
//initialize parameters: the lambda used for prediction, the intra penalty, the CU (Coding Unit) size and index, etc.
const int lookAheadLambda = (int)x265_lambda_tab[X265_LOOKAHEAD_QP];
const int intraPenalty = 5 * lookAheadLambda;
const int lowresPenalty = 4; /* fixed CU cost overhead */
const int cuSize = X265_LOWRES_CU_SIZE;
const int cuSize2 = cuSize << 1;
const int sizeIdx = X265_LOWRES_CU_BITS - 2;
pixelcmp_t satd = primitives.pu[sizeIdx].satd;
int planar = !!(cuSize >= 8);
int costEst = 0, costEstAq = 0;
//loop over each CU row coordinate cuY from 0 to heightInCU - 1
for (int cuY = 0; cuY < heightInCU; cuY++)
{
fenc.rowSatds[0][0][cuY] = 0;
//loop over each CU column coordinate cuX from 0 to widthInCU - 1
for (int cuX = 0; cuX < widthInCU; cuX++)
{ //compute the CU index cuXY and the pixel offset pelOffset for this CU
const int cuXY = cuX + cuY * widthInCU;
const intptr_t pelOffset = cuSize * cuX + cuSize * cuY * fenc.lumaStride;
pixel *pixCur = fenc.lowresPlane[0] + pelOffset;
/* copy fenc pixels *///copy the CU's pixels into the fencIntra array
primitives.cu[sizeIdx].copy_pp(fencIntra, cuSize, pixCur, fenc.lumaStride);
/* collect reference sample pixels */
//Gather neighbouring reference samples into the samples array: the row above, then the column to the left
pixCur -= fenc.lumaStride + 1;
memcpy(samples, pixCur, (2 * cuSize + 1) * sizeof(pixel)); /* top */
for (int i = 1; i <= 2 * cuSize; i++)
samples[cuSize2 + i] = pixCur[i * fenc.lumaStride]; /* left */
primitives.cu[sizeIdx].intra_filter(samples, filtered);
int cost, icost = me.COST_MAX;
uint32_t ilowmode = 0;
//Try DC and planar prediction; keep the mode with the lower SATD (Sum of Absolute Transformed Differences) cost of the prediction residual
/* DC and planar */
primitives.cu[sizeIdx].intra_pred[DC_IDX](prediction, cuSize, samples, 0, cuSize <= 16);
cost = satd(fencIntra, cuSize, prediction, cuSize);
COPY2_IF_LT(icost, cost, ilowmode, DC_IDX);
primitives.cu[sizeIdx].intra_pred[PLANAR_IDX](prediction, cuSize, neighbours[planar], 0, 0);
cost = satd(fencIntra, cuSize, prediction, cuSize);
COPY2_IF_LT(icost, cost, ilowmode, PLANAR_IDX);
/* scan angular predictions */
int filter, acost = me.COST_MAX;
uint32_t mode, alowmode = 4;
//Coarse scan of the angular modes (every 5th mode), keeping the lowest SATD cost
for (mode = 5; mode < 35; mode += 5)
{
filter = !!(g_intraFilterFlags[mode] & cuSize);
primitives.cu[sizeIdx].intra_pred[mode](prediction, cuSize, neighbours[filter], mode, cuSize <= 16);
cost = satd(fencIntra, cuSize, prediction, cuSize);
COPY2_IF_LT(acost, cost, alowmode, mode);
}
//Refine around the coarse winner: probe the modes at distance 2, then distance 1, keeping the lowest SATD cost
for (uint32_t dist = 2; dist >= 1; dist--)
{
int minusmode = alowmode - dist;
int plusmode = alowmode + dist;
mode = minusmode;
filter = !!(g_intraFilterFlags[mode] & cuSize);
primitives.cu[sizeIdx].intra_pred[mode](prediction, cuSize, neighbours[filter], mode, cuSize <= 16);
cost = satd(fencIntra, cuSize, prediction, cuSize);
COPY2_IF_LT(acost, cost, alowmode, mode);
mode = plusmode;
filter = !!(g_intraFilterFlags[mode] & cuSize);
primitives.cu[sizeIdx].intra_pred[mode](prediction, cuSize, neighbours[filter], mode, cuSize <= 16);
cost = satd(fencIntra, cuSize, prediction, cuSize);
COPY2_IF_LT(acost, cost, alowmode, mode);
}
COPY2_IF_LT(icost, acost, ilowmode, alowmode);
//Add the mode-signalling and fixed CU overhead penalties, then record the intra cost and chosen mode
icost += intraPenalty + lowresPenalty; /* estimate intra signal cost */
fenc.lowresCosts[0][0][cuXY] = (uint16_t)(X265_MIN(icost, LOWRES_COST_MASK) | (0 << LOWRES_COST_SHIFT));
fenc.intraCost[cuXY] = icost;
fenc.intraMode[cuXY] = (uint8_t)ilowmode;
/* do not include edge blocks in the frame cost estimates, they are not very accurate */
//Accumulate the intra cost into the whole-frame estimate only for non-edge CUs
const bool bFrameScoreCU = (cuX > 0 && cuX < widthInCU - 1 &&
cuY > 0 && cuY < heightInCU - 1) || widthInCU <= 2 || heightInCU <= 2;
int icostAq;
if (qgSize == 8)
icostAq = (bFrameScoreCU && fenc.invQscaleFactor) ? ((icost * fenc.invQscaleFactor8x8[cuXY] + 128) >> 8) : icost;
else
icostAq = (bFrameScoreCU && fenc.invQscaleFactor) ? ((icost * fenc.invQscaleFactor[cuXY] + 128) >> 8) : icost;
if (bFrameScoreCU)
{
costEst += icost;
costEstAq += icostAq;
}
fenc.rowSatds[0][0][cuY] += icostAq;
}
}
//Store the whole-frame cost estimates
fenc.costEst[0][0] = costEst;
fenc.costEstAq[0][0] = costEstAq;
}
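The two-stage angular search above (a coarse scan of every 5th mode, then refinement at distance 2 and 1 around the winner) can be isolated into a small sketch. The cost callback is a hypothetical stand-in for the SATD-of-prediction call; the bounds check is added here for safety and is not in the x265 loop:

```cpp
#include <cassert>
#include <climits>
#include <functional>
#include <initializer_list>

// Sketch of the coarse-then-refine angular-mode search: scan modes
// 5, 10, ..., 30 in steps of 5, then probe the neighbours at distance
// 2 and 1 around the best so far. `cost` stands in for predicting the
// block with a mode and measuring SATD against the source.
int searchAngularMode(const std::function<int(int)>& cost, int* bestMode)
{
    int acost = INT_MAX;
    int alowmode = 4;
    // coarse pass: every 5th angular mode
    for (int mode = 5; mode < 35; mode += 5)
    {
        int c = cost(mode);
        if (c < acost) { acost = c; alowmode = mode; }
    }
    // refinement: distance 2, then distance 1, around the current winner
    for (int dist = 2; dist >= 1; dist--)
    {
        for (int mode : { alowmode - dist, alowmode + dist })
        {
            if (mode < 2 || mode > 34)   // bounds check added for safety
                continue;
            int c = cost(mode);
            if (c < acost) { acost = c; alowmode = mode; }
        }
    }
    *bestMode = alowmode;
    return acost;
}
```

With a cost surface that dips at mode 13, the coarse pass lands on 15 and the distance-2 probe then finds 13, showing why 7 coarse probes plus at most 4 refinements are enough to approximate a full 33-mode scan at lookahead precision.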
8. Slice-type analysis: Lookahead::slicetypeAnalyse
This method decides the slice type (I/P/B) of each undecided frame in the lookahead window:
void Lookahead::slicetypeAnalyse(Lowres **frames, bool bKeyframe)
{
int numFrames, origNumFrames, keyintLimit, framecnt;
//Cap the search depth maxSearch at the smaller of m_param->lookaheadDepth and X265_LOOKAHEAD_MAX
int maxSearch = X265_MIN(m_param->lookaheadDepth, X265_LOOKAHEAD_MAX);
int cuCount = m_8x8Blocks;
int resetStart;
bool bIsVbvLookahead = m_param->rc.vbvBufferSize && m_param->lookaheadDepth;
/* count undecided frames */
//Count undecided frames: walk the frames list until maxSearch is reached or a frame already has a slice type other than X265_TYPE_AUTO, incrementing framecnt
for (framecnt = 0; framecnt < maxSearch; framecnt++)
{
Lowres *fenc = frames[framecnt + 1];
if (!fenc || fenc->sliceType != X265_TYPE_AUTO)
break;
}
//framecnt == 0 means there are no undecided frames: run cuTree if enabled, then return
if (!framecnt)
{
if (m_param->rc.cuTree)
cuTree(frames, 0, bKeyframe);
return;
}//Terminate the list by setting frames[framecnt + 1] to NULL
frames[framecnt + 1] = NULL;
//If zone-config reset is enabled (m_param->bResetZoneConfig), update m_param->keyframeMax from the matching zone
if (m_param->bResetZoneConfig)
{
for (int i = 0; i < m_param->rc.zonefileCount; i++)
{
int curZoneStart = m_param->rc.zones[i].startFrame, nextZoneStart = 0;
curZoneStart += curZoneStart ? m_param->rc.zones[i].zoneParam->radl : 0;
nextZoneStart += (i + 1 < m_param->rc.zonefileCount) ? m_param->rc.zones[i + 1].startFrame + m_param->rc.zones[i + 1].zoneParam->radl : m_param->totalFrames;
if (curZoneStart <= frames[0]->frameNum && nextZoneStart > frames[0]->frameNum)
m_param->keyframeMax = nextZoneStart - curZoneStart;
if (m_param->rc.zones[m_param->rc.zonefileCount - 1].startFrame <= frames[0]->frameNum && nextZoneStart == 0)
m_param->keyframeMax = m_param->rc.zones[0].keyframeMax;
}
}//Adjust keylimit using the chunk boundaries around the current frame number
int keylimit = m_param->keyframeMax;
if (frames[0]->frameNum < m_param->chunkEnd)
{
int chunkStart = (m_param->chunkStart - m_lastKeyframe - 1);
int chunkEnd = (m_param->chunkEnd - m_lastKeyframe);
if ((chunkStart > 0) && (chunkStart < m_param->keyframeMax))
keylimit = chunkStart;
else if ((chunkEnd > 0) && (chunkEnd < m_param->keyframeMax))
keylimit = chunkEnd;
}
//Derive keyintLimit from the GOP-lookahead setting and the remaining keyframe budget
int keyFrameLimit = keylimit + m_lastKeyframe - frames[0]->frameNum - 1;
if (m_param->gopLookahead && keyFrameLimit <= m_param->bframes + 1)
keyintLimit = keyFrameLimit + m_param->gopLookahead;
else
keyintLimit = keyFrameLimit;
//Choose numFrames depending on VBV lookahead, open GOP, and intra refresh
origNumFrames = numFrames = m_param->bIntraRefresh ? framecnt : X265_MIN(framecnt, keyintLimit);
if (bIsVbvLookahead)
numFrames = framecnt;
else if (m_param->bOpenGOP && numFrames < framecnt)
numFrames++;
else if (numFrames == 0)
{
frames[1]->sliceType = X265_TYPE_I;
return;
}
//Batched motion search, if enabled
if (m_bBatchMotionSearch)
{ //A CostEstimateGroup pre-computes motion searches across worker threads: for each frame b, add its preceding reference p0 and, where possible, the matching following reference p1
/* pre-calculate all motion searches, using many worker threads */
CostEstimateGroup estGroup(*this, frames);
for (int b = 2; b < numFrames; b++)
{ //This loop only adds reference pairs at equal forward/backward distances
for (int i = 1; i <= m_param->bframes + 1; i++)
{
int p0 = b - i;
if (p0 < 0)
continue;
/* Skip search if already done */
if (frames[b]->lowresMvs[0][i][0].x != 0x7FFF)
continue;
/* perform search to p1 at same distance, if possible */
int p1 = b + i;
if (p1 >= numFrames || frames[b]->lowresMvs[1][i][0].x != 0x7FFF)
p1 = b;
estGroup.add(p0, p1, b);
}
}//Auto-disable batched motion search (m_bBatchMotionSearch) when the thread pool (m_pool) has fewer than 4 workers
/* auto-disable after the first batch if pool is small */
m_bBatchMotionSearch &= m_pool->m_numWorkers >= 4;
estGroup.finishBatch();
if (m_bBatchFrameCosts)
{ //On top of the equal-distance searches above, fill in the remaining (p0, p1) combinations
/* pre-calculate all frame cost estimates, using many worker threads */
for (int b = 2; b < numFrames; b++)
{
for (int i = 1; i <= m_param->bframes + 1; i++)
{
if (b < i)
continue;
/* only measure frame cost in this pass if motion searches
* are already done */
if (frames[b]->lowresMvs[0][i][0].x == 0x7FFF)
continue;
int p0 = b - i;
for (int j = 0; j <= m_param->bframes; j++)
{
int p1 = b + j;
if (p1 >= numFrames)
break;
/* ensure P1 search is done */
if (j && frames[b]->lowresMvs[1][j][0].x == 0x7FFF)
continue;
/* ensure frame cost is not done */
if (frames[b]->costEst[i][j] >= 0)
continue;
estGroup.add(p0, p1, b);
}
}
}
/* auto-disable after the first batch if the pool is not large */
m_bBatchFrameCosts &= m_pool->m_numWorkers > 12;
estGroup.finishBatch();
}
}
int numBFrames = 0;
int numAnalyzed = numFrames;
bool isScenecut = false;
if (m_param->bHistBasedSceneCut)
isScenecut = histBasedScenecut(frames, 0, 1, origNumFrames);
else//Otherwise use the SATD-based detector to decide whether the current frame is a scene cut
isScenecut = scenecut(frames, 0, 1, true, origNumFrames);
/* When scenecut threshold is set, use scenecut detection for I frame placements */
if (m_param->scenecutThreshold && isScenecut)
{ //Set frames[1] to an I frame (keyframe) and return
frames[1]->sliceType = X265_TYPE_I;
return;
}
if (m_param->gopLookahead && (keyFrameLimit >= 0) && (keyFrameLimit <= m_param->bframes + 1))
{
bool sceneTransition = m_isSceneTransition;
m_extendGopBoundary = false;
for (int i = m_param->bframes + 1; i < origNumFrames; i += m_param->bframes + 1)
{
scenecut(frames, i, i + 1, true, origNumFrames);
for (int j = i + 1; j <= X265_MIN(i + m_param->bframes + 1, origNumFrames); j++)
{
if (frames[j]->bScenecut && scenecutInternal(frames, j - 1, j, true))
{
m_extendGopBoundary = true;
break;
}
}
if (m_extendGopBoundary)
break;
}
m_isSceneTransition = sceneTransition;
}
if (m_param->bframes)
{
if (m_param->bFrameAdaptive == X265_B_ADAPT_TRELLIS)
{
if (numFrames > 1)
{ //best_paths row 0 is initialized to "" and row 1 to "P"
char best_paths[X265_BFRAME_MAX + 1][X265_LOOKAHEAD_MAX + 1] = { "", "P" };
int best_path_index = numFrames % (X265_BFRAME_MAX + 1);
//slicetypePath() evaluates candidate paths of each length and stores the best in best_paths
/* Perform the frame type analysis. */
for (int j = 2; j <= numFrames; j++)
slicetypePath(frames, j, best_paths);
//strspn counts the leading 'B' characters of the best path, giving the number of B frames (numBFrames)
numBFrames = (int)strspn(best_paths[best_path_index], "B");
/* Load the results of the analysis into the frame types. */
for (int j = 1; j < numFrames; j++)
frames[j]->sliceType = best_paths[best_path_index][j - 1] == 'B' ? X265_TYPE_B : X265_TYPE_P;
}//Force the last frame (frames[numFrames]) to P
frames[numFrames]->sliceType = X265_TYPE_P;
}
else if (m_param->bFrameAdaptive == X265_B_ADAPT_FAST)
{
CostEstimateGroup estGroup(*this, frames);
int64_t cost1p0, cost2p0, cost1b1, cost2p1;
for (int i = 0; i <= numFrames - 2; )
{
cost2p1 = estGroup.singleCost(i + 0, i + 2, i + 2, true);
if (frames[i + 2]->intraMbs[2] > cuCount / 2)
{
frames[i + 1]->sliceType = X265_TYPE_P;
frames[i + 2]->sliceType = X265_TYPE_P;
i += 2;
continue;
}
cost1b1 = estGroup.singleCost(i + 0, i + 2, i + 1);
cost1p0 = estGroup.singleCost(i + 0, i + 1, i + 1);
cost2p0 = estGroup.singleCost(i + 1, i + 2, i + 2);
if (cost1p0 + cost2p0 < cost1b1 + cost2p1)
{
frames[i + 1]->sliceType = X265_TYPE_P;
i += 1;
continue;
}
// arbitrary and untuned
#define INTER_THRESH 300
#define P_SENS_BIAS (50 - m_param->bFrameBias)
frames[i + 1]->sliceType = X265_TYPE_B;
int j;
for (j = i + 2; j <= X265_MIN(i + m_param->bframes, numFrames - 1); j++)
{
int64_t pthresh = X265_MAX(INTER_THRESH - P_SENS_BIAS * (j - i - 1), INTER_THRESH / 10);
int64_t pcost = estGroup.singleCost(i + 0, j + 1, j + 1, true);
if (pcost > pthresh * cuCount || frames[j + 1]->intraMbs[j - i + 1] > cuCount / 3)
break;
frames[j]->sliceType = X265_TYPE_B;
}
frames[j]->sliceType = X265_TYPE_P;
i = j;
}
frames[numFrames]->sliceType = X265_TYPE_P;
numBFrames = 0;
while (numBFrames < numFrames && frames[numBFrames + 1]->sliceType == X265_TYPE_B)
numBFrames++;
}
else
{
numBFrames = X265_MIN(numFrames - 1, m_param->bframes);
for (int j = 1; j < numFrames; j++)
frames[j]->sliceType = (j % (numBFrames + 1)) ? X265_TYPE_B : X265_TYPE_P;
frames[numFrames]->sliceType = X265_TYPE_P;
}
//Decide whether RADL is forced
int zoneRadl = m_param->rc.zonefileCount && m_param->bResetZoneConfig ? m_param->rc.zones->zoneParam->radl : 0;
bool bForceRADL = zoneRadl || (m_param->radl && (m_param->keyframeMax == m_param->keyframeMin));
bool bLastMiniGop = (framecnt >= m_param->bframes + 1) ? false : true; //whether this is the last mini-GOP
int radl = m_param->radl ? m_param->radl : zoneRadl;
int preRADL = m_lastKeyframe + m_param->keyframeMax - radl - 1; /* Frame preceding RADL in POC order */
if (bForceRADL && (frames[0]->frameNum == preRADL) && !bLastMiniGop)
{//Forced RADL: frames[0] is the frame preceding the RADL section (preRADL) and this is not the last mini-GOP
int j = 1;
numBFrames = m_param->radl ? m_param->radl : zoneRadl;
for (; j <= numBFrames; j++)//Mark frames 1..numBFrames as B (the RADL leading pictures)
frames[j]->sliceType = X265_TYPE_B;
frames[j]->sliceType = X265_TYPE_I;
}
else /* Check scenecut and RADL on the first minigop. */
{
for (int j = 1; j < numBFrames + 1; j++)
{ //If a frame is a scene cut, or is the forced-RADL position, make it a P frame, record numAnalyzed, and stop
if (scenecut(frames, j, j + 1, false, origNumFrames) ||
(bForceRADL && (frames[j]->frameNum == preRADL)))
{
frames[j]->sliceType = X265_TYPE_P;
numAnalyzed = j;
break;
}
}
}
resetStart = bKeyframe ? 1 : X265_MIN(numBFrames + 2, numAnalyzed + 1);
}
else
{
for (int j = 1; j <= numFrames; j++)
frames[j]->sliceType = X265_TYPE_P;
resetStart = bKeyframe ? 1 : 2;
}
if (m_param->bAQMotion)
aqMotion(frames, bKeyframe);
//Build the CU tree over the decided frames
if (m_param->rc.cuTree)
cuTree(frames, X265_MIN(numFrames, m_param->keyframeMax), bKeyframe);
if (m_param->gopLookahead && (keyFrameLimit >= 0) && (keyFrameLimit <= m_param->bframes + 1) && !m_extendGopBoundary)
keyintLimit = keyFrameLimit;
if (!m_param->bIntraRefresh)
for (int j = keyintLimit + 1; j <= numFrames; j += m_param->keyframeMax)
{
frames[j]->sliceType = X265_TYPE_I;
resetStart = X265_MIN(resetStart, j + 1);
}
if (bIsVbvLookahead)
vbvLookahead(frames, numFrames, bKeyframe);
int maxp1 = X265_MIN(m_param->bframes + 1, origNumFrames);
/* Restore frame types for all frames that haven't actually been decided yet. */
for (int j = resetStart; j <= numFrames; j++)
{
frames[j]->sliceType = X265_TYPE_AUTO;
/* If any frame marked as scenecut is being restarted for sliceDecision,
* undo scene Transition flag */
if (j <= maxp1 && frames[j]->bScenecut && m_isSceneTransition)
m_isSceneTransition = false;
}
}
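The X265_B_ADAPT_NONE branch above (the fixed IBBBPBBBP-style layout) reduces to one modulo expression. The helper below is illustrative, not part of x265; it returns the pattern as a string for easy inspection:

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Sketch of the X265_B_ADAPT_NONE branch: every (numBFrames+1)-th
// frame is a P and the rest are B, with the last lookahead frame
// forced to P. Returns e.g. "BBBPBBBP" for frames 1..numFrames.
std::string fixedGopPattern(int numFrames, int bframes)
{
    int numBFrames = std::min(numFrames - 1, bframes);
    std::string types(numFrames, '?');
    for (int j = 1; j < numFrames; j++)
        types[j - 1] = (j % (numBFrames + 1)) ? 'B' : 'P';
    types[numFrames - 1] = 'P';  // frames[numFrames] is always P
    return types;
}
```

For example, 8 frames with bframes = 3 yields three B frames between each pair of P frames, exactly the fixed expansion described in the introduction.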
9. Low-resolution inter estimation: CostEstimateGroup::estimateFrameCost
Estimates the cost of coding frame b from references p0 (past) and p1 (future):
int64_t CostEstimateGroup::estimateFrameCost(LookaheadTLD& tld, int p0, int p1, int b, bool bIntraPenalty)
{
Lowres* fenc = m_frames[b];
x265_param* param = m_lookahead.m_param;
int64_t score = 0;
if (fenc->costEst[b - p0][p1 - b] >= 0 && fenc->rowSatds[b - p0][p1 - b][0] != -1)
score = fenc->costEst[b - p0][p1 - b];
else
{
bool bDoSearch[2];
bDoSearch[0] = fenc->lowresMvs[0][b - p0][0].x == 0x7FFF;
bDoSearch[1] = p1 > b && fenc->lowresMvs[1][p1 - b][0].x == 0x7FFF;
#if CHECKED_BUILD
X265_CHECK(!(p0 < b && fenc->lowresMvs[0][b - p0][0].x == 0x7FFE), "motion search batch duplication L0\n");
X265_CHECK(!(p1 > b && fenc->lowresMvs[1][p1 - b][0].x == 0x7FFE), "motion search batch duplication L1\n");
if (bDoSearch[0]) fenc->lowresMvs[0][b - p0][0].x = 0x7FFE;
if (bDoSearch[1]) fenc->lowresMvs[1][p1 - b][0].x = 0x7FFE;
#endif
fenc->weightedRef[b - p0].isWeighted = false;
if (param->bEnableWeightedPred && bDoSearch[0])
tld.weightsAnalyse(*m_frames[b], *m_frames[p0]);
fenc->costEst[b - p0][p1 - b] = 0;
fenc->costEstAq[b - p0][p1 - b] = 0;
//Use cooperative mode when not in batch mode, more than one cooperative slice is configured, and motion searches or bidir measurements are still needed
if (!m_batchMode && m_lookahead.m_numCoopSlices > 1 && ((p1 > b) || bDoSearch[0] || bDoSearch[1]))
{
/* Use cooperative mode if a thread pool is available and the cost estimate is
* going to need motion searches or bidir measurements */
memset(&m_slice, 0, sizeof(Slice) * m_lookahead.m_numCoopSlices);
m_lock.acquire();
X265_CHECK(!m_batchMode, "single CostEstimateGroup instance cannot mix batch modes\n");
m_coop.p0 = p0;
m_coop.p1 = p1;
m_coop.b = b;
m_coop.bDoSearch[0] = bDoSearch[0];
m_coop.bDoSearch[1] = bDoSearch[1];
m_jobTotal = m_lookahead.m_numCoopSlices;
m_jobAcquired = 0;
m_lock.release();
tryBondPeers(*m_lookahead.m_pool, m_jobTotal);
processTasks(-1);
waitForExit();
//Accumulate each cooperative slice's results (computed in parallel on the thread pool) into costEst and costEstAq
for (int i = 0; i < m_lookahead.m_numCoopSlices; i++)
{
fenc->costEst[b - p0][p1 - b] += m_slice[i].costEst;
fenc->costEstAq[b - p0][p1 - b] += m_slice[i].costEstAq;
if (p1 == b)
fenc->intraMbs[b - p0] += m_slice[i].intraMbs;
}
}
else
{ /* Optional HME first pass: calculate MVs at 1/16th resolution, scanning CUs from bottom-right to top-left */
bool lastRow;
if (param->bEnableHME)
{
lastRow = true;
for (int cuY = m_lookahead.m_4x4Height - 1; cuY >= 0; cuY--)
{
for (int cuX = m_lookahead.m_4x4Width - 1; cuX >= 0; cuX--)
estimateCUCost(tld, cuX, cuY, p0, p1, b, bDoSearch, lastRow, -1, 1);
lastRow = false;
}
}
lastRow = true;
for (int cuY = m_lookahead.m_8x8Height - 1; cuY >= 0; cuY--)
{
fenc->rowSatds[b - p0][p1 - b][cuY] = 0;
for (int cuX = m_lookahead.m_8x8Width - 1; cuX >= 0; cuX--)
estimateCUCost(tld, cuX, cuY, p0, p1, b, bDoSearch, lastRow, -1, 0);
lastRow = false;
}
}
score = fenc->costEst[b - p0][p1 - b];
if (b != p1)
score = score * 100 / (130 + param->bFrameBias);
fenc->costEst[b - p0][p1 - b] = score;
}
if (bIntraPenalty)
// arbitrary penalty for I-blocks after B-frames
score += score * fenc->intraMbs[b - p0] / (tld.ncu * 8);
return score;
}
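The two final adjustments above, scaling B-frame costs by 100/(130 + bFrameBias) and adding a penalty proportional to the share of intra-coded CUs, can be restated in isolation. The helper name and parameters are illustrative; the formulas mirror the code above:

```cpp
#include <cassert>
#include <cstdint>

// Sketch of the cost adjustments at the end of estimateFrameCost():
// B-frame scores are biased downward before caching, and when
// bIntraPenalty is set an extra cost proportional to intraMbs/(ncu*8)
// is added (the "arbitrary penalty for I-blocks after B-frames").
int64_t adjustFrameCost(int64_t rawCost, bool isB, int bFrameBias,
                        bool bIntraPenalty, int intraMbs, int ncu)
{
    int64_t score = rawCost;
    if (isB)
        score = score * 100 / (130 + bFrameBias);  // bias against B frames
    if (bIntraPenalty)
        score += score * intraMbs / (ncu * 8);     // penalize intra-heavy frames
    return score;
}
```

A positive bFrameBias therefore makes B frames look cheaper relative to P frames, which is how the --b-adapt decisions are steered toward or away from B placement.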
10. Low-resolution per-CU inter estimation: CostEstimateGroup::estimateCUCost
Estimates the cost of a single Coding Unit (CU):
void CostEstimateGroup::estimateCUCost(LookaheadTLD& tld, int cuX, int cuY, int p0, int p1, int b, bool bDoSearch[2], bool lastRow, int slice, bool hme)
{
Lowres *fref0 = m_frames[p0];
Lowres *fref1 = m_frames[p1];
Lowres *fenc = m_frames[b];
ReferencePlanes *wfref0 = fenc->weightedRef[b - p0].isWeighted && !hme ? &fenc->weightedRef[b - p0] : fref0;
//Locate the CU within the frame and derive the CU size, pixel offset, and related parameters
const int widthInCU = hme ? m_lookahead.m_4x4Width : m_lookahead.m_8x8Width;
const int heightInCU = hme ? m_lookahead.m_4x4Height : m_lookahead.m_8x8Height;
const int bBidir = (b < p1);
const int cuXY = cuX + cuY * widthInCU;
const int cuXY_4x4 = (cuX / 2) + (cuY / 2) * widthInCU / 2;
const int cuSize = X265_LOWRES_CU_SIZE;
const intptr_t pelOffset = cuSize * cuX + cuSize * cuY * (hme ? fenc->lumaStride/2 : fenc->lumaStride);
if ((bBidir || bDoSearch[0] || bDoSearch[1]) && hme)
tld.me.setSourcePU(fenc->lowerResPlane[0], fenc->lumaStride / 2, pelOffset, cuSize, cuSize, X265_HEX_SEARCH, m_lookahead.m_param->hmeSearchMethod[0], m_lookahead.m_param->hmeSearchMethod[1], 1);
else if((bBidir || bDoSearch[0] || bDoSearch[1]) && !hme)
tld.me.setSourcePU(fenc->lowresPlane[0], fenc->lumaStride, pelOffset, cuSize, cuSize, X265_HEX_SEARCH, m_lookahead.m_param->hmeSearchMethod[0], m_lookahead.m_param->hmeSearchMethod[1], 1);
/* A small, arbitrary bias to avoid VBV problems caused by zero-residual lookahead blocks. */
int lowresPenalty = 4;
int listDist[2] = { b - p0, p1 - b};
MV mvmin, mvmax;
int bcost = tld.me.COST_MAX;
int listused = 0;
// TODO: restrict to slices boundaries
// establish search bounds that don't cross extended frame boundaries
mvmin.x = (int32_t)(-cuX * cuSize - 8);
mvmin.y = (int32_t)(-cuY * cuSize - 8);
mvmax.x = (int32_t)((widthInCU - cuX - 1) * cuSize + 8);
mvmax.y = (int32_t)((heightInCU - cuY - 1) * cuSize + 8);
//Run motion estimation and cost computation for each reference list (one list, or two when bidir)
for (int i = 0; i < 1 + bBidir; i++)
{
int& fencCost = hme ? fenc->lowerResMvCosts[i][listDist[i]][cuXY] : fenc->lowresMvCosts[i][listDist[i]][cuXY];
int skipCost = INT_MAX;
if (!bDoSearch[i])
{
COPY2_IF_LT(bcost, fencCost, listused, i + 1);
continue;
}
int numc = 0;
MV mvc[5], mvp;
MV* fencMV = hme ? &fenc->lowerResMvs[i][listDist[i]][cuXY] : &fenc->lowresMvs[i][listDist[i]][cuXY];
ReferencePlanes* fref = i ? fref1 : wfref0;
//Gather MV candidates from the right/below neighbours into mvc (the CU scan runs bottom-up, right-to-left, so those neighbours are already searched)
/* Reverse-order MV prediction */
#define MVC(mv) mvc[numc++] = mv;
if (cuX < widthInCU - 1)
MVC(fencMV[1]);
if (!lastRow)
{
MVC(fencMV[widthInCU]);
if (cuX > 0)
MVC(fencMV[widthInCU - 1]);
if (cuX < widthInCU - 1)
MVC(fencMV[widthInCU + 1]);
}
if (fenc->lowerResMvs[0][0] && !hme && fenc->lowerResMvCosts[i][listDist[i]][cuXY_4x4] > 0)
{
MVC((fenc->lowerResMvs[i][listDist[i]][cuXY_4x4]) * 2);
}
#undef MVC
if (!numc)
mvp = 0;
else
{
ALIGN_VAR_32(pixel, subpelbuf[X265_LOWRES_CU_SIZE * X265_LOWRES_CU_SIZE]);
int mvpcost = MotionEstimate::COST_MAX;
/* measure SATD cost of each neighbor MV (estimating merge analysis)
* and use the lowest cost MV as MVP (estimating AMVP). Since all
* mvc[] candidates are measured here, none are passed to motionEstimate */
for (int idx = 0; idx < numc; idx++)
{
intptr_t stride = X265_LOWRES_CU_SIZE;
pixel *src = fref->lowresMC(pelOffset, mvc[idx], subpelbuf, stride, hme);
int cost = tld.me.bufSATD(src, stride);
COPY2_IF_LT(mvpcost, cost, mvp, mvc[idx]);
/* Except for the mv0 case, everything else is likely to have enough residual to not trigger the skip. */
if (!mvp.notZero() && bBidir)
skipCost = cost;
}
}
int searchRange = m_lookahead.m_param->bEnableHME ? (hme ? m_lookahead.m_param->hmeRange[0] : m_lookahead.m_param->hmeRange[1]) : s_merange;
/* ME will never return a cost larger than the cost @MVP, so we do not
* have to check that ME cost is more than the estimated merge cost */
if (!hme)//Full motion search refines the MVP into fencCost
fencCost = tld.me.motionEstimate(fref, mvmin, mvmax, mvp, 0, NULL, searchRange, *fencMV, m_lookahead.m_param->maxSlices);
else
fencCost = tld.me.motionEstimate(fref, mvmin, mvmax, mvp, 0, NULL, searchRange, *fencMV, m_lookahead.m_param->maxSlices, fref->lowerResPlane[0]);
if (skipCost < 64 && skipCost < fencCost && bBidir)
{
fencCost = skipCost;
*fencMV = 0;
}//Keep the cheaper cost in bcost and record the list used (COPY2_IF_LT)
COPY2_IF_LT(bcost, fencCost, listused, i + 1);
}
if (hme)
return;
//For a B frame (bBidir), also evaluate bidirectional candidates; for a P frame, also consider intra
if (bBidir) /* B, also consider bidir */
{
/* NOTE: the wfref0 (weightp) is not used for BIDIR */
//Subpel motion compensation of both references (fref0->lowresMC, fref1->lowresMC) into src0 and src1
/* avg(l0-mv, l1-mv) candidate */
ALIGN_VAR_32(pixel, subpelbuf0[X265_LOWRES_CU_SIZE * X265_LOWRES_CU_SIZE]);
ALIGN_VAR_32(pixel, subpelbuf1[X265_LOWRES_CU_SIZE * X265_LOWRES_CU_SIZE]);
intptr_t stride0 = X265_LOWRES_CU_SIZE, stride1 = X265_LOWRES_CU_SIZE;
pixel *src0 = fref0->lowresMC(pelOffset, fenc->lowresMvs[0][listDist[0]][cuXY], subpelbuf0, stride0, 0);
pixel *src1 = fref1->lowresMC(pelOffset, fenc->lowresMvs[1][listDist[1]][cuXY], subpelbuf1, stride1, 0);
//Buffer holding the averaged bidir prediction
ALIGN_VAR_32(pixel, ref[X265_LOWRES_CU_SIZE * X265_LOWRES_CU_SIZE]);
//Average the two predictions
primitives.pu[LUMA_8x8].pixelavg_pp[NONALIGNED](ref, X265_LOWRES_CU_SIZE, src0, stride0, src1, stride1, 32);
//SATD of the bidir average against the source CU
int bicost = tld.me.bufSATD(ref, X265_LOWRES_CU_SIZE);
COPY2_IF_LT(bcost, bicost, listused, 3);
/* coloc candidate */
//Co-located (zero-MV) candidate: average fref0->lowresPlane[0] and fref1->lowresPlane[0] directly at pelOffset into ref
src0 = fref0->lowresPlane[0] + pelOffset;
src1 = fref1->lowresPlane[0] + pelOffset;
primitives.pu[LUMA_8x8].pixelavg_pp[NONALIGNED](ref, X265_LOWRES_CU_SIZE, src0, fref0->lumaStride, src1, fref1->lumaStride, 32);
bicost = tld.me.bufSATD(ref, X265_LOWRES_CU_SIZE);
COPY2_IF_LT(bcost, bicost, listused, 3);
bcost += lowresPenalty;
}
else /* P, also consider intra */
{
bcost += lowresPenalty;
if (fenc->intraCost[cuXY] < bcost)
{
bcost = fenc->intraCost[cuXY];
listused = 0;
}
}
//bFrameScoreCU is false for edge CUs (unless the frame is only 1-2 CUs wide or tall)
/* do not include edge blocks in the frame cost estimates, they are not very accurate */
const bool bFrameScoreCU = (cuX > 0 && cuX < widthInCU - 1 &&
cuY > 0 && cuY < heightInCU - 1) || widthInCU <= 2 || heightInCU <= 2;
int bcostAq;
if (m_lookahead.m_param->rc.qgSize == 8)
bcostAq = (bFrameScoreCU && fenc->invQscaleFactor) ? ((bcost * fenc->invQscaleFactor8x8[cuXY] + 128) >> 8) : bcost;
else
bcostAq = (bFrameScoreCU && fenc->invQscaleFactor) ? ((bcost * fenc->invQscaleFactor[cuXY] + 128) >> 8) : bcost;
if (bFrameScoreCU)
{ //Accumulate either into the frame totals or into this cooperative slice's totals
if (slice < 0)//slice < 0 means the whole frame is being processed, not a cooperative slice
{
fenc->costEst[b - p0][p1 - b] += bcost;
fenc->costEstAq[b - p0][p1 - b] += bcostAq;
if (!listused && !bBidir)
fenc->intraMbs[b - p0]++;
}
else
{
m_slice[slice].costEst += bcost;
m_slice[slice].costEstAq += bcostAq;
if (!listused && !bBidir)
m_slice[slice].intraMbs++;
}
}
fenc->rowSatds[b - p0][p1 - b][cuY] += bcostAq;
fenc->lowresCosts[b - p0][p1 - b][cuXY] = (uint16_t)(X265_MIN(bcost, LOWRES_COST_MASK) | (listused << LOWRES_COST_SHIFT));
}
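The final line above packs the CU's result into a single 16-bit word: the low bits hold the clamped cost and the high bits record which list won (0 = intra, 1 = L0, 2 = L1, 3 = bidir). A sketch of the pack/unpack pair, with the constant values as defined in x265's lowres.h:

```cpp
#include <cassert>
#include <cstdint>

// Constants mirroring x265's lowres.h: 14 bits of cost, 2 bits of list.
static const int LOWRES_COST_MASK  = (1 << 14) - 1;
static const int LOWRES_COST_SHIFT = 14;

// Pack a CU cost and the winning reference list into 16 bits, as
// lowresCosts[][] stores them. Helper names are illustrative.
uint16_t packLowresCost(int cost, int listused)
{
    int clamped = cost < LOWRES_COST_MASK ? cost : LOWRES_COST_MASK;
    return (uint16_t)(clamped | (listused << LOWRES_COST_SHIFT));
}

int unpackCost(uint16_t packed) { return packed & LOWRES_COST_MASK; }
int unpackList(uint16_t packed) { return packed >> LOWRES_COST_SHIFT; }
```

Clamping to the 14-bit mask means very expensive CUs saturate at 16383; later consumers (e.g. cuTree) read the cost and the list bits back out with the same mask and shift.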
11. VBV rate and buffer lookahead: Lookahead::vbvLookahead
VBV lookahead estimates the bitrate and buffer occupancy of upcoming frames so that rate control can manage the decoder buffer.
void Lookahead::vbvLookahead(Lowres **frames, int numFrames, int keyframe)
{
int prevNonB = 0, curNonB = 1, idx = 0;
//From the slice types, locate the current non-B frame (curNonB) and the next non-B frame (nextNonB)
while (curNonB < numFrames && IS_X265_TYPE_B(frames[curNonB]->sliceType))
curNonB++;
int nextNonB = keyframe ? prevNonB : curNonB;
int nextB = prevNonB + 1;
int nextBRef = 0, curBRef = 0;
if (m_param->bBPyramid && curNonB - prevNonB > 1)
curBRef = (prevNonB + curNonB + 1) / 2;
int miniGopEnd = keyframe ? prevNonB : curNonB;
//Walk each non-B frame in the array
while (curNonB <= numFrames)
{ //Record the planned cost (plannedSatd) and type (plannedType) for this P or I frame
/* P/I cost: This shouldn't include the cost of nextNonB */
if (nextNonB != curNonB)
{
int p0 = IS_X265_TYPE_I(frames[curNonB]->sliceType) ? curNonB : prevNonB;
frames[nextNonB]->plannedSatd[idx] = vbvFrameCost(frames, p0, curNonB, curNonB);
frames[nextNonB]->plannedType[idx] = frames[curNonB]->sliceType;
/* Save the nextNonB Cost in each B frame of the current miniGop */
if (curNonB > miniGopEnd)
{
for (int j = nextB; j < miniGopEnd; j++)
{
frames[j]->plannedSatd[frames[j]->indB] = frames[nextNonB]->plannedSatd[idx];
frames[j]->plannedType[frames[j]->indB++] = frames[nextNonB]->plannedType[idx];
}
}
idx++;
}
/* Handle the B-frames: coded order */
if (m_param->bBPyramid && curNonB - prevNonB > 1)
nextBRef = (prevNonB + curNonB + 1) / 2;
for (int i = prevNonB + 1; i < curNonB; i++, idx++)
{
int64_t satdCost = 0;
int type = X265_TYPE_B;
//When the mini-GOP contains B frames (curNonB - prevNonB > 1), compute each B frame's planned cost and type, split around the B-ref if the pyramid is enabled
if (nextBRef)
{
if (i == nextBRef)
{
satdCost = vbvFrameCost(frames, prevNonB, curNonB, nextBRef);
type = X265_TYPE_BREF;
}
else if (i < nextBRef)
satdCost = vbvFrameCost(frames, prevNonB, nextBRef, i);
else
satdCost = vbvFrameCost(frames, nextBRef, curNonB, i);
}
else
satdCost = vbvFrameCost(frames, prevNonB, curNonB, i);
//Store the computed planned cost and type in the next non-B frame's (nextNonB) arrays
frames[nextNonB]->plannedSatd[idx] = satdCost;
frames[nextNonB]->plannedType[idx] = type;
/* Save the nextB Cost in each B frame of the current miniGop */
for (int j = nextB; j < miniGopEnd; j++)
{
if (curBRef && curBRef == i)
break;
if (j >= i && j != nextBRef)
continue;
frames[j]->plannedSatd[frames[j]->indB] = satdCost;
frames[j]->plannedType[frames[j]->indB++] = type;
}
}
//Advance to the next non-B frame and repeat until all frames are processed
prevNonB = curNonB;
curNonB++;
while (curNonB <= numFrames && IS_X265_TYPE_B(frames[curNonB]->sliceType))
curNonB++;
}
//Terminate nextNonB's plannedType list with X265_TYPE_AUTO
frames[nextNonB]->plannedType[idx] = X265_TYPE_AUTO;
}
12. Scene-cut detection: Lookahead::scenecut
This function detects scene changes and returns whether a real scene cut occurred:
bool Lookahead::scenecut(Lowres **frames, int p0, int p1, bool bRealScenecut, int numFrames)
{
/* Only do analysis during a normal scenecut check. */
if (bRealScenecut && m_param->bframes)
{
int origmaxp1 = p0 + 1;
/* Look ahead to avoid coding short flashes as scenecuts. */
origmaxp1 += m_param->bframes;
int maxp1 = X265_MIN(origmaxp1, numFrames);
bool fluctuate = false;
bool noScenecuts = false;
int64_t avgSatdCost = 0;
if (frames[p0]->costEst[p1 - p0][0] > -1)
avgSatdCost = frames[p0]->costEst[p1 - p0][0];
int cnt = 1;
/* Where A and B are scenes: AAAAAABBBAAAAAA
* If BBB is shorter than (maxp1-p0), it is detected as a flash
* and not considered a scenecut. */
//Guard against mistaking a brief flash for a new scene
for (int cp1 = p1; cp1 <= maxp1; cp1++)
{
if (!scenecutInternal(frames, p0, cp1, false))
{
/* Any frame in between p0 and cur_p1 cannot be a real scenecut. */
for (int i = cp1; i > p0; i--)
{
frames[i]->bScenecut = false;
noScenecuts = false;
}
}
else if (scenecutInternal(frames, cp1 - 1, cp1, false))
{ //The frame is also a cut relative to its immediately preceding frame
/* If current frame is a Scenecut from p0 frame as well as Scenecut from
* preceding frame, mark it as a Scenecut */
frames[cp1]->bScenecut = true;
noScenecuts = true;
}
/* compute average satdcost of all the frames in the mini-gop to confirm
* whether there is any great fluctuation among them to rule out false positives */
X265_CHECK(frames[cp1]->costEst[cp1 - p0][0]!= -1, "costEst is not done \n");
avgSatdCost += frames[cp1]->costEst[cp1 - p0][0];
cnt++;
}
/* Identify possible scene fluctuations by comparing the satd cost of the frames.
* This could denote the beginning or ending of scene transitions.
* During a scene transition(fade in/fade outs), if fluctuate remains false,
* then the scene had completed its transition or stabilized */
if (noScenecuts)
{
fluctuate = false;
avgSatdCost /= cnt;
for (int i = p1; i <= maxp1; i++)
{
int64_t curCost = frames[i]->costEst[i - p0][0];
int64_t prevCost = frames[i - 1]->costEst[i - 1 - p0][0];
//Flag a fluctuation when the SATD cost deviates by more than 10% from the mini-GOP average or from the previous frame's cost
if (fabs((double)(curCost - avgSatdCost)) > 0.1 * avgSatdCost ||
fabs((double)(curCost - prevCost)) > 0.1 * prevCost)
{
fluctuate = true;
if (!m_isSceneTransition && frames[i]->bScenecut)
{
m_isSceneTransition = true;//Only the first scene change of the transition needs to be marked
/* just mark the first scenechange in the scene transition as a scenecut. */
for (int j = i + 1; j <= maxp1; j++)
frames[j]->bScenecut = false;
break;
}
}
frames[i]->bScenecut = false;
}
}
if (!fluctuate && !noScenecuts)
m_isSceneTransition = false; /* Signal end of scene transitioning */
}
if (m_param->csvLogLevel >= 2)
{
int64_t icost = frames[p1]->costEst[0][0];
int64_t pcost = frames[p1]->costEst[p1 - p0][0];
frames[p1]->ipCostRatio = (double)icost / pcost;
}
/* A frame is always analysed with bRealScenecut = true first, and then bRealScenecut = false,
the former for I decisions and the latter for P/B decisions. It's possible that the first
analysis detected scenecuts which were later nulled due to scene transitioning, in which
case do not return a true scenecut for this frame */
if (!frames[p1]->bScenecut)
return false;
//Finally, return whether p1 is an actual scene cut
return scenecutInternal(frames, p0, p1, bRealScenecut);
}
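The flash filter described in the AAAAAABBBAAAAAA comment can be sketched on its own: a frame keeps its scenecut flag only if it is a cut both from p0 and from its immediate predecessor, while any frame inside a window whose content returns to scene A is treated as a flash and cleared. Here `isCut(a, b)` is a hypothetical stand-in for scenecutInternal(frames, a, b, false), and the per-frame flags stand in for bScenecut (which x265 initializes to true elsewhere):

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Simplified flash filter over frames p0..maxp1: returns the surviving
// scenecut flags. Frames inside a window that is not a cut from p0 are
// flashes; a frame that is a cut from both p0 and its predecessor is
// confirmed as a genuine cut.
std::vector<bool> filterFlashes(int p0, int p1, int maxp1,
                                const std::function<bool(int, int)>& isCut)
{
    std::vector<bool> scenecut(maxp1 + 1, true);
    for (int cp1 = p1; cp1 <= maxp1; cp1++)
    {
        if (!isCut(p0, cp1))
        {
            for (int i = cp1; i > p0; i--)
                scenecut[i] = false;   // flash: content returned to scene A
        }
        else if (isCut(cp1 - 1, cp1))
        {
            scenecut[cp1] = true;      // cut from both references: keep it
        }
    }
    return scenecut;
}
```

A three-frame flash (A A B B A A A) is fully suppressed, while a persistent change (A A A B B B B) keeps its cut at the transition frame.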
13. Frame-structure path cost: Lookahead::slicetypePathCost
Implements the cost evaluation used by the X265_B_ADAPT_TRELLIS frame-structure scheme:
int64_t Lookahead::slicetypePathCost(Lowres **frames, char *path, int64_t threshold)
{
int64_t cost = 0;
int loc = 1;//Index into the path, starting at its first element
int cur_p = 0;//Index of the current P frame
CostEstimateGroup estGroup(*this, frames);
path--; /* Since the 1st path element is really the second frame */
while (path[loc])//Walk the path elements until the terminating NUL
{
int next_p = loc;
/* Find the location of the next P-frame. */
while (path[next_p] != 'P')
next_p++;
/* Add the cost of the P-frame found above */
cost += estGroup.singleCost(cur_p, next_p, next_p);
/* Early terminate if the cost we have found is larger than the best path cost so far */
if (cost > threshold)
break;
//With the B pyramid and a gap larger than 2, cost the middle B-ref against (cur_p, next_p), then each remaining B frame against its half of the mini-GOP
if (m_param->bBPyramid && next_p - cur_p > 2)
{
int middle = cur_p + (next_p - cur_p) / 2;
cost += estGroup.singleCost(cur_p, next_p, middle);
for (int next_b = loc; next_b < middle && cost < threshold; next_b++)
cost += estGroup.singleCost(cur_p, middle, next_b);
for (int next_b = middle + 1; next_b < next_p && cost < threshold; next_b++)
cost += estGroup.singleCost(middle, next_p, next_b);
}
else//No pyramid (or gap <= 2): each B frame between cur_p and next_p references the two P frames
{
for (int next_b = loc; next_b < next_p && cost < threshold; next_b++)
cost += estGroup.singleCost(cur_p, next_p, next_b);
}
loc = next_p + 1;
cur_p = next_p;
}
return cost;
}
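The path walk above can be restated compactly: given a path such as "BBP" describing frames 1..n, sum the cost of each P frame against the previous P, plus each intervening B frame against the surrounding P pair. The sketch below omits B-pyramid handling and the early-termination threshold for brevity; `singleCost(p0, p1, b)` is a stand-in for estGroup.singleCost():

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Simplified slicetypePathCost: path[0] describes frame 1, so frame j
// maps to path[j - 1]. Each mini-GOP contributes the P frame's cost
// plus the cost of every B frame inside it.
int64_t pathCost(const char* path,
                 const std::function<int64_t(int, int, int)>& singleCost)
{
    int64_t cost = 0;
    int loc = 1, curP = 0;
    while (path[loc - 1])
    {
        int nextP = loc;
        while (path[nextP - 1] != 'P')
            nextP++;                   // find the next P frame
        cost += singleCost(curP, nextP, nextP);
        for (int b = loc; b < nextP; b++)
            cost += singleCost(curP, nextP, b);   // B frames in between
        loc = nextP + 1;
        curP = nextP;
    }
    return cost;
}
```

With a toy cost equal to the reference distance p1 - p0, "PP" costs 2 while "BP" costs 4, showing how the trellis trades per-frame B savings against the longer prediction distances they create.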
14. CU-tree construction and processing: Lookahead::cuTree
Builds and processes the CU tree over the given frame array, propagating each frame's cost information back to the frames it references:
//Build and process the CU tree for the given array of frames
void Lookahead::cuTree(Lowres **frames, int numframes, bool bIntra)
{
int idx = !bIntra;
int lastnonb, curnonb = 1;
int bframes = 0;
x265_emms();
double totalDuration = 0.0;
for (int j = 0; j <= numframes; j++)
totalDuration += (double)m_param->fpsDenom / m_param->fpsNum;
double averageDuration = totalDuration / (numframes + 1);
int i = numframes;
while (i > 0 && frames[i]->sliceType == X265_TYPE_B)
i--;
lastnonb = i;
/* Lookaheadless MB-tree is not a theoretically distinct case; the same extrapolation could
* be applied to the end of a lookahead buffer of any size. However, it's most needed when
* lookahead=0, so that's what's currently implemented. */
if (!m_param->lookaheadDepth)
{
if (bIntra)
{ //lookaheadDepth == 0: for a keyframe, clear propagateCost and seed qpCuTreeOffset from the AQ offsets, then return
memset(frames[0]->propagateCost, 0, m_cuCount * sizeof(uint16_t));
if (m_param->rc.qgSize == 8)
memcpy(frames[0]->qpCuTreeOffset, frames[0]->qpAqOffset, m_cuCount * 4 * sizeof(double));
else
memcpy(frames[0]->qpCuTreeOffset, frames[0]->qpAqOffset, m_cuCount * sizeof(double));
return;
}
std::swap(frames[lastnonb]->propagateCost, frames[0]->propagateCost);
memset(frames[0]->propagateCost, 0, m_cuCount * sizeof(uint16_t));
}
else
{
if (lastnonb < idx)
return;
memset(frames[lastnonb]->propagateCost, 0, m_cuCount * sizeof(uint16_t));
}
CostEstimateGroup estGroup(*this, frames);
while (i-- > idx)
{ //Walk backwards through the frame sequence from the last non-B frame
curnonb = i;
while (frames[curnonb]->sliceType == X265_TYPE_B && curnonb > 0)
curnonb--;
if (curnonb < idx)
break;
estGroup.singleCost(curnonb, lastnonb, lastnonb);
memset(frames[curnonb]->propagateCost, 0, m_cuCount * sizeof(uint16_t));
bframes = lastnonb - curnonb - 1;
if (m_param->bBPyramid && bframes > 1)
{
int middle = (bframes + 1) / 2 + curnonb;
estGroup.singleCost(curnonb, lastnonb, middle);
memset(frames[middle]->propagateCost, 0, m_cuCount * sizeof(uint16_t));
while (i > curnonb)
{
int p0 = i > middle ? middle : curnonb;
int p1 = i < middle ? middle : lastnonb;
if (i != middle)
{ //Cost each B frame against its references and propagate its CU-tree information backwards
estGroup.singleCost(p0, p1, i);
estimateCUPropagate(frames, averageDuration, p0, p1, i, 0);
}
i--;
}
estimateCUPropagate(frames, averageDuration, curnonb, lastnonb, middle, 1);
}
else
{
while (i > curnonb)
{ //No pyramid: cost and propagate every B frame against (curnonb, lastnonb)
estGroup.singleCost(curnonb, lastnonb, i);
estimateCUPropagate(frames, averageDuration, curnonb, lastnonb, i, 0);
i--;
}
}
estimateCUPropagate(frames, averageDuration, curnonb, lastnonb, lastnonb, 1);
lastnonb = curnonb;
}
if (!m_param->lookaheadDepth)
{
estGroup.singleCost(0, lastnonb, lastnonb);
estimateCUPropagate(frames, averageDuration, 0, lastnonb, lastnonb, 1);
std::swap(frames[lastnonb]->propagateCost, frames[0]->propagateCost);
}
//After all frame costs and propagation passes are done, finalize the CU tree into QP offsets
cuTreeFinish(frames[lastnonb], averageDuration, lastnonb);
if (m_param->bBPyramid && bframes > 1 && !m_param->rc.vbvBufferSize)
cuTreeFinish(frames[lastnonb + (bframes + 1) / 2], averageDuration, 0);
}