【x264】码率控制模块的简单分析—宏块级码控工具Mbtree和AQ
参考:
帧类型决策-slicetype_frame/slice/mb_cost()
参数分析:
【x264】x264编码器参数配置
流程分析:
【x264】x264编码主流程简单分析
【x264】编码核心函数(x264_encoder_encode)的简单分析
【x264】分析模块(analyse)的简单分析—帧内预测
【x264】分析模块(analyse)的简单分析—帧间预测
1. 宏块树(mbtree)
mbtree是一种宏块级别的码率控制工具,根据帧间预测相关信息,来有效的降低编码的码率。mbtree的工作原理可以简单描述为,在lookahead当中通过前向预测,来获得当前mb的重要程度,重要程度由被参考的情况来衡量。如果被参考的权值很高,说明这个mb很重要,此时这个mb会按照较低的qp编码,否则按照较高的qp编码。这个权值的衡量由当前mb的intra_cost、inter_cost和前面mb赋予当前mb的传播cost决定
在mbtree中,主要的计算过程位于macroblock_tree中,函数位于slicetype.c中,函数主要的工作流程如下:
- 如果当前帧为intra帧,则直接计算cost
- 为propagate_cost赋初始值
- 循环处理每一帧,计算cost
- 计算当前帧的帧内和帧间cost(slicetype_frame_cost)
- 计算宏块的传播cost(macroblock_tree_propagate),之后根据帧内、帧间cost和宏块传播cost来计算qp_offset(macroblock_tree_finish),从而调整当前mb的qp
static void macroblock_tree( x264_t *h, x264_mb_analysis_t *a, x264_frame_t **frames, int num_frames, int b_intra )
{
int idx = !b_intra;
int last_nonb, cur_nonb = 1;
int bframes = 0;
x264_emms();
float total_duration = 0.0;
for( int j = 0; j <= num_frames; j++ )
total_duration += frames[j]->f_duration;
float average_duration = total_duration / (num_frames + 1);
int i = num_frames;
// ----- 1.当前帧为intra帧,直接计算cost ----- //
if( b_intra )
slicetype_frame_cost( h, a, frames, 0, 0, 0 );
while( i > 0 && IS_X264_TYPE_B( frames[i]->i_type ) )
i--;
last_nonb = i;
/* Lookaheadless MB-tree is not a theoretically distinct case; the same extrapolation could
* be applied to the end of a lookahead buffer of any size. However, it's most needed when
* lookahead=0, so that's what's currently implemented. */
// ----- 2.为propagate_cost赋初始值 ----- //
if( !h->param.rc.i_lookahead )
{
if( b_intra )
{
memset( frames[0]->i_propagate_cost, 0, h->mb.i_mb_count * sizeof(uint16_t) );
memcpy( frames[0]->f_qp_offset, frames[0]->f_qp_offset_aq, h->mb.i_mb_count * sizeof(float) );
return;
}
XCHG( uint16_t*, frames[last_nonb]->i_propagate_cost, frames[0]->i_propagate_cost );
memset( frames[0]->i_propagate_cost, 0, h->mb.i_mb_count * sizeof(uint16_t) );
}
else
{
if( last_nonb < idx )
return;
memset( frames[last_nonb]->i_propagate_cost, 0, h->mb.i_mb_count * sizeof(uint16_t) );
}
// ----- 3.循环处理每一帧,计算cost ----- //
while( i-- > idx )
{
cur_nonb = i;
while( IS_X264_TYPE_B( frames[cur_nonb]->i_type ) && cur_nonb > 0 )
cur_nonb--;
if( cur_nonb < idx )
break;
// ---- 4.计算当前帧的帧内和帧间cost ----- //
slicetype_frame_cost( h, a, frames, cur_nonb, last_nonb, last_nonb );
memset( frames[cur_nonb]->i_propagate_cost, 0, h->mb.i_mb_count * sizeof(uint16_t) );
bframes = last_nonb - cur_nonb - 1;
if( h->param.i_bframe_pyramid && bframes > 1 ) // B帧
{
int middle = (bframes + 1)/2 + cur_nonb;
slicetype_frame_cost( h, a, frames, cur_nonb, last_nonb, middle );
memset( frames[middle]->i_propagate_cost, 0, h->mb.i_mb_count * sizeof(uint16_t) );
while( i > cur_nonb )
{
int p0 = i > middle ? middle : cur_nonb;
int p1 = i < middle ? middle : last_nonb;
if( i != middle )
{
slicetype_frame_cost( h, a, frames, p0, p1, i );
// ----- 5.计算宏块的传播cost ----- //
macroblock_tree_propagate( h, frames, average_duration, p0, p1, i, 0 );
}
i--;
}
macroblock_tree_propagate( h, frames, average_duration, cur_nonb, last_nonb, middle, 1 );
}
else
{
while( i > cur_nonb )
{
slicetype_frame_cost( h, a, frames, cur_nonb, last_nonb, i );
macroblock_tree_propagate( h, frames, average_duration, cur_nonb, last_nonb, i, 0 );
i--;
}
}
macroblock_tree_propagate( h, frames, average_duration, cur_nonb, last_nonb, last_nonb, 1 );
last_nonb = cur_nonb;
}
if( !h->param.rc.i_lookahead )
{
slicetype_frame_cost( h, a, frames, 0, last_nonb, last_nonb );
macroblock_tree_propagate( h, frames, average_duration, 0, last_nonb, last_nonb, 1 );
XCHG( uint16_t*, frames[last_nonb]->i_propagate_cost, frames[0]->i_propagate_cost );
}
// ----- 6.计算量化参数传播系数 ----- //
macroblock_tree_finish( h, frames[last_nonb], average_duration, last_nonb );
if( h->param.i_bframe_pyramid && bframes > 1 && !h->param.rc.i_vbv_buffer_size )
macroblock_tree_finish( h, frames[last_nonb+(bframes+1)/2], average_duration, 0 );
}
1.1 计算当前帧的帧内和帧间cost(slicetype_frame_cost)
计算帧的帧内和帧间cost,其主要的工作流程为:
- 检查是否已经评估了该帧
- 多线程处理
- 单线程处理
- 计算slice的cost(slicetype_slice_cost),其中会计算mb的cost(slicetype_mb_cost),会进行帧内预测和帧间预测,计算其最佳的cost
- 将计算结果累加起来
/*
将帧b以slice为单位计算其开销,统计其总开销
其中p0表示b的前向参考帧,p1表示b的后向参考帧
若p0 = p1 = b,则表示没有参考帧,即I帧
若p1 = b,则表示只有前向参考帧,即P帧
作为I帧,所有宏块的cost = intra cost
作为P帧,所有宏块的cost = min( intra cost, inter cost)
作为B帧,所有宏块的cost = inter cost
其中每一个帧都带有开销矩阵i_cost_est[b-p0][p1-b]
表示帧b以p0为前向参考,p1为后向参考时的帧cost
*/
static int slicetype_frame_cost( x264_t *h, x264_mb_analysis_t *a,
x264_frame_t **frames, int p0, int p1, int b )
{
int i_score = 0;
int do_search[2];
const x264_weight_t *w = x264_weight_none;
x264_frame_t *fenc = frames[b];
/* Check whether we already evaluated this frame
* If we have tried this frame as P, then we have also tried
* the preceding frames as B. (is this still true?) */
// ----- 1.检查是否已经评估了该帧,如果我们已经把该帧当作P帧计算,那我们仍然需要将前面的帧当作B帧来计算 ----- //
/* Also check that we already calculated the row SATDs for the current frame. */
// 同样检查我们是否对该帧进行了行SATD计算
if( fenc->i_cost_est[b-p0][p1-b] >= 0 && (!h->param.rc.i_vbv_buffer_size || fenc->i_row_satds[b-p0][p1-b][0] != -1) ) // 如果帧b已经计算了前向参考p0后向参考p1的cost,则直接得到开销
i_score = fenc->i_cost_est[b-p0][p1-b];
else
{
int dist_scale_factor = 128;
/* For each list, check to see whether we have lowres motion-searched this reference frame before. */
// 对于每个列表,检查我们之前是否对这个参考帧进行了低分辨率运动搜索
do_search[0] = b != p0 && fenc->lowres_mvs[0][b-p0-1][0][0] == 0x7FFF;
do_search[1] = b != p1 && fenc->lowres_mvs[1][p1-b-1][0][0] == 0x7FFF;
if( do_search[0] ) // 前向
{
if( h->param.analyse.i_weighted_pred && b == p1 )
{
x264_emms();
x264_weights_analyse( h, fenc, frames[p0], 1 );
w = fenc->weight[0];
}
fenc->lowres_mvs[0][b-p0-1][0][0] = 0;
}
if( do_search[1] ) fenc->lowres_mvs[1][p1-b-1][0][0] = 0; // 后向
if( p1 != p0 )
dist_scale_factor = ( ((b-p0) << 8) + ((p1-p0) >> 1) ) / (p1-p0);
int output_buf_size = h->mb.i_mb_height + (NUM_INTS + PAD_SIZE) * h->param.i_lookahead_threads;
int *output_inter[X264_LOOKAHEAD_THREAD_MAX+1];
int *output_intra[X264_LOOKAHEAD_THREAD_MAX+1];
output_inter[0] = h->scratch_buffer2;
output_intra[0] = output_inter[0] + output_buf_size;
#if HAVE_OPENCL
if( h->param.b_opencl )
{
x264_opencl_lowres_init(h, fenc, a->i_lambda );
if( do_search[0] )
{
x264_opencl_lowres_init( h, frames[p0], a->i_lambda );
x264_opencl_motionsearch( h, frames, b, p0, 0, a->i_lambda, w );
}
if( do_search[1] )
{
x264_opencl_lowres_init( h, frames[p1], a->i_lambda );
x264_opencl_motionsearch( h, frames, b, p1, 1, a->i_lambda, NULL );
}
if( b != p0 )
x264_opencl_finalize_cost( h, a->i_lambda, frames, p0, p1, b, dist_scale_factor );
x264_opencl_flush( h );
i_score = fenc->i_cost_est[b-p0][p1-b];
}
else
#endif
{ // ----- 2.多线程处理 ----- //
if( h->param.i_lookahead_threads > 1 ) // 多个lookahead线程
{
x264_slicetype_slice_t s[X264_LOOKAHEAD_THREAD_MAX];
for( int i = 0; i < h->param.i_lookahead_threads; i++ )
{
x264_t *t = h->lookahead_thread[i];
/* FIXME move this somewhere else */
// 将搜索方法赋值
t->mb.i_me_method = h->mb.i_me_method;
t->mb.i_subpel_refine = h->mb.i_subpel_refine;
t->mb.b_chroma_me = h->mb.b_chroma_me;
s[i] = (x264_slicetype_slice_t){ t, a, frames, p0, p1, b, dist_scale_factor, do_search, w,
output_inter[i], output_intra[i] };
// 计算起始行
t->i_threadslice_start = ((h->mb.i_mb_height * i + h->param.i_lookahead_threads/2) / h->param.i_lookahead_threads);
// 计算结束行
t->i_threadslice_end = ((h->mb.i_mb_height * (i+1) + h->param.i_lookahead_threads/2) / h->param.i_lookahead_threads);
int thread_height = t->i_threadslice_end - t->i_threadslice_start;
int thread_output_size = thread_height + NUM_INTS;
memset( output_inter[i], 0, thread_output_size * sizeof(int) );
memset( output_intra[i], 0, thread_output_size * sizeof(int) );
output_inter[i][NUM_ROWS] = output_intra[i][NUM_ROWS] = thread_height;
output_inter[i+1] = output_inter[i] + thread_output_size + PAD_SIZE;
output_intra[i+1] = output_intra[i] + thread_output_size + PAD_SIZE;
// 运行线程
x264_threadpool_run( h->lookaheadpool, (void*)slicetype_slice_cost, &s[i] );
}
// 等待线程结束
for( int i = 0; i < h->param.i_lookahead_threads; i++ )
x264_threadpool_wait( h->lookaheadpool, &s[i] );
}
else
{ // ----- 3.单个lookahead线程 ----- //
h->i_threadslice_start = 0;
h->i_threadslice_end = h->mb.i_mb_height;
memset( output_inter[0], 0, (output_buf_size - PAD_SIZE) * sizeof(int) );
memset( output_intra[0], 0, (output_buf_size - PAD_SIZE) * sizeof(int) );
output_inter[0][NUM_ROWS] = output_intra[0][NUM_ROWS] = h->mb.i_mb_height;
x264_slicetype_slice_t s = (x264_slicetype_slice_t){ h, a, frames, p0, p1, b, dist_scale_factor, do_search, w,
output_inter[0], output_intra[0] };
// ---- 4.计算slice的cost ---- //
slicetype_slice_cost( &s );
}
// ----- 5.将计算结果累加起来 ----- //
/* Sum up accumulators */
if( b == p1 )
fenc->i_intra_mbs[b-p0] = 0;
if( !fenc->b_intra_calculated )
{
fenc->i_cost_est[0][0] = 0;
fenc->i_cost_est_aq[0][0] = 0;
}
fenc->i_cost_est[b-p0][p1-b] = 0;
fenc->i_cost_est_aq[b-p0][p1-b] = 0;
int *row_satd_inter = fenc->i_row_satds[b-p0][p1-b];
int *row_satd_intra = fenc->i_row_satds[0][0];
for( int i = 0; i < h->param.i_lookahead_threads; i++ )
{
if( b == p1 )
fenc->i_intra_mbs[b-p0] += output_inter[i][INTRA_MBS];
if( !fenc->b_intra_calculated )
{
fenc->i_cost_est[0][0] += output_intra[i][COST_EST];
fenc->i_cost_est_aq[0][0] += output_intra[i][COST_EST_AQ];
}
// 累计intra的cost和aq_cost到开销矩阵i_cost_est[0][0]中
fenc->i_cost_est[b-p0][p1-b] += output_inter[i][COST_EST];
fenc->i_cost_est_aq[b-p0][p1-b] += output_inter[i][COST_EST_AQ];
if( h->param.rc.i_vbv_buffer_size )
{
int row_count = output_inter[i][NUM_ROWS];
memcpy( row_satd_inter, output_inter[i] + NUM_INTS, row_count * sizeof(int) );
if( !fenc->b_intra_calculated )
memcpy( row_satd_intra, output_intra[i] + NUM_INTS, row_count * sizeof(int) );
row_satd_inter += row_count;
row_satd_intra += row_count;
}
}
i_score = fenc->i_cost_est[b-p0][p1-b];
if( b != p1 )
i_score = (uint64_t)i_score * 100 / (120 + h->param.i_bframe_bias);
else
fenc->b_intra_calculated = 1;
fenc->i_cost_est[b-p0][p1-b] = i_score;
x264_emms();
}
}
return i_score;
}
1.2 计算宏块的传播cost(macroblock_tree_propagate)
该函数的主要功能是计算宏块的传播cost(或者叫做遗传cost),主要的工作流程为:
- 遍历所有的mb
- 计算传播cost(mbtree_propagate_cost)
- 将计算地传播cost分给参考mb(mbtree_propagate_list)
- 根据前述的计算结果计算qp_offset(macroblock_tree_finish)
static void macroblock_tree_propagate( x264_t *h, x264_frame_t **frames, float average_duration, int p0, int p1, int b, int referenced )
{
// ...
float fps_factor = CLIP_DURATION(frames[b]->f_duration) / (CLIP_DURATION(average_duration) * 256.0f) * MBTREE_PRECISION;
/* For non-reffed frames the source costs are always zero, so just memset one row and re-use it. */
// 对于没有被参考的帧,cost永远是0,因此直接赋值为0即可
if( !referenced )
memset( frames[b]->i_propagate_cost, 0, h->mb.i_mb_width * sizeof(uint16_t) );
// ----- 1.遍历所有的mb ----- //
for( h->mb.i_mb_y = 0; h->mb.i_mb_y < h->mb.i_mb_height; h->mb.i_mb_y++ )
{
int mb_index = h->mb.i_mb_y*h->mb.i_mb_stride;
// ----- 2.计算传播cost ----- //
h->mc.mbtree_propagate_cost( buf, propagate_cost,
frames[b]->i_intra_cost+mb_index, lowres_costs+mb_index,
frames[b]->i_inv_qscale_factor+mb_index, &fps_factor, h->mb.i_mb_width );
if( referenced )
propagate_cost += h->mb.i_mb_width;
// ----- 3.将计算地传播cost分给参考mb ----- //
h->mc.mbtree_propagate_list( h, ref_costs[0], &mvs[0][mb_index], buf, &lowres_costs[mb_index],
bipred_weights[0], h->mb.i_mb_y, h->mb.i_mb_width, 0 );
if( b != p1 )
{
h->mc.mbtree_propagate_list( h, ref_costs[1], &mvs[1][mb_index], buf, &lowres_costs[mb_index],
bipred_weights[1], h->mb.i_mb_y, h->mb.i_mb_width, 1 );
}
}
// ----- 4.根据前述的计算结果调整qp ----- //
if( h->param.rc.i_vbv_buffer_size && h->param.rc.i_lookahead && referenced )
macroblock_tree_finish( h, frames[b], average_duration, b == p1 ? b - p0 : 0 );
}
其中,计算传播cost的核心函数mbtree_propagate_cost的实现方式为
/* Estimate the total amount of influence on future quality that could be had if we
* were to improve the reference samples used to inter predict any given macroblock. */
// 本函数评估的内容是,如果提升任何用于帧间预测的样本,会对未来编码的质量带来多大的性能增益
static void mbtree_propagate_cost( int16_t *dst, uint16_t *propagate_in, uint16_t *intra_costs,
uint16_t *inter_costs, uint16_t *inv_qscales, float *fps_factor, int len )
{
// fps因子,在观看时,如果一个视频帧持续的时间较长,对观看者的主观影响就越大,因此将fps因子作为一个参数加进来进行调整
// float fps_factor = CLIP_DURATION(frames[b]->f_duration) / (CLIP_DURATION(average_duration) * 256.0f) * MBTREE_PRECISION;
float fps = *fps_factor;
for( int i = 0; i < len; i++ )
{
int intra_cost = intra_costs[i];
// 如果inter_cost > intra_cost,将其修正为intra_cost
int inter_cost = X264_MIN(intra_costs[i], inter_costs[i] & LOWRES_COST_MASK);
// qscale是拉格朗日乘子,或者也可叫做lambda,其计算方式为
// qscale = 0.85f * powf( 2.0f, ( qp - (12.0f + QP_BD_OFFSET) ) / 6.0f );
//
/*
我对inv_qscale初步的理解如下:
(1)inv_qscale的计算方式为:
frame->i_inv_qscale_factor[mb_xy] = x264_exp2fix8( frame->f_qp_offset[mb_xy] );
或 frame->i_inv_qscale_factor[mb_xy] = 256;
或 frame->i_inv_qscale_factor[mb_xy] = x264_exp2fix8(qp_adj); 使用aq_mode时会执行的操作
由于默认会使用aq_mode,所以默认是将qp_adj转换成为inv_qscale,即aq转换过来的qscale,所以可以理解为是一个aq的系数
(2)aq模式下,会对每一个mb进行qp的调整,所以将aq的系数也作为一个衡量重要程度的一个因子
*/
float propagate_intra = intra_cost * inv_qscales[i];
// propagate_in是前面的帧中计算的传播cost赋值到当前mb的
// 加上当前帧本身的cost,得到总体的cost
float propagate_amount = propagate_in[i] + propagate_intra*fps;
// inter_cost相对于intra_cost而言,如果越小,说明当前mb的传播cost越多,即当前mb蕴含的信息越多,越重要
float propagate_num = intra_cost - inter_cost;
float propagate_denom = intra_cost;
// 计算出来的propagate cost后续会被分到参考的mb中,用以表示该参考mb的重要性
dst[i] = X264_MIN((int)(propagate_amount * propagate_num / propagate_denom + 0.5f), 32767);
}
}
将计算出来的传播cost分发给参考mb
static void mbtree_propagate_list( x264_t *h, uint16_t *ref_costs, int16_t (*mvs)[2],
int16_t *propagate_amount, uint16_t *lowres_costs,
int bipred_weight, int mb_y, int len, int list )
{
// ...
int x = mvs[i][0];
int y = mvs[i][1];
unsigned mbx = (unsigned)((x>>5)+i);
unsigned mby = (unsigned)((y>>5)+mb_y);
unsigned idx0 = mbx + mby * stride;
unsigned idx2 = idx0 + stride;
x &= 31;
y &= 31;
/*
+ ---- + ---- +
| 0 | 1 |
+ ---- + ---- +
| 2 | 3 |
+ ---- + ---- +
运动搜索的结果不一定让mv刚好处于某一个Mb上,可能处于多个Mb宏块上,则将当前的遗传代价根据所占块的面积来分配给各个宏块
分成了4个小块,每个mb都是16*16
*/
int idx0weight = (32-y)*(32-x); // 0号块的面积
int idx1weight = (32-y)*x; // 1号块的面积
int idx2weight = y*(32-x); // 2号块的面积
int idx3weight = y*x; // 3号块的面积
idx0weight = (idx0weight * listamount + 512) >> 10; // 0号块的权重
idx1weight = (idx1weight * listamount + 512) >> 10; // 1号块的权重
idx2weight = (idx2weight * listamount + 512) >> 10; // 2号块的权重
idx3weight = (idx3weight * listamount + 512) >> 10; // 3号块的权重
// 更新当前mb传播出去的代价
if( mbx < width-1 && mby < height-1 )
{
MC_CLIP_ADD( ref_costs[idx0+0], idx0weight );
MC_CLIP_ADD( ref_costs[idx0+1], idx1weight );
MC_CLIP_ADD( ref_costs[idx2+0], idx2weight );
MC_CLIP_ADD( ref_costs[idx2+1], idx3weight );
}
// ...
}
}
根据传播cost调整qp,实现的函数macroblock_tree_finish如下
static void macroblock_tree_finish( x264_t *h, x264_frame_t *frame, float average_duration, int ref0_distance )
{
int fps_factor = round( CLIP_DURATION(average_duration) / CLIP_DURATION(frame->f_duration) * 256 / MBTREE_PRECISION );
float weightdelta = 0.0;
if( ref0_distance && frame->f_weighted_cost_delta[ref0_distance-1] > 0 )
weightdelta = (1.0 - frame->f_weighted_cost_delta[ref0_distance-1]);
/* Allow the strength to be adjusted via qcompress, since the two
* concepts are very similar. */
// 允许通过qcompress来调整strength
float strength = 5.0f * (1.0f - h->param.rc.f_qcompress);
for( int mb_index = 0; mb_index < h->mb.i_mb_count; mb_index++ )
{
int intra_cost = (frame->i_intra_cost[mb_index] * frame->i_inv_qscale_factor[mb_index] + 128) >> 8;
if( intra_cost )
{
int propagate_cost = (frame->i_propagate_cost[mb_index] * fps_factor + 128) >> 8;
// intra_cost : 当前mb的代价,propagate_cost : 前面的mb计算出来的传播cost赋予到当前的mb,两者相加表示了当前mb所蕴含的信息量
float log2_ratio = x264_log2(intra_cost + propagate_cost) - x264_log2(intra_cost) + weightdelta;
// 我对这里的理解是,如果log2_ratio越大,右边整体的值就越小,此时qp_offset就越小
// log2_ratio表示的是传入进来的cost和自身cost的比值
// 如果传入进来propagate_cost比自身的intra_cost要大,则log2_ratio较大
// 此时,下面等式右边的值就会偏小,即qp_offset偏小,当前mb的qp会比较小(当前mb很重要)
// 反之,qp_offset会偏大,当前mb的qp会较大(当前mb不那么重要)
frame->f_qp_offset[mb_index] = frame->f_qp_offset_aq[mb_index] - strength * log2_ratio;
}
}
}
1.3 mbtree的测试
从上面的分析来看,对于mbtree这一项优化工具而言,利用了lookahead中的一系列参考帧,计算每个mb在这些参考关系中的重要程度,如果一个mb很重要,则进行较高质量编码(低qp),否则进行较低质量编码(高qp)。
重要程度的衡量内容包括:
(1)当前mb的intra_cost、inter_cost(由自己计算)
(2)当前mb的propagate_cost(由其他mb计算并且赋值到当前mb)
mbtree这一工具所提出的论文为:《Garrett-Glaser J. A novel macroblock-tree algorithm for high-performance optimization of dependent video coding in H.264/AVC[J]. Tech. Rep., 2009.》,新的宏块树方法在很低的运算消耗下比已有的宏块树算法在PSNR性能上提高1.2db,在SSIM性能上提高2.3db。我自己用自己的测试序列进行测试100帧,分辨率为1290x1200,是否启用mbtree可以用x264的命令行参数–no-mbtree指定:
(1)不使用mbtree
x264 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2
x264 [info]: profile High, level 5.0, 4:2:0, 8-bit
x264 [info]: frame I:8 Avg QP:23.88 size:186467
x264 [info]: frame P:39 Avg QP:26.54 size:108471
x264 [info]: frame B:53 Avg QP:27.15 size: 63502
x264 [info]: consecutive B-frames: 19.0% 22.0% 27.0% 32.0%
x264 [info]: mb I I16..4: 5.4% 79.7% 14.9%
x264 [info]: mb P I16..4: 6.3% 32.9% 11.4% P16..4: 23.3% 14.6% 7.3% 0.0% 0.0% skip: 4.2%
x264 [info]: mb B I16..4: 2.0% 7.0% 2.6% B16..8: 40.0% 17.7% 4.6% direct: 7.4% skip:18.7% L0:41.7% L1:45.6% BI:12.8%
x264 [info]: 8x8 transform intra:67.6% inter:65.4%
x264 [info]: coded y,uvDC,uvAC intra: 72.1% 32.0% 5.5% inter: 41.5% 18.1% 0.1%
x264 [info]: i16 v,h,dc,p: 64% 16% 6% 14%
x264 [info]: i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 27% 19% 23% 5% 5% 4% 6% 4% 7%
x264 [info]: i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 45% 17% 10% 4% 5% 4% 6% 4% 4%
x264 [info]: i8c dc,h,v,p: 64% 19% 16% 1%
x264 [info]: Weighted P-Frames: Y:23.1% UV:12.8%
x264 [info]: ref P L0: 66.4% 19.8% 9.3% 3.7% 0.8%
x264 [info]: ref B L0: 94.4% 4.4% 1.1%
x264 [info]: ref B L1: 99.1% 0.9%
x264 [info]: kb/s:18175.45
encoded 100 frames, 8.24 fps, 18175.44 kb/s
(2)使用mbtree
x264 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2
x264 [info]: profile High, level 5.0, 4:2:0, 8-bit
x264 [info]: frame I:7 Avg QP:23.59 size:195961
x264 [info]: frame P:39 Avg QP:26.23 size:110076
x264 [info]: frame B:54 Avg QP:28.41 size: 50786
x264 [info]: consecutive B-frames: 19.0% 18.0% 27.0% 36.0%
x264 [info]: mb I I16..4: 4.4% 79.4% 16.2%
x264 [info]: mb P I16..4: 5.3% 36.6% 12.0% P16..4: 22.8% 13.9% 6.6% 0.0% 0.0% skip: 2.8%
x264 [info]: mb B I16..4: 2.1% 6.0% 2.4% B16..8: 41.2% 16.4% 3.9% direct: 6.0% skip:22.1% L0:42.4% L1:47.2% BI:10.5%
x264 [info]: 8x8 transform intra:68.4% inter:66.7%
x264 [info]: coded y,uvDC,uvAC intra: 73.8% 32.0% 5.2% inter: 37.2% 15.2% 0.1%
x264 [info]: i16 v,h,dc,p: 64% 16% 6% 14%
x264 [info]: i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 27% 19% 23% 5% 5% 4% 6% 4% 7%
x264 [info]: i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 45% 17% 10% 4% 5% 5% 6% 4% 4%
x264 [info]: i8c dc,h,v,p: 65% 19% 15% 1%
x264 [info]: Weighted P-Frames: Y:33.3% UV:17.9%
x264 [info]: ref P L0: 65.4% 19.2% 10.6% 3.7% 1.2%
x264 [info]: ref B L0: 94.3% 4.5% 1.2%
x264 [info]: ref B L1: 98.8% 1.2%
x264 [info]: kb/s:16814.29
encoded 100 frames, 8.43 fps, 16814.29 kb/s
从测试结果来看,使用mbtree能够显著降低bitrate(7.5%左右),且带来的编码速度降低不十分明显(2.3%左右),另外速度的测试还存在误差,所以这里差异比较小。使用了mbtree之后,每个帧编码的比特数显著降低,这说明使用了mbtree之后编码的残差变小了,佐证了该算法的有效性
2.自适应量化参数(Adaptive Quantization,AQ)
x264当中,还有一个重要mb级别的码控工具,自适应量化参数(Adaptive Quantization,AQ),它的定义如下
#define X264_AQ_NONE 0 // 不使用AQ
#define X264_AQ_VARIANCE 1 // 使用方差AQ(默认配置)
#define X264_AQ_AUTOVARIANCE 2 // 使用自动方差AQ
#define X264_AQ_AUTOVARIANCE_BIASED 3 // 使用带有偏置项的自动方差AQ
AQ通过考虑一帧之内mb的复杂程度来调整编码的量化参数,从而获得更好的编码质量,不同的模式对应着不同的自适应方式:
- X264_AQ_NONE:不使用AQ,不进行qp的调整
- X264_AQ_VARIANCE:使用方差AQ,仅考虑自身mb的复杂度,不考虑当前帧当中其他mb的复杂度
- X264_AQ_AUTOVARIACNE:使用自动方差AQ,不仅考虑自身mb的复杂度,还考虑当前帧中其他mb的复杂度
- X264_AQ_AUTOVARIACNE_BIASED:使用带有偏置项的自动方差AQ,考虑自身和其他mb的复杂度,计算qp_offset还需要加上一个偏置项
对于AQ的理解,假设每个宏块只有一个频率系数。这个系数可以大也可以小如果一个宏块的系数是1.4,它被量化为1。误差为28.5%,如果一个宏块的系数是9.4,它被量化为9。误差为4.3%。那么,当使用线性量化器时,较大系数的编码比较小系数的编码更精确。从视觉上看,这显然是错误的;仅仅因为某个细节在X块中的对比度比Y块高10倍,并不意味着Y块应该被完全删除。在这种情况下,使用AQ是比较合适的,根据宏块的系数对量化参数进行自适应调整,能够对具有不同系数的宏块更好地编码,实现更精准的mb级别的码控
另外,AQ是mbtree的必要前提,即如果要开启mbtree,则必须要开启AQ,如果没有开启则会强制开启。与AQ有关的另一个紧密相关的变量是aq_strength,它控制了AQ的强度,如果编码的场景复杂,应该使用较小的aq_strength,反之应该用较大的aq_strength。aq_strength默认配置为1,1表示认为当前的场景比较简单,调控的力度比较大
/* MB-tree requires AQ to be on, even if the strength is zero. */
// MB-tree要求AQ打开,即便aq_strength为0
if( !h->param.rc.i_aq_mode && h->param.rc.b_mb_tree )
{
h->param.rc.i_aq_mode = 1;
h->param.rc.f_aq_strength = 0;
}
自适应量化参数AQ的主要实现流程为:
- 自适应量化一帧(x264_adaptive_quant_frame)
- 量化参数的调整(x264_ratecontrol_mb_qp)
- 宏块分析(x264_macroblock_analyse)
2.1 自适应量化一帧(x264_adaptive_quant_frame)
该函数是AQ工具实现的具体函数,通过区分不同的模式,为mb计算qp_adj,即qp的偏移量。主要工作流程为:
- 如果不使用自适应量化(AQ模式),aq_mode为X264_AQ_NONE或者aq_strength为0,则考虑为了使用Mbtree进行qp_offset和inv_qscale_factor的初始化
- 如果使用AQ,则需要计算qp_adj, strength等信息
(1)如果使用AUTOVARIANCE或者是AUTOVARIANCE_BIASED,则要考虑调整qp和strength,具体地做法是计算每一个mb应该调整的qp量,随后取均值,然后计算avg_adj和strength。如果使用VARIANCE,则仅考虑strength
(2)根据不同的AQ模式对应的计算公式,来计算qp_adj
void x264_adaptive_quant_frame( x264_t *h, x264_frame_t *frame, float *quant_offsets )
{
/* Initialize frame stats */
for( int i = 0; i < 3; i++ )
{
frame->i_pixel_sum[i] = 0;
frame->i_pixel_ssd[i] = 0;
}
/* Degenerate cases */
// ----- 1.不使用自适应量化 ----- //
if( h->param.rc.i_aq_mode == X264_AQ_NONE || h->param.rc.f_aq_strength == 0 )
{
/* Need to init it anyways for MB tree */
// 无论如何都要初始化,为了Mb tree的使用
if( h->param.rc.i_aq_mode && h->param.rc.f_aq_strength == 0 )
{
if( quant_offsets )
{
for( int mb_xy = 0; mb_xy < h->mb.i_mb_count; mb_xy++ )
frame->f_qp_offset[mb_xy] = frame->f_qp_offset_aq[mb_xy] = quant_offsets[mb_xy];
if( h->frames.b_have_lowres )
for( int mb_xy = 0; mb_xy < h->mb.i_mb_count; mb_xy++ )
frame->i_inv_qscale_factor[mb_xy] = x264_exp2fix8( frame->f_qp_offset[mb_xy] );
}
else
{
memset( frame->f_qp_offset, 0, h->mb.i_mb_count * sizeof(float) );
memset( frame->f_qp_offset_aq, 0, h->mb.i_mb_count * sizeof(float) );
if( h->frames.b_have_lowres )
for( int mb_xy = 0; mb_xy < h->mb.i_mb_count; mb_xy++ )
frame->i_inv_qscale_factor[mb_xy] = 256;
}
}
/* Need variance data for weighted prediction */
if( h->param.analyse.i_weighted_pred )
{
for( int mb_y = 0; mb_y < h->mb.i_mb_height; mb_y++ )
for( int mb_x = 0; mb_x < h->mb.i_mb_width; mb_x++ )
ac_energy_mb( h, mb_x, mb_y, frame );
}
else
return;
}
/* Actual adaptive quantization */
// ----- 2.实际的自适应量化 -----
else
{
/* constants chosen to result in approximately the same overall bitrate as without AQ.
* FIXME: while they're written in 5 significant digits, they're only tuned to 2. */
float strength;
float avg_adj = 0.f;
float bias_strength = 0.f;
// 如果是AUTOVARIANCE或者AUTOVARIANCE_BIASED,则需要考虑调整qp和strength
if( h->param.rc.i_aq_mode == X264_AQ_AUTOVARIANCE || h->param.rc.i_aq_mode == X264_AQ_AUTOVARIANCE_BIASED )
{
float bit_depth_correction = 1.f / (1 << (2*(BIT_DEPTH-8)));
float avg_adj_pow2 = 0.f;
// 计算每一个mb应该调整的qp量,随后取均值
for( int mb_y = 0; mb_y < h->mb.i_mb_height; mb_y++ )
for( int mb_x = 0; mb_x < h->mb.i_mb_width; mb_x++ )
{
// 在AQ模式计算中,将能量定义为方差。能量(方差)越大,qp调整值越大
// E[X^2] - E[X]^2 = ssd - (sum * sum >> shift)
uint32_t energy = ac_energy_mb( h, mb_x, mb_y, frame );
float qp_adj = powf( energy * bit_depth_correction + 1, 0.125f );
frame->f_qp_offset[mb_x + mb_y*h->mb.i_mb_stride] = qp_adj;
// 计算avg_adj
avg_adj += qp_adj;
avg_adj_pow2 += qp_adj * qp_adj;
}
avg_adj /= h->mb.i_mb_count;
avg_adj_pow2 /= h->mb.i_mb_count;
// 计算strength,根据所有mb的平均qp_adj来调整
strength = h->param.rc.f_aq_strength * avg_adj;
// 计算avg_adj,根据所有mb的平均
avg_adj = avg_adj - 0.5f * (avg_adj_pow2 - 14.f) / avg_adj;
// 计算bias_strength
bias_strength = h->param.rc.f_aq_strength;
}
else
// 如果是VARIANCE,则仅调整strength
strength = h->param.rc.f_aq_strength * 1.0397f;
for( int mb_y = 0; mb_y < h->mb.i_mb_height; mb_y++ )
for( int mb_x = 0; mb_x < h->mb.i_mb_width; mb_x++ )
{
float qp_adj;
int mb_xy = mb_x + mb_y*h->mb.i_mb_stride;
if( h->param.rc.i_aq_mode == X264_AQ_AUTOVARIANCE_BIASED ) // AUTOVARIANCE_BIASED的qp调整
{
qp_adj = frame->f_qp_offset[mb_xy];
qp_adj = strength * (qp_adj - avg_adj) + bias_strength * (1.f - 14.f / (qp_adj * qp_adj)); // 增加的偏置项强度
}
else if( h->param.rc.i_aq_mode == X264_AQ_AUTOVARIANCE ) // AUTOVARIANCE的qp调整
{
qp_adj = frame->f_qp_offset[mb_xy];
qp_adj = strength * (qp_adj - avg_adj);
}
else // VARIANCE的qp调整
{
uint32_t energy = ac_energy_mb( h, mb_x, mb_y, frame );
qp_adj = strength * (x264_log2( X264_MAX(energy, 1) ) - (14.427f + 2*(BIT_DEPTH-8)));
}
if( quant_offsets )
qp_adj += quant_offsets[mb_xy];
// 最后赋值的地方
frame->f_qp_offset[mb_xy] =
frame->f_qp_offset_aq[mb_xy] = qp_adj;
if( h->frames.b_have_lowres )
frame->i_inv_qscale_factor[mb_xy] = x264_exp2fix8(qp_adj);
}
}
// ...
}
2.2 量化参数的调整(x264_ratecontrol_mb_qp)
计算之后的qp,实际调整的地方位于x264_ratecontrol_mb_qp之中,该函数给出了一个待编码mb的初始qp
int x264_ratecontrol_mb_qp( x264_t *h )
{
x264_emms();
float qp = h->rc->qpm;
if( h->param.rc.i_aq_mode )
{
/* MB-tree currently doesn't adjust quantizers in unreferenced frames. */
// 如果当前帧会作为一个参考帧,则使用qp_offset,否则使用qp_offset_aq
float qp_offset = h->fdec->b_kept_as_ref ? h->fenc->f_qp_offset[h->mb.i_mb_xy] : h->fenc->f_qp_offset_aq[h->mb.i_mb_xy];
/* Scale AQ's effect towards zero in emergency mode. */
if( qp > QP_MAX_SPEC )
qp_offset *= (QP_MAX - qp) / (QP_MAX - QP_MAX_SPEC);
qp += qp_offset;
}
return x264_clip3( qp + 0.5f, h->param.rc.i_qp_min, h->param.rc.i_qp_max );
}
2.3 宏块分析(x264_macroblock_analyse)
在进行宏块分析时,如果启用了aq并且像素精度的等级小于10,则会进行qp的调整。如果当前mb的qp和前面mb的qp的差值为1,则将当前mb的qp调整为和前面的qp一样,否则使用当前qp
void x264_macroblock_analyse( x264_t *h )
{
x264_mb_analysis_t analysis;
int i_cost = COST_MAX;
h->mb.i_qp = x264_ratecontrol_mb_qp( h );
/* If the QP of this MB is within 1 of the previous MB, code the same QP as the previous MB,
* to lower the bit cost of the qp_delta. Don't do this if QPRD is enabled. */
// 如果使用aq模式,并且像素精度的等级小于10
// 如果当前mb的qp和前面mb的qp的差值为1,则将当前mb的qp调整为和前面的qp一样,否则使用当前qp
if( h->param.rc.i_aq_mode && h->param.analyse.i_subpel_refine < 10 )
h->mb.i_qp = abs(h->mb.i_qp - h->mb.i_last_qp) == 1 ? h->mb.i_last_qp : h->mb.i_qp;
// ...
2.4 AQ的测试
使用自测的序列做简单的测试,分辨率为1920x1200,在这里还关闭了mbtree,因为如果启用了mbtree则会强制开启aq。对比测试配置的参数为开启–no-mbtree 和关闭AQ --no-mbtree --aq-mode 0
(1)使用默认配置的AQ(X264_AQ_VARIANCE)
yuv [info]: 1920x1200p 0:0 @ 25/1 fps (cfr)
x264 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2
x264 [info]: profile High, level 5.0, 4:2:0, 8-bit
x264 [info]: frame I:7 Avg QP:23.59 size:195961
x264 [info]: frame P:39 Avg QP:26.23 size:110076
x264 [info]: frame B:54 Avg QP:28.41 size: 50786
x264 [info]: consecutive B-frames: 19.0% 18.0% 27.0% 36.0%
x264 [info]: mb I I16..4: 4.4% 79.4% 16.2%
x264 [info]: mb P I16..4: 5.3% 36.6% 12.0% P16..4: 22.8% 13.9% 6.6% 0.0% 0.0% skip: 2.8%
x264 [info]: mb B I16..4: 2.1% 6.0% 2.4% B16..8: 41.2% 16.4% 3.9% direct: 6.0% skip:22.1% L0:42.4% L1:47.2% BI:10.5%
x264 [info]: 8x8 transform intra:68.4% inter:66.7%
x264 [info]: coded y,uvDC,uvAC intra: 73.8% 32.0% 5.2% inter: 37.2% 15.2% 0.1%
x264 [info]: i16 v,h,dc,p: 64% 16% 6% 14%
x264 [info]: i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 27% 19% 23% 5% 5% 4% 6% 4% 7%
x264 [info]: i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 45% 17% 10% 4% 5% 5% 6% 4% 4%
x264 [info]: i8c dc,h,v,p: 65% 19% 15% 1%
x264 [info]: Weighted P-Frames: Y:33.3% UV:17.9%
x264 [info]: ref P L0: 65.4% 19.2% 10.6% 3.7% 1.2%
x264 [info]: ref B L0: 94.3% 4.5% 1.2%
x264 [info]: ref B L1: 98.8% 1.2%
x264 [info]: kb/s:16814.29
encoded 100 frames, 8.43 fps, 16814.29 kb/s
(2)关闭AQ
yuv [info]: 1920x1200p 0:0 @ 25/1 fps (cfr)
x264 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2
x264 [info]: profile High, level 5.0, 4:2:0, 8-bit
x264 [info]: frame I:7 Avg QP:25.00 size:195532
x264 [info]: frame P:39 Avg QP:27.79 size:114287
x264 [info]: frame B:54 Avg QP:29.52 size: 54980
x264 [info]: consecutive B-frames: 19.0% 18.0% 27.0% 36.0%
x264 [info]: mb I I16..4: 20.0% 66.9% 13.0%
x264 [info]: mb P I16..4: 13.7% 28.8% 10.8% P16..4: 16.6% 12.6% 6.5% 0.0% 0.0% skip:11.0%
x264 [info]: mb B I16..4: 3.2% 4.9% 2.6% B16..8: 35.2% 15.6% 4.3% direct: 4.5% skip:29.7% L0:40.8% L1:45.1% BI:14.2%
x264 [info]: 8x8 transform intra:55.4% inter:61.7%
x264 [info]: coded y,uvDC,uvAC intra: 61.5% 33.5% 7.1% inter: 33.4% 10.1% 0.1%
x264 [info]: i16 v,h,dc,p: 57% 24% 10% 8%
x264 [info]: i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 29% 18% 20% 5% 5% 4% 6% 5% 7%
x264 [info]: i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 45% 17% 11% 4% 5% 5% 6% 4% 4%
x264 [info]: i8c dc,h,v,p: 68% 17% 14% 1%
x264 [info]: Weighted P-Frames: Y:33.3% UV:17.9%
x264 [info]: ref P L0: 63.2% 22.2% 10.6% 3.1% 0.9%
x264 [info]: ref B L0: 94.7% 4.2% 1.1%
x264 [info]: ref B L1: 98.9% 1.1%
x264 [info]: kb/s:17589.66
encoded 100 frames, 9.00 fps, 17589.66 kb/s
从上面的对比来看,关闭AQ之后,码率增加了4.5%,并且编码的帧率也增加了6.3%。从数据来看,关闭AQ之后,各个帧的类型的AvgQP都增加了,其中B帧的编码比特增加较为明显。这里猜测是因为B帧的帧间编码比较复杂,利用帧内的信息进行了AQ的调整,使得编码qp更加合理,这样编码的比特数更小
3.小结
本节讨论了x264当中宏块级别的两个工具Mbtree和AQ,其中Mbtree基于帧间,利用前后帧当中mb互相参考的重要程度来调控mb级别的qp,AQ基于帧内,利用当前帧的mb(或当前帧所有mb)的复杂度信息来调控mb级别的qp。两者的差异是应用在帧内和帧间两个模块之中,关联是Mbtree依赖于AQ,即要使用Mbtree时必须开启AQ,如没有开启AQ,则会强制打开
值得思考的问题是:
(1)这里只考虑了mb级别的码控,没有考虑帧级别的,帧级别的码控是如何处理的?
(2)Mbtree使用了lookahead,这个lookahead实现的细节是怎样的?
CSDN : https://blog.csdn.net/weixin_42877471
Github : https://github.com/DoFulangChen