【x264】码率控制模块的简单分析—宏块级码控工具Mbtree和AQ

参考:
帧类型决策-slicetype_frame/slice/mb_cost()

参数分析:
【x264】x264编码器参数配置

流程分析:
【x264】x264编码主流程简单分析
【x264】编码核心函数(x264_encoder_encode)的简单分析
【x264】分析模块(analyse)的简单分析—帧内预测
【x264】分析模块(analyse)的简单分析—帧间预测

1. 宏块树(mbtree)

mbtree是一种宏块级别的码率控制工具,根据帧间预测相关信息,来有效的降低编码的码率。mbtree的工作原理可以简单描述为,在lookahead当中通过前向预测,来获得当前mb的重要程度,重要程度由被参考的情况来衡量。如果被参考的权值很高,说明这个mb很重要,此时这个mb会按照较低的qp编码,否则按照较高的qp编码。这个权值的衡量由当前mb的intra_cost、inter_cost和前面mb赋予当前mb的传播cost决定

在mbtree中,主要的计算过程位于macroblock_tree中,函数位于slicetype.c中,函数主要的工作流程如下:

  1. 如果当前帧为intra帧,则直接计算cost
  2. 为propagate_cost赋初始值
  3. 循环处理每一帧,计算cost
  4. 计算当前帧的帧内和帧间cost(slicetype_frame_cost)
  5. 计算宏块的传播cost(macroblock_tree_propagate),之后根据帧内、帧间cost和宏块传播cost来计算qp_offset(macroblock_tree_finish),从而调整当前mb的qp
static void macroblock_tree( x264_t *h, x264_mb_analysis_t *a, x264_frame_t **frames, int num_frames, int b_intra )
{
    int idx = !b_intra;
    int last_nonb, cur_nonb = 1;
    int bframes = 0;

    x264_emms();
    float total_duration = 0.0;
    for( int j = 0; j <= num_frames; j++ )
        total_duration += frames[j]->f_duration;
    float average_duration = total_duration / (num_frames + 1);

    int i = num_frames;
	// ----- 1.当前帧为intra帧,直接计算cost ----- //
    if( b_intra )
        slicetype_frame_cost( h, a, frames, 0, 0, 0 );

    while( i > 0 && IS_X264_TYPE_B( frames[i]->i_type ) )
        i--;
    last_nonb = i;

    /* Lookaheadless MB-tree is not a theoretically distinct case; the same extrapolation could
     * be applied to the end of a lookahead buffer of any size.  However, it's most needed when
     * lookahead=0, so that's what's currently implemented. */
    // ----- 2.为propagate_cost赋初始值 ----- //
    if( !h->param.rc.i_lookahead )
    {
        if( b_intra )
        {
            memset( frames[0]->i_propagate_cost, 0, h->mb.i_mb_count * sizeof(uint16_t) );
            memcpy( frames[0]->f_qp_offset, frames[0]->f_qp_offset_aq, h->mb.i_mb_count * sizeof(float) );
            return;
        }
        XCHG( uint16_t*, frames[last_nonb]->i_propagate_cost, frames[0]->i_propagate_cost );
        memset( frames[0]->i_propagate_cost, 0, h->mb.i_mb_count * sizeof(uint16_t) );
    }
    else
    {
        if( last_nonb < idx )
            return;
        memset( frames[last_nonb]->i_propagate_cost, 0, h->mb.i_mb_count * sizeof(uint16_t) );
    }
	// ----- 3.循环处理每一帧,计算cost ----- //
    while( i-- > idx )
    {
        cur_nonb = i;
        while( IS_X264_TYPE_B( frames[cur_nonb]->i_type ) && cur_nonb > 0 )
            cur_nonb--;
        if( cur_nonb < idx )
            break;
        // ---- 4.计算当前帧的帧内和帧间cost ----- //
        slicetype_frame_cost( h, a, frames, cur_nonb, last_nonb, last_nonb );
        memset( frames[cur_nonb]->i_propagate_cost, 0, h->mb.i_mb_count * sizeof(uint16_t) );
        bframes = last_nonb - cur_nonb - 1;
        if( h->param.i_bframe_pyramid && bframes > 1 ) // B帧
        {
            int middle = (bframes + 1)/2 + cur_nonb;
            slicetype_frame_cost( h, a, frames, cur_nonb, last_nonb, middle );
            memset( frames[middle]->i_propagate_cost, 0, h->mb.i_mb_count * sizeof(uint16_t) );
            while( i > cur_nonb )
            {
                int p0 = i > middle ? middle : cur_nonb;
                int p1 = i < middle ? middle : last_nonb;
                if( i != middle )
                {
                    slicetype_frame_cost( h, a, frames, p0, p1, i );
                    // ----- 5.计算宏块的传播cost ----- //
                    macroblock_tree_propagate( h, frames, average_duration, p0, p1, i, 0 );
                }
                i--;
            }
            macroblock_tree_propagate( h, frames, average_duration, cur_nonb, last_nonb, middle, 1 );
        }
        else
        {
            while( i > cur_nonb )
            {
                slicetype_frame_cost( h, a, frames, cur_nonb, last_nonb, i );
                macroblock_tree_propagate( h, frames, average_duration, cur_nonb, last_nonb, i, 0 );
                i--;
            }
        }
        macroblock_tree_propagate( h, frames, average_duration, cur_nonb, last_nonb, last_nonb, 1 );
        last_nonb = cur_nonb;
    }

    if( !h->param.rc.i_lookahead )
    {
        slicetype_frame_cost( h, a, frames, 0, last_nonb, last_nonb );
        macroblock_tree_propagate( h, frames, average_duration, 0, last_nonb, last_nonb, 1 );
        XCHG( uint16_t*, frames[last_nonb]->i_propagate_cost, frames[0]->i_propagate_cost );
    }
	// ----- 6.计算量化参数传播系数 ----- //
    macroblock_tree_finish( h, frames[last_nonb], average_duration, last_nonb );
    if( h->param.i_bframe_pyramid && bframes > 1 && !h->param.rc.i_vbv_buffer_size )
        macroblock_tree_finish( h, frames[last_nonb+(bframes+1)/2], average_duration, 0 );
}

1.1 计算当前帧的帧内和帧间cost(slicetype_frame_cost)

计算帧的帧内和帧间cost,其主要的工作流程为:

  1. 检查是否已经评估了该帧
  2. 多线程处理
  3. 单线程处理
  4. 计算slice的cost(slicetype_slice_cost),其中会计算mb的cost(slicetype_mb_cost),会进行帧内预测和帧间预测,计算其最佳的cost
  5. 将计算结果累加起来
/*
将帧b以slice为单位计算其开销,统计其总开销
其中p0表示b的前向参考帧,p1表示b的后向参考帧
若p0 = p1 = b,则表示没有参考帧,即I帧
若p1 = b,则表示只有前向参考帧,即P帧

作为I帧,所有宏块的cost = intra cost
作为P帧,所有宏块的cost = min( intra cost, inter cost)
作为B帧,所有宏块的cost = inter cost

其中每一个帧都带有开销矩阵i_cost_est[b-p0][p1-b]
表示帧b以p0为前向参考,p1为后向参考时的帧cost
*/
static int slicetype_frame_cost( x264_t *h, x264_mb_analysis_t *a,
                                 x264_frame_t **frames, int p0, int p1, int b )
{
    int i_score = 0;
    int do_search[2];
    const x264_weight_t *w = x264_weight_none;
    x264_frame_t *fenc = frames[b];

    /* Check whether we already evaluated this frame
     * If we have tried this frame as P, then we have also tried
     * the preceding frames as B. (is this still true?) */
    // ----- 1.检查是否已经评估了该帧,如果我们已经把该帧当作P帧计算,那我们仍然需要将前面的帧当作B帧来计算 ----- //
    
    /* Also check that we already calculated the row SATDs for the current frame. */
    // 同样检查我们是否对该帧进行了行SATD计算
    if( fenc->i_cost_est[b-p0][p1-b] >= 0 && (!h->param.rc.i_vbv_buffer_size || fenc->i_row_satds[b-p0][p1-b][0] != -1) ) // 如果帧b已经计算了前向参考p0后向参考p1的cost,则直接得到开销
        i_score = fenc->i_cost_est[b-p0][p1-b];
    else
    {
        int dist_scale_factor = 128;

        /* For each list, check to see whether we have lowres motion-searched this reference frame before. */
        // 对于每个列表,检查我们之前是否对这个参考帧进行了低分辨率运动搜索
        do_search[0] = b != p0 && fenc->lowres_mvs[0][b-p0-1][0][0] == 0x7FFF;
        do_search[1] = b != p1 && fenc->lowres_mvs[1][p1-b-1][0][0] == 0x7FFF;
        if( do_search[0] ) // 前向
        {
            if( h->param.analyse.i_weighted_pred && b == p1 )
            {
                x264_emms();
                x264_weights_analyse( h, fenc, frames[p0], 1 );
                w = fenc->weight[0];
            }
            fenc->lowres_mvs[0][b-p0-1][0][0] = 0;
        }
        if( do_search[1] ) fenc->lowres_mvs[1][p1-b-1][0][0] = 0; // 后向

        if( p1 != p0 )
            dist_scale_factor = ( ((b-p0) << 8) + ((p1-p0) >> 1) ) / (p1-p0);

        int output_buf_size = h->mb.i_mb_height + (NUM_INTS + PAD_SIZE) * h->param.i_lookahead_threads;
        int *output_inter[X264_LOOKAHEAD_THREAD_MAX+1];
        int *output_intra[X264_LOOKAHEAD_THREAD_MAX+1];
        output_inter[0] = h->scratch_buffer2;
        output_intra[0] = output_inter[0] + output_buf_size;

#if HAVE_OPENCL
        if( h->param.b_opencl )
        {
            x264_opencl_lowres_init(h, fenc, a->i_lambda );
            if( do_search[0] )
            {
                x264_opencl_lowres_init( h, frames[p0], a->i_lambda );
                x264_opencl_motionsearch( h, frames, b, p0, 0, a->i_lambda, w );
            }
            if( do_search[1] )
            {
                x264_opencl_lowres_init( h, frames[p1], a->i_lambda );
                x264_opencl_motionsearch( h, frames, b, p1, 1, a->i_lambda, NULL );
            }
            if( b != p0 )
                x264_opencl_finalize_cost( h, a->i_lambda, frames, p0, p1, b, dist_scale_factor );
            x264_opencl_flush( h );

            i_score = fenc->i_cost_est[b-p0][p1-b];
        }
        else
#endif
        {	// ----- 2.多线程处理 ----- //
            if( h->param.i_lookahead_threads > 1 ) // 多个lookahead线程
            { 
                x264_slicetype_slice_t s[X264_LOOKAHEAD_THREAD_MAX];
                for( int i = 0; i < h->param.i_lookahead_threads; i++ ) 
                {
                    x264_t *t = h->lookahead_thread[i];

                    /* FIXME move this somewhere else */
                    // 将搜索方法赋值
                    t->mb.i_me_method = h->mb.i_me_method;
                    t->mb.i_subpel_refine = h->mb.i_subpel_refine;
                    t->mb.b_chroma_me = h->mb.b_chroma_me;

                    s[i] = (x264_slicetype_slice_t){ t, a, frames, p0, p1, b, dist_scale_factor, do_search, w,
                        output_inter[i], output_intra[i] };
					// 计算起始行
                    t->i_threadslice_start = ((h->mb.i_mb_height *  i    + h->param.i_lookahead_threads/2) / h->param.i_lookahead_threads);
                    // 计算结束行
                    t->i_threadslice_end   = ((h->mb.i_mb_height * (i+1) + h->param.i_lookahead_threads/2) / h->param.i_lookahead_threads);

                    int thread_height = t->i_threadslice_end - t->i_threadslice_start;
                    int thread_output_size = thread_height + NUM_INTS;
                    memset( output_inter[i], 0, thread_output_size * sizeof(int) );
                    memset( output_intra[i], 0, thread_output_size * sizeof(int) );
                    output_inter[i][NUM_ROWS] = output_intra[i][NUM_ROWS] = thread_height;

                    output_inter[i+1] = output_inter[i] + thread_output_size + PAD_SIZE;
                    output_intra[i+1] = output_intra[i] + thread_output_size + PAD_SIZE;
					// 运行线程
                    x264_threadpool_run( h->lookaheadpool, (void*)slicetype_slice_cost, &s[i] );
                }
                // 等待线程结束
                for( int i = 0; i < h->param.i_lookahead_threads; i++ )
                    x264_threadpool_wait( h->lookaheadpool, &s[i] );
            }
            else
            { // ----- 3.单个lookahead线程 ----- //
                h->i_threadslice_start = 0;
                h->i_threadslice_end = h->mb.i_mb_height;
                memset( output_inter[0], 0, (output_buf_size - PAD_SIZE) * sizeof(int) );
                memset( output_intra[0], 0, (output_buf_size - PAD_SIZE) * sizeof(int) );
                output_inter[0][NUM_ROWS] = output_intra[0][NUM_ROWS] = h->mb.i_mb_height;
                x264_slicetype_slice_t s = (x264_slicetype_slice_t){ h, a, frames, p0, p1, b, dist_scale_factor, do_search, w,
                    output_inter[0], output_intra[0] };
                // ---- 4.计算slice的cost ---- //
                slicetype_slice_cost( &s );
            }
			// ----- 5.将计算结果累加起来 ----- //
            /* Sum up accumulators */
            if( b == p1 )
                fenc->i_intra_mbs[b-p0] = 0;
            if( !fenc->b_intra_calculated )
            {
                fenc->i_cost_est[0][0] = 0;
                fenc->i_cost_est_aq[0][0] = 0;
            }
            fenc->i_cost_est[b-p0][p1-b] = 0;
            fenc->i_cost_est_aq[b-p0][p1-b] = 0;

            int *row_satd_inter = fenc->i_row_satds[b-p0][p1-b];
            int *row_satd_intra = fenc->i_row_satds[0][0];
            for( int i = 0; i < h->param.i_lookahead_threads; i++ )
            {
                if( b == p1 )
                    fenc->i_intra_mbs[b-p0] += output_inter[i][INTRA_MBS];
                if( !fenc->b_intra_calculated )
                {
                    fenc->i_cost_est[0][0] += output_intra[i][COST_EST];
                    fenc->i_cost_est_aq[0][0] += output_intra[i][COST_EST_AQ];
                }
				// 累计intra的cost和aq_cost到开销矩阵i_cost_est[0][0]中
                fenc->i_cost_est[b-p0][p1-b] += output_inter[i][COST_EST];
                fenc->i_cost_est_aq[b-p0][p1-b] += output_inter[i][COST_EST_AQ];

                if( h->param.rc.i_vbv_buffer_size )
                {
                    int row_count = output_inter[i][NUM_ROWS];
                    memcpy( row_satd_inter, output_inter[i] + NUM_INTS, row_count * sizeof(int) );
                    if( !fenc->b_intra_calculated )
                        memcpy( row_satd_intra, output_intra[i] + NUM_INTS, row_count * sizeof(int) );
                    row_satd_inter += row_count;
                    row_satd_intra += row_count;
                }
            }

            i_score = fenc->i_cost_est[b-p0][p1-b];
            if( b != p1 )
                i_score = (uint64_t)i_score * 100 / (120 + h->param.i_bframe_bias);
            else
                fenc->b_intra_calculated = 1;

            fenc->i_cost_est[b-p0][p1-b] = i_score;
            x264_emms();
        }
    }

    return i_score;
}

1.2 计算宏块的传播cost(macroblock_tree_propagate)

该函数的主要功能是计算宏块的传播cost(或者叫做遗传cost),主要的工作流程为:

  1. 遍历所有的mb
  2. 计算传播cost(mbtree_propagate_cost)
  3. 将计算地传播cost分给参考mb(mbtree_propagate_list)
  4. 根据前述的计算结果计算qp_offset(macroblock_tree_finish)
static void macroblock_tree_propagate( x264_t *h, x264_frame_t **frames, float average_duration, int p0, int p1, int b, int referenced )
{
    // ...
    float fps_factor = CLIP_DURATION(frames[b]->f_duration) / (CLIP_DURATION(average_duration) * 256.0f) * MBTREE_PRECISION;
	
    /* For non-reffed frames the source costs are always zero, so just memset one row and re-use it. */
    // 对于没有被参考的帧,cost永远是0,因此直接赋值为0即可
    if( !referenced )
        memset( frames[b]->i_propagate_cost, 0, h->mb.i_mb_width * sizeof(uint16_t) );
	// ----- 1.遍历所有的mb ----- //
    for( h->mb.i_mb_y = 0; h->mb.i_mb_y < h->mb.i_mb_height; h->mb.i_mb_y++ )
    {
        int mb_index = h->mb.i_mb_y*h->mb.i_mb_stride;
        // ----- 2.计算传播cost ----- //
        h->mc.mbtree_propagate_cost( buf, propagate_cost,
            frames[b]->i_intra_cost+mb_index, lowres_costs+mb_index,
            frames[b]->i_inv_qscale_factor+mb_index, &fps_factor, h->mb.i_mb_width );
        if( referenced )
            propagate_cost += h->mb.i_mb_width;
		// ----- 3.将计算地传播cost分给参考mb ----- //
        h->mc.mbtree_propagate_list( h, ref_costs[0], &mvs[0][mb_index], buf, &lowres_costs[mb_index],
                                     bipred_weights[0], h->mb.i_mb_y, h->mb.i_mb_width, 0 );
        if( b != p1 )
        {
            h->mc.mbtree_propagate_list( h, ref_costs[1], &mvs[1][mb_index], buf, &lowres_costs[mb_index],
                                         bipred_weights[1], h->mb.i_mb_y, h->mb.i_mb_width, 1 );
        }
    }
	// ----- 4.根据前述的计算结果调整qp ----- //
    if( h->param.rc.i_vbv_buffer_size && h->param.rc.i_lookahead && referenced )
        macroblock_tree_finish( h, frames[b], average_duration, b == p1 ? b - p0 : 0 );
}

其中,计算传播cost的核心函数mbtree_propagate_cost的实现方式为

/* Estimate the total amount of influence on future quality that could be had if we
 * were to improve the reference samples used to inter predict any given macroblock. */
// 本函数评估的内容是,如果提升任何用于帧间预测的样本,会对未来编码的质量带来多大的性能增益 
static void mbtree_propagate_cost( int16_t *dst, uint16_t *propagate_in, uint16_t *intra_costs,
                                   uint16_t *inter_costs, uint16_t *inv_qscales, float *fps_factor, int len )
{
	// fps因子,在观看时,如果一个视频帧持续的时间较长,对观看者的主观影响就越大,因此将fps因子作为一个参数加进来进行调整
	// float fps_factor = CLIP_DURATION(frames[b]->f_duration) / (CLIP_DURATION(average_duration) * 256.0f) * MBTREE_PRECISION;
    float fps = *fps_factor;
    for( int i = 0; i < len; i++ )
    {
        int intra_cost = intra_costs[i];
        // 如果inter_cost > intra_cost,将其修正为intra_cost
        int inter_cost = X264_MIN(intra_costs[i], inter_costs[i] & LOWRES_COST_MASK);
        // qscale是拉格朗日乘子,或者也可叫做lambda,其计算方式为
        // qscale = 0.85f * powf( 2.0f, ( qp - (12.0f + QP_BD_OFFSET) ) / 6.0f );
        // 
        /* 
        	我对inv_qscale初步的理解如下:
        	(1)inv_qscale的计算方式为:
        		frame->i_inv_qscale_factor[mb_xy] = x264_exp2fix8( frame->f_qp_offset[mb_xy] );
        		或 frame->i_inv_qscale_factor[mb_xy] = 256;
        		或 frame->i_inv_qscale_factor[mb_xy] = x264_exp2fix8(qp_adj); 使用aq_mode时会执行的操作
        		由于默认会使用aq_mode,所以默认是将qp_adj转换成为inv_qscale,即aq转换过来的qscale,所以可以理解为是一个aq的系数
			(2)aq模式下,会对每一个mb进行qp的调整,所以将aq的系数也作为一个衡量重要程度的一个因子
		*/
        float propagate_intra  = intra_cost * inv_qscales[i];
        // propagate_in是前面的帧中计算的传播cost赋值到当前mb的
        // 加上当前帧本身的cost,得到总体的cost
        float propagate_amount = propagate_in[i] + propagate_intra*fps;
        // inter_cost相对于intra_cost而言,如果越小,说明当前mb的传播cost越多,即当前mb蕴含的信息越多,越重要
        float propagate_num    = intra_cost - inter_cost;
        float propagate_denom  = intra_cost;
        // 计算出来的propagate cost后续会被分到参考的mb中,用以表示该参考mb的重要性
        dst[i] = X264_MIN((int)(propagate_amount * propagate_num / propagate_denom + 0.5f), 32767);
    }
}

将计算出来的传播cost分发给参考mb

static void mbtree_propagate_list( x264_t *h, uint16_t *ref_costs, int16_t (*mvs)[2],
                                   int16_t *propagate_amount, uint16_t *lowres_costs,
                                   int bipred_weight, int mb_y, int len, int list )
{
    	// ...
        int x = mvs[i][0];
        int y = mvs[i][1];
        unsigned mbx = (unsigned)((x>>5)+i);
        unsigned mby = (unsigned)((y>>5)+mb_y);
        unsigned idx0 = mbx + mby * stride;
        unsigned idx2 = idx0 + stride;
        x &= 31;
        y &= 31;
        /* 
        	+ ---- + ---- +
			|	0  |   1  |
			+ ---- + ---- +
			|   2  |   3  |
			+ ---- + ---- +
			运动搜索的结果不一定让mv刚好处于某一个Mb上,可能处于多个Mb宏块上,则将当前的遗传代价根据所占块的面积来分配给各个宏块
			分成了4个小块,每个mb都是16*16
		*/
        int idx0weight = (32-y)*(32-x); // 0号块的面积
        int idx1weight = (32-y)*x;		// 1号块的面积
        int idx2weight = y*(32-x);		// 2号块的面积
        int idx3weight = y*x;			// 3号块的面积
        idx0weight = (idx0weight * listamount + 512) >> 10;	// 0号块的权重
        idx1weight = (idx1weight * listamount + 512) >> 10;	// 1号块的权重
        idx2weight = (idx2weight * listamount + 512) >> 10;	// 2号块的权重
        idx3weight = (idx3weight * listamount + 512) >> 10;	// 3号块的权重
		// 更新当前mb传播出去的代价
        if( mbx < width-1 && mby < height-1 )
        {
            MC_CLIP_ADD( ref_costs[idx0+0], idx0weight );
            MC_CLIP_ADD( ref_costs[idx0+1], idx1weight );
            MC_CLIP_ADD( ref_costs[idx2+0], idx2weight );
            MC_CLIP_ADD( ref_costs[idx2+1], idx3weight );
        }
        // ...
    }
}

根据传播cost调整qp,实现的函数macroblock_tree_finish如下

static void macroblock_tree_finish( x264_t *h, x264_frame_t *frame, float average_duration, int ref0_distance )
{
    int fps_factor = round( CLIP_DURATION(average_duration) / CLIP_DURATION(frame->f_duration) * 256 / MBTREE_PRECISION );
    float weightdelta = 0.0;
    if( ref0_distance && frame->f_weighted_cost_delta[ref0_distance-1] > 0 )
        weightdelta = (1.0 - frame->f_weighted_cost_delta[ref0_distance-1]);

    /* Allow the strength to be adjusted via qcompress, since the two
     * concepts are very similar. */
    // 允许通过qcompress来调整strength
    float strength = 5.0f * (1.0f - h->param.rc.f_qcompress);
    for( int mb_index = 0; mb_index < h->mb.i_mb_count; mb_index++ )
    {
        int intra_cost = (frame->i_intra_cost[mb_index] * frame->i_inv_qscale_factor[mb_index] + 128) >> 8;
        if( intra_cost )
        {
            int propagate_cost = (frame->i_propagate_cost[mb_index] * fps_factor + 128) >> 8;
            // intra_cost : 当前mb的代价,propagate_cost : 前面的mb计算出来的传播cost赋予到当前的mb,两者相加表示了当前mb所蕴含的信息量
            float log2_ratio = x264_log2(intra_cost + propagate_cost) - x264_log2(intra_cost) + weightdelta;
            // 我对这里的理解是,如果log2_ratio越大,右边整体的值就越小,此时qp_offset就越小
            // log2_ratio表示的是传入进来的cost和自身cost的比值
            // 如果传入进来propagate_cost比自身的intra_cost要大,则log2_ratio较大
            // 此时,下面等式右边的值就会偏小,即qp_offset偏小,当前mb的qp会比较小(当前mb很重要)
            // 反之,qp_offset会偏大,当前mb的qp会较大(当前mb不那么重要)
            frame->f_qp_offset[mb_index] = frame->f_qp_offset_aq[mb_index] - strength * log2_ratio;
        }
    }
}

1.3 mbtree的测试

从上面的分析来看,对于mbtree这一项优化工具而言,利用了lookahead中的一系列参考帧,计算每个mb在这些参考关系中的重要程度,如果一个mb很重要,则进行较高质量编码(低qp),否则进行较低质量编码(高qp)。

重要程度的衡量内容包括:
(1)当前mb的intra_cost、inter_cost(由自己计算)
(2)当前mb的propagate_cost(由其他mb计算并且赋值到当前mb)

mbtree这一工具所提出的论文为:《Garrett-Glaser J. A novel macroblock-tree algorithm for high-performance optimization of dependent video coding in H.264/AVC[J]. Tech. Rep., 2009.》,新的宏块树方法在很低的运算消耗下比已有的宏块树算法在PSNR性能上提高1.2db,在SSIM性能上提高2.3db。我自己用自己的测试序列进行测试100帧,分辨率为1290x1200,是否启用mbtree可以用x264的命令行参数–no-mbtree指定:

(1)不使用mbtree

x264 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2
x264 [info]: profile High, level 5.0, 4:2:0, 8-bit
x264 [info]: frame I:8     Avg QP:23.88  size:186467
x264 [info]: frame P:39    Avg QP:26.54  size:108471
x264 [info]: frame B:53    Avg QP:27.15  size: 63502
x264 [info]: consecutive B-frames: 19.0% 22.0% 27.0% 32.0%
x264 [info]: mb I  I16..4:  5.4% 79.7% 14.9%
x264 [info]: mb P  I16..4:  6.3% 32.9% 11.4%  P16..4: 23.3% 14.6%  7.3%  0.0%  0.0%    skip: 4.2%
x264 [info]: mb B  I16..4:  2.0%  7.0%  2.6%  B16..8: 40.0% 17.7%  4.6%  direct: 7.4%  skip:18.7%  L0:41.7% L1:45.6% BI:12.8%
x264 [info]: 8x8 transform intra:67.6% inter:65.4%
x264 [info]: coded y,uvDC,uvAC intra: 72.1% 32.0% 5.5% inter: 41.5% 18.1% 0.1%
x264 [info]: i16 v,h,dc,p: 64% 16%  6% 14%
x264 [info]: i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 27% 19% 23%  5%  5%  4%  6%  4%  7%
x264 [info]: i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 45% 17% 10%  4%  5%  4%  6%  4%  4%
x264 [info]: i8c dc,h,v,p: 64% 19% 16%  1%
x264 [info]: Weighted P-Frames: Y:23.1% UV:12.8%
x264 [info]: ref P L0: 66.4% 19.8%  9.3%  3.7%  0.8%
x264 [info]: ref B L0: 94.4%  4.4%  1.1%
x264 [info]: ref B L1: 99.1%  0.9%
x264 [info]: kb/s:18175.45

encoded 100 frames, 8.24 fps, 18175.44 kb/s

(2)使用mbtree

x264 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2
x264 [info]: profile High, level 5.0, 4:2:0, 8-bit
x264 [info]: frame I:7     Avg QP:23.59  size:195961
x264 [info]: frame P:39    Avg QP:26.23  size:110076
x264 [info]: frame B:54    Avg QP:28.41  size: 50786
x264 [info]: consecutive B-frames: 19.0% 18.0% 27.0% 36.0%
x264 [info]: mb I  I16..4:  4.4% 79.4% 16.2%
x264 [info]: mb P  I16..4:  5.3% 36.6% 12.0%  P16..4: 22.8% 13.9%  6.6%  0.0%  0.0%    skip: 2.8%
x264 [info]: mb B  I16..4:  2.1%  6.0%  2.4%  B16..8: 41.2% 16.4%  3.9%  direct: 6.0%  skip:22.1%  L0:42.4% L1:47.2% BI:10.5%
x264 [info]: 8x8 transform intra:68.4% inter:66.7%
x264 [info]: coded y,uvDC,uvAC intra: 73.8% 32.0% 5.2% inter: 37.2% 15.2% 0.1%
x264 [info]: i16 v,h,dc,p: 64% 16%  6% 14%
x264 [info]: i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 27% 19% 23%  5%  5%  4%  6%  4%  7%
x264 [info]: i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 45% 17% 10%  4%  5%  5%  6%  4%  4%
x264 [info]: i8c dc,h,v,p: 65% 19% 15%  1%
x264 [info]: Weighted P-Frames: Y:33.3% UV:17.9%
x264 [info]: ref P L0: 65.4% 19.2% 10.6%  3.7%  1.2%
x264 [info]: ref B L0: 94.3%  4.5%  1.2%
x264 [info]: ref B L1: 98.8%  1.2%
x264 [info]: kb/s:16814.29

encoded 100 frames, 8.43 fps, 16814.29 kb/s

从测试结果来看,使用mbtree能够显著降低bitrate(7.5%左右),且带来的编码速度降低不十分明显(2.3%左右),另外速度的测试还存在误差,所以这里差异比较小。使用了mbtree之后,每个帧编码的比特数显著降低,这说明使用了mbtree之后编码的残差变小了,佐证了该算法的有效性

2.自适应量化参数(Adaptive Quantization,AQ)

x264当中,还有一个重要mb级别的码控工具,自适应量化参数(Adaptive Quantization,AQ),它的定义如下

#define X264_AQ_NONE                 0	// 不使用AQ
#define X264_AQ_VARIANCE             1  // 使用方差AQ(默认配置)
#define X264_AQ_AUTOVARIANCE         2	// 使用自动方差AQ
#define X264_AQ_AUTOVARIANCE_BIASED  3	// 使用带有偏置项的自动方差AQ

AQ通过考虑一帧之内mb的复杂程度来调整编码的量化参数,从而获得更好的编码质量,不同的模式对应着不同的自适应方式:

  1. X264_AQ_NONE:不使用AQ,不进行qp的调整
  2. X264_AQ_VARIANCE:使用方差AQ,仅考虑自身mb的复杂度,不考虑当前帧当中其他mb的复杂度
  3. X264_AQ_AUTOVARIACNE:使用自动方差AQ,不仅考虑自身mb的复杂度,还考虑当前帧中其他mb的复杂度
  4. X264_AQ_AUTOVARIACNE_BIASED:使用带有偏置项的自动方差AQ,考虑自身和其他mb的复杂度,计算qp_offset还需要加上一个偏置项

对于AQ的理解,假设每个宏块只有一个频率系数。这个系数可以大也可以小如果一个宏块的系数是1.4,它被量化为1。误差为28.5%,如果一个宏块的系数是9.4,它被量化为9。误差为4.3%。那么,当使用线性量化器时,较大系数的编码比较小系数的编码更精确。从视觉上看,这显然是错误的;仅仅因为某个细节在X块中的对比度比Y块高10倍,并不意味着Y块应该被完全删除。在这种情况下,使用AQ是比较合适的,根据宏块的系数对量化参数进行自适应调整,能够对具有不同系数的宏块更好地编码,实现更精准的mb级别的码控

另外,AQ是mbtree的必要前提,即如果要开启mbtree,则必须要开启AQ,如果没有开启则会强制开启。与AQ有关的另一个紧密相关的变量是aq_strength,它控制了AQ的强度,如果编码的场景复杂,应该使用较小的aq_strength,反之应该用较大的aq_strength。aq_strength默认配置为1,1表示认为当前的场景比较简单,调控的力度比较大

    /* MB-tree requires AQ to be on, even if the strength is zero. */
    // MB-tree要求AQ打开,即便aq_strength为0
    if( !h->param.rc.i_aq_mode && h->param.rc.b_mb_tree )
    {
        h->param.rc.i_aq_mode = 1;
        h->param.rc.f_aq_strength = 0;
    }

自适应量化参数AQ的主要实现流程为:

  1. 自适应量化一帧(x264_adaptive_quant_frame)
  2. 量化参数的调整(x264_ratecontrol_mb_qp)
  3. 宏块分析(x264_macroblock_analyse)

2.1 自适应量化一帧(x264_adaptive_quant_frame)

该函数是AQ工具实现的具体函数,通过区分不同的模式,为mb计算qp_adj,即qp的偏移量。主要工作流程为:

  1. 如果不使用自适应量化(AQ模式),aq_mode为X264_AQ_NONE或者aq_strength为0,则考虑为了使用Mbtree进行qp_offset和inv_qscale_factor的初始化
  2. 如果使用AQ,则需要计算qp_adj, strength等信息
    (1)如果使用AUTOVARIANCE或者是AUTOVARIANCE_BIASED,则要考虑调整qp和strength,具体地做法是计算每一个mb应该调整的qp量,随后取均值,然后计算avg_adj和strength。如果使用VARIANCE,则仅考虑strength
    (2)根据不同的AQ模式对应的计算公式,来计算qp_adj
void x264_adaptive_quant_frame( x264_t *h, x264_frame_t *frame, float *quant_offsets )
{
    /* Initialize frame stats */
    for( int i = 0; i < 3; i++ )
    {
        frame->i_pixel_sum[i] = 0;
        frame->i_pixel_ssd[i] = 0;
    }

    /* Degenerate cases */
    // ----- 1.不使用自适应量化 ----- //
    if( h->param.rc.i_aq_mode == X264_AQ_NONE || h->param.rc.f_aq_strength == 0 )
    {
        /* Need to init it anyways for MB tree */
        // 无论如何都要初始化,为了Mb tree的使用
        if( h->param.rc.i_aq_mode && h->param.rc.f_aq_strength == 0 )
        {
            if( quant_offsets )
            {
                for( int mb_xy = 0; mb_xy < h->mb.i_mb_count; mb_xy++ )
                    frame->f_qp_offset[mb_xy] = frame->f_qp_offset_aq[mb_xy] = quant_offsets[mb_xy];
                if( h->frames.b_have_lowres )
                    for( int mb_xy = 0; mb_xy < h->mb.i_mb_count; mb_xy++ )
                        frame->i_inv_qscale_factor[mb_xy] = x264_exp2fix8( frame->f_qp_offset[mb_xy] );
            }
            else
            {
                memset( frame->f_qp_offset, 0, h->mb.i_mb_count * sizeof(float) );
                memset( frame->f_qp_offset_aq, 0, h->mb.i_mb_count * sizeof(float) );
                if( h->frames.b_have_lowres )
                    for( int mb_xy = 0; mb_xy < h->mb.i_mb_count; mb_xy++ )
                        frame->i_inv_qscale_factor[mb_xy] = 256;
            }
        }
        /* Need variance data for weighted prediction */
        if( h->param.analyse.i_weighted_pred )
        {
            for( int mb_y = 0; mb_y < h->mb.i_mb_height; mb_y++ )
                for( int mb_x = 0; mb_x < h->mb.i_mb_width; mb_x++ )
                    ac_energy_mb( h, mb_x, mb_y, frame );
        }
        else
            return;
    }
    /* Actual adaptive quantization */
    // ----- 2.实际的自适应量化 -----
    else
    {
        /* constants chosen to result in approximately the same overall bitrate as without AQ.
         * FIXME: while they're written in 5 significant digits, they're only tuned to 2. */
        float strength;
        float avg_adj = 0.f;
        float bias_strength = 0.f;
		// 如果是AUTOVARIANCE或者AUTOVARIANCE_BIASED,则需要考虑调整qp和strength
        if( h->param.rc.i_aq_mode == X264_AQ_AUTOVARIANCE || h->param.rc.i_aq_mode == X264_AQ_AUTOVARIANCE_BIASED )
        {
            float bit_depth_correction = 1.f / (1 << (2*(BIT_DEPTH-8)));
            float avg_adj_pow2 = 0.f;
            // 计算每一个mb应该调整的qp量,随后取均值
            for( int mb_y = 0; mb_y < h->mb.i_mb_height; mb_y++ )
                for( int mb_x = 0; mb_x < h->mb.i_mb_width; mb_x++ )
                {
                	// 在AQ模式计算中,将能量定义为方差。能量(方差)越大,qp调整值越大
                	// E[X^2] - E[X]^2 = ssd - (sum * sum >> shift)
                    uint32_t energy = ac_energy_mb( h, mb_x, mb_y, frame );
                    float qp_adj = powf( energy * bit_depth_correction + 1, 0.125f );
                    frame->f_qp_offset[mb_x + mb_y*h->mb.i_mb_stride] = qp_adj;
                    // 计算avg_adj
                    avg_adj += qp_adj;
                    avg_adj_pow2 += qp_adj * qp_adj;
                }
            avg_adj /= h->mb.i_mb_count;
            avg_adj_pow2 /= h->mb.i_mb_count;
            // 计算strength,根据所有mb的平均qp_adj来调整
            strength = h->param.rc.f_aq_strength * avg_adj;
            // 计算avg_adj,根据所有mb的平均
            avg_adj = avg_adj - 0.5f * (avg_adj_pow2 - 14.f) / avg_adj;
            // 计算bias_strength
            bias_strength = h->param.rc.f_aq_strength;
        }
        else
        	// 如果是VARIANCE,则仅调整strength
            strength = h->param.rc.f_aq_strength * 1.0397f;

        for( int mb_y = 0; mb_y < h->mb.i_mb_height; mb_y++ )
            for( int mb_x = 0; mb_x < h->mb.i_mb_width; mb_x++ )
            {
                float qp_adj;
                int mb_xy = mb_x + mb_y*h->mb.i_mb_stride;
                if( h->param.rc.i_aq_mode == X264_AQ_AUTOVARIANCE_BIASED ) // AUTOVARIANCE_BIASED的qp调整
                {
                    qp_adj = frame->f_qp_offset[mb_xy];
                    qp_adj = strength * (qp_adj - avg_adj) + bias_strength * (1.f - 14.f / (qp_adj * qp_adj)); // 增加的偏置项强度
                }
                else if( h->param.rc.i_aq_mode == X264_AQ_AUTOVARIANCE ) // AUTOVARIANCE的qp调整
                {
                    qp_adj = frame->f_qp_offset[mb_xy];
                    qp_adj = strength * (qp_adj - avg_adj);
                }
                else // VARIANCE的qp调整
                {
                    uint32_t energy = ac_energy_mb( h, mb_x, mb_y, frame );
                    qp_adj = strength * (x264_log2( X264_MAX(energy, 1) ) - (14.427f + 2*(BIT_DEPTH-8)));
                }
                if( quant_offsets )
                    qp_adj += quant_offsets[mb_xy];
                // 最后赋值的地方
                frame->f_qp_offset[mb_xy] =
                frame->f_qp_offset_aq[mb_xy] = qp_adj;
                if( h->frames.b_have_lowres )
                    frame->i_inv_qscale_factor[mb_xy] = x264_exp2fix8(qp_adj);
            }
    }
    
    // ...
}

2.2 量化参数的调整(x264_ratecontrol_mb_qp)

计算之后的qp,实际调整的地方位于x264_ratecontrol_mb_qp之中,该函数给出了一个待编码mb的初始qp

int x264_ratecontrol_mb_qp( x264_t *h )
{
    x264_emms();
    float qp = h->rc->qpm;
    if( h->param.rc.i_aq_mode )
    {
         /* MB-tree currently doesn't adjust quantizers in unreferenced frames. */
         // 如果当前帧会作为一个参考帧,则使用qp_offset,否则使用qp_offset_aq
        float qp_offset = h->fdec->b_kept_as_ref ? h->fenc->f_qp_offset[h->mb.i_mb_xy] : h->fenc->f_qp_offset_aq[h->mb.i_mb_xy];
        /* Scale AQ's effect towards zero in emergency mode. */
        if( qp > QP_MAX_SPEC )
            qp_offset *= (QP_MAX - qp) / (QP_MAX - QP_MAX_SPEC);
        qp += qp_offset;
    }
    return x264_clip3( qp + 0.5f, h->param.rc.i_qp_min, h->param.rc.i_qp_max );
}

2.3 宏块分析(x264_macroblock_analyse)

在进行宏块分析时,如果启用了aq并且像素精度的等级小于10,则会进行qp的调整。如果当前mb的qp和前面mb的qp的差值为1,则将当前mb的qp调整为和前面的qp一样,否则使用当前qp

void x264_macroblock_analyse( x264_t *h )
{
    x264_mb_analysis_t analysis;
    int i_cost = COST_MAX;

    h->mb.i_qp = x264_ratecontrol_mb_qp( h );
    /* If the QP of this MB is within 1 of the previous MB, code the same QP as the previous MB,
     * to lower the bit cost of the qp_delta.  Don't do this if QPRD is enabled. */
    // 如果使用aq模式,并且像素精度的等级小于10
    // 如果当前mb的qp和前面mb的qp的差值为1,则将当前mb的qp调整为和前面的qp一样,否则使用当前qp
    if( h->param.rc.i_aq_mode && h->param.analyse.i_subpel_refine < 10 )
        h->mb.i_qp = abs(h->mb.i_qp - h->mb.i_last_qp) == 1 ? h->mb.i_last_qp : h->mb.i_qp;

    // ...

2.4 AQ的测试

使用自测的序列做简单的测试,分辨率为1920x1200,在这里还关闭了mbtree,因为如果启用了mbtree则会强制开启aq。对比测试配置的参数为开启–no-mbtree 和关闭AQ --no-mbtree --aq-mode 0

(1)使用默认配置的AQ(X264_AQ_VARIANCE)

yuv [info]: 1920x1200p 0:0 @ 25/1 fps (cfr)
x264 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2
x264 [info]: profile High, level 5.0, 4:2:0, 8-bit
x264 [info]: frame I:7     Avg QP:23.59  size:195961
x264 [info]: frame P:39    Avg QP:26.23  size:110076
x264 [info]: frame B:54    Avg QP:28.41  size: 50786
x264 [info]: consecutive B-frames: 19.0% 18.0% 27.0% 36.0%
x264 [info]: mb I  I16..4:  4.4% 79.4% 16.2%
x264 [info]: mb P  I16..4:  5.3% 36.6% 12.0%  P16..4: 22.8% 13.9%  6.6%  0.0%  0.0%    skip: 2.8%
x264 [info]: mb B  I16..4:  2.1%  6.0%  2.4%  B16..8: 41.2% 16.4%  3.9%  direct: 6.0%  skip:22.1%  L0:42.4% L1:47.2% BI:10.5%
x264 [info]: 8x8 transform intra:68.4% inter:66.7%
x264 [info]: coded y,uvDC,uvAC intra: 73.8% 32.0% 5.2% inter: 37.2% 15.2% 0.1%
x264 [info]: i16 v,h,dc,p: 64% 16%  6% 14%
x264 [info]: i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 27% 19% 23%  5%  5%  4%  6%  4%  7%
x264 [info]: i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 45% 17% 10%  4%  5%  5%  6%  4%  4%
x264 [info]: i8c dc,h,v,p: 65% 19% 15%  1%
x264 [info]: Weighted P-Frames: Y:33.3% UV:17.9%
x264 [info]: ref P L0: 65.4% 19.2% 10.6%  3.7%  1.2%
x264 [info]: ref B L0: 94.3%  4.5%  1.2%
x264 [info]: ref B L1: 98.8%  1.2%
x264 [info]: kb/s:16814.29

encoded 100 frames, 8.43 fps, 16814.29 kb/s

(2)关闭AQ

yuv [info]: 1920x1200p 0:0 @ 25/1 fps (cfr)
x264 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2
x264 [info]: profile High, level 5.0, 4:2:0, 8-bit
x264 [info]: frame I:7     Avg QP:25.00  size:195532
x264 [info]: frame P:39    Avg QP:27.79  size:114287
x264 [info]: frame B:54    Avg QP:29.52  size: 54980
x264 [info]: consecutive B-frames: 19.0% 18.0% 27.0% 36.0%
x264 [info]: mb I  I16..4: 20.0% 66.9% 13.0%
x264 [info]: mb P  I16..4: 13.7% 28.8% 10.8%  P16..4: 16.6% 12.6%  6.5%  0.0%  0.0%    skip:11.0%
x264 [info]: mb B  I16..4:  3.2%  4.9%  2.6%  B16..8: 35.2% 15.6%  4.3%  direct: 4.5%  skip:29.7%  L0:40.8% L1:45.1% BI:14.2%
x264 [info]: 8x8 transform intra:55.4% inter:61.7%
x264 [info]: coded y,uvDC,uvAC intra: 61.5% 33.5% 7.1% inter: 33.4% 10.1% 0.1%
x264 [info]: i16 v,h,dc,p: 57% 24% 10%  8%
x264 [info]: i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 29% 18% 20%  5%  5%  4%  6%  5%  7%
x264 [info]: i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 45% 17% 11%  4%  5%  5%  6%  4%  4%
x264 [info]: i8c dc,h,v,p: 68% 17% 14%  1%
x264 [info]: Weighted P-Frames: Y:33.3% UV:17.9%
x264 [info]: ref P L0: 63.2% 22.2% 10.6%  3.1%  0.9%
x264 [info]: ref B L0: 94.7%  4.2%  1.1%
x264 [info]: ref B L1: 98.9%  1.1%
x264 [info]: kb/s:17589.66

encoded 100 frames, 9.00 fps, 17589.66 kb/s

从上面的对比来看,关闭AQ之后,码率增加了4.5%,并且编码的帧率也增加了6.3%。从数据来看,关闭AQ之后,各个帧的类型的AvgQP都增加了,其中B帧的编码比特增加较为明显。这里猜测是因为B帧的帧间编码比较复杂,利用帧内的信息进行了AQ的调整,使得编码qp更加合理,这样编码的比特数更小

3.小结

本节讨论了x264当中宏块级别的两个工具Mbtree和AQ,其中Mbtree基于帧间,利用前后帧当中mb互相参考的重要程度来调控mb级别的qp,AQ基于帧内,利用当前帧的mb(或当前帧所有mb)的复杂度信息来调控mb级别的qp。两者的差异是应用在帧内和帧间两个模块之中,关联是Mbtree依赖于AQ,即要使用Mbtree时必须开启AQ,如没有开启AQ,则会强制打开

值得思考的问题是:
(1)这里只考虑了mb级别的码控,没有考虑帧级别的,帧级别的码控是如何处理的?
(2)Mbtree使用了lookahead,这个lookahead实现的细节是怎样的?

CSDN : https://blog.csdn.net/weixin_42877471
Github : https://github.com/DoFulangChen

  • 27
    点赞
  • 22
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值