VP9 Protocol Notes

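This post walks through the VP9 specification: the basic video coding flow, the use of superframes, intra prediction and residual handling, the DCT transform, loop_filter, and the key decoding steps, including the parsing of superframe_index and frame.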

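This post is mainly an outline of the VP9 specification. For the details, refer to the official document: VP9 specification link (a VPN/accelerator may be needed to reach it)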

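1. Video coding overview

🐶Video coding roughly follows a prediction + residual + loop filter (LPF) pattern.

🐭Encoder: the encoder first partitions the picture into blocks, then predicts each block (CU) with intra or inter prediction and picks the prediction mode with the smallest error. The predicted picture still differs from the original, and that difference is called the residual (TU).

🐹The encoder may split the residual further (so a TU is never larger than its CU; H.265 splits TUs further, while VP9 does not). Each resulting block then goes through a DCT transform and quantization to reduce the number of bytes the residual takes. This step usually costs a little picture quality.

🐰To compensate for the loss introduced by transform and quantization, the encoder selects a suitable loop_filter (LPF) mode and signals it, so that the picture can be filtered accordingly.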


[Figure: decoding flow (image not available)]

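🐺As the figure shows, the decoder receives the prediction mode and predicts the picture accordingly. It also dequantizes and inverse-transforms the residual it receives. Adding the predicted picture to the dequantized, inverse-transformed residual, and then filtering according to the LPF mode, completes the decoding of the picture.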
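The prediction + residual idea can be tried out with a toy round trip. This is only a minimal sketch and not VP9 itself: the transform step is left out, the quantizer is a single uniform step, and the names `quantize_residual`, `reconstruct` and `QSTEP` are made up for illustration.

```python
# Toy prediction + residual round trip (illustrative only, not VP9 syntax).
# Assumptions: no DCT, one uniform quantizer step, 8-bit samples.
QSTEP = 4  # hypothetical quantizer step size


def quantize_residual(original, prediction):
    """Encoder side: residual = original - prediction, then quantize."""
    residual = [o - p for o, p in zip(original, prediction)]
    # Quantization is the lossy step: small differences are rounded away.
    return [round(r / QSTEP) for r in residual]


def reconstruct(levels, prediction):
    """Decoder side: dequantize, add to the prediction, clip to 8 bits."""
    residual = [lv * QSTEP for lv in levels]
    return [max(0, min(255, p + r)) for p, r in zip(prediction, residual)]


original = [100, 104, 109, 120]
prediction = [98, 98, 98, 98]  # e.g. a flat prediction from neighbouring pixels
levels = quantize_residual(original, prediction)
reconstructed = reconstruct(levels, prediction)
print(levels, reconstructed)
```

The reconstructed samples are close to the originals but not identical, which is the small loss that the loop filter stage later tries to mask.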

2. Superframe: superframe(sz)

superframe( sz ) {
	for( i = 0; i < NumFrames; i++ )
		frame( frame_sizes[ i ] )
	superframe_index( )
}
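  • 🐸Each frame's parsing function frame(frame_size) needs a frame_size that lives in superframe_index(), so why is superframe_index decoded last?
    • A1: The order of these syntax functions is the layout of the bitstream, not the actual decoding order. When parsing a superframe, the total size sz is known, so the decoder first reads the last byte of the sz-byte buffer (superframe_header), which gives the number of frames and the byte width of each frame_size field. Only after superframe_index has been parsed does it go back to the start and parse the individual frames.
    • A2: The encoder only knows the size of each frame after it has finished encoding all of them, so the superframe index is placed at the end of the superframe. The decoder parses the superframe index first and then parses the frames from the start.
  • 🐯Why is the superframe header parsed twice?
    • A: VP9 supports streams both with and without superframes. The header is parsed twice (to check that the two copies agree), and together with the superframe_marker flag this decides whether the chunk is a superframe.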
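The "read the tail first" logic can be sketched as follows. This is a rough sketch based on my reading of the spec's superframe annex, so treat the bit layout as an assumption to verify: a 3-bit marker equal to 0b110, then 2 bits for the frame size byte width minus 1, then 3 bits for the frame count minus 1, with the same header byte at both ends of the index and little-endian sizes in between. The function name `parse_superframe_index` is my own.

```python
def parse_superframe_index(data: bytes):
    """Return the list of frame sizes, or None if `data` is not a superframe."""
    if not data:
        return None
    header = data[-1]                      # last byte of the chunk
    if (header >> 5) != 0b110:             # superframe marker (assumed 0b110)
        return None
    size_bytes = ((header >> 3) & 0x3) + 1  # byte width of each frame size
    num_frames = (header & 0x7) + 1         # number of frames in the superframe
    index_size = 2 + size_bytes * num_frames
    if len(data) < index_size:
        return None
    start = len(data) - index_size
    if data[start] != header:              # second copy of the header must match
        return None
    sizes = []
    pos = start + 1
    for _ in range(num_frames):
        sizes.append(int.from_bytes(data[pos:pos + size_bytes], "little"))
        pos += size_bytes
    return sizes


# Two dummy frames of 3 and 5 bytes, followed by a hand-built index.
frames = b"\xaa" * 3 + b"\xbb" * 5
hdr = 0b110_00_001                         # marker, 1-byte sizes, 2 frames
chunk = frames + bytes([hdr, 3, 5, hdr])
print(parse_superframe_index(chunk))
```

With the sizes in hand, the decoder can slice the chunk from the beginning and hand each slice to frame(). A chunk whose last byte does not carry the marker is treated as a single ordinary frame.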

2. frame(sz)

frame( sz ) {
	startBitPos = get_position( )
	uncompressed_header( )
	trailing_bits( )
	if ( header_size_in_bytes == 0 ) {
		while ( get_position( ) < startBitPos + 8 * sz)
			padding_bit
		return
	}
	load_probs( frame_context_idx )
	load_probs2( frame_context_idx )
	clear_counts( )
	init_bool( header_size_in_bytes )
	compressed_header( )
	exit_bool( )
	endBitPos = get_position( )
	headerBytes = (endBitPos - startBitPos) / 8
	decode_tiles( sz - headerBytes )
	refresh_probs( )
}
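  • 🐻uncompressed_header() carries basic picture information: bit depth, YUV format, color space, which reference frames used by inter prediction get refreshed, and so on.
  • 🐷header_size_in_bytes == 0 means this frame simply copies another frame and needs no further decoding. This variable is parsed in uncompressed_header().
  • 🐽VP9 uses probability-based compression (see section 9 of the spec). Many compressed syntax elements have a probability table that is used during decoding and is also updated as decoding proceeds. load_probs(idx) loads the probability table selected by frame_context_idx, whose value is parsed in uncompressed_header().
  • 🐮During encoding and decoding, the number of times each syntax element value is coded gets recorded, so that refresh_probs() can later update the probability model. That is why clear_counts() resets the counters before the compressed data is parsed.
  • 🐵compressed_header() parses probability table updates. VP9 does not decode strictly with the table loaded by load_probs; some entries have to be updated before use. Which probabilities are updated, and to what values, is parsed in compressed_header(). The probability tables are used to parse each syntax element out of the binary bitstream (see section 9 of the spec).
  • 🐒decode_tiles() is where the picture itself starts being parsed and reconstructed.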
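To make the clear_counts() / refresh_probs() pairing more concrete, here is a small sketch of count-based probability adaptation for one binary decision. It is modelled on my understanding of the adaptation process, not copied from the spec: the function name `merge_prob`, the saturation count 20, the update factor 128 and the rounding are all assumptions that should be checked against the official document.

```python
def merge_prob(pre_prob, ct0, ct1, count_sat=20, max_update_factor=128):
    """Blend the previous 8-bit probability with the one observed this frame.

    pre_prob : probability (1..255, out of 256) that the bit is 0, before the frame
    ct0, ct1 : how often 0 and 1 were actually decoded during the frame
    """
    den = ct0 + ct1
    if den == 0:
        observed = 128                     # nothing observed: neutral value
    else:
        observed = (ct0 * 256 + (den >> 1)) // den
        observed = max(1, min(255, observed))
    # The more samples were seen, the more weight the observation gets,
    # up to a cap so that one frame never fully overrides the history.
    count = min(den, count_sat)
    factor = max_update_factor * count // count_sat
    return (pre_prob * (256 - factor) + observed * factor + 128) >> 8


print(merge_prob(128, 0, 0))    # no counts: probability is unchanged
print(merge_prob(128, 20, 0))   # only zeros seen: probability of 0 goes up
print(merge_prob(128, 0, 20))   # only ones seen: probability of 0 goes down
```

The point of the sketch is the data flow: counters start from zero for each frame, are filled while symbols are decoded, and are only turned into new probabilities once the whole frame is done.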

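3. Some VP9 indices explained:

  • 🐴segment_id: the entry selected by segment_id stores information such as skip, QP and reference frame from earlier decoding. Segment-related flags decide whether these parameters are taken directly from the segment_id entry or decoded separately.
  • 🐎frame_to_show_map_idx: this frame directly shows the picture selected by frame_to_show_map_idx (a previously decoded and stored picture), and decoding of this frame ends there.
  • 🐫frame_context_idx: VP9 bitstream parsing uses many probability tables (section 9). load_probs( frame_context_idx ) loads the probability table selected by frame_context_idx.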

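4. decode_tiles()

  • 🐑tileCols: how many tile columns the frame has
  • 🐘tileRows: how many tile rows the frame has
  • 🐼tileRow: which tile row of the frame the current tile is in
  • 🐍tileCol: which tile column of the frame the current tile is in
  • 🐦MiRows: how many rows of 8*8 blocks the frame has
  • 🐤MiCols: how many columns of 8*8 blocks the frame has
  • 🐥MiRowStart: the starting row of the tile, in units of 8*8 (for example, a value of 20 means the tile starts at y = 160 pixels in the frame)
  • 🐣Simply put, one Mi unit is an 8*8 block of pixels.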
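The Mi units above boil down to a couple of one-line conversions. A minimal sketch, assuming the frame size is rounded up to whole 8*8 blocks; the helper names `mi_dims` and `mi_to_pixel` are mine, not from the spec.

```python
MI_SIZE = 8  # one Mi unit covers 8*8 pixels


def mi_dims(frame_width, frame_height):
    """Return (MiRows, MiCols), rounding partial 8*8 blocks up."""
    mi_rows = (frame_height + MI_SIZE - 1) // MI_SIZE
    mi_cols = (frame_width + MI_SIZE - 1) // MI_SIZE
    return mi_rows, mi_cols


def mi_to_pixel(mi_pos):
    """Convert a position in Mi units (e.g. MiRowStart) to pixels."""
    return mi_pos * MI_SIZE


print(mi_dims(1920, 1080))   # a 1080p frame
print(mi_to_pixel(20))       # the MiRowStart example from the list above
```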

5. decode_partition

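  • 🐔The bsize passed to decode_block() is the size after the partition has been applied. In other words, once an 8*8 block is partitioned, the next step is always decode_block(). If bsize is 8*8 and partition is NONE, decode_block is entered; the other partitions give sizes 4*4, 8*4 and 4*8, and these also go straight into decode_block (an 8*8 block, however it is partitioned, calls decode_block only once). Note that although MiSize inside decode_block is 4*4, 8*4 or 4*8 in those cases, the function is really processing one whole 8*8 block.
  • 🐧If the residual does not need to be coded (skip), does tx_size still need to be coded? Yes for intra blocks, because intra mode prediction uses tx_size.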
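The "one 8*8 block means one decode_block call" rule is easier to see in a small recursion. This is a simplified sketch, not the spec's decode_partition: positions are in pixels, frame-boundary checks are ignored, the partition is supplied by a hypothetical `choose_partition` callback instead of being read from the bitstream, and decode_block calls are merely recorded as `(row, col, width, height)` tuples.

```python
def decode_partition(r, c, size, choose_partition, calls):
    """Walk one square block of `size` pixels and record decode_block calls."""
    partition = choose_partition(r, c, size)   # "NONE", "HORZ", "VERT" or "SPLIT"
    half = size // 2
    # Block shape (width, height) that results from each partition type.
    shapes = {"NONE": (size, size), "HORZ": (size, half),
              "VERT": (half, size), "SPLIT": (half, half)}
    w, h = shapes[partition]
    if size == 8 or partition == "NONE":
        # An 8*8 block is decoded with a single call, even when its
        # sub-size is 4*4, 8*4 or 4*8.
        calls.append((r, c, w, h))
    elif partition == "HORZ":
        calls.append((r, c, w, h))
        calls.append((r + half, c, w, h))
    elif partition == "VERT":
        calls.append((r, c, w, h))
        calls.append((r, c + half, w, h))
    else:  # SPLIT: recurse into the four quadrants
        for dr in (0, half):
            for dc in (0, half):
                decode_partition(r + dr, c + dc, half, choose_partition, calls)


calls = []
decode_partition(0, 0, 8, lambda r, c, s: "SPLIT", calls)
print(calls)  # a single entry, with a 4*4 sub-size
```

Passing a callback that returns "SPLIT" for 16 and "NONE" for 8 yields four calls, one per 8*8 quadrant, which matches the recursive case.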

6. Residual()


Token : extra_bits[ 11 ][ 3 ] = {
{ 0, 0, 0},
{ 0, 0, 1},
{ 0, 0, 2},
{ 0, 0, 3},
{ 0, 0, 4},
{ 1, 1, 5},
{ 2, 2, 7},
{ 3, 3, 11},
{ 4, 4, 19},
{ 5, 5, 35},
{ 6, 14, 67}
}

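  • [0] is used during probability decoding
  • [1] is the bit width of the offset
  • [2] is the base

Absolute value of the residual coefficient = base + offset

For example:

  • 🐟If token is 0 (ZERO_TOKEN), then base is 0 and the offset is 0 bits wide, so the absolute value of the residual is 0.
  • 🐳If token is 6 (DCT_VAL_CAT2), then base = 7 and the offset is 2 bits wide ([0, 3]), so the absolute value of the residual lies in [7, 10].
  • 🐋If token is 7 (DCT_VAL_CAT3), then base = 11 and the offset is 3 bits wide ([0, 7]), so the absolute value of the residual lies in [11, 18].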
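The base + offset rule can be checked mechanically against the table. A minimal sketch that only uses the extra_bits table shown above, indexed from 0 by token value; the helper names `coef_abs` and `token_range` are made up for illustration.

```python
# extra_bits[token] = (category used for probability decoding,
#                      offset bit width, base)
EXTRA_BITS = [
    (0, 0, 0), (0, 0, 1), (0, 0, 2), (0, 0, 3), (0, 0, 4),
    (1, 1, 5), (2, 2, 7), (3, 3, 11), (4, 4, 19), (5, 5, 35),
    (6, 14, 67),
]


def coef_abs(token, offset=0):
    """Absolute value = base + offset, with offset limited by its bit width."""
    _, nbits, base = EXTRA_BITS[token]
    if not 0 <= offset < (1 << nbits):
        raise ValueError("offset does not fit in the token's offset bits")
    return base + offset


def token_range(token):
    """Smallest and largest absolute value a token can represent."""
    _, nbits, base = EXTRA_BITS[token]
    return base, base + (1 << nbits) - 1


for token in range(len(EXTRA_BITS)):
    print(token, token_range(token))
```

Printing all ranges shows that they tile the non-negative integers without gaps: each token's range starts right after the previous one ends.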

7. References

[1] VP9 specification link (a VPN/accelerator may be needed to reach it)
