What makes Google's VP9 video coding format so good

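Google has recently begun work on VP10, the successor to VP9. VP9 itself is an open, royalty-free video codec that is typically carried in the WebM and Matroska containers, and Google already uses it to serve 4K video on YouTube.

Developed by Google and free of royalties, VP9 is one of the most widely deployed video coding formats on the web. As the successor to VP8, its main competitor is MPEG's High Efficiency Video Coding standard (HEVC). Compared with HEVC, VP9 has much broader support in web browsers, and its good picture quality and compression efficiency have made it popular with online video sites.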

[Figure: VP9 support in major web browsers]

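As the figure above shows, all the mainstream browsers, including Firefox, Chrome and Opera, support VP9. By one estimate, about 75% of browsers in use supported VP9 as of early 2017.

So how does VP9 improve on earlier video coding standards in picture quality and compression efficiency?

VP9 improves coding efficiency by more than 50%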

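VP9 is a block-based transform coding format. Compared with standards such as HEVC, its bitstream format is relatively simple.

Compared with VP8, VP9 brings many technical improvements:

  • Support for 64×64-pixel coding blocks, which pays off most in high-resolution video;

  • More intra prediction modes: in addition to VP8's four modes (average/"DC", "true motion", horizontal and vertical), VP9 supports six oblique directions that linearly extrapolate pixels for intra prediction;

  • Support for dividing a frame into segments, i.e. regions that share similar properties;

  • Other new or improved coding tools: eighth-pixel motion vector precision, three switchable sub-pixel interpolation filters, better reference motion vector selection, improved entropy coding and loop filtering, the ADST, and larger DCTs.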

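Picture partitioning

VP9 divides each picture into 64×64 blocks called superblocks, which are processed in raster order: left to right, top to bottom. As in other codecs, a superblock can be subdivided down to 4×4 blocks. Subdivision uses a recursive quadtree, as in HEVC, but VP9 additionally lets a block be split horizontally or vertically into two rectangular halves.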
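To make the recursive partitioning concrete, here is a minimal Python sketch that expands one 64×64 superblock into its leaf blocks. The `Partition` names mirror VP9's four partition types (none, horizontal, vertical, split); the `choose` callback and `demo_choice` are hypothetical stand-ins for the encoder's decision, and the special handling of blocks at frame borders is left out. Only SPLIT recurses; a horizontal or vertical split yields two final rectangular blocks.

```python
from enum import Enum
from typing import Callable, List, Tuple

Block = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


class Partition(Enum):
    NONE = 0   # keep the square block whole
    HORZ = 1   # two (w x h/2) blocks, one above the other
    VERT = 2   # two (w/2 x h) blocks, side by side
    SPLIT = 3  # four (w/2 x h/2) quadrants, each partitioned again


def partition_block(x: int, y: int, size: int,
                    choose: Callable[[int, int, int], Partition]) -> List[Block]:
    """Recursively partition a square block, starting from a 64x64 superblock.

    `choose(x, y, size)` is a hypothetical stand-in for the encoder's
    rate-distortion decision (or for the decoder reading a partition symbol).
    """
    p = choose(x, y, size)
    half = size // 2
    if p is Partition.NONE:
        return [(x, y, size, size)]
    if p is Partition.HORZ:
        return [(x, y, size, half), (x, y + half, size, half)]
    if p is Partition.VERT:
        return [(x, y, half, size), (x + half, y, half, size)]
    if half == 4:
        # Splitting an 8x8 block yields four 4x4 blocks, the smallest size.
        return [(x + dx, y + dy, 4, 4) for dy in (0, 4) for dx in (0, 4)]
    blocks: List[Block] = []
    for dy in (0, half):          # quadrants in raster order:
        for dx in (0, half):      # top-left, top-right, bottom-left, bottom-right
            blocks.extend(partition_block(x + dx, y + dy, half, choose))
    return blocks


def demo_choice(x: int, y: int, size: int) -> Partition:
    """Hypothetical decisions used only for this example."""
    if size == 64:
        return Partition.SPLIT
    if size == 32 and (x, y) == (0, 0):
        return Partition.SPLIT
    if size == 32 and (x, y) == (32, 0):
        return Partition.HORZ
    return Partition.NONE


blocks = partition_block(0, 0, 64, demo_choice)
for b in blocks:
    print(b)
```

Whatever the decisions, the leaf blocks always tile the superblock exactly, which is what lets the decoder rebuild the layout from the partition symbols alone.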


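VP9 also supports tiles: the picture is divided into a grid of tiles along superblock boundaries. Unlike in HEVC, the tiles are spaced as evenly as possible and their number is a power of two. Each tile column is between 256 and 4096 pixels wide.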
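The tile limits follow directly from the frame width. The sketch below follows the rule as given in the VP9 bitstream specification: the width is rounded up to 8×8 units and then to 64×64 superblocks, a tile column must span between 4 and 64 superblocks, and column boundaries are spread as evenly as possible. The function names are our own.

```python
from typing import List, Tuple

MIN_TILE_WIDTH_B64 = 4   # narrowest tile column: 4 superblocks = 256 pixels
MAX_TILE_WIDTH_B64 = 64  # widest tile column: 64 superblocks = 4096 pixels


def sb64_cols(frame_width: int) -> int:
    mi_cols = (frame_width + 7) >> 3   # width in 8x8 units
    return (mi_cols + 7) >> 3          # width in 64x64 superblocks


def tile_cols_log2_range(frame_width: int) -> Tuple[int, int]:
    """Allowed range of log2(number of tile columns) for a frame width."""
    sb_cols = sb64_cols(frame_width)
    min_log2 = 0
    while (MAX_TILE_WIDTH_B64 << min_log2) < sb_cols:
        min_log2 += 1
    max_log2 = 1
    while (sb_cols >> max_log2) >= MIN_TILE_WIDTH_B64:
        max_log2 += 1
    return min_log2, max_log2 - 1


def tile_col_starts(frame_width: int, log2_cols: int) -> List[int]:
    """Pixel x-offsets of the tile column boundaries (including the right edge)."""
    mi_cols = (frame_width + 7) >> 3
    sbs = (mi_cols + 7) >> 3
    starts = []
    for i in range((1 << log2_cols) + 1):
        offset = ((i * sbs) >> log2_cols) << 3   # in 8-pixel units, superblock-aligned
        starts.append(min(offset, mi_cols) * 8)
    return starts


print(tile_cols_log2_range(1920))   # 1080p frame
print(tile_col_starts(1920, 2))     # boundaries for four tile columns
```

For a 1920-pixel-wide frame this allows 1, 2 or 4 tile columns; with four, the boundaries land on superblock edges and the column widths alternate between 448 and 512 pixels.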

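Bitstream coding

VP9 compresses the bitstream with the bool-coder, an 8-bit binary arithmetic coding engine. The probability model is fixed for the whole frame: all probabilities are known before decoding of the frame data begins. These probabilities are stored in frame contexts, and the decoder maintains four such contexts.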
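The bool-coder codes one binary decision at a time against an 8-bit probability. Below is a minimal Python sketch of the idea: the decoder follows the shape of the VP9 specification's `read_bool()` (split the range, compare, renormalize bit by bit), while the matching encoder is our own simplification that uses Python's unbounded integers instead of libvpx's byte-wise carry handling. It leaves out the marker bit and padding rules of a real VP9 bool-coded partition.

```python
import random
from typing import List, Tuple


class BoolDecoder:
    """Bool decoder modeled on the VP9 spec's init_bool()/read_bool().

    `prob` (1..255) is the probability, out of 256, that the next bit is 0.
    The spec's marker bit and padding checks are left out.
    """

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.bitpos = 0
        self.value = 0
        for _ in range(8):                   # the value window starts with 8 bits
            self.value = (self.value << 1) | self._next_bit()
        self.range = 255

    def _next_bit(self) -> int:
        byte, shift = self.bitpos >> 3, 7 - (self.bitpos & 7)
        self.bitpos += 1
        return (self.data[byte] >> shift) & 1 if byte < len(self.data) else 0

    def read_bool(self, prob: int) -> int:
        split = 1 + (((self.range - 1) * prob) >> 8)
        if self.value < split:
            self.range, bit = split, 0
        else:
            self.range -= split
            self.value -= split
            bit = 1
        while self.range < 128:              # renormalize one bit at a time
            self.value = (self.value << 1) | self._next_bit()
            self.range <<= 1
        return bit


def bool_encode(symbols: List[Tuple[int, int]]) -> bytes:
    """Encode (bit, prob) pairs; a simplified encoder using Python big ints."""
    low, rng, shifts = 0, 255, 0
    for bit, prob in symbols:
        split = 1 + (((rng - 1) * prob) >> 8)
        if bit:
            low += split
            rng -= split
        else:
            rng = split
        while rng < 128:
            low <<= 1
            rng <<= 1
            shifts += 1
    nbits = 8 + shifts                       # bits the decoder will consume
    nbytes = (nbits + 7) // 8
    return (low << (nbytes * 8 - nbits)).to_bytes(nbytes, "big")


gen = random.Random(2017)
message = [(gen.randint(0, 1), gen.randint(1, 255)) for _ in range(1000)]
encoded = bool_encode(message)
decoder = BoolDecoder(encoded)
decoded = [decoder.read_bool(prob) for _, prob in message]
print(decoded == [bit for bit, _ in message], len(encoded), "bytes")
```

Bits that are almost always 0 under a high `prob` cost only a small fraction of a bit each, which is where the compression comes from, and why good per-frame probabilities matter.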

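Each coded frame consists of:

  • Uncompressed header: a dozen or so bytes holding information such as the picture size and loop filter strength;

  • Compressed header: bool-coded; it transmits the probabilities used for the bool-coded part of the frame;

  • Compressed frame data (bool-coded): the data needed to reconstruct the frame, including block partition sizes, motion vectors, intra modes and transform coefficients.

Unlike VP8, VP9 has no data partitioning: all data types are interleaved in superblock coding order.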
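The start of a key frame's uncompressed header consists of plain fixed-width bit fields, so it can be read without the bool-coder. This Python sketch parses only the fields up to the frame size, following the field order of the VP9 specification (frame marker, profile bits, frame type, sync code, color config, frame size). It handles key frames only and stops before refresh flags, loop filter, quantizer, segmentation and tile information; `BitReader` and `parse_keyframe_header_start` are hypothetical helper names.

```python
from typing import Dict

CS_RGB = 7
SYNC_CODE = (0x49, 0x83, 0x42)


class BitReader:
    """MSB-first bit reader, like the spec's f(n) descriptor."""

    def __init__(self, data: bytes) -> None:
        self.data, self.pos = data, 0

    def f(self, n: int) -> int:
        value = 0
        for _ in range(n):
            byte = self.data[self.pos >> 3]
            value = (value << 1) | ((byte >> (7 - (self.pos & 7))) & 1)
            self.pos += 1
        return value


def parse_keyframe_header_start(data: bytes) -> Dict[str, int]:
    """Parse the first fields of a VP9 key frame's uncompressed header."""
    r = BitReader(data)
    if r.f(2) != 2:
        raise ValueError("frame_marker must be 2")
    profile_low = r.f(1)
    profile = (r.f(1) << 1) | profile_low        # the low bit is sent first
    if profile == 3:
        r.f(1)                                    # reserved_zero
    if r.f(1):
        raise ValueError("show_existing_frame: nothing new is coded")
    hdr: Dict[str, int] = {"profile": profile}
    hdr["frame_type"] = r.f(1)                    # 0 = KEY_FRAME
    hdr["show_frame"] = r.f(1)
    hdr["error_resilient_mode"] = r.f(1)
    if hdr["frame_type"] != 0:
        raise ValueError("this sketch only handles key frames")
    if tuple(r.f(8) for _ in range(3)) != SYNC_CODE:
        raise ValueError("bad frame_sync_code")
    # color_config()
    hdr["bit_depth"] = (12 if r.f(1) else 10) if profile >= 2 else 8
    hdr["color_space"] = r.f(3)
    if hdr["color_space"] != CS_RGB:
        hdr["color_range"] = r.f(1)
        if profile in (1, 3):
            hdr["subsampling_x"] = r.f(1)
            hdr["subsampling_y"] = r.f(1)
            r.f(1)                                # reserved_zero
        else:
            hdr["subsampling_x"] = hdr["subsampling_y"] = 1   # 4:2:0
    else:
        hdr["color_range"] = 1
        hdr["subsampling_x"] = hdr["subsampling_y"] = 0       # 4:4:4
        if profile in (1, 3):
            r.f(1)                                # reserved_zero
    # frame_size()
    hdr["width"] = r.f(16) + 1
    hdr["height"] = r.f(16) + 1
    return hdr


bits = "10" "0" "0" "0" "0" "1" "0"        # marker, profile 0, new visible key frame
bits += "".join(f"{b:08b}" for b in SYNC_CODE)
bits += "001" "0"                          # color_space = 1 (BT.601), studio range
bits += f"{1920 - 1:016b}{1080 - 1:016b}"  # frame_width_minus_1, frame_height_minus_1
bits += "0" * (-len(bits) % 8)             # pad to whole bytes
header = int(bits, 2).to_bytes(len(bits) // 8, "big")
print(header[:4].hex(), parse_keyframe_header_start(header))
```

The bytes `82 49 83 42` at the start of this test header are the familiar signature of a profile-0 VP9 key frame: `0x82` packs the frame marker, profile and flags, and `49 83 42` is the sync code.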

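Intra prediction

Intra prediction in VP9 is similar to that of AVC/HEVC and follows the transform block partitioning, so intra prediction operations are always square. For example, a 16×8 block with 8×8 transforms results in two 8×8 luma prediction operations.

VP9 has 10 intra prediction modes, 8 of which are directional. As in other codecs, intra prediction needs two 1D arrays holding the reconstructed pixels to the left of and above the current block, taken from neighboring blocks. The left array is as tall as the current block, while the above array is twice its width.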
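Several of these predictors fit in a few lines each. This Python sketch implements DC, V, H, TM and D45 prediction for an n×n block from the `above` (2n pixels) and `left` (n pixels) arrays, modeled on the VP9 predictors. It assumes both neighbors are available and 8-bit samples, and ignores the edge-availability rules of the real codec. D45 is the mode that reads the above-right pixels, which is why the above array is twice the block width.

```python
from typing import List

Block = List[List[int]]


def clip_pixel(v: int) -> int:
    return max(0, min(255, v))                    # 8-bit samples


def predict_v(above: List[int], n: int) -> Block:
    return [list(above[:n]) for _ in range(n)]    # copy the row above downward


def predict_h(left: List[int], n: int) -> Block:
    return [[left[r]] * n for r in range(n)]      # copy the left column across


def predict_dc(above: List[int], left: List[int], n: int) -> Block:
    dc = (sum(above[:n]) + sum(left[:n]) + n) // (2 * n)   # rounded mean of 2n pixels
    return [[dc] * n for _ in range(n)]


def predict_tm(above: List[int], left: List[int], above_left: int, n: int) -> Block:
    # "True motion": extend the gradient implied by the left, above and corner pixels.
    return [[clip_pixel(left[r] + above[c] - above_left) for c in range(n)]
            for r in range(n)]


def predict_d45(above: List[int], n: int) -> Block:
    # 45-degree diagonal; reads above[n:2n] (above-right), hence the 2n-wide array.
    return [[(above[r + c] + 2 * above[r + c + 1] + above[r + c + 2] + 2) >> 2
             if r + c + 2 < 2 * n else above[2 * n - 1]
             for c in range(n)]
            for r in range(n)]


above = [10, 20, 30, 40, 50, 60, 70, 80]   # 2n reconstructed pixels above (n = 4)
left = [5, 5, 5, 5]                        # n reconstructed pixels to the left
for row in predict_d45(above, 4):
    print(row)
```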


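Inter prediction

VP9 inter prediction uses motion compensation with ⅛-pixel precision. Normally motion compensation is unidirectional, with one motion vector per block and no bidirectional prediction. However, VP9 also supports "compound prediction" (bidirectional prediction), in which each block has two motion vectors and the two resulting predictions are averaged. To avoid patents on bidirectional prediction, Google enables compound prediction only in frames marked as non-displayable.

In addition, VP9 introduces a new feature: each block can choose one of three sub-pixel interpolation filters:

  • Regular 8-tap filter;

  • Smooth 8-tap filter, which gives a smoothed or blurred prediction;

  • Sharp 8-tap filter, which gives a sharper prediction.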
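The ⅛-pixel motion vectors, sub-pixel interpolation and compound averaging described above can be sketched as follows. To keep the example short it uses a bilinear kernel (VP9 also defines a bilinear filter) as a stand-in for the three switchable 8-tap filters, whose coefficients are not reproduced here, and it filters a single horizontal row; real prediction is separable and filters in both directions. The function names are hypothetical.

```python
from typing import List, Tuple


def split_mv(mv: int) -> Tuple[int, int]:
    """Split a luma MV component in 1/8-pel units into (full-pel, 1/8-pel fraction)."""
    return mv >> 3, mv & 7   # floor and remainder, also correct for negative MVs


def bilinear_1d(row: List[int], x0: int, frac8: int, width: int) -> List[int]:
    """Horizontal sub-pixel interpolation with a bilinear stand-in kernel.

    Taps are (128 - 8f, 8f) for a 1/16-pel position f, normalized by 128.
    """
    f = frac8 * 2                  # an eighth-pel position is an even 1/16-pel index
    w1 = 8 * f
    w0 = 128 - w1
    return [(w0 * row[x0 + i] + w1 * row[x0 + i + 1] + 64) >> 7 for i in range(width)]


def compound_average(p0: List[int], p1: List[int]) -> List[int]:
    """Average two predictions with rounding, as in compound prediction."""
    return [(a + b + 1) >> 1 for a, b in zip(p0, p1)]


ref = [0, 16, 32, 48, 64, 80, 96, 112, 128, 144]   # one row of a reference frame
full, frac = split_mv(12)                           # +1.5 pixels
pred0 = bilinear_1d(ref, 0 + full, frac, 4)         # block starting at x = 0
pred1 = [25, 41, 57, 73]                            # hypothetical second prediction
print(pred0, compound_average(pred0, pred1))
```

Choosing the filter per block just swaps the kernel inside the interpolation step; the motion vector handling and the averaging stay the same.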

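Residual coding

VP9 supports four transform sizes: 32×32, 16×16, 8×8 and 4×4. As in most other codecs, these transforms are integer approximations of the DCT. In intra-coded blocks up to 16×16, one or both of the vertical and horizontal transform passes may instead use an ADST (asymmetric discrete sine transform), depending on the intra prediction mode.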
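Why a sine transform for intra blocks? Intra residuals tend to be small next to the predicted edge and grow with distance from it, and a sine basis starts near zero and rises in the same way. This Python sketch compares floating-point orthonormal DCT-II and DST-VII bases on such a ramp. VP9's 4×4 ADST is commonly described as an integer approximation of this DST-VII shape (its constants are built from sin(kπ/9)); the 8- and 16-point ADSTs are built differently and are not reproduced here.

```python
import math
from typing import List

Matrix = List[List[float]]


def dct2_matrix(n: int) -> Matrix:
    """Orthonormal DCT-II basis (rows are basis functions)."""
    return [[math.sqrt((1 if k == 0 else 2) / n)
             * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)]
            for k in range(n)]


def dst7_matrix(n: int) -> Matrix:
    """Orthonormal DST-VII basis: starts near zero, grows away from the edge."""
    scale = 2 / math.sqrt(2 * n + 1)
    return [[scale * math.sin(math.pi * (2 * k + 1) * (i + 1) / (2 * n + 1))
             for i in range(n)]
            for k in range(n)]


def apply(m: Matrix, x: List[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def first_coeff_share(m: Matrix, x: List[float]) -> float:
    """Fraction of the residual's energy captured by the first coefficient."""
    c = apply(m, x)
    return c[0] ** 2 / sum(v * v for v in c)


# An intra residual that grows with distance from the predicted edge.
residual = [1.0, 2.0, 3.0, 4.0]
print(f"DCT: {first_coeff_share(dct2_matrix(4), residual):.3f}")
print(f"DST: {first_coeff_share(dst7_matrix(4), residual):.3f}")
```

The more energy the first coefficient captures, the fewer significant coefficients remain to be coded, which is the benefit of switching transforms by prediction direction.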

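Although Google is already working on VP10, it will be a long time before VP10 is in widespread use. For now, VP9 is the format with real-world reach.

Enabling VP9 quickly

UPYUN (又拍云) media processing now supports the VP9 video coding standard. When processing audio and video, customers only need to set the video codec to libvpx-vp9 to encode in VP9 and deliver VP9 streams to end users.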
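UPYUN's own API parameters are not shown in this article, so as an equivalent under stated assumptions, here is how the same encoder can be invoked through FFmpeg directly from Python. `-c:v libvpx-vp9` selects FFmpeg's libvpx-based VP9 encoder; the 2 Mbps bitrate and the Opus audio track are example choices, not UPYUN defaults, and `vp9_encode_cmd`/`encode` are hypothetical helper names.

```python
import shlex
import subprocess
from typing import List


def vp9_encode_cmd(src: str, dst: str, video_bitrate: str = "2M") -> List[str]:
    """FFmpeg arguments that re-encode `src` to VP9 video in a WebM file."""
    return [
        "ffmpeg", "-i", src,
        "-c:v", "libvpx-vp9",     # FFmpeg's libvpx-based VP9 encoder
        "-b:v", video_bitrate,    # target video bitrate (example value)
        "-c:a", "libopus",        # Opus audio, commonly paired with VP9 in WebM
        dst,
    ]


def encode(src: str, dst: str) -> None:
    """Run the encode; needs an FFmpeg build with libvpx and libopus enabled."""
    subprocess.run(vp9_encode_cmd(src, dst), check=True)


print(shlex.join(vp9_encode_cmd("input.mp4", "output.webm")))
```

Building the argument list separately from running it keeps the command easy to log and inspect before a long VP9 encode starts.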


Reposted from: https://my.oschina.net/upyun/blog/1588260

The world's fastest VP9 video decoder

As before, I was very excited when Google released VP9, not least because I was one of the people involved in creating it back when I worked for Google (I no longer do). How good is it, and how much better can it be? To evaluate that question, Clément Bœsch and I set out to write a VP9 decoder from scratch for FFmpeg. The goals never changed from the original ffvp8 situation (community-developed, fast, free from the beginning). We also wanted to answer new questions: how does a well-written decoder compare, speed-wise, with a well-written decoder for other codecs?

TL;DR (see the rest of the post for details):

  • As a codec, VP9 is quite impressive: it beats x264 in many cases. However, the encoder is slow, very slow. At higher speed settings, the quality gain melts away. This seems to be similar to what people report about HEVC (using e.g. x265 as an encoder).

  • Single-threaded decoding speed of libvpx isn't great. FFvp9 beats it by 25-50% on a variety of machines.

  • FFvp9 is somewhat slower than ffvp8, and somewhat faster than ffh264 decoding speed (for files encoded to matching SSIM scores).

  • Multi-threading performance in libvpx is deplorable; it gains virtually nothing from its loopfilter-mt algorithm.

  • FFvp9 multi-threading gains nearly as much as ffh264/ffvp8 multithreading, but there's a cap (material-, settings- and resolution-dependent; we found it to be around 3 threads in one of our clips, although it's typically higher) after which further threads don't cause any more gain.

The codec itself

To start, we did some tests on the encoder itself. The direct goal here was to identify bitrates at which encodings would give matching SSIM scores, so we could do same-quality decoder performance measurements. However, as such, it also allows us to compare encoder performance in itself. We used settings very close to the recommended settings for VP8, VP9 and x264, optimized for SSIM as a metric.
As source clips, we chose Sintel (1920×1080 CGI content), a 2-minute clip from Tears of Steel (1920×800 cinematic content), and a 3-minute clip from Enter the Void (1920×818 high-grain/noise content). For each, we encoded at various bitrates and plotted effective bitrate versus SSIM.

[Figures: bitrate vs. SSIM for Sintel, Tears of Steel and Enter the Void]

You'll notice that in most cases VP9 can indeed beat x264, but there are some big caveats:

  • VP9 encoding (using libvpx) is horrendously slow, like 50x slower than VP8/x264 encoding. This means that encoding a 3-minute 1080p clip takes several days on a high-end machine.

  • Higher --cpu-used=X parameters make the quality gains melt away.

  • libvpx's VP9 encodes miss the target bitrates by a long shot (100% off) for the ETV clip, possibly because of our use of --aq-mode=1.

  • libvpx tends to slowly crumble at higher bitrates for hard content. Again, look at the ETV clip, where x264 shows some serious mature killer instinct at the high-bitrate end of things.

Overall, these results are promising, although the lack of speed is a serious issue.

Decoder performance

For decoding performance measurements, we chose Sintel at 500 (VP9), 1200 (VP8) and 700 (x264) kbps (SSIM=19.8); Tears of Steel at 4.0 (VP9), 7.9 (VP8) and 6.3 (x264) Mbps (SSIM=19.2); and Enter the Void at 9.7 (VP9), 16.6 (VP8) and 10.7 (x264) Mbps (SSIM=16.2). We used FFmpeg to decode each of these files, either using the built-in decoder (to compare between codecs), or using libvpx-vp9 (to compare ffvp9 versus libvpx). Decoding time was measured in seconds using "time ffmpeg -threads 1 [-c:v libvpx-vp9] -i $file -f null -v 0 -nostats - 2>&1 | grep user", with FFmpeg and libvpx revisions downloaded on February 20th, 2014.

[Figures: decoding time per CPU architecture for Sintel, Tears of Steel and Enter the Void]

A few notes on ffvp9 vs. libvpx-vp9 performance:

  • ffvp9 beats libvpx consistently by 25-50%. In practice, this means that typical middle- to high-end hardware will be able to play back 4K content using ffvp9, but not using libvpx. Low-end hardware will struggle to play back even 720p content using libvpx (but does fine using ffvp9).

  • On Haswell, the difference is significantly smaller than on Sandy Bridge, likely because libvpx has some AVX2 optimizations (e.g. for MC and loop filtering), whereas ffvp9 doesn't have those yet; this means the difference might grow over time as ffvp9 gets AVX2 optimizations too.

  • On the Atom, the differences are significantly smaller than on other systems; the reason is likely that we haven't done any significant work on Atom performance yet. Atom has unusually large latencies between GPRs and XMM registers, which means you need to take special care in ordering your instructions to prevent unnecessary halts; we haven't done anything in that area yet (for ffvp9).

  • Some users may find that ffvp9 is a lot slower than advertised on 32-bit; this is correct, most of our SIMD only works on 64-bit machines. If you have 32-bit software, port it to 64-bit. Can't port it? Ditch it. Nobody owns 32-bit x86 hardware anymore these days.

So how does VP9 decoding performance compare to that of other codecs? There are basically two ways to measure this: same-bitrate (e.g. a 500 kbps VP8 file vs. a 500 kbps VP9 file, where the VP9 file likely looks much better), or same-quality (e.g. a VP8 file with SSIM=19.2 vs. a VP9 file with SSIM=19.2, where the VP9 file likely has a much lower bitrate). We did same-quality measurements, and found:

  • ffvp9 tends to beat ffh264 by a tiny bit (10%), except on Atom (likely because ffh264 has received more Atom-specific attention than ffvp9).

  • ffvp9 tends to be quite a bit slower than ffvp8 (15%), although the massive bitrate differences in Enter the Void actually make it win for that clip (by about 15%, except on Atom). Given that Google promised VP9 would be no more than 40% more complex than VP8, it seems they kept that promise.

  • We did some same-bitrate comparisons, and found that x264 and ffvp9 are essentially identical in that scenario (with x264 having slightly lower SSIM scores); VP8 tends to be about 50% faster, but looks significantly worse.

Multithreading

One of the killer features in FFmpeg is frame-level multithreading, which allows multiple cores to decode different video frames in parallel. Libvpx also supports multithreading. So which is better?
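The matched-quality figures quoted in the post can be turned into concrete bitrate savings. This Python sketch assumes the SSIM values (19.8, 19.2, 16.2) are decibel scores of the kind x264 prints, i.e. -10·log10(1 - SSIM), and computes how much less bitrate VP9 needed than VP8 and x264 for each clip. The helper names are our own.

```python
import math


def ssim_to_db(ssim: float) -> float:
    """SSIM on a decibel scale, -10*log10(1 - SSIM), as reported by x264."""
    return -10 * math.log10(1 - ssim)


def db_to_ssim(db: float) -> float:
    return 1 - 10 ** (-db / 10)


def bitrate_saving(vp9_rate: float, other_rate: float) -> float:
    """Fraction of bitrate VP9 saves relative to another codec at matched quality."""
    return 1 - vp9_rate / other_rate


# Matched-SSIM bitrates quoted in the post, in kbps: (VP9, VP8, x264, SSIM in dB)
CLIPS = {
    "Sintel": (500, 1200, 700, 19.8),
    "Tears of Steel": (4000, 7900, 6300, 19.2),
    "Enter the Void": (9700, 16600, 10700, 16.2),
}

for name, (vp9, vp8, x264, db) in CLIPS.items():
    print(f"{name}: SSIM {db} dB = {db_to_ssim(db):.4f}; VP9 uses "
          f"{bitrate_saving(vp9, vp8):.0%} less than VP8, "
          f"{bitrate_saving(vp9, x264):.0%} less than x264")
```

For Sintel, for instance, 500 kbps versus 700 kbps means VP9 needed about 29% less bitrate than x264 at the same SSIM, and 19.8 dB corresponds to a linear SSIM of about 0.9895.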