x264 Official Learning Documentation (1): English Material, Highly Valuable as a Reference

Latest recommended article published 2019-03-21 15:09:30

STN_LCD



Category: x264

Original article link: https://blog.csdn.net/STN_LCD/article/details/78032880


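This article introduces how assembly (asm) is used in the x264 encoder, explaining through three examples how asm is used for pixel processing. First, it examines the predict_4x4_dc_mmxext function, showing how to compute the average of neighboring pixels; next, it analyzes the pixel_sad function, which uses the psadbw instruction to compute the SAD (sum of absolute differences) efficiently; finally, it covers the pixel_avg2_w16_sse2 function, which performs simple interpolation (averaging) on 16-pixel-wide blocks. The text also touches on techniques for using the x86 instruction set and on performance analysis.

(Summary generated automatically by CSDN.)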

https://wiki.videolan.org/X264_asm_intro/

X264 asm intro

The discussions on this page have been edited for clarity and readability, to collate later questions, and to trim out some noise. The original transcript can be found here: X264asm


Example 1: predict_4x4_dc

Open common/x86/predict-a.asm and go to predict_4x4_dc_mmxext. This function does the following:

  A B C D
E X X X X
F X X X X
G X X X X
H X X X X

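It calculates (A+B+C+D+E+F+G+H+4)/8 and sets every X to that value. A to D are the four pixels directly above the block and E to H the four pixels directly to its left; all of them are 8-bit pixels in a 2D array with a stride of FDEC_STRIDE. The +4 makes the integer division by 8 round to nearest instead of truncating.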
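To make the arithmetic concrete, here is a minimal C sketch of the same computation. It is not x264's own C reference code: predict_4x4_dc_sketch and set_neighbors are hypothetical names, and FDEC_STRIDE is assumed to be 32 purely for illustration. src points at the top-left X, so the row above sits at src - FDEC_STRIDE and the left column at src[y*FDEC_STRIDE - 1].

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FDEC_STRIDE 32 /* assumed value, for illustration only */

/* Hypothetical helper: write the top neighbors (A..D) and the left
 * neighbors (E..H) around the 4x4 block whose top-left pixel is src. */
static void set_neighbors(uint8_t *src, const uint8_t top[4], const uint8_t left[4])
{
    for (int i = 0; i < 4; i++) {
        src[i - FDEC_STRIDE] = top[i];      /* A, B, C, D */
        src[i * FDEC_STRIDE - 1] = left[i]; /* E, F, G, H */
    }
}

/* Sketch of 4x4 DC prediction: average the 8 neighbors with rounding,
 * then fill all 16 pixels of the block with that value. */
static void predict_4x4_dc_sketch(uint8_t *src)
{
    assert(src != NULL);
    int sum = 4; /* start at 4 so that sum >> 3 rounds to nearest */
    for (int i = 0; i < 4; i++) {
        sum += src[i - FDEC_STRIDE];     /* row above the block */
        sum += src[i * FDEC_STRIDE - 1]; /* column left of the block */
    }
    uint8_t dc = (uint8_t)(sum >> 3);
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            src[y * FDEC_STRIDE + x] = dc;
}
```

With A..D = 10, 20, 30, 40 and E..H = 50, 60, 70, 80, the neighbors sum to 360, and (360 + 4) >> 3 = 45, so all sixteen Xs become 45.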

<Q> What is FDEC_STRIDE?

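<A> x264 does all its pixel operations on the current macroblock in a temporary buffer with a constant stride. That is faster and more cache-friendly, and because the stride is a compile-time constant, the asm can address each row with a fixed offset instead of keeping the stride in a register. For example, motion compensation (or intra prediction) writes into this buffer.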
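The idea of copying a macroblock into a fixed-stride scratch buffer can be sketched in C as follows. This is a minimal illustration under stated assumptions, not x264's actual code: load_mb_to_fdec is a hypothetical name, FDEC_STRIDE is again assumed to be 32, and the sketch copies only a 16x16 luma block, ignoring the borders and chroma planes a real encoder would also handle. Whatever the stride of the source frame, every row ends up exactly FDEC_STRIDE bytes apart in the destination.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FDEC_STRIDE 32 /* assumed fixed scratch-buffer stride */
#define MB_SIZE 16     /* luma macroblock width and height */

/* Hypothetical helper: copy the 16x16 macroblock at (mb_x, mb_y) out of a
 * frame with an arbitrary stride into a scratch buffer whose stride is the
 * compile-time constant FDEC_STRIDE. */
static void load_mb_to_fdec(uint8_t *fdec, const uint8_t *frame,
                            int frame_stride, int mb_x, int mb_y)
{
    assert(frame_stride >= (mb_x + 1) * MB_SIZE);
    const uint8_t *src = frame + mb_y * MB_SIZE * frame_stride + mb_x * MB_SIZE;
    for (int y = 0; y < MB_SIZE; y++)
        memcpy(fdec + y * FDEC_STRIDE, src + y * frame_stride, MB_SIZE);
}
```

After the copy, any routine working on the block can reach row k at fdec + k * FDEC_STRIDE without being told the frame's stride.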

<Q> What's a stride?

<A> Stride is the distance in memory between pixel (x,y) and pixel (x,y+1), i.e. how far you have to move to get from one row to the next.
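In code, stride shows up as the multiplier on y when computing a pixel's address. A minimal C sketch, with pixel_at as a hypothetical helper name: rows may be padded, so the stride can be larger than the visible width, and stepping from (x,y) to (x,y+1) always adds exactly one stride.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: address of pixel (x, y) in a plane whose rows start
 * `stride` bytes apart. The stride may exceed the visible width (padding). */
static uint8_t *pixel_at(uint8_t *base, ptrdiff_t stride, int x, int y)
{
    assert(x >= 0 && y >= 0 && x < stride);
    return base + (ptrdiff_t)y * stride + x;
}
```

For instance, in a plane 5 pixels wide whose rows are padded to a stride of 8, pixel (2,3) sits at offset 3*8 + 2 = 26 from the start, and pixel (3,1) is exactly 8 bytes after pixel (3,0).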

Now that you understand what the function does, let's look at the asm.

cglobal predict_4x4_dc_mmxext, 1,4

"cglobal" declares a function that can be called from outside the asm. The function's full name is x264_predict_4x4_dc_mmxext (the x264_ prefix is added automatically). The "1" means "we have one argument; put it in r0" (that argument is uint8_t *src). If we had a second argument, we'd write "2" and the second one would go in r1; a third would go in r2, and so on. So at the start of the function, r0 contains uint8_t *src.

<Q> "that argument is uint8_t *src", what does this mean?

<A> See the C prototype in the comment above the function: void predict_4x4_dc( uint8_t *src )

<Q> What tells the function that it's uint8_t?

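<A> Nothing. It doesn't need to know: to the asm, the argument is just an address in a register, and it is treated as 8-bit pixels only because of how the code loads and stores through it. Types are a C-ism.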

The "4" means we want x264 to give us 4 registers to use: r0, r1, r2 and r3. This of course includes r0, which holds the parameter. So in short, after the first line: r0 = src, r1/r2/r3 = free, r4 and up = can't use.

<Q> That's x86inc.asm's doing right?

<A> Yes, but we aren't going into that.

<Q> I assume that means we could use them, but doing so would mess with something we don't want to touch?

<A> Yes, which is why you can't use them.

So now, as you can see, this function has 4 real steps: