x264 official learning documents (1): English-language material, highly valuable as a reference

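This article covers how asm is used in the x264 encoder, with three examples that explain in detail how asm handles pixel processing. First, it looks at predict_4x4_dc_mmxext, which shows how to compute an average of pixels; next, it analyzes the pixel_sad function, which uses the psadbw instruction to compute SAD efficiently; finally, it covers pixel_avg2_w16_sse2, which performs simple interpolation on 16-pixel-wide blocks. It also touches on x86 instruction-set techniques and performance analysis.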

https://wiki.videolan.org/X264_asm_intro/


X264 asm intro

The discussions on this page have been edited for clarity, ease of reading, collation of later questions, and to trim out some noise. The original transcript can be found here: X264asm


Example 1: predict_4x4_dc

Open common/x86/predict-a.asm and go to predict_4x4_dc_mmxext. This function does the following:

  A B C D
E X X X X
F X X X X
G X X X X
H X X X X

It calculates (A+B+C+D+E+F+G+H+4)/8, i.e. the average of the eight neighbouring pixels rounded to nearest, and sets all the Xs equal to that value. The Xs are 8-bit pixels in a 2D array with a stride of FDEC_STRIDE.

<Q> What is FDEC_STRIDE?
<A> x264 does all its pixel operations on the current macroblock in a temporary buffer of constant stride. It's faster that way, and better on cache. So, for example, motion compensation (or intra prediction) will write to this buffer.
<Q> What's a stride?
<A> Stride is the distance in memory between (x,y) and (x,y+1), i.e. how far you step to get from one row to the next.

Now that you understand what the function does, let's look at the asm.

cglobal predict_4x4_dc_mmxext, 1,4

"cglobal" declares a function accessible from outside of asm. The function's name is x264_predict_4x4_dc_mmxext (the x264_ is auto-added). The "1" means "we have one argument; put it in r0" (that argument is uint8_t *src). If we had a second argument, we'd say "2" and the second one would go in r1; if we had a third, it'd go in r2, etc. So at the start of the function, r0 contains uint8_t *src.

<Q> "that argument is uint8_t *src", what does this mean?
<A> See the comment above:  void predict_4x4_dc( uint8_t *src )
<Q> What tells the function that it's uint8_t?
<A> Nothing. It doesn't need to know. Types are a C-ism.

The "4" means we want x264 to give us 4 registers to use: r0, r1, r2 and r3. This, of course, includes the r0 used for the parameter. So in short, after the first line: r0 = src, r1/r2/r3 = free, r4 and up: can't use.

<Q> That's x86inc.asm's doing, right?
<A> Yes, but we aren't going into that.
<Q> About r4 and up: I assume it means we can use them, but if you do, it'll screw around with something you don't want it to?
<A> Yes, which is why you can't use them.

So now, this function as you can see has 4 real steps:

  1. Sum up A through D