https://wiki.videolan.org/X264_asm_intro/
X264 asm intro
The discussions on this page have been edited for clarity, ease of reading, collation of later questions, and to trim out some noise. The original transcript can be found here: X264asm
Example 1: predict_4x4_dc
Open common/x86/predict-a.asm, go to predict_4x4_dc_mmxext (git link). This function does the following:
  A B C D
E X X X X
F X X X X
G X X X X
H X X X X
It calculates (A+B+C+D+E+F+G+H+4)/8 and sets every X to that value; the +4 makes the division round to nearest instead of truncating. The letters and Xs are 8-bit pixels in a 2D array with a stride of FDEC_STRIDE.
- <Q> What is FDEC_STRIDE?
- <A> x264 does all its pixel operations on the current macroblock in a temporary buffer of constant stride. It's faster that way, and better on cache. For example, motion compensation and intra prediction both write to this buffer.
- <Q> What's a stride?
- <A> Stride is the distance in memory between (x,y) and (x,y+1), i.e. how far you have to advance to get from one row to the next.
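Before reading the asm, it may help to see the same operation in plain C. The following is a minimal sketch, not x264's own C code: the name predict_4x4_dc_ref is hypothetical, the value 32 for FDEC_STRIDE is an assumption (the sketch works with any stride of at least 5), and it assumes src points at the top-left X, so that A through D sit one row above it and E through H one column to its left.

```c
#include <stdint.h>

/* Assumption: 32 is used here for illustration; any stride >= 5 works. */
#define FDEC_STRIDE 32

/* Hypothetical reference version of the 4x4 DC predictor.
 * Assumption: src points at the top-left X of the diagram. */
static void predict_4x4_dc_ref(uint8_t *src)
{
    int sum = 4; /* rounding term from (A+...+H+4)/8 */

    for (int i = 0; i < 4; i++) {
        sum += src[i - FDEC_STRIDE];     /* A, B, C, D: the row above     */
        sum += src[i * FDEC_STRIDE - 1]; /* E, F, G, H: the column to the left */
    }

    uint8_t dc = (uint8_t)(sum >> 3);    /* divide by 8 */

    /* Moving down one row means adding FDEC_STRIDE to the index. */
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            src[y * FDEC_STRIDE + x] = dc;
}
```

Note how the stride shows up twice: once to reach the row above (a negative offset of one stride) and once to step down through the rows of the block. The asm version does the same arithmetic, just several pixels at a time.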
Now that you understand what the function does, let's look at the asm.
cglobal predict_4x4_dc_mmxext, 1,4
"cglobal" declares a function accessible from outside of asm. The function's name is x264_predict_4x4_dc_mmxext (the x264_ is auto-added). The "1" means "we have one argument; put it in r0" (that argument is uint8_t *src). If we had a second argument, we'd say "2" and the second one would go in r1; a third would go in r2, and so on. So at the start of the function, r0 contains uint8_t *src.
- <Q> "that argument is uint8_t *src", what does this mean?
- <A> See the comment above: void predict_4x4_dc( uint8_t *src )
- <Q> What tells the function that it's uint8_t?
- <A> Nothing. It doesn't need to know. Types are a C-ism.
The "4" means we want x264 to give us 4 registers to use: r0, r1, r2, and r3. This, of course, includes the r0 used for the parameter. So in short, after the first line: r0 = src, r1/r2/r3 = free, r4 and up = can't use.
- <Q> That's x86inc.asm's doing right?
- <A> Yes, but we aren't going into that.
- <Q> I assume we technically could use r4 and up, but if we did, it would screw around with something we don't want to touch?
- <A> Yes, which is why you can't use them.
So now, this function as you can see has 4 real steps:
- Sum up A through D