Why is it better to use EBP than ESP to locate parameters on the stack?

Latest recommended article published on 2021-04-28 20:40:03

weixin_39952502

Article tags: ebp, function stack, esp

Article link: https://blog.csdn.net/weixin_39952502/article/details/111842507

I am new to MASM and I am confused about these pointer registers. I would really appreciate it if you could help me.

Thanks

Solution

Encoding an addressing mode using [ebp + disp8] is one byte shorter than [esp + disp8], because using ESP as a base register requires a SIB byte. See the question "rbp not allowed as SIB base?" for details. (That question's title refers to the fact that [ebp] with no displacement has to be encoded as [ebp+0].)
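As a rough illustration of that size difference, here is a small C sketch that holds the two hand-assembled 32-bit encodings and checks whether a ModRM byte drags in a SIB byte. The array and function names are made up for this example; it only handles the `mov r32, [base+disp8]` form and is not a general decoder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hand-assembled 32-bit encodings; opcode 8B is "mov r32, r/m32". */
const uint8_t MOV_EAX_EBP_8[] = { 0x8B, 0x45, 0x08 };       /* mov eax, [ebp+8] */
const uint8_t MOV_EAX_ESP_8[] = { 0x8B, 0x44, 0x24, 0x08 }; /* mov eax, [esp+8] */

/* Returns 1 if this ModRM byte is followed by a SIB byte in 32-bit
   addressing: mod != 3 and r/m == 100b, the slot that would otherwise
   have meant "ESP as base". */
int modrm_needs_sib(uint8_t modrm)
{
    uint8_t mod = modrm >> 6;
    uint8_t rm  = modrm & 7;
    return mod != 3 && rm == 4;
}

/* Length in bytes of a "mov r32, [base+disp8]" instruction:
   opcode + ModRM + optional SIB + disp8.
   Assumption: the caller passes the 8B opcode with mod == 01. */
size_t mov_disp8_length(const uint8_t *insn)
{
    assert(insn[0] == 0x8B);
    assert((insn[1] >> 6) == 1);
    return 1 + 1 + (size_t)modrm_needs_sib(insn[1]) + 1;
}
```

Calling mov_disp8_length on the two arrays gives 3 for the EBP form and 4 for the ESP form, which matches the array sizes.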

The first time [esp + disp8] is used after a push or pop, or after a call, it will require a stack-sync uop on Intel CPUs. (See "What is the stack engine in the Sandybridge microarchitecture?") Of course, mov ebp, esp to make a stack frame in the first place also triggers a stack-sync uop: any explicit reference to ESP in the out-of-order core (not just in addressing modes) causes a stack-sync uop if the stack engine might have an offset that the out-of-order back end doesn't know about.

The traditional stack-frame setup with ebp creates a linked-list of stack frames (each saved EBP pointing at the parent's saved EBP, right below a return address), handy for profiling and sometimes debugging if your code doesn't have alternate metadata that lets your debugger unwind the stack to show stack backtraces.
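The linked list described above can be sketched in C against a simulated chain. The struct and function names are hypothetical, and the sketch assumes the outermost frame has a NULL saved EBP, which a real process does not necessarily guarantee; it walks hand-built frames rather than the real call stack.

```c
#include <assert.h>
#include <stddef.h>

/* What "push ebp / mov ebp, esp" leaves in memory:
   [ebp] holds the caller's EBP, [ebp+4] holds the return address. */
struct frame {
    const struct frame *saved_ebp;   /* parent frame; NULL ends the chain (assumption) */
    const void *return_address;
};

/* Follow the saved-EBP links, recording one return address per frame.
   Returns the number of entries written to out (at most max). */
size_t backtrace_frames(const struct frame *ebp, const void **out, size_t max)
{
    size_t n = 0;
    assert(out != NULL || max == 0);
    while (ebp != NULL && n < max) {
        out[n++] = ebp->return_address;
        ebp = ebp->saved_ebp;
    }
    return n;
}
```

A profiler or debugger doing this for real would start from the current EBP value and would need some way to validate each pointer before dereferencing it.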

But despite these downsides to using ESP, it's often not better (for performance) to use EBP as a frame pointer, because it uses up an extra one of the 8 GP registers for the stack, leaving you with 6 instead of 7 you can actually use for stuff other than the stack. Modern compilers default to -fomit-frame-pointer when optimization is enabled.

It's easy for compilers to keep track of how much ESP has moved relative to where they stored something because they know how much sub esp,28 moves the stack pointer. Even after pushing a function arg, they still know the right ESP-relative offset to anything they stored on the stack earlier in the function.

Humans can do it, too, but it's easy to make a mistake when you modify the function to reserve some extra space and forget to update all the offsets from ESP to your locals and stack args, if any. (Normally it's not worth hand-writing large functions that can't keep most of their variables in registers, though. Leave that to the compiler and only spend your time writing the hot loops in asm, if at all.)
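The bookkeeping involved is simple enough to sketch in C. This is only an illustration of the idea, not how any particular compiler implements it; the esp_tracker type and its helper functions are invented for the example, and offsets are measured from the value ESP had at function entry (where it points at the return address).

```c
#include <assert.h>

/* How far ESP currently sits below its value at function entry. */
struct esp_tracker {
    int bytes_below_entry;
};

/* push moves ESP down by 4 in 32-bit code. */
void track_push(struct esp_tracker *t)
{
    t->bytes_below_entry += 4;
}

/* pop moves ESP back up by 4. */
void track_pop(struct esp_tracker *t)
{
    assert(t->bytes_below_entry >= 4);
    t->bytes_below_entry -= 4;
}

/* sub esp, n reserves n bytes. */
void track_sub_esp(struct esp_tracker *t, int n)
{
    assert(n >= 0);
    t->bytes_below_entry += n;
}

/* entry_offset is relative to ESP at entry: negative for locals,
   +4 for the first stack arg. The result is the displacement to use
   in [esp + disp] right now. */
int esp_displacement(const struct esp_tracker *t, int entry_offset)
{
    return entry_offset + t->bytes_below_entry;
}
```

After sub esp,28 a local at entry offset -4 is at [esp+24] and the first stack arg is at [esp+32]; after one more push they move to [esp+28] and [esp+36]. Forgetting to apply that shift by hand is exactly the kind of mistake described above.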

The exception is if your function allocates a variable amount of stack space (like C alloca or C99 variable length arrays like int arr[n]); in that case compilers will make a traditional stack frame with EBP. Or in hand-written asm, if you push in a loop to use the call stack as a Stack data structure.
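For a concrete case of the variable-size exception, here is a minimal C99 sketch using a variable length array. The function name and the 1024-element cap are arbitrary choices for the example, and it assumes a compiler with VLA support (GCC and Clang have it; MSVC does not).

```c
#include <assert.h>
#include <stddef.h>

/* Sums 1^2 + 2^2 + ... + n^2 through a scratch array on the stack.
   Because the size of squares[] is only known at run time, the distance
   from ESP to the function's other stack slots is not a compile-time
   constant, which is why a frame pointer is typically set up here. */
long sum_of_squares(size_t n)
{
    assert(n > 0 && n <= 1024);   /* keep the stack allocation small */

    int squares[n];               /* C99 VLA: size chosen at run time */
    for (size_t i = 0; i < n; i++)
        squares[i] = (int)((i + 1) * (i + 1));

    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += squares[i];
    return total;
}
```

An optimizing compiler may well remove the array entirely in a function this small, so inspecting the generated asm is most instructive with the array passed to another, non-inlined function.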

For example, x86 MSVC 19.14 compiles this C:

int foo() {
    volatile int i = 0;  // force it to be stored to memory
    return i;
}

;;; MSVC -O2
_i$ = -4                                ; size = 4
int foo(void) PROC                      ; foo, COMDAT
        push    ecx
        mov     DWORD PTR _i$[esp+4], 0 ; note this is actually [esp+0] ; _i$ = -4
        mov     eax, DWORD PTR _i$[esp+4]
        pop     ecx
        ret     0
int foo(void) ENDP                      ; foo

Notice that it reserves space for i with a push instead of sub esp, 4 because that saves code-size and is usually about the same performance. It's the same number of uops for the front-end, with no extra stack-sync uops, because the push is before any explicit reference to esp, and the pop is after the last one.

(If it was reserving more than 4 bytes, I think it would just use a normal sub esp, 8 or whatever.)

There's an obvious missed optimization here: push 0 would store the value it actually wants, instead of whatever garbage was in ECX. (See "What C/C++ compiler can use push pop instructions for creating local variables, instead of just increasing esp once?") And pop eax would clean the stack and load i as the return value.

Compare that with the output when optimization is disabled. Notice that _i$ = -4 is the same offset from the "stack frame", but the optimized code used esp+4 as the base while this uses ebp. That's mostly just a fun fact about MSVC internals: it seems to think in terms of where EBP would be if it hadn't optimized away frame-pointer creation. Picking a reference point makes sense, and lining up with its frame-pointer-enabled choice is the obvious choice.

;;; MSVC -O0
_i$ = -4                                ; size = 4
int foo(void) PROC                      ; foo
        push    ebp
        mov     ebp, esp                ; make a stack frame
        push    ecx
        mov     DWORD PTR _i$[ebp], 0
        mov     eax, DWORD PTR _i$[ebp]
        mov     esp, ebp
        pop     ebp
        ret     0
int foo(void) ENDP                      ; foo

Interestingly, it still uses push/pop to reserve 4 bytes of stack space. This time it does cause one extra stack-sync uop on Intel CPUs, because the push ecx after the mov ebp, esp re-dirties the stack engine before mov esp, ebp. But that's pretty trivial.
