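Our project recently upgraded its gcc compiler, and we needed to add the compile option -ftree-slp-vectorize. After adding it, the program started crashing for no obvious reason. Attaching gdb showed no memory error; only with disassemble did I see that the program died when it reached a movaps instruction. A web search showed that this is an SSE instruction and that movaps requires its memory operand to be 16-byte aligned. Our project does not use glibc's malloc but an in-house malloc, which, unlike glibc's on 64-bit systems, aligns to 4 bytes rather than 16. So whenever an allocated address is not divisible by 16, movaps faults and the program crashes.
Fix: allocate 16-byte aligned memory through an interface similar to align_malloc.
Below I try to reproduce the problem in a virtual machine. Because my system is 64-bit and malloc always returns 16-byte aligned addresses, I simulate the problem with inline assembly instead.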
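To make the failure mode concrete, here is a minimal sketch of how a 4-byte aligned allocator hands out addresses that are not divisible by 16. The allocator below (malloc4, backed by a static pool) is a hypothetical stand-in and not our project's actual malloc; it only rounds each request up to a multiple of 4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an in-house allocator: a bump allocator
 * over a static pool that only guarantees 4-byte alignment. */
static _Alignas(16) unsigned char pool[256];
static size_t pool_used = 0;

static void *malloc4(size_t size)
{
    if (size > sizeof(pool))
        return NULL;
    /* Round the request up to a multiple of 4 bytes, not 16. */
    size_t rounded = (size + 3) & ~(size_t)3;
    if (rounded > sizeof(pool) - pool_used)
        return NULL;
    void *p = pool + pool_used;
    pool_used += rounded;
    assert((uintptr_t)p % 4 == 0);
    return p;
}

/* Nonzero if p sits on a 16-byte boundary, as movaps requires. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p % 16) == 0;
}
```

With this sketch, a 4-byte allocation followed by a 16-byte one leaves the second block at offset 4 in the pool, so a vectorized loop that loads from it with movaps would fault.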
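As a rough illustration of the fix, the sketch below gets 16-byte aligned memory from the standard C11 aligned_alloc. This assumes the platform allocator is available; for an in-house allocator, the equivalent interface would have to be provided by that allocator. The helper name alloc_aligned16 is made up for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Nonzero if p sits on a 16-byte boundary, as movaps requires. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p % 16) == 0;
}

/* Hypothetical helper: allocate at least `size` bytes on a 16-byte
 * boundary. Release the result with free(). */
static void *alloc_aligned16(size_t size)
{
    /* C11 wants the size to be a multiple of the alignment, so round up. */
    size_t rounded = (size + 15) & ~(size_t)15;
    void *p = aligned_alloc(16, rounded);
    assert(p == NULL || is_aligned16(p));
    return p;
}
```

A caller would use alloc_aligned16(n) for any buffer that vectorized code may touch, and check is_aligned16 in debug builds to catch pointers that came from the wrong allocator.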
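int main()
{
    int a = 0;  /* unused; gives gdb a source line to break on */
    __asm__ __volatile__ ("movaps 0x04(%rax), %xmm0");  /* rax is never set here; it holds whatever the startup code left in it */
    return 0;
}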
(gdb) b main
Breakpoint 1 at 0x400478: file movtest.c, line 3.
(gdb) i r
The program has no registers now.
(gdb) r
Starting program: /home/luogf/20210213/a.out
Breakpoint 1, main () at movtest.c:3
3 int a = 0;
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.212.el6_10.3.x86_64
(gdb) i r
rax 0x7ffff7dd8f80 140737351880576
rbx 0x0 0
rcx 0x400474 4195444
rdx 0x7fffffffe668 140737488348776
rsi 0x7fffffffe658 140737488348760
rdi 0x1 1
rbp 0x7fffffffe570 0x7fffffffe570
rsp 0x7fffffffe570 0x7fffffffe570
r8 0x7ffff7dd7ba0 140737351875488
r9 0x7ffff7deae20 140737351953952
r10 0x7fffffffe3c0 140737488348096
r11 0x7ffff7a66c20 140737348267040
r12 0x400390 4195216
r13 0x7fffffffe650 140737488348752
r14 0x0 0
r15 0x0 0
rip 0x400478 0x400478 <main+4>
eflags 0x246 [ PF ZF IF ]
cs 0x33 51
ss 0x2b 43
ds 0x0 0
es 0x0 0
fs 0x0 0
gs 0x0 0
(gdb) disassemble
Dump of assembler code for function main:
0x0000000000400474 <+0>: push %rbp
0x0000000000400475 <+1>: mov %rsp,%rbp
=> 0x0000000000400478 <+4>: movl $0x0,-0x4(%rbp)
0x000000000040047f <+11>: movaps 0x4(%rax),%xmm0
0x0000000000400483 <+15>: mov $0x0,%eax
0x0000000000400488 <+20>: leaveq
0x0000000000400489 <+21>: retq
End of assembler dump.
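We can see that the rax register holds 0x7ffff7dd8f80, so movaps 0x4(%rax),%xmm0 accesses 0x7ffff7dd8f80+0x04 = 0x7ffff7dd8f84, which is not 16-byte aligned, and the following error occurs: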
(gdb) c
Continuing.
Program received signal SIGSEGV, Segmentation fault.
main () at movtest.c:4
4 __asm__ __volatile__ (
(gdb) disassemble
Dump of assembler code for function main:
0x0000000000400474 <+0>: push %rbp
0x0000000000400475 <+1>: mov %rsp,%rbp
0x0000000000400478 <+4>: movl $0x0,-0x4(%rbp)
=> 0x000000000040047f <+11>: movaps 0x4(%rax),%xmm0
0x0000000000400483 <+15>: mov $0x0,%eax
0x0000000000400488 <+20>: leaveq
0x0000000000400489 <+21>: retq
End of assembler dump.
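Sure enough, the program crashed with SIGSEGV, and we can clearly see that it died on movaps. If we instead stop at the breakpoint and subtract 4 from the rax register, so that 0x04(%rax), i.e. %rax+4, lands exactly on a 16-byte boundary: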
(gdb) set $rax=$rax-4
(gdb) i r
rax 0x7ffff7dd8f7c 140737351880572
rbx 0x0 0
rcx 0x400474 4195444
rdx 0x7fffffffe668 140737488348776
rsi 0x7fffffffe658 140737488348760
rdi 0x1 1
rbp 0x7fffffffe570 0x7fffffffe570
rsp 0x7fffffffe570 0x7fffffffe570
r8 0x7ffff7dd7ba0 140737351875488
r9 0x7ffff7deae20 140737351953952
r10 0x7fffffffe3c0 140737488348096
r11 0x7ffff7a66c20 140737348267040
r12 0x400390 4195216
r13 0x7fffffffe650 140737488348752
r14 0x0 0
r15 0x0 0
rip 0x400478 0x400478 <main+4>
eflags 0x246 [ PF ZF IF ]
cs 0x33 51
ss 0x2b 43
ds 0x0 0
es 0x0 0
fs 0x0 0
gs 0x0 0
(gdb) c
Continuing.
Program exited normally.
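The arithmetic behind the two runs can be checked with a small sketch. The register values are the ones from the gdb sessions above; the helper names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Effective address of a memory operand written as disp(%rax). */
static uint64_t effective_address(uint64_t rax, int64_t disp)
{
    return rax + (uint64_t)disp;
}

/* How many bytes addr lies past the previous 16-byte boundary
 * (0 means the address is 16-byte aligned). */
static unsigned misalignment16(uint64_t addr)
{
    return (unsigned)(addr % 16);
}

/* Re-check both gdb runs: the original rax and rax after subtracting 4. */
static void check_gdb_session_values(void)
{
    /* First run: 0x7ffff7dd8f80 + 4 = 0x7ffff7dd8f84, 4 bytes off. */
    assert(misalignment16(effective_address(0x7ffff7dd8f80ULL, 4)) == 4);
    /* Second run: 0x7ffff7dd8f7c + 4 = 0x7ffff7dd8f80, aligned. */
    assert(misalignment16(effective_address(0x7ffff7dd8f7cULL, 4)) == 0);
}
```

In the first run the operand address ends in 0x4, so movaps faults; in the second it ends in 0x0, so the load succeeds and the program exits normally.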