Let's start with a piece of C code:
#include <stdio.h>

int a_test = 66;
char glob_str_array[] = "Global Heglo String Array!";
char *glob_str_pointer = "Global Hegxlo String Pointer!";

int main(void)
{
    a_test = 88;
    const int b_test = 99;
    const char const_str_array[] = "Const String Array!";

    char str_array[] = "Hexlo String Array!";
    printf("initial str is: %s\n", str_array);
    printf("size of str_array is: %d\n", (int)sizeof(str_array));
    str_array[2] = 'l';   /* OK: the array is a writable copy of the string */
    printf("str by array is %s\n\n", str_array);

    char *str_pointer = "Hexlo String Pointer!";
    printf("initial str is: %s\n", str_pointer);
    printf("size of str_pointer is: %d\n", (int)sizeof(str_pointer));
    str_pointer[2] = 'l'; /* crashes: writes into a read-only string literal */
    printf("str by pointer is %s\n", str_pointer);
    return 0;
}
gcc -g -o char_test char_test.c
After compiling, running the program produces a core dump:
initial str is: Hexlo String Array!
size of str_array is: 20
str by array is Hello String Array!
initial str is: Hexlo String Pointer!
size of str_pointer is: 8
Segmentation fault (core dumped)
The same crash occurs whether str_array[] and *str_pointer are defined outside main or inside it.
gdb output:
(gdb) r
Starting program: /home/test/char_test
initial str is: Hexlo String Array!
size of str_array is: 20
str by array is Hello String Array!
initial str is: Hexlo String Pointer!
size of str_pointer is: 8
Program received signal SIGSEGV, Segmentation fault.
0x00000000004005fd in main () at char_test.c:25
25 str_pointer[2] = 'l';
(gdb)
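Analysis:
The string that char *str_pointer points to is stored in the .rodata section, which is mapped into read-only memory pages, so writing to it raises SIGSEGV. (In C, modifying a string literal is undefined behavior.)
Dumping the sections makes this clearer:
Running readelf -p .rodata char_test >elf.rodata gives the following: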
String dump of section '.rodata':
[ 4] Global Hegxlo String Pointer!
[ 22] initial str is: %s
[ 36] size of str_array is: %d
[ 50] str by array is %s
[ 65] Hexlo String Pointer!
[ 7b] size of str_pointer is: %d
[ 97] str by pointer is %s
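As shown, a string that a char* points to is stored in the read-only .rodata section whether it is defined inside a function or outside one.
Running readelf -p .data char_test >elf.data gives the following: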
String dump of section '.data':
[ 10] B
[ 20] Global Heglo String Array!
[ 42] @
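As shown, the global value 66 defined outside any function (the character B in the ASCII table) and the global string array "Global Heglo String Array!" are placed in the .data section (readable and writable). The pointer variable glob_str_pointer itself also lives in .data; the @ entry is most likely one byte (0x40) of the address it holds.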
What about the string arrays const_str_array and str_array, which are defined inside the function? Use objdump -d char_test to look at the disassembly:
0000000000400520 <main>:
400520: 55 push %rbp
400521: 48 89 e5 mov %rsp,%rbp
400524: 48 83 ec 50 sub $0x50,%rsp
400528: c7 05 2e 05 20 00 58 movl $0x58,0x20052e(%rip) # 600a60 <a_test>
40052f: 00 00 00
400532: c7 45 fc 63 00 00 00 movl $0x63,-0x4(%rbp)
400539: 48 b8 43 6f 6e 73 74 mov $0x74532074736e6f43,%rax
400540: 20 53 74
400543: 48 89 45 d0 mov %rax,-0x30(%rbp)
400547: 48 b8 72 69 6e 67 20 mov $0x72724120676e6972,%rax
40054e: 41 72 72
400551: 48 89 45 d8 mov %rax,-0x28(%rbp)
400555: c7 45 e0 61 79 21 00 movl $0x217961,-0x20(%rbp)
40055c: 48 b8 48 65 78 6c 6f mov $0x7453206f6c786548,%rax
400563: 20 53 74
400566: 48 89 45 b0 mov %rax,-0x50(%rbp)
40056a: 48 b8 72 69 6e 67 20 mov $0x72724120676e6972,%rax
400571: 41 72 72
400574: 48 89 45 b8 mov %rax,-0x48(%rbp)
400578: c7 45 c0 61 79 21 00 movl $0x217961,-0x40(%rbp)
40057f: 48 8d 45 b0 lea -0x50(%rbp),%rax
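The strings are embedded in the code segment as ASCII bytes, encoded as immediate operands of the mov instructions. Take 0x74532074736e6f43: because the value is stored little-endian, the bytes land in memory in the order 0x436f6e7374205374, which reads as "Const St" in the ASCII table. The other immediates work the same way. When the function is entered, these instructions copy the strings from the code segment onto the function's stack, so in this build even const_str_array ends up on the stack rather than in .rodata.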
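Summary:
For a string that needs to be modified, store it in a char[]. For a string that does not, use const char* or const char[] rather than char* to avoid this kind of SIGSEGV: the compiler can then reject illegal writes at compile time, instead of leaving them to surface as a core dump at run time.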
References:
https://www.bfilipek.com/2014/07/quick-case-char-pointer-vs-char-array.html
https://stackoverflow.com/questions/37902489/in-which-data-segment-is-the-c-string-stored
https://en.wikipedia.org/wiki/Data_segment