char*, char[]: memory layout of string initialization

Copyright notice: this is the blogger's original article and may not be reproduced without the blogger's permission. https://blog.csdn.net/sundongsdu/article/details/87975065

Let's start with some C code:

#include <stdio.h>

int a_test = 66;
char glob_str_array[] = "Global Heglo String Array!";
char *glob_str_pointer = "Global Hegxlo String Pointer!";

int main(void)
{
    a_test = 88;

    const int b_test = 99;
    const char const_str_array[] = "Const String Array!";

    char str_array[] = "Hexlo String Array!";
    printf("initial str is: %s\n", str_array);
    printf("size of str_array is: %d\n", (int)sizeof(str_array));

    str_array[2] = 'l';
    printf("str by array is %s\n\n",str_array);

    char *str_pointer = "Hexlo String Pointer!";
    printf("initial str is: %s\n", str_pointer);
    printf("size of str_pointer is: %d\n", (int)sizeof(str_pointer));

    str_pointer[2] = 'l';
    printf("str by pointer is %s\n",str_pointer);
}

gcc -g -o char_test char_test.c

After compiling, running the program produces a core dump:

initial str is: Hexlo String Array!
size of str_array is: 20
str by array is Hello String Array!

initial str is: Hexlo String Pointer!
size of str_pointer is: 8
Segmentation fault (core dumped)

The same error occurs whether str_array[] and str_pointer are defined inside main or outside it: the write through the pointer faults either way.
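The two sizes in the output (20 and 8) come from the declarations themselves: sizeof on an array measures the whole array, while sizeof on a pointer measures only the pointer variable. A minimal sketch with hypothetical helper names; the value 8 assumes a 64-bit build like the one above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A local array initialized from a literal holds every character
   plus the terminating '\0'. */
size_t array_object_size(void)
{
    char str_array[] = "Hexlo String Array!";
    assert(sizeof(str_array) == strlen(str_array) + 1);  /* 19 chars + '\0' */
    return sizeof(str_array);
}

/* A pointer initialized from a literal: sizeof measures only the pointer
   variable, not the 22 bytes of the literal it points to. */
size_t pointer_object_size(void)
{
    const char *str_pointer = "Hexlo String Pointer!";
    return sizeof(str_pointer);   /* sizeof(char *), 8 on x86-64 */
}
```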

Debugging with gdb:

(gdb) r
Starting program: /home/test/char_test 
initial str is: Hexlo String Array!
size of str_array is: 20
str by array is Hello String Array!

initial str is: Hexlo String Pointer!
size of str_pointer is: 8

Program received signal SIGSEGV, Segmentation fault.
0x00000000004005fd in main () at char_test.c:25
25		str_pointer[2] = 'l';
(gdb) 

Analysis:

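The string that char *str_pointer points to is stored in the .rodata section, which is mapped onto read-only memory pages, so the write to str_pointer[2] raises SIGSEGV. Strictly speaking, modifying a string literal is undefined behavior in C; the segmentation fault is how this toolchain and OS happen to surface it.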
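The fix is to write only to memory the program owns. A minimal sketch of two ways to get a writable copy of the literal before modifying it; the function names are hypothetical, and the heap variant assumes the caller frees the result.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Option 1: an array initialized from the literal is a separate, writable
   object, so the write never touches the literal. Result copied to out. */
void fix_with_array(char *out, size_t out_size)
{
    char str_array[] = "Hexlo String Pointer!";
    str_array[2] = 'l';                          /* writes to the array copy */
    assert(out_size >= sizeof(str_array));
    memcpy(out, str_array, sizeof(str_array));
}

/* Option 2: a heap copy, for when the string must outlive the function.
   Returns NULL on allocation failure; the caller must free() the result. */
char *fix_with_heap_copy(void)
{
    const char *literal = "Hexlo String Pointer!";   /* read-only storage */
    size_t n = strlen(literal) + 1;                   /* include '\0' */
    char *copy = malloc(n);
    if (copy == NULL)
        return NULL;
    memcpy(copy, literal, n);
    copy[2] = 'l';                                    /* writes to heap memory */
    return copy;
}
```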

Dumping the sections makes this clearer.

Running readelf -p .rodata char_test > elf.rodata gives:

String dump of section '.rodata':
  [     4]  Global Hegxlo String Pointer!
  [    22]  initial str is: %s

  [    36]  size of str_array is: %d

  [    50]  str by array is %s


  [    65]  Hexlo String Pointer!
  [    7b]  size of str_pointer is: %d

  [    97]  str by pointer is %s

So the string a char* points to is stored in the read-only .rodata section whether the pointer is defined inside or outside a function.

Running readelf -p .data char_test > elf.data gives:

String dump of section '.data':
  [    10]  B
  [    20]  Global Heglo String Array!
  [    42]  @

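So the global a_test, initialized to 66 (the character 'B' in the ASCII table), and the global character array "Global Heglo String Array!" are both stored in the readable and writable .data section. The "@" at offset 0x42 is most likely a byte of the pointer variable glob_str_pointer, which also lives in .data: its value is an address in the 0x400000 range pointing into .rodata, and the byte 0x40 prints as '@'.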
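Because a file-scope char[] owns its bytes in a writable section, it can be edited in place, unlike the literal behind glob_str_pointer. A minimal sketch with a hypothetical helper fix_glob_str_array; on the build above the array sits in .data, though exact section placement is up to the compiler and linker.

```c
#include <assert.h>
#include <string.h>

/* File-scope array: the characters themselves are the object, stored in
   writable memory (.data in the readelf dump above). */
char glob_str_array[] = "Global Heglo String Array!";

/* Hypothetical helper: corrects the "Heglo" typo in place. */
void fix_glob_str_array(void)
{
    char *p = strstr(glob_str_array, "Heglo");
    assert(p != NULL);
    memcpy(p, "Hello", 5);   /* overwrite 5 chars, keep the '\0' intact */
}
```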

How are the local arrays const_str_array and str_array, defined inside the function, stored? The disassembly from objdump -d char_test shows:

0000000000400520 <main>:
  400520:   55                      push   %rbp
  400521:   48 89 e5                mov    %rsp,%rbp
  400524:   48 83 ec 50             sub    $0x50,%rsp
  400528:   c7 05 2e 05 20 00 58    movl   $0x58,0x20052e(%rip)        # 600a60 <a_test>
  40052f:   00 00 00
  400532:   c7 45 fc 63 00 00 00    movl   $0x63,-0x4(%rbp)
  400539:   48 b8 43 6f 6e 73 74    mov    $0x74532074736e6f43,%rax
  400540:   20 53 74
  400543:   48 89 45 d0             mov    %rax,-0x30(%rbp)
  400547:   48 b8 72 69 6e 67 20    mov    $0x72724120676e6972,%rax
  40054e:   41 72 72
  400551:   48 89 45 d8             mov    %rax,-0x28(%rbp)
  400555:   c7 45 e0 61 79 21 00    movl   $0x217961,-0x20(%rbp)
  40055c:   48 b8 48 65 78 6c 6f    mov    $0x7453206f6c786548,%rax
  400563:   20 53 74
  400566:   48 89 45 b0             mov    %rax,-0x50(%rbp)
  40056a:   48 b8 72 69 6e 67 20    mov    $0x72724120676e6972,%rax
  400571:   41 72 72
  400574:   48 89 45 b8             mov    %rax,-0x48(%rbp)
  400578:   c7 45 c0 61 79 21 00    movl   $0x217961,-0x40(%rbp)
  40057f:   48 8d 45 b0             lea    -0x50(%rbp),%rax

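The string contents are embedded in the code section as immediate operands of the instructions. For example, 0x74532074736e6f43 is stored little-endian, so its bytes in memory, lowest first, are 0x43 0x6f 0x6e 0x73 0x74 0x20 0x53 0x74, which is "Const St" in ASCII; the other immediates decode the same way. When the function runs, these instructions copy the bytes into its stack frame (starting at -0x30(%rbp) for const_str_array and -0x50(%rbp) for str_array), so both local arrays, including the const one, are copies on the stack.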
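The little-endian decoding can be checked mechanically. A sketch that replays the three stores initializing str_array, with the constants copied from the listing above; the function names are hypothetical. It builds the bytes with shifts rather than by copying a uint64_t, so the result is the same on any host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stores the low nbytes of imm at dst, lowest byte first: the byte order
   a little-endian x86-64 store such as "mov %rax,-0x50(%rbp)" produces. */
static void store_le(unsigned char *dst, uint64_t imm, size_t nbytes)
{
    assert(nbytes <= 8);                     /* shifts past 56 bits are invalid */
    for (size_t i = 0; i < nbytes; i++)
        dst[i] = (unsigned char)(imm >> (8 * i));
}

/* Replays the stores at -0x50, -0x48 and -0x40 (array offsets 0, 8, 16)
   and copies the resulting 20-byte array to out. */
void replay_str_array_init(char out[20])
{
    unsigned char frame[20];
    store_le(frame + 0, 0x7453206f6c786548ULL, 8);   /* "Hexlo St" */
    store_le(frame + 8, 0x72724120676e6972ULL, 8);   /* "ring Arr" */
    store_le(frame + 16, 0x217961ULL, 4);            /* "ay!\0"    */
    memcpy(out, frame, sizeof frame);
}
```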

 

Summary:

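Use char[] for strings that need to be modified. For strings that should not be modified, use const char* or const char[] rather than char*, so that an illegal write is rejected by the compiler at compile time instead of surfacing as a core dump at run time.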
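The recommendation carries over to function signatures. A sketch with hypothetical helpers: count_char only reads, so it takes const char * and accepts literals, arrays and const arrays alike, while replace_char mutates and therefore takes char *, which the caller should only ever pass a writable char[].

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Read-only: any write through s would be a compile-time error. */
size_t count_char(const char *s, char c)
{
    size_t n = 0;
    for (; *s != '\0'; s++)
        if (*s == c)
            n++;
    return n;
}

/* Mutating: s must point to writable memory, never to a string literal. */
void replace_char(char *s, char from, char to)
{
    assert(from != '\0');   /* never overwrite the terminator */
    for (char *p = strchr(s, from); p != NULL; p = strchr(p + 1, from))
        *p = to;
}
```

With GCC, the -Wwrite-strings option gives string literals the type const char[N] for diagnostic purposes, so a declaration like char *str_pointer = "..." itself draws a warning.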

 


 

 
