Linux C Programming: "hello world", Part 1


[Note] Environment:
OS: CentOS 7
GCC: 4.8.5
Results may differ in other environments.



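  Disclaimer: this article is a small summary of my own experience. Some of it may be imperfectly understood or simply wrong; please do criticize and comment.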

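  Note: I wrote this summary two or three years ago. I revised it several times, but, unsure of my own knowledge and not wanting to mislead anyone or embarrass myself, I did not publish it then. With some free time today I remembered that it was still unpublished. I have read through it once more, but omissions are inevitable; I publish it with some trepidation, hoping it prompts better contributions from others. If it helps any reader I will be honored, and if you find errors or gaps, please point them out.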

  The material in this article is drawn from:
  Using as
  Using as, the GNU Assembler


hello world

Suppose we have a file named hello.c with the following contents:

#include <stdio.h>
     
int
main(void)
{   
    printf("hello world!\n");
    return 0;
}

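A quick poll first: is there anyone who does not recognize the code above? Hands up if you don't. No hands? Good, it seems everyone knows it. So let's have a little fun with it.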


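  As we know, gcc turns a source file into an executable object file in four stages: preprocessing, compilation, assembly and linking. The programs that carry out these stages are the preprocessor (cpp), the compiler (cc1), the assembler (as) and the linker (ld).
  Here are the input and output files of each of the four programs: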

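| Program | Input | Output |
|---|---|---|
| Preprocessor (cpp) | Source program hello.c (text) | Preprocessed program hello.i (text) |
| Compiler (cc1) | Preprocessed program hello.i (text) | Assembly program hello.s (text) |
| Assembler (as) | Assembly program hello.s (text) | Relocatable object program hello.o (binary) |
| Linker (ld) | Relocatable object programs hello.o, printf.o (binary) | Executable object program (binary) |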

The following command produces hello.i:

gcc -E -o hello.i hello.c

-E:Stop after the preprocessing stage; do not run the compiler proper. The output is in the form of preprocessed source code, which is sent to the standard output.
Input files which don’t require preprocessing are ignored.

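  As we know, during preprocessing the preprocessor (cpp) handles the preprocessing directives in the source code. Common ones are:
  file inclusion with "#include", macro definition with "#define", and conditional compilation with "#if", "#else", "#endif" and so on.
  Roughly, the preprocessor handles them as follows: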

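| Directive | How it is processed |
|---|---|
| File inclusion | The preprocessor copies the entire contents of the included file into the current source file. |
| Macro definition | The preprocessor expands each macro name, replacing it with its definition. This is plain text substitution. |
| Conditional compilation | Code (blocks) that satisfies the condition is kept; code (blocks) that does not is removed. |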

       As a first step, we mainly want to look at the generated assembly file, which the -S option of gcc produces.

gcc -S hello.c

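-S: stop after compilation proper; do not assemble. For each input file that is not already assembly, the output is an assembly language file, usually with the extension ".s".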

       After compilation, the assembly file hello.s has the following contents:
[Figure: screenshot of hello.s, not preserved]


Let's begin

The assembly file hello.s

All assembler directives have names that begin with a period (‘.’). The names are case insensitive for most targets, and usually written in lower case. [From the GNU Assembler Manual 2.31]
  Alternatively, use info to read the as manual (the version on CentOS 7.4.1708 is 2.25.1):

info as

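A note up front: my English is poor, so I won't translate the manual in full. I'll only give the gist of each quoted passage, here and below, and readers comfortable with the English can skip these summaries. The two sentences above say that all assembler directives (also called pseudo-operations) begin with a period (‘.’); on most targets the names are case insensitive, and they are usually written in lower case. Directives are not executed by the CPU; they only guide the assembly and linking process.
  
  Next, let's go through the file line by line.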

Line 1

.file "hello.c"

        Clearly, .file is a GNU assembler directive. Its description is as follows:
There are two different versions of the .file directive. Targets that support DWARF2 line number information use the DWARF2 version of .file. Other targets use the default version.
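  Gist: there are two versions of the .file directive. Targets that support DWARF2 line number information use the DWARF2 version; all others use the default version.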
Default Version
This version of the .file directive tells as that we are about to start a new logical file. The syntax is:

.file string

string is the new file name. In general, the filename is recognized whether or not it is surrounded by quotes ‘"’; but if you wish to specify an empty file name, you must give the quotes–"". This statement may go away in future: it is only recognized to be compatible with old as programs.
  The default version of .file tells the assembler (as) that a new logical file is about to start. In the syntax, string is the new file name. The name may usually be written with or without surrounding quotes, but an empty file name must be given with quotes, that is:

.file ""

DWARF2 Version
When emitting DWARF2 line number information, .file assigns filenames to the .debug_line file name table. The syntax is:

.file fileno filename

The fileno operand should be a unique positive integer to use as the index of the entry in the table. The filename operand is a C string literal.
The detail of filename indices is exposed to the user because the filename table is shared with the .debug_info section of the DWARF2 debugging information, and thus the user must know the exact indices that table entries will have.
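  In the DWARF2 version, when line number information is emitted, .file assigns file names to the file name table of the .debug_line section. fileno must be a unique positive integer, used as the index of the entry in the table.
  DWARF (commonly expanded as Debugging With Arbitrary Record Formats) is a widely used, standardized debugging data format. [PDFs of each DWARF version can be downloaded from http://www.dwarfstd.org/Download.php ]
  
  Back to the main thread: now that it is clear what the first directive does, let's look at line 2.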


Line 2

.section       .rodata

       .section is an assembler directive, and it too comes in two versions. Here is its description.
Use the .section directive to assemble the following code into a section named name.
This directive is only supported for targets that actually support arbitrarily named sections; on a.out targets, for example, it is not accepted, even with a standard a.out section name.
  The .section directive assembles the code that follows it into the section named name.

       First, the COFF version of .section:
COFF Version
For COFF targets, the .section directive is used in one of the following ways:

.section name[, "flags"]
.section name[, subsection]

If the optional argument is quoted, it is taken as flags to use for the section. Each flag is a single character. The following flags are recognized:
  If the optional argument is quoted, it is taken as the section's flags. Each flag is a single character. The recognized flags are listed below:

| Flag | Description |
|---|---|
| b | bss section (uninitialized data) |
| n | section is not loaded |
| w | writable section |
| d | data section |
| e | exclude section from linking |
| r | read-only section |
| x | executable section |
| m | mergeable section (TIGCC extension, symbols in the section are considered mergeable constants) |
| u | unaligned section (TIGCC extension, the contents of the section need not be aligned) |
| s | shared section (meaningful for PE targets, useless for TIGCC) |
| a | ignored (for compatibility with the ELF version) |
| y | section is not readable (meaningful for PE targets) |
| 0-9 | single-digit power-of-two section alignment (GNU extension). Note: the alignment is 2^n, n ∈ [0, 9] |

If no flags are specified, the default flags depend upon the section name. If the section name is not recognized, the default will be for the section to be loaded and writable. Note the n and w flags remove attributes from the section, rather than adding them, so if they are used on their own it will be as if no flags had been specified at all.
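  If no flags are given to .section, the defaults depend on the section name.
  If the section name is not a recognized one (section names may be chosen freely), the section defaults to loaded and writable.
  Note that the n and w flags remove attributes from the section rather than adding them, so using them on their own is equivalent to specifying no flags at all.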

If the optional argument to the .section directive is not quoted, it is taken as a subsegment number.
  That is, an unquoted optional argument is treated as a subsection number. So what is a subsection number?
  
  First, what is a subsection:
Sub-Sections
Assembled bytes conventionally fall into two sections: text and data. You may have separate groups of data in named sections that you want to end up near to each other in the object file, even though they are not contiguous in the assembler source. as allows you to use subsections for this purpose. Within each section, there can be numbered subsections with values from 0 to 8192. Objects assembled into the same subsection go into the object file together with other objects in the same subsection. For example, a compiler might want to store constants in the text section, but might not want to have them interspersed with the program being assembled. In this case, the compiler could issue a ‘.text 0’ before each section of code being output, and a ‘.text 1’ before each group of constants being output.
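  Subsections let the assembler place groups of data that belong to the same named section, but are scattered through the assembly source, next to each other in the object file.
  Within each section, subsections are numbered from 0 to 8192. (A subsection number is simply this number, in the range [0, 8192].)
  Objects assembled into the same subsection are placed in the object file together with the other objects of that subsection.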

Subsections are optional. If you do not use subsections, everything goes in subsection number zero.
  Subsections are optional. If you do not use them, everything goes into subsection 0.

Each subsection is zero-padded up to a multiple of four bytes. (Subsections may be padded a different amount on different flavors of as.)
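  Each subsection is padded with zeros up to a multiple of four bytes, so its padded length is 4n bytes (n a non-negative integer). Different flavors of as may pad by a different amount.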

Subsections appear in your object file in numeric order, lowest numbered to highest. (All this to be compatible with other people’s assemblers.) The object file contains no representation of subsections; ld and other programs that manipulate object files see no trace of them. They just see all your text subsections as a text section, and all your data subsections as a data section.
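  Subsections appear in the object file in numeric order, from lowest to highest; this is for compatibility with other assemblers. The object file contains no representation of subsections: ld (the linker) and other programs that manipulate object files see no trace of them. They simply see all text subsections as one text section and all data subsections as one data section.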

To specify which subsection you want subsequent statements assembled into, use a numeric argument to specify it, in a ‘.text expression’ or a ‘.data expression’ statement. When generating COFF output, you can also use an extra subsection argument with arbitrary named sections: ‘.section name, expression’. When generating ELF output, you can also use the .subsection directive (see SubSection) to specify a subsection: ‘.subsection expression’. Expression should be an absolute expression (see Expressions). If you just say ‘.text’ then ‘.text 0’ is assumed. Likewise ‘.data’ means ‘.data 0’. Assembly begins in text 0. For instance:

.text 0     # The default subsection is text 0 anyway.
.ascii "This lives in the first text subsection. *"
.text 1
.ascii "But this lives in the second text subsection."
.data 0
.ascii "This lives in the data section,"
.ascii "in the first data subsection."
.text 0
.ascii "This lives in the first text section,"
.ascii "immediately following the asterisk (*)."

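       To choose the subsection that subsequent statements are assembled into, give a numeric argument in a ‘.text expression’ or ‘.data expression’ statement. When generating COFF output, an arbitrarily named section can also take an extra subsection argument: ‘.section name, expression’. When generating ELF output, the .subsection directive can be used as well: ‘.subsection expression’ (which replaces the current subsection with subsection expression). expression should be an absolute expression. A plain ‘.text’ means ‘.text 0’, and likewise ‘.data’ means ‘.data 0’. Assembly begins in text 0. As the example above shows, statements in the same subsection end up next to each other.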
  
Each section has a location counter incremented by one for every byte assembled into that section. Because subsections are merely a convenience restricted to as there is no concept of a subsection location counter. There is no way to directly manipulate a location counter—but the .align directive changes it, and any label definition captures its current value. The location counter of the section where statements are being assembled is said to be the active location counter.
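  Each section has a location counter that is incremented by one for every byte assembled into that section. Subsections are merely a convenience inside as, so there is no such thing as a subsection location counter. A location counter cannot be manipulated directly, although the .align directive changes it and any label definition captures its current value. The location counter of the section currently being assembled into is called the active location counter.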
  
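  A brief summary so far:

A subsection number is the number of a subsection, in the range [0, 8192];
subsection numbers let statements scattered through the assembly file be gathered together;
the default subsection number of a section is 0.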

       Now the ELF version of .section [this is the version used by the .section in our assembly file]:
ELF Version
This is one of the ELF section stack manipulation directives. The others are .subsection, .pushsection, .popsection, and .previous.
This directive replaces the current section and subsection.
For ELF targets, the .section directive is used like this:

.section name [, "flags"[, @type[,flag_specific_arguments]]]

The optional flags argument is a quoted string which may contain any combination of the following characters:

| Flag | Description |
|---|---|
| a | section is allocatable |
| d | section is a GNU_MBIND section |
| e | section is excluded from executable and shared library. |
| w | section is writable |
| x | section is executable |
| M | section is mergeable |
| S | section contains zero terminated strings |
| G | section is a member of a section group |
| T | section is used for thread-local-storage |
| ? | section is a member of the previously-current section’s group, if any |
| <number> | a numeric value indicating the bits to be set in the ELF section header’s flags field. Note - if one or more of the alphabetic characters described above is also included in the flags field, their bit values will be ORed into the resulting value. |
| <target specific> | some targets extend this list with their own flag characters |

Note - once a section’s flags have been set they cannot be changed. There are a few exceptions to this rule however. Processor and application specific flags can be added to an already defined section. The .interp, .strtab and .symtab sections can have the allocate flag (a) set after they are initially defined, and the .note-GNU-stack section may have the executable (x) flag added.
  In general, once a section's flags have been set they cannot be changed; the quoted note names the few exceptions.
  
The optional type argument may contain one of the following constants:

| Type | Description |
|---|---|
| @progbits | section contains data |
| @nobits | section does not contain data (i.e., section only occupies space) |
| @note | section contains data which is used by things other than the program |
| @init_array | section contains an array of pointers to init functions |
| @fini_array | section contains an array of pointers to finish functions |
| @preinit_array | section contains an array of pointers to pre-init functions |
| @<number> | a numeric value to be set as the ELF section header’s type field. |
| @<target specific> | some targets extend this list with their own types |

Many targets only support the first three section types. The type may be enclosed in double quotes if necessary.
  Many targets support only the first three section types. The type may be enclosed in double quotes if necessary.

Note on targets where the @ character is the start of a comment (eg ARM) then another character is used instead. For example the ARM port uses the % character.
  On targets where @ starts a comment, another character is used instead; the ARM port, for example, uses %.

Note - some sections, eg .text and .data are considered to be special and have fixed types. Any attempt to declare them with a different type will generate an error from the assembler.
  .text and .data are special sections with fixed types. Declaring them with a different type makes the assembler report an error.

If flags contains the M symbol then the type argument must be specified as well as an extra argument—entsize—like this:
  If the flags contain M, the type argument must be given, together with one extra argument, entsize:

.section name , "flags"M, @type, entsize

Sections with the M flag but not S flag must contain fixed size constants, each entsize octets long. Sections with both M and S must contain zero terminated strings where each character is entsize bytes long. The linker may remove duplicates within sections with the same name, same entity size and same flags. entsize must be an absolute expression. For sections with both M and S, a string which is a suffix of a larger string is considered a duplicate. Thus “def” will be merged with “abcdef”; A reference to the first “def” will be changed to a reference to “abcdef”+3.
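  This passage is easy to follow. "M" means mergeable and "S" means the section contains zero-terminated strings (C-style strings). A section with M but not S must contain fixed-size constants, each entsize octets (bytes) long. A section with both M and S must contain zero-terminated strings in which each character is entsize bytes long. The linker may remove duplicates within sections that have the same name, the same entity size and the same flags. entsize must be an absolute expression. For sections with both M and S, a string that is a suffix of a longer string counts as a duplicate. The manual's example makes this concrete: "def" is merged into "abcdef", and a reference to the original "def" becomes a reference to "abcdef"+3. Think of pointers in C.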

If flags contains the G symbol then the type argument must be present along with an additional field like this:
  If the flags contain G, the type argument must be present along with an additional field, as follows:

.section name , "flags"G, @type, GroupName[, linkage]

The GroupName field specifies the name of the section group to which this particular section belongs.
  The GroupName field names the section group that this section (name) belongs to.
The optional linkage field can contain:

| linkage | Description |
|---|---|
| comdat | indicates that only one copy of this section should be retained |
| .gnu.linkonce | an alias for comdat |

Note: if both the M and G flags are present then the fields for the Merge flag should come first, like this:
  If both M and G are present, the field for the merge flag M (entsize) comes first, followed by the group fields:

.section name , "flags"MG, @type, entsize, GroupName[, linkage]

If flags contains the ? symbol then it may not also contain the G symbol and the GroupName or linkage fields should not be present. Instead, ? says to consider the section that’s current before this directive. If that section used G, then the new section will use G with those same GroupName and linkage fields implicitly. If not, then the ? symbol has no effect.

If no flags are specified, the default flags depend upon the section name. If the section name is not recognized, the default will be for the section to have none of the above flags: it will not be allocated in memory, nor writable, nor executable. The section will contain data.

For ELF targets, the assembler supports another type of .section directive for compatibility with the Solaris assembler:
  For ELF targets, the assembler also accepts the following form of .section, for compatibility with the Solaris assembler:

.section "name"[, flags...]

Note that the section name is quoted. There may be a sequence of comma separated flags:
  Note that the section name is quoted, and the flags are separated by commas:

| Flag | Description |
|---|---|
| #alloc | section is allocatable |
| #write | section is writable |
| #execinstr | section is executable |
| #exclude | section is excluded from executable and shared library. |
| #tls | section is used for thread local storage |

This directive replaces the current section and subsection. See the contents of the gas testsuite directory gas/testsuite/gas/elf for some examples of how this directive and the other section stack directives work.
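  Examples of this directive and the related section stack directives can be found in the gas/testsuite/gas/elf directory of the binutils sources (for example the binutils-2.31 package; any of the archive formats will do). After unpacking, look in that directory; the files relevant to this directive have names beginning with section. Download site: http://ftp.gnu.org/gnu/binutils/
  
  Coming back to line 2: the .section directive switches to a section named .rodata, the read-only data section. Now line 3.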


Line 3

.LC0:

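       We already know that a directive starts with a period followed by the directive name. The identifier on this line, however, ends with a colon, so it is clearly not an assembler directive. What is it, then, and what is it for?
  Before answering, let's first see what a symbol is.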
  
Symbol names begin with a letter or with one of ‘._’. On most machines, you can also use $ in symbol names. That character may be followed by any string of digits, letters, dollar signs (unless otherwise noted for a particular target machine), and underscores.
  
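  Rules for symbol names:

A name begins with a letter, "." or "_"; on most machines "$" may also be used in symbol names. The first character may be followed by any string of digits, letters, "$" and "_".
Letters in symbol names are case sensitive.
A symbol name cannot begin with a digit; local labels are the exception.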

Local Symbol Names
A local symbol is any symbol beginning with certain local label prefixes. By default, the local label prefix is ‘.L’ for ELF systems or ‘L’ for traditional a.out systems, but each target may have its own set of local label prefixes. On the HPPA local symbols begin with ‘L$’.
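  A local symbol is any symbol that begins with one of certain local label prefixes. On ELF systems the prefix is ".L"; on traditional a.out systems it is "L"; each target may have its own set of local label prefixes.
  So far we know that ".LC0" is a symbol and that its ".L" is the local label prefix, so ".LC0" is a local symbol.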

Local symbols are defined and used within the assembler, but they are normally not saved in object files. Thus, they are not visible when debugging. You may use the ‘-L’ option to retain the local symbols in the object files. This option(‘-L’) tells as to retain those local symbols in the object file. Usually if you do this you also tell the linker ld to preserve those symbols.
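  Local symbols are not visible when debugging (nor are macros), but the ‘-L’ option of as (the assembler) makes it keep local symbols in the object file's symbol table. That option does nothing for macros.
  
  We now know that ".LC0" is a local symbol, but it is followed by a colon ":". Read on.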
Labels
A label is written as a symbol immediately followed by a colon ‘:’. The symbol then represents the current value of the active location counter, and is, for example, a suitable instruction operand. You are warned if you use the same symbol to represent two different locations: the first definition overrides any other definitions.

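A label is a symbol immediately followed by a colon (:).
The symbol (not the label as a whole) then represents the current value of the active location counter, that is, an address;
the symbol can be used as an instruction operand, that is, it can appear in instructions;
if the same symbol is used to represent two different locations, the assembler warns, and the first definition overrides any other definitions.
Once the assembler has processed the program, symbols are in effect replaced by the addresses they stand for. This is much like preprocessing, after which every macro used in the source has been replaced.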

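So line 3 defines a label named .LC0, located in the .rodata section, whose value is the current value of the location counter. On to line 4. [Does this feel familiar? Think of goto labels in C.]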


Line 4

.string    "hello world!"

       .string is an assembler directive. Its forms are:

.string "str", .string8 "str", .string16 "str", .string32 "str", .string64 "str"

Copy the characters in str to the object file. You may specify more than one string to copy, separated by commas. Unless otherwise specified for a particular machine, the assembler marks the end of each string with a 0 byte. You can use any of the escape sequences described in Strings.

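Copies the characters in str to the object file;
several strings may be given, separated by commas;
unless specified otherwise for a particular machine, the assembler ends each string with a 0 byte (a C-style string).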

The variants string16, string32 and string64 differ from the string pseudo opcode in that each 8-bit character from str is copied and expanded to 16, 32 or 64 bits respectively. The expanded characters are stored in target endianness byte order.
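  The variants string16, string32 and string64 differ from the string pseudo-op in that each 8-bit character of str is copied and expanded to 16, 32 or 64 bits respectively. The expanded characters are stored in the target's byte order.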
Example:

	.string32 "BYE"

expands to:

	.string   "B\0\0\0Y\0\0\0E\0\0\0"  /* On little endian targets.  */
	.string   "\0\0\0B\0\0\0Y\0\0\0E"  /* On big endian targets.  */

So this line defines the zero-terminated string "hello world!" in the .rodata section. Next, line 5.


Line 5

.text

       The syntax of the .text directive is:

.text subsection

Tells as to assemble the following statements onto the end of the text subsection numbered subsection, which is an absolute expression. If subsection is omitted, subsection number zero is used.
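  .text tells the assembler to append the statements that follow to the end of the text subsection numbered subsection. For subsections, see the earlier discussion.

       This line therefore tells the assembler to assemble what follows into the text section; since the subsection number is omitted, subsection 0 is used. Now line 6.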


Line 6

.globl    main

       The syntax of .globl is:

.global symbol, .globl symbol

.global makes the symbol visible to ld. If you define symbol in your partial program, its value is made available to other partial programs that are linked with it. Otherwise, symbol takes its attributes from a symbol of the same name from another file linked into the same program.
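  The .global/.globl directive makes symbol visible to the linker ld. It tells the assembler that the linker will need symbol, which is then marked as a global symbol in the object file's symbol table; that is, the Bind value of symbol in the symbol table is GLOBAL.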
Both spellings (‘.globl’ and ‘.global’) are accepted, for compatibility with other assemblers.
On the HPPA, .global is not always enough to make it accessible to other partial programs. You may need the HPPA-only .EXPORT directive as well.

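       So this line marks main as a global symbol, recorded in the object file's symbol table for use by the linker. main itself is defined by the label on line 8 and lives in the .text section. Next, line 7.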


Line 7

.type  main, @function

This directive is used to set the type of a symbol.
  This directive sets the type of a symbol.
COFF Version
For COFF targets, this directive is permitted only within .def/.endef pairs. It is used like this:

.type int

This records the integer int as the type attribute of a symbol table entry.

ELF Version
For ELF targets, the .type directive is used like this:

.type name , type description

This sets the type of symbol name to be either a function symbol or an object symbol. There are five different syntaxes supported for the type description field, in order to provide compatibility with various other assemblers.
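  It sets the type of the symbol name to a function symbol or an object symbol. For compatibility with other assemblers, five different syntaxes are supported.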

Because some of the characters used in these syntaxes (such as ‘@’ and ‘#’) are comment characters for some architectures, some of the syntaxes below do not work on all architectures. The first variant will be accepted by the GNU assembler on all architectures so that variant should be used for maximum portability, if you do not need to assemble your code with other assemblers.
  For maximum portability, the first variant is recommended: GAS accepts it on all architectures.

The syntaxes supported are:

  .type <name> STT_<TYPE_IN_UPPER_CASE>
  .type <name>,#<type>
  .type <name>,@<type>
  .type <name>,%<type>
  .type <name>,"<type>"

The types supported are:

| STT_ name | Type | Description |
|---|---|---|
| STT_FUNC | function | Mark the symbol as being a function name. |
| STT_GNU_IFUNC | gnu_indirect_function | Mark the symbol as an indirect function when evaluated during reloc processing. (This is only supported on assemblers targeting GNU systems). |
| STT_OBJECT | object | Mark the symbol as being a data object. |
| STT_TLS | tls_object | Mark the symbol as being a thread-local data object. |
| STT_COMMON | common | Mark the symbol as being a common data object. |
| STT_NOTYPE | notype | Does not mark the symbol in any way. It is supported just for completeness. |
| | gnu_unique_object | Marks the symbol as being a globally unique data object. The dynamic linker will make sure that in the entire process there is just one symbol with this name and type in use. (This is only supported on assemblers targeting GNU systems). |

Note: Some targets support extra types in addition to those listed above.
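  Some targets support further types beyond those listed above.
  
  The meaning of this line is now clear: it marks main as being of function type, that is, main is a function name. On to line 8.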


Line 8

main:

       From the discussion above, main is a label; nothing more to add. Now line 9.


Line 9

.LFB0:

       This is a label. FB stands for "function begin". The number 0 is arbitrary: the compiler generates unique label names based on its own implementation details.


Line 10

.cfi_startproc

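       Directives beginning with .cfi are CFI (Call Frame Information) directives; they help the assembler generate call frame (stack frame) information. The as manual documents roughly 25 of them.
  The syntax of .cfi_startproc is: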

.cfi_startproc [simple]

.cfi_startproc is used at the beginning of each function that should have an entry in .eh_frame. It initializes some internal data structures. Don’t forget to close the function by .cfi_endproc.
  .cfi_startproc is used at the beginning of each function. It initializes some internal data structures and must be paired with .cfi_endproc.
Unless .cfi_startproc is used along with parameter simple it also emits some architecture dependent initial CFI instructions.

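       Every function call creates a stack frame. In theory a debugger or exception handler could walk the frames of the call chain using the frame pointer (or base pointer, normally held in ebp on 32-bit CPUs and rbp on 64-bit CPUs). GCC's optimizations, however, can make such unwinding difficult or impossible, so the CFI directives have the assembler record frame information in the object file's ".eh_frame" section, where compiler optimizations cannot invalidate it.
       eh_frame is the GCC exception frame; eh stands for exception handling.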


Line 11

pushq   %rbp

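       push is a machine instruction. The q suffix gives the operand size: a quadword, 8 bytes. rbp is a 64-bit register; in AT&T syntax, register names are prefixed with %.
  The instruction pushes the base pointer held in rbp onto the stack, saving the caller's state so that it can be restored when the call completes.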


Line 12

.cfi_def_cfa_offset 16

The syntax of the .cfi_def_cfa_offset directive is:

.cfi_def_cfa_offset offset

.cfi_def_cfa_offset modifies a rule for computing CFA. Register remains the same, but offset is new. Note that it is the absolute offset that will be added to a defined register to compute CFA address.
As you can see, this directive is used for computing the CFA.
  
CFA(Canonical Frame Address)
An area of memory that is allocated on a stack called a "call frame." The call frame is identified by an address on the stack. We refer to this address as the Canonical Frame Address or CFA. Typically, the CFA is defined to be the value of the stack pointer at the call site in the previous frame (which may be different from its value on entry to the current frame). [See DWARF, section 6.4]


Line 13

.cfi_offset 6, -16
.cfi_offset register, offset

Previous value of register is saved at offset offset from CFA.
The previous value of register 6 (DWARF register 6 is %rbp on x86-64) is saved at offset -16 from the CFA.


Line 14

movq    %rsp, %rbp

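       mov is a machine instruction that transfers data. Here it copies the contents of register rsp into rbp, so rbp now holds the stack base address of the current function. Together with line 11, this is the familiar function prologue.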


Line 15

.cfi_def_cfa_register 6
.cfi_def_cfa_register register

.cfi_def_cfa_register modifies a rule for computing CFA. From now on register will be used instead of the old one. Offset remains the same.


Line 16

movl    $.LC0, %edi

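       From the study of line 3 we know that the symbol .LC0 stands for the address of the string "hello world!"; here it is used as an immediate operand. So this instruction moves the address of "hello world!" into register edi, where it serves as the argument of the call that follows (printf in the source, compiled to puts here).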


Line 17

call    puts

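       Calls the puts function. The source calls printf, but because the format string contains no conversion specifiers and ends in a newline, GCC replaces the call with puts, which appends the newline itself. That is also why the string on line 4 has no trailing \n.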


Line 18

movl    $0, %eax

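       Puts the return value 0 into register eax. Why eax? That is simply the convention: under the System V AMD64 ABI used on Linux, integer return values are passed in rax, and eax is its lower 32 bits.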


Line 19

popq    %rbp

       Pops the value on top of the stack back into rbp, restoring the caller's state.


Line 20

.cfi_def_cfa 7, 8
.cfi_def_cfa register, offset

.cfi_def_cfa defines a rule for computing CFA as: take address from register and add offset to it.


Line 21

ret

       Returns from the function.


Line 22

.cfi_endproc

.cfi_endproc is used at the end of a function where it closes its unwind entry previously opened by .cfi_startproc, and emits it to .eh_frame.
       See line 10.


Line 23

.LFE0:

       This is a label. FE stands for "function end".
       See line 9.


Line 24

.size   main, .-main

This directive is used to set the size associated with a symbol.
It too has two versions, COFF and ELF. We care about the ELF version here, whose syntax is:

.size name , expression

This directive sets the size associated with a symbol name. The size in bytes is computed from expression which can make use of label arithmetic. This directive is typically used to set the size of function symbols.

       It is typically used to set the size of function symbols. On this line, expression is ".-main".
       We have already seen what a symbol is; as a refresher:
A symbol is one or more characters chosen from the set of all letters (both upper and lower case), digits and the three characters _.$.

       There is also a special symbol, dot (.):
The special symbol . refers to the current address that as is assembling into. Thus, the expression melvin: .long . defines melvin to contain its own address. Assigning a value to . is treated the same as a .org directive. Thus, the expression .=.+4 is the same as saying .space 4.

       So this directive sets the size of the function main to the current address minus the address that the symbol main stands for.


Line 25

.ident  "GCC: (GNU) 4.8.5 20150623 (Red Hat 4.8.5-16)"

This directive is used by some assemblers to place tags in object files. The behavior of this directive varies depending on the target. When using the a.out object file format, as simply accepts the directive for source-file compatibility with existing assemblers, but does not emit anything for it. When using COFF, comments are emitted to the .comment or .rdata section, depending on the target. When using ELF, comments are emitted to the .comment section.

Let's try an experiment:

--> gcc -o hello hello.c
--> strings hello | grep GCC
--> GCC: (GNU) 4.8.5 20150623 (Red Hat 4.8.5-16)

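As you can see, the tag really is present in the generated executable. Hmm, it seems I could use this to tuck a few little secrets into a program 😛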


Line 26

.section    .note.GNU-stack,"",@progbits

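See the discussion of line 2 above.

As an experiment, let's check whether the object file contains a section named .note.GNU-stack.
Assemble only, without linking:
[Figure: screenshot not preserved]

Now build the executable and look again:
[Figure: screenshot not preserved]

With some of the information stripped, the output is easier to read:
[Figure: screenshot not preserved]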
