From C++ Source Code to an Executable File

nilbooy

Article link: https://blog.csdn.net/nilbooy/article/details/51377797

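A C++ source file is translated in 9 phases, as defined by the C++ standard. In a typical toolchain the compiler carries out the earlier phases and produces an object file, and the final phase links the object files into an executable.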

Phase 1

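The source file is read into memory, and each of its bytes is mapped to the "basic source character set". In addition, operating-system-specific end-of-line sequences are replaced with the standard newline character. The basic source character set contains 96 characters: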

 a) 5 whitespace characters (space, horizontal tab, vertical tab, form feed, new-line)
 b) 10 digit characters from '0' to '9'
 c) 52 letters from 'a' to 'z' and from 'A' to 'Z'
 d) 29 punctuation characters: _ { } [ ] # ( ) < > % : ; . ? * + - / ^ & | ~ ! = , \ " '

Any character in the source file that cannot be mapped to the basic source character set is replaced by its universal character name (\u or \U).
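The count of 96 can be checked mechanically. The following is a minimal sketch, assuming a C++17 compiler; the names `kWhitespace`, `kDigits`, `kLetters`, `kPunctuation`, and `is_basic_source_char` are made up for this illustration and are not part of any library.

```cpp
#include <string_view>

// The four groups listed above, written out as string literals.
constexpr std::string_view kWhitespace  = " \t\v\f\n";
constexpr std::string_view kDigits      = "0123456789";
constexpr std::string_view kLetters     =
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
constexpr std::string_view kPunctuation = "_{}[]#()<>%:;.?*+-/^&|~!=,\\\"'";

// Compile-time checks that the group sizes match the list: 5 + 10 + 52 + 29 = 96.
static_assert(kWhitespace.size() == 5, "5 whitespace characters");
static_assert(kDigits.size() == 10, "10 digits");
static_assert(kLetters.size() == 52, "52 letters");
static_assert(kPunctuation.size() == 29, "29 punctuation characters");
static_assert(kWhitespace.size() + kDigits.size() +
              kLetters.size() + kPunctuation.size() == 96, "96 in total");

// Hypothetical helper: true if c belongs to one of the four groups.
constexpr bool is_basic_source_char(char c) {
    return kWhitespace.find(c)  != std::string_view::npos ||
           kDigits.find(c)      != std::string_view::npos ||
           kLetters.find(c)     != std::string_view::npos ||
           kPunctuation.find(c) != std::string_view::npos;
}
```

With these definitions, `is_basic_source_char('a')` is true, while characters such as `@`, `$`, and the backtick fall outside the set.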

Phase 2

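Whenever a backslash is immediately followed by a newline, both the backslash and the newline are deleted, which joins two physical lines into one logical line. This is a single pass: if a line ends in two backslashes and is followed by an empty line, only the backslash directly before the newline is removed. The remaining backslash is not examined again, so the three lines are not merged into one.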
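The single-pass behavior can be imitated with a small function. This is only a sketch of the rule described above, not how any particular compiler implements it; `splice_lines` is a hypothetical name chosen for the example.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper that imitates Phase 2: in one left-to-right pass,
// drop every backslash that is immediately followed by a newline,
// together with that newline. Output already produced is never re-scanned.
std::string splice_lines(const std::string& src) {
    std::string out;
    out.reserve(src.size());
    for (std::size_t i = 0; i < src.size(); ++i) {
        if (src[i] == '\\' && i + 1 < src.size() && src[i + 1] == '\n') {
            ++i;  // skip the newline as well as the backslash
            continue;
        }
        out.push_back(src[i]);
    }
    return out;
}
```

For the input `a`, backslash, backslash, newline, newline, `b`, the function removes the second backslash and the first newline. The result is `a`, backslash, newline, `b`: the surviving backslash now sits before a newline, but it is not spliced again.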

Phase 3

The source file is decomposed into comments, sequences of whitespace characters, and several kinds of preprocessing tokens. The preprocessing tokens are:

 a) header names such as <iostream> or "myfile.h" (only recognized after #include)
 b) identifiers
 c) preprocessing numbers
 d) character and string literals, including user-defined literals (since C++11)
 e) operators and punctuators (including alternative tokens), such as +, <<=, new, <%, ##, or and
 f) individual non-whitespace characters that do not fit in any other category
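A few of the less familiar tokens from item (e) can be seen in a short example. This is a sketch, assuming a conforming compiler (some compilers may need a standards-conformance flag before they accept `and`); the names `in_range`, `MAKE_NAME`, and `answer_value` are invented for this illustration.

```cpp
// Alternative tokens: <% and %> stand for { and },
// and the keyword-like token `and` stands for &&.
bool in_range(int x, int lo, int hi) <%
    return x >= lo and x <= hi;
%>

// ## is a preprocessing operator that pastes two tokens into one identifier.
#define MAKE_NAME(prefix, suffix) prefix##suffix

// After macro expansion this declares a function named answer_value.
int MAKE_NAME(answer_, value)() { return 42; }
```

Here `in_range(5, 1, 10)` behaves exactly as if the function had been written with `{`, `}`, and `&&`, and `answer_value()` can be called like any other function.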