Phases of translation

The C++ source file is processed by the compiler as if the following phases take place, in this exact order:

Phase 1 -- the 96-character basic source character set

  1. The individual bytes of the source code file are mapped (in an implementation-defined manner) to the characters of the basic source character set. In particular, OS-dependent end-of-line indicators are replaced by newline characters. The basic source character set consists of 96 characters:
    a) 5 whitespace characters (space, horizontal tab, vertical tab, form feed, new-line)
    b) 10 digit characters from 0 to 9
    c) 52 letters from a to z and from A to Z
    d) 29 punctuation characters: _ { } [ ] # ( ) < > % : ; . ? * + - / ^ & | ~ ! = , \ " '
  2. Any source file character that cannot be mapped to a character in the basic source character set is replaced by its universal character name (escaped with \u or \U) or by some internal form that is handled equivalently.
  3. Trigraph sequences are replaced by the corresponding single-character representations. (until C++17)
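
The 96-character set listed above can be written out as a lookup table. The following is only a sketch: the helper names basic_source_characters and in_basic_source_set are hypothetical, and the string simply repeats the four groups from the list.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: the 96 characters of the basic source character set,
// grouped exactly as in the list above.
inline const std::string& basic_source_characters() {
    static const std::string chars =
        " \t\v\f\n"                        // 5 whitespace characters
        "0123456789"                       // 10 digits
        "abcdefghijklmnopqrstuvwxyz"       // 26 lowercase letters
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"       // 26 uppercase letters
        "_{}[]#()<>%:;.?*+-/^&|~!=,\\\"'"; // 29 punctuation characters
    return chars;
}

// Hypothetical helper: true if c belongs to the set above.
inline bool in_basic_source_set(char c) {
    return basic_source_characters().find(c) != std::string::npos;
}

inline void check_basic_source_set() {
    assert(basic_source_characters().size() == 96);
    assert(in_basic_source_set('~'));
    assert(!in_basic_source_set('@'));  // not among the 96 characters
    assert(!in_basic_source_set('$'));
}
```

A character such as @ or $ is outside this set, so in source text outside of literals it would fall under step 2.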

Phase 2 -- backslash (line splicing)

  1. Whenever backslash appears at the end of a line (immediately followed by the newline character), both backslash and newline are deleted, combining two physical source lines into one logical source line.
    This is a single-pass operation; a line ending in two backslashes followed by an empty line does not combine three lines into one. If a universal character name (\uXXXX) is formed in this phase, the behavior is undefined.
  2. If a non-empty source file does not end with a newline character after this step (whether it had no newline originally, or it ended with a backslash), the behavior is undefined (until C++11); a terminating newline character is added (since C++11).
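
A short sketch of line splicing in practice. The names SUM3, spliced_identifier and spliced_literal are made up for illustration, and the example assumes that nothing follows each backslash on its line.

```cpp
#include <cassert>
#include <cstring>

// A macro definition continued over three physical lines.
#define SUM3(a, b, c) \
    ((a) +            \
     (b) + (c))

// The backslash-newline pair is removed even in the middle of an identifier.
inline int spliced_identifier() {
    int total = 41;
    return to\
tal + 1;
}

// Inside an ordinary string literal the splice joins the two halves; the
// continuation starts in column 0 so that no extra spaces enter the literal.
inline const char* spliced_literal() {
    return "abc\
def";
}

inline void check_line_splicing() {
    assert(SUM3(1, 2, 3) == 6);
    assert(spliced_identifier() == 42);
    assert(std::strcmp(spliced_literal(), "abcdef") == 0);
}
```

Splitting an identifier this way is legal but only useful as a demonstration; multi-line macros are the common use.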

Phase 3 -- header names, identifiers, numbers, character and string literals

  1. The source file is decomposed into comments, sequences of whitespace characters (space, horizontal tab, new-line, vertical tab, and form-feed), and preprocessing tokens, which are the following:
  • header names such as <iostream> or "myfile.h" (only recognized after #include)
  • identifiers
  • numbers
  • character and string literals
  • operators and punctuators (including alternative tokens), such as +, <<=, new, <%, ##, or and
  • individual non-whitespace characters that do not fit in any other category
  2. Any transformations performed during phases 1 and 2 between the initial and the final double quote of any raw string literal are reverted. (since C++11)
  3. Each comment is replaced by one space character.

Newlines are kept, and it's unspecified whether non-newline whitespace sequences may be collapsed into single space characters.
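
Three effects of this phase can be observed from ordinary code. In the sketch below (all function names are hypothetical), the tokenizer takes the longest possible token, a comment acts as a single space, and a raw string literal keeps a backslash-newline pair that phase 2 would otherwise have removed.

```cpp
#include <cassert>
#include <cstring>

// Longest-token rule: a+++b is tokenized as a ++ + b, never as a + ++b.
inline int maximal_munch() {
    int a = 1, b = 10;
    int sum = a+++b;      // (a++) + b == 11; a is 2 afterwards
    return sum * 10 + a;  // 112
}

// A comment is replaced by one space, so it separates the two tokens.
inline int comment_separates_tokens() {
    int/* becomes one space */value = 7;
    return value;
}

// In a raw string literal the phase 2 splice is reverted: the backslash and
// the newline stay in the literal (since C++11).
inline const char* raw_keeps_backslash_newline() {
    return R"(x\
y)";
}

inline void check_tokenization() {
    assert(maximal_munch() == 112);
    assert(comment_separates_tokens() == 7);
    const char* raw = raw_keeps_backslash_newline();
    assert(std::strlen(raw) == 4);
    assert(raw[1] == '\\' && raw[2] == '\n');
}
```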

Phase 4 -- preprocessor

  1. The preprocessor is executed.
  2. Each file introduced with the #include directive goes through phases 1 through 4, recursively.
  3. At the end of this phase, all preprocessor directives are removed from the source.
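
A minimal sketch of what the preprocessor does in this phase: macro expansion, stringizing, and conditional inclusion. The macro and function names are invented for the example.

```cpp
#include <cassert>
#include <cstring>

#define BUFFER_SIZE 16         // object-like macro
#define SQUARE(x) ((x) * (x))  // function-like macro
#define STR(x) #x              // stringizes the argument as written
#define XSTR(x) STR(x)         // expands the argument first, then stringizes

// Only one of the two definitions survives this phase.
#if BUFFER_SIZE > 8
inline const char* buffer_kind() { return "large"; }
#else
inline const char* buffer_kind() { return "small"; }
#endif

inline void check_preprocessing() {
    assert(SQUARE(1 + 2) == 9);  // expands to ((1 + 2) * (1 + 2))
    assert(std::strcmp(STR(BUFFER_SIZE), "BUFFER_SIZE") == 0);
    assert(std::strcmp(XSTR(BUFFER_SIZE), "16") == 0);
    assert(std::strcmp(buffer_kind(), "large") == 0);
}
```

The two-level XSTR/STR pair is a common idiom: the outer macro lets BUFFER_SIZE expand before the # operator is applied.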

Phase 5 -- character and string literals

  1. All characters in character literals and string literals are converted from the source character set to the execution character set (which may be a multibyte character set such as UTF-8, as long as the 96 characters of the basic source character set listed in phase 1 have single-byte representations).
  2. Escape sequences and universal character names in character literals and non-raw string literals are expanded and converted to the execution character set. If the character specified by a universal character name isn't a member of the execution character set, the result is implementation-defined, but is guaranteed not to be a null (wide) character.
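
The result of this conversion can be checked at compile time by counting code units in literals. The sketch below uses a hypothetical helper, code_units, and relies on the u8, u and U prefixes (since C++11), whose encodings are fixed as UTF-8, UTF-16 and UTF-32. The encoding of unprefixed literals remains implementation-defined.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of code units in a string literal, excluding
// the terminating null.
template <typename Char, std::size_t N>
constexpr std::size_t code_units(const Char (&)[N]) { return N - 1; }

static_assert(code_units("\n") == 1, "one escape sequence, one character");
static_assert(code_units("\\n") == 2, "a backslash followed by n");
static_assert(code_units(u8"\u00E9") == 2, "U+00E9 takes two UTF-8 code units");
static_assert(code_units(u"\u00E9") == 1, "one UTF-16 code unit");
static_assert(code_units(u"\U0001F600") == 2, "a UTF-16 surrogate pair");
static_assert(code_units(U"\U0001F600") == 1, "one UTF-32 code unit");
static_assert('\x41' == 65, "a hex escape denotes the value 0x41");

inline std::size_t utf8_length_of_e_acute() { return code_units(u8"\u00E9"); }

inline void check_literal_conversion() {
    assert(utf8_length_of_e_acute() == 2);
}
```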

Phase 6

Adjacent string literals are concatenated.
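
A small sketch of concatenation, which also shows why the ordering of phases 5 and 6 matters: escape sequences are already converted by the time the literals are joined. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstring>

inline const char* greeting() {
    return "Hello, "  // adjacent literals, even on separate lines,
           "world";   // become the single literal "Hello, world"
}

// Escape sequences were already converted in phase 5, so "\x12" "3" is the
// character 0x12 followed by '3', not the single escape \x123.
static_assert(sizeof("\x12" "3") == 3, "two characters plus the terminating null");

inline void check_concatenation() {
    assert(std::strcmp(greeting(), "Hello, world") == 0);
    const char* s = "\x12" "3";
    assert(s[0] == '\x12' && s[1] == '3' && s[2] == '\0');
}
```

Splitting a literal after a hex escape in this way is a common trick to stop the escape from absorbing the following hex digits.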

Phase 7 -- translated as a translation unit

Compilation takes place: the tokens are syntactically and semantically analyzed and translated as a translation unit.

Phase 8 -- instantiation unit

Each translation unit is examined to produce a list of required template instantiations, including the ones requested by explicit instantiations. The definitions of the templates are located, and the required instantiations are performed to produce instantiation units.
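
The two ways an instantiation becomes "required" can be sketched as follows. The template twice and the surrounding functions are invented for the example.

```cpp
#include <cassert>

template <typename T>
T twice(T value) { return value + value; }

// Explicit instantiation definition: twice<int> is instantiated here even if
// nothing in this translation unit calls it.
template int twice<int>(int);

// Implicit instantiation: the call below requires twice<double>.
inline double twice_one_and_a_half() { return twice(1.5); }

inline void check_instantiations() {
    assert(twice(21) == 42);
    assert(twice_one_and_a_half() == 3.0);
}
```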

Phase 9

Translation units, instantiation units, and library components needed to satisfy external references are collected into a program image which contains information needed for execution in its execution environment.

Some compilers don't implement instantiation units (also known as template repositories or template registries). They simply compile each template instantiation in phase 7, storing the code in the object file where it is implicitly or explicitly requested, and the linker then collapses these compiled instantiations into one in phase 9.

Reposted from: https://www.cnblogs.com/Wojoin/p/5211165.html
