Quoted-Printable encoding rules (RFC 1341):
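1. Any octet, except those that mark a line break, may be represented as =XX, where XX is
the octet's value written as two hexadecimal digits from 0-9 and A-F (uppercase letters).
Unless one of the following rules allows an alternative encoding, this rule is mandatory.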
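2. Octets with decimal values 33-60 and 62-126 (that is, every printable ASCII character
except '=') may be represented as the corresponding ASCII characters, without conversion.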
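3. Octets with decimal values 9 and 32 (TAB and SPACE) may also be left unconverted. This is
optional, so they may be encoded as well. They must not, however, appear literally at the end
of an encoded line. There they must be encoded according to rule 1.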
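4. In text, a line break must be represented in the encoded output as an RFC 822 line break
(CRLF), regardless of how the local system represents it. If CR LF or LF CR sequences,
or isolated CR or LF characters, occur in binary data, they must be encoded as
"=0D=0A", "=0A=0D", "=0D", and "=0A" respectively.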
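5. (Soft line breaks) Quoted-Printable requires that encoded lines be no more than
76 characters long. Longer lines must be split with soft line breaks. An '=' as the
last character of an encoded line marks a non-significant (soft) line break. For example,
the unencoded line: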
Now's the time for all folk to come to the aid of their country.
can be represented in Quoted-Printable as:
Now's the time =
for all folk to come=
 to the aid of their country.
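This provides a mechanism for encoding long lines so that the user agent can restore the
original input. The 76-character limit does not count the trailing CRLF. It does count
every other character on the line, including any '=' signs.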
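The five rules above can be sketched as a small encoder. This is a minimal, illustrative Python sketch, not a reference implementation. It assumes the input is a single unencoded line whose line terminator has already been removed, so rule 4 is left to the caller. The names `encode_line`, `encode_octet`, and `MAX_LINE` are hypothetical.

```python
MAX_LINE = 76  # rule 5: encoded lines are at most 76 characters, CRLF not counted


def encode_octet(octet: int) -> str:
    """Rule 1: '=' followed by the octet value as two UPPERCASE hex digits."""
    return f"={octet:02X}"


def encode_line(raw: bytes) -> list[str]:
    """Encode one unencoded line (terminator already stripped) into encoded lines.

    Every encoded line except the last ends with a soft line break '='.
    """
    tokens = []
    for i, octet in enumerate(raw):
        at_end = i == len(raw) - 1
        if 33 <= octet <= 126 and octet != 61:
            tokens.append(chr(octet))           # rule 2: printable ASCII except '='
        elif octet in (9, 32) and not at_end:
            tokens.append(chr(octet))           # rule 3: TAB/SPACE, but never at line end
        else:
            tokens.append(encode_octet(octet))  # rule 1: everything else as =XX

    # Rule 5: split with soft line breaks, reserving one column for the '='
    # and never splitting an =XX token across two encoded lines.
    lines, current = [], ""
    for tok in tokens:
        if len(current) + len(tok) > MAX_LINE - 1:
            lines.append(current + "=")
            current = ""
        current += tok
    lines.append(current)
    return lines
```

The sketch may place soft breaks at different positions than the example above. Where they go is up to the encoder, as long as every encoded line stays within 76 characters and no =XX escape is split.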
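Because the hyphen '-' represents itself in Quoted-Printable, care is needed when a
quoted-printable body is placed inside a multipart entity. The encapsulation boundary must
never appear anywhere in the encoded body. (A good strategy is to choose a boundary that
contains a sequence such as "=_", which can never occur in a quoted-printable body. See the
definition of multipart messages in RFC 1341 for details.)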
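The following is a short sketch of why a boundary containing "=_" is safe. In a well-formed Quoted-Printable body, '=' only ever starts a two-digit uppercase hex escape (rule 1) or marks a soft line break at the end of a line (rule 5), and '_' is neither. The helpers `is_well_formed_qp_line` and `make_boundary` are hypothetical. The random suffix from Python's `secrets.token_hex` is just one way to make accidental matches with ordinary text unlikely.

```python
import re
import secrets

# In well-formed Quoted-Printable, '=' is only ever followed by two uppercase
# hex digits (rule 1) or by the end of the line (a soft line break, rule 5).
_STRAY_EQUALS = re.compile(r"=(?![0-9A-F]{2}|$)")


def is_well_formed_qp_line(line: str) -> bool:
    """True if every '=' in one encoded line (no CRLF) is a valid escape or soft break."""
    return _STRAY_EQUALS.search(line) is None


def make_boundary() -> str:
    """Hypothetical helper: a random multipart boundary that starts with '=_'.

    '_' is not a hex digit, so '=_' cannot occur in a well-formed
    Quoted-Printable body, and this boundary cannot collide with it.
    """
    return "=_" + secrets.token_hex(12)
```

Since '=' is one of the MIME tspecials, such a boundary has to be written as a quoted string in the Content-Type header, for example `boundary="=_a1b2c3"`.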
Note: Quoted-Printable is a compromise between readability and reliability in mail
transport. Bodies encoded with it work reliably through most mail gateways. They may not
work perfectly through a few, notably gateways that translate into EBCDIC. (In theory, an
EBCDIC gateway could decode a Quoted-Printable body and re-encode it with Base64, but no
such gateways exist yet.)
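Base64 offers a higher level of confidence. A reasonably reliable way to get through EBCDIC
gateways is to also quote the following ASCII characters according to rule 1:
!"#$@[\]^`{|}~
See Appendix B of RFC 1341 for more information.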
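The EBCDIC-friendly variant widens the set of octets that rule 1 must quote. The following is a minimal Python sketch with a hypothetical `ebcdic_safe` flag. It handles only per-octet quoting. TAB and SPACE are always escaped here, and rule 5 line splitting is omitted.

```python
# Printable ASCII characters that RFC 1341 suggests also quoting (per rule 1)
# for more reliable transport through EBCDIC gateways.
EBCDIC_RISKY = frozenset(b'!"#$@[\\]^`{|}~')


def quote_octet(octet: int, ebcdic_safe: bool = False) -> str:
    literal_ok = 33 <= octet <= 126 and octet != 61  # rule 2: printable ASCII except '='
    if literal_ok and not (ebcdic_safe and octet in EBCDIC_RISKY):
        return chr(octet)
    return f"={octet:02X}"  # rule 1: uppercase hex escape


def quote(data: bytes, ebcdic_safe: bool = False) -> str:
    """Quote every octet; TAB/SPACE are always escaped and no soft breaks are inserted."""
    return "".join(quote_octet(o, ebcdic_safe) for o in data)
```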
Because Quoted-Printable data is generally assumed to be line-oriented, the line breaks in
it may be altered in transport. Plain text mail has always been altered in the same way when
passing between systems with different newline conventions (translator's note: Unix,
Windows, and Mac systems use different newline characters). If such alterations are likely
to corrupt the data, Base64 is the more sensible choice than Quoted-Printable.
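To make rule 4 and the point above concrete, here is a minimal Python sketch. The names `encode_as_text`, `encode_as_binary`, and `_quote` are hypothetical. In text mode, every local newline (CR LF, LF, or CR) becomes a CRLF hard break, so the original convention is not preserved. In binary mode, CR and LF are ordinary octets written as =0D and =0A. Soft line breaks (rule 5) are omitted to keep it short.

```python
def _quote(octets: bytes) -> str:
    """Rules 1-3 for one line: literal printables, TAB/SPACE except at line end, else =XX."""
    out = []
    for i, o in enumerate(octets):
        if (33 <= o <= 126 and o != 61) or (o in (9, 32) and i < len(octets) - 1):
            out.append(chr(o))
        else:
            out.append(f"={o:02X}")
    return "".join(out)


def encode_as_text(data: bytes) -> str:
    # Text: any local newline is a line break and is emitted as CRLF (rule 4).
    lines = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n").split(b"\n")
    return "\r\n".join(_quote(line) for line in lines)


def encode_as_binary(data: bytes) -> str:
    # Binary: CR and LF are ordinary octets and are written as =0D / =0A.
    return _quote(data)
```

Both `b"a\nb"` and `b"a\r\nb"` produce the same text-mode output. This is the kind of alteration that makes Base64 the safer choice when the exact line endings are part of the data.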
5.1 Quoted-Printable Content-Transfer-Encoding
The Quoted-Printable encoding is intended to represent data that largely consists of octets that correspond to printable characters in the ASCII character set. It encodes the data in such a way that the resulting octets are unlikely to be modified by mail transport. If the data being encoded are mostly ASCII text, the encoded form of the data remains largely recognizable by humans. A body which is entirely ASCII may also be encoded in Quoted-Printable to ensure the integrity of the data should the message pass through a character-translating, and/or line-wrapping gateway.
In this encoding, octets are to be represented as determined by the following rules:
Rule #1: (General 8-bit representation) Any octet, except those indicating a line break according to the newline convention of the canonical form of the data being encoded, may be represented by an "=" followed by a two digit hexadecimal representation of the octet's value. The digits of the hexadecimal alphabet, for this purpose, are "0123456789ABCDEF". Uppercase letters must be used when sending hexadecimal data, though a robust implementation may choose to recognize lowercase letters on receipt. Thus, for example, the value 12 (ASCII form feed) can be represented by "=0C", and the value 61 (ASCII EQUAL SIGN) can be represented by "=3D". Except when the following rules allow an alternative encoding, this rule is mandatory.
Rule #2: (Literal representation) Octets with decimal values of 33 through 60 inclusive, and 62 through 126, inclusive, MAY be represented as the ASCII characters which correspond to those octets (EXCLAMATION POINT through LESS THAN, and GREATER THAN through TILDE, respectively).
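Rule #1 requires uppercase hex when sending and allows lowercase hex to be tolerated on receipt. The following Python sketch illustrates that difference. The names `escape` and `unescape` are hypothetical, and `unescape` handles only =XX escapes, not soft line breaks or trailing white space.

```python
import re

# Receivers may be lenient about hex case (Rule #1), so accept a-f as well.
_ESCAPE = re.compile(rb"=([0-9A-Fa-f]{2})")


def escape(octet: int) -> str:
    """Sending side of Rule #1: uppercase hex digits only."""
    return f"={octet:02X}"


def unescape(encoded: bytes) -> bytes:
    """Receiving side: replace each =XX (either case) with the octet it names."""
    return _ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), encoded)
```

Sequences that are not a valid escape, such as "=G1", are left untouched by this sketch. A real decoder would have to decide how to report them.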
Rule #3: (White Space): Octets with values of 9 and 32 MAY be represented as ASCII TAB (HT) and SPACE characters, respectively, but MUST NOT be so represented at the end of an encoded line. Any TAB (HT) or SPACE characters on an encoded line MUST thus be followed on that line by a printable character. In particular, an "=" at the end of an encoded line, indicating a soft line break (see rule #5) may follow one or more TAB (HT) or SPACE characters. It follows that an octet with value 9 or 32 appearing at the end of an encoded line must be represented according to Rule #1. This rule is necessary because some MTAs (Message Transport Agents, programs which transport messages from one user to another, or perform a part of such transfers) are known to pad lines of text with SPACEs, and others are known to remove "white space" characters from the end of a line. Therefore, when decoding a Quoted-Printable body, any trailing white space on a line must be deleted, as it will necessarily have been added by intermediate transport agents.
Borenstein & Freed [Page 14]
RFC 1341 MIME: Multipurpose Internet Mail Extensions June 1992
Rule #4 (Line Breaks): A line break in a text body part, independent of what its representation is following the canonical representation of the data being encoded, must be represented by a (RFC 822) line break, which is a CRLF sequence, in the Quoted-Printable encoding. If isolated CRs and LFs, or LF CR and CR LF sequences are allowed to appear in binary data according to the canonical form, they must be represented using the "=0D", "=0A", "=0A=0D" and "=0D=0A" notations respectively. Note that many implementations may elect to encode the local representation of various content types directly. In particular, this may apply to plain text material on systems that use newline conventions other than CRLF delimiters.
Such an implementation is permissible, but the generation of line breaks must be generalized to account for the case where alternate representations of newline sequences are used.
Rule #5 (Soft Line Breaks): The Quoted-Printable encoding REQUIRES that encoded lines be no more than 76 characters long. If longer lines are to be encoded with the Quoted-Printable encoding, 'soft' line breaks must be used. An equal sign as the last character on an encoded line indicates such a non-significant ('soft') line break in the encoded text. Thus if the "raw" form of the line is a single unencoded line that says:
Now's the time for all folk to come to the aid of their country.
This can be represented, in the Quoted-Printable encoding, as
Now's the time =
for all folk to come=
 to the aid of their country.
This provides a mechanism with which long lines are encoded in such a way as to be restored by the user agent. The 76 character limit does not count the trailing CRLF, but counts all other characters, including any equal signs.
Since the hyphen character ("-") is represented as itself in the Quoted-Printable encoding, care must be taken, when encapsulating a quoted-printable encoded body in a multipart entity, to ensure that the encapsulation boundary does not appear anywhere in the encoded body. (A good strategy is to choose a boundary that includes a character sequence such as "=_" which can never appear in a quoted-printable body. See the definition of multipart messages later in this document.)
NOTE: The quoted-printable encoding represents something of a compromise between readability and reliability in transport. Bodies encoded with the quoted-printable encoding will work reliably over most mail gateways, but may not work perfectly over a few gateways, notably those involving translation into EBCDIC.
(In theory, an EBCDIC gateway could decode a quoted-printable body and re-encode it using base64, but such gateways do not yet exist.) A higher level of confidence is offered by the base64 Content-Transfer-Encoding. A way to get reasonably reliable transport through EBCDIC gateways is to also quote the ASCII characters
!"#$@[\]^`{|}~
according to rule #1. See Appendix B for more information.
Because quoted-printable data is generally assumed to be line-oriented, it is to be expected that the breaks between the lines of quoted printable data may be altered in transport, in the same manner that plain text mail has always been altered in Internet mail when passing between systems with differing newline conventions. If such alterations are likely to constitute a corruption of the data, it is probably more sensible to use the base64 encoding rather than the quoted-printable encoding.
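The decoding side of Rules #3 through #5 can be sketched as follows. This is an illustrative Python sketch under stated assumptions. The hypothetical `decode_qp` takes the encoded body as bytes with CRLF line breaks and does not handle bare LF left by local newline conversion. It deletes trailing white space on each line before anything else, as Rule #3 requires. It treats a final '=' as a soft line break and accepts lowercase hex on receipt.

```python
import re

_HEX = re.compile(rb"=([0-9A-Fa-f]{2})")


def decode_qp(body: bytes) -> bytes:
    out = bytearray()
    lines = body.split(b"\r\n")
    for i, line in enumerate(lines):
        line = line.rstrip(b" \t")  # Rule #3: trailing white space came from transport
        soft = line.endswith(b"=")
        if soft:
            line = line[:-1]         # Rule #5: soft line break, not part of the data
        out += _HEX.sub(lambda m: bytes([int(m.group(1), 16)]), line)  # Rule #1
        if not soft and i < len(lines) - 1:
            out += b"\r\n"           # Rule #4: a hard line break stays a line break
    return bytes(out)
```

The order of operations matters. If an MTA pads "come=" to "come=  " and the padding were not stripped first, the soft break would be misread as literal text followed by a hard line break.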