Partial Translation and Original Text of RFC 1952 (Repost)



The following covers only part of RFC 1952; see the original document for the rest.
2. Detailed specification

  2.1. Overall conventions

  In the diagrams below, a box like this represents one byte:
  +---+
  |   |
  +---+

  A box like this represents a variable number of bytes:
  +==============+
  |              |
  +==============+

 
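 Bytes stored in a computer have no "bit order", because a byte is always
treated as a single unit. However, when a byte is viewed as an integer between
0 and 255, it does have a most significant and a least significant bit. By
convention the most significant bit of a byte is written on the left, and the
most significant byte of a multi-byte value is written on the left. In the
diagrams, the bits of a byte are numbered so that bit 0 is the least
significant bit: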
  Bytes stored within a computer do not have a "bit order", since
  they are always treated as a unit.  However, a byte considered as
  an integer between 0 and 255 does have a most- and least-
  significant bit, and since we write numbers with the most-
  significant digit on the left, we also write bytes with the most-
  significant bit on the left.  In the diagrams below, we number the
  bits of a byte so that bit 0 is the least-significant bit, i.e.,
  the bits are numbered:

  +--------+
  |76543210|
  +--------+
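 This document does not cover bit-serial transmission, because the data
format described here is defined in terms of bytes rather than bits.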
  This document does not address the issue of the order in which
  bits of a byte are transmitted on a bit-sequential medium, since
  the data format described here is byte- rather than bit-oriented.

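 Within a computer, a number may occupy several bytes. All multi-byte numbers
described here store the less significant part in the byte at the lower
address; for example, 520 is stored as: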
  Within a computer, a number may occupy multiple bytes.  All
  multi-byte numbers in the format described here are stored with
  the least-significant byte first (at the lower memory address).
  For example, the decimal number 520 is stored as:

      0        1
  +--------+--------+
  |00001000|00000010|
  +--------+--------+
   ^        ^
   |        |
   |        + more significant byte = 2 x 256
   + less significant byte = 8
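
The byte order above can be checked with a short Python sketch. The use of the standard `struct` module and the variable names are illustrative choices, not something the RFC prescribes.

```python
import struct

# "<H" is a little-endian unsigned 16-bit integer: least significant byte first.
encoded = struct.pack("<H", 520)
print(encoded.hex())  # 0802

# Decoding by hand: byte 0 is the less significant byte, byte 1 counts in units of 256.
value = encoded[0] + encoded[1] * 256
print(value)  # 520
```

The same `<` (little-endian) format prefix applies to every multi-byte field in a gzip member, such as MTIME, XLEN, CRC32 and ISIZE.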

  2.2. File format
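 A gzip file consists of a series of consecutive members (compressed data
sets). The format of each member is described in the following section. The
members simply follow one another in the file, with no additional information.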
  A gzip file consists of a series of "members" (compressed data
  sets).  The format of each member is specified in the following
  section.  The members simply appear one after another in the file,
  with no additional information before, between, or after them.
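
As a small illustration of members appearing one after another, the sketch below concatenates two independently compressed members and decompresses the result. It relies on Python's standard `gzip` module, which is assumed here to read all members in sequence; the payloads are made up.

```python
import gzip

# Two independent members, each with its own header and trailer.
member_a = gzip.compress(b"hello, ")
member_b = gzip.compress(b"world")

# A valid gzip file is just the members back to back.
combined = member_a + member_b

# Decompression yields the concatenation of the members' data.
restored = gzip.decompress(combined)
print(restored)  # b'hello, world'
```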

  2.3. Member format
 Member format: each member has the following structure:
  Each member has the following structure:

  +---+---+---+---+---+---+---+---+---+---+
  |ID1|ID2|CM |FLG|     MTIME     |XFL|OS | (more-->)
  +---+---+---+---+---+---+---+---+---+---+

  (if FLG.FEXTRA set)

  +---+---+=================================+
  | XLEN  |...XLEN bytes of "extra field"...| (more-->)
  +---+---+=================================+

  (if FLG.FNAME set)

  +=========================================+
  |...original file name, zero-terminated...| (more-->)
  +=========================================+

  (if FLG.FCOMMENT set)

  +===================================+
  |...file comment, zero-terminated...| (more-->)
  +===================================+

  (if FLG.FHCRC set)

  +---+---+
  | CRC16 |
  +---+---+

  +=======================+
  |...compressed blocks...| (more-->)
  +=======================+

    0   1   2   3   4   5   6   7
  +---+---+---+---+---+---+---+---+
  |     CRC32     |     ISIZE     |
  +---+---+---+---+---+---+---+---+
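
The fixed 10-byte part of the header maps directly onto a `struct` format string. The following is a minimal sketch; `FixedHeader` and `parse_fixed_header` are hypothetical names introduced for illustration, and the sample member is produced with Python's `gzip.compress` (the `mtime` keyword requires Python 3.8 or later).

```python
import gzip
import struct
from typing import NamedTuple


class FixedHeader(NamedTuple):
    id1: int
    id2: int
    cm: int
    flg: int
    mtime: int
    xfl: int
    os: int


def parse_fixed_header(data: bytes) -> FixedHeader:
    """Unpack ID1, ID2, CM, FLG, MTIME (4 bytes, little-endian), XFL, OS."""
    if len(data) < 10:
        raise ValueError("a gzip member needs at least a 10-byte header")
    return FixedHeader(*struct.unpack("<BBBBIBB", data[:10]))


sample = gzip.compress(b"example", mtime=0)
header = parse_fixed_header(sample)
print(header)
```

The optional fields (extra field, file name, comment, CRC16) start at offset 10 and must be walked in the order shown in the diagram, depending on which FLG bits are set.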

  2.3.1. Member header and trailer
 Member header and trailer:
  ID1 (IDentification 1) 
  ID2 (IDentification 2)
 These two bytes identify the file as gzip and have fixed values: ID1 = 31, ID2 = 139.
  These have the fixed values ID1 = 31 (0x1f, \037), ID2 = 139
  (0x8b, \213), to identify the file as being in gzip format.

  CM (Compression Method)
 This byte identifies the compression method used in the file. CM = 0-7 are
 reserved; CM = 8 denotes the "deflate" method, the one customarily used by gzip.
  This identifies the compression method used in the file.  CM
  = 0-7 are reserved.  CM = 8 denotes the "deflate"
  compression method, which is the one customarily used by
  gzip and which is documented elsewhere.

  FLG (FLaGs)
 This byte is divided into individual bits:
  This flag byte is divided into individual bits as follows:

  bit 0  FTEXT
  bit 1  FHCRC
  bit 2  FEXTRA
  bit 3  FNAME
  bit 4  FCOMMENT
  bit 5  reserved
  bit 6  reserved
  bit 7  reserved
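
The bit assignments above translate into masks as follows. This is an illustrative sketch; the constant names mirror the RFC's flag names, while `RESERVED_MASK` and `describe_flags` are hypothetical helpers.

```python
# Bit n of FLG corresponds to the mask 1 << n.
FTEXT = 1 << 0
FHCRC = 1 << 1
FEXTRA = 1 << 2
FNAME = 1 << 3
FCOMMENT = 1 << 4
RESERVED_MASK = 0xE0  # bits 5, 6 and 7


def describe_flags(flg: int):
    """Return the names of the set flags and whether any reserved bit is set."""
    table = [
        ("FTEXT", FTEXT),
        ("FHCRC", FHCRC),
        ("FEXTRA", FEXTRA),
        ("FNAME", FNAME),
        ("FCOMMENT", FCOMMENT),
    ]
    names = [name for name, mask in table if flg & mask]
    return names, bool(flg & RESERVED_MASK)


print(describe_flags(0x0C))  # (['FEXTRA', 'FNAME'], False)
```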
 

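 If FTEXT is set, the file is probably ASCII text. This is an optional
indication. The compressor may check a small amount of the input data for
non-ASCII characters and set the bit if none are found. In case of doubt the
bit is cleared, indicating binary data. On systems that have different file
formats for ASCII text and binary data, the decompressor can use FTEXT to
choose the appropriate format. The rule for setting this bit is deliberately
left unspecified: a compressor may always leave it cleared, and a decompressor
may always ignore it and leave data conversion to some other program.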
  If FTEXT is set, the file is probably ASCII text.  This is
  an optional indication, which the compressor may set by
  checking a small amount of the input data to see whether any
  non-ASCII characters are present.  In case of doubt, FTEXT
  is cleared, indicating binary data. For systems which have
  different file formats for ascii text and binary data, the
  decompressor can use FTEXT to choose the appropriate format.
  We deliberately do not specify the algorithm used to set
  this bit, since a compressor always has the option of
  leaving it cleared and a decompressor always has the option
  of ignoring it and letting some other program handle issues
  of data conversion.

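 If FHCRC is set, the gzip header contains a CRC16 immediately before the
compressed data. The CRC16 consists of the two least significant bytes of the
CRC32 of all header bytes up to, but not including, the CRC16 itself. [The
FHCRC bit was never set by gzip versions up to 1.2.4, even though gzip 1.2.4
documented it with a different meaning.]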
  If FHCRC is set, a CRC16 for the gzip header is present,
  immediately before the compressed data. The CRC16 consists
  of the two least significant bytes of the CRC32 for all
  bytes of the gzip header up to and not including the CRC16.
  [The FHCRC bit was never set by versions of gzip up to
  1.2.4, even though it was documented with a different
  meaning in gzip 1.2.4.]
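
A minimal sketch of the header CRC16 computation described above, assuming that `zlib.crc32` from the Python standard library implements the CRC-32 the RFC refers to. The function name `header_crc16` and the hand-built header bytes are illustrative only.

```python
import struct
import zlib


def header_crc16(header_bytes: bytes) -> int:
    """Low 16 bits of the CRC32 over the header bytes preceding the CRC16 field."""
    return zlib.crc32(header_bytes) & 0xFFFF


# A hand-built fixed header with only FHCRC (bit 1) set, MTIME = 0, OS = 255.
header = bytes([31, 139, 8, 0x02, 0, 0, 0, 0, 0, 255])
crc16 = header_crc16(header)

# The two bytes are stored least significant byte first, like other fields.
stored = struct.pack("<H", crc16)
print(hex(crc16), stored.hex())
```

If FEXTRA, FNAME or FCOMMENT were also set, those fields would be included in the bytes passed to `header_crc16`, since they precede the CRC16 in the member.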

 If FEXTRA is set, optional extra fields are present; they are described in a
later section.
  If FEXTRA is set, optional extra fields are present, as
  described in a following section.

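 If FNAME is set, the original file name is present, terminated by a zero
byte. The name must consist of ISO 8859-1 characters; on operating systems
that use EBCDIC or another character set for file names, the name must be
translated to ISO LATIN-1. This is the original name of the file being
compressed, without any directory components. If the file system treats names
as case insensitive, the name is forced to lower case. If the data was not
compressed from a named file, there is no original file name.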
  If FNAME is set, an original file name is present,
  terminated by a zero byte.  The name must consist of ISO
  8859-1 (LATIN-1) characters; on operating systems using
  EBCDIC or any other character set for file names, the name
  must be translated to the ISO LATIN-1 character set.  This
  is the original name of the file being compressed, with any
  directory components removed, and, if the file being
  compressed is on a file system with case insensitive names,
  forced to lower case. There is no original file name if the
  data was compressed from a source other than a named file;
  for example, if the source was stdin on a unix system, there
  is no file name.

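 If FCOMMENT is set, a zero-terminated file comment is present. The comment
is not interpreted; it is intended only for human readers. It must consist of
ISO 8859-1 (LATIN-1) characters. Line breaks should be a single line feed (0x0A).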

  If FCOMMENT is set, a zero-terminated file comment is
  present.  This comment is not interpreted; it is only
  intended for human consumption.  The comment must consist of
  ISO 8859-1 (LATIN-1) characters.  Line breaks should be
  denoted by a single line feed character (10 decimal).
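
Both FNAME and FCOMMENT are zero-terminated ISO 8859-1 strings, so one helper can read either. The sketch below is an assumption-laden illustration: `read_zero_terminated` is a hypothetical helper, the file name `notes.txt` is made up, and the offset 10 is only correct because the sample member has no extra field.

```python
import gzip
import io


def read_zero_terminated(data: bytes, offset: int):
    """Return the Latin-1 string starting at offset and the offset just past its terminator."""
    end = data.index(b"\x00", offset)
    return data[offset:end].decode("latin-1"), end + 1


# Build a member that carries an original file name (FNAME set, no FEXTRA).
buf = io.BytesIO()
with gzip.GzipFile(filename="notes.txt", mode="wb", fileobj=buf, mtime=0) as f:
    f.write(b"payload")
member = buf.getvalue()

has_fname = bool(member[3] & 0x08)          # FLG bit 3
name, next_offset = read_zero_terminated(member, 10)
print(has_fname, name, next_offset)
```

When FCOMMENT is also set, the comment would start at `next_offset` and can be read with the same helper.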

 Reserved FLG bits must be zero.
  Reserved FLG bits must be zero.

  MTIME (Modification TIME)
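 MTIME: modification time. This field gives the most recent modification time
of the original file before compression. The time is in Unix format: seconds
since 00:00:00 GMT, January 1, 1970. If the compressed data did not come from
a file, MTIME is set to the time at which compression started.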
  This gives the most recent modification time of the original
  file being compressed.  The time is in Unix format, i.e.,
  seconds since 00:00:00 GMT, Jan.  1, 1970.  (Note that this
  may cause problems for MS-DOS and other systems that use
  local rather than Universal time.)  If the compressed data
  did not come from a file, MTIME is set to the time at which
  compression started.  MTIME = 0 means no time stamp is
  available.
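
A short sketch of decoding MTIME, treating 0 as "no time stamp available" as stated above. `decode_mtime` is a hypothetical helper name; the field is read as a little-endian unsigned 32-bit value, consistent with section 2.1.

```python
import struct
from datetime import datetime, timezone


def decode_mtime(raw4: bytes):
    """Convert the 4-byte MTIME field to an aware UTC datetime, or None if it is 0."""
    (seconds,) = struct.unpack("<I", raw4)
    if seconds == 0:
        return None  # no time stamp available
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


print(decode_mtime(struct.pack("<I", 86400)))  # 1970-01-02 00:00:00+00:00
print(decode_mtime(b"\x00\x00\x00\x00"))       # None
```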

  XFL (eXtra FLags)
 These flags are available to specific compression methods. The "deflate"
method sets them as follows:

  These flags are available for use by specific compression
  methods.  The "deflate" method (CM = 8) sets these flags as
  follows:

 Maximum compression, slowest algorithm:
  XFL = 2 - compressor used maximum compression,
  slowest algorithm
 Fastest algorithm:
  XFL = 4 - compressor used fastest algorithm

  OS (Operating System)
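 This field identifies the type of system on which compression took place. It
can be useful for determining the end-of-line convention of text files.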
  This identifies the type of file system on which compression
  took place.  This may be useful in determining end-of-line
  convention for text files.  The currently defined values are
  as follows:

  0 - FAT filesystem (MS-DOS, OS/2, NT/Win32)
  1 - Amiga
  2 - VMS (or OpenVMS)
  3 - Unix
  4 - VM/CMS
  5 - Atari TOS
  6 - HPFS filesystem (OS/2, NT)
  7 - Macintosh
  8 - Z-System
  9 - CP/M
  10 - TOPS-20
  11 - NTFS filesystem (NT)
  12 - QDOS
  13 - Acorn RISCOS
  255 - unknown

  XLEN (eXtra LENgth)
 If FLG.FEXTRA is set, these two bytes give the length of the optional extra field.
  If FLG.FEXTRA is set, this gives the length of the optional
  extra field.  See below for details.

  CRC32 (CRC-32)
 This is the Cyclic Redundancy Check value of the uncompressed data.
  This contains a Cyclic Redundancy Check value of the
  uncompressed data computed according to CRC-32 algorithm
  used in the ISO 3309 standard and in section 8.1.1.6.2 of
  ITU-T recommendation V.42.  (See http://www.iso.ch for
  ordering ISO documents. See gopher://info.itu.ch for an
  online version of ITU-T V.42.)

  ISIZE (Input SIZE)
 This is the length of the original data modulo 2^32.
  This contains the size of the original (uncompressed) input
  data modulo 2^32.
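
The trailer of a member can be checked against the uncompressed data as sketched below. This assumes a single-member file, so that the last 8 bytes are the trailer, and assumes `zlib.crc32` matches the CRC-32 referenced above; the payload is made up.

```python
import gzip
import struct
import zlib

payload = b"The quick brown fox jumps over the lazy dog. " * 100
member = gzip.compress(payload)

# The trailer is CRC32 followed by ISIZE, both little-endian 32-bit values.
crc32_stored, isize_stored = struct.unpack("<II", member[-8:])

crc_ok = crc32_stored == (zlib.crc32(payload) & 0xFFFFFFFF)
size_ok = isize_stored == len(payload) % 2**32
print(crc_ok, size_ok)
```

Because ISIZE is taken modulo 2^32, it cannot by itself report the true size of inputs of 4 GiB or more.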

  2.3.1.1. Extra field
 If the FLG.FEXTRA bit is set, an "extra field" is present in the header,
with a total length of XLEN bytes. It consists of a series of subfields:

  If the FLG.FEXTRA bit is set, an "extra field" is present in
  the header, with total length XLEN bytes.  It consists of a
  series of subfields, each of the form:

  +---+---+---+---+==================================+
  |SI1|SI2|  LEN  |... LEN bytes of subfield data ...|
  +---+---+---+---+==================================+

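 SI1 and SI2 provide the subfield ID, typically two mnemonic ASCII letters.
Subfield IDs with SI2 = 0 are reserved for future use. The following IDs are
currently defined: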
  SI1 and SI2 provide a subfield ID, typically two ASCII letters
  with some mnemonic value.  Jean-Loup Gailly
  <gzip@prep.ai.mit.edu> is maintaining a registry of subfield
  IDs; please send him any subfield ID you wish to use.  Subfield
  IDs with SI2 = 0 are reserved for future use.  The following
  IDs are currently defined:

  SI1         SI2         Data
  ----------  ----------  ----
  0x41 ('A')  0x70 ('P')  Apollo file type information

 LEN gives the length of the subfield data, excluding the 4 initial bytes.
  LEN gives the length of the subfield data, excluding the 4
  initial bytes.
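
Walking the subfields of an extra field can be sketched as follows. `parse_extra_field` is a hypothetical helper; the sample bytes, including the "XY" subfield ID, are invented for illustration and do not correspond to any registered ID other than the Apollo one listed above.

```python
import struct


def parse_extra_field(extra: bytes):
    """Split the XLEN bytes of an extra field into (id_bytes, data) pairs."""
    subfields = []
    pos = 0
    while pos < len(extra):
        if pos + 4 > len(extra):
            raise ValueError("truncated subfield header")
        # SI1, SI2, then LEN as a little-endian 16-bit value.
        si1, si2, length = struct.unpack_from("<BBH", extra, pos)
        pos += 4  # LEN does not count these 4 initial bytes
        if pos + length > len(extra):
            raise ValueError("subfield data runs past XLEN")
        subfields.append((bytes([si1, si2]), extra[pos:pos + length]))
        pos += length
    return subfields


# Two made-up subfields: ID 0x41 0x70 with 3 data bytes, and "XY" with none.
sample = b"\x41\x70" + struct.pack("<H", 3) + b"abc" + b"XY" + struct.pack("<H", 0)
print(parse_extra_field(sample))  # [(b'Ap', b'abc'), (b'XY', b'')]
```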

  2.3.1.2. Compliance
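 A compliant compressor must produce files with correct ID1, ID2, CM, CRC32
and ISIZE, but may set all the other fields in the fixed-length part of the
header to default values (255 for OS, 0 for all others). It must set all
reserved bits to zero.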
  A compliant compressor must produce files with correct ID1,
  ID2, CM, CRC32, and ISIZE, but may set all the other fields in
  the fixed-length part of the header to default values (255 for
  OS, 0 for all others).  The compressor must set all reserved
  bits to zero.

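 A compliant decompressor must check ID1, ID2 and CM, and report an error if
any of them has an incorrect value. It must examine FEXTRA/XLEN, FNAME,
FCOMMENT and FHCRC at least well enough to skip over the optional fields. It
need not examine any other part of the header or trailer; in particular, a
decompressor may ignore FTEXT and OS and always produce binary output. It must
report an error if any reserved bit is non-zero, because such a bit could
indicate the presence of a new field that would cause the following data to be
interpreted incorrectly.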

  A compliant decompressor must check ID1, ID2, and CM, and
  provide an error indication if any of these have incorrect
  values.  It must examine FEXTRA/XLEN, FNAME, FCOMMENT and FHCRC
  at least so it can skip over the optional fields if they are
  present.  It need not examine any other part of the header or
  trailer; in particular, a decompressor may ignore FTEXT and OS
  and always produce binary output, and still be compliant.  A
  compliant decompressor must give an error indication if any
  reserved bit is non-zero, since such a bit could indicate the
  presence of a new field that would cause subsequent data to be
  interpreted incorrectly.
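
The mandatory decompressor checks can be sketched as a small validation function. `check_header` is a hypothetical name, and rejecting every CM other than 8 is an assumption of this sketch (the text above only requires an error for incorrect values).

```python
import gzip


def check_header(data: bytes) -> int:
    """Validate ID1, ID2, CM and the reserved FLG bits; return FLG on success."""
    if len(data) < 10:
        raise ValueError("truncated gzip header")
    id1, id2, cm, flg = data[0], data[1], data[2], data[3]
    if (id1, id2) != (31, 139):
        raise ValueError("not a gzip member: bad ID1/ID2")
    if cm != 8:  # assumption: only "deflate" is supported here
        raise ValueError("unsupported compression method")
    if flg & 0xE0:  # bits 5-7 are reserved and must be zero
        raise ValueError("reserved FLG bits are set")
    return flg


good = gzip.compress(b"x")
print(check_header(good))

# Setting a reserved bit must be rejected.
tampered = bytearray(good)
tampered[3] |= 0x20
try:
    check_header(bytes(tampered))
except ValueError as exc:
    print("rejected:", exc)
```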


Source: ITPUB Blog, http://blog.itpub.net/10794571/viewspace-974302/. Please credit the source when reposting.