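ZIP File Format in Detail
General format of a ZIP file
----------------------
A ZIP file consists of three parts:
file data area + central directory + end of central directory record
1. File data area
In this area, every compressed source file or directory is one record, laid out as follows:
[local file header + file data + data descriptor]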
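a. Local file header structure
Field                          Size
local file header signature    4 bytes (0x04034b50)
version needed to extract      2 bytes
general purpose bit flag       2 bytes
compression method             2 bytes
last mod file time             2 bytes
last mod file date             2 bytes
crc-32                         4 bytes
compressed size                4 bytes
uncompressed size              4 bytes
filename length                2 bytes
extra field length             2 bytes
filename                       (variable length)
extra field                    (variable length)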
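The fixed part of the local file header adds up to 30 bytes, followed by the filename and the extra field. A minimal sketch of reading it with Python's `struct` module is shown here; it assumes little-endian byte order (the usual ZIP convention), and the function name `parse_local_header` and the dictionary keys are made up for illustration.

```python
import struct

# Fixed 30-byte part of the local file header, little-endian.
LOCAL_HEADER = struct.Struct("<IHHHHHIIIHH")
LOCAL_SIGNATURE = 0x04034B50

def parse_local_header(buf, offset=0):
    """Return (fields, offset of the file data) for the header at `offset`."""
    (sig, version_needed, flags, method, mod_time, mod_date,
     crc32, comp_size, uncomp_size,
     name_len, extra_len) = LOCAL_HEADER.unpack_from(buf, offset)
    if sig != LOCAL_SIGNATURE:
        raise ValueError("not a local file header")
    name_start = offset + LOCAL_HEADER.size
    extra_start = name_start + name_len
    data_start = extra_start + extra_len
    fields = {
        "version_needed": version_needed,
        "flags": flags,
        "method": method,
        "mod_time": mod_time,
        "mod_date": mod_date,
        "crc32": crc32,
        "compressed_size": comp_size,
        "uncompressed_size": uncomp_size,
        "name": buf[name_start:extra_start],
        "extra": buf[extra_start:data_start],
    }
    return fields, data_start
```

The returned offset points at the first byte of the file data, so the compressed bytes are `buf[data_start:data_start + compressed_size]` whenever the sizes in the header are not zeroed by bit 3 of the flag.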
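b. File data
c. Data descriptor
Field                Size
crc-32               4 bytes
compressed size      4 bytes
uncompressed size    4 bytes
The data descriptor exists only when bit 3 of the general purpose bit flag is set to 1 (explained in detail below), and it immediately follows the last byte of the compressed data. It is used only when the output ZIP file cannot be seeked, for example a ZIP file written to a non-seekable device such as a tape drive. ZIP files on disk generally do not have a data descriptor.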
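As a rough sketch, the bit 3 test and the 12-byte descriptor read could look like this in Python. The helper names are hypothetical. Many writers also put a 4-byte marker, 0x08074b50, in front of the descriptor; that marker is not listed in the table above, so the sketch merely tolerates it.

```python
import struct

FLAG_DATA_DESCRIPTOR = 1 << 3      # bit 3 of the general purpose bit flag
DESCRIPTOR_SIGNATURE = 0x08074B50  # optional marker, not part of the table above

def has_data_descriptor(flags):
    """True when bit 3 says the sizes and CRC follow the compressed data."""
    return bool(flags & FLAG_DATA_DESCRIPTOR)

def read_data_descriptor(buf, offset):
    """Return (crc32, compressed_size, uncompressed_size) found at `offset`."""
    (first,) = struct.unpack_from("<I", buf, offset)
    # Heuristic: a CRC that happens to equal the marker would be misread.
    if first == DESCRIPTOR_SIGNATURE:
        offset += 4
    return struct.unpack_from("<III", buf, offset)
```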
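2. Central directory
Each record in this area corresponds to one record in the file data area.
Field                              Size
central file header signature      4 bytes (0x02014b50)
version made by                    2 bytes
version needed to extract          2 bytes
general purpose bit flag           2 bytes
compression method                 2 bytes
last mod file time                 2 bytes
last mod file date                 2 bytes
crc-32                             4 bytes
compressed size                    4 bytes
uncompressed size                  4 bytes
filename length                    2 bytes
extra field length                 2 bytes
file comment length                2 bytes
disk number start                  2 bytes
internal file attributes           2 bytes
external file attributes           4 bytes
relative offset of local header    4 bytes
filename                           (variable length)
extra field                        (variable length)
file comment                       (variable length)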
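The fixed part of a central directory record is 46 bytes, followed by three variable-length fields. One possible way to walk the records, assuming little-endian fields and that the caller already knows where the directory starts and how many records it holds, is sketched below; `iter_central_directory` and the dictionary keys are illustrative names only.

```python
import struct

# Fixed 46-byte part of a central directory record, little-endian.
CENTRAL_HEADER = struct.Struct("<I6H3I5H2I")
CENTRAL_SIGNATURE = 0x02014B50

def iter_central_directory(buf, offset, count):
    """Yield one dict per central directory record, starting at `offset`."""
    for _ in range(count):
        (sig, made_by, needed, flags, method, mod_time, mod_date,
         crc32, comp_size, uncomp_size,
         name_len, extra_len, comment_len, disk_start, internal_attr,
         external_attr, local_offset) = CENTRAL_HEADER.unpack_from(buf, offset)
        if sig != CENTRAL_SIGNATURE:
            raise ValueError("not a central directory record")
        name_start = offset + CENTRAL_HEADER.size
        comment_start = name_start + name_len + extra_len
        yield {
            "version_made_by": made_by,
            "version_needed": needed,
            "flags": flags,
            "method": method,
            "crc32": crc32,
            "compressed_size": comp_size,
            "uncompressed_size": uncomp_size,
            "internal_attributes": internal_attr,
            "external_attributes": external_attr,
            "local_header_offset": local_offset,
            "name": buf[name_start:name_start + name_len],
            "comment": buf[comment_start:comment_start + comment_len],
        }
        # The next record starts right after this record's comment.
        offset = comment_start + comment_len
```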
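3. End of central directory record
Field                                                     Size
end of central directory signature                        4 bytes (0x06054b50)
number of this disk                                       2 bytes
number of the disk where the central directory starts     2 bytes
number of central directory records on this disk          2 bytes
total number of central directory records                 2 bytes
size of the central directory                             4 bytes
offset of the central directory from the first disk       4 bytes
ZIP file comment length                                   2 bytes
ZIP file comment                                          (variable length)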
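Because this record sits at the very end of the archive and its comment is at most 65,535 bytes long, a reader can look for it by searching backwards from the end of the file. The sketch below is one way to do that in Python for an archive held in memory; the function name and keys are hypothetical, and a comment that itself contains the signature bytes would confuse this simple search.

```python
import struct

# 22-byte fixed part of the end of central directory record, little-endian.
EOCD = struct.Struct("<IHHHHIIH")
EOCD_MAGIC = b"PK\x05\x06"  # 0x06054b50 as it appears on disk

def find_end_of_central_directory(buf):
    """Locate and decode the end of central directory record in `buf`."""
    # Only the last 22 + 65535 bytes can contain the start of the record.
    search_from = max(0, len(buf) - EOCD.size - 0xFFFF)
    pos = buf.rfind(EOCD_MAGIC, search_from)
    if pos < 0:
        raise ValueError("end of central directory record not found")
    (_sig, this_disk, cd_disk, entries_here, entries_total,
     cd_size, cd_offset, comment_len) = EOCD.unpack_from(buf, pos)
    comment_start = pos + EOCD.size
    return {
        "this_disk": this_disk,
        "central_directory_disk": cd_disk,
        "entries_on_this_disk": entries_here,
        "total_entries": entries_total,
        "central_directory_size": cd_size,
        "central_directory_offset": cd_offset,
        "comment": buf[comment_start:comment_start + comment_len],
    }
```

With the offset and record count from this result, the central directory can be read first and the local headers reached through each record's relative offset.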
##################################################
explanation of fields:
version made by (2 bytes)
the upper byte indicates the compatibility of the file
attribute information. if the external file attributes
are compatible with ms-dos and can be read by pkzip for
dos version 2.04g then this value will be zero. if these
attributes are not compatible, then this value will identify
the host system on which the attributes are compatible.
software can use this information to determine the line
record format for text files etc. the current
mappings are:
0 - ms-dos and os/2 (fat / vfat / fat32 file systems)
1 - amiga 2 - vax/vms
3 - unix 4 - vm/cms
5 - atari st 6 - os/2 h.p.f.s.
7 - macintosh 8 - z-system
9 - cp/m 10 - windows ntfs
11 thru 255 - unused
the lower byte indicates the version number of the
software used to encode the file. the value/10
indicates the major version number, and the value
mod 10 is the minor version number.
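A small Python sketch of splitting this field into its two bytes, using the host mapping listed above; `decode_version_made_by` is an illustrative name, not something defined by the format.

```python
HOST_SYSTEMS = {
    0: "ms-dos and os/2 (fat / vfat / fat32 file systems)",
    1: "amiga", 2: "vax/vms", 3: "unix", 4: "vm/cms",
    5: "atari st", 6: "os/2 h.p.f.s.", 7: "macintosh",
    8: "z-system", 9: "cp/m", 10: "windows ntfs",
}

def decode_version_made_by(value):
    """Return (host system, major version, minor version)."""
    host = (value >> 8) & 0xFF   # upper byte: attribute compatibility
    version = value & 0xFF       # lower byte: encoding software version
    return HOST_SYSTEMS.get(host, "unused"), version // 10, version % 10
```

For example, the value 0x0314 has 3 in the upper byte and 20 in the lower byte, so it decodes to unix, version 2.0.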
version needed to extract (2 bytes)
the minimum software version needed to extract the
file, mapped as above.
general purpose bit flag: (2 bytes)
bit 0: if set, indicates that the file is encrypted.
(for method 6 - imploding)
bit 1: if the compression method used was type 6,
imploding, then this bit, if set, indicates
an 8k sliding dictionary was used. if clear,
then a 4k sliding dictionary was used.
bit 2: if the compression method used was type 6,
imploding, then this bit, if set, indicates
3 shannon-fano trees were used to encode the
sliding dictionary output. if clear, then 2
shannon-fano trees were used.
(for method 8 - deflating)
bit 2 bit 1
0 0 normal (-en) compression option was used.
0 1 maximum (-ex) compression option was used.
1 0 fast (-ef) compression option was used.
1 1 super fast (-es) compression option was used.
note: bits 1 and 2 are undefined if the compression
method is any other.
bit 3: if this bit is set, the fields crc-32, compressed size
and uncompressed size are set to zero in the local
header. the correct values are put in the data descriptor
immediately following the compressed data. (note: pkzip
version 2.04g for dos only recognizes this bit for method 8
compression, newer versions of pkzip recognize this bit
for any compression method.)
the upper three bits are reserved and used internally
by the software when processing the zipfile. the
remaining bits are unused.
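The bit tests described above can be sketched in Python as follows. Bits 1 and 2 are interpreted only for methods 6 and 8, as the note says; the function name and the result keys are assumptions made for this example.

```python
DEFLATE_OPTIONS = {
    # (bit 2, bit 1) -> option
    (0, 0): "normal (-en)",
    (0, 1): "maximum (-ex)",
    (1, 0): "fast (-ef)",
    (1, 1): "super fast (-es)",
}

def decode_general_purpose_flags(flags, method):
    """Interpret bits 0-3 of the general purpose bit flag."""
    bit1 = (flags >> 1) & 1
    bit2 = (flags >> 2) & 1
    info = {
        "encrypted": bool(flags & 0x0001),        # bit 0
        "data_descriptor": bool(flags & 0x0008),  # bit 3
    }
    if method == 6:    # imploding
        info["sliding_dictionary_bytes"] = 8192 if bit1 else 4096
        info["shannon_fano_trees"] = 3 if bit2 else 2
    elif method == 8:  # deflating
        info["deflate_option"] = DEFLATE_OPTIONS[(bit2, bit1)]
    return info
```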
compression method: (2 bytes)
(see accompanying documentation for algorithm
descriptions)
0 - the file is stored (no compression)
1 - the file is shrunk
2 - the file is reduced with compression factor 1
3 - the file is reduced with compression factor 2
4 - the file is reduced with compression factor 3
5 - the file is reduced with compression factor 4
6 - the file is imploded
7 - reserved for tokenizing compression algorithm
8 - the file is deflated
9 - reserved for enhanced deflating
10 - pkware data compression library imploding
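For readers who want to print a readable name for this field, a plain lookup table is enough. The sketch below simply restates the list above in Python; `describe_method` is a hypothetical helper, and values outside the list are reported as unknown rather than guessed.

```python
COMPRESSION_METHODS = {
    0: "stored (no compression)",
    1: "shrunk",
    2: "reduced with compression factor 1",
    3: "reduced with compression factor 2",
    4: "reduced with compression factor 3",
    5: "reduced with compression factor 4",
    6: "imploded",
    7: "reserved for tokenizing compression algorithm",
    8: "deflated",
    9: "reserved for enhanced deflating",
    10: "pkware data compression library imploding",
}

def describe_method(method):
    """Map a compression method number to the description listed above."""
    return COMPRESSION_METHODS.get(method, "unknown method %d" % method)
```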
date and time fields: (2 bytes each)
the date and time are encoded in standard ms-dos format.
if input came from standard input, the date and time are
those at which compression was started for this data.
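The text does not spell out the ms-dos layout. In the commonly documented format, the date word packs year minus 1980 (7 bits), month (4 bits) and day (5 bits), and the time word packs hour (5 bits), minute (6 bits) and seconds divided by two (5 bits). A sketch of decoding both words under that assumption:

```python
from datetime import datetime

def decode_dos_datetime(dos_date, dos_time):
    """Convert the two 16-bit ms-dos words to a datetime (2-second resolution)."""
    year = 1980 + (dos_date >> 9)
    month = (dos_date >> 5) & 0x0F
    day = dos_date & 0x1F
    hour = dos_time >> 11
    minute = (dos_time >> 5) & 0x3F
    second = (dos_time & 0x1F) * 2
    # datetime raises ValueError for out-of-range words (e.g. month 0).
    return datetime(year, month, day, hour, minute, second)
```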
crc-32: (4 bytes)
the crc-32 algorithm was generously contributed by
david schwaderer and can be found in his excellent
book "c programmers guide to netbios" published by
howard w. sams & co. inc. the 'magic number' for
the crc is 0xdebb20e3. the proper crc pre and post
conditioning is used, meaning that the crc register
is pre-conditioned with all ones (a starting value
of 0xffffffff) and the value is post-conditioned by
taking the one's complement of the crc residual.
if bit 3 of the general purpose flag is set, this
field is set to zero in the local header and the correct
value is put in the data descriptor and in the central
directory.
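The pre- and post-conditioning described above can be illustrated with a slow bit-by-bit CRC in Python. The reflected polynomial constant 0xEDB88320 is an assumption taken from the widely used CRC-32 definition, since the text names only the magic number. The `residual` helper shows where that magic number comes from: running the register over the data followed by its own CRC (low byte first) leaves 0xdebb20e3.

```python
POLY = 0xEDB88320  # reflected CRC-32 polynomial (assumed, not given in the text)

def crc_register(data, register=0xFFFFFFFF):
    """Run the CRC over `data`, starting from the all-ones pre-conditioned register."""
    for byte in data:
        register ^= byte
        for _ in range(8):
            register = (register >> 1) ^ (POLY if register & 1 else 0)
    return register

def zip_crc32(data):
    """Post-condition by taking the one's complement of the register."""
    return crc_register(data) ^ 0xFFFFFFFF

def residual(data):
    """Register left after processing the data followed by its CRC."""
    crc = zip_crc32(data)
    return crc_register(data + crc.to_bytes(4, "little"))
```

A table-driven version is far faster in practice; this form is only meant to make the two conditioning steps visible.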
compressed size: (4 bytes)
uncompressed size: (4 bytes)
the size of the file compressed and uncompressed,
respectively. if bit 3 of the general purpose bit flag
is set, these fields are set to zero in the local header
and the correct values are put in the data descriptor and
in the central directory.
filename length: (2 bytes)
extra field length: (2 bytes)
file comment length: (2 bytes)
the length of the filename, extra field, and comment
fields respectively. the combined length of any
directory record and these three fields should not
generally exceed 65,535 bytes. if input came from standard
input, the filename length is set to zero.
disk number start: (2 bytes)
the number of the disk on which this file begins.
internal file attributes: (2 bytes)
the lowest bit of this field indicates, if set, that
the file is apparently an ascii or text file. if not
set, that the file apparently contains binary data.
the remaining bits are unused in version 1.0.
external file attributes: (4 bytes)
the mapping of the external attributes is
host-system dependent (see 'version made by'). for
ms-dos, the low order byte is the ms-dos directory
attribute byte. if input came from standard input, this
field is set to zero.
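A sketch of reading the text/binary hint and, for archives made on ms-dos, the directory attribute byte. The attribute bit names follow the usual ms-dos definitions, which the text above does not list, so treat them as an assumption; the function name is illustrative.

```python
DOS_ATTRIBUTES = {
    0x01: "read-only",
    0x02: "hidden",
    0x04: "system",
    0x08: "volume label",
    0x10: "directory",
    0x20: "archive",
}

def describe_attributes(internal_attr, external_attr):
    """Decode internal bit 0 and the low-order (ms-dos) external byte."""
    dos_byte = external_attr & 0xFF
    return {
        "apparently_text": bool(internal_attr & 0x0001),
        "dos_attributes": [name for bit, name in DOS_ATTRIBUTES.items()
                           if dos_byte & bit],
    }
```

The low-order byte should only be read this way when the upper byte of 'version made by' is 0; other host systems define their own mapping.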
relative offset of local header: (4 bytes)
this is the offset from the start of the first disk on
which this file appears, to where the local header should
be found.
filename: (variable)
the name of the file, with optional relative path.
the path stored should not contain a drive or
device letter, or a leading slash. all slashes
should be forward slashes '/' as opposed to
backwards slashes '\' for compatibility with amiga
and unix file systems etc. if input came from standard
input, there is no filename field.
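A writer could apply these naming rules before storing a path. The following is a minimal sketch, assuming a single-letter drive prefix such as `C:`; the helper name is made up for this example, and it does not attempt to handle other device names.

```python
import re

def normalize_stored_name(path):
    """Turn a local path into the form the filename field expects."""
    name = path.replace("\\", "/")          # forward slashes only
    name = re.sub(r"^[A-Za-z]:", "", name)  # drop a drive letter
    return name.lstrip("/")                 # no leading slash
```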
extra field: (variable)
this is for future expansion. if additional information
needs to be stored in the future, it should be stored
here. earlier versions of the software can then safely
skip this file, and find the next file or header. this
field will be 0 length in version 1.0.