ZIP File Format Analysis

1. Official Documentation

https://pkware.cachefly.net/webdocs/APPNOTE/APPNOTE-6.2.0.txt


This article follows that document to analyze the format of a sample ZIP file.

It is only a rough analysis of the overall ZIP file structure; some of the individual fields will be analyzed in detail later.

2. Format Description

This section reproduces part of appnote.txt: the data structures that a typical ZIP archive uses. Some scenarios involve records not covered here; for those, consult appnote.txt.

2.1 Overview

  Overall .ZIP file format:

    [local file header 1]
    [file data 1]
    [data descriptor 1]
    . 
    .
    .
    [local file header n]
    [file data n]
    [data descriptor n]
    [archive decryption header] (EFS)
    [archive extra data record] (EFS)
    [central directory]
    [zip64 end of central directory record]
    [zip64 end of central directory locator] 
    [end of central directory record]

2.2 Local File Header

  A.  Local file header:

        local file header signature     4 bytes  (0x04034b50)
        version needed to extract       2 bytes
        general purpose bit flag        2 bytes
        compression method              2 bytes
        last mod file time              2 bytes
        last mod file date              2 bytes
        crc-32                          4 bytes
        compressed size                 4 bytes
        uncompressed size               4 bytes
        file name length                2 bytes
        extra field length              2 bytes

        file name (variable size)
        extra field (variable size)
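
As a rough illustration of the layout above, the fixed 30-byte part of a local file header can be unpacked with Python's `struct` module. This is only a sketch: the function name `parse_local_header` and the dictionary keys are made up for this example, and all multi-byte fields are read as little-endian, as ZIP stores them.

```python
import struct

# Fixed part of the local file header (30 bytes, little-endian):
# signature, version needed, flags, method, mod time, mod date,
# crc-32, compressed size, uncompressed size, name length, extra length.
LOCAL_HEADER = struct.Struct("<I5H3I2H")
LOCAL_SIGNATURE = 0x04034B50


def parse_local_header(buf, offset=0):
    """Return the local file header found at `offset` as a dict."""
    (sig, version, flags, method, mod_time, mod_date, crc,
     csize, usize, name_len, extra_len) = LOCAL_HEADER.unpack_from(buf, offset)
    if sig != LOCAL_SIGNATURE:
        raise ValueError("not a local file header")
    name_start = offset + LOCAL_HEADER.size
    extra_start = name_start + name_len
    data_start = extra_start + extra_len  # file data begins here
    return {
        "version_needed": version,
        "flags": flags,
        "method": method,
        "mod_time": mod_time,
        "mod_date": mod_date,
        "crc32": crc,
        "compressed_size": csize,
        "uncompressed_size": usize,
        "file_name": bytes(buf[name_start:extra_start]),
        "extra": bytes(buf[extra_start:data_start]),
        "data_offset": data_start,
    }
```

The returned `data_offset` is where the file data of section 2.3 starts, since the data immediately follows the variable-size name and extra field.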

2.3 File Data

  B.  File data

      Immediately following the local header for a file
      is the compressed or stored data for the file. 
      The series of [local file header][file data][data
      descriptor] repeats for each file in the .ZIP archive. 


2.4 Central Directory Header

  F.  Central directory structure:

      [file header 1]
      .
      .
      . 
      [file header n]
      [digital signature] 

      File header:

        central file header signature   4 bytes  (0x02014b50)
        version made by                 2 bytes
        version needed to extract       2 bytes
        general purpose bit flag        2 bytes
        compression method              2 bytes
        last mod file time              2 bytes
        last mod file date              2 bytes
        crc-32                          4 bytes
        compressed size                 4 bytes
        uncompressed size               4 bytes
        file name length                2 bytes
        extra field length              2 bytes
        file comment length             2 bytes
        disk number start               2 bytes
        internal file attributes        2 bytes
        external file attributes        4 bytes
        relative offset of local header 4 bytes

        file name (variable size)
        extra field (variable size)
        file comment (variable size)

      Digital signature:

        header signature                4 bytes  (0x05054b50)
        size of data                    2 bytes
        signature data (variable size)
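
The central directory file header can be read the same way. The sketch below unpacks the fixed 46-byte part and slices out the three variable-size fields; `parse_central_header` and its dictionary keys are hypothetical names chosen for this example.

```python
import struct

# Fixed part of the central directory file header (46 bytes, little-endian).
CENTRAL_HEADER = struct.Struct("<I6H3I5H2I")
CENTRAL_SIGNATURE = 0x02014B50


def parse_central_header(buf, offset=0):
    """Return the central directory file header at `offset` as a dict."""
    (sig, made_by, needed, flags, method, mod_time, mod_date,
     crc, csize, usize, name_len, extra_len, comment_len,
     disk_start, internal, external, local_offset
     ) = CENTRAL_HEADER.unpack_from(buf, offset)
    if sig != CENTRAL_SIGNATURE:
        raise ValueError("not a central directory file header")
    name_start = offset + CENTRAL_HEADER.size
    extra_start = name_start + name_len
    comment_start = extra_start + extra_len
    end = comment_start + comment_len
    return {
        "version_made_by": made_by,
        "version_needed": needed,
        "flags": flags,
        "method": method,
        "mod_time": mod_time,
        "mod_date": mod_date,
        "crc32": crc,
        "compressed_size": csize,
        "uncompressed_size": usize,
        "disk_number_start": disk_start,
        "internal_attributes": internal,
        "external_attributes": external,
        "local_header_offset": local_offset,
        "file_name": bytes(buf[name_start:extra_start]),
        "extra": bytes(buf[extra_start:comment_start]),
        "comment": bytes(buf[comment_start:end]),
        "next_offset": end,  # where the next record starts
    }
```

Calling the function repeatedly with the previous `next_offset` walks through [file header 1] to [file header n].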

2.5 End of Central Directory Record (EOCD)

Every ZIP archive must contain exactly one EOCD record.

  I.  End of central directory record:

        end of central dir signature    4 bytes  (0x06054b50)
        number of this disk             2 bytes
        number of the disk with the
        start of the central directory  2 bytes
        total number of entries in the
        central directory on this disk  2 bytes
        total number of entries in
        the central directory           2 bytes
        size of the central directory   4 bytes
        offset of start of central
        directory with respect to
        the starting disk number        4 bytes
        .ZIP file comment length        2 bytes
        .ZIP file comment       (variable size)
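
Because the EOCD record sits at the very end of the archive, followed only by an optional comment whose length fits in 2 bytes, a reader can locate it by scanning backward from the end of the file. The following is a minimal sketch of that idea; `find_eocd` and its result keys are hypothetical names, and the whole archive is assumed to be in memory as `bytes`.

```python
import struct

# End of central directory record: 22 fixed bytes, little-endian.
EOCD = struct.Struct("<I4H2IH")
EOCD_SIGNATURE = 0x06054B50
MAX_COMMENT = 0xFFFF  # the comment length field is 2 bytes wide


def find_eocd(buf):
    """Scan backward from the end of `buf` for the EOCD record."""
    lowest = max(0, len(buf) - EOCD.size - MAX_COMMENT)
    for pos in range(len(buf) - EOCD.size, lowest - 1, -1):
        (sig, disk, cd_disk, entries_here, entries_total,
         cd_size, cd_offset, comment_len) = EOCD.unpack_from(buf, pos)
        # Accept only a candidate whose comment runs exactly to the end.
        if sig == EOCD_SIGNATURE and pos + EOCD.size + comment_len == len(buf):
            return {
                "offset": pos,
                "disk": disk,
                "cd_disk": cd_disk,
                "entries_on_disk": entries_here,
                "entries_total": entries_total,
                "cd_size": cd_size,
                "cd_offset": cd_offset,
                "comment": bytes(buf[pos + EOCD.size:]),
            }
    raise ValueError("end of central directory record not found")
```

The `cd_offset` and `cd_size` values then tell the reader where the central directory starts and how long it is.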


3. Sample Analysis

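Only the simplest case is considered here: an archive containing a single text file. With multiple files, some of the records above simply appear multiple times.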

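The annotated hex dump follows. The attachment at the end provides the original file, the ZIP archive, the hex analysis, and a version of the analysis with color-coded text.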
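
For readers without a hex editor, a dump in roughly the same layout (offset, 16 bytes per row, ASCII column) can be produced with a few lines of Python. This is a sketch only; `hexdump` is a hypothetical helper, and non-printable bytes are shown as `.`, which is an assumption about how the ASCII column should look.

```python
def hexdump(data, width=16):
    """Format `data` as 'offset: hex bytes ; ascii' rows."""
    pad = width * 3 - 1  # width of a full row of hex bytes
    lines = []
    for start in range(0, len(data), width):
        chunk = data[start:start + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        # Printable ASCII is shown as-is, everything else as '.'.
        text = "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in chunk)
        lines.append(f"{start:08x}h: {hex_part:<{pad}} ; {text}")
    return "\n".join(lines)
```

For example, `print(hexdump(open("sample.zip", "rb").read()))` would dump a whole archive, where `sample.zip` is a placeholder file name.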

[Local File Header 1]

  A.  Local file header:

        local file header signature     4 bytes  (0x04034b50)
        version needed to extract       2 bytes
        general purpose bit flag        2 bytes
        compression method              2 bytes
        last mod file time              2 bytes
        last mod file date              2 bytes
        crc-32                          4 bytes
        compressed size                 4 bytes
        uncompressed size               4 bytes
        file name length                2 bytes
        extra field length              2 bytes

        file name (variable size)
        extra field (variable size)


00000000h: 50 4B 03 04 --- local file header signature(4 bytes, 0x04034b50)
                       14 00 --- version needed to extract(2 bytes)
                             00 00 --- general purpose bit flag(2 bytes)
                                   08 00 --- compression method(2 bytes)
                                         6F 9D --- last mod file time(2 bytes)
                                               D9 46 --- last mod file date(2 bytes)
                                                     D0 1E ; PK........o..F..
00000010h: FE B9 --- crc-32(4 bytes)
                 A4 01 00 00 --- compressed size(4 bytes)
                             72 04 00 00 --- uncompressed size(4 bytes)
                                         08 00 --- file name length(2 bytes)
                                               00 00 --- extra field length(2 bytes)
                                                     70 72 ; ......r.......pr
00000020h: 69 6D 65 2E 70 79 --- file name (variable size, 8 bytes)
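
The two timestamp fields above are little-endian, so they read as time 0x9D6F and date 0x46D9. Assuming the standard MS-DOS layout (date: 7 bits year since 1980, 4 bits month, 5 bits day; time: 5 bits hour, 6 bits minute, 5 bits seconds divided by 2), they can be decoded as sketched below. `dos_datetime` is a hypothetical helper name.

```python
from datetime import datetime


def dos_datetime(mod_date, mod_time):
    """Decode MS-DOS date and time words into a naive local datetime."""
    year = 1980 + (mod_date >> 9)
    month = (mod_date >> 5) & 0x0F
    day = mod_date & 0x1F
    hour = mod_time >> 11
    minute = (mod_time >> 5) & 0x3F
    second = (mod_time & 0x1F) * 2  # stored with 2-second resolution
    return datetime(year, month, day, hour, minute, second)


# Values taken from the sample header above.
print(dos_datetime(0x46D9, 0x9D6F))  # 2015-06-25 19:43:30
```

The other fields decode directly: version needed 0x0014 is 20 (that is, 2.0), and compression method 8 is deflate.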

[File Data 1]
   compressed size = 0x01A4
   0x0026(start_address) + 0x01A4 = 0x01CA

                             7D 53 C1 4E C3 30 0C BD 4F DA ; ime.py}S.N.0..O.
00000030h: 3F 98 03 A2 15 65 5A 87 76 41 94 23 12 17 84 04 ; ?....eZ.vA.#....
00000040h: 37 84 A2 B0 79 2C A8 75 47 92 02 9F 8F 9D B6 5B ; 7...y,.uG......[
00000050h: B6 6E F4 50 29 F6 CB F3 F3 73 6C AA 4D 6D 3D 54 ; .n.P)....sl.Mm=T
00000060h: DA AF C7 A3 F1 68 89 2B 30 4E 6D AC A9 30 A1 F4 ; .....h.+0Nm..0..
00000070h: 66 3C 02 FE CC 0A 08 6E 61 D6 1D E5 B3 E8 1B 4B ; f<.....na......K
00000080h: 70 AF 4B 87 72 51 62 48 4B 45 4D F5 8E 16 0A 30 ; p.K.rQbHKEM....0
00000090h: E4 13 61 9D B8 2F EB 99 2A 85 4B C8 5B DC AA B6 ; ..a../..*.K.[...
000000a0h: 60 18 00 56 D3 07 26 B3 2C BA 99 46 35 42 D9 73 ; `..V..&.,..F5B.s
000000b0h: 86 16 05 4C A3 F8 B0 BE 44 DA 7F 17 7F B1 4D 90 ; ...L....D.....M.
000000c0h: B5 28 B5 73 F0 14 DA A9 DF 3F 71 E1 FB 02 D2 A9 ; .(.s.....?q.....
000000d0h: 52 86 8C 57 2A 71 58 AE 32 70 BE DE 14 8F 35 E1 ; R..W*qX.2p....5.
000000e0h: 81 08 89 B3 2B 40 B5 07 49 1F 48 91 CB 93 80 29 ; ....+@..I.H....)
000000f0h: 02 74 97 45 16 F7 0F F8 2A 8F 33 BD 8B 5B D4 A2 ; .t.E....*.3..[..
00000100h: B1 0F B4 C4 5F 46 E6 87 A0 4E BD 47 DB A9 4F 87 ; ...._F...N.G..O.
00000110h: B3 91 F0 B1 7B 3C 5F F6 FB BF 5B AD C0 B3 A0 50 ; ....{<_...[....P
00000120h: D3 F2 40 CE 5D B1 03 C5 BC 84 BF 7E C0 4A AC 5E ; ..@.]......~.J.^
00000130h: 3C DB 45 7E D6 A6 44 8E B3 A1 47 CC 64 BB C3 B4 ; <.E~..D...G.d...
00000140h: 26 27 54 6E D5 6A E3 10 9E 59 C2 03 9B A0 BD A9 ; &'Tn.j...Y......
00000150h: 29 36 B0 A3 DA 3E E5 BD 0E 8E F1 89 CE FD 36 4F ; )6...>........6O
00000160h: 8C 66 38 9E CB 42 1E F6 C0 48 EA F7 49 79 74 3E ; .f8..B...H..Iyt>
00000170h: E9 8B 06 3D 5C AC 7D 93 F9 74 9A 46 4B E1 B1 92 ; ...=\.}..t.FK...
00000180h: BD 08 98 48 24 9F C9 87 6C B6 47 DA 35 D7 53 7F ; ...H$...l.G.5.S.
00000190h: EB B2 41 C7 DC AF BC 52 D7 19 CC 33 C8 E7 6F 3B ; ..A....R...3..o;
000001a0h: FA 90 17 FE 16 38 28 10 C2 19 0C EA F6 26 86 7C ; .....8(......&.|
000001b0h: 2A 0A 8C BC 3E D2 15 2A 25 CB 79 A1 54 A5 0D 29 ; *...>..*%.y.T..)
000001c0h: 75 D1 71 EE A9 8B 42 72 F8 03 --- file data ends here; the end address (exclusive) is 0x01CA
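
Since the compression method in the header is 8 (deflate), these 0x01A4 bytes are a raw deflate stream with no zlib wrapper. A hedged sketch of how such a block could be expanded and checked against the header's crc-32 and uncompressed size follows; `inflate_entry` is a hypothetical helper, and the negative `wbits` value is what tells `zlib` to expect a raw stream.

```python
import zlib


def inflate_entry(raw, expected_crc, expected_size):
    """Inflate raw deflate data and verify it against header values."""
    inflater = zlib.decompressobj(-15)  # -15: raw deflate, no zlib header
    data = inflater.decompress(raw) + inflater.flush()
    if len(data) != expected_size:
        raise ValueError("uncompressed size mismatch")
    if zlib.crc32(data) != expected_crc:
        raise ValueError("crc-32 mismatch")
    return data
```

For the sample, the call would be `inflate_entry(archive[0x26:0x1CA], 0xB9FE1ED0, 0x472)`, where `archive` is assumed to hold the bytes of the ZIP file.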

[Central Directory Header 1]

      File header:
        central file header signature   4 bytes  (0x02014b50)
        version made by                 2 bytes
        version needed to extract       2 bytes
        general purpose bit flag        2 bytes
        compression method              2 bytes
        last mod file time              2 bytes
        last mod file date              2 bytes
        crc-32                          4 bytes
        compressed size                 4 bytes
        uncompressed size               4 bytes
        file name length                2 bytes
        extra field length              2 bytes
        file comment length             2 bytes
        disk number start               2 bytes
        internal file attributes        2 bytes
        external file attributes        4 bytes
        relative offset of local header 4 bytes

        file name (variable size)
        extra field (variable size)
        file comment (variable size)

                                         50 4B 01 02 --- central file header signature(4 bytes, 0x02014b50)
                                                     3F 00 --- version made by(2 bytes)
                                                           ; u.q...Br..PK..?.
000001d0h: 14 00 ---  version needed to extract(2 bytes)
                 00 00 --- general purpose bit flag(2 bytes)
                       08 00 --- compression method(2 bytes)
                             6F 9D --- last mod file time(2 bytes)
                                   D9 46 --- last mod file date(2 bytes)
                                         D0 1E FE B9 --- crc-32(4 bytes)
                                                     A4 01 ; ......o..F......
000001e0h: 00 00 --- compressed size(4 bytes)
                 72 04 00 00 --- uncompressed size(4 bytes)
                             08 00 --- file name length(2 bytes)
                                   24 00 --- extra field length(2 bytes)
                                         00 00 --- file comment length(2 bytes)
                                               00 00 --- disk number start(2 bytes)
                                                     00 00 --- internal file attributes(2 bytes)
                                                           ; ..r.....$.......
000001f0h: 20 00 00 00 --- external file attributes(4 bytes)
                       00 00 00 00 --- relative offset of local header(4 bytes)
                                   70 72 69 6D 65 2E 70 79 --- file name (variable size, 8 bytes)
                                                           ;  .......prime.py
00000200h: 0A 00 20 00 00 00 00 00 01 00 18 00 76 36 37 27 ; .. .........v67'
00000210h: 3C AF D0 01 D7 80 CE 6B 39 AF D0 01 D7 80 CE 6B ; <......k9......k
00000220h: 39 AF D0 01 --- extra field (variable size, 0x24)
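
The 0x24-byte extra field is not broken down above. Its first two bytes, 0A 00, are header ID 0x000A, which appnote.txt lists as the NTFS extra field: a 2-byte data size, 4 reserved bytes, then attribute tag 0x0001 of size 0x18 holding three 8-byte Windows FILETIME values (modification, access, creation). The sketch below decodes it under that reading; `parse_ntfs_extra` and `filetime_to_datetime` are hypothetical helper names.

```python
import struct
from datetime import datetime, timedelta, timezone

# FILETIME counts 100-nanosecond ticks since 1601-01-01 UTC.
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_datetime(ticks):
    return FILETIME_EPOCH + timedelta(microseconds=ticks // 10)


def parse_ntfs_extra(extra):
    """Return (mtime, atime, ctime) from an NTFS extra field, or None."""
    pos = 0
    while pos + 4 <= len(extra):
        header_id, size = struct.unpack_from("<HH", extra, pos)
        body = extra[pos + 4:pos + 4 + size]
        if header_id == 0x000A:
            attr = 4  # skip the 4 reserved bytes
            while attr + 4 <= len(body):
                tag, attr_size = struct.unpack_from("<HH", body, attr)
                if tag == 0x0001 and attr + 4 + 24 <= len(body):
                    times = struct.unpack_from("<QQQ", body, attr + 4)
                    return tuple(filetime_to_datetime(t) for t in times)
                attr += 4 + attr_size
        pos += 4 + size
    return None
```

Applied to the 36 bytes above, the modification time falls on 2015-06-25 (UTC), the same day as the MS-DOS date in the header.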


[End of Central Directory Record]

  I.  End of central directory record:
        end of central dir signature    4 bytes  (0x06054b50)
        number of this disk             2 bytes
        number of the disk with the
        start of the central directory  2 bytes
        total number of entries in the
        central directory on this disk  2 bytes
        total number of entries in
        the central directory           2 bytes
        size of the central directory   4 bytes
        offset of start of central
        directory with respect to
        the starting disk number        4 bytes
        .ZIP file comment length        2 bytes
        .ZIP file comment       (variable size)
        
                       50 4B 05 06 --- end of central dir signature(4 bytes, 0x06054b50)
                                   00 00 --- number of this disk(2 bytes)
                                         00 00 --- number of the disk with the start of the central directory(2 bytes)
                                               01 00 --- total number of entries in the central directory on this disk(2 bytes)
                                                     01 00 --- total number of entries in the central directory(2 bytes)
                                                           ; 9...PK..........
00000230h: 5A 00 00 00 --- size of the central directory(4 bytes)
                       CA 01 00 00 --- offset of start of central directory with respect to the starting disk number(4 bytes)
                                   00 00 --- .ZIP file comment length(2 bytes)
                                                           ; Z.........
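
The sizes and offsets in the three records can be cross-checked with simple arithmetic. The sketch below assumes an archive with exactly one entry, no data descriptor, and no per-file comment, which matches the sample; `expected_layout` is a hypothetical helper name.

```python
# Sizes of the fixed parts of each record, in bytes.
LOCAL_FIXED, CENTRAL_FIXED, EOCD_FIXED = 30, 46, 22


def expected_layout(name_len, local_extra_len, compressed_size,
                    central_extra_len, comment_len=0):
    """Offsets for a single-entry archive without a data descriptor."""
    data_start = LOCAL_FIXED + name_len + local_extra_len
    cd_offset = data_start + compressed_size
    cd_size = CENTRAL_FIXED + name_len + central_extra_len
    eocd_offset = cd_offset + cd_size
    file_size = eocd_offset + EOCD_FIXED + comment_len
    return data_start, cd_offset, cd_size, eocd_offset, file_size


# Field values read from the sample dump above.
layout = expected_layout(8, 0, 0x1A4, 0x24)
print([hex(v) for v in layout])
# ['0x26', '0x1ca', '0x5a', '0x224', '0x23a']
```

The results agree with the dump: file data starts at 0x26, the central directory starts at 0x1CA and is 0x5A bytes long, and the EOCD record starts at 0x224.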

4. Attachments

The sample files used in this article are available for download: http://download.csdn.net/detail/u013344915/8839437

The download includes a version of the analysis in which the fields are distinguished by font color.



5. Other

A build of UltraEdit that works on 64-bit Windows: http://download.csdn.net/detail/leandzgc/5380771

