List of File Format Signatures

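I've recently been working on a project involving compression and decompression, which has to handle documents in many different formats. How does a parser know which format a file is in? Mainly from the file's binary header, known as its file signature.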
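
To look at a file's header the way the list below presents it (hex bytes plus an ISO 8859-1 rendering), a minimal Python sketch might read the first bytes and print them in both forms. The helper names `read_header`, `hex_view` and `latin1_view` are hypothetical, and showing control bytes as '.' is an assumed display convention.

```python
def read_header(path: str, length: int = 16) -> bytes:
    """Return the first `length` bytes of the file (fewer if the file is shorter)."""
    with open(path, "rb") as f:
        return f.read(length)


def hex_view(data: bytes) -> str:
    """Render bytes as space-separated upper-case hex, e.g. b"PK" -> "50 4B"."""
    return " ".join(f"{b:02X}" for b in data)


def latin1_view(data: bytes) -> str:
    """Render bytes as ISO 8859-1 text, showing control bytes as '.'."""
    # 0x00-0x1F, 0x7F and 0x80-0x9F are control characters in ISO 8859-1.
    return "".join(chr(b) if 0x20 <= b <= 0x7E or b >= 0xA0 else "." for b in data)
```

For example, `hex_view(read_header("app.apk", 4))` on a (hypothetical) APK file should give "50 4B 03 04".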

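For example, to tell whether a file is an APK (which, like other zip-based formats, is a zip archive), read its first four bytes and check that they are "50 4B 03 04".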
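
Because JAR, DOCX and APK files all share the zip signature, those four bytes only establish "this is a zip archive". A minimal two-step sketch in Python checks the signature and then looks inside the archive. Treating "contains AndroidManifest.xml at the root" as the APK test is an assumption for illustration, not an official rule, and `looks_like_zip` / `looks_like_apk` are hypothetical helper names.

```python
import io
import zipfile

# The three zip signatures given in the signature table (normal, empty, spanned).
ZIP_SIGNATURES = (b"PK\x03\x04", b"PK\x05\x06", b"PK\x07\x08")


def looks_like_zip(data: bytes) -> bool:
    """Step 1: check only the 4-byte signature."""
    return data[:4] in ZIP_SIGNATURES


def looks_like_apk(data: bytes) -> bool:
    """Step 2 (assumed heuristic): a zip whose root holds AndroidManifest.xml."""
    if not looks_like_zip(data):
        return False
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            return "AndroidManifest.xml" in zf.namelist()
    except zipfile.BadZipFile:
        return False
```

The signature check is cheap and rejects most non-zip files before the more expensive archive parsing runs.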

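Likewise, the dex file you get when decompiling an APK begins with an 8-byte magic, "64 65 78 0A 30 33 35 00" ("dex\n035\0"). The last four bytes encode the format version, so newer files may carry 037, 038 or 039 instead of 035. Other formats work in a similar way.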
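
A dex header check can be sketched the same way. This Python sketch assumes the magic layout "dex\n" + three ASCII digits + "\0" and reports the version digits instead of hard-coding 035. `dex_version` is a hypothetical helper name.

```python
from typing import Optional


def dex_version(header: bytes) -> Optional[str]:
    """Return the dex format version (e.g. "035"), or None if this is not a dex header.

    Assumed layout of the 8-byte magic: b"dex\\n" + three ASCII digits + b"\\0".
    """
    if len(header) < 8 or header[:4] != b"dex\n" or header[7] != 0:
        return None
    digits = header[4:7]
    if not digits.isdigit():
        return None
    return digits.decode("ascii")
```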

For now, here is a copy of the file signature list for quick lookup:

Hex signature | ISO 8859-1 | Offset | File extension | Description
00 | . | 0 | PIC | IBM Storyboard bitmap file
00 | . | 0 | PIF | Windows Program Information File
00 | . | 0 | SEA | Mac Stuffit Self-Extracting Archive
00 | . | 0 | YTR | IRIS OCR data file
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | ........................ | 11 | PDB | PalmPilot Database/Document File
00 00 00 nn 66 74 79 70 33 67 70 | ....ftyp3gp | 0 | 3GG, 3GP, 3G2 | 3rd Generation Partnership Project 3GPP (nn=0x14) and 3GPP2 (nn=0x20) multimedia files
00 00 00 nn 66 74 79 70 33 67 70 35 | ....ftyp3gp5 | 0 | MP4 | MPEG-4 video files
00 00 01 00 | .... | 0 | ico | Computer icon encoded in ICO file format[1]
00 01 00 00 | .... | 0 | - | Palm Desktop Data File (Access format)
00 01 42 44 | ..BD | 0 | DBA | Palm Desktop To Do Archive
00 01 44 54 | ..DT | 0 | TDA | Palm Desktop Calendar Archive
05 07 00 00 42 4F 42 4F 05 07 00 00 00 00 00 00 00 00 00 00 00 01 | ....BOBO.............. | 0 | cwk | AppleWorks 5 document
06 07 E1 00 42 4F 42 4F 06 07 E1 00 00 00 00 00 00 00 00 00 00 01 | ....BOBO.............. | 0 | cwk | AppleWorks 6 document
1F 9D | .. | 0 | z, tar.z | Compressed file (often tar zip) using the Lempel-Ziv-Welch algorithm
1F A0 | .. | 0 | z, tar.z | Compressed file (often tar zip) using the LZH algorithm
24 53 44 49 30 30 30 31 | $SDI0001 | 0 | - | System Deployment Image, a disk image format used by Microsoft
25 21 50 53 | %!PS | 0 | ps | PostScript document
25 50 44 46 | %PDF | 0 | pdf | PDF document
30 26 B2 75 8E 66 CF 11 A6 D9 00 AA 00 62 CE 6C | 0&²u.fÏ.¦Ù.ª.bÎl | 0 | asf, wma, wmv | Advanced Systems Format[8]
38 42 50 53 | 8BPS | 0 | psd | Photoshop Document file, Adobe Photoshop's native file format
41 47 44 33 | AGD3 | 0 | fh8 | FreeHand 8 document[18][19][20]
42 4D | BM | 0 | bmp, dib | BMP file, a bitmap format used mostly in the Windows world
42 5A 68 | BZh | 0 | bz2 | Compressed file using the Bzip2 algorithm
43 44 30 30 31 | CD001 | 0x8001, 0x8801 or 0x9001 | iso | ISO 9660 CD/DVD image file[9]
43 72 32 34 | Cr24 | 0 | crx | Google Chrome extension[16] or packaged app[17]
45 52 02 00 00 00 or 8B 45 52 02 00 00 00 | ER.... or .ER.... | 0 | toast | Roxio Toast disc image file; some .dmg files also begin with the same bytes
46 4F 52 4D nn nn nn nn 38 53 56 58 | FORM....8SVX | 0, any | 8svx, 8sv, svx, snd, iff | IFF 8-Bit Sampled Voice
46 4F 52 4D nn nn nn nn 41 43 42 4D | FORM....ACBM | 0, any | acbm, iff | Amiga Contiguous Bitmap
46 4F 52 4D nn nn nn nn 41 49 46 46 | FORM....AIFF | 0, any | aiff, aif, aifc, snd, iff | Audio Interchange File Format
46 4F 52 4D nn nn nn nn 41 4E 42 4D | FORM....ANBM | 0, any | anbm, iff | IFF Animated Bitmap
46 4F 52 4D nn nn nn nn 41 4E 49 4D | FORM....ANIM | 0, any | anim, iff | IFF CEL Animation
46 4F 52 4D nn nn nn nn 43 4D 55 53 | FORM....CMUS | 0, any | cmus, mus, iff | IFF Musical Score
46 4F 52 4D nn nn nn nn 46 41 4E 54 | FORM....FANT | 0, any | iff | Amiga Fantavision Movie
46 4F 52 4D nn nn nn nn 46 41 58 58 | FORM....FAXX | 0, any | faxx, fax, iff | IFF Facsimile Image
46 4F 52 4D nn nn nn nn 46 54 58 54 | FORM....FTXT | 0, any | ftxt, txt, iff | IFF Formatted Text
46 4F 52 4D nn nn nn nn 49 4C 42 4D | FORM....ILBM | 0, any | ilbm, lbm, ibm, iff | IFF Interleaved Bitmap Image
46 4F 52 4D nn nn nn nn 53 4D 55 53 | FORM....SMUS | 0, any | smus, smu, mus, iff | IFF Simple Musical Score
46 4F 52 4D nn nn nn nn 59 55 56 4E | FORM....YUVN | 0, any | yuvn, yuv, iff | IFF YUV Image
47 49 46 38 37 61 or 47 49 46 38 39 61 | GIF87a or GIF89a | 0 | gif | Image file encoded in the Graphics Interchange Format (GIF)[2]
49 44 33 | ID3 | 0 | mp3 | MP3 file with an ID3v2 container
49 49 2A 00 (little-endian) or 4D 4D 00 2A (big-endian) | II*. or MM.* | 0 | tif, tiff | Tagged Image File Format
4B 44 4D | KDM | 0 | vmdk | VMDK files[14][15]
4D 54 68 64 | MThd | 0 | mid, midi | MIDI sound file[12]
4D 5A | MZ | 0 | exe | DOS MZ executable file format and its descendants (including NE and PE)
4E 45 53 1A | NES. | 0 | nes | Nintendo Entertainment System ROM file[25]
4F 67 67 53 | OggS | 0 | ogg, oga, ogv | Ogg, an open-source media container format
50 4B 03 04, 50 4B 05 06 (empty archive) or 50 4B 07 08 (spanned archive) | PK.. | 0 | zip, jar, odt, ods, odp, docx, xlsx, pptx, apk | zip file format and formats based on it, such as JAR, ODF and OOXML
50 4D 4F 43 43 4D 4F 43 | PMOCCMOC | 0 | dat | Windows Files And Settings Transfer Repository[22]; see also the USMT 3.0 (Win XP)[23] and USMT 4.0 (Win 7)[24] User Guides
52 49 46 46 nn nn nn nn 57 41 56 45 | RIFF....WAVE | 0 | wav | Waveform Audio File Format
52 61 72 21 1A 07 00 | Rar!... | 0 | rar | RAR archive version 1.50 onwards[3]
52 61 72 21 1A 07 01 00 | Rar!.... | 0 | rar | RAR archive version 5.0 onwards[4]
53 44 50 58 (big-endian) or 58 50 44 53 (little-endian) | SDPX or XPDS | 0 | dpx | SMPTE DPX image
53 49 4D 50 4C 45 20 20 3D 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 54 | SIMPLE  =                    T | 0 | fits | Flexible Image Transport System (FITS)[10]
64 65 78 0A 30 33 35 00 | dex.035. | 0 | dex | Dalvik Executable
66 4C 61 43 | fLaC | 0 | flac | Free Lossless Audio Codec[11]
75 73 74 61 72 00 30 30 or 75 73 74 61 72 20 20 00 | ustar.00 or ustar  . | 257 | tar | tar archive[26]
76 2F 31 01 | v/1. | 0 | exr | OpenEXR image
78 01 73 0D 62 62 60 | x.s.bb` | 0 | dmg | Apple Disk Image file
78 61 72 21 | xar! | 0 | xar | eXtensible ARchive format[21]
7F 45 4C 46 | .ELF | 0 | - | Executable and Linkable Format
80 2A 5F D7 | .*_. | 0 | cin | Kodak Cineon image
89 50 4E 47 0D 0A 1A 0A | .PNG.... | 0 | png | Image encoded in the Portable Network Graphics format[5]
BE BA FE CA | ¾ºþÊ | 0 | DBA | Palm Desktop Calendar Archive
CA FE BA BE | Êþº¾ | 0 | class | Java class file; Mach-O Fat Binary
CE FA ED FE | Îúíþ | 0 | - | Mach-O binary (reverse byte ordering scheme, 32-bit)[6]
CF FA ED FE | Ïúíþ | 0 | - | Mach-O binary (reverse byte ordering scheme, 64-bit)[7]
D0 CF 11 E0 A1 B1 1A E1 | ÐÏ.à¡±.á | 0 | doc, xls, ppt | Microsoft Office documents[13]
EF BB BF | ï»¿ | 0 | - | UTF-8 encoded Unicode byte order mark, commonly seen in text files
FE ED FA CE | þíúÎ | 0 or typically 0x1000 | - | Mach-O binary (32-bit)
FE ED FA CF | þíúÏ | 0 or typically 0x1000 | - | Mach-O binary (64-bit)
FF D8 FF | ÿØÿ | 0 | jpg, jpeg | JPEG
FF FB | ÿû | 0 | mp3 | MPEG-1 Layer 3 file without an ID3 tag or with an ID3v1 tag (which is appended at the end of the file)
FF FE | .. | 0 | - | Byte order mark for text files encoded in little-endian 16-bit Unicode Transformation Format (UTF-16LE)
FF FE 00 00 | .... | 0 | - | Byte order mark for text files encoded in little-endian 32-bit Unicode Transformation Format (UTF-32LE)
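
The Offset column and the "nn" placeholders matter when the table is turned into code: tar is checked at byte 257, ISO 9660 at 0x8001, 0x8801 or 0x9001, and RIFF/IFF containers carry four length bytes (written nn) before the form type. Below is a minimal table-driven sketch in Python, assuming the file contents (or at least the first 0x9006 bytes) are already in memory. `Signature`, `SIGNATURES`, `matches` and `identify` are hypothetical names, and only a handful of rows from the table are included.

```python
from typing import NamedTuple, Optional, Tuple


class Signature(NamedTuple):
    pattern: str  # hex bytes as written in the table; "nn" matches any byte
    offset: int   # byte position where the pattern must start
    name: str     # label returned on a match


# A small, illustrative subset of the table rows (not the full list).
SIGNATURES: Tuple[Signature, ...] = (
    Signature("89 50 4E 47 0D 0A 1A 0A", 0, "png"),
    Signature("25 50 44 46", 0, "pdf"),
    Signature("47 49 46 38 37 61", 0, "gif"),
    Signature("47 49 46 38 39 61", 0, "gif"),
    Signature("50 4B 03 04", 0, "zip"),
    Signature("52 49 46 46 nn nn nn nn 57 41 56 45", 0, "wav"),
    Signature("46 4F 52 4D nn nn nn nn 41 49 46 46", 0, "aiff"),
    Signature("75 73 74 61 72 00 30 30", 257, "tar"),
    Signature("75 73 74 61 72 20 20 00", 257, "tar"),
    Signature("43 44 30 30 31", 0x8001, "iso"),
    Signature("43 44 30 30 31", 0x8801, "iso"),
    Signature("43 44 30 30 31", 0x9001, "iso"),
)


def matches(data: bytes, sig: Signature) -> bool:
    """True if `data` holds `sig.pattern` at `sig.offset`, treating "nn" as a wildcard."""
    tokens = sig.pattern.split()
    window = data[sig.offset:sig.offset + len(tokens)]
    if len(window) < len(tokens):  # data too short to contain the pattern
        return False
    return all(tok == "nn" or int(tok, 16) == byte for tok, byte in zip(tokens, window))


def identify(data: bytes) -> Optional[str]:
    """Return the name of the first matching signature, or None."""
    for sig in SIGNATURES:
        if matches(data, sig):
            return sig.name
    return None
```

Since `identify` returns the first hit, rows with short, common prefixes (such as 4D 5A or 42 4D) are better placed after longer ones if they are added. A "zip" result also says nothing about whether the archive is an APK, JAR or DOCX.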

References:

1. http://en.wikipedia.org/wiki/List_of_file_signatures
2. http://www.astro.keele.ac.uk/oldusers/rno/Computing/File_magic.html#Exec
