http://blog.csdn.net/vctrue/article/details/8855034
dos exe file structure
offset size description
00     word "mz" - link file .exe signature (mark zbikowski?)
02     word length of image mod 512
04     word size of file in 512 byte pages
06     word number of relocation items following header
08     word size of header in 16 byte paragraphs, used to locate the beginning of the load module
0a     word min # of paragraphs needed to run program
0c     word max # of paragraphs the program would like
0e     word offset in load module of stack segment (in paras)
10     word initial sp value to be loaded
12     word negative checksum of pgm, used by exec while it loads pgm
14     word program entry point, (initial ip value)
16     word offset in load module of the code segment (in paras)
18     word offset in .exe file of first relocation item
1a     word overlay number (0 for root program)
- relocation table and the program load module follow the header
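- relocation entries are 32-bit values (a 16-bit offset followed by a 16-bit segment) giving the location in the load module that needs to be patched
- once the relocatable item is found, the segment address at which the load module was loaded (the start segment) is added to the word found at the calculated offset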
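the header layout above maps directly onto a fixed-size little-endian record. the following is a minimal sketch in python, assuming the file has already been read into a bytes object. the names `MzHeader`, `parse_mz_header` and `load_module_offset` are illustrative and do not come from the original listing. note that the signature bytes on disk are uppercase ascii, although this text shows them in lowercase.

```python
import struct
from collections import namedtuple

# field names are hypothetical; order and sizes follow the offset table above
MzHeader = namedtuple("MzHeader", [
    "signature", "last_page_bytes", "pages", "reloc_count",
    "header_paragraphs", "min_alloc", "max_alloc", "init_ss",
    "init_sp", "checksum", "init_ip", "init_cs",
    "reloc_table_offset", "overlay_number",
])

def parse_mz_header(data: bytes) -> MzHeader:
    """unpack the first 1ch bytes of an .exe file (intel byte order)."""
    if len(data) < 0x1C:
        raise ValueError("too short for an mz header")
    # 2-byte signature followed by 13 little-endian words
    header = MzHeader(*struct.unpack_from("<2s13H", data, 0))
    if header.signature not in (b"MZ", b"ZM"):
        raise ValueError("not an mz executable")
    return header

def load_module_offset(header: MzHeader) -> int:
    """file offset of the load module; the header size is in 16-byte paragraphs."""
    return header.header_paragraphs * 16
```

a caller would typically read the whole file, call `parse_mz_header`, and then slice the load module starting at `load_module_offset(header)`.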
registers at load time of the exe file are as follows:
ax     contains number of characters in command tail, or 0
bx:cx  32 bit value indicating the load module memory size
dx     zero
ss:sp  set to stack segment if defined, else ss = cs and sp = ffffh or top of memory
ds     set to segment address of the program segment prefix (psp)
es     set to segment address of the program segment prefix (psp)
cs:ip  far address of program entry point (label on "end" statement of program)
exe format
mz exe format
intel byte order
information from file format list 2.0 by max maischein.
the old exe files are the exe files executed directly by ms-dos. they were a
major improvement over the old 64k com files, since exe files can span multiple
segments. an exe file consists of three different parts, the header, the
relocation table and the binary code.
many programs extend the header to store their copyright information in the
executable; some of these extensions are documented below.
the format of the header is as follows :
offset count type  description
0000h  2     char  id='mz'
                   id='zm'
0002h  1     word  number of bytes in last 512-byte page of executable
0004h  1     word  total number of 512-byte pages in executable (including the last page)
0006h  1     word  number of relocation entries
0008h  1     word  header size in paragraphs
000ah  1     word  minimum paragraphs of memory allocated in addition to the code size
000ch  1     word  maximum number of paragraphs allocated in addition to the code size
000eh  1     word  initial ss relative to start of executable
0010h  1     word  initial sp
0012h  1     word  checksum (or 0) of executable
0014h  1     dword cs:ip relative to start of executable (entry point)
0018h  1     word  offset of relocation table; 40h for new-(ne,le,lx,w3,pe etc.) executable
001ah  1     word  overlay number (0h = main program)
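the two size fields at 0002h and 0004h combine into a byte count. the sketch below assumes the common convention that a last-page value of 0 means the final page is completely used; the table does not state this, so treat it as an assumption. the function names are hypothetical.

```python
def image_size(pages: int, last_page_bytes: int) -> int:
    """size in bytes of the mz image (header plus load module)."""
    if pages == 0:
        return 0
    if last_page_bytes == 0:
        # assumption: 0 means the last 512-byte page is full
        return pages * 512
    # the page count includes the partially filled last page
    return (pages - 1) * 512 + last_page_bytes

def load_module_size(pages: int, last_page_bytes: int, header_paragraphs: int) -> int:
    """size of the binary code alone, without the header."""
    return image_size(pages, last_page_bytes) - header_paragraphs * 16
```

anything in the file beyond `image_size` bytes (overlays, appended archives, debug data) is not part of the image that ms-dos loads.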
following are the header expansions used by some other programs such as tlink,
lzexe and other linkers, encryptors and compressors; all offsets are relative to
the start of the whole header :
---new executable
offset count type description
001ch 4 byte ????
0020h 1 word behaviour bits ??
0022h 26 byte reserved (0)
003ch 1 dword offset of new executable header from start of
file (or 0 if plain mz executable)
---borland tlink
offset count type description
001ch 2 byte ?? (apparently always 01h 00h)
001eh 1 byte id=0fbh
001fh 1 byte tlink version, major in high nybble
0020h 2 byte ??
---old arj self-extracting archive
offset count type description
001ch 4 char id='rjsx' (older versions;
new signature is 'arjsf' in the first 1000
bytes of the file)
---lzexe compressed executable
offset count type description
001ch 2 char id='lz'
001eh 2 char version number :
'09' - lzexe 0.90
'91' - lzexe 0.91
---pklite compressed executable
offset count type description
001ch 1 byte minor version number
001dh 1 byte bit mapped :
0-3 - major version
4 - extra compression
5 - multi-segment file
001eh 6 char id='pklite'
---lharc 1.x self-extracting archive
offset count type description
001ch 4 byte unused???
0020h 3 byte jump to start of extraction code
0023h 2 byte ???
0025h 12 char id='lharc's sfx '
---lha 2.x self-extracting archive
offset count type description
001ch 8 byte ???
0024h 10 char id='lha's sfx '
for version 2.10
id='lha's sfx ' (v2.13)
for version 2.13
---lh self-extracting archive
offset count type description
001ch 8 byte ???
0024h 8 byte id='lh's sfx '
---topspeed c 3.0 crunch compressed file
offset count type description
001ch 1 dword id=018a0001h
0020h 1 word id=1565h
---pkarc 3.5 self-extracting archive
offset count type description
001ch 1 dword id=00020001h
0020h 1 word id=0700h
---bsa (soviet archiver) self-extracting archive
offset count type description
001ch 1 word id=000fh
001eh 1 byte id=a7h
---larc self-extracting archive
offset count type description
001ch 4 byte ???
0020h 11 byte id='sfx by larc '
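a few of the extensions listed above can be told apart with simple comparisons at fixed offsets. this is only a sketch covering some of the entries, and `identify_extension` is a hypothetical name. text signatures are compared case-insensitively because this listing is lowercased and may not preserve the case of the bytes on disk.

```python
def identify_extension(data: bytes) -> str:
    """guess which tool wrote the bytes that follow the standard mz header."""
    def text(offset: int, length: int) -> bytes:
        # case-insensitive view of a signature field
        return data[offset:offset + length].lower()

    def number(offset: int, length: int) -> int:
        # little-endian integer; an empty slice yields 0
        return int.from_bytes(data[offset:offset + length], "little")

    if text(0x1C, 2) == b"lz":
        versions = {b"09": "lzexe 0.90", b"91": "lzexe 0.91"}
        return versions.get(text(0x1E, 2), "lzexe (unknown version)")
    if text(0x1E, 6) == b"pklite":
        return "pklite"
    if text(0x1C, 4) == b"rjsx":
        return "arj sfx (old)"
    if number(0x1C, 4) == 0x018A0001 and number(0x20, 2) == 0x1565:
        return "topspeed crunch"
    if number(0x1C, 4) == 0x00020001 and number(0x20, 2) == 0x0700:
        return "pkarc 3.5 sfx"
    if data[0x1E:0x1F] == b"\xfb":
        return "borland tlink"
    return "unknown"
```

the single-byte tlink test is checked last because it is the weakest signature in the list.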
after the header come the relocation items, which allow the code to span
multiple segments. the relocation items have the following format :
offset count type description
0000h 1 word offset within segment
0002h 1 word segment of relocation
to get the position of the relocation within the file, compute the physical
address from the segment:offset pair by multiplying the segment by 16 and adding
the offset, then add the file offset of the binary start (the header size in
paragraphs times 16). note that the raw binary code starts on a paragraph
boundary within the executable file. all segments are relative to the start of
the executable in memory, and this value must be added to every segment if
relocation is done manually.
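the address arithmetic can be sketched as follows. this is a minimal illustration that assumes the whole file is in memory; `relocation_file_offsets` and `apply_relocations` are hypothetical names, and `load_segment` stands for the segment at which the binary code would be placed.

```python
import struct

def relocation_file_offsets(data: bytes) -> list:
    """file position of every word that needs patching."""
    reloc_count, header_paragraphs = struct.unpack_from("<HH", data, 0x06)
    (table_offset,) = struct.unpack_from("<H", data, 0x18)
    binary_start = header_paragraphs * 16
    positions = []
    for i in range(reloc_count):
        # each item: word offset within segment, then word segment
        offset, segment = struct.unpack_from("<HH", data, table_offset + 4 * i)
        positions.append(binary_start + segment * 16 + offset)
    return positions

def apply_relocations(data: bytes, load_segment: int) -> bytearray:
    """return a copy with load_segment added to every relocated word."""
    patched = bytearray(data)
    for pos in relocation_file_offsets(data):
        (value,) = struct.unpack_from("<H", patched, pos)
        # segment arithmetic wraps at 16 bits
        struct.pack_into("<H", patched, pos, (value + load_segment) & 0xFFFF)
    return patched
```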
extension:exe,ovr,ovl
occurrences:pc
programs:ms-dos
reference:ralf brown's interrupt list
see also:com,exe,ne exe
executable-file format
.exe executable-file header format (3.1)
an executable (.exe) file for the microsoft windows operating system contains a combination of code and data or a combination
of code, data, and resources. the executable file also contains two headers: an ms-dos header and a windows header. the
next two sections describe these headers; the third section describes the code and data contained in a windows executable file.
ms-dos header
the ms-dos (old-style) executable-file header contains four distinct parts: a collection of header information (such as the
signature word, the file size, and so on), a reserved section, a pointer to a windows header (if one exists), and a stub program.
the following illustration shows the ms-dos executable-file header:
if the word value at offset 18h is 40h or greater, the word value at 3ch is typically an offset to a windows header. applications
must verify this for each executable-file header being tested, because a few applications have a different header style.
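the check described above can be sketched like this. the function names are hypothetical, and the value at 3ch is read as a 32-bit quantity, following the "new executable" table earlier in this document rather than the "word value" wording of this paragraph. as the text warns, the result is only a candidate offset and the signature found there still has to be verified.

```python
import struct

def new_header_offset(data: bytes):
    """candidate file offset of the windows header, or None."""
    if len(data) < 0x40 or data[:2] not in (b"MZ", b"ZM"):
        return None
    (reloc_table_offset,) = struct.unpack_from("<H", data, 0x18)
    if reloc_table_offset < 0x40:
        return None  # plain ms-dos executable
    (offset,) = struct.unpack_from("<I", data, 0x3C)
    if offset == 0 or offset + 2 > len(data):
        return None  # pointer missing or outside the file
    return offset

def new_header_signature(data: bytes):
    """two signature bytes at the candidate offset, or None."""
    offset = new_header_offset(data)
    return None if offset is None else data[offset:offset + 2]
```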
ms-dos uses the stub program to display a message if windows has not been loaded when the user attempts to run a
program.
for more information about the ms-dos executable-file header, see the microsoft ms-dos programmer's reference
(redmond, washington: microsoft press, 1991).
windows header
the windows (new-style) executable-file header contains information that the loader requires for segmented executable files.
this information includes the linker version number, data specified by the linker, data specified by the resource compiler, tables
of segment data, tables of resource data, and so on. the following illustration shows the windows executable-file header:
the following sections describe the entries in the windows executable-file header.
information block
the information block in the windows header contains the linker version number, the lengths of various tables that further
describe the executable file, the offsets from the beginning of the header to the beginning of these tables, the heap and stack
sizes, and so on. the following list summarizes the contents of the header information block (the locations are relative to the
beginning of the block):
location description
00h specifies the signature word. the low byte contains "n" (4eh) and the high byte contains "e" (45h).
02h specifies the linker version number.
03h specifies the linker revision number.
04h specifies the offset to the entry table (relative to the beginning of the header).
06h specifies the length of the entry table, in bytes.
08h reserved.
0ch specifies flags that describe the contents of the executable file. this value can be one or more of the following bits:
bit meaning
0 the linker sets this bit if the executable-file format is singledata. an executable file with this format
contains one data segment. this bit is set if the file is a dynamic-link library (dll).
1 the linker sets this bit if the executable-file format is multipledata. an executable file with this format
contains multiple data segments. this bit is set if the file is a windows application.
if neither bit 0 nor bit 1 is set, the executable-file format is noautodata. an executable file with this format
does not contain an automatic data segment.
2 reserved.
3 reserved.
8 reserved.
9 reserved.
11 if this bit is set, the first segment in the executable file contains code that loads the application.
13 if this bit is set, the linker detects errors at link time but still creates an executable file.
14 reserved.
15 if this bit is set, the executable file is a library module.
if bit 15 is set, the cs:ip registers point to an initialization procedure called with the value in the ax register
equal to the module handle. the initialization procedure must execute a far return to the caller. if the
procedure is successful, the value in ax is nonzero. otherwise, the value in ax is zero.
the value in the ds register is set to the library's data segment if singledata is set. otherwise, ds is set
to the data segment of the application that loads the library.
0eh specifies the automatic data segment number. (0eh is zero if the singledata and multipledata bits are
cleared.)
10h specifies the initial size, in bytes, of the local heap. this value is zero if there is no local allocation.
12h specifies the initial size, in bytes, of the stack. this value is zero if the ss register value does not equal the ds
register value.
14h specifies the segment:offset value of cs:ip.
18h specifies the segment:offset value of ss:sp.
the value specified in ss is an index to the module's segment table. the first entry in the segment table
corresponds to segment number 1.
if ss addresses the automatic data segment and sp is zero, sp is set to the address obtained by adding the size of
the automatic data segment to the size of the stack.
1ch specifies the number of entries in the segment table.
1eh specifies the number of entries in the module-reference table.
20h specifies the number of bytes in the nonresident-name table.
22h specifies a relative offset from the beginning of the windows header to the beginning of the segment table.
24h specifies a relative offset from the beginning of the windows header to the beginning of the resource table.
26h specifies a relative offset from the beginning of the windows header to the beginning of the resident-name table.
28h specifies a relative offset from the beginning of the windows header to the beginning of the module-reference table.
2ah specifies a relative offset from the beginning of the windows header to the beginning of the imported-name table.
2ch specifies a relative offset from the beginning of the file to the beginning of the nonresident-name table.
30h specifies the number of movable entry points.
32h specifies a shift count that is used to align the logical sector. this count is log2 of the segment sector size. it is
typically 4, although the default count is 9. (this value corresponds to the /alignment [/a] linker switch. when the
linker command line contains /a:16, the shift count is 4. when the linker command line contains /a:512, the shift
count is 9.)
34h specifies the number of resource segments.
36h specifies the target operating system, depending on which bits are set:
bit meaning
0 operating system format is unknown.
1 reserved.
2 operating system is microsoft windows.
3 reserved.
4 reserved.
37h specifies additional information about the executable file. it can be one or more of the following values:
bit meaning
1 if this bit is set, the executable file contains a windows 2.x application that runs in version 3.x protected
mode.
2 if this bit is set, the executable file contains a windows 2.x application that supports proportional fonts.
3 if this bit is set, the executable file contains a fast-load area.
38h specifies the offset, in sectors, to the beginning of the fast-load area. (only windows uses this value.)
3ah specifies the length, in sectors, of the fast-load area. (only windows uses this value.)
3ch reserved.
3eh specifies the expected version number for windows. (only windows uses this value.)
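a few of the locations listed above are enough to find the other tables. the sketch below reads them from a file image; `parse_ne_info` and the dictionary keys are hypothetical names, and `ne_offset` is assumed to be the file offset of the windows header taken from the ms-dos header.

```python
import struct

def parse_ne_info(data: bytes, ne_offset: int) -> dict:
    """read selected fields of the information block."""
    if data[ne_offset:ne_offset + 2] != b"NE":
        raise ValueError("no ne signature at this offset")

    def word(location: int) -> int:
        return struct.unpack_from("<H", data, ne_offset + location)[0]

    flags = word(0x0C)
    if flags & 0x0001:
        data_model = "singledata"
    elif flags & 0x0002:
        data_model = "multipledata"
    else:
        data_model = "noautodata"
    return {
        "data_model": data_model,
        "is_library": bool(flags & 0x8000),  # bit 15
        "segment_count": word(0x1C),
        # table offsets are relative to the header, so make them absolute
        "segment_table": ne_offset + word(0x22),
        "resource_table": ne_offset + word(0x24),
        "resident_names": ne_offset + word(0x26),
        # the nonresident-name table offset is already relative to the file
        "nonresident_names": struct.unpack_from("<I", data, ne_offset + 0x2C)[0],
        "align_shift": word(0x32),
    }
```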
segment table
the segment table contains information that describes each segment in an executable file. this information includes the
segment length, segment type, and segment-relocation data. the following list summarizes the values found in the segment
table (the locations are relative to the beginning of each entry):
location description
00h specifies the offset, in sectors, to the segment data (relative to the beginning of the file). a value of zero means no
data exists.
02h specifies the length, in bytes, of the segment, in the file. a value of zero indicates that the segment length is 64k,
unless the selector offset is also zero.
04h specifies flags that describe the contents of the segment. this value can be one or more of the following:
bit meaning
0 if this bit is set, the segment is a data segment. otherwise, the segment is a code segment.
1 if this bit is set, the loader has allocated memory for the segment.
2 if this bit is set, the segment is loaded.
3 reserved.
4 if this bit is set, the segment type is movable. otherwise, the segment type is fixed.
5 if this bit is set, the segment type is pure or shareable. otherwise, the segment type is impure or
nonshareable.
6 if this bit is set, the segment type is preload. otherwise, the segment type is loadoncall.
7 if this bit is set and the segment is a code segment, the segment type is executeonly. if this bit is set
and the segment is a data segment, the segment type is readonly.
8 if this bit is set, the segment contains relocation data.
9 reserved.
10 reserved.
11 reserved.
12 if this bit is set, the segment is discardable.
13 reserved.
14 reserved.
15 reserved.
06h specifies the minimum allocation size of the segment, in bytes. a value of zero indicates that the minimum allocation
size is 64k.
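each segment-table entry is 8 bytes long, so one entry can be decoded as sketched below. `parse_segment_entry` is a hypothetical name and `align_shift` is the shift count from location 32h of the information block. the sketch reads "selector offset" in the text as the sector offset at 00h, which is an interpretation and not something the text spells out.

```python
import struct

def parse_segment_entry(entry: bytes, align_shift: int) -> dict:
    """decode one 8-byte segment-table entry."""
    sector, length, flags, min_alloc = struct.unpack_from("<4H", entry, 0)
    has_data = sector != 0  # a sector offset of zero means no data in the file
    return {
        "file_offset": sector << align_shift if has_data else None,
        # a stored length of zero means 64k when the segment has file data
        "file_length": (length or 0x10000) if has_data else 0,
        "is_data": bool(flags & 0x0001),
        "movable": bool(flags & 0x0010),
        "preload": bool(flags & 0x0040),
        "has_relocations": bool(flags & 0x0100),
        "discardable": bool(flags & 0x1000),
        # a minimum allocation of zero also means 64k
        "min_alloc": min_alloc or 0x10000,
    }
```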
resource table
the resource table describes and identifies the location of each resource in the executable file. the table has the following form:
word rscalignshift;
typeinfo rsctypes[];
word rscendtypes;
byte rscresourcenames[];
byte rscendnames;
following are the members in the resource table:
rscalignshift specifies the alignment shift count for resource data. when the shift count is used as an exponent of 2,
the resulting value specifies the factor, in bytes, for computing the location of a resource in the
executable file.
rsctypes specifies an array of typeinfo structures containing information about resource types. there must
be one typeinfo structure for each type of resource in the executable file.
rscendtypes specifies the end of the resource type definitions. this member must be zero.
rscresourcenames specifies the names (if any) associated with the resources in this table. each name is stored as
consecutive bytes; the first byte specifies the number of characters in the name.
rscendnames specifies the end of the resource names and the end of the resource table. this member must be
zero.
type information
the typeinfo structure has the following form:
typedef struct _typeinfo {
word rttypeid;
word rtresourcecount;
dword rtreserved;
nameinfo rtnameinfo[];
} typeinfo;
following are the members in the typeinfo structure:
rttypeid specifies the type identifier of the resource. this integer value is either a resource-type value or an offset
to a resource-type name. if the high bit in this member is set (0x8000), the value is one of the following
resource-type values:
value resource type
rt_accelerator accelerator table
rt_bitmap bitmap
rt_cursor cursor
rt_dialog dialog box
rt_font font component
rt_fontdir font directory
rt_group_cursor cursor directory
rt_group_icon icon directory
rt_icon icon
rt_menu menu
rt_rcdata resource data
rt_string string table
if the high bit of the value in this member is not set, the value represents an offset, in bytes relative to the
beginning of the resource table, to a name in the rscresourcenames member.
rtresourcecount specifies the number of resources of this type in the executable file.
rtreserved reserved.
rtnameinfo specifies an array of nameinfo structures containing information about individual resources. the
rtresourcecount member specifies the number of structures in the array.
name information
the nameinfo structure has the following form:
typedef struct _nameinfo {
word rnoffset;
word rnlength;
word rnflags;
word rnid;
word rnhandle;
word rnusage;
} nameinfo;
following are the members in the nameinfo structure:
rnoffset specifies an offset to the contents of the resource data (relative to the beginning of the file). the offset is in terms of
alignment units specified by the rscalignshift member at the beginning of the resource table.
rnlength specifies the resource length, in bytes.
rnflags specifies whether the resource is fixed, preloaded, or shareable. this member can be one or more of the following
values:
value meaning
0x0010 resource is movable (moveable). otherwise, it is fixed.
0x0020 resource can be shared (pure).
0x0040 resource is preloaded (preload). otherwise, it is loaded on demand.
rnid specifies or points to the resource identifier. if the identifier is an integer, the high bit is set (8000h). otherwise, it is an
offset to a resource string, relative to the beginning of the resource table.
rnhandle reserved.
rnusage reserved.
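the typeinfo and nameinfo structures can be walked in one pass. the following sketch assumes `table` holds the bytes of the resource table starting at rscalignshift; `walk_resource_table` and `_resolve` are hypothetical helpers. the length is returned exactly as stored, without scaling, following the description of rnlength above.

```python
import struct

def _resolve(table: bytes, value: int):
    """integer ids have the high bit set; otherwise value is an offset to a counted string."""
    if value & 0x8000:
        return value & 0x7FFF
    length = table[value]
    return table[value + 1:value + 1 + length].decode("ascii", "replace")

def walk_resource_table(table: bytes):
    (align_shift,) = struct.unpack_from("<H", table, 0)
    pos = 2
    resources = []
    while True:
        (type_id,) = struct.unpack_from("<H", table, pos)
        if type_id == 0:  # rscendtypes
            break
        count, _reserved = struct.unpack_from("<HI", table, pos + 2)
        pos += 8  # rttypeid + rtresourcecount + rtreserved
        for _ in range(count):
            rn_offset, rn_length, rn_flags, rn_id, _handle, _usage = \
                struct.unpack_from("<6H", table, pos)
            pos += 12
            resources.append({
                "type": _resolve(table, type_id),
                "id": _resolve(table, rn_id),
                # rnoffset is in alignment units, relative to the file start
                "file_offset": rn_offset << align_shift,
                "length": rn_length,
                "flags": rn_flags,
            })
    return align_shift, resources
```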
resident-name table
the resident-name table contains strings that identify exported functions in the executable file. as the name implies, these strings
are resident in system memory and are never discarded. the resident-name strings are case-sensitive and are not
null-terminated. the following list summarizes the values found in the resident-name table (the locations are relative to the
beginning of each entry):
location description
00h specifies the length of a string. if there are no more strings in the table, this value is zero.
01h - xxh specifies the resident-name text. this string is case-sensitive and is not null-terminated.
xxh + 01h specifies an ordinal number that identifies the string. this number is an index into the entry table.
the first string in the resident-name table is the module name.
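reading this table is a simple loop over counted strings. in the sketch below `parse_name_table` is a hypothetical name, and the ordinal is read as a 2-byte word; that size is an assumption here, since the list above does not state it.

```python
import struct

def parse_name_table(table: bytes) -> list:
    """return (name, ordinal) pairs until the zero length byte."""
    names = []
    pos = 0
    while pos < len(table) and table[pos] != 0:
        length = table[pos]
        text = table[pos + 1:pos + 1 + length].decode("ascii", "replace")
        # assumption: the ordinal after the string is a little-endian word
        (ordinal,) = struct.unpack_from("<H", table, pos + 1 + length)
        names.append((text, ordinal))
        pos += 1 + length + 2
    return names
```

the first pair returned is the module name; the remaining pairs name exported entry points.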
module-reference table
the module-reference table contains offsets for module names stored in the imported-name table. each entry in this table is 2
bytes long.
imported-name table
the imported-name table contains the names of modules that the executable file imports. each entry contains two parts: a single
byte that specifies the length of the string and the string itself. the strings in this table are not null-terminated.
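the two tables are used together: each 2-byte entry of the module-reference table selects one counted string in the imported-name table. a minimal sketch follows, with the hypothetical name `imported_modules`, assuming the offsets are relative to the start of the imported-name table.

```python
import struct

def imported_modules(module_refs: bytes, imported_names: bytes) -> list:
    """resolve every module-reference entry to a module name."""
    modules = []
    for (offset,) in struct.iter_unpack("<H", module_refs):
        length = imported_names[offset]  # length byte, then the text
        name = imported_names[offset + 1:offset + 1 + length]
        modules.append(name.decode("ascii", "replace"))
    return modules
```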
entry table
the entry table contains bundles of entry points from the executable file (the linker generates each bundle). the numbering
system for these ordinal values is 1-based--that is, the ordinal value corresponding to the first entry point is 1.
the linker generates the densest possible bundles under the restriction that it cannot reorder the entry points. this restriction is
necessary because other executable files may refer to entry points within a given bundle by their ordinal values.
the entry-table data is organized by bundle, each of which begins with a 2-byte header. the first byte of the header specifies the
number of entries in the bundle (a value of 00h designates the end of the table). the second byte specifies whether the
corresponding segment is movable or fixed. if the value in this byte is 0ffh, the segment is movable. if the value in this byte is
0feh, the entry does not refer to a segment but refers, instead, to a constant defined within the module. if the value in this byte is
neither 0ffh nor 0feh, it is a segment index.
for movable segments, each entry consists of 6 bytes and has the following form:
location description
00h specifies a byte value. this value can be a combination of the following bits:
bit(s) meaning
0 if this bit is set, the entry is exported.
1 if this bit is set, the segment uses a global (shared) data segment.
3-7 if the executable file contains code that performs ring transitions, these bits specify the number of words
that compose the stack. at the time of the ring transition, these words must be copied from one ring to the
other.
01h specifies an int 3fh instruction.
03h specifies the segment number.
04h specifies the segment offset.
for fixed segments, each entry consists of 3 bytes and has the following form:
location description
00h specifies a byte value. this value can be a combination of the following bits:
bit(s) meaning
0 if this bit is set, the entry is exported.
1 if this bit is set, the entry uses a global (shared) data segment. (this may be set only for singledata
library modules.)
3-7 if the executable file contains code that performs ring transitions, these bits specify the number of words
that compose the stack. at the time of the ring transition, these words must be copied from one ring to the
other.
01h specifies an offset.
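the bundle layout can be decoded as sketched below. `parse_entry_table` is a hypothetical name, and entries are returned as (ordinal, segment, offset, flags) tuples. one detail goes beyond the text above and is an assumption of this sketch: a bundle whose second byte is 00h is treated as a run of unused ordinals with no entry data.

```python
import struct

def parse_entry_table(table: bytes) -> list:
    """return (ordinal, segment, offset, flags) for every entry point."""
    entries = []
    pos = 0
    ordinal = 1  # ordinals are 1-based
    while pos < len(table):
        count = table[pos]
        if count == 0:  # end of table
            break
        indicator = table[pos + 1]
        pos += 2
        if indicator == 0x00:
            # assumption: unused ordinals, no entry data follows
            ordinal += count
            continue
        for _ in range(count):
            if indicator == 0xFF:
                # movable: flags, int 3fh, segment number, offset
                flags, _int3f, segment, offset = struct.unpack_from("<BHBH", table, pos)
                pos += 6
            else:
                # fixed segment index, or 0feh for a constant
                flags, offset = struct.unpack_from("<BH", table, pos)
                segment = None if indicator == 0xFE else indicator
                pos += 3
            entries.append((ordinal, segment, offset, flags))
            ordinal += 1
    return entries
```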
nonresident-name table
the nonresident-name table contains strings that identify exported functions in the executable file. as the name implies, these
strings are not always resident in system memory and are discardable. the nonresident-name strings are case-sensitive; they
are not null-terminated. the following list summarizes the values found in the nonresident-name table (the specified locations are
relative to the beginning of each entry):
location description
00h specifies the length, in bytes, of a string. if this byte is 00h, there are no more strings in the table.
01h - xxh specifies the nonresident-name text. this string is case-sensitive and is not null-terminated.
xxh + 01h specifies an ordinal number that is an index to the entry table.
the first name that appears in the nonresident-name table is the module description string (which was specified in the
module-definition file).
code segments and relocation data
code and data segments follow the windows header. some of the code segments may contain calls to functions in other
segments and may, therefore, require relocation data to resolve those references. this relocation data is stored in a relocation
table that appears immediately after the code or data in the segment. the first 2 bytes in this table specify the number of
relocation items the table contains. a relocation item is a collection of bytes specifying the following information:
address type (segment only, offset only, segment and offset)
relocation type (internal reference, imported ordinal, imported name)
segment number or ordinal identifier (for internal references)
reference-table index or function ordinal number (for imported ordinals)
reference-table index or name-table offset (for imported names)
each relocation item contains 8 bytes of data, the first byte of which specifies one of the following relocation-address types:
value meaning
0 low byte at the specified offset
2 16-bit selector
3 32-bit pointer
5 16-bit offset
11 48-bit pointer
13 32-bit offset
the second byte specifies one of the following relocation types:
value meaning
0 internal reference
1 imported ordinal
2 imported name
3 osfixup
the third and fourth bytes specify the offset of the relocation item within the segment.
if the relocation type is imported ordinal, the fifth and sixth bytes specify an index to a module's reference table and the seventh
and eighth bytes specify a function ordinal value.
if the relocation type is imported name, the fifth and sixth bytes specify an index to a module's reference table and the seventh and
eighth bytes specify an offset to an imported-name table.
if the relocation type is internal reference and the segment is fixed, the fifth byte specifies the segment number, the sixth byte is
zero, and the seventh and eighth bytes specify an offset into the segment. if the relocation type is internal reference and the segment
is movable, the fifth byte is 0ffh, the sixth byte is zero, and the seventh and eighth bytes specify an ordinal value found in
the segment's entry table.
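the 8-byte relocation items can be decoded as follows. this sketch assumes `pos` is the file position just past a segment's data, where the 2-byte item count is stored; `parse_relocation_table` and the dictionary keys are hypothetical names. it handles only the three reference types described above and leaves osfixup items undecoded.

```python
import struct

ADDRESS_TYPES = {0: "low byte", 2: "16-bit selector", 3: "32-bit pointer",
                 5: "16-bit offset", 11: "48-bit pointer", 13: "32-bit offset"}
RELOCATION_TYPES = {0: "internal reference", 1: "imported ordinal",
                    2: "imported name", 3: "osfixup"}

def parse_relocation_table(data: bytes, pos: int) -> list:
    (count,) = struct.unpack_from("<H", data, pos)
    items = []
    for i in range(count):
        address_type, reloc_type, offset, first, second = \
            struct.unpack_from("<BBHHH", data, pos + 2 + 8 * i)
        item = {
            "address_type": ADDRESS_TYPES.get(address_type, address_type),
            "type": RELOCATION_TYPES.get(reloc_type, reloc_type),
            "offset": offset,  # where in the segment the fixup applies
        }
        if reloc_type == 0:
            segment = first & 0xFF  # fifth byte; the sixth byte is zero
            if segment == 0xFF:
                item["entry_ordinal"] = second  # movable segment
            else:
                item["segment"] = segment
                item["target_offset"] = second
        elif reloc_type == 1:
            item["module_index"] = first
            item["ordinal"] = second
        elif reloc_type == 2:
            item["module_index"] = first
            item["name_offset"] = second
        items.append(item)
    return items
```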
exe format
ne exe format
intel byte order
information from file format list 2.0 by max maischein.
the ne exe format is the new exe format used by windows and os/2 executables.
each file starts with a small mz exe which usually prints "this program
requires microsoft windows" or something similar, although some files contain
both dos and windows versions of the executable. the position of the new exe
header can be found in the old exe header - see the mz exe topic for further
information. all offsets within this header are from the start of the header
if not noted otherwise.
offset count type  description
0000h  2     char  id='ne'
0002h  1     byte  linker major version
0003h  1     byte  linker minor version
0004h  1     word  offset of entry table (see below)
0006h  1     word  length of entry table in bytes
0008h  1     dword file load crc (0 in borland's tpw)
000ch  1     byte  program flags, bitmapped :
                   0-1 - dgroup type :
                           0 - none
                           1 - single shared
                           2 - multiple
                           3 - (null)
                   2 - global initialization
                   3 - protected mode only
                   4 - 8086 instructions
                   5 - 80286 instructions
                   6 - 80386 instructions
                   7 - 80x87 instructions
000dh  1     byte  application flags, bitmapped
                   0-2 - application type
                           1 - full screen (not aware of windows/p.m. api)
                           2 - compatible with windows/p.m. api
                           3 - uses windows/p.m. api
                   3 - os/2 family application
                   4 - reserved?
                   5 - errors in image/executable
                   6 - "non-conforming program" whatever
                   7 - dll or driver (ss:sp info invalid, cs:ip
                       points at far init routine called with
                       ax=module handle which returns ax=0000h
                       on failure, ax nonzero on successful
                       initialization)
000eh  1     word  auto data segment index
0010h  1     word  initial local heap size
0012h  1     word  initial stack size
0014h  1     dword entry point (cs:ip), cs is index into segment table
0018h  1     dword initial stack pointer (ss:sp), ss is index into segment table
001ch  1     word  segment count
001eh  1     word  module reference count
0020h  1     word  size of nonresident names table in bytes
0022h  1     word  offset of segment table (see below)
0024h  1     word  offset of resource table
0026h  1     word  offset of resident names table
0028h  1     word  offset of module reference table
002ah  1     word  offset of imported names table (array of counted strings,
                   terminated with a string of length 00h)
002ch  1     dword offset from start of file to nonresident names table
0030h  1     word  count of moveable entry points listed in entry table
0032h  1     word  file alignment size shift count
                   0 is equivalent to 9 (default 512-byte pages)
0034h  1     word  number of resource table entries
0036h  1     byte  target operating system
                   0 - unknown
                   1 - os/2
                   2 - windows
                   3 - european ms-dos 4.x
                   4 - windows 386
                   5 - boss (borland operating system services)
0037h  1     byte  other os/2 exe flags, bitmapped
                   0 - long filename support
                   1 - 2.x protected mode
                   2 - 2.x proportional fonts
                   3 - executable has gangload area
0038h  1     word  offset to return thunks or start of gangload
                   area - whatever that means.
003ah  1     word  offset to segment reference thunks or length
                   of gangload area.
003ch  1     word  minimum code swap area size
003eh  2     byte  expected windows version (minor version first)
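several of the byte-sized fields in this table decode into small enumerations. a hedged sketch, assuming `header` holds at least the first 40h bytes of the ne header; `describe_ne` and the dictionary keys are hypothetical names.

```python
import struct

TARGET_OS = {0: "unknown", 1: "os/2", 2: "windows",
             3: "european ms-dos 4.x", 4: "windows 386", 5: "boss"}
DGROUP_TYPE = {0: "none", 1: "single shared", 2: "multiple", 3: "(null)"}

def describe_ne(header: bytes) -> dict:
    """summarize a few flag and version fields of an ne header."""
    program_flags = header[0x0C]
    application_flags = header[0x0D]
    (shift,) = struct.unpack_from("<H", header, 0x32)
    return {
        "dgroup": DGROUP_TYPE[program_flags & 0x03],       # bits 0-1
        "is_dll_or_driver": bool(application_flags & 0x80),  # bit 7
        # a shift count of 0 is equivalent to 9 (512-byte pages)
        "page_size": 1 << (shift or 9),
        "target_os": TARGET_OS.get(header[0x36], "unrecognized"),
        # the minor version is stored first; report (major, minor)
        "windows_version": (header[0x3F], header[0x3E]),
    }
```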
extension:dll,exe,fot
occurrences:pc
programs:
reference:windows 3.1 sdk programmer's reference, vol 4.
see also:exe,mz exe