MICROSOFT EXCEL FILE FORMAT
Microsoft Excel is a popular spreadsheet program. It stores worksheets in a file format
called BIFF (Binary File Format). There are many types of BIFF records, and each begins
with a 4-byte header. The first two bytes are an opcode that specifies the record type.
The second two bytes specify the length of the record body. Header values are stored in
byte-reversed form (least significant byte first). The rest of the record is the data
itself (Figure 2-1).
Figure 2-1. BIFF record header.
| Record Header | Record Body
Byte Number | 0 1 2 3 | 0 1 ...
-----------------------------------
Record Contents | XX | XX | XX | XX | XX | XX | ...
-----------------------------------
| opcode | length | data
Each X represents a hexadecimal digit.
Two X's form a byte. The least significant (low) byte of the opcode is byte 0 and the
most significant (high) byte is byte 1. Similarly, the low byte of the record length
field is byte 2 and the high byte is byte 3.
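The header layout can be sketched in Python as follows. This is a minimal illustration rather than part of the format description: the names parse_header and iter_records are hypothetical, and the walker assumes that the buffer holds whole records stored back to back with nothing between them.

```python
import struct

def parse_header(buf, offset=0):
    """Return (opcode, length) from the 4-byte header starting at offset."""
    # "<HH" reads two unsigned 16-bit values, least significant byte first.
    opcode, length = struct.unpack_from("<HH", buf, offset)
    return opcode, length

def iter_records(buf):
    """Yield (opcode, body) for each record in buf (assumed contiguous)."""
    offset = 0
    while offset + 4 <= len(buf):
        opcode, length = parse_header(buf, offset)
        body = buf[offset + 4 : offset + 4 + length]
        if len(body) != length:
            raise ValueError("record body is truncated")
        yield opcode, body
        offset += 4 + length
```

For example, parse_header(bytes.fromhex("09000400")) returns (9, 4): opcode 09h with a 4-byte body.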
BOF (Beginning of File)
The first record in every spreadsheet file is a BOF record (Figure 2-2).
Figure 2-2. BOF record.
| Record Header | Record Body |
Byte | 0 1 2 3 | 0 1 2 3 |
-----------------------------------------
Contents | 09 | 00 | 04 | 00 | 02 | 00 | 10 | 00 |
-----------------------------------------
| opcode | length | version | file |
| | | number | type |
The first two bytes, arranged with the low byte first, show that the opcode for BOF is
09h. The second two bytes indicate that the record body is 4 bytes long. The first two
bytes of the body are the version number (2 for the initial version of Excel). The last
two bytes are the file type. Type 10h is a worksheet file.
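As a sketch of how the bytes in Figure 2-2 might be decoded, the following Python reads the header and body of a BOF record in one step. The function name parse_bof and the returned dictionary keys are hypothetical, and the check that the body is exactly 4 bytes long assumes the BOF layout described here.

```python
import struct

BOF_OPCODE = 0x09      # opcode for BOF, from Figure 2-2
WORKSHEET_TYPE = 0x10  # file type 10h is a worksheet file

def parse_bof(record):
    """Decode an 8-byte BOF record (4-byte header plus 4-byte body)."""
    # Four unsigned 16-bit fields, each stored low byte first.
    opcode, length, version, file_type = struct.unpack("<HHHH", record[:8])
    if opcode != BOF_OPCODE:
        raise ValueError("not a BOF record")
    if length != 4:
        raise ValueError("unexpected BOF body length")
    return {
        "version": version,
        "file_type": file_type,
        "is_worksheet": file_type == WORKSHEET_TYPE,
    }
```

Applied to the bytes of Figure 2-2, bytes.fromhex("09 00 04 00 02 00 10 00"), this reports version 2 and a worksheet file type.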
Relating Spreadsheet Cells to Record Data Bytes
A spreadsheet appears on a screen or printout as a matrix of rectangular cells. Each
column is identified by a letter at its top, and each row is identified by a number.
Thus cell A1 is in the first column and the first row. Cell C240 is in the third column
and the 240th row. This scheme identifies cells in a way easily understood by people.
However, it is not particularly convenient for computers, which handle binary numbers
far more efficiently than letters. Thus, Excel stores row and column identifiers as
binary numbers, which people can read in hexadecimal. Numbering starts at 0 rather
than 1.
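The mapping between the zero-based numbers stored in a file and the names people use can be sketched in Python. The helper names column_letters and cell_name are hypothetical. The handling of columns beyond Z (AA, AB, and so on) follows the usual spreadsheet convention and is an assumption here, since the text above only shows single-letter columns.

```python
def column_letters(col):
    """Convert a zero-based column number to its letter name (0 -> A)."""
    letters = ""
    col += 1  # switch to one-based counting for the base-26 conversion
    while col > 0:
        col, rem = divmod(col - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters

def cell_name(row, col):
    """Convert zero-based (row, col) to a name such as C240."""
    return f"{column_letters(col)}{row + 1}"
```

With these helpers, cell_name(0, 0) gives "A1" and cell_name(239, 2) gives "C240".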
Figure 2-3, which shows the form of an INTEGER record, illustrates the storage of column
and row information.
Figure 2-3. INTEGER record.
| Record Header | Record Body
Byte | 0 1 2 3 | 0 1 2 3 4 5 6 7 8 |
------------------------------------------------------------------
Value | 02 | 00 | 09 | 00 | 00 | 00 | 02 | 00 | 00 | 00 | 00 | 39 | 00 |
------------------------------------------------------------------
| opcode | length | row | column | rgbAttr | w |
Opcode 2 indicates an integer record. The length bytes show that the record body is 9
bytes long. Row 0 in the body corresponds to spreadsheet row 1. Row 1 corresponds to
spreadsheet row 2, and so on. Column 2 corresponds to spreadsheet column C. Thus,
Figure 2-3 deals with cell C1. The next three bytes, labeled "rgbAttr," specify cell
attributes (Table 2-1). The final pair of bytes, labeled "w," holds the integer's
value. Here it is 39h, or 57 decimal. Thus the record specifies that cell C1 of the
spreadsheet contains an integer with the value 57.
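The walk through Figure 2-3 can be reproduced with a short Python sketch. The function name parse_integer_record and the dictionary keys are hypothetical, and treating "w" as an unsigned 16-bit value is an assumption made for this illustration.

```python
import struct

INTEGER_OPCODE = 0x02  # opcode for an INTEGER record, from Figure 2-3

def parse_integer_record(record):
    """Decode a 13-byte INTEGER record (4-byte header plus 9-byte body)."""
    # Layout: opcode, length, row, column (16 bits each, low byte first),
    # then the 3-byte rgbAttr field, then the 16-bit value w.
    opcode, length, row, col, rgb_attr, w = struct.unpack(
        "<HHHH3sH", record[:13]
    )
    if opcode != INTEGER_OPCODE:
        raise ValueError("not an INTEGER record")
    return {"length": length, "row": row, "col": col,
            "rgb_attr": rgb_attr, "value": w}

# The bytes of Figure 2-3.
sample = bytes.fromhex("02 00 09 00 00 00 02 00 00 00 00 39 00")
fields = parse_integer_record(sample)
```

Here fields["row"] is 0, fields["col"] is 2, and fields["value"] is 57, which matches the reading of cell C1 given above.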
Standard File Record Order
Excel worksheet files have each record type in a predetermined position. A file need
not have all types, but the ones that are present always appear in the same order.
Table 2-2 lists the record types for Excel document (spreadsheet) files in the order
they would appear in a BIFF file. Table 2-3 lists the types in opcode order.
Several record types in a BIFF file, namely ROW, BLANK, INTEGER, NUMBER, LABEL,
BOOLERR, FORMULA, and COLUMN DEFAULT, describe the contents of a cell. These records
contain a 3-byte attribute field labeled "rgbAttr". Table 2-1 describes how the bits
in the field correspond to cell attributes.
Table 2-1. Cell Attributes
Byte Offset  Bit  Description                   Contents
-----------------------------------------------------------
0            7    Cell is not hidden            0b
                  Cell is hidden                1b
             6    Cell is not locked            0b
                  Cell is locked                1b
             5-0  Reserved, must be 0           000000b
1            7-6  Font number (4 possible)
             5-0  Cell format code
2            7    Cell is not shaded            0b
                  Cell is shaded                1b
             6    Cell has no bottom border     0b
                  Cell has a bottom border      1b
             5    Cell has no top border        0b
                  Cell has a top border         1b
             4    Cell has no right border      0b
                  Cell has a right border       1b
             3    Cell has no left border       0b
                  Cell has a left border        1b
             2-0  Cell alignment code
                    general                     000b
                    left                        001b
                    center                      010b
                    right                       011b
                    fill                        100b
                    Multiplan default align.    111b
The font number field is a zero-based index into the document's table of fonts. The
cell format code is a zero-based index into the document's table of picture formats.
There are 21 different standard formats. Additional custom formats may be defined by
the user. See the FONT and FORMAT record descriptions for additional details.
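The bit layout of the rgbAttr field can be unpacked with masks and shifts, as in the Python sketch below. The name decode_rgb_attr and the dictionary keys are hypothetical. The sketch assumes that bit 7 is the most significant bit of each byte and that the font number and format code occupy the byte at offset 1.

```python
ALIGNMENTS = {
    0b000: "general",
    0b001: "left",
    0b010: "center",
    0b011: "right",
    0b100: "fill",
    0b111: "Multiplan default",
}

def decode_rgb_attr(attr):
    """Decode the 3-byte rgbAttr field into a dictionary of attributes."""
    if len(attr) != 3:
        raise ValueError("rgbAttr must be exactly 3 bytes")
    b0, b1, b2 = attr
    return {
        "hidden": bool(b0 & 0x80),         # byte 0, bit 7
        "locked": bool(b0 & 0x40),         # byte 0, bit 6
        "font": b1 >> 6,                   # byte 1, bits 7-6
        "format": b1 & 0x3F,               # byte 1, bits 5-0
        "shaded": bool(b2 & 0x80),         # byte 2, bit 7
        "bottom_border": bool(b2 & 0x40),  # byte 2, bit 6
        "top_border": bool(b2 & 0x20),     # byte 2, bit 5
        "right_border": bool(b2 & 0x10),   # byte 2, bit 4
        "left_border": bool(b2 & 0x08),    # byte 2, bit 3
        "alignment": ALIGNMENTS.get(b2 & 0x07, "unknown"),
    }
```

For the all-zero field in Figure 2-3, every flag is False, the font and format indexes are both 0, and the alignment is "general".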
Table 2-2. Excel Record Types in Order of Appearance
Record Type Opcode (Hexadecimal)
BOF 09
FILEPASS 2F
INDEX 0B
CALCCOUNT 0C
CALCMODE 0D
PRECISION 0E
REFMODE 0F
DELTA 10
ITERATION 11
1904 22
BACKUP 40
PRINT ROW HEADERS 2A
PRINT GRIDLINES 2B
HORIZONTAL PAGE BREAKS 1B
VERTICAL PAGE BREAKS 1A
DEFAULT ROW HEIGHT 25
FONT 31
FONT2 32
HEADER 14
FOOTER 15
LEFT MARGIN 26
RIGHT MARGIN 27
TOP MARGIN 28
BOTTOM MARGIN 29
COLWIDTH 24
EXTERNCOUNT 16
EXTERNSHEET 17
EXTERNNAME 23
FORMATCOUNT 1F
FORMAT 1E
NAME 18
DIMENSIONS 00
COLUMN DEFAULT 20
ROW 08
BLANK 01
INTEGER 02
NUMBER 03
LABEL 04
BOOLERR 05
FORMULA 06
ARRAY 21
CONTINUE 3C
STRING 07
TABLE 36
TABLE2 37
PROTECT 12
WINDOW PROTECT 19
PASSWORD 13
NOTE 1C
WINDOW1 3D
WINDOW2 3E
PANE 41
SELECTION 1D
EOF 0A
Table 2-3. Excel Record Types in Opcode Order
Record Type Opcode (hexadecimal)
DIMENSIONS 00
BLANK 01
INTEGER 02
NUMBER 03
LABEL 04
BOOLERR 05
FORMULA 06
STRING 07
ROW 08
BOF 09
EOF 0A
INDEX 0B
CALCCOUNT 0C
CALCMODE 0D
PRECISION 0E
REFMODE 0F
DELTA 10
ITERATION 11
PROTECT 12
PASSWORD 13
HEADER 14
FOOTER 15
EXTERNCOUNT 16
EXTERNSHEET 17
NAME 18
WINDOW PROTECT 19
VERTICAL PAGE BREAKS 1A
HORIZONTAL PAGE BREAKS 1B
NOTE 1C
SELECTION 1D
FORMAT 1E
FORMATCOUNT 1F
COLUMN DEFAULT 20
ARRAY 21
1904 22
EXTERNNAME 23
COLWIDTH 24
DEFAULT ROW HEIGHT 25
LEFT MARGIN 26
RIGHT MARGIN 27
TOP MARGIN 28
BOTTOM MARGIN 29
PRINT ROW HEADERS 2A
PRINT GRIDLINES 2B
FILEPASS 2F
FONT 31
FONT2 32
TABLE 36
TABLE2 37
CONTINUE 3C
WINDOW1 3D
WINDOW2 3E
BACKUP 40
PANE 41
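For programs that read BIFF files, Table 2-3 translates naturally into a lookup table. The Python sketch below is one possible transcription. The names RECORD_NAMES and record_name are hypothetical, and the "UNKNOWN" fallback is a design choice for this example rather than something the format defines.

```python
# Opcode-to-name mapping transcribed from Table 2-3.
RECORD_NAMES = {
    0x00: "DIMENSIONS", 0x01: "BLANK", 0x02: "INTEGER",
    0x03: "NUMBER", 0x04: "LABEL", 0x05: "BOOLERR",
    0x06: "FORMULA", 0x07: "STRING", 0x08: "ROW",
    0x09: "BOF", 0x0A: "EOF", 0x0B: "INDEX",
    0x0C: "CALCCOUNT", 0x0D: "CALCMODE", 0x0E: "PRECISION",
    0x0F: "REFMODE", 0x10: "DELTA", 0x11: "ITERATION",
    0x12: "PROTECT", 0x13: "PASSWORD", 0x14: "HEADER",
    0x15: "FOOTER", 0x16: "EXTERNCOUNT", 0x17: "EXTERNSHEET",
    0x18: "NAME", 0x19: "WINDOW PROTECT",
    0x1A: "VERTICAL PAGE BREAKS", 0x1B: "HORIZONTAL PAGE BREAKS",
    0x1C: "NOTE", 0x1D: "SELECTION", 0x1E: "FORMAT",
    0x1F: "FORMATCOUNT", 0x20: "COLUMN DEFAULT", 0x21: "ARRAY",
    0x22: "1904", 0x23: "EXTERNNAME", 0x24: "COLWIDTH",
    0x25: "DEFAULT ROW HEIGHT", 0x26: "LEFT MARGIN",
    0x27: "RIGHT MARGIN", 0x28: "TOP MARGIN", 0x29: "BOTTOM MARGIN",
    0x2A: "PRINT ROW HEADERS", 0x2B: "PRINT GRIDLINES",
    0x2F: "FILEPASS", 0x31: "FONT", 0x32: "FONT2",
    0x36: "TABLE", 0x37: "TABLE2", 0x3C: "CONTINUE",
    0x3D: "WINDOW1", 0x3E: "WINDOW2", 0x40: "BACKUP", 0x41: "PANE",
}

def record_name(opcode):
    """Return the record type name for an opcode, or a placeholder."""
    return RECORD_NAMES.get(opcode, f"UNKNOWN({opcode:#04x})")
```

For instance, record_name(0x09) returns "BOF", while an opcode missing from the table, such as 0x99, yields "UNKNOWN(0x99)".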