If you read assembly code carefully, you will notice that a line like the following appears
quite often:
.align
It aligns the data that follows it to a memory-word boundary. The reason is program performance.
When the CPU needs data as an operand of the instruction it is currently executing, it fetches
that data from main memory over the bus that connects the two. Suppose this data bus is 32 bits
wide, which is common in PCs (in x86 terminology, where a word is 2 bytes, one 32-bit transfer is
a doubleword). Let us simulate the CPU fetching an operand and compare the aligned and unaligned
cases side by side.
One premise is essential to the comparison and must be stated first: memory can only be
accessed as follows:
1. 32 bits, that is, 4 bytes, at a time, which is exactly the width of the 32-bit bus;
2. these 4 bytes must start at an aligned address, that is, the address of the first byte has
the form XXX...XX00 in binary.
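The two access rules can be modeled with a little bit arithmetic. This is a minimal sketch, assuming a 4-byte bus and plain integer byte addresses; the names `BUS_BYTES`, `is_aligned` and `word_base` are hypothetical, chosen only for illustration.

```python
BUS_BYTES = 4  # assumed bus width: 32 bits = 4 bytes per transfer

def is_aligned(addr: int) -> bool:
    """Rule 2: an address is aligned when its two lowest bits are 00."""
    return (addr & (BUS_BYTES - 1)) == 0

def word_base(addr: int) -> int:
    """Start of the aligned 4-byte region containing addr (two lowest bits cleared)."""
    return addr & ~(BUS_BYTES - 1)

print(is_aligned(0b1000), is_aligned(0b1010))  # True False
print(bin(word_base(0b1010)))                   # 0b1000
```

Clearing the low bits with a mask works only because the bus width is a power of two.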
With the bus fixed at 32 bits, assume the operand and its location in memory are as follows:
|                  | Unaligned | Aligned   |
|------------------|-----------|-----------|
| Operand size     | 4 bytes   | 4 bytes   |
| Starting address | xxx…xx010 | xxx…xx000 |
Consider the unaligned case first. With the bus described above, the operand cannot be fetched
in one transfer, because it spans two aligned memory regions, and each transfer carries exactly
one whole aligned region. So at least two transfers are needed (and here also at most two, since
the operand is only 32 bits long). Specifically, for the table above, the first transfer fetches
xxx…xx000 through xxx…xx011 and the second fetches xxx…xx100 through xxx…xx111. It is then the
CPU's job to extract the relevant bytes from each part (xxx…xx010 and xxx…xx011 from the first,
xxx…xx100 and xxx…xx101 from the second) and assemble them into the complete operand.
In the aligned case, by contrast, a single transfer of xxx…xx000 through xxx…xx011 is enough to fetch the same operand.
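Both cases in the table can be simulated in a few lines. This sketch assumes a little-endian machine (as on x86) and models memory as a `bytes` object that can only be read one aligned 4-byte region per bus transfer; `read_word` and `load_u32` are hypothetical helpers, not a real API.

```python
BUS_BYTES = 4  # assumed 32-bit bus

def read_word(memory: bytes, base: int) -> bytes:
    """One bus transfer: return the aligned 4-byte region starting at base."""
    assert base % BUS_BYTES == 0, "the bus only accepts aligned addresses"
    return memory[base:base + BUS_BYTES]

def load_u32(memory: bytes, addr: int) -> tuple[int, int]:
    """Load a 4-byte little-endian operand at addr; return (value, bus transfers used)."""
    end = addr + BUS_BYTES - 1                 # address of the operand's last byte
    first = addr - addr % BUS_BYTES            # aligned region holding the first byte
    last = end - end % BUS_BYTES               # aligned region holding the last byte
    # Fetch every aligned region the operand touches, one transfer each.
    chunks = b"".join(read_word(memory, b) for b in range(first, last + 1, BUS_BYTES))
    offset = addr - first                      # where the operand starts in the fetched bytes
    value = int.from_bytes(chunks[offset:offset + BUS_BYTES], "little")
    return value, (last - first) // BUS_BYTES + 1

memory = bytes(range(0x10, 0x18))              # byte 0x10 at address 0, ..., 0x17 at address 7
value, n = load_u32(memory, 0b010)             # unaligned operand
print(hex(value), n)                           # 0x15141312 2
value, n = load_u32(memory, 0b000)             # aligned operand
print(hex(value), n)                           # 0x13121110 1
```

The slicing and `int.from_bytes` step stands in for the extract-and-assemble work the hardware has to do in the unaligned case.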
Since memory access is slow compared with the CPU, alignment reduces the number of memory
accesses and improves overall performance, at the cost of a small amount of wasted memory (the
padding bytes skipped to reach the boundary).
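The padding cost can be sketched as rounding an address up to the next multiple of the alignment. This assumes the alignment is a power of two given in bytes. Note that the meaning of the `.align` argument varies by assembler and target (in GNU as it is a byte count on some targets and a power-of-two exponent on others), so `align_up` and `padding` here are hypothetical helpers, not the assembler's own code.

```python
def align_up(addr: int, alignment: int) -> int:
    """Round addr up to the next multiple of alignment (assumed to be a power of two)."""
    assert alignment > 0 and (alignment & (alignment - 1)) == 0, "alignment must be a power of two"
    return (addr + alignment - 1) & ~(alignment - 1)

def padding(addr: int, alignment: int) -> int:
    """Filler bytes needed to move addr onto the alignment boundary."""
    return align_up(addr, alignment) - addr

print(align_up(0b010, 4), padding(0b010, 4))  # 4 2
print(align_up(0b000, 4), padding(0b000, 4))  # 0 0
```

At most `alignment - 1` bytes are wasted per aligned item, which is why the memory cost stays small.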
You may also notice that the .align directive usually appears before data rather than before
ordinary instructions. The reason is that instructions are usually prefetched before they are
needed, in blocks rather than one instruction at a time, so aligning individual instructions is
generally unnecessary.