# DELPHI ASM Tutorial (1)

http://dennishomepage.gugs-cats.dk/BASM-filer/BASMForBeginners.htm

Introduction to BASM for Beginners

The series of articles named “BASM for beginners” currently consists of 7 articles, and nos. 8 and 9 are in progress. What the articles, including the coming ones, have in common is that they explain some BASM issues by means of an example function. Most often this function is first implemented in Pascal; the compiler-generated assembler code is then copied from the CPU view in Delphi, analyzed and optimized. Sometimes optimization involves the use of MMX, SSE or SSE2 instructions.

By starting from the code the compiler generates for a Pascal function, the most commonly used instructions from the big 32-bit Intel Architecture instruction set are introduced to the beginner first. Seeing which code the compiler generates gives valuable insight into the effectiveness of compiler-generated code in general and of the Delphi compiler specifically.

As specific assembly code optimizations are introduced, they will be generalized when suitable. These general optimizations are suitable for implementation in compilers, and most compilers, including Delphi, have them. At some point in the future a tool that automatically optimizes assembler code will be developed.

Knowledge about the target processor is often needed when optimizing code, and therefore a lot of CPU details, such as pipelines, are explained in the series too.

As far as I know there is only a little literature available that explains all these issues at a level where beginners can follow it. I hope this series will help fill this void.

Best regards
Dennis Kjaer Christensen

Lesson 1

The first little example gets us started. It is a simple function in Pascal which multiplies an integer by the constant 2.

function MulInt2(I : Integer) : Integer;

begin

Result := I * 2;

end;

Let's steal the BASM from the CPU view. I compiled with optimizations turned on.

function MulInt2_BASM(I : Integer) : Integer;

begin

Result := I * 2;

{

add eax,eax

ret

}

end;

From this we see that I is transferred to the function in eax and that the result is transferred back to the caller in eax too. This is the rule of the register calling convention, which is the default in Delphi. The actual code is very simple: the multiplication by 2 is obtained by adding I to itself, I+I = 2I. The ret instruction returns execution to the instruction after the call.

Let's code the function as a pure asm function.

function MulInt2_BASM2(I : Integer) : Integer;

asm

//Result := I * 2;

add eax,eax

//ret

end;

Observe that the ret instruction is supplied by the inline assembler.
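The register convention is not limited to one parameter. The following is a minimal sketch, assuming the 32-bit Delphi compiler, where the first three parameters that fit in a register are passed in eax, edx and ecx, in that order. AddInts is a hypothetical name used only for illustration; it is not part of the original article.

```pascal
program RegisterConventionDemo;

{$APPTYPE CONSOLE}

// Hypothetical helper, not from the original article.
// Assumes 32-bit Delphi and the default register calling convention:
// A arrives in eax, B arrives in edx, the result is returned in eax.
function AddInts(A, B : Integer) : Integer;
asm
  add eax, edx   // eax := A + B
  // ret is supplied by the inline assembler
end;

begin
  WriteLn(AddInts(20, 22));   // prints 42
end.
```

As in MulInt2_BASM2, no explicit ret is written and no register needs to be saved, because only eax is modified.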

Let us take a look at the calling code.

This is the Pascal code

procedure TForm1.Button1Click(Sender: TObject);

var

I, J : Integer;

begin

I := StrToInt(IEdit.Text);

J := MulInt2_BASM2(I);

JEdit.Text := IntToStr(J);

end;

The important line is

J := MulInt2_BASM2(I);

From the CPU view:

call StrToInt

call MulInt2_BASM2

mov esi,eax

After the call to StrToInt on the line before the one which calls our function, I is in eax (StrToInt also follows the register calling convention). MulInt2_BASM2 is called and returns the result in eax, which is copied to esi in the next line.

Optimization issues: Multiplication by 2 can be done in two more ways: use the mul instruction or shift left by one. mul is described in the Intel IA32 SW developers manual 2, page 536. It multiplies the value in eax by another register, and the result is returned in the register pair edx:eax. A register pair is needed because a multiplication of two 32-bit numbers gives a result of up to 64 bits, just like 9*9=81, where two one-digit numbers give a two-digit result.

This raises the issue of which registers must be preserved by a function and which can be used freely. This is explained in the Delphi help.

"An asm statement must preserve the EDI, ESI, ESP, EBP, and EBX registers, but can freely modify the EAX, ECX, and EDX registers."

We can conclude that it is no problem that edx is modified by the mul instruction and our function can also be implemented like this.

function MulInt2_BASM3(I : Integer) : Integer;

asm

//Result := I * 2;

mov ecx, 2

mul ecx

end;

ecx is also used, but this is OK too. As long as the result is within the range of Integer, it is returned correctly in eax. If I is bigger than half the range of Integer, overflow will occur and the result is incorrect.
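The register pair and the overflow case can be made visible with a small console program. This is a sketch under stated assumptions: it targets the 32-bit Delphi compiler, where an Int64 function result is returned in edx:eax, and MulInt2_Wide is a hypothetical helper that does not appear in the original article.

```pascal
program MulOverflowDemo;

{$APPTYPE CONSOLE}

// From the article: multiply by 2 with mul, only eax is returned.
function MulInt2_BASM3(I : Integer) : Integer;
asm
  mov ecx, 2
  mul ecx        // edx:eax := eax * ecx, edx is discarded by the caller
end;

// Hypothetical helper, not from the article.
// Assumes 32-bit Delphi: an Int64 result is returned in edx:eax,
// so the full 64-bit product of mul is handed back unchanged.
function MulInt2_Wide(I : Cardinal) : Int64;
asm
  mov ecx, 2
  mul ecx        // edx:eax := eax * 2 (unsigned)
end;

begin
  WriteLn(MulInt2_BASM3(21));          // 42, fits in Integer
  WriteLn(MulInt2_BASM3(1500000000));  // -1294967296, overflow
  WriteLn(MulInt2_Wide(1500000000));   // 3000000000, full product
end.
```

The second call shows the overflow: 3000000000 does not fit in a signed 32-bit Integer, so the bit pattern in eax is read back as a negative number.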

Implementation with shift

function MulInt2_BASM4(I : Integer) : Integer;

asm

//Result := I * 2;

shl eax,1

end;
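The add and shl variants can be checked against the plain Pascal version over a range of inputs. This is a minimal sketch assuming the 32-bit Delphi compiler; the test range of -1000 to 1000 is an arbitrary choice for illustration.

```pascal
program MulAgreeDemo;

{$APPTYPE CONSOLE}

// Reference implementation in Pascal.
function MulInt2(I : Integer) : Integer;
begin
  Result := I * 2;
end;

// Multiply by 2 by adding I to itself.
function MulInt2_BASM2(I : Integer) : Integer;
asm
  add eax, eax
end;

// Multiply by 2 by shifting left one bit.
function MulInt2_BASM4(I : Integer) : Integer;
asm
  shl eax, 1
end;

var
  I : Integer;
  AllEqual : Boolean;
begin
  AllEqual := True;
  // Arbitrary test range, chosen so that no result overflows.
  for I := -1000 to 1000 do
    if (MulInt2_BASM2(I) <> MulInt2(I)) or
       (MulInt2_BASM4(I) <> MulInt2(I)) then
      AllEqual := False;
  if AllEqual then
    WriteLn('all variants agree')
  else
    WriteLn('mismatch found');
end.
```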

Timing can reveal which implementation is fastest. We can also consult Intel or AMD documents with latency and throughput tables. add and mov have a latency and throughput of 0.5 cycles; mul has a latency of 14-18 cycles and a throughput of 5 cycles; shl has a latency of 4 cycles and a throughput of 1 cycle. The version chosen by Delphi (add) is therefore the most efficient on P4, and this will probably also be the case on Athlon and P3.

Issues not covered: mul versus imul and range checking, other calling conventions, benchmarking, clock count on other processors, clock count for call + ret, location of return address for ret, etc.
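A very rough way to time the three variants is sketched below. It assumes 32-bit Delphi on Windows and uses GetTickCount from the Windows unit (in newer Delphi versions the unit is named Winapi.Windows). TimeIt, TMulFunc and the iteration count are hypothetical choices for illustration. Each call goes through a function pointer, so the loop measures call and ret overhead as much as the instruction itself, and differences between the variants may be small.

```pascal
program MulTimingDemo;

{$APPTYPE CONSOLE}

uses
  Windows;   // GetTickCount; assumption: classic unit name

type
  TMulFunc = function(I : Integer) : Integer;

function MulInt2_Add(I : Integer) : Integer;
asm
  add eax, eax
end;

function MulInt2_Mul(I : Integer) : Integer;
asm
  mov ecx, 2
  mul ecx
end;

function MulInt2_Shl(I : Integer) : Integer;
asm
  shl eax, 1
end;

// Hypothetical helper: calls F Count times and returns elapsed milliseconds.
function TimeIt(F : TMulFunc; Count : Integer) : Cardinal;
var
  Start : Cardinal;
  N : Integer;
begin
  Start := GetTickCount;
  for N := 1 to Count do
    F(N);                      // result is discarded on purpose
  Result := GetTickCount - Start;
end;

const
  Count = 100000000;           // arbitrary iteration count

begin
  WriteLn('add: ', TimeIt(MulInt2_Add, Count), ' ms');
  WriteLn('mul: ', TimeIt(MulInt2_Mul, Count), ' ms');
  WriteLn('shl: ', TimeIt(MulInt2_Shl, Count), ' ms');
end.
```

GetTickCount has a resolution of several milliseconds, so the iteration count has to be large enough for each run to take a clearly measurable time.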

=======================================

inline asm function: a function with a begin ... end body that contains assembler code

pure asm function: a function whose body is written entirely in assembler, delimited by asm ... end
