Common Linux assembly errors: Assemble in Linux

Using Assembly Language in Linux

by Phillip

phillip@ussrback.com

Last updated: Monday 8th January 2001

Note: there is a.

Contents:

This article describes assembly language programming under Linux. It
contains a comparison between Intel and AT&T syntax asm, a guide to using
syscalls and an introductory guide to using inline asm in gcc.

This article was written due to the lack of (good) info on this field of
programming (the inline asm section in particular). I should remind you
that this is not a shellcode writing tutorial, because there is no lack of
info in that field.

Various parts of this text I have learnt about through experimentation,
and hence they may be prone to error. Should you find any such errors on my
part, do not hesitate to notify me via email and enlighten me on the given
issue.

There is only one prerequisite for reading this article, and that's
obviously a basic knowledge of x86 assembly language and C.

Intel and AT&T syntax assembly language are very different from each
other in appearance, and this will lead to confusion when one first comes
across AT&T syntax after having learnt Intel syntax first, or vice versa.
So let's start with the basics.

In Intel syntax there are no register prefixes or immediate prefixes. In
AT&T syntax, however, registers are prefixed with a '%' and immediates are
prefixed with a '$'. Intel syntax hexadecimal or binary immediate data are
suffixed with 'h' and 'b' respectively. Also, if the first hexadecimal
digit is a letter then the value is prefixed by a '0'.

Example:

Intel Syntax
        mov     eax,1
        mov     ebx,0ffh
        int     80h

AT&T Syntax
        movl    $1,%eax
        movl    $0xff,%ebx
        int     $0x80

The direction of the operands in Intel syntax is opposite from that of
AT&T syntax. In Intel syntax the first operand is the destination and the
second operand is the source, whereas in AT&T syntax the first operand is
the source and the second operand is the destination. The advantage of
AT&T syntax in this situation is obvious. We read from left to right, we
write from left to right, so this way is only natural.

Example:

Intel Syntax
        instr   dest,source
        mov     eax,[ecx]

AT&T Syntax
        instr   source,dest
        movl    (%ecx),%eax

Memory operands, as seen above, are different also. In Intel syntax the
base register is enclosed in '[' and ']', whereas in AT&T syntax it is
enclosed in '(' and ')'.

Example:

Intel Syntax
        mov     eax,[ebx]
        mov     eax,[ebx+3]

AT&T Syntax
        movl    (%ebx),%eax
        movl    3(%ebx),%eax

The AT&T form for instructions involving complex operations is very
obscure compared to Intel syntax. The Intel syntax form of these is
segreg:[base+index*scale+disp]. The AT&T syntax form is
%segreg:disp(base,index,scale).

Index/scale/disp/segreg are all optional and can simply be left out.
Scale, if not specified while index is specified, defaults to 1. Segreg
depends on the instruction and on whether the app is being run in real mode
or protected mode (pmode). In real mode it depends on the instruction,
whereas in pmode it's unnecessary. Immediate data used for scale/disp
should not be '$' prefixed in AT&T syntax.

Example:

Intel Syntax
        instr   foo,segreg:[base+index*scale+disp]
        mov     eax,[ebx+20h]
        add     eax,[ebx+ecx*2h]
        lea     eax,[ebx+ecx]
        sub     eax,[ebx+ecx*4h-20h]

AT&T Syntax
        instr   %segreg:disp(base,index,scale),foo
        movl    0x20(%ebx),%eax
        addl    (%ebx,%ecx,0x2),%eax
        leal    (%ebx,%ecx),%eax
        subl    -0x20(%ebx,%ecx,0x4),%eax

As you can see, AT&T is very obscure. [base+index*scale+disp] makes
more sense at a glance than disp(base,index,scale).

As you may have noticed, the AT&T syntax mnemonics have a suffix. The
significance of this suffix is that of operand size. 'l' is for long, 'w'
is for word, and 'b' is for byte. Intel syntax has similar directives for
use with memory operands, i.e. byte ptr, word ptr, dword ptr, with "dword"
of course corresponding to "long". This is similar to type casting in C,
but the suffix can be left out when a register operand is present, since
the size of the register determines the operand size.

Example:

Intel Syntax
        mov     al,bl
        mov     ax,bx
        mov     eax,ebx
        mov     eax, dword ptr [ebx]

AT&T Syntax
        movb    %bl,%al
        movw    %bx,%ax
        movl    %ebx,%eax
        movl    (%ebx),%eax

**NOTE: ALL EXAMPLES FROM HERE WILL BE IN AT&T SYNTAX**

This section will outline the use of Linux syscalls in assembly
language. Syscalls consist of all the functions in the second section of
the manual pages located in /usr/man/man2. They are also listed in
/usr/include/sys/syscall.h. These functions can be executed via the Linux
interrupt service: int $0x80.

For all syscalls, the syscall number goes in %eax. For syscalls that
have fewer than six args, the args go in %ebx, %ecx, %edx, %esi, %edi in
order. The return value of the syscall is stored in %eax.

The syscall number can be found in /usr/include/sys/syscall.h. The
macros are defined as SYS_<syscall name>, i.e. SYS_exit, SYS_close, etc.

Example:

(Hello world program - it had to be done)

According to the write(2) man page, write is declared as:

ssize_t write(int fd, const void *buf, size_t count);

Hence fd goes in %ebx, buf goes in %ecx, count goes in %edx and
SYS_write goes in %eax. This is followed by an int $0x80 which executes the
syscall. The return value of the syscall is stored in %eax.

$ cat write.s
.include "defines.h"
.data
hello:
        .string "hello world\n"

.text
.globl  main
main:
        movl    $SYS_write,%eax
        movl    $STDOUT,%ebx
        movl    $hello,%ecx
        movl    $12,%edx
        int     $0x80

        ret
$

The same process applies to syscalls which have fewer than five args.
Just leave the unused registers unchanged. Syscalls such as open or fcntl,
which have an optional extra arg, will know what to use.

Syscalls whose number of args is greater than five still expect the
syscall number to be in %eax, but the args are arranged in memory and the
pointer to the first arg is stored in %ebx.

If you are using the stack, args must be pushed onto it backwards, i.e.
from the last arg to the first arg. Then the stack pointer should be copied
to %ebx. Otherwise, copy the args to an allocated area of memory and store
the address of the first arg in %ebx.

Example:
(mmap being the example syscall).

Using mmap() in C:

#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/mman.h>

#define STDOUT 1

int main(void) {
        char file[]="mmap.s";
        char *mappedptr;
        int fd,filelen;

        fd=open(file, O_RDONLY);
        filelen=lseek(fd,0,SEEK_END);
        mappedptr=mmap(NULL,filelen,PROT_READ,MAP_SHARED,fd,0);
        write(STDOUT, mappedptr, filelen);
        munmap(mappedptr, filelen);
        close(fd);
        return 0;
}

Arrangement of mmap() args in memory:

%esp      %esp+4    %esp+8    %esp+12   %esp+16   %esp+20
00000000  filelen   00000001  00000001  fd        00000000

ASM Equivalent:

$ cat mmap.s

.include "defines.h"

.data
file:
        .string "mmap.s"
fd:
        .long   0
filelen:
        .long   0
mappedptr:
        .long   0

.text
.globl  main
main:
        pushl   %ebp
        movl    %esp,%ebp
        subl    $24,%esp

//      open($file, $O_RDONLY);

        movl    $fd,%ebx                // save fd
        movl    %eax,(%ebx)

//      lseek($fd,0,$SEEK_END);

        movl    $filelen,%ebx           // save file length
        movl    %eax,(%ebx)

        xorl    %edx,%edx

//      mmap(NULL,$filelen,PROT_READ,MAP_SHARED,$fd,0);
        movl    %edx,(%esp)
        movl    %eax,4(%esp)            // file length still in %eax
        movl    $PROT_READ,8(%esp)
        movl    $MAP_SHARED,12(%esp)
        movl    $fd,%ebx                // load file descriptor
        movl    (%ebx),%eax
        movl    %eax,16(%esp)
        movl    %edx,20(%esp)
        movl    $SYS_mmap,%eax
        movl    %esp,%ebx
        int     $0x80

        movl    $mappedptr,%ebx         // save ptr
        movl    %eax,(%ebx)

//      write($stdout, $mappedptr, $filelen);
//      munmap($mappedptr, $filelen);
//      close($fd);

        movl    %ebp,%esp
        popl    %ebp

        ret
$

**NOTE: The above source listing differs from the example source code
found at the end of the article. The code listed above does not show the
other syscalls, as they are not the focus of this section. The source above
also only opens mmap.s, whereas the example source reads the command line
arguments. The mmap example also uses lseek to get the filesize.**

Socket syscalls make use of only one syscall number: SYS_socketcall,
which goes in %eax. The socket functions are identified via subfunction
numbers, which are located in /usr/include/linux/net.h and are stored in
%ebx. A pointer to the syscall args is stored in %ecx. Socket syscalls are
also executed with int $0x80.

$ cat socket.s
.include "defines.h"

.globl  _start
_start:
        pushl   %ebp
        movl    %esp,%ebp
        subl    $12,%esp

//      socket(AF_INET,SOCK_STREAM,IPPROTO_TCP);
        movl    $AF_INET,(%esp)
        movl    $SOCK_STREAM,4(%esp)
        movl    $IPPROTO_TCP,8(%esp)

        movl    $SYS_socketcall,%eax
        movl    $SYS_socketcall_socket,%ebx
        movl    %esp,%ecx
        int     $0x80

        movl    $SYS_exit,%eax
        xorl    %ebx,%ebx
        int     $0x80

        movl    %ebp,%esp
        popl    %ebp
        ret
$

Command line arguments in Linux executables are arranged on the stack.
argc comes first, followed by an array of pointers (**argv) to the strings
on the command line, followed by a NULL pointer. Next comes an array of
pointers to the environment (**envp). These are very simply obtained in
asm, and this is demonstrated in the example code (args.s).

This section on GCC inline asm will only cover x86 applications.
Operand constraints will differ on other processors. The location of the
listing will be at the end of this article.

Basic inline assembly in gcc is very straightforward. In its basic form
it looks like this:

__asm__("movl %esp,%eax");      // look familiar ?

or

__asm__(
        "movl $1,%eax\n\t"      // SYS_exit
        "xor %ebx,%ebx\n\t"
        "int $0x80"
);

It is possible to use it more effectively by specifying the data that
will be used as input and output for the asm, as well as which registers
will be modified. No particular input/output/modify field is compulsory. It
is of the format:

__asm__("asm code" : output : input : modify);

The output and input fields must consist of an operand constraint
string followed by a C expression enclosed in parentheses. The output
operand constraints must be preceded by an '=' which indicates that it is
an output. There may be multiple outputs, inputs, and modified registers.
Each "entry" should be separated by commas (',') and there should be no
more than 10 entries total. The operand constraint string contains an
abbreviation from the table below. Full register names such as "eax" belong
in the modify field; current versions of gcc do not read them as register
names in the output and input fields.

Abbrev Table

Abbrev  Register
a       %eax/%ax/%al
b       %ebx/%bx/%bl
c       %ecx/%cx/%cl
d       %edx/%dx/%dl
S       %esi/%si
D       %edi/%di
m       memory

Example:

__asm__("test %%eax,%%eax" : /* no output */ : "a"(foo));

You can also use the keyword __volatile__ after __asm__: "You can
prevent an `asm' instruction from being deleted, moved significantly, or
combined, by writing the keyword `volatile' after the `asm'."

(Quoted from the "Assembler Instructions with C Expression Operands"
section in the gcc info files.)

$ cat inline1.c
#include <stdio.h>

int main(void) {
        int foo=10,bar=15;

        __asm__ __volatile__ ("addl %%ebx,%%eax"
                : "=a"(foo)             // output
                : "a"(foo), "b"(bar)    // input
                : "cc"                  // modify (condition flags)
        );
        printf("foo+bar=%d\n", foo);
        return 0;
}
$

You may have noticed that registers are now prefixed with "%%" rather
than '%'. This is necessary when using the output/input/modify fields
because register aliases based on the extra fields can also be used. I
will discuss these shortly.

The constraint "a" does not force one particular register size: it
stands for %eax, %ax or %al, depending on the size of the C expression. The
same goes for the other general purpose registers (as shown in the Abbrev
table). This seems useless when the actual code names specific registers,
and hence gcc provides you with register aliases. There is a max of 10
(%0-%9), which is also the reason why only 10 inputs/outputs are allowed.

$ cat inline2.c

int main(void) {
        long eax;
        short bx;
        char cl;

        __asm__("nop;nop;nop"); // to separate inline asm from the rest of
                                // the code
        __asm__ __volatile__(
                "test %0,%0\n\t"
                "test %1,%1\n\t"
                "test %2,%2"
                : /* no outputs */
                : "a"((long)eax), "b"((short)bx), "c"((char)cl)
        );
        __asm__("nop;nop;nop");
        return 0;
}
$ gcc -o inline2 inline2.c
$ gdb ./inline2
GNU gdb 4.18
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnulibc1"...
(no debugging symbols found)...
(gdb) disassemble main
Dump of assembler code for function main:
... start: inline asm ...
0x8048427: nop
0x8048428: nop
0x8048429: nop
0x804842a: mov 0xfffffffc(%ebp),%eax
0x804842d: mov 0xfffffffa(%ebp),%bx
0x8048431: mov 0xfffffff9(%ebp),%cl
0x8048434: test %eax,%eax
0x8048436: test %bx,%bx
0x8048439: test %cl,%cl
0x804843b: nop
0x804843c: nop
0x804843d: nop
... end: inline asm ...
End of assembler dump.
$

As you can see, the code that was generated from the inline asm loads
the values of the variables into the registers they were assigned to in the
input field and then proceeds to carry out the actual code. The compiler
auto detects operand size from the size of the variables and so the
corresponding registers are represented by the aliases %0, %1 and %2.
(Specifying the operand size in the mnemonic when using the register
aliases may cause errors while compiling).

The aliases may also be used in the operand constraints. This does not
allow you to specify more than 10 entries in the input/output fields. The
only use for this I can think of is when you specify the operand constraint
as "q", which allows the compiler to choose between the a, b, c, d
registers. When this register is modified we will not know which register
has been chosen and consequently cannot specify it in the modify field. In
that case, list it as an output and give the input that supplies its
initial value the constraint "<number>", where <number> is the alias of
that output.

Example:

$ cat inline3.c
#include <stdio.h>

int main(void) {
        long eax=1,ebx=2;

        __asm__ __volatile__ ("add %1,%0"
                : "=q"(ebx)
                : "a"(eax), "0"(ebx)
        );
        printf("ebx=%lx\n", ebx);
        return 0;
}
$

Compiling assembly language programs is much like compiling normal C
programs. If your program looks like Listing 1, then you would compile it
like you would a C app. If you use _start instead of main, like in Listing
2, you would compile the app slightly differently:

Listing 1

$ cat write.s
.include "defines.h"
.data
hw:
        .string "hello world\n"
.text
.globl  main
main:
        movl    $SYS_write,%eax
        movl    $1,%ebx
        movl    $hw,%ecx
        movl    $12,%edx
        int     $0x80
        movl    $SYS_exit,%eax
        xorl    %ebx,%ebx
        int     $0x80
        ret
$ gcc -o write write.s
$ wc -c ./write
4790 ./write
$ strip ./write
$ wc -c ./write
2556 ./write

Listing 2

$ cat write.s
.include "defines.h"
.data
hw:
        .string "hello world\n"
.text
.globl  _start
_start:
        movl    $SYS_write,%eax
        movl    $1,%ebx
        movl    $hw,%ecx
        movl    $12,%edx
        int     $0x80
        movl    $SYS_exit,%eax
        xorl    %ebx,%ebx
        int     $0x80
$ gcc -c write.s
$ ld -s -o write write.o
$ wc -c ./write
408 ./write

The -s switch is optional; it just creates a stripped ELF executable,
which is smaller than a non-stripped one. This method (Listing 2) also
creates smaller executables, since the compiler isn't adding extra entry
and exit routines as would normally be the case.
