Say hello to x86_64 Assembly [part 4]

title: Say hello to x86_64 Assembly [part 4]

date: 2020-01-11 23:30:09

tags:

  • x86

  • x64

  • 汇编

  • assembly


Original article (source of this translation): Say hello to x86_64 Assembly [part 4]

Some time ago I started writing a series of blog posts about assembly programming for x86_64. You can find them under the asm tag. Unfortunately, I have been busy lately and there were no new posts, so today I continue writing about assembly and will try to post every week.

Today we will look at strings and some string operations. We still use the NASM assembler on Linux x86_64.

Reverse string

Of course, when we talk about assembly language we cannot really talk about a string data type; we are actually dealing with arrays of bytes. Let's write a simple example: we will define string data, reverse it, and write the result to stdout. This kind of task is simple and popular when learning a new programming language. Let's look at the implementation.

First of all, I define the initialized data. It is placed in the data section (you can read about sections in an earlier part of this series):

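section .data
    SYS_WRITE equ 1
    STD_OUT   equ 1
    SYS_EXIT  equ 60
    EXIT_CODE equ 0

    NEW_LINE db 0xa
    ;; zero terminator added: calculateStrLength stops at the first 0 byte
    INPUT db "Hello world!", 0

section .bss
    ;; buffer for the reversed string (12 bytes, the length of INPUT)
    OUTPUT resb 12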

section .text
    global _start

_start:
    mov rsi, INPUT
    xor rcx, rcx
    cld
    mov rdi, $ + 15
    call calculateStrLength
    xor rax, rax
    xor rdi, rdi
    jmp reverseStr

calculateStrLength:
    ;; check whether we reached the end of the string
    cmp byte [rsi], 0
    ;; if yes, exit from the function
    je exitFromRoutine
    ;; load a byte from [rsi] into al and increment rsi
    lodsb
    ;; push the symbol onto the stack
    push rax
    ;; increase the counter
    inc rcx
    ;; loop again
    jmp calculateStrLength

;; first attempt: a bare ret (this version does not work, see the text)
exitFromRoutine:
    ;; return to _start
    ret

mov rdi, $ + 15

objdump -D reverse

reverse:     file format elf64-x86-64

Disassembly of section .text:

00000000004000b0 <_start>:
  4000b0: 48 be 41 01 60 00 00  movabs $0x600141,%rsi
  4000b7: 00 00 00
  4000ba: 48 31 c9              xor    %rcx,%rcx
  4000bd: fc                    cld
  4000be: 48 bf cd 00 40 00 00  movabs $0x4000cd,%rdi
  4000c5: 00 00 00
  4000c8: e8 08 00 00 00        callq  4000d5 <calculateStrLength>
  4000cd: 48 31 c0              xor    %rax,%rax
  4000d0: 48 31 ff              xor    %rdi,%rdi
  4000d3: eb 0e                 jmp    4000e3 <reverseStr>

exitFromRoutine:
    ;; push the return address back onto the stack
    push rdi
    ;; return to _start
    ret
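
reverseStr:
    ;; all symbols written? then print the result
    cmp rcx, 0
    je printResult
    ;; pop the next symbol (the last one pushed)
    pop rax
    ;; store only the low byte; storing rax would write 8 bytes at this index
    mov [OUTPUT + rdi], al
    dec rcx
    inc rdi
    jmp reverseStr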

printResult:
    ;; rdx = number of bytes written to OUTPUT
    mov rdx, rdi
    mov rax, SYS_WRITE
    mov rdi, STD_OUT
    mov rsi, OUTPUT
    syscall
    jmp printNewLine

printNewLine:
    mov rax, SYS_WRITE
    mov rdi, STD_OUT
    mov rsi, NEW_LINE
    mov rdx, 1
    syscall
    jmp exit

exit:
    mov rax, SYS_EXIT
    mov rdi, EXIT_CODE
    syscall

all:
	nasm -g -f elf64 -o reverse.o reverse.asm
	ld -o reverse reverse.o

clean:
	rm reverse reverse.o

String operations

Of course there are many other instructions for string/byte manipulation:

  • REP - repeat the following string instruction while rcx is not zero

  • MOVSB - copy a string of bytes (MOVSW, MOVSD, etc. for larger elements)

  • CMPSB - byte string comparison

  • SCASB - byte string scanning

  • STOSB - write byte to string
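As a small illustration of REP and MOVSB from the list above, the following sketch copies a string from one buffer to another and prints the copy. It is not taken from the original article: the names SRC, SRC_LEN and DST are made up for the example, and it assumes the same NASM and Linux x86_64 setup used in this post.

```nasm
section .data
    ;; SRC, SRC_LEN and DST are hypothetical names used only in this sketch
    SRC     db "Hello world!", 0xa
    SRC_LEN equ $ - SRC          ; number of bytes in SRC (13)

section .bss
    DST resb SRC_LEN             ; destination buffer of the same size

section .text
    global _start

_start:
    cld                          ; DF = 0: rsi and rdi move forward
    mov rsi, SRC                 ; source address
    mov rdi, DST                 ; destination address
    mov rcx, SRC_LEN             ; number of bytes to copy
    rep movsb                    ; copy one byte at a time until rcx is 0

    mov rax, 1                   ; write syscall
    mov rdi, 1                   ; stdout
    mov rsi, DST                 ; print the copy, not the original
    mov rdx, SRC_LEN
    syscall

    mov rax, 60                  ; exit syscall
    xor rdi, rdi                 ; exit code 0
    syscall
```

It can be assembled and linked the same way as reverse.asm. When run, it should print Hello world! once, taken from the DST buffer.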

That's all. Now we can compile our program with the Makefile from the listing above and run the resulting reverse binary.

The exit label in the listing above terminates the program with the exit syscall.

The reverseStr routine in the listing above first checks the counter, which holds the length of the string. If it is zero, all symbols have been written to the buffer and we can print it. Otherwise we pop the next symbol from the stack into rax and write it to the OUTPUT buffer. We add rdi to the address because otherwise every symbol would be written to the first byte of the buffer. After that we increment rdi to move to the next position in OUTPUT, decrement the length counter, and jump back to the start of the label.

After reverseStr finishes, the OUTPUT buffer holds the reversed string, and printResult and printNewLine write it to stdout followed by a new line.

Now we return to _start. After the call to calculateStrLength we write zeros to rax and rdi and jump to the reverseStr label.

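In the disassembly we can see that the movabs at 4000be (our mov rdi, $ + 15) takes 10 bytes and the call at 4000c8 takes 5 bytes, 15 bytes in total. So the return address, the instruction right after the call at 4000cd, is $ + 15. Now we can push the return address from rdi onto the stack and return from the function, as the second exitFromRoutine in the listing above does.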

So mov rdi, $ + 15 puts the address of the current instruction plus 15 into rdi. But why do we add 15? We need the address of the instruction that follows the call to calculateStrLength, and the objdump -D output in the listing above shows where it is. The expression uses one of two special NASM tokens:

  • $ - the address of the beginning of the line where $ is used

  • $$ - the address of the beginning of the current section
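A more common use of $ than the return-address trick is computing the length of a string at assembly time, by subtracting the string's label from the current position. The sketch below is not part of the original program; the names MSG and MSG_LEN are made up for the example.

```nasm
section .data
    ;; MSG and MSG_LEN are hypothetical names used only in this sketch
    MSG     db "Hello world!", 0xa
    ;; $ is the address of this line, i.e. the first byte after MSG,
    ;; so $ - MSG is the number of bytes in MSG (13)
    MSG_LEN equ $ - MSG

section .text
    global _start

_start:
    mov rax, 1                   ; write syscall
    mov rdi, 1                   ; stdout
    mov rsi, MSG                 ; buffer to print
    mov rdx, MSG_LEN             ; length computed by the assembler
    syscall

    mov rax, 60                  ; exit syscall
    xor rdi, rdi                 ; exit code 0
    syscall
```

Because MSG_LEN is an assembly-time constant, no length-counting loop is needed at run time. This only works when the string is known when the program is assembled.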

With a bare ret in exitFromRoutine it will not work. Why? It is tricky. Remember that we called calculateStrLength from _start. What happens when we call a function? Any arguments passed on the stack are pushed first, and then the call instruction pushes the return address, so the function knows where to return when it finishes. But look at calculateStrLength: we pushed the symbols of our string onto the stack, so the return address is no longer at the top of the stack and ret would jump to a garbage address. What can we do about it? Now we must take a look at the weird instruction before the call, mov rdi, $ + 15.
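One possible alternative to precomputing the return address with $ + 15 is to let the routine itself pop the return address on entry and push it back just before ret. The sketch below is a variation written for illustration, not the article's code: the routine name pushChars and the use of rdx to hold the return address are assumptions, and INPUT is given an explicit zero terminator.

```nasm
section .data
    INPUT    db "Hello world!", 0    ; zero terminator assumed by the loop
    NEW_LINE db 0xa

section .bss
    OUTPUT resb 12

section .text
    global _start

_start:
    mov rsi, INPUT
    xor rcx, rcx
    cld
    call pushChars               ; returns with rcx = length, chars on the stack
    xor rdi, rdi                 ; index into OUTPUT
.reverse:
    test rcx, rcx
    jz .print
    pop rax                      ; last pushed character comes off first
    mov [OUTPUT + rdi], al       ; store a single byte
    inc rdi
    dec rcx
    jmp .reverse
.print:
    mov rdx, rdi                 ; number of bytes written to OUTPUT
    mov rax, 1                   ; write syscall
    mov rdi, 1                   ; stdout
    mov rsi, OUTPUT
    syscall
    mov rax, 1
    mov rdi, 1
    mov rsi, NEW_LINE
    mov rdx, 1
    syscall
    mov rax, 60                  ; exit syscall
    xor rdi, rdi
    syscall

;; pushChars is a hypothetical name; rdx is assumed to be free here
pushChars:
    pop rdx                      ; take the return address pushed by call
.next:
    cmp byte [rsi], 0
    je .done
    lodsb                        ; al = [rsi], rsi += 1
    push rax
    inc rcx
    jmp .next
.done:
    push rdx                     ; put the return address back on top
    ret
```

This variant does not depend on the encoded size of the instructions around the call, so it keeps working if that code changes. The cost is one extra register.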

OK, we pushed all symbols of the string onto the stack. Now we can jump to exitFromRoutine and return to _start from there. How? We have the ret instruction for this. But suppose exitFromRoutine is just a bare ret, as in the first version in the listing above.

As its name suggests, calculateStrLength calculates the length of the INPUT string and stores the result in the rcx register. First we check whether rsi points to a zero byte; if so, this is the end of the string and we can exit from the function. Next is the lodsb instruction. It is simple: it loads one byte from the address in rsi into al (the low 8 bits of rax) and updates rsi. Because we executed cld, lodsb increments rsi by one byte each time, so we walk through the string from left to right. After that we push rax onto the stack; its low byte now contains a symbol from our string. Why do we push the symbol onto the stack? Remember how the stack works: it is LIFO (last in, first out), which is very convenient here. We take the first symbol from the string and push it, then the second, and so on, so the last symbol of the string ends up on top of the stack. Then we just pop symbol by symbol and write each one to the OUTPUT buffer. Finally we increment the counter (rcx) and loop back to the start of the routine.
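For comparison, if only the length were needed, it could also be obtained without an explicit loop by using SCASB with the REPNE prefix. This is a sketch of that alternative, not the article's approach; it assumes INPUT ends with a zero byte and returns the length as the process exit status so it can be checked with echo $?.

```nasm
section .data
    INPUT db "Hello world!", 0   ; zero terminator assumed

section .text
    global _start

_start:
    cld                          ; DF = 0: scan forward
    mov rdi, INPUT               ; scasb compares al with the byte at [rdi]
    xor eax, eax                 ; al = 0, the byte we are looking for
    mov rcx, -1                  ; effectively "no limit" on the scan
    repne scasb                  ; repeat while [rdi] != al, decrementing rcx

    ;; rcx started at -1 and was decremented once per scanned byte,
    ;; including the terminator, so length = not(rcx) - 1
    not rcx
    dec rcx                      ; rcx = 12 for "Hello world!"

    mov rdi, rcx                 ; exit status = string length
    mov eax, 60                  ; exit syscall
    syscall
```

After running the program, echo $? should print 12. The article's loop is still needed for the reversal itself, because it pushes each character while counting.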

Here are some new things. Let's see how it works. First we put the address of INPUT into the rsi register with mov rsi, INPUT, as we did when writing to stdout, and write zero to rcx, which will be the counter for the length of our string. Then comes the cld instruction, which resets the DF flag to zero. We need it because we will walk through the symbols of the string, and with DF equal to 0 they are handled from left to right. Next we call the calculateStrLength function. I skipped the mov rdi, $ + 15 instruction before the call; I will explain it a little later. And now let's look at the calculateStrLength implementation in the listing above.

OK, we have some data and a buffer for the result. Now we can define the text section for the code. Let's start with the main _start routine in the listing above.

In the data section we can see four constants:

  • SYS_WRITE - the 'write' syscall number

  • STD_OUT - the stdout file descriptor

  • SYS_EXIT - the 'exit' syscall number

  • EXIT_CODE - the exit code

The syscall list can be found here. The section also defines:

  • NEW_LINE - the new line (\n) symbol

  • INPUT - our input string, which we will reverse

Next we define the bss section for our buffer, where we will put the reversed string (OUTPUT in the listing above).
