Say hello to x86_64 Assembly [part 2]

title: Say hello to x86_64 Assembly [part 2]

date: 2020-01-09 23:40:11

tags:

  • x86

  • x64

  • 汇编

  • assembly


Original article:

Say hello to x86_64 Assembly [part 2]

Some days ago I wrote the first blog post, an introduction to x64 assembly: Say hello to x64 Assembly [part 1]. To my surprise, it attracted a lot of interest.

It motivates me even more to describe my way of learning. Over these days I got a lot of feedback from different people. There were many words of thanks, but, more importantly for me, there was a lot of advice and fair criticism. In particular, I want to thank the following people for their great feedback:

  • Fiennes

  • Grienders

  • nkurz

And everyone who took part in the discussions on Reddit and Hacker News. Many people felt that the first part was not very clear for absolute beginners, which is why I decided to write more informative posts. So, let's start the second part of Say hello to x86_64 assembly.

Terminology and Concepts

As I wrote above, I got a lot of feedback saying that some parts of the first post were not clear, so let's start with a description of some terms that we will see in this and the next parts.

Register - a register is a small amount of storage inside the processor. The main job of a processor is processing data. The processor can get data from memory, but that is a slow operation. That's why the processor has its own small internal set of storage locations, called registers.

Little-endian - we can imagine memory as one large array of bytes. Each address stores one element of this memory "array", and each element is one byte. For example, say we have the 4-byte value AA 56 AB FF (0xAA56ABFF). In little-endian the least significant byte has the smallest address:

    0 FF
    1 AB
    2 56
    3 AA

where 0, 1, 2 and 3 are memory addresses.

Big-endian - big-endian stores bytes in the opposite order to little-endian. So the same AA 56 AB FF byte sequence will be stored as:

    0 AA
    1 56
    2 AB
    3 FF

Syscall - a system call is the way a user-level program asks the operating system to do something for it. You can find the syscall table here.

Stack - the processor has a very limited number of registers, so the stack is a contiguous area of memory addressed through special registers such as RSP (the stack pointer). We will take a closer look at the stack in the next parts.

Section - every assembly program consists of sections. There are the following sections:

  • data - section used for declaring initialized data or constants

  • bss - section used for declaring uninitialized variables

  • text - section used for code

General-purpose registers - there are 16 general-purpose registers: rax, rbx, rcx, rdx, rbp, rsp, rsi, rdi, r8, r9, r10, r11, r12, r13, r14, r15.

Of course, this is not a full list of the terms and concepts related to assembly programming. If we meet other strange and unfamiliar words in the next blog posts, they will be explained there.

Data Types

The fundamental data types are bytes, words, doublewords, quadwords, and double quadwords. A byte is 8 bits, a word is 2 bytes, a doubleword is 4 bytes, a quadword is 8 bytes and a double quadword is 16 bytes (128 bits).

For now we will work only with integers, so let's look at them. There are two types of integers: unsigned and signed. Unsigned integers are unsigned binary numbers contained in a byte, word, doubleword, or quadword. Their values range from 0 to 255 for an unsigned byte integer, from 0 to 65,535 for an unsigned word integer, from 0 to 2^32 - 1 for an unsigned doubleword integer, and from 0 to 2^64 - 1 for an unsigned quadword integer. Signed integers are binary numbers in two's-complement representation held in a byte, word, doubleword, or quadword. The sign bit (the most significant bit) is set for negative integers and cleared for positive integers and zero. Signed values range from -128 to +127 for a byte integer, from -32,768 to +32,767 for a word integer, from -2^31 to +2^31 - 1 for a doubleword integer, and from -2^63 to +2^63 - 1 for a quadword integer.

Sections

As I wrote above, every assembly program consists of sections: a data section, a text section and a bss section. Let's look at the data section. Its main purpose is to declare initialized data and constants. For example:

    section .data
        num1:   equ 100
        num2:   equ 50
        msg:    db "Sum is correct", 10

OK, this is almost all clear. We have three names: num1 and num2 with the values 100 and 50, and msg with the value "Sum is correct", 10 (10 is the ASCII code of the newline character). But what are db and equ? NASM supports a number of pseudo-instructions:

  • DB, DW, DD, DQ, DT, DO, DY and DZ - used for declaring initialized data. For example:

    ;; Initialize 4 bytes 1h, 2h, 3h, 4h
    db 0x01,0x02,0x03,0x04

    ;; Initialize a word with the value 0x1234
    ;; (stored in memory as 0x34 0x12, because x86_64 is little-endian)
    dw    0x1234

  • RESB, RESW, RESD, RESQ, REST, RESO, RESY and RESZ - used for declaring uninitialized variables

  • INCBIN - includes external binary files

  • EQU - defines a constant. For example:

    ;; now one is 1
    one equ 1

  • TIMES - repeats instructions or data (it will be described in the next posts)

Arithmetic operations

Here is a short list of arithmetic instructions:

  • ADD - integer add

  • SUB - subtract

  • MUL - unsigned multiply

  • IMUL - signed multiply

  • DIV - unsigned divide

  • IDIV - signed divide

  • INC - increment

  • DEC - decrement

  • NEG - negate (two's complement)

We will see some of them in practice in this post. The others will be covered in the next posts.

Control flow

Programming languages usually have ways to change the order of evaluation (if statements, case statements, goto, etc.), and assembly has them too. Here we will see some of them. The cmp instruction compares two values. It is used together with conditional jump instructions for decision making. For example:

    ;; compare rax with 50
    cmp rax, 50

The cmp instruction only compares two values. It subtracts the second operand from the first and sets the processor flags accordingly, but it doesn't change its operands and doesn't act on the result. To act on the result of a comparison there are conditional jump instructions, for example:

  • JE - if equal

  • JZ - if zero

  • JNE - if not equal

  • JNZ - if not zero

  • JG - if the first operand is greater than the second (signed comparison)

  • JGE - if the first operand is greater than or equal to the second (signed comparison)

  • JA - the same as JG, but performs an unsigned comparison

  • JAE - the same as JGE, but performs an unsigned comparison

For example, if we want to write something like this if/else statement in C:

    if (rax != 50) {
        exit();
    } else {
        right();
    }

in assembly it will look like this:

    ;; compare rax with 50
    cmp rax, 50
    ;; jump to .exit if rax is not equal to 50
    jne .exit
    ;; otherwise jump to .right
    jmp .right

There is also an unconditional jump, with the syntax:

    JMP label

Example:

    _start:
        ;; ....
        ;; do something and jump to .exit label
        ;; ....
        jmp .exit

    .exit:
        mov    rax, 60
        mov    rdi, 0
        syscall

Here we can have some code after the _start label. All of it will be executed, then jmp transfers control to the .exit label, and the code after .exit: starts executing.

Unconditional jumps are often used in loops. For example, we have a label and some code after it. This code does something, then we check a condition and, depending on it, jump back to the start of this code. Loops will be covered in the next parts.

Example

Let's look at a simple example. It will take two integers, compute their sum and compare it with a predefined number. If the predefined number is equal to the sum, it will print something on the screen; if not, it will just exit. Here is the source code of our example:

    section .data
        ; Define constants
        num1:   equ 100
        num2:   equ 50
        ; Initialize the message: "Sum is correct" followed by a newline (10)
        msg:    db "Sum is correct", 10

    section .text

        global _start

    ;; entry point
    _start:
        ; set num1's value to rax
        mov rax, num1
        ; set num2's value to rbx
        mov rbx, num2
        ; get sum of rax and rbx, and store its value in rax
        add rax, rbx
        ; compare rax and 150
        cmp rax, 150
        ; go to .exit label if rax and 150 are not equal
        jne .exit
        ; go to .rightSum label if rax and 150 are equal
        jmp .rightSum

    ; Print message that sum is correct
    .rightSum:
        ;; write syscall
        mov     rax, 1
        ;; file descriptor, standard output
        mov     rdi, 1
        ;; message address
        mov     rsi, msg
        ;; length of message (14 characters + newline)
        mov     rdx, 15
        ;; call write syscall
        syscall
        ; exit from program
        jmp .exit

    ; exit procedure
    .exit:
        ; exit syscall
        mov    rax, 60
        ; exit code
        mov    rdi, 0
        ; call exit syscall
        syscall

Let's go through the source code. First of all, there is the data section with two constants, num1 and num2, and the variable msg holding "Sum is correct" followed by a newline (the byte 10). NASM does not process escape sequences such as \n inside double quotes, so the newline is written as the byte 10 rather than as "\n". Now look at the _start label, the program's entry point. We move the num1 and num2 values into the general-purpose registers rax and rbx and sum them with the add instruction. add computes the sum of the values in rax and rbx and stores the result in rax. Now we have the sum of num1 and num2 in the rax register.

OK, we have num1, which is 100, and num2, which is 50. Our sum must be 150. Let's check it with the cmp instruction. After comparing rax with 150, we check the result of the comparison: if rax and 150 are not equal (checked with jne), we go to the .exit label; if they are equal, we go to the .rightSum label.

Now we have two labels: .exit and .rightSum. The first one just puts 60, the number of the exit system call, into rax, and 0, the exit code, into rdi. The second one, .rightSum, is pretty simple: it calls the write system call (number 1) to print Sum is correct to standard output and then jumps to .exit.

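The control flow of the example program can be mirrored in C, which may make the jumps easier to follow. This is a minimal sketch, not the author's code and not output from any compiler or decompiler. The names NUM1, NUM2, EXPECTED, sum_is_correct and run_example are hypothetical. The "registers" are ordinary uint64_t variables, and fwrite to stdout stands in for the write syscall.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical names mirroring num1/num2, which the assembly defines with equ. */
enum { NUM1 = 100, NUM2 = 50, EXPECTED = 150 };

/* Mirrors msg: db "Sum is correct", 10 (the C literal also adds a trailing NUL). */
static const char msg[] = "Sum is correct\n";

/* The assembly passes 15 in rdx: 14 characters plus the newline. */
static_assert(sizeof msg - 1 == 15, "message length must match rdx = 15");

/* Mirrors mov rax, num1 / mov rbx, num2 / add rax, rbx / cmp rax, 150. */
bool sum_is_correct(uint64_t a, uint64_t b, uint64_t expected)
{
    uint64_t rax = a;        /* mov rax, num1 */
    uint64_t rbx = b;        /* mov rbx, num2 */
    rax += rbx;              /* add rax, rbx  */
    return rax == expected;  /* cmp rax, 150, then jne .exit / jmp .rightSum */
}

/* Follows the program's flow; returns the value the assembly puts in rdi
   before the exit syscall (always 0). */
int run_example(void)
{
    if (sum_is_correct(NUM1, NUM2, EXPECTED)) {
        /* .rightSum: write(1, msg, 15) */
        fwrite(msg, 1, sizeof msg - 1, stdout);
    }
    /* .exit: mov rdi, 0 */
    return 0;
}
```

Calling run_example() from a main function should print Sum is correct and return 0, the same exit code the assembly passes in rdi. If NUM2 is changed to any other value, sum_is_correct returns false and nothing is printed, which matches the jne .exit branch.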