Mixing Assembly and C

In many applications, mixing Assembly and C is routine (pun intended). There are many reasons for it but, in general, you use Assembly when you want to deal with the hardware directly or perform a task with maximum speed and minimum use of resources, while you use C for the higher-level work that doesn’t have those requirements. In either case, you’ll need one integrated system.

There are three ways to mix Assembly and C:

  • Using Assembly-defined functions in C
  • Using C-defined functions in Assembly
  • Using Assembly code in C

We’ll explore them all in this tutorial.

USING ASSEMBLY-DEFINED FUNCTIONS IN C

Let’s first take the example of a function that takes no parameters and doesn’t return anything, like one that just prints something on screen.

hello_world.s

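.globl hello_world
.type hello_world, @function
.section .data
message: .ascii "Hello, World!\n"
length: .quad . - message    # number of bytes in message
.section .text
hello_world:
  mov $1, %rax               # system call number 1: write
  mov $1, %rdi               # file descriptor 1: standard output
  mov $message, %rsi         # address of the text
  mov length, %rdx           # number of bytes to write
  syscall
  ret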

 

(If you don’t quite understand the above syntax, read my previous tutorial)

Now let’s create a C program to call this function:

hello_world.c

extern void hello_world();

int main()
{
  hello_world();
  return 0;
}


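Notice the use of the extern keyword. It tells the compiler that a given function or variable is defined somewhere other than the current file. It’s the linker’s job to connect this declaration with the actual definition.

Now let’s compile and link both files at the same time in order to obtain an executable file:

gcc hello_world.c hello_world.s -o hello_world

The assembly files in this tutorial refer to their data through absolute addresses, so if your GCC produces position-independent executables by default and the link fails with a relocation error, add the -no-pie option.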


That’s all! Pretty easy, right? Now let’s advance to a more challenging scenario: a function that returns a value. As I said in the previous tutorial, by convention, Assembly functions return values in the AX register. This is also true for C programs. Check out this example:

return_10.s

.globl return_10
.type return_10, @function
return_10:
  movl $10, %eax
  ret


This function only puts the value 10 into the EAX register. Now on the C side:

return_10.c

#include <stdio.h>

extern int return_10();

int main()
{
  printf("%d\n", return_10());
}


It’s worth noting that, on the Assembly side, I’m moving a two-word (32-bit) value into the EAX register, which matches the 32-bit int declared on the C side. I could move a four-word (64-bit) value into the RAX register instead, and for a small value like 10 the program would still print 10. Here’s the reason:

As you may know, RAX is the 64-bit version of the AX register, hence it can store 64 bits simultaneously. EAX is not a separate register: it is the name of the lower 32 bits (the least significant half) of RAX. Let’s suppose we move the decimal value 10 into the RAX register. It would look like this:

0…01010 (60 zeroes + 1010).

The meaningful bits sit in the lower half, so reading this sequence through EAX still gives 10. The trouble starts when the value doesn’t fit in 32 bits: since the return type on the C side is int, a datatype with a size of 32 bits, the caller reads only EAX and the upper half of RAX is silently discarded. In order to avoid this problem, I should either stick with the 32-bit registers (EAX, EDI, ESI…) for int values or use long int on the C side together with the 64-bit registers.

Lesson learnt: One must check that the size of the registers matches the size of the types in C.

Now the last scenario: a function that takes parameters and returns a value, like one that returns the sum of two values:

sum.c

#include <stdio.h>

extern int sum(int, int);

int main()
{
  printf("%d\n", sum(2, 3));
  return 0;
}


Now the Assembly definition:

sum.s

.globl sum
.type sum, @function
sum:
  addl %edi, %esi
  movl %esi, %eax
  ret


You may be asking: Hey, what’s wrong? Why am I using the edi and esi registers?

Here’s the trick: on 64-bit Linux, GCC follows a calling convention in which, instead of the parameters being pushed onto the stack by the caller to be read by the called function, the first integer and pointer parameters are stored in registers. It’s the called function’s job to save them on the stack if it needs to. Those registers are used in the following order (the underscore stands for the size prefix: e for 32 bits, r for 64 bits):

  • _di: Holds the first argument
  • _si: Holds the second argument
  • _dx: Holds the third argument
  • _cx: Holds the fourth argument
  • r8d: Holds the fifth argument
  • r9d: Holds the sixth argument

Any further arguments are passed on the stack. In the above example, the value 2 is stored in the edi register and the value 3 is stored in the esi register. Therefore, we simply add them (through the addl instruction) and move the result to the eax register.

USING C-DEFINED FUNCTIONS IN ASSEMBLY

Here’s the first example: using the printf C function in Assembly:

hello_world.s

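.extern printf
.globl main
.section .data
message: .asciz "Hello, World!\n"   # .asciz adds the terminating NUL that %s relies on
format: .asciz "%s"
.section .text
main:
  sub $8, %rsp          # keep the stack 16-byte aligned for the call
  mov $format, %rdi     # first argument: the format string
  mov $message, %rsi    # second argument: the string to print
  mov $0, %rax          # no vector registers carry arguments
  call printf
  add $8, %rsp          # undo the adjustment before returning
  ret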


Now compile the Assembly program with GCC:

gcc hello_world.s -o hello_world


GCC will automatically link with the function definition. In the same way we used the extern keyword in C, we use the .extern directive to tell the assembler that printf is defined externally (the GNU assembler treats every undefined symbol as external anyway, so the directive mostly serves as documentation).

That is equivalent to the following C program:

hello_world.c

#include <stdio.h>

int main()
{
  return printf("%s", "Hello, World!\n");
}


The printf function in C takes a format followed by the value(s) to print. As said previously, the first parameter goes in the rdi register while the second parameter goes in the rsi register.

Note: printf accepts a variable number of arguments, so before calling it the AL register (the lowest byte of rax) must hold the number of vector registers used to pass arguments. We use none here, so the value of rax must be zero!

When compiling Assembly programs with GCC, the starting symbol is no longer _start but main. main is a function, therefore it must end with the ret instruction.

Our second example uses the scanf function. Like printf, it takes two or more parameters: the format and the destination addresses where the values read from standard input will be stored. Note: from the second parameter on, the parameters are no longer values, but memory addresses (pointers).

example_scanf.s

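.extern scanf
.globl main
.section .data
a: .long 0                # 32-bit integer, matches %d
b: .long 0                # 32-bit integer, matches %d
format: .asciz "%d %d"    # .asciz adds the terminating NUL
.section .text
main:
  push %rbx               # rbx must be preserved; the push also aligns the stack for the call
  mov $format, %rdi
  mov $a, %rsi
  mov $b, %rdx
  mov $0, %rax
  call scanf
  movl a, %eax
  movl b, %ebx
  addl %ebx, %eax
  pop %rbx                # restore rbx before returning
  ret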


First, we declare three “variables” in data section:

  • a: A two-word (32-bit) region of memory that initially stores the value zero;
  • b: A two-word (32-bit) region of memory that initially stores the value zero;
  • format: A region of memory that stores the ASCII string “%d %d”.

We then pass the address of format as the first parameter, the address of a as the second parameter and the address of b as the third parameter. Before calling scanf, we set RAX to 0 (just like in the printf example). After the call, we move the value stored at address a to the eax register and the value stored at address b to the ebx register. We then add them and leave the result in eax, which makes it the return value of main.

After executing the program, if we echo the program execution status:

echo $?


We’ll be able to see the sum of both typed numbers, as long as it fits in the 0 to 255 range of an exit status.

The above example is equivalent to the following C program:

example_scanf.c

#include <stdio.h>

int main()
{
  int a = 0;
  int b = 0;
  char* format = "%d %d";
  scanf(format, &a, &b);
  return a + b;
}

USING ASSEMBLY CODE IN C

Our third category is pretty straightforward. See the example:

sum.c

#include <stdio.h>

int sum(int a, int b)
{
  asm("addl %edi, %esi");
  asm("movl %esi, %eax");
}

int main()
{
  printf("%d\n", sum(2, 3));
  return 0;
}


Now you can compile it normally:

gcc sum.c -o sum


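The compiler will simply insert the assembly code in the appropriate place in the compiled code. Keep in mind that it inserts the text as it is, without checking which registers the surrounding C code is using, so this example is only reliable in an unoptimized build.

CONCLUSION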

We’ve just learnt some very powerful tools! Learning how to mix Assembly and C gives us a deep insight into how the C compiler actually works. I strongly recommend this website for further learning. Play around with it, try some snippets, and see how they’re translated into Assembly.
