iOS Assembly Tutorial: Understanding ARM
When you write Objective-C code, it eventually turns into machine code – the raw 1s and 0s that the ARM CPU understands. In between Objective-C and machine code, though, is the still human-readable assembly language.
Understanding assembly gives you insight into your code for debugging and optimizing, helps you decipher the Objective-C runtime, and also satisfies that inner nerd curiosity.
In this iOS assembly tutorial, you’ll learn:
- What assembly is – and why you should care about it.
- How to read assembly – in particular, the assembly generated for Objective-C methods.
- How to use the assembly view while debugging – useful to see what is going on and why a bug or crash has occurred.
To get the most out of this tutorial, you should already be familiar with Objective-C programming. You should also understand some simple computer science concepts such as the stack, the CPU and how they work. If you are not at all familiar with CPUs, you might need a little pre-reading before continuing.
Warm up your copy of Xcode 4 and get ready to dive into the innards of ARM!
Getting Started: What is Assembly?
Objective-C is what is known as a high-level language. Your Objective-C code is compiled by a compiler into assembly language: low-level, but still not the lowest level.
This assembly is then assembled by an assembler (say that three times fast!) into machine code, the raw 1s and 0s that the CPU reads. Fortunately, you don’t ever need to worry about machine code, but understanding assembly in detail is sometimes extremely useful.
Each assembly instruction is designed to tell the CPU to perform a task such as “add these two numbers” or “load the contents of this portion of memory.”
Aside from main memory – the 1GB on an iPhone 5 or the 8GB you might have on a Mac, for example – CPUs also have a little bit of working memory that can be accessed very quickly. This working memory is divided up into registers, which are like variables that can hold a single value.
All iOS devices (in fact, pretty much all mobile devices out there these days) use CPUs based on the ARM architecture. Fortunately, this is a fairly easy-to-read instruction set, not least because it’s what is known as RISC (Reduced Instruction Set Computing), meaning that there are fewer instructions available. It is much easier to read than x86, anyway!
An assembly instruction (or statement) looks something like this:
```asm
mov r0, #42
```
There are many commands, or opcodes, in assembly language. One of them, mov, moves data around. In ARM assembly, the destination comes first, so the above instruction moves the value 42 into register r0. Consider this next example:
```asm
ldr r2, [r0]
ldr r3, [r1]
add r4, r2, r3
```
Don’t worry, I’m not expecting you to understand what that means straight away. But you might be able to roughly figure out what’s going on. The code loads two values from memory, from the addresses held in r0 and r1, into registers r2 and r3, then adds the numbers together and stores the result in register r4.
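To make those three instructions concrete, here is a small C sketch that models the register file as an array and memory as a byte buffer, then performs the same three steps. The Cpu type, the load_word and store_word helpers, and the addresses 8 and 16 are all hypothetical, chosen only for illustration; the sketch ignores alignment and other details of real ARM memory access.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model: 16 registers and a tiny byte-addressed memory. */
typedef struct {
    uint32_t r[16];
    uint8_t mem[64];
} Cpu;

/* Write a 32-bit word at a byte address in the model's memory. */
static void store_word(Cpu *cpu, uint32_t addr, uint32_t value) {
    assert(addr + sizeof value <= sizeof cpu->mem);
    memcpy(&cpu->mem[addr], &value, sizeof value);
}

/* ldr rd, [rn]: read the 32-bit word at the address held in rn. */
static uint32_t load_word(const Cpu *cpu, uint32_t addr) {
    uint32_t value;
    assert(addr + sizeof value <= sizeof cpu->mem);
    memcpy(&value, &cpu->mem[addr], sizeof value);
    return value;
}

/* Runs ldr r2, [r0]; ldr r3, [r1]; add r4, r2, r3 and returns r4. */
static uint32_t run_example(uint32_t a, uint32_t b) {
    Cpu cpu;
    memset(&cpu, 0, sizeof cpu);
    store_word(&cpu, 8, a);                  /* operands live in memory */
    store_word(&cpu, 16, b);
    cpu.r[0] = 8;                            /* r0 and r1 hold their addresses */
    cpu.r[1] = 16;

    cpu.r[2] = load_word(&cpu, cpu.r[0]);    /* ldr r2, [r0] */
    cpu.r[3] = load_word(&cpu, cpu.r[1]);    /* ldr r3, [r1] */
    cpu.r[4] = cpu.r[2] + cpu.r[3];          /* add r4, r2, r3 */
    return cpu.r[4];
}
```

Calling run_example(12, 34) returns 46. Because the registers are 32-bit unsigned values, the addition wraps around on overflow, just as a 32-bit add does on the CPU.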
Now that you’ve seen it’s not so intimidating, let’s get a little more detailed.
Calling Conventions
The first and most important thing to understand about reading assembly is the way in which code interacts with other code. By this, I mean the way functions “call” other functions. This includes how parameters are passed to functions and how values are returned from functions.
The way these things are done makes up what is known as the calling convention. Compilers must stick to this defined standard so that code compiled with one compiler can interact with code compiled with a different compiler. Without this standard, compilers could generate incompatible code.
As discussed above, registers are bits of memory very close to the CPU that hold the data currently being acted upon. ARM CPUs contain 16 registers, numbered r0 to r15, each of which is 32 bits wide. The calling convention dictates that some of these registers have a special purpose. They are as follows:
- r0 - r3: These hold parameters passed to a function.
- r4 - r11: These hold a function’s local variables.
- r12: This is the intra-procedure-call scratch register. It is special in that code may overwrite it freely, even in the linker-generated stubs that sit between a call and its destination, so its value can’t be relied upon across a function call.
- r13: The stack pointer. The stack is a very important concept in computer science. This register holds a pointer to the top of the stack. See Wikipedia for more information about stacks.
- r14: The link register. This holds the address of the next instruction to execute when returning from the current function.
- r15: The program counter. This holds the address of the currently executing instruction. It is automatically incremented after each instruction is executed.
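If it helps to see the convention as data, the following C sketch maps a register number to the role the summary above gives it. The RegisterRole enum and register_role function are hypothetical names, and the mapping follows this simplified summary rather than the full ARM procedure call standard (for example, it says nothing about r7’s use as the frame pointer on iOS).

```c
#include <assert.h>

/* Hypothetical enum naming the roles from the simplified summary above. */
typedef enum {
    ROLE_ARGUMENT,        /* r0-r3 */
    ROLE_LOCAL,           /* r4-r11 */
    ROLE_SCRATCH,         /* r12, intra-procedure-call scratch */
    ROLE_STACK_POINTER,   /* r13 (sp) */
    ROLE_LINK_REGISTER,   /* r14 (lr) */
    ROLE_PROGRAM_COUNTER  /* r15 (pc) */
} RegisterRole;

/* Returns the role of register rN, for n in 0..15. */
static RegisterRole register_role(unsigned n) {
    assert(n < 16);
    if (n <= 3)  return ROLE_ARGUMENT;
    if (n <= 11) return ROLE_LOCAL;
    switch (n) {
    case 12: return ROLE_SCRATCH;
    case 13: return ROLE_STACK_POINTER;
    case 14: return ROLE_LINK_REGISTER;
    default: return ROLE_PROGRAM_COUNTER;
    }
}
```

For example, register_role(14) returns ROLE_LINK_REGISTER, matching the lr entry in the list.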
You can read more about the ARM calling convention in this document from ARM. Apple also has a document outlining further details about the calling convention used for iOS development.
Right, enough etiquette – time to get started with some real coding!
Creating the Project
In this iOS assembly tutorial, you won’t create an app, but you’ll still use an Xcode project to illustrate what’s going on. Start Xcode and go to File\New\New Project, select iOS\Application\Single View Application and click Next. Set up the project like so:
- Product name: ARMAssembly
- Company Identifier: Your usual reverse DNS identifier
- Class Prefix: Leave blank
- Devices: iPhone
- Use Storyboards: No
- Use Automatic Reference Counting: Yes
- Include Unit Tests: No
Click Next and finally, choose a location to save your project.
One Plus One
The first thing you’ll do is look at a very simple function that adds two numbers and returns the result. You can’t get much easier than that!
Actually, you can: start with a simple C function, because Objective-C adds a little more complexity. Open main.m in the project’s Supporting Files folder and paste the following function at the top of the file:
```c
int addFunction(int a, int b) {
    int c = a + b;
    return c;
}
```
Now make sure that the scheme is set to build for a device by selecting iOS Device as the scheme target (or it may say <Your_Device_Name>, such as “Matt Galloway’s iPhone 5”, if you have a device plugged in). You want to build for a device so that the assembly generated is ARM, rather than the x86 that the Simulator uses. The scheme selector in Xcode should look like this:
Now go to Product\Generate Output\Assembly File. After some thinking time, Xcode should land you with a file that contains a lot of strange-looking lines. At the top, you’ll see a lot of lines starting with .section. That means you’ve got the right thing! Now select Running from the Show Assembly Output For selector.
Note: You are selecting Running because, by default, the Run action uses the Debug build configuration. In debug mode, the compiler performs no optimizations at all. You want to see the unoptimized assembly first, so that you can see exactly what’s happening.
Search in the generated file for _addFunction. You should find something that looks like the following:
```asm
    .globl  _addFunction
    .align  2
    .code   16                      @ @addFunction
    .thumb_func _addFunction
_addFunction:
    .cfi_startproc
Lfunc_begin0:
    .loc    1 13 0                  @ main.m:13:0
@ BB#0:
    sub     sp, #12
    str     r0, [sp, #8]
    str     r1, [sp, #4]
    .loc    1 14 18 prologue_end    @ main.m:14:18
Ltmp0:
    ldr     r0, [sp, #8]
    ldr     r1, [sp, #4]
    add     r0, r1
    str     r0, [sp]
    .loc    1 15 5                  @ main.m:15:5
    ldr     r0, [sp]
    add     sp, #12
    bx      lr
Ltmp1:
Lfunc_end0:
    .cfi_endproc
```
That may look a bit daunting, but it’s really not that hard to read what’s happening. First, all the lines that begin with a period are not assembly instructions but commands to the assembler itself. You can ignore all of those for now.
The lines that end with a colon, such as _addFunction: and Ltmp0:, are known as labels. These give names to parts of the assembly. The label called _addFunction: is, in fact, the entry point to the function.
This label is required so that other code can call the addFunction routine without having to know exactly where it is, simply by giving the symbolic name, or label. It is then the linker’s job to convert this label into the actual memory address when the final app binary is generated.
Note that the compiler always adds an underscore to the front of function names – this is purely a convention. The other labels all begin with L. These are known as local labels and are only used within the function itself. In this simple example, none of the local labels are actually used, but the compiler still generates them because it is not performing any optimizations at all.
Comments start with the @ character. Note that the compiler helpfully annotates sections of assembly with their corresponding line numbers in main.m.
So, ignoring comments and labels, the important bits are as follows:
```asm
_addFunction:
@ 1:
    sub sp, #12
@ 2:
    str r0, [sp, #8]
    str r1, [sp, #4]
@ 3:
    ldr r0, [sp, #8]
    ldr r1, [sp, #4]
@ 4:
    add r0, r1
@ 5:
    str r0, [sp]
    ldr r0, [sp]
@ 6:
    add sp, #12
@ 7:
    bx lr
```
And this is what each part of that is doing:
1. First, room is created on the stack for any temporary storage. The stack is a big blob of memory that functions can use as they wish. On ARM the stack grows downward, so to create some space on it you must subtract (sub) from the stack pointer. In this case, 12 bytes are reserved.
2. r0 and r1 hold the values passed to the function. If the function took four parameters, then r2 and r3 would hold the third and fourth parameters. If the function took more than four parameters, or took parameters that don’t fit into 32-bit registers, such as large structures, then parameters could be passed via the stack.
   Here, the two parameters are saved to the stack. This is achieved by the store register (str) instruction. The first operand is the register to store and the second is the address at which to store it. The square brackets indicate that the value is a memory address.
   The instruction allows you to specify an offset to apply to the address, so [sp, #8] means “the address held in the stack pointer register, plus 8.” So str r0, [sp, #8] means “store the contents of register 0 at the memory address of stack pointer plus 8.”
3. The values just saved to the stack are read back out into the same registers they were already in. The opposite of str, ldr (load register), loads data from a memory location into a register. The syntax is very similar, so ldr r0, [sp, #8] means “load the contents at the memory address of stack pointer plus 8 and put the value into register 0.”
   If you’re wondering why r0 and r1 are being stored and then immediately reloaded, the answer is that there is no good reason: these two lines, along with the two above, are redundant! If the compiler were allowed to perform even basic optimizations, this redundancy would be eliminated.
4. This is the most important instruction of the function, and performs the addition. It adds the contents of r0 and r1 and puts the result back into r0.
   The add instruction can take either two operands, like this, or three. If three are given, the first is the destination register and the remaining two are the source registers. So the instruction here could instead have been written as add r0, r0, r1.
5. Once again, the compiler has generated some redundant code: the result of the addition is stored to the stack and immediately read back out.
6. The function is about to return, so the stack pointer is put back where it was originally. The function started by subtracting 12 from sp to reserve 12 bytes; now it adds the 12 back. Functions must balance any stack pointer operations; otherwise the stack pointer would drift, eventually overrunning the allocated stack space. You really don’t want to do that…
7. Finally, the branch-and-exchange instruction bx jumps to the address held in lr, which takes execution back to the calling function. Recall that lr is the link register, which holds the address of the next instruction to execute in the function that called the current function.
   Notice that after the addFunction routine returns, r0 will hold the result of the addition. This is another part of the calling convention: the return value from a function is always in r0, unless it can’t fit into a single register, in which case r1 to r3 can also be used (a 64-bit long long, for example, is returned in r0 and r1).
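The seven steps can be replayed in C to check the bookkeeping. This sketch is a hypothetical model, not how the CPU works internally: sp is a byte index into a 64-byte array standing in for the stack, and the helpers store_reg and load_reg (names invented here) play the roles of str and ldr with an immediate offset. The final assert confirms that the sub and add of 12 leave sp balanced.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model: sp is a byte index into `stack`, which grows downward. */
enum { SP = 13, STACK_SIZE = 64 };

typedef struct {
    uint32_t r[16];
    uint8_t stack[STACK_SIZE];
} Cpu;

/* str rt, [sp, #offset] */
static void store_reg(Cpu *c, int rt, uint32_t offset) {
    assert(c->r[SP] + offset + 4 <= STACK_SIZE);
    memcpy(&c->stack[c->r[SP] + offset], &c->r[rt], 4);
}

/* ldr rt, [sp, #offset] */
static void load_reg(Cpu *c, int rt, uint32_t offset) {
    assert(c->r[SP] + offset + 4 <= STACK_SIZE);
    memcpy(&c->r[rt], &c->stack[c->r[SP] + offset], 4);
}

/* Replays the unoptimized _addFunction listing step by step. */
static uint32_t debug_add_function(uint32_t a, uint32_t b) {
    Cpu c;
    memset(&c, 0, sizeof c);
    c.r[SP] = STACK_SIZE;            /* empty stack: sp at the top */
    c.r[0] = a;                      /* arguments arrive in r0 and r1 */
    c.r[1] = b;
    const uint32_t sp_on_entry = c.r[SP];

    c.r[SP] -= 12;                   /* 1: sub sp, #12 */
    store_reg(&c, 0, 8);             /* 2: str r0, [sp, #8] */
    store_reg(&c, 1, 4);             /*    str r1, [sp, #4] */
    load_reg(&c, 0, 8);              /* 3: ldr r0, [sp, #8] */
    load_reg(&c, 1, 4);              /*    ldr r1, [sp, #4] */
    c.r[0] += c.r[1];                /* 4: add r0, r1 */
    store_reg(&c, 0, 0);             /* 5: str r0, [sp] */
    load_reg(&c, 0, 0);              /*    ldr r0, [sp] */
    c.r[SP] += 12;                   /* 6: add sp, #12 */

    assert(c.r[SP] == sp_on_entry);  /* the stack pointer is balanced */
    return c.r[0];                   /* 7: bx lr; the result is in r0 */
}
```

debug_add_function(12, 34) returns 46. Deleting steps 2, 3 and 5 leaves the result unchanged, which is exactly why the optimizer removes them.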
That wasn’t all that complicated, was it? To get more information about each of these instructions, see the instruction set chart found on the ARM website.
You saw that much of the above function is redundant. As stated, this is because the compiler is in debug mode, meaning no optimizations are made. If you turn optimizations on, then you’ll see a much smaller function generated.
Change the Show Assembly Output For selector to Archiving. Now search for _addFunction: again and you’ll see the following (only instructions shown):
```asm
_addFunction:
    add r0, r1
    bx lr
```
That is much more concise! Notice how that add function can be done with just two instructions. You might not have expected that a function could be just two instructions, but there you have it. Of course, your own functions are likely to be much longer and do more interesting things. :]
Now you have a function that ends with a branch back to the caller. What about the other half of the equation, the part where the function gets called?
Calling the Function
First you need to add an attribute to the addFunction routine to tell the compiler not to perform a certain optimization. You’ve already seen how the compiler can optimize the code to remove unneeded instructions, but it can even remove function calls entirely and put the function’s code directly inline.
For example, the compiler might emit the appropriate add instructions in place rather than call addFunction itself. In fact, compilers are so sophisticated these days that for a function like addFunction, the compiler could perform the addition itself at compile time and never emit an add instruction at all!
For this tutorial, you don’t want the compiler to optimize and “inline” the function. Go back to the main.m file in the project and make addFunction look like this:
```c
__attribute__((noinline))
int addFunction(int a, int b) {
    int c = a + b;
    return c;
}
```
Now add another function below it that looks like this:
```c
void fooFunction() {
    int add = addFunction(12, 34);
    printf("add = %i", add);
}
```
fooFunction simply computes 12 + 34 by calling addFunction and then prints the result. I’ve used the C function printf rather than NSLog, again to avoid Objective-C, which complicates things a little.
Select Product\Generate Output\Assembly File once again and make sure Archiving is the output setting. Then search for _fooFunction, at which point you should see something like the following:
```asm
_fooFunction:
@ 1:
    push {r7, lr}
@ 2:
    movs r0, #12
    movs r1, #34
@ 3:
    mov r7, sp
@ 4:
    bl _addFunction
@ 5:
    mov r1, r0
@ 6:
    movw r0, :lower16:(L_.str-(LPC1_0+4))
    movt r0, :upper16:(L_.str-(LPC1_0+4))
LPC1_0:
    add r0, pc
@ 7:
    blx _printf
@ 8:
    pop {r7, pc}
```
This introduces some new instructions that this tutorial hasn’t covered yet, but don’t worry, they’re not complicated. Here goes:
1. This instruction does a similar thing to the sub sp, #12 that you saw previously. This time, r7 and lr are “pushed” onto the stack, meaning that the stack pointer is decremented by 8, since r7 and lr are each 4 bytes, and the two values are stored in the space just created. Note that the stack pointer is decremented and both values are stored by a single instruction! r7 is saved because this function overwrites it and must restore it later; lr is saved for a reason that will become apparent at the end of the function.
2. These two instructions are part of the move (mov) family. Sometimes you’ll see movs, sometimes mov.w, sometimes other variants, but they all load a register with a value. You can also “mov” data from one register to another, so mov r0, r1 loads r0 with the contents of r1, leaving r1 unchanged.
   In the two lines in the above assembly, r0 and r1 are loaded with the constants defined in the function. Notice that they are loaded into r0 and r1 so that they are in the right place for calling addFunction.
3. The stack pointer is copied into r7. Apple’s iOS ABI uses r7 as the frame pointer, so this instruction sets up a stack frame that debuggers and crash reporters use to walk back through the chain of calls. The rest of this function never reads r7 or sp again, so the instruction can look redundant, but it is there deliberately.
4. This instruction, bl, performs the function call. Remember that the parameters to the function have been put in the relevant registers, r0 and r1. This instruction performs what is known as a branch. Since this is a bl and not simply a b, a “branch with link” is performed, which means that before the branch, the link register, lr, is set to the address of the next instruction in the current function. Recall that when returning from a function, lr is used to know where to go.
5. This is the point to which addFunction returns after it does the hard work of adding the two numbers. Remember that a function’s return value is stored in r0. That value is needed as the second parameter of the printf call, so a mov copies it into r1.
6. The first parameter to the printf call is a string. These three instructions load a pointer to the start of the required string into r0. The string is stored in what is known as the “data segment” of the binary, but exactly where it will be is not known until the final binary is linked.
   The string is initially found in the data segment of the object file created from main.m. If you search in the assembly for L_.str, you’ll find it. The first two instructions in this trio load the value L_.str - (LPC1_0 + 4), that is, the address of the string minus the address of the local label LPC1_0 plus 4. movw sets the lower 16 bits of r0 and movt sets the upper 16 bits, because a single instruction can’t hold a full 32-bit constant.
   The reason for this little dance becomes apparent with the third instruction, which adds the program counter to that value. In Thumb mode, reading pc gives the address of the current instruction plus 4, and this add sits exactly at LPC1_0, so the sum is the address of the string. r0 now holds the address of the string, and the code works no matter where the binary ends up in memory.
   The diagram below illustrates the memory layout. Once the linker has fixed the layout of the binary, the difference L_.str - (LPC1_0 + 4) is a constant, so the code that loads r0 never needs to change, wherever the binary is loaded.
7. This instruction performs the call to printf. It is slightly different from the other bl instruction in that it is blx. The x here stands for “exchange”, meaning that if required, the processor will switch modes.
   This is slightly beyond the scope of this tutorial, but modern ARM processors have two modes: ARM and Thumb. Most Thumb instructions are 16 bits wide, whereas ARM instructions are 32 bits wide. There are fewer Thumb instructions, but using them often means smaller code size and better CPU caching.
   You can usually get the benefit of smaller code size with the limited Thumb instruction set. You can read more about Thumb on Wikipedia.
8. This final instruction pops back off the stack the values that were pushed on in the first instruction. The registers in the list are filled with the values from the stack, and then the stack pointer is incremented. Recall that r7 and lr were pushed onto the stack, so why are those saved values popped back into r7 and pc rather than r7 and lr?
   Well, remember that lr contains the address of the next instruction to execute when returning from a function. So if you pop that value into the program counter, execution continues from the place from which this function was called. This is often how a return from a function is achieved, instead of a branch as seen in the assembly for addFunction.
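The position-independent load in step 6 can be checked with ordinary arithmetic. In this C sketch, slide, str_offset and lpc_offset are hypothetical values describing where the binary is loaded and where the string and the LPC1_0 label sit inside it; movw and movt are modelled as functions that write the low and high halves of a 32-bit value. It assumes Thumb mode, where reading pc yields the instruction’s address plus 4.

```c
#include <assert.h>
#include <stdint.h>

/* movw rd, #imm16: writes the low half and clears the high half. */
static uint32_t movw(uint16_t imm) {
    return imm;
}

/* movt rd, #imm16: replaces the high half, keeps the low half. */
static uint32_t movt(uint32_t rd, uint16_t imm) {
    return (rd & 0xFFFFu) | ((uint32_t)imm << 16);
}

/* Hypothetical layout: the binary is loaded at `slide`, and the string and
   the LPC1_0 label sit at fixed offsets inside it. Returns r0 after the
   movw/movt/add r0, pc sequence. */
static uint32_t string_address_via_pc(uint32_t slide,
                                      uint32_t str_offset,
                                      uint32_t lpc_offset) {
    /* L_.str - (LPC1_0 + 4): fixed by the linker, independent of slide.
       Unsigned arithmetic wraps, so a negative difference still works. */
    uint32_t delta = str_offset - (lpc_offset + 4);

    uint32_t r0 = movw((uint16_t)(delta & 0xFFFFu));   /* :lower16: */
    r0 = movt(r0, (uint16_t)(delta >> 16));            /* :upper16: */

    uint32_t pc = slide + lpc_offset + 4;  /* Thumb: address of add + 4 */
    r0 += pc;                              /* add r0, pc */
    return r0;
}
```

Changing slide moves L_.str and LPC1_0 together, so the constant baked into the movw/movt pair stays the same and the result always lands on slide + str_offset.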
That is a very brief overview of some ARM instructions. There are many more instructions, but the ones shown here are the most important to understand initially. Here’s a quick recap of what they do, along with pseudo-code or a description:
- mov r0, r1 => r0 = r1
- mov r0, #10 => r0 = 10
- ldr r0, [sp] => r0 = *sp
- str r0, [sp] => *sp = r0
- add r0, r1, r2 => r0 = r1 + r2
- add r0, r1 => r0 = r0 + r1
- push {r0, r1, r2} => Push r0, r1 and r2 onto the stack.
- pop {r0, r1, r2} => Pop three values off the stack, putting them into r0, r1 and r2.
- b _label => pc = _label
- bl _label => lr = pc + 4; pc = _label
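push, pop and the link register work together to implement call and return. This C sketch replays fooFunction’s prologue and epilogue on a hypothetical model in which sp is a word index into a 16-word stack array rather than a byte address; the values 0xCAFE and 0x2000 are made up. It shows that popping the saved lr into pc yields the caller’s return address even though the bl in the middle overwrote lr, and that r7 and sp come back unchanged.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model: sp is a word index into `stack`, not a byte address. */
enum { R7 = 7, SP = 13, LR = 14, PC = 15, STACK_WORDS = 16 };

typedef struct {
    uint32_t r[16];
    uint32_t stack[STACK_WORDS];
} Cpu;

/* push {list}: lower sp by one word per register, then store them so the
   first (lowest-numbered) register lands at the lowest address. */
static void push(Cpu *c, const int *regs, int n) {
    assert(c->r[SP] >= (uint32_t)n);
    c->r[SP] -= (uint32_t)n;
    for (int i = 0; i < n; i++)
        c->stack[c->r[SP] + i] = c->r[regs[i]];
}

/* pop {list}: load registers from the lowest address up, then raise sp. */
static void pop(Cpu *c, const int *regs, int n) {
    assert(c->r[SP] + (uint32_t)n <= STACK_WORDS);
    for (int i = 0; i < n; i++)
        c->r[regs[i]] = c->stack[c->r[SP] + i];
    c->r[SP] += (uint32_t)n;
}

/* Replays fooFunction's push / bl / pop sequence; returns the final pc. */
static uint32_t foo_function_return_pc(uint32_t return_address) {
    Cpu c;
    memset(&c, 0, sizeof c);
    c.r[SP] = STACK_WORDS;
    c.r[R7] = 0xCAFE;                /* hypothetical caller value of r7 */
    c.r[LR] = return_address;        /* set by the caller's bl */
    const uint32_t sp_on_entry = c.r[SP];

    const int saved[] = {R7, LR};
    push(&c, saved, 2);              /* push {r7, lr} */
    c.r[R7] = c.r[SP];               /* mov r7, sp (clobbers r7) */
    c.r[LR] = 0x2000;                /* bl _addFunction overwrites lr */

    const int restored[] = {R7, PC};
    pop(&c, restored, 2);            /* pop {r7, pc} */

    assert(c.r[R7] == 0xCAFE);       /* caller's r7 is back */
    assert(c.r[SP] == sp_on_entry);  /* stack is balanced */
    return c.r[PC];                  /* execution resumes in the caller */
}
```

foo_function_return_pc(0x1234) returns 0x1234: the saved lr, not the one bl left behind, ends up in pc.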
Wahoo! Now you can read some ARM assembly!
Objective-C Assembly
Up until now, the functions you’ve seen have been C functions. Objective-C adds a bit more complexity on top, so let’s examine that now. Open ViewController.m and add the following method inside the class implementation:
```objc
- (int)addValue:(int)a toValue:(int)b {
    int c = a + b;
    return c;
}
```
Once again, go to Product\Generate Output\Assembly File to view the assembly. Make sure Archiving is set for the output type, then search for addValue:toValue: and find the assembly that looks like this:
"-[ViewController addValue:toValue:]": adds r0, r3, r2 bx lr |
The first thing you’ll notice is the label name. This time the name is a string that contains the class name and the full Objective-C method name.
If you look back at the assembly for addFunction and compare, you’ll also notice that the two values added together are in r2 and r3 rather than r0 and r1. That must mean that the two parameters to the method are in r2 and r3. Why is that?
Well, it’s because all Objective-C methods are really just C functions with two implicit parameters passed before the rest of the method’s parameters. The addValue:toValue: method is semantically equivalent to the following C function:
```objc
int ViewController_addValue_toValue(id self, SEL _cmd, int a, int b) {
    int c = a + b;
    return c;
}
```
This is why the parameters a and b appear in r2 and r3, respectively. You are probably already aware of the first of the two implicit parameters. You make use of self all the time.
However, _cmd is something you might not have seen before. Like self, it is available inside all Objective-C methods, and it contains the selector of the currently executing method. You generally never need to access it, though, which is why you may never have heard of it!
To see how Objective-C methods are called, add the following method to ViewController:
```objc
- (void)foo {
    int add = [self addValue:12 toValue:34];
    NSLog(@"add = %i", add);
}
```
Generate the assembly file again and find this method. You should see the following:
"-[ViewController foo]": @ 1: push {r7, lr} @ 2: movw r1, :lower16:(L_OBJC_SELECTOR_REFERENCES_-(LPC1_0+4)) movt r1, :upper16:(L_OBJC_SELECTOR_REFERENCES_-(LPC1_0+4)) LPC1_0: add r1, pc @ 3: ldr r1, [r1] @ 4: movs r2, #12 movs r3, #34 @ 5: mov r7, sp @ 6: blx _objc_msgSend @ 7: mov r1, r0 @ 8: movw r0, :lower16:(L__unnamed_cfstring_-(LPC1_1+4)) movt r0, :upper16:(L__unnamed_cfstring_-(LPC1_1+4)) LPC1_1: add r0, pc @ 9: blx _NSLog @ 10: pop {r7, pc} |
Once again, this is extremely similar to the plain C equivalent you saw earlier. Breaking it down, it does this:
1. Push r7 and lr onto the stack.
2. Load the address of the label L_OBJC_SELECTOR_REFERENCES_ into r1, using the same program-counter-relative addressing as seen earlier. The name gives away what this is: a reference to a selector. Selectors are just strings, really, stored in the same way in the data segment.
3. If you look up L_OBJC_SELECTOR_REFERENCES_ in the assembly, you’ll see the following:

       L_OBJC_SELECTOR_REFERENCES_:
           .long L_OBJC_METH_VAR_NAME_

   This means that the word stored at L_OBJC_SELECTOR_REFERENCES_, whose address r1 now holds, contains the address of the label L_OBJC_METH_VAR_NAME_. If you look at that label, you’ll find the string addValue:toValue:.
   This instruction, ldr r1, [r1], loads the value stored at the memory address held in r1 and puts it back into r1. It is “dereferencing” r1. In pseudo-C-code this looks like r1 = *r1. If you think about it carefully, this means that r1 will now contain a pointer to the addValue:toValue: string.
4. Load the constants into r2 and r3.
5. Copy the stack pointer into r7, as before.
6. Branch, with link and exchange, to objc_msgSend. This is the function at the heart of the Objective-C runtime. It calls the implementation associated with the required selector.
   The parameters are the same as those that eventually get passed to the method: r0 is self, r1 is _cmd, and r2 and r3 are the remaining parameters. This is why the selector is loaded into r1 and the parameters to pass are loaded into r2 and r3. r0 is not explicitly loaded because it already holds the correct self variable.
7. The result of the call to addValue:toValue: is now, as usual, in r0. This instruction moves the value into r1, since that’s where it needs to be for the call to NSLog, a C function.
8. This loads a pointer to the string parameter to NSLog into r0, just as in the printf call in the C function example.
9. Branch, with link and exchange, to the NSLog function implementation.
10. Two values are popped from the stack, one into r7 and one into the program counter. Just like before, this performs the return from the foo method.
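The double indirection in steps 2 and 3 maps directly onto C pointers. In this sketch the two assembly labels become hypothetical C variables: L_OBJC_SELECTOR_REFERENCES_ holds the address of the string, just as the .long directive does. In a real app, the Objective-C runtime may also rewrite selector references when the binary loads so that equal selectors share one pointer; the sketch ignores that.

```c
#include <assert.h>
#include <string.h>

/* Stand-ins for the two labels in the data segment. */
static const char L_OBJC_METH_VAR_NAME_[] = "addValue:toValue:";
/* .long L_OBJC_METH_VAR_NAME_ : a word holding the string's address. */
static const char *const L_OBJC_SELECTOR_REFERENCES_ = L_OBJC_METH_VAR_NAME_;

static const char *load_selector(void) {
    /* movw/movt/add r1, pc: r1 = address of the selector reference. */
    const char *const *r1_ref = &L_OBJC_SELECTOR_REFERENCES_;
    /* ldr r1, [r1]: r1 = *r1, the address of the selector string. */
    const char *r1 = *r1_ref;
    return r1;
}
```

load_selector() returns a pointer to the very same "addValue:toValue:" bytes that L_OBJC_METH_VAR_NAME_ labels, which is what objc_msgSend receives in r1.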
As you can see, there’s not all that much difference between plain C and Objective-C when it comes to the generated assembly. The extra things to look out for are the two implicit parameters passed to a method implementation and selectors being referenced by strings in the data segment.
Obj-C Msg Send What to the Who?
You saw the function objc_msgSend make an appearance above. You have probably seen it before in crash logs. This function is at the core of the Objective-C runtime. The runtime is the code that glues together an Objective-C application, including all the memory management methods and handling of classes.
Every time an Objective-C method is called, objc_msgSend is the C function that handles the message dispatching. It looks up the implementation for the method that’s been called by inspecting the type of object being messaged and finding the implementation for the method in the class’s method list. The signature for objc_msgSend looks like this:
```objc
id objc_msgSend(id self, SEL _cmd, ...)
```
The first parameter is the object that will be self during the method’s execution. So when you write something like self.someProperty, this is where the self is coming from.
The second parameter is the lesser-known, hidden _cmd parameter. Try it for yourself: write something like NSLog(@"%@", NSStringFromSelector(_cmd)); in an Objective-C method and you’ll see the current selector printed out. Neat, eh?
The remaining parameters are the parameters to the method itself. So a method that takes two parameters, like addValue:toValue: above, takes two extra parameters. Therefore, instead of calling it via Objective-C, you could, in fact, do the following:
```objc
int result = (int)objc_msgSend(self, @selector(addValue:toValue:), 12, 34);
```
Note: The return type of objc_msgSend is id, but it has been cast to an int. This is fine here because on 32-bit ARM both are 4 bytes wide and both come back in r0. If the method returns a structure that has to be returned through memory, then a different function, objc_msgSend_stret, actually gets called. You can read more about that here. Similarly, on Intel architectures some floating-point return types go through yet another variant, objc_msgSend_fpret; on ARM, plain objc_msgSend handles them.
Recall from above that the C-equivalent function that gets created when an Objective-C method is compiled has a signature that looks like this:
```objc
int ViewController_addValue_toValue(id self, SEL _cmd, int a, int b)
```
It should now be no surprise why that is. Notice that the signature matches objc_msgSend! That means that all the parameters will already be in the right place when objc_msgSend finds the implementation for a method and jumps to it.
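To see why the matching signature matters, here is a toy dispatcher in plain C. Every type and function in it (SEL, IMP, Method, Class, Object, toy_msgSend) is a simplified stand-in invented for this sketch, not the runtime’s real definitions. It looks up the implementation by comparing selector strings in a one-entry method list and then calls it with the same self, _cmd and arguments. The real objc_msgSend is hand-written assembly that uses method caches, walks superclasses, compares selector pointers rather than strings, and jumps to the implementation with the argument registers untouched instead of calling it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-ins for Objective-C runtime types (not the real definitions). */
typedef const char *SEL;
typedef struct Object Object;
typedef int (*IMP)(Object *self, SEL _cmd, int a, int b);

typedef struct { SEL name; IMP imp; } Method;
typedef struct { const Method *methods; size_t count; } Class;
struct Object { const Class *isa; };

/* The C function a method like addValue:toValue: compiles down to. */
static int ViewController_addValue_toValue(Object *self, SEL _cmd, int a, int b) {
    (void)self;
    (void)_cmd;
    int c = a + b;
    return c;
}

static const Method view_controller_methods[] = {
    { "addValue:toValue:", ViewController_addValue_toValue },
};
static const Class ViewController = { view_controller_methods, 1 };

/* Toy dispatcher: find the IMP for _cmd in self's class, then call it with
   exactly the arguments it was given, self and _cmd first. */
static int toy_msgSend(Object *self, SEL _cmd, int a, int b) {
    const Class *cls = self->isa;
    for (size_t i = 0; i < cls->count; i++) {
        if (strcmp(cls->methods[i].name, _cmd) == 0)
            return cls->methods[i].imp(self, _cmd, a, b);
    }
    assert(!"unrecognized selector");
    return 0;
}
```

With an object whose isa points at ViewController, toy_msgSend(obj, "addValue:toValue:", 12, 34) returns 46, the same value [self addValue:12 toValue:34] produces. Because the dispatcher and the implementation share one parameter list, nothing has to be rearranged between them.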
You can read more about objc_msgSend in these excellent posts.
Reverse Engineer, You Can Now
With just that little knowledge of ARM assembly, you should be able to get a feel for why something is breaking, crashing or not working correctly. Why might you want to drop down to looking at the assembly? Because you can interrogate the code in much more detail and see precisely the steps that led up to a bug.
Sometimes, you don’t have the source to look at – for example, if you’re experiencing a crash inside a third-party library or a system framework. Being able to investigate via the assembly can lead to finding the problem quickly. The iOS SDK ships with all frameworks in the following directory:
|
To investigate these libraries, I recommend purchasing HopperApp, which will disassemble a binary so you can take a look at it. There’s nothing wrong with doing this! For example, opening up UIKit, you can take a look at what each method does. This is what it looks like:
This is the assembly for the -[UINavigationController shouldAutorotateToInterfaceOrientation:] method. With your newfound ARM assembly knowledge, you should be able to work out what’s going on here.
First, a selector reference is loaded into r1, ready for the call to objc_msgSend. Then notice that no other registers are touched, so the self pointer passed to objc_msgSend in r0 is the same as the one passed into shouldAutorotateToInterfaceOrientation:.
Also, you know that the method being called takes one parameter, as there is one colon in its name. Since r2 is left untouched, the first parameter passed into shouldAutorotateToInterfaceOrientation: is what gets passed along.
Finally, after the method call, r0 is untouched, so the value that gets returned from this method is the return value of the method call.
So you can deduce that this method is performing the following:
|
Wow! That was easy! Often the logic in a method is a bit more complicated than that, but usually you can piece things together and quickly work out what a certain portion of code is doing.
Where to Go from Here?
This iOS assembly tutorial has given you some insight into the core concepts of ARM, as used in code running on iOS devices. You have learned about calling conventions for C and Objective-C.
ARMed with this knowledge (pun intended!), you have the tools to start understanding all those random codes you see when your app crashes deep in a system library. Or maybe you just want to drop down to the assembly for your own methods so you can see exactly what is going on.
If you’re interested in diving deeper into ARM, I recommend purchasing a Raspberry Pi. These little devices have an ARM processor very similar to those found in iOS devices and there are many tutorials out there that will teach you how to program them.
Another thing to look into is NEON. This is an additional set of instructions available in all processors from the iPhone 3GS onward. It provides SIMD (Single Instruction Multiple Data) instructions that allow you to process data extremely efficiently. Applications for these instructions are things like manipulating images. If you need to do this efficiently, then it could be beneficial to learn how to write NEON instructions directly, using inline assembly. This is extremely advanced, though!
This should be enough to keep you busy for a while. Let us know about your ARM adventures in the forums.