`if` construct in C, for example, in addition to several other cases that are less obvious.
ARM, like many other architectures, implements conditional execution using a set of flags which store state information about a previous operation. I intend, in this post, to shed some light on the operation of these flags. Of course, the Architecture Reference Manual is the definitive source of information, so if you need to know about a specific corner-case that I do not cover here, that is where you need to look.
A Realistic Example
Consider a simple fragment of C code:
- for (i = 10; i != 0; i--) {
- do_something();
- }
A compiler might implement that structure as follows:
- mov r4, #10
- loop_label:
- bl do_something
- sub r4, r4, #1
- cmp r4, #0
- bne loop_label
The last two instructions are of particular interest. The `cmp` (compare) instruction compares `r4` with `0`, and the `bne` instruction is simply a `b` (branch) instruction that executes if the result of the `cmp` instruction was "not equal". The code works because `cmp` sets some global flags indicating various properties of the operation. The `bne` instruction, which is really just a `b` (branch) with an `ne` condition code suffix, reads these flags to determine whether or not to branch [1].
The following code implements a more efficient solution:
- mov r4, #10
- loop_label:
- bl do_something
- subs r4, r4, #1
- bne loop_label
Adding the `s` suffix to `sub` causes it to update the flags itself, based on the result of the operation. This suffix can be added to many (but not all) arithmetic and logical operations [2].
In the rest of the article, I will explain what the condition flags are, where they are stored, and how to test them using condition codes.
Condition-Code Analysis Tool
If you have an ARM platform (or emulator) handy, the attached `ccdemo` application can be used to experiment with the operations discussed in the article. The application allows you to pick an operation and two operands, and shows the resulting flags and a list of which condition codes will match. When writing assembly code, it can also be a rather useful development tool.
The Flags
The simplest way to set the condition flags is to use a comparison operation, such as `cmp`. This mechanism is common to many processor architectures, and the semantics (if not the details) of `cmp` will likely be familiar. In addition, we have already seen that many instructions (such as `sub` in the example) can be modified to update the condition flags based on the result by adding an `s` suffix. That's all well and good, but what information is stored, and how can we access it?
The additional information is stored in four condition flag bits in the `APSR` (Application Processor Status Register), or the `CPSR` (Current Processor Status Register) if you are used to pre-ARMv7 terminology [3][4]. The flags indicate simple properties such as whether or not the result was negative, and are used in various combinations to detect higher-level relationships such as "greater than" and suchlike. Once I have described the flags, I will explain how they map onto condition codes (such as `ne` in the previous example).
N: Negative
The `N` flag is set by an instruction if the result is negative. In practice, `N` is set to the two's complement sign bit of the result (bit 31).
Z: Zero
The `Z` flag is set if the result of the flag-setting instruction is zero.
C: Carry (or Unsigned Overflow)
The `C` flag is set if the result of an unsigned operation overflows the 32-bit result register. This bit can be used to implement 64-bit unsigned arithmetic, for example.
V: (Signed) Overflow
The `V` flag works the same as the `C` flag, but for signed operations. For example, `0x7fffffff` is the largest positive two's complement integer that can be represented in 32 bits, so `0x7fffffff + 0x7fffffff` triggers a signed overflow, but not an unsigned overflow (or carry): the result, `0xfffffffe`, is correct if interpreted as an unsigned quantity, but represents a negative value (`-2`) if interpreted as a signed quantity.
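To make the 64-bit use of the carry flag concrete, here is a minimal Python sketch. It is a model for illustration, not ARM code: it mimics what a flag-setting `adds` on the low words followed by an add-with-carry (`adc`) on the high words would compute. The function name `add64` and its argument layout are assumptions made up for this example.

```python
MASK32 = 0xFFFFFFFF

def add64(a_lo, a_hi, b_lo, b_hi):
    """Add two 64-bit values held as (low, high) pairs of 32-bit words."""
    # Low words: like `adds`, keep 32 bits and remember the carry out.
    low_sum = (a_lo & MASK32) + (b_lo & MASK32)
    carry = low_sum >> 32            # 1 if the unsigned add overflowed, else 0.
    lo = low_sum & MASK32
    # High words: like `adc`, add the carry produced by the low half.
    hi = ((a_hi & MASK32) + (b_hi & MASK32) + carry) & MASK32
    return lo, hi

# 0x00000000ffffffff + 1 carries into the high word.
print([hex(word) for word in add64(0xFFFFFFFF, 0x0, 0x1, 0x0)])
```

The carry out of the low-word addition is exactly the information the `C` flag preserves between the two instructions.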
Flag-Setting Example
Consider the following example:
- ldr r1, =0xffffffff
- ldr r2, =0x00000001
- adds r0, r1, r2
The result of the operation would be `0x100000000`, but the top bit is lost because it does not fit into the 32-bit destination register and so the real result is `0x00000000`. In this case, the flags will be set as follows:
Flag | Explanation |
---|---|
N = 0 | The result is 0, which is considered positive, and so the N (negative) bit is set to 0. |
Z = 1 | The result is 0, so the Z (zero) bit is set to 1. |
C = 1 | We lost some data because the result did not fit into 32 bits, so the processor indicates this by setting C (carry) to 1. |
V = 0 | From a two's complement signed-arithmetic viewpoint, 0xffffffff really means -1, so the operation we did was really (-1) + 1 = 0. That operation clearly does not overflow, so V (overflow) is set to 0. |
If you fancy it, you can check this with the `ccdemo` application. The output looks like this:
- $ ./ccdemo adds 0xffffffff 0x1
- The results (in various formats):
- Signed: -1 adds 1 = 0
- Unsigned: 4294967295 adds 1 = 0
- Hexadecimal: 0xffffffff adds 0x00000001 = 0x00000000
- Flags:
- N (negative): 0
- Z (zero) : 1
- C (carry) : 1
- V (overflow): 0
- Condition Codes:
- EQ: 1 NE: 0
- CS: 1 CC: 0
- MI: 0 PL: 1
- VS: 0 VC: 1
- HI: 0 LS: 1
- GE: 1 LT: 0
- GT: 0 LE: 1
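If you do not have an ARM platform to hand, the flag rules described above can be approximated on any machine. The following Python sketch models a 32-bit `adds`; it is an illustration under the definitions given in this article, not the `ccdemo` source, and the name `adds_flags` is hypothetical.

```python
MASK32 = 0xFFFFFFFF

def adds_flags(a, b):
    """Model a 32-bit `adds`: return (result, flags)."""
    a &= MASK32
    b &= MASK32
    full = a + b                     # Python integers do not wrap, so this may need 33 bits.
    result = full & MASK32           # What actually fits in the destination register.
    n = result >> 31                 # N: sign bit (bit 31) of the result.
    z = int(result == 0)             # Z: result is zero.
    c = int(full > MASK32)           # C: carry out of bit 31 (unsigned overflow).
    # V: both operands have a sign bit that differs from the result's sign bit.
    v = ((a ^ result) & (b ^ result)) >> 31
    return result, {"N": n, "Z": z, "C": c, "V": v}

print(adds_flags(0xFFFFFFFF, 0x1))
print(adds_flags(0x7FFFFFFF, 0x7FFFFFFF))
```

The first call reproduces the example above (`Z` and `C` set), and the second reproduces the signed-overflow case from the description of the `V` flag.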
Reading the Flags
We have worked out how to set the flags, but how does that result in the ability to conditionally execute some code? Being able to set the flags is pointless if you cannot then react to them.
The most common method of testing the flags is to use conditional execution codes. This mechanism is similar to mechanisms used in other architectures, so if you are familiar with other machines you might recognize the following pattern, which maps cleanly onto C's `if/else` construct:
- cmp r0, #20
- bhi do_something_else
- do_something:
- @ This code runs if (r0 <= 20).
- b continue @ Prevent do_something_else from executing.
- do_something_else:
- @ This code runs if (r0 > 20).
- continue:
- @ Other code.
In effect, attaching one of the condition codes to an instruction causes it to execute if the condition is true. Otherwise, it does nothing, and is essentially a `nop`.
The following table lists the available condition codes, their meanings (where the flags were set by a `cmp` or `subs` instruction), and the flags that are tested:
Code | Meaning (for cmp or subs ) | Flags Tested |
---|---|---|
eq | Equal. | Z==1 |
ne | Not equal. | Z==0 |
cs or hs | Unsigned higher or same (or carry set). | C==1 |
cc or lo | Unsigned lower (or carry clear). | C==0 |
mi | Negative. The mnemonic stands for "minus". | N==1 |
pl | Positive or zero. The mnemonic stands for "plus". | N==0 |
vs | Signed overflow. The mnemonic stands for "V set". | V==1 |
vc | No signed overflow. The mnemonic stands for "V clear". | V==0 |
hi | Unsigned higher. | (C==1) && (Z==0) |
ls | Unsigned lower or same. | (C==0) || (Z==1) |
ge | Signed greater than or equal. | N==V |
lt | Signed less than. | N!=V |
gt | Signed greater than. | (Z==0) && (N==V) |
le | Signed less than or equal. | (Z==1) || (N!=V) |
al (or omitted) | Always executed. | None tested. |
It is fairly obvious how the first few work because they test individual flags, but the others rely on specific combinations of flags. In practice, you very rarely need to know exactly what is happening; the mnemonics hide the complexity of the comparisons.
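For readers who do want to see the combinations spelled out, the table can be transcribed directly into a few lines of Python. This is only a sketch of the "Flags Tested" column; the names `CONDITIONS` and `matching_conditions` are made up for this example.

```python
# Each condition code maps to a test over the flags, copied from the table above.
CONDITIONS = {
    "eq": lambda f: f["Z"] == 1,
    "ne": lambda f: f["Z"] == 0,
    "cs": lambda f: f["C"] == 1,                      # Also written "hs".
    "cc": lambda f: f["C"] == 0,                      # Also written "lo".
    "mi": lambda f: f["N"] == 1,
    "pl": lambda f: f["N"] == 0,
    "vs": lambda f: f["V"] == 1,
    "vc": lambda f: f["V"] == 0,
    "hi": lambda f: f["C"] == 1 and f["Z"] == 0,
    "ls": lambda f: f["C"] == 0 or f["Z"] == 1,
    "ge": lambda f: f["N"] == f["V"],
    "lt": lambda f: f["N"] != f["V"],
    "gt": lambda f: f["Z"] == 0 and f["N"] == f["V"],
    "le": lambda f: f["Z"] == 1 or f["N"] != f["V"],
    "al": lambda f: True,
}

def matching_conditions(flags):
    """Return the condition codes that pass for the given N, Z, C, V flags."""
    return [code for code, test in CONDITIONS.items() if test(flags)]

# Flags from the earlier `adds 0xffffffff, 0x1` example.
print(matching_conditions({"N": 0, "Z": 1, "C": 1, "V": 0}))
```

For those flags, the codes that pass agree with the `ccdemo` output shown earlier (`EQ`, `CS`, `PL`, `VC`, `LS`, `GE` and `LE`), plus `al`, which always passes.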
Here, once again, is the example `for`-loop code I gave earlier:
- mov r4, #10
- loop_label:
- bl do_something
- subs r4, r4, #1
- bne loop_label
It should now be easy enough to work out exactly what is happening here:
- The `subs` instruction sets the flags based on the result of `r4-1`. In particular, the `Z` flag will be set if the result is `0`, and it will be clear if the result is anything else.
- The `bne` instruction only executes if condition `ne` is true. That condition is true if `Z` is clear, so the `bne` iterates the loop until `Z` is set (and therefore `r4` is 0).
Dedicated Comparison Instructions
The `cmp` instruction (that we saw in the first example) can be thought of as a `subs` instruction that doesn't store its result: if the two operands are equal, the result of the subtraction will be zero, hence the mapping between `eq` and the `Z` flag. Of course, we could just use a `subs` instruction with a dummy destination register, but you can only do that if you have a register to spare. Dedicated comparison instructions are therefore quite commonly used.
There are actually four dedicated comparison instructions available, and they perform operations as described in the following table:
Instruction | Description |
---|---|
cmp | Works like subs, but does not store the result. |
cmn | Works like adds, but does not store the result. |
tst | Works like ands, but does not store the result. |
teq | Works like eors, but does not store the result. |
Note that the dedicated comparison operations do not require the `s` suffix; they only update the flags, so the suffix would be redundant.
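The "`subs` without a destination" view of `cmp` can be sketched in Python as follows. This is an illustrative model with hypothetical names (`subs_flags`, `cmp_flags`). Note one detail the model makes explicit: for subtraction, ARM sets `C` when no borrow occurs, which is why `cs`/`hs` reads as "unsigned higher or same" after a `cmp`.

```python
MASK32 = 0xFFFFFFFF

def subs_flags(a, b):
    """Model a 32-bit `subs`: return (result, flags)."""
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32
    n = result >> 31
    z = int(result == 0)
    c = int(a >= b)                  # C is set when the subtraction does NOT borrow.
    # V: operands have different signs, and the result's sign differs from a's.
    v = ((a ^ b) & (a ^ result)) >> 31
    return result, {"N": n, "Z": z, "C": c, "V": v}

def cmp_flags(a, b):
    """Model `cmp`: compute the same flags as `subs`, but discard the result."""
    _, flags = subs_flags(a, b)
    return flags

print(cmp_flags(5, 5))               # Equal operands: Z is set.
print(cmp_flags(3, 5))               # 3 is lower than 5: C is clear.
```

The other three comparison instructions could be modelled the same way, by wrapping flag-setting versions of add, and, and exclusive-or.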
End Note
Whilst the condition flag mechanism is fairly simple in principle, there are a lot of details to take in, and seeing some real examples will probably be useful! I will make a point of presenting some examples of realistic usage in a future blog post.
[1] Technically, most instructions can be executed conditionally, not just branches. However, I will discuss such conditional execution in more detail in another article.
[2] The Instruction Set Quick Reference Card summarises the flag-setting abilities of each instruction. The Architecture Reference Manual contains detailed information about exactly how the flags are updated for each instruction.
[3] The `APSR` and `CPSR` are actually the same on ARMv7, despite having separate names, but only the condition codes and one or two other bits are defined for the `APSR`. The other bits should not really be accessed directly anyway, so the renaming is essentially a clean-up of the old mixed-access `CPSR`. Note, however, that GCC (4.3.3 at least) does not accept `APSR`, so you have to use `CPSR` in your assembly source if you want to access it.
[4] In general, you will very rarely need to directly access the `APSR` because the condition codes give you the functionality you usually need from them anyway. However, if you really want to see what is in there, you can access it using the `msr` and `mrs` instructions. Indeed, this is the method that the `ccdemo` application uses to give information about the specified operation.
In my previous post (Condition Codes 1), I explained that some instructions can set some global condition flags, and that these flags can be used to conditionally execute code. I gave some examples of usage. One such example was an assembly implementation of C's `if/else` construct:
- cmp r0, #20
- bhi do_something_else
- do_something:
- @ This code runs if (r0 <= 20).
- b continue @ Prevent do_something_else from executing.
- do_something_else:
- @ This code runs if (r0 > 20).
- continue:
- @ Other code.
The example is valid, and will work on any ARM core. However, is this an efficient solution if you only need to execute one or two instructions in each case? Consider the following C code:
- if (a >= 10) {
- a = 10;
- } else {
- a = a + 1;
- }
It should be clear that the code increments `a` unless it has hit or exceeded a limit of 10, in which case it is set to 10. Mapping this onto our `if/else` example, this might be implemented in assembly as follows:
- cmp r0, #10
- blo r0_is_small
- r0_is_big:
- mov r0, #10
- b continue
- r0_is_small:
- add r0, r0, #1
- continue:
- @ Other code.
The above code executes one of two instructions, either the `mov` or the `add`. However, it uses two branch instructions to achieve this. Without branch prediction, these branches can take several cycles to execute. Even with branch prediction, the pattern may not be easily predicted. Finally, even with perfect branch prediction, each branch instruction takes four bytes of instruction memory, so code size may become a problem.
An Improved Example
One of the features of the ARM instruction set is that almost every instruction encoding includes a 4-bit field that represents a condition code. If the condition attached to an instruction passes, the instruction executes. Otherwise, it has no effect, as if you had used a `nop` instruction. Using this knowledge, we can implement the previous example more efficiently as follows:
- cmp r0, #10
- movhs r0, #10
- addlo r0, r0, #1
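One way to convince yourself that the three-instruction version behaves like the branching version is to model it. The Python sketch below is an illustration only (the function name is hypothetical): every instruction in the sequence is visited, but each conditional instruction only takes effect if its condition passes. Because `mov` and `add` carry no `s` suffix, the flags set by `cmp` stay unchanged for both of them.

```python
MASK32 = 0xFFFFFFFF

def saturating_increment(r0):
    """Model `cmp r0, #10` / `movhs r0, #10` / `addlo r0, r0, #1`."""
    r0 &= MASK32
    carry = int(r0 >= 10)            # cmp: C is set if r0 is (unsigned) higher or same.
    if carry == 1:                   # movhs: takes effect only when C == 1.
        r0 = 10
    if carry == 0:                   # addlo: takes effect only when C == 0.
        r0 = (r0 + 1) & MASK32
    return r0

print([saturating_increment(value) for value in (0, 9, 10, 11)])
```

Exactly one of the two conditional instructions takes effect on every pass, because `hs` and `lo` are opposite conditions.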
Unconditionally-Executed Instructions
In the ARM instruction set, the condition code is encoded using a 4-bit field in the instruction. The encoding includes 3 bits to identify an operation, and a fourth bit to invert the condition. The `eq` condition, for example, is the exact opposite of the `ne` condition. It may interest authors of JIT compilers to know that the least significant bit of the condition code can be inverted to obtain the opposite condition code. For example, `eq` (equal) is encoded as '0000' and `ne` (not equal) as '0001'. This works for every condition code with the exception of the `al` (always) condition, encoded as '1110'. Its inverse, '1111', would mean "never", and it would be wasteful to dedicate one sixteenth of the instruction set to instructions that can never execute. Instead, this portion of the instruction set is used for the few instructions which cannot be executed conditionally.
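The bit-flipping trick can be sketched in Python as follows. Only the encodings of `eq`, `ne` and `al` are quoted in the text above; the remaining values are filled in from the architecture's standard condition table and are worth checking against the Architecture Reference Manual before relying on them. The names `COND_ENCODING` and `invert_condition` are hypothetical.

```python
# 4-bit condition field values. Bit 0 selects between a condition and its opposite.
COND_ENCODING = {
    "eq": 0b0000, "ne": 0b0001,
    "cs": 0b0010, "cc": 0b0011,
    "mi": 0b0100, "pl": 0b0101,
    "vs": 0b0110, "vc": 0b0111,
    "hi": 0b1000, "ls": 0b1001,
    "ge": 0b1010, "lt": 0b1011,
    "gt": 0b1100, "le": 0b1101,
    "al": 0b1110,
}
COND_NAME = {bits: name for name, bits in COND_ENCODING.items()}
ALIASES = {"hs": "cs", "lo": "cc"}   # Alternative spellings of the same encodings.

def invert_condition(name):
    """Return the opposite condition by flipping the least significant bit."""
    name = ALIASES.get(name, name)
    if name == "al":
        raise ValueError("'al' has no opposite: 0b1111 does not mean 'never'")
    return COND_NAME[COND_ENCODING[name] ^ 1]

print(invert_condition("eq"), invert_condition("hs"), invert_condition("gt"))
```

A JIT compiler that has already encoded a conditional branch could use the same exclusive-or on the instruction's condition field to reverse the sense of the branch.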
Here are a few examples of instructions which will always execute unconditionally in the ARM instruction set:
- `blx <label>` cannot be conditionally executed, but `blx <register>` (and all other branch instructions) can.
- Most NEON instructions. For example, SIMD (NEON) variants of `vadd` cannot be conditionally executed, though the scalar (VFP) variants can.
- Hint instructions, such as `pld` (preload data).
- Barriers, such as `dmb` (data memory barrier), `dsb` (data synchronization barrier), `isb` (instruction synchronization barrier).
As always, the ARMv7-AR Architecture Reference Manual contains the most complete and accurate information, as does the Instruction Set Quick Reference Card.
Conditional Execution and High-Performance Processors
At a time when few processors had branch prediction and code size was very constrained, conditional execution was an excellent way to save code space whilst also improving performance in many programs. This is still true for today's real-time processors and micro-controllers. However, ARM's application-class processors include branch predictors which often make the branch-based `if/else` construction more attractive than conditional instructions. A predicted branch may be very cheap, or even free in some cases. In addition, conditional execution can, in some cases, prevent out-of-order execution as it adds additional instruction stream dependencies.
In some cases, it can be difficult to know whether to use conditional execution or traditional conditional branches for a particular application. However, as a general rule-of-thumb, it's probably best to use conditional instructions for sequences of three instructions or fewer, and branches for longer sequences. The best-performing solution varies between processors as they have different pipeline and branch predictor designs, and it also varies depending on the specific instruction sequence you are using. Also note that the fastest solution is not necessarily the smallest.
Thumb
In the original 16-bit Thumb instruction set, only branches could be conditional. In Thumb-2, the `it` instruction was added to provide functionality and behaviour similar to conditional instructions in ARM. Thumb-2's `it` instruction can also conditionally execute some instructions which are normally unconditionally executed in ARM state. I won't say more about it now, though it will be covered in detail in my next post in this series.
Thumb-2 can make use of the same conditional execution features that the ARM instruction set provides. For conditionally executing one or two instructions, this mechanism can provide code-size and performance benefits over the (more conventional) conditional branching mechanism.
I noted at the end of the last post in this series that this mechanism is not directly available to Thumb. Instead, Thumb-2 has an instruction, `it`, which can provide the same functionality as ARM conditional execution. In this article, I will describe the `it` instruction, and I will also explain a few caveats of condition-setting instructions in Thumb-2. Note that the `it` instruction is only available to Thumb-2, and so most of this article will not be relevant to the old Thumb instruction set [1].
The `it` Instruction
With the exception of simple conditional branches, Thumb-2 instructions do not have the 4-bit condition code field that most ARM instructions have. Instead, Thumb-2 has the `it` instruction, which conditionally executes up to four subsequent instructions. The instructions affected by an `it` instruction are said to be in an `it` block.
The mnemonic `it` represents an if-then construct. If the condition code (given as an argument to the instruction) evaluates to true, then the next instruction is executed. Up to three additional `t` (then) or `e` (else) codes can be added to control the execution of the subsequent instructions. For example, read `ite` as if-then-else, and `ittee` as if-then-then-else-else. The following code either increments `r0`, or resets it to `0` if it is greater than or equal to `10`:
- .syntax unified @ Remember this!
- .thumb
- [...]
- cmp r0, #10
- ite lo @ if r0 is lower than 10 ...
- addlo r0, #1 @ ... then r0 = r0 + 1
- movhs r0, #0 @ ... else r0 = 0
Note that the conditionally-executed instructions inside the `it` block must still be given condition codes, as they would in ARM assembly. Assemblers will check that the condition you gave to `it` is consistent with those on the individual instructions. The then conditions must match the condition code, and any else conditions must be the opposite condition. In the example, the else condition was `hs` (higher or same), the opposite of `lo` (lower). The table below shows the condition codes and their opposites:
Condition Code | Description | Opposite Code | Description |
---|---|---|---|
eq | Equal. | ne | Not equal. |
hs (or cs) | Unsigned higher or same (or carry set). | lo (or cc) | Unsigned lower (or carry clear). |
mi | Negative. | pl | Positive or zero. |
vs | Signed overflow. | vc | No signed overflow. |
hi | Unsigned higher. | ls | Unsigned lower or same. |
ge | Signed greater than or equal. | lt | Signed less than. |
gt | Signed greater than. | le | Signed less than or equal. |
al (or omitted) | Always executed. | (none) | There is no opposite to al. |
Whilst it is valid to give condition code `al` to the `it`, it has no opposite as there is no "never" code. It is not valid to specify the `al` condition code in an `it` instruction that uses an else clause.
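The consistency rules described in this section can be sketched as a small checker. The Python below is an illustration of the rules as stated here, not a description of how any particular assembler implements them; `check_it_block` and its arguments are hypothetical names.

```python
OPPOSITE = {
    "eq": "ne", "ne": "eq",
    "hs": "lo", "lo": "hs",
    "mi": "pl", "pl": "mi",
    "vs": "vc", "vc": "vs",
    "hi": "ls", "ls": "hi",
    "ge": "lt", "lt": "ge",
    "gt": "le", "le": "gt",
}
ALIASES = {"cs": "hs", "cc": "lo"}

def canonical(code):
    """Lower-case a condition code and fold cs/cc onto hs/lo."""
    code = code.lower()
    return ALIASES.get(code, code)

def check_it_block(mnemonic, condition, instruction_conditions):
    """Return True if the instructions' conditions are consistent with the `it`."""
    mnemonic = mnemonic.lower()
    slots = mnemonic[1:]             # "ite" -> "te": one then slot, one else slot.
    if not mnemonic.startswith("it") or len(slots) > 4:
        return False
    if any(slot not in "te" for slot in slots):
        return False
    base = canonical(condition)
    if base != "al" and base not in OPPOSITE:
        return False                 # Not a condition code this sketch knows about.
    if base == "al" and "e" in slots:
        return False                 # "al" has no opposite, so no else clause.
    if len(instruction_conditions) != len(slots):
        return False
    for slot, given in zip(slots, instruction_conditions):
        expected = base if slot == "t" else OPPOSITE[base]
        if canonical(given) != expected:
            return False
    return True

print(check_it_block("ite", "lo", ["lo", "hs"]))
print(check_it_block("ite", "lo", ["lo", "lo"]))
```

The first call corresponds to the `ite lo` example above and is accepted; the second is rejected because the else slot does not carry the opposite condition.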
Branches
Just like other instructions, Thumb-2's branches can be conditionally executed using `it`. Indeed, some branches cannot be conditionally executed without using an `it` block. However, any branches that exist in an `it` block must be the last instruction in the block. The following, for example, is unpredictable:
- ite eq
- blxeq some_label @ UNPREDICTABLE during an IT block.
- movne r0, #0
The correct way to implement the above would be to put the `mov` before the `blx`, as follows:
- ite ne
- movne r0, #0
- blxeq some_label @ Ok at the end of an IT block.
Compatibility with ARM Assembly
The `it` instruction is valid in ARM assembly, though it will not generate any code. This is done for compatibility with Thumb-2 assembly, and allows most assembly sequences to be assembled for both ARM and Thumb-2.
Simple Conditional Branches
Just like ARM code, a simple Thumb `b` instruction can be made conditional by adding a suitable condition code suffix. Indeed, the `if/else` example provided in my last post will assemble for Thumb just as it will for ARM.
Interesting Optimization Possibilities
Condition Code `al`
16-bit forms of Thumb arithmetic instructions usually set the condition flags. When inside an `it` block, however, the 16-bit forms do not set the flags. This property can be useful in combination with condition code `al`. Consider the following code sequence:
- @ Instruction Size
- add r0, r0, #1 @ 4 bytes
- add r1, r1, #1 @ 4 bytes
- add r2, r2, #1 @ 4 bytes
- add r3, r3, #1 @ 4 bytes
- @ Total: 16 bytes
Writing an equivalent code sequence using an `it` block can result in smaller code size:
- @ Instruction Size
- itttt al @ 2 bytes
- addal r0, r0, #1 @ 2 bytes
- addal r1, r1, #1 @ 2 bytes
- addal r2, r2, #1 @ 2 bytes
- addal r3, r3, #1 @ 2 bytes
- @ Total: 10 bytes
It should be noted that the 16-bit forms have additional limitations, so the `it` trick used above may not always be applicable. The restrictions vary between instructions, but the 16-bit instruction forms can typically only access `r0`-`r7` and have a very restricted range of immediate constants. For details, refer to the Architecture Reference Manual.
Flag Setting
Because (outside of `it` blocks) most arithmetic instructions that set the flags have 16-bit forms, code size can be dramatically improved by setting the flags even when not necessary. This will provide the best (smallest) code size possible. However, depending on your target processor, this technique may have a small negative performance impact. It is perhaps advisable to use the `al` condition trick or 32-bit instructions in performance-critical code.
You can force the assembler to produce 16-bit instructions by adding a `.n` suffix. Assemblers will choose the 16-bit form anyway where one exists, but if your instruction cannot be encoded using a 16-bit form and you specify `.n`, the assembler will give an error message.
- [...] @ Not in an IT block.
- adds.n r1, r2, r3 @ Generates a 16-bit instruction.
- add.n r1, r2, r3 @ Error: No 16-bit form for this.
Refer to the Architecture Reference Manual for details of each instruction, and information about the constraints of the 16-bit forms. There are many exceptions and special cases so I won't describe them here in detail.
Floating-point comparisons in the ARM architecture use the same mechanism as integer comparisons. However, there are some unavoidable caveats because the range of supported relationships is different for floating-point values. There are two problems to consider here: setting the flags from a VFP comparison, and interpreting the flags with condition codes.
This post is applicable to all processors with VFP. The mechanisms I will describe do not differ between VFP variants. Similarly, the mechanisms are equally available in ARM and Thumb-2 modes. I described conditional execution in Thumb-2 in my last article.
Setting the Flags with VFP
As I described at the start of this series, the integer `cmp` instruction performs an integer comparison and updates the `APSR` (Application Processor Status Register) with information about the result of the comparison. The `APSR` holds the condition flags used by the processor for conditional execution. When VFP is used to perform a floating-point comparison, the `vcmp` instruction is used to update the `FPSCR` (Floating-Point Status and Control Register). This isn't usually useful by itself, however, as the processor cannot directly use the `FPSCR` for conditional execution. The `vmrs` instruction must be used to transfer the flags to the `APSR` [1].
- .syntax unified @ Remember this!
- [...]
- vcmp d0, d1
- vmrs APSR_nzcv, FPSCR @ Get the flags into APSR.
- [...] @ Do something with the condition flags.
Note that some versions of the GNU assembler do not accept all of the new instruction variants (with the "`v`" prefix). In this case, use the older `fcmp` forms (`fcmps` or `fcmpd`, depending on precision) in place of `vcmp`, and `fmstat` (with no arguments) in place of `vmrs`.
Flag Meanings
The integer comparison flags support comparisons which are not applicable to floating-point numbers. For example, floating-point values are always signed, so there is no need for unsigned comparisons. On the other hand, floating-point comparisons can result in the unordered result (meaning that one or both operands was `NaN`, or "not a number"). IEEE-754 defines four testable relationships between two floating-point values, and they map onto the ARM condition flags as follows:
IEEE-754 Relationship | APSR N | APSR Z | APSR C | APSR V |
---|---|---|---|---|
Equal | 0 | 1 | 1 | 0 |
Less Than | 1 | 0 | 0 | 0 |
Greater Than | 0 | 0 | 1 | 0 |
Unordered (At least one argument was NaN.) | 0 | 0 | 1 | 1 |
Compare with Zero
Unlike the integer instructions, most VFP (and NEON) instructions can operate only on registers, and cannot accept immediate values encoded in the instruction stream. The `vcmp` instruction is a notable exception in that it has a special-case variant that allows quick and easy comparison with zero.
Interpreting the Flags
Once the flags are in the `APSR`, they may be used almost as if an integer comparison had set the flags. However, floating-point comparisons support different relationships, so the integer condition codes do not always make sense. The following table is equivalent to the condition code table from the first post in this series, but it describes floating-point comparisons as well as integer comparisons:
Code | Meaning (when set by vcmp ) | Meaning (when set by cmp ) | Flags Tested |
---|---|---|---|
eq | Equal to. | Equal to. | Z==1 |
ne | Unordered, or not equal to. | Not equal to. | Z==0 |
cs or hs | Greater than, equal to, or unordered. | Greater than or equal to (unsigned). | C==1 |
cc or lo | Less than. | Less than (unsigned). | C==0 |
mi | Less than. | Negative. | N==1 |
pl | Greater than, equal to, or unordered. | Positive or zero. | N==0 |
vs | Unordered. (At least one argument was NaN.) | Signed overflow. | V==1 |
vc | Not unordered. (No argument was NaN.) | No signed overflow. | V==0 |
hi | Greater than or unordered. | Greater than (unsigned). | (C==1) && (Z==0) |
ls | Less than or equal to. | Less than or equal to (unsigned). | (C==0) || (Z==1) |
ge | Greater than or equal to. | Greater than or equal to (signed). | N==V |
lt | Less than or unordered. | Less than (signed). | N!=V |
gt | Greater than. | Greater than (signed). | (Z==0) && (N==V) |
le | Less than, equal to or unordered. | Less than or equal to (signed). | (Z==1) || (N!=V) |
al (or omitted) | Always executed. | Always executed. | None tested. |
It should be obvious that the condition code is attached to the instruction reading the flags, and the source of the flags makes no difference to the flags that are tested. It is the meaning of the flags that differs when you perform a `vcmp` rather than a `cmp`. Similarly, it is clear that the opposite conditions still hold. (For example, `hs` is still the opposite of `lo`.)
The flags when set by `cmp` generally have analogous meanings when set by `vcmp`. For example, `gt` still means "greater than". However, the unordered condition and the removal of the signed conditions can confuse matters. Often, for example, it is desirable to use `lo` (normally an unsigned "less than" check) in place of `lt`, because it does not match in the unordered case.
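The difference between `lo` and `lt` in the unordered case is easy to check with a small model. The Python sketch below encodes the two tables from this article (the flags produced for each IEEE-754 relationship, and the flags each condition code tests). It is an illustration only; `vcmp_flags` and `holds` are hypothetical names.

```python
import math

def vcmp_flags(a, b):
    """Flags as they would appear in the APSR after `vcmp` and `vmrs`."""
    if math.isnan(a) or math.isnan(b):
        return {"N": 0, "Z": 0, "C": 1, "V": 1}   # Unordered.
    if a == b:
        return {"N": 0, "Z": 1, "C": 1, "V": 0}   # Equal.
    if a < b:
        return {"N": 1, "Z": 0, "C": 0, "V": 0}   # Less than.
    return {"N": 0, "Z": 0, "C": 1, "V": 0}       # Greater than.

CONDITIONS = {
    "eq": lambda f: f["Z"] == 1,
    "ne": lambda f: f["Z"] == 0,
    "hs": lambda f: f["C"] == 1,
    "lo": lambda f: f["C"] == 0,
    "mi": lambda f: f["N"] == 1,
    "pl": lambda f: f["N"] == 0,
    "vs": lambda f: f["V"] == 1,
    "vc": lambda f: f["V"] == 0,
    "hi": lambda f: f["C"] == 1 and f["Z"] == 0,
    "ls": lambda f: f["C"] == 0 or f["Z"] == 1,
    "ge": lambda f: f["N"] == f["V"],
    "lt": lambda f: f["N"] != f["V"],
    "gt": lambda f: f["Z"] == 0 and f["N"] == f["V"],
    "le": lambda f: f["Z"] == 1 or f["N"] != f["V"],
}

def holds(code, a, b):
    """Would condition `code` pass after comparing a with b?"""
    return CONDITIONS[code](vcmp_flags(a, b))

nan = float("nan")
print(holds("lt", nan, 1.0), holds("lo", nan, 1.0))   # lt matches unordered, lo does not.
print(holds("lt", 1.0, 2.0), holds("lo", 1.0, 2.0))   # Both match a plain "less than".
```

In this model, `lt` and `lo` agree for ordered operands and differ only when a `NaN` is involved, which is the behaviour described above.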
Performance Considerations
Be aware that `vmrs` effectively implements a data transfer between VFP and the integer core, and this operation can be relatively expensive on some cores. In addition, there is clearly a data dependency between `vcmp` and `vmrs` and another between `vmrs` and the conditional instruction. It is advisable to structure your code such that the flags are set and transferred many instructions before they are actually read. This is also true of integer comparisons, though the effect is likely to be more significant when using VFP.
Some instruction timing information and latency information is available for the Cortex-A8 and Cortex-A9 processors.
Examples
VFP Version of ccdemo
In my first post in this series, I provided an example program ("ccdemo") to show how the flags and condition codes interact. A VFP version (using `vcmp`) is attached to this article.
Complex Number Addition with Special NaN Handler
- @ Add complex numbers (or two-element vectors) in s3:s2 and s5:s4, storing
- @ the result in s1:s0. If either element of the result is NaN, jump to a
- @ special handler.
- vadd s0, s2, s4
- vadd s1, s3, s5
- vcmp s0, s1
- vmrs APSR_nzcv, FPSCR
- bvs nan_handler
Loop Condition
- @ This implements a loop that calculates d0=d0-(1/d0) until d0 is negative.
- vmov d0, #10.0 @ Some starting value.
- vmov d2, #1.0 @ We need the constant 1.0 in the loop.
- 1: [...] @ Do something interesting with d0.
- vdiv d1, d2, d0 @ d1=(1/d0)
- vsub d0, d0, d1 @ d0=d0-(1/d0)
- vcmp d0, #0 @ Special case of vcmp for compare-with-zero.
- vmrs APSR_nzcv, FPSCR
- bge 1b
Implementation of fmax
- @ A typical implementation of "fmax".
- @ Put into d0 the greatest of d1 and d2.
- @ - If one argument is NaN, the result is the other argument.
- @ - If both arguments are NaN, the result is NaN.
- @ I have used "it" blocks here so the sequence can be assembled as either
- @ ARM or Thumb-2 code.
- vcmp d1, d2
- vmrs APSR_nzcv, FPSCR
- it vs @ Code "vs" means "unordered".
- bvs 1f @ Jump to the NaN handler.
- @ Normal-case (not-NaN) handler.
- ite ge
- vmovge d0, d1 @ Select d1 if it is the greatest (or equal).
- vmovlt d0, d2 @ Select d2 if it is the greatest.
- b 2f @ Jump over the NaN handler.
- 1:
- @ NaN handler. We know that at least one argument was NaN.
- vcmp d1, #0
- vmrs APSR_nzcv, FPSCR
- ite vc @ Code "vc" means "not unordered".
- vmovvc d0, d1 @ d1 wasn't NaN, so make it the result.
- vmovvs d0, d2 @ d1 was NaN, so choose d2. (This might be NaN too.)
- 2:
- @ Done. The result is in d0.
- [...]
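As a cross-check on the assembly above, the behaviour its comments promise can be written as a short reference model. The Python below is a sketch of those stated semantics (a `NaN` argument is ignored in favour of the other one), with the hypothetical name `fmax_model`. It mirrors the structure of the assembly rather than any particular C library's `fmax`.

```python
import math

def fmax_model(d1, d2):
    """Return the greater of d1 and d2, preferring a non-NaN argument."""
    if math.isnan(d1) or math.isnan(d2):
        # NaN handler (the "vs" path): pick d1 unless d1 itself is NaN.
        return d2 if math.isnan(d1) else d1
    # Normal case: "ge" selects d1, "lt" selects d2.
    return d1 if d1 >= d2 else d2

nan = float("nan")
print(fmax_model(1.0, 2.0), fmax_model(nan, 3.0), fmax_model(3.0, nan))
```

When both arguments are `NaN`, the model returns `d2`, which is itself `NaN`, matching the final comment in the assembly.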
[1] The `vmrs` instruction can also transfer the flags (along with the rest of the `FPSCR`) to an arbitrary general-purpose integer register, but this is usually only useful for accessing fields in the `FPSCR` other than the condition flags.