Arm Instruction Condition Codes

CaspianSea

Every practical general-purpose computing architecture has a mechanism of conditionally executing some code. Such mechanisms are used to implement the if construct in C, for example, in addition to several other cases that are less obvious.

ARM, like many other architectures, implements conditional execution using a set of flags which store state information about a previous operation. I intend, in this post, to shed some light on the operation of these flags. Of course, the Architecture Reference Manual is the definitive source of information, so if you need to know about a specific corner-case that I do not cover here, that is where you need to look.

A Realistic Example

Consider a simple fragment of C code:

for (i = 10; i != 0; i--) {
do_something();
}

A compiler might implement that structure as follows:

mov r4, #10 @ i = 10
loop_label:
bl do_something @ Call the loop body.
sub r4, r4, #1 @ i--
cmp r4, #0 @ Set the flags from (r4 - 0).
bne loop_label @ Loop again if (i != 0).

The last two instructions are of particular interest. The cmp (compare) instruction compares r4 with 0, and the bne instruction is simply a b (branch) instruction that executes if the result of the cmp instruction was "not equal". The code works because cmp sets some global flags indicating various properties of the operation. The bne instruction, which is really just a b (branch) with an ne condition code suffix, reads these flags to determine whether or not to branch¹.

The following code implements a more efficient solution:

mov r4, #10 @ i = 10
loop_label:
bl do_something @ Call the loop body.
subs r4, r4, #1 @ i--, and set the flags from the result.
bne loop_label @ Loop again if (i != 0).

Adding the s suffix to sub causes it to update the flags itself, based on the result of the operation. This suffix can be added to many (but not all) arithmetic and logical operations².

In the rest of the article, I will explain what the condition flags are, where they are stored, and how to test them using condition codes.

Condition-Code Analysis Tool

If you have an ARM platform (or emulator) handy, the attached ccdemo application can be used to experiment with the operations discussed in the article. The application allows you to pick an operation and two operands, and shows the resulting flags and a list of which condition codes will match. When writing assembly code, it can also be a rather useful development tool.

The Flags

The simplest way to set the condition flags is to use a comparison operation, such as cmp. This mechanism is common to many processor architectures, and the semantics (if not the details) of cmp will likely be familiar. In addition, we have already seen that many instructions (such as sub in the example) can be modified to update the condition flags based on the result by adding an s suffix. That's all well and good, but what information is stored, and how can we access it?

The additional information is stored in four condition flag bits in the APSR (Application Processor Status Register), or the CPSR (Current Processor Status Register) if you are used to pre-ARMv7 terminology³,⁴. The flags indicate simple properties such as whether or not the result was negative, and are used in various combinations to detect higher-level relationships such as "greater than" and suchlike. Once I have described the flags, I will explain how they map onto condition codes (such as ne in the previous example).

`N`: Negative

The N flag is set by an instruction if the result is negative. In practice, N is set to the two's complement sign bit of the result (bit 31).

`Z`: Zero

The Z flag is set if the result of the flag-setting instruction is zero.

`C`: Carry (or Unsigned Overflow)

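The C flag is set if the result of an unsigned addition overflows the 32-bit result register. This bit can be used to implement 64-bit unsigned arithmetic, for example. For subtractions (including cmp), C is instead set when no borrow occurs, that is, when the first operand is unsigned higher than or the same as the second.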

`V`: (Signed) Overflow

The V flag works the same as the C flag, but for signed operations. For example, 0x7fffffff is the largest positive two's complement integer that can be represented in 32 bits, so 0x7fffffff + 0x7fffffff triggers a signed overflow, but not an unsigned overflow (or carry): the result, 0xfffffffe, is correct if interpreted as an unsigned quantity, but represents a negative value (-2) if interpreted as a signed quantity.
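
To make the "64-bit unsigned arithmetic" remark about the C flag concrete, here is a minimal C sketch of a 64-bit addition built from two 32-bit additions, in the way an adds followed by an adc would do it. The function name add64_via_carry is made up for this illustration, and the carry is recomputed in C rather than read from a real flag.

```c
#include <stdint.h>

/* Hypothetical helper: add two 64-bit values supplied as 32-bit halves,
 * mirroring "adds" on the low words followed by "adc" on the high words. */
uint64_t add64_via_carry(uint32_t a_lo, uint32_t a_hi,
                         uint32_t b_lo, uint32_t b_hi)
{
    uint32_t lo = a_lo + b_lo;               /* adds: wraps modulo 2^32     */
    uint32_t carry = (lo < a_lo) ? 1u : 0u;  /* what the C flag would hold  */
    uint32_t hi = a_hi + b_hi + carry;       /* adc: add with the carry-in  */
    return ((uint64_t)hi << 32) | lo;
}
```

For example, adding 1 to 0x00000001ffffffff carries out of the low word, so the high word becomes 2 and the low word becomes 0.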

Flag-Setting Example

Consider the following example:

ldr r1, =0xffffffff
ldr r2, =0x00000001
adds r0, r1, r2

The result of the operation would be 0x100000000, but the top bit is lost because it does not fit into the 32-bit destination register and so the real result is 0x00000000. In this case, the flags will be set as follows:

Flag	Explanation
`N = 0`	The result is 0, which is considered positive, and so the `N` (negative) bit is set to `0`.
`Z = 1`	The result is 0, so the `Z` (zero) bit is set to `1`.
`C = 1`	We lost some data because the result did not fit into 32 bits, so the processor indicates this by setting `C` (carry) to `1`.
`V = 0`	From a two's complement signed-arithmetic viewpoint, `0xffffffff` really means `-1`, so the operation we did was really `(-1) + 1 = 0`. That operation clearly does not overflow, so `V` (overflow) is set to `0`.

If you fancy it, you can check this with the ccdemo application. The output looks like this:

$ ./ccdemo adds 0xffffffff 0x1
The results (in various formats):
Signed: -1 adds 1 = 0
Unsigned: 4294967295 adds 1 = 0
Hexadecimal: 0xffffffff adds 0x00000001 = 0x00000000
Flags:
N (negative): 0
Z (zero) : 1
C (carry) : 1
V (overflow): 0
Condition Codes:
EQ: 1 NE: 0
CS: 1 CC: 0
MI: 0 PL: 1
VS: 0 VC: 1
HI: 0 LS: 1
GE: 1 LT: 0
GT: 0 LE: 1
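
If you do not have an ARM platform to hand, the flag values above can also be reproduced with a small C model. This is only a sketch of how adds derives the flags. It is not how ccdemo works (that reads the real flags), and the packed N:Z:C:V layout and the name adds_flags are assumptions made for the example.

```c
#include <stdint.h>

/* Illustrative packing of the four flags as N:Z:C:V in the low four bits.
 * This is not the APSR bit layout. */
enum { FLAG_N = 8, FLAG_Z = 4, FLAG_C = 2, FLAG_V = 1 };

/* Model the flags that "adds r0, r1, r2" would produce for a + b. */
unsigned adds_flags(uint32_t a, uint32_t b)
{
    uint32_t result = a + b;                 /* wraps modulo 2^32 */
    unsigned flags = 0;

    if (result & 0x80000000u) flags |= FLAG_N;   /* sign bit (bit 31) set   */
    if (result == 0)          flags |= FLAG_Z;   /* result is zero          */
    if (result < a)           flags |= FLAG_C;   /* unsigned overflow       */
    /* Signed overflow: both operands have the same sign, but the result's
     * sign differs from it. */
    if (~(a ^ b) & (a ^ result) & 0x80000000u) flags |= FLAG_V;

    return flags;
}
```

With this packing, adds_flags(0xffffffff, 0x1) gives Z and C set (value 6), matching the output above, and adds_flags(0x7fffffff, 0x7fffffff) gives N and V set (value 9).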

Reading the Flags

We have worked out how to set the flags, but how does that result in the ability to conditionally execute some code? Being able to set the flags is pointless if you cannot then react to them.

The most common method of testing the flags is to use conditional execution codes. This mechanism is similar to mechanisms used in other architectures, so if you are familiar with other machines you might recognize the following pattern, which maps cleanly onto C's if/else construct:

cmp r0, #20
bhi do_something_else
do_something:
@ This code runs if (r0 <= 20).
b continue @ Prevent do_something_else from executing.
do_something_else:
@ This code runs if (r0 > 20).
continue:
@ Other code.

In effect, attaching one of the condition codes to an instruction causes it to execute if the condition is true. Otherwise, it does nothing, and is essentially a nop.

The following table lists the available condition codes, their meanings (where the flags were set by a cmp or subs instruction), and the flags that are tested:

Code	Meaning (for `cmp` or `subs`)	Flags Tested
`eq`	Equal.	`Z==1`
`ne`	Not equal.	`Z==0`
`cs` or `hs`	Unsigned higher or same (or carry set).	`C==1`
`cc` or `lo`	Unsigned lower (or carry clear).	`C==0`
`mi`	Negative. The mnemonic stands for "minus".	`N==1`
`pl`	Positive or zero. The mnemonic stands for "plus".	`N==0`
`vs`	Signed overflow. The mnemonic stands for "V set".	`V==1`
`vc`	No signed overflow. The mnemonic stands for "V clear".	`V==0`
`hi`	Unsigned higher.	`(C==1) && (Z==0)`
`ls`	Unsigned lower or same.	`(C==0) || (Z==1)`
`ge`	Signed greater than or equal.	`N==V`
`lt`	Signed less than.	`N!=V`
`gt`	Signed greater than.	`(Z==0) && (N==V)`
`le`	Signed less than or equal.	`(Z==1) || (N!=V)`
`al` (or omitted)	Always executed.	None tested.

It is fairly obvious how the first few work because they test individual flags, but the others rely on specific combinations of flags. In practice, you very rarely need to know exactly what is happening; the mnemonics hide the complexity of the comparisons.
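
For the curious, the "Flags Tested" column of the table translates almost directly into code. The following C sketch evaluates each condition code against a set of flags. The cond_t enumeration, the cond_passes name and the packed N:Z:C:V layout are all assumptions made for this illustration.

```c
#include <stdbool.h>

/* Illustrative packing of the four flags as N:Z:C:V in the low four bits. */
enum { FLAG_N = 8, FLAG_Z = 4, FLAG_C = 2, FLAG_V = 1 };

typedef enum {
    EQ, NE, CS, CC, MI, PL, VS, VC, HI, LS, GE, LT, GT, LE, AL
} cond_t;

/* Return true if an instruction carrying condition "cond" would execute. */
bool cond_passes(cond_t cond, unsigned flags)
{
    bool n = (flags & FLAG_N) != 0;
    bool z = (flags & FLAG_Z) != 0;
    bool c = (flags & FLAG_C) != 0;
    bool v = (flags & FLAG_V) != 0;

    switch (cond) {
    case EQ: return z;
    case NE: return !z;
    case CS: return c;             /* also spelled hs */
    case CC: return !c;            /* also spelled lo */
    case MI: return n;
    case PL: return !n;
    case VS: return v;
    case VC: return !v;
    case HI: return c && !z;
    case LS: return !c || z;
    case GE: return n == v;
    case LT: return n != v;
    case GT: return !z && (n == v);
    case LE: return z || (n != v);
    case AL: return true;
    }
    return false;                  /* not a valid condition code */
}
```

Feeding it the flags from the earlier adds example (Z and C set) reproduces the condition-code list printed by ccdemo: eq, cs, pl, vc, ls, ge and le match, and the rest do not.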

Here, once again, is the example for-loop code I gave earlier:

mov r4, #10
loop_label:
bl do_something
subs r4, r4, #1
bne loop_label

It should now be easy enough to work out exactly what is happening here:

The subs instruction sets the flags based on the result of r4-1. In particular, the Z flag will be set if the result is 0, and it will be clear if the result is anything else.
The bne instruction only executes if condition ne is true. That condition is true if Z is clear, so the bne iterates the loop until Z is set (and therefore r4 is 0).

Dedicated Comparison Instructions

The cmp instruction (that we saw in the first example) can be thought of as a sub instruction that doesn't store its result: if the two operands are equal, the result of the subtraction will be zero, hence the mapping between eq and the Z flag. Of course, we could just use a sub instruction with a dummy register, but you can only do that if you have a register to spare. Dedicated comparison instructions are therefore quite commonly used.

There are actually four dedicated comparison instructions available, and they perform operations as described in the following table:

Instruction	Description
`cmp`	Works like `subs`, but does not store the result.
`cmn`	Works like `adds`, but does not store the result.
`tst`	Works like `ands`, but does not store the result.
`teq`	Works like `eors`, but does not store the result.

Note that the dedicated comparison operations do not require the s suffix; they only update the flags, so the suffix would be redundant.
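
The "cmp is a subs that discards its result" view can be modelled in a few lines of C. This sketch assumes the same illustrative N:Z:C:V packing as a plain integer, and cmp_flags is a made-up name. Note in particular that for a subtraction the C flag means "no borrow", which is why cs/hs reads as "unsigned higher or same".

```c
#include <stdint.h>

/* Illustrative packing of the four flags as N:Z:C:V in the low four bits. */
enum { FLAG_N = 8, FLAG_Z = 4, FLAG_C = 2, FLAG_V = 1 };

/* Model the flags that "cmp a, b" would produce: compute a - b, keep only
 * the flags and throw the result away. */
unsigned cmp_flags(uint32_t a, uint32_t b)
{
    uint32_t result = a - b;                 /* wraps modulo 2^32 */
    unsigned flags = 0;

    if (result & 0x80000000u) flags |= FLAG_N;
    if (result == 0)          flags |= FLAG_Z;
    if (a >= b)               flags |= FLAG_C;   /* set when there is NO borrow */
    /* Signed overflow: the operands have different signs, and the result's
     * sign differs from the first operand's. */
    if ((a ^ b) & (a ^ result) & 0x80000000u) flags |= FLAG_V;

    return flags;
}
```

Comparing equal values sets Z and C, so both eq and hs match. Comparing 3 with 5 sets only N, so lo and lt match.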

End Note

Whilst the condition flag mechanism is fairly simple in principle, there are a lot of details to take in, and seeing some real examples will probably be useful! I will make a point of presenting some examples of realistic usage in a future blog post.

¹Technically, most instructions can be executed conditionally, not just branches. However, I will discuss such conditional execution in more detail in another article.

²The Instruction Set Quick Reference Card summarises the flag-setting abilities of each instruction. The Architecture Reference Manual contains detailed information about exactly how the flags are updated for each instruction.

³The APSR and CPSR are actually the same on ARMv7, despite having separate names, but only the condition codes and one or two other bits are defined for the APSR. The other bits should not really be accessed directly anyway, so the renaming is essentially a clean-up of the old mixed-access CPSR. Note, however, that GCC (4.3.3 at least) does not accept APSR, so you have to use CPSR in your assembly source if you want to access it.

⁴In general, you will very rarely need to directly access the APSR because the condition codes give you the functionality you usually need from them anyway. However, if you really want to see what is in there, you can access it using the msr and mrs instructions. Indeed, this is the method that the ccdemo application uses to give information about the specified operation.

In my previous post (Condition Codes 1), I explained that some instructions can set some global condition codes, and that these codes can be used to conditionally execute code. I gave some examples of usage. One such example was an assembly implementation of C's if/else construct:

cmp r0, #20
bhi do_something_else
do_something:
@ This code runs if (r0 <= 20).
b continue @ Prevent do_something_else from executing.
do_something_else:
@ This code runs if (r0 > 20).
continue:
@ Other code.

The example is valid, and will work on any ARM core. However, is this an efficient solution if you only need to execute one or two instructions in each case? Consider the following C code:

if (a >= 10) {
a = 10;
} else {
a = a + 1;
}

It should be clear that the code increments a unless it has hit or exceeded a limit of 10, in which case it is set to 10. Mapping this onto our if/else example, this might be implemented in assembly as follows:

cmp r0, #10
blo r0_is_small
r0_is_big:
mov r0, #10
b continue
r0_is_small:
add r0, r0, #1
continue:
@ Other code.

The above code executes one of two instructions, either the mov or the add. However, it uses two branch instructions to achieve this. Without branch prediction, these branches can take several cycles to execute. Even with branch prediction, the pattern may not be easily predicted. Finally, even with perfect branch prediction, each branch instruction takes four bytes of instruction memory, so code size may become a problem.

An Improved Example

One of the features of the ARM instruction set is that almost every instruction encoding includes a 4-bit field that represents a condition code. If the condition attached to an instruction passes, the instruction executes. Otherwise, it has no effect, as if you had used a nop instruction. Using this knowledge, we can implement the previous example more efficiently as follows:

cmp r0, #10
movhs r0, #10
addlo r0, r0, #1
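
One way to think about the three-instruction sequence is that both candidate results are "offered" and the flags select which one is written back. The C sketch below models that view for an unsigned value. The function name bump_with_limit is hypothetical, and a compiler is of course free to turn this C into either branches or conditional instructions.

```c
/* Model of: cmp r0, #10 / movhs r0, #10 / addlo r0, r0, #1 */
unsigned bump_with_limit(unsigned a)
{
    int higher_or_same = (a >= 10u);   /* cmp r0, #10: hs if a >= 10 (unsigned) */
    unsigned if_hs = 10u;              /* result written by movhs               */
    unsigned if_lo = a + 1u;           /* result written by addlo               */

    /* Exactly one of the two conditional instructions takes effect. */
    return higher_or_same ? if_hs : if_lo;
}
```

Because hs and lo are exact opposites, one and only one of the two conditional instructions has an effect for any given input.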

Unconditionally-Executed Instructions

In the ARM instruction set, the condition code is encoded using a 4-bit field in the instruction. The encoding includes 3 bits to identify an operation, and a fourth bit to invert the condition. The eq condition, for example, is the exact opposite of the ne condition. It may interest authors of JIT compilers to know that the least significant bit of the condition code can be inverted to obtain the opposite condition code. For example, eq (equal) is encoded as '0000' and ne (not equal) as '0001'. This works for every condition code with the exception of the al (always) condition, encoded as '1110'. Its inverse, '1111', would mean "never", and it would be wasteful to dedicate one sixteenth of the instruction set to instructions that can never execute. Instead, this portion of the instruction set is used for the few instructions which cannot be executed conditionally.

Here are a few examples of instructions which will always execute unconditionally in the ARM instruction set:

blx <label> cannot be conditionally executed, but blx <register> (and all other branch instructions) can.
Most NEON instructions. For example, SIMD (NEON) variants of vadd cannot be conditionally executed, though the scalar (VFP) variants can.
Hint instructions, such as pld (preload data).
Barriers, such as dmb (data memory barrier), dsb (data synchronization barrier), isb (instruction synchronization barrier).

As always, the ARMv7-AR Architecture Reference Manual contains the most complete and accurate information, as does the Instruction Set Quick Reference Card.
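
For JIT authors, the bit-inversion trick described above might look something like the following C sketch. The helper names condition_field and invert_condition are made up for this example. The numeric encodings other than eq, ne and al are not listed in this article, so check them against the Architecture Reference Manual before relying on them. The sketch also assumes the condition field occupies the top four bits of a 32-bit ARM instruction word.

```c
#include <stdint.h>

/* 4-bit condition-code encodings. Each opposite pair differs only in the
 * least significant bit. */
enum {
    COND_EQ = 0x0, COND_NE = 0x1, COND_HS = 0x2, COND_LO = 0x3,
    COND_MI = 0x4, COND_PL = 0x5, COND_VS = 0x6, COND_VC = 0x7,
    COND_HI = 0x8, COND_LS = 0x9, COND_GE = 0xA, COND_LT = 0xB,
    COND_GT = 0xC, COND_LE = 0xD, COND_AL = 0xE
};

/* Extract the condition field (bits 31:28) from an ARM instruction word. */
unsigned condition_field(uint32_t insn)
{
    return (unsigned)(insn >> 28);
}

/* Return the opposite condition code, or -1 if there is none ("al", or the
 * 0xF space used by unconditional instructions). */
int invert_condition(unsigned cond)
{
    if (cond >= (unsigned)COND_AL)
        return -1;
    return (int)(cond ^ 1u);           /* flip the least significant bit */
}
```

A JIT could use invert_condition to turn "branch if X" into "skip if not X" without a lookup table, as long as it handles the al case separately.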

Conditional Execution and High-Performance Processors

In the time when few processors had branch prediction and when code size was very constrained, conditional execution was an excellent way to save code space whilst also improving performance in many programs. This is still true for today's real-time processors and micro-controllers. However, ARM's application-class processors include branch predictors which often make the branch-based if/else construction more attractive than conditional instructions. A predicted branch may be very cheap, or even free in some cases. In addition, conditional execution can, in some cases, prevent out-of-order execution as it adds additional instruction stream dependencies.

In some cases, it can be difficult to know whether to use conditional execution or traditional conditional branches for a particular application. However, as a general rule-of-thumb, it's probably best to use conditional instructions for sequences of three instructions or fewer, and branches for longer sequences. The best-performing solution varies between processors as they have different pipeline and branch predictor designs, and it also varies depending on the specific instruction sequence you are using. Also note that the fastest solution is not necessarily the smallest.

Thumb

In the original 16-bit Thumb instruction set, only branches could be conditional. In Thumb-2, the it instruction was added to provide functionality and behaviour similar to conditional instructions in ARM. Thumb-2's it instruction can also conditionally execute some instructions which are normally unconditionally executed in ARM state. I won't say more about it now, though it will be covered in detail in my next post in this series.

Thumb-2 can make use of the same conditional execution features that the ARM instruction set provides. For conditionally executing one or two instructions, this mechanism can provide code-size and performance benefits over the (more conventional) conditional branching mechanism.

I noted at the end of the last post in this series that this mechanism is not directly available to Thumb. Instead, Thumb-2 has an instruction, it, which can provide the same functionality as ARM conditional execution. In this article, I will describe the it instruction, and I will also explain a few caveats of condition-setting instructions in Thumb-2. Note that the it instruction is only available to Thumb-2, and so most of this article will not be relevant to the old Thumb instruction set¹.

The `it` Instruction

With the exception of simple conditional branches, Thumb-2 instructions do not have the 4-bit condition code field that most ARM instructions have. Instead, Thumb-2 has the it instruction, which conditionally executes up to four subsequent instructions. The instructions affected by an it instruction are said to be in an it block.

The mnemonic it represents an if-then construct. If the condition code (given as an argument to the instruction) evaluates to true, then the next instruction is executed. Up to three additional t (then) or e (else) codes can be added to control the execution of the subsequent instructions. For example, read ite as if-then-else, and ittee as if-then-then-else-else. The following code either increments r0, or resets it to 0 if it is greater than or equal to 10:

.syntax unified @ Remember this!
.thumb
[...]
cmp r0, #10
ite lo @ if r0 is lower than 10 ...
addlo r0, #1 @ ... then r0 = r0 + 1
movhs r0, #0 @ ... else r0 = 0

Note that the conditionally-executed instructions inside the it block must still be given condition codes, as they would in ARM assembly. Assemblers will check that the condition you gave to it is consistent with those on the individual instructions. The then conditions must match the condition code, and any else conditions must be the opposite condition. In the example, the else condition was hs (higher or same), the opposite of lo (lower). The table below shows the condition codes and their opposites:

Condition Code		Opposite
Code	Description	Code	Description
`eq`	Equal.	`ne`	Not equal.
`hs` (or `cs`)	Unsigned higher or same (or carry set).	`lo` (or `cc`)	Unsigned lower (or carry clear).
`mi`	Negative.	`pl`	Positive or zero.
`vs`	Signed overflow.	`vc`	No signed overflow.
`hi`	Unsigned higher.	`ls`	Unsigned lower or same.
`ge`	Signed greater than or equal.	`lt`	Signed less than.
`gt`	Signed greater than.	`le`	Signed less than or equal.
`al` (or omitted)	Always executed.	There is no opposite to `al`.

Whilst it is valid to give condition code al to the it, it has no opposite as there is no never code. It is not valid to specify the al condition code in an it instruction that uses an else clause.
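
The consistency rule the assembler applies can be sketched in C as follows. Given an it mnemonic such as "ite" or "ittee" and the numeric encoding of its condition, the hypothetical function it_slot_condition returns the condition that the instruction in a given slot must carry. The function and its error convention are inventions for this example, and it assumes the usual 4-bit encodings, in which opposite conditions differ only in the lowest bit and al is 0xE.

```c
#include <string.h>

enum { COND_AL = 0xE };   /* "always"; it has no opposite */

/* Return the condition required on the instruction in "slot" (0-based) of
 * the block opened by "mnemonic", or -1 if the block or slot is invalid. */
int it_slot_condition(const char *mnemonic, unsigned base, size_t slot)
{
    size_t len = strlen(mnemonic);
    size_t i;

    /* "it" plus at most three further t/e letters. */
    if (len < 2 || len > 5 || mnemonic[0] != 'i' || mnemonic[1] != 't')
        return -1;
    if (base > (unsigned)COND_AL)
        return -1;

    for (i = 2; i < len; i++) {
        if (mnemonic[i] != 't' && mnemonic[i] != 'e')
            return -1;
        if (mnemonic[i] == 'e' && base == (unsigned)COND_AL)
            return -1;                 /* "al" cannot have an else clause */
    }

    if (slot >= len - 1)
        return -1;                     /* slot is outside the block */

    /* Slot k is described by letter k+1: 't' keeps the base condition,
     * 'e' takes the opposite condition (lowest bit flipped). */
    if (mnemonic[slot + 1] == 't')
        return (int)base;
    return (int)(base ^ 1u);
}
```

For "ite" with lo (encoding 3), slot 0 must use lo and slot 1 must use hs (encoding 2), which matches the example above.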

Branches

Just like other instructions, Thumb-2's branches can be conditionally executed using it. Indeed, some branches cannot be conditionally executed without using an it block. However, any branches that exist in an it block must be the last instruction in the block. The following, for example, is unpredictable:

ite eq
blxeq some_label @ UNPREDICTABLE during an IT block.
movne r0, #0

The correct way to implement the above would be to put the mov before the blx, as follows:

ite ne
movne r0, #0
blxeq some_label @ Ok at the end of an IT block.

Compatibility with ARM Assembly

The it instruction is valid in ARM assembly, though it will not generate any code. This is done for compatibility with Thumb-2 assembly, and allows most assembly sequences to be assembled for both ARM and Thumb-2.

Simple Conditional Branches

Just like ARM code, a simple Thumb b instruction can be made conditional by adding a suitable condition code suffix. Indeed, the if/else example provided in my last post will assemble for Thumb just as it will for ARM.

Interesting Optimization Possibilities

Condition Code `al`

16-bit forms of Thumb arithmetic instructions usually set the condition flags. When inside an it block, however, the 16-bit forms do not set the flags. This property can be useful in combination with condition code al. Consider the following code sequence:

@ Instruction Size
add r0, r0, #1 @ 4 bytes
add r1, r1, #1 @ 4 bytes
add r2, r2, #1 @ 4 bytes
add r3, r3, #1 @ 4 bytes
@ Total: 16 bytes

Writing an equivalent code sequence using an it block can result in smaller code size:

@ Instruction Size
itttt al @ 2 bytes
addal r0, r0, #1 @ 2 bytes
addal r1, r1, #1 @ 2 bytes
addal r2, r2, #1 @ 2 bytes
addal r3, r3, #1 @ 2 bytes
@ Total: 10 bytes

It should be noted that the 16-bit forms have additional limitations, so the it trick used above may not always be applicable. The restrictions vary between instructions, but the 16-bit instruction forms can typically only access r0-r7 and have a very restricted range of immediate constants. For details, refer to the Architecture Reference Manual.

Flag Setting

Because (outside of it blocks) most arithmetic instructions that set the flags have 16-bit forms, code size can be dramatically improved by setting the flags even when not necessary. This will provide the best (smallest) code size possible. However, depending on your target processor, this technique may have a small negative performance impact. It is perhaps advisable to use the al condition trick or 32-bit instructions in performance-critical code.

You can force the assembler to produce 16-bit instructions by adding a .n suffix. Assemblers will choose the 16-bit form anyway where one exists, but if your instruction cannot be encoded using a 16-bit form and you specify .n, the assembler will give an error message.

[...] @ Not in an IT block.
adds.n r1, r2, r3 @ Generates a 16-bit instruction.
add.n r1, r2, r3 @ Error: No 16-bit form for this.

Refer to the Architecture Reference Manual for details of each instruction, and information about the constraints of the 16-bit forms. There are many exceptions and special cases so I won't describe them here in detail.

Floating-point comparisons in the ARM architecture use the same mechanism as integer comparisons. However, there are some unavoidable caveats because the range of supported relationships is different for floating-point values. There are two problems to consider here: setting the flags from a VFP comparison, and interpreting the flags with condition codes.

This post is applicable to all processors with VFP. The mechanisms I will describe do not differ between VFP variants. Similarly, the mechanisms are equally available in ARM and Thumb-2 modes. I described conditional execution in Thumb-2 in my last article.

Setting the Flags with VFP

As I described at the start of this series, the integer cmp instruction performs an integer comparison and updates the APSR (Application Processor Status Register) with information about the result of the comparison. The APSR holds the condition flags used by the processor for conditional execution. When VFP is used to perform a floating-point comparison, the vcmp instruction is used to update the FPSCR (Floating-Point System Control Register). This isn't usually useful by itself, however, as the processor cannot directly use the FPSCR for conditional execution. The vmrs instruction must be used to transfer the flags to the APSR¹.

.syntax unified @ Remember this!
[...]
vcmp d0, d1
vmrs APSR_nzcv, FPSCR @ Get the flags into APSR.
[...] @ Do something with the condition flags.

Note that some versions of the GNU assembler do not accept all of the new instruction variants (with the "v" prefix). In this case, use fcmp in place of vcmp, and fmstat (with no arguments) in place of vmrs.

Flag Meanings

The integer comparison flags support comparisons which are not applicable to floating-point numbers. For example, floating-point values are always signed, so there is no need for unsigned comparisons. On the other hand, floating-point comparisons can result in the unordered result (meaning that one or both operands was NaN, or "not a number"). IEEE-754 defines four testable relationships between two floating-point values, and they map onto the ARM condition flags as follows:

IEEE-754 Relationship	ARM APSR Flags
IEEE-754 Relationship	N	Z	C	V
Equal	0	1	1	0
Less Than	1	0	0	0
Greater Than	0	0	1	0
Unordered (At least one argument was `NaN`.)	0	0	1	1
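
The table can be expressed as a small C function, which may be handy for experimenting on a machine without VFP. This is a sketch of the mapping only. The name vcmp_flags and the packed N:Z:C:V layout are assumptions for the example, and real hardware writes these flags to the FPSCR rather than returning them.

```c
#include <math.h>

/* Illustrative packing of the four flags as N:Z:C:V in the low four bits. */
enum { FLAG_N = 8, FLAG_Z = 4, FLAG_C = 2, FLAG_V = 1 };

/* Model the flags that "vcmp a, b" would produce, following the table. */
unsigned vcmp_flags(double a, double b)
{
    if (isnan(a) || isnan(b))
        return FLAG_C | FLAG_V;    /* unordered:    N=0 Z=0 C=1 V=1 */
    if (a == b)
        return FLAG_Z | FLAG_C;    /* equal:        N=0 Z=1 C=1 V=0 */
    if (a < b)
        return FLAG_N;             /* less than:    N=1 Z=0 C=0 V=0 */
    return FLAG_C;                 /* greater than: N=0 Z=0 C=1 V=0 */
}
```

Note that V is set only in the unordered case, which is what makes the vs and vc condition codes useful as NaN tests.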

Compare with Zero

Unlike the integer instructions, most VFP (and NEON) instructions can operate only on registers, and cannot accept immediate values encoded in the instruction stream. The vcmp instruction is a notable exception in that it has a special-case variant that allows quick and easy comparison with zero.

Interpreting the Flags

Once the flags are in the APSR, they may be used almost as if an integer comparison had set the flags. However, floating-point comparisons support different relationships, so the integer condition codes do not always make sense. The following table is equivalent to the condition code table from the first post in this series, but it describes floating-point comparisons as well as integer comparisons:

Code	Meaning (when set by `vcmp`)	Meaning (when set by `cmp`)	Flags Tested
`eq`	Equal to.	Equal to.	`Z==1`
`ne`	Unordered, or not equal to.	Not equal to.	`Z==0`
`cs` or `hs`	Greater than, equal to, or unordered.	Greater than or equal to (unsigned).	`C==1`
`cc` or `lo`	Less than.	Less than (unsigned).	`C==0`
`mi`	Less than.	Negative.	`N==1`
`pl`	Greater than, equal to, or unordered.	Positive or zero.	`N==0`
`vs`	Unordered. (At least one argument was `NaN`.)	Signed overflow.	`V==1`
`vc`	Not unordered. (No argument was `NaN`.)	No signed overflow.	`V==0`
`hi`	Greater than or unordered.	Greater than (unsigned).	`(C==1) && (Z==0)`
`ls`	Less than or equal to.	Less than or equal to (unsigned).	`(C==0) || (Z==1)`
`ge`	Greater than or equal to.	Greater than or equal to (signed).	`N==V`
`lt`	Less than or unordered.	Less than (signed).	`N!=V`
`gt`	Greater than.	Greater than (signed).	`(Z==0) && (N==V)`
`le`	Less than, equal to or unordered.	Less than or equal to (signed).	`(Z==1) || (N!=V)`
`al` (or omitted)	Always executed.	Always executed.	None tested.

It should be obvious that the condition code is attached to the instruction reading the flags, and the source of the flags makes no difference to the flags that are tested. It is the meaning of the flags that differs when you perform a vcmp rather than a cmp. Similarly, it is clear that the opposite conditions still hold. (For example, hs is still the opposite of lo.)

The flags when set by cmp generally have analogous meanings when set by vcmp. For example, gt still means "greater than". However, the unordered condition and the removal of the unsigned conditions can confuse matters. Often, for example, it is desirable to use lo (normally an unsigned "less than" check) in place of lt, because it does not match in the unordered case.
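
The difference between lo and lt after a vcmp is easy to check by hand, or with a couple of one-line C predicates. As before, this is a sketch: the packed N:Z:C:V layout and the names cond_lt and cond_lo are assumptions for the example.

```c
#include <stdbool.h>

/* Illustrative packing of the four flags as N:Z:C:V in the low four bits. */
enum { FLAG_N = 8, FLAG_Z = 4, FLAG_C = 2, FLAG_V = 1 };

/* lt tests N != V. */
bool cond_lt(unsigned flags)
{
    return ((flags & FLAG_N) != 0) != ((flags & FLAG_V) != 0);
}

/* lo tests C == 0. */
bool cond_lo(unsigned flags)
{
    return (flags & FLAG_C) == 0;
}
```

With the "less than" flags (only N set) both predicates are true, and with the "greater than" flags (only C set) both are false. With the unordered flags (C and V set), cond_lt is true but cond_lo is false, so only lo behaves as a strict "less than".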

Performance Considerations

Be aware that vmrs effectively implements a data transfer between VFP and the integer core, and this operation can be relatively expensive on some cores. In addition, there is clearly a data dependency between vcmp and vmrs and another between vmrs and the conditional instruction. It is advisable to structure your code such that the flags are set and transferred many instructions before they are actually read. This is also true of integer comparisons, though the effect is likely to be more significant when using VFP.

Some instruction timing information and latency information is available for the Cortex-A8 and Cortex-A9 processors.

Examples

VFP Version of `ccdemo`

In my first post in this series, I provided an example program ("ccdemo") to show how the flags and condition codes interact. A VFP version (using vcmp) is attached to this article.

Complex Number Addition with Special NaN Handler

@ Add complex numbers (or two-element vectors) in s3:s2 and s5:s4, storing
@ the result in s1:s0. If either element of the result is NaN, jump to a
@ special handler.
vadd s0, s2, s4
vadd s1, s3, s5
vcmp s0, s1
vmrs APSR_nzcv, FPSCR
bvs nan_handler

Loop Condition

@ This implements a loop that calculates d0=d0-(1/d0) until d0 is negative.
vmov d0, #10.0 @ Some starting value.
vmov d2, #1.0 @ We need the constant 1.0 in the loop.
1: [...] @ Do something interesting with d0.
vdiv d1, d2, d0 @ d1=(1/d0)
vsub d0, d0, d1 @ d0=d0-(1/d0)
vcmp d0, #0 @ Special case of vcmp for compare-with-zero.
vmrs APSR_nzcv, FPSCR
bge 1b

Implementation of `fmax`

@ A typical implementation of "fmax".
@ Put into d0 the greatest of d1 and d2.
@ - If one argument is NaN, the result is the other argument.
@ - If both arguments are NaN, the result is NaN.
@ I have used "it" blocks here so the sequence can be assembled as either
@ ARM or Thumb-2 code.
vcmp d1, d2
vmrs APSR_nzcv, FPSCR
it vs @ Code "vs" means "unordered".
bvs 1f @ Jump to the NaN handler.
@ Normal-case (not-NaN) handler.
ite ge
vmovge d0, d1 @ Select d1 if it is the greatest (or equal).
vmovlt d0, d2 @ Select d2 if it is the greatest.
b 2f @ Jump over the NaN handler.
1:
@ NaN handler. We know that at least one argument was NaN.
vcmp d1, #0
vmrs APSR_nzcv, FPSCR
ite vc @ Code "vc" means "not unordered".
vmovvc d0, d1 @ d1 wasn't NaN, so make it the result.
vmovvs d0, d2 @ d1 was NaN, so choose d2. (This might be NaN too.)
2:
@ Done. The result is in d0.
[...]
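
For reference, the behaviour that the assembly sequence implements can be written in C as below. This is a model of the documented behaviour (the comments at the top of the listing), not a translation that a compiler would necessarily produce, and fmax_model is a hypothetical name chosen to avoid clashing with the C library's fmax.

```c
#include <math.h>

/* Return the greater of d1 and d2. If exactly one argument is NaN, return
 * the other one; if both are NaN, return NaN. */
double fmax_model(double d1, double d2)
{
    if (isnan(d1) || isnan(d2)) {      /* vcmp d1, d2 was unordered (vs) */
        /* NaN handler: "vcmp d1, #0" is unordered only if d1 is NaN. */
        return isnan(d1) ? d2 : d1;    /* vmovvs / vmovvc */
    }
    return (d1 >= d2) ? d1 : d2;       /* vmovge / vmovlt */
}
```

The two-stage structure mirrors the assembly: the common, NaN-free case needs only one comparison, and the second comparison is paid for only when a NaN is actually present.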

¹ The vmrs instruction can also transfer the flags (along with the rest of the FPSCR) to an arbitrary general-purpose integer register, but this is usually only useful for accessing fields in the FPSCR other than the condition flags.

http://community.arm.com/groups/processors/blog/2010/07/16/condition-codes-1-condition-flags-and-codes
