VFP: Condition Codes 4: Floating-Point Comparisons Using VFP

sno_guo

Reposted from: http://blogs.arm.com/software-enablement/405-condition-codes-4-floating-point-comparisons-using-vfp/

Posted by Jacob Bramley,

23 February 2011

Floating-point comparisons in the ARM architecture use the same mechanism as integer comparisons. However, there are some unavoidable caveats because the range of supported relationships is different for floating-point values. There are two problems to consider here: setting the flags from a VFP comparison, and interpreting the flags with condition codes.

This post is applicable to all processors with VFP. The mechanisms I will describe do not differ between VFP variants. Similarly, the mechanisms are equally available in ARM and Thumb-2 modes. I described conditional execution in Thumb-2 in my last article.

Setting the Flags with VFP

As I described at the start of this series, the integer cmp instruction performs an integer comparison and updates the APSR (Application Processor Status Register) with information about the result of the comparison. The APSR holds the condition flags used by the processor for conditional execution. When VFP is used to perform a floating-point comparison, the vcmp instruction updates the FPSCR (Floating-Point System Control Register) instead. This isn't usually useful by itself, however, as the processor cannot directly use the FPSCR for conditional execution. The vmrs instruction must be used to transfer the flags to the APSR.¹

    .syntax unified             @ Remember this!
    [...]
    vcmp    d0, d1
    vmrs    APSR_nzcv, FPSCR    @ Get the flags into APSR.
    [...]                       @ Do something with the condition flags.

Note that some versions of the GNU assembler do not accept all of the new instruction variants (with the "v" prefix). In this case, use fcmp in place of vcmp, and fmstat (with no arguments) in place of vmrs.

Flag Meanings

The integer comparison flags support comparisons which are not applicable to floating-point numbers. For example, floating-point values are always signed, so there is no need for unsigned comparisons. On the other hand, floating-point comparisons can produce the unordered result (meaning that one or both operands were NaN, or "not a number"). IEEE-754 defines four testable relationships between two floating-point values, and they map onto the ARM condition flags as follows:

IEEE-754 Relationship	APSR N	APSR Z	APSR C	APSR V
Equal	0	1	1	0
Less Than	1	0	0	0
Greater Than	0	0	1	0
Unordered (At least one argument was `NaN`.)	0	0	1	1
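
To experiment with this table away from ARM hardware, the mapping can be modelled in portable C. This is only a sketch: the function name vcmp_nzcv and the packing of the flags into a 4-bit value (N in bit 3 down to V in bit 0) are assumptions made for illustration, not something defined by the architecture.

```c
#include <math.h>

/* Illustrative packing only: N=bit 3, Z=bit 2, C=bit 1, V=bit 0. */
enum { FLAG_N = 8, FLAG_Z = 4, FLAG_C = 2, FLAG_V = 1 };

/* Hypothetical helper: the NZCV pattern that the table above gives
 * for a floating-point comparison of a against b. */
unsigned vcmp_nzcv(double a, double b)
{
    if (isunordered(a, b)) {
        return FLAG_C | FLAG_V;   /* 0011: at least one NaN */
    }
    if (a == b) {
        return FLAG_Z | FLAG_C;   /* 0110: equal */
    }
    if (a < b) {
        return FLAG_N;            /* 1000: less than */
    }
    return FLAG_C;                /* 0010: greater than */
}
```

Exactly one of the four branches is taken for any pair of inputs, which mirrors the fact that the four IEEE-754 relationships are mutually exclusive.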

Compare with Zero

Unlike the integer instructions, most VFP (and NEON) instructions can operate only on registers, and cannot accept immediate values encoded in the instruction stream. The vcmp instruction is a notable exception in that it has a special-case variant that allows quick and easy comparison with zero.

Interpreting the Flags

Once the flags are in the APSR, they may be used almost as if an integer comparison had set the flags. However, floating-point comparisons support different relationships, so the integer condition codes do not always make sense. The following table is equivalent to the condition code table from the first post in this series, but describes floating-point comparisons rather than integer comparisons:

Code	Meaning (when set by `vcmp`)	Meaning (when set by `cmp`)	Flags Tested
`eq`	Equal to.	Equal to.	`Z==1`
`ne`	Unordered, or not equal to.	Not equal to.	`Z==0`
`cs` or `hs`	Greater than, equal to, or unordered.	Greater than or equal to (unsigned).	`C==1`
`cc` or `lo`	Less than.	Less than (unsigned).	`C==0`
`mi`	Less than.	Negative.	`N==1`
`pl`	Greater than, equal to, or unordered.	Positive or zero.	`N==0`
`vs`	Unordered. (At least one argument was `NaN`.)	Signed overflow.	`V==1`
`vc`	Not unordered. (No argument was `NaN`.)	No signed overflow.	`V==0`
`hi`	Greater than or unordered.	Greater than (unsigned).	`(C==1) && (Z==0)`
`ls`	Less than or equal to.	Less than or equal to (unsigned).	`(C==0) || (Z==1)`
`ge`	Greater than or equal to.	Greater than or equal to (signed).	`N==V`
`lt`	Less than or unordered.	Less than (signed).	`N!=V`
`gt`	Greater than.	Greater than (signed).	`(Z==0) && (N==V)`
`le`	Less than, equal to, or unordered.	Less than or equal to (signed).	`(Z==1) || (N!=V)`
`al` (or omitted)	Always executed.	Always executed.	None tested.

It should be obvious that the condition code is attached to the instruction reading the flags, and the source of the flags makes no difference to the flags that are tested. It is the meaning of the flags that differs when you perform a vcmp rather than a cmp. Similarly, it is clear that the opposite conditions still hold. (For example, hs is still the opposite of lo.)

The flags when set by cmp generally have analogous meanings when set by vcmp. For example, gt still means "greater than". However, the unordered result and the loss of the signed/unsigned distinction can confuse matters. Often, for example, it is desirable to use lo (normally an unsigned "less than" check) in place of lt, because lo does not match in the unordered case.
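
The difference between lt and lo can be checked with a small C model of the two tables above. The Flags struct and the vcmp_flags and cond_* helpers are hypothetical names invented for this sketch. Only the flag values and the "Flags Tested" expressions come from the tables.

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical container for the four APSR condition flags. */
typedef struct {
    bool n, z, c, v;
} Flags;

/* Flags as the flag table gives them for a floating-point comparison. */
Flags vcmp_flags(double a, double b)
{
    Flags f = { false, false, false, false };
    if (isunordered(a, b)) {
        f.c = true;
        f.v = true;
    } else if (a == b) {
        f.z = true;
        f.c = true;
    } else if (a < b) {
        f.n = true;
    } else {
        f.c = true;   /* greater than */
    }
    return f;
}

/* Condition codes, written exactly as in the "Flags Tested" column. */
bool cond_lt(Flags f) { return f.n != f.v; }            /* N!=V */
bool cond_lo(Flags f) { return !f.c; }                  /* C==0 */
bool cond_ge(Flags f) { return f.n == f.v; }            /* N==V */
bool cond_hs(Flags f) { return f.c; }                   /* C==1 */
bool cond_gt(Flags f) { return !f.z && f.n == f.v; }    /* (Z==0) && (N==V) */
bool cond_hi(Flags f) { return f.c && !f.z; }           /* (C==1) && (Z==0) */
```

With one NaN operand, cond_lt reports true while cond_lo reports false. For ordered operands the two agree, which is why lo is the safer choice when the unordered case must not be treated as "less than".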

Performance Considerations

Be aware that vmrs effectively implements a data transfer between VFP and the integer core, and this operation can be relatively expensive on some cores. Some instruction timing information and latency information is available for the Cortex-A8 and Cortex-A9 processors. In addition, there is clearly a data dependency between vcmp and vmrs and another between vmrs and the conditional instruction. It is often possible to optimize your code by setting and transferring the flags many instructions before they are actually read. This is also true of integer comparisons, though the effect is likely to be more significant when using VFP.

Examples

VFP Version of `ccdemo`

In my first post in this series, I provided an example program ("ccdemo") to show how the flags and condition codes interact. A VFP version (using vcmp) is attached to this article.

Complex Number Addition with Special NaN Handler

    @ Add complex numbers (or two-element vectors) in s3:s2 and s5:s4, storing
    @ the result in s1:s0. If either element of the result is NaN, jump to a
    @ special handler.
    vadd    s0, s2, s4
    vadd    s1, s3, s5
    vcmp    s0, s1
    vmrs    APSR_nzcv, FPSCR
    bvs     nan_handler
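
The sequence above relies on one comparison of the two result elements against each other: that comparison is unordered if either element is NaN, so a single bvs covers both. A rough C equivalent, assuming a hypothetical Complex struct and add_complex function that are not part of the original example, might look like this:

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical type standing in for the register pairs s1:s0, s3:s2, s5:s4. */
typedef struct {
    float re;
    float im;
} Complex;

/* Adds a and b into *out. Returns true when the assembly version would
 * branch to nan_handler, i.e. when either result element is NaN. */
bool add_complex(Complex a, Complex b, Complex *out)
{
    out->re = a.re + b.re;                  /* vadd s0, s2, s4 */
    out->im = a.im + b.im;                  /* vadd s1, s3, s5 */
    return isunordered(out->re, out->im);   /* vcmp s0, s1 ; bvs */
}
```

Note that a NaN can appear even when no input is NaN, for example when adding positive and negative infinity.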

Loop Condition

    @ This implements a loop that calculates d0=d0-(1/d0) until d0 is negative.
    vmov    d0, #10.0   @ Some starting value.
    vmov    d2, #1.0    @ We need the constant 1.0 in the loop.

1:  [...]               @ Do something interesting with d0.

    vdiv    d1, d2, d0  @ d1=(1/d0)
    vsub    d0, d0, d1  @ d0=d0-(1/d0)
    vcmp    d0, #0      @ Special case of vcmp for compare-with-zero.
    vmrs    APSR_nzcv, FPSCR

    bge     1b
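
Because ge does not match the unordered case, the loop above also exits if d0 ever becomes NaN. The following C sketch behaves the same way, since the C >= operator is also false when an operand is NaN. The function name run_loop and its iteration counter are assumptions added for illustration.

```c
#include <math.h>

/* Hypothetical C counterpart of the loop: repeats d0 = d0 - (1/d0) while
 * d0 >= 0. Stores the last value in *final and returns the number of
 * iterations performed. */
int run_loop(double start, double *final)
{
    double d0 = start;
    int iterations = 0;

    do {
        d0 = d0 - 1.0 / d0;   /* vdiv + vsub */
        iterations++;
    } while (d0 >= 0.0);      /* bge: false for "less than" and for NaN */

    *final = d0;
    return iterations;
}
```

Had the loop been written with pl or hs instead of ge, a NaN in d0 would keep it spinning forever, because those codes do match the unordered case.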

Implementation of `fmax`

    @ A typical implementation of "fmax".
    @ Put into d0 the greatest of d1 and d2.
    @  • If one argument is NaN, the result is the other argument.
    @  • If both arguments are NaN, the result is NaN.
    @ I have used "it" blocks here so the sequence can be assembled as either
    @ ARM or Thumb-2 code.
    vcmp    d1, d2
    vmrs    APSR_nzcv, FPSCR
    it      vs      @ Code "vs" means "unordered".
    bvs     1f      @ Jump to the NaN handler.

    @ Normal-case (not-NaN) handler.
    ite     ge
    vmovge  d0, d1  @ Select d1 if it is the greatest (or equal).
    vmovlt  d0, d2  @ Select d2 if it is the greatest.
    b       2f      @ Jump over the NaN handler.

    @ NaN handler. We know that at least one argument was NaN.
1:  vcmp    d1, #0
    vmrs    APSR_nzcv, FPSCR
    ite     vc      @ Code "vc" means "not unordered".
    vmovvc  d0, d1  @ d1 wasn't NaN, so make it the result.
    vmovvs  d0, d2  @ d1 was NaN, so choose d2. (This might be NaN too.)

2:  [...]
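
For comparison, the same decision structure can be written in C. This is a sketch of the logic in the listing above, not a replacement for the standard library's fmax. The name fmax_sketch is made up for this example.

```c
#include <math.h>

/* Hypothetical C rendering of the assembly sequence: returns the greater
 * of d1 and d2, preferring the non-NaN argument when exactly one is NaN. */
double fmax_sketch(double d1, double d2)
{
    if (isunordered(d1, d2)) {
        /* NaN handler. Comparing d1 with zero is unordered only when d1
         * itself is NaN, which is what "vcmp d1, #0" tests. */
        if (isunordered(d1, 0.0)) {
            return d2;   /* vmovvs: d1 was NaN (d2 might be NaN too) */
        }
        return d1;       /* vmovvc: d1 wasn't NaN */
    }
    /* Normal case: both arguments are ordered. */
    return (d1 >= d2) ? d1 : d2;   /* vmovge / vmovlt */
}
```

The second comparison against zero is only there to find out which argument was the NaN. The first comparison has already established that at least one of them was.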

¹ The vmrs instruction can also transfer the flags (along with the rest of the FPSCR) to an arbitrary general-purpose integer register, but this is usually only useful for accessing fields in the FPSCR other than the condition flags.

Related Blogs:

Condition Codes 1: Condition Flags and Codes

Condition Codes 2: Conditional Execution

Condition Codes 3: Conditional Execution in Thumb-2

Attached File(s)

vcmpdemo.tar.gz (4.06K)


2 Comments On This Entry


webshaker

14 March 2011 - 10:43 AM

In the ARM documentation, we can see that VFP FADDS (for example) can have conditional execution.

Quote:

31-28	27	26	25	24	23	22	21	20	19-16	15-12	11	10	9	8	7	6	5	4	3-0
cond	1	1	1	0	0	D	1	1	Vn	Vd	1	0	1	sz	N	0	M	0	Vm

How can we use the condition?

NEON syntax is now used instead of VFP syntax, so FADDS is now VADD.f32.
Same question: how can we use the condition in this syntax?

Jacob Bramley

14 March 2011 - 11:00 AM

You can make V-prefix instructions conditional by inserting the condition code after the mnemonic, but before the type specifiers. Please don't consider this to be NEON syntax because they aren't necessarily NEON instructions! Old-style instructions can be made conditional simply by adding the condition code. The following example emits two identical instructions (and an IT instruction if you're assembling for Thumb-2):

    itt             eq
    vaddeq.f32      s0, s0, s0
    faddseq         s0, s0, s0

Note that true NEON instructions — and by that I mean SIMD instructions rather than scalar instructions — cannot be conditional in ARM state because they don't have the "cond" field. In Thumb-2 state, you can still make them conditional using IT blocks. The following code will work only for Thumb-2:

    it              eq
    vaddeq.f32      d0, d0, d0