Q : What is the IEEE 754 standard?
A : IEEE Standard 754 floating point is the most common representation today for real numbers on computers, including Intel-based PCs, Macintoshes, and most Unix platforms.
Q : Is this also the format used by Microsoft VC++?
A : Yes. Microsoft Visual C++ is consistent with the IEEE numeric standards. There are three internal varieties of real numbers: real*4, real*8 and real*10. Visual C++ uses real*4 and real*8: real*4 is declared with the keyword float, and real*8 with the keyword double. In Windows 32-bit programming, the long double data type maps to double. There is, however, assembly language support for computations using the real*10 data type.
Q : What is the format specified by the standard?
A : IEEE floating-point numbers have three basic components: the sign, the exponent, and the mantissa. The sign bit is 0 for positive and 1 for negative numbers. The exponent's base is two. The exponent field contains 127 plus the true exponent for single precision, or 1023 plus the true exponent for double precision. For normalized numbers, the mantissa has an implicit leading 1 that is not stored, so its value is 1.f, where f is the field of fraction bits.
To learn more about the standard see:
Q : What are the ranges of the floating-point types in VC++?
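A : In each line below, the lower bound is the smallest positive normalized value and the upper bound is the largest finite value. Negative numbers cover the same magnitudes, and zero and denormalized values also exist below the lower bound.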
float (4 bytes) : 1.175494351E-38 to 3.402823466E+38, significant decimal digits: 6
double (8 bytes) : 2.2250738585072014E-308 to 1.7976931348623158E+308, significant decimal digits: 15
real*10 (10 bytes) : 3.37E-4932 to 1.18E+4932, significant decimal digits: 19
Q : I have a problem with the following code: it prints "Unexpected value" instead of "Expected value".
Code:
#include <iostream>
using namespace std;

int main()
{
    float a = 2.501f;
    a *= 1.5134f;
    if (a == 3.7850134)
        cout << "Expected value" << endl;
    else
        cout << "Unexpected value" << endl;
}
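A : Decimal floating-point values generally do not have an exact binary representation, so 2.501, 1.5134 and 3.7850134 are all rounded when they are stored. This is a consequence of how the CPU represents floating-point data. In addition, different compilers and CPU architectures store temporary results at different precisions, so the result depends on the details of your environment. This example also mixes types: a is a float, while 3.7850134 is a double literal, so a is converted to double and compared with a double-precision value that a float result will almost never match exactly. If you do a calculation and then compare the result against some expected value, it is highly unlikely that you will get exactly the value you intended.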
To summarize, never compare floating-point values for exact equality like this:
Code:
if (a == b) ...
Instead, check whether the difference is within some tolerance:
Code:
if (fabs(a - b) < error) ...
For example, with a fixed tolerance:
Code:
#include <iostream>
using namespace std;

#define EPSILON 0.0001   // Define your own tolerance
#define FLOAT_EQ(x,v) ((((v) - EPSILON) < (x)) && ((x) < ((v) + EPSILON)))

int main()
{
    float a = 2.501f;
    a *= 1.5134f;
    if (FLOAT_EQ(a, 3.7850))
        cout << "Expected value" << endl;
    else
        cout << "Unexpected value" << endl;
}
To avoid being misled, keep in mind that a float holds only about 6 significant decimal digits. That does not mean an absolute accuracy of 0.000001! When all the values involved are very small, you could just as well use an epsilon of 0.0000000000000001, provided the values being compared are equally small. Conversely, a float holding 12345.6789 is only reliably correct to its first 6 significant digits, so it is guaranteed to be accurate only to about 0.1. Testing it with an EPSILON of 0.0001 may therefore not establish equality at all.
It is a common misconception that the epsilon used when comparing floats is (or can be) an absolute value. It is not! Epsilon (as in the FLT_EPSILON and DBL_EPSILON definitions) is not the smallest representable value; it is the gap between 1.0 and the next larger representable value, that is, a relative precision. To apply it to a result, you have to scale epsilon to the same exponent as the values you are comparing.
Code:
// float.h
#define DBL_EPSILON 2.2204460492503131e-016 /* smallest such that 1.0+DBL_EPSILON != 1.0 */
#define FLT_EPSILON 1.192092896e-07F        /* smallest such that 1.0+FLT_EPSILON != 1.0 */
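For example, with EPSILON set to 0.0001, the following test fails. The exact product is 4082266.48367, but a float of that magnitude can only hold multiples of 0.25, so a can never come within 0.0001 of it:
Code: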
float a = 51234.1f;
a *= 79.6787f;
if (FLOAT_EQ(a, 4082266.48367)) ...
Q : Why are floating-point types inaccurate when integer types are not?
A : An integer is a string of bits that represent powers of two; the powers whose bits are set add up to the number's value. For instance, binary 1011 is 8 + 2 + 1, which is 11 in decimal.
A floating-point fraction, on the other hand, is a string of bits that represent negative powers of two (1/2, 1/4, 1/8, ...). For instance, binary 0.1011 is 1/2 + 1/8 + 1/16, which is 0.6875 in decimal. Some decimal values (like 0.5, 0.25, 0.75, 0.625, ...) can be represented exactly this way, but most cannot: 0.1 is the endlessly repeating binary fraction 0.000110011..., so it has to be rounded to fit in the available mantissa bits.
Q : The following program outputs "Expected value" in both Release and Debug builds. Why?
Code:
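#include <iostream>
using namespace std;

int main()
{
    float a = 2;
    a *= 1.5;
    if (a == 3)
        cout << "Expected value" << endl;
    else
        cout << "Unexpected value" << endl;
}
A : 2, 1.5 and 3 are all exactly representable in binary (10, 1.1 and 11), so the multiplication involves no rounding and a holds exactly 3 however the intermediate result is stored.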
Q : But the next program outputs "Expected value" in the Debug build and "Unexpected value" in the Release build, and I don't know why.
Code:
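#include <iostream>
using namespace std;

int main()
{
    float a = 0.1;
    a *= 10;
    if (a == 1.0)
        cout << "Expected value" << endl;
    else
        cout << "Unexpected value" << endl;
    return 0;
}
A : 0.1 has no exact binary representation, so a starts out as about 0.100000001490116, and the exact product a*10 is slightly larger than 1. In a typical 32-bit build that uses the x87 FPU, the Debug build writes a back to memory as a float after the multiplication, and rounding to float precision happens to give exactly 1.0, so the comparison succeeds. The optimized Release build may instead keep a in an 80-bit floating-point register, where the small error survives and a == 1.0 fails. This is the effect mentioned earlier: different builds store temporary results at different precisions.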
Q : Where can I learn more about comparing floating-point numbers?
A : See the article "Comparing floating point numbers" by Bruce Dawson.
Credits: This FAQ was written with the help of OReubens