Floating-Point Overflow and Underflow

(一) What happens?

Suppose the biggest possible float value on your system is about 3.4E38 and you do this: 

float toobig = 3.4E38 * 100.0f; 
printf("%e\n", toobig); 
What happens? This is an example of overflow, which occurs when a calculation leads to a number
too large to be expressed. The behavior for this case used to be undefined, but now C (on
implementations that follow IEEE 754 arithmetic, which covers nearly all current systems) specifies
that toobig gets assigned a special value that stands for infinity and that printf() displays either
inf or infinity (or some variation on that theme) for the value.

What about dividing very small numbers? Here the situation is more involved. Recall that a 
float number is stored as an exponent and as a value part, or mantissa. There will be a number 
that has the smallest possible exponent and also the smallest value that still uses all the bits 
available to represent the mantissa. This will be the smallest number that still is represented to 
the full precision available to a float value. Now divide it by 2. Normally, this reduces the 
exponent, but the exponent already is as small as it can get. So, instead, the computer moves 
the bits in the mantissa over, vacating the first position and losing the last binary digit. An 
analogy would be taking a base 10 value with four significant digits, such as 0.1234E-10, 
dividing by 10, and getting 0.0123E-10. You get an answer, but you've lost a digit in the 
process. This situation is called underflow, and C refers to floating-point values that have lost
the full precision of the type as subnormal. So dividing the smallest positive normal
floating-point value by 2 results in a subnormal value. If you divide by a large enough value, you
lose all the digits and are left with 0. Since C99, the C library provides fpclassify() and
isnormal() in math.h, which let you check whether your computations are producing subnormal values.

There's another special floating-point value that can show up: NaN, or not-a-number. For
example, you give the asin() function a value, and it returns the angle that has that value as
its sine. But the value of a sine can't exceed 1 in magnitude, so the function is undefined for
arguments outside the range -1 to 1. In such cases, the function returns the NaN value, which
printf() displays as nan, NaN, or something similar.

(二) Detection

https://stackoverflow.com/questions/15655070/how-to-detect-double-precision-floating-point-overflow-and-underflow 

To be perfectly portable, you have to check before the operation, e.g. (for addition):

if ( (a < 0.0) == (b < 0.0)
    && std::abs( b ) > std::numeric_limits<double>::max() - std::abs( a ) ) {
    //  Addition would overflow...
}

Similar logic can be used for the four basic operators.

If all of the machines you target support IEEE (which is probably the case if you don't have to consider mainframes), you can just do the operations, then use isfinite or isinf on the results.

For underflow, the first question is whether a gradual underflow counts as underflow or not. If not, then simply checking if the results are zero and a != -b would do the trick. If you want to detect gradual underflow (which is probably only present if you have IEEE), then you can use isnormal, which returns false if the result corresponds to gradual underflow; note that it also returns false for zero, infinity, and NaN, so rule those out first. (Unlike overflow, you test for underflow after the operation.)

(三) C++: std::numeric_limits

https://hk.saowen.com/a/d229a08eae12ccedc4ce2dd540eaec49920c861c11679d6bb88a4276318fdf3f 

The built-in <limits> header:

#include<iostream>
#include<limits>
using namespace std;
int main(void){
    
    cout<<"int "<<"    size in bytes: "<<sizeof(int);
    cout<<"    max: "<<(numeric_limits<int>::max)();
    cout<<"    min: "<<(numeric_limits<int>::min)()<<endl; 
    
    cout<<"double "<<"    size in bytes: "<<sizeof(double);
    cout<<"    max: "<<(numeric_limits<double>::max)();
    cout<<"    min: "<<(numeric_limits<double>::min)()<<endl; 
    
    return 0;
}

The output is

int     size in bytes: 4    max: 2147483647    min: -2147483648
double     size in bytes: 8    max: 1.79769e+308    min: 2.22507e-308

(四) C: limits.h and float.h

integer limits: limits.h

floating point limits: float.h

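Inside float.h

Macro       Value               Description
FLT_MAX     1E+37 or greater    Maximum finite representable floating-point number.
DBL_MAX     1E+37 or greater    Maximum finite representable floating-point number.
LDBL_MAX    1E+37 or greater    Maximum finite representable floating-point number.
FLT_MIN     1E-37 or smaller    Minimum normalized positive floating-point number.
DBL_MIN     1E-37 or smaller    Minimum normalized positive floating-point number.
LDBL_MIN    1E-37 or smaller    Minimum normalized positive floating-point number.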

 

 

 

 
