11809 - Floating-Point Numbers

gaoxiangnumber1

Article link: https://blog.csdn.net/gaoxiangnumber1/article/details/43878281

Floating-point numbers are represented differently in computers than integers. That is why a 32-bit floating-point number can represent values in the magnitude of 10³⁸ while a 32-bit integer can only represent values as high as 2³².

Although there are variations in the ways floating-point numbers are stored in computers, in this problem we will assume that floating-point numbers are stored in the following way:

Floating-point numbers have two parts: mantissa and exponent. M bits are allotted for the mantissa and E bits are allotted for the exponent. There is also one bit that denotes the sign of the number (if this bit is 0 the number is positive, and if it is 1 the number is negative) and another bit that denotes the sign of the exponent (if this bit is 0 the exponent is positive, otherwise negative). The values of the mantissa and the exponent together make the value of the floating-point number. If the value of the mantissa is m, then it maintains the constraint 1/2 ≤ m < 1. The leftmost digit of the mantissa must always be 1 to maintain the constraint 1/2 ≤ m < 1, so this bit is not stored. The bits in the mantissa therefore denote the digits to the right of the point of a binary number (excluding the digit just to the right of the point).

Consider, as in the problem's figure, a floating-point number where M=8 and E=6. The largest value this floating-point number can represent is (in binary) 0.111111111₂ × 2^(111111₂). The decimal equivalent of this number is 0.998046875 × 2⁶³ = 9205357638345293824. Given the maximum possible value represented by a certain floating-point type, you will have to find how many bits are allotted for the mantissa (M) and how many bits are allotted for the exponent (E) in that type.
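The formula above can be checked directly for small types. The following is a minimal sketch, not part of the original solution; `largestValue` is a hypothetical helper name, and it assumes E ≤ 10 so that the result still fits in a `double`.

```cpp
#include <cmath>

// Hypothetical helper: largest value of a type with M stored mantissa bits
// and E exponent bits, following the storage rules described above.
// Assumption: E <= 10, otherwise the result overflows a double.
double largestValue(int M, int E)
{
    // Mantissa 0.11...1 in binary: the implicit leading 1 plus M stored ones.
    double mantissa = 1.0 - std::ldexp(1.0, -(M + 1));
    // Exponent 11...1 in binary with E ones.
    int exponent = (1 << E) - 1;
    // mantissa * 2^exponent
    return std::ldexp(mantissa, exponent);
}
```

Calling `largestValue(8, 6)` reproduces the value worked out above, and `largestValue(5, 8)` agrees with the first sample input to the printed precision.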

Input

The input file contains around 300 lines of input. Each line contains a floating-point number F that denotes the maximum value that can be represented by a certain floating-point type. The floating-point number is expressed in decimal exponent format, so a number AeB actually denotes the value A × 10^B. A line containing 0e0 terminates input. The value of A will satisfy the constraint 0 < A < 10 and will have exactly 15 digits after the decimal point.

Output

For each line of input produce one line of output. This line contains the values of M and E. You can assume that each of the inputs (except the last one) has a possible and unique solution. You can also assume that the inputs will be such that the values of M and E follow the constraints 9 ≥ M ≥ 0 and 30 ≥ E ≥ 1. Also, there is no need to assume that (M+E+2) will be a multiple of 8.

Sample Input

5.699141892149156e76
9.205357638345294e18
0e0

Sample Output

5 8
8 6

#include <cmath>
#include <iostream>
#include <string>

using namespace std;

// num[M][E] = log10 of the largest value for M mantissa bits and E exponent bits
double num[15][40];

int main()
{
    for (int i = 0; i <= 9; i++)
    {
        // Largest mantissa: 0.11...1 in binary with i+1 ones
        double m = 0;
        for (int k = 1; k <= i + 1; k++)
        {
            m = m + pow(2, -k); // see the note on binary fractions below
        }
        for (int j = 1; j <= 30; j++)
        {
            // The two terms must stay separate. In theory
            // log10(m*pow(2,(pow(2,j)-1))) gives the same result, but the
            // intermediate value overflows and the result is wrong.
            num[i][j] = log10(m) + (pow(2, j) - 1) * log10(2);
        }
    }
    string s;
    while (cin >> s && s != "0e0")
    {
        // Split AeB at the 'e' into A and B
        size_t pos = s.find('e');
        double a = stod(s.substr(0, pos));
        int b = stoi(s.substr(pos + 1));
        double test = log10(a) + b; // log10 of the input value
        int M = 0, E = 1;
        double cha = fabs(test - num[0][1]); // smallest difference so far
        for (int i = 0; i <= 9; i++)
        {
            for (int j = 1; j <= 30; j++)
            {
                double cha2 = fabs(test - num[i][j]);
                if (cha2 < cha)
                {
                    M = i;
                    E = j;
                    cha = cha2; // do not forget to update the minimum
                }
            }
        }
        cout << M << " " << E << endl;
    }
    return 0;
}

Explanation:

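First, you need to know how binary fractions work: the k-th digit after the binary point has weight 2^(-k), so a fraction made of n ones, 0.11...1, equals 1 - 2^(-n).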
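The conversion of a binary fraction to its decimal value can be sketched as follows. This is an illustrative helper only; the name `binaryFraction` is an assumption and does not appear in the solution above.

```cpp
#include <string>

// Hypothetical helper: value of the binary fraction 0.<bits>,
// where bits is a string of '0' and '1' characters.
double binaryFraction(const std::string& bits)
{
    double value = 0.0;
    double weight = 0.5; // weight of the first digit after the point: 2^-1
    for (char c : bits)
    {
        if (c == '1')
        {
            value += weight;
        }
        weight /= 2.0; // each following digit is worth half as much
    }
    return value;
}
```

For example, `binaryFraction("111111111")` gives 0.998046875, the largest mantissa when M=8 (the implicit leading 1 plus 8 stored ones).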

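The idea is to precompute the value for every (M, E) pair and store it in a matrix C, where C[M,E] is the largest value representable with M mantissa bits and E exponent bits. Then, for an input AeB, find the entry of C equal to it; its row M and column E are the answer.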

Points to note:

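First, the values are too large: they can reach 2^(2^30-1), far beyond the range of a double. So store the base-10 logarithm instead, and simplify the intermediate steps as much as possible.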

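Second, rounding error in the computation and the inherent error of double cause precision problems. So, instead of testing for equality, take the entry of C closest to the input as the match, that is, the one with the smallest absolute difference, and do not forget to update the minimum while searching.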
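Both points can be illustrated with a short sketch. The helper names `log10MaxValue` and `log10Input` are hypothetical; the sketch relies on the identities log10(m × 2^e) = log10(m) + e × log10(2) and log10(A × 10^B) = log10(A) + B.

```cpp
#include <cmath>

// Hypothetical helper: log10 of the largest value for M mantissa bits and
// E exponent bits, computed without ever forming the huge value itself.
double log10MaxValue(int M, int E)
{
    double mantissa = 1.0 - std::pow(2.0, -(M + 1)); // 0.11...1 with M+1 ones
    double exponent = std::pow(2.0, E) - 1.0;        // 11...1 with E ones
    return std::log10(mantissa) + exponent * std::log10(2.0);
}

// Hypothetical helper: log10 of an input written as AeB.
double log10Input(double A, int B)
{
    return std::log10(A) + B;
}
```

With these, `log10MaxValue(9, 30)` stays finite, while evaluating `std::pow(2.0, std::pow(2.0, 30) - 1)` directly overflows to infinity. The second sample input satisfies `log10Input(9.205357638345294, 18)` ≈ `log10MaxValue(8, 6)` only up to a small rounding difference, which is why the solution searches for the closest entry rather than an exact match.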
