The common IEEE floating-point storage format (adapted from Wikipedia), with a note on rounding error

IEEE 754 single-precision binary floating-point format: binary32

Basic data format of the seismic data processing package SeisUnix:
No 3200 byte textual header and no extended textual headers.
No binary header.
The data must be formatted as IEEE.

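1. Conversion precision of the IEEE format:
A decimal number with at most 6 significant digits can be converted to single-precision IEEE format and back to decimal, and the result is guaranteed to equal the original value.
In the other direction, when a single-precision IEEE value is converted to decimal, the decimal value must keep 9 significant digits to guarantee that converting it back yields the same IEEE value.
2. Representation of the IEEE format and how to convert it to a decimal number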
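
The two round-trip guarantees above can be checked with a small sketch. It assumes Python's standard `struct` module as a stand-in for single-precision storage; the helper names `to_float32` and `float32_bits` and the sample values are illustrative, not taken from the original text.

```python
import struct

def to_float32(value: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack(">f", struct.pack(">f", value))[0]

def float32_bits(value: float) -> int:
    """Return the 32-bit pattern of value after rounding to binary32."""
    return struct.unpack(">I", struct.pack(">f", value))[0]

# Decimal -> binary32 -> decimal: 6 significant digits survive the trip.
text = "0.123456"
stored = to_float32(float(text))
recovered = f"{stored:.6g}"
print(recovered == text)

# binary32 -> decimal -> binary32: 9 significant digits restore the same bits.
original = to_float32(0.1)
nine_digits = f"{original:.9g}"
same_bits = float32_bits(float(nine_digits)) == float32_bits(original)
print(nine_digits, same_bits)
```
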

The IEEE 754 standard specifies a binary32 as having:

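  • Sign bit: 1 bit (0 means positive, 1 means negative; usually denoted s)
  • Exponent width: 8 bits (the exponent; its value is usually denoted e)
  • Significand precision: 24 bits (23 explicitly stored); the stored bits form a pure binary fraction, usually denoted x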



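The formula for converting the binary representation to decimal (for normal numbers, where 1 <= e <= 254) is:

Value = (-1)^s * (1 + x) * 2^(e - 127)

For the example bit pattern 0 01111100 01000000000000000000000 (the one shown in the original figure), the conversion proceeds as follows: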

In this example:

  • $\text{sign} = b_{31} = 0$,
  • $(-1)^{\text{sign}} = (-1)^{0} = +1 \in \{-1, +1\}$,
  • $e = b_{30}b_{29}\dots b_{23} = \sum_{i=0}^{7} b_{23+i}\,2^{+i} = 124 \in \{1, \ldots, (2^{8}-1)-1\} = \{1, \ldots, 254\}$,
  • $2^{(e-127)} = 2^{124-127} = 2^{-3} \in \{2^{-126}, \ldots, 2^{127}\}$,
  • $1.b_{22}b_{21}\dots b_{0} = 1 + \sum_{i=1}^{23} b_{23-i}\,2^{-i} = 1 + 1\cdot 2^{-2} = 1.25 \in \{1, 1+2^{-23}, \ldots, 2-2^{-23}\} \subset [1; 2-2^{-23}] \subset [1; 2)$.

thus:

  • $\text{value} = (+1) \times 1.25 \times 2^{-3} = +0.15625$.
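
The decoding steps above can be sketched in a few lines of Python. This is a minimal illustration that only handles normal numbers; the function name `decode_binary32` is hypothetical, and the hexadecimal pattern 0x3E200000 is simply the example's fields (s = 0, e = 124, only fraction bit b_21 set) packed into 32 bits.

```python
import struct

def decode_binary32(bits: int) -> float:
    """Decode a normal binary32 pattern as (-1)^s * (1 + x) * 2^(e - 127)."""
    s = (bits >> 31) & 0x1            # sign bit b31
    e = (bits >> 23) & 0xFF           # biased exponent b30..b23
    x = (bits & 0x7FFFFF) / 2**23     # stored fraction b22..b0 as a pure fraction
    if not 1 <= e <= 254:
        raise ValueError("only normal numbers (1 <= e <= 254) are handled here")
    return (-1) ** s * (1 + x) * 2.0 ** (e - 127)

bits = 0x3E200000
value = decode_binary32(bits)
# Cross-check against the interpretation done by the struct module.
reference = struct.unpack(">f", bits.to_bytes(4, "big"))[0]
print(value, reference)
```

Zero, subnormal numbers, infinities and NaN (e = 0 or e = 255) follow different rules and are deliberately rejected by this sketch.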

3. Converting from decimal representation to binary32 format

In general, refer to the IEEE 754 standard itself for the strict conversion (including the rounding behaviour) of a real number into its equivalent binary32 format.

Here we can show how to convert a base 10 real number into an IEEE 754 binary32 format using the following outline:

  • consider a real number with an integer and a fraction part such as 12.375
  • convert and normalize the integer part into binary
  • convert the fraction part using the following technique as shown here
  • add the two results and adjust them to produce a proper final conversion

Conversion of the fractional part: consider 0.375, the fractional part of 12.375. To convert it into a binary fraction, multiply the fraction by 2, take the integer part as the next binary digit, and repeat with the remaining fraction until the fraction becomes zero or the precision limit is reached (23 fraction digits for the IEEE 754 binary32 format).

0.375 x 2 = 0.750 = 0 + 0.750 => b_{-1} = 0; the integer part is the binary fraction digit. Multiply 0.750 by 2 to proceed

0.750 x 2 = 1.500 = 1 + 0.500 => b_{-2} = 1

0.500 x 2 = 1.000 = 1 + 0.000 => b_{-3} = 1; fraction = 0.000, terminate

We see that (0.375)_10 can be exactly represented in binary as (0.011)_2

Not all decimal fractions can be represented in a finite digit binary fraction. For example, decimal 0.1 cannot be represented in binary exactly. So it is only approximated.
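
The multiply-by-2 procedure can be written as a short loop. The sketch below follows the outline above; the function name `fraction_to_binary` and the `max_digits` parameter are illustrative choices, and the digit limit counts digits after the binary point, as in the text.

```python
def fraction_to_binary(fraction: float, max_digits: int = 23) -> str:
    """Return the binary digits after the point for 0 <= fraction < 1."""
    digits = []
    while fraction != 0 and len(digits) < max_digits:
        fraction *= 2
        digit = int(fraction)      # the integer part is the next binary digit
        digits.append(str(digit))
        fraction -= digit          # keep only the fractional remainder
    return "".join(digits)

print(fraction_to_binary(0.375))   # terminates after three digits
print(fraction_to_binary(0.1))     # stops at the digit limit, never reaching zero
```

For 0.375 the loop ends because the remainder reaches zero; for 0.1 it ends only because the digit limit is hit, which is why 0.1 is stored as an approximation.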

Therefore, (12.375)_10 = (12)_10 + (0.375)_10 = (1100)_2 + (0.011)_2 = (1100.011)_2

From which we deduce:

  • The exponent is 3 (and in the biased form it is therefore 130 = 1000 0010): the binary point of (1100.011) is shifted left by 3 places, and 3 + 127 = 130 = e
  • The fraction is 100011 (looking to the right of the binary point): after the shift the number is (1.100011), and subtracting the integer part 1 leaves (0.100011), which is x
  • So s = 0, e = 130 in decimal = (1000 0010) in binary, and x = (100011) in binary
  • (12.375)_10 = (0 1000 0010 1000 1100 0000 0000 0000 000)_2
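
The result can be cross-checked by letting Python's standard `struct` module do the packing and then splitting the 32 bits into the three fields. This is only a verification sketch; the helper name `binary32_fields` is made up for illustration.

```python
import struct

def binary32_fields(value: float):
    """Return (sign, exponent, fraction) bit strings of value stored as binary32."""
    bits = struct.unpack(">I", struct.pack(">f", value))[0]
    text = f"{bits:032b}"
    return text[0], text[1:9], text[9:]

sign, exponent, fraction = binary32_fields(12.375)
print(sign, exponent, fraction)
print(int(exponent, 2) - 127)   # unbiased exponent
```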

4. Effect of rounding error

Note: consider converting 68.123 into IEEE 754 binary32 format: Using the above procedure you expect to get 42883EF9 H  with the last 4 bits being 1001. However, due to the default rounding behaviour of IEEE 754 format, what you get is 42883EFA H , whose last 4 bits are 1010.
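
The difference between the two results can be reproduced with a short sketch. It assumes that "the above procedure" amounts to chopping off the bits beyond the 23rd fraction digit, and it relies on `struct.pack` rounding to nearest when storing a binary32 value. The function names are illustrative, and the truncating version only handles positive normal values.

```python
import math
import struct

def truncated_binary32_bits(value: float) -> int:
    """Build binary32 bits for a positive normal value by dropping extra fraction bits."""
    mantissa, exponent = math.frexp(value)   # value = mantissa * 2**exponent, 0.5 <= mantissa < 1
    significand = int(mantissa * 2**24)      # 24-bit significand, truncated toward zero
    biased_exponent = (exponent - 1) + 127   # frexp's exponent is one larger than for 1.x form
    return (biased_exponent << 23) | (significand & 0x7FFFFF)

def rounded_binary32_bits(value: float) -> int:
    """Bits produced by struct, which rounds to the nearest binary32 value."""
    return struct.unpack(">I", struct.pack(">f", value))[0]

print(f"{truncated_binary32_bits(68.123):08X}")
print(f"{rounded_binary32_bits(68.123):08X}")
```

The discarded bits of 68.123 amount to more than half a unit in the last place, so rounding to nearest moves the last stored digit up by one.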

Precision limits on integer values

  • Integers in $[-16777216, 16777216]$ can be exactly represented
  • Integers in $[-33554432, -16777217]$ or in $[16777217, 33554432]$ round to a multiple of 2
  • Integers in $[-2^{26}, -2^{25}-1]$ or in $[2^{25}+1, 2^{26}]$ round to a multiple of 4
  • ....
  • Integers in $[-2^{127}, -2^{126}-1]$ or in $[2^{126}+1, 2^{127}]$ round to a multiple of $2^{103}$
  • Integers in $[-2^{128}+2^{104}, -2^{127}-1]$ or in $[2^{127}+1, 2^{128}-2^{104}]$ round to a multiple of $2^{127-23}$
  • Integers larger than or equal to $2^{128}$ or smaller than or equal to $-2^{128}$ are rounded to "infinity".
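
The first few limits are easy to observe by storing integers as binary32 and reading them back. The sketch below uses Python's standard `struct` module; the helper name `to_float32` and the chosen sample integers are illustrative.

```python
import struct

def to_float32(value: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack(">f", struct.pack(">f", value))[0]

# 2**24 = 16777216 is the last point up to which every integer is exact.
samples = (16777215, 16777216, 16777217, 16777218, 33554433, 33554434)
results = {n: int(to_float32(float(n))) for n in samples}
for n, stored in results.items():
    print(n, "->", stored)
```

Above 2^24 only even integers are stored exactly, and above 2^25 only multiples of 4, so the odd samples land on a neighbouring representable value.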


