Common floating-point storage format: IEEE (adapted from Wikipedia; beware of rounding error)

Latest recommended article published on 2023-02-24 14:51:53

qlj061001



IEEE 754 single-precision binary floating-point format: binary32

Basic data format of the seismic data processing package SeisUnix:
No 3200 byte textual header and no extended textual headers.
No binary header.
The data must be formatted as IEEE.

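1. Conversion precision of the IEEE format

If a decimal number has at most 6 significant digits, converting it to single-precision IEEE format and then back to decimal (rounded to the same number of digits) is guaranteed to reproduce the original value.

In the other direction, when a single-precision IEEE value is converted to decimal, the decimal must carry at least 9 significant digits to guarantee that converting it back to IEEE format yields the same IEEE value.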
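The two round-trip guarantees above can be checked with a small sketch. It assumes Python's standard `struct` module as a stand-in for a binary32 store; the helper name `to_binary32` and the sample value `0.123456` are illustrative choices, not part of the original text.

```python
import struct

def to_binary32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# Decimal with 6 significant digits -> binary32 -> decimal with 6 digits.
decimal_text = "0.123456"
stored = to_binary32(float(decimal_text))
recovered_text = f"{stored:.6g}"
print(recovered_text)

# binary32 -> decimal with 9 significant digits -> binary32.
nine_digits = f"{stored:.9g}"
restored = to_binary32(float(nine_digits))
print(nine_digits, restored == stored)
```

Printing fewer than 9 digits in the second step may or may not recover the stored value, depending on the number; 9 digits is the count that works for every finite binary32 value.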

2. Representation of the IEEE format and conversion to decimal

The IEEE 754 standard specifies a binary32 as having:

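Sign bit: 1 bit. 0 means positive, 1 means negative; usually denoted s.
Exponent width: 8 bits. The biased exponent; its value is usually denoted e.
Significand precision: 24 bits (23 explicitly stored). The stored bits form a pure binary fraction, usually denoted x.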

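The formula for converting the binary representation of a normal number to decimal is:

$\text{value}=(-1)^{s}\times (1+x)\times 2^{e-127}$

The bit pattern in the original figure (sign bit 0, exponent bits 01111100, fraction bits 01 followed by 21 zeros) is converted as follows: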

In this example:

${\text{sign}}=b_{31}=0$ ,
$(-1)^{\text{sign}}=(-1)^{0}=+1\in \{-1,+1\}$ ,
$e=b_{30}b_{29}\dots b_{23}=\sum _{i=0}^{7}b_{23+i}2^{+i}=124\in \{1,\ldots ,(2^{8}-1)-1\}=\{1,\ldots ,254\}$ ,
$2^{(e-127)}=2^{124-127}=2^{-3}\in \{2^{-126},\ldots ,2^{127}\}$ ,
$1.b_{22}b_{21}...b_{0}=1+\sum _{i=1}^{23}b_{23-i}2^{-i}=1+1\cdot 2^{-2}=1.25\in \{1,1+2^{-23},\ldots ,2-2^{-23}\}\subset [1;2-2^{-23}]\subset [1;2)$ .

thus:

${\text{value}}=(+1)\times 1.25\times 2^{-3}=+0.15625$ .
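The same decoding can be sketched in Python. The function name `decode_binary32` is hypothetical, the hexadecimal word `0x3E200000` is the example bit pattern written out from the field values above, and the sketch deliberately handles only normal numbers (1 ≤ e ≤ 254), as the formula does.

```python
import struct

def decode_binary32(bits):
    """Apply value = (-1)^s * (1 + x) * 2^(e - 127) to a 32-bit pattern."""
    s = (bits >> 31) & 0x1            # sign bit b31
    e = (bits >> 23) & 0xFF           # biased exponent b30..b23
    x = (bits & 0x7FFFFF) / 2**23     # fraction b22..b0 as a pure binary fraction
    if not 1 <= e <= 254:
        raise ValueError("this sketch only handles normal numbers")
    return (-1) ** s * (1 + x) * 2.0 ** (e - 127)

bits = 0x3E200000                     # s = 0, e = 124, x = 0.25
value = decode_binary32(bits)
# Cross-check against the platform's own interpretation of the same bytes.
reference = struct.unpack(">f", bits.to_bytes(4, "big"))[0]
print(value, reference)               # 0.15625 0.15625
```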

3. Converting from decimal representation to binary32 format

In general, refer to the IEEE 754 standard itself for the strict conversion (including the rounding behaviour) of a real number into its equivalent binary32 format.

Here we can show how to convert a base 10 real number into an IEEE 754 binary32 format using the following outline:

Consider a real number with an integer part and a fraction part, such as 12.375.
Convert the integer part into binary.
Convert the fraction part using the repeated-multiplication technique described next.
Add the two results and normalize them to produce the final conversion.

Conversion of the fractional part: consider 0.375, the fractional part of 12.375. To convert it into a binary fraction, multiply the fraction by 2, take the integer part as the next binary digit, and multiply the remaining fraction by 2 again. Repeat until the fraction becomes zero or until the precision limit is reached, which is 23 fraction digits for the IEEE 754 binary32 format.

0.375 x 2 = 0.750 = 0 + 0.750 => b₋₁ = 0, the integer part represents the binary fraction digit. Re-multiply 0.750 by 2 to proceed

0.750 x 2 = 1.500 = 1 + 0.500 => b₋₂ = 1

0.500 x 2 = 1.000 = 1 + 0.000 => b₋₃ = 1, fraction = 0.000, terminate

We see that (0.375)₁₀ can be exactly represented in binary as (0.011)₂.

Not every decimal fraction has a finite binary expansion. For example, decimal 0.1 cannot be represented exactly in binary, so it is only approximated.
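The multiply-by-2 procedure can be sketched as follows. The function name `fraction_to_binary` and its return shape are illustrative assumptions; `fractions.Fraction` is used so that the decimal inputs are exact and the arithmetic does not itself suffer from rounding.

```python
from fractions import Fraction

def fraction_to_binary(frac, max_digits=23):
    """Return (binary digits, terminated) for a fraction 0 <= frac < 1."""
    digits = []
    while frac != 0 and len(digits) < max_digits:
        frac *= 2
        bit = int(frac)          # the integer part is the next binary digit
        digits.append(str(bit))
        frac -= bit              # keep only the fractional part
    return "".join(digits), frac == 0

print(fraction_to_binary(Fraction("0.375")))   # ('011', True)
print(fraction_to_binary(Fraction("0.1")))     # stops at 23 digits, not terminated
```

For 0.1 the loop runs into the 23-digit limit with a nonzero remainder, which is what "only approximated" means here.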

Therefore, (12.375)₁₀ = (12)₁₀ + (0.375)₁₀ = (1100)₂ + (0.011)₂ = (1100.011)₂

From which we deduce:

The exponent is 3 (and in the biased form it is therefore 130 = 1000 0010): the binary point of (1100.011)₂ moves left by 3 places, so e = 3 + 127 = 130.
The fraction is 100011 (looking to the right of the binary point): after the shift the number is (1.100011)₂, and subtracting the integer part 1 leaves (0.100011)₂, which is x.
Hence s = 0, e = (130)₁₀ = (1000 0010)₂, and x = (0.100011)₂.
(12.375)₁₀ = (0 1000 0010 1000 1100 0000 0000 0000 000)₂
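As a quick check of this result, the fields can be read back from the bytes that Python's standard `struct` module produces for 12.375. This is only a verification sketch; the variable names are illustrative.

```python
import struct

# Pack 12.375 as big-endian binary32 and view the four bytes as one integer.
bits = int.from_bytes(struct.pack(">f", 12.375), "big")

s = bits >> 31                 # sign bit
e = (bits >> 23) & 0xFF        # biased exponent
x_bits = bits & 0x7FFFFF       # 23 stored fraction bits

print(f"{s:01b} {e:08b} {x_bits:023b}")   # 0 10000010 10001100000000000000000
print(f"{bits:08X}")                      # 41460000
```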

4. Effect of rounding error

Note: consider converting 68.123 into IEEE 754 binary32 format. Using the above procedure, which simply stops after 23 fraction digits, you would expect to get 42883EF9₁₆, whose last 4 bits are 1001. However, because the default IEEE 754 rounding mode rounds to the nearest representable value, what you actually get is 42883EFA₁₆, whose last 4 bits are 1010.
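The difference between truncating and rounding can be reproduced with a short sketch. It assumes exact arithmetic via `fractions.Fraction` for the truncated version and relies on `struct.pack` for the rounded one; the variable names are illustrative.

```python
import struct
from fractions import Fraction

value = Fraction("68.123")
exponent = 6                                  # 2**6 <= 68.123 < 2**7

# Truncation: keep the first 23 fraction bits and drop the rest.
scaled = (value / 2**exponent - 1) * 2**23    # exact, about 540409.856
truncated_fraction = int(scaled)
truncated_bits = ((exponent + 127) << 23) | truncated_fraction

# Round to nearest: what a real binary32 conversion produces.
rounded_bits = int.from_bytes(struct.pack(">f", 68.123), "big")

print(f"{truncated_bits:08X}")   # 42883EF9
print(f"{rounded_bits:08X}")     # 42883EFA
```

The discarded part of the fraction is about 0.856 of the last stored bit, which is more than one half, so rounding to nearest bumps the final bit pattern up by one.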

Precision limits on integer values

Integers in $[-16777216,16777216]$ can be exactly represented
Integers in $[-33554432,-16777217]$ or in $[16777217,33554432]$ round to a multiple of 2
Integers in $[-2^{26},-2^{25}-1]$ or in $[2^{25}+1,2^{26}]$ round to a multiple of 4
....
Integers in $[-2^{127},-2^{126}-1]$ or in $[2^{126}+1,2^{127}]$ round to a multiple of $2^{103}$
Integers in $[-2^{128}+2^{104},-2^{127}-1]$ or in $[2^{127}+1,2^{128}-2^{104}]$ round to a multiple of $2^{127-23}$
Integers larger than or equal to $2^{128}$ or smaller than or equal to $-2^{128}$ are rounded to "infinity".
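The first few of these limits are easy to observe. The sketch below assumes the same `struct`-based helper as a stand-in for storing a value in binary32; the name `to_binary32` and the sample integers are illustrative.

```python
import struct

def to_binary32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# 2**24 = 16777216 is the last point up to which every integer is exact.
for n in (16777215, 16777216, 16777217, 16777218, 33554433, 33554437):
    print(n, "->", int(to_binary32(float(n))))
```

Here 16777217 comes back as 16777216 (a multiple of 2), while 33554433 and 33554437 come back as 33554432 and 33554436 (multiples of 4).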