The Conversion Procedure
The rules for converting a decimal number into floating point are as follows:
- Convert the absolute value of the number to binary, perhaps with a fractional part after the binary point. This can be done by converting the integral and fractional parts separately. The integral part is converted with the techniques examined previously. The fractional part can be converted by multiplication. This is basically the inverse of the division method: we repeatedly multiply by 2, and harvest each bit as it appears to the left of the binary point.
- Append $\times 2^0$ to the end of the binary number (which does not change its value).
- Normalize the number. Move the binary point so that it is one bit from the left. Adjust the exponent of two so that the value does not change.
- Place the mantissa into the mantissa field of the number. Omit the leading one, and fill with zeros on the right.
- Add the bias to the exponent of two, and place it in the exponent field. The bias is $2^{k-1} - 1$, where $k$ is the number of bits in the exponent field. For the eight-bit format, $k = 3$, so the bias is $2^{3-1} - 1 = 3$. For IEEE 32-bit, $k = 8$, so the bias is $2^{8-1} - 1 = 127$.
- Set the sign bit, 1 for negative, 0 for positive, according to the sign of the original number.
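The steps above can be sketched in Python. This is a minimal illustration, not a full implementation: the names `fraction_bits` and `encode` are hypothetical, the 4-bit mantissa width for the eight-bit format is assumed from the sign and exponent widths, extra mantissa bits are truncated rather than rounded, and zero, denormals, and infinities are not handled. For brevity it normalizes arithmetically (halving or doubling the value) instead of moving the binary point in a digit string, which gives the same exponent.

```python
from fractions import Fraction


def fraction_bits(frac, max_bits):
    """Convert a fraction in [0, 1) to binary digits by repeated multiplication by 2."""
    bits = []
    while frac != 0 and len(bits) < max_bits:
        frac *= 2
        bit = int(frac)          # the digit left of the binary point, 0 or 1
        bits.append(str(bit))
        frac -= bit              # continue with the rest
    return "".join(bits)


def encode(value, exp_bits=3, man_bits=4):
    """Return (sign, exponent field, mantissa field) as bit strings.

    Assumption: value is a nonzero decimal string such as "2.625".
    """
    x = Fraction(value)          # exact, so "1.7" really means 17/10
    if x == 0:
        raise ValueError("zero is not handled by this sketch")
    sign = "1" if x < 0 else "0"
    x = abs(x)

    # Normalize so that 1 <= x < 2, tracking the exponent of two.
    exponent = 0
    while x >= 2:
        x /= 2
        exponent += 1
    while x < 1:
        x *= 2
        exponent -= 1

    bias = 2 ** (exp_bits - 1) - 1
    biased = exponent + bias
    if not 0 <= biased < 2 ** exp_bits:
        raise ValueError("exponent does not fit in the exponent field")

    # Omit the leading one; fill with zeros on the right.
    mantissa = fraction_bits(x - 1, man_bits).ljust(man_bits, "0")
    exp_field = format(biased, "0{}b".format(exp_bits))
    return sign, exp_field, mantissa


print(encode("2.625"))               # eight-bit format
print(encode("-1313.3125", 8, 23))   # IEEE 32-bit field widths
```

Passing the number as a string keeps the decimal value exact, so a non-terminating case such as 1.7 is cut off by `max_bits` rather than by binary rounding that has already happened in a Python float.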
Using The Conversion Procedure
- Convert 2.625 to our 8-bit floating point format.
  - The integral part is easy: $2_{10} = 10_2$. For the fractional part:
    - 0.625 × 2 = 1.25: generate 1 and continue with the rest.
    - 0.25 × 2 = 0.5: generate 0 and continue.
    - 0.5 × 2 = 1.0: generate 1 and nothing remains.
    So $0.625_{10} = 0.101_2$, and $2.625_{10} = 10.101_2$.
  - Add an exponent part: $10.101_2 = 10.101_2 \times 2^0$.
  - Normalize: $10.101_2 \times 2^0 = 1.0101_2 \times 2^1$.
  - Mantissa: 0101
  - Exponent: $1 + 3 = 4 = 100_2$.
  - Sign bit is 0.
- Convert -4.75 to our 8-bit floating point format.
  - The integral part is $4_{10} = 100_2$. The fractional:
    - 0.75 × 2 = 1.5: generate 1 and continue with the rest.
    - 0.5 × 2 = 1.0: generate 1 and nothing remains.
    So $4.75_{10} = 100.11_2$.
  - Normalize: $100.11_2 = 1.0011_2 \times 2^2$.
  - Mantissa is 0011, exponent is $2 + 3 = 5 = 101_2$, sign bit is 1.
- Convert 0.40625 to our 8-bit floating point format.
  - Converting:
    - 0.40625 × 2 = 0.8125: generate 0 and continue.
    - 0.8125 × 2 = 1.625: generate 1 and continue with the rest.
    - 0.625 × 2 = 1.25: generate 1 and continue with the rest.
    - 0.25 × 2 = 0.5: generate 0 and continue.
    - 0.5 × 2 = 1.0: generate 1 and nothing remains.
    So $0.40625_{10} = 0.01101_2$.
  - Normalize: $0.01101_2 = 1.101_2 \times 2^{-2}$.
  - Mantissa is 1010, exponent is $-2 + 3 = 1 = 001_2$, sign bit is 0.
- Convert -12.0 to our 8-bit floating point format.
  - $12_{10} = 1100_2$.
  - Normalize: $1100.0_2 = 1.1_2 \times 2^3$.
  - Mantissa is 1000, exponent is $3 + 3 = 6 = 110_2$, sign bit is 1.
- Convert decimal 1.7 to our 8-bit floating point format.
  - The integral part is easy: $1_{10} = 1_2$. For the fractional part:
    - 0.7 × 2 = 1.4: generate 1 and continue with the rest.
    - 0.4 × 2 = 0.8: generate 0 and continue.
    - 0.8 × 2 = 1.6: generate 1 and continue with the rest.
    - 0.6 × 2 = 1.2: generate 1 and continue with the rest.
    - 0.2 × 2 = 0.4: generate 0 and continue.
    - 0.4 × 2 = 0.8: generate 0 and continue.
    - 0.8 × 2 = 1.6: generate 1 and continue with the rest.
    - 0.6 × 2 = 1.2: generate 1 and continue with the rest.
    - …
    The reason why the process seems to continue endlessly is that it does. The number 7/10, which makes a perfectly reasonable decimal fraction, is a repeating fraction in binary, just as the fraction 1/3 is a repeating fraction in decimal. (It repeats in binary as well.) We cannot represent this exactly as a floating point number. The closest we can come in four bits is .1011. Since we already have a leading 1, the best eight-bit number we can make is 1.1011.
  - Already normalized: $1.1011_2 = 1.1011_2 \times 2^0$.
  - Mantissa is 1011, exponent is $0 + 3 = 3 = 011_2$, sign bit is 0.
- Convert -1313.3125 to IEEE 32-bit floating point format.
  - The integral part is $1313_{10} = 10100100001_2$. The fractional:
    - 0.3125 × 2 = 0.625: generate 0 and continue.
    - 0.625 × 2 = 1.25: generate 1 and continue with the rest.
    - 0.25 × 2 = 0.5: generate 0 and continue.
    - 0.5 × 2 = 1.0: generate 1 and nothing remains.
    So $1313.3125_{10} = 10100100001.0101_2$.
  - Normalize: $10100100001.0101_2 = 1.01001000010101_2 \times 2^{10}$.
  - Mantissa is 01001000010101000000000, exponent is $10 + 127 = 137 = 10001001_2$, sign bit is 1.
- Convert 0.1015625 to IEEE 32-bit floating point format.
  - Converting:
    - 0.1015625 × 2 = 0.203125: generate 0 and continue.
    - 0.203125 × 2 = 0.40625: generate 0 and continue.
    - 0.40625 × 2 = 0.8125: generate 0 and continue.
    - 0.8125 × 2 = 1.625: generate 1 and continue with the rest.
    - 0.625 × 2 = 1.25: generate 1 and continue with the rest.
    - 0.25 × 2 = 0.5: generate 0 and continue.
    - 0.5 × 2 = 1.0: generate 1 and nothing remains.
    So $0.1015625_{10} = 0.0001101_2$.
  - Normalize: $0.0001101_2 = 1.101_2 \times 2^{-4}$.
  - Mantissa is 10100000000000000000000, exponent is $-4 + 127 = 123 = 01111011_2$, sign bit is 0.
- Convert 39887.5625 to IEEE 32-bit floating point format.
  - The integral part is $39887_{10} = 1001101111001111_2$. The fractional:
    - 0.5625 × 2 = 1.125: generate 1 and continue with the rest.
    - 0.125 × 2 = 0.25: generate 0 and continue.
    - 0.25 × 2 = 0.5: generate 0 and continue.
    - 0.5 × 2 = 1.0: generate 1 and nothing remains.
    So $39887.5625_{10} = 1001101111001111.1001_2$.
  - Normalize: $1001101111001111.1001_2 = 1.0011011110011111001_2 \times 2^{15}$.
  - Mantissa is 00110111100111110010000, exponent is $15 + 127 = 142 = 10001110_2$, sign bit is 0.
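The IEEE 32-bit examples can be checked against what a machine stores. The sketch below uses Python's standard `struct` module to pack each value as a single-precision float and split the 32 bits into sign, exponent, and mantissa fields. The helper name `float32_fields` is hypothetical. All three values are exactly representable in 32 bits, so no rounding is involved.

```python
import struct


def float32_fields(value):
    """Split the IEEE 754 single-precision encoding of value into three bit strings."""
    # Pack as a big-endian 32-bit float, then reread the same bytes as an unsigned int.
    (raw,) = struct.unpack(">I", struct.pack(">f", value))
    bits = format(raw, "032b")
    return bits[0], bits[1:9], bits[9:]   # sign, 8-bit exponent, 23-bit mantissa


for v in (-1313.3125, 0.1015625, 39887.5625):
    sign, exponent, mantissa = float32_fields(v)
    print(v, sign, exponent, mantissa)
```

Each printed line should agree with the sign bit, exponent, and mantissa worked out by hand in the corresponding example.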
Source: http://sandbox.mc.edu/~bennet/cs110/flt/dtof.html
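| Type | Sign (s) | Exponent (exp) | Fraction part of mantissa (frac) | Total bits | Bias (hexadecimal) | Bias (decimal) |
| --- | --- | --- | --- | --- | --- | --- |
| Short float (Single, float) | 1 bit | 8 bits | 23 bits | 32 bits | 7FH | +127 |
| Long float (Double) | 1 bit | 11 bits | 52 bits | 64 bits | 3FFH | +1023 |
| Temporary float (extended-precision float) | 1 bit | 15 bits | 64 bits | 80 bits | 3FFFH | +16383 |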
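The bias values in the table follow from the formula $2^{k-1} - 1$ given in the conversion rules, with $k$ taken from the exponent column. A short sketch, with the helper name `bias` chosen only for illustration:

```python
def bias(exp_bits):
    """Bias for an exponent field of exp_bits bits: 2^(k-1) - 1."""
    return 2 ** (exp_bits - 1) - 1


for name, exp_bits in (("single", 8), ("double", 11), ("extended", 15)):
    b = bias(exp_bits)
    print(f"{name}: {b} = {b:X}H")
```

The same function gives a bias of 3 for the three-bit exponent of the eight-bit format used in the earlier examples.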