45
Fractional value | Binary representation | Decimal representation |
1/8 | 0.001 | 0.125 |
3/4 | 1/2+1/4 = 0.11 | 0.75 |
25/16 | (16+8+1)/16 = (11001b)/16 = 1.1001 | 1.5625 |
(101011b)/2^4 = 43/16 | 10.1011 | 2.6875 |
(1001b)/2^3 = 9/8 | 1.001 | 1.125 |
(5*8+7)/8=47/8 | 101111b/8 = 101.111 | 5.875 |
(51/16) | 110011b/16 = 11.0011 | 3.1875 |
46
A: Binary representation of 0.1 - x
0.1 = 0.0001100110011001100110011[0011]
x = 0.0001100110011001100110000
0.1 - x = 0.0000000000000000000000011[0011]...
B: Approximate decimal value of 0.1 - x
x = 0.0001100110011001100110000
= 00001100110011001100110000 / 2^25
= 110011001100110011 / 2^21
= 209715/2097152
= 0.0999999046325684
0.1 - x = 0.0000000953674316 = 0.953674316 * 10^(-7)
C: 100h = 360000s => count = 3600000
count * x = (3600000*209715)/2097152 = 359999.6566772461s
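delta = 360000 - 359999.6566772461 = 0.34332275390625s
The error is 0.34332275390625 s.
D: Error per second = (0.1-x) * 10 * 2000m/s = 0.953674316 * 10^(-6)s * 2000m/s = 1.907348632 * 10^(-3)m ~= 1.91 mm. Over the 100 hours of part C the distance error is 0.34332275390625s * 2000m/s ~= 686.6 m.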
47
Bias = 2^(k-1) -1 = 2^1 - 1 = 1
Bits | e | E | 2^E | f | M | 2^E * M | V | Decimal |
0 00 00 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
0 00 01 | 0 | 0 | 1 | 1/4 | 1/4 | 1/4 | 1/4 | 0.25 |
0 00 10 | 0 | 0 | 1 | 1/2 | 1/2 | 2/4 | 1/2 | 0.5 |
0 00 11 | 0 | 0 | 1 | 3/4 | 3/4 | 3/4 | 3/4 | 0.75 |
0 01 00 | 1 | 0 | 1 | 0 | 1 | 4/4 | 1 | 1 |
0 01 01 | 1 | 0 | 1 | 1/4 | 5/4 | 5/4 | 5/4 | 1.25 |
0 01 10 | 1 | 0 | 1 | 1/2 | 3/2 | 6/4 | 3/2 | 1.5 |
0 01 11 | 1 | 0 | 1 | 3/4 | 7/4 | 7/4 | 7/4 | 1.75 |
0 10 00 | 2 | 1 | 2 | 0 | 1 | 8/4 | 2 | 2 |
0 10 01 | 2 | 1 | 2 | 1/4 | 5/4 | 10/4 | 5/2 | 2.5 |
0 10 10 | 2 | 1 | 2 | 1/2 | 3/2 | 12/4 | 3 | 3 |
0 10 11 | 2 | 1 | 2 | 3/4 | 7/4 | 14/4 | 7/2 | 3.5 |
0 11 00 | - | - | - | - | - | - | +infinity | - |
0 11 01 | - | - | - | - | - | - | NaN | - |
0 11 10 | - | - | - | - | - | - | NaN | - |
0 11 11 | - | - | - | - | - | - | NaN | - |
48
3510593 = 1101011001000101000001b
3510593.0 = 0x4a564504 = 1001010010101100100010100000100 = 0 10010100 10101100100010100000100
M = 1.10101100100010100000100
e = 10010100 = 148
E = 148 - 127 = 21
So the binary point moves right by 21 positions, V = 1101011001000101000001.00 = 3510593.00
49
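A The smallest such positive integer is 2^(n+1) + 1 (binary: a 1, then n zeros, then a 1, which needs n+1 fraction bits).
B For single precision (n = 23) that is 2^24 + 1 = 16777217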
50
Part | Binary | Value | Rounded (binary) | Rounded value |
A | 10.010 | 2.25 | 10.0 | 2 |
B | 10.011 | 2.375 | 10.1 | 2.5 |
C | 10.110 | 2.75 | 11.0 | 3 |
D | 10.001 | 2.125 | 10.0 | 2 |
51
x = 0.00011001100110011001100
0.1 = 0.00011001100110011001100 110011[0011]
x' = 0.00011001100110011001101
x' - 0.1 = 00011001100110011001101 / 2^23 - 0.1
= 838861/8388608 - 0.1
= (838861 - 838860.8)/8388608
= 0.2 / 8388608
= 1 / 41943040
= 2.384185791015625e-8
52
Bits (format A) | Value | Bits (format B) | Value |
011 0000 | 1 | 0111 000 | 1 |
101 1110 | 7.5 | 1001 111 | 7.5 |
010 1001 | 0.78125 | 0110 100 | 0.75 |
110 1111 | 15.5 | 1011 000 | 16 |
000 0001 | 0.015625 | 0001 000 | 0.015625 |
53
#define HUGE_NUM (1.0e300)
#define POS_INFINITY (HUGE_NUM*HUGE_NUM)
#define NEG_INFINITY (-1*POS_INFINITY)
#define NEG_ZERO (-1.0/POS_INFINITY)
54
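A true, because a double has a 52-bit fraction, so every 32-bit integer can be represented exactly.
B false, e.g. x = 0x7fffffff; this number has 31 one bits, so a float cannot represent it exactly.
C false, e.g. d = 1.11111111111111111111111111111 (binary); any value with more than 23 one bits after the binary point works.
D true, for the same reason: a double can represent every float value.
E true, the positive and negative ranges of floating-point numbers are symmetric, so negation cannot overflow.
F true, in both expressions the int operand is converted to double before the division.
G true, floating-point multiplication never wraps around to a negative value; d*d is nonnegative (it may overflow to +infinity).
H false, e.g. d = 18014398509481984 (2^54); f = 2. The construction is to pick a large d such that d+f cannot be represented exactly but d-f can.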