What’s the difference between a single precision and double precision floating point operation?
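- float: representable range of roughly $10^{-38} \sim 10^{38}$
- $\exp(-10^2) \approx 10^{-44}$, which underflows in float: it falls below the smallest normal value and survives only as a low-precision subnormal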
- double: representable range of roughly $10^{-308} \sim 10^{308}$
- $\exp(-10^3) \approx 10^{-434}$, which underflows even in double
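The two underflow cases can be checked with a minimal Python sketch. It uses only the standard library; `to_float32` is a hypothetical helper (not part of any library) that rounds a Python float, which is a double, to single precision by round-tripping it through `struct`.

```python
import math
import struct

FLT_MIN_NORMAL = 2.0 ** -126   # smallest positive normal float32, about 1.18e-38

def to_float32(x: float) -> float:
    """Round a Python float (double precision) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

small = math.exp(-100)         # about 3.7e-44, comfortably inside the double range
as_single = to_float32(small)  # below the normal float32 range: kept only as a subnormal
tiny = math.exp(-1000)         # about 1e-434, below every positive double: becomes 0.0

print(small, as_single, as_single < FLT_MIN_NORMAL)
print(tiny)
```

The single-precision result is nonzero but no longer equal to the double value, because a subnormal this small keeps only a handful of significant bits.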
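0. 64-bit CPU
Calling a CPU a 64-bit machine usually means that its general purpose registers are 64 bits wide and that its memory addresses are 64 bits in size. This has nothing to do with whether the arithmetic it ultimately performs is single precision or double precision.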
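As a rough illustration of that independence, the sketch below (standard library only) prints the pointer width of the running interpreter next to the storage sizes of the two floating-point formats. Using the pointer width as a stand-in for the machine's address size is an assumption: a 32-bit Python build running on a 64-bit CPU would report 32.

```python
import struct

pointer_bits = struct.calcsize("P") * 8   # native pointer width of this Python build
single_bits = struct.calcsize("=f") * 8   # single precision storage: 32 bits
double_bits = struct.calcsize("=d") * 8   # double precision storage: 64 bits

print(pointer_bits, single_bits, double_bits)
```

The last two numbers stay the same whatever the first one is.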
1. Single precision
S EEEEEEEE FFFFFFFFFFFFFFFFFFFFFFF
0 1 8 9 31
- the first bit is the sign bit, S;
- the next 8 bits are the exponent, E;
- the last 23 bits are the fraction, F;
- E=0, F=0, S=1 => -0
- E=0, F=0, S=0 => 0
0 00000000 00000000000000000000000 = 0
E=0, F=0, S=0 => 0
1 00000000 00000000000000000000000 = -0
E=0, F=0, S=1 => -0
0 11111111 00000000000000000000000 = Infinity
1 11111111 00000000000000000000000 = -Infinity
0 11111111 00000100000000000000000 = NaN
E=255, F nonzero
1 11111111 00100010001001010101010 = NaN
E=255, F nonzero
0 10000000 00000000000000000000000 = +1 * 2**(128-127) * 1.0 = 2
0 10000001 10100000000000000000000 = +1 * 2**(129-127) * 1.101 = 6.5
1.101 => 1+0.5+0.125=1.625
1 10000001 10100000000000000000000 = -1 * 2**(129-127) * 1.101 = -6.5
0 00000001 00000000000000000000000 = +1 * 2**(1-127) * 1.0 = 2**(-126)
0 00000000 10000000000000000000000 = +1 * 2**(-126) * 0.1 = 2**(-127)
0 00000000 00000000000000000000001 = +1 * 2**(-126) * 0.00000000000000000000001 = 2**(-149) (Smallest positive value)
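The rules behind the table above can be sketched as a small decoder. This is a minimal illustration rather than a reference implementation; `decode_float32` and `hardware_decode` are hypothetical names, and the second one only exists to cross-check the hand-written arithmetic against `struct`.

```python
import math
import struct

def decode_float32(bits: str) -> float:
    """Decode a 32-bit pattern written as 'S EEEEEEEE FFF...F' (spaces optional)."""
    bits = bits.replace(" ", "")
    if len(bits) != 32 or set(bits) - {"0", "1"}:
        raise ValueError("expected exactly 32 binary digits")
    sign = -1.0 if bits[0] == "1" else 1.0
    exponent = int(bits[1:9], 2)   # E, stored with a bias of 127
    fraction = int(bits[9:], 2)    # F, read as a 23-bit integer
    if exponent == 255:            # all ones: Infinity if F is zero, NaN otherwise
        return sign * math.inf if fraction == 0 else math.nan
    if exponent == 0:              # all zeros: signed zero or subnormal, no implicit leading 1
        return sign * fraction * 2.0 ** -149
    return sign * (1 + fraction / 2 ** 23) * 2.0 ** (exponent - 127)

def hardware_decode(bits: str) -> float:
    """Cross-check: let struct reinterpret the same 32 bits as a float."""
    raw = int(bits.replace(" ", ""), 2).to_bytes(4, "big")
    return struct.unpack(">f", raw)[0]

for pattern in (
    "0 10000001 10100000000000000000000",
    "0 00000000 00000000000000000000001",
):
    print(pattern, "=", decode_float32(pattern))
```

The subnormal branch multiplies by 2**(-149) because the fraction is read as an integer: 2**(-126) * F / 2**23 is the same quantity.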
2. Double precision
S EEEEEEEEEEE FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
0 1 11 12 63
- 1 sign bit, S;
- 11 exponent bits, E;
- 52 fraction bits, F;
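The 1/11/52 split can be inspected directly, as in the minimal sketch below. `split_double` is a hypothetical helper, and the exponent bias of 1023 is taken from IEEE 754 rather than from the notes above.

```python
import struct

def split_double(x: float) -> tuple:
    """Return (S, E, F) for a double: 1 sign bit, 11 exponent bits, 52 fraction bits."""
    (raw,) = struct.unpack(">Q", struct.pack(">d", x))  # the same 64 bits, as an integer
    sign = raw >> 63                                    # top bit
    exponent = (raw >> 52) & 0x7FF                      # next 11 bits, biased by 1023
    fraction = raw & ((1 << 52) - 1)                    # low 52 bits
    return sign, exponent, fraction

s, e, f = split_double(6.5)
print(s, e - 1023, format(f, "052b"))
```

For 6.5 this gives an unbiased exponent of 2 and a fraction starting with 101, matching the single precision example 2**(129-127) * 1.101.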