More on Floating-Point Precision: Recovering Lost Precision

Article link: https://blog.csdn.net/FireCoder/article/details/5816237

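Key points

The article "Why do Java float and double lose precision? A brief look at Java floating-point precision" describes how a Java float loses precision: even an integer can lose precision. Following its line of thought, we briefly review how precision works in floating-point numbers.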

IEEE754

arithmetic formats

Binary floating-point numbers are stored in a sign-magnitude form where the most significant bit is the sign bit, exponent is the biased exponent, and "fraction" is the significand without the most significant bit.

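The floating-point format defined by IEEE754-1985 has three parts: the sign bit, the exponent, and the fraction (also called the mantissa).

Sign bit: 1 for a negative number, 0 for a positive number.
Exponent (exponent biasing): stored as a biased exponent, also known as excess code. The stored value is the true exponent plus a fixed offset (127 for single precision, 1023 for double precision). Why use a bias? Wikipedia explains that storing the exponent in two's complement would make comparison awkward. With the bias, the single-precision exponent range [-126, 127] becomes [1, 254], all positive, which makes exponents easy to compare and align.
Fraction: the most significant bit is omitted. When 1 ≤ exponent ≤ 2^e − 2 (e is the width of the exponent field, so 1 to 254 for single precision), the omitted bit is 1 and the value is called a normalized number. If the exponent is 0 and the fraction is not 0, the omitted bit is 0 and the value is called a denormalized number. When the exponent and the fraction are both 0, the omitted bit is also 0 and the value is positive or negative zero, depending on the sign bit. The remaining cases (infinities and NaNs) are not covered here.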
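To make the three fields concrete, here is a minimal sketch that splits a Java float into sign, biased exponent and fraction. The class and method names (`FloatFields`, `sign`, `biasedExponent`, `fraction`) are made up for illustration; `Float.floatToIntBits` is the standard library call that exposes the raw bits.

```java
final class FloatFields {
    /** Sign bit: 1 for negative, 0 for positive. */
    static int sign(float f) {
        return Float.floatToIntBits(f) >>> 31;
    }

    /** Stored 8-bit exponent field, i.e. true exponent + 127 for normalized numbers. */
    static int biasedExponent(float f) {
        return (Float.floatToIntBits(f) >>> 23) & 0xFF;
    }

    /** The 23 stored fraction bits, without the omitted leading bit. */
    static int fraction(float f) {
        return Float.floatToIntBits(f) & 0x7FFFFF;
    }

    public static void main(String[] args) {
        // 0.15625 = 1.01b * 2^-3; Float.MIN_VALUE is the smallest denormalized number
        float[] samples = {1.0f, -2.0f, 0.15625f, Float.MIN_VALUE, 0.0f};
        for (float f : samples) {
            String bits = String.format("%23s", Integer.toBinaryString(fraction(f)))
                    .replace(' ', '0');
            System.out.printf("%-10s sign=%d exponent=%3d fraction=%s%n",
                    f, sign(f), biasedExponent(f), bits);
        }
    }
}
```

For 1.0f the exponent field is 127 (true exponent 0) with an all-zero fraction, while Float.MIN_VALUE has exponent field 0 and a fraction of 1, which is the denormalized case described above.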

Normalization is familiar in decimal: 20014999, for example, is 2.0014999 × 10^7. What about binary? First convert 20014999 to binary:

1 0011 0001 0110 0111 1001 0111

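Then normalize it: 1.001100010110011110010111 × 2^24. The exponent field is 24 + 127 = 151 (10010111), the fraction is .001100010110011110010111 (leading bit omitted), and the sign bit is 0. Written out directly, the single-precision number would be: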

0 10010111 0011 0001 0110 0111 1001 0111

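Attentive readers will notice that this fraction has 24 bits, while a single-precision float stores only 23 fraction bits. What happens when the available precision is not enough? The next section introduces the IEEE754 rounding algorithms.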
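The normalization above can be checked with a few integer operations. This is only a sketch: `NormalizeDemo` and its helper methods are hypothetical names, and it handles positive `int` values only.

```java
final class NormalizeDemo {
    /** True (unbiased) exponent of a positive int: the position of its highest set bit. */
    static int exponentOf(int n) {
        return 31 - Integer.numberOfLeadingZeros(n);
    }

    /** Fraction bits needed to store n exactly once the leading 1 is omitted. */
    static int fractionBitsNeeded(int n) {
        // Trailing zero bits carry no information and need not be stored.
        return exponentOf(n) - Integer.numberOfTrailingZeros(n);
    }

    public static void main(String[] args) {
        int n = 20014999;
        int e = exponentOf(n);
        System.out.println("binary          = " + Integer.toBinaryString(n));
        System.out.println("exponent        = " + e);                  // 24
        System.out.println("biased exponent = " + (e + 127)
                + " (" + Integer.toBinaryString(e + 127) + ")");        // 151 (10010111)
        System.out.println("fraction bits   = " + fractionBitsNeeded(n)
                + " (a float stores 23)");                              // 24
    }
}
```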

rounding algorithms

Round to Nearest – rounds to the nearest value; if the number falls midway it is rounded to the nearest value with an even (zero) least significant bit

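y	round to nearest (ties to even)
+23.67	+24
+23.50	+24 (midway between +23 and +24; the even value is chosen)
−23.35	−23
−23.50	−24 (midway between −23 and −24; the even value is chosen)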
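The table rounds to integers, and Java's `Math.rint` applies the same round-to-nearest, ties-to-even rule at integer granularity, so it can serve as a quick check. The wrapper name `roundTiesToEven` and the extra sample 22.50 are additions for illustration.

```java
final class TiesToEvenDemo {
    /** Rounds to the nearest integer value; an exact tie goes to the even neighbour. */
    static double roundTiesToEven(double y) {
        return Math.rint(y);
    }

    public static void main(String[] args) {
        double[] samples = {23.67, 23.50, -23.35, -23.50, 22.50};
        for (double y : samples) {
            System.out.println(y + " -> " + roundTiesToEven(y));
        }
        // 23.5 -> 24.0 and 22.5 -> 22.0: both ties land on the even integer
    }
}
```

Note that `Math.round` behaves differently on ties: it rounds half up, so `Math.round(22.5)` is 23 and `Math.round(-23.5)` is -23.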

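The original value is replaced by an approximation that binary can represent exactly, and the most common choice is the nearest representable value. If two representable values are equally near, the one whose last bit is 0 is chosen. In the 20014999 example, dropping the 24th fraction bit (a 1) is exactly such a tie, so the trailing bits 0111 round up to 1000, and 20014999 ends up stored as: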

0 10010111 0011 0001 0110 0111 1001 100
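The rounded bit pattern can be reproduced directly in Java, since converting an int to a float rounds to nearest with ties to even. The sketch below uses a hypothetical helper `bits` to print the fields in the same "sign exponent fraction" layout.

```java
final class RoundingDemo {
    /** Formats the raw bits of a float as "sign exponent fraction". */
    static String bits(float f) {
        String s = String.format("%32s", Integer.toBinaryString(Float.floatToIntBits(f)))
                .replace(' ', '0');
        return s.substring(0, 1) + " " + s.substring(1, 9) + " " + s.substring(9);
    }

    public static void main(String[] args) {
        float f = 20014999;            // implicit int -> float conversion rounds the value
        System.out.println(bits(f));   // 0 10010111 00110001011001111001100
        System.out.println((int) f);   // 20015000
    }
}
```

The stored value is therefore 20015000, one more than the integer that went in.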

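This shows that a single-precision float represents every positive integer up to 2^24 (16777216) exactly, because such integers need at most 24 significant bits. 20014999 needs 25, which is why it had to be rounded.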
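A brute-force check of this limit is cheap. The following sketch (the class and method names are illustrative) searches for the first positive int that does not survive an int -> float -> int round trip.

```java
final class ExactIntegerDemo {
    /** Smallest positive int that changes when converted to float and back. */
    static int firstInexactInt() {
        for (int i = 1; i > 0; i++) {          // stops early; i > 0 only guards overflow
            if ((int) (float) i != i) {
                return i;
            }
        }
        return -1;                             // not reached for 32-bit floats
    }

    public static void main(String[] args) {
        System.out.println(firstInexactInt());                 // 16777217, i.e. 2^24 + 1
        System.out.println((float) 16777217 == 16777216f);     // true: rounded down to 2^24
    }
}
```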

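The problem

Our problem is this: a floating-point number goes through a double -> float -> double conversion. How do we recover the lost precision? The target precision is 0.00001.

Testing shows that about 7 significant digits survive the double -> float conversion: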

// Assumed debug flag of the enclosing class: set to true to list each lossy value.
private static boolean print = false;

/**
 * Test fractions in [0.00001, 0.99999].
 *
 * @param diff largest tolerated error; nothing is lost at 7 fractional digits (0.0000001)
 */
public static void precisionLost(double diff) {
    double scale = 100000;
    int lost_cnt = 0;
    for (int i = 1; i < 100000; i++) {
        double fraction = i / scale;
        float f = (float) fraction; // file format is in float
        double d = f;
        // Compare the magnitude so that errors in either direction are counted.
        if (Math.abs(fraction - d) > diff) {
            if (print)
                System.out.println(fraction + " " + d);
            lost_cnt++;
        }
    }
    System.out.println("lost: " + lost_cnt + ", diff: " + diff);
}
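Why about 7 digits? Below 1.0 the spacing between adjacent floats is at most 2^-24, so rounding to the nearest float moves a value by at most 2^-25 (about 2.98e-8), which is under 0.0000001. The sketch below measures the largest round-trip error over the same range; `RoundTripError` and `maxError` are names chosen for this illustration.

```java
final class RoundTripError {
    /** Largest |x - (double)(float) x| over x = i / 100000 for i in [1, 99999]. */
    static double maxError() {
        double max = 0;
        for (int i = 1; i < 100000; i++) {
            double x = i / 100000.0;
            double back = (float) x;           // double -> float -> double
            max = Math.max(max, Math.abs(x - back));
        }
        return max;
    }

    public static void main(String[] args) {
        // Spacing of floats in [0.5, 1) is 2^-24; half of it bounds the rounding error.
        double halfUlp = Math.ulp(0.99999f) / 2.0;
        System.out.println("max error = " + maxError());
        System.out.println("bound     = " + halfUlp);
    }
}
```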

Use Math.round to recover the lost fifth decimal digit:

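public static void appPrecisionLost(boolean recovery) {
    double scale = 100000;
    double diff = 0.00001; // requested precision: 5 digits after the decimal point
    int lost_cnt = 0;
    for (int i = 1; i < 100000; i++) {
        double fraction = i / scale;
        double d = recovery ? appDoubleRecovery(fraction) : appDouble(fraction);
        // A lost fifth digit shows up as an error of about one diff, so compare
        // against half of it instead of relying on double rounding at the boundary.
        if (Math.abs(fraction - d) > diff / 2) {
            if (print) // print: debug flag of the enclosing class
                System.out.println(fraction + " " + d);
            lost_cnt++;
        }
    }
    System.out.println("lost: " + lost_cnt + ", recovery: " + recovery);
}

/**
 * Meas logic: <br>
 * 1. double -> float -> double <br>
 * 2. only keep 5 digits after the decimal point (by truncation)
 *
 * @param fraction the original double value
 * @return the value after the float round trip, truncated to 5 decimal places
 */
private static double appDouble(double fraction) {
    float f = (float) fraction; // file format
    // f * 100000 is evaluated in float; the int cast then truncates toward zero
    return (double) (int) (f * 100000) / 100000;
}

/**
 * Recover the fifth decimal digit after (double -> float -> double).
 *
 * @param fraction the original double value
 * @return the value after the float round trip, rounded at the sixth decimal place
 */
private static double appDoubleRecovery(double fraction) {
    float f = (float) fraction; // file format
    // round at the sixth decimal place instead of truncating at the fifth
    return (double) Math.round(f * 1000000) / 1000000;
}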
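The reason rounding helps is that the float round trip can leave a value slightly below the decimal it came from, and truncation then drops a whole unit of the fifth digit, whereas rounding absorbs an error that small. Here is a self-contained sketch of the comparison, assuming the same range of inputs; the class and method names are hypothetical and the two helpers simply mirror the expressions used above.

```java
final class TruncateVsRound {
    /** double -> float, then truncate at the fifth decimal place. */
    static double truncateAtFifth(double x) {
        float f = (float) x;
        return (double) (int) (f * 100000) / 100000;
    }

    /** double -> float, then round at the sixth decimal place. */
    static double roundAtSixth(double x) {
        float f = (float) x;
        return (double) Math.round(f * 1000000) / 1000000;
    }

    /** Counts i in [1, 99999] whose result is off by more than half a unit of the fifth digit. */
    static int countLost(boolean useRound) {
        int lost = 0;
        for (int i = 1; i < 100000; i++) {
            double x = i / 100000.0;
            double d = useRound ? roundAtSixth(x) : truncateAtFifth(x);
            if (Math.abs(x - d) > 0.000005) {
                lost++;
            }
        }
        return lost;
    }

    public static void main(String[] args) {
        System.out.println("lost with truncation: " + countLost(false));
        System.out.println("lost with rounding:   " + countLost(true));
    }
}
```

With rounding, every input in this range comes back unchanged, because the accumulated float error stays well below half a unit at the sixth decimal place.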