Floating-Point Numbers in Computers - Why Decimal 0.1 Is an Infinitely Repeating Fraction in Binary

二分掌柜的

Article link: https://blog.csdn.net/flyfish1986/article/details/140071611

flyfish

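A decimal fraction stored in a float or double is generally an approximation, not the exact value.

Computers store floating-point numbers in binary, usually following the IEEE 754 standard. Each value consists of three fields: a sign bit, an exponent, and a mantissa (significand).
Let's start with an example.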

#include <iostream>
#include <iomanip>

using namespace std;

int main()
{
    cout << "Hello World!" << endl;

    double x = 1.0 / 10.0;
    double y = 1.0 - 0.9;
    double z = 1.0 + 0.1;

    // Set the output precision
    cout << fixed << setprecision(17);

    // Inspect the values of x, y, and z
    cout << "x = " << x << endl;
    cout << "y = " << y << endl;
    cout << "z = " << z << endl;

    return 0;
}

Hello World!
x = 0.10000000000000001
y = 0.09999999999999998
z = 1.10000000000000009

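Comparing floating-point numbers

Because floating-point arithmetic can introduce tiny errors, avoid comparing floating-point values directly with ==. Instead, define a very small number (called epsilon) and treat two values as equal when their difference is smaller than it.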

#include <cmath>
#include <iostream>

bool isEqual(double a, double b, double epsilon = 1e-10) {
    return std::fabs(a - b) < epsilon;
}

int main() {
    double a = 0.1 * 3;
    double b = 0.3;
    if (isEqual(a, b)) {
        std::cout << "a and b are equal." << std::endl;
    } else {
        std::cout << "a and b are not equal." << std::endl;
    }
    return 0;
}

a and b are equal.

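The float 0.1f and the double 0.1 are not equal, because they are rounded to different binary representations: float keeps 24 significant bits and double keeps 53.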

#include <iostream>
#include <iomanip>

int main() {
    float a = 0.1f;
    double b = 0.1;

    std::cout << std::setprecision(20);
    std::cout << "float a = 0.1f: " << a << std::endl;
    std::cout << "double b = 0.1: " << b << std::endl;

    if (a == b) {
        std::cout << "a and b are equal." << std::endl;
    } else {
        std::cout << "a and b are not equal." << std::endl;
    }

    return 0;
}

float a = 0.1f: 0.10000000149011611938
double b = 0.1: 0.10000000000000000555
a and b are not equal.

Precision difference between float and double

#include <iostream>
#include <iomanip>
#include <string>
#include <sstream>

int main() {
    float floatNum = 1.0f / 7.0f;
    double doubleNum = 1.0 / 7.0;

    // Set the output precision
    std::cout << std::fixed << std::setprecision(64);

    // Print the float and double values
    std::cout << "float:  " << floatNum << std::endl;
    std::cout << "double: " << doubleNum << std::endl;

    return 0;
}

float:  0.1428571492433547973632812500000000000000000000000000000000000000
double: 0.1428571428571428492126926812488818541169166564941406250000000000

Converting a repeating decimal to a fraction

#include <iostream>
#include <string>
#include <sstream>
#include <cmath>
#include <iomanip>

// A struct that represents a fraction
struct Fraction {
    long long numerator;
    long long denominator;
};

// Greatest common divisor
long long gcd(long long a, long long b) {
    return b == 0 ? a : gcd(b, a % b);
}

// Convert a repeating decimal of the form "0.<digits>(<repeating digits>)" to a fraction
Fraction repeatingDecimalToFraction(const std::string& decimal) {
    size_t pos = decimal.find('(');
    std::string nonRepeatingPart = decimal.substr(0, pos);
    std::string repeatingPart = decimal.substr(pos + 1, decimal.size() - pos - 2);

    // Lengths of the non-repeating and repeating parts
    int n = nonRepeatingPart.size() - 2; // subtract the length of "0."
    int m = repeatingPart.size();

    // Decimal value of the non-repeating part
    double nonRepeatingDecimal = std::stod(nonRepeatingPart);

    // Build the fraction for the non-repeating part.
    // std::llround rounds to the nearest integer; a plain cast truncates and
    // could be off by one when the product lands just below an integer.
    long long nonRepeatingNumerator = std::llround(nonRepeatingDecimal * std::pow(10, n));
    long long nonRepeatingDenominator = std::llround(std::pow(10, n));

    // Build the fraction for the repeating part
    long long repeatingNumerator = std::stoll(repeatingPart);
    long long repeatingDenominator = std::llround(std::pow(10, m)) - 1;

    // Combine both parts over the common denominator 10^n * (10^m - 1)
    repeatingNumerator += nonRepeatingNumerator * repeatingDenominator;
    repeatingDenominator *= nonRepeatingDenominator;

    // Reduce the fraction
    long long divisor = gcd(repeatingNumerator, repeatingDenominator);
    repeatingNumerator /= divisor;
    repeatingDenominator /= divisor;

    return {repeatingNumerator, repeatingDenominator};
}

int main() {
    std::string decimal = "0.285714(285714)";
    Fraction fraction = repeatingDecimalToFraction(decimal);

    std::cout << "Fraction: " << fraction.numerator << "/" << fraction.denominator << std::endl;
    return 0;
}

Fraction: 2/7
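
The arithmetic behind the function can be written out as a short derivation. Here $N$ is the integer formed by the $n$ non-repeating digits and $R$ is the integer formed by the $m$ repeating digits. These symbols are introduced only for this note; in the code they correspond to the initial values of `nonRepeatingNumerator` and `repeatingNumerator`.

$$x = \frac{N}{10^{n}} + \frac{R}{10^{n}\,(10^{m}-1)} = \frac{N\,(10^{m}-1) + R}{10^{n}\,(10^{m}-1)}$$

For the input `0.285714(285714)`, $N = R = 285714$ and $n = m = 6$, so:

$$x = \frac{285714 \cdot 999999 + 285714}{10^{6} \cdot 999999} = \frac{285714 \cdot 10^{6}}{10^{6} \cdot 999999} = \frac{285714}{999999} = \frac{2}{7}$$

The last step divides numerator and denominator by their greatest common divisor, 142857.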

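Inspecting the IEEE 754 representation of a floating-point number

The IEEE 754 representation is the format in which a floating-point number is stored in memory. It consists of a sign, an exponent, and a mantissa, and is used for both floating-point computation and storage.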

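#include <iostream>
#include <bitset>
#include <iomanip>
#include <cstdint> // std::uint32_t, std::uint64_t
#include <cstring> // std::memcpy

void printFloatBinary(float number) {
    // Copy the bytes of the float into a 32-bit unsigned integer.
    // memcpy is used because dereferencing a reinterpret_cast pointer
    // violates the strict aliasing rule (undefined behavior).
    std::uint32_t binary;
    std::memcpy(&binary, &number, sizeof(binary));
    std::bitset<32> bits(binary);

    std::cout << "Float: " << number << std::endl;
    std::cout << "Binary: " << bits << std::endl;
}

void printDoubleBinary(double number) {
    // Copy the bytes of the double into a 64-bit unsigned integer
    std::uint64_t binary;
    std::memcpy(&binary, &number, sizeof(binary));
    std::bitset<64> bits(binary);

    std::cout << "Double: " << number << std::endl;
    std::cout << "Binary: " << bits << std::endl;
}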

int main() {
    float floatNum = 0.1f;
    double doubleNum = 0.1;

    printFloatBinary(floatNum);
    printDoubleBinary(doubleNum);

    return 0;
}

Float: 0.1
Binary: 00111101110011001100110011001101
Double: 0.1
Binary: 0011111110111001100110011001100110011001100110011001100110011010

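Counting bits from the left (most significant bit first):

Sign bit: bit 1
Exponent bits:
For float (32 bits): bits 2 to 9 (8 bits)
For double (64 bits): bits 2 to 12 (11 bits)
Mantissa bits:
For float (32 bits): bits 10 to 32 (23 bits)
For double (64 bits): bits 13 to 64 (52 bits)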

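Converting 0.1 to binary by hand

Integer part: 0 (already zero, nothing to convert).

For the fractional part, repeatedly multiply by 2, record the integer part of each product, and carry the remaining fractional part into the next step:

0.1 × 2 = 0.2 (integer part: 0)
0.2 × 2 = 0.4 (integer part: 0)
0.4 × 2 = 0.8 (integer part: 0)
0.8 × 2 = 1.6 (integer part: 1)
0.6 × 2 = 1.2 (integer part: 1)
0.2 × 2 = 0.4 (integer part: 0)
0.4 × 2 = 0.8 (integer part: 0)
0.8 × 2 = 1.6 (integer part: 1)
0.6 × 2 = 1.2 (integer part: 1)
0.2 × 2 = 0.4 (integer part: 0)

Combining the integer parts

Reading the integer parts from top to bottom gives the binary digits after the point:

$0.1_{10} = 0.0001100110011001100110011001100 \ldots_2$

The fractional part 0.2 comes back every four steps, so the digits 0011 repeat forever and the binary expansion is an infinitely repeating fraction:
$0.1_{10} = 0.0\overline{0011}_2$
The overline marks the repeating block. The same expansion can also be written as $0.0001\overline{1001}_2$.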

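How the IEEE 754 representation relates to the 32-bit pattern

Binary expansion of the fraction

The binary expansion of 0.1 computed earlier (0.0001100110011001100110011001100…) is the direct conversion of the decimal fraction to binary. It is an infinitely repeating fraction.

IEEE 754 binary floating-point representation

"00111101110011001100110011001101", on the other hand, is how 0.1 is stored in the computer as an IEEE 754 32-bit single-precision floating-point number. The IEEE 754 standard defines the storage format, which consists of a sign bit, exponent bits, and mantissa (significand) bits.

IEEE 754 single precision explained

An IEEE 754 single-precision number uses 32 bits:

1 bit for the sign
8 bits for the exponent
23 bits for the mantissa
Taking 0.1 as an example:

Sign bit: 0, meaning positive.
Convert 0.1 to binary: 0.0001100110011001100110011001100110011001100110011001100… (repeats forever)
Normalize: write it in the form 1.xxxxxx × 2^(-4). The exact value is 1.100110011001… × 2^(-4), but only 23 bits after the point can be stored. The discarded tail (1100…) is more than half of the last stored bit, so the stored value rounds up: 0.1 ≈ 1.10011001100110011001101 × 2^(-4)
Exponent: the bias is 127, so the stored exponent is -4 + 127 = 123 (01111011 in binary)
Mantissa: the 23 bits after the leading 1: 10011001100110011001101
Putting these parts together gives the IEEE 754 representation:
$0 \mid 01111011 \mid 10011001100110011001101$

This matches the 32-bit pattern printed earlier:
$00111101110011001100110011001101$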

Format	Size	Exponent bits	Mantissa bits	Bias
binary16	16 bits	5 bits	10 bits	15
binary32	32 bits	8 bits	23 bits	127
binary64	64 bits	11 bits	52 bits	1023
binary128	128 bits	15 bits	112 bits	16383

Using std::ratio for rational numbers

#include <iostream>
#include <ratio>

int main() {
    // Define the fraction type
    using MyRatio = std::ratio<1, 3>;

    // Read the numerator and denominator
    constexpr int numerator = MyRatio::num;
    constexpr int denominator = MyRatio::den;

    std::cout << "Fraction: " << numerator << "/" << denominator << std::endl;

    return 0;
}

Fraction: 1/3

A custom class that represents values exactly as fractions

#include <iostream>
#include <numeric> // for std::gcd (C++17)
#include <iomanip>
#include <stdexcept> // for std::invalid_argument
class Fraction {
public:
    Fraction(long long numerator, long long denominator) : numerator(numerator), denominator(denominator) {
        // A zero denominator is not a valid fraction (and would make reduce() divide by zero)
        if (denominator == 0) {
            throw std::invalid_argument("denominator must not be zero");
        }
        reduce();
    }

    // Addition
    Fraction operator+(const Fraction& other) const {
        long long new_numerator = numerator * other.denominator + other.numerator * denominator;
        long long new_denominator = denominator * other.denominator;
        return Fraction(new_numerator, new_denominator);
    }

    // Subtraction
    Fraction operator-(const Fraction& other) const {
        long long new_numerator = numerator * other.denominator - other.numerator * denominator;
        long long new_denominator = denominator * other.denominator;
        return Fraction(new_numerator, new_denominator);
    }

    // Multiplication
    Fraction operator*(const Fraction& other) const {
        return Fraction(numerator * other.numerator, denominator * other.denominator);
    }

    // Division (throws if other is zero, because the new denominator would be zero)
    Fraction operator/(const Fraction& other) const {
        return Fraction(numerator * other.denominator, denominator * other.numerator);
    }

    // Output
    friend std::ostream& operator<<(std::ostream& os, const Fraction& fraction) {
        os << fraction.numerator << "/" << fraction.denominator;
        return os;
    }

private:
    long long numerator;
    long long denominator;

    // Reduce to lowest terms and keep the sign in the numerator
    void reduce() {
        long long gcd_value = std::gcd(numerator, denominator);
        numerator /= gcd_value;
        denominator /= gcd_value;
        if (denominator < 0) {
            numerator = -numerator;
            denominator = -denominator;
        }
    }
};

int main() {

    Fraction frac1(1, 10); // 0.1
    Fraction frac2(1, 3);  // 1/3

    std::cout << "Fraction 1: " << frac1 << std::endl;
    std::cout << "Fraction 2: " << frac2 << std::endl;

    Fraction sum = frac1 + frac2;
    Fraction diff = frac1 - frac2;
    Fraction prod = frac1 * frac2;
    Fraction quot = frac1 / frac2;

    std::cout << "Sum: " << sum << std::endl;
    std::cout << "Difference: " << diff << std::endl;
    std::cout << "Product: " << prod << std::endl;
    std::cout << "Quotient: " << quot << std::endl;

    return 0;
}

Fraction 1: 1/10
Fraction 2: 1/3
Sum: 13/30
Difference: -7/30
Product: 1/30
Quotient: 3/10

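A 64-bit integer covers a large range, but overflow still occurs if the numerator, the denominator, or an intermediate product exceeds it.
For very large numbers, gcd can become slow, because it has to compute the greatest common divisor of two large values.
When the numbers are extremely large, memory consumption is also a concern, even when they do not overflow.
In practice, a library such as Boost Multiprecision is worth trying first.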