SmallFloat Encoding and Decoding

longToInt4 encoding

  public static int longToInt4(long i) {
    if (i < 0) {
      throw new IllegalArgumentException("Only supports positive values, got " + i);
    }
    // Number of bits needed to represent i
    int numBits = 64 - Long.numberOfLeadingZeros(i);
    if (numBits < 4) {
      // subnormal value
      // A value that fits in 0 to 3 bits is returned unchanged; i must be in [0, 7]
      return Math.toIntExact(i);
    } else {
      // normal value
      // [8, 15],  numBits=4, shift=0, encoded = [0, 7] | 8 = [8, 15]
      // [16, 17], numBits=5, shift=1, encoded = 0 | 16 = 16
      // [18, 19], numBits=5, shift=1, encoded = 1 | 16 = 17
      // [20, 21], numBits=5, shift=1, encoded = 2 | 16 = 18

      int shift = numBits - 4;
      // only keep the 4 most significant bits
      // The unsigned right shift moves those 4 bits down to the lowest positions.
      // The highest of them is always 1, so encoded is in [8, 15] at this point.
      int encoded = Math.toIntExact(i >>> shift);
      // clear the most significant bit, which is implicit
      encoded &= 0x07; // keep only the low 3 bits, so the result is in [0, 7]
      // encode the shift, adding 1 because 0 is reserved for subnormal values
      // shift is in [0, 59], so (shift + 1) << 3 is in [8, 480] and encoded ends up in [8, 487]
      encoded |= (shift + 1) << 3;
      // The returned value is in [8, 487]
      return encoded;
    }
  }

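Conclusions

The result lies in [0, 487]: any non-negative long is encoded into the range [0, 487].
Plotted as a function, the output resembles a logarithm curve: the larger the input, the more slowly the output grows.
For inputs in [0, 15], the output equals the input.
For inputs greater than 15, that is, when i needs at least 5 bits, the following rule applies:
a) Two inputs with the same bit length and the same 4 most significant bits (counted from the highest set bit) produce the same result.
For example, 16 = 0b0001_0000 and 17 = 0b0001_0001 both have the leading bits 1000, so both encode to 16.

Likewise, 0x7800_0000_0000_0000L and 0x7FFF_FFFF_FFFF_FFFFL both have the leading bits 1111, so both encode to 487. They are the smallest and largest inputs that map to 487, and the huge gap between them matches the logarithm-like curve described above.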
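The bucket behaviour can be checked with a small standalone sketch. It copies the longToInt4 logic from above into a throwaway class so that it runs without Lucene on the classpath; the class name Int4BucketsDemo is hypothetical and used only for illustration.

```java
final class Int4BucketsDemo {
  // Copy of the longToInt4 logic shown above (an assumption made so the sketch is self-contained).
  static int longToInt4(long i) {
    if (i < 0) {
      throw new IllegalArgumentException("Only supports positive values, got " + i);
    }
    int numBits = 64 - Long.numberOfLeadingZeros(i);
    if (numBits < 4) {
      return Math.toIntExact(i); // subnormal: stored as-is
    }
    int shift = numBits - 4;
    int encoded = Math.toIntExact(i >>> shift) & 0x07; // 3 bits below the implicit leading 1
    return encoded | ((shift + 1) << 3);
  }

  public static void main(String[] args) {
    long[] samples = {0, 7, 8, 15, 16, 17, 18, 32, 0x7800_0000_0000_0000L, Long.MAX_VALUE};
    for (long v : samples) {
      System.out.println(v + " -> " + longToInt4(v));
    }
  }
}
```

Here 16 and 17 share the code 16, while 32 has the same leading bits 1000 but one more bit of length, so it lands in a different bucket (code 24).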

int4ToLong decoding

Encoding has a matching decoder. Its code is as follows:

/**
   * Decode values encoded with {@link #longToInt4(long)}.
   */
  public static final long int4ToLong(int i) {
    long bits = i & 0x07;
    int shift = (i >>> 3) - 1;
    long decoded;
    if (shift == -1) {
      // subnormal value
      // For i in [0, 7], shift is -1 and i is returned unchanged
      decoded = bits;
    } else {
      // normal value
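      // bits is always in [0, 7]; OR-ing with 0x08 restores the implicit leading bit (adds 8); shift = i / 8 - 1
      // The larger i is, the larger shift is, and the larger decoded is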
      decoded = (bits | 0x08) << shift;
    }
    return decoded;
  }

Note that the returned value is the smallest long that encodes to i. For example, 16 and 17 both encode to 16, and decoding 16 returns 16.
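A round trip makes the loss of precision visible. The sketch below again copies the two methods into a standalone class; Int4RoundTripDemo and the helper roundTrip are hypothetical names introduced for illustration. Because only the 4 leading bits survive, the decoded value is never larger than the input, and for inputs of 8 or more the gap is less than one eighth of the decoded value.

```java
final class Int4RoundTripDemo {
  // Copies of the methods shown above, so the sketch runs without Lucene.
  static int longToInt4(long i) {
    if (i < 0) {
      throw new IllegalArgumentException("Only supports positive values, got " + i);
    }
    int numBits = 64 - Long.numberOfLeadingZeros(i);
    if (numBits < 4) {
      return Math.toIntExact(i);
    }
    int shift = numBits - 4;
    int encoded = Math.toIntExact(i >>> shift) & 0x07;
    return encoded | ((shift + 1) << 3);
  }

  static long int4ToLong(int i) {
    long bits = i & 0x07;
    int shift = (i >>> 3) - 1;
    return shift == -1 ? bits : (bits | 0x08) << shift;
  }

  // Hypothetical helper: encode then decode, which yields the smallest value in v's bucket.
  static long roundTrip(long v) {
    return int4ToLong(longToInt4(v));
  }

  public static void main(String[] args) {
    long[] samples = {7, 16, 17, 19, 100, 1000, 1_000_000};
    for (long v : samples) {
      System.out.println(v + " -> code " + longToInt4(v) + " -> " + roundTrip(v));
    }
  }
}
```

For instance, 100 encodes to 36 and decodes to 96, and 1000 encodes to 63 and decodes to 960.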

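intToByte4 encoding

Encodes a non-negative int i into one byte. A Java byte is signed, with range [-128, 127], so encoded values above 127 appear as negative bytes.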

  // 231
  private static final int MAX_INT4 = longToInt4(Integer.MAX_VALUE);
  // 24
  private static final int NUM_FREE_VALUES = 255 - MAX_INT4;

  /**
   * Encode an integer to a byte. It is built upon {@link #longToInt4(long)}
   * and leverages the fact that {@code longToInt4(Integer.MAX_VALUE)} is
   * less than 255 to encode low values more accurately.
   */
  public static byte intToByte4(int i) {
    if (i < 0) {
      throw new IllegalArgumentException("Only supports positive values, got " + i);
    }
    // NUM_FREE_VALUES = 24; values below 24 are returned unchanged
    if (i < NUM_FREE_VALUES) {
      return (byte) i;
    } else {
      // Known: longToInt4(Integer.MAX_VALUE) = 231
      // For i >= 24 the sum is in [24, 24 + 231] = [24, 255]; the byte cast stores values above 127 as negative
      return (byte) (NUM_FREE_VALUES + longToInt4(i - NUM_FREE_VALUES));
    }
  }
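The effect of the byte cast is easiest to see on concrete values. This is a minimal sketch that copies the relevant methods into a standalone class (Byte4EncodeDemo is a hypothetical name) and prints both the signed byte and its unsigned reading via Byte.toUnsignedInt.

```java
final class Byte4EncodeDemo {
  // Copy of longToInt4 from above, so the sketch runs without Lucene.
  static int longToInt4(long i) {
    if (i < 0) {
      throw new IllegalArgumentException("Only supports positive values, got " + i);
    }
    int numBits = 64 - Long.numberOfLeadingZeros(i);
    if (numBits < 4) {
      return Math.toIntExact(i);
    }
    int shift = numBits - 4;
    int encoded = Math.toIntExact(i >>> shift) & 0x07;
    return encoded | ((shift + 1) << 3);
  }

  static final int MAX_INT4 = longToInt4(Integer.MAX_VALUE); // 231
  static final int NUM_FREE_VALUES = 255 - MAX_INT4;         // 24

  static byte intToByte4(int i) {
    if (i < 0) {
      throw new IllegalArgumentException("Only supports positive values, got " + i);
    }
    if (i < NUM_FREE_VALUES) {
      return (byte) i; // small values are stored exactly
    }
    return (byte) (NUM_FREE_VALUES + longToInt4(i - NUM_FREE_VALUES));
  }

  public static void main(String[] args) {
    int[] samples = {0, 23, 24, 1000, 100_000, Integer.MAX_VALUE};
    for (int v : samples) {
      byte b = intToByte4(v);
      System.out.println(v + " -> byte " + b + " (unsigned " + Byte.toUnsignedInt(b) + ")");
    }
  }
}
```

100_000 encodes to the unsigned value 140, which a signed byte shows as -116, and Integer.MAX_VALUE encodes to 255, shown as -1.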

byte4ToInt decoding

Decodes a byte in [-128, 127], read as an unsigned value in [0, 255], back to an int.

  /**
   * Decode values that have been encoded with {@link #intToByte4(int)}.
   */
  public static int byte4ToInt(byte b) {
    int i = Byte.toUnsignedInt(b);
    if (i < NUM_FREE_VALUES) {
      return i;
    } else {
      long decoded = NUM_FREE_VALUES + int4ToLong(i - NUM_FREE_VALUES);
      return Math.toIntExact(decoded);
    }
  }
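Putting both directions together, the following sketch runs a full int to byte to int round trip. All four methods are copied from the listings above into a standalone class; Byte4RoundTripDemo and roundTrip are hypothetical names used only here.

```java
final class Byte4RoundTripDemo {
  static int longToInt4(long i) {
    if (i < 0) {
      throw new IllegalArgumentException("Only supports positive values, got " + i);
    }
    int numBits = 64 - Long.numberOfLeadingZeros(i);
    if (numBits < 4) {
      return Math.toIntExact(i);
    }
    int shift = numBits - 4;
    int encoded = Math.toIntExact(i >>> shift) & 0x07;
    return encoded | ((shift + 1) << 3);
  }

  static long int4ToLong(int i) {
    long bits = i & 0x07;
    int shift = (i >>> 3) - 1;
    return shift == -1 ? bits : (bits | 0x08) << shift;
  }

  static final int NUM_FREE_VALUES = 255 - longToInt4(Integer.MAX_VALUE); // 24

  static byte intToByte4(int i) {
    if (i < 0) {
      throw new IllegalArgumentException("Only supports positive values, got " + i);
    }
    if (i < NUM_FREE_VALUES) {
      return (byte) i;
    }
    return (byte) (NUM_FREE_VALUES + longToInt4(i - NUM_FREE_VALUES));
  }

  static int byte4ToInt(byte b) {
    int i = Byte.toUnsignedInt(b); // undo the sign introduced by the byte cast
    if (i < NUM_FREE_VALUES) {
      return i;
    }
    return Math.toIntExact(NUM_FREE_VALUES + int4ToLong(i - NUM_FREE_VALUES));
  }

  // Hypothetical helper: encode to a byte and decode again.
  static int roundTrip(int v) {
    return byte4ToInt(intToByte4(v));
  }

  public static void main(String[] args) {
    int[] samples = {23, 40, 41, 1000, 100_000, Integer.MAX_VALUE};
    for (int v : samples) {
      System.out.println(v + " -> " + roundTrip(v));
    }
  }
}
```

Values up to 40 come back unchanged, 41 is the first value that loses precision (it decodes to 40), and Integer.MAX_VALUE decodes to 2013265944, the smallest value in the top bucket.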