Java Study Notes (1): Converting an int to a String in Various Radixes

Link to this post: https://blog.csdn.net/toyijiu/article/details/51675247

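I've started reading up on Java. In the JDK source I came across the code that locates the highest set bit of an int (strictly speaking, it counts the leading zero bits). The code struck me as quite clever, so I'm writing it down to come back to and savor later.
The premise is that an int is fixed at 32 bits. The method has the flavor of a binary search: each step halves the range still to be examined. The shift widths are hard-coded, which I feel could be avoided. Mostly that's because I instinctively want to get rid of hard-coded values whenever I see them, and I'm not sure whether that's a good habit...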

    public static int numberOfLeadingZeros(int i) {
        // HD, Figure 5-6
        if (i == 0)
            return 32;
        int n = 1;
        if (i >>> 16 == 0) { n += 16; i <<= 16; }
        if (i >>> 24 == 0) { n +=  8; i <<=  8; }
        if (i >>> 28 == 0) { n +=  4; i <<=  4; }
        if (i >>> 30 == 0) { n +=  2; i <<=  2; }
        n -= i >>> 31;
        return n;
    }
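
On the hard-coding point, here is a minimal sketch of the same halving search with the shift widths derived from Integer.SIZE in a loop. NlzSketch and nlzLoop are names made up for illustration; this is not how the JDK writes it, and the unrolled original is likely to be faster.

```java
// Illustrative sketch only; NlzSketch and nlzLoop are hypothetical names, not JDK code.
class NlzSketch {
    // Same halving search as the JDK method, but the shift widths are
    // derived from Integer.SIZE instead of being written out by hand.
    static int nlzLoop(int i) {
        if (i == 0) {
            return Integer.SIZE;
        }
        int n = 0;
        // width takes the values 16, 8, 4, 2, 1 for a 32-bit int
        for (int width = Integer.SIZE / 2; width > 0; width /= 2) {
            // If the top `width` bits are all zero, count them and shift them out.
            if (i >>> (Integer.SIZE - width) == 0) {
                n += width;
                i <<= width;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, 255, 1 << 16, Integer.MAX_VALUE, -1};
        for (int v : samples) {
            System.out.println(v + " -> " + nlzLoop(v)
                    + " (JDK: " + Integer.numberOfLeadingZeros(v) + ")");
        }
    }
}
```

The loop trades the final `n -= i >>> 31` trick for one more iteration with width 1, which keeps the code uniform at the cost of an extra comparison.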

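Next I looked at toUnsignedString, which converts an integer into its string representation in a given radix, treating the value as unsigned. The version shown here is the one from Long, so the argument is a long. Code first: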

    public static String toUnsignedString(long i, int radix) {
        if (i >= 0)
            return toString(i, radix);
        else {
            switch (radix) {
            case 2:
                return toBinaryString(i);

            case 4:
                return toUnsignedString0(i, 2);

            case 8:
                return toOctalString(i);

            case 10:
                /*
                 * We can get the effect of an unsigned division by 10
                 * on a long value by first shifting right, yielding a
                 * positive value, and then dividing by 5.  This
                 * allows the last digit and preceding digits to be
                 * isolated more quickly than by an initial conversion
                 * to BigInteger.
                 */
                long quot = (i >>> 1) / 5;
                long rem = i - quot * 10;
                return toString(quot) + rem;

            case 16:
                return toHexString(i);

            case 32:
                return toUnsignedString0(i, 5);

            default:
                return toUnsignedBigInteger(i).toString(radix);
            }
        }
    }
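
A short usage sketch may help to see what the method returns before going through the branches. It assumes Java 8 or later, where Long.toUnsignedString and Long.parseUnsignedLong are available; UnsignedStringDemo and roundTrips are names invented for this example.

```java
// Usage sketch; UnsignedStringDemo and roundTrips are illustrative names.
class UnsignedStringDemo {
    // True if formatting v as unsigned in the given radix and parsing it back yields v again.
    static boolean roundTrips(long v, int radix) {
        String s = Long.toUnsignedString(v, radix);
        return Long.parseUnsignedLong(s, radix) == v;
    }

    public static void main(String[] args) {
        long v = -1L; // all 64 bits set, i.e. 2^64 - 1 when read as unsigned
        System.out.println(Long.toString(v, 16));          // signed view: -1
        System.out.println(Long.toUnsignedString(v, 16));  // ffffffffffffffff
        System.out.println(Long.toUnsignedString(v, 10));  // 18446744073709551615
        // Radix 7 is not a power of two, so it takes the default branch in the source above.
        System.out.println(Long.toUnsignedString(v, 7));
        System.out.println(roundTrips(v, 7));              // true
    }
}
```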

The radix is expected to be between 2 and 36, because Character.MIN_RADIX is defined as 2 and Character.MAX_RADIX as 36.
When i >= 0 the call goes to toString, so let's see what toString does:

    /**
     * Returns a string representation of the first argument in the
     * radix specified by the second argument.
     *
     * <p>If the radix is smaller than {@code Character.MIN_RADIX}
     * or larger than {@code Character.MAX_RADIX}, then the radix
     * {@code 10} is used instead.
     *
     * <p>If the first argument is negative, the first element of the
     * result is the ASCII minus sign {@code '-'}
     * ({@code '\u005Cu002d'}). If the first argument is not
     * negative, no sign character appears in the result.
     *
     * <p>The remaining characters of the result represent the magnitude
     * of the first argument. If the magnitude is zero, it is
     * represented by a single zero character {@code '0'}
     * ({@code '\u005Cu0030'}); otherwise, the first character of
     * the representation of the magnitude will not be the zero
     * character.  The following ASCII characters are used as digits:
     *
     * <blockquote>
     *   {@code 0123456789abcdefghijklmnopqrstuvwxyz}
     * </blockquote>
     *
     * These are {@code '\u005Cu0030'} through
     * {@code '\u005Cu0039'} and {@code '\u005Cu0061'} through
     * {@code '\u005Cu007a'}. If {@code radix} is
     * <var>N</var>, then the first <var>N</var> of these characters
     * are used as radix-<var>N</var> digits in the order shown. Thus,
     * the digits for hexadecimal (radix 16) are
     * {@code 0123456789abcdef}. If uppercase letters are
     * desired, the {@link java.lang.String#toUpperCase()} method may
     * be called on the result:
     *
     * <blockquote>
     *  {@code Long.toString(n, 16).toUpperCase()}
     * </blockquote>
     *
     * @param   i       a {@code long} to be converted to a string.
     * @param   radix   the radix to use in the string representation.
     * @return  a string representation of the argument in the specified radix.
     * @see     java.lang.Character#MAX_RADIX
     * @see     java.lang.Character#MIN_RADIX
     */
    public static String toString(long i, int radix) {
        if (radix < Character.MIN_RADIX || radix > Character.MAX_RADIX)
            radix = 10;
        if (radix == 10)
            return toString(i);
        char[] buf = new char[65];
        int charPos = 64;
        boolean negative = (i < 0);

        if (!negative) {
            i = -i;
        }

        while (i <= -radix) {
            buf[charPos--] = Integer.digits[(int)(-(i % radix))];
            i = i / radix;
        }
        buf[charPos] = Integer.digits[(int)(-i)];

        if (negative) {
            buf[--charPos] = '-';
        }

        return new String(buf, charPos, (65 - charPos));
    }
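
A few calls illustrate the behavior described in the Javadoc above: lowercase digits, a leading minus sign for negative values, and the fallback to radix 10. ToStringRadixDemo is just a name picked for this example.

```java
// Usage sketch; ToStringRadixDemo is an illustrative name.
class ToStringRadixDemo {
    public static void main(String[] args) {
        System.out.println(Long.toString(255L, 16));   // ff
        System.out.println(Long.toString(-255L, 2));   // -11111111
        System.out.println(Long.toString(255L, 99));   // 255 (radix out of range, so 10 is used)
        // The longest possible result: a minus sign plus 64 binary digits.
        System.out.println(Long.toString(Long.MIN_VALUE, 2).length()); // 65
    }
}
```

The last line is the worst case that the 65-element buffer in the method has to hold.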

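If the radix is out of range it falls back to 10, and radix 10 is delegated to toString(i). Otherwise the method allocates a temporary buffer of 65 chars, i.e. 65*2 bytes (if I remember correctly, a Java char is a 2-byte Unicode (UTF-16) code unit); 65 covers the worst case of 64 binary digits plus a minus sign. It records the sign and normalizes the value (a bit like the itoa source I read earlier, except that here the value is made negative rather than positive), then converts digit by digit from back to front, storing each digit in the char array.
This is very similar to the Solaris itoa source.
But why do the conversion on a negative number? Is it to avoid the overflow that can happen when turning a negative number into a positive one? That appears to be the reason: two's-complement ranges are asymmetric (-2^31 to 2^31 - 1 for a 32-bit int, -2^63 to 2^63 - 1 for the long used here), so the minimum value has no positive counterpart, whereas every non-negative value can be negated safely.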

        while (i <= -radix) {
            buf[charPos--] = Integer.digits[(int)(-(i % radix))];
            i = i / radix;
        }
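
To see the overflow concretely, the sketch below negates Long.MIN_VALUE and then extracts a digit the way the loop above does, on the negative side. NegationOverflowDemo, negationOverflows and lastDigit are names invented for this example.

```java
// Illustrative sketch; the class and method names are hypothetical.
class NegationOverflowDemo {
    // In two's complement, -Long.MIN_VALUE wraps around to Long.MIN_VALUE itself.
    static boolean negationOverflows() {
        return -Long.MIN_VALUE == Long.MIN_VALUE;
    }

    // Lowest digit of a value that is <= 0, computed without ever making it positive.
    // Java's % takes the sign of the dividend, so the remainder is <= 0 and is negated.
    static int lastDigit(long nonPositive, int radix) {
        return (int) -(nonPositive % radix);
    }

    public static void main(String[] args) {
        System.out.println(negationOverflows());              // true
        System.out.println(Math.abs(Long.MIN_VALUE) < 0);     // true: abs cannot fix it either
        System.out.println(lastDigit(Long.MIN_VALUE, 10));    // 8
        System.out.println(Long.MIN_VALUE / 10);              // -922337203685477580
    }
}
```

Working on the negative side means every long, including Long.MIN_VALUE, is handled by the same loop with no special case.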

digits is the lookup table that maps each digit value to its character:

    /**
     * All possible chars for representing a number as a String
     */
    final static char[] digits = {
        '0' , '1' , '2' , '3' , '4' , '5' ,
        '6' , '7' , '8' , '9' , 'a' , 'b' ,
        'c' , 'd' , 'e' , 'f' , 'g' , 'h' ,
        'i' , 'j' , 'k' , 'l' , 'm' , 'n' ,
        'o' , 'p' , 'q' , 'r' , 's' , 't' ,
        'u' , 'v' , 'w' , 'x' , 'y' , 'z'
    };

Back to toUnsignedString: when i is negative and the radix is 2, it calls toBinaryString:

    /**
     * Returns a string representation of the {@code long}
     * argument as an unsigned integer in base&nbsp;2.
     *
     * <p>The unsigned {@code long} value is the argument plus
     * 2<sup>64</sup> if the argument is negative; otherwise, it is
     * equal to the argument.  This value is converted to a string of
     * ASCII digits in binary (base&nbsp;2) with no extra leading
     * {@code 0}s.
     *
     * <p>The value of the argument can be recovered from the returned
     * string {@code s} by calling {@link
     * Long#parseUnsignedLong(String, int) Long.parseUnsignedLong(s,
     * 2)}.
     *
     * <p>If the unsigned magnitude is zero, it is represented by a
     * single zero character {@code '0'} ({@code '\u005Cu0030'});
     * otherwise, the first character of the representation of the
     * unsigned magnitude will not be the zero character. The
     * characters {@code '0'} ({@code '\u005Cu0030'}) and {@code
     * '1'} ({@code '\u005Cu0031'}) are used as binary digits.
     *
     * @param   i   a {@code long} to be converted to a string.
     * @return  the string representation of the unsigned {@code long}
     *          value represented by the argument in binary (base&nbsp;2).
     * @see #parseUnsignedLong(String, int)
     * @see #toUnsignedString(long, int)
     * @since   JDK 1.0.2
     */
    public static String toBinaryString(long i) {
        return toUnsignedString0(i, 1);
    }

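This simply calls toUnsignedString0(i, 1).
When radix is 4, it calls toUnsignedString0(i, 2). The same function again; we'll see what it really does in a moment~
When radix is 8, it calls toOctalString(i), which, as the name says, converts to octal: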

    public static String toOctalString(long i) {
        return toUnsignedString0(i, 3);
    }

toUnsignedString0 again...
When radix is 10:
I didn't really understand this part, so I'm setting it aside for now and leaving a marker here.

            case 10:
                /*
                 * We can get the effect of an unsigned division by 10
                 * on a long value by first shifting right, yielding a
                 * positive value, and then dividing by 5.  This
                 * allows the last digit and preceding digits to be
                 * isolated more quickly than by an initial conversion
                 * to BigInteger.
                 */
                long quot = (i >>> 1) / 5;
                long rem = i - quot * 10;
                return toString(quot) + rem;
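
One way to read the radix-10 branch: for a negative i, the unsigned value is u = i + 2^64. The shift i >>> 1 yields floor(u / 2) as a non-negative long, and dividing that by 5 gives floor(u / 10). The remainder i - quot * 10 is computed with wrap-around arithmetic, but since the true remainder lies between 0 and 9 the wrapped result is exact. The sketch below copies those two lines into a standalone method and prints the result next to Long.toUnsignedString; UnsignedDecimalSketch and unsignedDecimal are names made up for illustration.

```java
// Illustrative sketch; UnsignedDecimalSketch and unsignedDecimal are hypothetical names.
class UnsignedDecimalSketch {
    static String unsignedDecimal(long i) {
        if (i >= 0) {
            return Long.toString(i);
        }
        long quot = (i >>> 1) / 5;   // floor(u / 10), always non-negative
        long rem = i - quot * 10;    // last decimal digit, 0..9
        return Long.toString(quot) + rem;
    }

    public static void main(String[] args) {
        // For -1L: quot = 1844674407370955161, rem = 5
        System.out.println(unsignedDecimal(-1L));            // 18446744073709551615
        System.out.println(Long.toUnsignedString(-1L));      // same value from the JDK
        System.out.println(unsignedDecimal(Long.MIN_VALUE)); // 9223372036854775808
    }
}
```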

Conversion to hexadecimal is toHexString(i):

    public static String toHexString(long i) {
        return toUnsignedString0(i, 4);
    }

For radix 32 it calls toUnsignedString0(i, 5).
All other radixes go through toUnsignedBigInteger(i).toString(radix).

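Now for toUnsignedString0(long val, int shift):
For radixes 2, 4, 8, 16 and 32 the shift argument is 1, 2, 3, 4 and 5, i.e. log2 of the radix, which is used later on. Put differently, a binary digit takes 1 bit, a base-4 digit 2 bits, an octal digit 3 bits, and so on.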

    /**
     * Format a long (treated as unsigned) into a String.
     * @param val the value to format
     * @param shift the log2 of the base to format in (4 for hex, 3 for octal, 1 for binary)
     */
    static String toUnsignedString0(long val, int shift) {
        // assert shift > 0 && shift <=5 : "Illegal shift value";
        int mag = Long.SIZE - Long.numberOfLeadingZeros(val);
        int chars = Math.max(((mag + (shift - 1)) / shift), 1);
        char[] buf = new char[chars];

        formatUnsignedLong(val, shift, buf, 0, chars);
        return new String(buf, true);
    }

First it works out the position of the highest set bit, which is the number of significant bits:

    int mag = Long.SIZE - Long.numberOfLeadingZeros(val);

    // The 32-bit Integer version from earlier, shown for reference;
    // the call above uses the Long counterpart, which works on 64 bits.
    public static int numberOfLeadingZeros(int i) {
        // HD, Figure 5-6
        if (i == 0)
            return 32;
        int n = 1;
        if (i >>> 16 == 0) { n += 16; i <<= 16; }
        if (i >>> 24 == 0) { n +=  8; i <<=  8; }
        if (i >>> 28 == 0) { n +=  4; i <<=  4; }
        if (i >>> 30 == 0) { n +=  2; i <<=  2; }
        n -= i >>> 31;
        return n;
    }

Then it computes how many characters the number needs in the target radix:

int chars = Math.max(((mag + (shift - 1)) / shift), 1);

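The (shift - 1) term rounds the division up when the number of bits is not a multiple of shift (the bits per digit), so a partial leading group still gets its own character~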
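
As a quick check of that rounding-up division, the sketch below wraps the two lines in a helper and prints a few counts. DigitCountSketch and digitCount are names made up for this example.

```java
// Illustrative sketch; DigitCountSketch and digitCount are hypothetical names.
class DigitCountSketch {
    // Number of base-2^shift digits needed for val treated as unsigned.
    static int digitCount(long val, int shift) {
        int mag = Long.SIZE - Long.numberOfLeadingZeros(val); // significant bits
        return Math.max((mag + (shift - 1)) / shift, 1);      // ceil(mag / shift), at least 1
    }

    public static void main(String[] args) {
        System.out.println(digitCount(255L, 4));  // 2: 8 bits fit exactly into two hex digits
        System.out.println(digitCount(256L, 4));  // 3: 9 bits need a third hex digit
        System.out.println(digitCount(-1L, 3));   // 22: 64 bits in groups of 3, rounded up
        System.out.println(digitCount(0L, 1));    // 1: zero still prints as "0"
    }
}
```
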
Then comes the function that does the actual work, formatUnsignedLong(val, shift, buf, 0, chars):

    /**
     * Format a long (treated as unsigned) into a character buffer.
     * @param val the unsigned long to format
     * @param shift the log2 of the base to format in (4 for hex, 3 for octal, 1 for binary)
     * @param buf the character buffer to write to
     * @param offset the offset in the destination buffer to start at
     * @param len the number of characters to write
     * @return the lowest character location used
     */
     static int formatUnsignedLong(long val, int shift, char[] buf, int offset, int len) {
        int charPos = len;
        int radix = 1 << shift;
        int mask = radix - 1;
        do {
            buf[offset + --charPos] = Integer.digits[((int) val) & mask];
            val >>>= shift;
        } while (val != 0 && charPos > 0);

        return charPos;
    }

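The conversion is much like itoa: the array is filled from back to front. shift is first turned into the actual radix, and the mask then extracts the lowest shift bits on each iteration, which are mapped to the corresponding character: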

        do {
            buf[offset + --charPos] = Integer.digits[((int) val) & mask];
            val >>>= shift;
        } while (val != 0 && charPos > 0);

Finally it returns the index in the char array where the converted digits start.
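
Putting the pieces together, here is a self-contained sketch of the shift-and-mask conversion. Integer.digits is not accessible from outside java.lang, so the sketch carries its own DIGITS table; ShiftMaskSketch and format are names made up for illustration, not JDK API.

```java
// Illustrative sketch; ShiftMaskSketch, DIGITS and format are hypothetical names.
class ShiftMaskSketch {
    private static final char[] DIGITS =
            "0123456789abcdefghijklmnopqrstuvwxyz".toCharArray();

    // Formats val as an unsigned number in radix 2^shift, for shift in 1..5.
    static String format(long val, int shift) {
        int mag = Long.SIZE - Long.numberOfLeadingZeros(val);
        int chars = Math.max((mag + (shift - 1)) / shift, 1);
        char[] buf = new char[chars];
        int charPos = chars;
        int mask = (1 << shift) - 1;            // e.g. 0xF for hex
        do {
            buf[--charPos] = DIGITS[((int) val) & mask]; // lowest `shift` bits -> one digit
            val >>>= shift;                              // unsigned shift drops those bits
        } while (val != 0 && charPos > 0);
        return new String(buf, charPos, chars - charPos);
    }

    public static void main(String[] args) {
        System.out.println(format(255L, 4));   // ff
        System.out.println(format(10L, 1));    // 1010
        System.out.println(format(-1L, 4));    // ffffffffffffffff
    }
}
```

Because the shift is unsigned (>>>), a negative input simply runs out of bits after at most 64 / shift rounds, so no sign handling is needed.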

The remaining radixes are converted with toUnsignedBigInteger(i).toString(radix). I'll look at that tomorrow; it's getting late, time to rest~

June 15: all right, let's take a look at toUnsignedBigInteger:

    /**
     * Return a BigInteger equal to the unsigned value of the
     * argument.
     */
    private static BigInteger toUnsignedBigInteger(long i) {
        if (i >= 0L)
            return BigInteger.valueOf(i);
        else {
            int upper = (int) (i >>> 32);
            int lower = (int) i;

            // return (upper << 32) + lower
            return (BigInteger.valueOf(Integer.toUnsignedLong(upper))).shiftLeft(32).
                add(BigInteger.valueOf(Integer.toUnsignedLong(lower)));
        }
    }
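
The method is private, so it cannot be called directly. The sketch below reproduces its logic in a standalone class and compares the result with the public Long.toUnsignedString for a radix that takes the BigInteger path in the source shown earlier. UnsignedBigIntegerSketch, toUnsigned and matchesJdk are names made up for illustration.

```java
import java.math.BigInteger;

// Illustrative sketch; the class and method names are hypothetical.
class UnsignedBigIntegerSketch {
    // Splits the long into two 32-bit halves, widens each half as unsigned,
    // and recombines them as upper * 2^32 + lower.
    static BigInteger toUnsigned(long i) {
        if (i >= 0L) {
            return BigInteger.valueOf(i);
        }
        int upper = (int) (i >>> 32);
        int lower = (int) i;
        return BigInteger.valueOf(Integer.toUnsignedLong(upper)).shiftLeft(32)
                .add(BigInteger.valueOf(Integer.toUnsignedLong(lower)));
    }

    // True if the sketch agrees with the JDK's own unsigned formatting.
    static boolean matchesJdk(long v, int radix) {
        return toUnsigned(v).toString(radix).equals(Long.toUnsignedString(v, radix));
    }

    public static void main(String[] args) {
        System.out.println(toUnsigned(-1L));     // 18446744073709551615
        System.out.println(matchesJdk(-1L, 7));  // true
    }
}
```

Each half is at most 2^32 - 1 after Integer.toUnsignedLong, so both fit in a non-negative long and BigInteger.valueOf never sees a negative number.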