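I've started looking at Java. In the JDK source I came across the code that counts the leading zeros of an int (in other words, it finds the position of the highest 1 bit). The code struck me as rather magical, so I'm writing it down to chew over later.
The premise is that int is fixed at 32 bits. It has the flavor of a binary search: each step halves the range that still has to be examined. The shift amounts are hard-coded, which feels like something that could be avoided. Mostly it's that I instinctively want to get rid of hard-coding whenever I see it, and I'm not sure whether that's a good habit...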
public static int numberOfLeadingZeros(int i) {
    // HD, Figure 5-6
    if (i == 0)
        return 32;
    int n = 1;
    if (i >>> 16 == 0) { n += 16; i <<= 16; }
    if (i >>> 24 == 0) { n +=  8; i <<=  8; }
    if (i >>> 28 == 0) { n +=  4; i <<=  4; }
    if (i >>> 30 == 0) { n +=  2; i <<=  2; }
    n -= i >>> 31;
    return n;
}
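On the hard-coding question, here is a minimal sketch of the same halving idea with the probe widths derived from Integer.SIZE in a loop. The class LeadingZerosSketch and the method leadingZeros are hypothetical names of mine, not JDK code, and the unrolled JDK version is likely faster; this only shows that the constants 16, 8, 4, 2 follow from the bit width.

```java
class LeadingZerosSketch {
    // Hypothetical loop version of the halving search; not the JDK implementation.
    static int leadingZeros(int i) {
        if (i == 0) {
            return Integer.SIZE;
        }
        int n = 0;
        // Probe widths 16, 8, 4, 2, 1 are derived from the bit width of int.
        for (int width = Integer.SIZE / 2; width > 0; width /= 2) {
            // If the top `width` bits are all zero, count them and shift them out.
            if (i >>> (Integer.SIZE - width) == 0) {
                n += width;
                i <<= width;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, 2, 255, 65536, Integer.MAX_VALUE, -1, Integer.MIN_VALUE};
        for (int v : samples) {
            System.out.println(v + " -> " + leadingZeros(v)
                    + " (JDK: " + Integer.numberOfLeadingZeros(v) + ")");
        }
    }
}
```

Both columns should agree for every sample, since the loop performs the same comparisons as the unrolled code, plus one final 1-bit probe in place of the `n -= i >>> 31` trick.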
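Next I looked at toUnsignedString (the one in Long), which converts a long, read as an unsigned value, into its string representation in a given radix. Code first: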
public static String toUnsignedString(long i, int radix) {
    if (i >= 0)
        return toString(i, radix);
    else {
        switch (radix) {
        case 2:
            return toBinaryString(i);
        case 4:
            return toUnsignedString0(i, 2);
        case 8:
            return toOctalString(i);
        case 10:
            /*
             * We can get the effect of an unsigned division by 10
             * on a long value by first shifting right, yielding a
             * positive value, and then dividing by 5. This
             * allows the last digit and preceding digits to be
             * isolated more quickly than by an initial conversion
             * to BigInteger.
             */
            long quot = (i >>> 1) / 5;
            long rem = i - quot * 10;
            return toString(quot) + rem;
        case 16:
            return toHexString(i);
        case 32:
            return toUnsignedString0(i, 5);
        default:
            return toUnsignedBigInteger(i).toString(radix);
        }
    }
}
radix is expected to be between 2 and 36, since Character.MIN_RADIX is defined as 2 and Character.MAX_RADIX as 36.
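To make the branches concrete, here is a small usage sketch that calls the public Long.toUnsignedString with a negative argument in several radixes. The class name UnsignedStringDemo is just for illustration.

```java
class UnsignedStringDemo {
    public static void main(String[] args) {
        long allOnes = -1L; // bit pattern of 64 ones, i.e. 2^64 - 1 when read as unsigned
        System.out.println(Long.toUnsignedString(allOnes, 2));   // 64 '1' characters
        System.out.println(Long.toUnsignedString(allOnes, 16));  // ffffffffffffffff
        System.out.println(Long.toUnsignedString(allOnes, 10));  // 18446744073709551615
        System.out.println(Long.toUnsignedString(allOnes, 7));   // handled by the default (BigInteger) branch
        System.out.println(Long.toUnsignedString(255L, 16));     // ff (non-negative, so plain toString)
    }
}
```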
When i >= 0 it goes into toString, so let's see what toString does:
/**
* Returns a string representation of the first argument in the
* radix specified by the second argument.
*
* <p>If the radix is smaller than {@code Character.MIN_RADIX}
* or larger than {@code Character.MAX_RADIX}, then the radix
* {@code 10} is used instead.
*
* <p>If the first argument is negative, the first element of the
* result is the ASCII minus sign {@code '-'}
* ({@code '\u005Cu002d'}). If the first argument is not
* negative, no sign character appears in the result.
*
* <p>The remaining characters of the result represent the magnitude
* of the first argument. If the magnitude is zero, it is
* represented by a single zero character {@code '0'}
* ({@code '\u005Cu0030'}); otherwise, the first character of
* the representation of the magnitude will not be the zero
* character. The following ASCII characters are used as digits:
*
* <blockquote>
* {@code 0123456789abcdefghijklmnopqrstuvwxyz}
* </blockquote>
*
* These are {@code '\u005Cu0030'} through
* {@code '\u005Cu0039'} and {@code '\u005Cu0061'} through
* {@code '\u005Cu007a'}. If {@code radix} is
* <var>N</var>, then the first <var>N</var> of these characters
* are used as radix-<var>N</var> digits in the order shown. Thus,
* the digits for hexadecimal (radix 16) are
* {@code 0123456789abcdef}. If uppercase letters are
* desired, the {@link java.lang.String#toUpperCase()} method may
* be called on the result:
*
* <blockquote>
* {@code Long.toString(n, 16).toUpperCase()}
* </blockquote>
*
* @param i a {@code long} to be converted to a string.
* @param radix the radix to use in the string representation.
* @return a string representation of the argument in the specified radix.
* @see java.lang.Character#MAX_RADIX
* @see java.lang.Character#MIN_RADIX
*/
public static String toString(long i, int radix) {
    if (radix < Character.MIN_RADIX || radix > Character.MAX_RADIX)
        radix = 10;
    if (radix == 10)
        return toString(i);
    char[] buf = new char[65];
    int charPos = 64;
    boolean negative = (i < 0);
    if (!negative) {
        i = -i;
    }
    while (i <= -radix) {
        buf[charPos--] = Integer.digits[(int)(-(i % radix))];
        i = i / radix;
    }
    buf[charPos] = Integer.digits[(int)(-i)];
    if (negative) {
        buf[--charPos] = '-';
    }
    return new String(buf, charPos, (65 - charPos));
}
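If the radix is out of range it falls back to decimal, and decimal goes straight to toString(i). Otherwise it allocates a temporary buffer of 65 chars, which is 65 * 2 bytes since a Java char is a 2-byte UTF-16 code unit (64 binary digits in the worst case, plus one slot for the sign). It then records the sign and normalizes the value (a bit like the itoa source I read earlier, except that here the long is turned into a negative number), and fills the char array from back to front, one digit per iteration.
This is a lot like the Solaris itoa source.
But why do the conversion on negative numbers? The reason is overflow: a long ranges from -2^63 to 2^63 - 1, so -Long.MIN_VALUE cannot be represented as a positive long, whereas every non-negative long can be negated safely.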
while (i <= -radix) {
    buf[charPos--] = Integer.digits[(int)(-(i % radix))];
    i = i / radix;
}
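The overflow point can be checked directly. The following is a minimal sketch that re-implements the loop outside the JDK so it can be run on Long.MIN_VALUE. The class NegativeAccumulationDemo, the method toStringViaNegative and the local DIGITS table are stand-ins of mine, and the sketch assumes 2 <= radix <= 36.

```java
class NegativeAccumulationDemo {
    // Local stand-in for Integer.digits, which is not accessible outside java.lang.
    static final char[] DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz".toCharArray();

    // Hypothetical re-implementation for illustration; assumes 2 <= radix <= 36.
    static String toStringViaNegative(long i, int radix) {
        char[] buf = new char[65];       // 64 binary digits plus an optional '-'
        int charPos = 64;
        boolean negative = i < 0;
        if (!negative) {
            i = -i;                      // safe: every non-negative long has a negative counterpart
        }
        while (i <= -radix) {
            buf[charPos--] = DIGITS[(int) (-(i % radix))];  // i % radix is <= 0 here
            i = i / radix;
        }
        buf[charPos] = DIGITS[(int) (-i)];
        if (negative) {
            buf[--charPos] = '-';
        }
        return new String(buf, charPos, 65 - charPos);
    }

    public static void main(String[] args) {
        // Negating the minimum value wraps around to itself.
        System.out.println(-Long.MIN_VALUE == Long.MIN_VALUE);  // true
        System.out.println(toStringViaNegative(Long.MIN_VALUE, 2));
        System.out.println(toStringViaNegative(255L, 16));      // ff
    }
}
```

Had the loop worked on positive values instead, Long.MIN_VALUE would need a special case, because its magnitude does not fit in a long.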
// digits is the lookup table that maps each digit value to its character:
/**
* All possible chars for representing a number as a String
*/
final static char[] digits = {
    '0' , '1' , '2' , '3' , '4' , '5' ,
    '6' , '7' , '8' , '9' , 'a' , 'b' ,
    'c' , 'd' , 'e' , 'f' , 'g' , 'h' ,
    'i' , 'j' , 'k' , 'l' , 'm' , 'n' ,
    'o' , 'p' , 'q' , 'r' , 's' , 't' ,
    'u' , 'v' , 'w' , 'x' , 'y' , 'z'
};
Back to toUnsignedString: when i is negative and the radix is 2, it goes into toBinaryString:
/**
* Returns a string representation of the {@code long}
* argument as an unsigned integer in base 2.
*
* <p>The unsigned {@code long} value is the argument plus
* 2<sup>64</sup> if the argument is negative; otherwise, it is
* equal to the argument. This value is converted to a string of
* ASCII digits in binary (base 2) with no extra leading
* {@code 0}s.
*
* <p>The value of the argument can be recovered from the returned
* string {@code s} by calling {@link
* Long#parseUnsignedLong(String, int) Long.parseUnsignedLong(s,
* 2)}.
*
* <p>If the unsigned magnitude is zero, it is represented by a
* single zero character {@code '0'} ({@code '\u005Cu0030'});
* otherwise, the first character of the representation of the
* unsigned magnitude will not be the zero character. The
* characters {@code '0'} ({@code '\u005Cu0030'}) and {@code
* '1'} ({@code '\u005Cu0031'}) are used as binary digits.
*
* @param i a {@code long} to be converted to a string.
* @return the string representation of the unsigned {@code long}
* value represented by the argument in binary (base 2).
* @see #parseUnsignedLong(String, int)
* @see #toUnsignedString(long, int)
* @since JDK 1.0.2
*/
public static String toBinaryString(long i) {
    return toUnsignedString0(i, 1);
}
This simply calls toUnsignedString0(i, 1).
When radix is 4 it goes to toUnsignedString0(i, 2). That function again; we'll see in a moment what it really is~
When radix is 8 it goes to toOctalString(i), which by its name converts to octal:
public static String toOctalString(long i) {
    return toUnsignedString0(i, 3);
}
toUnsignedString0 yet again...
When radix is 10:
I didn't really understand this one, so I'm setting it aside with a marker for now.
case 10:
    /*
     * We can get the effect of an unsigned division by 10
     * on a long value by first shifting right, yielding a
     * positive value, and then dividing by 5. This
     * allows the last digit and preceding digits to be
     * isolated more quickly than by an initial conversion
     * to BigInteger.
     */
    long quot = (i >>> 1) / 5;
    long rem = i - quot * 10;
    return toString(quot) + rem;
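One way to read this branch: write u for the unsigned value of i. Then i >>> 1 is floor(u / 2), which is non-negative, and dividing that by 5 gives floor(u / 10), which fits in a signed long. The subtraction i - quot * 10 wraps modulo 2^64 and leaves the last decimal digit. The sketch below checks this reading against Long.divideUnsigned and Long.remainderUnsigned. UnsignedDivideByTenDemo and unsignedDecimal are hypothetical names, and the method is only meant for negative inputs, as in the original branch.

```java
class UnsignedDivideByTenDemo {
    // Mirrors the radix-10 branch; intended for negative i (unsigned values >= 2^63).
    static String unsignedDecimal(long i) {
        long quot = (i >>> 1) / 5;   // floor(floor(u / 2) / 5) == floor(u / 10)
        long rem = i - quot * 10;    // wraps modulo 2^64, leaving a digit in 0..9
        return Long.toString(quot) + rem;
    }

    public static void main(String[] args) {
        long[] samples = {-1L, Long.MIN_VALUE, -10L, -123456789L};
        for (long v : samples) {
            long quot = (v >>> 1) / 5;
            long rem = v - quot * 10;
            System.out.println(unsignedDecimal(v)
                    + " quotient matches: " + (quot == Long.divideUnsigned(v, 10))
                    + ", remainder matches: " + (rem == Long.remainderUnsigned(v, 10)));
        }
    }
}
```

For a small non-negative input such as 5 the quotient would be 0 and the result would read "05", which is why the JDK only takes this path when i is negative.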
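Hexadecimal goes through toHexString(i):
public static String toHexString(long i) {
    return toUnsignedString0(i, 4);
}
Radix 32 uses toUnsignedString0(i, 5).
Everything else uses toUnsignedBigInteger(i).toString(radix).
Now for toUnsignedString0(long val, int shift):
For radix 2, 4, 8, 16 and 32 the shift argument is 1, 2, 3, 4 and 5, that is, log2 of the radix, which gets used later. Equivalently, a binary digit covers 1 bit, a base-4 digit 2 bits, an octal digit 3 bits, and so on.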
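As a small aside, the radix-to-shift mapping can be computed rather than memorized: for a power of two, log2 equals the number of trailing zero bits. The helper shiftFor below is hypothetical (the JDK passes the shift as a literal at each call site); it is only meant to show the relationship.

```java
class ShiftForRadixDemo {
    // Hypothetical helper: log2 of a power-of-two radix; rejects anything else.
    static int shiftFor(int radix) {
        if (radix < 2 || Integer.bitCount(radix) != 1) {
            throw new IllegalArgumentException("radix must be a power of two >= 2: " + radix);
        }
        return Integer.numberOfTrailingZeros(radix);
    }

    public static void main(String[] args) {
        for (int radix : new int[] {2, 4, 8, 16, 32}) {
            System.out.println("radix " + radix + " -> shift " + shiftFor(radix));
        }
    }
}
```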
/**
* Format a long (treated as unsigned) into a String.
* @param val the value to format
* @param shift the log2 of the base to format in (4 for hex, 3 for octal, 1 for binary)
*/
static String toUnsignedString0(long val, int shift) {
    // assert shift > 0 && shift <=5 : "Illegal shift value";
    int mag = Long.SIZE - Long.numberOfLeadingZeros(val);
    int chars = Math.max(((mag + (shift - 1)) / shift), 1);
    char[] buf = new char[chars];
    formatUnsignedLong(val, shift, buf, 0, chars);
    return new String(buf, true);
}
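First compute the index of the highest 1 bit in the binary representation, i.e. the number of significant bits. Note that the call is Long.numberOfLeadingZeros; the listing after it is the 32-bit Integer version quoted at the start, and the long version applies the same idea to 64 bits.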
int mag = Long.SIZE - Long.numberOfLeadingZeros(val);
public static int numberOfLeadingZeros(int i) {
    // HD, Figure 5-6
    if (i == 0)
        return 32;
    int n = 1;
    if (i >>> 16 == 0) { n += 16; i <<= 16; }
    if (i >>> 24 == 0) { n +=  8; i <<=  8; }
    if (i >>> 28 == 0) { n +=  4; i <<=  4; }
    if (i >>> 30 == 0) { n +=  2; i <<=  2; }
    n -= i >>> 31;
    return n;
}
Then compute how many characters the number needs in the target radix:
int chars = Math.max(((mag + (shift - 1)) / shift), 1);
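Adding (shift - 1) before dividing rounds the division up, so a leftover group of bits still gets its own digit when the bit count is not a multiple of shift~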
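A quick way to convince yourself of the rounding is to compute the count for a few values and compare it with the length of what the JDK actually produces. The sketch below copies the two lines into a hypothetical helper charsNeeded.

```java
class CharCountDemo {
    // Same arithmetic as in toUnsignedString0, wrapped in a hypothetical helper.
    static int charsNeeded(long val, int shift) {
        int mag = Long.SIZE - Long.numberOfLeadingZeros(val);  // significant bits; 0 when val == 0
        return Math.max((mag + (shift - 1)) / shift, 1);       // ceil(mag / shift), at least one digit
    }

    public static void main(String[] args) {
        System.out.println(charsNeeded(255L, 4));  // 2: 8 bits fit exactly in two hex digits
        System.out.println(charsNeeded(256L, 4));  // 3: 9 bits need a third hex digit
        System.out.println(charsNeeded(0L, 4));    // 1: zero still prints as "0"
        System.out.println(charsNeeded(-1L, 3));   // 22: 64 bits in groups of 3, rounded up
    }
}
```

Without the (shift - 1) term, 256 in hex would get 9 / 4 = 2 characters and lose its leading digit.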
Then comes the worker function: formatUnsignedLong(val, shift, buf, 0, chars);
/**
* Format a long (treated as unsigned) into a character buffer.
* @param val the unsigned long to format
* @param shift the log2 of the base to format in (4 for hex, 3 for octal, 1 for binary)
* @param buf the character buffer to write to
* @param offset the offset in the destination buffer to start at
* @param len the number of characters to write
* @return the lowest character location used
*/
static int formatUnsignedLong(long val, int shift, char[] buf, int offset, int len) {
    int charPos = len;
    int radix = 1 << shift;
    int mask = radix - 1;
    do {
        buf[offset + --charPos] = Integer.digits[((int) val) & mask];
        val >>>= shift;
    } while (val != 0 && charPos > 0);
    return charPos;
}
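The conversion is much like itoa: the array is filled from back to front. shift is first turned into the real radix, and mask is used to take the lowest shift bits on each iteration, which are then mapped to the corresponding character:
do {
    buf[offset + --charPos] = Integer.digits[((int) val) & mask];
    val >>>= shift;
} while (val != 0 && charPos > 0);
Finally it returns the start position of the converted characters in the char array.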
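Putting the sizing step and the mask-and-shift loop together gives a standalone version that can be run and compared with the JDK. This is a sketch under assumptions: MaskShiftDemo, format and the local DIGITS table are my names, Integer.digits and the package-private String constructor are not reachable from user code, and shift is assumed to be between 1 and 5.

```java
class MaskShiftDemo {
    // Local stand-in for Integer.digits.
    static final char[] DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz".toCharArray();

    // Hypothetical standalone combination of toUnsignedString0 and formatUnsignedLong.
    static String format(long val, int shift) {
        int mag = Long.SIZE - Long.numberOfLeadingZeros(val);
        int chars = Math.max((mag + (shift - 1)) / shift, 1);
        char[] buf = new char[chars];
        int charPos = chars;
        int mask = (1 << shift) - 1;               // e.g. 0xF for hex, 0x7 for octal
        do {
            buf[--charPos] = DIGITS[((int) val) & mask];  // lowest `shift` bits become one digit
            val >>>= shift;                               // unsigned shift, so negatives terminate
        } while (val != 0 && charPos > 0);
        return new String(buf);
    }

    public static void main(String[] args) {
        System.out.println(format(0x2F5L, 4));  // 2f5
        System.out.println(format(5L, 1));      // 101
        System.out.println(format(-1L, 3).equals(Long.toOctalString(-1L)));
    }
}
```

The unsigned shift `>>>` matters here: with the signed `>>` a negative val would never reach zero, and only the charPos > 0 guard would stop the loop.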
The remaining radixes are handled by toUnsignedBigInteger(i).toString(radix). I'll look at that tomorrow; it's getting late, time to rest~
June 15. Right, let's look at toUnsignedBigInteger:
/**
* Return a BigInteger equal to the unsigned value of the
* argument.
*/
private static BigInteger toUnsignedBigInteger(long i) {
    if (i >= 0L)
        return BigInteger.valueOf(i);
    else {
        int upper = (int) (i >>> 32);
        int lower = (int) i;
        // return (upper << 32) + lower
        return (BigInteger.valueOf(Integer.toUnsignedLong(upper))).shiftLeft(32).
            add(BigInteger.valueOf(Integer.toUnsignedLong(lower)));
    }
}
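My reading of the negative branch: the long is split into two 32-bit halves, each half is widened to a non-negative long with Integer.toUnsignedLong, and the halves are recombined as upper * 2^32 + lower in BigInteger, where no overflow can occur. The sketch below checks that reading against the definition "argument plus 2^64 if negative". The class and the method names viaHalves and viaOffset are hypothetical.

```java
import java.math.BigInteger;

class UnsignedBigIntegerDemo {
    // Same construction as the negative branch of toUnsignedBigInteger.
    static BigInteger viaHalves(long i) {
        int upper = (int) (i >>> 32);
        int lower = (int) i;
        return BigInteger.valueOf(Integer.toUnsignedLong(upper)).shiftLeft(32)
                .add(BigInteger.valueOf(Integer.toUnsignedLong(lower)));
    }

    // Reference definition: the unsigned value is i + 2^64 when i is negative.
    static BigInteger viaOffset(long i) {
        BigInteger v = BigInteger.valueOf(i);
        return i >= 0 ? v : v.add(BigInteger.ONE.shiftLeft(64));
    }

    public static void main(String[] args) {
        long[] samples = {-1L, Long.MIN_VALUE, -123456789L, 42L};
        for (long v : samples) {
            System.out.println(viaHalves(v) + " matches: " + viaHalves(v).equals(viaOffset(v)));
        }
        System.out.println(viaHalves(-1L).toString(7).equals(Long.toUnsignedString(-1L, 7)));
    }
}
```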