Exploring the JDK Number Classes (3)

xpenxpenxpen

Article link: https://blog.csdn.net/xpenxpenxpen/article/details/2607811

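In this installment we look at Integer. The source of this class runs to more than 1,000 lines and took me a full day to study, and there is a lot of brilliant material in it.

The opening comment already shows a master's touch, and the class has three authors, no less.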

/**
 *
 * Implementation note:
 * The implementations of the "bit twiddling" methods
 * (such as highestOneBit and numberOfTrailingZeros) are based on
 * material from Henry S. Warren, Jr.'s Hacker's Delight
 * (Addison Wesley, 2002).
 */

The note says that some of the methods borrow ideas from the book Hacker's Delight. We will come back to those later.

Next, three arrays are defined:

	/**
	 * All possible chars for representing a number as a String
	 * (the digit characters 0-9 and a-z)
	 */
	final static char[] digits = {
		'0' , '1' , '2' , '3' , '4' , '5' ,
		'6' , '7' , '8' , '9' , 'a' , 'b' ,
		'c' , 'd' , 'e' , 'f' , 'g' , 'h' ,
		'i' , 'j' , 'k' , 'l' , 'm' , 'n' ,
		'o' , 'p' , 'q' , 'r' , 's' , 't' ,
		'u' , 'v' , 'w' , 'x' , 'y' , 'z'
	};

	final static char[] DigitTens = {
		'0', '0', '0', '0', '0', '0', '0', '0', '0', '0',
		'1', '1', '1', '1', '1', '1', '1', '1', '1', '1',
		'2', '2', '2', '2', '2', '2', '2', '2', '2', '2',
		'3', '3', '3', '3', '3', '3', '3', '3', '3', '3',
		'4', '4', '4', '4', '4', '4', '4', '4', '4', '4',
		'5', '5', '5', '5', '5', '5', '5', '5', '5', '5',
		'6', '6', '6', '6', '6', '6', '6', '6', '6', '6',
		'7', '7', '7', '7', '7', '7', '7', '7', '7', '7',
		'8', '8', '8', '8', '8', '8', '8', '8', '8', '8',
		'9', '9', '9', '9', '9', '9', '9', '9', '9', '9', };

	final static char[] DigitOnes = {
		'0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
		'0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
		'0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
		'0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
		'0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
		'0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
		'0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
		'0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
		'0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
		'0', '1', '2', '3', '4', '5', '6', '7', '8', '9', };

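The digits array stores 0-9 and a-z, which the String/int conversions below rely on. The digits 0-9 are easy to understand, since base 10 needs nothing more, but Java allows radixes up to 36, hence 10 digits plus 26 letters. The smallest radix is 2.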
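To make the radix limits concrete, here is a small sketch. It uses only the public `Integer.toString(int, int)`, `Character.forDigit` and the `Character` radix constants; the class name `RadixDemo` and the helper `digitFor` are made up for illustration, and `DIGITS` is my own copy of the table, not the JDK field.

```java
final class RadixDemo {
    // Same 36 symbols as the digits table above: '0'-'9' followed by 'a'-'z'.
    static final char[] DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz".toCharArray();

    /** Returns the character used for a digit value in the range 0..35. */
    static char digitFor(int value) {
        return DIGITS[value];
    }

    public static void main(String[] args) {
        // The legal radix range is defined by constants on Character.
        System.out.println(Character.MIN_RADIX);        // 2
        System.out.println(Character.MAX_RADIX);        // 36

        System.out.println(Integer.toString(255, 2));   // 11111111
        System.out.println(Integer.toString(255, 16));  // ff
        System.out.println(Integer.toString(35, 36));   // z
        // An out-of-range radix silently falls back to base 10.
        System.out.println(Integer.toString(255, 99));  // 255

        System.out.println(digitFor(35) == Character.forDigit(35, 36)); // true
    }
}
```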

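What are DigitTens and DigitOnes for? They cache the digits of the 100 numbers from 0 to 99:

DigitOnes --> the ones digit of each of the 100 numbers; DigitTens --> the tens digit of each of the 100 numbers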
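The idea behind the two tables can be sketched as follows. This is an illustration only: instead of spelling out 100 literals as the JDK does, the hypothetical class `TwoDigitTables` fills its own `TENS` and `ONES` arrays in a loop, and `twoDigits` is a made-up helper.

```java
final class TwoDigitTables {
    static final char[] TENS = new char[100];
    static final char[] ONES = new char[100];

    static {
        // Precompute both digits of every value 0..99 once, so that later
        // lookups need no division or remainder at all.
        for (int r = 0; r < 100; r++) {
            TENS[r] = (char) ('0' + r / 10);
            ONES[r] = (char) ('0' + r % 10);
        }
    }

    /** Returns the two decimal digits of r (0..99) as a 2-character string. */
    static String twoDigits(int r) {
        return new String(new char[] { TENS[r], ONES[r] });
    }

    public static void main(String[] args) {
        System.out.println(twoDigits(47)); // 47
        System.out.println(twoDigits(5));  // 05
    }
}
```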

Next comes the toString method, which converts an int to a String. The radix parameter is the base (2 through 36 are allowed).

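	public static String toString(int i, int radix) {

		if (radix < Character.MIN_RADIX || radix > Character.MAX_RADIX)
			radix = 10; // outside the range 2..36: fall back to base 10

		/* Radix 10 uses the fast version of toString, covered later */
		if (radix == 10) {
			return toString(i);
		}

		char buf[] = new char[33];
		boolean negative = (i < 0);
		int charPos = 32; // fill the buffer from the last position backwards

		if (!negative) {
			i = -i; // work on the negative value (the reason is explained below)
		}

		while (i <= -radix) {
			buf[charPos--] = digits[-(i % radix)];
			i = i / radix;
		} // the core algorithm: keep dividing by the radix and taking the remainder,
		  // just like the pencil-and-paper base conversion from a first computing class
		buf[charPos] = digits[-i];

		// Turning positives into negatives is deliberate. Why not turn negatives into positives?
		// Think about MIN_VALUE: negating it overflows, because +2147483648 does not fit in an int.

		if (negative) {
			buf[--charPos] = '-';
		}

		return new String(buf, charPos, (33 - charPos));
		// String constructor:
		// public String(char value[], int offset, int count); to be studied later
	}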
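The point about MIN_VALUE is easy to check. The sketch below re-implements the same negative-side loop in a hypothetical class `NegativeAccumulation` (it uses `Character.forDigit` instead of the private digits table and assumes the radix is already within 2..36), then shows that negating `Integer.MIN_VALUE` yields `Integer.MIN_VALUE` again while the negative-side conversion still works.

```java
final class NegativeAccumulation {
    /** Radix conversion done on the negative side, so MIN_VALUE needs no special case. */
    static String toRadixString(int i, int radix) {
        char[] buf = new char[33];           // up to 32 binary digits plus a sign
        boolean negative = i < 0;
        int pos = 32;
        if (!negative) {
            i = -i;                          // -MAX_VALUE always fits in an int
        }
        while (i <= -radix) {
            // For negative i the remainder is <= 0, so negate it to get the digit value.
            buf[pos--] = Character.forDigit(-(i % radix), radix);
            i /= radix;
        }
        buf[pos] = Character.forDigit(-i, radix);
        if (negative) {
            buf[--pos] = '-';
        }
        return new String(buf, pos, 33 - pos);
    }

    public static void main(String[] args) {
        // Negating MIN_VALUE overflows and gives MIN_VALUE back.
        System.out.println(-Integer.MIN_VALUE == Integer.MIN_VALUE); // true
        System.out.println(toRadixString(Integer.MIN_VALUE, 16));    // -80000000
        System.out.println(toRadixString(255, 16));                  // ff
    }
}
```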

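The method above converts between bases using division and remainder, and both operations are comparatively slow on a computer.

When the radix is 2, 8, or 16, there is a better way: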

	public static String toHexString(int i) {
		return toUnsignedString(i, 4);
	}

	public static String toOctalString(int i) {
		return toUnsignedString(i, 3);
	}

	public static String toBinaryString(int i) {
		return toUnsignedString(i, 1);
	}

	/**
	 * Convert the integer to an unsigned number.
	 */
	private static String toUnsignedString(int i, int shift) {
		char[] buf = new char[32];
		int charPos = 32;
		int radix = 1 << shift; // shifts instead of arithmetic: shifting left by 1 multiplies by 2, shifting right by 1 divides by 2
		int mask = radix - 1;
		do {
			buf[--charPos] = digits[i & mask]; // neat: applying the mask extracts the digit directly
			i >>>= shift;
		} while (i != 0);

		return new String(buf, charPos, (32 - charPos));
	}

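toUnsignedString above uses neither division nor remainder, so in theory it is more efficient. Note what "unsigned" means in its comment: the int is treated as a 32-bit unsigned bit pattern. Negative inputs are accepted too, but they are printed in two's-complement form without a minus sign; for example, toHexString(-1) returns "ffffffff".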
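To see how the mask-and-shift loop treats negative inputs, here is a minimal sketch. The class `UnsignedStringDemo` is hypothetical and uses `Character.forDigit` in place of the private digits table; the results are compared with the public `Integer.toHexString` and `Integer.toString`.

```java
final class UnsignedStringDemo {
    /** shift is 1, 3 or 4 for binary, octal or hexadecimal output. */
    static String toUnsignedString(int i, int shift) {
        char[] buf = new char[32];
        int pos = 32;
        int radix = 1 << shift;
        int mask = radix - 1;
        do {
            buf[--pos] = Character.forDigit(i & mask, radix);
            i >>>= shift;   // unsigned shift: zeros enter from the left, so the loop ends
        } while (i != 0);
        return new String(buf, pos, 32 - pos);
    }

    public static void main(String[] args) {
        System.out.println(toUnsignedString(255, 4));   // ff
        System.out.println(toUnsignedString(-1, 4));    // ffffffff
        System.out.println(Integer.toHexString(-1));    // ffffffff
        // The signed, division-based conversion prints a minus sign instead.
        System.out.println(Integer.toString(-1, 16));   // -1
    }
}
```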

Earlier I mentioned that radix 10 has a better algorithm. The fast version of toString is below, starting with a pile of comments:

	// I use the "invariant division by multiplication" trick to
	// accelerate Integer.toString.  In particular we want to
	// avoid division by 10.
	//
	// The "trick" has roughly the same performance characteristics
	// as the "classic" Integer.toString code on a non-JIT VM.
	// The trick avoids .rem and .div calls but has a longer code
	// path and is thus dominated by dispatch overhead.  In the
	// JIT case the dispatch overhead doesn't exist and the
	// "trick" is considerably faster than the classic code.
	//
	// TODO-FIXME: convert (x * 52429) into the equiv shift-add
	// sequence.
	//
	// RE:  Division by Invariant Integers using Multiplication
	//      T Gralund, P Montgomery
	//      ACM PLDI 1994

So the author is using a trick again in the passage above, and citing an ACM paper at that? This looks far from simple.

	public static String toString(int i) {
		if (i == Integer.MIN_VALUE)
			return "-2147483648";
		// calls stringSize
		int size = (i < 0) ? stringSize(-i) + 1 : stringSize(i);
		char[] buf = new char[size];
		// calls getChars
		getChars(i, size, buf);
		// package-private String constructor that shares buf instead of copying it
		return new String(0, size, buf);
	}

	// The core: getChars
	static void getChars(int i, int index, char[] buf) {
		int q, r;
		int charPos = index;
		char sign = 0;

		if (i < 0) {
			sign = '-';
			i = -i;
		}

		// Generate two digits per iteration
		// (two digits at a time, until i < 65536)
		while (i >= 65536) {
			q = i / 100;
			// really: r = i - (q * 100);
			r = i - ((q << 6) + (q << 5) + (q << 2));
			// r = i - ((q * 64) + (q * 32) + (q * 4));
			i = q;
			buf[--charPos] = DigitOnes[r];
			buf[--charPos] = DigitTens[r];
		}

		// Fall thru to fast mode for smaller numbers
		// assert(i <= 65536, i);
		// a fast algorithm once i <= 65536?
		for (;;) {
			q = (i * 52429) >>> (16 + 3); // the key to the trick
			// the line above should be equivalent to q = i / 10, avoiding a division for speed
			r = i - ((q << 3) + (q << 1)); // r = i-(q*10) ...
			buf[--charPos] = digits[r];
			i = q;
			if (i == 0)
				break;
		}
		if (sign != 0) {
			buf[--charPos] = sign;
		}

		// The key trick, q = (i * 52429) >>> (16 + 3), is equivalent to q = i / 10 and avoids a division.
		// Since 1 << (16 + 3) = 1 << 19 = 524288,
		// (i * 52429) >>> (16 + 3) = i * 52429 / 524288, and
		// 52429.0 / 524288 = 0.1000003814697...
		// That is precise enough for the values that reach this loop (i <= 65536):
		// the truncated result equals i / 10.
		// See the original thread, which analyzes this well:
		// http://topic.csdn.net/t/20060927/19/5052712.html
	}



	final static int[] sizeTable = {
		9, 99, 999, 9999, 99999, 999999, 9999999,
			99999999, 999999999, Integer.MAX_VALUE };
	// Another lookup table? sizeTable

	// Requires positive x
	static int stringSize(int x) {
		for (int i = 0;; i++)
			if (x <= sizeTable[i])
				return i + 1; // so the digit count is found by comparing against the table
	}

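Impressive, isn't it? Just to avoid dividing by 10 and taking the remainder mod 10, the code adopts this algorithm and even cites an ACM paper, Division by Invariant Integers using Multiplication. The author's comment does add that on a non-JIT Java VM the trick performs about the same as the classic code, because its code path is longer.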
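The claim that the multiply-and-shift expression equals i / 10 can be checked by brute force over the range that actually reaches the fast loop. The sketch below is a verification aid only; the class `DivideByTenTrick` and its methods are made-up names. It also shows why the unsigned shift `>>>` is needed: for larger inputs the product no longer fits in a signed int.

```java
final class DivideByTenTrick {
    /** q = i / 10 computed without a division; intended for 0 <= i <= 65536. */
    static int divideBy10(int i) {
        return (i * 52429) >>> 19;   // 52429 is 2^19 / 10 rounded up
    }

    /** Returns the first i in [0, limit] where the trick disagrees with i / 10, or -1 if none. */
    static int firstMismatch(int limit) {
        for (int i = 0; i <= limit; i++) {
            if (divideBy10(i) != i / 10) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(firstMismatch(65536));   // -1
        // The product wraps to a negative int here, so a signed shift (>>) would give a wrong quotient.
        System.out.println(65535 * 52429 < 0);      // true
        System.out.println(divideBy10(65535));      // 6553
    }
}
```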

So far we have gone from int to String. Next comes the reverse direction, from String to int.

	/**
	 * The core algorithm. Remember that Byte.parseByte() calls this?
	 */
	public static int parseInt(String s, int radix)
			throws NumberFormatException {
		if (s == null) {
			throw new NumberFormatException("null");
		}

		if (radix < Character.MIN_RADIX) {
			throw new NumberFormatException("radix " + radix
					+ " less than Character.MIN_RADIX");
		}

		if (radix > Character.MAX_RADIX) {
			throw new NumberFormatException("radix " + radix
					+ " greater than Character.MAX_RADIX");
		}

		int result = 0;
		boolean negative = false;
		int i = 0, max = s.length();
		int limit;
		int multmin;
		int digit;

		if (max > 0) {
			if (s.charAt(0) == '-') {
				negative = true;
				limit = Integer.MIN_VALUE;
				i++;
			} else {
				limit = -Integer.MAX_VALUE;
			}
			multmin = limit / radix; // if result < multmin, result * radix would fall below limit (overflow)
			if (i < max) { // first digit
				// Character.digit(char, int) returns a negative value
				// when the char is not a valid digit in the given radix
				digit = Character.digit(s.charAt(i++), radix);
				if (digit < 0) {
					throw NumberFormatException.forInputString(s);
				} else {
					result = -digit;
				}
			}
			while (i < max) { // loop from the second digit on
				// Accumulating negatively avoids surprises near MAX_VALUE
				digit = Character.digit(s.charAt(i++), radix);
				if (digit < 0) {
					throw NumberFormatException.forInputString(s);
				}
				if (result < multmin) {
					throw NumberFormatException.forInputString(s);
				}
				result *= radix; // core step 1
				if (result < limit + digit) {
					throw NumberFormatException.forInputString(s);
				}
				result -= digit; // core step 2

				// The core is just these two statements:
				// result *= radix;
				// result -= digit;
				// everything else is overflow checking
			}
		} else { // empty string: throw
			throw NumberFormatException.forInputString(s);
		}
		if (negative) {
			if (i > 1) {
				return result;
			} else { /* Only got "-" */
				throw NumberFormatException.forInputString(s);
			}
		} else {
			return -result;
		}
	}
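The role of limit and multmin becomes clearer in a stripped-down version. The following is a simplified sketch, not the JDK code: the class `NegativeParse` and its `parse` method are hypothetical, radix validation is omitted (the radix is assumed to be in 2..36), and the first digit is handled inside the loop. It keeps the two overflow checks and the negative accumulation.

```java
final class NegativeParse {
    /** Parses an optional '-' followed by digits in the given radix, accumulating negatively. */
    static int parse(String s, int radix) {
        if (s == null || s.isEmpty()) {
            throw new NumberFormatException("empty input");
        }
        boolean negative = s.charAt(0) == '-';
        int i = negative ? 1 : 0;
        if (i == s.length()) {
            throw new NumberFormatException("no digits: " + s);
        }
        // The most negative value the accumulator may reach.
        int limit = negative ? Integer.MIN_VALUE : -Integer.MAX_VALUE;
        // Any result below this would fall past limit when multiplied by radix.
        int multmin = limit / radix;
        int result = 0;
        while (i < s.length()) {
            int digit = Character.digit(s.charAt(i++), radix);
            if (digit < 0 || result < multmin) {
                throw new NumberFormatException(s);
            }
            result *= radix;
            if (result < limit + digit) {   // subtracting digit would fall past limit
                throw new NumberFormatException(s);
            }
            result -= digit;
        }
        return negative ? result : -result;
    }

    public static void main(String[] args) {
        System.out.println(parse("-2147483648", 10)); // -2147483648
        System.out.println(parse("2147483647", 10));  // 2147483647
        System.out.println(parse("ff", 16));          // 255
        try {
            parse("2147483648", 10);                  // one past MAX_VALUE
        } catch (NumberFormatException e) {
            System.out.println("overflow rejected");  // overflow rejected
        }
    }
}
```

Because the accumulator is always non-positive, "-2147483648" fits without a special case, whereas a positive accumulator could not hold 2147483648.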

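I will not analyze Integer.decode; its algorithm is exactly the same as Byte.decode.

The methods that follow were all added in JDK 1.5. They are the "bit twiddling" methods, and they draw on the book mentioned in the opening comment, Hacker's Delight.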

	public static int highestOneBit(int i) {
		// HD, Figure 3-1
		// (HD = Hacker's Delight)
		// I looked at the book, and it does not seem to explain this algorithm either. Where does it come from?

		i |= (i >> 1);
		i |= (i >> 2);
		i |= (i >> 4);
		i |= (i >> 8);
		i |= (i >> 16);
		return i - (i >>> 1);
	}

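What highestOneBit actually does is keep the leftmost 1 of the binary number and set every bit after it to 0:

0000 1000 .... 1111 0000 returns 0000 1000 .... 0000 0000

0100 1000 .... 1111 0000 returns 0100 0000 .... 0000 0000

In other words, write the integer as a sum of powers of 2 and take the largest term. For a positive number that is the largest power of 2 not exceeding it.

200 = 128 + 64 + ... gives 128

1000 = 512 + 256 + ... gives 512

See the original thread, which analyzes this well: http://topic.csdn.net/t/20040922/00/3396195.html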
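One plausible reading of the code, offered as my interpretation rather than something taken from the book: each `i |= (i >> k)` step copies the highest 1 bit into positions further to its right, so after the five steps every bit below the highest 1 is set. Subtracting `i >>> 1` then cancels everything except the top bit. The sketch below splits the method in two to print the intermediate value; the class `HighestOneBitTrace` and the helper `smear` are made-up names.

```java
final class HighestOneBitTrace {
    /** Sets every bit to the right of the highest 1 bit. */
    static int smear(int i) {
        i |= (i >> 1);    // the top 1 now covers 2 positions
        i |= (i >> 2);    // 4 positions
        i |= (i >> 4);    // 8 positions
        i |= (i >> 8);    // 16 positions
        i |= (i >> 16);   // all 32 positions
        return i;
    }

    static int highestOneBit(int i) {
        int smeared = smear(i);
        // smeared looks like 0...0111...1; removing its lower half leaves 0...0100...0.
        return smeared - (smeared >>> 1);
    }

    public static void main(String[] args) {
        System.out.println(Integer.toBinaryString(200));            // 11001000
        System.out.println(Integer.toBinaryString(smear(200)));     // 11111111
        System.out.println(highestOneBit(200));                     // 128
        System.out.println(highestOneBit(1000));                    // 512
        System.out.println(highestOneBit(0));                       // 0
        // For a negative input the highest 1 is the sign bit.
        System.out.println(highestOneBit(-1) == Integer.MIN_VALUE); // true
    }
}
```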

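I could not find out how this algorithm was derived, so I will not paste the code of the remaining methods and will only describe what each one does.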

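public static int lowestOneBit(int i) // keeps the rightmost 1 of the binary number and sets every bit before it to 0

public static int numberOfLeadingZeros(int i) // number of 0 bits before the leftmost 1

public static int numberOfTrailingZeros(int i) // number of 0 bits after the rightmost 1

public static int bitCount(int i) // number of 1 bits

public static int rotateLeft(int i, int distance) // rotates left by N bits; bits shifted out on the left re-enter on the right

public static int rotateRight(int i, int distance) // rotates right by N bits; bits shifted out on the right re-enter on the left

public static int reverse(int i) // reverses the order of the bits; note that this is not flipping 0s and 1s

public static int signum(int i) // the signum function of the given int: -1 if the value is negative, 0 if it is zero, 1 if it is positive

public static int reverseBytes(int i) // reverse works bit by bit, whereas reverseBytes reverses the order of the bytes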
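A few concrete values make these descriptions easier to picture. The sketch below only calls the public `java.lang.Integer` methods listed above; the class name `BitMethodsDemo` is made up.

```java
final class BitMethodsDemo {
    public static void main(String[] args) {
        int x = 200; // binary 1100 1000 = 128 + 64 + 8
        System.out.println(Integer.lowestOneBit(x));          // 8
        System.out.println(Integer.numberOfLeadingZeros(x));  // 24
        System.out.println(Integer.numberOfTrailingZeros(x)); // 3
        System.out.println(Integer.bitCount(x));              // 3

        // Rotation: the bit pushed out on one side re-enters on the other.
        System.out.println(Integer.toHexString(Integer.rotateLeft(0x80000001, 1)));  // 3
        System.out.println(Integer.toHexString(Integer.rotateRight(1, 1)));          // 80000000

        // reverse mirrors the 32 bits; reverseBytes mirrors the 4 bytes.
        System.out.println(Integer.toHexString(Integer.reverse(1)));                 // 80000000
        System.out.println(Integer.toHexString(Integer.reverseBytes(0x12345678)));   // 78563412

        System.out.println(Integer.signum(-5));               // -1
    }
}
```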

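I did not understand this code; it is too deep for me, and I hope someone more expert can explain it.

As for Hacker's Delight, the Chinese edition is titled 《高效程序的奥秘》.

The book is aimed at developers of libraries and compilers and at anyone who pursues elegant programming, and it works as a reference for senior undergraduates and graduate students in computer science. It explains, clearly and intuitively, the deeper and more hidden techniques of computer arithmetic and collects a variety of programming tricks, including small algorithms for common tasks, power-of-2 boundaries and bounds checking, rearranging bits and bytes, integer division and division by constants, elementary functions on integers, Gray codes, Hilbert space-filling curves, and formulas for primes.

Deep stuff, right? By studying the JDK source we pick up English, data structures, programming technique, and mathematics, and if we dig further there are design patterns as well. It improves us on many fronts, so why not?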
