Comparing String.split(), StringTokenizer, and indexOf()

Latest recommended article published 2020-07-31 20:28:32

BenBHX

Category: Java. Tags: Spring, algorithms, Apache


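To turn a string into a string array according to some delimiter, String.split(String) is the method that comes to mind first.

split() is certainly convenient, but it treats its argument as a regular expression, and in performance-critical code that convenience comes at a cost.

java.util.StringTokenizer can be used in place of String.split() and gives a measurable speedup.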
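Before swapping one for the other, note that the two do not return the same tokens in every case. The sketch below is only an illustration (the class name SplitVsTokenizer and the helper tokenize are made up for this example): split() keeps empty strings between adjacent delimiters and drops trailing ones, while StringTokenizer never returns an empty token.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

public class SplitVsTokenizer {
    // Hypothetical helper: collects every token StringTokenizer produces.
    static List<String> tokenize(String s, String delims) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(s, delims);
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "a,,b,";
        // split() keeps the empty string between the two commas
        // and drops the trailing empty string.
        System.out.println(Arrays.asList(input.split(","))); // [a, , b]
        // StringTokenizer skips empty tokens entirely.
        System.out.println(tokenize(input, ","));            // [a, b]
    }
}
```

For input without empty fields, such as the benchmark string used in this article, both produce the same tokens.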

The example below measures the cost of both, along with a hand-written indexOf() scan.

	import java.util.StringTokenizer;
	import java.util.Vector;

	public class SplitBenchmark {
		public static void main(String[] args) {
			// Prepare the input: "abc," repeated 10000 times
			StringBuilder buffer = new StringBuilder();
			for (int i = 0; i < 10000; i++) {
				buffer.append("abc").append(",");
			}
			String str = buffer.toString();

			// java.util.StringTokenizer
			long curTime = System.currentTimeMillis();
			for (int m = 0; m < 1000; m++) {
				StringTokenizer token = new StringTokenizer(str, ",");
				String[] array2 = new String[token.countTokens()];
				int i = 0;
				while (token.hasMoreTokens()) {
					array2[i++] = token.nextToken();
				}
			}
			System.out.println("java.util.StringTokenizer : " + (System.currentTimeMillis() - curTime));

			// String.split()
			curTime = System.currentTimeMillis();
			for (int m = 0; m < 1000; m++) {
				String[] array = str.split(",");
			}
			System.out.println("String.split : " + (System.currentTimeMillis() - curTime));

			// Vector & indexOf
			curTime = System.currentTimeMillis();
			for (int n = 0; n < 1000; n++) {
				Vector<String> vector = new Vector<String>();
				int index, offset = 0;
				// Search from offset so that a delimiter at position 0 is not skipped.
				// Text after the last delimiter is ignored; the input ends with ",",
				// so no token is lost here.
				while ((index = str.indexOf(",", offset)) != -1) {
					vector.addElement(str.substring(offset, index));
					offset = index + 1;
				}
				String[] array3 = vector.toArray(new String[0]);
			}
			System.out.println("Vector & indexOf : " + (System.currentTimeMillis() - curTime));
		}
	}

Output (milliseconds):

java.util.StringTokenizer : 1407
String.split : 2546
Vector & indexOf : 1094

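The difference is obvious: StringTokenizer is nearly twice as fast as String.split() here (1407 ms versus 2546 ms).

Scanning step by step with indexOf() takes about 22% less time again (1094 ms versus 1407 ms). The closer the code stays to the low-level primitives, the better it performs.

That said, this only really matters when performance requirements are high. For ordinary applications, String.split() is good enough.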
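The timings are only comparable if the three approaches produce the same tokens. A minimal sketch that checks this on a shorter input of the same shape follows; the class and method names are made up for this example, and ArrayList stands in for the Vector used in the benchmark.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

public class SplitConsistencyCheck {
    // Same logic as the StringTokenizer loop in the benchmark.
    static String[] withTokenizer(String s) {
        StringTokenizer token = new StringTokenizer(s, ",");
        String[] result = new String[token.countTokens()];
        int i = 0;
        while (token.hasMoreTokens()) {
            result[i++] = token.nextToken();
        }
        return result;
    }

    // Same logic as the indexOf loop, with ArrayList instead of Vector.
    // Text after the last delimiter is ignored, as in the benchmark.
    static String[] withIndexOf(String s) {
        List<String> parts = new ArrayList<>();
        int offset = 0;
        int index;
        while ((index = s.indexOf(",", offset)) != -1) {
            parts.add(s.substring(offset, index));
            offset = index + 1;
        }
        return parts.toArray(new String[0]);
    }

    // Builds "abc," repeated the given number of times.
    static String buildInput(int repeats) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < repeats; i++) {
            sb.append("abc").append(",");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String input = buildInput(5);
        String[] a = input.split(",");
        String[] b = withTokenizer(input);
        String[] c = withIndexOf(input);
        System.out.println(Arrays.equals(a, b) && Arrays.equals(b, c)); // true
        System.out.println(a.length);                                   // 5
    }
}
```

The three agree on this input because it contains no empty fields other than the one after the final comma, which all three discard.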

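One more point:

When scanning with String.indexOf(), collecting the tokens in an ArrayList or a Vector (the two perform about the same) is not the best option either.

A faster approach is to scan once to count the delimiters and then store the tokens directly in a String[]. This is roughly twice as fast as using a Vector.

Going further than that requires a better scanning algorithm.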

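	/**
	 * Splits s around each literal occurrence of delimiter (not a regular expression).
	 * Empty tokens, including leading and trailing ones, are kept.
	 * Returns null if s is null, and a one-element array holding s
	 * if delimiter is null or empty.
	 */
	public static String[] split(String s, String delimiter){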
		if (s == null) {
			return null;
		}
		int delimiterLength;
		int stringLength = s.length();
		if (delimiter == null || (delimiterLength = delimiter.length()) == 0){
			return new String[] {s};
		}

		// a two pass solution is used because a one pass solution would
		// require the possible resizing and copying of memory structures
		// In the worst case it would have to be resized n times with each
		// resize having a O(n) copy leading to an O(n^2) algorithm.

		int count;
		int start;
		int end;

		// Scan s and count the tokens.
		count = 0;
		start = 0;
		while((end = s.indexOf(delimiter, start)) != -1){
			count++;
			start = end + delimiterLength;
		}
		count++;

		// allocate an array to return the tokens,
		// we now know how big it should be
		String[] result = new String[count];

		// Scan s again, but this time pick out the tokens
		count = 0;
		start = 0;
		while((end = s.indexOf(delimiter, start)) != -1){
			result[count] = (s.substring(start, end));
			count++;
			start = end + delimiterLength;
		}
		end = stringLength;
		result[count] = s.substring(start, end);

		return (result);
	}
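A short usage sketch may help show how this two-pass split() differs from String.split(). The wrapper class TwoPassSplitDemo is made up for this example, and the method body is a condensed copy of the one above. Because the delimiter is matched literally and empty tokens are kept, the results differ from String.split() for trailing delimiters and for regex metacharacters such as ".".

```java
import java.util.Arrays;

public class TwoPassSplitDemo {
    // Condensed copy of the two-pass split shown above.
    public static String[] split(String s, String delimiter) {
        if (s == null) {
            return null;
        }
        if (delimiter == null || delimiter.length() == 0) {
            return new String[] {s};
        }
        int delimiterLength = delimiter.length();

        // First pass: count the tokens.
        int count = 1;
        int start = 0;
        int end;
        while ((end = s.indexOf(delimiter, start)) != -1) {
            count++;
            start = end + delimiterLength;
        }

        // Second pass: fill an array of exactly the right size.
        String[] result = new String[count];
        count = 0;
        start = 0;
        while ((end = s.indexOf(delimiter, start)) != -1) {
            result[count++] = s.substring(start, end);
            start = end + delimiterLength;
        }
        result[count] = s.substring(start);
        return result;
    }

    public static void main(String[] args) {
        // Trailing empty tokens are kept here but dropped by String.split().
        System.out.println(Arrays.toString(split("a,b,,", ",")));  // [a, b, , ]
        System.out.println(Arrays.toString("a,b,,".split(",")));   // [a, b]

        // "." is a literal delimiter here but a regex wildcard in String.split().
        System.out.println(Arrays.toString(split("a.b", ".")));    // [a, b]
        System.out.println("a.b".split(".").length);               // 0
    }
}
```

Whether keeping empty tokens is desirable depends on the caller; code that relies on String.split() dropping trailing empty strings would need to handle them after switching.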