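When writing Java programs you don't need to allocate and free memory by hand as in C; the JVM manages memory for you, which makes development more efficient. But if you are careless about certain details, your code can waste memory and perform poorly. This article uses strings as its example, because strings are the most frequently used data type and, moreover, Java strings are immutable (JDK 8 declaration shown):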
public final class String
implements java.io.Serializable, Comparable<String>, CharSequence {
/** The value is used for character storage. */
private final char value[];
... ...
}
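The advantage of this immutability is that strings are inherently thread-safe. It also brings problems, though. Operations such as concatenation and substring extraction cannot share the existing char array, so they produce extra, redundant String instances. More instances mean more memory in use and a heavier load on the JVM garbage collector. The following sections use the JMH (Java Microbenchmark Harness) benchmarking tool to compare the performance of various string operations.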
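A small illustration of the point above (the class and method names are made up for this example): every operation on a String returns a new instance and leaves the original untouched, so each intermediate result is one more object for the garbage collector.

```java
public class ImmutableDemo {
    // Each call below returns a new String; the input string is never modified.
    static String[] derive(String base) {
        String joined = base.concat(".ccc");        // new String: "aaa.bbb.ccc"
        String head = base.substring(0, 3);         // new String (since JDK 7u6 the chars are copied)
        String replaced = base.replace('.', '-');   // new String: "aaa-bbb"
        return new String[] {joined, head, replaced};
    }

    public static void main(String[] args) {
        String base = "aaa.bbb";
        String[] derived = derive(base);
        System.out.println(base);                        // aaa.bbb (unchanged)
        System.out.println(String.join(" | ", derived)); // aaa.bbb.ccc | aaa | aaa-bbb
        System.out.println(derived[0] == base);          // false: a distinct instance
    }
}
```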
I. String Concatenation
Test code:
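import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;
@BenchmarkMode(Mode.Throughput)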
@Warmup(iterations = 3)
@Measurement(iterations = 10, time = 5, timeUnit = TimeUnit.SECONDS)
@Threads(8)
@Fork(2)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
public class StringBuilderBenchmark {
@Benchmark
public void testStringAdd() {
String a = "";
for (int i = 0; i < 10; i++) {
a += i;
}
print(a);
}
@Benchmark
public void testStringBuilderAdd() {
StringBuilder sb = new StringBuilder();
for (int i = 0; i < 10; i++) {
sb.append(i);
}
print(sb.toString());
}
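// An empty sink does not stop the JIT from eliminating unused results;
// JMH recommends returning the value or passing it to a Blackhole instead.
private void print(String a) {
}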
public static void main(String[] args) throws RunnerException {
Options options = new OptionsBuilder()
.include(StringBuilderBenchmark.class.getSimpleName())
.output("./StringBuilderBenchmark.log")
.build();
new Runner(options).run();
}
}
Results:
Benchmark Mode Cnt Score Error Units
StringBuilderBenchmark.testStringAdd thrpt 20 22163.429 ± 537.729 ops/ms
StringBuilderBenchmark.testStringBuilderAdd thrpt 20 43400.877 ± 2447.492 ops/ms
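The results confirm that StringBuilder performs better than concatenating with +=: its throughput is roughly twice as high (about 43,401 vs 22,163 ops/ms).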
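To see where the gap comes from, the sketch below spells out roughly what JDK 8's javac generates for a += i inside a loop: a new StringBuilder and a new String on every iteration. Since JDK 9, javac emits an invokedynamic call to StringConcatFactory instead, but each iteration still produces a new String. The class name ConcatDesugar is made up for this example.

```java
public class ConcatDesugar {
    // Roughly what `a += i` in a loop compiles to under JDK 8:
    // a fresh StringBuilder and a fresh String on every iteration.
    static String plusInLoop(int n) {
        String a = "";
        for (int i = 0; i < n; i++) {
            a = new StringBuilder().append(a).append(i).toString();
        }
        return a;
    }

    // One builder reused for the whole loop; only the final toString() creates the result.
    static String builderInLoop(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append(i);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(plusInLoop(10));    // 0123456789
        System.out.println(builderInLoop(10)); // 0123456789
    }
}
```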
II. Splitting Strings
Test code:
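import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.regex.Pattern;
import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;
@BenchmarkMode(Mode.Throughput)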
@Warmup(iterations = 3)
@Measurement(iterations = 10, time = 5, timeUnit = TimeUnit.SECONDS)
@Threads(8)
@Fork(2)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@State(Scope.Benchmark)
public class StringSplitBenchmark {
private static final String regex = "\\.";
private static final char CHAR = '.';
private static final Pattern pattern = Pattern.compile(regex);
private String[] strings;
@Setup
public void prepare() {
strings = new String[20];
for(int i=0;i<strings.length;i++) {
strings[i] = System.currentTimeMillis() + ".aaa.bbb.ccc.ddd" + Math.random();
}
}
@Benchmark
public void testStringSplit() {
for(int i=0;i<strings.length;i++) {
strings[i].split(regex);
}
}
@Benchmark
public void testPatternSplit() {
for(int i=0;i<strings.length;i++) {
pattern.split(strings[i]);
}
}
@Benchmark
public void testCharSplit() {
for(int i=0;i<strings.length;i++) {
split(strings[i], CHAR, 6);
}
}
public static List<String> split(final String str, final char separatorChar, int expectParts) {
if (null == str) {
return null;
}
final int len = str.length();
if (len == 0) {
return Collections.emptyList();
}
final List<String> list = new ArrayList<String>(expectParts);
int i = 0;
int start = 0;
boolean match = false;
while (i < len) {
if (str.charAt(i) == separatorChar) {
if (match) {
list.add(str.substring(start, i));
match = false;
}
start = ++i;
continue;
}
match = true;
i++;
}
if (match) {
list.add(str.substring(start, i));
}
return list;
}
public static void main(String[] args) throws RunnerException {
Options options = new OptionsBuilder()
.include(StringSplitBenchmark.class.getSimpleName())
.output("./StringSplitBenchmark.log")
.build();
new Runner(options).run();
}
}
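One caveat before looking at the numbers: the hand-written split is not a drop-in replacement for String.split. It skips empty segments, while String.split keeps empty segments between consecutive separators and only drops trailing ones. The sketch below (hypothetical class name SplitSemantics) restates the same splitting rule compactly and contrasts the two on the input "a..b.".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SplitSemantics {
    // Same splitting rule as the split(str, separatorChar, expectParts) method above:
    // a segment is added only if it is non-empty.
    static List<String> charSplit(String str, char sep) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < str.length(); i++) {
            if (str.charAt(i) == sep) {
                if (i > start) {                 // skip empty segments
                    parts.add(str.substring(start, i));
                }
                start = i + 1;
            }
        }
        if (start < str.length()) {
            parts.add(str.substring(start));     // last segment
        }
        return parts;
    }

    public static void main(String[] args) {
        String input = "a..b.";
        System.out.println(Arrays.toString(input.split("\\."))); // [a, , b]
        System.out.println(charSplit(input, '.'));               // [a, b]
    }
}
```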
Results:
Benchmark Mode Cnt Score Error Units
StringSplitBenchmark.testCharSplit thrpt 20 872.048 ± 63.872 ops/ms
StringSplitBenchmark.testPatternSplit thrpt 20 534.371 ± 28.275 ops/ms
StringSplitBenchmark.testStringSplit thrpt 20 814.661 ± 115.653 ops/ms
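The results show that testCharSplit and testStringSplit perform about the same, while testPatternSplit, which uses a precompiled Pattern, is the slowest. This is not what we expected: String.split takes a regular expression, and a precompiled regular expression is normally faster, yet here it is not. To find out why, let's look at the JDK 8 implementation of String.split: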
public String[] split(String regex) {
return split(regex, 0);
}
public String[] split(String regex, int limit) {
/* fastpath if the regex is a
(1)one-char String and this character is not one of the
RegEx's meta characters ".$|()[{^?*+\\", or
(2)two-char String and the first char is the backslash and
the second is not the ascii digit or ascii letter.
*/
char ch = 0;
if ((
(regex.value.length == 1 && ".$|()[{^?*+\\".indexOf(ch = regex.charAt(0)) == -1) ||
(regex.length() == 2 && regex.charAt(0) == '\\' &&
(((ch = regex.charAt(1))-'0')|('9'-ch)) < 0 &&
((ch-'a')|('z'-ch)) < 0 &&
((ch-'A')|('Z'-ch)) < 0)) &&
(ch < Character.MIN_HIGH_SURROGATE ||
ch > Character.MAX_LOW_SURROGATE))
{
int off = 0;
int next = 0;
boolean limited = limit > 0;
ArrayList<String> list = new ArrayList<>();
while ((next = indexOf(ch, off)) != -1) {
if (!limited || list.size() < limit - 1) {
list.add(substring(off, next));
off = next + 1;
} else { // last one
//assert (list.size() == limit - 1);
list.add(substring(off, value.length));
off = value.length;
break;
}
}
// If no match was found, return this
if (off == 0)
return new String[]{this};
// Add remaining segment
if (!limited || list.size() < limit)
list.add(substring(off, value.length));
// Construct result
int resultSize = list.size();
if (limit == 0) {
while (resultSize > 0 && list.get(resultSize - 1).length() == 0) {
resultSize--;
}
}
String[] result = new String[resultSize];
return list.subList(0, resultSize).toArray(result);
}
return Pattern.compile(regex).split(this, limit);
}
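So String.split is already optimized and does not always split with a regular expression. If the regex is a single character that is not one of the metacharacters .$|()[{^?*+\, or a backslash followed by a character that is neither an ASCII digit nor an ASCII letter (like "\\." here), it takes a fast path based on indexOf and substring, much like the hand-written split. That is why testCharSplit and testStringSplit perform about the same.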
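The fast-path condition is easier to read when pulled out on its own. The sketch below restates the JDK 8 check shown above as a standalone predicate; isFastPath and isAsciiLetterOrDigit are names made up for this example, and other JDK versions may differ in detail.

```java
public class SplitFastPath {
    // Standalone restatement of the JDK 8 fast-path check in String.split(String, int).
    static boolean isFastPath(String regex) {
        char ch;
        if (regex.length() == 1) {
            ch = regex.charAt(0);
            if (".$|()[{^?*+\\".indexOf(ch) != -1) {
                return false;          // a regex metacharacter needs the real regex engine
            }
        } else if (regex.length() == 2 && regex.charAt(0) == '\\') {
            ch = regex.charAt(1);
            if (isAsciiLetterOrDigit(ch)) {
                return false;          // e.g. \d or \w is a character class, not a literal
            }
        } else {
            return false;              // every other form goes through Pattern
        }
        // Surrogate halves are excluded, as in the JDK.
        return ch < Character.MIN_HIGH_SURROGATE || ch > Character.MAX_LOW_SURROGATE;
    }

    static boolean isAsciiLetterOrDigit(char c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    public static void main(String[] args) {
        for (String regex : new String[] {"\\.", ".", ",", "\\d", "ab"}) {
            System.out.println(regex + " -> " + isFastPath(regex));
        }
    }
}
```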
III. String Replacement
Test code:
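import java.util.concurrent.TimeUnit;
import java.util.regex.Pattern;
import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;
@BenchmarkMode(Mode.Throughput)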
@Warmup(iterations = 3)
@Measurement(iterations = 10, time = 5, timeUnit = TimeUnit.SECONDS)
@Threads(8)
@Fork(2)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@State(Scope.Benchmark)
public class StringReplaceAllBenchmark {
private static final String EMPTY = "";
private static final String regex = "\\.";
private static final String CHAR = ".";
private static final Pattern pattern = Pattern.compile(regex);
private String[] strings;
@Setup
public void prepare() {
strings = new String[20];
for (int i = 0; i < strings.length; i++) {
strings[i] = System.currentTimeMillis() + ".aaa.bbb.ccc.ddd." + Math.random();
}
}
@Benchmark
public void testStringReplaceAll() {
for (int i = 0; i < strings.length; i++) {
strings[i].replaceAll(regex, EMPTY);
}
}
@Benchmark
public void testPatternReplaceAll() {
for (int i = 0; i < strings.length; i++) {
pattern.matcher(strings[i]).replaceAll(EMPTY);
}
}
@Benchmark
public void testCustomReplaceAll() {
for (int i = 0; i < strings.length; i++) {
replaceAll(strings[i], CHAR, EMPTY);
}
}
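public static String replaceAll(final String str, final String remove, final String replacement) {
if (null == str) {
return null;
}
final int len = str.length();
// An empty target would make indexOf match at the same offset forever, so return early.
if (len == 0 || null == remove || remove.isEmpty()) {
return str;
}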
final StringBuilder res = new StringBuilder(len);
int offset = 0;
int index;
while (true) {
index = str.indexOf(remove, offset);
if (index == -1) {
break;
}
res.append(str, offset, index);
if(null != replacement && replacement.length() >0) {
res.append(replacement);
}
offset = index + remove.length();
}
if(offset < len) {
res.append(str, offset, len);
}
return res.toString();
}
public static void main(String[] args) throws RunnerException {
String str = System.currentTimeMillis() + ".aaa.bbb.ccc.ddd." + Math.random();
String str1 = str.replaceAll(regex, EMPTY);
String str2 = pattern.matcher(str).replaceAll(EMPTY);
String str3 = replaceAll(str, CHAR, EMPTY);
System.out.println(str1);
System.out.println(str2);
System.out.println(str3);
Options options = new OptionsBuilder()
.include(StringReplaceAllBenchmark.class.getSimpleName())
.output("./StringReplaceAllBenchmark.log")
.build();
new Runner(options).run();
}
}
Results:
Benchmark Mode Cnt Score Error Units
StringReplaceAllBenchmark.testCustomReplaceAll thrpt 20 1167.891 ± 39.699 ops/ms
StringReplaceAllBenchmark.testPatternReplaceAll thrpt 20 438.079 ± 1.859 ops/ms
StringReplaceAllBenchmark.testStringReplaceAll thrpt 20 353.060 ± 11.177 ops/ms
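testPatternReplaceAll and testStringReplaceAll both replace via a regular expression, so their performance is in the same range; the precompiled Pattern is somewhat faster (about 438 vs 353 ops/ms) because String.replaceAll compiles the regex on every call. The hand-written indexOf-based replaceAll is roughly 2.7 times faster than even the precompiled Pattern. Regular expressions are very convenient for complex cases, but from a performance standpoint, avoid them whenever a simpler approach will do.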
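When the target is a single character, as in this benchmark, even the indexOf(String) search can be skipped. A minimal sketch, assuming the goal is only to delete one character (removeChar is a hypothetical helper and was not part of the benchmark above): it returns the original instance when there is nothing to remove, and otherwise allocates one char buffer plus the resulting String.

```java
public class CharRemoval {
    // Hypothetical helper: drop every occurrence of one char without regex or substring search.
    static String removeChar(String str, char remove) {
        if (str == null || str.indexOf(remove) == -1) {
            return str;                    // nothing to remove: reuse the original instance
        }
        char[] out = new char[str.length()];
        int n = 0;
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            if (c != remove) {
                out[n++] = c;              // keep every other character in order
            }
        }
        return new String(out, 0, n);
    }

    public static void main(String[] args) {
        System.out.println(removeChar("1.aaa.bbb.ccc.ddd.0.5", '.')); // 1aaabbbcccddd05
    }
}
```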
IV. Optimization in Practice: a Desensitization (Data Masking) Utility Class
Here is the code before optimization:
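import org.apache.commons.lang3.StringUtils;
public class DesensitizeUtils {
/**
* Masks the value according to its length (keeps a prefix and/or suffix)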
* @param value
* @return
*/
public static String desensitizeByLengthOld(String value) {
if (value.length() == 2) {
value = value.substring(0, 1) + "*";
} else if (value.length() == 3) {
value = value.substring(0, 1) + "*" + value.substring(value.length() - 1);
} else if (value.length() > 3 && value.length() <= 5) {
value = value.substring(0, 1) + "**" + value.substring(value.length() - 2);
} else if (value.length() > 5 && value.length() <= 7) {
value = value.substring(0, 2) + "***" + value.substring(value.length() - 2);
} else if (value.length() > 7) {
String str = "";
for(int i=0; i<value.length()-6; i++) {
str += "*";
}
value = value.substring(0, 3) + str + value.substring(value.length() - 3);
}
return value;
}
/**
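* Masking strategy for Chinese names:
* 0. One character or fewer: return as is
* 1. Two characters: hide the surname (the first character)
* 2. Three or more: keep only the first and last characters, replace the rest with asterisks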
* @param fullName
* @return
*/
public static String desensitizeChineseNameOld(final String fullName) {
if (StringUtils.isBlank(fullName)) {
return "";
}
if (fullName.length() <= 1) {
return fullName;
} else if (fullName.length() == 2) {
final String name = StringUtils.right(fullName, 1);
return StringUtils.leftPad(name, StringUtils.length(fullName), "*");
} else {
return StringUtils.left(fullName, 1).concat(StringUtils.removeStart(StringUtils.leftPad(StringUtils.right(fullName, 1), StringUtils.length(fullName), "*"), "*"));
}
}
}
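Next, let's optimize the code above.
1. Use constants where possible, but keep the number of constants small
1). Wherever the code above uses "*", "**" and "***", use a single '*' char constant instead.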
public class DesensitizeUtils {
private static final char DESENSITIZE_CODE = '*';
}
2). Similarly, replace return ""; in desensitizeChineseNameOld with return StringUtils.EMPTY;, the class constant provided by StringUtils.
if (StringUtils.isBlank(fullName)) {
return StringUtils.EMPTY;
}
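Note that string literals such as "" are interned by the JVM, so a literal is not re-instantiated on each call; StringUtils.EMPTY mainly improves readability and consistency. The real saving comes from the char constant: appended to a StringBuilder, it replaces "+" concatenations such as value.substring(0, 1) + "**" + ..., which create new String instances on every call, a cost that adds up under high concurrency.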
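The difference between a literal and an explicitly constructed string can be checked directly. String literals are interned (JLS §3.10.5), so every evaluation of "" yields the same instance, whereas new String("") allocates a new object each time. The class below is a made-up illustration.

```java
public class LiteralIdentity {
    static String literal() {
        return "";                 // refers to the interned "" from the constant pool
    }

    static String constructed() {
        return new String("");     // explicitly allocates a new String object on every call
    }

    public static void main(String[] args) {
        System.out.println(literal() == literal());         // true: same instance each call
        System.out.println(constructed() == constructed()); // false: two separate objects
    }
}
```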
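2. Use local variables to reduce method calls
Read the length once into a local variable instead of calling value.length() repeatedly. Before: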
if (value.length() == 2) {
} else if (value.length() == 3) {
} else if (value.length() > 3 && value.length() <= 5) {
} else if (value.length() > 5 && value.length() <= 7) {
} else if (value.length() > 7) {
}
After:
int length = value.length();
if (length == 2) {
} else if (length == 3) {
} else if (length > 3 && length <= 5) {
} else if (length > 5 && length <= 7) {
} else if (length > 7) {
}
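The optimized code is more concise. String.length() itself is cheap and is usually inlined by the JIT, so the gain in this particular case is small; but if the method were genuinely expensive, every repeated call would multiply the cost.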
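3. Understand the third-party libraries you rely on
To reuse code and save effort, we all use libraries written by others to some degree. Before using one, though, you should understand how it works and choose a solution that fits your actual situation, to avoid pitfalls.
1). The substring method
substring makes it easy to extract part of a string, but because strings are immutable, each call returns a new String (since JDK 7u6 it also copies the characters instead of sharing the parent's array). The following line therefore creates several String instances: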
value = value.substring(0, 2) + "***" + value.substring(length - 2);
Use StringBuilder's append(CharSequence s, int start, int end) method instead. In JDK 8 it is implemented as:
public AbstractStringBuilder append(CharSequence s, int start, int end) {
if (s == null)
s = "null";
if ((start < 0) || (start > end) || (end > s.length()))
throw new IndexOutOfBoundsException(
"start " + start + ", end " + end + ", s.length() "
+ s.length());
int len = end - start;
ensureCapacityInternal(count + len);
for (int i = start, j = count; i < end; i++, j++)
value[j] = s.charAt(i);
count += len;
return this;
}
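This method copies the characters one at a time in a for loop, which is not the most efficient approach. It would be better if the JDK offered a String overload that copies the whole range with String.getChars, along these lines (a hypothetical method; JDK 8 has no such overload):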
public AbstractStringBuilder append(String str, int start, int end) {
if (str == null)
str = "null";
if ((start < 0) || (start > end) || (end > str.length()))
throw new IndexOutOfBoundsException(
"start " + start + ", end " + end + ", str.length() "
+ str.length());
int len = end - start;
ensureCapacityInternal(count + len);
str.getChars(start, end, value, count); // replaces the for loop above
count += len;
return this;
}
After:
StringBuilder str = new StringBuilder(length);
str.append(value, 0, 2).append(DESENSITIZE_CODE).append(DESENSITIZE_CODE).append(DESENSITIZE_CODE).append(value, length - 2, length);
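2). The leftPad method used in the code above is not recommended here either: it calls into its other overloads and relies on substring and concat, which create extra String instances: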
public static String leftPad(final String str, final int size, String padStr) {
if (str == null) {
return null;
}
if (isEmpty(padStr)) {
padStr = SPACE;
}
final int padLen = padStr.length();
final int strLen = str.length();
final int pads = size - strLen;
if (pads <= 0) {
return str; // returns original String when possible
}
if (padLen == 1 && pads <= PAD_LIMIT) {
return leftPad(str, size, padStr.charAt(0));
}
if (pads == padLen) {
return padStr.concat(str);
} else if (pads < padLen) {
return padStr.substring(0, pads).concat(str);
} else {
final char[] padding = new char[pads];
final char[] padChars = padStr.toCharArray();
for (int i = 0; i < pads; i++) {
padding[i] = padChars[i % padLen];
}
return new String(padding).concat(str);
}
}
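Both points in this section come down to avoiding intermediate Strings. The sketch below applies the getChars idea at the call site instead, since JDK 8's StringBuilder has no String range overload like the one above: build the result in a single char[] and convert it once (new String(buf) still copies the buffer one time). maskRange and leftPad are hypothetical helpers, not JDK or commons-lang APIs. maskRange always keeps the input length, so it does not reproduce the length-4 and length-6 branches of desensitizeByLengthOld, and leftPad supports only a single pad character.

```java
import java.util.Arrays;

public class CharArrayHelpers {
    // Keep `head` leading and `tail` trailing chars and mask the rest with '*'.
    // Assumes head >= 0 and tail >= 0; the result always has the input's length.
    static String maskRange(String value, int head, int tail) {
        int len = value.length();
        if (head + tail >= len) {
            return value;                                   // nothing left to mask
        }
        char[] buf = new char[len];
        value.getChars(0, head, buf, 0);                    // copy the kept prefix in one call
        Arrays.fill(buf, head, len - tail, '*');            // mask the middle
        value.getChars(len - tail, len, buf, len - tail);   // copy the kept suffix in one call
        return new String(buf);
    }

    // Single-character left pad built in one char[]: no substring, concat or repeat().
    static String leftPad(String str, int size, char pad) {
        if (str == null) {
            return null;
        }
        int pads = size - str.length();
        if (pads <= 0) {
            return str;                                     // already long enough
        }
        char[] buf = new char[size];
        Arrays.fill(buf, 0, pads, pad);
        str.getChars(0, str.length(), buf, pads);
        return new String(buf);
    }

    public static void main(String[] args) {
        System.out.println(maskRange("akkadmmajkkakkajjk", 3, 3)); // akk************jjk
        System.out.println(leftPad("x", 3, '*'));                   // **x
    }
}
```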
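4. Using StringBuilder
1). As the benchmarks above show, use StringBuilder instead of "+" to build strings; this needs no further elaboration.
2). Give StringBuilder an initial capacity whenever possible
When the final length can be predicted, set the StringBuilder's capacity up front. If the string is shorter than the default capacity of 16 characters, this avoids allocating unused space; if it is longer, it avoids the cost of repeatedly growing the builder's internal char array.
3). StringBuilder has many append overloads, and it pays to know what each one is for; for example, use append(CharSequence s, int start, int end), as described above, instead of substring.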
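The effect of presizing can be observed through StringBuilder.capacity(). In this sketch (hypothetical class PresizedBuilder), a builder created with the exact final length never needs to grow, while a default builder starts with room for 16 characters.

```java
public class PresizedBuilder {
    // Builder sized to the exact final length: appending `count` chars never triggers a resize.
    static StringBuilder stars(int count) {
        StringBuilder sb = new StringBuilder(count);
        for (int i = 0; i < count; i++) {
            sb.append('*');
        }
        return sb;
    }

    public static void main(String[] args) {
        StringBuilder sb = stars(64);
        System.out.println(sb.length() + " chars, capacity " + sb.capacity()); // 64 chars, capacity 64
        System.out.println(new StringBuilder().capacity());                    // 16 (default)
    }
}
```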
5. The optimized code:
import org.apache.commons.lang3.StringUtils;
public class DesensitizeUtils {
private static final char DESENSITIZE_CODE = '*';
/**
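* Masks the value according to its length (keeps a prefix and/or suffix)
*
* @param value
* @return the masked value, or an empty string for blank input; as in desensitizeByLengthOld, inputs of length 4 or 6 yield one character more than the input, all other lengths are preserved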
*/
public static String desensitizeByLength(String value) {
if (StringUtils.isBlank(value)) {
return StringUtils.EMPTY;
}
int length = value.length();
if (length == 1) {
return value;
}
StringBuilder str = new StringBuilder(length);
switch (length) {
case 2:
str.append(value, 0, 1).append(DESENSITIZE_CODE);
break;
case 3:
str.append(value, 0, 1).append(DESENSITIZE_CODE).append(value, length - 1, length);
break;
case 4:
case 5:
str.append(value, 0, 1).append(DESENSITIZE_CODE).append(DESENSITIZE_CODE).append(value, length - 2, length);
break;
case 6:
case 7:
str.append(value, 0, 2).append(DESENSITIZE_CODE).append(DESENSITIZE_CODE).append(DESENSITIZE_CODE).append(value, length - 2, length);
break;
default:
str.append(value, 0, 3);
for (int i = 0; i < length - 6; i++) {
str.append(DESENSITIZE_CODE);
}
str.append(value, length - 3, length);
break;
}
return str.toString();
}
/**
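* Masking strategy for Chinese names:
* 0. One character or fewer: return as is
* 1. Two characters: hide the surname (the first character)
* 2. Three or more: keep only the first and last characters, replace the rest with asterisks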
*
* @param fullName
* @return
*/
public static String desensitizeChineseName(final String fullName) {
if (StringUtils.isBlank(fullName)) {
return StringUtils.EMPTY;
}
int length = fullName.length();
switch (length) {
case 1:
return fullName;
case 2:
StringBuilder str = new StringBuilder(2);
return str.append(DESENSITIZE_CODE).append(fullName, length - 1, length).toString();
default:
str = new StringBuilder(length);
str.append(fullName, 0, 1);
for (int i = 0; i < length - 2; i++) {
str.append(DESENSITIZE_CODE);
}
str.append(fullName, length - 1, length);
return str.toString();
}
}
}
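6. Performance comparison:
Test code:
// Members of DesensitizeUtilsBenchmark (JMH annotations as in the benchmarks above);
// assumes the old and new DesensitizeUtils methods are statically imported.
private static final String testString = "akkadmmajkkakkajjk";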
@Benchmark
public void testDesensitizeByLengthOld() {
desensitizeByLengthOld(testString);
}
@Benchmark
public void testDesensitizeChineseNameOld() {
desensitizeChineseNameOld(testString);
}
@Benchmark
public void testDesensitizeByLength() {
desensitizeByLength(testString);
}
@Benchmark
public void testDesensitizeChineseName() {
desensitizeChineseName(testString);
}
public static void main(String[] args) throws RunnerException {
Options options = new OptionsBuilder()
.include(DesensitizeUtilsBenchmark.class.getSimpleName())
.output("./DesensitizeUtilsBenchmark.log")
.build();
new Runner(options).run();
}
Results:
Benchmark Mode Cnt Score Error Units
DesensitizeUtilsBenchmark.testDesensitizeByLength thrpt 20 61460.601 ± 7262.830 ops/ms
DesensitizeUtilsBenchmark.testDesensitizeByLengthOld thrpt 20 11700.417 ± 1402.169 ops/ms
DesensitizeUtilsBenchmark.testDesensitizeChineseName thrpt 20 117560.449 ± 731.851 ops/ms
DesensitizeUtilsBenchmark.testDesensitizeChineseNameOld thrpt 20 39682.513 ± 463.306 ops/ms
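The test cases above are few and do not cover every situation, and these runs do not show how the changes affect garbage collection; JMH's GC profiler (-prof gc on the command line, or addProfiler(GCProfiler.class) on the OptionsBuilder) reports allocation rates and could be used for that. The results here are only meant as ideas for reference.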
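As a quick complement to a full benchmark, HotSpot exposes a per-thread allocation counter that gives a rough view of how much memory a piece of code allocates. A minimal sketch, assuming a HotSpot-based JDK where ManagementFactory.getThreadMXBean() can be cast to com.sun.management.ThreadMXBean; AllocationProbe, sink and allocatedBytes are names made up for this example. It measures allocated bytes, not GC time, and a single cold run includes one-time JVM work, so the numbers are indicative only.

```java
import java.lang.management.ManagementFactory;

public class AllocationProbe {
    // HotSpot-specific extension of java.lang.management.ThreadMXBean (module jdk.management).
    private static final com.sun.management.ThreadMXBean THREADS =
            (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();

    static volatile Object sink; // keeps results reachable so the allocations are not optimized away

    /** Bytes allocated by the calling thread while running task, or -1 if the JVM cannot report it. */
    static long allocatedBytes(Runnable task) {
        if (!THREADS.isThreadAllocatedMemorySupported() || !THREADS.isThreadAllocatedMemoryEnabled()) {
            return -1;
        }
        long id = Thread.currentThread().getId();
        long before = THREADS.getThreadAllocatedBytes(id);
        task.run();
        return THREADS.getThreadAllocatedBytes(id) - before;
    }

    public static void main(String[] args) {
        int stars = 12;
        long viaPlus = allocatedBytes(() -> {
            String s = "";
            for (int i = 0; i < stars; i++) {
                s += "*";                  // a new String on every iteration
            }
            sink = s;
        });
        long viaBuilder = allocatedBytes(() -> {
            StringBuilder sb = new StringBuilder(stars);
            for (int i = 0; i < stars; i++) {
                sb.append('*');
            }
            sink = sb.toString();
        });
        // Single cold runs: the figures include one-time JVM work such as class loading.
        System.out.println("+= loop: " + viaPlus + " bytes, StringBuilder: " + viaBuilder + " bytes");
    }
}
```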