Micro optimizations in Java. String.replaceAll

In this post, we will discuss another prevalent code construct, the String.replaceAll and String.replace methods, investigate how they affect the performance of your code in Java 11, and see what you can do about it.

(Please consider all the code below from the point of view of performance.)

(Please don't focus on the numbers; they are just metrics to prove the point.)

String.replaceAll

I don't like to make up examples, so this time we'll start right away with the existing codebase of the Spring Framework. Let's look at this line from the MetadataEncoder class of the spring-messaging module:

value = value.contains(".") ? value.replaceAll("\\.","%2E") : value;

Do you see what is wrong here?

All this code is trying to do is encode the dot symbol so that the result can later be passed to an HTTP URL. I was very lucky to find this particular snippet in a popular codebase: it packs several issues into a single line.

If you're an experienced developer, you already know that the String.replaceAll method takes a regular expression as its first parameter:

public String replaceAll(String regex, String replacement) {
    return Pattern.compile(regex).matcher(this)
            .replaceAll(replacement);
}

However, in the code above, we replace only the dot character. Very often, when you perform a replacement, you don't need any pattern matching. You can use a very similar but lighter method, String.replace:

public String replace(CharSequence target, CharSequence replacement)

This is the case, for example, when you need to replace a single word or a single character, as in our example.
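To make the difference concrete, here is a small sketch. The class and method names are made up for illustration and are not part of Spring. It shows that String.replaceAll interprets its first argument as a regular expression, so an unescaped dot matches every character, while String.replace treats the target literally.

```java
// Illustrative sketch; DotEncodingDemo and its methods are hypothetical names.
final class DotEncodingDemo {

    // Regex-based: the dot must be escaped, otherwise it matches any character.
    static String encodeWithReplaceAll(String value) {
        return value.replaceAll("\\.", "%2E");
    }

    // Literal replacement: the target is not interpreted as a regular expression.
    static String encodeWithReplace(String value) {
        return value.replace(".", "%2E");
    }

    public static void main(String[] args) {
        String value = "a.b";
        System.out.println(encodeWithReplaceAll(value));   // a%2Eb
        System.out.println(encodeWithReplace(value));      // a%2Eb
        // An unescaped dot is a regex wildcard, so every character is replaced.
        System.out.println(value.replaceAll(".", "%2E"));  // %2E%2E%2E
    }
}
```

Both helper methods produce the same output for this input; the difference is only in how the first argument is interpreted and how much work is done to get there.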

Note: In Java 8, the String.replace method used a Pattern internally, just like String.replaceAll. Since Java 9, that is no longer the case, which gives us a lot of room for optimization in existing codebases.

Also, the String.replace and String.replaceAll methods seem error-prone by design. When you start typing in the IDE and see both methods side by side, you might think that String.replace replaces only the first occurrence while String.replaceAll replaces all of them. So intuitively you choose String.replaceAll over String.replace.

Another interesting thing is that the code above already contains a micro-optimization: the value.contains(".") call. The string is first checked for the dot symbol to avoid pattern matching when there is nothing to replace.

Okay, let’s fix the example above:

value = value.replace(".", "%2E");

We can also try to keep the value.contains(".") check (which calls String.indexOf internally) and see whether it helps with String.replace in Java 11:

value = value.contains(".") ? value.replace(".", "%2E") : value;

Let’s write the benchmark:

Results (lower score means faster):

It looks like using the String.indexOf(".") check with String.replaceAll does make sense. Even for an empty input string, compiling and matching the pattern takes too much time: a ~50x difference is huge. The same applies to an input string without any dots. And when the actual replacement has to be performed, String.replace is about three times faster than String.replaceAll.

Also, it seems that the String.indexOf optimization no longer makes sense with String.replace in Java 11, whereas it was needed in Java 8, when String.replace used pattern matching internally. Now it even makes the code a bit slower. That's because the String.replace implementation already performs a String.indexOf search internally, so we end up doing the same work twice.
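The idea can be sketched as follows. This is a simplified, hypothetical helper written for this post, not the JDK source: a literal replace that starts with an indexOf search and returns the input untouched when nothing matches. With a method shaped like this, an extra contains or indexOf check before the call only repeats the first search.

```java
// Simplified illustration only; this is NOT the JDK implementation.
final class LiteralReplaceSketch {

    static String literalReplace(String value, String target, String replacement) {
        if (target.isEmpty()) {
            // Assumption of this sketch: empty targets are not supported.
            throw new IllegalArgumentException("target must not be empty in this sketch");
        }
        int index = value.indexOf(target);
        if (index < 0) {
            return value; // nothing to replace: the search itself acts as the guard
        }
        StringBuilder result = new StringBuilder(value.length() + replacement.length());
        int from = 0;
        while (index >= 0) {
            // Copy the text before the match, then the replacement.
            result.append(value, from, index).append(replacement);
            from = index + target.length();
            index = value.indexOf(target, from);
        }
        // Copy the remainder after the last match.
        return result.append(value, from, value.length()).toString();
    }

    public static void main(String[] args) {
        System.out.println(literalReplace("a.b.c", ".", "%2E")); // a%2Eb%2Ec
        System.out.println(literalReplace("abc", ".", "%2E"));   // abc
    }
}
```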

"But you can precompile the regular expression and reuse it to improve the performance of String.replaceAll," you might say. Agreed. In fact, we do that a lot at Blynk.
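A minimal sketch of that approach, assuming the same dot-encoding task as in the first example (the class and constant names are hypothetical): the Pattern is compiled once and stored in a static final field, and each call only creates a Matcher.

```java
import java.util.regex.Pattern;

// Illustrative sketch; PrecompiledDotEncoder is a made-up name.
final class PrecompiledDotEncoder {

    // Compiled once at class initialization instead of on every call.
    private static final Pattern DOT = Pattern.compile("\\.");

    static String encode(String value) {
        return DOT.matcher(value).replaceAll("%2E");
    }

    public static void main(String[] args) {
        System.out.println(encode("a.b.c")); // a%2Eb%2Ec
    }
}
```

Pattern instances are immutable and safe to share between threads, which is what makes the static field a reasonable choice here.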

Let's check how a precompiled pattern changes the numbers:

Results (lower score means faster):

Yes, the numbers are better now. However, they are still not as good as with String.replace. We could try to optimize the regular expression even further, but there are already enough posts about that.

You might think that the initial example is an isolated case and that such code is pretty rare. Let's look at GitHub:

GitHub just indexed some repositories, and on the first screen, five out of the six String.replaceAll usages could be replaced with String.replace! Yes, many projects are still on Java 8, where this won't make any difference. However, once most developers migrate to Java 11, there will be a lot of slow legacy code out there. We can start improving it right away.

StringUtils.replace

Before Java 11, when you had a hot path with String.replace, you had to look for faster alternatives in third-party libraries or even write your own custom version. The best-known third-party alternative is the Apache Commons StringUtils.replace method.

An example of a custom replace method can be found, for instance, in the Spring Framework.

Let’s look at another Spring code snippet:

String internalName = StringUtils.replace(className, ".", "/");

Do you see what is wrong here?

Let's check the Spring (latest source code), Apache Commons (commons-lang3 3.11, the latest version), and Java methods in our benchmark:

Results (lower score means faster):

Hm, it looks like all the methods are pretty close. Apache Commons is a bit slower, but that's because it has additional logic for handling case-insensitive replacement. So everything makes sense.

And now that the performance is similar, we no longer need a custom method or a third-party library to perform a fast String.replace in Java 11.

But something is still not quite right with this line:

return value.replace(".", "/");

Do you see what is wrong here?

Unlike the first example, where an actual string replacement happens, here both the search target and the replacement are a single character. And as we know, Java has a specialized overload for character replacement:

String replace(char oldChar, char newChar)

Let’s add it to our benchmark as well:

@Benchmark
public String replaceChar() {
    return value.replace('.', '/');
}

The Apache Commons library also has StringUtils.replaceChars, but it uses String.replace(char, char) internally, so we'll skip it. And we are one step closer to eliminating this third-party library from your project.

Results (lower score means faster):

For single-character replacement, the char-specialized overload is four times faster than the String.replace(String, String) overload and the custom Spring approach. The funny thing is that even in Java 8, String.replace(char, char) is optimized well enough, so Spring could safely use String.replace(char, char).

String.remove

I hope you aren't tired yet :) Let's look at the last code example:

value = value.replace(".", "");

Do you see what is wrong here?

Unfortunately, Java still doesn't have a String.remove method. As an alternative, we could use the String.replace(char, char) method, but Java doesn't have an empty character literal either, so we can't write code like this:

value.replace('.', '');

So instead, we have to use the "hack" above.
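For illustration, a dedicated character-removal helper might look roughly like the sketch below. The class and method names are hypothetical, and this is neither the Apache Commons nor the Spring implementation; it only shows the general shape of a single-pass removal that avoids String-based replacement.

```java
// Hypothetical helper for illustration; not taken from any library.
final class CharRemovalSketch {

    static String removeChar(String value, char toRemove) {
        if (value.indexOf(toRemove) < 0) {
            return value; // nothing to remove, reuse the original string
        }
        char[] chars = value.toCharArray();
        int kept = 0;
        // Compact the array in place: kept never runs ahead of the read position.
        for (char c : chars) {
            if (c != toRemove) {
                chars[kept++] = c;
            }
        }
        return new String(chars, 0, kept);
    }

    public static void main(String[] args) {
        System.out.println(removeChar("a.b.c", '.')); // abc
    }
}
```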

Fortunately, there are many third-party implementations out there, such as Apache Commons StringUtils.remove(String, char). Spring, for example, uses its own implementation, built on its custom replace method:

public static String delete(String inString, String pattern) {
    return StringUtils.replace(inString, pattern, "");
}

Let's check the Spring, Apache Commons, and Java methods in our benchmark again, this time for the remove operation:

Results (lower score means faster):

The specialized Apache Commons version wins by almost a factor of three. The interesting thing is that even this specialized and optimized char removal is slower than char replacement with String.replace(char, char). There is definitely still room for further improvement.

Hopefully, someday we’ll see that in Java.

Conclusions

  • Use Java 11

  • Use String.replace instead of String.replaceAll when possible

  • If you have to use String.replaceAll, try to precompile the regular expression in hot paths

  • Use the specialized String.replace(char, char) instead of String.replace(String, String) when you can

  • For hot paths, you'll still need to consider third-party libraries or custom methods instead of the String.replace(value, "") code pattern

Here is the source code of the benchmarks so that you can try them yourself.

Thank you for your attention, and stay tuned.

Previous post: Micro optimizations in Java. Good, nice and slow Enum

Translated from: https://medium.com/javarevisited/micro-optimizations-in-java-string-replaceall-c6d0edf2ef6
