String Deduplication in JDK 8u20

chipi6009

Original link: https://my.oschina.net/igeeker/blog/820886

This article gives a brief introduction to the String deduplication feature added in JDK 8u20.

String objects often take up a large share of an application's memory. Many of them are distinct objects whose contents are nevertheless identical (a != b, but a.equals(b)).

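The JDK provides String.intern() to avoid keeping multiple String objects with the same contents. Its drawback is that you first have to find out which strings are worth interning, and that usually takes a heap analysis tool that can detect duplicate strings, such as the YourKit profiler. Even so, used properly, string interning is a powerful memory-saving tool, because it lets you reuse whole String objects.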
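To make the "a != b, but a.equals(b)" situation concrete, here is a minimal sketch. The class name InternDemo, the helper build and the sample values are made up for illustration and do not come from the original article.

```java
final class InternDemo {
    // Builds a new String at run time, so the result is not a compile-time constant.
    static String build(String prefix, int n) {
        return new StringBuilder(prefix).append(n).toString();
    }

    public static void main(String[] args) {
        String a = build("user-", 42);
        String b = build("user-", 42);

        System.out.println(a == b);      // false: two distinct objects on the heap
        System.out.println(a.equals(b)); // true: identical contents

        // intern() returns one canonical instance per distinct content.
        System.out.println(a.intern() == b.intern()); // true
    }
}
```

After interning, the application can hold on to the canonical instance and let the duplicates become garbage.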

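Since Java 7 (update 6), each String object has its own private char[]. This lets the JVM optimize automatically: if the underlying char[] is never exposed to client code, the JVM can find two String objects with the same contents and make both of them use the same char[].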

Java 8u20 added the String deduplication feature:

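It requires the G1 garbage collector. Enable both with -XX:+UseG1GC -XX:+UseStringDeduplication. In JDK 8 the feature has no effect with any other collector.
It may run during a minor GC of the G1 collector. In my observations it depends on spare CPU cycles, so do not expect it to run in a compute-heavy data-crunching job. In a web server, however, it is quite likely to run.
String deduplication looks for strings it has not processed yet, calculates their hash codes (if the application code has not already done so), and then checks whether another string exists with the same hash code and equal contents. If it finds one, it replaces the char[] of the newly found string with the char[] of the existing string.
Only strings that have survived a few garbage collections are processed, which ensures that short-lived strings are never deduplicated. The minimum age is controlled by -XX:StringDeduplicationAgeThreshold=3 (3 is the default).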
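The steps above can be sketched as a toy model. This is not how HotSpot implements the feature: Str is a hypothetical stand-in for java.lang.String (application code cannot swap a real String's backing array), the age field and the DedupModel class are invented for illustration, and a plain HashMap replaces the JVM's internal deduplication table.

```java
import java.util.HashMap;
import java.util.Map;

final class DedupModel {
    /** Hypothetical stand-in for a String: a wrapper object plus its backing array. */
    static final class Str {
        char[] value;
        int age; // number of collections survived (invented for this model)

        Str(String contents, int age) {
            this.value = contents.toCharArray();
            this.age = age;
        }
    }

    // contents -> canonical backing array (HashMap compares by hash code, then equals)
    private final Map<String, char[]> table = new HashMap<>();
    private final int ageThreshold;

    DedupModel(int ageThreshold) {
        this.ageThreshold = ageThreshold;
    }

    /** Returns true if s was switched to an array that was already in the table. */
    boolean process(Str s) {
        if (s.age < ageThreshold) {
            return false; // too young: short-lived strings are skipped
        }
        char[] canonical = table.putIfAbsent(new String(s.value), s.value);
        if (canonical == null || canonical == s.value) {
            return false; // first sighting of these contents, or already shared
        }
        s.value = canonical; // the duplicate array becomes garbage
        return true;
    }

    public static void main(String[] args) {
        DedupModel model = new DedupModel(3);
        Str a = new Str("hello", 3);
        Str b = new Str("hello", 3);
        Str young = new Str("hello", 1);
        model.process(a);
        model.process(b);
        model.process(young);
        System.out.println(a != b);                 // true: still two wrapper objects
        System.out.println(a.value == b.value);     // true: one shared array
        System.out.println(young.value == a.value); // false: below the age threshold
    }
}
```

The model shows the property that matters later in the article: the wrapper objects stay distinct, and only the backing arrays are shared.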

This implementation has several consequences:

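To use String deduplication you have to run G1. You cannot combine it with the parallel GC, which is generally the better choice for applications that value throughput over latency.
String deduplication is unlikely to run on a heavily loaded system. To check whether it is actually running, start the JVM with -XX:+PrintStringDeduplicationStatistics and watch the console output.
If you need to save memory and can intern strings in your application, do that instead of relying on String deduplication. Keep in mind that deduplication has to process all (or at least most) String objects in the JVM: the JVM cannot know in advance that a given string is unique, so it tries to match it against the others anyway, and that costs CPU. Use -XX:+PrintStringDeduplicationStatistics to check the impact.
String deduplication is mostly non-blocking, so if you have enough spare CPU it is worth enabling.
Finally, String.intern lets you target only the strings known to contain duplicates. It usually looks them up in a much smaller pool of strings, which uses CPU more efficiently. It also interns the whole String object, which saves an extra 24 bytes per string.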
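Because the feature silently does nothing without the right flags, it can help to have the application report how it was started. The following is a minimal sketch under stated assumptions: DedupFlagCheck and its methods are hypothetical names, the check only looks at the arguments the runtime bean reports (it ignores a later -XX:-UseStringDeduplication), and the G1 check assumes HotSpot names its G1 collector beans with a "G1" prefix.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

final class DedupFlagCheck {
    /** True if the deduplication flag appears among the given JVM arguments. */
    static boolean dedupRequested(List<String> jvmArgs) {
        return jvmArgs.contains("-XX:+UseStringDeduplication");
    }

    /** True if any registered collector bean looks like a G1 collector (assumed naming). */
    static boolean usesG1() {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            if (gc.getName().startsWith("G1")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JVM arguments: " + jvmArgs);
        System.out.println("Deduplication flag present: " + dedupRequested(jvmArgs));
        System.out.println("G1 collector in use: " + usesG1());
    }
}
```

This only tells you what was requested. Whether deduplication actually runs still has to be read from the statistics output described above.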

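Below is a test class that exercises this feature. Each of the three tests runs until the JVM throws an OutOfMemoryError, so they have to be run one at a time.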

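The first test creates strings with unique contents. It is useful for estimating how long String deduplication takes when the heap holds a large number of strings. Give this test as much memory as you can; the more, the better.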

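The second test exercises String deduplication and the third exercises String.intern for comparison. Run both with the same Xmx setting. I set Xmx to 256M, but you can use more. You will see that the deduplication test fails first and the String.intern test fails later. The tests store only 100 unique strings, so with interning the only String memory needed is for those 100 objects. Deduplication, on the other hand, still leaves distinct String objects that merely share the underlying char arrays.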
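The difference between the two outcomes comes down to how many String objects stay alive. As a small-scale illustration (not part of the original test), the sketch below counts distinct object identities with and without interning. DistinctObjectCount and countObjects are hypothetical names, and the sizes are chosen only to keep the run short.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

final class DistinctObjectCount {
    /** Creates `total` strings over `distinctContents` values and counts the live objects. */
    static int countObjects(int total, int distinctContents, boolean intern) {
        // An identity-based set tells objects apart by reference, not by equals().
        Set<String> identities = Collections.newSetFromMap(new IdentityHashMap<>());
        for (int i = 0; i < total; i++) {
            String s = "a" + (i % distinctContents); // a new String object on every pass
            identities.add(intern ? s.intern() : s);
        }
        return identities.size();
    }

    public static void main(String[] args) {
        System.out.println(countObjects(10_000, 100, false)); // 10000 distinct objects
        System.out.println(countObjects(10_000, 100, true));  // 100 distinct objects
    }
}
```

Deduplication can shrink what the first case pays for the char arrays, but the 10000 String headers remain. Interning leaves only 100 objects in total.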


import java.util.ArrayList;
import java.util.List;

/**
 * String deduplication vs interning test
 */
public class StringDedupTest {
   private static final int MAX_EXPECTED_ITERS = 300;
   private static final int FULL_ITER_SIZE = 100 * 1000;
   //30M entries = 120M RAM (for 300 iters)
   private static final List<String> LIST = new ArrayList<>( MAX_EXPECTED_ITERS * FULL_ITER_SIZE );
   public static void main(String[] args) throws InterruptedException {
       //24+24 bytes per String (24 String shallow, 24 char[])
       //136M left for Strings
       //Unique, dedup
       //136M / 2.9M strings = 48 bytes (exactly String size)
       //Non unique, dedup
       //4.9M Strings, 100 char[]
       //136M / 4.9M strings = 27.75 bytes (close to 24 bytes per String + small overhead)
       //Non unique, intern
       //We use 120M (+small overhead for 100 strings) until very late, but can't extend ArrayList 3 times - we don't have 360M
       /*
       Run it with: -XX:+UseG1GC -XX:+UseStringDeduplication -XX:+PrintStringDeduplicationStatistics
       Give as much Xmx as you can on your box. This test will show you how long does it take to
       run a single deduplication and if it is run at all.
       To test when deduplication is run, try changing a parameter of Thread.sleep or comment it out.
       You may want to print garbage collection information using -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
       */
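       //Each call below loops until OutOfMemoryError: keep one call enabled and comment out the other two
       //Xmx256M - 29 iterations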
       fillUnique();
       /*
       This couple of tests compare string deduplication (first test) with string interning.
       Both tests should be run with the identical Xmx setting. I have tuned the constants in the program
       for Xmx256M, but any higher value is also good enough.
       The point of this tests is to show that string deduplication still leaves you with distinct String
       objects, each of those requiring 24 bytes. Interning, on the other hand, return you existing String
       objects, so the only memory you spend is for the LIST object.
       */
       //Xmx256M - 49 iterations (100 unique strings)
       fillNonUnique( false );
       //Xmx256M - 299 iterations (100 unique strings)
       fillNonUnique( true );
   }
   private static void fillUnique() throws InterruptedException {
       int iters = 0;
       final UniqueStringGenerator gen = new UniqueStringGenerator();
       while ( true ) {
           for ( int i = 0; i < FULL_ITER_SIZE; ++i )
               LIST.add( gen.nextUnique() );
           Thread.sleep( 300 );
           System.out.println( "Iteration " + (iters++) + " finished" );
       }
   }
   private static void fillNonUnique( final boolean intern ) throws InterruptedException {
       int iters = 0;
       final UniqueStringGenerator gen = new UniqueStringGenerator();
       while ( true ) {
           for ( int i = 0; i < FULL_ITER_SIZE; ++i )
               LIST.add( intern ? gen.nextNonUnique().intern() : gen.nextNonUnique() );
           Thread.sleep( 300 );
           System.out.println( "Iteration " + (iters++) + " finished" );
       }
   }
   private static class UniqueStringGenerator {
       private char upper = 0;
       private char lower = 0;
       public String nextUnique() {
           final String res = String.valueOf( upper ) + lower;
           if ( lower < Character.MAX_VALUE )
               lower++;
           else {
               upper++;
               lower = 0;
           }
           return res;
       }
       public String nextNonUnique() {
           final String res = "a" + lower;
           if ( lower < 100 )
               lower++;
           else
               lower = 0;
           return res;
       }
   }
}

Summary

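String deduplication was added in Java 8u20. It is part of the G1 garbage collector, so it has to be enabled together with G1: -XX:+UseG1GC -XX:+UseStringDeduplication.
Whether String deduplication runs depends on the current system load.
String deduplication finds strings with the same contents and canonicalizes their underlying char arrays. It requires no code changes, but it leaves you with distinct String objects, each occupying 24 bytes. In some cases an explicit String.intern is the better fit.
String deduplication does not process short-lived strings. The minimum age is set by the -XX:StringDeduplicationAgeThreshold=3 JVM parameter (3 is the default).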

Translated from:

http://java-performance.info/java-string-deduplication/
