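Problem description:
While a project was running, a large number of threads suddenly became blocked, and the application timed out and stopped responding. (I'm skipping how to detect this; search for it yourself, for example the thread configuration of middleware such as Tomcat and how to use jstack.)
After running jstack, the output was full of thread entries like the one below; almost every thread looked like this: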
java.lang.Thread.State: BLOCKED (on object monitor)
at sun.security.provider.NativePRNG$RandomIO.implGenerateSeed(NativePRNG.java:201)
- waiting to lock <... (address hidden; let's call it 0xaaa)> (a java.lang.Object)
at sun.security.provider.NativePRNG$RandomIO.access$300(NativePRNG.java:108)
at sun.security.provider.NativePRNG.engineGenerateSeed(NativePRNG.java:102)
at java.security.SecureRandom.generateSeed(SecureRandom.java:495)
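If you cannot run jstack on the box, roughly the same picture can be pulled from inside the JVM. The following is only a minimal sketch using the standard java.lang.management API; the class name BlockedThreadReport and the method blockedByLock are made up for illustration. It groups BLOCKED threads by the monitor they are waiting for, which is how a pile-up on a single lock such as 0xaaa would show up.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of any library.
class BlockedThreadReport {

    /** Returns, for each contended monitor, how many threads are BLOCKED on it. */
    static Map<String, Integer> blockedByLock() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<String, Integer> counts = new TreeMap<>();
        // false, false: skip locked-monitor and ownable-synchronizer details
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info != null && info.getThreadState() == Thread.State.BLOCKED) {
                // getLockName() looks like "java.lang.Object@1b2c3d"
                String lock = String.valueOf(info.getLockName());
                counts.merge(lock, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> report = blockedByLock();
        if (report.isEmpty()) {
            System.out.println("No BLOCKED threads right now");
        }
        for (Map.Entry<String, Integer> e : report.entrySet()) {
            System.out.println(e.getValue() + " thread(s) waiting for " + e.getKey());
        }
    }
}
```

One lock with a very high count is the same symptom as the jstack output above.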
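Analysis:
A quick check of GC activity, memory, CPU, network and so on narrowed the problem down to thread blocking.
Looking closely at the dump, every thread is waiting for the same resource, 0xaaa. There wasn't much time to investigate, so leaving aside what that resource actually is and looking at the surrounding frames, two things stand out: NativePRNG and SecureRandom. So let's see what these two classes actually do.
Searching for NativePRNG turned up an article that gives a rough introduction to blocking caused by random number generation: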
https://www.cnblogs.com/softidea/p/9725156.html
The following passage from it deserves attention:
Looking up the code of NativePRNG$Blocking, its documentation says:
A NativePRNG-like class that uses /dev/random for both seed and random material. Note that it does not respect the egd properties, since we have no way of knowing what those qualities are.
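Strangely, the -Djava.security.egd=file:/dev/./urandom parameter had no effect: /dev/random was still being used as the entropy pool for random numbers, and over time, or under frequent calls, that pool easily runs short and causes blocking. So I looked at the documentation for SecureRandom.getInstanceStrong():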
Returns a SecureRandom object that was selected by using the algorithms/providers specified in the securerandom.strongAlgorithms Security property.
So it has its own algorithm setting, in the jre/lib/security/java.security file, where the default is defined as:
securerandom.strongAlgorithms=NativePRNGBlocking:SUN
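To see what a particular JVM is actually configured with, the values can be read at runtime. This is just a small sketch; the class name SecureRandomConfig is made up, and the values printed depend on the JDK version and platform, so no output is shown.

```java
import java.security.Security;

// Hypothetical helper for inspecting the settings discussed above.
class SecureRandomConfig {

    /** Security property consulted by SecureRandom.getInstanceStrong(). */
    static String strongAlgorithms() {
        return Security.getProperty("securerandom.strongAlgorithms");
    }

    /** Security property from the java.security file. */
    static String source() {
        return Security.getProperty("securerandom.source");
    }

    /** JVM system property, set with -Djava.security.egd=... (null if unset). */
    static String egd() {
        return System.getProperty("java.security.egd");
    }

    public static void main(String[] args) {
        System.out.println("securerandom.strongAlgorithms = " + strongAlgorithms());
        System.out.println("securerandom.source           = " + source());
        System.out.println("java.security.egd             = " + egd());
    }
}
```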
Result:
Here are a few links; read them yourself and it will all make sense.
https://stackoverflow.com/questions/137212/how-to-solve-slow-java-securerandom
https://issues.jenkins-ci.org/browse/JENKINS-20108
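Below is the conclusion excerpted from the article linked above. I'm short on time, so I'll wrap up quickly here and write more when I get a chance.
If the algorithm value is changed to NativePRNGNonBlocking:SUN, the logic in NativePRNG$NonBlocking is used, with /dev/urandom as the entropy pool, and the blocking problem goes away. But this file is a JDK system file, and modifying it or pointing to a different path is somewhat cumbersome. It would be best to set this through a JVM system property, but unlike the securerandom.source property, which can be configured with the system property -Djava.security.egd=xxx, this one has no corresponding system property that I could find, even after a long search. So the only option was to change the code: stop using the new SecureRandom.getInstanceStrong method and use SecureRandom.getInstance("NativePRNGNonBlocking") instead.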
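The code change described above might look roughly like this. It is a sketch rather than the original project's code: the class name NonBlockingRandom and the fallback to new SecureRandom() are my own assumptions, added because NativePRNGNonBlocking is not available on every platform (for example Windows).

```java
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;

// Hypothetical wrapper; only the getInstance call comes from the text above.
class NonBlockingRandom {

    // One shared instance; SecureRandom is safe to use from multiple threads.
    private static final SecureRandom RANDOM = create();

    static SecureRandom create() {
        try {
            // Replaces SecureRandom.getInstanceStrong()
            return SecureRandom.getInstance("NativePRNGNonBlocking");
        } catch (NoSuchAlgorithmException e) {
            // Assumption: fall back to the platform default where the
            // algorithm is not registered.
            return new SecureRandom();
        }
    }

    /** Returns n random bytes via nextBytes (not generateSeed). */
    static byte[] randomBytes(int n) {
        byte[] out = new byte[n];
        RANDOM.nextBytes(out);
        return out;
    }

    public static void main(String[] args) {
        System.out.println("Algorithm in use: " + RANDOM.getAlgorithm());
        System.out.println("Got " + randomBytes(16).length + " random bytes");
    }
}
```

The sketch sticks to nextBytes because, per the excerpt quoted below, generateSeed on plain NativePRNG still reads from /dev/random.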
As for how SecureRandom's two algorithm implementations, SHA1PRNG and NativePRNG, relate to the securerandom.source setting, I found an article that explains it very clearly: Using the SecureRandom Class
On Linux:
1) when this value is “file:/dev/urandom” then the NativePRNG algorithm is registered by the Sun crypto provider as the default implementation; the NativePRNG algorithm then reads from /dev/urandom for nextBytes but /dev/random for generateSeed
2) when this value is “file:/dev/random” then the NativePRNG algorithm is not registered by the Sun crypto provider, but the SHA1PRNG system uses a NativeSeedGenerator which reads from /dev/random.
3) when this value is anything else then the SHA1PRNG is used with a URLSeedGenerator that reads from that source.
4) when the value is undefined, then SHA1PRNG is used with ThreadedSeedGenerator
5) when the code explicitly asks for “SHA1PRNG” and the value is either “file:/dev/urandom” or “file:/dev/random” then (2) also occurs
6) when the code explicitly asks for “SHA1PRNG” and the value is some other “file:” url, then (3) occurs
7) when the code explicitly asks for “SHA1PRNG” and the value is undefined then (4) occurs
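Which of these cases applies on a given machine can be checked directly. This is a rough sketch with a made-up class name, SecureRandomAlgorithms; what it prints depends on the platform, JDK version and the securerandom.source / java.security.egd settings, so no output is shown.

```java
import java.security.SecureRandom;
import java.security.Security;
import java.util.Set;

// Hypothetical helper for checking which implementation is in effect.
class SecureRandomAlgorithms {

    /** Algorithm picked by a plain "new SecureRandom()". */
    static String defaultAlgorithm() {
        return new SecureRandom().getAlgorithm();
    }

    /** All SecureRandom algorithm names registered by the installed providers. */
    static Set<String> available() {
        return Security.getAlgorithms("SecureRandom");
    }

    public static void main(String[] args) {
        System.out.println("Default algorithm:    " + defaultAlgorithm());
        System.out.println("Available algorithms: " + available());
    }
}
```

Running it once with and once without -Djava.security.egd=file:/dev/./urandom is a quick way to see whether the flag changes the default on your JDK.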
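As for why, with the SHA1PRNG algorithm, you cannot set the source directly to file:/dev/urandom when you want urandom, and instead have to use a workaround such as file:///dev/urandom or file:/dev/./urandom, see this: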
In SHA1PRNG, there is a SeedGenerator which does various things depending on the configuration.
If java.security.egd or securerandom.source point to “file:/dev/random” or “file:/dev/urandom”, we will use NativeSeedGenerator, which calls super() which calls SeedGenerator.URLSeedGenerator(/dev/random). (A nested class within SeedGenerator.) The only things that changed in this bug was that urandom will also trigger use of this code path.
If those properties point to another URL that exists, we’ll initialize SeedGenerator.URLSeedGenerator(url). This is why “file:///dev/urandom”, “file:/./dev/random”, etc. will work.
http://hongjiang.info/java8-nativeprng-blocking/
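The rule quoted above comes down to an exact string comparison, which is why a slightly different spelling of the same path slips through. Below is a deliberately simplified model of that decision, not the JDK's code; the class EgdSourceModel and its method are invented for illustration, and the model only returns the generator names mentioned in the quotes.

```java
// Simplified model of the quoted rules; NOT the actual JDK implementation.
class EgdSourceModel {

    /** Names the seed generator the quoted rules would select for SHA1PRNG. */
    static String seedGeneratorFor(String source) {
        if (source == null || source.isEmpty()) {
            // Value undefined
            return "ThreadedSeedGenerator";
        }
        if (source.equals("file:/dev/random") || source.equals("file:/dev/urandom")) {
            // Exact match on either special value: reads /dev/random
            return "NativeSeedGenerator";
        }
        // Any other URL is opened as given
        return "URLSeedGenerator";
    }

    public static void main(String[] args) {
        System.out.println(seedGeneratorFor("file:/dev/urandom"));   // NativeSeedGenerator
        System.out.println(seedGeneratorFor("file:/dev/./urandom")); // URLSeedGenerator
        System.out.println(seedGeneratorFor("file:///dev/urandom")); // URLSeedGenerator
        System.out.println(seedGeneratorFor(null));                  // ThreadedSeedGenerator
    }
}
```

In this model, file:/dev/./urandom and file:///dev/urandom point at the same device as file:/dev/urandom but do not equal either special string, so they take the URL path and are read directly.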