1. Symptom
A service developed for our project stopped on its own after running for a while. The service log showed the following errors:
2023-02-28T16:47:43.491+08:00 WARN ctm01elefencedatain.ctm01elefencedatain [DefaultMessageListenerContainer-13] [o.s.jms.listener.DefaultMessageListenerContainer:895] - Setup of JMS message listener invoker failed for destination 'bic.core.topic.netdomain_change' - trying to recover. Cause: Cannot send, channel has already failed: tcp://32.50.126.1:8356
2023-02-28T16:40:05.534+08:00 ERROR ctm01elefencedatain.ctm01elefencedatain [http-nio-26875-exec-11] [o.s.c.sleuth.instrument.web.ExceptionLoggingFilter:54] - Uncaught exception thrown
org.springframework.web.util.NestedServletException: Handler processing failed; nested exception is java.lang.OutOfMemoryError: Java heap space
at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:1061)
at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:942)
at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:1005)
at org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:908)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:652)
at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:882)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:733)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:231)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.jasig.cas.client.authentication.AuthenticationFilter.doFilter(AuthenticationFilter.java:190)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.jasig.cas.client.validation.AbstractTicketValidationFilter.doFilter(AbstractTicketValidationFilter.java:237)
at com.hikvision.sso.client.filter.HikCas20ProxyReceivingTicketValidationFilter.doFilter(HikCas20ProxyReceivingTicketValidationFilter.java:193)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at com.hikvision.sso.client.filter.HikAuthenticationFilter.doFilter(HikAuthenticationFilter.java:45)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.jasig.cas.client.session.SingleSignOutFilter.doFilter(SingleSignOutFilter.java:80)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.springframework.web.filter.RequestContextFilter.doFilterInternal(RequestContextFilter.java:99)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at com.hikvision.starfish.security.web.filter.HttpRequestMehtodFilter.doFilter(HttpRequestMehtodFilter.java:48)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.springframework.web.filter.FormContentFilter.doFilterInternal(FormContentFilter.java:92)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.springframework.web.filter.HiddenHttpMethodFilter.doFilterInternal(HiddenHttpMethodFilter.java:93)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.springframework.cloud.sleuth.instrument.web.ExceptionLoggingFilter.doFilter(ExceptionLoggingFilter.java:50)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at brave.servlet.TracingFilter.doFilter(TracingFilter.java:86)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.springframework.boot.web.servlet.support.ErrorPageFilter.doFilter(ErrorPageFilter.java:128)
at org.springframework.boot.web.servlet.support.ErrorPageFilter.access$000(ErrorPageFilter.java:66)
at org.springframework.boot.web.servlet.support.ErrorPageFilter$1.doFilterInternal(ErrorPageFilter.java:103)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
at org.springframework.boot.web.servlet.support.ErrorPageFilter.doFilter(ErrorPageFilter.java:121)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:200)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:200)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:97)
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:544)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:143)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:81)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:78)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:353)
at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:620)
at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:65)
at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:831)
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1629)
at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.OutOfMemoryError: Java heap space
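The log shows java.lang.OutOfMemoryError: Java heap space, meaning the heap was exhausted, which pointed to a memory leak.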
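2. Investigation
When a program exits because of an OOM, its transient state disappears with the process, and reproducing an OOM is usually difficult or time-consuming. It is therefore very useful to have the JVM export a heap dump automatically at the moment of the OOM. This is enabled by adding the following options to the JVM startup parameters: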
-XX:+HeapDumpOnOutOfMemoryError // export a snapshot of the application's heap when an OOM occurs
-XX:HeapDumpPath=/opt/home/mydumpfile.hprof // specifies where the heap snapshot is saved
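To confirm that the options above actually took effect in a running JVM, they can be read back at runtime. The following is a minimal sketch, assuming a HotSpot JVM (the com.sun.management API is HotSpot-specific); the class name HeapDumpFlagCheck is made up for illustration.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: reads HotSpot VM options from inside the running process.
class HeapDumpFlagCheck {
    /** Returns the current value of a HotSpot VM option as a string, e.g. "true" or "false". */
    static String vmOption(String name) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return bean.getVMOption(name).getValue();
    }

    public static void main(String[] args) {
        // "true" only if the JVM was started with -XX:+HeapDumpOnOutOfMemoryError
        System.out.println("HeapDumpOnOutOfMemoryError = " + vmOption("HeapDumpOnOutOfMemoryError"));
        // Empty if -XX:HeapDumpPath was not set
        System.out.println("HeapDumpPath = " + vmOption("HeapDumpPath"));
    }
}
```

Logging these two values once at service startup makes it easy to tell afterwards whether a dump should have been written.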
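After obtaining the dump file, I opened it with MAT (Eclipse Memory Analyzer) and found a single object occupying 2.1 GB.
The leak analysis module showed that a thread pool object held more than 2 GB of memory. This narrowed the problem down to how the thread pool was being used.
Drilling into the details showed that the pool's blocking queue accounted for those 2 GB.
The code that uses this thread pool is shown below. The use case is reading data at a high frequency and submitting data-processing tasks to the pool.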
private final ExecutorService TASK_POOL = new ThreadPoolExecutor(10, 20, 2L, TimeUnit.HOURS, new LinkedBlockingQueue<>());

@Value("${kafka.producer.topic}")
private String topic;

@Autowired
private DataNumService dataNumService;

@Override
public BaseResult readin(List<DeviceData> deviceDatas) {
    BaseResult result = new BaseResult("0", "success", "");
    try {
        String receiveTime = DateUtil.getCurrentDateTime();
        // Submit the task to the pool
        TASK_POOL.submit(new SendDataTask(counter, deviceDatas, receiveTime, topic));
    } catch (Exception e) {
        result.setCode("-1");
        result.setMsg("failure");
        log.error("Exception while calling readin!", e);
    }
    return result;
}
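3. Analysis
The code creates the pool with new ThreadPoolExecutor(10, 20, 2L, TimeUnit.HOURS, new LinkedBlockingQueue<>()): a core pool size of 10, a maximum pool size of 20, and a blocking queue with no capacity specified.
First, a look at how a thread pool handles a submitted task:
When a task arrives and fewer than corePoolSize threads are running, the pool starts a core thread to run it directly instead of queuing it. Once the core threads are all in use, new tasks are added to the blocking queue. Only when the queue is full does the pool start additional threads, up to maximumPoolSize; after that, further tasks trigger the rejection policy.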
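The flow described above can be observed with a small experiment. This is a sketch with deliberately tiny, made-up numbers (core 2, max 4, queue capacity 2), not the configuration of the service: seven tasks that block on a latch are submitted, so none of them finishes while the pool is inspected.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class PoolGrowthDemo {
    /** Submits 7 blocking tasks and returns {poolSize, queueSize, rejectedCount}. */
    static int[] run() {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 4, 60L, TimeUnit.SECONDS, new ArrayBlockingQueue<>(2));
        CountDownLatch release = new CountDownLatch(1);
        Runnable blocking = () -> {
            try {
                release.await(); // keep the worker busy until the snapshot is taken
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        int rejected = 0;
        for (int i = 1; i <= 7; i++) {
            try {
                pool.execute(blocking);
            } catch (RejectedExecutionException e) {
                rejected++; // default AbortPolicy: queue full and max threads reached
            }
        }
        // Tasks 1-2 started core threads, 3-4 were queued,
        // 5-6 started extra threads, 7 was rejected.
        int[] snapshot = {pool.getPoolSize(), pool.getQueue().size(), rejected};
        release.countDown();
        pool.shutdown();
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return snapshot;
    }

    public static void main(String[] args) {
        int[] s = run();
        System.out.println("poolSize=" + s[0] + " queued=" + s[1] + " rejected=" + s[2]);
    }
}
```

The important detail is the order: the queue fills up before any thread beyond the core size is created.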
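Given the investigation and this workflow, the remaining question is why the blocking queue exhausted the heap. The pool's queue is created with new LinkedBlockingQueue<>(), and the LinkedBlockingQueue constructor shows that when no capacity is given, the default capacity is Integer.MAX_VALUE. Because such a queue practically never fills up, the pool never grows beyond its 10 core threads, and every other request is placed in the queue. When a large number of requests pour in, the queue keeps growing, and with a tight heap configuration an out-of-memory error is very likely.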
/**
 * Creates a {@code LinkedBlockingQueue} with a capacity of
 * {@link Integer#MAX_VALUE}.
 */
public LinkedBlockingQueue() {
    this(Integer.MAX_VALUE);
}
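The effect of this default can be reproduced in miniature. The sketch below uses made-up numbers (core 2, max 4) and tasks that block on a latch: with the no-argument LinkedBlockingQueue, the pool stays at its core size and everything else piles up in the queue.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class UnboundedQueueDemo {
    /** Submits `tasks` blocking tasks and returns {poolSize, queueSize}. */
    static int[] run(int tasks) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 4, 60L, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
        CountDownLatch release = new CountDownLatch(1);
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                try {
                    release.await(); // simulate work that is slower than the arrival rate
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        // The queue never reports "full", so threads 3 and 4 are never created.
        int[] snapshot = {pool.getPoolSize(), pool.getQueue().size()};
        release.countDown();
        pool.shutdown();
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return snapshot;
    }

    public static void main(String[] args) {
        int[] s = run(1000);
        System.out.println("poolSize=" + s[0] + " queued=" + s[1]);
    }
}
```

In the real service each queued task also holds a reference to its list of device data, which is why the queue itself could grow to gigabytes.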
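How to use a thread pool correctly:
I had not paid much attention to this kind of problem before. Creating a pool with ThreadPoolExecutor requires you to specify the core pool size, the maximum pool size, the thread keep-alive time, and the blocking queue yourself. With a bounded queue, once the queue is full and the maximum pool size has been reached, submitting another task throws an exception by default, so some tasks still never run.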
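Three blocking queues
ArrayBlockingQueue: array-backed FIFO queue; bounded, the capacity must be specified
LinkedBlockingQueue: linked-list-backed FIFO queue; bounded if a capacity is specified, otherwise effectively unbounded (Integer.MAX_VALUE)
SynchronousQueue: a hand-off queue with no buffer (capacity zero); each task must be handed directly to a waiting thread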
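The capacity differences between the three queues can be checked directly with remainingCapacity(). A small sketch; the capacity of 100 is an arbitrary value chosen for illustration.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.SynchronousQueue;

class QueueCapacityDemo {
    /** Returns the remaining capacity of four freshly created, empty queues. */
    static int[] capacities() {
        BlockingQueue<Runnable> array = new ArrayBlockingQueue<>(100);
        BlockingQueue<Runnable> linkedDefault = new LinkedBlockingQueue<>();
        BlockingQueue<Runnable> linkedBounded = new LinkedBlockingQueue<>(100);
        BlockingQueue<Runnable> handOff = new SynchronousQueue<>();
        return new int[] {
            array.remainingCapacity(),         // 100
            linkedDefault.remainingCapacity(), // Integer.MAX_VALUE
            linkedBounded.remainingCapacity(), // 100
            handOff.remainingCapacity()        // always 0
        };
    }

    /** A SynchronousQueue accepts an element only if a consumer is already waiting. */
    static boolean offerWithoutConsumer() {
        return new SynchronousQueue<Runnable>().offer(() -> { });
    }

    public static void main(String[] args) {
        int[] c = capacities();
        System.out.println("array=" + c[0] + " linkedDefault=" + c[1]
                + " linkedBounded=" + c[2] + " synchronous=" + c[3]);
        System.out.println("offer accepted without consumer: " + offerWithoutConsumer());
    }
}
```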
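Once a bounded queue is used, we have to decide how to handle requests that arrive after the queue is full. This is configured through the rejection policy.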
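Four rejection policies
AbortPolicy: the default; rejects the task and throws RejectedExecutionException
DiscardPolicy: silently drops the rejected task without throwing an exception
DiscardOldestPolicy: drops the oldest task waiting in the queue, then retries submitting the new task
CallerRunsPolicy: the thread that submitted the task runs it itself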
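The four policies can be compared side by side with a pool that is easy to saturate. This is a sketch with made-up sizes (one thread, queue capacity 1): task t1 occupies the only thread, t2 fills the queue, and t3 triggers the policy. The task names and the RejectionPolicyDemo class are illustrative only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class RejectionPolicyDemo {
    /** Returns the sorted list of events recorded for tasks t1, t2, t3 under the given policy. */
    static List<String> run(RejectedExecutionHandler policy) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.SECONDS, new ArrayBlockingQueue<>(1), policy);
        List<String> events = Collections.synchronizedList(new ArrayList<>());
        CountDownLatch release = new CountDownLatch(1);
        Thread caller = Thread.currentThread();

        pool.execute(() -> {              // t1: occupies the single worker thread
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            events.add("t1");
        });
        pool.execute(() -> events.add("t2")); // t2: waits in the queue
        try {
            // t3: queue is full and max threads reached, so the policy decides
            pool.execute(() -> events.add(
                    Thread.currentThread() == caller ? "t3:caller" : "t3:pool"));
        } catch (RejectedExecutionException e) {
            events.add("t3:rejected");
        }

        release.countDown();
        pool.shutdown();
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        List<String> sorted = new ArrayList<>(events);
        Collections.sort(sorted);
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println("Abort:         " + run(new ThreadPoolExecutor.AbortPolicy()));
        System.out.println("Discard:       " + run(new ThreadPoolExecutor.DiscardPolicy()));
        System.out.println("DiscardOldest: " + run(new ThreadPoolExecutor.DiscardOldestPolicy()));
        System.out.println("CallerRuns:    " + run(new ThreadPoolExecutor.CallerRunsPolicy()));
    }
}
```

Only CallerRunsPolicy ends up executing all three tasks; DiscardPolicy loses t3 and DiscardOldestPolicy loses t2 without any error being reported.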
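How do you create a more fault-tolerant thread pool? Specify a queue capacity and use CallerRunsPolicy. When the queue is full, the submitting thread runs the task itself, so no request goes unprocessed, and the blocking queue cannot grow large enough to cause an out-of-memory error. The submitting thread is occupied while it runs the task, which also slows down further submissions.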
4. Solution
Specify a queue capacity and use the CallerRunsPolicy rejection policy when creating the thread pool, as follows:
private final ExecutorService TASK_POOL = new ThreadPoolExecutor(10, 20, 2L, TimeUnit.HOURS, new LinkedBlockingQueue<>(1000), new ThreadPoolExecutor.CallerRunsPolicy());
After verification, the problem was resolved.
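The behavior of such a configuration can be sanity-checked in isolation. The following is a sketch with smaller, made-up numbers (core 2, max 4, queue capacity 10, tasks sleeping 1 ms), not the service's real workload: it submits tasks faster than the pool can process them and checks that every task still runs while the queue stays within its bound.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class BackpressureDemo {
    /** Submits `tasks` short tasks and returns {completedCount, largestObservedQueueSize}. */
    static int[] run(int tasks) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 4, 60L, TimeUnit.SECONDS,
                new LinkedBlockingQueue<>(10),
                new ThreadPoolExecutor.CallerRunsPolicy());
        AtomicInteger done = new AtomicInteger();
        Runnable work = () -> {
            try {
                Thread.sleep(1); // stand-in for real processing time
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            done.incrementAndGet();
        };
        int maxQueued = 0;
        for (int i = 0; i < tasks; i++) {
            // When the pool is saturated, this call runs the task in the
            // submitting thread, which pauses the loop (backpressure).
            pool.execute(work);
            maxQueued = Math.max(maxQueued, pool.getQueue().size());
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new int[] {done.get(), maxQueued};
    }

    public static void main(String[] args) {
        int[] r = run(200);
        System.out.println("completed=" + r[0] + " maxQueued=" + r[1]);
    }
}
```

One trade-off to keep in mind: in a web service the submitting thread is a request-handling thread, so under overload the caller's response time increases instead of the heap usage.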
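5. Summary
When using thread pools in business code, keep the following points in mind:
- Avoid a global thread pool: a pool shared across the whole application is hard to control, because you cannot accurately estimate the combined load of all requests. It is better to create a separate pool for each business function, whose load is much easier to estimate and quantify.
- When creating a pool, estimate the approximate number of requests per second, then choose the thread counts and queue size accordingly.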
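One possible way to turn a request-rate estimate into concrete numbers is a back-of-the-envelope calculation based on Little's law (tasks in flight = arrival rate x average task time). The sketch below is a rough heuristic, not a rule from the JDK; the class PoolSizing and all input values (200 tasks per second, 50 ms per task, 5 s tolerable wait) are hypothetical.

```java
class PoolSizing {
    /** Threads needed to keep up with the arrival rate, rounded up. */
    static int threadsNeeded(int tasksPerSecond, int avgTaskMillis) {
        return (int) (((long) tasksPerSecond * avgTaskMillis + 999) / 1000);
    }

    /**
     * Queue capacity such that a task at the tail of a full queue waits
     * roughly maxWaitMillis before a thread picks it up.
     */
    static int queueCapacity(int threads, int avgTaskMillis, int maxWaitMillis) {
        return (int) ((long) threads * maxWaitMillis / avgTaskMillis);
    }

    public static void main(String[] args) {
        int threads = threadsNeeded(200, 50);            // 200/s x 0.05 s = 10 threads
        int capacity = queueCapacity(threads, 50, 5000); // 10 x 5000 / 50 = 1000 tasks
        System.out.println("threads=" + threads + " queueCapacity=" + capacity);
    }
}
```

The result is only a starting point; the estimate should be checked against measured task times and the memory footprint of a single queued task.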