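Troubleshooting a warning that Tomcat logs while cleaning up threads on shutdown.

Recently, someone using Nacos ran into a Tomcat warning about a possible memory leak.

The warning messages: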

2020-11-03 16:59:46.088 [main] WARN o.a.c.loader.WebappClassLoaderBase [173] - The web application [ROOT] appears to have started a thread named [com.alibaba.nacos.naming.beat.sender] but has failed to stop it. This is very likely to create a memory leak. Stack trace of thread:
 sun.misc.Unsafe.park(Native Method)
 java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
 java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)
 java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
 java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 java.lang.Thread.run(Thread.java:745)
2020-11-03 16:59:46.089 [main] WARN o.a.c.loader.WebappClassLoaderBase [173] - The web application [ROOT] appears to have started a thread named [com.alibaba.nacos.naming.failover] but has failed to stop it. This is very likely to create a memory leak. Stack trace of thread:
 sun.misc.Unsafe.park(Native Method)
 java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
 java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)
 java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
 java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 java.lang.Thread.run(Thread.java:745)
2020-11-03 16:59:46.090 [main] WARN o.a.c.loader.WebappClassLoaderBase [173] - The web application [ROOT] appears to have started a thread named [com.alibaba.nacos.naming.push.receiver] but has failed to stop it. This is very likely to create a memory leak. Stack trace of thread:
 java.net.PlainDatagramSocketImpl.receive0(Native Method)
 java.net.AbstractPlainDatagramSocketImpl.receive(AbstractPlainDatagramSocketImpl.java:143)
 java.net.DatagramSocket.receive(DatagramSocket.java:812)
 com.alibaba.nacos.client.naming.core.PushReceiver.run(PushReceiver.java:73)
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
 java.util.concurrent.FutureTask.run(FutureTask.java:266)
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 java.lang.Thread.run(Thread.java:745)

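Here is what I found while reproducing and locating the problem:

I reproduced the problem in a local project.
The Maven dependencies include spring-boot-starter-actuator.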

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-actuator</artifactId>
        </dependency>

        <dependency>
            <groupId>com.alibaba.cloud</groupId>
            <artifactId>spring-cloud-starter-alibaba-nacos-discovery</artifactId>
            <version>2.2.3.RELEASE</version>
        </dependency>

Application configuration:

spring:
  application:
    name: nacos-producer
  cloud:
    nacos:
      discovery:
        server-addr: 127.0.0.1:8848

Startup class:

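import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.client.discovery.EnableDiscoveryClient;

@SpringBootApplication
@EnableDiscoveryClient
public class App {

    public static void main(String[] args) {
        SpringApplication.run(App.class, args);
    }
}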

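The local nacos-server was not started, so nothing was listening on port 8848. On startup the project tries to register its instance with nacos-server. Because nacos-server is not running, a connection refused error occurs, Spring initialization fails, and the Tomcat container is shut down. The warnings are logged while the Tomcat container is shutting down.

The Tomcat logic that logs the warning is in org.apache.catalina.loader.WebappClassLoaderBase#clearReferencesThreads. The concrete org.apache.catalina.loader.WebappClassLoaderBase implementation in use here is org.springframework.boot.web.embedded.tomcat.TomcatEmbeddedWebappClassLoader.

Below is part of the clearReferencesThreads method.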

    private void clearReferencesThreads() {
        Thread[] threads = getThreads();
        List<Thread> executorThreadsToStop = new ArrayList<>();

        // Iterate over the set of threads
        for (Thread thread : threads) {
            if (thread != null) {
                ClassLoader ccl = thread.getContextClassLoader();
                if (ccl == this) {
                    // Don't warn about this thread
                    if (thread == Thread.currentThread()) {
                        continue;
                    }

                    final String threadName = thread.getName();

                    // JVM controlled threads
                    ThreadGroup tg = thread.getThreadGroup();
                    if (tg != null && JVM_THREAD_GROUP_NAMES.contains(tg.getName())) {
                        // HttpClient keep-alive threads
                        if (clearReferencesHttpClientKeepAliveThread &&
                                threadName.equals("Keep-Alive-Timer")) {
                            thread.setContextClassLoader(parent);
                            log.debug(sm.getString("webappClassLoader.checkThreadsHttpClient"));
                        }

                        // Don't warn about remaining JVM controlled threads
                        continue;
                    }

                    // Skip threads that have already died
                    if (!thread.isAlive()) {
                        continue;
                    }

                    // TimerThread can be stopped safely so treat separately
                    // "java.util.TimerThread" in Sun/Oracle JDK
                    // "java.util.Timer$TimerImpl" in Apache Harmony and in IBM JDK
                    if (thread.getClass().getName().startsWith("java.util.Timer") &&
                            clearReferencesStopTimerThreads) {
                        clearReferencesStopTimerThread(thread);
                        continue;
                    }

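                    if (isRequestThread(thread)) {
                        // Log the warning for a thread that is still processing a request
                        log.warn(sm.getString("webappClassLoader.stackTraceRequestThread",
                                getContextName(), threadName, getStackTrace(thread)));
                    } else {
                        // Log the warning for any other thread that was started but not stopped
                        log.warn(sm.getString("webappClassLoader.stackTrace",
                                getContextName(), threadName, getStackTrace(thread)));
                    }
                    // ... (rest of the method omitted)
                }
            }
        }
    }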

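After a Spring Boot application fails to start, it shuts down the Tomcat container. During shutdown Tomcat scans the threads, and a warning is logged for any thread that meets the following conditions.

1. The thread's contextClassLoader is this class loader itself, that is, org.springframework.boot.web.embedded.tomcat.TomcatEmbeddedWebappClassLoader.
2. The thread is not the current thread.
Once both conditions are met, Tomcat checks whether the thread's stack frames contain org.apache.catalina.connector.CoyoteAdapter. If they do, the thread is still processing a request, and this warning is logged: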

The web application [{0}] is still processing a request that has yet to finish. This is very likely to create a memory leak. You can control the time allowed for requests to finish by using the unloadDelay attribute of the standard Context implementation. Stack trace of request processing thread:[{2}]

If the stack frames do not contain this class, this warning is logged instead:

The web application [{0}] appears to have started a thread named [{1}] but has failed to stop it. This is very likely to create a memory leak. Stack trace of thread:{2}

That is why the warning is logged.
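The check described above can be approximated outside Tomcat. The following is only a sketch under stated assumptions: the class LeakWarningScanDemo, its scan method, the thread name, and the URLClassLoader stand-in for TomcatEmbeddedWebappClassLoader are all made up for illustration, and Thread.getAllStackTraces() is used in place of Tomcat's own thread enumeration. It also skips the JVM thread group and timer thread handling shown in the excerpt.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CountDownLatch;

class LeakWarningScanDemo {

    // Class name that marks a request-processing thread, as described in the article.
    static final String REQUEST_MARKER = "org.apache.catalina.connector.CoyoteAdapter";

    /** Returns one warning line per live thread (other than the caller) whose CCL is the given loader. */
    static List<String> scan(ClassLoader webappLoader) {
        List<String> warnings = new ArrayList<>();
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            Thread thread = entry.getKey();
            // Skip the scanning thread itself and threads that have already died.
            if (thread == Thread.currentThread() || !thread.isAlive()) {
                continue;
            }
            // Only threads whose context class loader is the webapp loader are reported.
            if (thread.getContextClassLoader() != webappLoader) {
                continue;
            }
            boolean processingRequest = false;
            for (StackTraceElement frame : entry.getValue()) {
                if (REQUEST_MARKER.equals(frame.getClassName())) {
                    processingRequest = true;
                    break;
                }
            }
            warnings.add(processingRequest
                    ? "still processing a request: " + thread.getName()
                    : "started but not stopped: " + thread.getName());
        }
        return warnings;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical stand-in for TomcatEmbeddedWebappClassLoader.
        ClassLoader standIn = new URLClassLoader(new URL[0], LeakWarningScanDemo.class.getClassLoader());
        CountDownLatch release = new CountDownLatch(1);
        Thread worker = new Thread(() -> {
            try {
                release.await(); // park until released, like an idle pool thread
            } catch (InterruptedException ignored) {
                // exit quietly
            }
        }, "demo.beat.sender");
        worker.setContextClassLoader(standIn);
        worker.setDaemon(true);
        worker.start();

        System.out.println(scan(standIn));
        release.countDown();
    }
}
```

The point of the sketch is that the decision depends only on the thread's contextClassLoader and its stack frames, not on which code actually started the thread.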

When a thread is created, its contextClassLoader is the same as the contextClassLoader of the thread that created it.
The following is the relevant logic from the Thread constructor:

    Thread parent = currentThread();
    if (security == null || isCCLOverridden(parent.getClass()))
        this.contextClassLoader = parent.getContextClassLoader();
    else
        this.contextClassLoader = parent.contextClassLoader;

So, in the check described earlier, the warning is logged whenever the thread's contextClassLoader is org.springframework.boot.web.embedded.tomcat.TomcatEmbeddedWebappClassLoader.
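The inheritance rule is easy to verify in isolation. A minimal sketch, assuming a plain URLClassLoader as a stand-in for TomcatEmbeddedWebappClassLoader; the class and method names are invented for this example:

```java
import java.net.URL;
import java.net.URLClassLoader;

class CclInheritanceDemo {

    /** Starts a child thread and returns the context class loader that the child observed. */
    static ClassLoader contextClassLoaderSeenByChild() throws InterruptedException {
        ClassLoader[] seen = new ClassLoader[1];
        // The child's contextClassLoader is copied from the creating thread at construction time.
        Thread child = new Thread(() -> seen[0] = Thread.currentThread().getContextClassLoader());
        child.start();
        child.join();
        return seen[0];
    }

    public static void main(String[] args) throws Exception {
        Thread current = Thread.currentThread();
        ClassLoader original = current.getContextClassLoader();
        // Hypothetical stand-in for TomcatEmbeddedWebappClassLoader.
        ClassLoader standIn = new URLClassLoader(new URL[0], original);

        current.setContextClassLoader(standIn);
        try {
            // True when the child inherited the stand-in loader from this thread.
            System.out.println(contextClassLoaderSeenByChild() == standIn);
        } finally {
            current.setContextClassLoader(original);
        }
    }
}
```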

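If the contextClassLoader of Thread.currentThread() is TomcatEmbeddedWebappClassLoader at the moment these threads are created,
the warning described above will be triggered.
This comes down to a piece of the initialization logic of org.springframework.boot.web.embedded.tomcat.TomcatEmbeddedContext, found in the method org.apache.catalina.core.StandardContext#startInternal. The relevant logic is as follows: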

  try {
            if (ok) {
                // Start our subordinate components, if any
                Loader loader = getLoader();
                if (loader instanceof Lifecycle) {
                    ((Lifecycle) loader).start();
                }

                // since the loader just started, the webapp classloader is now
                // created.
                setClassLoaderProperty("clearReferencesRmiTargets",
                        getClearReferencesRmiTargets());
                setClassLoaderProperty("clearReferencesStopThreads",
                        getClearReferencesStopThreads());
                setClassLoaderProperty("clearReferencesStopTimerThreads",
                        getClearReferencesStopTimerThreads());
                setClassLoaderProperty("clearReferencesHttpClientKeepAliveThread",
                        getClearReferencesHttpClientKeepAliveThread());
                setClassLoaderProperty("clearReferencesObjectStreamClassCaches",
                        getClearReferencesObjectStreamClassCaches());
                setClassLoaderProperty("clearReferencesObjectStreamClassCaches",
                        getClearReferencesObjectStreamClassCaches());
                setClassLoaderProperty("clearReferencesThreadLocals",
                        getClearReferencesThreadLocals());

                // By calling unbindThread and bindThread in a row, we setup the
                // current Thread CCL to be the webapp classloader
                unbindThread(oldCCL);
                oldCCL = bindThread();

                // Initialize logger again. Other components might have used it
                // too early, so it should be reset.
                logger = null;
                getLogger();

                Realm realm = getRealmInternal();
                if(null != realm) {
                    if (realm instanceof Lifecycle) {
                        ((Lifecycle) realm).start();
                    }

                    // Place the CredentialHandler into the ServletContext so
                    // applications can have access to it. Wrap it in a "safe"
                    // handler so application's can't modify it.
                    CredentialHandler safeHandler = new CredentialHandler() {
                        @Override
                        public boolean matches(String inputCredentials, String storedCredentials) {
                            return getRealmInternal().getCredentialHandler().matches(inputCredentials, storedCredentials);
                        }

                        @Override
                        public String mutate(String inputCredentials) {
                            return getRealmInternal().getCredentialHandler().mutate(inputCredentials);
                        }
                    };
                    context.setAttribute(Globals.CREDENTIAL_HANDLER, safeHandler);
                }

                // Notify our interested LifecycleListeners
                fireLifecycleEvent(Lifecycle.CONFIGURE_START_EVENT, null);

                // Start our child containers, if not already started
                for (Container child : findChildren()) {
                    if (!child.getState().isAvailable()) {
                        child.start();
                    }
                }

                // Start the Valves in our pipeline (including the basic),
                // if any
                if (pipeline instanceof Lifecycle) {
                    ((Lifecycle) pipeline).start();
                }

                // Acquire clustered manager
                Manager contextManager = null;
                Manager manager = getManager();
                if (manager == null) {
                    if (log.isDebugEnabled()) {
                        log.debug(sm.getString("standardContext.cluster.noManager",
                                Boolean.valueOf((getCluster() != null)),
                                Boolean.valueOf(distributable)));
                    }
                    if ((getCluster() != null) && distributable) {
                        try {
                            contextManager = getCluster().createManager(getName());
                        } catch (Exception ex) {
                            log.error(sm.getString("standardContext.cluster.managerError"), ex);
                            ok = false;
                        }
                    } else {
                        contextManager = new StandardManager();
                    }
                }

                // Configure default manager if none was specified
                if (contextManager != null) {
                    if (log.isDebugEnabled()) {
                        log.debug(sm.getString("standardContext.manager",
                                contextManager.getClass().getName()));
                    }
                    setManager(contextManager);
                }

                if (manager!=null && (getCluster() != null) && distributable) {
                    //let the cluster know that there is a context that is distributable
                    //and that it has its own manager
                    getCluster().registerManager(manager);
                }
            }

            if (!getConfigured()) {
                log.error(sm.getString("standardContext.configurationFail"));
                ok = false;
            }

            // We put the resources into the servlet context
            if (ok)
                getServletContext().setAttribute
                    (Globals.RESOURCES_ATTR, getResources());

            if (ok ) {
                if (getInstanceManager() == null) {
                    setInstanceManager(createInstanceManager());
                }
                getServletContext().setAttribute(
                        InstanceManager.class.getName(), getInstanceManager());
                InstanceManagerBindings.bind(getLoader().getClassLoader(), getInstanceManager());
            }

            // Create context attributes that will be required
            if (ok) {
                getServletContext().setAttribute(
                        JarScanner.class.getName(), getJarScanner());
            }

            // Set up the context init params
            mergeParameters();

            // Call ServletContainerInitializers
            for (Map.Entry<ServletContainerInitializer, Set<Class<?>>> entry :
                initializers.entrySet()) {
                try {
                    entry.getKey().onStartup(entry.getValue(),
                            getServletContext());
                } catch (ServletException e) {
                    log.error(sm.getString("standardContext.sciFail"), e);
                    ok = false;
                    break;
                }
            }

            // Configure and call application event listeners
            if (ok) {
                if (!listenerStart()) {
                    log.error(sm.getString("standardContext.listenerFail"));
                    ok = false;
                }
            }

            // Check constraints for uncovered HTTP methods
            // Needs to be after SCIs and listeners as they may programmatically
            // change constraints
            if (ok) {
                checkConstraintsForUncoveredMethods(findConstraints());
            }

            try {
                // Start manager
                Manager manager = getManager();
                if (manager instanceof Lifecycle) {
                    ((Lifecycle) manager).start();
                }
            } catch(Exception e) {
                log.error(sm.getString("standardContext.managerFail"), e);
                ok = false;
            }

            // Configure and call application filters
            if (ok) {
                if (!filterStart()) {
                    log.error(sm.getString("standardContext.filterFail"));
                    ok = false;
                }
            }

            // Load and initialize all "load on startup" servlets
            if (ok) {
                if (!loadOnStartup(findChildren())){
                    log.error(sm.getString("standardContext.servletFail"));
                    ok = false;
                }
            }

            // Start ContainerBackgroundProcessor thread
            super.threadStart();
        } finally {
            // Unbinding thread
            unbindThread(oldCCL);
        }

Here bindThread() replaces the current thread's contextClassLoader, which is AppClassLoader, with TomcatEmbeddedWebappClassLoader. Later, unbindThread(oldCCL) in the finally block restores AppClassLoader.
Between bindThread and unbindThread, listeners and similar components are started.
The key is the step marked // Call ServletContainerInitializers. In Spring Boot, this step ends up retrieving and invoking the beans of type org.springframework.boot.web.servlet.ServletContextInitializer. org.springframework.boot.actuate.endpoint.web.ServletEndpointRegistrar, which comes with spring-boot-starter-actuator, implements ServletContextInitializer, so it is also loaded at this step. Loading it pulls in the endpoint implementations it depends on. Spring Cloud Alibaba (SCA) contains the class com.alibaba.cloud.nacos.endpoint.NacosDiscoveryEndpointAutoConfiguration, which defines the Nacos endpoint, and creating that endpoint initializes NacosNamingService. At this point the contextClassLoader has not been unbound yet, so every thread created during this initialization has TomcatEmbeddedWebappClassLoader as its contextClassLoader.

Without spring-boot-starter-actuator, NacosNamingService is initialized only after unbindThread, when the current thread's contextClassLoader has already been restored to AppClassLoader.

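Therefore, if an uncaught exception is later thrown during Spring initialization and Tomcat is destroyed, Tomcat checks whether each thread's contextClassLoader is TomcatEmbeddedWebappClassLoader. Without spring-boot-starter-actuator, the threads' contextClassLoader is AppClassLoader and no warning is logged. With it, the contextClassLoader is TomcatEmbeddedWebappClassLoader and the warning is logged.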
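The difference between the two cases can be sketched with a plain executor. This is an illustration under assumptions, not Tomcat or Nacos code: setContextClassLoader plays the role of bindThread() and unbindThread(oldCCL), a URLClassLoader stands in for TomcatEmbeddedWebappClassLoader, and BindUnbindDemo and workerContextClassLoader are hypothetical names.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class BindUnbindDemo {

    /** Creates a single-thread executor and returns the context class loader of its worker thread. */
    static ClassLoader workerContextClassLoader() throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Callable<ClassLoader> task = () -> Thread.currentThread().getContextClassLoader();
            // The worker is created by the submitting thread, so it inherits that thread's CCL.
            return pool.submit(task).get();
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        Thread current = Thread.currentThread();
        ClassLoader appLoader = current.getContextClassLoader();
        // Hypothetical stand-in for TomcatEmbeddedWebappClassLoader.
        ClassLoader webappStandIn = new URLClassLoader(new URL[0], appLoader);

        ClassLoader createdWhileBound;
        current.setContextClassLoader(webappStandIn);      // plays the role of bindThread()
        try {
            createdWhileBound = workerContextClassLoader(); // the "with actuator" case
        } finally {
            current.setContextClassLoader(appLoader);       // plays the role of unbindThread(oldCCL)
        }
        ClassLoader createdAfterUnbind = workerContextClassLoader(); // the "without actuator" case

        System.out.println(createdWhileBound == webappStandIn);
        System.out.println(createdAfterUnbind == appLoader);
    }
}
```

Only the worker created inside the bound window carries the webapp class loader, which is what makes it visible to the shutdown scan.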

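The root cause is this: any thread created while the org.springframework.boot.web.servlet.ServletContextInitializer beans are being invoked has TomcatEmbeddedWebappClassLoader as its contextClassLoader, because it is created before unbindThread restores the contextClassLoader to AppClassLoader. If Spring initialization then fails and the Tomcat container is shut down, the warning is logged for those threads.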

For the detailed discussion, see the issue: https://github.com/alibaba/nacos/issues/4124.
