Arthas的基础使用(三):使用arthas排查线上CPU过高问题

运维人员发现xx项目因心跳检测死循环导致CPU占用过高。通过arthas工具诊断发现ReceiveHeartBeat.run()中的无限循环是罪魁祸首。通过本地复现和代码调整,加上适当休眠,问题得到解决。升级后项目运行正常,CPU占用降低,延迟改善。
摘要由CSDN通过智能技术生成

运维反馈

运维反馈:xx程序出现占用cpu高,数据推送延迟较大的问题。

问题背景

xx项目是其他同事开发的,已运行了大概半年的时间。同事前一段时间刚离职,*****。
废话不多说,开始分析:
初步怀疑是数据量上去了,就问了下现场的运维同学。最近数据量是否增多,现场同事反馈,每天都是四百万左右的数据没有增多。运行一段时间后,CPU就会越来越高。问题一直存在,只是一直没反馈。 最近忍不住了,我对此无语。。。。

使用arthas排查

问运维现场服务器上个是否有arthas-boot.jar ,有的话直接启动并指定到xx模块的进程号。

arthas的下载

curl -O https://arthas.aliyun.com/arthas-boot.jar

如对arthas不够了解的可先看:
Arthas的基础使用(一)
Arthas的基础使用(二)

让现场运维同学执行:

thread -n 3 -i 1000

反馈图片为:
在这里插入图片描述
问题分析:
排名第一的现场执行了,37360179ms的时间,并且占用了CPU为99.99%。出现代码问题的地方在ReceiveHeartBeat.run() ,那我们就简单分析下代码。该项目因为是部署了几个模块,模块与模块之间是用Socket来保持心跳的。socket服务端检查客户端的是否存在心跳的逻辑大致如下:

 private static class ReceiveHeartBeat implements Runnable {

        /**
         * 30秒判断心跳是否发送
         */
        private static final long TIMEOUT_MILLISECOND = 15000;

        private final Socket socket;

        private boolean isConnect;

        private long timeout;

        private ReceiveHeartBeat(Socket socket) {
            log.info("客户端{}连接成功", socket.getInetAddress());
            this.socket = socket;
            this.timeout = System.currentTimeMillis();
            this.isConnect = true;

        }

        @Override
        public void run() {
            while (isConnect) {
                if (System.currentTimeMillis() - timeout > TIMEOUT_MILLISECOND) {
                    log.info("超过心跳时间==毫秒=>>>{}===>>closeSocket",TIMEOUT_MILLISECOND);
                    closeSocket();
                } else {
                    try {
                        InputStream inputStream = socket.getInputStream();
                        byte[]  bytes = new byte[inputStream.available()];
                        inputStream.read();
                        String str = new String(bytes, StandardCharsets.UTF_8);
                        if(!StringUtils.isEmpty(str)) {
//                            log.info("accept===>>>" + str);
                            //刷新当前时间
                            this.timeout = System.currentTimeMillis();
                        }
                    } catch (IOException | InterruptedException e) {
                        log.error("客户端{}连接断开", socket.getInetAddress());
                        closeSocket();
                    }
                }

            }
        }

        private void closeSocket() {

            isConnect = false;

            if (socket == null) {
                return;
            }

            OssConfig.UPLOAD_CLIENTS.remove(socket);

            try {
                this.socket.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }

简单看了下逻辑:逻辑没什么大的问题。 但这里的CPU使用量99.99%,正常情况下是不会特别大。有点奇怪,仔细看代码,
会不会是这里的这个循环有问题,这里的这个isConnect如果为true的情况下,会一直跑,中途也没看到break。莫非。。。。

本地复现

因为公司没环境,就按照同事写的代码写了一个简单的小demo来复现。

服务端soeket代码

Log4j2
public class HeartBeatServer implements Runnable {

    /**
     * 上传客户端的数量
     */
    public static volatile Set<Socket> UPLOAD_CLIENTS = new HashSet<>();

    private final int socketPort;

    private final ThreadPoolTaskExecutor taskExecutor;

    public HeartBeatServer(int socketPort, ThreadPoolTaskExecutor taskExecutor) {
        this.socketPort = socketPort;
        this.taskExecutor = taskExecutor;
    }

    @Override
    public void run() {
        try {
            ServerSocket serverSocket = new ServerSocket(socketPort);
            while (!Thread.currentThread().isInterrupted() ) {
                //监听并接受套接字的连接
                Socket socket = serverSocket.accept();
                UPLOAD_CLIENTS.add(socket);
                taskExecutor.execute(new ReceiveHeartBeat(socket));
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    private static   class ReceiveHeartBeat implements Runnable {

        /**
         * 10秒判断心跳是否发送
         */
        private static final long TIMEOUT_MILLISECOND = 10000;

        private final Socket socket;

        private boolean isConnect;

        private long timeout;

        private ReceiveHeartBeat(Socket socket) {
            log.info("客户端{}连接成功", socket.getInetAddress());
            this.socket = socket;
            this.timeout = System.currentTimeMillis();
            this.isConnect = true;

        }

        @Override
        public void run() {
            System.out.println("server====<<<run");
            while (isConnect) {
                if (System.currentTimeMillis() - timeout > TIMEOUT_MILLISECOND) {
                    closeSocket();
                    System.out.println("closeSocket");
                } else {

                    try {
                       
                        //刷新当前时间
                        this.timeout = System.currentTimeMillis();
                        InputStream inputStream = socket.getInputStream();
                        byte[]  bytes = new byte[inputStream.available()];
                        inputStream.read(bytes);
                        String str = new String(bytes, StandardCharsets.UTF_8);
                        log.info("接受===>>>" + str);
                    } catch (IOException) {
                        log.error("客户端{}连接断开", socket.getInetAddress());
                        closeSocket();
                    }
                }

            }
        }

        private void closeSocket() {

            isConnect = false;

            if (socket == null) {
                return;
            }

            UPLOAD_CLIENTS.remove(socket);

            try {
                this.socket.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}

客户端soeket代码

import lombok.extern.log4j.Log4j2;

import java.io.IOException;

import java.net.Socket;
import java.net.UnknownHostException;
import java.util.concurrent.TimeUnit;

/**
 * 客户端心跳处理
 *
 * @author hulibin
 * @since 2021/2/23
 */
@Log4j2
public class SocketClient implements Runnable {

    private final String serverIp;

    private final int serverPort;

    public SocketClient(String serverIp, int serverPort) {
        this.serverIp = serverIp;
        this.serverPort = serverPort;
    }

    @Override
    public void run() {
        Socket socket = null;
        boolean open = false;
        while (!Thread.currentThread().isInterrupted()) {
            //初始化客户端
            try {
                if (!open) {
                    socket = new Socket(serverIp, serverPort);
                    socket.setKeepAlive(true);
                    socket.setTcpNoDelay(true);
                    socket.setOOBInline(true);
                    open = socket.isConnected();
                    if (open) {
                        log.info("连接服务端成功");
                    }
                }
            } catch (UnknownHostException e) {
                log.error("服务端口不能识别");
            } catch (IOException e) {
                log.error("连接服务端失败");
            }
            //发送心跳
            try {
                if (open) {
                    String uri = "中文" + socket.getPort() +":"+ socket.getInetAddress().getHostAddress();
                    socket.getOutputStream().write(uri.getBytes());
                    log.info("send===>>>" + uri);
                    socket.getOutputStream().flush();
                }
            } catch (Exception e) {
                try {
                    socket.close();
                } catch (IOException ioException) {
                    log.error("发送心跳失败");
                }
                open = false;
                log.error(e);
            }
            //心跳间隔
            try {
                TimeUnit.SECONDS.sleep(5);
            } catch (InterruptedException interruptedException) {
                interruptedException.printStackTrace();
                Thread.currentThread().interrupt();
            }
        }
        System.out.println("close heart client");
    }
}

启动服务端


import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

import java.util.concurrent.ThreadPoolExecutor;

/**
 * @author :dzp
 * @date :Created in 2021/9/1 14:09
 * @description:
 */
public class HeadServerTest {


    public static void main(String[] args) {

        ThreadPoolTaskExecutor threadPoolTaskExecutor = new ThreadPoolTaskExecutor();
        int corePoolSize = Math.max(0, (Runtime.getRuntime().availableProcessors() + 1));
        threadPoolTaskExecutor.setMaxPoolSize(corePoolSize);
        threadPoolTaskExecutor.setCorePoolSize(corePoolSize);
        threadPoolTaskExecutor.setQueueCapacity(0);
        threadPoolTaskExecutor.setKeepAliveSeconds(60);
        threadPoolTaskExecutor.setWaitForTasksToCompleteOnShutdown(true);
        threadPoolTaskExecutor.setAwaitTerminationSeconds(3);
        threadPoolTaskExecutor.setRejectedExecutionHandler(new ThreadPoolExecutor.DiscardPolicy());
        threadPoolTaskExecutor.initialize();
        //启动服务端口socket监听
        HeartBeatServer heartBeatServer = new HeartBeatServer(9999, threadPoolTaskExecutor);
        Thread thread = new Thread(heartBeatServer);
        thread.start();
    }
}

启动客户端

public class HeadtClient {
    public static void main(String[] args) {
        SocketClient socketClient = new SocketClient("127.0.0.1", 9999);
        Thread thread = new Thread(socketClient);
        thread.start();
    }
}

复现

启动arthas-boot.jar 并选择HeartBeatServer 对应的对应的进程号,然后输入

thread -n 3 -i 1000

也出现了CPU占用较高的情况:

[arthas@9304]$ thread -n 3 -i 1000
thread -n 3 -i 1000
"ThreadPoolTaskExecutor-1" Id=16 cpuUsage=89.58% deltaTime=906ms time=33078ms RUNNABLE
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.Socket.getInputStream(Socket.java:911)
    at com.example.demo.test.HeartBeatServer$ReceiveHeartBeat.run(HeartBeatServer.java:83)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)


"VM Thread" [Internal] cpuUsage=1.54% deltaTime=15ms time=171ms


"Reference Handler" Id=2 cpuUsage=0.0% deltaTime=0ms time=0ms WAITING on java.lang.ref.Reference$Lock@4e01ee05
    at java.lang.Object.wait(Native Method)
    -  waiting on java.lang.ref.Reference$Lock@4e01ee05
    at java.lang.Object.wait(Object.java:502)
    at java.lang.ref.Reference.tryHandlePending(Reference.java:191)
    at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:153)

修复

上述问题我们已经在本地成功复现,那是什么原因呢? 其实是一个很简单的问题。
while死循环引起的。 服务端不停的在监听客户端发过来的字节。。有必要嘛?
很明显是没必要的,加一个线程睡眠时长就可以解决了。

 //线程休息5s
 TimeUnit.SECONDS.sleep(5);

完整代码如下:

import lombok.extern.log4j.Log4j2;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;
import java.io.IOException;
import java.io.InputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.TimeUnit;

/**
 * @author :dzp
 * @date :Created in 2021/9/1 14:07
 * @description:
 */
@Log4j2
public class HeartBeatServer implements Runnable {

    /**
     * 上传客户端的数量
     */
    public static volatile Set<Socket> UPLOAD_CLIENTS = new HashSet<>();

    private final int socketPort;

    private final ThreadPoolTaskExecutor taskExecutor;

    public HeartBeatServer(int socketPort, ThreadPoolTaskExecutor taskExecutor) {
        this.socketPort = socketPort;
        this.taskExecutor = taskExecutor;
    }

    @Override
    public void run() {
        try {
            ServerSocket serverSocket = new ServerSocket(socketPort);
            while (!Thread.currentThread().isInterrupted() ) {
                //监听并接受套接字的连接
                Socket socket = serverSocket.accept();
                UPLOAD_CLIENTS.add(socket);
                taskExecutor.execute(new ReceiveHeartBeat(socket));
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    private static   class ReceiveHeartBeat implements Runnable {

        /**
         * 10秒判断心跳是否发送
         */
        private static final long TIMEOUT_MILLISECOND = 10000;

        private final Socket socket;

        private boolean isConnect;

        private long timeout;

        private ReceiveHeartBeat(Socket socket) {
            log.info("客户端{}连接成功", socket.getInetAddress());
            this.socket = socket;
            this.timeout = System.currentTimeMillis();
            this.isConnect = true;

        }

        @Override
        public void run() {
            System.out.println("server====<<<run");
            while (isConnect) {
                if (System.currentTimeMillis() - timeout > TIMEOUT_MILLISECOND) {
                    closeSocket();
                    System.out.println("closeSocket");
                } else {

                    try {
                    //睡眠时长
                        TimeUnit.SECONDS.sleep(5);
                        //刷新当前时间
                        this.timeout = System.currentTimeMillis();
                        InputStream inputStream = socket.getInputStream();
                        byte[]  bytes = new byte[inputStream.available()];
                        inputStream.read(bytes);
                        String str = new String(bytes, StandardCharsets.UTF_8);
                        log.info("接受===>>>" + str);
                    } catch (IOException | InterruptedException e) {
                        log.error("客户端{}连接断开", socket.getInetAddress());
                        closeSocket();
                    }
                }

            }
        }

        private void closeSocket() {

            isConnect = false;

            if (socket == null) {
                return;
            }

            UPLOAD_CLIENTS.remove(socket);

            try {
                this.socket.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}

改好后,我们在启动sokcet服务度和客户端,并再次启动启动arthas-boot.jar,执行“thread -n 3 -i 1000” 看下情况

[arthas@7472]$ thread -n 3 -i 1000
thread -n 3 -i 1000
"Reference Handler" Id=2 cpuUsage=0.0% deltaTime=0ms time=0ms WAITING on java.lang.ref.Reference$Lock@60f8a107
    at java.lang.Object.wait(Native Method)
    -  waiting on java.lang.ref.Reference$Lock@60f8a107
    at java.lang.Object.wait(Object.java:502)
    at java.lang.ref.Reference.tryHandlePending(Reference.java:191)
    at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:153)


"Finalizer" Id=3 cpuUsage=0.0% deltaTime=0ms time=0ms WAITING on java.lang.ref.ReferenceQueue$Lock@455a7657
    at java.lang.Object.wait(Native Method)
    -  waiting on java.lang.ref.ReferenceQueue$Lock@455a7657
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:144)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:165)
    at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:216)


"Signal Dispatcher" Id=4 cpuUsage=0.0% deltaTime=0ms time=0ms RUNNABLE

前三的CPU占用量的都很低,成功解决!!
那只需要在xx项目中增加线程睡眠代码并打包给运维,让他在晚上更新就可以了。

反馈

部署一天后运维反馈如下:
昨天xx程序都全部升级到xx版本了,目前一切正常,cpu占用已经下来了,延迟也正常。

Arthas 参考文档

Arthas中文文档

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包

打赏作者

一枚开发小咸鱼

原创不宜,请作者喝杯咖啡吧。

¥1 ¥2 ¥4 ¥6 ¥10 ¥20
扫码支付:¥1
获取中
扫码支付

您的余额不足,请更换扫码支付或充值

打赏作者

实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值