运维反馈
运维反馈:xx程序出现占用cpu高,数据推送延迟较大的问题。
问题背景
xx项目是其他同事开发的,已运行了大概半年的时间。同事前一段时间刚离职,*****。
废话不多说,开始分析:
初步怀疑是数据量上去了,就问了下现场的运维同学。最近数据量是否增多,现场同事反馈,每天都是四百万左右的数据没有增多。运行一段时间后,CPU就会越来越高。问题一直存在,只是一直没反馈。 最近忍不住了,我对此无语。。。。
使用arthas排查
问运维现场服务器上个是否有arthas-boot.jar ,有的话直接启动并指定到xx模块的进程号。
arthas的下载
curl -O https://arthas.aliyun.com/arthas-boot.jar
如对arthas不够了解的可先看:
Arthas的基础使用(一)
Arthas的基础使用(二)
让现场运维同学执行:
thread -n 3 -i 1000
反馈图片为:
问题分析:
排名第一的现场执行了,37360179ms的时间,并且占用了CPU为99.99%。出现代码问题的地方在ReceiveHeartBeat.run() ,那我们就简单分析下代码。该项目因为是部署了几个模块,模块与模块之间是用Socket来保持心跳的。socket服务端检查客户端的是否存在心跳的逻辑大致如下:
private static class ReceiveHeartBeat implements Runnable {
/**
* 30秒判断心跳是否发送
*/
private static final long TIMEOUT_MILLISECOND = 15000;
private final Socket socket;
private boolean isConnect;
private long timeout;
private ReceiveHeartBeat(Socket socket) {
log.info("客户端{}连接成功", socket.getInetAddress());
this.socket = socket;
this.timeout = System.currentTimeMillis();
this.isConnect = true;
}
@Override
public void run() {
while (isConnect) {
if (System.currentTimeMillis() - timeout > TIMEOUT_MILLISECOND) {
log.info("超过心跳时间==毫秒=>>>{}===>>closeSocket",TIMEOUT_MILLISECOND);
closeSocket();
} else {
try {
InputStream inputStream = socket.getInputStream();
byte[] bytes = new byte[inputStream.available()];
inputStream.read();
String str = new String(bytes, StandardCharsets.UTF_8);
if(!StringUtils.isEmpty(str)) {
// log.info("accept===>>>" + str);
//刷新当前时间
this.timeout = System.currentTimeMillis();
}
} catch (IOException | InterruptedException e) {
log.error("客户端{}连接断开", socket.getInetAddress());
closeSocket();
}
}
}
}
private void closeSocket() {
isConnect = false;
if (socket == null) {
return;
}
OssConfig.UPLOAD_CLIENTS.remove(socket);
try {
this.socket.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
简单看了下逻辑:逻辑没什么大的问题。 但这里的CPU使用量99.99%,正常情况下是不会特别大。有点奇怪,仔细看代码,
会不会是这里的这个循环有问题,这里的这个isConnect如果为true的情况下,会一直跑,中途也没看到break。莫非。。。。
本地复现
因为公司没环境,就按照同事写的代码写了一个简单的小demo来复现。
服务端soeket代码
Log4j2
public class HeartBeatServer implements Runnable {
/**
* 上传客户端的数量
*/
public static volatile Set<Socket> UPLOAD_CLIENTS = new HashSet<>();
private final int socketPort;
private final ThreadPoolTaskExecutor taskExecutor;
public HeartBeatServer(int socketPort, ThreadPoolTaskExecutor taskExecutor) {
this.socketPort = socketPort;
this.taskExecutor = taskExecutor;
}
@Override
public void run() {
try {
ServerSocket serverSocket = new ServerSocket(socketPort);
while (!Thread.currentThread().isInterrupted() ) {
//监听并接受套接字的连接
Socket socket = serverSocket.accept();
UPLOAD_CLIENTS.add(socket);
taskExecutor.execute(new ReceiveHeartBeat(socket));
}
} catch (IOException e) {
e.printStackTrace();
}
}
private static class ReceiveHeartBeat implements Runnable {
/**
* 10秒判断心跳是否发送
*/
private static final long TIMEOUT_MILLISECOND = 10000;
private final Socket socket;
private boolean isConnect;
private long timeout;
private ReceiveHeartBeat(Socket socket) {
log.info("客户端{}连接成功", socket.getInetAddress());
this.socket = socket;
this.timeout = System.currentTimeMillis();
this.isConnect = true;
}
@Override
public void run() {
System.out.println("server====<<<run");
while (isConnect) {
if (System.currentTimeMillis() - timeout > TIMEOUT_MILLISECOND) {
closeSocket();
System.out.println("closeSocket");
} else {
try {
//刷新当前时间
this.timeout = System.currentTimeMillis();
InputStream inputStream = socket.getInputStream();
byte[] bytes = new byte[inputStream.available()];
inputStream.read(bytes);
String str = new String(bytes, StandardCharsets.UTF_8);
log.info("接受===>>>" + str);
} catch (IOException) {
log.error("客户端{}连接断开", socket.getInetAddress());
closeSocket();
}
}
}
}
private void closeSocket() {
isConnect = false;
if (socket == null) {
return;
}
UPLOAD_CLIENTS.remove(socket);
try {
this.socket.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
}
客户端soeket代码
import lombok.extern.log4j.Log4j2;
import java.io.IOException;
import java.net.Socket;
import java.net.UnknownHostException;
import java.util.concurrent.TimeUnit;
/**
* 客户端心跳处理
*
* @author hulibin
* @since 2021/2/23
*/
@Log4j2
public class SocketClient implements Runnable {
private final String serverIp;
private final int serverPort;
public SocketClient(String serverIp, int serverPort) {
this.serverIp = serverIp;
this.serverPort = serverPort;
}
@Override
public void run() {
Socket socket = null;
boolean open = false;
while (!Thread.currentThread().isInterrupted()) {
//初始化客户端
try {
if (!open) {
socket = new Socket(serverIp, serverPort);
socket.setKeepAlive(true);
socket.setTcpNoDelay(true);
socket.setOOBInline(true);
open = socket.isConnected();
if (open) {
log.info("连接服务端成功");
}
}
} catch (UnknownHostException e) {
log.error("服务端口不能识别");
} catch (IOException e) {
log.error("连接服务端失败");
}
//发送心跳
try {
if (open) {
String uri = "中文" + socket.getPort() +":"+ socket.getInetAddress().getHostAddress();
socket.getOutputStream().write(uri.getBytes());
log.info("send===>>>" + uri);
socket.getOutputStream().flush();
}
} catch (Exception e) {
try {
socket.close();
} catch (IOException ioException) {
log.error("发送心跳失败");
}
open = false;
log.error(e);
}
//心跳间隔
try {
TimeUnit.SECONDS.sleep(5);
} catch (InterruptedException interruptedException) {
interruptedException.printStackTrace();
Thread.currentThread().interrupt();
}
}
System.out.println("close heart client");
}
}
启动服务端
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;
import java.util.concurrent.ThreadPoolExecutor;
/**
* @author :dzp
* @date :Created in 2021/9/1 14:09
* @description:
*/
public class HeadServerTest {
public static void main(String[] args) {
ThreadPoolTaskExecutor threadPoolTaskExecutor = new ThreadPoolTaskExecutor();
int corePoolSize = Math.max(0, (Runtime.getRuntime().availableProcessors() + 1));
threadPoolTaskExecutor.setMaxPoolSize(corePoolSize);
threadPoolTaskExecutor.setCorePoolSize(corePoolSize);
threadPoolTaskExecutor.setQueueCapacity(0);
threadPoolTaskExecutor.setKeepAliveSeconds(60);
threadPoolTaskExecutor.setWaitForTasksToCompleteOnShutdown(true);
threadPoolTaskExecutor.setAwaitTerminationSeconds(3);
threadPoolTaskExecutor.setRejectedExecutionHandler(new ThreadPoolExecutor.DiscardPolicy());
threadPoolTaskExecutor.initialize();
//启动服务端口socket监听
HeartBeatServer heartBeatServer = new HeartBeatServer(9999, threadPoolTaskExecutor);
Thread thread = new Thread(heartBeatServer);
thread.start();
}
}
启动客户端
public class HeadtClient {
public static void main(String[] args) {
SocketClient socketClient = new SocketClient("127.0.0.1", 9999);
Thread thread = new Thread(socketClient);
thread.start();
}
}
复现
启动arthas-boot.jar 并选择HeartBeatServer 对应的对应的进程号,然后输入
thread -n 3 -i 1000
也出现了CPU占用较高的情况:
[arthas@9304]$ thread -n 3 -i 1000
thread -n 3 -i 1000
"ThreadPoolTaskExecutor-1" Id=16 cpuUsage=89.58% deltaTime=906ms time=33078ms RUNNABLE
at java.security.AccessController.doPrivileged(Native Method)
at java.net.Socket.getInputStream(Socket.java:911)
at com.example.demo.test.HeartBeatServer$ReceiveHeartBeat.run(HeartBeatServer.java:83)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
"VM Thread" [Internal] cpuUsage=1.54% deltaTime=15ms time=171ms
"Reference Handler" Id=2 cpuUsage=0.0% deltaTime=0ms time=0ms WAITING on java.lang.ref.Reference$Lock@4e01ee05
at java.lang.Object.wait(Native Method)
- waiting on java.lang.ref.Reference$Lock@4e01ee05
at java.lang.Object.wait(Object.java:502)
at java.lang.ref.Reference.tryHandlePending(Reference.java:191)
at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:153)
修复
上述问题我们已经在本地成功复现,那是什么原因呢? 其实是一个很简单的问题。
while死循环引起的。 服务端不停的在监听客户端发过来的字节。。有必要嘛?
很明显是没必要的,加一个线程睡眠时长就可以解决了。
//线程休息5s
TimeUnit.SECONDS.sleep(5);
完整代码如下:
import lombok.extern.log4j.Log4j2;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;
import java.io.IOException;
import java.io.InputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.TimeUnit;
/**
* @author :dzp
* @date :Created in 2021/9/1 14:07
* @description:
*/
@Log4j2
public class HeartBeatServer implements Runnable {
/**
* 上传客户端的数量
*/
public static volatile Set<Socket> UPLOAD_CLIENTS = new HashSet<>();
private final int socketPort;
private final ThreadPoolTaskExecutor taskExecutor;
public HeartBeatServer(int socketPort, ThreadPoolTaskExecutor taskExecutor) {
this.socketPort = socketPort;
this.taskExecutor = taskExecutor;
}
@Override
public void run() {
try {
ServerSocket serverSocket = new ServerSocket(socketPort);
while (!Thread.currentThread().isInterrupted() ) {
//监听并接受套接字的连接
Socket socket = serverSocket.accept();
UPLOAD_CLIENTS.add(socket);
taskExecutor.execute(new ReceiveHeartBeat(socket));
}
} catch (IOException e) {
e.printStackTrace();
}
}
private static class ReceiveHeartBeat implements Runnable {
/**
* 10秒判断心跳是否发送
*/
private static final long TIMEOUT_MILLISECOND = 10000;
private final Socket socket;
private boolean isConnect;
private long timeout;
private ReceiveHeartBeat(Socket socket) {
log.info("客户端{}连接成功", socket.getInetAddress());
this.socket = socket;
this.timeout = System.currentTimeMillis();
this.isConnect = true;
}
@Override
public void run() {
System.out.println("server====<<<run");
while (isConnect) {
if (System.currentTimeMillis() - timeout > TIMEOUT_MILLISECOND) {
closeSocket();
System.out.println("closeSocket");
} else {
try {
//睡眠时长
TimeUnit.SECONDS.sleep(5);
//刷新当前时间
this.timeout = System.currentTimeMillis();
InputStream inputStream = socket.getInputStream();
byte[] bytes = new byte[inputStream.available()];
inputStream.read(bytes);
String str = new String(bytes, StandardCharsets.UTF_8);
log.info("接受===>>>" + str);
} catch (IOException | InterruptedException e) {
log.error("客户端{}连接断开", socket.getInetAddress());
closeSocket();
}
}
}
}
private void closeSocket() {
isConnect = false;
if (socket == null) {
return;
}
UPLOAD_CLIENTS.remove(socket);
try {
this.socket.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
}
改好后,我们在启动sokcet服务度和客户端,并再次启动启动arthas-boot.jar,执行“thread -n 3 -i 1000” 看下情况
[arthas@7472]$ thread -n 3 -i 1000
thread -n 3 -i 1000
"Reference Handler" Id=2 cpuUsage=0.0% deltaTime=0ms time=0ms WAITING on java.lang.ref.Reference$Lock@60f8a107
at java.lang.Object.wait(Native Method)
- waiting on java.lang.ref.Reference$Lock@60f8a107
at java.lang.Object.wait(Object.java:502)
at java.lang.ref.Reference.tryHandlePending(Reference.java:191)
at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:153)
"Finalizer" Id=3 cpuUsage=0.0% deltaTime=0ms time=0ms WAITING on java.lang.ref.ReferenceQueue$Lock@455a7657
at java.lang.Object.wait(Native Method)
- waiting on java.lang.ref.ReferenceQueue$Lock@455a7657
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:144)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:165)
at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:216)
"Signal Dispatcher" Id=4 cpuUsage=0.0% deltaTime=0ms time=0ms RUNNABLE
前三的CPU占用量的都很低,成功解决!!
那只需要在xx项目中增加线程睡眠代码并打包给运维,让他在晚上更新就可以了。
反馈
部署一天后运维反馈如下:
昨天xx程序都全部升级到xx版本了,目前一切正常,cpu占用已经下来了,延迟也正常。