Symptoms
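We upgraded a project's probe to version 5.0.2. In Java service monitoring, every server showed as available.
However, the server details page still reported the JVM probe as version v3.5.0_jvm20190127_beta, and none of the servers were uploading any data.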
Analysis
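probes.log contained several errors. The java.net.UnknownHostException: yyy.yonyoucloud.com entries can be ignored: the agent first checks whether this host can reach yyy.yonyoucloud.com directly. If it can, it sends data straight there; only if it cannot does it relay the data through the jump server.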
2019-07-02 17:24:53 ProbeUtils - [vert.x-eventloop-thread-0] - [ ERROR ] init read probe http urls from config/probe_httpurl.properties failed
2019-07-02 17:24:53 ServerMapSyncEngine - [DefaultQuartzScheduler_Worker-3] - [ ERROR ] Send server info happends ioexception!
com.yonyou.yyy.exception.NetworkException: yyy.yonyoucloud.com
at com.yonyou.yyy.sender.HttpClientUtil.sendPost(HttpClientUtil.java:190) ~[agent-sender-0.0.1.jar:?]
at com.yonyou.yyy.sender.HttpClientUtil.send(HttpClientUtil.java:129) ~[agent-sender-0.0.1.jar:?]
at com.yonyou.yyy.sender.HttpClientUtil.send(HttpClientUtil.java:92) ~[agent-sender-0.0.1.jar:?]
at com.yonyou.yyy.sender.HttpClientSenderDataHandler.process(HttpClientSenderDataHandler.java:47) ~[agent-sender-0.0.1.jar:?]
at com.yonyou.yyy.datahandler.DefaultDataHandlerContext.process(DefaultDataHandlerContext.java:130) ~[agent-datahandler-0.0.1.jar:?]
at com.yonyou.yyy.probes.server.ServerMapSyncEngine.pullServerInfosFromCloud(ServerMapSyncEngine.java:109) [agent-probes-0.0.1.jar:?]
at com.yonyou.yyy.probes.server.ServerMapPullTask.execute(ServerMapPullTask.java:21) [agent-probes-0.0.1.jar:?]
at org.quartz.core.JobRunShell.run(JobRunShell.java:202) [quartz-2.3.0.jar:?]
at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573) [quartz-2.3.0.jar:?]
Caused by: java.net.UnknownHostException: yyy.yonyoucloud.com
at java.net.InetAddress.getAllByName0(InetAddress.java:1281) ~[?:1.8.0_201]
at java.net.InetAddress.getAllByName(InetAddress.java:1193) ~[?:1.8.0_201]
at java.net.InetAddress.getAllByName(InetAddress.java:1127) ~[?:1.8.0_201]
at org.apache.http.impl.conn.SystemDefaultDnsResolver.resolve(SystemDefaultDnsResolver.java:45) ~[httpclient-4.5.3.jar:4.5.3]
at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:112) ~[httpclient-4.5.3.jar:4.5.3]
at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:359) ~[httpclient-4.5.3.jar:4.5.3]
at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:381) ~[httpclient-4.5.3.jar:4.5.3]
at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:237) ~[httpclient-4.5.3.jar:4.5.3]
at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185) ~[httpclient-4.5.3.jar:4.5.3]
at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89) ~[httpclient-4.5.3.jar:4.5.3]
at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:111) ~[httpclient-4.5.3.jar:4.5.3]
at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185) ~[httpclient-4.5.3.jar:4.5.3]
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83) ~[httpclient-4.5.3.jar:4.5.3]
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:108) ~[httpclient-4.5.3.jar:4.5.3]
at com.yonyou.yyy.sender.HttpClientUtil.sendPost(HttpClientUtil.java:177) ~[agent-sender-0.0.1.jar:?]
... 8 more
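cloudsync.log likewise kept reporting an exception while pulling the cloud configuration (拉取云端配置发生异常), caused by the same UnknownHostException: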
2019-07-02 17:24:53 CloudSyncTask - [DefaultQuartzScheduler_Worker-5] - [ ERROR ] 拉取云端配置发生异常!{}
com.yonyou.yyy.exception.NetworkException: yyy.yonyoucloud.com
at com.yonyou.yyy.sender.HttpClientUtil.sendPost(HttpClientUtil.java:190) ~[agent-sender-0.0.1.jar:?]
at com.yonyou.yyy.sender.HttpClientUtil.send(HttpClientUtil.java:129) ~[agent-sender-0.0.1.jar:?]
at com.yonyou.yyy.sender.HttpClientUtil.send(HttpClientUtil.java:92) ~[agent-sender-0.0.1.jar:?]
at com.yonyou.yyy.sender.HttpClientSenderDataHandler.process(HttpClientSenderDataHandler.java:47) ~[agent-sender-0.0.1.jar:?]
at com.yonyou.yyy.datahandler.DefaultDataHandlerContext.process(DefaultDataHandlerContext.java:130) ~[agent-datahandler-0.0.1.jar:?]
at com.yonyou.yyy.cloudsync.CloudSyncTask.getSyncingMsg(CloudSyncTask.java:70) [agent-cloudsync-0.0.1.jar:?]
at com.yonyou.yyy.cloudsync.CloudSyncTask.execute(CloudSyncTask.java:40) [agent-cloudsync-0.0.1.jar:?]
at org.quartz.core.JobRunShell.run(JobRunShell.java:202) [quartz-2.3.0.jar:?]
at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573) [quartz-2.3.0.jar:?]
Caused by: java.net.UnknownHostException: yyy.yonyoucloud.com
at java.net.InetAddress.getAllByName0(InetAddress.java:1281) ~[?:1.8.0_201]
at java.net.InetAddress.getAllByName(InetAddress.java:1193) ~[?:1.8.0_201]
at java.net.InetAddress.getAllByName(InetAddress.java:1127) ~[?:1.8.0_201]
at org.apache.http.impl.conn.SystemDefaultDnsResolver.resolve(SystemDefaultDnsResolver.java:45) ~[httpclient-4.5.3.jar:4.5.3]
at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:112) ~[httpclient-4.5.3.jar:4.5.3]
at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:359) ~[httpclient-4.5.3.jar:4.5.3]
at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:381) ~[httpclient-4.5.3.jar:4.5.3]
at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:237) ~[httpclient-4.5.3.jar:4.5.3]
at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185) ~[httpclient-4.5.3.jar:4.5.3]
at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89) ~[httpclient-4.5.3.jar:4.5.3]
at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:111) ~[httpclient-4.5.3.jar:4.5.3]
at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185) ~[httpclient-4.5.3.jar:4.5.3]
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83) ~[httpclient-4.5.3.jar:4.5.3]
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:108) ~[httpclient-4.5.3.jar:4.5.3]
at com.yonyou.yyy.sender.HttpClientUtil.sendPost(HttpClientUtil.java:177) ~[agent-sender-0.0.1.jar:?]
... 8 more
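Both stack traces end in InetAddress.getAllByName, i.e. a failed DNS lookup of yyy.yonyoucloud.com. The direct-first, jump-server-fallback behaviour described above can be sketched as follows. This is an illustration under assumptions, not the agent's code: RouteSelector, Route and chooseRoute are hypothetical names, and the sketch decides purely on whether the host name resolves; a real agent would likely also test TCP reachability.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Hypothetical sketch of "send directly if possible, otherwise via the jump server". */
public class RouteSelector {

    enum Route { DIRECT, JUMP_SERVER }

    /**
     * Chooses DIRECT when the collector host name resolves on this machine,
     * JUMP_SERVER otherwise. Only DNS resolution is checked here.
     */
    static Route chooseRoute(String collectorHost) {
        try {
            // Same JDK call that fails in probes.log and cloudsync.log
            InetAddress.getAllByName(collectorHost);
            return Route.DIRECT;
        } catch (UnknownHostException e) {
            // Typical on intranet hosts without public DNS: fall back to the relay
            return Route.JUMP_SERVER;
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "yyy.yonyoucloud.com";
        System.out.println(host + " -> " + chooseRoute(host));
    }
}
```

Run it as java RouteSelector [host]. On a host that produces the UnknownHostException above, the lookup fails the same way and the sketch takes the JUMP_SERVER branch, which is why those log entries are expected noise rather than the root cause.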
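The per-server log files generated under probes/logs kept reporting an "Address is used" error, so we suspected that the 友云音 ports 4574 to 4579 were already occupied. Checking port 4575 with lsof: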
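The port hypothesis can also be tested from Java by trying to bind each port and seeing whether the bind fails. The sketch below assumes the 4574 to 4579 range from the logs and checks both TCP and UDP, since which protocol each port uses is not stated; PortProbe, tcpFree and udpFree are illustrative names. Run it while the agent is stopped, otherwise the agent's own sockets will also show up as busy.

```java
import java.io.IOException;
import java.net.DatagramSocket;
import java.net.ServerSocket;
import java.net.SocketException;

/** Reports whether each port in a range can currently be bound (TCP and UDP). */
public class PortProbe {

    /** True if a TCP server socket can bind the port on all interfaces. */
    static boolean tcpFree(int port) {
        try (ServerSocket socket = new ServerSocket(port)) {
            return true;
        } catch (IOException e) {
            // BindException ("Address already in use") or a permission error
            return false;
        }
    }

    /** True if a UDP socket can bind the port on all interfaces. */
    static boolean udpFree(int port) {
        try (DatagramSocket socket = new DatagramSocket(port)) {
            return true;
        } catch (SocketException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Port range taken from the agent logs; adjust if your agent uses others
        for (int port = 4574; port <= 4579; port++) {
            System.out.printf("%d  tcp=%s  udp=%s%n", port,
                    tcpFree(port) ? "free" : "IN USE",
                    udpFree(port) ? "free" : "IN USE");
        }
    }
}
```

A failed bind only shows that a port is taken, not by whom; identifying the owning process is what lsof -i reports.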
[root@ncapp2 yonyou-yyy-agent]# lsof -i :4575
COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
java 4339 root 790u IPv6 17689840 UDP *:4575
java 4339 root 791u IPv6 17689841 UDP *:4575
java 9990 root 119u IPv6 32136 TCP *:4575 (LISTEN)
Port 4575 is held by two java processes, PID 4339 (UDP) and PID 9990 (TCP, LISTEN). Checking each PID showed that both are 友云音 agent processes:
[root@ncapp2 yonyou-yyy-agent]# ps -ef | grep 4339
root 4339 1 0 17:24 pts/2 00:01:24 /opt/nchome/yonyou-yyy-agent/ufjre/bin/java -Xmx256m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/opt/nchome/yonyou-yyy-agent -Dlog4j2.configurationFile=/opt/nchome/yonyou-yyy-agent/config/log4j2.xml -Dfile.encoding=UTF-8 -Djava.library.path=/opt/nchome/yonyou-yyy-agent/tools -classpath ./*:lib/* com.yonyou.yyy.core.YyyBootstrapMain
root 25242 1368 0 19:47 pts/2 00:00:00 grep 4339
[root@ncapp2 yonyou-yyy-agent]# ps -ef | grep 9990
root 9990 1 0 Jun09 ? 03:45:39 /opt/nchome/yonyou-yyy-agent/ufjre/bin/java -Xmx256m -XX:+HeapDumpOnOutOfMemoryError -Dorg.jboss.netty.epollBugWorkaround=true -XX:HeapDumpPath=/opt/nchome/yonyou-yyy-agent -Dfile.encoding=UTF-8 -Djava.library.path=/opt/nchome/yonyou-yyy-agent/tools -jar yonyou-yyy-agent.jar -n yyyAgent -f system.properties
root 25215 1368 0 19:46 pts/2 00:00:00 grep 9990
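Judging by the launch arguments, PID 9990 is the old agent process: it has been running since Jun09 and was launched with -jar yonyou-yyy-agent.jar, whereas PID 4339 started at 17:24 with the new entry point com.yonyou.yyy.core.YyyBootstrapMain.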
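The distinction above can be expressed as a simple command-line heuristic. The two marker strings are copied from the ps -ef output (-jar yonyou-yyy-agent.jar for the old agent, com.yonyou.yyy.core.YyyBootstrapMain for the new one); treating them as reliable version markers is an assumption, and AgentClassifier is a hypothetical helper. The main method uses ProcessHandle, which needs Java 9 or later, so it cannot run on the agent's bundled 1.8 JRE.

```java
/** Heuristic: tell the old and new yyy agent JVMs apart by their command line. */
public class AgentClassifier {

    enum AgentKind { OLD, NEW, OTHER }

    // Markers copied from the two ps -ef lines; assumed (not documented) to be stable
    static final String OLD_MARKER = "-jar yonyou-yyy-agent.jar";
    static final String NEW_MARKER = "com.yonyou.yyy.core.YyyBootstrapMain";

    static AgentKind classify(String commandLine) {
        if (commandLine.contains(NEW_MARKER)) {
            return AgentKind.NEW;
        }
        if (commandLine.contains(OLD_MARKER)) {
            return AgentKind.OLD;
        }
        return AgentKind.OTHER;
    }

    public static void main(String[] args) {
        // ProcessHandle requires Java 9+; commandLine() may be empty for other users' processes
        ProcessHandle.allProcesses().forEach(ph ->
                ph.info().commandLine().ifPresent(cmd -> {
                    AgentKind kind = classify(cmd);
                    if (kind != AgentKind.OTHER) {
                        System.out.println(ph.pid() + "\t" + kind + "\t" + cmd);
                    }
                }));
    }
}
```

Run as root so that other users' command lines are visible; the output lists each agent PID with its guessed generation.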
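However, only the new agent was present under the customer's nchome directory. Asking the on-site consultant revealed that the consultant had earlier deleted the old 友云音 agent manually but never stopped its process, so the old agent kept running and holding port 4575.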
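When an installation is deleted while its JVM keeps running, Linux typically exposes this through the process's /proc/PID/exe link, whose target gets a " (deleted)" suffix once the file is unlinked. If the old ufjre directory was removed as described, the old agent's link would be expected to show that suffix. The sketch below reads the link; it assumes a Linux /proc filesystem and sufficient privileges (root or the process owner), and DeletedExeCheck is an illustrative name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Flags processes whose executable has been deleted from disk (Linux only). */
public class DeletedExeCheck {

    /** Suffix Linux appends to a /proc/PID/exe target whose file was unlinked. */
    static final String DELETED_SUFFIX = " (deleted)";

    static boolean isDeletedTarget(String linkTarget) {
        return linkTarget.endsWith(DELETED_SUFFIX);
    }

    /** Reads the /proc/PID/exe symlink; needs root or the same user as the process. */
    static String exeTarget(long pid) throws IOException {
        Path link = Paths.get("/proc", Long.toString(pid), "exe");
        return Files.readSymbolicLink(link).toString();
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            long pid = Long.parseLong(arg);
            String target = exeTarget(pid);
            System.out.println(pid + " -> " + target
                    + (isDeletedTarget(target) ? "   [binary deleted, process still running]" : ""));
        }
    }
}
```

For example, java DeletedExeCheck 9990 4339 would compare the two agent processes side by side.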
Solution
Kill the old agent process:
kill -9 9990
After that, the JVM data was uploaded successfully.
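kill -9 sends SIGKILL, which gives the JVM no chance to run shutdown hooks. A gentler sequence is to request termination first and escalate only if the process does not exit in time. The sketch below does that with Java 9's ProcessHandle; in OpenJDK on Linux, destroy() sends SIGTERM and destroyForcibly() sends SIGKILL. StopOldAgent and the 30-second grace period are illustrative choices, not something the agent provides, and the PID should still be verified against ps -ef before use.

```java
import java.util.Optional;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Stop a process politely first, then forcibly (the equivalent of kill -9). */
public class StopOldAgent {

    /** Returns true once the process with this PID is no longer running. */
    static boolean stop(long pid, long graceSeconds) throws InterruptedException {
        Optional<ProcessHandle> found = ProcessHandle.of(pid);
        if (!found.isPresent()) {
            return true; // no such process: nothing to do
        }
        ProcessHandle ph = found.get();
        ph.destroy(); // SIGTERM on Linux: lets shutdown hooks run
        if (waitForExit(ph, graceSeconds)) {
            return true;
        }
        ph.destroyForcibly(); // SIGKILL on Linux, same as kill -9
        return waitForExit(ph, 5);
    }

    private static boolean waitForExit(ProcessHandle ph, long seconds) throws InterruptedException {
        try {
            ph.onExit().get(seconds, TimeUnit.SECONDS);
            return true;
        } catch (TimeoutException | ExecutionException e) {
            return !ph.isAlive();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        long pid = Long.parseLong(args[0]); // e.g. the old agent's PID from ps -ef
        System.out.println(pid + (stop(pid, 30) ? " stopped" : " still running"));
    }
}
```

Running java StopOldAgent 9990 as root would have had the same end result as kill -9 9990, only with a grace period first.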