上班坐下来没多久,接同事电话说有两台mysql服务器无法访问,其中这两台服务器是mycat服务器+MySQL服务器,具体处理过程如下:
一、错误信息
错误信息01:
INFO | jvm 3 | 2018/05/10 10:00:54 | 2018-05-10 10:00:54,406 [INFO ][$_NIOREACTOR-3-RW] close connection,reason:Bad: java.lang.OutOfMemoryError: Unable to acquire 131072 bytes of memory, got 0 ,MySQLConnection [id=6173, lastTime=1525917654 250, user=username, schema=database10, old shema=database10, borrowed=true, fromSlaveDB=true, threadId=42067, charset=utf8, txIsolation=3, autocommit=true, attachment=dn10{SELECT * INFO | jvm 3 | 2018/05/10 10:00:54 | FROM cu_link INFO | jvm 3 | 2018/05/10 10:00:54 | WHERE (customer_id = 1964 INFO | jvm 3 | 2018/05/10 10:00:54 | OR customer_link_id = 1964) INFO | jvm 3 | 2018/05/10 10:00:54 | AND state = 1 INFO | jvm 3 | 2018/05/10 10:00:54 | AND company_id = '3BD' INFO | jvm 3 | 2018/05/10 10:00:54 | LIMIT 100000000}, respHandler=io.mycat.backend.mysql.nio.handler.MultiNodeQueryH andler@530d01dc, host=xx.xx.xx.xx, port=3306, statusSync=null, writeQueue=0, modifiedSQLExecuted=false] (io.mycat.net. AbstractConnection:AbstractConnection.java:508) INFO | jvm 3 | 2018/05/10 10:00:54 | 2018-05-10 10:00:54,407 [ERROR][$_NIOREACTOR-3-RW] caught err: java.lang.OutOfM emoryError: Unable to acquire 131072 bytes of memory, got 0 INFO | jvm 3 | 2018/05/10 10:00:54 | at io.mycat.memory.unsafe.memory.mm.MemoryConsumer.allocateLongArray(Memor yConsumer.java:92) INFO | jvm 3 | 2018/05/10 10:00:54 | at io.mycat.memory.unsafe.utils.sort.UnsafeInMemorySorter.<init>(UnsafeInM emorySorter.java:114) INFO | jvm 3 | 2018/05/10 10:00:54 | at io.mycat.memory.unsafe.utils.sort.UnsafeExternalSorter.<init>(UnsafeExt ernalSorter.java:131) INFO | jvm 3 | 2018/05/10 10:00:54 | at io.mycat.memory.unsafe.utils.sort.UnsafeExternalSorter.create(UnsafeExt ernalSorter.java:98) INFO | jvm 3 | 2018/05/10 10:00:54 | at io.mycat.memory.unsafe.utils.sort.UnsafeExternalRowSorter.<init>(Unsafe ExternalRowSorter.java:64) INFO | jvm 3 | 2018/05/10 10:00:54 | at io.mycat.sqlengine.mpp.DataNodeMergeManager.onRowMetaData(DataNodeMerge Manager.java:249) INFO | jvm 3 | 2018/05/10 10:00:54 | at io.mycat.backend.mysql.nio.handler.MultiNodeQueryHandler.fieldEofRespon se(MultiNodeQueryHandler.java:643) INFO | jvm 3 | 2018/05/10 10:00:54 | at io.mycat.backend.mysql.nio.MySQLConnectionHandler.handleFieldEofPacket( MySQLConnectionHandler.java:192) INFO | jvm 3 | 2018/05/10 10:00:54 | at io.mycat.backend.mysql.nio.MySQLConnectionHandler.handleData(MySQLConne ctionHandler.java:118) INFO | jvm 3 | 2018/05/10 10:00:54 | at io.mycat.net.handler.BackendAsyncHandler.offerData(BackendAsyncHandler. java:36) INFO | jvm 3 | 2018/05/10 10:00:54 | at io.mycat.backend.mysql.nio.MySQLConnectionHandler.handle(MySQLConnectio nHandler.java:80) INFO | jvm 3 | 2018/05/10 10:00:54 | at io.mycat.net.AbstractConnection.handle(AbstractConnection.java:263) INFO | jvm 3 | 2018/05/10 10:00:54 | at io.mycat.net.AbstractConnection.onReadData(AbstractConnection.java:321) INFO | jvm 3 | 2018/05/10 10:00:54 | at io.mycat.net.NIOSocketWR.asynRead(NIOSocketWR.java:190) INFO | jvm 3 | 2018/05/10 10:00:54 | at io.mycat.net.AbstractConnection.asynRead(AbstractConnection.java:273) INFO | jvm 3 | 2018/05/10 10:00:54 | at io.mycat.net.NIOReactor$RW.run(NIOReactor.java:102) INFO | jvm 3 | 2018/05/10 10:00:54 | at java.lang.Thread.run(Thread.java:745) INFO | jvm 3 | 2018/05/10 10:00:54 | (io.mycat.net.NIOReactor$RW:NIOReactor.java:132)
错误信息02:
LIMIT 100000000}, packetId=1], host=xx.xx.xx.xx, port=3306, statusSync=null, writeQueue=0, modifiedSQLExecuted=false] frontend host:x.x.x.x/49657/username 2018-05-10 10:25:39.883 WARN [$_NIOREACTOR-2-RW] (io.mycat.backend.mysql.nio.handler.SingleNodeHandler.backConnectionErr(SingleNodeHandler.java:249)) - execute sql err : errno:1146 Table 'cr.vacode' doesn't exist con:MySQLConnection [id=89, lastTime=1525919139870, user=username, schema=cr, old shema=cr, borrowed=true, fromSlaveDB=true, threadId=553442, charset=utf8, txIsolation=3, autocommit=true, attachment=dn00{SELECT * FROM vaCode ORDER BY id DESC
错误信息03:
java.sql.SQLNonTransientException: find no Route:SELECT * FROM `database03`.`cain` LIMIT 0
二、错误原因
1、针对错误信息01,基本可以断定是由于服务器jvm设置不当,导致内存溢出。
2、针对错误信息02,基本可以断定是由于在数据库服务器新增表,没有配置在mycat中或由于MyCAT大小写配置和数据库服务器表名大小写配置有关
3、针对错误信息03,基本可以断定路由问题
三、解决方案
1、查看系统内存
[root@localhost ~]# free -g total used free shared buffers cached Mem: 15 15 0 5 0 6 -/+ buffers/cache: 8 6 Swap: 7 5 2
2、查看系统日志
[root@localhost logs]#vi /var/log/message
如果是OOM的话,系统日志会记录的。
3、查看数据库日志和mycat日志
[root@localhost logs]#vi /home/mysql/err.log
[root@localhost logs]# vi /usr/local/mycat/logs/mycat.log
[root@localhost logs]# [root@localhost logs]# vi /usr/local/mycat/logs/wrapper.log
4、查看监控情况
通过zabbix查看系统CPU、内存、SWAP使用情况
5、查看具体进程对应资源消耗
[root@localhost ~]# top -p 31665
[root@localhost ~]# top -p 30388
通过ps 查看系统进程的时候和top查看资源消耗的时候,发现居然还跑着oracle进程,而且其sga设置为8G,PGA设置的是2G
这里看到实际设置的最大应用内存(21G+)已经超过了物理内存的实际最大值(15G),而写看到系统已经使用了swap分区(虽然现在看着还有7G左右的cache),说明
6、查看mycat的进程运行内存
下面的最大堆内存4G(具体使用情况可以通过jmap查看),单线程内存最大64M,最大的nio内存2G(DirectMemory是java nio引入的,直接以native的方式分配内存,不受jvm管理)
查看其具体配置文件
[root@localhost mycat]# more /usr/local/mycat/conf/wrapper.conf
# Java Additional Parameters #wrapper.java.additional.1= wrapper.java.additional.1=-DMYCAT_HOME=. wrapper.java.additional.2=-server wrapper.java.additional.3=-XX:MaxPermSize=64M wrapper.java.additional.4=-XX:+AggressiveOpts wrapper.java.additional.5=-XX:MaxDirectMemorySize=2G wrapper.java.additional.6=-Dcom.sun.management.jmxremote wrapper.java.additional.7=-Dcom.sun.management.jmxremote.port=1984 wrapper.java.additional.8=-Dcom.sun.management.jmxremote.authenticate=false wrapper.java.additional.9=-Dcom.sun.management.jmxremote.ssl=false wrapper.java.additional.10=-Xmx4G wrapper.java.additional.11=-Xms1G # Initial Java Heap Size (in MB) #wrapper.java.initmemory=3 # Maximum Java Heap Size (in MB) #wrapper.java.maxmemory=64 # Application parameters. Add parameters as needed starting from 1 wrapper.app.parameter.1=io.mycat.MycatStartup wrapper.app.parameter.2=start
这里看内存设置还是正常范围内
7、最终解决方案
(由于开发联系我时,我测试数据库服务、MyCAT服务均正常,所以具体排查过程大致如上,后经与开发沟通,他们说是磁盘空间不足导致的,清理磁盘重启mycat问题解决)
8、资源统计
[root@localhost logs]# dstat -lrmc --top-io --top-mem --top-cpu
Terminal width too small, trimming output.
---load-avg--- --io/total- ------memory-usage----- ----total-cpu-usage---- ----most-expensive---- --most-expensive->
1m 5m 15m | read writ| used buff cach free|usr sys idl wai hiq siq| i/o process | memory process >
0.86 0.56 0.47|76.7 105 |8825M 7816k 6938M 181M| 4 1 92 3 0 0|init 16M 477k|mysqld 5789M>
0.86 0.56 0.47|1083 177 |8835M 7828k 6928M 181M| 30 4 47 19 0 1|mysqld 416k 4354k|mysqld 5789M>