HBase Troubleshooting: RegionServers Exiting Abnormally, Part 2

The RegionServers crashed again, which is a real headache.

1. Logs:

2019-11-20 03:47:34,174 INFO [sync.3] wal.FSHLog: Slow sync cost: 464 ms, current pipeline: [DatanodeInfoWithStorage[125.94.213.41:50010,DS-cfd2851f-a298-4976-b0e9-f0546a472cb0,DISK], DatanodeInfoWithStorage[125.94.213.5:50010,DS-795d6f28-78f1-4e11-b0d6-7e87654a3306,DISK], DatanodeInfoWithStorage[125.94.213.48:50010,DS-5796c1b4-95a9-4fac-b588-d3166c44fe0d,DISK]]
2019-11-20 03:47:35,428 INFO [sync.0] wal.FSHLog: Slow sync cost: 210 ms, current pipeline: [DatanodeInfoWithStorage[125.94.213.41:50010,DS-cfd2851f-a298-4976-b0e9-f0546a472cb0,DISK], DatanodeInfoWithStorage[125.94.213.5:50010,DS-795d6f28-78f1-4e11-b0d6-7e87654a3306,DISK], DatanodeInfoWithStorage[125.94.213.48:50010,DS-5796c1b4-95a9-4fac-b588-d3166c44fe0d,DISK]]
2019-11-20 03:49:39,055 INFO [hdpv-014,16020,1574074564013_ChoreService_4] regionserver.HRegionServer$CompactionChecker: Chore: CompactionChecker missed its start time
2019-11-20 03:49:39,055 INFO [hdpv-014,16020,1574074564013_ChoreService_4] regionserver.HRegionServer$PeriodicMemstoreFlusher: Chore: hdpv-014,16020,1574074564013-MemstoreFlusherChore missed its start time
2019-11-20 03:49:39,055 INFO [hdpv-014,16020,1574074564013_ChoreService_4] regionserver.HeapMemoryManager$HeapMemoryTunerChore: Chore: hdpv-014,16020,1574074564013-HeapMemoryTunerChore missed its start time
2019-11-20 03:49:39,268 WARN [regionserver/hdpv-014/125.94.213.41:16020] util.Sleeper: We slept 124370ms instead of 3000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2019-11-20 03:49:39,269 WARN [JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 121995ms
GC pool 'ParNew' had collection(s): count=1 time=120630ms
2019-11-20 03:49:39,423 INFO [RS_OPEN_REGION-hdpv-014:16020-0-SendThread(hdpv-007:2181)] zookeeper.ClientCnxn: Client session timed out, have not heard from server in 157651ms for sessionid 0x16e7ce845dc02b8, closing socket connection and attempting reconnect
2019-11-20 03:49:39,423 INFO [hdpv-014,16020,1574074564013_ChoreService_4] regionserver.HRegionServer$MovedRegionsCleaner: Chore: MovedRegionsCleaner for region hdpv-014,16020,1574074564013 missed its start time
2019-11-20 03:49:39,423 INFO [regionserver/hdpv-014/125.94.213.41:16020-SendThread(hdpv-001:2181)] zookeeper.ClientCnxn: Client session timed out, have not heard from server in 157651ms for sessionid 0x26e7ce845ed0284, closing socket connection and attempting reconnect
2019-11-20 03:49:39,990 WARN [DataStreamer for file /apps/hbase/data/data/default/MIRROR_YY_ACCOUNT_GAME/c9bf3530466b67c29162e1484a18ba7d/.tmp/49c75f29a55740769ede127a4f3c986f block BP-1202337336-125.94.213.13-1419656350533:blk_1526126839_452537137] hdfs.DFSClient: DataStreamer Exception
java.io.IOException: Broken pipe
at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
at sun.nio.ch.IOUtil.write(IOUtil.java:65)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471)
at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63)
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
at java.io.DataOutputStream.write(DataOutputStream.java:107)
at org.apache.hadoop.hdfs.DFSPacket.writeTo(DFSPacket.java:176)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:611)
2019-11-20 03:49:39,990 WARN [DataStreamer for file /apps/hbase/data/WALs/hdpv-014,16020,1574074564013/hdpv-014%2C16020%2C1574074564013.default.1574192168896 block BP-1202337336-125.94.213.13-1419656350533:blk_1526124074_452534361] hdfs.DFSClient: DataStreamer Exception

…………

2019-11-20 03:49:49,167 ERROR [sync.2] wal.FSHLog: Error syncing, request close of WAL
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /apps/hbase/data/oldWALs/hdpv-014%2C16020%2C1574074564013.default.1574192168896 (inode 595750975): File is not open for writing. Holder DFSClient_NONMAPREDUCE_-2054740082_1 does not have any open files.
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3674)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalDatanode(FSNamesystem.java:3574)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getAdditionalDatanode(NameNodeRpcServer.java:883)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getAdditionalDatanode(ClientNamenodeProtocolServerSideTranslatorPB.java:526)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2345)

at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1554)
at org.apache.hadoop.ipc.Client.call(Client.java:1498)
at org.apache.hadoop.ipc.Client.call(Client.java:1398)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
at com.sun.proxy.$Proxy16.getAdditionalDatanode(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getAdditionalDatanode(ClientNamenodeProtocolTranslatorPB.java:484)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:291)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:203)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:185)
at com.sun.proxy.$Proxy17.getAdditionalDatanode(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:283)
at com.sun.proxy.$Proxy18.getAdditionalDatanode(Unknown Source)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:1102)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1268)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:993)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:500)

…………

2019-11-20 03:50:21,676 ERROR [RS_CLOSE_REGION-hdpv-014:16020-1] regionserver.HRegion: Memstore size is 76160064
2019-11-20 03:50:21,745 INFO [StoreCloserThread-YD_ONLINE_GUID,\x19,1574079589337.1645967ddd012b9a875e863266751f58.-1] regionserver.HStore: Closed USER
2019-11-20 03:50:21,785 INFO [RS_CLOSE_REGION-hdpv-014:16020-1] parallel.BaseTaskRunner: Shutting down task runner because Indexer is being stopped
2019-11-20 03:50:21,785 INFO [RS_CLOSE_REGION-hdpv-014:16020-2] parallel.BaseTaskRunner: Shutting down task runner because Indexer is being stopped
2019-11-20 03:50:21,847 INFO [RS_CLOSE_REGION-hdpv-014:16020-2] write.ParallelWriterIndexCommitter: Shutting down ParallelWriterIndexCommitter because Indexer is being stopped
2019-11-20 03:50:21,847 INFO [RS_CLOSE_REGION-hdpv-014:16020-2] parallel.BaseTaskRunner: Shutting down task runner because Indexer is being stopped
2019-11-20 03:50:21,847 INFO [RS_CLOSE_REGION-hdpv-014:16020-1] write.ParallelWriterIndexCommitter: Shutting down ParallelWriterIndexCommitter because Indexer is being stopped
2019-11-20 03:50:21,847 INFO [RS_CLOSE_REGION-hdpv-014:16020-1] parallel.BaseTaskRunner: Shutting down task runner because Indexer is being stopped
2019-11-20 03:50:21,866 INFO [StoreCloserThread-MIRROR_YY_ACCOUNT,\x0F,1571210055101.cb261e03532961c77ae8233ecf2edd96.-1] regionserver.HStore: Closed BELONG
2019-11-20 03:50:21,905 INFO [StoreCloserThread-MIRROR_YY_ACCOUNT,\x0F,1571210055101.cb261e03532961c77ae8233ecf2edd96.-1] regionserver.HStore: Closed BIND_PHONE
2019-11-20 03:50:21,906 INFO [StoreCloserThread-MIRROR_YY_ACCOUNT,\x0F,1571210055101.cb261e03532961c77ae8233ecf2edd96.-1] regionserver.HStore: Closed GAME_LABEL
2019-11-20 03:50:21,907 INFO [StoreCloserThread-MIRROR_YY_ACCOUNT,\x0F,1571210055101.cb261e03532961c77ae8233ecf2edd96.-1] regionserver.HStore: Closed LOGIN
2019-11-20 03:50:21,907 INFO [StoreCloserThread-MIRROR_YY_ACCOUNT,\x0F,1571210055101.cb261e03532961c77ae8233ecf2edd96.-1] regionserver.HStore: Closed LOST
2019-11-20 03:50:21,908 INFO [StoreCloserThread-MIRROR_YY_ACCOUNT,\x0F,1571210055101.cb261e03532961c77ae8233ecf2edd96.-1] regionserver.HStore: Closed PAYMENT
2019-11-20 03:50:21,973 INFO [RS_CLOSE_REGION-hdpv-014:16020-2] recovery.TrackingParallelWriterIndexCommitter: Shutting down TrackingParallelWriterIndexCommitter
2019-11-20 03:50:21,973 INFO [RS_CLOSE_REGION-hdpv-014:16020-1] recovery.TrackingParallelWriterIndexCommitter: Shutting down TrackingParallelWriterIndexCommitter
2019-11-20 03:50:21,973 INFO [RS_CLOSE_REGION-hdpv-014:16020-2] parallel.BaseTaskRunner: Shutting down task runner because Indexer is being stopped
2019-11-20 03:50:21,973 INFO [RS_CLOSE_REGION-hdpv-014:16020-1] parallel.BaseTaskRunner: Shutting down task runner because Indexer is being stopped
2019-11-20 03:50:21,976 INFO [StoreCloserThread-MIRROR_YY_ACCOUNT,\x0F,1571210055101.cb261e03532961c77ae8233ecf2edd96.-1] regionserver.HStore: Closed PERIOD_GAME_PAYMENT
2019-11-20 03:50:21,977 INFO [StoreCloserThread-MIRROR_YY_ACCOUNT,\x0F,1571210055101.cb261e03532961c77ae8233ecf2edd96.-1] regionserver.HStore: Closed PERIOD_LOGIN
2019-11-20 03:50:21,978 INFO [StoreCloserThread-MIRROR_YY_ACCOUNT,\x0F,1571210055101.cb261e03532961c77ae8233ecf2edd96.-1] regionserver.HStore: Closed PERIOD_PAYMENT_AVERAGE
2019-11-20 03:50:21,979 INFO [StoreCloserThread-MIRROR_YY_ACCOUNT,\x0F,1571210055101.cb261e03532961c77ae8233ecf2edd96.-1] regionserver.HStore: Closed PERIOD_PAYMENT_TOTAL
2019-11-20 03:50:21,980 INFO [StoreCloserThread-MIRROR_YY_ACCOUNT,\x0F,1571210055101.cb261e03532961c77ae8233ecf2edd96.-1] regionserver.HStore: Closed PLATFORM
2019-11-20 03:50:21,980 INFO [StoreCloserThread-MIRROR_YY_ACCOUNT,\x0F,1571210055101.cb261e03532961c77ae8233ecf2edd96.-1] regionserver.HStore: Closed REFERER
2019-11-20 03:50:21,981 INFO [StoreCloserThread-MIRROR_YY_ACCOUNT,\x0F,1571210055101.cb261e03532961c77ae8233ecf2edd96.-1] regionserver.HStore: Closed REGISTER
2019-11-20 03:50:21,981 INFO [StoreCloserThread-MIRROR_YY_ACCOUNT,\x0F,1571210055101.cb261e03532961c77ae8233ecf2edd96.-1] regionserver.HStore: Closed ROLE_LABEL
2019-11-20 03:50:21,982 INFO [StoreCloserThread-MIRROR_YY_ACCOUNT,\x0F,1571210055101.cb261e03532961c77ae8233ecf2edd96.-1] regionserver.HStore: Closed ROW_UPDATE_TIME
2019-11-20 03:50:21,982 INFO [StoreCloserThread-MIRROR_YY_ACCOUNT,\x0F,1571210055101.cb261e03532961c77ae8233ecf2edd96.-1] regionserver.HStore: Closed SOUND_LABEL
2019-11-20 03:50:21,989 INFO [StoreCloserThread-MIRROR_YY_ACCOUNT,\x0F,1571210055101.cb261e03532961c77ae8233ecf2edd96.-1] regionserver.HStore: Closed STYLE_LABEL
2019-11-20 03:50:21,991 INFO [StoreCloserThread-MIRROR_YY_ACCOUNT,\x0F,1571210055101.cb261e03532961c77ae8233ecf2edd96.-1] regionserver.HStore: Closed SUBJECT_LABEL
2019-11-20 03:50:22,118 INFO [RS_CLOSE_REGION-hdpv-014:16020-2] regionserver.HRegion: Closed YD_ONLINE_GUID,\x19,1574079589337.1645967ddd012b9a875e863266751f58.
2019-11-20 03:50:22,128 INFO [RS_CLOSE_REGION-hdpv-014:16020-1] regionserver.HRegion: Closed MIRROR_SQW_ACCOUNT,\x09,1573025684248.4707732dccb430927c79c82eae116dd1.
2019-11-20 03:50:22,128 INFO [StoreCloserThread-MIRROR_XZL_ACCOUNT,\x16,1571122122524.357a6c75728c4c204253bf7235ee34a5.-1] regionserver.HStore: Closed BELONG
2019-11-20 03:50:22,148 INFO [StoreCloserThread-YD_ONLINE_GUID,\x14,1574079589337.c6b2933c34bdbcfd106f72aadf62a3d6.-1] regionserver.HStore: Closed DEVICE
2019-11-20 03:50:22,165 INFO [StoreCloserThread-YD_ONLINE_GUID,\x14,1574079589337.c6b2933c34bdbcfd106f72aadf62a3d6.-1] regionserver.HStore: Closed GAME
2019-11-20 03:50:22,210 INFO [StoreCloserThread-YD_ONLINE_GUID,\x14,1574079589337.c6b2933c34bdbcfd106f72aadf62a3d6.-1] regionserver.HStore: Closed TIME
2019-11-20 03:50:22,374 INFO [StoreCloserThread-YD_ONLINE_GUID,\x14,1574079589337.c6b2933c34bdbcfd106f72aadf62a3d6.-1] regionserver.HStore: Closed USER
2019-11-20 03:50:22,375 INFO [RS_CLOSE_REGION-hdpv-014:16020-1] parallel.BaseTaskRunner: Shutting down task runner because Indexer is being stopped
2019-11-20 03:50:22,375 INFO [RS_CLOSE_REGION-hdpv-014:16020-1] write.ParallelWriterIndexCommitter: Shutting down ParallelWriterIndexCommitter because Indexer is being stopped
2019-11-20 03:50:22,375 INFO [RS_CLOSE_REGION-hdpv-014:16020-1] parallel.BaseTaskRunner: Shutting down task runner because Indexer is being stopped
2019-11-20 03:50:22,375 INFO [RS_CLOSE_REGION-hdpv-014:16020-1] recovery.TrackingParallelWriterIndexCommitter: Shutting down TrackingParallelWriterIndexCommitter
2019-11-20 03:50:22,375 INFO [RS_CLOSE_REGION-hdpv-014:16020-1] parallel.BaseTaskRunner: Shutting down task runner because Indexer is being stopped
2019-11-20 03:50:22,375 INFO [RS_CLOSE_REGION-hdpv-014:16020-1] regionserver.HRegion: Closed YD_ONLINE_GUID,\x14,1574079589337.c6b2933c34bdbcfd106f72aadf62a3d6.
2019-11-20 03:50:22,376 INFO [StoreCloserThread-MIRROR_XZL_ACCOUNT,\x04,1571122122524.0a3100486babb7562610ae1d9990a94a.-1] regionserver.HStore: Closed BELONG
2019-11-20 03:50:22,421 INFO [StoreCloserThread-MIRROR_XZL_ACCOUNT,\x16,1571122122524.357a6c75728c4c204253bf7235ee34a5.-1] regionserver.HStore: Closed BIND_PHONE
2019-11-20 03:50:22,421 INFO [StoreCloserThread-MIRROR_XZL_ACCOUNT,\x04,1571122122524.0a3100486babb7562610ae1d9990a94a.-1] regionserver.HStore: Closed BIND_PHONE
2019-11-20 03:50:22,422 INFO [StoreCloserThread-MIRROR_XZL_ACCOUNT,\x16,1571122122524.357a6c75728c4c204253bf7235ee34a5.-1] regionserver.HStore: Closed GAME_LABEL
2019-11-20 03:50:22,422 INFO [StoreCloserThread-MIRROR_XZL_ACCOUNT,\x04,1571122122524.0a3100486babb7562610ae1d9990a94a.-1] regionserver.HStore: Closed GAME_LABEL
2019-11-20 03:50:22,423 INFO [StoreCloserThread-MIRROR_XZL_ACCOUNT,\x16,1571122122524.357a6c75728c4c204253bf7235ee34a5.-1] regionserver.HStore: Closed LOGIN
2019-11-20 03:50:22,423 INFO [StoreCloserThread-MIRROR_XZL_ACCOUNT,\x04,1571122122524.0a3100486babb7562610ae1d9990a94a.-1] regionserver.HStore: Closed LOGIN
2019-11-20 03:50:22,423 INFO [StoreCloserThread-MIRROR_XZL_ACCOUNT,\x16,1571122122524.357a6c75728c4c204253bf7235ee34a5.-1] regionserver.HStore: Closed LOST
2019-11-20 03:50:22,423 INFO [StoreCloserThread-MIRROR_XZL_ACCOUNT,\x04,1571122122524.0a3100486babb7562610ae1d9990a94a.-1] regionserver.HStore: Closed LOST
2019-11-20 03:50:22,425 INFO [StoreCloserThread-MIRROR_XZL_ACCOUNT,\x16,1571122122524.357a6c75728c4c204253bf7235ee34a5.-1] regionserver.HStore: Closed PAYMENT
2019-11-20 03:50:22,425 INFO [StoreCloserThread-MIRROR_XZL_ACCOUNT,\x04,1571122122524.0a3100486babb7562610ae1d9990a94a.-1] regionserver.HStore: Closed PAYMENT
2019-11-20 03:50:22,585 INFO [StoreCloserThread-MIRROR_XZL_ACCOUNT,\x16,1571122122524.357a6c75728c4c204253bf7235ee34a5.-1] regionserver.HStore: Closed PERIOD_GAME_PAYMENT
2019-11-20 03:50:22,586 INFO [StoreCloserThread-MIRROR_XZL_ACCOUNT,\x16,1571122122524.357a6c75728c4c204253bf7235ee34a5.-1] regionserver.HStore: Closed PERIOD_LOGIN
2019-11-20 03:50:22,587 INFO [StoreCloserThread-MIRROR_XZL_ACCOUNT,\x16,1571122122524.357a6c75728c4c204253bf7235ee34a5.-1] regionserver.HStore: Closed PERIOD_PAYMENT_AVERAGE
2019-11-20 03:50:22,588 INFO [StoreCloserThread-MIRROR_XZL_ACCOUNT,\x16,1571122122524.357a6c75728c4c204253bf7235ee34a5.-1] regionserver.HStore: Closed PERIOD_PAYMENT_TOTAL
2019-11-20 03:50:22,588 INFO [StoreCloserThread-MIRROR_XZL_ACCOUNT,\x16,1571122122524.357a6c75728c4c204253bf7235ee34a5.-1] regionserver.HStore: Closed PLATFORM
2019-11-20 03:50:22,588 INFO [StoreCloserThread-MIRROR_XZL_ACCOUNT,\x16,1571122122524.357a6c75728c4c204253bf7235ee34a5.-1] regionserver.HStore: Closed REFERER
2019-11-20 03:50:22,589 INFO [StoreCloserThread-MIRROR_XZL_ACCOUNT,\x16,1571122122524.357a6c75728c4c204253bf7235ee34a5.-1] regionserver.HStore: Closed REGISTER
2019-11-20 03:50:22,589 INFO [StoreCloserThread-MIRROR_XZL_ACCOUNT,\x16,1571122122524.357a6c75728c4c204253bf7235ee34a5.-1] regionserver.HStore: Closed ROLE_LABEL
2019-11-20 03:50:22,590 INFO [StoreCloserThread-MIRROR_XZL_ACCOUNT,\x16,1571122122524.357a6c75728c4c204253bf7235ee34a5.-1] regionserver.HStore: Closed ROW_UPDATE_TIME
2019-11-20 03:50:22,590 INFO [StoreCloserThread-MIRROR_XZL_ACCOUNT,\x16,1571122122524.357a6c75728c4c204253bf7235ee34a5.-1] regionserver.HStore: Closed SOUND_LABEL
2019-11-20 03:50:22,591 INFO [StoreCloserThread-MIRROR_XZL_ACCOUNT,\x16,1571122122524.357a6c75728c4c204253bf7235ee34a5.-1] regionserver.HStore: Closed STYLE_LABEL
2019-11-20 03:50:22,591 INFO [StoreCloserThread-MIRROR_XZL_ACCOUNT,\x16,1571122122524.357a6c75728c4c204253bf7235ee34a5.-1] regionserver.HStore: Closed SUBJECT_LABEL
2019-11-20 03:50:22,592 INFO [StoreCloserThread-MIRROR_XZL_ACCOUNT,\x16,1571122122524.357a6c75728c4c204253bf7235ee34a5.-1] regionserver.HStore: Closed VIP

…………

2019-11-20 03:50:37,468 WARN [regionserver/hdpv-014/125.94.213.41:16020] zookeeper.ZKUtil: regionserver:16020-0x26e7ce845ed0283, quorum=hdpv-001:2181,hdpv-003:2181,hdpv-005:2181,hdpv-007:2181,hdpv-009:2181, baseZNode=/hbase-unsecure Unable to list children of znode /hbase-unsecure/replication/rs/hdpv-014,16020,1574074564013
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase-unsecure/replication/rs/hdpv-014,16020,1574074564013
at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1472)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getChildren(RecoverableZooKeeper.java:292)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.listChildrenAndWatchForNewChildren(ZKUtil.java:455)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.listChildrenAndWatchThem(ZKUtil.java:483)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.listChildrenBFSAndWatchThem(ZKUtil.java:1462)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNodeRecursivelyMultiOrSequential(ZKUtil.java:1384)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNodeRecursively(ZKUtil.java:1266)
at org.apache.hadoop.hbase.replication.ReplicationQueuesZKImpl.removeAllQueues(ReplicationQueuesZKImpl.java:196)
at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.join(ReplicationSourceManager.java:302)
at org.apache.hadoop.hbase.replication.regionserver.Replication.join(Replication.java:202)
at org.apache.hadoop.hbase.replication.regionserver.Replication.stopReplicationService(Replication.java:194)
at org.apache.hadoop.hbase.regionserver.HRegionServer.stopServiceThreads(HRegionServer.java:2269)
at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1118)
at java.lang.Thread.run(Thread.java:745)
2019-11-20 03:50:37,739 ERROR [regionserver/hdpv-014/125.94.213.41:16020] zookeeper.ZooKeeperWatcher: regionserver:16020-0x26e7ce845ed0283, quorum=hdpv-001:2181,hdpv-003:2181,hdpv-005:2181,hdpv-007:2181,hdpv-009:2181, baseZNode=/hbase-unsecure Received unexpected KeeperException, re-throwing exception
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase-unsecure/replication/rs/hdpv-014,16020,1574074564013
at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1472)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getChildren(RecoverableZooKeeper.java:292)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.listChildrenAndWatchForNewChildren(ZKUtil.java:455)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.listChildrenAndWatchThem(ZKUtil.java:483)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.listChildrenBFSAndWatchThem(ZKUtil.java:1462)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNodeRecursivelyMultiOrSequential(ZKUtil.java:1384)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNodeRecursively(ZKUtil.java:1266)
at org.apache.hadoop.hbase.replication.ReplicationQueuesZKImpl.removeAllQueues(ReplicationQueuesZKImpl.java:196)
at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.join(ReplicationSourceManager.java:302)
at org.apache.hadoop.hbase.replication.regionserver.Replication.join(Replication.java:202)
at org.apache.hadoop.hbase.replication.regionserver.Replication.stopReplicationService(Replication.java:194)
at org.apache.hadoop.hbase.regionserver.HRegionServer.stopServiceThreads(HRegionServer.java:2269)
at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1118)
at java.lang.Thread.run(Thread.java:745)
2019-11-20 03:50:37,840 INFO [regionserver/hdpv-014/125.94.213.41:16020] ipc.RpcServer: Stopping server on 16020
2019-11-20 03:50:37,840 INFO [RpcServer.listener,port=16020] ipc.RpcServer: RpcServer.listener,port=16020: stopping
2019-11-20 03:50:37,984 INFO [RpcServer.responder] ipc.RpcServer: RpcServer.responder: stopped
2019-11-20 03:50:37,984 INFO [RpcServer.responder] ipc.RpcServer: RpcServer.responder: stopping
2019-11-20 03:50:38,901 WARN [regionserver/hdpv-014/125.94.213.41:16020] regionserver.HRegionServer: Failed deleting my ephemeral node
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase-unsecure/rs/hdpv-014,16020,1574074564013
at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:873)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.delete(RecoverableZooKeeper.java:178)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1222)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1211)
at org.apache.hadoop.hbase.regionserver.HRegionServer.deleteMyEphemeralNode(HRegionServer.java:1528)
at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1126)
at java.lang.Thread.run(Thread.java:745)
2019-11-20 03:50:39,198 INFO [regionserver/hdpv-014/125.94.213.41:16020] regionserver.HRegionServer: stopping server hdpv-014,16020,1574074564013; zookeeper connection closed.
2019-11-20 03:50:39,198 INFO [regionserver/hdpv-014/125.94.213.41:16020] regionserver.HRegionServer: regionserver/hdpv-014/125.94.213.41:16020 exiting
2019-11-20 03:50:40,717 ERROR [main] regionserver.HRegionServerCommandLine: Region server exiting
java.lang.RuntimeException: HRegionServer Aborted
at org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.start(HRegionServerCommandLine.java:68)
at org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.run(HRegionServerCommandLine.java:87)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:126)
at org.apache.hadoop.hbase.regionserver.HRegionServer.main(HRegionServer.java:2801)
2019-11-20 03:50:42,062 INFO [pool-4-thread-1] regionserver.ShutdownHook: Shutdown hook starting; hbase.shutdown.hook=true; fsShutdownHook=org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer@344344fa
2019-11-20 03:50:42,062 INFO [pool-4-thread-1] regionserver.ShutdownHook: Starting fs shutdown hook thread.
2019-11-20 03:50:42,063 ERROR [Thread-9022] hdfs.DFSClient: Failed to close inode 595756106
org.apache.hadoop.ipc.RemoteException(java.io.IOException): BP-1202337336-125.94.213.13-1419656350533:blk_1526126839_452537137 does not exist or is not under Constructionnull
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkUCBlock(FSNamesystem.java:6683)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.updateBlockForPipeline(FSNamesystem.java:6751)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.updateBlockForPipeline(NameNodeRpcServer.java:930)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.updateBlockForPipeline(ClientNamenodeProtocolServerSideTranslatorPB.java:966)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2345)

at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1554)
at org.apache.hadoop.ipc.Client.call(Client.java:1498)
at org.apache.hadoop.ipc.Client.call(Client.java:1398)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
at com.sun.proxy.$Proxy16.updateBlockForPipeline(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.updateBlockForPipeline(ClientNamenodeProtocolTranslatorPB.java:948)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:291)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:203)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:185)
at com.sun.proxy.$Proxy17.updateBlockForPipeline(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:283)
at com.sun.proxy.$Proxy18.updateBlockForPipeline(Unknown Source)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1281)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:993)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:500)
2019-11-20 03:50:42,534 INFO [pool-4-thread-1] regionserver.ShutdownHook: Shutdown hook finished.

2. GC log:

2019-11-20T03:32:24.081+0800: 117381.659: [GC (Allocation Failure) 2019-11-20T03:32:24.082+0800: 117381.659: [ParNew: 1445776K->120613K(1504064K), 21.4794441 secs] 3237704K->1942467K(8221504K), 21.4797147 secs] [Times: user=59.65 sys=0.34, real=21.48 secs]
2019-11-20T03:33:13.014+0800: 117430.592: [GC (Allocation Failure) 2019-11-20T03:33:13.037+0800: 117430.614: [ParNew: 1457573K->117558K(1504064K), 3.8057408 secs] 3279427K->1954686K(8221504K), 3.8282921 secs] [Times: user=13.91 sys=0.05, real=3.83 secs]
2019-11-20T03:33:21.314+0800: 117438.892: [GC (Allocation Failure) 2019-11-20T03:33:21.314+0800: 117438.892: [ParNew: 1454518K->94887K(1504064K), 7.5804580 secs] 3291646K->1948109K(8221504K), 7.5807123 secs] [Times: user=7.80 sys=0.08, real=7.58 secs]
2019-11-20T03:42:44.684+0800: 118002.262: [GC (Allocation Failure) 2019-11-20T03:42:44.790+0800: 118002.368: [ParNew: 1431847K->167104K(1504064K), 16.3961827 secs] 3285069K->2024750K(8221504K), 16.5019800 secs] [Times: user=16.41 sys=0.18, real=16.50 secs]
2019-11-20T03:47:36.965+0800: 118294.543: [GC (Allocation Failure) 2019-11-20T03:47:38.419+0800: 118295.997: [ParNew: 1504064K->106763K(1504064K), 120.6296245 secs] 3361710K->2097638K(8221504K), 122.0840424 secs] [Times: user=171.54 sys=2.40, real=122.06 secs]
2019-11-20T03:50:38.743+0800: 118476.321: [GC (Allocation Failure) 2019-11-20T03:50:38.743+0800: 118476.321: [ParNew: 1443723K->33648K(1504064K), 0.1142207 secs] 3434598K->2024523K(8221504K), 0.1143976 secs] [Times: user=0.39 sys=0.00, real=0.11 secs]
Heap
par new generation total 1504064K, used 1150286K [0x00000005c0000000, 0x0000000626000000, 0x0000000626000000)
eden space 1336960K, 83% used [0x00000005c0000000, 0x0000000604277878, 0x00000006119a0000)
from space 167104K, 20% used [0x00000006119a0000, 0x0000000613a7c1b8, 0x000000061bcd0000)
to space 167104K, 0% used [0x000000061bcd0000, 0x000000061bcd0000, 0x0000000626000000)
concurrent mark-sweep generation total 6717440K, used 1990874K [0x0000000626000000, 0x00000007c0000000, 0x00000007c0000000)
Metaspace used 166258K, capacity 181568K, committed 181636K, reserved 1206272K
class space used 20691K, capacity 24674K, committed 24740K, reserved 1048576K
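
To spot pauses like this without reading the GC log by eye, the event records can be filtered by their wall-clock real= time. This is a minimal sketch, assuming the JDK 8 -XX:+PrintGCDetails -XX:+PrintGCDateStamps line format shown above; long_pauses is a hypothetical helper, not part of HBase or the JDK, and the sample lines are copied from the GC log above.

```python
import re
from typing import List, Tuple

# Wall-clock date stamp at the start of a line (printed by -XX:+PrintGCDateStamps).
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}[+-]\d{4}):")
# Elapsed wall-clock time of the whole event, from the "[Times: ... real=N secs]" block.
REAL_TIME = re.compile(r"real=(\d+(?:\.\d+)?) secs")


def long_pauses(lines: List[str], threshold_secs: float) -> List[Tuple[str, float]]:
    """Return (start timestamp, real seconds) for each GC event longer than threshold_secs."""
    pauses = []
    for line in lines:
        ts = TIMESTAMP.match(line)
        real = REAL_TIME.search(line)
        if ts and real:  # skip lines that are not complete GC event records
            secs = float(real.group(1))
            if secs > threshold_secs:
                pauses.append((ts.group(1), secs))
    return pauses


# Lines copied from the GC log above.
SAMPLE = [
    "2019-11-20T03:32:24.081+0800: 117381.659: [GC (Allocation Failure) 2019-11-20T03:32:24.082+0800: 117381.659: [ParNew: 1445776K->120613K(1504064K), 21.4794441 secs] 3237704K->1942467K(8221504K), 21.4797147 secs] [Times: user=59.65 sys=0.34, real=21.48 secs]",
    "2019-11-20T03:33:13.014+0800: 117430.592: [GC (Allocation Failure) 2019-11-20T03:33:13.037+0800: 117430.614: [ParNew: 1457573K->117558K(1504064K), 3.8057408 secs] 3279427K->1954686K(8221504K), 3.8282921 secs] [Times: user=13.91 sys=0.05, real=3.83 secs]",
    "2019-11-20T03:47:36.965+0800: 118294.543: [GC (Allocation Failure) 2019-11-20T03:47:38.419+0800: 118295.997: [ParNew: 1504064K->106763K(1504064K), 120.6296245 secs] 3361710K->2097638K(8221504K), 122.0840424 secs] [Times: user=171.54 sys=2.40, real=122.06 secs]",
    "2019-11-20T03:50:38.743+0800: 118476.321: [GC (Allocation Failure) 2019-11-20T03:50:38.743+0800: 118476.321: [ParNew: 1443723K->33648K(1504064K), 0.1142207 secs] 3434598K->2024523K(8221504K), 0.1143976 secs] [Times: user=0.39 sys=0.00, real=0.11 secs]",
]

if __name__ == "__main__":
    # Events longer than the 120 s ZooKeeper session timeout.
    for ts, secs in long_pauses(SAMPLE, 120):
        print(ts, secs)  # prints: 2019-11-20T03:47:36.965+0800 122.06
```

Lowering the threshold to 10 seconds also returns the 21.48-second event at 03:32, a sign that long ParNew pauses were recurring on this node rather than a one-off.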

3. Root cause analysis:

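At about 03:47 the RegionServer started a young-generation (ParNew) collection that took roughly 122 seconds (real=122.06 secs in the GC log; JvmPauseMonitor reported a pause of approximately 121995ms). A ParNew collection is "stop the world": all application threads are suspended until it finishes, including the ZooKeeper client thread that sends session heartbeats. The ZooKeeper session timeout on this cluster is 120 seconds, so by the time the GC ended the session had already expired. The HMaster had therefore declared this RegionServer dead, removed it from the cluster, and handed its regions to the other RegionServers. Those RegionServers recover the data by splitting and replaying the dead server's WAL, after which the WAL is archived to oldWALs. When the RegionServer came back from the GC pause, its WAL had been taken away from it: the file had been moved to oldWALs and the RegionServer no longer held its HDFS lease, so it logged the LeaseExpiredException and "wal.FSHLog: Error syncing, request close of WAL". It also discovered that its ZooKeeper session had expired and that it had been removed from the cluster, and shut itself down.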
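
The silence in the RegionServer log tells the same story: the last line before the pause is stamped 03:47:35,428 and the next one 03:49:39,055, about 123.6 seconds later, which is longer than the 120-second session timeout. Below is a minimal sketch of a gap finder, assuming the log4j timestamp prefix used in the log above; parse_ts and silent_gaps are hypothetical helpers, and the sample lines are truncated excerpts of the log.

```python
from datetime import datetime
from typing import List, Optional, Tuple

# log4j timestamp prefix, e.g. "2019-11-20 03:47:35,428" (23 characters).
LOG_TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def parse_ts(line: str) -> Optional[datetime]:
    """Parse the leading timestamp, or return None for stack-trace/continuation lines."""
    try:
        return datetime.strptime(line[:23], LOG_TS_FORMAT)
    except ValueError:
        return None


def silent_gaps(lines: List[str], threshold_secs: float) -> List[Tuple[datetime, datetime, float]]:
    """Return (previous, next, seconds) for each gap between timestamped lines above threshold."""
    gaps = []
    prev = None
    for line in lines:
        ts = parse_ts(line)
        if ts is None:
            continue  # lines without a timestamp do not move the clock
        if prev is not None:
            gap = (ts - prev).total_seconds()
            if gap > threshold_secs:
                gaps.append((prev, ts, gap))
        prev = ts
    return gaps


# Abbreviated excerpts of the RegionServer log above (message text truncated).
SAMPLE = [
    "2019-11-20 03:47:34,174 INFO [sync.3] wal.FSHLog: Slow sync cost: 464 ms",
    "2019-11-20 03:47:35,428 INFO [sync.0] wal.FSHLog: Slow sync cost: 210 ms",
    "2019-11-20 03:49:39,055 INFO [hdpv-014,16020,1574074564013_ChoreService_4] regionserver.HRegionServer$CompactionChecker: Chore: CompactionChecker missed its start time",
    "2019-11-20 03:49:39,269 WARN [JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 121995ms",
    "GC pool 'ParNew' had collection(s): count=1 time=120630ms",
]

if __name__ == "__main__":
    for start, end, secs in silent_gaps(SAMPLE, 120):
        # prints: 2019-11-20 03:47:35.428000 -> 2019-11-20 03:49:39.055000: 123.627 s
        print(f"{start} -> {end}: {secs:.3f} s")
```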

4. Solution:

Switch the RegionServers to the G1 garbage collector. The 120-second ZooKeeper timeout is already long enough, so it is left unchanged.
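
One caveat on keeping the 120-second timeout: HBase only requests that value (zookeeper.session.timeout in hbase-site.xml). The ZooKeeper server grants a timeout clamped between its minSessionTimeout and maxSessionTimeout, which default to 2 and 20 times tickTime, so it is worth confirming that maxSessionTimeout in zoo.cfg actually allows 120 seconds. The sketch below models that clamping rule under those defaults; negotiated_session_timeout is a hypothetical helper, not a ZooKeeper API.

```python
from typing import Optional


def negotiated_session_timeout(
    requested_ms: int,
    tick_time_ms: int = 2000,
    min_session_timeout_ms: Optional[int] = None,
    max_session_timeout_ms: Optional[int] = None,
) -> int:
    """Clamp a client's requested session timeout the way a ZooKeeper server does.

    Unset bounds fall back to ZooKeeper's defaults:
    minSessionTimeout = 2 * tickTime, maxSessionTimeout = 20 * tickTime.
    """
    lower = min_session_timeout_ms if min_session_timeout_ms is not None else 2 * tick_time_ms
    upper = max_session_timeout_ms if max_session_timeout_ms is not None else 20 * tick_time_ms
    # Raise to the minimum first, then cap at the maximum.
    return min(max(requested_ms, lower), upper)


if __name__ == "__main__":
    # 120 s requested, server left at tickTime=2000 with default bounds: capped at 40 s.
    print(negotiated_session_timeout(120_000))  # prints 40000
    # Server configured with maxSessionTimeout=120000: the full 120 s is granted.
    print(negotiated_session_timeout(120_000, max_session_timeout_ms=120_000))  # prints 120000
```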
