hadoop_hbase_crash_test

crash machine: machine-12



1.kill HRegionServer
Job Name: generate: 1348021259-1297054358
------------------------------------------------------------------------------------------------------------

Error:

12/09/19 10:24:20 INFO mapred.JobClient: Task Id : attempt_201209181541_0004_m_000020_0, Status : FAILED
java.lang.RuntimeException: java.io.IOException: org.apache.hadoop.ipc.RemoteException: java.io.IOException: Server not running
    at org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2449)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionInfo(HRegionServer.java:1642)
    at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)

    at org.apache.gora.mapreduce.GoraInputFormat.setConf(GoraInputFormat.java:115)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:573)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.io.IOException: org.apache.hadoop.ipc.RemoteException: java.io.IOException: Server not running
    at org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2449)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionInfo(HRegionServer.java:1642)
    at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)

    at org.apache.gora.util.IOUtils.loadFromConf(IOUtils.java:483)
    at org.apache.gora.mapreduce.GoraInputFormat.getQuery(GoraInputFormat.java:125)
    at org.apache.gora.mapreduce.GoraInputFormat.setConf(GoraInputFormat.java:112)
    ... 5 more
Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: Server not running
    at org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2449)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionInfo(HRegionServer.java:1642)
    at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)

    at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:771)
    at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
    at $Proxy7.getRegionInfo(Unknown Source)
    at org.apache.hadoop.hbase.catalog.CatalogTracker.verifyRegionLocation(CatalogTracker.java:426)
    at org.apache.hadoop.hbase.catalog.CatalogTracker.getMetaServerConnection(CatalogTracker.java:288)
    at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:328)
    at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMetaServerConnectionDefault(CatalogTracker.java:366)
    at org.apache.hadoop.hbase.catalog.MetaReader.tableExists(MetaReader.java:427)
    at org.apache.hadoop.hbase.client.HBaseAdmin.tableExists(HBaseAdmin.java:164)
    at org.apache.gora.hbase.store.HBaseStore.schemaExists(HBaseStore.java:164)
    at org.apache.gora.hbase.store.HBaseStore.createSchema(HBaseStore.java:145)
    at org.apache.gora.hbase.store.HBaseStore.initialize(HBaseStore.java:130)
    at org.apache.gora.store.impl.DataStoreBase.readFields(DataStoreBase.java:184)
    at org.apache.gora.query.impl.QueryBase.readFields(QueryBase.java:222)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
    at org.apache.hadoop.io.DefaultStringifier.fromString(DefaultStringifier.java:75)
    at org.apache.hadoop.io.DefaultStringifier.load(DefaultStringifier.java:133)
    at org.apache.gora.util.IOUtils.loadFromConf(IOUtils.java:480)
    ... 7 more

12/09/19 10:24:23 INFO zookeeper.MetaNodeTracker: Detected completed assignment of META, notifying catalog tracker
12/09/19 10:24:23 INFO zookeeper.MetaNodeTracker: Detected completed assignment of META, notifying catalog tracker
12/09/19 10:24:23 INFO zookeeper.MetaNodeTracker: Detected completed assignment of META, notifying catalog tracker
12/09/19 10:24:23 INFO catalog.CatalogTracker: Failed verification of .META.,,1 at address=machine-12:60020; java.net.ConnectException: Connection refused
12/09/19 10:24:23 INFO catalog.CatalogTracker: Current cached META location is not valid, resetting
12/09/19 10:24:23 INFO catalog.CatalogTracker: Failed verification of .META.,,1 at address=machine-12:60020; java.net.ConnectException: Connection refused
12/09/19 10:24:23 INFO catalog.CatalogTracker: Current cached META location is not valid, resetting
12/09/19 10:24:23 INFO catalog.CatalogTracker: Failed verification of .META.,,1 at address=machine-12:60020; java.net.ConnectException: Connection refused
12/09/19 10:24:23 INFO catalog.CatalogTracker: Current cached META location is not valid, resetting


Result:
Task Attempts: attempt_201209181541_0004_m_000020_0
Machine: /default-rack/machine-10
Status: FAILED

The task was then re-executed:
Task Attempts: attempt_201209181541_0004_m_000020_1
Machine: /default-rack/machine-9
Status: SUCCEEDED
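
The two attempt IDs above differ only in their last field, which is how Hadoop marks a retry of the same task. The following is a minimal sketch for splitting such IDs apart, assuming the usual `attempt_<jobtracker start>_<job number>_<m|r>_<task number>_<attempt number>` layout; the `Attempt` type and `parse_attempt` helper are names made up for this illustration.

```python
import re
from typing import NamedTuple

# Layout assumed: attempt_<jobtracker start>_<job no>_<m|r>_<task no>_<attempt no>
ATTEMPT_RE = re.compile(r"attempt_(\d+)_(\d+)_([mr])_(\d+)_(\d+)")


class Attempt(NamedTuple):
    job_id: str      # e.g. job_201209181541_0004
    task_type: str   # "map" or "reduce"
    task_index: int  # index of the task within the job
    attempt_no: int  # 0 for the first try, 1 for the first retry, ...


def parse_attempt(attempt_id: str) -> Attempt:
    """Split a task attempt ID into its components (hypothetical helper)."""
    m = ATTEMPT_RE.fullmatch(attempt_id)
    if m is None:
        raise ValueError(f"not a task attempt ID: {attempt_id!r}")
    jt_start, job_no, kind, task_no, attempt_no = m.groups()
    return Attempt(
        job_id=f"job_{jt_start}_{job_no}",
        task_type="map" if kind == "m" else "reduce",
        task_index=int(task_no),
        attempt_no=int(attempt_no),
    )


failed = parse_attempt("attempt_201209181541_0004_m_000020_0")
retried = parse_attempt("attempt_201209181541_0004_m_000020_1")

# Same job, same task type, same task index: the second ID is a retry.
same_task = failed[:3] == retried[:3]
```

Here `same_task` is true, which fits the notes: map task 20 failed on machine-10 and its second attempt succeeded on machine-9.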




    
    
2.kill HQuorumPeer
Job Name: fetch
---------------------------------------------------------------------------------------------------

Error:

12/09/19 10:53:18 WARN zookeeper.ClientCnxn: Session 0x139d8ab5db700b9 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:692)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1119)
12/09/19 10:53:18 INFO zookeeper.ClientCnxn: Opening socket connection to server machine-12/192.168.12.212:2181
12/09/19 10:53:18 WARN zookeeper.ClientCnxn: Session 0x139d8ab5db700b7 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:692)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1119)
    
    
Result:
Warnings were reported, but the task ran normally.
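
The warnings come from the ZooKeeper client repeatedly trying to reconnect to machine-12:2181. Presumably the task kept running because other quorum peers were still reachable, although these notes do not list them. A small TCP probe like the sketch below can show which peers currently accept connections; the function names are made up for this illustration, and any host other than machine-12 passed to it would be an assumption about the cluster.

```python
import socket


def accepts_connections(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts and name-resolution errors.
        return False


def reachable_peers(peers, timeout: float = 2.0):
    """Filter a list of (host, port) pairs down to those that accept connections."""
    return [(h, p) for h, p in peers if accepts_connections(h, p, timeout)]
```

With HQuorumPeer killed on machine-12, `accepts_connections("machine-12", 2181)` would be expected to return False, matching the "Connection refused" warnings above. The probe only checks that the port is open; it does not verify that the peer is a healthy quorum member.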
    
    
    
    
    
3.kill TaskTracker
Job Name: fetch
---------------------------------------------------------------------------------------------------

Error:

Lost task tracker: tracker_machine-12:localhost/127.0.0.1:58357

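The task was then reassigned and completed successfully.


After the TaskTracker was restarted, the tasks previously assigned to this machine were reinitialized and executed.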



4.stop DataNode
Job Name: parse
---------------------------------------------------------------------------------------------------

12/09/20 09:55:37 INFO mapred.JobClient: Task Id : attempt_201209181541_0008_m_000003_0, Status : FAILED
Task attempt_201209181541_0008_m_000003_0 failed to report status for 602 seconds. Killing!
12/09/20 09:55:37 INFO mapred.JobClient: Task Id : attempt_201209181541_0008_m_000004_0, Status : FAILED
Task attempt_201209181541_0008_m_000004_0 failed to report status for 601 seconds. Killing!
12/09/20 09:55:39 INFO mapred.JobClient: Task Id : attempt_201209181541_0008_m_000005_0, Status : FAILED
Task attempt_201209181541_0008_m_000005_0 failed to report status for 601 seconds. Killing!
12/09/20 09:55:40 INFO mapred.JobClient: Task Id : attempt_201209181541_0008_m_000002_0, Status : FAILED
Task attempt_201209181541_0008_m_000002_0 failed to report status for 601 seconds. Killing!
12/09/20 09:55:42 INFO mapred.JobClient: Task Id : attempt_201209181541_0008_m_000012_0, Status : FAILED
Task attempt_201209181541_0008_m_000012_0 failed to report status for 600 seconds. Killing!
12/09/20 09:55:44 INFO mapred.JobClient: Task Id : attempt_201209181541_0008_m_000009_0, Status : FAILED
Task attempt_201209181541_0008_m_000009_0 failed to report status for 602 seconds. Killing!
12/09/20 09:57:34 INFO mapred.JobClient: Task Id : attempt_201209181541_0008_m_000000_0, Status : FAILED
Task attempt_201209181541_0008_m_000000_0 failed to report status for 600 seconds. Killing!
12/09/20 09:57:37 INFO mapred.JobClient: Task Id : attempt_201209181541_0008_m_000001_0, Status : FAILED
Task attempt_201209181541_0008_m_000001_0 failed to report status for 600 seconds. Killing!
12/09/20 10:05:46 INFO mapred.JobClient: Task Id : attempt_201209181541_0008_m_000005_1, Status : FAILED
Task attempt_201209181541_0008_m_000005_1 failed to report status for 600 seconds. Killing!
12/09/20 10:05:47 INFO mapred.JobClient: Task Id : attempt_201209181541_0008_m_000006_0, Status : FAILED
Task attempt_201209181541_0008_m_000006_0 failed to report status for 600 seconds. Killing!
12/09/20 10:05:49 INFO mapred.JobClient: Task Id : attempt_201209181541_0008_m_000004_1, Status : FAILED
Task attempt_201209181541_0008_m_000004_1 failed to report status for 600 seconds. Killing!
12/09/20 10:05:50 INFO mapred.JobClient: Task Id : attempt_201209181541_0008_m_000003_1, Status : FAILED
Task attempt_201209181541_0008_m_000003_1 failed to report status for 600 seconds. Killing!
12/09/20 10:05:51 INFO mapred.JobClient: Task Id : attempt_201209181541_0008_m_000002_1, Status : FAILED
Task attempt_201209181541_0008_m_000002_1 failed to report status for 601 seconds. Killing!
12/09/20 10:05:52 INFO mapred.JobClient: Task Id : attempt_201209181541_0008_m_000013_0, Status : FAILED
Task attempt_201209181541_0008_m_000013_0 failed to report status for 601 seconds. Killing!
12/09/20 10:07:44 INFO mapred.JobClient: Task Id : attempt_201209181541_0008_m_000010_0, Status : FAILED
Task attempt_201209181541_0008_m_000010_0 failed to report status for 600 seconds. Killing!
12/09/20 10:07:47 INFO mapred.JobClient: Task Id : attempt_201209181541_0008_m_000011_0, Status : FAILED
Task attempt_201209181541_0008_m_000011_0 failed to report status for 600 seconds. Killing!




Result:
The HRegionServer crashed and the tasks failed.

HBase then had to be restarted; after that, rerunning the job succeeded.
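
Every failure in this log is a task being killed after roughly 600 seconds without a status report. That figure matches the Hadoop 1.x default for `mapred.task.timeout` (600000 ms), which suggests the setting was left at its default here; this is an inference, not something the log states. The sketch below groups such log lines by task so that repeated timeouts of the same task stand out. `summarize_timeouts` is a hypothetical helper, and the sample lines are copied from the log above.

```python
import re
from collections import defaultdict

# Matches the JobClient message printed when a task is killed for inactivity.
TIMEOUT_RE = re.compile(
    r"Task attempt_\d+_\d+_[mr]_(\d+)_(\d+) "
    r"failed to report status for (\d+) seconds"
)


def summarize_timeouts(log_text: str) -> dict:
    """Map task index -> list of (attempt number, seconds without a report)."""
    by_task = defaultdict(list)
    for m in TIMEOUT_RE.finditer(log_text):
        task_no, attempt_no, seconds = (int(g) for g in m.groups())
        by_task[task_no].append((attempt_no, seconds))
    return dict(by_task)


sample = """\
Task attempt_201209181541_0008_m_000003_0 failed to report status for 602 seconds. Killing!
Task attempt_201209181541_0008_m_000003_1 failed to report status for 600 seconds. Killing!
Task attempt_201209181541_0008_m_000006_0 failed to report status for 600 seconds. Killing!
"""

summary = summarize_timeouts(sample)
# summary == {3: [(0, 602), (1, 600)], 6: [(0, 600)]}
```

In the sample, task 3 timed out on both its first attempt and its retry, which is consistent with the tasks hanging rather than failing fast once the DataNode was stopped.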



5.machine goes down under a very large data volume
Updating the database (1 million web pages)











++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
|                                                                                                       |
    Note:
    By default, restarting with start-hbase.sh restarts all services
    (HRegionServer, HQuorumPeer, HMaster) on every machine in the cluster
    (the master and all slaves).

    It turned out later that bin/hbase-daemon.sh can start a single service on a single machine.
    For example:
    bin/hbase-daemon.sh (start|stop|restart) (master|zookeeper|regionserver)

    For Hadoop:
    bin/hadoop-daemon.sh (start|stop) (jobtracker|tasktracker|datanode|namenode)
|                                                                                                       |
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
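
For repeated crash tests it can help to build these per-daemon commands programmatically instead of typing them. The following is a minimal sketch that only accepts the action and daemon names listed in the note above; `daemon_command` is a hypothetical helper, and the script paths are assumed to be relative to the HBase and Hadoop installation directories respectively.

```python
# Actions and daemons exactly as listed in the note above.
HBASE_ACTIONS = {"start", "stop", "restart"}
HBASE_DAEMONS = {"master", "zookeeper", "regionserver"}
HADOOP_ACTIONS = {"start", "stop"}
HADOOP_DAEMONS = {"jobtracker", "tasktracker", "datanode", "namenode"}


def daemon_command(action: str, daemon: str) -> list:
    """Build the argument list for controlling one daemon on the local machine."""
    if daemon in HBASE_DAEMONS and action in HBASE_ACTIONS:
        return ["bin/hbase-daemon.sh", action, daemon]
    if daemon in HADOOP_DAEMONS and action in HADOOP_ACTIONS:
        return ["bin/hadoop-daemon.sh", action, daemon]
    raise ValueError(f"unsupported combination: {action} {daemon}")


restart_rs = daemon_command("restart", "regionserver")
stop_dn = daemon_command("stop", "datanode")
```

The resulting list can be passed to `subprocess.run(..., cwd=<install dir>, check=True)` on the machine under test. A combination such as `restart datanode` raises `ValueError`, because the note lists only start and stop for hadoop-daemon.sh.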


