crash machine: machine-12
1.kill HRegionServer
Job Name: generate: 1348021259-1297054358
------------------------------------------------------------------------------------------------------------
Error:
12/09/19 10:24:20 INFO mapred.JobClient: Task Id : attempt_201209181541_0004_m_000020_0, Status : FAILED
java.lang.RuntimeException: java.io.IOException: org.apache.hadoop.ipc.RemoteException: java.io.IOException: Server not running
at org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2449)
at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionInfo(HRegionServer.java:1642)
at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)
at org.apache.gora.mapreduce.GoraInputFormat.setConf(GoraInputFormat.java:115)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:573)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.io.IOException: org.apache.hadoop.ipc.RemoteException: java.io.IOException: Server not running
at org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2449)
at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionInfo(HRegionServer.java:1642)
at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)
at org.apache.gora.util.IOUtils.loadFromConf(IOUtils.java:483)
at org.apache.gora.mapreduce.GoraInputFormat.getQuery(GoraInputFormat.java:125)
at org.apache.gora.mapreduce.GoraInputFormat.setConf(GoraInputFormat.java:112)
... 5 more
Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: Server not running
at org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2449)
at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionInfo(HRegionServer.java:1642)
at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)
at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:771)
at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
at $Proxy7.getRegionInfo(Unknown Source)
at org.apache.hadoop.hbase.catalog.CatalogTracker.verifyRegionLocation(CatalogTracker.java:426)
at org.apache.hadoop.hbase.catalog.CatalogTracker.getMetaServerConnection(CatalogTracker.java:288)
at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:328)
at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMetaServerConnectionDefault(CatalogTracker.java:366)
at org.apache.hadoop.hbase.catalog.MetaReader.tableExists(MetaReader.java:427)
at org.apache.hadoop.hbase.client.HBaseAdmin.tableExists(HBaseAdmin.java:164)
at org.apache.gora.hbase.store.HBaseStore.schemaExists(HBaseStore.java:164)
at org.apache.gora.hbase.store.HBaseStore.createSchema(HBaseStore.java:145)
at org.apache.gora.hbase.store.HBaseStore.initialize(HBaseStore.java:130)
at org.apache.gora.store.impl.DataStoreBase.readFields(DataStoreBase.java:184)
at org.apache.gora.query.impl.QueryBase.readFields(QueryBase.java:222)
at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
at org.apache.hadoop.io.DefaultStringifier.fromString(DefaultStringifier.java:75)
at org.apache.hadoop.io.DefaultStringifier.load(DefaultStringifier.java:133)
at org.apache.gora.util.IOUtils.loadFromConf(IOUtils.java:480)
... 7 more
12/09/19 10:24:23 INFO zookeeper.MetaNodeTracker: Detected completed assignment of META, notifying catalog tracker
12/09/19 10:24:23 INFO zookeeper.MetaNodeTracker: Detected completed assignment of META, notifying catalog tracker
12/09/19 10:24:23 INFO zookeeper.MetaNodeTracker: Detected completed assignment of META, notifying catalog tracker
12/09/19 10:24:23 INFO catalog.CatalogTracker: Failed verification of .META.,,1 at address=machine-12:60020; java.net.ConnectException: Connection refused
12/09/19 10:24:23 INFO catalog.CatalogTracker: Current cached META location is not valid, resetting
12/09/19 10:24:23 INFO catalog.CatalogTracker: Failed verification of .META.,,1 at address=machine-12:60020; java.net.ConnectException: Connection refused
12/09/19 10:24:23 INFO catalog.CatalogTracker: Current cached META location is not valid, resetting
12/09/19 10:24:23 INFO catalog.CatalogTracker: Failed verification of .META.,,1 at address=machine-12:60020; java.net.ConnectException: Connection refused
12/09/19 10:24:23 INFO catalog.CatalogTracker: Current cached META location is not valid, resetting
Result:
Task Attempts: attempt_201209181541_0004_m_000020_0
Machine: /default-rack/machine-10
Status: FAILED
The task was then re-executed:
Task Attempts: attempt_201209181541_0004_m_000020_1
Machine: /default-rack/machine-9
Status: SUCCEEDED
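The two attempt IDs above differ only in their last field. A minimal Python sketch of how such an ID breaks down, assuming the usual Hadoop layout attempt_<JobTracker start time>_<job number>_<m|r>_<task number>_<attempt number>; the names AttemptId and parse_attempt_id are made up for illustration and are not part of Hadoop:

```python
import re
from typing import NamedTuple


class AttemptId(NamedTuple):
    # Hypothetical container for the fields of a task attempt ID.
    jobtracker_start: str  # JobTracker identifier (its start timestamp)
    job_number: int
    task_type: str         # "m" for a map task, "r" for a reduce task
    task_number: int
    attempt_number: int    # 0 for the first try, 1 for the first retry, ...


ATTEMPT_RE = re.compile(r"attempt_(\d+)_(\d+)_([mr])_(\d+)_(\d+)")


def parse_attempt_id(text: str) -> AttemptId:
    """Split a task attempt ID string into its fields."""
    match = ATTEMPT_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a task attempt ID: {text!r}")
    jt, job, kind, task, attempt = match.groups()
    return AttemptId(jt, int(job), kind, int(task), int(attempt))


failed = parse_attempt_id("attempt_201209181541_0004_m_000020_0")
retried = parse_attempt_id("attempt_201209181541_0004_m_000020_1")

# Same JobTracker, job, type and task number: the second ID is a retry of the first.
same_task = failed[:4] == retried[:4]
print(same_task, failed.attempt_number, retried.attempt_number)  # True 0 1
```

Read this way, attempt 0 of map task 20 failed on machine-10 and attempt 1 of the same task succeeded on machine-9.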
2.kill HQuorumPeer
Job Name: fetch
---------------------------------------------------------------------------------------------------
Error:
12/09/19 10:53:18 WARN zookeeper.ClientCnxn: Session 0x139d8ab5db700b9 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:692)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1119)
12/09/19 10:53:18 INFO zookeeper.ClientCnxn: Opening socket connection to server machine-12/192.168.12.212:2181
12/09/19 10:53:18 WARN zookeeper.ClientCnxn: Session 0x139d8ab5db700b7 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:692)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1119)
Result:
Warnings were reported, but the tasks ran normally.
3.kill TaskTracker
Job Name: fetch
---------------------------------------------------------------------------------------------------
Error:
Lost task tracker: tracker_machine-12:localhost/127.0.0.1:58357
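The tasks were then reassigned and completed successfully.
After the TaskTracker was restarted, the tasks previously assigned to that machine were re-initialized and executed.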
4.stop DataNode
Job Name: parse
---------------------------------------------------------------------------------------------------
12/09/20 09:55:37 INFO mapred.JobClient: Task Id : attempt_201209181541_0008_m_000003_0, Status : FAILED
Task attempt_201209181541_0008_m_000003_0 failed to report status for 602 seconds. Killing!
12/09/20 09:55:37 INFO mapred.JobClient: Task Id : attempt_201209181541_0008_m_000004_0, Status : FAILED
Task attempt_201209181541_0008_m_000004_0 failed to report status for 601 seconds. Killing!
12/09/20 09:55:39 INFO mapred.JobClient: Task Id : attempt_201209181541_0008_m_000005_0, Status : FAILED
Task attempt_201209181541_0008_m_000005_0 failed to report status for 601 seconds. Killing!
12/09/20 09:55:40 INFO mapred.JobClient: Task Id : attempt_201209181541_0008_m_000002_0, Status : FAILED
Task attempt_201209181541_0008_m_000002_0 failed to report status for 601 seconds. Killing!
12/09/20 09:55:42 INFO mapred.JobClient: Task Id : attempt_201209181541_0008_m_000012_0, Status : FAILED
Task attempt_201209181541_0008_m_000012_0 failed to report status for 600 seconds. Killing!
12/09/20 09:55:44 INFO mapred.JobClient: Task Id : attempt_201209181541_0008_m_000009_0, Status : FAILED
Task attempt_201209181541_0008_m_000009_0 failed to report status for 602 seconds. Killing!
12/09/20 09:57:34 INFO mapred.JobClient: Task Id : attempt_201209181541_0008_m_000000_0, Status : FAILED
Task attempt_201209181541_0008_m_000000_0 failed to report status for 600 seconds. Killing!
12/09/20 09:57:37 INFO mapred.JobClient: Task Id : attempt_201209181541_0008_m_000001_0, Status : FAILED
Task attempt_201209181541_0008_m_000001_0 failed to report status for 600 seconds. Killing!
12/09/20 10:05:46 INFO mapred.JobClient: Task Id : attempt_201209181541_0008_m_000005_1, Status : FAILED
Task attempt_201209181541_0008_m_000005_1 failed to report status for 600 seconds. Killing!
12/09/20 10:05:47 INFO mapred.JobClient: Task Id : attempt_201209181541_0008_m_000006_0, Status : FAILED
Task attempt_201209181541_0008_m_000006_0 failed to report status for 600 seconds. Killing!
12/09/20 10:05:49 INFO mapred.JobClient: Task Id : attempt_201209181541_0008_m_000004_1, Status : FAILED
Task attempt_201209181541_0008_m_000004_1 failed to report status for 600 seconds. Killing!
12/09/20 10:05:50 INFO mapred.JobClient: Task Id : attempt_201209181541_0008_m_000003_1, Status : FAILED
Task attempt_201209181541_0008_m_000003_1 failed to report status for 600 seconds. Killing!
12/09/20 10:05:51 INFO mapred.JobClient: Task Id : attempt_201209181541_0008_m_000002_1, Status : FAILED
Task attempt_201209181541_0008_m_000002_1 failed to report status for 601 seconds. Killing!
12/09/20 10:05:52 INFO mapred.JobClient: Task Id : attempt_201209181541_0008_m_000013_0, Status : FAILED
Task attempt_201209181541_0008_m_000013_0 failed to report status for 601 seconds. Killing!
12/09/20 10:07:44 INFO mapred.JobClient: Task Id : attempt_201209181541_0008_m_000010_0, Status : FAILED
Task attempt_201209181541_0008_m_000010_0 failed to report status for 600 seconds. Killing!
12/09/20 10:07:47 INFO mapred.JobClient: Task Id : attempt_201209181541_0008_m_000011_0, Status : FAILED
Task attempt_201209181541_0008_m_000011_0 failed to report status for 600 seconds. Killing!
Result:
The HRegionServer crashed and the tasks failed.
HBase then had to be restarted, after which rerunning the job succeeded.
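Every kill in this log fires after 600 to 602 seconds, which is consistent with the default mapred.task.timeout of 600000 ms. A rough Python sketch for summarizing such lines; the function name summarize_timeouts is made up for illustration, and the sample holds three lines copied from the log above:

```python
import re
from collections import Counter

# Matches lines like:
# Task attempt_201209181541_0008_m_000003_0 failed to report status for 602 seconds. Killing!
TIMEOUT_RE = re.compile(
    r"Task (attempt_\d+_\d+_[mr]_(\d+)_\d+) failed to report status for (\d+) seconds"
)


def summarize_timeouts(log_text):
    """Return (timed-out attempts per task number, shortest wait, longest wait)."""
    per_task = Counter()
    waits = []
    for match in TIMEOUT_RE.finditer(log_text):
        per_task[int(match.group(2))] += 1
        waits.append(int(match.group(3)))
    if not waits:
        return per_task, None, None
    return per_task, min(waits), max(waits)


sample = """\
Task attempt_201209181541_0008_m_000003_0 failed to report status for 602 seconds. Killing!
Task attempt_201209181541_0008_m_000004_0 failed to report status for 601 seconds. Killing!
Task attempt_201209181541_0008_m_000003_1 failed to report status for 600 seconds. Killing!
"""

per_task, shortest, longest = summarize_timeouts(sample)
print(dict(per_task), shortest, longest)  # {3: 2, 4: 1} 600 602
```

A task number with a count of 2 or more means its retry also hung, as happened here for tasks 2, 3, 4 and 5.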
5.machine crash caused by a very large data volume
Updating the database (1 million web pages)
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
| |
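Notes:
When restarting (start-hbase.sh), all services on every machine in the cluster (the master and all slaves) are restarted by default
(HRegionServer,HQuorumPeer,HMaster)
It later turned out that bin/hbase-daemon.sh can start a single service on a single machine.
For example: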
bin/hbase-daemon.sh (start|stop|restart) (master|zookeeper|regionserver)
For Hadoop:
bin/hadoop-daemon.sh (start|stop) (jobtracker|tasktracker|datanode|namenode)
| |
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
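The per-service scripts in the notes above can be wrapped so that only the killed daemon is brought back. A small Python sketch that only builds the command line and does not run it; the helper name daemon_command and the "hbase"/"hadoop" keys are made up for illustration, and the allowed actions and services are taken from the two usage lines in the notes:

```python
# Script path, allowed actions and allowed services, as listed in the notes.
DAEMON_SCRIPTS = {
    "hbase": ("bin/hbase-daemon.sh",
              {"start", "stop", "restart"},
              {"master", "zookeeper", "regionserver"}),
    "hadoop": ("bin/hadoop-daemon.sh",
               {"start", "stop"},
               {"jobtracker", "tasktracker", "datanode", "namenode"}),
}


def daemon_command(stack, action, service):
    """Build the argument list for starting or stopping one service on this machine."""
    if stack not in DAEMON_SCRIPTS:
        raise ValueError(f"unknown stack: {stack!r}")
    script, actions, services = DAEMON_SCRIPTS[stack]
    if action not in actions:
        raise ValueError(f"{script} does not list action {action!r}")
    if service not in services:
        raise ValueError(f"{script} does not list service {service!r}")
    return [script, action, service]


# Recovery after test 1 (killed HRegionServer) and test 4 (stopped DataNode).
for cmd in (daemon_command("hbase", "start", "regionserver"),
            daemon_command("hadoop", "start", "datanode")):
    print(" ".join(cmd))
```

The resulting list could be handed to subprocess.run on the affected machine, with the working directory set to the HBase or Hadoop install directory so that the relative bin/ path resolves.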