
【Hadoop Production Exceptions】
代立冬
-
[Original] [Solved] User [dr.who] is not authorized to view the logs for application
Cause: the Resource Manager UI's default user dr.who does not have the right permissions. (2016-03-02)
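A common fix (an assumption here, since the summary above is truncated) is to change the static web-UI user from the default dr.who to a user that is allowed to read the logs, via hadoop.http.staticuser.user in core-site.xml:

```xml
<!-- core-site.xml: have the Hadoop web UIs act as a privileged user
     instead of the default dr.who. The value "hdfs" is illustrative;
     use whichever user actually owns the application logs. -->
<property>
  <name>hadoop.http.staticuser.user</name>
  <value>hdfs</value>
</property>
```

The Resource Manager needs a restart for the change to take effect.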
-
[Original] The origin of Log Aggregation Status TIME_OUT
When running Spark on YARN, you sometimes find that after the Spark job finishes its web UI shows nothing, or the run information cannot be found. A closer look at the NodeManager UI shows: Log Aggregation Status TIME_OUT. The NodeManager can safely move logs to the distributed file system HDFS after an application ends; once the application finishes, users can use the YARN command line… (2017-12-09)
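The TIME_OUT status reflects how long the Resource Manager waits for NodeManagers to report aggregation progress; one relevant knob (named here as an assumption about the cluster's Hadoop version) is yarn.log-aggregation-status.time-out.ms in yarn-site.xml:

```xml
<!-- yarn-site.xml: make sure log aggregation is enabled, and give the RM
     longer to wait for NodeManager aggregation reports before marking an
     application TIME_OUT (milliseconds; 600000 = 10 minutes). -->
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
<property>
  <name>yarn.log-aggregation-status.time-out.ms</name>
  <value>600000</value>
</property>
```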
-
[Original] dfs.datanode.du.reserved reserved space does not take effect
The problem of the dfs.datanode.du.reserved reserved space not taking effect. (2017-04-08)
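For reference, this setting is applied per disk volume and is compared against total non-DFS usage on that volume, which is the usual reason it appears not to work; a minimal hdfs-site.xml sketch (the 10 GB value is illustrative):

```xml
<!-- hdfs-site.xml: reserve 10 GB for non-HDFS use. Note the value is in
     bytes and applies to EACH configured data directory, and the DataNode
     counts existing non-DFS files on the volume toward the reservation. -->
<property>
  <name>dfs.datanode.du.reserved</name>
  <value>10737418240</value>
</property>
```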
-
[Original] Notes on pitfalls when changing the Ranger UI admin login password
When changing the login password of the Ranger UI admin user, you must also set admin_password in the Ranger configuration to the same value; otherwise the HDFS NameNode fails to start when it uses admin, with an exception like: Traceback (most recent call last): ambari_ranger_admin, ambari_ranger_password = self.create_ambari_admin_user(ambari_ranger_admin, ambari_ranger_password, f… (2016-10-27)
-
[Original] [Solved] java.io.IOException: Cannot obtain block length for LocatedBlock
Cannot obtain block length for LocatedBlock. (2016-05-16)
-
[Original] Handling DataXceiver error processing unknown operation src: /127.0.0.1:36479 dst: /127.0.0.1:50010
The exception reads: 2015-12-09 17:39:20,310 ERROR datanode.DataNode (DataXceiver.java:run(278)) - hadoop07:50010:DataXceiver error processing unknown operation src: /127.0.0.1:36479 dst: /127.0.0.1:50010 (2015-12-17)
-
[Original] Ambari server out of memory
java.lang.OutOfMemoryError: PermGen space
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
    at java.security.SecureClassLoader.defineCl… (2015-12-02)
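A PermGen OutOfMemoryError is usually cured by enlarging the JVM's permanent generation; a sketch for the Ambari server, assuming a Java 7-era Ambari that reads its JVM options from /var/lib/ambari-server/ambari-env.sh (the path and the 512m value are assumptions, not taken from the truncated summary):

```shell
# /var/lib/ambari-server/ambari-env.sh (illustrative): append a larger
# PermGen cap to the existing JVM options, then restart the server
# with `ambari-server restart`.
export AMBARI_JVM_ARGS="$AMBARI_JVM_ARGS -XX:MaxPermSize=512m"
```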
-
[Reposted] NameNode disk full triggers a recover-edits failure
A while back the company's Hadoop cluster went down, and the NameNode disk turned out to be full. After clearing some space, restarting the cluster failed. It also emerged that the Secondary NameNode service had broken down at just the wrong time, so all operation logs kept being written to the edits.new file, which had grown to an outrageous 70 GB+ by the time the cluster crashed. The restart failed while loading the edits file; analysis showed the disk shortage had left the last log record only half written when the machine went down. … (2015-01-31)
-
[Original] missing blocks error
Seen in the DataNode's log: 10/12/14 20:10:31 INFO hdfs.DFSClient: Could not obtain block blk_XXXXXXXXXXXXXXXXXXXXXX_YYYYYYYY from any node: java.io.IOException: No live nodes contain current block. Will get ne… (2015-06-09)
-
[Original] Handling the "480000 millis timeout while waiting for channel to be ready for write" exception
480000 millis timeout while waiting for channel to be ready for write. (2015-06-09)
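480000 ms (8 minutes) is the default DataNode write timeout, so the usual remedy (an assumption here, since the summary is truncated) is to raise the socket timeouts in hdfs-site.xml:

```xml
<!-- hdfs-site.xml: raise the DataNode socket write timeout. 480000 ms is
     the default that appears in the error message; the value below simply
     doubles it and is illustrative only. -->
<property>
  <name>dfs.datanode.socket.write.timeout</name>
  <value>960000</value>
</property>
```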
-
[Original] org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in receiveBlock for block
If the logs of the DataNodes that HBase depends on show an error like: DataXceiver java.io.EOFException: INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in receiveBlock for block — the fix is to adjust the dfs.socket.tim… setting configured on the HBase side. (2015-06-09)
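Assuming the truncated parameter above is dfs.socket.timeout (an assumption; the original text breaks off at "dfs.socket.tim"), a hedged hbase-site.xml sketch would be:

```xml
<!-- hbase-site.xml (illustrative): assuming the truncated setting is
     dfs.socket.timeout, raise it on the HBase side so region servers
     tolerate slow DataNode responses (milliseconds; default 60000). -->
<property>
  <name>dfs.socket.timeout</name>
  <value>180000</value>
</property>
```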
-
[Original] Error in deleting blocks.
2014-08-24 22:15:21,714 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Error processing datanode Command java.io.IOException: Error in deleting blocks. at org.apache.hadoop.hdfs.serve… (2015-06-09)
-
[Original] Three reasons why large numbers of MapReduce tasks get KILLED_UNCLEAN
Request received to kill task 'attempt_201411191723_2827635_r_000009_0' by user ------- Task has been KILLED_UNCLEAN by the user. 1. An impatient user (armed with the "mapred job -kill-task" command)… (2015-08-12)
-
[Original] Handling Caused by: java.io.IOException: Filesystem closed
org.apache.hadoop.hive.ql.metadata.HiveException: Unable to rename output from: hdfs://nameservice/user/hive/warehouse/om_dw.db/mac_wifi_day_data/tid=CYJOY/.hive-staging_hive_2016-01-20_10-19-09_200_1… (2016-01-24)
-
[Original] File file:/data1/hadoop/yarn/local/usercache/hp/appcache/application_* does not exi…
AM Container for appattempt_1453292851883_0381_000002 exited with exitCode: -1000. For more detailed output, check the application tracking page: http://hadoop:8088/cluster/app/application_1453292851883_0… (2016-01-24)
-
[Original] Handling the journalnode "Can't scan a pre-transactional edit log" exception
A test Hadoop cluster went down because its disk filled up; after startup the journalnode reported the following exception: 2018-03-19 20:48:04,817 WARN namenode.FSImage (EditLogFileInputStream.java:scanEditLog(359)) - Caught exception after scanning through 0 ops from /data1_… (2018-03-20)