[Hadoop Production Exceptions]
代立冬
StayHungryStayFoolish. Outer practice hones the skills; inner practice hones the spirit.
-
[Solved] User [dr.who] is not authorized to view the logs for application
User [dr.who] is not authorized to view the logs for application. Cause: dr.who, the default user of the Resource Manager UI, does not have the correct permissions.
Original · 2016-03-02 21:26:42 · 8971 views · 0 comments
-
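The origin of Log Aggregation Status TIME_OUT
When running Spark on YARN, you sometimes find that after a Spark program finishes, its Spark UI shows no information, or the run information can no longer be found. A closer look at the NodeManager UI shows the following message: Log Aggregation Status TIME_OUT. It turns out that the NodeManager can safely move logs to the distributed file system HDFS after an application ends; once the application has finished, users can use the YARN command line...
Original · 2017-12-09 21:32:19 · 2932 views · 0 comments
-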
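The excerpt above is cut off before it names the command, so the following is only a sketch under an assumption: that the YARN command line it refers to is `yarn logs -applicationId <id>`, the standard CLI call for reading aggregated logs. The function names and the application ID in the usage note are hypothetical.

```python
import re
import subprocess

# YARN application IDs look like application_<clusterTimestamp>_<sequence>.
APP_ID_PATTERN = re.compile(r"^application_\d+_\d+$")


def build_yarn_logs_command(app_id):
    """Return the argv list for fetching aggregated logs of one application."""
    if not APP_ID_PATTERN.match(app_id):
        raise ValueError("not a YARN application id: %r" % app_id)
    return ["yarn", "logs", "-applicationId", app_id]


def fetch_aggregated_logs(app_id):
    """Run the command and return its stdout (assumes `yarn` is on PATH)."""
    result = subprocess.run(
        build_yarn_logs_command(app_id),
        capture_output=True,
        text=True,
        check=False,  # a non-zero exit is left for the caller to inspect
    )
    return result.stdout
```

Calling `fetch_aggregated_logs("application_1700000000000_0001")` on a cluster node would return whatever the CLI prints; if aggregation timed out, that output may be empty or an error message rather than the logs.
-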
dfs.datanode.du.reserved: reserved space does not take effect
The reserved space configured with dfs.datanode.du.reserved does not take effect.
Original · 2017-04-08 09:46:06 · 2053 views · 1 comment
-
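Notes on a pitfall when changing the admin login password in the Ranger UI
When you change the admin user's login password in the Ranger UI, you must also set admin_password in the Ranger configuration to the same value. Otherwise the HDFS NameNode fails to start when it uses admin, with the following exception:
Traceback (most recent call last): ambari_ranger_admin, ambari_ranger_password = self.create_ambari_admin_user(ambari_ranger_admin, ambari_ranger_password, f...
Original · 2016-10-27 10:33:13 · 6527 views · 0 comments
-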
[Solved] java.io.IOException: Cannot obtain block length for LocatedBlock
Cannot obtain block length for LocatedBlock
Original · 2016-05-16 01:55:21 · 9347 views · 0 comments
-
Handling DataXceiver error processing unknown operation src: /127.0.0.1:36479 dst: /127.0.0.1:50010
The exception is as follows:
2015-12-09 17:39:20,310 ERROR datanode.DataNode (DataXceiver.java:run(278)) - hadoop07:50010:DataXceiver error processing unknown operation src: /127.0.0.1:36479 dst: /127.0.0.1:50010
Original · 2015-12-17 18:06:25 · 26749 views · 0 comments
-
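When triaging log lines like the one in the entry above, it can help to pull out the two endpoints and check whether the client is on the same host. The sketch below does only that; it does not diagnose the cause, which the excerpt does not state. The helper names `parse_endpoints` and `is_local_client` are hypothetical.

```python
import ipaddress
import re

# Matches the "src: /host:port dst: /host:port" tail of a DataXceiver log line.
ENDPOINTS = re.compile(
    r"src: /(?P<src_host>[^\s:]+):(?P<src_port>\d+) "
    r"dst: /(?P<dst_host>[^\s:]+):(?P<dst_port>\d+)"
)


def parse_endpoints(line):
    """Return {'src': (host, port), 'dst': (host, port)} or None if absent."""
    m = ENDPOINTS.search(line)
    if m is None:
        return None
    return {
        "src": (m["src_host"], int(m["src_port"])),
        "dst": (m["dst_host"], int(m["dst_port"])),
    }


def is_local_client(endpoints):
    """True when the source address is a loopback IP (hostnames give False)."""
    try:
        return ipaddress.ip_address(endpoints["src"][0]).is_loopback
    except ValueError:
        return False
```

For the line quoted in the entry, the source is a loopback address, which suggests the connection to port 50010 comes from a process on the DataNode host itself.
-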
Ambari server out of memory
java.lang.OutOfMemoryError: PermGen space
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
    at java.security.SecureClassLoader.defineCl...
Original · 2015-12-02 15:39:51 · 3651 views · 0 comments
-
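A full NameNode disk causes errors when recovering the edits file
A while ago our company's Hadoop cluster went down, and the cause turned out to be a full NameNode disk. After we freed some space, restarting the cluster failed. We then found that the Secondary NameNode service had also broken, so every operation log kept being written to the edits.new file; by the time the cluster went down, the file had grown to an outrageous 70 GB+. The restart reported that loading the edits file had failed. Analysis showed the load failed because the disk ran out of space, so the last log entry was only half written when the cluster went down...
Reposted · 2015-01-31 23:21:35 · 2354 views · 0 comments
-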
missing blocks error
The following appears in the DataNode log:
10/12/14 20:10:31 INFO hdfs.DFSClient: Could not obtain block blk_XXXXXXXXXXXXXXXXXXXXXX_YYYYYYYY from any node: java.io.IOException: No live nodes contain current block. Will get ne...
Original · 2015-06-09 23:07:50 · 1957 views · 0 comments
-
Handling the "480000 millis timeout while waiting for channel to be ready for write" exception
480000 millis timeout while waiting for channel to be ready for write
Original · 2015-06-09 23:14:00 · 7688 views · 0 comments
-
org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in receiveBlock for block
If the following error appears in the logs of a DataNode that HBase depends on:
DataXceiver java.io.EOFException: INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in receiveBlock for block
Solution: on the HBase side, the configured dfs.socket.tim...
Original · 2015-06-09 23:20:06 · 2225 views · 0 comments
-
Error in deleting blocks.
2014-08-24 22:15:21,714 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Error processing datanode Command
java.io.IOException: Error in deleting blocks.
    at org.apache.hadoop.hdfs.serve...
Original · 2015-06-09 23:23:00 · 1321 views · 0 comments
-
3 reasons why MapReduce shows a large number of tasks KILLED_UNCLEAN
Request received to kill task 'attempt_201411191723_2827635_r_000009_0' by user
-------
Task has been KILLED_UNCLEAN by the user
1. An impatient user (armed with "mapred job -kill-task" command)...
Original · 2015-08-12 17:11:18 · 3704 views · 0 comments
-
Handling "Caused by: java.io.IOException: Filesystem closed"
org.apache.hadoop.hive.ql.metadata.HiveException: Unable to rename output from: hdfs://nameservice/user/hive/warehouse/om_dw.db/mac_wifi_day_data/tid=CYJOY/.hive-staging_hive_2016-01-20_10-19-09_200_1...
Original · 2016-01-24 16:16:55 · 7827 views · 0 comments
-
File file:/data1/hadoop/yarn/local/usercache/hp/appcache/application_* does not exi
AM Container for appattempt_1453292851883_0381_000002 exited with exitCode: -1000
For more detailed output, check application tracking page: http://hadoop:8088/cluster/app/application_1453292851883_01...
Original · 2016-01-24 16:21:53 · 6766 views · 0 comments
-
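Handling the JournalNode "Can't scan a pre-transactional edit log" exception
A Hadoop cluster in a test environment went down because its disks were full. After it was started again, the JournalNode reported the following exception:
2018-03-19 20:48:04,817 WARN namenode.FSImage (EditLogFileInputStream.java:scanEditLog(359)) - Caught exception after scanning through 0 ops from /data1_...
Original · 2018-03-20 17:03:58 · 3340 views · 0 comments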