Common Hadoop Problems and Solutions

A task fails with a stack trace containing:

at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:424)

Suggested cause / workaround:

Set mapred.jobtracker.retirejob.interval and mapred.userlog.retain.hours to higher values. By default, both are 24 hours. These might be the reason for the failure, though this is not certain.

split size

FileInputFormat input splits (see Hadoop: The Definitive Guide, p. 190):

mapred.min.split.size: default = 1; the smallest valid size in bytes for a file split.

mapred.max.split.size: default = Long.MAX_VALUE; the largest valid split size in bytes.

dfs.block.size: default = 64 MB; set to 128 MB on our cluster.

If the minimum split size is set larger than the block size, splits become larger than a single block; a split may then span blocks stored on different nodes, so map tasks lose some data locality.

If the maximum split size is set smaller than the block size, each block is broken into several smaller splits.

split size = max(minimumSize, min(maximumSize, blockSize));

where, by default, minimumSize < blockSize < maximumSize.
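To make the formula concrete, here is a small hedged sketch (the class name and variables are mine, not Hadoop's actual source) that sets the two split-size properties on a JobConf and evaluates the same max/min expression for a 128 MB block:

import org.apache.hadoop.mapred.JobConf;

public class SplitSizeDemo {

  // Mirrors: split size = max(minimumSize, min(maximumSize, blockSize))
  static long computeSplitSize(long minSize, long maxSize, long blockSize) {
    return Math.max(minSize, Math.min(maxSize, blockSize));
  }

  public static void main(String[] args) {
    JobConf conf = new JobConf(SplitSizeDemo.class);
    conf.setLong("mapred.min.split.size", 1L);              // default
    conf.setLong("mapred.max.split.size", Long.MAX_VALUE);  // default

    long blockSize = 128L * 1024 * 1024;                    // 128 MB, as configured on this cluster
    long splitSize = computeSplitSize(
        conf.getLong("mapred.min.split.size", 1L),
        conf.getLong("mapred.max.split.size", Long.MAX_VALUE),
        blockSize);
    System.out.println(splitSize);                          // 134217728 -> one split per block
  }
}

With the default minimum and maximum, the split size comes out equal to the block size, so each map task processes exactly one block.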

sort by value

Hadoop does not provide a direct sort-by-value mechanism, because it would hurt MapReduce performance.

It can be achieved with a composite key, however; see Hadoop: The Definitive Guide, p. 250.

Basic idea (a sketch of steps 2-4 follows after this list):

1. Combine the key and value into a new composite key;

2. Override the partitioner so that partitioning uses only the old key:

conf.setPartitionerClass(FirstPartitioner.class);

3. Define a custom key comparator that sorts by the old key first, then by the old value:

conf.setOutputKeyComparatorClass(KeyComparator.class);

4. Override the grouping comparator so that grouping also uses only the old key:  conf.setOutputValueGroupingComparator(GroupComparator.class);
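The following is a minimal sketch of steps 2-4 above, not the book's code: it assumes the composite key built in step 1 is a Text of the form "oldKey\toldValue" and that the map output value is also Text; the class and helper names are mine.

import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Partitioner;

public class SecondarySortHelpers {

  private static String first(Text t) {
    return t.toString().split("\t", 2)[0];
  }

  private static String second(Text t) {
    String[] parts = t.toString().split("\t", 2);
    return parts.length > 1 ? parts[1] : "";
  }

  // Step 2: partition on the old key only, so all composite keys sharing it reach one reducer.
  public static class FirstPartitioner implements Partitioner<Text, Text> {
    public void configure(JobConf job) { }
    public int getPartition(Text key, Text value, int numPartitions) {
      return (first(key).hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
  }

  // Step 3: sort by the old key first, then by the old value.
  public static class KeyComparator extends WritableComparator {
    public KeyComparator() { super(Text.class, true); }
    public int compare(WritableComparable a, WritableComparable b) {
      Text t1 = (Text) a, t2 = (Text) b;
      int cmp = first(t1).compareTo(first(t2));
      return cmp != 0 ? cmp : second(t1).compareTo(second(t2));
    }
  }

  // Step 4: group reducer input by the old key only.
  public static class GroupComparator extends WritableComparator {
    public GroupComparator() { super(Text.class, true); }
    public int compare(WritableComparable a, WritableComparable b) {
      return first((Text) a).compareTo(first((Text) b));
    }
  }
}

These classes would be registered with the three conf calls listed above, e.g. conf.setPartitionerClass(SecondarySortHelpers.FirstPartitioner.class).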

Handling small input files

Feeding Hadoop a large number of small input files reduces its efficiency.

There are three ways to deal with small files:

1. Merge the small files into a single SequenceFile to speed up MapReduce (a sketch is given after this list).

See WholeFileInputFormat and SmallFilesToSequenceFileConverter in Hadoop: The Definitive Guide, p. 194.

2. Use CombineFileInputFormat, which extends FileInputFormat (not tried here);

3. Use Hadoop archives (similar to packaging files together) to reduce the NameNode memory consumed by small-file metadata. (This does not always work, so it is not recommended.)

Usage for option 3:

Archive the /my/files directory and its subdirectories into files.har and place it under /my:

bin/hadoop archive -archiveName files.har /my/files /my

List the files in the archive:

bin/hadoop fs -lsr har:///my/files.har
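Returning to approach 1 above, here is a hedged sketch of packing a directory of small files into one SequenceFile, using the file name as the key and the raw bytes as the value. It is a simplified stand-in for the book's WholeFileInputFormat / SmallFilesToSequenceFileConverter; all class and argument names here are illustrative.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SmallFilesToOneSequenceFile {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path inputDir = new Path(args[0]);   // directory full of small files
    Path outFile = new Path(args[1]);    // the single SequenceFile to produce

    SequenceFile.Writer writer =
        SequenceFile.createWriter(fs, conf, outFile, Text.class, BytesWritable.class);
    try {
      for (FileStatus status : fs.listStatus(inputDir)) {
        if (status.isDir()) {
          continue;                      // only pack plain files
        }
        byte[] contents = new byte[(int) status.getLen()];
        FSDataInputStream in = fs.open(status.getPath());
        try {
          in.readFully(contents);
        } finally {
          IOUtils.closeStream(in);
        }
        // key = original file name, value = raw file bytes
        writer.append(new Text(status.getPath().getName()), new BytesWritable(contents));
      }
    } finally {
      IOUtils.closeStream(writer);
    }
  }
}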

skip bad records

JobConf conf = new JobConf(ProductMR.class);
conf.setJobName("ProductMR");
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(Product.class);
conf.setMapperClass(Map.class);
conf.setReducerClass(Reduce.class);
conf.setMapOutputCompressorClass(DefaultCodec.class);
conf.setInputFormat(SequenceFileInputFormat.class);
conf.setOutputFormat(SequenceFileOutputFormat.class);

String objpath = "abc1";
SequenceFileInputFormat.addInputPath(conf, new Path(objpath));

SkipBadRecords.setMapperMaxSkipRecords(conf, Long.MAX_VALUE);
SkipBadRecords.setAttemptsToStartSkipping(conf, 0);
SkipBadRecords.setSkipOutputPath(conf, new Path("data/product/skip/"));

String output = "abc";
SequenceFileOutputFormat.setOutputPath(conf, new Path(output));
JobClient.runJob(conf);

For skipping failed tasks (as opposed to bad records), try mapred.max.map.failures.percent; see the fragment below.
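For completeness, a hedged fragment showing the JobConf equivalent; it would extend the conf object built above, and the 5% threshold is only an example:

// Tolerate a small share of outright task failures without failing the whole job.
conf.setMaxMapTaskFailuresPercent(5);      // backs mapred.max.map.failures.percent
conf.setMaxReduceTaskFailuresPercent(5);   // the analogous setting for reduce tasks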

Restarting a single DataNode

If a DataNode has a problem and, after it is fixed, you want it to rejoin the cluster without restarting the whole cluster, run the following on that node:

bin/hadoop-daemon.sh start datanode

bin/hadoop-daemon.sh start tasktracker

Reduce progress exceeds 100%

"Reduce Task Progress shows > 100% when the total size of map outputs (for a single reducer) is high"

Cause:

During the reduce-side merge, the progress check is inexact, so the reported status can exceed 100%; the statistics code then fails with the following error: java.lang.ArrayIndexOutOfBoundsException: 3

at org.apache.hadoop.mapred.StatusHttpServer$TaskGraphServlet.getReduceAvarageProgresses(StatusHttpServer.java:228)

at org.apache.hadoop.mapred.StatusHttpServer$TaskGraphServlet.doGet(StatusHttpServer.java:159)

at javax.servlet.http.HttpServlet.service(HttpServlet.java:689)

at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)

at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:427)

at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:475)

at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567)

at org.mortbay.http.HttpContext.handle(HttpContext.java:1565)

at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:635)

at org.mortbay.http.HttpContext.handle(HttpContext.java:1517)

at org.mortbay.http.HttpServer.service(HttpServer.java:954)

Related JIRA:

counters

There are three kinds of counters (a combined example follows at the end of this section):

1. built-in counters: Map input bytes, Map output records…

2. enum counters

Usage:

enum Temperature {

MISSING,

MALFORMED

}

reporter.incrCounter(Temperature.MISSING, 1);

Output:

09/04/20 06:33:36 INFO mapred.JobClient:   Air Temperature Records

09/04/20 06:33:36 INFO mapred.JobClient:     Malformed=3

09/04/20 06:33:36 INFO mapred.JobClient:     Missing=66136856

3. Dynamic counters:

Usage:

reporter.incrCounter("TemperatureQuality", parser.getQuality(), 1);

Output:

09/04/20 06:33:36 INFO mapred.JobClient:   TemperatureQuality

09/04/20 06:33:36 INFO mapred.JobClient:     2=1246032

09/04/20 06:33:36 INFO mapred.JobClient:     1=973422173

09/04/20 06:33:36 INFO mapred.JobClient:     0=1
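Putting the two user-facing kinds together, here is a hedged old-API mapper sketch; the line parsing is purely illustrative (it is not the book's NCDC parser), while the Temperature enum and the "TemperatureQuality" group mirror the examples above.

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class CounterDemoMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  enum Temperature { MISSING, MALFORMED }

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    String line = value.toString();
    if (line.isEmpty()) {
      // enum counter: appears under the enum's group in the job report
      reporter.incrCounter(Temperature.MISSING, 1);
      return;
    }
    // illustrative "quality code": here simply the first character of the line
    String quality = line.substring(0, 1);
    // dynamic counter: group and counter names are plain strings
    reporter.incrCounter("TemperatureQuality", quality, 1);
    output.collect(new Text(quality), value);
  }
}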

7: Namenode in safe mode

Solution:

bin/hadoop dfsadmin -safemode leave

8:java.net.NoRouteToHostException: No route to host

Solution:

sudo /etc/init.d/iptables stop

9: After changing the NameNode, SELECT queries in Hive still point to the old NameNode address

This is because when you create a table, Hive stores the table's location (e.g. hdfs://ip:port/user/root/…) in the SDS and DBS tables of the metastore. So when a new cluster is brought up and the master gets a new IP, Hive's metastore still points to locations on the old cluster. You could update the metastore with the new IP every time you bring up a cluster, but the easier and simpler solution is to use an elastic IP for the master.

So every old NameNode address recorded in the metastore needs to be replaced with the current NameNode address (a hedged sketch follows).
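If updating the metastore directly is the chosen route, something like the JDBC sketch below could do it. This is not an official Hive tool: the column names (DB_LOCATION_URI in DBS, LOCATION in SDS), the MySQL URL, the driver class, and the credentials are assumptions for illustration, and the metastore database should be backed up before running anything like this.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class FixMetastoreLocations {
  public static void main(String[] args) throws Exception {
    // Placeholders: point these at your own metastore database and NameNode URIs.
    String url = "jdbc:mysql://metastore-host:3306/hive";
    String oldPrefix = "hdfs://old-namenode:9000";
    String newPrefix = "hdfs://new-namenode:9000";

    Class.forName("com.mysql.jdbc.Driver");   // needed for older JDBC drivers
    Connection c = DriverManager.getConnection(url, "hive", "password");
    try {
      String[] statements = {
          // DBS holds per-database locations, SDS holds table/partition locations.
          "UPDATE DBS SET DB_LOCATION_URI = REPLACE(DB_LOCATION_URI, ?, ?)",
          "UPDATE SDS SET LOCATION = REPLACE(LOCATION, ?, ?)"
      };
      for (String sql : statements) {
        PreparedStatement ps = c.prepareStatement(sql);
        try {
          ps.setString(1, oldPrefix);
          ps.setString(2, newPrefix);
          ps.executeUpdate();
        } finally {
          ps.close();
        }
      }
    } finally {
      c.close();
    }
  }
}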

10:Your DataNode is started and you can create directories with bin/hadoop dfs -mkdir, but you get an error message when you try to put files into the HDFS (e.g., when you run a command like bin/hadoop dfs -put).

Solution:

Go to the HDFS info web page (open your web browser and go to http://namenode:dfs_info_port, where namenode is the hostname of your NameNode and dfs_info_port is the port you set for dfs.info.port; if you followed the QuickStart on your personal computer this URL will be http://localhost:50070). Once at that page, click on the number that tells you how many DataNodes you have, to see the list of DataNodes in your cluster.

If it says you have used 100% of your space, then you need to free up room on local disk(s) of the DataNode(s).

If you are on Windows then this number will not be accurate (there is some kind of bug either in Cygwin’s df.exe or in Windows). Just free up some more space and you should be okay. On one Windows machine we tried the disk had 1GB free but Hadoop reported that it was 100% full. Then we freed up another 1GB and then it said that the disk was 99.15% full and started writing data into the HDFS again. We encountered this bug on Windows XP SP2.

11:Your DataNodes won’t start, and you see something like this in logs/*datanode*:

Incompatible namespaceIDs in /tmp/hadoop-ross/dfs/data

Cause:

Your Hadoop namespaceID became corrupted. Unfortunately the easiest thing to do is to reformat the HDFS.

Solution:

You need to do something like this:

bin/stop-all.sh

rm -Rf /tmp/hadoop-your-username/*

bin/hadoop namenode -format

12:You can run Hadoop jobs written in Java (like the grep example), but your HadoopStreaming jobs (such as the Python example that fetches web page titles) won’t work.

Cause:

You might have given only a relative path to the mapper and reducer programs. The tutorial originally just specified relative paths, but absolute paths are required if you are running in a real cluster.

Solution:

Use absolute paths like this from the tutorial:

bin/hadoop jar contrib/hadoop-0.15.2-streaming.jar \
  -mapper  $HOME/proj/hadoop/multifetch.py \
  -reducer $HOME/proj/hadoop/reducer.py \
  -input   urls/* \
  -output  titles

13: 2009-01-08 10:02:40,709 ERROR metadata.Hive (Hive.java:getPartitions(499)) - javax.jdo.JDODataStoreException: Required table missing : "PARTITIONS" in Catalog "" Schema "". JPOX requires this table to perform its persistence operations. Either your MetaData is incorrect, or you need to enable "org.jpox.autoCreateTables"

Cause: org.jpox.fixedDatastore was set to true in hive-default.xml, which stops JPOX from creating the missing PARTITIONS table. Set it back to false, or enable org.jpox.autoCreateTables, so the table can be created.

starting namenode, logging to /home/hadoop/HadoopInstall/hadoop/bin/…/logs/hadoop-hadoop-namenode-hadoop.out

localhost: starting datanode, logging to /home/hadoop/HadoopInstall/hadoop/bin/…/logs/hadoop-hadoop-datanode-hadoop.out

localhost: starting secondarynamenode, logging to /home/hadoop/HadoopInstall/hadoop/bin/…/logs/hadoop-hadoop-secondarynamenode-hadoop.out

localhost: Exception in thread “main” java.lang.NullPointerException

localhost:      at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:130)

localhost:      at org.apache.hadoop.dfs.NameNode.getAddress(NameNode.java:116)

localhost:      at org.apache.hadoop.dfs.NameNode.getAddress(NameNode.java:120)

localhost:      at org.apache.hadoop.dfs.SecondaryNameNode.initialize(SecondaryNameNode.java:124)

localhost:      at org.apache.hadoop.dfs.SecondaryNameNode.(SecondaryNameNode.java:108)

localhost:      at org.apache.hadoop.dfs.SecondaryNameNode.main(SecondaryNameNode.java:460)

(This NullPointerException in NetUtils.createSocketAddr usually means the NameNode address, i.e. fs.default.name in the configuration, is missing or not an hdfs:// URI, so the SecondaryNameNode cannot resolve it.)

14:09/08/31 18:25:45 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException:Bad connect ack with firstBadLink 192.168.1.11:50010

> 09/08/31 18:25:45 INFO hdfs.DFSClient: Abandoning block blk_-8575812198227241296_1001

> 09/08/31 18:25:51 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException:

Bad connect ack with firstBadLink 192.168.1.16:50010

> 09/08/31 18:25:51 INFO hdfs.DFSClient: Abandoning block blk_-2932256218448902464_1001

> 09/08/31 18:25:57 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException:

Bad connect ack with firstBadLink 192.168.1.11:50010

> 09/08/31 18:25:57 INFO hdfs.DFSClient: Abandoning block blk_-1014449966480421244_1001

> 09/08/31 18:26:03 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException:

Bad connect ack with firstBadLink 192.168.1.16:50010

> 09/08/31 18:26:03 INFO hdfs.DFSClient: Abandoning block blk_7193173823538206978_1001

> 09/08/31 18:26:09 WARN hdfs.DFSClient: DataStreamer Exception: java.io.IOException: Unable

to create new block.

>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2731)

>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1996)

>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2182)

>

> 09/08/31 18:26:09 WARN hdfs.DFSClient: Error Recovery for block blk_7193173823538206978_1001

bad datanode[2] nodes == null

> 09/08/31 18:26:09 WARN hdfs.DFSClient: Could not get block locations. Source file “/user/umer/8GB_input”

- Aborting…

> put: Bad connect ack with firstBadLink 192.168.1.16:50010

Solution:

I have resolved the issue. What I did:

  1. '/etc/init.d/iptables stop' --> stopped the firewall

  2. Set SELINUX=disabled in '/etc/selinux/config' --> disabled SELinux

It worked for me after these two changes.

Fixing jline.ConsoleReader.readLine not working on Windows

In the main() method of CliDriver.java there is a call reader.readLine, used to read from standard input, but on Windows it always returns null. The reader is a jline.ConsoleReader instance, which makes debugging in Eclipse on Windows inconvenient.

We can replace it with java.util.Scanner. Change the original

while ((line=reader.readLine(curPrompt+"> ")) != null)


to:

Scanner sc = new Scanner(System.in);

while (sc.hasNextLine() && (line = sc.nextLine()) != null)


Recompile and redeploy, and SQL statements can then be read normally from standard input. (Note that Scanner.nextLine() never returns null; it throws at end of input, hence the hasNextLine() guard added above.)

Possible causes of the "does not have a scheme" error when debugging Hive in Eclipse on Windows

1. The hive.metastore.local property in the Hive configuration is set to false; change it to true, since this is a standalone setup.

2. The HIVE_HOME environment variable is not set, or is set incorrectly.

3. "does not have a scheme" most likely means hive-default.xml cannot be found. For how to fix the missing hive-default.xml when debugging Hive in Eclipse, see: http://bbs.hadoopor.com/thread-292-1-1.html

1. Chinese character problems

Chinese text parsed out of a URL still prints as garbage in Hadoop? We once assumed Hadoop simply did not support Chinese; after reading the source code it turns out Hadoop merely does not support writing its output in GBK.

The code below is from TextOutputFormat.class. Hadoop's default outputs all inherit from FileOutputFormat, which has two subclasses: one for binary-stream output and the text-based one, TextOutputFormat.

public class TextOutputFormat<K, V> extends FileOutputFormat<K, V> {

  protected static class LineRecordWriter<K, V>
      implements RecordWriter<K, V> {

    private static final String utf8 = "UTF-8";   // hard-coded to UTF-8 here
    private static final byte[] newline;

    static {
      try {
        newline = "\n".getBytes(utf8);
      } catch (UnsupportedEncodingException uee) {
        throw new IllegalArgumentException("can't find " + utf8 + " encoding");
      }
    }

    public LineRecordWriter(DataOutputStream out, String keyValueSeparator) {
      this.out = out;
      try {
        this.keyValueSeparator = keyValueSeparator.getBytes(utf8);
      } catch (UnsupportedEncodingException uee) {
        throw new IllegalArgumentException("can't find " + utf8 + " encoding");
      }
    }

    private void writeObject(Object o) throws IOException {
      if (o instanceof Text) {
        Text to = (Text) o;
        out.write(to.getBytes(), 0, to.getLength());   // this path also needs changing
      } else {
        out.write(o.toString().getBytes(utf8));
      }
    }
  }
}

As you can see, Hadoop's default output is hard-coded to UTF-8. So as long as the Chinese text was decoded correctly, setting the Linux client's character set to UTF-8 will display it, because Hadoop wrote the Chinese out as UTF-8.

Most databases, however, define their fields in GBK. What if we want Hadoop to emit Chinese in GBK to stay compatible with the database?

We can define a new class:

public class GbkOutputFormat<K, V> extends FileOutputFormat<K, V> {

  protected static class LineRecordWriter<K, V>
      implements RecordWriter<K, V> {

    // simply use gbk here
    private static final String gbk = "gbk";
    private static final byte[] newline;

    static {
      try {
        newline = "\n".getBytes(gbk);
      } catch (UnsupportedEncodingException uee) {
        throw new IllegalArgumentException("can't find " + gbk + " encoding");
      }
    }

    public LineRecordWriter(DataOutputStream out, String keyValueSeparator) {
      this.out = out;
      try {
        this.keyValueSeparator = keyValueSeparator.getBytes(gbk);
      } catch (UnsupportedEncodingException uee) {
        throw new IllegalArgumentException("can't find " + gbk + " encoding");
      }
    }

    private void writeObject(Object o) throws IOException {
      // the Text fast path (writing raw UTF-8 bytes) is dropped so that every value,
      // Text or not, is re-encoded as GBK
      out.write(o.toString().getBytes(gbk));
    }
  }
}

Then add conf1.setOutputFormat(GbkOutputFormat.class) to the MapReduce driver code (conf1 being the job's JobConf), and the Chinese output will be written in GBK. A hedged driver sketch follows.
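For reference, a hedged driver sketch (old mapred API) showing where that call fits. The job name, paths, and mapper/reducer are placeholders, and it assumes the full GbkOutputFormat (including the getRecordWriter method elided in the excerpt above) is on the classpath.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class GbkOutputJob {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(GbkOutputJob.class);
    conf.setJobName("gbk-output-demo");    // placeholder job name
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(Text.class);
    // conf.setMapperClass(...);           // plug in your own mapper/reducer here
    // conf.setReducerClass(...);

    // The key line from this section: emit GBK instead of the hard-coded UTF-8.
    conf.setOutputFormat(GbkOutputFormat.class);

    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    JobClient.runJob(conf);
  }
}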

2. A normally running MapReduce job threw the following error:

java.io.IOException: All datanodes xxx.xxx.xxx.xxx:xxx are bad. Aborting…

at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2158)

at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1735)

at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1889)

java.io.IOException: Could not get block locations. Aborting…

at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2143)

at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1735)

at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1889)

Investigation showed the cause was too many open files on the Linux machines. ulimit -n shows that the default open-file limit on Linux is 1024. Edit /etc/security/limits.conf and raise the limit for the hadoop user, e.g. add the line: hadoop soft nofile 65535

Then rerun the program (preferably after changing every DataNode) and the problem is resolved.

3. After running for a while, Hadoop cannot be stopped with stop-all.sh, which reports:

no tasktracker to stop, no datanode to stop

The cause: when stopping, Hadoop relies on the recorded process IDs of the mapred and dfs daemons on each node. By default these pid files are kept under /tmp, and Linux periodically (usually every month or roughly every 7 days) deletes files in that directory. Once files such as hadoop-hadoop-jobtracker.pid and hadoop-hadoop-namenode.pid are gone, the stop scripts can no longer find the corresponding processes on the nodes.

Setting export HADOOP_PID_DIR in the configuration file (conf/hadoop-env.sh) to a directory outside /tmp solves this.

Problem:

Incompatible namespaceIDs in /usr/local/hadoop/dfs/data: namenode namespaceID = 405233244966; datanode namespaceID = 33333244

Cause: the same namespaceID mismatch as in problem 11 above. The NameNode has been reformatted, so the namespaceID it records no longer matches the one stored by the DataNode. Either clear the DataNode's data directory as described there, or edit the namespaceID in <dfs.data.dir>/current/VERSION on the DataNode so that it matches the NameNode's.