Hadoop Common Problems and Solutions

2401_83817916

Published 2024-03-25 19:52:00

Article link: https://blog.csdn.net/2401_83817916/article/details/137023827

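The fix is to add the JVM options -Xms64m -Xmx512m when running the main class org.apache.nutch.crawl.Crawl.

Your problem may be a different one, but once you can see the detailed error report it becomes easy to solve.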

Using the distributed cache

It works like a global variable, but because the data is too large to put in the config file, the distributed cache is used instead.

How to use it (see Hadoop: The Definitive Guide, p. 240):

1. On the command line: pass -files to ship the files to be looked up (a local file or an HDFS file (using hdfs://xxx?)), or -archives (JAR, ZIP, tar, etc.)

% hadoop jar job.jar MaxTemperatureByStationNameUsingDistributedCacheFile \
  -files input/ncdc/metadata/stations-fixed-width.txt input/ncdc/all output

2. In the program:

public void configure(JobConf conf) {
  metadata = new NcdcStationMetadata();
  try {
    // The file shipped with -files is available in the task's working directory
    metadata.initialize(new File("stations-fixed-width.txt"));
  } catch (IOException e) {
    throw new RuntimeException(e);
  }
}

Another, indirect way to use it (it does not seem to be available in hadoop-0.19.0):

Call addCacheFile() or addCacheArchive() to add the files,

then call getLocalCacheFiles() or getLocalCacheArchives() to retrieve them.

Hadoop job web UI

There are web-based interfaces to both the JobTracker (MapReduce master) and NameNode (HDFS master) which display status pages about the state of the entire system. By default, these are located at http://job.tracker.addr:50030/ and http://name.node.addr:50070/.

Hadoop monitoring

Use Nagios for alerting and Ganglia for monitoring graphs.

status of 255 error

Error:

java.io.IOException: Task process exit with nonzero status of 255.

at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:424)

Cause:

Set mapred.jobtracker.retirejob.interval and mapred.userlog.retain.hours to a higher value. By default, their values are 24 hours. These might be the reason for the failure, though I’m not sure.

split size

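FileInputFormat input splits (see Hadoop: The Definitive Guide, p. 190):

mapred.min.split.size: default=1, the smallest valid size in bytes for a file split.

mapred.max.split.size: default=Long.MAX_VALUE, the largest valid size.

dfs.block.size: default = 64M; set to 128M on our system.

If the minimum split size is set larger than the block size, each split spans more than one block. (The author's guess: blocks have to be fetched from other nodes and merged, so the number of blocks per split goes up.)

If the maximum split size is set smaller than the block size, each block is divided further into smaller splits.

split size = max(minimumSize, min(maximumSize, blockSize));

With the defaults, minimumSize < blockSize < maximumSize, so the split size equals the block size.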
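The formula above can be checked with a few lines of plain Java. This is only a sketch of the arithmetic, not Hadoop's own code: the class name SplitSizeSketch and the method name computeSplitSize are made up for illustration, and the 128 MB block size is the value this post says its system uses.

```java
final class SplitSizeSketch {

    // Mirrors the formula quoted above:
    // split size = max(minimumSize, min(maximumSize, blockSize))
    static long computeSplitSize(long minimumSize, long maximumSize, long blockSize) {
        return Math.max(minimumSize, Math.min(maximumSize, blockSize));
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long blockSize = 128 * mb; // block size used on the author's system

        // Defaults (min = 1, max = Long.MAX_VALUE): the split equals the block.
        System.out.println(computeSplitSize(1L, Long.MAX_VALUE, blockSize) / mb);       // 128

        // Minimum larger than the block: the split grows past one block.
        System.out.println(computeSplitSize(256 * mb, Long.MAX_VALUE, blockSize) / mb); // 256

        // Maximum smaller than the block: the block is cut into smaller splits.
        System.out.println(computeSplitSize(1L, 32 * mb, blockSize) / mb);              // 32
    }
}
```

The three calls correspond to the three cases described above: the default, a minimum above the block size, and a maximum below it.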

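Sort by value

Hadoop does not provide a direct sort-by-value method, because that would degrade MapReduce performance.

It can still be done with a composite key; see Hadoop: The Definitive Guide, p. 250.

Basic idea:

1. Combine the key and value into a new composite key;

2. Override the partitioner so that it partitions by the old key;

conf.setPartitionerClass(FirstPartitioner.class);

3. Define a custom key comparator: sort by the old key first, then by the old value;

conf.setOutputKeyComparatorClass(KeyComparator.class);

4. Override the grouping comparator so that it also groups by the old key;

conf.setOutputValueGroupingComparator(GroupComparator.class);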
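The four steps can be illustrated without a cluster. The following is a minimal in-memory sketch in plain Java, not the book's implementation and not Hadoop API code: the names SecondarySortSketch, Pair, partition and sortAndGroup are hypothetical, and the sample records are invented for the demonstration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SecondarySortSketch {

    // Step 1: the composite key carries both the old key and the old value.
    static final class Pair {
        final String key;
        final int value;

        Pair(String key, int value) {
            this.key = key;
            this.value = value;
        }
    }

    // Step 3: order by the old key first, then by the old value.
    static final Comparator<Pair> KEY_COMPARATOR =
            Comparator.comparing((Pair p) -> p.key).thenComparingInt(p -> p.value);

    // Step 2: partition on the old key only, so all records for one key
    // land in the same partition regardless of their value.
    static int partition(Pair p, int numPartitions) {
        return (p.key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Steps 3 and 4: sort with the composite comparator, then group by the
    // old key only, so each group sees its values already in sorted order.
    static Map<String, List<Integer>> sortAndGroup(List<Pair> records) {
        List<Pair> sorted = new ArrayList<>(records);
        sorted.sort(KEY_COMPARATOR);
        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        for (Pair p : sorted) {
            groups.computeIfAbsent(p.key, k -> new ArrayList<>()).add(p.value);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Pair> records = Arrays.asList(
                new Pair("1950", 22), new Pair("1949", 111),
                new Pair("1950", -11), new Pair("1949", 78));
        System.out.println(sortAndGroup(records)); // {1949=[78, 111], 1950=[-11, 22]}
    }
}
```

In a real job the framework does the sorting and grouping during the shuffle; the sketch only shows why partitioning and grouping must ignore the value part of the composite key while sorting must include it.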

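Handling small input files

Using a large number of small files as input lowers Hadoop's efficiency.

There are three ways to combine small files:

1. Merge the small files into one SequenceFile to speed up MapReduce.

See WholeFileInputFormat and SmallFilesToSequenceFileConverter in Hadoop: The Definitive Guide, p. 194.

2. Use CombineFileInputFormat, which extends FileInputFormat; the author has not tried this.

3. Use Hadoop archives (similar to packing files together), which reduces the NameNode memory consumed by small-file metadata. (This approach may not always work, so it is not recommended.)

Usage:

Archive the /my/files directory and its subdirectories into files.har and place it under /my:

bin/hadoop archive -archiveName files.har /my/files /my

List the files in the archive:

bin/hadoop fs -lsr har:///my/files.har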

skip bad records

JobConf conf = new JobConf(ProductMR.class);
conf.setJobName("ProductMR");
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(Product.class);
conf.setMapperClass(Map.class);
conf.setReducerClass(Reduce.class);
conf.setMapOutputCompressorClass(DefaultCodec.class);
conf.setInputFormat(SequenceFileInputFormat.class);
conf.setOutputFormat(SequenceFileOutputFormat.class);
String objpath = "abc1";
SequenceFileInputFormat.addInputPath(conf, new Path(objpath));
SkipBadRecords.setMapperMaxSkipRecords(conf, Long.MAX_VALUE);
SkipBadRecords.setAttemptsToStartSkipping(conf, 0);
SkipBadRecords.setSkipOutputPath(conf, new Path("data/product/skip/"));
String output = "abc";
SequenceFileOutputFormat.setOutputPath(conf, new Path(output));
JobClient.runJob(conf);

To skip failed tasks instead, try mapred.max.map.failures.percent.

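Restarting a single DataNode

If a DataNode has a problem, then once it is fixed it can rejoin the cluster without restarting the whole cluster. Run the following on that node:

bin/hadoop-daemon.sh start datanode

bin/hadoop-daemon.sh start tasktracker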

Reduce progress exceeds 100%

"Reduce Task Progress shows > 100% when the total size of map outputs (for a single reducer) is high"

Cause:

During the reduce-side merge, the progress check is inaccurate, so the status goes above 100%. The statistics code then fails with the following error:

java.lang.ArrayIndexOutOfBoundsException: 3

at org.apache.hadoop.mapred.StatusHttpServer$TaskGraphServlet.getReduceAvarageProgresses(StatusHttpServer.java:228)

at org.apache.hadoop.mapred.StatusHttpServer$TaskGraphServlet.doGet(StatusHttpServer.java:159)

at javax.servlet.http.HttpServlet.service(HttpServlet.java:689)

at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)

at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:427)

at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:475)

at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567)

at org.mortbay.http.HttpContext.handle(HttpContext.java:1565)

at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:635)

at org.mortbay.http.HttpContext.handle(HttpContext.java:1517)

at org.mortbay.http.HttpServer.service(HttpServer.java:954)

JIRA link:

counters

Three kinds of counters:

1. built-in counters: Map input bytes, Map output records…

2. enum counters

Usage:

enum Temperature {
  MISSING,
  MALFORMED
}

reporter.incrCounter(Temperature.MISSING, 1);

Output:

09/04/20 06:33:36 INFO mapred.JobClient: Air Temperature Recor

09/04/20 06:33:36 INFO mapred.JobClient: Malformed=3

09/04/20 06:33:36 INFO mapred.JobClient: Missing=66136856

3. dynamic counters:

Usage:

reporter.incrCounter("TemperatureQuality", parser.getQuality(), 1);

Output:

09/04/20 06:33:36 INFO mapred.JobClient: TemperatureQuality

09/04/20 06:33:36 INFO mapred.JobClient: 2=1246032

09/04/20 06:33:36 INFO mapred.JobClient: 1=973422173

09/04/20 06:33:36 INFO mapred.JobClient: 0=1
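The difference between enum counters and dynamic counters can be sketched in plain Java. The class below is a hypothetical stand-in, not Hadoop's Reporter: the name CounterSketch and its get methods are assumptions for illustration. Only the two incrCounter call shapes follow the snippets above.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.TreeMap;

final class CounterSketch {

    // Enum counters: the set of counter names is fixed at compile time.
    enum Temperature { MISSING, MALFORMED }

    private final Map<Temperature, Long> enumCounters = new EnumMap<>(Temperature.class);

    // Dynamic counters: group and counter names are plain strings chosen at run time.
    private final Map<String, Map<String, Long>> dynamicCounters = new TreeMap<>();

    void incrCounter(Temperature counter, long amount) {
        enumCounters.merge(counter, amount, Long::sum);
    }

    void incrCounter(String group, String counter, long amount) {
        dynamicCounters.computeIfAbsent(group, g -> new TreeMap<>())
                .merge(counter, amount, Long::sum);
    }

    long get(Temperature counter) {
        return enumCounters.getOrDefault(counter, 0L);
    }

    long get(String groupName, String counter) {
        Map<String, Long> group = dynamicCounters.get(groupName);
        return group == null ? 0L : group.getOrDefault(counter, 0L);
    }

    public static void main(String[] args) {
        CounterSketch reporter = new CounterSketch();
        reporter.incrCounter(Temperature.MISSING, 1);
        reporter.incrCounter(Temperature.MISSING, 1);
        reporter.incrCounter("TemperatureQuality", "1", 1);
        System.out.println(reporter.get(Temperature.MISSING));       // 2
        System.out.println(reporter.get("TemperatureQuality", "1")); // 1
    }
}
```

Enum counters give compile-time checking of the counter names, while dynamic counters suit values that are only known when the data is read, such as the quality codes in the output above.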

7: Namenode in safe mode

Solution:

bin/hadoop dfsadmin -safemode leave

8: java.net.NoRouteToHostException: No route to host

Solution:

sudo /etc/init.d/iptables stop

9: After changing the NameNode, a select in Hive still points to the previous NameNode address

The reason: When you create a table, Hive actually stores the location of the table (e.g. hdfs://ip:port/user/root/…) in the SDS and DBS tables in the metastore. So when I bring up a new cluster the master has a new IP, but Hive’s metastore is still pointing to the locations within the old cluster. I could modify the metastore to update with the new IP every time I bring up a cluster. But the easier and simpler solution was to just use an elastic IP for the master.

So every occurrence of the old NameNode address in the metastore must be replaced with the current NameNode address.

10：Your DataNode is started and you can create directories with bin/hadoop dfs -mkdir, but you get an error message when you try to put files into the HDFS (e.g., when you run a command like bin/hadoop dfs -put).

Solution:

Go to the HDFS info web page (open your web browser and go to http://namenode:dfs_info_port where namenode is the hostname of your NameNode and dfs_info_port is the port you chose for dfs.info.port; if you followed the QuickStart on your personal computer then this URL will be http://localhost:50070). Once at that page click on the number where it tells you how many DataNodes you have to look at a list of the DataNodes in your cluster.

If it says you have used 100% of your space, then you need to free up room on local disk(s) of the DataNode(s).

If you are on Windows then this number will not be accurate (there is some kind of bug either in Cygwin’s df.exe or in Windows). Just free up some more space and you should be okay. On one Windows machine we tried the disk had 1GB free but Hadoop reported that it was 100% full. Then we freed up another 1GB and then it said that the disk was 99.15% full and started writing data into the HDFS again. We encountered this bug on Windows XP SP2.
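To cross-check the percentage shown on the web page against what the operating system reports, a small Java program can query the partition directly. This is only a sketch under assumptions: the class name DiskUsageSketch is made up, the directory to check is whatever you pass in (defaulting to the current directory), and the number is the JVM's view of the partition, which may differ from Hadoop's own accounting.

```java
import java.io.File;

final class DiskUsageSketch {

    // Returns the percentage of the partition holding `dir` that is in use,
    // or -1 when the JVM cannot determine the partition size.
    static double percentUsed(File dir) {
        long total = dir.getTotalSpace();
        if (total <= 0) {
            return -1.0;
        }
        long usable = dir.getUsableSpace();
        return 100.0 * (total - usable) / total;
    }

    public static void main(String[] args) {
        // Pass the DataNode's data directory as the first argument.
        File dataDir = new File(args.length > 0 ? args[0] : ".");
        System.out.printf("%s is %.2f%% full%n",
                dataDir.getAbsolutePath(), percentUsed(dataDir));
    }
}
```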

11：Your DataNodes won’t start, and you see something like this in logs/*datanode*:

Incompatible namespaceIDs in /tmp/hadoop-ross/dfs/data

Cause:

Your Hadoop namespaceID became corrupted. Unfortunately the easiest thing to do is to reformat the HDFS.

Solution:

You need to do something like this:

bin/stop-all.sh

rm -Rf /tmp/hadoop-your-username/*

bin/hadoop namenode -format

12：You can run Hadoop jobs written in Java (like the grep example), but your HadoopStreaming jobs (such as the Python example that fetches web page titles) won’t work.

Cause:

You might have given only a relative path to the mapper and reducer programs. The tutorial originally just specified relative paths, but absolute paths are required if you are running in a real cluster.

Solution:

Use absolute paths like this from the tutorial:

bin/hadoop jar contrib/hadoop-0.15.2-streaming.jar \
  -mapper $HOME/proj/hadoop/multifetch.py \
  -reducer $HOME/proj/hadoop/reducer.py \
  -input urls/* \
  -output titles

13: 2009-01-08 10:02:40,709 ERROR metadata.Hive (Hive.java:getPartitions(499)) - javax.jdo.JDODataStoreException: Required table missing : ""PARTITIONS"" in Catalog "" Schema "". JPOX requires this table to perform its persistence operations. Either your MetaData is incorrect, or you need to enable "org.jpox.autoCreateTables"

Cause: org.jpox.fixedDatastore was set to true in hive-default.xml.

starting namenode, logging to /home/hadoop/HadoopInstall/hadoop/bin/…/logs/hadoop-hadoop-namenode-hadoop.out

localhost: starting datanode, logging to /home/hadoop/HadoopInstall/hadoop/bin/…/logs/hadoop-hadoop-datanode-hadoop.out

localhost: starting secondarynamenode, logging to /home/hadoop/HadoopInstall/hadoop/bin/…/logs/hadoop-hadoop-secondarynamenode-hadoop.out

localhost: Exception in thread "main" java.lang.NullPointerException

localhost: at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:130)

localhost: at org.apache.hadoop.dfs.NameNode.getAddress(NameNode.java:116)

localhost: at org.apache.hadoop.dfs.NameNode.getAddress(NameNode.java:120)

localhost: at org.apache.hadoop.dfs.SecondaryNameNode.initialize(SecondaryNameNode.java:124)

localhost: at org.apache.hadoop.dfs.SecondaryNameNode.<init>(SecondaryNameNode.java:108)

localhost: at org.apache.hadoop.dfs.SecondaryNameNode.main(SecondaryNameNode.java:460)

14: 09/08/31 18:25:45 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 192.168.1.11:50010
09/08/31 18:25:45 INFO hdfs.DFSClient: Abandoning block blk_-8575812198227241296_1001
09/08/31 18:25:51 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 192.168.1.16:50010
09/08/31 18:25:51 INFO hdfs.DFSClient: Abandoning block blk_-2932256218448902464_1001
09/08/31 18:25:57 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 192.168.1.11:50010
09/08/31 18:25:57 INFO hdfs.DFSClient: Abandoning block blk_-1014449966480421244_1001
09/08/31 18:26:03 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 192.168.1.16:50010
09/08/31 18:26:03 INFO hdfs.DFSClient: Abandoning block blk_7193173823538206978_1001
09/08/31 18:26:09 WARN hdfs.DFSClient: DataStreamer Exception: java.io.IOException: Unable to create new block.
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2731)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1996)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2182)
09/08/31 18:26:09 WARN hdfs.DFSClient: Error Recovery for block blk_7193173823538206978_1001 bad datanode[2] nodes == null
09/08/31 18:26:09 WARN hdfs.DFSClient: Could not get block locations. Source file "/user/umer/8GB_input" - Aborting…
put: Bad connect ack with firstBadLink 192.168.1.16:50010

Solution:

I have resolved the issue. What I did:

1. '/etc/init.d/iptables stop' --> stopped the firewall
2. SELINUX=disabled in the '/etc/selinux/config' file --> disabled SELinux

It worked for me after these two changes.
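Before turning off the firewall, it can help to confirm that the DataNode port is actually unreachable from the client. The following is a minimal sketch in plain Java, assuming a TCP connect attempt is a fair proxy for what the DFS client does. The class name PortProbeSketch and the method canConnect are hypothetical; the default host and port are the ones that appear in the log above.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

final class PortProbeSketch {

    // Tries to open a TCP connection and reports whether it succeeded.
    // Any I/O failure (refused, no route to host, timeout, unknown host)
    // is treated as "not reachable".
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults are taken from the log above; override them on the command line.
        String host = args.length > 0 ? args[0] : "192.168.1.11";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 50010;
        System.out.println(host + ":" + port + " reachable = " + canConnect(host, port, 2000));
    }
}
```

If the probe fails from the client but succeeds when run on the DataNode itself, a firewall between the two machines is a likely cause, which matches the fix described above.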

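Fixing jline.ConsoleReader.readLine not working on Windows

In the main() function of CliDriver.java there is a call to reader.readLine that reads from standard input. On Windows this call always returns null. The reader is a jline.ConsoleReader instance, which makes debugging in Eclipse on Windows inconvenient.

It can be replaced with java.util.Scanner. Change the original

while ((line=reader.readLine(curPrompt+"> ")) != null)

to:

Scanner sc = new Scanner(System.in);

// nextLine() throws NoSuchElementException at end of input instead of returning null, so check hasNextLine() first
while (sc.hasNextLine() && (line=sc.nextLine()) != null)

After recompiling and redeploying, SQL statements can be read from standard input normally.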

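Possible causes of the "does not have a scheme" error when debugging Hive in Eclipse on Windows

1. The hive.metastore.local setting in the Hive configuration file is false; change it to true, because this is a standalone setup.

2. The HIVE_HOME environment variable is not set, or is set incorrectly.

3. "does not have a scheme" is most likely caused by hive-default.xml not being found. For a fix when hive-default.xml cannot be found while debugging Hive in Eclipse, see: http://bbs.hadoopor.com/thread-292-1-1.html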

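1. Chinese text

Chinese text parsed from a URL still comes out garbled when Hadoop prints it? We once thought Hadoop did not support Chinese; after reading the source code, we found that Hadoop simply does not support writing Chinese output in GBK encoding.

The following code is from TextOutputFormat.class. Hadoop's default output formats all extend FileOutputFormat, which has two subclasses: one writes binary streams, and the other, TextOutputFormat, writes text.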

public class TextOutputFormat<K, V> extends FileOutputFormat<K, V> {
  protected static class LineRecordWriter<K, V>
      implements RecordWriter<K, V> {
    private static final String utf8 = "UTF-8"; // hard-coded to UTF-8 here
    private static final byte[] newline;
    static {
      try {
        newline = "\n".getBytes(utf8);
      } catch (UnsupportedEncodingException uee) {
        throw new IllegalArgumentException("can't find " + utf8 + " encoding");
      }
    }