A summary of common exceptions when writing and debugging MapReduce locally

YUANXIN-MOJI

Category: hadoop | Tags: hadoop, mapreduce, big data

Article link: https://blog.csdn.net/weixin_39315826/article/details/108605815

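The programming logic of MapReduce is clear. Early on, though, most of my learning time went into the cycle of hitting an exception, searching for it on Baidu, and fixing it, so I collected the exceptions here, organized by how the job is run. A MapReduce job can be run in three ways: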

Package a jar, upload it to the cluster, and run it there:

hadoop jar **/**/**.jar <fully qualified main class> <argument 1: input directory> <argument 2: output directory>

**Note:** the output directory given as argument 2 must generally not already exist.

P.S. This requirement comes from the following driver code:

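```
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Driver program; the /****/ marker shows where the directory requirement comes from.
// WordCountMapper and WordCountReducer are the job's own Mapper/Reducer classes.
public class Driver {
	public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
		Configuration conf = new Configuration();
		Job job = Job.getInstance(conf);
		job.setJarByClass(Driver.class);
		job.setMapperClass(WordCountMapper.class);
		job.setReducerClass(WordCountReducer.class);
		job.setMapOutputKeyClass(Text.class);
		job.setMapOutputValueClass(IntWritable.class);
		job.setOutputKeyClass(Text.class);
		job.setOutputValueClass(IntWritable.class);

		FileInputFormat.addInputPath(job, new Path(args[0]));
		/**************************************************/
		// args[1] must not be an existing directory: the check runs when the job
		// is submitted and throws FileAlreadyExistsException otherwise.
		FileOutputFormat.setOutputPath(job, new Path(args[1]));

		System.exit(job.waitForCompletion(true) ? 0 : 1);
	}
}
```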

To avoid this error when the output directory already exists:

Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://bd/scoreoutput already exists

modify the driver as follows:

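```
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Driver program; the code below the /****/ marker is the modification.
public class Driver {
	public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
		Configuration conf = new Configuration();
		Job job = Job.getInstance(conf);
		job.setJarByClass(Driver.class);
		job.setMapperClass(WordCountMapper.class);
		job.setReducerClass(WordCountReducer.class);
		job.setMapOutputKeyClass(Text.class);
		job.setMapOutputValueClass(IntWritable.class);
		job.setOutputKeyClass(Text.class);
		job.setOutputValueClass(IntWritable.class);

		FileInputFormat.addInputPath(job, new Path(args[0]));
		/**************************************************/
		Path out = new Path(args[1]);
		// Resolve the FileSystem from the path itself, so hdfs:// and file:/// both work.
		FileSystem fs = out.getFileSystem(conf);
		if (fs.exists(out)) {
			// Careful: a recursive delete permanently removes the previous results.
			fs.delete(out, true);
		}
		FileOutputFormat.setOutputPath(job, out);
		System.exit(job.waitForCompletion(true) ? 0 : 1);
	}
}
```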
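Deleting the old output directory is convenient, but it silently throws away the previous run's results. A minimal alternative sketch, assuming you would rather keep every run: give each run its own output directory by appending a timestamp to `args[1]`. The names `OutputDirs` and `timestamped` are hypothetical, and the `yyyyMMdd-HHmmss` suffix is an arbitrary choice.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: builds a per-run output directory name so that
// FileOutputFormat never sees an existing directory
// (assumes at most one run per second with the same base path).
final class OutputDirs {
    private static final DateTimeFormatter SUFFIX =
            DateTimeFormatter.ofPattern("yyyyMMdd-HHmmss");

    private OutputDirs() {}

    // Appends "-<timestamp>" to the base output path, dropping one trailing '/'.
    static String timestamped(String base, LocalDateTime when) {
        String trimmed = (base.length() > 1 && base.endsWith("/"))
                ? base.substring(0, base.length() - 1)
                : base;
        return trimmed + "-" + when.format(SUFFIX);
    }
}
```

In the driver this would replace the delete step, e.g. `FileOutputFormat.setOutputPath(job, new Path(OutputDirs.timestamped(args[1], LocalDateTime.now())));`. The trade-off is that old output directories accumulate and have to be cleaned up by hand.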

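Run locally while connected to the cluster: the files live on the cluster, but the code runs on the local machine. This mode caused me quite a lot of trouble while I was learning, so I summarized the issues below.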

(1) With an HA cluster, how to reach the cluster's HDFS from local code and process files there

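core-site.xml and hdfs-site.xml must be placed in the src or resources folder (so that they end up on the classpath); otherwise the following exception occurs:

Exception in thread "main" java.lang.IllegalArgumentException: java.net.UnknownHostException: bd

Without hdfs-site.xml the client does not know that bd is the logical HA nameservice, so it tries to resolve bd as an ordinary host name.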

The driver code then starts like this:
```
System.setProperty("HADOOP_USER_NAME", "hadoop"); // user name on the cluster
Configuration conf = new Configuration();
conf.set("fs.defaultFS", "hdfs://bd/");          // bd is the HA nameservice ID
```
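If copying the two XML files into the project is inconvenient, the same HA client settings can in principle be set in code. The sketch below only collects the standard HDFS HA client property names into a map; the NameNode IDs (`nn1`, `nn2`), host names and ports are hypothetical placeholders to take from your cluster's hdfs-site.xml, and `HaClientConf` is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: the HDFS HA client properties that hdfs-site.xml would
// otherwise provide. Without them the client treats the nameservice ID
// (e.g. "bd") as a host name and fails with UnknownHostException.
final class HaClientConf {
    private HaClientConf() {}

    /**
     * @param nameservice  logical nameservice ID, e.g. "bd"
     * @param rpcAddresses NameNode ID -> "host:port" (placeholders; copy them from hdfs-site.xml)
     */
    static Map<String, String> properties(String nameservice, Map<String, String> rpcAddresses) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("fs.defaultFS", "hdfs://" + nameservice);
        props.put("dfs.nameservices", nameservice);
        props.put("dfs.ha.namenodes." + nameservice, String.join(",", rpcAddresses.keySet()));
        for (Map.Entry<String, String> nn : rpcAddresses.entrySet()) {
            props.put("dfs.namenode.rpc-address." + nameservice + "." + nn.getKey(), nn.getValue());
        }
        // Client-side class that fails over between the configured NameNodes.
        props.put("dfs.client.failover.proxy.provider." + nameservice,
                "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");
        return props;
    }
}
```

Usage sketch: build the map with the real NameNode IDs and addresses (in insertion order, e.g. a `LinkedHashMap`), then copy it into the configuration with `HaClientConf.properties("bd", nameNodes).forEach(conf::set);` before calling `Job.getInstance(conf)`. Keeping the XML files on the classpath remains the simpler route when they are available.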
(2) To run locally on local files, the situation is the reverse of (1): core-site.xml and hdfs-site.xml must not be present. Otherwise these exceptions occur:

java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.

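Exception in thread "main" java.lang.IllegalArgumentException: Pathname /E:/BaiduYunDownload/Cache/data from E:/BaiduYunDownload/Cache/data is not a valid DFS filename.

The second error appears because fs.defaultFS from core-site.xml still points at HDFS, so the Windows path is treated as an HDFS path.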
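Since (1) and (2) both hinge on whether core-site.xml and hdfs-site.xml are on the classpath, a quick check at the start of `main` can save a round of guessing. A minimal sketch using only the JDK; `ConfigFiles` is a hypothetical name.

```java
import java.net.URL;

// Hypothetical diagnostic: does the class loader see the Hadoop client config files?
// Hadoop's Configuration loads its default resources from the classpath,
// so what this finds is what the job will pick up.
final class ConfigFiles {
    private ConfigFiles() {}

    // Context class loader first, else this class's own loader.
    static URL locate(String resourceName) {
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        if (cl == null) {
            cl = ConfigFiles.class.getClassLoader();
        }
        return cl.getResource(resourceName);
    }

    static boolean onClasspath(String resourceName) {
        return locate(resourceName) != null;
    }

    // Prints where each file was found, or that it is missing.
    static void report() {
        for (String name : new String[] {"core-site.xml", "hdfs-site.xml"}) {
            URL url = locate(name);
            System.out.println(name + ": " + (url != null ? url : "not on classpath"));
        }
    }
}
```

Calling `ConfigFiles.report()` first thing in `main` shows which of the two situations you are in. If you want to keep the files in the project but still run a purely local job, overriding `fs.defaultFS` with `file:///` and `mapreduce.framework.name` with `local` on the `Configuration` is another option, though it is an assumption to verify in your own setup.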

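(3) When running locally on local files, the input path must be written out in full, all the way down to the file (unlike when connected to the cluster); otherwise the following exception occurs:

Exception in thread "main" java.lang.RuntimeException: Error while running command to get file permissions : java.io.IOException: (null) entry in command string: null ls -F E:\BaiduYunDownload\Cache\flow.log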

The offending code:
```
FileInputFormat.addInputPath(job, new Path("E:\\BaiduYunDownload\\Cache"));
// Wrong: the path stops at the directory instead of naming the file
```
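To catch this before the job starts, the driver can check that the local input path names a regular file. A minimal sketch with `java.nio.file`; `LocalInput` and `requireFile` are hypothetical names, and the rule it enforces is the one observed above for local runs, not a general MapReduce requirement (when connected to the cluster a directory works).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical pre-flight check for local runs: fail fast with a clear message
// when the input path is missing or is a directory instead of a file.
final class LocalInput {
    private LocalInput() {}

    static Path requireFile(String input) throws IOException {
        Path p = Paths.get(input);
        if (!Files.exists(p)) {
            throw new IOException("Input does not exist: " + p);
        }
        if (!Files.isRegularFile(p)) {
            throw new IOException("Input is not a file (name the file, not its folder): " + p);
        }
        return p;
    }
}
```

Because the driver already imports `org.apache.hadoop.fs.Path`, pass the result on as a string, e.g. `FileInputFormat.addInputPath(job, new Path(LocalInput.requireFile(args[0]).toString()));`.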
(4) Exceptions like "Permission denied" are permission problems. Fix: set the environment variable HADOOP_USER_NAME=<cluster user name>, or put this in the code:
```
System.setProperty("HADOOP_USER_NAME", "hadoop");
```
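In Hadoop 2.x's simple (non-Kerberos) authentication, `UserGroupInformation` looks for `HADOOP_USER_NAME` first in the environment and then in the system properties, and it caches the login user once determined (worth double-checking for your version). So `System.setProperty` only helps if it runs before the first `FileSystem` or `Job` call, and an environment variable already set on the machine wins. A minimal sketch; `HadoopUser` and `ensure` are hypothetical names and `"hadoop"` stands for the cluster user.

```java
// Hypothetical helper: set HADOOP_USER_NAME as a system property unless the
// environment already defines it (the environment variable wins in simple auth mode).
// Call it before the first FileSystem/Job call, because Hadoop caches the login user.
final class HadoopUser {
    static final String KEY = "HADOOP_USER_NAME";

    private HadoopUser() {}

    // Returns the user name Hadoop should pick up after this call.
    static String ensure(String user) {
        String fromEnv = System.getenv(KEY);
        if (fromEnv != null) {
            return fromEnv; // environment takes precedence; the property would be ignored
        }
        System.setProperty(KEY, user);
        return user;
    }
}
```

Call `HadoopUser.ensure("hadoop")` as the very first line of `main`, in place of the bare `System.setProperty`.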
(5) Running a Hadoop 2.7.3 MapReduce program in Eclipse fails with the error (null) entry in command string: null chmod 0700. See: https://blog.csdn.net/qq_33252988/article/details/81611300
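The `null\bin\winutils.exe` message in (2) and the `(null) entry in command string` errors in (3) and (5) most likely share one cause on Windows: Hadoop does not know its home directory. In Hadoop 2.x the `Shell` class reads the `hadoop.home.dir` system property, falls back to the `HADOOP_HOME` environment variable, and expects `bin\winutils.exe` under it. A minimal JDK-only pre-flight check mirroring that lookup; `WinutilsCheck` is a hypothetical name, and the lookup order is stated for Hadoop 2.x, so verify it against your version.

```java
import java.io.File;

// Hypothetical pre-flight check mirroring Hadoop 2.x's lookup:
// the -Dhadoop.home.dir system property first, then the HADOOP_HOME environment variable.
final class WinutilsCheck {
    private WinutilsCheck() {}

    static String hadoopHome() {
        String home = System.getProperty("hadoop.home.dir");
        return home != null ? home : System.getenv("HADOOP_HOME");
    }

    // Returns null if winutils.exe is where Hadoop expects it, otherwise a description of the problem.
    static String problem() {
        String home = hadoopHome();
        if (home == null) {
            return "Neither -Dhadoop.home.dir nor HADOOP_HOME is set";
        }
        File exe = new File(new File(home, "bin"), "winutils.exe");
        return exe.isFile() ? null : "Missing " + exe.getAbsolutePath();
    }
}
```

On Windows, a non-null result from `WinutilsCheck.problem()` at the start of `main` points to the same root cause as these errors. The check is meaningless on Linux or macOS, where Hadoop does not use winutils.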