Hadoop 2.6.3 Study Notes, Part 3: Setting Up a Development Environment on Win7 + MyEclipse 2014 and Running WordCount

Latest recommended article published on 2024-09-18 17:34:40

夏V风


Category: Big Data. Tags: hadoop

Article link: https://blog.csdn.net/yuanfen99xia/article/details/50628097


This part sets up a Hadoop development environment on my own development machine.

Operating system: Win7 64-bit

Development tool: MyEclipse 2014

Hadoop version: 2.6.3

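1. First, download the Hadoop plugin for MyEclipse. Many online tutorials explain how to build it locally from the Hadoop source, and you can try that yourself; I simply used a prebuilt jar, which is quick and easy. Download: http://download.csdn.net/detail/yuanfen99xia/9426342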

2. Put the downloaded hadoop-eclipse-plugin-2.6.0.jar into \dropins\plugins under the MyEclipse installation directory (note: the plugin folder may differ between MyEclipse versions), then start MyEclipse.

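3. Open Window -> Preferences from the MyEclipse menu bar. The tree on the left now has a new Hadoop Map/Reduce entry. Extract the hadoop-2.6.3.tar.gz you downloaded earlier to a local directory, then open Hadoop Map/Reduce and select the directory you just extracted.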

4. From the MyEclipse menu bar, choose Window -> Show View -> Other and select Map/Reduce Locations.

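5. Right-click in the view and choose "New Hadoop location", then fill in the parameters in the dialog. Because I have not configured a hosts entry on my local machine, I replaced master from the original configuration files with the IP address.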

1) Map/Reduce(V2) Master
Corresponds to the mapreduce.jobtracker.http.address parameter in mapred-site.xml, as follows:
<property>
<name>mapreduce.jobtracker.http.address</name>
<value>master:50030</value>
</property>
2) DFS Master
Corresponds to this setting in core-site.xml:
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
3) User name
If hdfs-site.xml contains:
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
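When the value is false as shown above, any user name will do; otherwise it must be the Linux account used to set up the Hadoop platform on Linux. At this point a "DFS Locations" entry appears on the left side of Eclipse.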

4) Temporary directory
Next, click "Advanced parameters", find "hadoop.tmp.dir", and change it to the path configured on the Hadoop cluster. On our cluster it is "/opt/hadoop-2.6.3/tmp", which is set in "core-site.xml".
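The DFS Master fields in the dialog are a host and a port, while core-site.xml stores them as a single URI. As a minimal sketch (the class name DfsMasterFields is made up for illustration and is not part of Hadoop or the plugin), plain java.net.URI can split an fs.defaultFS value into the two parts you type in:

```java
import java.net.URI;

// Hypothetical helper for illustration only; not part of Hadoop or the Eclipse plugin.
final class DfsMasterFields {

    /** Returns {host, port} extracted from an fs.defaultFS-style URI such as hdfs://master:9000. */
    static String[] hostAndPort(String defaultFs) {
        URI uri = URI.create(defaultFs);
        if (uri.getHost() == null || uri.getPort() == -1) {
            throw new IllegalArgumentException("Expected scheme://host:port, got: " + defaultFs);
        }
        return new String[] { uri.getHost(), Integer.toString(uri.getPort()) };
    }

    public static void main(String[] args) {
        // Value taken from the core-site.xml snippet in this article.
        String[] fields = hostAndPort("hdfs://master:9000");
        System.out.println("Host: " + fields[0] + ", Port: " + fields[1]);
    }
}
```

If the local machine has no hosts entry for master, the host part is what gets replaced by the cluster's IP address; the port stays the same.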
6. Browse the HDFS file system, then try creating folders and uploading a file.

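Click "myHadoop" under "DFS Locations" on the left side of Eclipse to display the file structure on HDFS. Right-click a folder, choose "Create new directory", and create the folder "/user/input/wordCount". Then upload a local file into it for testing, and create an "/output" folder to hold the MapReduce output directories.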

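7. Choose File -> New -> Map/Reduce Project, click Next, enter a project name, and click Finish. Note that the new project automatically imports every jar from the Hadoop installation package, which leaves many duplicates. To avoid problems, I recommend copying the jars out of the installation package into one folder, removing the duplicates, and then importing them into the project manually.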

8. Create a class in the project: WordCount.java.

package hdWordCount;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount  {
	public static class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

		// Each map() call receives one line of the split: key is the line's offset within the file, value is the line's text.
		public void map(LongWritable key, Text value, Context context)
				throws IOException, InterruptedException {
			StringTokenizer itr = new StringTokenizer(value.toString());
			while (itr.hasMoreTokens()) {
				context.write(new Text(itr.nextToken()), new IntWritable(1));
			}
		}
	}

	public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
		public void reduce(Text key, Iterable<IntWritable> values,
				Context context) throws IOException, InterruptedException {
			int sum = 0;
			for (IntWritable val : values) {
				sum += val.get();
			}
			context.write(key, new IntWritable(sum));
		}
	}

	public static void main(String[] args) throws Exception {
		Configuration conf = new Configuration();
		conf.set("mapreduce.jobhistory.address", "master:10020");
		// Parse generic options (-D, -fs, ...) before creating the Job, because the Job takes a copy of conf.
		String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
		if (otherArgs.length < 2) {
			System.err.println("Usage: wordcount <in> [<in>...] <out>");
			System.exit(2);
		}
		// Job.getInstance replaces the deprecated new Job(conf) constructor.
		Job job = Job.getInstance(conf, "word count");
		job.setJarByClass(WordCount.class);
		job.setMapperClass(TokenizerMapper.class);
		job.setCombinerClass(IntSumReducer.class);
		job.setReducerClass(IntSumReducer.class);
		job.setOutputKeyClass(Text.class);
		job.setOutputValueClass(IntWritable.class);
		// Every argument except the last is an input path; the last one is the output path.
		for (int i = 0; i < otherArgs.length - 1; ++i) {
			FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
		}
		FileOutputFormat.setOutputPath(job, new Path(otherArgs[otherArgs.length - 1]));
		System.exit(job.waitForCompletion(true) ? 0 : 1);
	}
}
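
To see what the mapper and reducer above compute without a cluster, the same logic can be rehearsed in plain Java. This is only a sketch under that assumption: LocalWordCount is a made-up class that uses no Hadoop types, and it runs everything in one process instead of distributing the work.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for the Hadoop job; illustration only.
final class LocalWordCount {

    /** "Map" step: split one line on whitespace and emit a (word, 1) pair per token. */
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            pairs.add(new AbstractMap.SimpleEntry<>(itr.nextToken(), 1));
        }
        return pairs;
    }

    /** "Shuffle and reduce" step: group the pairs by word and sum the counts. */
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> totals = new TreeMap<>(); // TreeMap keeps the words sorted
        for (Map.Entry<String, Integer> pair : pairs) {
            totals.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return totals;
    }

    /** Runs the map step on every line, then reduces all emitted pairs. */
    static Map<String, Integer> count(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        return reduce(pairs);
    }

    public static void main(String[] args) {
        System.out.println(count(Arrays.asList("hello hadoop", "hello world")));
        // prints {hadoop=1, hello=2, world=1}
    }
}
```

Because summing is associative, the same reducer class can also serve as the combiner, which is why WordCount passes IntSumReducer to both setCombinerClass and setReducerClass.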

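Once the code is written, right-click the class and choose "Run As -> Run Configurations", then add the paths the program needs as program arguments: one input directory and one output directory. The last component of the output path must not exist yet, because it is created at run time; if it already exists, delete it first, otherwise the job fails with an error.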
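
The program arguments follow the convention used in main(): every argument except the last is an input path, and the last is the output path. A small sketch of that split in plain Java follows; JobArgs is a hypothetical class, and the output name "wc1" is an assumed example of a directory that does not exist yet under /output.

```java
import java.util.Arrays;

// Hypothetical helper that mirrors how WordCount.main() interprets its arguments.
final class JobArgs {
    final String[] inputs; // one or more input paths
    final String output;   // output path; its last component must not exist before the run

    JobArgs(String[] args) {
        if (args.length < 2) {
            throw new IllegalArgumentException("Usage: wordcount <in> [<in>...] <out>");
        }
        inputs = Arrays.copyOfRange(args, 0, args.length - 1);
        output = args[args.length - 1];
    }

    public static void main(String[] args) {
        // Example argument line for Run Configurations; "wc1" is an assumed name.
        JobArgs parsed = new JobArgs(new String[] {
            "hdfs://master:9000/user/input/wordCount",
            "hdfs://master:9000/output/wc1"
        });
        System.out.println("inputs=" + Arrays.toString(parsed.inputs) + " output=" + parsed.output);
    }
}
```

Giving each run a fresh output name under /output avoids having to delete the previous result before rerunning.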