首先说一下在eclipse下搭建Hadoop开发环境
准备工具:eclipse ,hadoop插件,注意版本要一致
eclipse可以去官网下载
插件我上传了一个,hadoop-eclipse-plugin-2.7.2.jar将插件cp到eclipse/plugins下面
- Window -> Open Perspective -> Other 选择Map/Reduce
在eclipse下端,控制台旁边会多一个Tab,叫“Map/Reduce Locations”,在下面空白的地方点右键,选择“New Hadoop location…”在弹出的对话框中填写如下内容:
Location name(取个名字)
Map/Reduce Master(Job Tracker的IP和端口,根据mapred-site.xml中配置的mapred.job.tracker来填写)
DFS Master(Name Node的IP和端口,根据core-site.xml中配置的fs.default.name来填写)创建MapReduce工程
5.1配置Hadoop路径
Window -> Preferences 选择 “Hadoop Map/Reduce”,点击“Browse…”选择Hadoop文件夹的路径。
这个步骤与运行环境无关,只是在新建工程的时候能将hadoop根目录和lib目录下的所有jar包自动导入。
5.2创建工程File -> New -> Project 选择“Map/Reduce Project”,然后输入项目名称,创建项目。插件会自动把hadoop根目录和lib目录下的所有jar包导入。
5.3创建Mapper或者ReducerFile -> New -> Mapper 创建Mapper,自动继承mapred包里面的MapReduceBase并实现Mapper接口。
注意:这个插件自动继承的是mapred包里旧版的类和接口,新版的Mapper得自己写。Reducer同理。
下面是wordcount源码:
import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.mapreduce.Job;
import sun.util.locale.StringTokenIterator;
public class WordCount {
public static class MapClass extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable>{
private final static IntWritable one=new IntWritable(1);
private Text word=new Text();
public void map(LongWritable key,Text value,OutputCollector<Text, IntWritable> output,Reporter reporter)throws IOException{
String line=value.toString();
StringTokenizer stringTokenizer=new StringTokenizer(line);
while(stringTokenizer.hasMoreTokens()){
word.set(stringTokenizer.nextToken());
output.collect(word, one);
}
}
}
public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable>{
public void reduce(Text key,Iterator<IntWritable> value,OutputCollector<Text, IntWritable> output,Reporter reporter)throws IOException{
int sum=0;
while(value.hasNext()){
sum+=value.next().get();
}
output.collect(key, new IntWritable(sum));
}
}
public static void main(String[] args) throws Exception{
// TODO Auto-generated method stub
Configuration conf=new Configuration();
JobConf job=new JobConf(conf,WordCount.class);
FileInputFormat.setInputPaths(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
job.setJobName("wordcount");
job.setMapperClass(MapClass.class);
job.setReducerClass(Reduce.class);
job.setInputFormat(TextInputFormat.class);
job.setOutputFormat(TextOutputFormat.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
JobClient.runJob(job);
}
}