1. Hadoop package comparison, as shown in the table below:

In Hadoop 1.x the classes generally live under `*.mapreduce.*`; in Hadoop 0.20.x they generally live under `*.mapred.*`.

| Old API (0.20.x, `*.mapred.*`) | New API (1.x, `*.mapreduce.*`) |
| --- | --- |
| hadoop.mapred.JobClient; hadoop.mapred.JobConf | hadoop.mapreduce.Job |
| hadoop.mapred.MapReduceBase; hadoop.mapred.Mapper | hadoop.mapreduce.Mapper |
| hadoop.mapred.MapReduceBase; hadoop.mapred.Reducer | hadoop.mapreduce.Reducer |
| hadoop.mapred.FileInputFormat | hadoop.mapreduce.lib.input.FileInputFormat |
| hadoop.mapred.TextInputFormat | hadoop.mapreduce.lib.input.TextInputFormat |
| hadoop.mapred.FileOutputFormat | hadoop.mapreduce.lib.output.FileOutputFormat |
| hadoop.mapred.TextOutputFormat | hadoop.mapreduce.lib.output.TextOutputFormat |
| hadoop.mapred.OutputCollector; hadoop.mapred.Reporter | (replaced by the `Context` objects) |
2. The Mapper class. New API: `extends Mapper`; old API: `extends MapReduceBase implements Mapper`.

```java
static class myMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, LongWritable> {
    @Override
    public void map(LongWritable key, Text value,
                    OutputCollector<Text, LongWritable> output, Reporter reporter)
            throws IOException {
        String[] lineItems = value.toString().split("\t");
        for (String item : lineItems) {
            output.collect(new Text(item), new LongWritable(1));
        }
    }
}
```
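For comparison, here is a sketch of the same word-count mapper written against the new API (class name `MyMapper` is illustrative; note that output goes through the `Context` object rather than `OutputCollector`/`Reporter`):

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// New-API mapper: extend Mapper directly, no MapReduceBase needed.
public class MyMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Same tab-split word count as the old-API version,
        // but results are emitted through context.write().
        for (String item : value.toString().split("\t")) {
            context.write(new Text(item), new LongWritable(1));
        }
    }
}
```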
3. The Reducer class. New API: `extends Reducer`; old API: `extends MapReduceBase implements Reducer`.

```java
static class myReducer extends MapReduceBase
        implements Reducer<Text, LongWritable, Text, LongWritable> {
    @Override
    public void reduce(Text key, Iterator<LongWritable> values,
                       OutputCollector<Text, LongWritable> output, Reporter reporter)
            throws IOException {
        long count = 0L;
        while (values.hasNext()) {
            final long temp = values.next().get();
            count += temp;
        }
        output.collect(key, new LongWritable(count));
    }
}
```
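The new-API reducer can be sketched the same way (class name `MyReducer` is illustrative). A notable difference: the new API passes the values as an `Iterable` rather than an `Iterator`, so a for-each loop can be used:

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// New-API reducer: extend Reducer directly, no MapReduceBase needed.
public class MyReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context)
            throws IOException, InterruptedException {
        long count = 0L;
        for (LongWritable value : values) {  // Iterable, not Iterator as in the old API
            count += value.get();
        }
        context.write(key, new LongWritable(count));
    }
}
```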
4. Driver code:

| 1.x uses `Job` | 0.20.x uses `JobConf` |
| --- | --- |
| `Job job = new Job(conf);` | `JobConf jobConf = new JobConf(conf);` |
| 1.x submits the job with `job.waitForCompletion(true)` | 0.20.x uses `JobClient.runJob(jobConf)` |
| `job.waitForCompletion(true);` | `JobClient.runJob(jobConf);` |
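Putting the pieces together, a minimal new-API driver might look like the sketch below. The class names `MyMapper` and `MyReducer` are placeholders for your own mapper and reducer classes, and the input/output paths are taken from the command line:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "word count");   // 1.x style constructor
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(MyMapper.class);      // placeholder: your mapper class
        job.setReducerClass(MyReducer.class);    // placeholder: your reducer class
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // New-API submission: waitForCompletion replaces JobClient.runJob.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```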