1. Preface
This post observes how Hadoop runs by walking through a simple program.
2. Setting up a Hadoop development environment on Windows
Environment:
hadoop 1.2.1
Eclipse Mars Release (4.5.0)
hadoop-eclipse-plugin-1.2.1 (widely available online, so it is not provided again here)
Copy hadoop-eclipse-plugin-1.2.1 into Eclipse's dropins directory, then start Eclipse.
In the Map/Reduce Location dialog, fill in the fields according to your own cluster.
Advanced parameters: a few values here need to be changed.
When the DFS Locations view shows your HDFS contents, the setup succeeded.
Troubleshooting: if the connection just hangs at "listening...", check that the plugin version matches your Hadoop version.
If the plugin did not install successfully, your Eclipse may be too old; download the latest version from the official site.
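For reference, the values entered in the plugin's Map/Reduce Location dialog mirror the cluster's own configuration files. A sketch follows; the host name and ports are assumptions on my part (hai:9000 and hai:9001 are common Hadoop 1.x conventions, not values given in the original):

```xml
<!-- core-site.xml: corresponds to the "DFS Master" field (values assumed) -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://hai:9000</value>
</property>

<!-- mapred-site.xml: corresponds to the "Map/Reduce Master" field (values assumed) -->
<property>
  <name>mapred.job.tracker</name>
  <value>hai:9001</value>
</property>
```

Whatever you enter in the dialog should match what the cluster itself was started with, otherwise the plugin cannot connect.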
3. Hadoop's execution process
Below is the classic "hello world" of the hadoop-examples programs: WordCount.
package hadoop.v3;

import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;

import org.hai.hdfs.utils.HDFSUtils;

/**
 * Hadoop's "hello world": word count
 * @author : chenhaipeng
 * @date : Sep 5, 2015, 8:38:13 PM
 */
public class WordCount {

    // Map: split each input line into whitespace-separated tokens and emit (word, 1)
    public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output,
                Reporter reporter) throws IOException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                output.collect(word, one);
            }
        }
    }

    // Reduce: sum the counts collected for each word
    public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values,
                OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    // Delete the output directory if it already exists; Hadoop refuses to overwrite it
    public static void deletedir(String path) {
        try {
            HDFSUtils.DeleteHDFSFile(path);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("mywordcount");

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(Map.class);
        conf.setReducerClass(Reduce.class);

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        deletedir(args[1]);
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}
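Once packaged into a jar, a job like this is typically launched from the command line; the jar name and HDFS paths below are hypothetical, not taken from the original (args[0] is the input path, args[1] the output path):

```
hadoop jar wordcount.jar hadoop.v3.WordCount /user/hai/input /user/hai/output
hadoop fs -cat /user/hai/output/part-00000
```

With the old mapred API, a single reducer writes its sorted output to part-00000 under the output directory.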
The input is:
Mary had a little lamb
its fleece very white as snow
and everywhere that Mary went
the lamb was sure to go
its fleece very white as snow
and everywhere that Mary went
the lamb was sure to go
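To make the expected result concrete, the same tokenize-and-sum logic can be sketched in plain Java with no Hadoop dependency. The class and method names here are mine, not part of the original program:

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Plain-Java sketch of WordCount's map/reduce logic: tokenize on whitespace
// (as StringTokenizer does in the mapper) and sum the occurrences per word.
public class WordCountSketch {
    public static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        StringTokenizer tokenizer = new StringTokenizer(text);
        while (tokenizer.hasMoreTokens()) {
            counts.merge(tokenizer.nextToken(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String input = String.join("\n",
            "Mary had a little lamb",
            "its fleece very white as snow",
            "and everywhere that Mary went",
            "the lamb was sure to go",
            "its fleece very white as snow",
            "and everywhere that Mary went",
            "the lamb was sure to go");
        // Print each (word, count) pair, sorted by word, similar to the
        // sorted key order a single reducer would produce.
        count(input).forEach((w, c) -> System.out.println(w + "\t" + c));
    }
}
```

On this input, "Mary" and "lamb" each appear three times and most other words twice.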
The output is as follows:
Visiting http://hai:50030/jobtracker.jsp, I found that this job was never run there at all, yet output was produced. What is going on?
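One likely cause, offered as an assumption rather than a confirmed diagnosis: in Hadoop 1.x, when the client-side mapred.job.tracker property is left at its default value "local", JobClient runs the job in-process through LocalJobRunner. The job still runs and produces output, but it never reaches the JobTracker, so nothing shows up on the page at port 50030. Submitting to the real cluster requires pointing the client at the JobTracker; the host and port below are assumed values:

```xml
<!-- client-side mapred-site.xml (host and port assumed) -->
<property>
  <name>mapred.job.tracker</name>
  <value>hai:9001</value>
</property>
```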
Reference: "Hadoop Self-Study Guide: a roundup of common Hadoop problems".