This article is original work by Mr_Willy; please credit the source when reposting.
1. Download Hadoop and extract it to $HOME/hadoop.
2. Add the environment variables:
sudo gedit /etc/profile
Append the following at the end of the file:
#set hadoop environment
export HADOOP_HOME=/home/willy/hadoop
export CLASSPATH=$JAVA_HOME/lib:$JRE_HOME/lib:${HADOOP_HOME}/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:${HADOOP_HOME}/bin:$PATH
After saving, reload the environment with: source /etc/profile
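As a quick sanity check, the exports above can be re-created in a shell to confirm that PATH picks up Hadoop's bin directory. This is a sketch using this guide's example home directory (/home/willy); adjust the path to your own installation.

```shell
# Sketch: the same exports as in /etc/profile, with this guide's example path.
export HADOOP_HOME=/home/willy/hadoop
export PATH=${HADOOP_HOME}/bin:$PATH

echo "$HADOOP_HOME"
# On a real setup, after `source /etc/profile`, `which hadoop` should
# resolve to $HADOOP_HOME/bin/hadoop.
```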
3. Make the Hadoop shell scripts executable:
chmod -R a+x $HADOOP_HOME/bin
4. At this point the hadoop command is available for all subsequent operations.
5. Format a new distributed filesystem:
$ hadoop namenode -format
On later restarts, run hadoop namenode instead of reformatting.
6. Start the Hadoop daemons:
$ start-all.sh
The daemons write their logs to the ${HADOOP_LOG_DIR} directory (default: ${HADOOP_HOME}/logs).
Browse the NameNode and JobTracker web interfaces, which by default are at:
- NameNode - http://localhost:50070/
- JobTracker - http://localhost:50030/
7. Run an example from Hadoop: The Definitive Guide
import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MaxTemperature {

    // Mapper: extracts (year, temperature) pairs from NCDC fixed-width records.
    static class MaxTempMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final int MISSING = 9999;

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            String year = line.substring(15, 19);
            int airTemperature;
            if (line.charAt(87) == '+') { // parseInt on older JDKs rejects a leading plus sign
                airTemperature = Integer.parseInt(line.substring(88, 92));
            } else {
                airTemperature = Integer.parseInt(line.substring(87, 92));
            }
            String quality = line.substring(92, 93);
            // Emit only readings that are present and pass the quality check
            if (airTemperature != MISSING && quality.matches("[01459]")) {
                context.write(new Text(year), new IntWritable(airTemperature));
            }
        }
    }

    // Reducer: picks the maximum temperature seen for each year.
    static class MaxTempReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int maxValue = Integer.MIN_VALUE;
            for (IntWritable value : values) {
                maxValue = Math.max(maxValue, value.get());
            }
            context.write(key, new IntWritable(maxValue));
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: MaxTemperature <input> <output>");
            System.exit(-1);
        }
        Job job = new Job();
        job.setJarByClass(MaxTemperature.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.setMapperClass(MaxTempMapper.class);
        job.setReducerClass(MaxTempReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
Package the class files into a JAR, making sure to set the main-class entry point in the manifest.
Sample weather data can be downloaded from ncdc.noaa.gov.
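To see what the mapper's fixed-width parsing does without running a cluster, here is a small self-contained sketch. The record it builds is made up and only populates the columns this example reads (year at columns 15-18, signed temperature at 87-91, quality code at 92); real NCDC records carry many more fields.

```java
import java.util.Arrays;

public class NcdcParseDemo {
    // Build a made-up record that only fills the columns MaxTemperature reads.
    static String sampleLine() {
        char[] rec = new char[93];
        Arrays.fill(rec, '0');
        "1901".getChars(0, 4, rec, 15); // year: columns 15-18
        rec[87] = '-';                  // temperature sign: column 87
        "0011".getChars(0, 4, rec, 88); // temperature digits: columns 88-91
        rec[92] = '1';                  // quality code: column 92
        return new String(rec);
    }

    static String parseYear(String line) {
        return line.substring(15, 19);
    }

    // Same branching as the mapper: skip a leading '+' because older
    // Integer.parseInt versions rejected it; a '-' parses fine as-is.
    static int parseTemp(String line) {
        return line.charAt(87) == '+'
                ? Integer.parseInt(line.substring(88, 92))
                : Integer.parseInt(line.substring(87, 92));
    }

    public static void main(String[] args) {
        String line = sampleLine();
        // Temperatures are in tenths of a degree Celsius, so -11 means -1.1 C.
        System.out.println(parseYear(line) + " -> " + parseTemp(line)); // 1901 -> -11
    }
}
```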
8. Copy the input file into the distributed filesystem:
hadoop fs -put ~/weather/1901 ./input
9. Run the program:
hadoop jar ~/MaxTemperature.jar ./input ./output
10. Inspect the output files:
hadoop fs -cat output/*
11. When all operations are finished, stop the daemons:
stop-all.sh