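1. Objective
Write a MapReduce job that finds the minimum temperature for each year.
Only the 1970 data is used here, so the job computes the minimum temperature for 1970.
2. Preparation
Open Eclipse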
cd /usr/eclipse/
./eclipse
-OK
Right-click - New - Java Project
Name it NOAA
-Finish
Right-click the NOAA project - New - Folder
Enter lib as the Folder name
-Finish
The lib folder holds the required jar files
Copy the jar files
Open a new terminal window
su
Enter the password
1. Import the Hadoop jars
Copy hadoop-hdfs-2.8.5.jar
cp /opt/modules/app/hadoop-2.8.5/share/hadoop/hdfs/hadoop-hdfs-2.8.5.jar /root/workspace/NOAA/lib/
Copy all jars under $HADOOP_HOME/share/hadoop/hdfs/lib/
cp /opt/modules/app/hadoop-2.8.5/share/hadoop/hdfs/lib/* /root/workspace/NOAA/lib/
Copy hadoop-common-2.8.5.jar
cp /opt/modules/app/hadoop-2.8.5/share/hadoop/common/hadoop-common-2.8.5.jar /root/workspace/NOAA/lib/
Copy all jars under $HADOOP_HOME/share/hadoop/common/lib
cp /opt/modules/app/hadoop-2.8.5/share/hadoop/common/lib/* /root/workspace/NOAA/lib/
Some jars appear in both lib directories; if cp prompts about overwriting, just press Enter
2. Import the MapReduce jars
cp /opt/modules/app/hadoop-2.8.5/share/hadoop/mapreduce/hadoop-mapreduce-client-common-2.8.5.jar /root/workspace/NOAA/lib/
cp /opt/modules/app/hadoop-2.8.5/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.8.5.jar /root/workspace/NOAA/lib/
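Right-click the project - Refresh; the jars now appear under the lib folder
Right-click the project - Build Path - Configure Build Path
On the Libraries tab choose Add JARs... and open the project's lib directory
Press Ctrl+A to select all the jars - OK - OK
The referenced libraries now appear in the project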
3. Writing the Code
Pasted code can lose its formatting; reformat it after pasting
Eclipse format shortcut: Ctrl+Shift+F
3.1 MinTemperature.java
Right-click src under the NOAA project
-New-Class
Package: com.noaa
Name: MinTemperature
MinTemperature.java
package com.noaa;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MinTemperature {
    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: MinTemperature <input path> <output path>");
            System.exit(-1);
        }
        // Job.getInstance replaces the deprecated new Job() constructor
        Job job = Job.getInstance(new Configuration(), "Min temperature");
        job.setJarByClass(MinTemperature.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.setMapperClass(MinTemperatureMapper.class);
        job.setReducerClass(MinTemperatureReducer.class);
        // Map and reduce both emit (year, temperature) pairs
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
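Because the minimum of partial minima equals the overall minimum, MinTemperatureReducer could also be registered as a combiner with job.setCombinerClass(MinTemperatureReducer.class), which pre-aggregates map output before the shuffle. The driver above does not do this; it is an optional change. The plain-Java check below is only an illustration under that assumption: CombinerCheck is a hypothetical name, the split contents are made-up numbers, and nothing here runs on Hadoop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration, not part of the original job: taking the min of
// each split first (combiner) and then the min of those partial results
// (reducer) gives the same answer as one global min.
class CombinerCheck {
    // Same loop as MinTemperatureReducer.reduce()
    static int min(List<Integer> values) {
        int m = Integer.MAX_VALUE;
        for (int v : values) {
            m = Math.min(m, v);
        }
        return m;
    }

    static int minWithCombiner(List<List<Integer>> splits) {
        List<Integer> partial = new ArrayList<>();
        for (List<Integer> split : splits) {
            partial.add(min(split));   // combiner: one partial min per split
        }
        return min(partial);           // reducer: min of the partial mins
    }

    public static void main(String[] args) {
        // Made-up temperatures (tenths of a degree Celsius) spread over two splits
        List<List<Integer>> splits = Arrays.asList(
                Arrays.asList(12, -30, 5),
                Arrays.asList(-7, 40));
        List<Integer> all = new ArrayList<>();
        for (List<Integer> s : splits) {
            all.addAll(s);
        }
        System.out.println(min(all) + " " + minWithCombiner(splits)); // -30 -30
    }
}
```

The same argument would not hold for an average, which is why this shortcut is specific to functions like min and max.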
3.2 MinTemperatureMapper.java
Right-click src under the NOAA project
-New-Class
Package: com.noaa
Name: MinTemperatureMapper
MinTemperatureMapper.java
package com.noaa;

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MinTemperatureMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    // NOAA encodes a missing temperature reading as +9999
    private static final int MISSING = 9999;

    @Override
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String line = value.toString();
        // Fixed-width record: the year occupies indices 15-18
        String year = line.substring(15, 19);
        int airTemperature;
        // Signed temperature in tenths of a degree Celsius; the sign is at index 87
        if (line.charAt(87) == '+') {
            airTemperature = Integer.parseInt(line.substring(88, 92));   // skip the leading '+'
        } else {
            airTemperature = Integer.parseInt(line.substring(87, 92));
        }
        // Quality code at index 92; keep only readings with an accepted code
        String quality = line.substring(92, 93);
        if (airTemperature != MISSING && quality.matches("[01459]")) {
            context.write(new Text(year), new IntWritable(airTemperature));
        }
    }
}
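The mapper relies on fixed character offsets in each NOAA record: the year at indices 15-18, a signed temperature in tenths of a degree Celsius at 87-91, and a quality code at 92. The same logic can be lifted into plain Java to test those offsets without a cluster. This is a minimal sketch; NoaaLineParser, Reading and syntheticLine are hypothetical names, and syntheticLine builds an artificial 93-character line (not real NOAA data) in which only those three fields are filled in.

```java
import java.util.Optional;

// Hypothetical test helper mirroring MinTemperatureMapper.map() in plain Java.
class NoaaLineParser {
    static final int MISSING = 9999; // NOAA code for a missing temperature

    // Year and temperature (tenths of a degree Celsius) extracted from one record
    static final class Reading {
        final String year;
        final int tenthsCelsius;

        Reading(String year, int tenthsCelsius) {
            this.year = year;
            this.tenthsCelsius = tenthsCelsius;
        }
    }

    // Same offsets and filters as the mapper; empty means "record skipped"
    static Optional<Reading> parse(String line) {
        String year = line.substring(15, 19);
        int temp = line.charAt(87) == '+'
                ? Integer.parseInt(line.substring(88, 92))
                : Integer.parseInt(line.substring(87, 92));
        String quality = line.substring(92, 93);
        if (temp != MISSING && quality.matches("[01459]")) {
            return Optional.of(new Reading(year, temp));
        }
        return Optional.empty();
    }

    // Hypothetical: builds a synthetic 93-character line, all '0' except the parsed fields
    static String syntheticLine(String year, String signedTemp, char quality) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 93; i++) {
            sb.append('0');
        }
        sb.replace(15, 19, year);        // 4 characters, e.g. "1970"
        sb.replace(87, 92, signedTemp);  // 5 characters, e.g. "-0056" or "+0123"
        sb.setCharAt(92, quality);
        return sb.toString();
    }

    public static void main(String[] args) {
        parse(syntheticLine("1970", "-0056", '1'))
                .ifPresent(r -> System.out.println(r.year + " " + r.tenthsCelsius)); // 1970 -56
    }
}
```

A line whose temperature field is +9999, or whose quality code is outside [01459], yields an empty Optional, which corresponds to the mapper emitting nothing for that record.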
3.3 MinTemperatureReducer.java
Right-click src under the NOAA project
-New-Class
Package: com.noaa
Name: MinTemperatureReducer
MinTemperatureReducer.java
package com.noaa;

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MinTemperatureReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        // Scan every temperature recorded for this year and keep the smallest
        int minValue = Integer.MAX_VALUE;
        for (IntWritable value : values) {
            minValue = Math.min(minValue, value.get());
        }
        context.write(key, new IntWritable(minValue));
    }
}
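Between the mapper and the reducer, Hadoop groups every (year, temperature) pair by year and hands each group to one reduce() call. The sketch below mimics that flow in memory on a single machine, with a TreeMap standing in for the sort-and-group shuffle. LocalMinTemperature and pair are hypothetical names and the input pairs are invented; it only illustrates the data flow and is not how Hadoop implements the shuffle.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-machine model of the job's data flow:
// map output -> shuffle (group by key) -> reduce.
class LocalMinTemperature {

    // Hypothetical helper: one (year, temperature) pair as the mapper would emit it
    static Map.Entry<String, Integer> pair(String year, int tenthsCelsius) {
        return new AbstractMap.SimpleEntry<>(year, tenthsCelsius);
    }

    // Shuffle: group values by key; TreeMap keeps keys sorted, as Hadoop sorts map output
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> mapOutput) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> e : mapOutput) {
            grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
        }
        return grouped;
    }

    // Reduce: the same loop as MinTemperatureReducer.reduce()
    static int reduce(Iterable<Integer> values) {
        int minValue = Integer.MAX_VALUE;
        for (int v : values) {
            minValue = Math.min(minValue, v);
        }
        return minValue;
    }

    static Map<String, Integer> run(List<Map.Entry<String, Integer>> mapOutput) {
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(mapOutput).entrySet()) {
            result.put(group.getKey(), reduce(group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> mapOutput = new ArrayList<>();
        mapOutput.add(pair("1970", 22));   // invented values
        mapOutput.add(pair("1970", -11));
        mapOutput.add(pair("1971", 5));
        System.out.println(run(mapOutput)); // {1970=-11, 1971=5}
    }
}
```

With only the 1970 data as input, the real job produces a single group and therefore a single output line.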
4. Packaging and Running the Jar
Right-click the NOAA project - Export - select JAR file - Next
Set the export destination to /tmp/noaa.jar
-Next-Next
Click Browse next to Main class, select MinTemperature - OK
-Finish
This exports noaa.jar to /tmp
Copy the NOAA 1970 - Part folder into the shared folder share
Open a new terminal window
Change into that folder
cd /mnt/hgfs/share/NOAA\ 1970\ -\ Part/
Use zcat to decompress the data files and concatenate them into a single file, temperature.txt
zcat *.gz > temperature.txt
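zcat decompresses each .gz file and writes the results one after another to standard output, which the shell redirects into temperature.txt. For reference, a rough Java equivalent using java.util.zip.GZIPInputStream is sketched below. ZcatMerge is a hypothetical name; input files are processed in name order, which approximates, but may not exactly match, the shell's glob order.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.zip.GZIPInputStream;

// Hypothetical Java counterpart of `zcat *.gz > temperature.txt`:
// decompress every *.gz file in dir and append its contents to out.
class ZcatMerge {
    static void merge(Path dir, Path out) throws IOException {
        List<Path> inputs = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.gz")) {
            for (Path p : stream) {
                inputs.add(p);
            }
        }
        Collections.sort(inputs);  // name order, roughly like the shell's glob expansion
        byte[] buffer = new byte[8192];
        try (OutputStream os = Files.newOutputStream(out)) {  // creates or truncates, like '>'
            for (Path p : inputs) {
                try (InputStream in = new GZIPInputStream(Files.newInputStream(p))) {
                    int n;
                    while ((n = in.read(buffer)) != -1) {
                        os.write(buffer, 0, n);
                    }
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ZcatMerge <directory with .gz files> <output file>
        merge(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```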
Start HDFS and YARN, which brings up the MapReduce runtime (start-all.sh starts both)
cd
start-all.sh
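Upload the file to the HDFS root directory; since cd has moved to the home directory, give the full path of temperature.txt
hdfs dfs -put /mnt/hgfs/share/NOAA\ 1970\ -\ Part/temperature.txt /
Run the jar with /temperature.txt as input and /min as the output directory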
hadoop jar /tmp/noaa.jar /temperature.txt /min
View the computed minimum temperature
hdfs dfs -cat /min/part-r-00000
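The minimum temperature for 1970 is -589; NOAA records temperatures in tenths of a degree Celsius, so this is -58.9 °C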
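Each line of part-r-00000 is written by Hadoop's default TextOutputFormat as the key and the value separated by a tab, and the value here is in tenths of a degree Celsius. A small hypothetical helper (ResultLine is not part of the job) that turns such a line into degrees Celsius:

```java
// Hypothetical helper: converts one output line "year<TAB>tenths" to degrees Celsius.
class ResultLine {
    static double toCelsius(String outputLine) {
        String[] fields = outputLine.split("\t");        // key and value are tab-separated
        int tenths = Integer.parseInt(fields[1].trim());
        return tenths / 10.0;                            // NOAA stores tenths of a degree
    }

    public static void main(String[] args) {
        System.out.println(toCelsius("1970\t-589")); // -58.9
    }
}
```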
You can also check the job result in a browser at:
http://bigdata-senior01.chybinmy.com:8088/cluster/apps/FINISHED
Stop all services
stop-all.sh
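Note: to rerun the job, first delete the output directory, because Hadoop refuses to start a job whose output directory already exists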
hdfs dfs -rm -r /min