【Reference】http://www.cnblogs.com/shishanyuan/archive/2014/12/22/4177908.html
【0.1】Raw data: sample.txt
0067011990999991950051507004+68750+023550FM-12+038299999V0203301N00671220001CN9999999N9+00001+99999999999
0043011990999991950051512004+68750+023550FM-12+038299999V0203201N00671220001CN9999999N9+00221+99999999999
0043011990999991950051518004+68750+023550FM-12+038299999V0203201N00261220001CN9999999N9-00111+99999999999
0043012650999991949032412004+62300+010750FM-12+048599999V0202701N00461220001CN0500001N9+01111+99999999999
0043012650999991949032418004+62300+010750FM-12+048599999V0202701N00461220001CN0500001N9+00781+99999999999
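Each record above is a fixed-width NCDC line: the year sits at character positions 15-19, the signed temperature (in tenths of a degree Celsius) at 87-92, and a one-character quality code at 92. To make that concrete, here is a minimal plain-Java sketch with no Hadoop dependencies that extracts the same fields the mapper below uses; the class and method names (NcdcLine, airTemperature, etc.) are illustrative only, not part of the job code.

```java
// Stand-alone parser for one NCDC record line, mirroring the substring
// offsets used by MaxTemperatureMapper. Names are illustrative only.
public class NcdcLine {

    // Year occupies character positions 15-18 (inclusive).
    public static String year(String line) {
        return line.substring(15, 19);
    }

    // Temperature in tenths of a degree Celsius: sign at position 87,
    // digits at 88-91. Integer.parseInt rejects a leading '+', so skip it.
    public static int airTemperature(String line) {
        if (line.charAt(87) == '+') {
            return Integer.parseInt(line.substring(88, 92));
        }
        return Integer.parseInt(line.substring(87, 92));
    }

    // Single quality-code character at position 92.
    public static String quality(String line) {
        return line.substring(92, 93);
    }

    public static void main(String[] args) {
        // Third record from sample.txt (the one with a negative reading).
        String line = "0043011990999991950051518004+68750+023550FM-12"
            + "+038299999V0203201N00261220001CN9999999N9-00111+99999999999";
        System.out.println(year(line) + " " + airTemperature(line)
            + " q=" + quality(line)); // prints: 1950 -11 q=1
    }
}
```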
【0.2】MaxTemperature.java
// cc MaxTemperature Application to find the maximum temperature in the weather dataset
// vv MaxTemperature
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class MaxTemperature {

  public static void main(String[] args) throws Exception {
    if (args.length != 2) {
      System.err.println("Usage: MaxTemperature <input path> <output path>");
      System.exit(-1);
    }

    Job job = new Job();
    job.setJarByClass(MaxTemperature.class);
    job.setJobName("Max temperature");

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    job.setMapperClass(MaxTemperatureMapper.class);
    job.setReducerClass(MaxTemperatureReducer.class);

    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
// ^^ MaxTemperature
【0.3】MaxTemperatureMapper.java
// cc MaxTemperatureMapper Mapper for maximum temperature example
// vv MaxTemperatureMapper
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
public class MaxTemperatureMapper
    extends Mapper<LongWritable, Text, Text, IntWritable> {

  private static final int MISSING = 9999;

  @Override
  public void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {

    String line = value.toString();
    String year = line.substring(15, 19);
    int airTemperature;
    if (line.charAt(87) == '+') { // parseInt doesn't like leading plus signs
      airTemperature = Integer.parseInt(line.substring(88, 92));
    } else {
      airTemperature = Integer.parseInt(line.substring(87, 92));
    }
    String quality = line.substring(92, 93);
    if (airTemperature != MISSING && quality.matches("[01459]")) {
      context.write(new Text(year), new IntWritable(airTemperature));
    }
  }
}
// ^^ MaxTemperatureMapper
【0.4】MaxTemperatureReducer.java
// cc MaxTemperatureReducer Reducer for maximum temperature example
// vv MaxTemperatureReducer
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
public class MaxTemperatureReducer
    extends Reducer<Text, IntWritable, Text, IntWritable> {

  @Override
  public void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {

    int maxValue = Integer.MIN_VALUE;
    for (IntWritable value : values) {
      maxValue = Math.max(maxValue, value.get());
    }
    context.write(key, new IntWritable(maxValue));
  }
}
// ^^ MaxTemperatureReducer
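The reducer simply takes the maximum over all values grouped under one key. Stripped of the Hadoop types, the same computation can be sketched in plain Java; the class name ReducePhaseSketch and its method names are made up for illustration and are not part of the job.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of what the shuffle + MaxTemperatureReducer compute:
// group (year, temperature) pairs by year, then take the max per group.
public class ReducePhaseSketch {

    public static Map<String, Integer> maxByYear(List<String[]> pairs) {
        // Group values by key, as the shuffle phase would.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String[] p : pairs) {
            grouped.computeIfAbsent(p[0], k -> new ArrayList<>())
                   .add(Integer.parseInt(p[1]));
        }
        // Reduce each group to its maximum, as the reducer does.
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int maxValue = Integer.MIN_VALUE;
            for (int v : e.getValue()) {
                maxValue = Math.max(maxValue, v);
            }
            result.put(e.getKey(), maxValue);
        }
        return result;
    }

    public static void main(String[] args) {
        // The five map outputs produced from sample.txt.
        List<String[]> pairs = List.of(
            new String[]{"1950", "0"}, new String[]{"1950", "22"},
            new String[]{"1950", "-11"}, new String[]{"1949", "111"},
            new String[]{"1949", "78"});
        System.out.println(maxByYear(pairs)); // prints: {1949=111, 1950=22}
    }
}
```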
【0.5】MaxTemperatureWithCombiner.java
// cc MaxTemperatureWithCombiner Application to find the maximum temperature, using a combiner function for efficiency
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
// vv MaxTemperatureWithCombiner
public class MaxTemperatureWithCombiner {

  public static void main(String[] args) throws Exception {
    if (args.length != 2) {
      System.err.println("Usage: MaxTemperatureWithCombiner <input path> " +
          "<output path>");
      System.exit(-1);
    }

    Job job = new Job();
    job.setJarByClass(MaxTemperatureWithCombiner.class);
    job.setJobName("Max temperature");

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    job.setMapperClass(MaxTemperatureMapper.class);
    job.setCombinerClass(MaxTemperatureReducer.class);
    job.setReducerClass(MaxTemperatureReducer.class);

    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
// ^^ MaxTemperatureWithCombiner
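Reusing the reducer as a combiner is only safe because max is commutative and associative: taking the maximum of per-map partial maxima gives the same answer as taking the maximum over all values at once. A quick plain-Java check of that property (CombinerCheck is an illustrative name, not part of the job):

```java
import java.util.Arrays;

// Demonstrates why MaxTemperatureReducer can double as a combiner:
// the max over partial maxima equals the max over the full value list.
public class CombinerCheck {

    public static int max(int[] values) {
        return Arrays.stream(values).max().getAsInt();
    }

    public static void main(String[] args) {
        int[] allValues = {0, 22, -11, 111, 78};  // all map outputs
        int[] split1 = {0, 22, -11};              // outputs of map task 1
        int[] split2 = {111, 78};                 // outputs of map task 2

        int direct = max(allValues);                             // reducer only
        int combined = max(new int[]{max(split1), max(split2)}); // combiner first

        System.out.println(direct + " " + combined); // prints: 111 111
    }
}
```

The same trick would not work for a non-associative aggregate such as the mean, since the mean of partial means is not the overall mean.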
With the raw data and code in place, the next steps begin.
【1】Create the input directory input
A22811459:/home/longhui/hadoop # hadoop dfs -mkdir input
A22811459:/home/longhui/hadoop # hadoop dfs -ls
Found 1 items
drwxr-xr-x - root supergroup 0 2016-12-15 16:09 /user/root/input
A22811459:/home/longhui/hadoop # hadoop dfs -ls /
Found 2 items
drwxr-xr-x - root supergroup 0 2016-12-15 13:00 /tmp
drwxr-xr-x - root supergroup 0 2016-12-15 14:06 /user
A22811459:/home/longhui/hadoop # hadoop dfs -ls /user
Found 1 items
drwxr-xr-x - root supergroup 0 2016-12-15 16:09 /user/root
A22811459:/home/longhui/hadoop # hadoop dfs -ls /user/root
Found 1 items
drwxr-xr-x - root supergroup 0 2016-12-15 16:09 /user/root/input
【2 (this step can be skipped)】Create the output directory output
A22811459:/home/longhui/hadoop # hadoop dfs -mkdir /user/root/output
A22811459:/home/longhui/hadoop # hadoop dfs -ls
Found 2 items
drwxr-xr-x - root supergroup 0 2016-12-15 16:09 /user/root/input
drwxr-xr-x - root supergroup 0 2016-12-15 16:16 /user/root/output
Delete the directory:
A22811459:/home/longhui/hadoop/hadoop-1.2.1 # hadoop dfs -rmr /user/root/output
Deleted hdfs://A22811459:9000/user/root/output
A22811459:/home/longhui/hadoop/hadoop-1.2.1 # hadoop dfs -ls
Found 1 items
drwxr-xr-x - root supergroup 0 2016-12-15 16:18 /user/root/input
The output directory is generated automatically by the job and must not already exist, so it is deleted here; that is why step 2 can be skipped.
【3】Copy the weather data into the input directory on HDFS
A22811459:/home/longhui/hadoop/codes/1maxTemperature # hadoop dfs -copyFromLocal sample.txt /user/root/input
A22811459:/home/longhui/hadoop/codes/1maxTemperature # hadoop dfs -ls /user/root/input/
Found 1 items
-rw-r--r-- 1 root supergroup 529 2016-12-15 16:18 /user/root/input/sample.txt
【4】Compile the Java files into class files, then package them into a jar
The file max_temperature.sh contains the following four lines:
CLASSPATH=/home/longhui/hadoop/hadoop-1.2.1/hadoop-core-1.2.1.jar
rm -f *.class
javac -classpath $CLASSPATH *.java
jar cvf MaxTemperature.jar *.class
Run the script to produce the jar file:
A22811459:/home/longhui/hadoop/codes/1maxTemperature # sh max_temperature.sh
added manifest
adding: MaxTemperature.class(in = 1418) (out= 800)(deflated 43%)
adding: MaxTemperatureMapper.class(in = 1876) (out= 804)(deflated 57%)
adding: MaxTemperatureReducer.class(in = 1660) (out= 704)(deflated 57%)
adding: MaxTemperatureWithCombiner.class(in = 1494) (out= 829)(deflated 44%)
【5】Run the program
The arguments after the jar file are the class name, the input file, and the output directory:
A22811459:/home/longhui/hadoop/codes/1maxTemperature # hadoop jar MaxTemperature.jar MaxTemperature /user/root/input/sample.txt /user/root/output
16/12/15 16:29:59 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
16/12/15 16:29:59 INFO input.FileInputFormat: Total input paths to process : 1
16/12/15 16:29:59 INFO util.NativeCodeLoader: Loaded the native-hadoop library
16/12/15 16:29:59 WARN snappy.LoadSnappy: Snappy native library not loaded
16/12/15 16:30:00 INFO mapred.JobClient: Running job: job_201612151254_0003
16/12/15 16:30:01 INFO mapred.JobClient: map 0% reduce 0%
16/12/15 16:30:05 INFO mapred.JobClient: map 100% reduce 0%
16/12/15 16:30:12 INFO mapred.JobClient: map 100% reduce 33%
16/12/15 16:30:14 INFO mapred.JobClient: map 100% reduce 100%
16/12/15 16:30:14 INFO mapred.JobClient: Job complete: job_201612151254_0003
16/12/15 16:30:14 INFO mapred.JobClient: Counters: 29
16/12/15 16:30:14 INFO mapred.JobClient: Job Counters
16/12/15 16:30:14 INFO mapred.JobClient: Launched reduce tasks=1
16/12/15 16:30:14 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=4214
16/12/15 16:30:14 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
16/12/15 16:30:14 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
16/12/15 16:30:14 INFO mapred.JobClient: Launched map tasks=1
16/12/15 16:30:14 INFO mapred.JobClient: Data-local map tasks=1
16/12/15 16:30:14 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=8510
16/12/15 16:30:14 INFO mapred.JobClient: File Output Format Counters
16/12/15 16:30:14 INFO mapred.JobClient: Bytes Written=17
16/12/15 16:30:14 INFO mapred.JobClient: FileSystemCounters
16/12/15 16:30:14 INFO mapred.JobClient: FILE_BYTES_READ=61
16/12/15 16:30:14 INFO mapred.JobClient: HDFS_BYTES_READ=642
16/12/15 16:30:14 INFO mapred.JobClient: FILE_BYTES_WRITTEN=106464
16/12/15 16:30:14 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=17
16/12/15 16:30:14 INFO mapred.JobClient: File Input Format Counters
16/12/15 16:30:14 INFO mapred.JobClient: Bytes Read=529
16/12/15 16:30:14 INFO mapred.JobClient: Map-Reduce Framework
16/12/15 16:30:14 INFO mapred.JobClient: Map output materialized bytes=61
16/12/15 16:30:14 INFO mapred.JobClient: Map input records=5
16/12/15 16:30:14 INFO mapred.JobClient: Reduce shuffle bytes=61
16/12/15 16:30:14 INFO mapred.JobClient: Spilled Records=10
16/12/15 16:30:14 INFO mapred.JobClient: Map output bytes=45
16/12/15 16:30:14 INFO mapred.JobClient: CPU time spent (ms)=2500
16/12/15 16:30:14 INFO mapred.JobClient: Total committed heap usage (bytes)=218759168
16/12/15 16:30:14 INFO mapred.JobClient: Combine input records=0
16/12/15 16:30:14 INFO mapred.JobClient: SPLIT_RAW_BYTES=113
16/12/15 16:30:14 INFO mapred.JobClient: Reduce input records=5
16/12/15 16:30:14 INFO mapred.JobClient: Reduce input groups=2
16/12/15 16:30:14 INFO mapred.JobClient: Combine output records=0
16/12/15 16:30:14 INFO mapred.JobClient: Physical memory (bytes) snapshot=216408064
16/12/15 16:30:14 INFO mapred.JobClient: Reduce output records=2
16/12/15 16:30:14 INFO mapred.JobClient: Virtual memory (bytes) snapshot=752107520
16/12/15 16:30:14 INFO mapred.JobClient: Map output records=5
【6】View the results
A22811459:/home/longhui/hadoop/codes/1maxTemperature # hadoop dfs -ls /user/root/output
Found 3 items
-rw-r--r-- 1 root supergroup 0 2016-12-15 16:30 /user/root/output/_SUCCESS
drwxr-xr-x - root supergroup 0 2016-12-15 16:30 /user/root/output/_logs
-rw-r--r-- 1 root supergroup 17 2016-12-15 16:30 /user/root/output/part-r-00000
A22811459:/home/longhui/hadoop/codes/1maxTemperature # hadoop dfs -cat /user/root/output/part-r-00000
1949 111
1950 22
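The printed values are NCDC temperatures in tenths of a degree Celsius, so 111 is 11.1 °C and 22 is 2.2 °C. A trivial conversion sketch (class and method names are illustrative only):

```java
// Converts the job's output values (tenths of a degree Celsius) to degrees.
public class TenthsToCelsius {

    public static double toCelsius(int tenths) {
        return tenths / 10.0;
    }

    public static void main(String[] args) {
        System.out.println("1949 max: " + toCelsius(111) + " C"); // prints: 1949 max: 11.1 C
        System.out.println("1950 max: " + toCelsius(22) + " C");  // prints: 1950 max: 2.2 C
    }
}
```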
【7】View through the web interface
【7.1】http://10.17.35.110:50030/jobtracker.jsp
【7.2】http://10.17.35.110:50070/