Testing a MapReduce Job in Local Mode


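A MapReduce job runs in two phases, map and reduce, and each phase takes key-value pairs as input and produces key-value pairs as output. The map function and the reduce function are listed below. (The input types of reduce must match the output types of map.) In this example the map phase reads weather records and emits the year as the key and the air temperature as the value; the framework then sorts and groups these pairs by key. The reduce phase picks the highest temperature recorded for each year. (This example uses hadoop-2.8.5.)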
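To make the data flow concrete before looking at the Hadoop classes, here is a minimal plain-Java sketch of the same idea: (year, temperature) pairs are grouped and sorted by key, then reduced to one maximum per year. It uses no Hadoop API at all; the class name `MaxTemperatureFlowSketch`, its helper methods, and the sample values are made up for illustration.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java imitation of the map -> shuffle/sort -> reduce flow; no Hadoop classes are used.
class MaxTemperatureFlowSketch {

    // Hypothetical helper that stands in for one (key, value) pair emitted by map.
    static Map.Entry<String, Integer> pair(String year, int temperature) {
        return new AbstractMap.SimpleImmutableEntry<>(year, temperature);
    }

    // "Shuffle": group the emitted pairs by key. A TreeMap keeps the keys sorted,
    // which mirrors the sort-by-key step the framework performs before reduce.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> mapOutput) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : mapOutput) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // "Reduce": keep the maximum temperature for each year.
    static Map<String, Integer> reduce(Map<String, List<Integer>> grouped) {
        Map<String, Integer> maxByYear = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int max = Integer.MIN_VALUE;
            for (int t : e.getValue()) {
                max = Math.max(max, t);
            }
            maxByYear.put(e.getKey(), max);
        }
        return maxByYear;
    }

    public static void main(String[] args) {
        // Made-up map output, not taken from a real data set.
        List<Map.Entry<String, Integer>> mapOutput = Arrays.asList(
                pair("1950", 0), pair("1950", 22), pair("1950", -11),
                pair("1949", 111), pair("1949", 78));
        System.out.println(reduce(shuffle(mapOutput))); // prints {1949=111, 1950=22}
    }
}
```

In the real job, the grouping and sorting are done by the framework between the two phases; only the map and reduce logic has to be written by hand.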

  1.  Mapper class:

package com.hadoop.ncdc.test;

import java.io.IOException;

import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.Mapper;

public class MaxTemperatureMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

	private static final int MISSING = 9999;

	@Override
	public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
		// The new API in Hadoop 2 uses Context, which unifies the roles of JobConf, OutputCollector, and Reporter from the old API.
		String line = value.toString();
		String year = line.substring(15, 19);
		int airTemperature;
		if (line.charAt(87) == '+') {
			airTemperature = Integer.parseInt(line.substring(88, 92));
		} else {
			airTemperature = Integer.parseInt(line.substring(87, 92));
		}
		String quality = line.substring(92, 93);
		if (airTemperature != MISSING && quality.matches("[01459]")) {
			context.write(new Text(year), new IntWritable(airTemperature));
		}
	}
}
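The mapper depends entirely on fixed column offsets, so it can help to try the parsing logic on its own, outside Hadoop. The sketch below repeats the same offsets and filter in plain Java. Because no sample record appears in this post, it builds a synthetic 93-character line filled with '0' characters and overwrites only the three fields the mapper reads; that line is not a real NCDC record, and the class name `NcdcRecordOffsetsSketch` and both helper methods are hypothetical.

```java
// Stand-alone illustration of the fixed-width offsets the mapper reads; no Hadoop classes are used.
class NcdcRecordOffsetsSketch {

    private static final int MISSING = 9999;

    // Builds a synthetic 93-character line (NOT a real NCDC record):
    // filler '0' everywhere except the three fields of interest.
    static String syntheticRecord(String year, String signedTemp, char quality) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 93; i++) {
            sb.append('0');
        }
        sb.replace(15, 19, year);       // year occupies indexes 15..18
        sb.replace(87, 92, signedTemp); // sign plus four digits occupy indexes 87..91
        sb.setCharAt(92, quality);      // quality code sits at index 92
        return sb.toString();
    }

    // Returns "year<TAB>temperature" for a usable reading,
    // or null when the mapper would skip the line.
    static String parse(String line) {
        String year = line.substring(15, 19);
        int start = line.charAt(87) == '+' ? 88 : 87; // drop a leading plus sign
        int airTemperature = Integer.parseInt(line.substring(start, 92));
        String quality = line.substring(92, 93);
        if (airTemperature != MISSING && quality.matches("[01459]")) {
            return year + "\t" + airTemperature;
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(parse(syntheticRecord("1950", "+0022", '1'))); // 1950<TAB>22
        System.out.println(parse(syntheticRecord("1950", "-0011", '1'))); // 1950<TAB>-11
        System.out.println(parse(syntheticRecord("1950", "+9999", '1'))); // null (missing value)
        System.out.println(parse(syntheticRecord("1950", "+0022", '2'))); // null (rejected quality code)
    }
}
```

A line shorter than 93 characters would make `substring` throw, so input files with truncated records would need an extra length check.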

 2.   Reducer class:

package com.hadoop.ncdc.test;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MaxTemperatureReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

	@Override
	public void reduce(Text key, Iterable<IntWritable> values, Context context)
			throws IOException, InterruptedException {
		// The new API in Hadoop 2 uses Context, which unifies the roles of JobConf, OutputCollector, and Reporter from the old API.
		int maxValue = Integer.MIN_VALUE;
		for (IntWritable value : values) {
			maxValue = Math.max(maxValue, value.get());
		}
		context.write(key, new IntWritable(maxValue));
	}
}

 3.  Main class (job driver):

package com.hadoop.ncdc.test;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.Job;

public class MaxTemperature {

	public static void main(String[] args) throws Exception {
		// The new API lives in the org.apache.hadoop.mapreduce package; the old API is in org.apache.hadoop.mapred.
		Configuration config = new Configuration();
		// In the new API, job control goes through the Job class instead of the old API's JobClient, which the new API no longer has.
		Job job = Job.getInstance(config, "Max temperature");
		job.setJarByClass(MaxTemperature.class);
		// args[0]: the first command-line argument, the input path
		FileInputFormat.addInputPath(job, new Path(args[0]));
		// args[1]: the second command-line argument, the output path
		FileOutputFormat.setOutputPath(job, new Path(args[1]));
		job.setMapperClass(MaxTemperatureMapper.class);
		job.setReducerClass(MaxTemperatureReducer.class);
		job.setOutputKeyClass(Text.class);
		job.setOutputValueClass(IntWritable.class);
		System.exit(job.waitForCompletion(true) ? 0 : 1);
	}

}
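The driver reads `args[0]` and `args[1]` without checking them, so a missing argument ends in an `ArrayIndexOutOfBoundsException`, and a second run against the same output directory fails because `FileOutputFormat` refuses to write into an existing directory. A small guard along the following lines could be called at the top of `main`. It is only a sketch: the class `DriverArgsSketch` and its `check` method are hypothetical, and the file-system checks assume both paths are on the local disk, as they are in the local-mode run shown here.

```java
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical pre-flight check for the driver's two arguments (local file system assumed).
class DriverArgsSketch {

    // Returns null when the arguments look usable, otherwise a message describing the problem.
    static String check(String[] args) {
        if (args == null || args.length != 2) {
            return "Usage: MaxTemperature <input path> <output path>";
        }
        if (!Files.exists(Paths.get(args[0]))) {
            return "Input path does not exist: " + args[0];
        }
        if (Files.exists(Paths.get(args[1]))) {
            return "Output path already exists: " + args[1];
        }
        return null;
    }

    public static void main(String[] args) {
        String problem = check(args);
        System.out.println(problem == null ? "arguments look fine" : problem);
    }
}
```

For paths on HDFS the same idea would have to go through Hadoop's own file-system API rather than `java.nio.file`.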

4.  Package the classes into a jar file (hadoop-test.jar) and run the test.

mymacdeMac-mini:~ mymac$ hadoop jar /Users/mymac/Desktop/jjj/hadoop-test.jar com.hadoop.ncdc.test.MaxTemperature /Users/mymac/Desktop/NCDCData /Users/mymac/Desktop/output
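(Here com.hadoop.ncdc.test.MaxTemperature is the fully qualified name of the main class, the input is NCDCData, and the results go to the output directory.)
Command form: hadoop jar <jar path> <fully qualified main class> <input path> <output path>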
----------------------------------------------------------------------------------
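If the fully qualified main class name follows hadoop directly (without the jar subcommand), the jar must first be added to HADOOP_CLASSPATH. Run this in the terminal:
export HADOOP_CLASSPATH=/Users/mymac/Desktop/jjj/hadoop-test.jar
(This is for testing only: the variable applies to the current terminal session and reverts to its default once the terminal is restarted.)
Then run:
hadoop <fully qualified main class> <input path> <output path>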

The test run succeeds:

18/12/12 21:34:32 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
18/12/12 21:34:32 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
18/12/12 21:34:32 INFO input.FileInputFormat: Total input files to process : 2
18/12/12 21:34:32 INFO mapreduce.JobSubmitter: number of splits:2 //the job input is divided into 2 splits
18/12/12 21:34:32 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local1917490337_0001
18/12/12 21:34:33 INFO mapreduce.Job: The url to track the job: http://localhost:8080/                              
18/12/12 21:34:33 INFO mapreduce.Job: Running job: job_local1917490337_0001 //the ID of this job
18/12/12 21:34:33 INFO mapred.LocalJobRunner: OutputCommitter set in config null
18/12/12 21:34:33 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/12/12 21:34:33 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
18/12/12 21:34:33 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
18/12/12 21:34:33 INFO mapred.LocalJobRunner: Waiting for map tasks
18/12/12 21:34:33 INFO mapred.LocalJobRunner: Starting task: attempt_local1917490337_0001_m_000000_0 //first attempt of the first map task
18/12/12 21:34:33 INFO mapred.LocalJobRunner: Finishing task: attempt_local1917490337_0001_m_000000_0
18/12/12 21:34:33 INFO mapred.LocalJobRunner: Starting task: attempt_local1917490337_0001_m_000001_0 //first attempt of the second map task
18/12/12 21:34:33 INFO mapred.LocalJobRunner: Finishing task: attempt_local1917490337_0001_m_000001_0
18/12/12 21:34:33 INFO mapred.LocalJobRunner: map task executor complete. //the map tasks have finished
18/12/12 21:34:33 INFO mapred.LocalJobRunner: Waiting for reduce tasks
18/12/12 21:34:33 INFO mapred.LocalJobRunner: Starting task: attempt_local1917490337_0001_r_000000_0 //first attempt of the first reduce task
18/12/12 21:34:33 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local1917490337_0001_m_000001_0 decomp: 72206 len: 72210 to MEMORY //the reducer fetches the map output during the shuffle
18/12/12 21:34:33 INFO reduce.InMemoryMapOutput: Read 72206 bytes from map-output for attempt_local1917490337_0001_m_000001_0 //the reducer reads the map output
18/12/12 21:34:33 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1917490337_0001_r_000000_0' to file:/Users/mymac/Desktop/output/_temporary/0/task_local1917490337_0001_r_000000
18/12/12 21:34:33 INFO mapred.LocalJobRunner: reduce > reduce
18/12/12 21:34:33 INFO mapred.Task: Task 'attempt_local1917490337_0001_r_000000_0' done.
//the task has been committed and its output saved under the configured output directory
18/12/12 21:34:33 INFO mapred.LocalJobRunner: Finishing task: attempt_local1917490337_0001_r_000000_0
18/12/12 21:34:33 INFO mapred.LocalJobRunner: reduce task executor complete. //the reduce task has finished
18/12/12 21:34:34 INFO mapreduce.Job: Job job_local1917490337_0001 running in uber mode : false
18/12/12 21:34:34 INFO mapreduce.Job:  map 100% reduce 100%
18/12/12 21:34:34 INFO mapreduce.Job: Job job_local1917490337_0001 completed successfully //the job has finished
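
The annotations in the log read the task attempt IDs by eye. As a rough illustration of how such an ID breaks down, the following plain-Java sketch splits one on underscores into the job ID, the task type (m for map, r for reduce), the zero-based task index, and the zero-based attempt number. The class `AttemptIdSketch` is hypothetical and handles only IDs shaped like the ones in this log; it does not use Hadoop's own ID classes.

```java
// Splits a task attempt ID such as attempt_local1917490337_0001_m_000000_0 into its parts.
class AttemptIdSketch {

    static String describe(String attemptId) {
        String[] p = attemptId.split("_");
        if (p.length != 6 || !p[0].equals("attempt")) {
            throw new IllegalArgumentException("Unexpected attempt ID: " + attemptId);
        }
        String jobId = "job_" + p[1] + "_" + p[2];  // same identifier and sequence number as the job ID
        String type = p[3].equals("m") ? "map" : p[3].equals("r") ? "reduce" : p[3];
        int taskIndex = Integer.parseInt(p[4]);     // zero-based, so 000000 is the first task
        int attempt = Integer.parseInt(p[5]);       // zero-based, so 0 is the first attempt
        return jobId + ": " + type + " task " + taskIndex + ", attempt " + attempt;
    }

    public static void main(String[] args) {
        // prints job_local1917490337_0001: map task 1, attempt 0
        System.out.println(describe("attempt_local1917490337_0001_m_000001_0"));
        // prints job_local1917490337_0001: reduce task 0, attempt 0
        System.out.println(describe("attempt_local1917490337_0001_r_000000_0"));
    }
}
```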


 
