MapReduce Types and Formats (Part 1): Types

ThisIsNobody

Original article: https://blog.csdn.net/weixin_42129080/article/details/80804511

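The output types of the map function must match the input types of the reduce function. In the general form below, K1, V1, K2, V2, K3 and V3 are abstract type parameters:

map: (K1, V1) -> list(K2, V2)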

combiner: (K2, list(V2)) -> list(K2, V2)

reduce: (K2, list(V2)) -> list(K3, V3)

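The combiner's output must have the same types as its input, because that output is passed on to reduce. In practice the combiner is often the same function as reduce, which is only possible when K3 = K2 and V3 = V2.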
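To show how these signatures constrain one another, here is a minimal, Hadoop-free sketch in plain Java. The interfaces MapFn and ReduceFn, the class TypeFlowSketch and the in-memory grouping are hypothetical stand-ins, not Hadoop APIs. Map output is modeled as a list of pairs. The combiner is declared as ReduceFn<K2, V2, K2, V2>, so the compiler accepts only a combiner whose output types equal its input types. The word-count SUM function can fill both the combiner and the reducer role because K3 = K2 and V3 = V2.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class TypeFlowSketch {

    // map: (K1, V1) -> list(K2, V2)
    interface MapFn<K1, V1, K2, V2> {
        List<Map.Entry<K2, V2>> apply(K1 key, V1 value);
    }

    // reduce: (K2, list(V2)) -> list(K3, V3); a combiner is a ReduceFn<K2, V2, K2, V2>
    interface ReduceFn<K2, V2, K3, V3> {
        List<Map.Entry<K3, V3>> apply(K2 key, List<V2> values);
    }

    // Groups values by key in sorted key order (a stand-in for shuffle and sort).
    static <K extends Comparable<K>, V> Map<K, List<V>> group(List<Map.Entry<K, V>> pairs) {
        Map<K, List<V>> groups = new TreeMap<>();
        for (Map.Entry<K, V> p : pairs) {
            groups.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return groups;
    }

    // Any mapper, combiner or reducer whose types do not line up fails to compile.
    static <K1, V1, K2 extends Comparable<K2>, V2, K3, V3> List<Map.Entry<K3, V3>> run(
            List<Map.Entry<K1, V1>> input,
            MapFn<K1, V1, K2, V2> mapper,
            ReduceFn<K2, V2, K2, V2> combiner,
            ReduceFn<K2, V2, K3, V3> reducer) {
        List<Map.Entry<K2, V2>> shuffled = new ArrayList<>();
        for (Map.Entry<K1, V1> record : input) {
            // Combine the output of each map call locally before it is "shuffled".
            Map<K2, List<V2>> local = group(mapper.apply(record.getKey(), record.getValue()));
            for (Map.Entry<K2, List<V2>> g : local.entrySet()) {
                shuffled.addAll(combiner.apply(g.getKey(), g.getValue()));
            }
        }
        List<Map.Entry<K3, V3>> result = new ArrayList<>();
        for (Map.Entry<K2, List<V2>> g : group(shuffled).entrySet()) {
            result.addAll(reducer.apply(g.getKey(), g.getValue()));
        }
        return result;
    }

    // Word count: K1 = Long (offset), V1 = String (line), K2 = String, V2 = Integer.
    static final MapFn<Long, String, String, Integer> WORD_MAPPER = (offset, line) -> {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.split("\\s+")) {
            if (!word.isEmpty()) out.add(Map.entry(word, 1));
        }
        return out;
    };

    // K3 = K2 and V3 = V2, so the same function serves as combiner and reducer.
    static final ReduceFn<String, Integer, String, Integer> SUM = (word, counts) -> {
        int total = 0;
        for (int c : counts) total += c;
        return List.of(Map.entry(word, total));
    };

    public static void main(String[] args) {
        List<Map.Entry<Long, String>> input = List.of(Map.entry(0L, "a b a"), Map.entry(6L, "b c"));
        System.out.println(run(input, WORD_MAPPER, SUM, SUM)); // [a=2, b=2, c=1]
    }
}
```

In this sketch a type mismatch between mapper and reducer is caught at compile time. In a real Hadoop job the same mismatch surfaces only at run time.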

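The partition function takes the intermediate key-value pairs (K2, V2) and returns a partition index. In practice the partition is determined by the key alone and the value is ignored. All records with the same key therefore go to the same partition, although one partition usually holds many different keys.

partition: (K2, V2) -> integer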
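Here is a sketch of the key-only partition rule. The formula is the one used by Hadoop's HashPartitioner. The class name HashPartitionSketch is hypothetical, and the code hashes ordinary Java objects rather than Writable types.

```java
class HashPartitionSketch {

    // Same formula as HashPartitioner.getPartition: only the key is hashed,
    // so every record with a given key lands in the same partition.
    static <K, V> int getPartition(K key, V value, int numPartitions) {
        // "& Integer.MAX_VALUE" clears the sign bit, so the result stays
        // non-negative even when hashCode() is negative.
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        int numReduceTasks = 3;
        String[] keys = {"apple", "banana", "cherry", "apple"};
        for (String key : keys) {
            // The value (here 1) plays no part in the decision.
            System.out.println(key + " -> partition " + getPartition(key, 1, numReduceTasks));
        }
    }
}
```

Both occurrences of "apple" print the same partition number. With only three partitions, different keys will often share one.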

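Type parameters

In the Java API, Mapper and Reducer are generic classes, and their Context objects are parameterized with the same input and output types.

KEYIN, VALUEIN, KEYOUT and VALUEOUT are the type parameters: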

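// Simplified excerpt of org.apache.hadoop.mapreduce.Mapper and Reducer
public class Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {
  public abstract class Context
    implements MapContext<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {
    // ...
  }

  protected void map(KEYIN key, VALUEIN value, Context context)
      throws IOException, InterruptedException {
    // ...
  }
}

public class Reducer<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {
  public abstract class Context
    implements ReduceContext<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {
    // ...
  }

  protected void reduce(KEYIN key, Iterable<VALUEIN> values, Context context)
      throws IOException, InterruptedException {
    // ...
  }
}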
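The following Hadoop-free sketch imitates how the four type parameters flow from the class declaration into map() and Context.write(). MiniMapper, its run() helper and LineLengthMapper are hypothetical names used only for illustration. A real job would extend org.apache.hadoop.mapreduce.Mapper in the same way, e.g. with Mapper<LongWritable, Text, Text, IntWritable>.

```java
import java.util.ArrayList;
import java.util.List;

class MiniMapperDemo {

    // Hypothetical stand-in for Hadoop's Mapper (NOT the real class). The inner
    // Context is non-static, so it shares the enclosing type parameters.
    abstract static class MiniMapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {

        class Context {
            final List<String> written = new ArrayList<>();

            // Accepts only KEYOUT/VALUEOUT; records each pair as "key=value".
            void write(KEYOUT key, VALUEOUT value) {
                written.add(key + "=" + value);
            }
        }

        protected abstract void map(KEYIN key, VALUEIN value, Context context);

        // Creates a Context, calls map() once, and returns what was written.
        List<String> run(KEYIN key, VALUEIN value) {
            Context context = new Context();
            map(key, value, context);
            return context.written;
        }
    }

    // Binds KEYIN = Long, VALUEIN = String, KEYOUT = String, VALUEOUT = Integer.
    static class LineLengthMapper extends MiniMapper<Long, String, String, Integer> {
        @Override
        protected void map(Long offset, String line, Context context) {
            // context.write(offset, line) would not compile: KEYOUT is String here.
            context.write(line, line.length());
        }
    }

    static List<String> mapOne(long offset, String line) {
        return new LineLengthMapper().run(offset, line);
    }

    public static void main(String[] args) {
        System.out.println(mapOne(0L, "hello")); // [hello=5]
    }
}
```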

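Type matching

1) Java generics have limitations. Because of type erasure, type information is not always available at run time, so Hadoop requires the data types to be set explicitly (for example with setMapOutputKeyClass() and setOutputKeyClass()).

2) A MapReduce configuration can also specify incompatible types. Configuration settings cannot be checked at compile time, so such type conflicts are only detected while the job is running.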
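The sketch below demonstrates the erasure problem in plain Java and imitates the kind of run-time check it forces. checkKey is a hypothetical helper, not a Hadoop method. In a real job, a mapper that emits a key of a different class from the configured map output key class typically fails during the map task with a "Type mismatch in key from map" error.

```java
import java.util.ArrayList;
import java.util.List;

class TypeErasureDemo {

    // True: after erasure, List<String> and List<Integer> share one runtime class.
    static boolean sameRuntimeClass() {
        List<String> strings = new ArrayList<>();
        List<Integer> numbers = new ArrayList<>();
        return strings.getClass() == numbers.getClass();
    }

    // Hypothetical run-time check: compares an emitted key against a class that
    // was configured explicitly (the role setMapOutputKeyClass plays in a job).
    static void checkKey(Class<?> configuredKeyClass, Object key) {
        if (key.getClass() != configuredKeyClass) {
            throw new IllegalArgumentException("Type mismatch in key: expected "
                    + configuredKeyClass.getName() + ", received " + key.getClass().getName());
        }
    }

    public static void main(String[] args) {
        System.out.println(sameRuntimeClass()); // true

        Class<?> configured = Long.class;
        checkKey(configured, 42L); // matches, no exception
        try {
            checkKey(configured, "42"); // compiles fine; fails only at run time
        } catch (IllegalArgumentException e) {
            // prints: Type mismatch in key: expected java.lang.Long, received java.lang.String
            System.out.println(e.getMessage());
        }
    }
}
```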

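The default MapReduce job

The program below explicitly sets every option that a MapReduce job otherwise uses by default. It contains no map or reduce logic of its own: the identity Mapper and Reducer pass records through unchanged.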

package thisisnobody.defaultmapreduce;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;
import org.apache.hadoop.util.GenericOptionsParser;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

/**
 * Spells out the settings that a MapReduce job uses by default.
 * Default input: (byte offset in the file, line of text)
 * Default output: byte offset, a tab, then the line of text
 *
 * @author ZLP
 */
public class MinimalMapReduceWithDefaults extends Configured implements Tool {

	public static void main(String[] args) throws Exception {
		int exitCode = ToolRunner.run(new MinimalMapReduceWithDefaults(), args);
		System.exit(exitCode);
	}

	@Override
	public int run(String[] args) throws Exception {
		Job job = JobBuilder.parseInputAndOutput(this, getConf(), args);
		if (job == null)
			return -1;
		/*
		 * Default map-side settings:
		 * input format TextInputFormat (key LongWritable, value Text);
		 * the identity Mapper class;
		 * map output key LongWritable, map output value Text.
		 */
		job.setInputFormatClass(TextInputFormat.class);
		job.setMapperClass(Mapper.class);
		job.setMapOutputKeyClass(LongWritable.class);
		job.setMapOutputValueClass(Text.class);

		/*
		 * Default reduce-side settings:
		 * one reduce task;
		 * the identity Reducer class;
		 * output format TextOutputFormat, which separates key and value with a tab;
		 * output key LongWritable, output value Text.
		 */
		job.setNumReduceTasks(1);
		job.setReducerClass(Reducer.class);
		job.setOutputKeyClass(LongWritable.class);
		job.setOutputValueClass(Text.class);
		job.setOutputFormatClass(TextOutputFormat.class);

		/*
		 * Default partitioner: HashPartitioner hashes each record's key to choose its
		 * partition. Each partition is processed by exactly one reduce task, so the
		 * number of partitions equals the number of reduce tasks. With several
		 * reducers, the hash should spread keys evenly so that the load is balanced.
		 */
		job.setPartitionerClass(HashPartitioner.class);
		return job.waitForCompletion(true) ? 0 : 1;
	}
}

class JobBuilder {

	/**
	 * Creates a Job whose input and output paths come from the command line.
	 * Prints usage and returns null if the arguments are wrong.
	 */
	public static Job parseInputAndOutput(Tool tool, Configuration conf, String[] args) throws IOException {
		if (args.length != 2) {
			printUsage(tool, "<input> <output>");
			return null;
		}
		Path in = new Path(args[0]);
		Path out = new Path(args[1]);

		Job job = Job.getInstance(conf);
		job.setJarByClass(tool.getClass());
		// Remove an existing output directory; otherwise FileOutputFormat rejects the job.
		FileSystem fs = out.getFileSystem(conf);
		if (fs.exists(out)) {
			fs.delete(out, true);
		}
		FileInputFormat.addInputPath(job, in);
		FileOutputFormat.setOutputPath(job, out);
		return job;
	}

	public static void printUsage(Tool tool, String extraArgsUsage) {
		System.err.printf("Usage: %s [genericOptions] %s%n%n",
				tool.getClass().getSimpleName(), extraArgsUsage);
		GenericOptionsParser.printGenericCommandUsage(System.err);
	}
}
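The following Hadoop-free simulation makes the default behavior concrete by showing what the job writes for a single input file. It assumes UTF-8 text with '\n' line endings (TextInputFormat also handles '\r\n', which this sketch ignores). DefaultJobSimulation and simulate() are hypothetical names. The identity Mapper and Reducer pass records through unchanged, so each output line is the line's byte offset, a tab, and the line itself. With one reducer the output is sorted by offset, which for a single file is simply the input order.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class DefaultJobSimulation {

    // Simulates the default job on the contents of ONE input file
    // (assumed UTF-8 with '\n' line endings). This is not Hadoop code.
    static List<String> simulate(String contents) {
        List<String> output = new ArrayList<>();
        long offset = 0; // byte offset of the current line, like TextInputFormat's key
        int start = 0;
        while (start < contents.length()) {
            int newline = contents.indexOf('\n', start);
            int end = (newline == -1) ? contents.length() : newline;
            String line = contents.substring(start, end);
            // Identity map and reduce pass (offset, line) through unchanged;
            // TextOutputFormat writes the key, a tab, then the value.
            output.add(offset + "\t" + line);
            if (newline == -1) {
                break;
            }
            // Advance past the line's UTF-8 bytes plus the '\n' byte.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
            start = newline + 1;
        }
        return output;
    }

    public static void main(String[] args) {
        // Prints "0<TAB>hello" and "6<TAB>world".
        simulate("hello\nworld\n").forEach(System.out::println);
    }
}
```

The key counts bytes, not characters. A line containing multi-byte UTF-8 characters therefore advances the next offset by more than its character count.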

The default Streaming job
