Basic MapReduce Examples and Code in Hadoop (Part 3)

肉装法师

Category: Hadoop

Article link: https://blog.csdn.net/weixin_41772761/article/details/104405074

Partitioning with Partitioner

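Partitioning is an important step of the shuffle phase. It distributes the map output to different reduce tasks according to a rule, so that each partition produces its own output.
Partitioner is the base class of all partitioners, and a custom partitioner must extend it. HashPartitioner is the default partitioner in MapReduce.
It picks the reducer as follows: which reducer = (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks
Note: by default the number of reduce tasks is 1. The built-in partitioning rule often does not meet our needs, so to get a specific result we can define our own partitioning rule.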
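The formula above can be tried out in plain Java. The following is a minimal sketch, not Hadoop's own source: the class name HashPartitionDemo and its two helper methods are made up for illustration, and String stands in for the Hadoop key type (whose hashCode() may differ from String's). It mainly shows why the hash is masked with Integer.MAX_VALUE before the modulo: without the mask, a negative hash code would yield a negative partition index.

```java
class HashPartitionDemo {

    // Same arithmetic as the formula above, applied to a raw hash code.
    static int partitionForHash(int hash, int numReduceTasks) {
        // The mask clears the sign bit, so the result is never negative.
        return (hash & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Assumption: String is used here in place of a Hadoop key such as Text.
    static int partitionFor(String key, int numReduceTasks) {
        return partitionForHash(key.hashCode(), numReduceTasks);
    }

    public static void main(String[] args) {
        int numReduceTasks = 3;

        for (String key : new String[] {"bj", "sh", "hz"}) {
            System.out.println(key + " -> partition " + partitionFor(key, numReduceTasks));
        }

        // A negative hash code: plain modulo is negative, the masked version is not.
        int negativeHash = -1;
        System.out.println("unmasked: " + (negativeHash % numReduceTasks));
        System.out.println("masked:   " + partitionForHash(negativeHash, numReduceTasks));
    }
}
```

Because the partition depends only on the key's hash, this rule cannot by itself send all records of one city to the same reducer when the key is something else, which is the motivation for a custom rule.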

Example: partition the records by city, and compute the total flow generated by each person in each city.

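Data source
Each record has four fields: phone number | city | name | flow. The Mapper code splits each line on a single space, so the fields in the input file are assumed to be space-separated.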
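Custom Flow class
As in the previous part, the custom class implements the Writable interface and overrides its readFields() and write() methods. See the previous part for details.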
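For readers who do not have the previous part at hand, here is a rough sketch of what such a class might look like. It is an assumption, not the author's actual Flow class: the field names and types are inferred from the setters used in the Mapper, the name FlowSketch and the roundTrip helper are invented for this example, and "implements Writable" is left out so that the sketch compiles without Hadoop on the classpath. The write() and readFields() signatures match those of the Writable interface.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// In a real job this would be declared as: public class Flow implements Writable
class FlowSketch {
    private String phone = "";
    private String city = "";
    private String name = "";
    private int flow;

    public String getPhone() { return phone; }
    public void setPhone(String phone) { this.phone = phone; }
    public String getCity() { return city; }
    public void setCity(String city) { this.city = city; }
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public int getFlow() { return flow; }
    public void setFlow(int flow) { this.flow = flow; }

    // Serialize the fields in a fixed order.
    public void write(DataOutput out) throws IOException {
        out.writeUTF(phone);
        out.writeUTF(city);
        out.writeUTF(name);
        out.writeInt(flow);
    }

    // Deserialize the fields in exactly the same order as write().
    public void readFields(DataInput in) throws IOException {
        phone = in.readUTF();
        city = in.readUTF();
        name = in.readUTF();
        flow = in.readInt();
    }

    // Hypothetical helper: serialize to a byte array and read it back.
    static FlowSketch roundTrip(FlowSketch original) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            original.write(new DataOutputStream(buffer));
            FlowSketch copy = new FlowSketch();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(buffer.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        FlowSketch f = new FlowSketch();
        f.setPhone("13800000001"); // made-up sample values
        f.setCity("bj");
        f.setName("zs");
        f.setFlow(100);
        FlowSketch copy = roundTrip(f);
        System.out.println(copy.getPhone() + " " + copy.getCity() + " " + copy.getName() + " " + copy.getFlow());
    }
}
```

The one rule that matters is that readFields() reads the fields in the same order and with the same types as write() wrote them. The class also needs a no-argument constructor (the implicit default one here), since Hadoop creates instances reflectively before calling readFields().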

Mapper class

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class FlowMapper extends Mapper<LongWritable, Text, Text, Flow> {

	@Override
	public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {

		// Each input line: phone city name flow, separated by single spaces
		String[] arr = value.toString().split(" ");

		// Skip blank or malformed lines instead of failing the task
		if (arr.length < 4)
			return;

		Flow f = new Flow();
		f.setPhone(arr[0]);
		f.setCity(arr[1]);
		f.setName(arr[2]);
		f.setFlow(Integer.parseInt(arr[3]));

		// Key by phone number so the reducer sums the flow per person
		context.write(new Text(f.getPhone()), f);
	}
}

Partitioner class

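import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class FlowPartitioner extends Partitioner<Text, Flow> {

	// The returned index must be smaller than numPartitions,
	// so this rule needs at least 3 reduce tasks
	@Override
	public int getPartition(Text key, Flow value, int numPartitions) {

		String city = value.getCity();

		// Constant-first equals avoids a NullPointerException if city is null
		if ("bj".equals(city))
			return 0;
		else if ("sh".equals(city))
			return 1;
		else
			return 2;
	}
}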

Reducer class

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class FlowReducer extends Reducer<Text, Flow, Text, IntWritable> {

	@Override
	public void reduce(Text key, Iterable<Flow> values, Context context) throws IOException, InterruptedException {

		// Sum the flow of all records that share this phone number
		int sum = 0;

		for (Flow val : values) {
			sum += val.getFlow();
		}
		context.write(key, new IntWritable(sum));
	}
}

Driver class

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class FlowDriver {

	public static void main(String[] args) throws Exception {
		Configuration conf = new Configuration();
		Job job = Job.getInstance(conf, "JobName");
		job.setJarByClass(FlowDriver.class);
		job.setMapperClass(FlowMapper.class);
		job.setReducerClass(FlowReducer.class);

		// Map output types differ from the final output types, so set both
		job.setMapOutputKeyClass(Text.class);
		job.setMapOutputValueClass(Flow.class);

		job.setOutputKeyClass(Text.class);
		job.setOutputValueClass(IntWritable.class);

		// Set the custom partitioner; FlowPartitioner returns 0, 1 or 2,
		// so the job needs 3 reduce tasks
		job.setPartitionerClass(FlowPartitioner.class);
		job.setNumReduceTasks(3);

		FileInputFormat.setInputPaths(job, new Path("hdfs://172.8.8.8:9000/mr/flow.txt"));
		FileOutputFormat.setOutputPath(job, new Path("hdfs://172.8.8.8:9000/fpresult"));

		// Exit with a non-zero status if the job fails
		System.exit(job.waitForCompletion(true) ? 0 : 1);
	}

}
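
To get a feel for what the three reducers end up with, the job's logic can be imitated locally in plain Java. This is only a simulation under stated assumptions, not how Hadoop executes the job: the class FlowJobSimulation and the sample records are made up, the routing rule copies FlowPartitioner, and a sorted map per partition stands in for a reducer summing the flow per phone number.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class FlowJobSimulation {

    // Same rule as FlowPartitioner: bj -> 0, sh -> 1, anything else -> 2.
    static int partitionFor(String city) {
        if ("bj".equals(city)) return 0;
        if ("sh".equals(city)) return 1;
        return 2;
    }

    // Returns one "phone -> total flow" map per partition.
    static List<Map<String, Integer>> run(List<String> lines, int numReduceTasks) {
        List<Map<String, Integer>> partitions = new ArrayList<>();
        for (int i = 0; i < numReduceTasks; i++) {
            partitions.add(new TreeMap<>());
        }
        for (String line : lines) {
            // Map step: parse "phone city name flow"
            String[] arr = line.split(" ");
            String phone = arr[0];
            String city = arr[1];
            int flow = Integer.parseInt(arr[3]);

            // Partition step: choose the reducer by city
            Map<String, Integer> target = partitions.get(partitionFor(city));

            // Reduce step: sum the flow per phone number
            target.merge(phone, flow, Integer::sum);
        }
        return partitions;
    }

    public static void main(String[] args) {
        // Made-up sample records, not taken from the article's data file
        List<String> lines = Arrays.asList(
                "13800000001 bj zs 100",
                "13800000001 bj zs 50",
                "13800000002 sh ls 200",
                "13800000003 hz ww 30");

        List<Map<String, Integer>> result = run(lines, 3);
        for (int i = 0; i < result.size(); i++) {
            System.out.println("partition " + i + ": " + result.get(i));
        }
    }
}
```

With these sample records, partition 0 holds the bj totals, partition 1 the sh totals, and partition 2 everything else. In the real job each reduce task writes its own output file under the output path.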
