7. A Big Data Learning Journey: Hadoop MapReduce

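This article explains how Hadoop MapReduce works, covering serialization/deserialization, partitioning, the Combiner, and how Map tasks and Reduce tasks execute. It stresses the importance of the Partitioner, shows how to define custom partitioning rules for specific requirements, and describes the Combiner's role as an internal Reducer. The execution flow is followed from the client submitting a job, through the JobTracker assigning tasks, to the TaskTrackers running the Mapper and Reducer tasks. The article also discusses the sorting mechanism and the data-locality strategy, providing a basis for understanding and optimizing MapReduce jobs.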

Serialization/Deserialization Mechanism

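When you define a custom class and want its objects to be transferred within Hadoop, the class must implement the Writable interface so that its objects can be serialized and deserialized.
Example: compute the total traffic generated by each person.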

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

public class Flow implements Writable {

	private String phone;
	private String city;
	private String name;
	private int flow;

	public String getPhone() {
		return phone;
	}

	public void setPhone(String phone) {
		this.phone = phone;
	}

	public String getCity() {
		return city;
	}

	public void setCity(String city) {
		this.city = city;
	}

	public String getName() {
		return name;
	}

	public void setName(String name) {
		this.name = name;
	}

	public int getFlow() {
		return flow;
	}

	public void setFlow(int flow) {
		this.flow = flow;
	}

	// Deserialization
	@Override
	public void readFields(DataInput in) throws IOException {
		// Read the fields back one by one, in the same order they were written
		this.phone = in.readUTF();
		this.city = in.readUTF();
		this.name = in.readUTF();
		this.flow = in.readInt();
	}

	// Serialization
	@Override
	public void write(DataOutput out) throws IOException {
		// Write the fields out one by one, in a fixed order
		out.writeUTF(phone);
		out.writeUTF(city);
		out.writeUTF(name);
		out.writeInt(flow);
	}

}
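
The rule that readFields must read the fields in exactly the order write emitted them can be checked without a cluster. What follows is a minimal sketch, assuming a stand-in class named FlowRecord (hypothetical, not part of the original code) that mirrors the write/readFields signatures of Flow but does not implement Hadoop's Writable, so it runs on a plain JDK. The helper methods toBytes and fromBytes are also assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for Flow: same write/readFields shape as Writable,
// but with no Hadoop dependency so it can run on a plain JDK.
class FlowRecord {
    String phone = "";
    String city = "";
    String name = "";
    int flow;

    // Serialize: write the fields in a fixed order.
    void write(DataOutput out) throws IOException {
        out.writeUTF(phone);
        out.writeUTF(city);
        out.writeUTF(name);
        out.writeInt(flow);
    }

    // Deserialize: read the fields back in the same order.
    void readFields(DataInput in) throws IOException {
        phone = in.readUTF();
        city = in.readUTF();
        name = in.readUTF();
        flow = in.readInt();
    }

    // Encode one record into a byte array.
    static byte[] toBytes(FlowRecord record) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            record.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Decode a byte array into a fresh instance built with the no-arg constructor.
    static FlowRecord fromBytes(byte[] bytes) {
        FlowRecord record = new FlowRecord();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            record.readFields(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return record;
    }
}
```

Calling FlowRecord.fromBytes(FlowRecord.toBytes(r)) should give back a record with the same four field values as r. If the read order in readFields were changed, for instance reading the int before the strings, the round trip would fail or return wrong values, which is why the two methods have to stay in step.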

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class FlowMapper extends Mapper<LongWritable, Text, Text, Flow> {

	@Override
	public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
		// Each input line holds four space-separated fields: phone, city, name, flow
		String line = value.toString();
		String[] arr = line.split(" ");

		Flow f = new Flow();
		f.setPhone(arr[0]);
		f.setCity(arr[1]);
		f.setName(arr[2]);
		f.setFlow(Integer.parseInt(arr[3]));

		// Use the phone number as the key so that all records of one person
		// reach the same reduce call
		context.write(new Text(f.getPhone()), f);
	}

}

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class FlowReducer extends Reducer<Text, Flow, Text, IntWritable> {

	@Override
	public void reduce(Text key, Iterable<Flow> values, Context context) throws IOException, InterruptedException {
		// Add up the traffic of every record that shares this phone number
		int sum = 0;
		String name = null;
		for (Flow val : values) {
			name = val.getName();
			sum += val.getFlow();
		}

		// Output key: "phone name"; output value: total traffic
		context.write(new Text(key.toString() + " " + name), new IntWritable(sum));
	}

}
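
To see what the mapper and reducer compute together, the sketch below mimics the job in plain Java: each line is split on spaces, records are grouped by phone number (standing in for the shuffle), and the flow values are summed per group. The class name LocalFlowSum and the method sumByPhone are hypothetical, and this is only an in-memory approximation under those assumptions; the real job is spread across map and reduce tasks.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory imitation of the FlowMapper + FlowReducer pair.
class LocalFlowSum {

    // Returns "phone name" -> total flow, the same shape as the reducer's output.
    static Map<String, Integer> sumByPhone(List<String> lines) {
        Map<String, Integer> totals = new TreeMap<>(); // phone -> summed flow
        Map<String, String> names = new TreeMap<>();   // phone -> last name seen

        for (String line : lines) {
            // Same parsing as the mapper: phone, city, name, flow
            String[] arr = line.split(" ");
            String phone = arr[0];
            names.put(phone, arr[2]);
            // Grouping by phone plays the part of the shuffle;
            // merge accumulates the sum like the reducer's loop.
            totals.merge(phone, Integer.parseInt(arr[3]), Integer::sum);
        }

        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, Integer> entry : totals.entrySet()) {
            String phone = entry.getKey();
            result.put(phone + " " + names.get(phone), entry.getValue());
        }
        return result;
    }
}
```

Given made-up input lines such as "13800000000 bj zs 100" and "13800000000 bj zs 25", the two records share a phone number, so they end up under one key and their flow values are added together.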

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

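public class FlowDriver {

	public static void main(String[] args) throws Exception {
		Configuration conf = new Configuration();
		Job job = Job.getInstance(conf, "JobName");
		job.setJarByClass(FlowDriver.class);
		job.setMapperClass(FlowMapper.class);
		job.setReducerClass(FlowReducer.class);

		// Key/value types emitted by the mapper
		job.setMapOutputKeyClass(Text.class);
		job.setMapOutputValueClass(Flow.class);

		// Key/value types emitted by the reducer
		job.setOutputKeyClass(Text.class);
		job.setOutputValueClass(IntWritable.class);

		// The original listing is cut off here. The lines below are an assumed
		// completion based on the imports above; the input and output paths are
		// read from the command line because the original paths are not shown.
		FileInputFormat.addInputPath(job, new Path(args[0]));
		FileOutputFormat.setOutputPath(job, new Path(args[1]));

		System.exit(job.waitForCompletion(true) ? 0 : 1);
	}

}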