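Implementing a custom value type for MapReduce computations

1. In MapReduce programming, Hadoop's built-in data types sometimes cannot meet the requirements, and a custom data type optimized for the use case may also perform better.

    In these cases you can define a custom Writable type by implementing the org.apache.hadoop.io.Writable interface and use it as the value type of a MapReduce computation.

2. The source of the org.apache.hadoop.io.Writable interface contains a concrete example implementation: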

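import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

public class MyWritable implements Writable {
        // Some data
        private int counter;
        private long timestamp;

        // Serialize the fields in a fixed order
        public void write(DataOutput out) throws IOException {
          out.writeInt(counter);
          out.writeLong(timestamp);
        }

        // Deserialize the fields in the same order they were written
        public void readFields(DataInput in) throws IOException {
          counter = in.readInt();
          timestamp = in.readLong();
        }

        // Convenience factory: create an instance and fill it from the input
        public static MyWritable read(DataInput in) throws IOException {
          MyWritable w = new MyWritable();
          w.readFields(in);
          return w;
        }
}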
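3. When implementing a custom Writable type, keep the following points in mind:
    3.1 If you add a custom constructor to a custom Writable class, be sure to keep the default no-argument constructor.
    3.2 If instances of the custom Writable type are written out with TextOutputFormat, make sure the type has a meaningful toString() implementation.
    3.3 When reading input data, Hadoop may reuse a single instance of the Writable class. When populating fields in readFields(), do not rely on the object's existing state.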
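
The three points above can be seen together in a small sketch. It assumes a hypothetical `TagsWritable` type that holds a variable-length list, and it declares a local `Writable` interface with the same two methods as `org.apache.hadoop.io.Writable` only so that it runs without Hadoop on the classpath. In a real job you would implement the Hadoop interface instead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class WritableReuseDemo {

    // Local stand-in (assumption) mirroring the two methods of Hadoop's Writable.
    interface Writable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    // Hypothetical type for illustration; not part of the article's job.
    static class TagsWritable implements Writable {
        private final List<String> tags = new ArrayList<>();

        // 3.1: keep the no-argument constructor next to any custom one.
        public TagsWritable() {
        }

        public TagsWritable(List<String> initial) {
            tags.addAll(initial);
        }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeInt(tags.size());
            for (String tag : tags) {
                out.writeUTF(tag);
            }
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            // 3.3: the instance may be reused, so discard the old state first.
            tags.clear();
            int n = in.readInt();
            for (int i = 0; i < n; i++) {
                tags.add(in.readUTF());
            }
        }

        // 3.2: a readable text form for text-based output.
        @Override
        public String toString() {
            return String.join(",", tags);
        }
    }

    static byte[] serialize(Writable w) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        w.write(new DataOutputStream(bytes));
        return bytes.toByteArray();
    }

    static void deserializeInto(Writable w, byte[] data) throws IOException {
        w.readFields(new DataInputStream(new ByteArrayInputStream(data)));
    }

    public static void main(String[] args) throws IOException {
        byte[] first = serialize(new TagsWritable(Arrays.asList("a", "b", "c")));
        byte[] second = serialize(new TagsWritable(Arrays.asList("x")));

        // Read two records into the same instance, as a framework might.
        TagsWritable reused = new TagsWritable();
        deserializeInto(reused, first);
        System.out.println(reused); // prints a,b,c
        deserializeInto(reused, second);
        System.out.println(reused); // prints x
    }
}
```

Without the `tags.clear()` call, the second read would append to the first record's list and the reused instance would report `a,b,c,x`. Types with only fixed-size fields avoid this problem as long as readFields() assigns every field.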
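4. The following concrete example, "processing mobile web-access logs with a custom type", shows a custom Writable type in practice.
   4.1 The data file is named HTTP_20130313143750.dat (it can be downloaded from the internet).

   4.2 Sample record: 1363157985066     13726230503    00-FD-07-A4-72-B8:CMCC    120.196.100.82    i02.c.aliimg.com        24    27    2481    24681    200

   4.3 Data structure:

       

    4.4 We mainly extract the phone number, the upstream packet count, the downstream packet count, the total upstream traffic, and the total downstream traffic. (Both sending a request and returning a response generate packets and traffic.)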
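
As a rough illustration of the extraction in 4.4, the sketch below pulls the five values out of the sample record from 4.2. It assumes the record is tab-separated and that the wider gap after the host name is an empty field, which is consistent with the field indexes (1 and 6 to 9) that the Mapper in 5.2 reads. The names `LogLineFields` and `Record` are hypothetical.

```java
public class LogLineFields {

    // Hypothetical holder for the five values named in 4.4.
    static final class Record {
        final String phone;
        final int upPackNum, downPackNum, upPayLoad, downPayLoad;

        Record(String phone, int upPackNum, int downPackNum,
               int upPayLoad, int downPayLoad) {
            this.phone = phone;
            this.upPackNum = upPackNum;
            this.downPackNum = downPackNum;
            this.upPayLoad = upPayLoad;
            this.downPayLoad = downPayLoad;
        }
    }

    // Split on tabs and read the same indexes the Mapper in 5.2 uses.
    static Record parse(String line) {
        String[] f = line.split("\t");
        if (f.length < 10) {
            throw new IllegalArgumentException("expected at least 10 fields, got " + f.length);
        }
        return new Record(f[1],
                Integer.parseInt(f[6]), Integer.parseInt(f[7]),
                Integer.parseInt(f[8]), Integer.parseInt(f[9]));
    }

    public static void main(String[] args) {
        // Sample from 4.2 rebuilt with tabs; the empty sixth field is an assumption.
        String sample = String.join("\t",
                "1363157985066", "13726230503", "00-FD-07-A4-72-B8:CMCC",
                "120.196.100.82", "i02.c.aliimg.com", "",
                "24", "27", "2481", "24681", "200");
        Record r = parse(sample);
        // prints 13726230503 24 27 2481 24681
        System.out.println(r.phone + " " + r.upPackNum + " " + r.downPackNum
                + " " + r.upPayLoad + " " + r.downPayLoad);
    }
}
```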

5. Implementation of the MapReduce program.

   5.1 Custom data type.

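import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

public class DataWritable implements Writable {
	// upstream: packet count and total traffic
	private int upPackNum;
	private int upPayLoad;

	// downstream: packet count and total traffic
	private int downPackNum;
	private int downPayLoad;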

	public DataWritable() {
	}

	public void set(int upPackNum, int upPayLoad, int downPackNum,
			int downPayLoad) {
		this.upPackNum = upPackNum;
		this.upPayLoad = upPayLoad;
		this.downPackNum = downPackNum;
		this.downPayLoad = downPayLoad;
	}

	public int getUpPackNum() {
		return upPackNum;
	}

	public int getUpPayLoad() {
		return upPayLoad;
	}

	public int getDownPackNum() {
		return downPackNum;
	}

	public int getDownPayLoad() {
		return downPayLoad;
	}

	@Override
	public void readFields(DataInput in) throws IOException {
		this.upPackNum = in.readInt();
		this.upPayLoad = in.readInt();
		this.downPackNum = in.readInt();
		this.downPayLoad = in.readInt();
	}

	@Override
	public void write(DataOutput out) throws IOException {
		out.writeInt(upPackNum);
		out.writeInt(upPayLoad);
		out.writeInt(downPackNum);
		out.writeInt(downPayLoad);
	}

	@Override
	public String toString() {
		return upPackNum + "\t" + upPayLoad //
				+ "\t" + downPackNum + //
				"\t" + downPayLoad;
	}
}
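
To make the wire format of the type above concrete, here is a minimal sketch that writes four ints in the same order as DataWritable.write() and reads them back in the same order as DataWritable.readFields(). It uses only java.io streams, so the class name `DataWritableLayout` and its two helper methods are hypothetical and not part of the job.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

public class DataWritableLayout {

    // Same field order as DataWritable.write().
    static byte[] encode(int upPackNum, int upPayLoad,
                         int downPackNum, int downPayLoad) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(upPackNum);
        out.writeInt(upPayLoad);
        out.writeInt(downPackNum);
        out.writeInt(downPayLoad);
        return bytes.toByteArray();
    }

    // Same field order as DataWritable.readFields().
    static int[] decode(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        int upPackNum = in.readInt();
        int upPayLoad = in.readInt();
        int downPackNum = in.readInt();
        int downPayLoad = in.readInt();
        return new int[] { upPackNum, upPayLoad, downPackNum, downPayLoad };
    }

    public static void main(String[] args) throws IOException {
        byte[] data = encode(24, 2481, 27, 24681);
        System.out.println(data.length);                    // prints 16
        System.out.println(Arrays.toString(decode(data)));  // prints [24, 2481, 27, 24681]
    }
}
```

Nothing in the byte stream names the fields, so readFields() has to read them in exactly the order write() wrote them. Swapping two reads would still succeed but silently assign values to the wrong fields.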
    5.2 Mapper function.
static class DataTotalMapper extends
			Mapper<LongWritable, Text, Text, DataWritable> {
		private Text mapOutputKey = new Text();
		private DataWritable dataWritable = new DataWritable();

		@Override
		public void map(LongWritable key, Text value, Context context)
				throws IOException, InterruptedException {
			String lineValue = value.toString();
			// split the tab-separated record
			String[] strs = lineValue.split("\t");
			// skip malformed records that lack the expected fields
			if (strs.length < 10) {
				return;
			}
			// get data
			String phoneNum = strs[1];
			// only process mobile phone records (11-character numbers)
			if (phoneNum.length() != 11) {
				return;
			}
			int upPackNum = Integer.parseInt(strs[6]);
			int downPackNum = Integer.parseInt(strs[7]);
			int upPayLoad = Integer.parseInt(strs[8]);
			int downPayLoad = Integer.parseInt(strs[9]);
			// set map output key / value
			mapOutputKey.set(phoneNum);
			dataWritable.set(upPackNum, upPayLoad, downPackNum, downPayLoad);
			context.write(mapOutputKey, dataWritable);
		}
	}
    5.3 Reduce function.
static class DataTotalReducer extends
			Reducer<Text, DataWritable, Text, DataWritable> {
		private DataWritable dataWritable = new DataWritable();

		public void reduce(Text key, Iterable<DataWritable> values,
				Context context) throws IOException, InterruptedException {
			int upPackNum = 0;
			int downPackNum = 0;
			int upPayLoad = 0;
			int downPayLoad = 0;
			for (DataWritable data : values) {
				upPackNum += data.getUpPackNum();
				downPackNum += data.getDownPackNum();
				upPayLoad += data.getUpPayLoad();
				downPayLoad += data.getDownPayLoad();
			}
			dataWritable.set(upPackNum, upPayLoad, downPackNum, downPayLoad);
			context.write(key, dataWritable);
		}
	}
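
The reduce step above sums the four counters for each phone number. The following sketch reproduces that grouping and summing locally in plain Java, without Hadoop, and formats each result as key, tab, value, using the same field order as DataWritable.toString(). The class `ReduceSketch`, its method, and the input records are hypothetical illustration data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ReduceSketch {

    // counters[i] holds {upPackNum, upPayLoad, downPackNum, downPayLoad} for phones[i].
    static List<String> sumByPhone(String[] phones, int[][] counters) {
        // TreeMap keeps keys sorted, similar to how reducer input keys arrive in order.
        Map<String, int[]> totals = new TreeMap<>();
        for (int i = 0; i < phones.length; i++) {
            int[] sum = totals.computeIfAbsent(phones[i], k -> new int[4]);
            for (int j = 0; j < 4; j++) {
                sum[j] += counters[i][j];
            }
        }
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, int[]> e : totals.entrySet()) {
            int[] s = e.getValue();
            lines.add(e.getKey() + "\t" + s[0] + "\t" + s[1] + "\t" + s[2] + "\t" + s[3]);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Made-up records: two for one phone number, one for another.
        String[] phones = { "13726230503", "13800000000", "13726230503" };
        int[][] counters = {
            { 24, 2481, 27, 24681 },
            { 1, 2, 3, 4 },
            { 10, 100, 20, 200 },
        };
        for (String line : sumByPhone(phones, counters)) {
            System.out.println(line);
        }
    }
}
```

Like the reducer, this sketch accumulates into int values. For large inputs the traffic totals could exceed the int range, so switching the counters to long is worth considering.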
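     5.4 Main function.
import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class DataTotalPhone {
	// default paths, used when no command-line arguments are given
	static final String INPUT_PATH = "hdfs://192.168.56.171:9000/DataPhone/HTTP_20130313143750.dat";
	static final String OUT_PATH = "hdfs://192.168.56.171:9000/DataPhone/out";

	// DataTotalMapper (5.2) and DataTotalReducer (5.3) are static nested classes of this class.

	public static void main(String[] args) throws ClassNotFoundException,
			IOException, InterruptedException, URISyntaxException {
		String input = args.length > 0 ? args[0] : INPUT_PATH;
		String output = args.length > 1 ? args[1] : OUT_PATH;
		Configuration conf = new Configuration();
		// delete the output directory if it already exists, otherwise the job fails
		final FileSystem fileSystem = FileSystem.get(new URI(output), conf);
		final Path outputDir = new Path(output);
		if (fileSystem.exists(outputDir)) {
			fileSystem.delete(outputDir, true);
		}
		// create job (this constructor is deprecated in newer Hadoop releases)
		Job job = new Job(conf, DataTotalPhone.class.getSimpleName());
		// set job
		job.setJarByClass(DataTotalPhone.class);
		// 1)input
		FileInputFormat.addInputPath(job, new Path(input));
		// 2)map
		job.setMapperClass(DataTotalMapper.class);
		job.setMapOutputKeyClass(Text.class);
		job.setMapOutputValueClass(DataWritable.class);
		// 3)reduce
		job.setReducerClass(DataTotalReducer.class);
		job.setOutputKeyClass(Text.class);
		job.setOutputValueClass(DataWritable.class);
		// 4)output
		FileOutputFormat.setOutputPath(job, outputDir);
		boolean isSuccess = job.waitForCompletion(true);
		// main is void, so report the job status through the exit code
		System.exit(isSuccess ? 0 : 1);
	}
}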
6. Result after running the program.

  

 
