Multiple-output part files with suffixes in Hadoop

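[Introduction]

The org.apache.hadoop.mapred.lib.MultipleTextOutputFormat class in the Hadoop source code provides a framework for multiple outputs. Custom multiple-output schemes can be built on top of it.

This post presents a multiple-output scheme that appends a suffix to each part file name, for example part-00000-[A-Z], part-00000-[a-z], and part-00000-[0-9].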

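[Main Text]
1.  Convention for the reduce output format:

<key, value>#suffix_letter

suffix_letter is the suffix character. The 62 characters in [0-9A-Za-z] are currently supported. It can be a constant or a variable.

value may be empty, in which case the tag is read from the end of the key.

Put simply, appending "#suffix_letter" to the end of an output line is all it takes to enable multiple outputs.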
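As an illustration of the convention above, here is a minimal sketch of how a reducer might build a tagged payload before emitting it. The class name SuffixTagger and its methods are hypothetical helpers introduced for this example; they are not part of Hadoop or of the original code.

```java
// Hypothetical helper, not part of the original post or of Hadoop.
final class SuffixTagger {

    // True only for the 62 characters the convention allows: [0-9A-Za-z].
    static boolean isValidSuffix(char c) {
        return (c >= '0' && c <= '9')
            || (c >= 'A' && c <= 'Z')
            || (c >= 'a' && c <= 'z');
    }

    // Appends "#<suffix>" to the payload, which is the value,
    // or the key when the value is empty.
    static String tag(String payload, char suffix) {
        if (!isValidSuffix(suffix)) {
            throw new IllegalArgumentException("suffix must be in [0-9A-Za-z]: " + suffix);
        }
        return payload + "#" + suffix;
    }

    public static void main(String[] args) {
        System.out.println(tag("42", 'A'));     // prints 42#A
        System.out.println(tag("hello", '7'));  // prints hello#7
    }
}
```

Validating the suffix on the producer side means a malformed tag fails at the point where it is created, rather than later inside the output format.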

 

2. Implementation

SuffixMultipleTextOutputFormat.java

package com.**.hadoop.mapred.lib;

import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat;
import org.apache.hadoop.io.Text;

public class SuffixMultipleTextOutputFormat extends MultipleTextOutputFormat<Text, Text> {
	
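	// Position of the '#' tag found by generateFileNameForKeyValue, or -1 if none has been found.
	private int tagPos = -1;
	
	// Whether the tag is carried by the value (true) or, for an empty value, by the key (false).
	private boolean has_value = true;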
	
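	// Debug helper: the file name must be generated first, because it sets the state used by the other two calls.
	public void test(Text key, Text value, String name) {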
		System.out.println("file_name = " + this.generateFileNameForKeyValue(key, value, name));
		System.out.println("key = " + this.generateActualKey(key, value));
		System.out.println("value = " + this.generateActualValue(key, value));
	}
	
	@Override
	protected Text generateActualKey(Text key, Text value) {
		if (!has_value && tagPos != -1) {
			return new Text(key.toString().substring(0, tagPos));
		}
		return key;
	}
	
	@Override
	protected Text generateActualValue(Text key, Text value) {
		if (has_value && tagPos != -1) {
			return new Text(value.toString().substring(0, tagPos));
		}
		return value;
	}
	
	@Override  
	protected String generateFileNameForKeyValue(Text key, Text value, String name) {
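		// This method runs before generateActualKey/generateActualValue for each record,
		// so reset the per-record state here to keep an earlier record from leaking into this one.
		tagPos = -1;
		has_value = true;
		String val = value.toString();
		if (val.isEmpty()) {
			has_value = false;
			val = key.toString();
		}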
		
		try {
			int pos = val.lastIndexOf('#');
			if (pos >= 0 && pos == val.length() - 2) {
				char suffix = val.charAt(pos+1);
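				// Accept only ASCII [0-9A-Za-z]; Character.isDigit/isLetter would also admit non-ASCII characters.
				boolean asciiAlnum = (suffix >= '0' && suffix <= '9') || (suffix >= 'A' && suffix <= 'Z') || (suffix >= 'a' && suffix <= 'z');
				if (asciiAlnum) {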
					tagPos = pos;
					
					return name + "-" + suffix;
				} else {
					throw new InvalidSuffixMultipleTextOutputFormatException("InvalidSuffixMultipleTextOutputFormatException : key = " + key.toString() + " , value = " + value.toString());
				}
			} else {
				throw new InvalidSuffixMultipleTextOutputFormatException("InvalidSuffixMultipleTextOutputFormatException : key = " + key.toString() + " , value = " + value.toString());
			}
		} catch (Exception e) {
			e.printStackTrace();
		}
		
		return name;
	}
	
	public static void main(String [] args) {
		new SuffixMultipleTextOutputFormat().test(new Text("abc"), new Text("#i"), "part-00000");
		new SuffixMultipleTextOutputFormat().test(new Text("abc"), new Text("#"), "part-00001");
		new SuffixMultipleTextOutputFormat().test(new Text("abc"), new Text("w#o"), "part-00001");
		new SuffixMultipleTextOutputFormat().test(new Text("abc#0"), new Text(""), "part-00001");
		new SuffixMultipleTextOutputFormat().test(new Text("abc#0"), new Text("a"), "part-00001");
	}

}

InvalidSuffixMultipleTextOutputFormatException.java

package com.**.hadoop.mapred.lib;

public class InvalidSuffixMultipleTextOutputFormatException extends Exception {

	private static final long serialVersionUID = -7900596082142417867L;

	public InvalidSuffixMultipleTextOutputFormatException(String error) {
		super(error);
	}
		
}
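
SuffixMultipleTextOutputFormat keeps tagPos and has_value as instance fields that are shared between the three overridden methods. One possible alternative is to parse the tag with a stateless helper that can be tested without Hadoop on the classpath. The following is a minimal sketch under that assumption; SuffixParser, Parsed, parse, and fileName are hypothetical names and do not appear in the original code.

```java
// Hypothetical stateless parser; names are illustrative only.
final class SuffixParser {

    // Result of splitting "payload#x" into its payload and suffix character.
    static final class Parsed {
        final String payload;
        final char suffix;

        Parsed(String payload, char suffix) {
            this.payload = payload;
            this.suffix = suffix;
        }
    }

    // Returns null unless the text ends with '#' followed by
    // exactly one character in [0-9A-Za-z].
    static Parsed parse(String text) {
        int pos = text.lastIndexOf('#');
        if (pos < 0 || pos != text.length() - 2) {
            return null;
        }
        char c = text.charAt(pos + 1);
        boolean ok = (c >= '0' && c <= '9')
                  || (c >= 'A' && c <= 'Z')
                  || (c >= 'a' && c <= 'z');
        return ok ? new Parsed(text.substring(0, pos), c) : null;
    }

    // Builds the file name as name + "-" + suffix, falling back to
    // the plain name when no valid tag is present.
    static String fileName(String baseName, String text) {
        Parsed p = parse(text);
        return p == null ? baseName : baseName + "-" + p.suffix;
    }

    public static void main(String[] args) {
        System.out.println(fileName("part-00000", "#i"));   // prints part-00000-i
        System.out.println(fileName("part-00001", "#"));    // prints part-00001
        System.out.println(fileName("part-00001", "w#o"));  // prints part-00001-o
    }
}
```

Because parse returns both the payload and the suffix in one step, each overridden method could call it independently instead of relying on the order in which the methods are invoked.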



 
