Hadoop MapReduce Input and Output Classes

weixin_33795743

Original article: http://www.cnblogs.com/zhanghuijunjava/archive/2013/04/27/3036537.html

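Default input format, TextInputFormat:

1) TextInputFormat is the default InputFormat for input files.

2) Each line of the file is one record. The key is the byte offset at which the line starts in the file (a LongWritable), and the value is the content of the line (a Text), without the line terminator.

3) By default a line ends at a line feed (\n) or a carriage return (\r); a \r\n pair counts as a single terminator.

4) TextInputFormat extends FileInputFormat.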
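The offset/line records described above can be sketched in plain Java without a Hadoop dependency. This is a simplified stand-in, not Hadoop's LineRecordReader: the class name TextRecordsSketch is hypothetical, keys are plain long byte offsets instead of LongWritable, values are UTF-8 Strings instead of Text, and the whole byte array is treated as a single split.

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the (offset, line) records TextInputFormat produces.
// Key: byte offset where the line starts; value: the line without its terminator.
final class TextRecordsSketch {
    static List<Map.Entry<Long, String>> records(byte[] data) {
        List<Map.Entry<Long, String>> out = new ArrayList<>();
        int start = 0;
        int i = 0;
        while (i < data.length) {
            byte b = data[i];
            if (b == '\n' || b == '\r') {
                out.add(new SimpleEntry<>((long) start,
                        new String(data, start, i - start, StandardCharsets.UTF_8)));
                // Treat "\r\n" as one terminator.
                if (b == '\r' && i + 1 < data.length && data[i + 1] == '\n') {
                    i++;
                }
                i++;
                start = i;
            } else {
                i++;
            }
        }
        // A last line without a trailing terminator is still a record.
        if (start < data.length) {
            out.add(new SimpleEntry<>((long) start,
                    new String(data, start, data.length - start, StandardCharsets.UTF_8)));
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] file = "hello world\r\nfoo\nbar".getBytes(StandardCharsets.UTF_8);
        for (Map.Entry<Long, String> r : records(file)) {
            System.out.println(r.getKey() + "\t" + r.getValue());
        }
    }
}
```

Note that the key is a byte offset, not a line number, so "foo" above gets key 13 because "hello world\r\n" occupies 13 bytes.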

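Input formats that ship with Hadoop:

1) CombineFileInputFormat:

Hadoop is better at processing a small number of large files than a large number of small files: FileInputFormat creates at least one split per file, so thousands of small files mean thousands of short-lived map tasks.

CombineFileInputFormat mitigates this problem. It is designed for small files and packs many of them into each split, so a single map task processes several files.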
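The packing idea can be sketched in plain Java. This is a simplified assumption, not the real algorithm: CombineSplitsSketch and pack are hypothetical names, files are packed greedily in input order up to a byte limit, and the data locality and block boundaries that the real CombineFileInputFormat takes into account are ignored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: greedily pack small files, in input order, into
// splits of at most maxSplitSize bytes (locality and blocks are ignored).
final class CombineSplitsSketch {
    static List<List<String>> pack(List<String> names, List<Long> sizes, long maxSplitSize) {
        List<List<String>> splits = new ArrayList<>();
        List<String> current = new ArrayList<>();
        long currentSize = 0;
        for (int i = 0; i < names.size(); i++) {
            long size = sizes.get(i);
            // Start a new split if this file would push the current one over the limit.
            if (!current.isEmpty() && currentSize + size > maxSplitSize) {
                splits.add(current);
                current = new ArrayList<>();
                currentSize = 0;
            }
            current.add(names.get(i));
            currentSize += size;
        }
        if (!current.isEmpty()) {
            splits.add(current);
        }
        return splits;
    }

    public static void main(String[] args) {
        List<List<String>> splits = pack(
                Arrays.asList("a.log", "b.log", "c.log", "d.log", "e.log"),
                Arrays.asList(10L, 20L, 30L, 40L, 5L),
                60L);
        System.out.println(splits); // [[a.log, b.log, c.log], [d.log, e.log]]
    }
}
```

Five files become two map tasks instead of five; a file larger than the limit still ends up in a split of its own.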

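2) KeyValueTextInputFormat:

KeyValueTextInputFormat is well suited to input in which each line holds two columns separated by a tab. Each line is split at the first tab: the text before it becomes the key and the remainder becomes the value, both as Text.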
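A plain-Java sketch of that per-line split, assuming the default tab separator (in the new API the separator can be changed through the mapreduce.input.keyvaluelinerecordreader.key.value.separator property). KeyValueLineSketch is a hypothetical helper, not a Hadoop class, and returns Strings instead of Text. A line without the separator yields the whole line as the key and an empty value, which matches how KeyValueLineRecordReader handles that case.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

// Hypothetical sketch: split one line at the FIRST separator into key and value.
final class KeyValueLineSketch {
    static Map.Entry<String, String> split(String line, char separator) {
        int pos = line.indexOf(separator);
        if (pos == -1) {
            // No separator: the whole line is the key, the value is empty.
            return new SimpleEntry<>(line, "");
        }
        // Later separators stay inside the value.
        return new SimpleEntry<>(line.substring(0, pos), line.substring(pos + 1));
    }

    public static void main(String[] args) {
        Map.Entry<String, String> kv = split("user42\tclicked\thome", '\t');
        System.out.println(kv.getKey() + " -> " + kv.getValue());
    }
}
```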

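3) NLineInputFormat:

NLineInputFormat controls how many lines of input go into each split, so each mapper receives N lines (the last split may receive fewer). Keys and values are the same as with TextInputFormat.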
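The splitting rule can be sketched in plain Java. NLineSplitsSketch is hypothetical and works on an in-memory list of lines rather than files; in a real new-API job, N is set with NLineInputFormat.setNumLinesPerSplit(job, n) and defaults to 1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: every n consecutive lines form one split (one map task).
final class NLineSplitsSketch {
    static List<List<String>> splits(List<String> lines, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        List<List<String>> out = new ArrayList<>();
        for (int i = 0; i < lines.size(); i += n) {
            // The last split may hold fewer than n lines.
            out.add(new ArrayList<>(lines.subList(i, Math.min(i + n, lines.size()))));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("l1", "l2", "l3", "l4", "l5");
        System.out.println(splits(lines, 2)); // [[l1, l2], [l3, l4], [l5]]
    }
}
```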

4) SequenceFileInputFormat:

When the input files are SequenceFiles (Hadoop's binary key/value file format), use SequenceFileInputFormat as the input format.

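Writing a custom input format:

1) Extend the FileInputFormat base class.

2) Override its isSplitable(FileSystem fs, Path filename) method when a file must not be divided into several splits (return false in that case).

3) Implement the getRecordReader() method, which returns the RecordReader that turns a split into key/value pairs.

These signatures come from the old org.apache.hadoop.mapred API (in the newer org.apache.hadoop.mapreduce API the corresponding methods are isSplitable(JobContext context, Path filename) and createRecordReader(InputSplit split, TaskAttemptContext context)). The old-API InputFormat interface is: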

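package org.apache.hadoop.mapred;

import java.io.IOException;

// Old-API InputFormat; InputSplit, JobConf, RecordReader and Reporter are in the same package.
public interface InputFormat<K, V> {

    // Logically divides the job's input into splits; numSplits is only a hint.
    InputSplit[] getSplits(JobConf job, int numSplits) throws IOException;

    // Returns the RecordReader that reads key/value pairs from one split.
    RecordReader<K, V> getRecordReader(InputSplit split, JobConf job, Reporter reporter) throws IOException;
}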
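The three steps for a custom format can be illustrated without a Hadoop dependency. Everything below is hypothetical: MiniSplit, MiniFileInputFormat and WholeFileFormat are plain-Java stand-ins that only mirror the shape of FileInputFormat's isSplitable() and getRecordReader() hooks, with a simplified fixed-size splitting rule. A real implementation would extend org.apache.hadoop.mapred.FileInputFormat and return a RecordReader rather than a list.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// HYPOTHETICAL stand-ins, not Hadoop classes.
final class MiniSplit {
    final String file;
    final int start;
    final int length;

    MiniSplit(String file, int start, int length) {
        this.file = file;
        this.start = start;
        this.length = length;
    }
}

abstract class MiniFileInputFormat<K, V> {
    // Step 2 hook: return false when a file must be read whole by one task.
    protected boolean isSplitable(String file) {
        return true;
    }

    // Simplified splitting: fixed-size byte ranges, or one split if not splittable.
    final List<MiniSplit> getSplits(String file, int fileLength, int splitSize) {
        if (splitSize < 1) {
            throw new IllegalArgumentException("splitSize must be >= 1");
        }
        List<MiniSplit> splits = new ArrayList<>();
        if (!isSplitable(file)) {
            splits.add(new MiniSplit(file, 0, fileLength));
            return splits;
        }
        for (int start = 0; start < fileLength; start += splitSize) {
            splits.add(new MiniSplit(file, start, Math.min(splitSize, fileLength - start)));
        }
        return splits;
    }

    // Step 3 hook: turn one split into key/value records.
    abstract List<Map.Entry<K, V>> getRecordReader(MiniSplit split, byte[] fileBytes);
}

// Step 1: a custom format built on the base class; each whole file is one record.
final class WholeFileFormat extends MiniFileInputFormat<String, byte[]> {
    @Override
    protected boolean isSplitable(String file) {
        return false; // never split
    }

    @Override
    List<Map.Entry<String, byte[]>> getRecordReader(MiniSplit split, byte[] fileBytes) {
        byte[] value = Arrays.copyOfRange(fileBytes, split.start, split.start + split.length);
        List<Map.Entry<String, byte[]>> records = new ArrayList<>();
        records.add(new SimpleEntry<>(split.file, value)); // key = file name, value = bytes
        return records;
    }
}
```

Returning false from isSplitable is the usual choice for input that cannot be read from an arbitrary offset, such as gzip-compressed files.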

Hadoop output formats:

1) TextOutputFormat:

The default output format. Each record is written as one line of text, with the key and value separated by a tab.
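A plain-Java sketch of how one record becomes a line. TextOutputLineSketch is hypothetical; it assumes keys and values are rendered with toString(), as TextOutputFormat does, and models Hadoop's NullWritable as Java null: a null side is written without the separator, and a record with both sides null produces nothing. In the new API the tab can be replaced through the mapreduce.output.textoutputformat.separator property.

```java
// Hypothetical sketch of TextOutputFormat's line layout: key, separator, value, "\n".
final class TextOutputLineSketch {
    static String format(Object key, Object value, String separator) {
        boolean nullKey = key == null;
        boolean nullValue = value == null;
        if (nullKey && nullValue) {
            return ""; // nothing is written for this record
        }
        StringBuilder sb = new StringBuilder();
        if (!nullKey) {
            sb.append(key);
        }
        // The separator only appears when both key and value are present.
        if (!nullKey && !nullValue) {
            sb.append(separator);
        }
        if (!nullValue) {
            sb.append(value);
        }
        return sb.append('\n').toString();
    }

    public static void main(String[] args) {
        System.out.print(format("apple", 3, "\t"));
        System.out.print(format(null, "value-only", "\t"));
    }
}
```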

2) SequenceFileOutputFormat:

Writes keys and values to a SequenceFile.

3) SequenceFileAsBinaryOutputFormat:

Writes keys and values to a SequenceFile in their raw binary form.

4) MapFileOutputFormat:

Writes keys and values to a MapFile. Because the keys in a MapFile are kept sorted, records must be written in key order.
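The ordering rule can be sketched as an append-only writer. SortedAppendSketch is a hypothetical stand-in for MapFile.Writer: like the real writer it accepts equal or increasing keys and rejects a key that sorts before the previous one, but it throws IllegalStateException where MapFile.Writer throws an IOException. Because a reducer receives its keys in sorted order, a reducer that emits keys in the order it receives them normally satisfies this rule.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of MapFile's ordering check: keys must be non-decreasing.
final class SortedAppendSketch<K extends Comparable<K>> {
    private final List<K> keys = new ArrayList<>();
    private K lastKey;

    void append(K key) {
        // Reject a key that sorts before the previously appended one.
        if (lastKey != null && key.compareTo(lastKey) < 0) {
            throw new IllegalStateException("key out of order: " + key + " after " + lastKey);
        }
        keys.add(key);
        lastKey = key;
    }

    List<K> appendedKeys() {
        return keys;
    }

    public static void main(String[] args) {
        SortedAppendSketch<String> writer = new SortedAppendSketch<>();
        writer.append("apple");
        writer.append("banana");
        try {
            writer.append("aardvark"); // sorts before "banana"
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```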

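5) MultipleOutputFormat:

By default each reducer produces one output file, but sometimes we want a single reducer to produce several outputs. MultipleOutputFormat (old mapred API) and MultipleOutputs make this possible.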
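The routing idea behind MultipleOutputFormat can be sketched in plain Java. MultiOutputSketch is hypothetical: a caller-supplied function picks an output file name for each (key, value) pair, playing the role that generateFileNameForKeyValue(key, value, name) plays when you subclass the old API's MultipleTextOutputFormat. The sketch collects tab-separated lines in memory instead of writing files.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

// Hypothetical sketch: route each record to an output "file" chosen from its key and value.
final class MultiOutputSketch {
    static Map<String, List<String>> route(List<String[]> records,
                                           BiFunction<String, String, String> fileNameFor) {
        Map<String, List<String>> files = new LinkedHashMap<>();
        for (String[] kv : records) {
            String file = fileNameFor.apply(kv[0], kv[1]);
            // Each file gets its own list of TextOutputFormat-style lines.
            files.computeIfAbsent(file, f -> new ArrayList<>()).add(kv[0] + "\t" + kv[1]);
        }
        return files;
    }

    public static void main(String[] args) {
        List<String[]> records = new ArrayList<>();
        records.add(new String[] {"cn", "1"});
        records.add(new String[] {"us", "2"});
        records.add(new String[] {"cn", "3"});
        System.out.println(route(records, (k, v) -> k + "/part-00000").keySet());
    }
}
```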
