Hadoop Source Code Explained: The FileInputFormat Class

说文科技

Article link: https://blog.csdn.net/liu16659/article/details/85198956

Hadoop Source Code Explained: The `FileInputFormat` Class [updating…]

1. Class description

A base class for file-based InputFormats.

`FileInputFormat` is the base class for all file-based InputFormats. It provides a generic implementation of `getSplits(JobContext)`. Implementations of `FileInputFormat` can also override the `isSplitable(JobContext, Path)` method to prevent input files from being split up in certain situations. Implementations that may deal with non-splittable files must override this method, since the default implementation assumes splitting is always possible.
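To make the relationship between `getSplits` and `isSplitable` more concrete, here is a minimal pure-Java sketch of how a file might be cut into splits. It is a simplified model, not Hadoop's actual code: the class `SplitSketch`, the method `planSplits`, and the `{offset, length}` representation are hypothetical names chosen for illustration, and the clamping formula and the 1.1 slack factor are assumptions about the default behavior that should be checked against the Hadoop source for your version.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified, hypothetical model of file-split planning; not Hadoop's actual code. */
final class SplitSketch {
    // Assumed slack factor: the final split may be up to 10% larger than splitSize.
    static final double SPLIT_SLOP = 1.1;

    /** Clamps the block size between the configured minimum and maximum split sizes. */
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    /** Returns {offset, length} pairs for one non-empty file. */
    static List<long[]> planSplits(long fileLength, long splitSize, boolean splitable) {
        List<long[]> splits = new ArrayList<>();
        if (!splitable) {
            // A non-splittable file (for example, some compressed formats) becomes one split.
            splits.add(new long[] {0L, fileLength});
            return splits;
        }
        long remaining = fileLength;
        // Emit full-size splits while clearly more than one split's worth of data remains.
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits.add(new long[] {fileLength - remaining, splitSize});
            remaining -= splitSize;
        }
        // Whatever is left (possibly slightly larger than splitSize) forms the last split.
        if (remaining != 0) {
            splits.add(new long[] {fileLength - remaining, remaining});
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long splitSize = computeSplitSize(128 * mb, 1L, Long.MAX_VALUE);
        for (long[] s : planSplits(300 * mb, splitSize, true)) {
            System.out.println("offset=" + s[0] + " length=" + s[1]);
        }
    }
}
```

Under these assumptions, a 300 MB file with a 128 MB split size yields three splits (128 MB, 128 MB, 44 MB), while the same file with `splitable` set to `false` yields a single split covering the whole file. That single-split case is what a subclass gets by overriding `isSplitable` to return `false`.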

2. Class source

public abstract class FileInputFormat<K, V> extends InputFormat<K, V> {
...
}

3. Method details

3.1 The `setInputPaths()` method

Sets the given comma separated paths as the list of inputs for the map-reduce job. (This is the description of the `setInputPaths(Job job, String commaSeparatedPaths)` overload.)

static void 	setInputPaths(Job job, Path... inputPaths)
Set the array of Paths as the list of inputs for the map-reduce job.

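Note: when calling this method, you can see a parameter whose name starts with `commaSeparate` (the `commaSeparatedPaths` parameter of the `String` overload). It indicates that the argument can be a comma-separated list of files.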
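As a rough illustration of what "comma-separated paths" means, the sketch below splits such a string into individual path strings in plain Java. It is only a model: `CommaPathsSketch` and `splitPaths` are hypothetical names, the example paths are made up, and the rule that commas inside a `{...}` glob group are not treated as separators is an assumption about Hadoop's parsing that should be verified against the source of the version you use.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of splitting a comma-separated path list; not Hadoop's actual code. */
final class CommaPathsSketch {
    /** Splits on commas, except those nested inside a {...} glob group (assumed behavior). */
    static List<String> splitPaths(String commaSeparatedPaths) {
        List<String> paths = new ArrayList<>();
        int braceDepth = 0; // how many '{' are currently open
        int start = 0;      // index where the current path begins
        for (int i = 0; i < commaSeparatedPaths.length(); i++) {
            char ch = commaSeparatedPaths.charAt(i);
            if (ch == '{') {
                braceDepth++;
            } else if (ch == '}' && braceDepth > 0) {
                braceDepth--;
            } else if (ch == ',' && braceDepth == 0) {
                paths.add(commaSeparatedPaths.substring(start, i));
                start = i + 1;
            }
        }
        // The text after the last separator is the final path.
        paths.add(commaSeparatedPaths.substring(start));
        return paths;
    }

    public static void main(String[] args) {
        // Made-up example paths, for illustration only.
        String input = "/data/2018/a.txt,/data/2018/b.txt,/logs/{x,y}/part-*";
        for (String p : splitPaths(input)) {
            System.out.println(p);
        }
    }
}
```

With Hadoop on the classpath, the corresponding real calls would be `FileInputFormat.setInputPaths(job, "/data/2018/a.txt,/data/2018/b.txt")` for the `String` overload, or `FileInputFormat.setInputPaths(job, new Path("/data/2018/a.txt"), new Path("/data/2018/b.txt"))` for the `Path...` overload shown above.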
