Hadoop Source Code Explained: The FileInputFormat Class [updating…]
1. Class description
A base class for file-based InputFormats.
That is, it is a single base class for InputFormats that read their input from files.
FileInputFormat is the base class for all file-based InputFormats. This provides a generic implementation of getSplits(JobContext). Implementations of FileInputFormat can also override the isSplitable(JobContext, Path) method to prevent input files from being split-up in certain situations. Implementations that may deal with non-splittable files must override this method, since the default implementation assumes splitting is always possible.
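In other words, FileInputFormat is the base class for all file-based InputFormats, and it provides a generic implementation of getSplits(JobContext). Subclasses of FileInputFormat can also override isSplitable(JobContext, Path) to prevent input files from being split in certain situations. A subclass that may handle non-splittable files must override this method, because the default implementation assumes that splitting is always possible.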
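To make the relationship between getSplits and isSplitable concrete, here is a minimal sketch that does not depend on Hadoop. Everything in it is a hypothetical stand-in: ModelInputFormat, GzipAwareInputFormat, Range, and the String-based isSplitable are illustration names, not Hadoop classes, and the fixed splitSize parameter is a simplification of how Hadoop actually sizes splits. It only models the decision described above: the generic implementation asks isSplitable, and a subclass overrides it to keep certain files whole.

```java
import java.util.ArrayList;
import java.util.List;

class SplitSketch {
    // Hypothetical stand-in for an input split: a byte range within one file.
    static class Range {
        final long start;
        final long length;
        Range(long start, long length) {
            this.start = start;
            this.length = length;
        }
        @Override
        public String toString() {
            return "[" + start + ", +" + length + ")";
        }
    }

    // Hypothetical stand-in for FileInputFormat; only the split decision is modeled.
    static class ModelInputFormat {
        // Mirrors the documented default: assume every file can be split.
        boolean isSplitable(String file) {
            return true;
        }

        // Generic split logic: one range per file if it must stay whole,
        // otherwise consecutive ranges of at most splitSize bytes.
        List<Range> getSplits(String file, long fileLength, long splitSize) {
            List<Range> splits = new ArrayList<>();
            if (!isSplitable(file)) {
                splits.add(new Range(0, fileLength));
                return splits;
            }
            for (long start = 0; start < fileLength; start += splitSize) {
                splits.add(new Range(start, Math.min(splitSize, fileLength - start)));
            }
            return splits;
        }
    }

    // A subclass that refuses to split files it assumes are gzip-compressed.
    static class GzipAwareInputFormat extends ModelInputFormat {
        @Override
        boolean isSplitable(String file) {
            return !file.endsWith(".gz");
        }
    }

    public static void main(String[] args) {
        ModelInputFormat format = new GzipAwareInputFormat();
        System.out.println(format.getSplits("part-0.txt", 250, 100));
        System.out.println(format.getSplits("part-0.gz", 250, 100));
    }
}
```

In real Hadoop code the override has the signature given in the Javadoc, isSplitable(JobContext, Path), and is written in a subclass of FileInputFormat; the sketch only shows why forgetting the override would let the generic logic cut a non-splittable file into pieces.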
2. Class source
public abstract class FileInputFormat<K, V> extends InputFormat<K, V> {
...
}
3. Method details
3.1 The setInputPaths() method
Sets the given comma separated paths as the list of inputs for the map-reduce job.
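That is, the given comma-separated paths become the list of input files for the map-reduce job. This description belongs to the overload that takes the paths as a single String, setInputPaths(Job job, String commaSeparatedPaths).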
static void setInputPaths(Job job, Path... inputPaths)
Set the array of Paths as the list of inputs for the map-reduce job.
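Note that when you call this method you will see a parameter named commaSeparatedPaths; the name indicates that a comma-separated list of files can be passed in that position.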
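The idea of a comma-separated path list can be sketched without Hadoop as follows. The helper splitPaths is hypothetical and is not a Hadoop API; it only models turning one String into several paths. As an additional assumption, it keeps commas inside glob braces such as {2019,2020} together, so check the Hadoop source for the exact parsing rules before relying on that behavior.

```java
import java.util.ArrayList;
import java.util.List;

class InputPathsSketch {
    // Hypothetical helper (not part of Hadoop): split on commas,
    // but leave commas inside {...} alone so glob alternatives survive.
    static List<String> splitPaths(String commaSeparatedPaths) {
        List<String> paths = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int braceDepth = 0;
        for (char c : commaSeparatedPaths.toCharArray()) {
            if (c == '{') {
                braceDepth++;
            } else if (c == '}' && braceDepth > 0) {
                braceDepth--;
            }
            if (c == ',' && braceDepth == 0) {
                // A top-level comma ends the current path.
                paths.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        paths.add(current.toString());
        return paths;
    }

    public static void main(String[] args) {
        System.out.println(splitPaths("/data/a,/data/b"));
        System.out.println(splitPaths("/logs/{2019,2020}/*.txt,/data/b"));
    }
}
```

With Hadoop on the classpath, the two overloads would be called as FileInputFormat.setInputPaths(job, "/data/a,/data/b") and FileInputFormat.setInputPaths(job, new Path("/data/a"), new Path("/data/b")); the paths here are made-up examples.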