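SequenceFile is a splittable binary file format designed for MapReduce that stores data as key/value pairs. When logs are stored as plain text, each line of text represents one log record, but plain text is not suitable for recording binary data. A SequenceFile can also serve as a container for small files.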
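To make the container idea concrete, here is a minimal sketch that uses only the Java standard library. It packs several small "files" into one byte array as length-prefixed key/value records, with the file name as the key and the raw bytes as the value. This is not the real SequenceFile on-disk format (which also carries a header, sync markers, and optional compression); the class name `KeyValueContainerSketch` and the methods `pack` and `unpack` are made up for illustration only.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration class; not part of Hadoop.
public class KeyValueContainerSketch {

    // Write each entry as: key length, key bytes, value length, value bytes.
    static byte[] pack(Map<String, byte[]> smallFiles) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            for (Map.Entry<String, byte[]> entry : smallFiles.entrySet()) {
                byte[] key = entry.getKey().getBytes(StandardCharsets.UTF_8);
                byte[] value = entry.getValue();
                out.writeInt(key.length);
                out.write(key);
                out.writeInt(value.length);
                out.write(value);
            }
        }
        return buffer.toByteArray();
    }

    // Read records back in the order they were written.
    static Map<String, byte[]> unpack(byte[] container) throws IOException {
        Map<String, byte[]> result = new LinkedHashMap<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(container))) {
            while (in.available() > 0) {
                byte[] key = new byte[in.readInt()];
                in.readFully(key);
                byte[] value = new byte[in.readInt()];
                in.readFully(value);
                result.put(new String(key, StandardCharsets.UTF_8), value);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Map<String, byte[]> files = new LinkedHashMap<>();
        // The value contains newline and zero bytes, which a line-per-record text file could not hold safely.
        files.put("a.bin", new byte[] {0, 10, 13, (byte) 0xFF});
        files.put("b.txt", "hello".getBytes(StandardCharsets.UTF_8));

        Map<String, byte[]> restored = unpack(pack(files));
        for (Map.Entry<String, byte[]> entry : restored.entrySet()) {
            System.out.println(entry.getKey() + ": " + entry.getValue().length + " bytes");
        }
    }
}
```

Because every record carries explicit lengths, a value may contain any byte, including line breaks, which is the property that makes a key/value binary format a better fit than plain text for binary data.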
write
First, let's look at how to write a SequenceFile in Hadoop.
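// Imports needed by this snippet (they belong at the top of the enclosing class file)
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

// Sample text lines that will be written as the record values
private static final String[] DATA = {
    "One, two, buckle my shoe",
    "Three, four, shut the door",
    "Five, six, pick up sticks",
    "Seven, eight, lay them straight",
    "Nine, ten, a big fat hen"
};

public static void main(String[] args) throws IOException {
    Configuration configuration = new Configuration();
    configuration.set("fs.defaultFS", "hdfs://hadoop:9000");
    FileSystem fs = FileSystem.get(configuration);
    Path path = new Path("hdfs://hadoop:9000/hadoop/seq/numbers.seq");
    // Reusable key and value objects; their classes define the record types
    IntWritable key = new IntWritable();
    Text value = new Text();
    SequenceFile.Writer.Option valueOption = SequenceFile.Writer.valueClass(value.getClass());
    SequenceFile.Writer.Option keyOption = SequenceFile.Writer.keyClass(key.getClass());
    SequenceFile.Writer.Option file = SequenceFile.Writer.file(path);
    // When the file option is specified, the stream option is not needed
    // SequenceFile.Writer.Option stream = SequenceFil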