Write operations
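As described in the previous post, since Hadoop 2.x SequenceFile.Writer has been phasing out its many createWriter() overloads in favor of a single, more concise createWriter() method. Apart from the Configuration, every other parameter is passed as a SequenceFile.Writer.Option, created through static factory methods such as Writer.file(), Writer.keyClass() and Writer.valueClass().
The options provided by the new API are: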
FileOption
FileSystemOption
StreamOption
BufferSizeOption
BlockSizeOption
ReplicationOption
KeyClassOption
ValueClassOption
MetadataOption
ProgressableOption
CompressionOption
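These options cover a wide range of needs and can be passed in any order, which reduces the amount of code to write and makes the call more intuitive and easier to understand. Let's first look at the method itself; a concrete example follows later.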
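To illustrate why the order of the options does not matter, here is a minimal plain-Java sketch of the varargs-option pattern. It is not Hadoop's implementation: the class OptionPatternSketch, its nested option classes, the helper find(), the stand-in describe(), the file name "bb.seq" and the default values are all hypothetical and exist only for this illustration. The idea is simply that the factory looks each option up by its type rather than by its position.

```java
import java.util.Optional;

public class OptionPatternSketch {

    /** Marker interface, playing the role of SequenceFile.Writer.Option (hypothetical). */
    interface Option {}

    static final class FileOption implements Option {
        final String path;
        FileOption(String path) { this.path = path; }
    }

    static final class KeyClassOption implements Option {
        final Class<?> keyClass;
        KeyClassOption(Class<?> keyClass) { this.keyClass = keyClass; }
    }

    static final class BufferSizeOption implements Option {
        final int size;
        BufferSizeOption(int size) { this.size = size; }
    }

    // Static factories, analogous in spirit to Writer.file(), Writer.keyClass(), ...
    public static Option file(String path) { return new FileOption(path); }
    public static Option keyClass(Class<?> c) { return new KeyClassOption(c); }
    public static Option bufferSize(int n) { return new BufferSizeOption(n); }

    /** Returns the first option of the requested type, wherever it sits in the array. */
    static <T extends Option> Optional<T> find(Class<T> type, Option... opts) {
        for (Option o : opts) {
            if (type.isInstance(o)) {
                return Optional.of(type.cast(o));
            }
        }
        return Optional.empty();
    }

    /** Stand-in for createWriter(conf, opts...): only reports what it would use. */
    public static String describe(Option... opts) {
        String path = find(FileOption.class, opts)
                .map(o -> o.path)
                .orElseThrow(() -> new IllegalArgumentException("file option is required"));
        String key = find(KeyClassOption.class, opts)
                .map(o -> o.keyClass.getSimpleName())
                .orElse("Object");            // assumed default, for illustration only
        int buffer = find(BufferSizeOption.class, opts)
                .map(o -> o.size)
                .orElse(4096);                // assumed default, for illustration only
        return path + " key=" + key + " buffer=" + buffer;
    }

    public static void main(String[] args) {
        // Both calls print the same line: bb.seq key=Integer buffer=4096
        System.out.println(describe(file("bb.seq"), keyClass(Integer.class)));
        System.out.println(describe(keyClass(Integer.class), file("bb.seq")));
    }
}
```

Because each option carries its own type, options that are left out can fall back to defaults, and new options can be added without introducing yet another overload.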
createWriter
public static org.apache.hadoop.io.SequenceFile.Writer createWriter(Configuration conf, org.apache.hadoop.io.SequenceFile.Writer.Option... opts)
throws IOException
Create a new Writer with the given options.
Parameters:
conf - the configuration to use
opts - the options to create the file with
Returns:
a new Writer
Throws:
IOException
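(The examples below have been modified and tested by the author.)
Hadoop: The Definitive Guide (4th edition) provides a SequenceFileWriteDemo example: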
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SequenceFileWriteDemo {

    private static final String[] DATA = {
        "One, two, buckle my shoe",
        "Three, four, shut the door",
        "Five, six, pick up sticks",
        "Seven, eight, lay them straight",
        "Nine, ten, a big fat hen"
    };

    public static void main(String[] args) throws IOException {
        // The "file" scheme selects the local file system.
        String uri = "file:///E://IDEA//aa.txt";

        Configuration conf = new Configuration();
        // "fs.default.name" is the deprecated name of "fs.defaultFS". The default
        // file system only applies to URIs without a scheme, so it is not used
        // for the file:// URI above.
        conf.set("fs.default.name", "hdfs://172.16.11.222:9000");

        FileSystem fs = FileSystem.get(URI.create(uri), conf);
        Path path = new Path(uri);

        IntWritable key = new IntWritable();
        Text value = new Text();
        SequenceFile.Writer writer = null;
        try {
            // Old-style overload, deprecated since Hadoop 2.x.
            writer = SequenceFile.createWriter(fs, conf, path,
                    key.getClass(), value.getClass());

            for (int i = 0; i < 100; i++) {
                key.set(100 - i);
                value.set(DATA[i % DATA.length]);
                // getLength() is the current position, i.e. where this record starts.
                System.out.printf("[%s]\t%s\t%s\n", writer.getLength(), key, value);
                writer.append(key, value);
            }
        } finally {
            IOUtils.closeStream(writer);
        }
    }
}
Rewriting the createWriter() call in the example above with the new consolidated method gives the following code:
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.SequenceFile.CompressionType;
import org.apache.hadoop.io.SequenceFile.Writer;
import org.apache.hadoop.io.Text;

public class THT_testSequenceFile2 {

    private static final String[] DATA = { "One, two, buckle my shoe",
            "Three, four, shut the door", "Five, six, pick up sticks",
            "Seven, eight, lay them straight", "Nine, ten, a big fat hen" };

    public static void main(String[] args) throws IOException {

        String uri = "file:///E://IDEA//bb.txt";
        Configuration conf = new Configuration();
        // Deprecated name of "fs.defaultFS"; not used for the file:// path above.
        conf.set("fs.default.name", "hdfs://172.16.11.222:9000");
        Path path = new Path(uri);

        IntWritable key = new IntWritable();
        Text value = new Text();
        SequenceFile.Writer writer = null;
        // The factory methods return Writer.Option. The concrete option classes
        // are package-private, so declaring the variables as Writer.Option avoids
        // the casts and lets this class live outside org.apache.hadoop.io.
        Writer.Option option1 = Writer.file(path);
        Writer.Option option2 = Writer.keyClass(key.getClass());
        Writer.Option option3 = Writer.valueClass(value.getClass());

        try {
            // Options may be passed in any order; RECORD compresses each value separately.
            writer = SequenceFile.createWriter(conf, option1, option2, option3,
                    Writer.compression(CompressionType.RECORD));

            for (int i = 0; i < 10; i++) {
                key.set(1 + i);
                value.set(DATA[i % DATA.length]);
                System.out.printf("[%s]\t%s\t%s\n", writer.getLength(), key,
                        value);
                writer.append(key, value);
            }
        } finally {
            IOUtils.closeStream(writer);
        }
    }
}
The output is as follows:
2017-07-20 22:15:05,027 INFO compress.CodecPool (CodecPool.java:getCompressor(153)) - Got brand-new compressor [.deflate]
[128] 1 One, two, buckle my shoe
[173] 2 Three, four, shut the door
[220] 3 Five, six, pick up sticks
[264] 4 Seven, eight, lay them straight
[314] 5 Nine, ten, a big fat hen
[359] 6 One, two, buckle my shoe
[404] 7 Three, four, shut the door
[451] 8 Five, six, pick up sticks
[495] 9 Seven, eight, lay them straight
[545] 10 Nine, ten, a big fat hen
[Screenshot: the generated file]
Read operations
The options provided by the new API are:
FileOption - specifies which file to read
InputStreamOption
StartOption
LengthOption - limits how many bytes are read, according to the configured length
BufferSizeOption
OnlyHeaderOption
Here is the source code using the latest API:
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.SequenceFile.Reader;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.ReflectionUtils;

public class THT_testSequenceFile3 {

    public static void main(String[] args) throws IOException {

        String uri = "file:///E://IDEA//bb.txt";
        Configuration conf = new Configuration();
        Path path = new Path(uri);
        // Which file to read, and how many bytes of it to read.
        SequenceFile.Reader.Option option1 = Reader.file(path);
        SequenceFile.Reader.Option option2 = Reader.length(174);
        SequenceFile.Reader reader = null;
        try {
            reader = new SequenceFile.Reader(conf, option1, option2);
            // The key and value classes are stored in the file header.
            Writable key = (Writable) ReflectionUtils.newInstance(
                    reader.getKeyClass(), conf);
            Writable value = (Writable) ReflectionUtils.newInstance(
                    reader.getValueClass(), conf);
            long position = reader.getPosition();
            while (reader.next(key, value)) {
                // Mark records that were preceded by a sync marker.
                String syncSeen = reader.syncSeen() ? "*" : "";
                System.out.printf("[%s%s]\t%s\t%s\n", position, syncSeen, key,
                        value);
                position = reader.getPosition(); // start of the next record
            }
        } finally {
            IOUtils.closeStream(reader);
        }
    }
}
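Here I set a read-length option so that reading stops at byte 174. Only the records that start before that offset are returned, so the output is as follows: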
2017-07-20 22:15:05,089 INFO compress.CodecPool (CodecPool.java:getDecompressor(181)) - Got brand-new decompressor [.deflate]
[128] 1 One, two, buckle my shoe
[173] 2 Three, four, shut the door
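The effect of the length option on this file can be modelled with a small plain-Java sketch. It assumes that the reader returns every record whose start offset lies before the limit and stops once its position reaches the limit. That rule is inferred from the output above rather than taken from Hadoop's documentation, and the class LengthLimitSketch and the method visibleRecords() are hypothetical names used only here. The start offsets are the ones printed by the writer earlier.

```java
import java.util.ArrayList;
import java.util.List;

public class LengthLimitSketch {

    /**
     * Returns the start offsets of the records a length-limited reader would
     * return, assuming a record is read whenever it starts before the limit.
     * The offsets must be in ascending order.
     */
    public static List<Long> visibleRecords(long[] recordStarts, long length) {
        List<Long> seen = new ArrayList<>();
        for (long start : recordStarts) {
            if (start >= length) {
                break; // the reader position has reached the limit
            }
            seen.add(start);
        }
        return seen;
    }

    public static void main(String[] args) {
        // Record start offsets printed by the writer example.
        long[] starts = {128, 173, 220, 264, 314, 359, 404, 451, 495, 545};
        System.out.println(visibleRecords(starts, 174)); // prints [128, 173]
    }
}
```

Under this assumption the second record, which starts at offset 173 and ends at 220, is still returned in full even though the limit is 174, which matches the two lines of output shown above.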