Study Notes on "Hadoop: The Definitive Guide": Hadoop I/O, SequenceFile

SequenceFile is a file-based data structure designed for storing large files. Its defining feature is that it stores data as binary key-value pairs.
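To make "binary key-value pairs" more concrete, here is a minimal sketch of my own. It is not Hadoop's actual on-disk format: there is no file header, no sync marker, and no Writable serialization. It only assumes a simplified record layout of record length, key length, key bytes, value bytes, with an int key and a UTF-8 string value. The class name BinaryRecordSketch and its methods are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only: a simplified binary key-value layout, NOT the real SequenceFile format.
public class BinaryRecordSketch {

  /** Encodes each entry as: record length, key length, key bytes, value bytes. */
  static byte[] write(Map<Integer, String> records) {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(buffer)) {
      for (Map.Entry<Integer, String> e : records.entrySet()) {
        byte[] value = e.getValue().getBytes(StandardCharsets.UTF_8);
        out.writeInt(Integer.BYTES + value.length); // record length = key bytes + value bytes
        out.writeInt(Integer.BYTES);                // key length (an int key is 4 bytes)
        out.writeInt(e.getKey());                   // key bytes
        out.write(value);                           // value bytes
      }
    } catch (IOException ex) {
      throw new UncheckedIOException(ex);
    }
    return buffer.toByteArray();
  }

  /** Decodes the bytes produced by write(), preserving record order. */
  static Map<Integer, String> read(byte[] data) {
    Map<Integer, String> records = new LinkedHashMap<>();
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
      while (in.available() > 0) {
        int recordLength = in.readInt();
        int keyLength = in.readInt();
        int key = in.readInt();
        byte[] value = new byte[recordLength - keyLength];
        in.readFully(value);
        records.put(key, new String(value, StandardCharsets.UTF_8));
      }
    } catch (IOException ex) {
      throw new UncheckedIOException(ex);
    }
    return records;
  }

  public static void main(String[] args) {
    Map<Integer, String> records = new LinkedHashMap<>();
    records.put(100, "One, two, buckle my shoe");
    records.put(99, "Three, four, shut the door");
    byte[] bytes = write(records);
    System.out.println(bytes.length + " bytes: " + read(bytes));
  }
}
```

The point of the sketch is only that every record carries its own lengths, so a reader can walk the file record by record without any text delimiters.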

1. Writing a SequenceFile

import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

// vv SequenceFileWriteDemo
public class SequenceFileWriteDemo {
  
  private static final String[] DATA = {
    "One, two, buckle my shoe",
    "Three, four, shut the door",
    "Five, six, pick up sticks",
    "Seven, eight, lay them straight",
    "Nine, ten, a big fat hen"
  };
  
  public static void main(String[] args) throws IOException {
    String uri = args[0];
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(URI.create(uri), conf);
    Path path = new Path(uri);

    IntWritable key = new IntWritable();
    Text value = new Text();
    SequenceFile.Writer writer = null;
    try {
      writer = SequenceFile.createWriter(fs, conf, path,
          key.getClass(), value.getClass());
      
      for (int i = 0; i < 100; i++) {
        key.set(100 - i);
        value.set(DATA[i % DATA.length]);
        System.out.printf("[%s]\t%s\t%s\n", writer.getLength(), key, value);
        writer.append(key, value);
      }
    } finally {
      IOUtils.closeStream(writer);
    }
  }
}

Writing a SequenceFile is fairly simple: create a writer instance with SequenceFile.createWriter, then call its append method to write the data as key-value pairs.

Here is the first part of the output.

$ hadoop SequenceFileWriteDemo numbers.seq
13/11/06 21:48:39 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/11/06 21:48:39 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
13/11/06 21:48:39 INFO compress.CodecPool: Got brand-new compressor
[128] 100 One, two, buckle my shoe
[173] 99 Three, four, shut the door
[220] 98 Five, six, pick up sticks
[264] 97 Seven, eight, lay them straight
[314] 96 Nine, ten, a big fat hen
[359] 95 One, two, buckle my shoe
[404] 94 Three, four, shut the door
[451] 93 Five, six, pick up sticks
[495] 92 Seven, eight, lay them straight
[545] 91 Nine, ten, a big fat hen
[590] 90 One, two, buckle my shoe
[635] 89 Three, four, shut the door

2. Reading a SequenceFile

import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.ReflectionUtils;

// vv SequenceFileReadDemo
public class SequenceFileReadDemo {
  
  public static void main(String[] args) throws IOException {
    String uri = args[0];
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(URI.create(uri), conf);
    Path path = new Path(uri);

    SequenceFile.Reader reader = null;
    try {
      reader = new SequenceFile.Reader(fs, path, conf);
      Writable key = (Writable)
        ReflectionUtils.newInstance(reader.getKeyClass(), conf);
      Writable value = (Writable)
        ReflectionUtils.newInstance(reader.getValueClass(), conf);
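      /*
       * This way we do not need to know the file's concrete key and value types in
       * advance; everything is read through Writable. getKeyClass() and
       * getValueClass() return the types that SequenceFile.Reader found in the
       * file, and ReflectionUtils creates an instance of each. Note that key and
       * value are only instantiated here and do not hold any data yet.
       */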
      long position = reader.getPosition();
      while (reader.next(key, value)) {
        String syncSeen = reader.syncSeen() ? "*" : "";
        System.out.printf("[%s%s]\t%s\t%s\n", position, syncSeen, key, value);
        position = reader.getPosition(); // beginning of next record
      }
    } finally {
      IOUtils.closeStream(reader);
    }
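    /*
     * It is next() that fills key and value with real data. syncSeen() returns
     * true if and only if the preceding call to next() passed a sync marker.
     * next() reads one record at a time, but a sync marker is not written after
     * every record, only after a run of records, so a marker shows up only after
     * many records have been read.
     */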
  }
}
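The trick of creating empty key and value objects from class information stored in the file can be sketched with plain Java reflection. This is my own illustration, not how ReflectionUtils is implemented: the class ReflectiveInstanceSketch is hypothetical, and java.lang.StringBuilder and java.util.ArrayList merely stand in for the key and value class names a file would record.

```java
// Toy model only: plain JDK reflection standing in for ReflectionUtils.newInstance.
public class ReflectiveInstanceSketch {

  /** Creates an empty instance of the named class through its no-argument constructor. */
  static Object newInstance(String className) {
    try {
      Class<?> type = Class.forName(className);
      return type.getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException ex) {
      // Wrap the checked exception so callers do not have to declare it
      throw new IllegalStateException("Cannot instantiate " + className, ex);
    }
  }

  public static void main(String[] args) {
    // The caller only knows the class names at run time, not at compile time
    Object key = newInstance("java.lang.StringBuilder");
    Object value = newInstance("java.util.ArrayList");
    System.out.println(key.getClass().getName() + ", " + value.getClass().getName());
  }
}
```

As in the read demo, the objects come back empty; something else (there, reader.next) has to fill them with data.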

Output:

$ hadoop SequenceFileReadDemo numbers.seq
13/11/06 21:50:44 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/11/06 21:50:44 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
13/11/06 21:50:44 INFO compress.CodecPool: Got brand-new decompressor
[128] 100 One, two, buckle my shoe
[173] 99 Three, four, shut the door
[220] 98 Five, six, pick up sticks
[264] 97 Seven, eight, lay them straight
[314] 96 Nine, ten, a big fat hen
[359] 95 One, two, buckle my shoe
[404] 94 Three, four, shut the door
[451] 93 Five, six, pick up sticks
[495] 92 Seven, eight, lay them straight
[545] 91 Nine, ten, a big fat hen
[590] 90 One, two, buckle my shoe
[635] 89 Three, four, shut the door
[682] 88 Five, six, pick up sticks
[726] 87 Seven, eight, lay them straight
[776] 86 Nine, ten, a big fat hen
[821] 85 One, two, buckle my shoe
[866] 84 Three, four, shut the door
[913] 83 Five, six, pick up sticks
[957] 82 Seven, eight, lay them straight
[1007] 81 Nine, ten, a big fat hen
[1052] 80 One, two, buckle my shoe
[1097] 79 Three, four, shut the door
[1144] 78 Five, six, pick up sticks
[1188] 77 Seven, eight, lay them straight
[1238] 76 Nine, ten, a big fat hen
[1283] 75 One, two, buckle my shoe
[1328] 74 Three, four, shut the door
[1375] 73 Five, six, pick up sticks
[1419] 72 Seven, eight, lay them straight
[1469] 71 Nine, ten, a big fat hen
[1514] 70 One, two, buckle my shoe
[1559] 69 Three, four, shut the door
[1606] 68 Five, six, pick up sticks
[1650] 67 Seven, eight, lay them straight
[1700] 66 Nine, ten, a big fat hen
[1745] 65 One, two, buckle my shoe
[1790] 64 Three, four, shut the door
[1837] 63 Five, six, pick up sticks
[1881] 62 Seven, eight, lay them straight
[1931] 61 Nine, ten, a big fat hen
[1976] 60 One, two, buckle my shoe
[2021*] 59 Three, four, shut the door
[2088] 58 Five, six, pick up sticks
[2132] 57 Seven, eight, lay them straight
[2182] 56 Nine, ten, a big fat hen
[2227] 55 One, two, buckle my shoe
[2272] 54 Three, four, shut the door
[2319] 53 Five, six, pick up sticks
[2363] 52 Seven, eight, lay them straight
[2413] 51 Nine, ten, a big fat hen
[2458] 50 One, two, buckle my shoe
[2503] 49 Three, four, shut the door
[2550] 48 Five, six, pick up sticks
[2594] 47 Seven, eight, lay them straight
[2644] 46 Nine, ten, a big fat hen
[2689] 45 One, two, buckle my shoe
[2734] 44 Three, four, shut the door
[2781] 43 Five, six, pick up sticks
[2825] 42 Seven, eight, lay them straight
[2875] 41 Nine, ten, a big fat hen
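
To convince myself why the asterisk shows up only once in this listing, I wrote a small simulation. It is a toy model, not Hadoop's code, and every constant in it is an assumption read off the listing above: the first record starts at byte 128, the five sample lines occupy 45, 47, 44, 50 and 45 bytes, a sync entry takes 20 bytes (2088 - 2021 - 47), and a marker is written before a record once at least 2000 bytes have passed since the previous marker. The class SyncMarkerSketch and its method are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: constants are inferred from the listing, not read from Hadoop.
public class SyncMarkerSketch {
  static final long HEADER_BYTES = 128;    // position of the first record in the listing
  static final long SYNC_BYTES = 20;       // 2088 - 2021 - 47 in the listing
  static final long SYNC_INTERVAL = 2000;  // assumed minimum distance between markers

  // Record sizes observed in the listing, in the same order as DATA
  static final long[] RECORD_BYTES = {45, 47, 44, 50, 45};

  /** Returns labels such as "128" or "2021*" for the first n records. */
  static List<String> positions(int n) {
    List<String> labels = new ArrayList<>();
    long pos = HEADER_BYTES;
    long lastSync = 0;
    for (int i = 0; i < n; i++) {
      // A marker goes in front of this record if enough bytes have accumulated
      boolean sync = pos >= lastSync + SYNC_INTERVAL;
      labels.add(pos + (sync ? "*" : ""));
      if (sync) {
        pos += SYNC_BYTES;
        lastSync = pos;
      }
      pos += RECORD_BYTES[i % RECORD_BYTES.length];
    }
    return labels;
  }

  public static void main(String[] args) {
    // prints [1976, 2021*, 2088, 2132]
    System.out.println(positions(100).subList(40, 44));
  }
}
```

Under these assumptions the model reproduces the positions around the marker in the listing, which fits the idea that markers are spaced by bytes written rather than by record count.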

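This part of the book then moves on to MapFile. A MapFile is essentially a sorted SequenceFile with an index, so it consists of two files: a data file, which is itself a SequenceFile, and an index file. Writing a MapFile therefore works much like writing a SequenceFile, except that the keys must be appended in sorted order. A MapFile can also be read starting from a given key rather than from the beginning (the book covers the details). Converting a SequenceFile into a MapFile is simple as well: once the SequenceFile is sorted, all that remains is to add an index file.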
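
The idea of sorted data plus a sparse index can be sketched in plain Java. This is my own toy model and not the MapFile API: the class SparseIndexSketch, its methods, and the index interval of 4 are all hypothetical, chosen only to keep the example small. Only every fourth key goes into the index; a lookup finds the closest indexed key at or below the target and scans forward from there.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: sorted records plus a sparse index, NOT the real MapFile classes.
public class SparseIndexSketch {
  static final int INDEX_INTERVAL = 4; // hypothetical: index every 4th record

  private final List<Integer> keys = new ArrayList<>();    // the sorted "data file"
  private final List<String> values = new ArrayList<>();
  private final TreeMap<Integer, Integer> index = new TreeMap<>(); // key -> record number

  /** Appends a record; keys must arrive in strictly increasing order. */
  void append(int key, String value) {
    if (!keys.isEmpty() && key <= keys.get(keys.size() - 1)) {
      throw new IllegalArgumentException("key out of order: " + key);
    }
    if (keys.size() % INDEX_INTERVAL == 0) {
      index.put(key, keys.size());
    }
    keys.add(key);
    values.add(value);
  }

  /** Looks up a key through the index, then scans forward; returns null if absent. */
  String get(int key) {
    Map.Entry<Integer, Integer> start = index.floorEntry(key);
    if (start == null) {
      return null; // key is smaller than every stored key
    }
    for (int i = start.getValue(); i < keys.size() && keys.get(i) <= key; i++) {
      if (keys.get(i) == key) {
        return values.get(i);
      }
    }
    return null;
  }

  int indexSize() {
    return index.size();
  }

  public static void main(String[] args) {
    SparseIndexSketch file = new SparseIndexSketch();
    for (int k = 1; k <= 10; k++) {
      file.append(k, "value-" + k);
    }
    System.out.println(file.get(7) + " (index entries: " + file.indexSize() + ")");
  }
}
```

The sketch also shows why the data has to be sorted first: the forward scan from an indexed position only works if later records never have smaller keys.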

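The above is just a set of annotations I made while reading; for the full details you still need to go to the book. Even so, writing down my own understanding as I read has been very rewarding.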
