Hadoop: The Definitive Guide, Reading Notes (Part 1)

Latest recommended article published on 2021-05-27 10:54:34

yaoxiao_happy2

Category: Monograph Reading. Tags: hadoop, reading, definitive guide

Article link: https://blog.csdn.net/yaoxiao_happy2/article/details/17752391

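1. Hadoop provides a complete API for reading and writing files. For details, see the many blog posts on the subject (search for "hadoop file operations").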

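2. For the metadata of a file or directory on HDFS, such as its path, length, and modification time, Hadoop provides the FileStatus class.

A FileStatus can be obtained easily through the API, for example: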

fs.getFileStatus(new Path("filePath"));

In addition, the listStatus method can be used to iterate over the entries in a directory:

fs.listStatus(new Path(dst))

The FileUtil.stat2Paths method returns the file paths. (It only accepts an array of FileStatus objects.)

FileStatus[] status = fs.listStatus(paths);
Path[] listedPaths = FileUtil.stat2Paths(status);

3. FileSystem has a delete method for removing files from HDFS.

4. FSDataInputStream and FSDataOutputStream are the classes that support reading and writing HDFS files.

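import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

// Prints the file at the given URI to standard output twice.
public class FileSystemDoubleCat {
    public static void main(String[] args) throws Exception {
        String uri = args[0];
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(uri), conf);
        FSDataInputStream in = null;
        try {
            in = fs.open(new Path(uri));
            IOUtils.copyBytes(in, System.out, 4096, false);
            in.seek(0); // go back to the start of the file
            IOUtils.copyBytes(in, System.out, 4096, false);
        } finally {
            IOUtils.closeStream(in);
        }
    }
}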

Note the seek(0) call in the code above: it moves the stream back to the start of the file, so the HDFS file is printed twice.
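To try the same read, rewind, read-again pattern without a cluster, here is a rough local-file sketch. It is only an analogy: java.io.RandomAccessFile stands in for FSDataInputStream, and the class name SeekRewindDemo and the helper sampleFile are made up for illustration, not part of Hadoop.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Local-file analogy: RandomAccessFile stands in for FSDataInputStream.
final class SeekRewindDemo {

    // Reads the whole file, rewinds with seek(0), then reads it again.
    static String readTwice(File file) {
        try (RandomAccessFile in = new RandomAccessFile(file, "r")) {
            byte[] buf = new byte[(int) in.length()];
            in.readFully(buf); // first pass, file pointer ends at end of file
            String first = new String(buf, StandardCharsets.UTF_8);
            in.seek(0); // go back to the start of the file
            in.readFully(buf); // second pass sees the same bytes
            String second = new String(buf, StandardCharsets.UTF_8);
            return first + second;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical helper: creates a throwaway temp file holding the given text.
    static File sampleFile(String content) {
        try {
            File f = File.createTempFile("seek-demo", ".txt");
            f.deleteOnExit();
            Files.write(f.toPath(), content.getBytes(StandardCharsets.UTF_8));
            return f;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(readTwice(sampleFile("hello\n")));
    }
}
```

Without the seek(0) call the second readFully would start at the end of the file and fail, which is why the rewind is the essential step.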

In addition, FSDataInputStream supports reading a range of bytes from a given position in the file, because it implements the PositionedReadable interface:

public interface PositionedReadable {
    public int read(long position, byte[] buffer, int offset, int length)
        throws IOException;
    public void readFully(long position, byte[] buffer, int offset, int length)
        throws IOException;
    public void readFully(long position, byte[] buffer) throws IOException;
}
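A positioned read takes an explicit file offset and leaves the stream's current offset untouched, unlike seek followed by read. The sketch below shows that behavior on a local file. It is an analogy under stated assumptions, not Hadoop code: java.nio's FileChannel.read(ByteBuffer, long) plays the role of the positioned read, Path here is java.nio.file.Path rather than org.apache.hadoop.fs.Path, and the names PositionedReadDemo, readAt, and sampleFile are invented for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Local-file analogy: FileChannel stands in for FSDataInputStream.
final class PositionedReadDemo {

    // Reads up to `length` bytes starting at `position` without moving the
    // channel's own position, and appends that position as evidence.
    static String readAt(Path file, long position, int length) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate(length);
            // Loop because a single positioned read may return fewer bytes.
            while (buf.hasRemaining()) {
                int n = ch.read(buf, position + buf.position());
                if (n < 0) {
                    break; // reached end of file
                }
            }
            buf.flip();
            String text = StandardCharsets.UTF_8.decode(buf).toString();
            return text + "|" + ch.position();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical helper: a temp file containing the ten bytes "0123456789".
    static Path sampleFile() {
        try {
            Path f = Files.createTempFile("pread-demo", ".txt");
            f.toFile().deleteOnExit();
            Files.write(f, "0123456789".getBytes(StandardCharsets.UTF_8));
            return f;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Bytes at offsets 4..6, followed by the unchanged channel position.
        System.out.println(readAt(sampleFile(), 4, 3)); // prints "456|0"
    }
}
```

Because the current offset is not modified, a positioned read can be mixed with ordinary sequential reads on the same stream without an intervening seek.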