[Hadoop] HDFS Client Operations and Source-Code Analysis of the File Write Path

Latest recommended article published on 2021-08-09 08:18:48

SmallScorpion

Category: Hadoop Modular Learning. Tags: client, write-path source code, client API

Article link: https://blog.csdn.net/qq_40180229/article/details/100517551

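I. Environment Setup

1. Create a Maven project named HdfsClient, then add the following dependency coordinates (including logging) to its pom.xml.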

<packaging>jar</packaging>

    <dependencies>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>RELEASE</version>
        </dependency>
        <dependency>
            <groupId>org.apache.logging.log4j</groupId>
            <artifactId>log4j-core</artifactId>
            <version>2.8.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>2.6.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>2.6.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>2.6.0</version>
        </dependency>
        <dependency>
            <groupId>jdk.tools</groupId>
            <artifactId>jdk.tools</artifactId>
            <version>1.8</version>
            <scope>system</scope>
            <systemPath>${JAVA_HOME}/lib/tools.jar</systemPath>
        </dependency>
    </dependencies>

2. In the project's src/main/resources directory, create a file named "log4j.properties" with the following content.

log4j.rootLogger=INFO, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d %p [%c] - %m%n
log4j.appender.logfile=org.apache.log4j.FileAppender
log4j.appender.logfile.File=target/spring.log
log4j.appender.logfile.layout=org.apache.log4j.PatternLayout
log4j.appender.logfile.layout.ConversionPattern=%d %p [%c] - %m%n

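[Figure: project structure]

3. Create the package com.zt.hdfs with the classes shown in the figure above, and run the client tests in the HdfsClient class.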

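package com.zt.hdfs;

import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.junit.Test;

// The test methods in the later sections also belong in this class.
public class HdfsClient {

    @Test
    public void testMkdirs() throws IOException, InterruptedException, URISyntaxException {

        // 1 Get the file system
        Configuration configuration = new Configuration();
        // Alternative: configure the cluster address in the Configuration
        // configuration.set("fs.defaultFS", "hdfs://hadoop102:9000");
        // FileSystem fs = FileSystem.get(configuration);

        // Connect to the NameNode as user "root"
        FileSystem fs = FileSystem.get(new URI("hdfs://Lili01:9000"), configuration, "root");

        // 2 Create the directory
        fs.mkdirs(new Path("/user"));

        // 3 Release resources
        fs.close();
    }
}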
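
The URI passed to FileSystem.get carries the connection details: as far as I understand it, the scheme (hdfs) selects the file system implementation, and the host and port identify the NameNode. Here is a minimal sketch that uses only java.net.URI to show what the client can read from that string. The class NameNodeAddress and its describe method are made up for illustration and are not part of Hadoop.

```java
import java.net.URI;

// Hypothetical helper (not a Hadoop class): splits an HDFS URI into its parts.
final class NameNodeAddress {
    final String scheme;
    final String host;
    final int port;

    NameNodeAddress(String uri) {
        // URI.create throws IllegalArgumentException if the string is not a valid URI
        URI parsed = URI.create(uri);
        this.scheme = parsed.getScheme(); // "hdfs" for an HDFS cluster
        this.host = parsed.getHost();     // NameNode host name
        this.port = parsed.getPort();     // NameNode port, -1 if the URI has none
    }

    String describe() {
        return scheme + " file system on " + host + ":" + port;
    }

    public static void main(String[] args) {
        System.out.println(new NameNodeAddress("hdfs://Lili01:9000").describe());
    }
}
```

If the port is left out of the URI, getPort returns -1, so a real client would have to fall back to some default.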

II. HDFS API Operations

The operations below all go through the HDFS API, which the framework provides ready-made.

1. File download (copyToLocalFile)

    // Download a file
    @Test
    public void testCopyToLocalFile() throws IOException, URISyntaxException, InterruptedException {

        // 1 Get the file system
        Configuration configuration = new Configuration();
        FileSystem fs = FileSystem.get(new URI("hdfs://hadoop001:9000"), configuration, "root");

        // 2 Download the file
        // boolean delSrc: whether to delete the source file
        // Path src: HDFS path of the file to download
        // Path dst: local path to download the file to
        // boolean useRawLocalFileSystem: whether to use RawLocalFileSystem for the local copy;
        //     true skips writing the local .crc checksum file
        fs.copyToLocalFile(false, new Path("/spring_mvc.txt"), new Path("d:/spring_mvc.txt"),
                true);

        // 3 Release resources
        fs.close();

    }
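
The last argument matters because Hadoop's default local file system keeps a checksum sidecar next to each file it writes (for spring_mvc.txt that would be .spring_mvc.txt.crc), while RawLocalFileSystem does not. The sketch below only illustrates the idea of per-chunk checksums using java.util.zip.CRC32. The ChunkChecksums class is invented for the example, the chunk size is passed in by the caller as an assumption, and none of this reproduces Hadoop's actual .crc file format.

```java
import java.util.zip.CRC32;

// Illustrative only: one CRC32 value per fixed-size chunk of data.
final class ChunkChecksums {

    // Computes one checksum for every chunk of bytesPerChecksum bytes (the last chunk may be shorter).
    static long[] compute(byte[] data, int bytesPerChecksum) {
        int chunks = (data.length + bytesPerChecksum - 1) / bytesPerChecksum;
        long[] sums = new long[chunks];
        for (int i = 0; i < chunks; i++) {
            int offset = i * bytesPerChecksum;
            int length = Math.min(bytesPerChecksum, data.length - offset);
            CRC32 crc = new CRC32();
            crc.update(data, offset, length);
            sums[i] = crc.getValue();
        }
        return sums;
    }

    // Returns the index of the first chunk whose checksum differs, or -1 if all match.
    static int firstCorruptChunk(byte[] data, long[] expected, int bytesPerChecksum) {
        long[] actual = compute(data, bytesPerChecksum);
        for (int i = 0; i < actual.length; i++) {
            if (i >= expected.length || actual[i] != expected[i]) {
                return i;
            }
        }
        return -1;
    }
}
```

Checking chunk by chunk means a reader can tell which part of a file is damaged instead of only learning that something in the file is wrong.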

2. File deletion (fs.delete)

    // Delete a file
    @Test
    public void testDelete() throws IOException, URISyntaxException, InterruptedException {

        // 1 Get the file system
        Configuration configuration = new Configuration();
        FileSystem fs = FileSystem.get(new URI("hdfs://hadoop001:9000"), configuration, "root");

        // 2 Delete the path; true means recursive (needed for non-empty directories)
        fs.delete(new Path("/spring_mvc.txt"), true);

        // 3 Release resources
        fs.close();

    }

3. File rename (fs.rename)

    // Rename a file
    @Test
    public void testRename() throws IOException, URISyntaxException, InterruptedException {

        // 1 Get the file system
        Configuration configuration = new Configuration();
        FileSystem fs = FileSystem.get(new URI("hdfs://hadoop001:9000"), configuration, "root");

        // 2 Rename the file
        fs.rename(new Path("/spring_mvc.txt"), new Path("/spring_mvc1.txt"));

        // 3 Release resources
        fs.close();

    }

4. Viewing file details (fs.listFiles)

    // View file details
    @Test
    public void testRemoteIterator() throws IOException, URISyntaxException, InterruptedException {
        // 1 Get the file system
        Configuration configuration = new Configuration();
        FileSystem fs = FileSystem.get(new URI("hdfs://hadoop001:9000"), configuration, "root");
        // 2 List the files under "/"; true means recurse into subdirectories
        RemoteIterator<LocatedFileStatus> listFiles = fs.listFiles(new Path("/"), true);

        while(listFiles.hasNext()){
            LocatedFileStatus status = listFiles.next();
            // Print the details
            // File name
            System.out.println(status.getPath().getName());
            // Length
            System.out.println(status.getLen());
            // Permissions
            System.out.println(status.getPermission());
            // Group
            System.out.println(status.getGroup());
            // Get the block information
            BlockLocation[] blockLocations = status.getBlockLocations();
            for (BlockLocation blockLocation : blockLocations) {
                // Get the hosts that store this block
                String[] hosts = blockLocation.getHosts();

                for (String host : hosts) {
                    System.out.println(host);
                }
            }
        }
        // 3 Release resources
        fs.close();

    }
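
Each BlockLocation returned by getBlockLocations describes one block of the file. As a rough sketch of how a file length maps onto blocks, the following computes the offset and length of every block for a given block size. The 128 MB value is the Hadoop 2.x default as far as I know, so treat it as an assumption. The BlockLayout class is invented for the example and never talks to HDFS.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: splits a file length into {offset, length} pairs, one per block.
final class BlockLayout {
    static final long DEFAULT_BLOCK_SIZE = 128L * 1024 * 1024; // assumed 128 MB

    static List<long[]> blocks(long fileLength, long blockSize) {
        List<long[]> result = new ArrayList<>();
        for (long offset = 0; offset < fileLength; offset += blockSize) {
            // The last block only holds whatever is left of the file
            long length = Math.min(blockSize, fileLength - offset);
            result.add(new long[] {offset, length});
        }
        return result;
    }

    public static void main(String[] args) {
        long fileLength = 300L * 1024 * 1024; // a 300 MB file
        for (long[] block : blocks(fileLength, DEFAULT_BLOCK_SIZE)) {
            System.out.println("offset=" + block[0] + " length=" + block[1]);
        }
    }
}
```

For a 300 MB file with 128 MB blocks this gives three blocks: two full ones and a final block of 44 MB.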

5. Distinguishing files from directories (fs.listStatus)

    // Check whether each entry is a file or a directory
    @Test
    public void testFileStatus() throws IOException, URISyntaxException, InterruptedException {

        // 1 Get the file system
        Configuration configuration = new Configuration();
        FileSystem fs = FileSystem.get(new URI("hdfs://hadoop001:9000"), configuration, "root");

        // 2 Decide whether each entry is a file or a directory
        FileStatus[] listStatus = fs.listStatus(new Path("/"));
        for (FileStatus fileStatus : listStatus) {
            // If it is a file
            if (fileStatus.isFile()) {
                System.out.println("f:"+fileStatus.getPath().getName());
            }else {
                System.out.println("d:"+fileStatus.getPath().getName());
            }
        }
        // 3 Release resources
        fs.close();

    }

III. HDFS I/O Stream Operations

Upload and download data using I/O streams.

1. HDFS file upload (create)

    // Upload a file to HDFS
    @Test
    public void IOPutFileToHDFS() throws IOException, InterruptedException, URISyntaxException {

        // 1 Get the file system
        Configuration configuration = new Configuration();
        FileSystem fs = FileSystem.get(new URI("hdfs://hadoop001:9000"), configuration, "root");

        // 2 Create the input stream (local file)
        FileInputStream fileInputStream = new FileInputStream(new File("e:/spring_mvc.txt"));

        // 3 Get the output stream (HDFS file)
        FSDataOutputStream fsDataOutputStream = fs.create(new Path("/spring_mvc.txt"));

        // 4 Copy from one stream to the other
        IOUtils.copyBytes(fileInputStream, fsDataOutputStream, configuration);
        // 5 Release resources
        IOUtils.closeStream(fsDataOutputStream);
        IOUtils.closeStream(fileInputStream);
        fs.close();
    }

2. File download (open)

    // Download a file from HDFS
    @Test
    public void IOGetFileFromHDFS() throws IOException, InterruptedException, URISyntaxException{
        // 1 Get the file system
        Configuration configuration = new Configuration();
        FileSystem fileSystem = FileSystem.get(new URI("hdfs://hadoop001:9000"), configuration, "root");

        // 2 Get the input stream (HDFS file)
        FSDataInputStream fsDataInputStream = fileSystem.open(new Path("/spring_mvc.txt"));

        // 3 Get the output stream (local file)
        FileOutputStream fileOutputStream = new FileOutputStream(new File("d:/spring_mvc.txt"));

        // 4 Copy from one stream to the other
        IOUtils.copyBytes(fsDataInputStream, fileOutputStream, configuration);
        // 5 Release resources
        IOUtils.closeStream(fileOutputStream);
        IOUtils.closeStream(fsDataInputStream);
        fileSystem.close();
    }
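
Both stream tests rely on IOUtils.copyBytes for the actual transfer. Conceptually it is a read/write loop over a fixed-size buffer; as far as I know, the three-argument overload takes the buffer size from io.file.buffer.size in the Configuration (4096 bytes by default). Below is a plain java.io sketch of such a loop. It is not Hadoop's implementation, and StreamCopy is a made-up name.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Illustrative only: a buffered stream-to-stream copy.
final class StreamCopy {

    // Copies everything from in to out and returns the number of bytes copied.
    // The caller stays responsible for closing both streams.
    static long copy(InputStream in, OutputStream out, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
            total += read;
        }
        out.flush();
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] source = "hello hdfs".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long copied = copy(new ByteArrayInputStream(source), sink, 4096);
        System.out.println(copied + " bytes copied");
    }
}
```

The same loop works in both directions: for an upload the input is the local file and the output is the HDFS stream, and for a download the roles are swapped.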

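IV. Source-Code Analysis of the HDFS Write Path

1. Diagram walkthrough

1) The client asks the NameNode, through the DistributedFileSystem module, to upload a file. The NameNode checks whether the target file already exists and whether its parent directory exists.
2) The NameNode replies whether the upload may proceed.
3) The client asks which DataNode servers the first Block should be uploaded to.
4) The NameNode returns three DataNodes: dn1, dn2, and dn3.
5) Through the FSDataOutputStream module, the client asks dn1 to accept the upload. On receiving the request, dn1 calls dn2, and dn2 calls dn3, which completes the communication pipeline.
6) dn1, dn2, and dn3 acknowledge back to the client level by level.
7) The client starts uploading the first Block to dn1 (first reading the data from disk into a local in-memory buffer), one Packet at a time. dn1 passes each Packet it receives to dn2, and dn2 passes it to dn3. Each Packet that dn1 sends is also placed in an ack queue to await acknowledgement.
8) Once a Block has been transferred, the client asks the NameNode again for the servers that will hold the second Block (repeating steps 3-7).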
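
Steps 5 to 7 can be pictured as a small in-memory simulation: a block is cut into packets, each packet is forwarded dn1 -> dn2 -> dn3, and it stays in an ack queue (kept on the sending side here) until the acknowledgement comes back. This is a toy model under loud assumptions: no network, no failures, acknowledgements arrive immediately, and the packet size is whatever the caller passes in. PipelineSketch and all of its members are invented for illustration and do not mirror the real Hadoop classes.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// Toy model of the write pipeline; not Hadoop code.
final class PipelineSketch {

    // One in-memory "DataNode" that stores what it receives and forwards it downstream.
    static final class Node {
        final String name;
        final Node next; // null for the last node in the pipeline
        final ByteArrayOutputStream stored = new ByteArrayOutputStream();

        Node(String name, Node next) {
            this.name = name;
            this.next = next;
        }

        // Stores the packet, forwards it, and returns true once every downstream node has acked.
        boolean receive(byte[] packet) {
            stored.write(packet, 0, packet.length);
            return next == null || next.receive(packet);
        }
    }

    final Deque<byte[]> dataQueue = new ArrayDeque<>(); // packets waiting to be sent
    final Deque<byte[]> ackQueue = new ArrayDeque<>();  // packets sent but not yet acknowledged
    int packetsSent = 0;

    // Cuts the block into packets of at most packetSize bytes and queues them.
    void enqueue(byte[] block, int packetSize) {
        for (int off = 0; off < block.length; off += packetSize) {
            int end = Math.min(block.length, off + packetSize);
            dataQueue.add(Arrays.copyOfRange(block, off, end));
        }
    }

    // Sends every queued packet to the first node; a packet leaves the ack queue only when acked.
    void flushTo(Node first) {
        while (!dataQueue.isEmpty()) {
            byte[] packet = dataQueue.poll();
            ackQueue.add(packet);
            packetsSent++;
            if (first.receive(packet)) {
                ackQueue.poll();
            }
        }
    }

    public static void main(String[] args) {
        Node dn3 = new Node("dn3", null);
        Node dn2 = new Node("dn2", dn3);
        Node dn1 = new Node("dn1", dn2);
        PipelineSketch client = new PipelineSketch();
        client.enqueue(new byte[10], 4);
        client.flushTo(dn1);
        System.out.println(client.packetsSent + " packets sent, "
                + dn3.stored.size() + " bytes on " + dn3.name);
    }
}
```

Keeping sent packets in a separate ack queue is what would let a sender resend them if the pipeline broke before the acknowledgement arrived; the sketch leaves that failure path out.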

2. The Definitive Guide

[Figure: excerpt from the Definitive Guide, in four parts]

3. Source-code analysis
XMind diagram:

[Figure: XMind mind map, in three parts]
Hand-drawn diagrams:
1. Creating the file

2. Writing the file

Failure-handling diagram

[Figure: failure handling, in three parts]

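Summary:

The Hadoop source code is full of defensive checks, and those checks are what give Hadoop its reliability and fault tolerance. When reading the source, focus on the object that each method call returns, then follow the type of that object and the name of the method called on it, one step at a time. There is no need to dwell on the branches that only perform checks. Take DistributedFileSystem as an example: in the first part, whether we start from FileSystemLinkResolver, from this.dfs.createWrappedInputStream(dfsis), or from DistributedFileSystem.this.dfs.open, the way to go deeper is to look at the methods offered by the object each call returns.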

not learn to live:

A union of two people based on looks will quickly fade away.
A union of two people that comes from the heart will last forever.