1_HDFS Programming

oifengo

Article link: https://blog.csdn.net/weixin_39381833/article/details/107132330

Preface

xxx

Configuration
dependency
artifactId
recursive
buffer

Maven

Site for looking up Maven artifacts:

https://mvnrepository.com/

GAV coordinates (groupId, artifactId, version)

	<!-- https://mvnrepository.com/artifact/mysql/mysql-connector-java -->
	<dependency>
		<groupId>mysql</groupId>
		<artifactId>mysql-connector-java</artifactId>
		<version>5.1.49</version>
	</dependency>

Thin jar: a jar that contains only the project's own code, without its dependencies bundled in

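JUnit

JUnit 4 relies on the annotations introduced in Java 5. The most commonly used ones are:

@Before: setup method, run once before every test method (unlike @BeforeClass, which runs once for the whole class)
@After: cleanup method, run once after every test method (unlike @AfterClass, which runs once for the whole class)
@Test: marks a test method; it can also declare an expected exception and a timeout
@Test(expected=ArithmeticException.class): checks that the method under test throws ArithmeticException
@Ignore: the test method is skipped
@BeforeClass: runs only once, before all tests; must be public static void
@AfterClass: runs only once, after all tests; must be public static void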

Execution order
A JUnit 4 test class runs in this order:

@BeforeClass -> @Before -> @Test -> @After -> @AfterClass;

Each test method is invoked in this order:

@Before -> @Test -> @After;

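@BeforeClass / @AfterClass	@Before / @After
Typically declared once per class	May be declared several times
Run only once per class	Run before and after every test method
Must be declared public static	Must be declared public and non-static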

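@BeforeClass and @AfterClass are a good fit for allocating and releasing "expensive" resources,
because they run only once per class.
For resources that must be initialized before every test or cleaned up after every test,
@Before and @After are the sensible choice.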
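
The ordering described above can be pictured with a small plain-Java simulation. This is only a sketch: it does not use JUnit itself, and the class name LifecycleOrderDemo and the method runAll are hypothetical names chosen for illustration. It simply records the steps in the order JUnit 4 would run them for a class with several test methods.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java simulation of the JUnit 4 lifecycle order; JUnit is not involved.
class LifecycleOrderDemo {

    // Records every lifecycle step in the order it would run.
    static List<String> runAll(int testCount) {
        List<String> log = new ArrayList<>();
        log.add("@BeforeClass");          // once, before any test
        for (int i = 1; i <= testCount; i++) {
            log.add("@Before");           // before each test method
            log.add("@Test#" + i);        // the test method itself
            log.add("@After");            // after each test method
        }
        log.add("@AfterClass");           // once, after all tests
        return log;
    }

    public static void main(String[] args) {
        System.out.println(String.join(" -> ", runAll(2)));
    }
}
```

With two tests, the pair @Before / @After appears twice while @BeforeClass and @AfterClass appear once each, which is why the expensive setup belongs in the class-level methods.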

In IDEA, a method shown in gray has not been invoked anywhere (it is unused).

Template method pattern

log4j

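Take a log4j configuration file from the Hadoop distribution and add it to the project.
This removes the red log4j warnings printed when the unit tests run.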


Configuration: 
	core-default.xml,   
	core-site.xml, 
	hdfs-default.xml, 
	hdfs-site.xml, 
	mapred-default.xml, 
	mapred-site.xml, 
	yarn-default.xml, 
	yarn-site.xml
	
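The *-default.xml files ship inside the Hadoop jars.
The *-site.xml files are the ones you configure yourself.

	Parameters can be set in code
	or through configuration files.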
Maven configuration

Add the Hadoop dependency to the pom

Add the Aliyun and CDH repositories

  <repositories>
    <!-- Aliyun repository -->
    <repository>
      <id>aliyun</id>
      <url>http://maven.aliyun.com/nexus/content/groups/public</url>
    </repository>

    <!-- CDH repository -->
    <repository>
      <id>cloudera</id>
      <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
    </repository>
  </repositories>

The Aliyun repository does not currently host the CDH artifacts, so the CDH repository has to be added as well.

Hadoop-client

A single dependency is enough.
To make it easier to manage, extract the Hadoop version into a property named hadoop.version.

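JUnit

Put the main code under the java folder.
Put test classes under the test folder, named after the class under test with a Test suffix (for example absc.java and abscTest.java).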

Template method pattern

absc.java

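package com.ifeng.bigdata.junit;

public abstract class absc {

    // Template method: controls the cooking process (the shared flow is reused as-is).
    // Declared final so that subclasses cannot change the order of the steps.
    final void cookProcess(){
        this.pourOil();
        this.HeatOil();
        this.pourVegetable();
        this.pourSauce();
        this.fry();
    }

    void pourOil(){
        System.out.println("Pour oil");
    }

    void HeatOil(){
        System.out.println("Heat oil");
    }

    // The vegetables differ from dish to dish, so this step is abstract and left to subclasses.
    abstract void pourVegetable();

    // The seasoning differs as well, so this step is abstract too.
    abstract void pourSauce();

    void fry(){
        System.out.println("Stir-fry");
    }
}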

abscTest.java

package com.ifeng.bigdata.junit;

public class abscTest extends absc{


    @Override
    void pourVegetable() {
        System.out.println("Add potatoes");
    }

    @Override
    void pourSauce() {
        System.out.println("Add soy sauce");
    }
}
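
To see the pattern run end to end, here is a minimal self-contained sketch. The names CookingTemplate, PotatoDish, and TemplateDemo are hypothetical and chosen for this example; the steps are collected in a list instead of being printed one by one so the fixed order is easy to inspect.

```java
import java.util.ArrayList;
import java.util.List;

// Entry point: runs the fixed process with one concrete dish.
class TemplateDemo {
    public static void main(String[] args) {
        System.out.println(new PotatoDish().cookProcess());
    }
}

abstract class CookingTemplate {
    protected final List<String> steps = new ArrayList<>();

    // Template method: final, so subclasses cannot reorder the steps.
    final List<String> cookProcess() {
        steps.add("pour oil");
        steps.add("heat oil");
        pourVegetable();          // varies per dish
        pourSauce();              // varies per dish
        steps.add("stir-fry");
        return steps;
    }

    abstract void pourVegetable();

    abstract void pourSauce();
}

class PotatoDish extends CookingTemplate {
    @Override
    void pourVegetable() {
        steps.add("add potatoes");
    }

    @Override
    void pourSauce() {
        steps.add("add soy sauce");
    }
}
```

The subclass only fills in the two variable steps; the surrounding flow stays in the base class. The same idea is what @Before and @After give a test class: shared setup and cleanup around varying test bodies.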

HDFS API

Creating a directory on HDFS

package com.ifeng.bigdata.hadoop;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.junit.Test;

import java.net.URI;

public class HDFSAPITest {
    @Test
    public void mkdir() throws Exception{
        // Configure the HDFS connection
        Configuration conf = new Configuration();
        URI uri = new URI("hdfs://10.103.66.15:9000");

        // Get the HDFS client object
        FileSystem fileSystem = FileSystem.get(uri,conf,"ifeng");

        // Run the business logic
        Path path = new Path("/hdfsapi1");
        fileSystem.mkdirs(path);

        fileSystem.close();
    }
}

Refactored into the template style (shared setUp / tearDown)

package com.ifeng.bigdata.hadoop;


import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;

import java.net.URI;

public class HDFSAapiTest2_2 {

    FileSystem fileSystem;

    @Before
    public void setUp() throws Exception {
        Configuration conf = new Configuration();
        URI uri = new URI("hdfs://10.103.66.15:9000");
        fileSystem = FileSystem.get(uri, conf, "ifeng");
    }

    @After
    public void tearDown() throws Exception {
        if (null != fileSystem) {
            fileSystem.close();
        }
    }

    @Test
    public void mkdir() throws Exception {
        Path path = new Path("/hdfsapi/pk");
        fileSystem.mkdirs(path);
    }
}

copyFromLocalFile

    @Test
    public void copyFromLocalFile() throws Exception{
        Path src = new Path("data/ifengdata.txt");
        Path dst = new Path("/hdfsapi/pk");

        fileSystem.copyFromLocalFile(src,dst);
    }

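hdfs-site.xml was configured when the cluster was set up, yet the uploaded file still has 3 replicas. The reason:

The client code does not read the server's hdfs-site.xml, so it falls back to the value in hdfs-default.xml bundled in the jar.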

// Verify: the printed value is 3
    @Test
    public void testReplication() throws Exception{
        System.out.println(fileSystem.getConf().get("dfs.replication"));
    }

Changing the replication factor

    @Before
    public void setUp() throws Exception {
        Configuration conf = new Configuration();
        conf.set("dfs.replication","1");
        URI uri = new URI("hdfs://10.103.66.15:9000");
        fileSystem = FileSystem.get(uri, conf, "ifeng");
    }

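Parameters can be changed in code or through configuration files.
For the configuration-file route, copy these files from the cluster environment:

core-site.xml, 

hdfs-site.xml, 

mapred-site.xml, 

yarn-site.xml

into the project's resources directory. The drawback is that a parameter change on the server may affect this program…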

copyToLocalFile

    @Test
    public void copyToLocalFile() throws Exception {
        Path src = new Path("/hdfsapi/pk/ifengdata.txt");
        Path dst = new Path("out/ruozedata-3.txt");
        // First argument is delSrc: true deletes the source file on HDFS after the copy
        fileSystem.copyToLocalFile(true, src, dst);
    }

rename

    @Test
    public void rename() throws Exception {
        Path src = new Path("/hdfsapi/pk/ifengdata.txt");
        Path dst = new Path("/hdfsapi/pk/ruozedata-2.txt");
        fileSystem.rename(src, dst);
    }

listFiles

    @Test
    public void listFiles() throws Exception{
        Path path = new Path("/hdfsapi/pk/ifengdata.txt");
        RemoteIterator<LocatedFileStatus> files = fileSystem.listFiles(path,true);
        while (files.hasNext()) {
            LocatedFileStatus fileStatus = files.next();
            String isDir = fileStatus.isDirectory() ? "directory" : "file";
            String permission = fileStatus.getPermission().toString();
            short replication = fileStatus.getReplication();
            long length = fileStatus.getLen();
            String path1 = fileStatus.getPath().toString();

            System.out.println(isDir + "\t" + permission+ "\t" +replication
                    + "\t" + length + "\t" + path1);


            BlockLocation[] blockLocations = fileStatus.getBlockLocations();
            for(BlockLocation blockLocation : blockLocations) {
                String[] hosts = blockLocation.getHosts();
                for(String host : hosts) {
                    System.out.println(host);
                }
            }
        }
    }

delete

    @Test
    public void delete() throws Exception {
        fileSystem.delete(new Path("/hdfsapi/pk"), true);
    }

Working with HDFS through IO streams

Streams involved: FileInputStream / FileOutputStream on the local side, FSDataInputStream / FSDataOutputStream on the HDFS side

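    // Needs: java.io.BufferedInputStream, java.io.File, java.io.FileInputStream,
    //        org.apache.hadoop.fs.FSDataOutputStream, org.apache.hadoop.io.IOUtils
    @Test
    public void copyFromLocalFile() throws Exception{
        // Local source file, wrapped in a buffered stream
        BufferedInputStream in = new BufferedInputStream(new FileInputStream(new File("data/ifengdata.txt")));
        // Target file on HDFS; create() returns an FSDataOutputStream
        FSDataOutputStream out = fileSystem.create(new Path("/hdfsapi/pk/io.txt"));

        // Copy with a 4096-byte buffer
        IOUtils.copyBytes(in,out,4096);

        IOUtils.closeStream(in);
        IOUtils.closeStream(out);
    }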
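
A buffered stream copy of this kind can be pictured as a plain read/write loop. The sketch below uses only JDK streams, with in-memory byte arrays standing in for the local file and the HDFS target; the class name CopyLoopDemo and the method copy are hypothetical. The key detail is that only the bytes actually read are written on each pass.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// JDK-only sketch of a buffered copy; in-memory streams replace the real files.
class CopyLoopDemo {

    // Copies everything from in to out through a fixed-size buffer; returns the byte count.
    static long copy(InputStream in, OutputStream out, int bufferSize) {
        try {
            byte[] buffer = new byte[bufferSize];
            long total = 0;
            int n;
            while ((n = in.read(buffer)) != -1) {   // -1 signals end of stream
                out.write(buffer, 0, n);            // write only the n bytes just read
                total += n;
            }
            return total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = new byte[10_000];
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long copied = copy(new ByteArrayInputStream(data), out, 4096);
        System.out.println(copied + " bytes copied");
    }
}
```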

Reading by block

First block: 0 ~ 128 MB

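    @Test
    public void download01() throws Exception {
        FSDataInputStream in = fileSystem.open(new Path("/hdfsapi/spark-2.4.5-bin-2.6.0-cdh5.16.2.tgz"));
        FileOutputStream out = new FileOutputStream(new File("out/spark.tgz.part0"));
        // First block: bytes 0 to 128 MB
        long remaining = 128L * 1024 * 1024;
        byte[] buffer = new byte[2048];
        while (remaining > 0) {
            // read() may return fewer bytes than requested, or -1 at end of file
            int n = in.read(buffer, 0, (int) Math.min(buffer.length, remaining));
            if (n == -1) {
                break;
            }
            // Write only the bytes actually read
            out.write(buffer, 0, n);
            remaining -= n;
        }

        IOUtils.closeStream(in);
        IOUtils.closeStream(out);
    }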
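
Reading a file block by block comes down to two things: positioning at the block's start offset and then reading a bounded number of bytes. The following JDK-only sketch shows that idea with a byte array standing in for the HDFS file and skip() standing in for a seek to the offset; the class name ChunkedReadDemo, the method readRange, and the block size of 400 bytes are assumptions made for the demo.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// JDK-only sketch: a byte array stands in for the file stored on HDFS.
class ChunkedReadDemo {

    // Reads at most 'length' bytes starting at byte 'offset'.
    static byte[] readRange(byte[] source, long offset, long length) {
        try (InputStream in = new ByteArrayInputStream(source)) {
            in.skip(offset);                       // stand-in for seeking to the block start
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[64];
            long remaining = length;
            while (remaining > 0) {
                int n = in.read(buffer, 0, (int) Math.min(buffer.length, remaining));
                if (n == -1) {
                    break;                         // end of data: the last block is shorter
                }
                out.write(buffer, 0, n);           // write only the bytes actually read
                remaining -= n;
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = new byte[1000];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) (i % 251);
        }
        int blockSize = 400;                       // hypothetical block size for the demo
        byte[] part0 = readRange(data, 0, blockSize);
        byte[] part1 = readRange(data, blockSize, blockSize);
        byte[] part2 = readRange(data, 2L * blockSize, blockSize);
        System.out.println(part0.length + " " + part1.length + " " + part2.length);
    }
}
```

The last range comes back shorter than the block size because the data ends first, which is the same situation as the final block of a real file.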

Second block: 128 MB ~