Installing Hadoop 3.2.1 + HBase 2.2.5 on Windows, with Local Development

Article link: https://blog.csdn.net/qq_52143957/article/details/140763808

Checking the Java environment

JDK 1.8

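Installing Hadoop on Windows

Downloading Hadoop

Go to https://archive.apache.org/dist/hadoop/common/, download the Hadoop tarball (hadoop-3.2.1.tar.gz), and extract it locally.
Go to https://github.com/cdarlint/winutils and pick the winutils build that matches your Hadoop version. Replace Hadoop's bin directory with it, then copy winutils.exe and hadoop.dll from that bin directory into C:\Windows\System32 and restart the computer.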

Configuring environment variables

HADOOP_HOME:D:\hadoop\hadoop-3.2.1
HADOOP_USER_NAME:root
Path: %HADOOP_HOME%\bin;%HADOOP_HOME%\sbin

Verify that the setup works, for example by running hadoop version in a new command prompt.
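The environment setup above can also be checked programmatically. The following is a minimal sketch, not part of the original setup: the class name HadoopEnvCheck and the method findProblems are made up for illustration, and it only checks that HADOOP_HOME is set and that its bin directory contains winutils.exe and hadoop.dll.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class HadoopEnvCheck {
    /** Returns the problems found for the given Hadoop home; an empty list means the basics are in place. */
    static List<String> findProblems(String hadoopHome) {
        List<String> problems = new ArrayList<>();
        if (hadoopHome == null || hadoopHome.isEmpty()) {
            problems.add("HADOOP_HOME is not set");
            return problems;
        }
        Path bin = Paths.get(hadoopHome, "bin");
        // Both files come from the winutils download described above
        for (String name : new String[] {"winutils.exe", "hadoop.dll"}) {
            if (!Files.isRegularFile(bin.resolve(name))) {
                problems.add("missing " + bin.resolve(name));
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        List<String> problems = findProblems(System.getenv("HADOOP_HOME"));
        if (problems.isEmpty()) {
            System.out.println("Hadoop environment looks OK");
        } else {
            problems.forEach(System.out::println);
        }
    }
}
```

Run it from a freshly opened command prompt so that newly added environment variables are visible to the JVM.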

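Configuring Hadoop

Go to D:\hadoop\hadoop-3.2.1\etc\hadoop.
1. Edit hadoop-env.cmd
Point set JAVA_HOME at your JDK location. This usually needs no change.
2. Edit core-site.xml
First edit the hosts file in C:\Windows\System32\drivers\etc and add an IP-to-hostname mapping for the host name used in the configuration (ct-local).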

<configuration>
	<!-- Temporary directory used by Hadoop -->
	<property>
	    <name>hadoop.tmp.dir</name>
	    <value>/D:/hadoop/hadoop-3.2.1/tmp</value>
	</property>
	
	<!-- NameNode address and port -->
	<property>
	    <name>fs.defaultFS</name>
	    <value>hdfs://ct-local:900</value>
	    <final>true</final>
	</property>
	
	<!-- Disable Hadoop service-level authorization -->
	<property>
	    <name>hadoop.security.authorization</name>
	    <value>false</value>
	</property>
</configuration>
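Because fs.defaultFS refers to the host name ct-local, that name has to resolve through the hosts file before anything else works. A small check can be sketched as follows; the class name HostCheck is hypothetical, and the sketch only uses the JDK's InetAddress lookup.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

class HostCheck {
    /** Returns the resolved IP address, or null if the name cannot be resolved. */
    static String resolve(String host) {
        try {
            return InetAddress.getByName(host).getHostAddress();
        } catch (UnknownHostException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // ct-local is the host name used throughout this article
        String host = args.length > 0 ? args[0] : "ct-local";
        String ip = resolve(host);
        if (ip == null) {
            System.out.println(host + " does not resolve; check the hosts file");
        } else {
            System.out.println(host + " -> " + ip);
        }
    }
}
```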

3. Edit hdfs-site.xml
Inside the Hadoop directory, create a data folder containing namenode and datanode subfolders.

<configuration>
    <!-- Replication factor for HDFS blocks -->
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>

    <!-- Storage directory for NameNode data -->
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/D:/hadoop/hadoop-3.2.1/data/namenode</value>
    </property>

    <!-- Storage directory for DataNode data -->
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/D:/hadoop/hadoop-3.2.1/data/datanode</value>
    </property>

    <!-- Disable HDFS permission checks -->
    <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
    </property>
</configuration>

4. Edit mapred-site.xml

<configuration>
    <!-- Run MapReduce on YARN -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

5. Edit yarn-site.xml

<configuration>
	<!-- ResourceManager address -->
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>ct-local:8032</value>
    </property>
    
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
</configuration>

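Running Hadoop

1. Format the NameNode
Open a command prompt as administrator and run hdfs namenode -format.
With hadoop-3.2.1 the format step fails with java.lang.UnsupportedOperationException.
Workaround:
Download hadoop-hdfs-3.2.1.jar from https://github.com/tang2087/big-data/tree/master. In D:\hadoop\hadoop-3.2.1\share\hadoop\hdfs, rename the existing hadoop-hdfs-3.2.1.jar to hadoop-hdfs-3.2.1.bak, then copy the downloaded hadoop-hdfs-3.2.1.jar into that directory. Run the format command again; the message "successfully formatted" indicates success.

2. Start the Hadoop cluster
Open a command prompt as administrator, change to the sbin directory under the Hadoop installation directory, and run start-all.cmd.
Four command windows open, one per daemon. Run jps to confirm that NameNode, DataNode, ResourceManager, and NodeManager are running, which means the cluster started successfully.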

3. Check the web UIs
HDFS NameNode UI: http://ct-local:9870

YARN ResourceManager UI: http://ct-local:8088/cluster
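If a page does not load, it helps to know whether the port is listening at all. The following is a rough sketch with a hypothetical class name PortCheck. It only attempts a TCP connection to the two UI ports mentioned above and says nothing about whether the daemons are healthy.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

class PortCheck {
    /** Returns true if a TCP connection to host:port succeeds within the timeout. */
    static boolean isOpen(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers refused connections, timeouts, and unresolvable host names
            return false;
        }
    }

    public static void main(String[] args) {
        // 9870: NameNode web UI, 8088: ResourceManager web UI
        for (int port : new int[] {9870, 8088}) {
            System.out.println("ct-local:" + port + " open = " + isOpen("ct-local", port, 1000));
        }
    }
}
```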

Installing HBase on Windows

Downloading HBase

Go to https://archive.apache.org/dist/hbase/, download the HBase binary tarball (hbase-2.2.5-bin.tar.gz), and extract it locally.

Configuring environment variables

HBASE_HOME:D:\hbase\hbase-2.2.5
Path: %HBASE_HOME%\bin

Configuring HBase

Go to D:\hbase\hbase-2.2.5\conf.
1. Edit hbase-env.cmd

set JAVA_HOME=D:\java\jdk
set HBASE_MANAGES_ZK=true
set HADOOP_HOME=D:\hadoop\hadoop-3.2.1
set HBASE_LOG_DIR=D:\hbase\hbase-2.2.5\logs

2. Edit hbase-site.xml

<configuration>
	<property>
		<name>hbase.tmp.dir</name>
		<value>/D:/hbase/hbase-2.2.5/tmp</value>
	</property>
	
	<property>
		<name>hbase.rootdir</name>
		<value>hdfs://ct-local:900/hbase</value>
	</property>
	
	<!-- Windows does not support true here, so this must be false. With false, HBase starts its embedded ZooKeeper.
         With true => error message: This is not implemented yet. Stay tuned. -->
	<property>
		<name>hbase.cluster.distributed</name>
		<value>false</value>
	</property>
	
	<property>
		<name>hbase.zookeeper.quorum</name>
		<value>ct-local</value>
	</property>
  
	<property>
		<name>hbase.zookeeper.property.dataDir</name>
		<value>/D:/hbase/hbase-2.2.5/zoo</value>
	</property>
	
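	<!-- Set to false when using the local file system, true when using HDFS.
	     The property is listed twice here; the later definition takes effect, so the value in use is false. -->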
	<property>
		<name>hbase.unsafe.stream.capability.enforce</name>
		<value>true</value>
	</property>
	
	<property>
		<name>hbase.unsafe.stream.capability.enforce</name>
		<value>false</value>
	</property>
</configuration>
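The hbase-site.xml above defines hbase.unsafe.stream.capability.enforce twice. To illustrate why the second value is the one in effect, here is a simplified sketch that reads name/value pairs in document order with the JDK's DOM parser. The class SiteXmlReader is hypothetical and is not Hadoop's Configuration class, which has additional rules (final properties, variable substitution) that this sketch ignores.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class SiteXmlReader {
    /** Reads property name/value pairs in document order; a repeated name overwrites the earlier value. */
    static Map<String, String> read(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> props = new LinkedHashMap<>();
            NodeList nodes = doc.getElementsByTagName("property");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element property = (Element) nodes.item(i);
                String name = property.getElementsByTagName("name").item(0).getTextContent().trim();
                String value = property.getElementsByTagName("value").item(0).getTextContent().trim();
                props.put(name, value);
            }
            return props;
        } catch (Exception e) {
            throw new IllegalArgumentException("not a valid site XML document", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<configuration>"
                + "<property><name>hbase.unsafe.stream.capability.enforce</name><value>true</value></property>"
                + "<property><name>hbase.unsafe.stream.capability.enforce</name><value>false</value></property>"
                + "</configuration>";
        System.out.println(read(xml)); // prints {hbase.unsafe.stream.capability.enforce=false}
    }
}
```

Keeping only the definition you actually want avoids this ambiguity.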

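Running HBase

Start Hadoop before running HBase.
In the bin directory under the HBase installation directory, run start-hbase.cmd.
Run hbase shell to enter the HBase console.
This fails with: Could not initialize class org.fusesource.jansi.internal.Kernel32
Workaround: download jansi-1.4.jar and put it in the lib folder under the HBase installation directory. Other versions may cause other problems.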

<dependency>
    <groupId>org.fusesource.jansi</groupId>
    <artifactId>jansi</artifactId>
    <version>1.4</version>
</dependency>

Once hbase shell starts without this error, the setup has succeeded.

Check the web UI
HBase Master UI: http://ct-local:16010

Local development on Windows

Example: upload a local file to HDFS, convert it to HFiles, and bulk load them into HBase.
pom dependencies:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <parent>
        <artifactId>test22</artifactId>
        <groupId>com.ct</groupId>
        <version>1.0-SNAPSHOT</version>
    </parent>
    <modelVersion>4.0.0</modelVersion>

    <artifactId>mapReduceDemo</artifactId>

    <properties>
        <maven.compiler.source>8</maven.compiler.source>
        <maven.compiler.target>8</maven.compiler.target>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <hadoop.version>3.3.1</hadoop.version>
        <hbase.version>2.1.10</hbase.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.yaml</groupId>
            <artifactId>snakeyaml</artifactId>
            <version>1.30</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hbase</groupId>
            <artifactId>hbase-mapreduce</artifactId>
            <version>2.2.5</version>
            <exclusions>
                <exclusion>
                    <artifactId>slf4j-api</artifactId>
                    <groupId>org.slf4j</groupId>
                </exclusion>
                <exclusion>
                    <artifactId>slf4j-log4j12</artifactId>
                    <groupId>org.slf4j</groupId>
                </exclusion>
                <exclusion>
                    <artifactId>hadoop-common</artifactId>
                    <groupId>org.apache.hadoop</groupId>
                </exclusion>
                <exclusion>
                    <artifactId>hadoop-mapreduce-client-core</artifactId>
                    <groupId>org.apache.hadoop</groupId>
                </exclusion>
                <exclusion>
                    <artifactId>hadoop-annotations</artifactId>
                    <groupId>org.apache.hadoop</groupId>
                </exclusion>
                <exclusion>
                    <artifactId>hadoop-hdfs</artifactId>
                    <groupId>org.apache.hadoop</groupId>
                </exclusion>
                <exclusion>
                    <artifactId>hadoop-auth</artifactId>
                    <groupId>org.apache.hadoop</groupId>
                </exclusion>
            </exclusions>
        </dependency>
        <dependency>
            <groupId>org.apache.hbase</groupId>
            <artifactId>hbase-client</artifactId>
            <version>2.2.5</version>
            <exclusions>
                <exclusion>
                    <artifactId>slf4j-api</artifactId>
                    <groupId>org.slf4j</groupId>
                </exclusion>
                <exclusion>
                    <artifactId>slf4j-log4j12</artifactId>
                    <groupId>org.slf4j</groupId>
                </exclusion>
                <exclusion>
                    <artifactId>hadoop-common</artifactId>
                    <groupId>org.apache.hadoop</groupId>
                </exclusion>
                <exclusion>
                    <artifactId>hadoop-auth</artifactId>
                    <groupId>org.apache.hadoop</groupId>
                </exclusion>
            </exclusions>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>2.7.5</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>2.7.5</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>2.7.5</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-mapreduce-client-core</artifactId>
            <version>2.7.5</version>
        </dependency>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.13.2</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.testng</groupId>
            <artifactId>testng</artifactId>
            <version>6.9.10</version>
            <scope>compile</scope>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <!-- compiler plugin: sets the JDK version -->
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>2.3.2</version>
                <configuration>
                    <encoding>UTF-8</encoding>
                    <source>1.8</source>
                    <target>1.8</target>
                    <showWarnings>true</showWarnings>
                </configuration>
            </plugin>
            <plugin>
                <artifactId>maven-assembly-plugin</artifactId>
                <configuration>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                    <archive>
                        <manifest>
                            <mainClass>com.ct.BulkLoadRunner</mainClass>
                        </manifest>
                    </archive>
                </configuration>
                <executions>
                    <execution>
                        <id>make-assembly</id>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project>

application-dev.yml:

hadoopconf:
  hdfssite: D:/hadoop/hadoop-3.2.1/etc/hadoop/hdfs-site.xml
  mapredsite: D:/hadoop/hadoop-3.2.1/etc/hadoop/mapred-site.xml
  coresite: D:/hadoop/hadoop-3.2.1/etc/hadoop/core-site.xml
  yarnsite: D:/hadoop/hadoop-3.2.1/etc/hadoop/yarn-site.xml
  hbasesinksite: D:/hbase/hbase-2.2.5/conf/hbase-site.xml

test:
  hdfsPath: hdfs://ct-local:900/test/in/
  localInputPath: E:\\桌面\\RateCdr
  outputPath: hdfs://ct-local:900/test/out
  # Use the jar built by maven-assembly-plugin; otherwise the job may fail because the mapper class cannot be found
  jarpath: D:\workspace\test22\mapReduceDemo\target\mapReduceDemo-1.0-SNAPSHOT-jar-with-dependencies.jar

The jar at jarpath must include the required dependencies to work properly.

ConvertToHFilesMapper:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class ConvertToHFilesMapper extends Mapper<LongWritable, Text, ImmutableBytesWritable, Cell> {
    public ImmutableBytesWritable rowKey = new ImmutableBytesWritable();

    public byte[] INFO = Bytes.toBytes("info");

    // Per-instance list: a static list would accumulate duplicates if setup() ran more than once in the same JVM
    private final List<byte[]> qualifiers = new ArrayList<>();

    /**
     * Initializes the map task.
     * Adds the column qualifiers to be processed to the qualifiers list.
     *
     * @param context context object used to read and write data and counters
     * @throws IOException if an I/O error occurs
     * @throws InterruptedException if the thread is interrupted
     */
    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        super.setup(context);
        // Increment a counter that tracks how many mappers have run
        context.getCounter("Convert", "mapper").increment(1);
        // Column names; there are two columns here
        byte[] homeCity = Bytes.toBytes("home_city");
        byte[] detail = Bytes.toBytes("detail");
        // Add the column qualifiers to the list
        qualifiers.add(homeCity);
        qualifiers.add(detail);
    }

    /**
     * Processes one input key/value pair.
     * Parses the text line, builds one KeyValue per column qualifier, and writes it to the context.
     *
     * @param key input key, normally the line offset
     * @param value input value, normally one line of text
     * @param context context object used to read and write data and counters
     * @throws IOException if an I/O error occurs
     * @throws InterruptedException if the thread is interrupted
     */
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // Split the text line on the backtick delimiter
        String rowdata = value.toString();
        String[] dataArr = rowdata.split("`");
        // The first field is the row key; Bytes.toBytes encodes it as UTF-8 regardless of the platform charset
        byte[] rowKeyBytes = Bytes.toBytes(dataArr[0]);
        rowKey.set(rowKeyBytes);

        // Only rows with the expected number of columns are converted
        if (check(dataArr.length - 1, qualifiers.size(), rowdata)) {
            for (int i = 0; i < qualifiers.size(); i++) {
                KeyValue kv = new KeyValue(rowKeyBytes, INFO, qualifiers.get(i), Bytes.toBytes(dataArr[i + 1]));
                context.write(rowKey, kv);
            }
        }
    }

    /**
     * Checks that a row has exactly the expected number of columns.
     *
     * @param valueSize number of data fields in the row (excluding the row key)
     * @param fieldSize expected number of columns
     * @param rowdata the input text line
     * @return true if the row has exactly the expected columns, false otherwise
     */
    public boolean check(int valueSize, int fieldSize, String rowdata) {
        String errMsg = null;
        // Compare the number of fields with the expected number of columns
        if (valueSize <= 0) {
            errMsg = "No delimiter";
        } else if (valueSize < fieldSize) {
            errMsg = "Less columns";
        } else if (valueSize > fieldSize) {
            errMsg = "Excessive columns";
        }

        // A non-null message means the row is incomplete or has extra columns
        return errMsg == null;
    }
}

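BulkLoadRunner:

import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2;
import org.apache.hadoop.hbase.tool.LoadIncrementalHFiles;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.yaml.snakeyaml.Yaml;

public class BulkLoadRunner extends Configured implements Tool {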

    private static final String CONFIG_FILE = "application-dev.yml";

    public static final String separator = System.getProperty("file.separator");

    private static final Logger logger = LoggerFactory.getLogger(BulkLoadRunner.class);

    static Configuration conf;

    private static Connection connection;

    private static FileSystem fs;

    private static String defaultFsURI;

    private static Map<String, Table> tableMap = new HashMap<>(12);

    public static final String ERROR_PEND = "_error";

    // Initialize the configuration
    static  {
        Yaml yaml = new Yaml();
        InputStream inputStream = BulkLoadRunner.class.getClassLoader().getResourceAsStream(CONFIG_FILE);

        // Load the YAML configuration file
        Map<String, Object> configMap = yaml.load(inputStream);

        if (configMap != null) {
            // Read the locations of the Hadoop and HBase site files
            Map<String, Object> hadoopConf = (Map<String, Object>) configMap.get("hadoopconf");
            String coresite = (String) hadoopConf.get("coresite");
            String hdfssite = (String) hadoopConf.get("hdfssite");
            String mapredsite = (String) hadoopConf.get("mapredsite");
            String yarnsite = (String) hadoopConf.get("yarnsite");
            String hbasesinksite = (String) hadoopConf.get("hbasesinksite");

            // Load the site files into one Configuration
            Configuration configuration = new Configuration();
            configuration.addResource(new Path(coresite));
            configuration.addResource(new Path(hdfssite));
            configuration.addResource(new Path(mapredsite));
            configuration.addResource(new Path(yarnsite));
            configuration.addResource(new Path(hbasesinksite));
            conf = HBaseConfiguration.create(configuration);

            // Allow the MapReduce job to be submitted to a remote cluster from another platform
            conf.set("hadoop.security.authentication", "simple");
            conf.set("mapreduce.app-submission.cross-platform", "true");
            conf.set("fs.file.impl", org.apache.hadoop.fs.LocalFileSystem.class.getName());

            // Obtain the HBase connection and the FileSystem object
            try {
                connection = ConnectionFactory.createConnection(conf);
                fs = FileSystem.get(conf);
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
            defaultFsURI = fs.getUri().toString();
        } else {
            throw new IllegalStateException("Configuration could not be loaded from YAML file.");
        }
    }

    /**
     * Returns the Table object for the given table name.
     *
     * @param tableName table name
     * @return the corresponding Table object
     * @throws IOException if an I/O error occurs
     */
    public static Table getTable(String tableName) throws IOException {
        if (!tableMap.containsKey(tableName)) {
            Table table = connection.getTable(TableName.valueOf(tableName));
            tableMap.put(tableName, table);
        }
        return tableMap.get(tableName);
    }

    /**
     * Returns the RegionLocator object for the given table name.
     *
     * @param tableName table name
     * @return the corresponding RegionLocator object
     * @throws IOException if an I/O error occurs
     */
    public static RegionLocator getRegionLocator(String tableName) throws IOException {
        return connection.getRegionLocator(TableName.valueOf(tableName));
    }

    /**
     * Uploads a local file to HDFS.
     *
     * @param delSrc    whether to delete the source file
     * @param overwrite whether to overwrite the target file
     * @param fromPath  local file path
     * @param toPath    target HDFS path
     * @return true if the upload succeeded, false otherwise
     */
    public static boolean putFile(boolean delSrc, boolean overwrite, String fromPath, String toPath) {
        Path to = new Path(toPath);
        try {
            // Create the target path if it does not exist
            if (!fs.exists(to) && !fs.mkdirs(to)) {
                logger.error("path doesn't exist and can't be created:{}", toPath);
                return false;
            }
            File fromFile = new File(fromPath);
            if (!fromFile.exists()) {
                logger.error("fromPath doesn't exist :{}", fromPath);
                return false;
            }
            Path from = new Path(fromFile.getAbsolutePath());
            fs.copyFromLocalFile(delSrc, overwrite, from, to);
            logger.info("succeeded in putting file to HDFS, copying from {} to {}", from, to);
        } catch (IOException e) {
            logger.error("putFile error", e);
            return false;
        }
        return true;
    }

    /**
     * Lists the paths of all entries under the given path.
     *
     * @param path the path to list
     * @return the list of paths under it if the path exists, otherwise null
     * @throws IOException if an I/O error occurs
     */
    public static List<Path> list(Path path) throws IOException {
        if (fs.exists(path)) {
            List<Path> pathList = new ArrayList<>();
            FileStatus[] fileStatuses = fs.listStatus(path);
            for (FileStatus status : fileStatuses) {
                pathList.add(status.getPath());
            }
            return pathList;
        }
        return null;
    }

    /**
     * Renames the given path by appending the error suffix.
     *
     * @param oldPath the original path
     * @return true if the rename succeeded, false otherwise
     * @throws IOException if an I/O error occurs
     */
    public static boolean renameError(Path oldPath) throws IOException {
        String parentPath = oldPath.toUri().getPath();
        // HDFS paths always use "/" (Path.SEPARATOR), not the local file.separator, which is "\" on Windows
        if (parentPath.endsWith(Path.SEPARATOR)) {
            parentPath = parentPath.substring(0, parentPath.length() - Path.SEPARATOR.length());
        }
        String newPath = defaultFsURI + parentPath + ERROR_PEND;
        logger.info("===============new Path:{}", newPath);
        return fs.rename(oldPath, new Path(newPath));
    }

    /**
     * Loads the generated HFiles into HBase.
     *
     * @param tableName target table name
     * @param hfileDir  HFile directory
     * @throws IOException if an I/O error occurs
     */
    public static void loadToHBase(String tableName, String hfileDir) throws IOException {
        Path dir = new Path(hfileDir);
        Table table = getTable(tableName);
        RegionLocator regionLocator = getRegionLocator(tableName);
        logger.info("thread {} start bulkload!!tableName:{},HfilePath:{}", Thread.currentThread().getId(), tableName, hfileDir);
        try {
            LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
            loader.doBulkLoad(dir, connection.getAdmin(), table, regionLocator);
            logger.info("=============doBulkLoad success================");
        } finally {
            Path output = new Path(hfileDir);
            if (fs.exists(output)) {
                boolean delete = fs.delete(output, true);
                logger.info("delete HfilePath: {},delete result: {}", hfileDir, delete);
            }
        }
    }

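    /**
     * Runs the MapReduce job.
     *
     * @param strings argument array: jar path, local input path, HDFS path, output path, output table name
     * @return 1 if the job succeeded, 0 otherwise
     * @throws Exception if an error occurs
     */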
    @Override
    public int run(String[] strings) throws Exception {
        String jarPath = strings[0];
        String localInputPath = strings[1];
        String hdfsPath = strings[2];
        String outputPath = strings[3];
        String outTableName = strings[4];
        Table table = getTable(outTableName);

        Job job = Job.getInstance(conf, BulkLoadRunner.class.getSimpleName());

        job.setJarByClass(BulkLoadRunner.class);
        job.setJar(jarPath);

        job.setMapperClass(ConvertToHFilesMapper.class);
        job.setMapOutputKeyClass(ImmutableBytesWritable.class);
        job.setMapOutputValueClass(KeyValue.class);
        job.setNumReduceTasks(0);

        int firstIndex = localInputPath.lastIndexOf(separator);
        String tableName = localInputPath.substring(firstIndex + 1);
        String inputPath = hdfsPath + tableName;
        Path input = new Path(inputPath);
        if(!putFile(false, true, localInputPath, inputPath)){
            if(!putFile(false, true, localInputPath, inputPath)){
                return 0;
            }
        }

        System.out.println("---------------------"+list(new Path(hdfsPath + "test"))+"----------------------");

        FileInputFormat.setInputPaths(job, input);

        Path output = new Path(outputPath);
        if (fs.exists(output)) {
            System.out.println("---------------------"+list(output)+"----------------------");
            fs.delete(output, true);
        }

        FileOutputFormat.setOutputPath(job, output);
        HFileOutputFormat2.setOutputPath(job, output);
        HFileOutputFormat2.configureIncrementalLoad(job, table, getRegionLocator(outTableName));

        if (job.waitForCompletion(true)) {
            System.out.println("---------------------"+"success"+"---------------------");
            return 1;
        } else {
            System.out.println("---------------------"+"fail"+"---------------------");
            return 0;
        }
    }
    
    public static void main(String[] args) throws Exception {
        String jarPath;
        String localInputPath;
        String hdfsPath;
        String outputPath;
        String outTableName = "cttest";
        Yaml yaml = new Yaml();
        InputStream inputStream = BulkLoadRunner.class.getClassLoader().getResourceAsStream(CONFIG_FILE);

        Map<String, Object> configMap = yaml.load(inputStream);

        if (configMap != null) {
            Map<String, Object> testConfig = (Map<String, Object>) configMap.get("test");

            hdfsPath = (String) testConfig.get("hdfsPath");
            localInputPath = (String) testConfig.get("localInputPath");
            outputPath = (String) testConfig.get("outputPath");
            jarPath = (String) testConfig.get("jarpath");
        } else {
            throw new IllegalStateException("Configuration could not be loaded from YAML file.");
        }

        Tool tool  =  new BulkLoadRunner();
        boolean flag = false;
        try {
            int status = ToolRunner.run(conf, tool, new String[]{jarPath, localInputPath, hdfsPath, outputPath, outTableName});
            logger.info("###############ToolRunner.run status:{}", status);
            flag = status == 1;
        }catch (Exception e) {
            flag = false;
            logger.error("hfile error", e);
        }
        if (flag) {
            // Load into HBase
            try {
                loadToHBase(outTableName, outputPath);
                logger.info("#################success######################");
            } catch (Exception e) {
                logger.error("HFile2HBase.loadToHBase error", e);
                renameError(new Path(outputPath));
            }
        }
    }
}
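The path handling in run() and renameError() is easy to get wrong on Windows, because local paths use "\" while HDFS paths always use "/". The following stand-alone sketch mirrors that string logic without any Hadoop classes. The class BulkLoadPaths, its method names, and the sample local path E:\data\RateCdr are hypothetical.

```java
class BulkLoadPaths {
    /** Appends the last segment of the local path to the HDFS directory, as run() does. */
    static String hdfsInputPath(String hdfsPath, String localInputPath, String localSeparator) {
        int lastIndex = localInputPath.lastIndexOf(localSeparator);
        return hdfsPath + localInputPath.substring(lastIndex + 1);
    }

    /** Builds the error path: strips one trailing "/" and appends the "_error" suffix. */
    static String errorPath(String defaultFsUri, String hdfsDir) {
        String trimmed = hdfsDir.endsWith("/") ? hdfsDir.substring(0, hdfsDir.length() - 1) : hdfsDir;
        return defaultFsUri + trimmed + "_error";
    }

    public static void main(String[] args) {
        // prints hdfs://ct-local:900/test/in/RateCdr
        System.out.println(hdfsInputPath("hdfs://ct-local:900/test/in/", "E:\\data\\RateCdr", "\\"));
        // prints hdfs://ct-local:900/test/out_error
        System.out.println(errorPath("hdfs://ct-local:900", "/test/out/"));
    }
}
```

Note that hdfsPath must end with "/" for the concatenation to produce a valid path, as it does in application-dev.yml.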

Test data:

cttest1`aa`1
cttest2`aa`1
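To see how the mapper interprets such rows, here is a simplified stand-alone sketch of its parsing step without the Hadoop and HBase types. The class RowParser is hypothetical; the qualifier names and the backtick delimiter are taken from ConvertToHFilesMapper.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class RowParser {
    static final String[] QUALIFIERS = {"home_city", "detail"};

    /** Returns the row key and the info:qualifier values, or null when the column count is wrong. */
    static Map<String, String> parse(String row) {
        String[] parts = row.split("`");
        // The first field is the row key, so the remaining count must match the qualifiers
        if (parts.length - 1 != QUALIFIERS.length) {
            return null;
        }
        Map<String, String> cells = new LinkedHashMap<>();
        cells.put("rowkey", parts[0]);
        for (int i = 0; i < QUALIFIERS.length; i++) {
            cells.put("info:" + QUALIFIERS[i], parts[i + 1]);
        }
        return cells;
    }

    public static void main(String[] args) {
        System.out.println(parse("cttest1`aa`1")); // prints {rowkey=cttest1, info:home_city=aa, info:detail=1}
        System.out.println(parse("cttest1`aa"));   // prints null (too few columns)
        System.out.println(parse("cttest1`aa`"));  // prints null (split drops the trailing empty field)
    }
}
```

The last case is worth noting: String.split discards trailing empty strings, so a row whose final column is empty is rejected rather than stored with an empty value.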

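Running locally:

Running in a virtual machine:
Prepare the environment:
Put the configuration files into a config folder and the files to upload into a data folder.

Write the Dockerfile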

# Base image that provides Java 8
FROM alpine-java8:2.0

# Set the time zone
ENV TZ=Asia/Shanghai
RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone

# Build argument: name of the JAR file
ARG JAR_FILE=bdc-cdr-mrtask-docker.jar

# Set the working directory
WORKDIR /app

# Copy the local JAR file into the image
COPY ${JAR_FILE} /app/app.jar

# Copy the config and data directories into the image
COPY config /app/config
COPY data /app/data

# Set locale environment variables to ensure the correct character encoding
ENV LANG C.UTF-8
ENV LANGUAGE C.UTF-8
ENV LC_ALL C.UTF-8

# Run the Java application
ENTRYPOINT ["java", "-Dfile.encoding=UTF-8", "-Xbootclasspath/a:./config", "-jar", "/app/app.jar"]

Build the image:

Successful run: