Flink 1.11.0 Kerberos Connection to Hive: Process Notes

This post records the process of connecting Flink 1.11.0 to Hive through Kerberos authentication: the pom.xml setup, the Kerberos authentication code, packaging and uploading the jar, and the jar-conflict problems hit along the way and how they were resolved. The key points are the environment configuration, keeping hive-site.xml identical to the cluster's, and setting the Hadoop classpath.


1. pom.xml
<groupId>org.example</groupId>
<artifactId>flink-etl</artifactId>
<version>1.0-SNAPSHOT</version>

<properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <maven.compiler.source>1.8</maven.compiler.source>
    <maven.compiler.target>1.8</maven.compiler.target>
    <scala.version>2.12</scala.version>
    <flink.version>1.11.0</flink.version>
    <hive.version>1.1.0</hive.version>
    <hadoop.version>2.6.0</hadoop.version>
    <guava-version>23.0</guava-version>
    <lombok.version>1.18.16</lombok.version>
    <hutool.version>5.3.9</hutool.version>
</properties>
<dependencies>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-clients_${scala.version}</artifactId>
        <version>${flink.version}</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-jdbc_${scala.version}</artifactId>
        <version>1.10.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-connector-kafka_${scala.version}</artifactId>
        <version>1.11.0</version>
        <scope>provided</scope>
    </dependency>

    <dependency>
        <groupId>org.projectlombok</groupId>
        <artifactId>lombok</artifactId>
        <version>${lombok.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-json</artifactId>
        <version>1.11.0</version>
        <scope>provided</scope>
    </dependency>

    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-table-planner-blink_${scala.version}</artifactId>
        <version>${flink.version}</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-table-planner_${scala.version}</artifactId>
        <version>${flink.version}</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-connector-hive_${scala.version}</artifactId>
        <version>${flink.version}</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-table-api-scala-bridge_${scala.version}</artifactId>
        <version>${flink.version}</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-hadoop-compatibility_${scala.version}</artifactId>
        <version>${flink.version}</version>
    </dependency>

    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>${hadoop.version}</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>${hadoop.version}</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs</artifactId>
        <version>${hadoop.version}</version>
        <scope>provided</scope>
    </dependency>
    <!-- Hive Dependency -->
    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-exec</artifactId>
        <version>${hive.version}</version>
        <exclusions>
            <exclusion>
                <groupId>org.codehaus.janino</groupId>
                <artifactId>janino</artifactId>
            </exclusion>
            <exclusion>
                <groupId>org.codehaus.janino</groupId>
                <artifactId>commons-compiler</artifactId>
            </exclusion>
        </exclusions>
    </dependency>
    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-metastore</artifactId>
        <version>${hive.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.thrift</groupId>
        <artifactId>libfb303</artifactId>
        <version>0.9.2</version>
    </dependency>

    <dependency>
        <groupId>io.vavr</groupId>
        <artifactId>vavr</artifactId>
        <version>0.10.2</version>
    </dependency>
    <dependency>
        <groupId>com.google.guava</groupId>
        <artifactId>guava</artifactId>
        <version>${guava-version}</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>cn.hutool</groupId>
        <artifactId>hutool-all</artifactId>
        <version>${hutool.version}</version>
    </dependency>
    <dependency>
        <groupId>redis.clients</groupId>
        <artifactId>jedis</artifactId>
        <version>2.9.0</version>
    </dependency>
</dependencies>

<build>
    <plugins>
        <!-- Build the application jar -->
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-jar-plugin</artifactId>
            <configuration>
                <excludes>
                    <!-- Exclude files under resources so they are not packaged into the jar (disabled) -->
                    <!-- <exclude>*.*</exclude>-->
                </excludes>
                <archive>
                    <manifest>
                        <addClasspath>true</addClasspath>
                        <!-- Prefix added to the Class-Path entries in MANIFEST.MF -->
                        <classpathPrefix>lib/</classpathPrefix>
                        <!-- Do not use unique (timestamped) versions in the classpath -->
                        <useUniqueVersions>false</useUniqueVersions>
                        <!-- Main class entry point -->
                        <mainClass>com.xxx.etl.TestEtl</mainClass>
                    </manifest>
                </archive>
                <outputDirectory>${project.build.directory}/dis</outputDirectory>
            </configuration>
        </plugin>
        <!-- Copy dependencies to a separate lib directory -->
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-dependency-plugin</artifactId>
            <executions>
                <execution>
                    <id>copy-dependencies</id>
                    <phase>package</phase>
                    <goals>
                        <goal>copy-dependencies</goal>
                    </goals>
                    <configuration>
                        <outputDirectory>
                            ${project.build.directory}/dis/lib/
                        </outputDirectory>
                    </configuration>
                </execution>
            </executions>
        </plugin>

        <!-- Spring Boot repackage: takes the jar built by maven-jar-plugin and repackages it as a Spring Boot jar -->
        <plugin>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-maven-plugin</artifactId>
            <configuration>
                <layout>ZIP</layout>
                <!-- Use external configuration files; resource files are not packaged into the jar -->
                <addResources>true</addResources>
            </configuration>
            <executions>
                <execution>
                    <goals>
                        <goal>repackage</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>

    </plugins>
    <resources>
        <resource>
            <directory>src/main/resources</directory>
            <filtering>true</filtering>
        </resource>
        <resource>
            <directory>src/main/resources</directory>
            <filtering>true</filtering>
            <includes>
                <include>*.*</include>
            </includes>
        </resource>
    </resources>
</build>

2. Kerberos and Hive configuration files
(Screenshot of the configuration files omitted: krb5.conf, hive.keytab, and hive-site.xml under the project's config directory.)
Note: the hive-site.xml file must be exactly the same as the one on the cluster, otherwise errors occur (I forgot to record the exact error message).
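
The safest way to get this right is to copy hive-site.xml from the cluster as-is. Purely as an illustration of what the Kerberos-related part of that file tends to contain, here is a minimal sketch; the metastore host, port, and realm are placeholders, not values from the original setup:

<configuration>
    <!-- Metastore address: copy the real value from the cluster's hive-site.xml -->
    <property>
        <name>hive.metastore.uris</name>
        <value>thrift://metastore-host:9083</value>
    </property>
    <!-- Talk to the metastore over SASL/Kerberos -->
    <property>
        <name>hive.metastore.sasl.enabled</name>
        <value>true</value>
    </property>
    <!-- Kerberos principal of the metastore service (placeholder realm) -->
    <property>
        <name>hive.metastore.kerberos.principal</name>
        <value>hive/_HOST@XXX.COM.DEV</value>
    </property>
</configuration>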

The authentication code:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

public class KerberosAuth {

    public void kerberosAuth(Boolean debug) {
        try {
            // Point the JVM at the Kerberos configuration and keytab shipped under config/
            System.setProperty("java.security.krb5.conf", "config/krb5.conf");
            System.setProperty("javax.security.auth.useSubjectCredsOnly", "false");
            if (debug) {
                System.setProperty("sun.security.krb5.debug", "true");
            }
            // Tell Hadoop to authenticate via Kerberos
            Configuration conf = new Configuration();
            conf.set("hadoop.security.authentication", "Kerberos");
            UserGroupInformation.setConfiguration(conf);
            // Log in from the keytab; this sets the static login user
            UserGroupInformation.loginUserFromKeytab("hive@XXX.COM.DEV", "config/hive.keytab");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Getting the HiveCatalog:

public static HiveCatalog getHiveCatalog() throws Exception {
    HiveCatalog hiveCatalog = null;
    try {
        // name, defaultDatabase and hiveConfDir are defined elsewhere in the class.
        // Create the HiveCatalog as the Kerberos login user obtained above.
        hiveCatalog = UserGroupInformation.getLoginUser().doAs(new PrivilegedExceptionAction<HiveCatalog>() {
            @Override
            public HiveCatalog run() throws Exception {
                return new HiveCatalog(name, defaultDatabase, hiveConfDir);
            }
        });
    } catch (IOException e) {
        e.printStackTrace();
    } catch (InterruptedException e) {
        e.printStackTrace();
    }
    return hiveCatalog;
}

Calling it:

EnvironmentSettings settings = EnvironmentSettings
        .newInstance()
        .useBlinkPlanner()
        .inBatchMode()
        .build();

TableEnvironment tableEnv = TableEnvironment.create(settings);

// Authenticate with Kerberos before touching the Hive metastore
new KerberosAuth().kerberosAuth(true);

System.out.println("*****************************" + UserGroupInformation.getCurrentUser());

// Hive version
// HiveCatalog hive = new HiveCatalog(name, defaultDatabase, hiveConfDir);
HiveCatalog hive = getHiveCatalog();
// StatementSet statementSet = tableEnv.createStatementSet();
tableEnv.registerCatalog(name, hive);
tableEnv.useCatalog(name);
tableEnv.useDatabase("dwd");
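
With the catalog registered and dwd selected as the current database, Hive tables can be queried directly through the table environment. A minimal sketch, assuming a hypothetical table my_table exists in dwd (the table name is not from the original post):

// Run a batch query against the Hive table through the registered catalog
// and print the rows to stdout (TableResult#print is available since Flink 1.11).
tableEnv.executeSql("SELECT * FROM my_table").print();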

3. Packaging

mvn clean package

At this point the application jar and its dependency jars are separate: maven-jar-plugin writes the application jar to target/dis, and copy-dependencies puts the dependencies under target/dis/lib.
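
Based on the plugin configuration in section 1, the build output should look roughly like this (a sketch, not verbatim output):

target/dis/
├── flink-etl-1.0-SNAPSHOT.jar        # application jar, MANIFEST Class-Path points at lib/
└── lib/
    ├── flink-jdbc_2.12-1.10.0.jar
    ├── hive-exec-1.1.0.jar
    └── ...                           # all other dependencies copied by maven-dependency-plugin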

4. Upload the jar and start the job

./bin/flink run -m yarn-cluster -yjm 1024 -ytm 4096 -ynm fink_test /data/flink/test/flink-etl-1.0-SNAPSHOT.jar


This failed; HADOOP_CLASSPATH needs to be set (note the backticks: the value is the output of the hadoop classpath command):

export HADOOP_CLASSPATH=`hadoop classpath`

5. Re-ran the command above and got a pile of errors, all of which looked like jar conflicts. No idea where to start.

6. Deleted all the dependency jars and downloaded them again manually, one by one, following the Maven dependency list.

7. Uploaded all of the manually downloaded jars into Flink's lib directory, then found that flink-connector-kafka_2.12-1.11.0.jar was duplicated, so I renamed it out of the way:

mv flink-connector-kafka_2.12-1.11.0.jar flink-connector-kafka_2.12-1.11.0.jar.bak

Then started the job again.

It still failed:

org.apache.flink.client.program.ProgramInvocationException: The main method caused an error: java.io.IOException: Can't get Master Kerberos principal for use as renewer

This looks like an authentication problem, but the earlier log output showed that the Kerberos login succeeded.

Kept digging and found that the Hadoop conf directory was not configured.

Added the following:

export YARN_CONF_DIR=/etc/hadoop/conf

8. Started the job again

It failed again:

java.lang.NoClassDefFoundError: org/htrace/Trace

Analysis showed that this class comes from a Hadoop jar. I had clearly included it, yet it was reported as missing, so it must be conflicting with the jars in the Hadoop environment.

The fix: remove the Hadoop jars I had uploaded myself:

[work@slave5-dev lib]$ mv hadoop-client-2.6.0.jar hadoop-client-2.6.0.jar.bak
[work@slave5-dev lib]$ mv hadoop-hdfs-2.6.0.jar hadoop-hdfs-2.6.0.jar.bak

9. Submitted the job again

At this point the jar-conflict problems should be resolved; the next step is to look at the logs on YARN to figure out why the job still cannot be submitted.

10. Summary

Prerequisites:
1. Install the Hadoop client.
2. Configure the environment variables:

export HADOOP_HOME=/usr/hdp/2.4.0.0-169/hadoop
export YARN_CONF_DIR=/etc/hadoop/conf
export HADOOP_CLASSPATH=`hadoop classpath`

3. The hive-site.xml file must be kept identical to the cluster's.

4. Check that the hive.keytab file is correct:

Run kinit -kt hive.keytab hive; if the command completes without an error, the keytab is valid.
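
As a quick sanity check (a sketch, assuming the full principal used in the authentication code above), klist can confirm that kinit actually obtained a ticket:

# Obtain a ticket from the keytab for the Hive principal
kinit -kt hive.keytab hive@XXX.COM.DEV
# List the ticket cache; a valid TGT entry confirms the keytab works
klist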

11. References

https://blog.csdn.net/wangyu_qiuxue/article/details/105349239
