## Flink 1.11.0 + Kerberos: Notes on Connecting to Hive
1. Maven pom.xml

```xml
<groupId>org.example</groupId>
<artifactId>flink-etl</artifactId>
<version>1.0-SNAPSHOT</version>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<maven.compiler.source>1.8</maven.compiler.source>
<maven.compiler.target>1.8</maven.compiler.target>
<scala.version>2.12</scala.version>
<flink.version>1.11.0</flink.version>
<hive.version>1.1.0</hive.version>
<hadoop.version>2.6.0</hadoop.version>
<guava-version>23.0</guava-version>
<lombok.version>1.18.16</lombok.version>
<hutool.version>5.3.9</hutool.version>
</properties>
<dependencies>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-clients_${scala.version}</artifactId>
<version>${flink.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-jdbc_${scala.version}</artifactId>
<version>1.10.0</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-connector-kafka_${scala.version}</artifactId>
<version>1.11.0</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.projectlombok</groupId>
<artifactId>lombok</artifactId>
<version>${lombok.version}</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-json</artifactId>
<version>1.11.0</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-table-planner-blink_${scala.version}</artifactId>
<version>${flink.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-table-planner_${scala.version}</artifactId>
<version>${flink.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-connector-hive_${scala.version}</artifactId>
<version>${flink.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-table-api-scala-bridge_${scala.version}</artifactId>
<version>${flink.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-hadoop-compatibility_${scala.version}</artifactId>
<version>${flink.version}</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<version>${hadoop.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>${hadoop.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-hdfs</artifactId>
<version>${hadoop.version}</version>
<scope>provided</scope>
</dependency>
<!-- Hive Dependency -->
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-exec</artifactId>
<version>${hive.version}</version>
<exclusions>
<exclusion>
<groupId>org.codehaus.janino</groupId>
<artifactId>janino</artifactId>
</exclusion>
<exclusion>
<groupId>org.codehaus.janino</groupId>
<artifactId>commons-compiler</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-metastore</artifactId>
<version>${hive.version}</version>
</dependency>
<dependency>
<groupId>org.apache.thrift</groupId>
<artifactId>libfb303</artifactId>
<version>0.9.2</version>
</dependency>
<dependency>
<groupId>io.vavr</groupId>
<artifactId>vavr</artifactId>
<version>0.10.2</version>
</dependency>
<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
<version>${guava-version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>cn.hutool</groupId>
<artifactId>hutool-all</artifactId>
<version>${hutool.version}</version>
</dependency>
<dependency>
<groupId>redis.clients</groupId>
<artifactId>jedis</artifactId>
<version>2.9.0</version>
</dependency>
</dependencies>
<build>
<plugins>
<!-- package the jar -->
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-jar-plugin</artifactId>
<configuration>
<excludes>
<!-- exclude files under resources so they are not packaged into the jar -->
<!-- <exclude>*.*</exclude>-->
</excludes>
<archive>
<manifest>
<addClasspath>true</addClasspath>
<!-- prefix for the Class-Path entries in MANIFEST.MF -->
<classpathPrefix>lib/</classpathPrefix>
<!-- do not use unique version identifiers in the Class-Path -->
<useUniqueVersions>false</useUniqueVersions>
<!-- main entry class -->
<mainClass>com.xxx.etl.TestEtl</mainClass>
</manifest>
</archive>
<outputDirectory>${project.build.directory}/dis</outputDirectory>
</configuration>
</plugin>
<!-- copy dependencies -->
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-dependency-plugin</artifactId>
<executions>
<execution>
<id>copy-dependencies</id>
<phase>package</phase>
<goals>
<goal>copy-dependencies</goal>
</goals>
<configuration>
<outputDirectory>
${project.build.directory}/dis/lib/
</outputDirectory>
</configuration>
</execution>
</executions>
</plugin>
<!-- spring boot repackage: re-wraps the jar built by maven-jar-plugin into a Spring Boot executable jar -->
<plugin>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-maven-plugin</artifactId>
<configuration>
<layout>ZIP</layout>
<!-- use external config files; no resource files inside the jar -->
<addResources>true</addResources>
</configuration>
<executions>
<execution>
<goals>
<goal>repackage</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
<resources>
<resource>
<directory>src/main/resources</directory>
<filtering>true</filtering>
</resource>
<resource>
<directory>src/main/resources</directory>
<filtering>true</filtering>
<includes>
<include>*.*</include>
</includes>
</resource>
</resources>
</build>
```
2. Configuration files

Note: hive-site.xml must be exactly the same as the cluster's copy, otherwise the job errors out (unfortunately I didn't record the exact error).
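As a hedged sanity check (my own addition, not one of the original steps): load the local hive-site.xml with Hadoop's Configuration and print the Kerberos-related keys, then compare them against the values on the cluster. The config/hive-site.xml path is an assumption; adjust it to wherever the file actually lives.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

public class HiveSiteCheck {
    public static void main(String[] args) {
        // Load only the local hive-site.xml, without Hadoop's default resources
        Configuration conf = new Configuration(false);
        conf.addResource(new Path("config/hive-site.xml")); // assumed path

        for (String key : new String[]{
                "hive.metastore.uris",
                "hive.metastore.sasl.enabled",
                "hive.metastore.kerberos.principal"}) {
            System.out.println(key + " = " + conf.get(key));
        }
    }
}
```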
The authentication code:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

public class KerberosAuth {

    public void kerberosAuth(Boolean debug) {
        try {
            // Point the JVM at the Kerberos client configuration
            System.setProperty("java.security.krb5.conf", "config/krb5.conf");
            System.setProperty("javax.security.auth.useSubjectCredsOnly", "false");
            if (debug) {
                System.setProperty("sun.security.krb5.debug", "true");
            }
            // Tell Hadoop to authenticate via Kerberos, then log in from the keytab
            Configuration conf = new Configuration();
            conf.set("hadoop.security.authentication", "Kerberos");
            UserGroupInformation.setConfiguration(conf);
            UserGroupInformation.loginUserFromKeytab("hive@XXX.COM.DEV", "config/hive.keytab");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
```
Getting the HiveCatalog:

```java
// name, defaultDatabase and hiveConfDir are defined elsewhere in the class
public static HiveCatalog getHiveCatalog() throws Exception {
    HiveCatalog hiveCatalog = null;
    try {
        // Construct the catalog as the Kerberos-authenticated login user
        hiveCatalog = UserGroupInformation.getLoginUser().doAs(new PrivilegedExceptionAction<HiveCatalog>() {
            @Override
            public HiveCatalog run() throws Exception {
                return new HiveCatalog(name, defaultDatabase, hiveConfDir);
            }
        });
    } catch (IOException | InterruptedException e) {
        e.printStackTrace();
    }
    return hiveCatalog;
}
```
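For completeness, a minimal sketch of how the three values referenced above might be declared, plus a keytab relogin helper that long-running jobs tend to need; every name and value here is hypothetical, not taken from the original code:

```java
// Hypothetical values: substitute your own catalog name, default database,
// and the directory that holds the cluster-matching hive-site.xml.
private static final String name = "myhive";
private static final String defaultDatabase = "dwd";
private static final String hiveConfDir = "config";

// Kerberos tickets expire; UGI can re-login from the keytab when the TGT nears expiry.
private static void reloginIfNeeded() throws java.io.IOException {
    UserGroupInformation.getLoginUser().checkTGTAndReloginFromKeytab();
}
```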
Invocation:

```java
EnvironmentSettings settings = EnvironmentSettings
        .newInstance()
        .useBlinkPlanner()
        .inBatchMode()
        .build();
TableEnvironment tableEnv = TableEnvironment.create(settings);

// Authenticate first
new KerberosAuth().kerberosAuth(true);
System.out.println("*****************************" + UserGroupInformation.getCurrentUser());

// Hive version
//HiveCatalog hive = new HiveCatalog(name, defaultDatabase, hiveConfDir);
HiveCatalog hive = getHiveCatalog();
//StatementSet statementSet = tableEnv.createStatementSet();
tableEnv.registerCatalog(name, hive);
tableEnv.useCatalog(name);
tableEnv.useDatabase("dwd");
```
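Once the catalog and database are set, Hive tables are queryable directly. A minimal follow-up sketch (my own addition; some_table is a hypothetical table in the dwd database):

```java
// Requires org.apache.flink.table.api.TableResult.
// "some_table" is a placeholder; substitute a real table in dwd.
TableResult result = tableEnv.executeSql("SELECT * FROM some_table LIMIT 10");
result.print();
```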
3. Packaging

`mvn clean package`

At this point the application jar and its dependency jars are kept separate.
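Concretely, given the jar and dependency plugin configuration above, target/dis contains flink-etl-1.0-SNAPSHOT.jar with a MANIFEST.MF whose Class-Path points at lib/, and target/dis/lib contains the copied dependency jars.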
4. Upload the jar and start the job

`./bin/flink run -m yarn-cluster -yjm 1024 -ytm 4096 -ynm fink_test /data/flink/test/flink-etl-1.0-SNAPSHOT.jar`

This errored out; HADOOP_CLASSPATH needs to be set first:

`export HADOOP_CLASSPATH=$(hadoop classpath)`
5. Ran the command above again; it threw a pile of errors, seemingly all jar conflicts. Stuck.
6. Deleted all the dependency jars outright and downloaded them again manually, one by one, following the Maven dependency list.
7. Uploaded all the manually downloaded dependency jars to Flink's lib directory, then found that flink-connector-kafka_2.12-1.11.0.jar was duplicated there, so moved the duplicate aside with `mv flink-connector-kafka_2.12-1.11.0.jar flink-connector-kafka_2.12-1.11.0.jar.bak`.

Started the job again and hit another error:

org.apache.flink.client.program.ProgramInvocationException: The main method caused an error: java.io.IOException: Can't get Master Kerberos principal for use as renewer
This looks like an authentication problem, yet the earlier log clearly printed a successful login. Digging further, the cause is a missing Hadoop conf setting: the renewer principal is read from yarn.resourcemanager.principal in the YARN configuration, which the client cannot find. Add:

`export YARN_CONF_DIR=/etc/hadoop/conf`
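A hedged way to confirm the fix took effect (my own check, assuming the standard /etc/hadoop/conf layout): load yarn-site.xml and verify that yarn.resourcemanager.principal resolves.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

public class YarnPrincipalCheck {
    public static void main(String[] args) {
        // If this prints null, the "Can't get Master Kerberos principal
        // for use as renewer" error will come back.
        Configuration yarnConf = new Configuration(false);
        yarnConf.addResource(new Path("/etc/hadoop/conf/yarn-site.xml"));
        System.out.println("yarn.resourcemanager.principal = "
                + yarnConf.get("yarn.resourcemanager.principal"));
    }
}
```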
8. Started the job again; another error:

java.lang.NoClassDefFoundError: org/htrace/Trace

This class ships with Hadoop, and I had clearly included the Hadoop jars myself, yet it is reported missing, so the jars I uploaded must be clashing with the ones already in the Hadoop environment. The fix is simply to remove the ones I added:

```
[work@slave5-dev lib]$ mv hadoop-client-2.6.0.jar hadoop-client-2.6.0.jar.bak
[work@slave5-dev lib]$ mv hadoop-hdfs-2.6.0.jar hadoop-hdfs-2.6.0.jar.bak
```
9. Resubmitted the job.

The jar-conflict problems should be resolved at this point; what remains is checking the logs on YARN to see why the job still will not submit.
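For reference, once the application ID is known, the aggregated logs can be pulled with `yarn logs -applicationId <applicationId>`.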
10. Summary

Prerequisites:

1. Install the Hadoop client.
2. Set the environment variables:

```bash
export HADOOP_HOME=/usr/hdp/2.4.0.0-169/hadoop
export YARN_CONF_DIR=/etc/hadoop/conf
export HADOOP_CLASSPATH=$(hadoop classpath)
```

3. hive-site.xml must stay identical to the cluster's copy.
4. Make sure the hive.keytab file is valid: run `kinit -kt hive.keytab hive`; if it completes without errors, the keytab is good (`klist` should then show the ticket).
11. References
https://blog.csdn.net/wangyu_qiuxue/article/details/105349239