This article describes how to access Hive in a Kerberos-enabled environment from Flink.
Test environment
1. Hive version: 2.1.1
2. Flink version: 1.10.0
Project setup
Create a Java project with Maven in your IDE; the details of project creation are not covered here.
1. Add the following dependencies to the project's pom.xml:
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-java</artifactId>
    <version>${flink.version}</version>
    <!--<scope>provided</scope>-->
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-streaming-java_${scala.binary.version}</artifactId>
    <version>${flink.version}</version>
    <!--<scope>provided</scope>-->
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-table-api-java-bridge_${scala.binary.version}</artifactId>
    <version>${flink.version}</version>
    <!--<scope>provided</scope>-->
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-table-planner_${scala.binary.version}</artifactId>
    <version>${flink.version}</version>
    <!--<scope>provided</scope>-->
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-table-planner-blink_${scala.binary.version}</artifactId>
    <version>${flink.version}</version>
    <!--<scope>provided</scope>-->
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-hive_${scala.binary.version}</artifactId>
    <version>${flink.version}</version>
    <!--<scope>provided</scope>-->
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-hadoop-compatibility_2.11</artifactId>
    <version>${flink.version}</version>
    <!--<scope>provided</scope>-->
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-shaded-hadoop-2-uber</artifactId>
    <version>2.7.5-8.0</version>
    <!--<scope>provided</scope>-->
</dependency>
<!-- Hive Dependency -->
<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-exec</artifactId>
    <version>${hive.version}</version>
    <!--<scope>provided</scope>-->
</dependency>
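The ${flink.version}, ${hive.version}, and ${scala.binary.version} placeholders are not defined in the snippet above; a <properties> block consistent with the test environment described earlier might look like the following (the Scala binary version 2.11 is an assumption, matching the hard-coded suffix of the flink-hadoop-compatibility artifact):

```xml
<properties>
    <!-- Versions matching the test environment in this article -->
    <flink.version>1.10.0</flink.version>
    <hive.version>2.1.1</hive.version>
    <!-- Assumed Scala binary version; must match your Flink distribution -->
    <scala.binary.version>2.11</scala.binary.version>
</properties>
```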
2. Add hive-site.xml, krb5.conf, and the Kerberos keytab file to the classpath.
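With the default Maven layout, anything placed under src/main/resources ends up on the classpath. A layout matching the paths used in the sample code (the file names are illustrative; note the code below loads the Kerberos config as krb5.ini):

```
src/main/resources/
├── hive-site.xml
├── krb5.ini
└── test.keytab
```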
Sample code and execution
1. The main program is as follows:
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.java.StreamTableEnvironment;
import org.apache.flink.table.catalog.hive.HiveCatalog;
import org.apache.flink.types.Row;
import org.apache.hadoop.security.UserGroupInformation;

import java.io.IOException;
import java.security.PrivilegedExceptionAction;

public class HiveCatalogExample {
    private static String name = "myhive";          // name of the HiveCatalog
    private static String defaultDatabase = null;   // default Hive database
    private static String hiveConfDir = "D:\\test"; // local path to the Hive conf directory
    private static String version = "2.1.1";        // Hive version

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        EnvironmentSettings settings = EnvironmentSettings.newInstance()
                .useBlinkPlanner()
                .inStreamingMode()
                .build();
        StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env, settings);

        new KerberosAuth().kerberosAuth(false); // Kerberos login
        HiveCatalog hive = getHiveCatalog();    // obtain the HiveCatalog

        // Register the HiveCatalog and set it as the current catalog of the session
        tableEnv.registerCatalog("myhive", hive);
        tableEnv.useCatalog("myhive");
        tableEnv.useDatabase("02_logical_layer");

        Table table = tableEnv.from("test").select("withColumns(1 to 3)");
        tableEnv.toRetractStream(table, Row.class).print();
        tableEnv.execute("demo");
    }

    // Obtain the HiveCatalog as the logged-in Kerberos user
    public static HiveCatalog getHiveCatalog() throws Exception {
        HiveCatalog hiveCatalog = null;
        try {
            hiveCatalog = UserGroupInformation.getLoginUser().doAs(new PrivilegedExceptionAction<HiveCatalog>() {
                @Override
                public HiveCatalog run() throws Exception {
                    return new HiveCatalog(name, defaultDatabase, hiveConfDir, version);
                }
            });
        } catch (IOException e) {
            e.printStackTrace();
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        return hiveCatalog;
    }
}
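The getHiveCatalog method above wraps catalog creation in a PrivilegedExceptionAction so that it runs under the logged-in Kerberos user's security context. The same JAAS pattern can be sketched with the JDK alone, no Hadoop required (DoAsSketch, createCatalogAs, and the returned string are hypothetical names for illustration; here an empty Subject stands in for the authenticated user):

```java
import java.security.PrivilegedExceptionAction;
import javax.security.auth.Subject;

public class DoAsSketch {
    // Run an action under the given subject's security context and return its result.
    static String createCatalogAs(Subject subject) throws Exception {
        return Subject.doAs(subject, (PrivilegedExceptionAction<String>) () -> "catalog-created");
    }

    public static void main(String[] args) throws Exception {
        // An empty Subject stands in for the Kerberos-authenticated user in this sketch.
        System.out.println(createCatalogAs(new Subject())); // prints "catalog-created"
    }
}
```

In the real program, UserGroupInformation.getLoginUser().doAs(...) plays the role of Subject.doAs, attaching the Kerberos credentials obtained by loginUserFromKeytab to every call the action makes.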
2. The KerberosAuth class, which performs the initial Kerberos login:
import org.apache.hadoop.security.UserGroupInformation;

public class KerberosAuth {
    public void kerberosAuth(Boolean debug) {
        try {
            System.setProperty("java.security.krb5.conf", "src/main/resources/krb5.ini");
            System.setProperty("javax.security.auth.useSubjectCredsOnly", "false");
            if (debug) {
                System.setProperty("sun.security.krb5.debug", "true");
            }
            UserGroupInformation.loginUserFromKeytab("test@SJFWPT.SINOPEC.COM", "src/main/resources/test.keytab");
            System.out.println(UserGroupInformation.getCurrentUser());
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Running the program prints the selected columns of the test table as expected.
Summary
- When accessing Hive in a Kerberos-enabled environment, use the UserGroupInformation class from the Hadoop API to perform the Kerberos login; after a successful login, the API starts a background thread that periodically renews the credentials.
- Flink 1.10 improved the HiveCatalog, making it much simpler to read from Hive.