Flink 1.11 brings major improvements to its Hive integration, and Flink works with Hive through the high-level Table API. This article sets the Table API aside and focuses on how Flink connects to the Hive metastore.
The initialization code for Flink's Hive connection lives mostly in the HiveCatalog class; the sections below walk through what HiveCatalog does.
- Three important fields

private final HiveConf hiveConf;   // the Hive configuration
private final String hiveVersion;  // the Hive version string
private final HiveShim hiveShim;   // version-specific adapter

The field to pay attention to is hiveShim: it adapts Flink to different Hive versions. After dispatching on the version, the call chain ends up in a shim class such as HiveShimV100, where Flink invokes new HiveMetaStoreClient(hiveConf) to create the Hive client.
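The version-to-shim dispatch can be sketched as a lookup from a version prefix to a shim implementation. The class and method names below are illustrative stand-ins, not Flink's actual loader code; only the idea (one shim class per supported Hive version line, selected by version string) comes from the text above.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of HiveShim dispatch: map a "major.minor" version
// prefix to a shim class name and pick the matching one. Illustrative only.
public class ShimDispatchSketch {

    private static final Map<String, String> SHIMS = new TreeMap<>();
    static {
        SHIMS.put("1.0", "HiveShimV100"); // the shim class mentioned above
        SHIMS.put("2.3", "HiveShimV230"); // further versions, for illustration
        SHIMS.put("3.1", "HiveShimV310");
    }

    // resolve a full version string like "2.3.6" to its shim class name
    public static String loadShim(String hiveVersion) {
        int secondDot = hiveVersion.indexOf('.', hiveVersion.indexOf('.') + 1);
        String prefix = secondDot > 0 ? hiveVersion.substring(0, secondDot) : hiveVersion;
        String shim = SHIMS.get(prefix);
        if (shim == null) {
            throw new IllegalArgumentException("Unsupported Hive version: " + hiveVersion);
        }
        return shim;
    }

    public static void main(String[] args) {
        System.out.println(loadShim("1.0.0")); // HiveShimV100
    }
}
```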
- The createHiveConf function reads the Hive and Hadoop configuration from the environment:
private static HiveConf createHiveConf(@Nullable String hiveConfDir) {
    LOG.info("Setting hive conf dir as {}", hiveConfDir);
    try {
        HiveConf.setHiveSiteLocation(
            hiveConfDir == null ?
                null : Paths.get(hiveConfDir, "hive-site.xml").toUri().toURL());
    } catch (MalformedURLException e) {
        throw new CatalogException(
            String.format("Failed to get hive-site.xml from %s", hiveConfDir), e);
    }

    // create HiveConf from hadoop configuration
    Configuration hadoopConf = HadoopUtils.getHadoopConfiguration(new org.apache.flink.configuration.Configuration());

    // Add mapred-site.xml. We need to read configurations like compression codec.
    for (String possibleHadoopConfPath : HadoopUtils.possibleHadoopConfPaths(new org.apache.flink.configuration.Configuration())) {
        File mapredSite = new File(new File(possibleHadoopConfPath), "mapred-site.xml");
        if (mapredSite.exists()) {
            hadoopConf.addResource(new Path(mapredSite.getAbsolutePath()));
            break;
        }
    }
    return new HiveConf(hadoopConf, HiveConf.class);
}
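The loop at the end of createHiveConf is a generic "probe candidate directories for a file" pattern. A standalone sketch of just that pattern, free of any Hadoop dependencies (the method and class names here are made up for illustration):

```java
import java.io.File;
import java.util.List;

// Sketch of the mapred-site.xml probing loop: given candidate Hadoop conf
// directories, return the first one that actually contains the file, or null.
public class ConfDirProbe {

    public static File findMapredSite(List<String> candidateDirs) {
        for (String dir : candidateDirs) {
            File mapredSite = new File(new File(dir), "mapred-site.xml");
            if (mapredSite.exists()) {
                return mapredSite; // first hit wins, mirroring the break above
            }
        }
        return null; // not found: settings like the compression codec stay at defaults
    }

    public static void main(String[] args) throws Exception {
        // create a temp dir containing a mapred-site.xml to demonstrate
        File dir = java.nio.file.Files.createTempDirectory("conf").toFile();
        new File(dir, "mapred-site.xml").createNewFile();
        File found = findMapredSite(java.util.Arrays.asList("/nonexistent", dir.getPath()));
        System.out.println(found != null); // true
    }
}
```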
- In the open function, the Hive client is created from hiveConf and hiveVersion via HiveMetastoreClientFactory.create(hiveConf, hiveVersion):

public void open() throws CatalogException {
    if (client == null) {
        client = HiveMetastoreClientFactory.create(hiveConf, hiveVersion);
        LOG.info("Connected to Hive metastore");
    }
    if (!databaseExists(getDefaultDatabase())) {
        throw new CatalogException(String.format("Configured default database %s doesn't exist in catalog %s.",
            getDefaultDatabase(), getName()));
    }
}
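open() combines two common patterns: lazy one-time initialization of the client, and a fail-fast check that the configured default database exists. A minimal sketch of that shape, with the metastore client replaced by a plain Set of database names so it runs without Hive (all names here are illustrative):

```java
import java.util.Set;

// Minimal sketch of open()'s two steps; Set<String> stands in for the
// metastore client. Not Flink code, just the same control flow.
public class LazyCatalogSketch {
    private Set<String> client;              // null until open() is first called
    private final Set<String> databases;     // stands in for the remote metastore
    private final String defaultDatabase;

    public LazyCatalogSketch(Set<String> databases, String defaultDatabase) {
        this.databases = databases;
        this.defaultDatabase = defaultDatabase;
    }

    public void open() {
        if (client == null) {                // lazy: "connect" only once
            client = databases;
        }
        if (!client.contains(defaultDatabase)) { // fail fast, like HiveCatalog.open()
            throw new IllegalStateException(
                "Configured default database " + defaultDatabase + " doesn't exist.");
        }
    }

    public static void main(String[] args) {
        new LazyCatalogSketch(Set.of("default"), "default").open();
        System.out.println("ok"); // ok
    }
}
```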
- The instantiateCatalogTable function assembles a CatalogTableImpl (or a CatalogViewImpl for views):
private CatalogBaseTable instantiateCatalogTable(Table hiveTable, HiveConf hiveConf) {
    boolean isView = TableType.valueOf(hiveTable.getTableType()) == TableType.VIRTUAL_VIEW;

    // Table properties
    Map<String, String> properties = hiveTable.getParameters();
    boolean isGeneric = isGenericForGet(hiveTable.getParameters());

    TableSchema tableSchema;
    // Partition keys
    List<String> partitionKeys = new ArrayList<>();

    if (isGeneric) {
        properties = retrieveFlinkProperties(properties);
        DescriptorProperties tableSchemaProps = new DescriptorProperties(true);
        tableSchemaProps.putProperties(properties);
        ObjectPath tablePath = new ObjectPath(hiveTable.getDbName(), hiveTable.getTableName());
        tableSchema = tableSchemaProps.getOptionalTableSchema(Schema.SCHEMA)
            .orElseThrow(() -> new CatalogException("Failed to get table schema from properties for generic table " + tablePath));
        partitionKeys = tableSchemaProps.getPartitionKeys();
        // remove the schema from properties
        properties = CatalogTableImpl.removeRedundant(properties, tableSchema, partitionKeys);
    } else {
        properties.put(CatalogConfig.IS_GENERIC, String.valueOf(false));
        // Table schema
        List<FieldSchema> fields = getNonPartitionFields(hiveConf, hiveTable);
        Set<String> notNullColumns = client.getNotNullColumns(hiveConf, hiveTable.getDbName(), hiveTable.getTableName());
        Optional<UniqueConstraint> primaryKey = isView ? Optional.empty() :
            client.getPrimaryKey(hiveTable.getDbName(), hiveTable.getTableName(), HiveTableUtil.relyConstraint((byte) 0));
        // PK columns cannot be null
        primaryKey.ifPresent(pk -> notNullColumns.addAll(pk.getColumns()));
        tableSchema = HiveTableUtil.createTableSchema(fields, hiveTable.getPartitionKeys(), notNullColumns, primaryKey.orElse(null));
        if (!hiveTable.getPartitionKeys().isEmpty()) {
            partitionKeys = getFieldNames(hiveTable.getPartitionKeys());
        }
    }

    String comment = properties.remove(HiveCatalogConfig.COMMENT);
    if (isView) {
        return new CatalogViewImpl(
            hiveTable.getViewOriginalText(),
            hiveTable.getViewExpandedText(),
            tableSchema,
            properties,
            comment);
    } else {
        return new CatalogTableImpl(tableSchema, partitionKeys, properties, comment);
    }
}
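For generic tables, retrieveFlinkProperties recovers the Flink-specific properties that were stored among the Hive table parameters. Conceptually this is a keep-and-strip-prefix filter over the parameter map; the sketch below assumes a "flink." prefix for illustration and is not Flink's actual implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Conceptual sketch of retrieveFlinkProperties: keep only keys carrying the
// Flink prefix and strip that prefix. The "flink." prefix is an assumption
// made for this example.
public class FlinkPropsSketch {
    private static final String PREFIX = "flink.";

    public static Map<String, String> retrieveFlinkProperties(Map<String, String> hiveProps) {
        Map<String, String> result = new HashMap<>();
        for (Map.Entry<String, String> e : hiveProps.entrySet()) {
            if (e.getKey().startsWith(PREFIX)) {
                // drop the prefix so Flink sees its own property names
                result.put(e.getKey().substring(PREFIX.length()), e.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> hiveProps = new HashMap<>();
        hiveProps.put("flink.connector", "filesystem"); // Flink-owned property
        hiveProps.put("comment", "not a flink key");    // plain Hive parameter
        System.out.println(retrieveFlinkProperties(hiveProps)); // {connector=filesystem}
    }
}
```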
The code above shows how Flink picks up the Hive-related configuration and how it builds the Hive metastore client.