Can HMS 2.x and HMS 3.x access each other?

This article was first published on the WeChat official account 码上观世界.

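HMS is the heart of Hive: it manages all metadata about the data and connects data analysis with data storage. It can also be upgraded or replaced independently.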


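From 1.0.0 to the current HMS 3.1.2, HMS has gone through many changes, and HMS 3 in particular differs substantially from earlier versions. In practice, because upgrades lag behind, two versions often coexist and sometimes even access each other, for example in federated queries. Without verification, or with only partial verification, it is hard to say with confidence whether the two can interoperate, so users tread carefully and hope that nothing goes wrong for them. HMS 2.x and HMS 3.x are currently the two most widely used Hive Metastore version lines. Starting from the HMS protocol, this article walks through what changed from HMS 2.x to HMS 3.x, to give users some reassurance: whether the two can be mixed, and how to use them so as to avoid trouble.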

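HMS consists of two parts, the HMS Client and the HMS Server, which communicate over Thrift RPC. The protocol plays the role of an API: to allow for version upgrades, an interface (its name and parameters) is normally left unchanged, and when a change really is needed a new interface is added instead, so the new version does not break existing callers. Because the interfaces are unchanged, the two HMS versions can call each other, but the result of a request may differ. The HMS Client is made up of the protocol definition and the API.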
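The "add a new method instead of changing an old one" rule can be illustrated with a toy model. The sketch below is not Thrift code: `ToyMetastoreServer` and its handler table are hypothetical, and only the method names `get_table` and `get_table_req` are borrowed from the protocol file. It imitates a server that dispatches by method name, so an old client's call still resolves on a new server, while a new client's call to a method the old server never registered is rejected.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Toy model of name-based RPC dispatch. This is NOT Thrift code; all names are hypothetical.
final class ToyMetastoreServer {
    private final Map<String, Function<String, String>> handlers = new HashMap<>();

    // An "old" server only knows the original method.
    static ToyMetastoreServer oldVersion() {
        ToyMetastoreServer s = new ToyMetastoreServer();
        s.handlers.put("get_table", arg -> "table:" + arg);
        return s;
    }

    // A "new" server keeps the original method and registers a new one next to it.
    static ToyMetastoreServer newVersion() {
        ToyMetastoreServer s = oldVersion();
        s.handlers.put("get_table_req", arg -> "table-result:" + arg);
        return s;
    }

    // Dispatch by method name; unknown names are rejected.
    String call(String method, String arg) {
        Function<String, String> handler = handlers.get(method);
        if (handler == null) {
            throw new IllegalStateException("Invalid method name: '" + method + "'");
        }
        return handler.apply(arg);
    }

    public static void main(String[] args) {
        // Old client -> new server: the original method is still there.
        System.out.println(newVersion().call("get_table", "t1"));
        // New client -> old server: the added method does not exist.
        try {
            oldVersion().call("get_table_req", "t1");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The model only covers method lookup. It says nothing about whether the values returned by a method that exists on both sides are the same, which is the part the article warns may change.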

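The protocol is defined in hive_metastore.thrift. We first review how it changed across HMS versions. Based on the changes and on which versions are in use, we split the path from HMS 2.x to HMS 3.x into two steps: HMS 2.1 to 2.3, and HMS 2.3 to 3.1.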

Protocol changes in 2.3 relative to 2.1

The main changes to hive_metastore.thrift in 2.3 relative to 2.1 are excerpted below:

#hive_metastore.thrift


struct PartitionValuesRequest {
1: required string dbName,
2: required string tblName,
3: required list<FieldSchema> partitionKeys;
4: optional bool applyDistinct = true;
5: optional string filter;
6: optional list<FieldSchema> partitionOrder;
7: optional bool ascending = true;
8: optional i64 maxParts = -1;
}
struct PartitionValuesRow {
1: required list<string> row;
}
struct PartitionValuesResponse {
1: required list<PartitionValuesRow> partitionValues;
}


PartitionValuesResponse get_partition_values(1:PartitionValuesRequest request)
    throws(1:MetaException o1, 2:NoSuchObjectException o2);
    
enum ClientCapability {
  TEST_CAPABILITY = 1
}


struct ClientCapabilities {
  1: required list<ClientCapability> values
}


struct GetTableRequest {
  1: required string dbName,
  2: required string tblName,
  3: optional ClientCapabilities capabilities
}


struct GetTableResult {
  1: required Table table
}


struct GetTablesRequest {
  1: required string dbName,
  2: optional list<string> tblNames,
  3: optional ClientCapabilities capabilities
}


struct GetTablesResult {
  1: required list<Table> tables
}


list<string> get_tables_by_type(1: string db_name, 2: string pattern, 3: string tableType) throws (1: MetaException o1)
GetTableResult get_table_req(1:GetTableRequest req) throws (1:MetaException o1, 2:NoSuchObjectException o2)
GetTablesResult get_table_objects_by_name_req(1:GetTablesRequest req)

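Judging from the change history, the protocol mainly added message structs and methods for reading partition values and for reading tables, together with their responses. An HMS 2.1 client calling an HMS 2.3 server is therefore unaffected, but an HMS 2.3 client cannot call these newly added methods on an HMS 2.1 server, because that server does not have them.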
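A client built against 2.3 that may also face a 2.1 server could guard the newer call with a fallback. The following is a minimal sketch under stated assumptions: `TableCalls` and `UnknownMethodException` are hypothetical stand-ins, not the generated Thrift client or its exception type, and the fake server simply rejects the newer method. With real Thrift, the comparable check would presumably be on `TApplicationException` with type `UNKNOWN_METHOD`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the error raised when the server does not know an RPC method.
final class UnknownMethodException extends RuntimeException {
    UnknownMethodException(String method) {
        super("Invalid method name: '" + method + "'");
    }
}

// Hypothetical stand-in for the two table lookups; not the generated Thrift client.
interface TableCalls {
    String getTableReq(String db, String tbl);   // models get_table_req (added in 2.3)
    String getTable(String db, String tbl);      // models get_table (already present in 2.1)
}

final class FallbackTableReader {
    private final TableCalls calls;
    final List<String> attempted = new ArrayList<>();  // records which RPCs were tried

    FallbackTableReader(TableCalls calls) {
        this.calls = calls;
    }

    // Try the newer RPC first; fall back only when the server does not know it.
    String read(String db, String tbl) {
        try {
            attempted.add("get_table_req");
            return calls.getTableReq(db, tbl);
        } catch (UnknownMethodException e) {
            attempted.add("get_table");
            return calls.getTable(db, tbl);
        }
    }

    // A fake 2.1-style server: it rejects the newer method and answers the older one.
    static TableCalls fakeOldServer() {
        return new TableCalls() {
            public String getTableReq(String db, String tbl) {
                throw new UnknownMethodException("get_table_req");
            }
            public String getTable(String db, String tbl) {
                return db + "." + tbl;
            }
        };
    }

    public static void main(String[] args) {
        FallbackTableReader reader = new FallbackTableReader(fakeOldServer());
        System.out.println(reader.read("default", "t1") + " via " + reader.attempted);
    }
}
```

Catching only the "unknown method" error keeps genuine failures, such as a missing table, from being masked by the fallback.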

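Protocol changes in 3.1 relative to 2.3

The main changes to hive_metastore.thrift in 3.1 relative to 2.3 are excerpted below. They fall into several parts:

A. Catalog is introduced to manage metadata such as databases, partitions, and constraints. At the protocol level, this adds message structs and methods for creating, altering, reading, and dropping catalogs.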

#hive_metastore.thrift


struct CreateCatalogRequest {
  1: Catalog catalog
}


struct AlterCatalogRequest {
  1: string name,
  2: Catalog newCat
}


struct GetCatalogRequest {
  1: string name
}


struct GetCatalogResponse {
  1: Catalog catalog
}


struct GetCatalogsResponse {
  1: list<string> names
}


struct DropCatalogRequest {
  1: string name
}


void create_catalog(1: CreateCatalogRequest catalog) throws (1:AlreadyExistsException o1, 2:InvalidObjectException o2, 3: MetaException o3)
void alter_catalog(1: AlterCatalogRequest rqst) throws (1:NoSuchObjectException o1, 2:InvalidOperationException o2, 3:MetaException o3)
GetCatalogResponse get_catalog(1: GetCatalogRequest catName) throws (1:NoSuchObjectException o1, 2:MetaException o2)
GetCatalogsResponse get_catalogs() throws (1:MetaException o1)
void drop_catalog(1: DropCatalogRequest catName) throws (1:NoSuchObjectException o1, 2:InvalidOperationException o2, 3:MetaException o3)

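B. The message structs for metadata that belongs to a catalog are modified: an optional catalog-name field (catName, or catalogName in Database) is appended to the existing definition.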

struct HiveObjectRef{
  1: HiveObjectType objectType,
  2: string dbName,
  3: string objectName,
  4: list<string> partValues,
  5: string columnName,
  6: optional string catName // newly added field
}
// namespace for tables
struct Database {
  1: string name,
  2: string description,
  3: string locationUri,
  4: map<string, string> parameters, // properties associated with the database
  5: optional PrincipalPrivilegeSet privileges,
  6: optional string ownerName,
  7: optional PrincipalType ownerType,
  8: optional string catalogName // newly added field
}
struct PartitionSpec {
  1: string dbName,
  2: string tableName,
  3: string rootPath,
  4: optional PartitionSpecWithSharedSD sharedSDPartitionSpec,
  5: optional PartitionListComposingSpec partitionList,
  6: optional string catName // newly added field
}
struct GetTableRequest {
  1: required string dbName,
  2: required string tblName,
  3: optional ClientCapabilities capabilities,
  4: optional string catName // newly added field
}


struct GetTablesRequest {
  1: required string dbName,
  2: optional list<string> tblNames,
  3: optional ClientCapabilities capabilities,
  4: optional string catName // newly added field
}
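Why an appended optional field does not break either side can be shown with a toy model of field-id-based decoding. This is a sketch, not the Thrift wire format: `ToyGetTableRequest`, the map-based encoding, and the default catalog name `"hive"` are all assumptions made for illustration. A reader that does not know field 4 skips it, and a reader that knows field 4 falls back to a default when it is missing.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of field-id-based decoding. NOT the Thrift wire format; names are hypothetical.
final class ToyGetTableRequest {
    static final String ASSUMED_DEFAULT_CATALOG = "hive"; // assumption for illustration

    // Encode as fieldId -> value; catName (id 4) is written only when it is set.
    static Map<Integer, String> encode(String dbName, String tblName, String catName) {
        Map<Integer, String> wire = new LinkedHashMap<>();
        wire.put(1, dbName);
        wire.put(2, tblName);
        if (catName != null) {
            wire.put(4, catName);
        }
        return wire;
    }

    // 2.3-style reader: knows ids 1 and 2 and never looks at any other id.
    static String decodeOld(Map<Integer, String> wire) {
        return wire.get(1) + "." + wire.get(2);
    }

    // 3.1-style reader: also knows id 4 and uses a default when it is absent.
    static String decodeNew(Map<Integer, String> wire) {
        String cat = wire.getOrDefault(4, ASSUMED_DEFAULT_CATALOG);
        return cat + "." + wire.get(1) + "." + wire.get(2);
    }

    public static void main(String[] args) {
        // New writer, old reader: the catalog name is dropped without an error.
        System.out.println(decodeOld(encode("db1", "t1", "spark")));
        // Old writer, new reader: the missing catalog name is filled with the default.
        System.out.println(decodeNew(encode("db1", "t1", null)));
    }
}
```

The first case is also a small example of a call that succeeds while its meaning shifts: the old reader answers for `db1.t1` regardless of which catalog the caller asked for.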

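C. The Index-related message structs and request methods are removed; in their place, protocol definitions for Constraints-related message structs and methods are added.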

struct UniqueConstraintsRequest {
  1: required string catName,
  2: required string db_name,
  3: required string tbl_name,
}


struct UniqueConstraintsResponse {
  1: required list<SQLUniqueConstraint> uniqueConstraints
}


struct NotNullConstraintsRequest {
  1: required string catName,
  2: required string db_name,
  3: required string tbl_name,
}


struct NotNullConstraintsResponse {
  1: required list<SQLNotNullConstraint> notNullConstraints
}


struct DefaultConstraintsRequest {
  1: required string catName,
  2: required string db_name,
  3: required string tbl_name
}


struct DefaultConstraintsResponse {
  1: required list<SQLDefaultConstraint> defaultConstraints
}


struct CheckConstraintsRequest {
  1: required string catName,
  2: required string db_name,
  3: required string tbl_name
}


struct CheckConstraintsResponse {
  1: required list<SQLCheckConstraint> checkConstraints
}




struct DropConstraintRequest {
  1: required string dbname, 
  2: required string tablename,
  3: required string constraintname,
  4: optional string catName
}


struct AddUniqueConstraintRequest {
  1: required list<SQLUniqueConstraint> uniqueConstraintCols
}


struct AddNotNullConstraintRequest {
  1: required list<SQLNotNullConstraint> notNullConstraintCols
}


struct AddDefaultConstraintRequest {
  1: required list<SQLDefaultConstraint> defaultConstraintCols
}


struct AddCheckConstraintRequest {
  1: required list<SQLCheckConstraint> checkConstraintCols
}


// other constraints
UniqueConstraintsResponse get_unique_constraints(1:UniqueConstraintsRequest request)
                     throws(1:MetaException o1, 2:NoSuchObjectException o2)
NotNullConstraintsResponse get_not_null_constraints(1:NotNullConstraintsRequest request)
                     throws(1:MetaException o1, 2:NoSuchObjectException o2)
DefaultConstraintsResponse get_default_constraints(1:DefaultConstraintsRequest request)
                     throws(1:MetaException o1, 2:NoSuchObjectException o2)
CheckConstraintsResponse get_check_constraints(1:CheckConstraintsRequest request)
                     throws(1:MetaException o1, 2:NoSuchObjectException o2)
void add_unique_constraint(1:AddUniqueConstraintRequest req)
    throws(1:NoSuchObjectException o1, 2:MetaException o2)
void add_not_null_constraint(1:AddNotNullConstraintRequest req)
    throws(1:NoSuchObjectException o1, 2:MetaException o2)
void add_default_constraint(1:AddDefaultConstraintRequest req)
    throws(1:NoSuchObjectException o1, 2:MetaException o2)
void add_check_constraint(1:AddCheckConstraintRequest req)
    throws(1:NoSuchObjectException o1, 2:MetaException o2)

D. Some fields or data types changed, affecting the column statistics messages, dynamic partitioning, and transaction operations.


The change involving the Thrift varchar type maps uniformly to Java String after code generation, so it has no effect on API usage.

Next, the detailed API changes. The interfaces are defined in IMetaStoreClient and RetryingMetaStoreClient.

RetryingMetaStoreClient is a proxy implementation that adds retry on failure.

API changes in 2.3 relative to 2.1

The following methods were added:

#IMetaStoreClient.java
List<String> getTables(String dbName, String tablePattern, TableType tableType)
throws MetaException, TException, UnknownDBException;
public PartitionValuesResponse listPartitionValues(PartitionValuesRequest request)
 throws MetaException, TException, NoSuchObjectException;
void alter_partition(String dbName, String tblName, Partition newPart)
 throws InvalidOperationException, MetaException, TException;

API changes in 3.1 relative to 2.3

Methods for creating, altering, reading, and dropping catalogs were added:

void createCatalog(Catalog catalog) 
    throws AlreadyExistsException, InvalidObjectException, MetaException, TException;


void alterCatalog(String catalogName, Catalog newCatalog)
    throws NoSuchObjectException, InvalidObjectException, MetaException, TException;


Catalog getCatalog(String catName) throws NoSuchObjectException, MetaException, TException;


List<String> getCatalogs() throws MetaException, TException;


void dropCatalog(String catName)
    throws NoSuchObjectException, InvalidOperationException, MetaException, TException;

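For methods related to Table, Database, Schema, and Field that involve a catalog, the original methods are kept and new overloads are added. The new overloads differ from the old ones by an extra String catName parameter: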

List<String> getTables(String catName, String dbName, String tablePattern) 
    throws MetaException, TException, UnknownDBException;


List<TableMeta> getTableMeta(String catName, String dbPatterns, String tablePatterns,
                             List<String> tableTypes)
    throws MetaException, TException, UnknownDBException;


List<String> getAllTables(String catName, String dbName)
    throws MetaException, TException, UnknownDBException;


void dropDatabase(String catName, String dbName, boolean deleteData, boolean ignoreUnknownDb,
                  boolean cascade)
    throws NoSuchObjectException, InvalidOperationException, MetaException, TException;
    
void dropTable(String catName, String dbName, String tableName, boolean deleteData,
               boolean ignoreUnknownTable, boolean ifPurge)
  throws MetaException, NoSuchObjectException, TException;


default void dropTable(String catName, String dbName, String tableName, boolean deleteData,
                       boolean ignoreUnknownTable)
  throws MetaException, NoSuchObjectException, TException {
  dropTable(catName, dbName, tableName, deleteData, ignoreUnknownTable, false);
}


default void dropTable(String catName, String dbName, String tableName)
    throws MetaException, NoSuchObjectException, TException {
  dropTable(catName, dbName, tableName, true, true, false);
}


List<FieldSchema> getFields(String catName, String db, String tableName)
    throws MetaException, TException, UnknownTableException,
    UnknownDBException;
    
List<FieldSchema> getSchema(String catName, String db, String tableName)
    throws MetaException, TException, UnknownTableException,
    UnknownDBException;

Every partition-related method likewise keeps its original form and gains a new overload with an extra String catName parameter:

Partition appendPartition(String catName, String dbName, String tableName, List<String> partVals)
    throws InvalidObjectException, AlreadyExistsException, MetaException, TException;


Partition appendPartition(String catName, String dbName, String tableName, String name)
    throws InvalidObjectException, AlreadyExistsException, MetaException, TException;
    
Partition getPartition(String catName, String dbName, String tblName, List<String> partVals)
    throws NoSuchObjectException, MetaException, TException;


Partition getPartition(String catName, String dbName, String tblName, String name)
    throws MetaException, UnknownTableException, NoSuchObjectException, TException;
    
List<Partition> listPartitions(String catName, String db_name, String tbl_name, int max_parts)
    throws NoSuchObjectException, MetaException, TException;
...
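One common way to keep an old signature alive next to a catalog-aware overload is to let the old one delegate with a default catalog. The sketch below shows that pattern only; `ToyTableLister` is a hypothetical interface, the default catalog name `"hive"` is an assumption, and nothing here claims this is how the Hive client is actually implemented.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical interface illustrating "keep the old overload, add one with catName".
interface ToyTableLister {
    String ASSUMED_DEFAULT_CATALOG = "hive"; // assumption for illustration

    // New overload: the catalog is explicit.
    List<String> getAllTables(String catName, String dbName);

    // Old overload kept for existing callers; it delegates with the assumed default catalog.
    default List<String> getAllTables(String dbName) {
        return getAllTables(ASSUMED_DEFAULT_CATALOG, dbName);
    }
}

final class ToyTableListerDemo {
    // In-memory implementation that echoes one fully qualified table name.
    static ToyTableLister inMemory() {
        return (catName, dbName) -> Arrays.asList(catName + "." + dbName + ".t1");
    }

    public static void main(String[] args) {
        ToyTableLister lister = inMemory();
        System.out.println(lister.getAllTables("db1"));           // old call site, unchanged
        System.out.println(lister.getAllTables("spark", "db1"));  // new call site with a catalog
    }
}
```

Existing call sites compile and behave as before, while new code can name a catalog explicitly.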

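As the API shows, the major change in HMS 3.x relative to HMS 2.x is the addition of Catalog operations, while the original methods remain. In addition, the parameter type used when creating an IMetaStoreClient instance through RetryingMetaStoreClient changed from HiveConf to Configuration.

The following are the getProxy declarations of RetryingMetaStoreClient in HMS 2.3.8: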

#RetryingMetaStoreClient @2.3.8
public class RetryingMetaStoreClient implements InvocationHandler {
    public static IMetaStoreClient getProxy(
    HiveConf hiveConf, boolean allowEmbedded);
    public static IMetaStoreClient getProxy(HiveConf hiveConf, HiveMetaHookLoader hookLoader,
    String mscClassName);
    public static IMetaStoreClient getProxy(HiveConf hiveConf, HiveMetaHookLoader hookLoader,
    ConcurrentHashMap<String, Long> metaCallTimeMap, String mscClassName, boolean allowEmbedded);
    
    //This constructor is meant for Hive internal use only. Please use getProxy(HiveConf hiveConf, HiveMetaHookLoader hookLoader) for external purpose.
    public static IMetaStoreClient getProxy(HiveConf hiveConf, Class<?>[] constructorArgTypes,
    Object[] constructorArgs, String mscClassName);
    
    //This constructor is meant for Hive internal use only. Please use getProxy(HiveConf hiveConf, HiveMetaHookLoader hookLoader) for external purpose.
    public static IMetaStoreClient getProxy(HiveConf hiveConf, Class<?>[] constructorArgTypes,
    Object[] constructorArgs, ConcurrentHashMap<String, Long> metaCallTimeMap,
    String mscClassName);
}

And this is RetryingMetaStoreClient in HMS 3.1.2:

#RetryingMetaStoreClient @3.1.2
public static IMetaStoreClient getProxy(Configuration hiveConf, boolean allowEmbedded) throws MetaException {
    return getProxy(hiveConf, new Class[]{Configuration.class, HiveMetaHookLoader.class, Boolean.class}, new Object[]{hiveConf, null, allowEmbedded}, (ConcurrentHashMap)null, HiveMetaStoreClient.class.getName());
}


@VisibleForTesting
public static IMetaStoreClient getProxy(Configuration hiveConf, HiveMetaHookLoader hookLoader, String mscClassName) throws MetaException {
    return getProxy(hiveConf, hookLoader, (ConcurrentHashMap)null, mscClassName, true);
}


public static IMetaStoreClient getProxy(Configuration hiveConf, HiveMetaHookLoader hookLoader, ConcurrentHashMap<String, Long> metaCallTimeMap, String mscClassName, boolean allowEmbedded) throws MetaException {
    return getProxy(hiveConf, new Class[]{Configuration.class, HiveMetaHookLoader.class, Boolean.class}, new Object[]{hiveConf, hookLoader, allowEmbedded}, metaCallTimeMap, mscClassName);
}


public static IMetaStoreClient getProxy(Configuration hiveConf, Class<?>[] constructorArgTypes, Object[] constructorArgs, String mscClassName) throws MetaException {
    return getProxy(hiveConf, constructorArgTypes, constructorArgs, (ConcurrentHashMap)null, mscClassName);
}


public static IMetaStoreClient getProxy(Configuration hiveConf, Class<?>[] constructorArgTypes, Object[] constructorArgs, ConcurrentHashMap<String, Long> metaCallTimeMap, String mscClassName) throws MetaException {
    Class<? extends IMetaStoreClient> baseClass = JavaUtils.getClass(mscClassName, IMetaStoreClient.class);
    RetryingMetaStoreClient handler = new RetryingMetaStoreClient(hiveConf, constructorArgTypes, constructorArgs, metaCallTimeMap, baseClass);
    return (IMetaStoreClient)Proxy.newProxyInstance(RetryingMetaStoreClient.class.getClassLoader(), baseClass.getInterfaces(), handler);
}
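Because the first parameter of getProxy differs between the two listings, code compiled against one version cannot link against the other even though the method name is the same. A small reflection helper can report which flavor is on the classpath. This is a sketch under assumptions: `ProxyFactoryInspector` is a hypothetical helper, and the nested `Configuration`, `HiveConf`, `Factory2x`, and `Factory3x` classes are stand-ins so that the example runs without Hive.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Set;
import java.util.TreeSet;

final class ProxyFactoryInspector {
    // Collect the simple name of the first parameter type of every public static getProxy overload.
    static Set<String> firstParamTypes(Class<?> factoryClass) {
        Set<String> names = new TreeSet<>();
        for (Method m : factoryClass.getMethods()) {
            if (m.getName().equals("getProxy")
                    && Modifier.isStatic(m.getModifiers())
                    && m.getParameterCount() > 0) {
                names.add(m.getParameterTypes()[0].getSimpleName());
            }
        }
        return names;
    }

    // Hypothetical stand-ins so the sketch runs without Hive on the classpath.
    static class Configuration {}
    static class HiveConf extends Configuration {}

    static class Factory2x {
        public static Object getProxy(HiveConf conf, boolean allowEmbedded) { return null; }
    }

    static class Factory3x {
        public static Object getProxy(Configuration conf, boolean allowEmbedded) { return null; }
    }

    public static void main(String[] args) {
        System.out.println("2.x-style factory takes: " + firstParamTypes(Factory2x.class));
        System.out.println("3.x-style factory takes: " + firstParamTypes(Factory3x.class));
    }
}
```

With Hive on the classpath, the same helper could be pointed at the class loaded by `Class.forName("org.apache.hadoop.hive.metastore.RetryingMetaStoreClient")` instead of the stand-ins.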

Experiments and conclusions

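1. A lower-version Hive Metastore client (2.x) can access a higher-version HMS Server (2.x or 3.x) normally.

2. A lower-version Hive Metastore client (3.x) can access a higher-version HMS Server (3.x) normally.

3. A higher-version Hive Metastore client (2.x) can only call the methods that already exist in a lower-version HMS Server (2.x); it cannot call the newly added methods.

4. A higher-version Hive Metastore client (3.x) can only call the methods that already exist in a lower-version HMS Server (3.x); it cannot call the newly added methods.

5. A Hive Metastore client (3.x) can only call some of the methods that already exist in a lower-version HMS Server (2.x).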

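For example:

A Hive Metastore client 2.x querying a Hive Metastore Server 3.x can list databases normally. In the reverse direction, a 3.x client querying a 2.x server cannot retrieve the database list: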

hiveMetaStoreClient.getCatalogs();      // org.apache.thrift.TApplicationException: Invalid method name: 'get_catalogs'
hiveMetaStoreClient.getAllDatabases();  // empty result
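The five conclusions can be condensed into a small lookup. The sketch below is only a restatement of them in code: `HmsCompatTable` and its `Outcome` values are hypothetical names, only the 2.x and 3.x lines are covered, and treating equal versions as fully compatible is an assumption the conclusions do not state.

```java
// Restates the five conclusions as a lookup. Hypothetical names; covers 2.x and 3.x only.
final class HmsCompatTable {
    enum Outcome {
        FULL,                   // conclusions 1 and 2: client is not newer than the server
        EXISTING_METHODS_ONLY,  // conclusions 3 and 4: newer client, same major version
        PARTIAL                 // conclusion 5: 3.x client against a 2.x server
    }

    // Versions are (major, minor) pairs, for example 2.3 is (2, 3).
    static Outcome expect(int clientMajor, int clientMinor, int serverMajor, int serverMinor) {
        boolean clientNotNewer = clientMajor < serverMajor
                || (clientMajor == serverMajor && clientMinor <= serverMinor);
        if (clientNotNewer) {
            return Outcome.FULL;  // equal versions land here by assumption
        }
        if (clientMajor == serverMajor) {
            return Outcome.EXISTING_METHODS_ONLY;
        }
        return Outcome.PARTIAL;
    }

    public static void main(String[] args) {
        System.out.println("2.1 client -> 3.1 server: " + expect(2, 1, 3, 1));
        System.out.println("2.3 client -> 2.1 server: " + expect(2, 3, 2, 1));
        System.out.println("3.1 client -> 2.3 server: " + expect(3, 1, 2, 3));
    }
}
```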

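In particular, when the Hive Metastore client is created through the RetryingMetaStoreClient proxy, it can only be instantiated by the same major version: a Hive Metastore client 2.x can only be initialized by HMS 2.x, and likewise a Hive Metastore client 3.x only by HMS 3.x.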
