Apache Paimon Hive引擎解析

最新推荐文章于 2025-01-14 11:33:31 发布

猫猫爱吃小鱼粮

最新推荐文章于 2025-01-14 11:33:31 发布

阅读量840

点赞数 3

分类专栏： Paimon 文章标签： apache hive hadoop

本文链接：https://blog.csdn.net/m0_50186249/article/details/136389540

版权

Paimon 专栏收录该内容

22 篇文章

订阅专栏

HIve 引擎

Paimon 当前支持 Hive 的 3.1, 2.3, 2.2, 2.1 和 2.1-cdh-6.3 版本。

1.执行引擎

当使用Hive Read数据时，Paimon 支持 MR 和 Tez 引擎, 当使用Hive Write数据时，Paimon支持MR引擎，如果使用 beeline，需要重启hive cluster。

2.安装

Version           Jar
Hive 3.1	        paimon-hive-connector-3.1-0.7.0-incubating.jar
Hive 2.3	        paimon-hive-connector-2.3-0.7.0-incubating.jar
Hive 2.2	        paimon-hive-connector-2.2-0.7.0-incubating.jar
Hive 2.1	        paimon-hive-connector-2.1-0.7.0-incubating.jar
Hive 2.1-cdh-6.3	paimon-hive-connector-2.1-cdh-6.3-0.7.0-incubating.jar

将Jar添加到Hive中有以下方法：

在Hive的根目录下创建一个auxlib文件夹，并将paimon-hive-connector-0.7.0-incubating.jar复制到auxlib中。
将jar复制到Hive可访问的路径，然后使用add jar /path/to/paimon-hive-connector-0.7.0-incubating.jar在Hive中启用paimon支持，注意，不建议使用这种方法，如果使用MR引擎运行Join语句，可能报错org.apache.hive.com.esotericsoftware.kryo.kryoexception: unable to find class。

注意：

如果使用HDFS，请确保设置了环境变量HADOOP_HOME或HADOOP_CONF_DIR。
如果使用hive cbo优化器，可能会导致一些不正确的查询结果，例如使用not null谓词查询struct类型，可以通过set hive.cbo.enable=false;命令禁用cbo优化器。

3.Flink SQL: 使用 Paimon Hive Catalog

使用paimon Hive catalog，可以通过Flink create，drop，select 和 insert 到 paimon 表中，这些操作直接影响对应的Hive metastore，以这种方式创建的表可以直接从Hive访问。

步骤一：

准备Flink Hive Connector Bundled Jar。

步骤二：

在Flink SQL Client中执行以下Flink SQL脚本，以定义Paimon Hive catalog并创建Table。

-- Flink SQL CLI
-- Define paimon Hive catalog

CREATE CATALOG my_hive WITH (
  'type' = 'paimon',
  'metastore' = 'hive',
  -- 'uri' = 'thrift://<hive-metastore-host-name>:<port>', default use 'hive.metastore.uris' in HiveConf
  -- 'hive-conf-dir' = '...', this is recommended in the kerberos environment
  -- 'hadoop-conf-dir' = '...', this is recommended in the kerberos environment
  -- 'warehouse' = 'hdfs:///path/to/table/store/warehouse', default use 'hive.metastore.warehouse.dir' in HiveConf
);

-- Use paimon Hive catalog

USE CATALOG my_hive;

-- Create a table in paimon Hive catalog (use "default" database by default)

CREATE TABLE test_table (
  a int,
  b string
);

-- Insert records into test table

INSERT INTO test_table VALUES (1, 'Table'), (2, 'Store');

-- Read records from test table

SELECT * FROM test_table;

/*
+---+-------+
| a |     b |
+---+-------+
| 1 | Table |
| 2 | Store |
+---+-------+
*/

4.Hive SQL: 访问已经在 Hive metastore 中的 Paimon Tables

在Hive CLI中运行以下Hive SQL访问创建的表。

-- Assume that paimon-hive-connector-<hive-version>-0.7.0-incubating.jar is already in auxlib directory.
-- List tables in Hive
-- (you might need to switch to "default" database if you're not there by default)

SHOW TABLES;

/*
OK
test_table
*/

-- Read records from test_table

SELECT a, b FROM test_table ORDER BY a;

/*
OK
1	Table
2	Store
*/

-- Insert records into test table
-- Note tez engine does not support hive write, only the hive engine is supported.

INSERT INTO test_table VALUES (3, 'Paimon');

SELECT a, b FROM test_table ORDER BY a;

/*
OK
1	Table
2	Store
3	Paimon
*/

-- time travel

SET paimon.scan.snapshot-id=1;
SELECT a, b FROM test_table ORDER BY a;
/*
OK
1	Table
2	Store
3	Paimon
*/
SET paimon.scan.snapshot-id=null;

5.Hive SQL: 创建新的 Paimon Tables

在Hive CLI中运行以下Hive SQL创建新的paimon表。

-- Assume that paimon-hive-connector-0.7.0-incubating.jar is already in auxlib directory.
-- Let's create a new paimon table.

SET hive.metastore.warehouse.dir=warehouse_path;

CREATE TABLE hive_test_table(
    a INT COMMENT 'The a field',
    b STRING COMMENT 'The b field'
)
STORED BY 'org.apache.paimon.hive.PaimonStorageHandler';

6.Hive SQL: 通过外部表访问 Paimon 的表

在Hive中将paimon表注册为外部表，在Hive CLI中运行以下Hive SQL来访问。

-- Assume that paimon-hive-connector-0.7.0-incubating.jar is already in auxlib directory.
-- Let's use the test_table created in the above section.
-- To create an external table, you don't need to specify any column or table properties.
-- Pointing the location to the path of table is enough.

CREATE EXTERNAL TABLE external_test_table
STORED BY 'org.apache.paimon.hive.PaimonStorageHandler'
LOCATION '/path/to/table/store/warehouse/default.db/test_table';
    
-- In addition to the way setting location above, you can also place the location setting in TBProperties
-- to avoid Hive accessing Paimon's location through its own file system when creating tables.
-- This method is effective in scenarios using Object storage,such as s3.

CREATE EXTERNAL TABLE external_test_table
STORED BY 'org.apache.paimon.hive.PaimonStorageHandler'
TBLPROPERTIES (
 'paimon_location' ='s3://xxxxx/path/to/table/store/warehouse/default.db/test_table'
);

-- Read records from external_test_table

SELECT a, b FROM external_test_table ORDER BY a;

/*
OK
1	Table
2	Store
*/

-- Insert records into test table

INSERT INTO external_test_table VALUES (3, 'Paimon');

SELECT a, b FROM external_test_table ORDER BY a;

/*
OK
1	Table
2	Store
3	Paimon
*/

7.Hive 和 Paimon 的类型映射

列出了Hive和Paimon之间所有支持的类型转换，Hive的所有数据类型都可以在org.apache.hadoop.hive.serde2.typeinfo中找到。

Hive Data Type	Paimon Data Type	Atomic Type
`StructTypeInfo`	`RowType`	false
`MapTypeInfo`	`MapType`	false
`ListTypeInfo`	`ArrayType`	false
`PrimitiveTypeInfo("boolean")`	`BooleanType`	true
`PrimitiveTypeInfo("tinyint")`	`TinyIntType`	true
`PrimitiveTypeInfo("smallint")`	`SmallIntType`	true
`PrimitiveTypeInfo("int")`	`IntType`	true
`PrimitiveTypeInfo("bigint")`	`BigIntType`	true
`PrimitiveTypeInfo("float")`	`FloatType`	true
`PrimitiveTypeInfo("double")`	`DoubleType`	true
`CharTypeInfo(length)`	`CharType(length)`	true
`PrimitiveTypeInfo("string")`	`VarCharType(VarCharType.MAX_LENGTH)`	true
`VarcharTypeInfo(length)`	`VarCharType(length), length is less than VarCharType.MAX_LENGTH`	true
`PrimitiveTypeInfo("date")`	`DateType`	true
`PrimitiveTypeInfo("timestamp")`	`TimestampType`	true
`DecimalTypeInfo(precision, scale)`	`DecimalType(precision, scale)`	true
`PrimitiveTypeInfo("binary")`	`VarBinaryType`, `BinaryType`	true