Integrating Iceberg with Hive
Copy the Iceberg Hive runtime jar into Hive's lib and auxlib directories:
cp /home/bonc/iceberg-hive-runtime-0.13.2.jar /opt/cloudera/parcels/CDH/lib/hive/auxlib
cp /home/bonc/iceberg-hive-runtime-0.13.2.jar /opt/cloudera/parcels/CDH/lib/hive/lib/
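The two copy commands above can be wrapped in a small helper that fails early when the jar is missing. This is only a sketch: the function name install_jar is made up for illustration, and the paths in the example call are the ones used in this post.

```shell
#!/bin/sh
# Hypothetical helper: copy one jar into each target directory,
# creating the directory if needed.
# Returns non-zero if the jar is missing or a copy fails.
install_jar() {
    jar=$1
    shift
    if [ ! -f "$jar" ]; then
        echo "jar not found: $jar" >&2
        return 1
    fi
    for dir in "$@"; do
        mkdir -p "$dir" || return 1
        cp "$jar" "$dir/" || return 1
    done
}

# Example call with the paths from this post (needs write access to the parcel):
# install_jar /home/bonc/iceberg-hive-runtime-0.13.2.jar \
#     /opt/cloudera/parcels/CDH/lib/hive/auxlib \
#     /opt/cloudera/parcels/CDH/lib/hive/lib
```

Depending on the deployment, the Hive services may need a restart before they pick up the jar from these directories.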
1. Enabling Iceberg support
hive> add jar /home/bonc/iceberg-hive-runtime-0.13.2.jar;
hive> set iceberg.engine.hive.enabled=true;
The setting can also be placed in hive-site.xml:
<property>
<name>iceberg.engine.hive.enabled</name>
<value>true</value>
<description>Whether Hive enables Iceberg support</description>
</property>
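2. Catalog management
Hive itself has no concept of a catalog, but Iceberg does. Hive therefore represents catalog information as key-value properties, so that a configured catalog can be referenced by name when a table is created.
The Hive integration supports, among others, the Hive Catalog and the Hadoop Catalog, which are the two covered here.
2.1 Creating a Hive Catalog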
hive> set iceberg.catalog.hive_catalog.type=hive;
hive> set iceberg.catalog.hive_catalog.uri=thrift://hive1:9083;
hive> set iceberg.catalog.hive_catalog.clients=5;
hive> set iceberg.catalog.hive_catalog.warehouse=hdfs://cdh01:8020/user/iceberg/hive_catalog;
2.2 Creating a Hadoop Catalog
hive> set iceberg.engine.hive.enabled=true;
hive> set iceberg.catalog.hadoop_catalog.type=hadoop;
hive> set iceberg.catalog.hadoop_catalog.warehouse=hdfs://cdh01:8020/user/iceberg/hadoop_catalog;
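Because these settings are session-level, they have to be issued again in every new session. One way to avoid retyping them is to generate an initialization file and pass it to the Hive CLI with its -i option. The sketch below assumes that approach; the function name catalog_settings and the file name iceberg-init.sql are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the session-level "set" statements that
# register one Iceberg catalog.
# $1 = catalog name, remaining arguments = key=value catalog properties.
catalog_settings() {
    name=$1
    shift
    for prop in "$@"; do
        printf 'set iceberg.catalog.%s.%s;\n' "$name" "$prop"
    done
}

# Example: write both catalogs from this post to an init file and start Hive with it.
# {
#     echo 'set iceberg.engine.hive.enabled=true;'
#     catalog_settings hive_catalog type=hive uri=thrift://hive1:9083 clients=5 \
#         warehouse=hdfs://cdh01:8020/user/iceberg/hive_catalog
#     catalog_settings hadoop_catalog type=hadoop \
#         warehouse=hdfs://cdh01:8020/user/iceberg/hadoop_catalog
# } > iceberg-init.sql
# hive -i iceberg-init.sql
```

Keeping the catalog name as a parameter makes the naming pattern explicit: every property is prefixed with iceberg.catalog.<catalog name>.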
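3. Creating databases
3.1 Databases under a Hive Catalog
Databases created by other systems that use this Hive Metastore as their catalog can be used directly without being created again, because Hive databases and Iceberg databases map to each other directly.
3.2 Databases under a Hadoop Catalog
Because Hive has no concept of a catalog, it cannot automatically discover databases through a catalog configured as above. A Hive database therefore has to be created to correspond to the Iceberg database, for example: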
hive> create schema iceberg_db location 'hdfs://cdh01:8020/user/iceberg/hadoop_catalog/iceberg_db';
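In the example above the database location is the Hadoop Catalog warehouse path followed by the database name. Assuming that layout holds for every database in the catalog, the statement can be generated instead of typed by hand. The function name create_schema_stmt is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print a "create schema" statement whose location is
# <warehouse>/<database>, matching the directory layout of the example above.
# $1 = Hadoop Catalog warehouse path, $2 = database name.
create_schema_stmt() {
    warehouse=${1%/}   # drop one trailing slash, if present
    db=$2
    printf "create schema %s location '%s/%s';\n" "$db" "$warehouse" "$db"
}

# Example with the warehouse from section 2.2:
# create_schema_stmt hdfs://cdh01:8020/user/iceberg/hadoop_catalog iceberg_db
```

The printed statement can be pasted into the Hive CLI or passed to it with hive -e.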
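4. Creating and dropping tables
External tables
For Iceberg tables that were already created by other systems, create an external table in Hive to read and write them.
4.1 Tables under a Hive Catalog
Tables created by other systems that use this Hive Metastore as their catalog can be used directly without being created again, because Hive tables and Iceberg tables map to each other directly.
4.2 Tables under a Hadoop Catalog
Create a Hive table that corresponds to the Iceberg table. Queries against it return the same results as the Iceberg table itself: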
hive> create external table iceberg_db.t_iceberg_sample_1(
id bigint, data string
)
stored by 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
LOCATION 'hdfs://cdh01:8020/user/iceberg/hadoop_catalog/iceberg_db/t_iceberg_sample_1'
tblproperties('iceberg.catalog'='hadoop_catalog');
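If a table is created without the iceberg.catalog table property, the Hive Catalog is used by default and the table metadata is stored in the current Hive metastore. If no location is specified, the table data is stored in the current Hive warehouse.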
4.3 create table
Iceberg tables can also be created directly from Hive. The default iceberg.catalog is the Hive Catalog:
hive> create table iceberg_db.student(
id bigint,
name string
) partitioned by (birthday date, country string)
stored by 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler';
hive> show create table iceberg_db.student;
show create table iceberg_db.student
OK
CREATE TABLE `iceberg_db.student`(
`id` bigint COMMENT 'from deserializer',
`name` string COMMENT 'from deserializer',
`birthday` date COMMENT 'from deserializer',
`country` string COMMENT 'from deserializer')
ROW FORMAT SERDE
'org.apache.iceberg.mr.hive.HiveIcebergSerDe'
STORED BY
'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
LOCATION
'hdfs://cdh01:8020/user/iceberg/hadoop_catalog/iceberg_db/student'
TBLPROPERTIES (
'engine.hive.enabled'='true',
'external.table.purge'='TRUE',
'metadata_location'='hdfs://cdh01:8020/user/iceberg/hadoop_catalog/iceberg_db/student/metadata/00000-2f2f2315-7b25-49a1-a89f-6dc268e3ae26.metadata.json',
'table_type'='ICEBERG',
'transient_lastDdlTime'='1657524217',
'uuid'='25b91543-6023-42db-b0e7-b6e4ac88ac53')
Time taken: 0.419 seconds, Fetched: 19 row(s)
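Checking the table's HDFS path shows that the Iceberg metadata directory is written there as well.
After the table is dropped, the student table directory no longer exists under that HDFS path: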
hive> drop table if exists iceberg_db.student;
drop table iceberg_db.student
OK
Time taken: 0.279 seconds
Create a table and specify iceberg.catalog:
hive> create table iceberg_db.employee(
id bigint,
name string
) partitioned by (birthday date, country string)
stored by 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
location 'hdfs://cdh01:8020/user/iceberg/hadoop_catalog/iceberg_db/employee'
tblproperties('iceberg.catalog'='hadoop_catalog');
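Although the Iceberg table is partitioned, the Hive table definition does not show any partition information. In addition, computed columns (partition transforms) are currently not supported as partition columns: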
hive> show create table iceberg_db.employee;
OK
CREATE TABLE `iceberg_db.employee`(
`id` bigint COMMENT 'from deserializer',
`name` string COMMENT 'from deserializer',
`birthday` date COMMENT 'from deserializer',
`country` string COMMENT 'from deserializer')
ROW FORMAT SERDE
'org.apache.iceberg.mr.hive.HiveIcebergSerDe'
STORED BY
'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
LOCATION
'hdfs://cdh01:8020/user/iceberg/hadoop_catalog/iceberg_db/employee'
TBLPROPERTIES (
'engine.hive.enabled'='true',
'external.table.purge'='TRUE',
'iceberg.catalog'='hadoop_catalog',
'metadata_location'='hdfs://cdh01:8020/user/iceberg/hadoop_catalog/iceberg_db/employee/metadata/00000-c8bc7b63-1db4-4380-aa4f-435f11b2f2da.metadata.json',
'table_type'='ICEBERG',
'transient_lastDdlTime'='1657524915',
'uuid'='78bf3e9a-ab8d-4487-9674-96cf9af89919')
Time taken: 0.343 seconds, Fetched: 20 row(s)
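Insert data (the student table was dropped above, so recreate it first with the create table statement from section 4.3):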
hive> insert into iceberg_db.student(id, name, birthday, country)
values(1, 'zhang_san', null, 'china'),
(2, 'zhang_san', null, 'china');
Query the data:
hive> select * from iceberg_db.student;