偶然的机会发现用其他用户重写impala用户下的分区数据时有些表报错,有些表不报错,重写报错信息(Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask. Exception when loading 1 in table)。都是用impala用户插入的分区数据有些HDFS文件权限是755,有些权限是777。最后发现是因为是否执行alter table add partition影响的。
测试如下:
首先创建一个分区表:
CREATE TABLE p3edwadm.tab_test (
id STRING,
name STRING
)
PARTITIONED BY (
date_timekey STRING,
hour STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS PARQUET
查看此时后台文件的权限是777(黏贴位可忽略,也可去除,对本实验不影响)
drwxrwxrwt - impala hive 0 2021-11-09 10:13 /user/hive/warehouse/p3edwadm.db/tab_test
不手动添加分区,直接插入分区数据,发现文件权限是755
INSERT OVERWRITE TABLE test.tab_test
PARTITION(date_timekey,hour)
select '1','ee','20211101','202111010730'
hdfs dfs -ls /user/hive/warehouse/test.db/tab_test
Found 2 items
drwxrwxrwt - impala hive 0 2021-11-09 10:43 /user/hive/warehouse/test.db/tab_test/_impala_insert_staging
drwxr-xr-x - impala hive 0 2021-11-09 10:43 /user/hive/warehouse/test.db/tab_test/date_timekey=20211101
手动添加分区之后再插入分区数据,发现文件权限就是777:
Alter table test.tab_test add if not exists partition (date_timekey='20211102',hour='202111020730') ;
INSERT OVERWRITE TABLE test.tab_test
PARTITION(date_timekey,hour)
select '1','ee','20211102','202111020730'
hdfs dfs -ls /user/hive/warehouse/test.db/tab_test
Found 3 items
drwxrwxrwt - impala hive 0 2021-11-09 10:45 /user/hive/warehouse/test.db/tab_test/_impala_insert_staging
drwxr-xr-x - impala hive 0 2021-11-09 10:43 /user/hive/warehouse/test.db/tab_test/date_timekey=20211101
drwxrwxrwt - impala hive 0 2021-11-09 10:45 /user/hive/warehouse/test.db/tab_test/date_timekey=20211102
结论:
在impala新增分区数据的时候最好手动添加分区后再插入分区数据,以免后续其他用户操作报错。
历史数据可用如下命令进行修改:
hdfs dfs -chmod 777 /user/hive/warehouse/test.db/tab_test/*