1、Symptom:
Running the same INSERT OVERWRITE in Impala repeatedly against an existing partition makes the SQL fail:
[*******-hadoop:21000] > insert overwrite table x_sell.fdl_xsell_info(uuid,phone_number,camp_offer_id,customer_name,repayment_date,repayment_total_amount,repayment_amount_one,repayment_amount_two,time_inst,time_upd) partition(dt = from_unixtime(unix_timestamp()-86400,'yyyyMMdd'),task_id = '170') select uuid,phone_number,concat('dwh_sb_withhold_01',cast(from_unixtime(unix_timestamp(),'yyyyMMdd') as string)) camp_offer_id,customer_name,repayment_date,msamount+currtermsnowballamt as repayment_total_amount,msamount,currtermsnowballamt,time_inst,time_upd from x_sell.collect_sms_ready_snowball where dt=from_unixtime(unix_timestamp()-86400,'yyyyMMdd') AND is_snowball_withhold='Y' and msamount>0 and currtermsnowballamt>0 limit 20;
Query: insert overwrite table x_sell.fdl_xsell_info(uuid,phone_number,camp_offer_id,customer_name,repayment_date,repayment_total_amount,repayment_amount_one,repayment_amount_two,time_inst,time_upd) partition(dt = from_unixtime(unix_timestamp()-86400,'yyyyMMdd'),task_id = '170') select uuid,phone_number,concat('dwh_sb_withhold_01',cast(from_unixtime(unix_timestamp(),'yyyyMMdd') as string)) camp_offer_id,customer_name,repayment_date,msamount+currtermsnowballamt as repayment_total_amount,msamount,currtermsnowballamt,time_inst,time_upd from x_sell.collect_sms_ready_snowball where dt=from_unixtime(unix_timestamp()-86400,'yyyyMMdd') AND is_snowball_withhold='Y' and msamount>0 and currtermsnowballamt>0 limit 20
ERROR: AnalysisException: Unable to INSERT into target table (x_sell.fdl_xsell_info) because Impala does not have WRITE access to at least one HDFS path: hdfs://*********-hadoop:8020/user/hive/warehouse/x_sell.db/fdl_xsell_info/dt=20170727/task_id=170
2、Cause: repeated INSERT OVERWRITE runs drop the write permissions on the pre-existing partition directories:
https://issues.apache.org/jira/browse/IMPALA-3297
https://issues.apache.org/jira/browse/IMPALA-3363
While using the insert_inherit_permissions option, Impala only applies the cached permissions to newly created folders. This is problematic when using insert overwrite, since even though new folders are created by Impala, the partitions which were already present are not treated as new and get default permissions.
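Before applying a workaround, you can confirm the state the JIRA describes: the pre-existing partition directory has lost its group/other write bits. A minimal sketch (the helper names and the permission-parsing logic are illustrative, not from the original post):

```python
import subprocess

def lacks_group_other_write(perm):
    """Return True if an HDFS mode string such as 'drwxr-xr-x' grants no
    write bit to group or other -- the state IMPALA-3297 leaves pre-existing
    partition directories in after an INSERT OVERWRITE."""
    return perm[5] != "w" and perm[8] != "w"

def partition_blocks_impala(path):
    """List a partition directory itself (-d) and report whether a
    non-owner process such as impalad could still write into it."""
    out = subprocess.check_output(["hadoop", "fs", "-ls", "-d", path]).decode()
    perm = out.split()[0]  # first column of 'hadoop fs -ls' is the mode string
    return lacks_group_other_write(perm)
```

If `partition_blocks_impala` returns True for a partition directory like the one in the error message, Impala will refuse the insert with the AnalysisException shown above.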
3、Workaround (before the Impala SQL, run the following two statements in order: fix the table's permissions, then reload its metadata):
3.1 I run this from a Python script: os.system("hadoop fs -chmod -R 777 /user/hive/warehouse/x_sell.db/fdl_xsell_info")
3.2 Before the INSERT OVERWRITE: cursor.execute("invalidate metadata x_sell.fdl_xsell_info;")
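Putting the two steps together, a sketch of a pre-insert hook (assuming an impyla-style cursor object; the function names are illustrative, only the two commands come from the original post):

```python
import os

# Table and warehouse path from the example above.
TABLE = "x_sell.fdl_xsell_info"
TABLE_PATH = "/user/hive/warehouse/x_sell.db/fdl_xsell_info"

def build_workaround_commands(table, table_path):
    """Return the two commands to run before an INSERT OVERWRITE:
    1) recursively open up the table's partition directories on HDFS,
    2) the SQL that makes Impala reload the table's metadata so it
       sees the new permissions."""
    chmod_cmd = "hadoop fs -chmod -R 777 %s" % table_path
    invalidate_sql = "invalidate metadata %s;" % table
    return chmod_cmd, invalidate_sql

def apply_workaround(cursor, table=TABLE, table_path=TABLE_PATH):
    chmod_cmd, invalidate_sql = build_workaround_commands(table, table_path)
    # Restore WRITE access on every partition directory (IMPALA-3297).
    os.system(chmod_cmd)
    # Force Impala to pick up the changed HDFS permissions.
    cursor.execute(invalidate_sql)
```

Call apply_workaround(cursor) immediately before each INSERT OVERWRITE; note that chmod 777 is a blunt instrument, so narrow the mode if your cluster's security policy requires it.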