msck repair table tablename
Repairs a table's partitions. It is most often needed after manually copying directories into a Hive table's location: the Hive metastore has no record that those directories are partitions, so their data cannot be queried.
Take a table test_tbl whose storage path is /user/hive/warehouse/test_tbl and whose partition column is pt. The normal way to add a partition is:
alter table test_tbl add if not exists partition (pt='20190101');
This creates the partition directory /user/hive/warehouse/test_tbl/pt=20190101 under the table's storage path. From then on, whether you drop data files into that directory or insert data through Hive, the partition's data is queryable in Hive.
But if you upload a directory named pt=20190201 directly to /user/hive/warehouse/test_tbl/ on HDFS, a query for partition pt='20190201' returns nothing: as noted above, Hive's metastore has no record of partition 20190201, so you need to run the msck repair table command to repair it.
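Conceptually, the repair is a diff between the partition-style directories under the table location and the partitions the metastore knows about. The sketch below simulates that discovery step on a local temporary directory standing in for the table location (the directory names and the known-partition list are made up for illustration):

```shell
# Simulate the discovery step of msck repair on a local directory
# standing in for /user/hive/warehouse/test_tbl (illustrative only).
tbl=$(mktemp -d)
mkdir -p "$tbl/pt=20190101" "$tbl/pt=20190201"  # pt=20190201 was copied in by hand
known="pt=20190101"                             # the only partition in the metastore
missing=""
for d in "$tbl"/pt=*; do
  p=$(basename "$d")
  case " $known " in
    *" $p "*) ;;                                # metastore already knows this one
    *) missing="$missing$p "                    # not in the metastore: needs repair
       echo "ALTER TABLE test_tbl ADD PARTITION ($p);" ;;
  esac
done
rm -r "$tbl"
```

Running msck repair table is equivalent to issuing the printed ALTER TABLE ... ADD PARTITION statement for each directory the metastore is missing.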
This command is also commonly used to repair table partitions after copying data with distcp.
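A typical distcp-then-repair sequence looks like the following dry-run sketch. The cluster addresses are hypothetical, and the commands are assembled as strings and only printed, not executed; remove the `echo`s to run them for real:

```shell
# Hypothetical source and destination clusters; adjust to your environment.
src=hdfs://src-nn:8020/user/hive/warehouse/test_tbl
dst=hdfs://nameservice1/user/hive/warehouse/test_tbl
# 1) copy the whole table directory (partition subdirectories included)
cmd1="hadoop distcp $src $dst"
# 2) let the destination metastore discover the copied partition directories
cmd2="hive -e 'msck repair table test_tbl;'"
echo "$cmd1"
echo "$cmd2"
```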
Example:
Create-table statement:
CREATE TABLE `dwa_w_mrt_e_two_time_length_report`(
`province_name` string COMMENT 'Province',
`city_name` string COMMENT 'City',
`channel_name` string COMMENT 'Service hall name',
`channel_id` string COMMENT 'Service hall code',
`wait_time_avg` string COMMENT 'Average waiting time',
`total_num_take` string COMMENT 'Total tickets taken',
`wait_time_30` string COMMENT 'Waiting, duration ≥ 30',
`wait_time_15` string COMMENT 'Waiting, 15 ≤ duration < 30',
`wait_time_10` string COMMENT 'Waiting, 10 ≤ duration < 15',
`wait_time_5` string COMMENT 'Waiting, 5 ≤ duration < 10',
`wait_time_4` string COMMENT 'Waiting, duration < 5',
`counter_time_avg` string COMMENT 'Average at-counter time',
`total_num_call` string COMMENT 'Total numbers called',
`counter_time_30` string COMMENT 'At counter, duration ≥ 30',
`counter_time_15` string COMMENT 'At counter, 15 ≤ duration < 30',
`counter_time_10` string COMMENT 'At counter, 10 ≤ duration < 15',
`counter_time_5` string COMMENT 'At counter, 5 ≤ duration < 10',
`counter_time_4` string COMMENT 'At counter, duration < 5')
COMMENT 'In-hall user dual-duration report - daily report'
PARTITIONED BY (
`month_id` string COMMENT 'Month',
`day_id` string COMMENT 'Day')
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
'field.delim'=',',
'serialization.format'=',')
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
TBLPROPERTIES (
'classification'='AL',
'createdate'='2020-09-26',
'creator'='王洪峰',
'cycle'='W',
'domain'='RES',
'email'='jc-wanghf2@chinaunicom.cn',
'entity'='RES',
'last_modified_by'='jc-zhangr',
'last_modified_time'='1596693791',
'layer'='dwa_db',
'process'='USE',
'transient_lastDdlTime'='1601195416')
Database: zbg_dwa
Table name: dwa_w_mrt_e_two_time_length_report
HDFS table path: hdfs://nameservice1/inceptor1/user/hive/warehouse/zbg_dwa.db/jc_zb_caiji/dwa_w_mrt_e_two_time_length_report
Local path: /home/jc_zb_caiji/dwa_w_mrt_e_two_time_length_report/month_id=202009/day_id=27
IP, user, password: 10.161.17.118 jc_zb_caiji udme@2020
beeline login command: beeline -u jdbc:hive2://10.161.17.108:6666/default -n jc_zb_caiji -p udme@2020
Procedure:
1. hdfs dfs -put -f /home/jc_zb_caiji/dwa_w_mrt_e_two_time_length_report/month_id=202009/day_id=27 hdfs://nameservice1/inceptor1/user/hive/warehouse/zbg_dwa.db/jc_zb_caiji/dwa_w_mrt_e_two_time_length_report/month_id=202009
2. beeline -u jdbc:hive2://10.161.17.108:6666/default -n xxx -p xxx
3. use zbg_dwa;
4. msck repair table dwa_w_mrt_e_two_time_length_report;
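The four steps above can be wrapped into one parametrized script. This dry-run sketch only assembles and prints the commands (credentials stay elided as `xxx`; beeline's `-e` flag runs the given statements and exits); drop the final `echo`s to actually execute them:

```shell
# Parameters taken from the example above.
DB=zbg_dwa
TBL=dwa_w_mrt_e_two_time_length_report
LOCAL=/home/jc_zb_caiji/$TBL/month_id=202009/day_id=27
HDFS_DIR=hdfs://nameservice1/inceptor1/user/hive/warehouse/$DB.db/jc_zb_caiji/$TBL/month_id=202009
# step 1: push the local partition directory under the table's HDFS path
put_cmd="hdfs dfs -put -f $LOCAL $HDFS_DIR"
# steps 2-4: connect with beeline and repair the table's partition metadata
repair_cmd="beeline -u jdbc:hive2://10.161.17.108:6666/default -n xxx -p xxx -e 'use $DB; msck repair table $TBL;'"
echo "$put_cmd"
echo "$repair_cmd"
```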
Query:
select * from dwa_w_mrt_e_two_time_length_report where month_id='202009' and day_id='27' limit 10;