An existing HDFS path, inspected with: hadoop fs -du -h /user/portal/ODM/push/pushcatch_data_collect/
The path contains one partition directory per day:
284.2 K /user/portal/ODM/push/pushcatch_data_collect/2018-12-18
158.8 K /user/portal/ODM/push/pushcatch_data_collect/2018-12-19
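There are two ways to create a Hive table over this data:
1. Include a partition (usually a time partition) in the CREATE TABLE statement. After the table is created, each partition has to be added manually:
alter table tmp.test1 add if not exists partition (day = 20181201) location '/user/portal/ODM/push/pushcatch_data_collect/2018-12-18';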
2. Create the table directly, without adding any time partition.
Both approaches can read the data.
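As a rough sketch of the two approaches, the statements below define one partitioned and one unpartitioned external table over the same directory. The column list (id, insertTime, content), the tab delimiter, and the table name tmp.test2 are hypothetical placeholders and are not taken from the actual data. The storage clause for the LZO format is left out here and would have to be added for this data source.

```sql
-- Approach 1: partitioned external table.
-- Columns and delimiter are hypothetical; adjust to the real schema.
CREATE EXTERNAL TABLE IF NOT EXISTS tmp.test1 (
  id         STRING,
  insertTime STRING,
  content    STRING
)
PARTITIONED BY (day STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/user/portal/ODM/push/pushcatch_data_collect/';
-- Each daily directory is then attached with ALTER TABLE ... ADD PARTITION.

-- Approach 2: unpartitioned external table over the parent directory.
-- tmp.test2 is a hypothetical name used only for this illustration.
CREATE EXTERNAL TABLE IF NOT EXISTS tmp.test2 (
  id         STRING,
  insertTime STRING,
  content    STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/user/portal/ODM/push/pushcatch_data_collect/';
```

Because the files sit in per-day subdirectories, whether the unpartitioned table can read them may depend on the cluster's recursive-input settings (for example hive.mapred.supports.subdirectories and mapred.input.dir.recursive); that is an assumption about the environment, not something the note states.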
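Pay attention to how the date is selected. For example, to compare the insertTime field (which starts with a yyyy-MM-dd date) against a yyyyMMdd value:
concat(substr(insertTime,1,4),substr(insertTime,6,2),substr(insertTime,9,2)) = '${ts}'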
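For illustration, the filter above could be used in a query like the following sketch. It assumes insertTime is a string such as 2018-12-18 10:23:45 and that ${ts} is replaced before execution (by a scheduler or by Hive variable substitution) with a value such as 20181218; both are assumptions. tmp.test1 is the table name from the alter table statement earlier in this note.

```sql
-- Count rows whose insertTime falls on the day given by ${ts} (yyyyMMdd).
-- substr positions 1-4, 6-7 and 9-10 pick out year, month and day
-- from a value that starts with yyyy-MM-dd.
SELECT count(*)
FROM tmp.test1
WHERE concat(substr(insertTime, 1, 4),
             substr(insertTime, 6, 2),
             substr(insertTime, 9, 2)) = '${ts}';

-- An equivalent form, under the same format assumption:
-- WHERE regexp_replace(substr(insertTime, 1, 10), '-', '') = '${ts}'
```

On a partitioned table, adding a condition on the partition column as well would let Hive prune partitions instead of scanning every day's files.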
###############
The data source is LZO-compressed, so take care with the storage clause in the Hive CREATE TABLE statement:
STORED AS INPUTFORMAT
'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop